Defect Prediction

The Experts below are selected from a list of 9,822 Experts worldwide, ranked by the ideXlab platform.

Ahmed E Hassan - One of the best experts on this subject based on the ideXlab platform.

  • the impact of class rebalancing techniques on the performance and interpretation of Defect Prediction models
    arXiv: Software Engineering, 2018
    Co-Authors: Chakkrit Tantithamthavorn, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Defect Prediction models that are trained on class-imbalanced datasets (i.e., the proportions of Defective and clean modules are not equally represented) are highly susceptible to producing inaccurate Prediction models. Prior research compares the impact of class rebalancing techniques on the performance of Defect Prediction models, but arrives at contradictory conclusions due to different choices of datasets, classification techniques, and performance measures. Such contradictory conclusions make it hard to derive practical guidelines for whether class rebalancing techniques should be applied in the context of Defect Prediction models. In this paper, we investigate the impact of 4 popularly-used class rebalancing techniques on 10 commonly-used performance measures and on the interpretation of Defect Prediction models. We also construct statistical models to better understand in which experimental design settings class rebalancing techniques are beneficial for Defect Prediction models. Through a case study of 101 datasets that span proprietary and open-source systems, we recommend that class rebalancing techniques are necessary when quality assurance teams wish to increase the completeness of identifying software Defects (i.e., Recall). However, class rebalancing techniques should be avoided when interpreting Defect Prediction models. We also find that class rebalancing techniques do not impact the AUC measure. Hence, AUC should be used as a standard measure when comparing Defect Prediction models.
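
    A minimal Python sketch of the kind of experiment described above, assuming a scikit-learn / imbalanced-learn stack; the toy data, the choice of SMOTE as the rebalancing technique, and the random forest classifier are illustrative assumptions rather than the paper's exact setup:

    ```python
    # Rebalance the TRAINING split only, then compare Recall and AUC against a
    # model trained on the original, imbalanced data (toy stand-in data below).
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import recall_score, roc_auc_score
    from imblearn.over_sampling import SMOTE  # one of several rebalancing options

    X, y = np.random.rand(500, 10), np.random.binomial(1, 0.1, 500)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

    def fit_and_score(X_train, y_train):
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
        prob = clf.predict_proba(X_te)[:, 1]
        return recall_score(y_te, (prob > 0.5).astype(int)), roc_auc_score(y_te, prob)

    rec_raw, auc_raw = fit_and_score(X_tr, y_tr)                   # imbalanced training data
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # rebalanced training data
    rec_bal, auc_bal = fit_and_score(X_bal, y_bal)

    print(f"Recall: {rec_raw:.2f} -> {rec_bal:.2f}")  # completeness typically increases
    print(f"AUC:    {auc_raw:.2f} -> {auc_bal:.2f}")  # AUC is reported to be largely unaffected
    ```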

  • the use of summation to aggregate software metrics hinders the performance of Defect Prediction models
    IEEE Transactions on Software Engineering, 2017
    Co-Authors: Feng Zhang, Shane Mcintosh, Ahmed E Hassan, Ying Zou
    Abstract:

    Defect Prediction models help software organizations to anticipate where Defects will appear in the future. When training a Defect Prediction model, historical Defect data is often mined from a Version Control System (VCS, e.g., Subversion), which records software changes at the file level. Software metrics, on the other hand, are often calculated at the class or method level (e.g., McCabe's Cyclomatic Complexity). To address this disagreement in granularity, the class- and method-level software metrics are aggregated to the file level, often using summation (i.e., the McCabe value of a file is the sum of the McCabe values of all methods within the file). A recent study shows that summation significantly inflates the correlation between lines of code (Sloc) and cyclomatic complexity (Cc) in Java projects. While there are many other aggregation schemes (e.g., central tendency, dispersion), they have remained unexplored in the scope of Defect Prediction. In this study, we set out to investigate how different aggregation schemes impact Defect Prediction models. Through an analysis of 11 aggregation schemes using data collected from 255 open source projects, we find that: (1) aggregation schemes can significantly alter correlations among metrics, as well as the correlations between metrics and the Defect count; (2) when constructing models to predict Defect proneness, applying only the summation scheme (i.e., the most commonly used aggregation scheme in the literature) achieves the best performance (the best among the 12 studied configurations) in only 11 percent of the studied projects, while applying all of the studied aggregation schemes achieves the best performance in 40 percent of the studied projects; (3) when constructing models to predict Defect rank or count, applying only summation and applying all of the studied aggregation schemes achieve similar performance, with both coming closest to the best performance more often than the other studied aggregation schemes; and (4) when constructing models for effort-aware Defect Prediction, the mean or median aggregation schemes yield performance values that are significantly closer to the best performance than any of the other studied aggregation schemes. Broadly speaking, the performance of Defect Prediction models is often underestimated due to our community's tendency to use only the summation aggregation scheme. Given the potential benefit of applying additional aggregation schemes, we advise that future Defect Prediction studies explore a variety of aggregation schemes.
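
    A hedged sketch of the granularity problem, assuming pandas; the file names, the single metric column, and the handful of schemes shown are illustrative (the study evaluates 11 schemes):

    ```python
    # Aggregate method-level McCabe complexity to the file level under several
    # schemes instead of summation alone; each new column becomes a candidate
    # file-level metric for the defect prediction model.
    import pandas as pd

    methods = pd.DataFrame({
        "file":   ["A.java", "A.java", "A.java", "B.java", "B.java"],
        "mccabe": [1, 7, 2, 3, 3],
    })

    aggregated = methods.groupby("file")["mccabe"].agg(
        sum_cc="sum",        # the scheme most prior work uses exclusively
        mean_cc="mean",      # central tendency
        median_cc="median",
        std_cc="std",        # dispersion
        max_cc="max",
    )
    print(aggregated)
    ```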

  • an empirical comparison of model validation techniques for Defect Prediction models
    IEEE Transactions on Software Engineering, 2017
    Co-Authors: Chakkrit Tantithamthavorn, Shane Mcintosh, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Defect Prediction models help software quality assurance teams to allocate their limited resources to the most Defect-prone modules. Model validation techniques, such as k-fold cross-validation, use historical data to estimate how well a model will perform in the future. However, little is known about how accurate the estimates of model validation techniques tend to be. In this paper, we investigate the bias and variance of model validation techniques in the domain of Defect Prediction. Analysis of 101 public Defect datasets suggests that 77 percent of them are highly susceptible to producing unstable results; selecting an appropriate model validation technique is therefore a critical experimental design choice. Based on an analysis of 256 studies in the Defect Prediction literature, we select the 12 most commonly adopted model validation techniques for evaluation. Through a case study of 18 systems, we find that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques. On the other hand, out-of-sample bootstrap validation yields the best balance between the bias and variance of estimates in the context of our study. Therefore, we recommend that future Defect Prediction studies avoid single-repetition holdout validation, and instead use out-of-sample bootstrap validation.
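
    A sketch of out-of-sample bootstrap validation under stated assumptions (scikit-learn, logistic regression as a placeholder classifier, AUC as the performance measure):

    ```python
    # Each iteration trains on a bootstrap sample (drawn with replacement) and
    # evaluates on the out-of-bag rows that the sample did not include.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    def out_of_sample_bootstrap(X, y, n_iterations=100, seed=0):
        rng = np.random.default_rng(seed)
        n, scores = len(y), []
        for _ in range(n_iterations):
            boot = rng.integers(0, n, size=n)            # training indices, with replacement
            oob = np.setdiff1d(np.arange(n), boot)       # unseen (out-of-bag) indices
            if len(np.unique(y[boot])) < 2 or len(np.unique(y[oob])) < 2:
                continue                                 # skip degenerate resamples
            clf = LogisticRegression(max_iter=1000).fit(X[boot], y[boot])
            scores.append(roc_auc_score(y[oob], clf.predict_proba(X[oob])[:, 1]))
        return np.mean(scores), np.std(scores)           # performance estimate and its spread
    ```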

  • comments on researcher bias the use of machine learning in software Defect Prediction
    IEEE Transactions on Software Engineering, 2016
    Co-Authors: Chakkrit Tantithamthavorn, Shane Mcintosh, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Shepperd et al. find that the reported performance of a Defect Prediction model shares a strong relationship with the group of researchers who construct the models. In this paper, we perform an alternative investigation of Shepperd et al.'s data. We observe that (a) research group shares a strong association with other explanatory variables (i.e., the dataset and metric families that are used to build a model); (b) the strong association among these explanatory variables makes it difficult to discern the impact of the research group on model performance; and (c) after mitigating the impact of this strong association, we find that the research group has a smaller impact than the metric family. These observations lead us to conclude that the relationship between the research group and the performance of a Defect Prediction model is more likely due to the tendency of researchers to reuse experimental components (e.g., datasets and metrics). We recommend that researchers experiment with a broader selection of datasets and metrics to combat any potential bias in their results.
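
    An illustrative check, not the paper's exact analysis: Cramér's V (via a chi-square test) quantifies how strongly two categorical explanatory variables, such as research group and dataset family, are associated; the metadata rows below are hypothetical.

    ```python
    import numpy as np
    import pandas as pd
    from scipy.stats import chi2_contingency

    df = pd.DataFrame({
        "research_group": ["G1", "G1", "G2", "G2", "G3", "G3"],
        "dataset_family": ["NASA", "NASA", "Eclipse", "Eclipse", "NASA", "NASA"],
    })

    table = pd.crosstab(df["research_group"], df["dataset_family"])
    chi2, _, _, _ = chi2_contingency(table)
    n = table.to_numpy().sum()
    cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
    print(f"Cramer's V = {cramers_v:.2f}")  # values near 1 signal a strong association,
                                            # making the variables' impacts hard to separate
    ```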

  • studying just in time Defect Prediction using cross project models
    Empirical Software Engineering, 2016
    Co-Authors: Yasutaka Kamei, Shane Mcintosh, Takafumi Fukushima, Kazuhiro Yamashita, Naoyasu Ubayashi, Ahmed E Hassan
    Abstract:

    Unlike traditional Defect Prediction models that identify Defect-prone modules, Just-In-Time (JIT) Defect Prediction models identify Defect-inducing changes. As such, JIT Defect models can provide earlier feedback for developers, while design decisions are still fresh in their minds. Unfortunately, similar to traditional Defect models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this limitation in traditional Defect Prediction, prior work has proposed cross-project models, i.e., models learned from other projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT Prediction. Therefore, in this study, we empirically evaluate the performance of JIT models in a cross-project context. Through an empirical study on 11 open source projects, we find that while JIT models rarely perform well in a cross-project context, their performance tends to improve when using approaches that: (1) select models trained using other projects that are similar to the testing project, (2) combine the data of several other projects to produce a larger pool of training data, and (3) combine the models of several other projects to produce an ensemble model. Our findings empirically confirm that JIT models learned using other projects are a viable solution for projects with limited historical data. However, JIT models tend to perform best in a cross-project context when the data used to learn them are carefully selected.
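
    A minimal sketch of the third strategy listed above (an ensemble over models learned from other projects), assuming scikit-learn and hypothetical per-project change-level feature matrices:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def cross_project_ensemble(training_projects, X_target):
        """training_projects: list of (X, y) pairs of change metrics from other projects."""
        models = [LogisticRegression(max_iter=1000).fit(X, y) for X, y in training_projects]
        # Average each project model's probability that a change is defect-inducing.
        return np.mean([m.predict_proba(X_target)[:, 1] for m in models], axis=0)
    ```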

Shane Mcintosh - One of the best experts on this subject based on the ideXlab platform.

  • the use of summation to aggregate software metrics hinders the performance of Defect Prediction models
    IEEE Transactions on Software Engineering, 2017
    Co-Authors: Feng Zhang, Shane Mcintosh, Ahmed E Hassan, Ying Zou
    Abstract:

    Defect Prediction models help software organizations to anticipate where Defects will appear in the future. When training a Defect Prediction model, historical Defect data is often mined from a Version Control System (VCS, e.g., Subversion), which records software changes at the file level. Software metrics, on the other hand, are often calculated at the class or method level (e.g., McCabe's Cyclomatic Complexity). To address this disagreement in granularity, the class- and method-level software metrics are aggregated to the file level, often using summation (i.e., the McCabe value of a file is the sum of the McCabe values of all methods within the file). A recent study shows that summation significantly inflates the correlation between lines of code (Sloc) and cyclomatic complexity (Cc) in Java projects. While there are many other aggregation schemes (e.g., central tendency, dispersion), they have remained unexplored in the scope of Defect Prediction. In this study, we set out to investigate how different aggregation schemes impact Defect Prediction models. Through an analysis of 11 aggregation schemes using data collected from 255 open source projects, we find that: (1) aggregation schemes can significantly alter correlations among metrics, as well as the correlations between metrics and the Defect count; (2) when constructing models to predict Defect proneness, applying only the summation scheme (i.e., the most commonly used aggregation scheme in the literature) achieves the best performance (the best among the 12 studied configurations) in only 11 percent of the studied projects, while applying all of the studied aggregation schemes achieves the best performance in 40 percent of the studied projects; (3) when constructing models to predict Defect rank or count, applying only summation and applying all of the studied aggregation schemes achieve similar performance, with both coming closest to the best performance more often than the other studied aggregation schemes; and (4) when constructing models for effort-aware Defect Prediction, the mean or median aggregation schemes yield performance values that are significantly closer to the best performance than any of the other studied aggregation schemes. Broadly speaking, the performance of Defect Prediction models is often underestimated due to our community's tendency to use only the summation aggregation scheme. Given the potential benefit of applying additional aggregation schemes, we advise that future Defect Prediction studies explore a variety of aggregation schemes.

  • an empirical comparison of model validation techniques for Defect Prediction models
    IEEE Transactions on Software Engineering, 2017
    Co-Authors: Chakkrit Tantithamthavorn, Shane Mcintosh, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Defect Prediction models help software quality assurance teams to allocate their limited resources to the most Defect-prone modules. Model validation techniques, such as k-fold cross-validation, use historical data to estimate how well a model will perform in the future. However, little is known about how accurate the estimates of model validation techniques tend to be. In this paper, we investigate the bias and variance of model validation techniques in the domain of Defect Prediction. Analysis of 101 public Defect datasets suggests that 77 percent of them are highly susceptible to producing unstable results; selecting an appropriate model validation technique is therefore a critical experimental design choice. Based on an analysis of 256 studies in the Defect Prediction literature, we select the 12 most commonly adopted model validation techniques for evaluation. Through a case study of 18 systems, we find that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques. On the other hand, out-of-sample bootstrap validation yields the best balance between the bias and variance of estimates in the context of our study. Therefore, we recommend that future Defect Prediction studies avoid single-repetition holdout validation, and instead use out-of-sample bootstrap validation.

  • comments on researcher bias the use of machine learning in software Defect Prediction
    IEEE Transactions on Software Engineering, 2016
    Co-Authors: Chakkrit Tantithamthavorn, Shane Mcintosh, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Shepperd et al. find that the reported performance of a Defect Prediction model shares a strong relationship with the group of researchers who construct the models. In this paper, we perform an alternative investigation of Shepperd et al.'s data. We observe that (a) research group shares a strong association with other explanatory variables (i.e., the dataset and metric families that are used to build a model); (b) the strong association among these explanatory variables makes it difficult to discern the impact of the research group on model performance; and (c) after mitigating the impact of this strong association, we find that the research group has a smaller impact than the metric family. These observations lead us to conclude that the relationship between the research group and the performance of a Defect Prediction model is more likely due to the tendency of researchers to reuse experimental components (e.g., datasets and metrics). We recommend that researchers experiment with a broader selection of datasets and metrics to combat any potential bias in their results.

  • studying just in time Defect Prediction using cross project models
    Empirical Software Engineering, 2016
    Co-Authors: Yasutaka Kamei, Shane Mcintosh, Takafumi Fukushima, Kazuhiro Yamashita, Naoyasu Ubayashi, Ahmed E Hassan
    Abstract:

    Unlike traditional Defect Prediction models that identify Defect-prone modules, Just-In-Time (JIT) Defect Prediction models identify Defect-inducing changes. As such, JIT Defect models can provide earlier feedback for developers, while design decisions are still fresh in their minds. Unfortunately, similar to traditional Defect models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this limitation in traditional Defect Prediction, prior work has proposed cross-project models, i.e., models learned from other projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT Prediction. Therefore, in this study, we empirically evaluate the performance of JIT models in a cross-project context. Through an empirical study on 11 open source projects, we find that while JIT models rarely perform well in a cross-project context, their performance tends to improve when using approaches that: (1) select models trained using other projects that are similar to the testing project, (2) combine the data of several other projects to produce a larger pool of training data, and (3) combine the models of several other projects to produce an ensemble model. Our findings empirically confirm that JIT models learned using other projects are a viable solution for projects with limited historical data. However, JIT models tend to perform best in a cross-project context when the data used to learn them are carefully selected.

  • automated parameter optimization of classification techniques for Defect Prediction models
    International Conference on Software Engineering, 2016
    Co-Authors: Chakkrit Tantithamthavorn, Shane Mcintosh, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Defect Prediction models are classifiers that are trained to identify Defect-prone software modules. Such classifiers have configurable parameters that control their characteristics (e.g., the number of trees in a random forest classifier). Recent studies show that these classifiers may underperform due to the use of suboptimal default parameter settings. However, it is impractical to assess all of the possible settings in the parameter spaces. In this paper, we investigate the performance of Defect Prediction models where Caret, an automated parameter optimization technique, has been applied. Through a case study of 18 datasets from systems that span both proprietary and open source domains, we find that (1) Caret improves the AUC performance of Defect Prediction models by as much as 40 percentage points; (2) Caret-optimized classifiers are at least as stable as (with 35% of them being more stable than) classifiers that are trained using the default settings; and (3) Caret increases the likelihood of producing a top-performing classifier by as much as 83%. Hence, we conclude that parameter settings can indeed have a large impact on the performance of Defect Prediction models, suggesting that researchers should experiment with the parameters of the classification techniques. Since automated parameter optimization techniques like Caret yield substantial benefits in terms of performance improvement and stability, while incurring a manageable additional computational cost, they should be included in future Defect Prediction studies.
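
    Caret itself is an R package; the sketch below shows an analogous automated parameter search in Python with scikit-learn (a grid search over random forest settings, scored by AUC). The grid values and the toy dataset are assumptions for illustration, not the study's configuration.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=300, weights=[0.85], random_state=0)  # imbalanced toy data

    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={
            "n_estimators": [50, 100, 500],          # number of trees, a key RF parameter
            "max_features": ["sqrt", "log2", None],
        },
        scoring="roc_auc",                           # AUC, the measure the study reports gains on
        cv=10,
    )
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))
    ```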

Kenichi Matsumoto - One of the best experts on this subject based on the ideXlab platform.

  • the impact of class rebalancing techniques on the performance and interpretation of Defect Prediction models
    arXiv: Software Engineering, 2018
    Co-Authors: Chakkrit Tantithamthavorn, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Defect Prediction models that are trained on class-imbalanced datasets (i.e., the proportions of Defective and clean modules are not equally represented) are highly susceptible to producing inaccurate Prediction models. Prior research compares the impact of class rebalancing techniques on the performance of Defect Prediction models, but arrives at contradictory conclusions due to different choices of datasets, classification techniques, and performance measures. Such contradictory conclusions make it hard to derive practical guidelines for whether class rebalancing techniques should be applied in the context of Defect Prediction models. In this paper, we investigate the impact of 4 popularly-used class rebalancing techniques on 10 commonly-used performance measures and on the interpretation of Defect Prediction models. We also construct statistical models to better understand in which experimental design settings class rebalancing techniques are beneficial for Defect Prediction models. Through a case study of 101 datasets that span proprietary and open-source systems, we recommend that class rebalancing techniques are necessary when quality assurance teams wish to increase the completeness of identifying software Defects (i.e., Recall). However, class rebalancing techniques should be avoided when interpreting Defect Prediction models. We also find that class rebalancing techniques do not impact the AUC measure. Hence, AUC should be used as a standard measure when comparing Defect Prediction models.

  • an empirical comparison of model validation techniques for Defect Prediction models
    IEEE Transactions on Software Engineering, 2017
    Co-Authors: Chakkrit Tantithamthavorn, Shane Mcintosh, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Defect Prediction models help software quality assurance teams to allocate their limited resources to the most Defect-prone modules. Model validation techniques, such as k-fold cross-validation, use historical data to estimate how well a model will perform in the future. However, little is known about how accurate the estimates of model validation techniques tend to be. In this paper, we investigate the bias and variance of model validation techniques in the domain of Defect Prediction. Analysis of 101 public Defect datasets suggests that 77 percent of them are highly susceptible to producing unstable results; selecting an appropriate model validation technique is therefore a critical experimental design choice. Based on an analysis of 256 studies in the Defect Prediction literature, we select the 12 most commonly adopted model validation techniques for evaluation. Through a case study of 18 systems, we find that single-repetition holdout validation tends to produce estimates with 46-229 percent more bias and 53-863 percent more variance than the top-ranked model validation techniques. On the other hand, out-of-sample bootstrap validation yields the best balance between the bias and variance of estimates in the context of our study. Therefore, we recommend that future Defect Prediction studies avoid single-repetition holdout validation, and instead use out-of-sample bootstrap validation.

  • comments on researcher bias the use of machine learning in software Defect Prediction
    IEEE Transactions on Software Engineering, 2016
    Co-Authors: Chakkrit Tantithamthavorn, Shane Mcintosh, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Shepperd et al. find that the reported performance of a Defect Prediction model shares a strong relationship with the group of researchers who construct the models. In this paper, we perform an alternative investigation of Shepperd et al.'s data. We observe that (a) research group shares a strong association with other explanatory variables (i.e., the dataset and metric families that are used to build a model); (b) the strong association among these explanatory variables makes it difficult to discern the impact of the research group on model performance; and (c) after mitigating the impact of this strong association, we find that the research group has a smaller impact than the metric family. These observations lead us to conclude that the relationship between the research group and the performance of a Defect Prediction model is more likely due to the tendency of researchers to reuse experimental components (e.g., datasets and metrics). We recommend that researchers experiment with a broader selection of datasets and metrics to combat any potential bias in their results.

  • automated parameter optimization of classification techniques for Defect Prediction models
    International Conference on Software Engineering, 2016
    Co-Authors: Chakkrit Tantithamthavorn, Shane Mcintosh, Ahmed E Hassan, Kenichi Matsumoto
    Abstract:

    Defect Prediction models are classifiers that are trained to identify Defect-prone software modules. Such classifiers have configurable parameters that control their characteristics (e.g., the number of trees in a random forest classifier). Recent studies show that these classifiers may underperform due to the use of suboptimal default parameter settings. However, it is impractical to assess all of the possible settings in the parameter spaces. In this paper, we investigate the performance of Defect Prediction models where Caret, an automated parameter optimization technique, has been applied. Through a case study of 18 datasets from systems that span both proprietary and open source domains, we find that (1) Caret improves the AUC performance of Defect Prediction models by as much as 40 percentage points; (2) Caret-optimized classifiers are at least as stable as (with 35% of them being more stable than) classifiers that are trained using the default settings; and (3) Caret increases the likelihood of producing a top-performing classifier by as much as 83%. Hence, we conclude that parameter settings can indeed have a large impact on the performance of Defect Prediction models, suggesting that researchers should experiment with the parameters of the classification techniques. Since automated parameter optimization techniques like Caret yield substantial benefits in terms of performance improvement and stability, while incurring a manageable additional computational cost, they should be included in future Defect Prediction studies.

Istvan Gergely Czibula - One of the best experts on this subject based on the ideXlab platform.

  • A novel approach for software Defect Prediction through hybridizing gradual relational association rules with artificial neural networks
    Information Sciences, 2018
    Co-Authors: Diana-lucia Miholca, Gabriela Czibula, Istvan Gergely Czibula
    Abstract:

    The growing complexity of software projects requires increasing consideration of their analysis and testing. Identifying Defective software entities is essential for software quality assurance, and it also improves activities related to software testing. In this study, we developed a novel supervised classification method called HyGRAR for software Defect Prediction. HyGRAR is a non-linear hybrid model that combines gradual relational association rule mining and artificial neural networks to discriminate between Defective and non-Defective software entities. Experiments performed on 10 open-source datasets demonstrated the excellent performance of the HyGRAR classifier. HyGRAR performed better than most of the previously proposed approaches for software Defect Prediction in performance evaluations using the same datasets.

  • software Defect Prediction using relational association rule mining
    Information Sciences, 2014
    Co-Authors: Gabriela Czibula, Zsuzsanna Marian, Istvan Gergely Czibula
    Abstract:

    This paper focuses on the problem of Defect Prediction, a problem of major importance during software maintenance and evolution. It is essential for software developers to identify Defective software modules in order to continuously improve the quality of a software system. As the conditions for a software module to have Defects are hard to identify, machine learning based classification models are still being developed to approach the problem of Defect Prediction. We propose a novel classification model based on relational association rule mining. Relational association rules are an extension of ordinal association rules, which are a particular type of association rules that describe numerical orderings between attributes that commonly occur over a dataset. Our classifier is based on the discovery of relational association rules for predicting whether a software module is or is not Defective. An experimental evaluation of the proposed model on the open source NASA datasets is provided, as well as a comparison to similar existing approaches. The obtained results show that our classifier outperforms, for most of the considered evaluation measures, the existing machine learning based techniques for Defect Prediction. This confirms the potential of our proposal.
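
    A heavily simplified sketch of the intuition only: mine pairwise ordering rules ("metric i <= metric j" holding for most rows of a class) separately for Defective and non-Defective training modules, then label a new module by whichever rule set it matches better. The real relational association rules are richer than the pairwise orderings assumed here, and the arrays named below are hypothetical.

    ```python
    import numpy as np

    def mine_ordering_rules(X, min_support=0.9):
        """Return (i, j) pairs where X[:, i] <= X[:, j] holds for at least min_support of rows."""
        rules = []
        for i in range(X.shape[1]):
            for j in range(X.shape[1]):
                if i != j and np.mean(X[:, i] <= X[:, j]) >= min_support:
                    rules.append((i, j))
        return rules

    def rule_score(x, rules):
        """Fraction of a class's rules that a single module's metric vector satisfies."""
        return np.mean([x[i] <= x[j] for i, j in rules]) if rules else 0.0

    # X_defective, X_clean, x_new are hypothetical numpy arrays of module metrics:
    # defect_rules, clean_rules = mine_ordering_rules(X_defective), mine_ordering_rules(X_clean)
    # label = "defective" if rule_score(x_new, defect_rules) > rule_score(x_new, clean_rules) else "clean"
    ```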

Xiao-yuan Jing - One of the best experts on this subject based on the ideXlab platform.

  • cross project and within project semisupervised software Defect Prediction a unified approach
    IEEE Transactions on Reliability, 2018
    Co-Authors: Xiao-yuan Jing, Ying Sun, Jing Sun, Lin Huang, Fangyi Cui, Yanfei Sun
    Abstract:

    When there is not enough historical Defect data to build an accurate Prediction model, semisupervised Defect Prediction (SSDP) and cross-project Defect Prediction (CPDP) are two feasible solutions. Existing CPDP methods assume that the available source data are well labeled. However, due to the expensive human effort required to label a large amount of Defect data, usually we can only utilize suitable unlabeled source data. We refer to CPDP in this scenario as cross-project semisupervised Defect Prediction (CSDP). Although some within-project semisupervised Defect Prediction (WSDP) methods have been developed in recent years, there is still much room for improvement in Prediction performance. In this paper, we aim to provide a unified and effective solution for both the CSDP and WSDP problems. We introduce the semisupervised dictionary learning technique and propose a cost-sensitive kernelized semisupervised dictionary learning (CKSDL) approach. CKSDL can make full use of the limited labeled Defect data and a large amount of unlabeled data in the kernel space. In addition, CKSDL considers misclassification costs in the dictionary learning process. Extensive experiments on 16 projects indicate that CKSDL outperforms state-of-the-art WSDP methods, that using unlabeled cross-project Defect data can help improve WSDP performance, and that CKSDL generally obtains significantly better Prediction performance than related SSDP methods in the CSDP scenario.

  • progress on approaches to software Defect Prediction
    IET Software, 2018
    Co-Authors: Xiao-yuan Jing, Xiaoke Zhu
    Abstract:

    Software Defect Prediction is one of the most popular research topics in software engineering. It aims to predict Defect-prone software modules before Defects are discovered, so it can be used to better prioritise software quality assurance effort. In recent years, and especially in the last three years, many new Defect Prediction studies have been proposed. The goal of this study is to comprehensively review, analyse and discuss the state of the art of Defect Prediction. The authors survey almost 70 representative Defect Prediction papers from recent years (January 2014-April 2017), most of which are published in prominent software engineering journals and top conferences. The selected Defect Prediction papers are summarised under four aspects: machine learning-based Prediction algorithms, manipulating the data, effort-aware Prediction, and empirical studies. The research community still faces a number of challenges in building methods, and many research opportunities exist. The identified challenges can give practical guidance to both software engineering researchers and practitioners in future software Defect Prediction.

  • label propagation based semi supervised learning for software Defect Prediction
    Automated Software Engineering, 2017
    Co-Authors: Zhi-wu Zhang, Xiao-yuan Jing, Tiejian Wang
    Abstract:

    Software Defect Prediction can automatically predict Defect-prone software modules for efficient software testing in software engineering. When the available Defect labels of modules are limited, predicting the Defect-prone modules becomes a challenging problem. In static software Defect Prediction, there is similarity among software modules (a software module can be approximated by a sparse representation of the other software modules) and there is a class-imbalance problem (the number of Defect-free modules is much larger than that of Defective ones). In this paper, we propose to use a graph-based semi-supervised learning technique to predict software Defects. By using a Laplacian score sampling strategy for the labeled Defect-free modules, we first construct a class-balanced labeled training dataset. Then, we use a nonnegative sparse algorithm to compute the nonnegative sparse weights of a relationship graph, which serve as clustering indicators. Lastly, on the nonnegative sparse graph, we use a label propagation algorithm to iteratively predict the labels of unlabeled software modules. We thus propose a nonnegative sparse graph based label propagation (NSGLP) approach for software Defect classification and Prediction, which uses not only a few labeled data but also abundant unlabeled data to improve the generalization capability. We vary the size of the labeled software modules from 10% to 30% of each dataset in the widely used NASA projects. Experimental results show that NSGLP outperforms several representative state-of-the-art semi-supervised software Defect Prediction methods, and that it can fully exploit the characteristics of static code metrics and improve the generalization capability of the software Defect Prediction model.
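
    A simplified analog under stated assumptions: scikit-learn's graph-based LabelSpreading stands in for the paper's nonnegative sparse graph, toy data stands in for the NASA projects, and unlabeled modules are marked with -1.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.semi_supervised import LabelSpreading
    from sklearn.metrics import f1_score

    X, y = make_classification(n_samples=400, weights=[0.85], random_state=0)
    rng = np.random.default_rng(0)
    unlabeled = rng.random(len(y)) > 0.2       # keep only ~20% of labels (the paper studies 10-30%)
    y_partial = y.copy()
    y_partial[unlabeled] = -1                  # -1 marks an unlabeled module

    model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
    pred = model.transduction_[unlabeled]      # labels propagated to the unlabeled modules
    print("F1 on unlabeled modules:", round(f1_score(y[unlabeled], pred), 3))
    ```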

  • Multiple kernel ensemble learning for software Defect Prediction
    Automated Software Engineering, 2015
    Co-Authors: Tiejian Wang, Zhi-wu Zhang, Xiao-yuan Jing, Liqiang Zhang
    Abstract:

    Software Defect Prediction aims to predict the Defect proneness of new software modules from historical Defect data so as to improve the quality of a software system. Software historical Defect data has a complicated structure and a marked characteristic of class imbalance; how to fully analyze and utilize the existing historical Defect data and build more precise and effective classifiers has attracted considerable interest from researchers in both academia and industry. Multiple kernel learning and ensemble learning are effective techniques in the field of machine learning. Multiple kernel learning can map the historical Defect data to a higher-dimensional feature space where they are better represented, and ensemble learning can use a series of weak classifiers to reduce the bias generated by the majority class and obtain better predictive performance. In this paper, we propose to use multiple kernel learning to predict software Defects. By using the characteristics of the metrics mined from the open source software, we obtain a multiple kernel classifier through an ensemble learning method, which has the advantages of both multiple kernel learning and ensemble learning. We thus propose a multiple kernel ensemble learning (MKEL) approach for software Defect classification and Prediction. Considering the cost of risk in software Defect Prediction, we design a new sample weight vector updating strategy to reduce the cost of risk caused by misclassifying Defective modules as non-Defective ones. We employ the widely used NASA MDP datasets as test data to evaluate the performance of all compared methods; experimental results show that MKEL outperforms several representative state-of-the-art Defect Prediction methods.
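
    A sketch of the multiple-kernel part only, under assumptions: three base kernels are combined with fixed, hand-picked weights and fed to an SVM with a precomputed kernel. MKEL additionally learns the combination inside a boosting-style ensemble with a cost-sensitive weight update, which is not reproduced here; the toy data is illustrative.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.metrics.pairwise import rbf_kernel, linear_kernel, polynomial_kernel
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, weights=[0.85], random_state=0)
    X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

    def composite_kernel(A, B, weights=(0.4, 0.3, 0.3)):   # fixed illustrative weights
        kernels = (rbf_kernel(A, B), linear_kernel(A, B), polynomial_kernel(A, B, degree=2))
        return sum(w * K for w, K in zip(weights, kernels))

    svm = SVC(kernel="precomputed").fit(composite_kernel(X_tr, X_tr), y_tr)
    print("accuracy:", round(svm.score(composite_kernel(X_te, X_tr), y_te), 3))
    ```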

  • dictionary learning based software Defect Prediction
    International Conference on Software Engineering, 2014
    Co-Authors: Xiao-yuan Jing, Shi Ying, Zhi-wu Zhang
    Abstract:

    In order to improve the quality of a software system, software Defect Prediction aims to automatically identify Defective software modules for efficient software testing. To predict software Defects, classification methods based on static code attributes have attracted a great deal of attention. In recent years, machine learning techniques have been applied to Defect Prediction. Because there is similarity among different software modules, one software module can be approximately represented by a small proportion of the other modules, and the representation coefficients over a pre-defined dictionary, which consists of historical software module data, are generally sparse. In this paper, we propose to use the dictionary learning technique to predict software Defects. By using the characteristics of the metrics mined from the open source software, we learn multiple dictionaries (a Defective-module sub-dictionary, a Defect-free-module sub-dictionary, and the total dictionary) and sparse representation coefficients. Moreover, we take the misclassification cost issue into account, because misclassifying Defective modules generally incurs a much higher risk cost than misclassifying Defect-free ones. We thus propose a cost-sensitive discriminative dictionary learning (CDDL) approach for software Defect classification and Prediction. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that CDDL outperforms several representative state-of-the-art Defect Prediction methods.
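
    A simplified sketch of the per-class dictionary idea, under assumptions: scikit-learn's MiniBatchDictionaryLearning learns one dictionary per class, and a module is labeled by cost-weighted reconstruction error. CDDL itself makes the dictionaries discriminative and cost-sensitive during learning; the toy data and the cost_ratio value below are hypothetical.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.decomposition import MiniBatchDictionaryLearning

    X, y = make_classification(n_samples=300, n_features=20, weights=[0.8], random_state=0)

    def reconstruction_error(dictionary_model, x):
        code = dictionary_model.transform(x.reshape(1, -1))
        return np.linalg.norm(x - code @ dictionary_model.components_)

    dict_clean = MiniBatchDictionaryLearning(n_components=8, random_state=0).fit(X[y == 0])
    dict_defect = MiniBatchDictionaryLearning(n_components=8, random_state=0).fit(X[y == 1])

    x_new = X[0]                       # pretend this is an unseen module's metric vector
    cost_ratio = 3.0                   # hypothetical penalty for missing a Defective module
    predicted_defective = (
        reconstruction_error(dict_defect, x_new)
        < cost_ratio * reconstruction_error(dict_clean, x_new)
    )
    print("defective" if predicted_defective else "clean")
    ```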