Enhancing Just-in-Time Defect Prediction Using Change Request-based Metrics

Date

2021-02

Publisher

Addis Ababa University

Abstract

Identifying defective software components as early as commit time helps to significantly reduce software development and maintenance costs. In recent years, several studies have proposed just-in-time (JIT) defect prediction techniques to identify changes that could introduce defects at check-in time. To predict defect-introducing changes, JIT defect prediction approaches use change metrics collected from software repositories. These change metrics, however, capture only code and code change related information. Information related to the change requests (e.g., clarity of the change request and difficulty of implementing the change) that could determine a change's proneness to introducing new defects has not been studied. In this study, we propose to augment the publicly available change metrics dataset with six change request-based metrics collected from issue tracking systems. To build the prediction models, we used five machine learning algorithms: AdaBoost, XGBoost, Deep Neural Network, Random Forest and Logistic Regression. The proposed approach is evaluated using a dataset collected from four open source software systems, i.e., Eclipse Platform, Eclipse JDT, Bugzilla and Mozilla. The results show that the augmented dataset improves the performance of JIT defect prediction in 19 out of 20 cases. The F1-score of JIT defect prediction in the four systems improves by an average of 4.8%, 3.4%, 1.7%, 1.1% and 1.1% when using AdaBoost, XGBoost, Deep Neural Network, Random Forest and Logistic Regression, respectively. Among the five algorithms used for building the machine learning models, AdaBoost is found to be the best algorithm for enhancing the performance of JIT defect prediction. To see which features contributed to the improvement, we computed feature importance using the best-performing algorithm, AdaBoost. The results show that number of comments (NC), Severity and number of developers assigned (NDA) are among the most important features in the entire augmented dataset.
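The augmentation step and the F1-score used for evaluation can be illustrated with a minimal Python sketch. It is not the thesis implementation: the record layout, the `issue_id` link, the code metric names (`lines_added`, `files_touched`) and all values are hypothetical. Only three of the six change request-based metrics (NC, Severity, NDA) are named in the abstract, so only those appear here.

```python
# Hypothetical change-level records; field names and values are illustrative only.
changes = [
    {"commit": "c1", "issue_id": 101, "lines_added": 120, "files_touched": 4, "defective": 1},
    {"commit": "c2", "issue_id": 102, "lines_added": 8, "files_touched": 1, "defective": 0},
    {"commit": "c3", "issue_id": None, "lines_added": 35, "files_touched": 2, "defective": 0},
]

# Hypothetical change request metrics from an issue tracker, keyed by issue id.
requests = {
    101: {"NC": 14, "Severity": 4, "NDA": 3},
    102: {"NC": 2, "Severity": 2, "NDA": 1},
}

# The three change request-based metrics named in the abstract.
REQUEST_FIELDS = ("NC", "Severity", "NDA")


def augment(changes, requests):
    """Attach change request metrics to each change record.

    Changes without a linked issue get None for every request metric
    (an assumption; the abstract does not say how such cases are handled).
    """
    rows = []
    for change in changes:
        linked = requests.get(change["issue_id"], {})
        row = dict(change)  # copy so the input records are left untouched
        for field in REQUEST_FIELDS:
            row[field] = linked.get(field)
        rows.append(row)
    return rows


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive (defective) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


augmented = augment(changes, requests)
```

In a full pipeline, the augmented rows would be fed to a classifier such as AdaBoost, and `f1_score` would be compared between models trained with and without the change request columns.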

Keywords

Just-In-Time software defect prediction, Software Defect Prediction, Software Bugs, Issue Tracking Systems, Software Metrics
