Enhancing Just-in-Time Defect Prediction Using Change Request-based Metrics
Date
2021-02
Publisher
Addis Ababa University
Abstract
Identifying defective software components as early as commit time helps to significantly reduce software
development and maintenance costs. In recent years, several studies have proposed using just-in-time (JIT)
defect prediction techniques to identify changes that could introduce defects at check-in time. To
predict defect introducing changes, JIT defect prediction approaches use change metrics collected
from software repositories. These change metrics, however, capture only code and code-change related
information. Information related to the change requests (e.g., clarity of the change request and difficulty
of implementing the change) that could determine a change's proneness to introducing new defects has
not been studied. In this study, we propose to augment the publicly available change metrics dataset with
six change request-based metrics collected from issue tracking systems. To build the prediction model,
we used five machine learning algorithms: AdaBoost, XGBoost, Deep Neural Network, Random
Forest and Logistic Regression. The proposed approach is evaluated using a dataset collected from
four open source software systems, i.e., Eclipse platform, Eclipse JDT, Bugzilla and Mozilla. The
results show that the augmented dataset improves the performance of JIT defect prediction in 19
out of 20 cases. The F1-score of JIT defect prediction in the four systems improved by an average of
4.8%, 3.4%, 1.7%, 1.1% and 1.1% when using AdaBoost, XGBoost, Deep Neural Network, Random
Forest and Logistic Regression, respectively. Among the five algorithms used for building the
machine learning models, AdaBoost was found to be the best algorithm for enhancing the performance
of JIT defect prediction. To see which features contributed to the improvement of JIT defect
prediction, we computed feature importance using the best performing algorithm, AdaBoost. The
results show that the number of comments (NC), severity and the number of developers assigned (NDA) are
among the most important features in the entire augmented dataset.
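
The F1 comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's evaluation code: the labels and predictions below are made-up toy values, the function names f1_score and mean_f1_gain are hypothetical, and the sketch assumes the reported improvement is an absolute difference in F1 averaged over systems, which the abstract does not state.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = defect-introducing change)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1_gain(baseline: Sequence[float], augmented: Sequence[float]) -> float:
    """Average of (augmented F1 - baseline F1) over paired systems.

    Assumption: the gain is an absolute difference, not a relative one.
    """
    if len(baseline) != len(augmented) or not baseline:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a - b for a, b in zip(augmented, baseline)) / len(baseline)


# Toy data, invented for illustration only (not from the thesis dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
pred_baseline = [1, 0, 0, 1, 1, 0, 0, 0]   # model on change metrics only
pred_augmented = [1, 0, 1, 1, 1, 0, 0, 0]  # model with change request metrics

f1_base = f1_score(y_true, pred_baseline)
f1_aug = f1_score(y_true, pred_augmented)
print(f"baseline F1={f1_base:.3f}, augmented F1={f1_aug:.3f}")
```

Repeating this per system and passing the two lists of F1 values to mean_f1_gain gives one average per algorithm, in the spirit of the per-algorithm averages reported above.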
Keywords
Just-In-Time software defect prediction, Software Defect Prediction, Software Bugs, Issue Tracking Systems, Software Metrics