TY - JOUR
T1 - Finding the best learning to rank algorithms for effort-aware defect prediction
AU - Yu, Xiao
AU - Dai, Heng
AU - Li, Li
AU - Gu, Xiaodong
AU - Keung, Jacky Wai
AU - Bennin, Kwabena Ebo
AU - Li, Fuyang
AU - Liu, Jin
PY - 2023/5
Y1 - 2023/5
N2 - Context: Effort-Aware Defect Prediction (EADP) ranks software modules or changes based on their predicted number of defects (i.e., treating modules or changes as effort) or defect density (i.e., treating LOC as effort) using learning to rank algorithms. Ranking instability refers to the inconsistent conclusions produced by existing empirical studies of EADP. The major reason is poor experimental design, such as the comparison of only a few learning to rank algorithms, the use of a small number of datasets or of datasets without the number of defects indicated, and evaluation with inappropriate or too few metrics. Objective: To obtain a stable ranking of learning to rank algorithms and identify the best ones for EADP. Method: We examine the practical effects of 34 learning to rank algorithms on 49 datasets for EADP. We measure the performance of these algorithms using 7 module-based and 7 LOC-based metrics, and run experiments under both cross-release and cross-project settings. Finally, we rank the algorithms by performing the Scott-Knott ESD test. Results: (1) When module is used as effort, among the studied learning to rank algorithms, random forest regression performs the best under the cross-release setting and linear regression performs the best under the cross-project setting; (2) when LOC is used as effort, LTR-linear (Learning-to-Rank with the linear model) performs the best under the cross-release setting and Ranking SVM performs the best under the cross-project setting. Conclusion: This comprehensive experimental procedure allows us to discover a stable ranking of the studied algorithms and to select the best ones according to the requirements of software projects.
AB - Context: Effort-Aware Defect Prediction (EADP) ranks software modules or changes based on their predicted number of defects (i.e., treating modules or changes as effort) or defect density (i.e., treating LOC as effort) using learning to rank algorithms. Ranking instability refers to the inconsistent conclusions produced by existing empirical studies of EADP. The major reason is poor experimental design, such as the comparison of only a few learning to rank algorithms, the use of a small number of datasets or of datasets without the number of defects indicated, and evaluation with inappropriate or too few metrics. Objective: To obtain a stable ranking of learning to rank algorithms and identify the best ones for EADP. Method: We examine the practical effects of 34 learning to rank algorithms on 49 datasets for EADP. We measure the performance of these algorithms using 7 module-based and 7 LOC-based metrics, and run experiments under both cross-release and cross-project settings. Finally, we rank the algorithms by performing the Scott-Knott ESD test. Results: (1) When module is used as effort, among the studied learning to rank algorithms, random forest regression performs the best under the cross-release setting and linear regression performs the best under the cross-project setting; (2) when LOC is used as effort, LTR-linear (Learning-to-Rank with the linear model) performs the best under the cross-release setting and Ranking SVM performs the best under the cross-project setting. Conclusion: This comprehensive experimental procedure allows us to discover a stable ranking of the studied algorithms and to select the best ones according to the requirements of software projects.
KW - Empirical study
KW - Learning to rank
KW - Ranking instability
KW - Software defect prediction
U2 - 10.1016/j.infsof.2023.107165
DO - 10.1016/j.infsof.2023.107165
M3 - Article
AN - SCOPUS:85147546045
SN - 0950-5849
VL - 157
JO - Information and Software Technology
JF - Information and Software Technology
M1 - 107165
ER -