A Cost-Effectiveness Metric for Association Rule Mining in Software Defect Prediction

DOI: https://doi.org/10.21203/rs.3.rs-1988568/v1

Abstract

This paper proposes a cost-effectiveness metric for association rule mining suited to software defect prediction, where the conditions that characterize defective modules are expressed as association rules. Given a certain amount of test effort (or a number of test cases), the proposed metric is the expected number of defects to be discovered in the modules that satisfy an association rule. Since the amount of test effort is generally limited and full testing of all modules is ineffective, the proposed metric is useful for focusing limited test effort on the most cost-effective set of modules. The metric is defined based on the exponential Software Reliability Growth Model (SRGM) extended with a module size parameter, assuming that a larger module requires more effort to discover its defects. To evaluate the effectiveness of the proposed metric, association rules were extracted and prioritized by the metric using data sets from four open-source software projects. The LOC-based cumulative-lift chart, which is often used to evaluate the cost-effectiveness of defect prediction, shows that the proposed metric selects rules that discover more defects than those selected by the conventional association rule metrics, confidence and odds ratio.
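
The abstract does not state the formula, so the following is only a minimal sketch of the idea, assuming the common exponential SRGM form m(t) = a(1 - exp(-b t)) with the detection rate divided by module size. The function names, the parameter b, the size scaling, and the sample rules are all hypothetical and are not taken from the paper.

```python
import math


def expected_defects(total_defects, effort, size, b=1.0):
    """Expected defects found with `effort` under an exponential SRGM.

    ASSUMPTION: the detection rate is scaled by 1/size, so a larger
    module needs more effort to reveal the same share of its defects.
    This is an illustrative form, not the paper's definition.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return total_defects * (1.0 - math.exp(-b * effort / size))


def rank_rules(rules, effort, b=1.0):
    """Sort (name, defects, loc) tuples by expected defects, best first.

    Each tuple describes the modules matched by one hypothetical rule:
    their total defect count and their total lines of code.
    """
    return sorted(
        rules,
        key=lambda r: expected_defects(r[1], effort, r[2], b),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical rules: same defect count, very different sizes.
    sample = [("rule_A", 5, 1000), ("rule_B", 5, 100)]
    for name, defects, loc in rank_rules(sample, effort=100):
        print(name, round(expected_defects(defects, 100, loc), 3))
```

Under this assumed form, two rules covering the same number of defects are ranked differently when the modules they match differ in size, which is the behavior the abstract attributes to the size-extended model.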

Full Text

This preprint is available for download as a PDF.