A quantitative-index approach for comparing the performance of disease prediction studies


 Background and aim Comparing the performance of databases is challenging, especially when they present different characteristics, both qualitative and quantitative. The primary aim of this paper is to introduce a quantitative method for comparing the performance of disease prediction studies using measurable parameters. Results The proposed method uses two parameters, the accuracy and the number of attributes, to compute a performance index. We outline the major obstacles to constructing a reliable comparative approach, and a demonstrative case on the prediction of heart disease is given to validate the effectiveness of the proposed method. Conclusions Comparing the performance of databases used in computer-aided diagnosis should rest on an objective quantitative basis; the index presented here achieves this goal and overcomes the variability found across databases. The presented method should constitute a useful tool for normalizing the comparison of database performances, hence allowing a more transparent and better-evidenced view of database qualities.


Introduction
The integration of machine learning (ML) in the medical field is helping to improve health care delivery and its quality [1]. ML techniques are evolving towards solving real health problems, among which we find the detection and prediction of diseases such as diabetes, heart disease, cancer, liver disease, and brain diseases [2][3][4][5][6]. The ML techniques usually include neural networks, support vector machines, Naïve Bayes, decision trees, and genetic algorithms. To evaluate a predictive model, researchers compare their achieved results with other studies in the literature using applied metrics; typically, the most commonly employed metrics are those based on the confusion matrix [7]. However, comparing methods in a measurable and informative way is very complex, since each study uses different techniques under different conditions with different parameters, whereas a reasonable comparison requires homogeneity across studies.
Hence, this paper aims to provide a measurable, quantitative method for performing a reasonable comparison of the methods used, so that researchers can evaluate whether their proposed models are reliable compared with other studies; such an approach minimizes biases. Special consideration is given to the scientific literature on the prediction of heart disease to appraise the proposed method using several parameters.
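As a concrete illustration of the confusion-matrix metrics referred to above, the following is a minimal sketch (not from the paper itself) computing the four standard indices for a binary classifier from its true/false positive and negative counts; the count values are invented for the example.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics derived from the confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    return accuracy, sensitivity, specificity, precision

# Hypothetical counts for a heart-disease classifier evaluated on 100 subjects
acc, sen, spe, pre = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts the accuracy is 0.85, the sensitivity 0.80, and the specificity 0.90, showing how a single model yields several complementary figures of merit.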

Methods
The assessment of prediction performance is based on the analysis of the confusion matrix. For a binary classification problem, the matrix is indexed on one axis by the actual class of an object and on the other by the predicted class. Various well-known measurement indices can be derived from it, such as sensitivity, specificity, accuracy, and precision [8]. These measurements describe the achieved results in order to evaluate the predictive model. Generally, researchers compare their results with other studies in the literature using accuracy, the best-known and most widely used metric. In fact, each model depends on several parameters, such as the ML algorithm, the number of output classes, the number of subjects, and the number of attributes used; it is therefore useful to introduce a model of comparison that accounts for situations in which more than one of these parameters differs between studies. Thus, we suggest a new performance metric for reliable comparison, defined as the ratio of the achieved accuracy to the number of used attributes:

ρᵢ = Aᵢ / Nᵢ (1)

where Aᵢ is the accuracy achieved by study i and Nᵢ is the number of attributes it used. The introduced index enhances the comparison of methods by considering both the accuracy and the number of attributes in a single ratio.
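The performance index defined above can be sketched in a few lines of code. The study names and their accuracy/attribute values below are hypothetical placeholders, chosen only to show how the index reorders a comparison that raw accuracy alone would decide differently.

```python
def performance_index(accuracy, n_attributes):
    """Performance index rho_i = accuracy / number of used attributes (Eq. 1)."""
    return accuracy / n_attributes

# Hypothetical studies: (accuracy in %, number of attributes used)
studies = {"Study A": (90.0, 13), "Study B": (85.0, 8)}

ranked = sorted(studies.items(),
                key=lambda kv: performance_index(*kv[1]),
                reverse=True)
# Study A: rho = 90.0 / 13 ~ 6.92; Study B: rho = 85.0 / 8 = 10.625
# Study B ranks first despite its lower raw accuracy, because it
# achieves that accuracy with fewer attributes.
```

The design choice here is deliberate: normalizing by the attribute count rewards models that extract more predictive value per attribute, which is the comparison the index is meant to enable.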

Discussion
We tested the suggested comparison methodology described above on the prediction of heart disease. The parameters of the reviewed studies are summarized in Table 1 [9][10][11][12][13][14][15]. In the literature, studies employ different data sets, which affects the objectivity and measurability of the comparison process. In addition, even when the same data set is used, the number of attributes varies from one study to another. Indeed, the number of attributes remains a critical parameter for the prediction of heart disease, since it is related to the risk factors for developing the disease. In this case, the well-known University of California Irvine (UCI) heart disease data set, which contains 76 attributes in total, was used, and each study selected a different subset of those 76 attributes. We therefore used the performance index ρᵢ in our suggested method to improve the performance comparison. The results of the comparison using the introduced index are shown in Figure 1. The findings show a significant correlation between the accuracy and the number of attributes, illustrating the interest of the metric index and demonstrating the validity of the proposed method, which takes the form of a measurable performance designed specifically for this application.
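The correlation reported above between accuracy and attribute count can be checked with a standard Pearson coefficient. The following is a self-contained sketch; the (accuracy, attribute-count) pairs are invented stand-ins, not the values of the reviewed studies in Table 1.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (accuracy %, number of attributes) pairs for illustration
accuracies = [83.0, 85.5, 88.0, 90.1, 92.4]
attributes = [8, 10, 11, 13, 14]

r = pearson(accuracies, attributes)  # close to 1 for these near-linear data
```

A strongly positive r would indicate that, across studies, higher accuracy tends to come with more attributes, which is precisely the confound the index ρᵢ is designed to correct for.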

Conclusion
We introduced a new quantitative metric standard to evaluate the performance of a predictive model. The suggested approach using the performance index ρᵢ is reliable and repeatable, allowing the performance to be quantified and a new model to be compared with models reported in the literature. This proposed method advances the research methodology and helps researchers evaluate their models in a more quantitative and reliable manner.

Table 1. Summary of reviewed studies for the prediction of heart disease