A Comprehensive Index (CIB) Combining Consistency in Both Case-Control and Cohort Studies to Determine the Efficacy of a Biomarker

Background: This study aimed to describe biomarkers using the comprehensive index of biomarker (CIB), which is based on the consistency rate in both case-control studies (Youden index, Yen) and cohort studies (Crc), to determine the efficacy of a biomarker. Methods: The CIB is the geometric mean of Yen and Crc. Simulated data were generated to examine how CIB relates to sensitivity (Sen), specificity (Spe), and receiver operating characteristic (ROC) analysis for biomarkers. Results: CIB was found to be related to the results of the ROC analysis. For biomarkers with the same Yen, a higher Spe could indicate better diagnostic power and a higher Sen could indicate better joint action. Although Yen is the common index used to evaluate the effectiveness of a biomarker, the Yen value was significantly larger than the CIB value under moderate Spe, showing overestimation. Conclusion: The CIB, which combines consistency in both case-control and cohort studies, could be more reasonable. The CIB could provide a better understanding of the power of a biomarker and would be better at evaluating biomarkers from new systems or concepts.


Background
One of the main purposes of seeking biomarkers is the diagnosis of a disease. However, most studies to identify biomarkers use case-control studies rather than cohort studies [1][2][3]. In case-control studies, the potential relationship between a biomarker and the disease is examined by comparing the frequencies of this biomarker in diseased and non-diseased subjects. The efficacy of a biomarker is normally described in terms of changes in consistency (Youden index, Yen) [2][3][4].
In a cohort study, a suspected biomarker should be considered an exposure factor, and the exposed and unexposed subjects should be observed until they develop the disease. This type of research design is chronologically consistent in that we diagnose the disease from the biomarker; therefore, a cohort study also has a stronger ability to test a biomarker [5][6][7].
In cohort studies, the difference in a disease's incidence between an exposed and a non-exposed group, which is also the consistency rate in a cohort study (Crc), indicates the role of the observed factor in the disease's pathogenesis [8][9][10]. A definite relationship between the results of a case-control study and a cohort study is as follows [11]: Pe = m × Pd / [m × Pd + (1 - m) × Pc], Pn = m × (1 - Pd) / [m × (1 - Pd) + (1 - m) × (1 - Pc)], and Crc = Pe - Pn, where Pe and Pn represent the disease's incidence in the exposed and non-exposed (biomarker) groups, respectively; Pd and Pc represent the frequencies of the observation factor (biomarker) in the disease group and in the control group, respectively, in the case-control study; "m" represents the incidence in the total population (generally, the disease is an event with a small probability; therefore, m was assigned a value of 1% in the present study); and Crc represents the consistency rate in the cohort study.
We find that the results of a case-control study and a cohort study are not always parallel. For example, if the occurrence probability of a biomarker is 0.85 in the disease group and 0.05 in the control group, its Yen is 0.80 (0.85 - 0.05) whereas its Crc is 0.145. When the cardinal number is relatively large (0.90 vs 0.10, Yen = 0.80), the Crc is 0.082. The problem is that Yen and Crc differ significantly for a small-probability event (say m = 0.01). Because the occurrence of disease is a small-probability event, the Yen from the case-control design is significantly larger than the Crc, showing overestimation. This represents a serious problem for determining the efficacy of a biomarker. In the present study, we propose a comprehensive index of biomarker (CIB) that combines consistency in both case-control and cohort studies to determine the efficacy of a biomarker.
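The two worked examples above can be checked numerically. The sketch below assumes that the case-control and cohort quantities are linked through Bayes' theorem with population incidence m; this is an assumption of the sketch, chosen because it reproduces the quoted Crc values of 0.145 and 0.082. The function names `cohort_rates` and `crc` are illustrative, and the first example uses Pd = 0.85, the value implied by the stated Yen of 0.80.

```python
def cohort_rates(p_d, p_c, m=0.01):
    """Return (Pe, Pn) from case-control frequencies via Bayes' theorem.

    p_d: frequency of the biomarker in the disease group (Pd)
    p_c: frequency of the biomarker in the control group (Pc)
    m:   incidence of the disease in the total population
    Assumes 0 < p_d, p_c < 1 so that both denominators are non-zero.
    """
    # Incidence among biomarker-positive (exposed) subjects
    p_e = m * p_d / (m * p_d + (1 - m) * p_c)
    # Incidence among biomarker-negative (non-exposed) subjects
    p_n = m * (1 - p_d) / (m * (1 - p_d) + (1 - m) * (1 - p_c))
    return p_e, p_n


def crc(p_d, p_c, m=0.01):
    """Consistency rate in a cohort study: Crc = Pe - Pn."""
    p_e, p_n = cohort_rates(p_d, p_c, m)
    return p_e - p_n


for p_d, p_c in [(0.85, 0.05), (0.90, 0.10)]:
    print(f"Pd={p_d:.2f} Pc={p_c:.2f} Yen={p_d - p_c:.2f} Crc={crc(p_d, p_c):.3f}")
```

Both pairs have Yen = 0.80, yet the Crc drops from about 0.145 to about 0.082 as the control-group frequency rises from 0.05 to 0.10.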

Calculation of CIB
The basic principle of the analysis model is to comprehensively consider consistency in both case-control and cohort studies to determine the efficacy of a biomarker. The efficacy of a biomarker is normally described in terms of Yen, which is the sum of the positive rate of a biomarker in the disease group and the negative rate of this biomarker in the control group minus 1, as follows: Yen = Pd + (1 - Pc) - 1 = Pd - Pc, where Pd and Pc represent the observed frequencies of a biomarker in the disease group and the control group, respectively, from the case-control study.
The consistency in a cohort study (Crc) is the sum of the incidence in the exposure group (biomarker-positive group) and the healthy rate in the non-exposure group (biomarker-negative group) minus 1, as follows: Crc = Pe + (1 - Pn) - 1 = Pe - Pn, where Pe and Pn represent the incidence in the exposed group and the non-exposed group, respectively, from the cohort study.
We define the comprehensive index (CIB) as the geometric mean of Yen and Crc, as follows: CIB = sqrt(Yen × Crc). The geometric mean is used because it tends toward the smaller of the two values. The range of CIB is (0, 1), and a larger CIB implies a stronger power of a biomarker.
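The three definitions in this section can be put together in a few lines. This is a minimal sketch, not the authors' implementation: it assumes Pe and Pn are derived from Pd, Pc, and m by Bayes' theorem, and it returns 0 when Yen or Crc is not positive, which is a convention of the sketch rather than something stated in the text. The names `youden`, `crc`, and `cib` are illustrative.

```python
import math


def youden(p_d, p_c):
    """Youden index from case-control frequencies: Yen = Pd - Pc."""
    return p_d - p_c


def crc(p_d, p_c, m=0.01):
    """Crc = Pe - Pn, with Pe and Pn obtained by Bayes' theorem (assumption)."""
    p_e = m * p_d / (m * p_d + (1 - m) * p_c)
    p_n = m * (1 - p_d) / (m * (1 - p_d) + (1 - m) * (1 - p_c))
    return p_e - p_n


def cib(p_d, p_c, m=0.01):
    """Geometric mean of Yen and Crc; 0.0 if either is not positive (sketch convention)."""
    yen = youden(p_d, p_c)
    c = crc(p_d, p_c, m)
    if yen <= 0 or c <= 0:
        return 0.0
    return math.sqrt(yen * c)


print(f"Yen={youden(0.85, 0.05):.2f} Crc={crc(0.85, 0.05):.3f} CIB={cib(0.85, 0.05):.3f}")
```

For Pd = 0.85 and Pc = 0.05 at m = 1%, the CIB comes out well below the Yen of 0.80 because the geometric mean is pulled toward the much smaller Crc.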

Evaluation of ROC analysis
Receiver operating characteristic (ROC) analysis is the common method used to evaluate the effectiveness of a diagnosis made using a biomarker [1,2,12]. In the present study, we examined whether ROC analysis remains applicable when biomarkers are evaluated using CIB.
A model comprising four sets of simulation data was established. Four sets of normally distributed random numbers (100 ± 20, n = 5000; 115 ± 20, n = 5000; 125 ± 20, n = 5000; 140 ± 20, n = 5000) were generated using the SPSS statistical software (IBM Corp., Armonk, NY, USA). Model A consisted of the datasets of 100 ± 20 and 115 ± 20; Model B consisted of the datasets of 100 ± 20 and 125 ± 20, and Model C consisted of the datasets of 100 ± 20 and 140 ± 20. The receiver operating characteristic (ROC) analysis was performed as shown in Figure 1.
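The simulation described above can be approximated outside SPSS. The sketch below uses NumPy instead of SPSS (a substitution made for illustration), draws the four normal datasets with a fixed seed, and estimates the AUC of each model with the Mann-Whitney rank statistic. It also reports the largest Youden index over all cutoffs. The seed, the function names, and the rule "positive if value >= cutoff" are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
N = 5000                        # sample size per dataset, as in the text
SD = 20.0


def auc(neg, pos):
    """Mann-Whitney estimate of the area under the ROC curve."""
    values = np.concatenate([neg, pos])
    ranks = np.empty(values.size)
    ranks[np.argsort(values)] = np.arange(1, values.size + 1)
    rank_sum_pos = ranks[neg.size:].sum()
    return (rank_sum_pos - pos.size * (pos.size + 1) / 2) / (pos.size * neg.size)


def max_youden(neg, pos):
    """Largest Sen + Spe - 1 over all observed cutoffs (positive if value >= cutoff)."""
    cutoffs = np.sort(np.concatenate([neg, pos]))
    # searchsorted(..., side="left") counts the values strictly below each cutoff
    sen = 1.0 - np.searchsorted(np.sort(pos), cutoffs, side="left") / pos.size
    spe = np.searchsorted(np.sort(neg), cutoffs, side="left") / neg.size
    return float(np.max(sen + spe - 1.0))


control = rng.normal(100.0, SD, N)  # shared 100 ± 20 dataset
results = {}
for name, mean in (("A", 115.0), ("B", 125.0), ("C", 140.0)):
    disease = rng.normal(mean, SD, N)
    results[name] = (float(auc(control, disease)), max_youden(control, disease))
    print(f"Model {name}: AUC={results[name][0]:.3f} max Yen={results[name][1]:.3f}")
```

For two normal groups with equal standard deviation, the theoretical AUC is Φ(Δ/(σ√2)), which gives approximately 0.70, 0.81, and 0.92 for Models A, B, and C, so the simulated values should land close to these.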

Evaluation of sensitivity and specificity
Most studies that attempt to identify biomarkers use a case-control design rather than a cohort design. In case-control studies, the potential relationship between a biomarker and the disease is examined by comparing the frequencies of this biomarker in the diseased and non-diseased (control) groups. With the case-control approach, biomarkers are assessed in already diseased individuals, and the power of a biomarker is typically expressed as the positive rate of the biomarker in the disease group (referred to as sensitivity, Sen) and the negative rate of the biomarker in the control group (referred to as specificity, Spe) [4]. However, even for biomarkers with the same Youden index, the diagnostic power may be different. Further, it is unclear whether Sen or Spe is more relevant to CIB for biomarkers with the same Youden index. If the cardinal number (the value in the control group) is relatively small (that is, Spe is higher), CIB could change even though the biomarkers have the same Yen. Evaluation of Sen and Spe in the case-control study based on CIB values was performed using the values shown in Table 1. The incidence in the total population was taken as 1% when calculating CIB.
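How CIB varies among biomarkers that share one Youden index can be illustrated with a short scan. The Sen/Spe pairs below are hypothetical values chosen for illustration, not the entries of Table 1, and the Bayes-theorem link between Pd, Pc, and the cohort rates is an assumption of the sketch.

```python
import math


def cib(p_d, p_c, m=0.01):
    """CIB = sqrt(Yen * Crc), with Pe and Pn obtained by Bayes' theorem (assumption)."""
    yen = p_d - p_c
    p_e = m * p_d / (m * p_d + (1 - m) * p_c)
    p_n = m * (1 - p_d) / (m * (1 - p_d) + (1 - m) * (1 - p_c))
    c = p_e - p_n
    return math.sqrt(yen * c) if yen > 0 and c > 0 else 0.0


YEN = 0.5  # hypothetical common Youden index
rows = []
for p_c in (0.05, 0.15, 0.25, 0.35, 0.45):  # hypothetical control-group frequencies
    p_d = p_c + YEN                          # Sen chosen so that Yen stays fixed
    rows.append((p_d, 1 - p_c, cib(p_d, p_c)))
    print(f"Sen={p_d:.2f} Spe={1 - p_c:.2f} Yen={YEN:.2f} CIB={rows[-1][2]:.3f}")
```

With Yen held fixed, the CIB is largest for the pair with the highest Spe and falls as Spe decreases, which is the pattern the text describes.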

Combination of two biomarkers based on CIB
Under ideal conditions, the power of a combination of two biomarkers would be better than the power of a single biomarker. Further, it is unclear whether the combined power (CIB) would be similar when biomarkers with the same CIB are combined. Based on the above assumptions, we chose a simulated-data approach to address this problem.
We assumed genetic markers whose frequencies follow those expected under the hypothesis of panmixia (Hardy-Weinberg equation) and generated the simulated data (1 and 0 standing for positive and negative, respectively) from random numbers on the SPSS platform. The frequencies in each group were set by design; each group included 5000 cases (n = 5000) and two items (genetic markers). The allele frequency of each item was the same, and the positive results of the two items were distributed independently within a group.
Two simulated data groups were selected as the disease group and the control group according to the design, and the CIB was calculated (m = 1%). The joint action of multiple indices was evaluated with binary logistic regression [4], and a new CIB was then calculated.
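As a rough illustration of how combining two independent markers changes Pd, Pc, and therefore CIB, the sketch below swaps the logistic-regression step for two simple fixed rules: "both positive" and "either positive". This is a deliberate simplification and does not reproduce the authors' SPSS procedure or Table 4. The marker frequencies are hypothetical, and the Bayes-theorem link to the cohort rates is an assumption.

```python
import math


def cib(p_d, p_c, m=0.01):
    """CIB = sqrt(Yen * Crc), with Pe and Pn obtained by Bayes' theorem (assumption)."""
    yen = p_d - p_c
    p_e = m * p_d / (m * p_d + (1 - m) * p_c)
    p_n = m * (1 - p_d) / (m * (1 - p_d) + (1 - m) * (1 - p_c))
    c = p_e - p_n
    return math.sqrt(yen * c) if yen > 0 and c > 0 else 0.0


def both_positive(p):
    """Positive rate when two independent, identical markers must both be positive."""
    return p * p


def either_positive(p):
    """Positive rate when at least one of two independent, identical markers is positive."""
    return 1 - (1 - p) ** 2


P_D, P_C = 0.85, 0.05  # hypothetical single-marker frequencies
single = cib(P_D, P_C)
both = cib(both_positive(P_D), both_positive(P_C))
either = cib(either_positive(P_D), either_positive(P_C))
print(f"single={single:.3f} both positive={both:.3f} either positive={either:.3f}")
```

The choice of combination rule matters: a rule that lowers the positive rate in the control group raises the CIB, whereas a rule that raises that rate lowers it, which is consistent with the role of the cardinal number discussed in the text.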

Relationship between Yen and CIB
Yen is the common index used to evaluate the effectiveness of a biomarker or of a diagnosis made using a biomarker. It is therefore necessary to know the relationship between Yen and CIB. Different Yen values with a moderate cardinal number were generated using simulated data, as shown in Table 2. A scatter diagram was plotted with Yen on the X-axis and CIB on the Y-axis.
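The shape of the Yen-CIB relationship can be sketched with a simple scan. The values in Table 2 are not reproduced here; instead the sketch assumes symmetric markers with Pd = 1 - Pc (so Sen equals Spe), a Bayes-theorem link to the cohort rates, and m = 1%. All three are assumptions made for illustration, and the resulting numbers need not match Fig. 2.

```python
import math


def cib(p_d, p_c, m=0.01):
    """CIB = sqrt(Yen * Crc), with Pe and Pn obtained by Bayes' theorem (assumption)."""
    yen = p_d - p_c
    p_e = m * p_d / (m * p_d + (1 - m) * p_c)
    p_n = m * (1 - p_d) / (m * (1 - p_d) + (1 - m) * (1 - p_c))
    c = p_e - p_n
    return math.sqrt(yen * c) if yen > 0 and c > 0 else 0.0


points = []
for k in range(10, 100, 2):      # Yen from 0.10 to 0.98 in steps of 0.02
    yen = k / 100
    p_d = (1 + yen) / 2          # symmetric marker: Sen = Spe = (1 + Yen) / 2
    p_c = (1 - yen) / 2
    points.append((yen, cib(p_d, p_c)))

# Smallest scanned Yen whose CIB reaches the 0.5 level used in the text
threshold = next(yen for yen, value in points if value >= 0.5)
print(f"smallest scanned Yen with CIB >= 0.5: {threshold:.2f}")
```

Under these assumptions the CIB stays below the Yen across the whole scan and reaches 0.5 only at a Yen above 0.90, in line with the relationship described for the scatter diagram.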

Results
For the ROC analysis simulation, the sample size of each simulated dataset was 5000, and the results for the case-control study are shown in Table 3. The results showed that CIB increased together with the area under the curve (AUC) in the ROC analysis. Thus, the ROC analysis could be used as a reference for CIB. The Sen and Spe of biomarkers in a case-control study were evaluated based on CIB values, as shown in Table 1. The values in the table indicate that a higher Spe (or a lower false-positive rate) could indicate better diagnostic power (CIB) for biomarkers with the same Yen.
Table 4 shows the combined CIB values for biomarkers with the same CIB values but different cardinal numbers. A combination of two biomarkers was found to have more significant power (the CIB increased when a combination of two biomarkers was used); however, the combined CIBs of biomarkers with the same individual CIB were not similar. The relationship between Yen and CIB is shown in Fig. 2. The scatter diagram revealed that when the acceptable CIB level was defined as 0.5, a Yen > 0.90 was required to reach it at an incidence rate of 0.01 in the total population.

Discussion
Biomarkers are used for disease diagnosis; therefore, the CIB could be more reasonable. Even so, the Yen is still important. We therefore suggest the CIB, a comprehensive index combining consistency in both case-control and cohort studies, to quantitatively describe the power of a biomarker and to evaluate the effectiveness of a biomarker or of a diagnosis made using a biomarker. ROC analysis remains applicable and can be used as a reference for CIB; however, CIB has additional features. The results indicated that, for biomarkers with the same Yen, a higher Spe could indicate better diagnostic power and a higher Sen could indicate better joint action.
Because the CIB typically ranges from 0 to 1, we propose that a CIB > 0.50 be considered to have clinical value [4]. Accordingly, only a Yen over 0.9 could reach clinical value (Fig. 2). Therefore, to obtain a high-power CIB value, a combination of two or more biomarkers is necessary.
More importantly, we found that the Yen value from the case-control design was significantly larger than the CIB value, showing overestimation. A related example is the analysis of genetic associations (screening of genetic markers), which has been successful in mapping genes but clinically disappointing because of inconsistent findings; this has been partly attributed to overestimation in case-control studies.
Statistical differences do not necessarily represent strong clinical effects. Except for Mendelian diseases, significant associations are difficult to detect because few genes have a Yen over 0.9. Hence, it might be misleading to pay attention only to the Yen results.
It should be pointed out that, to simplify the calculation, the CIB values in the present study were not equal to the actual CIB values. When the natural incidence of a disease is known, the definite CIB can be calculated from case-control data without difficulty.

Conclusion
The CIB, which combines consistency in both case-control and cohort studies, could be more reasonable than Yen for determining the efficacy of a biomarker. We propose that the CIB provides a better understanding of the power of a biomarker and would be better at evaluating biomarkers from new systems or concepts.

Abbreviations
AUC: area under the curve; CIB: comprehensive index of biomarker; Crc: consistency rate in a cohort study; Pc: frequency of a biomarker in the control group in a case-control study; Pd: frequency of a biomarker in the disease group in a case-control study; Pe: incidence in the exposed group in a cohort study; Pn: incidence in the non-exposed group in a cohort study; ROC: receiver operating characteristic; Sen: sensitivity; Spe: specificity; Yen: Youden index

Declarations

Ethics approval and consent to participate
This article does not contain any studies performed with human participants or animals.

Consent to publish
Not applicable

Availability of data and materials
The data used to support the findings of this study are available from the corresponding author upon request.

Competing interests
None declared