Multiple risk-factor model
The analytical model rests on two basic assumptions: (1) the observed factors are distributed independently of one another and act in a superimposed (additive) manner, without interaction terms or weight functions; and (2) a chronic disease is a continuous process driven by the superposition of the suspected factors.
A four-factor model simulating pathogenic data was established. Four sets of random numbers with binomial distributions (P = 0.5, N = 100,000) were generated using SPSS statistical software. The four sets of data, which were independent of each other, were named A, B, C, and D. The four sets were added together to give the results for the ABCD group, so that group A can be regarded as one factor of the ABCD group, as shown in Fig. 1. The values of ABCD were divided by the highest value of ABCD to rescale them to the range 0 to 1. A higher number of suspected factors indicates a higher probability of disease. Hence, A can be regarded as a cause of ABCD; this model was named the four-factor model with 0.5, where 0.5 is the probability of each suspected factor.
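The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS procedure used in the study: each factor is assumed to be a 0/1 draw per subject with probability p, and the function name `simulate_factor_model`, its parameters, and the fixed seed are hypothetical.

```python
import random


def simulate_factor_model(n_subjects=100_000, n_factors=4, p=0.5, seed=0):
    """Simulate independent binary factors and a rescaled additive score.

    Assumption: every factor is an independent 0/1 draw with probability p.
    """
    rng = random.Random(seed)
    # One row per subject, one 0/1 column per factor (A, B, C, D, ...).
    factors = [
        [1 if rng.random() < p else 0 for _ in range(n_factors)]
        for _ in range(n_subjects)
    ]
    # Superimpose the factors by plain addition (no weights, no interaction).
    totals = [sum(row) for row in factors]
    # Divide by the highest observed total so that scores lie in [0, 1].
    highest = max(totals)
    scores = [t / highest if highest else 0.0 for t in totals]
    return factors, scores


factors, scores = simulate_factor_model()
prevalence_a = sum(row[0] for row in factors) / len(factors)
print("observed prevalence of A:", round(prevalence_a, 3))
print("score range:", min(scores), max(scores))
```

Passing `p=0.01` or `p=0.001` to the same function gives the low-prevalence variants; in those cases the highest observed total may be smaller than the number of factors, which is why the sketch rescales by the observed maximum.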
In the same way, four-factor models simulating pathogenic data in which the probability of the suspected factor in the study population was 0.01 or 0.001 (four-factor models with 0.01 and 0.001) were established to observe the influence of the suspected-factor distribution on differences in incidence between the two groups.
Evaluation of effect size
In a similar way, a three-factor model with 0.5 (A vs ABC) and a two-factor model with 0.5 (A vs AB) were established to evaluate differences in the magnitude of the association between the observed factor and the outcome. With group A as the cause group and ABCD, ABC, and AB as the results, a cohort study was simulated. The difference in the observed occurrence of disease between the two groups was then calculated to evaluate the effect sizes.
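A minimal sketch of this comparison follows. It assumes that the two groups are the subjects with A = 1 (exposed) and A = 0 (non-exposed), and that the occurrence of disease in each group is the mean of the rescaled score; both readings are assumptions, as are the function name `incidence_difference` and its parameters.

```python
import random


def incidence_difference(n_factors, p=0.5, n_subjects=100_000, seed=0):
    """Difference in mean rescaled score between subjects with and without A.

    Assumptions: factor A is the first column; "incidence" in a group is the
    mean of (number of factors present) / (highest observed total).
    """
    rng = random.Random(seed)
    exposed, unexposed = [], []
    for _ in range(n_subjects):
        row = [1 if rng.random() < p else 0 for _ in range(n_factors)]
        (exposed if row[0] == 1 else unexposed).append(sum(row))
    if not exposed or not unexposed:
        raise ValueError("one of the groups is empty; increase n_subjects")
    highest = max(max(exposed), max(unexposed))
    mean_exposed = sum(exposed) / len(exposed) / highest
    mean_unexposed = sum(unexposed) / len(unexposed) / highest
    return mean_exposed - mean_unexposed


# Four-, three-, and two-factor models with 0.5 (A vs ABCD, ABC, AB).
for k in (4, 3, 2):
    print(k, "factors:", round(incidence_difference(k), 3))
```

Under these assumptions the expected difference is 1/k for a k-factor model whenever the highest observed total equals k, because the remaining factors are independent of A and contribute equally to both groups.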
Evaluation of odds ratio and Youden’s index
We assumed that the frequencies of a genetic marker (gene) were distributed in disease and control groups as shown in Table 1.
Table 1
Frequency distribution of genetic markers in disease and control groups (%).
Genetic marker | Disease group | Control group | Total |
Carrying gene | a | b | a + b |
Not carrying gene | c | d | c + d |
Total | 100 | 100 | 200 |
The following equations were used to determine Youden’s index (Y) and the odds ratio (OR) [10, 11], with a, b, c, and d expressed as proportions so that c = 1 - a and d = 1 - b:
Y = a + d - 1 = a + (1 - b) - 1 = a - b
OR = (a*d)/(b*c) = [a*(1 - b)]/[b*(1 - a)]
We also propose the true-to-false-positive ratio (TFR) for a case-control study, defined as follows:
TFR = a / b
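The three indices can be computed together from the carrier proportions in the disease and control groups. The sketch below assumes a and b are given as proportions strictly between 0 and 1; the function name `case_control_indices` is hypothetical.

```python
def case_control_indices(a, b):
    """Return (Y, OR, TFR) from carrier proportions in cases (a) and controls (b).

    Assumption: a and b are proportions in (0, 1), so c = 1 - a and d = 1 - b.
    """
    if not (0 < a < 1 and 0 < b < 1):
        raise ValueError("a and b must lie strictly between 0 and 1")
    c = 1 - a  # cases not carrying the gene
    d = 1 - b  # controls not carrying the gene
    youden = a + d - 1             # simplifies to a - b
    odds_ratio = (a * d) / (b * c)
    tfr = a / b                    # true-to-false-positive ratio
    return youden, odds_ratio, tfr


y, odds, tfr = case_control_indices(0.6, 0.2)
print(round(y, 3), round(odds, 3), round(tfr, 3))
```

For a = 0.6 and b = 0.2 this gives Y = 0.4, OR = 6, and TFR = 3 (up to floating-point rounding).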
The basic principle of the analysis is to determine which of Y, OR, and TFR correctly reflects the consistency in a cohort study (CRC).
The CRC is the sum of the incidence in the exposed group and the healthy rate in the non-exposed group, minus 1:
CRC = Pe + (1 - Pn) - 1 = Pe - Pn
where Pe and Pn represent the incidence in the exposed group and non-exposed group, respectively, from the cohort study.
Y, OR, and TFR from the case-control study were evaluated against the CRC from the cohort study using specific numerical values. The definite relationship between the cohort outcomes and those of the case-control study is as follows [16]:
where Pe and Pn represent the incidence in the exposed group and the non-exposed group, respectively, in the cohort study; Pd and Pc represent the frequencies of the observed factor in the disease group and the control group, respectively, in the case-control study; and “m” represents the incidence in the total population, which was set to 5% in the present study because a chronic disease is usually a low-probability event.
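One way to compute such a conversion is sketched below. It assumes the relationship is the standard Bayes' theorem inversion, Pe = m*Pd / [m*Pd + (1 - m)*Pc] and Pn = m*(1 - Pd) / [m*(1 - Pd) + (1 - m)*(1 - Pc)]; this is an assumption based on the definitions of Pe, Pn, Pd, Pc, and m, and the exact form given in [16] may differ. The function name `cohort_from_case_control` is hypothetical.

```python
def cohort_from_case_control(p_d, p_c, m=0.05):
    """Convert case-control frequencies into cohort incidences and the CRC.

    Assumption: Bayes' theorem links the two designs, with p_d and p_c the
    factor frequencies in cases and controls and m the population incidence.
    """
    for value in (p_d, p_c, m):
        if not (0 < value < 1):
            raise ValueError("p_d, p_c and m must lie strictly between 0 and 1")
    exposed_cases = m * p_d
    exposed_controls = (1 - m) * p_c
    unexposed_cases = m * (1 - p_d)
    unexposed_controls = (1 - m) * (1 - p_c)
    pe = exposed_cases / (exposed_cases + exposed_controls)        # incidence if exposed
    pn = unexposed_cases / (unexposed_cases + unexposed_controls)  # incidence if not exposed
    crc = pe + (1 - pn) - 1                                        # equals pe - pn
    return pe, pn, crc


pe, pn, crc = cohort_from_case_control(0.6, 0.2)
print(round(pe, 4), round(pn, 4), round(crc, 4))
```

When the factor is equally frequent in cases and controls (p_d = p_c), the sketch returns Pe = Pn = m and a CRC of 0, as expected for a factor unrelated to the disease.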