For the analysis, we used data from the Health Examinees (HEXA) study, which consists of South Korean adults over 40 years of age18. Table 1 presents the descriptive characteristics of the participants for the seven diseases: asthma, breast cancer, CAD, glaucoma, hyperthyroidism, hypothyroidism, and T2D. For each disease group, more than 300 cases and 30,000 controls were included, and the average age of the cases was higher than that of the controls (P < 0.05, Student's t-test). For asthma, hyperthyroidism, and hypothyroidism, the proportion of women was significantly higher among the cases, consistent with these diseases being known to affect women more frequently26,27. For T2D and CAD, the incidence in men was higher, which is in accordance with previous research28,29. In the disease groups for asthma, CAD, and T2D, for which body mass index (BMI) is a risk factor30–32, the average BMI was higher than that in the control groups.
Performance of PRS in an East Asian population
We calculated PRSs for the seven diseases under various conditions, using three PRS methods (P + T, PRSice, and PRS-CS) and two sets of GWAS summary statistics, from European and East Asian populations. Summary statistics for the European and East Asian populations were obtained from the UKB and BBJ, respectively, and their ancestry was considered in the PRS calculation. A total of 42 PRSs, six for each disease, were generated. All PRSs were significantly associated with their target disease in a positive direction (OR > 1 and P < 0.05, logistic regression).
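The three methods differ in how they select variants and derive per-variant weights from the GWAS summary statistics, but the resulting score is in each case a weighted sum of effect-allele dosages. The following is a minimal sketch of that final step only; the function name, the weights, and the genotypes are hypothetical illustrations and are not taken from the study.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Toy example: three variants with hypothetical per-allele weights
# (for a binary trait these would typically be log odds ratios).
weights = [0.5, -0.25, 0.125]
person_a = [2, 0, 1]  # effect-allele dosages for one participant
person_b = [0, 2, 0]

score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)
print(score_a, score_b)
```

Variant selection (clumping, thresholding, or shrinkage of the weights) is deliberately left out, since that is where P + T, PRSice, and PRS-CS diverge.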
To quantify and compare the predictive performance of PRS for each disease, we considered several metrics. The cumulative incidence plot visually presents the disease incidence and enables comparison of different PRSs. High- and low-risk groups (highest 10% and lowest 10% of the PRS distribution) were identified, and disease incidence was analyzed over time to compare PRS methods, ancestries (of the GWAS summary statistics), and risk groups (Fig. 1). For all PRSs, there were large differences in incidence between the risk groups, with the high-risk group showing a higher incidence than the low-risk group (Figure S1). However, the differences in incidence across PRS methods and across populations of GWAS data varied by disease. For example, for asthma and T2D, PRS-CS outperformed the other PRS methods: its high-risk group had a higher incidence than the high-risk groups of the other methods, and its low-risk group had a lower incidence than the other low-risk groups. In addition, T2D GWAS summary statistics from East Asia (BBJ) showed better performance than those from Europe (UKB) when the same PRS method was applied. However, the optimal PRS method and ancestry of GWAS data for the classification of risk groups differed for the other diseases. For CAD, glaucoma, hyperthyroidism, and hypothyroidism, PRS-CS performed optimally when summary statistics from Europe were used. When summary statistics for East Asians were used, however, PRS-CS performed better for only one of the low- or high-risk groups. In addition, among the breast cancer PRSs, using European GWAS data with PRS-CS showed the worst performance, with the high-risk group of PRS-CS having the lowest incidence among high-risk groups and the low-risk group having the highest incidence among low-risk groups. Except for T2D, neither ancestry of GWAS data was consistently dominant, and P + T and PRSice did not optimally classify either risk group for any disease.
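The risk-group comparison can be illustrated with a minimal sketch that takes the lowest and highest 10% of a PRS distribution and reports the proportion of cases in each tail. This is a simplification: it computes a crude case proportion rather than the cumulative incidence over time shown in Fig. 1, and the function name and toy cohort are hypothetical.

```python
from typing import Sequence, Tuple


def tail_incidence(scores: Sequence[float], is_case: Sequence[bool],
                   fraction: float = 0.10) -> Tuple[float, float]:
    """Return (low-risk, high-risk) case proportions for the two PRS tails."""
    if len(scores) != len(is_case):
        raise ValueError("scores and is_case must have the same length")
    n_tail = round(len(scores) * fraction)
    if n_tail < 1:
        raise ValueError("cohort too small for the requested tail fraction")
    # Participant indices ordered from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    low, high = order[:n_tail], order[-n_tail:]
    low_inc = sum(is_case[i] for i in low) / n_tail
    high_inc = sum(is_case[i] for i in high) / n_tail
    return low_inc, high_inc


# Toy cohort of 20 participants; cases cluster at the high end of the score.
scores = list(range(20))
is_case = [s in (7, 18, 19) for s in scores]
low_inc, high_inc = tail_incidence(scores, is_case)
print(f"low-risk: {low_inc:.0%}, high-risk: {high_inc:.0%}")
```

A method that separates the groups well yields a low proportion in the bottom tail and a high proportion in the top tail, which is the pattern used above to rank the PRS methods.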
The receiver operating characteristic (ROC) curve is one of the most common tools for characterizing the accuracy of PRS. The AUC, defined as the area under the ROC curve, provides an estimate of the probability that the predicted risk of a randomly selected case is higher than the predicted risk of a randomly selected control33. ROC curves and the AUC of each PRS were generated for subjects with specific diseases by applying five-fold cross-validation (Fig. 2). For diseases other than breast cancer, PRS-CS showed better or similar predictive performance compared with the other PRS methods. Similar to the results of the cumulative incidence plot (Fig. 1), T2D GWAS summary statistics from East Asia showed better performance than those from Europe for all three PRS methods: P + T (AUC = 0.631 and 0.583 for East Asia and Europe, respectively), PRSice (AUC = 0.639 and 0.567), and PRS-CS (AUC = 0.669 and 0.616). Moreover, summary statistics from the East Asian GWAS also outperformed those from the European GWAS for CAD and hypothyroidism, while the opposite was true for hyperthyroidism. For breast cancer, PRSice and P + T showed the highest AUC when applying summary statistics from East Asia and Europe, respectively.
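The probabilistic definition of the AUC given above can be computed directly by comparing every case with every control. The sketch below does exactly that, counting ties as half a win; it is an illustration of the definition with hypothetical scores, not the software used in the study, and it omits the five-fold cross-validation step.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case scores higher than a random control."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical PRS values for two cases and three controls.
cases = [0.9, 0.6]
controls = [0.5, 0.7, 0.1]
value = auc(cases, controls)  # 5 of the 6 case-control pairs are ordered correctly
print(round(value, 3))
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, which puts the values between 0.52 and 0.67 reported in this section into context.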
Applying P-value thresholds in PRS-CS
For breast cancer, PRS-CS showed the poorest performance among the PRS methods (Figs. 1 and 2). One of the biggest differences between the three PRS methods is that P + T and PRSice apply P-value thresholds to decrease noise, whereas PRS-CS applies continuous shrinkage priors to the effect sizes of genetic variants. To identify the effect of this difference on the performance of PRS-CS, we considered four GWAS P-value thresholds (5 × 10⁻², 5 × 10⁻⁴, 5 × 10⁻⁶, and 5 × 10⁻⁸) and generated PRSs using PRS-CS. As in the previous analysis, the AUC of each PRS was calculated by applying five-fold cross-validation, and the incidence in the lowest 10% and highest 10% subgroups of the PRS distribution was calculated.
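As a rough sketch of the thresholding step, the snippet below filters a toy set of summary statistics at each threshold before any downstream scoring. It assumes that variants were filtered on the GWAS P-value before being passed to PRS-CS and that the comparison is inclusive; neither detail is stated here, and the record type, function name, and variant identifiers are hypothetical.

```python
from typing import List, NamedTuple


class SumStat(NamedTuple):
    snp: str     # variant identifier (placeholder names below)
    beta: float  # GWAS effect size
    p: float     # GWAS P-value


def filter_by_p(sumstats: List[SumStat], threshold: float) -> List[SumStat]:
    """Keep variants whose P-value is at or below the threshold (assumed inclusive)."""
    return [row for row in sumstats if row.p <= threshold]


# A threshold of 1 keeps every variant, i.e. no filtering.
THRESHOLDS = [1.0, 5e-2, 5e-4, 5e-6, 5e-8]

toy = [
    SumStat("variant_1", 0.12, 3e-9),
    SumStat("variant_2", -0.05, 2e-5),
    SumStat("variant_3", 0.02, 1e-3),
    SumStat("variant_4", 0.01, 0.4),
]

kept = {t: len(filter_by_p(toy, t)) for t in THRESHOLDS}
for t, n in kept.items():
    print(f"P <= {t:g}: {n} variants retained")
```

Stricter thresholds leave fewer variants for the shrinkage step, which trades reduced noise against lost signal.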
As shown in Table 2, the predictive power of PRS-CS increased when P-value thresholds were applied for glaucoma (AUC 0.570 to 0.579) and hypothyroidism (AUC 0.553 to 0.561) with BBJ summary statistics. Although PRS-CS did not outperform the other PRS methods, its predictive power for breast cancer did increase with P-value cut-offs (AUC 0.582 to 0.586 in BBJ, 0.551 to 0.589 in UKB), and classification of risk groups improved, in that the incidence decreased in the low-risk group (0.90% to 0.68% in UKB) and increased in the high-risk group (1.74% to 2.09% in BBJ, 1.19% to 1.77% in UKB). As shown in Fig. 3, PRS-CS showed better performance than P + T and PRSice when P-value thresholds were applied, even though it was not the optimal method previously (Fig. 1). For the remaining diseases, applying the P-value thresholds did not improve the performance of PRS (Table 2 and Figure S2).
Table 2
Predictive performance of PRS-CS by P-value threshold. AUC values are means from five-fold cross-validation. The lowest 10% and highest 10% subgroups of the PRS distribution are presented as low- and high-risk, respectively. AUC, area under the receiver operating characteristic curve; CAD, coronary artery disease; T2D, type 2 diabetes. a, SNPs satisfying the threshold were not in the linkage disequilibrium reference panel of PRS-CS.
| Disease | Threshold | AUC (BBJ) | Low-risk incidence (BBJ) | High-risk incidence (BBJ) | AUC (UKB) | Low-risk incidence (UKB) | High-risk incidence (UKB) |
|---|---|---|---|---|---|---|---|
| Asthma | 1 | 0.547 | 1.06% | 2.27% | 0.564 | 1.09% | 2.51% |
| | 5×10⁻² | 0.542 | 1.25% | 2.12% | 0.562 | 1.39% | 2.64% |
| | 5×10⁻⁴ | 0.542 | 1.37% | 1.86% | 0.555 | 1.23% | 2.69% |
| | 5×10⁻⁶ | 0.537 | 1.54% | 2.32% | 0.544 | 1.44% | 2.25% |
| | 5×10⁻⁸ | 0.531 | 1.30% | 2.15% | 0.538 | 1.25% | 2.13% |
| Breast Cancer | 1 | 0.582 | 0.58% | 1.74% | 0.551 | 0.90% | 1.19% |
| | 5×10⁻² | 0.579 | 0.68% | 2.09% | 0.520 | 1.09% | 1.16% |
| | 5×10⁻⁴ | 0.584 | 0.74% | 1.67% | 0.581 | 0.71% | 1.58% |
| | 5×10⁻⁶ | 0.581 | 0.58% | 1.77% | 0.589 | 0.68% | 1.77% |
| | 5×10⁻⁸ | 0.586 | 0.68% | 1.54% | 0.578 | 0.58% | 1.67% |
| CAD | 1 | 0.559 | 2.08% | 3.52% | 0.556 | 2.24% | 4.08% |
| | 5×10⁻² | 0.558 | 2.10% | 3.47% | 0.549 | 2.20% | 3.90% |
| | 5×10⁻⁴ | 0.549 | 2.25% | 3.30% | 0.536 | 2.18% | 3.62% |
| | 5×10⁻⁶ | 0.542 | 2.10% | 3.31% | 0.533 | 2.20% | 3.28% |
| | 5×10⁻⁸ | 0.541 | 2.01% | 3.28% | 0.525 | 2.17% | 3.33% |
| Glaucoma | 1 | 0.570 | 0.34% | 1.22% | 0.574 | 0.49% | 1.33% |
| | 5×10⁻² | 0.575 | 0.42% | 0.97% | 0.546 | 0.51% | 1.27% |
| | 5×10⁻⁴ | 0.579 | 0.59% | 1.10% | 0.560 | 0.53% | 1.20% |
| | 5×10⁻⁶ | 0.562 | 0.49% | 1.22% | 0.575 | 0.38% | 1.16% |
| | 5×10⁻⁸ | 0.535 | 0.53% | 1.08% | 0.558 | 0.55% | 1.05% |
| Hyperthyroidism | 1 | 0.564 | 1.64% | 3.39% | 0.551 | 1.51% | 3.39% |
| | 5×10⁻² | 0.551 | 1.49% | 2.87% | 0.540 | 1.59% | 3.10% |
| | 5×10⁻⁴ | 0.556 | 1.56% | 3.10% | 0.549 | 1.69% | 3.18% |
| | 5×10⁻⁶ | 0.555 | 1.59% | 3.21% | 0.550 | 1.82% | 3.00% |
| | 5×10⁻⁸ | 0.553 | 1.49% | 3.05% | 0.536 | 1.97% | 2.85% |
| Hypothyroidism | 1 | 0.553 | 1.85% | 3.31% | 0.593 | 1.28% | 3.56% |
| | 5×10⁻² | 0.540 | 1.61% | 2.82% | 0.584 | 1.36% | 3.74% |
| | 5×10⁻⁴ | 0.561 | 1.56% | 3.36% | 0.588 | 1.20% | 3.74% |
| | 5×10⁻⁶ | 0.544 | 1.59% | 2.72% | 0.587 | 1.33% | 3.28% |
| | 5×10⁻⁸ | NAᵃ | NAᵃ | NAᵃ | 0.583 | 1.36% | 3.46% |
| T2D | 1 | 0.669 | 2.51% | 19.83% | 0.616 | 3.98% | 15.15% |
| | 5×10⁻² | 0.660 | 2.72% | 18.98% | 0.603 | 4.53% | 14.19% |
| | 5×10⁻⁴ | 0.647 | 2.63% | 18.16% | 0.594 | 5.32% | 14.39% |
| | 5×10⁻⁶ | 0.638 | 3.25% | 17.41% | 0.600 | 4.43% | 15.10% |
| | 5×10⁻⁸ | 0.634 | 3.52% | 16.54% | 0.596 | 4.73% | 14.66% |