PD is a complex disorder with unclear etiology and a lack of effective treatments [21]. Studies have shown that early intervention can effectively improve symptoms and slow disease progression [22]. An accurate and practical model that enables early detection of PD is therefore of great importance. In our research, we first assessed genetic susceptibility to PD using genome-wide summary statistics that encompass 6.2 million SNPs, and achieved AUCs that were on average 22.95% higher than those of an existing PRS built with only a limited number of SNPs [14]. We then designed a PD risk prediction model that achieved an AUC of 0.91 and a Nagelkerke's R² of 0.59 using PRS, demographic variables, and non-invasive, cost-effective clinical assessments. Through association analyses, we found that a higher PRS is associated with higher odds of PD, and that the effect of PRS can be modified by gender and UPDRS.
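For readers less familiar with the two accuracy metrics quoted above, the following is a minimal sketch of how an AUC and a Nagelkerke's R² can be computed from predicted probabilities. It is an illustration only, not the code used in this study: the function names and the toy labels and probabilities are hypothetical.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen case scores higher than a
    randomly chosen control, with ties counted as one half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def log_likelihood(labels, probs):
    """Bernoulli log-likelihood of 0/1 labels under predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(labels, probs))


def nagelkerke_r2(labels, probs):
    """Nagelkerke's R^2: Cox-Snell R^2 divided by its maximum attainable value."""
    n = len(labels)
    prevalence = sum(labels) / n
    ll_null = log_likelihood(labels, [prevalence] * n)  # intercept-only model
    ll_model = log_likelihood(labels, probs)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Hypothetical toy data: 1 = PD case, 0 = control.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_probs)
toy_r2 = nagelkerke_r2(toy_labels, toy_probs)
```

The AUC measures discrimination (how well cases are ranked above controls), whereas Nagelkerke's R² summarizes how much the fitted model improves on an intercept-only model, which is why the two are reported side by side.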
We constructed PRS using four methods, which allows for a comprehensive assessment of genetic risk and sheds light on the genetic architecture of PD. We found that neither the polygenic model SBLUP nor the oligogenic model C + T can sufficiently capture the genetic susceptibility to PD. Instead, SDPR, which allows for much more flexible effect size distributions, has the highest prediction accuracy. This suggests that PD may have an omnigenic architecture, in which a large number of SNPs have small-to-moderate effects and a few core genes have large effects [19]. GWAS results also provide additional evidence for the omnigenic architecture: the Manhattan plot in Supplementary Figure S4 shows only a few significant PD-associated SNPs. Notably, our prediction model built with PRS alone achieves an average AUC of 0.75 (scenario 0 in Fig. 1), which is substantially higher than values reported in the existing literature (e.g., AUCs of around 0.55 and 0.61 in Chairta et al. [23] and Li et al. [14], respectively).
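To make the contrast between the methods concrete, the sketch below shows the weighted-sum form that a PRS generally takes, together with the p-value thresholding step of a C + T style score. It is a simplified illustration under stated assumptions: the SNP identifiers, effect sizes, p-values, and dosages are hypothetical, and the clumping step (pruning SNPs in linkage disequilibrium) is omitted. Methods such as SBLUP and SDPR differ mainly in how the per-SNP weights are estimated before this sum is taken.

```python
def threshold_snps(summary_stats, p_threshold):
    """Keep SNPs whose GWAS p-value is below the threshold.

    summary_stats maps SNP id -> (effect size, p-value).
    Returns a dict mapping SNP id -> effect size.
    """
    return {snp: beta for snp, (beta, p) in summary_stats.items()
            if p < p_threshold}


def prs_weighted_sum(dosages, weights):
    """PRS for one individual: sum of weight x risk-allele dosage (0, 1 or 2)."""
    return sum(weights[snp] * dose
               for snp, dose in dosages.items() if snp in weights)


# Hypothetical summary statistics: SNP id -> (effect size, p-value).
toy_stats = {
    "snp_a": (0.30, 1e-9),
    "snp_b": (0.05, 0.20),
    "snp_c": (-0.10, 1e-4),
}
# Hypothetical risk-allele dosages for one individual.
toy_dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 1}

strict_weights = threshold_snps(toy_stats, p_threshold=1e-3)  # few SNPs kept
lenient_weights = threshold_snps(toy_stats, p_threshold=1.0)  # all SNPs kept
prs_strict = prs_weighted_sum(toy_dosages, strict_weights)
prs_lenient = prs_weighted_sum(toy_dosages, lenient_weights)
```

A strict threshold keeps only strongly associated SNPs, while a lenient one admits many weak effects. A method with a flexible effect size distribution can, in principle, accommodate both kinds of signal at once.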
We designed a practically useful framework for PD risk prediction that achieves high accuracy (AUC > 0.9) with non-invasive variables. We considered three scenarios that reflect how accessible the risk factors are. Although demographic variables are easy to obtain, adding them to the PRS-based model barely improved prediction accuracy, and PRS alone accounted for more than 80% of the explained variability in the outcome. We then added clinical assessments (i.e., MoCA, UPSIT, and GDS), which do not pose a significant burden on either clinicians or patients [24], to the model built with PRS and demographic variables. This substantially improved prediction accuracy, with the AUC reaching 0.91 (scenario II in Fig. 1). Despite the large predictive power of MoCA, UPSIT, and GDS, PRS still accounts for 18.1% of the explained variance. Indeed, the average AUC of risk prediction models built without PRS is significantly lower than that of models with PRS. This suggests that neither PRS nor clinical assessments alone achieve the best prediction performance, and that integrating them significantly enhances prediction accuracy. CSF biomarkers can reflect PD progression and facilitate its early diagnosis. We therefore investigated their additional predictive power for PD risk, even though obtaining them can be invasive [25]. Surprisingly, we found that, given PRS, demographic variables, and clinical assessments, CSF biomarkers provide barely any additional information: the model with CSF biomarkers (scenario III in Fig. 1) reaches an AUC of only 0.92, almost the same as the model without them (p-value = 0.43). It is worth noting that PRS contributes significantly to PD risk even in models that incorporate clinical assessments and biomarkers. PRS explains at least 10% of the explained variability in the outcome, which is consistent with previous research and indicates the critical role of genetic factors in the progression of PD [26]. Considering cost, ease of implementation, and accuracy, we recommend the PD risk model built with PRS, demographic variables, and clinical assessments (i.e., scenario II).
We undertook a preliminary investigation of the relationships among PRS, other known PD risk factors, and the risk of PD. Consistent with the results of Koch et al. [27], PRS in PD cases is significantly higher than in controls. Compared with the lowest PRS quintile, individuals in the highest quintile had a 7.12-fold increase in the odds of PD (Fig. 2). Our study also found that PRS is correlated with both UPDRS and UPSIT scores (Supplementary Figure S3), which is consistent with previous studies [28, 29]. For example, PRS has been found to be positively correlated with MDS-UPDRS motor scores [28] and negatively associated with UPSIT scores [29]. We did not find consistent evidence of correlations between PRS and CSF biomarkers. Although PRS built with the C + T method is correlated with CSF biomarkers (i.e., t-tau and p-tau), PRS built with the other methods is not. The existing literature has not reached a consensus on the associations between PRS and CSF biomarkers. For example, Lee et al. found a negative relationship between PRS and Aβ42 [30], whereas Ibanez et al. suggested that PRS is not related to Aβ42 or p-tau but is marginally associated with t-tau [12]. Further investigations are needed. Interestingly, we found that the effect of PRS on PD risk can be modified by other factors. Given the same PRS, the odds of PD are significantly higher for men than for women. In fact, males are twice as likely to develop PD as females, which may be related to the nigrostriatal dopaminergic pathway and the potential neuroprotective effects of estrogen [31]. In addition, we found an interaction between PRS and UPDRS. Although no existing literature has reported this interaction, researchers have recently found that UPDRS interacts with a few SNPs [32], which points to a potential interaction between PRS and UPDRS.
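As a rough illustration of the quintile comparison described above, the sketch below assigns individuals to PRS quintiles and computes a crude, unadjusted odds ratio between the highest and lowest quintiles from a 2 x 2 table. This is an assumption-laden simplification: the reported estimate may well come from a regression model with covariates, and the function names and toy data here are hypothetical.

```python
def quintile_labels(scores):
    """Assign each score to a quintile 1..5 by rank (ties broken by input order)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    quintile = [0] * n
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // n + 1
    return quintile


def top_vs_bottom_odds_ratio(scores, labels):
    """Unadjusted odds ratio of being a case: highest vs lowest score quintile."""
    counts = {1: [0, 0], 5: [0, 0]}  # quintile -> [cases, controls]
    for q, y in zip(quintile_labels(scores), labels):
        if q in counts:
            counts[q][0 if y == 1 else 1] += 1
    cases_hi, controls_hi = counts[5]
    cases_lo, controls_lo = counts[1]
    if 0 in (cases_hi, controls_hi, cases_lo, controls_lo):
        raise ValueError("empty cell in the 2 x 2 table; odds ratio undefined")
    # Cross-product form of (cases_hi / controls_hi) / (cases_lo / controls_lo).
    return (cases_hi * controls_lo) / (controls_hi * cases_lo)


# Hypothetical toy data: 20 individuals with increasing PRS; 1 = case, 0 = control.
toy_scores = list(range(1, 21))
toy_labels = [1, 0, 0, 0] + [0, 1] * 6 + [1, 1, 1, 0]
toy_or = top_vs_bottom_odds_ratio(toy_scores, toy_labels)
```

Effect modification, such as the PRS-by-gender interaction discussed above, would typically be examined by adding a product term to a regression model rather than through a 2 x 2 table.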
While we have designed a practically useful PD risk prediction model and revealed the complex relationships between PRS and other risk factors, our study has several limitations. First, our sample size is limited and the model lacks external validation, which limits its generalizability. However, our reported accuracy is a realistic estimate, as we adopted cross-validation and determined prediction accuracy based only on the validation set. While additional datasets would be preferable, our model provides insights into PD risk and can serve as a basis for future studies. Second, our PD risk prediction models were built with logistic regression, a commonly used and easily interpretable approach. Further studies are needed to explore other algorithms, especially machine learning and deep learning models [33], for modeling PD risk.
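The idea of estimating accuracy only on held-out data can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes a single predictor, a hand-written logistic regression fitted by gradient descent, and stratified five-fold splitting, none of which are specified in the text. All names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """One-predictor logistic regression fitted by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def auc(labels, scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties = 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def stratified_folds(ys, k):
    """Deal cases and controls round-robin into k folds of indices."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, y in enumerate(ys) if y == label]
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def cross_validated_auc(xs, ys, k=5):
    """Mean AUC over k folds; each fold is scored by a model that never saw it."""
    aucs = []
    for held_out in stratified_folds(ys, k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        b0, b1 = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        preds = [sigmoid(b0 + b1 * xs[i]) for i in held_out]
        aucs.append(auc([ys[i] for i in held_out], preds))
    return sum(aucs) / k


# Hypothetical toy predictor (e.g., a standardized score): 10 controls, 10 cases.
toy_x = [-2.0, -1.5, -1.0, -0.5, 0.5, -1.2, -0.8, -0.3, -1.8, 0.2,
         2.0, 1.5, 1.0, 0.5, -0.5, 1.2, 0.8, 0.3, 1.8, -0.2]
toy_y = [0] * 10 + [1] * 10
toy_cv_auc = cross_validated_auc(toy_x, toy_y)
```

Because each fold is predicted by a model fitted without it, the averaged AUC is not inflated by overfitting to the training data. It remains an internal estimate, however, and does not replace validation in an independent cohort.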
In conclusion, our study suggests that PD may have an omnigenic architecture, in which a few SNPs have large effects and the rest have small-to-moderate effects. We prefer SDPR, a method that is flexible in modeling effect size distributions, for constructing PRS for PD. We recommend using PRS, demographic variables, and easily obtainable clinical assessments to predict PD risk (AUC = 0.91), and leaving out the invasive CSF biomarkers. The detected correlations (i.e., PRS with UPDRS and UPSIT) and interactions (PRS with gender and UPDRS) provide additional insights into PD etiology.