We presented the IDEARS platform, which uses the state-of-the-art machine learning methods XGBoost and SHAP to rank risk factors for PD using the world’s largest and most comprehensive prospective community study, the UK Biobank (UKB). As expected, ageing was by far the most significant factor in predicting PD, followed by gender: PD is far more prevalent in males, as reflected by the larger number of males with PD in the UKB. Our unbiased machine learning approach uncovered a novel set of features most associated with PD. Interestingly, several well-established risk factors thought to have a high level of association with PD were not identified among the most important features in our model (e.g., pesticide exposure, smoking status, traumatic brain injury and caffeine consumption).
Of note is the importance of insulin-like growth factor 1 (IGF-1), which ranked in the top 4 most important features, based on mean SHAP score, in the combined dataset and in the male and female lists. On deeper inspection of the data, it was clear that IGF-1 levels were elevated in males up to 10 years before disease onset, and when the data were age normalised this pattern could also be seen in females. IGF-1 is an endocrine, paracrine and autocrine hormone that is a primary mediator of the effects of growth hormone. Major functions of IGF-1 include insulin-like activity, cell proliferation and survival, antioxidant effects and neuroprotection. In vivo studies have demonstrated that IGF-1 deficiency results in increased oxidative stress, inflammation, neuronal cell death and cognitive deficits that can be improved by exogenous IGF-1 19,20. It is well documented that IGF-1 is elevated in serum at diagnosis in PD patients, and levels at this time correlate with disease severity5,8. To reconcile the beneficial effects of IGF-1 with the fact that it is increased in PD, it has been hypothesised that IGF-1 signalling is defective in PD, resulting in a decrease in its neuroprotective effects and a reduction in the brain’s ability to buffer oxidative damage. Moreover, IGF-1 signalling is known to be dysregulated by both toxin-induced inflammation and central obesity5,21,22, which is consistent with our model identifying prospective biomarkers predictive of greater PD risk in these categories. Therefore, higher-than-average IGF-1 levels years before diagnosis, especially in men, may be indicative of a compensatory mechanism in response to dysregulated IGF-1 signalling. Our findings suggest that IGF-1 should be further considered as a prognostic biomarker for PD risk.
The IDEARS model identified bilirubin levels as being elevated in the 5 years before diagnosis, but only in males, although there was a trend for an increase in females. There is evidence that bilirubin levels are elevated in the early years after PD diagnosis, but to our knowledge no one has investigated elevated levels before PD diagnosis. A recent meta-analysis concluded that there was an increase in total bilirubin serum levels in PD patients; however, the effect was more robust in the Caucasian population6, consistent with the UKB cohort. Furthermore, bilirubin levels negatively correlated with disease severity23, and dopamine replacement therapy elevated bilirubin levels compared to drug-naïve PD patients24. Bilirubin is part of the heme oxygenase antioxidant pathway; therefore, elevated levels in PD are likely a compensatory response to increased oxidative stress in the parkinsonian brain. Thus, like IGF-1, bilirubin should be considered as a prognostic biomarker for PD risk, although it may not be a strong marker in females and some ethnicities.
AST:ALT was elevated in males up to 10 years before PD diagnosis but only increased in females after diagnosis. This is consistent with elevated ALT being protective and ranking higher in the male SHAP list. Elevated AST:ALT ratios between 1–2 are indicative of non-alcoholic fatty liver disease (NAFLD) or non-alcoholic steatohepatitis (NASH), whilst levels > 2 are indicative of alcoholic liver disease25,26. The moderate increases in the UKB PD cohort may therefore be indicative of NAFLD/NASH, although some individuals in the PD group have levels above 2. A recent study of NAFLD and PD found that there was a greater risk of PD in females with NAFLD27, and an earlier study found that NASH in males and females with hepatitis B and C infection led to a greater PD risk28. That said, NAFLD is associated with cardiovascular disease and metabolic disorders, which does not fully align with our other findings (see below)29. Whilst more research on NAFLD and PD is required, our findings indicate that elevated AST:ALT may be a useful prospective biomarker of PD, especially in males.
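The AST:ALT interpretation bands described above (1–2 suggestive of NAFLD/NASH, above 2 suggestive of alcoholic liver disease) can be illustrated with a small helper. This is only a sketch under stated assumptions: the function names, the category labels, the treatment of the boundary values 1 and 2, and the example enzyme values are all hypothetical and are not part of the IDEARS pipeline. The bands are screening heuristics, not diagnostic criteria.

```python
def ast_alt_ratio(ast: float, alt: float) -> float:
    """Return the AST:ALT ratio; both values must use the same unit (e.g. U/L)."""
    if alt <= 0:
        raise ValueError("ALT must be a positive number")
    return ast / alt


def classify_ast_alt(ratio: float) -> str:
    """Map a ratio onto the bands quoted in the text.

    Assumption: ratios of exactly 1 or exactly 2 are placed in the
    middle band; the text does not specify boundary handling.
    """
    if ratio > 2:
        return "alcoholic_pattern"    # above 2
    if ratio >= 1:
        return "nafld_nash_pattern"   # between 1 and 2
    return "not_elevated"             # below 1


# Made-up example values, for illustration only.
example = classify_ast_alt(ast_alt_ratio(ast=45.0, alt=30.0))  # ratio = 1.5
```

Keeping the ratio calculation separate from the banding makes it straightforward to swap in different cut-offs if other thresholds are preferred.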
The IDEARS model identified several features associated with cardiovascular health and body adiposity. Total and LDL cholesterol levels were reduced in PD in males 10 years before diagnosis but only 5 years before diagnosis in females. This observation is in keeping with a large population-based study of 261,638 statin-free individuals, which found that males with lower levels of total and LDL cholesterol were at a greater risk of developing PD, whereas there were no significant differences in females9. In line with their lower LDL levels, PD patients have shown a reduced risk of myocardial infarction and stroke30,31, and it has been hypothesised that the reduced cholesterol levels may be due to non-motor peripheral symptoms, such as constipation, that can manifest before motor symptoms appear9.
Cardiovascular health is also strongly linked to metabolic regulation, and there are mixed findings on the co-morbidity of type 2 diabetes and PD, with some studies showing an increased prevalence32 and others showing a reduced prevalence31,33. As mentioned above, HbA1c is higher several years before PD onset but is not elevated at the time of diagnosis, although the proportion of the PD group with HbA1c in the diabetic range is slightly higher than in the non-PD group. Therefore, further research is needed to investigate the possible associations between diabetes and PD.
Several epidemiological studies have linked central adiposity to PD34,35, which is consistent with output from the IDEARS model, in which increased waist circumference 10 years before diagnosis was a risk factor in both sexes (as was hip circumference in females). Although this observation may be at odds with better cardiovascular and metabolic health in general, body fat distribution is likely a key factor, and increased adiposity has also been hypothesised to modulate IGF-1 signalling5,21. Clearly, more research is required to better understand the complex interactions between body adiposity and the risk of PD.
Several features relating to the immune system were identified by the IDEARS model. Specifically, an increase in neutrophil count, a decrease in lymphocyte count and an increase in the neutrophil-to-lymphocyte ratio (NLR) were all altered both 10 years before and at diagnosis in males, whilst only NLR followed the same pattern in females. An elevated neutrophil count is associated with the occurrence, progression and severity of inflammation or infection, whereas lymphocyte count, as part of the adaptive immune response, is heavily depressed by stress. Thus, NLR is considered a compound biomarker of inflammation and stress, and it is therefore perhaps not surprising that NLR is the most robust and consistent example of a prospective biomarker of PD risk from the IDEARS model. A recent study demonstrated similar findings, with increased NLR in 100 PD patients but no change in Alzheimer’s disease7. Increased neutrophil count and NLR are in keeping with the literature indicating that inflammation and infection are risk factors for PD. NLR may therefore be considered a useful prospective biomarker for the risk of PD; however, as it is associated with many other chronic diseases, it should be used in combination with other biomarkers identified by our model.
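Because NLR is simply the neutrophil count divided by the lymphocyte count, a rise in neutrophils and a fall in lymphocytes both push it upward, which is why it acts as a compound marker. The short sketch below illustrates this; the function name `nlr` and the cell counts are made up for illustration and are not UKB values.

```python
def nlr(neutrophils: float, lymphocytes: float) -> float:
    """Neutrophil-to-lymphocyte ratio; both counts in the same unit (e.g. 10^9 cells/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


# Hypothetical counts: a modest neutrophil increase combined with a
# modest lymphocyte decrease produces a larger relative change in NLR.
baseline = nlr(neutrophils=4.0, lymphocytes=2.0)
shifted = nlr(neutrophils=4.4, lymphocytes=1.8)
relative_change = (shifted - baseline) / baseline
```

In this toy example a 10% change in each count yields a relative change in NLR of roughly 22%, which shows how the ratio amplifies small opposing shifts in the two cell types.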
Epidemiological studies have revealed that viral (e.g. influenza, HSV, hepatitis) and bacterial (e.g. C. pneumoniae and H. pylori) infections are associated with an increased risk of developing PD28,36–39. Inflammatory conditions, such as head trauma, allergic rhinitis and exaggerated allergic reactions following insect stings, have been linked to an increased risk of developing PD40–43. Neuroinflammation is also a common pathological hallmark seen in the PD brain44–47. Conversely, long-term use of non-steroidal anti-inflammatory drugs (NSAIDs) reduces the risk of developing PD48–51. Our analysis clearly demonstrates a protective effect of ibuprofen use in the UKB participants, which was more pronounced at higher NLR.
The reduction in lymphocyte count well before PD in our study is consistent with another recent analysis using the UKB dataset (thus validating our approach)12, as well as a meta-analysis that showed decreased numbers of CD3+ and CD4+ lymphocyte subsets in intermediate and late-stage PD, whilst a decrease in CD8+ T lymphocytes was also observed52. It is interesting to observe that this reduction in lymphocyte count occurs up to 10 years prior to diagnosis in males, and it may therefore be a better prospective marker in men. It is noteworthy that ‘suffers from nerves’ was also a highly ranked risk factor in the IDEARS model (8th overall), and therefore the PD group may have higher-than-average stress levels, which could depress lymphocyte counts. More detailed analyses of CD4+ T lymphocyte subsets suggest that they are skewed towards proinflammatory phenotypes (i.e., increased Th1 and Th17, and reduced Th2 and Tregs) in PD patients53–55. The inflammatory milieu in PD is a likely contributor to the decreased IGF-1 signalling mentioned previously5,19,20. Overall, these findings imply that a predisposition to PD may be established by conditions that induce peripheral inflammation (injury/infection) and stress, or in individuals with an immune system skewed towards inflammation.
Given that PD is an age-related motor disease, it was unsurprising that the IDEARS model identified several features associated with frailty and cognitive function. Reduced hand grip strength and decreased walking pace can be considered early markers of motor dysfunction, and given that they are significantly reduced in both sexes 10 years before diagnosis, they should be considered as useful clinical measures to predict the risk of PD onset. Existing literature has identified the importance of these factors. Hand grip strength and reduced dexterity have been reported as predictors of motor symptom severity in PD56. Slow walking speed has been correlated with both advanced age and PD severity57, and it is also one of the first complaints in the early stages of the disease58. An increased number of ICD conditions at baseline, an increased number of medications/treatments taken and reduced forced vital capacity were also apparent 10 years before diagnosis in both sexes; they are likely indicators of general ill health and multiple co-morbidities in PD patients. Arthritis, hypertension, atrial fibrillation, depression, back problems and cataracts are commonly reported co-morbidities of PD32,59, and require a wide range of treatments.
Other significant gender differences were observed, with parental PD being more important for men than women. Since idiopathic PD has a phenotype that strongly overlaps with monogenic forms of the disease60, this may suggest a greater genetic component in idiopathic PD in males. In contrast, platelet count, vitamin D and testosterone (normalised by gender) were more important for women. Vitamin deficiency has been linked to neurodegenerative diseases, and a deficiency in vitamin D in particular has been linked to reduced dopamine levels and alpha-synuclein accumulation, which are pathological hallmarks of PD61. Vitamin D has been shown to have neuroprotective, anti-inflammatory and antioxidant effects in vitro62; however, a recent meta-analysis could not conclude that vitamin D supplementation has clear benefits in reducing PD risk63.
The application of a novel methodology in the IDEARS pipeline has enabled us to examine a much larger range of variables without a priori assumptions. The advantage of using XGBoost and SHAP in this context lies in the ability to consider a large number of variables and accurately determine their importance in the model while implicitly modelling interactions between variables, resulting in a demonstrably higher AUC. The disadvantage is the black-box nature of this approach. We have sought to mitigate this by providing a separate univariate analysis of individual variables. In addition to determining the most significant risk factors driving PD, this approach could be used separately to provide a risk score that would be more accurate than existing methods.
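To make the idea of ranking features by mean SHAP score concrete, the sketch below orders features by their mean absolute SHAP value across participants. It assumes the ranking uses the mean of absolute values, which is a common convention but is not confirmed by the text; the function name, feature names and SHAP values are invented for illustration, and the code does not reproduce the IDEARS pipeline or call the XGBoost or SHAP libraries.

```python
from statistics import fmean


def rank_by_mean_abs_shap(feature_names, shap_rows):
    """Rank features by mean absolute SHAP value.

    shap_rows holds one row per participant and one SHAP value per
    feature, in the same order as feature_names.
    """
    n_features = len(feature_names)
    if any(len(row) != n_features for row in shap_rows):
        raise ValueError("every row needs one SHAP value per feature")
    scores = {
        name: fmean(abs(row[j]) for row in shap_rows)
        for j, name in enumerate(feature_names)
    }
    # Largest mean |SHAP| first, i.e. most influential feature first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values for three hypothetical features and three participants.
features = ["age", "sex", "igf1"]
shap_rows = [
    [0.50, -0.20, 0.10],
    [-0.40, 0.10, 0.15],
    [0.60, -0.30, -0.05],
]
ranking = rank_by_mean_abs_shap(features, shap_rows)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so a feature that strongly raises risk in some participants and lowers it in others still ranks as important.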