In this study, we identified 28 significant MS genetic loci associated with the risk of worsening disability over time. Using these loci, we developed and validated simple, learnable, and robust ensemble genetic machine learning models to predict disability progression and to identify PwMS prone to future disability accumulation. However, little is currently known about the functional implications of these common SNPs.
Different estimates of the variance in disability progression explained by MS genetic loci have been reported in prior studies.5,6,8,44−47 For instance, using 125 early MS cases with 5 years of follow-up from our cohort, Pan et al.6 constructed a genetic risk score from 7 of 116 MS SNPs3,4 that explained 32.7% of the variance in annualised EDSS, but did not validate their findings externally; whereas Jackson et al.5 developed an RF-based genetic model of MS disease severity scores (MSSS), which included 19 of ~200 autosomal SNPs3,4 and explained 21% of the variability in MSSS, with just a 4% chance of validating their results externally. However, if we assume that the underlying assumptions made about the disability process were correct, then a common drawback of these studies is not the variability explained, but rather the utility and reliability/robustness of the identified associations and of the derived predictions in clinical practice.
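As a point of reference for the genetic risk scores discussed above, such scores are commonly computed as a weighted sum of risk-allele counts. The following is a minimal sketch of that general idea only; the SNP labels, weights, and the function name `genetic_risk_score` are hypothetical and are not taken from the cited studies.

```python
def genetic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per SNP)."""
    missing = set(weights) - set(allele_counts)
    if missing:
        raise ValueError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[snp] * allele_counts[snp] for snp in weights)


# Hypothetical SNP labels and per-allele weights (illustrative values only).
weights = {"snp_a": 0.25, "snp_b": 0.5, "snp_c": 0.125}
# Hypothetical genotype of one patient, coded as risk-allele counts.
patient = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

score = genetic_risk_score(patient, weights)
```

In practice, the weights would typically come from per-SNP effect estimates, and the resulting score would then be regressed against the disability outcome to obtain the variance explained.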
In our study, we made a considered and clinically plausible Markov assumption (i.e., that future disability is predicated on the prior disability history) to study the disability progression process in MS, and employed robust MEML ensembles to predict future disability outcomes. Of the 28 genetic loci from the IMSGC list of known MS risk loci, 7 were non-functional SNPs and were identified as having the greatest effect. However, as with MS risk, it is very difficult to provide actual biological mechanisms for the identified SNP associations, other than to regard them as non-specific genetic markers of disability progression.
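To illustrate what a Markov assumption implies in its simplest form, the sketch below estimates first-order transition probabilities between disability states from observed state sequences. It is a deliberately simplified, discrete-time illustration and not the MEML ensemble used in this study; the state labels, the toy sequences, and the function name `estimate_transitions` are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (current state, next state) pair.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalise counts so each row of probabilities sums to one.
        probs[current] = {s: n / total for s, n in next_counts.items()}
    return probs


# Hypothetical coarse disability states for three patients (illustrative only).
toy_sequences = [
    ["mild", "mild", "moderate", "moderate"],
    ["mild", "moderate", "severe"],
    ["mild", "mild", "mild"],
]

transitions = estimate_transitions(toy_sequences)
```

Under this simplification, the probability of the next state depends only on the current state; a continuous-time formulation would instead model transition rates between visits at irregular intervals.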
Instead of relying on the complex predictions generated by the ensembles to make prediction decisions, we present here simple and transparent relational rule sets (see Web Appendix C3) that could be translated to aid existing clinical predictions,10,15,17,19 or clinical research studies, via a web application delivering the same prediction accuracy as the original ensemble. Clinicians could use these rules (provided genotyping was available) alongside recent clinical predictions10,15,17,19 to identify MS cases that are at greater risk of disability accrual in the short and medium term, and to institute more aggressive MS therapies where indicated.9
The high intra-class correlations between the observed and ensemble-derived predicted probabilities of worsening, conditional on the SNP associations, indicated a good model fit. The p-values obtained for these correlations (all \(p \le 5.2 \times 10^{-10}\)) were far smaller than those recently reported,5,47 suggesting a near 100% chance of replicating our results in an external MS population.
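For readers unfamiliar with the intra-class correlation, the sketch below computes one common variant, the one-way random-effects ICC(1,1), for paired observed and predicted probabilities. Which ICC form was used in this study is not stated here, so the choice of variant is an assumption, and the function name `icc_one_way` and the toy values are hypothetical.

```python
def icc_one_way(pairs):
    """One-way random-effects ICC(1,1) for n subjects with k = 2 values each."""
    n = len(pairs)
    k = 2
    subject_means = [(a + b) / k for a, b in pairs]
    grand_mean = sum(subject_means) / n
    # Between-subject mean square from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (disagreement inside each pair).
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, subject_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical (observed, predicted) probabilities of worsening.
toy_pairs = [(0.10, 0.12), (0.50, 0.47), (0.90, 0.88)]
icc = icc_one_way(toy_pairs)
```

Values close to 1 indicate that the predicted probabilities agree closely with the observed ones relative to the spread between subjects.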
The strengths of this study lie in the assumptions we made regarding the underlying disability process in MS (defined above), and in the use of novel machine learning platforms capable of analysing the longitudinal evolution of EDSS scores. By analysing the continuous-time evolution of EDSS transitions, the genetic variance in disability progression was significantly increased compared to other studies.5,6,8,46 However, we recognise limitations in our study. Firstly, our genetic ensemble lacks genome-wide coverage and does not account for epistatic interactions amongst the MS genetic loci used. A genome-wide analysis to identify novel SNP associations that are not MS related could be a fruitful area of future research. Secondly, we lack an external validation cohort (an external MS population) that matches our prospective, data-dense AUSLONG cohort with genotyping available. Thirdly, as emphasised above, the genetic variants utilised here do not have any described biological effect, making it difficult to elucidate the mechanisms underlying MS progression from these data.
In conclusion, our study provides simple, learnable, interpretable, and robust ensemble genetic machine learning models that aggregate association evidence from 28 candidate MS genetic markers to predict future worsening of disability in PwMS. Our ensembles provided decision rules which could be translated to provide additional prognostic value to existing clinical prediction models,10,15,17,19 with the additional benefit of incorporating relevant genetic information into clinical decision making for PwMS. Finally, modelling the continuous-time evolution of EDSS increased the variance in disability progression that is genetically determined.