Quantitative longitudinal predictions of Alzheimer's disease by multi-modal predictive learning

Background: Quantitatively predicting the progression of Alzheimer’s disease (AD) in an individual on a continuous scale, such as AD assessment scale-cognitive (ADAS-cog) scores, is more informative for a personalized approach than qualitatively classifying the individual into a broad disease category. We hypothesize that multi-modal data and predictive learning models can be employed to longitudinally predict ADAS-cog scores. Methods: Multivariate regression techniques were employed to model the relationship between baseline multi-modal data (demographics, neuroimaging and cerebrospinal fluid based markers, and genetic factors) and future ADAS-cog scores. Prediction models were subjected to repeated cross-validation, and their mean absolute error and cross-validated correlation were assessed. Results: Prediction models based on multi-modal data outperformed those based on single-modal data at all time points up to 36 months. Incorporating baseline ADAS-cog scores into the prediction models marginally improved predictive performance. Conclusions: Future ADAS-cog scores were successfully estimated via predictive learning. Such estimates could aid clinicians in identifying those at greater risk of decline, in applying interventions at an earlier disease stage, and in informing likely future disease progression in individuals enrolled in AD clinical trials.

The progressive nature of AD makes assigning an individual to any one of the discrete diagnostic groups a challenging proposition [10,11]. Conventional progression tracking analyzes clinical changes in MRI, CSF, and cognitive biomarkers [12,13], but this can be inefficient because the changes may be slow and difficult to detect [14,15]. Moreover, these biomarkers change nonlinearly as AD progresses, further complicating longitudinal tracking. Therefore, quantifying and tracking the patient's condition with continuous measures such as ADAS-cog scores has been advocated [16,17]. ADAS-cog is widely used clinically (to measure language, memory, praxis, and other cognitive abilities) and provides an accurate description of the cognitive state on a continuous scale, making it an ideal choice for our study [18,19]. The availability of standardized multi-modal data and corresponding longitudinal ADAS-cog scores from research organizations, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) project, has enabled the development of novel machine learning techniques for tracking AD progression [20].
However, predicting ADAS-cog scores has been reported to be very difficult [21]. In the recent Alzheimer's Disease Prediction of Longitudinal Evolution (TADPOLE) Challenge (https://tadpole.grand-challenge.org/), forecasts of clinical diagnosis and ventricle volume were very good, whereas for ADAS-cog no participating team was able to generate forecasts that were significantly better than chance.
Multivariate regression techniques, such as partial least squares regression (PLSR), support-vector regression (SVR), and random forest regression, enable modeling of complex relationships between baseline multi-modal ADNI data (predictors) and future ADAS-cog 13 scores [22,23]. The multivariate nature of the modeling suits ADAS-cog score trajectory analysis because the different AD measures carry complementary information. The resulting trajectory predictions could help clinicians prescribe appropriately (once disease-modifying interventions are available). Moreover, knowing the likely future trajectory of the disease would provide a benchmark against which to test clinical evolution in patients enrolled in clinical trials.
We hypothesized that multivariate regression techniques are well suited to multi-factorial diseases and that the progression of AD, as indicated by ADAS-cog scores at subsequent time points, can be accurately predicted. We further hypothesized that including baseline ADAS-cog scores would improve the model's predictions at subsequent follow-ups.

ADNI Dataset
Data in this study were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). In addition to the various summary tables provided directly by ADNI, we used summary tables prepared from ADNI data for the TADPOLE grand challenge (https://tadpole.grand-challenge.org) [21,24]. Data are from the TADPOLE tables unless otherwise stated. Specific variable names are provided in supplementary Table S.1. The ADNI project started in 2003 as a public-private partnership, led by PI Michael W. Weiner, MD. The main objective of ADNI is to evaluate the application of serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment in a multi-modal approach to determine the longitudinal progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). We used pre-processed ADNI data because the standardized processing pipeline ensured data quality. These multimodal data are readily available to other researchers, enabling direct comparison of study results.

Subjects
The characteristics of subjects recruited in the ADNI dataset are described in detail at http://adni.loni.usc.edu/. The trends of the ADAS-cog 13 scores used in this study are provided in Figure S.1, and the details of subject characteristics are provided in Table S.2 of the supplementary section. There are fewer subjects at follow-up visits than at the baseline visit due to subject attrition and missing data. Note that some subjects changed diagnostic status over the follow-up period. The roster identification (RID) numbers of the included subjects are provided as comma-separated values in the supplementary section.

MRI
We used nine MRI features: intracranial volume (ICV), and volumes of the hippocampus, entorhinal cortex, and lateral ventricles as well as the latter four divided by the ICV. These features were selected based on previous studies [25]. We included volumes divided by the ICV because it is unclear whether raw or ICV-corrected volumes are better predictors of dementia [25,26]. MR imaging protocol details are provided by ADNI at http://adni.loni.usc.edu/methods/mri-tool/mri-analysis/. Cortical reconstruction and volumetric segmentation were performed with the FreeSurfer 5.1 image analysis suite. A brief description of the processing is provided in the supplementary material (Section B) [27].
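The ICV correction described above amounts to dividing each regional volume by the subject's ICV while keeping the raw volumes as separate features. A minimal Python sketch, assuming a hypothetical per-subject dictionary of volumes; the key names and numbers are illustrative only and are not the TADPOLE column names listed in Table S.1:

```python
def add_icv_ratios(volumes, regions, icv_key="ICV"):
    """Return a copy of `volumes` with an extra '<region>/ICV' entry per region.

    Raw volumes are kept alongside the ratios, mirroring the choice to use
    both raw and ICV-corrected volumes as predictors.
    """
    icv = volumes[icv_key]
    if icv <= 0:
        raise ValueError("ICV must be positive")
    features = dict(volumes)  # keep raw volumes (and ICV itself)
    for region in regions:
        features[f"{region}/ICV"] = volumes[region] / icv
    return features


# Illustrative numbers only (mm^3), not ADNI data; key names are hypothetical.
subject = {"ICV": 1_500_000.0, "Hippocampus": 7_500.0,
           "Entorhinal": 3_600.0, "Ventricles": 30_000.0}
feats = add_icv_ratios(subject, ["Hippocampus", "Entorhinal", "Ventricles"])
print(feats["Hippocampus/ICV"])  # 0.005
```

Keeping both representations lets the downstream regression decide which form carries more predictive signal, rather than fixing that choice in preprocessing.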

AV-45 PET
As AV-45 PET features, we used standardized uptake values (SUVs) in four regions: frontal cortex, cingulate, lateral parietal cortex, and lateral temporal cortex. The AV-45 PET measures amyloid-beta load in the brain. AV-45 PET imaging and preprocessing details are available at http://adni.loni.usc.edu/methods/pet-analysis-method/pet-analysis/ [28]. We used regional SUV ratios processed according to the UC Berkeley protocol [28][29][30]. Each AV-45 PET scan was co-registered to the corresponding MRI and the mean AV-45 uptake within the regions of interest and reference regions was calculated. Regions of interest were composites of frontal regions, anterior/posterior cingulate regions, lateral parietal regions, and lateral temporal regions [31]. The final PET measurements were the average amyloid-beta uptakes in the four ROIs normalized by the whole cerebellum reference region.
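The normalization step above divides the mean uptake in each region of interest by the mean uptake in the whole-cerebellum reference region. The sketch below is a simplified illustration under that description only; it is not the UC Berkeley pipeline, and the ROI labels and toy values are hypothetical:

```python
from statistics import fmean


def regional_suvr(roi_uptake, reference_uptake):
    """SUV ratio per ROI: mean ROI uptake / mean reference-region uptake.

    roi_uptake maps an ROI name to a list of uptake values sampled in that
    ROI; reference_uptake holds values from the whole-cerebellum reference.
    """
    ref = fmean(reference_uptake)
    if ref <= 0:
        raise ValueError("reference uptake must be positive")
    return {roi: fmean(values) / ref for roi, values in roi_uptake.items()}


# Toy values chosen to be exact in binary floating point; not ADNI data.
rois = {"frontal": [1.25, 1.75], "cingulate": [1.5, 2.5],
        "parietal": [1.0, 1.5], "temporal": [1.25, 1.25]}
suvr = regional_suvr(rois, reference_uptake=[1.0, 1.0])
print(suvr)  # {'frontal': 1.5, 'cingulate': 2.0, 'parietal': 1.25, 'temporal': 1.25}
```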

FDG PET
As FDG-PET features, we used average SUVs in five brain regions: bilateral angular gyri, bilateral posterior cingulate gyri, and bilateral inferior temporal gyri. FDG PET measures glucose consumption and has been shown to be strongly related to dementia and cognitive impairment when compared with normal control subjects [30,32,33]. Motion correction and co-registration with MRI were performed on the acquired PET data. The highest 50% of voxel values within a hand-drawn pons/cerebellar vermis region were selected, and their mean was used to normalize each ROI measurement, yielding the final FDG PET measurements.
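The reference normalization described above (the mean of the highest 50% of voxels in the pons/vermis region, used as the divisor for every ROI) can be sketched as follows. This is an illustration under stated assumptions: how an odd voxel count is split is not specified in the text, so the sketch includes the middle value (ceil(n/2) voxels), and the ROI labels are hypothetical:

```python
from statistics import fmean


def top_half_mean(values):
    """Mean of the highest 50% of values.

    For an odd count, the middle value is included (ceil(n/2) values);
    this rounding rule is an assumption of the sketch.
    """
    if not values:
        raise ValueError("reference region is empty")
    ranked = sorted(values, reverse=True)
    k = (len(ranked) + 1) // 2
    return fmean(ranked[:k])


def fdg_normalize(roi_means, reference_voxels):
    """Divide each ROI mean by the top-50% mean of the reference voxels."""
    ref = top_half_mean(reference_voxels)
    return {roi: value / ref for roi, value in roi_means.items()}


# Toy numbers; ROI labels are hypothetical.
pons_vermis = [1.0, 2.0, 3.0, 4.0]          # top half: 4.0, 3.0 -> mean 3.5
rois = {"angular_left": 7.0, "temporal_right": 5.25}
print(fdg_normalize(rois, pons_vermis))     # {'angular_left': 2.0, 'temporal_right': 1.5}
```

Using only the brightest half of the reference voxels makes the divisor less sensitive to partial-volume voxels at the edge of a hand-drawn region.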

CSF proteins
The baseline CSF Aβ42, t-tau, and p-tau levels were used as CSF features [34]. CSF was collected in the morning after an overnight fast using a 20- or 24-gauge spinal needle, frozen within 1 hour of collection, and transported on dry ice to the ADNI Biomarker Core laboratory at the University of Pennsylvania Medical Center.

Neuropsychology and behavioral (NePB) assessments
The NePB assessments reflect the cognitive abilities of the subjects. Subjects underwent a battery of NePB tests [35]. We included five NePB scores as NePB features: the summary score from the Mini-Mental State Examination (MMSE) [36], three summary scores of the Rey Auditory Verbal Learning Test (RAVLT; learning, immediate, and percent forgetting) [37], and a summary score from the Functional Activities Questionnaire (FAQ) [38].

Risk factors: age, education, and APOE
Past studies have identified several risk factors contributing to AD [8]. We considered age, the number of APOE e4 alleles, and years of education. Cognitive decline with normal aging is an accepted phenomenon, but lower education and lower cerebral metabolic activity could accelerate this decline [39]. The APOE e4 allele, present in approximately 10-15% of people, increases the risk for late-onset AD and lowers the age of onset. One copy of e4 (e3/e4) can increase risk by 2-3 times, while homozygotes (e4/e4) can be at 12 times increased risk [40]. We coded APOE e4 status as 0 (absent), 1 (single copy), or 2 (homozygous).
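The 0/1/2 coding above is simply the count of e4 alleles. If genotypes were available as strings such as "e3/e4" (a hypothetical input format; the summary tables may already store the count directly), the coding could be sketched as:

```python
def apoe4_count(genotype):
    """Number of APOE e4 alleles (0, 1 or 2) in a genotype like 'e3/e4'."""
    alleles = [a.strip() for a in genotype.strip().lower().split("/")]
    if len(alleles) != 2 or any(a not in {"e2", "e3", "e4"} for a in alleles):
        raise ValueError(f"unrecognized APOE genotype: {genotype!r}")
    return sum(a == "e4" for a in alleles)


for g in ("e3/e3", "e3/e4", "e4/e4", "E2/e4"):
    print(g, apoe4_count(g))
# e3/e3 0
# e3/e4 1
# e4/e4 2
# E2/e4 1
```

Treating this count as a continuous predictor, as the study does for all variables, implicitly assumes a dose-like effect from 0 to 2 alleles.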

ADAS-cog scores
The ADAS-cog 11 task scale was developed to assess the efficacy of anti-dementia treatments. Further developments shifted the scale's sensitivity towards pre-dementia syndromes as well, primarily mild cognitive impairment (MCI). The ADAS-cog 13 task scale was one such improvement on the original ADAS-cog 11, with additional memory and attention/executive function tasks [41]. The final 13 tasks test verbal memory (3 tasks), clinician-rated perception (4 tasks), and general cognition (6 tasks). ADAS-cog 13 was found to discriminate better than ADAS-cog 11 between MCI and mild AD patients and to have better sensitivity to treatment effects in MCI [42]. Because the ADAS-cog 13 fully encompasses the ADAS-cog 11 tasks, it is also backward compatible. We therefore used the ADAS-cog 13 scale in our study as a continuous quantitative measure of a subject's disease status. Scores at baseline and at the 12-, 24-, and 36-month time points were obtained from the ADNI dataset (Table S.2). Scores range from 0 to 85; they are lowest for the normal control group, increase with disease progression, and are highest for AD subjects.

Multivariate regression analysis
We employed multivariate regression to predict ADAS-cog scores from the predictor variables detailed in Section 2.1. We considered four prediction tasks: predicting the ADAS-cog score at baseline and at 12, 24, and 36 months after baseline. In all of these tasks, the predictor variables are from the baseline visit. The features (predictors) used for regression are denoted by the column vectors $X_i$ ($i = 1, 2, \dots, L$), where $L$ is the number of features (Figure S.2). The ADAS-cog scores (the dependent or response variable) are denoted by the column vector $Y$.
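The four tasks above share the same baseline predictors and differ only in which visit supplies the target, so each task keeps only subjects who have that visit. A minimal sketch, assuming a hypothetical record layout and a complete-case rule (the text attributes the shrinking sample sizes to attrition and missing data but does not spell out the handling of missing predictors):

```python
import math

HORIZONS = (0, 12, 24, 36)  # months after baseline: the four prediction tasks


def _is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_task(subjects, feature_names, horizon):
    """Return (X, y) for one prediction task.

    Predictors always come from the baseline visit; the target is the
    ADAS-cog 13 score `horizon` months later. Subjects missing any
    predictor or the target are dropped (complete-case assumption).
    """
    X, y = [], []
    for subj in subjects:
        row = [subj["baseline"].get(name) for name in feature_names]
        target = subj["adas13"].get(horizon)
        if _is_missing(target) or any(_is_missing(v) for v in row):
            continue
        X.append(row)
        y.append(target)
    return X, y


# Two hypothetical subjects; the second has no usable 24- or 36-month score.
subjects = [
    {"baseline": {"age": 74.0, "MMSE": 27.0}, "adas13": {0: 12.0, 12: 14.0, 24: 15.0, 36: 18.0}},
    {"baseline": {"age": 68.0, "MMSE": 29.0}, "adas13": {0: 8.0, 12: 9.0, 24: float("nan")}},
]
for h in HORIZONS:
    X, y = build_task(subjects, ["age", "MMSE"], h)
    print(h, len(y))
# 0 2
# 12 2
# 24 1
# 36 1
```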
We employed widely used machine learning techniques, namely partial least squares regression (PLSR) [43], support vector regression (SVR) [44], and random forest regression (RF) [45], to create prediction models. Additionally, a genetic algorithm (GA) was used to rank the variables in order of importance in the multi-modal case [46]. Details of these methods are provided in the supplementary section.
Figure 1: Schematic of regression modeling. X denotes single- or multi-modal predictors and Y is the target value to be predicted. We used 5-fold cross-validation repeated 10 times to account for the random assignment of subjects to different folds. Partial least squares regression (PLSR), support-vector regression (SVR), and random forest regression (RF) models were trained and tuned on the training folds and evaluated on the test folds. The MATLAB functions used and their hyperparameter tuning are shown in italics. Cross-validated correlation (ρ) and mean absolute error (MAE) metrics were employed, and the average performance over the 10 runs was computed.
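The repeated 5-fold scheme in Figure 1 can be illustrated with a small index generator: each repeat reshuffles the subjects before splitting, so every subject is tested exactly once per repeat while fold membership changes between repeats. This is a plain-Python sketch, not MATLAB's partitioning code, and the function name is hypothetical:

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the subject indices before cutting them into
    n_splits folds of near-equal size, so every subject is tested exactly
    once per repeat while fold membership changes between repeats.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(n_repeats):
        rng.shuffle(indices)
        folds = [sorted(indices[f::n_splits]) for f in range(n_splits)]
        for f, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(n_samples) if i not in held_out]
            yield r, f, train_idx, test_idx


splits = list(repeated_kfold(n_samples=12, n_splits=5, n_repeats=10))
print(len(splits))                             # 50 train/test splits in total
print(sorted(len(s[3]) for s in splits[:5]))   # fold sizes in repeat 0: [2, 2, 2, 3, 3]
```

Averaging over repeats reduces the dependence of the reported metrics on any single random fold assignment.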

Regression modeling and performance metrics
ADAS-cog scores (at baseline, 12 months, 24 months, and 36 months) were predicted using PLSR, SVR, and RF. Both single-modal predictors (each modality of Section 2.1 alone) and multimodal predictors (all modalities of Section 2.1 combined) were considered. All the predictors were from the baseline visit. We evaluated the prediction models using 5-fold cross-validation with 10 repeats (see Figure 1 and Figure S.2). All variables were assumed to be continuous and were standardized to zero mean and unit standard deviation. The models were evaluated in terms of the correlation coefficient (ρ) and the mean absolute error (MAE) between the actual ADAS-cog 13 scores and the model-predicted values. For each 5-fold cross-validation, we averaged the five resulting correlation values and computed 95% confidence intervals (CIs) using bootstrapping. MAE and its CIs were computed similarly. The process was repeated 10 times and the distribution of the results analyzed. For mathematical details of these performance metrics, as well as the CI computation for repeated cross-validation that takes into account the inter-dependency of distinct repeats, readers may consult Lewis et al. [47].
The analyses were performed in MATLAB R2018b (The MathWorks Inc., Natick, MA) using native machine learning functions. PLSR was executed with the plsregress function, and the optimal number of PLS components was manually selected based on the least root mean square error for the training data [48]. SVR was executed with fitrsvm and RF with fitrensemble; in both cases the models were tuned by setting the OptimizeHyperparameters argument to 'auto' [49,50]. Additionally, GA-PLS was used to analyze the importance of each modality in the multimodal PLSR models [51].
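The two metrics and the bootstrap step can be sketched in plain Python. This is a simplified illustration: the percentile bootstrap below treats the per-fold values as independent and does NOT implement the dependence-aware CI of Lewis et al. that the text cites. Function names and numbers are hypothetical:

```python
import math
import random
from statistics import fmean


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted scores."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def pearson(x, y):
    """Pearson correlation coefficient (rho) of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `values` (e.g. per-fold rho).

    Resamples treat the values as independent, i.e. no correction for
    the dependence between cross-validation repeats is applied.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Hypothetical held-out predictions for one fold.
y_true = [10.0, 14.0, 20.0, 25.0]
y_pred = [12.0, 13.0, 19.0, 27.0]
print(mae(y_true, y_pred))  # 1.5

# Hypothetical per-fold correlations from one 5-fold run.
fold_rhos = [0.84, 0.87, 0.85, 0.88, 0.86]
print(fmean(fold_rhos), bootstrap_ci(fold_rhos))
```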
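To make the PLSR step concrete, below is a minimal single-response PLS sketch in plain Python. It is not the authors' code: MATLAB's plsregress uses the SIMPLS algorithm, whereas this sketch uses the NIPALS formulation. It standardizes predictors with training-set statistics (an assumption about where the standardization happens) and leaves the choice of n_components to the caller. The class PLS1 is hypothetical:

```python
from statistics import fmean, pstdev


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class PLS1:
    """Single-response PLS regression via NIPALS on standardized predictors."""

    def __init__(self, n_components):
        self.n_components = n_components

    def _standardize(self, row):
        return [(v - m) / s for v, m, s in zip(row, self.x_mean, self.x_std)]

    def fit(self, X, y):
        cols = list(zip(*X))
        # Standardization statistics come from the training data only.
        self.x_mean = [fmean(c) for c in cols]
        self.x_std = [pstdev(c) or 1.0 for c in cols]  # guard constant columns
        self.y_mean = fmean(y)
        E = [self._standardize(row) for row in X]      # X residual matrix
        f = [v - self.y_mean for v in y]               # y residual vector
        self.W, self.P, self.q = [], [], []
        for _ in range(self.n_components):
            w = [sum(row[j] * fi for row, fi in zip(E, f)) for j in range(len(cols))]
            norm = _dot(w, w) ** 0.5
            if norm < 1e-12:
                break  # nothing left to explain in y
            w = [v / norm for v in w]                  # weight vector
            t = [_dot(row, w) for row in E]            # scores
            tt = _dot(t, t)
            p = [sum(row[j] * ti for row, ti in zip(E, t)) / tt for j in range(len(cols))]
            q = _dot(f, t) / tt                        # y loading
            E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
            f = [fi - q * ti for fi, ti in zip(f, t)]  # deflate X and y
            self.W.append(w)
            self.P.append(p)
            self.q.append(q)
        return self

    def predict(self, row):
        x = self._standardize(row)
        y_hat = self.y_mean
        for w, p, q in zip(self.W, self.P, self.q):
            t = _dot(x, w)
            y_hat += q * t
            x = [xi - t * pi for xi, pi in zip(x, p)]  # same deflation as in fit
        return y_hat


# Noise-free toy data with y = 2*x1 - 3*x2 + 5 (not ADNI data).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]]
y = [2 * a - 3 * b + 5 for a, b in X]
model = PLS1(n_components=2).fit(X, y)
print(round(model.predict([6.0, 2.0]), 6))  # 11.0
```

With as many components as predictors, PLS reduces to ordinary least squares, which is why the noise-free toy example is recovered exactly; with fewer components the model trades fit for stability, which is what makes the component count worth tuning.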

Results
As depicted in Figure 1, we created single-modality and multi-modal regression models and compared their performance. The comparison (Figure 3) shows that multi-modal prediction models consistently outperformed single-modality models at all time points (baseline and the subsequent 12-, 24-, and 36-month follow-ups) when all subjects were tested together (i.e., pooled across diagnostic categories). The correlation between the ADAS-cog 13 scores predicted from multi-modal data and those observed at 12, 24, and 36 months reached 0.86, 0.82, and 0.75, respectively. The performance comparison (Figure S.3) shows that the differences among PLSR, SVR, and RF were not significant (p > 0.05), except for some instances where PLSR underperformed compared with RF (baseline and 12 months: MRI, CSF, and FDG; 24 and 36 months: APOE and multi-modal). However, PLSR models were computationally faster and performed consistently.
By analyzing the importance of measures (Figure 2) contributing to PLSR's correlation, we observe that the neuropsychological and behavioral parameters (NePB) were the most important and the most consistent across time periods for predicting ADAS-cog scores, followed by CSF and MRI biomarkers. Despite the association of age at baseline, years of education (Edu.), and APOE e4 status with AD risk, these parameters were found to be the least important, perhaps because these factors are already reflected in other parameters. By contrast, the importance of amyloid and τ increased when predictions were made 36 months in advance (Figure 2). Additionally, metabolic activity in the right and left temporal regions ranked at opposite ends of importance in the ADAS-cog score predictions.

Grouping the data by baseline diagnosis (Figure 4) further magnified the poor correlations obtained when a single modality was used to predict this multifactorial disease. Among single modalities, NePB data showed the best predictive performance, in keeping with the fact that the predicted variable (ADAS-cog 13) itself consists of neuropsychological test outcomes. However, the multimodal approach performed better in the MCI and AD groups, especially at the 24- and 36-month time points. Because of the high variation in ADAS-cog scores in the AD group, the correlation (ρ) and MAE were not inversely related.

Our multimodal (multivariate) prediction models that included baseline ADAS-cog scores performed better (ρ = 0.80 to 0.90, Figure 5) than prediction models based only on baseline ADAS-cog scores (univariate, ρ = 0.75 to 0.87). Including the baseline ADAS-cog score alongside the other baseline multi-modal predictors improved the correlations (p = 0.002 to 0.18). Overall, the prediction models predicted well across the time periods, as can be seen by comparing the mean predicted values with the actual mean values (Figure 6).

Discussion
We present a multi-modal regression approach to quantitatively track the progression of Alzheimer's disease and show that it outperforms the conventional single-modal approach. Quantifying AD progression can aid clinicians in treatment decisions, and a multi-modal approach allows the prediction models to consider multiple biomarkers contributing to the disease condition. Furthermore, the conventional classification of patients as normal, MCI, or AD could be avoided, since drawing a clear distinction among these groups is challenging [52].
The classification of subjects based on a few modalities has been the focus of most recent studies. Although high classification accuracy (>80%) has been reported [11], we speculate that assigning a subject to the wrong category (and hence prescribing the wrong therapy) has a greater impact than an error in predicted ADAS-cog scores (<5 units). Additionally, ADAS-cog scores are easy to interpret and are well suited to longitudinal tracking of AD progression. In agreement with classification-based studies [4], the multi-modal approach outperformed single modalities; in this study, however, multi-modal data were used to predict ADAS-cog scores. Furthermore, our multi-modal approach shows that ADAS-cog scores are amenable to longitudinal prediction, in contrast to Marinescu et al. [21], who concluded that ADAS-cog scores were not predictable. We acknowledge, however, that the two studies were not set up identically: they differed in time constraints and subjects, and longitudinal data were underutilized.
Clinically, NePB tests and ADAS-cog scores both measure the subject's cognitive abilities, and this similarity was reflected in the higher correlations observed for NePB predictors (Figure 3). CSF biomarkers also showed high correlations, consistent with several studies supporting a strong relationship between CSF biomarkers and AD state [34]. As the precise pathophysiology and the relative contributions of different pathogenic factors to AD at different phases of disease progression are still under investigation, the results suggest that a multi-modal approach is preferable to manually choosing the best markers. We acknowledge, however, that variable selection methods could be used to select the best AD measures (or to create sparse models) for multimodal modeling, which could further improve the robustness of the prediction model.

Limitations
The multivariate techniques (i.e., PLSR, SVR, and RF) performed very similarly in their predictions, but their computation times differed, which led us to favor PLSR. Other nonlinear model selection techniques could improve the current results [53]. Subject attrition during follow-up may also have diminished the predictive performance of the model.

Conclusion
ADAS-cog 13 scores reflect the current cognitive state of individuals, and our results show that, through multivariate regression on a multi-modal dataset, quantitative longitudinal prediction of AD progression is possible. Thus, the automated multi-modal approach may help clinicians make timely decisions about interventions at all stages of AD and inform likely disease progression at the start of clinical trials.

Declarations
Ethical Approval and Consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
Readers are directed to www.adni-info.org for detailed information on the ADNI project, and to https://tadpole.grand-challenge.org for the TADPOLE challenge, constructed by the EuroPOND consortium (http://europond.eu). The main code and the resulting .mat file are available on GitHub: https://github.com/mithp/ADAS_multimodal.git

Competing interests
The authors have no conflicts of interest related to the execution of this study and the preparation of the manuscript.

Single modality uses one predictor at a time, while multi-modal uses all the predictors indicated above. The sample sizes at baseline (N = 757), 12 months (N = 629), 24 months (N = 563), and 36 months (N = 314) differed because of missing values (cohort attrition). The predictors consist of age at baseline, years of formal education (Edu.), APOE e4 status (absence, single copy, or homozygous, coded as 0, 1, and 2, respectively), MRI-derived parameters, neuropsychological and behavioral assessments (NePB), AV45-PET measurements, CSF biomarkers (amyloid-β, τ, pτ), and FDG-PET measures. The number of features is indicated above each modality abbreviation. All variables were treated as continuous and standardized to zero mean and unit standard deviation.

Performance comparison of prediction models using only ADAS scores vs. multimodal data with and without the addition of baseline ADAS scores. The p-values correspond to pair-wise differences between the three prediction models at different time periods.