A Multimodal Machine Learning Model for Predicting Dementia Conversion in Alzheimer’s Disease

doi:10.21203/rs.3.rs-3148332/v1

Alzheimer's disease (AD) accounts for 60–70% of the population with dementia. Despite the integration of MRI and PET in AD clinics and research, there is a lack of validated models for predicting dementia conversion from mild cognitive impairment (MCI). Thus, we aimed to investigate and validate a machine learning model to predict this conversion. A total of 196 subjects were enrolled from four hospitals and the Alzheimer’s Disease Neuroimaging Initiative dataset. Volumes of the regions of interest (ROIs), white matter hyperintensity, and regional standardized uptake value ratio (SUVR) were analyzed using T1, T2-FLAIR MRIs, and amyloid PET (αPET), along with automatically provided hippocampal occupancy scores and Fazekas scales. Compared with the gradient boosting model (GBM) trained solely on demographics, the AUC was significantly higher for the cross-validation models incorporating T1 image features (p_Bonferroni=0.03) and T1 and αPET image features (p_Bonferroni<0.001). The latter two cross-validated models did not differ significantly in their predictive measures (p_Bonferroni=0.08). After performing the inference, the model combining T1 and αPET image features exhibited the highest AUC (0.875), which was comparable to that of the model using only T1 image features (0.835). Our machine learning model utilizing T1 MRI features alone shows promising predictive value for dementia conversion within a 4-year timeframe, making it applicable in circumstances where αPET is unavailable.

Biological sciences/Neuroscience

Health sciences/Medical research

Health sciences/Neurology

Physical sciences/Nanoscience and technology

Alzheimer's disease (AD) is the most common neurodegenerative disorder, accounting for 60–70% of patients with dementia. Throughout the course of neurodegeneration, cognitive function and daily functional abilities deteriorate progressively.

Mild cognitive impairment (MCI) is a diagnostic entity defined as an intermediate stage between subjective cognitive decline and dementia. Among patients with MCI, the rate of conversion to dementia is known to be around 10–15% annually [3, 4, 5]. Since proper and early intervention at the MCI stage contributes to a better prognosis [6], the prediction of dementia conversion at the MCI stage is critical for improving patients’ quality of life.

With the development of magnetic resonance imaging (MRI) and positron emission tomography (PET), technologies for analyzing brain patterns and the underlying pathologies of AD are widely used. Machine learning contributes greatly to diagnosing the current status, estimating disease progression, and classifying subtypes of patients with AD [7]. Several studies have implemented models for predicting dementia conversion based on MRI features [8, 9, 10], and one study reported the Area Under the Curve (AUC) for predicting dementia conversion as 0.910 with the combination of magnetic resonance spectroscopy features of the posterior cingulate cortex and the parahippocampal gyrus volume [11]. However, there is a lack of a verified model that is universally applicable in various clinical settings because obtaining the required data sufficient for algorithm development of the prediction model is difficult in real-world situations [12].

Therefore, we aimed to investigate a machine learning model that can be utilized in various clinical settings to predict dementia conversion in patients with MCI on the AD spectrum. After evaluating multiple machine learning models in the field and utilizing multimodal image features from MRI and PET scans from a multisite dataset, validation and comparison analyses were performed to confirm the models with the highest predictive measures.

Table 1

Baseline characteristics of sample population
Total N = 196	Non-converter 76% (149)	Converter 24% (47)	Statistic	p-value
Age (years), mean (SD)	71.66 ± 7.16	72.94 ± 6.64	1.08	0.28
MMSE, mean (SD)	26.46 ± 3.09	24.0 ± 3.24	-4.68	< 0.001
Female sex, % (N)	57 (85)	55 (26)	${\chi }^{2}$ = 0.04	0.83
CDR (Baseline), mean (SD)	0.5	0.5
CDR (Follow up), mean (SD)	0.5	1.13 ± 0.33
ApoE $\epsilon$4 Carrier, % (N)	31 (46)	49 (23)	${\chi }^{2}$ = 5.11	0.02
Study interval, years	2.61 ± 0.50	2.69 ± 0.54	0.97	0.34

ADNI, Alzheimer’s Disease Neuroimaging Initiative; MMSE, Mini-mental State Examination; CDR, Clinical Dementia Rating. Chi-square tests for sex and ApoE $\epsilon$4 Carrier, two-sample t-tests for age and MMSE.

The demographic characteristics of the participants are presented in Table 1, which summarizes the baseline measurements and follow-up CDR for the converter and non-converter groups, together with the statistical differences between the two groups. Of the 196 patients with MCI, 47 converted to AD. Except for the MMSE score and ApoE $\epsilon$4 carrier status, there were no statistically significant differences in demographics.

Supplementary Table S3 shows the overall inference performance of the different models based on balanced accuracy (BA), sensitivity (SE), and specificity (SP). The AUC results after applying the test set to the trained models are shown in Figure 1. Among the models trained on demographic characteristics, those with the highest AUC were SVM (AUC = 0.656), LR (AUC = 0.677), GBM (AUC = 0.634), and XGB (AUC = 0.634). For models trained with T1 image features, the AUCs of the RF (AUC: 0.738), SVM (AUC: 0.817), and GBM (AUC: 0.824) models were the highest. For models trained with T2-FLAIR (T2-weighted-Fluid-Attenuated Inversion Recovery) image features, the AUCs of the LR (AUC: 0.430), GBM (AUC: 0.430), and XGB (AUC: 0.380) models were the highest. Finally, for models trained with amyloid positron emission tomography (αPET) image features, the AUCs of the RF (AUC: 0.869), SVM (AUC: 0.846), and GBM (AUC: 0.824) models were the highest. The GBM was the model common to the highest-AUC models across all single modalities. Therefore, the GBM was used as the model when combining modality features with demographic characteristics. In addition, for the models trained with T2-FLAIR image features, all AUCs were less than 0.5. T2-FLAIR was therefore excluded from the modality combination experiments because it was considered to provide less information than the other modalities.

Figure 2 shows the AUC comparison of 10-fold cross-validation results using the selected GBM for modality combinations of T1 and αPET image features added to the demographic characteristics. The mean AUC of the GBM model using only demographic characteristics was 0.83; the GBM model using demographic characteristics and T1 image features was 0.93; and the GBM model using demographic characteristics, T1 images, and αPET image features was 0.98. Kruskal-Wallis one-way analysis was performed, followed by a post-hoc test. There was a significant difference between the models using only demographic characteristics and those with additional T1 image features (p_Bonferroni=0.03), and between the models using only demographic characteristics and those with additional T1 and αPET image features (p_Bonferroni<0.001). However, the model with additional αPET image features over demographics and T1 image features did not differ significantly in the prediction AUC (p_Bonferroni=0.08).
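The Bonferroni-adjusted p-values reported above can be illustrated with a minimal sketch. It assumes the standard Bonferroni rule (each raw p-value is multiplied by the number of comparisons and capped at 1.0); the source does not state which post-hoc test produced the raw p-values, and the raw values below are hypothetical, not taken from the study.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1.0."""
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]


# Hypothetical raw p-values for three pairwise model comparisons
# (illustrative only; these are not values from the study).
raw_p = [0.01, 0.0002, 0.4]
adjusted_p = bonferroni_adjust(raw_p)
```

The cap at 1.0 explains why an adjusted p-value can be reported as exactly 1.00, as in some pairwise comparisons of Table 2.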

Table 2. Cross-validated GBM performance measures according to the addition of image modality features

	BA	p_Bonferroni	SE	p_Bonferroni	SP	p_Bonferroni
Demographic	0.77 ± 0.11		0.81 ± 0.17		0.72 ± 0.09
Demographic + T1	0.90 ± 0.06	0.01	0.96 ± 0.04	0.06	0.84 ± 0.10	0.05
Demographic	0.77 ± 0.11		0.81 ± 0.17		0.72 ± 0.09
Demographic + T1 + αPET	0.92 ± 0.05	< 0.01	0.95 ± 0.06	0.11	0.89 ± 0.08	< 0.01
Demographic + T1	0.90 ± 0.06		0.96 ± 0.04		0.84 ± 0.10
Demographic + T1 + αPET	0.92 ± 0.05	1.00	0.95 ± 0.06	1.00	0.89 ± 0.08	0.67

BA, balanced accuracy; SE, sensitivity; SP, specificity.

As shown in Table 2, BA, SE, and SP were significantly higher when only T1 or both T1 and αPET image features were added to the demographic features. However, αPET image features did not bring a significant performance change when added to the model with baseline demographic and T1 image features. Similar to the cross-validation performance shown in Table 2, the performance of the GBM during inference increased as image modality features were added, as shown in Figure 3. The AUC of the model trained only on demographic characteristics was 0.634. When the T1 image features were added, the AUC increased substantially to 0.835, and when the αPET image features were further added, the AUC increased slightly to 0.875. Interestingly, the model using demographic characteristics and T1 image features (BA = 0.815, SE = 0.889, SP = 0.742) showed higher BA, SE, and SP than the model with additional αPET features (BA = 0.744, SE = 0.778, SP = 0.710).
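As a quick arithmetic check of the inference values reported above, the sketch below assumes the usual definition of balanced accuracy as the mean of sensitivity and specificity (the function name is ours, not the authors'). Both reported BA values agree with this definition to within rounding.

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy, assumed here to be the mean of SE and SP."""
    return (sensitivity + specificity) / 2.0


# (SE, SP, reported BA) for the two inference models described in the text.
reported = {
    "demographic + T1": (0.889, 0.742, 0.815),
    "demographic + T1 + aPET": (0.778, 0.710, 0.744),
}

# Absolute gap between the recomputed and the reported BA for each model.
ba_gap = {
    name: abs(balanced_accuracy(se, sp) - ba)
    for name, (se, sp, ba) in reported.items()
}
```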

This study aimed to investigate and validate a universally applicable machine learning model for predicting dementia conversion in patients within a 4-year timeframe. By combining demographic information, T1 MRI, and αPET imaging features, we evaluated the performance of the model and compared it with previous studies in the field.

Our findings demonstrated that the integration of demographic information and imaging features significantly improved the reliability of predicting the likelihood of patients with MCI converting to AD within a specified timeframe. The dataset used in this study consisted of 196 patients with MCI with a conversion rate to dementia of 24%, which is higher than the previously known AD conversion rate [3, 4, 5]. This highlights the importance of establishing a scientific basis for identifying patients with MCI with the least possibility of converting to AD, despite the majority maintaining an MCI state or experiencing a return to normal cognitive levels. To ensure the sensitivity of our model, we applied the SMOTE technique to up-sample the data of the dementia conversion group, which allowed us to capture the patterns associated with progression and improve the overall performance of the model.

Previous studies have explored various modalities and techniques for predicting dementia conversion in patients with mild cognitive impairment (MCI). Hinrichs et al. [15] utilized longitudinal MRI data and achieved a maximum SVM classification performance with an AUC of 79% in predicting dementia conversion in patients with progressive MCI (MCIp). Moradi et al. [16] developed a prediction model for 1–3-year intervals. They aggregated MRI features and MMSE scores adjusted for age, obtaining an AUC of 90.2% by training an RF model on the aggregated biomarkers. Zhang et al. [17] combined fluorodeoxyglucose (FDG)-PET, MRI, and cognitive scores to train a multi-kernel SVM model, achieving an AUC of 76.8%. T. Zhang et al. [18] proposed a framework using a combination of structural and functional MRI features and trained an SVM model, resulting in a conversion prediction accuracy of 84.71%. Franciotti et al. [19] constructed a multimodal dataset using neurophysiological test scores, cerebrospinal fluid (CSF), the ApoE genotype, and structural MRI features, achieving an accuracy of 90% when combining neurophysiological test scores and CSF. Additionally, Liu et al. [20] proposed a framework for developing a predictive model within three years using structural MRI features, FDG-PET, CSF, ApoE genotype, and neuropsychological scores, with the ELM model achieving the best AUC of 94.7%.

In contrast to previous studies, our approach focused on observing the impact of modality combinations, and specifically evaluated the effects of adding αPET image features. Our results indicated that the addition of MRI features, such as T1 image features, to demographic characteristics significantly increased the AUC, suggesting the utility of MRI features alone in predicting dementia conversion. Although the αPET image features were helpful in further improving the model's performance, the difference was not statistically significant compared with the model incorporating MRI features alone. This finding suggests that AD dementia conversion prediction within a 4-year timeframe can be achieved without relying on αPET, which is relatively costly and carries possible radiation hazards.

Although our study yielded promising results, some limitations must be acknowledged. First, we did not consider the lifestyle patterns of patients with MCI, such as alcohol consumption, smoking, and exercise, which could potentially enhance the performance of our decision-making model. Incorporating this information into future studies may lead to better predictive values. Second, although the importance of αPET remains, our findings demonstrate that a modality combination excluding αPET can predict dementia conversion within a relatively short period. This could alleviate concerns related to radiation exposure and reduce the burden on patients with MCI.

Our study confirmed the value of αPET in predicting dementia conversion in patients with MCI. Nevertheless, our machine learning model utilizing T1 MRI features alone demonstrated promising predictive value for dementia conversion within a 4-year timeframe, particularly in circumstances without αPET scans. The predictive ability of our model, even without αPET, suggests that AD dementia conversion prediction is feasible and can aid clinicians in treatment planning or prognostic evaluation for patients with MCI within a 4-year period. Further research is warranted to explore the potential role of lifestyle pattern information and to validate the generalizability of our findings in larger and more diverse datasets.

A total of 196 subjects were enrolled from four tertiary hospitals and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. Subjects were assigned to the dementia conversion group when their global Clinical Dementia Rating (CDR) score reached 1.0 or higher within the four-year follow-up period. Subjects maintaining a global CDR score of 0.5 were assigned to the non-conversion group. The demographics collected at all sites were (1) age, (2) sex, (3) Mini-Mental State Examination (MMSE) score, (4) ApoE $\epsilon$4 carrier status, and (5) CDR. Subjects aged 50–85 years who were diagnosed with MCI at the time of initial treatment and underwent follow-up diagnostic tests within 2–4 years were eligible. Approval for the use of the MRI and αPET images in this study was obtained from the Yeouido St. Mary's Hospital Institutional Review Board (IRB) [2022-1185], the IRB of Chungnam National University Hospital (CNUH-2022-05-020), the IRB of Ajou University Hospital (AJIRB-MED-EXP-22-284) and the IRB of Kyung Hee University Hospital (KNUH-2022-05-012) with a waiver of informed consent. All procedures conformed to the Declaration of Helsinki (https://www.nature.com/srep/journal-policies/editorial-policies#experimental-subjects). Image acquisition methods are described for each site.
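The eligibility criteria and the conversion definition above can be sketched as two small functions. This is an illustrative reading of the text, not the authors' code: the function names and the example records are hypothetical, and follow-up CDR values other than 0.5 or at least 1.0 are left unlabeled because the text does not say how they were handled.

```python
from typing import Optional


def is_eligible(age_years: float, followup_years: float) -> bool:
    """Age 50-85 years and a follow-up diagnostic test within 2-4 years."""
    return 50 <= age_years <= 85 and 2.0 <= followup_years <= 4.0


def conversion_label(followup_cdr: float) -> Optional[int]:
    """Return 1 for converters, 0 for non-converters, None if undefined."""
    if followup_cdr >= 1.0:
        return 1  # global CDR reached 1.0 or higher
    if followup_cdr == 0.5:
        return 0  # global CDR maintained at 0.5
    return None  # case not covered by the stated definition


# Hypothetical records: (age in years, follow-up interval in years, follow-up CDR).
records = [(72, 2.6, 0.5), (68, 3.1, 1.0), (45, 2.5, 1.0), (80, 1.0, 0.5)]
labels = [conversion_label(cdr) for age, gap, cdr in records if is_eligible(age, gap)]
```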

Subjects in the Site1 dataset underwent brain MRI and PET at the Catholic University of Korea, Yeouido St. Mary’s Hospital, Seoul, Republic of Korea. A dataset satisfying the conversion definition was extracted, and 44 non-conversion subjects were obtained. MRI and PET images were obtained from patients with mild cognitive impairment. The Site1 dataset was acquired on a 3.0T Siemens scanner. T1-weighted MRI images were acquired (TR=1700~1800ms, TE=2.6ms, and flip angle=9°). T2 FLAIR MRI images were acquired (TR/TI=9000/2500ms, TE=76ms, flip angle=150°). αPET images were acquired with ¹⁸F-Florbetaben or ¹⁸F-Flutemetamol.

Subjects in the Site2 dataset underwent brain MRI and PET at Chungnam National University Hospital, Daejeon, Republic of Korea. A dataset satisfying the conversion definition was extracted, and two non-conversion subjects were obtained. MRI and PET images were obtained from patients with mild cognitive impairment. 3D T1-weighted MRI images were acquired on a 3.0T Siemens (TR=2000ms, TE=2.29ms, flip angle=8°) or a 3.0T GE (TR=7.956ms, TE=2.82ms, flip angle=10°) scanner. T2 FLAIR MRI images were acquired on a 3.0T Siemens (TR/TI=9000/2500ms, TE=121ms, flip angle=121°) or a 3.0T GE (TR/TI=11000/2648.61ms, TE=93.544ms, flip angle=160°) scanner. αPET images were acquired with ¹⁸F-Flutemetamol.

Subjects in the Site3 dataset underwent brain MRI and PET at the Ajou University Hospital, Suwon, Republic of Korea. A dataset satisfying the conversion definition was extracted, and 34 non-conversion and 3 conversion subjects were obtained. MRI and PET images were obtained from patients with mild cognitive impairment. 3D T1-weighted MRI images were acquired on a 3.0T GE (TR=7.1~8.88ms, TE=2.776~3.396ms, flip angle=8° or 12°) or a 3.0T Philips (TR=9.8ms, TE=4.6ms, flip angle=8°) scanner. T2 FLAIR MRI images were acquired on a 3.0T GE (TR/TI=8800~12000/2450~2709ms, TE=89~128ms, flip angle=160°) or a 3.0T Philips (TR/TI=8000/2500ms, TE=125ms, flip angle=90°) scanner. αPET images were acquired with ¹⁸F-Flutemetamol.

Subjects in the Site4 dataset underwent brain MRI and PET at Kyung Hee University Medical Center, Seoul, Republic of Korea. A dataset satisfying the conversion definition was extracted, and 29 non-conversion and 14 conversion subjects were obtained. MRI and PET images were obtained from patients with mild cognitive impairment. 3D T1-weighted MRI images were acquired on a 3.0T Philips (TR=9.4ms, TE=4.6ms, flip angle=8°) or a 3.0T Siemens (TR=2000ms, TE=3.05ms, flip angle=9°) scanner. T2 FLAIR MRI images were acquired using a 3.0T Philips (TR/TI=10000/2800ms, TE=120 or 125ms, flip angle=90°) or a 3.0T Siemens (TR/TI=8000~10730/2500~2665.9ms, TE=86~115ms, flip angle=150°) scanner. αPET images were acquired with ¹⁸F-Florbetaben.

For the ADNI data, we used the ADNIMERGE subset, in which demographic and clinical test scores and MRI and PET variables are summarized. This subset is part of the official dataset provided by the ADNI. When data satisfying the conversion definition were extracted from the subset, 40 non-conversion and 12 conversion subjects were obtained. 3D T1-weighted MRI images were acquired on a 3.0T GE (TR=7.3~7.6ms, TE=3.05~0.12ms, flip angle=11°), a 3.0T Philips (TR=6.5ms, TE=2.9ms, flip angle=9°), or a 3.0T Siemens (TR=2300ms, TE=2.95~2.98ms, flip angle=9°) scanner. T2 FLAIR MRI images were acquired on a 3.0T GE (TR/TI=4800/1442~1482ms, TE=115.7~117ms, flip angle=90°), a 3.0T Philips (TR/TI=4800/1650ms, TE=271~275ms, flip angle=90°), or a 3.0T Siemens (TR/TI=4800 or 9000/1650~2500ms, TE=90~443ms, flip angle=120°) scanner. αPET images were acquired using ¹⁸F-Florbetapir or ¹⁸F-Florbetaben.

The acquired 3D T1 images were pre-processed and segmented into 114 ROIs using AQUA (Neurophet, South Korea). After calculating the volume of each segmented area, intracranial volume (ICV) normalization was performed. In addition, the hippocampal occupancy score (HOC), which is used as a biomarker index of neurodegenerative disease [13], was calculated and used as an input. The total white matter hyperintensities (WMHs), periventricular WMHs, and deep WMHs were calculated from the T2 FLAIR image after registration to the 3D T1 image, and the Fazekas scale was rated for each region as minimal (0), moderate (1), or severe (2). The acquired αPET images were also registered to the 3D T1 images, and the voxels in the αPET images were scaled using the mean uptake value in the cerebellar gray matter to calculate the standardized uptake value ratio (SUVR). Consequently, the number of features of each modality used as input was 115 for volumetric information, 6 for WMH information, and 144 for SUVR information.
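The SUVR scaling described above can be sketched as follows. Because every voxel is divided by the same reference value, a regional SUVR equals the regional mean uptake divided by the mean uptake in the cerebellar gray matter. The function name, region names, and uptake values are hypothetical and serve only as an illustration.

```python
def regional_suvr(mean_uptake, reference_region="cerebellar_gray_matter"):
    """Divide each region's mean uptake by the reference region's mean uptake."""
    reference = mean_uptake[reference_region]
    if reference <= 0:
        raise ValueError("reference uptake must be positive")
    return {
        region: value / reference
        for region, value in mean_uptake.items()
        if region != reference_region
    }


# Hypothetical mean uptake values per region (arbitrary units).
uptake = {"cerebellar_gray_matter": 2.0, "precuneus": 3.0, "hippocampus": 2.5}
suvr = regional_suvr(uptake)
```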

We used the synthetic minority oversampling technique (SMOTE) to reduce the possibility of biased prediction by balancing the dementia conversion and non-conversion data. Standardization was performed so that all features used in the model were given the same level of importance. For this purpose, the z-score method was used: $z_{ij} = (x_{ij} - \mu_{j})/\sigma_{j}$, where $x_{ij}$ is the original value of feature j for subject i, $z_{ij}$ is the normalized value, $\mu_{j}$ is the feature’s mean, and $\sigma_{j}$ is the feature’s standard deviation. Consequently, the z-score method produces a new dataset in which all features have zero mean and unit standard deviation. The values for categorical features were also encoded.
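The two preprocessing steps above can be sketched in plain Python. The first function standardizes each feature column with the z-score method. The second generates one SMOTE-style synthetic sample by interpolating between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration under our own assumptions (function names, `k`, the toy data, and the use of the population standard deviation are all ours); it is not the authors' implementation, and the source does not state the order of the two steps or whether oversampling was restricted to the training data.

```python
import random
import statistics


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def smote_like_sample(minority, k=2, rng=None):
    """Interpolate between a random minority sample and a near neighbour."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # k nearest minority-class neighbours by squared Euclidean distance.
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
    )[:k]
    other = rng.choice(neighbours)
    t = rng.random()  # position along the segment from base to neighbour
    return [a + t * (b - a) for a, b in zip(base, other)]


# Toy feature matrix and toy minority class (illustrative values only).
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
standardized = zscore_columns(features)
minority_class = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0]]
synthetic = smote_like_sample(minority_class, k=2, rng=random.Random(42))
```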

The purpose of ICV normalization was to correct for differences in ROI volume arising from differences in head size between individuals and sexes. This was performed by dividing each volumetric feature of the subject by the total intracranial volume (ICV). This normalization method is commonly used [14].
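A minimal sketch of ICV normalization follows, assuming the common proportional convention in which each ROI volume is expressed as a fraction of the subject's total intracranial volume. The function name, ROI names, and volumes are hypothetical.

```python
def icv_normalize(roi_volumes_mm3, icv_mm3):
    """Express each ROI volume as a fraction of total intracranial volume."""
    if icv_mm3 <= 0:
        raise ValueError("ICV must be positive")
    return {roi: volume / icv_mm3 for roi, volume in roi_volumes_mm3.items()}


# Hypothetical volumes for one subject (cubic millimetres).
volumes = {"left_hippocampus": 4000.0, "right_hippocampus": 4200.0}
normalized = icv_normalize(volumes, icv_mm3=1_600_000.0)
```

Under this convention, two subjects with the same absolute hippocampal volume but different head sizes receive different normalized values, which is the correction the paragraph describes.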

The dataset of 196 cases was divided in a stratified manner into a training set (80%) and a testing set (20%), maintaining the proportion of each class in both sets. We evaluated the performance of each model using our dataset. The models used were decision trees (DT), random forests (RF), support vector machines (SVM), linear regression classifiers (LR), gradient boosting models (GBM), and Extreme Gradient Boosting (XGB). We trained each model and set up a grid search over the hyperparameters to select a model that generalized well. During hyperparameter tuning, 10-fold cross-validation was performed. Three categories were considered for the prediction models. The models were constructed based on demographic characteristics and the features of each modality (T1, T2-FLAIR, and αPET). Single-modality models were built using demographic characteristics and modality features. To observe the change in model performance according to the change in input information, we selected, from the three best-performing models for each modality, a model that performed well across the single modalities. After training the single-modality models, the modalities with relatively low performance were excluded. By adding modality features to the demographic characteristics using the selected model, the change in model performance according to the input information was confirmed. This approach was chosen because it made it possible to check how performance changed when the demographic characteristics and modality features were combined, and which combination performed well.
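The stratified 80/20 split described above can be sketched as follows, using the class sizes from Table 1 (149 non-converters and 47 converters). This is a simplified illustration with a hypothetical function name and seed. The exact per-class test counts depend on the rounding rule, so a library implementation may differ by a sample or two from this sketch.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Class sizes from Table 1: 0 = non-converter, 1 = converter.
labels = [0] * 149 + [1] * 47
train_idx, test_idx = stratified_split(labels)
test_converters = sum(labels[i] for i in test_idx)
```

The same idea extends to 10-fold cross-validation on the training set: each fold is drawn so that the converter proportion stays close to that of the full training set.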

Because a two-class model was developed to predict the possibility of dementia conversion in the MCI patient group, four metrics representing classification performance were used as model evaluation indicators, where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives (Eq. 1, 2, 3, and 4):