As shown in Fig. 1, the workflow followed for the article selection included the four phases (identification, screening, eligibility, and inclusion) proposed by the PRISMA guidelines (18, 19). The 159 articles remaining after eliminating duplicates were screened and, after applying the exclusion criteria, 47 articles were selected for the review.
The risk of bias analysis is shown in Fig. 2 and Table 2. The overall risk of bias across the studies was considered low. Of the 48 articles selected at the eligibility stage, only one study (22) was excluded from the qualitative analysis because of its high risk of bias: its sample comprised seven subjects, and it did not include any validation method. Therefore, the final number of studies included in the qualitative analysis was 47.
Table 2
Risk of bias analysis for individual studies
| Author (year) | Algorithm | Validation | Database | Total bias of the study |
|---|---|---|---|---|
| Plant et al. (2010) (23) | 0 | 0 | 2 | 2 |
| Costafreda et al. (2011) (24) | 1 | 0 | 1 | 2 |
| Chincarini et al. (2011) (25) | 0 | 0 | 0 | 0 |
| Filipovych et al. (2011) (26) | 0 | 0 | 0 | 0 |
| Hinrichs et al. (2011) (8) | 0 | 0 | 0 | 0 |
| Westman et al. (2011) (27) | 0 | 0 | 1 | 1 |
| Zhang et al. (2011) (28) | 0 | 0 | 0 | 0 |
| Cho et al. (2012) (29) | 0 | 0 | 0 | 0 |
| Gray et al. (2012) (30) | 0 | 0 | 0 | 0 |
| Toussaint et al. (2012) (31) | 1 | 0 | 0 | 1 |
| Zhang et al. (2012) (9) | 0 | 0 | 0 | 0 |
| Casanova et al. (2013) (32) | 0 | 0 | 0 | 0 |
| Liu, X. et al. (2013) (33) | 0 | 0 | 0 | 0 |
| Wee et al. (2013) (34) | 0 | 0 | 0 | 0 |
| Young et al. (2013) (35) | 1 | 0 | 0 | 1 |
| Guerrero et al. (2014) (36) | 0 | 0 | 0 | 0 |
| Lebedev et al. (2014) (37) | 0 | 0 | 0 | 0 |
| Liu, M. et al. (2014) (38) | 0 | 0 | 0 | 0 |
| Liu, F. et al. (2014) (39) | 0 | 0 | 0 | 0 |
| Min et al. (2014) (40) | 0 | 0 | 0 | 0 |
| Suk et al. (2014) (41) | 0 | 0 | 0 | 0 |
| Cheng et al. (2015) (42) | 0 | 0 | 0 | 0 |
| Moradi et al. (2015) (43) | 0 | 0 | 0 | 0 |
| Ritter et al. (2015) (44) | 1 | 0 | 0 | 1 |
| Salvatore et al. (2015) (45) | 0 | 0 | 0 | 0 |
| Collij et al. (2016) (46) | 0 | 2 | 2 | 4 |
| Li et al. (2016) (47) | 0 | 0 | 2 | 2 |
| López et al. (2016) (48) | 2 | 0 | 0 | 2 |
| Thung et al. (2016) (49) | 0 | 0 | 0 | 0 |
| Long et al. (2017) (50) | 0 | 0 | 0 | 0 |
| Donnelly-Kehoe et al. (2018) (51) | 0 | 0 | 0 | 0 |
| Gao et al. (2018) (52) | 1 | 0 | 0 | 1 |
| Khanna et al. (2018) (53) | 1 | 0 | 0 | 1 |
| Popuri et al. (2018) (54) | 0 | 0 | 0 | 0 |
| Gupta et al. (2019) (55) | 2 | 0 | 0 | 2 |
| Lee et al. (2019) (56) | 0 | 0 | 0 | 0 |
| Pusil et al. (2019) (57) | 0 | 0 | 0 | 0 |
| Spasov et al. (2019) (58) | 0 | 0 | 2 | 2 |
| Wee et al. (2019) (59) | 0 | 0 | 0 | 0 |
| Abrol et al. (2020) (60) | 0 | 0 | 0 | 0 |
| Gao et al. (2020) (61) | 0 | 0 | 0 | 0 |
| Giorgio et al. (2020) (62) | 0 | 0 | 0 | 0 |
| Lin et al. (2020) (63) | 1 | 0 | 0 | 1 |
| Pan et al. (2020) (64) | 0 | 0 | 0 | 0 |
| Ramon-Julvez et al. (2020) (65) | 0 | 0 | 0 | 0 |
| Xiao et al. (2020) (66) | 1 | 0 | 0 | 1 |
| Xu et al. (2020) (67) | 0 | 0 | 0 | 0 |
| | 0 | 0 | 2 | 2 |
| | 12/96 | 2/96 | 12/96 | 27/288 |
Note. This table shows the results of the bias analysis performed based on Higgins et al. (20), with the scores specified in Table 1.
The studies selected for the qualitative analysis are presented in Table 3 following the structure explained in the data extraction section (study, cohort, sample [mean age], database, features and neuroimaging technique, classification method, validation method, results [% accuracy], and AUC ROC).
Magnetic Resonance Imaging (MRI) was the most common neuroimaging technique (28 out of 47 studies). Thirteen studies included data from both MRI and Positron Emission Tomography (PET), three studies used PET alone, two studies used Magnetoencephalography (MEG) data, and one study used MRI and MEG data.
Regarding the source of the datasets, 40 out of 47 studies obtained their samples of healthy, MCI and AD subjects from the ADNI database in one of its versions (ADNI-1, 2, 3 or GO). Of the remaining seven studies, two used data from the AddNeuroMed database (https://consortiapedia.fastercures.org/consortia/anm/) (24, 27) and five collected their own data (23, 46, 48, 57, 67).
Although most studies used the same database, the cohorts selected varied across them. Most articles (28 out of 47 studies) divided their participants into four groups: healthy controls, stable MCI patients (sMCI), progressive MCI patients (pMCI), and AD patients. Ten articles selected three cohorts (MCI, AD, and healthy subjects), although, in order to predict conversion to AD, they still had to distinguish between pMCI and sMCI patients. The remaining nine studies used other groupings: only sMCI and pMCI (44, 48, 49, 57), only healthy controls and MCI (53, 61, 62, 67), or a distinction between early and late MCI (59).
The sample size also varied across studies: Pusil et al. (57) had the smallest sample, with 54 subjects, and Popuri et al. (54) had the largest, with 1,294 subjects. Sample size shows an upward trend over the years, which may be attributed to the increasing availability of data in the ADNI database. Mean age ranged from 62 to 79 years. Although eight studies did not report the mean age of their sample, they used the ADNI database, so their age range is likely to be similar to that of the other studies. The variation in age between studies may be due to differences in participant selection and in the time at which each study was conducted (since the ADNI database has been incorporating more data over the years).
As for feature selection, the most common features were whole-brain volumes, selected in 24 articles, and intensity measurements of glucose metabolism, selected in 13 PET studies. Nine studies also included biological features (such as APOE4 genotype). Other selected features were neuropsychological test results (seven out of 47 studies) and demographic variables such as age (six out of 47 studies). Nineteen studies used only one type of feature, such as 3D MRI data or whole-brain grey matter volumes, and 28 studies selected two or more types of features.
Regarding the ML methods used to classify patients and detect probable progression from MCI to AD, the most popular were those based on the Support Vector Machine (SVM). SVM was used in eight of the 47 studies, and in a further five in combination with other methods, such as a Neural Network for feature selection (56) or Locally-Linear Embedding for dimensionality reduction (33). SVM is a supervised ML algorithm that has demonstrated its utility in neuroimaging-based applications, especially in the classification of future clinical outcomes (68). The method represents each subject as a single point in a multidimensional space whose number of dimensions equals the total number of features in the dataset (for example, 93 grey matter volumes from regions of interest). The algorithm then finds the maximal-margin separating hyperplane, that is, the boundary that optimally differentiates the groups of data points representing different classes (e.g. pMCI vs. sMCI, or AD vs. HC). The data instances closest to the group boundaries are the support vectors and are, by definition, the ones that determine the position of the hyperplane. When the classes cannot be separated linearly in the original feature space, the data are mapped into a higher-dimensional space by a kernel function, usually polynomial or Gaussian (24). The SVM algorithm is trained with labelled data (indicating, for example, whether each data point belongs to a healthy person or to an sMCI, pMCI, or AD patient) to establish the group boundaries in this multidimensional space. Once the model has been trained, a new subject with MCI can be introduced and will be assigned to one of the previously defined groups (i.e. sMCI, pMCI, AD, etc.) according to the side of the boundary on which the subject falls. For example, if the new patient is classified as belonging to the AD group, we can infer that this subject is more likely to develop AD in the future because of the greater similarity to the subjects in that group. The groups used for classification depend on the specific methodology of each study.
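To make the margin idea concrete, the following is a minimal sketch of a linear soft-margin SVM trained by subgradient descent on the hinge loss. It is not the implementation used by any of the reviewed studies: the two features, the eight synthetic subjects, the label coding (+1 for pMCI, -1 for sMCI), and the hyperparameters `lam`, `lr` and `epochs` are all assumptions chosen for illustration, and no kernel is applied.

```python
import random


def dot(u, v):
    """Inner product of two equally long feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=100, seed=0):
    """Fit weights w and bias b by stochastic subgradient descent.

    X: list of feature vectors (one per subject); y: labels in {-1, +1}.
    lam is the L2 regularisation strength, lr the learning rate.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (dot(w, X[i]) + b)
            # Regularisation step: shrinking w favours a wider margin.
            w = [(1.0 - lr * lam) * wj for wj in w]
            if margin < 1.0:
                # The subject lies inside the margin or on the wrong side,
                # so the hinge loss is active and pulls the boundary away.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if dot(w, x) + b >= 0 else -1


# Synthetic, standardised features for eight hypothetical MCI subjects.
X = [[2.0, 1.0], [1.0, 2.0], [2.0, 2.0], [3.0, 3.0],
     [-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = pMCI, -1 = sMCI (assumed coding)

w, b = train_linear_svm(X, y)
new_subject = [2.5, 1.5]
print(predict(w, b, new_subject))  # 1, i.e. assigned to the pMCI side
```

Non-linear variants, such as the RBF-kernel SVMs listed in Table 3, require a kernelized formulation that is omitted here.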
Combining SVM with other methods allows better feature selection and helps to avoid overfitting, which facilitates the generalization of the model (i.e. achieving high accuracy when applied to different datasets). For example, Thung et al. (49) used SVM with multiple kernels (linear, Gaussian, and polynomial) after feature selection with least squares and logistic elastic net regressions and after matrix completion with a label-guided low-rank matrix completion method. Toussaint et al. (31), in turn, used a non-linear SVM with a Gaussian Radial Basis Function kernel, but only after a two-sample t-test and a spatial independent component analysis, performed to detect the glucose metabolism and regional patterns characteristic of AD patients. Other classification methods used were Random Forest (37, 43, 51) and Neural Networks, which can have different architectures; the most commonly used for image classification tasks were Convolutional Neural Networks (58, 59, 64, 65).
As for validation methods, Cross-Validation was selected in 27 studies, with different numbers of folds and/or iterations. Cross-Validation consists of dividing the sample into two parts, one to train the algorithm (training set) and another to validate it (testing set). The partition is repeated several times with a different train/test split of the data, and the accuracies of the iterations are averaged to obtain a more robust estimate of model performance than validating the model on a single test sample.
Another validation method is Leave-One-Out Cross-Validation, selected in five studies. In this case, the model is trained with all the data except one data point and then tries to classify the point left out; the procedure is repeated in subsequent iterations with each of the remaining data points. The train/test method was selected in ten studies, with different percentages for the data partitions. Westman et al. (27) validated their model on an independent test set of 51 subjects, and Popuri et al. (54) also performed the validation with an independent sample.
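The resampling schemes described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the nearest-centroid classifier stands in for whatever model a study actually used, the ten two-feature subjects are synthetic, and the function names (`k_fold_splits`, `fit_centroids`, `cross_validate`) are hypothetical. Leave-One-Out Cross-Validation is obtained as the special case in which the number of folds equals the number of subjects.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        yield train_idx, test_idx


def fit_centroids(X, y):
    """Toy classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(X, y, k, seed=0):
    """Average the test accuracy over the k train/test partitions."""
    fold_acc = []
    for train_idx, test_idx in k_fold_splits(len(X), k, seed):
        model = fit_centroids([X[i] for i in train_idx],
                              [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        fold_acc.append(hits / len(test_idx))
    return sum(fold_acc) / len(fold_acc)


# Synthetic, clearly separated data for ten hypothetical subjects.
X = [[0.0, 0.2], [0.3, 0.1], [0.1, 0.4], [0.5, 0.0], [0.2, 0.3],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.3], [5.1, 5.0], [4.9, 4.7]]
y = ["sMCI"] * 5 + ["pMCI"] * 5

kfold_acc = cross_validate(X, y, k=5)      # 5-fold Cross-Validation
loo_acc = cross_validate(X, y, k=len(X))   # Leave-One-Out (k = n)
print(kfold_acc, loo_acc)  # 1.0 1.0 on this easy toy sample
```

Because every subject is held out exactly once, the averaged accuracy uses the whole sample for testing without ever testing on data the model was trained on.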
The results of ML classification algorithms can be assessed by their sensitivity (percentage of pMCI patients correctly detected, or true positive rate), specificity (percentage of healthy or sMCI subjects correctly identified, or true negative rate), and accuracy (percentage of all subjects correctly classified). By changing the decision threshold of the classifier, we can trade off the true positive rate against the true negative rate and plot this trade-off as the Receiver Operating Characteristic (ROC) curve (69). The area under the ROC curve (AUC ROC) is a good quantitative index for comparing classification models, since it indicates the ability of the model to predict both the presence and the absence of disease or, in this case, the progression or lack of progression from MCI to AD (70). An AUC ROC of one implies perfect classification of every subject in the sample. The maximum accuracy achieved by each study in predicting conversion from MCI to AD, or in discriminating between progressive and stable MCI, is shown in the “Results” column of Table 3; the AUC is presented when available. The best result with the SVM algorithm was obtained by Pusil et al. (57), with 100% accuracy, but in a small sample of 54 subjects, which makes the model hard to generalize. Among studies with larger samples, Guerrero et al. (36) had the highest accuracy (97.1% with 511 subjects), followed by Long et al. (50; 96.5%, 427 subjects), Gupta et al. (55; 93.6%, 158 subjects) and Wee et al. (59; 92.5%, 1083 subjects). Finally, the highest AUC ROC value was 0.99 in Long et al. (50), followed by 0.96 in Xu et al. (67) and 0.95 in Gupta et al. (55).
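As an illustration of how these indices relate to one another, the sketch below computes sensitivity, specificity and accuracy at a single threshold and then sweeps the threshold to obtain the ROC curve and its area. The labels and classifier scores are invented for the example (1 = pMCI, 0 = sMCI is an assumed coding), and the helper names are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = pMCI)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_points(y_true, scores):
    """(false positive rate, true positive rate) for each threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented outcomes and classifier scores for eight hypothetical subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

y_pred = [1 if s >= 0.5 else 0 for s in scores]  # one decision threshold
metrics = confusion_metrics(y_true, y_pred)
auc_value = auc(roc_points(y_true, scores))

print(metrics)    # sensitivity, specificity and accuracy are all 0.75
print(auc_value)  # 0.8125
```

The AUC summarises performance over all thresholds, whereas the three percentages describe only the single threshold chosen (0.5 here), which is why two models with the same accuracy can differ in AUC.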
Table 3
Studies selected following PRISMA guidelines
| Author (year) | Groups | Sample size (mean age) | Database | Neuroimaging technique and features | Classification method | Validation method | Results (% accuracy) | AUC ROC |
|---|---|---|---|---|---|---|---|---|
| Plant et al. (2010) (23) | HS; AD; MCI | 18 (64.8); 32 (68.8); 24 (69.7) | Sample collected for the study | MRI: Whole-brain volume measures | SVM; Bayes; VFI | Train/test method: AD + HS as train set and MCI as test set | pMCI vs sMCI: SVM: 50; Bayes: 58.3; VFI: 75 | NA |
| Costafreda et al. (2011) (24) | HS; AD; MCI | 88 (73.6); 71 (74.9); 103 (74.1) | AddNeuroMed | MRI: 3D hippocampal morphometric measures | nl-SVM-RBFk | 4-fold Cross Validation | pMCI vs sMCI: 80 | NA |
| Chincarini et al. (2011) (25) | HS; AD; sMCI; pMCI | 189 (76.6); 144 (75.5); 166 (75.7); 136 (75.1) | ADNI-1 | MRI: GM volumes | SVM | 20-fold Cross Validation | NA | 0.74 |
| Filipovych et al. (2011) (26) | HS; AD; sMCI; pMCI | 63 (75.2); 54 (77.4); 174 (74.5); 68 (76.2) | ADNI-1 | MRI: Whole-brain GM density | Semi-supervised SVM | Leave-one-out Cross Validation | pMCI: 79.4; sMCI: 51.7 | 0.69 |
| Hinrichs et al. (2011) (8) | HS; AD; MCI | 66 (76.2); 58 (76.6); 119 (75.1) | ADNI-1 | MRI and PET: scan data, APOE4 genotype, CSF assays and cognitive test results | Mk-SVM | Train/test method: AD + HS as train set and MCI as test set | pMCI vs sMCI: NA | 0.79 |
| Westman et al. (2011) (27) | HS; AD; MCI | 112 (73); 117 (76); 122 (75) | AddNeuroMed | MRI: whole-brain volume, age and education | OPLS | Train/test method: sample of 51 subjects | pMCI vs sMCI: 73 | NA |
| Zhang et al. (2011) (28) | HS; AD; sMCI; pMCI | 52 (75.3); 51 (75.2); 56 (75.3); 43 (75.3) | ADNI-1 | MRI and PET: Volume, intensity and CSF (Aβ42, t-tau and p-tau) measurements | SVM | 10-fold Cross Validation | pMCI: 91.5; sMCI: 73.4 | NA |
| Cho et al. (2012) (29) | HS; AD; sMCI; pMCI | 160 (76.2); 128 (76.0); 131 (74.8); 72 (74.8) | ADNI-1 | MRI: Cortical thickness | LDA | Train/test method: 50/50 partition | pMCI vs sMCI: 70 | NA |
| Gray et al. (2012) (30) | HS; AD; sMCI; pMCI | 54 (NA); 50 (NA); 64 (NA); 53 (NA) | ADNI-1 | PET: Signal intensity and relative change over 12 months | nl-SVM-RBFk | Train/test method: 75/25 partition with 1000 iterations | sMCI vs pMCI: 63.1 | 0.66 |
| Toussaint et al. (2012) (31) | HS; AD; sMCI; pMCI | 80 (76.4); 80 (76.0); 40 (76.4); 40 (76.4) | ADNI-1 | PET: Glucose metabolic signal and clinical measures | Two-sample t-test + spatial ICA + nl-SVM-RBFk | Leave-one-out Cross Validation | pMCI vs sMCI: 80 | NA |
| Zhang et al. (2012) (9) | HS; AD; sMCI; pMCI | 47 (NA); 40 (NA); 42 (NA); 38 (NA) | ADNI-1 | MRI and PET: Volume, intensity and CSF (Aβ42, t-tau and p-tau) measurements | M3TL | 10-fold Cross Validation | pMCI vs sMCI: 73.9 | 0.79 |
| Casanova et al. (2013) (32) | HS; AD; sMCI; pMCI | 188 (75.9); 171 (75.5); 182 (75.2); 153 (75.0) | ADNI-1 | MRI: GM volume | RLR | 10-fold Cross Validation | pMCI vs sMCI: 61.5 | NA |
| Liu, X. et al. (2013) (33) | HS; AD; sMCI; pMCI | 138 (76); 86 (75); 93 (75); 97 (75) | ADNI-1 | MRI: Volume and cortical thickness | SVM; EN; LDA | Leave-one-out Cross Validation | pMCI vs sMCI: SVM: 66; EN: 68; LDA: 68 | 0.53; 0.61; 0.68 |
| Wee et al. (2013) (34) | HS; AD; sMCI; pMCI | 200 (75.8); 198 (75.7); 111 (75.3); 89 (74.8) | ADNI-1 | MRI: Cortical thickness and correlation of cortical thickness between pairs of ROIs | Mk-SVM | 10-fold Cross Validation | pMCI vs sMCI: 75.05 | 0.84 |
| Young et al. (2013) (35) | HS; AD; sMCI; pMCI | 73 (75.9); 63 (75.2); 96 (75.6); 47 (74.5) | ADNI-1 | MRI and PET: Volume, intensity, APOE4 genotype and CSF (Aβ42, t-tau and p-tau) measurements | Gaussian Process | Leave-one-out Cross Validation | sMCI vs pMCI: 74.1 | 0.79 |
| Guerrero et al. (2014) (36) | HS; AD; sMCI; pMCI | 175 (76.3); 106 (75.4); 114 (75.1); 116 (74.7) | ADNI-1; ADNI-GO | 3D MRI data | SVM | Train/test | pMCI vs sMCI: 97.1 | NA |
| Lebedev et al. (2014) (37) | HS; AD; MCI | 225 (75.9); 185 (75.2); 165 (75.5) | ADNI-1 | MRI: Cortical thickness, demographic variables and APOE4 genotype | RF | Independent sample | pMCI vs sMCI: 82.3 | 0.83 |
| Liu, M. et al. (2014) (38) | HS; AD; sMCI; pMCI | 229 (76.0); 198 (75.7); 236 (74.9); 167 (74.9) | ADNI-1 | MRI: Whole-brain GM density | SVM | 10-fold Cross Validation | pMCI vs sMCI: 70.7 | NA |
| Liu, F. et al. (2014) (39) | HS; AD; MCI | 52 (75.3); 51 (75.2); 99 (75.3) | ADNI-1 | MRI and PET: Volume and intensity measurements | Mk-SVM | 10-fold Cross Validation | sMCI vs pMCI: 67.8 | 0.69 |
| Min et al. (2014) (40) | HS; AD; sMCI; pMCI | 128 (76.1); 97 (75.9); 117 (75.0); 117 (75.2) | ADNI-1 | MRI: Multi-atlas GM volume measurements | SVM | 10-fold Cross Validation | pMCI vs sMCI: 72.4 | 0.67 |
| Suk et al. (2014) (41) | HS; AD; MCI | 101 (75.9); 93 (75.5); 204 (74.9) | ADNI-1 | MRI and PET: Volume and intensity measurements | DBM | 10-fold Cross Validation | pMCI vs sMCI: 75.9 | 0.74 |
| Cheng et al. (2015) (42) | HS; AD; sMCI; pMCI | 52 (NA); 51 (NA); 56 (NA); 53 (NA) | ADNI-1 | MRI and PET: Volume, intensity and CSF (Aβ42, t-tau and p-tau) measurements | M2TL | 10-fold Cross Validation | sMCI vs pMCI: 80.1 | 0.85 |
| Moradi et al. (2015) (43) | HS; AD; sMCI; pMCI | 231 (NA); 200 (NA); 100 (NA); 164 (NA) | ADNI-1 | MRI: GM volumes, age and cognitive measures | RF | 10-fold Cross Validation | pMCI vs sMCI: 82 | 0.90 |
| Ritter et al. (2015) (44) | sMCI; pMCI | 151 (74.1); 86 (74.6) | ADNI-1 | MRI and PET: Neuropsychological test, clinical variables, cortical thickness, demographic data and intensity measurements | SVM with RBFk; Classification tree; RF | 30 iterations of 10-fold Cross Validation | sMCI vs pMCI: SVM: 66.5; Classification Tree: 66.1; RF: 63.1 | NA |
| Salvatore et al. (2015) (45) | HS; AD; sMCI; pMCI | 162 (76.3); 137 (76.0); 134 (74.5); 76 (74.8) | ADNI-1 | MRI: GM and WM volumes | SVM | 20-fold Cross Validation | pMCI vs sMCI: 66 | NA |
| Collij et al. (2016) (46) | HS; AD; MCI | 100 (66.7); 100 (63.2); 60 (62.8) | Sample collected for the study | MRI: Cortical thickness | SVM | Train/test method: 50/50 partition | pMCI vs sMCI: 70.8 | 0.77 |
| Li et al. (2016) (47) | HS; AD; sMCI; pMCI | 42 (65.6); 25 (69.4); 10 (66.5); 21 (68.6) | ADNI-1 | MRI: GM whole-brain FCS and functional data | SVM | Leave-one-out Cross Validation | pMCI vs sMCI: 80.6 | NA |
| López et al. (2016) (48) | sMCI; pMCI | 21 (72.7); 12 (75.7) | Sample collected for the study | MRI and MEG: Cognitive reserve; APOE genotype; hippocampal volumes; 3D MRI data; MEG recordings; neuropsychological tests | HLR | Train/test method: 75/25 partition | pMCI vs sMCI: 95.5 | 0.97 |
| Thung et al. (2016) (49) | sMCI; pMCI | 53 (75.7); 60 (75.2) | ADNI-1 | MRI: Whole-brain GM volume measures and changes in 4 years of follow-up | Mk-SVM | 10-fold Cross Validation | pMCI vs sMCI: 78.2 | 0.84 |
| Long et al. (2017) (50) | HS; AD; sMCI; pMCI | 135 (76.2); 65 (75.6); 132 (75.2); 95 (75.1) | ADNI-1 | MRI: Whole-brain GM and Whole-brain WM | SVM | 10-fold Cross Validation | pMCI vs sMCI: with GM: 96.5; with WM: 96.0 | GM: 0.99; WM: 0.99 |
| Donnelly-Kehoe et al. (2018) (51) | HS; AD; sMCI; pMCI | 100 (NA); 100 (NA); 100 (NA); 100 (NA) | ADNI-1 | MRI: Demographic, Morphometric and MMSE | RF; SVM; AB | Train/test method: 75/25 partition | NA | 0.75; 0.76; 0.62 |
| Gao et al. (2018) (52) | HS; AD; MCI | 94 (76.3); 58 (74.2); 147 (74.8) | ADNI-1 | MRI and PET: Hippocampus measurement, Medical history, Neuropsychological tests and Volume-based morphometry | GPR; PLS | Train/test method: AD + HS as train set and MCI as test set + follow-up | sMCI vs pMCI: GPR: 82.2; PLS: 85.5 | NA |
| Khanna et al. (2018) (53) | HS; MCI | 315 (NA); 609 (NA) | ADNI-1 | MRI and PET: Volume, clinical and SNP measures | GBM | 10 iterations of a 10-fold Cross Validation | C-index (a generalization of the AUC ROC calculation for binary classification): 0.86 | 0.95 |
| Popuri et al. (2018) (54) | sHS; uHS; pHS; pMCI; sMCI; eDAT; lDAT | 753 (75.4); 110 (78.9); 58 (78.2); 486 (74.8); 881 (75.0); 232 (76.6); 464 (75.8) | ADNI-1 | PET: Glucose metabolic signal | FPDS | Independent group | Classification of DAT+/DAT-: sMCI = 70.4; pMCI = 67.9 | sMCI vs pMCI at 2, 3 and 5 years: 0.80; 0.79; 0.77 |
| Gupta et al. (2019) (55) | HS; AD; sMCI; pMCI | 38 (76.7); 38 (77.1); 36 (74.2); 46 (76.1) | ADNI-1 | MRI and PET: Volume, intensity and CSF (Aβ42, t-tau and p-tau) measurements | Mk-SVM | 10-fold Cross Validation | pMCI vs sMCI: 93.6 | 0.95 |
| Lee et al. (2019) (56) | HS; AD; sMCI; pMCI | 229 (75.9); 198 (75.3); 214 (75.0); 160 (74.9) | ADNI-1 | MRI: GM volumes | rDNN + SVM | 10-fold Cross Validation | pMCI vs sMCI: 88.5 | 0.95 |
| Pusil et al. (2019) (57) | sMCI; pMCI | 27 (71.2); 27 (74.8) | Sample collected for the study | MEG: Brain connectivity matrix | MCFS + SVM with RBF kernel | Train/test method: 80/20 partition | pMCI vs sMCI: 100 | NA |
| Spasov et al. (2019) (58) | HS; AD; sMCI; pMCI | 184 (74.6); 192 (75.6); 181 (73.7); 228 (72.2) | ADNI-1 | MRI: 3D data, demographic, neuropsychological, and biological (APOE4) measures | CNN | Train/test method: 90/10 partition | pMCI vs sMCI: 86 | 0.92 |
| Wee et al. (2019) (59) | HS; AD; MCI; eMCI; lMCI | ADNI-1/ADNI-2: 242/300 (76.9/75.6); 355/261 (76.3/75.3); 415/NA (75.9); NA/314 (72.9); NA/208 (73.7) | ADNI-1; ADNI-2 | Cortical thickness | Graph NN | 10-fold Cross Validation | Conversion from: lMCI to AD: 75; eMCI to AD: 92 | NA |
| Abrol et al. (2020) (60) | HS; AD; sMCI; pMCI | 237 (74.3); 157 (75.1); 245 (72.1); 189 (74.2) | ADNI-1; ADNI-2; ADNI-3; ADNI-GO | 3D MRI data | ResNET | Train/test method: 80/20 partition | pMCI vs sMCI: 77.8 | 0.78 |
| Gao et al. (2020) (61) | HS; sMCI; pMCI | 847 (56.9); 129 (74.8); 168 (74.8) | ADNI-1 | 3D MRI data | Age prediction + AD-NET | 5-fold Cross Validation | pMCI vs sMCI: 76 | 0.81 |
| Giorgio et al. (2020) (62) | HS; MCI | 167 (NA); 167 (NA) | ADNI-1 | MRI and PET: GM density; Biological and cognitive measurements | GMLVQ | 10-fold Cross Validation | pMCI vs sMCI: 81.4 | NA |
| Lin et al. (2020) (63) | HS; AD; sMCI; pMCI | 200 (73.9); 102 (75.7); 205 (71.8); 110 (73.9) | ADNI-1 | MRI and PET: Volume, cortical thickness, intensity measurements, APOE4 presence and levels of Aβ42, T-tau and P-tau in CSF | LASSO + ELM with Gaussian kernel | 10-fold Cross Validation | sMCI vs pMCI: 84.7 | 0.88 |
| Pan et al. (2020) (64) | HS; AD; sMCI; pMCI | 262 (74.5); 237 (76.0); 175 (74.5); 115 (74.8) | ADNI-1 | 2D MRI data | CNN + EL | 5-fold Cross Validation on independent sample | pMCI vs sMCI: 62 | 0.59 |
| Ramon-Julvez et al. (2020) (65) | HS; AD; sMCI; pMCI | 181 (NA); 191 (NA); 227 (NA); 179 (NA) | ADNI-1 | MRI data and Jacobian determinant of diffeomorphic transformations | CNN | 10-fold Cross Validation | sMCI vs pMCI: 89 | 0.94 |
| Xiao et al. (2020) (66) | HS; AD; sMCI; pMCI | 50 (77.8); 51 (75.8); 45 (71.9); 51 (72.5) | ADNI-1 | MRI: GM volumes | Logistic Regression | 10-fold Cross Validation | pMCI vs sMCI: 72.9 | NA |
| Xu et al. (2020) (67) | HS; MCI | 53 (69.6); 76 (73.7) | Sample collected for the study | MEG: Brain connectivity matrix | MG2G Embedding model | Train/validation/test method: 85/10/5 partition | HS vs pMCI vs sMCI: 82; pMCI vs sMCI: 87 | 0.75–0.96 |
Note. AUC = Area Under the Curve; AD = Alzheimer’s Disease; HS = Healthy Subjects; MCI = Mild Cognitive Impairment; lMCI = late MCI; eMCI = early MCI; pMCI = progressive MCI; sMCI = stable MCI; WM = White Matter; GM = Grey Matter; CNN = Convolutional Neural Network; rDNN = randomized Deep Neural Network; FCS = Functional Connectivity Strength; RF = Random Forest; SVM = Support Vector Machine; EN = Elastic Nets; AB = Ada-Boost; nl-SVM-RBFk = non-linear SVM with Radial Basis Function kernel; SNP = Single Nucleotide Polymorphisms; GPR = Gaussian Process Regression; PLS = Partial Least Squares; OPLS = Orthogonal Partial Least Squares; MMSE = Mini Mental State Examination; VFI = Voting Feature Intervals; NN = Neural Network; AD-NET = Age-Adjust Neural Network; ResNET = Deep Residual Neural Network; SNN = Spiking Neural Network; EL = Ensemble Learning; RLR = Regularized Logistic Regression; F-FDG = Fluorine 18 fluorodeoxyglucose; DAT = Dementia Alzheimer Type; lDAT = late DAT; eDAT = early DAT; FPDS = FDG-PET DAT Score; ICA = Independent Component Analysis; GBM = Gradient Boosting Model; M2TL = Multimodal manifold-regularized transfer learning; ss = sample selection; M3TL = Multi-Modal Multi-Task Learning; DBM = Deep Boltzmann Machine; ELM = Extreme Learning Machine; MG2G = Multiple Graph2Gauss; MCFS = Multi-Cluster Feature Selection; HLR = Hierarchical Logistic Regression; NA = Not Applicable.
Finally, regarding the interpretability analysis, Table 4 shows that most of the studies reported specificity and sensitivity (44 out of 47 studies), all the studies measured the stability of their model, only four studies did not compare their results with the existing literature, and only seven did not specify which features were the most important for the classification task. On the other hand, only 19 studies presented their results along with some kind of visualization of the most relevant brain areas for the prediction of MCI conversion. In addition, only 17 articles included an analysis of confounds, and Lebedev et al. (37) were the only group that complied with the generalizability sublevel, testing their model in different cohorts.
Table 4
Analysis of the interpretability based on Kohoutová et al. (21)
| Author (year) | Model: SEN/SPE | Model: GN | Model: AC | Feature: ST | Feature: IMP | Feature: VIS | Biology: LIT |
|---|---|---|---|---|---|---|---|
| Plant et al. (2010) (23) | √ | - | √ | √ | √ | - | √ |
| Costafreda et al. (2011) (24) | √ | - | √ | √ | √ | √ | √ |
| Chincarini et al. (2011) (25) | √ | - | √ | √ | √ | - | √ |
| Filipovych et al. (2011) (26) | √ | - | - | √ | √ | √ | √ |
| Hinrichs et al. (2011) (8) | √ | - | - | √ | √ | √ | √ |
| Westman et al. (2011) (27) | √ | - | - | √ | √ | √ | √ |
| Zhang et al. (2011) (28) | - | - | - | √ | √ | √ | √ |
| Cho et al. (2012) (29) | √ | - | - | √ | √ | √ | √ |
| Gray et al. (2012) (30) | √ | - | √ | √ | √ | √ | √ |
| Toussaint et al. (2012) (31) | √ | - | √ | √ | √ | √ | √ |
| Zhang et al. (2012) (9) | √ | - | - | √ | √ | √ | √ |
| Casanova et al. (2013) (32) | √ | - | - | √ | √ | - | √ |
| Liu, X. et al. (2013) (33) | √ | - | - | √ | √ | √ | √ |
| Wee et al. (2013) (34) | √ | - | - | √ | √ | - | √ |
| Young et al. (2013) (35) | √ | - | - | √ | - | - | - |
| Guerrero et al. (2014) (36) | √ | - | √ | √ | √ | - | √ |
| Lebedev et al. (2014) (37) | √ | √ | - | √ | √ | √ | √ |
| Liu, M. et al. (2014) (38) | √ | - | - | √ | √ | √ | √ |
| Liu, F. et al. (2014) (39) | √ | - | - | √ | - | - | - |
| Min et al. (2014) (40) | √ | - | √ | √ | - | - | - |
| Suk et al. (2014) (41) | √ | - | √ | √ | √ | - | √ |
| Cheng et al. (2015) (42) | √ | - | √ | √ | - | - | - |
| Moradi et al. (2015) (43) | √ | - | - | √ | √ | - | √ |
| Ritter et al. (2015) (44) | √ | - | - | √ | √ | - | √ |
| Salvatore et al. (2015) (45) | √ | - | - | √ | √ | √ | √ |
| Collij et al. (2016) (46) | √ | - | √ | √ | √ | √ | √ |
| Li et al. (2016) (47) | √ | - | √ | √ | √ | √ | √ |
| López et al. (2016) (48) | √ | - | - | √ | √ | - | √ |
| Thung et al. (2016) (49) | √ | - | √ | √ | √ | - | √ |
| Long et al. (2017) (50) | √ | - | - | √ | √ | - | √ |
| Donnelly-Kehoe et al. (2018) (51) | √ | - | √ | √ | √ | - | √ |
| Gao et al. (2018) (52) | √ | - | - | √ | √ | - | √ |
| Khanna et al. (2018) (53) | - | - | √ | √ | √ | - | √ |
| Popuri et al. (2018) (54) | √ | - | √ | √ | √ | - | √ |
| Gupta et al. (2019) (55) | √ | - | - | √ | √ | - | √ |
| Lee et al. (2019) (56) | √ | - | - | √ | √ | √ | √ |
| Pusil et al. (2019) (57) | √ | - | - | √ | √ | - | √ |
| Spasov et al. (2019) (58) | √ | - | - | √ | √ | - | √ |
| Wee et al. (2019) (59) | √ | - | √ | √ | √ | √ | √ |
| Abrol et al. (2020) (60) | √ | - | √ | √ | √ | √ | √ |
| Gao et al. (2020) (61) | √ | - | - | √ | - | - | √ |
| Giorgio et al. (2020) (62) | √ | - | - | √ | √ | - | √ |
| Lin et al. (2020) (63) | √ | - | - | √ | √ | - | √ |
| Pan et al. (2020) (64) | - | - | - | √ | √ | - | √ |
| Ramon-Julvez et al. (2020) (65) | √ | - | - | √ | - | - | √ |
| Xiao et al. (2020) (66) | √ | - | - | √ | - | - | √ |
| Xu et al. (2020) (67) | √ | - | - | √ | √ | √ | √ |
| Total (48) | 44 | 1 | 17 | 47 | 40 | 19 | 43 |
Note. This table shows the interpretability analysis performed for each study selected in our review, following the framework proposed by Kohoutová et al. (21). Presence (√) or absence (-) of each sublevel assessment is indicated. The Behavioural Analysis, Representational Analysis and Invasive Studies sublevels are by definition not applicable to this type of study. SEN = Sensitivity; SPE = Specificity; GN = Generalizability; AC = Analysis of confounds; ST = Stability; IMP = Importance; VIS = Visualization; LIT = Literature.