Independent and reproducible hippocampal radiomic biomarkers for multisite Alzheimer’s disease: diagnosis, longitudinal progress and biological basis

Background: Hippocampal morphological change is one of the main hallmarks of Alzheimer's disease (AD). The primary aim of this study is to explore whether radiomic features are robust biomarkers for AD and to further explore the biological basis of those features. Methods: Hippocampal radiomic features were extracted for classification and prediction using a support vector machine (SVM) with multisite data from 1943 subjects. Results: Multivariate classifier-based SVM analysis provided individual-level predictions for distinguishing AD patients (N=261) from normal controls (NCs; N=231) with an accuracy of 88.21% under inter-site cross-validation. Further analyses of a large, independent ADNI dataset (N=1228) reinforced these findings. In mild cognitive impairment (MCI) groups, a systematic analysis demonstrated that the identified features were significantly associated with clinical features (e.g., apolipoprotein E (APOE) genotype, polygenic risk scores, cerebrospinal fluid (CSF) Aβ, CSF Tau) and with longitudinal changes in cognitive ability; more importantly, the radiomic features showed a pattern of change consistent with the changes in the MMSE scores over 5 years of follow-up. Conclusion: These comprehensive results suggest that hippocampal radiomic features can serve as robust biomarkers for clinical applications in AD/MCI.


Background
Magnetic resonance imaging (MRI)-based biomarkers that target gray matter atrophy or shape alterations are the most commonly used measures for the early detection of Alzheimer's disease (AD) [1][2][3]. These markers have been used to perform classification analyses that distinguish AD patients from normal controls (NCs) with 80-90% accuracy, reaching 95% in several small-sample studies [4]. However, because of the limited sample sizes, the reproducibility and generalizability of these results are debatable, yet this kind of robustness is the most fundamental requirement for clinical translation [5][6][7][8].
Hippocampal atrophy, or shape change, is one of the main hallmarks of AD [9].
However, the volume and/or shape are only crude proxies for the complex anatomical changes that occur in AD, and studies of atrophy often ignore the fact that this process is not uniform across different disease phases [10]. The hippocampus undergoes microstructural changes before severe atrophy, and pathological changes such as neurofibrillary tangles (NFTs) and amyloid-β (Aβ) are not directly detectable at the current resolution of clinical MRI [11]. Thus, novel MRI analyses that yield greater information about subtle changes in the hippocampus would be a significant contribution.
Radiomics, a method of texture analysis, provides information about first-, second-, and higher-order morphological features [12][13][14]. Texture analysis includes a variety of image analysis techniques that quantify the variations in surface intensity or patterns, including some that are imperceptible to the human visual system [15], and is a useful way to extract detailed information from brain images, increase the precision of diagnosis, and assess prognosis [11,14,16-19].
The apolipoprotein E (APOE) gene is known as a major genetic risk factor for AD [20]. Polygenic risk scores (PGRSs), comprehensive indicators that combine multiple risk alleles, provide a quantitative measure of genetic disease risk [21]. Previous large-scale genome-wide association studies found an association between the PGRSs for AD and the structural indices of certain brain regions [22][23][24]. During the progression of AD, Aβ plaques are considered to occur in the early stage, while Tau accumulation is considered to be the main factor underlying later dysfunction [25,26]. However, the association between the imaging biomarkers and these neurobiological measures is not yet clear.
Inspired by these studies, the first aim of this study is to explore whether hippocampal radiomic features with reproducible change patterns can serve as MRI biomarkers of AD using multisite MRI data (715 subjects from 6 sites). To further assess the reproducibility and generalizability of the findings, an independent dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (N = 1228) was included. The second hypothesis is that these hippocampal radiomic features or classification outputs have a solid neurobiological basis, thereby improving the power for individual diagnosis or prediction. For this purpose, we investigate the neurobiological basis of the identified hippocampal radiomic features by relating these features to other variables, including APOE, PGRS, CSF Aβ, CSF Tau, and the progression of disease status of MCI subjects. The results confirm that hippocampal radiomic features are robust neuroimaging biomarkers with a solid neurobiological basis that are useful for diagnosing AD and predicting the likelihood of progression from MCI; therefore, they have translational potential.

Institute (MNI) space using affine registration and resampled to 1 mm×1 mm×1 mm.

Materials and methods
The hippocampus was then segmented bilaterally using a revised segmentation method based on multi-atlas-based local label learning (LLL) (https://www.nitrc.org/projects/locallabel), an automatic segmentation method that includes N3 bias correction of image inhomogeneity [27,28]. For each side, we then extracted 495 features, including intensity-based features, shape-based features and texture-based features, across 8 wavelet-based frequency domains.
The definitions and detailed descriptions can be found in previous publications [19,29], and are listed in the supplementary material S11.

Redundancy removal and statistical analysis
Because shape features are the same in different frequency domains of the wavelet transform, duplicate features were removed before subsequent analysis. To reduce site effects, we first tested for group differences at each site and then performed meta-analytic tests to integrate the multisite results [30]. At each site, statistical significance was determined using two-sample, two-sided t-tests for any two groups after the age and gender effects were regressed out using a linear regression model. We then used the Liptak-Stouffer z-score method [31] to integrate the results [32,33]. Briefly, the p-value of each feature at the i-th site was converted to the corresponding score $Z_i$ using the formula $Z_i = \Phi^{-1}(1 - p_i)$, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. Then, a combined z-score for each feature was obtained using the Liptak-Stouffer formula $Z = \sum_{i=1}^{k} w_i Z_i / \sqrt{\sum_{i=1}^{k} w_i^2}$, where $k$ is the number of sites and $w_i$ is the inverse of the variance in $Z_i$, which represents a relative measure of the statistical power compared with the other datasets. The z-scores were expected to follow a standard normal distribution under the null hypothesis. Using this method, we calculated the p-value according to the corresponding z-scores for the tests of group difference between the NC, MCI and AD groups. The correlation between the t-score of each feature in the ADNI dataset and the z-score of each feature in the in-house dataset was computed to verify the reproducibility of the results.
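The meta-analytic combination described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names, the example p-values and the equal weights are hypothetical, and the use of a one-sided upper-tail p-value for the combined score is an assumption made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def p_to_z(p):
    """Convert a p-value to a z-score via Z_i = Phi^{-1}(1 - p_i)."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def stouffer_combine(p_values, weights):
    """Combine per-site p-values with the weighted Liptak-Stouffer formula.

    Returns the combined z-score and (assumption) its one-sided
    upper-tail p-value under the standard normal null.
    """
    if len(p_values) != len(weights):
        raise ValueError("need exactly one weight per site")
    z_scores = [p_to_z(p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator
    p_combined = 1.0 - _STD_NORMAL.cdf(z_combined)
    return z_combined, p_combined


# Hypothetical p-values for one feature at three sites, equally weighted.
site_p = [0.01, 0.04, 0.20]
site_w = [1.0, 1.0, 1.0]
z, p = stouffer_combine(site_p, site_w)
```

With equal weights the formula reduces to the sum of the site z-scores divided by the square root of the number of sites; unequal weights let better-powered sites contribute more to the combined score.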

Classification analysis, validation and generalizability
To assess the multivariate performance of the radiomic features, we established a support vector machine (SVM) model to classify the AD patients and NCs.
Specifically, for each feature in each center, we first applied a common min-max feature normalization scheme. A nonlinear SVM with a radial basis function (RBF) kernel was then constructed using LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), and SVM-recursive feature elimination (SVM-RFE) was used for feature selection.
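As an illustration of the normalization step only, the following Python sketch rescales one feature to the range [0, 1] separately within each site. It assumes that "for each feature in each center" means that the minimum and maximum are computed per site; the function name, the handling of constant features and the toy data are hypothetical, and the SVM and SVM-RFE steps are not reproduced here.

```python
from collections import defaultdict


def min_max_by_site(values, sites):
    """Rescale one feature to [0, 1] separately within each site.

    values: list of floats, one per subject.
    sites:  list of site labels aligned with values.
    """
    by_site = defaultdict(list)
    for v, s in zip(values, sites):
        by_site[s].append(v)
    ranges = {s: (min(vs), max(vs)) for s, vs in by_site.items()}

    scaled = []
    for v, s in zip(values, sites):
        lo, hi = ranges[s]
        # Assumption: a feature that is constant within a site is mapped to 0.
        scaled.append(0.0 if hi == lo else (v - lo) / (hi - lo))
    return scaled


# Toy example: one feature measured on different scales at two sites.
feature = [2.0, 4.0, 6.0, 10.0, 30.0, 20.0]
site = ["A", "A", "A", "B", "B", "B"]
normalized = min_max_by_site(feature, site)
```

Scaling within each site places the features from different scanners on a common range before they are passed to the classifier.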
The classification analysis was evaluated with inter- and intra-site cross-validation methods [34]. To test the generalizability of the classification with radiomic features in the ADNI data, two additional independent cross-validation steps were performed: 1) in-house data were used as the training set, and ADNI data were applied as the testing set (CV3); 2) conversely, the ADNI data were used as the training set, and in-house data were used as the testing set (CV4) (Table 2, supplementary material S02-S04).
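One common way to implement inter-site cross-validation is a leave-one-site-out split, in which each site in turn is held out for testing while the remaining sites are used for training. The Python sketch below illustrates this scheme under the assumption that it matches the inter-site procedure used here; the function name and the toy site labels are hypothetical.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) for each site."""
    for held_out in sorted(set(sites)):
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train_idx, test_idx


# Toy example: six subjects scanned at three sites.
subject_sites = ["A", "A", "B", "B", "B", "C"]
folds = list(leave_one_site_out(subject_sites))
```

Because no subject from the held-out site contributes to training, each fold estimates how well the classifier transfers to an unseen scanner and cohort. The CV3 and CV4 steps follow the same logic with the in-house and ADNI datasets treated as two blocks.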
The classification performance was evaluated by means of several accuracy metrics [accuracy (ACC), sensitivity (SEN) and specificity (SPE)] and the areas under the receiver operating characteristic (ROC) curves (AUCs) [35]. To further verify the clinical relevance of the radiomic-based classification, we also investigated the correlations between the classifier output (decision score) and the cognitive ability scores of individual subjects in the test sets.
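For reference, the metrics named above can be computed as in the following Python sketch. It assumes binary labels coded as 1 (AD) and 0 (NC) with both classes present; the function names, the 0.5 decision threshold and the toy scores are hypothetical. The AUC is computed as the proportion of positive-negative pairs in which the positive subject receives the higher score, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn)  # true positive rate
    spe = tn / (tn + fp)  # true negative rate
    return acc, sen, spe


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: three AD subjects (1) and three NCs (0).
labels = [1, 1, 1, 0, 0, 0]
decision_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in decision_scores]
acc, sen, spe = confusion_metrics(labels, predicted)
auc = roc_auc(labels, decision_scores)
```

Unlike ACC, SEN and SPE, the AUC does not depend on the decision threshold, which is why both types of measure are usually reported together.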

Relationship between radiomic features and cognitive ability, APOE genetics, Aβ, Tau, and longitudinal changes in the cognitive abilities in the MCI group
In this study, the conservative features, which were defined as the overlap of the altered and conserved radiomic features, were obtained from group meta-analysis and classification analysis (Figure 1). We then explored the relationship between the clinical information and the conservative features, as well as their consistency with longitudinal changes in cognitive ability. To assess the association between the radiomic features and cognitive ability, Pearson's correlation coefficients were calculated between the features and the MMSE scores in individual subjects, both combining AD and MCI and treating each group separately. We also performed these correlation analyses in the ADNI and in-house cohorts to further evaluate the generalizability of the identified features.
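The correlation step can be sketched in Python as follows. This is a plain implementation of Pearson's coefficient, not the analysis code of this study; the function name and the feature and MMSE values are hypothetical and serve only to show the computation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical values of one radiomic feature and MMSE scores in four subjects.
feature_values = [0.2, 0.4, 0.5, 0.9]
mmse_scores = [28, 25, 22, 15]
r = pearson_r(feature_values, mmse_scores)
```

In this toy example the coefficient is negative, because the feature increases as the MMSE score decreases.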
We also evaluated the differences in the identified features between Aβ+ and Aβ- in the MCI subgroups (also between Tau+ and Tau- and between APOE ε4+ and APOE ε4-). Aβ and Tau occasionally coexist, which greatly increases the risk for AD [25,36,37]. Thus, we also evaluated the differences in the identified features in three MCI subgroups (Aβ+&Tau+, Aβ-&Tau+ or Aβ+&Tau-, and Aβ-&Tau-). To explore the relationship between Aβ/Tau and the radiomic features, Pearson's correlations between the radiomic features and Aβ/Tau were also evaluated in the MCI groups (supplementary material S05).
To explore the relationship between the radiomic features and PGRSs, we used 533 subjects with genome-wide single-nucleotide polymorphism data. For each subject, we used the "score" utility in PLINK [38] and recent summary statistics for AD [22] to compute the polygenic AD risk score. Spearman's correlations between the radiomic features and the PGRSs were also calculated after regressing out the group and race effects (supplementary material S06).
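Conceptually, a polygenic risk score is a weighted sum of risk-allele counts, with weights taken from the effect sizes in the summary statistics. The Python sketch below shows this simplified sum only; it does not reproduce PLINK's own handling of missing genotypes or score scaling, and the function name, variant identifiers, effect sizes and allele counts are all hypothetical.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of risk-allele counts: sum over variants of beta * dosage.

    dosages:      dict mapping variant id -> allele count (0, 1 or 2).
    effect_sizes: dict mapping variant id -> effect size (beta).
    Variants absent from the subject's genotypes are skipped (assumption).
    """
    score = 0.0
    for variant, beta in effect_sizes.items():
        if variant in dosages:
            score += beta * dosages[variant]
    return score


# Hypothetical effect sizes and one subject's allele counts.
effects = {"snp_a": 0.30, "snp_b": -0.10, "snp_c": 0.05}
subject = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
prs = polygenic_score(subject, effects)
```

The resulting per-subject score is the quantity that is then correlated with each radiomic feature.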
We also evaluated whether the changes in these hippocampal radiomic features (Table S11) to assess the feasibility of using these features to predict disease progression and to predict whether the MCI subject would progress to AD. To continue exploring the different patterns of the changes in the radiomic features of the PMCI and SMCI subjects, 35 PMCI subjects and 29 SMCI subjects with more than five points of longitudinal data and a time interval between each visit greater than 1 year were selected for further analyses of the relationship between the changes in the MMSE scores and the changes in the identified radiomic features (supplementary material S07-S08).

Figure S6). To evaluate the accuracy in each pair of sites, we used the same method as CV1 to validate the robustness of the radiomic features, revealing a mean ACC higher than 0.84 (supplementary material S04). With the intra-site cross-validation, the mean AUC = 0.95±0.02 (Figure 3a-3b, Table 2).
When using the in-house data as the training data and the ADNI data as the testing data, we still achieved a high AUC of 0.84 (ACC = 0.79) (Figure 3c Table S12). In addition, the AUC for predicting the conversion of MCI subjects improved from 0.65 to 0.82 when radiomic features were added one by one to a logistic regression model until the AUC no longer increased (Figure 5c, S07). For subjects with >5 visits, the changes in the radiomic features followed a highly consistent trend with the changes in the MMSE scores in the PMCI subjects (N = 34) and SMCI subjects (N = 29) (Figure 5d, Figure S10).
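The procedure of adding features one by one until the AUC stops increasing can be sketched as a greedy forward selection. The Python example below is an illustration under stated assumptions, not the procedure used in this study: it assumes that the best remaining candidate is added at each step, and `toy_score` is a hypothetical stand-in for fitting a logistic regression model and computing its AUC. The feature names and score values are invented for the example.

```python
def forward_select(candidates, score_fn):
    """Greedily add the feature that most improves score_fn; stop when none does.

    candidates: list of feature names.
    score_fn:   callable taking a list of feature names and returning a score
                (for example, a cross-validated AUC).
    """
    selected = []
    best = score_fn(selected)
    remaining = list(candidates)
    while remaining:
        trials = [(score_fn(selected + [c]), c) for c in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no remaining feature increases the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical additive gains standing in for a real model evaluation.
_TOY_GAINS = {"f1": 0.2, "f2": 0.1, "f3": 0.0}


def toy_score(features):
    """Toy score: a baseline of 0.5 plus a fixed gain per selected feature."""
    return 0.5 + sum(_TOY_GAINS[f] for f in features)


chosen, final_score = forward_select(["f3", "f2", "f1"], toy_score)
```

In practice the score should be estimated on data not used for fitting, since selecting features on the same data that is used to report the AUC inflates the estimate.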

Discussion
In a large-scale analysis of data pooled across sites, we demonstrated that radiomic features appear to be a robust, reproducible and generalizable imaging signature of AD using four types of cross-validation with widely used machine learning techniques. Moreover, these features are related to the neurobiological substrates underlying the progression of AD and cognitive decline in MCI. This is of great significance for early clinical diagnosis and prognostic follow-up in AD.
The multisite MRI findings of altered hippocampal radiomic features in AD confirm and extend previous neuroimaging findings [11,14,19,28]. Our results show robust and reproducible AD-related alterations in intensity features, such as kurtosis, mean, mad, median, entropy and uniformity; these features reflect the properties or distribution of gray matter within the hippocampus in AD patients [11], which may indicate atrophy or gray matter loss in the hippocampus [39,40]. Interestingly, some shape features, such as the area, compactness and surface-to-volume ratio, were also altered in AD/MCI; these findings replicate those of previous studies [16,41-43], and these phenomena indicate that atrophy of the hippocampus does not occur collaterally [44,45]. Textural features, such as long run emphasis (LRE), gray level nonuniformity (GLN), and run length nonuniformity (RLN), were the most significantly different features between the two groups. Although pathological features of AD, such as NFTs and Aβ plaques, cannot be detected by MRI, these microstructural changes might lead to altered textural patterns detectable via texture analysis [11,46-48]. In support of this, the 27 identified features showed significant associations with the positron emission tomography (PET) amyloid value in the hippocampus (supplementary material S10). A subset of radiomic features was also significantly associated with high-risk genetic status, Aβ and Tau deposition, and changes in cognitive ability. These results highlight that the altered radiomic features have a solid neurobiological basis and confirm that, with radiomics, "images are more than pictures, they are data" [13].
Neuroimaging genetics has moved from establishing heritable phenotypes to finding genetic markers associated with imaging phenotypes. APOE polymorphic alleles are the main genetic determinants of AD risk [49,50]. As expected, this study shows significant radiomic differences between APOE ε4+ and APOE ε4- in MCI subjects.
Unlike a single-gene focus, the PGRS has been proposed to have improved predictive ability and statistical power. The PGRS of AD has been found to be associated with cognitive decline and brain imaging measures, highlighting the fact that elevated genetic risk influences traits even among individuals without dementia [23,24,51-53]. Hence, the association between the PGRS and radiomic features suggests that hippocampal textures can be used to predict whether a subject has a genetic predisposition to accelerated progression from MCI to AD. Aβ plaques and Tau NFTs are the hallmark lesions of AD, and the strong association between radiomic features and the progression of Aβ and/or Tau in the MCI subjects indicates the strong neurobiological substrate of these features [2,11,16,54]. It is still a significant challenge to evaluate and predict the progress of MCI [28,55]. A particularly important finding is that the changes in the pattern of the identified features were strongly correlated with the changes in the MMSE scores of the PMCI and SMCI groups over five years. This provides powerful evidence for using these features to follow the progress of high-risk groups, which has important clinical implications, although there is no definite marker to determine this eventual progress at the present time.
MRI studies aimed at identifying robust disease biomarkers require large numbers of samples, which can be difficult to obtain from a single site [56], and the generalizability of models to larger multisite datasets is an important step to increasing the statistical power necessary to identify biomarkers in translational neuroimaging [6,8]. It should be noted that the ADNI cohort, which has a large sample (more than 1500), was collected from more than 50 centers [57,58], and the mean sample size per center is relatively small (Figure S1). Importantly, because these results were validated with data from independent sites, our approach is expected to have greater generalizability to future datasets compared to a single-site study using internal validation methods such as leave-N-out cross-validation [4,34]. To some degree, better classification might be achieved via parameter tuning, feature selection/combination, or other methods, such as deep learning.
However, these benefits must be balanced with the risk of overfitting and reduction of the generalizability of the classification approach to novel data [35]. More significantly, the negative correlation between the classifier's output and the MMSE implies that the more severe the disease is, the easier it is to recognize, which implies a potential bridge between medical imaging and personalized medicine [59].

Limitations and caveats
Despite these advances, this study has several limitations that should be considered. First, the most obvious advantage of a multisite study is the large number of subjects with more generalizable information than a single site. Most but not all of them are MCI subjects, and these large pools of data come from different sites, which inevitably leads to the results being affected by the inhomogeneity of the subjects. Despite the increasing number of multisite MRI studies, the problem of site/scanner confounds has not been entirely addressed, although a recent study provides a statistical framework to approach this issue by correcting distributional shifts between datasets [60]. Although the classification results are encouraging, the accuracy may improve with the use of a more powerful classification approach, such as deep learning, with more samples [61]. We should also reconsider how the normalization and segmentation methods affect the radiomic measures, although we have confirmed the test-retest reliability of these measures based on the LLL method [62]. Finally, combining markers from other brain regions will help identify robust and reproducible biomarkers for clinical applications [4,8,63-65].

Conclusion
This systematic study highlights the presence of hippocampal textural abnormalities in AD and the possibility that textures can serve as neuroimaging biomarkers for AD for further clinical applications.

Table 1. Demographic and genetic characteristics of the target sample.