Alterations of white matter abnormalities in Parkinson’s disease: a machine learning approach

The inter-tract/region dependencies of white matter in Parkinson’s disease are usually ignored by standard statistical tests. Moreover, it remains unclear whether the disruption of white-matter tracts/regions suffices to distinguish Parkinson’s disease patients from healthy controls. A machine learning approach was applied to capture the interdependencies between white-matter tracts/regions and to differentiate Parkinson’s disease patients from healthy controls. First, the mean regional white-matter profiles, including white-matter volume, fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity, were extracted as features in Parkinson’s disease patients (N = 78) and in healthy controls (N = 91). Then, feature selection and classification were performed using the t-test and a linear support vector machine, respectively. Last, the relationships between clinical variables and regional magnetic resonance indices were estimated. The combined features (white-matter volume, fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity) had the best performance, with an accuracy of 75.15% and an area under the curve of 0.8171. The most discriminative white-matter features were centered on the association fibers, commissural fibers, projection fibers, and striatal fibers. Among the discriminative regions, the right anterior limb of the internal capsule showed a positive association trend with the Unified Parkinson Disease Rating Scale III score, while the genu of the corpus callosum and the right retrolenticular part of the internal capsule showed positive association trends with the Hamilton Depression Rating Scale score. Our findings show that the multivariate machine learning approach is a promising tool for detecting abnormal white-matter tracts/regions in Parkinson’s disease and provides a multidimensional means for neuroimaging classification.


Introduction
Parkinson's disease (PD) is the second most common neurodegenerative disorder in the world, affecting 2-3% of the population over the age of 65 (Poewe, Seppi et al. 2017). Although the precise mechanism underlying the pathophysiology of PD remains unknown, increasing evidence suggests that it is associated with white matter (WM) abnormalities involving a number of brain WM pathways (Juttukonda, Franco et al. 2019). Machine learning (ML) approaches can model multiple variables at the same time and can therefore discover complicated distribution patterns in the data (Belic, Bobic et al. 2019). Compared with conventional univariate analysis, the advantages of multivariate ML analysis of neuroimaging data include: 1) enabling the investigation of implicit relationships between different variables, and 2) producing an aggregated prediction for each individual subject based on the variables collectively. Previous studies have highlighted the importance of shifting neuroimaging analysis from univariate methods to multivariate approaches (Jollans, Boyle et al. 2019, Khosla, Jamison et al. 2019). Moreover, the discriminative patterns discovered using multivariate approaches do not depend on P value thresholds; reliance on such thresholds may increase false positive rates (Type I error) (Jimenez, Angeles-Valdez et al. 2019) and should be carefully considered in standard test methods (Eklund, Nichols et al. 2016).
were extracted for each subject (Li, Liu et al. 2020). Thus, a matrix of 169 subjects x 50 features was obtained for each WM index.

FS and Classification
In the current study, we used the two-sample t-test (P threshold from 0 to 1 in steps of 0.01) as the FS method (Cui, Xia et al. 2016, Tian, Qian et al. 2020). In addition, a nested leave-one-out cross-validation (LOOCV) strategy was applied in our classification framework (Figure 1), where the selection of the optimal feature subset and the evaluation of classification performance were performed in the inner loop and outer loop, respectively (Wee, Yap et al. 2011, Wee, Yap et al. 2012). Furthermore, we applied LSVM for classification in both the inner and outer LOOCV (Guyon, Weston et al. 2002, Rakotomamonjy 2003). The implementation of LSVM was based on the LIBSVM toolbox for Matlab (http://www.csie.ntu.edu.tw/~cjlin/libsvm) (Chang and Lin 2011), and the penalty factor C was set at the default value (C = 1) (Cui, Xia et al. 2016). For each testing subject, a classification score was obtained from the LSVM, and a participant with a positive or negative score was classified as HC or PD, respectively.
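The nested procedure described above can be sketched in Python as follows. This is an illustrative sketch, not the Matlab/LIBSVM code used in the study. It assumes NumPy, SciPy and scikit-learn are installed, uses scikit-learn's `SVC` (which is built on LIBSVM) with a linear kernel and C = 1, and codes HC as +1 and PD as -1 so that a positive score means HC. The function names, and the fallback that keeps the single most significant feature when none passes the threshold, are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.svm import SVC


def select_features(X, y, p_threshold):
    """Mask of features whose two-sample t-test P value is <= p_threshold."""
    _, p = ttest_ind(X[y == 1], X[y == -1], axis=0)
    mask = p <= p_threshold
    if not mask.any():
        # Assumed fallback (not from the paper): keep the most significant feature.
        mask[np.nanargmin(p)] = True
    return mask


def fit_and_score(X_train, y_train, X_test, p_threshold):
    """Select features on the training data only, fit a linear SVM, score X_test."""
    mask = select_features(X_train, y_train, p_threshold)
    clf = SVC(kernel="linear", C=1.0).fit(X_train[:, mask], y_train)
    return clf.decision_function(X_test[:, mask])


def loo_accuracy(X, y, p_threshold):
    """Leave-one-out accuracy for one P threshold (used as the inner loop)."""
    idx = np.arange(len(y))
    hits = 0
    for i in idx:
        train = idx != i
        score = fit_and_score(X[train], y[train], X[i:i + 1], p_threshold)
        hits += int(np.sign(score[0]) == y[i])
    return hits / len(y)


def nested_loocv(X, y, thresholds):
    """Outer LOOCV: choose the threshold on the training subjects, then score the held-out one."""
    idx = np.arange(len(y))
    scores = np.empty(len(y))
    for i in idx:
        train = idx != i
        X_tr, y_tr = X[train], y[train]
        inner_acc = [loo_accuracy(X_tr, y_tr, t) for t in thresholds]
        best = thresholds[int(np.argmax(inner_acc))]  # ties resolve to the first threshold
        scores[i] = fit_and_score(X_tr, y_tr, X[i:i + 1], best)[0]
    return scores  # positive -> HC (+1), negative -> PD (-1)
```

Because feature selection is repeated inside every fold, the held-out subject never influences which features are chosen. The cost grows with the number of subjects squared times the number of thresholds, so a coarse threshold grid is advisable when experimenting.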

Evaluation of the Classification Framework
A total of six indices were estimated to evaluate the classification performance of the methods based on distinct WM indices: AUC (area under the receiver operating characteristic (ROC) curve), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In addition, a permutation test with 1000 permutations was performed to test whether the AUC and accuracy were significantly higher than chance, and whether the combined features (WMV, FA, MD, AD, RD) performed better than each single WM index. To control for multiple comparisons among the six magnetic resonance (MR) indices and the paired comparisons, a false discovery rate (FDR) method was applied.
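A minimal sketch of these evaluation steps is given below, using only the Python standard library. The text does not specify how labels were permuted or which FDR procedure was used, so the sketch assumes simple label shuffling and the Benjamini-Hochberg procedure; all function names are illustrative. In a full analysis, `score_fn` would rerun the entire cross-validated classification on the permuted labels, whereas the usage note below only rescoring fixed classifier outputs is a simplification.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(score_fn, labels, n_permutations=1000, seed=0):
    """One-sided P value: share of label shuffles scoring at least as high as the real labels."""
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator counts the observed labelling itself.
    return (exceed + 1) / (n_permutations + 1)


def benjamini_hochberg(p_values):
    """FDR-adjusted P values (Benjamini-Hochberg), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `permutation_p_value(lambda l: auc(scores, l), labels)` tests an AUC against chance for fixed classifier scores, and `benjamini_hochberg` can then be applied to the P values collected across the six WM indices.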

Discriminative Features
With the ML approach, abnormal brain regions are determined quite differently from univariate analyses such as the t-test. In the current study, the most discriminative regions were identified during cross-validation. Specifically, once the optimal P threshold had been selected, the FS procedure was performed once for each outer LOOCV fold, which yields a slightly different set of selected features in each fold. Following previous studies, the discriminative features were defined as those selected in all folds of the outer LOOCV (Qian, Zheng et al. 2018, Tian, Qian et al. 2020). Further, the classification contribution of each feature was estimated by averaging its absolute weight across all outer LOOCV folds. The higher the discriminative weight, the greater the contribution of the corresponding feature (Dai, Yan et al. 2012, Cui, Xia et al. 2016).
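The two rules above (keep only features selected in every outer fold, then rank them by mean absolute weight) can be sketched as follows. The input format, one dictionary of feature name to SVM weight per outer fold, and the feature names in the usage note are hypothetical and chosen only for illustration.

```python
def consensus_features(fold_weights):
    """Rank the features selected in every fold by their mean absolute weight.

    fold_weights: list with one dict per outer LOOCV fold, mapping the name of
    each feature selected in that fold to its linear-SVM weight.
    """
    # A feature is discriminative only if it was selected in all folds.
    common = set(fold_weights[0])
    for weights in fold_weights[1:]:
        common &= set(weights)

    n_folds = len(fold_weights)
    mean_abs = {
        name: sum(abs(weights[name]) for weights in fold_weights) / n_folds
        for name in common
    }
    # Highest mean absolute weight first, i.e. greatest contribution first.
    return sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)
```

With three folds in which a hypothetical feature `FA_IFOF_L` has weights 0.8, 0.6 and 0.7 and `MD_SLF_L` has weights -0.4, -0.5 and 0.3, both survive the intersection and `FA_IFOF_L` ranks first; a feature missing from any single fold is dropped regardless of its weight elsewhere.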

Demographic, clinical and test data
To test for group differences in age, education level and neuropsychological scores, the data were analyzed using the two-sample t-test. The gender data were analyzed using the χ2 test. No significant differences were found between the two groups in age, gender, years of education or MMSE (all P > 0.05). As expected, HDRS values differed significantly between PD and HC, indicating that PD patients experienced significantly more depressive symptoms than HC (P < 0.05) (Table 1).

Classification performance
A total of six classification frameworks using single or combined WM indices were evaluated in the current study. The classification performance is summarized in Table 2 and Figure 2. Specifically, the classification framework based on the combined WMV, FA, MD, AD and RD features discriminated PD from HC with an AUC of 0.8171. The accuracy, sensitivity, specificity, PPV and NPV were 75.15%, 74.36%, 75.82%, 72.50%, and 77.53%, respectively. Except for WMV and FA, the remaining four frameworks achieved significantly higher accuracy and AUC values than chance (P < 0.05). The permutation test indicated that the classification performance of the combined features was significantly higher than that of the WMV, FA or RD index (Combined vs. WMV: P accuracy = 0.021, P AUC = 0.036; Combined vs. FA: P accuracy = 0.020, P AUC = 0.039; Combined vs. RD: P accuracy = 0.042, P AUC = 0.047). Further, the discrimination capacity of the combined features was slightly, but not significantly, higher than that of the MD or AD feature (Combined vs. MD: P accuracy = 0.642, P AUC = 0.736; Combined vs. AD: P accuracy = 0.724, P AUC = 0.697).
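The five threshold-based indices follow directly from the confusion counts, as the short sketch below shows. The counts are not given in the text; the values used here (58 of 78 PD patients and 69 of 91 HC correctly classified, with PD treated as the positive class) are an inference that reproduces the reported percentages for the combined features.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # PD patients correctly identified
        "specificity": tn / (tn + fp),   # healthy controls correctly identified
        "ppv": tp / (tp + fp),           # predicted PD that is truly PD
        "npv": tn / (tn + fn),           # predicted HC that is truly HC
    }


# Inferred counts (assumption): 78 PD = 58 TP + 20 FN, 91 HC = 69 TN + 22 FP.
metrics = classification_metrics(tp=58, fn=20, tn=69, fp=22)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumed counts the accuracy is 127/169, which rounds to the reported 75.15%, and the other four indices likewise round to the reported values.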

Discriminative WM Features
There were 42 discriminative WM features for the LSVM classifier, comprising 6 WMVs, 8 FAs, 10 MDs, 15 ADs, and 3 RDs (Table 3 and Figure 3). Specifically, the 6 WMV features were derived from 3 left WM regions (the external capsule, cingulum, and fornix (cres)/stria terminalis) and 3 right WM regions (the tapetum, uncinate fasciculus, and superior longitudinal fasciculus). The 8 FA features were derived from 1 bilateral WM region (the inferior fronto-occipital fasciculus), 2 left WM regions (the uncinate fasciculus and superior longitudinal fasciculus), and 4 right WM regions (the hippocampus part of the cingulum, posterior thalamic radiation, superior fronto-occipital fasciculus, and retrolenticular part of the internal capsule). The 10 MD features were derived from 3 bilateral WM regions (the inferior fronto-occipital fasciculus, anterior limb of the internal capsule, and superior corona radiata), 1 left WM region (the superior longitudinal fasciculus), 2 right WM regions (the hippocampus part of the cingulum and retrolenticular part of the internal capsule), and 1 middle WM region (the splenium of the corpus callosum). The 15 AD features were derived from 2 bilateral WM regions (the inferior fronto-occipital fasciculus and anterior limb of the internal capsule), 3 left WM regions (the superior longitudinal fasciculus, superior corona radiata, and posterior corona radiata), 5 right WM regions (the hippocampus part of the cingulum, posterior limb of the internal capsule, retrolenticular part of the internal capsule, fornix (cres)/stria terminalis, and cingulum), and 3 middle WM regions (the genu, body, and splenium of the corpus callosum). The 3 RD features were derived from 1 left WM region (the superior longitudinal fasciculus), 1 right WM region (the superior corona radiata), and 1 middle WM region (the splenium of the corpus callosum).

Relationship between discriminative features and clinical variables
The relationships between clinical variables (UPDRS III and HDRS scores) and regional MR indices were estimated. No correlation reached significance at P < 0.05, although association trends were observed in three brain regions (0.05 < P < 0.1). Specifically, the right anterior limb of the internal capsule (R = 0.190, P = 0.096) showed a positive association trend with the UPDRS III score, while the genu of the corpus callosum (R = 0.196, P = 0.085) and the right retrolenticular part of the internal capsule (R = 0.220, P = 0.053) showed positive association trends with the HDRS score (Figure 4).
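As a rough consistency check on these trends, a correlation coefficient can be converted to a t statistic with n - 2 degrees of freedom. The sketch below assumes a Pearson correlation computed over all 78 PD patients, which the text does not state, and uses approximate two-sided critical values for 76 degrees of freedom taken from standard t tables. Under those assumptions the three statistics come out at about 1.69, 1.74 and 1.97, all between the P = 0.10 and P = 0.05 cut-offs, in line with the reported trends.

```python
import math


def t_from_r(r, n):
    """t statistic for a Pearson correlation r over n subjects (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


N_PATIENTS = 78      # assumption: correlations computed over all PD patients
T_CRIT_P10 = 1.665   # approximate two-sided critical value, df = 76, P = 0.10
T_CRIT_P05 = 1.992   # approximate two-sided critical value, df = 76, P = 0.05

reported_r = {
    "right anterior limb of internal capsule vs. UPDRS III": 0.190,
    "genu of corpus callosum vs. HDRS": 0.196,
    "right retrolenticular part of internal capsule vs. HDRS": 0.220,
}

t_values = {name: t_from_r(r, N_PATIENTS) for name, r in reported_r.items()}
for name, t in t_values.items():
    is_trend = T_CRIT_P10 < t < T_CRIT_P05
    print(f"{name}: t = {t:.3f}, trend (0.05 < P < 0.1): {is_trend}")
```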

Discussion
First, the current study demonstrated that aberrant brain patterns in PD can be detected using a multivariate ML approach, and that the aberrant WM regions were primarily located in the association fibers, commissural fibers, projection fibers, and striatal fibers. Second, PD and HC could be well differentiated using WM features, and the combined features (WMV, FA, MD, AD, and RD) had higher classification performance than any single feature. Moreover, among the discriminative regions, the right anterior limb of the internal capsule showed a positive association trend with the UPDRS III score, while the genu of the corpus callosum and the right retrolenticular part of the internal capsule showed positive association trends with the HDRS score.

The most discriminative WM connections/regions: The association fibers
The inferior fronto-occipital fasciculus had the top discriminative weight in our study and therefore made the greatest contribution to classification. Abnormalities of the inferior fronto-occipital fasciculus have also been reported in previous studies (Wang, Jiang et al. 2016, Zarkali, McColgan et al. 2020). The inferior fronto-occipital fasciculus, as an important component of the anatomical substrates, is involved in peripheral vision and visuospatial processing (Schmahmann, Smith et al. 2008). WM changes within the inferior fronto-occipital fasciculus have been found in patients with low visual performance (Zarkali, McColgan et al. 2020) and might account for freezing of gait in PD (Wang, Jiang et al. 2016). As part of the association fibers, disrupted integrity of the uncinate fasciculus and superior longitudinal fasciculus has also been reported in PD patients with freezing of gait (Pietracupa, Suppa et al. 2018, Tan, Keong et al. 2019). Meanwhile, axonal damage in the superior longitudinal fasciculus was associated with cognitive impairment in PD (Duncan, Firbank et al. 2016).
Anatomically, the fornix and cingulum comprise the major efferent and afferent fibers of the hippocampus and are implicated in cognitive deficits in PD (Kamagata, Motoi et al. 2012). Similarly, abnormalities in the WM fibers that connect the prefrontal cortex to various brain regions, especially the limbic system, can also cause emotional disturbance in PD patients (Li, Liu et al. 2020).

The most discriminative WM connections/regions: The commissural fibers
The corpus callosum is the largest WM bundle in the human brain (Catani, Howard et al. 2002), and its integrity can be used to assess the deterioration of inter-hemispheric connectivity. Because it transmits cognitive, sensory and motor information across the hemispheres, deficits in this area could affect complex motor tasks, as in freezing of gait (Fling, Dale et al. 2016), and could contribute to cognitive impairment (Bledsoe, Stebbins et al. 2018) in PD patients. The neurodegeneration of the corpus callosum in PD may reflect its role in motor, cognitive and emotional features (Dillon, Gonenc et al. 2018), which is in accordance with our finding that deficits of the genu of the corpus callosum showed a positive association trend with the HDRS score.

The most discriminative WM connections/regions: The projection fibers
Thalamic fibers have been shown to be involved in the function of the basal ganglia-thalamo-cortical loop, which also affects movement and perception (Shine, Matar et al. 2013). The corona radiata is a dense WM structure that carries almost all of the neural connections to and from the cerebral cortex, as well as the motor tracts (Guimaraes, Campos et al. 2018). Fiber tracts passing through the internal capsule connect the cerebral hemispheres with subcortical structures, the brainstem, and the spinal cord (Emos and Agarwal 2020). Because the internal capsule is an important part of the motor circuit, damage to it may result in clinically symptomatic motor and sensory deficits (Schmahmann, Smith et al. 2008). Previous studies reported that PD patients with freezing of gait showed more pronounced WM abnormalities than HC in the internal capsule, thalamic radiation, and corona radiata (Wang, Jiang et al. 2016, Pietracupa, Suppa et al. 2018, Bharti, Suppa et al. 2019). The anterior limb of the internal capsule carries projection fibers from the prefrontal cortex, rostral cingulate area and supplementary motor area down to the thalamus, hypothalamus and basis pontis (Schmahmann, Smith et al. 2008), and contains fiber tracts that travel transversely between the caudate nucleus and the putamen (Emos and Agarwal 2020). Its deficits have been related to freezing of gait in PD patients (Pietracupa, Suppa et al. 2018), which accords with the positive association trend between this region and the UPDRS III score in our study, although the trend was not statistically significant. A significant correlation might emerge in a larger sample, in patients of the freezing-of-gait subtype, or when the UPDRS III score is evaluated in the "off" state. The retrolenticular segment of the internal capsule contains fibers of the optic radiation, which connect the lateral geniculate nucleus to the calcarine fissure (Emos and Agarwal 2020).
The finding that the retrolenticular part of the internal capsule showed a positive association trend with the HDRS score has not been reported in previous studies, but abnormal structure and dysfunction of the calcarine fissure have been found in patients with major depression (Chen, Kendrick et al. 2017) and in depressed PD patients (Zhu, Song et al. 2016). We speculate that the abnormal pathway in this "visual region" might indicate impaired sensation and perception in patients with depression, or altered cognitive functions such as attention, which may further modulate mood regulation processes (Zhu, Song et al. 2016).

The most discriminative WM connections/regions: The striatal fibers
Although these corticostriatal pathways allow different regions of the basal ganglia to be involved in motor control, emotion, and cognition, their role in PD may focus on motor control (Wang, Jiang et al. 2016, Pietracupa, Suppa et al. 2018, Bharti, Suppa et al. 2019).
The current study still has a few limitations. First, executive deficits and other dysfunctions were not completely assessed in the patients, and it is essential to collect and analyze such data in future studies. Second, because of inter-subject brain differences and scanner variability, it is important to validate the results with a larger sample and a multicenter imaging dataset.

Conclusions
Overall, our findings indicate that the multivariate ML approach is a promising choice for detecting abnormal WM tracts/regions in PD and provides a multidimensional means for neuroimaging research and classification.

Zhu, Y., X. Song, M. Xu, X. Hu, E. Li, J. Liu, Y. Yuan, J. H. Gao and W. Liu (2016

Values are represented as the mean ± SD. For comparisons of demographics, #P value for the gender distribution in the two groups was obtained using a χ2 test. Comparisons of neuropsychological scores between the two groups (HC, PD) were analyzed using two-sample t-tests. NA, not applicable; F, female; M, male. P < 0.05 was considered significant.

Figure 1
Schematic diagram of the combined WM features-based classification framework. The mean regional WM profiles, including WM volume (WMV), fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD), were extracted as features. Then, feature selection and classification were performed using the t-test and a linear support vector machine (LSVM), respectively. A nested leave-one-out cross-validation (LOOCV) was applied for feature selection and classifier training.

Figure 2
Receiver operating characteristic (ROC) curves for classification of PD patients using single or combined WM features. A total of six classification frameworks using single or combined WM indices were evaluated in the current study. The classification performance is summarized in Table 2. The areas under the ROC curve (AUC) for the combined features and the WMV, FA, MD, AD and RD features were 0.82, 0.54, 0.53, 0.77, 0.80, and 0.67, respectively.