Multivariate radiomics models based on 18F-FDG hybrid PET/MRI for distinguishing between Parkinson’s disease and multiple system atrophy

We aimed to construct multivariate radiomics models using hybrid 18F-FDG PET/MRI for distinguishing between Parkinson’s disease (PD) and multiple system atrophy (MSA). Ninety patients (60 with PD and 30 with MSA) were randomly assigned to training and test sets in a 7:3 ratio. All patients underwent 18F-fluorodeoxyglucose (18F-FDG) PET/MRI to simultaneously obtain metabolic images (18F-FDG), structural MRI images (T1-weighted imaging (T1WI), T2-weighted imaging (T2WI) and T2-weighted fluid-attenuated inversion recovery (T2/FLAIR)) and functional MRI images (susceptibility-weighted imaging (SWI) and apparent diffusion coefficient). Using PET and five MRI sequences, we extracted 1172 radiomics features from the putamina and caudate nuclei. The radiomics signatures were constructed with the least absolute shrinkage and selection operator algorithm in the training set, with progressive optimization through single-sequence and double-sequence radiomics models. Multivariable logistic regression analysis was used to develop a clinical-radiomics model, combining the optimal multi-sequence radiomics signature with clinical characteristics and SUV values. The diagnostic performance of the models was assessed by receiver operating characteristic analysis and decision curve analysis (DCA). The radiomics signatures showed favourable diagnostic efficacy. The optimal model comprised structural (T1WI), functional (SWI) and metabolic (18F-FDG) sequences (RadscoreFDG_T1WI_SWI), with areas under the curve (AUCs) of 0.971 and 0.957 in the training and test sets, respectively. The integrated model, incorporating RadscoreFDG_T1WI_SWI, three clinical characteristics (disease duration, dysarthria and autonomic failure) and SUVmax, demonstrated satisfactory calibration and discrimination in the training and test sets (AUCs of 0.993 and 0.994, respectively). DCA indicated that the clinical-radiomics integrated model offered the highest clinical benefit.
The radiomics signature with metabolic, structural and functional information provided by hybrid 18F-FDG PET/MRI may achieve promising diagnostic efficacy for distinguishing between PD and MSA. The clinical-radiomics integrated model performed best.


Introduction
Parkinson's disease (PD) is the second most common neurodegenerative disease in the world and is characterized by the progressive degeneration and death of dopaminergic neurons in the substantia nigra and of neuron terminals in the basal ganglia [1]. At present, the diagnosis of PD depends mainly on clinical manifestations [2], as identified by physicians relying on their clinical experience. A recent clinicopathological study demonstrated that only 50% of early Parkinsonian patients were accurately diagnosed with PD [3]; the others were mostly misdiagnosed as atypical Parkinsonian syndromes (APS). Multiple system atrophy (MSA), one of the most common APSs, is difficult to distinguish from PD because both are α-synucleinopathies [4] and their clinical symptoms often overlap, especially in the early stages [5]. However, MSA progresses aggressively and has a poorer prognosis, with a mean survival of 8-9 years [4]. Early and reliable distinction between PD and MSA is very important for developing individualized treatment plans as soon as possible, which can not only improve patients' survival and quality of life but also reduce the psychosocial and economic burdens on patients and their families [6].
Neuroimaging has become an indispensable tool for the diagnosis of PD and other motor disorders, and magnetic resonance imaging (MRI) and positron emission tomography (PET) are the mainstays of brain imaging [7]. MRI is considered the best method to display the morphology of the brain. Putaminal atrophy, the rim sign and the 'hot cross bun' sign are typical MRI features of MSA [4], while there is no specific MRI sign in PD [8]. Compared with brain MRI, 18F-fluorodeoxyglucose (18F-FDG) PET shows a higher diagnostic sensitivity [9]. Glucose metabolism in the basal ganglia is usually normal or slightly increased in early PD, while it is decreased in MSA because of putaminal microstructural damage [2,10]. However, performing multiple separate examinations is time-consuming and does not benefit patient management [7].
A hybrid PET/MR system combines two optimal brain imaging modalities and provides one-stop access to structural, functional and metabolic images, which ensures their consistency in time and space [11,12]. However, the aforementioned MRI abnormalities are often not visible in the early stages [13], which means conventional MRI has a lower sensitivity for the early differential diagnosis of PD and MSA. Although some applications of neurofunctional MRI, such as resting-state functional MRI and diffusion tensor imaging, have been shown to help discriminate between PD and MSA with high accuracy [14,15], they are not routinely used because they require complicated image processing and because Parkinsonian patients have difficulty remaining immobile during a relatively long scan. In addition, PET imaging can only help to support or refute clinical impressions and cannot be used for direct diagnosis without established regulatory guidelines [8,16]. Therefore, it is necessary to find a quantitative and more accurate method to solve this clinical problem.
Early studies have shown the potential of radiomics for disease detection, diagnosis, prognosis and therapeutic assessment [17-21]. Radiomics refers to converting medical images into mineable high-dimensional data via high-throughput extraction of abundant imaging features [17,22], which can support clinical decision-making through quantitative analysis of these data. A growing number of studies have shown that radiomics and deep learning can achieve superior diagnostic performance in the clinical assessment of PD, or in distinguishing PD from healthy controls or APSs, using different brain imaging techniques [23-30]. However, no study has yet explored radiomics analysis of PET/MRI, which combines the optimal imaging techniques with an advanced quantitative statistical method to maximize the advantages for distinguishing between PD and other diseases.
In this study, we aimed to develop an optimal radiomics signature using hybrid PET/MRI for the first time to distinguish between PD and MSA. Furthermore, we constructed a clinical-radiomics integrated model to explore a more efficient strategy for diagnosing and differentiating PD and MSA.

Patients
Ninety patients, including 60 PD cases and 30 MSA cases seen from December 2017 to June 2019, were retrospectively enrolled in our study. Inclusion criteria were as follows: (1) the diagnosis of PD was based on the Movement Disorder Society PD Criteria [31], while MSA was diagnosed based on the second consensus statement on the diagnosis of multiple system atrophy [32] (these are the international standard diagnostic criteria for PD and MSA when pathological diagnosis is unavailable), and all diagnoses were confirmed by neurologists with more than 10 years of experience; (2) disease duration was no more than 5 years and the average follow-up period was more than 1.5 years; and (3) all patients had undergone 18F-FDG PET/MRI examination with MRI sequences including T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), T2-weighted fluid-attenuated inversion recovery (T2/FLAIR) imaging, diffusion-weighted imaging (DWI) and susceptibility-weighted imaging (SWI). Exclusion criteria were as follows: (1) evidence of vascular disease confirmed on computed tomography (CT) or MRI; (2) organic lesions (such as trauma, tumours and infections) and other degenerative diseases; (3) severe motion artefacts on images or significant head movement during the scan; and (4) a blood glucose level ≥11.1 mmol/L. The patient screening process is shown in the study flowchart in Fig. 1a. The patient cohort was randomly divided into a training set (n = 63) and a test set (n = 27) at a ratio of 7:3. The training set was used to construct the model and adjust parameters during cross-validation. The test set was used to evaluate the generalization performance of the model and was not involved in feature selection, feature standardization or model construction.
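The 7:3 division can be illustrated with a short sketch. The paper states only that the split was random; splitting each diagnosis separately (stratification), the seed and the function name `stratified_split` are assumptions made here for illustration, not the authors' procedure.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately.

    Stratification is an assumption of this sketch; the paper only
    reports a random 7:3 split.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Cohort sizes taken from the text: 60 PD and 30 MSA patients.
labels = ["PD"] * 60 + ["MSA"] * 30
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With these cohort sizes a stratified 7:3 split reproduces the reported set sizes of 63 and 27. Any scaling parameters (for example, feature means and standard deviations) would then be estimated on the training indices only, in line with the statement that the test set took no part in feature standardization.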
This is a retrospective study based on data from one of our clinical studies, in which all patients signed informed consent forms and ethical approval was obtained from the Ethics Committee of Tongji Medical College, Huazhong University of Science and Technology.

Image acquisition and reconstruction of 18F-FDG PET/MRI
PET/MRI images were acquired on a 3.0-T time-of-flight (TOF) Signa PET/MRI (GE Healthcare, Milwaukee, WI, USA). All participants fasted for at least 6 h and stopped any drugs that could affect brain metabolism for at least 12 h before the 18F-FDG PET/MRI acquisition. An 18F-FDG dose of 0.1 mCi/kg (3.7 MBq/kg) was intravenously injected after ensuring the blood glucose level was <11.1 mmol/L. The participants rested in a quiet and dimly lit room before and after the 18F-FDG injection until the start of imaging. At 1 h after intravenous injection of 18  . MRI images were acquired simultaneously with the PET acquisition without re-positioning. The apparent diffusion coefficient (ADC) map (b = 1000) was generated from DWI images. An atlas-based method was used for PET attenuation correction. For PET imaging, the ordered subset expectation maximization (OSEM) iterative reconstruction algorithm with 28 subsets, 2 iterations and 2.14 mm (full width at half maximum) post-filtering was used.

Region-of-interest segmentation, image preprocessing and feature extraction
The radiomics workflow is shown in Fig. 1b. The bilateral putamina and caudate nuclei were selected as the regions of interest (ROIs), which were segmented on T1WI images through the open-source software ITK-SNAP (version 3.6.0, www.itksnap.org). To minimize partial volume effects, these ROIs excluded the most inferior and most superior slices including these structures [33]. The ROIs were delineated manually by a nuclear medicine physician with 2-3 years' experience (Hu X) who was blinded to subject information, and repeated measurements were performed at an interval of 6 weeks. All ROIs were confirmed by two neuroradiologists who had over 10 years of experience (Sun X and Liu F). The intra-observer differences were calculated by the intraclass correlation coefficient (ICC).
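The reproducibility check can be sketched as follows. The paper does not state which ICC form was used, so the two-way, single-measurement consistency form, ICC(3,1), is an assumption here, as are the function name and the example values.

```python
def icc_consistency(measurements):
    """ICC(3,1): two-way mixed model, single measurement, consistency.

    `measurements` holds one row per subject; each row contains the k
    repeated values of one feature (here, k = 2 delineation sessions).
    """
    n = len(measurements)
    k = len(measurements[0])
    grand = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    col_means = [sum(row[j] for row in measurements) / n for j in range(k)]

    # Partition the total sum of squares into subject, session and error parts.
    ss_total = sum((x - grand) ** 2 for row in measurements for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical values of one feature from two delineation sessions.
feature_values = [[10.0, 10.5], [12.0, 11.8], [15.0, 15.4], [9.0, 9.1], [20.0, 19.5]]
icc = icc_consistency(feature_values)
print(icc > 0.7)  # threshold reported in the Results for acceptable reproducibility
```

In a workflow of this kind the function would be applied feature by feature, and features falling below the chosen threshold would be flagged or discarded before modelling.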
Based on the T1WI images, spatial registration of PET and the other MRI images was carried out using the SPM software package (version 12.0, http://www.fil.ion.ucl.ac.uk/spm/) implemented in MATLAB 2016a (MathWorks, Natick, MA, USA), so that all images shared the same spatial information (thickness, slices and inter-slice spacing). 18F-FDG PET images were transformed into SUV maps by normalizing to the injected dose and the patient's weight using the LIFEx software (version 6.20; www.lifexsoft.org), which provided the SUV values (SUVmax, SUVmean, SUVmin) in the corresponding ROIs. ADC values were measured with the Omni Kinetic software (www.omnikinetics.com).
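The normalization by injected dose and body weight corresponds to the usual body-weight SUV. The sketch below is not the LIFEx implementation; it assumes a tissue density of 1 g/mL, assumes that the activity values are already decay-corrected to injection time, and uses hypothetical function names and voxel values.

```python
def suv_body_weight(activity_bq_per_ml, injected_dose_bq, body_weight_kg):
    """Body-weight SUV: tissue activity divided by dose per gram of body weight.

    Assumes 1 mL of tissue weighs 1 g and that the activity concentration
    is decay-corrected to the injection time.
    """
    body_weight_g = body_weight_kg * 1000.0
    return activity_bq_per_ml / (injected_dose_bq / body_weight_g)


def roi_suv_summary(voxel_suvs):
    """Summarize the SUVs of the voxels inside one ROI."""
    return {
        "SUVmax": max(voxel_suvs),
        "SUVmean": sum(voxel_suvs) / len(voxel_suvs),
        "SUVmin": min(voxel_suvs),
    }


# A 70 kg patient injected with 3.7 MBq/kg receives 259 MBq in total.
dose_bq = 3.7e6 * 70
example_suv = suv_body_weight(18500.0, dose_bq, 70.0)  # hypothetical voxel activity
summary = roi_suv_summary([4.0, 5.0, 6.0])  # hypothetical ROI voxels
print(example_suv, summary)
```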
We employed the same strategy of feature selection and model construction across all three steps. The datasets were randomly divided into a training set and a test set at a case ratio of 7:3. The minimal redundancy maximal relevance (mRMR) algorithm, which can considerably improve the accuracy of feature selection and classification [36], was used for initial feature selection in the training set. The least absolute shrinkage and selection operator (LASSO) method, which is suitable for the regression of high-dimensional data, was then used with 10-fold cross-validation to select the most discriminative features and construct the radiomics signature.
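For reference, the LASSO step can be written as an L1-penalized logistic regression, with the radiomics score defined as the linear combination of the retained features. This is the standard textbook form, not the fitted equation of this study; the symbols n, p, x_ij, y_i, β and λ are notation introduced here, and y_i = 1 is assumed to denote MSA.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0, \beta}
    \left\{ -\frac{1}{n} \sum_{i=1}^{n}
      \left[ y_i \eta_i - \log\left(1 + e^{\eta_i}\right) \right]
      + \lambda \sum_{j=1}^{p} |\beta_j| \right\},
  \qquad
  \eta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}

\mathrm{Radscore}_i
  = \hat{\beta}_0 + \sum_{j \,:\, \hat{\beta}_j \neq 0} \hat{\beta}_j x_{ij}
```

The penalty weight λ is the quantity tuned by 10-fold cross-validation. Larger values of λ drive more coefficients exactly to zero, which is how the feature set shrinks to a small signature.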

Clinical-radiomics diagnostic model construction
The clinical variables, including clinical characteristics (age, sex, weight, pre-injection glucose levels, disease duration (DD), age at onset, hypermyotonia, asymmetric symptoms at onset, bradykinesia, limb tremor, dysarthria and autonomic failure (AF)), ADC values and SUV values, were collected and compared between PD and MSA in the training and test sets. The clinical-radiomics model was constructed by combining the clinical variables with the optimal multimodal radiomics signature obtained above. It was built as a multivariate logistic regression model with 10-fold cross-validation to distinguish PD from MSA, using backward step-down selection based on the likelihood ratio test. A nomogram was then constructed on the basis of the clinical-radiomics model.
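The criterion behind backward step-down selection can be sketched as a likelihood ratio test between a model and the same model with one variable removed. The log-likelihood values below are hypothetical, the function name is introduced here, and the one-degree-of-freedom shortcut applies only when a single coefficient is dropped.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_reduced):
    """LR statistic and p-value for removing ONE coefficient (1 df).

    For a chi-square variable with 1 degree of freedom,
    P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical maximized log-likelihoods of two nested logistic models.
stat, p_value = likelihood_ratio_test(loglik_full=-20.0, loglik_reduced=-22.5)
print(stat, p_value)
```

In a step-down procedure the variable whose removal gives the largest non-significant p-value is dropped and the model is refitted; the process stops when every remaining variable is significant at the chosen level.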

Model effectiveness evaluation
The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate the diagnostic performance of models constructed by the training set and validated by the test set, whereby the radiomics score (Radscore) was calculated via the formula built in the training set. The accuracy of the radiomics signature was evaluated in both the training and test sets. The models' calibration was assessed using calibration curves and the Hosmer-Lemeshow test; decision curve analysis (DCA) was performed to estimate the clinical utility of models.
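Two of these measures are simple enough to sketch directly. The AUC equals the probability that a randomly chosen positive case (assumed here to be MSA, coded 1) receives a higher score than a randomly chosen negative case, and the net benefit used in DCA weighs true positives against false positives at a given threshold probability. The function names and the toy data are illustrative assumptions, not study data.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(probs, labels, threshold):
    """Net benefit at one threshold probability (0 < threshold < 1)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy predicted probabilities and true labels (1 = positive class).
probs = [0.1, 0.4, 0.35, 0.8]
truth = [0, 0, 1, 1]
print(auc(probs, truth))               # 0.75
print(net_benefit(probs, truth, 0.5))  # 0.25
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies.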

Statistical analysis
Statistical analysis was performed with R 3.6.1 (www.R-project.org). The R packages used in this study were tidyverse, caret, pROC, glmnet, DMwR, rmda, ggpubr, ModelGood, rms, mRMRe, DescTools and irr. Tenfold cross-validation was applied at two points, for recursive feature elimination, to avoid chance overestimation: (1) after construction of the radiomics signatures by LASSO and (2) after construction of the clinical-radiomics integrated model. The DeLong test was applied to compare the ROC curves of any two models using MedCalc (www.medcalc.org).
The differences in demographic and clinical variables between patients with PD and MSA were compared in both the training set and the test set using GraphPad Prism 8 (www.graphpad-prism.cn). The Mann-Whitney U test was used for non-normally distributed quantitative data; for normally distributed data, the independent-samples t-test was used. Chi-squared tests were used for categorical data.

Characteristics of the patients and clinical features
Demographic and clinical variables of the patients in the training and test sets are summarized in Table 1. All the demographic data, including the age, sex, DD and age at onset, did not show significant differences between PD and MSA patients (P > 0.05). In the training group, age at onset, dysarthria, AF and SUVmax showed statistically significant differences between PD and MSA (P < 0.05); in the test group, of these variables only dysarthria differed significantly.

Diagnostic performance of radiomics signatures
A total of 1127 features (first-order statistics, shape-based texture, grey-level co-occurrence matrix, grey-level run length matrix, grey-level size zone matrix, grey-level dependence matrix, neighbouring grey tone difference matrix) were extracted per sequence from the images of all six PET/MRI sequences. The intra-observer ICCs all indicated favourable reproducibility of feature extraction (ICC > 0.7).
First, we investigated and compared the six single-sequence models of 18F-FDG, T1WI, T2WI, T2/FLAIR, SWI and DWI separately. All the radiomics signatures differed significantly between PD and MSA (P < 0.05, Fig. 2a) in the training and test cohorts, with AUCs >0.8 and most accuracies (ACCs) >0.7 (Fig. 2d and e). The equations of the six single-sequence Radscores and further details of the ROC analysis are presented in the Supplementary materials (Supple. Equation 1-6, Supple. Table 1). Although there were no statistically significant differences among the Radscores of the single-sequence models in either the training set or the test set (P > 0.05, Fig. 2f and g), 18F-FDG had the highest diagnostic performance in the training set and test set (AUC = 0.900 and 0.833, respectively, Fig. 2d and e). Therefore, we chose 18F-FDG to combine with the other MRI sequences to optimize the model in the next step, which was consistent with our multimodal radiomics design.
Next, we combined 18F-FDG with one of the MRI sequences for bimodal radiomics optimisation. All five double-sequence composite models demonstrated high diagnostic efficiency, with AUCs >0.9 in the training set and >0.8 in the test set (Table 2). In the analysis of the combinations of 18 Table 2). In the groups combining 18F-FDG with functional MRI (fMRI), two models were built: (1) Radscore FDG_SWI and (2) Radscore FDG_DWI. Radscore FDG_SWI showed the larger AUC (training set: AUC = 0.951, 95% CI = 0.865-0.950; test set: AUC = 0.951, 95% CI = 0.792-0.997, Table 2). The equations of the five double-sequence Radscores are presented in the Supplementary materials (Supple. Equation 7-11).
No significant difference (P > 0.05) was detected when the five composite models were compared with each other, but the differences were statistically significant when comparing the double-sequence models with the single-sequence models using the DeLong test (P = 0.023 for Radscore FDG vs. Radscore FDG_T1WI, and P = 0.045 for Radscore FDG vs. Radscore FDG_SWI).
Finally, we attempted to combine 18F-FDG with T1WI (the best sMRI) and SWI (the best fMRI) to build a more optimized model. After feature dimension reduction, 10 features were used to build Radscore FDG_T1WI_SWI (Fig. 3c), when log(λ) was 0.032 (Fig. 3a and b). The features showed no significant collinearity, which indicated that Radscore FDG_T1WI_SWI avoided overfitting (Supplementary materials, Supple. Fig. 1). The equation of Radscore FDG_T1WI_SWI was  (Table 3, Fig. 3d and e), which indicated statistical significance in comparison to double-sequence composite models (P = 0.045 for Radscore FDG_T1WI_SWI vs. Radscore FDG_T1WI, and P = 0.049 for Radscore FDG_T1WI_SWI vs. Radscore FDG_SWI).

Clinical-radiomics integrated model
In combination with clinical variables, we further constructed an integrated diagnostic model by logistic regression. SUVmax, DD, dysarthria and AF were identified as independent factors in the clinical-radiomics integrated model by multivariate logistic regression. The variance inflation factors (VIFs) of the four clinical features were 1.036, 1.376, 1.217 and 1.087. The nomogram based on clinical factors and Radscore is shown in Fig. 4a. The Hosmer-Lemeshow test of the clinical-radiomics integrated model indicated no significant lack of fit in the training set (P = 0.986). The equation was We also evaluated the discriminatory efficiency of the clinical model and the clinical-radiomics integrated model using ROC analyses (Table 3, Fig. 4b and c). The integrated model yielded the largest AUC of 0.993 (95% CI, 0.930-1.000) in the training group and 0.994 (95% CI, 0.861-1.000) in the test group, which was significantly higher than that of the clinical model (training set: AUC = 0.895, 95% CI = 0.790-0.958; test set: AUC = 0.858, 95% CI = 0.670-0.961; P = 0.012), but not significantly different from that of Radscore FDG_T1WI_SWI (P = 0.098). The DCAs for the clinical model, Radscore FDG_T1WI_SWI and the clinical-radiomics integrated model in the training and test sets are shown in Fig. 4f. The DCA indicated that the clinical-radiomics integrated model provided a net benefit across threshold probabilities in the range of 0 to 1.
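As a reading aid for the reported VIFs: a VIF is defined as 1 / (1 - R²), where R² comes from regressing one predictor on all the others, so each reported value can be converted back to the share of variance that the other predictors explain. The sketch below only performs this arithmetic on the four values quoted in the text; the function name is introduced here.

```python
def implied_r_squared(vif):
    """Invert VIF = 1 / (1 - R^2) to recover R^2."""
    return 1.0 - 1.0 / vif


# VIFs reported in the text for the four clinical features.
reported_vifs = [1.036, 1.376, 1.217, 1.087]
implied = [round(implied_r_squared(v), 3) for v in reported_vifs]
print(implied)  # [0.035, 0.273, 0.178, 0.08]
```

Even the largest value corresponds to under 30% shared variance, well below the VIF levels of 5 to 10 that are commonly treated as signs of problematic collinearity.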

Discussion
In the present study, we progressively optimized the radiomics models from single and double sequences to multimodality, and obtained an optimal multimodal radiomics signature to distinguish between PD and MSA using hybrid PET/MRI, which provides metabolic, structural and functional information at the same time and in the same space. To the best of our knowledge, this is the first radiomics study on hybrid PET/MRI in the area of neurodegenerative diseases. In addition, the joint analysis of radiomics and clinical features may be of great significance in the differential diagnosis of other diseases that are difficult to distinguish. The radiomics models were progressively optimized in three steps. In the monomodal radiomics analysis, the diagnostic efficiency of 18F-FDG PET was better than that of the five MRI sequences in both the training (AUC = 0.900) and test sets (AUC = 0.883), although the differences were not statistically significant. Indeed, many published studies have found normal or slightly elevated 18F-FDG uptake in the striata of early PD patients [16,37], whereas hypometabolism in the putamen, pons and cerebellum has been demonstrated in different types of MSA [16]. The high diagnostic efficiency of 18F-FDG is in accordance with the previous research by Brajkovic et al., who reported a diagnostic accuracy of 93% for 18F-FDG PET in distinguishing PD and MSA [38]. Because functional metabolic changes are often observed before structural changes, and subtle changes often cannot be observed by the naked eye, radiomics provides a highly sensitive opportunity to estimate the distribution of metabolism at the microscopic level [39].
Notably, the absence of a significant difference between 18F-FDG and the MRI sequences means that the MRI sequences can also distinguish between PD and MSA, although 18F-FDG PET tended to have better diagnostic ability than brain MRI, which is consistent with the study by Kwon et al. [9].
In the next step, we combined 18F-FDG PET with one MRI sequence for bimodal radiomics analysis to further optimize the model. Radscore FDG_T1WI (training set: AUC = 0.958; test set: AUC = 0.932) and Radscore FDG_SWI (training set: AUC = 0.951; test set: AUC = 0.951) displayed the highest diagnostic efficiency among the combinations of 18F-FDG with sMRI and fMRI, respectively. Compared with PD, putaminal or pontocerebellar atrophy and a rim sign can be seen in typical MSA [8], and very thin T1WI sections are important for showing anatomical details [8]. The ROIs were delineated on the T1WI images (section thickness 1 mm) in our study to reflect accurate anatomical structural information. Furthermore, many studies have shown that iron deposition plays an important role in neurodegenerative diseases [40-44], and higher iron deposition has been observed in the putamen and pulvinar thalamus in MSA than in PD [40,45]. SWI is based on a T2*-weighted gradient echo sequence, which is sensitive to iron deposition. Kraft et al. reported that hypointense putaminal signal intensity on T2*-weighted sequences was observed more often in MSA than in PD [46], which could improve diagnostic accuracy in distinguishing between PD and MSA. The performance of Radscore FDG_ADC was nevertheless satisfactory, which is consistent with the increased putaminal diffusivity reported in some MSA patients compared with PD [8]. ADC images were omitted from the subsequent step not because they are uninformative, but because SWI was the better choice and using a single functional sequence reduces the waste of health care resources.
In the third step, an optimized multimodal radiomics signature (training set: AUC = 0.971; test set: AUC = 0.957) was developed by combining 18F-FDG with T1WI (the best sMRI) and SWI (the best fMRI), which was significantly better than the single-sequence and double-sequence models (P < 0.05). Interestingly, the AUC increased each time another sequence was added to construct a new combined model. This suggests that multimodal imaging can achieve more accurate diagnosis by combining the synergistic advantages of different imaging modalities, and other studies have also confirmed superior diagnostic or predictive accuracy when MRI was combined with PET or SPECT [29,30,47]. However, there are some methodological challenges in multimodal imaging, such as image alignment problems and the risk of overfitting due to increased data [48]. In our study, 18F-FDG PET and MRI images were acquired using a hybrid PET/MRI system at the same time and in the same space, which allowed for more accurate spatial alignment and reduced ROI errors, making the results more reliable. Additionally, we used 10-fold cross-validation to limit overfitting as far as possible, which made the models more trustworthy.
We further constructed the clinical-radiomics integrated model, which had the best diagnostic discrimination (training set: AUC = 0.993; test set: AUC = 0.994), by combining the optimal Radscore FDG_T1WI_SWI with clinical features. Among the selected clinical characteristics, shorter disease duration (rapid disease progression) and more severe symptoms of dysarthria and autonomic failure are more common in MSA patients [32]. However, it must be mentioned that the diagnostic performance and clinical utility of a purely clinical model were not satisfactory and that the difference between the clinical-radiomics integrated model and Radscore FDG_T1WI_SWI was not significant (P = 0.098). It is nevertheless important to incorporate clinical characteristics into radiomics analysis, because imaging is intended to complement and supplement, but not replace, clinical decision-making. Clinical symptoms are an important basis for the differential diagnosis of PD and MSA, although they overlap in the early stages [8,9]. The higher diagnostic efficacy also supports the view that multidisciplinary approaches, including radiomics analysis, are valuable for the diagnosis and treatment of diseases [49].
Notably, we chose 18F-FDG over dopamine system-related agents, such as those targeting the dopamine transporter (DAT) and the dopamine D2 receptors, because striatal DAT loss may overlap in most PD and MSA patients [50,51]. Also, DAT imaging can only be used for a diagnosis of exclusion [52-54]. Compared with 18F-FDG, dopamine-related agents are not widely used, as the requirements for their production and application are more stringent.
There are some limitations to our study. First, there were no pathology results as a gold standard for PD and MSA diagnosis. We classified patients into PD and MSA groups strictly according to the standard official diagnostic guidelines, and ensured that the mean follow-up period was >1.5 years, to make the diagnosis of the two diseases as accurate as possible. Second, sample size is an ongoing issue in radiomics research. A larger sample will make the model more stable and give more reliable results. Because of the limited sample, we did not subtype the MSA patients in this study, which we consider acceptable because motor symptoms occur sooner or later in any subtype of MSA [32]. However, subsequent studies could divide MSA patients into predominant parkinsonism (MSA-P) and predominant cerebellar ataxia (MSA-C) groups and add other APSs such as progressive supranuclear palsy and corticobasal ganglionic degeneration. Third, more cases could be included to further test our models. It would be interesting to prospectively recruit patients with suspected Parkinsonism, classify them into different cohorts according to their symptoms (such as motor symptoms or autonomic dysfunction), and then observe whether the models can accurately predict the final diagnosis (PD, MSA or other parkinsonism). To achieve this aim, multicentre cooperation and the establishment of a shared database are important. Fourth, lack of interpretability is one of the biggest limitations of machine learning. Therefore, an in-depth study of the relationship between clinical or genetic information and radiomics features would be valuable. Last, manually delineating ROIs is time-consuming and inevitably operator dependent. We attempted to segment the ROIs automatically, which could reduce time consumption and free up a lot of labour. The results showed good performance (data not shown here), but were less accurate than manual delineation. More accurate, stable and intelligent automatic delineation methods and brain parcellation studies may achieve precise automatic segmentation in the future, and artificial intelligence screening is our ultimate goal.

Conclusion
The multivariable radiomics models based on hybrid 18F-FDG PET/MRI, which provides metabolic, structural and functional information simultaneously, can be applied for the comparatively accurate identification of PD and MSA, and a clinical-radiomics integrated model displayed better performance.