Twenty-one successive patients with previously diagnosed Parkinson's disease who reported to the out-patient clinic at the Medical University of Silesia, Katowice, were recruited for the prospective study. Written informed consent was obtained from all the participants. Finally, 18 PD patients without cognitive complaints or with differing severity of cognitive impairment constructed the patient groups. The demographic and clinical data of the patients are presented in Table 1. The subgroup of nine age-matched patients (3 persons per each diagnosis status) was additionally chosen from the initial 21 patients to inspect the identified biomarkers with no presence of the confounding factor. These patients were age-matched with the use of a k-nearest neighbour algorithm—with the PD patients with mild cognitive impairment treated as the reference group (tests for mean value equality: patient age at the exam p=0.1275, age at the onset p=0.2593, disease duration p=0.9746). PD was diagnosed according to the UK PD Society Brain Bank criteria (UKPDSBB) .
All subjects enrolled in the conducted study were carefully examined by a specialist and were subject to the Hoehn-Yahr test to estimate the stage of Parkinson's Disease. All the additional data such as disease onset, duration and information about the treatment were collected in a specially prepared survey. All patients were subject to the complex testing scenario, including neuropsychological test, Mini-Mental State Examination, Clock Drawing Test and Beck Depression Inventory. Other tests (Rey Auditory Verbal Learning Test, forward & backward Digit Span subscale of Wechsler Adult Intelligence Scale, Trail Making Test (TMT parts A&B) and Benton Visual Recognition Test (BVRT), were performed to estimate the strength of subjects cognitive.
Based on the results of the neuropsychological assessment, patients were then divided into three groups: cognitively normal (PD-CN), PD with mild cognitive impairment (PD-MCI) and PD with dementia (PDD). Cognitive impairment (MCI or dementia) was diagnosed according to Movement Disease Society criteria [4,15].
MRI scans were obtained using a General Electric 1.5 Tesla system. T1 and T2 weighted sequences were acquired for each patient. The total number of 2D scans acquired in the group of 18 patients was 5,984, for an average of 332.5 2D scans per patient.
Data preprocessing and statistical analysis
The MRI data were subject to a complex preprocessing pipeline. As a first step, data with visible crosstalk artefacts/slice overlap artefacts were identified and corrected. The median of Hodges-Lehman estimates of pairwise between-slice MRI signal intensity difference was calculated for each MRI sequence per each 3D study, and the affected 3D studies were identified with the use of Tukey's criterion. The next steps were (1) correction of magnetic field inhomogeneity, performed with the use of N4ITK technique ; (2) image intensity normalisation  to ensure the same signal range on each MRI sequence; and (3) brain extraction performed with an FSL-BET (Brain Extraction Tool) . The images were downsized to a common resolution of 160x160 pixels. Additionally, the images were internally normalised with the use of a z-score algorithm. The final number of 2D scans used in the study was 5,760.
The Shapiro-Wilk test was applied to verify the hypothesis of the distribution normality within each subgroup. Bartlett's test was used to check variance homogeneity. Depending on the test results, parametric ANOVA or nonparametric Kruskal-Wallis ANOVA was performed to test the hypothesis of mean/median value equality across all subgroups. Benjamini-Hochberg procedure was applied to correct for multiple testing. The post hoc Tukey-Kramer test was used in the pairwise comparisons. Effect size analysis  supported the findings obtained by standard statistical testing. Additionally, the above described statistical analysis was performed for the subgroup of the nine age-matched patients.
Convolutional neural networks (CNN) model
MR T1 and T2-weighted modalities were used to train the network.
The applied CNN consisted of 4 convolutional layers (kernel size of 3x3 pixels), followed by a max-pooling (kernel size of 2x2 pixels, a stride of 2) and batch normalisation modules. Each hidden layer was activated with the use of ReLU. Additionally, an image augmentation routine, including various image transformations such as rotation, rescaling, and shifting, was applied, which doubled the training set to 11,520 2D scans. Nadam was used as the optimiser with a learning rate of 0.001. The simplified structure of the proposed network is shown in Figure 1.
The CNN model was built and trained to estimate parameters used in the process of creating activation maps of the neuronal layers' classification. The CNN training was performed independently in three one-versus-other classification scenarios. The patient-level leave-one-out cross-validation schema was applied as presented in Figure 2. Gradient-weighted class activation maps [20,21], rescaled to the targeted resolution, were constructed for each patient, and the regions with the highest average across all patients activation values were recognised.
The segmentation of the identified distinguishing brain area was performed with the use of MiMSeg algorithm . The segmented area relative volume and folding were estimated, and the basic descriptive statistics within each patient group were calculated. Similar to the statistical analysis of the clinical parameters, the hypothesis of mean value equality was verified by ANOVA test after checking if the test assumptions were upheld. Post hoc pairwise comparisons were performed to seek for group differences. A p-value of less than 0.05 was assumed as statistically significant.