4.1 Studies selection
Based on the literature, we identified N=17 studies of VH in patients with PD that included acquisition of T1-weighted structural MRI scan, as part of a structural or functional data analyses, and with patients meeting our inclusion criteria (see below). We contacted the research groups responsible for the studies and among those N=8 groups took part in the project, offering previously published and/or unpublished data: Prof. Simon Lewis44 (University of Sydney), Prof. Phil Hyu Lee and Dr. Chung47 (Yonsei University), Prof. Henry Mak, Prof. Grainne McAlonan and Prof. S.L. Ho40 (King’s College London and The University of Hong Kong,), Prof. Kathy Dujardin, Prof. Renaud Jardri and Dr. Delphine Pins48(University of Lille), Prof. John-Paul Taylor and Dr. Michael Firbank36 (Newcastle University), Dr. Rimona Weil49 (University College London,), Prof. Michele Hu, Prof. Clare Mackay and Dr. Ludovica Griffanti50,51 (Oxford Parkinson’s Centre Discovery Cohort), Dr. Dominic ffytche41 (King’s College London) (see Table 1 in the Results section for details). Only data from participants diagnosed as dementia-free were included to minimise the contribution to the study of global cortical changes in patients with PD dementia. The study (LRS-19/20-17680) was given ethical approval by King’s College London Research Ethics Office, Psychiatry, Nursing and Midwifery (PNM) Research Ethics Panel on the 25/03/2020 and was subsequently pre-registered on the Open Science Framework site on 04/05/2020 (https://osf.io/nzatk). The methods follow the pre-registered plan with the addition of exploratory graph theoretical analyses based on structural covariance (section 4.3.4 and results in section 2.6).
Raw T1-weighted MRI scans were obtained from 8 different groups for a total of 519 subjects. We used 493 MRI scans in the analysis after discarding N=20 participants who did not meet the criteria in terms of diagnosis (e.g. healthy controls, with diagnosis of dementia) or whose scan did not segment well during pre-processing and subsequent troubleshooting steps or was not suitable for analysis (e.g. motion) (N=6). Patients with a MMSE score below 24 (raw) were retained (N=8) only when part of a published work in which the absence of dementia was specifically stated. The final sample comprised 493 participants, 135 with VH, 358 without VH (further details in Results section and in Table 1 and S2). Hallucination data collection varied across groups, as several used a different scale to screen for VH. Each group had previously divided patients into PD-VH and PD-noVH and we retained these original groupings for the mega-analysis.
4.3 MRI data pre-processing and harmonisation
MRI data was pre-processed with Freesurfer 6.0.052 to estimate cortical thickness, surface area and subcortical volumes. Data was processed on King’s College London HPC infrastructure Rosalind (https://rosalind.kcl.ac.uk), with the standard recon-all procedure, consisting of motion correction, skull-stripping, affine registration to Talairach atlas, segmentation, smoothing, and parcellation mapping. In order to screen for possible errors in the segmentation process, mean cortical thickness measures and manual slice by slice inspection were used to identify possible errors in the white-grey matter boundary and pial reconstruction steps. For subjects that did not segment properly the failed processing steps were re-run (autorecon3) after performing the appropriate corrections. Low quality scans (e.g. with excessive motion, n= 4) or scans that did not segment well upon troubleshooting (n =2) were discarded. Individual cortical thickness, subcortical volumes and surface area measures were extracted based on the Destrieux atlas53. In order to explore structural differences between patients with and without VH across the different cohorts minimising variance due to different recruitment sites and, therefore, different scanners, we used a harmonisation method. ComBat is an empirical Bayesian algorithm aiming at minimising the variance due to the scanner features and to maintain the variance related to biological features and has been previously successfully used in studies of cortical thickness54-55. In this study, this method has been also used to harmonise volume and surface area for each participant (see Supplemental information S1 for more details about this method and plotted results).
4.3.1 Group differences analysis
First, we conducted a meta-analysis with R package ‘metafor’56 to check whether patients were matched on the relevant demographic and clinical variables. Results are mentioned in the main text and reported with forest plots and a detailed description in Supplemental Information S2.
Then, we conducted separate exploratory ANOVAs and MANCOVAs for cortical thickness, surface area and subcortical volumes to screen for group differences between hallucinators and non-hallucinators, with age, gender and total intracranial volume (TIV) as covariates (when appropriate upon checking assumptions; see Results and Supplemental Information S2). Multiple comparisons were Bonferroni corrected. The models were calculated using SPSS 220.127.116.11 (IBM corp. 2016) and R 4.0.0 (R core team, 2017). Results are presented in Figure 1, created with a custom colour coding based on p values and by overlaying region labels on a brain render.
We used Tukey’s method programmed in R with the 1.5*IQR rule to identify outliers other than those excluded upon unsuccessful pre-processing. This allowed the careful inspection of the identified subjects in order to verify whether the outlier value depended upon measure errors (e.g. harmonisation bugs) or incorrectly entered data, or on the subject, with the purpose of retaining outliers depending on the subject (e.g. intrinsic features of the subject). No participants were discarded upon this check for this analysis.
4.3.2 Sensitivity and Subgroup analysis.
Of the eight original groups, three used the Neuropsychiatric Inventory (NPI) to score visual hallucinations. For this subgroup of studies, patients were matched for age, gender, onset, levodopa equivalent daily dose (LEDD), and Mini Mental State Examination (MMSE) score. Within each of the 3 studies, patients were also matched in terms of motor symptoms severity (UPDRS-III). We also ran a one-way ANOVA to check whether the subsample was matched for UPDRS-III but data was missing for 20 participants. We computed the group mean and used that to fill the missing value for the between groups multivariate ANOVA. We carried out Pearson’s product moment correlation coefficient between NPI score and the cortical thickness, surface area and subcortical volume data; we computed the same analysis with LED as a covariate in order to address its potential role in VH severity. In addition, we compared the PD-VH and PD-noVH in the data set using the original VH binary scores to check for consistency in the results with the larger data set, including age, gender, disease onset, LED, PD severity (UPDRS-III) and MMSE as covariates (Supplemental Information S4). We also conducted analyses of variance with a larger subgroup and with graded VH scores (mild, moderate, severe), together with an ordinal logistic regression (for details on the sample, methods and results see Supplemental Information S5).
4.3.3 Receptor density profiles.
Regression models with the difference of the means (VH – noVH) of morphometrical features (thickness, surface area, subcortical volume) as dependent variable and receptors density profiles as predictors were carried out, a methodology previously used on brain volumes57. Specifically, receptors density profiles were obtained for D2/D3, 5-HT1A and 5-HT2A based on a [18F] Fallypride template58 and a [11C] Cumi-101 5-HT1A and a [11C] Cimbi-36 5-HT2A templates59, respectively. We have focussed on DA and 5-HT as high resolution templates are available for these receptors of interest at the moment. Including cholinergic maps in the analysis would greatly enrich this approach given the importance of cholinergic transmission in VH in PD (as described in the introduction) and will be done once high resolutions templates will be available. [18F] Fallypride is a D2/D3 receptor antagonist with a high signal to noise ratio60. [11C] Cumi-101 and [11C] Cimbi-36 are high affinity PET radioligands for 5-HT1A and 5-HT2A receptors59 (Beliveau et al., 2017). Parametric modelling of the binding potential used the cerebellum as reference region61 and thus the vertices corresponding to the cerebellum were excluded from the regression analyses. Each of these templates was registered to the Talairach space using the fsaverage template subject and parcellated with the Destrieux atlas, to ensure alignment with the parcellated structural data of our participants. For each of the vertices we extracted the binding potential using fslmeants. Regression models were carried out to estimate the relationship between cortical thickness and surface area differences of the mean between VH and noVH patients (regions resulting from the first group-level MANCOVAs and ANOVAs, see 2.1, S3) and receptor density profiles. For surface area, we used regions that resulted different in PD-VH vs. PD-noVH from an exploratory one-way ANOVA, as the number of regions resulting different in the basis of the MANCOVAs performed and reported in S3 were too small in number to carry out a more powered model. We ran separate models for each receptor and for thickness, surface area and volume. In addition, for each receptor we ran three different models. First, we examined the relationship between the receptor’s binding potential in the regions with significant differences in cortical thickness/surface area between PD-VH and PD-noVH. The sloopes for these models were also compared (Supplemental Information S6). Then, in order to better investigate such relationship, we also assessed whether the receptor’s binding potential could predict thickness/area values for all regions; finally, with the same purpose, we ran models considering only regions where the difference between the groups was not significant (S6). Linear regression models were coded in R using the packages rstatix62 (Kassambara, 2020) and MASS63. For each regression model, in order to identify outliers, Cook’s distance was computed and any data point with a Cook’s distance >1 was marked as highly influential, explored and if appropriate discarded64. In addition, the confidence intervals of the significant regression models were estimated with the bootstrapping technique65 with 100,000 cycles (S6). Methods and results for thickness and surface area are graphically represented in Figure 3, for results on volume and further details see Supplemental Information S6).
4.3.4 Principal component analysis (PCA).
Results from the MANCOVAs comparing PD-VH and PD-noVH highlighted the involvement of widespread cortical regions in a high dimensional dataset. We used principal component analysis (PCA), in order to reduce the dimensionality of the dataset and to identify putative latent dimensions underlying the differences in structure in PD-VH versus PD-noVH patients while retaining as much variance as possible66. Data from both hemispheres was entered in each model (one for cortical thickness, one for surface area). Analyses were carried out with R packages factominer67 and factoextra68. The scree plots for the PCA are reported in S7. Separate PCAs were carried out for thickness and surface area. PCA inputs comprised the significantly different regions from the MANCOVAs (S3). Results are presented in Figure 3, created with a custom colour coding based on the components and by overlaying region labels on a brain render. To further explore a possible relationship of PCA components and hallucination severity, we carried out correlational analyses (Pearson product-moment) between the individual contributions to the different PCA dimensions, the NPI scores in the NPI subsample and the mean thickness and surface area of each dimension/component, that is the mean thickness/area across the regions constituting that component.
4.3.5 Structural covariance and graph theory analysis
In order to investigate inter-regional properties to explore and characterise the gray matter network-level organisation of PD-VH, we built networks based on structural covariance, a technique that assays covariation of differences in grey matter morphology between different brain structures across a specific population69. Since the most widely used atlas for this kind of analysis is the Desikan-Killiany atlas70 (see also71), we extracted morphometric features (thickness, surface area) at the 68 vertices of this atlas. The dataset was harmonised for multi-site effects with the same procedure described in section 4.3.1. The dataset was reduced to 467 cases as the design matrix based on the full dataset was not invertible due to high collinearity of some columns. We discarded N=26 subjects comingfrom the smallest datasets and the problem was overridden. Homogeneity of groups was verified with a Levene’s test72,73 .
The dataset counts 467 subjects, 118 PD-VH, 349 PD-noVH, with participants being matched for age. Age and gender were used as covariates in the models. Analyses were carried out with R package braingraph74 and igraph75. To construct the networks, first we specified a general linear model for each region (thickness/area as outcome variable, age and gender as covariates). The structural covariance matrices of the two groups were defined by estimating the inter-regional correlation between model residuals of thickness and area (in separate models) (e.g.76) to build a 68x68 matrix. A density-based threshold77 was applied to the matrix in order to retain a percentage of the most positive correlations as non-zero elements in a binary adjacency matrix. Different densities ranging from 0.05 to 0.20 with a 0.01 step size were explored. The differences between PD-VH and PD-noVH covariance matrices were then computed, first to establish that the two matrices differed significantly from one another; secondly, a cell by cell comparison was carried out to establish which covariance patterns were significantly greater for the PD-VH group compared to the PD-noVH group. Random undirected and unweighted graphs were created for each group, and vertex- and graph-level metrics were computed for each group. For visualisation purposes a density of 0.13 was selected. Vertex importance was assessed using degree, betweenness centrality and nodal efficiency. A hub was categorised as such if its betweenness centrality was greater than the mean plus 1 standard deviation - calculated on all vertices at the same density78-82. To assess network segregation in order to better understand the communities observed, we used modularity, which is a measure of the strength of network partitions. High modularity is a measure of how much vertices from the same community are more connected to each other. Modularity was computed with the Louvain algorithms, which also partitioned the network in communities83. Cortical thickness-based networks have been shown to have distinct modules/communities of regions, similar to those derived from fMRI and DTI data78. Network analyses were performed with permutation tests (5000 cycles) and bootstrapping analyses to compare vertex-level measures. Results were false discovery rate corrected.
Finally, to further assess the relationship between graph level metrics and visual hallucinations in the full sample, we computed Pearson’s correlation coefficients between the difference of the means of graph metrics of interest (vulnerability, transitivity, local and nodal efficiency, path length, betweenness centrality, eccentricity, distance) for the models on thickness and surface area separately, and the difference of the means in thickness and in surface area, respectively, with the NPI subsample, for which we have all clinical and demographic information and in which participants are matched on all those variables.