Participants
To ensure consistency, the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/) was searched for normal controls (NC) and for patients with early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI), or AD who were imaged at baseline on a 3 Tesla MRI scanner. Subjects were excluded if they lacked a Mini-Mental State Examination (MMSE) score or if image registration or segmentation failed. A total of 514 subjects (83 NC, 217 EMCI, 120 LMCI, and 94 AD) remained. Because group sizes differed considerably, sub-groups of 83 EMCI, 83 LMCI, and 83 AD patients were randomly selected to match the 83 NC in age and gender, and these sub-groups were used in the current study. All EMCI and LMCI patients were amnestic and were diagnosed according to the following criteria: (1) a subjective memory concern reported by the patient, their partner, or a clinician; (2) an MMSE score between 24 and 30; (3) a Clinical Dementia Rating (CDR) of 0.5 in the memory box; (4) cognitive and functional performance not sufficiently impaired for a diagnosis of AD at the screening visit; and (5) a score on the Logical Memory II subscale of the Wechsler Memory Scale-Revised of 9–11 for 16 or more years of education, 5–9 for 8–15 years of education, or 3–6 for 0–7 years of education (EMCI), or a score of ≤8 for 16 or more years of education, ≤4 for 8–15 years of education, or ≤2 for 0–7 years of education (LMCI). All AD patients met the NINCDS-ADRDA criteria for probable AD. Detailed information is available in the ADNI manual (http://adni.loni.usc.edu/wp-content/uploads/2010/09/ADNI_GeneralProceduresManual.pdf).
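The education-adjusted Logical Memory II cutoffs in criterion (5) can be written as a small lookup. The following is a minimal sketch covering only that criterion; the function name `classify_mci_stage` is hypothetical and is not part of any ADNI tooling, and criteria (1) to (4) would still have to be checked separately.

```python
def classify_mci_stage(education_years, lm2_score):
    """Return 'EMCI', 'LMCI', or None based on criterion (5) only.

    education_years: years of education
    lm2_score: Logical Memory II score (Wechsler Memory Scale-Revised)
    """
    # Pick the cutoffs for the education band described in the text.
    if education_years >= 16:
        emci_range, lmci_max = (9, 11), 8
    elif education_years >= 8:
        emci_range, lmci_max = (5, 9), 4
    else:
        emci_range, lmci_max = (3, 6), 2

    if lm2_score <= lmci_max:
        return "LMCI"
    if emci_range[0] <= lm2_score <= emci_range[1]:
        return "EMCI"
    # Score falls outside both bands: criterion (5) is not met.
    return None
```

For example, a subject with 12 years of education and a score of 4 falls in the LMCI band, whereas the same subject with a score of 5 falls in the EMCI band.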
MRI data acquisition
Raw, unprocessed 3.0 T T1-weighted MRI images, acquired on different MRI scanners at multiple sites, were downloaded from the ADNI database. Details of the data acquisition protocol are available on the ADNI website (http://adni.loni.usc.edu/methods/documents/).
Data preprocessing
The T1 images were preprocessed using the standard pipeline in the DPABI toolbox (http://rfmri.org/dpabi), which combines unified segmentation with Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra (DARTEL). The major steps were: (1) segmenting each image into gray matter, white matter, and cerebrospinal fluid; (2) normalizing with DARTEL; (3) resampling to a voxel size of 1.5 mm × 1.5 mm × 1.5 mm; (4) modulating by multiplying the voxel values by the Jacobian determinant derived from the spatial normalization; and (5) smoothing with a Gaussian kernel of 8 mm × 8 mm × 8 mm full width at half maximum (FWHM).
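Smoothing kernels are specified by their FWHM, whereas many image-processing routines expect the Gaussian standard deviation. The sketch below shows the standard conversion, sigma = FWHM / (2·sqrt(2·ln 2)), applied to the 8 mm kernel and the 1.5 mm voxel size given above. It is an illustration only and is not taken from the DPABI code; the helper name `fwhm_to_sigma` is hypothetical.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


FWHM_MM = 8.0        # smoothing kernel from the text
VOXEL_SIZE_MM = 1.5  # isotropic voxel size after resampling

sigma_mm = fwhm_to_sigma(FWHM_MM)
sigma_vox = sigma_mm / VOXEL_SIZE_MM  # sigma expressed in voxel units

print(f"sigma = {sigma_mm:.3f} mm = {sigma_vox:.3f} voxels")
```

The 8 mm kernel therefore corresponds to a standard deviation of roughly 3.4 mm, or about 2.3 voxels at this resolution.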
Gene expression data processing
We processed gene expression data from six postmortem adult brains in the Allen Human Brain Atlas (AHBA) using a recently published pipeline (https://github.com/BMHLab/AHBAprocessing) [25]. The major steps were: (1) reassigning probes to genes according to the latest annotation using the Re-annotator toolkit (https://sourceforge.net/projects/reannotator/); (2) intensity-based filtering with a threshold of 50%; (3) selecting, for each gene, the probe with the highest correlation with RNA-seq data; and (4) normalizing the expression data with a scaled robust sigmoid for each participant. These procedures yielded expression values for 10,027 genes in each tissue sample. We included only tissue samples from the left hemisphere, since right-hemisphere data were available for only two brains in the AHBA, resulting in 1285 samples.
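The scaled robust sigmoid in step (4) is commonly described as a sigmoid centered on the median and scaled by the interquartile range, followed by rescaling to the unit interval. The sketch below follows that description for a single vector of expression values (for example, one gene across the samples of one donor). It is an assumption-laden illustration rather than the code of the cited pipeline, and the function name `scaled_robust_sigmoid` is hypothetical.

```python
import math
import statistics


def scaled_robust_sigmoid(values):
    """Apply an outlier-robust sigmoid, then rescale the result to [0, 1]."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the robust sigmoid is undefined")

    # IQR / 1.35 approximates the standard deviation of a normal distribution.
    scale = iqr / 1.35
    sig = [1.0 / (1.0 + math.exp(-(v - median) / scale)) for v in values]

    # Linear rescaling so that the smallest value maps to 0 and the largest to 1.
    lo, hi = min(sig), max(sig)
    return [(s - lo) / (hi - lo) for s in sig]
```

Because the median and interquartile range replace the mean and standard deviation, a single extreme sample (such as 100 in `[1, 2, 3, 4, 100]`) saturates near 1 without compressing the remaining values.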
Regional GMV differences
Two-sample t-tests were performed within the gray matter mask using DPABI to compare the EMCI, LMCI, and AD groups separately with NC, yielding a voxel-wise map of GMV differences for each patient group. The results were corrected using Gaussian random field (GRF) theory (voxel-level p < 0.001, cluster-level p < 0.05). The overlap of the negative differences and of the positive differences among the three groups was obtained by intersection.
Moreover, a sphere with a radius of 4.5 mm (i.e., 3 times the voxel size) was drawn around the MNI coordinate of each tissue sample (n = 1285), and the mean t-value within each sphere was defined as the t-statistic of the GMV difference at that sample, separately for each of the three groups.
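The sphere-averaging step can be sketched with NumPy as follows. This is a minimal illustration, assuming the t-map is available as a 3D array together with its 4 × 4 voxel-to-MNI affine; the function name `sphere_mean` is hypothetical. The source does not say whether voxels outside the gray matter mask were excluded from the average, so this sketch averages every voxel whose center lies within the radius.

```python
import numpy as np


def sphere_mean(t_map, affine, mni_xyz, radius_mm=4.5):
    """Mean of t_map over voxels whose centers lie within radius_mm of mni_xyz."""
    # Voxel indices of the whole volume, as a (3, n_voxels) array in C order.
    ijk = np.indices(t_map.shape).reshape(3, -1)
    ijk1 = np.vstack([ijk, np.ones((1, ijk.shape[1]))])

    # Map voxel indices to world (MNI, mm) coordinates.
    xyz = (affine @ ijk1)[:3]

    center = np.asarray(mni_xyz, dtype=float).reshape(3, 1)
    inside = np.linalg.norm(xyz - center, axis=0) <= radius_mm

    values = t_map.reshape(-1)[inside]
    return float(values.mean()) if values.size else float("nan")


# Synthetic example: 1.5 mm isotropic grid with MNI origin at voxel (10, 10, 10).
affine = np.diag([1.5, 1.5, 1.5, 1.0])
affine[:3, 3] = -15.0
t_map = np.full((21, 21, 21), 2.0)
print(sphere_mean(t_map, affine, (0.0, 0.0, 0.0)))
```

Applying the function to each of the 1285 sample coordinates and each of the three t-maps would give one regional t-statistic per sample and group.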
AD risk genes associated with GMV differences
Fifty-two reproducible and established AD risk genes from a recently published study [32] were intersected with the 10,027 background genes, yielding 41 genes of interest. We then constructed a matrix of 1285 regions × 41 gene expression values. To explore the relationship between these genes and the GMV differences, partial least squares (PLS) regression was performed with gene expression data as predictor variables [33]. The first PLS component (PLS1), which is the linear combination of gene expression values most strongly correlated with the regional GMV differences, was used in the current study. Cross-sample Spearman rank correlation was then used to determine the relationship between regional PLS1-weighted gene expression and regional GMV alterations. To estimate the variability of the PLS1 weight of each gene, bootstrapping with 1000 resamples was performed. The Z score of each gene was defined as the ratio of its weight to its bootstrap standard error, and genes were ranked by their contribution to PLS1 using univariate one-sample Z tests [34]. Genes with Z > 5 or Z < −5 formed the positively or negatively associated gene list, respectively. This procedure was performed separately for each dataset. The final gene sets were defined as the overlap (intersection) between the two datasets.
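The PLS1 weighting and bootstrap Z scores can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it treats the regional t-statistics as a single response, for which the first PLS weight vector is proportional to the covariance between each standardized gene and the standardized response, and it resamples regions with replacement. The function names are hypothetical.

```python
import numpy as np


def zscore(a):
    """Standardize columns (or a 1D vector) to zero mean and unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def pls1_weights(X, y):
    """First PLS weight vector (unit norm) for a single response y."""
    w = zscore(X).T @ zscore(y)
    return w / np.linalg.norm(w)


def pls1_scores(X, y):
    """Regional PLS1 scores: weighted sum of standardized gene expression."""
    return zscore(X) @ pls1_weights(X, y)


def bootstrap_z(X, y, n_boot=1000, seed=0):
    """Ratio of each gene's PLS1 weight to its bootstrap standard error."""
    rng = np.random.default_rng(seed)
    n_regions, n_genes = X.shape
    boot = np.empty((n_boot, n_genes))
    for b in range(n_boot):
        # Resample regions (rows) with replacement.
        idx = rng.integers(0, n_regions, size=n_regions)
        boot[b] = pls1_weights(X[idx], y[idx])
    return pls1_weights(X, y) / boot.std(axis=0, ddof=1)


def split_gene_lists(z, gene_names, threshold=5.0):
    """Genes with Z > threshold (positive) and Z < -threshold (negative)."""
    positive = [g for g, v in zip(gene_names, z) if v > threshold]
    negative = [g for g, v in zip(gene_names, z) if v < -threshold]
    return positive, negative
```

Here `X` would be the 1285 × 41 expression matrix and `y` the 1285 regional t-statistics for one patient group. Running the procedure per dataset and intersecting the resulting lists, for example with `set(list_a) & set(list_b)`, would give the final gene sets.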
Analyses for consistent genes
Cross-sample Spearman rank correlations were performed to explore the relationship between gene expression level and GMV changes in each group. The results were corrected for the number of comparisons (n = 12) using a Bonferroni-corrected significance threshold of P < 4.16 × 10⁻³ (0.05/12).
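The rank correlation and the Bonferroni threshold can be illustrated with the standard library alone. This is a sketch only: it computes the Spearman coefficient as the Pearson correlation of average ranks and leaves the P value to a statistics package (for instance, `scipy.stats.spearmanr` returns both the coefficient and a P value). The helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


ALPHA = 0.05
N_COMPARISONS = 12
BONFERRONI_THRESHOLD = ALPHA / N_COMPARISONS  # about 4.17e-3


def survives_bonferroni(p_value):
    """True if a P value passes the Bonferroni-corrected threshold."""
    return p_value < BONFERRONI_THRESHOLD
```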
Re-analyses for sub-groups
Since the EMCI and LMCI groups include patients who ultimately converted to AD, remitted to NC, or remained stable in MCI, we subdivided them into converted (cEMCI and cLMCI), stable (sEMCI and sLMCI), and remitted sub-groups. The remitted sub-groups were relatively small and were not included in further analyses. The other four sub-groups (15 cEMCI, 59 sEMCI, 43 cLMCI, and 35 sLMCI) were analyzed using the same method. The PLS1-weighted scores and the r values of the Spearman correlations were calculated for each group.
Functional enrichment analyses
To identify enriched gene ontology (GO) biological processes, molecular functions, and cellular components, as well as Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, we performed Metascape analysis [35] on the positively and negatively associated gene lists separately. The resulting enriched terms were thresholded for significance at 5% and were required to contain at least three genes.