Hierarchical clustering identifies groups of cell-type specific genes
To identify astrocyte-specific genes, we calculated the metric τ [13] for every gene in the transcriptomes from isolated brain cell types (Additional file 1, Table 5 ‘τ scores’ ). Cell type-specific (τ ≥ 0.8) and cell type-enriched (τ ≥ 0.6) genes represented 30% and 60% of the 11,077 genes, respectively. Astrocytes (8%), microglia (7.6%) and neurons (7.2%) present the largest percentage of cell-specific genes, whereas neurons (21%) contained the largest pool of cell-enriched genes, followed by microglia (13.4%) and astrocytes (9.4%). In contrast, endothelial cells and oligodendrocytes presented the lowest percentages of cell-enriched (6%) and specific (3%) genes.
In parallel, because cell-specific functions are arguably performed by co-expressed genes [25], we reasoned that cell-type specific genes would be identified by clustering analysis. Thus, we applied hierarchical clustering (Methods) to the cell-type specific dataset, generating 194 clusters that contained between 1-1651 genes (Fig. 2A; Additional file 1, Table 6 ‘Hierarchical clustering’ and Table 7 ‘Cluster description’). We found that genes from the same cell type segregated together, defining clusters of highly co-expressed genes, plausibly representing cell-specific functions. The dendrogram height of hierarchical clustering was set to identify gene clusters that were highly enriched for individual brain cell types. The criteria for ‘enrichment’ was that the cluster contained more than 100 genes, of which over 70% corresponded to a unique cell type with τ ≥ 0.6 (Additional file 1, Table 8 ‘cluster selection’). Five clusters fulfilled these criteria, corresponding to astrocytes (cluster # 4), endothelial cells (cluster # 5), neurons (cluster # 13), microglia (cluster # 17) and oligodendrocytes (cluster # 164). There were other clusters containing over 77% of total of neuronal (e.g., clusters # 10, 116) or microglial (e.g., clusters # 30, 15) genes (Additional file 1, Table 8 ‘cluster selection’), but they were excluded from the analysis because less than 54% of their genes was enriched (τ ≥ 0.6) for that cell type, suggesting lower cell type specificity.
We continued with astrocyte cluster 4 (1651 genes) and neuronal cluster 13 (1258 genes) for two reasons: (i) a cluster size of over 1000 genes is sufficient for functional categorization and gene set analysis and (ii) since molecular changes in neurons are better characterized in AD than changes in astrocytes, neurons served as a positive control to assure correct data mining.
In the neuronal cluster # 13, 99.9 % of the genes were neuronal according to τ score, of which 54.1% of genes were neuron-specific (τ ≥ 0.8), meaning that they encode for proteins that perform functions highly specific to neurons. Indeed, the top 10 ranked genes based on τ (τ > 0.98) were related to neurotransmission (SYNPR, GABRA1, CNR1, SYT1, GABRG2 and GAD1), regulation of membrane potential (KCN2), microtubules (INA), and cell adhesion (RELN). The remaining 45.9% (τ < 0.8) were neuron-enriched or bulk genes plausibly supporting neuron-specific functions, since they were co-expressed with neuron specific genes (Additional file 1, Table 8 ‘cluster selection’; Additional file 2, Table 2 ‘Neurons cluster 13’).
In astrocyte cluster # 4, 80% of the genes were astrocytic according to τ (Additional file 1, Tables 7 and 8 ‘Cluster description’ and ‘Cluster selection’; Additional file 2, tab ‘Astrocytes cluster 4’). Further, 30% had τ ≥ 0.8, including genes that encode typical astrocyte proteins such as GFAP (τ = 0.95), glutamine synthase (GLUL, τ = 0.95), glutamate decarboxylase (GLUD2, τ = 0.91), glutamate transporters (EAAT1/SLC1A3, τ = 0.83), lactate dehydrogenase (LDHB, τ = 0.91), and enzymes involved in mitochondrial fatty-acid oxidation (ACADVL, τ = 0.92). Enriched genes with τ between 0.6–0.8, or bulk genes below 0.6, arguably necessary for astrocyte functions, are the aquaporins AQP1 (τ = 0.739) and AQP4 (τ = 0.744), and the chaperone clusterin CLU (τ = 0.373), which was identified by GWAS as a risk factor in AD [26]. These findings indicated that the group of genes gathered in cluster #4 is relevant to astrocyte biology.
Identification of cell-type specific gene clusters in AD databases
We next searched for the astrocytic and neuronal gene clusters in gene expression data from three AD whole-tissue transcriptomic databases (heatmaps of z-scores in Fig. 2B). Unexpectedly, hierarchical clustering of subjects (columns) of each cohort revealed heterogeneity in groups with the same clinical diagnosis. Subjects were divided in at least two molecular groups within the Control, MCI, and AD cohorts across the three databases, as shown by the division of the dendrograms into two large groups in each cohort. We refer to type 2 subjects as those showing blue z-scores within the neuron gene cluster, suggesting massive down-regulation of genes, as compared to type 1 subjects, in which genes with red z-scores predominated. Type 1 sub-cohorts included Control1, MCI1 and AD1, while type 2 included Control2, MCI2 and AD2. Numbers of Control1/Control2 patients were 33/45 in Mayo, 51/149 in ROSMAP, and 81/20 in MtSINAI. Numbers of AD1/AD2 patients were 28/54 in Mayo, 70/149 in ROSMAP, and 66/63 in MtSINAI. MCI1/MC2 numbers were 99/58 in ROSMAP.
The most profound down-regulation of genes within the neuronal gene cluster was observed in AD2. Because down-regulation of neuronal genes due to loss of synapses and neuronal demise is a hallmark of AD [18], we reasoned that AD2 was advanced AD, and Control1 a bona fide control composed of subjects that were neither demented, nor preclinical or pre-symptomatic at the time of death. Control1 and AD2 thus appeared to be the extreme groups across the continuum of AD phenotypes. We note that heterogeneity within an AD cohort has been recently reported [27].
Since our goal was to ascertain changes in astrocytic functions, it was critical to eliminate gross variability among samples within the astrocytic and neuronal gene clusters due to global changes in cell composition. To do so, we applied variance stabilizing normalization (Methods), which shifted the peaks of the distributions of differences in gene expression between AD2 and Control1 to zero (Fig. 3A). Prior to normalization, the distributions of gene-expression differences shifted to the left in the neuronal cluster, suggesting depletion of neurons in AD2, while the distribution of astrocytic genes shifted to the right, suggesting a relatively higher content of astrocytes in AD2. After normalization, the global downregulation of all neuronal genes in cluster # 13 in type 2 groups was attenuated, but intra-cohort heterogeneity nevertheless persisted (Fig. 3B and Additional file 3). In addition, new gene clusters that were previously masked were unraveled by hierarchical clustering. Thus, there were three neuronal (Fig. 3B, N-a, b, c), and four astrocytic gene subgroups (Fig. 3B, A-a, b, c, d) with opposite expressions patterns in type 1 versus type 2 subjects. For example, N-b genes were globally downregulated in AD2 and Control 2, as compared to Control 1 and AD1. Importantly, N-b contained genes related to synaptic function, including glutamatergic and GABAergic neurotransmission (e.g., GABRB3; GABRB2; GAD1; PDYN; SYN2; GABRG2; SYN1 NAPB; GRIA2; SLC17A7; GRIK2), supporting that this cohort is a bona fide AD. The normalized expression of cluster # 4 genes in the three AD databases is in Additional file 4.
Because the ROSMAP database included clinical covariate data, such as CERAD and Braak scores, APOE genotype and sex, we examined whether type 1 or type 2 sub-cohorts were associated with specific covariates to gain insight into why they were molecularly heterogeneous. We found that Braak and CERAD scores were significantly higher in AD groups than in Controls (Fig. 3C), confirming the clinical diagnosis. However, type 1 and type 2 subjects with the same clinical diagnoses did not significantly differ in their pathological stages according to CERAD and Braak scoring; nor were type 2 groups more enriched in APOE4 subjects or females (Fig. 3D).
GSVA reveals altered functions in AD and MCI versus Control1
We next identified changes in astrocytic and neuronal functions using the normalized expression data. Because the majority of the astrocytic genes were not annotated in existing pathways in open-source databases, we manually curated the astrocytic cluster in 17 functional categories (gene sets) and 135 subcategories, and the neuronal cluster in 15 functional categories and 79 subcategories (Methods, Additional file 2).
GSVA was then used to compare the gene sets corresponding to different functional categories among cohorts, yielding an enrichment score for each subject and gene set. We started by comparing the extreme cases, AD2 and Control1, using two criteria. First, probability values were computed using a permutation analysis across genes (cut-off for statistical significance was set at FDR-adjusted q < 0.05, Methods). Second, we considered that a function was altered in AD2 as compared to Control1 if such alteration was detected in at least two out of the three databases (consensus criterion).
For neurons, seven functions were altered in AD2 versus Control1 (Fig. 4A, statistics in Additional file 5, extended graphs in Additional file 6): ‘neurotransmission’ and ‘gene expression’ were altered in the three databases, and ‘synaptic plasticity’, ‘regulation of membrane potential’, ‘neural development’, ‘mitochondria’ and ‘stress response’ in MAYO and ROSMAP, but not MtSINAI. ‘Synaptic plasticity’, ‘neurotransmission’, ‘neural development’ and ‘membrane potential’ were down-regulated, and ‘gene expression’, ‘mitochondria’ and ‘stress response’ up-regulated. Among significantly different gene sets in MAYO and ROSMAP (Fig. 4A), Control 2 and MCI2 resembled AD2, and AD1 and MCI1 resembled Control1 (Fig. 4B). In MtSINAI, we found no statistically significant difference between Control 1 and Control 2, and AD groups were similar, although changes in AD2 were more dramatic, as compared to Control1 (Fig. 4B).
For astrocytes, five functions were altered in AD2 versus Control 1 (Fig. 5A, statistics in Additional File 5): ‘PAP’, ‘plasticity’ ‘stress response’ were up-regulated in three/three databases, ‘mitochondria’ was down-regulated in three/three databases, while ‘endomembrane system’ was down-regulated in two/three databases. Because ‘PAP’ are specialized astrocyte-neuron contacts and, hence, this category includes many genes related to gliotransmission, and since ‘gliotransmission’ was highly significantly changed in ROSMAP (qFDR < 0.001), and trending in MAYO (qFDR = 0.07), henceforth ‘PAP’ and ‘gliotransmission’ were considered together. As with neurons, Control2 and MCI2 showed the same direction of change as AD2 compared to Control1 (Fig. 5B). Likewise, enrichment of these gene sets in AD1 resembled Control1, particularly in MAYO and ROSMAP, while AD1 appeared to be a milder version of AD2 in MtSINAI.
Co-variance of gene sets with one another and with pathological covariates
Correlations between altered functions and pathological stages. Since CERAD and Braak scores were available for the ROSMAP data, we examined relationships between the enrichment of each gene set and pathological stages by regressing enrichment scores for each gene set and sample against the corresponding CERAD and Braak scores, regardless of clinical diagnoses (Methods). Data from all the individuals in the three groups (control, MCI and AD) were used in the analysis. For neurons, we found no correlation between Braak and CERAD stages for any of the seven top dysregulated categories described in Fig. 4A. The p values for regressions with Braak/CERAD scores were: ‘neurotransmission’ (0.536/0.919), ‘synaptic plasticity’ (0.342/0.703), ‘neural development’ (0.85/0.552), ‘membrane potential’ (0.794/0.479), ‘gene expression’ (0.55/0.563), ‘mitochondria’ (0.357/0.254) and ‘stress response’ (0.548/0.755) (Additional file 7). The lack of correlation can be interpreted to indicate similar gene dysregulation in all the stages, suggesting that neuronal dysfunction is an early event in AD pathology such that it appears in subjects with no (i.e., Control2) or incipient (i.e., MCI2) cognitive deficits, and low CERAD and Braak scores.
For astrocytes, of the five functions that were significantly altered in AD2 vs Control1 (Fig. 5A), the following ones significantly correlated with Break/CERAD according to p-values: ‘mitochondria’ (0.0000203/0.000742), ‘stress response’ and ‘plasticity’ (0.0768/0.00967) (Additional file 8). In contrast, ‘PAP’ (0.563/0.838), ‘gliotransmission’ (0.811/0.805), and ‘endomembrane system’ (0.709/0.942) did not correlate with pathological stages, suggesting that, as we reasoned with neurons, they represent early manifestations of alterations in astrocytic functions. For example, there is a progressive decline in the expression of mitochondrial genes from CERAD 1 to 3 and from Braak 1 to 6, while the expression of genes involved in ‘endomembrane system’ is altered even before the detection of cognitive deficits in patients.
Correlations among altered functions. We examined correlations among all neuronal and astrocytic gene sets by regressing enrichment scores for each gene set against each other gene set. Hierarchical clustering of correlation coefficients and adjusted p-values are in Additional file 9 for ROSMAP, MAYO, and MtSINAI. In neurons, the strongest direct correlations were among highly related functions such as ‘membrane potential’, ‘neurotransmission’ and ‘synaptic plasticity’ in neurons. Likewise, in astrocytes, the strongest correlations were between ‘PAP’ and ‘gliotransmission’ (orange squares). The strongest anti-correlations point to intertwined changes between: (i) morphological remodeling of PAP and loss of synaptic functions, and (ii) survival pathways included in ‘plasticity’ in astrocytes (see below) and organelle dysfunction.
Alteration of functional subcategories in astrocytes
We next aimed to gain further insight into which factors drive the functional alterations in AD astrocytes by searching for functional subcategories with statistically significant differences in enrichment scores between AD2 and Control1 (qFDR < 0.05) in at least two out of three of the databases (statistics in Additional file 10).
The category ‘PAP’ includes different categories related to morphology, metabolism and gene expression. The consensus subcategory significantly upregulated was ‘PAP morphology’ (qFDR < 0.001, AD2 vs Control1, in MAYO and ROSMAP). Because PAP are specialized in the functional refinement of adjacent synapses at a micro scale [22], PAP augmentation in AD astrocytes may be a compensatory reaction to locally optimize synaptic functions.
Subcategories in ‘Plasticity’ include annotations associated with morphological remodeling (e.g., ‘adherens junction’, ‘ECM’, ‘cytoskeleton’, ‘ciliogenesis’, ‘TGFbeta signaling’, ‘metalloproteases’), and pathways involved in brain development whose role in adult astrocytes is not well understood (e.g., ‘axon outgrowth’, ‘synaptogenesis, ‘WNT signaling’, ‘hippo signaling’, ‘smoothened signaling’, ‘hedgehog signaling’, ‘homeobox signaling’, ‘notch signaling’, ‘FGF signaling’, ‘pluripotency’, and ‘glial cell fate commitment’). Significantly dysregulated pathways were ‘ECM’, upregulated in the three databases, and ‘hippo signaling’ and ‘ciliogenesis’ (i.e., the process of generation of a microtubule-based and centriole-derived cilium whose function in astrocytes is unknown), respectively upregulated and down-regulated in two databases—note that the fact that the general trend of a given general category is towards up-regulation does not preclude that some of its subcategories are down-regulated.
‘Stress responses’ includes ‘AMPK signaling’, ‘antioxidant response’, ‘cell death’, ‘chaperone’, ‘DNA repair’, ‘hypoxia’, ‘complement system’ and ‘cytokines’. No subcategory was dysregulated according to qFDR in at least two databases. However, it is worth noting that ‘TNFalpha’, which is a subcategory of ‘cytokines’ as well as of ‘gliotransmission’, was significantly dysregulated in MAYO and ROSMAP, and trending in MtSINAI (qFDR = 0.093).
In ‘endomembrane system’, encompassing genes related to ‘endoplasmic reticulum’, ‘Golgi’, ‘lysosome’ and ‘vesicles’, no single subcategories were significantly dysregulated, while in ‘mitochondria’, ‘ETC’ and ‘TCA cycle’ were significantly downregulated in AD2 versus Control1 in three and two databases, respectively. For ‘ETC’, qFDR = < 0.001, < 0.001, 0.012, < 0.001 in MAYO, ROSMAP and MtSINAI. For ‘TCA cycle’, qFDR = < 0.001, 0.39, 0.01 in MAYO, ROSMAP and MtSINAI. All in all, the dysregulated subcategories in astrocytes point to alterations in mitochondrial respiration concomitant to morphological remodeling.
PCA confirms main functional changes in AD astrocytes
GSVA was dependent on gene sets organized into pre-determined functions. Thus, we also used PCA as an alternative, unsupervised approach to gain independent insight into the predominant functional changes of AD astrocytes. First, we examined whether expression of genes contained in the astrocytic cluster # 4 segregated cohorts in ROSMAP. PCA segregated the Control1, MC1 and AD1 groups to the left, and the Control2, MCI2 and AD2 groups to the right, according to the principal component 1 (PC1, horizontal axis, Fig. 6A). Violin plots of PC1 scores per group confirmed this tendency.
Second, we examined the functions performed by the proteins encoded by the top 50 dysregulated genes in PC1, representing the maximally co-varying astrocytic genes (Additional file 11). Mirroring our results with GSVA, genes related to ‘plasticity’ and ‘PAP/gliotransmission’ were over-represented in the up-regulated portion, whereas genes related to ‘endomembrane system’, ‘gene expression’, and ‘signaling’ were predominant in the down-regulated portion (Fig. 6B). Genes related to ‘mitochondria’ and ‘stress responses’ appeared in both sections, indicating mixed direction of dysregulation in these functions. Unlike GSVA, PCA also detected changes in ‘signaling’, as suggested by the presence of genes related to this function in the up and down segments of the top PC1 genes.
Third, we asked if the changes detected in AD astrocytes in ROSMAP with PCA were reproducible. Thus, we determined the capacity of the PC1 genes to segregate cohorts in MAYO and MtSINAI. As in ROSMAP, Control1 subjects were mostly negative, and AD2 cases were mostly positive in MAYO and MtSINAI, thus validating the discriminating capacity of the maximally co-varying genes in PC1 (Fig. 6C). AD1 segregated with Control 1 in MAYO, but not in MtSINAI, where AD1 aligned with AD2, confirming less intra-cohort heterogeneity in MtSINAI than in the other two databases, as shown with GSVA. Altogether, these results point to multi-factorial changes in AD astrocytes, encompassing changes in astrocyte-neuron interactions and organelle dysfunction (model in Fig. 7A and description in Discussion).
Finally, comparison of the top 50 UP and 50 DOWN PC1 genes with the DEG astrocytic genes identified with snRNAseq studies in AD brains [4, 5] reveals little concordance (Additional file 12). Specifically, only 0.3% and 1.3% of PC1 genes are in the DEG reported in [5] and [4], respectively. Of note, [5] and [4] only share 32 genes, representing 47% of the DEG in [5] and 4.5 % of those in [4]. The remarkable disparity among databases may be due to differences in methodologies, classifications of disease (e.g., clinical in our study and in [4], but associated to Amyloid-β pathology in [5] regardless of clinical diagnosis), brain regions sampled (prefrontal or temporal cortex in [5] and in our case, but entorhinal cortex in [4]), and/or the total number of subjects analyzed (48 in [5], 12 in [4], and 766 herein).
Cortical APP/PS1 astrocytes partially mimic cortical AD astrocytes
We finally asked whether changes identified in cortical AD astrocytes are recapitulated in mouse models. To answer this question, we examined the expression of human astrocytic genes from cluster #4 in the transcriptome of cortical astrocytes isolated from 15–18 month-old APP/PS1 versus WT mice (N = 4 per group) [10], using the same functional categorization and statistical tools utilized for human AD astrocytes. Comparison with recent snRNAseq mouse data [11], is not appropriate since this study was carried in hippocampi. We found robust down-regulation of all human astrocytic functions in APP/PS1 cortical astrocytes (Fig. 7B). Thus, APP/PS1 astrocytes mimic the down-regulation in ‘endomembrane system’ and ‘mitochondria’ detected in AD astrocytes, but not the up-regulation of ‘PAP’, ‘stress responses’ and ‘plasticity’. We also compared the top 50 and bottom 50 DEG from APP/PS1 astrocytes [10] with the top 50 and bottom 50 genes from PC1 in human AD astrocytes (Fig. 6B). GSVA confirmed that both the top and bottom human AD astrocyte gene sets were down-regulated in APP/PS1 mice (Fig. 7C, left). Importantly, the pool of immune-related genes reported to be up-regulated in mouse APP/PS1 astrocytes [10] are predominantly microglial according to τ scores, and, with the exception of C1S, do not coincide with functionally related human astrocyte-enriched genes in cluster # 4 subcategories such as ‘cytokines’ and ‘complement’ (Additional file 13). As also concluded in [10], this divergence points to the existence of two distinct gene pools in cortical APP/PS1 astrocytes that change in opposite directions in advance disease stages: a down-regulated pool of typical homeostatic astrocyte genes, and an up-regulated pool of typical microglia genes. The loss of astrocytic functions and the adoption of a microglia-like phenotype suggests that cortical APP/PS1 astrocytes may undergo a phenotypical involution with time.
Finally, regardless of the direction of change in mice, DEG in APP/PS1 vs WT cortical astrocytes were largely up-regulated in AD2 the three human databases (Fig. 7C, right panels). This suggests predominance of typical microglial genes in advanced AD, plausibly upregulated in both microglia and astrocytes.