Combinatorial analyses reveal that cellular composition changes have different impact on transcriptomic changes of cell type specific genes in Alzheimer’s Disease brains

Background: Alzheimer’s disease (AD) brains are characterized by progressive neuron loss and gliosis which involves mostly microglia and astrocytes. Comparative transcriptomic analysis on AD vs. normal brain tissues helps to identify key genes/pathways involved in AD initiation and progression. However, many such studies using bulk brain tissue samples have not considered cell composition changes in AD brains, which may lead to expression changes that are not due to transcriptional regulation. Methods: Using five large transcriptomic datasets including 1,681 brain tissue samples (882 AD, 799 normal) in total, we first mined frequent co-expression network modules across them, then combined differential expression and differential co-expression analysis on the mined modules in AD versus normal brains. Integrated with cell type deconvolution analysis, we addressed the question of whether the module expression changes are due to altered cellular composition or transcriptional regulation. We then used four additional large AD/normal transcriptomic datasets to validate our findings. Results: The integrative analysis revealed highly elevated expression level of microglia modules in AD without co-expression change. Decreased expression and elevated co-expression are observed for neuron modules in AD, while significant over-expression and co-expression perturbation are observed in astrocyte modules, all of which has not been previously reported. The expression levels of astrocyte modules also show the strongest correlation with the clinicopathological biomarkers among all cell type specific modules. Conclusion: Further analysis indicated that the overall increased expression of the core microglia modules can be well explained by the increased microglia cell population in AD brains instead of bona fide microglia genes’ upregulation. In contrast, the decreased expression and perturbed co-expression in AD neuron modules are due to both neuron cell loss and expression regulation of neuronal pathways including differentially expressed transcription factors such as BCL6 and STAT3, which previous study was not able to identify from the shadow of the cellular composition change. Similarly, the strong changes in expression and co-expression in the astrocyte modules may be also due to a combinatory effect from astrogliosis and astrocyte gene activation in AD brains. In this work, we demonstrated that the combinatorial analyses not only provide a powerful approach to delineate the origin of transcriptomic changes in bulk tissue data, but also lead to a deeper understanding of genes in AD.

population of different cell types in the affected brain regions when performing the gene differential expression or co-expression network analysis. Therefore, in bulk tissue transcriptomic analysis, we have to be very cautious and determine whether the widely observed down-regulation of neuron gene expression and up-regulated microglia expression patterns are due to changes in transcriptional regulation or simply due to cell population change of certain cell type (Srinivasan et al., 2016).
Although this has been noticed and concluded that in the bulk brain tissue samples from AD patients, mRNA level change reflects cellular composition change (Srinivasan et al., 2016), no further investigation has been carried out to find out that for a specific cell type, how the cellular composition change impacts their corresponding marker gene expression levels. Specifically, whether the cellular composition changes affect gene expression in a similar fashion, and whether bona fide transcriptional regulations can still be revealed on top of the cell population effect in bulk tissue samples. In our study, we re-examined the gene expression changes of AD compared to normal brain from bulk brain tissues in the five major large transcriptomic datasets (Table-1), accounting for the cell composition changes in the affected brain regions of AD. We aimed to separate the bona fide transcriptional regulation signals from the expression level changes caused by cell population changes in the observed differential gene expression and differential coexpression phenomena. This is achieved by combining the gene differential expression (DE) analysis with gene differential co-expression (DC) analysis on the identified gene co-expression network modules, along with the estimation of relative proportions of major brain cell types in AD and normal brain samples. Gene co-expression analysis has been applied in AD research to identify clusters of genes (modules) with specific functions that were dysregulated in the AD brains (Humphries et al., 2015;Miller et al., 2008;Zhang et al., 2013). It is known that co-expressed gene modules are often specific to certain cell type due to the genes involved in common pathways, functions, and/or regulated by common upstream regulators (Seyfried et al., 2017). A gene coexpression module specific to certain cell type with no or minimal correlation change between two conditions but high level of differential expression (DE) indicate that the DE is possibly due to the cell abundance change instead of expression dysregulation. In contrast, changes of gene co-expression relationships (differential correlation aka DC) are more likely due to some of the genes in the module regulated by altered activities of transcription factors or other regulators. Therefore, combining DE and DC analyses on gene co-expression modules along with cell type proportion estimation can differentiate the effect of cellular composition changes from bona fide regulatory changes. It will also identify gene modules with high level of DE and DC change, which may be linked to key upstream regulation change. Therefore, this analysis result can help generate new hypotheses and open new direction for the pursuit of the elusive AD etiology.
We first performed frequent gene co-expression network (FGCN) mining on five large human AD transcriptomic datasets generated from five separate studies, which contain multiple brain regions mentioned above. We obtained FGCN modules which were conserved across multiple datasets, avoiding potential technical platform or systematic bias in any single study. Then, we computed the DE score and DC score for each module and partitioned the modules into different categories according to the high/low DC and DE scores. Next, to adjust for the cell type composition changes in AD and normal brain samples, the abundance for each of the five cell types (neuron, microglia, astrocytes, oligodendrocyte, and endothelial cells) were estimated in every sample using cell type deconvolution method BRETIGEA (Chikina et al., 2015;McKenzie et al., 2018). We observed a consistently decreased abundance of neuron cells and increased abundance of glia cells (microglia, astrocytes) for the AD samples across all five datasets. Two gene modules were identified as microglia core gene modules from AD and normal samples respectively. Both modules showed large increased gene expression levels with minimal changes in correlations between gene expression profiles in AD samples. Further analyses indicated that the increased expression of genes from the microglia core modules with high DE scores are largely due to microglia cell population increase observed in AD progression instead of bona fide upregulation of gene expression within microglia cells, which is a novel finding. We also identified neuron marker enriched modules with high DC and high DE scores, which we believe are due to the combinatorial effects of neuron cell proportion change and dysregulation of neuron associated pathways in AD.
The two astrocyte modules identified with high expression and high co-expression change are likely due to the same combinatory effect, and their eigengene expressions are highly correlated with the clinicopathological biomarkers of AD. Furthermore, we identified six transcription factors for the first time that are frequently differentially expressed among nine transcriptomic AD datasets and are predicted to target multiple genes in these neuron modules, which can shed light on the changed connectome in AD brains.

Data Sets and Sample Processing
Five transcriptomic datasets from one or multiple regions of brain tissue from both AD patients and normal persons were used in this study ( Table 1). Two of them are microarray datasets GSE5281 (Liang et al., 2007), GSE48350 (Berchtold et al., 2008) from NCBI Gene Expression Omnibus (GEO) and three are RNA-seq datasets from the Accelerating Medicines Partnership-Alzheimer's Disease (AMP-AD) Knowledge Portal. The three AMP-AD RNA-seq datasets were designated as MSBB , ROSMAP (De Jager et al., 2018) and Mayo (Allen et al., 2016) by the project original name or research site, which contain transcriptome-wide FPKM values or raw read counts for AD and normal brain samples. Only samples from patients diagnosed as "AD" were considered as AD samples. Four additional transcriptomic datasets containing AD and normal human brain samples are used for validation of differential gene expression. The sample size and dataset details for both FGCN mining and validation sets were summarized in Table 1.
Raw microarray datasets were processed using R/Bioconductor package "affy" to generate normalized expression profiles of the brain samples by RMA normalization using default parameters (Gautier et al., 2004). For preprocessing, all datasets were pre-filtered to remove probes without gene annotation. For genes with multiple probes, we selected the probe with the highest expression value for that gene (Zhang et al., 2010(Zhang et al., , 2012. For the RNA-seq datasets, genes with more than 50% zero expression levels across samples from each condition (i.e., AD or normal) were removed. Next, for both microarray and RNA-seq data, genes with variance in the bottom 20 percentile of the entire dataset were removed. Genes with mean expression value in the bottom 10 percentile were also removed. At the end of the processing, 10,931 genes that present in all five pre-filtered datasets were used for the following frequent co-expression network mining.

Frequent gene co-expression network (FGCN) construction and module detection
Frequent GCN modules were mined separately in AD and normal samples. For microarray datasets, Pearson correlation coefficient (PCC) was used as the correlation measure of every gene pair. For RNA-seq datasets, due to the large range of expression values and non-Gaussian distribution, spearman correlation coefficient (SCC), which is much less sensitive to outliers, was used as described in (Yu et al., 2004(Yu et al., , 2007.
Because the range of correlation values varies substantially among different datasets, instead of applying a uniform threshold across datasets, the gene pairs with top 5 percentile correlation coefficient values (|PCC| or |SCC|, with p-values less than 0.05) within each dataset was selected for the frequent network mining as in previous similar studies (Zhang et al., 2010(Zhang et al., , 2012. For each selected gene pair, the frequency was computed across five datasets and used as the edge weight for FGCN mining. Previously reported lmQCM (local maximized Quasi-Clique Merger) algorithm (Zhang and Huang, 2014) was used to mine the frequent gene co-expression modules in AD as well as in normal samples, which has been implemented by the online tool package TSUNAMI (Huang et al., 2019). lmQCM allows overlaps between modules and is capable of identifying relatively smaller coexpressed local modules than WGCNA. The lmQCM algorithm takes five parameters (t, l, g, b, and minimum module size) that were set as following: t = 1.0, l = 1.0, g = 0.81 (i.e., the initial coexpressed gene pair of a FGCN module should be a gene pair showing up in at least four of the five datasets), b = 0.3, and minimum module size = 10. Prefixes "AD" and "N" were used to designate modules mined from AD samples and the ones from normal brain samples respectively.

Differential expression (DE) measurement of GCN modules between AD and normal samples
To determine if a gene module is differentially expressed between AD and normal samples within a dataset, we adopted the DE score measure for individual genes similar to Lui et al. (Lui et al., 2015) and extended it to measure a group of genes in a FGCN module. For a module with m genes from AD and normal samples, D denotes AD group, while N denotes normal group. We adopted the widely used absolute t-value in t statistics to quantify the degree of differential expression of expression levels. The t-value for a given gene i is defined as: where " and # were mean expression levels in disease and normal states, " and ! were samples sizes of disease and normal states, and " and # were standard deviations of expression levels in disease and normal states (Lui et al., 2015).
To adjust for the different module size, the module DE scores were normalized by dividing the sum of t-value of all genes in a specific module with the module size. The normalized DE score is considered the measure of the overall differential expression level for the module. Higher DE score indicates a larger overall difference expression of a module. For each frequent module, we calculated its DE scores in all five datasets and then get the median of them as the median DE score for that module.

Differential co-expression (DC) measurement of GCN modules between AD and normal samples
Similarly, to determine the differential co-expression level of a module in two different conditions (i.e., AD vs. Normal) within a dataset, we first computed the differential co-expression measure of each pair of genes in the module (Lui et al., 2015). For a pair of genes, we used a differential co-expression measure, Z, which quantifies the correlation difference between two genes in AD and normal samples. !% , between ! and % is defined as: where " and # were samples sizes in the normal and AD disease conditions, !% # and !% " were the Fisher-transforms of the Pearson/Spearman correlation coefficients !% # and !% " , which were defined as: We calculated values for every pair of genes in a given module, then normalize the score by dividing with the $ norm of values within a module. The resulted $ norm of values is considered the overall differential co-expression measure of the module, termed as DC score.
Higher DC score indicates larger overall differential co-expression level of a module between AD and normal samples. For each frequent module, we calculated DC scores in all five datasets and then obtained the median of them as the median DE score of that module.
The original z-score as well as the modified DC score only take the absolute value of a correlation coefficient into account. In order to determine if the module gains or loses correlation in AD versus normal condition, we also measured the correlation change using our previously developed metric Centered Concordance Index (CCI, Han et al., 2016). CCI ranges from 0 to 1, and the larger the CCI value, the stronger the genes are correlated within a module. The CCI boxplot of a given module were plotted across five datasets to measure if a module gains or loses correlation in AD vs. normal condition, and the significance of CCI change was analyzed by t-test.

Categorizing GCN modules into high and low DE and DC groups
With DE and DC measures for all modules defined, we further classified the modules into high or low DE/DC category. The median of the DC/DE scores of all modules were used to define "low" and "high" in DC/DE (solid lines on Figure 2

FGCN module functional enrichment analysis
ToppGene (Chen et al., 2009) was used to perform GO and pathway enrichment analysis of the FGCN modules. The functional enrichment terms were considered as significant enriched if they were with FDR adjusted p-value less than 0.05. Enriched transcription factors (TFs) targeting the module genes were obtained from Enrichr with database "TRANSFAC_and_JASPAR_PWMs" with cut-off for adjusted p-value being 0.05 (Kuleshov et al., 2016). To check if the module gene members are transcription factor or not, we compared the gene list against an online database TFcheckpoint (Tripathi et al., 2013) to uncover if any of the genes are TFs.

Differential gene expression analysis between AD and normal samples for individual datasets
We also performed differential gene expression (DGE) analysis for each of the five datasets with R package "limma" for two microarray datasets, as well as R package "limma-voom" (built for RNA-seq analysis) for three RNA-seq datasets respectively (Ritchie et al., 2015). Foldchange of 1.2 and FDR adjusted p-value < 0.05 were used as the threshold for differentially expressed genes selection. The differentially expressed genes were defined as differential expressed in at least two datasets. To confirm the DEGs from the above analysis, we further acquired four human brain transcriptomic datasets containing AD and normal samples from NCBI GEO database (GSE15222, GSE33000, GSE84422) and from Allen Brain Atlas Aging, dementia and TBI study (https://aging.brain-map.org/), and repeated the same DGE analysis on them. The results are updated in Table-3.

Brain cell type deconvolution on each individual dataset
The brain cell type-specific marker gene lists were obtained from McKenzie et al. (McKenzie et al., 2018) in which five human and mouse cell type-specific transcriptome-wide RNA expression datasets were compared and contrasted to identify consensus brain cell marker genes conserved across five datasets. We estimated brain cell type relative proportions using the R package The change of a specific cell type relative proportions between AD and normal samples was analyzed by Wilcoxon rank sum test, with a significant difference as the FDR adjusted p-value less than 0.05 or 0.01 (* for p-value < 0.05, ** for p-value < 0.01 in the boxplot).

FGCN modules enrichment of specific cell type marker genes
The same sets of marker gene lists for neurons, astrocytes, oligodendrocytes, microglia, endothelial were used for the cell types enrichment analysis for AD as well as normal FGCN modules, which are listed in Supplementary S1. Enrichment of certain cell-type markers in a module was assessed by hypergeometric test, with a significant enrichment as FDR adjusted p-value less than 0.05.

FGCN structure change analysis by identifying gain or loss hub genes from normal to AD condition
In a weighted network, the degree of a node which labels the node strength is defined as the sum of weights. We examined the gene node degree distributions of the co-expression networks generated from the highly correlated gene pairs in the five datasets (see the FGCN mining section) and defined the nodes with the top 5 percentile degree as hubs genes in this network based on literature (Sun and Zhao, 2010). We compared the hub genes in AD and normal networks and defined the hub genes identified in normal network but not in AD as lost hub genes and the hub genes shown in AD network but not present in the normal one as gained hub genes.

FGCN module eigengene computing and the correlation with cell abundance/AD clinicopathological traits
Principle component analysis (PCA) was performed on the expression matrices of the identified FGCN modules from AD as well as normal samples. The first principle component (PC1) of the expression matrix of a specific module was computed as eigengene value of that module to represent the weighted average expression profile (Zhang and Huang, 2014;Zhang et al., 2010). We

Examine the neuron proportion changes across different brain regions in AD and normal individuals respectively
In the analysis we observed some gene modules specifically enriched with neuron markers showed stronger correlation relationships in AD than in normal, which is intriguing given the perturbed neuron activities in AD. Since we performed the FGCN module mining with combined data for those datasets with multiple brain regions, we suspected that for neuron associated genes, their detected co-expression relationship may due to changes in neuron populations either across brain regions or across the cohort. Therefore, we first check if there are concordant cell population changes in different brain regions in a specific condition (AD or normal). Specifically, we examined if the estimated neuron or microglia population change correlates differently across brain regions in AD (comparing to normal brain) samples. The MSBB dataset was used as it contains matched samples of four brain regions from 56 AD patients and 40 normal persons. For each condition (AD or normal), we computed the CCI value that indicates the overall correlation of cell relative proportions across the four brain regions. We compared the CCI of neuron/microglia relative proportion across AD patients to that across normal subjects. Higher CCI for AD may lead to attribution of cell population changes in brain regions to the higher co-expression relationship for neuron gene modules in AD while lower CCI for AD may indicate that the co-expression relationship may mainly be due to neuron population changes or regulatory changes among patients.

Identify genes regulated by enriched transcription factors in neuron modules
To identify genes that can be regulated by transcription factors listed in Table-3, we downloaded ChIP-seq data for BCL6 and STAT3 from (Yan et al., 2013 andZhou et al. 2017), then overlapped the peak region from those data with our AD1 and N1 module gene list. The full list of genes with BCL6/STAT3 peaks in their promoter regions were listed in Supplementary

FGCN modules from AD and normal brains are separated into different groups by DE and DC levels
Gene co-expression network (GCN) modules are subsets of genes with tightly correlated expression profiles in a gene co-expression network. Genes in a GCN module are often coregulated and participate in common biological processes or pathways in homogeneous tissues.
Our frequent network mining approach identified frequent gene co-expression network (FGCN) modules detected across multiple datasets from independent AD studies, thus eliminating the potential systematic bias from cohort composition, experimental design, or technical platform in any single study. This approach allows us to identify bona fide co-expressed genes in a specific condition and eliminates spurious correlation from data/study bias (see Methods for detail). In this study, we obtained 15 FGCN modules in AD samples from the five large transcriptomic datasets mentioned previously (Table 1), ranging from AD1 (largest, 1,247 genes) to AD15 (smallest, 10 genes) and 9 FGCN modules in normal brains ranging from N1 (largest, 1,003 genes) to N9 (smallest, 10 genes)(Supplementary Table 1). The majority of these modules are specific to each condition (AD or normal) as determined by minimal gene overlaps (Supplementary Table 1), which is consistent with previous finding that AD is a complex disease with gene co-expression largely perturbed in AD brains compared with normal ones (Xiang et al., 2018;Zhang et al., 2013). Only three pairs of modules contain genes largely overlapped between AD and normal conditions (Jaccard index > 0.3) and the highest overlapping is between AD1 and N1 (Jaccard index 0.47; Supplementary Table 1).
To characterize the module in overall differential expression and differential co-expression levels, we calculated DE score and DC score between AD and normal samples respectively for all modules using the formula modified from Lui et al. 2015 (Methods; for scores for all modules, please see Supplementary Table 2). In Fig. 1a 1a).
To further characterize the FGCN modules and explore their possible association with AD pathology in different partitions, we examined those modules for the enrichments of DE genes, including biological process/pathway and cell type specificity (Supplementary Table 3). The majority of the modules are enriched with specific cell types (Table-2) or specific biological processes (listed on the side of Fig. 1a). We also checked if the modules are enriched with differentially expressed genes between AD and normal samples (Supplementary Table 4-5). The four partitions of modules each shows distinct enrichment of differentially expressed genes and categories of biological processes (Table 2). HDE_HDC partition showed large perturbation in expression levels. Among them, nearly or more than half of the genes in AD1, N1, N2 and N8 were up expressed in AD, while more than 80% of the genes in AD5, AD11 and N5 were down expressed.
These modules were significantly enriched in three categories of biological processes that all linked to AD pathology. More specifically, AD1, N1, N2 correspond to neuronal function (Uylings and de Brabander, 2002). N5 and AD11 are involved in angiogenesis and vasculature development, which is consistent with the previous finding that changes in vasculature leading to altered blood flow in the brain is a risk factor for AD (Dickstein et al., 2010;Kelleher and Soiza, 2013a;Koizumi et al., 2016). AD2 is linked to gliogenesis and cell motility. N8 is involved in gap junction, which supports the assumption that gap junction associated with blood brain barrier may be disrupted in AD to allow the access of virus or bacteria to AD brain (Cai et al., 2018;Devanand, 2018;Montagne et al., 2017;Readhead et al., 2018).

Cell type deconvolution analysis indicate significantly decreased neuron population and increased glia cells in AD brains
Mounting evidence showed that cell composition changes in AD brains compared to normal ones, with various extent of neuron loss and elevated gliosis (Andrade- Moraes et al., 2013;Gomez-Nicola and Perry, 2016;Jain et al., 2015). To quantify the cell type composition changes and their association with module differential expression or differential co-expression, we estimated the cell composition of five major brain cell types in AD and normal brain samples using BRETIGEA, a method that has been previously developed to successfully quantify the cellular composition (McKenzie et al., 2018). Our results clearly showed that neurons were consistently and significantly decreased in AD samples while microglia, astrocytes, and endothelia were all significantly increased in AD samples across all five datasets (Fig 2. a-d), which are consistent with previous reports (Cotman and Su, 1996;Fakhoury, 2018;Gomez-Nicola and Perry, 2016). We also observed that oligodendrocytes were only significantly changed in two of the five datasets.
The highly differential expression in microglia modules in HDE_LDC section were largely due to cell population change The overall differential expression of a GCN module may reflect the change of specific regulatory network/pathway, the representative cell population change, or the combinatory effect of both, any of which can reflect certain pathological changes in AD brains. Our combined DE-DC analysis provides an approach to differentiate the regulatory change and the cell population change.
Specifically, the HDE_LDC category, where modules are highly differentially expressed but with minimal disruption of gene co-expression level, indicates that the overall gene expression changes of these modules may likely result from the change of representing cell population. This partition includes N6, AD3, and AD14 (Fig. 1b) (Table 2). Notably, genes in this microglia module are also among the core microglia genes in previous study (Patir et al., 2019), and one third of this module (14 out of 42) are also identified from a microglia subpopulation from single cell transcriptomic study that is highly correlated to AD pathology (Mathys et al., 2019).
However, none of the 42 genes was among the recently identified disease-associated microglia genes (Keren-Shaul et al., 2017). The low DC scores indicate the correlation relationship within the module was not disrupted significantly in AD. Indeed, as shown in Fig. 3a, the networks of the microglia module in AD and normal samples showed comparably high density.
To further verify the co-expression relationships among genes within the microglia module, we used another correlation metric, centered concordance index (CCI, Han et al., 2016), to evaluate the microglia module's gene inter-correlation in each of the five datasets separately. The CCI values of AD versus normal samples align very closely to the diagonal line in the upper right corner, which are high in both conditions, indicating these genes are highly correlated with each other in both AD and normal samples in all of the five studies (Fig. 3b). Consistent with our postulation that the over-expression of microglia core modules may be due to cell composition change in AD, a significant increase of estimated microglia relative proportions in AD compared to normal is observed across five datasets (Fig. 2b). Furthermore, the microglia module expression, as modules across all five datasets (Fig. 3d). The result shows a strong positive correlation between DE scores and microglial cell proportion changes (PCC = 0.885, p-value < 0.05). All of these results, consistently observed across five independent AD cohorts, confirm that the microglia core genes' co-expressions are not perturbed in the AD brains and the module over-expression are mostly due to the high microglia proportion in the AD bulk brain samples.
Since some of the previous differential gene expression or co-expression studies of AD (Miller et al., 2008;Zhang et al., 2013) performed on bulk transcriptomic data may not take cell abundance change into account, our findings indicated that some of the identified AD associated microglia gene upregulations may need re-evaluation.

Neuron modules showed high DE that may be due to decreased proportion of neurons and
high DC that may indicate regulatory change and neuron hyperactivity.
The HDE_HDC partition in Fig. 1a contains gene modules that are highly disrupted in both gene expression and co-expression, which indicates that the gene expression changes of the modules are likely involving gene regulatory changes. We took two modules AD1 and N1 for further investigation. AD1 and N1 are the largest modules in AD and normal samples respectively. Genes in these two modules largely overlap with each other (Supplementary Fig. 3). Gene ontology (GO) and pathway enrichment analysis on the modules shows that both of them are highly enriched with neuron markers for neuronal functions, as well as energy metabolism and mitochondrial functions (Fig. 4a). To further investigate the difference between the two neuron modules, we categorized the genes from the two modules into three groups: the overlapping genes, the genes unique to AD1, and the genes unique to N1 (Supplementary Fig. 3). The GO/pathway enrichment results for the three groups are shown in Supplementary Table 7-9. The genes shared between the two modules are enriched in AD pathway, synapse associated pathways, the citric acid cycle, and respiratory electron transport pathway. Interestingly, the genes unique to AD1 are enriched with more synapse and neuron pathways, such as protein-protein interactions at synapses, glutamate binding, activation of AMPA receptors and synaptic plasticity, and voltage gated channels activity, while ubiquitin-mediated proteolysis was found unique to N1. The findings from genes only present in A_M1 support the recent discovery that neuron hyperactivity, especially the glutamate signaling hyperactivity observed in AD brains (Stargardt et al., 2015) , (Zott et al., 2019). The fact that ubiquitinmediated proteolysis only preserved in N1 implies that the decoupling of such activity with neuron activity in AD brains. It may reflect a disrupted normal ubiquitin-mediated proteolysis pathway in AD, which has been observed before in neurodegenerative disorders, and has been attributed to the filamentous protein aggregation in the affected brains (Layfield et al., 2003).
As seen from Table 2, about half of the genes in the two neuron modules (47.5% of AD1 module, 44.8% in N1) are under-expressed in AD brains in at least two datasets. We further investigated if the large proportion of under-expressed genes could be due to an overall downregulation of the neuron associated genes, or due to a decreased proportion of neuronal cells.
A consistently significant decrease of neuron cell population in AD compared to normal has been observed across all five datasets (Fig. 2a). To investigate whether the expression of neuron The high DC scores of the neuron modules imply large changes in correlations of gene expression profiles in AD. To check which cohort (AD or normal) has a higher within-module gene correlation, we computed the CCIs for the neuron modules. To our surprise, the CCI values are consistently higher in AD brains for all five datasets (Fig. 4b), which means these genes are more correlated with each other in AD than in normal brains. Since three of the five datasets contain multiple brain regions, we first checked if the neuron proportion varies consistently across different brain regions in AD brains more than in normal ones, which could potentially lead to a high correlation for the gene expression profiles. As we know, AD is first observed in hippocampus region of the brain and gradually spread to the entire cortex (Anand and Dhikav, 2012). The extent of neuron loss differs in brain regions due to different extent of damage to each region in AD progression, but this difference due to neuron loss may not exist in normal aging brains, which could potentially result in high correlation of neuron module genes in AD. In the dataset of MSBB, where four brain regions (frontal pole, superior temporal gyrus, parahippocampal gyrus, inferior frontal gyrus) are present, we discovered that the inter-correlation of neuron relative proportion variations across the four brain regions (CCI 0.820) was actually lower among AD patients than the ones among normal people (CCI 0.944). As a contrast, the inter-correlation of microglia relative proportion variations across the four brain regions among AD patients is comparable to that across normal persons (CCI 0.856 for AD, 0.868 for normal). We therefore concluded that the variation of neuron relative proportions across regions in AD did not contribute significantly to the high DC value in AD1/N1 modules.
To further check if the correlational change among in AD neuron module is really increased, we examined the GCN structural change by checking the presence of loss of hub genes in AD vs normal GCN. We obtained 547 hub genes each from AD as well as normal GCN networks respectively (designated by possessing top 5 percentile connections in each condition), then identified lost and gained hub genes in AD by comparing the two hub gene lists ( Supplementary   Fig. 6). The hub genes shown in normal GCN but not in AD GCN are denoted as lost hub genes, and the ones shown in AD GCN but not in normal GCN are denoted as gained hub genes in AD.
We obtained 136 genes for each case (Supplementary Fig. 6). We then crosschecked these genes with AD1/N1 and found that most of the changed hub genes are in the neuron modules AD1 or N1. genes, but also suggest that the DE changes observed in these two modules contain change from regulatory level.
To further confirm the expression regulation changes among the neuron modules, we checked if there are changed upstream regulators for the FGCN modules. We performed regulatory transcription factor enrichment analysis to identify regulatory cascades within each module using "TRANSFAC and JASPAR PWMs" database (Kuleshov et al., 2016). Many genes from AD1 and N1 were targeted by TFs that are frequently differentially expressed among the five datasets as well as the four additional validation datasets as shown in Table 3. Specifically, five TFs were predicted to regulate 116 to 204 genes in AD1 and six TFs were predicted to regulate 94 to 273 genes in N1 (for details see Supplementary Table 10). To verify if the TFs may regulate the predicted genes, we acquired the ChIP-seq peak data of the top differentially expressed TFs, Bcl6 and Stat3 (Yan et al, 2013 andZhou et al, 2017). We then crosschecked AD/N1 genes promoter region for binding peaks of these two TFs. We found 18 genes from AD1 and 56 genes from N1 The enriched differentially expressed TFs and their confirmed regulatory role in the M1 genes indicates that the high DE modules in this category are not only due to neuron loss, but also due to upstream regulatory changes. We therefore concluded that the overall decreased expression level of neuron associated genes in AD1/N1 are due to both decreased proportion of neuron cells and regulatory changes. However, without further data, it is difficult to quantify the degree of contribution for either cause.
Astrocyte modules (AD2 and N5) are also significantly differentially expressed and differentially co-expressed in this quadrant (Fig. 1a). In fact, the two astrocyte modules are the mostly elevated expressed modules. We have not further pinpointed the cause of these changes, but with the consistently increased proportions of astrocytes in AD brains among the five datasets ( Fig. 2c), we speculated that the two causes (i.e., cell population and up-regulation) both account for the changes in astrocytes modules, similar as observed in neuron modules.

HDE modules, especially astrocyte modules, are highly associated with AD neuropathological features and cognitive deterioration
Since HDE modules are highly enriched with neuronal and microglia functions, pathways, and genes that are previously linked to AD pathology, we examined the association of the expression levels of genes in these modules with neuropathological features and clinicopathological markers.
We  Table 11). Modules were ranked from high to low based on the average of correlation value to the three traits (Fig. 5a). We observed that high DE modules show significantly stronger association with the clinicopathological markers than the low DE modules (Fig. 5a). The expression of the top two modules, AD2 and N5, which are both highly enriched for astrocyte marker genes (Fig.1b), are strongly correlated with the three AD clinical biomarkers, consistent with the findings that astrocytes are involved in the sustained immune response and mitochondria-related function in AD brains (Fakhoury, 2018;Sekar et al., 2015).  (Fig. 5a). N8 is enriched with gap junction functions and is negatively correlated with all three traits, consistent with gap junction dysfunction observed in AD (Nakase and Naus, 2004). AD11 is enriched with angiogenesis functions, and genes are involved in vascular morphogenesis. It is positively correlated with CDR and Braak stage, supporting with the fact that neurovascular dysfunction plays important role in AD (Dickstein et al., 2010).
Furthermore, we also assessed the association of cell type relative proportions with clinicopathological markers, to check if the cell composition change patterns correlate with the disease progression, which shows the necessity of taking cell type deconvolution into consideration in the transcriptomic analysis with bulk tissue samples. This was achieved by computing the correlation of the cell proportion in the five major cell types (neuron, microglia, astrocytes, endothelial and oligodendrocytes) to the same three clinical biomarkers (CDR, BB score, Plaque_Mean) in MSBB dataset. We did observe high correlation of cell proportions with the three clinical biomarkers except for oligodendrocytes. More specifically, relative proportion of neuron was negatively correlated with clinical traits (Fig. 5b), consistent with the gradual neuron loss in AD progression (Uylings and de Brabander, 2002). Relative proportion of microglia and astrocytes were positively correlated with clinical traits (Fig. 5b), consistent with microgliosis and astrogliosis around amyloid plaques (Fakhoury, 2018). We also observed significant positive correlation of endothelial with AD progression, consistent with endothelial activation in AD (Grammas, 2011;Kelleher and Soiza, 2013b;Koizumi et al., 2016). No significant correlation of oligodendrocytes was observed, which indicated less association of oligodendrocytes with AD compared to the other four cell types (Fig. 5b).

Discussion
Due to the rapid development of next-generation sequencing, large cohorts of transcriptomic datasets from mixed cell types of human brains have accumulated quickly for AD research (Allen et al., 2016;De Jager et al., 2018;Wang et al., 2018). Although single cell sequencing technique start to emerge, issues such as the technical difficulty of cell dissociation and sample preparation, very high dropout rate for signal reading, and the scarcity of brain samples, made it unlikely to have large cohort of AD single cell studies available in the near future (Mathys et al., 2019). To fully take advantage of the existing bulk tissue transcriptomic datasets and identify AD associated pathways, comparative studies have been conducted between AD and normal brains (Miller et al., 2008;Zhang et al., 2013). Some studies also applied GCN network mining approach. For example, Zhang et al. applied weighted gene co-expression network analysis (WGCNA) and identified an AD associated module enriched in microglial function (Zhang et al., 2013). Miller et al. adopted WGCNA to uncover disease-relevant expression patterns for major cell types (Miller et al., 2013).
However, these studies all focused on just a single dataset. In this study, we mined frequent gene co-expression network (FGCN) on five independent transcriptomic gene expression datasets that encompass a total of 1,681 brain tissue samples from AD and normal human subjects to avoid potential system bias from a single dataset, and identified co-expressed modules that are consistently present in multiple brain regions in a large AD population. Member genes in every identified gene modules are consistently highly correlated with each other across all five datasets.
Many transcriptomic analyses have shown that there are significant upregulation of immune response genes and downregulation of neuronal function genes in AD (Miller et al., 2008;Zhang et al., 2013). It is widely accepted that microglia clusters near amyloid plaques and microgliamediated inflammation are activated in in the progression of AD (Clayton et al., 2017;Hansen et al., 2018;Katsumoto et al., 2018). However, most of such studies used bulk brain samples containing mixed cell types without taking microglia cell abundance change into account. To fully and accurately infer AD associated transcriptomic changes, it is necessary to re-examine the effect of cell population changes on gene transcriptomic levels in AD research. In order to distinguish whether the transcriptomic changes are due to changes in gene regulation or cell composition, or both, we introduced a new approach by combining the differential expression and differential coexpression analysis, together with cell type deconvolution to characterize cell specific FGCN modules and differentiate the cause of transcriptomic changes. To the best of our knowledge, this is the first study to combine multiple transcriptomic datasets of AD for frequent network mining and the first one with combined DE and DC analyses along with cell abundance estimation in AD.
The highly enriched microglial marker modules show highly differential expression but minimal co-expression change in AD compared to normal samples. Together with the increased microglia cell population from normal to AD and other analysis, we demonstrated that the overall expression changes of these modules are largely due to cell population change instead of transcriptional regulation, which is consistent with the findings by Srinivasan et al. (Srinivasan, et al. 2016).
However, this does not rule out the possibility of a subpopulation of disease-associated microglia (DAM) cell that may exist in AD brains, as previously identified in (Keren-Shaul et al., 2017, Rangaraju et al., 2018. Actually, out of the 2,077 DAM genes identified in the gene co-expression study (Rangaraju et al. 2018), eight of them are co-expressed in the core microglia gene module we identified (AD3/N6, Supplementary Table 13). However, when only examining this subset of genes, the subset's expression correlation (measured by CCI) are not altered significantly as shown in the Supplementary Figure-7. Please note that the 2,077 DAM signature genes were from APP transgenic AD mouse model samples (Rangaraju et al., 2018), which may be more similar in terms of disease mechanism to early onset AD rather than late onset AD analyzed in this work.
Regardless of the origin, their correlations remain unchanged, which indicates that, just like the rest of the microglia module genes, this subset of genes' relatively high DE can be explained by the increasing of microglia proportion.
In the recent published single cell transcriptomic analysis of 48 AD human samples (Mathys et al., 2019), a specific subtype of microglia is shown to have strong correlation between its cell abundance and the pathological phenotype. Interestingly, among the 77 signature genes for this cell subtype, 14 of them overlap with our combined microglia modules which contains 42 genes (Supplementary Table 13). This significant overlap (Fisher exact test p=4.38E-24) is consistent with our notion that these signature gene expression changes largely reflect cell population change instead of gene regulation change. The strong correlation between the 14 genes for the specific subtype with general microglia genes in our analysis suggests that the proportion of this subtype in general microglia population may remain consistent.
Please note that, the microglia core gene module does not include any cytokine gene, which were known to be upregulated in the neuroinflammation of AD brains (Morimoto et al., 2011). We do not rule out either the possibility that certain individual microglia genes, such as TREM2 or TYROBP, are up-regulated in AD brains in some samples as observed in (Celarain et al., 2016;Haure-Mirande et al., 2017;Ma et al., 2015). However, in the five datasets we have analyzed, the tightly co-expresed core microglia module as a whole, its expression change is largely due to the microglia proliferation in affected AD brains.
For the genes in AD1 and N1 modules, although they share many common genes, a lot of hub genes have changed. Several transcription factors that are differential expressed in AD may account for such change (Table-3), which differential expressions were further confirmed in additional four AD transcriptomic datasets from separate studies (Table-1 and 3). To check if they indeed regulate the genes in those two modules, we further analyzed BCL6 and STAT3 as the examples since they the most frequently differentially expressed TFs. We found these two TFs bind to a large number of genes from AD1 and N1 as indicated from ChIP-seq experiments, which provided strong evidences that they serve as the upstream regulators for those genes' expression.
Besides the experimental evidences from transgenic mice and that Bcl6 and Stat3 regulate M1 genes, as mentioned in the Results. We also found 21 of the AD1 genes are Stat3  Although we observed overall decreased expression of neuron modules which was consistent with the neuron loss in AD brains (Srinivasan et al, 2016), to our surprise, we also detected an increment in overall co-expression level of the neuron modules in AD brains across all five datasets, which is a new discovery. This increased correlation in the neuron module may be linked to the neuronal hyperactivity in AD previously observed in both human and mouse models (Stargardt et al., 2015). Evidence from both imaging and molecular studies revealed that neuronal hyperactivity, especially glutamate signaling, occurs in the cortex and hippocampus during or even before MCI, which modulates Aβ levels and triggers synaptic dysfunction in AD (Pomara and Bruno, 2018;Stargardt et al., 2015;Zott et al., 2019). Multiple experiments in mouse model of AD also showed clustering of hyperactive neurons near amyloid plaques and the reduction of neuronal hyperactivity can prevent further build-up of amyloid plaques and synapse loss (Busche et al., 2008(Busche et al., , 2012Lerdkrai et al., 2018;Nuriel et al., 2017). Since many genes that are uniquely presented in the AD neuron module are also involved in glutamate signaling and synaptic activity (see Supplementary   Table 2), we speculate that the strong co-expression among these genes as well as the upregulated TFs may imply some form of hyperactivity still remained for neurons in symptomatic AD brains.
Such signals could be previously overlooked due to the overall decreased neuron population in late AD stage (previous studies indicated such hyperactivity only present in the asymptomatic phase of AD (Bossers, et al. 2010 ;Lau, et al. 2013)). In the future, we plan to further quantify the impact of neuron population change on expression signals and mitigate its effect to discover real expressional changes due to regulation.
Our results show that modules enriched with neuron gene markers are under-expressed while modules enriched with microglial, astrocytes, endothelial gene makers are over-expressed in AD brains. This is consistent with recent single-cell transcriptomic study on AD and normal brain samples that both up-and down-regulated DE genes were highly cell-type specific, with most genes perturbed in neuron or a specific glia cell type (Mathys et al., 2019). While previous analysis of AD bulk transcriptomic data may have overlooked the effect of cell type or cell population change, our study showed that cell type relative abundance changes actually play a major role in microglia and neuron gene differential expression in AD patient brains.
As for the other three cell types, our finding showed that oligodendrocyte cell is not ubiquitously changed across five datasets, which is a novel discovery. However, we detected significant cell abundance changes for astrocytes and endothelia cells in AD samples as well, which suggests that differential gene expression analysis on bulk tissue transcriptomic data also need to adjust for cell population change when genes are associated to those cell types. Such adjustment will help researchers pinpoint the bona fide mRNA regulatory changes associated with disease initiation and progression.
Due to the rarity of human brain samples and the difficulty in isolate specific cell types from the brain tissue, single cell transcriptomic analysis is still scarce in this field and most transcriptomic analysis are still using bulk tissue brain samples. With comparative analysis being the major type of transcriptomic analysis, the influence of signal changes resulted from cell composition change cannot be overestimated and our findings address the need for a clarification for this issue.

Conclusion
Through FGCN module mining, we identified cell type specific modules that are differentially expressed and/or differentially co-expressed in five large AD cohort studies. Applying further computational analyses on these gene modules, we demonstrated that the overall increased expression of the core microglia module can be well explained by the increased microglia cell population observed in AD brains instead of bona fide upregulation of microglia gene expression.
We were also able to untangle the overall decreased expression in the neuron modules of AD to identify both the neuron cell loss and expression regulation of neuron associated pathways, both of which are a novel discoveries. The overall increased co-expression level of the neuron module may result from the neuron hyperactivity reported previously. Further network hub gene investigation confirmed the co-expression changes, and we identified several differentially expressed transcription factors (BCL6 and STAT3) in multiple studies of AD brains that are potential key regulators for such changes. Furthermore, although not a focus in this study, we observed significant changes in the expression and co-expression levels for the two astrocyte modules, which may also result from a combinatory effect from cell population and the astrocyte gene activation in AD brains. More importantly, these two modules' expression highly correlate with the three universally used AD clinical markers, which indicates that astrocytes genes can serve as good biomarkers for disease progression. This work also demonstrated that the combination of DE, DC and cell type deconvolution analyses not only provides a powerful approach to delineate the origin of the transcriptomic change in bulk sample data, but also leads to a deeper understanding of the roles of the genes in the disease progress.

Ethics approval and consent to participate
The work composed of only publicly available data, therefore no ethnic approval or consent is needed.

Consent for publication
All the authors consent for publication of this work.

Availability of data and materials
All major analysis results are attached in submission and available for public. Further request can be fulfilled by contacting corresponding authors.  There is no color match between panels a and b.    are displayed respectively. The modules are ranked by the mean scc value and marked by corresponding cell types. ast: astrocytes, mic: microglia, neu: neuron. b. Correlation of brain cell type relative abundance with clinical traits. scc and p value between estimated brain cell type relative proportions and each of CDR, Braak & Braak score, PlaqueMean are displayed respectively. The brain cell types are ranked by the mean scc value. ast: astrocytes, mic: microglia, neu: neuron, end: endothelia, oli: oligodendrocytes.