ARHGDIB Plays a Novel Role in the Braak Stages of Alzheimer's Diseases via the Immune Response Mediated by Microglia

Keping Chai (  ckpzjyy@126.com ) Zhejiang Hospital https://orcid.org/0000-0002-6307-9102 Jiwei Liang Huazhong University of Science and Technology Xiaolin Zhang Huazhong University of Science and Technology Huaqian Gu Zhejiang Hospital Panlong Cao Zhejiang Hospital Weiping Ye Zhejiang Hospital Huitao Tang Zhejiang Hospital Jingjing Liu Zhejiang Hospital Shufang Chen Zhejiang Hospital Feng Wan Huazhong University of Science and Technology Gang Logan Liu Huazhong University of Science and Technology Daojiang Shen Zhejiang Hospital

Methods: The transcriptional pro ling data of the human entorhinal cortex, temporal cortex (TC) and frontal cortex (FC) derived from individuals ranging from Braak stages 0 to VI was obtained from the GEO database (GSE131617) in NCBI. For WGCNA analysis, rst, we detected consensus modules of different brain regions and illustrated the relationships of these modules. The second, we conducted single sample gene set enrichment analysis (ssGSEA) to obtain the ecrichment scores of samples and screened the enrichment genes through the best subset regression and the random forest algorithm. Third, we analyzed the relationships between consensus modules (EC-FC, EC-TC, and FC-TC) and ssGSEA enrichment scores. Next, the overlapping genes between differentially expressed genes (DEG, between Braak stage 0 and Braak stage I-VI) and genes of interest in the module were discovered. Metascape analysis were conducted to determine the function of overlapping genes, and Random Forest classi er was preformed to obtain the most signi cant genes from the overlapping genes. The disclosed signi cant genes were nally identi ed through network analysis.
Results: Preservation ((PreservEC, FC) is 0.91, (PreservEC, TC) is 0.95, (PreservTC, FC) is 0.9) of consensus modules and connectivity of different brain regions illustrated very high correlation preservation of all pairs of eigengenes across the two networks. We found that the oxidative damage pathway plays a vital role in classifying the Braak stages via ssGSEA, random forest and best subset algorithm, the imp is 0.57. Through WGCNA analysis, the black module is found highly positively correlated with oxidative damage, which is involved in immune response mainly. Through step by step ltering of the module genes by overlapping with DEGs and Random Forest classi er analysis, we found that LYN, CD68, LAPTM5, IFI30, PI3KAP1, HCK and ARHGDIB were co-expressed and highly correlated with oxidative damage and immune response.
Conclusion: The co-expression network has strong similarities in different brain regions undergoing AD. Molecules such as IFI30, LYN, CD68, PTPRC, HCK, LAPTM5, FCERIG, and ARHGDIB play an important role in the early stages of AD through the in ammatory response mediated by microglia.

Page 4/17
Aberrant deposits of neuro brillary tangles (NFT), the mainly character of Alzheimer's disease (AD) [1], which is highly related to cognitive impairment. A lot of evidence shows that at different stages of AD, the distribution region of NFT in the brain is also different. For example, the entorhinal cortex (EC) is the area where NFT deposits occurs rstly in AD [2]. However, the pathological mechanism of its formation is still unclear. Several hypotheses, such as oxidative damage, oxidative stress, insulin resist, apoE, neuroin ammation and other theories were established [3,4]. Exploring the gene expression patterns of different brain regions, especially EC, may better help understand the mechanism of NFT formation.
Weighted gene co-expression network analysis (WGCNA) is a biology algorithm used to describe the correlation of clinical characters and gene expression based on the microarray data [5]. WGCNA can be used for clustering gene with highly correlated expression, for relating the modules to phenotypes to get the most phenotypic trait-related module, and for summarizing these co-expressed gene clusters by identi cation of the module eigengene or hub genes. Random forest (RF) is a more advanced machine learning algorithm based on decision tree [6]. Like other decision trees, random forests can be used for both regression and classi cation.
In this study, we performed ssGSEA, machine learning and WGCNA analysis on publicly accessible transcriptome data obtained from human cortex of individuals at different Braak stages. Evaluating the ssGSEA results, we found that the oxidative damage pathway play a vital role in classifying the Braak stages via random forest and best subset algorithm, the imp is 0.57. Through calculating the correlation coe cients between the modules and oxidative damage pathway, we obtained a module of interest. Next, we disclosed the overlapping genes between differentially expressed genes (DEG, between Braak stage 0 and Braak stage I-VI) and genes of interest in the module. Using these overlapping genes, we conducted metascape analysis and further identi ed the central players within the module through network analysis. We concluded that C1QA, C1QB, LYN, CD68, LAPTM5, IFI30, PI3KAP1, HCK and ARHGDIB genes are signi cantly associated with oxidative damage and immune response, which may be novel biomarkers involved in AD.

Methods And Materials
Data acquisition and preprocessing The data used in this paper was obtained from the GEO database in NCBI (Gene Expression Omnibus, http://www.ncbi.nlm.gov/geo), and the data entry number is GSE131617 [7]. The platform is Affymetrix downloaded and the expression matrix was obtained, and data ltering was performed before WGCNA analysis. For data ltering, rst, neuropathological diagnosis between minimal senile change and AD was performed. The second, the gene type of APOE is 3*3. 47 samples in the dataset were kept. Probe without corresponding annotation information were removed. There were about 13629 genes in the dataset.
Single sample gene set enrichment analysis (ssGSEA) ssGSEA is an implementation method proposed for single sample GSEA [8,9]. The difference between GSEA and ssGSEA is that ssGSEA no needs to prepare expression matrix le. The functions of gene set were acquired from Molecular Signatures Database (MSigDB) according to the describes in the review, including aging, insulin receptor pathways, oxidative stress, oxidative damage, NFT, Nicotine activity on dopaminergic neuron, etc. Performances of pathway in the gene set were quanti ed by ssGSEA algorithm (R package 'gsva') based on transcriptome pro ling data and pathway gene sets.
Application of best subset regression to nd the best subset of ssGSEA pathway The entorhinal cortical samples were grouped into individuals Braak stage 0 and Braak stages I-VI.
Inputting the ssGSEA scores into best subset regression model via leaps package to predict which group the samples belong to, the best number of features as the input for next analysis.
Application of Random Forest algorithm to nd the most important pathway and genes related to Braak stages The entorhinal cortical samples were grouped into individuals of Braak stage 0 or individuals of Braak stages I-VI. Inputting the overlapping genes counts and ssGSEA enrich scores into random forest classi er model via Boruta package to predict which group the samples belong to and the most important overlapping genes and to identify ssGSEA pathway for the most accurate model for grouping.

Construction of weighted gene co-expression network and identi cation of signi cant modules
Data was processed using R 3.4.2 software. To ensure that the results of network construction are reliable, abnormal samples were removed. Then, the weighted gene co-expression network was constructed by WGCNA package based on R 3.4.2. First, the Pearson correlation coe cient was calculated to assess the similarity of the gene expression pro les. The second, the correlation coe cients between genes were weighted by a power function to obtain a scale-free network. A gene module is a cluster of densely interconnected genes in terms of co-expression. Then, hierarchical cluster was used to identify gene modules and different modules were represented by different colors. Dynamic treecut method was used to identify different modules, the adjacency matrix was converted to a topology overlay matrix (TOM) and modules were detected by cluster analysis during module selection.

Correlation analysis of gene modules with clinical phenotype
To detect the associations of modules and clinical phenotype (ssGSEA scores), rstly, the clinical phenotype data and gene expression data were correlated using the match function. Secondly, the associations of the module eigengene (ME) and the clinicals phenotype were calculated by Pearson's correlation analysis. Modules showing signi cant association to oxidative damage pathway were obtained. At last, to further con rm the modules with signi cant correlation to oxidative damage, the correlation coe cient between the module membership (gene expression level) with gene signi cance (GS, for assessing the association of genes with phenotypes) was calculated using the labeleHeatmap function, and the p values were obtained.
Finding the overlapping genes between the differentially expressed genes (DEG, between Braak stage 0 and Braak stages I-VI) and genes of interest in the module veri ed by WGCNA The entorhinal cortical samples were grouped into individuals at Braak stages 0 and individuals at Braak stages I-VI and Limma packages were preformed to nd the DEG [10,11]. Next, the overlapping genes between downregulated DEG and genes of interest in the module were discovered by using online veen tools (http://bioinformatics.psb.ugent.be/webtools/Venn/).

Metascape analyses, identi cation of Hub genes and protein-protein interaction (PPI) analysis
For the obtained overlapping genes, functional enrichment of Gene Ontology (GO) and KEGG pathways analyses were performed using Metascap (https://metascape.org) [12]. Log P between -23 and -8 was considered to be signi cant enrichment. These enrichment results were also analyzed using Cytoscape for identi cation of important pathway [13]. The identi ed hub genes were further con rmed and analyzed using String network constructed by the online database String(http://string-db.org) [14].
Exploring the cellular distribution of the identi ed genes By using the Cell marker database (http://biocc.hrbmu.edu.cn/CellMarker/search.jsp) , the cellular distribution of the identi ed important genes was further explored.

Results
The identi cation consensus modules across different cortical regions Before WGCNA, the genes detected in GSE131617 were ltered according to the ltering procedure described in Method, and 13629 genes were obtained. Then the microarray data of 47 samples in each cortical region were read by R for Hierarchical clustering ( Supplementary Fig. 1a). The consensus network of scale independence and mean connectivity analysis showed that when the weighted value equals to 14, the average degree of connectivity was close to 0, and scale independence was greater than 0.9, so the weighted value was set to 14 ( Supplementary Fig. 1b). WGCNA was performed to identify consensus modules. Comparison between EC set-speci c modules and EC-FC consensus modules of the global coexpression network indicates that most EC modules are preserved in FC ( Figure.1.a). The strong overlap of the corresponding gene modules showed the similarity of cluster pattern in EC and FC region. Figure.1.b-g, Figure.S1c, Figure.S2 and Figure.S3 showed that the overall preservation of the three networks is positive correlation. The mean density of the three networks exceeds 0.9 in all 3 cortical regions, demonstrating that the overall structures of the co-expression networks are similar for the three cortical regions. These results indicated that the differences of these cortical regions may exist in the particular genes within the consensus network.
ssGSEA functional enrichment analyses and key pathway identi cation and validation help to nd the module of interest veri ed in WGCNA analysis in EC.
In above results, we found overall structures of the co-expression networks are similar for the three cortical regions. In addition, An abundance of studies have shown that in the Braak stages I-II, aberrant deposits of NFT rst appears in the entorhinal cortex, which is signi cant for nding the potential biomarkers and therapeutic targets of AD.
To explore the signaling pathways most related to Alzheimer's disease, rst, the ssGSEA analysis was performed ( Figure.2.a). The genset of pathway related to Alzheimer's disease can be seen in TableS1. The second, best subset regression was conducted to identify the representative subset ( Figure.2.b, c). From the results, we can see that the feature number of best subsets is 8, and GO-NFT, HP-NFT, oxidative damage and axon degeneration pathway are saved in the best subset. Next, we performed the random forest algorithm based on the sklearn and boruta packages to analyze the best subset of data to nd the most important features, as shown in Figure.2.d and TableS2, oxidative damage pathway was found to be the most important feature.
To identify modules which is most signi cantly associated with oxidative damage pathway in EC, the Pearson's correlation coe cient between the module and oxidative damage was calculated. The highest positive association in the module trait relationship was found between black module and oxidative damage score (cor = 0.88, p <0.001, Figure.2e), we also found that the black module has a high correlation with the aging and clearance pathway(cor=0.69, 0.56, p <0.001, Figure.2e). Thus, the black module was selected as module of interest in subsequent analysis.

Identifying hub genes in black module
First, to nd the DEG between Braak stage 0 and Braak stages I-VI, the EC samples were grouped into individuals at Braak stage 0 and Braak stages I-VI and Limma packages were performed. About 10% of the genes were signi cantly changed (p <0.05, Figure. 3a). Next, we performed overlap analysis between DEGs and Top30 genes in black module by online veen tool, we found 26 genes which were in DEGs also in black module ( Figure. 3b-d). These genes highly related to oxidative damage, suggesting that they might play an important role in oxidative damage -AD.

Identifying hub genes functional annotation
The above identi ed overlapping genes were subjected to GO functional and KEGG pathway enrichment analysis. Biological processes of overlapping genes were found to focus on regulation of in ammatory response and leukocyte degranulation. Molecular functions of overlapping genes were found to focus on IgE binding, non-membrance spanning protein tyrosine kinase activity and phosphotransferase activity( Figure. 4, Figure.S4 ).

Identi cation of the most signi cant genes and network construction
To identify the most important genes related to oxidative damage, the overlapping genes were further ltered by RF classi cation. Gene counts were input into RF classi er model, and the unimportant genes, such as C1QA, C1QB, CTSC, SLC2A5, UCP2 and others were removed ( Figure. 5a,TableS3). To ascertain the signi cance of genes and analyze the network in the corresponding modules, the PPI maps were constructed via String (Figure. 5b). Hub genes in network, including PTPRC, LYN, LAPYM5, HCK, IFI30, ARHGDIB and PIK3AP1 was constructed. In the cell marker database, we found that the distribution of above genes in brain cells is very similar, mainly in microglia cell( Figure. 5c).

Discussion
NFT is the major pathological character of neurodegenerative diseases, such as PD (Parkinson's disease)/AD [15]. Exploring the mechanism of NFT formation is extremely important for discovering the therapeutic targets in these diseases. In this study, we performed WGCNA, ssGSEA and machine learning analysis on the dataset GSE131617 which includes 71 samples from individuals at Braak stages between 0 and I-VI; such data from multiple samples based on the different brain regions (EC,FC,TC) is a good candidate for WGCNA analysis. First, consensus modules between different brain regions were constructed, 7 consensus modules were identi ed between EC and FC. Figure.1.b-g, FigureS1c, FigureS2 and FigureS3 showed that the overall preservation of the three networks is positive correlation. The mean density of the three networks exceeds 0.9 in all 3 consensus modules, demonstrating that the overall structures of the co-expression networks are similar for the 3 cortical regions. However, the purple, pink, greenyellowand magenta module of EC were not recognized in the consensus module (EC, FC), indicating that the difference between the two regions is related to these modules. Furthermore, the black and red modules in EC that are most related to oxidative damage and clearance pathway have not been recognized in the consensus module identi ed by EC and TC (FigureS2, Figure.2.e). These showed that TC is quite different from EC in the signal pathway of oxidative damage and clearance.
Many studies have shown that the EC is the region where NFT deposits occurs rstly [16]. Therefore, studying the changes in the transcription level of the EC in the neurodegenerative diseases is extremely important for discovering early biomarkers and therapeutic targets of these diseases. In this study, when we performed ssGSEA and random forest analysis on the dataset of EC samples, we found that the unexpected oxidative damage signaling pathway is most important when distinguishing between Braak stage 0 and Braak stages I-VI rather than the signaling pathway related to NFT( Figure.2.a-d). This indicates that among the important basis of Braak stages, the formation of NFT is more likely due to changes in the expression level of genes related to oxidative stress pathway, rather than the NFT signaling pathway. When we analyzed the overlapping genes in the black module which is most related to oxidative damage and the DEG, we found that these genes are not only related to oxidative damage, but also related to immune response and microglia-mediated in ammation ( Figure.2e, Figure.3, Figure.4). In order to identify genes that are most intensively related to Braak stages, we further used one of the machine learning algorithms, Random Forest, and inputted the expression matrix of the overlapping 26 genes as feature values into the model for training, and nally screened out 9 key genes( Figure.5.a,  TableS3). When analyzing these 9 molecules, we found that most of them are expressed in microglia ( Figure.5.c), which further indicated that microglia plays an important role in the Braak stages (0 VS I-VI).
It has been reported that activated microglia can induce the formation of NFT [17], several hypotheses can explain how the activated microglia mediates the formation of NFT, such as complement pathway, IL-CDK5 pathway, exosome secretion, etc [18][19][20][21]. However, it needs further discussion that how these molecules such as LYN, HCK and PTPRC, which distributed in the microglia, promote the formation of NFT. LYN and HCK as Non-receptor tyrosine-protein kinase can combine with NLRP3 which involved in the phosphorylation of tau and the formation of NFT to promote the release of IL1B from microglia [22][23][24][25]. In the co-expression network, PTPRC and LAPTM5 were identi ed as hub genes too. PTPRC is not only an important regulator of T cell and B cell antigen receptor signal transduction, but also an enzyme that dephosphorylates LYN. It has been reported that LAPTM5 can not only regulate the production of proin ammatory cytokines in macrophages, but also regulate the antigen receptor signal transduction of T cell and B cell [26]. And there is a lot of data showing that LAPTM5 and PTPRC are not only co-expressed in AD/PD ( Figure.5.b), but also in systemic lupus erythematosus, lung cancer and other diseases [27][28][29]. This indicated that LAPTM5, PTPRC may play a similar role in the phosphorylation of LYN. Moreover, in this study, we found that the decrease of the expression of these co-expressed genes at Braak stage I-VI which is negatively correlated with the degree of NFT needs further discussion. It has been reported that the expression of LYN in activated microglia is less than that of homeostasis microglia [30]. This indicate that LYN plays a role in activated microglia, and the decrease of PTPRC and LAPTM5 may lead to an increase of phosphorylated LYN, so that it can promote the release of in ammatory factors.
In this study, we also found that IFI30 and FCERIG in the co-expression network are also distributed in microglia ( Figure.5.b,c). It has been reported that both of them are highly expressed in microglia around , which may imply that the two of them are involved in the function of clearance ( Figure.2.e). However, in this study we found that their expression in Braak stage I-VI decreased. How their reduction in microglia promotes the formation of NFT still needs further study.
To our surprise, ARHGDIB was found mainly co-expressed with LAPTM5 and PTPRC in the co-expression network. Its related pathways are involved in GPCR signaling pathway, apoptosis and survival Caspase cascade [32]. Through the network analysis ( Figure.5.b,c), we speculated that it may has similar functions with LAPTM5 and PTPRC. The decrease of the expression of ARHGDIB may also play a role in the formation of NFT. Further studies are still needed to reveal the function of ARHGDIB in microglia.

Conclusions
Last but not least, through machine learning and WGCNA on microarray data from EC, FC and TC, we not only revealed the similarities and differences in the co-expression network between the above 3 brain regions, but also uncovered some novel molecules, such as HCK, LAPTM5, IFI30 and ARHGDIB, which mediate the formation of NFT in the microglia of EC.      Identifying the most important genes via RF and the cellular distribution of the important genes in the brain. a) Random Forest algorithm result. The blue box plot corresponds to the minimum, average, and maximum Z scores of a color attribute. The red, yellow, and green boxes represent the Z scores of rejected, tentative, and con rmed genes, respectively. b)The PPI network of important genes via String. c) The heatmap shows the selected genes distribution in different cell types.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.