Exploring biomarkers and therapeutic targets for ischemic stroke through integrated microarray data analysis

Miao Lv Guangxi Medical University Wanting He Guangxi Medical University Tian Liang Guangxi Medical University Jialei Yang Guangxi Medical University Xiaolan Huang Guangxi Medical University Shengying Liu Guangxi Medical University Xueying Liang Guangxi Medical University Jianxiong Long Guangxi Medical University Li Su (  suli2018@hotmail.com ) Guangxi Medical University https://orcid.org/0000-0002-1435-9669


Introduction
Stroke was the third leading cause of disability-adjusted life years globally, and the second leading cause of disability-adjusted life years for the age groups over 50 years old (Collaborators. 2020). Stroke was the leading cause of China's 2017 life lost years and disability-adjusted life years (Zhou et al. 2019). Among them, 85% of strokes were caused by cerebral ischemia, which were caused by obstruction of blood circulation in the brain. There are an estimated 11 million stroke cases and 1.1 million stroke-related deaths in China each year. The report of China's National Stroke Screening Survey conducted from 2013 to 2014 showed that compared with the values reported in the past ten years, the prevalence and morbidity of stroke have increased signi cantly (Wu et al. 2019). The treatment strategies for IS are limited. The only approved drug therapy to restore blood ow is intravenous injection of tissue-type plasminogen (tPA) (Group. 1995), However, it must be treated within 4.5 hours of the onset of symptoms. Therefore, the treatment time window is quite narrow, and many patients cannot receive treatment in time (Hacke et al. 2008; Henderson et al. 2018). Therefore, we still need to nd reliable diagnostic biomarkers and therapeutic targets for IS.
Microarray technology makes genome-wide research possible, can perform genome functional annotations, and helps to explore the potential molecular mechanisms of many diseases ( Ayyildiz and Piazza 2019). weighted correlation network analysis (WGCNA), a systematic biological method, is increasingly being used to analyze gene expression data. WGCNA analysis can identify the interaction between different expression modules, and nd highly related gene modules, which promotes gene screening based on the construction of the network and can be used to identify candidate biomarkers or therapeutic targets (Langfelder et al. 2008). Recent studies have shown that hub genes related to the occurrence and development of diseases, candidate biomarkers and therapeutic drugs have been identi ed through bioinformatics technology. Rui  In this study, we identi ed genes associated with IS through the integration of WGCNA and gene differential expression analysis. LASSO regression screened hub genes, and then the TF-gene network and miRNA-TF regulatory network were constructed through the NetworkAnalyst database. Finally, we predicted IS-targeted therapeutic drugs and small molecule compounds through DGIdb and CMap databases.

Data collection and Preprocessing
Microarray datasets GSE58294 and GSE16561 were downloaded from the (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The onset of 69 ischemic stroke samples includes 3 time points (3h, 5h, and 24h). In this study, ischemic stroke samples in the time point of 24h were selected for analysis. The data set GSE16561 includes 24 control samples and 39 ischemic stroke samples. The details of two microarray datasets were listed in Table 1. Normalization (limma package), log2 transformation and gene name conversion were performed for each data set using Rstudio4.0.5 software.

Construction of weighted correlation network analysis WGCNA)
The "sva" package was used to remove the batch effect on the combined data sets of GSE16561 and GSE58294. After detecting and removing the missing values of the gene expression matrix, a weighted co-expression network was constructed through the WGCNA package. WGCNA's pick Soft Threshold function was used to calculate soft-threshold power. The correlation of module genes between the case group and control group was limited to be greater than 0.5 to screen and obtain module characteristic genes related to ischemic stroke. In addition, we used the "clusterPro ler" and "enrichplot" packages to perform functional enrichment analysis (Gene Ontology, GO and Kyoto Encyclopedia of Genes and Genomes, KEGG).
Identi cation of Differentially expressed genes (DEGs) The standard of differential expression analysis was |log FC|>1, and the P value<0.05. Genes with log FC>1 and the p value<0.05 were up-regulated, while log FC<-1 and the p value< 0.05 were down-regulated.

Identi cation of overlapping genes
Overlapping genes were selected between modular genes by the weighted coexpression network and DEGs of GSE16561 and GSE58294. An online site (http://www.ehbio.com/test/venn/#/) was used to plot Venn diagrams.

Screening of hub genes
Overlap genes were further screened by LASSO regression, which was performed through the "glmnet" package. Then, the "ggstatsplot" package and the "ggpubr" package were used to make a violin plot. Finally, the ROC curve of the hub gene was made by the "pROC" package.
TF-gene interaction network and TF-miRNA regulatory network Building a network of TF target genes and genetic data from ENCODE ChIP-seq (https://www.encodeproject.org/) database. A comprehensive analysis of TF-gene interaction was conducted to evaluate the effects of TF on the expression and functional pathways of hub genes (Ye et al. 2019). The NetworkAnalyst (https://www networkAnalyx.ca /) was used to visualize the interaction network between TF genes and hub genes. Relationships in TF-miRNA co-regulatory networks were collected from the RegNetwork library, which contained transcriptional and post-transcriptional regulatory relationships, and interactions between TF/ miRNAs and their targets could be obtained from this database (Liu et al. 2015). The NetworkAnalyst website was also used to visualize the TF-miRNA coordinated regulation network.

TRRUST database
The TRRUST database (https://www.grnpedia.org/trrust/) is currently the most comprehensive human-mouse TF target interaction database based on manual management. Researchers can search for key TF genes by submitting target genes (Han et al. 2018).

HPA database
To determine the expression and localization of hub genes in normal brain and blood, we downloaded immunohistochemical slices of hub genes in the brain and RNA expression data of the hub genes in the brain and blood from the HPA database (https://www.proteinatlas.org/).

Identi cation of candidate drugs and small molecule compounds
In this study, the Drug-Gene Interaction Database (DGIdb, http://www.dgidb.org) was used to explore effective targeting drugs for hub genes, which parameter select approval lter conditions, and the default database. The Touchtone function of Connection map (CMap, https://clue.io/) database was used to search for small molecule compounds related to the input hub genes, and small molecule compounds with a gene selection score of less than -90 play a reversing role in gene upregulation. The gene selection score of small molecule compounds is greater than 90, promoting the protective effect of genes (Musa et al. 2018).

WGCNA gene modules results
After detecting the missing values of 12694 gene expression matrices, 9520 gene expression matrices were obtained, which were used to construct a weighted co-expression network. When 0.85 was used as the correlation coe cient threshold, we chose a soft threshold of 9 (Fig.1a). A total of 17 coexpression modules (Except for the gray module) were identi ed by WGCNA, the size of which ranged from 91 to 791 genes. Genes with different characteristics were divided into different modules to obtain gene cluster trees and different module colors (Fig. 1b). The heat map showed the relationship of module's characteristic genes between the case group and the control group with the correlation coe cients and signi cance P-values of each module (Fig. 1c). A summary of the signi cance of ischemic stroke-related genes in each module was shown in Fig. 1d.

Identi cation of DEGs
Genes with log FC>1 and the p value<0.05 were up-regulated, while log FC<-1 and the p value< 0.05 were down-regulated. A total of 11 differential expression genes (DEGs) were identi ed in the data set GSE16561, including 6 up-regulated genes and 5 down-regulated genes. At the same time, a total of 304 DEGs were identi ed in the data set GSE58294, including 157 up-regulated genes and 147 down-regulated genes. The DEGs of the two data sets were merged to obtain 313 IS-related DEGs.

Functional enrichment of module genes associated with IS
The modules with high correlation with ischemic stroke were screened from 18 modules (except gray). The correlation between the screening module and ischemic stroke was greater than 0.5, three modules with high correlation with ischemic stroke were obtained, namely green (355), brown (449) and red (306). The three modules were merged, and 1110 module genes were obtained. Meanwhile, 1100 modular genes were included for GO and KEGG function enrichment analysis (Fig. 2a, b). GO enrichment analysis showed that biologic processes (BP) were mainly enriched in the T cell activation, neutrophil degranulation and neutrophil activation involved in immune response. For cellular component (CC), modular genes were mainly enriched in the organellar ribosome, mitochondrial ribosome and mitochondrial protein complex. For the molecular function (MF), modular genes were mostly enriched in immune receptor activity, ribonucleoprotein complex binding and protein N-terminus binding. KEGG pathway analyses were mainly enriched in the NF-κB signaling pathway, apoptosis, lipid and atherosclerosis pathway.

Integrated analysis of WGCNA and DEGs
Integration analysis on the selected three modular genes and IS-related DEGs were performed. A total of 52 genes were obtained by taking the intersections of 313 IS-related module genes and 1100 module genes (Fig. 3a).
Screening of hub genes 52 overlapping genes were further screened in the merged data (control=47, ischemic stroke=62) by LASSO regression analysis, and 10 hub genes were obtained Fig. 4a). The violin map showed that ARG1, LY96, ABCA1, SLC22A4, CD163 were up-regulated, while TPM2, SLC25A42, ID3, FAM102A and CD79B were down-regulated, and P values were all less than 0.05 (Fig. 4b). Fig. 4c showed the ROC curve of 10 hub genes, Values of sensitivity, speci city and AUC (area under the curve) were displayed in Table S1. Among them, FAM102A had the highest diagnostic value and the value of its AUC, sensitivity and speci city were 0.973576, 0.919 and 0.936, respectively. LY96 had the lowest diagnostic value, with an AUC of 0.841, a sensitivity of 0.726, and a speci city of 0.894.
Construction of TF-gene interaction network and TF-miRNA regulatory network Ten hub genes were submitted to the NetworkAnalyst database for analysis, and network for TF-gene interaction (Fig. 5a) and TF-miRNA regulatory network (Fig. 5b) were obtained. In the TF-gene interaction network, ID3 had the highest interaction rate with other TFs, up to 75 degrees. Among the regulatory factors, TFDP1 had signi cant interaction, up to 4 degrees. In the TF-miRNA regulatory network, ABCA1 had a high interaction rate with miRNAs and TFs with a degree of 92. Among the regulatory factors of miRNA and TF genes, transcription factors SP1, hsa-miR-340, hsa-miR-27a, hsa-miR-27b and hsa-miR-96 had signi cant interactions with hub genes. Meanwhile, SP1 (P=0.0246) was the key transcription factor for the six hub genes ABCA1, ARG1, CD163, ID3, LY96, and SLC22A4 in the TRRUST database.

HPA database analysis
The ten hub genes were submitted to the HPA database, and the immunohistochemical slices of the eight genes in the brain (Fig. 5c) and the gene expression levels in the brain and blood were obtained (Fig. 5d). Among them, the expression levels of ABCA1, SLC25A42, SLC22A4, ID3 and FAM102A in the brain were higher than those in the blood.

Potential drug analysis of hub genes
The 5 hub genes including ABCA1, SLC25A42, SLC22A4, ID3 and FAM102A were uploaded to the DGIdb Database, and seven targeted drugs with hub genes were found. As shown in Table 2, there were six therapeutic drugs targeted with ABCA1, namely simvastatin, probucol, feno brate, cyclosporine, pravastatin and atorvastatin. Imatinib was a therapeutic drug targeted with SLC22A4. We also submitted these 5 hub genes to the CMap database for analysis. Four small molecule compounds targeted with ID3 and SLC22A4 were listed in Table 3. There are three small molecule compounds targeted with ID3, namely 2', 5'-dideoxyadenosine, DR-2313 ( Fig. S1) and otenzepad. Dubinidine was a small molecule compound targeted with SLC22A4.

Discussion
In this study, 10 hub genes related to the occurrence of IS were identi ed through WGCNA and LASSO regression analysis based on the merged datasets of GSE16561 and GSE58294. Values of the AUC for the 10 hub genes ranged from 0.841 to 0.974. ID3 had the highest interaction rate with other TFgenes in the TF-gene interaction network. ABCA1, transcription factors SP1 and hsa-miR-340 had a high interaction rate in the miRNA-TF regulatory network. HPA database analysis showed that ABCA1, SLC25A42, SLC22A4, ID3 and FAM102A had higher gene expression levels in the cerebral cortex than those in blood. We also predicted seven therapeutic drugs targeted with ABCA1 and SLC22A4, and four small molecule compounds targeted with ID3 and SLC22A4. KEGG analysis showed that modular genes were mainly enriched in 13 pathways. Among them, the NF-κB signaling pathway, apoptosis, lipid and atherosclerosis pathways may be related to the pathogenesis and development of IS. As for the NF-κB signaling pathway, evidence showed that ischemic preconditioning could activate the NF-κB signaling pathway, and enhance the expression of NF-κB to protein in neurons after lethal ischemiareperfusion Overexpression of the apoptotic protein bcl-2 can reduce cerebral infarction tissue in the middle cerebral artery occlusion model (Martinou et al. 1994). By preventing injury-induced down-regulation of the apoptotic protein bcl-2 expression, estradiol had a profound protective effect on cerebral ischemia caused by permanent middle cerebral artery occlusion (Dubal et al. 1999). The overexpression of a variety of pro-apoptotic and anti-apoptotic proteins in the penumbra after ischemic stroke may be used as potential therapeutic targets for stroke (Uzdensky 2019). For the lipid and atherosclerosis pathway, lipids accumulate on the arterial wall in the form of fatty deposits and cholesterol plaques. It was the main component of early atherosclerotic lesions (Kauw et al. 2018). Atherosclerosis was a chronic in ammatory disease characterized by the formation of lipid plaques on the blood vessel wall, which can increase the risk and recurrence of stroke (Chow et al. 2020).
Ten hub genes were identi ed in this study, which were ARG1, TPM2, LY96, ABCA1, SLC25A42, SLC22A4, ID3, FAM102A, CD79B and CD163. Previous experimental studies have revealed that 6 hub genes were related to the occurrence and development of stroke. Study have shown that STAT6/ARG1 regulated the efferocytosis of the microglia/macrophage phenotype, accelerated the resolution of in ammation, and improved the prognosis of stroke (Cai et al. 2019). Barr et al reported that LY96 and ARG1 had been identi ed as stroke expression pro le genes, which may be biomarkers of stroke (Barr et al. 2010). As a major reverse cholesterol transporter, ABCA1 plays a key role in the formation of high-density lipoprotein (HDL) cholesterol in the brain. ABCA1/ApoE/HDL pathway was at least partially involved in the process of nerve recovery after stroke induced by gw3965 (Cui et al. 2017). In addition, ABCA1 rs2230806 GG was signi cantly associated with an increased risk of IS (Au et al. 2017). SLC22A4's rs273909 was signi cantly associated with IS (P=0.0123) (Yamase et al. 2015), which increased the susceptibility of IS. ID3, an Inhibitor Of DNA Binding 3, was involved in stress response, neural plasticity, neural circuits, and other processes (Avecilla et al. 2017). CD163 is a scavenger receptor expressed in the innate immune cell population. CD163 served as a potential biomarker for monocyte activation in patients with ischemic stroke (Greco et al. 2021).
Adamski et al showed that the transcriptome AUC containing 7 genes PLBD1, PYGL, BST1, DUSP1, FOS, VCAN and FCGR1A was 0.854 for identi cation of stroke by the high throughput next-generation qPCR veri cation (Adamski et al. 2014). In our research, we nally got 10 hub genes, 9 genes had AUC values higher than 0.854, the AUC of 7 genes had higher than 0.9. FAM102A had the highest diagnostic value with the AUC of 0.974, sensitivity of 0.919 and speci city of 0.936, respectively.
NetworkAnalyst database analysis showed that transcription factor SP1 encodes a zinc nger transcription factor, which was involved in many cellular processes such as cell differentiation, apoptosis, and immunity. It has been reported that Curcumin up-regulated Peroxiredoxin 6 through SP1 to induce ischemic oxidative damage after stroke in rats ( HPA database analysis showed that the 10 hub genes in this study were expressed in the brain and blood. Studies have found a correlation between brain gene expression pro les and blood gene expression pro les, and blood gene expression pro les had the potential to evaluate various disease states in terms of diagnosis, mechanism and treatment (Tang et al. 2001(Tang et al. , 2002. A brain tissue gene expression prediction model based on whole blood gene expression data proved that only whole blood expression pro le data could accurately predict the gene expression of unsampled brain tissue Wenjian and Wei 2019). Iturria-medina Y et al showed that blood gene expression strongly re ected the severity of neuropathology, clinical deterioration and disease progression in vivo (Iturria-Medina et al. 2020). Therefore, we hypothesize that gene expression in the blood may re ect brain conditions in some extent. Among them, ABCA1, SLC25A42, SLC22A4, ID3 and FAM102A had higher gene expression levels in the cerebral cortex than in blood. Compared with the other ve genes, we speculated that these ve genes with high expression in brain tissue had more important value in the pathological mechanism of IS occurrence and development, and may be used as a target for drug therapy.
Next, we used DGIdb and cMap databases to predict 5 gene-targeted therapeutic drugs. DGIdb database analysis showed that there were 6 therapeutic drugs targeted with ABCA1, namely simvastatin, probucol, feno brate, cyclosporine, pravastatin and atorvastatin. Imatinib was a therapeutic drug targeted with SLC22A4. CMap database analysis showed that there were 3 small molecule compounds targeted with ID3, namely 2', 5'-dideoxyadenosine, DR-2313 and otenzepad. And dubinidine was a small molecule compound targeted with SLC22A4. Among the drugs targeted ABCA1, simvastatin, pravastatin and atorvastatin are all statins. According to previous studies, the use of statins for the treatment of stroke has long been widely accepted and used by related researchers, which has been promoted in the continuously updated guidelines. According to the 2013 AHA/ASA guidelines, statins were recommended to be applied in patients presumed to be atherosclerotic ischemic stroke or TIA to reduce the risk of stroke and cardiovascular events (Kernan et al. 2014). Statins can signi cantly reduce the infarct volume and neurological de cits in tMCAO model mice. At the same time, compared with atorvastatin, simvastatin and pravastatin were more signi cant in reducing infarct volume (Christophe et al. 2020). Statins seemed to reduce the risk of recurrence of ischemic stroke and other cardiovascular events in patients (Tramacere et al. 2019).
Probucol inhibited LPS-induced microglia activation and improved cerebral ischemic injury in mice through anti-neuro-in ammatory effects, and it may be a potential therapeutic drug for the treatment of ischemic patients (Jung et al. 2016). Feno brate could signi cantly reduce the volume of cerebral infarction, the activation of microglia and the in ltration of neutrophils in the ischemic area to play a protective effect on the brain, which may be related to the acute effect and preconditioning mechanism (Ouk et al. 2009). Feno brate played a preventive and protective effect on cerebral ischemia by reducing the infarct volume, and the related mechanism may be related to the genome regulation in the apoptosis pathway (Altintas et al. 2017). Cyclosporine A had potential neuroprotective properties in ischemic stroke, which interfered with the typical NF-κB signaling pathway, and the Akt pathway may also participate in the role of Cyclosporine A (Deng et al. 2020;Nighoghossian et al. 2016). Deng et al showed that DR2313 exerted neuroprotective effects through its powerful PARP inhibitory effect, and could easily penetrate the blood-brain barrier after peripheral to the medicine. It may be more useful than free radical scavengers in the treatment of acute stroke, which may be a promising drug for stroke treatment (Nakajima et al. 2005).

Conclusion
In summary, we nally screened out 10 hub genes that may play a key role in the occurrence and development of IS through comprehensive bioinformatics analysis. Among them, SLC25A42 and FAM102A may be new candidate biomarkers or therapeutic targets. In addition, we also predicted gene-targeted therapeutic drugs, and many previous studies have supported the value of these drugs in the treatment of IS.

Declarations
Funding This work was supported by National Natural Science Foundation of China (NO. 81874395), Natural Science Foundation of Guangxi Zhuang Autonomous Regio NO. 2017GXNSFDA198013 and NO. 2018GXNSFAA281224).

Con icts of interest
All authors declare that they have no con cts of interest.   Tables   Table 1 Basic imformation of microarray Data .