Using Weighted Gene Co-Expression Network Analysis to Identify Key Genes Related to the Onset of Preeclampsia


 Background: Preeclampsia is a form of hypertension in pregnancy, which induced by complicated factors. However, the pathogenesis of the disease is unclear. The present study was aimed to discover the critical biomarkers associated with the occurrence and development of preeclampsia. Methods：Gene data profile GSE75010 was downloaded from the Gene Expression Omnibus (GEO) database and used as discovery cohort to establish a WGCNA network determining significant modules which associated with clinical traits. Subsequently, functional enrichment analysis, pathway analysis and protein-protein interaction (PPI) network construction were performed on the core genes in significant modules to identify hub genes. Then, gene data profile GSE25906 was used as verified cohort to determine their diagnostic value of hub genes. The protein expression levels of these hub genes in preeclampsia and control placental tissues were verified using immunohistochemistry method. Finally, GSEA was performed to analyze their enrichment pathways. Results: Total 33 co-expression modules were identified after the establishment of WGCNA, of which 4 gene modules were identified as significant modules because they were related to multiple (>3) clinical traits. Total 75 core genes in significant modules were analyzed, and results showed that they were mainly enriched in adaptive immune response (Gene Ontology term) and platelet activation (Kyoto Encyclopedia of Genes and Genomes term). Finally, a total of 5 genes including TYROBP, PLEK, LCP2, HCK, ITGAM were identified as hub genes which scored high in PPI network and had high diagnostic value. Furthermore, the protein level of these 5 genes in placental tissues of preeclampsia was lower than that of the control group. Moreover, these 5 genes were all enriched in 17 pathways, including autoimmunity pathway. Conclusions：These 5 genes (TYROBP, PLEK, LCP2, HCK, ITGAM) may be closely related to the pathogenesis of preeclampsia, which may also help the diagnosis and therapy of preeclampsia.


Background
Preeclampsia is a common obstetric disease which occurs in 5-7% of pregnant woman causing more than 70,000 maternal deaths and 500,000 fetal deaths per year [1]. Pregnant women with preeclampsia will have elevated blood pressure after 20 weeks (systolic blood pressure > 140, diastolic blood pressure > 90), causing proteinuria or severe complications. Various studies [1], [2] demonstrated that preeclampsia signi cantly increases the risk of cardiovascular disease in the mother and fetus in the future. In addition to the developmental risks caused by premature birth, preeclampsia also increases the possibility of the occurrence of mental and neurological diseases in offspring [3]. At present, the American College of Obstetricians and Gynecologists (ACOG ) [4] proposed that pregnant women with high risk of preeclampsia should take aspirin before 16 weeks to prevent it. However, the only treatment for preeclampsia is still delivery. Therefore, uncovering the molecular mechanism of the development of preeclampsia may help the clinical therapy.
Although the pathogenesis of preeclampsia remains known limit, various study had demonstrated that abnormal placenta formation or abnormal placental vascular changes may contribute to the development of preeclampsia [5].
With the delivery of the placenta, preeclampsia symptoms have alleviated or disappeared. Therefore, preeclampsia is considered a placental disease [1]. Therefore, most of the research on the mechanism of preeclampsia focuses on the genetic changes of the placenta. Endothelin-converting enzyme-1 (ECE-1) expression in placenta is signi cantly reduced in preeclampsia, suggesting that it caused poor placental vascularization by affecting the normal regulation of endothelin-1 (ET-1) in placental [6]. In addition, RGS2 (Regulator of G Protein Signaling-2) expression in placenta of preeclampsia patients is reduced, G protein cascades activated by hormones associated with preeclampsia is enhanced. [7]. At present, with the development of technologies such as big data and high-throughput sequencing, omics has become a new method for studying differences in gene expression pro les of preeclampsia placenta [8].
Therefore, using bioinformatics to analyze the pathogenesis of preeclampsia has a strong advantage.
Bioinformatics analysis for placenta gene expression pro le is an effective method commonly used to identify genes involved in the progression of placenta. Lei et al [9] found that the expression of ARRDC3 increased in preeclampsia by analyzing the difference between the expression of preeclampsia and normal placenta gene pro les in the GEO database. In addition, cell functional experiments were conducted and it was found that overexpression of ARRDC3 gene can inhibit the invasion and migration of HTR8/SVneo cells. Weighted gene co-expression network analysis (WGCNA) is an advanced bioinformatics algorithms which clusters genes that are highly related to clinical traits as module, and calculates the correlation coe cient between the modules and clinical traits [10]. Therefore, gene clusters associated with clinical traits were identi ed.
In present study, WGCNA analysis and related experiments were performed to identify key genes associated with the development of preeclampsia. Our present study may provide the novel biomarkers for the diagnosis of preeclampsia, as well as potential targets for the clinical therapy.

Materials And Methods
Data Collection and Preprocessing.
Gene expression pro le data GSE75010 and GSE25906 were downloaded from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) [11]. Gene expression pro le GSE75010 was used as discovery cohort, while GSE25906 was used as validation cohort. The GSE75010 dataset contained the gene expression pro le information of 80 preeclampsia placentas and 77 non-preeclampsia placentas using a GPL6244 platform (Affymetrix Human Gene 1.0 ST Array).The GSE25906 dataset was performed using the GPL6102 platform (Illumina human-6 v2.0 expression beadchip) and contained the gene expression pro le of 23 preeclampsia and 37 normal control placentas samples. Before performing WGCNA analysis, the probe names in GSE75010 were converted to the gene names (If the gene symbol corresponded to more than one probe names, the mean value of these probes was considered the expression value of the gene). Finally, 21045 genes were annotated.

WGCNA.
Before performing WGCNA, the gene expression pro le of 80 preeclampsia placentas were used to determine the good samples and good genes via a "goodSamplesGenes function" in R software (version 3.6.3). Mean FPKM = 0.5 was used as the criterion to lter the low-expression genes. Furthermore, Pearson correlation analysis was used to cluster the samples and detect outliers. The threshold for identifying outliers was set as 60, and one outlier (GSM1940547) was found and removed. Finally, total 21045 genes of 79 preeclampsia placenta samples were enrolled to construct WGCNA, as well as the corresponding clinical traits of the patients including age, body mass index (BMI), nulliparity, previous hypertension mean uterine perfusion index (PI), mean umbilical perfusion index (PI), maximum systolic blood pressure (BP), maximum diastolic blood pressure (BP ), proteinuria ,delivery mode, gestation days, infant gender ,newborn weight z-score, neonatal intensive care unit (NICU) transfer, placental weight z-score and umbilical cord diameter. The R package "WGCNA" (https://www.r-project.org/; version 3.6.2) was used to construct WGCNA. In WCGNA, genes were analyzed with each other by the Pearson's correlation test and constructed a similarity matrix. The suitable soft-thresholding power was determined which ensured scale independence > 0.85 and the mean connectivity ~ 0. The adjacency matrix was transformed into the topological matrix (TOM), and TOMbased dissimilarity (1-TOM) measure was used to cluster the genes using the hClust function. The mini size of module gene numbers set as 30 to identify the gene modules based on the means of the dynamic tree cut method.
Identi cation of key clinically signi cant gene modules and module core genes.
After the construction of WGCNA, the correlation between the gene modules and clinical traits were analyzed. The clinical traits mainly included the maternal age, BMI, nulliparity,previous hypertension, mean uterine PI, mean umbilical PI, maximum systolic blood pressure, maximum diastolic blood pressure, mode proteinuria, mode of delivery, gestation (days), infant gender, newborn weight z-score, NICU transfer, placental weight z-score, umbilical cord diameter. Clinically signi cant gene modules were determined which were relative to the clinical traits meeting the cutoffs as correlationship score (cor) > 0.3 and P value < 0.05. While clinically signi cant gene modules related to more than 3 clinical traits, the modules were considered as key clinically signi cant gene modules. Then, the module membership (MM) of each gene in clinically signi cant gene modules was calculated. Genes with MM score ≥ 0.8 was set as module core genes.
Functional enrichment analysis and KEGG analysis of module genes.
All module core genes were uploaded to the DAVID 6.8 [12] (http://david-d.ncifcrf.gov/) to perform the Gene Ontology (GO) functional annotation. GO functional annotation included three aspects: biological processes (BP), molecular function (MF) and cellular components (CC). Furthermore, the Kyoto encyclopedia of genes and genomes (KEGG) enrichment pathway analysis was performed on kobas3.0 [13] (http://kobas.cbi.pku.edu.cn/). All results met the cutoff as adjusted P-value < 0.05 and were visualized by R software.
PPI network construction and identi cation of hub genes.
STRING [14] (version 11.0; https://string-db.org/) was used to analyze the interaction between module core genes. The PPI network was visualized by Cytoscape software (version: 3.7.2; https://cytoscape.org/). The R software was used to count the connections between genes, the number of connections represents the degree of connectivity. The genes with the top 10%-degree score were selected as the hub genes for further analysis.
Validation of the expression of hub genes.
The expression of hub genes between 79 preeclampsia placentas and 77 non-preeclampsia placentas in gene expression pro le GSE75010 was analyzed using GraphPad Prism (version 7.0; GraphPad Software, Inc.). Data was presented as mean ± standard deviation. Unpair independent t test was used for statistical analysis, and P < 0.05 was set as cut-off to consider statistically signi cant.
Diagnostic value of hub genes.
The GSE25906 dataset was used as veri cation cohort to evaluate the sensitivity and speci city of the core genes via ROC curve analysis. SPSS 22.0 (IBM, Corps.) was used to create ROC curves. While the area under the ROC curve (AUC) was more than 0.6, the genes were considered to have prominent diagnostic value.

Tissues ethics.
The procedures of the present study were ethically approved by the Clinical Ethics Management Committee of Nanfang Hospital of Southern Medical University (NFEC-2019-133,July 26, 2019). A total of 15 normal placenta tissues and 18 preeclampsia placenta tissues were obtained between February 2020 and May 2020. All women included in this study did not have any complications other than preeclampsia. All pregnant women included in this study signed an informed consent.
All tissues are fresh tissues obtained within half an hour after delivery, and then routinely xed with paraformaldehyde at room temperature for at least 24 hours. Then gradually dehydrate with stepwise alcohol and dimethylbenzene, nally these tissues were embedded in para n and cut as 4-µm-thick serial sections. Subsequently, after heating at 60 ˚C for 1 h, it was dewaxed with xylene at room temperature and then rehydrated with graded ethanol (100, 80, 60, 40%). After extracting the antigen with sodium citrate (100 mm), the sections were blocked with 3% H 2 O 2 for 20 minutes, and the sections were blocked with 3% bovine serum albumin for 30 minutes at room temperature. Then we incubate all samples with primary antibodies, including TYROBP (cat. Subsequently, the secondary goat anti-mouse and rabbit antibodies labeled with horseradish peroxidase immunohistochemical staining (HRP) were incubated at room temperature for 2 hours. The sections were stained with HRP-3,3'-diaminobenzidine kit and the pictures were collected with an orthographic microscope (Olympus BX51). The results of immunohistochemistry were determined by staining intensity and positive cell percentage: Stain intensity score: 0 (no staining), 1 (+), 2 (++) and 3 (+++) ; the proportion of positive cells score: 0 (0-1%), 1 (1-33%), 2 (34-66%) and 3 (67-100%); Then the values were multiplied.
In order to explore the potential function of these hub genes in preeclampsia, the Gene set enrichment analysis method was applied. In the GSE75010 data set, 80 preeclampsia samples were divided into two groups based on the median expression levels of these hub genes. GSEA was used to evaluate whether a set of a priori de ned genes show statistically signi cant differences between the two biological states. In this study, GSEA4.0 software was used, and c2.cp.kegg.v7.1.symbols.gmt dataset was selected as the reference dataset. Then P value < 0.05 and NES > 1.5was chosen as the cut-off criteria.

Results
WGCNA co-expression network construction.
The outlier sample (GSM1940547) was removed, and the remaining gene expression pro le and corresponding clinical traits of 79 preeclampsia samples in GSE75010 were nally enrolled into WGCNA analysis (Fig. 1). We selected the power of β = 6 as the soft-thresholding power to achieve the optimal scale independence (> 0.85) and mean connectivity (~ 0) (Fig. 2a-b). Similarly, checking scale free topology showed that the soft-thresholding power was suitable which ensured scale free R 2 ≥ 0.85 (Fig. 2c). Finally, thirty three co-expression modules were obtained including oral white module, light green module, maroon module, thistle2 module, cyan module, dark orange module, paleturquoise module, light pink4 module ,plum2 module, light cyan module, brown4 module, dark magenta module, purple module, navajowhite2 module, grey60 module, palevioletred3 module, sky blue3 module, ivory module, pink module, yellow green module, midnight blue module, dark turquoise module, light yellow module, bisque4 module, white module, salmon4 module, steel blue module, dark slate blue module, yellow module, darkorange2 module, blue module, green module. Genes without co-expression relationship were set in grey module (Fig. 2d).
Identi cation of clinically signi cant modules and module core genes.
In order to clarify the relationship between preeclampsia clinical traits and gene modules, we performed Pearson analysis on each module and clinical traits. Genes in light green module were signi cantly associated with gestation duration (R = 0.31, P = 0.006) and nulliparity (R=-0.4, P = 0.0003). The genes of cyan module were signi cantly associated with umbilical cord diameter (R=-0.39, P = 0.0003). Moreover, genes in dark orange module were signi cantly associated with newborn weight z-score (R=-0.39, P = 0.0005), and were signi cantly associated with umbilical cord diameter (R=-0.32, P = 0.004). Genes in light cyan module were signi cantly associated with age (R=-0.31, P = 0.005), while genes in navajowhite2 module were signi cantly associated with BMI (R = 0.49, P = 0.000004). The genes in purple module were signi cantly associated with the mean umbilical PI (R=-0.32, P = 0.004), while the gens in pink modules were signi cantly associated with mean uterine PI (R = 0.38, P = 0.0005) and gestation days (R=-0.41, P = 0.0002). Furthermore, the oral white module, plum2 module, dark magenta module and yellow green module were closely related to various clinical traits (> 3), so these were regarded as the signi cant modules for subsequent analysis (Fig. 3). Through calculating the module membership (MM) of each gene in signi cant modules, 75 genes with MM > 0.8 were identi ed and set as module core genes (Table 1).   (Fig. 4). The KEGG pathway enrichment analysis re ected that module core genes were enriched in pathways platelet activation, Fc gamma R-mediated phagocytosis, tuberculosis, natural killer cell mediated cytotoxicity, Fc epsilon RI signaling pathway, NF-kappa B signaling pathway, Toll-like receptor signaling pathway, Cell adhesion molecules (CAMs), phospholipase D signaling pathway and phagosome (Fig. 5).
PPI network construction and identi cation of hub genes.
Module core genes were uploaded to the String database for construction of PPI network (Fig. 6a). R was used to calculate the number of connections between genes, and the genes including PTPRC, TYROBP, PLEK, LCP2, HCK, ITGAM, BTK, CD86 with the top 10%degree score were obtained as hub genes (Fig. 6b). These hub genes were used further analysis.
Identi cation the expression of core genes between preeclampsia placentas and non-preeclampsia placentas.
Furthermore, the expression of hub genes between 79 preeclampsia placentas and 77 non-preeclampsia placentas in GSE75010 were analyzed. The results showed that the expression levels of these hub genes including PTPRC, TYROBP, PLEK, LCP2, HCK, ITGAM, BTK, CD86 were signi cantly decreased in the preeclampsia placentas compared with the non-preeclampsia placentas (Fig. 7).
Diagnostic e ciency analysis of hub genes. exhibited remarkable diagnostic e ciency (Fig. 8).
Immunohistochemical veri cation results of hub genes.
In order to verify the expression of hub genes between the two groups, we performed immunohistochemistry on 18 preeclamptic placenta tissues and 15 normal control placenta tissues. There was no difference in gestational days, age, and BMI between the two groups of pregnant women ( Table 2). The results showed that these hub genes (TYROBP, PLEK, LCP2, HCK, ITGAM) have lower expression in preeclampsia than the normal group (Fig. 9). Annotation: Clinical characteristics of two groups of pregnant women for immunohistochemical staining. The value was expressed as mean ± SD. The P value between the preeclampsia group and the control group was given. *P < 0.05.
In order to explore the potential biological functions of hub genes in preeclampsia, GSE75010 dataset was used for GSEA analysis. The results indicated that these 5 hub genes were all enriched in the 17 pathways including T cell receptor signaling pathway, leukocyte transendothelial migration, viral myocarditis, primary immunode ciency, FC epsilon RI signaling pathway, prion diseases, cell adhesion molecules cams, natural killer cell mediated cytotoxicity, complement and coagulation cascades, leishmania infection, systemic lupus erythematosus, antigen processing and presentation, intestinal immune network for IgA production, pantothenate and COA biosynthesis, nod like receptor signaling pathway, pathogenic Escherichia coli infection, glutathione metabolism (Table 3). These results indicated that these 5 hub genes may be involved in the development of preeclampsia via regulating these pathways.

Discussion
Preeclampsia is one of the most serious diseases that endanger the life of the mother and fetus. Many scholars were devoted to the study the pathogenesis and demonstrated that the possible pathogenesis of preeclampsia included genetic factor, angiogenic imbalances, poor immune adaptation, abnormal lipid metabolism, trophoblast invasive decreased and increased apoptosis or necrosis, abnormal placentation and excessive maternal in ammatory. Among them, most of the research evidence [15] indicates that abnormal placenta formation, and changes in the placental gene pro le have a greater impact on the trophoblast migration and invasion function, which leads to the occurrence of preeclampsia. Therefore, it is of great signi cance to explore the changes of placental gene pro le in preeclampsia.
In the current study, GSE75010 (discovery cohort) and GSE25906 (validation cohort) were downloaded from the GEO database. Through construction of WGCNA, GO analysis, KEGG analysis, construction of PPI network, expression analysis, diagnostic value validation and IHC experiment, TYROBP, PLEK, LCP2, HCK, ITGAM were identi ed as hub genes.
Awamleh [16] showed that ITGAM mRNA expression is reduced in the placenta of patients with PE and IUGR. ITGAM was regulated by miR-210-5p, and participated in the occurrence of preeclampsia by inhibiting the invasion and migration of HTR8 cells. In current study, we found that ITGAM gene belongs to oral white module, which is negatively correlated with mean uterine perfusion index (PI), mean umbilical perfusion index (PI), and positively correlated with the total days of pregnancy. In addition, immunohistochemistry and expression analysis indicate that the ITGAM gene is lowly expressed in preeclampsia and has good diagnostic value. Therefore, we hypothesizes that ITGAM gene is a protective factor for preeclampsia, which is consistent with previous research results.
TYROBP encodes a transmembrane signal peptide whose cytoplasmic region contains an immunoreceptor activation motif (ITAM) based on tyrosine [17]. TYROBP has been identi ed as a universal marker for macrophages in mouse and human tissues [18], and the preeclampsia pathogenesis is manifested by the maternal innate immune response to various proin ammatory stimuli from the placenta. LCP2 is a coding protein that encodes a substrate for the T cell antigen receptor (TCR) to activate the protein tyrosine kinase pathway [19]. Lcp2 protein expression is reduced in colonic epithelial cells of spontaneously hypertensive rat model [20]. Pan [21] proposed that TYROBP and LCP2 may be related to the pathogenesis of atherosclerosis in a high-fat diet Tibetan minipig model. Song.et al conducted a biological analysis of the expression pro le chips of periodontitis and normal gingival tissues and found that PLEK may interact with IRF-8 to cause periodontal in ammation, suggesting that PLEK may be related to in ammatory reactions [22]. Furthermore, PLEK is associated with the pathogenesis of abdominal aortic aneurysms, which are a type of vascular disease, and its pathogenesis may be related to oxidative stress, in ammation [23].
Therefore, PLEK is related to in ammatory response and oxidative stress response, so it may be involved in the development of preeclampsia. Hematopoietic cell kinase (HCK) is a member of the Src family of protein kinases, and is involved in various signal transduction pathways that regulate cell growth, proliferation, differentiation, migration, apoptosis [24]. In colorectal tumors, HCK is associated with prognosis and immune cell responses during local in ammation [25]. However, these genes are related to in ammatory response, immune response, cell migration and invasion, but none have been reported in preeclampsia. In our research, we found these genes belong to oral white module, which is negatively correlated with mean uterine perfusion index (PI), mean umbilical perfusion index (PI), and positively correlated with the total days of pregnancy, as well as possessing good diagnostic value. Moreover, the expression levels of these genes and the results of immunohistochemistry suggest that the placenta in preeclampsia is lower than that in the control group. Therefore, we speculate that TYROBP, PLEK, LCP2, HCK genes may be protective factors in preeclampsia, but more functional experiments are still needed to verify.

Conclusions
In summary, WGCNA analysis, GO analysis, KEGG analysis, expression analysis, diagnosis value analysis, GSEA and immunohistochemical stain were used, and 5 genes (TYROBP, PLEK, LCP2, HCK, ITGAM) which were considered playing a protective role in the development of preeclampsia. These genes may helpful for the diagnosis and treatment of preeclampsia.