Identi cation of Differentially Expressed Genes and Signalling Pathways Related to Ovarian Endometriosis Based on Integrated Bioinformatic Analysis

Kainan Lin Wenzhou Medical University First A liated Hospital: The First A liated Hospital of Wenzhou Medical University Zhenyan Pan Wenzhou Medical University First A liated Hospital: The First A liated Hospital of Wenzhou Medical University Renke He Zhejiang University School of Medicine Hanchu Wang Wenzhou Medical University First A liated Hospital: The First A liated Hospital of Wenzhou Medical University Kai Zhou Wenzhou Medical University First A liated Hospital: The First A liated Hospital of Wenzhou Medical University Liangshan Mu (  liangshanmu@zju.edu.cn ) Zhejiang University School of Medicine


Introduction
Endometriosis is a common gynaecological disease caused by active endometrium that in ltrates into peri-uterine sites, such as pelvic cavity (i.e., ovaries, external structure of uterus, uterosacral ligaments and pouch of Douglas) and the wall of pelvic organs [1]. Approximately 10-15% of reproductive-aged woman worldwide suffer from endometriosis, which causes chronic pelvic pain and infertility [2]. Although endometriosis was rst identi ed and described in the early 20th century, there has been no consensus on etiological theory to date. The most widely accepted theory was proposed by Sampson; this theory assumed that endometrium fragments migrated to pelvic cavity via the fallopian tube with the menstrual blood ow and were subsequently implanted in the ovary and other sites within the body [3]. The coelomic metaplasia theory proposed by Mayer and an immunological theory were also demonstrated to be credible by several studies [4,5].
Three main phenotypes of endometriosis are observed clinically: peritoneal endometriosis, ovarian endometriosis, and deep-in ltrating endometriosis [6]. Revised classi cation criteria released by the American Society for Reproductive Medicine are widely used to classify the severity of endometriosis from minimal (I) to severe (IV) in clinical practice [7]. Diagnostic laparoscopy is the most accurate method to diagnose endometriosis patients. In addition, the location of pain, infertility, positive results from medical imaging examination and CA125 evaluation in blood samples could also predict the onset of endometriosis. However, due to the invasive nature of these procedures as well as diagnostic inaccuracies, uncovering underlying mechanisms of the onset and progression of endometriosis is crucial for medical therapy.
Endometriosis is a complex disease that is related to multiple factors, such as immunology, endocrinology, genetics, and environmental factors. Studies showed that immediate family members of endometriosis patients exhibit a signi cantly increased risk of developing endometriosis [8]. In this regard, identifying endometriosis-related genetic variants is critical for susceptible populations. The differentially expressed genes (DEGs) could reveal signalling pathways potentially linked to the development and progression of endometriosis. Given the limited number of samples and inconsistent study methods, sample integration of included studies revealed signi cant heteroscedasticity.
With the emergence of newly developed study methods, integrated bioinformatic analysis has been demonstrated to be a reliable tool in molecular and biological studies of breast cancer and lung cancer [9,10].
In this study, three microarray expression datasets were downloaded and a total of 57 samples, including 27 cases of ovarian endometriosis and 30 normal endometrium samples from healthy female populations as control group, were included in this study. After identifying the DEGs, we performed GO analysis and KEGG pathway analysis. Then, the PPI network was constructed and visualized. Through this series of analysis, numerous key signalling pathways and potential candidate genes involved in the development and progression of endometriosis are identi ed. Results of this study provide potential molecular targets to help improve the diagnosis and treatment for endometriosis.

Materials And Methods
Gene expression data Microarray data of mRNA expression pro les related to progression of ovarian endometriosis were extracted and downloaded from the GEO database (http://www.ncbi.nlm.nih. gov/geo) of National Coalition Building Institute (NCBI). "Ovarian endometriosis" was selected as keyword for data retrieval, and species types were limited to Homo sapiens, and 22 datasets associated with ovary endometriosis were retrieved. After preliminary screening, gene expression pro les of GSE31515, GSE58178 and GSE120103 met the inclusion criteria of this study and thus were downloaded for further analysis. The dataset GSE31515 contained sequencing data from 3 endometriosis tissue samples and 6 healthy endometrial tissue samples. The platform used to assess the in uence of oxidative stress on endometriotic stromal cells (GSE31515) was GPL6480 Agilent-014850 Whole Human Genome Microarray 4x44K G4112F (Probe Name version). The gene expression pro ling of primary stromal cell cultures isolated from human endometrium and ovarian endometriosis (GSE58178), which contained data from 6 healthy human endometrial tissues and 6 human endometriotic tissues, was based on GPL6947 Illumina HumanHT-12 V3.0 expression bead chip platform. The dataset GSE120103 contained 18 endometrioma samples and 18 control endometrium specimens, and the platform for analysing GSE120103 was GPL6480 Agilent-014850 Whole Human Genome Microarray 4x44K G4112F (Probe Name version). Both platform and series matrix les were downloaded in the CSV data format in this study. The dataset information was displayed in Table 1.

Data processing
Gene sequence annotation was conducted using the platform le through Strawberry-Perl-5.30.2.1 (https://www.perl.org/get.html), and then the data were formatted into the gene expression matrix for subsequent operations. We then merged three gene expression matrices that were converted from the three GSE datasets mentioned above into a single gene expression matrix through Straw-perl-5.30.2.1. Genes that were not simultaneously expressed in the three gene matrixes were excluded from this study. Then, we used R 3.6.3 (https://www.rproject.org/) for subsequent data processing. For batch normalization of data, we used limma and sva packages in the Bioconductor 3.11 tool (http://www.bioconductor.org/packages/release/bioc/html/limma.html;http://www.bioconductor.org/packages/release/bioc/html/sva.html). In addition, Limma R software package was used to identify differentially expressed mRNAs. This study was conducted based on the thresholds of adjust P-value < 0.05 and |log2fold change (FC)| > 1. In addition, R software was used to construct heat maps and volcanic maps of DEGs between the case group and the control group.

Pathway enrichment analysis
The GO analysis is divided into three parts: Molecular Function (MF), Biological Process (BP) and Cellular Component (CC). Individual proteins or genes were identi ed by serial number correspondence or sequence annotation, and the GO number was used to locate the corresponding term, namely functional category or cell type. To better understand the pathways associated with DEGs in the pathogenesis of endometriosis and their corresponding molecular mechanisms, we rst divided DEGs into up-regulated and down-regulated groups. The enrichment analysis of the GO and KEGG pathways was then performed using the Bioconductor 3.11 tool via clusterPro ler package for the up-regulated and down-regulated groups, separately. P < 0.05 indicated statistical signi cance. The most relevant functional pathways of DEGs were determined using R package "pathview", and the location of each DEG was annotated in the functional pathway.

PPI network construction
The PPI among DEGs-encoded proteins was analysed based on Search Tool for the Retrieval of Interacting Genes (STRING) online database (http://string-db.org/) using a combined score of ≥ 0.4 as the cut-off value. To simplify diagrams, we removed all isolated or partially connected nodes and nally constructed a full-scale DEG network. Data from the STRING database were imported into CytoScape 3.8 (https://cytoscape.org/) for visual processing. CytoHubba plug-ins loaded in CytoScape software were used to construct and analyse functional modules.

Identi cation of DEGs in ovarian endometriosis
In total, 30 healthy women were enrolled as the control group, and 27 patients with ovary endometriosis served as the case group in this study. After randomly merging data from different mRNA expression pro les, we used R 3.6.3 for batch normalization to eliminate effects of different experimental factors. Here, |log2FC| > 1 and P < 0.05 served as cut-off values for data inclusion. In addition, we used the limma package to identify DEGs in datasets GSE31515, GSE58178 and GSE120103. The results show that 304 DEGs, which contains 185 downregulated genes (logFC < 0) and 119 up-regulated genes (logFC > 0) in the ectopic endometrial tissue (Table 2), are simultaneously identi ed in three mRNA expression pro les. We subsequently constructed volcano plots and cluster heatmaps of detected DEGs using R3.6.3. Data are presented in Fig. 1 and Fig. 2, respectively. Gene Ontology and KEGG pathway analysis of DEGs in ovarian endometriosis We used R 3.6.3 to convert gene symbols into entrez IDs for further analysis. BP, MF and CC represent the three major categories of GO analysis. Based on R 3.6.3 software, 185 up-regulated genes and 119 down-regulated genes were analysed separately, and results of GO analysis regarding BP, MF and CC are presented in Fig. 3. Regarding BP analysis, up-regulated DEGs are particularly enriched in the regulation of voltage-gated calcium channel activity, positive regulation of cation channel activity, positive regulation of voltage-gated calcium channel activity and response to molecule of bacterial origin. In addition, down-regulated DEGs are particularly enriched in uterus development, the planar cell polarity pathway involved in neural tube closure, appendage development and limb development (adjusted P-value < 0.005). Regarding MF analysis, identi ed up-regulated DEGs are mainly enriched in sarcolemma, sarcoplasmic reticulum, Z disc and sarcoplasm, whereas down-regulated DEGs are mainly enriched in collagen-containing extracellular matrix, endoplasmic reticulum lumen, cell-substrate junction and melanosome (adjusted P-value < 0.005). Regarding CC analysis, up-regulated DEGs are mainly enriched in receptor ligand activity, signalling receptor activator activity, chemokine activity and cytokine activity, whereas down-regulated DEGs are mainly enriched in collagen binding, platelet-derived growth factor binding and peptide disulphide oxidoreductase activity (adjust P-value < 0.005). GO functional analysis results, which include the BP, MF and CC analysis, are displayed in Fig. 3. KEGG analysis results are shown in Fig. 4, Tables 3 and 4.
These results demonstrate that detected up-regulated DEGs are mainly enriched in cytokine-cytokine receptor interaction, malaria and the IL-17 signalling pathway, whereas down-regulated DEGs are mainly enriched in protein processing in endoplasmic reticulum, tight junctions and Salmonella infection (P < 0.005). To uncover the most pertinent functional pathway related to DEGs, we constructed a speci c pathway diagram using R package "pathview" and labelled the 9 down-regulated DEGs related to protein processing in the endoplasmic reticulum and 10 up-regulated DEGs related to cytokine-cytokine receptor interaction (Figs. 5 and 6).

Protein-protein interaction network (PPI) and modular analysis
We used Cytoscape to construct the PPI network, which consisted of 219 nodes (proteins) and 542 edges (interactions). Then we applied cytoHubba to analyse the most signi cant interaction in the network, and the results show that interactions among CCND1, IL6, CCL2, COL1A2, PTGS2, VCAM1, COL3A1, ELN, SERPINE1, and HSP90B1 are the most signi cant interaction among selected DEGs, as shown in Fig. 7.

Discussion
Accumulating evidence suggests that endometrium of endometriosis patients exhibits anomalous gene expression pro les that are distinct from healthy populations [11]. To explore such a discrepancy, we selected three gene expression pro le datasets. By analysing these datasets, a total of 304 DEGs were identi ed, including 185 unregulated and 119 downregulated genes. Furthermore, these DEGs provided promising evidence that supported some new hypotheses of the onset of endometriosis.
The most signi cant symptom of endometriosis is chronic pelvic pain, which is related with the activation of voltage-gated calcium channels.
Calcium channels play a critical role in neuropathic pain development through the regulation of the release of excitatory neurotransmitters as part of the second messenger system [12]. In particular, although the endometriosis implants are ectopic, they are highly innervated [13]. A pilot trial showed that the use of gabapentin as a calcium channel blocker could manage pain effectively [14].
Genes related to uterus development were differentially expressed between healthy populations and endometriosis patients. Crispi et al. proposed the hypothesis that endometriosis may be connected with defects during the development of the female reproductive system. These abnormal genes might be associated with the erroneous localization of the cells that generate the ectopic endometrium during the migration phase [15]. BMP6 belongs to the transforming growth factor-beta super-family that includes proteins involved in cell growth and differentiation [16]. SERPINE1 is a member of the serine protease inhibitor family that contains inhibitors of tissue plasminogen activator and are involved in tissue remodelling. SERPINE1 exhibited signi cantly higher expression in ovarian cancer patients, indicating a potential role in tumour invasion and metastasis [17]. Up-regulated SERPINE1 expression could also explain some malignant characteristics of ectopic endometrium; however, samples included in this study were obtained from patients with deep in ltrating endometriosis.
KEGG analysis revealed that up-regulated DEGs were related to cytokine-cytokine receptor interaction. In patients with endometriosis, cytokine levels in the retained uid were increased compared with that in normal subjects, and ectopic lesions typically occurred at the site of retained uid. This nding suggested that cytokines might play an important role in endometrial cell peritoneal implantation, in ltration, vascular regeneration and local hyperplasia. TNF (tumour necrosis factor) is a cytokine derived from macrophages and monocytes that plays an important role in immune regulation, cell growth and differentiation. Enzyme linked immunosorbent assay results revealed signi cantly increased TNF-α levels in peritoneal uid of patients with endometriosis and infertility compared with the control group (P < 0.05). Increased TNF in the circulating blood attracts activated monocytes and in ammatory cells to the peritoneal cavity, and the number of monocytes was also increased in the local immune system, which stimulates the growth of ectopic endothelium [18]. In patients with endometriosis, a high concentration of VEGF (vascular endothelial growth factor) was noted in the peritoneal uid. Endometrial cells were locally attached and in ltrated the endothelium, and a large number of blood vessels are required to ensure their growth. VEGF is a proliferating factor that plays a major role in neovascularization. Bioinformatic analysis and molecular biology characterization indicated that pro-in ammatory cytokines induced VEGF-C overexpression to enhance lymphangiogenesis in lymphoendothelial cells by inhibiting COUP-TFII levels [19]. IL-8 is an angiogenic factor that promotes the proliferation of new blood vessels, allowing the ectopic endometrium to grow and proliferate, resulting in extensive pelvic adhesions. IL-8 is involved in the pathological process of endometriosis. Ectopic endometrial stromal cells exhibited signi cantly increased IL-6 and IL-8 mRNA gene expression and secretion compared with orthotopic and control endometrial stromal cells prior to treatment [20].
On the other hand, KEGG analysis suggested that down-regulated DEGs were related to protein processing in the endoplasmic reticulum. The endoplasmic reticulum played an important role in the interaction between multiple intracellular organelles along with collateral bio-activities, especially including processes associated with the proteasome and attached ribosome, protein synthesis and other processes. Accumulation of unfolded or misfolded proteins during endoplasmic reticulum stress activated a homeostatic coping mechanism called the unfolded protein response. Female reproductive tissues were highly active at cellular, molecular and genetic levels, and this activity requires the endoplasmic reticulum. In certain severe conditions, the unfolded protein response was not su cient to restore normal endoplasmic reticulum function, which further contributed to the pathogenesis of various diseases, including endometriosis [21].
GATA6 is a fertility-related gene that is expressed in vertebrate ovaries [19]. GATA6 overexpression, which is induced by aberrant methylation in endometriotic cells, regulated the expression of steroid metabolism and steroid hormone receptors. For example, the transcripts of oestrogen receptor α and progesterone receptor are reduced, whereas oestrogen receptor β transcript levels were increased. MMP11 is signi cantly reduced by the overexpression of GATA6 as shown in Table 2. This process transformed healthy endometrium away from spontaneous decidualization and toward the disease phenotype by restricting the ability of endometrial stromal cells to decidualize. PRL and IGFBP1 expression expected to be inhibited, but these genes were up-regulated [22].
A few bioinformatic analyses related to endometriosis have been performed. Tissues in the control groups from datasets were included. Zhang and Wang et al., Yao et al., and Cheng et al. obtained samples from not only endometriosis patients but also healthy women [23][24][25].
In addition, samples de ned as ectopic endometrium were obtained from various regions, which weakened the strength of the conclusions. In this study, we include samples from endometriomas from endometriosis patients and ectopic endometrium from healthy women. However, as this study is only based on analysis, further studies with larger samples and clinical trials are required to con rm the association of identi ed genes in endometriosis.

Conclusion
In conclusion, this study reveals the possibility of new pathological hypotheses for endometriosis, including bacterial contamination, defects in female reproductive system development and retrograde menstruation. This study also identi ed new drug targets in addition to the oestrogen receptor.

Consent for publication
Written informed consent for publication was obtained from all participants.

Availability of data and material
All data generated or analyzed during this study are included in the published article.

Code availability
All software and codes used during this study are included in the published article.
Authors' contributions KN Lin: Data analysis and manuscript writing.  Common DEGs PPI network constructed by STRING online database and Core protein analysis. There is a total of 219 DEGs in the DEGs PPI network complex. The nodes meant proteins; the edges meant the interaction of proteins. Core protein analysis via cytoHubba in Cytoscape software (rank by the degree)

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.