Identi cation of Immune Cell In ltration and Effective Diagnostic Biomarkers in the Lesion of Endometriosis Based on Multi-Bioinformatic Analysis


 Background: In recent years, it is proved that the immune system contributes a lot to the pathogenesis and progression of endometriosis (EM). Identifying the signature of immune cell infiltration, the immune related diagnostic biomarkers, and even the potential therapeutic targets in endometriosis patients will aid the early diagnosis and treatment of EM. Methods: Firstly, the microarray expression dataset was downloaded from the gene expression omnibus (GEO) database. Subsequently, the signature of immune cell infiltration of endometrium tissues was calculated through the xCell algorithm. The hub module most relevant to immune cells was selected through the weighted gene co-expression network (WGCNA). The differentially expressed genes (DEGs) were identified by the limma package of R. The immune-related hub genes were further selected. Finally, the validation and clinical characteristics of hub genes were performed.Results: In our study, it was showed that macrophages and neutrophils were the main infiltrating immune cells in the endometrium tissue and 4 hub genes related to immune infiltration in ectopic lesions of EM were identified through multi-bioinformatic analysis. The expression of those hub genes (TNFSF13B, IL7R, CSF1R, LEP) in other validated datasets was consistent with our results. Each hub gene had significantly correlation with most of immune cells, and the area under the ROC curves (AUC) of those hub genes for disease diagnosis were all higher than 0.8. We also find that those hub genes were connectively concerned with the common complication infertility of EM.Conclusions: It is proved that the dysfunction of the immune system is connectively associated with the ectopic lesions of EM. And the immune related biomarkers (TNFSF13B, IL7R, CSF1R, LEP) may be helpful in the development of the pathogenesis, diagnosis and even the potential therapeutic targets of EM.


Background
Endometriosis is de ned as the presence of functional endometrial glands and stroma outside the uterine cavity with the most common locations for the ectopic endometria (EC) being the ovaries. About 6 to 10% of women of reproductive age have been suffering endometriosis with a series of annoying symptoms, including dysmenorrhea, chronic pelvic pain, and infertility (1,2). Despite decades of research, the etiology and pathogenesis of endometriosis remain unclear. Multiple theories exist regarding its etiology, including the retrograde menstruation theory, hormonal conditions, gene pro les, immune disturbances (3). Currently, laparoscopy is the "gold standard" to diagnose and stage endometriosis, while the mean latency between the onset of symptoms to de nitive (surgical) diagnosis is 6.7 years (4). Therefore, it is urgent for us to explore the molecular mechanisms and novel diagnostic biomarkers underlying endometriosis.
Mounting research suggests that disturbed local and systemic immune systems are involved in the poor clearance and persistence of ectopic endometria of EM patients (1,5,6).The immune environment within the ectopic lesions of women with endometriosis is largely different from that within normal endometria, such as abnormalities in various cell types increased levels of activated peritoneal macrophages, abnormal activation of T and B lymphatic cells, various proin ammatory and regulatory cytokines (7)(8)(9).
Abnormal activated immune cells at and around endometriotic lesion sites can induce the release of a range of proin ammatory mediators and cytokines, which have been shown to promote the persistence of lesions (6, 10). Disturbed local and systemic immune systems in women with endometriosis have been also shown to increase rates of implantation failure and contribute to infertility in women with endometriosis (11).
Over the last decades, with the advancement and availability of high-throughput technologies, microarrays and RNA sequencing, large-scale transcriptome data were produced, which can provide a comprehensive understanding of the molecular mechanisms underlying concerned diseases (12,13). The novel gene signature method xCell (http://xCell.ucsf. edu/) was used to investigate 7 types of in ltration immune cells using extensively in silico analyses based on gene expression data (14). The weighted gene co-expression network analysis (WGCNA) can screen gene modules closely related to concerned diseases by analyzing the correlation between genomic and clinical information, thus providing a basis for further research (15). The Immunology Database and Analysis Portal (ImmPort) repository, applied in the validation of the methods used in the original studies, leveraging studies for meta-analysis, or generating new hypotheses, is an online database created for better immunology research in the future (16,17). In this study, we attempt to analyze the novel, and for the rst time, biomarkers related to immune cell in ltration and explore in ectopic endometria of EM patients and explore its clinical characteristic using multi-bioinformatic analysis.
2. Methods 2.1 Data collecting. The gene expression microarrays of GSE141549 downloaded from the GEO database, was based on GPL10558(Illumina HumanHT-12 V4.0 expression beadchip) and GPL13376(Illumina HumanWG-6 v2.0 expression beadchip) platforms and has been initially corrected to move the batch effects and combined (18). It included 408 samples obtained from healthy and patient endometrium, peritoneum, and patient endometriosis lesions. In this study, all enrolled samples we further analyzed were obtained from ectopic lesions (EC) of endometriosis patients and normal endometria (NE) of healthy peoples without hormonal medication, including 102 EC samples and 32 NE samples. The raw data can be availably downloaded and analyzed from NCBI-GEO.
2.2 Assessment of immune cell in ltration. The gene expression matrix was uploaded to the online tool named xCell(https://xcell.ucsf.edu/) (14). The "Rooney signature (N=7)" column was chosen to calculate the proportions of 7 types of in ltration immune cells in EC and NE samples, including B cells, CD4+T cells, CD8+T cells, DC cells, macrophages, NK cells, neutrophils.
2.3 Construction of the co-expression network and identi cation of the hub module related to immune cells. The gene co-expression networks of EC samples were constructed by the WGCNA package, and the proportion of immune in ltrated cells was input as WGCNA trait data to identify the hub module (15). The Pearson test (P < 0.05) was used to calculate the correlation between module eigengenes and immune cells, and the module most relevant to immune cells was selected and de ned as the hub module.
2.4 Screening of the differentially expressed genes (DEGs). The downloaded matrix was used to screen the differentially expressed genes (DEGs) of EC samples with the limitation of |log2 fold change (FC)|> 1 and an adjusted P-value of < 0.05 by the limma package. In addition, the differential expression of those hub genes in EC and NE samples were analyzed by the Wilcoxon test and visualized using the ggplot2 package.
2.6 Identi cation of immune in ltration related biomarkers for EM. The hub genes were recognized using the Venn diagrams of the overlapping genes identi ed from the hub module, DEGs and immune related genes (IRGs). IRGs were downloaded from the ImmPort database, which disseminates data to the public for the future of immunology (17).

Results
3.1 Gene expression signatures of the ectopic lesion of EM. The analysis procedures of this study were shown in Figure 1A. There was a total of 134 endometrial samples and 19746 genes in the GSE141549 pro le. Boxplot of this dataset was shown in Figure 1B. A total of 816 DEGs identi ed from EC and NE samples, consisting of 334 downregulated and 482 upregulated genes ( Figure 2A). GO and KEGG analyses of DEGs indicated that complement and coagulation cascades, cytochrome P450 were signi cantly enriched ( Figure 2B-C), which indicated that the dysfunction of the immune system and in ammation is connectively associated with the ectopic lesions of endometriosis.
3.2 Signature of the immune in ltration of EM. 7 types of in ltrating immune cells in the endometrium tissue were estimated by the xCell algorithm and showed in a heatmap and a violin diagram. As shown in the heatmap ( Figure 3A), macrophages and neutrophils were the main in ltrating immune cells in the endometrium tissue. In the violin diagram ( Figure 3B), the composition of CD4+T cells along with B cells, macrophages and neutrophils were signi cantly higher in EC samples than NE samples, while NK cells were signi cantly lower than that of controls. The Pearson test (p<0.05) was then used to evaluate the correlation among 7 types of immune cells ( Figure 3C). Most of the immune cells in the endometrium tissue were highly positive correlated, which indicated that immune in ltration is so interconnected. For example, neutrophils had the signi cantly high relationship with macrophages (R 2 =0.76, p<0.05) and B cells (R 2 = 0.57, p<0.05).
3.3 Construction of WGCNA and Identi cation of the hub module. The top 30% variation coe cient of genes (5924 genes) in 102 EC samples and the composition of 7 immune in ltrated cells corresponding to 102 EC samples were input to establish the co-expression networks and identify the hub module related to immune cells using the WGCNA package of R ( Figure 4A). When the soft-thresholding power was 8, a scale-independent topological network was established ( Figure 4B). Genes were classi ed into different modules using the dynamic hybrid cutting method, and the minimum module size cut-off value was 100. A total of nine gene modules were ultimately identi ed with the dynamic tree cutting method which was enforced for the construction of a hierarchical clustering tree by splitting the dendrogram at relevant transition points and the modules with a difference < 0.25 were combined ( Figure 4C).
3.4 Functional analyses of the hub module. The PPI network of the hub module was constructed, and 16 central nodes were selected with the connective degree >20 ( Figure 5A). Furthermore, the biological functions of the hub module were annotated. According to GO and KEGG analysis, most of the genes were involved in allograft rejection, antigen processing and presentation, cell adhesion molecules, lysosome, phagosome ( Figure 5B-C).
3.5 Identi cation and validation of immune in ltration related biomarkers for EM. There were 1793 IRGs downloaded from the ImmPort database. In the Venn diagram ( Figure 6A), there were 7 shared genes (LEP, C3, SLPI, S100A8, TNFSF13B, IL7R, CSF1R) recognized by the results of brown module, DEGs and IRGs mentioned above and put into the string database. Cause the immune response is so interconnected, the hub genes were de ned as the genes interacted with each other among 7 shared genes mentioned above, including TNFSF13B, IL7R, CSF1R, LEP ( Figure 6B). In Figure 6C, the expression of those hub genes in EC were signi cantly higher than that of controls. The independent validation pro les GSE7305 and GSE23339 were also downloaded from the GEO database and separately contained 10 paired EC and NE samples, 10 EC and 9 NE samples (22,23). The expression of those 4 hub genes in GSE7305 and GSE23339 was signi cantly consistent with our result (Supplemental Figure 1A 3.6 Identi cation of clinical characteristics of hub genes. The AUC of those hub genes to distinguish endometriosis patients and normal people were shown in Figure 7A. The AUC of 4 hub genes was higher than 0.8, with LEP being the maximum 0.906 and CSF1R being the minimum 0.834. The independent dataset GSE120103 contained 9 paired eutopic endometrium tissue samples of fertile (EM-F) or infertile (EM-IF) women undergoing endometriosis (24). As shown in Figure 7B, the expression of TNFSF13B, IL7R and LEP in EM-IF was signi cantly higher than that of in EM-F, while CSF1R was much lower. There we further inferred those 4 hub genes may be also connectively associated with the common complication infertility of EM. Figure   8A. The results indicated that hub genes were positively correlated with the composition of immune cells except NK cells. For example ( Figure 8B-C), TNFSF13B was positively related to B cells (R2=0.60, p<2.2e-16) and CSF1R was positively related to macrophages (R2=0.79, p<2.2e-16).

Discussion
Endometriosis is a common gynecological disease whose origin and pathogenesis are yet unknown. The widely accepted theory is the retrograde menstruation theory, which was proposed by Sampson that menstrual debris could pass through the fallopian tubes and implant into the abdominal cavity in a retrograde way. While retrograde menstruation occurs in about 90% of menstruating women, the prevalence of endometriosis is only about 6 to 10% of women of reproductive age (3,25). It indicated that retrograde menstruation theory does not adequately explain the pathogenesis of this disease. Therefore, there must be other possible explanations for it.
A well-functioning immune system could eliminate ectopic endometrial cells from the peritoneal cavity.
New ndings on the genetics, immunological dysfunction, intrinsic abnormalities, and secreted products of endometriotic lesions are associated with the reduced clearance of endometrial fragments and could facilitate the persistence of endometriosis (25).
Through comprehensive bioinformatic analyses of common dataset GSE141549, we compared the immune in ltration between EC and NE samples by XCell algorithm. The results showed that macrophages and neutrophils were the main in ltrating immune cells in the endometrium tissue, and the composition of CD4+T cells along with B cells, macrophages and neutrophils were signi cantly higher in EC samples than NE samples, while NK cells were signi cantly lower than that of controls. The gene coexpression networks were constructed through WGCNA package, and the composition of immune in ltrated cells was selected as WGCNA trait data. A total of nine gene modules were ultimately identi ed with the dynamic tree cutting method. The brown module, consisting of 803 genes, was the highest related to most of immune cells like macrophages and neutrophils. A total of 816 DEGs, consisting of 334 downregulated and 482 upregulated genes, were identi ed from EC and NE samples with the limitation of |log2 fold change (FC)|> 1 and an adjusted P-value of < 0.05 by the limma package. The GO and KEGG functional analysis of DEGs indicated that the dysfunction of the immune system and in ammation is connectively associated with the ectopic lesions of endometriosis. The brown module, DEGs and IRGs mentioned above were put into the Venn diagram and STRING database. The overlapping genes interacted with each other were de ned as the hub genes related to immune in ltrated of EM, including TNFSF13B, IL7R, CSF1R and LEP. And those hub genes were positively correlated with the composition of most immune cells.
The expression of those hub genes in EC were signi cantly higher than that of controls. And the AUC of 4 hub genes to distinguish endometriosis patients and normal people were all higher than 0.8, with LEP being the maximum 0.906 and CSF1R being the minimum 0.834. The independent datasets GSE7305 and GSE23339 were used to validate the hub genes, and the results were completely consistent with our results mentioned above. In independent dataset GSE120103, the expression of TNFSF13B, IL7R and LEP in EM-IF was signi cantly higher than that of in EM-F, while CSF1R was much lower. There we further inferred those 4 hub genes were also connectively associated with the common complication infertility of EM.
The protein encoded by TNFSF13B is a cytokine that belongs to the tumor necrosis factor (TNF) ligand family. This cytokine, acts as a potent B lymphocyte stimulator (BlyS), is produced by macrophages, and necessary for normal B cell differentiation and proliferation (26). Notably, high levels of BlyS overstimulate various B cell responses, leading to other conditions such as autoimmune diseases, allergic diseases, infections and malignancies (27). There was also indicated that the -817C⁄T in the promoter region of this gene possibly associated with idiopathic infertility in Brazilian population (28).
The protein encoded by IL7R is a receptor for interleukin 7 (IL7). IL7 and IL7R are crucial for innate lymphoid cell development and maintenance and are also implicated in autoimmune and chronic in ammatory diseases, as well as in cancer (29).There are also emerging reports of IL7R signaling contributing to the progression of lymphoid malignancies such as T cell acute lymphoblastic leukemia and its effective anti-IL7R targeting antibody therapies (30).However, the role of IL7R in endometriosis disease remains unexplored.
The protein encoded by CSF1R is the receptor for colony stimulating factor 1 (CSF1), a cytokine which controls the production, differentiation, and function of macrophages. This receptor mediates most of the biological effects of this cytokine. CSF1 interaction with CSF1R has been implicated in the growth, invasion, and metastasis of several types of cancer, including breast and endometrial cancers (31,32). In our study, the expression of CSF1 was signi cantly higher in EC than NE samples. Increased CSF1R levels has been implicated in the pathogenesis of endometriosis. It is also suggested that endometrial tissue involved in lesion formation is highly responsive to CSF-1 signaling (33,34).
Leptin (LEP) encodes a protein that is secreted by white adipocytes into the circulation and plays a major role in the regulation of energy homeostasis. Apart from its metabolic properties, leptin is also involved in the regulation of immune and in ammatory responses, hematopoiesis, angiogenesis, and reproduction (35)(36)(37). Increased levels of leptin have been identi ed in peritoneal uid from primary infertility women with endometriosis. In addition, the correlation between leptin levels and the endometriosis stage are signi cantly positive (38, 39).
In a summary, our study indicated that the immune in ltration was highly associated with the the ectopic lesions of EM. And the immune related biomarkers (TNFSF13B, IL7R, CSF1R, LEP) may be helpful in the development of the pathogenesis, diagnosis and even the potential therapeutic targets of EM. However, there is still a certain degree of limitation existing in our study. To the best of our knowledge, further investigations of these hub genes are needed to identify the molecular mechanism and to validate their immunotherapy effect on EM.

Conclusions
In the present study, it was showed that macrophages and neutrophils were the main in ltrating immune cells in the endometrium tissue and 4 biomarkers related to immune in ltration in ectopic endometria of EM patients were identi ed using WGCNA and xCell algorithm, including TNFSF13B, IL7R, CSF1R and LEP. In the independent validated datasets, the expression of those hub genes was upregulated and consistent with our results. The area under the ROC curve (AUC) of 4 hub genes to distinguish EM and normal people were all higher than 0.8. And the expression of 4 hub genes was signi cantly concerned with the common complication infertility of EM.

Declarations
Ethics approval and consent to participate: Not appliable. Consent for publication: All authors have agreed to the publication of the work.
Availability of data and materials: Data used or analyzed during the current study are available from the corresponding author on reasonable request. The pro les analyzed in this study were downloaded from the online repositories.
Competing interests: there is no con ict of interest that I should disclose.
Author Contributions: Ke Zhang designed the experimental apparatus, discussed the results and implications, and commented on the manuscript at all stages. Xiao xie and Lihao Zou performed the experiments. The research direction was provided by Suiqun Guo.