3.1 Gene expression signatures of the ectopic lesion of EM. The analysis procedures of this study were shown in Figure 1A. There was a total of 134 endometrial samples and 19746 genes in the GSE141549 profile. Boxplot of this dataset was shown in Figure 1B. A total of 816 DEGs identified from EC and NE samples, consisting of 334 downregulated and 482 upregulated genes (Figure 2A). GO and KEGG analyses of DEGs indicated that complement and coagulation cascades, cytochrome P450 were significantly enriched (Figure 2B-C), which indicated that the dysfunction of the immune system and inflammation is connectively associated with the ectopic lesions of endometriosis.
3.2 Signature of the immune infiltration of EM. 7 types of infiltrating immune cells in the endometrium tissue were estimated by the xCell algorithm and showed in a heatmap and a violin diagram. As shown in the heatmap (Figure 3A), macrophages and neutrophils were the main infiltrating immune cells in the endometrium tissue. In the violin diagram (Figure 3B), the composition of CD4+T cells along with B cells, macrophages and neutrophils were significantly higher in EC samples than NE samples, while NK cells were significantly lower than that of controls. The Pearson test (p<0.05) was then used to evaluate the correlation among 7 types of immune cells (Figure 3C). Most of the immune cells in the endometrium tissue were highly positive correlated, which indicated that immune infiltration is so interconnected. For example, neutrophils had the significantly high relationship with macrophages (R2=0.76, p<0.05) and B cells (R2= 0.57, p<0.05).
3.3 Construction of WGCNA and Identification of the hub module. The top 30% variation coefficient of genes (5924 genes) in 102 EC samples and the composition of 7 immune infiltrated cells corresponding to 102 EC samples were input to establish the co-expression networks and identify the hub module related to immune cells using the WGCNA package of R (Figure 4A). When the soft-thresholding power was 8, a scale-independent topological network was established (Figure 4B). Genes were classified into different modules using the dynamic hybrid cutting method, and the minimum module size cut-off value was 100. A total of nine gene modules were ultimately identified with the dynamic tree cutting method which was enforced for the construction of a hierarchical clustering tree by splitting the dendrogram at relevant transition points and the modules with a difference < 0.25 were combined (Figure 4C).
To determine the hub module, the correlation between the module and the composition of 7 immune cells was calculated using a Pearson test (P < 0.05). In nine modules, the brown module, consisting of 803 genes, was highest related to macrophages (R2 = 0.89, p = 5e-36), neutrophils (R2=0.73, p=2e-18), DC (R2= 0.62, p= 5e-12), B cells (R2=0.47,p= 5e-07), and CD8+ cells (R2=0.42, p=1e-05) and considered to be the hub module related to immune infiltration (Figure 4D).
3.4 Functional analyses of the hub module. The PPI network of the hub module was constructed, and 16 central nodes were selected with the connective degree >20 (Figure 5A). Furthermore, the biological functions of the hub module were annotated. According to GO and KEGG analysis, most of the genes were involved in allograft rejection, antigen processing and presentation, cell adhesion molecules, lysosome, phagosome (Figure 5B-C).
3.5 Identification and validation of immune infiltration related biomarkers for EM. There were 1793 IRGs downloaded from the ImmPort database. In the Venn diagram (Figure 6A), there were 7 shared genes (LEP, C3, SLPI, S100A8, TNFSF13B, IL7R, CSF1R) recognized by the results of brown module, DEGs and IRGs mentioned above and put into the string database. Cause the immune response is so interconnected, the hub genes were defined as the genes interacted with each other among 7 shared genes mentioned above, including TNFSF13B, IL7R, CSF1R, LEP (Figure 6B). In Figure 6C, the expression of those hub genes in EC were significantly higher than that of controls. The independent validation profiles GSE7305 and GSE23339 were also downloaded from the GEO database and separately contained 10 paired EC and NE samples, 10 EC and 9 NE samples (22, 23). The expression of those 4 hub genes in GSE7305 and GSE23339 was significantly consistent with our result (Supplemental Figure 1A-B).
3.6 Identification of clinical characteristics of hub genes. The AUC of those hub genes to distinguish endometriosis patients and normal people were shown in Figure 7A. The AUC of 4 hub genes was higher than 0.8, with LEP being the maximum 0.906 and CSF1R being the minimum 0.834. The independent dataset GSE120103 contained 9 paired eutopic endometrium tissue samples of fertile (EM-F) or infertile (EM-IF) women undergoing endometriosis (24). As shown in Figure 7B, the expression of TNFSF13B, IL7R and LEP in EM-IF was significantly higher than that of in EM-F, while CSF1R was much lower. There we further inferred those 4 hub genes may be also connectively associated with the common complication infertility of EM.
3.7 Correlation between biomarkers and immune infiltrated cells. The correlation among 4 hub genes (TNFSF13B, IL7R, CSF1R, LEP) and 7 types of immune infiltrated cells (B cells, CD4+T cells, CD8+T cells, DC cells, macrophages, NK cells, neutrophils) was analyzed by the Pearson test and presented in Figure 8A. The results indicated that hub genes were positively correlated with the composition of immune cells except NK cells. For example (Figure 8B-C), TNFSF13B was positively related to B cells (R2=0.60, p<2.2e-16) and CSF1R was positively related to macrophages (R2=0.79, p<2.2e-16).