DOI: https://doi.org/10.21203/rs.3.rs-1895462/v1
Endometrial cancer (EC) identified in pelvic/para-aortic lymph nodes suggests a poorer prognosis. Literature focusing on the role of epithelial-mesenchymal transition (EMT) in lymph node metastasis (LNM) remains few.
Transcriptional data were acquired from TCGA database. Patients with stage IA-IIIC2 EC were included, constituting the LN positive and negative groups. To evaluate the extent of EMT, an EMT signature composed of 315 genes was adopted. EMT-related genes (ERGs) were obtained from dbEMT2 database, differentially expressed ERGs (DEERGs) of which between the two groups were screened out. On the basis of DEERGs, pathway analysis was carried out. We finally used the logistic regression model to build an ERG-based gene signature with diagnostic value for LNM in EC.
A total of 498 patients were included, with 75 in the LN positive group. Median EMT score of tumor tissues from LN negative group was − 0.369, while that from LN positive group was − 0.296 (P < 0.001), suggesting a more mesenchymal phenotype for LNM cases on the EMT continuum. By comparing the expression profiles, 266 genes were identified as DEGs, in which 184 were upregulated and 82 were downregulated. In pathway analyses, various EMT-related pathways were enriched. DEERGs shared between molecular subtypes were relatively few. The ROC curve and logistic regression analysis screened 7 genes with the best performance to discriminate between the LN positive and negative group, i.e. CIRBP, DDR1, F2RL2, HOXA10, PPARGC1A, SEMA3E and TGFB1. A logistic regression model including the 7-genes based risk score, age, grade, myometrial invasion and histological subtype was built, with an AUC of 0.850 and a good calibration (P = 0.074). In a validation dataset composed of 83 EC patients, the model exhibited a promising diagnostic value and was well calibrated (P = 0.42).
The EMT status and expression of ERGs varied in EC tissues from LN positive and negative cases, involving multiple EMT-related signalling pathways, and the distribution of DEGs differed among molecular subtypes. An ERG-based gene signature including 7 DEERGs exhibited a potential application value.
In 2022, there has been an incidence of 84,520 of uterine corpus cancer in China and 17,543 new deaths, where endometrial cancer (EC) accounts for the vast majority. (1) EC identified in pelvic/para-aortic lymph nodes, even as micrometastases, is associated with an increase in staging to at least IIIC1/2 and suggests a poorer prognosis. (2, 3) Lymph node metastasis (LNM) of cancer is a complex process including lymphangiogenesis around/in the primary tumor, activation of epithelial-mesenchymal transition (EMT) to initiate metastasis and survival in the LN (4–12). A substantial proportion of studies on EMT in EC have been published (13–17), but literatures focusing on the role of EMT in LNM remain few (18).
Here, we presented a retrospective study based on the EC database from the cancer genome atlas (TCGA) to discuss the role of EMT in LNM, exploring differentially expressed EMT-related genes (DEERGs) and pathways associated with them, and providing a ERG-based gene signature with diagnostic value for LNM in EC.
We extracted mRNA expression profiles from the Uterine Corpus Endometrial Carcinoma (TCGA, PanCancer Atlas) database through cBioPortal (19, 20). Specially, tumor grade and microsatellite status were obtained from the Xena platform (21). For tumor grade, non-endometrioid EC was categorized into G3. Patients with FIGO 2009 stage IA-IIIC2 EC were included in this study, and stage IIIC1/2 were considered as stage IA-IIIB with pelvic/para-aortic LNM, constituting two groups (LN positive and LN negative).
To quantitively locate a tumor tissue on the EMT continuum, a validated transcriptome-based scoring system composed of 315 genes (Epithelial: 145, Mesenchymal: 170) was adopted. An EMT score was calculated by the maximum vertical distance between the empirical distribution function (ECDF) curve of the mesenchymal and the epithelial gene set by two-sample Kolmogorov-Smirnov test (i.e. the KS method). The EMT score of a tumor with a higher expression of mesenchymal signatures would be defined as positive, and negative for epithelial ones. Thus, an EMT score in the KS method had a range of [-1, 1]. (22, 23) (Supplementary Fig. 1).
For differential expression analysis, 1027 ERGs were obtained from an EMT gene database, dbEMT2 (24), and DEERGs between the LN positive and negative groups were screened out by the ‘limma’ package. (25) A false discovery rate (FDR) < 0.05 and |logFC| > 1 were defined as the significance threshold. On the basis of DEERGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were carried out. The STRING database was adopted to construct a protein–protein interaction (PPI) network to visualize the relationships among DEERGs, where 0.4 was defined as the minimum required interaction score, and genes with node degree > 15 were considered hub genes. (26)
To screen candidate genes to construct a risk score for LNM, we adopted the pROC package to evaluate predictive value of them. (27) An area under the curve (AUC) > 0.7 was considered acceptable for further evaluation. The logistic regression model was built to evaluate the predictive value of the genes screened. The Hosmer and Lemeshow test was used to test the calibration of the model. We also introduced an EC database composed of 95 patients from Clinical Proteomic Tumor Analysis Consortium (CPTAC) for external verification. (28)
All the analyses were performed by R software (version 4.1.1). Besides statistical methods mentioned above, Wilcoxon test and Chi-square test were used to compare continuous and category variables, respectively. A P < 0.05 was considered statistically significant.
Clinicopathological characteristics of the 498 patients included in this study were as Table 1, with 15.1% patients exhibited LNM. Tumor grade, histological subtype, myometrial invasion (MI) and molecular subtype were significantly associated with LNM (P < 0.001).
Variable | LN negative (n = 423) | LN positive (n = 75) | |
---|---|---|---|
Age (years), mean ± SD | 63.8 ± 11.1 | 63.3 ± 10.4 | |
Race, n (%) | White | 294 (69.5%) | 45 (60.0%) |
Black or African American | 78 (18.4%) | 21 (28.0%) | |
Asian | 17 (4.0%) | 2 (2.7%) | |
Others | 8 (1.9%) | 3 (4.0%) | |
Histology, n (%) | Grade1 | 95 (22.5%) | 2 (2.7%) |
Grade2 | 109 (25.8%) | 8 (10.7%) | |
Grade3 | 219 (51.8%) | 65 (86.7%) | |
Histological subtype, n (%) | Endometrioid | 350 (82.7%) | 35 (46.7%) |
Non-endometrioid | 73 (17.3%) | 40 (53.3%) | |
MI, n (%) | < 1/2 | 285 (67.4%) | 26 (34.7%) |
≥ 1/2 | 138 (32.6%) | 49 (65.3%) | |
Molecular subtype, n (%) | POLE | 39 (9.2%) | 8 (10.7%) |
MSI-H | 134 (31.7%) | 14 (18.7%) | |
CNL | 139 (32.9%) | 10 (13.3%) | |
CNH | 103 (24.3%) | 42 (56.0%) | |
Missing | 8 (1.9%) | 1 (1.3%) | |
Abbreviations: LN: lymph node; SD: standard deviation; EEC: endometrioid endometrial cancer; MI: myometrial invasion; MSI-H: microsatellite instability-high; CNL: copy-number low; CNH: copy-number high |
A total of 314 ERGs (144 Epi and 170 Mes) were included to evaluate the EMT status, with a pseudogene OR7E14P excluded due to its absence in the TCGA database.
Consistent with the epithelial origin of EC, general expressions of ERGs revealed a more epithelial phenotype [median − 0.361, IQR (-0.449, -0.264)] in the whole cohort. The median EMT score of the LN negative group was − 0.369 [IQR (-0.455, -0.282)], while that of the LN positive group was − 0.296 [IQR (-0.449, -264)], with a P < 0.001 by Wilcoxon test, suggesting a more mesenchymal phenotype of tumors with LNM on the EMT continuum (Fig. 1). Further subgroup analysis revealed that this phenotypic difference was more significant in G3, MI ≥ 1/2 and POLE cases (Supplementary Fig. 2).
By comparing mRNA expression profiles of tumor tissues from 423 LN negative cases and 75 LN positive cases, 266 genes were identified as DEERGs, in which 184 were upregulated and 82 were downregulated. Expressions of ERGs were displayed as Supplementary Figs. 3 and Supplementary table 1.
GO enrichment terms based on DEERGs were as Fig. 2. In the biological process (BP) group, various EMT-related processes were enriched (Supplementary table 2), e.g. mesenchyme development (q = 1.49×10− 15), mesenchymal cell differentiation (q = 3.02×10− 14), epithelial to mesenchymal transition (q = 1.97×10− 12), cellular response to transforming growth factor beta stimulus (q = 4.14×10− 12), response to transforming growth factor beta (q = 6.37×10− 12), exhibiting a propensity for the enrichment of TGF-β signalling pathway. A directed acyclic graph (DAG) displaying the relationship between the BPs enriched was as Supplementary Fig. 4.
KEGG pathway analysis exhibited a significantly enrichment in EMT-related pathways, e.g. Notch signaling pathway, JAK-STAT signaling pathway, HIF-1 signaling pathway, TGF-β signaling pathway and Wnt signaling pathway, most parts of which showed an upregulation in the LNM cases (Fig. 3, Supplementary table 3).
The PPI network of DEERGs based on STRING database was presented in Supplementary Fig. 5. The top 5 hub genes calculated by the EPC method were MYC, ESR1, KRAS, MAPK3 and HDAC1. We also summarized hub genes by each method in Supplementary table 4.
As is shown in Table 1, molecular subtypes of 96.8% patients were available, and CNH cases were more likely to develop LNM than the others (χ2 = 30.5, P < 0.001). We then investigated DEERGs associated with LNM in cases with molecular subtypes (n = 491, Supplementary table 5). A Venn graph displaying DEERGs shared between subtypes was as Fig. 4, where DEERGs differed greatly between subtypes. No DEERGs were shared between all subtypes, and only HOXB7, a gene encoding an activator of TGF-β pathway, was shared by CNL, MSI-H and POLE.
Based on ROC curves adopted to discriminate between LN positive and negative groups, no DEERG exhibited an AUC > 0.7 (Supplementary table 6). Thus, a logistics regression model was constructed to screen candidate genes by including DEERG expression, age, race, grade, MI and histological subtype, ROC curves from which were used to screen target genes. As a result, the median AUC was 0.802 (IQR 0.801–0.805), suggesting a relatively high predictive value of the new models. To refine the candidate genes, genes from models with an AUC > 0.7 and P < 0.05 were selected for further analysis. Subsequently, 29 genes were selected (Supplementary table 7), which were included in a further logistic regression model, and finally 7 genes with the best performance were selected: CIRBP, DDR1, F2RL2, HOXA10, PPARGC1A, SEMA3E, TGFB1 (Supplementary table 8). An EMT-based risk score (ERS) was calculated as the logit from the logistic regression as follows: 1.845 + (0.323×CIRBP-0.096×DDR1-1.381×F2RL2-0.190×HOXA10-2.348×PPARGC1A + 1.137×SEMA3E-0.569×TGFB1) × 10− 3. Then we constructed a new logistic model to predict LNM with ERS, race, age, grade, MI and histological subtype. A further stepwise algorithm filtered out race, reaching an AUC of 0.850 (Fig. 5A).
A Nomogram based upon the above model was presented in Fig. 6, suggesting ERS had the highest weight in the model compared with traditional clinicopathological parameters. A Hosmer-Lemeshow test exhibited a good calibration for this model (χ2 = 14.3, P = 0.074), with a calibration curve in Supplementary Fig. 6.
For external validation of the gene signature, we extracted mRNA expression profile of EC tissue from CPTAC program, from which 83 cases identified as stage IA-IIIC2 were included. A similar logistic regression model was built (Supplementary table 9), with an AUC of 0.908 (Fig. 5B) and an ideal calibration (χ2 = 3.9, P = 0.42).
LNM had an incidence of 18.1% in EC, which was quite difficult to predict in clinical practice. (3) In our institution, patients with early stage EC with MI ≥ 1/2, G3 or non-endometrioid histology, or with advanced stage EC, used to undergo a staging surgery including pelvic and para-aortic lymphadenectomy, with sentinel lymphadenectomy (SLND) optional for patients with relatively low risk. However, non-therapeutic systemic lymphadenectomy still existed in most cases (71.4%, unpublished data), resulting in an increase in the complication morbidity, e.g. blood loss, lymphatic cyst and lower limb edema (29). Meanwhile, variables used to determine lymphadenectomy were mostly those significantly related to LNM. The molecular subtype gained wide attention due to its stronger correlation with LNM (3), but was not widely adopted for lymphadenectomy yet. LNM is a complex biological process where EMT plays a central role in the invasion of lymph vessel and survival in LNs (30, 31), literatures focusing on the role of EMT in LNM in EC remains few (18). Our team is committed to explain and predict LNM at the mechanism level. Therefore, we designed this first TCGA-based study combining transcriptomic data with conventional parameters to explore EMT-related molecular characteristics of LNM in EC.
We applied a well-validated EMT quantification method to describe a tumor sample on the EMT continuum. (23) Though unable to discriminate hybrid E/M samples from mixture of E&M ones, this model did strongly suggested samples from LNM cases tended to have a more mesenchymal ones, consistent with the association between EMT and LNM in literatures, which exhibited the feasibility of further exploration of this process in EC. Intriguingly, this difference in EMT score were more significant in G3 and MI ≥ 1/2 cases, suggesting an association between invasive phenotypes and the role of EMT in LNM. In all the molecular subtypes, LNM cases tended to have higher EMT scores, with POLE showing a statistical significance. This indicated a possible variation between subtypes regarding the extent of dependence on EMT during LNM. There is a blank of study on EMT in different molecular subtypes, which seems promising considering the abundant pathways and molecules in the EMT process. (32)
Among the 1027 ERGs included in this study, over 1/4 showed a differential expression between LNM and non-LNM cases, with multiple pathways being enriched, where TGF-β signalling pathway has been proven a potent inducer of EMT in EC (33). However, no published literature discussed the role of TGF-β pathway in LNM in EC. In the GO analysis based on DEERGs, various pathways related to TGF-β were enriched in the BP group. A similar result was reached by KEGG analysis, with other EMT-related pathways significantly enriched as well, members of which were mostly upregulated (Fig. 3). Hub genes suggested by PPI analysis, e.g. Myc, ESR1 and KRAS, were reported to participate in EMT pathways, i.e. Wnt, TGF-β, MAPK signalling pathways, etc. (34–39), and they also had most interactions with other DEERGs. These hub genes might be associated with the core mechanism and containment strategy of LNM in EC in the future.
Data from this study showed different LNM rate between molecular subtypes, especially for a significantly higher rate in CNH cases, consistent with previous reports (3). DEERGs shared between subtypes were relatively few, suggesting possibly different landscapes of LNM between molecular subtypes in EC, as stated above. Detailed pathway analyses were omitted due to a small sample size in subgroup analysis, and a prospective study on EMT in LNM and its relationship with molecular subtype in EC is being carried out in our institution (NSFC81972426).
As mentioned earlier, an accurate method calculating the LNM probability will contribute to a precise strategy for lymphadenectomy and adjuvant therapy. Several predictive models with good performances have been created, mostly based on clinicopathological parameters. (40, 41) But before the definitive surgery, molecular features of EC can be acquired through hysteroscopy sample as well (42), providing a chance to calculate the risk of LNM based on both clinical and molecular features. With logistic regression model, we finally reached a gene signature composed of 7 DEERGs, the new model based on which exhibited a good discrimination and calibration. It is noteworthy that according to the Nomogram, the ERS accounted for the majority of LNM risk, while traditional clinicopathological parameters used to guide lymphadenectomy in our institution accounted for only a small proportion of the risk. This partly explained the difficulty to predict LNM in clinical practice, i.e. a prediction model based on pathogenesis would be more precise. The value of the gene signature was further validated in a new EC database from CPTAC, suggesting that to determine this gene signature by tissue microarray (TMA) or genechip in hysteroscopy biopsies might contribute to the development of a most potent clinical prediction model for LNM in EC.
There are several limitations in our research. One is that due to the heterogeneity of transcriptome data in different databases, our study did not develop a clinical prediction model able to be adopted universally. But the 7-ERGs based gene signature did exhibited a diagnostic value independent of clinicopathological parameters across databases, supporting future applications. The other is that this study is limited to bioinformatics analysis and not verified by specimens from our institution. The above limitations can be remedied through developing a TMA or genechip with a unified standard. Meanwhile, there is a huge gap for laboratory research to fill on mechanisms of EMT to induce lymphatic spread in EC.
In summary, our study suggested the EMT status and expression of ERGs varied in LNM and non-LNM EC tissues, involving multiple EMT-related signalling pathways, and the distribution of DEGs differed among molecular subtypes. An ERG-based gene signature including 7 DEGs exhibited a good diagnostic value.
AUC: area under the curve, BP: biological process, CNH: copy-number high, CNL: copy-number low, CPTAC: Clinical Proteomic Tumor Analysis Consortium, DAG: directed acyclic graph, DEERG: differentially expressed ERG, EC: endometrial cancer, ECDF: empirical distribution function, EEC: endometrioid endometrial cancer, EMT: epithelial-mesenchymal transition, ERG: EMT-related gene, ERS: EMT-based risk score, GO: Gene Ontology, IQR: interquartile range, KEGG: Kyoto Encyclopedia of Genes and Genomes, LNM: lymph node metastasis, MI: myometrial invasion, MSI-H: microsatellite instability-high, PPI: protein–protein interaction, SD: standard deviation, SLND: sentinel lymphadenectomy, TCGA: the cancer genome atlas, TGF-β: transforming growth factor-beta, TMA: tissue microarray
Ethics approval and consent to participate
All data are from public databases and an ethics approval is waived.
Consent for publication
Not applicable.
Availability of data and materials
All data of this research are from public databases and available from the corresponding author on reasonable request.
Competing interests
The authors declare that they have no competing interests
Funding
This work was supported by the National Natural Science Foundation of China (81972426), Capital Health Research and Development of Special Fund (2022-2Z-4086) to Wang Zhiqi.
Authors’ contributions
LH analyzed the data and wrote the manuscript. LL, ZL & WZ revised the manuscript. WZ designed the research and made the final revision. All authors read and approved the final manuscript.
Acknowledgements
We thank Dr. Ding Li from the School of Life Sciences, Peking University, for his support in bioinformatics.