Identification of differently expressed ARGs
RNA-seq and clinical data of 552 EC tissue samples and 23 non-tumor samples were downloaded from TCGA. With |log2(Fold Change)|>1 and FDR < 0.05, we finally obtained 22 up-regulated and 18 down-regulated ARGs between EC and non-tumor tissues (Fig. 1A, 1B). Box plots were visualized to display the expression patterns of the 40 differentially expressed ARGs, including 22 up-regulated genes (APOL1, ATG4D, BAK1, BAX, BID, BIRC5, BNIP3, CASP3, CDKN2A, CTSB, DAPK2, EIF4EBP1, ERBB2, GAPDH, IFNG, IKBKE, P4HB, PARP1, PTK6, SERPINA1, TP63, WIPI1 ) and 18 down-regulated genes (BCL2, CALCOCO2, CDKN1B, DLC1, FOS, FOXO1, GABARAPL1, GRID1, HSPB8, ITPR1, MYC, NRG2, PPP1R15A, PRKAR1A, RAB33B, ST13, TUSC1, VAMP3) (Fig. 1C).
Functional annotation of the differentially expressed ARGs
According to the results of Enrichr database, in the aspect of biological processes, the genes were mostly enriched in apoptotic process, intrinsic apoptotic signaling pathway and intrinsic apoptotic signaling pathway in response to endoplasmic reticulum stress. In the aspect of cellular components, the genes were mostly enriched in mitochondrial outer membrane, mitochondrion, and autophagosome. In the aspect of molecular function, the genes were mostly enriched in protein heterodimerization activity, protein serine/threonine kinase inhibitor activity, and protein phosphatase binding. Besides, we also found the differentially expressed ARGs were notably associated with apoptosis, pathways in cancer, autophagy, Kaposi sarcoma-associated herpesvirus infection and measles in the KEGG pathway enrichment analysis (Additional file 1).
Construction of a risk score formula in the train set
The relationships between the expression level of the 40 differentially expressed ARGs and overall survival (OS) were evaluated based on the data obtained from TCGA. Firstly, we integrated autophagy-related gene expression profiles and clinical follow-up information to screen out 516 EC samples. Then the entire set (516 samples) were randomly separated into the train set and the test set in a ratio of 6 to 4. Hence, the train set contained 312 samples and the test set contained 204 samples. Using the expression files of the train set, the univariate Cox regression, LASSO Cox regression and multivariate Cox regression analyses were performed (Additional file 2). A prognostic model using the expression of CDKN1B, DLC1, EIF4EBP1, ERBB2 and GRID1 was constructed (Table 2). The risk score of EC prognosis was quantified by the following formula: risk score = CDKN1B * (0.016941) + DLC1 * (-0.33079) + EIF4EBP1 * (0.003577) + ERBB2 * (0.001271) + GRID1 * (0.733501). It was noticed that among these five ARGs, just DLC1 had a negative coefficient. Totally, the expressions of CDKN1B, EIF4EBP1, ERBB2 and GRID1 were negatively related to the OS of EC patients, while that of DLC1 was positively related to OS.
Table 2
The most prognosis-related five autophagy-related genes.
gene | coef | HR | HR.95L | HR.95H | pvalue |
CDKN1B | 0.016941 | 1.017085 | 0.994197 | 1.040501 | 0.144618 |
DLC1 | -0.33079 | 0.718355 | 0.542355 | 0.951468 | 0.02106 |
EIF4EBP1 | 0.003577 | 1.003584 | 1.000417 | 1.00676 | 0.026523 |
ERBB2 | 0.001271 | 1.001272 | 1.000166 | 1.002379 | 0.024188 |
GRID1 | 0.733501 | 2.082357 | 1.20684 | 3.59303 | 0.008402 |
Relationships between the risk score and OS in the train set
Based on these five ARGs, we could calculate the risk score for each patient. Survival analysis was performed and a dichotomous score was adopted. Patients in the train set were then stratified into high-risk (n = 156)and low-risk groups༈n = 156༉ based on the median risk score. The risk score and survival status predicted by the prognostic model are displayed in Fig. 2A-C. The survival rate of the patients in the low-risk group was significantly higher than that in the high-risk group (P = 4.351e-04) (Fig. 2D). Time-dependent ROC analysis demonstrated that the prognostic accuracy of the risk score was 0.706 at 1 year, 0.742 at 3 years, and 0.655 at 5 years, indicative of its good performance in predicting EC prognosis (Fig. 2E). Principal component analysis displayed a different distribution pattern of high and low risk according to 5 autophagy-related gene expression based on the train set (Fig. 2F). Univariate and multivariate Cox regression analyses showed that the model can efficiently predict EC prognosis with ARG risk score as an independent indicator (Fig. 5A, B).
Validation of the risk score in the test set
The test set was applied to validate the predictive power of the five ARGs. Based on the cut-offs in the train set, 204 samples in the test set were divided into low-risk (n = 107) and high-risk groups (n = 97) by using the same risk score formula. Figure 3A-C also showed the risk score and survival status. In consistent with the findings described above, low-risk patients had significantly longer overall survival than high-risk patients (P = 4.398e-02) (Fig. 3D). ROC curve analysis showed that the specificity and sensitivity were highest when the risk score was 0.627 at 1 year, 0.66 at 3 years, 0.67 at 5 years, according to the value of the area under the receiver operating characteristic curve (AUC) (Fig. 3E). Principal component analysis displayed a different distribution pattern of high and low risk according to 5 autophagy-related gene expression based on the test set (Fig. 3F). Likewise, the independency of the prognostic model was also confirmed by univariate and multivariate Cox regression analyses (Fig. 5C, D).
Validation of the risk score in the entire set
We further validated our risk score in the entire set (516 samples). Based on the cut-offs in the train set, the samples were stratified into low-risk group(n = 263)and high-risk group༈n = 253༉. Figure 4A-C also showed the risk score and survival status. KM plot indicated that patients in high-risk group and low-risk group showed significantly different outcomes (P = 3.335e-05) (Fig. 4D). The 1-year, 3‐year, and 5‐year prognostic accuracy of the prognostic model was 0.674, 0.712 and 0.659, respectively (Fig. 4E). Principal component analysis displayed a different distribution pattern of high and low risk according to 5 autophagy-related gene expression based on the entire set (Fig. 4F). The risk score and clinical factors were incorporated into univariate and multivariate Cox regression analyses. The results demonstrated that the ARG risk score in the model can serve as an independent prognostic indicator (Fig. 5E, F). These conclusions supported our speculation that autophagy affects the prognosis of EC.
Relationships between the risk score and clinicopathologic factors
We finally built a complete prognostic model based on the entire set. This model showed a good performance in stratification in clinical stage I-II & III-IV, grade G3&G4, peritoneal wash, cancer status, radiation therapy, minimally invasive surgical approach and endometrioid histological type (Fig. 6). In parallel to the results above, high-risk group in both subgroups was inclined to have worse OS. Meanwhile, relationships were analyzed between the ARGs-based risk score and clinicopathologic factors, including age, grade, clinical stage, peritoneal wash, radiation therapy, surgical approach, cancer status and histological type. Risk score was significantly higher in seniors, poorly differentiated cases, and mix or serous histological type cases (Additional file 3). We also explored the clinical significance of included genes (Table 3).
Table 3
Relations between the expression of the autophagy-related genes and the clinicopathological factors in endometrial cancer.
Genes | Age (> 60/≤60) | Clinical stage (G1-G2/G3-G4) | Grade (I-II/III-IV) | Peritoneal wash (wash negative/positive) | Cancer status (with tumor/tumor free) | Radiation therapy (yes/no) | Surgical approach (minimally invasive/open surgery) | Histological type (endometrioid/ mixed&serous) |
t | p | t | p | t | p | t | p | t | p | t | p | t | p | t | p |
CDKN1B | -0.119 | 0.906 | -0.623 | 0.534 | -6.612 | < 0.001 | -0.55 | 0.584 | -1.808 | 0.075 | -0.591 | 0.555 | 1.336 | 0.183 | -3.337 | 0.001 |
DLC1 | 4.998 | < 0.001 | -0.024 | 0.981 | 3.947 | < 0.001 | 1.019 | 0.313 | 3.575 | < 0.001 | 1.234 | 0.218 | 0.555 | 0.580 | 4.223 | < 0.001 |
EIF4EBP1 | 0.141 | 0.888 | -1.302 | 0.196 | -2.551 | 0.011 | 1.209 | 0.231 | -0.952 | 0.346 | 0.877 | 0.381 | -0.068 | 0.946 | 0.05 | 0.960 |
ERBB2 | -2.346 | 0.020 | -1.681 | 0.097 | -2.576 | 0.011 | -1.98 | 0.055 | -0.737 | 0.464 | -0.364 | 0.716 | 0.116 | 0.908 | -2.458 | 0.017 |
GRID1 | 2.203 | 0.029 | -1.352 | 0.179 | 1.092 | 0.276 | 0.425 | 0.672 | 0.281 | 0.779 | 0.202 | 0.840 | -0.668 | 0.505 | -0.681 | 0.497 |
Nomogram and its clinical utility
To provide clinicians with a quantitative method to predict the overall survival of EC patients, a nomogram was constructed incorporating the risk score and clinical factors (Fig. 7A). The calibration curve of 1-year, 3-year and 5-year survival indicated that the nomogram was almost an ideal model (Fig. 7B). ROC curve analysis also showed that in this model, the risk score AUC was 0.794, and the clinical factors AUC was 0.851, both much significantly higher than that of patient age (AUC = 0.621), clinical stage (AUC = 0.701), grade (AUC = 0.691), peritoneal wash (AUC = 0.726), cancer status (AUC = 0.760) and histological type (AUC = 0.746). Interestingly, the ROC curve of risk score combined with clinical factors was much higher than that of each alone (AUC = 0.859) (Fig. 7C). These further validated the reliability of our previous prognostic model.
External validation of the risk score
The representative protein expression of CDKN1B, DLC1, EIF4EBP1 and ERBB2 was explored in the Human Protein Profiles (Fig. 8A). However, we failed to find GRID1 on the website. ERBB2 possessed the most genetic alterations (11%), amplification being the commonest (Fig. 8B). Moreover, ROC curve analysis found that the key genes had strong capacity to distinguish tumor from normal tissues, and the combined use of these ARGs was more effective (AUC = 0.997) (Fig. 8C).
3.10 Gene set enrichment analysis
We performed GSEA to identify ARGs related pathways. The most enriched pathways in high-risk groups included “Cell cycle”, “Citrate cycle tca cycle”, “DNA replication”, “Homologous recombination”, “Mismatch repair” and “Oxidative phosphorylation”, which implied that the five ARGs are related to EC progress (Additional file 4).
Somatic mutation analysis
The top ten mutated genes in different risk groups were displayed in the Fig. 12. The order of somatic mutations in the high-risk group was as follows: TP53 > PTEN > PIK3CA > TTN > ARID1A > KMT2D > PIK3R1 > MUC16 > ZFHX3 > FAT4 (Fig. 9A). In the low-risk group, the order was: PTEN > ARID1A > PIK3CA > TTN > PIK3R1 > CTNNB1 > CTCF > KMT2D > MUC16 > CSMD3 (Fig. 9B). Besides, we found that the patients in the high-risk group showed significant mutation signatures, compared with those in the low-risk group (Fig. 9C, D). It was noticed that TP53, PTEN, CTNNB1, DOCK3, ARID1A, ARHGAP35, FAT4, CTCF and FBXW7 were mutated differentially between the high- and low-risk groups (P < 0.05) (Fig. 9E).