Differentially expressed ARGs
We analyzed the expression of 222 ARGs in 552 endometrial cancer tissues and 35 non-tumor tissues using the Wilcoxon signed-rank test in R. Applying the criteria |log2FC| > 1 and FDR < 0.05, we obtained 37 differentially expressed ARGs: 19 up-regulated genes (PARP1, PRKCD, CTSB, APOL1, ATG4D, BNIP3, BAK1, P4HB, ERBB2, ERO1A, GAPDH, IKBKE, TP63, EIF4EBP1, SERPINA1, IFNG, PTK6, BIRC5, and CDKN2A) and 18 down-regulated genes (ITPR1, FOS, GRID1, HSPB8, NRG3, NRG2, DLC1, BCL2, FOXO1, CCL2, PRKN, TUSC1, CDKN1B, GABARAPL1, ST13, RAB33B, CALCOCO2, and MYC). These ARGs were visualized in a volcano plot (Figure 1A), a heatmap (Figure 1B), and a boxplot (Figure 1C) (Supplementary Table S1).
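The selection criteria above (|log2FC| > 1 and FDR < 0.05) can be sketched as follows. This is a minimal illustration, not the authors' R pipeline: it assumes the FDR is a Benjamini-Hochberg adjustment of per-gene p-values and that log2FC is the log2 ratio of mean tumor to mean non-tumor expression. The gene names and values are hypothetical placeholders.

```python
from math import log2


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(mean_tumor, mean_normal, pvalues, lfc_cut=1.0, fdr_cut=0.05):
    """Label each gene 'up', 'down', or None (not differentially expressed)."""
    genes = list(pvalues)
    fdr = dict(zip(genes, benjamini_hochberg([pvalues[g] for g in genes])))
    labels = {}
    for g in genes:
        # Assumed definition: log2 of the tumor / non-tumor mean ratio.
        lfc = log2(mean_tumor[g] / mean_normal[g])
        if abs(lfc) > lfc_cut and fdr[g] < fdr_cut:
            labels[g] = "up" if lfc > 0 else "down"
        else:
            labels[g] = None
    return labels


# Hypothetical toy input (not values from the study).
mean_tumor = {"geneA": 8.0, "geneB": 1.0, "geneC": 3.0, "geneD": 6.0}
mean_normal = {"geneA": 2.0, "geneB": 4.0, "geneC": 2.5, "geneD": 1.0}
pvalues = {"geneA": 0.001, "geneB": 0.01, "geneC": 0.02, "geneD": 0.5}
labels = select_degs(mean_tumor, mean_normal, pvalues)
```

In this toy input, geneC fails the fold-change threshold and geneD fails the FDR threshold, so only geneA and geneB are retained.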
GO enrichment analysis of differentially expressed ARGs
GO functional enrichment analysis of the 37 differentially expressed ARGs was performed (Supplementary Table S2), and the results were visualized to characterize the biological functions of these genes. The ARGs were primarily enriched for the terms intrinsic apoptotic signaling pathway, process utilizing autophagic mechanism, cellular response to oxidative stress, integral component of mitochondrial outer membrane, autophagosome membrane, protease binding, and heat shock protein binding (Figure 2A, 2B).
KEGG enrichment analysis of differentially expressed ARGs
KEGG pathway enrichment analysis indicated that the differentially expressed ARGs were related to autophagy, apoptosis, the ErbB signaling pathway, the HIF-1 signaling pathway, cellular senescence, the AGE-RAGE signaling pathway in diabetic complications, protein processing in the endoplasmic reticulum, endometrial cancer, the FoxO signaling pathway, and the estrogen signaling pathway (Figure 3, Table 1). The endometrial cancer pathway was enriched for three differentially expressed ARGs (ERBB2, BAK1, and MYC), which are closely related to the occurrence of endometrial cancer. The estrogen signaling pathway, which is highly associated with the development of endometrial cancer, was enriched for four differentially expressed ARGs (ITPR1, FOS, PRKCD, and BCL2).
Survival-related ARGs and the prognostic model
We performed univariate Cox regression on the 37 differentially expressed ARGs and identified nine ARGs associated with endometrial cancer prognosis (P < 0.05): ERBB2, CDKN2A, BAK1, GRID1, NRG3, PTK6, DLC1, P4HB, and BIRC5. Seven of these (ERBB2, CDKN2A, BAK1, GRID1, NRG3, PTK6, and BIRC5) were considered risk factors (the lower limit of the 95% CI of the hazard ratio was greater than 1), so their high expression indicates a poor prognosis. Conversely, high expression of the remaining two genes (DLC1 and P4HB) indicates better survival. The results were visualized in a forest plot (Figure 4A).
LASSO regression analysis was then conducted to exclude genes that may be highly correlated with other genes. The complexity of the LASSO model is controlled by the parameter lambda (λ): the larger the λ, the greater the penalty on models with more variables, so a model with fewer variables is selected. LASSO regression yielded eight candidate genes (CDKN2A, ERBB2, GRID1, NRG3, PTK6, DLC1, P4HB, and BIRC5) (Figure 4B-C).
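The effect of λ on the number of retained variables can be illustrated with the soft-thresholding operator, which gives the LASSO solution in the simplified case of linear regression with an orthonormal design. This is only a sketch of the shrinkage principle, not the penalized Cox model fitted in this study, and the coefficients below are hypothetical.

```python
def soft_threshold(beta, lam):
    """Shrink beta toward zero by lam; values within [-lam, lam] become 0."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


def nonzero_counts(betas, lambdas):
    """For each lambda, count the coefficients that remain non-zero."""
    return {
        lam: sum(1 for b in betas if soft_threshold(b, lam) != 0.0)
        for lam in lambdas
    }


# Hypothetical unpenalized coefficients for five candidate variables.
unpenalized = [0.9, -0.6, 0.35, 0.1, -0.05]
path = nonzero_counts(unpenalized, [0.0, 0.2, 0.5, 1.0])
```

As λ grows from 0.0 to 1.0, the number of non-zero coefficients in this toy example falls from five to zero, which is the behavior used to select a sparser model.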
Multivariate Cox regression analysis of the training cohort indicated that CDKN2A, ERBB2, PTK6, and BIRC5 were independent prognostic factors according to their HR values (CDKN2A: 1.571; ERBB2: 2.310; PTK6: 1.355; BIRC5: 2.375; P < 0.05) (Table 2). Therefore, these genes were used to establish the prognostic model: risk score = (0.45 × CDKN2A expression) + (0.84 × ERBB2 expression) + (0.30 × PTK6 expression) + (0.86 × BIRC5 expression).
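The risk score is a weighted sum of expression values, and the reported weights appear to correspond to the natural logarithm of each HR (for example, ln 1.571 ≈ 0.45). A minimal sketch of the calculation follows; the patient expression values are hypothetical, and the function name is an illustrative choice.

```python
from math import log

# HR values and model coefficients as reported in the text.
HAZARD_RATIOS = {"CDKN2A": 1.571, "ERBB2": 2.310, "PTK6": 1.355, "BIRC5": 2.375}
COEFFICIENTS = {"CDKN2A": 0.45, "ERBB2": 0.84, "PTK6": 0.30, "BIRC5": 0.86}


def risk_score(expression):
    """Weighted sum of the four risk-gene expression values."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


# Each coefficient is close to ln(HR) for the same gene.
log_hr = {gene: log(hr) for gene, hr in HAZARD_RATIOS.items()}

# Hypothetical expression profile for one patient (not study data).
patient = {"CDKN2A": 1.0, "ERBB2": 2.0, "PTK6": 0.5, "BIRC5": 1.0}
score = risk_score(patient)  # 0.45 + 1.68 + 0.15 + 0.86
```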
Validation of the prognostic model
We used the median risk value to divide the training set and the verification set into high-risk and low-risk groups. Kaplan–Meier analysis showed that the 1-, 3-, and 5-year survival rates of high-risk patients in the training set were 76.0%, 30.1%, and 13.2%, respectively, whereas those of low-risk patients were 91.0%, 45.6%, and 27.9%, respectively (Figure 5A). To evaluate the predictive accuracy of the prognostic model, we also plotted ROC curves; the area under the curve (AUC) was 0.755 for one-year survival, 0.790 for three-year survival, and 0.800 for five-year survival (Figure 5B).
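The grouping and survival estimation described here can be sketched with a median split followed by the Kaplan-Meier product-limit estimator. This is an illustrative implementation under stated assumptions: patients with a score exactly equal to the median are assigned to the low-risk group (the text does not specify tie handling), and all patient identifiers, times, and events are hypothetical.

```python
from statistics import median


def split_by_median(scores):
    """Assign 'high' to scores above the median, 'low' otherwise (assumed rule)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    events[i] is 1 if patient i died at times[i], 0 if censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical toy data (years of follow-up, death indicator).
groups = split_by_median({"p1": 0.5, "p2": 1.5, "p3": 2.5, "p4": 3.5})
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Applying `kaplan_meier` separately to the high-risk and low-risk groups and reading the curve at 1, 3, and 5 years gives survival rates of the kind reported above.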
Similarly, survival analysis in the verification set showed that the 1-, 3-, and 5-year survival rates of high-risk patients were 82.2%, 27.4%, and 13.3%, respectively, whereas those of low-risk patients were 88.1%, 45.2%, and 25.2%, respectively (Figure 5C). The AUC of the corresponding ROC curves was 0.699 for one-year survival, 0.836 for three-year survival, and 0.820 for five-year survival (Figure 5D).
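As a rough illustration of what an AUC at a fixed time point measures, the sketch below computes the probability that a patient who died by a given horizon has a higher risk score than a patient still alive after it. This is a deliberately simplified estimator that discards patients censored before the horizon; it is not necessarily the time-dependent ROC method used to produce Figure 5, and all data are hypothetical.

```python
def auc_at_horizon(scores, times, events, horizon):
    """Simplified AUC: cases died by the horizon, controls survived past it."""
    rows = list(zip(scores, times, events))
    cases = [s for s, t, e in rows if e == 1 and t <= horizon]
    controls = [s for s, t, e in rows if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs where the case scores higher; ties count half.
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: the fifth patient is censored before the horizon
# and is therefore excluded from both groups.
scores = [3.0, 1.2, 1.0, 2.0, 2.0]
times = [0.5, 0.8, 3.0, 4.0, 0.6]
events = [1, 1, 0, 1, 0]
auc = auc_at_horizon(scores, times, events, horizon=1.0)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of cases above controls.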
In addition to using survival curves to validate our prognostic model, we plotted risk curves for patients with endometrial cancer in the training set and the verification set. In both sets, patient mortality increased significantly as the risk value increased. A heatmap showed that expression of the risk genes was up-regulated in the high-risk groups. The results from the training and verification sets were consistent with each other (Figure 6A-F).
Validation of risk genes at the protein level
Immunohistochemistry revealed that ISH scores for the four risk ARGs (CDKN2A, ERBB2, PTK6, and BIRC5) were significantly higher in endometrial cancer tissue than in healthy endometrial tissue, which suggests that these genes are highly expressed in endometrial cancer tissues. We also found that CDKN2A was mainly located in the cytoplasm, membrane, and nucleus, while ERBB2, BIRC5, and PTK6 were mainly located in the cytoplasm and membrane. All immunohistochemical results were derived from the HPA database (Figure 7), and the corresponding data are listed in Supplementary Table S3.