3.1 DEGs between EC and normal samples
The RNAseq tertiary data set of EC from TCGA included the biological information of 11 normal tissue and 160 EC samples. We identified 4094 DEGs, including 3272 upregulated DEGs and 822 downregulated DEGs. (Figure 1A)
3.2 Identification of IRGs
By overlapping the immune-related genes and DEGs of EC, we identified 247 upregulated and 56 downregulated IRGs, as shown in Figure 1B. Figure 2 shows the results of functional enrichment analysis. GO analysis (Figure 2A) demonstrated that these IRGs were most involved in leukocyte migration in Biological Process（BP）, vesicle lumen in Cellular Component(CC) and receptor ligand activity in Molecular Function(MF). KEGG analysis indicated that these genes were most involved in the interaction of cytokines with cytokine receptors. (Figure 2B).
3.3 Survival analysis and construction of regulatory network
A total of 13 survival-associated IRGs were identified after integrating clinical information from TCGA via univariate COX regression, as shown in Figure 3. After examining the expression of 318 transcription factors (TF), we found 61 with differential expressions between EC and normal samples, as shown in Figure 4A-B. Finally a regulatory network was constructed using these survival-associated IRGs with differently expressed TFs（Figure 4C）.
3.4 Construction of a prognostic model based on prognosis-related IRGs and external validation
We constructed a prognostic model with nine prognostic IRGs based on the results of multivariate Cox regression analysis (Table 1). The formula was as follows: Risk score = expression level of HSPA6*0.006713979+S100A12*0.003828117+CACYBP*0.042341765+NOS2*0.02490294+DKK1*0.015602891+OSM*0.207589957+STC2*0.075574581+ANGPTL3*0.645334283+NR2F2*0.015710952. We further explored the protein expression of these nine prognosis-related IRGs in THPA (Figure 5). Consistent with our results, THPA database showed that HSPA6, S100A12, CACYBP, NOS2, and STC2 in EC tissues were up-regulated, and ANGPTL3 was down-regulated compared with those in normal tissues. However, we did not find expression of DKK1, OSM and NR2F2 proteins in the database.
3.5 Validation of the prognosis-related IRGs in the Oncomine database
We validated the reliability of the prognosis-related IRGs by using Oncomine. The databases showed that the IRGs were differentially expressed in EC and normal tissues. As shown in Supplementary Figure 1, HSPA6, S100A12, CACYBP, NOS2, DKK1, OSM and STC2 were up-regulated, and ANGPTL3 and NR2F2 were down-regulated in EC tissues compared with those in normal tissues.We found that the results were almost consistent with our predictions.
3.6 Validation of the prognostic capacity of the model
Patients were separated into the high-risk group and the low-risk group based on the median risk score (Figure 6A-C). Survival analysis showed that the survival rate in the high‐risk group was remarkably lower than those in the low‐risk group (p==2.366e−06, Figure 6D). The area under curve (AUC) of the receiver operating characteristic (ROC) curve was 0.826 (Figure 6E). Compared with clinical factors (including age, gender, grade, stage and TMN), this signature showed a greater performance in predicting the prognosis of EC. At the same time, univariate and multiple regression analysis(Figure 7A-B) showed that when other clinical parameters were adjusted, the prognostic signature may become an independent predictor. The clinical significance of included genes was also explored in this study (Figure 8A-J). In order to assess the prognostic capacity of the model, we conducted a stratified analysis of clinical factors. Interestingly, we found that nearly the high‐risk patients in subgroups of age ≤ 65(Figure 9A), male(Figure 9B), G1 & G2(Figure 9C) , stage III & IV(Figure 9D) ,T-3-4(Figure 9E),MO(Figure 9F),N1-3(Figure 9G) and EAC (Figure 9H)were inclined to unfavorable overall survival.
3.7 Construction and validation of predictive nomogram
Using a number of independent prognostic factors (including age, gender, grade, stage, TMN, histology, and risk scores), we established a nomogram to predict 1-year and 3-year OS in 100 EC patients. The calibration chart showed that the nomogram might overestimate or underestimate the mortality (Figure 10). These results suggested that the nomogram based on multiple factors can better predict short-term survival (1 year and 3 years) compared to the nomogram based on a single factor.
3.8 Identification of related biological processes and pathways
We employed risk score to classify the entire data set and determine the related pathways with these nine genes by using the Java software GSEA. The results showed that "one carbon pool by folate", "proteasome", "spliceosome" and "RNA degradation" were more abundant in the high-risk group than in the low-risk group. This suggests that in high-risk patients, the nine genes were most involved in pathways of protein degradation, RNA degradation and splicing. That is to say, patients with protein degradation, RNA degradation and splicing effects were more inclined to a poor prognosis (Figure 11).
3.9 Level difference of tumor-infiltrating immune cells between the two risk groups
We compared the infiltration of immune cells in different risk groups. The results showed that Macrophages M0, Macrophages M2 and activated mast cells were significantly enriched in high-risk group, while CD8 T cells and regulatory T cells (Tregs) were significantly enriched in the low-risk group (Figure 12).