Development of a prognostic model for esophageal cancer based on nine immune related genes

Background: Function of the immune system is correlated with the prognosis of the tumor. The effect of immune microenvironment on esophageal cancer (EC) development has not been fully investigated. Methods: This study aimed to explore a prognostic model based on immune-related genes (IRGs) for EC. We obtained the RNA-seq dataset and clinical information of EC from the Cancer Genome Atlas (TCGA). Results: We identied 247 upregulated IRGs and 56 downregulated IRGs. Pathway analysis revealed that the most differentially expressed IRGs were enriched in Cytokine-cytokine receptor interaction. We further screened 13 survival-related IRGs and constructed regulatory networks involving related transcription factors (TFs). Finally, a prognostic model was constructed with 9 IRGs (HSPA6, S100A12, CACYBP, NOS2, DKK1, OSM ,STC2 ,NGPTL3 and NR2F2) by multivariate Cox regression analysis. The patients were classied into two subgroups with different outcomes. When adjusted with clinical factors, this model was veried as an independent predictor, which performed accurately in prognostic prediction. Next, M0 and M2 macrophages and activated mast cells were signicantly enriched in high-risk group, while CD8 T cells and regulatory T cells (Tregs) were signicantly enriched in low-risk group. Conclusions: These molecules may serve as potential therapeutic targets and biomarkers for the new-immunotherapy of EC.


Background
Esophageal cancer (EC) is the eighth commonest cancer worldwide. The National Cancer Institute estimated 16,910 new cases and 15,910 deaths from esophageal cancer in the United States in 2016 (Short et al. 2017). Its incidence has risen by more than six times (1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008) (Simard et al. 2012). The overall ve-year survival of EC and that after esophagectomy are still poor, although great improvements have been made in treatment (Huang & Yu 2018). Squamous cell carcinoma is the most common histological type of EC . Tobacco, alcohol, and malnutrition are the most associated risk factors in the development of EC (Prabhu et al. 2016). Once diagnosed, EC must be accurately staged prior to the initiation of treatment. TNM (tumor, lymph node, metastasis) is a staging system based on the status of tumor invasion, lymph node, and metastasis (Pennathur et al. 2013). Early-stage EC is usually treated with endoscopic surgery, advanced EC with surgery with or without chemoradiation (Bollschweiler et al. 2017).
Certain speci c genes and biomarkers are needed to predict the patient's therapeutic response and increase their survival (Huang & Yu 2018). Immune responses is critical in the tumor microenvironment.
Tumor cells with genomic alterations can produce new antigens that can be recognized by the immune cells (Cerezo-Wallis & Soengas 2016). Expression of IRGs can serve as e cient biomarkers. Previous research have explored the IRGs-based prognostic features in patients with non-squamous non-small cell lung cancer ) and papillary thyroid carcinoma (Lin et al. 2019). However, prognostic models based on IRGs for EC remain to be elucidated.

Page 3/27
This study investigated the clinical signi cance of a prognostic model based on immunogenomics.

Data collection
The mRNA pro les and corresponding clinical information of 11 normal tissues and 160 EC samples were downloaded from TCGA (https://www.cancer.gov/) (Lee 2016). A set of IRGs were obtained through the Immunology Database and Analysis Portal (ImmPort) database (https://www.immport.org) (Bhattacharya et al. 2018). A set of tumor-related TFs were obtained from Cistrome Cancer(http://cistrome.org/CistromeCancer/) (Mei et al. 2017). CIBERSORT (https://cibersort.stanford.edu/index.php) is based on a gene expression deconvolution algorithm (Ali et al. 2016) for obtaining immune cells with differences between cancer and normal tissues.

Identi cation of differentially expressed genes (DEGs)
DEGs between EC and normal tissues were identi ed via R software (version: x64 3.2.1) and package Limma. The p value was adjusted into the false discovery rate (FDR). A value of FDR less than 0.05 and |log2(FC)| higher than 1 were considered signi cant.

Identi cation of immune-related genes (IRGs)
DEGs overlapped with immune-related genes were obtained as the differentially expressed IRGs. Based on these IRGs, Gene Ontology (GO) (Ashburner et al. 2000) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al. 2016) analyses were performed with the clusterpro ler R package to explore the underlying mechanisms of these IRGs.

Identi cation of prognosis-related IRGs and construction of regulatory network
Prognosis-related IRGs were identi ed using univariate COX regression analysis. We analyzed these prognosis-related IRGs using the package R. Then, we investigated the interaction of these IRGs and differentially expressed TFs with a threshold of P < 0.05. Coe cient > 0.3 was considered as positive regulation, otherwise as negative regulation. Subsequently, we constructed a regulatory network with relevant TFs and prognosis-related IRGs by using cytoscape software 3.7.1 (Shannon et al. 2003).

Construction of a prognostic model in EC based on IRGs
We constructed a prognostic model based on the results of a multivariate Cox regression analysis. Based on the median risk score, EC patients were divided into high-risk and low-risk groups. The performance of prognostic model was validated by survival analysis between groups with thresholds of p < 0.05 using the survival and survminer package of R. Receiver operating characteristic (ROC) analysis was performed via the survivalROC package, and the area under curve (AUC) was calculated to evaluate the e ciency of the model in predicting disease onset (Kamarudin et al. 2017). Association between IRG expression and clinical parameters was tested using independent t-tests, and p < 0.05 were considered statistically signi cant. Clinical survival analysis in subgroups was also conducted, and p < 0.05 was considered statistically signi cant.

Veri cation of the prognosis-related IRGs in this model
We used the online software Oncomine (https://www.oncomine.org) to verify the IRGs. For screening, we set the following criteria: 1 "Gene: IRGs in this model"; 2 "Analysis Type: Cancer vs. Normal Analysis"; 3 "Cancer Type: Esophageal Cancer"; 4" Clinical Outcome: Survival Status "; 5 "Data Type: mRNA". Based on the speci c binding between antibodies and antigens, immunohistochemistry can reveal the relative distribution and abundance of proteins. Using The Human Protein Atlas (THPA) (https://www.proteinatlas.org) (Almdahl et al. 1988), we observed the differences in key gene expression between normal and EC tissues.

Building a predictive nomogram
To investigate the possibility of EC 1-OS and 3-OS, we established nomograms by including all independent prognostic factors identi ed by multivariate Cox regression analysis. The effectiveness of the nomogram was evaluated by discrimination and calibration. Finally, we plotted the curve of the nomogram by package rms of R to observe the relationship between the predicted rate of nomogram and the observed rate.

Functional enrichment analysis
We used Gene Set Enrichment Analysis (GSEA) (Subramanian et al. 2005) to identify consistent differences between high-risk and low-risk groups and the associated biological processes. In screening the gene list of KEGG, p < 0.05 was considered statistically signi cant.

Differential expression of tumor-in ltrating immune cells between high-risk and low-risk groups
Status of immune in ltration in EC patients was achieved from the dataset of CIBERSORT. Subsequently, we tested the abundance of immune cells, and its difference between high-risk and low-risk groups by using two-sample T-test.

DEGs between EC and normal samples
The RNAseq tertiary data set of EC from TCGA included the biological information of 11 normal tissue and 160 EC samples. We identi ed 4094 DEGs, including 3272 upregulated DEGs and 822 downregulated DEGs. (Fig. 1A) 3.2 Identi cation of IRGs By overlapping the immune-related genes and DEGs of EC, we identi ed 247 upregulated and 56 downregulated IRGs, as shown in Fig. 1B. Figure 2 shows the results of functional enrichment analysis. GO analysis ( Fig. 2A) demonstrated that these IRGs were most involved in leukocyte migration in Biological Process(BP), vesicle lumen in Cellular Component(CC) and receptor ligand activity in Molecular Function(MF). KEGG analysis indicated that these genes were most involved in the interaction of cytokines with cytokine receptors. (Fig. 2B).

Survival analysis and construction of regulatory network
A total of 13 survival-associated IRGs were identi ed after integrating clinical information from TCGA via univariate COX regression, as shown in Fig. 3. After examining the expression of 318 transcription factors (TF), we found 61 with differential expressions between EC and normal samples, as shown in Fig. 4A-B. Finally a regulatory network was constructed using these survival-associated IRGs with differently expressed TFs( Figure 4C).

Construction of a prognostic model based on prognosis-related IRGs and external validation
We constructed a prognostic model with nine prognostic IRGs based on the results of multivariate Cox regression analysis (Table 1). The formula was as follows: Risk score = expression level of HSPA6*0.006713979 + S100A12*0.003828117 + CACYBP*0.042341765 + NOS2*0.02490294 + DKK1*0.015602891 + OSM*0.207589957 + STC2*0.075574581 + ANGPTL3*0.645334283 + NR2F2*0.015710952. We further explored the protein expression of these nine prognosis-related IRGs in THPA (Fig. 5). Consistent with our results, THPA database showed that HSPA6, S100A12, CACYBP, NOS2, and STC2 in EC tissues were up-regulated, and ANGPTL3 was down-regulated compared with those in normal tissues. However, we did not nd expression of DKK1, OSM and NR2F2 proteins in the database.

Validation of the prognosis-related IRGs in the Oncomine database
We validated the reliability of the prognosis-related IRGs by using Oncomine. The databases showed that the IRGs were differentially expressed in EC and normal tissues. As shown in Supplementary Fig. 1,  HSPA6, S100A12, CACYBP, NOS2, DKK1, OSM and STC2 were up-regulated, and ANGPTL3 and NR2F2 were down-regulated in EC tissues compared with those in normal tissues.We found that the results were almost consistent with our predictions.
3.6 Validation of the prognostic capacity of the model Patients were separated into the high-risk group and the low-risk group based on the median risk score (Fig. 6A-C). Survival analysis showed that the survival rate in the high-risk group was remarkably lower than those in the low-risk group (p = = 2.366e − 06, Fig. 6D). The area under curve (AUC) of the receiver operating characteristic (ROC) curve was 0.826 (Fig. 6E). Compared with clinical factors (including age, gender, grade, stage and TMN), this signature showed a greater performance in predicting the prognosis of EC. At the same time, univariate and multiple regression analysis (Fig. 7A-B) showed that when other clinical parameters were adjusted, the prognostic signature may become an independent predictor. The clinical signi cance of included genes was also explored in this study (Fig. 8A-J). In order to assess the prognostic capacity of the model, we conducted a strati ed analysis of clinical factors. Interestingly, we found that nearly the high-risk patients in subgroups of age ≤ 65 (Fig. 9A), male (Fig. 9B), G1 & G2 (Fig. 9C), stage III & IV (Fig. 9D) ,T-3-4 (Fig. 9E),MO (Fig. 9F),N1-3 (Fig. 9G) and EAC (Fig. 9H)were inclined to unfavorable overall survival.

Construction and validation of predictive nomogram
Using a number of independent prognostic factors (including age, gender, grade, stage, TMN, histology, and risk scores), we established a nomogram to predict 1-year and 3-year OS in 100 EC patients. The calibration chart showed that the nomogram might overestimate or underestimate the mortality (Fig. 10).
These results suggested that the nomogram based on multiple factors can better predict short-term survival (1 year and 3 years) compared to the nomogram based on a single factor.

Identi cation of related biological processes and pathways
We employed risk score to classify the entire data set and determine the related pathways with these nine genes by using the Java software GSEA. The results showed that "one carbon pool by folate", "proteasome", "spliceosome" and "RNA degradation" were more abundant in the high-risk group than in the low-risk group. This suggests that in high-risk patients, the nine genes were most involved in pathways of protein degradation, RNA degradation and splicing. That is to say, patients with protein degradation, RNA degradation and splicing effects were more inclined to a poor prognosis (Fig. 11).

Level difference of tumor-in ltrating immune cells between the two risk groups
We compared the in ltration of immune cells in different risk groups. The results showed that Macrophages M0, Macrophages M2 and activated mast cells were signi cantly enriched in high-risk group, while CD8 T cells and regulatory T cells (Tregs) were signi cantly enriched in the low-risk group (Fig. 12).

Discussion
Esophageal cancer has a large number of new cases every year, and it has historically been regarded as an uncontrollable disease process. The etiology of esophageal cancer may be multifactorial, but part of it is due to the unique manifestation of this cancer (Vaghjiani & Molena 2017). At present, for the treatment of esophageal cancer, attention has shifted to the development of immunotherapy with novel immune biomarkers (Ku 2017). Somatic cells acquire malignancy through genetic alterations. Cancer cells usually evade the recognition of the immune system and develop into clinically meaningful masses (Sugie 2018). Compared with conventional therapies, cancer immunotherapy shows long-lasting response with fewer adverse reactions (Moy et al. 2017). This provides a new option for the treatment of EC.
The prognostic model for EC has been continuously updated (Cao et al. 2014;Winther et al. 2015;Xi et al. 2017). In this study, we identi ed 247 up-regulated and 56 down-regulated IRGs in EC and screened out survival-related IRGs. Based on these data, we established a prognostic model that divided EC patients into high-risk and low-risk groups. This model showed a good predictive performance (AUC 0.826). The model was also an independent prognostic indicator by multivariate analysis incorporating other clinical factors. KEGG analysis indicated that the main pathway was enriched in cytokine-cytokine receptor interaction. Many biological processes are regulated by cytokines, including cell growth, differentiation, immunity, in ammation, and metabolism (O'Shea et al. 2013). Tumor progression can be promoted by cytokines that affect the tumor microenvironment and directly act on cancer cells (Roshani et al. 2014). Moreover, cytokines participate in the immune response of cytotoxic T lymphocytes (CTLs) by modulating the differentiation of Th1 and Th2 cells (Agarwal et al. 2006). Kita Y et al found that STC2 may be involved in lymph node metastasis, making it a potential prognostic marker for patients with EC (Kita et al. 2011). Studies also demonstrated that STC2 may play an important role in ESCC tumorigenesis (Kashyap et al. 2012). Abnormal expression of DKK1, which is regulated by DKK1-CKAP4 pathway, predicts the poor prognosis of esophageal squamous cell carcinoma (ESCC) (Shinno et al. 2018). These results are consistent with our ndings. CacyBP regulates cell proliferation, tumorigenesis, differentiation or gene expression (Topolska-Wos et al. 2016). In colon cancer, CacyBP can promote the growth of cancer cells by enhancing the ubiquitin-mediated degradation of p27kip1 (Zhai et al. 2017). In addition, studies have con rmed that CacyBP level increased in gastric, nasopharyngeal carcinoma, osteogenic sarcoma and melanoma (Zhai et al. 2008;Zhu et al. 2014).
In our prognostic model, the IRGs showing prognostic values included HSPA6, S100A12, CACYBP, NOS2, DKK1, OSM, STC2, ANGPTL3 and NR2F2. Among the, HSPA6 may be associated with early recurrence of HCC (Yang et al. 2015). In ESCC, S100A12 is downregulated at the protein level (Ji et al. 2004). In Barrett's esophagus and related adenocarcinoma, expression of inducible nitric oxide synthase (NOS-2) is increased, and NOS-2 also plays a role in in ammation and epithelial cell growth (Wilson et al. 1998). OSM has been identi ed as an inhibitor of tumor cell growth in a variety of cancers, including melanoma, ovarian cancer, and glioblastoma carcinomas (Brown et al. 1987;Friedrich et al. 2001;Ohata et al. 2001). The splice variant of oncostatin M receptor β is overexpressed in human esophageal squamous cell carcinoma (Kausar et al. 2011). Angiopoietin-like protein 3(ANGPTL3) is indicative of EC prognosis (Zhu et al. 2015). NR2F2 is involved in the progression of prostate adenocarcinoma (Qin et al. 2013), and NR2F2 expression is a prognostic factor for breast neoplasms (Nagasaki et al. 2009). High expression of NR2F2 in certain gastric and esophageal adenocarcinomas is associated with abnormal expression of cadherin 11, suggesting that the NR2F2-related embryonic pathways in these tumors are reactivated (Bringuier et al. 2015). Proteasome dysregulation is implicated in the development of many types of cancer ). The proteasome is involved in cell cycle and transcription, two processes indispensable for cancer development (Catalgol 2012 Mutations in genes encoding splice proteins are frequently found in cancer (Lee & Abdel-Wahab 2016). Small molecule inhibitors that target splice components can be used to create anti-cancer drugs (Effenberger et al. 2017). RNA degradation is a key post-transcriptional regulatory checkpoint to maintain proper functions of organisms. Ribonuclease, a key enzyme responsible for RNA stability, can be used alone for RNA degradation, and can bind to other proteins in the RNA degradation complex (Saramago et al. 2019).
Previous immunotherapies mainly rely on T cells in tumor immune defense (McGray et al. 2014;Traversari & Russo 2016). Evidence has veri ed the role of B cells in tumor immunology (Bindea et al. 2013;Tsou et al. 2016). In the present research, the abundance of CD8 T cells and regulatory T cells in the low-risk group increased.
T cells are critical in host defense against cancer (Jiang & Yan 2016). The value of CD8 T cells for cancer prognosis has been assessed (Fukunaga et al. 2004;Liu et al. 2012;Mahmoud et al. 2012;Pages et al. 2009;Sato et al. 2005). In addition, CD8 T cells also play a role in the progression of EC (Kato et al. 2018;Li et al. 2018). The abundance of CD8 T cells and regulatory T cells(Treg) in the low-risk group was higher than in the high-risk group, suggesting that the in ltration of both types of cells may be a good prognostic signal for EC patients. Tregs are divided into two major subpopulations: thymus-derived Tregs (nTregs) and inducible Tregs (iTregs) (Bilate & Lafaille 2012). Tregs show signi cant versatility in their inhibitory mechanisms by releasing cytokines to directly inhibit signal transduction of effector T cells (Schmidt et al. 2012). Tregs can also inhibit and kill B cells by inducing programmed cell death (Gondek et al. 2008). In ESCC, Tregs expression is positively correlated with tumor invasion, suggesting that Tregs could be taken as a useful prognostic marker for ESCC (Nabeki et al. 2015). The number of Treg cells in patients with advanced EC is signi cantly reduced after chemotherapy (Xu et al. 2011). Tregs can also weaken anti-tumor immunity, making them potential indicators of poor prognosis (Lin et al. 2016).
It is the rst time that a prognostic nomogram is developed with nine immune related genes. This nomogram can be routinely applied and is cost-effective in practice, as it does not need whole-genome sequencing for EC patients. When combined with clinical parameters like TNM stage, the nomogram can show a greater prognostic performance.
Our research still has some shortcomings. First, clinical data should be introduced to validate our ndings. Second, the inequality between the samples may lead to potential bias. Third, the underlying molecular mechanisms needs to be explored through in vitro and in vivo studies.

Conclusions
The model based on nine IRGs is accurate to predict EC prognosis. In ltration of immune cells is related with EC risk. These molecules and immunobiomarkers provide a possibility of developing new target therapies.   Forest plot of hazard ratios showing the prognostic values of genes,in which the unadjusted hazard ratios as well as the corresponding 95% con dence intervals are displayed.    Forest plots including the risk score and other clinical parameters by univariate(A) and multiple regression analysis(B),in which the unadjusted hazard ratios as well as the corresponding 95% con dence intervals are displayed .   Nomogram predicting overall survival for EC patients. A: For each patient, several lines are drawn upward to determine the points received from the predictors in the nomogram. The sum of these points is on the "total point" axis. Then a line is drawn downward to determine the possibility of 1-and 3-year overall survival of EC. B,C:The calibration plot for internal validation of the nomogram. The Y-axis represents actual survival, and the X-axis represents nomogram-predicted survival.