1. Identification of DEGs and Enrichment Analysis in ESCA
MRNA sequencing results of ESCA patients collected from 150 tumor tissues and 11 adjacent normal tissues were downloaded from TCGA database. Three patients with OS fewer than 30 days were removed. A total of 2195 differentially expressed genes (DEGs) were obtained between the two groups with the screening criteria of p value < 0.01 and |log FC| > 0.5 (Fig. 2A). Among these DEGs, 306 genes were overlapped with mitochondria related genes(Fig. 2B).
Gene ontology (GO) pathway enrichment analysis of the 306 mitochondria related DEGs showed, for Biological Process (BP), DEGs were mainly enriched in the pathway of cofactor metabolic process, small molecule catabolc process, and fatty acid metabolic process; for Cellular Component (CC), DEGs were significantly enriched in the pathway of mitochondrial matrix, mitochondrial inner membrane, and mitochondrial outer membrane; for Molecular Function (MF), DEGs were associated with coenzyme binding, lyase activity, and flavin adenine dinucleotide binding (Fig. 2C).
Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis indicated that the pathways were mainly associated with neurodegeneration - multiple disease, carbon metabolism, and diabetic cardiomyopathy (Fig. 2D).
2. Risk Scoring Model Construction and Validation
Based on the gene expression and overall survival (OS) information of 147 tumor samples, 17 genes were screened out by univariate cox regression analysis of 306 mitochondrial related DEGs (Fig. 3A). Combining the results of LASSO regression analysis, it was considered that the model fit the best when the penalty coefficient was 12 (Fig. 3B), thus the corresponding 12 mitochondrial related DEGs including PDHA1, HSPE1, APOOL, H6PD, HIGD1A, MAOB, MTERF2, AASS, MT-ND1, BCAP31, SLC44A2, CHPT were selected. Then, multivariate cox regression analysis was performed on the 12 genes and six genes APOOL, HIGD1A, MAOB, BCAP31, SLC44A2 and CHPT1 were able to enter the equation as a prognostic predictor (Fig. 3C, Table 1). In ESCA, the expression level of BCAP31 was significantly up-regulated in tumor tissues, and the expression level of APOOL, HIGD1A, MAOB, SLC44A2 and CHPT1 were down-regulated in tumor tissues (Fig. 3D). A subsequent stratification analysis revealed these genes that were not differentially expressed by age, gender, grade, or stage (Fig. S1A). Kaplan-Meier survival curves (Table 2, Fig. S1B) confirmed that higher expression of MAOB and lower expression of BCAP31, CHPT1, and HIGD1A contributed to better survival. The difference of overall survival time between high and low expression level of APOOL and SLC44A2 had no statistical significance.
Table 1. The information of 6 prognosis-related genes.
Gene symbol
|
Gene ID
|
Full name
|
Location
|
Function of the encoded protein
|
APOOL
|
139322
|
apolipoprotein O like
|
|
The protein encoded by this gene contains an apolipoprotein O superfamily domain. This domain is found on proteins in circulating lipoprotein complexes.
|
HIGD1A
|
25994
|
HIG1 hypoxia inducible domain family member 1A
|
Mitochondrion and Nucleoplasm
|
Acts upstream of or within negative regulation of apoptotic process. Located in mitochondrion and nucleoplasm. Part of protein-containing complex.
|
MAOB
|
4129
|
monoamine oxidase B
|
Mitochondrion
|
The protein encoded by this gene belongs to the flavin monoamine oxidase family. It is a enzyme located in the mitochondrial outer membrane. It catalyzes the oxidative deamination of biogenic and xenobiotic amines and plays an important role in the metabolism of neuroactive and vasoactive amines in the central nervous sysytem and peripheral tissues. This protein preferentially degrades benzylamine and phenylethylamine.
|
BCAP31
|
10134
|
B cell receptor associated protein 31
|
|
The encoded protein is a multi-pass transmembrane protein of the endoplasmic reticulum that is involved in the anterograde transport of membrane proteins from the endoplasmic reticulum to the Golgi and in caspase 8-mediated apoptosis.
|
SLC44A2
|
57153
|
solute carrier family 44 member 2
|
mitochondrion and plasma membrane.
|
Enables choline transmembrane transporter activity. Involved in choline transport and transmembrane transport.
|
CHPT1
|
56994
|
choline phosphotransferase 1
|
Golgi membrane
|
Enables diacylglycerol cholinephosphotransferase activity. Involved in phosphatidylcholine biosynthetic process and platelet activating factor biosynthetic process.
|
Table 2. Clinical characteristics between low- and high-risk groups.
Variables
|
Low risk(N=74)
|
High risk(N=73)
|
P value
|
Age
|
|
|
|
0.939
|
|
<60
|
34
|
34
|
|
|
≥60
|
40
|
39
|
|
Barretts Esophagus
|
|
|
0.302
|
|
Yes
|
10
|
15
|
|
|
No
|
48
|
45
|
|
|
N/A
|
16
|
13
|
|
Columnar Metaplasia Present
|
|
|
0.166
|
|
Yes
|
8
|
17
|
|
|
No
|
29
|
31
|
|
|
N/A
|
37
|
25
|
|
Esophageal Tumor Cental Location
|
|
|
0.000
|
|
Distal
|
40
|
62
|
|
|
Mid
|
29
|
9
|
|
|
Proximal
|
4
|
2
|
|
|
N/A
|
1
|
0
|
|
History of Esophageal Cancer
|
|
|
0.115
|
|
Yes
|
2
|
6
|
|
|
No
|
55
|
47
|
|
|
N/A
|
17
|
20
|
|
Neoplasm Histologic Grade
|
|
|
0.161
|
|
G1/2
|
42
|
33
|
|
|
G3/X
|
32
|
40
|
|
M
|
|
|
|
0.026
|
|
M0
|
66
|
52
|
|
|
M1/X
|
5
|
13
|
|
|
N/A
|
3
|
8
|
|
N
|
|
|
|
0.007
|
|
N0
|
40
|
22
|
|
|
N1/2/3/X
|
32
|
45
|
|
|
N/A
|
2
|
6
|
|
T
|
|
|
|
0.496
|
|
T0
|
0
|
1
|
|
|
T1/2
|
34
|
28
|
|
|
T3/4
|
38
|
38
|
|
|
N/A
|
2
|
6
|
|
Reflux History
|
|
|
|
0.327
|
|
Yes
|
20
|
26
|
|
|
No
|
40
|
36
|
|
|
N/A
|
14
|
11
|
|
Gender
|
|
|
|
0.176
|
|
Male
|
60
|
65
|
|
|
Female
|
14
|
8
|
|
Tumor Stage
|
|
|
|
0.030
|
|
I/II
|
49
|
33
|
|
|
III/IV
|
22
|
32
|
|
|
N/A
|
3
|
8
|
|
Disease Type
|
|
|
|
0.000
|
|
Adenomas and Adenocarcinomas
|
23
|
47
|
|
|
Squamous Cell Neoplasms
|
51
|
25
|
|
|
Cystic, Mucinous and Serous Neoplasms
|
0
|
1
|
|
Alcohol History
|
|
|
|
0.413
|
|
Yes
|
50
|
53
|
|
|
No
|
23
|
18
|
|
|
N/A
|
1
|
2
|
|
BMI
|
|
|
|
0.200
|
|
Underweight
|
3
|
5
|
|
|
Normal
|
39
|
30
|
|
|
Overweight
|
26
|
36
|
|
|
N/A
|
6
|
2
|
|
Survival Status
|
|
|
|
0.001
|
|
Alive
|
56
|
36
|
|
|
Dead
|
18
|
37
|
|
Risk score of each patient was calculated according to the following formula.
Risk Score=APOOL×0.6955447་HIGD1A×0.4618985-MAOB×0.177846་BCAP31×0.5375771།SLC44A2×0.619591་CHPT1×0.3943502
Divided into high and low risk groups based on the median value of risk scores, Kaplan - Meier survival analysis confirmed that the prognosis of patients in high risk group was significantly worse than that in low risk group (p value = 8.8e-06) (Fig. 4A). Figures 4B, C show risk scores and survival status distributions. With the increase of risk score, the risk plot showed that the death toll increased and the patients’ survival time decreased. The heatmap showed the expression level of 6 modeling genes in the two groups (Fig. 4D). The AUC values for 1, 3, and 5 years were 0.738, 0.738, and 0.924, respectively, indicating that the model could be used to predict patient prognosis accurately (Fig. 4E). The C-index of the model was 0.723. As shown in Figs. 4F-J, the results of the validation set (GSE53624) were consistent with our previous research results.
3. Nomogram construction and validation
Clinical information of 147 ESCA patients was downloaded from the UCSC database. By univariate and multivariate Cox hazard regression analysis, Columnar metaplasia present, M stage, reflux history and risk score were selected to construct Nomogram as P value < 0.05 (Fig. S2A). The AUC values for 1, 3, and 5 years were 0.7, 0.722, and 0.835, respectively, indicating the accuracy of the model in differentiation (Fig. S2B). The C-index of Nomogram was 0.777. The calibration plot revealed excellent agreement between observations and predictions (Fig. S2C). Based on this result, Decision Curve Analysis (DCA) suggested the clinical effectiveness of risk score when combined with various clinical features and Nomogram (Fig. S2D). These results revealed that the risk signature could serve as an independent factor for predicting the prognosis of ESCA patients.
4. Pathway analysis in high and low risk groups
1188 DEGs were obtained between the high risk score and the low risk score groups with the screening criteria of p value < 0.01 and |log FC| > 0.5. The DEGs were then analyzed for GO and KEGG enrichment.
For GO BP, DEGs were mainly enriched in “regulation of neuron projection development, axonogenesis, and epidermis development”. For GO CC, DEGs were significantly enriched “cell-cell junction, collagen-containing extracellular matrix, and apical part of cell”. For GO MF, DEGs were associated with “cell adhesion molecule binding, actin binding, actin filament binding” (Fig. 5A).
KEGG enrichment analysis indicated that the DEGs were mainly enriched in “proteoglycans in cancer, axon guidance, hippo signaling pathway, and tight junction” (Fig. 5B).
GSEA was performed to investigate the potential functions and pathways associated with the high risk group and the low risk group. The top 20 enriched pathways in the high risk and low risk group were obtained (Fig. S3). The result demonstrated that “genes down-regulated in metastases from malignant melanoma, genes down-regulated in esophageal adenocarcinoma and Barret's esophagus” were abundantly enriched in low risk group (Fig. 5C). While “Genes up-regulated in early gastric cancer, cervical cancer proliferation cluster” potential were enriched in high risk group (Fig. 5D).
On the basis of KEGG enrichment analysis results, we explored the correlation between risk score and Hippo signaling pathway biomarker genes. The results of Spearman correlation analyses demonstrated that risk score negatively correlated with a majority of the Hippo signaling pathway biomarker genes in ESCA samples, including AMOTL1, WWTR1, NGFR, TEAD2 et, al. (Fig. 5E). Based on GO CC results, Spearman's correlation analysis revealed that risk score was negatively correlated with more cell-cell junction biomarker genes (Fig. 5F).
5. Immune analysis in high and low risk groups
According to CIBERSORT immune analysis, there were significant differences in the types of immune cells in the high and low risk groups (Fig. 6A). Samples with high risk demonstrated higher abundance of memory resting CD4+ T cells, resting NK cells, M0 and M2 Macrophages, and lower abundance of M1 Macrophages. Results based on CIBERSORT also suggested that samples in high risk group tended to possess higher infiltration levels of B cell, as well as lower infiltration levels of macrophage, and T cell (Fig. S4A, B). To determine whether there is a correlation between risk score and immune cells, including effector memory CD4+ T cell, central memory CD4+ T cell, activated CD4+ T cell, activated DC, immature DC, plaslacytoid DC, macrophage and NK cell, we conducted a correlation test between immune cell marker genes and risk score, as shown in Figure S4. Figure 6B-I showed marker genes connected with risk score, which was suggestive of a certain correlation between immune cells and risk score.
As a new weapon against cancer, immune checkpoint therapy, which controls the activity of T cells, improved anti-tumor immune responses[20]. Therefore, we identified both high risk and low risk group immune checkpoint expression (Fig. S4C). The result indicated that there were significant differences in the expression of CD115, TIM1, CD30, CD112, HVEM, HHLA2, SEMA4A, and BTNL2 (Fig. 6J). Afterwards, we calculated the correlation between risk score and the expression of immune checkpoint (Fig. S4D). The results illustrated that BTNL2, SEMA4A, and CD30 were negatively correlated with risk score, while TIM1, CD155, HVEM, CD112, and HHLA2 were positively correlated with risk score (Fig. 6K).
ESTIMATE calculates the tumor purity, the ESTIMATE score, the immune score, and the stromal score of each ESCA sample. Spearman correlation analysis revealed risk score was negatively correlated with the stromal score, while not significantly associated with the tumor purity, the ESTIMATE score, and the immune score(Fig. 7A). And between high risk and low risk group, there was no significant difference in tumor purity, stromal score, immune score, and ESTIMATE score. (Fig. 7B).
ICI therapy has undoubtedly proven to be an effective strategy of immunotherapy, which led to an important breakthrough in the field of antitumor therapy[21].Thus, the TIDE algorithm was used to predict the potentially different responses to ICB therapy. Patients with higher risk score showed lower level of TIDE score (Fig. 8A). The high risk group contained a higher proportion of responders to ICI therapy compared with low risk group (Fig. 8B). Samples were divided into the high score group and the low score group according to the median value of the stromal score. The low stromal score group contained a higher proportion of responders to ICI therapy compared with low risk group (Fig. 8C). Furthermore, ICI therapy was more effective in patients with high risk scores and low stromal scores as compared to other groups (Fig. 8D). Above all, we speculated that patients with a high risk and low stromal score could benefit more from ICI therapy than patients with a low risk and high stromal score.
6. Mutation analysis in high and low risk groups
Mutation data of the ESCA patients were obtained from TCGA by the maftools package. As shown in Fig. 9A, TP53, TNN, SYNE1, MUC16, DNAH5, CSMD3, HMCN1, KMT2D, LRP1B, PCLO, USH2A, and FLG were frequently mutated in both high risk and low risk group, excluding samples with no mutations called. Among these many mutant genes, the mutation rate of TP53 was significantly different between the high and low risk groups (Fig. 9B). Tumor mutation burden (TMB) scores were calculated for each ESCA patient. Spearman correlation analysis suggested a positive correlation between TMB score and risk score (Fig. 9C), though there was no significant difference in TMB score between high and low risk groups (Fig. 9D). The higher the TMB, the better the effect of tumor immunotherapy. The result is consistent with the result of the high risk group contained a higher proportion of responders to ICI therapy.
7. Drug Sensitivity Analysis of Risk Scoring Model
CellMiner was used to assess the interactions of model genes on drug sensitivity, in order to facilitate better precision treatment of ESCA. As shown in Fig. 10, drugs with the 2 highest correlation with model genes and riskcore were selected, including Simvastatin, Fluvastatin, SNS-314, Nitazoxanide, Everolimus, malacid, Tegafur, 4SC-202, PX-316, Artemether, Selumetinib, Acetalax, BMS-690514,and Sapitinib. A complete list of all the drugs related with model genes and risk score (|r|>0.2, P < 0.05) is provided in the Table S2.