Overview of the mutational landscape of the KEAP1-NRF2 pathway in cancer
First, we investigated the mutational landscape of the KEAP1-NRF2 pathway in cancer by analyzing 11,079 TCGA samples from 33 different cancer types using the CVCDAP database. As shown in (Fig.2A), we found that the KEAP1-NRF2 pathway was altered in many cancers, however it was altered with a percentage that was higher than 10% in only five cancer types. Lung Squamous Cell Cancer (LUSC) was the cancer type that harbor the highest percentage of KEAP1-NRF2 pathway alterations with mutations in 24.2% (KEAP1, 9.92%; NRF2, 13.69%; both, 0.59%) of samples. LUAD came in the second place with KEAP1-NRF2 pathway mutations in 20.82% (KEAP1, 17.68%; NRF2, 2.97%; both, 0.17%) of LUAD patients. In addition, the KEAP1-NRF2 pathway was altered in 12.07% ESCA patients (KEAP1, 2.19%; NRF2, 9.34%; both, 0.54%), in 11.7% of uterine corpus endometrial carcinoma (UCEC) patients (KEAP1, 3.19%; NRF2, 7.04%; both, 1.48%) and in 10.1% of HNSC (KEAP1, 3.95%; NRF2, 5.49%; both, 0.659%).
Overview of the ESCA mutational landscape
The lack of prognostic biormarkers for the ESCA subgroup with KEAP1-NRF2 mutations motivated us to focus on identifying prognostic biormakers that is regulated by KEAP1-NRF2 in ESCA. First, we investigated the different driver mutations in the TCGA-ESCA patients and we found that TP53 was the most frequently mutated gene in ESCA as it was found mutated in 86% of patients (Fig.2B). 42% of ESCA patients harbored TTN mutations which made TTN in the second place. MUC16, SYNE1, CSMD3, FLAG, DNAH5, HMCN1, LRP1B, PCLO and RYR2 were mutated in 23%, 20%, 20%, 18%, 16%, 16%, 16%, 15% and 12% of ESCA patients, respectively. KEAP1-NRF2 was found mutated in 12% of ESCA patients.
The TCGA-ESCA cohort included 182 patient samples. We segregated the cohort into two groups: KEAP1-NRF2-mutated (22 patients) and wild-type (160 patients). In order to ensure that the differences between the two groups were due to KEAP1-NRF2 mutations, we first investigated the driver mutations in the KEAP1-NRF2-mutated group. We found that TP53 was mutated in 95% of patients in this group, NRF2 was found mutated in 85% patients, TTN, KMTD2, MUC13, KEAP1, PATCH, and SACS were found mutated in 41%, 32%, 27%, 23%, 23% and 23%, respectively (Fig.3A). Then, we performed differential gene mutation analysis between the two groups to investigate the percentages of mutations of these driver genes in the two groups. Only NRF2 and KEAP1 weren't mutated in wild-type group while the other driver genes were found mutated in both the KEAP1-NRF2-mutated and wild-type groups, with similar percentages (Fig.3B). Therefore, none of these driver genes can be considered as variables that contribute to differences between the two groups.
In order to better understand the mutational landscape of KEAP1-NRF2 in ESCA, we used the USCS Xena browser to examine the types of mutations and their positions in the domain structure of KEAP1 and NRF2 proteins. As noted earlier, we found that 2.19% of TCGA-ESCA patient samples had KEAP1 mutations while NRF2 was mutated in 9.34% and both were mutated in 0.54%. All the detected KEAP1 mutations were missense mutations while 77.8% (14/18) of NRF2 mutation were missense mutations, 11.1% were intron (2/18), 5.6% (1/18) were in-frame-deletions and 5.6% (1/18) were in-frame-insertions. KEAP1 consists of 605 amino-acids, and 3 main domains with two mutations were detected in the BTB (broad-complex, tramtrack, and bric-a-brac) domain, three in the IVR (intervening region), and one in the Kelch domain, which is essential for the binding of NRF2 (Fig.3C). In the case of NRF2 structure, the majority of mutations (17) occurred in the crucial KEAP1-binding domain Neh2, and only one was found in the Neh1 domain.
Identification of genes regulated by the KEAP1-NRF2 pathway in ESCA
In order to identify genes that are regulated by the KEAP1-NRF2 pathway in ESCA, we subjected the TCGA-ESCA data set (22 KEAP1-NRF2-mutated versus 160 wild-type tumor samples) to differential gene expression analysis. We identified 896 DEGs with log FC >|1| (p < 0.05 with FDR adjustment) (Fig.4A). Of these DEGs, 403 were up-regulated and 493 were down-regulated (Additional file1: Table S1). Since the ultimate effect of changes in the KEAP1-NRF2 pathway is increased activity of NRF2; and hence the over-expression of its target genes, it was not surprising that several bonafideNRF2 target genes were among the up-regulated genes including, AKR1C1, AKR1C2, AKR1B10, GSTM2, UGT1A6, AKR1C3, G6PD, GCLC, GCLM, GSTM3, GPX2, ABCC1, OSGIN1, SRXN1, and TXNRD1. The gene expression profiles of 22 KEAP1-NRF2-mutated and 160 wild-type ESCA patient samples were visualized on a heatmap produced by unsupervised hierarchical clustering, and major differences between the gene expression patterns enabled cluster analysis to discriminate between sample types. Significant differences or trends between KEAP1-NRF2-mutated and wild-type ESCA patient samples were detectable for DEGs with log FC> |1| (Fig.4B). CES1, AKR1C1, ADH7, ALDH3A, and CYP4F11 were the top five up-regulated genes in KEAP1-NRF2-mutated ESCA patient samples (Fig.4C), while PIGR, MUC13, TASPAN8, LGALS4, and OLFM4 were the top five down-regulated genes (Fig.4D).
Epithelial-mesenchymal transition is regulated by the KEAP1-NRF2 pathway in ESCA
The expression signatures of the hallmark gene sets, each containing 50 specific gene sets, were derived by concentrating multiple gene sets from the Molecular Signatures Database to represent well-defined biological statuses or courses. GSEA was performed to determine whether the identified gene sets showed statistically notable differences between the KEAP1-NRF2-mutated and their wild-type counterparts groups (Additional file 2: Table S2). Interestingly, four gene sets were up-regulated in the KEAP1-NRF2-mutated ESCA, namely, estrogen response late, hypoxia, reactive oxygen species pathway and EMT, the 4 gene sets were greatly enriched, with FDR < 0.05 (Fig.5). The gene set with the lowest FDR, namely, EMT (FDR = 0.001), which contained 194 genes was selected for further analysis. In order to specifically identify EMT genes that is associated with the KEAP1-NRF2 pathway in ESCA, we integrated the DEGs between KEAP1-NRF2-mutated and wild-type ESCA patient samples (log FC> |1.5|, FDR < 0.05) with the set of 194 EMT-enriched genes (Fig.6A) using Venny 2.1 web-based tool (http://bioinfogp.cnb.csic.es/tools/venny/index.html). Intriguingly, we found 11 common genes: SPP1, PTHLH, WNT5A, COL11A1, COL7A1, GPC1, SNAI2, ADAM12, FBN2, PFN2, and IGFBP3 (Fig.6B).
Epithelial-mesenchymal transition genes were validated using ESCA cell lines
In order to validate these 11 EMT-related genes as potential NRF2 targets, we used CCLE to download RNA-seq mRNA expression data of 1019 cell lines. Then, using the COSMIC data base we identified human ESCA cell lines that harbors NRF2 and/or KEAP1 mutations. We selected three cell lines (TE6, TE11 and KYSE180) as the KEAP1-NRF2-mutated group. In addition, we selected another three human ESCA cell lines that have neither NRF2 nor KEAP1 mutations (TE5, TE9 and KYSE150) as the wild-type group. Then, we carried out differential gene expression analysis between the two groups. As shown in GSTM3, AKR1C1 and TXNRD1 (well-known NRF2 targets) showed significant up-regulation in the mutated group compared to the wild-type counterpart, which ensures KEAP1-NRF2 pathway alteration in the group (Fig.6C). Furthermore, we investigated the expression of these 11 EMT-related genes between the two groups. Interestingly, five of the 11 genes (SPP1, WNT5A, PTHLH, PNF2, and GPC1) were significantly up-regulated in the mutated ESCA cell lines (Fig.6D). This finding suggests that these five genes are potential NRF2 targets. For further evidence, we investigated the presence of the putative and known antioxidant responsive elements (AREs), the NRF2 binding site, (Fig.6E) in the promoter region of these five genes by using ConTra V3 web tool. We performed insilico analysis within the –1 kb upstream the transcription start site (TSS) of the 5 genes. Interestingly, we identified highly conserved NRF2 binding sites (AREs) in the promoter regions of human PTHLH (positions: -71 and -916), WNT5A (position: -434), SPP1 (positions: -545,-605 and -870), PFN2 (positions:-737 and -864) and GPC1 (position: -458) (Fig.6F).
Evaluation of prognostic power of EMT gene-signature regulated by the KEAP1-NRF2 pathway in ESCA
In order to evaluate the prognostic power of these 5 genes (SPP1, WNT5A, PTHLH, PFN2, and GPC1) as an EMT-derived signature for KEAP1-NRF2 pathway alterations in ESCA, we first analyzed overall survival in the TCGA-ESCA cohort using the SurvExpress database. A total of 184 patient samples were divided into high-risk (n = 127) and low-risk groups (n = 57) based on their expression patterns (Fig.7A). The separation of risk groups was optimized using the ‘maximize risk group’ option provided in the SurvExpress database. The survival probability estimates in the two risk groups were visualized as Kaplan-Meier plots. Strikingly, overall survival analysis revealed that the patients in the high-risk group had poorer survival (HR = 1.67 (CI: 1.01-2.78); Log-Rank p = 0.04443) than the low-risk group (Figure.7B). Moreover, the Rao Giddings (GSE11595) cohort (34 ESCA patient samples) showed that the expression of SPP1, WNT5A, PTHLH, PFN2, and GPC1 in the high-risk group (n=17) was associated with poorer survival (HR = 6.84 (CI: 2.36 - 19.8); Log-Rank p = 3.072×e-5) than the low-risk group (n=17) (Fig.7C). In addition, we analyzed the overall survival in the Peters C.Fitzgerald (GSE19417) cohort available in the SurvExpress database. After optimized risk group separation, a total of 70 ESCA patient samples were divided into high-risk (n = 27) and low-risk groups (n = 43) based on their expression patterns (Fig.7D). The survival probability estimates in the two risk groups were represented as Kaplan-Meier plots. Similarly, overall survival analysis showed that the patients in the high-risk group had poorer survival (HR = 2.25 (CI: 1.34 - 3.79); Log-Rank p = 0.001659) than the low-risk group. As shown in Fig.5, The 5 genes were significantly over-expressed in the high risk patients compared to the low risk group. Moreover, the expression of SPP1, WNT5A, PTHLH, PFN2, and GPC1 successfully discriminated the survival of the ESCA high risk group from that of the low risk group in three ESCA cohorts (288 patients). These findings indicated that over-expression of the 5 genes is associated with a poor prognosis of ESCA and presents SPP1, WNT5A, PTHLH, PFN2, and GPC1 as an EMT signature based on changes in the KEAP1-NRF2 pathway in ESCA.