Transcriptomics subtypes of ESCC with distinct prognosis
To fully understand the transcriptomic heterogeneity and molecular signatures of ESCC associated with prognosis, we investigated the transcriptomic landscape of 120 treatment-naive ESCC tumours prospectively collected under strict protocols (Supplementary Table 1). Unsupervised clustering of RNA-seq profiling using non-negative matrix factorization revealed four stable clusters (Fig. 1a and Extended Data Fig. 1a, Supplementary Table 2). Gene set enrichment analysis (GSEA) of representative genes in each cluster annotated these subtypes as ‘differentiated’, ‘immunogenic’, ‘metabolic’ and ‘stemness’ (Fig. 1b, Supplementary Table 2 and 3). Gene signatures of these four subtypes were validated in three independent patient cohorts 12-14 (Extended Data Fig. 1b). Keratinocyte differentiation and epidermis development genes such as LCE3D, CDSN and KLK5 defined the differentiated subtype. B-cell surface markers MS4A1, CD79A and MZB1 and T-cell chemokine ligands CXCL9, CXCL13 and CXCL10 characterized the immunogenic subtype. The metabolic subtype is associated with the upregulation of genes involved in drug metabolism by cytochrome P450 and retinol metabolism such as GSTA1, ADH7, UGT1A10 and UGT1A3. High expression of WFDC2, PEG10, Wnt signalling modulator SFRP1 and squamous cell carcinoma (SCC) stem cell marker LGR6 15 defined the stemness subtype (Supplementary Table 2). All immune-associated pathways, such as the interferon gamma pathway, TCR pathway and chemokine-signalling pathway, were significantly downregulated in the stemness subtype (Fig. 1b). The transcription factor (TF) profiling further highlighted subtype-specific TFs, including the upregulated activity of MYB, SOX10, SP5 and ARNT2 in the stemness group (Extended Data Fig. 2). Interestingly, these four molecular subtypes of ESCC presented distinguishing histopathological features (Fig. 1c), reflecting the molecular characteristics. For example, keratin pearls within tumours were readily seen in the differentiated subgroup; immunogenic subgroup showed extensive immune cell infiltration within tumours; some tumour cells in the metabolic subgroup demonstrated eosinophilic cytoplasm with less immune cell infiltration, whereas the stemness subgroup presented poorly differentiated tumour cells and had few immune cell infiltrating within the tumour. Importantly, the stemness subtype was associated with the worst overall survival of all subtypes (Fig. 1d, log rank test P = 0.028). The significance as a prognostic biomarker of the top four genes (WFDC1, SFRP1, LGR6 and VWA2) related to the stemness signature was further proven in an independent cohort of 65 ESCC patients using RT-PCR (Fig. 1e, P = 0.031). Strikingly, overexpression of SFPR1 in ESCC cell line KYSE-70 could dramatically increase cell proliferation in vitro and xenograft tumour growth in vivo, while SFRP1 knockdown in KYSE-520 cells had the opposite effects (Fig. 1f and Extended Data Fig. 3).
Tumour immune cell infiltration and NK-like phenotype
We next examined the tumour immune environment of patient tissues based on RNA-seq and determined how this correlated with the four transcriptomic subtypes and prognosis. Several in silico immune deconvolution published methods 16-21 were benchmarked against molecular protein staining of CD4+ and CD8+ cells, tumour cellularity and copy number profiles (Methods, Extended Data Fig. 4), and the best performing method, the Danaher et al. 18 signature, was used to investigate the tumour-infiltrating immune cell populations within the 120 ESCC tumours. Consensus clustering based on the immune cell estimates revealed three distinct clusters: C1 is associated with a generally low level of immune cell infiltration, but with relatively high levels of neutrophils, dendritic and mast cells; C2 is marked by a high level of immune infiltration of B cells, T cells and macrophages; C3 is correlated with a low level of infiltration of all immune cell populations, except for NK cell markers (Fig. 2a). The C3 cluster is significantly associated with high expression of NK cell markers XCL1, XCL2 (Fig. 2b) and CD160 (Extended Data Fig. 5a).
The prognostic values for all profiled immune cells were then assessed in our cohort using a multivariate Cox regression analysis, accounting for patient age, drinking and smoking history, tumour stage and grade, and the higher NK cell abundance was the strongest poor prognostic factor (Fig. 2c and 2d (China), log rank test P = 0.019). This negative correlation with overall survival for NK cell markers was also seen in the TCGA cohort of 90 ESCC samples (Fig. 2d (TCGA), P = 0.015). The comparison between the four transcriptomic subtypes and three immune profile clusters demonstrated a non-random correlation (Fisher’s exact test, p = 9.37e-11). Fifteen of 31 cases (48.4%) from the C3 cluster were stemness tumours, while the differentiated and immunogenic subtypes were overrepresented in the C1 and C2 clusters, respectively (Fig. 2e). Of note, we also observed significantly positive correlations of mRNA expression between LGR6 and XCL1 (r = 0.40, p < 0.0001), XCL2 (r = 0.39, p < 0.0001) and CD160 (r = 0.32, p = 0.0003) (Fig. 2f).
Given the strong association between stemness and NK cell estimates in contrast to the significant absence of immune cell infiltration within tumours of the stemness subgroup (Fig. 1c), immunohistochemistry staining for XCL1, CD160 and LGR6 was performed in the matched serial sections of tumour tissues in order to determine the spatial composition and cell types that expressed these markers. Surprisingly XCL1 and CD160 were predominantly expressed in tumour cells (Fig. 2g), with a few positive immune cells infiltrated in the stroma. This was also supported by the strong positive correlation between XCL1/2 gene expression and tumour cellularity (Extended Data Fig. 5b). XCL1 or CD160 expression colocalized with LGR6 expression (Fig. 2g, Extended Data Fig. 5c), and co-expression within tumours was seen in 16% (17/107) of our cohort for XCL1 and LGR6, and 25% (25/107) for CD160 and LGR6. Of note, XCL1 was exclusively expressed in cancer cells showing adenocarcinoma morphology and dysplastic cells in the submucosa glands, while CD160 could be expressed in both squamous carcinoma and adenocarcinoma cells (Fig. 2g, Extended Data Fig. 5c) as well as in the proliferative and dysplastic cells of the submucosa glands (Extended Data Fig. 5d). Interestingly, we observed that XCL1-expressing dysplastic cells in the submucosal gland in most of the cases are completely separated from the squamous cell carcinoma, suggesting that this subgroup of patients might concurrently have both squamous cell carcinoma and adenocarcinoma or this adeno-squamous carcinoma might be derived from submucosa gland epithelial cells. Our finding of tumour cells overexpressing NK cell markers XCL1/2 and CD160 correlating with the lowest level of immune cell infiltration suggests a new tumour-intrinsic immune evasion mechanism of ESCC.
Molecular characteristics of XCL1-high NK-like cells
Characterization of the NK-like tumour cells is essential to uncover novel molecular mechanisms and pathways which may offer alternative therapeutic targets and candidate drugs for this subgroup of ESCC. We first utilized the Cancer Cell Line Encyclopedia (CCLE) RNA-seq data to analyze the association of XCL1/2 with ESCC. Interestingly, EC cell lines had the highest level of XCL1 mRNA expression out of all cancer types profiled (Fig. 3a). Hierarchical clustering based on differentially expressed genes (n = 413, p < 0.01) between XCL1 high (n = 11) and low (n = 11) ESCC cell lines clearly separated the two groups (Fig. 3b, Supplementary Table 4). The GSEA results further showed that XCL1-high cells exhibited upregulation of drug metabolism cytochrome P450, retinol metabolism and biological oxidations (false discovery rate, FDR < 0.0001), while asparagine N-linked glycosylation, N-Glycan biosynthesis, cytosolic tRNA aminoacylation and signalling by NTRK1 were highly over-represented in XCL1-low cells (Fig. 3c). We also observed a significantly positive correlation in mRNA expression between XCL1 and LGR6 for ESCC lines (Fig. 3d, Pearson’s correlation r = 0.59, p = 0.004), further highlighting the strong association between NK-like and stemness phenotypes. Interestingly, XCL1-high cells also showed significantly downregulated cell cycle activities compared to XCL1-low cells (Fig. 3e), suggesting that XCL1-overexpressing cells are more quiescent, slow-cycling cells.
Given the upregulation of drug metabolism cytochrome P450 in XCL1-high ESCC cells, we next explored the drug sensitivity to fluorouracil (5-FU), the first-line chemotherapy drug for ESCC, between XCL1-high and -low cell lines. All XCL1-high cells lines tested exhibited a much higher IC50 than XCL1-low cells (Fig. 3f, Mann-Whitney rank test, p = 0.007). Impressively, over-expression of XCL1 in an XCL1-low cell line KYSE-150 resulted in a two-fold increase in IC50 of 5-FU compared to the control cells (Fig. 3g, Extended Data Fig. 6a and 6b) while overexpression of XCL1 did not affect the cell proliferation (Extended Data Fig. 6d). We then explored the cancer drug sensitivity data of 367 compounds from the Genomics of Drug Sensitivity in Cancer (GDSC) resource 22. Compared to XCL1-low cells, XCL1-high cell lines displayed a higher level of resistance to almost all drugs screened (Fig. 3h and Supplementary Table 5, IC50 z-score t-test p < 0.05). XCL1-low cell lines exhibited a significantly higher sensitivity to Wnt-C59, LGK974 (both targeting PORCN and WNT signalling) and CP724714 (targeting ERBB2 and RTK signalling). Deregulated pathways associated with XCL1-high cells may represent novel therapeutic targets for this subtype of ESCC with poor prognosis.
The genomic landscape among molecule subtypes of ESCC
Given the heterogeneous transcriptomic and immune signatures observed in our cohort, we next examined whether the genomic landscapes differ among ESCC subtypes. Whole-exome sequencing of 103 samples with matched RNA-seq were performed and analysed (Supplementary Table 6). We identified on average 358 somatic mutations (range 21 – 3,583) and 152 non-silent mutations (range 9 – 1,242) per sample respectively. Overall and subclonal mutation burden and tumour purity were not correlated (Extended Data Fig. 7a). The overall mutational load, the number of non-silent mutations, and somatic copy number profiling were similar among the four ESCC subtypes (Extended Data Fig. 7b). Three driver gene detection methods, MutsigCV 23,24, dNdScv25, OncodriveFM 26, were applied, and 10 ESCC driver genes were identified by at least two methods (Extended Data Fig. 7c), which had all been detected in ESCC previously 5,7-9,27,28. Incorporating representative genomic alterations of the three molecular subtypes ESCC1/2/3 previously identified from 90 TCGA ESCCs 13 and driver genes that have been implicated in large series of Chinese ESCCs5,27, we found that although the alteration frequencies (i.e., mutations, amplifications and deletions) of most significant genes in ESCC were similar among our subtypes, the stemness type had a higher frequency of NOTCH1 and EP300 alterations, affecting 39% and 26% of stemness tumours, respectively (Fig. 4a, Fisher’s exact test, p = 0.018 and 0.008 for NOTCH1 and EP300, respectively). Interestingly, no alterations in NOTCH1 and EP300 were detected in the metabolic subtype, and no ZNF750 alterations were detected in the differentiated subtype. Stemness tumours seemed to have the highest overall frequencies of ESCC1 and ESCC2 alterations (74% and 65% of cases, respectively). Across our cohort, although the highest overall frequency was observed for ESCC1 alterations (61%), other subtype alterations also occurred substantially, 50% for ESCC2 and 52% for ESCC3 (Extended Data Fig. 7d).
Next, we investigated the genomic alterations of ESCC functional genes identified above across ESCC cell lines. Interestingly, in XCL1-high cells, alterations in EP300 occurred in 4/11 (36%) cases, compared to zero cases in XCL1-low cells (Fig. 4b), corresponding to our result of the highest EP300 alternation frequency in the stemness group. Whereas alterations in NOTCH1, FAT1 and KDM6A seemed to be more prevalent, 64%, 55% and 45%, in XCL1-low cells, respectively, than in XCL-high cells, 18%, 9% and 9%, respectively (Fig. 4b). EP300 is a histone acetyltransferase, and its mutation has been shown to be associated with poor outcome in ESCC and HNSCC29,30. We also observed significantly poorer overall survival for patients with EP300 mutations in our cohort (Fig. 4c, HR = 5.248, p = 8e-06). When combining with CREBBP mutations, the EP300/CREBBP mutation status was still predictive of worse survival (Fig. 4d, HR = 2.229, p = 0.02). Although CREBBP mutations were not enriched in the stemness subtype, alterations in EP300 and CREBBP tended to co-occur (n=4) only in stemness samples (Fig. 4a). Furthermore, EP300-mutated patient samples (n=8, 7.8%) had significantly higher EP300 expression levels than EP300-wildtype samples (Fig. 4e, Wilcoxon rank test, p < 0.01), and this was largely due to the unbalanced overexpression of the mutated alleles (Fig. 4f), suggesting the gain-of-function mode for these mutations.
EP300 mutations and overexpression and their relationship with stenmess and NK-like phenotype in ESCC
We next examined whether EP300 mutations and expression had any associations with stemness and NK-like gene signatures. We first performed differential expression analysis between EP300-mutated and wildtype samples, followed by GSEA against representative (i.e., upregulated) gene sets of four transcriptomic subtypes and XCL1-high genes derived from cell lines. We found that the gene set representative of stemness subtype was upregulated in EP300-mutated compared to wildtype samples (NES = 1.36, FDR = 0.05), while all gene sets of other three subtypes were massively downregulated in EP300-mutated samples (Fig. 5a and 5b, FDR < 0.001, Extended Data Fig. 8a). Although there was no upregulation of XCL1-high genes, we observed significantly increased gene expression of CD160 in EP300-mutated samples (Fig. 5c, p = 0.003). Interestingly, the EP300 gene expression was also the highest in the stemness subtype, and there was a positive correlation in expression between EP300, and XCL1/2 and CD160 (Extended Data Fig. 8b and 8c).
Next, focusing on EP300-wildtype samples, we selected top 20 EP300-high and 20 EP300-low expression samples, and found that the stemness gene set and XCL1-high genes were significantly upregulated in the EP300-high group (Fig. 5a and 5b, ‘stemness up’ gene set, NES = 2.49, FDR < 0.0001; XCL1-high gene set, NES = 1.47, FDR = 0.014), whereas the differentiated gene set was greatly downregulated (NES = -3.47, FDR < 0.0001). Interestingly, we found the metabolic gene set was significantly upregulated in EP300-high samples (NES = 2.71, FDR < 0.0001), while the immunogenic gene set was not notably affected by EP300 expression levels (Fig. 5a). Taking all these together, our results demonstrate a strong positive association between EP300 mutations, overexpression and stemness / NK-like signatures, suggesting that EP300 may have a role promoting tumours to become a more aggressive subtype in ESCC.
Functional mutation enrichment in pathways among ESCC subtypes
To further investigate whether specific pathway-related mutations were associated with transcriptomic subtypes, we performed functional mutation enrichment analysis adjusted for sample mutation load using the 50 cancer hallmark gene sets (see Methods). We identified functionally relevant mutations in Wnt/β-catenin signalling which were significantly more enriched in the stemness samples, while functional mutations in inflammatory response, angiogenesis and hypoxia were highly enriched in the immunogenic tumours (Fig. 5d, Kruskal–Wallis test, p < 0.05, Extended Data Fig. 9a). Of note, the hallmark gene set expression profile using GSVA also confirmed the functional differences identified among the four subtypes (Extended Data Fig. 9b). We then evaluated whether the pathway functional mutation enrichment affected the expression of pathway genes, and found that mutation enrichment in Wnt/β-catenin signalling, inflammatory response and hypoxia were positively correlated with pathway gene expression activities for the corresponding ESCC subtypes (Fig. 5e), suggesting that high level of Wnt signalling expression in the stemness group could be a consequence of functional mutations in regulators of the Wnt pathway, and these functional mutations were the most enriched in stemness samples. This is also the case for the high pathway activity of inflammatory response and hypoxia for the immunogenic subtype. The mutations in the Wnt signalling pathway were highly deleterious as predicted by SIFT31 and PolyPhen-232, and the stemness group seemed to have the highest proportion of deleterious Wnt mutations (93%), significantly higher than that of the differentiated subtype (60%, Fisher’s exact test, p = 0.035, Extended Data Fig. 9c).
Clonality analysis among molecule subtypes of ESCC
Given the well-documented interplay between immune infiltration and tumour clonal evolution 33,34, we finally investigated the level of intratumoural heterogeneity (ITH, measured by Shannon diversity index) based on variant allele frequencies of all mutations in each tumour among the immune and transcriptomic subtypes. Our results showed the greatest ITH in the C3 cluster, whereas in C2 where the immune infiltration was the highest, the diversity was greatly reduced (Fig. 5f). In addition, the highest ITH presented in the stemness subtype while the immunogenic subtype had the lowest ITH (Fig. 5f). Our results suggest that the immune microenvironment strongly influences the tumour evolution and (sub)clonal architecture. The stemness / C3 group with the highest level of immune exclusion also had the highest tumour (sub)clonal diversity. Importantly, high ITH was also a significant marker of poor prognosis in ESCC (Fig. 5g, multivariate analysis, hazard ratio HR = 2.214, log rank P = 0.007).