Distinct TNM stages present with different distributions of molecular subtypes
We analyzed the association between CMS subtypes and tumor stage in a meta-cohort comprising 1,040 patients (Table 1). In line with previous observations, we detected an increased prevalence of the poor-prognosis mesenchymal subtype (CMS4) in advanced stages of disease compared with early stages in the aggregated cohort (stage I: 12 (9.8%), stage II: 89 (22.9%), stage III: 94 (29.4%) and stage IV: 45 (38.5%); p < 0.001) (Fig. 1 and Additional file 1: Table S1). The same increase was observed in each of the individual cohorts (Additional file 1: Table S1 and Additional file 1: Figure S1).
Table 1
Basic characteristics of the aggregated cohort (n = 1,040)
| Characteristic | Category | Total (n = 1040) | % | GSE39582 (n = 511) | % | TCGA (n = 529) | % |
| Gender | Female | 476 | 45.8% | 227 | 44.4% | 249 | 47.1% |
| | Male | 564 | 54.2% | 284 | 55.6% | 280 | 52.9% |
| Age | Median (IQR) | 68 (59–77) | | 69 (59–76) | | 68 (59–77) | |
| TNM | I | 133 | 12.8% | 38 | 7.4% | 95 | 18.0% |
| | II | 417 | 40.1% | 216 | 42.3% | 201 | 38.0% |
| | III | 355 | 34.1% | 200 | 39.1% | 155 | 29.3% |
| | IV | 135 | 13.0% | 57 | 11.2% | 78 | 14.7% |
| MSI | MSS | 887 | 85.3% | 436 | 85.3% | 451 | 85.3% |
| | MSI | 153 | 14.7% | 75 | 14.7% | 78 | 14.7% |
| CMS | 1 | 153 | 14.7% | 79 | 15.5% | 74 | 14.0% |
| | 2 | 420 | 40.4% | 214 | 41.9% | 206 | 38.9% |
| | 3 | 133 | 12.8% | 66 | 12.9% | 67 | 12.7% |
| | 4 | 240 | 23.1% | 112 | 21.9% | 128 | 24.2% |
| | Indeterminate | 94 | 9.0% | 40 | 7.8% | 54 | 10.2% |
IQR = interquartile range
Tumor stage reflects tumor biology
We tested the hypothesis that tumor stage as defined by TNM not only represents disease progression but also reflects different biological entities. To this end, we investigated the number of differentially expressed genes between distinct TNM stages, both overall and within molecular subtypes. Our analysis revealed considerable gene expression differences between TNM stages, which decreased significantly when stratified for CMS2 and CMS4, which represent the major CMS subtypes (Fig. 2A). This was confirmed when stratifying for all subtypes (CMS1–4) (Additional file 1: Figure S2). Furthermore, visualization of the genes that displayed significant differences between tumor stages (ANOVA p < 0.05, n = 2764) showed a clear separation of the immune (CMS1), epithelial (CMS2/3) and mesenchymal (CMS4) subtypes in both a t-SNE plot and a gene expression heatmap (Fig. 2B and Additional file 1: Figure S3). These findings indicate that the observed differential gene expression between cancer stages is largely explained by the biological differences reflected by CMS subtypes.
CMS4 correlates with more advanced stages and has a higher progression rate
To specifically investigate the association between CMS4 and more advanced tumor stages, we built two gene signatures: one to discriminate disseminated disease (stage III-IV) from local disease (stage I-II), and one to separate CMS4 cancers from CMS1/2/3 tumors. A signature score was calculated for each gene signature. Remarkably, the two scores were highly correlated (r = 0.77, p < 0.001) (Fig. 2C) even though the signatures shared only a few genes (13/200), which confirms that the tumor biology of advanced tumor stages is largely explained by an overrepresentation of CMS4 cancers.
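The comparison of two signature scores can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring method used in this study: the text does not specify how the scores were computed, so the sketch assumes a score defined as the mean z-scored expression of the signature genes. The gene names, expression values and function names (`zscores`, `signature_scores`, `pearson`) are hypothetical.

```python
import math

def zscores(values):
    """Standardize one gene's expression values across samples."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]

def signature_scores(expression, signature):
    """Per-sample score: mean z-score over the signature genes (assumed definition)."""
    standardized = [zscores(expression[g]) for g in signature if g in expression]
    n_samples = len(standardized[0])
    return [sum(gene[i] for gene in standardized) / len(standardized)
            for i in range(n_samples)]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: gene -> expression per sample (not study data).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 2.5, 3.5, 5.0],
    "geneC": [0.5, 1.5, 2.0, 3.5],
    "geneD": [4.0, 3.0, 2.5, 1.0],
}
stage_signature = ["geneA", "geneB"]  # hypothetical stage III-IV vs I-II signature
cms4_signature = ["geneB", "geneC"]   # hypothetical CMS4 vs CMS1/2/3 signature

shared_genes = set(stage_signature) & set(cms4_signature)
r = pearson(signature_scores(expression, stage_signature),
            signature_scores(expression, cms4_signature))
print(f"shared genes: {len(shared_genes)}, r = {r:.2f}")
```

Reporting the number of shared genes next to the correlation makes it easier to judge whether a high correlation merely reflects overlapping gene lists.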
Next, we assessed the rate of progression from early (stage I-II) to advanced (stage III-IV) tumor stage for each of the subtypes by calculating risk ratios. This showed a markedly increased progression rate towards more advanced stages for CMS4 cancers compared with CMS1 (RR 1.64, 95% CI: 1.29–2.09), CMS2 (RR 1.25, 95% CI: 1.08–1.46) and CMS3 (RR 1.57, 95% CI: 1.23–2.01) tumors (Fig. 2D).
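For readers who wish to reproduce this type of estimate, the following is a minimal sketch of a risk ratio with a log-scale (Katz) 95% confidence interval. The interval method is an assumption, as the text does not state how the intervals were obtained, and the function name `risk_ratio` is illustrative. The per-subtype counts are given in Additional file 1: Table S1 and are not repeated here, so the example contrasts CMS4 with all other tumors pooled (including indeterminate samples), using only the counts in Table 1 and the CMS4 counts per stage reported above. This pooled comparison is not one of the risk ratios reported in the text.

```python
import math

def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A versus group B with a Katz log-scale CI."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# CMS4 tumors in stage III-IV (94 + 45) out of all 240 CMS4 tumors (Table 1).
cms4_advanced, cms4_total = 94 + 45, 240
# All stage III-IV tumors (355 + 135) minus the CMS4 ones, out of 1040 - 240.
other_advanced = (355 + 135) - cms4_advanced
other_total = 1040 - cms4_total

rr, lower, upper = risk_ratio(cms4_advanced, cms4_total, other_advanced, other_total)
print(f"RR {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Working on the log scale keeps the interval asymmetric around the point estimate, which suits a ratio that is bounded below by zero.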
CMS4 holds prognostic value in high-risk stage II colon cancer
In an effort to validate our findings and provide clinical utility to the insight obtained, we evaluated chemotherapy-naive high-risk stage II colon cancers (Table 2). High risk was defined as either T4 or inadequate lymph node assessment (< 10 nodes assessed). Based on the association between CMSs and tumor stage, we hypothesized that CMS4 cancers are overrepresented in high-risk stage II cancers due to their worse survival. Indeed, in the combined stage II cohorts, MATCH and GSE33113 (n = 197), CMS4 cancers were more prevalent in high-risk stage II patients (21.7% vs 7.7%, p = 0.02) (Table 2, Fig. 3A and Additional file 1: Table S2). Disease-free survival (DFS) for these patients confirmed the poor disease outcome of CMS4 cancers (Fig. 3A). This effect was explained by the poor outcome for patients with a CMS4 cancer in the subgroup with high-risk tumors (5-year DFS 68.0% versus 41.7%) (Fig. 3B and 3C and Additional file 1: Figure S4). These findings were substantiated by a multivariate analysis, which showed a significant association of CMS with DFS in the subgroup with high-risk tumors but not in the total stage II cohort (Additional file 1: Table S3). This suggests that CMS might be a relevant addition to current clinical practice for high-risk stage II patients. This effect might be explained by stage migration, i.e. high-risk stage II tumors include under-staged stage III tumors. The extended GSE33113 cohort, comprising both stage II and stage III tumors, indeed revealed that the percentage of stage III colon cancers increased with a rising number of assessed lymph nodes and plateaued at 10 lymph nodes (Fig. 3D and Additional file 1: Table S4), the minimum number of lymph nodes recommended for adequate staging in the Netherlands.
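The high-risk definition used above (T4, or fewer than 10 lymph nodes assessed) can be written as a small rule. This is a minimal sketch: the function name `is_high_risk` is illustrative, and the handling of a missing lymph node count (returning `None` unless the tumor is T4) is an assumption, since the text does not state how patients with missing counts were classified.

```python
from typing import Optional

MIN_NODES_ASSESSED = 10  # threshold for adequate lymph node assessment, per the text

def is_high_risk(t_stage: int, nodes_assessed: Optional[int]) -> Optional[bool]:
    """Classify a stage II tumor as high risk: T4 or < 10 lymph nodes assessed.

    Returns None when the node count is missing and the tumor is not T4
    (assumed handling; not specified in the text).
    """
    if t_stage == 4:
        return True
    if nodes_assessed is None:
        return None
    return nodes_assessed < MIN_NODES_ASSESSED

print(is_high_risk(3, 14))    # T3 with adequate node assessment
print(is_high_risk(3, 8))     # T3 with inadequate node assessment
print(is_high_risk(4, None))  # T4 is high risk regardless of node count
```

Keeping "unknown" separate from "not high risk" avoids silently assigning patients with missing node counts to the low-risk group.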
Table 2
Characteristics of the MATCH and GSE33113 cohorts
| Characteristic | Category | Total (n = 197) | % | MATCH cohort (n = 112) | % | GSE33113 (n = 85) | % |
| Gender | Female | 101 | 51.3% | 57 | 50.9% | 44 | 51.8% |
| | Male | 96 | 48.7% | 55 | 49.1% | 41 | 48.2% |
| Age | Median (IQR) | 71.0 (63.0–77.0) | | 70.0 (63.0–76.0) | | 74.6 (61.9–80.2) | |
| T | 3 | 184 | 93.4% | 107 | 95.5% | 77 | 90.6% |
| | 4 | 13 | 6.6% | 5 | 4.5% | 8 | 9.4% |
| N | Median (range) | 14 (1–46) | | 14 (5–28) | | 12 (1–46) | |
| | < 10 lymph nodes assessed | 45 | 22.8% | 14 | 12.5% | 31 | 36.5% |
| | ≥ 10 lymph nodes assessed | 142 | 72.1% | 98 | 87.5% | 44 | 51.8% |
| | Missing | 10 | 5.1% | 0 | 0.0% | 10 | 11.8% |
| MSI | MSS | 140 | 71.1% | 79 | 70.5% | 61 | 71.8% |
| | MSI | 52 | 26.4% | 28 | 25.0% | 24 | 28.2% |
| | Missing | 5 | 2.5% | 5 | 4.5% | 0 | 0.0% |
| CMS | 1 | 49 | 24.9% | 29 | 25.9% | 20 | 23.5% |
| | 2 | 83 | 42.1% | 52 | 46.4% | 31 | 36.5% |
| | 3 | 19 | 9.6% | 11 | 9.8% | 8 | 9.4% |
| | 4 | 20 | 10.2% | 5 | 4.5% | 15 | 17.6% |
| | Indeterminate | 26 | 13.2% | 15 | 13.4% | 11 | 12.9% |
IQR = interquartile range
Table 3
Multivariate analysis of relevant parameters and disease-free survival for high-risk stage II patients
| Parameter | HR | 95% CI limits |
| CMS 1 | * | |
| CMS 2 | 0.225 | 0.053–0.957 |
| CMS 3 | 0.599 | 0.062–5.781 |
| CMS 4 | Reference | |
| Gender | 2.725 | 0.488–15.225 |
| Age | 0.986 | 0.952–1.022 |
| Location | 3.45 | 0.799–14.85 |
| T | 2.006 | 0.360–11.173 |
| MSI | ** | |
HR, hazard ratio; CI, confidence interval; CMS, consensus molecular subtype; MSI, microsatellite instability
*Not estimable due to no events
**Not estimable due to no MSI patients