NRF2 expression derivation and training
In total, 62 candidate genes were analysed in 9 independent colorectal datasets for differential expression relative to normal tissue (17,34–39). Some datasets were subsetted by anatomical site for the purposes of analysis, resulting in 24 discrete sets of data (see supplementary Table 1). Forty genes were found to be differentially expressed in tumours in at least one dataset: 21 were significantly over-expressed and 20 were significantly under-expressed (supplementary Figure 1). One gene, COL3A1, was counted in both groups because it was over-expressed in some datasets and under-expressed in others, which is why the two groups sum to 41 rather than 40. Of the 40 differentially expressed genes, four could not be matched between the training and validation dataset microarrays and were omitted from further analysis. The final group of 36 genes was: ABCA8, ABI3BP, ADAM12, ADRB1, ANGPT1, ANKRD29, ANKRD44, BCHE, C15orf48, COL3A1, COL5A1, EGLN3, LIFR, METTL7A, PCM1, PLAU, PLCB4, RECK, RGCC, RRM2, SEC14L4, SERPINH1, SFN, SLIT3, SPP1, TNS1, TOM1L2, TSPAN5, TTYH3, VSIG10, VCAN, AKR1C1, LRP8, NAMPT, PTGES, SLC27A5. There was a very high level of co-ordinated expression between the 36 genes in the training dataset, as evidenced by pairwise correlations (Figure 1A).
Variable selection
Following PCA, both the Akaike and Bayesian information criteria indicated that PC1 was useful for explaining the survival outcome. PC1 in the training set had absolute correlations >0.5 with probes that mapped to the following 10 genes: VCAN, ADAM12, COL3A1, COL5A1, SERPINH1, RECK, PLAU, SPP1, TNS1 and SLIT3. Because of the high correlation of these genes with PC1 in the training set, we hypothesised that they were of higher biological relevance for prognosis prediction than other NRF2 target genes. This expression pattern was detected in each of the validation sets (Figure 1B-E).
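The selection rule described above (compute PC1, then keep genes whose absolute correlation with PC1 exceeds 0.5) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes each gene is standardised before PCA (the text does not say), it runs on synthetic data, and the function names `pc1_scores` and `select_genes` and the labels `GENE_A` to `GENE_D` are hypothetical.

```python
import numpy as np


def pc1_scores(expr):
    """Return PC1 scores for a samples x genes expression matrix."""
    # Assumption: standardise each gene to zero mean and unit variance.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]


def select_genes(expr, gene_names, threshold=0.5):
    """Keep genes whose absolute Pearson correlation with PC1 exceeds threshold."""
    scores = pc1_scores(expr)
    selected = []
    for j, name in enumerate(gene_names):
        r = np.corrcoef(expr[:, j], scores)[0, 1]
        # The sign of a principal component is arbitrary, so use |r|.
        if abs(r) > threshold:
            selected.append(name)
    return selected


# Synthetic data: three genes driven by one shared latent factor, one pure noise.
rng = np.random.default_rng(0)
n_samples = 500
latent = rng.normal(size=n_samples)
expr = np.column_stack([
    latent + 0.3 * rng.normal(size=n_samples),
    -latent + 0.3 * rng.normal(size=n_samples),
    latent + 0.3 * rng.normal(size=n_samples),
    rng.normal(size=n_samples),
])
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical labels
selected = select_genes(expr, genes)
print(selected)
```

Taking the absolute correlation matters because the direction of PC1 is arbitrary: a gene that moves opposite to the shared factor (here `GENE_B`) is as strongly tied to PC1 as one that moves with it.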
NRF2 expression is a biomarker of worse survival
In stage I/II/III disease, higher NRF2 expression corresponded to worse DFS in GSE14333 (HR[1]=1.551, 95% CI 1.200–2.004, LRT p = 0.0008) and GSE39582 (HR=1.172, 95% CI 1.008–1.362, LRT p = 0.0383). When the 60 cases of stage IV disease also available in GSE39582 were included, NRF2 expression was also associated with worse OS (HR=1.240, 95% CI 1.086–1.416, LRT p = 0.001). In the MRC FOCUS trial, which comprised first-line stage IV metastatic patients, NRF2 expression was again associated with worse OS (HR=1.140, 95% CI 1.035–1.255, LRT p = 0.008). Figure 2 shows that high expression corresponded with worse prognosis for DFS in GSE14333 and GSE39582 (panels A and B), and for OS in GSE39582 and the MRC FOCUS trial (panels C and D).
To assess the relevance of NRF2 expression in rectal cancer specifically, and its ability to transfer between RNA expression platforms, we performed the analysis on a rectal-cancer-only expression dataset in which all sampled patients received neoadjuvant chemoradiotherapy (GSE87211). Higher expression was associated with worse DFS (HR=1.431, 95% CI 1.060–1.933, LRT p = 0.056) but not OS (HR=1.464, 95% CI 0.955–2.245, LRT p = 0.197). Figure 3 shows that high expression corresponded to worse prognosis for DFS.
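For readers who want to relate a reported HR and its confidence interval to a test statistic, the following sketch back-calculates the log hazard ratio, its standard error and a Wald p-value from the GSE14333 DFS figures quoted above. It assumes the interval is a Wald interval, symmetric on the log scale, which the text does not state. The p-values reported in this section are likelihood ratio tests, so the Wald value is only an approximate cross-check and need not agree exactly. The function name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover log-HR, its standard error, Wald z and two-sided p-value
    from a hazard ratio and its 95% confidence interval.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    log_hr = math.log(hr)
    # The interval spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided standard normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_hr, se, z, p


# GSE14333 DFS figures as reported in the text: HR=1.551, 95% CI 1.200-2.004.
log_hr, se, z, p = wald_from_ci(1.551, 1.200, 2.004)
print(f"log HR = {log_hr:.3f}, SE = {se:.3f}, z = {z:.2f}, Wald p = {p:.4f}")
```

For this example the Wald p-value comes out at roughly 0.0008, in line with the reported LRT p-value for the same comparison.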
NRF2 expression provides additional explanatory power to known prognostic variables
Within the publicly available datasets there were additional variables that are known prognostic factors. The magnitudes of their respective effects are summarised in the forest plot (supplementary Figure 2). We used these variables in a multivariate analysis. In GSE14333, after adjusting for the effect of stage and adjuvant chemotherapy, NRF2 expression remained a significant predictor of worse DFS (HR[2]=1.365, 95% CI 1.049–1.776, LRT p = 0.02). Similarly, in GSE39582, high NRF2 expression corresponded to worse DFS (HR=1.168, 95% CI 1.000–1.363, LRT p = 0.049) after adjusting for the effect of stage and mismatch repair (MMR) status. In the latter dataset, NRF2 expression was also significantly associated with worse OS when adjusting for stage alone (HR=1.185, 95% CI 1.040–1.350, LRT p = 0.01). The OS analysis was not adjusted for MMR status because of the known contrasting effects of MMR status on prognosis in early-stage and metastatic disease, which could lead to model misspecification.
In the MRC FOCUS trial, the prognostic factors available within the dataset were site of the primary tumour (sidedness) and BRAF V600E mutation. Again, high NRF2 expression corresponded to worse OS (HR=1.123, 95% CI 1.020–1.237, LRT p = 0.0185). In summary, there was consistent evidence that NRF2 signalling had an effect on DFS and/or OS in all available datasets (Table 2).
NRF2 expression and Consensus Molecular Subtypes (CMS)
To understand how NRF2 expression aligns with the current transcriptomic landscape of colorectal cancer, we examined the distribution of the three NRF2 expression groups (low, intermediate and high) across the four CMS subtypes in the MRC FOCUS trial (Figure 4). While high NRF2 expression was seen across all subtypes, CMS4 showed strikingly higher NRF2 expression, with no patients in the low NRF2 expression category. By contrast, the majority of patients in CMS2 or CMS3 had low or intermediate NRF2 expression.
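The comparison of expression groups against CMS subtypes amounts to a cross-tabulation, sketched below. This is an illustration under stated assumptions, not the authors' code: it assumes the three groups are tertiles of a continuous NRF2 expression score, and the scores, CMS labels and function names (`tertile_labels`, `crosstab`) are hypothetical toy values chosen only to show the mechanics.

```python
from collections import Counter
from statistics import quantiles


def tertile_labels(scores):
    """Label each score as low, intermediate or high by tertile cut points."""
    # quantiles(..., n=3) returns the two cut points separating three groups.
    low_cut, high_cut = quantiles(scores, n=3)
    labels = []
    for s in scores:
        if s <= low_cut:
            labels.append("low")
        elif s <= high_cut:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels


def crosstab(groups, subtypes):
    """Count patients in each (subtype, expression group) cell."""
    return Counter(zip(subtypes, groups))


# Hypothetical toy data: one expression score and one CMS call per patient.
scores = [-2.1, -1.4, -0.9, -0.3, 0.0, 0.2, 0.8, 1.1, 1.9]
cms = ["CMS2", "CMS3", "CMS2", "CMS1", "CMS3", "CMS2", "CMS4", "CMS1", "CMS4"]

groups = tertile_labels(scores)
table = crosstab(groups, cms)
for (subtype, group), count in sorted(table.items()):
    print(subtype, group, count)
```

A `Counter` returns zero for cells that never occur, so an empty cell such as `table[("CMS4", "low")]` can be read directly without special handling.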
[1] As NRF2 expression is a continuous variable here, the HR reported in the text of this section is the HR between the upper and lower tertiles of NRF2 expression.
[2] See footnote 1.