Basic Data Information and Individual Selection
The detailed flowchart of this study is shown in Figure 1. The expression profiles and corresponding demographical information of 30 non-cancer patients and 365 stage I-III colon cancer patients that satisfied the selection criteria were enrolled as the training cohort. Meanwhile, 585 cases in GSE39582 and 138 cases in GSE17536 were selected as the validation cohort for diagnostic and prognostic models, respectively. The characteristics of the cases were listed in Supplemental Table 1.
Diagnosis Score and Potential Value
Through limma analysis, the expression levels of 36 genes were tested (Supplemental Table 2). To compare the expression levels in each stage of cancer, analysis of variance (ANOVA) was performed and used for intergroup comparison. The overlapping areas “a,” “b,” and “c” were shown in the Venn diagram, which reflects a distinct demarcation in stage progression (Figure 2A). Area “a” demonstrated that the genes were different among each group. Area “b” showed that the genes of stage I were significantly different from those of stage II and III. Similarly, area “c” demonstrated that the genes of stage III were expressed significantly more compared with those of stage I and II. Area “a” revealed no significant ERGs; however, AXIN2 and SLC12A2 were shown in areas “b” and “c” respectively. Thus, these genes can be considered as biomarkers related to the early process.
In combination with AXIN2 and SLC12A2, a diagnosis score model between cancer and non-cancer cases was constructed (Table 1). The heatmap of the model is described in Figure 2B. The calculated AUC was 0.94, which indicates that the diagnosis score has better performance in evaluating consistency between predicted results and actual diagnosis results (Figure 2C1). Higher expression levels of AXIN2 and SLC12A2 in the cancer group was observed compared with the non-cancer group, and the comparison among stages was in-line with the Venn diagram (Figure 2C2). The discrimination power of the diagnosis score was further validated; the ROC curve analysis showed that the AUC was 0.89 (Figure 2D1). Also, the expression of AXIN2 and SLC12A2 among cancer patients was significantly greater than that of non-cancer individuals in the validation cohort (Figure 2D2).
ERGs Related to Colon Cancer Patient Survival and Prognostic Score
In the first dataset, 478 eRNAs were evaluated to find their target genes and 1322 genes were found as potential ERGs. Subsequently, from the 675 intersecting target genes in the training and validation dataset, 11 overlapping candidates were filtered by Spearman’s correlation test and univariate Cox analysis (r > 0.40, p < 0.05, Supplemental Table 3-4) in the training set. After the LASSO Cox regression analysis, 11 genes (ZNF160, ZNF467, DCST2, ATP2A1, ABCG1, ZMIZ2, SMARCD3, PLCB4, PCCA, DNAJC15, EPPK1) were retained to calculate the risk score of each patient (Figure S1). The formula of risk score was following: Risk score = 0.34848396 * ZNF467 expression + 0.71141290 * ZNF160 expression + 0.13741571 * ZMIZ2 expression + 0.01644179 * SMARCD3 expression + (-0.06930923) * PLCB4 expression + (-0.13361941) * PCCA expression + (-0.53763715) * EPPK1 expression + (-0.16501684) * DNAJC15 expression + 0.62206947 * DCST2 expression + 0.50923217 * ATP2A1 expression + 0.50157821 * ABCG1 expression. Most of the genes, including ZNF160, ZNF467, DCST2, ATP2A1, ABCG1, ZMIZ2, and SMARCD3 had positive coefficients, which meant their higher expression may result in a worse outcome, while the negative coefficient (PLCB4, PCCA, DNAJC15, and EPPK1) meant that higher expression levels were related to a longer OS. The density distribution plot of the prognosis score was presented in Figure 3A. The cut-off value of the risk score (2.551676) was determined using the largest Youden index from the survminer package in R, and was used to divide the patients into the low-risk (n1 = 310) and high-risk groups (n2 = 55, Figure 3B1). In addition, the clinical differences between the two groups were compared and displayed as follows:
The heatmap of risk stratification is displayed in Figure 3B2. The Kaplan-Meier survival curves of the two groups using the log-rank test showed that a remarkable difference was observed and patients with a lower risk score had a better outcome (p < 0.001, Figure 3C). There was also a significant difference in the expression of each ERG between the low- and high-risk groups (Figure 3D).
Prognostic Model and Performance Estimation
Taking the clinical features and prognosis score into account, univariate and multivariate Cox analyses were performed to identify the prognostic variables. After stepwise selection, age, TNM classification, and risk score were retained in the model and found to have a significant value with OS for CRC patients, and the Cox regression model satisfied the PH assumption (Table 2; p = 0.097, Figure S2). Integrating age, TNM classification, and prognosis score, a compound nomogram was established as a predictive device (Figure 4A). The C-index of the nomogram was 0.76 (95% confidence of interval (CI): 0.69-0.83), suggesting a good discrimination performance. Calibration curves showed that the outcome predicted through the nomogram was close to the observed outcome (Figure 4B1), which indicated accuracy. As shown in Figure 4B2, the area under the curve (AUC) of the time-dependent ROC curve was 0.78 and 0.70 at 5 and 7 years, respectively, which confirmed the strong predictive accuracy of the integrated nomogram.
The nomogram was validated in the third dataset to further indicate the model stability. Consistent with the results in the training set, the calibration plot showed substantial agreement between predictive and reference lines (Figure 4C1). The model also showed a strong ability to predict survival, with an AUC above 0.7 in the independent validation (Figure 4C2).
Correlation between ERGs and eRNAs and Related Biological Pathways
Through the matchup analysis among the ERGs and eRNAs, two clusters of five eRNAs (ENSR00000061966, ENSR00000061967, and ENSR00000061968; ENSR00000317100 and ENSR00000317101) were found to regulate the same diagnostic ERG (SLC12A2) synchronously and were highly correlated with each other; the correlation coefficients between eRNAs in each cluster were all beyond 0.95 (Figure 5A1). Additionally, there was a correlation observed among these eRNAs and SLC12A2. Interestingly, eRNAs of the same cluster were located in the adjacent area.
The ZNF160 gene was regulated by a three eRNAs, including ENSR00000111307, 19:52622405-52628405, and 19:52623389-52629389, and the coefficients for them were close to 1 (Figure 5A2). Moreover, similar results were shown in ZMAZ2, PLCB4, DNAJC15 and their responding eRNAs. The matrix based on the correlation coefficients was listed in Supplemental Table 5. Due to their matching regulatory relationship and close proximity, the six clusters of eRNAs were likely supposed to be seRNAs.
According to the functional enrichment of the GSVA analysis, the activated pathways were identified. In the high-risk group, the upregulated genes are primarily enriched in aminoacyl tRNA biosynthesis; valine, leucine, and isoleucine degradation; butanoate metabolism; and the citrate cycle TCA cycle (Figure 5B).
Differences between Risk Stratification and Tumor Immune Landscape
Because seven ERGs were immune-related among eleven prognostic features, the tumor immune landscape was further investigated. Comparisons between the two risk groups revealed that there were significant differences among the infiltration levels of 23 immune cells (Figure 6A). The proportion of the immune cells, such as central memory CD4+ T cells, immature B cells, and natural killer T cells, in the high-risk group were higher than those in low-risk group (p < 0.05). As presented in Figure 6B, the estimate score and stromal score were also notably higher in the high-risk group than in the low-risk group (p < 0.05), while the tumor purity of the high-risk group was notably lower than that of the low-risk group (p < 0.05). The infiltration of the five immune cells (B cells, CD4+ T-cells, dendritic cells, macrophages, and neutrophils) was positively correlated with the risk score in colon cancer patients (Figure 6C).
Relationships between Gene Signature, Chemotherapy, and Immunotherapy
We chose three immune checkpoint inhibitors (CTLA-4, CD40, and TIGIT) as targets in immunotherapy. The effectiveness of the immune checkpoints differed significantly among conditions (p < 0.01, Figure 7A). The predicted IC50 of three drugs (TW37, Bicalutamide, and Cisplatin) in the low-risk group was higher than that of the high-risk group, which indicated patients with higher risk scores were more sensitive to these chemotherapy agents (p < 0.01, Figure 7B).
Knockdown of AXIN2 and ABCG1 and Their eRNAs
As shown in Figure 8A, both siRNA types were able to efficiently decrease the AXIN2 and ABCG1 expression and protein levels. EdU staining showed that the number of EdU stained cells was markedly reduced compared with the control by AXIN2 and ABCG1 silencing (Figure 8B). Cell growth curves using CCK8 revealed that the knockdown of AXIN2 and ABCG1 led to a reduction in cell viability (Figure 8C). Transwell migration and invasion experiments further validated that AXIN2 and ABCG1 significantly restained both cell migration and invasion abilities of CRC cells (Figure 8D). To measure the quantification of cell apoptosis, flow cytometry was done, which showed apparent inhibition of migration and invasive ability (Figure 8E).