Identification of CCSC-related lncRNAs in CRC
According to the screening conditions, we isolated a total of 1768 CCSC-related lncRNAs from the TCGA database and displayed the correlation of CSC markers through a Sankey diagram (Figure 1A). EdgeR and the "DESeq2" R packages were used to screen 669 differential lncRNAs as differentially expressed CCSC-related lncRNAs, of which 104 lncRNAs were down-regulated and 565 lncRNAs were up-regulated (Figure 1 B, C). Univariate Cox regression analysis showed that a total of 5 lncRNAs were identified to be closely associated with patients’ survival in CRC, including LINC00174, ZEB1-AS1, MCM3AP-AS1, ALMS1-IT1 and FENDRR. Among the above core CCSC-related lncRNAs, only FENDRR (HR=0.651, p=0.043) was a protective factor for CRC patients, and the remaining four CCSC-related lncRNAs were risk factors for CRC patients (Figure 1D).
Establishment of a prediction model for CCSC-related lncRNAs in CRC
Based on Lasso regression model and iterative analysis of cross-validation method, we constructed a CCSC-related lncRNA risk model composed of 4 lncRNAs (ZEB1-AS1, LINC00174, FENDRR and ALMS1-IT1) (Figure 2 A, B). According to the risk score formula, we took the median of the score (2.25) as the cutoff value to classify the CRC patients. Following that, the TCGA and GEO cohorts were separated into high- and low- risk groups, respectively. In the PCA distribution diagram, the red scatter points (high-risk) and the blue scatter points (low-risk) can be distinguished effectively (Figure 2C). In the training group, survival analysis indicated that the OS of high-risk CRC patients was both inferior than the low-risk group (p<0.001). Consistently, the OS analysis in the test group and the PFS analysis in the train group all presented the same results (p<0.001) (Figure 2 D~F). In the cox univariate analysis, age, T stage, N stage, M stage, and CSC-lncRNA risk score were all associated with the prognosis of CRC patients (p < 0.001). By multivariate Cox regression analysis, the above factors can still served as survival independent factors against the other clinical characters (Figure 3 A, B). The CCSC-related lncRNA model was further evaluated by ROC curve analysis, and the results suggested that this model had the best accuracy in predicting the 5-year survival of CRC patients, with AUC=0.751 (Figure 3C). Notably, our CCSC-related lncRNA model had better predictive ability than any other clinical features (Figure 3D). In addition, clinical traits such as distant metastasis, lymph node metastasis, and T stage had significant statistical differences between high- and low- risk groups (Figure 3E).
According to the scoring scale in Figure 4A, a clinically characteristic score was applied. The patient's outcome can be estimated based on the composite scoring scale. In Figure 4B, The distance of the calibration curve from the grey line indicated that the accuracy of predicting the surival rates. As we can seen, the 5-year survival rate of patients is comparatively accurate. ROC curves and univariate Cox regression analysis further confirmed the reliability of this nomogram (Figure 4C, 4D). The AUC comparison results showed that the nomogram (0.796) is prominently higher than other clinical traits (Figure 4D). Therefore, our model has strong reliability and applicability for the survival prediction of CRC patients.
Mutation signature and functional enrichment analysis
We conducted the GSEA enrichment analysis between the high- and low- risk groups. The top 5 enrichment signaling pathways of the up-regulated and down-regulated genes were respectively listed in Figure 5A and 5B. As the results showed, cell adhesion molecules, cytokine receptor interaction and ECM receptor interaction were significantly activited in the high-risk group. Among the differences of 22 types of immune cells, B cells naive was statically increased in the high-risk group (Figure 5C). Intriguingly, most immune-related functions are suppressed at high-risk group, such as stimulation of antigen-presenting cells (APCs), chemokine receptors (CCRs), checkpoints, inflammation, stimulation of T cells, and type II interferons (IFNs) (Figure 5F). The above investigations re-confirmed that the CSCs can promote tumor progression by regulating the immune microenvironment. Mutation signature analysis listed the top 20 genes with the highest mutation frequency (Figure 5D, E). The results demonstrated that the mutation frequency of TP53 (a classic tumor suppressor gene ) in the high-risk group was higher than the low-risk group. Whereas, the mutation frequencies of the other 19 oncogenes were lower than the low-risk group. In addition, GSVA analysis indicated that the cancer related pathways, such as P53 signaling pathway, cell cycle, DNA replication, PPAR signaling pathway, apoptosis death and biological metabolism, possessed a higher activity in the high-risk group (Figure 5G). Moreover, the microsatellite instability of the low-risk group is more unstable (Figure 5H), suggesting that the low-risk group may be more sensitive to immunotherapy.
Analysis of chemotherapy and immunotherapy in CCSC-related lncRNA risk model
Consistent with conventional cognition, the IC50s of the chemotherapy drugs (Fluorouracil, Bleomycin, Gemcitabine and Sunitinb) were all positively correlated with the CSC risk scores of patients (Figure 6A~D). The IC50 of the high-risk group is significantly higher than that of the low-risk group (Figures 6E~6H). These results indicated that the high-risk group is less sensitive to the effects of chemotherapy drugs and targeted therapy, which may be related to the consequence of CSCs. We also analyzed the correlation between risk scores and gene mutation status (Figure 6I~6L). To further validate our observations, we selected four genes among the genes with high mutation frequency including BIRC6, PIK3CA, SOX9, and TP53. Patients with wild-type BIRC6, PIK3CA, and SOX9 consistently had higher risk scores than mutants (Figure 6I~K). Conversely, patients with wild-type TP53 had lower risk scores than those with mutant TP53 (Figure 6L), which is consistent with previous findings. Therefore, intervention strategies targeting CSC are critical in oncology.
Metascape enrichment analysis, PPI network and Hub genes
To further explore the mechanism of CSC in each subgroup, we performed Metascape enrichment analysis of DEGs between the high- and low- risk groups. The results indicated that the over-expressed genes in the low-risk group were mainly involved in immune regulatory responses. Defense response to bacterium, regulation of leukocyte-mediated immunity and adaptive immune response were mainly enriched (Figure 7A). The over-expressed genes in the high-risk group are not only involved in cancer pathways (DNA damage, telomere stress-induced senescence, histone H3K27 trimethylation, and negative regulation of epithelial to mesenchymal), but also in the CSC-related pathways growth (ovarian follicle development, and ncRNAs involved in STAT3 signaling in hepatocellular carcinoma) (Figure 7B). The interactions of the DEGs between the CSC-subsets were obtained from the String database and visualized by Cytoscape software (Figure 7C). The top 10 genes at the core position in the network were acquired by the cytohubba plugin: FABP1, GUCA2A, PNISR, AGT, HOXB4, CLCA4, GUCA2B, ZG16, AQP8 and HOXA5 (Figure 7D). The HOX family members are remarkably clustered into a relatively complete module and their expression is up-regulated in the high-risk group. Interestingly, HOXB4 is at the center of this module.
On this basis, we found that the expression of HOXB4 gene was not only higher in tumor tissue, but also in the high age group, and its expression level also exhibited a increasing trend with the progression of T stage (Figure 8A~8D). Nomogram and calibration analysis of HOXB4 revealed that an accurate prediction was met at 5-year survival (Figure 8E, F). The COAD cohort showed that, in terms of PFS, DSS, OS, over-expressed HOXB4 indicate a worse prognosis (P<0.05). And the predictive ability consisted a statistical difference in both early and advanced stages (Figure 8G~8L). These above results suggested that HOXB4 has a predictive potential for CRC patients. Moreover, the correlation analysis between HOXB4 and CSC genes revealed that HOXB4 was positively correlated with CD44, ALCAM, PROM1, ITGB1 and SOX2 (Figure 8M). Analysis of the tumor immune microenvironment showed that patients with abundant HOXB4 have increased infiltration of immune cells, including T cells, CD8+ T cells, macrophages, NK cells and TH1 cells (Figure 8N). Interestingly, HOXB4 is positively correlated with the infiltration of the vast majority of immune cells, except Th17cells (Figure 8O). Finally, the single-cell RNA sequencing (scRNA-seq) analysis of CRC showed that HOXB4 mainly exist in NK cells (Figure 8P~Q). Therefore, HOXB4 is expected to be one of the candidate genes for potential biomarkers of CRC.