Identification of differentially expressed irlncRNAs (DEirlncRNAs) and risk irlncRNA in CRC
The overall workflow is illustrated in Figure 1. Briefly, after collecting transcriptional profiles from the publicly available database, we annotated those transcriptomic data based on GTF from Ensembl. Considering it is difficult to directly find out the irlncRNA identity, Pearson Correlation Analysis was conducted and a total of 739 irlncRNAs were defined with the setting correlation coefficients > 0.5 (Table S1). We next performed differentially expressed analysis to evaluate the differences in those irlncRNAs between cancerous and paraneoplastic tissues. We recognized a total of 377 DEirlcRNAs, including 28 low-expressed and 349 high-expressed irlncRNAs in CRC patients, with the filter condition setting to logFC > 2 and P value < 0.05 (Figure 2A). We then performed a univariant Cox analysis and identified 115 risk irlncRNAs that were significantly correlated with survival outcomes of patients with CRC. By taking the intersection of the DEirlcRNAs and the risk irlncRNAs, we ultimately recognized 55 irlncRNA as core irlncRNAs (Figure 2C, Table S1). Consequently, we then constructed a Cox HR model (pair-risk model) containing 11 core irRNA pairs as well as an expression HR model (exp-risk model) containing 11 core irlncRNAs (Table 1, 2). Eventually, we, separately, examined the relationship between the risk scores obtained from the two models and disease-related characteristics, including clinical traits, survival states, immune microenvironment, chemotherapeutic sensitivity, and signaling pathways.
Construction of core irlncRNA pairs and establish of a pair-risk model
To ensure that our model can be adopted without considering the differences between the profiles, we paired those core irlncRNAs and then correlated them with the clinical survival information. With the setting condition of p value < 0.05 in the univariate cox analysis, a total of 75 pairs of core irlncRNAs were determined to further construct the risk assessment model (Table S2). After Lasso regression analysis of these 75 core irlncRNA pairs, 11 of them were finally selected for constructing the pair-risk model (Figure 3A, 3B). We then performed univariate and multivariate Cox analyses of these 11 core irlncRNA pairs to confirm their ability to serve as independent clinical prognostic factors (Figure 3C, 3D).
Evaluation of the clinical predictivity of the pair-risk model
We drew the 1- year ROC curve and found the area under the curve (AUC) is 0.789, indicating that the model can be recognized with high accuracy to reflect the short-term overall survival of CRC patients (Figure 4A). To further validate the stability and generalizability of the model, we plotted the 1-, 5-, and 10-year ROC curves (Figure 4B). We noticed that the 5-year ROC curve was optimal with the best cutoff value of 0.944 (Figure 4C). Based on this optimal cutoff value, we then categorized the CRC patients from the TCGA database into two high- and low-risk groups. Of these, 237 patients were classified into the high-risk group, while the remaining 209 patients were allocated to the low-risk group. Next, by virtue of multi-metric ROC curves, we plotted the ROC curve of the risk model together with that of clinical characteristics including age, T, N, and M stages in the identical diagram for comparison. The results showed that the AUC value of the risk model dramatically surpassed that of other clinical parameters, indicative of the high performance of this model (Figure 4D).
The risk score of individual patients and their survival statuses were illustrated in Figures 5A and 5B, from which we can recognize apparent differences in survival outcomes between the two subgroups of CRC patients. Concordantly, we conducted the Kaplan-Meier analysis and observed that the life expectancy of patients in the high-risk group was substantially decreased when compared to patients in the low-risk group (P value < 0.001) (Figure 5C). To estimate the relationship between the risk model and the clinic traits, we charted a clinic-related heat map using the chi-square test. Our findings suggested that the T grade and N grade were significantly different between the two groups with p-values less than 0.01 and 0.001, respectively (Figure 5D). We next examined the risk score of patients across different clinic traits including tumor stage, T grade, and N grade, respectively. We observed that the differences across tumor stage, between distinct T grades, as well as different N grades were all of the statistical significance (Figure 5E-G). Finally, we conducted the univariate and multivariate cox analyses and found the risk assessment model can serve as an independent prognostic predictor (Figure 5H and 5I).
Taken together, the irlncRNAs-based pair-risk assessment model not least can be utilized as a robust predictor of the survival outcomes, but also serve as a reliable indicator of tumor aggressiveness of the CRC patients.
Assessment of the pair-risk model with tumor immune microenvironment
To probe whether this model can reflect the tumor-infiltrating immune cells in the context of CRC, we first performed the correlation analysis of immune cells via 7 different algorithms, which is illustrated with distinct colors (Figure 6A). We observed that granulocytes, M1 macrophage, dendric cell, neutrophil, Th1 cell, CD8+ T cell were positively correlated with the risk value of the model. On the contrary, B cells, hematopoietic stem cell, neutrophil, NK cell resting, T cell CD4+ memory resting showed a negative correlation (Figure 6B-X). This means that the risk model can be applied to predict the landscape of immune microenvironment in patients with CRC.
Estimation of the chemotherapeutic responsiveness of CRC by the pair-risk model
To investigate whether this risk model can be employed to predict the chemotherapeutic sensitivity for CRC patients, we examined the capacity of our risk model for forecasting drug sensitivity by analyzing the immune-related gene expression as well as the differences of IC50 between the high- and low-risk groups. Our observations indicated that some immune-correlative genes, such as TGFB2 and LAG3, were shown a significantly positive correlation with risk score value (Figure 7A and B). Furthermore, we identified 5 chemotherapeutic drugs, including CCT007093, CCT018159, CGP.60474, CGP.082996, JNK.9L, that appeared to have an inverse correlation between risk score and IC50 value (Figure 7C-G), implicating that our risk model can serve as a tool to assist the chemotherapeutic medication.
Construction and validation of an exp-risk model based on the selected core irlncRNAs
We next constructed an irlncRNA exp-risk model based on the 11 core irlncRNAs. Again, we divided the patients into a high-risk group and a low-risk group based on the median risk value of the expression risk model (Figure 8A). Notably, patients in the high-risk group tended to have an apparent shorter life expectancy when comparing to those in the low-risk group (Figure 8B). Moreover, we illustrated the expression levels of the core lncRNAs involved in those risk groups (Figure 8C). Additionally, we plotted the Kaplan-Meier curves comparing the survival outcomes of the groups and observed consistent results (Figure 8D).
Independent prognostic analysis of the exp-risk model
To evaluate whether this exp-risk model can be applied to predict prognostic outcomes of patients with CRC, we performed univariate independent prognostic analysis. Our observations suggested that the risk score, alongside tumor stage, T grade, N grade, and M grade, was significant associated with survival outcomes of CRC patients (Figure 9A). Moreover, we performed the multifactorial independent prognostic analysis and recognized that only the risk score could serve as an independent high-risk factor for predicting the prognosis of colorectal cancer (Figure 9B). Then, we plotted the curve of the exp-risk model together with the curves of other clinic traits. By comparing the AUC of the exp-risk model to those of other clinic traits, we found the AUC value of this risk model is 0.751, while the AUC values obtained for other clinical traits are less than 0.75, indicating that this model outperforms other clinical traits for predicting the survival outcomes of CRC patients (Figure 9C).
Additionally, to explore whether combining the pairs-risk model with the exp-risk model could produce a more robust prognostic predictability, we categorized the patients involved into four groups: high pair score + high expression score; high pair score + low expression score; low pair score + high expression score; low pair score + low expression score. As expected, we found patients in the high pair score + high expression score group had the dramatically worst life expectancy, with a 5-year survival rate of only around 25% (Figure 9D).
GSEA enrichment analysis
Finally, we performed a GSEA enrichment analysis and found that immune-related signaling and cancer-associated signaling pathways were significantly overactivated in the high-risk group. These results further confirmed the accuracy and reliability of the exp-risk model (Figure 10A-L).