Identification of Angiogenesis Gene Subtypes in Colorectal Cancer
First, we inputted angiogenesis, neovascularization and vasculogenesis respectively into the GeneCards website to obtain different three gene sets. Then, these gene sets were intersected and plotted in a Venn diagram to obtain a total of 260 genes associated with angiogenesis (Fig. 1A). It is well known that angiogene-related genes (ARGs) are key factors in tumor angiogenesis[29]. Based on the above hypothesis, we stratified samples with qualitatively different CRC based on the expression of 260 angiogenesis genes using consensus clustering analysis of the NMF algorithm (Supplementary Fig. 1). As shown in Fig. 1B, we identified two different clusters of modified patterns, including 314 cases in C1 and 229 cases in C2. The further survival analysis implied that C2 had a better prognosis than C1 (Fig. 1C). Finally, we used heat maps to show the relationship between ARGs and clinicopathological features in two typing patterns (Fig. 1D).
Analysis of GSVA and immune function in angiogenesis-related genotyping
To preliminarily unveil the biomolecular mechanism of two different patterns of angiogenesis, we performed GSVA enrichment analyses in patients with C1 and C2. As shown in Fig. 2A, C1 patients showed significant enrichment in the intercellular junction pathway, such as VASCULAR SMOOTH MUSCLE CONTRACTION and FOCAL ADHESION. However, C2 patients presented significant enrichment in metabolism-related pathways, such as AMINOACYL TRNA BIOSYNTHESIS and LYSINE DEGRADATION. Further analysis of the difference in immunoinfiltration between the patients of different pattern cluster demonstrated that C2 patients showed high infiltration abundance of effector memory CD4 + T cells and activated CD4 + T cells, while C1 patients showed higher infiltration of M0 macrophages. (Fig. 2B, C).
Construction and Evaluation of ARGs-DEGs risk core model
The above data showed that CRC classification by ARGs was significantly different in survival. Therefore, we further established a risk model for predicting CRC prognosis based on the two subtypes. As shown in supplementary Table 1, through differential expression genes (DEGs) analysis, we obtained a gene set containing 1735 genes. Then, univariate Cox regression analysis of 1735 ARGs-DEGs was performed, and 38 genes significant associated with the prognosis of CRC were obtained (Supplementary Fig. 2). Furthermore, we applied LASSO regression analysis based on the 38 ARGs-DEGs and constructed a prognostic model contained 13 genes, which including PRKAR2A, CPT2, CALB2, IRF7, ARL8A, ENO2, DPP7, MPC1, CTNNA1, INHBB, SEZ6L2, DHRS7 and GSR (Supplementary Fig. 3A-B). To verify the accuracy and effectiveness of ARGs-DEGs model, all CRC patients were randomly divided into a training cohort (n = 383) and a validation cohort (n = 160) at 7:3. In the training CRC cohort, the median risk score divides CRC patients into low risk and high risk cohorts (Fig. 3A-B). The expression of 13 genes in the gene signature in the high-risk and low-risk groups was visualized as a heat map (Fig. 3C). Kaplan-Meier curve showed that CRC patients in the low-risk group had a better prognosis (Fig. 3D). In training cohort, the values of the 1-year, 3-year and 5-year ROC curves were 0.685, 0.696 and 0.773, respectively (Fig. 3E). In the validation cohort, the accuracy and validity of the ARGs-DEGs model was almost consistent to the training cohort (Supplementary Fig. 4A-E). These data provided strong evidence for the accuracy and effectiveness of ARGs-DEGs model.
Construction and validation of the nomogram
The ARGs-DEGs model demonstrated accurate prognostic performance for CRC, and we intended to further expand the clinical value. Univariate and multiariate Cox regression analyses revealed that the riskscore of ARGs-DEGs model is an independent risk factor for CRC (Fig. 4A-B). In order to better predict the survival of patients with colorectal cancer, we further constructed a norman chart with various clinical features and risk scores (Fig. 4C). Further analysis revealed that the Norman chart was more accurate in evaluating the prognosis of CRC patients, especially at 1, 3 and 5 years. (Fig. 4D).
Pathway enrichment analysis of ARGs-DEGs model
The ARGs-DEGs model presented excellent predictive performance for the prognosis of CRC, and we intended to preliminarily explore its underlying mechanism. According to the stratification of risk scores, KEGG and HALLMARKER analyses were carried out for TCGA-CRC patients by GESA. In the KEGG dataset, the signal pathways enriched in the high-risk group were mainly intercellular junctions, extracellular matrix receptors and adhesion patch junction signaling pathways, and the intracellular metabolism-related pathways were enriched in the low-risk group (Fig. 5A, B). In the HALLMARK dataset, the signal pathways enriched in the high-risk group were mainly epithelial mesenchymal transition and inflammatory response pathways, and the cell cycle and oxidative phosphorylation-related pathways were mainly enriched in the low-risk group (Fig. 5C, D).
Estimation of the association of tumor immune profile with the ARGs-DEGs model
Previous reports have confirmed the close relationship between tumor vessels and TIM, we further evaluated the relationship between ARGs-DEGs model and immune profile in CRC[30]. As shown in Fig. 6A-C, Immunescore, Stromalscore and Estimatescores were significantly lower in the high-risk group compared to the low-risk group. Interestingly, Tumorpurity was significantly higher in the high-risk group (Fig. 6A-D). In addition, CIBERSORT was applied to evaluate the difference in immune cell infiltration in different risk groups, and we found that the proportions of T cells CD4 naive, T cells CD4 memory resting, T cells CD4 memory activated, T cells follicular helper, Dendritic cells resting, Dendritic cells activated and Eosinophils were higher in the low-risk group. However, the proportions of Tregs and Macrophages were higher in the high-risk group (Fig. 6E, F). To clarify the correlation between risk scores and immune checkpoints, we conducted correlation analysis of risk scores with 12 immune checkpoints (Fig. 7A). The results demonstrated that the riskscore is positively correlated with PDCD1, HAVCR2 and VSIR, while CD226 and CD96 were up-regulated in the low-risk group (Fig. 7B-M).
Immune escape capacity and immunotherapy efficacy of CRC predicted by risk score
ICIs, represented by CTLA-4/PD-1 inhibitors, undoubtedly bring a great breakthrough for anti-tumor therapy[31]. Traditionally, PD-L1 and MSI have been used as biomarkers to evaluate response to ICIs, but their accuracy is insufficient. With advances in transcriptome sequencing, TIDE and IPS are increasingly being used to evaluate immunotherapy responsess[32]. We divided CRC patients into four groups by using the IPS evaluation system combined with the expression levels of PD1 and CTLA4, namely ips_PD1_negative_CTLA4_negative (PD- and CTLA-), ips_PD1_positive_CTLA4_negative (PD + and CTLA-), ips_PD1_negative_CTLA4_positive (PD- and CTLA+) and ips_ PD1_positive_CTLA4_positive (PD + and CTLA+). Our analysis showed that CRC patients with low risk scores had higher IPS scores, which suggesting that CRC patients with low risk may be more sensitive to immunotherapy (Fig. 8A-D). Finally, we further analyzed the correlation between the riskscores of ARGs-DEGs model and immune dysfunction, exclusion, MSI and TIDE. As shown in Fig. 8E-G, the high-risk patients had more obvious immune dysfunction and immune exclusion, and MSI levels were higher than those of high-risk patients. However, compared with the level of TIDE of high risk patients, low risk patients are lower. (Fig. 8H).