Clinicopathological characteristics of the patients with CRC
The clinicopathological features of 150 patients with CRC are shown in Table 1. There were 91 male patients (60.67%) and 59 female patients (39.33%). The age of the patients ranged from 31 to 77 years, with a median age of 46 years. All 150 patients were adenocarcinoma, of which low, medium, and high differentiated adenocarcinoma accounted for 21.33% (32/150), 68.67% (103/150), and 10.00% (15/150), respectively. According to the AJCC clinical staging (stage) criteria, the proportions of patients with stage I/II, stage III, and stage IV were 14.67% (22/150), 26.00% (39/150), and 59.33% (89/150), respectively. In TNM staging, the proportion of patients with T1, T2, T3 and T4 was 2.00% (3/150), 7.33% (11/150), 57.33% (86/150) and 33.33% (50/150), respectively. The proportions of patients with N0, N1, N2 and N3 were 33.33%(5/150), 27.33% (41/150), 36.67% (55/150) and 2.66% (4/150), respectively. The proportion of patients with stage M0 and M1 was 46.67% (70/150) and 53.33% (80/150), respectively. In addition, in the detection of serum tumor markers in 150 CRC patients, the proportion of patients with elevated CEA, CA125 and CA199 levels was 60.00% (90/150), 18.70% (28/150) and 48.00% (72/150), respectively.
Mutational analysis
A total of 57 gene panels were detected in 150 CRC patients by NGS, including 7 mutation types. These were single-nucleotide polymorphisms (SNPs), single nucleotide variations (SNVs), insertion-deletion (Indels), copy-number variation (CNV), fusion, complex, and insertion mutation (INS). As shown in Fig. 1A, SNV and Indels were the most common mutation types in CRC patients. SNV mutations occurred in 42 genes, such as TP53 (90/150), KRAS (66/150), APC (45/150), PIK3CA (36/150), FBXW7 (28/150), PTEN (15/150), KIT (14/150), and EGFR (10/150). The cumulative mutation frequency was as high as 92.67% (cumulative mutation frequency = mutation frequency/total frequency). Indels mutations occurred in 11 genes, including APC (26/150), TP53 (15/150), RB1 (6/150), TERT (6/150), ALK (5/150) and FBXW7 (5/150), with a cumulative mutation frequency of 47.33%. The 12 genes panel of TP53, APC, PTEN, BRAF, ALK, KRAS, TERT, FBXW7, TSC1, EGFR, RB1, and CTNNB1 showed both types of SNV and Indels mutations. The other five mutation types are less variable in CRC patients. The cumulative mutation frequencies of Complex, CNV, and fusion were 23.33%, 8.67%, and 8.00%, respectively. INS mutation occurred only in TP53 (1/150) and APC (1/150) genes and the cumulative mutation frequency was 4.67%. SNP variation occurred only in the PMS2 (1/150) gene, with a cumulative variation frequency of 0.67%.
As shown by the outermost circle in Fig. 1B, 12 of the 57 CRC genes are distributed on chromosomes 1 and 9, whereas the other genes are mainly found on chromosomes 3, 4, and 17. Among the 57 CRC genes, TP53 with the highest SNV frequency was located on chromosome 17, and APC with the highest frequency of Indels was located on chromosome 5. Figure 1B shows that among the 57 CRC genes, there are 39 (red) genes with targeted drug therapy and 18 (blue) genes with chemotherapeutic drugs. The histogram of the innermost circle showed the mutation frequency of the 57 CRC genes. In addition, a total of 729 gene network interactions among 57 CRC genes were identified by the String database, of which only one gene G0S did not interact with any other gene.
Common detection of gene mutational analysis
There were 12 gene panels detected in 150 patients. As shown in Fig. 2A, the mutation frequency of TP53 was the highest, and mutations occurred in 112 patients, with a mutation frequency of 74.67%(112/150). The second was the APC gene ranked second with a mutation frequency of 66.67%(100/150). The others were KRAS 44.67%(67/150), PIK3CA 24.67%(37/150), FBXW7 23.33%(35/150), PTEN 16%(24/150), BRAF 7.33%(11/150), NRAS6% (9/150), ERBB 24.67%(7/150), MAP2K 12%(3/150), PDGFRA 1.33%(2/150) and HRAS 1.33%(2/150).
At the same time, five types of mutations were observed in these 12 genes, namely SNV, Indels, CNV, fusion, and complex. In addition, among the 12 mutant genes, there are 6 drug-usable genes, namely KRAS, PIK3 CA, PTEN, BRAF, NRAS, and TP53. Figure 2B shows the mutation sites of these six drug-resistant genes and the corresponding therapeutic drugs for each site. The analysis results show that the genes with the most drug-resistant mutation sites are TP53 and PIK3CA, with TP53 having six exons EX4, EX5, EX6, EX7, EX8, and EX10 mutation sites. PIK3CA had six exons EX2, EX5, EX10, EX12, EX14, and EX21 mutations, but their medication levels were relatively low. The TP53 therapeutic drug was the class C targeted drug Adavosertib; the therapeutic drugs of PIK3CA were the class B targeted drug Alpelisib and the class D targeted drugs cetuximab, panitumumab, and chemotherapeutic drug 5-Fu in combination with irinotecan. Although there were not many mutation sites of KRAS, NRAS, and BRAS, KRAS had three exons EX2, EX3, and EX4 site mutations, and one EX4 site had a SNP mutation. In NRAS, there were three exons EX2, EX3, and EX4 mutations. There are three exons EX11, EX15, and EX18 mutations in BRAF, but their medication grades are class A targeted drugs cetuximab, panitumumab, etc., suggesting that the mutations of these three genes have important clinical significance in the relationship between the efficacy of targeted drugs. In addition, PTEN has three exons EX4, EX7, and EX8 site mutations, which are treated with class C and class A drugs. The Class D targeted drugs are cetuximab and panitumumab; two whole exon mutations were treated with Class C targeted drugs cetuximab and panitumumab.
To screen out the common detection genes of CRC, we sorted 57 CRC genes according to the number of patients detected. As shown in Fig. 2C, genes that detect more than 40 patients are defined as common detection genes. Finally, 36 CRC common detection genes were screened out: TP53, APC, KRAS, EGFR, PIK3CA, FBXW7, PTEN, BRAF, NRAS, ERBB2, MAP2K1, HRAS, PDGFRA, NTRK1, KIT, MTHFR, ROS1, RB1, GSTP1, XRCC1, CTNNB1, ALK, KDR, SMAD4, MET, NQO1, TERT, VHL, JAK1, STK11, TSC1, TSC2, AKT1, PMS2, MSH6 and MSH2.
Comparison with the TCGA database
To assess the universality of 36 common detections of gene panel in patients with CRC, the mutational status of the clinical cohort was compared with the TCGA CRC cohort. The clinical characteristics of 590 patients were obtained, including sex, age at diagnosis, histologic types, clinical stage, T classification, N classification, M classification, and overall survival status (Fig. 3A). To verify whether the mutation frequency of 36 CRC common detection genes in 150 CRC patients matched the mutation data in the TCGA database, we generated a scatter plot based on the mutation rate of the two cohorts. As shown in Fig. 3B, the R2 = 0.8611 of the fitting curve is approximately close to the R2 = 1 of the ideal curve, indicating that the mutation frequency of 36 CRC common detection genes is highly consistent in the two cohorts, suggesting that the 36 mutation genes screened according to the NGS data are credible.
Function annotation and PPI network analysis of 36 CRC common detection genes
GO and KEGG pathway analyses were performed to annotate the 36 CRC genes. The biological process category contained 977 terms, of which the major terms were ‘negative regulation process of apoptosis’, ‘Negative regulation of cell proliferation’ and ‘positive regulation process of kinase’ (Fig. 4A). The top enriched terms for cellular components (Fig. 4B) and molecular functions (Fig. 4C) were ‘cytosol’ and ‘protein binding’, respectively. In the analysis of KEGG pathway, many pathways related to Oncology were observed, including ‘pathways in cancer’, ‘PI3K-AKT signaling pathway’, and ‘Hepatocellular carcinoma’(Fig. 4D).
Additionally, to explore the potential relationships among the 36 CRC common gene detections, a PPI network was constructed. As shown in Fig. 5A, there were 36 nodes and 358 edges in the PPI network. A significant module had 22 nodes connected by 214 edges with a score of 20.381 (Fig. 5B). In this module, the top 10 hub genes were identified according to the MCC ranking, including TP53, PTEN, PIK3CA, KRAS, CTNNB1, NRAS, HRAS, ERBB2, BRAF and MSH6 (Fig. 5C).
The level of immune infiltration of 36 CRC common detection genes
To evaluate the relationship between 36 CRC common detection genes and infiltration by immune cell, the GSVA package was run in R for immune cell enrichment. As shown in Fig. 6A,S1,S2, the genes with more than 5 types of immune infiltrating cells were statistically different. Finally, 27 genes were obtained: STK11, HRAS, RB1, NRAS, GSTP1, TP53, NQO1, JAK1, MSH2, MTHFR, EGFR, KDR, XRCC1, APC, KRAS, TSC1, PMS2, KIT, ALK, PDGFRA, NTRK1, BRAF, PTEN, CTNNB1, FBXW7, MAP2K1 and AKT1.
At the same time, we used 8 algorithms from the TIMER2.0 database (TIMER, EPIC, MCP-COUNTER, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, XCELL, TIDE). The correlation between the mutation status of 36 CRC common detection genes and the infiltration level of 9 immune cells: CD8+T cells, CD4+T cells, Treg cells, B cells, neutrophils, macrophages, DC cells, NK cells, and MDSC cells was analyzed. As shown in Fig. 6B, the round representative gene mutation was positively correlated with the level of immune cell infiltration, whereas the diamond was a negative correlation. Finally, 16 mutant genes with an immune infiltration score greater than 2 were screened: BRAF, MTHFR, APC, TSC1, MSH6, PMS2, TP53, JAK1, CTNNB1, EGFR, ALK, FBXW7, KDR, KRAS, KIT and TSC2.
Venny online tool was used to intersect 16 genes screened in the TIMER2.0 database with 27 genes screened by the ssGSEA method. Finally, 14 common immune-related genes were obtained: TP53, JAK1, MTHFR, EGFR, KDR, APC, KRAS, TSC2, PMS2, KIT, ALK, BRAF, CTNNB1 and FBXW7 (Fig. 6C).
Establishment of an immune-related prognostic risk model
Clinical data from 590 CRC patients and the mutation data of 14 immune-related genes were downloaded from the TCGA database. SPSS 22.0 software was used to perform univariate Cox risk regression analysis for14 immune-related genes and clinicopathological features. The results showed that among the 14 immune-related genes, ALK was statistically significant (P = 0.009). Clinical stage (P < 0.001), T classification (P < 0.001), N classification (P < 0.001), and M classification (P < 0.001) were statistically significant, which were related to the risk of death (Fig. 7A). Subsequent multivariate Cox regression analysis showed that clinical stage (P < 0.001), T classification (P = 0.015), and ALK (P = 0.003) were independent prognostic risk factors (Fig. 7B). Finally, the prognostic risk model was established according to the multivariate Cox risk regression coefficient, h (t,x)/h0 (t) = EXP (0.835stage + 0.878T + 1.094ALK).
Performance evaluation of the prognostic risk model for immune-related genes
The risk scores of 590 CRC patients were calculated according to the immune prognostic risk model, and the patients were divided into high- and low-risk groups. As shown in Fig. 8A, the survival time of CRC patients in the high-risk group was significantly shorter than that in the low-risk group (P = 0.001). The risk scores of risk factors were visualized by Xiantao Academic. The 590 CRC patients were ranked from low to high based on the risk scores and divided into a low-risk group (left) and a high-risk group (right). The survival time and outcome of the patients in the high- and low-risk groups, and the expression heatmap of ALK mutant genes in the high- and low-risk groups (Fig. 8B). The results showed that there were more ALK gene mutations in the high-risk group, and the prognosis of the high-risk group was worse than that of the low-risk group. To improve the accuracy of the risk score in predicting the prognosis of patients and to facilitate clinical application, we used the clinical stage, T classification, and ALK gene to draw a Cox regression nomogram. By summingthe scores of the variables included in the modeland then drawing a vertical line at the total score, it intersects with the three lines representing the predicted survival probability, and the corresponding values of the intersections are the 1-year, 3-year, and 5-year relapse free survival (RFS) of the predicted individual (Fig. 8C). For example, a CRC patient with clinical stage IV, T3, and ALK gene mutation had a total score of approximately 53, and his 1-year, 3-year, and 5-year RFS were 81.00%, 28.50%, and 10.50%, respectively.
To evaluate the specificity and sensitivity of the prognostic risk model, we plotted the ROC curve and calculated the area under the ROC curve (AUC). As shown in Fig. 8D, the AUC of the risk model, clinical stage, T classification, and ALK gene were 0.543,0.768,0.663, and 0.536, respectively. Because the AUC was greater than 0.5, the model was considered to have good specificity and sensitivity. The above assessment of immune-related prognostic risk suggests that late clinical stage, late T classification, and ALK gene mutation will increase the risk of death and the prognosis of patients is poor.
Application of the immune-related prognostic risk model in prognosis evaluation of CRC patients
A total of 112 CRC patients were followed up. Patients were divided into a high-risk group and a low-risk group based on the risk score, including 76 cases in the high-risk group and 36 cases in the low-risk group. The Kaplan-Meier survival curve (Fig. 9A) and log-rank test analysis (P = 0.001) were drawn by SPSS version 2.0. The results showed that the survival time of patients in the low-risk group was significantly longer than that in the high-risk group, indicating that the immune-related prognostic risk model we constructed can be used to predict the prognosis of CRC patients.
The immune-related prognostic risk model is associated with MSI in CRC patients
The clinical data of 92 CRC patients with MSI detection were downloaded from the TCGA database. According to the stability of MSI, patients were divided into MSI-H group and MSS group, including 23 patients in MSI-H group and 69 patients in MSS group. The risk score of 92 patients was calculated according to the immune prognosis risk model. The risk scores of MSI-H group and MSS group were compared by rank sum test. The results showed that the risk score of CRC patients in the MSI-H group was significantly higher than that in the MSS group, and there was a statistical difference (P = 0.020). According to the 2023 Chinese Society of Clinical Oncology (CSCO) guidelines for the diagnosis and treatment of colorectal cancer, immune checkpoint inhibitors are recommended as grade III for the first-line palliative treatment of MSI-H CRC patients. The results suggest that the immune-related prognostic risk model can be used to predict the efficacy of immunotherapy (Fig. 9B).