Association of stromal scores, immune scores and tumor purity with CRCprognosis
Stromal scores, immune scores and tumor purity were significantly associated with CRC clinical stages and prognosis. Using a variety of technologies, we obtained the genomic profiles. The stromal and immune scores, as well as the tumor purity, were generated through the uniform algorithm. The results did not reveal any obvious cohort-bias clustering. Hence, we merged the groups from TCGA and GEO and further investigated whether there were stromal scores, immune scores and tumor purity statistically correlated with CRC clinical stages and prognosis. In the present study, we downloaded the gene expression profiles and clinical data of 1635 CRC patients from the TCGA and GEO databases. Patients were 66.06 ± 12.95 years old, with 882 males (53.9%) and 753 (46.1%) females.
For CRC, TNM stages are valuable prognostic indicators. We included them in the analysis. The stromal scores ranged from -2232.54 to 2193.09, immune scores were distributed between -899.57 and 3202.84, and tumor purity ranged from 0.28 to 0.98, respectively (Data were calculated by ESTIMATE algorithm). Figure 1 illustrates that across different stages, the distribution of stromal scores differed (p = 4.79e-10) while the distribution of immune scores had no variation (p = 0.58). Figures 1 show that tumor purity (p = 1.40e-04) were inversely associated with different stages.
To study the possible correlation of the prognosis with stromal score, as well as immune score and tumor purity, we classified the CRC patients into high and lower groups. For this purpose, we created the X-tile and Kaplan-Meier survival curves. The results suggest that the stromal scores were positive correlation with the OS (p=0.036) is statistically significant. Also, the immune scores are positively associated with OS, but it was not statistically significant (p=0.38). Conversely, tumor purity was significantly negative association with OS (p=0.024) (Figure 1).
Identification of DEGs
To uncover the correlation of gene expression profiles with stromal scores, immune scores and tumor purity, we analysed RNAseq data of all 611 CRC cases obtaining in TCGA database. Figure 2, the volcano plots highlight unique gene expression profiles of cases that we classified under high/low stromal scores, immune scores, and tumor purity groups. We based our comparative analyses on three factors: first, stromal scores through the upregulation of 1507 genes and downregulation of 10 genes in the high stromal score group; second, immune scores through the upregulation of 1235 genes and downregulation of 61 genes in the high and low groups; and third, tumor purity group through the upregulation of 1824 genes and downregulation of 68 genes in the low score group. Furthermore, Figure 2 illustrates the IRGs were the upregulated or downregulated intersection genes in the low tumor purity group and high stromal or immune groups (944 upregulated genes and 4 downregulated genes).
Functional analysis of IRGs
Using the STRING tool that included 2834 edges and 471 nodes, we generated PPI networks to study the interactions among the identified immune-related genes (Figure 3A). For further analysis, we chose three modules with at least 20 nodes. As shown in module A (Figure 3B), the network had formations of 903 edges and 43 nodes. Each gene in this model was associated with 42 other genes, indicating that this module might be the core sub-network of PPI. In module B (Figure 3C), there were 395 edges and 43 nodes that had numerous genes in the middle that were critical to immune response. These included OLR1, ITGB2, SLC2A3, ITGAM, and ATP8B4. In module C (Figure 3D), there were 198 edges and 29 nodes where the higher degree values of connectivity in FBN1, COL3A1, and COL1A2 proved their existence as the core genes in the module.
The clustering of these enriched functional-related genes is associated with the immune response, which is in line with the PPI network analysis. Furthermore, we identified statistically significant GO terms, such as the total of 1325 for biological processes, 83 for molecular function, and 77 for the cellular component (FDR<0.05). GO terms (Table S1) included regulation of leukocyte activation, leukocyte migration and T cell activation (Figure 4A), extracellular matrix and side of membrane (Figure 4B), and receptor regulator activity (Figure 4C). Additionally, the KEGG analysis (Figure 4D) generated pathways that were all linked with the immune response.
Construction and validation of the IRGs-based signature
In the context of this study’s TCGA CRC patients, it utilized the univariate Cox models to investigate the association between the expression levels of their IRGs and their OS. Then, to perform the LASSO method, our selection included 171 statistically significant IRGs, which ultimately determined the fundamental IRGs that were significantly related to OS (Figure 5A and 5B). Subsequently, thirteen IRGs and their coefficient values were identified. The formula is as follows: risk score = (0.1217 × expression of A2ML1) + (0.03442 × expression of CALB2) + (-0.6693 × expression of CD1B) + (0.04806 × expression of COL22A1) + (0.4471 × expression of FCRL2) + (0.00069 × expression of GPX3) + (0.05368 × expression of HAND1) + (0.0023 × expression of IDO1) + (0.006 × expression of LAMP5) + (0.07625 × expression of MAP2) + (0.02431 × expression of MMRN1) + (0.1085 × expression of NKAIN4) + (0.3541 × expression of VAX2). Patients were classified into low- and high-risk groups according to a cutoff risk score of 1.096 (Figure 5C and 5D). Figure 5E demonstrates that the high-risk group had substantially lower levels of OS than in the low-risk at p < 0.0001. For 1-, 3-, and 5-year OS, the area under the ROC curve (AUC) was 0.713, 0.724, and 0.689, respectively (Figure 5F). The GSE39582 cohort verified the prognostic model (N=531). Patients were classified into low- and high-risk groups according to an ideal cutoff risk score of 0.973 (Figure 5G and 5H). Figure 5I illustrates that the high-risk group had substantially lower levels of OS than in the low-risk at p = 5.97e-04. For 1-, 3-, and 5-year OS, the AUC was 0.725, 0.643, and 0.606, respectively (Figure 5J).
Independent prognostic role of the thirteen‐IRGs signature
The univariate and multivariate Cox regression analyses were executed in both GEO and TCGA datasets through the adjustment of clinicopathological features like the T stage, N stage, M stage, and tumor stage, as well as the age and gender. This process aimed to assess if the signature-based risk score of the thirteen-IRGs was an independent prognosis factor for OS, which the results confirmed as true (Figure 6A, 6B, 6D and 6E). We verified the clinical significance of the thirteen-IRGs signature using the Chi-square test to ascertain the signature’s association with the clinical parameters. In the TCGA cohort, significant correlations were found between the higher-risk score and tumor stage (p<0.001), M stage (p<0.01), N stage (p<0.001), and T stage (p<0.001) (Figure 6C). However, no significant difference was found in gender and age. Comparable outcomes were observed in the validation cohort of GSE39582 (Figure 6F).
Establishment and assessment of the nomogram based on the thirteen‐IRGs signature
A nomogram was established in terms of the smallest Akaike Information Criterion (AIC) value occurred from the multivariate Cox regression (AIC = 1132.81), which includes age, T stage, N stage, M stage and the thirteen‐IRGs signature (Figure 7A). The calibration plot demonstrated that the evaluating capability of the nomogram was best in forecasting 3‐year OS (Figure 7B). The C-index of the nomogram was 0.770 (95% CI = 0.718–0.823). As Figure 7C displays, the nomogram illustrated a greater net benefit that had a wider range of threshold probability on the DCA for predicting 1-, 3-, and 5- year OS. Combining thirteen‐IRGs signature with TNM stage showed better net benefit for predicting 3‐year OS (Figure 7D). The AUC for 1‐, 3‐, and 5‐ year OS of the nomogram in TCGA dataset was 0.789, 0.783 and 0.790, respectively (Figure 7E). Based on Figure 7F, CRC patients from the high-risk group, stratified by the nomogram, had significantly lower OS rates compared to the low-risk group (P <0.001). Using the GSE39582 cohort for validation, we obtained similar results, as displayed by Figure S1.
Survival analysis of IRGs
For additional confirmation of their prognostic value and expression, we applied the Kaplan-Meier survival analysis for the thirteen IRGs in signature in patients with CRC from TCGA (Figure 8). We found that A2ML1, CALB2, COL22A1, FCRL2, GPX3, HAND1, IDO1, LAMP5, MAP2, MMRN1, NKAIN4 and VAX2 were identified as cancer-promoting factors given their high expression correlation with shorter OS in patients with CRC. On the other hand, the high expression of CD1B exhibited a significant correlation with longer OS. This association implies the possible protective role of RNAs in CRC. Additionally, we verified these results in the GSE39582 cohort (Figure S2).
Correlation between TIICs and prognostic IRGs
The independent predicted factors of sentinel lymph node status and OS among cancer patients are the Tumor-infiltrating immune cells (TIICs) (34). As a result, we explored the correlation of those thirteen prognostic IRGs with the immune infiltration levels in CRC using TIMER (Table 1). The outcomes reveal the positive correlation of the CD1B expression level with the infiltrating levels of neutrophils (r=0.468, p=3.31E-23) and Dendritic Cells (r=0.505, P=2.05E-27) in colon adenocarcinoma (COAD) (Figure 9A). Moreover, GPX3 has the highest significant correlations with infiltrating levels of macrophages (r=0.54, p=5.93E-32), neutrophils (r=0.408, p=4.45E-17) and Dendritic Cells (r=0.434, P=7.01E-20) (Figure 9B). In addition, IDO1 expression was significantly associated with infiltrating levels of CD8+ T cells (r=0.4, P=4.66E-17), neutrophils (r=0.638, p=3.97E-47) and Dendritic Cells (r=0.564, P=3.47E-35) (Figure 9C).