Tumor-Infiltrating Immune Cell (TIIC) Analysis
To investigate the cell types infiltrating tumors, this study obtained 24,351 cells from GSE132465 through quality control. The correlation between sequencing depth and the total number of genes was 0.9, and the correlation with mitochondrial genes was 0.3, as illustrated in Figure S1. Cell clustering was performed based on the top 20 significant principal components, resulting in the categorization of cells into 17 groups. The high-expression genes in each cell group were visualized using a heatmap (Fig. 1a, b). To annotate the cell types infiltrating tumor tissues, these 17 cell types were identified as T cells, B cells, monocytes, macrophages, NK cells, tissue stem cells, endothelial cells, and epithelial cells. Dimensionality reduction analysis was further conducted using the T-SENE algorithm (Fig.1 c, d). Subsequently, trajectory analysis was performed on these cells, dividing them into three states. Monocytes and macrophages exhibited similar differentiation trajectories, while B cells demonstrated two distinct differentiation fates. One was similar to the differentiation trajectory of endothelial cells, epithelial cells, and tissue stem cells, and the other resembled the differentiation trajectory of T cells (Fig. 1e, f). This study identified five types of immune cells infiltrating tumor tissues, and the expression of specific marker genes for each cell type (Fig. 1i).
Construction of the Immune-Related Risk Model
To precisely delineate immune genes associated with tumor cells, this study identified five distinct immune cell populations within tumor tissues and classified genes exhibiting high expression in these cells as Tumor Immune-Related Genes (TIRGs), totaling 2020 genes (Supplementary Table 1). To validate the functional significance of TIRGs, rigorous GO and KEGG functional enrichment analyses were performed. The GO analysis unveiled predominant involvement of TIRGs in key processes including cytoplasmic translation, immune response activation, modulation of immune response-regulating signaling pathways, activation of immune response-initiating signaling pathways, modulation of immune response-regulating cell surface receptor signaling pathways, and regulation of T cell activation pathways. Similarly, KEGG analysis highlighted significant associations of TIRGs with essential pathways such as Antigen processing and presentation, Ribosome, Lysosome, Th17 cell differentiation, as well as Th1 and Th2 cell differentiation. Both analyses collectively underscore the crucial regulatory role played by TIRGs in immune-related functionalities (Fig. 2a-d, Supplementary Tables 2, 3). To identify genes significantly associated with patient survival within TIRGs, the dataset underwent partitioning into internal training, internal testing, and external validation sets to facilitate comprehensive analysis. Univariate Cox regression analysis was then conducted on patient survival status and time, revealing 120 genes with p-values less than 0.05. Following this, feature selection was performed using the lasso algorithm (Fig. 2e-f). Ultimately, three genes exhibiting the most significant likelihood deviation (p < 0.05) in the multivariate regression analysis namely GZMB, TIMP1, and SLC39A8 were identified and retained as model genes. The risk score for each sample was subsequently computed using the established risk model, following the formula: Risk score = (GZMB level * -0.17278 + TIMP1 level * 0.35553 + SLC39A8 level * -0.42157). Following stratification, all samples were divided into high and low-risk groups utilizing the median Risk score, enabling subsequent Kaplan-Meier survival analysis. Across the training, testing, and validation sets, the obtained p-values were below 0.0001, 0.0026, and 0.00022, respectively (Fig. 2g-j).
Furthermore, a notable distinction in Risk score emerged between surviving and deceased patients (Fig. 2k-n), underscoring the efficacy of the risk model developed herein in discerning patient survival outcomes. To bolster the reliability assessment of the model, an in-depth examination of survival status distribution, risk score distribution, and gene feature expression was conducted across the training set (Fig. 3a), testing set (Fig. 3b), the entirety of the internal dataset (Fig. 3c), and the external validation set (Fig. 3d). The consistent performance observed across all datasets lends robust support to the predictive prowess of the risk score model formulated in this study. In further elucidating the clinical significance of the independently prognosticated outcomes derived from the constructed TIRG model, a comprehensive multifactorial analysis integrating clinical parameters with Risk score was executed, culminating in the generation of nomograms. The findings elucidated that Risk score stratifications, as determined by the TIRGs model, significantly influenced patient survival outcomes across both internal and external datasets (p < 0.05, Fig. 3e-h).
Negative Correlation between Immune Cell Infiltration and Risk Score
To examine the association between risk scores and Tumor-Infiltrating Immune Cells (TIIC), we utilized the CIBERSORT algorithm to compute infiltration scores for 22 distinct immune cell types in both internal and external samples. Subsequently, the 'pearson' method was employed to assess the correlation between model genes, risk scores, and the aforementioned 22 types of immune cells. Our findings unveiled a negative correlation between risk scores and Monocytes, NK cells resting, Plasma cells, T cells CD4 memory activated, T cells CD4 memory resting, T cells CD8, and T cells follicular helper. Conversely, a positive correlation was noted with Macrophages M0, Macrophages M2, and Neutrophils. Consistent immune infiltration patterns were observed in the external validation set (Fig. 4a-d). The relationship between Risk score and cell infiltration scores for each sample in both internal and external datasets was graphically depicted using scatter plots, where statistical significance (p < 0.05, Supplementary Fig. S2). Given the recognized influence of immunogenic cell death (ICD) and necroptosis on the anti-cancer immune response, differential analyses of genes associated with these pathways were conducted across risk groups in both datasets (Supplementary Tables 4, 5). The findings revealed that in both patient cohorts, the majority of ICD-related genes, such as HMGB1 and HSPA4, exhibited upregulation in low-risk patients, while PANX1, ROCK1, and ANXA1 were notably overexpressed in high-risk patients (Supplementary Fig. S3). Similarly, in the combined datasets, it was evident that most genes associated with necroptosis, including EZH2, RIPK3, PGAM5, and ALDH2, displayed relatively higher expression levels in low-risk patients compared to their high-risk counterparts (Supplementary Fig. S4). To further understand the differences in immune function between the two groups of patients, we calculated 28 immune cell function scores as well as the tumor immune type in each sample (Supplementary Fig. S5).
Comprehensive Assessment of Differences between High and Low-Risk Groups in TMB, MSI, and Immune Function
We assessed the correlation between MSI grouping and risk scores within TCGA-COAD samples. Our findings revealed that the microsatellite-stable (MSS) group exhibited significantly higher risk scores compared to the microsatellite instability-high (MSI-H) group. Furthermore, Kaplan-Meier survival analysis demonstrated that patients classified as MSI-L+High-risk displayed the shortest survival duration (Fig. 5a, b). Given the well-established association between tumor mutational burden (TMB) and MSI, along with the documented enhanced response to immunotherapy in patients with elevated TMB levels, we retrieved exome sequencing data from TCGA-COAD, computed TMB scores for each sample, and investigated their correlation with risk group stratifications. We observed that samples categorized in the low-risk group demonstrated elevated TMB scores. Subsequently, based on the median TMB, we stratified samples into High TMB and Low TMB groups and conducted KM survival analysis in association with risk groupings. Our analysis unveiled that patients characterized by Low TMB and High-risk profiles exhibited the most unfavorable survival outcomes (Fig. 5c, d). These findings imply an inverse relationship between risk scores and both MSI and TMB scores, aligning with diminished survival rates. Subsequent to this observation, we employed GSVA to explore disparities in GO, KEGG, and Hallmark gene functions between the two risk subgroups within the internal dataset. The analysis uncovered heightened enrichment of several KEGG pathways linked to cancer progression within the high-risk group, notably including Glycosaminoglycan Degradation, Leukocyte Transendothelial Migration, Cell Adhesion Molecules Cams, Regulation of Actin Cytoskeleton, and the TGF Beta Signaling Pathway. In contrast, pathways such as the Citrate Cycle, TCA Cycle, DNA Replication, Peroxisome, and Mismatch Repair exhibited elevated enrichment in the low-risk group (Fig. 5e). In the GO enrichment analysis, similar findings were observed, with pathways such as Negative Regulation of Extracellular Matrix Disassembly, Angiogenesis Involved In Coronary Vascular Morphogenesis, Type II Transforming Growth Factor Beta Receptor Binding, Regulation of Epithelial To Mesenchymal Transition, and Negative Regulation of Macrophage Cytokine Production showing notable enrichment in the high-risk group (Fig. 5f). Analysis of the HALLMARK gene set via GSVA revealed enrichment of pathways such as Angiogenesis, Apoptosis, Epithelial Mesenchymal Transition, Hypoxia, and Notch Signaling in the high-risk group, all closely associated with cancer progression. Conversely, pathways including DNA Repair, E2F Targets, G2M Checkpoint, Myc Targets V1, and Myc Targets V2 were enriched in the low-risk group, indicating their critical roles in impeding cancer progression (Supplementary Table 6). Additionally, we explored the correlation of four key pathways with risk scores, uncovering a negative correlation between DNA repair and G2M checkpoint pathways with risk scores, and a positive correlation between Epithelial Mesenchymal Transition (EMT) and Notch signaling pathways with risk scores (Fig. 5g). Considering the established association between TMB and DNA repair mechanisms, we conducted a comparative analysis of DNA damage-related pathways between the two risk groups. The results revealed significantly higher enrichment scores for almost all DNA damage repair-related pathways in the low-risk group compared to the high-risk group (Fig. 5g).
Poor Immunotherapy Response in High-Risk Group
To assess the potential immunotherapy response in cohorts stratified by risk level, we employed the TIDE online tool to analyze T-cell infiltration and functionality across internal and external datasets. Our findings indicate notable functional dysregulation and diminished T-cell infiltration in the high-risk cohort (Fig. 6a, b). Moreover, we computed immune phenotype scores for both groups, revealing higher scores in MHC, CP, AZ, and IPS for the low-risk cohort, suggesting a more favorable response to immunotherapy (Fig. 6c, d). Furthermore, our investigation extended to predicting chemotherapeutic efficacy. We noted disparities in IC50 values between high and low-risk groups across different drugs. For example, ABT.263, AKT-inhibitor-VIII, and ATRA exhibited higher IC50 values in the high-risk group, whereas most drugs displayed higher IC50 values in the low-risk group, including A.770041, AG.014699, AICAR, AMG.706, and AS601245 (Fig. 6e). These findings underscore a notable correlation between chemotherapy treatment efficacy and risk scores. Subsequently, to validate the association between immunotherapy response and risk groups, we extracted immunotherapy response expression data and corresponding clinical information from the IMvigor210 dataset. Initially, Kaplan-Meier survival curve analysis elucidated an unfavorable prognosis for patients classified as high-risk, evidenced by significant disparities in risk scores among different survival statuses. Furthermore, concerning immunotherapy response, the responder (R) cohort demonstrated markedly lower risk scores in contrast to the non-responder (NR) group, as depicted in Figure 6S. This observation suggests that immunotherapy may exert a less favorable impact on high-risk patients, which corroborates both our hypotheses and earlier analyses. Additionally, an examination of immune checkpoint-related gene expression levels (PDCD1, CD274, TIGIT, PDCD1LG2, LAG3, HAVCR2, and CTLA4) revealed diminished expression levels and prolonged survival periods in individuals categorized as low-risk (Fig. 6i-o).