A New Prognostic and Therapeutic Immune-related Risk Signature of Colorectal Cancer


 BackgroundAlthough some advanced colorectal cancer (CRC) patients could select immunotherapy, but still most microsatellite stability (MSS) CRC patients did not respond. Our present study aims to set up a novel system for prognostic prediction and immunotherapeutic responsiveness for MSS CRC patients.MethodsUnivariable Cox regression survival analysis and least absolute shrinkage and selector operation (LASSO) regression analysis were performed to identify prognostic genes and establish immune risk signatures. Multivariate Cox regression analysis was performed to verify whether these clinical features could predict prognosis. R package was used to analyze the relationship between the immune-related risk model and these immune cells, effector molecules, and immune checkpoints.ResultsWe constructed an immune-related signature and verified its predictive capability. Immune-related signature included 12 differentially expressed IRGs (12 DE IR MSSGs), including CXCL1, CD36, FABP4, MS4A2, NRG1, VGF, GRP, HDC, XCL1, NGF, MAGEA1, and IL13. The signature consisting of 12 DE IR MSSGs was an independent and effective prognostic factor for the overall survival of CRC patients. In addition, the signature consisting of 12 DE IR MSSGs reflected the infiltration characteristics of different immunocytes in tumor immune microenvironment. The signature consisting of 12 DE IR MSSGs also had a significant correlation with immune checkpoint molecules.


Introduction
Colorectal cancer (CRC) is a major cause of morbidity and mortality of cancers over the world (1). For the patients with early stage, surgery is usually selected as the primary treatment, while those patients initially diagnosed at advanced stage with distant metastasis usually remained poor survival due to limited therapeutic strategies (2,3). Thus, it's critical that the prognosis of CRC patients is signi cantly associated with the timing of diagnosis, and it's important to develop novel bio-markers or more accurate systems for the diagnosis and prognostic prediction. of MMR results in increased rates of accumulating mutations. MSI is a marker of de ciency of DNA MMR activity (8). This mechanism leads to higher mutation rates and increased tumor antigen load. Moreover, as an important biological feature, more in ltrating immune cells and higher level of type I interferon expression were found in the tumor microenvironment of the MSI-H subtype patients, which is essential for the involvement of ICB (9)(10)(11)(12). Unfortunatly, MSI-H/MMR-de ciency only accounts for approximately 15% of CRC cases (13,14). Therefore, reliable predictive biomarkers are urgently required to monitor tumor progression and immunotherapy e cacy in MSS patients. Immune-related genes (IRGs) and tumor immune microenvironment (TIME) have become promising biomarkers for evaluating the survival of multiple cancers (15)(16)(17). Novel biomarkers can enhance the risk strati cation of screening, and identify susceptibility or initial stages of the disease. Besides, speci c biomarkers can determine whether a personalized treatment is required (18).
CRC patients' data and transcriptome RNA sequence data were downloaded and analyzed from The Cancer Genome Atlas (TCGA) for analysis. We estimated the MSS-mediated immune risk pattern from CRC patients and correlated it with clinical and pathological features of CRC patients. Through comprehensive genomic data analysis, we established a new MSS-mediated immune-related risk model from CRC patients. We found that 12 DE IR MSSGs were powerful prognostic biomarkers and predictors in response to immune checkpoint inhibitors. We also explored immune cells, effector molecules, and mutation data related to risk signature, which might provide novel ideas for further immunological studies and more precise immunotherapy for MSS CRC patients. Our further study aims to determine which biomarkers can determine the best predictive index for patients and can be used in personalized treatment.

Differential expression of DE IR MSSGs in CRC patients
According to the TCGA dataset of 222 CRC patients, we used the R package to adjust (P<0.05 and | log2(multiple changes) |≥1.0), and 21,995 DEGs were obtained. We analyzed the difference and obtained the DEGs by heat map and volcano map (Figure 1,2A). The intersection of MSS-related genes, IRGs, and DEGs in CRC identi ed 253 differentially expressed IRGs (DE IR MSSGs) ( Figure 2B). We extracted DE IR MSSGs using the R package by enrichment analysis. The most prosperous positive terms of GO were as follows: "positive regulation of cytokine production", "positive regulation of peptidyl-tyrosine phosphorylation", "regulation of peptidyl-tyrosine phosphorylation", "peptidyl-tyrosine phosphorylation", "peptidyl-tyrosine modi cation" ( Figure 2C). The most prosperous negative terms of GO were as follows: "leukocyte migration", "positive regulation of response to external stimulus", "regulation of lymphocyte activation", "leukocyte proliferation", "lymphocyte proliferation"( Figure 2D). The positive top 5 approaches obtained through KEGG analysis were as follows: "cytokine-cytokine receptor interaction", " PI3K-Akt signaling pathway", "MAPK signaling pathway", "Rheumatoid arthritis", "IL-17 signaling pathway" ( Figure 2E). The negative top 5 approaches obtained through KEGG analysis were as follows: "Cytokine-cytokine receptor interaction", "Neuroactive ligand-receptor interaction", "Pl3K-Akt signaling pathway", "Hematopoietic cell lineage", "JAK-STAT signaling pathway" ( Figure 2F). These differential genes might play an important role in regulating these pathways.

MSS-mediated development of immune-related risk characteristics in CRC patients
Univariate Cox regression analysis was performed to study the prognostic value of these 253 DE IR MSSGs. We identi ed 20 DE IR MSSGs that were signi cantly correlated with the OS (P<0.05). Using pro les of DE IR MSSGs associated with OS, we performed LASSO regression analysis and identi ed 12 CRC-speci c prognostic DE IR MSSGs (Figure 3). To obtain a clinically applicable risk assessment model, we established immune characteristics based on the expressions of the 12 DE IR MSSGs and their corresponding coe cients obtained from multivariate Cox regression analysis ( Figure 3). The formula was listed as follows: risk score =(-0.01640918*CXCL1)+(0.009410274*CD36)+(0.020934107*FABP4)+ (-0.073249101*MS4A2)+(-0.026759138*NRG1)+(0.05610912*VGF)+(-0.003939922*GRP)+ (-0.027542663*HDC)+(0.027187195*XCL1)+(0.046561706*NGF)+(0.044794432*MAGEA1)+ (0.020605364*IL13)+0.257498710. We used R package to calculate the critical value score=0.205878419 of the best risk score. The patient with a score > 0.205878419 was assigned to the high-risk group (n=101), and the the patients with a score≤ 0.205878419 was assigned to the low-risk group (n=121). Table 1 lists the details of the 12 DE IR MSSGs.

Clinical characteristics and OS-associated immune-related risk genes in CRC patients
We then studied the relationship between clinical features (age, sex, clinical stage, histological grade, and TNM staging status) and risk score, which was shown in Table 2. The results showed that the risk score was signi cantly correlated with gender (P<0.01), and other clinical features were not statistically signi cant. By analyzing the relationship between 12 DE IR MSSGs and OS, we found that OS was signi cantly better in the low-risk group ( Figure 4A, P<0.0001), and the AUC was 0.738 in the OS prediction ( Figure 4B). We also analyzed the survival curves of the 12 above-mentioned genes. The results showed that the higher expressions of CXCL1, MS4A2, HDC, NRG1, and IL13 indicated a better survival of CRC patients. However, the results of CD36, FABP4, VGF, GRP, XCL1, NGF, and MAGEA1 showed the opposite trend ( Figure 4C). In the multivariate Cox regression model, after other clinical characteristics were adjusted, the risk score, age, sex, and clinical stage were signi cantly correlated with the patient's survival (Table 3). Therefore, risk groups based on risk characteristics of 12 DE IR MSSGs were independent and effective prognostic factors.

Expression stability of immune-related risk genes in CRC patients
The expressions of CXCL1, VGF, GRP, MAGEA1, and IL13 in CRC patients were signi cantly up-regulated compared with the normal controls, while the expressions of CD36, FABP4, MS4A2, NRG1, HDC, XCL1, and NGF were signi cantly down-regulated ( Figure 5). Therefore, the immune-related risk model established in our present study presented as su cient stability and feasibility.
Tumor-in ltrating immune cells are associated with risk characteristics in CRC patients Next, we identi ed the relationship between the risk characteristics of 12 DE IR MSSGs and tumorin ltrating immune cells. The results showed that the in ltration of T cells, cytotoxic lymphocytes, B lineage, NK cells, monocytic lineage, myeloid dendritic cells, and neutrophils was higher in the low-risk group, which might be related to a better anti-tumor effect ( Figure 6A, P<0.05). Moreover, 12 DE IR MSSGs were signi cantly correlated with T cells, cytotoxic lymphocytes, B lineage, NK cells, monocytic lineage, myeloid dendritic cells, and neutrophils ( Figure 6B). Besides, we studied the association between the risk characteristics of CRC patients and immune cell in ltration to consider whether the risk score partially re ected the effect achieved. We found that the risk score was negatively correlated with the expressions of T cells, cytotoxic lymphocytes, B lineage, NK cells, monocytic lineage, myeloid dendritic cells, and neutrophils ( Figure 6C, P<0.05). The results showed that the risk characteristics of 12 DE IR MSGSs were signi cantly correlated with tumor-in ltrating immune cells.

Relationship between immune checkpoints and immune-related risk characteristics in CRC patients
To determine the relationship between the risk characteristics of 12 DE IR MSSGs and immune checkpoint molecules, we identi ed differences in the expressions of immune checkpoint molecules from TCGA-COAD-READ samples between the two risk groups. In the high-risk group, the expressions of PDCD1, CTLA4, HAVCR2, LAG3, TIGIT, CD274, and PDCDLG2 were up-regulated ( Figure 7A). Moreover, the risk signals of 12 DE IR MSSGs were well correlated with seven immune checkpoint molecules ( Figure  7B). Our data indicated that XCL1 was positively correlated with PDCD1 and LAG3. HDC and M4AS2 were positively correlated with CTLA4 and TIGIT. There was a positive correlation between HDC and CD274. CD36 and HDC were positively correlated with HAVCR2. CD36, M4AS2, and HDC were positively correlated with PDCD1LG2 ( Figure 7C). Our results showed that the risk signals of 12 DE IR MSSGs were well correlated with the expressions of immune checkpoint molecules. These molecules might be used as immune card control point treatment indicators to achieve better e cacy.

Relationship between immune-related danger signals and mutations in CRC patients
We analyzed the available somatic mutation data to evaluate the relationship between mutations and 12 DE IR MSSGs. By analyzing the relationship between the gene expression and mutation of the two groups, we found that there was no signi cant difference in gene mutation ( Figure 8A,B). Our results showed that there were almost no differences in the mutations of these genes between the two risk groups, indicating that these genes were not changed during the mutation process, which could solve the problem of resistance of immune checkpoint inhibitor therapy and play a more signi cant role in the e cacy of immune checkpoint.

Discussion
Because of the tumor heterogeneity and tumor microenvironment immunosuppression of CRC, it is generally impossible to accurately evaluate the clinical results and responsiveness of ICB therapy by using a single biomarker (19). It is well known that the tumor immune microenvironment (TIME) plays a crucial role in tumor progression and occurrence, suggesting that we can try to explore immune-related markers for the prognositic prediction of CRC patients. It is essential to reveal the relationship between these genomic changes and the status of TIME. According to speci c genomic characteristics, CRC could be genetically classi ed into MSI and MSS subtypes (20). MSI CRC has a better responsiveness to ICB therapy, while MSS CRC is more effective in 5-uorouracil treatment. As for the differences in prognosis and therapy between MSI-H and MSS CRC, more studies are required to dissect the differences in biological pathways and regulation between these two subtypes of CRC (21). Understanding the relevant immune genes involved in the MSS regulation can help us nd appropriate methods and provide a reliable reference for the diagnosis and treatment of immunotherapy-unresponsive patients.
Therefore, through the collation and analysis of DEGs, IRGs, and MSSs, we constructed an immunerelated risk characteristic model composed of 12 prognostic DE IR MSSs. Our results showed that CRC patients with high expressions of CXCL1, MS4A2, HDC, NRG1, and IL13 had a better OS. A number of recent studies found that CXCL1, MS4A, NRG1 and IL13 were highly expressed in a variety of tumors (22)(23)(24)(25)(26)(27). These results are not consistent with our research, and thus their unknown mechanism in CRC needs to be further explored. Our present results showed that CRC patients with lower expressions of CD36, FABP4, VGF, GRP, XCL1, NGF, and MAGEA1 had a better OS. Other studies con rm our conclusions, high expression of FABP4, VGF, GRP, XCL1, NGF, and MAGEA1 promoted tumor progression, and inhibition of their expression could play an anti-tumor role (28)(29)(30)(31)(32)(33)(34)(35)(36)(37). However, the function of CD36 in previous studies is contrary to our results. Fang et al. have indicated that CD36 inhibits gpc4-catenin/c-myc signal transduction by promoting the expression of proteasome-dependent β, thus inhibiting downstream aerobic glycolysis and CRC (38). These molecules bring a lot of inspiration for our future research and may reference the diagnosis and treatment of CRC patients. In addition, our study found that there were signi cant differences in OS between different risk groups, and the survival was better in the low-risk group. Further analysis showed that 12 DE IR MSSGs were an independent prognostic factor of CRC, suggesting that 12 DE IR MSSGs could be used as a reliable prognostic tool. We noted that the risk signals of 12 DE IR MSSGs were well correlated with the expressions of immune checkpoint molecules. Surprisingly, this nding is different from previous other studies. Not all patients with MSS colorectal cancer are insensitive to immune checkpoint inhibitors. Based on our subgroup analysis, we found higher expression of PD-1/PD-L1, CTLA-4, TIM3, LAG3, TIGIT and PD-L2 in the high-risk group. Patients in high-risk groups may respond to these immune checkpoint treatments, and this model may be a useful predictor of immune checkpoint therapy. We conducted mutation analysis, and the mutations of these genes were almost not different between the two risk groups, indicating that these genes were not changed during the mutation process, which could solve the problem of resistance of immune checkpoint inhibitor therapy and play a more signi cant role in the e cacy of immune checkpoint.
The prognostic model consisting of 12 DE IR MSSs had a strong predictive power for the prognosis of CRC patients, which might help predict patients' response to immunotherapy and guide more effective immunotherapy strategies. Next, we will collect clinical samples for further veri cation to con rm the accuracy and applicability of the model. We hope that this model might provide more novel and in-depth insights into the development of new immunotherapy and offer a comprehensive understanding and a unique perspective for the prognositic evaluation of CRC patients.

Conclusion
We constructed and veri ed a new, immune-related prognostic signature associated with MSS CRC patients. This model might provide more novel and in-depth insights into the development of new immunotherapy and offer a comprehensive understanding and a unique CRC perspective.

Data acquisition and preprocessing
The UCSC Xena browser (GDC Hub: https://gdc.xenahubs.net) was used to download CRC patients' data and transcriptome RNA sequence data from TCGA. The criteria of dataset selection were set as follows: (1) containing the patient's mRNA sequence data, and clinical information; and (2) containing the patient's prognostic information. A total of 222 CRC samples were used in the subsequent analysis. A complete list of IRGs was obtained from the immunology database and analysis portal (ImmPort) database (https://immport.niaid.nih.gov). MSS-related genes were acquired from the human genome database (https://www.genecards.org/), and the genome could obtain complete information on a particular gene.

Analysis of differentially expressed genes (DEGs)
The change of gene expression was estimated using the R package with the empirical Bayesian method and a moderated t-test. DEGs were identi ed by R package LIMMA using the criterion (adjusted P<0.05) described in the R package. Benjamini-Hochberg correction was used to calculate the adjusted P value of multiple tests to screen for DEGs between CRC and normal tissues. The intersection of DEGs, IRGs, and MSS-related genes (DE IR MSSGs) was used to further screen the differentially expressed MSS-mediated IRGs in CRC progression.

Enrichment analysis of gene dataset
The functional enrichment analysis was carried out, and the visualization of enrichment analysis was realized by using the R package "Clusterpro ler". Determined strict truncation values of the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway terms were P<0.01, and FDR (false discovery rate) <0.05.

Development and veri cation of CRC immune-related markers
Univariate Cox proportional hazards regression analysis was used to screen survival-related DE IR MSSGs in CRC patients. Survival-related DE IR MSSGs were considered to be P<0.05. The least absolute shrinkage and selector operation (LASSO) regression analysis was performed to determine DE IR MSSGs associated with survival, which were combined with expression pro les to minimize the over ne, and the R package "GLMnet" was used to nd the optimal gene model. For the TCGA-COAD-READ cohort, the expression pro le (standardized FPKM) was converted to TPM to obtain the risk score of each patient.
The risk score = gene 1 expression * coe cient + gene 2 expression * coe cient +...+ gene N expression * coe cient +b. Each sample's risk score was calculated, and the best risk score's cut-off value was determined by using the R package. The patient was assigned to the high-risk group when the risk score was >cut-off value, otherwise, the patient was assigned to the low-risk group. The 12 DE IR MSSGs were strati ed according to the difference in overall survival (OS) between the two risk groups as indicated by the Kaplan-Meier survival curve. The R package "Survival ROC" was used to calculate the area under the curve (AUC) to evaluate the TIME-dependent prognostic value of gene traits. When P<0.05 in the two-way logarithmic rank, it was of great signi cance to survival analysis. Multivariate Cox regression analysis was performed to verify whether these clinical features (including age, sex, clinical stage, histological grade, and TNM staging status) could predict the prognosis.

The correlation and differences between tumor-in ltrating immune cells and immune checkpoints in CRC populations
These immune cells, effector molecules, and immune checkpoints were identi ed by using the R software. The risk score of each patient was calculated, and their relationship with tumor-in ltrating immune cells (T cells, cytotoxic lymphocytes, B lineage, NK cells, monocytic lineage, myeloid dendritic cells, and neutrophils) and immune checkpoint molecules (PDCD1, CTLA4, HAVCR2, LAG3, CD274, and PDCD1LG2) was evaluated. The risk score of each sample was used to assess its relationship to these molecules.

Mutation analysis
The somatic mutation data (SNP and small INDEL, MuTect2 mutation aggregation and masking) of STAD patients were obtained from the TCGA database (https://gdc.xenahubs.net/download/TCGA-COAD-READ/Xena_Matrices/TCGA-COAD-READ.mutect2_snv.tsv .gz). The R software was used to select the gene mutation data from 12 DE IR MSSGs to determine the difference between the two groups of CRC patients.

Statistical analysis
Graphpad 8, SPSS and R software were used to analyze the corresponding data. The MSS score was evenly divided into low and high groups using the R package MaxStat to reduce batch calculation. The critical genes in differential gene analysis were obtained by transforming the P-value into FDR using the Benjamini-Hochberg method. The volcano map, heat map, and bubble map were generated using the R packages "ggplot2", "pheatmap", and "clusterPro ler", respectively. The differences between variables were analyzed by t-test, non-parametric test, and chi-square test. The prognostic ability of MSS-mediated immune-related risk characteristics and clinical features was evaluated by univariate, LASSO, and multivariate Cox regression analysis.