Two immune-enhanced molecular subtypes differ in inflammation, immune checkpoints, mutations and outcome in Stage I-II colonic carcinoma.

Background: The purpose of this paper is to investigate the immune function of the tumor microenvironment and its clinical correlation with colonic carcinoma. Methods : Immune genes were downloaded from the The Cancer Genome Atlas TCGAdatabase. Five subtypes are obtained by cluster screening based on immune gene expression data. Results: The C3 and C4 subtypes show stronger immune activity. In addition, the C4 subtype has the largest number of gene mutations and the worst prognosis. Most of the immune signatures are upregulated in the C4 subtype, while most of the immune infiltration-related cells are upregulated in the C3 and C4 subtypes. Conclusions: The different immune microenvironments between these subtypes may provide new ideas for immunotherapy strategies in colon carcinoma. The Mesenchymal (CMS4) cells lymphocytes monocytes, mesenchymal characteristics of angiogenesis, immunosuppression. the of mesenchymal are high-density This mesenchymal subtype may produce chemokines and cytokines that are conductive angiogenesis, resulting in a poor prognosis.


Background
Colonic carcinoma is one of the most common malignant tumors in the world. The incidence and mortality of colonic carcinoma rank third among all malignant tumors, and the incidence rate is on the rise . Most of the cases are found in the middle and late stages. Approximately 1/4 of colonic carcinoma patients have metastasis at the time of diagnosis, and the 5-year survival rate is less than 10%. The vast majority of patients with colonic carcinoma suffer from recurrence within two years after surgery [2]. Although there are many studies on colonic carcinoma, its pathogenesis is not yet completely understood [3]. The main treatment of colonic carcinoma is surgery, although treatment also includes chemotherapy, radiotherapy, and biotherapy. Nevertheless, the effects of these treatments are often unsatisfactory [4]. Hematogenous metastasis and lymphatic metastasis often occur in the late stage of colonic carcinoma. Postoperative recurrence and metastasis are the common causes of death [5]. At present, there is still a lack of effective new treatments. With the innovation and improvement in basic science, immunotherapy is regarded as the most likely way to cure tumors [6]. Tumor immunotherapy is a therapeutic method to control and remove tumors by restarting and maintaining tumor-immune circulation and restoring the normal anti-tumor immune response [7]. Compared with conventional tumor therapy, tumor immunotherapy mainly targets the immune system or tumor microenvironment rather than the tumor cells themselves. Tumor immunotherapy can also induce the synergistic anti-tumor effect in combined therapy [8].
Normally, after a tumor develops, the anti-tumor effect is initially performed by the innate immune system. Effector cells include NK cells, neutrophils, macrophages and so on. Subsequently, the adaptive immune response is initiated, including humoral immunity and cellular immunity, which produces antibodies and effector T cells to play an anti-tumor role [9]. As the tumor grows, tumor cells successfully evade the attack of the host immune system through immunosuppressive mechanisms. Different tumors can inhibit the effective identification and killing of tumor cells by the immune system through different mechanisms to produce immune tolerance and even promote the occurrence and development of tumors [10]. At present, a variety of immunotherapeutic drugs have been developed for this mechanism, including monoclonal antibody immune checkpoint inhibitors, therapeutic antibodies, cancer vaccines, cell therapies and small molecular inhibitors [11][12][13]. In recent years, there has been substantial good news about tumor immunotherapy. At present, tumor immunotherapy has shown strong anti-tumor activity in the treatment of a variety of solid tumors, such as melanoma, non-small cell lung cancer, kidney cancer and prostate cancer. Several tumor immunotherapeutic drugs have been approved for clinical use by the Food and Drug Administration (FDA) in the United States [14][15].
However, the complexity of the tumor immune microenvironment not only increases the difficulty of tumor immunotherapy but also affects the efficiency of immunotherapy [16]. How to optimize these immunotherapy methods, screen the dominant population, or carry out immunotherapy for a specific group of people is a current urgent question to be answered by precision medicine [17]. There are many evidence-based medical studies proving that a comprehensive understanding of various immunomodulatory molecules in the tumor-related microenvironment, according to the effective immune molecule classification to guide treatment, is an important task for precision immunotherapy.
At present, the expression patterns of immune checkpoint molecules in colonic carcinoma and the potential clinical correlations are unclear. Therefore, there is an urgent need for in-depth studies of the overall immune status and immune gene expression information of patients, according to clinical information and immune characteristics of patient samples, for more accurate classification to guide the clinical application of immunotherapy.
In summary, the purpose of this study is to explore the overall immune characteristics and clinical correlation of patients with colonic carcinoma. We have identified five molecular subtypes of colonic carcinoma based on the expression data of immune genes in the TCGA database. Among these subtypes, the C3 and C4 subtypes show stronger immune activity. Nevertheless, these subtypes are significantly different in clinical features, immune scores, gene mutations, prognosis and immune checkpoints, and metabolic pathways involved in regulation. These experimental results have been further verified in the GSE39582 dataset of the GEO database.

Data download
We download clinical follow-up information from the RNA-Seq dataset of colonic carcinoma from the GDC. From previous studies, we collect 13 types of immune metagenes, six types of immune cells corresponding to each sample of colonic carcinoma downloaded from Timer (https://cistrome.shinyapps.io/timer/), and immune genes downloaded from the ImmPort database (https://immport.niaid.nih.gov). We use the R package estimate to calculate the immune score and matrix score for each sample.

Data preprocessing
First, we convert FPKM to TPM by R script. Next, we extract the immune gene set using the expression level. The samples with an expression level greater than 0 are screened out. More than 30% of the genes are included in this study as immune genes. We match the expression profile samples and clinical follow-up samples, and we select both samples. We analyze all the Stage combinations, finding that the Stage I + II results are the best, which are related to the prognosis. Therefore, we further select the samples of Stage I and Stage II for this study. We then extract the immune gene set with the expression level from the expression profile. The samples with an expression level greater than 0 are screened out, and more than 30% of the genes are included in this study as immune genes. Finally, 1206 genes and 251 samples are included, as shown in Supplement Table 1. Molecular subtype screening based on immune genes The expression profile of immune genes is used for consensus clustering. We use R software package ConsensusClusterPlus to screen molecular subtypes, which is the same method chosen by Zhang et al [18]. We use the Euclidean distance method to calculate the similarity distance between samples and use K-means to cluster. We use a resampling scheme to sample 80% of the samples and resample 100 times. The optimal clustering number is determined by the cumulative distribution function (CDF). The clustering significance between these subtypes is further analyzed by using the R software

Relationship between subtypes and immunity
Previous studies have reported the key gene sets involved in the immune process. We collect 13 types of immune metagenes to analyze the relationship between these metagenes and subtypes, finding that the immune components of tumor tissue are closely related to the tumor prognosis. We analyze the relationship among matrix, immune scores and molecular subtypes. It is found that the score of immune cell infiltration directly reflects the degree of immune infiltration in tumor tissues and is also closely related to the occurrence and development of tumors. We also analyze the relationship between the scores of immune cell infiltration in subtypes and use variance analysis to evaluate the differences of the above scores in different subtypes.

Relationship between subtypes and survival prognosis
We extract data from the sample follow-up information and use K-M to analyze the prognostic differences of different subtypes.

Screening of molecular subtypes based on immune genes
We use the R software package ConsensusClusterPlus for consensus clustering based on immune gene expression profiles to screen molecular subtypes. We use the Euclidean distance method to calculate the similarity distance between samples and use K-means to cluster. We use a resampling scheme to sample 80% of the samples and resample 100 times. The optimal clustering number is determined by CDF as shown in Fig. 1A, from which we can see that the clustering results are more stable when the Cluster is 5. Further observation of the CDF delta area curve is shown in Fig. 1B, from which we can find that there is a stable clustering result when the Cluster is selected as 5. We eventually choose k = 5 and obtain five molecular subtypes. Furthermore, the clustering significance between the five subtypes is analyzed by using the R software package sigclust. It is found that the classification between C1 and C2 is not significant (p = 1). However, C1-vs-C4 and C3-vs-C5 have a significantly different expression distribution, p < 0.05 (as shown in Supplement Table 2). Cluster analysis of the expression profiles of the five molecular subtypes According to the consensus clustering results, we select the stable clustering results of k = 5, as shown in Fig. 2A. It can be seen that 251 tumor samples have been assigned to these five categories.
Furthermore, we use the expression profiles of 1206 immune gene set for subtype difference analysis and use the Kolmogorov-Smirnov test to screen for genes with high expression in one subgroup relative to the other subtype samples. Using FDR < 0.05 as the threshold, 80 genes with significantly high expression in the C1 subtype, 79 genes in the C2 subtype, 153 genes in the C3 subtype, 148 genes in the C4 subtype and 73 genes in the C5 subtype are screened out. We then select the expression profiles of the first 100 genes in the most significantly high expression of each subtype for PCA (if the genes with significantly high expression are fewer than 100, all the differentially expressed genes are taken). The first two principal components are obtained to draw a scatter plot, as shown in Fig. 2C, from which we can see that the five subtypes can be clearly separated. The expression profiles of these genes are further used to make heat maps, as shown in Fig. 2D. It can be seen that each subtype has a clear boundary in the expression profile of these genes and has obvious expression patterns as well.

Relationship between the five subtypes and clinical features
We analyze the relationship between the five subtypes and Age, Gender, T, N, M and Stage, as shown in Table 1. It can be seen that there is no significant relationship between the five subtypes and these clinical features. We further analyze the relationship between these five subtypes and CMS subtypes,

Relationship between the five subtypes and immunity
To analyze the relationship between these five subtypes and immunity, we collect 13 types of immune metagenes, tumor immune component (matrix, immunity, and tumor purity) scores, six types of immune cell infiltration scores, and MCP counts of 10 immune-related cells from previous studies. We analyze the relationship between the immune-related scores of the three groups and the five subtypes. Most of the 13 types of immune metagenes have high expression in C3 and C4, as shown in Fig. 4A. The immune score of the C4 group is significantly higher than that of the other groups. The matrix score of group C3 is significantly higher than that of other groups, as shown in Compared with the other categories, the prognosis of the C4 subtype is significantly worse than that of the C1 and C3 samples, as shown in Fig. 4B. This result suggests that there are two opposite clinical prognostic outcomes in the immune-enhanced C3 and C4 subtypes in colonic carcinoma.

Relationship between the five subtypes and mutations
Gene mutations of the APC, K-RAS and p53 genes are considered to be essential in the development of colonic carcinoma. BRAF is also a common mutation in colorectal cancer species. Therefore, we analyze the relationship between TP53, KRAS, BRAF and APC mutations in these five types of Relationship between the five subtypes and the expression of eight immune checkpoint genes We further analyze the relationship between the expression of eight immune checkpoint genes in these five subtypes, as shown in Fig. 8A. The expression of PDCD1LG2, CTLA4, CD86 and CD80 in C3 and C4 subtypes is significantly higher than those in other types. The expression of PDCD1, CD274 and CD267 in C 4 is higher than those of other groups. Further statistics of the gene expression distribution of the eight immune checkpoints, as shown in Fig. 8B, reveal that there are significant differences in the expression distribution in the five types of samples.

Mining of immune-enhanced subtype-related modules by WGCNA analysis
To further explore the prognostic markers related to the immune microenvironment of colonic carcinoma, we collect the expression profile data for the five subtypes of immune-related genes. A total of 483 samples are obtained. We further use the Pearson correlation coefficient to calculate the distance between each transcript. The R software package WGCNA is used to construct a weight coexpression network. We select the soft threshold of 4 to screen the co-expression module. The results show that the co-expression network conforms to the scale-free network. That is, the logarithmic log (k) of the node with connection degree k is negatively correlated with the logarithmic log(P(k)) of the probability of the node, and the correlation coefficient is greater than 0.8. To ensure that the network is scale-free, we choose β = 4 (Fig. 9A, B). The next step is to convert the expression matrix into an  Table 2, from which we can find that 328 transcripts are assigned to the three co-expression modules. We calculate the correlation between the eigenvectors of the four modules and the five subtypes respectively, as shown in Fig. 9D.
It can be seen that blue module is positively correlated with C3, the brown module is positively correlated with C4, and the turquoise module is positively correlated with C3 and C4. The number of transcripts in the three modules is 112, 101 and 115 genes, respectively.
We further analyze the functions of the genes in the blue and brown modules that are most relevant to C3 and C4. We use the R software package clusterProfiler for KEGG enrichment analysis and select a significance of FDR < 0.05. In the two modules, the blue module is enriched into 72 pathways, among which the most significant 20 pathways are shown in Fig. 10B. The brown module is enriched into 79 pathways, among which the most significant 20 pathways are shown in Fig. 10A. For all results, please refer to Supplement Table 3. We analyze the relationship between the paths in which these two modules are enriched, as shown in Fig. 10. It can be seen that the two modules are completely enriched to 103 pathways. There are 18 (17.5%) KEGG pathways enriched between the two modules, with fewer intersections, as shown in Fig. 10C. This result suggests that the C3 and C4 samples have different pathway anomalies and may have different regulatory mechanisms.

External validation
We select the genes in the gene co-expression module (blue and brown modules) that are closely related to the C3 and C4 subtypes to calculate the correlation between the genes and the modules. samples, as shown in Supplement Table 4. First, we analyze the expression distribution of 13 immune metagenes in each subtype of these five samples, as shown in Fig. 11A. The high expression of most metagenes in C3 and C4 is consistent with the validation set. Further analysis of sample immune scores is shown in Fig. 11B. It can be observed that the immune score of the C4 group is significantly higher than that of other groups, and the matrix score of the C3 group is significantly higher than that of other groups, which is consistent with the training set. Ultimately, we analyze the prognostic differences of the five subtypes, as shown in Fig. 11E. It can be seen that there is no significant difference in prognosis among the five subtypes, but the prognosis of C4 is worse than that of other subtypes. Further analysis of the prognostic differences between the five types of samples and recurrence shown in Fig. 11F reveals that there is no significant difference in prognosis. The prognosis of C4 is worse than that of the other types. No significant difference in prognosis may be due to the short median follow-up time (Median OS = 58.5 months, RFS = 53 months).

Discussion
There are many publications studying the mechanism of cancer based on gene expression data. For example, Yang Yong et al [19] used systematic bioinformatics methods to explore the DNA methylation regulatory role in colonic carcinoma samples in the TCGA database. Liu Rong et al. [20] used a gene weighted co-expression network analysis technique to extract the co-expressed gene network from mRNA expression and identified a total of 11 co-regulatory modules in the sample data set of 461 patients with colonic carcinoma. Most of these studies are based on genome-wide research.
Such research is comprehensive, but often ignores the characteristics of immune-related genes [21].
Our research focuses on immune genes, which helps obtain additional details about immunity.
In this study, we run consensus clustering of immune gene expression profiles in the TCGA database and obtain five subtypes after screening. The subsequent principal component analysis (PCA) has proven that the five molecular subtypes can be clearly distinguished. It is found that the C3 and C4 subtypes show stronger immune activity among the five subtypes. These subtypes are different in terms of clinical features, immune scores, gene mutations, prognosis, immune checkpoints, and metabolic pathways involved in regulation, which are validated in external GEO datasets. These findings may provide some new insights into immunotherapy for colonic carcinoma.
First, we compare the relationship between the five subtypes and clinical features, in which C1, C2, C3, and C4 are strongly correlated with the common molecular typing (CMS) of colonic carcinoma.
Subtypes C1 and C2 are closely related to CMS2 (classic type) and CMS3 (metabolic type), respectively. These subtypes often show low immunity and inflammatory features. The tumors lacking infiltration of lymphocytes and immunomodulatory factors are usually PDL1 negative. It is suggested that the immunogenicity is poor [22][23], which may be the reason for the low immune activity of the C1 and C2 subtypes. In the following immune scores, the scores of the C1 and C2 subtypes are lower in 13 types of immune metagenes, tumor immune components (matrix, immunity, and tumor purity) scores, six types of immune cell infiltration scores and ten kinds of immune-related cells MCP count scores. The C3 subtype is closely related to the CMS4 type (mesenchymal type). For the characteristics of the CMS4 type, tumor matrix infiltrating epithelial cells, tumor-associated fibroblasts (CAFs) and innate immune cells produce more inflammatory factors [24], which may be the reason for the strong immune activity of the C3 subtype. Most of the 13 types of immune metagene are highly expressed in the C3 subtype. The matrix score of the C3 subtype is significantly higher than that of other subtypes. Among the 10 types of immune-related cells, in Endothelial cells, Fibroblasts and six types of immune infiltration scores, CD4_Tcell, Neutrophil, Dendritic and Macrophage in the C3 subtypes are significantly higher than those in the C1, C2 and C5 subtypes. The C4 subtype is closely related to CMS1 (immunophenotype). CMS1 is characterized by hypermutation and hypermethylation.
The immune cells significantly infiltrate the tumor microenvironment, especially CD8 + T cells, CD4 + T cells and NK cells [25], which may be the reason for the stronger immune activity of the C4 subtype.
In the subsequent immune scores, it can be observed that most of the 13 types of immune metagenes are highly expressed in the C4 subtype. The immune score of the C4 subtype is show that the expression of PDCD1LG2, CTLA4, CD86 and CD80 in C3 and C4 subtypes is significantly higher than that in other subtypes, and the expression of PDCD1, CD274 and CD267 in C4 is higher than that in other groups. The high expression of immune checkpoints may be the cause of stimulating the body to produce stronger immune activity. C1, C2 and C5 subtypes have lower immune activity expression, while the prognosis of these three subtypes is significantly different and not the worst, which is found in the prognostic analysis afterwards. It is worth mentioning that although C4 has a high expression of immune activity, its prognosis is the worst, which suggests that the level of immune activity in colonic carcinoma cannot directly indicate the quality of prognosis.
In addition, the mutation characteristics of five subtypes also reflect the characteristics of different

Availability of data and materials
The data sets used and/or analyzed during the current study are available from the corresponding author on reasonable request. 02;17(2).

Supplemental Tables
Supplement Table 1: The number of genes and samples included in this study Supplement      Expression distribution of immune metagenes and immune infiltration scores in five subtypes; variance analysis of various score differences.   The expression of eight immune checkpoint genes in the five subtypes, as tested by variance analysis.