Data downloading and collection
A total of 432 patients with colon cancer from the TCGA dataset were included in this study. The validation set was from the GEO database. The data set GSE39582 included 566 samples of colon cancer tissue and 19 samples of paracancerous normal tissue. The cancer tissue samples were selected for further analysis. TCGA and GEO related patient information is shown in Table S1.
Immunoinfiltration analysis and screening of M2 macrophage related genes in COAD using WGCNA
We used Cibersort to analyze the immune cell infiltration in patients with TCGA-COAD, and obtained the M2 macrophage score of each sample. Survival analysis showed that the infiltration level of M2 macrophages had an impact on the survival time of patients (Fig. 1A). Patients with higher M2 macrophage scores had poor survival(p = 7.163e-03).To further dig out the genes associated with M2 macrophage infiltration in colon cancer, we performed WGCNA analysis on the data. We performed a WGCNA analysis on the data. We found that when we set the soft threshold to 7, it accords with the scale-free property of biological network, so we used β = 7 to construct the weighted network. Then, we carried out average linkage hierarchical clustering, identified the modules based on TOM's differences, dynamic tree pruning and merging processing, and obtained a total of 20 meaningful modules with different colors (Fig. 1B). Then, all the modules analyzed in WGCNA were correlated with M2 macrophage scoring results, and we found that the tan module had the strongest correlation with M2 macrophages (R = 0.62) and P = 6.1e-13 was statistically significant (Fig. 1C). We enriched and analyzed 110 genes in the tan module, and found that the biological functions of this group of genes were related to immunity (Fig. 1D;figure 1E). So far, we have found a group of stable genes related to M2 macrophages in intestinal cancer, and their biological functions are correlated with tumor immunity to a certain extent.
Screening of hub genes based on cluster analysis
In order to further explore the characteristics of genes in the TAN module, we performed K-value based consistent clustering based on the expression of 110 genes involved in the tan module. According to the cumulative distribution function (CDF), k = 2 was selected as the optimal parameter, and TCGA-COAD patients were divided into two groups, named ClusterA and ClusterB (Fig. 2A).Further, through survival analysis, it was found that the OS of patients in ClusterA and ClusterB had significant differences(P = 0.003),as shown in Fig. 2B.Thus, we infer that M2 macrophage-related genes in the TAN module can affect the OS of TNBC patients through immune-related pathways.
In order to screen hub genes in the module, univariate Cox analysis was conducted on the genes in the TAN module to screen the genes that were related to the survival of patients. When the P value in univariate Cox analysis was less than 0.05, 5 candidate genes could be screened (Table 1). Furthermore, through 1000 LASSO analyses, four hub genes (Fig. 2C) were finally screened out, namely:ANKS4B, CTSD, TIMP1 and ZNF703, which are the most stable prognostice-related genes associated with M2 macrophages in colon cancer.
Table 1
Results of the univariate Cox regression analysis between gene expression and OS.
ID
|
HR
|
HR.95L
|
HR.95H
|
pvalue
|
ANKS4B
|
0.745
|
0.575
|
0.965
|
0.026
|
TIMP1
|
1.421
|
1.102
|
1.831
|
0.007
|
CTSD
|
1.479
|
1.062
|
2.061
|
0.021
|
NFKB2
|
1.437
|
1.022
|
2.019
|
0.037
|
ZNF703
|
0.773
|
0.604
|
0.988
|
0.040
|
In order to verify the prognostic value of the four hub genes, according to the expression of these four genes, we applied the consistency cluster analysis based on the K value, and selected k = 2 as the optimal parameter. TCGA-COAD patients were divided into two groups, named ClusterA and ClusterB (Fig. 2D). Survival analysis showed that the OS of ClusterB was significantly lower than that of ClusterA, and the difference was statistically significant (P = 0.004) (Fig. 2E).
Validation of screened hub genes
In order to verify the stability of the consistency clustering method based on K value for the classification of TCGA-COAD patients, another classification method, namely PCA analysis method, was selected for verification again. The results are shown in Fig. 3A.Meanwhile, ssGSEA results showed that the macrophage score of ClusterB was higher than that of ClusterA, and the difference was statistically significant (P < 0.001; Fig. 3B). The expression of the four hub genes in different clusters is shown in Fig. 3C.At the same time, we chose another dataset, GSE39582, to prove that the change of the dataset would not affect our conclusion. In GSE39582, cluster analysis was conducted according to the expression of hub genes, and the results showed that these 4 genes could well divide patients with colon cancer into two clusters (Fig. 3D). The survival analysis between the two groups showed statistically significant differences in OS (Fig. 3E). Since then, we have obtained consistent results in different datasets and different classification methods, proving the stability and accuracy of the above key gene screening.
Construction of M2I Score
In order to construct M2 macrophage score in TCGA-COAD patients, PC1 values of genes in gene signature A and B were calculated by principal component analysis (PCA) and the sum of PC1 (SPC1a and SPC1b) of gene signature A and B were calculated. Subsequently, the difference between SPC1A and SPC1B was used as M2 macrophage score -- M2I Score. Patients in the TCGA cohort were divided into two groups according to the M2I Score by using the optimal cutoff value obtained by X-tile software. Figure 4A shows the distribution of patients with high and low scores. Patients with high scores were mainly from ClusterB while those with low scores were mainly from ClusterA. Simultaneously, survival analysis showed that the survival of patients in the high-rated group was significantly worse than that in the low-rated group (p < 0.001) (Fig. 4B).Meanwhile, according to Wilcoxon test, immune checkpoints (CD274 and CTLA4) and M2 macrophage marker genes (CCl2, CCR2, CD163, CD40, CSF1R, MRC1 and PDGFB) were significantly overexpressed in the high M2I Score group (Fig. 4C).At this point, We have an accurate M2 macrophage score -- M2I Score, which reflects the level of immune checkpoints and M2 macrophage marker genes.
Correlation between M2I Score and somatic variation
There is substantial evidence that a higher tumor mutation load (TMB) represents a better patient response to immunotherapies such as immunocheckpoint inhibitor therapy18.Considering the important clinical significance of TMB, we decided to explore the relationship between M2I Score and TMB. For this reason, we first analyzed the differences in TMB scores between groups with high and low M2I Scores, and the results showed that TMB was significantly higher in the group with high scores than in the group with low scores (p = 1e-06; Fig. 5A).Meanwhile, Spearman correlation analysis showed that M2I Score was positively correlated with TMB (R = 0.17, p = 0.0016; Fig. 5B).In addition, we also analyzed the differences of somatic cell variation driver genes in the high and low M2I Score group of colon cancer. The top 20 driver genes with the highest frequency of change were selected using the R package "Maftools" for analysis, and the mutation frequency of 16 genes in the high-rated group was higher than that in the low-rated group (Fig. 5C; Fig. 5D). These results suggest that patients in the highly rated group may have a better response to immunotherapy.
The role of M2I Score in predicting the benefit of immunotherapy
In colon cancer, higher microsatellite instability often represents patients' ability to obtain better immunotherapeutic effects6. In order to further explore the relationship between M2I Score and microsatellite instability, relevant data from the TCIA database were used for analysis. According to the TCIA database, the MSI of patients with TCGA-COAD is divided into three levels: 1.MSS -- Microsatellite stabilization; 2.MSI-L -- low instability of microsatellites; 3.MSI-H -- Microsatellite is highly unstable. The proportion of patients with three kinds of MSI levels in the high and low M2I Score group was calculated. The results showed that the proportion of MSI-H in the high M2I Score group was 43% higher than that in the low M2I Score group (11%), and the chi-square test showed that the difference was statistically significant (figure. 6A). Meanwhile, patients were grouped according to the MSI level, and the differences of M2I Score among different groups were compared. The results showed that the M2I Score of patients in the MSI-H group was higher than that of patients in the MSI-L and MSS groups (p = 1.1e-07 and p = 2.3e-09, respectively) (Fig. 6B). This suggests that patients with a high M2I Score are more likely to benefit from immunotherapy.
Based on the above results, we analyzed the difference of efficacy between PD-1 inhibitor and CTLA4 inhibitor in patients with different rating groups according to the sensitivity data of immunotherapy in TCIA database. The results showed that patients in the M2I high-level group, previously associated with somatic variation and microsatellite instability analysis, who might be more sensitive to immunotherapy were more sensitive to PD-1 inhibitors (p = 0.022,Fig. 7A) and PD-1 inhibitors in combination with CTLA4 inhibitors (p = 0.0015,Fig. 7B).For the group with low sensitivity to immunotherapy, we did not use immune checkpoint inhibitors (p = 0.00048,Fig. 7C) or CTLA4 inhibitors alone (p = 0.012,Fig. 7D), which may achieve better efficacy. Taken together, these data suggest that M2I Score may be associated with immunotherapeutic response and may have implications for the selection of immunosuppressive agents in clinical treatment.