2.1 Construction of CAF clusters with different prognosis and immune states
The study flow diagram is presented in Fig. 1. Firstly, we collected 596 differentially expressed genes (named CAF genes) between CAF and normal fibroblasts (NF) in CRC from a previous study25 (Supplementary Data 1). To find prognostic genes, we then performed univariate Cox regression analysis in GEO combined cohort and recognized 115 potential prognostic genes among above genes (Supplementary Data 2). Furthermore, we conducted unsupervised clustering in GEO combined cohort based on 115 prognostic CAF genes using the ConsesusClusterPlus R package. As shown in Supplementary Fig. 1, the clustering results were most stable when patients were divided into two groups (defined as CAF cluster 1 and 2). The PCA plot shows significant differences in gene expression profiles between the two clusters (Fig. 2A). Remarkably, we found several previously identified CAF-related markers26, 27, including ACTA2, FAP, FOXL1, MCAM, and PDGFRA, were substantially upregulated in CAF cluster 2 relative to CAF cluster 1 (Fig. 2B), suggesting that the distinct status of CAF is correlated to group classification. In addition, MCPcounter analysis showed that fibroblast scores of patients in CAF cluster 2 were markedly higher than patients in CAF cluster 1 (Fig. 2C), while KM survival analysis illustrated that the OS was notably better for patients in CAF cluster 1 than those in CAF cluster 2 (Fig. 2D, log rank test, p = 0.0024). These results imply that the varied CAF status significantly impacts the survival of patients with CRC.
To dissect the underlying biological functions between the CAF clusters, we collected 11 tumorigenesis-related pathways from prior research. Our findings showed that the angiogenesis, TGF β, and F-TBRS signature scores were dramatically elevated in CAF cluster 2 compared to CAF cluster 1 (Fig. 2E). Additionally, the genes expression of APM, CD8 + Teff, and ICI pathways were dramatically increased in CAF cluster 2 (Fig. 2E, F), suggesting potentially distinct biological functions between the two groups. To further understand the immune status between the two groups, we analyzed the expression patterns of 122 immunomodulators (including MHC, receptors, chemokines, and immunostimulants) between CAF cluster 1 and 2, most of which were highly expressed in CAF cluster 2 (Supplementary Fig. 2A). The abundance of most tumor infiltrating lymphocytes inferred by MCPcounter analysis were also significantly higher in CAF cluster 2 than CAF cluster 1, such as T cells, cytotoxic lymphocytes, and neutrophils (Fig. 2C). Besides, ssGSEA-inferred adaptive and innate immunity scores were also significantly increased in CAF cluster 2 (Supplementary Fig. 2B). Whereas, the expression of immune checkpoint molecules was significantly higher in CAF cluster 2 than in CAF cluster 1. These results demonstrate that the CAF clusters exhibit distinct immune microenvironments, with CAF cluster 2 exhibiting an inhibitory immune microenvironment.
2.2 Identification the key genes affecting the prognosis of patients in different CAF clusters
In order to identify the key genes affecting the survival of patients in CAF clusters, WGCNA analysis was carried out, and CAF clusters 1 and 2 were used as the traits. As shown in Fig. 3A, B, the soft threshold power of β was set as 4 when scale-free topology model-fit R = 0.9. Then, we identified 16 modules, except the grey module (Fig. 3C-E). Module-trait heatmap shows that the blue module was the most closely related to CAF clusters (Fig. 3F, G). And the biological functions of genes in the blue modules were explored using GO and KEGG analysis. When the adjusted P-value was less than 0.05, 27 and 12 items were identified by GO and KEGG analyses, respectively (Supplementary Data 3, 4). The top 10 enrichment items in GO and KEGG analyses included extracellular matrix structural constituent, collagen binding, ECM-receptor interaction, PI3K-Akt, and TGF − beta signaling pathway (Fig. 3H, I), which were in consistent with the functions of CAF. Therefore, the blue module containing 1089 genes was identified as the key module, among which 100 genes meeting GS > 0.2 and MM > 0.8 were considered as the critical genes related to CAF clusters (CAFGs). In addition, univariate cox regression analysis identified 43 of the 100 CAFGs was prognostic genes with a p value less than 0.01. Then, these genes were again used for unsupervised clustering in GEO combined cohort (Supplementary Data 5). The detail processes of unsupervised clustering were shown in the Supplementary Fig. 3. Surprisingly, when the patients were again divided into two groups, defined as CAFGs clusters 1 and 2, the clustering results were the most stable. The PCA plot shows that there were significant differences in gene expression profiles between the CAFGs clusters (Fig. 4A). The markers associated with CAF and the fibroblast score inferred by MCPcounter analysis were significantly higher in CAFGs cluster 1 than CAFGs cluster 2 (Fig. 4B, F). Consistently, the OS of CAFGs cluster 1 was significantly worse than that of CAFGs cluster 2 (Fig. 4C, log rank test, p 0.04). Additionally, the heatmap of 11 tumorigenesis-related pathways shows that the angiogenesis, TGF β, and F-TBRS signature score of CAFGs cluster 1 were higher than that of CAFGs cluster 2, and the levels of APM, CD8 + Teff, and ICI signatures were also increased in CAFGs cluster 1 (Supplementary Fig. 4A, B). Besides, the expression 122 immunomodulators (Fig. 4D), adaptive and innate immunity (Fig. 4E), and most tumor infiltrating lymphocytes (Fig. 4F, Supplementary Fig. 4C) were higher in CAFGs cluster1 than CAFGs cluster 2. Furthermore, we observed most of the patients consisting of CAFGs cluster 1 were from CAF cluster 2 (Fig. 4G). Therefore, these findings demonstrated the crucial roles of CAFGs, which can reproduce the biological category of CAF clusters.
2.3 Colorectal cancer patients with high CAFGs score have poor outcomes in multiple colorectal cohorts
Gene model plays an important role in clinical application. In order to construct a scoring model for clinical application, the CAFGs meeting a p value < 0.2 in univariate analysis were included in multivariate unicox regression analysis. Finally, 15 genes with a p value < 0.05 were obtained, including FNDC1, FRMD6, FBN1, RAB31, GLT8D2, COL1A2, GLIS2, COL8A1, GPC6, COL3A1, PRICKLE1, FSTL1, HLX, IGFBP7, and EFS. These genes were considered as the important prognostic factors, and again incorporated in the cox model. Next, using the expression values of these 15 genes and their corresponding regression coefficients, a scoring model, named CAFGs scoring system, was constructed (Supplementary Data 6). We then included TCGA COAD and GSE39582 cohorts as external and internal validation sets. According to the best cutoff value determined by survminer R package, patients in these cohort were divided into high and low CAFGs score groups (Supplementary Data 7). We found that the expression levels of CAF markers were significantly higher in the high CAFGs score group than in the low CAFGs score group (Fig. 5A: GEO combined; Fig. 6A: TCGA COAD; Supplementary Fig. 5A: GSE39582). Then, we observed that patients with high CAFGs scores showed worse OS in GEO combined cohort, which were also verified in the internal and external cohorts (Fig. 5B: GEO-combined; Fig. 6B: TCGA COAD cohort; Supplementary Fig. 5B: GSE39582). In additional, we analyzed the DFS, RFS, and DSS in GSE39582, GSE17536, and GSE17537 cohorts. The results show that the RFS, DFS, and DSS of patients with high CAFGs scores were also significantly worse than that of patients with low CAFGs scores (Fig. 5C-F, GSE39582 RFS, GSE17537 DFS, GSE17536 DFS, GSE17536 DSS). Further analysis demonstrated that high CAFGs scores also represented poor OS in both early (Stag I&II) and advanced (stage III&IV) patients (Fig. 5G, H; Fig. 6C, D; Supplementary Fig. 5C, D). And the CAFGs scores of patients in stage III&IV were significantly higher than that of patients in Stag I&II (Fig. 5I; Fig. 6E; Supplementary Fig. 5E). CMS classification, a widely used classification system in CRC, has strong prognostic implications, and includes four subtypes, such as CMS1 (MSI Immune), CMS2 (Canonical), CMS3 (Metabolic), and CMS4 (Mesenchymal). To our surprise, CAFGs scores of patients in CMS subtype 4 were significantly higher than those of patients in CMS subtypes 1–3 in the combined GEO cohort, as well as TCGA COAD and GSE39582 cohorts (Fig. 5J; Fig. 6F; Supplementary Fig. 5F; Supplementary Data 8). CMS subtype 4 is mesenchymal subtype characterized by prominent transforming growth factor β (TGF-β) activation, stromal invasion, and angiogenesis; and has been reported to be associated with poor prognosis.
Next, we conducted a comprehensive investigation of the CAFGs scoring system in the TCGA COAD cohort and examined its association with various clinicopathologic features, such as T, N, and M stages, as well as venous and lymphatic invasion. Our observations revealed that patients in the T3&4 stage, N + stage, and those with lymphatic invasion had significantly higher CAFGs scores than patients in the T1&2 stage, N0 stage, and without lymphatic invasion (Fig. 6G-I; Supplementary Fig. 6). This may underscore the importance of early detection and monitoring of lymphatic invasion and metastasis in patient with high CAFGs score. Moreover, we also performed GSVA by using hallmark gene sets from the MSigDB website, and revealed that several signaling pathways, including EMT, were significantly upregulated in patients with high CAFG scores (Supplementary Fig. 7A).
2.4 Patients with a high CAF score develop resistance to immunotherapy
It’s well known that the therapeutic effect of ICB treatment is closely related to the tumor immune microenvironment, including the abundance of CAF. Therefore, we used TIDE analysis to explore the relationship between CAFGs score and therapeutic response to ICB, which focused on two mechanisms of tumor immune evasion, namely, T cell dysfunction and exclusion. Through TIDE analysis, we found a significant correlation between CAFGs score and T cell disfunction score, T cell exclusion score, and TIDE score (Fig. 7A). In addition, the expression levels of ICI genes were markedly higher in the high CAFG score group compared to the low CAFG score group (Supplementary Fig. 7B). These findings indicated that patients with high CAFGs score may suffer from immune evasion and resistance to immunotherapy. Furthermore, we collected two patient cohorts receiving anti–PD-1 or anti–PD-L1 therapy (GSE78220, and IMvigor210) to analyze the relationship between CAFGs score and ICB efficacy. Indeed, in the high CAFGs score group, we found a higher proportion of no responders (Fig. 7B, C). Consistently, patients with high CAFGs score had significantly worse overall survival (OS) after receiving ICB therapy (Fig. 7D, E). These findings indicated that CAFGs score may have the potential to identify CRC patients who were sensitive to ICB therapy.
2.5 Exploring potential molecules associated with the identity and function of CAF
Several of these 15 genes belong to CAFGs scoring system were previously reported to be specifically associated with the identity and functionality of CAF, including COL1A2, COL3A1, COL8A1, and FBN1, implying the remaining genes may also correlate with CAF. Further analysis of the relationship between these 15 genes and CAF levels in TIDE website revealed that almost all of the genes were significantly correlated with CAF levels (Fig. 8A). Single-cell transcriptomic data from patients with CRC also indicated that nearly all of the genes were significantly expressed in stromal cell types, such as IGFBP7, GLT8D2, FSTL1, GPC6, and FRMD6 (Fig. 8B-D, Supplementary Fig. 8). The proteomic data obtained from CAF derived from AOM/DSS-induced CRC mice and normal mice-derived NF indicates that in addition to commonly known CAF related proteins (such as COL3A1, COL1A2, and FBN1), IGFBP7 and FSTL1 were significantly upregulated in the CAF conditioned medium (Fig. 8E). And transcriptional data reveals that compared to the adjacent non-tumorous samples, expression of COL3A1, COL1A2, FBN1, IGFBP7, and FSTL1 was significantly elevated in the tumor tissue (Fig. 8F). These results suggest that the 15 marker models we identified might play crucial roles in CAF-mediated CRC progression.