Clinical significance of the CAF proportion in BC
The proportions of five CAF and immune cell types in BC and normal samples from the TCGA database were obtained using the CIBERSORT algorithm. We observed marked differences in the proportions of the five types of CAFs (Figure 1), CD4 memory resting T cells, CD4 memory activated T cells, regulatory T cells, resting NK cells, and the stromal score between different cancer stages and normal tissues (Supplementary Figure S1). Based on the median value of the CAF proportion, patients were divided into high and low CAF groups. The prognosis of patients in different groups was compared (Figure 2A-F). Patients with lower proportions of the CAF S3 subset were found to have better OS (p=0.015, Figure 2D). Figure 2G-H illustrates a positive correlation between the CAF-S3 proportion and the immune (R=0.57, p < 2.2e-16) and stromal scores (R=0.33, p < 2.2e-16).
Use of WGCNA to develop a principal-gene module related to CAF-S3 cells
To better represent the scale-free topology of the co-expression axis in the WGCNA, we chose the soft-threshold β = 6 (Figure 3A). Ten co-expressed gene modules were clustered, of which the black, pink, and red modules seen in Figure 3C were most relevant to CAF-S3 cells (Figure 3C). Of these, 921 genes (the black module [427 genes], pink module [255 genes], and red module [239 genes]) were selected as hub CAF-related genes (Figure 3D-F).
Functional enrichment analysis
Gene Ontology (GO) analysis showed that CAF‑related genes were mainly enriched in biological processes related to the ECM and cell adhesion, including focal adhesion, positive regulation of cell adhesion, extracellular structure organization, and ECM organization (Figure 4). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis indicated marked enrichment of the genes in pathways associated with cell adhesion (Figure 4).
Screening of prognostic genes and generation of a CAF prognostic gene signature (CAFPS)
Only gene expression values greater than 1 were included in this analysis. Univariate analysis showed that 72 CAF-related genes were strongly associated with OS. Further LASSO and multivariate analyses showed that eight CAF-associated genes (IL18, MYD88, GLIPR1, TNN, BHLHE41, DNAJB5, FKBP14, and XG) were independent prognostic factors of BC patient OS (Figure 5A-C).
A risk score (RS) system was established using the expression levels of these eight genes and their corresponding coefficients obtained from the multivariate Cox regression analysis. We then calculated the RS using the formula: RS (CAFPS) = (-0.168 × IL18 levels) + (-0.420 × MYD88 levels) + (-0.338 × GLIPR1 levels) + (-0.236 × TNN levels) +(-0.129 × BHLHE41 levels) + (-0.577 × DNAJB5 levels) + (0.695 × FKBP14 levels) + (0.573 × XG levels).
Evaluation of biomarkers
Genetic alterations in the biomarkers were assessed using data from the TCGA database. Most of the eight biomarkers exhibited high CNV frequencies in BC (IL18: 37.26%, MYD88:22.05%, GLIPR1:17.11%, TNN:27.76%, BHLHE41:16.16%, DNAJB5:14.07%, FKBP14:15.21%, and XG:14.83%, Figure 6).
Predictive accuracy of the eight-gene prognostic signature
Using the median RS, we separated BC patients into either low- (LR) or high-risk (HR) groups to evaluate the predictive accuracy of the eight-gene prognostic signature. This showed that the LR group had significantly better OS, compared with the HR group (HR 2.951, 95% CI 2.220-3.924, P< 0.0001; Figure 7A). The AUCs of the prognostic signature showed that it was effective for predicting the OS of patients with BC (3-year AUC = 0.698, 5-year AUC = 0.719, 8-year AUC = 0.752) (Figure 7B). We generated risk curves and scatter plots to display the RS and survival status of individual BC patients, finding that poorer outcome was strongly related to an elevated RS (Figure 7C-D). Furthermore, a heatmap depicting the expression patterns of the risk-associated genes in both groups showed that FKBP14 and XG levels were significantly increased in HR patients, while the levels of IL18, MYD88, GLIPR1, TNN, BHLHE41, and DNAJB5 were elevated in LR patients (Figure 7E). Multivariate analysis indicated that the CAFPS was an independent prognostic factor of BC patient prognosis after adjustment for other clinical factors (HR 2.466, 95% CI 1.843-3.298, P< 0.001) (Figure 8).
The eight-gene prognostic signature was then validated using the GSE96058 and GSE21653 datasets. Kaplan-Meier survival curves confirmed that the HR cohort experienced worse OS relative to the LR cohort (GSE96085 cohort: P < 0.0001; GSE21653 cohort: P = 0.01) (Figure 9A -B).
We then investigated whether RS could predict outcomes in various subgroups, including age ≥ 60, age < 60, phase I-II, phase III-IV, TNBC, and non-TNBC. This showed that all subgroups were accurately estimated by RS. Stratification analyses in the TCGA indicated worse OS for HR patients in each stratum, including age, stage, and molecular subtype, compared with LR patients. These data indicated that the CAFPS was a robust and independent predictor of OS in different populations. (Supplementary Figure 2).
Correlation between the CAFPS and clinicopathological factors
To further assess whether CAF‑related genes participated in the development of BC, we assessed the associations between CAFPS and patient clinicopathological factors. This showed that the CAFPS was significantly correlated with pathological T stage, pathological M stage, and molecular subtype (Figure 10). The higher the T stage, the higher the corresponding RS. Patients with metastasis and Her2-enrichment had higher RS values.
Generation of the CAFPS-based nomogram
A nomogram was generated using the eight-gene CAFPS and several patient clinical characteristics, namely, age and pathological status, to predict the 3 and 5-year OS of BC patients (Figure 11A). All patients were given a point each for individual prognostic parameters, and a higher total score indicated worse prognosis. We next assessed the predictive efficacy of the nomogram using time-dependent ROC curve analysis. This showed that the AUCs of the nomogram were 0.805 and 0.801 for the 3- and 5-year OS rate predictions, respectively (Figure 11B-C). Moreover, higher AUCs were seen compared with the pathological stage. These results demonstrated the satisfactory predictive efficacy of the nomogram in determining the outcomes of BC patients (Figure 11D-E).
Prognostic value of CAFPS in relation to drug response
Patients in the GSE18728 dataset underwent chemotherapy. Of these, 11 responded while 17 did not. The CAFPS scores of the patients who responded were significantly lower compared to those of the non-responding patients (Wilcoxon test, p=0.048, Figure 12A). Moreover, patients with reduced CAFPS scores responded better than those with elevated scores (Figure 12B). These findings indicate that the CAFPS was effective for estimating patient response to chemotherapy. This also explained the worse outcome and rapid cancer progression of patients with elevated CAFPS scores who underwent chemotherapy.
Evaluation of the prognostic gene signature, immune checkpoints, and hallmark gene networks
We next assessed differences in the levels of immune checkpoint molecules between the HR and LR cohorts. Relative to the HR cohort, the levels of CD274 (PD1) (p < 2.22e-16), PD-L1 (p < 2.22e-16), CTLA4 (p< 2.22e-16), LAG3 (2.7e-13), IDO1 (p< 2.22e-16), and TIGIT (p < 2.22e-16) were significantly raised in the LR cohort (Figures 13). Another interesting finding was that almost all of the hallmark gene networks were significantly associated with CAFPS (Figure 14A). Moreover, the single-sample gene set enrichment analysis (ssGSEA) showed that the CAFPS RS was negatively associated with the TGF-BETA (R=-0.091, p=0.0032), IL-6-JAK-STAT3 (R=-0.37, p<2.2e-16), and TNFA-SIGNALING-VIA-NFKB (R=-0.34, p < 2.2e-16) signaling pathways (Figure 14B-D).