Abundance of CAFs was a poor prognostic factor associated with the progression of HCC
Using EPIC (Estimating the Proportions of Immune and Cancer cells, a tool to estimate the proportions of different cell type from bulk gene expression data) (14), we first accessed the abundance of CAFs in the Cancer Genome Atlas (TCGA) cohort. CAFs were more abundant in HCC samples compared to normal samples (p = 0.0085, Fig. 2A). Moreover, high levels of CAFs in tumor tissue were significantly associated with poorer overall survival (p = 0.02, Fig. 2B) and disease-free survival (p = 0.036, Fig. 2C) in HCC patients. In addition, we investigate the differences in CAF distribution between samples at various phases of disease progression. Notably, there was a significant difference in CAFs abundance between different histologic grade (G1 vs. G2, p = 0.005; G1 vs. G3, p = 0.016, Fig. 2D), pathologic T classification (T1 vs. T3, p = 0.033, Fig. 2E), and pathologic stage (stage 1 vs. stage 3, p = 0.012, Fig. 2F).
Identification Of 140 Cafs Related Degs Based On Adjuvant Sorafenib Efficacy In Hcc
A total of 1,297 genes were differentially expressed between the HCC tumors and normal controls in the TCGA cohort (Fig. 3A), whereas 2,315 genes were differentially expressed between sorafenib RFS responder and non-responder in the GSE109211 dataset (Fig. 3B). Therefore, the overlapped 399 DEGs in the TCGA and GSE109211 were screened out as adjuvant sorafenib efficacy related DEGs in HCC (Fig. 3C). Next, weighted gene co-expression network (WCGNA) was used to identify modules that had a significant association with elevated CAF levels in the TCGA cohort. On the basis of scale independence and average connectivity, 4 was selected as the soft threshold (scale-free R2 = 0.9) (Figs. 3D). Four modules (brown, yellow, green and blue) were positively correlated with CAFs with a correlation coefficient higher than 0.30 (Fig. 3E, 3F). Finally, 140 genes in these four modules were chosen for further analyses as CAFs-related genes based on adjuvant sorafenib efficacy.
Protein-protein Interaction And Function Enrichment Analysis
Then, the potential interactions between these CAFs-related genes based on adjuvant sorafenib efficacy were analyzed by the protein-protein interaction (PPI) network, and the results revealed significant interactions among most of the CAFs-related genes (Fig. 4A). The PTPRC, IL1B, MKI67, DNMT3B, NOTCH3, IL10, MCM8, UBE2C, CSF1R and CTLA4 genes were identified as the top ten core genes in the network (Table 1). With the gene ontology (GO) functional enrichment analyses, the CAFs-related genes were found enriched in the T cell activation, cell-cell adhesion, T cell differentiation, extracellular matrix structural constituent and transmembrane receptor protein kinase activity (Fig. 4B). Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis showed that MAPK signaling pathway, T cell receptor signaling pathway, alcoholic liver disease, TNF signaling pathway, Th17 cell differentiation and Th1 and Th2 cell differentiation were mainly enriched (Fig. 4C). These findings indicated that these CAFs-related genes based on adjuvant sorafenib efficacy were involved in the regulation of HCC tumor microenvironment.
Table 1
A list of 10 core genes among 140 CAFs-related genes in PPI networks
Gene Symbol | Average Shortest Path Length | Betweenness Centrality | Closeness Centrality | Degree |
PTPRC | 1.9 | 1.53258911 | 0.52631579 | 37 |
IL1B | 1.90566038 | 1.35914212 | 0.52475248 | 35 |
MKI67 | 1.72972973 | 1.46805843 | 0.578125 | 30 |
DNMT3B | 1.78873239 | 0.68738682 | 0.55905512 | 28 |
NOTCH3 | 1.9375 | 1.6066114 | 0.51612903 | 27 |
IL10 | 2.05555556 | 0.43948887 | 0.48648649 | 27 |
MCM8 | 2.3 | 0.8922638 | 0.43478261 | 26 |
UBE2C | 1.25 | 0.2473715 | 0.8 | 26 |
CSF1R | 2 | 0.47599578 | 0.5 | 26 |
CTLA4 | 1.94520548 | 0.60220174 | 0.51408451 | 26 |
CAFs, cancer-associated fibroblasts; PPI, protein-protein interaction. |
Development And Validation Of The Novel Prognostic Risk Score Model Based On Crucial Cafs-related Genes
Sixty-eight CAFs-related genes were found to be significantly associated with disease-free survival (DFS) prognosis by univariate Cox analysis (all p < 0.05, Additional file 1, Table S1), and sixteen genes were found to be independent prognostic factors for DFS by multivariate Cox analysis (all p < 0.05, Table 2). After screening for optimal gene combinations using the least absolute shrinkage and selection operator (LASSO) algorithm, ten crucial CAFs-related genes were identified (DCLRE1C, DDX11, MAP4K2, SHCBP1, ADAM12, PAQR4, BEND3, ADAMTSL2, NUP93 and MPP2, Fig. 5A-B).
Table 2
Sixteen CAFs-related genes found to be independent prognostic factors for DFS by multivariate COX analysis
Gene Symbol | P value | HR | HR lower 95% | HR upper 95% |
DCLRE1C | 2.53E-02 | 2.478 | 1.187 | 6.154 |
DDX11 | 7.61E-03 | 1.865 | 1.127 | 3.086 |
MAP4K2 | 2.32E-04 | 2.795 | 1.572 | 4.968 |
SHCBP1 | 2.24E-03 | 3.217 | 1.437 | 7.201 |
ADAM12 | 2.31E-06 | 3.176 | 1.937 | 5.207 |
PAQR4 | 1.64E-02 | 1.478 | 1.033 | 2.116 |
BEND3 | 1.66E-03 | 4.247 | 1.617 | 11.154 |
ADAMTSL2 | 6.85E-04 | 0.728 | 0.600 | 0.884 |
NUP93 | 1.18E-02 | 0.312 | 0.114 | 0.855 |
MPP2 | 2.95E-02 | 0.254 | 0.051 | 0.825 |
HECTD2 | 4.57E-02 | 0.350 | 0.096 | 0.882 |
LIMK1 | 1.53E-02 | 0.566 | 0.337 | 0.948 |
RIBC2 | 4.22E-02 | 0.573 | 0.293 | 0.979 |
KRBA1 | 2.49E-02 | 0.604 | 0.365 | 0.998 |
ATAD3B | 3.11E-02 | 1.707 | 1.162 | 2.994 |
CEP290 | 3.31E-02 | 2.405 | 1.833 | 6.137 |
CAFs, cancer-associated fibroblasts; DFS, disease-free survival; HR, Hazard Ratio. |
Then, the expression and coefficients of these ten genes were utilized to develop a risk score model. The risk score = (-0.0631) * Exp MPP2 + (-0.0577) * Exp NUP93 + (-0.0311) * Exp ADAMTSL2 + (0.0139) * Exp BEND3 + (0.023) * Exp PAQR4 + (0.0252) * Exp ADAM12 + (0.0285) * Exp SHCBP1 + (0.0467) * Exp MAP4K2 + (0.0586) * Exp DDX11 + (0.114) * Exp DCLRE1C. As a result, each patient in the TCGA and GSE76427 cohorts was given a risk score for risk evaluation of DFS. The median risk scores were used to divide the patients into high- and low-risk subgroups in the TCGA and validation groups. Survival analyses showed that the DFS of high-risk subgroups in the TCGA cohort (p < 0.001, Fig. 5C-5D) were significantly worse than the DFS of low-risk subgroups. Similarly, Survival analyses showed that the DFS of high-risk subgroups in the GSE76427 cohort (p = 0.034, Fig. 5F-5G) were significantly worse than the DFS of low-risk subgroups. The ROCs curves were plotted further. In the TCGA cohort, the area under the curve (AUC) for DFS was 0.784, while the specificity and sensitivity were 0.737 and 0.744, respectively (Fig. 5E). In the GSE76427 cohort, the AUC for DFS was 0.724, and the specificity and sensitivity were 0.683 and 0.729, respectively (Fig. 5H).
Furthermore, univariate and multivariate Cox analyses were performed to assess the independent prognostic values of the risk score model in the training and validation cohort. In the TCGA cohort, the pathologic stage and risk score were found to be independently significant in both univariate [stage, p < 0.001, HR = 1.749 (1.468–2.084); risk score status, p < 0.001, HR = 2.041 (1.504–2.770)] and multivariate [stage, p < 0.001, HR = 1.574 (1.259–1.968); risk score status, p < 0.001, HR = 2.012 (1.398–2.897)] Cox analyses (Table 3). Similarly, in the GSE76427 cohorts, risk score was found to be independently significant in both univariate [risk score status, p < 0.05, HR = 1.956 (1.054–3.629)] and multivariate [risk score status, p < 0.05, HR = 1.975 (1.039–3.753)] Cox analyses (Table 4).
Table 3
Univariate and multivariate Cox analysis in the TCGA cohort
Clinical characteristics | Uni-variable Cox | Multi-variable Cox |
HR (95% CI) | P value | HR (95% CI) | P value |
Age (years, mean ± SD) | 0.997 [0.985–1.009] | 6.252E-01 | - | - |
Gender (Male/Female) | 0.866 [0.632–1.187] | 3.715E-01 | - | - |
Pathologic M (M0/M1) | 4.834 [1.515–15.43] | 3.229E-01 | - | - |
Pathologic N (N0/N1) | 1.411 [0.348–5.722] | 6.477E-01 | - | - |
Pathologic T (T1/T2/T3/T4) | 1.688 [1.441–1.976] | 2.439E-01 | - | - |
Pathologic stage (I / II / III / IV) | 1.749 [1.468–2.084] | 1.168E-10 | 1.574 [1.259–1.968] | 6.700E-05 |
Histologic grade (G1/G2/G3/G4) | 1.113 [0.914–1.354] | 2.866E-01 | - | - |
Vascular invasion (No/Yes) | 1.990 [1.407–2.814] | 7.259E-05 | 1.425 [0.960–2.116] | 7.862E-02 |
RS status (Low/High) | 2.041 [1.504–2.770] | 2.917E-06 | 2.012 [1.398–2.897] | 1.680E-04 |
SD, standard deviation; TCGA, The Cancer Genome Atlas; HR, Hazard Ratio; RS, risk model. |
Table 4
Univariate and multivariate Cox analysis in the GSE76427 cohort
Clinical characteristics | Uni-variable Cox | Multi-variable Cox |
HR (95% CI) | P value | HR (95% CI) | P value |
Age (years) | 1.005 [0.983–1.028] | 6.359E-01 | - | - |
Gender (Male/Female) | 1.673 [0.773–3.621] | 1.867E-01 | - | - |
Pathologic stage (I/II/III/IV) | 1.270 [0.891–1.810] | 1.853E-01 | 1.172 [0.814–1.688] | 3.941E-01 |
RS status (Low/High) | 1.956 [1.054–3.629] | 3.341E-02 | 1.975 [1.039–3.753] | 4.443E-02 |
RS, risk score |
Prognostic Nomogram Construction
On the basis of the results of multivariate Cox analyses for DFS in the TCGA, we devised a novel prognostic nomogram for DFS prediction with the pathologic stage and our risk score model (Fig. 6A), which could provide a simple method for predicting the 1-, 3-, and 5-year DFS of HCC. The nomogram was effective in predicting DFS with a high C-index of 0.7835 (95% CI, 0.7518–0.8152, p < 0.001). The C-indexes for predicting 1-, 3-, and 5-year DFS were 0.711 [95% CI, 0.679–0.743], 0.753 [95% CI, 0.721–0.785] and 0.789 [95% CI, 0.757–0.821], respectively. The calibration curves for 1-, 3- and 5-year DFS rates were also largely overlapped with the standard lines for the TCGA cohorts (Fig. 6B). DCA curves demonstrated that the combined model had the best net predictive value for DFS (Fig. 6C). In the TCGA cohort, the AUCs for predicting 1-, 3-, and 5-year DFS with the nomogram were 0.824 [95% CI, 0.792–0.856], 0.795 [95% CI, 0.763–0.827] and 0.790 [95% CI, 0.785–0.823], respectively (Fig. 6D). Taking together, these results indicated that the nomogram combining pathologic stage and our risk score model was effective for predicting HCC recurrence with a high discriminative capacity, which may facilitate the clinical management.
Gene Set Enrichment Analyses Between Different Risk Groups
To investigate the underlying molecular mechanisms of the risk model, we compared the high-risk and the low-risk groups of HCC patients in the TCGA cohort using gene set enrichment analyses (GSEA) analysis. Totally 25 KEGG signaling pathways were significantly enriched. In the high-risk group, oncological signatures dominated the enriched KEGG pathways (including cell cycle, DNA replication, p53 signaling, mismatch repair and so on, Fig. 7A-I). In the low-risk group, KEGG pathways were enriched primarily for metabolism process (including arachidonic acid, fatty acid, linoleic acid, retinol, tyrosine and so on, Fig. 7J-T). In addition, several biosynthesis and degradation pathways were substantially enriched in the low- and high-risk groups (Fig. 7U-Y).
Immune Related Analyses Between Different Risk Score Groups
Since the KEGG pathway enrichment analysis revealed that oncological pathways were enriched in the high-risk group, we questioned the association between risk score and tumor microenvironment. Thus, we calculated the ESTIMATE score, immune score, stromal score and tumor purity using ESTIMATE. We found that risk score was negatively correlated with stromal score (p < 0.001, Fig. 8A), immune score (p < 0.05, Fig. 8B), and ESTIMATE score (p < 0.001, Fig. 8C), and was positively correlated with tumor purity (p < 0.05, Fig. 8D), indicating that the stromal and immune cell content was significantly lower and the tumor purity was significantly higher in the high risk group, resulting in a poorer prognosis for the HCC patients. In addition, we assessed the proportional distribution of immune cells in the TCGA samples using CIBERSORT. In the high-risk group, we observed significantly higher proportions of B cell plasma, activated memory CD4 T cell, follicular helper T cell, M0 Macrophage and lower resting memory CD4 T cell (all p < 0.05) (Fig. 8E). These findings suggested that the infiltration of these immune cell subtypes may have a significant influence on the prognosis of HCC patients.
Tumor Immune Dysfunction And Exclusion Analysis
Given the clinical significance of therapeutic strategies based on immune checkpoint blockade in HCC, we investigate the relationship between the risk score, immune checkpoints and HLA family genes. As shown in Fig. 9A, the gene expression of CD274, CD40, CD70, CD86, CTLA-4, HAVCR2, ICOS, PDCD1, and TIGIT was substantially increased in the high-risk group. In the high-risk group, we also observed a significant decrease in the expression of B2M, HLA.B, and HLA.DOB (Fig. 9B). These findings suggested that the risk score model may affect the immunotherapy response of HCC patients. To investigate further whether our prognostic model can be used to evaluate the efficacy of immunotherapy in HCC patients, we analyzed the immunotherapy response of TCGA-LIHC patients using the tumor immune dysfunction and exclusion (TIDE) database. The results demonstrated that risk score was positively correlated with immunotherapy response (p = 0.0082, Fig. 9C), the high-risk group had a higher ratio of immunotherapy responders (p = 0.03157, Fig. 9D) and the responder group in HCC patients had a better DFS than the non-responder group (p = 0.037, Fig. 9E). It indicated that the HCC high-risk group had a greater immune escape potential and a more effective immunotherapy.
Correlations Between Risk Score Model And Drug Susceptibility
Since we observed that the oncological and metabolism-related pathways were enriched in according to the KEGG pathway enrichment analysis, and the oncological and metabolic changes in tumor may serve as potential targets for chemotherapeutic and targeted drugs. Therefore, we compared the IC50s of a number of chemotherapeutics between different risk groups. Patients in the high-risk group had lower IC50s of cisplatin, camptothecin, gemcitabine, paclitaxel and vinorelbine than those in the low-risk group, suggesting that patients may derive greater benefit from chemotherapy (Fig. 10). In addition, we assessed the sensitivity of patients in different risk subgroups to a number of targeted inhibitors. Patients in the low-risk group had substantially lower IC50s to multiple targeted drugs including lapatinib, erlotinib, gefitinib, dasatinib and lapatinib (Fig. 10). These results indicated the prospective benefits of specific targeted therapy for patients in the low-risk group.