1. Overview of study design
In this study, we first combined bulk RNA-seq and scRNA seq data to access hub genes related to CD8+ T cell in HCC. In terms of bulk RNA-seq, the composition of 22 types of immune cells was calculated in TCGA-LIHC and HCC patients were stratified into two groups based on CD8+ T cell composition. The impact of CD8+ T cell composition on prognosis was assessed using Kaplan-Meier analysis. WGCNA was employed to identify CD8+ T cell-related genes in HCC. Furthermore, single-cell analysis was conducted to identify CD8+ T cell and obtain cell markers.
The present study investigated the interaction between CD8+ T cells and other cells through cell chat. A risk scoring model was developed by integrating CD8+ T cell-related genes and markers, followed by immune infiltration and immune checkpoint analysis. Additionally, immunotherapy analysis was conducted to identify sensitive drugs. Immunohistochemistry was utilized to determine the expression of model genes in HCC samples and corresponding adjacent tissues. Additionally, correlation analysis was performed between the riskscore and commonly used clinical indicators. The design of this study was summarized in Fig. 1.
2. CD8 + T Cell related genes identified by WGCNA in HCC
The mRNA sequencing results of HCC patients were downloaded from TCGA database including 371 tumor tissues and 50 adjacent normal tissues. 31 patients were removed due to OS fewer than 30 days or incomplete survival information. 5085 DEGs were identified between normal and tumor samples with the screening criteria of p value < 0.01 and |log FC| > 0.5 (Figure S1A).
In order to explore differences of immune cell infiltration in normal and tumor tissues, CIBERSORT analysis was used to estimate the proportion of immune cells in each sample. Tumor samples demonstrated higher abundance of CD8+ T cells, regulatory Tregs, M0 Macrophages, M1 Macrophages, and resting mast cells (Fig. 2A). To further clarify the influence of CD8+ T cells on HCC, 340 TGCA-LIHC patients were divided into high- and low- CD8+ T cell-compositiongroups. Kaplan – Meier analysis confirmed a worse prognosis for low-CD8+ T cell- composition patients compared to high-CD8+ T cell- composition patients (p value = 2.563e-04) (Fig. 2B).
Based on this observation, WGCNA was undertaken to identify CD8+ T cell-related genes in HCC. For WGCNA construction, there was no outliers (Figure S1B) and the soft threshold was set to 5 for the criterion of free-scale topology (Figure S1C). Using hierarchical clustering, dynamic branch cutting methods, and the Topological Overlap Matrix (TOM), 5085 DEGs were grouped into 10 modules (Fig. 2C). Module-trait relationships were analyzed using correlations between the module genes and CD8+ T cell-composition, which identified co-expressed gene modules with significant correlations to CD8+ T cell. As shown in Fig. 2D, the yellow module was associated most significantly with high CD8+ T cell-composition group (R = 0.25, p value = 4e-06). Thus, 547 genes (Supplementary Table S1) in the yellow module were selected for downstream analyses.
3. CD8+ T Cell marker genes identified by HCC scRNA-seq data
In HCC scRNA-seq data set GSE125449, a total of 21289 genes in 3547 cells passed quantity control. Vlnplots in Supplementary Figure S2A displayed the number of genes (nFeature), the sequence count per cell (nCount), the percentage of mitochondrial genes (percent. mt), and the percentage of hemoglobin (percent. HB). The correlation analysis showed a positive correlation between nCount and nFeature (Supplementary Figure S2B). 2000 variable genes were plotted in a scatter diagram (Supplementary Figure S2C). Following this, thirty PCs were identified (Supplementary Figures S2D and E), which showed high heterogeneity in HCC. These top twenty PCs were then analyzed using t-SNE. The t-SNE algorithm separates single cancer cells into 21 clusters based on the top 2000 highly expressed genes (Fig. 3A). According to cell type annotation, HCC cells were clustered into seven groups, which were composed of adipocytes, B cells, CD8+ T cells, endothelial cells, epithelial cells, fibroblasts, and monocytes (Figs. 3B and C).
Marker genes of these cells were detected by the screening criteria of p value < 0.05 and |log FC| > 0.5 (Supplementary Table S2) and shown in heatmap (Fig. 3D). 670 CD8+ T cell marker genes were used for subsequent studies. GO and KEGG Enrichment were performed on these 670 markers. For Go enrichment, the biological process (BP) category included “cytoplasmic translation” and “regulation of peptidase activity”, the cell component (CC) category included “focal adhesion” and “cell − substrate junction”, and the molecular function (MF) category included “structural constituent of ribosome” and “enzyme inhibitor activity” (Fig. 3E). The results of KEGG enrichment analysis showed that these marker genes were mainly enriched in “Coronavirus disease − COVID − 19” and “Ribosome” pathways (Fig. 3F).
4. Cell-cell interactions between CD8+ T cell and other cells
Cell chat analysis was performed to investigate the interactions among different cell types. CD8 + T cells directly affected endothelial cells, epithelial cells, and monocytes, and were regulated by endothelial cells, epithelial cells, fibroblasts, monocytes, and adipocytes. However, there was no significant interactions between CD8 + T cells and B cells (Fig. 4A). Application of this analysis also uncovered outgoing signaling pathways, CD8 + T cells were linked to MIF, PARs, LIGHT, and IFN-II signaling (Fig. 4B and C). CD8 + T cells served as mainly mediator and influencer in MIF signaling pathway, sender in LIGHT and IFN-II signaling pathways, but weak sender in PARs signaling pathway (Fig. 4D).
5. Construction and Validation of CD8+ T cell-related prognostic signature
Taking the intersection of CD8+ T cell marker genes and yellow module genes, a total of 62 CD8+ T cell-associated genes were obtained (Fig. 5A). Based on overall survival information of 340 TCGA-LIHC samples, 14 genes were screened out from the 62 genes by univariate cox regression analysis (Fig. 5B). The LASSO regression analysis (Fig. 5C and D) sorted out four CD8+ T cell- associated genes, including KLRB1, RGS2, IL7R and TNFRSF1B. Then, multivariate cox regression analysis was performed on the four genes, and three genes, KLRB1, RGS2 and TNFRSF1B, were able to enter the equation as a prognostic predictor (Fig. 5E, Table 1). Compared with normal tissues, these three genes were all significantly downregulated in tumor tissues (Supplementary Figure S3A). In single-cell dataset, KLRB1 was mostly found in CD8+ T cells, TNFRSF1B was distributed in CD8+ T cells and Monocytes, while RGS2 was distributed in CD8+ T cells, B cells, Epithelial cells, Fibroblasts and Monocytes (Supplementary Figure S4).
Table 1
The information of 3 model genes.
Gene symbol
|
Gene ID
|
Full name
|
Function of the encoded protein
|
RGS2
|
5997
|
regulator of G protein signaling 2
|
RGS proteins are able to deactivate G protein subunits of the Gi alpha, Go alpha and Gq alpha subtypes.
|
KLRB1
|
3820
|
killer cell lectin like receptor B1
|
The KLRB1 protein is classified as a type II membrane protein.
|
TNFRSF1B
|
7133
|
TNF receptor superfamily member 1B
|
The TNFRSF1B protein is a member of the TNF-receptor superfamily. This protein and TNF-receptor 1 form a heterocomplex that mediates the recruitment of two anti-apoptotic proteins, c-IAP1 and c-IAP2, which possess E3 ubiquitin ligase activity.
|
Kaplan-Meier survival curves for both the training set and validation set confirmed that higher expression of RGS2 and lower expression of KLRB1 and TNFRSF1B contributed to poorer survival (Supplementary Figures S3B and C).
These three model genes were enriched in pathways related to regulation of muscle hypertrophy for BP. For CC, the model genes were enriched in “tumor necrosis factor − activated receptor activity”, “death receptor activity”, “cyclase regulator activity” et al pathways (Fig. 5F).
Risk score of each patient was calculated according to the following formula.
Risk Score=RGS2×0.2801-KLRB1×0.3221-TNFRSF1B×0.2167
Using the median risk score as the cutoff, HCC patients were divided into high and low-risk groups. Survival analysis (Fig. 6A) showed that the prognosis of high-risk group was worse than that of the low-risk group (p = 5.028e-07). The heatmap revealed that the expression levels of KLRB1 and TNFRSF1B in low-risk group were higher than those in high-risk group, while RGS2 expression was higher in high-risk group (Fig. 6B). Figure 6C illustrates an incremental distribution of patient risk scores. According to risk plot, more patients in high-risk group distributed at the bottom of figure with shorter survival times, on the contrary, more patients in low-risk group distributed at the top of figure with longer survival times (Fig. 6D). Time-dependent ROC curves were plotted to evaluate the risk model. The area under the ROC curve (AUC) at one, three, and five years was 0.773, 0.743, and 0.691, respectively, indicating that the model could be used to predict patient prognosis accurately (Fig. 6E). As well as, these results were verified in an independent GEO dataset GSE14520 (Fig. 6F-I). Time-dependent ROC curves revealed the AUC of validation cohort at one, three, and five years was 0.688, 0.691, and 0.656 respectively (Fig. 6J). Results from the validation dataset (Figs. 6F-J) support those presented above (Figs. 6A-E), confirming the risk scoring model as an independent prognostic factor for HCC.
The clinicopathological characteristics of high-risk and low-risk groups differed as well as their survival rates. The heatmap showed the clinicopathological characteristics and expression of model genes in high-risk and low-risk groups (Fig. 7A). In terms of age, gender, M stage and N stage distribution, there were no significant differences between high-risk and low-risk groups, whereas there were differences in tumor grade, T stage and tumor stage. There was a significantly higher proportion of patients of G3-4, T3-4, and stage III in high-risk group than those in low-risk group (Fig. 7B). HCC patients subdivided according to their clinicopathological characteristics showed poorer survival outcomes in high-risk group, whether grouped by age, tumor grade, tumor stage or TNM stage (Fig. 7C, Supplementary Figure S5).
Furthermore, HCC patients were consistently clustered based on model gene expression. According to cophenetic, dispersion, and silhouette coefficients, the NMF method selected three as clustering number (Fig. 8A and B). Therefore, HCC patients were categorized into three clusters. Figure 8C showed the heatmap of clinicopathological characteristics and different expression of model genes in different clusters. In addition, there was a significant difference in survival among the three clusters (p = 9.239e-05), demonstrating the prognostic value of these three model genes (Fig. 8D).
6. Construction of risk score-related Nomogram
To develop a more comprehensive prediction of patient survival, a nomogram was constructed using the risk score as well as other clinicopathological characteristics. By univariate and multivariate Cox hazard regression analysis, tumor grade, M stage, child pugh classification grade and risk score were selected to construct Nomogram with p value < 0.05 (Fig. 9A). Time-dependent ROC analysis revealed the AUC values of nomogram for 1, 3, and 5 years were 0.784, 0.775, and 0.765, respectively (Fig. 9B). Calibration plot indicated excellent agreement between observations and predictions (Fig. 9C). Combining with risk score and various clinical features, DCA curves suggested that the nomogram, with the greatest area under the curve, was the most efficient for HCC (Fig. 9D). Based on the results above, the nomogram showed a reliable performance on predicting patient survival.
7. Risk model related-Immune Infiltration, Immune Checkpoint and Immunotherapy
In terms of immune cell infiltration, low-risk group demonstrated higher abundance of CD8+ T cells, resting memory CD4 + T cells, M1 Macrophages and regulatory T cells (Tregs), whereas high-risk group demonstrated higher abundance of M0, M2 Macrophages and resting mast cells (Fig. 10A). ESTIMATE calculated the ESTIMATE score, immune score, stromal score and the tumor purity of each HCC patient. A negative correlation was significantly observed between ESTIMATE score, immune score, stromal score and risk score, whereas a significantly positive correlation between tumor purity and risk score (Fig. 10B). Compared with low-risk group, patients in high-risk group showed lower ESTIMATE score, immune score, stromal score and higher tumor purity (Fig. 10C). These results indicated that the risk scoring model was capable of accurately predicting the status immune microenvironment of HCC patients.
Furthermore, all of 70 ICP genes (24 inhibitory genes, and 46 stimulatory genes) were differentially expressed between the two groups such as CD276, CD40, CD80, CTLA4, et al (Fig. 11A and B). In addition, the expression of most ICP genes was significantly negative correlated with risk score (Supplementary Figure S6), especially CD40LG (R=-0.54, Fig. 11C) and TMIGD2 (R=-0.53, Fig. 11D).
Considering the results presented above, we wondered whether low-risk and high-risk groups respond to immunotherapy differently. TIDE algorithm was used to calculate TIDE score of HCC patients, to reveal their sensitivity to immune checkpoint and examine the ability of the risk scoring model to predict response to cancer immunotherapy. Compared with low-risk group, patients in high-risk group showed higher TIDE scores (p < 0.05, Fig. 12A). There was a significantly positive correlation between TIDE score and risk score (R = 0.46, p < 2.2e-06, Fig. 12B). Based on TIDE score, patients were divided into responders and non-responders. In contrast, responders exhibited lower risk score (Fig. 12C). Figure 12D illustrated that patients in high-risk group displayed a significantly higher fraction of non-responder. Besides, IPS scores of CTLA4_negative + PD-1_negative, CTLA4_negative + PD-1_positive, CTLA4_positive + PD-1_negative and CTLA4_positive _PD-1_positive were calculate, and IPS score in low-risk group was significantly higher than that in the high-risk group (Figs. 12E-H). According to these findings, patients in low-risk group may benefit from immunotherapy, but patients in high-risk group may not be sensitive to the stimulation of immunotherapy.
8. Prediction of Drug Sensitivity
In addition to predicting which patients are responsive to immunotherapy, the “oncoPredict” package was applied to assess the drug sensitivity in GDSC2 database and identify the alternative therapeutic agents for less-sensitive patients. The IC50 value was normalized between high-risk and low-risk groups and calculated as a measure of sensitivity (Fig. 12I, Supplementary Table S3). As shown in Fig. 12I, many drugs may be sensitive to low-risk group patients, such as BMS-754807, Gemcitabine, BI-2536, AZD5991, Vincristine, et al (Supplementary Figure S7). However, only NU6441, AZD6482 and SB505124 exhibited sensitive to patients in high-risk group. Due to lower IC50 values, AZD6482 and SB505124 may serve as novel potential targeted drugs for patients in high-risk group (Figs. 12J and K).
9. The expression of model genes in normal and HCC tissues
Immunohistochemistry (IHC) was employed to ascertain the expression of model genes (KLRB1, RGS2, and TNFRSF1B) in tissues obtained from HCC samples and corresponding adjacent tissues (Fig. 13A). The IHC score was subsequently computed to assess the expression of model genes. In comparison to normal tissues, KLRB1, RGS2, and TNFRSF1B exhibited a reduced expression level (P < 0.05), which is in line with our findings in publicly available databases.
10. Correlation Analysis between riskscore and clinical indicators
Utilizing the aforementioned formula, the riskscore for each patient was computed by employing IHC scores as gene expression values. Among the commonly used clinical indicators, namely Alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), and carbohydrate antigen 19 − 9 (CA19-9), the riskscore exhibited a robust positive correlation with CEA (R = 0.76, p = 0.011). Furthermore, a positive correlation trend was observed between riskscore and AFP, albeit without statistical significance. Conversely, no statistical difference was detected in the correlation between riskscore and AFP and CA19-9 (Fig. 13C).