DOI: https://doi.org/10.21203/rs.3.rs-1630689/v1
Background: Liver hepatocellular carcinoma (LIHC) remains a significant global health challenge. Immunotherapy for LIHC is a very promising treatment modality. However, some patients with LIHC have a poor response to tumor immunotherapy. Thus, it is imperative to fully understand liver cancer immune subtypes, by which individualized and optimized immunotherapy would be developed.
Methods: We downloaded transcriptome and clinical data of TCGA-LIHC from the Cancer Genome Atlas (TCGA) databases, GSE52018 and GSE72467 from the Gene Expression Omnibus (GEO). The immune cell infiltration of the tumor microenvironment was calculated. Based on the results of CIBERSORT, the LIHC patients were clustered into three subtypes. To establish the prognostic model, we employed LASSO-COX regression analysis. The differences in immune cell infiltration, tumor stemness, somatic mutations, and sensitivity to the immune checkpoint inhibitors in patients with different risk groups were systematically assessed.
Results: According to the results of CIBERSORT, LIHC samples were divided into three individual subtypes: Cluster-A, B, and C. These three subtypes have distinct characteristics with regard to immune cell infiltration, prognosis, and response to immune checkpoint inhibitors (ICIs) and chemotherapy drugs. Cluster-B belongs to hot tumors, and Cluster-A and Cluster-C belong to cold tumors. We trained a prognostic model based on differential genes between cold and hot tumors. We then attempted to explore the differences among different risk groups in the immune microenvironment, oncogenic biology, genetic variation, and prognosis. The characteristics of immune activation, enhanced immunotherapy response, and prolonged survival was observed in the low-risk group. Finally, we combined Riskscore and other clinicopathological features to build a nomogram to predict the 1-, 3-, and 5-year survival rates of LIHC patients.
Conclusion: In this study, an immune‑related gene-based model of predicting the prognosis and immunotherapy response in hepatocellular carcinoma was established.
Liver cancer is expected to exceed 1 million cases by 2025, which is one of the most malignant digestive tract cancers [1]. LIHC is the only cancer of the five deadliest cancers with an annual increase in incidence, which was characterized by high mortality, recurrence, and metastasis, about 90% of primary liver cancer [2]. The occurrence of LIHC is a multi-step and complex biological process. Many signaling cascades are changed in LIHC, resulting in a heterogeneous molecular profile that ultimately leads to tumorigenesis, progression, and metastasis. Although there are various treatment methods for LIHC, tumor recurrence is very common after the treatment, and the 5-year recurrence rate exceeds 70% [3]. For the recent few years, ICIs-based immunotherapy has been a very promising treatment in many advanced cancers, especially virus-induced cancers [4-6]. Some immune checkpoint molecules such as PD-1 and CTLA-4, are usually targets of immunotherapy. CTLA-4 is low expressed on activated T cells and regulatory T cells (Tregs), but absent on naïve T cells [7]. CTLA-4 can negatively regulate T cell activation. PD-1 can regulate peripheral immune tolerance, and autoimmunity and inhibit T cells activation and proliferation. PD-1 is mainly expressed on CD8+ T cells, Treg, and myeloid-derived suppressor cells. [8, 9]. Current CTLA-4 and PD-1 studies in LIHC demonstrate manageable safety, optimistic response rates, and prolonged overall survival [10-12]. But in many patients, ICIs are ineffective or only partially effective, and response rates to current immunotherapy are low, at approximately 10-25% [13]. Another problem is acquired resistance in patients who initially respond, leading to tumor recurrence. The Tumor microenvironment (TME) takes an essential part in tumorigenesis, which consists of stromal cells and immune cells, such as CD8+, CD4+ T cells, Tregs, and dendritic cells. These tumor-associated immune cells may have tumor-antagonizing or tumor-promoting functions. Current studies have indicated that TME can affect the efficacy of ICIs [14, 15]. Therefore, exploring some immune-related genes that affect the abundance of immune cells in the TME may help to improve the therapeutic effect of ICIs.
Consequently, to understand the differences among diverse LIHC immune subtypes, and the impact on immunotherapy and prognosis. We divided LIHC samples into three subtypes based on the degree of immune cell abundance of LIHC. In addition, we utilized machine learning to construct the prognostic model to quantify risk between individuals. Our study integrates multi-omics analyses to better predict the prognosis of LIHC patients and the effectiveness of immunotherapy.
TCGA-LIHC transcriptome data with related clinical information and somatic mutation data were acquired from the TCGA database (http:// cance rgeno me. nih. gov/). We downloaded GSE52018 and GSE72467 microarray data and corresponding clinical information on LIHC from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). For more comparability with GEO microarray data, the TCGA data were transformed using the "ComBat" function in the "sva" R package to reduce batch effects due to non-biological experimental factors.
We quantified the immune cell infiltration in each LIHC patient by the "CIBERSORT" algorithm. The TME consisting of Immune cells and Stromal cells were quantified by the "ESTIMATE" algorithm[16]. We correlation analyzed the results of CIBERSORT and ESTIMATE, then visualized them using the "corrplot" R package. Molecular subtypes were determined for each patient using consensus clustering analysis, relying on the results of CIBERSORT. Cases were classified according to k-means, using the "ConsensusClusterPlus" R package [17]. Next, We adopted the "survival" R package to understand differences in the survival of different subtypes, and then the "survminer" R package was employed to determine the best cutoff.
We visualized the results of CIBERSORT for three clusters using the "pheatmap" R package. Then we compared the Stromal Score, Immune Score, ESTIMATE Score (comprehensive scoring of stroma and immune), and Tumor Purity of the three clusters. We applied the Single-Sample Gene Set Enrichment Analysis (ssGSEA) function in the R package "GSVA", the 28 immune cell type abundance was estimated, and then compare between three clusters [18].
First, we assessed the potential of LIHC patients to respond to immunotherapy, by comparing the expression of 13 immune checkpoint genes. We combined the Genomics of Drug Sensitivity in Cancer database (GDSC: https://www.cancerrxgene.org) and the "pRRophetic" R package to predict chemotherapy response for each LIHC sample [19]. The half-maximum inhibitory concentration (IC50) is used to measure the ability of the drug to induce apoptosis of tumor cells, that is, the stronger the induction ability, the lower the value is, indicating that the drug has better efficacy.
Comparison of expression profiling data using the R package "limma" to identify the differential genes (DEGs) of Cluster-B vs Cluster-A or Cluster-C [20]. We set the threshold as |log2FoldChange| >1 and P-value <0.05 to obtain DEGs. We analyzed the connections between the proteins encoded by DEGs on the STRING website and mapped the protein interaction network [21]. Next, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis was adopted to understand the functions of 42 selected DEGs using the R package “clusterProfiler”.
We constructed a prognostic model to assess immune-related risk factors in LIHC patients based on DEGs between three clusters. LASSO regression was carried out on 42 DEGs to reduce the number of predictors. The prognostic model was established using LASSO-COX regression analysis of selected prognostic DEGs. Finally, we calculated the Riskscore value of all LIHC samples. LIHC patients with Riskscore above the median were allocated in the high-risk group, and those with a Riskscore under the median were allocated in the low-risk group.
First, to understand the survival differences in different groups, we performed the Kaplan-Meier survival analysis, then visualized using survival curves. Next, we employed the “heatmap” R package to draw a heatmap showing the presence of 5 hub genes in different risk groups. We downloaded immunohistochemical staining of partial hub gene proteins in LIHC and normal tissues from the Human Protein Atlas database (https://www.proteinatlas.org/). We included all predictors in univariate regression analysis followed by multivariate regression analysis. Forest maps were employed to present the results. Finally, The optimality of the prognostic model was verified by the receiver operating characteristic (ROC) curve.
To further explore differences based on genomics among different risk groups, we investigated the differences in immune infiltration, somatic mutation, and tumor stemness index in the two risk groups. We used seven immune infiltration estimation algorithms in the TIMER2.0 database (http://timer.cistrome.org/) to quantify the proportion of immune cells in the two groups of samples [22-27]. Then, we visualized somatic mutations in both groups by the "maftools" R package. Tumor stemness index (mRNAsi) was generated for each sample using the one-class logistic regression (OCLR) to assess tumor dedifferentiation potential. [28]. Finally, using the "rms" R package, a nomogram was built up for the prediction of 1-, 3-, and 5-year survival rates of LIHC patients.
Identification of Immune Cell Correlations and Patient Consensus Clustering
We performed correlation analysis on immune cells in LIHC samples. In Fig. 1A, T cells CD4 naive and Neutrophil cells with the highest positive correlation because the red sector has the largest area. NK cells activated and Macrophages M0 has the highest negative correlation. Then we identified three subtypes, Cluster-A, B, and C based on the results of CIBERSORT (k=3, Fig. 1B, C). The survival curve (Fig. 1D) shows that Cluster-B had the best survival, while Cluster-C had the most dismal prognosis (p=0.032).
Identification of Immune-Related Subtype
To further understand the difference between different subtypes, we made a heatmap to show the infiltration degree of different immune cells among each subtype (Fig. 1E). We found that Cluster-B had the most abundant enrichment of immune cells, involving CD8 T cells and NK cells which are associated with tumor killing. Others included T cells follicular helper and plasma cells. CD4 memory resting T cells were significantly higher in Cluster-A, and Macrophages M0 cells were dramatically increased in Cluster-C. Then the result of ssGSEA shows that the infiltration degree of most immune cells was higher in Cluster-B too (Fig. 1F). Finally, by the result of ESTIMATE, we found no differences in the Stromal Scores of the three subtypes (Fig. 1G). Cluster-B had the highest Immune Score and ESTIMATE Score, and Cluster-C had the lowest score, which was consistent with the prognostic analysis. Cluster-C tumors have the highest purity and the worst prognosis, while Cluster-B tumors have the lowest purity and the best prognosis. Therefore, we believe that Cluster-A and Cluster-C fall into the cold tumor, while Cluster-B belongs to the hot tumor.
Response of Three Clusters to Immunotherapy and Chemotherapy
ICIs such as PD-1 and CTLA-4 are of great promise in LIHC therapy. We validated the expression of 13 immune checkpoint-related genes in 3 clusters. The results show that PD1 related (Fig. 2A), CTLA4 related (Fig. 2B), and other checkpoint genes (Fig. 2C) were highly expressed in Cluster-B.
In Cluster-A and Cluster-C, the expression of most immune checkpoint-related genes was not significantly different, but all were lower than in Cluster-B. Previous studies reported that cold tumors were insensitive to treatment with ICIs. In our opinion, the results of immune checkpoint genes validate our judgment on the three clusters of subtypes. We evaluated the response of chemotherapy drugs. Then, we found 9 drugs with potential predominance in Cluster-A (Fig. 2D), 53 in Cluster-B (Fig. 2E), and 9 in Cluster-C (Fig. 2F).
Functional Enrichment Analysis and PPI Network
Cluster B belongs to the hot tumor, while Cluster-A and Cluster-C are cold tumors. To further explore the genetic differences between the two tumors, we identified 42 DEGs in Cluster-B vs Cluster-A or Cluster-C. The results in GO (Fig. 3A) and KEGG (Fig. 3B) enrichment analysis suggest that the differences between hot and cold tumors may involve PD-L1 expression and PD-1 checkpoint pathway in cancer, and other immune pathways. In addition, there are proliferation, differentiation, and activation of various immune cells. Finally, we mapped the protein interaction network of 42 DEGs (Fig. 3C).
Establish the Prognostic Model
Considering the immune heterogeneity of the 3 Cluster, we have established a prognostic model based on this heterogeneity. We incorporated 42 DEGs into the Lasso regression analysis to reduce the number of predicted variables. Then, five hub genes were selected by the LASSO-COX regression analysis (Fig. 3D). The formula of Riskscore is as follows: . Exp is the expression value of the hub gene (Table S1). Fig. 3E and 3F show the distribution of Riskscore and the survival status of LIHC patients. The K-M survival curve shows that the low-risk group has a better prognosis (p<0.001) (Fig. 3G). Fig. 3H exhibits the distribution of various clinical features and hub gene expression levels in two risk groups. The expression of EOMES, GZMH, NKG7, and CD8A are higher in the low-risk group. The expression of ANKRD22 is higher in the high-risk group. Furthermore, immunohistochemical staining revealed differences in partial hub genes between LIHC and normal tissues (Fig. 3I). Fig. 3J shows higher mortality in the high-risk group, and Fig. 3K indicates that there is a significant difference in Riskscore between dead and alive patients (P<0.001).
We included age, gender, stage, and Riskscore in univariate (Fig. 3L) and multivariate (Fig. 3M) regression analysis. Results reveal that Riskscore is an independent prognostic factor for LIHC patients.
To assess the predictive accuracy of our models, we employed the ROC curve (Fig. 3N). The area under the curve (AUC) value (0.68) of Riskscore is higher than other clinicopathological characteristics, indicating that Riskscore has the best predictive ability.
Immune Infiltration and Sensitivity to Immunotherapy
As shown in Fig. 4A, the low-risk group had higher tumor infiltration of diverse anti-cancer immune cells, including CD8+ T cells, macrophages, Th1 cells, NK cells, and dendritic cells. However, M2 macrophages, which might have a suppressive TME, are more accumulated in the high-risk group. Fig. 4B shows that 21 immune cells were more strongly infiltrated in the low-risk group. The results in Fig. 4C indicate that the low-risk group had more abundant stromal components and immune infiltration. The higher tumor purity in high-risk groups may lead to a poor prognosis. The PD1 related (Fig. 4D), CTLA4 related (Fig. 4E), other immune checkpoint genes (Fig. 4F), and T cell activation agonists (Fig. 4G) are apparently higher in the low-risk group. In consideration of the positive correlation between the expression of the immune checkpoint genes and the response to immunotherapy, the low-risk group may benefit much more from immunotherapy. In conclusion, the low-risk group has stronger antitumor immunity, thereby displaying better immunotherapy response and has a more favorable prognosis.
Somatic mutation and stem cell index in two groups, and Nomogram construction
To further investigate the heterogeneity of the two risk groups, we analyzed the somatic mutation distribution and stem cell index. We found that in the TCGA-LIHC cohort, the high-risk group had a higher somatic mutation rate. (Fig. 5A, B). The mutation landscape showed the top 30 mutant genes. Then we compared the cancer stem cell index of the two groups and observed a higher stem cell index in the high-risk group (Fig. 5C). The Riskscore had a weaker positive correlation with the stem cell index (Fig. 5D). Fig. 5E is a nomogram predicting the overall survival of LIHC, combined with Riskscore and clinical features.
Despite rapid advances in treatment and surgical techniques, the incidence and mortality of LIHC remain high. In the past few decades, immunotherapy for LIHC has made tremendous progress. Immunotherapy with ICIs improved outcomes in patients with LIHC. However, due to the immune complexity and the heterogeneity of LIHC, immunotherapy was only effective in 10-25% of patients. In recent years, the rapidly developing microarray technology has been widely used to predict disease progression, and assess immunotherapy and prognosis. In the present study, LIHC samples with different immune cell infiltration were classified into 3 molecular subtypes by the results of transcriptomic data analysis. In this research, multi-omics data and clinical data were used to analyze the immune heterogeneity of LIHC patients, and to construct a prognostic model to evaluate patient prognosis and predict the effect of immunotherapy.
To obtain LIHC typing based on immune cell infiltration, we first calculated the degree of immune infiltration in all samples using CIBERSORT. Next, three immunophenotypes were comprehensively identified based on the consensus cluster analysis of the CIBERSORT results. The further study we found that the prognosis and the degree of immune infiltration were significantly different among the three clusters. Cluster-B has the highest degree of anti-tumor immune cells. We believed that the immune infiltration characteristics of Cluster-B help prolong the lifespan of patients. In contrast, Cluster-C has lower immune infiltration and the highest tumor purity and therefore the worst prognosis. Analysis of immune checkpoint-related genes showed that the expression in Cluster-B was significantly higher than in the other two clusters. Therefore, the immunotherapy effect of Cluster-B may be better. Currently, researchers are increasingly focusing on the immunophenotype of tumors. Tumors were divided into hot and cold tumors according to the T cell infiltration in the TME [29]. According to the infiltration of immune cells, tumors were divided into immune-inflamed type, immune-excluded type, and immune-desert type [29]. The immune-inflamed type contains a large number of CD8+ T cells infiltrate in the tumor, and PD-1/PD-L1 is highly expressed. The CD8+ T cells of immune-excluded tumors mainly infiltrate the stroma and cannot effectively invade the tumor. CD8+ T cells are lacking in the immune-desert type. Therefore, the immune-inflamed type is hot tumors. Besides weak infiltration of T cells, the immune-excluded type and immune-desert type also have the characteristics of low expression of PD-1/PD-L1, which belong to cold tumors [30]. A clinical trial shows that compared with cold tumors, the hot tumors may be more likely to benefit from immunotherapy because of the significantly higher PD-1/PD-L1 expression [31]. Our results show that Cluster-B has higher PD-1/PD-L1 expression than the other two clusters. At the same time, the CD8+ T cell infiltration of Cluster-B was also the highest. Based on our results, we believe that Cluster-B belongs to hot tumors, while Cluster-A and Cluster-C are cold tumors. Cluster-B may be more sensitive to immunotherapy.
Next, to reveal the differences between hot and cold tumors, we obtained the DEGs of the Cluster-B vs Cluster-A or Cluster-C. Then we employed KEGG and GO analysis on the DEGs. The results of GO enrichment analysis show that these genes mainly take an important part in the proliferation, differentiation, activation of immune cells, and regulation of immune responses. For the KEGG pathway, DEGs were involved in immune-related processes, including PD-L1 expression and PD-1 checkpoint pathway in cancer.
We screened 5 hub genes (EMOES, ANKRD22, GZMH, NKG7, CD8A) by LASSO regression analysis on 42 DEGs. The predictive model was established based on hub genes to evaluate the sensitivity to immunotherapy and prognosis of LIHC patients. EOMES is highly expressed in CD8+ T cells expressing inhibitory receptors, which encodes transcription factors required for the differentiation of effector CD8+ T cells and is essential for CD8+ T cells which play an important part in the antitumor immune. [32]. ANKRD22, an oncogene in non-small cell lung cancer, promotes tumor advance mainly by upregulating E2F1 transcription and enhancing cell proliferation [33]. In breast cancer, ANKRD22 was found to enhance the malignant degree of breast cancer cells by regulating NuSAP1 expression to activate the Wnt/β-catenin pathway [34]. GZMH is constitutively expressed in NK cells and has a significant role in the innate immune response against tumors and viruses. Studies have shown that low expression of GZMH in cytotoxic lymphocytes may predispose to HBV infection and hepatocellular carcinoma. [35]. NKG7 was upregulated on CD8+ T cells and NK cells for antitumor immunity and was required for T cell accumulation in the TME [36]. Wen et al. reported that NKG7 is an essential component of the cytotoxic function of CD8+ T cells and has been established as a T cell-intrinsic therapeutic target for enhanced cancer immunotherapy [37]. CD8A encodes the α chain of the CD8 antigen. The CD8 antigen played a crucial role in mediating cell-cell interactions within the immune system. which is a cell surface glycoprotein located on cytotoxic T lymphocytes. Combined with the current research, we believe that EMOES, GZMH, NKG7, and CD8A belong to tumor suppressor genes, which were highly expressed in the low-risk group. ANKRD22 is an oncogene with higher expression in high-risk groups.
There are profound clinical implications of Riskscore as we expected. First, Riskscore can efficiently forecast the prognosis of LIHC patients, and the low-risk patients have longer survival. Significantly, Riskscore is more precise than other clinical parameters. Second, tumors can be distinguished between cold or hot according to the Riskscore. Overall, the two-group classified by Riskscore showed a significant discrepancy in immune infiltration, immune score, and expression of immune checkpoint-related genes. In this study, patients in the low-risk group were classified as hot tumors, displaying the rich infiltration of activated CD8+ T cells and a higher expression of PD-1/PD-L1. In the low-risk group, not only for PD1, but the expression of the other immune checkpoints (CTLA4, LAG3, PRF1, TNF, CXCL9, GZMB, CD8A, GZMA) was higher than those in another group. Therefore, it is suggested that immunotherapy might be more effective in the low-risk group, whereas the high-risk group is the opposite. Third, there were more TP53 gene mutations in the high-risk group. The tumor suppressor gene TP53 mutations were the most common genetic changes in HCC, with an average of 30% [38]. In HBV-related HCC patients, the frequency of TP53 mutation can reach nearly 60% [39]. TP53 mutation is closely related to tumor differentiation, vascular invasion, serum alpha-fetoprotein (AFP) level, tumor stage, and the immune microenvironment in HCC [38]. TP53 mutation may affect patient prognosis, and HCC patients with this mutation have shorter overall survival and recurrence-free survival [38, 40]. CTNNB1 mutations were the highest mutation in the low-risk group, but lower than those in the high-risk group. Mutations in CTNNB1 can promote tumor advance by activating the Wnt/βcatenin pathway and are found in approximately 19%-26% of liver cancers [41, 42]. Intrinsic activation of the Wnt/β-catenin pathway is associated with the lack of T cell infiltration within tumors and resistance to PDL1/CTLA4 immunotherapy [42-44]. Clinical studies have shown that Wnt/CTNNB1 mutations characterize the immune-exclusion category (cold tumors), and LIHC patients with these mutations are resistant to PD1 immunotherapy [45]. Fourth, we found that Riskscore had a weak positive correlation with tumor stemness index, and there were significant differences in stemness index between high- and low-risk groups. Cancer stem cells are the crucial factors that contribute to cancer recurrence, metastasis, chemotherapy resistance, and patient prognosis in HCC [46]. Higher stemness index values correlated with active biological processes and greater tumor dedifferentiation in cancer stem cells, and metastatic tumor cells exhibit more dedifferentiation phenotypically, which may contribute to their aggressiveness. The high-risk group showed a higher stemness index, which may lead to a worse prognosis. Finally, we constructed a nomogram combining Riskscore and clinicopathological features to provide a novel approach to predicting overall survival in LIHC patients.
The present study constructed a valid, innovative, and reliable immune-related prognostic model (EOMES, ANKRD22, GZMH, NKG7, CD8A) to forecast the prognosis of LIHC patients and the sensitivity to immunotherapy. Our model is an independent risk factor for LIHC. Further, we constructed a nomogram for predicting OS in patients with LIHC.
LIHC: Liver hepatocellular carcinoma; TCGA: The Cancer Genome Atlas; GEO: Gene Expression Omnibus; ICIs: immune checkpoint inhibitors; Tregs: regulatory T cells; MDSC: myeloid-derived suppressor cells; TME: Tumor microenvironment; ssGSEA: Single-Sample Gene Set Enrichment Analysis; IC50: half-maximum inhibitory concentration; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes;
Acknowledgments
We thank the TCGA and GEO databases, and the authors who uploaded the relevant data.
Authors’ contributions
RJK designed the overall study and wrote the original draft. YC and ZFX participated in the literature review and edited the manuscript. ZKN and ZT are responsible for proofreading. All authors read and approved the final manuscript.
Availability of data and materials
The datasets for this study can be found in the Cancer Genome Atlas (TCGA) database and the Gene Expression Omnibus (GEO) database.
Declarations
Ethics approval and consent to participate
The authors have no ethical conflicts to disclose.
Consent for publication
All authors agree with the paper’s content.
Competing interests
There are no competing interests to declare.
Funding
This work is supported by the Natural Science Foundation of Guangdong Province (No.2019A1515011850 and No.2022A1515012224)