An immune-related gene signature for predicting survival and immunotherapy efficacy in hepatocellular carcinoma

Hepatocellular carcinoma (HCC) ranks the fourth in terms of cancer-related mortality globally. Herein, in this research, we attempted to develop a novel immune-related gene signature that could predict survival and efficacy of immunotherapy for HCC patients. The transcriptomic and clinical data of HCC samples were downloaded from The Cancer Genome Atlas (TCGA) and GSE14520 datasets, followed by acquiring immune-related genes from the ImmPort database. Afterwards, an immune-related gene-based prognostic index (IRGPI) was constructed using the Least Absolute Shrinkage and Selection Operator (LASSO) regression model. Kaplan–Meier survival curves as well as time-dependent receiver operating characteristic (ROC) curve were performed to evaluate its predictive capability. Besides, both univariate and multivariate analyses on overall survival for the IRGPI and multiple clinicopathologic factors were carried out, followed by the construction of a nomogram. Finally, we explored the possible correlation of IRGPI with immune cell infiltration or immunotherapy efficacy. Analysis of 365 HCC samples identified 11 differentially expressed immune-related genes, which were selected to establish the IRGPI. Notably, it can predict the survival of HCC patients more accurately than published biomarkers. Furthermore, IRGPI can predict the infiltration of immune cells in the tumor microenvironment of HCC, as well as the response of immunotherapy. Collectively, the currently established IRGPI can accurately predict survival, reflect the immune microenvironment, and predict the efficacy of immunotherapy among HCC patients.


Introduction
According to the Global Cancer Report of 2018, hepatocellular carcinoma (HCC) is among the most prevalent malignancies and ranks the fourth in terms of cancer-related mortality globally [1]. HCC accounts for nearly 90% of all primary liver cancer, which is considered as the most common type [2]. At present, the 5-year survival rate of this disease is as low as 14.1% in China [3]. Even for patients at the earliest stages, surgical resection, accepted as the optimal option, is also accompanied by a high recurrence rate [4,5], making the overall prognosis of HCC patients far from satisfaction. Consequently, it is urgently demanded to predict survival and to improve the clinical outcome of HCC patients.
In recent years, rapid progress has been made in the treatment of liver cancer. Among the advent of a wealth of cutting-edge treatments, immunotherapy has gradually become a hot spot for liver cancer [6][7][8]. Immunotherapy is characterized by stimulating specific immune responses, inhibiting and killing tumor cells, thereby attenuating the rate of tumor recurrence and metastasis. International guidelines have clearly proposed that immunotherapy could be selected as an effective treatment for patients with advanced liver cancer [9]. However, only a small percentage of the population could benefit from immunotherapy. As an indispensable component of immunotherapy, the tumor immune microenvironment (TIME) has gradually acquired accumulative attention, and the analysis of TIME will contribute to the improvement of immunotherapy responsiveness. Some researchers revealed that the TIME could be taken as a main prognostic indicator, which could also enhance the potential of precision treatments [10,11]. Therefore, it is suitable and feasible to construct an immune-related gene signature that is closely related to TIME, aiming at predicting immunotherapy efficacy.
Although a number of HCC signatures have been established based on immune-related genes [12][13][14], a more comprehensive and reliable index is urgently demanded, which can simultaneously predict survival and the efficacy of immunotherapy for HCC patients. To this end, herein, based on the cancer genomics and bioinformatics, we established an immune-related gene-based prognostic index (IRGPI), followed by the validation of its reliability through several data sets. Further, we explored the prognostic value of IRGPI, and the potential predictive role in immunotherapy efficacy. The experimental technical roadmap is summarized in Fig. 1. We expected the IRGPI could provide a foundation for cancer immunotherapy of HCC patients.

Collection of sample information
Clinical information and transcriptomic data of HCC samples were downloaded from The Cancer Genome Atlas (TCGA) data portal (https ://porta l.gdc.cance r.gov/) as well as Gene Expression Omnibus (GEO) (https ://www.ncbi.nlm. nih.gov/gds/), which were named as entire TCGA cohort (n = 365) and GSE14520 cohort (n = 221), respectively [15,16]. The entire TCGA cohort was randomly and equally categorized into TCGA training set (n = 183) and TCGA validation set (n = 182). In addition, the entire TCGA cohort and GSE14520 cohort were used as the internal testing set and external testing set, respectively. Patient demographics and clinical characteristics of the included datasets were summarized in Tab. S1. Furthermore, 1,811 unique immunerelated genes (IRGs) were obtained from the Immunology Database and Analysis Portal (ImmPort) database (https :// www.immpo rt.org/home) [17].

Differentially expressed immune-related genes (DEIRGs)
R package "limma" was utilized to identify differentially expressed genes (DEGs) between 365 HCC specimens and 50 normal specimens according to the criteria of |log 2 (Fold Change)|> 1 and false discovery rate (FDR) < 0.05 [18], followed by extraction of DEIRGs from DEGs. Thereinto, these normal specimens came from normal liver tissue samples which matched with HCC samples in the TCGA dataset. The volcano plot of DEIRGs was plotted using R package "ggplot2" [19]. Additionally, a Venn diagram was drawn by an online tool (https ://bioin forma tics.psb.ugent .be/webto ols/ Venn/) for visualization of the intersections between DEGs and IRGs.
Afterwards, functional enrichment analysis was performed to examine the biological functions of these DEIRGs, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) via the Database for Annotation, Visualization, and Integrated Discovery (DAVID) 6.8 [20]. GO terms included biological process (BP), molecular function (MF) as well as cellular component (CC) [21,22]. The enrichment of GO terms and KEGG signaling pathways was based on the criteria of FDR < 0.05, followed by visualization of the top 10 most significant GO terms as well as KEGG signaling pathways via R package "ggplot2" [19].

Signature development and reliability evaluation
The prognosis-related IRGs were identified and an IRGPI was established based on the training set, followed by validation of its predictive performance in other datasets. To be specific, during the exploration of prognosis-related IRGs, univariate Cox proportional hazard regression analysis was conducted to evaluate the correlation of DEIRGs with overall survival (OS) in the training set. With the cutoff value of P < 0.05, the prognosis-related IRGs were identified. The optimal model based on prognosis-related IRGs was subsequently identified by the Least Absolute Shrinkage and Selection Operator (LASSO) penalized Cox proportional hazards regression via R package "glmnet" [23]. The IRGs that incorporated into the model were referred to as hub IRGs, and the differential expression of these genes was validated using the Oncomine database [24]. Moreover, this model was used to construct the IRGPI to predict the prognosis of HCC patients. The risk score of each HCC patient was calculated by the following formula: risk score = [Expression level of Gene 1 × coefficient] + [Expression level of Gene 2 × coefficient] + … + [Expression level of Gene n × coefficient]. Patients were further categorized into low-and high-risk groups based on the median value of risk score.
For further validation of the predictive performance of IRGPI, the Kaplan-Meier (K-M) survival curves were applied for survival comparison between low-and highrisk groups via R package "survival" [25]. Additionally, the time-dependent receiver operating characteristic (ROC) curve analysis (including 1-, 3-, and 5-year survival) was established to reflect the sensitivity and specificity of IRGPI using R package "survivalROC" [26]. Meanwhile, the ROC curve was also used to compare the performance of our IRGPI with other published immune-related signatures and widely used biomarkers of cancer immunotherapy. Thereinto, a TP53-associated immune prognostic model established on two genes was named as "Long signature" [12], while a 10 gene-based signature that was associated with tumor microenvironments was named as "Pan signature" [13]. And a risk score prognostic model based on eight genes was named as "Zhang signature" [14].

Association between IRGPI and clinicopathologic factors
Univariate and multivariate analyses on OS for IRGPI and clinicopathologic factors were carried out in the entire TCGA cohort and GSE14250 cohort using R package "survival" [25]. Moreover, independent t tests were applied to evaluate the association of IRGPI with different clinicopathological factors.

Construction of prognostic nomogram
For providing a quantitative analysis tool to predict the survival risk of HCC patients, the nomogram was further constructed on the basis of IGRPI as well as clinical parameters. Meanwhile, calibration curves were drawn for comparison of the predictive and actual survival to evaluate the predictive performance of nomograms. The nomogram and calibration curves were plotted via R package "rms" [27].

Assessment of immune cell infiltration
Immune cell infiltration was estimated from RNA-sequencing data using CIBERSORT, which is an excellent tool for analyzing the expression matrix of immune cell subtypes based on the principle of linear support vector regression [28].

Analysis of immunotherapy efficacy
Immunophenoscore (IPS) can well predict the response of immune checkpoint inhibitors (ICIs), whose scores are based on the expression of important components of tumor immunity, including MHC molecules, immunoregulatory factors, effector cells, and suppressor cells. In addition, the calculation of IPS score is based on representative cell-type gene expression z-scores with a scale ranging from 0 to 10. The IPS of each HCC patient was derived from The Cancer Immunome Atlas (TCIA) (https ://tcia.at/home) [29], followed by an analysis of expression on several prominent checkpoints. Moreover, tumor mutation burden (TMB) can reflect the total number of mutations in tumor cells, which could be utilized for assessing the therapeutic effect of immunotherapy [30]. To explore the correlation between the IRGPI and TMB, we analyzed the available somatic mutation data in the entire TCGA cohort. The mutation data of HCC patients were downloaded and stored as MAF format in the TCGA data portal [31]. And TMB analysis was conducted by R package "maftools" [32].

Statistical analysis
Univariate and multivariate Cox regression analyses were conducted via R package "survival" [25], along with hazard ratios (HRs) and 95% confidence intervals (CIs). Moreover, the difference of various clinical factors was compared by the independent t test. A P < 0.05 indicated statistical significance.

Construction of IRGPI
The analysis of 365 HCC samples and 50 normal samples gave rise to 7667 DEGs, including 7273 up-regulated as well as 394 down-regulated genes. In addition, 329 DEIRGs were extracted from DEGs, including 267 up-regulated and 62 down-regulated genes (Fig. 2a, b). Functional enrichment analysis revealed that the most relevant signaling pathways to the DEIRGs was "cytokine-cytokine receptor interaction" (Fig. 2c). Meanwhile, the most enriched term in the aspect of biological process (BP), molecular function (MF), and cellular component (CC) was "immune response", "extracellular space", and "growth factor activity", respectively (Fig. 2d).
The expression levels of these 11 IRGs were significantly increased in a wide variety of tumor tissue than normal tissue (Fig. S2). IRGPI was, therefore

IRGPI predicts survival of HCC patients
HCC patients were categorized into low-and high-risk groups based on the median value of risk score of IRGPI (shown in Fig. 3a), and the IRGPI threshold is 0.879126. Significantly worse OS was observed in high-risk patients than low-risk patients (Fig. 3b, P < 0.05). Afterwards, the reliability of IRGPI was determined by time-dependent ROC curves (Fig. 3c). As a result, the area under curve (AUC) was 0.809, 0.717 and 0.622 in 1-year, 3-year and 5-year survival, respectively, in TCGA training set, which indicated the good potential of the constructed IRGPI in monitoring survival. These curves were also applied in TCGA validation set, and the AUC was 0.767, 0.663 and 0.721 for 1-year, 3-year and 5-year survival, respectively. Meanwhile, we found that IRGPI also had a high predictive accuracy of survival in the entire TCGA cohort and GSE14520 cohort. Moreover, ROC curves were used to compare the prediction performance of IRGPI with other signatures. As a result, IRGPI achieved consistently superior performance, whether in comparison with other published immune-related signatures or widely used biomarkers of cancer immunotherapy (Fig. 3d-f). These

IRGPI is an independent prognostic indicator
To prove the independence of IRGPI, Cox proportional hazards regression analysis was conducted in the entire TCGA cohort and GSE14520 cohort. As shown in Table 1, univariate and multivariate analyses revealed a significant correlation between IRGPI and OS (P < 0.05). Therefore, IRGPI was considered as an independent prognostic indicator in entire TCGA cohort [HR (95% CI) = 2.973 (1.966-4.496), P < 0.001]. After elimination of cases with unknown M stage (MX, n = 99, > 27%) and unknown N stage (NX, n = 113, > 31%), the sample size of entire TCGA cohort was small, thus M stage and N stage were not included in the analysis. In addition, this index was also capable of independently predicting OS in the GSE14520 cohort [HR (95% CI) = 2.090 (1.034-4.225), P = 0.040]. Taken together, the above outcomes strongly indicated that IRGPI was an independent prognostic factor.

IRGPI significantly correlates with disease progression
To explore the possible relationships between IRGPI and multiple clinicopathologic factors, correlation analysis was conducted via independent t tests. In the entire TCGA cohort, the risk score was significantly higher in patients with advanced tumor grade, advanced pathological stage, and advanced T stage (P < 0.05, Fig. 4a). In the GSE14520 cohort, higher risk score was more commonly detected in male patients, and those with larger tumor size, advanced TNM staging, and increased alpha-fetoprotein (AFP) (P < 0.05, Fig. 4b). These findings demonstrated that IRGPI was statistically correlated with multiple clinicopathological factors, and a higher risk score generally indicated poorer clinical pathological status.
Additionally, based on IRGPI and some clinicopathological factors, we constructed a prognostic nomogram, aiming at providing a quantitative analysis tool that can predict the survival risk of individual patients (Fig. 4c). More importantly, the calibration curves of the prognostic nomogram showed the good consistency between predictive and actual 1-, 3-, and 5-year survival in the entire TCGA cohort (Fig. 4d).

IRGPI predicts the infiltration of immune cells into HCC microenvironment
For further exploration of the indicative roles of IPGRI on TIME, it is necessary to investigate the types of infiltrating immune cells in HCC patients. As an excellent tool to estimate immune cell infiltration, CIBERSORT was adopted for evaluation of the relative proportion of 21 types of immune cells in all HCC specimens. As shown in Fig. 5a, the highly infiltrating immune cells include M0 macrophages, resting memory CD4 T cells, CD8 T cells and M2 macrophages. Among the 21 types of immune cells, the relative proportion of naive B cells, resting memory CD4 T cells, and monocytes had a significant negative correlation with risk score, while the relative proportion of activated memory CD4 T cells and M0 macrophages had a significant positive correlation with risk score (P < 0.05, Fig. 5a). Besides, we also analyzed the correlation between 11 IRGs contained in IRGPI and the infiltration levels of 21 types of immune cells in all HCC specimens, respectively. As shown in Fig. S3, most of these 11 IRGs are negatively related to the infiltration levels of multiple immune cells, such as monocytes and M1 macrophages, while positively related to the infiltration levels of M0 macrophages and activated mast cells. These results reveal the potential of these 11 genes to reflect tumor microenvironment.
In addition, we explored the prognostic value of 21 types of immune cells, among which the infiltrating abundance of M0 macrophages (Fig. 5b), M2 macrophages (Fig. 5c), activated memory CD4 T cells (Fig. 5d) as well as CD8 T cells (Fig. 5e) were significantly related to OS (P < 0.05). The higher infiltrating abundance of M2 macrophages was associated with poorer OS, while the higher infiltrating abundance of CD8 T cells was related to better OS. Collectively, IRGPI was statistically correlated with the infiltration level of most immune cells, implying that our IRGPI could potentially reflect the state of TIME.

IRGPI predicts responses of immunotherapy
To further explore the association of IRGPI with immunity, the correlation analysis was conducted between IRGPI and immune functions. As shown in Fig. 6a, IRGPI was positively correlated with MDSC recruiting, but negatively with infiltration of immune cells into tumors (P < 0.05). As a well-known biomarker of immunotherapy, we also analyzed the relationship between tumor mutation burden (TMB) and IRGPI, revealing the positive correlation of IRGPI with TMB (Fig. S4). Moreover, to predict the response of immune checkpoint inhibitors (ICIs), the correlation between IRGPI and immunophenoscore (IPS) in HCC patients was explored. IPS has been proved excellent in predicting the response of ICIs in several studies [29,33]. The major immune checkpoints include cytotoxic T lymphocyte antigen-4 (CTLA-4), programmed cell death protein 1 (PD-1), programmed death ligand-1 (PD-L1) as well as programmed death ligand-2 (PD-L2). Thus, the scores of IPS, IPS-CTLA4 blocker, IPS-PD1/PD-L1/PD-L2 blocker, and IPS-CTLA4 + PD1/ PD-L1/PD-L2 blocker were used for evaluating the potential application of ICIs. As shown in Fig. 6b, the IPS and IPS-CTLA4 scores were significantly elevated in the low-risk group which was categorized by the IRGPI, implying more immunogenicity on ICIs in the low-risk group. Besides, the expression of some critical immune checkpoints was investigated, showing that the expression of CTLA-4, LAG-3, PD-1, TIGIT, and TIM-3 was significantly higher in the high-risk group than low-risk group (Fig. 6c). These results suggested that the low-risk group was more likely to have an immune response and respond to immunotherapy.

Discussion
An increasing body of evidence has suggested the close correlation of TIME with tumorigenesis and cancer progression [34][35][36]. By analyzing the immune landscape of HCC Fig. 6 Analysis of immunotherapy responses. a Correlation matrix of IRGPI and anticancer immune responses. The blue dots represent a negative correlation, and the red dots represent a positive correlation. *, ** and *** represent P < 0.05, P < 0.01 and P < 0.001, respectively. b The relationship between the IRGPI and IPS. The score of IPS and IPS-CTLA4 blocker were significantly increased in the lowrisk group. c The relationship between the IRGPI and several prominent immune checkpoints. The expression of CTLA-4, LAG-3, PD-1, TIGIT, and TIM-3 was significantly elevated in the high-risk group microenvironment, some researchers pointed out that the immune contexture could be a major prognostic indicator, and should not be disregarded to enhance the potential of precision treatments [37]. At present, immunotherapy has been widely recognized to treat a variety of cancers including HCC [38][39][40]. However, not all patients can benefit from it. Therefore, it is necessary to establish an IRG signature for survival prediction of HCC patients and enriching the effective population of cancer immunotherapy.
During the past years, genomics and bioinformatics have enabled the identification of molecular signatures. For example, several signatures have been identified for prognostic prediction based on lncRNA, miRNA, and mRNA [41,42]. In this study, IRGPI was constructed by integrating the clinical information and transcriptomic data of HCC samples in the TCGA cohort and GSE14520 cohort. A total of 329 DEIRGs were identified, of which the most relevant biological process and signaling pathway was "immune response" and "cytokine-cytokine receptor interaction", respectively. This result was closely associated with immune, which was consistent with some existing literature reports [43]. Subsequently, Cox regression analysis and LASSO regression model identified 11 out of 81 prognosis-related IRGs, which were used to construct IRGPI, including NDRG1, FABP6, MAPT, HSP90AA1, CD320, CACYBP, BRD8, OSGIN1, NRAS, ISG20L2, and PSMD14. Among them, NDRG1 has been reported to be an essential molecule in controlling the metastasis and recurrence of HCC [44]. In addition, the deletion of CACYBP has also been reported to increase apoptosis of HCC cells [45], while the variants of OSGIN1 could reduce apoptosis and are associated with shorter survival [46]. Besides, knockdown and overexpression assays have demonstrated that PSMD14 could promote migration and invasion of HCC cells in vitro, and facilitate tumor growth and metastasis in vivo [47]. Although the direct association between the other seven genes and HCC has not been discovered, we think that the underlying correlations deserve further experimental validation.
In consideration of the importance of immune cell infiltration in tumors, CIBERSORT was further adopted for evaluating the relative proportion of 21 types of immune cells in every HCC specimen. Some evidence has indicated that the interplay between tumor and microenvironment plays a critical role in HCC progression and the probability of response to immunotherapies. So, we assessed the potential of IRGPI to reflect immune cell infiltration, as well as the prognostic value of different types of immune cells. Generally speaking, the high abundance of activated memory CD4 + T cells contributes to immune response and is associated with better prognosis, which also apply to CD8 T cells [48]. Besides, it has been reported that tumor-associated macrophages are mainly M2 macrophages which can promote tumor growth [49].
Consistent with this, our study suggested that the patients with higher infiltrating abundance of CD8 T cells have a better prognosis. Although some partial results seem to be not consistent, they should be acceptable from an overall perspective.
The advent of immunotherapy has shed novel light on HCC treatment, of which ICIs have become a potentially effective treatment [6]. Targeting immune checkpoint molecules such as PD-1 and CTLA-4 could reinvigorate anti-tumor immunity [50]. Recently, nivolumab and pembrolizumab, two therapeutics against PD-L1/PD1, have been recently approved for subsequent-line therapy [51]. To predict the reactivity of ICIs, the relationship between IRGPI and IPS was explored in HCC patients. The analysis indicated that the low-risk group had higher IPS and IPS-CTLA4 scores, revealing that IRGPI has the potential to determine the specific HCC patients who are immunogenic and more responsive to ICIs. The predictive value of IRGPI on the response to ICIs provides a theoretical basis for the therapeutic selection of ICIs in clinical practice. Hopefully, this predictive model could assist to accelerate the pace of individualized cancer immunotherapy.
To further enhance the accuracy of prognostic prediction, we constructed and validated a nomogram by integrating IRGPI, age, gender, tumor status, tumor grade, pathological stage and T stage. Similarly, Ying et al. [52] combined inflammatory biomarkers with risk factors to form a nomogram, which could improve the accuracy for predicting clinical outcomes in CRC patients undergoing surgical resection. More importantly, these new prognostic tools could not only improve the accuracy of prognostic prediction, but also help to predict the specific survival risk of individual patients, which is of great significance in clinical practice [53].
There are several strengths in this study. First, this signature was sufficiently validated and evaluated in multiple datasets, indicating the robustness and reliability of the signature. Second, comprehensive and in-depth researches were carried out in various aspects, including discussions on the correlation of IRGPI with the immune cells, IPS and TMB. Third, a nomogram was further established for the quantitative calculation, which is conducive to clinical promotion and application. Nevertheless, several limitations still exist in our study. Thus, more HCC patients and validations are warranted to further test this signature by prospective studies in the future.
In conclusion, we have constructed an IRG-based index that is closely related to the immune microenvironment, which can better predict survival and reflect the efficacy of immunotherapy for HCC patients. In the era of precision medicine, the IRG-based index could hopefully provide an effective tool to meet the clinical requirements of HCC treatment to a certain extent.