Identification of DEGs related to immune
From the TCGA database, we obtained the expression profiles of 417 HCC samples, which contained 367 tumor samples and 50 nontumor ones after data preprocessing. Totally 7194 genes were identified as DEGs with the threshold of P < 0.05 and |log2FC| > 1.5, containing 3657 genes upregulated and 3537 genes downregulated (Fig. 2A, Tables S3). The samples could be well clustered for normal and tumor groups when the top 200 DEGs were selected for unsupervised hierarchical clustering (Fig. 2B). To obtained the immune-related DEGs for HCC samples, we selected the genes both be immune-related genes and difference expressed for HCC. We total collected 2542 human immune-related genes composed by the 1811 immune-related genes from Immport and 1052 innate immune-related genes from InnatedDB. Finally, there were 866 genes within all the DEGs when intersecting with immune genes (Fig. 1C, Tables S4), and These immune-related DEGs were then chosen for further analysis.
The further GO enrichment analysis of these 866 immune-related DEGs showed that 1008 go terms including 891 biological processes (BPs), 39 molecule functions (MFs), and 78 cellular components (CCs) were significantly enriched in (Tables S5). As shown in Fig. 3A, the top 10 go terms were showed by classification. Leukocyte migration, positive regulation of cytokine production, cell chemotaxis, leukocyte cell to cell adhesion, and T lymphocyte activation are the top enriched biological process; the enriched cellular components including collagen − containing extracellular matrix, external side of plasma membrane, collagen trimer, secretory granule membrane, extracellular matrix component; the enriched molecule functions contains structural constituent of extracellular matrix, carbohydrate binding, receptor ligand activity, cytokine activity, glycosaminoglycan binding. More details about the top GO enrichment analysis results were showed by overview their genes in Fig. 3B, C, D. These circle plots significantly could help us to understand the function of DEGs in these enriched terms. And most of the time, it is more meaningful to further represent genes with various functions. Furthermore, analysis of KEGG pathway enrichment (Tables S6) indicated that significantly enriched pathways include interaction between cytokine and cytokine receptor, Viral protein interaction with cytokine and cytokine receptor, Cell adhesion molecules (CAMs), Chemokine signaling pathway, Malaria (Fig. 3E). The top 10 pathways were showed and interact with their assigned genes in Fig. 3F. The KEGG pathways analysis results were different from GO enrichment results, indicating that the immune microenvironment of HCC was sophisticated.
Weighted Co-expression Network Construction And Key Modules Identification
The 866 immune-related DEGs and 367 HCC tumor samples were used in constructing gene co-expression network. After checked the missing values, we detected the outlier samples by hierarchical clustering (Figure S1), and the dendrogram shows 5 outlier samples that will be removed from the analysis. In this study, β = 4 (scale-free R2 = 0.85) was selected to be the soft-thresholding parameter to ensure a scale-free network. As shown in Fig. 4A, we performed network topology analysis for thresholding powers from 1 to 20, 4 was the lowest power for the scale-free topology fit index on 0.85. We obtained gene clustering tree by using hierarchical clustering of TOM-based dissimilarity and identified 6 modules (Fig. 4B, Table 1). To select the clinically significant modules, we used WGCNA to calculate the correlations between the external clinical information and gene modules. As shown in Fig. 4C, the Green module was found to be the highest one associated with overall survival, and the green and brown eigengenes are highly related (Figure S2).
Table 1
The number of genes contained by modules
Module | The number of genes |
Blue | 302 |
Brown | 147 |
Green | 139 |
Red | 86 |
Turquoise | 389 |
Yellow | 139 |
Grey | 254 |
We visualized the green module as networks by Cytoscape and selected the top 100 gene pairs by sorting the weight of gene pairs. As shown in Fig. 4D, it indicates that some genes, such as PDLIM7, EHHADH, DMGDH, and CYP8B1 with larger size have higher node degree. Finally, we screened 144 immune-hub genes with high gene significance with OS and OS time (> 0.1), and high relationship with interesting green module (> 0.5) (Table S7).
Statistical gene set enrichment analysis about the 144 immune-hub genes was performed to find over-representation of functions from biological pathways like KEGG, Reactome, and complexes in CORUM, etc. This is done with the hypergeometric test followed by correction for multiple testing. The results showed that immune-hub genes were enriched significantly in 52 pathways and complexes (p < 0.05) (Fig. 4E; Tables S8), including KEGG pathways such as TNF signaling pathway, Metabolic pathways, Reactome pathways such as Phenylalanine and tyrosine metabolism, and CORUM complexes such as PLAUR − PLAU complex, IGF2R − PLAUR − PLAU complex, MAK − ACTR − AR complex, IGF2R − PLG − PLAU − PLAUR − LTGFbeta1 complex. These findings show that these hub genes not only affect the metabolic, apoptosis, and cell survival as well as inflammation and immunity of HCC, but also plays a pivotal role in the protein complexes of immune cells.
Establishment of the lasso cox-based prognostic gene signature
We subsequently revealed that 108 of the 144 immune-hub genes were significantly related to OS through univariate Cox regression analysis (results in Tables S9). Then, we performed lasso-penalized Cox analysis to further narrow the hub genes (Fig. 5AB). Seven genes were identified and utilized thereafter in constructing an ImmuneRiskScore model to evaluate the prognostic value of CRC patients. The seven genes identified and their cox coefficient were showed in Table 2 including Secreted Phosphoprotein 2(SPP2), Glucose-6-Phosphatase Catalytic Subunit(G6PC), Phospholipase B Domain Containing 1(PLBD1), ETS Variant Transcription Factor 4(ETV4), Phosphofructokinase Platelet(PFKP), G Protein Subunit Alpha Z(GNAZ), Asparaginase And Isoaspartyl Peptidase 1(ASRGL1). The GO enrichment analysis shows that these seven genes were enrichment in several molecular functions such as asparaginase activity, beta-aspartyl-peptidase activity, 6-phosphofructokinase activity, glucose-6-phosphatase activity, and sugar-terminal-phosphatase activity. The formula for the ImmuneRiskScore model was described in the part of Materials and Methods.
Next, we divided HCC patients into high- and low-score groups, based on the lasso cox hub genes and ImmuneRiskScore, according to the optimal cut-off got from survminer package. The results indicated that five genes (PLBD1, ETV4, PFKP, GNAZ, ASRGL1) are risk factors, and that high-score samples had a worse OS than those with low score and the SPP2 and G6PC are protective factors (Fig. 5C). The last figure in Fig. 5C showed the prognostic accuracy of ImmuneRiskScore (95% CI for HR: 0.48 (0.33–0.68), log-rank test p < 0.0001). Additionally, the result of multivariate Cox regression analyses showed that the predictive value of the ImmuneRiskScore for HCC patients is not associated with common clinical variables (Tables S10).
Table 2
The result of Lasso regression
Genes | Coef. |
SPP2 | -0.27112 |
G6PC | -0.12328 |
PLBD1 | 0.66909 |
ETV4 | 0.359013 |
PFKP | 0.402211 |
GNAZ | 0.126409 |
ASRGL1 | 0.795796 |
Validation Of The Immuneriskscore In Tcga Crc Cohort
To further investigate the prognostic value of ImmuneRiskScore, we conducted a validation analysis in another GEO cohort (GSE14520). The dataset was categorized into two groups based on ImmuneRiskScore; the results of analysis indicate that there is a significant prognostic value between ImmuneRiskScore and OS as well as recurrence, the high-score group had a worse OS rate than the low-score one, not merely in the TCGA cohort (Fig. 6AB). Figure 6C showed the prognostic accuracy of ImmuneRiskScore in the GEO dataset, which was considered as a continuous variable during investigation. The area under the ROC curves (AUC) of the prognostic model for OS was 0.608 at 1 year, 0.614 at 3 years, 0.620 at 5 years. These results indicated that ImmuneRiskScore is a good model to predict survival.
Stromal and Immune cell infiltration landscapes between high and low ImmuneRiskScore
By using ESTIMATE algorithm, we estimated the infiltrating cells and tumor purity of tumor tissue. Stromal score represented the presence of stromal cells in tumor tissue, and immune score indicated the infiltration of immune cells in tumor tissue, and combination of them stood for a measurement of tumor purity (Tables S11). We then distinguished differences in stromal and immune scores between high- and low-risk patients with LUSC. As shown in Fig. 7(A, B), the immune and stromal score in high-level patients were both significantly (Wilcox p < 0.05) higher than low-level HCC patients indicate the overall presence of immune and stromal levels are associated with Immune Risk Score.
We further estimated the ratio of 22 immune cell categories in HCC patients using the CIBERSORT method, the results are shown in Tables S12. The proportion distribution of immune cells is heterogeneity not only between the high and low-levels samples of ImmuneRiskScore but among the HCC samples (Figure S3). We made a comparison of the relative proportions of 22 immune cells between low and high ImmuneRiskScore in HCC patients, and found significant differences in B cells memory, Plasma cells, T cells CD4 memory activated, T cells regulatory Tregs, NK cells resting, Monocytes, Macrophages M0, Dendritic cells resting, Mast cells resting, Neutrophils(Fig. 7C). Furthermore, we calculated the absolute immune infiltrated score of these 22 immune cells by combined with tumor purity. A similar comparison of the relative proportions of 22 types of immune cells between low and high ImmuneRiskScore was conducted and significant differences were found in B cells memory, Plasma cells, T cells CD4 memory activated, T cells follicular helper, T cells regulatory Tregs, NK cells resting, NK cells activated, Macrophages M0, Macrophages M2, Dendritic cells resting, Mast cells resting, Neutrophils. It can be easily obtained that several immune cells were associated with ImmuneRiskScore both in relative and absolute types including B cells memory, Plasma cells, T cells CD4 memory activated, T cells regulatory Tregs, NK cells resting, Macrophages M0, Dendritic cells resting, Mast cells resting, Neutrophils. These results also demonstrated that the variations in levels of immune cells in tumor environment might have a correlationship with the overall survival in HCC samples.
Correlations between the ImmuneRiskScore and immune biomarkers
The discovery of broad immune biomarkers of the TME could effectively predict clinical benefit to ICIs. We next introduced a few important immune biomarkers including PDL1, PD1, PDL2, CTLA4, CYT, and IFN gamma. Among these biomarkers, the immune checkpoint genes including PDL1, PD1, PDL2, CTLA4 are worth attentional for Immune checkpoint inhibitors which are revolutionizing the clinical treatment landscape. Cytolytic activity (CYT) focuses on cytotoxic T cells (CTL) and natural killer cells (NK) for their powerful ability to lyse tumor cells[35]. CYT is measured based on the geometric mean of expression of granzyme A (GZMA) and perforin (PRF1). Interferon-γ(IFN-gamma) signature is a key cytokine to activate the PD-1 signaling axis by directly upregulating the ligands PD-L1 and PD-L2 mainly in tumor cell, which was produced by T cells activated, NK and NKT cells[36, 37]. Tumor mutational burden (TMB) is defined as the number of nonsynonymous mutations detected in each megabase sequenced. TMB is proved to be associated with improved responses to checkpoint blockade, in some tumors such as melanoma[38] and non-small-cell lung cancer[39, 40]. An experimentally determined pan-fibroblast TGF-β response signature (Pan-F-TBRS) showed elevated mean expression in non-responders and decreased overall survival particularly in patients with mUC[41]. transforming growth factor β (TGF-β) signalling in fibroblasts is documented as a pleiotropic cytokine having relationship with poor prognosis in multiple tumor categories[42, 43], and it is thought to be vital in advanced cancers in promotion of immunosuppression, angiogenesis, metastasis, tumor cell epithelial to mesenchymal transition (EMT), fibroblast activation and desmoplasia[44–46].
We further explored the relationship between ImmuneRiskScore and these immune biomarkers (Tables S13, Tables S14) by person which satisfying a bivariate normal distribution (Pearson result in Tables S15). As shown in Fig. 8, ImmuneRiskScore were significantly positive correlated with both several immune inflammed biomarkers(PDL1: r = 0.31;p = 1.63e-09,95% CI: 0.21–0.40; PD-1: r = 0.35;p = 8.45e-12,95% CI: 0.25–0.43; PD-L2: r = 0.27;p = 1.02e-07,95% CI: 0.18–0.37; CTLA-4: r = 0.42;p = 6.78e-05,95% CI: 0.33–0.50; CYT: r = 0.21;p = 5.41e-17,95% CI: 0.11–0.30, IFN gamma: r = 0.42;p = 0.00037,95% CI: 0.17–0.36) and Pan-F-TBRS (r = 0.27;p = 2.51e-07,95% CI: 0.33–0.50). All of the significant p value for correlation was smaller than 0.001, suggesting that ImmuneRiskScore might be a potential biomarker for immunotherapy especially for Immune checkpoint inhibitors. In addition, there is no correlation between ImmuneRiskScore and TMB, indicating TMB is an independent factor mediating the TME in these two groups.