Identification of DEGs related to immune
From the TCGA database, we obtained the expression profiles of 417 HCC samples, which contained 367 tumor samples and 50 non-tumor ones after data preprocessing. Totally 7194 genes were identified as DEGs with the threshold of P < 0.05 and |log2FC| > 1.5, containing 3657 genes upregulated and 3537 genes downregulated (Figure2A, Tables S3). The samples could be well clustered for normal and tumor groups when the top 200 DEGs were selected for unsupervised hierarchical clustering (Figure2B). To obtain the immune-related DEGs for HCC samples, we selected the genes which are both immune-related genes and differentiately expressed for HCC. We total collected 2542 human immune-related genes composed by the 1811 immune-related genes from Immport and 1052 innate immune-related genes from InnatedDB. Finally, there were 866 genes within all the DEGs when intersecting with immune genes (Figure1C, Tables S4), and These immune-related DEGs were then chosen for further analysis.
GO term enrichment analysis results varied from GO classification and expression change of immune-related DEGs. As to biological process, the up-regulated immune-related DEGs significantly enriched in cell chemotaxis, positive regulation of cytokine production, leukocyte migration, leukocyte chemotaxis and T cell activation, and the down-regulated immune-related DEGs significantly enriched in hormone-mediated signaling pathway, steroid hormone mediated signaling pathway, cellular response to steroid hormone stimulus, response to steroid hormone, intracellular receptor signaling pathway. About cellular component, the up-regulated immune-related DEGs significantly enriched in external side of plasma membrane, MHC protein complex, cytoplasmic vesicle lumen, vesicle lumen and collagen-containing extracellular matrix and the down-regulated immune-related DEGs significantly enriched in RNA polymerase II transcription factor complex, nuclear transcription factor complex, transcription factor complex. As for molecular function, the up-regulated immune-related DEGs significantly enriched in receptor ligand activity, cytokine activity, cytokine receptor binding and the down-regulated immune-related DEGs significantly enriched in receptor ligand activity, steroid hormone receptor activity, nuclear receptor activity. More detailed GO enrichment analysis results are shown in Figure3A,B and Tables S5. These significantly enriched pathways and terms could help us a lot to further understand the role which DEGs played in HCC immune microenvironment.
Ten significantly KEGG pathway enriched pathways were shown in Figure3C and Table S6 for up-regulated genes such as Cytokine-cytokine receptor interaction, Viral protein interaction with cytokine and cytokine receptor, Natural killer cell mediated cytotoxicity, Chemokine signaling pathway and Rheumatoid arthritis. However, the down-regulated immune DEGs were also enriched in Cytokine-cytokine receptor interaction. The top five KEGG and GO enrichment result were showed in detail for up and down regulated DEGs(Figure3D).Most of genes are up-regulated and overlap in the top five KEGG pathways and GO terms. As shown in Figure3D, we found 168-regulated DEGs and 11 down-regulated DEGs enriched in Cytokine-cytokine receptor interaction. This analysis results indicating a complicated molecular mechanism existing in HCC immune microenvironment
Weighted co-expression network construction and key modules identification
The 866 immune-related DEGs and 367 HCC tumor samples were used in constructing gene co-expression network. After checking the missing values, we detected the outlier samples by hierarchical clustering (Figure S1), and the dendrogram shows 5 outlier samples that will be removed from the analysis. In this study, β = 4 (scale-free R2 = 0.85) was selected to be the soft-thresholding parameter to ensure a scale-free network. As shown in Figure4A, we performed network topology analysis for thresholding powers from 1 to 20, 4 was the lowest power for the scale-free topology fit index on 0.85. We obtained gene clustering tree by using hierarchical clustering of TOM-based dissimilarity and identified 6 modules (Figure4B, Table 1). To select the clinically significant modules, we used WGCNA to calculate the correlations between the external clinical information and gene modules. As shown in Figure4C, the Green module was found to be the highest one associated with overall survival, and the green and brown eigengenes are highly related (Figure S2).
Table 1 The number of genes contained by modules
Module
|
The number of genes
|
Blue
|
302
|
Brown
|
147
|
Green
|
139
|
Red
|
86
|
Turquoise
|
389
|
Yellow
|
139
|
Grey
|
254
|
We visualized the green module as networks by Cytoscape and selected the top 100 gene pairs by sorting the weight of gene pairs. As shown in Figure4D, it indicates that some genes, such as PDLIM7, EHHADH, DMGDH, and CYP8B1 with larger size have higher a node degree. Finally, we screened 144 immune-hub genes with high gene significance with OS and OS time (>0.1), and high relationship with interesting green module (>0.5) (Table S7).
Statistical gene set enrichment analysis about the 144 immune-hub genes was performed to find over-representation of functions from biological pathways like KEGG, Reactome, and complexes in CORUM, etc. This is done with the hypergeometric test followed by correction for multiple testing. The results showed that immune-hub genes were enriched significantly in 52 pathways and complexes (p < 0.05) (Figure4E; Tables S8), including KEGG pathways such as TNF signaling pathway, Metabolic pathways, Reactome pathways such as Phenylalanine and tyrosine metabolism, and CORUM complexes such as PLAUR−PLAU complex, IGF2R−PLAUR−PLAU complex, MAK−ACTR−AR complex, IGF2R−PLG−PLAU−PLAUR−LTGFbeta1 complex. These findings show that these hub genes not only affect the metabolism, apoptosis, and cell survival as well as inflammation and immunity of HCC, but also plays a pivotal role in the protein complexes of immune cells.
Establishment of the lasso cox-based prognostic gene signature
We subsequently revealed that 108 of the 144 immune-hub genes were significantly related to OS through univariate Cox regression analysis (results in Tables S9). Then, we performed lasso-penalized Cox analysis to further narrow the hub genes (Figure5AB). Seven genes were identified and utilized thereafter in constructing an ImmuneRiskScore model to evaluate the prognostic value of CRC patients. The seven genes identified and their cox coefficient were showed in Table 2 including Secreted Phosphoprotein 2(SPP2), Glucose-6-Phosphatase Catalytic Subunit(G6PC), Phospholipase B Domain Containing 1(PLBD1), ETS Variant Transcription Factor 4(ETV4), Phosphofructokinase Platelet(PFKP), G Protein Subunit Alpha Z(GNAZ), Asparaginase And Isoaspartyl Peptidase 1(ASRGL1). The GO enrichment analysis shows that these seven genes were enriched in several molecular functions such as asparaginase activity, beta-aspartyl-peptidase activity, 6-phosphofructokinase activity, glucose-6-phosphatase activity, and sugar-terminal-phosphatase activity. The formula for the ImmuneRiskScore model was described in the part of Materials and Methods.
Next, we divided HCC patients into high- and low-score groups, based on the lasso cox hub genes and ImmuneRiskScore, according to the optimal cut-off got from survminer package. The results indicated that five genes (PLBD1, ETV4, PFKP, GNAZ, ASRGL1) are risk factors, and that high-score samples had a worse OS than those with low score and the SPP2 and G6PC are protective factors (Figure5C). The last figure in Figure 5C showed the prognostic accuracy of ImmuneRiskScore (95% CI for HR: 0.48 (0.33-0.68), log-rank test p<0.0001). Additionally, the result of multivariate Cox regression analyses showed that the predictive value of the ImmuneRiskScore for HCC patients is not associated with common clinical variables (Tables S10).
Table 2 The result of Lasso regression
Genes
|
Coef.
|
SPP2
|
-0.27112
|
G6PC
|
-0.12328
|
PLBD1
|
0.66909
|
ETV4
|
0.359013
|
PFKP
|
0.402211
|
GNAZ
|
0.126409
|
ASRGL1
|
0.795796
|
Validation of the ImmuneRiskScore in TCGA CRC Cohort
To further investigate the prognostic value of ImmuneRiskScore, we conducted a validation analysis in another GEO cohort (GSE14520). The dataset was categorized into two groups based on ImmuneRiskScore; the results of analysis indicate that there is a significant prognostic value between ImmuneRiskScore and OS as well as recurrence, the high-score group had a worse OS rate than the low-score one, not merely in the TCGA cohort (Figure6AB). Figure 6C showed the prognostic accuracy of ImmuneRiskScore in the GEO dataset, which was considered as a continuous variable during investigation. The area under the ROC curves (AUC) of the prognostic model for OS was 0.608 at 1 year, 0.614 at 3 years, 0.620 at 5 years. These results indicated that ImmuneRiskScore is a good model to predict survival.
Stromal and Immune cell infiltration landscapes between high and low ImmuneRiskScore
By using ESTIMATE algorithm, we estimated the infiltrating cells and tumor purity of tumor tissue. Stromal score represented the presence of stromal cells in tumor tissue, and immune score indicated the infiltration of immune cells in tumor tissue, and combination of them stood for a measurement of tumor purity (Tables S11). We then distinguished differences in stromal and immune scores between high- and low-risk patients with LUSC. As shown in Figure 7 (A, B), the immune and stromal score in high-level patients were both significantly (Wilcox, p <0.05) higher than low-level HCC patients indicate the overall presence of immune and stromal levels are associated with Immune Risk Score.
We further estimated the ratio of 22 immune cell categories in HCC patients using the CIBERSORT method, the results are shown in Tables S12. The proportion distribution of immune cells is heterogeneity not only between the high and low-levels samples of ImmuneRiskScore but among the HCC samples (Figure S3). We made a comparison of the relative proportions of 22 immune cells between low and high ImmuneRiskScore in HCC patients, and found significant differences in B cells memory, Plasma cells, T cells CD4 memory activated, T cells regulatory Tregs, NK cells resting, Monocytes, Macrophages M0, Dendritic cells resting, Mast cells resting, Neutrophils(Figure 7C). Furthermore, we calculated the absolute immune infiltrated score of these 22 immune cells by combined with tumor purity. A similar comparison of the relative proportions of 22 types of immune cells between low and high ImmuneRiskScore was conducted and significant differences were found in B cells memory, Plasma cells, T cells CD4 memory activated, T cells follicular helper, T cells regulatory Tregs, NK cells resting, NK cells activated, Macrophages M0, Macrophages M2, Dendritic cells resting, Mast cells resting, Neutrophils. It can be easily obtained that several immune cells were associated with ImmuneRiskScore both in relative and absolute types including B cells memory, Plasma cells, T cells CD4 memory activated, T cells regulatory Tregs, NK cells resting, Macrophages M0, Dendritic cells resting, Mast cells resting, Neutrophils. These results also demonstrated that the variations in levels of immune cells in tumor environment might have a correlationship with the overall survival in HCC samples.
Correlations between the ImmuneRiskScore and immune biomarkers
The discovery of broad immune biomarkers of the TME could effectively predict clinical benefit to ICIs. We next introduced a few important immune biomarkers including PDL1, PD1, PDL2, CTLA4, CYT, and IFN gamma. Among these biomarkers, the immune checkpoint genes including PDL1, PD1, PDL2, CTLA4 are coordinate expression in HCC[35] and are worth attending for Immune checkpoint inhibitors which are revolutionizing the clinical treatment landscape. Cytolytic activity (CYT) focuses on cytotoxic T cells (CTL) and natural killer cells (NK) for their powerful ability to lyse tumor cells[36]. A recent study found that CYT-high hepatocellular carcinomas have stronger immunogenicity and favorable tumor microenvironments, which would result in better clinical outcomes[37]. CYT is measured based on the geometric mean of expression of granzyme A (GZMA) and perforin (PRF1). Interferon-γ(IFN-gamma) signature is a key cytokine to activate the PD-1 signaling axis by directly upregulating the ligands PD-L1 and PD-L2 mainly in tumor cell, which was produced by T cells activated, NK and NKT cells[38, 39]. The interferon gamma receptor expression can affect the mechanism of escape from host immune surveillance in hepatocellular carcinoma[40]. Tumor mutational burden (TMB) is defined as the number of nonsynonymous mutations detected in each megabase sequenced. TMB is proved to be associated with improved responses to checkpoint blockade, in HCC[41], melanoma[42] , non-small-cell lung cancer[43, 44]. Dysregulated signaling in the transforming growth factor beta (TGF-β) pathway plays a central role in inflammation, fibrogenesis, and immunomodulation in the HCC microenvironment[45].[46]. Transforming growth factor β (TGF-β) signalling in fibroblasts is documented as a pleiotropic cytokine having relationship with poor prognosis in multiple tumor categories[47, 48], and it is thought to be vital in advanced cancers in promotion of immunosuppression, angiogenesis, metastasis, tumor cell epithelial to mesenchymal transition (EMT), fibroblast activation and desmoplasia[49-51].
We further explored the relationship between ImmuneRiskScore and these immune biomarkers (Tables S13, Tables S14) by person which satisfying a bivariate normal distribution (Pearson result in Tables S15). As shown in Figure 8, ImmuneRiskScore were significantly positively correlated with both several immune inflamed biomarkers(PDL1: r=0.31;p=1.63e-09,95% CI: 0.21-0.40; PD-1: r=0.35;p=8.45e-12,95% CI: 0.25-0.43; PD-L2: r=0.27;p=1.02e-07,95% CI: 0.18-0.37; CTLA-4: r=0.42;p= 6.78e-05,95% CI: 0.33-0.50; CYT: r=0.21;p=5.41e-17,95% CI: 0.11-0.30, IFN gamma: r=0.42;p= 0.00037,95% CI: 0.17-0.36) and Pan-F-TBRS (r=0.27;p= 2.51e-07,95% CI: 0.33-0.50). All of the significant p value for correlation was smaller than 0.001, suggesting that ImmuneRiskScore is correlated with immunotherapy biomarker . In addition, there is no correlation between ImmuneRiskScore and TMB, indicating TMB is an independent factor mediating the TME in these two groups.
Tumor Immune Dysfunction and Exclusion(TIDE) is a gene expression biomarker developed for predicting the clinical response to immune checkpoint blockade[52]. We obtained the TIDE score for GSE14520 dataset (Tables S16) by the online webserver(http://tide.dfci.harvard.edu/). There is significant different from TIDE score between high and low ImmuneRiskScore (Wilcox test p= 4.588315e-05) and the AUC for ImmuneRiskScore is 0.659. The result indicated that ImmuneRiskScore might be a potential biomarker for immunotherapy especially for Immune checkpoint inhibitors.