Clinical information and expression data and were obtained from The Cancer Genome Atlas (TCGA) database. Cancer-related transcription factors (TFs) and immune genes were available in the Cistrome database and the ImmPort database, respectively. Also, immune infiltrate data was available in Tumor Immune Estimation Resource.
Detection of the DE immune genes
We applied the Wilcoxon signed-rank test to identify DE genes and DE immune genes with R software. The cut-offs were false discovery rate (FDR) < 0.05 and Log2(fold change [FC]) > 1. The DE genes and immune genes were presented in the volcano plot and heatmap using the gplots package and Pheatmap package.
Function enrichment analyses of the DE immune genes
We applied the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (v6.8) for Gene Ontology (GO) analysis and ClusterProfiler package for the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. GO terms, comprising cellular component, biological process, and molecular function, were considered significantly enriched when Bonferroni correction < 0.001 and FDR < 0.05, and same for KEGG terms when p.adjust < 0.001 and GeneRatio > 0.05.
Construction of a DE immune genes-TFs network
We detected the DE TFs according to the DE genes and cancer-related TFs. Cor.test() was used to estimate the correlation between DE immune genes and DE TFs. Correlation coefficient > 0.6 and p < 0.001 was set as a threshold to get more robust interaction pairs between DE immune genes and DE TFs. The qualified interactions were imported to Cytoscape (v3.7.2) to construct a network.
Establishment of a prognostic risk model
A risk model was developed with the training group. We performed univariate analysis to detect the qualified immune genes. Then, we conduct Lasso regression to eliminate functionally similar genes. At last, multivariate analysis was applied to find out the ultimate immune genes. The risk score was calculated according to the following formula:
N, Exp, and Coef represented gene number, expression level, and coefficient value, respectively. We divided all HCC patients into high-risk and low-risk groups due to the median of the risk score. A low-risk score shows good survival for HCC patients. Kaplan-Meier analysis was employed to compare the survival rate between the two groups. The SurvivalROC package was applied to perform receiver operating characteristic (ROC) analysis. It is reported that an area under the ROC (AUC) > 0.60 was viewed as suitable for predictions, and an AUC > 0.75 was believed to have perfect predictive value . Risk curves and the heatmap of risk genes were also utilized to assess this model.
Validation of the model
We conducted a survival analysis of the OS and performed the ROC analysis for the validation of the model in the testing group as well as the entire group.
The prognostic value of the risk model
All the clinical parameters, as well as the risk score, were analyzed with univariate and multivariate Cox analyses. Indicators were considered independent prognostic factors when P < 0.05 in both Cox analyses.
Applicability of the model
The correlation of the model (the risk score and expression level of the immune genes) with clinical parameters (age, gender, the histological grade, as well as the pathological stage), were analyzed. Patients were separated into two subgroups according to age (>=60 and <60 years old), gender (male and female), grade (G1&2 for well-differentiated and G3&4 for poor-differentiated), and stage (stage I&II and stage III&IV). Besides, we assessed the correlation between the risk score and the content of immune cells using the Pearson correlation coefficient test.