Novel Prognostic Markers for Liver Hepatocellular Carcinoma and Their Roles in Immune Inltration of the Tumor Microenvironment

Background: Liver hepatocellular carcinoma (LIHC) is one of the most common malignant cancers worldwide, the overall prognosis of LIHC remains unsatisfactory. Valuable prognostic biomarkers are still urgently needed for LIHC. This study aimed to explore hub genes associated with the prognosis of LIHC and tumor microenvironmental immune inltration, providing potential prognostic biomarker and therapeutic target for LIHC. Methods: RNA-seq counts data for LIHC samples were obtained from TCGA database. RNA-seq counts data for normal liver samples were obtained from GTEx database. Weighted gene co-expression network analysis (WGCNA) was used to cluster differentially expressed genes with similar expression proles to form modules and signicant modules and key genes were screened. Next, these genes was veried by cox analyses and overall survival analysis. Further, CIBERSORT was used to explore the relationship between these genes and tumor inltrating immune cells. Results: A total number of 2661 signicant DEGs were included for consensus WGCNA analysis, which identied 6 modules. Blue module (r=0.85, p<0.0001) showed high relationship with LIHC, which included 400 genes. After the overall survival analyses of hub genes, CDC20, CDCA5, CDCA8, KIF2C and KIFC1 were identied as ve potential marker genes, which would result in an unfavorable prognosis in LIHC. Further CIBERSORT analysis showed these novel biomarkers expression levels in LIHC were positively correlated with activated memory CD4+ T cells, follicular helper T cells, regulatory T cells and macrophages M0. While, resting memory CD4+ T cells, monocytes, macrophages M2, resting mast cells showed a negative correlation with the 5 novel biomarkers expression levels. Conclusions: The study screened 5 genes with marked prognostic capability for LIHC and found these genes were correlated with the inltration of immune cells in LIHC tumor microenvironment. The ndings might provide a more detailed molecular mechanism underlying LIHC occurrence and progression, holding promise for acting as potential biomarkers and therapeutic targets.


Background
Liver hepatocellular carcinoma (LIHC) is one of the most common malignant cancers and is currently the third leading cause of cancer-related deaths worldwide [1]. The incidence of LIHC is increasing in regions that have conventionally been low incidence areas, such as North America and some European countries [2]. Although great advancements have been made in LIHC treatment such as liver transplantation and surgical resection [3], the overall prognosis of LIHC remains unsatisfactory, which is mainly attributed to its rapid progression, high recurrence rate and short overall survival (OS) time [4]. As a result, it is still important to identify new diagnostic and prognostic markers.
Normally, stroma maintains the tissue homeostasis and acts as a barrier toward tumor formation; however, when a cell starts to be cancerous, it is surrounding matrix changes in a way to support cancer development [5,6]. This modi ed stroma around the malignant cells is termed tumor microenvironment (TEM) [7]. There are several components in the typical tumor microenvironment, including immune and in ammatory cells, stromal broblasts, endothelial cells, blood vessels, mesenchymal stem cells, neural cells, adipose cells, chemokines, cytokines and extracellular matrices [8]. There is a lot of evidence showing that LIHC initiation and progression bene ciate from its associated tumor territory [9][10][11][12].
Recent years, the eld of cancer immunotherapy is moving fast because of the encouraging clinical results obtained of cytotoxic T lymphocyte-associated protein 4 (CTLA-4) and programmed cell death protein 1(PD-1) inhibitors for the treatment of cancer [13][14][15]. LIHC is an in ammation-driven disease with potentially chronic liver in ammation and cirrhosis, and just has fewer chromosomal aberrations, suggesting a combination of immunological interventions may be more effective with conventional treatment of this disease [16]. In the tumor microenvironment, non-malignant cells can help tumor cells to proliferate, invade and metastasize [9]. Multiple immune cells coexist and interact in a complex series of pathways that ultimately lead to tumor carcinogenesis [10]. The immunosuppressive features of TEM not only play as one of the major roles inducing cancer progression but also a big challenge for effective immunotherapy. Therefore, it is critically important to get an in-depth understanding of its speci c functions and mechanisms.
In recent years, with the development of high-throughput research technology and genome-wide microarray technology, the application of bioinformatics tools in biological and medical studies, including research regarding cancer early diagnosis, cancer grading and prognosis prediction, has increased remarkably [17]. Large numbers of changed genes can be detected by these technologies, and can be further analyzed to identify critical signi cant genes as novel biomarkers [18]. Weighted gene coexpression network analysis (WGCNA) is an important bioinformatics tool that can construct gene coexpression networks to explore the correlations among different gene clusters or the relationships between gene clusters and clinical features and detect hub genes that may serve as diagnostic, prognostic or therapeutic targets [19]. CIBERSORT algorithm, an online tool, can be used to analyze largescale RNA mixtures and identify cell types in the bulk RNA-seq matrix [20]. Meanwhile, there are plenty of public free databases, such as TCGA and the GEO, from which abundant information can be excavated via various bioinformatics methods. These bioinformatic approaches and databases make it possible for us to achieve a system-level insight into the complex biological processes that underlie cancer. Although alpha-fetoprotein (AFP) and AFP mRNA have been used as potential prognosis biomarkers for LIHC [21], their applications have certain limitations due to they rely on signi cant tumor burden, and the evaluation of their value has been incomplete [22]. Therefore, there is an unmet need to screen and identify new prognostic markers for LIHC.
In the current study, we analyzed 226 normal liver samples from GTEx database and 371LIHC samples from TCGA database to get differentially expressed genes (DEGs). And 2661 DEGs, including 1092 upregulated genes and 1569 down-regulated genes, were selected. These DEGs were analyzed by gene ontology (GO) analysis, Kyoto encyclopedia of genes and genomes (KEGG) analysis and gene set enrichment analysis (GSEA). Then, we constructed a gene co-expression network based on WGCNA and identi ed 6 modules associated with clinical features of LIHC. Blue module showed high relationship with LIHC; the top 20 genes of the blue module were hub genes. The overall survival analyses of hub genes revealed that ve genes (CDC20, CDCA5, CDCA8, KIF2C, KIFC1) were associated with worse prognosis in LIHC prognosis. The unfavorabled prognostic values of these 5 genes were also identi ed by cox analyses. Importantly, we also observed that the novel biomarkers were signi cantly upregulated in LIHC, thus providing highly reliable analytic results. In addition, the expression levels of these genes were found had a strong correlation with in ltrating immune cells, suggesting that these hub genes may through regulating the levels of immune in ltration in LIHC TEM to promote the occurrence and development of LIHC.

Microarray Data
RNA-seq counts data for LIHC samples were obtained from TCGA database. RNA-seq counts data for normal liver samples were obtained from GTEx database.

DEGs Analysis
After using Limma package to normalize the gene expression matrix, |log2Fold change|≥2 and p ≤ 0.01 were used for screening DEGs between LIHC samples and normal samples. The heatmap of all samples was generated by the pheatmap package, and the volcano map was produced by the ggplot2 package. GO analysis, a regular method in the annotation of large-scale functional enrichment studies, is normally classi ed into MF, BP, and CC categories [23]. The KEGG is a database resource for understanding highlevel functions and utilities of the biological system [24]. GSEA is a computational method that functions to identify classes of genes that are overrepresented in a large set of genes that may have a connection with disease phenotypes [25]. ClusterPro ler package was used to analyze GO, KEGG and GSEA.
Weighted Gene Co-Expression Network Analysis (WGCNA) Weight co-expression network was performed in accordance to the protocol of WGCNA package in the R language. A gene coexpression similarity measure was rst calculated and used to relate every pairwise gene-gene relationship. The adjacency matrix and topological overlap matrix (TOM) were then constructed utilizing a 'soft' power adjacency function [26]. "Gene modules" are groups of genes that have high topological overlap. Modules were generated using algorithms and blockwiseModules function of WGCNA. Then, the expression level of each module was calculated, which is also called module eigengene (ME). Finally, the most related genes with LIHC were identi ed in a module.

Identi cation and Validation of Biomarkers
A key module, which including 20 hub genes was identi ed after weighted gene co-expression network analysis. The hub genes were validated using Kaplan-Meier Plotter database, and ve novel biomarkers were nally screened. Then, translational levels between LIHC and normal specimens and survival analysis of these genes were validated by the ggstatsplot package. Also, univariate and multivariate cox analyses were conducted to explore the contributions of novel biomarkers.

The Tumor Microenvironment Analyses
CIBERSORT is a versatile computational method for quantifying cell fractions from bulk tissue gene expression pro les (GEPs), can accurately estimate the immune composition of a tumor biopsy [27].
Using CIBERSORT to identify the in ltrating immune cell populations of the bulk RNA-seq samples from TCGA database. Kaplan-Meier curves were used to show the relationship between LIHC patients' overall survival and TEM immune in ltration. Log-rank test was used to test the relationship.

Different Gene Expression from Data between TCGA and GTEx Was Analyzed
After data processing, a total number of 2661 signi cant DEGs, including 1092 up-regulated and 1569 down-regulated genes, were identi ed between normal liver samples and LIHC samples. The expression of DEGs shown in Fig. 1A by heatmap. Figure 1B showed the distribution of all the signi cantly different expressed genes on the two dimensions of -log10 (false discovery rate, FDR) and log2 (fold change, FC) through a volcano map. To test the biological function of the identi ed genes, information from differentially expressed genes was applied to GO and KEGG analysis. The GO results showed that leukocyte migration, organelle ssion, nuclear division were signi cantly enriched in biological process (BP). For cellular component (CC), DEGs were particularly enriched in extracellular matrix, collagencontaining extracellular matrix and chromosome region, and so on. Besides, receptor regulator activity, receptor ligand activity and so on were signi cantly enriched in molecular function (MF) (Fig. 1C). KEGG analysis demonstrated that DEGs were particularly enriched in neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, cell cycle, and so on ( Novel Biomarkers Relevant to LIHC Were Identi ed via WGCNA Next, the gene co-expression networks were conducted on 2661 DEGs of all samples. As shown in Fig. 2A, softpower 6 was chosen to construct nets, and there were 6 modules were identi ed. Then, genes in the 6 color modules were continuously used to analyze the module-trait (LIHC and normal) co-expression similarity and adjacency. Blue module (r = 0.85, p < 0.0001) showed high relationship with LIHC, which included 400 genes (Fig. 2B). The results of the cluster (Fig. 2C) also revealed that blue module was closely relevant to LIHC. As shown in Fig. 2D

Validation of Novel Biomarkers
After the overall survival analyses of hub genes, we identi ed ve potential marker genes, which would result in an unfavorable prognosis in LIHC (Fig. 3A). The expression levels of CDC20, CDCA5, CDCA8, KIF2C and KIFC1 were all signi cantly upregulated in LIHC (Fig. 3B), indicating a potential role of these genes in LIHC occurrence and development. Next, a univariate cox proportional hazard regression analysis was conducted to further verify the association of the expression levels of these genes mentioned above with overall survival. As shown in Table 1, univariate cox analysis showed the high expression of CDC20, CDCA5, CDCA8, KIF2C and KIFC1 would make a poor prognosis. Further multivariate cox proportional hazard regression analysis revealed that high CDCA8 expression was related to inferior prognosis, which indicated that CDCA8 could serve as an independent risk factor to predict the prognosis of LIHC. These results above further validated the potential of these genes to be taken as new biomarkers for prognosis. And it is notably that macrophages M0, macrophages M1, macrophages M2, resting memory CD4 + T cell, CD8 + T cells and regulatory T cells made up the majority of these in ltrating immune cells (Fig. 4). Next, Pearson's correlation analyses were conducted to identify the relationship between novel biomarkers and in ltrating immune cells. As showed in Fig. 5, the novel biomarkers expression levels in LIHC were positively correlated with activated memory CD4 + T cells, follicular helper T cells, regulatory T cells and macrophages M0. While, resting memory CD4 + T cells, monocytes, macrophages M2, resting mast cells showed a negative correlation with the 5 novel biomarkers expression levels. Moreover, Kaplan-Meier survival analyses revealed that high level in ltration of follicular helper T cells and macrophages M0 was related with poor prognosis, but less activated mast cells got a worse prognosis (Fig. 6A, B, D), which were consistent with their relationship with the 5 genes above ( Figure S1-S4). Although in ltrating resting mast cells also showed positive relationship with patient prognosis, the P value was not statistically signi cant (Fig. 6C). Together, these results indicated that the novel biomarkers may involve in regulating the immune in ltration of LIHC TEM, further promoting the development of LIHC.

Discussion
Among all cancers, Liver hepatocellular carcinoma is the fth most frequently diagnosed cancer, ranking as the third leading cause of cancer-related death, seriously impacting human health [28]. Currently, the main LIHC treatment strategies include surgical resection, microwave ablation, radiofrequency ablation and transcatheter arterial chemoembolization (TACE) [29,30]. Although surgical resection is believed to have a de nitive curative effect for LIHC, the clinical outcome is still poor due to frequent recurrence and metastasis [31][32][33]. Additionally, most LIHC cases are detected in advanced stages with the invasion of major blood vessels, obvious extrahepatic metastases or poor liver function, making them un t for surgical resection [34]. Better biomarkers for predictive and prognostic molecules for LIHC are still required.
In this study, bioinformatics and comprehensive analyses of multiple datasets were used to screen genes that proved to be novel prognosis factors for LIHC. Firstly, we compared the translational pro les between normal and cancerous samples to obtain DEGs. GO and KEGG enrichment analysis was performed to interpret the functions and pathways of DEGs. The enriched BPs included eukocyte migration, organelle ssion, nuclear division, chromosome segregation, humoral immune response, and so on. These processes had strong relationship with LIHC progression. The processes of GO CCs and GO MFs are also typically representative features of LIHC progression. The enriched KEGG pathways included neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, cell cycle, and so on, which are also associated with LIHC. Then, co-expression network analysis by WGCNA identi ed that blue module was signi cantly associated with LIHC traits. The genes in these modules were taken intersection with DEGs between pathology stages I and IV. Finally, CDC20, CDCA5, CDCA8, KIF2C and KIFC1 were screened out. Using GEPIA based on the TCGA database, we proved the expression levels of CDC20, CDCA5, CDCA8, KIF2C and KIFC1 were signi cantly up-regulated in LIHC. And the survival analysis showed that high expression of these genes was associated with poor prognosis. Moreover, univariate cox analysis showed that the high expression of CDC20, CDCA5, CDCA8, KIF2C and KIFC1 would make a poor prognosis.
Further multivariate cox proportional hazard regression analysis revealed that high CDCA8 expression was related to inferior prognosis, which indicated that CDCA8 could serve as an independent risk factor to predict the prognosis of LIHC.
Strikingly, the 5 genes have been reported correlated with clinical outcomes of a huge number of solid tumors. Cell division cycle 20 homologue (CDC20), also called Fizzy, speci cally activating the anaphasepromoting complex-cyclosome, promoting ubiquitination and proteolysis of cell-cycle-regulatory proteins, which is essential for anaphase-promoting complex activity, initiation of anaphase, and cyclin proteolysis during mitosis [35]. In recent years, mounting evidence has revealed that CDC20 plays an oncogenic role in human tumorigenesis. Overexpression of CDC20 was observed in a variety of human tumors, including pancreatic cancer [36], breast cancer [37], prostate cancer [38], lung cancer [39], colorectal cancer [40] and liver hepatocellular carcinoma [41]. Li et al. reported that overexpression of CDC20 was observed in 68% liver hepatocellular carcinoma tissues compared to adjacent non-tumor liver tissues. Moreover, high levels of CDC20 were positively correlated with gender, tumor differentiation, and TNM stage [41]. In our study, we found the overexpression of CDC20 may result in an unfavorable prognosis and promote TEM immune in ltration in LIHC patients. Cell division cycle associated 5 (CDCA5), is an important element for the interaction between cohesin and chromatin in interphase. As reported, the expression of CDCA5 was upregulated in LIHC tissues compared to paracancerous tissues and had a negative correlation with patient survival. Further study showed that CDCA5 promotes oncogenesis by enhancing cell proliferation and inhibiting apoptosis via the AKT pathway in liver hepatocellular carcinoma, plays an important role in LIHC progression [42]. These results were consistent with our ndings. The cell division cycle associated 8 (CDCA8) plays an important role in mitosis [43]. CDCA8 is a putative oncogene that is up-regulated in many types of cancer tissues, such as lung cancer [44] and gastric cancer [45]. While, little is known about its role in liver hepatocellular carcinoma. Our study revealed that CDCA8 was signi cantly upregulated in LIHC and linked to poor prognosis. Besides, CDCA8 plays a vital role in regulating immune cell of TEM and could serve as an independent risk factor to predict the prognosis of LIHC. Kinesin family member 2C (KIF2C), the kinesin-like protein functioning as a microtubule-dependent molecular motor, participates in spindle assembly and microtubule disaggregation, thereby determinately regulating cell cycle during mitosis [46][47][48][49]. A recent study rstly demonstrated KIF2C was substantially higher expression in tumor tissues than adjacent nontumor tissues, upregulated PCNA and CDC20 expression and was signi cantly involved in growth promoting pathways, fully con rming our results that it could serve as a prognostic indicator and confer a novel target for clinical treatment. Kinesin family member C1 (KIFC1), also known as HSET, is a minus end-directed motor protein [50], which is a critical role in centrosome clustering in cancer cells [51]. Fu x et al. investigated the expression of KIFC1 in paired liver hepatocellular carcinoma tissues and adjacent non-cancerous tissues and found that the expression of KIF2C was upregulated and connected with a poor prognosis of LIHC [52]. Our analyses further revealed that the overexpression of KIFC1 was a novel predictive marker in patients with LIHC and involved the regulation of immune cell in LIHC TEM.
The association of in ammation and cancer was rstly hypothesized by Rudolf Virchow observations in 1863 as chronic irritation theory. Virchow found that certain cancers are associated with in ammatory macrophages [53]. In recent years, more and more studies have proved the important role in ammatory cells played in the development of cancer. Neutrophils, monocytes, lymphocytes, dendritic cells, eosinophils and mast cells are the commonly observed cells in tumor stroma although their count depends on cancer type [53,54]. In this study, we illustrated the percentage distribution of immune cells in LIHC TEM, nding macrophages M0, macrophages M1, macrophages M2, resting memory CD4 + T cell and CD8 + T cells accounted for most of the immune cells. Remarkably, increasing number of studies showed that macrophages facilitate cell proliferation, angiogenesis, metastasis and invasion, which were important in the initiation, development and metastasis of primary hepatocarcinoma [55].In the early stage of tumorigenesis, M1 macrophages eliminate the tumor cells as soldiers of adaptive immunity. However, in advanced stages, M1 macrophages replaced with M2-type. M2 macrophages suppress the adaptive immune system and promote the cancer proliferation, angiogenesis and extracellular matrix remodeling as well. Eventually, tumor cells escape from the immune barriers and invade [56,57]. Additionally, CD4 + T cells and CD8 + T cells were con rmed to participate in immune recognition and escape of LIHC [58]. Actually, tumour in ltrating lymphocytes (TILs) form a large component in solid tumours, in an attempt by the host to mediate an antitumour reaction [59]. However, this cellular response can be dysfunctional with a higher proportion of CD4+ (helper or T regulatory cells) to CD8 + cells. This promotes immune tolerance and has been shown to confer a worse prognosis [60]. E.A. Said et al found that the binding of PD-L1 (expressed on other cells) to its receptor PD-1 on macrophages promotes IL-10 release and thereby CD4 + T cell repression [61]. Effector CD8 + cells within LIHC tumors show intense PD-1 expression and the number of PD-1 + CD8 + cells was found to be related to disease progression and post-operative recurrence [62]. Since the existing treatment strategies for LIHC have great limitations, immunotherapy is a promising therapeutic option. Our results revealed the 5 novel biomarkers expression levels had a crucial association with parts of immune cells, which suggested these genes may involve in the level regulation of the immune cells of TEM in LIHC, may help to promote the application of immunotherapy in liver cancer.

Conclusion
In conclusion, this study aimed to identify hub genes involved in LIHC by WGCNA of data from the TCGA database and we screened 5 genes with marked prognostic capability for LIHC. These genes are highly differentially expressed in tumor and non-tumor tissues. Further bioinformatic analysis showed that these genes can act as biomarkers of LIHC prognosis. Besides, these genes were correlated with the in ltration of immune cells in LIHC TEM. The ndings might provide a more detailed molecular mechanism underlying LIHC occurrence and progression, holding promise for acting as potential biomarkers and therapeutic targets. However, further molecular experiments on these genes in LIHC, in vivo and in vitro, need to be carried out. This study was an analysis of the third-party anonymized databases with preexisting IRB approval.

Consent for publication
Not applicable.

Availability of data and materials
The datasets supporting the conclusions of this article were available in the TCGA (https://portal.gdc.cancer.gov), GTEx database (https://gtexportal.org/home/datasets).

Competing interests
The authors declare that they have no competing interests.     The heatmap of pearson's correlation. Pearson's correlation between in ltrating immune cells and the novel biomarkers. Pearson's r values were presented in the heatmap.