ceRNA network development and tumor-infiltrating immune cell analysis in hepatocellular carcinoma

Hepatocellular carcinoma (HCC) is among the primary causes of cancer deaths globally. Despite efforts to understand liver cancer, its high morbidity and mortality remain high. Herein, we constructed two nomograms based on competing endogenous RNA (ceRNA) networks and invading immune cells to describe the molecular mechanisms along with the clinical prognosis of HCC patients. RNA maps of tumors and normal samples were downloaded from The Cancer Genome Atlas database. HTseq counts and fragments per megapons per thousand bases were read from 421 samples, including 371 tumor samples and 50 normal samples. We established a ceRNA network based on differential gene expression in normal versus tumor subjects. CIBERSORT was employed to differentiate 22 immune cell types according to tumor transcriptomes. Kaplan–Meier along with Cox proportional hazard analyses were employed to determine the prognosis-linked factors. Nomograms were constructed based on prognostic immune cells and ceRNAs. We employed Receiver operating characteristic (ROC) and calibration curve analyses to estimate these nomogram. The difference analysis found 2028 messenger RNAs (mRNAs), 128 micro RNAs (miRNAs), and 136 long non-coding RNAs (lncRNAs) to be significantly differentially expressed in tumor samples relative to normal samples. We set up a ceRNA network containing 21 protein-coding mRNAs, 12 miRNAs, and 3 lncRNAs. In Kaplan–Meier analysis, 21 of the 36 ceRNAs were considered significant. Of the 22 cell types, resting dendritic cell levels were markedly different in tumor samples versus normal controls. Calibration and ROC curve analysis of the ceRNA network, as well as immune infiltration of tumor showed restful accuracy (3-year survival area under curve (AUC): 0.691, 5-year survival AUC: 0.700; 3-year survival AUC: 0.674, 5-year survival AUC: 0.694). Our data suggest that Tregs, CD4 T cells, mast cells, SNHG1, HMMR and hsa-miR-421 are associated with HCC based on ceRNA immune cells co-expression patterns. On the basis of ceRNA network modeling and immune cell infiltration analysis, our study offers an effective bioinformatics strategy for studying HCC molecular mechanisms and prognosis.


Introduction
Hepatocellular carcinoma (HCC) is the 6th most frequent cancer globally and is linked to rising morbidity and mortality [1]. Despite advances in HCC treatment, including chemotherapy, targeted therapy, radiotherapy, Iodine125 seed implantation, transcatheter arterial chemoembolization (TACE), radiofrequency ablation, and immunotherapy, its overall 5-year survival has increased by 1-3% only and HCC relapse rate may reach 70% in the 5 years. Remarkably, a median survival of 7.1 months has been reported in advanced liver cancer patients without treatment [2]. Liver cancer is a major clinical challenge globally. Thus, effective personalized HCC biomarker and therapeutic strategies, as well as prognostic factors are urgently needed.
The hypothesis of competing endogenous RNA (ceRNA) refers to a brand-new gene expression regulation mode. CeRNAs, including long non-coding RNA (lncRNA) and circular RNA (circRNA), could competitively combine to micro RNA (miRNA) which will interfere miRNA binding to messenger RNA (mRNA) to regulate gene expression, thereby affecting cell function [3]. CeRNAs, correlations between mRNA, miRNA, and lncRNA have been identified in many diseases [4][5][6][7]. The tumor microenvironment (TME) is mainly composed of tumor cells and stromal components mixed with tumor-infiltrating immune cells. Understanding the immune infiltration is the key to improving the response rate and developing new strategies in tumor treatment [8,9]. However, few previous studies have paid attention to ceRNA networks and tumor-infiltrating immune cells in HCC. Understanding the molecular mechanisms underlying HCC tumorigenesis is critical for early detection, diagnosis, successful treatment, and prognosis determination. Here, we established a HCC-associated ceRNA network on the basis of gene expression datasets from The Cancer Genome Atlas (TCGA). Using CIBERSORT, we evaluated HCC tumor sample infiltration by various immune cells. We then established 2 nomograms for predicting HCC prognosis on the basis of important immune cells and the ceRNA network. The association of HCC-linked ceRNA with immune cells networks was assessed to determine potential molecular mechanisms.

Data abstraction and differential gene expression analysis
HCC RNA profiles along with the matching patient clinical data were abstracted the from TCGA data resource (https:// tcga-data. nci. nih. gov/ tcga/). HTseq counts and fragment mapping per million exons per thousand bases for 421 samples (371 tumors and 50 normal samples) were assembled. edgeR was employed to uncover mRNA, lncRNA, and miRNA that were differentially expressed. False discovery rate (FDR), |log(fold change)|> 1.0 coupled with P = < 0.05 were set as cutoff thresholds for differentially expressed genes.

Construction of the ceRNA network
LncRNA-miRNA-mRNA ceRNA networks are hinged on the theory that lncRNAs can directly or indirectly serve as miRNA sponges to modulate mRNA activity. The ceRNA network was built using the package "GDCRNATools" R (http:// bioco nduct or. org/ packa ges/ devel/ bioc/ html/ GDCRN ATools. html) [10]. Cytoscape web resource (http:// www. cytos cape. org/) was employed to visualize the expression locations of the ceRNA network [11].

Survival analyses and nomograms of essential members in the ceRNA network
Kaplan-Meier (K-M) survival assessment was done to evaluate correlation between Biomarker expression and HCC survival [12]. Next, important variables in the initial Cox model were analyzed to identify those with prognostic value and important biomarkers integrated into the reduced Cox proportional hazard model. LASSO (minimal equivalent contraction and recovery of selective operators) utilizes contraction to lower the data value to a specific point to verify the validity of the constructed multifactorial model. Finally, a nomogram was created on the basis of the multivariate model to predict HCC prognosis. The nomogram may enable clinicians determine a prognostic score for each biomarker based on its expression. The sum of individual scores may indicate prognosis and overall three-and five-year survival. Calibration and recipient behavioral characteristic (ROC) curve analyses were used to assess the nomograms accuracy.

CIBERSORT estimation
CIBERSORT uses gene expression data to determine the abundance, as well as the proportion of various types of immune cells in a mixed cell population [13,14]. Here, we employed CIBERSORT to evaluate the fractions of 22 immune cell types in HCC samples. Only cases with CIB-ERSORT outputs of P = < 0.05 were eligible for subsequent analysis. Wilcoxon rank-sum was used identify important immune cells in tumor vs non-malignant samples. Next, K-M survival assessment was employed to assess the association of the proportion of specific immune cells with HCC overall survival. Upon LASSO regression analysis, distinct immune cells were integrated into a Cox proportional hazard model and a nomogram developed for HCC prognosis prediction. The concordance index of the Cox model was employed to assess the nomogram's bias and precision. Pearson's correlation explored correlation between HCC biomarkers and immune cells.

Explanation and use of the nomograms
A nomogram is a graphical representation of an equation that predicts medical outcomes. Nomograms use a pointsbased system in which patients accumulate point based on the level of risk factors. The risk factors of two nomograms we built in the study were based on the fold change of gene expression between tumor tissues and normal tissues. Thus, comparing the target genes' expression of patients' sample and a normal sample, we could get the corresponding log (fold change) and scores in the nomograms. Then, we could predict the 1-, 3-or 5-year survival rate of patients.

Criterion and expression analysis of genes with essential differences
A schematic of our analysis process is shown in Fig. 1. Genes with FDR = < 0.05 as well as a log fold change > 1.0 or < − 1.0 were regarded as differentially expressed. Differential expression of mRNAs, miRNAs, and lncRNAs in tumor vs normal samples was calculated using DESeq2 ( Fig. 2a-c). Our analysis identified 104 lncRNAs, 107 miR-NAs, and 1222 mRNAs as upregulated and 32 lncRNAs, 21 miRNAs, and 806 mRNAs as downregulated (Fig. 2d).

CeRNA network construction and survival assessment
The ceRNA network was developed on the basis of interactions between 14 pairs of lncRNA-miRNA and 42 pairs of miRNA-mRNA (Fig. 3a, Table 1). K-M survival assessment evaluated the relationship of the prognosis with biomarkers in the HCC-linked ceRNA network. These analysis revealed that hsa-miR-421 (P = < 0.001) and hyaluronan mediated motility receptor (HMMR) (P = < 0.001) signified significance . These analysis revealed that hsa-miR-421

Establishment of the prediction model on the basis of the ceRNA network
LASSO regression analysis found HMMR, ARL2, RNF24, has-miR-326, RAP2A, S100A10, and hsa-miR-421 as essential for modeling, which were then integrated into Cox regression analysis and a nomogram built for prognosis prediction. Area under curve (AUC) values for threeand five-year survival were 0.691 and 0.700, respectively, indicating good efficiency and accuracy. Additionally, the identification of the nomogram can be seen through the calibration curves (Fig. 4a-g). Immune cells related to the HCC, Histograms and heat maps which illustrated the composition of immune cells in HCC were evaluated by the CIBERSORT algorithm (Fig. 5a, b). The results of Wilcoxon rank-sum test showed that there were relatively high levels of T CD4 naive cells (P = 0.004), T gamma delta cells (P = 0.005), T regulatory (Tregs) cells (P = 0.016), mast resting cells (P = 0.005) and dendritic activated cells (P = 0.031) were relatively in liver cancer (Fig. 5c).

Development of the prediction model on the basis of the immune cells
We integrated five of 22 immune cells with remarkable prognostic potential in the Cox regression into the final multivariable model with satisfactory estimation power (concordance index = 0.660) and used to establish the nomogram. LASSO regression analysis revealed that all 6 genes were important for modeling. ROC and calibration curve analysis revealed acceptable accuracy (three-year survival AUC: 0.674; 5-year survival AUC: 0.694), indicating good concordance ( Fig. 6a-g).

Discussion
Mounting evidence shows that genetic alterations in signaling pathways drive HCC tumorigenesis, highlighting the importance of molecular biomarkers in liver cancer detection. Although the mechanisms underlying HCC tumorigenesis are unclear, cellular and molecular changes are influential factors. Here, we first evaluated differentially expressed ceRNA and immune cells infiltration in normal vs tumor HCC tissues. We then developed nomograms for predicting HCC prognosis. The excellent concordance index and AUC value in the 2 nomograms may guide HCC survival prediction. K-M survival along with correlation analysis indicated that the ceRNA modulatory network of SNHG1 (lncRNA), HMMR (mRNA), hsa-miR-421 (miRNA), and Tregs may influence HCC progression. It is estimated < 2% of the human genome is proteincoding, suggesting that most human transcripts are noncoding [22]. mRNAs, miRNAs, and lncRNAs are linked in an intricate crosstalk network by competing endogenous RNA [23]. Interaction between miRNA, lncRNA, and mRNA, in ceRNA networks is extensively studied in various disorders, including lung, gastric, and gallbladder cancers [24,25]. However, the mechanisms of ceRNA networks in HCC oncogenesis are largely unclear. Here, we find that in a ceRNA network, SNHG1 (a lncRNA), may downregulate hsa-miR-421 and upregulate HMMR to promote HCC tumorigenesis. Hsa-miR-421 is reported to play a core role in malignant transformation, which is consistent with our findings [26,27]. HMMR is an important component of polo-like kinase 1 (PLK1)-dependent mitotic spindle localization cascade, which is essential for neurodevelopment, neonatal survival, and tumorigenesis [28]. HMMR is upregulated in many cancers, such as lung cancer, glioblastoma, prostatic adenocarcinoma, and leukemia [29][30][31][32]. HMMR has been documented to enhance EMT (epithelial-mesenchymal transition) and carcinogenesis by activating TGF-β/Smad2 signaling in gastric cancer [33]. Assessment of 1420 colorectal cancer tissues indicated that HMMR may be a better prognostic factor relative to tumor grade as well as vascular infiltration [34]. How HMMR modulates cell cycle-linked gene expression has not been studied. HMMR antisense lncRNA HMMR-AS1 stabilizes HMMR mRNA and enhances the progress of lung adenocarcinoma, epithelial ovarian cancer, as well as glioblastoma [35][36][37]. HMMR negatively correlates with poor HCC prognosis, highlighting it as a potential indicator of HCC prognosis [38].
We also uncovered different proportions of immune cells in liver cancer. Naïve CD4 T cells, resting mast cells, gamma delta T cells, Tregs, and activated dendritic cells have been shown to be associated with HCC. A nomogram based on 5 immune cell types, built for overall survival prediction (consistency index = 0.66) may have high clinical value.
Tregs, which are CD25 + , account for 5-10% of CD4 + T cells and are one of the most widely studied immune cell types due to their inhibitory effects on tumor pathogenesis. Tregs repress the activation as well as the proliferation of CD4 + and CD8 + T cells in vitro and in vivo using various mechanisms consisting of cell-cell contact along with secretion of immunosuppressive cytokines like TGF-β, IL-10, and IL-35 [39]. Tregs are believed to be the most important determinant of hepatitis B prognosis [40]. Infection with hepatitis B virus (HBV) is linked to > 60% of liver cancer cases in developing countries and < 25% in developed countries. Our data show that Tregs are upregulated in HCC. It is suggested that Tregs depletion may slow HCC progression [41].
Levels of tumor-infiltrating dendritic cells (DCs) correlate with clinical outcomes. CD14 + CTLA4 + regulatory DCs have the same inhibitory effect as Tregs in HCC. These DCs inhibit T-cell response and upregulate PD-1 via IL-10 and CTLA4 [42].
High CD4 + T-cell levels also hinder liver cancer development. Anti-programmed death 1 antibody on the surface of CD4 + cells can demonstrate the clinical outcomes of HCC patients.
In the innate immune cells, total mast cells, monocytes, DCs, as well as neutrophils are considerably changed. Mast cells have key immune regulation functions, and the mechanism of cell inactivation in HCC remains unclear. IgE, a mast cell activator has been reported in HBV-linked HCC [43]. Our data show a higher proportion of active mast cells in healthy livers and a higher proportion of inactive subtypes in advanced HCC. COX evaluation also established that the proportion of resting mast cells remarkably correlated with prognosis. Mast cells constitute the fastest immune cell responders that are particularly enriched in barrier sites. Their rapid release is mediated by factors like vasoactive amines, proteases, cytokines, and proteoglycans, from intracellular storage upon FCER1-bound IgE crosslinking [44]. However, mast cell function in abnormal livers is still unclear and warrants further investigation.
This study has several limitations. Firstly, multivariable survival assessment included only basic prognostic factors from TCGA and could not suggest other potential clinical factors like metastatic lesions status and patients' performance status. Secondly, preponderant evidence suggests that HCC is a very heterogeneous tumor at genetic and molecular levels. Due to study limitations, gene expression data from Gene Expression Omnibus (GEO), Sun Yat-sen Memorial Hospital (SYMH), and International Cancer Genome Consortium (ICGC) cohorts was determined from one piece of HCC tissue from a single patient. Future studies should evaluate gene expression in several HCC specimens from a patient via single-cell whole-genome sequencing or Quantitative Real-time PCR (RT-qPCR) analysis to determine if the line map is a reliable and viable predictor of HCC overall survival markers. In addition, cooperation with other hospitals should be considered to recruit more patients and samples for gene model validation.  , c) Lasso regression. HMMR, RNF24, RAP2A, S100A10, ARL2, has-miR-326 and hsa-miR-421 are incorporated into the Cox proportional hazards model. e Nomogram for predicting patients' outcome based on RNAs (HMMR, RNF24, RAP2A, S100A10, ARL2, has-miR-326 and hsa-miR-421) in (a). d ROC curves and f calibration curves for assessing the discrimination and accuracy of the nomogram. Besides, AUCs of the 3-and 5-year survival were 0.691 and 0.700, respectively. AUC, area under curve; ROC, receiver operating characteristic