Systematic Analysis of the Function and Prognostic Value of RNA-Binding Proteins in Hepatocellular Carcinoma

Background: RNA-binding proteins (RBPs) are abnormally expressed in a variety of malignant tumors and are closely related to tumorigenesis, tumor progression, and prognosis. The role of RBPs in hepatocellular carcinoma (HCC) is unclear. Using The Cancer Genome Atlas (TCGA) database, we conducted a systematic bioinformatics analysis of abnormally expressed RBPs in HCC, with the aim of identifying prognostic markers and potential therapeutic targets.
Methods: HCC RNA sequencing data downloaded from the TCGA database were used to determine the differentially expressed RBPs in liver cancer and normal tissues, followed by functional enrichment analysis and visualization of interaction relationships. Univariate and multivariate Cox regression analyses were subsequently used to identify RBPs that were significantly related to prognosis and to construct a prognostic model. The predictive performance of the prognostic model was evaluated by survival analysis and receiver operating characteristic (ROC) curve analysis and verified in the test cohort. The Human Protein Atlas online database was used to verify the expression levels of the RBPs in the prognostic model.
Results: In total, 82 differentially expressed RBPs were identified, including 55 upregulated and 27 downregulated RBPs. Functional enrichment and interaction analyses showed that the differentially expressed RBPs were mainly related to regulation of the mRNA metabolic process, the RNA catabolic process, the mRNA catabolic process, and macromolecule methylation. Five RBP genes, LIN28B, SMG5, PPARGC1A, LARP1B, and ANG, were identified as prognosis-related genes and used to construct the prognostic model. The predictive ability of the prognostic model was verified in the test cohort. ROC curve analysis showed that the prognostic model had good sensitivity and specificity. Independent prognostic analysis showed that the risk score may be an independent prognostic factor for HCC.
Conclusion: This study constructed a reliable prognostic prediction model by analyzing the differentially expressed RBPs of HCC, facilitating the identification of HCC prognostic biomarkers and therapeutic targets.


Introduction
Hepatocellular carcinoma (HCC) is one of the most common malignant tumors worldwide. A global malignant tumor report in 2018 showed that there were approximately 841,000 new cases of HCC, ranking 6th among new malignant tumors, and approximately 782,000 deaths. HCC is the 4th most common cause of cancer-related death, accounting for 46.6% of new cases worldwide and 47.1% of global deaths [1]. The American Cancer Society estimates that there will be 42,810 new cases of and 30,160 deaths from HCC in 2020 [2]; therefore, active prevention and treatment are of great significance. Because HCC has an insidious onset and progresses rapidly, only 30% of patients benefit from surgical resection and liver transplantation. Most HCC patients have already lost the opportunity for surgery by the time they are diagnosed and are mainly treated by trans-arterial chemoembolization (TACE), radiofrequency ablation, cryoablation, molecular targeted therapies, and other adjuvant treatments. However, these therapies have not significantly improved the 5-year survival rate of patients with HCC [3]. Hence, actively searching for molecular markers and therapeutic targets that predict the prognosis of patients with primary HCC is important for improving outcomes.
RNA-binding proteins (RBPs) play important roles in post-transcriptional modification. In the human genome, more than 1,500 RBP genes have been identified through whole-genome sequencing. RBPs are a class of proteins that interact with a variety of RNAs, such as ribosomal RNAs, non-coding RNAs (ncRNAs), small nuclear RNAs (snRNAs), microRNAs (miRNAs), messenger RNAs (mRNAs), and transfer RNAs (tRNAs) [4]. In recent years, studies have shown that RBPs function abnormally in tumor tissues owing to genomic changes and to altered transcriptional regulation and post-transcriptional modification [5][6][7], which affects the translation of mRNA into protein and contributes to tumorigenesis and tumor progression [8][9]. For example, RBPs affect the activation of tumor growth-related signaling pathways and the expression of growth-related target genes, leading to tumor proliferation [10][11]. In addition, RBPs promote tumor metastasis by regulating the epithelial-mesenchymal transition and ncRNA [12][13]. Many studies have shown that the expression of RBPs in cancer tissues is significantly different from that in tumor-adjacent normal tissues and is closely related to the prognosis of cancer patients [14][15][16]. Thus, further in-depth studies of RBPs will not only help to reveal the pathogenesis of tumors but will also be significant for identifying treatment targets and predicting prognosis. Cancer survival models based on the expression of RBPs have been established to assess prognosis and identify treatment targets [17][18]. However, no relevant research on HCC is available.
The goal of this study was to identify independent prognostic markers to better guide the clinical treatment of HCC. To this end, we downloaded RNA sequencing and clinical data related to HCC from The Cancer Genome Atlas (TCGA) database, identified RBPs differentially expressed between tumor and normal tissues through bioinformatics analysis, systematically explored their potential functions and molecular mechanisms, and constructed a prognostic prediction model to evaluate the prognosis of HCC patients.

Data collection and analysis of differential expression
RNA-seq FPKM data and clinical data of 374 cases of HCC and 50 cases of non-tumor samples were downloaded from the TCGA database (https://portal.gdc.cancer.gov; as of June 7, 2020) for analysis.
The limma package of R software was used to perform differential expression analysis. The Wilcoxon signed-rank test was used to identify RBPs that were differentially expressed between tumor tissues and normal tissues [4], with cut-off values of false discovery rate (FDR) < 0.05 and |log2 FC| > 1. The pheatmap package was used to prepare heat maps. Ethical approval was not required because we strictly adhered to the publishing guidelines provided by the TCGA database.
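The selection rule described here (FDR < 0.05 and |log2 FC| > 1) can be illustrated with a minimal sketch. It is not the authors' R pipeline: it assumes that per-gene raw P values and group mean expression values are already available, that the FDR is obtained with the Benjamini-Hochberg procedure, and that a small pseudocount is added before taking the log ratio. The function names, the pseudocount, and the example values are hypothetical.

```python
from math import log2


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_differential(genes, mean_tumor, mean_normal, p_values,
                        fdr_cutoff=0.05, lfc_cutoff=1.0, pseudocount=1e-3):
    """Keep genes with FDR < fdr_cutoff and |log2 FC| > lfc_cutoff.

    The pseudocount is an assumption of this sketch; it only avoids
    division by zero for genes with no expression in one group.
    """
    fdr = benjamini_hochberg(p_values)
    selected = []
    for gene, tumor, normal, q in zip(genes, mean_tumor, mean_normal, fdr):
        lfc = log2((tumor + pseudocount) / (normal + pseudocount))
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff:
            direction = "up" if lfc > 0 else "down"
            selected.append((gene, direction, lfc))
    return selected


if __name__ == "__main__":
    # Hypothetical genes and values, for illustration only.
    hits = select_differential(
        genes=["GENE_A", "GENE_B", "GENE_C"],
        mean_tumor=[8.0, 1.0, 2.0],
        mean_normal=[2.0, 4.0, 1.9],
        p_values=[0.001, 0.002, 0.5],
    )
    for gene, direction, lfc in hits:
        print(gene, direction, round(lfc, 2))
```

Both thresholds are applied together, so a gene with a large fold change but a non-significant adjusted P value is excluded, and vice versa.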

Gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses
To fully analyze the biological functions of these differentially expressed RBPs, the clusterProfiler package in R was used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. GO has three categories: biological process (BP), cellular component (CC), and molecular function (MF). P < 0.05 and FDR < 0.05 were used as the thresholds for statistical significance in this study.
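The idea behind this kind of over-representation analysis can be sketched with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of term members in a gene list of a given size. This is a simplified illustration under that assumption, not a reproduction of the package's internals; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation P value, P(X >= k).

    N: number of background genes
    K: background genes annotated to the term
    n: genes in the query list (e.g. differentially expressed RBPs)
    k: query genes annotated to the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probability mass of k, k + 1, ..., min(n, K) annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 82 query genes, 10 of which fall in a term that
    # has 40 members among 1,343 background genes.
    print(hypergeom_enrichment_p(k=10, n=82, K=40, N=1343))
```

In practice the resulting P values for all tested terms would then be adjusted for multiple testing before the FDR threshold is applied.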

Protein-protein interaction network construction and module selection
To analyze the interactions among the differentially expressed RBPs, the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (https://string-db.org/) was used to establish the interactions between the differentially expressed proteins, and Cytoscape (version 3.6.1) was used for network visualization. The key modules of the PPI network were identified with the Molecular Complex Detection (MCODE) plug-in, requiring both the MCODE score and the node count to be > 4.

Prognostic model construction and verification
Univariate Cox regression analysis was performed on RBPs in the protein-protein interaction (PPI) network to identify RBPs related to patient survival. Samples with complete clinical information were then included as the entire cohort and divided into training and test cohorts. Multivariate Cox regression was performed on the survival-related RBPs in the training cohort to construct a prognostic model and to calculate a risk score for evaluating patient prognosis. The risk score of each sample was calculated as follows:

$$\text{Risk score} = \sum_{i=1}^{n} \beta_i \times \mathrm{Exp}_i$$

where β_i represents the regression coefficient of gene i, Exp_i represents the expression value of gene i, and n is the number of RBPs in the model.
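As a minimal sketch of how such a score is computed, assuming the usual linear form in which each gene's regression coefficient β is multiplied by its expression value Exp and the products are summed, the calculation for one sample looks like this. The coefficients and expression values below are hypothetical placeholders, not the fitted values from this study.

```python
def risk_score(coefficients, expression):
    """Sum of beta_i * Exp_i over the genes in the model.

    coefficients: mapping gene -> regression coefficient (beta)
    expression:   mapping gene -> expression value (Exp) for one sample
    """
    missing = [gene for gene in coefficients if gene not in expression]
    if missing:
        raise KeyError(f"expression missing for: {missing}")
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


if __name__ == "__main__":
    # Placeholder coefficients for the five model genes (NOT the study's values).
    betas = {"LIN28B": 0.30, "SMG5": 0.05, "PPARGC1A": -0.04,
             "LARP1B": 0.20, "ANG": -0.002}
    # Placeholder expression values for one hypothetical sample.
    sample = {"LIN28B": 0.5, "SMG5": 12.0, "PPARGC1A": 6.0,
              "LARP1B": 1.5, "ANG": 150.0}
    print(risk_score(betas, sample))
```

Genes with a positive coefficient raise the score as their expression increases, and genes with a negative coefficient lower it.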
To evaluate and verify the predictive power of the prognostic model, we divided the patients in the training and test cohorts into low-risk and high-risk groups according to the median risk score of the training cohort and performed Kaplan-Meier survival analysis to compare the overall survival rates of the two groups. Receiver operating characteristic (ROC) curves were used to evaluate the reliability of the prognostic model. A model with an area under the ROC curve (AUC) > 0.6 was considered acceptable. Univariate and multivariate Cox regression analyses including the risk score and other clinical variables (e.g., age, gender, pathological grade, tumor stage) were used to determine whether the risk score could be used as an independent prognostic factor. In addition, a nomogram based on the RBPs in the prognostic model was prepared.
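The grouping and the AUC criterion can be illustrated with a simplified sketch. It assumes that the cutoff is the median risk score of the training cohort, that samples strictly above the cutoff are labeled high-risk, and that the outcome is a binary event indicator at a fixed time point with censoring ignored, which is a simplification of the time-dependent ROC analysis used for survival data. All names and values are hypothetical.

```python
from statistics import median


def split_by_cutoff(scores, cutoff):
    """Label each score 'high' if it exceeds the cutoff, otherwise 'low'."""
    return ["high" if score > cutoff else "low" for score in scores]


def auc(scores, events):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen sample with the event has
    a higher score than a randomly chosen sample without it (ties count 0.5).
    """
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical risk scores; the cutoff comes from the training cohort only.
    train_scores = [0.2, 0.8, 1.5, 2.0]
    test_scores = [0.1, 1.2, 3.0]
    cutoff = median(train_scores)
    print(split_by_cutoff(test_scores, cutoff))
    # Hypothetical event indicators (1 = death by the chosen time point).
    value = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
    print(value, "acceptable" if value > 0.6 else "not acceptable")
```

Reusing the training-cohort cutoff for the test cohort keeps the test data out of the threshold choice, so the test result is an independent check of the model.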

Genetic alteration analysis and verification of expression levels
cBioPortal (https://www.cbioportal.org/) was used to analyze the genetic alterations of the RBPs in the risk model. The Human Protein Atlas (HPA) online database (http://www.proteinatlas.org/) was used to examine the expression levels of the RBPs in the prognostic model.

Identification of differentially expressed RBPs in HCC patients
A flowchart of our study design is shown in Fig. 1. Clinical information and gene expression data of 374 HCC samples and 50 non-tumor tissue samples were obtained from the TCGA database. A total of 1,343 RBPs were collected in this study. Differential expression analysis with the limma package of R software identified 82 differentially expressed RBPs (FDR < 0.05 and |log2 FC| > 1), including 55 upregulated and 27 downregulated RBPs. The expression distribution of these differentially expressed RBPs is shown in Fig. 2.

Functional enrichment analysis of the differentially expressed RBPs
To analyze the biological functions and related signaling pathways of the differentially expressed RBPs, GO and KEGG pathway enrichment analyses were performed. In the BP category, the upregulated differentially expressed RBPs were significantly enriched in regulation of the mRNA metabolic process, RNA and mRNA catabolic processes, and macromolecule methylation. The downregulated differentially expressed RBPs were significantly enriched in the RNA catabolic process, regulation of translation, nucleic acid phosphodiester bond hydrolysis, and regulation of the cellular amide metabolic process. In the MF category, the upregulated differentially expressed RBPs were mainly and significantly enriched in catalytic activity, acting on RNA; mRNA 3'-UTR binding; and RNA helicase activity. The downregulated differentially expressed RBPs were mainly and significantly enriched in ribonuclease activity; nuclease activity; and catalytic activity, acting on RNA. In the CC category, the upregulated differentially expressed RBPs were mainly enriched in cytoplasmic ribonucleoprotein granule, ribonucleoprotein granule, and cytoplasmic stress granule, and the downregulated differentially expressed RBPs were mainly enriched in mRNA cap binding complex, RNA cap binding complex, CCR4-NOT complex, and apical dendrite (Fig. 3A, B).
In addition, KEGG pathway enrichment analysis showed that the upregulated differentially expressed RBPs were significantly enriched in the mRNA surveillance pathway, miRNAs in cancer, RNA transport, and other signaling pathways. The downregulated differentially expressed RBPs were significantly enriched in pathways such as hepatitis C and influenza A (Table 1).

PPI network construction and module selection
To further study the interactions between these differentially expressed RBPs, the STRING database was used to construct a PPI network, which included 64 nodes and 126 edges and was visualized with Cytoscape (Fig. 4A). Screening by the median of two topological features, degree and betweenness, yielded six pivotal genes: GSPT2, DDX39A, ELAVL2, IGF2BP1, BOP1, and OASL. The PPI network was processed with the MCODE plug-in, and two key modules were found (Fig. 4B, C, D). Module 1 contained 11 nodes and 27 edges, and module 2 contained 5 nodes and 9 edges. Functional enrichment analysis showed that module 1 was mainly enriched in the defense response to virus, the nuclear-transcribed mRNA catabolic process, and the RNA catabolic process (Table 2); module 2 was mainly enriched in DNA alkylation, DNA methylation or demethylation, and DNA modification (Table 3).
If a category contained more than five terms, the first five terms were selected on the basis of P value.

Construction and verification of the prognostic model
Multivariate Cox regression analysis of the prognosis-related RBPs in the training cohort was performed to obtain five RBPs for the construction of the prognostic model: LIN28B, SMG5, PPARGC1A, LARP1B, and ANG (Fig. 6).
According to the median risk score, the 186 patients in the training cohort were divided into low-risk and high-risk groups for survival analysis. Patients in the high-risk group had poorer survival than patients in the low-risk group (Fig. 7A). ROC analysis showed that the model predicted the prognosis of HCC patients reasonably well: the AUC values of the ROC curves for the 1-, 3-, and 5-year survival rates were 0.718, 0.731, and 0.657, respectively (Fig. 7B). The risk score curve, survival status distribution, and heat map of RBP gene expression of the patients in the high- and low-risk groups are shown in Fig. 7C. To further verify the accuracy of the prognostic model, survival and ROC analyses were performed in the test cohort of 184 patients. The results showed a significant difference between the survival rates of the high- and low-risk groups. The AUC values of the ROC curves for the 1-, 3-, and 5-year survival rates were 0.732, 0.642, and 0.627, respectively, suggesting that the prognostic model had good sensitivity and specificity (Fig. 8).
In addition, univariate and multivariate Cox regression analyses were used to evaluate the prognostic values of the risk score and other clinical features of the prognostic model in the training cohort and test cohort. Univariate Cox regression analysis suggested that the tumor stage and risk score were related to the prognosis of HCC patients. Multivariate Cox regression analysis suggested that the risk score could be used as an independent risk factor (Fig. 9).

Construction of a nomogram based on RBPs in the model
To better predict the survival of HCC patients quantitatively, the gene expression of the RBPs in the prognostic model was used to construct a nomogram (Fig. 10). The expression of each RBP in the prognostic model was assigned a point value. The total points of each patient were calculated by adding all points and were subsequently compared with each prognostic axis to obtain the survival rates of the patient at 1, 2, and 3 years.

Genetic alteration analysis and verification of expression levels
cBioPortal was used to analyze genetic alterations of the genes significantly related to HCC prognosis: LIN28B, SMG5, PPARGC1A, LARP1B, and ANG. Alterations in the five genes occurred in 72 (20%) of the 366 HCC samples. The alterations were mainly mRNA upregulation (Fig. 11A, B). To verify the differences in the expression of these five genes, immunohistochemical results for the five corresponding genes in HCC were obtained from the HPA database. The results indicated that the expression of LARP1B was positive in HCC tissues and negative in normal tissues. LIN28B expression was negative in both HCC and normal tissues (Fig. 12).

Discussion
RBPs interact with other proteins or RNA to form ribonucleoprotein complexes and regulate RNA processing, translation, export, and localization, thereby maintaining the stability of the intracellular environment. Abnormal expression levels and changes in activity can lead to various diseases, including tumors [19][20][21]. Studies have shown that post-transcriptional regulation by RBPs is involved in tumorigenesis and tumor development. However, their roles in tumors have not been completely revealed [22]. Thus, further research on the differential expression and interaction of RBPs in various tumors may reveal new mechanisms of tumor progression and new targets for anti-tumor therapy. HCC is one of the most common malignant tumors of the digestive tract and has a poor prognosis. Hence, the development of effective early screening and prognostic markers is important for early treatment, intervention, and prognostic judgement.
This study collected the gene expression and clinical information data of 374 HCC samples and 50 non-tumor tissue samples from the TCGA database and identified 82 differentially expressed RBPs, including 55 upregulated and 27 downregulated RBPs. We then systematically analyzed the related biological pathways and constructed a PPI network of these RBPs. In addition, univariate and multivariate Cox regression analyses of the RBPs were performed to construct a risk model for predicting the prognosis of HCC based on five RBP genes, which was then verified in a test cohort.
The enrichment analyses showed that the differentially expressed RBPs were enriched in numerous BP, CC, and MF terms. These differentially expressed RBPs were significantly enriched in the mRNA surveillance pathway, miRNAs in cancer, RNA transport, and hepatitis C- and influenza A-related signaling pathways. RBPs interact with miRNAs, mRNAs, long ncRNAs, and cRNAs to form ribonucleoprotein complexes, thereby improving the stability of target genes, promoting gene expression, and playing a key role in the tumorigenesis and progression of many tumors [23][24][25][26][27]. For example, the abnormal expression of RBP-eIF3c promotes the proliferation of HCC, which is positively correlated with the KRAS, vascular endothelial growth factor, and Hedgehog signaling pathways [28]. A previous study showed that the effects of RBPs on biological functions in HCC are mainly focused on RNA splicing, translation, transcription termination, RNA localization and transport, RNA surveillance and degradation, RNA modification, ribosome, and tRNA, among others [29]. These findings are consistent with the results of this study.
In addition, this study constructed a PPI network of the differentially expressed RBPs that included 64 key RBPs, from which six hub proteins were identified: GSPT2, DDX39A, ELAVL2, IGF2BP1, BOP1, and OASL. GSPT2 is highly expressed in HCC and promotes the progression of HCC by affecting the cell cycle [30]. DDX39A is upregulated in HCC tissue and cells. High DDX39A expression is positively correlated with advanced clinical stage, and DDX39A activates the Wnt/β-catenin signaling pathway through β-catenin to promote HCC growth, invasion, and metastasis [31]. ELAVL2 activates endogenous proto-oncogenes, causing the progression of a variety of tumors [32], and is involved in tumor resistance to chemotherapy. High ELAVL2 expression may be an independent risk factor for poor chemotherapy response in patients with esophageal squamous cell carcinoma [33]. IGF2BP1 regulates the expression of some important mRNA targets required for tumor cell proliferation, growth, invasion, and chemotherapy resistance and is related to the overall survival rate and metastatic rate of various human cancers [34]. BOP1 regulates the epithelial-mesenchymal transition, leading to the invasion and migration of HCC cells [35], and plays an important role in the metastasis of colorectal cancer through a mechanism related to regulation of the Wnt/β-catenin signaling pathway [36]. In addition to the core RBPs, a variety of other RBPs play an important role in tumorigenesis and tumor progression. For example, TERT is an important catalytic subunit of telomerase. Its transcription/activity is upregulated in 80-90% of malignant tumors and is closely related to cell proliferation, tumor invasion, and transformation [37]. TRIM71 promotes the proliferation of non-small cell lung cancer (NSCLC) cells by inhibiting the kappa B/nuclear factor kappa B pathway. Upregulated expression of TRIM71 is related to tumor size, lymph node metastasis, tumor-node-metastasis staging, and poor prognosis of NSCLC [38]. NR0B1 protein is detected in more than 50% of human lung adenocarcinoma tissues and is highly expressed in poorly differentiated cancer tissues in males [39]. It promotes HCC proliferation, invasion, and metastasis by regulating the Wnt/β-catenin signaling pathway [40].
By analyzing the key modules in the PPI network, this study showed that these modules were mainly related to the defense response to virus, the nuclear-transcribed mRNA catabolic process, and DNA modification. Subsequently, univariate Cox regression analysis was used to obtain 22 prognosis-related RBPs. Multivariate Cox regression analysis of the TCGA training cohort was then performed to construct a five-RBP (i.e., LIN28B, SMG5, PPARGC1A, LARP1B, and ANG) prognostic risk model. LIN28B is highly expressed in liver cells, is related to the level of alpha-fetoprotein, and promotes the tumorigenesis and progression of HCC. The overall survival of HCC patients with high LIN28B expression is significantly shortened, which is related to the multidrug resistance of HCC [41]. SMG5 is highly expressed in malignant tumors, especially prostate cancer, and is a potential molecular marker for the early diagnosis of cancers [42]. A survey of HCC in the Han population in eastern China showed that PPARGC1A is associated with the risk of HCC [43]. LARP1B is a member of the La ribonucleoprotein 1, translational regulator (LARP1) gene family. A study showed that high LARP1 protein levels in HCC tissue increased the risk of death by approximately 35% (compared with low protein levels) and were related to tumor size, survival time, and the Child-Pugh score [44]. However, the existing research mostly focuses on LARP1A, and only a few studies on LARP1B are available. One report has shown that ANG-2 is highly expressed in HCC tissues. Related survival analysis has also shown that ANG-2 is a useful tumor marker for differentiating patients with HCC from those with chronic liver diseases or healthy subjects and for predicting the prognosis of HCC [45].
Further verification of the reliability and stability of the model showed that our model accurately predicted the prognosis of HCC patients. ROC curve analysis also indicated that the prognostic model had good diagnostic capability for identifying HCC patients with poor prognosis. Subsequently, survival analysis and ROC curve analysis in the TCGA test cohort supported this conclusion, and our model independently predicted the prognosis of patients with HCC. A nomogram based on this model was constructed to help us more intuitively predict the 1-, 3-, and 5-year prognosis of HCC patients. The HPA database was used to verify the expression of the five RBPs by immunohistochemistry. The results showed that the expression of LARP1B was positive in tumor tissues and negative in normal tissues. Consistent with the results of this study, LIN28B expression was negative in both normal and tumor tissues. No relevant data on SMG5, PPARGC1A, and ANG were available, which may be related to the limitations of the data included in the database. In brief, our prognostic prediction model was relatively reliable and could be used to identify HCC patients with poor prognosis, which is conducive to early intervention and treatment.
This study systematically explored the expression and prognostic value of differentially expressed RBPs in HCC through a series of bioinformatics analyses. This study also constructed a prognostic model based on five RBP-encoding genes that predicted the prognosis of patients with HCC and could help physicians make clinical decisions. However, this study still has some limitations. First, the prognostic model constructed in this study was based only on the TCGA database. It needs to be further verified in a large cohort of patients. Second, in vitro and in vivo experiments are needed to further reveal the mechanisms by which the selected RBPs act in HCC.

Conclusion
This study conducted a systematic analysis of the key role and prognostic value of RBPs in HCC. Through univariate and multivariate Cox regression analyses of the RBPs differentially expressed between tumor tissues and normal tissues, a prognostic model was constructed by identifying five RBPs that were significantly related to prognosis. Survival analysis and independent prognostic analyses showed that the prognostic model of this study was an independent predictor of the prognosis of HCC. This study helps to further reveal the pathogenesis of HCC and may guide the identification of prognostic molecular markers and the development of new therapeutic targets for HCC in clinical practice.

Figure 1 Whole

Figure 3 GO

Figure 7 Risk

Figure 11 Five

Table 1
KEGG pathway enrichment analysis of aberrantly expressed RBPs.

Table 2
Functional enrichment analysis of critical module 1 associated with HCC.

Table 3
Functional enrichment analysis of critical module 2 associated with HCC.