Identification and Validation of an RNA Binding Protein-associated Prognostic Model for Neuroblastoma

Background: Abnormal expression of RNA binding proteins (RBPs) may be related to the development and progression of cancer. However, little is known about the mechanism of RBPs in neuroblastoma (NB). Methods: We downloaded RNA expression data of NB and normal nervous tissues from the TARGET and GTEx databases and determined the differential expression of RBPs between normal and cancerous tissues. The function and prognostic value of these RBPs were then systematically studied. Results: A total of 348 differentially expressed RBPs were identified, comprising 166 up-regulated and 182 down-regulated RBPs. Two hub RBPs (CPEB3 and CTU1) were identified as prognosis-related genes and chosen to build a prognostic risk score model. Based on this model, the overall survival of patients in the high-risk subgroup was lower (P = 2.152e-04). The area under the curve (AUC) of the receiver operating characteristic (ROC) curve of the prognostic model was 0.720 in the TARGET cohort. Survival also differed significantly between the high- and low-risk subgroups in the validation data set GSE85047 (P = 0.1237e-08), with an AUC of 0.730. Conclusions: The RNA binding proteins CPEB3 and CTU1 can be used as molecular markers of NB.


Introduction
Neuroblastoma is still the main cause of tumor-related deaths in children worldwide [1]. As diagnosis and treatment methods have made great progress in the past twenty years, the average 5-year relative survival rate of neuroblastoma has reached 50% [2]. Currently, the diagnosis of NB relies mainly on histopathological examination, imaging results and tumor molecular biomarkers [3]. Neuroblastoma is difficult to detect early, which may be the most important factor affecting the mortality of patients with neuroblastoma [4]. Therefore, further study of the molecular mechanism of neuroblastoma and identification of effective molecular markers for early cancer screening are essential to improve therapeutic outcomes and the quality of life of affected children [5].
RNA binding proteins (RBPs) are pleiotropic proteins that regulate gene expression at the post-transcriptional level by interacting with target RNAs [6]. They can interact with various types of RNA, such as rRNA, ncRNA, snRNA, miRNA, mRNA, tRNA and snoRNA [7]. To date, over 1,500 different RBPs have been found in the human genome through whole-genome screening [8]. RBPs regulate cell functions by interacting with RNA and play an important role in post-transcriptional regulation of gene expression [9]. The ribonucleoprotein complexes formed by the binding of RBPs to target RNAs regulate mRNA stability at the post-transcriptional level, thereby affecting RNA processing, splicing, localization, export and translation [10,11], and in this way regulate and determine various important physiological processes of cells [12].
Existing research has found that RBPs play an important role in many human diseases and are key regulators of the development and progression of cardiovascular diseases [13], myotonic muscular dystrophy [14], neurological diseases [15] and cancer [16]. Therefore, we used high-throughput bioinformatics analysis to identify RBPs that are differentially expressed between cancerous and normal samples, and systematically investigated their expression patterns, functional effects and potential mechanisms to understand their role in tumors. This study will deepen our understanding of the molecular mechanism of NB and provide potential diagnostic or prognostic biomarkers for NB.

Data sets and preprocessing
We obtained RNA expression datasets and the corresponding clinical datasets of NB patients from the Therapeutically Applicable Research To Generate Effective Treatments project database (TARGET, https://ocg.cancer.gov/programs/target), and datasets of normal neural tissue samples from the Genotype-Tissue Expression database (GTEx, https://gtexportal.org/). Because all data come from open public platforms, this study did not require ethics approval. To determine the differentially expressed genes between NB tissue and normal samples, the limma software package was used.

Gene Ontology (GO) enrichment and KEGG pathway analysis
Gene enrichment analysis and pathway analysis were carried out with the R package "clusterProfiler" [17].

Protein-protein interaction (PPI) network building and subnet detection
Protein-protein interaction information for the significantly differentially expressed RBPs was evaluated using the STRING database (http://www.string-db.org/) [18], and the PPI network was then built and visualized with Cytoscape 3.7.0. The Molecular Complex Detection (MCODE) plug-in was applied to cluster genes in the PPI network and construct functional modules. P < 0.05 was considered statistically significant [19].

Prognostic model construction
The R package "survival" was used to carry out univariate Cox regression analysis on all differentially expressed RBPs to identify prognostic genes, and lasso regression was performed to further screen key genes. Based on these candidate genes, we then built a multivariate Cox proportional hazards regression model and evaluated patient survival through risk scores. The risk score of each sample was calculated as: Risk score = β1 × Exp1 + β2 × Exp2 + … + βi × Expi, where β is the risk coefficient of a gene and Exp is the expression value of that gene. According to the median risk score, NB patients were divided into a low-risk group and a high-risk group, and the survival difference between the two subgroups was compared through survival analysis. In addition, the prognostic ability of the model was assessed through receiver operating characteristic (ROC) curve analysis. A sample of 276 NB patients with reliable follow-up information from the GSE85047 data set was used as a validation group to evaluate the predictive power of the prognostic model. P < 0.05 was considered statistically significant.
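As a rough illustration of the ROC step, the sketch below computes an AUC from risk scores using the rank-based (Mann-Whitney) formulation. It treats the outcome as a simple binary event indicator, which is an assumption: the study may have used a time-dependent ROC method, and the function name `roc_auc` and the toy values are hypothetical.

```python
def roc_auc(scores, events):
    """AUC as the probability that a randomly chosen patient with an event
    has a higher risk score than a randomly chosen patient without one.
    Ties between the two scores count as 0.5."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and event indicators (1 = death, 0 = alive).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_events = [1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_events)
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to perfect ranking.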

Identification of differentially expressed RBPs in NB patients
In this research, we performed a systematic analysis of the key role and prognostic value of RBPs in NB. The NB data were downloaded from TARGET and contain 144 tumor samples; the normal nerve tissue data were downloaded from the GTEx database and contain 278 samples. After analyzing the 1542 currently known RBPs, 348 RBPs with significant differences (P < 0.05, |log2 FC| > 1.0) were screened out, comprising 166 up-regulated and 182 down-regulated RBPs (Fig. 1).
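The screening thresholds can be expressed as a small filter. This is a minimal sketch of the cutoff logic only, assuming a fold change computed from group means and a precomputed P value; it does not reproduce the limma analysis itself, and the gene names and values are hypothetical.

```python
import math


def classify_rbp(mean_tumor, mean_normal, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label one gene as 'up', 'down' or 'ns' (not significant)
    using the thresholds P < 0.05 and |log2 FC| > 1.0."""
    log2_fc = math.log2(mean_tumor / mean_normal)
    if p_value < p_cutoff and abs(log2_fc) > lfc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "ns"


# Hypothetical genes: (mean expression in tumor, mean in normal, P value).
toy_genes = {
    "GENE_A": (8.0, 2.0, 0.001),  # log2 FC = 2, significant
    "GENE_B": (1.0, 4.0, 0.010),  # log2 FC = -2, significant
    "GENE_C": (3.0, 2.0, 0.001),  # log2 FC below the cutoff
    "GENE_D": (8.0, 2.0, 0.200),  # large change, but P above the cutoff
}
labels = {gene: classify_rbp(*values) for gene, values in toy_genes.items()}
```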

Enrichment analysis of the differentially expressed RBPs
To study the functions and mechanisms of the selected RBPs, we used the R package clusterProfiler for enrichment analysis. The results show that, for biological process (BP), the RBPs are mainly enriched in: mRNA processing; RNA splicing; ncRNA metabolic process; RNA phosphodiester bond hydrolysis; RNA splicing, via transesterification reactions with bulged adenosine as nucleophile; mRNA splicing, via spliceosome; RNA splicing, via transesterification reactions; nucleic acid phosphodiester bond hydrolysis; and RNA catabolic process. For molecular function (MF), they are enriched in: catalytic activity, acting on RNA; ribonuclease activity; nuclease activity; mRNA 3'-UTR binding; endonuclease activity; translation regulator activity; catalytic activity, acting on a tRNA; mRNA binding; double-stranded RNA binding; endoribonuclease activity; and single-stranded RNA binding. For cellular component (CC), they are enriched in: ribonucleoprotein granule; cytoplasmic ribonucleoprotein granule; ribosome; ribosomal subunit; organellar ribosome; mitochondrial ribosome; P-body; mitochondrial matrix; P granule; and pole plasm. The KEGG pathways chiefly enriched are RNA transport, mRNA surveillance pathway, Ribosome biogenesis in eukaryotes, RNA degradation, Ribosome, Aminoacyl-tRNA biosynthesis, Spliceosome, RNA polymerase and Influenza A (Fig. 2; Tables 1 and 2).

PPI network building and subnet detection
To further study the function of the differentially expressed RBPs and their role in the development of NB, we used Cytoscape to create a PPI network containing 311 nodes and 1766 edges. The network was analyzed with MCODE to identify potential key modules (Fig. 3). The RBPs in subnet 1 were mainly enriched in: the ribosome biogenesis in eukaryotes pathway; ribosome biogenesis; rRNA processing; ncRNA processing; maturation of SSU-rRNA; ribosomal small subunit biogenesis; rRNA metabolic process; maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA); and ribosomal large subunit biogenesis.
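The enrichment results above rest on over-representation testing. A common way to compute such a P value is the hypergeometric test, and the sketch below shows that calculation for a single term. Whether the analysis here used exactly this setup (background size, annotation counts, multiple-testing correction) is an assumption, and all numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, selected, overlap):
    """One-sided P value P(X >= overlap) for over-representation.

    background: number of genes in the background set
    annotated:  background genes annotated to the term
    selected:   genes in the list being tested (e.g. differential RBPs)
    overlap:    selected genes that are annotated to the term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts for one GO term.
p_term = hypergeom_enrichment_p(background=20000, annotated=300, selected=348, overlap=40)
```

In practice each term gets its own P value, and the set of P values is then adjusted for multiple testing before terms are reported as enriched.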

Selection of prognosis-related RBPs
The differential expression analysis identified a total of 348 key RBPs. To assess the prognostic significance of these RBPs and their effect on clinical outcome and survival time, we conducted univariate Cox regression analysis and obtained 4 candidate hub RBPs related to prognosis (Fig. 4). Subsequently, through lasso regression, the prognostic risk equation for multivariate Cox regression was established (Fig. 5, Table 3).

Prognosis-related RBPs model building and analysis
Finally, CPEB3 and CTU1 were identified as the key prognostic genes by multivariate Cox regression analysis. We used these two hub genes to construct the predictive model. The risk score of every child was calculated with the following formula: Risk score = (-0.60901 × expCPEB3) + (0.851637 × expCTU1).
Then, based on the median risk score, the 144 NB patients were divided into a low-risk group and a high-risk group. Compared with patients in the low-risk group, patients in the high-risk group had significantly poorer survival (P = 2.152e-04). The area under the curve (AUC) in the TARGET cohort was 0.720 (Fig. 6A, 6B, Fig. 7A).
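The two-gene formula and the median split can be worked through on a few patients. The coefficients below are the ones reported in the text; the patient identifiers and expression values are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption.

```python
from statistics import median

# Coefficients reported for the two-gene model.
COEF = {"CPEB3": -0.60901, "CTU1": 0.851637}


def risk_score(expr):
    """Linear risk score: sum of coefficient x expression over the model genes."""
    return sum(COEF[gene] * expr[gene] for gene in COEF)


def split_by_median(scores):
    """Return the median cutoff and a high/low label for each patient.
    Scores equal to the cutoff are labeled 'low' (an assumption)."""
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return cutoff, groups


# Hypothetical expression values for four patients.
patients = {
    "P1": {"CPEB3": 2.0, "CTU1": 1.0},
    "P2": {"CPEB3": 1.0, "CTU1": 2.0},
    "P3": {"CPEB3": 3.0, "CTU1": 0.5},
    "P4": {"CPEB3": 0.5, "CTU1": 3.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
cutoff, groups = split_by_median(scores)
```

Because the CPEB3 coefficient is negative and the CTU1 coefficient is positive, higher CPEB3 expression lowers the score while higher CTU1 expression raises it.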

Validation of hub RBPs
To evaluate the prognostic value of the RBP prediction model, we used the GSE85047 patient cohort to verify the relationship between risk score and survival time. In the GSE85047 cohort, patients were also grouped based on the median risk score from the TARGET model. The survival of patients with high risk scores was significantly poorer than that of patients with low risk scores (P = 0.1237e-08), and the AUC was 0.730 (Fig. 6C, 6D, Fig. 7B).
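Survival differences between a high-risk and a low-risk group are commonly assessed with a log-rank test. The text does not name the test behind the reported P values, so the sketch below is an assumption about the method, and the follow-up times are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.
    Returns (chi-square statistic, P value on 1 degree of freedom).
    events: 1 = event observed, 0 = censored."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e == 1)     # events at t
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical follow-up times (all events observed) for two risk groups.
high_times, high_events = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
low_times, low_events = [6, 7, 8, 9, 10], [1, 1, 1, 1, 1]
chi2_stat, p_val = logrank_test(high_times, high_events, low_times, low_events)
```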

Discussion
The prognosis of neuroblastoma patients varies greatly; that is, there is extensive tumor heterogeneity in neuroblastoma. For low-risk neuroblastoma patients (most commonly infants), simple observation or surgical treatment can often achieve good results; but for high-risk neuroblastoma patients, even when a variety of intensive treatment options are combined, the prognosis is still not ideal [20].
The true cause of neuroblastoma is still unclear. In recent years, with the emergence of immunotherapy and new drugs, the survival of patients in the high-risk group has improved to a certain extent [21].
RBPs accompany RNA throughout its life cycle; it is no exaggeration to say that without RBPs, RNA cannot function. Their main role is to mediate RNA maturation, transport, localization and translation. One RBP may have multiple target RNAs, and defects in its expression can cause multiple diseases. Recently, the importance of RBPs in tumor occurrence, development, and metastasis has gradually been recognized [22].
In our research, we identified 348 differentially expressed RBPs based on NB datasets from TARGET. We systematically analyzed their biological functions and built the PPI network of these RBPs and its subnets. In addition, we performed univariate Cox regression analysis, survival analysis, lasso regression analysis and multivariate Cox regression analysis of the differentially expressed RBPs to further investigate their biological function and prognostic value.
The GO enrichment and KEGG pathway analysis indicates that these differentially expressed RBPs are significantly enriched in the mRNA surveillance pathway, RNA transport, ribosome biogenesis in eukaryotes, RNA degradation, ribosome, aminoacyl-tRNA biosynthesis, spliceosome and RNA polymerase pathways, and play a critical role in mRNA processing, RNA splicing, ncRNA metabolism, RNA phosphodiester bond hydrolysis, catalytic activity acting on RNA, ribonuclease activity and nuclease activity. Many studies have reported roles for various RBPs in metabolism and disease. RBPs play a dual and opposite role in tumorigenesis, regulating the proliferation of early tumor cells and promoting tumor progression and metastasis in advanced cancer. Abnormal expression of multiple RBPs has been reported in many malignant tumors [10,23,24]. However, the impact of RBPs on the occurrence and development of cancer is still poorly understood.
Two RBPs, CPEB3 and CTU1, were identified as hub RBPs related to NB prognosis. Based on these two hub RBPs, multi-step Cox regression analysis in the TARGET training cohort produced a risk score model that can predict the prognosis of NB. In both the TARGET-NB cohort and the GSE85047 cohort, survival differed significantly between the high- and low-risk subgroups, and the AUC values of the training set and validation set were 0.72 and 0.73, respectively, indicating that the 2-gene prognostic model has some value for evaluating the prognosis of NB patients. However, the molecular mechanisms of these two RBPs are still poorly understood, and further study of their underlying function may be valuable.

CPEB3 (Cytoplasmic Polyadenylation Element
In summary, we systematically studied the function and prognostic value of RBPs differentially expressed in NB. These RBPs may be related to the occurrence, development, invasion and metastasis of NB. The prognostic model for NB based on two RBP-coding genes may be useful in clinical application. Our results help explain the pathogenesis of NB and may support the development of new molecular markers as therapeutic and prognostic targets.

Declarations
Availability of data and materials
The data used to support the findings of this study are included within the article. The data and materials in the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interest
The authors declare no conflicts of interest.

Funding
This work received no funding support.

Authors' contributions
Jun Yang and Shaohua Wang conceived and designed the study and wrote the manuscript.
Jiaying Zhou and Cuili Li analyzed the data.
All authors reviewed and approved the final manuscript.