Integrated Analysis of the Functions and Prognostic Values of RNA Binding Proteins and Candidate Drugs in Renal Papillary Cell Carcinoma

Background: Roles of RNA binding proteins in renal papillary cell carcinoma (KIRP) remain undiscovered. We thus conducted a series of bioinformatics analyses to elucidate the associations between RBPs and prognosis of renal papillary cell carcinoma. Methods: RNA sequencing data and clinical of KIRP were downloaded from the TCGA database. The differentially expressed RBP coding genes (DEGs) were sorted out by R software and Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were then performed to evaluate the functional pathways. Protein-protein interaction (PPI) network of DEGs was formed through the Search Tool for the Retrieval of Interacting Genes (STRING) database and visualized by Cytoscape. Subsequently, a prognostic model was constructed by the uses of univariate Cox regression analysis, random survival forest analysis and multivariate Cox analysis. Validations of Receiver Operating Characteristic (ROC) analysis, KM analysis of overall survival and nomogram were performed as follow. Furthermore, CMap database was used to predict potential drugs. Results: A prognostic OS-predictive model based on six RBPs (SNRPN, RRS1, INTS8, RBPMS2, IGF2BP3 and PIH1D2) was constructed. STOCK1N-28457, pyrimethamine and trapidil were determined as potential drugs according to the CMap database. Conclusion: This study constructed a prognosis-related six-RBP signature and made a prediction of three small molecular drugs, which provided a novel insight into KIRP and assisted the development of individualized therapeutic strategies.

replication. The induction of these two factors may promote DNA replication and cancer cell proliferation 9 . High expression of RBP LARP1 co-associate with BCL2 and BIK in BCL2 messenger ribonucleoprotein (mRNP) complexes in epithelial ovarian cancer (EOC) and stabilizes BCL2 while destabilize BIK, which promotes ovarian cancer cell survival and leads to adverse prognosis 10 . RBP NELFE decreases the stabilization of NDRG2 mRNA, which results in epithelial-to-mesenchymal transition through the activation of the Wnt/β-catenin signaling and the promotion the metastasis of pancreatic cancer 11 . Stated thus, defects or dysfunctional of RBP are bounded up with tumorigenesis and tumor prognosis.
As the development of high-throughput sequencing platforms, the vast amount of genomic data are extensively applied for biomarker prediction, prognosis analysis and targeted therapy 12 13 . Bioinformatics analyses provide abundant tools and speci c algorithms to obtain, process and interpret biological data 14 . In this study, we conducted a series of bioinformatics analyses based on biological data downloaded from the TCGA database and nally sifted six differentially expressed RBPs associated with KIRP. Our results might provide a new direction for the understanding of progression and prognosis of KIRP.

Data Acquisition
The FPKM transcriptome pro ling data of 32 normal samples and 289 KIRP samples were obtained from the TCGA database. Clinical data were downloaded from the TCGA database. The RBP genes were obtained from the published literature 15 . RBPs of KIRP were screened when the KIRP transcriptome sequencing map combined with the RBP genes.

Data processing of Differentially Expressed Genes (DEGs)
We used limma package in R software to identify different-expression genes of RBPs between the tumor group and the normal group. The identi cation was based on cutoffs of | log2 fold change (FC)| >0.5 and false discovery rate (FDR) < 0.05.

Functional Enrichment Analyses and Protein-Protein Interaction Network
Gene ontology (GO), which contained three terms: biological process (BP), cellular component (CC), and molecular function (MF), were applied to investigate the biological functions enrichment. The Kyoto Encyclopedia of Genes and Genomes database (KEGG) was applied to gure out potential biological pathways. All GO and KEGG enrichment analyses were conducted on R software through the clusterpro ler R package with a P-value less than 0.05. The protein-protein interactions (PPIs) among DEGs were checked using The Search Tool for the Retrieval of Interacting Genes (STRING) database 16 . The Cytoscape 3.7.2 software were then put into use for the visualization of PPI network. Subsequently, the MCODE (Molecular Complex Detection) plug-in in Cytoscape was loaded to lter out signi cant modules from the PPI network with score > 5 and node counts > 5.

Prognostic Model Construction and Analyses
After establishing a combination by merging gene expression and overall survival, we conducted univariate Cox regression to select prognosis-related RBP genes (p < 0.01). We applied randomForestSRC package in R software to conduct the Random survival forests-variable hunting (RSFVH) algorithm to further predict the signi cant RBP genes from initial screened candidates. Based on these genes, a prognostic model was constructed and the rick score was calculated according to the formula as follows: Risk Score= N represents the number of selected genes, is the coe cient of genes in the Cox regression analysis and x equals gene expression value. P values computed by Kaplan-Meier (KM) analysis were then sorted to sift the best combination of six genes. KIRP patients from the TCGA database were divided randomly into training set and test set, and patients in either set were further categorized into high risk group and low risk group according to the median risk score. The survival R package and pROC R package were conducted to construct a ROC curve to measure the accuracy of prognosis. In addition, we plotted a nomogram to calculate the feasibility of overall survival using the nomogramEx R package.

Gene Set Enrichment Analysis
Considered that SNRPN and RRS1 were intersections between critical module 1 of PPI network and prognosis-related combination, GSEA v4.1.0 was downloaded from Broad Institute and Hallmark gene set V6.2 collection were downloaded as target set to perform an analysis of potential mechanism of actions of two genes.

Prediction of Candidate Small Molecular Drugs
We identi ed differentially expressed RBPs of high and low risk group by applying the limma R package.
Then the Connectivity map (CMap) was conducted in order to predict small molecules as potential targeted drugs for KIRP 17 .

Exploration of Differentially Expressed Genes (DEGs)
The owchart of this study is illustrated in Fig. 1. RNA sequencing data containing 32 normal samples and 289 tumor samples of KIRP and clinical data were downloaded from the TCGA database. The list of 1542 RBP genes were acquired from published literature 15 . Finally, 380 RBP-coding genes containing 129 downregulated and 251 upregulated genes met our inclusion criteria of adj P < 0.05 and |log2FC| 0.5 ( Fig. 2A). Expression of DEGs were plotted in a heatmap to visualize (Fig. 2B).

Functional Enrichment Analyses of DEGs
GO and KEGG analyses were conducted on both downregulated genes and upregulated genes using the clusterPro ler R package. The upregulated RBPs were conspicuously enriched in BPs including ncRNA metabolic process, ncRNA processing, ribonucleoprotein complex biogenesis and RNA splicing while the downregulated RBPs were enriched in regulation of translation, RNA splicing and regulation of cellular amide metabolic process. CC analyses demonstrated that the upregulated RBPs were notably enriched in ribosomal subunit, ribosome, spliceosomal complex and cytosolic ribosome whereas the downregulated RBPs were enriched in cytoplasmic ribonucleoprotein granule, ribonucleoprotein granule and spliceosomal complex. With regard to MF analyses, the results revealed that the upregulated RBPs were signi cantly enriched in catalytic activity, acting on RNA, ribonuclease activity, nuclease activity and mRNA 3'−UTR binding, and the downregulated RBPs were mainly enriched translation regulator activity, catalytic activity, acting on RNA, translation regulator activity, nucleic acid binding and translation factor activity, RNA binding ( Fig. 3A, B). As for the KEGG analyses, we discovered that the upregulated RBPs were largely enriched in the pathways of Ribosome, Spliceosome and RNA transport (Fig. 3C). The downregulated RBPs were primarily enriched in RNA transport pathway and mRNA surveillance pathway (Fig. 3D).

PPI Network Construction and Crucial Module
PPI network was constructed by using the STRING database and visualized by applying the Cytoscape software (Fig. 4A). This PPI network comprised 346 nodes and 3164 edges. We further extract three modules with the use of the plug-in MCODE in cytoscape according to the cutoff of node counts > 5 and score > 5 (Fig. 4B). Module 1 consisted of 85 nodes and 1366 edges, module 2 comprised 13 nodes and 33 edges and module 3 included 15 nodes and 38 edges (Fig. 4C, D, E). The results of GO and KEGG for three modules revealed that genes in module 1 were mainly enriched in ribonucleoprotein complex biogenesis, cytosolic ribosome, structural constituent of ribosome and Ribosome pathway, genes in module 2 were enriched in DNA alkylation, chromatoid body, regulatory RNA binding and MicroRNAs in cancer, whereas the genes in module 3 were signi cantly enriched in mitochondrial translational elongation, organellar large ribosomal subunit, structural constituent of ribosome and RNA transport pathway ( Table 1, 2).

Prognostic Risk Score Model Construction and Validation
A combination of 346 RBPs from PPI network and overall survival was put into univariate Cox regression analysis to con rm prognosis-related RBPs (Fig. 5A). 60 RBPs were sorted out by the cutoff of P value < 0.01. The randomForestSRC R package was applied to carry on a Random survival forest analysis in order to distinguish RBP genes with best association with prognosis and 10 genes (EXO1, RBPMS2,  PABPN1L, PIH1D2, INTS8, RRS1, CPSF4L, IGF2BP3, SNRPN and NPM3) were screened out from the 60 prognosis-related RBPs (Fig. 5B). Multivariate Cox analysis were subsequently conducted to establish a prognostic model related to overall survival and we further performed a Kaplan-Meier analysis on the 2 10 -1 = 1023 models formed by 10 genes for the purpose of determining the best risk score model In order to evaluate the predictive capabilities of the model, we allocated KIRP patients into trainging set and test set equally, and then patients in each set were divided into high risk group and low risk group in consifderation of the median risk score. Expressions of survival status and heatmap of each set were also shown (Fig. 5D, G). Receiver operating characteristic (ROC) analyses were utilized for further estimate on the prognostic model. The AUC values in the training set was 0.9 at 1 year, 0.87 at 3 years and 0.78 at 5 years together with 0.88 at 1 year, 0.75 at 3 years and 0.69 at 5 years in the test set (Fig. 5E, H). The results revealed that patients in the high risk group tend to a signi cantly lower survival probability than those in low risk group (Fig. 5F, I).
In order to appraise clinical factors in prognosis, we then conducted an independent prognostis-related analysis on the training set and test set by using univariate and multivariate COX regression analyses.
The univariate results revealed that, in both training set and test set, the stage and T staging could be considered as independent prognostic factors for overall survival of KIRP patients (Fig. 6A, B). In multivariate analysis, tumor stage and T staging could be considered as independent prognostic factors in the training set (Fig. 6C). Nevertheless, only tumor stage can be taken as independent prognostic factor in the test set (Fig. 6D). Finally, the nomogram were constructed with six selected genes and tumor stage to evaluate the mortality risk of 3 and 5 years (Fig. 6E). Furthermore, we plotted calibration curves which demonstrated ideal conformity between speculated outcomes and observed outcomes (Fig. 6F, G).

Gene Set Enrichment Analysis
We gured out that SNRPN and RRS1 were intersections between prognostic model and module 1 of PPI network and then made a Kaplan-Meier analysis for overall survival (Fig. 7A, B). Given the levels of SNRPN were negatively correlated with the survival while RRS1 was positively correlated with the survival, GSEA analysis was applied in the low-expression and high-expression groups. As to RRS1 highexpression group, the genes sets in hallmark collection were enriched in DNA repair, E2F targets, G2M checkpoints, MTORC1 signaling, MYC targets and unfolded protein response (Fig. 7D). As to SNRPN low-expression group, the gene sets were enriched in mitotic spindle, E2F targets and G2M checkpoints (Fig. 7C).

Screening of Candidate Small Molecular Drugs
We applied the limma R package to identi ed differential-expression RBPs of different risk groups, and then 297 RBPs consisted of 261 upregulated and 36 downregulated RBPs reached the threshold of adj P < 0.05 and |log 2 FC| > 1. The prediction of small molecular drugs was based on the 297 RBPs. Finally, three small molecules which were STOCK1N-28457, pyrimethamine and trapidil were cherry-picked according to the enrichment score (> 0.6), P value (< 0.05) and percent non-null (> 70) ( Table 3).

Discussion
Recently, RBPs were becoming progressovely more important by the profoung study conducted on its roles in various cancers and increasingly regarded as crucial factors in posttranscriptional regulation 18-enhance the function mutations of oncogenes and depresse mutations of tumor suppressor 21; 22 . To the best of our knowledge, this was the rst study focused on the role of RBPs in the progression and prognosis of KIRP. Here, we integrated RNA sequencing data of KIRP from the TCGA database and sorted out differentially expressed RBPs between tumor and normal patients. We further coducted GO and KEGG enrichment analyses and established PPI network for these RBPs. Moreover, we constructed a OSpredictve model for predicting the prognosis of KIRP patients and performed ROC analyses to evaluate the feasibility of our model. Subsequently, GSEA was conducted to gure out the biological functions of two selected RBPs.
As for the results of biological functions and pathway enrichment analyses, DEGs were enriched in ribosome and posttranscriptional modi cation pathways, such as RNA splicing, RNA transport, spliceosome and translation. Large numbers of studies in recent years have reported that aberrant RNA modi cation and RNA metabolism were of great value in various cancers 23 24 . Alternative RNA splicing events, which were probably adjusted by RBPs, were reported prevalent in liver cancer and affected tumorigenesis in metabolism-related pathways by investigations conducted by Li, S research team 25 38 39 . Experiments on the the interaction mechanism of TRAF6 and p62 were carried on by Linares and their colleagues. Results revealed the importance for lung cancer cell proliferation through the activation of mTORC1 40 . Last but not least, we made a prediction of potential small molecular drugs which may be of therapeutic bene ts for KIRP patients and had a certain degree of reliability.
Overall, our study dissected the relationship between RBPs and KIRP for the rst time and proposed a novel direction for exploring the tumorigenesis and prognosis of KIRP. We determined 6 RBPs which were linked with prognosis, constructed a reliable prognostic OS-predictive model and speculated three potentially useful drugs. The six RBPs could act as potential therapeutic targets of KIRP and contribute to the development of clinical treatment. Nevertheless, our study had several limitations. First, our prognostic model was only constructed on the TCGA database and lack of clinical data from the GEO database to evaluate. Meanwhile, the lack of clinical characteristics of clinical data from TCGA may decrease the credibility of our research. Moreover, our results were based on RNA sequencing, patients had the possiblity to exhibit inter-individual heterogeneity. Finally, prospective clinical studies had the necessarity to condut before our 6-RBP prognostic model put into use.

Conclusion
We applied a series of bioinformatics analysis on the aberrantly expressed RBPs, which were a liated with tumorigenesis, invasion and prognosis, in order to investigate their potential functions, action pathways and prognostic values. Subsequently, we identi ed six RBPs highly associated with prognosis of KIRP and constructed a six-RBP prognostic model to predict overall survival as well as optimize the predictive ability of stage system. Moreover, we selected two important RBPs and evaluate their biological effects and made a prediction of potential drugs. To the best of our knowledge, this study represented the rst study focused on prognostic values of RBPs in KIRP and provided new insight into pathogenesis and therapeutic strategies of KIRP.

Data Availability Statement
All datasets about TCGA are publicly available and included in the article.

Declaration of Competing Interest
The study was conducted with no existance of competing interests.

Funding
The study was supported by the National Natural Science Foundation of China (81771640).

Authors' Contributions
Zengjun Wang: conception and design of the study, funding acquisition. Silin Jiang and Xiaohan Ren:                Identi cation of independent prognostic factors, nomogram for predicting the probability of patient mortality based on six RBPs as well as tumor stage and calibrations of nomogram in terms of conformity between predicted outcomes and observed outcomes at 3 and 5 years. Outcomes of univariate prognostic analysis conducted on training set (A) and test set (B). Outcomes of multivariate prognostic analysis conducted on training set (C) and test set (D). (E) Nomogram for evaluating the possibility of KIRP patients mortality at 3 and 5 years. Calibration for assessing the conformity between nomogram OS and observed OS at 3 years (F) and 5 years (G).