A Prognostic Model Based on RNA-Binding Proteins to Predict the Prognosis of Patients With Rectal Cancer

Background: The incidence of rectal cancer in young people is increasing, and the prognosis has remained poor in recent years. Many studies have shown that RNA-binding proteins (RBPs) are related to the progression of various malignant tumors. However, the role of RBPs in rectal cancer is poorly understood, and new prognostic models are urgently needed. Materials and methods: We used the RBPTD database together with the transcriptome data and corresponding clinical information of rectal cancer patients in The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases to screen out RBPs that are differentially expressed between tumor and normal tissues. We then analyzed the prognostic value of these RBPs using bioinformatics methods in order to identify the key RBPs in the occurrence of rectal tumors and to establish a prognostic risk score model. Survival analysis was used to assess the relationship between the key RBPs and overall survival. The prognostic model was further tested in the TCGA cohort. A nomogram of the six RBP mRNAs was also constructed in the TCGA cohort, and ROC curves were used for verification. Finally, q-PCR was performed on clinical samples to verify the expression of the hub genes. Results: A new six-RBP (EXO1, TOP2A, RUVBL1, NXT1, PACSIN2, WDR4) prognostic model was established to predict the prognosis of rectal cancer. The ROC curve showed good survival prediction in both the training cohort and the validation cohort. The constructed nomogram has some guiding significance for clinical decision-making. In addition, GSEA analysis revealed potential biological functions. The q-PCR verification results were consistent with the constructed prognostic model. Conclusions: We constructed a six-RBP prognostic model and a nomogram to predict the prognosis of patients with rectal cancer and tested expression by q-PCR in clinical samples, which may help clinical decision-making.


Introduction
In China, colorectal cancer is the third most common cancer and the third leading cause of cancer-related deaths. More than 50,000 people die from colorectal cancer each year [1]. Research predicts that by 2030, the global burden of rectal cancer will increase by 60%, reaching 1.1 million deaths and 2.2 million new cases [2]. Moreover, recent studies have shown that the incidence of rectal cancer among young people is rising and that long-term prognosis is worsening [3]. At present, the diagnosis of rectal cancer relies on colorectal endoscopy, pathological evaluation and tumor markers, which struggle to meet clinical requirements [4]. In adjuvant therapy, treatment aims to overcome resistance to chemotherapeutic drugs and to discover tumor-specific targets. At the same time, exploring effective early diagnosis methods is another important way to improve the clinical prognosis of patients with rectal cancer. Studies have shown that the protocadherin 17 (PCDH17) gene can be used as a potential prognostic indicator of the sensitivity of colorectal cancer patients to 5-fluorouracil (5-FU) [5]. Regarding drug resistance caused by KRAS mutation, research has found that targeting CDK4 and FYN kills more KRAS-mutated colorectal cancer cells, which may open new directions for tumor treatment [6]. Compared with invasive tests, biomarkers with better specificity and sensitivity have great value in early diagnosis, prognosis prediction and even treatment.
RBPs comprise more than 2,000 proteins that participate in a variety of biological processes by binding to different RNA types, including RNA splicing, mRNA stability, cytoplasmic localization and protein translation [7]. Because RBPs play a critical role in post-transcriptional gene expression, their differential expression may cause various diseases, including cancer [8]. In recent years, whole-genome analysis has shown that the expression level of certain RBPs is closely related to the malignancy of colorectal cancer. For instance, LIN28 is an emerging carcinogenic driver [9]. Its two mammalian subtypes, LIN28A and LIN28B, can promote colon cancer invasion and progression, thereby affecting the prognosis of patients [10]. Musashi RNA-binding protein (MSI) is also upregulated in colorectal cancer and has been confirmed to be related to the invasion and drug resistance of colorectal cancer [11]. In contrast, HuR exerts anti-tumor effects by regulating the tumor suppressor p21 and the WNT family protein WNT5A [12]. RBPs can also play an important role in the treatment of colorectal cancer. Tristetraprolin (TTP), an RBP, can serve as a therapeutic target for the anti-tumor drug resveratrol, which induces tumor cell apoptosis and reduces invasion and migration [13]. Nonetheless, most of the RBPs related to rectal cancer have not been clearly studied.
In this study, we collected data on rectal cancer from the TCGA and GEO databases, screened out differentially expressed RBPs, validated the risk scoring model with a variety of verification methods, and finally performed functional enrichment analysis to explore its potential biological functions. Our research has identified some RBPs related to rectal cancer. It provides potential new ideas for studying the occurrence and development of rectal tumors and may provide potential predictive biomarkers for the early diagnosis and improved prognosis of rectal cancer in the future.
Rectal adenocarcinoma (READ) expression data were downloaded from the TCGA database (10 normal and 167 tumor samples), converted to TPM values with the R software package, and values below 0.5 were removed. The GSE87211 data set (160 normal and 203 tumor samples) was downloaded from the GEO database, and values below 0.5 were likewise removed with the R software package. The preprocessed data were extracted by RBP gene name and processed with the "limma" package, using P < 0.05 and |log2FC| > 1.0 as the criteria. The differentially expressed RBPs were screened for the next analysis [14].
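The screening criteria above (P < 0.05 and |log2FC| > 1.0), followed by taking the RBPs shared by both data sets, can be sketched in a few lines of Python. This is only an illustrative sketch and not the limma workflow itself: the gene names, fold changes and P values are invented, and the helper `filter_de` is hypothetical.

```python
# Hypothetical differential-expression results: gene -> (log2 fold change, P value).
# All values below are invented for illustration only.
def filter_de(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return (up, down) gene sets passing P < p_cutoff and |log2FC| > lfc_cutoff."""
    up = {g for g, (lfc, p) in results.items() if p < p_cutoff and lfc > lfc_cutoff}
    down = {g for g, (lfc, p) in results.items() if p < p_cutoff and lfc < -lfc_cutoff}
    return up, down

tcga = {"GENE_A": (2.1, 0.001), "GENE_B": (-1.6, 0.010),
        "GENE_C": (0.4, 0.002), "GENE_D": (1.8, 0.200)}
geo = {"GENE_A": (1.5, 0.003), "GENE_B": (-0.7, 0.020),
       "GENE_C": (1.3, 0.001), "GENE_D": (2.2, 0.010)}

tcga_up, tcga_down = filter_de(tcga)
geo_up, geo_down = filter_de(geo)

# Keep only genes that change in the same direction in both cohorts.
co_up = tcga_up & geo_up
co_down = tcga_down & geo_down
print(sorted(co_up), sorted(co_down))
```

In this toy input only GENE_A passes both thresholds in both cohorts; GENE_B fails the fold-change cutoff in the second cohort and is therefore dropped from the shared set.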

Protein-protein interaction (PPI) network and module analysis
To clarify the relationships among the differentially expressed RBPs, their PPI information was analyzed with the STRING database (https://string-db.org) [15], and Cytoscape 3.7.1 [16] was used for visualization. The Cytoscape plug-in Molecular Complex Detection (MCODE) was then used to pick out the modules whose score and number of nodes were both greater than 5 and to sort out the hub RBPs. P ≤ 0.05 was considered a significant difference.

KEGG pathway and GO enrichment analysis
The biological functions of the differentially expressed RBPs were determined through GO enrichment analysis [17] and KEGG pathway analysis [18] (P < 0.05 and FDR < 0.05). DAVID (https://david.ncifcrf.gov/) was used to identify essential signaling pathways, and the analyzed data were submitted to Sangerbox (https://Sangerbox.com/) for plotting.
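To make the P and FDR thresholds concrete, the following sketch shows the general idea behind over-representation testing: a hypergeometric tail probability for one term, followed by a Benjamini-Hochberg adjustment across terms. It is an assumption-laden stand-in, since DAVID reports its own modified test statistic, and the counts and P values used here are invented.

```python
from math import comb

def hypergeom_tail(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` belong to the term."""
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favorable / comb(population, drawn)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy term: 5 of 20 background genes annotated, 2 of 4 query genes hit it.
p_value = hypergeom_tail(20, 5, 4, 2)
# Toy P values for four terms (invented).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(round(p_value, 4), adjusted)
```

A term would be retained under the stated criteria only if both its raw P value and its adjusted value fall below 0.05.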

Cox proportional hazards regression analysis
We used the "survival" R package to perform univariate Cox regression analysis on the key RBPs in the key modules of the PPI network to screen out RBPs related to overall prognosis (P < 0.05). First, the 166 samples obtained after integrating the clinical information from TCGA-READ were randomly divided into a training set (n = 82) and a test set (n = 84); the training set was used to build the prediction model, and the test set and the complete data set were used to verify it. Next, multivariate Cox analysis was used in the training set to further screen out six candidate hub RBPs (P < 0.05), together with the corresponding β and HR values for constructing the risk model.
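The random division of the 166 samples into a training set (n = 82) and a test set (n = 84) can be illustrated as follows. This is a minimal sketch: the sample identifiers are placeholders, and the fixed seed is an assumption added here for reproducibility, since the seed and procedure actually used are not stated.

```python
import random

def split_samples(sample_ids, n_train, seed=0):
    """Shuffle a copy of the sample list and cut it into training and test sets."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder identifiers standing in for the 166 TCGA-READ samples.
samples = [f"S{i:03d}" for i in range(166)]
train, test = split_samples(samples, n_train=82)
print(len(train), len(test))
```

Recording the seed alongside the split makes it possible to regenerate exactly the same training and test sets when the model is re-evaluated.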

Prognostic risk model Construction
Based on the selected prognosis-related RBPs, we built a proportional hazards model in the training set to predict the prognosis of patients with rectal cancer. The risk score of each patient was calculated from the hub RBPs as follows:
$$\text{Risk score} = \sum_{i=1}^{n} \beta_i \times \mathrm{Exp}_i$$
where $\beta_i$ represents the regression coefficient of the $i$-th hub RBP, $\mathrm{Exp}_i$ represents its expression, and $n$ is the number of hub RBPs. A higher risk score corresponds to a poorer prognosis, and patients were divided into high-risk and low-risk groups according to the median risk score.
Then the log-rank test was used to compare the significance of the OS difference between the two groups, and the Kaplan-Meier survival curve was drawn. Finally, univariate and multivariate Cox regression analyses were performed, and the "survivalROC" R package was used to draw ROC curves to evaluate the predictive performance of the model.
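The risk score and the median-based grouping can be sketched as below. The coefficients, gene labels and expression values are hypothetical placeholders rather than the fitted values of the model, and the choice to place scores equal to the median in the low-risk group is an assumption of this sketch.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the genes in the model."""
    return sum(coefficients[g] * expression[g] for g in coefficients)

def assign_groups(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical coefficients and expression values (not the fitted model).
coefficients = {"RBP_1": 0.5, "RBP_2": -0.25}
patients = {
    "P1": {"RBP_1": 2.0, "RBP_2": 4.0},
    "P2": {"RBP_1": 4.0, "RBP_2": 0.0},
    "P3": {"RBP_1": 3.0, "RBP_2": 2.0},
    "P4": {"RBP_1": 8.0, "RBP_2": 4.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = assign_groups(scores)
print(scores, groups)
```

A gene with a negative coefficient lowers the score as its expression rises, which is how a protective gene enters the same linear predictor as a risk gene.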

Validation of the potency of the prognostic risk model
The risk scores of patients in the test set and the complete data set were compared with the cutoff value calculated in the training set, and each patient was assigned to the high-risk or the low-risk group. The "survival" R package was used to draw ROC curves and Kaplan-Meier survival curves and to perform univariate and multivariate Cox regression analysis, combined with clinicopathological characteristics, to finally confirm the performance of the prediction model. We then analyzed the correlation between risk and clinical traits and the relationship between hub RBP gene expression and clinical traits, and used the R package for visualization. Subsequently, based on the complete model, the "rms" R package was used to construct a nomogram containing all independent factors to verify the relationship between the hub RBPs and the patient's 5-year OS. Finally, we used two data sets (n > 100) from the GEO database and the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/) to verify the expression levels of the hub RBPs in rectal cancer and normal tissues.

Gene Set Enrichment Analyses
According to the best cutoff value determined by the constructed risk model, the TCGA samples were divided into high-risk and low-risk score groups. GSEA (version 4.0) [19], based on the Molecular Signatures Database (MSigDB), was used to perform gene set enrichment analysis for the high-risk and low-risk groups (|NES| > 1 and FDR < 0.05 were considered statistically significant).

Reverse Transcription-Quantitative Polymerase Chain Reaction (RT-qPCR)
TRIzol reagent was used to extract total RNA from tumor tissues and adjacent normal tissues of gastric cancer patients. TB Green® Premix Ex Taq™ II (Tli RNaseH Plus) and the 7900HT Fast Real-Time PCR System (ThermoFisher) were used for cDNA analysis. The expression levels of HEYL, FOXC1, XBP1, PLEK and HOPX were calculated relative to the internal reference gene ACTB. The amplification results were then compared and analyzed to obtain the relative expression levels of EXO1, TOP2A, RUVBL1, NXT1, PACSIN2 and WDR4. The primer sequences are shown in Table 1.
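Relative expression against an internal reference gene is commonly computed with the 2^(-ΔΔCt) method. Whether that exact method was applied here is not stated, so the sketch below is an assumption, and the Ct values are invented.

```python
def relative_expression(ct_target_tumor, ct_ref_tumor, ct_target_normal, ct_ref_normal):
    """Fold change of a target gene in tumor versus normal tissue by 2^(-ddCt).

    Each Ct is first normalized to the reference gene in the same sample
    (delta Ct), then tumor is compared with normal (delta-delta Ct).
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor
    delta_normal = ct_target_normal - ct_ref_normal
    return 2.0 ** -(delta_tumor - delta_normal)

# Invented Ct values: target gene versus a reference gene such as ACTB.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold)
```

Because a lower Ct means more template, a target that crosses the threshold two cycles earlier in tumor tissue than in normal tissue (after normalization) corresponds to a fourfold higher relative expression.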

Statistical analysis
The data sets in this study were analyzed using R (version 3.6.1). Bioconductor (http://bioconductor.org/) provides tools for analyzing and interpreting high-throughput genomic data. Differential genes were analyzed using the limma package in R (version 3.6.1). For continuous variables, the t-test and analysis of variance were used to test for statistical differences; for categorical variables, Pearson's χ2 test and Fisher's exact test were used. Univariate and multivariate Cox proportional hazards regression analyses were performed with the survival package. The Kaplan-Meier method was used to draw survival curves to evaluate the overall survival of high-risk and low-risk rectal cancer patients. R (version 3.6.1) and GraphPad Prism (version 7.0) were used for statistical analysis.
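For readers less familiar with the Kaplan-Meier method mentioned above, the estimator can be written out directly. The sketch below is a plain-Python illustration, not the survival package implementation, and the follow-up times and event indicators are invented.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `events` holds 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set.
        n_at_risk -= removed
    return curve

# Invented follow-up data: five patients, two of them censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why the estimate steps down only at times 1, 2 and 3 in this example.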

Screening of differentially expressed RBPs in rectal neoplasms
This research was carried out according to the flowchart shown in Figure

PPI Network Construction and functional enrichment analysis of differentially expressed RBPs genes
To explore the biological function of the differentially expressed RBPs in rectal cancer, we took the intersection of the differentially expressed genes screened from TCGA and GEO (44 up-regulated and 32 down-regulated RBPs), as shown in Figure 3A. Based on the STRING database information for the co-up-regulated and co-down-regulated RBPs, a PPI network with 73 nodes and 394 edges was constructed using Cytoscape (Figure 3B). Subsequently, the MCODE plug-in in Cytoscape was used to screen out the two key modules with the highest scores from the PPI network; the cytoHubba plug-in was used to select hub genes (P < 0.05), and every hub gene belonged to at least one module. These two key modules therefore represent the critical biological roles of the PPI network.
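As a rough illustration of how hub nodes can be ranked in a PPI network, the sketch below orders proteins by degree, which is one of the ranking options offered by cytoHubba. The ranking method actually used is not stated, so degree is an assumption here, and the edge list is invented.

```python
from collections import defaultdict

def degree_ranking(edges):
    """Rank nodes of an undirected network by degree (ties broken by name)."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))

# Invented interaction list standing in for STRING edges.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
ranking = degree_ranking(edges)
print(ranking)
```

Degree is the simplest centrality measure; module detection tools such as MCODE additionally look for densely connected neighborhoods, which this sketch does not attempt.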
DAVID (https://david.ncifcrf.gov/) was used to perform GO and KEGG functional enrichment analysis on the 76 selected RBP genes to discover the biological functions of the differential genes. KEGG analysis identified that the differentially expressed RBPs are enriched in biological pathways related to the cell cycle, cysteine and methionine metabolism, and microRNAs in cancer. Molecular function analysis confirmed that the differentially expressed RBPs were significantly enriched in RNA phosphodiester bonds, RNA binding catalytic activity, RNA synthesis, chromatin binding, etc. In terms of cellular component, the differentially expressed RBPs are mainly enriched in the nucleoplasm, cytoplasm and nucleolus. In addition, we found that the differentially expressed RBPs are also enriched in RNA hydrolysis, positive regulation of IFN-α biosynthesis, positive regulation of IFN-β synthesis and rRNA transcription (Figure 4).

Selection of RBPs related to prognosis
We focused on the 73 differentially expressed hub RBPs in the PPI network. To study the prognostic significance of these RBPs, we used univariate Cox regression analysis and screened out 13 RBPs significantly associated with overall prognosis (Figure 5A). Next, we divided the 166 TCGA-READ samples with clinical information into a training set (82 samples) and a test set (84 samples). Multivariate Cox regression analysis of their impact on patient survival time and clinical prognosis (Table 2) found that six hub RBPs are independent predictors for rectal cancer patients (Figure 5B).

Construction of a genetic risk scoring model related to prognosis
A risk prediction model was constructed from the six hub RBPs identified by multivariate Cox regression analysis (Table 3). The risk score of each patient was calculated according to the following formula:
$$\text{Risk score} = \sum_{i=1}^{6} \beta_i \times \mathrm{Exp}_i$$
where $\beta_i$ is the regression coefficient and $\mathrm{Exp}_i$ the expression of the $i$-th hub RBP. To evaluate the predictive ability of the model, patients in the training set (n = 82) were divided into high-risk and low-risk groups based on the median risk score of the 82 samples, and survival analysis was performed on the two groups. The results showed that OS in the high-risk group was far lower than in the low-risk group (Figure 6A). We then conducted a time-dependent ROC analysis of the prognostic ability of the six hub RBPs. The area under the ROC curve of the constructed RBP risk scoring model is 0.780, indicating good predictive ability (Figure 6B). Figure 6C shows the risk scores, patient survival status and expression heat map of the six RBPs in the high-risk and low-risk groups. Finally, we performed univariate and multivariate Cox regression analysis. The results show that the model has moderate independent predictive power.
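The area under the ROC curve reported above summarizes how well the risk score separates patients who died from those who survived. The sketch below computes an AUC for a binary outcome by pairwise comparison. It is a deliberate simplification, because time-dependent ROC analysis also accounts for censoring, which this sketch ignores; the scores and outcomes are invented.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    `labels` holds 1 for patients with the event and 0 for those without.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented risk scores and outcomes at a fixed time point.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
auc_value = auc(scores, labels)
print(round(auc_value, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the values reported for the training set, test set and complete set.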

Prognostic model performance verification
To validate the predictive value of the six-RBP prognostic model built on the training set, we used the internal test set (n = 84) and the complete data set (n = 166). Survival analysis of both sets showed that patients in the high-risk group had a worse prognosis than those in the low-risk group (Figure 7A, 8A), which was consistent with the results of the training set. Time-dependent ROC curve analysis showed that the AUCs of the internal test set and the complete set were 0.911 and 0.798, respectively (Figure 7B, 8B). Figure 7C and 8C show the risk scores, survival status of patients and expression heat map of the six RBPs in the prognostic model. Finally, univariate and multivariate Cox analyses involving clinical factors and risk scores showed good independent predictive power in both sets (Table 4, Table 5). In summary, these results support the reliability of the prognostic model.

Construction and verification of prediction nomogram
To establish a method for quantitatively predicting the survival probability of patients with rectal cancer, we combined the six independent prognostic RBP markers to construct a nomogram to predict the five-year OS rate of patients (Figure 9A). In addition, ROC curve analysis showed that the AUCs of AJCC staging (AUC = 0.890), tumor status (AUC = 0.798), residual tumor (AUC = 0.618) and patient age (AUC = 0.615) were all lower than that of the model's risk score (AUC = 0.962), as shown in Figure 9B.

Analysis of hub genes related to RBPs, clinical features and biological functions
Based on the proportional hazards regression analysis, six RBPs (PACSIN2, EXO1, TOP2A, NXT1, RUVBL1 and WDR4) are associated with prognosis. The expression of these six RBPs in different risk groups in the TCGA data set is shown in Figure 10. We observed significant differences between the low-risk and high-risk subgroups in clinical characteristics such as risk grade, tumor status, residual tumor and tumor stage. We then analyzed the six RBPs against different clinical characteristics. The results showed that the expression levels of NXT1, TOP2A and WDR4 differed significantly between risk groups. Across tumor stages and grades, significant differences were found in the expression levels of TOP2A and PACSIN2. The expression of PACSIN2 also differed by tumor status and by vascular invasion.
To further analyze the potential biological functions of the six RBP genes, we performed GSEA on the risk score groups. The results showed that the high-risk group was enriched in pancreatic β cell, blood coagulation, myogenesis and epithelial-mesenchymal transition pathways (Figure 11).

External verification of the prognostic performance and expression of hub RBPs
To further determine the relationship between the expression of these hub RBPs in rectal cancer and prognosis, we obtained immunohistochemical results from the Human Protein Atlas database. Compared with normal rectal tissues, the expression of TOP2A, RUVBL1, PACSIN2 and WDR4 in rectal cancer increased significantly. We also combined two external data sets (GSE20482, GSE90627) with more than 100 samples each. PACSIN2, TOP2A, RUVBL1 and WDR4 showed significant differences in expression between normal tissues and rectal cancer tissues. We further examined the expression of the screened hub RBPs in clinical specimen tissues (Figure 12). The results were consistent with our analysis. EXO1, TOP2A, RUVBL1 and WDR4 have higher expression levels in normal tissues and lower expression levels in tumor tissues. However, PACSIN2 and NXT1 were not statistically different between normal tissues and tumor tissues.

Discussion
Colorectal cancer is the third most common malignant tumor in the world. Because its pathogenesis is unknown, the mortality rate of rectal cancer patients is still high [1]. With the development of precision medicine, deciphering the molecular basis of cancer is expected to bring, beyond surgical treatment of colorectal tumors, a new understanding of the disease [20]. Most of this knowledge has been obtained by studying DNA and protein functions, whereas post-transcriptional functions have not been studied thoroughly enough. In the past ten years, the understanding of post-transcriptional regulation in cancer has increased, especially with the discovery of the diverse biological functions of non-coding RNAs. RBPs play an important biological role by forming an intricate network of RNA regulators that regulate gene expression before and after transcription [21]. Accumulating studies have shown that the function or expression of RBPs is disrupted in the progression of various tumors [22][23][24][25]. However, the specific functions of most RBPs during tumorigenesis are still unclear [26]. Compared with a single biomarker, a prognostic model built from a multi-gene signature can better predict a patient's prognosis [27]. In this study, we integrated rectal cancer data from TCGA and GEO and screened for differentially expressed RBPs. Subsequently, we constructed a PPI network of the key RBPs and analyzed their biological pathways. In addition, we selected six hub RBPs from the 76 key RBPs through Cox proportional hazards regression analysis to establish a prognostic risk model. In the internal data set, survival analysis and the nomogram showed that the model has good independent predictive ability. We also compared our model with the 12 marker genes related to the prognosis of colorectal cancer constructed by Fan et al. [28], and the results showed that the ROC curve of our risk prognosis model (AUC = 0.798) was better than theirs (AUC = 0.553). We speculate that these six RBPs have the potential to predict the prognosis of rectal tumors.
GO analysis showed that, in terms of molecular function, the differentially expressed RBPs are mainly involved in RNA synthesis, regulation of ribozyme activity, chromatin binding, and positive regulation of IFN-α biosynthesis and IFN-β synthesis. This is consistent with reports that RBPs mainly exert their biological functions by regulating RNA or participating in chromatin regulation [29][30][31]. The positive regulation of IFN synthesis is a pathway related to tumor immune escape. During tumorigenesis, inhibiting it can reduce the production of IFN and other inflammatory mediators and enhance tumor immune escape [32]. Recent research indicates that RBPs interact with this pathway and can affect tumor growth by regulating the immune-inflammatory response [32,33]. Furthermore, in the KEGG analysis, prognosis-related RBPs are mainly enriched in microRNAs in cancer, cysteine and methionine metabolism, and the cell cycle. MicroRNAs are small regulatory non-coding RNAs and have been shown to be promising targets for targeted cancer therapy [34]. Cysteine and methionine metabolism may be a unique metabolic phenotype on which the growth of malignant cancers depends [35]. Studies have found that this metabolic process is related to the prognosis of many tumors [36]. The prognosis of colorectal cancer patients is also related to the expression of cell cycle-related genes, for example through changes in the ubiquitin-proteasome system [37]. These results indicate that RBPs may participate in the pathophysiological process of tumors and are related to the prognosis of cancer patients.
Subsequently, by constructing a PPI network of the differential RBPs and performing univariate and multivariate Cox regression analysis, we obtained six hub RBPs (EXO1, TOP2A, RUVBL1, NXT1, PACSIN2, WDR4) that are related to prognosis. Research has confirmed that the expression of EXO1 [38, 39], RUVBL1 [40] and TOP2A [41] is related to the progression of colorectal cancers, which is consistent with the predictions of our model. Next, based on the six hub RBP coding genes in the TCGA training set, we used multivariate Cox proportional hazards regression analysis to establish a proportional hazards model for predicting the prognosis of rectal cancer. ROC curve analysis shows that these six genes have good diagnostic capability and can identify rectal cancer patients with a poor prognosis. However, the molecular mechanisms of these six RBPs in rectal cancer are still poorly understood and deserve further exploration. We then constructed a nomogram to predict 1- to 5-year overall survival more intuitively. Finally, GSEA showed that the high-risk group was mainly enriched in pancreatic β-cell, blood coagulation, myogenesis and epithelial-mesenchymal transition pathways. Among them, the epithelial-mesenchymal transition pathway is an important marker for distinguishing benign from malignant tumors [42]. Studies have shown that RBPs can regulate RNA stability to inhibit the metastasis of rectal cancer cells [43]. We further verified the expression of the six RBP-encoding genes in the high- and low-risk groups through the HPA database and two external validation sets, and the results were consistent with the evaluation of our risk prognosis model. Finally, our q-PCR analysis of clinical samples was broadly consistent with our predicted results.
In summary, based on the results of external verification and risk model prediction, we speculate that patients in the high-risk group have a poor prognosis and that their treatment plan or personalized treatment evaluation needs to be adjusted accordingly.
In general, our research offers new ideas for further revealing the role of RBPs in rectal tumors. In addition, our prognostic model shows good survival prediction performance, which may contribute to developing new prognostic indicators for rectal cancer. Unlike previous studies, the genetic signatures associated with these RBPs show critical biological functions, which suggests that they may be used in the treatment of rectal cancer in the future. Nevertheless, this study still has several limitations. First, because of the heterogeneity of patients with rectal cancer, the prognostic model should ideally be built on a multi-omics platform. Second, some clinical information is missing from the TCGA data set, which may have reduced the statistical validity and reliability of the multivariate Cox regression analysis.
In summary, we systematically explored the expression and prognostic value of differentially expressed RBPs in rectal cancer through a series of bioinformatics analyses. These RBPs may be involved in the occurrence, development, invasion and metastasis of rectal cancer. A prognostic model of six RBP-encoding genes was constructed, which can be used as an independent prognostic factor for rectal cancer. To our knowledge, this is the first report to establish a predictive model relating RBPs to the prognosis of rectal tumors. Future research should further clarify how these RBPs are specifically involved in rectal cancer pathogenesis and develop specific therapeutic targets and molecular markers for evaluating patient prognosis.

Conclusion
Through a series of bioinformatics analysis methods, we have established a new six-RBP gene signature and a nomogram to predict the overall survival of patients with rectal cancer. This was further verified by q-PCR of clinical specimens and may help individualized clinical treatment decisions.

Declarations
Ethics approval and consent to participate
All gastric cancer clinical samples involved were obtained with the patients' informed consent and approved by the ethics committee of the Second Affiliated Hospital of Nanchang University.

Consent for publication
All authors agree to the publication of this article.

Availability of data and materials
All data can be obtained from the corresponding author's office and public databases.

Competing Interest
The authors declare no conflict of interest.

Figure 1 Flowchart of the analysis of RBPs in rectal cancer using public data from the GEO and TCGA databases.

Identification of RBPs related to prognosis. Univariate Cox regression analysis to screen the differentially expressed hub RBPs identified by the PPI network.