Establishment of A Risk-Predicting Twenty-Two-Gene Signature Based on the Prognosis-Related RNA Binding Proteins of Breast Cancer


 Background

Breast cancer has been the leading cause of death among women worldwide. RNA-binding proteins (RBPs) are promising novel biomarkers for patients with malignant tumors, the abnormal expression of which is closely associated with the development of breast cancer. This study aimed to identify RBPs associated with the survival outcomes of breast cancer and to construct a prognostic model and a clinical prediction nomogram for breast cancer.
Methods

The transcriptome data of breast cancer were downloaded from TCGA database. GO and KEGG analyses were performed to clarify the gene functions and regulatory pathway. Cox and LASSO regression analyses were utilized to analyze the prognosis prediction effect of RBPs and clinical characteristics in breast cancer and create a risk score model. A nomogram was also built by merging the model and clinicaopatholigical characteristics, which was validated using calibration curves.
Results

A prognostic risk score model of the 22-RBP signature was established. This risk score predicted 3-, 5-, and 10-year overall survival rates sensitively and accurately. Compared to other clinicaopatholigical characteristics, this risk score had better predictive ability. We also constructed a nomogram based on risk scores and clinicaopatholigical characteristics. The nomogram may predict the 1-, 3-, and 5-year survival rates of patients with breast cancer.
Conclusions

RBPs play an important role in the development and survival outcomes of breast cancer by regulating multiple biological processes. Furthermore, this study identified and constructed a 22-RBP based signature and a clinical nomogram for predicting the survival probability of patients with breast cancer.


Background
In 2020, the global cancer data showed that breast cancer (BC) had exceeded lung cancer to become the most frequently diagnosed cancer in women. In 2020, there were 2.26 million newly diagnosed BC, representing 24.5% of all new cancer cases in women, with more than 680,000 deaths, accounting for 15.5% of cancer deaths (1). High incidence and mortality are important factors affecting the survival time of women with BC. the cell transcriptome and proteome, thereby affecting cell growth, proliferation, and invasion (6). In addition, there is increasing evidence suggesting that RBP disorders are associated with tumor progression and prognosis. HuR has been identi ed as an RBP that acts as an important oncogenic driver that controls mRNA stability and promotes proliferation and metastasis of gastric cancer (7). RBP MEX3A can independently re ect the clinical course of BC, and MEX3A overexpression predicts poor prognosis (8-9). Although these ndings are informative, to date, few researches have systematically evaluated the prognostic value of RBP in BC.
In this study, we performed an comprehensive analysis based on the data gained from The Cancer Genome Atlas (TCGA) database. We used Cox regression and survival analyses to establish prognostic RBP-related BC gene signatures. We also built a prognostic nomogram based on risk score and clinicopathological features to predict the survival outcomes of BC.

Methods
Data acquisition and RNA-binding protein screening From the TCGA dataset (https://portal.gdc.com), the raw count of tumor RNA sequencing data (level 3) and the clinical information were obtained using the TCGAbiolinks package (version 2.20.0) (10). We obtained human RBP data from this study (3). Survival analyses were used to explore the correlations between the key prognostic genes and prognoses. In addition, we also generated Venn diagrams by the Draw Venn Diagram tool to identify key prognostic RBPs.

RBP signature establishment and veri cation
Hub RBPs were identi ed, and the predictive signature was constructed using the least absolute shrinkage and selection operator (LASSO) Cox regression and multivariate Cox analysis. The LASSO regression analysis can help narrow down the candidate gene number further. Cox proportional hazards models were utilized to estimate the hazard ratios (HRs) and to calculate the coe cients for risk RBPs in the equations. Survival analysis and time-dependent receiver operating characteristic (ROC) analyses were utilied to validate the model. Gene Ontology (GO) functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis GO and KEGG pathway analysis were performed to explore the biological functions and pathways of the differentially expressed RBPs.GO analysis were constituted by biological process (BP), cellular component (CC), and molecular function (MF). All these analyses were conducted by clusterPro ler (11), ggplot2, and GOplot (12) functions to present the results (12).

Protein-protein interaction (PPI) network construction
The STRING database (http://string-db.org) was utilized to build PPI network of the differentially expressed RBPs. Then Cytoscape 3.8.2 (http://cytoscape.org/) software was used to analyze the data gained from the STRING database. Furthermore, Molecular Complex Detection, was used to screen hub RBPs of the network.

Statistical analyses
In the present study, univariate and multivariate Cox regression, LASSO regression, Kaplan-Meier curveand log-rank test were applied. For the Kaplan-Meier curve, the p-value and HR were obtained using the log-rank test and univariate Cox regression. Furthermore, the nomogram and calibration curves were drawn by the "rms" package. LASSO regression was performed using the "glmnet" package. Forest plots were constructed using the geom_violin (ggplot2). All these analysis methods and R software packages were implemented using R version 4.1.1 software (13). P <0.05 was considered as statistically signi cant.

Data analysis
RNA-Seq and clinical data of 1,097 BC and 113 normal cases were gained from the TCGA database. The baseline for the inclusion of clinical data only for female patients was presented in Table 1. We further performed survival analysis of the whole genome of BC, using univariate Cox and log-rank methods, according to the standard of p <0.05, nally obtaining 973 genes with signi cant prognosis (Additional le 1). Seventy-ve RBPs with signi cant prognostic signi cance were nally obtained from the 1,542 RBPs included in the research. High and low risks were de ned as HRs >1 and <1, respectively. Finally, there were 28 low-risk and 47 high-risk RBPs. HR values with 95% CIs for the top 20 signi cantly prognostic RBPs are shown in the forest plot ( Figure 1). To further understand the potential molecular mechanisms and functions of these RBPs, we performed GO and KEGG pathway analyses for all these signi cantly prognostic RBPs. Regarding BPs, low-risk RBPs were mainly enriched in cotranslational protein targeting to membrane, signal recognition particledependent cotranslational protein targeting to membrane, and so on. High-risk RBPs were mostly enriched in ncRNA processing, translational termination, ribosome biogenesis, mitochondrial translational termination, and protein-containing complex disassembly. Regarding CCs, low-risk RBPs were mainly enriched in cytosolic ribosomes, ribosomal subunits, cytosolic parts, ribosomes, and cytosolic large ribosomal subunits. However, the high-risk RBPs were notably enriched in ribosomes, ribosomal subunits, organellar ribosomes, mitochondrial ribosomes, and small ribosomal subunits. In the MF category, lowrisk RBPs were associated with structural constituents of ribosomes, rRNA binding, cytidine deaminase activity, translation elongation factor activity, and deaminase activity. However, the high-risk RBPs were enriched in RNA 7-methylguanosine cap binding, rRNA binding, and so on. KEGG pathway analysis presented that low-risk RBPs were also mainly related to spliceosomes and ribosomes, whereas high-risk RBPs were also mainly related to ribosomes, ribosome biogenesis in eukaryotes, and RNA transport ( Figure 2).

Results of PPI network and screening hub RBPs
We further explored the effects of prognosis-related RBPs in BC by constructing a PPI network ( Figure  3A). Furthermore, we analyzed the network to identify hub gene by the MODE plugin and identi ed a hub gene module with 25 RBPs ( Figure 3B).
Establishment and veri cation of the risk core signature Based on the 75 prognosis-associated RBPs, we performed LASSO regression to identify RBPs which have the highest potential prognostic effect. Eveuntually, 22 hub RBPs were obtained and utilized to construct a predictive signature ( Figure 4A Figure 4D). Next, we ranked patients with BC according to their risk scores and analyzed the survival status of every patient on a dot plot. The results showed that patients in the low-risk group have higher survival rates and times than those patients in the high-risk group.
The expression levels of the 22 hub RBPs in the high-and low-risk groups were also shown in Figure 4E.

Relationships between risk score and clinicopathological characteristics
Univariate and multivariate Cox regression analyses of TCGA data were conducted to determine whether the risk score was an independent prognostic predictors. Univariate analysis showed that T, N, M, and tumor-node-metastasis (TNM) stages and risk scores were signi cantly associated with OS in patients with BC. Furthermore, multivariate analysis predicted that T, N, M, and TNM stages and risk scores were independent prognostic predictors for OS (Table 2). In conclusion, we can say that the risk score based on 22 RBPs may serve as an independent prognostic predictor for survival in patients with BC. To further explore the prognostic effect of the signature, we analyzed the association between the risk score and clinicopathological parameters. The results suggested that the risk score was signi cantly related to the M stage; the greater the grade of metastasis, the higher the risk score. However, risk scores were not related to the T stage, N stageand TNM stage ( Figure 5). The above results further con rmed that the risk score could be an independent prognostic predictor in patients with BC. To develop a quantitative method for predicting the survival outcomes of patients with BC, we built a nomogram that integrated risk score and T stage, N stage, and M stage. Every variable was assigned a score; the scores of the four variables were then added, and a vertical line was drawn from the total score to the nomogram point scale to determine the estimated 1-, 3-, and 5-year survival rates ( Figure 6A). The C-index value of the nomogram was 0.917 (0.908-0.927) for this BC cohort, suggesting that this nomogram had good discrimination capability. The calibration curves suggested that the nomogram were signi cantly consistent with actual observations for 1-, 3-, and 5-year OS in the TCGA-BC cohort, indicating that it was trustable ( Figure 6B-D).

Discussion
Breast cancer has the highest incidence and mortality rates among women worldwide (1).
Clinicopathologicalfeatures, such as age, clinical stage, and tumor classi cation are major predictors of survival outcomes in patients with BC (14). Recent researches have shown that RBP is closely related to the occurrence and development of various cancers and plays a critical role in tumor invasion, migration, and metastasis (15). However, only a few RBP mechanisms have been studied, and the role of RBP in BC is not fully understood. Therefore, compared with a single clinicopathological parameter, establishing genetic markers and a nomogram using multiple biomarkers and clinicopathological features is a relatively effective method for predicting tumor prognosis. Therefore, signatures and nomograms based on RBPs may be more accurate in predicting BC survival.
In this research, we used RNA-Seq data and clinical data from the TCGA database for BC and identi ed 75 RBPs with signi cant prognostic signi cance. The biological functions and pathways of signi cant prognostic RBPs in BC were investigated via PPI network analysis and functional enrichment analysis. Next, we used LASSO and Cox regression analysis to screen out 22 key RBPs (CPSF4, RPL14, EEF1B2, HNRNPA1, RPL38, PTRHD1, ENDOV, PARP12, APOBEC3F, EIF5AL1, MCTS1, MRPS18C, AIMP2, MRPL18, ZMAT3, LARP1, MRPS2, AIMP1, POP1, IGF2BP1, MRPS28, NANOS1) most associated with prognosis. Based on 22 RBPs associated with prognosis, we established a promising 22-RBP signature and nomogram to predict the survival outcome of patients with BC. Based on our risk score signature, survival analysis showed signi cant differences in OS between the high-and low-risk subgroups, with the low-risk group generally surviving better than the high-risk group. The ROC curve showed that our prognostic model had good accuracy, with AUC values greater than 0.75 at 3, 5, and 10 years. Some studies have presented that stage is an independent prognostic indicator of BC, and the risk score also has good predictive performance, even equal to the predictive ability of the TNM stage.
We divided the 22 RBPs with signi cant survival signi cance into high-risk and low-risk RBPs, according to the HR value of survival analysis. The high-risk RBPs were MCTS1, MRPS18C, AIMP2, MRPL18, ZMAT3, LARP1, MRPS2, AIMP1, POP1, IGF2BP1, MRPS28, and NANOS1. Low-risk RBPs were CPSF4, RPL14, EEF1B2, HNRNPA1, RPL38, PTRHD1, ENDOV, PARP12, APOBEC3F, and EIF5AL1. Poly (ADP-ribose) polymerase 12 (PARP12) is a mono-ADP-ribosyltransferase. Studies have shown that PARP12 de ciency promotes the migration and invasion of hepatocellular carcinoma (HCC) cells and increases HCC metastasis. This is similar to the conclusion of this study that PARP12 is a protective RBP in BC (16). Cleavage and polyadenylation speci city factor 4 (CPSF4) is a member of the CPSF complex, suggesting that it has low expression and poor survival in breast tumor tissues. In contrast to Lee's study (17), CPSF4 was highly expressed in triple-negative BC and promoted the invasion and migration of BC cells. This may be due to the high expression of CPSF4 in certain tumor phenotypes, such as lung and liver cancers (18, 19). Studies (20) have shown that RPL14, associated with various tumors, has lower expression in various tumors and inhibits metastasis, invasion, and epithelial-to-mesenchymal transition. In this study, as a protective RBP, the high expression of RPL14 can signi cantly increase the survival time of patients, which is consistent with the nding of Lin's study. RPL38 belongs to the L38E family of ribosomal proteins and is located in the cytoplasm of cancer cells. A previous study (21) showed that knockdown of RPL38 decreased the proliferation and invasion of gastric cancer cells. In this study, we observed a trend of low expression in breast tumor tissues. As an oncogene, MCTS1 canpromote tumor cell proliferation and migration and inhibit apoptosis. This further indicates that MCTS1 is a high-risk gene in this study (22)(23)(24). Studies have found that AIMP2 plays a key role in the control of cell fate. It showed antiproliferative activity by enhancing the growth arrest signal of tumor growth factor-β signaling. AIMP2 also promotes cell death by activating p53 and tumor necrosis factor-α apoptotic signals (25). However, a splicing variant of AIMP2, which disrupts the normal function of AIMP2 and induces tumorigenesis, is more common in tumor tissues (26). This further indicates that the patient was classi ed as a high-risk RBP. ZMAT3 is an RNA splicing and homeostasis regulator. As an important downstream splicing regulator of p53 (27), ZMAT3 plays a vital role in p53-mediated tumor suppression (28). In the LARP1 interaction group, mRNA transcription of mTOR was stabilized by LARP1. At the functional level, LARP1 can promote cell migration, invasion, unanchored growth, and tumorigenesis in vivo (29). In addition, LARP1 expression increased in epithelial cancers such as cervical cancer (30), whose expression is associated with disease progression and poor prognosis, respectively. LARP1 may be a promising therapeutic target. Studies have shown that IGF2BP is a novel oncogenic factor in many solid tumors, including ovarian cancer and breast cancer, and its high expression is closely related to metastasis and poor prognosis (31). Current studies have suggested that IGF2BP1 can be used as a therapeutic target and may be used in the clinical diagnosis and prognostic evaluation of bladder cancer and neuroblastoma (32)(33).
At present, a prognostic model based on BC prognostic-associated RBP has not been reported. This is the rst study to use multiple prognostic-associated RBP genes to built a prognostic model for patients with BC. Although our model has good predictive performance, some limitations also need to be discussed. First, 22 RBP gene signatures were constructed based on the TCGA-BCRA dataset; however, they could not be veri ed in clinical practice and the external cohort. Second, the races of patients with BC in the TCGA database are white and black, and it is not clear whether the predictive effect is the same among Asians and other races. In addition, whether this clinical prediction model has good validity in other BC subtypes requires further veri cation. Finally, our study is retrospective and needs to be further validated in larger prospective studies.

Conclusions
In this study, We found RBPs played an important role in the development and survival outcomes of BC by regulating multiple biological processes. Furthermore, we identi ed and constructed a 22-RBP based signature and a clinical nomogram for predicting the survival of patients. We think our results may assist in prognosis management postoperative intervention of high-risk patients with BC clinically. These data and materials can be available from the corresponding author for rational reasons.

Competing interests
The authors declare that they have no competing interests Funding This research was supported by Natural Science Foundation of Liaoning Province (2020-MS-178).

Authors' contributions
Xi Gu and Xudong Zhu designed this research. Tong Zhu and Jiawen Bu analyzed these data from public databases. Tongzhu, Xudong Zhu, Jiawen Bu, Yi Jiang, JinqiXue, and Nan Niuwrote this paper. All authors have read and approved the nal manuscript.