A Novel Prognostic Model In Oral Squamous Carcinoma: The Functions And Prognostic Values Of Rna Binding Proteins

Background: The biological roles and clinical signicance of RNA-binding proteins (RBPs) in oral squamous cell carcinoma (OSCC) are not fully understood. We investigated the prognostic value of RBPs in OSCC by several bioinformatic strategies. Methods: OSCC data were obtained from a public online database, the Limma R package was used to identify differentially expressed RBPs, and functional enrichment analysis was performed to elucidate the biological functions of the above RBPs in OSCC. We performed protein-protein interaction (PPI) network and Cox regression analyses to extract prognosis-related hub RBPs. Next, we established and validated a prognostic model based on the hub RBPs by Cox regression and risk score analyses. Results: We found that the differentially expressed RBPs were closely related to the defence response to virus and multiple RNA processes. We obtained ten prognosis-related hub RBPs (ZC3H12D, OAS2, INTS10, ACO1, PCBP4, RNASE3, PTGES3L-AARSD1, RNASE13, DDX4, and PCF11) and effectively predicted the overall survival of OSCC patients. The area under the ROC curve (AUC) of the risk score model was 0.781, suggesting that our model had good prognostic performance. Finally, we built a nomogram integrating the ten RBPs. The internal validation cohort results showed a reliable predictive capability of the nomogram for OSCC. Conclusions: We established a novel ten-RBP-based model for OSCC that could enable precise therapeutic targets in the future.


Introduction
Oral cancer, representing a group of tumours located in the alveolar ridge, buccal cavity, mucosa, oor of the mouth, palate, tongue and other parts of the oral cavity, is the eighth most frequent cancer and accounts for an estimated 350 000 new cases and 170 000 deaths per year worldwide. Of these, oral squamous cell carcinoma (OSCC) accounts for more than 95% of oral tumours. A major proportion of global OSCC cases are diagnosed in Asia [1]. Despite some developments in the therapeutic strategies of OSCC that have been achieved over the last few years, the long-term survival rates of OSCC patients are still extremely low, and despite the promising approval of checkpoint inhibitors, precise targeted therapies remain limited in changing treatment failures owing to individual differences in genetic and epigenetic changes among different OSCC patients [2][3][4]. Thus, a systematic study to identify key biomarkers for diagnosis and effective targets for treatment is urgently needed for OSCC.
Nomograms are statistics-based tools that can effectively integrate different variables and are widely used in detecting prognostic factors for various cancers [5,6]. Many models have been established for predicting the outcomes of OSCC by calculating the risk of clinicopathologic characteristics, such as age, sex, smoking status, TNM stage and therapies [7,8]. Although these studies have enhanced our comprehension of OSCC, few have taken into account genetic features, and they are not accurate and reliable enough, as the underlying molecular mechanisms of OSCC might be considerably complex [9]. RNA-binding proteins (RBPs) are proteins that regulate the functions of RNA transcripts at multiple levels [10]. Currently, a total of 1542 RBPs have been identi ed, and more than 800 mRNAs have been discovered in human cells [11,12]. These RBPs bind RNA through the globular RNA-binding domains (RBDs) and affect the splicing, stability or translation of their target RNAs [13,14]. Considering the importance of RBPs in many cellular processes, it is unsurprising that defects in their functions affect several human diseases, including, neurological disorders, muscular atrophies and cancer [15]. Many studies have indicated that the dysregulation of RBPs is related to the biological behaviours of various cancer cells. For instance, human antigen R (HuR) is widely overexpressed in cancer cells and is known to be involved in posttranscriptional gene regulation [16]. HuR not only binds to protein-coding mRNAs but also affects lncRNAs with both positive and negative in uences on their stability [17]. Serine/arginine-rich splicing factor 1 (SRSF1) is another RBP that is upregulated in glioma and handles oncogenic impacts by regulating splicing, RNA stability, and nuclear export [18]. The abnormal expression of heterogeneous nuclear ribonucleoprotein K (hnRNPK) is frequently observed in cancer and might play an important role in hepatocellular carcinoma (HCC) by changing the localization of MALAT1 [19]. Insulin-like growth factor 2 mRNA-binding protein 1 (IGF2BP1) may play an important role in regulating cell proliferation and apoptosis in HCC [20]. Mitogen-activated protein kinase-activated protein kinase-2 (MAPKAPK2) regulates transcript stability and exerts important effects in head and neck squamous cell carcinoma [21]. To sum up, these previous studies have shown that RBPs may play important roles in tumorigenesis and tumour progression. However, only a few of them have been studied recently and revealed to have critical functions in cancers.
In this study, we integrated all relevant OSCC information from The Cancer Genome Atlas (TCGA) database to perform a novel systematic analysis. To reveal the potential functions and clinical signi cance of RBPs in the pathogenesis of OSCC, we identi ed multiple differentially expressed RBPs associated with OSCC. Some of them may serve as potential biomarkers for diagnosis and prognosis.

Dataset Collection and Identi cation of Differentially Expressed RBPs
The RNA-sequencing data of 32 normal oral tissues and 331 OSCC tissues with corresponding clinical information were obtained from TCGA (https://portal.gdc.cancer.gov/). Approval of the institutional ethics committee was not necessary because the data were collected from publicly available online databases. To identify the differentially expressed genes between normal oral and OSCC tissue samples, all raw data were preprocessed by the Limma package (http://www. bioconductor.org/packages/release/bioc/html/limma.html), and genes with an average count value of less than one were excluded. In addition, we also identify the differently expressed RBPs with the thresholds of |log2-fold change (FC)| ≥1 and false discovery rate (FDR) <0.05.

KEGG Pathway and GO Enrichment Analyses
The biological functions of these differentially expressed RBPs were systematically detected by Gene Ontology (GO) enrichment and Kyoto Ecyclopedia of Genes and Genomes (KEGG) pathway analyses. The GO analysis included terms in the cellular component (CC), molecular function (MF), and biological process (BP) categories. All enrichment analyses were carried out by WEB-based Gene Set Analysis Toolkit (WebGestalt, http://www.webgestalt.org/) [22]. For both analysis, P values less than 0.05 and gene numbers greater than 5 were considered statistically signi cant.

PPI Network Construction and Module Screening
The protein-protein interactions (PPIs) among all differently expressed RBPs were investigated in the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (http://www.stringdb.org/) [23]. Cytoscape 3.8.0 software was used to further visualize the PPI network. The key modules and genes in the PPI network were selected by using the Molecular Complex Detection (MCODE) plug-in with both MCODE scores and node count numbers greater than 5 [24]. A P value<0.05 was considered to indicate a signi cant difference.

Prognosis-related RBP Identi cation and Prognostic Model Construction
We constructed Univariate Cox regression analysis on the differentially expressed RBPs in the critical modules of the training dataset using the survival R package. Subsequently, we performed the log-rank test to extract the prognosis-related RBPs. Based on the above preliminarily screened signi cant candidate RBPs, a multivariate Cox proportional hazards regression model was further performed and the risk score (RS) was calculated to predict the prognostic outcomes of OSCC patients [25]. The risk score formula for each sample was as follows: RS = Expression gene1 × β gene1 + ··· + Expression genen × β genen (where β is the regression coe cient derived from multivariate Cox regression). According to the median RS analysis, the OSCC patients were divided into low-risk and high-risk groups. The log-rank test was used to compare the difference in overall survival (OS) between the two groups. Besides, the survival ROC package was provided to assess the prognostic capability of the above model, and receiver operating characteristic (ROC) curve analysis was executed [26]. In addition, an internal validation cohort was used to test the predictive capability of the nomogram. Calibration curves were constructed using the rms R package to compare the observed probabilities and the predicted probabilities of the nomogram. A P value <0.05 indicated a signi cant difference.

Identi cation of Differentially Expressed RBPs in OSCC
The work ow of this study is illustrated in Figure 1. RNA-sequencing data for OSCC and corresponding clinical information were downloaded from the TCGA database. In total, 331 OSCC samples and 32 normal oral samples were analysed. The Limma R software package was applied to preprocess these data and to identify the differentially expressed RBPs [11]. A total of 1542 RBPs were analysed in this study, and 257 met our inclusion criteria (P<0.05, |log2FC)| >1.0), which consisted of 121 downregulated and 136 upregulated RBPs. The expression of these differentially expressed RBPs is shown in Figure 2.

Functional Enrichment Analysis of the Differentially Expressed RBPs
The identi ed RBPs were divided into two groups, the upregulation group and the downregulation group, to explore their potential functions and mechanisms. We carried out GO and pathway analyses for these differentially expressed RBPs. The results showed that the upregulated RBPs were signi cantly enriched in defense response to virus and regulation of mRNA metabolic process (Table 1, Figure 3A), while the downregulated RBPs were considerably enriched in mRNA processing, RNA splicing, and RNA catabolic process (Table 1, Figure 3B). Regarding the CC category, the upregulated RBPs were enriched in cytoplasmic ribonucleoprotein granule, ribonucleoprotein granule, and spliceosomal complex (Table 1, Figure 3A), while the downregulated RBPs were enriched in the ribosome, ribosomal subunit, and mitochondrial matrix (Table 1, Figure 3B). The MF analysis indicated that the upregulated differentially expressed RBPs were mainly enriched in catalytic activity acting on RNA, nuclease activity, and helicase activity (Table 1, Figure 3A), whereas the downregulated RBPs were notably enriched in mRNA binding, catalytic activity acting on RNA, and nuclease activity (Table 1, Figure 3B). Additionally, the results of KEGG pathway analysis showed that the upregulated RBPs were enriched in the RNA transport, spliceosome, and mRNA surveillance pathways (Table 1, Figure 3C), while the downregulated RBPs were mainly enriched in the RNA transport, ribosome, and mRNA surveillance pathways (Table 1, Figure 3D).

PPI Network Construction and Key Module Screening
To better understand the potential roles of these identi ed RBPs in OSCC, we performed a PPI network using the STRING database and Cytoscape software. This PPI network included 228 nodes and 1384 edges in total ( Figure 4A). We further analysed the coexpression network to detect potential critical modules using the MCODE plug-in and identi ed the key modules ( Figure 4B). Of these, the rst essential module, module 1, contained 49 nodes and 316 edges, the second module, module 2, consisted of 8 nodes and 28 edges, module 3 consisted of 13 nodes and 43 edges, module 4 consisted of 6 nodes and 12 edges, the next module consisted of 4 nodes and 6 edges, and the last module consisted of 16 nodes and 29 edges (Supplementary Figure 1A-F). The functional analysis showed that the RBPs in these key modules were mainly enriched in mitochondrial translation, mitochondrial gene expression, and cellular protein complex disassembly. More details are provided in Supplementary Table 1.

Identi cation of Prognosis-related RBPs
Of the differentially expressed RBPs from the PPI network, 20 prognosis-associated candidate hub RBPs were identi ed by univariate Cox regression analysis (Supplementary Table 2). The candidate RBPs were analysed by multivariate Cox regression to determine their impact on patient survival time and clinical outcomes. Ten hub RBPs, ZC3H12D, OAS2, INTS10, ACO1, PCBP4, RNASE3, PTGES3L-AARSD1, RNASE13, DDX4 and PCF11, were identi ed as independent predictors in OSCC (Supplementary Table 3).
A total of 221 OSCC patients were divided into low-risk and high-risk groups according to the median RS. The results showed that patients in the low-risk group had signi cantly longer OS times than those in the high-risk group ( Figure 5A). The time-dependent ROC analysis was further conducted to estimate the prognostic ability of the identi ed RBPs. From the chart, we can see that the area under the ROC curve (AUC) of the risk score model was 0.781 ( Figure 5B), which suggested that our model has high prognostic accuracy. The expression heat map of the ten prognostic RBPs is illustrated in Figure 5C, and the RS and survival status for the low-and high-risk groups are displayed in Figure 5D. Moreover, the results from our internal validation cohort, using the same formula to assess the predictive capability of the ten-gene prognostic model, showed that patients with lower risk scores had better OS ( Figure 6A-D). These results indicated that the prognostic model has signi cant predictive capability.

The Prognostic Value of Different Clinical Parameters
To assess the prognostic signi cance of different clinical features, we performed a univariate Cox regression analysis of OSCC patients in the training set and internal validation set. The results showed that stage, grade, age and RS were correlated with the OS of OSCC patients in both cohorts (P<0.05) ( Figure 7A, B), whereas only stage and RS were independent prognostic factors for OS in the multiple regression analysis of both cohorts (P<0.05) ( Figure 7C, D).

Development and Validation of a Nomogram Based on the Hub RBPs
We established a nomogram for predicting the 1-, 2-and 3-year OS based on the multivariate analysis results ( Figure 8A). The points for each independent prognostic factor listed in the nomogram were summed the total points value, which is projected on the bottom scale and used to determine the 1-, 2and 3-year OS probabilities for each patient. We plotted the calibration curves, which indicated that there was no apparent deviation from the ideal line, with optimal accordance achieved between the nomogrampredicted outcomes and the actual survival rates in both the training and validation cohorts ( Figure 8B, C).

Discussion
The survival rate of OSCC patients remains low since there are limited prognostic factors that could provide effective targeted therapies and precisely predictable outcomes [9]. In recent years, the regulation of gene expression by RBPs has become the centre of attention [27]. Previous studies have demonstrated that RBPs, which regulate oncogene expression at the posttranscriptional level, signi cantly affect tumorigenesis and progression, indicating that RBPs are potential cancer biomarkers [28]. In fact, several RBPs have been identi ed to predict OS in cancer. Zhao et al identi ed a novel extra-ribosomal role of RPS3 as a potential therapeutic target in HCC via the posttranscriptional regulation of SIRT1 expression [29]. Soni et al highlighted that MK2, as the master regulator of RBPs, plays an important role in the regulation of transcript stability and tumour progression, as well as the possibility of the use of MK2 as a therapeutic target in tumour management [30]. However, the role of RBPs is rarely understood in OSCC, even though there have been studies focused on the functions of RBPs in head and neck carcinoma. Berggen et al revealed that MK2 could be a potential prognostic biomarker for head and neck cancer and that MK2 pathway activation can mediate radiation resistance in head and neck squamous cell carcinoma (HNSCC) [31]. Jiang et al indicated that LINC00460 is a promising candidate prognostic predictor for HNSCC by facilitating the entry of an RBP called PRDX1 into the nucleus and promoting epithelial-mesenchymal transition (EMT) in HNSCC cells [32]. However, most of these studies analysed the functions of single RBPs, while information is now rapidly emerging on the vital functional role of the molecular network in tumour initiation and progression, indicating that we should investigate prognostic markers collectively [33] In the present study, we identi ed dysregulated RBPs in OSCC based on the TCGA database and systematically analysed the related biological pathways of these RBPs. We further established a PPI network to screen the modules that may play key roles in OSCC. Subsequently, we executed Cox regression analysis and obtained ten prognosis-related RBPs. Moreover, we carried out risk score analysis of this ten-gene prognostic model in the training and validation cohorts to explore the potential biological functions and clinical signi cance of these ten hub RBPs. The model exhibited good predictive capability in both cohorts. Finally, we constructed and validated a nomogram to predict OSCC prognosis based on these hub RBPs. Our ndings may provide novel biomarkers for indicating the prognosis of OSCC patients.
The biological function and pathway enrichment analyses of the differentially expressed RBPs showed that the upregulated RBPs were signi cantly enriched in the defense response to virus, response to virus, regulation of mRNA metabolic process, RNA transport, RNA degradation, spliceosome, and mRNA surveillance pathways. The downregulated RBPs were mainly enriched in the mRNA processing, RNA splicing, RNA catabolic process, RNA transport, ribosome, and mRNA surveillance pathway.pathways. Many studies have reported the role of metabolism, RNA degradation and RNA transport in OSCC [34][35][36]. Human papillomavirus (HPV) is closely associated with OSCC [37]. HPV infection signi cantly increases the number of binding sites for RBPs and subsequently upregulates a group of oncogenic genes by absorbing RBPs [38]. A previous study showed that RBPs can regulate mRNA stability or translational e ciency via ribosomes and are involved in various diseases [39]. These results suggest that RBPs can affect the biological behaviour of OSCC by interacting with components related to HPV or regulating multiple biological processes of OSCC cells.
We obtained ten hub RBPs by constructing a PPI network and performing Cox regression analysis: ZC3H12D, OAS2, INTS10, ACO1, PCBP4, RNASE3, PTGES3L-AARSD1, RNASE13, DDX4, and PCF11. Most of them are associated with tumorigenesis and the progression of cancer patients, such as OAS2 [40], INTS10 [41], ACO1 [42], PCBP4 [43], DDX4 [44] and PCF11 [45]. Of these, OAS2, an interferon (IFN)induced antiviral enzyme, has been reported as an OSCC-related gene. Similarly, Wang et al analysed three microarray datasets and found that the expression of OAS2 was closely related to the survival rate of patients with oral cancer [46], which was consistent with our results. Interestingly, the IFN system is a central and powerful host rst-line antiviral defence strategy that is inhibited by HPV 16 in HNSCC [47]. A signi cant level of OAS2 could activate the IFN signalling pathway and in turn produce a T-cell response that is effective in clearing HPV infection [48].
Thus, OAS2 may serve as a key regulator and may provide rational insight into immunotherapy for HPV OSCC. However, the detailed mechanisms underlying these hub RBPs in OSCC are still unexplored, and further in-depth studies to discover their potential roles would be valuable.
Overall, these results suggested that our prognostic model of the ten-RBPs has a certain value for the treatment decisions of OSCC patients. Nonetheless, our study had several limitations. First, the multivariable survival analysis contained only basic prognostic factors from the TCGA database and was unable to provide other possible clinical factors, such as the status of metastatic lesions and the HPV status of patients, which may affect the reliability of the Cox regression analysis. Second, extensive evidence indicates that OSCC is an extremely heterogeneous tumour at the genetic and molecular levels, and limited by the data of the study, the expression of all genes from the TCGA cohort was detected in a piece of OSCC tissue from one patient. In the future, we will detect the expression of the ten genes by single-cell whole-genome sequencing or quantitative RT-PCR analysis in several pieces of OSCC specimens from one patient to determine whether the ten-gene signature is a reliable and workable OS prediction marker for OSCC. In addition, we will seek cooperation with other hospitals to obtain more clinical patient cohorts for gene model validation.

Conclusion
We systemically explored the prognostic value of differentially expressed RBPs by performing a series of bioinformatics analyses in OSCC. A prognostic model of the ten hub RBPs was constructed and further validated. Our results were used to establish a novel prognostic model for OSCC and will contribute new candidate markers to aid in making therapeutic decisions in the future.