3.1 Differentially expressed RBPs in CRC and their enrichment network
We acquired an exhaustive list of 1542 RBPs from the literature published by Gerstberger et al. (Supplementary Table 1). The CRC differential genes were downloaded from the GEPIA 2 online database, and it was found that there were 71 down-regulated RBP genes and 229 up-regulated RBP genes in CRC (Supplementary Table 2). In order to understand the potential mechanism of RBP in CRC, KEGG analysis and GO analysis were performed on the above 300 differentially expressed RBP genes (Figure 1).
3.2 RBPs in CRC participate in the network regulation of AS
It could be seen from Figure 1 that RBPs were enriched in the spliceosome pathway, indicating that RBPs were important part of participating in AS. As we all know, AS events are mainly regulated by splicing factors. Splicing factors bind to pre-mRNA and affect exon selection and splicing site selection [13]. More importantly, AS disorders in the tumor microenvironment may be caused by a limited number of splicing factors [14]. Thus, an important question was whether a large part of these AS events related to the prognosis of CRC (Supplementary Table 3) may be regulated by some key splicing factors. According to the data collected by zhen et al. in 2018 [15], most of the splicing factors belong to RBPs. Therefore, the regulatory relationship between RBPs and AS events was worth exploring. For this purpose, the expression of RBPs from the RNA sequencing data of the TCGA CRC cohort was analyzed. Next, in the CRC cohort, through the splicing regulation network established by significant correlation (|R|>0.55, t test, P<0.05), the correlation between the expression levels of these 1542 RBPs and the PSI value of each AS events related to the prognosis of CRC was analyzed. In the splicing-related network shown in Figure 2, there were 31 AS events related to the prognosis including 24 up-regulated AS events (red dots) and 7 down-regulated AS events (green dots) significantly correlated with 22 RBPs (purple dots). Interestingly, most RBPs (purple dots) were associated with more than one AS event, some of which played an opposite role in the regulation of differential AS events. Moreover, our network revealed that different splicing factors competed for the same binding site (AS event), which at least partly explained why the transcript was able to produce several different splicing isoforms. In addition, we observed from the Figure 2 that DDX39B as RBP had the most AS events, mainly up-regulating AS events. This showed that DDX39B may be a key factor in modulating AS events related to CRC prognosis.
3.3 Potential mechanisms that may be involved in the regulation of differentially expressed RBP genes in CRC
Gene mutations are ubiquitous and occur spontaneously. Not all mutations cause obvious changes in cell functions [16]. However, mutations in key cellular genes cause developmental disorders, which is one of the main ways that proto-oncogenes are transformed into a carcinogenic state [17]. The gradual accumulation of multiple mutations in life will result in cancer, which is also one of the important mechanisms for the occurrence and development of CRC [18]. We speculated that the underlying mechanism of RBP gene differential expression may be caused by mutations, so we used 8,920 CRC samples on the COSMIC platform to investigate the occurrence of differentially expressed RBP gene mutations. We were surprised to discover that all the differentially expressed RBP genes were mutated in CRC samples, and the mutation frequency of 11 RBP genes (PRKDC, RBMS3, SRRM2, HELZ2, MSI2, AFF3, DZIP1, TNRC6A, SND1, QKI, ESRP1) was more than 5% (Supplementary Table 4). Then, in order to further understand the mutations of the first 11 RBP genes, we analyzed 526 TCGA colorectal adenocarcinoma samples (TCGA, pancancer Atlas) on the cBioportal platform, and 180 (34%) samples showed mutations (Figure 3A). The mutation frequency of these 11 differentially expressed RBP genes in CRC was indeed high, and the mutation frequency of PRKDC and HELZ2 was more than 10% in the Figure 3A. These two genes may be key therapeutic targets for CRC.
Additionally, we also noticed the role of epigenetics in CRC. Therefore, we investigated the methylation of differentially expressed RBP genes on 8920 CRC samples on the COSMIC platform and revealed that 21 differentially expressed RBP genes had DNA methylation changes (Supplementary Table 5). Thus, we used the TCGA 450k array to perform independent prognostic analysis of TCGA CRC methylation sites and determined that 64 RBPs with prognostic methylation sites (Supplementary Table 5) included 10 differentially expressed RBP genes (RPL37, NOL10, CD3EAP, EIF5A, OASL, NHP2, RRS1, NUFIP1, RRP12 and EIF4E). Then we tested the correlation between the degree of methylation of these 10 genes on CRC cell lines and their expression on the CCLE platform. It was observed that the degree of RPL37 methylation was negatively correlated with its mRNA expression level (Figure 3B).
3.4 Construction of RBPs related prognosis model
RBPs play an important role in the occurrence and development of CRC [19]. We collected 1,855 survival-related genes for CRC through the GEPIA 2 database and found that among the 1,542 RBPs that have been cataloged, only 96 RBPs were related to the survival of CRC. Simultaneously, CRC transcripts and clinical information from the TCGA platform were downloaded, and data integration was carried out to obtain the clinical information of 540 CRC patients using R software. Thus, the 96 RBPs obtained above were subjected to univariate cox regression analysis to further screen twenty RBPs related to CRC survival (Figure 4). Next, using R software, the above 540 CRC patients were randomly divided into two groups according to their survival status (to ensure that the number of surviving patients and the number of dead patients is not much different), namely the train group and the test group (Supplementary Table 4). Finally, according to the train group, a multivariate analysis was performed to construct a CRC prognostic prediction model of five genes (CAPRIN2, RPL3L, CCAR2, GSPT1 and MRPS18C) (Table 1).
3.5 Validation of RBPs related prognostic model
By using the risk score formula to combine the effects of each of these five RBP genes, the RBP risk score was calculated for each patient in the train group and the test group. According to the RBP risk score, CRC patients were divided into low-risk and high-risk groups (Figure 5). We conducted a survival analysis on the risk scores of the train group and the test group and determined that the risk scores were both poor prognostic indicators (Figure 6). For further verification, univariate and multivariate independent prognostic analysis involving age, gender, and stage in the train group and the test group determined that the RBP risk score was an independent predictor of patient survival (Figure 6). Lastly, the ROC curve of the train group and the test group was drawn using R. According to its area under the curve (AUC) value, the accuracy of the model was basically at a medium level (Figure 7).
3.6 Genetic alterations of five RBP signatures in CRC
We put the five RBP signatures of the prognostic model on the Oncomine platform (Hong colorectal statistics) and compared the CRC samples with the normal samples. We observed that the transcription level of CAPRIN2, GSPT1 and CCAR2 (CCAR2 is also known as DBC1) showed an upward trend, while the transcription level of RPL3L and MRPS18C decreased in CRC (Figure 8). Subsequently, we performed RT-PCR detection of CAPRIN2 mRNA in six existing CRC cell lines and normal intestinal epithelial cells (Figure 9A), which confirmed that CAPRIN2 was indeed highly expressed in CRC. Besides, we investigated the transcription levels of the five signatures in various CRC cell lines on the CCLE platform. The expression of RPL3L mRNA in CRC cell lines of patients with TNM stageⅠwas lower than that of patients with TNM stageⅡ-Ⅳ(Figure 9B). In addition, except for the LS411N cell line (from poorly differentiated CRC patient), the expression of CCAR2 mRNA in CRC cell lines of patients with Ducks’ type B was significantly lower than that of patients with Ducks’ type C-D, indicating that CCAR2 may be related to the progression and malignant degree of CRC (Figure 9C). Finally, we analyzed the immunohistochemical results of five signatures in normal colorectal tissue and CRC on the platform of The Human Protein Atlas (https://www.proteinatlas.org/). According to the results of immunohistochemistry, the expression level of MRPS18C in CRC was lower than that in normal tissues (Figure 9D and 9E), which was consistent with the result of mRNA level.