Development and Validation of A Seven-microRNA Signature as a Novel Prognostic Biomarker for Bladder Cancer


 Background The differential expression of miRNAs has played a significant role in bladder tumors. The aim of our study was to screen new biomarkers . Methods Through differential analysis of bladder cancer mRNA and miRNA expression data in the TCGA, differential genes and miRNAs were screened. Furthermore, Cox univariate analysis and multifactor analysis were used to establish a prognostic prediction model . The predictive ability of the prognostic model was then verified on the patient. The action mechanism of these miRNAs was analyzed.Results By the differential analysis and standardization of miRNA expression profiles. Differentially expressed miRNAs were screened, then all the patients were then randomly divided into train group and the test group. 23 miRNAs were revealed , then a Seven-miRNA signature prognostic biomarkers was constituting.Univariate cox regression and multivariate cox regression considering other clinical factors displayed that the seven-miRNA signature could serve as an independent prognostic factor.Target genes of these seven miRNAs were analyzed by KEGG signaling pathway and GO enrichment analysis. . Conclusion The prognostic model constructed by seven miRNAs has possessed certain degree of sensitivity and specificity for the prediction of the survival of bladder cancer patients, which can be used as a potential new clinical marker for bladder cancer patients.

Introduction Bladder cancer (Bca), one of the most commonly-seen malignant tumors of the urogenital system, has ranked the tenth among the most commonly-seen tumor globally, with an estimated number of 549,000 new cases and 200,000 deaths annually [1,2]. However, most patients (70-75%) with bladder cancer were initially diagnosed as non-muscle invasive bladder cancer (NMIBC), with super cial location. About 70% of them will relapse after treatment [3]. In addition, 25-30% of patients were diagnosed as muscle invasive bladder cancer (MIBC), often accompanied by local or systemic metastases [4,5]. In recent years, with the development of science and new technology, diagnosis and treatment of bladder cancer have made substantial progress, yet the prognosis still lags behind with poor performance. Currently, the main prognostic factors, namely the staging and pathological grading of tumor lymph node metastasis (TNM), were not sound enough to predict the clinical outcome alone [6] These clinical and pathological risk factors fail to clearly predict the risk of relapse, chemotherapy response, and overall survival (OS) of patients. Therefore, it is of great necessity to include new prognostic biomarkers into the current staging system, and these biomarkers can also be used as therapeutic targets.
miRNA functions as an endogenous small single-stranded non-coding RNA that does not contain an open reading frame (ORF) and it does not encode the protein by itself [7]. Most studies have found that it is about 22 nt in length, usually ranging between 21 and 25 nt. Its expression is characterized by speci city in tissue and time [8,9], which can regulate its translation and stability through complementary combination with mRNA, thus playing a role of posttranscriptional regulation. Although miRNAs tend to be relatively small in number, more than 1/3 of the cell transcriptome is regulated by miRNA. miRNA targets for mRNA, which regulates gene expression at the post-transcriptional level. Through its combination with the 3'-UTR region, the following effects can be produced: during its period of translation initiation and peptide chain elongation, translation is stagnated and non-functional protein are produced with cleavage of miRNAs, and degradation of stability are caused by the removal of mRNA from poly-A tails. Because some genes regulated by miRNA are closely related to the cell cycle, they are considered to have played an essential role in tumorigenesis which may become potential proto-oncogenes or tumor suppressor genes [10]. In addition, related studies have demonstrated that miRNAs participate in cell growth, proliferation, differentiation, and apoptosis by regulating the expression of genes encoding proteins [11]. Studies have suggested that miRNAs function as potential biomarkers for early cancer detection and accurate prognosis, as well as targets for more effective treatments [12]. To date, however, only a few reports have described miRNA expression and its relationship with bladder cancer survival.
This study is based on miRNA sequencing data in the TCGA database, and a prognostic prediction model composed of multiple miRNAs is constructed to verify the prognostic predictive value of the prediction model by evaluating the differential expression of miRNAs. At last, combined with transcriptome data, the action mechanism of these miRNAs was analyzed through multi-omics analyses.

Data Download
The transcriptome data of 438 patients with bladder urothelial carcinoma were downloaded from the Cancer Genome Atlas (TCGA) o cial website (https://portal.gdc.cancer.gov/repository) [13]. The types of data include (transitional cell papilloma and carcinoma, epithelial tumor, adenoma and adenocarcinoma, squamous cell tumor) and miRNA expression pro le data. All mature miRNA sequences (mature.fa) in Fasta format were all downloaded from the miRBase website (http://www.mirbase.org/). We combined these two sets of data in the Perl language to obtain the expression pro le information of each mature miRNA and its corresponding clinical information (data format: BCR XML). Please check for speci c baseline data (Table 1). They were randomly divided into two groups: train group (204 cases) and test group (204 cases).

Differential Analysis Of Mrna And Mirna
We adopted the R language edgeR package to normalize and analyze the expression values of mRNA and miRNA sequencing matrices in normal and tumor samples. The screening criteria was set to be | log2FC |> 1 and P < 0.05, aiming at obtaining the expression matrix of mRNA and differentially expressed mRNAs. Afterwards, we combined them with differentially expressed and standardized miRNAs and mRNA expression pro les for subsequencing analysis. Then we used the the "pheatmap" package of the R language package to draw the visualized differential heatmap and volcano map of the differential mRNA and differential miRNA, respectively.

Model Establishment And Validation
We used the "caret" package of version 3.6.1 in R language to randomly divide the samples with complete survival information and differentially expressed miRNA expression pro les into two groups (train group and test group). Then through the survival package of R language, the Coxph function was used to conduct Cox univariate regression analysis on miRNAs in the train group. In order to reduce the number of miRNAs that expressed similarly, a stepwise multivariate Cox regression analysis was performed on miRNAs with p-values <0.005 to construct a prognostic model. In the multivariate Cox regression analysis, we took advantage of the "Coxph" and "direction = both" features of the R survival package. Therefore, based on the sum of the following results, a calculated risk score containing multiple prognostic miRNAs and an extreme coe cient of expression of each miRNA was established. In addition, we tested the hypothesis of proportional hazards in the Cox model. Based on the median risk score grouping, the model was used to evaluate the survival prognosis of each patient in the train group, the test group, and the entire group, using the Kaplan-Meier curve and Log-rank test, ie high-risk group and low-risk group. The predictive power of miRNA models was evaluated by calculating the AUC of the 3-year dependent ROC curve using the "survivalROC" package and its speci city.

Prediction of miRNA Target Genes and Construction of miRNA Regulatory Network
We downloaded miRNA prediction databases from three miRNA target gene prediction websites, including miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/), targetScan (http: // www. Targetscan.org), and miRDB (http (http://www.mirdb.org/). Besides, we used Perl language to nd at least 2 target genes regulated by miRNAs in the database. In the meantime, we adopted the "VennDiagram"package of R software to draw a visualized Wayne diagram. We used the cytoscape software to draw a prepared miRNA and mRNA target gene regulatory relationship data to visualize the miRNA regulatory network diagram.
Gene Ontology Enrichment Analysis and Survival Analysis of Genes in MiRNA Regulatory Network In order to further explore whether these miRNA target genes may be involved in the progression of bladder urothelial carcinoma, we intersected these target genes and differentially expressed genes in bladder urothelial carcinoma and extracted the target genes to be used in bladder urothelial carcinoma which is regulated by miRNAs. All these intersecting genes were analyzed with the R language "clusterPro ler" package and the (KEGG) signaling pathway. Gene ontology (GO) enrichment analysis was further conducted. "Organization.Hs.eg.db" software package set the cut-off criteria as: p adjustment < 0.05 and q value < 0.05, to draw a visualized graph. We used the survival package of R language to conduct batch survival analysis on genes in the regulatory network to visualize the graph.

Construction of Protein Interaction Network Map in MiRNA Regulatory Network and Selection of Core Genes
We used the string website (URL: http://string-db.org/cgi/input.pl) to build the protein interaction network map, set the medium parameter to 0.400, and then downloaded the network relationship map [14]. Then, through the cytoHubba plug-in in the cytoscape software, the central genes were screened based on the degree value and the core gene PPI network map was constructed.

Screening of Differentially Expressed mRNAs and miRNAs
We extracted a total of 7892 mRNAs and 856 miRNAs from the expression pro le of urinary bladder urothelial carcinoma of TCGA. Through R language software, we used the edge R package to set the P value to less than 0.05 and log2FC > 1 as differentially expressed mRNA and miRNA. A total of 1111 differentially expressed genes (including 545 up-regulated genes and 566 down-regulated genes) and 473 differentially-expressed miRNAs were screened.

Roc Curve And Km Survival Analysis
Based on the model, we used the R language programming survival package and survivalROC package to draw relevant graphs. From the gure, we can observe that the area under curve(AUC) of the train group, test group, and all sample groups were 0.830, 0.722, and 0.770 respectively, all of which were greater than 0.7, indicating that this prognostic model has sensitivity and speci city in predicting the survival of patients with bladder cancer. (Fig. 3D-F). At the same time, (Fig. 3A-C) it can be seen that there was a signi cant difference between the high and low risk groups in the train group, the test group, and all sample groups (P < 0.05), and the survival rate decreased as the risk value increased. The train group showed that the 5-year overall survival rates for the lowand high-risk groups were 32.0% and 70.8%, respectively. The test team demonstrated that the 5-year survival rates for the overall high-risk and low-risk groups were 34.3% and 62.7%, respectively. The entire group showed that the 5-year overall survival rate of the high-risk and low-risk groups were 36% and 56.9%. We could observe that (Fig. 3H-I) it was the visualization results of the sample survival status of the train group, test group and all sample groups. From the gure, we could nd that as the risk value increases, the number of death also increases. This evidence suggests that the model was sensitive and speci c in predicting the survival of bladder cancer.

Independence Of Clinicopathological Factors
Univariate Cox regression analysis showed that a model constructed from 7 miRNAs was signi cantly related to overall patient survival (hazard ratio HR = 1.592, con dence interval 95% CI = 1.365-1.856, p = 3.19E-09; Table 3). Multivariate Cox regression analysis indicated that the comprehensive consideration of other commonly-seen clinical factors (HR = 1.605, 95% CI = 1.357-1.898, p = 3.28E-08), and other clinical factors including clinical stage, T stage, lymph node status, and distant metastasis, which still has nothing to do with overall survival rate. In addition, the patient's gender and age make it possible to become a prognostic indicator of bladder cancer in the future. At the same time, the patient's age also serves as an independent prognostic factor (HR = 1.030, 95% CI = 1.002-1.059, p = 0.037). The risk score of the model and other clinical characteristics ROC curves indicate that the risk score (0.698), clinical stage (0.637), T stage (0.623), lymph node status (0.631), and distant metastasis (0.523). These results indicate that this model still has good predictive ability ( Fig. 4). In this regard, we can observe that risk score can be adopted as an independent prognostic factor for predicting the survival of bladder cancer patients.

Prediction And Visualization Of 7 Mirna Target Genes
The seven miRNAs we selected were used to predict target genes from three databases (miRDB, miRTarBase, and Targetscan). We used the R software VennDiagram package to draw a visualized Wayne diagram to nd reliable targets. (Fig. 5) In order to improve the reliability of bioinformatics, we used targets that were supported by all 3 databases, which could be used as targets for miRNAs. From: 4 we can observe that we have screened out 7 miRNA including hsa-miR-33b-5p, hsa-miR-33a-5p, hsa-miR-590-3p, hsa-miR-576-5p, hsa-miR-200b-3p, hsa-miR-186-3p, and hsa-miR-3934-5p,together with 17, 33, 169, 31, 74, 49 of hsa, and 22 overlapping bases. Seven miRNAs were predicted to have a total of 395 target genes.

Construction Of Mirna Regulatory Network
We can see from the regulatory network diagram that in order to explore whether these miRNA target genes may be involved in the progression of bladder urothelial carcinoma, 1111 differentially expressed genes obtained above (among which 545 genes were up-regulated and 566 genes were down-regulated) and 473 differentially expressed miRNAs (380 differentially expressed miRNAs up-regulated and 93 down-regulated) were used for analysis.7 up-regulated miRNAs (hsa-miR-33b-5p, hsa-miR-33a-5p, hsa-miR-590-3p, hsa-miR-576-5p, hsa-miR-200b-3p, hsa-miR-186-3p and hsa-miR-3934-5p) and 247 downregulated mRNAs. (The regulatory relationship between MiRNA and its target genes was shown in Fig. 6.)  Fig. 7A-C). Among these three categories, BP analysis mainly includes cell junction, adhesion, migration, protease phosphorylation, regulation of enzyme activity and smooth muscle cell proliferation and differentiation. CC analysis mainly includes adhesion plaques, actin, extracellular matrix, brotic proteins, etc. MF analysis mainly includes metal ion transmembrane transporter activity, transcription activator activity, DNA binding, and ion channel binding. The result of the KEGG pathway of target genes associated with bladder urothelial carcinoma was 18 (Table 4), of which counts > 10 were mainly enriched in the MAPK signaling pathway, cGMP-PKG signaling pathway, the pi3k signaling pathway, the TGF-β signaling pathway, the interaction of miRNAs in tumors, and the regulation of in ammatory cell interaction. In addition, in order to provide a complex relationship between the target gene and the relative KEGG pathway, we adopted the R language software package to make a visual network diagram of the pathways (Fig. 7).

Construction Of Protein Interaction Network
We used the plug-in cytoHubba calculation in the cytoscape software to sort the network nodes in descending order based on the degree value, select the top ten genes as the central genes, and construct the central network map. The results indicate that 136 of the 247 target genes were ltered to the target genes for the PPI network complex, the top ten genes based on the degree value were (MYC, VCL, MMP2, ITPKB, FOXO1, SMAD7, CDKN1A, SRF, CAV1, KLF4) .(The speci c genetic information were shown in Fig. 8 and Table 5)

Discussion
Bladder cancer has been regarded as a highly malignant tumor that can undergo lymphatic metastasis, hematogenous metastasis, implantation metastasis, which was prone to recurrence. The recurrence rate of non-muscle invasive bladder cancer can reach as high as 50-70%. A small number of patients can progress to muscular invasive bladder cancer after relapse, which severely affects the survival prognosis of patients [15]. Therefore, it was becoming increasingly urgent for patients to nd prognostic markers with high speci city and sensitivity. MiRNAs cause degradation or translational inhibition of target miRNAs through speci c base pairing with target mRNAs, and regulate gene expression at the post-transcription level, thereby regulating tumor cell proliferation and apoptosis [11]. A miRNA can target one or more mRNAs, while multiple miRNAs can also simultaneously targets one mRNA [16]. Therefore, abnormally-expressed miRNAs were believed to play a critical role in tumorigenesis, which may not only be carcinogenic but also tumor suppressive [10]. In 2002, Calin reported the deletion of a small genomic region of 13q14 in patients with chronic lymphocytic leukemia, including the miR-15a and miR-16-1 genes, for the rst time it demonstrated that miRNA abnormalities were closely related to tumors. In 2007, Catto Jw et al [17].. reported the miRNA expression pro les of bladder cancer for the rst time. Since then, with the continuous improvement of detection ability, Chen et al [18] applied Solexa sequencing technology and found that 33 miRNA expressions were up-regulated in bladder cancer tissue compared with normal bladder tissue. The expression of 41 miRNAs was down-regulated, of which miRNA96 was identi ed as the most signi cantly up-regulated and miRNA-490-5p was most signi cantly downregulated. It can be found that miRNAs have played an important role in the occurrence and development of bladder cancer, and its abnormal expression may be related to the survival prognosis of cystic cancer patients.
By using the R language edgeR package for the differential analysis and standardization of miRNA expression pro les from The Cancer Genome Atlas (TCGA), 473 differentially expressed miRNAs and then all patients were randomly divided into train combination test groups. Afterwards, seven miRNA prediction models were constructed by univariate Cox regression and stepwise multivariate Cox regression in the train group. (including hsa-miR-33b-5p, hsa-miR-33a-5p, hsa-miR-590-3p, hsa-miR-576-5p, hsa-miR-200b-3p, hsa-miR-186-3p and hsa-miR-3934-5p). At the same time, the predictive ability of 7 miRNAs was veri ed in the test group and the entire group. According to the median risk score grouping, the Kaplan-Meier curve showed that the overall survival rate of the low-risk group was signi cantly lower when compared with three high-risk groups. The AUC of the train group, test group, and entire group in the ROC curve all exceeded 0.7, indicating the sensitivity and speci city of these 7 miRNA prediction models in predicting the survival of bladder cancer patients. Univariate Cox regression and multivariate Cox regression analysis also suggested that these 7 miRNAs functioned as independent prognostic factors considering the conventional clinical factors of bladder cancer patients. Most of these 7 miRNAs were reported to participate in the research progress of various kinds of tumors. Studies by X, Y et al. [19] found that MiR-33b-5p was low expressed in gastric cancer cell lines. Up-regulated miR-33b-5p may reduce the expression of HMGA2, inhibit the growth of gastric cancer cells, and make gastric cancer cells sensitive to chemotherapy drugs. Studies by H, L, et al. [20]demonstrated that the long-chain non-coding RNA HIF1A-AS2 was overexpressed in osteosarcoma tissues and cells, which could interact with miR-33b-5p and negatively regulate its expression, thereby up-regulating the expression of the target protein SIRT6 of miR-33b-5p. Studies by S, W et al [21]showed that the expression of long-chain non-coding RNA EGOT in hepatocellular carcinoma (HCC) tissues was signi cantly up-regulated. EGOT has a signi cant regulatory effect on the survival, migration and invasion of liver cancer cells. EGOT expression level was negatively correlated with miR-33a-5p expression level. It was also con rmed that EGOT can speci cally bind to miR-33a-5p, thereby reducing its expression and then up-regulating the expression of histone HMGA2. Gliomas has been regarded as the most commonly-seen and the most deadly type of malignant brain tumor. Studies by Y, S, et al [22] demonstrated that miR-590-3p was downregulated and played a tumor suppressive role in glioma cells. L, Z, et al [23] have found that long-chain non-coding RNA linc-PINT can directly regulate miR-543 and miR-576-5p and inhibit the proliferation of esophageal cancer cell lines. It was proposed that linc-pin-mir − 543 / miR-576-5p pathway could predict the prognosis of esophageal cancer and provide new targets for the treatment of esophageal cancer. Studies by L, X et al [24] indicated that the expression of miR-200b-3p and miR-200c -3p in metastatic CRPC was signi cantly down-regulated, and was negatively correlated with the expression level of PRKAR2B in prostate cancer tissues. At rst, M, H, et al. [25]demonstrated that EGFR agonist EREG has played an important role in tamoxifen-resistant breast cancer cells by activating EGFR signaling and its downstream glycolytic genes. It was further proved that EREG was a direct target of miR-186-3p, and tamoxifen downregulation of miR-186-3p could lead to upregulation of EREG in tamoxifen-resistant breast cancer cells [26]. Research by W, Y et al [26] found that miR-3934-5p was signi cantly up-regulated in neuroblastoma tissues and cell lines. Further research by the authors also found that TP53INP1 served as a direct target of miR-3934-5p Gene, miR-3934-5p can negatively regulate it In order to predict the potential biological functions of seven-miRNA signature, target genes of these seven miRNAs were analyzed by KEGG signaling pathway and GO enrichment analysis.GO analysis of target genes mainly demonstrated that it was related with connection, adhesion, migration, protease phosphorylation and regulation of enzyme activity with cells, the proliferation and differentiation of smooth muscle cells, adhesion plaques, actin, extracellular matrix, brosis protein, metal ion transmembrane transporter activity, transcription activator activity, DNA binding, ion channel binding, etc. The signal pathway of target genes was mainly enriched in the interaction of MAPK signal pathway, cGMP-PKG signal pathway, pi3k signal pathway, TGF-β signal pathway, and the miRNAs in tumors to regulate the interaction of in ammatory cells. H, C et al [27]. clari ed that the MAPK pathway played a crucial role in the invasion and metastasis of bladder cancer cells.S, C et al [28]have found that the upregulation of myosin light chain kinase (MYLK) and activation of cGMP-PKG signaling pathway can silence serine / arginine repeat matrix protein 2-selective splicing (SRRM2-AS), thus inhibiting nasopharyngeal carcinoma cell proliferation, colony formation, angiogenesis, blocking the cell cycle and enhancing apoptosis.Studies by Xu, X, et al [29] found that the Pi3K pathway was speci cally activated, vaccine-associated kinase 1 was up-regulateed (VRK1), and then histone H2A was induced to be up-regulated, eventually promoting the progression of osteosarcoma. S, J, et al [30]clari ed that the signaling pathway of transforming growth factor-β (TGF-β) was involved in many cellular processes in mature organisms and developing embryos. These processes include cell growth, cell differentiation, apoptosis, cell dynamic balance and other cell functions. It has been found through research that TGF-β1 signi cantly promotes autophagy and mitochondrial phagocytosis of ovarian cancer cells, and provides a new direction for targeted therapy of ovarian cancer. These signaling pathways affect various tumors to varying degrees, and these four pathways were just the tip of the iceberg of target genes involved in signaling pathways, suggesting that the miRNA prognosis model we constructed for bladder cancer may also be related to these tumor signaling pathways.
In order to nd the key nodes that regulate the prognosis model of bladder cancer miRNA, the network nodes were sorted in descending order based on the degree value according to Cytoscape and its cytoHubba, and the top ten genes were selected as the central genes (MYC, VCL, MMP2, ITPKB, FOXO1, SMAD7, CDKN1A, SRF, CAV1, KLF4). In addition, Kaplan-Meier showed that the expression of 39 genes (ACCTC1, AMOTL2, CALD1, CAV1, CLIC4, CPE, CSRP1, CXCL13,   DCN, DKK3, DPYSL2, DUSP3, FAM168B, FILIP1L, FLNC, FSTL1, GSN, IL33, KLHL21, LAMC1, LIX1L, LRP1, MEF2A, MFAP4, MSRB3, MYC, MYLK, NFIC, PTGER4,   RASD1, SBDS, SGCB, SLC39A14, SYNM, SYNP02, TIMP2, TMPRSS11E, VCL, WWTR1) was negatively correlated with survival prognosis, while the high expression of TRIB1 level was positively correlated with the survival prognosis. Among them, MYC, VCL, and CAV1 were not only the central genes of the miRNA prognosis model, but their expression was also negatively correlated with the survival prognosis. MYC genes were a group of oncogenes which were discovered earlier, including C-MYC, N-MYC, L-LYC. The MYC gene family and its products can promote cell proliferation, immortalization, dedifferentiation and transformation, etc, which has played a critical role in the formation of many types of tumors [31]. Related studies have shown that MYC gene was affected by DNA ampli cation in some types of tumors. MYC gene was ampli ed at a higher frequency in small cell lung cancer and was also ampli ed in other types of tumors such as breast cancer [32,33].The VCL product was a highly conserved protein involved in cell proliferation, migration, and adhesion. Studies have shown that high expression degree of VCL may lead to signi cantly lower overall survival rates in patients with gastric cancer. Studies by S, F et al. have shown that Caveolin-1 (CAV1) functioned as the key component of caveolae protein. The CAV1 polymorphism affected mRNA expression through the effect of mutant alleles and has been proven to affect tumorigenesis, including bladder cancer, colon cancer, liver cancer, stomach cancer, breast cancer and lung cancer [34]. Therefore, miRNA may affect the progress and prognosis of bladder cancer by regulating MYC, VCL, and CAV1.
In summary, our study not only constructed a new miRNA model related to the prognosis of bladder cancer through the expression pro le of miRNA, but also veri ed and evaluated the predictive ability of the model, and found that it could be used as an independent prognostic factor for bladder cancer. In addition, inferring its potential functions by predicting the target genes of the model has enhanced our understanding of the development of bladder cancer. However, this was only a study of the TCGA database based on bioinformatics. It is hoped that other databases and a large number of experiments will be able to verify the feasibility of this prognostic model and provide reliable prediction and treatment targets for bladder cancer patients.

Declarations
Declarations Figure 1 Unsupervised hierarchical clustering heatmap and volcano plot of the differential expression between 7892 mRNAs and 856 miRNAs in the expression spectrum of bladder uroepithelial carcinoma based on TCGA.     Sub network map of miRNA regulating mRNA. The quadrangle represents miRNA. The oval stands for mRNA. Red means upregulated, green means downregulated, and yellow means both.