Comprehensive Analysis of a ceRNA Network Reveals Potential Prognostic lncRNAs Involved in Progression of Bladder Cancer

Background: The aberrant expression of long non-coding RNAs (lncRNAs) has attracted more and more attention in the biological eld of bladder cancer(BC). We aim to construct a competing endogenous RNA(ceRNA) network reveals potential prognostic lncRNAs involved in progression of BC. Results: Expression proles of messenger RNA(mRNA), micro RNA(miRNA) and lncRNA of 397 BC samples and 19 non-tumor tissues were downloaded from The Cancer Genome Atlas(TCGA) database. Patients with BC were randomly divided into training group (n=198) and validation group (n=199). Then, 130 lncRNAs, 159 miRNAs and 2,048 mRNAs were identied as differentially expressed genes (DEGs, |logFC|>1, FDR<0.01) related to BC progression. Nextly, we constructed an BC associated deregulated competing endogenous RNA(ceRNA) network with 70 lncRNAs, 30 miRNAs, and 62 mRNAs involved in. Subsequently, a seven-lncRNA signature was constructed by establishing a LASSO Cox model with 13 lncRNAs associated with survival from the ceRNA network. This signature can well distinguish high-risk patients from low-risk patients in training group and verication group. Furthermore, we combined the risk score model with other clinical ctures to estimate the ability of survival prediction. The result suggested that the risk score can be selected as an independent prognostic factor for overall survival(OS) rate. Conclusion: In this study, we construct a ceRNA network related to progression of BC and established an seven-lncRNA signature, which was a candidate prognostic biomarker for prognostic prediction of BC patients .


Background
In developed countries, BC is the rst most incident neoplasm of urinary system. In addition, the incidence of BC is steadily rising worldwide and four-fths of all BC patients are men [1,2]. Although the endoscopic treatment, radical surgery, chemotherapy and immunity therapy has progressed vastly, high recurrence rate of BC(30-70%) and poor prognosis, especially for patients with metastatic BC, are still troubling [3]. The 5-year relative survival will decrease from 69.2-36.5% once the tumor transforms localized stage(con ned to primary site) to regional stage(spread to regional lymph nodes) [4]. Therefore, exploration of potential molecular mechanisms that may trigger BC and promote BC progression is crucial. And it will help to identify novel prognostic molecular biomarkers and and effective therapeutic targets.
The occurrence of BC is related to the mutation of a variety of genes, like TP53, MLL2, HER-2, HRAS, Bcl-2, etc [5,6]. In recent years, the important function of non-coding RNAs (ncRNAs) in the occurrence and progression of BC has been demonstrated in accumulating evidence [7]. LncRNAs are ribonucleic acid molecules with a length of more than 200 nucleotides, which have no protein-coding function. Originally, they were thought to be the by-products of RNA polymerase II transcription and noises of gene transcription [8]. However, recent studies suggested that numerous lncRNAs are involved in all essential biological processes in living cells [9]. Aberrantly expressed lncRNAs are related to cell proliferation, migration, apoptosis, invasion and metastasis in various tumors. Speci c lncRNAs are digged out during tumorigenesis and metastasis of BC [10]. For example, experiments in vivo showed that knocking down metastasis-related lung adenocarcinoma transcript 1 (MALAT1) can inhibit the metastasis of BC [11]. In addition, the silencing of taurine up-regulated gene 1 (TUG1) inhibits the proliferation and initiates apoptosis of BC cells [12]. LncRNA-ATB activates PI3K/AKT and mTOR signaling pathways in bladder cancer so that the cell viability, migration and invasion is promoted [13]. All these ndings strongly indicated that some speci c lncRNAs could be used as prognostic biomarkers and therapeutic targets for BC. The study of lncRNAs in BC and normal bladder epithelial cells will help to expound the pathogenesis of BC.
According to the hypothesis of ceRNA, circular RNAs(circRNAs), lncRNAs, miRNAs and mRNAs are involved in the ceRNA interaction network. MiRNAs can bind mRNAs to induce gene silencing, while lncRNAs or circRNAs regulate gene expression by competitively binding miRNAs. CeRNAs can be combined with miRNA through response elements (micro RNA response elements, MREs) to up-regulate genes suppressed by microRNA [14]. CeRNAs usually contain multiple MREs, so each ceRNA can regulate multiple miRNAs, and each miRNA can be also regulated by multiple ceRNAs [15].
Experiments and bioinformatics have identi ed lncRNA-miRNA-mRNA ceRNA networks in various physiological and pathological processes to initiate and develop carcinoma [14,16]. However, the lack of large samples and research approaches hinder integrative analyses of the ceRNA mechanisms of lncRNAs in BC progression. Encouragingly, the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) co-established a cancer research project, The Cancer Genome Atlas (TCGA, https://www.cancer.gov/) platform, which provides a large, free cancer research reference database by collecting and sorting out various cancer-related omics data. At present, a total of 33 cancer types are collected, more than 2 PB of clinical pathology and corresponding bioinformatics data. The data is free and public, which greatly helps cancer researchers to better investigate the tumorigenesis, recurrence and metastasis of cancer [17].
In the current study, RNA sequencing data of 397 BC tissue samples and 19 adjacent nontumor bladder tissue samples were obtained from from TCGA database. Differentially expressed lncRNAs, miRNAs, and mRNAs between early tumor stages (I/II) and normal tissues were screened, and those differentially expressed genes (DEGs) between late tumor stages (III/IV) and normal tissues were identi ed as well.
Subsequently, based on the miRcode, TargetScan, starBase and miRTarBase databases, a ceRNA regulatory network for BC progression was constructed by the intersecting 130 lncRNAs, 159 miRNAs, and 2,048 mRNAs of the two comparation above. Finally, a risk score system containing 7 lncRNAs signi cantly affecting BC patient prognosis is constructed by means of Lasso penalized Cox proportional hazards regression analysis. Moreover, we analyzed the correlations between the risk score system and clinical characteristics of BC. Our understanding of the function of lncRNAs in the ceRNA network of BC progression will be improved and potential diagnostic biomarkers for BC may be identi ed with the ndings from the present study.

Analysis of DElncRNAs, DEmiRNAs, and DEmRNAs
To better understand genes associated with tumorigenesis and progression of BC, the training set was rstly divided into three groups : one group included 19 non-tumor adjacent tissue samples, one comprising 65 relatively early stage(stage I/II, lesions restricted in bladder) BC samples, and one comprising 133 late stage (stage III/IV, lesions exceed bladder wall) BC samples. There was no signi cant difference in clinical features between the two datasets (all P > 0.05). Table 1 lists the detailed features. After extracting the expression matrix from the training set, the differentially expressed RNAs were analyzed with "limma" package of R software. |logFC|>1 and FDR < 0.01 were used as cut-off criteria. The DElncRNAs, DEmiRNAs, and DEmRNAs were screened out by comparing the tumor groups with the normal group and visualized by volcano maps ( Fig. 2A and 2B).  Construction of a lncRNA-miRNA-mRNA regulatory network To investigate the regulatory interaction of the intersecting 130 speci c DElncRNAs, 159 DEmiRNAs and 2,048 DEmRNAs above, lncRNA-miRNA interactions were predicted with miRcode and miRNA-mRNA pairs were acquired by all three databases(TargetScan, Starbase and miRTarBase). However, based on the expression of miRNAs in tumor reverse to that of target lncRNAs and mRNAs, we only select the lncRNA-miRNA-mRNA interaction as its expression pattern is up-down-up-regulation or down-up-down-regulation in BC tissues. Finally, the lncRNA-miRNA-mRNA ceRNA network was constructed. And it was visualized with Cytoscape. The nal BC ceRNA regulatory network included 70 DElncRNAs,30 DEmiRNAs, and 62 DEmRNAs, consisting of 162 nodes and 421 interactions( Fig. 3A and Table S1).

Functional enrichment analysis
To explore the functions of differentially expressed genes in BC occurrence and progression, 62 DEmRNAs in the newly formed ceRNA network were analyzed using the GO terms and the KEGG pathways. The top GO results(including 10 biological processes(BPs), 4 cellular components(CCs) and 10 molecular functions(MFs)) and 10 KEGG pathways are shown in Fig. 3B and 3D. GO analysis demonstated that these mRNAs were highly related to binding mutiple biological factors in molecular function, including activating transcription factor binding, beta-catinin binding, SMAD binding, transmembrane receptor protein kinase ativity and cytokine receptor binding. For the cellular component, the mRNAs were highly associated to apical dendrite and transcription-related complex that located in cell nucleus. Meanwhile, these mRNAs were found mainly enriched in cell cycle G1/S phase transition and epithelial cell proliferation in biological processes. P < 0.05 and Benjaminni corrected P < 0.05 were set as lter criteria for the GO enrichment networks. The results of KEGG pathway enrichment analysis suggested that cancer-related pathways were the most important pathways for these selected genes(microRNA in cancer,colorectal cancer, prostate cancer, breast cancer,gastric cancer and melanoma). The other pathways also involved many screened genes including hepatitis B, signaling pathways regulating pluripotency of stem cell and cellular senescence.

Survival analysis of genes in the ceRNA network
The training set was rstly analyzed to identify the potential lncRNA, miRNA and mRNA biomarkers which were strongly associated with the prognosis of BC patients. We conducted K-M survival analyses for lncRNAs, miRNAs, and mRNAs in the ceRNA network to analyze the impacts of each gene. At last, we screened out 7 lncRNAs, 3 miRNAs, and 11 mRNAs which were associated with OS of BC patients (Table  S2). According to rank of the relationship between the expression level and prognosis of BC patients, K-M survival curves of top three lncRNAs, miRNAs, and mRNAs were shown in Fig. 4A, 4B, and 4C, respectively.

Establishment of BC prognostic signatures
In the ceRNA regulatory network, lncRNAs play crucial roles as initial effectors to in uence miRNAs and mRNAs downstream. Therefore, lncRNAs may be ideal diagnostic or prognostic biomarkers due to their status and highly speci c expression. After extracting the prognostic information of the traning set, univariate Cox regression analysis were performed for the 70 lncRNAs in the ceRNA network to identify lncRNAs which were associated with OS of BC patients. With cut-off for relevance set at P < 0.05, 12 lncRNAs were considered signi cantly correlated with overall survival (Table S3). Lasso-penalized Cox regression for these 12 lncRNAs were then performed to determine the independent predictors for prognosis of BC patients( Fig. 5A and 5B). Ultimately, a 7-lncRNA signature prognostic model with OS was constructed: AC004803.  Fig. 5C. The maximally selected rank statistics were estimated. Based on the cut-off for risk score set at "-3.2118", the patients were classi ed into low-risk group (n = 99) and high-risk group (n = 99) (Fig. 5D). According to K-M survival and timedependent ROC curve analyses shown in Fig. 5E, these two groups were well distinguished regarding overall survival. Compared with the low-risk group, the OS survival rate of the high-risk group tended to become lower (P < 0.001). We conducteded ROC curve analysis to analyze the accuracy of the 7-lncRNAbased risk score model. The value of area under curve(AUC) at 1,3,5 years was 0.665, 0.675 and 0.661 (Fig. 5F). Figure 6A reveals a scatter plot of the risk scores, the corresponding OS of 198 BC patients and the expression heat map of the 7 lncRNAs.
Veri cation for survival prediction of the risk score system The validation set (n = 199) was then used to test the ability of the 7-lncRNA signature for survival prediction. After applying the cut-off value of the 7-lncRNA signature to the validation set, the result of analysis was consistent with the ndings of training set. In the validation set, K-M survival analysis suggested that the prognosis of BC patients was obviously worse in the high-risk group (n = 100) than in the low-risk group (n = 99) (P = 0.037) (Fig. 5G). The AUC of the risk score signature at 1,3,5 years was 0.619, 0.646 and 0.646 in the validation set (Fig. 5H). Figure 6B reveals a scatter plot of the risk scores, the corresponding OS of 198 BC patients and the expression heatmap of the 7 lncRNAs of validation set. Thus, the desired accuracy prediction ability of this prognostic model was shown in both the training set and the validation set.
Prognostic value of the risk score system Clinical information of all BC patients involved in this study was obtained through TCGA. Univariate and multivariate Cox regression analyses were subsequently conducted to estimate the ablity of prognostic prediction of the 7-lncRNA signature in training set, validation set and entire set respectively. The risk score, N stage, and pathological stage were signi cantly associated with OS in BC patients (P < 0.05) all for the three datasets according to the result of univariate analysis. However, only the risk score and N stage were considered independent prognostic factors of OS by multivariate analysis (P < 0.05, Table 2).

Discussion
As one of the most common tumors of urinary system, bladder cancer carried a poor prognosis with hundreds of thousands of new cases and deaths every year [2]. Currently, clinician usually take the TNM staging system as the most important standard for prognostic predictions for BC patients, which was developed by the American Joint Committee on Cancer [18]. However, the TNM system distinguishes different cancer stages by anatomical disease progression. It is di cult to explain why patients with same TNM stage exhibit variable responses to the similar therapy. This phenomenon prompts us to pay attention on the genomic heterogeneity of malignant lesions. Therefore, we aimed to screen several effective biomarkers for earlier BC diagnosis and prognosis prediction.
Without function of protein coding, lncRNAs are attracting as miRNA sponges or ceRNAs to regulate posttranscriptional processes or gene expression indirectly [22]. In recent years, various ceRNA networks have been reported in which lncRNAs regulate mRNA expression by interacting with miRNAs. As a miRNA sponge distributed mostly in the cytoplasm, the up-regulated expression of lncRNA DANCR is detected in BC. It positively regulates the expression of MSI2 by adsorbing miR-149, and thus promotes BC cell proliferation, migration, invasion and tumorigenicity [23]. HOTAIR is a well-known lncRNA involved in various tumors. In breast cancer, the high expression of HOTAIR inhibits cell apoptosis, facilitates cell growth and metastasis through miR-20a-5p/HMGA2 axis [24]. What's more, due to the initial status in ceRNA network, lncRNAs can be taken as potential diagnostic or prognostic biomarkers for cancer patients. In the hepatocellular carcinoma(HCC), a ceRNA network with 37 lncRNAs, 10 miRNAs, and 26 mRNAs included was constructed based on TCGA data. And then by means of multivariate Cox regression and lasso analysis, a risk score system consisting of 13 lncRNAs was established for prognositic prediction of patients with HCC [25]. Nevertheless, the ceRNA network of BC progression and the prognostic values of lncRNA signature in the ceRNA network have been rarely investigated. Therefore, the study of RNA interactions in BC progression will help us to a better understand the biological processes in tumor.
In the present study, RNA sequencing data of BC was rstly extracted from TGGA database. We randomly devided the BC patients into two group, training set and validation set. Differentially expressed lncRNAs, miRNAs, and mRNAs were screened by comparing early tumor stages (I/II) tissues with normal tissues, late tumor stages (III/IV) with normal tissues respectively according to data of training set. Then the overlapping 130 lncRNAs, 159 miRNAs, and 2,048 mRNAs in were identi ed. We screened the lncRNA-miRNA-mRNA chain as its expression pattern is up-down-up-regulation or down-up-down-regulation in BC tissues after predicting the interactions between lncRNA and miRNA, miRNA and mRNA. Then a BCassociated ceRNA regulatory network including 70 DElncRNAs, 30 DEmiRNAs, and 62 DEmRNAs was constructed. Nextly, the functions and signaling pathways in which these dysregulated mRNAs were involved were determined by means of GO and KEGG analyses. We performed K-M survival analysis to dig out the RNAs in this network which were signi cantly associated with OS. Subsequently, we established a 7-lncRNA-based risk prediction model (AC004803.1, AC009690.2, AC055713.1, AC105942.1, BX890604.1, MIR200CHG, RNF139-AS1) via univariate Cox regression and LASSO regression analysis. According to the risk score, the patients were devided into low-risk and high-risk groups in the K-M survival analysis. For the time-dependent ROC curve, the AUC values of one year, three years and ve years were 0.665, 0.675 and 0.661 respectively, showing moderate e cacy in prognostic prediction. Then we veri ed the prognostic value of the mutiple lncRNAs signature in the validation set. We also conducted univariate and multivariate Cox regression analyses to combine the risk score system and clinical characteristics, and the result suggested a good predictive power of the risk score, which can be an independent factor for prognosis. Finally, based on ceRNA network theory, the correlation between lncRNAs and mRNAs was explored.
Through GO and pathway analyses, we can explore the functions of dysregulated mRNAs in the ceRNA network. Several GO terms of the differentially expressed mRNAs belonged predominantly to the following categories: "epithelial cell proliferation", "mesenchymal cell proliferation", "cell cycle G1/S phase transition", "regulation of cell cycle G1/S phase transition" and "lymphocyte differentiation", suggesting that the alteration of cell cycle, cell proliferation and immune response may play crucial roles in BC progression. The result of KEGG pathway analysis showed that mRNAs were most enriched in "microRNA in cancer", "breast cancer", "gastric cancer", "hepatitis B", "proteoglycans in cancer", "signaling pathways regulating pluripotency of stem cells", "cell senescene", "melanoma", "colorectal cancer" and "prostate cancer". This indicated that different cancer types may share the common abnormal signaling pathways.
Among the 7 lncRNAs involved in the signature model, limited studies were performed on the role of MIR200CHG. Previously, an up-regulated expression of MIR200CHG was demonstrated in colon adenocarcinoma. MIR200CHG was identi ed as an immune-related lncRNA and independent biomarker for prognositic prediction of patients with colon adenocarcinoma [26]. MIR200CHG was also included in a prognostic four-lncRNA based Risk Score evaluation model for urothelial bladder carcinoma. The upregulated expression of MIR200CHG was observed in tumor than normal tissues in bladder cancer. However, the Risk Score system showed that MIR200CHG acts as a protective factor in BC prognosis [27]. The result may be contributed to the various functional mechanisms of lncRNAs, which regulated expression of multiple mRNAs by competitively bind miRNAs. In our study, the upregulated expression of MIR200CHG and its protective action in BC progression was consistent with the study above. This indicates that different regulatory function of MIR200CHG may be involved in BC tumorigenesis and progression.

Conclusions
In conclusion, we identi ed dysregulated lncRNAs, miRNAs and mRNAs related to the occurrence and development of BC, and constructed a BC-associated ceRNA regulatory network. By univariate Cox regression and LASSO regression analysis, we constructed a seven lncRNA prognostic model for prognostic prediction of BC patients. According to further analysis, the 7-lncRNA signature may be an independent factor for prognosis BC patients. The prognostic model presented satisfying accuracy in OS prediction. The practicability of the prognostic model are necessary to be veri ed by more extensive investigations.

RNA sequence data acquisition
The ow chart of our study was showed in Fig. 1

Data processing and identi cation of DEGs
After annotating the genes for lncRNA and mRNA, we standardized the expression pro les among samples. Then, the "limma" package of R software was applied for differentially expressed lncRNAs, miRNAs, and mRNAs analyses with two comparisons: early stage (stage I/II) BC samples versus adjacent non-tumor samples and late stage (stage III/IV) BC samples versus adjacent non-tumor samples. The cutoff criteria were set as |logFC|>1 and FDR 0.01. The "ggplot2" package was used to visualize the results with volcano plots. The intersection of DEGs between two comparisons was visualized by Venn diagrams for further analysis.

Function and pathway enrichment of DEmRNAs
The "clusterPro ler" package of R was applied to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) to analyze the DEmRNAs. A GO analysis covered three domains: molecular functions (MFs), cellular components (CCs) and biological processes (BPs). KEGG was used to predict possible pathways. The signi cant GO terms were identi ed by Fisher's test, and GO categories with P < 0.05 were selected to be the enriched functions. The possible signaling pathways at the signi cance level set (P < 0.05) were searched for with KEGG.

Kaplan-Meier analysis
The "survival" package of R was used to perform Kaplan-Meier (K-M) survival analyses to explore the association between the DElncRNA, DEmiRNA, DEmRNA and overall survival(OS) of patients individually. We divided the patients into 2 groups based on median expression of the DEGs for the OS analysis using log-rank test with P < 0.05 considered statistically signi cant.

Construction of the risk score system
To verify the accuracy of risk score system, the patients were randomly divided into two groups according to the ratio of 1:1 (the training set = 198 and validation set = 199) by hierarchical grouping. Chi-square test was conducted to analyse the differences of clinical features between two groups. Then, we performed Lasso-penalized Cox regression to identi ed the DElncRNAs related to survival of BC patients to simplify the model. Firstly, the penalized maximum likelihood method was applied to generate a Cox model. Subsequently, we used ten-fold cross-validation to screen out the best lambda which was used to assure the mean cross-validated error minimization. And it was also applied in calculating the regression coe cients (β) of the multivariate Cox regression model. Ultimately, a multiple lncRNAs prognostic risk score system was constructed. The risk score formula for predicting OS was constructed as follows: where n is the number of lncRNAs involved, Exp i is the expression value of each lncRNA and C i is the corresponding estimated regression coe cient from the multivariate Cox regression analysis. According to the median risk score of the training set, the patients were divided into low-risk and high-risk group both in the training set and the validation set. The abilities of OS prediction and risk discrimination of the risk score system were evaluated by means of and K-M survival analysis and time-dependent receiver operating characteristic (ROC) curves.

Regression analysis of risk score and clinical characteristics
In order to test whether the clinical characteristics and risk score level of BC patients were signi cantly related to OS, we conducted univariate and multivariate Cox regression analysis in training set and validation set. P 0.05 was considered statistically signi cant. Calculation of the hazard ratio and 95% con dence intervals for every factor was performed as well.

Correlation analysis between DElncRNAs and DEmRNAs
To estimate the expressive correlation among DElncRNAs and DEmRNAs, we conducted regression analysis of DElncRNAs and DEmRNAs. The "ggpubr", "tidyverse", "corrplot" packages of R were used to visualize the results. P 0.05 and r 0. 25     Construction and validation of risk score system. Best lambda was calculate to assure a minimum mean cross-validated error by ten-fold cross-validation (A). Red dots represent partial likelihood deviance; solid vertical lines indicate their corresponding 95% CI; the left dotted vertical line is the value of lambda that gives minimum cvm, named lambda.min; the right dotted vertical line is the largest value such that error is within 1 standard error of the minimum, named lambda. 1se. The selection of the optimal cutoff and survival curve based on risk score. The coe cient values at varying levels of penalty (B). Each curve represents an lncRNA. Distribution of densities for low-and high-risk score BC patients(C). Risk scorerelated standardized log-rank statistics(D). Maximally statistic was de ned as the optimal cutoff value. Kaplan-Meier survival curve of two groups in training set(E). Time-dependent ROC curves based on risk score level in training set (F). Kaplan-Meier survival curve of two groups in validation set(G). Timedependent ROC curves based on risk score level in validation set (H).ROC: receiver operating characteristic.

Figure 6
Risk score analysis of 7 DElncRNAs. Training set(A).Validation set(B).The above scatter plot displays the risk score of 7 DElncRNAs, medium scatter plot displays the survival state of BC patients, and the below heatmap exhibits the DElncRNA expression pro les in each BC patients with survival data. Red is de ned as high expression, and blue is de ned as low expression.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.