Construction of An Immune-Related CeRNA Network as a Prognostic Signature in Bladder Cancer

Background: Immunotherapy has been proved to be effective for bladder cancer (BLCA). However, the molecular network involved in BLCA tumor immune response remains unclear. This study aims to construct an immune-related ceRNA network and to identify the prognostic value. Methods: Based on The Cancer Genome Atlas (TCGA), we used single-sample gene set enrichment analysis (ssGSEA), weighted gene co-expression network analysis (WGCNA) to determine immune-related mRNA, lncRNA and miRNA. Then least absolute shrinkage, and selection operator (LASSO) and Cox regression were performed to identify the mRNAs with high prognostic value, and accordingly, the risk score was calculated. Internal and external validation were performed both in TCGA and GSE13507 with Kaplan-Meier (KM) survival and Receiver Operating Characteristic (ROC) curve analysis. Using the immune-related mRNA, lncRNA and miRNA, a ceRNA network was established via MiRcode, starBase, miRDB, miRTarBase and TargetScan. Besides, we also explore the relationship between the risk score and immune cell inltration via CIBERSORT algorithm. Results: 5 mRNAs (PCGF3, FASN, DPYSL2, TGFBI and NTF3) were ultimately identied, and KM survival analysis displayed the 5-mRNA risk signature could predict the prognosis of BLCA with high ecacy both in TCGA (p = 1.006e-13) and GSE13507 (p = 7.759e-04). Using miRNA targeting molecular prediction database, an immune-related ceRNA network, including 5 mRNAs, 24 miRNAs and 86 lncRNAs, was constructed. Memory B cells, activated dendritic cells, and regulatory T cells inltration into tumors were negatively correlated with risk score, while the inltration levels of macrophages M0, M1 and M2 were positively correlated with risk score. Conclusion: This study helped to better understand the molecular mechanisms of tumor immune response from the view of ceRNA hypothesis, and provided a novel prognostic signature for bladder cancer.


Introduction
Bladder cancer (BLCA) is among the most common malignancies worldwide with high morbidity and mortality 1,2 . Most BLCA patients still need postoperative adjuvant therapy even after radical cystectomy which remains the main treatment for BLCA, especially for muscle invasive bladder cancer 3 . Adjuvant chemotherapy has been widely accepted as a feasible measure to improve the prognosis of BLCA 4 . Currently, immunotherapy mainly including traditional Bacillus Calmette-Guérin (BCG) and emerging immune checkpoint inhibitors (ICIs), which could strengthen anti-tumor immunological ability in theory, has been demonstrated to be effective for BLCA, providing new choice for patients with advanced bladder cancer or resistance to conventional medical practice 5,6 . Thence, it is of high signi cance to explore the immune-related prognostic indicators and relevant biological mechanisms based on immunobiology research.
The competing endogenous RNA (ceRNA) hypothesis, in which lncRNA, mRNA and circRNA could regulate the expression level of each other by competitive binding miRNAs response elements 7 , has been proved to be involved in carcinogenesis of various tumors by emerging evidence 8,9 . Establishment of a ceRNA network could provide novel insights into tumor regulatory networks. The tight relationship has been found between tumor-in ltrating immune cells and ceRNA networks in various tumors like head and neck squamous cell cancer 10 , soft tissue sarcoma 11 and mesothelioma 12 , while this kind of research on BLCA has not been received enough attention. Previous study has successfully constructed a regulatory ceRNA network, which could be a prognostic predictor for BLCA patients and displayed potential biological mechanisms from the angle of ceRNA hypothesis 13 . However, this study was only based on The Cancer Genome Atlas (TCGA) database and external validation was lacking. Besides, the analysis of the roles ceRNA played in tumor immune response of BLCA, especially based on large datasets, has not been illustrated completely, which is urgently demanded.
In this study, we aimed to screen hub immune-related genes including lncRNA, mRNA and miRNA via single-sample gene set enrichment analysis (ssGSEA), weighted gene co-expression network analysis (WGCNA) and other bioinformatical tools, and to construct a lncRNAs-mRNAs-miRNAs ceRNA network based on TCGA. Moreover, an immune-relevant prognostic signature was established though Kaplan-Meier (KM) survival analysis, LASSO regression and Cox regression, and the robustness of this model has been veri ed in TCGA-BLCA dataset and GSE13507 dataset, respectively. We also explored the correlation of the risk signature with tumor-in ltrating cells (TICs) proportion and tumor immune microenvironment by means of CIBERSORT and ESTIMATE algorithm.

Data collection
The RNA-seq expression matrix and corresponding clinicopathological parameters of patients with BLCA were extacted from TCGA on UCSC website (https://xena.ucsc.edu/) 14 . The gene expression data were downloaded in the forms of "HTSeq-Counts" and "HTSeq-FPKM" for genomic expression divergence detection and prognostic model building. A total of 430 samples, including 411 BLCA samples and 19 normal samples, were included in this study. With the annotation le downloaded from GENECODE (version 22, GRCh38), Ensembl IDs were transformed into gene symbol and the expression matrix of lncRNAs and mRNAs were singled out. The external validation dataset was acquired from GSE13507 15 in Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). The series matrix le, along with the detailed survival information, of 165 enrolled BLCA patients in GSE13507 were downloaded directly from GEO website. All these data were analyzed with R software (version 3.5.3).
2. Identi cation of immune-related lncRNAs, mRNAs and miRNAs via ssGSEA Gene markers of 29 immune-related signatures, including immune cells, immune-relevant pathways and immune-relevant functions, were obtained from previous studies 16,17 . Gene Set Variation Analysis (GSVA) package 18 of R was utilized to evaluate the activation and in ltration level of these immunerelated biological events according to the expression pro le of BLCA from TCGA. Based on the ssGSEA score matrix, 411 BLCA patients were grouped into high-and low-immune-in ltration with "hclust" R package. To evaluate the potency of ssGSEA grouping, Stromal score, Immune score, ESTIMATE score and tumor purity, which were obtained from ESTIMATE algorithm 19 on the basis of BLCA gene expression pro le, were analyzed. The expression divergence of human leukocyte antigen (HLA) were veri ed between two immune-related groups by Wilcoxon sum rank test. KM survival analysis was used to compare the survival rate of these two groups. To ascertain the immune-related lncRNAs, mRNAs and miRNAs, "edgeR" package 20 was implemented to detect the differentially expressed mRNAs (DEmRNAs), lncRNAs (DElncRNAs) and miRNAs (DEmiRNAs) between high-and low-in ltrating groups based on the criteria of |log2FC| > 0.5 and p < 0.05. The same ltering criteria was applied for genomic divergence analysis between normal and cancer groups in order to select the genes both correlated immune response and tumorigenesis, and venn diagrams were used to visualize the interaction analysis.

Weighted gene co-expression analysis (WGCNA)
To further identify the immune-related lncRNAs, WGCNA R package 21 was conducted to construct a lncRNA co-expression network to determine the gene modules with high correlation with immune score, which could re ect the ratio of immune component of tumor microenvironment (TME) and was acquired from ESTIMATE algorithm mentioned above. First, Pearson correlation coe cients of paired genes were calculated, and a weighted adjacency matrix was constructed in light of the formula amn = |cmn|β (amn: adjacency between two paired genes; cmn: pearson correlation coe cient of two genes; β: softthresholding parameter, which could strengthen the robust correlation and penalize the weak correlation of genes). Second, the optimal β value was selected to establish a scale-free co-expression network. After transforming the adjacency matrix into a topological overlap matrix (TOM), hierarchical clustering was performed to assign genes with analogous expression pattern into the same module 22 with a minimal gene size of 50. Next, the correlation between the modules and Immune score was explored, and the modules with high and statistically signi cant module signi cance (MS) score, which is de ned as the average log10-transformed p-value of the correlation between the genes in the module and Immune score, were chosen. The lncRNAs in these modules were regarded as the immune-related lncRNAs.

Construction of a ceRNA network
MiRcode (http://www.mircode.org/) 23 , a database used for targeted miRNA prediction, was implemented to forecast the miRNAs with potential to bind with the lncRNAs obtained from WGCNA. To avoid falsepositive predictive results as possible, then miRDB (http://www.mirdb.org/) 24 , miRTarBase (http://www.mirtarbase.cuhk.edu.cn/ ) 25 and TargetScan (http://www.targetscan.org/vert_72/) 26 were all used for mRNAs prediction based on the miRNAs obtained from the previous step analysis. The interaction analysis was performed between these mRNAs and immune-relevant DEmRNAs. After detecting the hub prognostic mRNAs, the targeted miRNAs and lncRNAs were determined via miRcode and starbase (http://www.starbase.sysu.edu.cn/) 27 , respectively. The interaction of the ceRNA regulatory network was visualized by means of Cytoscape (version 3.8.0). 5. Development and validation of an immune-related risk signature 399 BLCA patients with complete overall survival (OS) information from TCGA were randomly grouped into a training dataset (n = 280) and an internal validation dataset (n = 119). Considering the probable unrelated wherefore of death, the patients with the survival time of less than one month were removed from the study. The overlapped mRNAs, which have been codetermined by genomic difference detection and lncRNAs prediction by the mentioned-above methods, were chosen for further survival analysis. First, according to the median expression of genes, patients with BLCA were divided into two groups, and KM survival analysis was then performed with the criteria of p < 0.05 via "survival" R package. Following that, univariate Cox regression was utilized to explore the correlation between hub mRNAs expression level and OS using the same R package. The genes with p < 0.05 were singled out for least absolute shrinkage, and selection operator (LASSO) regression, which was conducted by "glmnet" R package. At last, multivariate Cox analysis with stepwise regression was carried out to set up a prognostic model, and the risk score of each BLCA patient was calculated. With "survivalROC" R package, Receiver Operating Characteristic (ROC) curves were drawn to validate the effectiveness and robustness of the risk signature in the training dataset, the internal validation dataset, the entire TCGA-BLCA dataset and the external validation dataset.
Meanwhile, after ascertaining the optimal cut-off value of risk score by X-tile software 28 , all BLCA patients including 399 cases from TCGA and 165 cases from GSE13507 were divided into high-and lowrisk group. Subsequently, KM survival analysis was conducted to recon rm the prognostic value of the risk score. Besides, we also used the external validation dataset (GSE13507) to explore the respective prognostic value of the mRNAs with p < 0.05 in the prognostic model. KM survival analysis was performed and X-tile helped to determine the optimal cut-off value. In addition, univariate Cox regression was performed to validate the prognostic value of the predicted miRNAs and lncRNAs in the ceRNA network via "survival" R package.

Construction of a risk-score-based nomogram
Clinical phenotype and the immune-related risk score from TCGA-BLCA were transformed into binary variables, respectively. Then univariate Cox regression and multivariate Cox regression were conducted not only to identify whether the risk score was an independent predictor for BLCA prognosis, but also to construct a nomogram integrating the risk score and other important clinical parameters by means of "survival", "rms" and "foreign" R packages to obtain a better predictive performance. Similarly, the variables with p < 0.05 in univariate analysis were picked into multivariate Cox model. ROC curve, KM survival curve and calibration curves were drawn to con rm the e cacy of the nomogram.

Correlation between risk score and BLCA immune response
In this part, we mainly evaluated the correlation of the gene-based risk score with the expression level of crucial immune-checkpoint-relevant genes, Immune score which could account for the proportion of immune component in TME, and the in ltrating level of TICs. To su ciently explore the relationship between risk score and tumor immune response, all BLCA patients were labeled with high-or low-risk according to the median risk score. The expression pro le of ve critical immune checkpoints, including TIGIT, LAG3, HAVCR2, CTLA4 and PDCD1, were extracted from TCGA-BLCA gene expression matrix with log2(FPKM+1) transformation. Pearson coe cients were calculated and only p < 0.001 was considered as statistically signi cance. Meanwhile, the expression level of these ve genes in high-risk group was compared with that in low-risk group by Wilcoxon test. The linkage between the hub mRNAs in the gene risk signature and pivotal immune-checkpoint-relevant genes was also explored. The correlation with R > 0.3 and p < 0.05 was deemed to be signi cant ESTIMATE, an algorithm developed to evaluate the relative abundance of stromal and immune components in tumor, was utilized to calculate the Immune score. The correlation was assessed both using difference test and correlation test.
To check the tight relationship between risk score and immune TME, CIBERSORT algorithm 29 was implemented to reckon the in ltration ratio of 22 different kinds of TICs with the ltering threshold of p < 0.05. Then the proportion difference was explored between high-and low-risk group. Similarly, to obtain reliable results as possible, we calculated the Pearson correlation coe cients between risk score and in ltration proportion of each TIC.

Functional annotation analysis
The Hallmark gene sets, which were available in the MSigDB database (https://www.gseamsigdb.org/gsea/msigdb/index.jsp) 30 , were downloaded as reference datasets for Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA). These analysis was performed through GSEA software 31 (version 4.0.3) and "GSVA" R package, respectively. According to the median risk score, the enrolled BLCA patients from TCGA all divided into high-and low-risk group. For GSVA, |log2FC| > 0.1 and p < 0.05 were selected as cut-off criteria. For GSEA, p < 0.001 and false discovery rate (FDR) < 0.001 were set as the signi cance threshold. The common signi cantly changed pathways obtained by GSEA and GSVA were recon rmed by measuring the Pearson correlation between risk score and ssGSEA score of enriched pathways.

Identi cation of immune-relevant DEmRNAs, DElncRNAs and DEmiRNAs via ssGSEA
The work ow of the whole study is displayed on Fig. 1. Via unsupervised clustering algorithm, 411 bladder cancer patients were divided into two groups, including low-immune-in ltrating group (n = 314) and high-immune-in ltrating group (n = 97) (Fig. 2a). The expression level of HLA family in high-immunein ltrating group was signi cantly higher than that in low-immune-in ltrating group (p < 0.001), verifying the utility of the immune grouping strategy (Fig. 2b). Besides, using KM survival analysis, BLCA patients with high immune in ltration have poorer survival than those with low immune in ltration, which is logical and consistence with previous reports (Fig. 2c). After con rming the feasibility of our grouping method, 3257 DElncRNAs (Fig. 2e), 212 DEmiRNAs (Fig. 2k) and 6012 DEmRNAs (Fig. 2h) were screened between high-and low-in ltration groups through "edgeR" package. Besides, 2940 DElncRNAs (Fig. 2d 2. Immune-related lncRNAs obtained by WGCNA From the analysis above, 1030 DElncRNAs were regarded to be associated with BLCA immune response. In order to further clarify the immune-relevant lncRNAs, WGCNA was performed based on Immune score. As shown in Fig. S1a&b, soft-thresholding value 2 was chosen to construct a scale-free network (R² = 0.90). We have identi ed a total of four divergent modules, including blue module, turquoise module, brown module and yellow module (Additional le 1: Fig. S1c). Among these modules, brown module (R = 0.45, p = 3e-22) and yellow module (R = 0.54, p = 3e-32) were picked for subsequent analysis (Additional le 1: Fig. S1d). A sum of 170 lncRNAs, including 102 lncRNAs in brown module and 68 lncRNAs in yellow module, were recon rmed to be associated with BLCA immune response.
3. Overlapped mRNAs of predicted mRNAs and immune-related DEmRNAs 170 lncRNAs, which were obtained from the establishment of a scale-free co-expression network, were then used to predict the possible targeted miRNAs using miRcode website, and as a result, 75 miRNAs were obtained. With miRDB, TargetScan and miRTarBase, 75 predicted miRNAs from the previous step analysis were uploaded and 1933 common mRNAs were acquired. Venn analysis was conducted between 1933 predicted mRNAs and 2863 immune-relevant DEmRNAs, and 182 overlapped mRNAs were found and further explored in the following survival analysis.

Construction and validation of a ve-gene prognostic signature
First, according to the median expression level of each gene among 182 overlapped mRNAs, 399 BLCA patients with detailed survival data were divided into two groups, and accordingly, KM survival analysis could be put into practice. 48 genes were preliminarily believed to be correlated with prognosis of BLCA (Additional le 4). As displayed in Table.1, no signi cant divergence of clinical features between the training dataset and validation dataset was found with the ltering threshold of p < 0.05 by means of Chisquare test. After the stochastic grouping of 399 patients, univariate Cox regression with ltering criteria of p < 0.05 helped to identify 24 genes in the training dataset (Additional le 5: Table. S1). By performing LASSO (Fig. 3a&b) and multivariate Cox regression, a ve-gene prognostic model was built as follows: -0.4231*exp(PCGF3) + 0.4164*exp(FASN) + 0.2498*exp(DPYSL2) + 0.1477*exp(TGFBI) -0.6484*exp(NTF3), where exp means the mRNA expression level of the particular gene. In addition, we found PCGF3, FASN, DPYSL2 and TGFBI were independent predictors in this model (p < 0.05; Fig. 3c), implying the tight relationship between these four genes and the process of tumor immunobiology. KM survival analysis displayed these four genes also had predictive ability for BLCA prognosis in GSE13507 cohort ( Fig. 6d-g). The patients with higher expression of DPYSL2, TGFBI and FASN and lower expression of PCGF3 would suffer poorer prognosis, which was consistent with the analysis results from TCGA.
To validate the robustness of the ve-gene model, internal and external validation was performed in TCGA-BLCA cohort and GSE13507 cohort, respectively. Time-dependent ROC analysis showed the areas under curves (AUCs) at 1-, 3-and 5-year was 0.672, 0.701 and 0.714 in the training dataset (Fig. 3d). The same analysis was conducted in internal validation dataset (Fig. 3e), the entire TCGA-BLCA dataset (Fig.  3f) and external validation dataset (Fig. 3j). KM log-rank test survival curves were also drawn to validate the predictive ability for BLCA prognosis. "2.05" was selected as the optimal cut-off value of risk score by X-tile, and then all patients, including ones in the training dataset, internal validation dataset and external validation dataset, were grouped into high-and low-risk. The cases in high-risk group signi cantly suffered less survival time than that in low-risk group no matter in the training dataset (p = 1.554e-09; Fig.  3g), the internal validation dataset (p = 1.959e-05; Fig. 3h), the entire TCGA-BLCA dataset (p = 1.006e-13; Fig. 3i) and GSE13507 cohort (p = 7.759e-04; Fig. 3k). In all, our ve-gene model was a strong tool to predict the prognosis of BLCA patients across the datasets and platforms.

Establishment a risk-score-based nomogram
To clarify whether the risk-score, which was highly correlated with tumor immune response, was an independent predictive factor for BLCA prognosis compared with other clinicopathological traits, univariate and multivariate Cox analysis was performed. In the univariate Cox model, the risk score, along with age, pathological N stages, pathological T stages and tumor stages, were responsible for the prognosis of BLCA (p < 0.05; Fig. 4a). Age, pathological N stages and risk score were recon rmed as independent prognostic factors though multivariate Cox regression analysis (p < 0.05; Fig. 4b), and based on the results of multivariate Cox analysis, a nomogram consisting of age, N stages, T stages and tumor stages was constructed to predict the 3-and 5-year survival through the quantitative scoring strategy (Fig.   4c). Using the nomogram, each patient with BLCA could get a total point. ROC analysis (Fig. 4d) and KM survival analysis (Fig. 4e), which is based on the median total points, demonstrated the higher total score represented the greater mortality. Calibration curves (Fig. 4f&g) indicated the high similarity between our model and ideal model. 6. Correlation between risk score and BLCA immune response To prove the close relevance between the risk score and the immune in ltrating status of BLCA, Immune score, immune-checkpoint-related gene and in ltrating proportion of TICs were picked for further analysis. The Immune score (Additional le 2: Fig. S2a), Stromal score (Additional le 2: Fig. S2b) and ESTIMATE score (Additional le 2: Fig. S2c) were positively correlated with the risk score, and these scores in the high-risk group were signi cantly higher than those in the low-risk group (Additional le 2: Fig. S2d-f). Besides, to further explore the relationship between the risk score and immune TME, the in ltration ratio of 22 different TICs was calculated via CIBERSORT (Additional le 3: Fig. S3). The outcome from difference test and correlation test indicated that a sum of 6 types of TICs were signi cantly correlated with the risk score (Fig. 5). Among them, three kinds of TICs have positive correlation with risk score, including macrophages M0, M1 and M2, while three kinds of TICs have negative correlation, including memory B cells, activated dendritic cells, and regulatory T cells. We also detected the correlation among risk score and crucial immune-checkpoint-relevant genes (TIGIT, PDCD1, LAG3, HAVCR2 and CTLA4), and found the expression of critical immune checkpoints in the high-risk group was signi cantly higher than that in the low-risk group ( Fig. 6h&i; Additional le 6: Table. S2), implying the poor prognosis of patients with high risk might be partly due to the increasing expression of immune checkpoints. All this evidence above strongly demonstrated our risk score was signi cantly associated with the process of BLCA immune response. To verify the immunological relationship between the immune-checkpoint-relevant genes and the four hub genes, which were independent predictors in our risk model, we calculated the Pearson correlation coe cients and the results of this analysis were shown in Fig 6j and Additional le 7: Table. S3 (R > 0.3; p < 0.05). Remarkably, FASN and PCGF3 had negative correlation with the crucial immune-checkpoints, while DPYSL2 and TGFBI had positive correlation.

Establishment of an immune-related ceRNA network
Through miRcode online tool, 199 miRNAs were obtained based the ve hub mRNAs which included DPYSL2, TGFBI, NTF3, PCGF3 and FASN. After integrating the immune-related DEmiRNAs, which were acquired by our rst-step analysis of the whole process, 24 overlapped miRNAs were identi ed (Fig. 6c). Similar process of analysis was conducted on the predicted lncRNAs. However, starBase instead of miRcode was used for lncRNA prediction. Ultimately, a mRNA-miRNA-lncRNA ceRNA network including 5 mRNAs, 24 miRNAs and 86 lncRNAs was constructed (Fig. 6a). The correlation analysis among the ve hub mRNAs was displayed in Fig. 6b. Additional le 8: Table. S4 showed the prognostic value of 24 miRNAs and 86 lncRNAs via univariate Cox regression.
8. Identi cation of risk-score-related pathways GSVA displayed the ssGSEA scores of 9 pathways in high-risk group were signi cantly higher than those in low-risk group by "limma" R package (Fig. 7a). Top 10 terms ranked by Normalized Enrichment Score (NES) in high-risk group via GSEA software were chosen and visualized by R (Fig. 7b). The interaction analysis showed ve pathways, including allograft rejection, complement, E2F targets, in ammatory response, and epithelial mesenchymal transition (EMT) pathways, were mostly associated with risk score (Fig. 7c). Positive correlation between risk score and the ssGSEA score of the ve signi cantly changed pathways was recon rmed by Pearson signi cance test (Fig. 7d).

Discussion
Bladder cancer (BLCA), carrying a great social burden with about 170000 patients died annually 32 , has been widely demonstrated to be able to bene t from immunotherapy 33 , so exploring the underlying mechanism involving in the immune response of BLCA is necessary and meaningful. The increasing researches of ceRNA network suggested the upset of the equilibrium of regulatory ceRNA network played nonnegligible roles in the progress of diseases, especially cancers. A recent study 13 has constructed a ceRNA network, which was of high prognostic value, to achieve a deeper comprehension to the RNA regulatory network in BLCA. Besides, although several immune-relevant signature [34][35][36] have been constructed to predict the survival of BLCA, few studies concentrated on immune-relevant genes revealing the prognosis based on ceRNA hypothesis. In this study, we integrated the TCGA-BLCA database and GEO database to determine an immune-related risk signature and constructed an immune-relevant ceRNA network including 5 mRNAs, 24 miRNAs and 86 lncRNAs, leading to a better understanding of the oncological immunology of BLCA from the angle of RNA interactions.
Utilizing ssGSEA strategy, which has been proved to be effective for tumor grouping based on immune in ltrating status in multiple studies 17,35,37 , 411 BLCA patients were labeled with high-immunein ltrating and low-immune-in ltrating, and then genomic difference detection was performed to achieve the latent immune-relevant lncRNA, mRNA and miRNA. To identify the genes correlated with tumorigenesis of BLCA, we compared the genomic expression difference between the parcancerous tissues and malignant bladder tissues by "edgeR" package of R. WGCNA was implemented to obtain the lncRNAs mostly relevant to the process of immune response. Multiple databases, including MiRDB, miRcode, miRTarBase, TargetScan and starBase, were used to predict the possible binding pro le of the lncRNAs, mRNAs and miRNAs in order to ensure the accuracy of the prediction of target molecules as possible. KM survival analysis, univariate Cox regression, LASSO algorithm and multivariate Cox regression were implemented to develop a risk model based on 182 overlapped mRNAs. Through the internal and external validation, 5 mRNAs including PCGF3, FASN, DPYSL2, TGFBI and NTF3 were identi ed as important prognostic biomarkers and the risk score was successfully constructed. ESTIMATE algorithm, CIBERSORT algorithm and the exploring of the correlation between risk score and crucial immune checkpoints were conducted, verifying our risk score was signi cantly correlated with BLCA immune process. At last, we established an immune-relevant ceRNA network based on these 5 critical mRNAs, providing novel insights of the tumor immunity of BLCA.
Among these ve mRNAs, PCGF3, FASN, DPYSL2 and TGFBI were independent predictors and displayed signi cant prognostic value in the external validation dataset. Polycomb Group Ring Finger 3 (PCGF3), involving in polycomb group multiprotein PRC1-like complex, could inhibit the transcription of many genes in the developmental process 38 . It has been reported that PCGF3 could serve as a biomarker for the e cacy of docetaxel, cisplatin and S-1 (DCS) on patients with advanced gastric cancer 39 . Besides, some study reported the relatively high expression level of RING1 protein, such as PCGF3, could reveal a favorable prognosis in non-small cell lung cancer 40 , implying PCGF3 might be an antioncogene during the tumorigenesis. In our study, we rstly found the expression of PCGF3 acted as a favorable predictor in BLCA and was negatively correlated with the expression of immune-checkpoint-relevant genes. Most of the current researches of PCGF3 are based on developmental biology, but the role PCGF3 played in cancer development, especially the process of antitumor immunity, is unclear.
Fatty Acid Synthase (FASN), Dihydropyrimidinase Like 2 (DPYSL2) and Transforming Growth Factor Beta Induced (TGFBI) were all associated with adverse clinical outcomes in BLCA patients. As an essential lipogenic enzyme synthesizing the fatty acids de novo, FASN was involved in energy metabolism of BLCA. Many studies have reported the metabolic reprogramming of tumor cells could inhibit the energy intake of immune cells 41 . In addition, it has been found the fatty acids are able to induce the transformation of M1 macrophages to M2 macrophages, promoting tumor growth and metastasis 42 .
Recent study revealed blockade of FASN helped CD4 effector T cells effectively avoid restimulationinduced cell death (RICD) 43 . All the proof above displayed the close relationship between FASN and tumor immune response, while few studies concentrated on this aspect in BLCA for the moment.
DPYSL2, a member of dihydropyrimidinase-like protein family, which was acted as a regulator of the nervous system development, was rstly found to be an indicator for poor prognosis of BLCA. Some researchers 44,45 have demonstrated the linkage between DPYSL2 and mTOR signaling pathway, and the activation of mTOR signaling pathway is crucial for T cell differentiation 46 , suggesting the potential function of DPYSL2 in immunobiology of bladder cancer. TFGBI is an exocrine protein and could promote the malignant phenotypes of BLCA 47,48 . Recent study 49 has identi ed TGFBI as an immune-relevant biomarker for prognosis of clear cell renal cell carcinoma based on multiple datasets, verifying our interesting ndings. In all, combining the literature and the outcomes of our study, we believe PCGF3, FASN, DPYSL2 and TGFBI could serve as important prognostic factors and involve in the process of immune response in BLCA.
Several ncRNAs of the 24 miRNAs and 86 lncRNAs in our ceRNA network have been reported to be correlated with immune response and tumorigenesis. For example, miR-216a, which served as a predictor for poor prognosis of BLCA ( Figure. 6A), could promote the proliferation of renal cell carcinoma cell 50 . We also found HCP5, a lncRNA obtained from the target prediction of miR-216a, was correlated with immune checkpoints, which was reported by a recent literature 51 . Additionally, the previous studies offered us reliable experimental buttress to the prediction in our ceRNA network. The FASN-3'-UTRluciferase assay was successfully performed to validate the ability of miR-150-5p to target FASN in the mice experiment 52 . MiR-21 was able to induce the tumor growth by targeting TGFBI in non-small cell lung cancer cells 53 and pancreatic cancer cells 54 , demonstrating the relative stability of our predictions though experimental investigation in BLCA is demanded. Importantly, we constructed a gene-based risk scoring model, which showed high accuracy of prediction for overall survival both in the TCGA-BLCA cohort and GSE13507 cohort. By means of immune grouping and correlation test, we ensure the risk score is closely correlated with tumor immunity. Compared with other immune-related risk models [34][35][36] , we rstly made use of ceRNA hypothesis to establish a prognostic signature in order to make sure the risk score was immune-related. The analysis results of functional annotation indicated that the immune-related pathways, such as complement and in ammatory response, were signi cantly enriched in the high-risk group, which once again veri ed the immunological correlation of risk score. We also explored the immune cell in ltration divergence between low-and highrisk group, and found the higher in ltration proportion of macrophages M0, M1 and M2 and lower in ltration ratio of memory B cells, activated dendritic cells and regulatory T cells in the high-risk cohort than that in the low-risk group. Multiple researches revealed that tumor-associated macrophages (TAMs) could positively regulate the tumor growth and affect the effectiveness of BCG instillation in BLCA 55,56 .
Unexpectedly, M1 macrophages, which were regarded as anti-tumor cells, were found to be correlated with poor prognosis of BLCA in our study. It was reported that M1 macrophages could induce the expression of PD-L1 in hepatocellular carcinoma (HCC) cells, leading to the resistance to anti-tumor immunity 57 . Our research also disclosed that the higher in ltrating level of M1 macrophages and higher expression level of PD-L1, which is also known as PDCD1, could be discovered in the high-risk group.
Hence, there may be similar biological mechanisms in the tumorigenesis of BLCA, which needed to be further experimental validated. Furthermore, we revealed that the risk score is positively correlated with the immune score, stromal score and ESTIMATE score. Therefore, the cases in the high-risk group might have a signi cantly higher in ltration ratio of immune cells and stromal cells, along with the lower ratio of tumor cells, supporting the conclusion that the risk score could indicate the immune status of TME in patients with BLCA.
Meanwhile, the limitations of the present study are nonnegligible. First, PCGF3 and DPYSL2 were rstly reported to be prognostic biomarker and correlated with tumor immune response in BLCA patients, and further studies needed to explore the underlying mechanisms and stability of their predictive value though multicenter and large-scale researches. Second, we predict the possible targeted miRNAs and probable lncRNA competitors of DPYSL2, TGFBI, NTF3, PCGF3 and FASN, while few of the predictions have been veri ed in BLCA. Last, an immune-related scoring model has been constructed and validated primitively, and clinical traits ought to be performed to recon rm the robustness.
To sum up, our study has successfully set up an immune-related risk signature, including DPYSL2, TGFBI,

Availability of data and materials
The data supporting our ndings could be downloaded from UCSC browser and GEO database.
Ethics approval and consent to participate No applicable

Consent for publication
No applicable

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
Cun-dong Liu designed the study. Ran-ran Zhou developed the algorithm and drafted the manuscript. Hu Tian downloaded and analyzed the data. Cheng Yang checked the results. Fan Peng, Xi Zhi, Hao-yu Yuan and Jun-hao Zhou revised the manuscript. All authors read and approved the nal manuscript.  Figure 1 The work ow diagram of the study process. The immunogenomic landscape of the TME in BLCA patients. a. Unsupervised clustering of 29 immune signatures for 411 BLCA patients from the TCGA-BLCA cohort: high in ltration (red, n = 97) and low in ltration (blue, n = 314). Parameters including tumor purity, ESTIMATEScore, ImmuneScore and StromalScore are shown above the heatmap. b. Expression levels of HLA genes in the TCGA-BLCA cohort patients with a high-in ltrating TME (red) vs that in the patients with a low-in ltrating TME (blue).The The risk score based on the ve-gene signature is a promising predictor for prognosis of BLCA patients. ab. LASSO Cox regression identi ed 11 genes most correlated with overall survival in the training dataset. BLCA training (d), validation (e) and entire (f) cohort. Kaplan-Meier survival curves for patients assigned to high-and low-risk group in TCGA-BLCA training (g), validation (h) and entire (i) cohort. 2.05 is regarded as the optimal cut-off value via X-tile software. j-k. Time-dependent ROC curve (j) and Kaplan-Meier survival curve (k) shows the 5-gene risk signature can predict the prognosis of patients with BLCA from GSE13507.

Figure 4
Development of a nomogram based on the ve-gene signature to predict overall survival (OS) of patients with BLCA. a. Univariable Cox regression analysis for the entire TCGA-BLCA cohort. b. Multivariable Cox regression analysis for the entire TCGA-BLCA cohort. c. A nomogram including risk score and other clinical features for predicting 3-and 5-years OS for BLCA. d. Time-dependent ROC curve for 1-, 3-, and 5year OS predictions for the nomogram. e. Kaplan-Meier survival curve for the patients with BLCA. The cohorts were stratifed at median cut-off of the points calculated by nomogram to form high-risk and lowrisk groups. Calibration plot of nomogram for predicting probabilities of 3-(f) and 5-year (g) OS of TCGA-BLCA patients.

Figure 5
Correlation of tumor-in ltrating immune cells (TICs) proportion with risk score. a. Violin plot displaying the ratio divergence of 22 types of immune cells between BLCA samples with low or high risk score relative to the median of risk score, and Wilcoxon rank sum was used for the signi cance test. b. Scatter plot showing the correlation of 7 kinds of TICs proportion with risk score (p < 0.05). Pearson coe cient was used for the correlation test. c. Venn plot revealing six kinds of TICs correlated with risk score codetermined by difference test and correlation test displayed in violin and scatter plots, respectively.

Figure 6
Construction of a ceRNA network. a. A ceRNA network including 5 mRNAs, 24 miRNAs and 86 lncRNAs.
Red nodes represent mRNAs, blue nodes represent miRNAs, and green nodes represent lncRNAs. The nodes with red border mean the genes with HR >1, and the nodes with blue border mean the genes with HR <1 (p < 0.