Identication of Prognostic Biomarkers of Potential Hub Genes in Urothelial Carcinoma and Function in Microenvironment

Background Urothelial carcinoma (UC) is the most common histological type of urinary system. In the past decades, despite the advances in UC diagnosis and therapy, there are still challenges to improve the overall survival (OS) of UC patients. PD-L1 inhibitor and PD-1 inhibitor have been approved for treating invasive UC, however, only about 20% of patients with metastatic UC show clinical benets from immune checkpoint inhibitors. Therefor, bioinformatics tools were utilized to screen prognostic-related biomarkers, and analyze their relationship with immunocyte in UC, hoping to provide new ideas for the clinical treatment of UC patients.


Page 3/32
Urothelial carcinoma (UC), the most common histological type of urinary system [1], representing 2.1% of all cancer deaths [2]. There are an estimated 429,000 cases of UC diagnosed and 150,000 deaths worldwide each year [3]. UC derives from the transitional cells of the urinary tract and it is classi ed into the upper and lower tract UC [1,4]. According to the anatomical location, among which up to 90% is located in bladder and merely 5% is located in upper urinary tract [4]. According to 2016 WHO classi cation of tumors of urinary system and male genital organs, UC can be further classi ed into invasive and super cial types depending on depth of tumor in ltration [5]. To date, despite the advances in UC diagnosis and therapy in the past decades, there are still challenges to improve the overall survival (OS) of UC patients [6]. According to a previous study, 30-70% of UC patients would present recurrence and 30% of cases with super cial UC would show invasive lesions [7]. Therefore, it is necessary to futher analysis UC and develop novel prognostic biomarkers for UC patients.
Nowadays, more studies focus on the roles of immune checkpoint inhibitors (ICIs) and immunocytes in pathogenesis, immunotherapy of cancer, as well as disease progression [7]. Interaction between tumor cells and immunocytes suppressed immune response and promoted tumor progression [8][9][10]. In 1976, Bacillus Calmette-Guerin was rst used as an immune agent for treating UC [11] as it showed anti-tumor effects by activating the immune system and inducing in ammatory reaction [12]. However, no further meaningful progress has been achieved in treating invasive UC, local advanced or metastatic UC [13]. Up to now, several ICIs (e.g. PD-L1 inhibitor and PD-1 inhibitor) have been approved for treating invasive UC, which signi cantly prolong the survival of patients with invasive UC [14,15]. However, only about 20% of patients with metastatic UC show clinical bene ts from ICIs [1]. Moreover, immunotherapies are expensive. Therefore, it is necessary to select suitable patients for immunotherapy.
In this study, bioinformatics tools were utilized to conduct detailed molecular analysis of UC, we selected three gene expression pro les (i.e. GSE32548, GSE32894 and GSE48075) from Gene Expression Omnibus (GEO). NetworkAnalyst tool was used to construct gene regulatory network of DEGs, DAVID and Metascape were used to perform gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) enrichment analysis of DEGs. Hub genes were screened by Search Tool for Retrieval of Interacting Genes (STRING) and cytoscape software, and the ONCOMINE, Gene Expression pro ling Interactive Analysis (GEPIA), UALCAN, cBioPortal and Human Protein Atlas (HPA) databases were used to analyze the expression differences and survival curves of UC at the DNA, RNA, and protein levels. TIMER was used to analyze the relationship between hub genes and immunocyte in ltration. We hope to further explore its sensitive and speci c treatment methods for UC patients with immunotherapy.

Materials And Methods
Data source GEO (www.ncbi.nlm.nih.gov/geo) serving as an open functional genomics database of the highthroughput resources [16]. Three gene expression pro les (i.e. GSE32548, GSE32894 and GSE48075) with adequate samples of invasive and super cial UC were selected . [17][18][19] Ta and T1 tumors were classi ed as super cial UC, while tumors of T2 or higher stage were classi ed as in ltrating [20]. Samples with unknown stages (e.g. Tx) were excluded from analysis. All of the samples were based on Agilent GPL6947 platform (Illumina HumanHT-12 V3.0), and all data were freely available online. This study did not involve any experiments on humans or animals.
Data processing of DEGs GEO2R online analysis tool (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to detect the DEGs between super cial and in ltrating UC samples. NetworkAnalyst (https://www.networkanalyst.ca/) is an analytical platform integrating tissue-or cell-type-speci c protein-protein interaction (PPI) network, gene co-expression network, and gene regulatory network [21]. In this study, we constructed gene regulatory network of DEGs using NetworkAnalyst tool. Genes met the cut-off criteria, with adjusted P < 0.05 and |logFC| ≥ 1.0, were considered as DEGs. Statistical analysis was carried out for each dataset in NetworkAnalyst, and Venn plot (http://bioinformatics.psb.ugent.be/webtools/Venn/) were used to intersect three datasets to obtain common DEGs.

GO/KEGG pathway analysis of DEGs
GO is an ontology widely used in bioinformatics and a common useful method for large-scale functional enrichment research, covering biological process (BP), molecular function (MF), and cellular component (CC) [22]. KEGG contains abundant data on genomes, biological pathways, diseases, chemical substances, and drugs [23]. Initially, DAVID (https://david.ncifcrf.gov) website is designed to investigate the function of a large number of genes or proteins, containing a comprehensive set of functional annotation tools to clarify biological functions of target genes [24], and we utilized it for the GO annotation and KEGG pathway enrichment analysis of DEGs. To ensure the credibility of results, we also analyzed the data with online tools from Metascape (http://metascape.org/gp/index.html#/main/step1) software [25]. Cut-off value for signi cant GO terms and KEGG pathways were a false discovers rate (FDR) of less than 0.05. Meanwhile, GO and KEGG pathway enrichment analyses of DEGs were conducted using DAVID and Metascape, in order to visualize and integrate DEGs enrichment of BP, MF, CC and KEGG pathways. PPI network construction and hub genes identi cation STRING (http://string-db.org/; version 11.0) database is designed to analyze the PPI [26]. To evaluate the potential PPI relationship, DEGs were mapped using STRING database to evaluate PPI with a combined score of > 0.4 setting as the cut-off criterion. Subsequently, the PPI network was visualized by Cytoscape (www.cytoscape.org/; version 3.8.1) software serving as a bioinformatics software for computational analysis of cellular networks and merging of experimental omics datasets [27]. Nodes with a higher degree of connectivity tend to be more essential in maintaining the stability of the entire network. Hub genes were selected using CytoHubba network analyzer plug-in. Furthermore, Molecular Complex Detection (MCODE) plug-in was used to screen modules of hub genes from PPI network.
Hub genes analysis ONCOMINE (www.oncomine.org) database is an integrated online cancer microarray database for DNA or RNA sequencing analysis [28]. In our study, data were extracted to evaluate the expression of hub genes in invasive and super cial UC tissues samples. Cut-off of P-value and fold change were as following: Pvalue: 0.01, fold change: 1.5, gene rank: 10%, data type: mRNA, Analysis type: cancer vs cancer analysis. cBioPortal (www.cbioportal.org) is an online open-access website resource for exploring, visualizing, and analyzing multi dimensional cancer genomics data [29]. Copy number variation (CNV), mutations, and the summary of the gene types in UC were evaluated according to the online instructions of cBioPortal. In addition, relationship between gene mutation and UC prognosis was analyzed using cBioPortal tool based on TCGA database. In this study, we analyzed CNV of 413 UC samples (TCGA, provisional) and Pvalue of less than 0.05 was considered to be signi cantly different. UALCAN (http://ualcan.path.uab.edu) is an interactive web resource based on the level 3 RNA-seq and clinical data of 31 cancer types from TCGA database [30]. In this study, data for hub genes expression was obtained using "Expression Analysis" module of UALCAN dataset. Afterwards, it was utilized to analyze mRNA expression of hub genes in individual UC stages and their association with individual stages. Student's t test was conducted to analyze the differences in transcriptional expression. P < 0.05 was considered to be statically signi cant.
HPA (https://www.proteinatlas.org) is a free online database containing immunohistochemistry-based expression data for near 20 highly common kinds of cancers (n = 12 for each cancer type) [31]. It provides abundant transcriptome and proteome data on human normal tissues or pathological tissues through RNA sequence analysis and immunohistochemical analysis. In the present study, we compared the protein expression and distribution of hub genes in UC tissues with that of normal tissues in HPA. According to the fraction of stained cells, staining quantity was also divided into four levels: none, < 25%, 25-75%, and > 75%. Protein expression levels were based on staining intensity and staining quantity.
Classi cation criteria for protein expression levels were as follows: negative, not detected; weak and < 25%, not detected; weak combined with either 25-75% or 75%, low; moderate and < 25%, low; moderate combined with either 25-75% or 75%, medium; strong and < 25%, medium; and strong combined with either 25-75% or 75%, high. GEPIA (http://gepia.cancer-pku.cn) is an interactive web server for estimating mRNA expression data based on 9,736 tumors and 8,587 normal samples in TCGA and Genotype-tissue Expression dataset projects [32]. In this study, we examined the expression of hub genes in 402 UC patients from TCGA database based on gene expression by using the Logrank and Mantel-Cox tests. A total of 402 patients were used to evaluate the OS. Hazard ratio (HR), 95% con dent interval (CI), and P-value were calculated accordingly. Similarly, in order to ensure its accuracy, we also used UALCAN platform to analyze hub genes survival. Survival analysis of hub genes in UC was performed on the GEPIA and UALCAN online database, respectively. TIMER (https://cistrome.shinyapps.io/timer/) is a reliable, intuitive tool that provides systematic evaluations of the in ltration of different immunocyte and their clinical impact [33]. In this study, Gene module was used to evaluate the correlation between the hub genes level and the in ltration of immunocyte. Survival module was used to evaluate the correlation among clinical outcome and the in ltration of immunocyte and the hub genes expression. SCNA module was used to explore the relationship between copy number variation and immune in ltration.

Identi cation of DEGs
Three gene expression pro les (i.e. GSE32548, GSE32894 and GSE48075) were selected in this study.  (Table 1). Based on the criteria of P < 0.05 and |logFC| ≥ 1 through NetworkAnalyst platform, we obtained 157 DEGs from GSE32548, including 99 up-regulated genes and 58 down-regulated genes, 97 DEGs from GSE32894, including 47 up-regulated genes and 50 down-regulated genes, and 227 DEGs from GSE48075, including 130 up-regulated genes and 97 down-regulated genes (Supplementary Table 1). In addition, 63 DEGs were differentially expressed among all three groups, including 31 up-regulated genes and 32 down-regulated genes. All DEGs were identi ed by comparing super cial UC with in ltrating UC samples. Subsequently, DEGs of three datasets and common DEGs were visualized with Venn plot (

GO/KEGG Functional enrichment analyses of DEGs
The biological functions of all identi ed DEGs were evaluated by GO and KEGG pathway enrichment analyses. In the enrichment analysis of BPs, DEGs were signi cantly enriched in collagen catabolic process, collagen bril organization, extracellular matrix (ECM) organization, skeletal system development, transforming growth factor beta receptor signaling pathway and cellular response to amino acid stimulus ( Fig. 2A). In CC analysis, DEGs were signi cantly enriched in ECM, collagen trimer, extracellular space, proteinaceous ECM, extracellular region and extracellular exosome (Fig. 2B). In MF analysis, DEGs were predominantly enriched in ECM structural constituent, platelet-derived growth factor binding, heparin binding oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, monooxygenase activity and calcium ion binding (P < 0.05, Fig. 2C and Table 2). KEGG pathway analysis indicated that DEGs were signi cantly enriched in ECM-receptor interaction, focal adhesion, protein digestion and absorption, amoebiasis, platelet activation and PI3K-Akt signaling pathway with the highest gene numbers (P < 0.05, Fig. 2D and Table 3). We further analyzed DEGs enrichment of GO and KEGG pathways through Metascape (Supplementary Table 2)and results were roughly consistent with DAVID website.

PPI network construction and hub genes identi cation
Protein interaction among DEGs were predicted with STRING tools. A total of 62 nodes and 109 edges involved in PPI network (Fig. 2E). The results were then transferred to Cytoscape software to analyze the interaction between candidate DEGs in invasive and super cial UC. To investigate the signi cant modules in this PPI network, the most signi cant modules were obtained by Cytotype MCODE, with an enrichment score of 10.36. These indicated that module one consisted of 12 nodes and 57 edges (Fig. 2F). The top 9 genes with degree of ≥ 10 were screened based on CytoHubba analysis in the PPI network ( Table 4). The results showed that matrix metallopeptidase 9 (MMP9) was the most outstanding gene with a connectivity degree of 16, followed by periostin (POSTN; degree = 13), collagen type III alpha 1 chain (COL3A1; degree = 13), collagen type I alpha 1 chain (COL1A1; degree = 13), collagen type I alpha 2 chain (COL1A2; degree = 13), Versican (VCAN; degree = 12), thrombospondin 2 (THBS2; degree = 11), collagen type V alpha 2 chain (COL5A2; degree = 11), secreted phosphoprotein 1 (SPP1; degree = 10). All of these hub genes were up-regulated in invasive UC compared with super cial UC. Differential expression analysis of hub genes at DNA, mRNA, protein levels and prognostic of UC In order to explore the distinct prognostic and potential therapeutic value of hub genes in UC patients, DNA, mRNA and protein expression was analyzed by ONCOMINE database, cBioportal, UALCAN and HPA, respectively.
A comprehensive analysis was given to the molecular characteristics of differentially expressed hub genes by cBioportal database. Provisional datasets of TCGA were utilized to analyze genetic alterations of hub genes that were differentially expressed. As a result, 44 patients (35%) showed genetic alteration of hub genes in cBioportal for Cancer Genomics database. Ampli cation and deep deletion mutations were the most common genetic alteration (Fig. 3J). VACN, THBS2, POSTN and COL5A2 were the highest four genes with genetic alterations with a mutation rate of 16%, 13%, 12% and 10%, respectively.
In ONCOMINE database, mRNA expression of the hub genes in 20 types of cancers was compared to that of normal tissue (Fig. 4). We found the expression of all hub genes in UC were higher than that in normal tissues. Similarly, we further analyzed mRNA expressions of the hub genes through GEPIA database, the results showed that the expression level of mRNA in UC of MMP9 and SPP1 was higher than that of normal tissues (P < 0.05, Supplementary Fig. 1). Furthermore, we also indicated that mRNA expression of hub genes in invasive UC patients was signi cantly higher than that in super cial UC patients in ONCOMINE database, (P < 0.05, Fig. 3A-3I, Supplementary Table 3).
In this section, we tried to explore the protein expression pattern of the hub genes in UC by HPA. There were no data on the expression of COL5A2 in UC and normal humans in the HPA database. MMP9 protein was not detected in normal and UC tissues. COL1A1, COL1A2 and THBS2 showed moderate expression in normal and UC tissues. Low COL3A1 and SPP1 expression was detected in normal tissues, while moderate and high expression was detected in UC tissues, respective. POSTN and VCAN were not detected in normal tissues, and UC tissues with moderate expression of POSTN and high expression of VCAN (Fig. 5). The hub genes were not signi cantly different from the normal and UC tissues, which may be related to the small sample size (n = 12), and the regulation of other factors in the process of RNA translation protein.
The effects of these hub genes on OS and disease-free survival (DFS) rates were analyzed by Kaplan-Meier curve using the online GEPIA platform. Our data showed that COL1A2, COL3A1, THBS2 and VCAN were positively correlated with OS rate in UC patients (P < 0.05, Fig. 6). Meanwhile, there was no correlation between hub genes and DFS rate in UC (Supplementary Fig 2). Interestingly, the UALCAN websites showed that UC patients with high expression of COL1A1, COL1A2, COL5A2 and POSTN showed a poor prognosis (P < 0.05, Supplementary Fig 3). It is worth noting that COL1A2 is meaningful in both databases.
Immunocyte in ltration of the hub genes in patients with UC TIMER database was utilized to investigate the correlation between differentially expressed hub genes and immunocyte in ltration. Gene module analysis showed that a negative correlation between expression of all hub genes and in ltration of B cell (P < 0.05). A positive correlation was noticed between hub genes expression and the in ltration of CD4+T cells, CD8+T cell, macrophages, neutrophils and dendritic cells (all P < 0.05, Fig. 7). We also evaluated the correlation of differentially expressed hub genes and immunocytes in ltration. Cox proportional hazard model was used for the correction of the following confounding factors: B cells, CD4+T cells, macrophages, neutrophils, COL1A1, COL1A2, COL3A1, COL5A2, MMP9, POSTN, SPP1, THBS2, VCAN. Expression of macrophages (P = 0.004), dendritic cells (P = 0.016), MMP9 (P = 0.043) was signi cantly associated with the UC patients (Table 5). Additionally, we further evaluated the impact of immune in ltration on the clinical prognosis of patients with UC. Our ndings showed that B cells, T cells, macrophages, neutrophils, and dendritic cells were factors related to the cumulative survival rate of UC over time (Fig. 8). And SCNA module analysis showed that several immune cell in ltration levels seemed to associate with hub genes CNV in UC (Fig. 9).

Discussion
The pathogenesis and progression of UC is complicated with various genetic abnormalities [15]. Therefore, it is essential to identify appropriate biomarkers for UC. We conducted a series of bioinformatics analysis based on three independent gene chip databases.
In the present study, 63 common DEGs were identi ed including 31 that were up-regulated and 32 that were down-regulated, respectively. We focused on the function of differentially expressed DEGs using GO and KEGG pathway enrichment analysis. Our data demonstrated that DEGs were mainly enriched in collagen catabolic process, ECM organization, ECM structural constituent and ECM-receptor interaction. ECM was a key component exerting an active effect in all the marks of cancer and was associated with tumor metastasis [34]. Collagen was recognized as the most important component of the ECM, which provided multiple biochemical and biophysical cues to tumor cells [35]. ECM-receptor interaction was shown to be important components of cancer cell proliferation and invasion [36]. These results indicated that these DEGs were mainly involved in the occurrence, development and metastasis of UC. Among the DEGs, nine potential hub genes were obtained using the STRING and MCODE plug-in of Cytoscape.
Notably, all our hub genes were overexpressed in UC. Afterwards, we analyzed the characteristics of hub genes through various databases with an aim to investigate the potential biomarkers of UC.
There were many studies reporting the family of collagen (e.g. COL1A1, COL1A2, COL3A1 and COL5A2) that was involved in carcinogenesis. Li et al. [37] indicated that the high COL1A1 expression was associated with shorter survival in UC patients. Brooks et al. [38] found that high COL1A1 expression was associated with shorter survival time in patients with super cial UC. COL1A1 has been considered as an oncogene and promoted cancer migration and invasion by inducing epithelial-mesenchymal transition (EMT) [39]. In addition, increasing evidence con rmed that COL1A1 involved in proliferation, migration, development, and progression of various cancers [37,39]. This indicated that COL1A1 might be a putative therapeutic target for cancer. Rong et al. [40] reported high expression of COL1A2 in gastric cancer than in normal tissues. Additionally, Tamilzhalagan et al. [41] reported that miR-25 could selectively target the COL1A2 gene. MiR-25 silencing increased the expression of COL1A2 and inhibited the expression of E-cadherin. This is related to EMT, revealing the inhibitory effect of miR-25 on diffuse gastric cancer [15]. Notably, COL1A2 was highly expressed in pancreatic cancer, sarcoma, and gastric cancer tissues and promoted cell proliferation, migration, and invasion, which provided insights for mechanistic studies and clinical trials [42]. It has been reported that mRNA transcription level of bladder cancer is increased and CpG hypermethylation of COL1A2 contributes to proliferation and migration activity of human bladder cancer [43]. It has been reported that mRNA transcription level of bladder cancer is increased and CpG hypermethylation of COL1A2 contributes to proliferation and migration activity of human bladder cancer. Qiu et al. reported that miR-29a/b promoted cell migration and invasion in nasopharyngeal carcinoma progression by regulating COL3A1 gene expression [42]. While Su et al. indicated that let-7d can at least partially inhibit growth, metastasis, and tumor macrophage in ltration in renal cell carcinoma by targeting COL3A1 and CCL7 [42]. Yuan et al. discovered that COL3A1 played certain roles in the progression of human UC and in uenced the prognosis probably by regulating MAPK signaling pathway, which contributed to the poor prognosis of UC [42]. These were consistent with our results. In a previous study, the in situ expression of COL5A2 increased in invasive breast cancer and was associated with ECM remodeling, suggesting that COL5A2 is associated with tumor progression in breast cancer [42]. Moreover, the expression of COL5A2 was related to the malignancy of colorectal cancer, suggesting its potential as a biomarker for colorectal cancer. Zeng et al. [44] demonstrated that COL5A2 was related to the poor clinical outcome and survival rate of patients with UC, demonstrating that COL5A2 was associated with poor clinical outcomes and survivals of patients with UC. This implied that it could be regarded as a biomarker of UC. Therefore, we considered that COL1A1, COL1A2, COL3A1 and COL5A2 may serve as promising candidates for the diagnosis, treatment and prognosis of UC.
MMP9 is an important matrix proteinase involving in embryonic development, wound healing, arthritis, angiogenesis, and cancer invasion and metastasis [45]. MMP9 mRNA expression was higher in invasive UC than that in super cial UC [46]. Besides, Lei et al. [8] revealed MMP9 could promote cancer invasion, metastasis and angiogenesis, while its expression was correlated with UC grade, invasiveness and poor prognosis. Wu et al. [47] found that the Wnt signaling pathway would regulate UC metastasis through activating MMP9, and showed that MMP9 may be regulated by miR-3713 in UC cells.
POSTN is an ECM protein expressed in many normal tissues, where it regulated cell adhesion and played an important role in the development and maintenance of mechanical stress-bearing structures (e.g. bones, teeth, and heart valves) [48]. POSTN was overexpressed in many human cancers, and high POSTN expression was correlated with tumor proliferation, cell motility and invasion, metastasis, angiogenesis, evasion of apoptosis, clinical stage, and survival of patients with UC [48], which is consistent with our ndings. In addition, at the molecular level, POSTN involved in activation of PI3K/Akt and/or MAP kinase pathways via interacting with integrin receptors (i.e. aVb3 and aVb5), which subsequently promoted cell adhesion, motility, and angiogenesis. Furthermore, it could regulate E-cadherin expression and cell invasion in a cell-type dependent manner via Akt phosphorylation and regulating E-cadherin transcription, as well as negative regulation on Snail and Twist [49]. We identify POSTN expression as a feature of invasive UC associated with poor outcomes.
SPP1, also known as Osteopontin (OPN), is located on chromosome 4 (4q13) including 6 introns and 7 exons [50]. Recent studies have reported that SPP1 was signi cantly associated with cell growth, adherence and invasion in tumorigenesis and metastasis [50,51]. It was over-expressed in lung, colon, breast, and prostate cancers. Our study showed that SPP1 expression was related to tumor stage, progression and survival rate. In addition, high SPP1 gene expression was related to poor prognosis. This implied that SPP1 may be a diagnostic and prognostic marker for UC. Meanwhile, SPP1 can enhance cancer cell survival, angiogenesis, and in ammation, and promote metastasis by enhancing EMT. This indicated that SPP1 was a tumor-promoting gene in tumors [51,52].
THBS2, also known as TSP2, is a disul de-linked glycoprotein that mediates ECM assembly, cell-to-matrix interactions, degradation of MMP2 and MMP9, and inhibition of angiogenesis [53]. THBS2 mainly participates in tumors by inhibiting angiogenesis and negatively regulating MMP-2 and MMP-9 [51]. THBS2 is generally considered a tumor suppressive gene. However, there are disputes on expression of THBS2 in various tumors. High THBS2 immunoreactivity was associated with the tumor response to neoadjuvant chemoradiotherapy. Besides, it was also an independent and good prognostic factor for DFS in patients with rectal cancer [53]. THBS2 gene is down-regulated in prostate cancer tissues and cell lines [53]. In addition, Chijiwa found that the transcription level of THBS2 in lung adenocarcinoma was signi cantly higher than that of normal lung tissue (P < 0.0001) [53]. In our study, THBS2 was upregulated in UC, which played important roles in UC progression.
VCAN, also known as chondroitin sulfate proteoglycan 2 (CSPG2), is a highly conserved structural component in ECM expressed in invasive and metastatic cancer [54]. To date, four subtypes (i.e.V0, V1, V2, and V3) have been identi ed, and all of which contribute to the proliferation, adhesion, and migration of tumor cells, as well as interaction with tumor microenvironment (TME) [55]. In a recent study, VCAN has been reported to be correlated with low RhoGDI2 expression, high VCAN expression and poor clinical outcomes [54]. Our data showed VCAN expression was related to grade and prognosis of UC patients.
The highly in ammatory microenvironment induced high expression of VCAN, and increased macrophage in ltration. In turn, in ltration of macrophage exacerbated VCAN overexpression along with the secretion of other cytokines and in ammatory mediators. In this context, VCAN appeared to mediate a dialogue between in ammatory cells, cytokines and cancer cells in TME [54]. Therefore, we proposed that VCAN may serve as the rst step in the ampli cation of in ammation.
Increasing evidence indicated that immunocyte in ltration may affect tumor progression and recurrence [9]. In recent years, immunotherapy has been preferred in treating aggressive or advanced cancers with promising e ciency [7,10]. As immune microenvironment affects tumor progression, we intended to explore an immune-related gene signature associated with prognosis of patients with UC. TME is composed of cancer cells and non-cancer cell components such as immunocyte (B cells, CD8+T cells, CD4+T cells, macrophages, neutrophils and dendritic cells), mesenchymal cells, endothelial cells, ECM molecules, and in ammatory mediators [56]. Tumor gene expression pro le could quantify the immune activity in TME, such as CD4+T recognizing cancer antigens, and then activated M1 macrophages may inhibit tumor growth [57]. In addition, broblasts were associated with exocrine phenotype of T cells in bladder cancer, and the secreting transforming growth factor-β (TGF-β) can lead to the e ux of immunocyte or the resistance of chemotherapy drugs [58]. Therefore, it is necessary to screen biomarkers for hub genes in the immune responses.
Although the effects of CD4+T cells on the clinical outcome were controversial, high in ltration of CD8+ cytotoxic T cells and memory T cells were clearly shown with a longer survival. There was increase in expression of T-helper 1 and cytotoxicity-related genes in UC patients [8]. Mariathasan et al. also shown that invasive UC patient with a high number of CD8+T cells within the tumor showed a longer survival than those with fewer CD8+T cells [8]. Immunocyte has complex biological relationships with tumor cells, which then triggers the progression or repression [15]. Therefore, we speculated that the diversity of immunocyte in TME might be one of the causes leading to different immunotherapeutic responses. In this study, we found that hub genes expressions were negatively correlation with B cell in ltration (P < 0.05), and was positive correlation wit the in ltration of CD4+T cells, CD8+T cell, macrophages, neutrophils and dendritic cells. In addition, our research results also showed that B cells, T cells, macrophages, neutrophils, and dendritic cells were factors related to the cumulative survival rate of UC over time and the immunocyte in ltration levels seemed to associate with hub genes CNV in UC. These ndings together suggest that these hub genes were prognostic indicators and may involve in activation and recruitment of immunocytes in UC.
Our study focused on the exploration of therapeutic targets and prognostic biomarkers in hub genes of UC microenvironment by the screening of multiple databases. There are some limitations in our study.
First, multiple different databases are utilized, and it is di cult to guarantee the unity among different databases. Second, we did not verify the data from these databases. In dependent cohort and in vitro or in vivo studies are required to validate our results. Meanwhile, functional and molecular mechanisms investigations on the hub genes in UC should be further performed.

Conclusion
In conclusion, 63 differentially expressed genes were identi ed in the GEO datasets of UC. Nine of them (i.e. COL1A1, COL1A2, COL3A1, COL5A2, MMP9, POSTN, SPP1, THBS2 and VCAN) were identi ed to be important for the pathogenesis and progression of UC. In the database, the expression levels of hub genes in invasive UC tissues were signi cantly higher than in super cial UC tissues. The expression of hub genes cannot only be used as a prognostic indicator of UC, but it is also related to the immunocyte in ltration. The present study provided novel insights into the occurrence and progression of UC.
However, the diagnostic and prognostic value of these genes required further validation. Therefore, we   This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a major component of the extracellular matrix and is involved in cell adhesion, proliferation, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations of this gene are the cause of Wagner syndrome type 1.

THBS2 thrombospondin 2 11
The protein encoded by this gene belongs to the thrombospondin family. It mediates cell-to-cell and cellto-matrix interactions.
COL5A2 collagen type V alpha 2 chain 11 This gene encodes an alpha chain for one of the low abundance fibrillar collagens and product is closely related to type XI collagen and mutations are associated with Ehlers-Danlos syndrome, types I and II.

10
The protein encoded by this gene is involved in the attachment of osteoclasts to the mineralized bone matrix. The encoded protein is secreted and binds hydroxyapatite with high affinity and upregulates expression of interferon-gamma and interleukin-12.