Identification and Prognostic Analysis of Hub Genes in Bladder Cancer


Background: Bladder cancer (BC) is one of the most common tumors worldwide, and its incidence and mortality rank first among urological malignancies. Owing to the lack of reliable predictors, most patients are not diagnosed and treated in time. Moreover, clinical treatment of BC has seen little progress over the past 30 years, and the 5-year survival rate of patients has remained flat. Therefore, novel markers and therapeutic targets are urgently needed for the diagnosis and prognosis of BC. Methods: The BC gene expression microarray dataset (GSE121711) was downloaded from the GEO database, and the BLCA RNA-seq data were downloaded from the TCGA database. Differentially expressed genes (DEGs) were identified in R with the limma and edgeR packages, and the DEGs shared by the two datasets (overlapped DEGs) were retained. Gene Ontology (GO) function analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the overlapped DEGs were then performed with the DAVID database, and a protein–protein interaction (PPI) network was constructed to screen hub genes in BC. Expression and prognostic analyses of the hub genes were performed with UALCAN and Kaplan-Meier plotter. Results: A total of 372 overlapped DEGs were obtained, of which 93 were up-regulated and 279 were down-regulated. These genes were mainly enriched in functions and pathways such as glycosaminoglycan binding, vasculature development, cell cycle, and proteoglycans in cancer. PPI network analysis yielded 12 hub genes.
Among these hub genes, HMMR, NCAPG2, SMC4, and TROAP were closely related to the survival of bladder cancer patients, suggesting that they may play key roles in the occurrence and progression of BC. Conclusion: Our current study indicates that HMMR, NCAPG2, SMC4, and TROAP are potential prognostic biomarkers for BC, and they may also become clinical therapeutic targets in the future.


Background
Bladder cancer (BC) is the 10th most common form of cancer globally, with an estimated 549,000 new cases and 200,000 deaths in 2018 1 . BC is more common in men than in women: in men, the incidence and mortality rates are 9.6 and 3.2 per 100,000, respectively, about four times those in women worldwide 1 .
The main risk factors for bladder cancer include tobacco smoking; occupational exposure to carcinogens such as aromatic amines and carbon black dust; long-term consumption of drinking water contaminated with arsenic or chlorine; and a family history of cancer 2 ; 3 . Currently, approximately 75% of patients present with non-muscle-invasive bladder cancer (NMIBC) and 25% with muscle-invasive or metastatic disease 4 . High-grade tumors are associated with poor prognosis 5 . BC also tends to recur: the 5-year recurrence rate of NMIBC is between 50% and 70%, and the 5-year progression rate ranges from 10% to 30% 6 ; 7 . The diagnosis of papillary BC depends on cystoscopy and histological evaluation of sampled tissue 8 . However, cystoscopy, the primary tool for diagnosing bladder cancer and monitoring recurrence, is invasive, expensive, and uncomfortable for patients. In addition, no urine cytology test is sensitive enough to replace cystoscopy 9 ; 10 . Treatment for BC has seen little progress over the past 30 years, and diagnosis often comes too late 11 . Therefore, novel and reliable BC biomarkers and therapeutic targets are urgently needed.
With advances in high-performance computing, high-throughput sequencing is increasingly used to analyze gene expression. Microarray technology and bioinformatics analysis have become essential tools in medical oncology 12 ; 13 . Many public tumor databases have been established, such as the Gene Expression Omnibus (GEO) 14 and The Cancer Genome Atlas (TCGA) 15 . In the present study, we performed a series of bioinformatics analyses on public data from these two databases and identified HMMR, NCAPG2, SMC4, and TROAP as potential diagnostic biomarkers for BC that are also closely related to patient survival. Our study thus provides evidence on the pathogenesis and prognosis of BC.

Microarray data
We downloaded a gene expression dataset (GSE121711) from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) according to the following screening criteria: (a) the samples include both normal and tumor tissue;

Data processing and DEGs screening
The raw CEL files of the GSE121711 dataset were normalized with the Affy package 16 in R (version 3.6.2, x64; https://www.r-project.org/). DEGs in GSE121711 were identified with the limma package 17 , using an adjusted P-value < 0.05 and |logFC| > 1 as the cut-off criteria. The TCGA BLCA dataset was analyzed with the edgeR package 18 using the same cut-off criteria (adjusted P-value < 0.05 and |logFC| > 1). The significant DEGs from each dataset were then displayed in separate volcano plots. Subsequently, the online Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to obtain the overlapped DEGs, and the up-regulated and down-regulated genes were stored separately for subsequent analysis.
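The cut-off logic in this step can be sketched in a few lines. This is a minimal Python illustration, not the limma or edgeR pipeline itself: it assumes per-gene log fold changes and raw P-values have already been computed by those packages, applies Benjamini-Hochberg adjustment (the default `adjust.method` of limma's `topTable` and edgeR's `topTags`; the paper does not name its adjustment method, so this is an assumption), and keeps genes passing both thresholds. It also assumes that an "overlapped" DEG is one called in the same direction in both datasets. The function names and the toy genes G1 to G4 are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_degs(stats: Dict[str, Tuple[float, float]],
                  alpha: float = 0.05,
                  min_abs_logfc: float = 1.0) -> Tuple[Set[str], Set[str]]:
    """Split genes into up- and down-regulated DEGs.

    stats maps gene -> (logFC, raw p-value). A gene is a DEG when its
    BH-adjusted p-value < alpha and |logFC| > min_abs_logfc.
    """
    genes = list(stats)
    adj = benjamini_hochberg([stats[g][1] for g in genes])
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, padj in zip(genes, adj):
        logfc = stats[gene][0]
        if padj < alpha and abs(logfc) > min_abs_logfc:
            (up if logfc > 0 else down).add(gene)
    return up, down


def overlap_degs(a: Tuple[Set[str], Set[str]],
                 b: Tuple[Set[str], Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Genes called up (or down) in BOTH datasets, i.e. the Venn intersection."""
    return a[0] & b[0], a[1] & b[1]


if __name__ == "__main__":
    # Toy (logFC, raw p) values for hypothetical genes; not data from the study.
    geo = {"G1": (2.5, 1e-4), "G2": (-1.8, 1e-3), "G3": (0.4, 1e-4), "G4": (3.0, 0.2)}
    tcga = {"G1": (1.6, 2e-3), "G2": (-2.2, 5e-4), "G3": (1.5, 1e-3), "G4": (2.0, 1e-2)}
    up, down = overlap_degs(classify_degs(geo), classify_degs(tcga))
    print(sorted(up), sorted(down))  # ['G1'] ['G2']
```

In the toy example only G1 (up) and G2 (down) survive: G3 fails the |logFC| cut-off in the first dataset and G4 fails the adjusted P-value cut-off there, so neither is shared.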
Gene Ontology (GO) function analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the overlapped DEGs were performed with the DAVID database (https://david.ncifcrf.gov/), an online bioinformatics resource for gene functional classification 19 . The GO analysis covers three categories: biological process (BP), molecular function (MF), and cellular component (CC). P < 0.05 was considered statistically significant.
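The over-representation test behind this kind of enrichment analysis can be illustrated with the hypergeometric upper tail, which equals the one-sided Fisher's exact P-value for a term. This is a minimal sketch using only the Python standard library. DAVID itself reports a modified, more conservative Fisher's exact P-value (the EASE score), which this sketch does not reproduce; the background size, term size, and hit count in the example are made-up values, not DAVID's background for this study.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background (population) genes, K: background genes annotated to the term,
    n: genes in the submitted list, k: list genes annotated to the term.
    This is the one-sided Fisher's exact p-value for over-representation.
    """
    upper = min(n, K)  # the list cannot contain more term genes than this
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(max(k, 0), upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this final division


if __name__ == "__main__":
    # Hypothetical counts: 20,000 background genes, 200 annotated to a term, and
    # 15 of the 372 overlapped DEGs annotated to it (about 3.7 expected by chance).
    p = hypergeom_upper_tail(k=15, n=372, K=200, N=20000)
    print(f"one-sided enrichment p = {p:.3g}; significant at 0.05: {p < 0.05}")
```

Working with exact integer binomial coefficients avoids the underflow that products of small probabilities can cause when background sets are large.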
Protein-protein interaction network and module analysis
STRING (https://string-db.org/) is an online search tool for protein-protein interaction (PPI) networks and functional enrichment analysis 20 . We constructed the PPI network by importing the overlapped DEGs into STRING (version 10.0), using a combined score > 0.4 as the threshold for interactions. Cytoscape (v3.6.1) 21 , a free visualization software, was then used to visualize the PPI network. Subsequently, we used the Cytoscape plug-ins cytoHubba and Molecular Complex Detection (MCODE) to identify hub genes and significant modules within the PPI network. Finally, 12 hub genes with a connection degree > 30 and the top 3 modules with degree > 3 were screened.
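The degree-based hub selection described here can be illustrated in plain Python. This sketch assumes the STRING export has been reduced to (protein A, protein B, combined score) rows with scores on a 0 to 1 scale (STRING's bulk download files may instead use 0 to 1000), keeps edges with a combined score > 0.4, and ranks nodes by degree, i.e. the number of distinct interaction partners, which is the quantity cytoHubba's Degree method ranks by. It does not reproduce MCODE module detection. The edge list and gene names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str, float]  # (protein A, protein B, combined score in [0, 1])


def build_ppi(edges: Iterable[Edge], min_score: float = 0.4) -> Dict[str, Set[str]]:
    """Undirected adjacency sets keeping only edges with combined score > min_score."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:  # drop weak edges and self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def hub_genes(adjacency: Dict[str, Set[str]], min_degree: int = 30,
              top_n: Optional[int] = None) -> List[Tuple[str, int]]:
    """(gene, degree) pairs with degree > min_degree, highest degree first."""
    ranked = sorted(((g, len(nb)) for g, nb in adjacency.items()),
                    key=lambda item: (-item[1], item[0]))  # ties broken alphabetically
    hubs = [(g, d) for g, d in ranked if d > min_degree]
    return hubs[:top_n] if top_n is not None else hubs


if __name__ == "__main__":
    # Hypothetical edges; real input would be the STRING network of the overlapped DEGs.
    edges = [("GENE_A", "GENE_B", 0.95), ("GENE_A", "GENE_C", 0.90),
             ("GENE_A", "GENE_D", 0.70), ("GENE_B", "GENE_C", 0.80),
             ("GENE_D", "GENE_E", 0.30)]  # below the 0.4 threshold, so dropped
    network = build_ppi(edges)
    # The study used degree > 30; a toy network needs a much lower cut-off.
    print(hub_genes(network, min_degree=1))  # [('GENE_A', 3), ('GENE_B', 2), ('GENE_C', 2)]
```

The `top_n` argument covers the alternative phrasing used in the results (the 12 genes with the highest degree), while `min_degree` mirrors the degree > 30 criterion.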
Validation and prognostic significance analysis of the hub genes
UALCAN (http://ualcan.path.uab.edu/) is an interactive web portal for in-depth analysis of TCGA cancer data 22 . We used it to compare the expression of the hub genes between normal and BC samples. The Kaplan-Meier plotter database (https://kmplot.com/analysis/) was used to evaluate the prognostic value of the hub genes. P < 0.05 was considered statistically significant.
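Kaplan-Meier plotter runs this analysis on its own curated cohorts; the sketch below only illustrates the two standard calculations behind such a plot, the Kaplan-Meier survival estimate and the two-group log-rank test, in plain Python. The median split into high- and low-expression groups is an assumption for illustration (the paper does not state which cut-off was used, and the tool can apply other cut-offs), the helper names are hypothetical, and the survival times and expression values in the example are invented.

```python
from math import erfc, sqrt
from statistics import median
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """(time, S(t)) at each distinct event time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve


def logrank(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test: (chi-square statistic with 1 df, p-value)."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
              [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)  # at risk in both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(e for s, e, _ in pooled if s == t)  # deaths at time t
        d_a = sum(e for s, e, g in pooled if s == t and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def split_by_median(expression: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Hypothetical grouping: indices of high (> median) and low (<= median) expressers."""
    cut = median(expression)
    high = [i for i, x in enumerate(expression) if x > cut]
    low = [i for i, x in enumerate(expression) if x <= cut]
    return high, low


if __name__ == "__main__":
    # Invented data: months of follow-up, death indicator, and expression of one gene.
    months = [3, 5, 6, 8, 10, 12, 20, 24, 30, 36]
    died = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
    expr = [9.1, 8.7, 8.9, 7.5, 8.2, 5.1, 4.8, 6.0, 5.5, 4.2]
    high, low = split_by_median(expr)
    chi2, p = logrank([months[i] for i in high], [died[i] for i in high],
                      [months[i] for i in low], [died[i] for i in low])
    print("high-expression KM:", kaplan_meier([months[i] for i in high], [died[i] for i in high]))
    print(f"log-rank chi2 = {chi2:.2f}, p = {p:.3f}")
```

For a chi-square variable with one degree of freedom, the upper-tail probability equals erfc(sqrt(x/2)), which is why no statistics library is needed for the P-value.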

Screening of differentially expressed genes in bladder cancer
The GSE121711 gene expression dataset was normalized with the Affy package in R, and the limma package was then used to screen for DEGs. A total of 566 DEGs were obtained, of which 154 were up-regulated and 412 were down-regulated. The edgeR package was used to analyze the TCGA-BLCA dataset, yielding a total of 4753 DEGs, of which 2692 were up-regulated and 2062 were down-regulated. The DEGs from the two datasets are shown in volcano plots generated with the ggplot2 package (Fig. 1A, B). To improve the reliability of the DEGs in BC, the 372 DEGs shared by the two datasets, comprising 93 up-regulated and 279 down-regulated genes, were retained (Fig. 1C-E).
DAVID was used to analyze GO function and KEGG pathway enrichment of the overlapped DEGs. The top 15 GO terms of the overlapped DEGs are shown in Table 1. The DEGs were mainly enriched in glycosaminoglycan binding, vasculature development, and anatomical structure morphogenesis. To further analyze the biological function of the DEGs, the top 10 results in each GO category (BP, CC, and MF) are shown. In the KEGG pathway analysis, the overlapped DEGs were mostly enriched in the cell cycle, proteoglycans in cancer, pathways in cancer, focal adhesion, cGMP-PKG signaling pathway, cell adhesion molecules (CAMs), p53 signaling pathway, regulation of actin cytoskeleton, DNA replication, and cAMP signaling pathway (Fig. 2D). The 16 pathways with P < 0.05 are listed in Table 2.
PPI network analysis and screening for hub genes
The overlapped DEGs were uploaded to the online tool STRING for protein interaction network analysis to obtain the PPI network (Fig. 3A). Cytoscape was then used to visualize the PPI network, and the cytoHubba plug-in was used to screen hub genes. The 12 genes with the highest degree of connectivity were identified as hub genes (Table 3). Subsequently, the hub genes were imported into STRING to detect interactions between the proteins they encode (Fig. 3B). In addition, to further analyze the interrelationships in the PPI network, the MCODE plug-in was used to select the top 3 modules in the PPI network (Fig. 3C-E).
As shown in Fig. 4, the expression of most hub genes was significantly higher in tumor samples than in normal samples, supporting the reliability of the candidate hub genes.
To further analyze the correlation between these hub genes and patient prognosis, we performed survival analysis with Kaplan-Meier plotter (Fig. 5A-L) and found that HMMR, NCAPG2, SMC4, and TROAP were significantly associated with the survival of bladder cancer patients (P < 0.05). Overall survival was significantly reduced in patients with high expression of HMMR, NCAPG2, SMC4, or TROAP, whereas the expression of the other hub genes was not significantly associated with overall survival. These results indicate that HMMR, NCAPG2, SMC4, and TROAP are associated with BC progression and might serve as predictors of tumor progression and as potential therapeutic targets for BC patients.

Discussion
Bladder cancer is one of the 10 most common cancers in the world 23 , with a strong male predominance: it is about four times more common in men than in women. Reflecting unequal allocation of healthcare resources, more than 60% of all bladder cancer cases and half of the 165,000 bladder cancer deaths occur in the less developed regions of the world 24 ; 25 . At first diagnosis, about 75% of patients have non-muscle-invasive bladder cancer and 25% have muscle-invasive or metastatic disease.
The recurrence rate can reach 50% to 70% within 1 year after surgery, and 10% to 30% of patients progress to muscle-invasive disease or distant metastasis 26 ; 27 . Although novel diagnostic and therapeutic approaches have been implemented, the five-year survival rate for bladder cancer has not improved since the 1990s 28 . Therefore, finding new diagnostic markers, therapeutic targets, and treatment methods remains urgent and challenging.
In the present study, we used bioinformatics approaches to identify 372 overlapped DEGs between normal bladder and BC samples from the GEO and TCGA databases. GO (BP, CC, and MF) enrichment analysis yielded the following top terms: glycosaminoglycan binding, heparin binding, sulfur compound binding, extracellular matrix binding, and binding (MF); extracellular space, contractile fiber, myofibril, contractile fiber part, and extracellular matrix (CC); and anatomical structure morphogenesis, circulatory system development, blood vessel development, movement of cell or subcellular component, and tissue development (BP). These results suggest that the overlapped DEGs are related to the mitotic process and to the invasion of bladder cancer cells. KEGG pathway analysis showed that the overlapped DEGs were mainly enriched in the cell cycle, proteoglycans in cancer, pathways in cancer, focal adhesion, cGMP-PKG signaling pathway, cell adhesion molecules (CAMs), p53 signaling pathway, regulation of actin cytoskeleton, DNA replication, and cAMP signaling pathway, which are also implicated in tumorigenesis and progression. Further study of these pathways may therefore help elucidate the mechanisms of BC and predict cancer progression. A PPI network constructed from the 372 overlapped DEGs yielded 12 hub genes: CDK1, UBE2C, MKI67, MAD2L1, BIRC5, RRM2, CEP55, PTTG1, HMMR, SMC4, NCAPG2, and TROAP. All of these hub genes are potential biomarkers for BC. In addition, survival analysis of the 12 hub genes with Kaplan-Meier plotter showed that the expression levels of HMMR, NCAPG2, SMC4, and TROAP were associated with BC progression. These findings may point to novel diagnostic methods and therapeutic targets for BC patients.
HMMR, also called receptor for hyaluronate-mediated motility (RHAMM), is a critical cell-surface binding protein for hyaluronic acid (HA) 29 ; 30 . HMMR has been reported to be significantly up-regulated in BC tissues compared with normal bladder tissues, which is consistent with our results 31 ; 32 . Previous studies showed that HMMR contributes to oncogenic properties in multiple cancers through several mechanisms, such as influencing mitotic spindle assembly, regulating cell signaling pathways, and modulating the expression of growth factor receptors 29 ; 33 . HMMR has also been found to be overexpressed in gastric cancer 33 , glioblastoma 34 , and head and neck carcinomas 35 . Increased HMMR expression directly leads to an increase in the metastatic potential of many types of tumors, including prostate cancer 36 , breast cancer 37 , colorectal cancer 38 , and bladder cancer 39 . Available evidence indicates that HMMR is an important downstream regulator of HA that can drive the rapid growth of bladder cancer cells 40 . One recent study showed that inhibiting HMMR expression reduced the proliferation of bladder cancer cells in vitro and inhibited the growth of xenograft tumors in vivo 41 . These results suggest that HMMR mediates an important signaling pathway for bladder cancer cell growth and proliferation.
NCAPG2 is one of the non-SMC components of the condensin II complex, which contributes to chromosome segregation via microtubule-kinetochore attachment during mitosis 42 . A previous study found that NCAPG2 plays a critical role in regulating mitosis by recruiting Polo-like kinase 1 (PLK1) to the kinetochore during prometaphase 43 . PLK1 is well known to act as an oncogene in a variety of human cancer types 44 ; 45 ; 46 ; 47 ; 48 . However, the role of altered NCAPG2 expression and its transcriptional regulation in tumorigenesis and progression remains unclear. In the present study, NCAPG2 expression was significantly higher in BC tissues than in normal bladder tissues, and Kaplan-Meier plotter analysis identified a prognostic role for NCAPG2 expression. These results require further verification.
Structural maintenance of chromosomes 4 (SMC4) is a core subunit of condensin complexes and is chiefly involved in chromosome condensation and segregation 49 . SMC4 overexpression is related to tumorigenesis and is involved in the tumor cell cycle, migration, and invasion 50 . Previous studies have shown that SMC4 protein is highly expressed in various tumors and correlates with poor prognosis in cancer patients 51 ; 52 ; 53 ; 54 ; 55 . However, its clinical significance and functional role in BC remain unknown. In our study, we found for the first time that high SMC4 expression in BC is associated with poor prognosis. We propose that SMC4 may serve as a significant prognostic factor in BC, and the underlying molecular mechanisms need further investigation.
TROAP (also called tastin) is a cytoplasmic protein required for regulation of the microtubule cytoskeleton and for proper bipolar spindle assembly, and it plays a critical role in cell proliferation 56 . Recent reports indicate that TROAP is involved in the initiation, invasion, and migration of multiple cancers. A previous study found that high TROAP expression promotes the progression of gallbladder cancer and is related to poor prognosis 57 . In addition, Ye et al. 58 found that TROAP affects proliferation in prostate cancer through Wnt/β-catenin signalling and that its expression correlates with patient survival. However, the role of TROAP in BC has not been reported. In our study, elevated TROAP expression was related to poor prognosis in BC, and functional analysis indicated that TROAP is associated with the mitotic cell cycle process. We therefore infer that TROAP may regulate tumorigenesis and may be a valuable prognostic factor in BC.
In this study, we integrated two databases and used bioinformatics methods to screen for genes differentially expressed between BC and normal bladder tissues. However, this research has some limitations. We did not generate our own microarray data but obtained them from public databases. In addition, the TCGA-BLCA dataset contains only four NMIBC cases, which may cause some important genes to be missed and limit the generalizability of the results. Further experimental research with a larger sample size is required to confirm our results, and this will be the next step of our research.

Conclusion
In conclusion, our research identified 12 hub genes: CDK1, UBE2C, MKI67, MAD2L1, BIRC5, RRM2, CEP55, PTTG1, HMMR, SMC4, NCAPG2, and TROAP. All of these genes might be potential predictors for BC. In addition, four of them, HMMR, SMC4, NCAPG2, and TROAP, have potential as prognostic predictors and new therapeutic targets in BC patients. We explored their underlying mechanisms with bioinformatics methods; these findings need to be further confirmed by in vivo and in vitro experiments, which we will conduct in future research.