Identifying novel cell glycolysis related gene signature predictive of overall survival in bladder urothelial carcinoma


 Background: Bladder urothelial carcinoma (BLCA) is the most common pathological type of bladder cancer and featured by a high risk for relapse and metastasis. Although many biomarkers have been developed by data mining and experimental studies to predict the prognosis of BLCA, a single-gene biomarker usually has poor specificity and sensitivity, leading to unsatisfactory prediction. Therefore, novel gene signatures are needed to more accurately predict the prognosis of BLCA.Methods: Data mining was performed for expression profile analysis of 433 mRNA expression data from the TCGA BLCA patients (n=412). Gene Set Enrichment Analysis (GSEA) was used to interpret the glycolysis-related gene sets. Gene signature related to the prognosis of BLCA was identified by univariate and multivariate Cox regression. A risk score was computed based on three genes by linear regression model and its relation with overall survival was investigated by Kaplan-Meier analysis.Results: Three genes (CHPF, AK3, NUP188) were found to be significantly correlated to the overall survival of BLCA patients. Based on the signature composed of these three genes, 412 BLCA patients were divided into high-risk and low-risk groups. The survival time of the high-risk group was significantly shorter than that of the low-risk group, indicating a worse prognosis.Conclusion: A signature composed of three glycolysis-related genes was developed as biomarkers to predict the prognosis of BLCA and to provide a meaningful reference for the clinical treatment of BLCA.


Introduction
Bladder cancer is one of the most common cancers throughout the world. In 2012, there were 429,800 new cases of bladder cancer and 165,100 deaths related to bladder cancer worldwide [1]. Bladder urothelial carcinoma (BLCA) is the most common subtype of bladder cancer, accounting for over 90% of bladder cancer cases [2]. BLCA is more likely to relapse and develop metastasis, which brings enormous pain and economic burden to the patients [3]. Treatments for BLCA include surgical resection, radiotherapy, chemotherapy, and targeted therapy [3][4]. Although the treatments of BLCA show good e cacy, the prognosis of BLCA patients may vary under the same treatment due to individual differences. Therefore, nding proven biomarkers to effectively predict the prognosis of BLCA is of particular importance.
The disorder of the metabolic pathways is one of the hallmarks of cancer. The proliferation of cancer cells involves metabolic processes. Even if there is su cient oxygen, cancer cells preferably undergo glycolysis to generate materials and energy needed for life activities. This unique phenomenon of energy metabolism is known as the Warburg effect [5]. At present, some glycolysis related biomarkers have already been used to predict the prognosis of BLCA patients. An upregulation of the abnormal spindle-like microcephaly (ASPM) expression level and a downregulation of thyrotroph embryonic factor (TEF) expression level are signi cantly correlated with a decreased overall survival of the BLCA patients [6].
Forkhead box A1 (FoxA1) depletion can lead to increased proliferation and invasion of the BLCA cells and is considered an independent adverse prognostic factor [7]. Besides, slingshot homolog-1 (SSH-1) expression is associated with the poor prognosis of BLCA patients [8].
With the rapid development of high-throughput sequencing technology, several cancer genomic databases have been established, offering an excellent channel to understand cancer-causing genomic changes [9][10]. Data mining can be performed to discover candidate biomarkers related to the prognosis of cancer patients. However, there are few reports on the metabolic status of cancer patients and its prognostic value. Nearly all of the existing studies use a single-gene biomarker to predict its role in the prognosis of BLCA patients, and the prediction effect is usually far from satisfaction. Therefore, building an expression-based genetic signature is of major clinical signi cance for predicting the survival of BLCA patients.
In the present study, genome-wide expression pro le datasets were analyzed by using GSEA and Cox multivariate regression model to identify novel biomarkers predicting the prognosis of BLCA patients. All mRNA expression data of 412 BLCA patients were downloaded from the TCGA database, and comprehensive analysis was conducted for gene sets. The glycolysis-related mRNAs were identi ed, and the risk features of three genes were established. The signature composed of these three genes can effectively predict the prognosis of BLCA patients. Much to our surprise and excitement, the risk factor related to glycolysis can be independently used to evaluate high-risk patients with poor prognosis. We developed and veri ed a novel glycolysis-related gene signature.

Methods And Materials
Data acquisition MRNA expression pro les and clinical datasets of BLCA patients were downloaded from TCGA(https://cancergenome.nih.gov/). The clinical data of 412 patients were used, including the survival time, survival status, gender, age, the tumor, node, metastasis (TNM) staging, T stage, N stage, and M stage. Mutations of the selected genes were obtained from the cbioportal database (http://www.cbioportal.org/).

GSEA analysis
GSEA was performed to determine whether there was a signi cant difference in the glycolysis-related gene sets between the normal group and BLCA group. The expression data of 433 samples in TCGA data set were analyzed, including 19 paracancerous samples and 414 BLCA samples. A normalized p-value was used to establish the functions used for subsequent analysis.

Statistical analysis
Log2 transformation was carried out to normalize the expression of each gene in the expression pro le.
Univariate Cox regression was used to screen for genes associated with the overall survival. After that, multivariate Cox regression was performed to screen out the prediction model with the best explanatory and informational effect. The selected genes were divided into risky type (hazard ratio [HR]>1), and protective type (0<HR<1). The expressions of the genes were subject to linear combinations according to the coe cient, and the risk parameter formula was constructed: risk parameter=∑ (βn×gene expression n). The patients were divided into low-and high-risk groups based on the median of the risk values. The prognostic signi cance of the risk score was estimated using the Kaplan-Meier survival curve and logrank test. Student's t-test was used to compare the expressions difference of the optimized genes between paracancerous tissues and BLCA tissues. All statistical analyses were performed by using R3.6.2 (https://www.r-Project.org/) and Graph Pad Prism 7. P< 0.05 was considered statistically signi cant.

Results
Preliminary gene screening based on GSEA Detailed information regarding the clinical data of 412 BLCA patients has been presented in Table 1. Based on the clinical data of 412 BLCA patients from TCGA and mRNA expression datasets of 433 noncancerous and BLCA samples, we have found all glycolysis-related gene sets on the GESA gene database (https://www.gsea-msigdb.org/gsea/msigdb/search.jsp). There were ve different datasets, namely, BIOCARTA_GLYCOLYSIS_PATHWAY, GO_GLYCOLYTIC_PROCESS, HALLMARK_GLYCOLYSIS, KEGG_GLYCOLYSIS_GLUCONEOGENESIS and REACTOME_GLYCOLYSIS. Next, GSEA was performed to determine whether there was a signi cant difference in these ve glycolysis-related gene sets between the paracancrous samples and BLCA samples. The datasets GO_GLYCOLYTIC_PROCESS and REACTOME_GLYCOLYSIS were signi cantly enriched between paracancerous samples and tumor samples(P<0.01). However, the other three glycolysis-related gene sets HALLMARK_GLYCOLYSIS, BIOCARTA_GLYCOLYSIS_PATHWAY, and KEGG_GLYCOLYSIS_GLUCONEOGENESIS were not signi cantly enriched between paracancerous samples and tumor samples (P>0.05, Table 2, Fig.1).
Identi cation of glycolysis-related genes predictive of the survival of BLCA patients In order to determine the novel genetic biomarkers predicting the prognosis of BLCA patients, we selected the core genes from the glycolysis-related datasets. These genes were further analyzed by univariate Cox regression to screen for mRNAs related to overall survival. Five genes were found to be signi cantly correlated with the overall survival (P<0.05). Multivariate Cox regression con rmed that three independent genes (NUP188, CHPF, and AK3) were signi cantly associated with overall survival (Table3). Then, the cBioPortal database was used to analyze the genomics alternations and the expression pro les of the three genes from 412 BLCA samples. The results showed that NUP188, CHPF, and AK3 had mutations in 6%, 2.2% and 3% patients (Fig.2a). We further analyzed the differential expression of the NUP188, CHPF, and AK3 genes between BLCA tissues and para-cancerous tissues. As compared to non-cancerous tissues, NUP188,CHPF and AK3 were signi cantly upregulated in the BLCA tissues (Fig.2b).
Relationship between the risk score and prognosis of BLCA patients The gene-based prognostic model was built to estimate the survival risk of each BLCA patient: Risk score= 0.058×expression of NUP188 +0.005×expression of CHPF+(-0.031 ) expression of AK3. The risk value of each BLCA patient was calculated, and ranked in increasing order (Fig.3a). According to the median of the risk value, patients were divided into high-risk (n=203) and low-risk (n=204) groups. Figure  3b shows the risk scores, overall survival (in the unit of year), and survival status of 412 patients in the dataset. The patients with higher risk scores had a lower mortality rate than those with lower risk scores. The heat map (Fig.4) shows the expression pro les of the three mRNAs. As the risk score of the BLCA patients increased, the expressions of the risky-type mRNAs (CHPF and NUP188) were signi cantly upregulated. In contrast, that of the protective-type mRNA (AK3) was signi cantly downregulated.
The risk score is an independent prognostic indicator. Veri cation of the e cacy of the risk score in survival prediction by using the Kaplan-Meier curve Kaplan-Meier curve analysis showed that age, TNM stage, T stage, N stage, M stage and high risk score were correlated to the adverse overall survival of BLCA patients (P <0.001 for all cases, Fig.5a,b). In order to evaluate whether the proposed prediction model is applicable to different populations, the model built by clinical grouping was veri ed. As shown in Figure 6, for BLCA patients classi ed as TNM stage, T stage, N stage (N0), M stage (M0), age (<65) and gender (male), risk score was a reliable prognostic marker. The high-risk group was associated with a poor prognosis. However, when the patients were divided into different subgroups by TNM stage, age, and gender respectively, the predictive effect of the risk score varied. In the subgroup of patients aged above 65, the overall survival of the high-risk patients was signi cantly lower than that of the low-risk patients (P=0.001,Fig.6b). However, there was no statistical signi cance in the subgroup aged below 65 (P=0.064, Fig.6b). Similar results were observed in the subgroup divided by gender (male, P<0.001; female, P=0.096; Fig.6a). As shown above, the risk parameter could no longer be used as a sole prognostic marker for the subgroup of patients aged below 65 or the subgroup of females (Fig.6a,b). Moreover,as shown in Fig 6e and 6f, the prognostic predictive value of risk score in patients of N0 stage (P<0.001) and M0 stage (P<0.001) was considerably higher than in patients of N1-3 stage (P=0.086) and M1 stage (P=0.566). These results suggest that, in the early stages of BLCA, risk score might be a more useful prognostic marker.

Discussion
Studies on the mechanism of glycolysis affecting tumor occurrence, development, proliferation and invasion are emerging. However, very few of them are devoted to glycolysis-based prediction of survival of cancer patients. For example, FOXJ1 can promote aerobic glycolysis and proliferation of bladder cancer cells. FOXJ1 upregulation is associated with a poor clinical prognosis of the patients [11]. MiR-4999-5p can regulate the glycolysis of the colorectal cancer cells and promote the clone formation, proliferation and invasion of the colorectal cancer cells. The overall survival of the colorectal cancer patients with MiR-4999-5p upregulation is generally low, and MiR-4999-5p upregulation can be used as an independent predictor [12]. However, currently no single glycolysis-related gene signature has been developed to predict the survival of BLCA patients. In the present study, bioinformatics methods were used to screen and analyze glycolysis-related mRNAs, which were signi cantly correlated to the prognosis of BLCA patients. Three marker genes (CHPF, AK3, NUP188) were identi ed. Among them, chondroitin polymerizing factor (CHPF) is a type II transmembrane protein composed of many amino acids. Studies have shown that CHPF is abnormally upregulated in cancer tissues, such as glioma, lung cancer, and colorectal cancer [13][14]. CHPF upregulation can promote the proliferation of glioma cells while inhibit the apoptosis, thereby predicting a poor prognosis [14]. Adenylate kinase 3 (AK3) is localized to the mitochondrial matrix and participates in maintaining the dynamic balance of nucleotides in cells. Human chromosome 9 carries the AK1, AK2 and AK3 genes [15]. It has been reported that chromosome 9p deletion occurs in the bladder cancer and has close connections with pathological staging and prognosis of bladder cancer [15][16]. AK3 compounds are blockers of mitosis, which block the proliferation and apoptosis of the cells in colonic and breast cancers. AK3 upregulation positively correlates with the overall survival of patients with colonic cancer and breast cancer [17][18], AK3 downregulation can lead to severe inhibition of the growth of the cancer cells in chronic lymphocytic leukemia [19], which is in line with our nding that AK3 is a protective type mRNA. NUP188 is a nucleoporin regulating chromosome segregation and promotes tumor cell division and growth by participating in mitotic process. In mitotic cells, Nup188 depletion can lead to mitotic arrest and cell division arrest [20]. There are only a limited number of reports on the relationship between the three genes and BLCA and the molecular mechanism of these genes. Our research in detail reported the involvement of the three genes in the prognosis of BLCA.
GSEA is a method to integrate data of different levels and sources [21]. In the present study, expression data of 19 paracancerous samples and 414 BLCA samples were used to perform GSEA, and two gene sets with signi cant functional difference were identi ed. Then, GSEA was used to select genes predicting the survival of patients, and these genes were further investigated by univariate and multivariate Cox regression of the selected genes from the BLCA patients. The signature composed of three genes was nally determined to be of potential prognostic predictive value for BLCA patients. As compared with the con rmed prognostic biomarkers, this gene signature might be more targeted and convincing in terms of prognostic predictive value, and can be used as an effective tool for classi cation of BLCA patients. BLCA gene sets were downloaded from TCGA and the glycolysis-related genes were analyzed. Comparison was done between tumor tissues and the adjacent normal tissues. The Kaplan-Meier survival curve showed that a high risk score was closely associated with a low survival rate. It is thus implied that estimating the risk score for BLCA patients can help predict the prognosis.
Although a prognostic prediction model was built based on this gene signature, our study had some limitations. Firstly, TCGA database was the only data source in the present study, and no clinical data of relapse and metastasis of the BLCA patients were included. Therefore, we could only perform prognostic analysis using overall survival data. In the strati ed analysis, the risk parameter was a reliable prognostic predictor except for the subgroups where the patients were aged below 65, were females, of N1-3 stage, or of M1 stage. The reasons for these ndings remain unknown and need to be further discussed. Besides, the signature composed of the three genes was not veri ed for the molecular biological function in BLCA, which might need further studies in the future.

Conclusion
The three glycolysis-related genes associated with the overall survival of the BLCA patients were veri ed by GSEA and Cox regression. The signature composed of these three genes could predict the prognosis of the BLCA patients. The higher the risk score, the worse the prognosis. The use of this gene signature may improve the success rate of individualized cancer therapy. Our ndings may provide potential glycolysis-related biomarkers for predicting the prognosis of BLCA patients, and may also provide a basis for further understanding of the occurrence and development mechanism of glycolysis in BLCA.   Different expression of three genes in the normal tissues and tumor tissues obtained from TCGA. (***represent P < 0.001,**represent P < 0.01,*represent P < 0.05).