Identifying stage-associated hub genes in bladder cancer via weighted gene co-expression network and robust rank aggregation analyses

Background: Bladder cancer (BC) is among the most frequent cancers globally. Although substantial efforts have been put to understand its pathogenesis, its underlying molecular mechanisms have not been fully elucidated. Methods: The robust rank aggregation approach was adopted to integrate 4 eligible bladder urothelial carcinoma microarray datasets from the Gene Expression Omnibus. Differentially expressed gene sets were identified between tumor samples and equivalent healthy samples. We constructed gene co-expression networks using weighted gene co-expression network to explore the alleged relationship between BC clinical characteristics and gene sets, as well as to identify hub genes. We also incorporated the weighted gene co-expression network and robust rank aggregation to screen differentially expressed genes. Results: CDH11, COL6A3, EDNRA, and SERPINF1 were selected from the key module and validated. Based on the results, significant downregulation of the hub genes occurred during the early stages of BC. Moreover, receiver operating characteristics curves and Kaplan–Meier plots showed that the genes exhibited favorable diagnostic and prognostic value for BC. Based on gene set enrichment analysis for single hub gene, all the genes were closely linked to BC cell proliferation. Conclusions: These results offer unique insight into the pathogenesis of BC and recognize CDH11, COL6A3, EDNRA, and SERPINF1 as potential biomarkers with diagnostic and prognostic roles in BC.


Introduction
Bladder cancer (BC), a prevalent urological malignancy, is a global public health concern, and the 9th commonly diagnosed cancer in men, especially in high-income countries. [1] Following a report by Boccardo et al, nearly a quarter BC cases are at first diagnosed as muscle-invasive BC. Moreover, <16% of patients, characterized by non-muscle-invasive BC present with invasive recurrent cancer during treatment, in most cases, within 1 year. [2] As the tumor progresses, BC survival rate declines remarkably. The BC symptoms are usually atypical, without any uniqueness, which poses difficulty in earlier diagnosis. [3] Based on the current understanding, BC diagnosis and surveillance primarily incorporates cystoscopy and urine cytology, however, these approaches are unsatisfactory. [4] Besides, an ideal BC detection technique must be more Medicine convenient and rapid. Hence, researchers should urgently uncover more accurate indices for clinical staging, treatment and prognosis of BC.
In this work, we explored 4 independent microarray datasets abstracted from Gene Expression Omnibus web resource (GEO, https://www.ncbi.nlm.nih.gov/geo/) with robust rank aggregation (RRA) to reveal robust differentially expressed genes (DEGs) between BC tissues and matched control. Thereafter, we subjected the DEGs to weighted gene co-expression network analysis (WGCNA) to determine key modules related to clinical parameters. Using the Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, we assessed the potential functions of the genes within the key module. In exploring the biosignatures and targets for BC therapy, we did a range of analyses via mining of sequencing data with high-throughput, retrieved from publicly available databases. Consequently, the present study reported CDH11, COL6A3, EDNRA, and SERPINF1 as potential biomarkers and therapeutic target of BC, and are all linked to the prognosis of individuals with BC.

Data processing
Employing the GEO website, sequential matrix files of cohorts were retrieved. The R package "limma" [9] was used for data normalization and identify the DEGs. Then, we employed the RRA to integrate the findings of the 4 cohorts to identify DEGs with the highest significance. [10] Genes with a corrected P value < .05 and |log fold change (FC)| > 1 were considered as significant DEGs in the RRA analysis.

GO and KEGG pathway analysis
With the Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david.ncifcrf.gov/), important for functional analysis of genes, we performed KEGG pathway enrichment and GO functional analyses, P < .05 for statistical significance.

WGCNA analysis of the filtered genes
Herein, 343 DEGs were retrieved following RRA analysis. This aided in obtaining WGCNA with expression data from TCGA. Using the R package "WGCNA," we uncovered the associated hub genes and clinical traits-related modules. [11] Using the topological overlap measure matrix, transformed through an adjacency matrix, we estimated its network connectivity. [12] Thereafter, we established a hierarchical clustering dendrogram of the topological overlap measure matrix employing the average distance with a value of 20 as the minimum size threshold. This was to group genes with similar expression patterns into distinct gene modules, after which we determined the correlation of different module eigengenes with the clinical features. We evaluated the gene significance (GS) quantifying correlations between individual genes and the module membership (MM) as well as the clinically interesting trait which depicts the association of the module eigengenes with gene expression profiles. Following previous reports, if the GS and MM were highly associated, the highly critical elements in the modules were also strongly linked to the trait. [13] We used the highly correlated module to explore potential function via GO and KEGG analyses and for hub gene screening. Notably, we defined hub genes with: GS > 0.2, and MM > 0.8.

Validation and survival analysis of hub genes
We employed "ggstatsplot" (R packages, https://cran.r-projrct. org/web/packages/ggstatsplot) to verify the levels of expression of hub genes between BC and neighboring healthy tissue sample. Also, we evaluated how they are correlated with clinical traits in TCGA-BLCA dataset. Accordingly, we employed the independent samples t test or one-way analysis of variance. To evaluate the diagnosis values of hub genes, we generated receiver operating characteristic (ROC) curves and used "survminer" (R package, https://CRAN.R-project.org/package=survminer) and "survicval" (R package, https://CRAN.R-project. org/package=survival) to calculate for hub genes. For tumor samples within the TCGA-BLCA dataset, we classified them into 2 groups relying on the best-separation cutoff value for each hub gene. After that, we plotted the Kaplan-Meier (KM) survival curves.
The image of immunohistochemistry staining of the selected prognosis-related genes in normal tissue and BC tissues were

Establishment and estimation of multi-gene prognostic signature
Based on the multivariate Cox proportional hazards regression model, 4 optimal prognostic genes were identified. After assembly of expression levels and coefficients for each gene, a linear combination method was applied to get risk scores, which is as follows: where Exp is the expression level of each prognostic gene, and β is the regression coefficient of it. A median risk score was used as a cutoff for stratifying patients into high-risk and low-risk groups. As a comparison of the survival differences between above 2 groups, we also used KM survival analysis with log-rank tests.
The prognostic power of the risk score and some clinical parameters, such as age, gender, grade, stage, T-stage, N-stage, M-stage, was assessed by using univariate Cox proportional hazards regression analysis. Moreover, we assessed whether the risk scores could be independent prognostic factors based on levels of risk using multivariate Cox proportional hazards regression analysis for PCa patients. Other clinical parameters with statistically significant difference (P < .05) in univariate Cox proportional hazards regression were also incorporated in the analysis.

Oncomine database
Herein, we retrieved transcriptional expression profiles of CDH11, COL6A3, EDNRA, and SERPINF1 in BC patients using the Oncomine web resource (https://www.oncomine. org). [14] To compare the differences in transcriptional expression, we employed Student t test with fold change and cutoff of P value as follows: data type: messenger ribonucleic acid (mRNA), P value = .01, gene rank = 10%, fold change = 1.5.

Tumor Immune Estimation Resource (TIMER)
TIMER (https://cistrome.shinyapps.io/timer/) offers a web interface, which is user friendly, important for dynamic analysis of the associations of immune infiltrates with gene expression. [15] Using the gene module, we validated the association between immune infiltration and genes. We then generated scatterplots, depicting statistical significance and Spearman correlation.

Data processing of gene set enrichment analysis (GSEA)
Using the R package "clusterprofiler," [16] we conducted a GSEA analysis of hub genes using TCGA-BLCA RNA-dataset. For each hub gene, we determined the median expression by classifying 414 BLCA samples into high and low expression groups. We considered P < .01 to be statistically significant. For the reference gene set, we used "h.all.v7.1.symbols.gmt," abstracted from the Molecular Signature Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb/index.jsp).

Statistical analysis
The results were given as means ± standard deviation of independent experiments. P values were calculated using SPSS v. 24.0 software with unpaired, 2-tailed Student t test or which indicated with one-way analysis of variance followed by Turkey test. P values of <.05 were considered to indicate statistical significance. *P < .05, **P < .01, and ***P < .001.

Identifying robust DEGs via the RRA method
Using the selection criteria, 4 independently eligible BLCA datasets were enrolled for subsequent RRA analysis. A series of clinical traits, including GEO accession ID, platform ID, as well as the number of genes for each platform are displayed in Table 1. [5][6][7][8] Based on RRA analysis data, we identified 111 upregulated and 232 downregulated remarkable DEGs (Supplementary file 1, Supplemental Digital Content, http://links.lww.com/MD/I153, which illustrates 343 remarkable DEGs). Besides, the top 50 upregulated, as well as downregulated DEGs are depicted in the heatmap (Fig. 1).

Functional enrichment analysis of DEGs
The biologically functioning DEGs were revealed via the GO and KEGG functional enrichment analysis using DAVID. We considered the results significant only if P < .05, we have highlighted the 3 categories of the GO results in Figure 2A and B. Results on the upregulated and downregulated DEGs in top 15 findings derived from the GO enrichment analysis are depicted in Table 2 and Table 3. Of note, the upregulated genes were highly enriched in protein binding (ontology: MF), nuclear division during mitosis (ontology: BP), and cytoplasm (ontology: CC). Besides, the downregulated genes were highly abundant in, www.md-journal.com extracellular exosome (ontology: CC), and binding of calcium ions (ontology: MF) and cell adhesion (ontology: BP). As to KEGG pathway analysis, extracellular matrix (ECM)-receptor interaction, focal adhesion, P13K-Akt signaling cascade, proteoglycans in cancer, as well as vascular smooth muscle contraction, were mostly associated with these genes (Fig. 2C).

WCGNA analysis and modules significance calculation
To reveal the key modules highly related to the clinical characteristics of BC, we analyzed the WGCNA on the TCGA-BLCA cohort by integrating the DEGs retrieved from the RRA analysis (Fig. 3). Clinical information of BC sample from TCGA, including stage, age, grade, and TNM classification were retrieved (Fig. 3A). We set the soft-thresholding power at 6 (scale free Following this evaluation, genes within the blue modules were primarily linked to signal transduction, cell adhesion, and ECM organization.

Survival analysis and significant gene identification
We assessed whether the 19 hub genes in BC were clinically relevant. To achieve this, correlation assessment of the hub genes with prognosis outcome of BC patients in TCGA-BLCA data sets was performed. By optimizing the cutoff values for hub gene analysis, CDH11, COL6A3, EDNRA, and SERPINF1 were highly expressed and were associated with poor prognosis (Fig. 5A and Figure S2, Supplemental Digital Content, http:// links.lww.com/MD/I151, which analyses all hub genes' survival in the WGCNA blue module). Furthermore, ROC curves demonstrated that they had high diagnostic potential as BC biosignatures ( Figure     We compared the mRNA expression of CDH11, COL6A3, EDNRA, and SERPINF1 between bladder tumor and neighboring healthy tissues, respectively. This was based on data for RNA-seq obtained from the Oncomine and TCGA databases. Notably, the transcriptional levels of CDH11, COL6A3, EDNRA, and SERPINF1 expressions were lowly expressed in BC tissues in comparison to healthy tissues (Fig. 5B). Moreover, Human Protein Atlas database was used to validate the protein expression of the 4 genes. However, no difference was found for CDH11 and COL6A3 protein expression (Fig. 6). Besides, there was a significant correlation of CDH11 mRNA expression and BC samples with a mild clinical stage (Fig. 5C), whereas the lowest CDH11 mRNA expression was reported stage I + Ⅱ. Similarly, we evaluated the association of CDH11 mRNA expression with different pathological grade, whereby it was revealed that mRNA expression of CDH11 is significantly correlated with lower pathological grades (Fig. 5D). Additionally, mRNA levels of COL6A3, EDNRA, and SERPINF1 were lower in BC tissues (Fig. 5B). COL6A3, EDNRA, and SERPINF1 mRNA expression in BLCA sample were significantly correlated with mild clinical staging, whereas the lowest COL6A3, EDNRA, and SERPINF1 mRNA expression were detected in stage Ⅰ + Ⅱ (Fig. 5C). Moreover, mRNA expression levels of COL6A3, EDNRA, and SERPINF1 were related to lower clinicopathological grading (Fig. 5D). Collectively, we demonstrated that the expressions of CDH11, COL6A3, EDNRA, and SERPINF1 were lower in BC tissues compared to healthy tissues. Thus, the hub gene CDH11, COL6A3, EDNRA, and SERPINF1 could play a pivotal role in BC progression. Overall, low expression of CDH11, COL6A3, EDNRA, and SERPINF1 mRNA is significantly associated with mild clinical-pathological parameters in BC patients and is significantly lowered in the early disease stages. This may be vital in the early BC diagnosis.

Identification and estimation of a 4-gene prognostic signature
A multivariate Cox proportional hazards regression analysis of 4 prognostic genes was conducted to determine the prognostic significance of each gene for PCa patients (Fig. 7A). Thus, we constructed a gene-based risk score based on their Cox coefficients: risk score = −0.191090924*Exp (CDH11) + 0.141406166*Exp (COL6A3 ) + 0.099154851*Exp (EDNRA) +  0.101448872*Exp (SERPINF1) . In the next step, the risk score of every patient was calculated, based on which we obtained the median cutoff point using the R package "survminer" and divided the patients into 2 groups, high risk group (n = 203) and low risk group (n = 204) (Fig. 7C). All patients' survival states are shown in Figure 7D, while the heatmap of 4 prognostic genes can be seen in Figure 7E. As indicated by the KM survival curves (Fig. 7B), high-risk patients had a worse overall survival than low-risk patients. The univariate and multivariate Cox regression analysis demonstrated that other clinical parameters, such as age, gender, grade, stage and TNM stage were not independent risk factors of the overall survival, except for the 4 genes prognostic signature (Fig. 8).

Association of hub genes' expression with tumorinfiltrating immune cells
Referring to the critical roles of invading immune cells within the tumor microenvironment, we comprehensively analyzed immune signatures plus immune infiltrates. From the TIMER web resource, the association between CDH11, COL6A3, EDNRA, and SERPINF1 immune signatures and tumor purity or numerous vital immune cells was revealed. CDH11, COL6A3, EDNRA, and SERPINF1 were all negatively correlated with tumor purity. The correlations (Cor > 0.5 and P < .05) were considered to be the strongest correlated.
Although it was observed no or weak correlations of these genes with infiltration of CD8 + T cells, dendritic cells, CD4 + T cells, B cells, and neutrophils, CDH11, COL6A3 and SERPINF1 were significantly associated with macrophages ( Fig. 9).

Discussion
BC, being the most prevalent malignant tumors of the genitourinary system has in recent years, shown an increasing incidence.
More importantly, identifying the prognostic, as well as predictive biosignatures for BC is vital because BC is a diverse disease with unpredictable clinical endpoints. [3] A wealth of studies have shown that progression of BC is attributed to the accumulation of cellular and molecular aberrations, such as transcriptomic, miRNA, epigenetic, metabolomic and proteomic abnormalities. [17][18][19][20] Following the multiple "omics" research that purposed to reveal diagnostic biomarkers for early BC detection, both the heterogeneity and the potential commonalities at the molecular level were highlighted in different BC stages. Of note, there is evidence on BC molecular heterogeneity, associated with several changes at genetic and protein levels. Therefore, a bunch of comprehensively-selected candidates could be representative of these tumors. Several assessments employing microarray and RNA-seq data have been performed to uncover novel therapeutic targets and biomarkers for BC; however, inconsistencies exist on the DEGs detected in various studies. [21,22] Of interest, we present the first report to the use of RRA-WGCNA to explore novel hub genes related to BLCA. In the present work, unlike a single genetic or cohort study, we incorporated 4 qualified BLCA datasets from GEO into the RRA technique, after which several robust DEGs were identified. In total, 343 DEGs were revealed, including 111 upregulated and 232 downregulated genes. Then, we conducted GO based on DAVID, which demonstrated that the DEGs were mainly abundant in cell division, mitotic nuclear division, cell proliferation, protein kinase binding and protein serine/threonine kinase activity. Based on these observations, we confirmed their role in BC development. [3,23] Additionally, enrichment of the DEGs in some KEGG pathways, for instance, ECM-receptor interaction and focal adhesion implicate that they are essential in the pathogenesis of BC. Following GO and KEGG analysis findings, we proposed that the DEGs have a close association with the development of BC. Moreover, upon constructing the co-expression network, as well as identifying the hub genes via WGCNA, we revealed that genes within the co-expression module which are highly associated with clinical features of BLCA samples in TGCA (blue module) were enriched in: signal transduction, cell adhesion, P13K-Akt signaling pathway as well as ECM-receptor interaction by GO and KEGG analyses (Fig. 4). After filtering for GS and MM value, 19 hub genes (EDNRA, SERPINF1, COLEC12, FBLN5, DDR2, SFRP2, OLFML3, AEBP1, DCN, CDH11, TIMP2, LUM, DPT, COL6A3, COL16A1, EMILINN1, SPON1, OLFML1, and CRISPLD2) were eventually obtained. Notably, most of them could exert essential functions in BC pathogenesis. [24][25][26][27] Moreover, after performing survival analysis, CDH11, COL6A3, EDNRA, and SERPINF1 were revealed as the only 4 outstanding genes (Fig. 5A). CDH11 (cadherin-11), which is a cadherin superfamily member, a group of intercellular adhesion molecules dependent on calcium, which are critical for adhesion, proliferation and invasion of cells. [28,29] The expression of CDH11 has been correlated to numerous pathologic processes, including fibrosis and inflammation, which is essential as it progresses from chronic inflammation to cancer. [30,31] Besides, CDH11 has been implicated in breast, prostate, colorectal cancer metastases. [32][33][34] However, based on recent studies, CDH11 functions as a gene that suppresses tumors, upon CDH11 inactivation, which is linked to the malignant characteristics of different human tumors. [35,36]  However, the association of CDH11 with BC is yet to be fully elucidated. COL6A3 (collagen VI α 3), a protein of the ECM, is present in a majority of connective tissues, such as skin, muscle, vessels, and tendons. [37] Based on recent understanding, numerous studies have outlined the critical function of COL6A3 in the prognosis and diagnosis of prostate, lung, and colorectal cancers. [38][39][40] Besides the above findings, the use of COL6A3 to diagnose and prognose BC is still elusive. EDNRA is a G-protein coupled endothelins receptor which is expressed on vascular smooth-muscle cells as well as on neuronal cells, kidney, and heart. [41] Notably, the potential functional effects of EDNRA in metastasis and cancer progression remain unclear. SERPINF1, also known as pigment epithelium-derived factor, is secreted as a protein with multiple functions. It impedes metastasis and angiogenesis, promotes tumor cell differentiation and apoptosis, and activates cellular immunity in fighting breast cancer, cervical cancer, and melanoma. [42][43][44] Of note, SERPINF1 promotes vascular microenvironment maturation and regression of immature blood vessels. [45] Some reports show that SERPINF1 potentially impede the migration and proliferation simultaneously, which is induced via the vascular endothelial growth factor. [46] Consequently, it inhibits angiogenesis through the interaction with specific cell surface receptors, [46] though its actual role in BC progression is unclear.
The mRNA expression of these DEGs were verified in normal tissues and BC tissues from the Oncomine and TCGA databases. We found via immunohistochemical analysis that mRNA expression was not completely consistent with the protein expression. For 2 genes, protein expression was not differential between cancer and normal tissues. The possible reason is that the process of mRNA translation into protein may be regulated by noncoding RNA, including microRNA, circular RNA, and long non-coding RNA, [47][48][49] or may be subject to epigenetic modifications, such as m6A modification. [50] In this study, we found that the mRNA expression of hub genes in cancer tissues was lower than that in normal tissues, but the genes were upregulated in higher tumor grades (Fig. 5B-D). For example, the mRNA of EDNRA was overexpressed in samples with higher T stages. The possible explanation is that downregulation of genes is the initiating factor of BC, but the expression level gradually increases during the disease progression or as a result of other regulation of oncogene expression. Moreover, ROC curves demonstrated that all the 4 genes, when adopted as biomarkers could distinguish tumors from healthy bladder tissue in a more sensitive and accurate manner. It is worth noting that all these genes are prospective candidates as prognosis predictors as well as therapeutic targets. Then, in the univariate and multivariate Cox regression analyses, the 4-gene signature was found to be an independent factor in the prognosis evaluation (Fig. 8).
For the hub genes, we further explored their biological functions by inferring to the TIMER dataset and GSEA. It was noted that the expression of CDH11, COL6A3, EDNRA, and SERPINF1 were negatively associated with tumor purity. However, we did not find any or weak relationships for hub genes and invading immune cells except for macrophages in BC tissues. Based on TIMER results, we suggested that CDH11, COL6A3 and SERPINF1 may exhibit their macrophage-associated functions (Fig. 9). Recent studies also revealed that macrophages enhance the tumorigenesis and increase aggressive clinical manifestations of BC. [51,52] GSEA showed that significant pathways for CDH11, COL6A3, EDNRA, and SERPINF1 include "MYC-TARGETS-V1," "MYC-TARGETS-V2" and "OXIDATIVE-PHOSPHORYLATION" (Fig. 10). Of note, all the gene sets with the highest enrichment scores had a close association with tumor proliferation. [53][54][55]

Conclusion
In a nutshell, the present study integrated RRA, WGCNA with other bioinformatics tools to identify and characterize numerous robust DEGs and significant gene modules in BC. Of note, 4 hub genes (CDH11, COL6A3, EDNRA, and SERPINF1) were strongly downregulated in BC tissues, which may be vital in uncovering the underlying mechanisms related to BC progression and provide more insights into its molecular pathogenesis in addition to defects in the signaling pathways of hub genes associated with the BC. The hub genes in this study were extracted by bioinformatics techniques, and further experimental studies are needed to determine whether they can regulate the biological functions of BC cells. The specific mechanism also must be proved and explored in more detail.