Workflow
A flow diagram of current study was shown in Figure 1. First, we downloaded gene expression profiles from TCGA database and GEO database to screen DEGs. Next, we constructed a co-expression network by WGCNA to identify differential co-expression genes. After that, we perform GO enrichment analysis and KEGG pathway analysis for differential co-expression genes. Additionally, we constructed a PPI network to screen hub genes and identified survival-related hub genes via OS and DFS analysis. Then, we validated the expressions of survival-related hub genes by HPA database, GEPIA database and qRT-PCR. Finally, we analyzed the association between the expressions of survival-related hub genes and clinical information of BLCA patients.
Datasets from TCGA and GEO database
We downloaded the gene expression profiles of BLCA and matched clinical information from TCGA (https://portal.gdc.cancer.gov/) database. There were 430 samples that included 411 bladder cancer and 19 normal tissues and 19,645 genes. Then we downloaded the GTF configuration file from the Ensembl database (http://asia.ensembl.org/index.html) to match probes with corresponding genes. Probes with more than one gene were eliminated and the average value was calculated out for these genes corresponding to more than one probes. Genes of low read counts are usually excluded for further analysis, so we set the genes with a cpm (count per million) ≥ 1 in our current study. Ultimately, 14,716 genes were used for further analysis.
In addition, we downloaded the normalized expression profiles of GSE13507 from GEO database (https://www.ncbi.nlm.nih.gov/geo/), another gene expression profiles of BLCA which were studied with the GPL6102 platform Illumina human-6 v2.0 expression beadchip. GSE13507 consisted of 188 BLCA samples including 165 primary bladder cancer samples and 23 recurrent non-muscle invasive tumor tissues, and 68 normal samples containing 58 normal looking bladder mucosae surrounding cancer which were histologically confirmed normal and 10 normal bladder mucosae. Probes were converted to the gene symbol based on an annotation file which was provided by the manufacturer and repeated probes for the same gene were removed. Therefore, about 24,323 genes were selected for the subsequent analysis.
Differentially expressed genes (DEGs) analysis
The R package “limma”(20) was used to screen DEGs in TCGA-BLCA and GSE13507 dataset between BLCA and normal tissues. We set the adj.P < 0.05 and |logFC| ≥ 1 as the threshold for screening DEGs. Ultimately, the DEGs selected from TCGA-BLCA and GSE13507 dataset were visualized as a volcano plot by using the R package “ggplot2”, while as a heatmap plot by using “pheatmap”.
Construction of weighted co-expression network and identification of key modules
We kept the expression profiles of TCGA-BLCA and GSE13507 qualified and then constructed gene co-expression networks by “WGCNA” package in R(12) for these data. To build standard scale-free networks using the pickSoftThreshold function, soft powers β = 5 and 11 were selected. Next, we used Pearson’s correlation matrices performed for all pairs of genes to construct adjacency matrixes. Then the adjacency matrixes were turned into topological overlap matrixes (TOM) as well as the corresponding dissimilarity (1-TOM). At the same time, average linkage hierarchical clustering was conducted to classify genes with similar expression patterns into gene modules. To identify functional modules in co-expression networks, we firstly quantized the correlation between module eigengenes and clinical trait information, and further quantified the relationship by calculating Gene Significance (GS). We got the average value of GS of all the genes in a module, called Module Significance (MS) via the data processing. Module with an absolute MS ranking first in all modules was considered candidate relevant to clinical traits. Finally, the module highly correlated with clinical trait information of bladder cancer was selected for further analysis.
Interaction between the modules of interest and DEGs
After DEGs analysis and WGCNA analysis, we obtained several differential co-expression genes between DEGs and co-expression genes that were extracted from the co-expression network to look for potential prognosis genes via the R package “VennDiagram”(21). And the result was present as a Venn diagram.
Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis
GO enrichment analysis and KEGG pathway analysis were performed for differential co-expression genes via R package “clusterProfiler”(22). Gene sets with p < 0.05 were referred to be significantly enriched. GO analysis consist of biological process (BP), cellular component (CC) and molecular function (MF), only the top ten terms with p < 0.05 were selected. While in KEGG analysis, we selected a maximum of 30 terms with p < 0.05.
Construction of PPI and screening of hub genes
In current study, we used Search Tool for the Retrieval of Interacting Genes (STRING) database (http://www.string-db.org/)(23) which is designed for predicting protein–protein interactions (PPI) to construct a PPI network of differential co-expression genes. We defined these genes with minimum required interaction score greater than 0.7 as hub genes, and the network diagram was visualized by Cytoscape (v3.8.0)(24). Maximal Clique Centrality (MCC) algorithm was efficient to find hub nodes. We calculated MCC of each node by CytoHubba(25), a plugin in Cytoscape, and finally we chose the top 10 MCC values as hub genes.
Verification of the expression patterns and the prognostic values of hub genes
We confirmed the hub genes’ expression pattern in BLCA and normal tissues based on GEPIA database (http://gepia.cancer-pku.cn/)(26). The expression level of each hub gene was shown in a box plot graph. In our study, patients whose clinical data of survival time and survival status were incomplete were excluded. According to the median expression value of hub genes, we divided patients into two groups. After that, we performed a Kaplan-Meier survival analysis to explore the relationship between OS and hub genes in patients using the R package “survival”. Furthermore, the association between DFS and hub genes was conducted in GEPIA database. Finally, it should be emphasized that genes with significant p value (p < 0.05) in OS analysis would be thought as survival associated genes. In addition, we divided the bladder cancer patients obtained from TCGA database into two groups according to age, gender, grade, stage and TNM stage using dichotomy, then we explored the association between these clinical information and hub genes’ expression via R package “ggpubr”.
Validation of protein expressions of survival-related hub genes by HPA database
We downloaded immunohistochemistry (IHC) pictures from HPA (https://www.proteinatlas.org/) database to validate the protein expressions of hub genes between BLCA and normal tissues.
Validation of mRNA expressions of survival-related hub genes by qRT-PCR
We further collected bladder cancer and matched paracancerous tissues from Zhongnan Hospital, Wuhan University, to verify the expression of hub genes. Total RNA from tissues was isolated using Hipure Total RNA Mini Kit (Cat. R4111-03, Magen) according to the manufacturer's instruction. After that, the quantity of isolated RNA was measured by a NanoDrop® ND‐1000 UV‐Vis spectrophotometer (Thermo Scientific). The cDNA was synthesized using 1 μg total RNA by ABScript II RT Master Mix for qPCR (Cat.RK20402, Abclonal). Each qPCR reaction was conducted with 10 μL 2X Universal SYBR Green Fast qPCR Mix (Cat.RK21203, Abclonal), 7 μL ddH2O, 1 μL cDNA, 1 μL forward primer and 1 μL reverse primer. Values were normalized for amplified GAPDH alleles. Primer sequences were listed as follows:
TPM1: 5'-GCCGACGTAGCTTCTCTGAAC-3', 5'-TTTGGGCTCGACTCTCAATGA-3'; MYLK: 5'-CCCGAGGTTGTCTGGTTCAAA-3', 5'-GCAGGTGTACTTGGCATCGT-3'; CALD1: 5'-TCGACCCAACAATAACAGATGC-3', 5'-TCTCGTATCTTTCTTGGCGACT-3'; MYH11: 5'-CGCCAAGAGACTCGTCTGG-3', 5′-TCTTTCCCAACCGTGACCTTC-3'; ACTA2: 5'-GTGTTGCCCCTGAAGAGCAT-3', 5'-GCTGGGACATTGAAAGTCTCA-3'; GAPDH: 5'-ATGGAGAAGGCTGGGGCTC-3', 5'-AAGTTGTCATGGATGACCTTG-3'.
Statistical analysis
R software 4.0.2 was used for all statistical analyses. Two-tailed Student's t-tests were used to assess the statistical significance of differences between the groups. Statistical significance was considered as p < 0.05.