2.1ǀClinical samples and RNA sequencing
A total of 49 BC patients (35 NMIBC and 14 MIBC) in Shanghai Tenth People’s Hospital (STPH) who underwent either transurethral resection of bladder tumor or radical cystectomy between November 2019 and April 2020, were included in the training set. Among the criteria followed in the training set were: (1) histologically confirmation of BC; (2) availability of freshly collected tissue from surgery; and (3) availability of clinical data. Informed consent was prior obtained and the ethical approval was granted by the Ethics Committee of Shanghai Tenth People’s Hospital. Subsequently, the total RNA of each BC tissue was isolated using Trizol reagent (Invitrogen, Thermo Scientific, Shanghai, China). The cDNA libraries were constructed using a customized protocol following its sequencing using the Illumina Hiseq 2500 sequencer (Sangon Biotech, Shanghai, China).
2.2ǀPublic data processing
Gene expression data from GSE32894 (n=224) and GSE13507 (n=165) datasets were downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo) as a series of matrix file format that was earlier processed by the respective study authors trough the MAS 5.0 algorithm. The latest TCGA data was downloaded having clinical features and follow-up information using GDC API. Theclinical data of the training sets (STPH and TCGA datasets) and validation sets (GSE32894 and GSE13507) were shown in Table 1.
2.3ǀPredictive model establishment and bioinformatics analyses
2.3.1ǀWeighted gene co‑expression network analysis
The “limma” R package was adopted for screening the differential expression genes (DEGs) of between 35 NMIBC and 14 MIBC samples in STPH. A weighted gene co-expression network analysis (WGCNA) was constructed[9] via the ‘WGCNA’’ R package v1.68 based on the DEGs (P-value<0.05 and |log2FC|≥0.5). However, the basic flow of WGCNA adhered to the following: (1) Outlier samples were omitted to facilitate the reliability of the weighted gene co-expression network, (2) An appropriate β value was selected using 0.85 as the degree of independence (R2), (3) A weighted adjacent matrix was transformed into a topological overlap matrix (TOM) to determine the network connectivity of the genes, (4) Genes with similar expression profiles were classified into gene modules based on the average linkage hierarchical clustering following the TOM-based dissimilarity measure, (5) All genes were represented by the expression of module eigengenes (MEs) in a given module. For the modules that were highly correlated with the NMIBC/MIBC subtype (|r| ≥0.3), they were selected for further analyses, and (6) The genes in the selected modules (brown, turquoise, and yellow) were extracted.
2.3.2ǀConstruction of eight-gene risk score
Based on hub genes extracted from WGCNA, we performed the univariate Cox to select an overall survival (OS) associated genes in the TCGA dataset [10]. Thereafter, the OS associated genes were subject to Least absolute shrinkage and selection operator (LASSO) with ten-fold cross-validation to construct an eight-genes risk score using “glmnet” package. Accordingly, the genes with the minimal influence on the patients overall survival were removed whereas the genes with non-zero coefficients were selected. The risk score for each patient was calculated as follows: risk score = Coef1 × expression of gene1 + Coef 2 × expression of gene2 + … + Coefm × expression of gene m. Coef denotes the corresponding coefficient of the gene. With this data expression, the ‘survminer’ R package v4.6 was used to evaluate the optimal cutoff value of the risk score in each cohort. The determined corresponding optimal cutoff value was used to divide patients into low- and high-risk groups accordingly.
2.3.3ǀGene sets enrichment analysis (GSEA)
GSEA (http://www.broadinstitute.org/gsea/index.jsp) was conducted to investigate different pathway activities between patients in high- and low-risk groups. Signaling pathway with P<0.05 and a false discovery rate <0.25 were regarded as statistically significant.
2.3.4ǀGene Set Cancer Analysis
Gene Set Cancer Analysis (GSCALite, http://bioinfo.life.hust.edu.cn/web/GSCALite/) provides a single nucleotide variation (SNV) module through the maftools[11]. Therefore, in this present study, we employ the SNV module for analyzing and visualizing the SNV of eight genes in BC.
2.4ǀEight-Gene Risk score validation
In evaluating the generalizability of the model, similar formula and coefficients to the training sets (STPH and TCGA dataset) in the validation datasets (GSE3289 and GSE13507) were used. The Kaplan-Meier curves of the validation datasets were separately drawn before analyzing their eight-genes risk score prognostic classification efficacy. Furthermore, the receiver operating characteristic (ROC) curve and violin plot were used to verify the relationship between the risk score and the clinical features (MIBC).
2.5ǀDevelopment of the nomogram
A nomogram was developed from the factors that were significant in the final multivariate Cox regression analyses. The ROC, concordance index (C-index) and calibration plots were conducted purposely to assess the performance of the nomogram. The nomogram, C-index and calibration plots were generated using the ‘Rms’ R package. Furthermore, the decision curve analysis (DCA) was employed to ascertaining the net benefits of nomogram and other crucial prognostic factors.
2.6ǀCell culture
The human normal bladder epithelial cell line SV-HUC-1 and BC cell line(T24, UMUC3, J82, 5637, EJ)obtained from Shanghai Cell Bank of the Chinese Academy of Sciences (Shanghai, China) were used in this study. The SV-HUC-1 cell lines were cultured in F12K having 10% FBS and 1% penicillin/streptomycin (P/S). As of the BC cell lines, they were cultured in RPMI‑1640 (Gibco; Thermo Fisher Scientific, Inc.) with 10% FBS (Gibco; Thermo Fisher Scientific, Inc.) and 1% P/S (Gibco; Thermo Fisher Scientific, Inc.). Cell culture reagents were sourced from Gibco and the subjected cells were cultured at 37˚C in 5% CO2.
2.7ǀReal-time quantitative (RT-qPCR)
TRIzol reagent (Invitrogen, Thermo Scientific, Shanghai, China) was used for extracting RNA of the cell lines. cDNA synthesis was performed using cDNA Synthesis SuperMix Kit(Cat No.11141ES60;Yeasen, Shanghai. China) and qPCR was performed using qPCR SYBR Green Master Mix KIT (Cat No. 11203ES03; Yeasen, Shanghai. China). The genes expression levels normalized to the level of GAPDH and Primer sequences are presented in Table SI.
2.8ǀStatistical analysis
The Wilcoxon rank-sum test was performed to investigate the differential expression genes between NMIBC and MIBC. The Student’s t-test or one-way ANOVA was used to evaluate the risk score in patients that were grouped according to their clinical characteristics. Both the Kaplan-Meier and log-rank tests were employed to compare the OS of the BC patients of between high- and low-risk groups. The performance of the risk score in predicting prognosis and the NMIBC/MIBC subtype was determined using the ROC curves. The heatmap and correlation matrix were generated using ‘Pheatmap’ R package v1.0.12 and ‘corrplot’ R package v0.84, respectively. SPSS 22.0 (SPSS, Armonk, NY, USA), R v3.6.1 (https://www.r-project.org/) and Graphpad Prism V7 (GraphPad Software, Inc.) were used for data analyses. However, a two-sided P value<0.05 was considered significant.