Patient cohort
In this study, patients with treatment- naïve histologically confirmed NMIBC treated with TURBT and induction intravesical BCG between 2012 and 2021 were identified from the Department of Urology of Fudan University Shanghai Cancer Center (FUSCC, Shanghai, China). Studies were performed in accordance with the Declaration of Helsinki. All patients with T1 disease underwent a restaging TURBT. Patients were selected based on the availability of bladder tissue specimen pre-BCG treatment with enough tumor available to obtain a core. Specially, pre-BCG treatment samples were obtained from primary, incident tumors 1–3 months prior to starting a 6-week induction course of BCG. Cystoscopic evaluations were performed 3 months after the first BCG instillation, and those responding to treatment were then evaluated with cystoscopy and urine cytology every 3 months for 2 years, 6 months to 5 years, and annually up to 10 years at the discretion of the treating urologist. All pathologic evaluations were performed at FUSCC by a genitourinary pathologist. BCG response was defined according to standard definitions: (1) patients with a minimum follow-up of 2 years after diagnosis were defined as “BCG Responders (BCG-R)”. (2) “BCG non-responders (BCG-NR)” included BCG unresponsive, BCG relapsing, and BCG progressor. BCG unresponsive patients were those who had persistent high-grade T1 disease at the initial 3-month cystoscopy, or had relapsed high-grade NMIBC with or without CIS within 6 months of the last exposure to BCG. BCG relapsing patients had recurrent high-risk (high-grade) NMIBC after a prior complete response and did not fulfill the BCG-unresponsive definition [72, 73]. Patients who received an inadequate BCG induction (less than five of six courses) and those with incomplete follow-up information were excluded.
Sample preparation
Tumor tissue samples were collected from the FUSCC tissue bank immediately after surgery. The samples were collected within 30 minutes after resection, and were immediately transferred into sterile freezing vials and snap frozen in liquid nitrogen. The tumor tissues were then cut into approximately 0.5 cm3 pieces under a temperature of -40°C and stored at -80°C until further use. Histologic sections were obtained from the top and bottom portions of the tumor tissues and stained with Hematoxylin and eosin (H&E) for review. In addition, precancerous lesions and divergent histological variant tumors of patients were scraped according to H&E staining. Tumor samples were required to contain an average of 70% tumor cell nuclei with necrosis equal to or less than 20% for inclusion in the study. Each sample was assigned a new research ID, and the patient’s name or medical record number used during hospitalization was de-identified.
Pathology Review
All samples were systematically evaluated to confirm the histopathologic diagnosis and any variant histology according to the World Health Organization (WHO) classification by three expert genitourinary pathologists. Additionally, the tumor samples were also assessed for tumor content, presence and extent of tumor necrosis, and signs of invasion into the lamina propria or muscularis propria. Tumor samples were also evaluated for the presence and extent of inflammatory infiltrates, as well as for the type of the infiltrating cells (lymphocytes, neutrophils, eosinophils, histiocytes, plasma cells) in the tumor microenvironment. Any non-concordant diagnoses among the three pathologists were re-reviewed, and a resolution was reached following discussion. All the information is included in Supplementary Data 1.
Whole-exome Sequencing
DNA extraction
For WES, DNA was extracted from fresh or frozen tumor tissue samples, while matched germline DNA was obtained from corresponding blood samples. DNA from both the tumor tissues samples and blood samples was extracted according to the manufacturer’s instructions of a DNeasy Blood & Tissue Kit (QIAGEN, Hilden, Germany). The quality of the isolated and contaminated samples was verified using the following methods: (i) monitoring DNA degradation and contamination using 1% agarose gels; and (ii) measuring DNA concentration with the Qubit® DNA Assay Kit in a Qubit® 2.0 Fluorimeter (Invitrogen, CA, USA).
Library preparation
An input of 0.6 µg genomic DNA per sample was used for the DNA preparation. The DNA sequencing libraries were generated following the manufacturer's recommendations using an Agilent SureSelect Human All Exon kit (Agilent Technologies, CA, USA) following the manufacturer’s recommendations, following which index codes were added to each sample. Briefly, fragmentation was carried out using a Covaris M220 system to generate 200–300 bp fragments. The remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. Following adenylation of the 3’ ends of DNA fragments, adapter oligonucleotides were ligated. DNA fragments with adapter molecules ligated at both ends were selectively enriched using a PCR reaction. Following the PCR reaction, the libraries were then hybridized with t a biotin-labeled probe in the liquid phase and captured using magnetic beads with streptomycin to capture the exons of genes. Enriched libraries were subjected to a PCR reaction to add index tags for sequencing preparation. The products were purified using an AMPure XP system (Beckman Coulter, Beverly, USA) and quantified using the Agilent high sensitivity DNA assay on the Agilent Bioanalyzer 2100 system.
Clustering and Sequencing
Clustering of index-coded samples was performed using a cBot Cluster Generation System with a Hiseq PE Cluster Kit (Illumina) according to the manufacturer’s instructions. After cluster generation, the DNA libraries were sequenced on the Illumina HiSeq platform and 150 bp paired-end reads were generated.
WES data analysis
Quality Control
The original fluorescence image files obtained from the Hiseq platform were transformed to short reads (raw data) through base calling. These short reads were then recorded in FASTQ format, which contains both sequence information and corresponding sequencing quality information. To ensure reliable bioinformatics analysis, it is crucial to address sequence artifacts, including reads containing adapter contamination, low-quality nucleotides, and unrecognizable nucleotides (N). Hence, quality control is an essential step that must be applied to guarantee meaningful downstream analysis.
The data processing steps were as follows:
-
Paired reads were discarded if either read contained adapter contamination (> 10 nucleotides aligned to the adapter, allowing ≤ 10% mismatches).
-
Paired reads were discarded if more than 10% of bases are uncertain.
-
Paired reads were discarded if the proportion of low-quality (Phred quality < 5) bases is either read was over 50%.
All downstream bioinformatics analyses were based on high-quality clean data, which were retained after these steps. At the same time, QC statistics including total read number, raw data, raw depth, sequencing error rate, percentage of reads with Q30 (the percentage of bases with Phred-scaled quality scores greater than 30), and GC content distribution were calculated and summarized.
Reads Mapping to Reference Sequence
Valid sequencing data were mapped to the reference human genome (UCSC Genome Brower, hg38) using Burrows-Wheeler Aligner (BWA) software [74] to obtain the original mapping results stored in BAM format. If one read, or one paired read, were mapped to multiple positions, the strategy adopted by the BWA was to choose the most likely placement. If two or more most likely placements were present, the BWA picked one randomly. Then, SAMtools [75] and Picard (http://broadinstitute.github.io/picard/) were used to sort BAM files and perform duplicate marking, local realignment, and base quality recalibration to generate final BAM files for computation of the sequence coverage and depth. The mapping step was very difficult due to mismatches, including true mutations and sequencing errors, and duplicates resulting from PCR amplification. These duplicate reads were uninformative and should not be considered as evidence for variants. We used Picard to mark these duplicates for the follow-up analysis.
Variant Calling
Somatic variants were then called, utilizing VarScan v2.3.8 [76] MuTect v1.1.7 [77], and InVEX (http://www.broadinstitute.org/software/invex/). The following filters were applied to get variant cells of high confidence: Remove mutations with coverage less than 10×; Remove variant sites in dbSNP and with mutant allele frequency (MAF) > 0.05 in the 1000 Genomes databases (1000 Genomes Project Consortium) and the Novo-Zhonghua (in-house unrelated healthy individual database), but include sites with MAF ≥ 0.05 with COSMIC evidence (http://cancer.sanger.ac.uk/cosmic) [78, 79]; All variants must be called by 2 or more callers; All variations must be exonic; Retain the nonsynonymous SNVs if the functional predictions by PolyPhen-2, SIFT, MutationTaster and CADD all show the SNV is not benign [80, 81]; Retain genes identified by Cancer Gene Census (CGC, http://www.sanger.ac.uk/science/data/ cancer-gene-census). The genes with mutations were analyzed by Fisher's exact test to compare the two proteomic groups.
Copy-Number Analysis
CNAs were identified using GATK’s (GATK 4) Best Practice somatic CNA calling pipeline. The results of this pipeline, segment files of every 1000 were then input into GISTIC2 [82], to identify significantly amplified or deleted focal-level and arm-level events, with a q-value < 0.1 considered significant. A log2 ratio cut-off 1 was used to define SCNA amplification and deletion. We further summarize the arm-level copy number change based on a weighted sum approach [83], in which the segment-level log2 copy ratios for all segments in the given arm were added up with the length of each segment being weighted. To exclude false positives as much as possible, relatively stringent cut-off thresholds were used with the following parameters: -ta 0.1 -tb 0.1 -brlen 0.98 -conf 0.9. Other parameters were the same as their default values.
Co-occurrence and Mutual Exclusivity Analysis of Mutations
Co-occurrence and mutually exclusive mutated genes were detected using Fisher’s exact test to determine the co-occurrence and mutually exclusively of significantly mutated genes in our mutational dataset.
Analysis of Significantly Mutated Genes
Filtered mutations (including SNV and indel) were further used to identify significantly mutated genes using MutSigCV (https://software.broadinstitute.org/cancer/cga/mutsig, version 1.4) with default parameters. The final MutSigCV P values were then converted to q-values using the method of Benjamini and Hochberg [84]. Genes with q-value < = 0.1 were declared to be significantly mutated.
Mutational signature analysis using the sigminer approach
Mutational signatures were jointly inferred for 92 tumors using the R package sigminer [85]. The sigminer approach (https://github.com/ShixiangWang/sigminer) was used to extract the underlying mutational signatures. The input data consisted of 96 mutation vectors (or contexts) generated by somatic SNVs based on six base substitutions (C > A, C > G, C > T, T > A, T > C, and T > G) within 16 possible combinations of neighboring bases for each substitution. These mutation vectors were used to determine their contributions to the observed mutations. Sigminer using a non-negative matrix factorization (NMF) approach was applied to decipher the 96 × 92 (i.e., mutational context-by-sample) matrix for the 30 known COSMIC cancer signatures (https://cancer.sanger.ac.uk/cosmic/signatures) and infer their exposure contributions.
Mutational signature analysis using the deconstructSigs approach
The mutational signature of each sample was deconstructed using the deconstructSigs approach [86] and its R package (deconstructSigs v1.8.0) with default parameters. Thirty COSMIC cancer signatures were considered and their contributions (weights) in each patient were normalized between 0 and 1, and signatures with a weight below 0.08 were filtered out.
Functional Annotation
Functional annotation plays a vital role in elucidating the link between genetic variations and diseases. ANNOVAR was performed to annotate the variant call format (VCF) obtained in a previous study [87]. The dbSNP, 1000 Genome, and other related databases were used to characterize the detected variants. Given the significance of exonic variants, gene transcript annotation databases, such as Consensus CDS, RefSeq, Ensembl, and UCSC, were also included in the determination of amino acid alterations. Annotation content contained the variant position, variant type, and conservative prediction, and more. These annotation results would help locate disease-causal mutants. The details of the annotation are provided in the supplementary material.
TMB
TMB was defined as the number of somatic mutations (including base substitutions and indels) in the coding region. Synonymous alterations were also counted [88]. To calculate the TMB, the total number of mutations counted was divided by the size of the coding sequence region of the xGen Exosome Research Panel (Integrated DNA Technologies, IDT). The WES target region was 33.52M. In this study, a threshold value of10 was adopted to differentiate between TMB high (TMB-H) and TMB low (TMB-L) cases.
Proteomic and phosphoproteomic analysis
Protein extraction and trypsin digestion
Frozen samples were collected and washed three times with phosphate buffer saline (PBS) to remove blood and debris. The samples were then minced and treated with lysis buffer (8M urea, 100 mM Tris-HCl, pH 8.0) containing protease and phosphatase inhibitors (Thermo Scientific, USA) and then sonicated for 1 minute (3s on and 3s off, amplitude 25%). The resulting lysates were centrifuged at 14,000g for 10 minutes and the supernatants were collected as whole-tissue extracts. Protein concentrations were determined by the Bradford protein assay (TaKaRa, T9310A). Extracts (100 µg protein) were reduced with 10 mM dithiothreitol at 56°C for 30 minutes and alkylated with 10 mM iodoacetamide at room temperature in the dark for 30 minutes. Finally, the samples were digested with trypsin using a filter-aided sample preparation method [89].
The enrichment of phosphorylated peptides
For the phosphoproteomic analysis, peptides were extracted from the FFPE slides following the same protocol as previously described for “Protein extraction and trypsin digestion”. Tryptic peptides were then used for phosphopeptide enrichment using a High-Select Fe-NTA kit (Thermo Fisher Scientific, Rockford, IL, USA, #A32992) according to the kit manual and a previous report [90] with some modifications. In brief, the peptides were suspended in the binding/washing buffer (contained in the enrichment kit) and mixed with the equilibrated resins. The peptide-resin mixture was incubated for 30 minutes with three gentle blows at room temperature. Following incubation, the resins were washed three times with the binding/washing buffer and twice with water. The enriched peptides were eluted with the elution buffer (contained in the enrichment kit) and immediately dried using a speed-vac at 45°C for mass spectrometry analysis.
LC-MS/MS analysis
For the analysis of proteome profiling samples and phosphoproteomic samples, peptides were analyzed on an Orbitrap Fusion Lumos Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Fisher Scientific) coupled with a high-performance liquid chromatography system (EASY nLC 1200, Thermo Fisher Scientific). Dried peptide samples were re-dissolved in Solvent A (0.1% formic acid in water) and loaded onto a 2-cm self-packed trap column (100 µm inner diameter, 3 µm ReproSil-Pur C18-AQ beads, Dr Maisch GmbH) using Solvent A. The peptides were then separated on a 150 µm-inner-diameter column with a length of 30 cm (1.9 µm ReproSil-Pur C18-AQ beads, Dr Maisch GmbH) over a 150 mins gradient (Solvent A: 0.1% Formic acid in water; Solvent B: 0.1% Formic acid in 80% ACN) at a constant flow rate of 600 nL/min (0-150 mins, 0min, 4% B; 0–10 mins, 4%-15% B; 10–125 mins, 15%-30% B; 125–140 mins, 30%-50% B; 140–141 mins, 50%-100% B; 141–150 mins, 100% B). The eluted peptides were ionized at 2 kV and introduced into the mass spectrometer. Mass spectrometry was performed in data-dependent acquisition mode. For the MS1 Spectra full scan, ions with m/z ranging from 300 to 1,400 were acquired by an Orbitrap mass analyzer at a high resolution of 120,000. The automatic gain control (AGC) target value was set to 3E + 06 with a maximal ion injection time of 80 ms. MS2 spectral acquisition was performed in the ion trap in a rapid speed mode with a cycletime of 1.5s. Precursor ions were selected and fragmented with higher energy collision dissociation (HCD) with a normalized collision energy of 30%. Fragment ions were analyzed by an ion trap mass analyzer with an AGC target set at 5E + 04 and the maximal ion injection time of MS2 was 20 ms. Peptides that triggered MS/MS scans were dynamically excluded from further MS/MS scans for 18 s. The coefficient of variation (CV) values on FAIMS were − 45V and − 65V.
MS database searching
Peptide and protein identification
MS raw files were processed using the “Firmiana” (a one-stop proteomic cloud platform) [91] against the human National Center for Biotechnology Information (NCBI) RefSeq protein database, with the Mascot 2.4 software (Matrix Science Inc., London, UK). The search parameters included a maximum of 2 missed cleavages, mass tolerances of 20 ppm for precursor ions and 0.05 ppm for production ions and fixed modification of cysteine carbamidomethylation. Variable modifications considered were N-acetylation and methionine oxidation. To ensure the quality control of protein identification, a target-decoy-based strategy was implemented to maintain a FDR of less than 1% for both peptides and proteins. The program percolator was used to obtain the probability value (q-value) and showed that the FDR (measured by the decoy hits) of every peptide-spectrum match (PSM) was lower than 1%. Peptides shorter than seven amino acids were excluded, and a cutoff ion score of 20 was set for peptide identification. To ensure stringent quality control, PSMs from all fractions were combined. The q-values of both target and decoy peptide sequences were dynamically increased employing the parsimony principle until the corresponding protein FDR was less than 1%. Finally, to reduce the false positive rate, proteins with at least two unique peptides were selected for further investigation.
Label-free-based MS Quantification of Proteins
For the calculation of protein abundance, the non-redundant peptide list was used to assemble proteins following the parsimony principle. Protein abundance was then estimated by a traditional label-free, intensity-based absolute quantification (iBAQ) algorithm, which divided the protein abundance (derived from identified peptide intensities) by the number of theoretically observable peptides. Match between runs [92] was used to improve parallelism across different tissues from 86 patients. We built a dynamic regression function based on commonly identified peptides in different tissues. According to the correlation value, R2, Firmiana chooses a linear or quadratic function for regression to calculate the retention time (RT) of the corresponding hidden peptides, and check the existence of the extracted ion chromatogram (XIC) based on the m/z and calculated RT. The program evaluated the peak area values of the existing XICs which were calculated as parts of the corresponding proteins. Proteins with at least two unique peptides with a 1% FDR at the peptide level were selected for further analysis. Then, the fraction of total [48], a relative quantification value that was defined as a protein’s iBAQ divided by the total iBAQ of all identified proteins in one experiment, was calculated as the normalized abundance of a particular protein among experiments. Finally, the fraction of total [48] was further multiplied by 105 for ease of presentation and any missing values were assigned as 10− 5 [93].
Batch effect analysis
Hierarchical clustering, dip statistic test and principal component analyses were implemented in R v.3.5.1 to assess batch effects in our proteome dataset with respect to the following two variables: batch identity and sample type (pre-BCG and normal samples). For the hierarchical clustering analysis, pairwise Spearman’s correlation coefficients of samples that passed quality control were investigated. Samples of the same type exhibited high similarity, whereas samples of different types exhibited clear differences. There was no evident association between batch identity and correlation coefficients. The density plot of the normalized intensities of the identified proteins in each sample showed that all samples passed quality control as expected from a unimodal distribution (dip statistic test). The results of principal component analysis showed that batch effects were negligible for batch identity but significant for the sample types.
Quality control of the MS data
For quality control of MS, the HEK293T cell (National Infrastructure Cell Line Resource) lysates were measured every three days to set the quality-control standard. The quality-control standard was then digested and analyzed using the same method and conditions as the 12 samples. Pairwise Spearman’s correlation coefficient was calculated for all quality-control runs using the R package corrplot (v0.84), and the results are shown in Figure S1F. The average correlation coefficient among the standards was 0.90, while the maximum and minimum values were 0.95 and 0.86, respectively.
Targeted PRM analysis
To evaluate the accuracy of the classifiers for predicting BCG therapy, we designed a PRM strategy to quantify the classifier proteins in plasma samples from an independent cohort composed of 50 NMIBC patients treated with BCG. From the library search results, a set of target peptides that were unique to the classifier proteins were selected. Besides, house-keeping proteins such as VCP, RPLP0, and PSMB4, were also included as the reference. Equal amounts of tissue from each sample in the validate cohort with 50 NMIBC patients were digested as described in the profiling preparation section. The peptide samples were then injected into the Q Exactive HF-X Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Scientific) operating in PRM mode with quadrupole isolation and HCD fragmentation. The mobile phase buffers used were the same as those mentioned in the DIA method section. Full MS mode was measured at a resolution of 60,000 with an AGC target value of 3E6 and maximum IT of 20ms, scanning the range of 300–1400 m/z. Target precursors were then isolated through an m/z window of 1.2 Th, followed by fragmentation at 27% normalized collision energy. The product ions were scanned with a resolution of 15,000, an AGC target value of 1E6 or a maximum injection time of 25ms. The raw data was searched by Skyline-daily (4.2.1.19004, University of Washington, USA). The proteins were quantified based on the fragment total area reported by Skyline-daily. We selected peptides and tested their signal stability and peak shape in the pool sample for final quantification, following the ranking offered by Skyline.
Quantification and Statistical Analysis
Missing Value Imputation
For the proteomic and phosphoproteomic data, FOTs were multiplied by 1E5 for quantification. Proteins and phosphoproteins detected in more than 30% of the samples underwent missing value imputation. The missing values were imputed with 1E-5 and, if necessary, log2 transformed. This method has also been previously applied in other published proteomic and phosphoproteomic studies [93, 94].
Differential protein analysis
Proteins expressed in more than 30% of the samples were selected for differential expression analysis. The Wilcoxon rank-sum test was used to examine whether proteins were differentially expressed between BCG-R (n = 52 samples) and BCG-NR (n = 21 samples), or patients with different mutation statuses and CNA statuses. Upregulated or downregulated proteins are defined as proteins that show differential expression in one group compared with the other group (Wilcoxon rank-sum test, FDR < 0.05, Fold change > 2 or < 0.5). The Kruskal-Wallis test was used to test whether genes were differentially expressed among different tissue types or other subgroups. To account for multiple-testing, the P values were adjusted using the Benjamini-Hochberg FDR correction. The same strategy was applied to analyze the differential expression of phosphoproteomic data.
Pathway enrichment analysis
Differentially expressed genes were subjected to Gene Ontology and KEGG pathway enrichment analysis in DAVID[95] with a P-value/FDR < 0.05. We used gene sets of molecular pathways from the KEGG [96] /Hallmark [97]/Reactome [98] / GO [99] databases to compute pathways.
Pathway scores and correlation analysis
Single-sample gene set enrichment analysis (ssGSEA) [100] was utilized to obtain pathway scores for each sample based on proteomic and phosphoproteomic data using the R package GSVA [101]. The KEGG, Reactome, and Hallmark gene sets from the MsigDB v7.1 were used as the background. Correlations between the pathway scores and other features were determined using Spearman’s correlation [102]. Inferred activity was performed using ssGSEA implemented in the R package GSVA with a minimum gene set size of 10.
Survival analysis
Kaplan-Meier survival curves (log-rank test) were used to determine the OS and PFS of different subtypes and patients. The coefficient value, which is equal to ln (HR), was calculated using Cox proportional hazards regression analysis. P-values less than 0.05, were considered significantly different and selected for Cox regression multivariate analysis. Prior to the log-rank test of a given protein, phosphoprotein, or phosphosite, survminer (version 0.2.4, R package) with maxstat (maximally selected rank statistics; http://r-addict.com/2016/11/21/Optimal-Cutpoint-maxstat.html) was used to determine the optimal cut-off point for the selected samples according to a previous study[103]. OS curves were then calculated (Kaplan-Meier analysis, log-rank test) based on the optimal cut-off point.
Gene Set Enrichment Analysis (GSEA)
GSEA was performed by the GSEA software (http://software.broadinstitute.org/gsea/ index.jsp). Gene sets including KEGG, GO Biological Process (BP), Reactome and HALLMARK downloaded from the Molecular Signatures Database (MSigDB v7.1, http://software.broadinstitute.org/gsea/msigdb/index.jsp) were used. FDR < 0.05 was used as a cutoff. The normalized enrichment score was used to reflect the degree of pathway overrepresentation.
Multivariate Cox proportional hazards regression analysis
Multivariate Cox proportional hazard regression analysis was used to establish the PFS-predicting model for NMIBC. The results were plotted using a forest map with the R package “ggplot2”.
Construction and validation of predictive models to distinguish between BCG-R and BCG-NR
Binomial logistic regression analysis was used to construct the distinguishing between BCG-R and BCG-NR predicting model based on the significantly differently expressed proteins in BCG-R and BCG-NR samples using in the R software v3.5.1. The backward stepwise method was utilized to feature selection. Samples were randomly divided into the training set and the testing set. Moreover, the diagnostic value of this model was verified using ROC analysis (pROC R package version 1.16.2 and Caret R package version 6.0–86). Sensitivity, specificity, accuracy, and AUC were used to determine predictive values. The predictive model was validated in validation cohort.
Phosphoproteomic data analysis
Database searching of MS phosphoproteomic data
Phospho-proteome MS raw files were searched against the human Refseq protein database (27,414 proteins, version 04/07/2013) using Proteome Discoverer (version 2.3.0.523) with a Mascot [104] (version 2.3.01) engine with a percolator [105]. Carbamidomethyl cysteine was used as a fixed modification, and oxidized methionine, protein N-term acetylation, and phospho (S/T/Y) were set as variable modifications. The FDR of peptides and proteins was set at 1%. The tolerance for spectral search a mass tolerance of 20 ppm for the precursor. The maximum number of missing cleavage site was set at 2. For phosphosite localization, ptmRS [106] was used to determine phosphosite confidence, and a phosphosite probability > 0.75 was used for further analysis.
Kinase Activity Prediction
Kinase activity scores were inferred from phosphorylation sites by employing PTM signature enrichment analysis (PTM-SEA) using the PTM signatures database (PTMsigDB) v1.9.0 (https://github.com/broadinstitute/ssGSEA2.0). Sequence windows flanking the phosphorylation site by 7 amino acids in both directions were used as unique site identifiers. Only fully localized phosphorylation sites as determined by Spectrum Mill software were taken into consideration. Phosphorylation sites on multiply phosphorylated peptides were resolved using the approach described in Krug et al.[107] resulting in a total of 29,406 phosphorylation sites that were subjected to PTM-SEA analysis using the following parameters:
gene.set.database = ‘‘ptm.sig.db.all.flanking.human.v1.9.0.gmt’’
sample.norm.type = ‘‘rank’’
weight = 0.75
statistic = ‘‘area.under.RES’’
output.score.type = ’’NES’’
nperm = 1000
global.fdr = TRUE
min.overlap = 5
correl.type = ‘‘z.score”
Immunohistochemistry (IHC)
Formalin-fixed, paraffin-embedded tissue sections of 10 µM thickness were stained in batches for detecting TLR3 in a central laboratory at the FUSCC according to standard automated protocols. Deparaffinization and rehydration were performed, followed by antigen retrieval and antibody staining. TLR3 IHC was performed using the Leica BOND-MAX auto staining system[108]. Rabbit monoclonal anti-TLR3 antibody (ABclonal, A11778) was introduced, followed by detection with a Bond Polymer Refine Detection DS9800 (Bond). Slides were imaged using an OLYMPUS BX43 microscope (OLYMPUS) and processed using a Scanscope (Leica).
Functional Experiments
Cell culture
The human bladder cancer cell lines (T24, 5637 and RT4) were purchased from American Type Culture Collection (ATCC). T24 and 5637 were cultured in Dulbecco’s modified eagle medium (DMEM, Gibco, Carlsbad, CA, United States) containing 10% fetal bovine serum (FBS, Gibco, Carlsbad, CA, United States) and 100 U/ml penicillin streptomycin. RT4 was cultured in McCoys 5A Medium with 10% fetal bovine serum (FBS, Gibco, Carlsbad, CA, United States) and 100 U/ml penicillin streptomycin. For TLR3 overexpression, the full-length TLR3 was cloned into pCDH-puro vector plasmid.
Western blot
Western blotting was performed as previously described [109]. Proteins were extracted from harvested BC cells separately and quantified by BCA assay. The primary antibodies used in this study were anti-TLR3 antibody (1:1000, CST, 6961s), anti-p65 antibody (1:1000, CST,3034), anti-p-p65 antibody (1:1000, CST,3031), HLA class I ABC Polyclonal antibody (Proteintech, 15240-1-AP) and anti-GAPDH antibody (ab181602, Abcam), β-Actin antibody (CST,4967) in our study.
Flow cytometry Analysis for HLA-I
After collecting cell precipitates and washing them once with PBS, the cells were resuspended in 500 cell staining buffer. Subsequently, they were incubated in the dark for 30 minutes with gentle agitation, then Alexa Fluor® 488 anti-human HLA-A, B, C antibody was immediately added. They were next incubated at room temperature for an additional 30 minutes. Eventually, the samples were analyzed promptly using a flow cytometer (FACSCalibur flow cytometer from BD Biosciences). Each experiment should be conducted at least three times.
Apoptosis Analysis
Apoptosis in cells was assessed using the Annexin V-PE/7-AAD Apoptosis Detection Kit (Liankebio, Hangzhou, China) and flow cytometry. Following the collection of cell precipitates, a single wash with PBS was performed. Subsequently, the cells were resuspended in 200 µl of 1x binding buffer. The resuspended cells were then incubated with Annexin V-PE and 7-AAD at room temperature for 5 minutes, followed by immediate analysis using flow cytometry.
Tissue digestion and Patient derived organoid establishment
Ten tissue specimens from BC patients undergoing transurethral cystectomy (TUR-B) at the Department of Urology, Affiliated Cancer Hospital of Fudan University, were utilized to establish Patient-Derived Organoids (PDOs). Initially, the tissue specimens were immersed in a basal medium (Advanced DMEM F12 Serum-Free medium, Gibco, 12634010). After mechanical disruption, the tumor tissues were minced into a paste, rinsed in Basal medium (800 g for 5 min), and enzymatically digested at 37°C. The digestion mixture contained 2.5 mg/ml Collagenase II (ThermoFisher 17101015) and 10 µM Y-27632-HCl Rock Inhibitor (MCE, 146986-50-7) in Basal medium for 10–15 minutes, with periodic stirring every 5 minutes. The digested tissue underwent additional washing with Basal medium (1000 rpm, 5 min). The resulting precipitate was then resuspended in 1–2 ml TrypLE Express (ThermoFisher, 12605028) and subjected to a 37°C digestion for 5 min. The digestion was terminated using Basal medium through a 70 µm cell strainer (Corning, 352350). The cell filtrate underwent centrifugation once more (1000 rpm, 5 min). The cell clusters were subsequently resuspended in an organoid matrix gel (Corning, 356231) and seeded into Ultra Low Adhesion (ULA) 24-well plates (Corning, 3743). Following a 30-minute incubation at 37°C under 5% CO2 conditions, Bladder Cancer Organoid Specific Medium (500 µl/well, Bladder Organoid Kit, K2126-CB) was added. Relative cell activity was measured using CellTiter-Glo 3D (Promega, G9682) following the manufacturer's instructions [110, 111].
ELISA
To detect IL-10, IL-4, and IFN-beta, specific ELISA kits are utilized (IFN-beta, JL19215; IL10, JL19246; IL-4, JL19870). The procedure involves coating a microplate with capture antibodies that are specific to the cytokines of interest. After coating, the plate is blocked to prevent any nonspecific binding. Next, the cell culture supernatants containing the cytokines are added to the wells, allowing the target cytokines to bind to the capture antibodies. Following a washing step to remove any unbound substances, detection antibodies conjugated with enzymes are added. These antibodies bind to the captured cytokines. After another round of washing to eliminate excess detection antibodies, a substrate solution is added to the wells. The enzyme catalyzes a reaction that produces a measurable color change. The intensity of the color is proportional to the concentration of the cytokines present in the cell culture supernatant. Finally, the absorbance is measured using a microplate reader, and the concentrations of IL-10, IL-4, and IFN-beta are determined by comparing the absorbance values to a standard curve generated with known concentrations of the respective cytokines.
In vitro BCG stimulation assay
Bladder cancer cell lines and organoids were treated with Bacillus Calmette-Guérin (BCG) at a multiplicity of infection (MOI) of 10 [49, 112]. The cells were cultured in an appropriate growth medium until they reached 70–80% confluence. A BCG solution was then prepared and added to the cells to ensure uniform exposure. Following treatment, the cells (24h) and organoids (7 days) were incubated, and after the designated period, they were harvested for analysis.
Statistical analysis
Standard statistical tests were used to analyze the clinical data, including but not limited to Student’s t test, Wilcoxon rank-sum test, Chi-square test, Fisher’s exact test, Kruskal-Wallis test, Log-rank test. For categorical variables versus categorical variables (including gene mutations, gender, age group, smoke status, nerve invasion, vascular invasion, metastasis, hyperglycemia, hypertension, TNM stage, and histological type), Fisher’s exact test was used in a 2 × 2 table, otherwise Chi-square test was used. The Wilcoxon rank-sum test was used to examine whether genes were differentially expressed between different proteomic subtypes, and patients with different mutation statuses and CNA of statuses. The Kruskal-Wallis test was used to test whether genes were differentially expressed among the different tissues type or other subgroups. To account for multiple-testing, the P values were adjusted using the Benjamini-Hochberg FDR correction. Kaplan–Meier plots (Log-rank test) were used to describe OS. Variables associated with OS and PFS were identified using univariate Cox proportional hazards regression models. Significant factors in univariate analysis were further subjected to a multivariate Cox regression analysis in a forward LR manner. All the analyses of clinical data were performed in R (version 3.5.1). For functional experiments, three biological repeats were performed independently, and results were expressed as mean ± standard error of the mean (SEM). The statistical significance of differences was determined by two-way ANOVA. Statistical analysis was performed using GraphPad Prism (version 5.01). The p values less than 0.05, 0.01, 0.001, 0.0001 were marked with *, **, ***, ****, respectively. All the statistical analyses had been checked by two statisticians.