SBS4-like signature identified in BCa tumors
The optimal number of de novo SBS signatures was determined as k = 7 (Figure S2, upper). In the mutation signatures extracted, we found one with high similarity to the COSMIC SBS4 signature (SBS de novo 2, cosine similarity = 0.86). Mutation signatures equivalent to or resembling COSMIC SBS1 (SBS de novo 7, cosine similarity = 0.97), COSMIC SBS2 (SBS de novo 6, cosine similarity = 0.99), COSMIC SBS13 (SBS de novo 1, cosine similarity = 0.89), COSMIC SBS5 (SBS de novo 3, cosine similarity = 0.83), and COSMIC SBS10b (SBS de novo 5, cosine similarity = 0.79, showing contamination of COSMIC SBS10a) were extracted as well. One signature, SBS de novo 4, with minor contribution to the overall mutation load, showed moderate similarity (cosine similarity = 0.66) with COSMIC SBS3 related with homologous recombination deficiency (HRD). De novo signatures with ~0.9 or higher cosine similarity to established COSMIC SBS signatures were thereafter considered as the corresponding COSMIC signature, those with ~0.8 cosine similarity was named as the corresponding COSMIC signature followed by a ‘-like’ suffix, except the SBS de novo 4 signature which was re-named as “Moderate similarity to SBS3, HRD’ (Figure 1A). Namely, SBS de novo 1 was re-named as SBS13, SBS de novo 2 as SBS4-like, SBS de novo 3 as SBS5-like, SBS de novo 4 as SBS3 moderate similarity, SBS de novo 5 as SBS10b-like, SBS de novo 6 as SBS2, and SBS de novo 7 as SBS1. The heat map showing systematic cosine similarity analysis of the mutation signatures were shown in Figure S2 lower panel.
APOBEC-associated mutagenesis contributed most to the overall SBS mutation load (SBS13, #mutations = 55,828, 43.3%; and SBS2, #mutations = 31,244, 24.2%) same as previously reported , followed by SBS5-like (#mutations = 11,788, 9.1%), SBS4-like (#mutations = 8,436, 6.6%), and SBS1 (#mutations = 7,682, 6.0%). The SBS10b-like signature associated with certain POLE exonuclease domain mutations were basically limited to one hypermutator sample with POLE P286R mutation (TCGA-DK-A6AW-01; Figure S3). The distribution of sample-wise mutation signature fraction scores was shown in Figure 1B.
SBS4-like signature associated with smoking history and other smoking somatic signatures
TCGA BLCA tumors without documented tobacco-smoking history (n = 13) or current reformed smoker with unspecified duration (n = 10) were excluded from this analysis. The remaining TCGA BLCA patients were classified to four tobacco-smoking history categories, namely lifelong non-smoker (n = 111), current smoker (n = 89), early reformed smoker (current reformed smoker for >15 years, n = 114), and late reformed smoker (current reformed smoker for ≤ 15 years, n = 73). Applying a mutation etiology estimation algorithm by Letouzé and colleagues, SBS4-like mutations were further refined here to those with SBS4-like signature as the most probable origin, totaling 5,057 mutations. We found a significantly higher proportion of tumors with detected SBS4-like mutations as well as higher number of SBS4-like mutations in current or late reformed smokers than early reformed or lifelong non-smokers (Fisher’s exact test, P = 0.017; Wilcoxon test, P = 0.008; Figure 2A and 2B). All other signatures expect SBS5-like showed null association with tobacco-smoking behavior, consistent with previous reports (Figure S4) [15, 17].
We further demonstrated the correlation of SBS4-like signature with other known tobacco-smoking induced mutation signatures, ie ID3 and DBS2, as tumors with detected DBS2 (characterized by CC > AA mutations) and ID3 (characterized by 1bp deletion of C) mutations showed significantly higher number of SBS4-like mutations (Wilcoxon test, P = 6.9 × 10−9 and 0.038, respectively; Figure 2C and 2D).
SBS4-like mutations showed transcriptional strand bias
We found transcriptional strand bias featured by more mutated G than C bases on the untranscribed strands of genes for C > A and T > A mutations (binomial test, P = 1.2 × 10−25 and P = 0.005, respectively), a characteristic coherent with what has been observed in the COSMIC SBS4 mutations (Figure 2E).
SBS4-like signature anti-correlated with APOBEC mutagenesis
We looked at the Spearman’s correlation coefficients among APOBEC mutagenesis fraction scores calculated in our previous work  and that of the SBS4-like signature which was calculated as the sample-wise proportion of NMF-deconvolution derived SBS4-like signature mutation counts against overall mutation load. The APOBEC fraction scores were as expected highly correlated with each other, and both showed significant anti-correlation with the SBS4-like signature fraction (SBS2 vs SBS4-like, Spearman’s rho = -0.58, P = 2.9 × 10−37; SBS13 vs SBS4-like, rho = -0.50, P = 4.8 × 10−37; APOBEC vs SBS4-like, Spearman’s rho rho = -0.60, P = 1.9 × 10−41; Figure 3A), suggesting potential competing relationship between these two types of mutagenesis. Coherently, the SBS4-like high tumors (dichotomized at median of SBS4-like mutation count, 10) showed significantly lower APOBEC mutation fraction and load both, compared with SBS-like low tumors (Wilcoxon test, P = 4.2 × 10−13 and 4.8 × 10−6, respectively; Figure 3B and 3C).
SBS4-like signature associated with BCa molecular subtype
We further asked whether there was a link between SBS4-like mutagenesis and TCGA BLCA molecular subtypes. The MIBC consensus classes by Kamoun and colleagues was considered . We found that the LumU tumors showed a strong enrichment of SBS4-like mutations (Figure 4A; Kruskal-Wallis test, P = 5.6 × 10−5; Wilcoxon test, P = 3.1 × 10−6). With categorical analysis, the SBS4-like high tumors showed significantly higher proportion of LumU subtype tumors and lower proportion of stroma-rich and LumP subtype tumors, compared to SBS4-like low tumors (Figure 4B; Chi-squared test, P = 3.7 × 10−4). This enrichment was independent of overall tumor mutation load exclusively for the SBS4-like mutations, thus ruling out the possibility that the observation was merely a side-effect of higher global genomic instability in LumU tumors (Figure 4C).
SBS4-like mutation load correlated with tumor grade and prognosis
We then asked whether SBS4-like mutation load was correlated with patient’s clinicopathologic factors and prognosis. The SBS4-like high tumors showed significantly higher proportion of high-grade tumors, and the two groups had similar tumor stage, and patient’s age and gender profiles (Figure 4D and Figure S5). The SBS4-like high group tumors had significantly worse disease-free survival (DFS) and overall survival (OS) compared with SBS4-like low group tumors (Figure 4E; median DFS, 24.8 months in SBS4-like high and 54.8 months in SBS4-like low tumors, log-rank test, P = 0.02; median OS, 24.3 months in SBS4-like high and 46.8 months SBS4-like low tumors, log-rank test, P = 0.03). SBS4-like mutation load showed independent prognostic significance in a multivariate Cox model with age, gender, tumor stage, tumor grade, and consensus classification subtypes as adjustment covariates (SBS4-like high vs low, hazard ratio = 1.42, 95% confidence interval = [1.04, 1.94]; P = 0.03).
SBS4-like mutation load associated with enhanced cell cycle, suppressed IFN response and immune infiltration, and squamous differentiation
We performed transcriptome differential analysis comparing SBS4-like high and low tumors. The genes were pre-ranked in decreasing order by log2FC for GSEA, considering the Broad Institute Hall Mark 50 gene sets. Among the gene sets with statistically significant enrichment (false discovery rate (FDR) < 0.05), the cell cycle related ontologies (eg., cell cycle progression: E2F targets, normalized enrichment score (NES) = 2.29, FDR = 0; cell cycle progression: G2/M checkpoint, NES = 1.90, FDR = 6.2 × 10−4) were top enriched for genes up-regulated in SBS4-like high tumors, and the immune/inflammation/interferon related ontologies (eg., interferon gamma response, NES = -2.47, FDR = 0; allograft rejection, NES = -2.47, FDR = 0; inflammation, NES = -2.12, FDR = 0) for those down-regulated (Figure 5A). We then looked at the difference in tumor microenvironment infiltration between SBS4-like high and low tumors, using transcriptome-derived cell population abundance estimated with the MCP-counter. Similar with findings of gene expression differential analysis, we found significantly lower abundance of T cells, myeloid dendritic cells, neutrophils, and endothelial cells in SBS4-like high tumors than SBS4-like low tumors, as well as a trend for CD8 T cells, cytotoxic lymphocytes, NK cells, B lineage cells and monocytic lineage cells (Figure 5B).
We also noted among the top genes up-regulated in SBS4-like high tumors an enrichment of skin / epidermis development / keratinization related genes, including squamous cytokeratin (KRT1, KRT6B, etc) and keratinization/cornification related proteins (squamous Kallikrein-related peptidase, small proline rich protein family, late cornified envelope family, desmoglein family, etc; Figure 5C, Figure S6, and Supplementary Table S4). Given expression of these genes was basically found in Ba/Sq subtype tumors but no difference in the proportion of Ba/Sq tumors was observed between the SBS4-like high and low groups, we further performed transcriptomic differential analysis within the Ba/Sq tumors. Consistent with what was observed in global analysis, the Gene Ontology (GO) terms enriched for the top 200 genes up-regulated in SBS4-like high Ba/Sq tumors were basically epidermis/keratinization related (eg., GO:0031424, keratinization, FDR = 0; GO:0009913, epidermal cell differentiation, FDR = 0; Figure 5D). This was further validated by pre-ranked GSEA analysis with single cell RNA-seq derived skin supra-basal/basal keratinocyte markers (supra-basal keratinocyte markers, NES = 2.22, FDR = 0; basal keratinocyte markers, NES = 1.82, FDR = 0; Figure 5E and 5F). On the other hand, the SBS4-like high Ba/Sq tumors showed loss of urothelial differentiation in GSEA analysis with both single cell RNA-seq derived urothelial cell markers and markers of luminal epithelial cells found in a Ba/Sq tumor (urothelial cell markers, NES = -1.45, FDR = 0; markers of luminal epithelial cells found in Ba/Sq tumor GSM4307111, NES = -2.12, FDR = 0; Figure 5G and 5H).
In contrast, CpG methylome wide comparison revealed only a few differentially methylated CpG sites (Limma Padj < 0.05) between these two groups with no difference in global CpG methylation status (Figure 5I and Table S3).
SBS4-like mutations enriched in tumors with AHR amplification and high CYP1A enzyme expression
Given the key role of AHR-mediated xenobiotic metabolism in the activation of tobacco-smoking carcinogens, we investigated the association between AHR copy number alterations and SBS4-like signature. We here did not consider the AHR mutations as we demonstrated the AHR Q383H hotspot was induced by APOBEC thus less likely to be associated with SBS4-like given the anti-correlation between the two, and the other AHR mutations were basically non-recurrent of minor significance. We found that tumors with AHR genomic amplifications showed significantly higher SBS4-like mutation load (Wilcoxon test, P = 8.1 × 10−4; Chi-squared test, P = 5.8 × 10−4; Figure 6A left and right), which was not observed for all other SBS signatures (Figure S7). Our genome-wide analysis of somatic CNV further confirmed this as a genuine biological effect as the AHR gene was found within the 7p21.1 locus with top differential amplification rate between the tumors with high and low SBS4-load (proportion of 7p21.1 amplified = 11.7% and 1.96%, number of gene-level amplification event = 287 and 35, respectively in SBS4-high and low tumors; odds ratio = 6.62, Fisher’s test, P = 8.6 × 10−5; Figure S8). The CYP1A1 and CYP1A2 enzymes have been established as major canonical down-stream effectors of AHR-mediated carcinogen metabolism [39, 40], which we further confirmed by transcriptomic DEGs analysis in RT4 BCa cells pre- and post- treatment of 5uM tobacco carcinogen BaP for 24 hours as CYP1A1 and CYP1A2 were the top 2 up-regulated canonical xenobiotic CYP enzymes induced by BaP (log2FC = 10.8 and 4.5, Padj. = 1.5 × 10−30 and 1.9 × 10−17, respectively; Figure S9). We then tested the association between SBS4-like mutation load and the collective gene expression level of the two enzymes measured by gene set variation analysis (GSVA), and we found that tumors with high CYP1A1/A2 expression showed significantly higher SBS4-like mutation load (Figure S10), further suggesting a role of AHR in tobacco-smoking induced mutagenesis in BCa.
BaP-induced cytotoxicity and DNA-damage dependent on AHR-mediated xenobiotic metabolism in BCa cells
BaP treatment showed concentration and time-dependent inhibition on cell viability of luminal RT4 cells with high AHR expression (Figure S11), which was antagonized by pre-treatment of 5uM AHR inhibitor CH-223191 in a concentration dependent manner (Figure 6B, left). In contrast, the non-luminal T24 cells with low AHR expression (Figure S11) showed neutral effect in cell viability in response to the treatments (Figure 6B, right).
In RT4 cells, pre-treatment of 5uM AHR inhibitor CH-223191 largely attenuated the effect of 5uM BaP treatment for 24h on the induction of CYP1A1/A2 enzyme activity as measured by EROD assays, as well as the production of BPDE, a genotoxic metabolite of BaP, and DNA damage biomarkers including 8OHdG and γH2AX (Figure 6C). In contrast, all these effects were absent in T24, which showed no response to BaP treatment (Figure 6C).
No effect of short-term BaP exposure on gene expression of APOBEC3A
Our previous study showed that the APOBEC-associated hotspot mutations were basically of YTCN (Y = C or T) motif and tumors presenting these hotspots had a higher APOBEC3A expression-level compared with other tumors, suggesting a major role of APOBEC3A in APOBEC-mediated mutagenesis process in bladder cancer . We in this study found an anti-correlation between APOBEC and tobacco-smoking induced mutation load, thus then tested the effect of short-term BaP exposure on APOBEC3A expression level in RT4 and T24 cell lines. We showed APOBEC3A was not among the significant DEGs between RT4 cells pre- and post-treatment of 5uM BaP for 24h in the bulk RNA-seq analysis (log2FC = 0.22, Padj = 0.83; Figure S9). This was further confirmed by RT-PCR analysis (Figure 6D).