Stem cell signature collection
The 26 human stem cell gene sets used in this study were obtained from StemChecker (http://stemchecker.sysbiolab.eu/) [22] Expression Checks (18), RNAi Screens (1), Literal Curation (2), Computationally Derived (2), and TF Target Genes (3).
Data processing
The datasets used to identify the BLCA stem cell subtypes were from three different platforms: The Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO), and ArrayExpress databases. TCGA's RNA-seq data (fragments per kilobase of transcript per million mapped reads, FPKM) of 19 normal samples and 414 cancer samples, variant data of VarScan and clinical information were downloaded from TCGA Knowledge Base (https://portal.gdc.cancer.gov/repository). Gene annotation was performed using the Ensemble database. The ArrayExpress database contains RNA-seq and clinical data (n = 476) for 476 cases of early urothelial carcinoma (E-MTAB-4321) FPKM from the European Genome-phenome Archive. The expression matrices of four GEO datasets, GSE13507 (n = 165), GSE32548 (n = 131), GSE31684 (n = 93), and GSE32894 (n = 308), were all quantile-normalized, and the genes were annotated in their respective platform files Illumina human -6 v2.0 expression beadchip, Illumina HumanHT-12 V3.0 expression beadchip, [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array, and Illumina HumanHT-12 V3.0 expression beadchip.
Identification of BLCA subtypes based on stem cell gene sets
For each BLCA dataset, we used the GSVA package to perform single-sample gene set enrichment analysis (ssGSEA) to quantify the enrichment level of each BLCA sample in the 26 stem cell gene sets. The ConsensusClusterPlus package was used for consensus clustering and stem cell subtype screening of the ssGSEA scores. Briefly, k-means clustering was performed using 50 iterations (each using 80% of samples). The best cluster number was determined by the clustering score for the cumulative distribution function (CDF) curve, and the relative changes in the area under the CDF curve were evaluated.
Survival analysis
The Kaplan–Meier curve was used to describe differences in survival of patients with BLCA in different datasets for classifying stem cell subtypes. We compared the survival prognosis of patients with BLCA (overall survival (OS), relapse-free survival (RFS), and progression-free survival (PFS)). The log-rank test used P < 0.05 as the threshold to detect significant differences in survival time.
Immune checkpoint inhibitor treatment response prediction
Tumor immune dysfunction and exclusion is a calculation method for simulating tumor immune escape primarily by examining how the expression of each gene in the tumor interacts with the level of cytotoxic T cell (CTL) infiltration to affect patient survival [23]. We used TCGA's FPKM RNA_seq expression profile combination subclass mapping method to predict the clinical response of BLCA stem cell subtypes to immune checkpoint blockade [24].
Chemical response prediction
We used TCGA’s FPKM RNA seq expression profile to predict the chemotherapy response of each sample based on the largest publicly available pharmacogenomics database (Genomics of Drug Sensitivity in Cancer (GDSC), https: //www.cancerrxgene.org/); six commonly used chemotherapeutic agents were selected, namely, cisplatin, doxorubicin, gemcitabine, sunitinib, methotrexate and vinblastine. The prediction process was conducted using the R package “pRRophetic” where the half-maximum inhibitory concentration IC50 of the sample was estimated using ridge regression, and the accuracy of the prediction was evaluated using 10-fold cross-validation, according to the GDSC training set. All parameters were set to the default values, and the repeated gene expression was averaged.
Pathway enrichment analysis
We compared the biological changes in every two subtypes in TCGA dataset, and used h.all.v7.1.symbols.gmt as the reference gene set for gene set enrichment analysis (GSEA). Analysis was performed using 1,000 permutations, a <0.05 false discovery rate (FDR) as the screening threshold, and GSEA version 4.0.1.
Evaluation of immune cell infiltration level, tumor purity, and stromal content in BLCA
ESTIMATE was used to evaluate the level of immune cell infiltration, tumor purity, and stromal content in the BLCA stem cell typing [25].
Comparison of immune cell fraction between BLCA stem cell subtypes
CIBERSORT is an algorithm that deconvolves the expression matrix of 22 human immune cell subgroups and can be used to estimate the proportion of immune cells [26]. We set the permutations to 1,000 and used P < 0.05 as the screening threshold. The Kruskal–Wallis test was used to compare the differences in immune cell components of each BLCA stem cell subtype.
Gene co-expression network analysis
To identify key genes or gene networks that characterize various stem cell subtypes in BLCA, we performed weighted correlation network analysis (WGCNA) [27] to detect gene modules associated with stem cell subtypes. The gene matrix is composed of 4,876 differential genes in BLCA control normal tissues (difference is generated by limma package in R, |log2 fold-change| > 1, P < 0.05). WGCNA network construction and module detection used the unsigned topological overlap matrix; the best soft threshold (power) was set to 3, the minimum number of genes in the module was 50, and the branch merge interception height was 0.25. The hub gene was defined as that which has a Pearson correlation (due to the generally low value of the connection weight, the Pearson correlation was used) of greater than 0.30, with connections to at least 10 genes. The gene co-expression network was visualized using Cytoscape 3.7.1 software. Wilcoxon tests were used to examine the expression differences of hub genes between stem cell subtypes. The results of survival analysis were divided into high and low groups based on median expression of transcription factor using GEPIA (http://gepia.cancer-pku.cn/) database [28]. Log rank test was used for survival distribution. Top 20 enrichment pathways obtained using Metascape (http://metascape.org/gp/index.html#/main/step1).
Statistical analysis
Comparison of the stemness index of BLCA stem cell subtypes was performed using the Kruskal–Wallis test. Unpaired t test was used to compare cancer and normal tissue stemness index was performed using the, while comparison of the BLCA molecular subtype and stemness index of T stage was performed using analysis of variance (ANOVA) in GraphPad Prism 8.0. CD274 expression differences between stem cell subtypes was evaluated using ANOVA in R. All tests were two-tailed, and P < 0.05 was considered as statistically significant.
Results
BLCA subtypes identified based on stem cell gene sets
We collected 26 stem cell gene sets representing unique self-regenerating properties (Supplementary Table 1) and quantified the scores of 26 stem cell gene sets in each sample using ssGSEA. We used the ConsensusClusterPlus package to divide all tumor samples into k (k = 2–9) different subtypes. The CDF curve based on the consensus scores achieves the best division when k = 3. Additionally, the principal component analysis results indicated that the ssGSEA scores, based on the 26 stem cell gene sets, were divided into 3 subtypes (Figure 2A–D), which were defined as SCE_H, SCE_M, and SCE_L (Figure 2E). Similarly, we performed the same clustering and subtyping for the remaining datasets E-MATB-4321, GSE13507, GSE31684, GSE32548, and GSE32894 (Supplementary Figure 1).
When using ESTIMATE to evaluate the level of immune infiltration in all datasets, we found that the SCE_H immune score in all 6 datasets was much higher than that for other subtypes, and SCE_L showed the lowest immune score (Supplementary Figure 2). Comparison of the stromal content showed the same trend (SCE_H > SCE_M > SCE_L). However, comparison of tumor purity of the BLCA stem cell subtypes showed opposite results. SCE_H and SCE_L showed the lowest and highest tumor purity (SCE_L > SCE_M > SCE_H), respectively. This is consistent with the results observed for most HLA genes and immune cell marker genes evaluated, such as CD8A (CD8 T cells), GZMA (cytotoxic cells), IFNG (Th1 cells) PMCH (Th2 cells), CD68 (macrophages), and IL17A (Th17 cells), among others, which were significantly upregulated and downregulated in SCE_H and SCE_L, respectively (Supplementary Figure 3).
Due to the close association between BLCA stem cell subtypes and immunity, we focused on the differential expression of CD274 (PD-L1) in each subtype (Figure 3). In all six datasets, SCE_H and SCE_L showed the highest and lowest expression levels, respectively, indicating that the BLCA subtype SCE_H was more sensitive to anti-PD-1 immunotherapy than the remaining subtypes. Subsequent immune checkpoints inhibitor treatment response prediction and survival analysis have both confirmed these results.
Survival of patients with different BLCA stem cell subtypes
Since BLCA is a heterogeneous disease with a high recurrence rate, exploring the association between subtype classification and clinical prognosis is beneficial for prognosis assessment and clinical management of BLCA. We performed OS, RFS, and PFS analysis on the six datasets. Unexpectedly, all datasets showed consistent trends (Figure 4). SCE_H and SCE_L showed the worst and best survival in prognostic analysis (SCE_L > SCE_M > SCE_H), respectively. The P values of the log rank for the OS of TCGA, GSE31684, GSE32548, and GSE32849 were 5.631e-4, 0.038, 2.158e-4, and 8.755e-6, respectively. The P-values for the log-rank of PFS for TCGA, E-MTAB-4321, and GSE13507 were 0.004, 3.976e-9, and 7.046e-4, respectively, and the log-rank P value of TCGA RFS was 0.032.
Prediction of therapeutic response of BLCA stem cell subtypes to immune checkpoint inhibitors
Based on the above results, we further evaluated the responses of the three subtypes to immunotherapy. At present, 5 PD-1/PD-L1 immunotherapy drugs have been approved by the Food and Drug Administration for treating BLCA. This includes nivolumab and pembrolizumab (both PD-1 inhibitors) approved in 2016 and 2017 for treating patients with locally advanced or metastatic urothelial cancer who were administered first-line platinum-containing chemotherapy for one year [29-34]. We used the tumor immune dysfunction and exclusion algorithm to predict the likelihood of a response to immunotherapy. The results showed significant differences in the responses to immunotherapy among the SCE_H (20%, 32/158), SCE_M (42%, 71/168), and SCE_L groups (61%, 54/88) (P = 2.951e-10). We further performed subclass mapping to compare the expression profiles of the three stem cell subtypes which were defined using another published dataset containing 47 patients with melanoma who responded to immunotherapy [35]. In pairwise comparison of the three subtypes, more promising results were observed in SCE_H for the anti-PD1 and anti-CTLA4 treatments compared to the other subtypes (Figure 5A–C) (anti-PD1 therapy: SCE_H vs SCE_L, FDR = 0.036; SCE_H vs SCE_M, p = 0.046; SCE_M vs SCE_L, FDR = 0.048; anti-CTLA4 therapy: SCE_H vs SCE_L, FDR = 0.036; SCE_H vs SCE_M, FDR = 0.008). We further correlated the BLCA stem cell typing results with the published molecular typing and immunotyping results in TCGA cohort. SCE_H primarily corresponded to the molecular subtypes luminal-infiltrated and basal squamous and C1 and C2 for immune subtypes; SCE_L primarily corresponded to luminal-papillary and C1–C4, and SCE_M showed a wide distribution of molecular and immune subtypes (Figure 5D).
Differences in sensitivity of stem cell subtypes to chemotherapy
Since chemotherapy is a common treatment strategy for patients with BLCA, we selected six chemotherapeutic agents (cisplatin, doxorubicin, gemcitabine, sunitinib, methotrexate and vinblastine) and evaluated the response of the three subtypes. We designed the prediction model on the GDSC cell line dataset using ridge regression and evaluated the satisfactory prediction accuracy using 10-fold cross-validation. We estimated the IC50 of each sample in the TCGA dataset based on the prediction models of these six chemotherapeutic agents. For cisplatin, sunitinib and vinblastine, SCE_L was the least sensitive while SCE_H was the most sensitive compared to the other subtypes. For doxorubicin, SCE_M was the most sensitive, while for gemcitabine and Methotrexate, SCE_M was the most sensitive and SCE_L was the least sensitive relative to the other subtypes (Figure 6).
Differences among 22 human immune cell subgroups of BLCA stem cell subtypes in CIBERSORT
To explain the difference in survival of patients with different BLCA stem cell subtypes, we used the CIBERSORT algorithm to calculate the proportions of 22 immune cells in each subtype of the six datasets, with P < 0.05 as the threshold for screening. The results showed that the proportions of macrophages M0, M1, and M2 had an upward trend in the SCE_H subtype (except for GSE13507) and the proportion of regulatory T cells (Tregs) was significantly (P < 0.05) increased in the SCE_H subtype (Supplementary Figure 4A–F). We also used the ssGSEA scores of immune cells in each cohort as continuous variables and performed univariate Cox analysis (Supplementary Table 2). Further, we divided the median value of the corresponding data’s ssGSEA scores into groups with high and low scores. A high score indicated that patients with a high macrophage M0 content had a worse prognosis. This is consistent with the poor clinical prognosis of patients with the SCE_H subtype compared to that of patients with the other subtypes. The trend for macrophages M2 was similar to that of macrophages M0, whereas macrophages M2 and Tregs showed opposite trends (Supplementary Figure 4G–L). This indicates that compared to Tregs, macrophages M0 and M2 have completely opposite regulatory mechanisms during BLCA prognosis.
Correlation of CD274 with stemness genes and risk observation of stem cell subtype populations
We identified an important immune role for CD274 in the stem cell subtypes, and thus we further explored the correlation between CD274 and the identified stemness genes. Figure 7A–G shows scatter plots of the expression of CD274 and stemness genes CD44, GATA3, HIF1A, ID1, MYC, SOX9, and CXCL8 in BLCA TCGA cohort. Among them, CD274 was negatively correlated with ID1 and GATA3 and positively correlated with CD44, HIF1A, MYC, SOX9, and CXCL8 (Figure 7H). We divided the patients according to the optimal expression cut-off of CD274 and each stemness gene into risk groups I, II, III, and IV (for example, CD274 and CD44 corresponded to CD274lowCD44low, CD274highCD44low, CD274low CD44high, and CD274high CD44high). According to the scatter plot, for each pair of risk groups divided by CD274 and the optimal threshold of stemness gene expression, patients with stem cell subtypes primarily belonged to groups I and III (87–88%), while SCE_L subtype was primarily concentrated in risk group III of the CD274 and GATA3 and ID1 pairs, and in the CD274 and CD44, finally, the HIF1A, MYC, SOX, and CXCL8 gene pairs were concentrated in risk group I. Further survival analysis of groups I and III of each gene pair showed that patients with the higher SCE_L subtype had better survival than those with a lower SCE_L subtype (Figure 7I–O). This is consistent with the observation that patients in the SCE_L group had the longest survival duration.
GSEA for BLCA stem cell subtypes
To explore the biological changes caused by differences in the enrichment of stem cells, we conducted pairwise comparison of the GSEA results for each subtype.
By selecting at least one pathway with an FDR <0.05, we found that as the enrichment of stem cells increased, epithelial mesenchymal transition (EMT) became more significant. EMT is considered as a signal of malignant transformation in all cancers, giving cells the ability to metastasize and invade, by imparting stem cell characteristics, reducing apoptosis and aging, and resisting chemical and immunotherapy [36, 37] (Supplementary Table 3). EMT can also activate multiple pathways, regulate cell metabolism, angiogenesis, proliferation, and migration, and enable cells to respond hypoxic environments. Pathways are also significantly enriched, for example, during hypoxia, angiogenesis, inflammatory response, IL6-mediated JAK-STAT signaling pathway, and KRAS signal upregulation (Figure 8). These pathways together constitute a vicious circle of cancer occurrence, proliferation, invasion, and metastasis.
Somatic mutation landscape of BLCA stem cell subtypes with identified pathways
The tumor suppressor genes, TP53 and RB1, play an important role in regulating cell division [38]. Inactivation, mutations, and deletions of TP53 and RB1 are one of the primary causes of BLCA [39]. The mutation frequency of TP53 and RB1 in SCE_L (31% and 4%) was much lower than that in SCE_M (56% and 25%) and SCE_H (47% and 18%). However, the mutation frequency of STAG2, one of the most commonly mutated genes in BLCA [40], in SCE_L (24%) was significantly higher than that of SCE_M (11%) and SCE_H (9%). In the identified EMT pathways, the mutation frequency of COL6A3, LRP1 and FBN2 in SCE_L (12%, 12% and 13%) was significantly higher than that of SCE_M (3%, 6% and 5%) and SCE_H (7%, 4% and 4%). In the hypoxia pathway, no MYH9 mutation was observed in SCE_L (0%), while the mutation frequencies in SCE_M and SCE_H were 5% and 8%, respectively. In addition, the mutation frequencies of CDKN1A in SCE_L, SCE_M and SCE_H were 13%, 11% and 6%, respectively and in the KRAS signaling pathway, the mutation frequencies of RELN in SCE_L, SCE_M and SCE_H were 12%, 6% and 5%, respectively (Figure 9). High-frequency gene mutations during angiogenesis, inflammation, and in the IL6 / JAK / STAT signaling pathway were not obtained.
Key gene networks identified in BLCA stem cell subtypes
WGCNA is used to describe correlation patterns between genes. Using microarray gene expression data, or RNA-seq gene expression data, WGCNA can be used to identify highly correlated gene sets (module), which are randomly assigned different colors. The colors are only used to distinguish different modules with no practical meaning or associated value. WGCNA promotes a network-based genetic screening method that can be used to identify candidate biomarkers or therapeutic targets. A total of 14 gene modules were generated based on WGCNA. Among them, brown and black modules showed the strongest association with stem cell subtypes. The brown module was positively correlated with SCE_L and negatively correlated with SCE_H and SCE_M, while the black module was positively correlated with SCE_H and negatively correlated with SCE_M and SCE_L (Figure 10A). We intersected the genes in the black module most correlated with SCE_H and the genes in the brown module most correlated with SCE_L with human transcription factors identified using Cistrome (http://cistrome.org/). Two pairs of transcription factor regulators were identified in the brown and black modules (Figure 10B–C), IRF5 and GATA3 and GRHL2 and GATA6. IRF5 is a key transcription factor regulating the differentiation of M1 macrophages into M2, enabling its anti-inflammatory role; these cells also influence tissue repair and reconstruction as well as cancer occurrence [41-43]. GATA3 is a type 2 helper T cell (Th2) cytokine-specific transcription factor and a key stemness gene that regulates cell differentiation. It enables Th2 to express IL-4 and other cytokines, promotes antibody production, mediates humoral immunity, and suppresses anti-tumor immunity [21, 44, 45]. Therefore, GATA3 may act as a tumor suppressor gene in BLCA. GRHL2 and GATA6 play various regulatory roles during embryonic development, damage repair, epidermal barrier formation, tracheal epithelial formation, and neural tube development. They also play an important role in the occurrence and development of tumors, cell proliferation, invasion, and metastasis, which were verified by the relevant pathways identified in the black module [46-52] (Figure 10D–E). Thus, we identified GATA3 and GATA6 as important transcription factors with opposite expression and effects on tumor prognosis in patients with different BLCA stem cell subtypes (Figure 10F–I).