Clinical samples and study design
Frozen samples obtained from the Children’s Healthcare of Atlanta Pediatric Biorepository were viably thawed and processed using 10X genomics platform for single cell profiling. The cohort of samples was made up of genetically and morphologically diverse cases of AML (Table S1A). A total of 23 bone marrow (BM) samples were processed from 15 patients (nine with CCR and six with relapse). Of the 23 BM samples, 14 were collected at Dx, five were collected at EOI, and four were collected at relapse (Figure S1, Table S1B).
Single cell analysis identified heterogeneous patient specific AML blasts
ScRNA-seq analysis of four paired Dx and EOI samples (three from patients with relapse and one from a patient with CCR) was carried out. After preprocessing, alignment, and quality control steps including removal of low-quality reads and cells, the remaining 19,350 cells were analyzed using unsupervised and supervised approaches. The unsupervised analysis on preprocessed and normalized scRNA-seq data using Uniform Manifold Approximation and Projection (UMAP) approach (4) identified 14 single cell clusters (Fig. 1A). Ten of these clusters were manually annotated based on the expression of well-established gene markers as differentiated immune and stromal cells (Fig. 1B). The remaining clusters enriched with undifferentiated cells were putatively considered as blast cell clusters and annotated based on top cluster specific overexpressed genes (Fig. 1B, Figure S2). The differentiated immune cells included T cells (CD3D+, IL32+), B cells (CD19+, CD79A+, MS4A1+), monocytes (CD14+; mono), macrophages (CD68+; macro), monocytes/macrophages (CD14+, CD68+; Mono/mac), plasmacytoid dendritic cells (GZMB+, IL3RA+, IRF8+; pDC), and neutrophils (CD63+) (Fig. 1B). The four undifferentiated blast cells exhibited overexpression of canonical blasts associated marker genes (MPO+, CD33+, CD34+) and/or multiple lineage genes (CD3+, CD19+, ELANE+, etc.) (Fig. 1B, Figure S2). Three of the four blast cell clusters, i.e., MPO+, AZU+, and CD38+ blasts, were sample specific as they were enriched with cells from individual patients (Fig. 1C) indicating significant heterogeneity among the blast cells. Additionally, the split UMAP analysis showed that these three canonical blast cell clusters were over-represented in the Dx samples compared to EOI samples (Fig. 1D). Also, these three Dx enriched blast cell clusters depicted high expression of well-established AML blast cells associated genes, namely, MPO and CD34 (Fig. 1E). In summary, scRNA-Seq data distinguished heterogeneous AML blast cells as well as immune cells in the Dx TME samples.
Identification of AML blasts progenitor signature
The supervised comparison of Dx enriched AML blast cell clusters and EOI enriched non-blast cell clusters (Fig 1D), based on the Wilcoxon rank test, identified a set of 232 significantly differentially expressed genes (DEGs) (adjusted P-value<0.01 and log FC > 1.2). To further refine the gene list, we performed external validation analysis using the TARGET AML bulk transcriptome data (3) that contains Dx BM samples with different proportions of blast cells and EOI BM samples. The TARGET AML Dx samples were partitioned into 3 bins based on the extent of disease burden (>60%, 30-60%, <30% blast cells) to determine the expression patterns of the identified 232 blast-related genes. 44 genes were significantly overexpressed in high blasts samples (>60%) compared to EOI samples as well as high blasts samples compared to low blast (<30%) samples (FC >1.75 and P-value <0.05) (Figure S3, Table S2). We further evaluated the expression patterns of these 44 genes in the blast cell clusters in scRNA-Seq data (Fig 1F, Table S3). Of the 44 genes, 20 genes depicted pattern of specific overexpression in the blast cell clusters suggesting the existence of a common progenitor AML blast cell signature cross heterogeneous blast cells (Table S4). A heatmap depicting the expression pattern for these 20 blast specific genes in TARGET AML data is shown in Fig. 2A. The blast cells overexpressed genes include genes related to anti-apoptosis (PRAME, MSLN, CITED4), growth factor for progenitor hematopoietic cells (CLEC11A), cell proliferation (CAPRIN1), and PPARα-induced proliferation and tumor growth (FABP5) (Fig. 2A). Feature-plot of two of the identified blast associated genes (NREP and CLEC11) depict uniform expression in the heterogenous blast clusters (Fig. 2B) that is equivalent to or better than some of the genes used clinically for identifying AML blasts (CD34, MPO, CD33, and CD56/NCAM1) (Fig. 1E, Figure S4). The blast associated genes expression exhibited correlation with the blasts percentages in the AML BM samples (Fig. 2C). Also, majority of the blast associated genes exhibited significant association with overall survival (OS) in the TARGET AML data set (Table S5, Fig. 2D) indicating their possible involvement in AML progression.
To verify blast cells characterization using our 20-gene signature, we also looked at blast cell clusters in longitudinal samples (Dx, EOI, Rel) from two patients with negative MRD at EOI. The analysis revealed significantly fewer blast cells in the EOI samples as compared to Dx and Rel samples; the split UMAP (Fig. 2E) and bar plot (Fig. 2F) shows the two immature blast cell clusters (MPO+ and AZU1+) that predominate in Dx and Rel samples are minimally represented in EOI samples. To generate a more concise AML blast specific signature and evaluate performance in discriminating AML blast cells from other cells, we implemented supervised machine learning approach, support vector machine (SVM), and calculated its performance using cross-validation. SVM is a powerful supervised machine learning technique for identifying multigene biomarker panels from complex datasets (5, 6). The analysis identified a set of 7 genes (CLEC11A, PRAME, AZU1, NREP, ARMH1, C1QBP, TRH) that discriminated AML associated blast cells from non-AML cells with an AUC of 0.968 (Fig. 2G, Table S6). Several of these genes, including CLEC11A (7), PRAME (8), and AZU1 (9) have been previously associated with AML. The analysis also identified a unique gene, ARMH1/C1orf228, that codes for a yet uncharacterized protein and has not been previously reported to be associated with pediatric or adult AML. NREP is also known to affect cell development (10), while PRAME may alter metabolic and immune pathways (11). This unique set of genes overexpressed in AML blasts are therefore associated with a myriad of cell functions that promote tumor growth and spread. The module score based on 7-gene signatures effectively distinguished AML blast cells from non-AML cells as only MPO+ and AZU+ blasts clusters had positive module scores (Figure S5). The clinical significance of this 7-gene signature was tested by evaluating correlation with survival in the TARGET AML data. Patients having high expression of the selected 7 genes had a significant correlation with poor survival (HR=2.3 and Log Rank P-value=.007) compared to patients with low expression of these genes (Fig. 2H, Table S7A).
Comparative analysis of Dx relapse- and CCR-associated samples
We performed a comparative analysis of 14 bone marrow samples collected at Dx, of which 6 later relapsed (1D - 6D) and 8 remained in CCR (7D - 14D) (Figure S1, Table S1). After quality control, processing, and normalization, 28,181 cells clustered into 15 distinct clusters (Figure S6A). Based on the 7-gene blast signature module score, 14,166 cells were classified as AML blast cells and the remaining 14,015 cells were classified as non-AML cell clusters (Figure S6B). The UMAP projection depicted that CCR-associated clusters 1,3,4,6,8,10,14 and relapse-associated clusters 0,2,5,7,9,11,12,13 clustered separately, suggesting differential transcriptome makeup of the 2 sets of blast cells (Fig. 3A). Also, we observed patient specific clusters indicative of inter-patient variation in the blast cells profiles (Fig. 3B). Differential expression analysis of select relapse-associated samples enriched clusters (0,2,7,13) and CCR-associated samples enriched clusters (1,3,4) revealed significant transcriptome differences (Fig. 3C). The CCR-associated clusters exhibited high expression of genes including MPO, IFITM3, TRH, PRTN3, HLA-DPM1, and NPM1 while relapse-associated clusters expressed high levels of genes including CRIP1, FLNA, and RFLNB/FAM101B (Fig. 3C). Survival analysis using the SurvivalGeneie tool TARGET AML dataset indicated that genes like FLNA (cytoskeleton component, signaling scaffold and Pol1 transcription regulator) and RFLNB/FAM101B (formation of cartilaginous skeletal elements) overexpressed in relapse-associated cell clusters were significantly associated with poor OS (Fig. 3D, supplementary table 5C). On the other hand, genes overexpressed in the CCR-associated clusters such as MPO (microbicidal activity of neutrophils) and TRH (controls the secretion of thyroid-stimulating hormone) were significantly associated with better OS (Log Rank P value=0.001, 0.000396 for MPO, TRH respectively, Table S7B, Fig. 3D). Additionally, higher expression of RFLNB/FAM101B genes showed a significant correlation with shorter EFS while lower expression was associated with intermediate EFS (Figure S7A). The interactive string network analysis based on known and predicted protein-protein interactions (13) identified FLNA as a key interactant of RFLNB/FAM101B. Another identified interactant of RFLNB/FAM101B was WDFY4, which plays a critical role in the regulation of classical dendritic cells (cDC1) mediated cross-presentation of viral and tumor antigens (14) (Figure S7B). The Kaplan-Meier plots showed a significant correlation of high expression of combined RFLNB/FAM101B and WDFY4 genes with poor OS (Log Rank P-value=0.017) and shorter EFS (Log Rank P-value=0.00094) (Figure S7C, D; Table S7A). Furthermore, pathways analysis on relapse-associated blast cells differentially expressed genes depicted significant activation (P-value <0.01) of multiple pathways including “RhoGDI signaling”, “eNOS signaling”, “Androgen Signaling”, and “Protein Kinase A signaling” (Fig. 3E). These relapse-associated blast cells also depicted significant inhibition of pathways related to JAK/STAT signaling, HIF1-α, Interferon, and neuroinflammation signaling (Fig. 3E). Further upstream regulator-based systems biology analysis identified activation of a highly connected cohesive network of cell growth and proliferation related master regulators (e.g., MTA1, NKX2.3, TCF3) in relapse-associated blasts (Fig. 3F). The analysis also depicted significant inhibition of multiple immune and metabolism related key molecules including HIF1A, IRF7, STAT1 (Figure S7E) in relapse-associated samples enriched AML blast cells.
Dx relapse-associated non-AML clusters depicted enrichment of exhausted T cells and diminution of M1 macrophages
Next, we performed focused analysis on the TME non-AML cells in the 14 Dx samples (Fig. 4A). Clusters were annotated based on the expression of specific gene markers (Fig. 4B). CCR-associated clusters had a higher representation of monocyte/macrophages (CD14+, CD16+) while relapse-associated clusters depicted a higher proportion of T cells (CD3+) (Fig. 4C). These results suggest differences in the type of immune cells populating the TME at the time of Dx in those who go on to achieve CCR as compared to those who eventually suffered relapse.
To understand if a particular subtype of T cells is associated with AML relapse/CCR, we sub-clustered Dx T cells (cluster 0, Fig. 4A) using Monocle package (15) which revealed 11 distinct subclusters of T cells (Fig. 5A). Most of the patients showed separate subclusters for naïve and activated T cells (Fig. 5B, Figure S8A). While naïve T cell (CCR7+, LEF1+, TCF7+) sub-clusters 2, 4 and 7 are CD8+, sub-clusters 1 and 3 were CD4+ clusters (Figure S8A, B). Clusters 5, 6, 9, 10, and 11 were activated T cell clusters expressing CCL5, KLRB1, KLRD1, GZMH, CD69, and CD44 (Fig. 5B). The unequal distribution of CCR- and relapse- associated samples in different clusters in the T cell subset is suggestive of relapse- and CCR-associated transcriptome variations in T cell sub-populations (Fig. 5B). Comparative analysis between relapse- and CCR- associated samples revealed that the former had a significantly higher percentage of T cells (Fig. 5C). Also, T cell exhaustion estimation based on ssGSEA score(16, 17) revealed significantly higher exhaustion of T cells (P value<0.0001) in relapse-associated samples compared to CCR-associated samples (Fig. 5D). The relapse enriched clusters (sub-clusters 5 and 6) depicted over-expression of T cell exhaustion marker genes including HAVCR2, LAG3, PDCD1, NFATC1, TIGIT, and TOX (Figure S8B). The relapse-associated samples dominant clusters 1 and 11 depicted overexpression of CD69, a type II glycoprotein that is known to regulate inflammation and exhaustion of tissue resident T cells and promoting tumor growth/relapse (18). On the other hand, significantly more naïve T cells (CCR7, LEF1, TCF7) were observed in the CCR-associated T cell subsets (Fig 5D). Further GSEA (16, 17) analysis depicted higher proportion of cells with T regulatory signature in the patients with relapse as compared to patients with CCR, hinting toward immunosuppression in patients that eventually had a relapse (Fig 5E). Comparative analysis of relapse-associated sub-clusters 3,5, and 6 and CCR-associated sub-clusters 7,8, and 10 identified significant differentially expressed genes (DEGs) (Fig 5F). Relapse-associated sub-clusters showed enhanced expression of genes associated with effector T cells (CCL5, NKG7, GNLY, GZMA, GZMK, NFATC1), ubiquitin gene; UBC associated with cell differentiation and certain immune cell functions (19), inflammation associated and mitogen inducible gene, DUSP2 (20). On the other hand, CCR-associated clusters showed higher expression of naïve T cell marker genes (LEF1, TCF7, CCR7, SELL), microtubule associated gene; RPL41 (21), scaffold protein gene; ABLIM1 (Fig. 5F), The pathway analysis of relapse-associated upregulated DEGs revealed upregulation of multiple immune regulatory pathways including Th1 pathway, calcium induced T lymphocyte apoptosis, and to a lesser extent, upregulation of T cell exhaustion pathway with downregulation of PD-1 and PD-L1 cancer immunotherapy pathways (Fig. 5G). These results suggest that T cells at Dx exhibit relapse- and CCR-associated specific enrichment of T cell subtypes that may play a role in the outcome in AML.
Inflammatory monocytes/macrophages enriched in CCR-associated samples
Focused analysis of Dx non-AML cells identified three monocyte/macrophage cell clusters with differential enrichment of samples from those with CCR compared to those with relapse (Fig. 6A). Clusters 1 and 2 were enriched with CCR-associated Dx samples while cluster 7 was almost exclusively comprised of relapse-associated Dx samples (Fig. 6A, Figure S9A). Cells in cluster 7 overexpressed many of the 20-gene signature genes including PRAME, CITED4, CLEC11A, KCNE5, and MFSD10, unlike clusters 1 and 2, indicating the immaturity of these cells (Figure S9B). Cluster 7 also demonstrated expression of multiple immature marker genes including MEIS1, MSI1, PROM1, and EGFL7 (Figure S9C). Further DEGs analysis between CCR- and relapse-associated samples enriched monocytes/macrophages revealed upregulation of S100A inflammatory genes family and protease inhibitors, TIMP1 and CST3 in the CCR-associated samples enriched clusters (Fig. 6B). The analysis for canonical M1 and M2 macrophage markers revealed that CCR-associated clusters depicted overexpression of M1 macrophage markers (IL6, CD86, TNF, IL1B, FCGR2A, CSTA, S100A8, S100A9, S100A12, TYROBP, FCN1), while the relapse-associated cell cluster 7 mostly depicted overexpression of M2 macrophage markers (MRC1, ARG1, MMP9) (Fig. 6C). Further upstream regulator analysis of DEGs from the CCR-associated samples enriched clusters revealed activation of multiple M1 macrophage polarization related key regulators including FOS, TREM1, CD44, and IL1B along with activation of the inflammatory response associated NFkβ2 (Fig. 6D). The analysis also depicted the downregulation of key regulators related to M2 macrophage suppression including SAMHD1, HOXA9, and SATB1 (Fig 6D) CCR-associated samples enriched clusters. SAMHD1 has been shown to play a critical role in inhibiting innate immunity through inhibition of NF-kβ and interferon pathways (22). Further evaluation of gene expression levels for key regulators FOS, TREM1, and NFKBI depicted high expression in CCR-associated samples while CAT and SAMHD1 were higher in the relapse-associated samples (Fig. 6E). To summarize, Dx CCR-associated samples are enriched with inflammatory M1 macrophages, whereas relapse-associated samples depicted enrichment of immature monocyte population of cells exhibiting mixed M1/M2 lineage markers.
Characterization of treatment-resistant residual cell population at EOI
Sc-RNA-Seq analysis of EOI samples from patients with relapse (n=3) and CCR (n=2) generated 15,070 single cell profiles. The analysis identified 13 distinct cell clusters based on trancriptome profile (Fig. 7A) that were annotated based on enrichment of 20 blast overexpressed genes (Figure S10A) and expression of canonical cell specific marker genes (Fig. 7B). The module score-based classification using the 7-gene blast signature identified enrichment of blast cells in four clusters with a cut-off value of scaled ssGSEA score set at >0.25 (Fig. 7C). To identify residual blast cells, we performed an integrated analysis of single cell data obtained at the time of Dx and EOI. The analysis assisted in identifying blast cells present before (Dx) and after treatment (EOI), namely clusters 4, 6, and 8 (Figure S10B). Analysis showed persistence of few treatment-resistant blast cells (~6% of the blast cells present in Dx) in EOI samples (Figure S10B). These AML blasts overexpressed multiple AML blast genes validating AML blast characteristics for these cells (Figure S10C).
To further determine the transcriptome landscape of treatment-resistant residual tumor cells, we performed comparative gene analysis of treatment-responsive (only detected in Dx samples) and treatment-resistant (detected in both Dx and EOI samples) cells (Fig. 7D). The treatment resistant blast cells depicted overexpression of multiple genes associated with tumor growth and poor outcome including HOPX, SELENOP/SEPP1, and FAM30A/C1orf110. Survival analysis on these treatment-resistant residual tumor cells overexpressed genes in the TARGET AML dataset showed significant association with poor OS as well as lower EFS (Figure S11, Table S8). Additionally, multiple genes (MPO, KCNE5, MSLN, and TRH) that are significantly upregulated in treatment responsive blast cells are associated with better OS in AML (Fig. 3D, 7D, Figure S11). Further pathways analysis of the DEGs depicted significant upregulation of pathways related to muscle contraction, fatty acid omega oxidation, PPARα network in the treatment resistant cells (Fig. 7E, Table S9). Survival analysis based on treatment resistant cells upregulated pathways (e.g., fatty acid omega oxidation and PPARα network) in the TARGET AML dataset showed significant association with poor survival indicating their role in disease progression (Table S10). Additionally, upstream regulator analysis on DEGs in treatment-resistant cells depicted activation of multiple key regulators associated with epithelial–mesenchymal transitions including SOX4, STAT3, and TGFβ1 (Fig. 7F).
To summarize, treatment-resistant blasts identified in this preliminary analysis may contribute to relapse by differential regulation of genes and pathways associated with modulation of TME, leading to poor OS and lower EFS in AML patients.
Relapse-associated EOI samples depicted enrichment of T cells and diminution of monocytes/macrophages
EOI samples showed a much larger proportion of non-AML immune cells since therapeutic intervention (chemotherapy) results in the killing of tumor cells. To further characterize EOI single cell landscape, the focused analysis on non-AML cells was performed after removing residual blast cell clusters discussed in the previous section. The split UMAP of CCR- and relapse-associated samples revealed the altered distribution of the transcriptionally distinct canonical immune cells (T cells, NK cells, and macrophages/monocytes) (Fig. 8A). Analysis of relapse- and CCR-associated non-AML clusters at EOI revealed a similar pattern to that seen at Dx, that is higher enrichment of T cells in relapse-associated samples and a higher proportion of monocytes and macrophages in the CCR-associated samples enriched clusters (Fig. 8B, C). Further M1 and M2 macrophage enrichment analysis revealed a highly significant (p<0.001) inflammatory M1 macrophages predominance in CCR-associated EOI samples and significant (p<0.05) M2 macrophages enrichment in relapse-associated EOI samples (Fig. 8D). The T cells formed three distinct clusters i.e., naïve T cell-1, naïve T cell-2, and NKT cells clusters (Fig. 8A, Figure S12A). Naïve T cells-2 cluster depicted significantly (P value<0.05) higher enrichment in the relapse-associated EOI samples (Figure S12B). The comparative analysis of the two naïve T cell clusters depicted that relapse-enriched naïve T cell-2 has lower expression of MHC class I genes including HLA-A and HLA-B (Figure S12C). Pathway analysis of naïve T cell-2 depicted downregulation of pathways related to Th1, Th2, calcium-induced T cell apoptosis, CD28 signaling in T helper cells, and integrin pathways (Fig. 8E).