Characteristics of patient population. Patients who were included in this study displayed higher WBC, blast percentage, and ANC (P<0.0001 for all) compared to patients enrolled on these trials who were not included in this study. In addition, there were significant differences in cytogenetic profiles (P=0.0031), FAB class (P<0.0001), and proportions across clinical trials (P=0.0129; Additional File, Table S3). These differences between included and not included patients likely reflect reported biases: patients within repositories tend to have a higher burden of disease at diagnosis, and specimens from older trials are depleted (6). However, there were no significant differences between the included and not included patients with respect to CR rates (60% vs. 58%; P=0.52), RFS (5-year RFS 32% vs. 33%; P=0.52) or OS (5-year OS 30% vs. 32%; P=0.62; Additional File, Table S3 and Figure S1). The discovery and validation cohorts displayed some differences in clinical characteristics despite randomization (Additional File, Table S4); however, there were no significant differences in clinical outcomes between the two cohorts (CR 57% vs. 63%, P=0.31; 5-yr RFS 30% vs. 34%, P=0.54; 5-yr OS 30% vs. 31%, P=0.82; Additional File, Table S4 and Figure S2).
Characterization of mutations and transcript expression. Mutation analyses focused on genes utilized for ELN-2017 risk stratification. FLT3-ITD and NPM1 mutations were examined in all specimens with available material (i.e., MNCs and VLBs). There was 100% concordance for NPM1 mutations in MNCs and VLBs. One FLT3-ITD was observed in the MNCs but not in the VLBs (99.7% concordance). FLT3-ITD and NPM1 mutations were detected in 109 (31%) and 125 (36%) patients, respectively. The distribution and mutation frequencies of NPM1 and FLT3-ITD, as well as FLT3-ITD AR, were not significantly different between discovery and validation cohorts in either population of cells (Additional File, Table S5 and Figure S3). Excluding the patient with discordant FLT3-ITD results, FLT3-ITD AR was significantly higher in VLBs than in MNCs (AR ranges 0.03-20 and 0.04-13.2, respectively, P<0.0001). Given that the ELN-2017 guidelines utilize a FLT3-ITD AR of 0.5 for risk stratification, we examined the impact that testing the FLT3-ITD AR in VLBs had on ELN-2017 classification. In the MNCs, the percentages of patients with low and high FLT3-ITD ARs were 34% and 66%, respectively, while the percentages for low and high AR in VLBs were 23% and 77%. Examining FLT3 in VLBs resulted in a different AR classification for 19 patients, with 15 patients changing from low AR in MNCs to high AR in VLBs and 4 patients changing from high AR in MNCs to low AR in VLBs.
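The AR reclassification described above can be illustrated with a minimal sketch. It assumes that a ratio equal to the 0.5 cutoff counts as high; the function names and the paired values are hypothetical and are not study data.

```python
from collections import Counter

AR_CUTOFF = 0.5  # FLT3-ITD AR threshold cited in the text for ELN-2017


def ar_class(ar, cutoff=AR_CUTOFF):
    """Label an allelic ratio as 'high' or 'low'.

    Assumption: a ratio exactly at the cutoff is treated as high.
    """
    return "high" if ar >= cutoff else "low"


def reclassification(pairs, cutoff=AR_CUTOFF):
    """Count (MNC class, VLB class) combinations for paired AR measurements."""
    return Counter((ar_class(mnc, cutoff), ar_class(vlb, cutoff)) for mnc, vlb in pairs)


# Hypothetical paired ratios (MNC, VLB) for five patients; illustrative only.
example_pairs = [(0.30, 0.80), (0.60, 0.90), (0.70, 0.40), (0.10, 0.20), (0.45, 0.55)]

counts = reclassification(example_pairs)
# Patients whose AR class differs between the two cell populations.
changed = sum(n for (mnc, vlb), n in counts.items() if mnc != vlb)

print(counts)
print("patients with a different AR class:", changed)
```

Tabulating the paired classes in this way gives the low-to-high and high-to-low counts directly, which is the form in which the 15 and 4 reclassified patients are reported.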
ASXL1, CEBPA, RUNX1, and TP53 mutations were examined in both MNC and VLB populations for the discovery cohort. Similar to the results for NPM1 and FLT3, there was 99.4% concordance in mutations between MNCs and VLBs, with only one patient displaying a discrepancy for an ASXL1 mutation. Therefore, ASXL1, CEBPA, RUNX1, and TP53 mutations were examined only in VLBs for the validation cohort. Overall, the frequencies of mutations in the examined patients were as follows: ASXL1 (N=35, 10%), CEBPA (N=20, 6%), RUNX1 (N=40, 11%), and TP53 (N=26, 7%). The frequency of ASXL1 mutations was modestly higher in the discovery cohort (13% discovery vs. 7% validation, P=0.044); other mutations displayed similar frequencies in both groups of patients (Additional File, Table S5 and Figure S3).
Building upon the results examining transcript biomarkers in the discovery cohort (6), analyses examined transcript expression as a continuous variable for 13 genes previously reported to be potential prognostic biomarkers: BAALC, CCNA1, CEBPA, ERG1, EVI1, FLT3, GATA2, IL3RA, JAG1, KIT, MN1, RUNX1, and WT1 (13-27). In the case of EVI1, transcript expression was not detectable and thus censored in 69% of VLBs and 70% of MNCs. Given the dichotomous nature of EVI1 expression, we also examined the prognostic significance of EVI1 expression as a binary variable (expressed vs. not expressed). In the discovery cohort, univariate analyses showed significantly higher expression in VLBs relative to MNCs for BAALC (P<0.0001), CCNA1 (P=0.005), ERG1 (P<0.0001), EVI1 (P=0.001), FLT3 (P=0.024), MN1 (P<0.0001), RUNX1 (P=0.001) and WT1 (P<0.0001), while none of the transcripts was expressed at significantly lower levels in VLBs than in MNCs (Additional File, Table S6).
Prognostic significance of biomarkers in univariate analyses. Univariate analyses examined the prognostic significance of FLT3-ITD AR, NPM1 mutation, and transcript expression in MNCs and VLBs in the discovery cohort. Increasing FLT3-ITD AR in MNCs was associated with worse OS (Table 1). NPM1 mutations were not associated with clinical outcome in univariate analyses (Table 1). The prognostic significance of some transcripts varied depending upon the cell type tested (Table 1; Additional File, Table S7). Overall, increased expression of CCNA1, ERG1, EVI1, FLT3, IL3RA, KIT and MN1 was significantly associated with adverse risk for one or more clinical outcomes in one or both cell populations (Table 1), while expression of BAALC, CEBPA, GATA2, JAG1, RUNX1 and WT1 was not significantly associated with clinical outcomes in either MNCs or VLBs (Additional File, Table S7). Univariate analyses also evaluated the prognostic significance of age, cytogenetics, PS, secondary AML status, and ELN risk groups in the discovery cohort. As expected, increasing age, adverse cytogenetics, poor PS, and secondary AML status were significantly associated with poor clinical outcomes (Table 2). Favorable ELN-2017 risk was significantly associated with improved CR, whether examining MNCs or VLBs (OR=3.11, P=0.024 and OR=3.69, P=0.014, respectively), while adverse and unknown ELN-2017 risks were not significantly associated with CR (Table 2). Favorable ELN-2017 risk was also significantly associated with improved OS in VLBs but not in MNCs (VLBs, HR=0.38, P=0.001; MNCs, HR=0.58, P=0.060; Table 2). Adverse ELN-2017 risk was associated with reduced OS in MNCs (HR=1.66, P=0.050) but not in VLBs (HR=1.10, P=0.720). In keeping with the CR and OS analyses, favorable ELN-2017 risk was significantly associated with improved RFS in both MNCs and VLBs (HR=0.47, P=0.027 and HR=0.37, P=0.008, respectively; Table 2).
Performance of novel risk models utilizing ELN and other prognostic factors. Multivariable models for CR, OS, and RFS were developed separately for each cell population using age, ELN-2017 risk group, PS, AML onset, immunophenotype, clinical trial, transcript biomarker and expression as possible covariates (Additional File, Models Details). In the discovery cohort, the models with the best performance were obtained when clinical variables and expression biomarkers were integrated; however, when applied to an independent population of patients in the validation cohort, the performance of the integrated models for most outcomes was not superior to that of the AGE+ELN2017 models (Table 3). If a model is generalizable to a broad population, its AUCs or C-statistics will be nearly equivalent in the two cohorts. Generalizability of the developed integrated models was inconsistent across CR, OS and RFS outcomes.
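The generalizability check described above, comparing a C-statistic between cohorts, can be sketched as follows. This is a simplified form of Harrell's concordance index that ignores tied survival times; it is not necessarily the software or estimator used in this study, and the toy cohorts are hypothetical.

```python
def harrell_c(times, events, scores):
    """Simplified Harrell's C-statistic.

    A pair (i, j) is comparable when patient i had an event and a strictly
    shorter follow-up time than patient j. The pair is concordant when the
    model assigned patient i the higher risk score; tied scores count 0.5.
    Tied survival times are ignored in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohorts: (survival times, event indicators, model risk scores).
discovery = ([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.2])
validation = ([4, 8, 12, 16], [1, 1, 1, 0], [0.8, 0.3, 0.6, 0.1])

c_disc = harrell_c(*discovery)
c_val = harrell_c(*validation)
# A large drop from discovery to validation suggests limited generalizability.
print(f"discovery C={c_disc:.3f}, validation C={c_val:.3f}, gap={c_disc - c_val:.3f}")
```

A model fitted on the discovery cohort will usually score somewhat lower in validation; the size of the gap, rather than either value alone, is what bears on generalizability.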
The ELN2017 model divides patients into 4 groups: favorable, intermediate, adverse, and unknown. Figure 1 shows OS by ELN2017 risk in MNCs and VLBs from the validation cohort. Since previous studies demonstrated a worse prognosis for intermediate risk patients over the age of 55 (8, 44), the ELN2017 models were also applied to younger (age <55) and older (age ≥55) patients. ELN2017 models were a better fit for the younger patients, whether using data derived from MNCs (Fig. 1) or VLBs (Fig. 2). To visualize the AGE+ELN2017 model for OS, the continuous risk score from the AGE+ELN2017 model in the discovery data was divided into quartiles to parallel the ELN2017 model, and the boundaries of these quartiles were applied to the validation data (Figs. 3 and 4). Though these plots are intended to be exploratory, the quartiles defined by the AGE+ELN2017 models visually show more separation between curves than do the ELN2017 risk groups in MNCs and in VLBs (Figs. 3A and 4A vs. Figs. 1A and 2A). The C-statistics for the AGE+ELN2017 models are also slightly higher than the C-statistics for the ELN2017 models.
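The quartile step described above, in which cut points are derived from discovery risk scores and then applied unchanged to validation scores, might look like the following minimal sketch. The quantile convention (the default of Python's `statistics.quantiles`), the handling of scores that fall exactly on a boundary, and all scores shown are assumptions for illustration; the study's exact procedure is not specified here.

```python
import statistics
from bisect import bisect_right


def quartile_bounds(discovery_scores):
    """Return the three cut points that split discovery scores into quartiles."""
    return statistics.quantiles(discovery_scores, n=4)


def assign_group(score, bounds):
    """Map a risk score to a group from 1 (lowest risk) to 4 (highest risk).

    Assumption: a score exactly on a boundary goes to the higher group.
    """
    return bisect_right(bounds, score) + 1


# Hypothetical continuous risk scores; not study data.
discovery_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
validation_scores = [0.05, 0.3, 0.5, 0.9]

bounds = quartile_bounds(discovery_scores)  # boundaries fixed on discovery data
groups = [assign_group(s, bounds) for s in validation_scores]
print(bounds, groups)
```

Because the boundaries are frozen before the validation data are seen, the validation groups need not be equal in size, which is expected when the two cohorts differ in their score distributions.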
Evaluation of simplified ELN-2017 and AGE+ELN2017 models. To investigate the impact of ASXL1, CEBPA, RUNX1 and TP53 mutations on risk stratification, we evaluated the performance of modified models that excluded mutation data for these four genes, both without age (ELN2017-MOD) and with age (AGE+ELN2017-MOD). Excluding the mutation status of these four genes resulted in reassignment of risk groups for 46 patients in MNCs and 44 patients in VLBs of the 351 patients (Additional File, Table S8). Both models were developed using the discovery data from the MNCs and VLBs. In the validation cohort, the AUCs and C-statistics were similar between the ELN2017 and ELN2017-MOD models, allowing comparable population risk prediction at community sites that may not have access to genomic mutation screening. Furthermore, the AGE+ELN2017-MOD models had nearly identical performance characteristics to the AGE+ELN2017 models (Table 4).