Protein selector sets (PS) identify patient groups with distinct clinical outcomes
We developed an algorithm to identify the most therapeutically discriminating proteins and generated Protein Selector Sets (see Materials and Methods section). The first set, PS1, comprised 55 proteins and identified three clusters (C1, C2, and C3) with unique expression signatures. Protein levels across the clusters are shown in Fig. 1A. Although the protein signature of each cluster was the same in patients treated with VH and in those treated with CC, overall survival (OS) varied greatly between treatments. As shown in Fig. 1B, patients in C1 (red) treated with VH (solid line) had markedly superior outcomes compared to those treated with CC (dashed line), with a median OS (MS) of 68.5 months (mo.) in the VH group versus (vs.) 19.4mo. in the CC population. The opposite was true for C3 (yellow), where CC patients had a MS of 16.8mo. and the VH population had a very poor MS of 8.7mo. However, PS1 did not identify an optimal therapy for patients in cluster C2 (light blue). Therefore, to identify the preferred therapy for PS1-C2 patients (N = 182), we generated PS2 using the same strategy. As shown in Fig. 1C, PS2 separated this population into two clusters with distinct expression profiles. In Fig. 1E, patients in cluster PS2-C1 (blue) treated with CC (dashed line) had a markedly better OS (MS > 120mo.) than PS2-C1 patients treated with VH (solid line), who had a MS of 12.7mo. The same was true for cluster PS2-C2 (purple), where CC (dashed line) had a MS of 12.2mo. and VH (solid line) had a MS of 6.4mo. Moreover, as shown in Fig. 1B, the best PS1-C3 curve (dashed yellow, CC-treated) had an OS comparable to that of the worst PS1-C1 group (dashed red, CC-treated). Therefore, we generated PS3 for PS1-C3 patients (N = 146) in an attempt to identify a group with better OS. Within PS3, two clusters with contrasting protein expression levels were defined and separated by treatment (Fig. 1D). As shown in Fig. 1F, patients in cluster PS3-C1 (green) had a very good prognosis when treated with CC (dashed line), with a MS > 120mo., and a very poor outcome when treated with VH (solid line), with a MS of 10.4mo. In contrast, the OS of patients in PS3-C2 (orange) was similarly poor for both therapies.
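The median OS values quoted above are read from Kaplan-Meier curves. As an illustration only, the following minimal Python sketch shows how a median survival time can be derived from follow-up times and event indicators. The function names and the toy data are hypothetical and are not taken from our analysis code; we also assume here that a median reported as "> 120mo." corresponds to a curve that does not fall to 50% within the available follow-up.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of a Kaplan-Meier estimate.

    times  -- follow-up time of each patient, in months
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(times: List[float], events: List[int]) -> Optional[float]:
    """First time at which estimated survival is 0.5 or lower.

    Returns None when the curve never reaches 0.5 (median not reached).
    """
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None


# Toy data (hypothetical, not patient data).
toy_times = [2, 4, 4, 7, 9, 12]
toy_events = [1, 1, 0, 1, 0, 1]
print(median_survival(toy_times, toy_events))
```

In practice a survival analysis library would be used instead of a hand-written estimator; the sketch is only meant to make the definition of MS explicit.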
The combination of the PS sets led to the generation of five clusters separated by the expression levels of 109 proteins, as shown in Fig. 2A: C1 was derived from PS1, C2 and C3 from PS2 (former PS2-C1 and PS2-C2), and C4 and C5 from PS3 (former PS3-C1 and PS3-C2). In Fig. 2B, OS was better for C1 patients (red) treated with VH (solid) than for those treated with CC (dashed) (MS = 68.5mo. vs. 19.4mo.). In contrast, both C2-CC (dashed blue) and C4-CC (dashed green) displayed a MS > 120mo., outperforming C2-VH (solid blue), with a MS of 12.7mo., and C4-VH (solid green), with a MS of 10.4mo. Moreover, although C3-CC (dashed purple) did better than C3-VH (solid purple) (MS of 12.2mo. vs. 6.4mo.), its OS was worse than that of the C2-CC and C4-CC populations. Finally, our PS system could not determine which treatment patients in cluster C5 (orange) should receive. Considering their poor outcomes with both VH (MS = 2.9mo.) and CC (MS = 8.6mo.), this population might benefit from another treatment regimen (e.g., target-based therapies). Analysis of CRD for all PS sets showed a similar outcome pattern (Supplementary Fig. S1). Comparison of VH vs. CC for each cluster separately is shown in Supplementary Fig. S2.
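The relabeling of the PS1, PS2, and PS3 clusters into the final five clusters, together with the therapy favored in each, can be summarized in a short Python sketch. The function and dictionary names are hypothetical and chosen only for illustration; the mapping itself follows the description above.

```python
from typing import Optional

# Therapy with the better OS in each final cluster; C5 has no preferred arm.
PREFERRED_THERAPY = {
    "C1": "VH",
    "C2": "CC",
    "C3": "CC",
    "C4": "CC",
    "C5": "undetermined",
}


def final_cluster(ps1: str, ps2: Optional[str] = None, ps3: Optional[str] = None) -> str:
    """Map hierarchical PS cluster labels to the final five-cluster label.

    ps1 -- cluster within PS1 ("C1", "C2" or "C3")
    ps2 -- cluster within PS2, required when ps1 == "C2"
    ps3 -- cluster within PS3, required when ps1 == "C3"
    """
    if ps1 == "C1":
        return "C1"
    if ps1 == "C2":
        mapping = {"C1": "C2", "C2": "C3"}  # former PS2-C1 and PS2-C2
        if ps2 not in mapping:
            raise ValueError("PS1-C2 patients need a PS2 cluster of 'C1' or 'C2'")
        return mapping[ps2]
    if ps1 == "C3":
        mapping = {"C1": "C4", "C2": "C5"}  # former PS3-C1 and PS3-C2
        if ps3 not in mapping:
            raise ValueError("PS1-C3 patients need a PS3 cluster of 'C1' or 'C2'")
        return mapping[ps3]
    raise ValueError(f"unknown PS1 cluster: {ps1!r}")


cluster = final_cluster("C3", ps3="C1")
print(cluster, PREFERRED_THERAPY[cluster])
```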
Cluster associations with demographic, clinical and molecular features
We examined how the clusters differed with respect to demographic (age, gender, race), clinical (AML group and laboratory parameters), and molecular features (cytogenetics and mutation profiles), as shown in Table 1. There were significant differences in age distribution, in the frequency of many clinical variables (primary vs. secondary AML, white blood cell count, percentage of blasts, and platelet count), in cytogenetics (by risk group, simple vs. complex karyotype, or for specific events, such as -5/5q-, -7/7q-, and Inv16), and in several individual mutations (ASXL1, CEBPA, DNMT3A, EZH2, FLT3 [individually for ITD and D835, and in combination], NPM1, and TP53). An expanded table with all variables assessed is shown in Supplementary Table S4.
Since many of the features with unbalanced distributions among the clusters are known to be prognostic, we asked whether the prognostic impact of the clusters merely reflected these imbalances or whether the clusters were independently predictive. To address this, we generated KM plots to verify whether cluster membership is prognostic for OS and CRD when the population is filtered for specific variables (e.g., males only, secondary AML only, etc.). KM plots with p-values are shown in Supplementary Figs. S3 and S4. The prognostic impact of the five clusters was sustained for almost all variables, including gender, all three age groups, all races, both primary and secondary AML, and major cytogenetic groupings (whether divided into three prognostic groups, or for complex karyotypes). Since most individual cytogenetic and mutation events occur at a low frequency, once the five clusters are subdivided by treatment modality (ten groups in total) the small sample sizes often preclude reaching statistical significance. However, similar trends (C1, C2, and C4 better than C3 and C5) were maintained for the majority, with exceptions noted for FLT3, IDH1, IDH2, JAK2, MLL, PTPN11, and TP53 mutations.
Next, we measured the prognostic value of the clusters and other variables using univariate (UV) and multivariate (MV) Cox proportional-hazards (CoxPH) models for both OS and CRD. In both analyses, clusters were condensed into three groups to avoid a large number of levels in a single variable, which might negatively influence the CoxPH models. Clusters with good prognosis (C1-VH, C2-CC, and C4-CC) were joined and renamed Group1; those with intermediate OS and CRD (C1-CC, C2-VH, and C3-CC) were combined into Group2; and the remaining clusters, with poor prognosis (C3-VH, C4-VH, C5-VH, and C5-CC), were merged into Group3. As demonstrated in Table 2, all cluster groups were predictive of survival and remission in both the UV and MV models, reinforcing their prognostic value. Moreover, several demographic (age, white race, and Asian race), clinical (secondary AML, blasts, Hbg, and serum B2M), cytogenetic (complex karyotype, -5/5q-, -7/7q-, t(8;21), Inv16, and Del12), and mutational (ASXL1, CEBPA, FLT3 [individually for ITD and D835, and in combination], IDH2, JAK2, MLL, NPM1, PTPN11, and TP53 mutations) features were also prognostic in the UV model for OS. However, only clusters, secondary AML, complex karyotype, Inv16, and IDH2 and PTPN11 mutations remained significant in the MV analysis. Regarding CRD, clusters remained highly significant in the UV analysis, along with other characteristics (age, black race, AML group, complex karyotype, -5/5q-, Inv16, and FLT3, RUNX1, and TP53 mutations); only clusters, black race, and complex karyotype remained significant in the MV model. Taken together, these findings corroborate the independent prognostic value of the PS protein signatures. An expanded table containing all variables evaluated in the UV model for both OS and CRD is shown in Supplementary Table S5.
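The condensation of the ten cluster-by-treatment combinations into three prognostic groups can be written as a simple lookup, sketched below in Python. The names are hypothetical, and the indicator coding with Group1 as the reference level is an assumption made for illustration; it is one common way to enter a three-level variable into a CoxPH model, not necessarily the coding used in our models.

```python
from typing import Dict, Tuple

# (cluster, treatment) -> prognostic group, as described in the text.
PROGNOSTIC_GROUP: Dict[Tuple[str, str], str] = {
    ("C1", "VH"): "Group1", ("C2", "CC"): "Group1", ("C4", "CC"): "Group1",
    ("C1", "CC"): "Group2", ("C2", "VH"): "Group2", ("C3", "CC"): "Group2",
    ("C3", "VH"): "Group3", ("C4", "VH"): "Group3",
    ("C5", "VH"): "Group3", ("C5", "CC"): "Group3",
}


def prognostic_group(cluster: str, treatment: str) -> str:
    """Return Group1 (good), Group2 (intermediate) or Group3 (poor)."""
    try:
        return PROGNOSTIC_GROUP[(cluster, treatment)]
    except KeyError:
        raise ValueError(f"unknown combination: {cluster}-{treatment}") from None


def indicator_columns(group: str, reference: str = "Group1") -> Dict[str, int]:
    """Encode a group as 0/1 indicators, omitting the reference level.

    The choice of reference level is an assumption for illustration.
    """
    levels = sorted(set(PROGNOSTIC_GROUP.values()))
    if group not in levels:
        raise ValueError(f"unknown group: {group!r}")
    return {f"is_{g}": int(g == group) for g in levels if g != reference}


print(indicator_columns(prognostic_group("C4", "VH")))
```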
Development of a protein classifier (PC) for treatment recommendation
Although the PS system can efficiently separate patients who should receive VH from those who would do better with CC, it is not feasible to measure more than 100 different proteins in the clinical setting. The number of proteins to be assessed is excessive and poses a major cost-benefit challenge for the application of the method. In contrast, identifying a few proteins that can be measured using a Clinical Laboratory Improvement Amendments (CLIA)-certified test to accurately assign an individual patient to a specific protein expression profile would be practical. Therefore, we designed a classification algorithm, entitled Protein Classifier (PC), using the random forest machine learning technique. The system identifies the most predictive proteins for treatment recommendation, based on the previously developed cluster memberships and protein expression data. Accordingly, we recommended VH treatment for patients belonging to cluster C1 (N = 91); CC therapy for patients in clusters C2, C3, and C4 (N = 267); and neither VH nor CC for the C5 patient population (N = 61). The system was developed with the goal of defining clusters using three different models sequentially: 1) define C1 patients (N = 91); 2) distinguish the C2 and C4 groups (N = 154) from the C3 and C5 populations (N = 174); and 3) separate C3 (N = 113) from C5 (N = 61) patients. In Fig. 3A, the top predictive proteins are visualized together with their respective SHAP values. The first step of the PC system identified the six most predictive proteins for C1: SPI1, ASH2L, EIF4EBP1.pS65, EZH2, NFE2L2, and SOX2 (C-index: 0.951). Therefore, according to our previous OS and CRD analyses, patients with this protein signature should receive VH therapy. In the second step of the PC system, TGM2, NOTCH1.cle, DUSP4, and RAD51 were the best proteins to differentiate C2 + C4 from C3 + C5 (C-index: 0.903).
Of note, distinguishing C3 from C2 and C4 is necessary because, although both patient groups should receive CC, the OS and CRD for C3 are much lower, so this patient group may benefit from additional therapy (e.g., CC and stem cell transplant in first remission), whereas C2 and C4 seem to do well with CC alone. Finally, SMAD2.pS245_250_255, MAPK14.pT180_Y182, EIF4E.pS209, and NDUFB4 were identified as the best proteins to segregate C3 and C5, defining the last step of our system (C-index: 0.923). The expression of all proteins in the PC system by cluster is shown in Fig. 3B. Importantly, the C-index, a measure of individual patient discriminatory power, is above 0.90 for all models in our PC system, demonstrating that the system robustly predicts optimal therapy choice (a C-index higher than 0.7 is considered predictive, while a value of 1 would indicate perfect discrimination). Moreover, considering all three models working together, we predicted that 87.3% of patients would receive the correct therapy, and only a small fraction (5.5%) would be misassigned. The proportion of patients in the C5 group who would be assigned to either CC or VH, instead of being defined as ‘undetermined’, was 7.1%. Overall sensitivity, specificity, and accuracy were 84.2%, 79.6%, and 82.8%, respectively. The predictive calculations for the PC model are presented in Supplementary Table S6. Therefore, the development of a kit that determines the expression of the aforementioned 14 proteins would be useful and financially feasible for triaging patients and guiding the recommendation for VH or CC.
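The sequential logic of the three PC models can be illustrated with the short Python sketch below. It shows only the decision cascade. The three models are passed in as generic callables, and `threshold_model` is a hypothetical single-cutoff stand-in for a trained random forest; the cutoffs and their directions are arbitrary placeholders, not fitted values. The protein panels are those listed above.

```python
from typing import Callable, Mapping

Profile = Mapping[str, float]      # protein name -> expression level
Model = Callable[[Profile], bool]  # True if the patient belongs to the target class

# Protein panels selected at each step of the PC system (14 proteins in total).
STEP1_PANEL = ("SPI1", "ASH2L", "EIF4EBP1.pS65", "EZH2", "NFE2L2", "SOX2")
STEP2_PANEL = ("TGM2", "NOTCH1.cle", "DUSP4", "RAD51")
STEP3_PANEL = ("SMAD2.pS245_250_255", "MAPK14.pT180_Y182", "EIF4E.pS209", "NDUFB4")


def recommend(profile: Profile, is_c1: Model, is_c2_or_c4: Model, is_c3: Model) -> str:
    """Apply the three models in sequence and return a treatment label."""
    if is_c1(profile):          # step 1: C1 vs. all other clusters
        return "VH"
    if is_c2_or_c4(profile):    # step 2: C2 + C4 vs. C3 + C5
        return "CC"
    if is_c3(profile):          # step 3: C3 vs. C5
        return "CC (consider additional therapy)"
    return "undetermined"       # C5: neither VH nor CC is recommended


def threshold_model(protein: str, cutoff: float) -> Model:
    """Hypothetical stand-in for a trained classifier: one protein, one cutoff."""
    return lambda profile: profile[protein] > cutoff


# Placeholder models; the cutoffs are arbitrary and for illustration only.
step1 = threshold_model("SPI1", 0.0)
step2 = threshold_model("TGM2", 0.0)
step3 = threshold_model("NDUFB4", 0.0)

patient = {"SPI1": -0.4, "TGM2": -1.1, "NDUFB4": 0.8}
print(recommend(patient, step1, step2, step3))
```

Because each step is evaluated only for patients not assigned at an earlier step, an error in the first model propagates to the later ones, which is why the combined performance of the three models is reported in addition to the individual C-indices.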
Patients with the worst outcomes have a unique and targetable protein signature
Since our PS system was unable to recommend either VH or CC for cluster C5 patients, we sought to determine the signaling pathways most strongly associated with this population. Among the 411 proteins in our database, we identified 27 that, in combination, form a unique expression profile in C5 patients compared to all the other clusters, as depicted in Fig. 4A. A table with adjusted p-values for the 27 significantly differentially expressed proteins in cluster C5 is shown in Supplementary Table S7. We built the protein network of these uniquely expressed proteins and noted EIF4E and MAP2K1 as major signaling hubs (Fig. 4B), which were also highly expressed compared to the other clusters (Fig. 4A). Furthermore, we determined the signaling pathways enriched among the 27 differentially expressed proteins and observed that pathways related to the unfolded/misfolded protein response and cell proliferation were the most significantly enriched (Fig. 4C). Therefore, even though we were unable to recommend a specific treatment for C5 patients, our Differential Expression (DE) analysis revealed potentially druggable proteins that could be useful for developing target-based therapies.
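Testing 411 proteins for differential expression requires correction for multiple comparisons, hence the adjusted p-values reported above. The adjustment method is not stated in this section; purely as an illustration, the Python sketch below implements the Benjamini-Hochberg procedure, which is an assumption on our part and should not be read as the method actually used. The function name and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, significant)
```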