Genetic Mutation Signature for Relapse Prediction in Normal Karyotype Acute Myeloid Leukemia

Background: Risk stratification for normal karyotype acute myeloid leukemia remains unsatisfactory, which is reflected by the high incidence of leukemia relapse. This study aimed to evaluate the role of gene mutations and clinical characterization in predicting the relapse of patients with normal karyotype acute myeloid leukemia. Methods: A prognostic system for normal karyotype acute myeloid leukemia was constructed based on gene mutations, measurable residual disease, and clinical characteristics. A panel of gene mutations was explored using next-generation sequencing. The least absolute shrinkage and selection operator and a nomogram algorithm were used to build a genomic mutation signature (GMS) nomogram (GMSN) model that combines GMS, measurable residual disease, and clinical factors to predict relapse in 347 patients with normal karyotype acute myeloid leukemia from four centers. Results: Patients in the GMS-high group had a higher 5-year incidence of relapse than those in the GMS-low group (P < 0.001). The 5-year incidence of relapse was also higher in patients in the GMSN-high group than in those in the GMSN-intermediate and -low groups (P < 0.001). The 5-year disease-free survival and overall survival rates were lower in patients in the GMSN-high group than in those in the GMSN-intermediate and -low groups (P < 0.001), as confirmed in the training and validation cohorts. Conclusions: This study illustrates the potential of GMSN as a predictor of normal karyotype acute myeloid leukemia relapse.


Background
Acute myeloid leukemia (AML) is a highly heterogeneous disease with a poor prognosis, largely owing to its high incidence of relapse (1). Cytogenetic detection has been proven important for risk stratification of patients with AML (2,3). However, AML with a normal karyotype (NK-AML) is observed in nearly half of AML cases (4). The identification of specific genetic mutations has remarkably augmented our understanding of AML molecular pathophysiology and revealed the prognostic significance of each mutation in NK-AML (5-7). The classification of AML and its prognostic profile have been improved owing to advances in molecular characterization and the application of high-throughput sequencing. NK-AML belongs to the largest cytogenetic AML subgroup, with a large proportion of patients experiencing relapse (8). However, the precise identification of patients at high risk of relapse remains unsatisfactory. It is, therefore, imperative to introduce novel prognostic biomarkers for determining high relapse risk.
In recent years, next-generation sequencing (NGS) has become a routine diagnostic method for hematological malignancies (7,9). NGS outcomes, including single somatic mutation profiling, may improve diagnostic accuracy and support precise treatment strategies in clinical practice. However, the comprehensive use of genomic mutation databases and clinical factors to guide clinical decision-making remains at a relatively early stage and is therefore difficult. In a previous study, we used a machine-learning algorithm based on 16S rRNA gene sequencing of intestinal microbiota to precisely predict the occurrence of acute graft-versus-host disease during allogeneic hematopoietic stem cell transplantation (allo-HSCT) (10). Our results strongly suggested that NGS data and machine learning can be used for the identification of novel biomarkers for predicting NK-AML relapse.
In this study, we adopted the least absolute shrinkage and selection operator (LASSO) method and combined 22 gene mutations into a panel for NGS testing prior to induction therapy to establish a robust model (genomic mutation signature, GMS) for the prediction of relapse in NK-AML. Furthermore, we combined GMS, measurable residual disease (MRD), and clinical characteristics to generate a nomogram model for improved relapse and survival prognosis in 347 patients with NK-AML enrolled from four centers. Our model could provide novel insights into improving the precise evaluation of relapse risk in NK-AML.

Patients
From July 2016 to December 2019, patients diagnosed with AML in our centers (the First Affiliated Hospital of Zhengzhou University, Henan Cancer Hospital, the First Affiliated Hospital of Xinxiang Medical University, and the Huaihe Hospital of Henan University) were enrolled based on the following inclusion criteria: (1) diagnosed with de novo AML and normal karyotype; (2) between the ages of 14 and 60 years; (3) received ≥ 3 cycles of chemotherapy. The study was performed in accordance with the Helsinki Declaration and was approved by the ethics committees of the First Affiliated Hospital of Zhengzhou University, Henan Cancer Hospital, the First Affiliated Hospital of Xinxiang Medical University, and the Huaihe Hospital of Henan University.

Diagnosis
AML was diagnosed as previously described (2). Immunophenotyping was conducted on diagnostic bone marrow (BM) aspirate samples via eight-color CD45/SSC gated flow cytometry (11). Cytogenetic examinations were conducted as per standard techniques (12). Molecular screening for fusion genes and gene mutations was performed via RT-PCR, and sequencing analysis was performed for all patients (13).

The diagnosis of AML was based on the European LeukemiaNet 2017 recommendations, version 3 (2).
Secondary AML (sAML) includes AML arising from myelodysplastic syndrome, myeloproliferative neoplasms, myelodysplastic/myeloproliferative neoplasm, and therapy-related AML (14).

NGS
BM samples containing at least 20% blasts were collected at diagnosis, and mononuclear cells were separated via density gradient centrifugation. Genomic DNA was extracted using the Tiangen DP318-02 blood genomic DNA Extraction Kit (Tiangen, Beijing, China) according to the manufacturer's instructions. The integrity and concentration of genomic DNA were determined using the Qubit 4.0 fluorometer dsDNA HS Assay (Thermo Fisher Scientific, Waltham, MA, USA). Germline control DNA was obtained from matched BM during complete remission (CR). Approximately 500 ng to 1 µg of high-quality DNA was used for sequencing library construction.
The sequencing panel contained 22 frequently mutated genes related to AML diagnosis and prognosis. The panel kit was purchased from Shanghai Yuanqi Biomedical Technology Company Ltd. The threshold of the read depth was 1000×, resulting in a sensitivity of 1%. Raw reads were filtered using the Cutadapt software (version 2.10; https://cutadapt.readthedocs.io/en/stable), and clean reads were mapped to the human reference genome (GRCh37) using the BWA-MEM algorithm (software version 0.7.17; http://biobwa.sourceforge.net/bwa.shtml). The Sambamba software (version 0.6.8; https://github.com/biod/sambamba) was used to mark duplicates, and the GATK software (version 4.0.12.0; https://gatk.broadinstitute.org/hc/en-us) was used for the recalibration of the base quality score. Single-nucleotide variants and short insertions/deletions were called using the GATK software. Finally, a variant allele frequency of 0.01 was used as the threshold to determine whether a mutation was positive or negative.
Sequencing libraries were prepared using the KAPA Hyper Prep Kit (Kapa Biosystems, Wilmington, MA, USA) according to the manufacturer's instructions. Genomic DNA was fragmented to 200 to 300 bp using the Covaris S220 Ultrasonicator (Covaris, Woburn, MA, USA). The overhanging ends were repaired into blunt ends. The 3′ ends of the fragments were additionally adenylated with a single adenine (A) nucleotide, allowing hybridization to the 3′ overhanging thymine (T) nucleotide of the sequencing adapters. The fragments were purified and ligated to adapters. The resulting DNA fragments were selected using AMPure XP beads (Beckman Coulter, Krefeld, Germany) for the desired size of 420 bp. The fragments were then amplified via PCR. The purified libraries were sequenced and demultiplexed on a NovaSeq 6000 System (Illumina, San Diego, CA, USA), with 2 × 150 bp paired-end sequencing.
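The positivity rule described above (a read-depth threshold of 1000× and a variant allele frequency threshold of 0.01) can be illustrated with a minimal sketch. The function names, the inclusive comparison at exactly 0.01, and the decision to treat sites below the depth threshold as negative are assumptions made for illustration; they are not taken from the study's pipeline.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def is_mutation_positive(alt_reads: int, total_reads: int,
                         vaf_threshold: float = 0.01,
                         min_depth: int = 1000) -> bool:
    """Apply the depth and VAF thresholds stated in the text.

    Assumption: a site with fewer than min_depth reads is reported as
    negative, and a VAF exactly equal to the threshold counts as positive.
    """
    if total_reads < min_depth:
        return False
    return variant_allele_frequency(alt_reads, total_reads) >= vaf_threshold
```

At the stated depth threshold of 1000 reads, a VAF of 0.01 corresponds to 10 variant-supporting reads, which is consistent with the 1% sensitivity quoted in the text.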

LASSO-based gene selection and establishment of the GMS for relapse prediction
In our previous studies, we employed LASSO for the regression analysis of high-dimensional variables (10,15); herein, we used the LASSO algorithm to select the most important mutated genes in the training cohort. After repeated fine-tuning and 10-fold cross-validation, the standardized constraint parameters (minimum value of log λ) based on the 1-SE criteria were finally set to 0.036, and several non-zero coefficients were selected. We then used Cox regression to generate a predictive model of relapse and calculated the GMS for each patient. Using Kaplan-Meier survival analysis, we evaluated the predictive performance of the GMS model in both the training and independent validation cohorts. The best cut-off value estimation for the GMS model was 0.0069, as determined by repeated testing for relapse prediction.
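As a rough illustration of how a mutation-based score of this kind can be computed, the sketch below treats the GMS as the linear predictor of a Cox model over binary mutation indicators and applies the reported cut-off of 0.0069. The weights are hypothetical placeholders invented for the example and are not the study's fitted coefficients; the function names and the strict ">" comparison at the cut-off are likewise assumptions.

```python
# HYPOTHETICAL weights for illustration only; they are NOT the fitted
# coefficients of the study and carry no clinical meaning.
EXAMPLE_WEIGHTS = {"FLT3-ITD": 0.8, "TP53": 0.9, "NPM1": -0.4}

GMS_CUTOFF = 0.0069  # cut-off value reported in the text


def gms_score(mutated_genes, weights):
    """Sum the weights of the genes that are mutated (status yes/no)."""
    return sum(w for gene, w in weights.items() if gene in mutated_genes)


def gms_group(score, cutoff=GMS_CUTOFF):
    """Assign a risk group; assumes scores above the cut-off are 'high'."""
    return "GMS-high" if score > cutoff else "GMS-low"


# Example: a patient with FLT3-ITD and NPM1 mutations under the toy weights.
example_score = gms_score({"FLT3-ITD", "NPM1"}, EXAMPLE_WEIGHTS)
example_group = gms_group(example_score)
```

Because each gene enters only as mutated or not mutated, the score reduces to a sum over the mutated genes, which is what makes a signature of this form easy to compute at the bedside.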

Monitoring and definition of MRD
BM samples were obtained to monitor MRD using eight-color multi-parameter flow cytometry (MFC) after each cycle of chemotherapy (induction, six cycles of consolidation) (11). The identification of the leukemia-associated immunophenotype, as defined in AML diagnosis, was performed for MFC-MRD detection. The different-from-normal immunophenotype was applied to monitor MFC-MRD when the leukemia-associated immunophenotype was not available at diagnosis. The sensitivity of MFC-MRD was defined as 0.1%. Any level of MRD ≥ 0.1% was positive, and less than 0.1% was negative (16).

Treatment
Induction chemotherapy included anthracycline (10 mg/m² idarubicin or 45 mg/m² daunorubicin for 3 days) in combination with infusional cytarabine (Ara-C, 100 mg/m² for 7 days) or HAA (2 mg/m² homoharringtonine, 100 mg/m² cytarabine, and 20 mg aclarubicin for 7 days) (17,18). Generally, induction chemotherapy was performed for two cycles if the patients achieved CR or partial remission (PR) in the first cycle; otherwise, those who experienced no remission (NR) after the first cycle received FLAG (30 mg/m² fludarabine on days 1-5, 2 g/m² Ara-C on days 1-5, and 300 µg of G-CSF on days 0-5) or CLAG (5 mg/m² cladribine on days 1-5, 2 g/m² Ara-C on days 1-5, and 300 µg of G-CSF on days 0-5) (19,20). After two cycles of induction chemotherapy, patients with NR were given decitabine + CAG (Ara-C, aclarubicin) (21) or were enrolled in a clinical trial, and patients in CR/CRi were administered consolidation chemotherapy, which comprised a cycle of cytarabine (2 g/m² q12h for 3 days). Subsequently, the patients in CR/CRi received another three to four cycles of consolidation chemotherapy (17), allo-HSCT, or auto-HSCT based on MRD and donor availability. In auto-HSCT, peripheral blood stem cells were harvested following mobilization using intermediate-dose cytarabine and subsequent granulocyte colony-stimulating factor. In allo-HSCT, myeloablative conditioning regimens were administered to all patients, as previously described (10).

Definitions of treatment response
CR was defined as follows: BM blasts less than 5%, absence of blasts with Auer rods, zero blasts in peripheral blood, no extramedullary disease, absolute neutrophil count > 1.5 × 10⁹/L, platelet count > 100 × 10⁹/L, and hemoglobin concentration > 90 g/dL. CRi was defined as CR with incomplete blood count recovery; PR was defined as 5% < BM blasts < 20%; NR was defined as BM blasts ≥ 20%; relapse was defined as the reappearance of BM blasts > 5%, recurrence of blasts in the blood, or the appearance of extramedullary disease; non-CR was defined to include NR and PR as previously described (22).

Development of a nomogram combining GMS, MRD, and clinical factors for relapse prediction
Nomogram models have been extensively reported in previous studies on cancer prognosis (23,24). We performed multivariate regression analysis to construct a GMS nomogram (GMSN) as a precise quantitative model to predict relapse in patients with NK-AML. After multivariate analysis, candidate predictors of relapse were GMS, sAML, risk category, cycle3rd (remission status after the third cycle of chemotherapy), MRD3rd, and treatment choice. The predictive performance of the nomogram was estimated using an independent validation cohort. Relapse was then assessed considering the total points as a factor in the Cox regression analysis. Harrell's C-index was calculated to quantify the discriminating capability of the GMSN in the training cohort. The GMSN model was plotted using the "rms" package (https://cran.r-project.org/web/packages/rms/index.html).
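For readers unfamiliar with Harrell's C-index, the following is a minimal pure-Python sketch of the statistic, not the implementation used in the study (which relied on R). A pair of patients is treated as comparable when the one with the shorter follow-up time experienced the event; pairs with tied follow-up times are skipped, which is a simplifying assumption. The function and argument names are illustrative.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the patient with the
    shorter time-to-event also has the higher predicted risk.

    times:       follow-up time for each patient
    events:      1/True if the event (e.g. relapse) was observed, 0 if censored
    risk_scores: model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to predictions no better than chance, and 1.0 to a model that ranks every comparable pair correctly.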

Endpoints and statistical methods
The cumulative incidence of relapse was the primary endpoint of the study, while the secondary endpoints included disease-free survival (DFS) and overall survival (OS), determined via Kaplan-Meier analysis and compared using a log-rank test. In Cox regression analysis, variables associated with relapse or survival (P < 0.10 in univariate analysis) or variables (e.g., age, high white blood cell count, cycles required to achieve CR ≥ 2) known to influence outcomes were included in the final models. Statistical significance was established at P < 0.05. The R software (http://cran.R-project.org) was used for all data analyses.
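The Kaplan-Meier estimate referred to above can be sketched in a few lines. This is an illustrative pure-Python version under simple assumptions (no confidence intervals, censored observations at an event time are counted as still at risk at that time); it is not the R code used for the analyses, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: 1/True if the event was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients with an observed event at time t.
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        # Everyone leaving the risk set at t (events and censored).
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve
```

For example, with follow-up times 1, 2, 3, and 4 and the second patient censored, the estimate steps down to 0.75 at time 1, 0.375 at time 3, and 0 at time 4.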

Potential of GMS for determining NK-AML relapse
To determine suitable predictors for NK-AML relapse, the above gene mutations were assessed at diagnosis. Based on the 10-fold cross-validation via minimum criteria, nine coefficients were selected, as indicated by the vertical lines shown in Figure 2b. The final optimal selection genes included NPM1, KIT, CEBPA double, FLT3-ITD, RUNX1, TP53, ETV6, ZRSR2, and JAK2. The GMS was determined, and the correlation weights are shown in [...].

We evaluated patient relapse and survival based on GMS. For the 209 patients in the training cohort, the 5-year cumulative incidence of relapse was higher in the GMS-high than in the GMS-low group (72.41% vs. 24.60%, HR = 4.093 [2.100-7.977], P < 0.001; Figure 3a). The 5-year DFS and OS were lower in the GMS-high group than in the GMS-low group (23.42% vs. 63.83%, and 25.27% vs. 71.06%; HR = 3.142 [1.771-5.576] and 4.093 [2.100-7.977], respectively; each P < 0.001; Figure 3b and c). In the validation cohort, the 5-year incidence of relapse was higher (71.97% vs. 16.93%, HR = 4.993 [2.293-10.870], P < 0.001; Figure 3d), whereas the 5-year DFS and OS were lower, in the GMS-high group than in the GMS-low group ([...]; Figure 3e and f). Our results also indicated that FLT3-ITD-high status was associated with a higher 5-year relapse rate than FLT3-ITD-low and wild-type status according to the mutation allelic ratio in all patients (100% vs. 34.5% vs. 35.0%, P < 0.001; Additional File Figure 2a). Furthermore, the FLT3-ITD-high subgroup included more patients with high GMS than the FLT3-ITD-low and wild-type subgroups (26/26 vs. 29/35 vs. 47/239, P < 0.001; Additional File Figure 2b).

Establishment of a prognostic model that combines GMS, MRD, and clinical characteristics
To further improve predictive accuracy, we established GMSN, a comprehensive nomogram that combines GMS, MRD, and clinical characteristics. We quantified the degree of agreement between the actual and predicted relapse in the training cohort, divided into GMSN-low, -intermediate, and -high groups (Figure 4). The detailed formula for the GMSN calculation is shown in Additional File Table 2. The 5-year cumulative incidence of relapse was higher in the GMSN-high group than in the GMSN-intermediate and -low groups (100.00% vs. 48.63% vs. 8.70%, P < 0.001; Figure 5a). Furthermore, the 5-year DFS and OS were lower in the GMSN-high group than in the GMSN-intermediate and -low groups (0.00% vs. 43.20% vs. 79.25% and 0.00% vs. 50.30% vs. 86.15%, P < 0.001 and < 0.001, respectively; Figure 5b and c). Likewise, in the validation cohort, the incidence of relapse was higher in the GMSN-high group than in the GMSN-intermediate and -low groups (91.11% vs. 49.90% vs. 7.46%, P < 0.001; Figure 5d), whereas the 5-year DFS and OS were lower (8.88% vs. 45.68% vs. 85.05% and 8.33% vs. 54.24% vs. 86.01%, P < 0.001 and < 0.001, respectively; Figure 5e and f). To illustrate the predictive power of the model, we present the morphology and MRD (at the time of relapse) of two relapsed patients with AML-M2, with high GMS and GMSN scores (Additional File Figure 3a and 3b).

Discussion
Over the past 10 years, the application of NGS and gene expression profiling has fostered the identification of an increasing number of genetic alterations with prognostic value in AML (6,7,25). Although the prognostic impact of various single markers has been established, little is known regarding the interaction of these risk factors and their cumulative effect on NK-AML outcome. The aims of this study were to: (1) evaluate the role of gene mutations and clinical characterization in predicting the prognosis (especially relapse) of patients with NK-AML (14-60 years of age); and (2) build and validate a GMSN prognostic system for these patients. In this study, based on the analysis of 347 intensively treated patients with NK-AML, we established the GMS model for AML relapse prediction. A high GMS score was independently associated with relapse in patients with NK-AML. Furthermore, we combined clinical factors, MRD, and GMS to generate the prognostic GMSN, which exhibited an improved accuracy of prognostic classification. We found that the incidence of relapse was higher in the GMSN-high than in the GMSN-low and -intermediate groups. The GMSN stratified patients based on relapse risk, highlighting the importance of molecular and clinical factors and their interaction with other risk factors.
Various models for AML prognosis have been reported, but most were focused on specific AML cohorts, for example, patients with specific DNA methylation signatures, CEBPA double mutations, or FLT3-ITD mutations (26-28). For NK-AML, a previous study indicated that prognostic indices discriminated among low-, intermediate-, and high-risk patients, without considering the effect of HSCT and MRD, as only NPM1, CEBPA, and FLT3 gene mutations were analyzed (8). During AML development, single gene mutations might be affected by individual differences, leukemia heterogeneity, or different consolidation strategies. Moreover, the interaction between multiple genes, such as co-mutations and mutually exclusive mutations, cannot be ignored and could be related to treatment response and relapse. Thus, we assessed the correlations among mutations and found that DNMT3A and FLT3-ITD mutations appeared more commonly in patients harboring mutated NPM1, which is consistent with a recent report (29). RUNX1 mutations were associated with those of EZH2 and SF3B1, which agrees with recent findings (29,30). CEBPA double mutations were mutually exclusive with NPM1, which is in line with a previous finding (31). These observations support the paradigm of how co-occurring variations can influence prognosis beyond the effects of a single mutation (29,32,33) and indicate the importance of refining NK-AML molecular characterization.
To further evaluate sequencing data for the prediction of relapse, we performed bioinformatics analysis for identifying new biomarkers for risk group classification, and applied the LASSO algorithm to select the best panels of mutational genes for prognostic prediction. Finally, nine gene mutations were included in the GMS; several genetic alterations have been confirmed to affect clinical outcomes in AML (9,29,34). Based on transcriptional profiling, a previous study used LASSO to predict initial induction resistance and to develop prognostic biomarkers for AML (35). Moreover, a recent study suggested that a prognostic model based on the immune marker score predicts OS in NK-AML (36). Notably, the cut-off values of transcriptional data frequently limit clinical practice, especially across centers. Unlike the analysis of RNA-sequencing data sets (Gene Expression Omnibus) conducted in the studies described above, we employed NGS to establish the GMS, which was significantly associated with the incidence of relapse.
Our model mainly focused on the mutational status (yes or no) of each gene, allowing an easy calculation of GMS based on only nine genes. Importantly, our GMS was an independent prognostic factor for relapse as well as DFS and OS, and was convenient and efficient for discriminating the different outcomes of NK-AML. In addition, we found that the FLT3-ITD-high status was associated with a higher incidence of relapse than the FLT3-ITD-low and wild-type status, and FLT3-ITD-low patients might benefit from allo-HSCT, which is consistent with findings from recent studies (37,38). MRD is a crucial biomarker in AML and is applied in prognostic monitoring (39,40). Getta et al. reported that both multicolor flow cytometry and NGS can be used to monitor MRD for AML relapse prediction, but clinical factors were not considered (41). Another study proposed a prognostic model for the prediction of 3-year OS, with an AUC of 0.74, without considering the MRD status or treatment choice (including HSCT) and analyzing only a few gene mutations (17). Damm et al. proposed a model for NK-AML prognosis that includes several gene mutations and clinical characteristics but lacks treatment response analysis (42). To further improve the performance of genomic analysis-based models, we combined GMS, MRD, and clinical features to build GMSN for relapse prognosis. We recognize that it is crucial to consider both the risk category of the European LeukemiaNet guideline and therapy response when applying GMSN scores, so the selected clinical features included risk category, Cycle3rd status, MRD3rd, and consolidation treatment choice. This study demonstrated that the combination of gene mutations, MRD, and treatment choice successfully differentiated patients at high, intermediate, and low risk of relapse. Therefore, GMSN has potential as a predictive tool for NK-AML outcomes.
Nevertheless, this study had some limitations. First, we mainly used NGS and clinical data to establish the predictive model for NK-AML relapse. Multi-omics, including DNA methylation analysis and proteomics, should be considered in future studies. Second, our study was a pilot study and focused only on the application of NGS. A larger prospective multicenter study should be performed to extensively validate the prognostic value of GMS/GMSN.

Cycle3rd, after the 3rd chemotherapy cycle; GMS, genomic mutation signature; MRD3rd, measurable residual disease after the 3rd chemotherapy cycle; sAML, secondary acute myeloid leukemia.
were excluded due to early death (n = 31), loss to follow-up (n = 21), or missing MRD data during the two cycles of induction chemotherapy. Of the remaining 370 patients, patients lost to follow-up (n = 13) or without MRD data (n = 10) were excluded after another cycle of consolidation chemotherapy. The remaining 347 patients (the First Affiliated Hospital of Zhengzhou University, n = 197; Henan Cancer Hospital, n = 106; the First Affiliated Hospital of Xinxiang Medical University, n = 25; and Huaihe Hospital of Henan University, n = 19) were enrolled and randomly divided into a training cohort (n = 209) and a validation cohort (n = 138).

Figure 1

Table 1. Characteristics of patients in the training and validation cohorts.

Table 3. Univariate and multivariate analysis of cumulative relapse in the training cohort.