Table 1 describes the clinicopathological characteristics of patients. In the descriptive cohort, the average age was 10 years, and the disease was more frequent in males (57%). Sixty percent of patients had normal karyotype, and t(1;19) was the most frequent genetic alteration (14.3%). The most frequent risk group was intermediate (71.4%); extramedullary infiltration was 11%. More than 90% of patients responded to corticosteroid therapy. MRD was detected by FCM at day 15 and the end of induction; 20 patients did not respond at day 15 (71%), while only 9 patients did not respond (32%) at the end of induction. MRD-/- patients corresponded to 25%, while MRD+/+ (refractory) patients represented 29%. Partial response refers to patients who did not respond at day 15 but achieved complete remission at the end of induction (46%). Relapse occurred in 4 patients (14%), and 9 patients (32%) died. The data obtained in the validation cohort are like those of the descriptive cohort.
Table 1
CLINICAL CHARACTERISTICS OF THE PATIENTS INCLUDED IN THE STUDY
CLINICAL CHARACTERISTICS | DESCRIPTIVE COHORT (n = 28) | VALIDATION COHORT (n = 18) | |
| n (%) | MEAN (RANGE) | n (%) | MEAN (RANGE) | p - value |
AGE (years) | | 10 (3–17) | | 8.26 (1–15) | 0.17 |
SEX | | | | | |
Female | 12 (43) | | 9 (50) | | 0.22 |
Male | 16 (57) | | 9 (50) | |
WBC (cel/µ) | | 61.85 (1.43–487) | | 40.39 (0.8–269) | 0.47 |
RISK | | | | | |
Low | 1 (3.6) | | 3 (16.7) | | 0.06 |
Intermediate | 20 (71.4) | | 11 (61.1) | |
High | 7 (25) | | 4 (22.2) | |
EXTRAMEDULLAR INFILTRATION | | | | | |
Yes | 3 (11) | | 3 (17) | | 0.55 |
No | 25 (89) | | 15 (83) | |
CORTICOID RESPONSE | | | | | |
Yes | 26 (93) | | 17 (94) | | 0.83 |
No | 2 (7) | | 1 (6) | |
CARIOTYPE/MOLECULAR ALTERATIONS | | | | | |
Normal | 17 (60.7) | | 11 (61.1) | | 0.08 |
Hipodiploid | 0 (0) | | 0 (0) | |
Hyperdiploid | 1 (3,50) | | 3 (16.7) | |
t(1;19) | 4 (14,3) | | 1 (5.6) | |
t(4;11) | 0 (0) | | 0 (0) | |
t(9;22) | 1 (3.5) | | 0 (0) | |
t(12;21) | 0 (0) | | 1 (5.6) | |
Amp Cr 21 | 1 (3.5) | | 0 (0) | |
Other | 4 (14.3) | | 2 (11.1) | |
MRD DAY 15 | | | | | |
Positive | 20 (71) | | 11 (61) | | 0.46 |
Negative | 8 (29) | | 7 (39) | |
MRD END OF INDUCTION | | | | | |
Positive | 9 (32) | | 3 (17) | | 0.24 |
Negative | 19 (68) | | 15 (83) | |
TRUE RESPONSE | | | | | |
MRDday15+ & MRDday33+ | 8 (29) | | 3 (17) | | 0.53 |
MRDday15- & MRDday33- | 7 (25) | | 7 (39) | |
Parcial | 13 (46) | | 8 (44) | |
RELAPSE | | | | | |
Yes | 4 (14) | | 3 (17) | | 0.82 |
No | 24 (86) | | 15 (83) | |
DEATH | | | | | |
Yes | 9 (32) | | 3 (17) | | 0.17 |
No | 19 (68) | | 15 (83) | |
MRD indicates minimal residual disease; RNAseq, RNA sequencing, RT-qPCR, quantitative reverse transcription PCR |
To identify DEGs between responders and non-responders, we balanced the groups and randomly compared 8 responders vs. 8 non-responders. A supervised hierarchical cluster analysis compared gene expression profiles between BM samples at diagnosis from responders vs. non-responders at day 15 of induction treatment. We detected 159 DEGs (Fig. 1a). Similarly, a second supervised analysis compared BM samples at diagnosis from responders vs. non-responders at the end of induction treatment and found 200 DEGs (Fig. 1b). The third supervised approach was done between BM samples at the time of diagnosis from MRD+/+ vs. MRD-/- patients and found 153 DEGs (Fig. 1c). Finally, a Venn diagram was used to identify common DEGs between the three analyses, finding 50 of them (Fig. 1d). In non-responders, 46 of 50 common genes were up-regulated, and 4 of 50 were down-regulated (see Supplementary Table S1 online). Uniprot analysis showed that some genes were associated with different functions, including DNA binding (HOXB2, RFX8), transmembrane (SLC18A2, SLC45A3, BOC, ITGA3), lipidation (TNFRSF10C, PTGIR), among others. In addition, enrichment analysis showed that these genes are involved in processes like inflammatory response, transmembrane transport of sugars, cellular response to growth factor stimulus, etc.
To identify GDMCpGs, we used the same groups as in Fig. 1. A total of 1 834 GDMCpGs were obtained when comparing diagnosis BM of patients with MRD + and MRD- at day 15 (Fig. 2a); 2 382 at day 33 (Fig. 2b); and 2 077 between MRD+/+ and MRD-/- patients (Fig. 2c). We identified 687 common GDMCpGs across all comparison groups (Fig. 2d).
To identify genes with both differential expression and methylation, we matched the previously obtained DEGs and GDMCpGs. We found 38 genes at day 15 of induction treatment, 52 genes at the end of induction, and 40 genes between patients with MRD+/+ and MRD-/-. Finally, a Venn diagram analysis identified 10 common genes (HOXB2, TMEM51, F2RL1, PDZRN3, BOC, TANC1, ASCL2, FBN2, MIR4435-2HG, and SCL18A2) (Fig. 2e). We found an inverse correlation between gene overexpression and hypomethylation in only four of them, which were present in all three comparison groups (BOC, MIR4435-2HG, SCL18A2, and ASCL2)(Table 2). We selected six additional DEGs from the analyses between the MRD+/+ vs. MRD-/- comparison; these genes also had GDMCpGs, and there was an inverse correlation between them. The resulting genes were DAPK1, NPDC1, CNKSR3, ITGA6, SCL45A3, and CTHRC1(see Supplementary Table S2 online).
Table 2
COMMON GENES BETWEEN THREE ANALYSIS GROUPS AND THEIR CORRESPONDS CpGDM
GENE/CpGs | | | MRD DAY 15 | MRD DAY 33 | TRUE RESPONSE | |
| FC | METHYLATION STATUS | p-value | Rho | p-value | Rho | p-value | Rho | |
HOXB2 | 5,1 | | | | | | | | |
cg22777724 | | Hypermethylation | 0.45 | 0.2 | 0.08 | 0.47 | 0.08 | 0.49 | |
cg09313705 | | Hypermethylation | 0.32 | 0.26 | 0.18 | 0.37 | 0.45 | 0.22 | |
cg22807449 | | Hypermethylation | | | 0.18 | 0.53 | | | |
TMEM51 | 5,4 | | | | | | | | |
cg24835051 | | Hypomethylation | | | 0.004 | -0.73 | | | |
cg15319451 | | Hypomethylation | | | 0.009 | -0.67 | | | |
cg13210325 | | Hypermethylation | 0.09 | 0.43 | | | 0.54 | 0.18 | |
cg07180646 | | Hypomethylation | 0.08 | -0.45 | | | | | |
cg11618537 | | Hypomethylation | | | | | 0.2 | -0.37 | |
F2RL1 | 3,5 | | | | | | | | |
cg12518146 | | Hypermethylation | 0.001 | 0.72 | 0.003 | 0.74 | 0.005 | 0.74 | |
PDZRN3 | 5,4 | | | | | | | | |
cg04340156 | | Hypomethylation | | | 0.27 | -0.31 | | | |
cg11461794 | | Hypomethylation | | | 0.86 | 0.05 | 0.87 | 0.049 | |
cg24353443 | | Hypomethylation | 0.66 | -0.11 | | | 0.62 | -0.14 | |
cg05502701 | | Hypomethylation | 0.6 | -0.14 | | | 0.45 | -0.22 | |
BOC | 4,6 | | | | | | | | |
cg11205552 | | Hypomethylation | | | 0.68 | -0.11 | | | |
cg22413784 | | Hypomethylation | 0.05 | -0.48 | 0.02 | -0.58 | 0.01 | -0.68 | |
TANC1 | 3,5 | | | | | | | | |
cg21390755 | | Hypomethylation | 0.11 | -0.4 | 0.03 | -0.57 | 0.12 | -0.45 | |
ASCL2 | 5,4 | | | | | | | | |
cg22346124 | | Hypomethylation | | | 0.13 | -0.42 | | | |
cg10290276 | | Hypomethylation | | | 0.03 | -0.55 | | | |
cg13930892 | | Hypomethylation | 0.004 | -0.66 | 0.01 | -0.65 | 0.02 | -0.6 | |
cg20912770 | | Hypomethylation | | | 0.05 | -0.52 | | | |
cg10526374 | | Hypomethylation | | | 0.05 | -0.51 | | | |
cg06263495 | | Hypomethylation | 0.01 | -0.6 | 0.02 | -0.59 | | | |
cg26051413 | | Hypomethylation | | | | | 0.52 | -0.19 | |
cg11644479 | | Hypomethylation | | | | | 0.02 | -0.62 | |
cg13762320 | | Hypomethylation | | | | | 0.03 | -0.58 | |
cg19284039 | | Hypomethylation | | | | | 0.09 | -0.48 | |
FBN2 | -6 | | | | | | | | |
cg23696863 | | Hypomethylation | 0.05 | 0.49 | 0.09 | 0.46 | 0.12 | 0.45 | |
MIR4435-2HG | 4,4 | | | | | | | | |
cg24783876 | | Hypomethylation | 0.02 | -0.55 | 0.05 | -0.51 | 0.06 | -0.53 | |
SLC18A2 | 8,7 | | | | | | | | |
cg03570973 | | Hypomethylation | 0.001 | -0.72 | 0.001 | -0.77 | 0.0009 | -0.8 | |
cg19721867 | | Hypomethylation | | | 0.003 | -0.72 | | | |
cg10245915 | | Hypomethylation | | | 0.01 | -0.65 | | | |
cg01043119 | | Hypomethylation | | | 0.0005 | -0.8 | | | |
cg19617377 | | Hypomethylation | | | 0.003 | -0.72 | | | |
cg00512279 | | Hypomethylation | | | 0.001 | -0.76 | 0.03 | -0.58 | |
cg08521987 | | Hypomethylation | | | 0.05 | -0.51 | 0.007 | -0.7 | |
cg27186877 | | Hypomethylation | | | | | 0.03 | -0.59 | |
Ten upregulated DEGs with hypomethylated CpGs (BOC, MIR4435-2HG, SCL18A2, ASCL2, DAPK1, NPDC1, CNKSR3, ITGA6, SCL45A3, and CTHRC1) were selected for verification by RT-qPCR. The genes showing correlation between the two techniques were DAPK1, CNKSR3, MIR4435-HG2, CTHRC1, NPDC1, SLC45A3, ITGA6, and ASCL2(see Supplementary Fig. S1 online). BOC and SLC18A2 did not demonstrate correlation and were excluded from further analysis.
All samples (descriptive and validation cohort) were used to verify whether gene expression was different between responders and non-responders at each time point of the evaluation. Interestingly, according to the RNA-seq results, MIR4435-2HG, CNKSR3, ITGA6, NPDC1, and DAPK1 were overexpressed in MRD + in all comparison groups and had differential FC values between conditions (see Supplementary Fig. S2 online).
We used logistic regression to test whether genes could predict response to induction chemotherapy. MIR4435-2HG was found to be the best predictor of whether a patient will be a true responder, refractory (Figs. 3a and b), or non-responder at day 15 of induction (Figs. 3c and d). Regarding DAPK1, ASCL2, SCL45A3, NPDC1 and ITGA6, it was observed that they could also predict response at day 15 or whether the patient will be a true responder or refractory (see Supplementary Fig. S3 and S4 online); however, both MIR4435-2HG and the other genes do not predict whether the patient will respond at the end of induction (see Supplementary Fig. S5 online).
To test whether MIR4435-2HG could also predict the risk of death, we performed logistic regression using our RNA-seq counts. We observed that MIR4435-2HG can predict death with good sensitivity and specificity (Fig. 3e). Finally, patient overall survival was compared based on MIR4435-2HG gene expression. We observed that patients with MIR4435-2HG overexpression have worse survival than those without it (Fig. 3f).
When analyzing the relationship between normalized RNA-seq counts of MIR4435-2HG and the probability of non-response to treatment at day 15 or being refractory, we found that patients with counts > 5.1 had a 66% probability of being refractory to treatment. This probability increased proportionally to gene expression. In addition, for patients with counts > 4.97, the probability of therapeutic failure at day 15 was > 60% and increased progressively as MIR4435-2HG overexpression increased. Similarly, the probability of death increased when counts were > 7.0 (see Supplementary Table S3 online).
To test whether the identified genes could also be useful as prognostic biomarkers of relapse or death, first, we compared FC values between patients with MRD-/- and patients with relapsed using RT-qPCR analysis. We found that DAPK1 and CNKSR3 had higher FC in relapse samples than in MRD-/- samples, while MIR4435-2HG showed a tendency (Fig. 4a). No difference was observed with ITGA6, NPDC1, ASCL2, CTHRC1 and SLC45A3(see Supplementary Fig. S6 online).
To validate the role of MIR4435-2HG, we analyzed its expression in a larger patient cohort with longer follow-up, using normalized RNA-seq counts obtained from the BM of 58 children with recurrent ALL from TARGET Pan-Cancer database. The 5-year survival analysis showed that MIR4435-2HG overexpression is a poor prognostic variable for patient survival (Fig. 4a). Interestingly, average counts in this cohort (7.82) correlate with our cutoff points, which would give a 100% probability of being refractory and an 80% probability of dying. Nevertheless, in this database, DAPK1 and CNKSR3 were not correlated with survival.
Four Cox regression models were performed to determine which variable might affect patient survival. The first model employed the clinical variables used currently to define the risk of death; in this model, no variable conferred risk of death. The second model used the same clinical variables and gene overexpression; in this model, no variable conferred risk either. However, in the third model, when using only MRD and the selected genes, MIR4435-2HG overexpression was the only variable that increased the risk of death 29 times. Similarly, in the fourth model, when evaluating the entire gene profile, MIR4435-2HG overexpression affected the risk of death (Table 3). Strikingly, survival analysis showed that patients overexpressing MIR4435-2HG had worse survival compared to those who did not overexpress MIR4435-2HG(Fig. 4b).
Table 3
MULTIPLE COX REGRESSION USING CURRENTLY CLINICAL VARIABLES AND GENE PROFILE
| MODEL 1. CURRENTLY USED CLINICAL VARIABLES | MODEL 2. CURRENTLY USED CLINICAL VARIABLES AND GENE PROFILE | MODEL 3. GENE PROFILE AND MRD | MODEL 4. GENE PROFILE | |
PARAMETER | | | p-value | OR (95%CI) | p-value | OR (95%CI) | p-value | OR (95%CI) | |
Age | 0.76 | NC | 0.46 | NC | | | | | |
WBC | 0.62 | NC | > 0.99 | NC | | | | | |
Extramedullar infiltration | > 0.99 | NC | > 0.99 | NC | | | | | |
Corticoid response | 0.15 | NC | > 0.99 | NC | | | | | |
MRD + day 15 | 0.36 | NC | 0.51 | 0.32 (0.002–11.91) | 0.34 | 0.25 (0.006–3.80) | | | |
MRD + final induction | 0.22 | NC | 0.36 | 3.71 (0.1786–121.0) | 0.23 | 3.49 (0.357–37.17) | | | |
MIR4435-2HG | | | 0.22 | NC | 0.01 | 29.42 (2.52-1 067) | 0.005 | 16.47 (2.49–157.2) | |
DAPK1 | | | > 0.99 | NC | 0.41 | 3.12 (0.16–96.95) | 0.32 | 2.87 (0.26–24.31) | |
ITGA6 | | | 0.18 | NC | 0.11 | 0.03 (0.00002–1.59) | 0.14 | 0.09 (0.002–2.27) | |
NPDC1 | | | > 0.99 | NC | 0.10 | 13.22 (0.68–631) | 0.09 | 9.40 (0.90–221.5) | |
CNKSR3 | | | 0.92 | NC | 0.87 | 1.17 (0.11–9.55) | 0.97 | 0.97 (0.11–7.07) | |
ASCL2 | | | > 0.99 | NC | 0.73 | 1.58 (0.08–29.20) | 0.92 | 0.88 (0.05–11.05) | |
CTHRC1 | | | > 0.99 | NC | 0.52 | 0.43 (0.02–5.49) | 0.24 | 0.26 (0.01–2.56) | |
SCL45A3 | | | > 0.99 | NC | 0.51 | 2.55 (0.08–45.41) | 0.54 | 2.11 (0.11–24.78) | |
MRD indicates minimal residual disease; NC, No calculable, WBC, white blood counts |
However, since MRD is the most used variable to define risk of death, for survival analysis, we considered MRD at the end of induction, effectively showing that MRD- patients have better survival than MRD+ (Fig. 4c). Interestingly, when we compared survival combining MRD + with MIR4435-2HG overexpression vs. MRD- with down-expression of MIR4435-2HG, we observed that these two variables separate the survival curves better, where patients with MRD + and MIR4435-2HG overexpression presented worse survival (Fig. 4d).
Finally, since more than half of our patients had normal karyotype, we wanted to test whether selected genes could improve risk classification in patients with normal karyotype. Surprisingly, we found that simultaneous overexpression of MIR4435-2HG, DAPK1, and ASLC2 was associated with poorer survival in these patients compared to those who did not overexpress them (Fig. 4e).