Unique mutation and gene expression in High Hyperdiploid Acute Lymphoblastic Leukaemia with whole chromosome uniparental isodisomy


 Whole genome copy number alterations, targeted gene-panel and RNA sequencing were used to investigate differences in high hyperdiploid (HH) B-cell acute lymphoblastic leukemia (ALL), with and without whole chromosome uniparental isodisomy (wUPD). Patients with wUPD demonstrated a higher modal number of chromosomes with gains of chromosome 5. Mutations in genes within epigenetic pathways with upregulation of genes involved in cellular response to stress and stimuli were seen in wUPD, in contrast to mutations in RAS/RTK pathways and upregulation of genes in RNA Polymerase III pathway in noUPD. AP-1 transcription factors and NUDT18, potentially implicated in thiopurine drug resistance, were upregulated in wUPD. wUPD was associated with female gender, higher presenting white cell count and lower end of induction minimal residual disease levels, though survival rates were similar in the two groups. Genome-wide differences in HH ALL with and without UPD suggest plausible biological explanations for the heterogeneity in therapeutic response.


Unique mutation and gene expression in High
Hyperdiploid Acute Lymphoblastic Leukaemia with whole chromosome uniparental isodisomy Mayur  this report, we investigate copy number alterations, mutations and gene expression in HH with and without wUPD to investigate possible biological differences that may lead to variations in response to treatment.

Results
Chromosomal Gains in High Hyperdiploid ALL One hundred six patients with HH were identi ed in 303 BCP-ALL patients sequentially analysed between 2015-2019. 58 HH patients were analysed with whole genome SNP arrays (Supplementary Table 1 and Supplementary Fig. 1). Detection of copy neutral loss of heterozygosity (LOH) across most chromosomes with an accompanying TP53 mutation identi ed a patient with masked hypodiploidy who was excluded from further analyses. Similar to previous reports, the most frequent gains of chromosomes were + X, + 4, +6, + 10, +14, + 17, +18 and + 21 with modal numbers ranging from 52-67 chromosomes ( Fig. 1A) 1,11,19−21 . Majority of gains (83%) were trisomies, except for chromosome 21, which was tetrasomic in 48 and pentasomic in two patients. All tetrasomies were duplications of two homologues (2:2 pattern), except in four patients (3 patients with tetrasomy 21 and one with tetrasomy 10). The only chromosome loss detected was a monosomy 7 seen in a UPN7. Subclonal chromosomal gains were seen in 6 patients (Supplementary Table 2.1).
In UPN 8, an IKZF1 deletion (exon 2-8) was detected on a single homologue of a trisomic chromosome 7. A 37.1Kb deletion in 16q22.1 was detected in UPN 16. This resulted in an in-frame fusion between exon 2 of RIPOR1 and exon 3 of CTCF, veri ed at transcript level, resulting in a premature stop codon and presumably loss of CTCF function (Fig. 1E). A RIPOR1::CTCF fusion has been previously reported in acute myeloid leukaemia 30 . A similar loss of CTCF function as a result of a downstream deletion, resulting in a CTCF::PARD6A fusion has been previously described in HH ALL. 13 Deletions in 13q12.2, as previously reported, 11,31 were not observed in this cohort. An AA genotype (rs10994982) of the ARID5B gene reported to be associated with an increased risk of HH BCP-ALL was detected in 45 (79%) patients 32,33 (Supplementary Table 2.4).

Frequency of Whole Chromosome Uniparental isodisomy in HH
Whole-chromosome uniparental isodisomy (wUPD) was seen in 26 (45%) patients (median 2; range 1-6 chromosomes), with subclonal wUPD of chromosome 9 in two patients ( Fig. 2A). wUPD was further con rmed by whole genome sequencing in 3 patients (Fig. 2B). No germline wUPD's were detected in remission DNA samples in 22 analysed patients, con rming the somatic origin of wUPD (Supplementary  Table 2.5). UPN10 was a female patient with copy neutral LOH of four copies of chromosome X.
More frequent gains of chromosomes + 5, +11, + 12, +16 and + 22 not typical of classical HH were observed in wUPD (Fig. 2D). Gains of chromosomes 17 and 18 were evenly distributed in wUPD and noUPD patients with gain of chromosome 5 more frequent in wUPD patients. Given the differences in patterns of gains between HH with wUPD and noUPD we further investigated for differences in somatic mutations and gene expression between the two groups.
Distinct Mutations are present in wUPD and noUPD Mutation analysis using a targeted gene panel (mean coverage 1311.4x, range 124.3-5122x, Supplementary Table 3, Supplementary Fig. 2) was performed on 20 noUPD, 21 wUPD and one HH patient for whom somatic copy number abnormalities (SCNA) data was not available. Two hundred seventy-three variants were predicted to be pathogenic or probably pathogenic based on the analysis done by an in-house informatics pipeline ( Supplementary Fig. 3 Fig. 4 and Supplementary Table 4) The majority of mutations, 41/60 (68%), were identi ed in genes with two copies. Seventeen (28%) were detected in genes with more than two copies. Mutations were infrequent in wUPD chromosomes, with only 3 (5%) detected (Fig. 3A). Two mutations were identi ed in samples where SCNA information was unavailable. The majority (86%) of mutations had variant allele frequencies (VAF) < 50% with a peak between 31-40% (Fig. 3A), suggesting that mutations occurred after chromosomal gains as previously reported 13 . In UPN1 with wUPD of chromosome 1, the somatic VAF of ARID1A (1p36.11) was 89.4%, suggesting that this mutation preceded the development of wUPD. Transitions (N = 31) were more frequent than transversions (N = 20) (Fig. 3B). The most frequent mutational signature was C > T transition, all involving CpG dinucleotides in both groups, suggesting common endogenous mutagenic mechanisms as previously reported 13 .
As previously described in HH ALL the most frequent mutations were in genes belonging to the RAS, epigenetic and chromatin modifying pathway 13 (Fig. 3C)., Though the numbers are small mutations involved in the RAS pathway (KRAS,NRAS) occurred more frequently in noUPD, while genes involving epigenetic and chromatin modifying pathways(KMT2D,ARID1A) were more likely to be mutated in wUPD patients (Fig. 3D). Mutations in KRAS and NRAS genes occurred primarily in described hotspot regions. The few in non-hotspot regions (K117N, A146T, E63K) have been described previously in BCP-ALL and other cancers 13,34−38 (Fig. 3E) The RAS mutations were mutually exclusive except in UPN30, where KRAS and NRAS VAFs of 15% and 16% were detected. This could possibly be due to mutations present in two different subclones. Of the 7 mutations identi ed in KMT2D, there were 6 missense mutations in the disordered protein region and one non-frame shift substitution in the PHD domain (Fig. 3F). Two of the three KMT2D mutations in this cohort, not previously reported were veri ed on RNA sequencing.

Differences in gene expression between wUPD and noUPD
RNA sequencing was performed on 14 wUPD and 14 noUPD patients. From 20,260 protein coding genes identi ed, the 1000 most variable genes (Mahalanobis distance method) were used for further analysis. The expression levels of diploid wUPD chromosomes lay between haploid and diploid chromosomes, suggesting that the loss of allelic variation was associated with a lower gene expression in chromosomes with wUPD 39 (Fig. 4A). Gene expression levels are strongly in uenced by SCNAs. In HH, this is at the level of whole chromosomal gains 13,17,40,41 . Unsupervised hierarchical clustering did not distinguish between HH with or without wUPD ( Fig. 4B) possibly because of the commonality of whole chromosomal copy numbers between the two groups ( Fig. 2A). Indeed, the commonality in segmental SCNAs between wUPD and noUPD provides a partial explanation for the grouping in the hierarchical cluster (  Table 5). The central wUPD cluster was heterogeneous for SCNAs and mutations. This suggested the possibility of a unique set of genes expressed differentially in wUPD, accounting for the cluster. To further identify genes uniquely expressed in wUPD, a differential gene expression analyses was performed with EdgeR (FDR < 0.1) 42 with gene enrichment analysis (TOPPR). 113 and 108 genes were signi cantly upregulated in wUPD and noUPD respectively (Supplementary Table 6.1). The expression of these genes were also in uenced by whole chromosomal gains (Fig. 5B). Restricting the analysis to log fold change of -1 to + 1, identi ed 71 and 18 genes signi cantly upregulated in wUPD and noUPD respectively (Supplementary Table 6.2 and Fig. 5C). Pathway analysis was performed using 84/89 protein coding differentially expressed genes (DEGs).As previously reported for HH ALL, in both wUPD and noUPD, pathway analyses showed an enrichment in transcription, translation and protein metabolism 43 (Supplementary Table 7). Genes involved in RNA polymerase II and III transcription were enriched in noUPD. Genes involved in cell response to stress stimuli and activation of AP-1 transcription family were among those enriched in wUPD (Fig. 5D).
In wUPD, copy number adjusted TPM values showed upregulation of the AP-1 transcription factors (FOS, FOSB and JUN), NUDT4 and NUDT18 and RAC2 ( Supplementary Fig. 6). RQ-RT-PCR con rmed the differences in expression in 10 of 15 genes analysed (Fig, 5E, NUDT15 analysed for comparison). Similar copy number adjusted TPM values suggested upregulation of ABL1 in noUPD and C-MYC in wUPD. RQ-RT PCR con rmed the expression of C-MYC (Fig. 5F), this was not validated for ABL1. Unavailability of samples with high expression of select genes may have affected the RQ-RT PCR results as well as variations in expression ( Supplementary Fig. 7). NRAS upregulation was unrelated to the activating mutations in NRAS ( Supplementary Fig. 7).
The downregulation of cohesion complex genes and CTCF resulting in weakening of topologically associated domain boundaries has been reported in HH 43 . We identi ed a RIPOR1-CTCF fusion with presumptive loss-of-function of CTCF (UPN16). In wUPD expression of CTCF (p = 0.0526) and cohesion complex gene SMC3 (p = 0.0357) were lower compared to noUPD patients ( Supplementary Fig. 8).

Clinical Comparison of patients with wUPD and noUPD
The demographics of the patients analysed in this study are shown in Table 2. wUPD were more likely to be female (p = 0.0497) and have a higher presenting WBC (p = 0.0004). A high end of induction MRD (≥ 10 − 4 ) was detected in 11/29 (38%) and 4/23 (17%) in noUPD and wUPD respectively (p = 0.1318). Relapse rates, event free and overall survival were comparable in the two groups ( Supplementary Fig. 9). Figure 6 shows the correlation of copy number alterations and mutations with therapeutic responses. There were no speci c SCNA or mutation patterns seen in relapsed patients. CREBBP mutation/deletions 14,44 , and in particular those associated with KRAS mutations 15

Discussion
To date, reports on genomic characterisation of HH ALL have come primarily from analyses of western patients. In this Indian cohort, chromosomal gains, modality, structural abnormalities, frequency of microdeletions in genes recurrently deleted in BCP-ALL (IKZF1, PAX5, CDKN2A/2B, ETV6) and mutations were similar to those previously reported 13,28,46,47 . Recurrent deletions at 13q12.2 that result in loss of TAD borders and overexpression of FLT3, 11,31 were not observed. wUPD in ALL occurs almost exclusively in HH ALL 16,18 . Mutations occurred primarily on disomic chromosomes with noUPD. Of the 3 mutations identi ed in wUPD chromosome, one is likely to have occurred prior to the development of wUPD. This suggests the development of wUPDs along with gains of chromosomes are initial events in the leukaemogenic process with mutations as secondary events. The expression of CTCF and SMC3 was lower in wUPD patients. The lower expression of CTCF and cohesion alters TAD boundaries impacting gene expression in HH ALLs 43 , with aberrant cohesion complexes associated with increased copy number heterogeneity at the level of whole chromosome copy numbers in HH ALL 48 . Thus, cohesion dysregulation is a potential mechanism for the higher modal numbers and more frequent gains of nonclassical chromosomes in wUPD HH ALL. A more detailed analysis using Assay for Transposase Accessible Chromatin (ATAC) sequencing, chromatin immunoprecipitation (ChIP) chromosome conformation capture, followed by sequencing to study the chromatin interactions and gene dosage difference between wUPD and noUPD patients is required to investigate this possibility.
In HH, trisomies of + 4, +10 and + 17 are associated with good 19,49,50 , and trisomies of 5 and 20 with poor outcomes 51,52 . Presence of + 17 and + 18 in the absence of + 5 and + 20 identi es subset of HH patients with a lower relapse rate and good overall survival rate 9 . Patients with wUPD were more likely to have + 5, though none of the patients in this series had a + 20. Patients with noUPD were more likely to have mutations in RTK/RAS pathways while genes involving epigenetic and chromatin modifying pathways were more likely to be mutated in wUPD. Along with a higher frequency of RAS mutations, NRAS expression (unrelated to mutations) was higher in noUPD. A number of noUPD patients also had upregulation of ABL1. NRAS mutations 7 and ABL1 expression 53 are associated with high MRD levels in childhood ALL. This supports the observation that patients with noUPD had higher MRD levels.
Therapy for ALL patients with high MRD levels at the end of induction is intensi ed, as this is associated with improved outcomes 54 . The intensive phase of treatment lasts 6-9 months and this is followed by maintenance therapy for 2-3 years. The main drug used during this latter phase is 6-Mercaptopurine. Intolerance or resistance to 6-Mercaptopurine is associated with late relapses occurring shortly before or after stopping therapy, and late relapse are seen in the majority of HH ALL. The AP-1 factors (FOS, JUN and FOSB) and RAC2, upregulated in wUPD, have been linked to resistance to 6-Mercaptopurine 55,56 . NUDT18, a member of Nudix hydroxylase family, was upregulated in wUPD. Germline polymorphisms in NUDT15, another Nudix hydroxylase, results in decreased activity of the enzyme. This results in an increase in the incorporation of thiopurine metabolites during the maintenance phase of therapy leading to excessive toxicity 57 . There is considerable overlap in the substrate activities of NUDT15 and NUDT18 including the dephosphorylation and inactivation of thiopurine metabolites 58 . Arguably somatic overexpression of either gene could promote resistance to thiopurines and associate with late relapse.
While higher MRD levels are associated with higher relapse rates, so is thiopurine resistance, and overall outcomes of between those with and without wUPD were comparable. Furthermore, in wUPD, but not in noUPD, there was also an elevated expression of C-MYC unrelated to copy number change of 8q24 or translocation involving that region. Isolated C-MYC expression has been previously described at the protein level in BCP-ALL, but its prognostic signi cance is unknown 59 .
Given the favourable outcomes of HH ALL, we would not expect to see a survival differences in this small cohort. Similarly, identifying differences in gene expression between the two groups would also bene t from a larger number of patients especially as the similarity in copy number variation in uences expression. Nevertheless, a number of differences of potential prognostic signi cance were identi ed in patients with wUPD and noUPD. Patients with wUPD, were more likely to be females ,have a higher presenting white cell count with a better MRD response. DEG analyses showed upregulation of cell cycle/mitosis genes (CCNB1, CDC20 and PLK2) suggesting that wUPD is associated with cell proliferation. Though our investigations suggest possible biological explanations for variations in the therapeutic response in HH ALL, it is likely the complex genomic changes in HH ALL, result in therapeutically signi cant genetic changes in individual patients. An example in this study is UPN20 which had deletions of IKZF1 (e2-3), and PAR1 with a JAK2 R683G activating point mutation. He suffered late relapse and retrospective FISH analysis showed a CRLF2 gene rearrangement. In our clinical setting, FISH and karyotype are used for initial diagnoses and treatment strati cation 60 and would not be able to identify such patients.
This study provides possible explanations, based on differences in gene expression, that merit further functional investigation. Further dissection of the response variability, prognostication and alternative therapeutic interventions requires collaborative analyses to which our data obtained from a systematically treated cohort of patients is contributory.

Patients
All patients included in the study were diagnosed as precursor BCP-ALL, based on morphology and immunophenotyping, and treated on the Indian Childhood Collaborative Leukaemia Group

Banking and Nucleic Acid Extraction
Mononuclear (MNC) fractions were obtained with Ficoll density gradient centrifugation from bone marrow aspirates collected at diagnosis and remission. Genomic DNA and total RNA were extracted from MNC cells using QIAamp Blood kit from Qiagen and Trizol reagent (Invitrogen), respectively. The concentration and purity of the extracted nucleic acids were assessed using a Thermo Nanodrop ND-Scienti c, USA) and gel electrophoresis. RNA quality was assessed using Agilent TapeStation 4200. Samples demonstrating > 6.0 RINe values were selected for further downstream analysis.

Cytogenetics
Cytogenetic analysis of pre-treatment bone marrow or peripheral blood was performed using karyotyping and FISH analysis using standard protocols 62 . High hyperdiploidy was diagnosed integrating the results of karyotyping and FISH analysis as described before 60 .
SNP arrays analysis SNP array analysis was performed using CytoScan™ HD, (ThermoFiSher, USA) platform as per manufacturer's instructions. Brie y, DNA extracted from mono nuclear fractions obtained from bone marrow aspirates collected at diagnosis was digested and ligated to NSP1 adaptors followed by ampli cation using Titanium Taq. The PCR products were puri ed, fragmented, biotin labelled and hybridised to Cytoscan™ HD GeneChip. Data was analysed using the Chromosome Analysis Suite 3.1.1 (ChAS 3.1.1, ThermoFISHER) software suite based on GRCh37/hg19 as previously described 63,64 .
Remission DNA samples from patients with one or more whole chromosome uni parental isodisomy (wUPD) at diagnosis on the Cytoscan™ HD analysis were analysed with an In nium 850K Beadchip array SNP. The samples were processed in accordance with the manufacturers protocols. Illumina generated .GCT les were imported into BlueFuse Multi version 4.5 (Illumina) and analyzed using the default BlueFuse algorithm.  Fig. 10). Mutations not previously described; or not validated by sequencing; or not present on RNA sequencing; or uncertain on IGV inspection, were excluded.

Transcriptomic Analysis
RNASeq samples were rst mapped to the reference human genome (Grch37 assembly) using hisat2 alignment program. Aligned bam les obtained from the program were further used to generate the hisat2 speci c counts for annotated human Ensembl Grch37 genes using featureCounts program. Finally, the count les were then used to identify a common set of differentially expressed (FDR < 0.1) genes from EdgeR and DESeq2 program for downstream analysis (Supplementary Table 8). We applied independent hypothesis weighting (IHW) method 65 during multiple testing procedure that increases power compared to the method of Benjamini and Hochberg by assigning data-driven weights to each hypothesis. The principal component analysis (PCA) of gene expression pro le of the patient samples was performed with the top-1000 highly variable gene set. The PCA and subsequent cluster analysis were performed using FactoMineR R package. We further estimated the copy-number normalized expression signal of each gene after dividing the raw counts of the genes by their observed copy-number values and then multiplied by their expected diploid copy-number.

Whole Genome Analysis
Fastq reads were aligned against build 37 of the human reference genome using BWA mem (0.7.15) software. The resulting les were converted to compressed binary format (BAM), sorted by coordinate, indexed, and marked for PCR duplicate reads using the Picard toolkit (bundled with GATK v4.1.8.1). Base quality scores were recalibrated with GATK to improve call accuracy and variants were called with the Haplotype Caller algorithm from GATK.  Fig. 11) to derive copy number status of each chromosome. Chromosomes that showed gains on SNP array analysis exhibited a corresponding high median coverage ratio (> 1.2) with respect to the reference.

Analysis of copy neutral loss of heterozygosity
Segments of genome with UPD are characterized by a high percentage of homozygous calls. In order to con rm the wUPD reported on SNP array analysis, variant calls with the highest con dence were evaluated in the whole genome data. Speci cally, a high con dence subset of single nucleotide variants in the samples (calls marked as "PASS", with a GQ>= 30) that had also been reported in 1000 genomes (phase 3) were shortlisted for the analysis. The fraction of homozygous variants per chromosome arm was computed and compared with variant calls from ve 1000 genomes (phase 3) samples of similar sex and ethnicity (population code BEB) as reference ( Supplementary Fig. 12). Chromosomes reported with wUPD on SNP array analysis showed signi cantly higher than expected fraction of homozygous SNV calls (Fig. 2B) in whole genome sequencing. wUPDs of chromosomes 2, 9, 19 in UPN25, chromosomes 15, 19, 20 in UPN13 and in chromosome 10 in UPN24, were con rmed by analysis of whole genome data.  The Targeted sequencing data, whole genome sequencing data and RNA sequencing data and the SNP array data have been submitted in SRA and GEO databases at NCBI, Bio-project Accession ID: PRJNA765140 Table 2   Table 2 is available in the supplementary les section. . The presence of a fusion transcript was con rmed by RT-PCR (right panel). The fusion transcript encodes for a non-functional CDS due to early termination of stop codon, resulting in loss of function of both RIPOR1 and CTCF genes. The schematic also shows a deletion involving exons 6-12 of the CTCF gene and exons 10-12 of ACD resulting in a CTCF-PARD6A fusion reported as reported earlier (13).     Summary of patient demographics, clinical data including WBC count, prednisolone response, EOI MRD, copy number status and mutations of 58 HH ALL patients.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.