Patients
All patients included in the study were diagnosed as precursor BCP-ALL, based on morphology and immunophenotyping, and treated on the Indian Childhood Collaborative Leukaemia Group (ICICLE)/(INPOG-ALL-15-ICiCLE-ALL-14; CTRI/2015/12/006434) 61 protocol between 2015 to 2019. Consent was obtained prior to start of treatment and for collection and banking of samples. The institutional review board of the institute approved the study.
Banking and Nucleic Acid Extraction
Mononuclear (MNC) fractions were obtained with Ficoll density gradient centrifugation from bone marrow aspirates collected at diagnosis and remission. Genomic DNA and total RNA were extracted from MNC cells using QIAamp Blood kit from Qiagen and Trizol reagent (Invitrogen), respectively. The concentration and purity of the extracted nucleic acids were assessed using a Thermo Nanodrop ND-2000 UV-Vis Spectrophotometer (Wilmington, DE, USA) and Qubit 2.0 Fluorometer (Thermo Fisher Scientific, USA) and gel electrophoresis. RNA quality was assessed using Agilent TapeStation 4200. Samples demonstrating > 6.0 RINe values were selected for further downstream analysis.
Cytogenetics
Cytogenetic analysis of pre-treatment bone marrow or peripheral blood was performed using karyotyping and FISH analysis using standard protocols 62. High hyperdiploidy was diagnosed integrating the results of karyotyping and FISH analysis as described before60.
SNP arrays analysis
SNP array analysis was performed using CytoScan™ HD, (ThermoFiSher, USA) platform as per manufacturer’s instructions. Briefly, DNA extracted from mono nuclear fractions obtained from bone marrow aspirates collected at diagnosis was digested and ligated to NSP1 adaptors followed by amplification using Titanium Taq. The PCR products were purified, fragmented, biotin labelled and hybridised to Cytoscan™ HD GeneChip. Data was analysed using the Chromosome Analysis Suite 3.1.1 (ChAS 3.1.1, ThermoFISHER) software suite based on GRCh37/hg19 as previously described 63,64.
Remission DNA samples from patients with one or more whole chromosome uni parental isodisomy (wUPD) at diagnosis on the Cytoscan™ HD analysis were analysed with an Infinium 850K Beadchip array SNP. The samples were processed in accordance with the manufacturers protocols. Illumina generated .GCT files were imported into BlueFuse Multi version 4.5 (Illumina) and analyzed using the default BlueFuse algorithm.
Targeted Panel Sequencing
A targeted panel of 95 genes (Supplementary Table 3), based on reported mutations in BCP-ALL patients 28 was designed using the Agilent SureDesign software to identify single nucleotide variants (SNV). The probe design included 102966 amplicons covering 1.977 million bases; all coding exons, intron-exon boundaries and average 1000 bp upstream promoter sequences within selected genes. Target regions were captured using Agilent HaloPlex High-Sensitivity Target Enrichment kit (Agilent technologies, Santa Clara, USA) according to the manufacturer’s instructions. Library fragments were amplified by PCR followed by quality assessment using D1000 High-Sensitivity Screen-Tape analysis. Qubit fluorometry as well as a quantitative PCR-based (qPCR) approach using KAPA library quantification kit from Roche Diagnostics were used to quantify each of the libraries before pooling them for sequencing. An expected average fragment size of 230 to 250 bp were considered as a typical library fragment to perform paired-end (2x75 bp) sequencing using Next-Seq 550 Illumina platform (Illumina, San Diego, CA, USA).
RNA Sequencing
Whole-transcriptome sequencing was performed using the Illumina TruSeq stranded Total RNA library prep protocol. Quantification of libraries was performed using Qubit 2.0. After passing the quality control parameters the libraries were sequenced using Next-Seq 550 instrument at 2x150 bp read-length. An average of 50-55 million reads were generated for each sample.
Informatics
Fastq files were quality assessed and mapped to the human genome 19 using Burrows-Wheeler aligner BWA, MEM algorithm and converted to BAM files. Alignment was refined with Broad GATK and quality controlled for miscalls and recalibrated prior to analysing variant calls. Three variant callers (VarScan, VarDict and Mutect2) were used to identify variants. Variants identified by at least 2 of the 3 tools were selected for gene-based functional annotation using ANNOVAR.
3408 variants were obtained after excluding synonymous variants. 2189 germline variants, identified from sequencing remission DNA samples, were excluded leaving 1219 single nucleotide variants (SNVs). The 1000 genome and the dbSNP database were used to filter out single nucleotide polymorphisms as were variants with low allele frequencies (<12.25%), leaving 564 variants. PROVEAN, SIFT, FATHMM and MutationTaster (http://www.mutationtaster.org/) tools were used to predict the deleterious effect of mutations. The M-CAP pathogenicity classifier was used to classify rare missense variants. 273/564 variants were predicted to be pathogenic or probably pathogenic by at least two of 5 programs. Frameshift insertions, deletions substitutions and stop gain variants were considered pathogenic by default. Variants with less than 10 reads were excluded. 270 SNVs were inspected manually with the Integrative Genomic Viewer (IGV), and 138 variants validated by Sanger sequencing and IGV visualisation of RNA sequencing data (Supplementary Fig. 10). Mutations not previously described; or not validated by sequencing; or not present on RNA sequencing; or uncertain on IGV inspection, were excluded.
Transcriptomic Analysis
RNASeq samples were first mapped to the reference human genome (Grch37 assembly) using hisat2 alignment program. Aligned bam files obtained from the program were further used to generate the hisat2 specific counts for annotated human Ensembl Grch37 genes using featureCounts program. Finally, the count files were then used to identify a common set of differentially expressed (FDR < 0.1) genes from EdgeR and DESeq2 program for downstream analysis (Supplementary Table 8). We applied independent hypothesis weighting (IHW) method 65 during multiple testing procedure that increases power compared to the method of Benjamini and Hochberg by assigning data-driven weights to each hypothesis. The principal component analysis (PCA) of gene expression profile of the patient samples was performed with the top-1000 highly variable gene set. The PCA and subsequent cluster analysis were performed using FactoMineR R package. We further estimated the copy-number normalized expression signal of each gene after dividing the raw counts of the genes by their observed copy-number values and then multiplied by their expected diploid copy-number.
Whole Genome Analysis
Fastq reads were aligned against build 37 of the human reference genome using BWA mem (0.7.15) software. The resulting files were converted to compressed binary format (BAM), sorted by coordinate, indexed, and marked for PCR duplicate reads using the Picard toolkit (bundled with GATK v4.1.8.1). Base quality scores were recalibrated with GATK to improve call accuracy and variants were called with the Haplotype Caller algorithm from GATK.
Copy Number Analysis
Chromosome coverage was computed as the median of the coverage of each individual base in the chromosome. The coverage of a position in a chromosome was computed as the number of reads mapped to the chromosome that spanned the position. In computing the coverage of a position in the chromosome, only reads with a mapping quality >= 20 with base quality>= 10 were considered. Unlike targeted exome sequencing, whole genome sequencing yields uniform coverage. Chromosome 13 was selected as reference and the ratios of median coverages of other chromosomes were evaluated with respect to the reference chromosome (Supplementary Fig. 11) to derive copy number status of each chromosome. Chromosomes that showed gains on SNP array analysis exhibited a corresponding high median coverage ratio (> 1.2) with respect to the reference.
Analysis of copy neutral loss of heterozygosity
Segments of genome with UPD are characterized by a high percentage of homozygous calls. In order to confirm the wUPD reported on SNP array analysis, variant calls with the highest confidence were evaluated in the whole genome data. Specifically, a high confidence subset of single nucleotide variants in the samples (calls marked as “PASS”, with a GQ>= 30) that had also been reported in 1000 genomes (phase 3) were shortlisted for the analysis. The fraction of homozygous variants per chromosome arm was computed and compared with variant calls from five 1000 genomes (phase 3) samples of similar sex and ethnicity (population code BEB) as reference (Supplementary Fig. 12). Chromosomes reported with wUPD on SNP array analysis showed significantly higher than expected fraction of homozygous SNV calls (Fig. 2B) in whole genome sequencing. wUPDs of chromosomes 2, 9, 19 in UPN25, chromosomes 15, 19, 20 in UPN13 and in chromosome 10 in UPN24, were confirmed by analysis of whole genome data.
Structural Abnormalies (>5MB)
|
Number of patients (%)
|
1q duplication
|
27 (47%)
|
1q triplication
|
2(3.5%)
|
Isochromosome 7
|
4(7%)
|
8p deletion
|
2(3.5%)
|
16p deletion
|
2(3.5%)
|
6q deletion
|
1 (2)
|
20p deletion
|
1(2%)
|
13q duplication
|
1(2%)
|
12p duplication
|
1(2%)
|
19q duplicaiton
|
1(2%)
|
5q duplication
|
1(2%)
|
Xq duplication
|
1(2%)
|
isochromosome Xq
|
1(2%)
|
Chromopthisis like pattern involving chromosome 20
|
1(2%)
|
Table 1. Frequency of Structural abnormalities > 5 MB identified by Cytoscan -HD SNP array analysis in 57 cases of paediatric high hyperdiploid BCP-ALL. More than one structural abnormality was present in 13 patients.