Patient Cohort – Our analysis included samples from a total of 287 (mostly) pediatric and adolescent patients of diverse (self-identified) ethnic background (Table 1), most commonly Urdu-speaking (North India), Sindhi, Saraiki, Punjabi, Pathan and Balochi, all treated at the Afzaal Memorial Thalassemia Foundation (“AMTF”) hospital in Karachi. All patients were on chronic transfusion protocols.
The majority of these patients (n=274) had a diagnosis of β-thalassemia; 13 blinded samples from patients with other diagnoses were also included as “negative controls”, namely: hereditary spherocytosis (5), immune thrombocytopenic purpura (2), hemolytic anemia (2), autoimmune hereditary anemia (1), Fanconi anemia (1), bone marrow failure syndrome (1) and alpha (α)-thalassemia (1). One sample failed to produce data by either LeanSequencing or Sanger sequencing and was excluded from further analysis.
Of the 274 patients in the β-thalassemia cohort, 132 were female, with a mean (median) age of 8.7 (6.5) years, and 142 were male, with a mean (median) age of 9.3 (7.0) years (Table 1).
Sample Collection & Processing – Duplicate barcoded buccal swab samples were collected, and crude extracts were prepared [Patel et al., unpublished data]. The extracts were processed, without DNA purification, in a multiplexed PCR reaction covering both the HBB and HFE genes, followed by an allele-specific labeling reaction and analysis by capillary electrophoresis, which was outsourced to a third party (Genewiz, South Plainfield, NJ). All steps followed LeanSequencing™ (“LSQ”), a novel process for analyzing sequence variants developed at BioMolecularAnalytics (Warren, NJ).
LSQ Protocol: Amplify & Discriminate – The two principal analytical steps in LeanSequencing [unpublished data; Hashmi et al 2017, 2019] are amplification and discrimination. Amplification, in a single multiplex PCR reaction, produces a set of amplicons comprising all sequence variants of interest; each amplicon bears a molecular tag (aka “barcode”) that identifies the sample of origin and permits the simultaneous discrimination of multiple amplicons. Discrimination, in a second multiplex PCR reaction, produces labeled allele-specific amplicons for analysis in a standard capillary sequencer, in this case the 96-channel ABI 3730xl. The discrimination reaction can be configured to select specific marker (“SNP”) sets according to ethnicity or geographic area of interest, and to accommodate either individual samples or multiple (2 or 4) “pooled” samples per well. The protocol omits extraneous steps such as “normalization” and “clean-up” reactions and, to further simplify the process, accommodates crude extracts prepared from buccal swabs. The entire process takes less than 1 h of hands-on time per 96-well plate and is readily automated using inexpensive laboratory pipetting instrumentation. The process achieves a very high data rate: for example, in the “pools of 4” configuration used here, a single run on a standard sequencer with a 96-capillary array produces complete molecular HBB and HFE profiles for 384 samples (96 wells × 4 samples per well).
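The throughput figure above follows directly from the pooling arithmetic. The following is a minimal sketch, not part of the LSQ protocol itself, of how barcoded samples might be grouped into wells of a 96-well plate. The function name `assign_pools`, the row-major fill order and the rule of pooling consecutive samples are assumptions made for illustration only; the source does not describe how samples are assigned to pools.

```python
# Illustrative sketch only: the pooling rule (consecutive samples share a well,
# wells filled row by row) is an assumption, not the published LSQ procedure.

ROWS = "ABCDEFGH"       # 8 rows of a standard 96-well plate
COLS = 12               # 12 columns
WELLS = len(ROWS) * COLS


def assign_pools(sample_ids, pool_size=4):
    """Map sample IDs to plate wells, pool_size samples per well.

    Returns a dict such as {"A1": [id, id, id, id], "A2": [...], ...}.
    """
    if pool_size not in (1, 2, 4):
        raise ValueError("pool_size must be 1, 2 or 4")
    capacity = WELLS * pool_size
    if len(sample_ids) > capacity:
        raise ValueError(f"at most {capacity} samples fit on one plate")

    layout = {}
    for i, sample_id in enumerate(sample_ids):
        well_index = i // pool_size            # 0..95
        row = ROWS[well_index // COLS]         # A..H
        col = well_index % COLS + 1            # 1..12
        layout.setdefault(f"{row}{col}", []).append(sample_id)
    return layout


# Hypothetical sample identifiers for a full plate in the "pools of 4" setup.
samples = [f"S{n:03d}" for n in range(1, 385)]
plate = assign_pools(samples, pool_size=4)
```

With `pool_size=4`, a full plate holds 96 × 4 = 384 samples, matching the per-run figure quoted above; with `pool_size=1` the same plate holds 96.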
Selection of HBB and HFE Variants – Our LSQ design comprises 18 clinically relevant β-thalassemia mutations as well as the 3 most common hemochromatosis mutations (Table 2). To select β-thalassemia mutations of interest, we started with the set most commonly observed in Pakistan and Middle Eastern countries [15] as well as 3 sickle cell anemia mutations. Following initial testing of this design, we selected 64 samples carrying at most one of these most commonly observed mutations for Sanger sequencing of HBB gene exon 1, partial intron 1 and exon 2, and thereupon expanded our initial set to include five additional mutations, namely: cd 30 G > C ("Monroe"); cd 16 delC; cd 15 G > A; cd 5 –CT; and the rare –90 C > T mutation (aka rs34999973 C > T [16]). The final selection for our LSQ application comprises 18 HBB and 3 HFE mutations, as detailed below.
This selection also covers many of the common mutations observed in other regions, including (with reference to Table 1 in reference [2] and Figure 3 in reference [3]): Mediterranean (cd 5 –CT, IVS I–1 G > A, IVS I–6 T > C, IVS I–110 G > A); Central and SE Asian (cd 41/42 –TTCT); East Asian (IVS I–5 G > C); African (–29 A > G, –88 C > T); and Indian (–619 deletion); and (with reference to Table 1 in [17]): Middle Eastern (IVS I–5 G > C, IVS I–1 G > A, IVS I–6 T > C and cd 5 –CT).
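Statistical Analysis – Allele frequencies were determined from genotypes by “gene counting”, that is, by counting the copies of each allele across all genotyped individuals and dividing by the total number of alleles (two per individual). All analyses were performed, and all tables and figures generated, using Microsoft Excel.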
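For readers who prefer a scripted equivalent, gene counting can be sketched as follows. This is an illustration only: the analysis reported here was carried out in Microsoft Excel, and the function name `allele_frequencies`, the allele labels and the example genotypes below are hypothetical, not study data. Each diploid genotype contributes two alleles to the count, and "N" denotes the normal (wild-type) allele.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by gene counting.

    genotypes: iterable of (allele_1, allele_2) pairs, one per individual.
    Returns a dict mapping each allele to its share of all counted alleles.
    """
    counts = Counter()
    for allele_1, allele_2 in genotypes:
        # Every individual contributes exactly two alleles.
        counts[allele_1] += 1
        counts[allele_2] += 1

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes for four individuals (not data from this study).
example = [
    ("N", "N"),                 # no mutation detected
    ("N", "IVS I-5"),           # heterozygous carrier
    ("IVS I-5", "IVS I-5"),     # homozygous
    ("N", "cd 41/42"),          # heterozygous carrier
]
freqs = allele_frequencies(example)
```

In this toy example the eight counted alleles comprise four "N", three "IVS I-5" and one "cd 41/42", giving frequencies of 0.5, 0.375 and 0.125 respectively.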