Patient Cohort – Our analysis included samples from a total of 288 patients, mostly children and adolescents, of diverse self-identified ethnic backgrounds, the most common being Urdu-speaking (North Indian origin), Sindhi, Saraiki, Punjabi, Pathan and Balochi. All were treated at the Afzaal Memorial Thalassemia Foundation (“AMTF”) hospital in Karachi, and all were on chronic transfusion protocols.
The majority of these patients (n = 275) had a diagnosis of β-thalassemia, established by standard clinical methods including blood work, hemoglobin electrophoresis and HPLC. In addition, 13 blinded samples from patients with diagnoses other than β-thalassemia were included as negative controls: hereditary spherocytosis (5), immune thrombocytopenic purpura (2), hemolytic anemia (2), autoimmune hereditary anemia (1), Fanconi anemia (1), bone marrow failure syndrome (1) and α-thalassemia (1). One β-thalassemia sample failed to produce data by both LeanSequencing and Sanger sequencing and was excluded from further analysis. Of the remaining 274 patients with β-thalassemia, 132 were female (mean age 8.7 years) and 142 were male (mean age 8.6 years) (Table 1); both age distributions are positively skewed, with the mode (about 6 years) below the mean.
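The cohort counts above can be reconciled with simple arithmetic. The sketch below uses only the numbers reported in this section; it checks that the groups add up and computes the sample-size-weighted mean age of the 274 analyzed patients, which is approximate because the group means are given to one decimal place.

```python
# Cohort accounting using only the counts reported in the text.
enrolled = 288
beta_thal = 275
negative_controls = 13   # blinded samples with other diagnoses
failed = 1               # beta-thalassemia sample with no LSQ or Sanger data

assert beta_thal + negative_controls == enrolled

analyzed = beta_thal - failed
female_n, female_mean_age = 132, 8.7
male_n, male_mean_age = 142, 8.6
assert female_n + male_n == analyzed

# Pooled mean = sample-size-weighted average of the group means
# (approximate, since the reported group means are rounded).
pooled_mean_age = (female_n * female_mean_age + male_n * male_mean_age) / analyzed
print(analyzed, round(pooled_mean_age, 2))
```

Run as is, it prints 274 and a pooled mean age of about 8.65 years.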
Sample Collection & Processing – Duplicate barcoded buccal swab samples were collected, and crude extracts were prepared with the Phusion Human Specimen Direct PCR Kit (ThermoFisher Scientific, Waltham, MA) using a protocol that, after lysate preparation, requires only a simple spin (no high-speed centrifugation) and takes ~10 min for a batch of 8 samples. These extracts were then processed in accordance with LeanSequencing™ (“LSQ”): a multiplexed PCR reaction targeting both the HBB and HFE genes, followed by an allele-specific labeling reaction and analysis by capillary electrophoresis (the latter outsourced to a service provider, Genewiz, South Plainfield, NJ).
LSQ Protocol: Amplify & Discriminate – LeanSequencing is a novel process developed at BioMolecular Analytics (Warren, NJ) for analyzing sequence variants. Its two analytical steps are amplification and discrimination (Fig. 1). Amplification, in a single multiplex PCR reaction, produces a set of amplicons spanning all β-thalassemia and hemochromatosis sequence variants of interest; each amplicon bears a molecular tag (“barcode”) that identifies its sample of origin, so that amplicons from multiple samples can be combined (“pooled”). After pooling, discrimination, in a second multiplex PCR reaction, produces labeled allele-specific amplicons, here from 4 samples per well, for analysis on a standard capillary sequencer, here the 96-capillary ABI 3730xl.
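To make the barcoding and pooling step concrete, the sketch below models how four barcoded samples could share one well and how per-barcode results from that well could be attributed back to their samples. It is bookkeeping only, assuming a row-major A1 to H12 plate layout and integer barcode slots 0 to 3; the well naming, the slot numbering and the functions `pool_samples` and `demultiplex` are hypothetical and do not describe the actual LSQ barcoding chemistry or software.

```python
from collections import defaultdict

SAMPLES_PER_WELL = 4   # pools-of-4 configuration described in the text
WELLS_PER_PLATE = 96   # one 96-well plate per capillary run
ROWS = "ABCDEFGH"      # hypothetical row-major plate layout, A1..H12

def well_name(index: int) -> str:
    """Map a well index 0..95 to a name A1..H12 (row-major, hypothetical)."""
    row, col = divmod(index, 12)
    return f"{ROWS[row]}{col + 1}"

def pool_samples(sample_ids):
    """Assign samples to wells, giving each sample in a well a distinct
    barcode slot (0..3) so its amplicons stay attributable after pooling."""
    capacity = SAMPLES_PER_WELL * WELLS_PER_PLATE
    if len(sample_ids) > capacity:
        raise ValueError(f"at most {capacity} samples fit on one plate")
    layout = defaultdict(dict)  # well name -> {barcode slot: sample id}
    for i, sample in enumerate(sample_ids):
        well_index, slot = divmod(i, SAMPLES_PER_WELL)
        layout[well_name(well_index)][slot] = sample
    return dict(layout)

def demultiplex(well, results_by_slot, layout):
    """Attribute per-barcode results from one pooled well to their samples."""
    return {layout[well][slot]: result for slot, result in results_by_slot.items()}

layout = pool_samples([f"S{n:03d}" for n in range(1, 289)])  # 288 samples
print(len(layout), layout["A1"])
```

With 288 samples, pooling by four fills 72 of the 96 wells, and each well's results map back to exactly four sample identifiers.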
The protocol eliminates steps such as DNA purification, “normalization” and “clean-up” reactions. The discrimination reaction is configurable: it can select specific marker (“SNP”) sets according to the ethnicity or geographic area of interest, and it accommodates one, two or four pooled samples per well. The entire process requires less than 1 h of hands-on time per 96-well plate and is readily automated with inexpensive laboratory pipetting instruments. Throughput is high: in the pools-of-4 configuration used here, a single run on a standard sequencer with a 96-capillary array produces complete molecular HBB and HFE profiles for 384 samples (96 wells × 4 samples). Raw data, in the form of sequence traces, can be uploaded to the BioMolecular Analytics web portal for processing by proprietary software accessible from any web browser.
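The 384-sample figure follows from analyzing one pooled well per capillary. The short sketch below, assuming exactly that one-well-per-capillary mapping, computes run capacity for the one, two and four samples-per-well configurations and the number of runs needed for a cohort of a given size; the function names are illustrative.

```python
import math

CAPILLARIES = 96  # 96-capillary array, e.g. the ABI 3730xl

def samples_per_run(samples_per_well: int) -> int:
    """Samples profiled in one run, assuming one pooled well per capillary."""
    if samples_per_well not in (1, 2, 4):
        raise ValueError("configurations described: 1, 2 or 4 samples per well")
    return CAPILLARIES * samples_per_well

def runs_needed(n_samples: int, samples_per_well: int) -> int:
    """Number of sequencer runs required for a cohort of n_samples."""
    return math.ceil(n_samples / samples_per_run(samples_per_well))

for k in (1, 2, 4):
    print(k, samples_per_run(k), runs_needed(288, k))
```

For the 288 samples in this study, the pools-of-4 configuration needs a single run, versus three runs with one sample per well.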
Selection of HBB and HFE Variants – To select mutations of interest, we started with the set most commonly observed in Pakistan and Middle Eastern countries [15], namely: –del 619; IVS I-1 G > T and G > A; IVS I-5 G > C; IVS I-6 T > C; IVS I-110 G > A; –88 C > T; –29 A > G; cd 8/9 +G; cd 41/42 –TTCT; cd 6 A > T (the sickle cell anemia mutation, “HbS”); and two further structural variants, cd 6 G > A (“HbC”) and cd 26 G > A (“HbE”). After initial testing of this design, we selected 64 samples in which at most one of these common mutations had been detected for Sanger sequencing of HBB exon 1, part of intron 1 and exon 2, to check for additional mutations or variants, notably these five mutations: cd 5 –CT, cd 15 G > A, cd 16 –C, cd 30 G > C (“Monroe”) and the rare –90 C > T (also reported as rs34999973 C > T [16]); and these two variants: rs713040 c.9 T > C and rs35799536 G > C.
The final selection for our LSQ application comprises 18 HBB mutations (excluding the two variants) and 2 HFE mutations (Table 2); the HFE variant S65C was omitted because initial testing showed all patients to be normal for it. This selection covers many of the mutations commonly observed in other regions, namely (with reference to Table 1 in [2] and Fig. 3 in [3]): Mediterranean (cd 5 –CT, IVS I-1 G > A, IVS I-6 T > C, IVS I-110 G > A); Central and Southeast Asian (cd 41/42 –TTCT); East Asian (IVS I-5 G > C); African (–29 A > G, –88 C > T); and Indian (–del 619). With reference to Table 1 in [18], it also covers the Middle Eastern mutations IVS I-5 G > C, IVS I-1 G > A, IVS I-6 T > C and cd 5 –CT.
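The HBB panel and its regional coverage can be written down as plain sets, which makes the count of 18 HBB mutations and the region-specific marker selection easy to check. In this sketch the mutation strings (plain ASCII notation), the `REGIONAL` mapping (transcribed from the regional groupings above) and `select_markers` are illustrative assumptions; the two HFE mutations listed in Table 2 are left out because they are not named here, and the actual LSQ configuration mechanism is not modeled.

```python
# HBB mutations in the final panel, transcribed from the lists above (18 total).
HBB_PANEL = {
    "-del 619", "IVS I-1 G>T", "IVS I-1 G>A", "IVS I-5 G>C", "IVS I-6 T>C",
    "IVS I-110 G>A", "-88 C>T", "-29 A>G", "cd 8/9 +G", "cd 41/42 -TTCT",
    "cd 6 A>T",   # HbS
    "cd 6 G>A",   # HbC (GAG>AAG)
    "cd 26 G>A",  # HbE
    "cd 5 -CT", "cd 15 G>A", "cd 16 -C", "cd 30 G>C", "-90 C>T",
}

# Regional groupings as cited above (from [2], [3] and [18]).
REGIONAL = {
    "Mediterranean": {"cd 5 -CT", "IVS I-1 G>A", "IVS I-6 T>C", "IVS I-110 G>A"},
    "Central and Southeast Asian": {"cd 41/42 -TTCT"},
    "East Asian": {"IVS I-5 G>C"},
    "African": {"-29 A>G", "-88 C>T"},
    "Indian": {"-del 619"},
    "Middle Eastern": {"IVS I-5 G>C", "IVS I-1 G>A", "IVS I-6 T>C", "cd 5 -CT"},
}

def select_markers(regions):
    """Union of the marker sets for the requested regions; fails loudly if
    any requested marker is not part of the panel."""
    wanted = set().union(*(REGIONAL[r] for r in regions))
    missing = wanted - HBB_PANEL
    if missing:
        raise KeyError(f"not in panel: {sorted(missing)}")
    return wanted

assert len(HBB_PANEL) == 18
print(sorted(select_markers(["Mediterranean", "Middle Eastern"])))
```

Because every regional set is a subset of the panel, any combination of regions can be served from the same 18-mutation design.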
Statistical Analysis – Allele frequencies were determined from genotypes by gene counting. All analyses were performed, and all data tables and figures generated, in Microsoft Excel.
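Gene counting treats each diploid genotype as two alleles and divides each allele's count by 2N, where N is the number of individuals genotyped. The calculation itself was done in Excel; the Python sketch below illustrates the same arithmetic on hypothetical genotypes (not study data), assuming genotypes are represented as allele pairs.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Gene counting: each genotype contributes two alleles, and an
    allele's frequency is its count divided by 2N."""
    counts = Counter()
    for allele_1, allele_2 in genotypes:
        counts[allele_1] += 1
        counts[allele_2] += 1
    total = 2 * len(genotypes)
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical genotypes (not study data): two homozygotes and one
# compound heterozygote.
example = [
    ("IVS I-5 G>C", "IVS I-5 G>C"),
    ("cd 8/9 +G", "cd 8/9 +G"),
    ("IVS I-5 G>C", "cd 41/42 -TTCT"),
]
freqs = allele_frequencies(example)
print(freqs)
```

Here IVS I-5 G > C accounts for 3 of the 6 alleles (frequency 0.5); a homozygote contributes two copies and a compound heterozygote one copy of each of its alleles.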