Lab-created samples to evaluate enrichment approaches
Dried blood spots. To test the different parasite DNA enrichment approaches, dried blood spots were created by mixing cultured NF54-infected red blood cells with uninfected whole human blood. The laboratory-adapted isolate, NF54, was maintained in culture following the method of Trager and Jensen [14]. Parasite concentrations were microscopically enumerated from sorbitol-synchronized ring-stage NF54 culture and mixed with uninfected whole human blood (Interstate Blood Bank, Nashville, TN) resulting in parasite concentrations ranging from 10,000 to 500 parasites/µL and subsequently spotting 12.5 mL of blood onto Whatman 3 MM filter paper. DNA was extracted from dried blood spots using the protocol described by Zainabadi et al [15]. The DNA was treated under three conditions, including sWGA only and MspJI or MspJI –control (no enzyme)+ followed by sWGA, as illustrated in Supplementary Figure S1. The MpsJI– condition followed the same protocol as MspJI, without the enzyme.
Mixtures of DNA to assess amplification bias. DNA from four previously cultured and sequenced field isolates from Malawi [16] were mixed in equal proportions to assess potential amplification bias introduced by sWGA. The DNA concentration for each sample was measured using a picogreen assay (Thermo Fisher Scientific, Waltham, MA) and the samples were mixed in equal proportions. None of the cultured isolates represented polyclonal infections, based on analysis of prior sequencing data using estMOI [17]. The sample was further split into six tubes, three of which underwent direct whole genome sequencing and the other three which underwent sWGA, followed by whole genome sequencing, as shown in Figure 4.
MspJI Digestion. MspJI digestion was performed in a 0.2 mL 96-well PCR plate. The reaction mixture contained 1x CutSmart Buffer, 10 mg of bovine serum albumin and 6 units of MspJI (New England Biolabs, Ipswich, MA). The MspJI digestion control (i.e., MspJI–) contained the same buffers but excluded the enzyme. 25 mL of sample DNA was added to both reaction mixtures (total volume, 30 mL), and reactions were incubated in a thermocycler. Two different incubation protocols were tested, one with a 16 hrs incubation at 37°C, followed heating at 65°C for 20 min to inactivate the enzyme and cooling at 4°C, and a second protocol where the 37°C incubation lasted only 4 hrs.
Vacuum filtration of DNA. Following enzymatic digestion, the entire reaction mixture was transferred to a MultiScreen® PCR Filter Plate (Millipore) and filtered to remove digested DNA fragments using a MultiScreen® Vacuum Manifold with a pressure of -7 inches Hg until the wells were emptied and the filters appeared dry. Filtered samples were reconstituted with 30 mL of water, and the plate was gently agitated for 15 mins. Samples were then transferred to a new plate.
sWGA. Amplification was performed in a 0.2mL 96-well PCR plate. The reaction mixture contained 1x BSA, 1mM dNTPs, 2.5mM of each amplification primer, 1x Phi29 reaction buffer and 30 units of Phi29 polymerase. 17mL of template DNA was added to the reaction mixture (total volume, 50 mL) which was then placed in a thermocycler programmed for a stepdown protocol (35° C for 5 min, 34°C for 10 min, 33°C for 15 min, 32°C for 20 min, 31°C for 30 min, 30°C for 16 hrs), followed by heating at 65°C to inactivate the enzyme and cooling at 4°C. We used the same primers as Oyola et al [9] (see Supplementary Material).
qPCR. Quantitative PCR of the human actin gene and P. falciparum 18S rRNA gene was used to estimate the amount of human and parasite DNA, respectively, before and after sWGA. The reaction mixture contained QuantiTech 2x QT Multiplex Master Mix, 10 mM of each of the primers and 1.5 mL of template DNA (total volume, 10 mL). A two-tailed Mann-Whitney U test was used to estimate differences between different experimental conditions (e.g. MspJI-sWGA, MspJI–-sWGA, sWGA).
Whole genome sequencing. Genomic DNA libraries were constructed for sequencing using the KAPA Library Preparation Kit (Kapa Biosystems, Woburn, MA). DNA (500 hg) was fragmented with the Covaris E210 to ~200 bp. Libraries were prepared using a modified version of the manufacturer’s protocol. The DNA was purified between enzymatic reactions and library size selection was performed with AMPure XT beads. Libraries were assessed for concentration and fragment size using the DNA High Sensitivity Assay on the LabChip GX (Perkin Elmer, Waltham, MA). Library concentrations were also assessed by qPCR using the KAPA Library Quantification Kit. Libraries were pooled and sequenced on a 150 bp paired-end Illumina HiSeq 4000 run (Illumina, San Diego, CA).
Data Analysis
Read mapping and coverage. Each dataset was analyzed by mapping raw fastq files to the 3D7 reference genome using Bowtie2 [18]. Bam files were processed according to GATK’s Best Practices workflow to obtain analysis-ready reads [19,20]. Bedtools [21] was used to generate coverage and depth estimates from the processed reads. Differences in the proportion of the genome covered in samples from untreated vs. filtered sWGA were tested using the z-score test for difference in proportions.
Variant calling. GATK’s Best Practices workflow was followed for variant calling [19,20]. Haplotype Caller was used in reference confidence mode to create genomic variant call format (GVCF) files for each sample and joint SNP Calling (GATK v3.7). Variants were removed if they met the following filtering criteria: QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0, QUAL < 50. Variant sites with >20% missing genotypes were additionally removed using vcftools.
Assessment of amplification bias. Whole genome sequence data generated from lab-created isolate mixtures that either underwent sWGA and sequencing or direct sequencing were compared to evaluate the potential for sWGA to introduce amplification bias. The experimental design is illustrated in Figure 4. Reference allele frequencies for each sample were estimated using samtools mpileup and were compared to examine the variability between and within samples from sWGA and non-sWGA groups. Sites with coverage depth lower than 20x were excluded from this analysis. In order to reduce coverage bias, we also eliminated sites that were not covered in all six samples. After applying these filters, 1,786,088 sites remained. The distribution and spearman correlation coefficient (rho) were estimated in R.
FWS, a measure of within-sample diversity, was used to estimate infection complexity for each isolate mixture for comparison between the sWGA and non-sWGA conditions. FWS was estimated using the R package, moimix (https://github.com/bahlolab/moimix). Significance was determined using the Mann-Whitney U test. Only the core P. falciparum genome was used to estimate FWS [22].
The composition of each lab-created mixture was also compared to determine if there were significant differences between the sWGA and non-sWGA groups, implying differential amplification of certain strains over others. To assess potential amplification bias, we compared the proportion of isolate-specific SNPs for each isolate called from WGS data generated from isolate mixtures that did or did not undergo sWGA prior to sequencing. Isolate-specific SNPs were identified by comparing WGS data from each of the four isolates used to create the experimental mixtures. All variant sites with missing alleles in any of the isolates were removed from the analysis. For each isolate mixture, the predominant allele (defined as an allele comprising >70% of reads) at each position was called; if no allele was predominant based on this threshold, the SNP was called missing. The positions of unique SNPs for each isolate were extracted from the sequence data generated from the isolate mixtures. The proportion of unique SNPs from each isolate in the mixture was estimated and compared between the sWGA and non-sWGA groups. Significance was estimated using a z-score test for difference in proportions.