Mock DBS samples
Mock dried blood spot (DBS) samples were made by mixing synchronized, cultured ring-stage P. falciparum parasites with uninfected human whole blood to obtain a range of parasite densities (10, 100, 1,000 and 10,000 parasites per μL of blood). DBS samples were stored at -20°C until processing.
DNA extraction and quantitative PCR
DNA was extracted from a single 6 mm hole-punch using four different extraction methods: i) Saponin-Chelex, as described previously [18]; ii) a modified Tween-Chelex method, described below; iii) the QIAamp DNA Mini Kit (Qiagen, California, United States), following the manufacturer’s recommendations; and iv) the QIAamp DNA Investigator Kit (Qiagen, California, United States), as described elsewhere [13].
Tween-Chelex extraction was conducted by modifying the Saponin-Chelex extraction method. DBS were punched into 1.5 mL microcentrifuge tubes using a 6 mm hole-puncher. To each tube containing DBS punches, 1 mL of 0.5% Tween 20 (catalogue # P1379, Sigma Aldrich) in 1X PBS was added, and the tubes were incubated overnight at 4°C. The samples were briefly centrifuged, the Tween-PBS was removed, and the punches were washed with 1 mL of 1X PBS and incubated for 30 minutes at 4°C. The samples were briefly centrifuged again, the PBS was removed, and 150 μL of 10% Chelex 100 resin (catalogue # 1422822, Bio-Rad Laboratories) in water was added to each sample, ensuring that the DBS punches were covered with the Chelex solution. The samples were then incubated for 10 minutes at 95°C. The tubes were centrifuged at 15,000 rpm for 10 minutes, and the supernatant was transferred to 0.6 mL microcentrifuge tubes and centrifuged at 15,000 rpm for 5 minutes. The extracted DNA was then transferred to a 96-well plate and stored at -20°C until processing. Parasite densities were confirmed using var-ATS ultra-sensitive qPCR, as described previously [19].
McrBC digestion of human DNA
For a subset of samples, extracted DNA was digested with McrBC (catalogue # M0272S, New England Biolabs) in a 30 μL reaction containing 20 μL of extracted DNA, 10 units of McrBC, 1X NEBuffer 2, 0.5 μL of 100X BSA and 0.5 μL of 100X GTP. The samples were incubated at 37°C for 2 hours, followed by enzyme inactivation at 65°C for 20 minutes. The digested products were used as templates for whole-genome amplification.
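The reaction composition above can be checked with a short volume calculation. The sketch below is illustrative only: it assumes a 10X NEBuffer 2 stock and an enzyme stock of 10 units/μL, neither of which is stated in the text, and the function name `mcrbc_reaction_volumes` is hypothetical. Under those assumptions it reports how much water is needed to bring the reaction to 30 μL.

```python
def mcrbc_reaction_volumes(
    total_ul=30.0,             # final reaction volume (from the text)
    template_ul=20.0,          # extracted DNA (from the text)
    enzyme_units=10.0,         # units of McrBC (from the text)
    bsa_ul=0.5,                # 100X BSA (from the text)
    gtp_ul=0.5,                # 100X GTP (from the text)
    enzyme_units_per_ul=10.0,  # ASSUMPTION: enzyme stock concentration
    buffer_stock_x=10.0,       # ASSUMPTION: NEBuffer 2 supplied as 10X
):
    """Return per-component volumes (in uL) for one digestion reaction."""
    buffer_ul = total_ul / buffer_stock_x          # dilute the stock to 1X
    enzyme_ul = enzyme_units / enzyme_units_per_ul
    used_ul = template_ul + buffer_ul + enzyme_ul + bsa_ul + gtp_ul
    water_ul = total_ul - used_ul
    if water_ul < 0:
        raise ValueError("components exceed the total reaction volume")
    return {
        "template_ul": template_ul,
        "buffer_ul": buffer_ul,
        "enzyme_ul": enzyme_ul,
        "bsa_ul": bsa_ul,
        "gtp_ul": gtp_ul,
        "water_ul": water_ul,
    }


if __name__ == "__main__":
    for component, volume in mcrbc_reaction_volumes().items():
        print(f"{component}: {volume:.1f} uL")
```

If the stock concentrations in use differ from the assumed values, passing them as arguments recomputes the water volume and raises an error when the components no longer fit in the stated reaction volume.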
Selective whole-genome amplification (sWGA)
The sWGA reactions were performed using two sets of primers: 10A [13] and 6A10AD, a combination of 6A [12] and 10A [13] with an adjusted ratio of dNTPs. The sWGA reaction using the 6A10AD primer set was performed as described previously [13], except that the proportion of nucleotides in the dNTP mix was adjusted (i.e. 70% AT and 30% GC) to be similar to the composition of the malaria genome. The master mix and reaction conditions for the modified sWGA protocol are shown in Table S1. A template DNA volume of 20 μL, containing an estimated 16 to 16,000 parasite genomes (Table S2), was used per sWGA reaction. For non-selective whole-genome amplification, random hexamer primers were used following the manufacturer's instructions for the GenomiPhi V2 DNA Amplification Kit (catalogue # 45-001-221, GE Healthcare Life Sciences).
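To make the adjusted nucleotide proportions concrete, the sketch below converts an AT:GC split into per-nucleotide concentrations. It is an illustration, not the protocol itself: it assumes the AT share is divided equally between dATP and dTTP and the GC share equally between dCTP and dGTP, and the total dNTP concentration passed in is a hypothetical placeholder. The actual master mix is given in Table S1.

```python
def dntp_mix(total_mm, at_fraction=0.70):
    """Split a total dNTP concentration (mM) into the four nucleotides.

    ASSUMPTION: the AT share is split equally between dATP and dTTP,
    and the GC share equally between dCTP and dGTP.
    """
    if total_mm <= 0:
        raise ValueError("total_mm must be positive")
    if not 0.0 < at_fraction < 1.0:
        raise ValueError("at_fraction must lie strictly between 0 and 1")
    gc_fraction = 1.0 - at_fraction
    return {
        "dATP": total_mm * at_fraction / 2,
        "dTTP": total_mm * at_fraction / 2,
        "dCTP": total_mm * gc_fraction / 2,
        "dGTP": total_mm * gc_fraction / 2,
    }


if __name__ == "__main__":
    # 4.0 mM total is a hypothetical value chosen only for illustration.
    for nucleotide, conc in dntp_mix(4.0).items():
        print(f"{nucleotide}: {conc:.2f} mM")
```

With a 70% AT share, each of dATP and dTTP makes up 35% of the mix and each of dCTP and dGTP makes up 15%.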
Whole-genome sequencing
The whole-genome amplification product was purified with SPRI magnetic beads (catalogue #65152105050250) to remove unbound primers, primer dimers, and other impurities. Sequencing libraries were prepared using the NEBNext® Ultra™ II DNA Library Prep Kit (catalogue #E7103) following the manufacturer’s instructions. Samples were barcoded, purified, pooled and sequenced on the Illumina NextSeq550 or NovaSeq 6000 System using 150 bp paired-end sequencing chemistry. Only samples for preliminary analyses were sequenced on the Illumina NextSeq550.
Sequence analyses
Reads were demultiplexed and filtered by base quality, and poly-G tails were clipped. The reads were then aligned to the P. falciparum 3D7 reference genome (version 3) with BWA-MEM, with secondary alignments marked [20]. Sample base quality was recalibrated using Genome Analysis Toolkit (GATK) BaseRecalibrator and GATK ApplyBQSR, with SNP locations from the Pf3k project (Data Release 5) used as a prior. Each sample was then downsampled to the minimum total read count among the samples being compared, so that comparisons were made at equivalent sequencing depth. Percentiles of genome coverage were calculated across the core genome using GATK CollectWgsMetrics. Dropouts were evaluated by taking the mean coverage over 200 bp sliding windows across the core genome. The percentage of reads mapping to the core P. falciparum genome was calculated with samtools flagstat [21].
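Two of the steps above, downsampling to a common read count and windowed coverage for dropout detection, can be sketched in a few lines. This is a minimal illustration rather than the pipeline that was run: the function names, the window step size, and the dropout threshold (`min_mean`) are all assumptions, since the text specifies only the 200 bp window length.

```python
from itertools import accumulate


def downsample_fractions(read_counts):
    """Fraction of reads to keep per sample so that every sample is
    reduced to the read count of the smallest sample."""
    if not read_counts:
        raise ValueError("read_counts is empty")
    target = min(read_counts.values())
    if target <= 0:
        raise ValueError("every sample needs at least one read")
    return {sample: target / n for sample, n in read_counts.items()}


def windowed_mean_coverage(depths, window=200, step=200):
    """Mean depth for each window of `window` bp, moving `step` bp at a time.

    `depths` is a list of per-base depths along one contig.
    ASSUMPTION: the step size is not given in the text; 200 is a placeholder.
    """
    prefix = [0] + list(accumulate(depths))  # prefix sums for O(1) window sums
    return [
        (start, (prefix[start + window] - prefix[start]) / window)
        for start in range(0, len(depths) - window + 1, step)
    ]


def dropout_windows(depths, window=200, step=200, min_mean=1.0):
    """Start positions of windows whose mean depth is below `min_mean`.

    ASSUMPTION: the threshold is hypothetical and not taken from the text.
    """
    return [
        start
        for start, mean in windowed_mean_coverage(depths, window, step)
        if mean < min_mean
    ]


if __name__ == "__main__":
    print(downsample_fractions({"sample_A": 1000, "sample_B": 4000}))
    toy_depths = [10] * 400 + [0] * 200 + [10] * 400  # toy contig with a gap
    print(dropout_windows(toy_depths))
```

The fractions returned by `downsample_fractions` are the kind of value a read-subsampling tool would take as input. Setting `step=1` gives fully overlapping windows at the cost of more output.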
Variant calling and variant filtering were conducted following the Genome Analysis Toolkit (GATK) Best Practices [22] with minor modifications. Variants were called per sample across the core genome using gatk4 HaplotypeCaller (-ERC GVCF) and then jointly genotyped across all samples using gatk4 CombineGVCFs and gatk4 GenotypeGVCFs. The variants were recalibrated using gatk4 VariantRecalibrator and ApplyVQSR, with data from the Pf3k project (Data Release 5) as the training set (annotations: QD, MQ, MQRankSum, ReadPosRankSum, FS, SOR). Variants were then filtered to retain only biallelic SNPs passing the VQSLOD filter. SNP concordance was measured with bcftools gtcheck and gatk GenotypeConcordance. Concordance was measured using all known SNPs and using only high-quality non-homozygous SNPs.
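As a rough illustration of what a genotype concordance figure means, the sketch below compares two call sets site by site. It is a simplified stand-in and does not reproduce the metrics reported by bcftools gtcheck or gatk GenotypeConcordance. The data layout (a dictionary mapping `(contig, position)` to a tuple of allele indices), the function name, and the optional site filter are all assumptions made for the example.

```python
def genotype_concordance(calls_a, calls_b, site_filter=None):
    """Fraction of shared sites at which two call sets agree.

    calls_a, calls_b: dicts mapping (contig, position) to a tuple of allele
    indices, e.g. (0, 1); None marks a missing genotype.
    site_filter: optional predicate on (genotype_a, genotype_b) used to
    restrict the comparison to a subset of sites.
    Returns (concordance, n_sites); concordance is None if no site qualifies.
    """
    matches = 0
    compared = 0
    for site, gt_a in calls_a.items():
        gt_b = calls_b.get(site)
        if gt_a is None or gt_b is None:
            continue  # skip sites missing from either call set
        if site_filter is not None and not site_filter(gt_a, gt_b):
            continue
        compared += 1
        # Sort alleles so that unphased (0, 1) and (1, 0) compare equal.
        if sorted(gt_a) == sorted(gt_b):
            matches += 1
    if compared == 0:
        return None, 0
    return matches / compared, compared


def is_non_homozygous(gt_a, gt_b):
    """Keep sites where the first call set carries more than one allele."""
    return len(set(gt_a)) > 1


if __name__ == "__main__":
    contig = "example_contig"  # hypothetical contig name
    a = {(contig, 100): (0, 0), (contig, 200): (0, 1), (contig, 300): (1, 1)}
    b = {(contig, 100): (0, 0), (contig, 200): (1, 0), (contig, 300): (0, 1)}
    print(genotype_concordance(a, b))
    print(genotype_concordance(a, b, site_filter=is_non_homozygous))
```

Restricting the comparison with a filter such as `is_non_homozygous` mirrors the idea of reporting concordance on a subset of sites as well as on all of them.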