Study site, DNA sampling and reference sequences
P. malariae samples from symptomatic patients were collected in 2008 in Rakhine state of Myanmar (N=37) as part of a larger study to compare the effectiveness of five artemisinin-based combination therapy regimen (59). Parasite DNA was extracted using QIAamp DNA mini kit (Qiagen, Germany). All samples were confirmed as P. malariae using 18 small-subunit ribosomal RNA (18S rRNA)-based PCR (60) following the Standards for the Reporting of Diagnostic Accuracy (STARD) (61-63). Ethical approval for the study was obtained from the ethical review board of the Faculty of Tropical Medicine, Mahidol University. Reference sequences for whole genomes of P. malariae UG01, P. falciparum 3D7, P. vivax SAL-1, P. ovale curtisi GH01 and P. knowlesi STRAIN-H were downloaded from PlasmoDB (64) repository (http://plasmodb.org /common/downloads/release-36/). The reference sequence for pmmsp1 gene (Accession no. FJ824669) was obtained from NCBI nucleotide database.
Development of genotyping markers
Identification of genome-wide perfect and imperfect microsatellites was performed using Phobos version 3.3.11 (65, 66). The detection criteria for tandem repeat was restricted to evaluation of perfect and imperfect repeats with unit motifs of 1-10 bp having a minimum threshold repeat number of 14, 7, 5, 4, 4, 4, 4, 4, 4 and 4 for mono-, di- , tri- , tetra- , penta- , hexa- , hepta- , octa- , nonan- and deca-nucleotide microsatellites, respectively. The genome wide distribution of microsatellites in reference sequences were summarized (Table S1).
Sequences of six potential microsatellite genotyping markers together with 150 bp flanking sequences were extracted and primers were designed using BatchPrimer3 version 1.0 (https://probes.pw.usda.gov/cgi-bin/batchprimer3/batchprimer3.cgi). The selection criteria of primers were set as: (i) PCR product should be unique to the target flanking sequence containing the microsatellite; (ii) The variation in length of PCR product should be result of deviation in microsatellite length; (iii) Microsatellites with higher copy numbers, followed by manual inspection of each primer products to avoid selection of primers that could potentially associate with experimentally verified protein coding regions of the parasite or any interspecies cross reactivity (iv) Preference is given to primers located at physically different chromosomes with no linkage disequilibrium. Likewise, during validation steps the factors such as ease of amplification, no starter peaks and absence of non-specific bands were also considered for selection of microsatellite markers. To investigate polymorphisms in pmmsp1 gene, the partial regions of pmmsp1 genes were amplified using semi-nested PCR with specifically designed primer sets (Table 2). Allelic frequencies were determined based on length of consensus sequences for variable number of tandem repeats (VNTR) in semi-conserved regions interspersed between coding blocks (Fig. 1). Only the well-amplified VNTR in majority of samples with copy numbers greater than 2.0 were considered as alleles for INDEL polymorphisms analysis.
Validation of markers using PCR analysis
Primers for microsatellite genotyping markers were labelled with 6-Carboxyfluorescein (6-FAM) and validated using PCR (Table 1) followed by fragment analysis. PCR were performed in Eppendorf Mastercycler® pro (Eppendorf, Germany) with a total volume of 20 μL containing 1× PCR Buffer II (Mg2+ free), 2-3 mM MgCl2, 125 μM dNTPs, 0.25 μM primers and 0.4 U/20 μL of Platinum Taq Polymerase (Invitrogen, USA). Gel electrophoresis was used to detect the amplified products on 3% agarose gel. Fragment analysis of the 6-FAM-labelled PCR products was conducted using gel capillary electrophoresis by Macrogen (Macrogen Inc., South Korea). During fragment analysis, presence of a distinct expected single peak with a minimum of 100 relative fluorescent units (RFU) were accepted as cut-off value. If multiple peaks were detected, then one third height of dominant peak with minimum of corresponding proportionate RFUs were taken as selection threshold for scoring the multiple recessive microsatellite alleles per locus.
For pmmsp1 polymorphisms, semi-nested PCR reaction was conducted (Table 2) to increase sensitivity. Primary and secondary PCR products were generated using corresponding volumes of 20 and 100 µl reaction in the presence of 10 mM Tris-HCl, pH 8.3, 50 mM KCl, 2 mM MgCl2, 125 µM dNTPs, 125 nM of each primers and 0.4 units of Platinum Taq Polymerase (Table 2). Sequences of amplified products were obtained using high-fidelity capillary electrophoresis conducted by Macrogen. Mono infection of P. malariae DNA verified using 18S rRNA-based PCR was taken as positive control. The PCR master mix with nuclease free water instead of parasite DNA was taken as control.
Multiplicity of infections
As the blood-stage malaria parasites are haploid, the presence of multiple peaks during evaluation of fragment size or VNTR analysis for one or more alleles at target locus was inferred as co-infection with two or more genetically distinct variants. This was referred to as multiplicity of infections (MOI) for the same isolate (20, 21, 67). The mean MOI for positive samples was calculated independently for each marker by dividing the total number of P. malariae clones identified by the number of samples PCR positive for the parasite. For microsatellites, the single or predominant locus at each allele was considered for evaluating allele frequencies. The allele fragment size was interpreted using GeneScan™ version 3.1. Additional alleles were scored only if the peak height was at least one third the height of the major peak during fragment analysis. For pmmsp1 gene, sequence data were interpreted using Bioedit version 7.0.4. Allelic frequencies for pmmsp1 gene were determined based on length of consensus sequences for variable number of tandem repeats (VNTR) in semi-conserved regions interspersed between coding blocks (Fig. 1).
Measures of diversity
The expected heterozygosity (HE) was calculated using FSTAT version 1.2 based on previously described formula HE = [n / (n−1)][1− =1p2] where p is the frequency of the ith allele and n is the number of alleles sampled (26). LIAN version 3.7 was used for analyzing overall multilocus linkage disequilibrium (LD) implementing a standardized index of association ( ). LD for candidate genotyping markers with p-values <0.05 was considered as significant (68). Dendogram for microsatellite fragment analysis data was constructed using ClustVis (69). Blastx (https://blast.ncbi.nlm.nih.gov/Blast.cgi?LINK_LOC=blasthome&PAGE_TYPE=BlastSearch&PROGRAM=blastx) without low-complexity filter was used for identification of regions targeted by primers. Tandemly repeated sequences and copy numbers were identified by using TRF version 4.09 (70). The number of haplotypes (H), haplotype diversity (Hd) and pairwise nucleotide diversity (π) were evaluated using DnaSP v5 . Phylogenetic tree was constructed using Mega version 7.0 (71).
Likelihood of coinfections using genetic markers
The likelihood of two infections by same genotype were deduced by combining individual probabilities of two or more unlinked genetic markers and designated as combined probabilities (πPi) , where πPi = P1 × P2 ×…Pi, where Pi=Σp2i is individual probabilities for each markers being utilized. The assumption was based on each infections being independent and the probability of reinfection by same genotype were products of probabilities for individual markers (72).
Sensitivity and specificity
The specificity of primers was assessed on samples from symptomatic P. malariae patients (Table S2) compared to the reference nested-PCR method targeting the 18S rRNA (60). Specificity of all primer products were checked for amplification of unspecific products to access true positive results. The specificity of the markers was tested by two approaches: (i) BLAST analysis of the primers sequences against the NCBI databases and in silico PCR using default settings of UGENE version 1.30 (73) against whole genome reference database (64) of all available human infecting Plasmodium species. All markers did not belong or cross-aligned with any regions of other Plasmodium species. (ii) The samples used were verified for mono-infection of P. malariae pre-confirmed using 18S rRNA-based PCR specific. During PCR validation, all markers were specific to P. malariae only and had no cross reactivity to P. falciparum and P. vivax. (iii) Screening for sequence polymorphisms in the pmmsp1 gene by comparing to reference sequence or fragments analysis of amplified products. Non-specific amplification was not observed for any markers during assessment of qPCR products by gel electrophoresis. For microsatellite markers, inspection of potential contamination was safeguarded by routine insertion of known negative samples in each PCR run to access true negative results. For all six microsatellite markers except one sample for Pm06_506 (1/37), the target amplification was positive for all symptomatic P. malariae patients (N=37), thus making sensitivity and specificity ~100% (Table S3). For primers targeting the pmmsp1 gene polymorphisms, all well-sequenced samples (N=27) exhibited VNTR polymorphisms and non-specific amplification or false positive results were not identified (Table S3). Additional inspection of potential contamination involved cross-comparison of the amplified region with known sequences from P. falciparum and P. vivax. To avoid cross-contamination of samples during DNA addition and PCR processing steps, negative controls consisting of only water were added in each run.