P. vivax whole genome sequence data and clonality
A total of 499,206 high-quality bi-allelic SNPs were identified in the non-hypervariable regions of the P. vivax genome after filtering, comprising 1,534 isolates from 29 countries. In keeping with previous work20, we divided these countries into 7 sub-regional populations based on the degree of geographic and genetic proximity: East Africa (N = 173), West Africa (1), South America (364), South Asia (156), South East Asia (SEA) (550), Maritime SEA (66), and Oceania (224) (Table S1).
Within-sample diversity, as a metric of multiclonality, was measured using the FWS fixation index, where an FWS ≥ 0.95 indicates an infection predominated by a single genotype. Here, a large proportion of isolates (71.7%) were monoclonal (FWS ≥ 0.95). Regionally, mean FWS was greatest in Maritime SEA (0.96), compared with South Asia (0.92), South America (0.92), East Africa (0.91), SEA (0.91), and Oceania (0.86) (Fig. S1, Table S1). These observations likely reflect trends in transmission intensity, where higher transmission intensity is often marked by higher clone multiplicity.
Global P. vivax isolates form distinct sub-populations to sub-continent level
An analysis of population structure using a principal component analysis (PCA) approach applied to 499,206 bi-allelic SNPs revealed P. vivax isolates form distinct and independent sub-populations, reflective of their continental or sub-continental origins (Fig. 1A). The first principal component separated East Asian and Oceanian populations from South American, South Asian, and African populations, producing three major geographical population centres, with South American populations being the most distinct. The population structure was supported by the corresponding SNP-based neighbour joining tree (Fig. 1B). The number of highly differentiating SNPs (FST ≥ 0.99) was positively correlated with geographic distance (Spearman’s Coefficient = 0.64) (Table S2). These findings suggest gene flow between neighbouring territories is contributing to this phylogeographic distribution and is supported by known waves of human migration27.
An ADMIXTURE ancestral analysis inferred that global P. vivax isolates descend from ten ancestral populations (K1-K10), comprising two in East Africa (K1 and K5), one in South Asia (K1), three in South America (K2, K3, K9), three in SEA (K1, K8, K10), two in Maritime SEA (K4 and K6), and one in Oceania (K7) (Figs. 1C-1D). There was some concordance between ADMIXTURE populations and country of origin, where several ancestral populations were made up of predominantly (> 95%) one country, including K2 (Panama), K3 (Brazil), and K4 and K6 (Malaysia) (Table S1, Fig. S2). Except for K1, which was characterised by African, South Asian, and SEA isolates, all ancestral populations comprised isolates from the same sub-regional grouping. This population structure was supported by a PCA plot coloured by dominant ancestry (Fig. S3).
Pairwise relatedness supports model of isolation by distance
To further dissect the global P. vivax population structure, pairwise relatedness of isolates was inferred by calculating identity-by-descent (IBD), which describes shared evolutionary history. Countries which presented the highest median pairwise IBD include Panama (0.97), Mexico (0.23), and Malaysia (0.19), suggesting reduced outcrossing within these populations (Table S3). While the fractional IBD values of Malaysian and Mexican populations approach the expected value (0.25) of half-siblings in an outbred population, the value for Panama is inflated due to the persistence of a single clone in the region for a decade28. Within the remaining subpopulations, IBD sharing was low (< 6%), revealing that samples were predominantly weakly related, with a minority of very highly related samples (Fig. S4). Regionally, median pairwise IBD was highest in Maritime SEA (0.15) and lowest in SEA (0.01). In agreement with FST observations, there was a moderate correlation between inter-regional median pairwise IBD and geographic distance (Spearman’s Coefficient = -0.39) (Table S4), consistent with the model of isolation by distance29 which posits that populations in closer geographic proximity have increased genetic similarity.
IBD sharing reveals putative selective sweeps at P. vivax drug resistance loci
To investigate patterns of shared ancestry intrachromosomally, we analysed genome-wide IBD fractions calculated across 10 kb sliding windows (Fig. S5). We focused on windows with the top 5% of mean IBD fractions, within or around drug resistance loci, to identify candidates that have most likely experienced selective sweeps. Due to the high genetic relatedness of isolates from Malaysia, Mexico, and Panama (Figs. 1A-1B) and their inflated pairwise IBD values (Fig. S6, Table S3), we excluded them from further analysis. Overall, regions with the highest IBD fractions spanned antigenic loci (pvmsp1, PVP01_0728900; pvmsp5, PVP01_0418400; pvdbp, PVP01_0623800), genes involved in life-cycle specific processes (pvlisp2, PVP01_0304700), and candidate drug resistance loci (pvmrp1, PVP01_0203000; pvdhfr, PVP01_0526600; pvmdr1, PVP01_1010900; pvdhps, PVP01_1429500) (Table S5).
The strongest signals overall (0.35) were observed in Indonesia on chromosome 10 between 320–330 kb (Table S5), a region encompassing two conserved Plasmodium proteins of unknown function (PVP01_1007200, PVP01_1007250), with the former being a pseudogene. This region is of particular interest as it is ~ 140 kb downstream of pvmdr1, the gene putatively responsible for CQR in P. vivax. This peak of IBD sharing at 320–330 kb is also found within a large tract of significant IBD values spanning 200–500 kb on chromosome 10 (Fig. S7, Fig. S8, Table S6). Although peaking at 320–330 kb, the median fractional IBD value of the region was 0.12, and the IBD value at pvmdr1 was 0.11. Interestingly, we found high proportional IBD sharing at this downstream pvmdr1 locus (320–330 kb; PVP01_1007200, PVP01_1007250) in isolates from PNG (0.19), Sudan (0.13), and Brazil (0.11). As with Indonesian isolates, in the Sudanese population, this was found within a tract spanning 280–500 kb, with a median IBD value of 0.17. At pvmdr1 itself, countries with significant IBD values were Sudan (0.26), Peru (0.19), Ethiopia (0.19), and Brazil (0.13).
The pvdhps and pvdhfr genes on chromosomes 14 and 5 respectively are associated with resistance to the combination antimalarial therapy sulfadoxine-pyrimethamine (SP). Although not used to treat vivax malaria, it is used as chemoprevention in regions co-endemic with P. falciparum. We observed a trend of differential IBD values surrounding the genes, where isolates had high fractional IBD for one of the SP-resistance genes, but not the other. Isolates from Eritrea, Indonesia, and Myanmar had much greater fractional IBD values at pvdhps (0.25, 0.22, and 0.21, respectively) than at pvdhfr (0.11, 0.09, < 0.01, respectively) (Table S5).
Finally, we observed significant IBD values in pvmrp1, a gene on chromosome 2 associated with resistance to chloroquine, primaquine, and mefloquine in P. falciparum30 in Vietnamese (0.22), Thai (0.21), Cambodian (0.18), Burmese (0.16), and Colombian (0.14) isolates. We found no evidence of significant IBD sharing in any population at the resistance candidates pvcrt-o (PVP01_0109300), pvpm4 (PVP01_1340900), pvk13 (PVP01_121100), or in the sister genes to pvmrp1 and pvmdr1, pvmrp2 (PVP01_1447300) and pvmdr2 (PVP01_1259100). However, there was a signal from pvaat1 (PVP01_1120000), a gene newly implicated in CQR in P. falciparum25, in isolates from PNG (0.06).
Evidence of selective sweeps in candidate drug resistance loci
A genome-wide scan for the top 1% of genes under recent positive selection in monoclonal isolates was performed using the iHS metric (Table S7). Multiple strong signals (N = 125) were found in the genes encoding the surface proteins pvmsp1 and pvmsp5 (P < 1 x 10− 24), which are under selection pressure due to their interactions with the host immune response. The most significant values were in Afghanistan (P < 1 x 10− 23), Thailand (P < 1 x 10− 14), Cambodia (P < 1 x 10− 11), and Vietnam (P < 1 x 10− 10). Similarly, other hotspots of selection pressure were found in the cytoadherence linked asexual protein (pvclag, Ethiopia (P < 1 x 10− 12), Thailand (P < 1 x 10− 10)), the liver-specific protein 2 (pvlisp2, Afghanistan (P < 1 x 10− 14), Indonesia (P < 1 x 10− 6), Cambodia (P < 1 x 10− 6)), ApiAP2 transcription factors (PVP01_1418100, Cambodia (P < 1 x 10− 7); PVP01_1440600, Cambodia and Vietnam (both P < 1 x 10− 5)), and the invasion proteins pvrbp1a (Cambodia, Indonesia, Pakistan; P < 1 x 10− 10) and pvrbp2b (Afghanistan and India; P < 1 x 10− 5). In agreement with our IBD findings, a series of 18 SNPs (P < 1 x 10− 8) were under selection in Indonesian isolates downstream of pvmdr1 on chromosome 10 (243480–319423), with peaks at a SNP upstream a translation initiation factor (PVP01_1005600, P < 1 x 10− 8) and a SNP in the PVP01_1007200 gene (P < 1 x 10− 7). Isolates from Thailand and Pakistan also had SNPs under selection in PVP01_1007200 (P < 1 x 10− 5).
To identify signals of directional selection, the cross-population metric, Rsb, was used at both a country and regional level. The most common SNPs under differential selection encompassed the cluster of SNPs with significant iHS scores in the Indonesian population downstream of pvmdr1 (N = 248), differentiating Indonesian isolates from those in SEA (P < 1 x 10− 13), East Africa (Eritrea and Ethiopia; P < 1 x 10− 7), South Asia (Afghanistan, India, Pakistan; P < 1 x 10− 10) and South America (Peru; P < 1 x 10− 6) (Fig. 2, Table S8). Beyond SNPs in merozoite surface antigens, we found evidence of selective sweeps via extended haplotype homozygosity at pvmrp1 in isolates from SEA (Cambodia, China, Vietnam, Thailand) when compared to isolates from South Asia (Afghanistan and India, P < 1 x 10− 6). Most of these SNPs were in the promoter region, except for L1361F and 1232I present exclusively in Thai isolates. We also found SNPs within pvmrp2 under differential selection between isolates from Vietnam and China, Cambodia, and India (P < 1 x 10− 6). Similarly, we observed differential signals in pvdhps in SEA isolates (Cambodia, Thailand, Vietnam; P < 1 x 10− 11), and in pvdhfr in South Asian isolates (Afghanistan, India; P < 1 x 10− 7). The SNPs S513R and L845F in MDR1 were under differential selection between Thai isolates compared with Vietnamese (P < 1 x 10− 9) and Cambodian isolates (P < 1 x 10− 6).
Sub-regional differences in the frequency of drug resistance mutations in P. vivax populations
Identifying mutations in putative drug resistance loci can reflect the epidemiology of transmission in both local and global contexts. We therefore evaluated the prevalence of resistance haplotypes (non-synonymous mutations) in all isolates in genes with prior evidence of contributing to antimalarial resistance in P. vivax: pvmdr1 (33), pvmrp1 (53), pvdhfr (17), and pvdhps (20) (Table 1) and several additional genes of interest (Table S9). Multiple pvmdr1 mutations have been previously associated with CQR or reduced chloroquine susceptibility10,14,18, including S513R, G698S, M908L, Y976F, and F1076L, and all, except S513R, are present in the P. vivax PvP01 reference. We observed the frequencies of the putative resistance alleles at 70.9% (698S), 97.9% (908L), 100% (976F), and 73.4% (1076L). The F1076L mutation was observed in all isolates from East Africa and Oceania and had lowest frequencies in South American (81.6%) and SEA (19.4%) isolates. Similarly, the G698S mutation was fixed in isolates from Oceania and Maritime SEA and had lowest frequency in South American (14.0%) isolates. The S513R mutation was found exclusively in isolates from SEA (14.0%), South Asia (37.2%), and East Africa (74.0%). There were lower frequency SNPs with regional specificity, including D500N (South America, 24.2%), V221L (South America, 21.1%), and L845F (Across Asia and Oceania, 9.5%). Only one isolate from Indonesia had a non-synonymous SNP in pvmdr1 (L845F).
Table 1
Prevalence of SNPs in genes that have been associated with drug resistance in P. vivax
Gene | Position | Mutation | East Africa (N = 173) | South Asia (N = 156) | SEA (N = 550) | Maritime SEA (N = 66) | Oceania (N = 224) | South America (N = 364) |
pvdhfr | 1077534 | R58K | | | | | 0.5 | 5.5 |
pvdhfr | 1077535 | R58S | 37.6 | 62.8 | 2.55 | 59.1 | 6.3 | 61.3 |
pvdhfr | 1077711 | N117T | 6.9 | 50.0 | 30.7 | 90.9 | 69.2 | 17.0 |
pvdhfr | 1078180 | N273K | 1.7 | 10.3 | 0.2 | | | |
pvdhfr | 1077878 | I173L | | | | 56.1 | | 8.52 |
pvdhfr | 1077530 | F57I | | 0.64 | 78.0 | 89.4 | 71.0 | 0.5 |
pvdhfr | 1077543 | T61M | | | | 31.8 | 41.5 | 27.5 |
pvdhps | 1270119 | A647V | 24.9 | 4.5 | 0.2 | | | |
pvdhps | 1270401 | A553G | | 10.3 | 29.8 | 86.3 | 10.7 | |
pvdhps | 1270911 | G383A* | 71.1 | 83.3 | 9.6 | 4.5 | 16.1 | 39.8 |
pvdhps | 1270914 | S382C | | | 0.4 | | | 12.9 |
pvdhps | 1270915 | S382A | | | 10.9 | | | |
pvdhps | 1271444 | M205I | 71.7 | 0.6 | 96.2 | 6.1 | 1.8 | 58.5 |
pvdhps | 1271634 | E142G | 64.7 | | | 59.1 | 10.7 | |
pvmdr1 | 479908 | L1076F* | | 1.9 | 19.5 | 1.5 | | 81.6 |
pvmdr1 | 480412 | L908M* | 0.58 | | | | | 8.52 |
pvmdr1 | 480552 | A861E | | 5.8 | 3.6 | | | 0.82 |
pvmdr1 | 480601 | L845F | | 14.7 | 12.0 | 7.6 | 0.4 | |
pvmdr1 | 481042 | S698G | 27.2 | 51.9 | | | | 86.0 |
pvmdr1 | 481595 | S513R | 74.0 | 37.2 | 14.0 | | | |
pvmdr1 | 481636 | D500N | | | | | | 23.6 |
pvmdr1 | 482473 | V221L | | | | | | 21.2 |
pvmrp1 | 154107/8 | A1606V | | | | | | 5.2 |
pvmrp1 | 154168 | H1586Y | | 5.1 | 2.6 | | | 19.8 |
pvmrp1 | 154215/6 | D1570F | | | | | 14.7 | |
pvmrp1 | 154992 | I1478V | 24.3 | 17.9 | 0.2 | | | 9.9 |
pvmrp1 | 154668 | G1419A | 13.4 | 16.7 | | | | 24.7 |
pvmrp1 | 154831 | L1365F | | | | | 17.9 | |
pvmrp1 | 155305 | L1207I | | | 81.5 | | | |
pvmrp1 | 156208 | E906Q | | | | | | |
pvmrp1 | 156563 | E787D | 0.6 | 3.8 | 3.5 | 1.5 | 1.8 | |
pvmrp1 | 158148 | R259I | 35.3 | 27.6 | 0.6 | | | 11.3 |
pvmrp1 | 158223 | T234M | | | 78.4 | | | |
pvmrp1 | 158272 | Y218D | 82.7 | 64.1 | 92.4 | 98.5 | 87.5 | 60.4 |
pvmrp1 | 158545 | V127I | 82.7 | 60.3 | 92.4 | 98.5 | 87.9 | 65.4 |
Allele frequencies were calculated in isolates if, at a country level, their frequency was ≥ 10. Allele frequency is bolded if ≥ 50%. East Africa (Eritrea, Ethiopia, Sudan); South Asia (Afghanistan, India, Pakistan); SEA (Cambodia, China, Myanmar, Thailand, Vietnam); Maritime SEA (Malaysia); Oceania (Indonesia, Papua New Guinea); South America (Brazil, Colombia, Mexico, Panama, Peru). Asterisked SNPs are those where the reference has the putative resistance allele. SNP prevalence was calculated solely for isolates with homozygous alternate calls. |
Mutations in the pvmrp1 orthologue, pfmrp1, decrease susceptibility to chloroquine30. Although there were several SNPs approaching fixation globally (V127I; 81.2%, Y218N, 80.4%), most SNPs showed regional specificity. South American isolates had the greatest number of unique pvmrp1 SNPs including A1606V (5.22%), T1525I (4.4%), and C1018Y (3.3%). Other SNPs with regional specificity of varying frequencies included T234M (SEA, 78.4%), L1365F (Oceania, 17.9%), F560I (East Africa (8.1%) and South Asia (1.9%)). We observed a novel mutation, D1570F, exclusive to Oceanian isolates (Indonesia (16.5%), PNG (10.0%)) (Table 1).
Antifolate (SP) resistance is attributed to mutations in DHFR and DHPS. In pvdhfr and pvdhps, numerous SNPs structurally correspond to resistance conferring mutations in P. falciparum. The pvdhfr S58R and S117T mutations and the pvdhps A383G mutation were present in the PvP01 reference, so we report the frequencies of the wild-type (reference) allele (Table 1). The triple pvdhfr mutants F57L-S58R-S117N (LRN) coupled with the pvdhps double mutant A383G-A553G (GG) are associated with clinical SP failure31. We found 3 instances of the pvdhfr LRN haplotype in isolates from Myanmar, PNG, and Peru (Table S10). The pvdhps GG haplotype was globally more prevalent, found in India (33.3%), Indonesia (1.5%), Malaysia (3.2%), and Thailand (1.9%). Another pvdhfr haplotype of concern is the pvdhfr L/IRMT(F57L/I-S58R-T61M-S117N/T) quadruple mutant, found at moderate prevalence in East Asian and Oceanian isolates: China (16.7%), Malaysia (30.6%), Thailand (65.4%), Myanmar (28.6%), Indonesia (64.9%), and PNG (10.0%). Although we did not observe the pvdhps SAKAV mutant (S382A-A383G-K512E-A553G-V585A)32, we observed the alternate variant, MSAAK (M205I-S382A-A383G-A553G-K512M), exclusive to Thai isolates (4.5%). Finally, although not structurally linked to sulfadoxine resistance, we observed the Q142G mutation in various haplotypes specific to isolates from East Africa (Ethiopia, Eritrea, and Sudan), and the S382C-M205I double mutant specific to isolates from South America (Brazil, Guyana, Panama, and Peru).
Moderate degree of sequence conservation in global mdr1 haplotypes
Haplotype networks were constructed to visualise the global diversity of pvmdr1 and explore haplotypes in low-grade vs. high-grade CQR regions. We observed 169 distinct haplotypes, of which 93 (55.1%) were singletons and 151 (89.3%) had less than 10 observations. A median-joining haplotype network was estimated for the remaining 18 (10.7%) haplotypes identified across 1,238 samples (Fig. 3). Haplotype IV, representing the 598P synonymous mutation, was the most prevalent, present in all geographic subregions (41.1%). Most Oceanian isolates (80.8%) were represented by this major haplotype, indicating that isolates from low and high-grade CQR regions have high pvmdr1 sequence similarity. Globally, 44.0% of isolates had pvmdr1 haplotypes comprised solely of synonymous mutations, suggesting a substantial degree of conservation at pvmdr1. Intra-region estimates of haplotype diversities (h) ranged from 0.33–0.93, with Oceanian isolates having the lowest genetic diversity (h = 0.33), and South Asian isolates with the highest (h = 0.93) (Table S11).
Temporal trends in markers under selection in Indonesian isolates
An ex vivo susceptibility study of P. vivax isolates from Indonesian Papua revealed the persistence CQR between 2005-201833. With conflicting evidence of pvmdr1’s role in mediating CQR and maintained phenotypic CQR, we sought to investigate temporal trends in genome-wide signals of differential selection within the Indonesian population. We divided all monoclonal isolates into two sub-populations: pre-2014 (N = 75) and post-2014 (N = 29), with 2014 being 10 years post-contraindication of chloroquine in Indonesia, and a timeframe we perceive adequate for genotypic changes to occur. Comparing iHS data revealed that the hotspot of SNPs downstream pvmdr1 was present solely in the pre-2014 population (Figs. 4A-B). A scan for the top 1% of genes under differential selection (Rsb) between the pre- and post-2014 populations revealed the novel pvmrp1 mutation D1570F had a significant hit (P < 1 x 10− 8) (Figs. 4A-4C). Pairwise FST revealed that this SNP was the most highly differentiating between the two populations (FST = 0.34; Median FST = 0.027) (Table S12, Fig S6). This SNP first occurred in 2 isolates from 2011, and was present in 58.6% of post-2014 samples, compared with only 20.0% of pre-2014 isolates.
Taking this further, we divided all Indonesian isolates into 5 groups based on year of sample collection (2008-09 (N = 33), 2010-11 (N = 49), 2012-13 (N = 62), 2014-15 (N = 30), and 2016-17 (N = 18)) and applied the same intra- and inter-population selection metrics. Across all groups, hotspots of selection were observed on chromosomes 4 and 7, corresponding to pvmsp5 and pvsmp1, and on chromosome 14, corresponding to proteins expressed in gametocytes (G377, PVP01_1467200) and histone methylation machinery (CARM1, PVP01_142800; JmJC1, PVP01_1430400) (Fig. 4D-F, Table S13). Significant iHS scores at the downstream pvmdr1 locus were observed only in populations 2012-13 and 2014-15 (all P < 1 x 10− 6), indicative of transient directional selection at this locus. Differential selection metrics applied between the divergent 2008-09 and 2016-17 groups (4 years vs. 12 years post-contraindication of chloroquine) revealed SNPs in the sporozoite-invasion associated protein 2 (pvsiap2, P < 1 x 10− 8) and the cysteine-rich protective antigen (pvcyrpa, P < 1 x 10− 7) under differential selection (Table S14). Contrastingly, the FST metric revealed the pvmrp1 SNP L1365F was the most highly differentiating (FST = 0.54) between the two populations, with prevalence of the SNP increasing from 0% in 2008-09 to 38.5% in 2016-17 (Fig. 5).