Sequencing of P. falciparum and analysis of allele frequency
High-quality sequence data obtained from 25 P. falciparum clinical isolates(Additional file 1) collected from the West Arsi of Ethiopia enabled the identification of 672,956 biallelic SNPs with less than 10% missing SNPs data and < 10% sample missing data in the individual isolate. All isolates had 95.95% (645715/672,956) SNPs call. Sequences from the intergenic regions had lower read coverage compared to those sequences in the coding regions, and as a result, 78.92% (531120/672,956)of all SNPs called were located within genes. Of 5,058 genes analyzed, 3,370 genes had at least one SNP (Table1, Additional file 2 Figure 1A). About 18517 SNPs were polymorphic marker in at least one sample in Ethiopian (n=25) samples of which 43.4 % (8,037/18,517) were non-synonymous coding SNPs, 22.8% (4,222/18,517) synonymous coding SNPs, 26.6 % (4,932/18,517) in intergenic regions, 3.6% (666/18,517) other intragenic regions and 3.5 % (656/18,517) SNPs in intron region. Similarly, P. falciparum populations from Cambodia (n=46), DR Congo (n=50), Malawi (n=50), and Thailand (n=49) had 32,854, 68,476,79,250 and 30,427 biallelic polymorphic SNPs marker in at least one sample, respectively (Table 1, Additional file 2 Figure 1B). The proportion of non-synonymous coding to synonymous coding and the intragenic to intergenic SNPs were ~2 or above in all populations (Table 1).
In general, all populations had ahigh percentage of non-synonymous coding SNPsat polymorphic marker consistent with previous findings[17].SNPs with minor allele frequency (MAF) <5% were common in all analyzed P. falciparum populations following the exclusion of monomorphic SNPs in each population. Further, SNPs with minor allele frequency of < 5% occurred more frequently in samples from Malawi than in Ethiopian isolates (Additional File 2-Figure 2).
Genomic diversity of P. falciparum infections
FWS scores ranged from 0.837 to 0.997 (mean= 0.97, median = 0.99) for Ethiopian P. falciparuminfections whereas the FWS values in Cambodia ranged from 0.702 to 0.999 (mean = 0.962, median = 0.995), from 0.483 to 0.998 (mean = 0.94, median = 0.994) in Thailand, from 0.321 to 0.998 (mean = 0.94, median = 0.994) in DR Congo and from 0.194 to 0.997 (mean = 0.747, median = 0.762) in Malawi (Figure 1;Additional file 3).
The FWS value of >0.95 suggests that the individual samples predominantly contained a single genotype and could have other additional genotypes in lower proportions.In this study, FWSvalues of >0.95 were observed in 84%, 79.6%, 78%, 50%, and 36 % of samples from Ethiopia, Thailand, Cambodia, DR Congo, and Malawi respectively.
The mean FWS scores of the EthiopianP. falciparum population were not significantly different from Cambodia’s(Welch two Sample t-test, P = 0.42) and Thailand’s (Welch two-sample t-test, p = 0.083) at 95% confidence intervals. However, meanFWSwas significantly higher in Ethiopia compared toDR Congo (Welch two-sample t-test, p = 5.603e-06) and Malawi (Welch two-sample t-test, p = 3.242e-08) at 95% confidence intervals.
Population structure and admixtures
Analysis using PCA revealedthe presence of four clear major population groups of isolates, which were coincident with their geographical origins (Figure 2A-C).
Similarly, the findings from admixture analysis were consistent with the PCA clustering. The isolates from the three regions were distinguished. This admixture analysis showed that four major components could be differentiated with a cluster value of K = 5. Multiple parasite subpopulations were observed in Malawi and DR Congo parasite populations suggestive of gene flow between these two populations(Figure 3).There was no detectable gene flow between the isolates from Ethiopia and East African orSoutheast Asia.
The clustering of Ethiopian P. falciparumisolates was consistent with the fixation index (FST) values with or without correcting for sample size. The FST values of Ethiopian isolates versus those from the two other East African regions (DR Congo and Malawi) ranged from 0.08 to 0.09,whilethe FSTvalue of Ethiopian P. falciparum versus the two southeast Asian regions (Thailand and Cambodia) was 0.18 (Table 2).
Signatures of selection in the P. falciparum isolates
The Ethiopian isolates had the average Tajima’s D value of 0.18 across the entire genome (One sample t-test, p < 2x10-16). 1,450 genes had at least one SNP with TajimaD value > 1of which 125 genes had at least five SNPs with Tajima D values >1 of which 125 genes had at least five SNPs with Tajima D values >1 (Additional file 4). These genes include apical membrane antigen-1 (ama1), erythrocyte binding antigen-175 (eba175), merozoites surface protein-1 (msp1),thrombospondin related anonymous protein ( trap), duffy binding like merozoites surface protein (dblmsp), and cytoadherence linked asexual protein 2 (clag2), that were previously reported for the balancing selection [24,29].
Thestandardized integrated haplotype homozygosity score (IHS)was applied to investigate genome-wide evidence for recent positive directional selection due to drug pressure, immune impact, or other mechanisms. Using |IHS| score of >2.5 (top 1% of the expected distribution) as a threshold for hits, 36 genes with at least one SNP that could be under significant positive selection were identified, and out of these, 15 genes had at least two SNPs (Table 3).
Thirteen (13) out of the above 15 genes under positive directional selection showed both positive balancing and directional selections (Table 4) and these genes include the vaccine candidate gene SURF4.2 on chromosome 4 and CLAG8 (cytoadherence linked asexual protein 8) on chromosome 8[36].
Interestingly, our analysis failed to detect selection signals in drug-resistance genes such as Pfcrt, Pfmdr1, Pfdhfr, and Pfdhps. The reason could be that IHS may not be suitable for detecting positive selection for those SNPs that have reached or are near fixation in the local P. falciparumpopulation[34].
Prevalence of mutations conferring antimalarial drug resistance in P. falciparum
Table 5 shows inter-population differencesin the prevalence of drug resistance genes observed among the P. falciparumglobal datasets analyzed. In tandem with previous studies [20,37]that suggest temporal differences in the geographical distribution of antimalarial drug resistance mutations, we observedthat CQ-resistance alleles (Pfcrt-K76T, Pfcrt-A220S,andPfcrt-Q271E) were fixed in Ethiopia, Cambodia, and Thailand, regions where malaria transmission rates are comparably low. But, the prevalence of these same alleles was0% in Malawi and ranged from 66% to 72% in DR Congo.
Similarly, drug resistance mutations in Pfmdr1 (Pfmdr1-N86Y and Pfmdr1-Y184F) were also variable among populations. For instance, the Ethiopian parasite population showed the presence of 14% Pfmdr1-N86Y and 100% Pfmdr1-Y184F gene mutations, whereas Pfmdr1-N86Ywas detected in 48% of DR Congo isolates and in 3% of Malawi’s. Also, thePfmdr1-Y184F drug resistance marker was detected in 58% of the P. falciparum population in Cambodia, 32% in DR Congo, 35% in Malawi, and 6% in Thailand’s parasite isolates.
Sulfadoxine/pyrimethamine drug resistance mutations were also present in Pfdhfr and Pfdhpsgenes in all analyzed P. falciparum populations. The major pyrimethamine resistance-conferring alleles such as Pfdhfr-N51Iand Pfdhfr-C59R were also identified in all parasite populations with fixed or near fixation in frequency. Pfdhfr-S108N was fixed in other P. falciparum populations, except in Ethiopia. The variable prevalence of drug resistance-conferring alleles werealso observed in Pfdhps(Pfdhps-S436A, Pfdhps-G437A, Pfdhps-K540N, and Pfdhps-A581G), for the parent drug sulfadoxine resistance.
In terms of artemisinin resistance, the African population-specific Pfk13-K189T mutation was observed in Ethiopia (in 20% of the samples), DR Congo (17%), and Malawi (13%). This mutation was previously identified in African P. falciparum populations[20,37]. As previously reported[8]the validated and most characterizedartemisinin resistance-conferring mutation PfK13-C580Y was identified in Cambodia (36% of the samples) as well asin Thailand (26%), but not in Africa.