Genetic polymorphism and natural selection of PfTRAP on Bioko Island
Of the 153 blood samples extracted from our collections on Bioko Island, 121 yielded suitable PfTRAP amplicons for sequencing and deposited to NCBI Genbank (MK981410 - MK981530). Finally, 119 monoclonal sequences were applied in the further analysis while 2 polyclonal sequences (MK981439, MK981530) were excluded.
As the result of genetic polymorphism and natural selection analysis shown in Table 2, on Bioko Island, a total of 87 haplotypes were found among 119 samples, with the haplotype diversity (Hd) for 0.99. Furthermore, nucleotide diversity (π) of Bioko Island was detected as 0.01046, which was a relatively high value among the 14 countries in this analysis. As for the parameters related to natural selection on Bioko, the value of dN-dS was 6.2231 (p < 0.05), which hinted at natural selection. Moreover, the value of Tajima’s D was shown as negative (-0.41438), which signified an excess of low frequency polymorphisms, indicating population size expansion or purifying (negative) selection. To analyze the recombination degree of PfTRAP on Bioko Island, the minimum number of recombination event (Rm) and the probability of recombination between adjacent sites (Rb) was detected as 22 and 119, respectively, which shows relatively high level among the 14 countries or areas included in this analysis.
Table 2
Result of genetic diversity, natural selection and recombination among global PfTRAP sequences
Countries | n | S | H | Hd | π | Tajima’s D | dN-dS | Rm | Rb |
Africa | | | | | | | | | |
Bioko | 119 | 100 | 87 | 0.99 | 0.01046 | -0.41438 | 6.2231* | 22 | 119 |
Congo | 54 | 76 | 51 | 0.997 | 0.01029 | -0.08216 | 6.3986* | 19 | 298 |
Gambia | 34 | 65 | 28 | 0.985 | 0.01025 | 0.04683 | 6.9506* | 16 | 372 |
Ghana | 228 | 87 | 193 | 0.9968 | 0.01075 | 0.47289 | 7.0314* | 27 | 287 |
Guinea | 49 | 73 | 43 | 0.993 | 0.009778 | -0.19876 | 6.6430* | 21 | 162 |
Malawi | 106 | 75 | 84 | 0.993 | 0.01066 | 0.61245 | 6.0216* | 26 | 124 |
Mali | 39 | 67 | 39 | 0.999 | 0.01043 | 0.12807 | 7.4272* | 16 | 339 |
Senegal | 110 | 76 | 64 | 0.977 | 0.01006 | 0.32976 | 7.2140* | 19 | 142 |
Asia | | | | | | | | | |
Thailand | 104 | 40 | 29 | 0.947 | 0.00546 | 0.61617 | 4.6708* | 14 | 62.4 |
Vietnam | 60 | 40 | 26 | 0.957 | 0.00567 | 0.37851 | 5.0222* | 12 | 45.2 |
Myanmar | 45 | 40 | 22 | 0.954 | 0.00507 | -0.22333 | 4.8873* | 10 | 40.4 |
Bangladesh | 23 | 53 | 23 | 1 | 0.00769 | -0.34671 | 5.8936* | 14 | 53.7 |
Cambodia | 393 | 47 | 71 | 0.924 | 0.00517 | 0.52199 | 4.8661* | 18 | 35 |
Laos | 42 | 41 | 25 | 0.961 | 0.00614 | 0.21991 | 5.2813* | 11 | 52 |
Footnote: n for number of sequences; S for segregating sites; H number of haplotypes; Hd for haplotype diversity; π for nucleotide diversity; dN for number of non-synonymous substitutions per non-synonymous site; dS for number of synonymous substitutions per synonymous site; Rm for minimum number of recombination event; Rb for the probability of recombination between adjacent sites; * for p < 0.05. |
Global PfTRAP comparative analysis
Not only Bioko Island, but also 7 other African countries and 6 Asian countries were included in the genetic diversity and natural selection analysis of PfTRAP. 1287 global sequences mined from Pf3K database (23 for Bangladesh, 393 for Cambodia, 54 for Congo, 34 for Gambia, 228 for Ghana, 49 for Guinea, 42 for Laos, 106 for Malawi, 39 for Mali, 45 for Myanmar, 110 for Senegal, 104 for Thailand, 60 for Vietnam) and 119 Bioko sequences were applied in the global comparative analysis. According to our result, a total of 161 SNPs were detected from the 1406 PfTRAP sequences (7 SNPs for DomainⅠ, 41 for DomainⅡ, 23 for DomainⅢ, 83 for DomainⅣ, 3 for DomainⅤ and 4 for DomainⅥ, respectively), among them there were 42 SNPs appeared only once. Due to the uncertainty, these single SNPs were not included in the following analysis (original SNP information was shown in Additional file 1). There were 7 SNPs (S91N, K130R, S179N, L287P, N317K, S419P and A489G) popular globally, with percentage over 80%, and notably, there are 60 SNPs unique to Africa, while there were 12 SNPs unique to Asia. Not surprisingly, no mutations were found in some important motifs including WSPCSVTCG in A-domain, metal ion-dependent adhesion site (MIDAS), IQQ motif, A glycoprotein with a cellular recognition function (RGD) and two important points (aa131 and aa162).
As shown in Table 2, both Hd and π show that the polymorphism in Africa is greater than in Asia (except Bangladesh). All the 14 countries and areas in this analysis shown evidences of natural selection with the positive value of dN-dS (p < 0.05). Except for Bioko, Congo, Guinea, Myanmar, Bangladesh, the Tajima’s D of other 9 countries shown positive value, which signified low levels of both low and high frequency polymorphisms, indicating a decrease in population size and/or balancing selection (Table 2). In the sight of the Tajima’s D of the full-length PfTRAP, on Bioko, the value of DomainⅡ is positive and obviously deviated from 0, while most of the rest region got the negative value (Fig. 2). The overall trend is similar in Bioko and Africa, while in Asia, higher positive Tajima’s D was found in N-terminal of DomainⅡ and C-terminal of DomainⅣ (Fig. 2). When turns to the recombination effect, the parameters (Rm and Rb) revealed that more frequent recombination events were taking place in Africa rather than Asia (Table 2).
For further exploration of the relationship of haplotypes, a haplotype network was constructed based on global PfTRAP sequences (excluded repeat region) using NETWORK software. A total of 661 haplotypes were detected among 1406 isolates (Fig. 1), among them, 667 Asian isolates clustered in 136 haplotypes and 739 African isolates clustered in 528 haplotypes, which indicated the higher heterogeneity found among African Plamsodium falciparum population. Generally, the vast majority of haplotypes were limited to one continent (Africa or Asia) and only several haplotypes (Hap_8, Hap_18, Hap_69, Hap_104 and Hap_370) were shared between two continents isolates (Detailed information about haplotypes was shown in Additional file 2 and Additional file 3). In the meantime, we found that the haplotypes of Bioko Island isolates were scattered. Obviously, haplotypes from Asian isolates were distributed relatively concentrated while the haplotypes from African isolates were in greater dispersion with long branch, which indicated that the recent mutations have been more active in Africa than Asia.In order to analyze the genetic differentiation among PfTRAPs from 14 malaria endemic regions, Fixation Index (FST) was calculated and shown in Table 3. According to Sewall Wright rules, FST range from 0 to 0.05 reflects little genetic differentiation; FST range from 0.05 to 0.15 means moderate genetic differentiation; FST range from 0.15 to 0.25 for great genetic differentiation, while FST over 0.25 means extremely high genetic differentiation. According to Table 3, it seems like the TRAP gene from Bioko Island have not so much genetic differentiation with other 7 African countries in this analysis according to the FST between them shows little or moderate level of genetic differentiation (p < 0.05). But the FST between Bioko and Asian countries were over 0.15 (p < 0.05), which indicated a great genetic differentiation. Generally, the differentiation of PfTRAP among African countries is less pronounced, and the same phenomenon was found among Asian countries, but the high degree of genetic differentiation between African and Asian countries cannot be ignored (p < 0.05).
Table 3
Result of Fixation Index (FST) of PfTRAP between populations of 14 countries and 3D7 isolate.
| 3D7 | Bioko | Congo | Gambia | Ghana | Guinea | Malawi | Mali | Senegal | Thailand | Vietnam | Myanmar | Bangladesh | Cambodia |
Bioko | 0.01884 | | | | | | | | | | | | | |
Congo | 0.04111 | 0.03557 | | | | | | | | | | | | |
Gambia | -0.02608 | 0.04804 | 0.02839 | | | | | | | | | | | |
Ghana | -0.00027 | 0.04754 | 0.031 | 0.00462 | | | | | | | | | | |
Guinea | -0.01109 | 0.05875 | 0.03512 | -0.01171 | 0.0105 | | | | | | | | | |
Malawi | 0.04597 | 0.04828 | 0.0117 | 0.04733 | 0.03895 | 0.05173 | | | | | | | | |
Mali | 0.02442 | 0.08476 | 0.06783 | 0.00006 | 0.01863 | 0.00512 | 0.07585 | | | | | | | |
Senegal | 0.01457 | 0.06236 | 0.04189 | -0.0032 | 0.00733 | 0.00253 | 0.05327 | 0.00427 | | | | | | |
Thailand | 0.41365 | 0.18493 | 0.2088 | 0.1872 | 0.17375 | 0.18402 | 0.20101 | 0.20032 | 0.19828 | | | | | |
Vietnam | 0.39716 | 0.1835 | 0.20929 | 0.17677 | 0.17109 | 0.17222 | 0.19248 | 0.18318 | 0.18338 | 0.01441 | | | | |
Myanmar | 0.45549 | 0.20845 | 0.22712 | 0.20579 | 0.19298 | 0.20113 | 0.2189 | 0.2141 | 0.20958 | -0.00322 | 0.01804 | | | |
Bangladesh | 0.32227 | 0.1773 | 0.18833 | 0.15792 | 0.15682 | 0.15563 | 0.18121 | 0.17234 | 0.16962 | 0.03613 | 0.01636 | 0.45549 | | |
Cambodia | 0.37617 | 0.19574 | 0.21869 | 0.18226 | 0.17058 | 0.17901 | 0.20142 | 0.18779 | 0.18695 | 0.02722 | -0.01528 | 0.0316 | 0.0352 | |
Laos | 0.37235 | 0.17651 | 0.19926 | 0.16232 | 0.15973 | 0.15977 | 0.18318 | 0.16318 | 0.16903 | 0.0392 | -0.02372 | 0.03975 | 0.01368 | -0.01303 |
Footnote: The value in bold is statistically significant (p < 0.05). |
Effect prediction of mutations located at immune epitopes
In this study, all the proven T cell epitopes, B cell epitopes, as well as nonsynonymous substitutions were marked and presented in Fig. 3. As we can see, 98 amino acid substitutions were located at the B cell epitopes while 30 substitutions were at T cell epitopes. 24 substitutions located at the overlap region of T cell epitope and B cell epitope. Since previous reports shown that cellular immunity is more dominant than humoral immunity in the PfTRAP-based vaccine clinical trials, the following mutation effect prediction analysis was focus on the variations at T cell epitopes. In Table 4, among the 30 mutations in T cell epitopes, 13 of them were predicted as benign and 12 for possibly damaging. Not surprisingly, the high frequency (> 90%) mutations were all predicted as benign. It is worth noting that there were 5 mutations (I116T, L221I, Y128F, G228V and P299S) predicted as probably damaging and interestingly, all these 5 mutations were mostly or totally distributed in Africa region, with the occurrence frequency ranging from 0–28% worldwide.
Table 4
Information of mutations at T cell epitopes and mutation effect prediction.
Region | Mutations | Frequency in global | Frequency in Bioko | Distribution (Africa%/Asia%) | Polyphen score | ΔΔG |
A-Domain | | | | | | | |
| E 46Q | 24% | 32% | 75/25 | 0.893* | -0.211536 |
E 46R | 0%(7/1406) | 0% | 100/0 | 0.732* | 0.0221299 |
L 49V | 1% | 3% | 100/0 | 0.944* | 1.12894 |
I116S | 13% | 19% | 99/1 | 0.924* | 0.342025 |
I116T | 0%(2/1406) | 0% | 100/0 | 0.965** | -0.386652 |
K119R | 13% | 16% | 52/48 | 0.020 | -0.194173 |
L122I | 2% | 7% | 100/0 | 0.965** | 0.571158 |
S123N | 19% | 7% | 57/43 | 0.000 | -0.26044 |
S123R | 0%(2/1406) | 0% | 50/50 | 0.184 | -0.349001 |
Y128F | 21% | 28% | 98/2 | 0.965** | 0.770957 |
K130R | 93% | 94% | 49/51 | 0 | 0.0703724 |
T134S | 37% | 24% | 43/57 | 0.463* | 0.705496 |
TSR | | | | | | | |
| A219E | 2% | 0% | 8/92 | 0.122 | -0.166547 |
A219T | 2% | 8% | 87/13 | 0.954* | -0.345222 |
E221A | 0%(6/1406) | 1% | 100/0 | 0.924* | 0.128533 |
G228V | 3% | 3% | 100/0 | 0.997** | 0.411189 |
DⅣ | | | | | | | |
| R285G | 1% | 0% | 100/0 | 0.663* | 1.3286 |
R285S | 1% | 0% | 100/0 | 0.663* | 1.16481 |
L287P | 97% | 95% | 51/49 | 0.000 | 0.546262 |
R290W | 29% | 61% | 72/28 | 0.001 | 0.336084 |
D297H | 54% | 29% | 47/53 | 0.000 | -0.336216 |
D297E | 5% | 8% | 92/8 | 0.000 | -0.352555 |
D297Q | 9% | 14% | 25/75 | 0.000 | -0.524996 |
E298D | 2% | 0% | 100/0 | 0.544* | -0.0333006 |
E298K | 0%(4/1406) | 0% | 100/0 | 0.893* | -1.10525 |
P299S | 1% | 5% | 100/0 | 0.958** | 1.17995 |
S419P | 96% | 90% | 51/49 | 0.000 | -0.8004 |
K421N | 29% | 41% | 90/10 | 0.051 | 1.41483 |
V422I | 0%(3/1406) | 2% | 100/0 | 0.455* | -0.147794 |
TM (DⅤ) | | | | | | | |
| Y526F | 4% | 0% | 0/100 | 0.051 | 0.56536 |
Footnote: Polyphen score with * indicated possibly damaging mutation; Polyphen score with ** indicated probably damaging mutation. Polyphen score without * indicated benign mutation. |
Furthermore, the three dimensions (3D) structure of full-length 3D7 PfTRAP was predicted and modeled by using I-TASSER server and displayed in YASARA application (Fig. 4). Based on the predicted structure of TRAP, we calculated the changes in free energy difference before and after the mutations (ΔΔG), and the result shows that these mutations L49V, R285G, R285S, P299S and K421N would lead to the destabilization because of the obviously increased free energy difference (ΔΔG > 1). Furthermore, the spatial conformational changes of the twelve relatively common (> 10% globally) mutations at T cell epitopes were predicted and presented in Fig. 5. According to it, 7 amino acid substitutions (E46Q, I116S, K119R, S123N, Y128F, L287P and R290W) were identified as would lead to the breakage of hydrogen bonding, which would further impair the stability of the protein structure after these mutations (Fig. 5).