Amplification of Bioko PfCSP
Of the 148 blood samples from our collections on Bioko Island, 118 yielded PfCSP amplicons suitable for sequencing. Of these, 22 polyclonal PfCSP sequences were excluded and the remaining 96 full-length monoclonal PfCSP sequences were analyzed in this study. As expected, the amplified PfCSP products varied in size, from approximately 1.1 to 1.2 kb, mainly because of differences in the number of tandem repeats in the central repeat region. The nucleotide sequences have been deposited in GenBank under accession numbers MN623126-MN623221.
Genetic polymorphisms of the N-terminal region of Bioko and global PfCSP
The N-terminal non-repeat region was relatively conserved in Bioko PfCSP. Compared with the 3D7 reference sequence (XM_001351086), five variations were found in the PfCSP N-terminal region of Bioko parasites: L5F (2.08%, 2/96), R70K (1.04%, 1/96), D82N (1.04%, 1/96), A98G (24%, 23/96) and a 57 bp insertion encoding 19 amino acids (80NNGDNGREGKDEDKRDGNN81; 50%, 48/96). A comparative analysis showed that this region is also relatively well conserved in global PfCSP. As shown in Figure 1A, the 19-amino-acid insertion and A98G were the two major variations observed globally. Both occurred at high frequency in almost all Asian and Oceanian countries (80% to 100%) but at lower frequency in African and American isolates (15% to 79%). Some other variations were unevenly distributed geographically. D99G and G100D were detected only in Indian and Iranian parasites, at a proportion of approximately 50% (Figure 1A).
Genetic polymorphisms of the central repeat region of Bioko and global PfCSP
A total of 7 haplotypes of the Bioko PfCSP central region were found at the amino acid level (Figure 1B). The number of NANP/NVDP repeats was analyzed and compared among Bioko and global isolates. In Bioko PfCSP, the most common repeat numbers were 40 (35%, 34/96) and 41 (34%, 33/96). Globally, the number of NANP/NVDP repeats differed by geographic location. As shown in Figure 1B, most global isolates in this study carried 40 to 43 repeats, while the patterns in the Philippines, India and Iran were more polymorphic than the others.
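The repeat counts reported above can be illustrated with a minimal Python sketch that tallies NANP and NVDP tetrapeptide units in a translated central repeat region. The function name `count_repeats` and the toy sequence are hypothetical and are not taken from the study; the sketch assumes the input is an amino acid string already trimmed to the central region.

```python
import re
from collections import Counter

# The two tetrapeptide units that make up the PfCSP central repeat region.
REPEAT_UNIT = re.compile(r"NANP|NVDP")

def count_repeats(central_region):
    """Count non-overlapping NANP and NVDP units in an amino acid string.

    Assumption: `central_region` is the translated central repeat region only.
    """
    counts = Counter(REPEAT_UNIT.findall(central_region.upper()))
    return {
        "NANP": counts["NANP"],
        "NVDP": counts["NVDP"],
        "total": counts["NANP"] + counts["NVDP"],
    }

# Toy sequence for illustration only (not a real isolate).
toy_region = "NANPNVDPNANPNVDP" + "NANP" * 3
toy_counts = count_repeats(toy_region)
```

Applying such a function to every isolate and tabulating the `total` values per country would give the kind of repeat-number distribution summarized in Figure 1B.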
Genetic polymorphisms and natural selection of the C-terminal non-repeat region in Bioko and global PfCSP
Nucleotide diversity (π) of the C-terminal non-repeat region was analyzed in Bioko and global PfCSP (Figure 2). Both the Th2R (314KHIKEYLNKIQNSL327) and Th3R (352NKPKDELDYAND363) regions, which are established T-cell epitopes, showed high nucleotide diversity, while the region connecting Th2R and Th3R was conserved. The pattern of nucleotide diversity in Bioko PfCSP closely matched that of other African countries. Compared with Asia, Africa and America, Oceania showed relatively low diversity, especially in the Th2R region, where almost no nucleotide diversity was observed (Figure 2).
Parameters associated with nucleotide diversity and natural selection were also evaluated for the C-terminal non-repeat region (residues 311-363) of Bioko and global PfCSP (Table 1). In Bioko PfCSP, the average number of nucleotide differences (K) was 5.775 and the overall haplotype diversity (Hd) was 0.962±0.008. The estimated dN-dS value was 0.0166 (Table 1). To further assess natural selection in the C-terminus of Bioko PfCSP, Tajima’s test and Fu and Li’s test were performed (Table 1). Tajima’s D (-0.68556, p>0.1) and Fu and Li’s F and D (-1.23926, p>0.1 and -1.22255, p>0.1, respectively) were all negative, but none was statistically significant.
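For readers unfamiliar with these summary statistics, the following Python sketch shows how K, π, Hd, the number of segregating sites (S) and Tajima's D are conventionally defined for a set of aligned sequences. It is an illustration only and not the software used in this study; the function names are hypothetical, and the sketch assumes equal-length, gap-free sequences. It does not compute standard errors or p-values.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def diversity_stats(seqs):
    """Return K, pi, Hd and S for aligned, equal-length, gap-free sequences."""
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    length = len(seqs[0])
    # K: average number of nucleotide differences over all sequence pairs.
    diffs = [sum(a != b for a, b in zip(s1, s2))
             for s1, s2 in combinations(seqs, 2)]
    k = sum(diffs) / len(diffs)
    # pi: K expressed per site.
    pi = k / length
    # Hd: Nei's haplotype diversity, n/(n-1) * (1 - sum of squared frequencies).
    freqs = [c / n for c in Counter(seqs).values()]
    hd = n / (n - 1) * (1 - sum(f * f for f in freqs))
    # S: number of segregating (polymorphic) alignment columns.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    return {"K": k, "pi": pi, "Hd": hd, "S": s}

def tajimas_d(k, s, n):
    """Tajima's D (Tajima 1989) from K, S and sample size n.

    Returns None when there are no segregating sites (D is undefined).
    """
    if s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignment for illustration only (not study data).
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
stats = diversity_stats(toy)
d = tajimas_d(stats["K"], stats["S"], len(toy))
```

A negative D, as reported for Bioko, arises when K is smaller than the value expected from S, that is, when there is an excess of low-frequency variants.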
Globally, Hd was generally higher in African countries than elsewhere (Hd>0.9), consistent with a higher level of genetic diversity in African PfCSP. The dN-dS values were positive in all countries except Nigeria, and Tajima’s D values deviated from 0 to varying extents. Recombination events were also evaluated in Bioko and global PfCSP. As shown in Table 1, recombination parameters were relatively high in all African countries and in the Philippines, Bangladesh and Venezuela, and lower in the other countries.
At the amino acid level, the mutation types and their frequencies in the C-terminus (residues 311-363) are presented in Figure 3. A total of 26 logos were generated, one for the 3D7 reference isolate and 25 for isolates from different countries and areas. In Bioko PfCSP, mutations were detected at twelve positions (314, 317, 318, 321, 322, 324, 327, 352, 356, 357, 359, 361), all of which lie within the two T-cell epitopes (Th2R and Th3R). The overall pattern in Bioko was similar to those of other African countries. African isolates, as well as Philippine and Venezuelan isolates, carried a greater variety of mutations, whereas the Oceanian mutation patterns tended to be simpler. The rare mutation L320I was found only in the Philippines, and S326A only in Venezuela. The high-frequency mutation A361E was present in all 25 countries, while the wild-type residue (A361) was found mainly in Africa. Notably, the wild-type residues at positions 317, 318 and 321 were rarely seen in global PfCSP isolates; K317E, E318K, E318Q and N321K predominated at these positions instead (Figure 3).
C-terminus point mutation effect prediction
A total of 28 variants were found at 16 positions in the C-terminus of global PfCSP isolates. As shown in Table 2, the mutations K322I, N325Y and S326A were predicted to be deleterious by SIFT (SIFT score < 0.05). According to the HumDiv scores from PolyPhen 2.0, 13 mutants were predicted to be benign, 4 possibly damaging and 11 probably damaging. Among the probably damaging mutants, K317T, K317A, L327I, N352G, P354S and A361I were predicted to destabilize the protein structure (ΔΔG > 0). Some high-frequency mutations, such as K317E (84.32%), N321K (84.76%) and A361E (72.43%), were predicted to be benign. The damaging predictions for some extremely low-frequency mutations, such as K317A (0.17%), S326A (0.09%), G349D (0.13%) and D356G (0.09%), were less convincing (Table 2).
Population differentiation analysis of PfCSP C-terminus among global P. falciparum isolates
A TCS haplotype network was constructed from the 96 Bioko samples together with 2,200 global monoclonal PfCSP C-terminal sequences mined from the Pf3k database and NCBI. The 2,296 PfCSP C-terminal sequences clustered into 138 unique haplotypes (H_1 to H_138); detailed haplotype information is given in Additional File 2. Fifty-eight haplotypes were shared by PfCSP sequences from at least two different countries, and 70 haplotypes were singletons (represented by a single sequence). H_1, the haplotype of the 3D7 reference isolate and of the RTS,S malaria vaccine component, accounted for only 2.08% (2/96) of Bioko isolates and 3.35% (77/2,296) of worldwide isolates, 74 of which were from Africa. H_62 was the only haplotype containing samples from all four continents (Africa, Asia, America and Oceania), but its prevalence was low (24/2,296). Interestingly, isolates from Africa and America shared the same or closely related haplotypes (H_54, H_131), while the haplotypes of Oceanian isolates (H_35, H_134) were more closely related to those of Asian isolates. These observations are consistent with the Fst results in Table 3. Fst between Bioko Island and the African mainland indicated very little population differentiation (Fst=0.00878, p<0.05). In contrast, clear population differentiation was identified among the American, Asian, Oceanian and African parasite populations (p<0.05). Relatively closer genetic relationships were found between the African and American parasite populations and between the Asian and Oceanian parasite populations (Fst=0.19194, p<0.05 and Fst=0.06564, p<0.05, respectively).
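As an illustration of the bookkeeping behind these counts, the Python sketch below collapses identical sequences into haplotypes, counts singletons and haplotypes shared between countries, and computes a pairwise Fst. The function names and toy data are hypothetical. The Fst shown is the Hudson, Slatkin and Maddison (1992) sequence-based estimator, chosen here for simplicity; the estimator and software behind Table 3 are not stated in this section and may differ, and no permutation test for significance is included.

```python
from collections import defaultdict
from itertools import combinations, product

def collapse_haplotypes(records):
    """Group (country, sequence) records by identical sequence."""
    haplotypes = defaultdict(list)
    for country, seq in records:
        haplotypes[seq].append(country)
    return dict(haplotypes)

def summarize_haplotypes(haplotypes):
    """Count haplotypes, singletons, and haplotypes seen in >= 2 countries."""
    return {
        "haplotypes": len(haplotypes),
        "singletons": sum(1 for c in haplotypes.values() if len(c) == 1),
        "shared": sum(1 for c in haplotypes.values() if len(set(c)) >= 2),
    }

def _mean_diff(pairs):
    """Mean number of differing sites over an iterable of sequence pairs."""
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs]
    return sum(diffs) / len(diffs)

def hudson_fst(pop1, pop2):
    """Fst = 1 - Hw/Hb (Hudson et al. 1992) for two lists of aligned sequences.

    Hw is the mean of the two within-population pairwise differences and Hb
    is the mean between-population pairwise difference. Each population
    needs at least two sequences.
    """
    hw = (_mean_diff(combinations(pop1, 2)) + _mean_diff(combinations(pop2, 2))) / 2
    hb = _mean_diff(product(pop1, pop2))
    return 1 - hw / hb

# Toy data for illustration only (not study data).
toy_records = [("Bioko", "AAA"), ("Ghana", "AAA"), ("Bioko", "AAT"),
               ("Laos", "ATT"), ("Laos", "ATT")]
toy_summary = summarize_haplotypes(collapse_haplotypes(toy_records))
toy_fst = hudson_fst(["AAAA", "AAAT"], ["TTTT", "TTTA"])
```

Values of Fst near 0 indicate that sequences drawn from different populations are about as similar as sequences drawn from the same population, while values approaching 1 indicate strong differentiation.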