Amplification and sequence analysis of Myanmar pvcsp
A total of 171 Myanmar pvcsp genes were successfully amplified from the genomic DNA samples used in this study. The amplified pvcsp genes ranged in size from 0.5 to 1.3 kb. Sequence analysis revealed only two pvcsp variants in Myanmar isolates, VK210 and VK247; the P. vivax-like variant was not detected. VK210 variants predominated (n = 143, 83.6%), and VK247 variants accounted for the remaining 16.4% (n = 28). No mixed infections with both variants were detected.
Genetic diversity of the N-terminal non-repeat region of Myanmar pvcsp
The N-terminal non-repeat region of Myanmar pvcsp showed limited genetic diversity. Alignment of the deduced amino acid sequences revealed 7 distinct haplotypes among VK210 variants and 7 among VK247 variants (Fig. 2). The N-terminal non-repeat region of Myanmar VK210 variants was highly conserved: compared with the Sal I sequence (GU339059), only a few amino acid substitutions were found, located in the latter portion of the region. Haplotype 1, identical to Sal I, was predominant (n = 108, 75.5%). An alanine insertion at the end of the RI conserved motif (KLKQP) was identified in haplotypes 2, 3 and 4, and an N86I substitution was found in haplotype 3. Two dimorphic (K91R and P95S) and one trimorphic (K93E/N) amino acid changes were observed in the RI motif of haplotypes 4, 5, 6 and 7 (Fig. 2a). Amino acid changes were also identified in the VK247 variants of Myanmar pvcsp. Haplotype 1, which shared the same sequence as the PNG reference (M69059), was the most prevalent, accounting for 57.1% of the 28 Myanmar VK247 variants. All amino acid changes identified in Myanmar VK247 variants were dimorphic (E96G/A, G99R and N100D). In addition, an eight-amino-acid stretch (97DGAGNQPG104) was detected in haplotypes 6 and 7 (Fig. 2b).
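Haplotype counts and frequencies such as those above come from collapsing identical aligned sequences. The minimal Python sketch below illustrates that step under stated assumptions: the input is a hypothetical list of already-aligned amino acid strings, gaps are written as "-", and strings that differ at any position (including a gap) are treated as different haplotypes. The function name `haplotype_table` and the toy sequences are illustrative only, not the pipeline used in this study.

```python
from collections import Counter


def haplotype_table(sequences):
    """Collapse aligned sequences into haplotypes, most frequent first.

    Returns (label, sequence, count, percent) tuples. Strings that differ
    anywhere, including at gap positions, are treated as distinct haplotypes.
    """
    counts = Counter(sequences)
    total = len(sequences)
    return [
        (f"H{i}", seq, n, round(100 * n / total, 1))
        for i, (seq, n) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical toy alignment; "-" is an alignment gap, so the second string
# carries an extra alanine after a KLKQP-like motif.
toy = ["KLKQP-GDR", "KLKQPAGDR", "KLKQP-GDR", "KLKQP-GDR", "KLRQP-GDR"]
for row in haplotype_table(toy):
    print(row)
# ('H1', 'KLKQP-GDR', 3, 60.0)
# ('H2', 'KLKQPAGDR', 1, 20.0)
# ('H3', 'KLRQP-GDR', 1, 20.0)
```

With 108 of 143 sequences in the most common haplotype, the same percentage calculation gives the 75.5% reported for Myanmar VK210 haplotype 1.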
Genetic polymorphisms of the N-terminal non-repeat region in global pvcsp
Overall genetic polymorphism of the N-terminal non-repeat region was analysed in the global pvcsp population. Comparative analysis showed that the region is relatively well conserved worldwide. An alanine insertion at the end of RI in VK210 variants was the major variation identified in global pvcsp, but its prevalence differed geographically (Fig. 3a). The insertion was present in 100% of pvcsp sequences from Sudan, followed by Cambodia (67.7%), Vanuatu (47.6%), Myanmar (21.7%), Brazil (9.8%) and India (7.6%). No alanine insertion was found in VK210 sequences from Iran, South Korea or Mexico. Indian VK210 variants showed the highest genetic diversity, with amino acid substitutions at 10 positions (E83K/G, K85T/Q, N86K/Y/I, P87A, R88G, N90I, L92V, K93N, Q94H/P and P95G), although each occurred at a low frequency (< 15%). Meanwhile, A82T, N86I, N86S, K91R, P95S and K93E/N were unevenly distributed geographically among global VK210 variants, each at very low frequency. The N-terminal non-repeat region of global VK247 variants was also well conserved, although amino acid changes were found at low frequencies with an uneven geographic distribution (Fig. 3b). The most notable variation was N100D, observed in VK247 variants from Colombia (100%), Mexico (100%), Iran (27.3%) and Myanmar (17.9%). Other amino acid changes, such as E96G/A, A99E, G100R and N101D, also showed an uneven geographic distribution at low frequencies.
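The per-country prevalence of a variant such as the alanine insertion is a simple tally. The sketch below is hypothetical: it assumes each record is a (site, amino acid sequence) pair, that the insertion can be recognised as the string "KLKQPA", and that uninserted sequences never carry an alanine directly after KLKQP. The function name, site labels and sequences are invented placeholders.

```python
from collections import defaultdict


def insertion_prevalence(records, motif="KLKQP", inserted="A"):
    """Percent of sequences per site carrying `inserted` right after `motif`.

    `records` is an iterable of (site, amino_acid_sequence) pairs. A sequence
    that lacks the motif is counted as not carrying the insertion. Assumes
    uninserted sequences never have `inserted` directly after `motif`.
    """
    carriers = defaultdict(int)
    totals = defaultdict(int)
    for site, seq in records:
        totals[site] += 1
        if motif + inserted in seq:
            carriers[site] += 1
    return {site: round(100 * carriers[site] / totals[site], 1) for site in totals}


# Hypothetical records from two placeholder sites
records = [
    ("site_1", "KLKQPAGDRA"),
    ("site_1", "KLKQPGDRA"),
    ("site_2", "KLKQPGDRA"),
    ("site_2", "KLKQPGDRA"),
]
print(insertion_prevalence(records))  # {'site_1': 50.0, 'site_2': 0.0}
```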
Polymorphic pattern of the CRR in Myanmar pvcsp
The CRR of Myanmar pvcsp was extremely diverse in both VK210 and VK247 variants, with 118 and 23 haplotypes identified, respectively. As expected, this diversity was mainly attributable to differences in the number, type and arrangement of PRMs in each haplotype. A total of 47 PRM types were identified in the CRR of Myanmar VK210 variants (Fig. 4), of which GDRADGQPA and GDRAAGQPA were dominant. Twenty-seven novel PRMs that had not been previously reported were found in Myanmar VK210 variants: GDRVAGQPA, GDRAHGQPA, GDRADGKPA, GDRADRQPA, GDGAGGQAA, EDRAAGQPA, GDKAAGQPA, GDRAAGLPA, GDRADVQPA, GDRADGQPV, GDRADGRPA, GDRADGLPA, GDRADGQPT, GDRAARQPA, GDRAAGRPA, GDGAGGQPA, GDRAAGQSA, SDRADGRPA, GDRAAGQPT, GDRAYGQPA, SDRAAGQPA, RDRADGQPA, GDRASGQPA, GGRADGRPA, GDRADQQPA, GDRADGPPA and GNGADGQPA. Interestingly, 109 of the 118 VK210 haplotypes terminated with the GNGAGGQAA motif. The number of PRMs in the CRR of Myanmar VK210 haplotypes ranged from 1 to 29, and sequences with 18 PRMs were the most prevalent, accounting for 18.9% of the 143 Myanmar VK210 sequences (Fig. 5). Compared with global VK210 variants, the CRR of Myanmar VK210 variants showed a high level of length polymorphism: VK210 variants from the other countries analysed in this study showed only 2 to 5 CRR length variants, whereas Myanmar VK210 variants had 24, reflecting their varied PRM types and compositions. The overall genetic diversity of the CRR in Myanmar VK247 variants was much lower than in Myanmar VK210 variants. A total of 23 VK247 haplotypes were identified, each with a CRR comprising a different number and combination of 8 PRM types (Fig. 6).
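Counting PRMs and flagging novel motifs can be sketched by cutting the CRR into consecutive nonapeptides and comparing them with a reference catalogue. This assumes an in-frame CRR that starts exactly at a motif boundary and contains only 9-residue repeats; `split_prms` is a hypothetical helper, and `known_prms` is a deliberately tiny illustrative catalogue, not the set of previously reported PRMs used in the study.

```python
from collections import Counter


def split_prms(crr, motif_len=9):
    """Cut an in-frame CRR amino acid string into consecutive PRMs.

    Assumes the CRR starts exactly at a motif boundary and that every
    repeat has `motif_len` residues; otherwise a ValueError is raised.
    """
    if len(crr) % motif_len:
        raise ValueError("CRR length is not a multiple of the motif length")
    return [crr[i:i + motif_len] for i in range(0, len(crr), motif_len)]


# Hypothetical reference catalogue (illustrative only, not the study's list)
known_prms = {"GDRADGQPA", "GDRAAGQPA", "GNGAGGQAA"}

toy_crr = "GDRADGQPA" "GDRAAGQPA" "GDRVAGQPA" "GDRADGQPA" "GNGAGGQAA"
prms = split_prms(toy_crr)
novel = sorted(set(prms) - known_prms)

print(len(prms))                 # 5 PRMs in this CRR
print(Counter(prms).most_common(2))
print(novel)                     # ['GDRVAGQPA']
print(prms[-1] == "GNGAGGQAA")   # True: ends with the terminal motif
```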
Five of the 8 PRMs identified in Myanmar VK247 variants, ANGAGNQSG, AYGAGNQPG, VNGAGNQPG, ANGVGNQPG and AYGAGNQPG, were novel and had not been reported previously. The CRR of Myanmar VK247 variants contained 1 to 22 PRMs, and CRRs with 2 PRMs were the most prevalent (14.3%) (Fig. 7). As with VK210, Myanmar VK247 variants displayed greater CRR diversity than VK247 variants from other geographical origins: 16 CRR size variants were identified in Myanmar VK247 variants, compared with only 1 to 5 in each of the other countries analysed in this study.
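Length polymorphism, as summarised above, reduces to counting how many distinct PRM numbers occur and how often the most common one appears. A minimal sketch, assuming per-sequence PRM counts have already been obtained; the function name and the counts are hypothetical.

```python
from collections import Counter


def length_profile(prm_counts):
    """Summarise CRR length polymorphism from per-sequence PRM counts.

    Returns (number of distinct lengths, modal PRM count, modal percent).
    Ties for the modal count go to the value seen first.
    """
    freq = Counter(prm_counts)
    modal_len, modal_n = freq.most_common(1)[0]
    return len(freq), modal_len, round(100 * modal_n / len(prm_counts), 1)


# Hypothetical PRM counts for eight sequences
print(length_profile([2, 2, 5, 8, 8, 8, 13, 22]))  # (5, 8, 37.5)
```

For example, a modal class holding 4 of 28 sequences gives 14.3%, the frequency reported for Myanmar VK247 CRRs with 2 PRMs.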
Genetic diversity in the C-terminal non-repeat region of Myanmar and global pvcsp
Sequence analysis of the C-terminal non-repeat region of Myanmar VK210 variants revealed 27 distinct haplotypes (Fig. 8a). This diversity was attributable to differences in the arrangement of the ANKKAEDA octapeptide insertion and the GGNA tetrapeptide repeat motifs, together with various amino acid substitutions throughout the region. Haplotype 22 was identical to Sal I (GU339059) and accounted for 9.8% of all VK210 sequences. The octapeptide insertion was observed in haplotypes 1 to 16; the inserted sequence was ANKKAEDA in all of them except haplotypes 8 and 13, which carried ANKKAENA and ANKEAENA, respectively. The C-terminal non-repeat region of Myanmar VK247 variants was less diverse than that of Myanmar VK210 variants (Fig. 8b). A total of 10 haplotypes were identified; haplotype 1, identical to the PNG reference sequence (M69059), was the most prevalent, with a frequency of 46.4%. The most noteworthy polymorphisms in the C-terminal non-repeat region of Myanmar VK247 variants were deletions of GGQAAGGNAANKKAGDAG in haplotype 7 and of ANKKAGDAG in haplotypes 8, 9 and 10. Analysis of the C-terminal non-repeat region of global pvcsp suggested a high level of genetic diversity. Among VK210 variants, the frequency of the ANKKAEDA insertion differed by country (Fig. 9a). All VK210 sequences from Iran and South Korea contained the octapeptide insertion, whereas its frequency in VK210 variants from Sudan, India, Mexico and Cambodia was 96.7, 89.9, 63.6 and 9.7%, respectively. Interestingly, the octapeptide insertion was not found in VK210 variants from Brazil or Vanuatu. The number of GGNA motifs in the C-terminal non-repeat region of global VK210 variants also differed by country, ranging from 0 to 6 (Fig. 9b).
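The two C-terminal features tracked above, presence of the octapeptide insertion and the number of GGNA motifs, can be scored per sequence with plain string operations. This is a sketch only: `c_terminal_features` is a hypothetical helper that counts all non-overlapping GGNA occurrences, whereas the text does not say whether only tandem copies were counted, and the toy sequences are invented.

```python
def c_terminal_features(seq, insertion="ANKKAEDA", repeat="GGNA"):
    """Score one C-terminal non-repeat sequence.

    Returns (has_insertion, n_repeat), where n_repeat counts all
    non-overlapping occurrences of `repeat` (an assumption; tandem-only
    counting would need a different rule).
    """
    return insertion in seq, seq.count(repeat)


# Hypothetical toy sequences (not real pvcsp data)
print(c_terminal_features("AGGNAGGNAANKKAEDAGGQAA"))  # (True, 2)
print(c_terminal_features("AGGNAGGQAA"))              # (False, 1)
# For VK247 variants, pass the other insertion motif:
print(c_terminal_features("GGNAANKKAGDAG", insertion="ANKKAGDAG"))  # (True, 1)
```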
As in VK210 variants, the presence of the ANKKAGDAG octapeptide insertion and the number of GGNA motifs in the C-terminal non-repeat region also differed among global VK247 variants. The ANKKAGDAG insertion was frequent in VK247 variants from Cambodia, Iran, Mexico and Colombia but infrequent in Myanmar VK247 variants (Fig. 9c). The number of GGNA motifs in global VK247 variants ranged from 0 to 3 and also differed by country (Fig. 9d).
Nucleotide diversity and natural selection in the N- and C-terminal non-repeat regions of Myanmar pvcsp
Because the CRR of Myanmar pvcsp showed a high degree of length polymorphism, the nucleotide diversity and genetic differentiation of the N- and C-terminal non-repeat regions were analysed separately, with the CRR excluded. In the N-terminal non-repeat region of Myanmar VK210 variants, the average number of nucleotide differences (K), overall haplotype diversity (Hd) and nucleotide diversity (π) were 0.098, 0.096 ± 0.034 and 0.0024 ± 0.0009, respectively (Table 1). The estimated dN–dS value for this region was 0.0008, suggesting that the N-terminal non-repeat region of Myanmar VK210 variants was under positive natural selection. For the C-terminal non-repeat region of Myanmar VK210 variants, K, Hd and π were 0.209, 0.186 ± 0.044 and 0.0035 ± 0.0009, respectively (Table 1), and the dN–dS value was –0.0035, suggesting that this region was under negative natural selection. Tajima’s D test was also performed to further evaluate natural selection on the N- and C-terminal non-repeat regions of Myanmar VK210 variants. Tajima’s D values for the N- and C-terminal non-repeat regions were –1.9359 (P < 0.05) and –2.2251 (P < 0.01), respectively (Table 1), and Fu and Li’s D and F values for both regions were also negative. In Myanmar VK247 variants, K, Hd and π of the N-terminal non-repeat region were 0.405, 0.405 ± 0.094 and 0.0090 ± 0.0021, respectively (Table 1); the corresponding values for the C-terminal non-repeat region were 0.495, 0.331 ± 0.114 and 0.0067 ± 0.0027. The dN–dS values of the N- and C-terminal non-repeat regions of Myanmar VK247 variants were 0.0117 and –0.0150, respectively (Table 1).
Tajima’s D values were –0.4445 (P > 0.1) and –1.9719 (P < 0.05) for the N- and C-terminal non-repeat regions, respectively (Table 1), and Fu and Li’s D and F values were negative for both regions.
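K and π as reported above are related by π = K / L, where L is the number of aligned sites analysed. The sketch below computes both from a hypothetical nucleotide alignment, assuming that sites with a gap or non-ACGT character in any sequence are dropped before counting (complete deletion); how gaps were actually handled in this study is not stated here, and `diversity_stats` is an illustrative name.

```python
from itertools import combinations


def diversity_stats(aligned):
    """Return (K, pi) for equal-length aligned nucleotide sequences.

    K  = average number of pairwise nucleotide differences.
    pi = K divided by the number of sites analysed.
    Sites with a gap or non-ACGT character in any sequence are dropped first
    (complete deletion), an assumption about gap handling.
    """
    valid = set("ACGT")
    keep = [i for i in range(len(aligned[0]))
            if all(seq[i] in valid for seq in aligned)]
    trimmed = ["".join(seq[i] for i in keep) for seq in aligned]
    pairs = list(combinations(trimmed, 2))
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    k = total_diffs / len(pairs)
    return k, k / len(keep)


# Hypothetical alignment: one gapped column, one singleton substitution
toy = ["ACGTACGTAC", "ACGTACGTA-", "ACGTTCGTAC", "ACGTACGTAC"]
k, pi = diversity_stats(toy)
print(k, round(pi, 4))  # 0.5 0.0556
```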
Table 1. Genetic polymorphism and tests of neutrality in the N-terminal and C-terminal regions of Myanmar pvcsp
Variant | Region | n | K | S | Eta | H | Hd ± SD | π ± SD | dN-dS | Tajima’s D | Fu & Li’s D | Fu & Li’s F
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
VK210 | N-terminal | 143 | 0.098 | 5 | 6 | 7 | 0.096 ± 0.034 | 0.00238 ± 0.00086 | 0.00075 | -1.9359ᶜ | -3.9086ᵇ | -3.8409ᵇ
VK210 | C-terminal | 143 | 0.209 | 10 | 11 | 11 | 0.186 ± 0.044 | 0.00354 ± 0.00091 | -0.00353 | -2.2251ᵈ | -3.5089ᵇ | -3.6329ᵇ
VK247 | N-terminal | 28 | 0.405 | 1 | 2 | 3 | 0.405 ± 0.094 | 0.00899 ± 0.00210 | 0.01168 | -0.4445ᵃ | -0.7114ᵃ | -0.7369ᵃ
VK247 | C-terminal | 28 | 0.495 | 6 | 6 | 6 | 0.331 ± 0.114 | 0.00669 ± 0.00268 | -0.01494 | -1.9719ᶜ | -2.5946ᶜ | -2.8039ᶜ
n = number of analysed sequences; K = average number of nucleotide differences; S = number of segregating sites; Eta = total number of mutations; H = number of haplotypes; Hd = haplotype diversity; π = observed average pairwise nucleotide diversity; SD = standard deviation; dN = rate of non-synonymous mutations; dS = rate of synonymous mutations. Letters after the Tajima’s D and Fu and Li’s D and F values indicate P values: a P > 0.1; b P < 0.02; c P < 0.05; d P < 0.01
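Tajima’s D compares K with Watterson’s estimate (the polymorphism count divided by a1 = Σ 1/i for i = 1 to n - 1), scaled by the standard deviation of their difference. Below is a sketch of Tajima’s (1989) formula; the function name `tajimas_d` is ours. In our back-check of Table 1, supplying Eta (total number of mutations) rather than S reproduced the reported values to within rounding; for the VK247 N-terminal region, where S = 1 but Eta = 2, using S would give a positive D instead. This is our inference, not a method stated in the text.

```python
def tajimas_d(n, k, s):
    """Tajima's (1989) D.

    n: number of sequences; k: average pairwise differences (K);
    s: number of polymorphisms (segregating sites S, or mutations Eta).
    """
    if n < 4 or s == 0:
        # For n = 3 the variance term is exactly zero, so D is undefined
        raise ValueError("D needs n >= 4 and at least one polymorphism")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between K and Watterson's estimate, scaled by its SD
    return (k - s / a1) / (e1 * s + e2 * s * (s - 1)) ** 0.5


# Values from Table 1 (Myanmar VK247), using Eta as the polymorphism count
print(round(tajimas_d(28, 0.405, 2), 2))   # -0.44 (N-terminal; table: -0.4445)
print(round(tajimas_d(28, 0.495, 6), 2))   # -1.97 (C-terminal; table: -1.9719)
```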
Nucleotide diversity and natural selection in the N- and C-terminal non-repeat regions of global VK210 variants
To further examine nucleotide diversity and natural selection in the global pvcsp population, the N- and C-terminal non-repeat regions of global pvcsp were analysed. For the N-terminal non-repeat region of VK210 variants, nucleotide diversity and the pattern of natural selection differed by country (Table 2). VK210 variants from India showed the highest nucleotide diversity, with K, Hd and π values of 1.972, 0.661 ± 0.063 and 0.0470 ± 0.0074, respectively. The N-terminal non-repeat region of Cambodian VK210 variants also showed relatively high nucleotide diversity, comparable to that of Myanmar VK210 variants. Some, albeit low, nucleotide diversity was found in VK210 variants from South Korea and Brazil, whereas VK210 variants from Iran, Mexico, Sudan and Vanuatu were genetically conserved, with no variation detected. The dN–dS value was positive for VK210 variants from Myanmar, Cambodia and South Korea, suggesting that positive natural selection may act on this region, whereas the values for VK210 variants from India and Brazil were negative, suggesting negative selection. Tajima’s D values for VK210 variants from Myanmar, Cambodia, India, South Korea and Brazil were all negative, suggesting purifying selection. The C-terminal non-repeat region of global VK210 variants also differed by country in nucleotide diversity and pattern of natural selection (Table 2). Indian VK210 variants showed the greatest nucleotide diversity, with K, Hd and π values of 0.276, 0.123 ± 0.051 and 0.0043 ± 0.0020, respectively. Meanwhile, no nucleotide diversity was detected in VK210 variants from Cambodia, Iran, Brazil, Mexico and Vanuatu. As in the N-terminal non-repeat region, the negative Tajima’s D values suggested that the C-terminal non-repeat region of global VK210 variants was also under purifying selection.
The values of Fu and Li’s D and Fu and Li’s F were also negative for the C-terminal region of VK210 variants from Myanmar, India, South Korea, and Sudan.
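Hd is Nei’s haplotype diversity, Hd = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i. When a sample holds only two haplotypes and one of them is a single sequence, this reduces to 2/n, which matches several Table 2 entries (for example Sudan C-terminal, n = 30, Hd 0.067; Brazil N-terminal, n = 41, Hd 0.049). The haplotype splits used in the sketch below are our back-calculation from the reported n and Hd, not counts given in the text, and the function name is illustrative.

```python
def haplotype_diversity(counts):
    """Nei's haplotype diversity from per-haplotype sequence counts.

    Hd = n / (n - 1) * (1 - sum(p_i ** 2)), with p_i = count_i / n.
    """
    n = sum(counts)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts))


# Back-calculated splits (our inference, not reported counts)
print(round(haplotype_diversity([29, 1]), 3))  # 0.067 (n = 30, one singleton)
print(round(haplotype_diversity([29, 2]), 3))  # 0.125 (n = 31, minority of 2)
```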
Table 2. Genetic polymorphism and tests of neutrality in the N-terminal and C-terminal non-repeat regions of global VK210 variants
Region | Country | n | K | S | Eta | H | Hd ± SD | π ± SD | dN-dS | Tajima’s D | Fu & Li’s D | Fu & Li’s F
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
N-terminal | Myanmar | 143 | 0.098 | 5 | 6 | 7 | 0.096 ± 0.034 | 0.0024 ± 0.0009 | 0.0008 | -1.9359ᶜ | -3.9086ᵇ | -3.8409ᵇ
N-terminal | Cambodia | 31 | 0.125 | 1 | 1 | 2 | 0.125 ± 0.077 | 0.0030 ± 0.0018 | 0.0039 | -0.7737ᵃ | 0.5907ᵃ | 0.2450ᵃ
N-terminal | India | 79 | 1.972 | 23 | 28 | 28 | 0.661 ± 0.063 | 0.0470 ± 0.0074 | -0.0909 | -2.0261ᶜ | -1.3952ᵃ | -1.9514ᵉ
N-terminal | Iran | 39 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
N-terminal | South Korea | 39 | 0.051 | 1 | 1 | 2 | 0.051 ± 0.048 | 0.0012 ± 0.0011 | 0.0016 | -1.1264ᵃ | -1.7662ᵃ | -1.8293ᵃ
N-terminal | Brazil | 41 | 0.049 | 1 | 1 | 2 | 0.049 ± 0.046 | 0.0012 ± 0.0011 | -0.0059 | -1.1219ᵃ | -1.7816ᵃ | -1.8406ᵃ
N-terminal | Mexico | 11 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
N-terminal | Sudan | 30 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
N-terminal | Vanuatu | 21 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
C-terminal | Myanmar | 143 | 0.209 | 10 | 11 | 11 | 0.186 ± 0.044 | 0.0035 ± 0.0009 | -0.0035 | -2.2251ᵈ | -3.5089ᵇ | -3.6329ᵇ
C-terminal | Cambodia | 31 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
C-terminal | India | 79 | 0.276 | 8 | 9 | 6 | 0.123 ± 0.051 | 0.0043 ± 0.0020 | -0.0149 | -2.1953ᵈ | -4.4419ᵇ | -4.3524ᵇ
C-terminal | Iran | 39 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
C-terminal | South Korea | 39 | 0.103 | 2 | 2 | 3 | 0.101 ± 0.065 | 0.0008 ± 0.0005 | -0.0037 | -1.4889ᵃ | -2.4148ᵉ | -2.4864ᵉ
C-terminal | Brazil | 41 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
C-terminal | Mexico | 11 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
C-terminal | Sudan | 30 | 0.067 | 1 | 1 | 2 | 0.067 ± 0.061 | 0.0009 ± 0.0008 | -0.0041 | -1.1470ᵃ | -1.6821ᵃ | -1.7655ᵃ
C-terminal | Vanuatu | 21 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
n = number of analysed sequences; K = average number of nucleotide differences; S = number of segregating sites; Eta = total number of mutations; H = number of haplotypes; Hd = haplotype diversity; π = observed average pairwise nucleotide diversity; SD = standard deviation; dN = rate of non-synonymous mutations; dS = rate of synonymous mutations. Letters after the Tajima’s D and Fu and Li’s D and F values indicate P values: a P > 0.1; b P < 0.02; c P < 0.05; d P < 0.01; e 0.05 < P < 0.1
Nucleotide diversity and natural selection in the N-terminal and C-terminal non-repeat regions of global VK247 variants
Nucleotide diversity and natural selection were also analysed in global VK247 variants (Table 3). In the N-terminal non-repeat region, VK247 variants from Iran showed the greatest nucleotide diversity, with K, Hd and π values of 1.309, 0.436 ± 0.133 and 0.0190 ± 0.0058, respectively. VK247 variants from Cambodia also showed some nucleotide diversity in this region, whereas none was found in VK247 variants from Mexico or Colombia. The N-terminal non-repeat region of Iranian VK247 variants showed a negative dN–dS value (–0.0543) but positive Tajima’s D (0.9518), Fu and Li’s D (1.1271) and Fu and Li’s F (1.2185) values. In contrast, Myanmar and Cambodian VK247 variants showed positive dN–dS values and negative Tajima’s D, Fu and Li’s D and Fu and Li’s F values. The C-terminal non-repeat region of VK247 variants from Cambodia and Colombia also showed nucleotide diversity, though lower than that of Myanmar VK247 variants. As in Myanmar VK247 variants, the C-terminal non-repeat region of Cambodian VK247 variants showed negative values of dN–dS (–0.0172), Tajima’s D (–1.4009), Fu and Li’s D (–1.5866) and Fu and Li’s F (–1.7190). However, the dN–dS value for Colombian variants was positive (0.0017), although their Tajima’s D, Fu and Li’s D and Fu and Li’s F values were negative.
Table 3. Genetic polymorphism and tests of neutrality in N-terminal and C-terminal regions of global pvcsp VK247 variants
Region | Country | n | K | S | Eta | H | Hd ± SD | π ± SD | dN-dS | Tajima’s D | Fu & Li’s D | Fu & Li’s F
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
N-terminal | Myanmar | 28 | 0.405 | 1 | 2 | 3 | 0.405 ± 0.094 | 0.0090 ± 0.0021 | 0.0117 | -0.4445ᵃ | -0.7114ᵃ | -0.7369ᵃ
N-terminal | Cambodia | 10 | 0.200 | 1 | 1 | 2 | 0.200 ± 0.154 | 0.0029 ± 0.0022 | 0.0038 | -1.1117ᵃ | -1.2434ᵃ | -1.3467ᵃ
N-terminal | Iran | 11 | 1.309 | 3 | 3 | 2 | 0.436 ± 0.133 | 0.0190 ± 0.0058 | -0.0543 | 0.9518ᵃ | 1.1271ᵃ | 1.2185ᵃ
N-terminal | Mexico | 8 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
N-terminal | Colombia | 25 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
C-terminal | Myanmar | 28 | 0.495 | 6 | 6 | 6 | 0.331 ± 0.114 | 0.0068 ± 0.0027 | -0.0150 | -1.9719ᶜ | -2.5946ᶜ | -2.8039ᶜ
C-terminal | Cambodia | 10 | 0.400 | 2 | 2 | 2 | 0.200 ± 0.154 | 0.0039 ± 0.0030 | -0.0172 | -1.4009ᵃ | -1.5866ᵃ | -1.7190ᵃ
C-terminal | Iran | 11 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
C-terminal | Mexico | 8 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
C-terminal | Colombia | 25 | 0.500 | 2 | 2 | 3 | 0.440 ± 0.095 | 0.0039 ± 0.0010 | 0.0017 | -0.1215ᵃ | -0.6754ᵃ | -0.6012ᵃ
n = number of analysed sequences; K = average number of nucleotide differences; S = number of segregating sites; Eta = total number of mutations; H = number of haplotypes; Hd = haplotype diversity; π = observed average pairwise nucleotide diversity; SD = standard deviation; dN = rate of non-synonymous mutations; dS = rate of synonymous mutations. Letters after the Tajima’s D and Fu and Li’s D and F values indicate P values: a P > 0.1; b P < 0.02; c P < 0.05; d P < 0.01; e 0.05 < P < 0.1