Sample information and PCR amplification product sequencing
Seventy-eight case reports with recurrent episodes (3.1%) were retrieved from 2484 vivax malaria cases reported in 2013–2019 among patients living in Yunnan Province (97°31′ E to 106°11′ E; 21°8′ N to 29°15′ N). Nearly all of these cases were traced to infections acquired in Myanmar (98.7%, 77/78), and the male-to-female ratio in the study sample was 5:1 (65/13). Most cases had one relapse (97.4%, 76/78); one case had two episodes and one case had three. General demographic information and the original place of infection for the 78 cases are shown in Table 1 (Additional file 2).
Table 1
Demographic and clinical characteristics of the study cohort
Variable | Total | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
Total | 78 | 0 | 14 | 20 | 17 | 9 | 9 | 7 | 2 |
1. Gender | | | | | | | | | |
Male | 65 | 0 | 12 | 17 | 13 | 7 | 9 | 5 | 2 |
Female | 13 | 0 | 2 | 3 | 4 | 2 | 0 | 2 | 0 |
2. Age (years) | | | | | | | | | |
0–20 | 8 | 0 | 1 | 0 | 4 | 3 | 0 | 0 | 0 |
21–60 | 67 | 0 | 12 | 20 | 13 | 5 | 9 | 6 | 2 |
> 60 | 3 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
3. Malaria recurrence | | | | | | | | | |
1 episode | 76 | 0 | 13 | 20 | 17 | 9 | 8 | 7 | 2 |
2 episodes | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
3 episodes | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4. Infection source | | | | | | | | | |
Myanmar | 77 | 0 | 13 | 20 | 17 | 9 | 9 | 7 | 2 |
Africa | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Yunnan indigenous | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5. Recurrence interval (days) | | | | | | | | | |
Longest | 1561 | – | 939 | 882 | 1561 | 426 | 319 | 367 | 268 |
Shortest | 28 | – | 54 | 46 | 62 | 80 | 43 | 28 | 264 |
Average | 299 | – | 271 | 291 | 292 | 277 | 275 | 285 | 264 |
NOTE. Data are numbers of cases, unless otherwise indicated.
In total, 81 relapses occurred among these 78 Plasmodium vivax cases, and 159 blood samples were obtained from all reported primary infections and recurrent episodes. From these samples, 156 PCR amplification products (> 1,000 bp) of the pvcsp gene were obtained, a product acquisition rate of 98.1% (156/159). Full-length paired CDS sequences (807–1179 bp) of the pvcsp gene were obtained from the blood samples of 75 cases (96.2%, 75/78), but only those from 59 cases could be used for homology analysis of the gene sequences (see Fig. 1).
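The sample accounting above reduces to simple arithmetic. The Python sketch below uses only counts stated in the text and Table 1 (the helper name `percent` is illustrative) and reproduces the 81 recurrent episodes, the 159 samples and the reported rates.

```python
# Arithmetic check of the case and sample counts reported above.
# All inputs are counts stated in the text or in Table 1.

def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

cases = 78
# Number of cases by number of recurrent episodes (Table 1).
cases_by_episodes = {1: 76, 2: 1, 3: 1}
recurrences = sum(episodes * n for episodes, n in cases_by_episodes.items())
# One sample per primary infection plus one per recurrent episode.
samples = cases + recurrences

print(recurrences)               # 81
print(samples)                   # 159
print(percent(156, samples))     # 98.1 (PCR product acquisition rate)
print(percent(75, cases))        # 96.2 (cases with full-length paired CDS)
print(percent(78, 2484))         # 3.1 (recurrent cases among all vivax cases)
print(65 / 13)                   # 5.0 (male-to-female ratio, i.e. 5:1)
```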
All structural regions of the amino acid sequences translated from the 78 CDS strands were complete, including the conserved region encoded near the 5′ end (aa 1–90; α-N region), region I (RI; KLKQP), the central repeat region (CRR; aa 96–275), and the conserved region encoded near the 3′ end (aa 276–393; TSR) (see Fig. 2).
Diversity of pvcsp gene and CRR array
The 121 pvcsp CDS strands obtained from the paired blood samples of 59 cases contained 475 variable (polymorphic) sites, comprising 20 singleton variable sites and 455 parsimony-informative sites (Additional file 3), with a nucleotide diversity (π) of 0.1384 (± 0.0056). Among them, 32 were double-allele (biallelic) sites, at positions 112, 113, 233, 234, 240, 261, 264, 270, 274, 282, 295, 309, 327, 347, 354, 426, 491, 507, 511, 534, 545, 552, 572, 579, 615, 684, 742, 761, 769, 805, 892, and 999, each showing double peaks in the sequencing chromatogram. At these sites, the sequences from both the primary infection and the relapse strain called only one of the two bases, usually the one with the stronger sequencing signal (Additional file 3). The 32 double-allele sites were distributed across all regions of the pvcsp gene but were concentrated in the CRR (62.5%, 20/32) (Table 2). Furthermore, 56.3% (18/32) of these sites were at the third codon position, and only 27.8% (5/18) of these resulted in amino acid changes. The corresponding percentages for the second and first codon positions were 17.6% (6/34) and 26.5% (9/34), respectively (Table 2). The highest double-allele frequency was 0.1575, at c.234, and the minor allele frequency (MAF) at c.240 was 0.1417; 75.0% (24/32) of the double alleles were present in only one set of paired sequences (Table 2).
Table 2
Polymorphism of single-nucleotide loci in pvcsp gene CDS chains
Region | Locus | Alleles called (Major/Minor)ᵃ | Codons | Amino acid change | No. of CDS (n = 127) | Frequency |
α-Nᵇ (aa 1–90) | c.112 | A/G | AAC/GGC | N38G | 2 | 0.0157 |
| c.113 | A/G | AAC/GGC | N38G | 2 | 0.0157 |
| c.233 | A/G | GAG/GGG | E78G | 2 | 0.0157 |
| c.234 | G/T | GAG/GAT | E78D | 20 | 0.1575 |
| c.240 | A/G | AAA/AAG | K80K | 18 | 0.1417 |
| c.261 | A/C | CCA/CCC | P87P | 14 | 0.1102 |
| c.264 | T/G | CGT/CGG | R88R | 4 | 0.0315 |
| c.270 | A/T | AAA/AAT | K90N | 6 | 0.0472 |
RIᶜ (aa 91–95) | c.274 | T/C | TTG/CTG | L92L | 16 | 0.1260 |
| c.282 | A/G | CAA/CAG | Q94Q | 2 | 0.0157 |
CRR (aa 96–290) | c.295 | C/A | CGA/AGA | R99R | 2 | 0.0157 |
| c.309 | G/A | CAG/CAA | Q103Q | 2 | 0.0157 |
| c.327 | A/C | GGA/GGC | G109G | 2 | 0.0157 |
| c.354 | A/C | GGA/GGC | G118G | 4 | 0.0315 |
| c.426 | C/A | GGC/GGA | G142G | 2 | 0.0157 |
| c.491 | G/C | GGT/GCT | G164A | 2 | 0.0157 |
| c.507 | A/C | GGA/GGC | G169G | 2 | 0.0157 |
| c.511 | G/A | GGA/AGA | G171R | 2 | 0.0157 |
| c.518 | C/A | GCT/GAT | A173D | 2 | 0.0157 |
| c.534 | C/A | GGC/GGA | G178G | 2 | 0.0157 |
| c.545 | A/C | GAT/GCT | D182A | 2 | 0.0157 |
| c.552 | T/A | CAT/CAA | Q184R | 2 | 0.0157 |
| c.572 | G/C | AGG/AGC | R286S | 2 | 0.0157 |
| c.579 | G/A | CAG/CAA | Q193Q | 2 | 0.0157 |
| c.615 | A/C | GGA/GGC | G205G | 2 | 0.0157 |
| c.684 | A/C | GGA/GGC | G228G | 6 | 0.0472 |
| c.742 | C/G | CCA/GCA | P248A | 2 | 0.0157 |
| c.761 | C/G | AGC/AGG | G254R | 2 | 0.0157 |
| c.769 | C/G | CCA/GCA | P257A | 2 | 0.0157 |
| c.805 | A/C | ACC/CCC | P269T | 2 | 0.0157 |
TSRᵈ (aa 291–393) | c.892 | C/T | CTT/TTT | L298F | 2 | 0.0157 |
| c.999 | A/G | AAA/AAG | K333K | 2 | 0.0157 |
a: At a double-allele site in the sequencing chromatogram, the base with the higher peak is the major allele and the base with the lower peak is the minor allele; b: the region near the N-terminus of the pvcsp amino acid chain; c: the region encoding the five amino acids KLKQP; d: the region near the C-terminus of the pvcsp amino acid chain.
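The codon numbers, codon positions and amino acid effects in Table 2 follow from coordinate arithmetic on the CDS. As a sketch, assuming the standard genetic code (NCBI translation table 1) and 1-based c. numbering from the first base of the start codon, the Python below maps c.234 to position 3 of codon 78 and classifies two of the changes listed in Table 2. The helpers `locate` and `classify` are hypothetical names, not part of any published pipeline.

```python
# Map a CDS coordinate (c.N) to codon number and codon position, and classify a codon change.
# Assumes the standard genetic code and 1-based c. numbering from the start codon's first base.

BASES = "TCAG"
# Amino acids in TCAG order of first, second, third codon base (NCBI table 1); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def locate(c_pos: int) -> tuple[int, int]:
    """Return (codon number, position within codon 1-3) for a 1-based CDS coordinate."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

def classify(ref_codon: str, alt_codon: str, codon_no: int) -> str:
    """Describe a codon change in the one-letter style of Table 2, e.g. 'E78D'."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return f"{ref_aa}{codon_no}{alt_aa} ({kind})"

# Worked examples using two rows of Table 2.
codon_no, pos = locate(234)
print(codon_no, pos)                      # 78 3
print(classify("GAG", "GAT", codon_no))   # E78D (nonsynonymous)
codon_no, pos = locate(240)
print(codon_no, pos)                      # 80 3
print(classify("AAA", "AAG", codon_no))   # K80K (synonymous)
```

The same two functions can be applied row by row to recheck the codon and amino acid columns of Table 2 against the listed c. positions.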
In addition, the 121 CDS strands were resolved into 84 haplotypes (H01 to H84), with a haplotype diversity (He) of 0.9940 (± 0.0040). Of these, only haplotypes H08 and H13 were shared with other paired sequences. Haplotypes H05, H50, H51, H63, and H64 carried CRR peptide repeat motifs (PRMs) of the VK247 genotype (Fig. 3B), whereas the remaining 79 carried PRMs characteristic of the VK210 genotype (Fig. 3A).
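The diversity measures reported in this section (π, singleton versus parsimony-informative sites, haplotype number and He) have standard textbook definitions. The Python sketch below implements those definitions for an ungapped alignment of equal-length sequences, assuming that He denotes Nei's haplotype diversity, n/(n − 1)(1 − Σxᵢ²), where xᵢ is the frequency of haplotype i. It is not the study's software, the toy alignment is invented, and real pvcsp alignments containing CRR indels would need explicit gap handling.

```python
from collections import Counter
from itertools import combinations

def site_classes(seqs: list[str]) -> tuple[int, int]:
    """Return (singleton, parsimony-informative) counts of variable sites."""
    singletons = informative = 0
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # invariant site
        # Parsimony-informative: at least two states each present in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
        else:
            singletons += 1
    return singletons, informative

def nucleotide_diversity(seqs: list[str]) -> float:
    """Nucleotide diversity pi: mean pairwise number of differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

def haplotype_diversity(seqs: list[str]) -> tuple[int, float]:
    """Return (number of haplotypes h, Nei's Hd = n/(n-1) * (1 - sum of squared frequencies))."""
    n = len(seqs)
    counts = Counter(seqs)
    hd = n / (n - 1) * (1 - sum((k / n) ** 2 for k in counts.values()))
    return len(counts), hd

# Toy ungapped alignment (invented sequences, not study data).
toy = ["ACGTAC", "ACGTAC", "ACTTAC", "GCTTAA"]
print(site_classes(toy))           # (2, 1)
print(nucleotide_diversity(toy))   # 10 differences / (6 pairs x 6 sites) = 10/36
print(haplotype_diversity(toy))    # 3 haplotypes, Hd = 5/6 (about 0.833)
```

With real data, the sequences would first need to be aligned and trimmed to a common length before these functions apply.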
Among the VK210-type haplotypes, the CRR took 39 forms composed of PRMs (Fig. 3A), built from 15 PRM unit types. GDRAAGQPA and GDRADGQPA were the most frequent, at 0.4700 (987/2100) and 0.3833 (805/2100), respectively. The remaining 13 PRM types included five newly detected PRMs: GNRANGQPA (0.0033, 7/2100), GNRANGQAA (0.0005, 1/2100), GDRADGQTA (0.0005, 1/2100), GDRADGHPA (0.0005, 1/2100), and GNGAAGQPA (0.0005, 1/2100) (Fig. 3A). The VK210-type CRRs generally consisted of 14–20 PRMs, most commonly 18, and 97.4% (38/39) ended with a GNGAGGQAA unit (Fig. 3A).
The CRRs of the five VK247-type haplotypes took three forms, consisting of 17–21 PRMs (Fig. 3B) drawn from eight PRM unit types. The most frequent were ANGAGNQPG (0.7414, 86/116), ANGAGGQAA (0.0517, 6/116), and ANGDDQPGA (0.0172, 2/116), and two of the remaining unit types were newly detected PRMs (Fig. 3B).
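The PRM frequencies above are motif counts divided by the total number of PRMs pooled over the CRRs analysed (for example, 987/2100 for GDRAAGQPA). The sketch below tallies motifs that way, assuming each CRR can be cut cleanly into consecutive, non-overlapping 9-residue units starting at its first residue. The function names and the two toy CRRs (assembled from motifs named above) are illustrative, not study data.

```python
from collections import Counter

def split_prms(crr: str, unit_len: int = 9) -> list[str]:
    """Cut a CRR amino acid string into consecutive, non-overlapping repeat units."""
    if len(crr) % unit_len:
        raise ValueError("CRR length is not a multiple of the repeat-unit length")
    return [crr[i:i + unit_len] for i in range(0, len(crr), unit_len)]

def prm_frequencies(crrs: list[str]) -> dict[str, tuple[int, float]]:
    """Pool PRMs over all CRRs and return {motif: (count, frequency)}, most common first."""
    counts = Counter(motif for crr in crrs for motif in split_prms(crr))
    total = sum(counts.values())
    return {motif: (n, round(n / total, 4)) for motif, n in counts.most_common()}

# Two toy VK210-like CRRs assembled from motifs named in the text (not study data).
toy_crrs = [
    "GDRAAGQPA" + "GDRADGQPA" + "GNGAGGQAA",
    "GDRAAGQPA" * 2 + "GNGAGGQAA",
]
for motif, (n, freq) in prm_frequencies(toy_crrs).items():
    print(motif, n, freq)
# GDRAAGQPA 3 0.5
# GNGAGGQAA 2 0.3333
# GDRADGQPA 1 0.1667

# The same arithmetic applied to counts reported in the text.
print(round(987 / 2100, 4))  # 0.47
print(round(1 / 2100, 4))    # 0.0005
```

Real CRRs may contain partial or degenerate units, in which case a motif-aware segmentation would be needed instead of fixed-length slicing.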
Comparison of pvcsp gene sequences from paired blood samples and confirmation of relapse episodes
Comparison of the pvcsp CDS sequences from the 59 sets of paired blood samples showed that in 31 groups (52.5%, 31/59) the paired sequences shared a single haplotype with no variable sites, and both He and V were 0. Each of these 31 pairs was therefore homologous: the paired Plasmodium vivax parasites derived from a single clone with complete genetic homology, belonging to the same population inoculated by one mosquito bite (Table 3). The recurrent episodes in these cases were thus caused by activation of hypnozoites from the same population as the primary infection strains. In the other 28 groups (47.5%), the paired sequences differed at 1–6 polymorphic sites. Apart from two sites, c.39 (0.0082, 1/121) and c.1027 (0.0082, 1/121), which were true base substitutions (Table 3, Additional file 3), all of these were double-allele sites (Table 2). These 28 pairs showed no evidence of DNA fragment insertions or deletions, suggesting that the paired sequences were weakly heterologous but had not undergone intra-helical recombination. Their source parasites were thus "weakly heterologous" parallel sibling strains: not genetically identical, but later-generation progeny of a common ancestral population inoculated by a single mosquito bite (Table 3). In these cases, too, the recurrent episodes were caused by activation of hypnozoites from the same population as the primary infection strains.
Table 3
Identification of homology between Plasmodium vivax strains in each group of paired samples, based on alignment of pvcsp gene sequences
No. of paired samples | Length of aa chains | Nucleotide diversity (Pi) | Ins/Del in CRR (bp) | Variable sites: No. | Variable sites: α-N (aa 1–90) | Variable sites: RI (aa 91–95) | Variable sites: CRR (aa 96–290) | Variable sites: TSR (aa 291–393) | Paired strains | Single infection |
Ⅰ. Paired samples (groups = 59) |
31 | 364–393 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | com-Homo | Yes |
8 | 340–376 | 0.0009 | 0 | 1ᵃ | 234, 240, 261, 264 | 274 | 295 | 0 | non-re-Sibl | Yes |
9 | 340–376 | 0.0018 | 0 | 2ᵇ | 234, 240, 261, 270 | 274 | 327, 354, 507, 552, 572, 615, 742 | 999, 1027* | non-re-Sibl | Yes |
4 | 333–386 | 0.0030 | 0 | 3ᶜ | 234, 240, 261, 264 | 274, 282 | 491, 511, 545 | 0 | non-re-Sibl | Yes |
3 | 372–376 | 0.0036 | 0 | 4ᵈ | 234, 240, 261 | 274 | 426, 518, 534, 742, 761, 769 | 0 | non-re-Sibl | Yes |
3 | 336–375 | 0.0049 | 0 | 5ᵉ | 112, 113, 233, 234, 240, 261, 270 | 274 | 309, 579, 684, 805 | 892 | non-re-Sibl | Yes |
1 | 358–368 | 0.0054 | 0 | 6ᶠ | 39*, 234, 240, 261, 270 | 274 | 0 | 0 | non-re-Sibl | Yes |
Ⅱ. Unpaired samples |
69–109 | 372 and 368 | 0.0263 | +12ᵍ | 29ᶠ | 240, 247*, 259*, 264 | 0 | 291*, 294*, 318*, 321*, 345*, 348*, 363*, 444*, 453*, 552, 561*, 572, 579, 588*, 599*, 606*, 615, 642*, 783*, 798*, 800*, 803*, 805 | 987*, 1015* | dif-Popu | No |
156–110 | 393 and 368 | 0.0245 | +54ᵍ | 27ᶠ | 261 | 0 | 291*, 294*, 318*, 321*, 345, 348*, 363*, 364*, 372*, 373*, 376*, 390*, 444*, 453*, 552, 561*, 572, 579, 588*, 599*, 669*, 687*, 696*, 707*, 714*, 796* | 0 | dif-Popu | No |
49–125 | 275 and 249 | 0.0204 | +(51 + 27)ʰ | 22ᶠ | 240, 261, 264 | 274 | 404*, 435*, 567*, 570*, 591*, 594*, 597*, 598*, 621*, 645*, 648*, 650*, 651*, 652*, 702*, 705*, 729*, 780* | 0 | dif-Popu | No |
a–e: one locus, or any two, three, four, or five of the listed loci, respectively, are polymorphic; f: all listed loci are polymorphic at the same time; g: "+" denotes one inserted peptide of multiple amino acids; h: two inserted peptides of multiple amino acids; *: no double peak at this base in the sequencing chromatogram. com-Homo, completely homologous; non-re-Sibl, non-recombinant sibling strains; dif-Popu, different populations.
By contrast, alignment of two randomly selected sequences from unpaired samples among the 121 CDS strands showed considerably more base substitutions between the two sequences. In addition, the full-length strands showed length polymorphism owing to insertions (or deletions) of oligonucleotide stretches of different lengths (Table 3). This indicates that sequences from unpaired samples were more heterologous and that the corresponding Plasmodium vivax parasites were more likely to belong to different populations (Table 3).
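The interpretation behind the Table 3 categories can be read as a simple decision rule. The Python sketch below is a hedged paraphrase of that reasoning, not the authors' stated algorithm: the function `classify_pair`, the cut-off `max_sites=6` (the largest number of polymorphic sites seen among the sibling groups), the cut-off `max_true_subs=1` (no paired group showed more than one true substitution outside the double-allele sites), and the subset of double-allele sites used are all assumptions for illustration.

```python
# Hypothetical decision rule paraphrasing the Table 3 categories.
# Positions are CDS coordinates (c.N); thresholds are illustrative assumptions.

# Subset of the double-allele sites from Table 2 (alpha-N and RI loci only).
DOUBLE_ALLELE_SITES = {112, 113, 233, 234, 240, 261, 264, 270, 274, 282}

# Variable sites listed in Table 3 for unpaired samples 69 and 109 (+12 bp insertion).
UNPAIRED_69_109 = {
    240, 247, 259, 264, 291, 294, 318, 321, 345, 348, 363, 444, 453, 552, 561,
    572, 579, 588, 599, 606, 615, 642, 783, 798, 800, 803, 805, 987, 1015,
}

def classify_pair(
    variable_sites: set[int],
    indel_bp: int,
    double_allele_sites: set[int],
    max_sites: int = 6,
    max_true_subs: int = 1,
) -> str:
    """Label a sequence pair as com-Homo, non-re-Sibl or dif-Popu."""
    if indel_bp == 0 and not variable_sites:
        return "com-Homo"  # identical sequences: a single clone
    # Differences not explained by double-allele (mixed-peak) sites.
    true_subs = variable_sites - double_allele_sites
    if (
        indel_bp == 0
        and len(variable_sites) <= max_sites
        and len(true_subs) <= max_true_subs
    ):
        return "non-re-Sibl"  # few differences, no indels: sibling strains
    return "dif-Popu"  # indels or many substitutions: different populations

print(classify_pair(set(), 0, DOUBLE_ALLELE_SITES))  # com-Homo
# The single paired group with six polymorphic sites in Table 3.
print(classify_pair({39, 234, 240, 261, 270, 274}, 0, DOUBLE_ALLELE_SITES))  # non-re-Sibl
print(classify_pair(UNPAIRED_69_109, 12, DOUBLE_ALLELE_SITES))  # dif-Popu
```

Using a set difference against the double-allele sites mirrors the text's distinction between mixed-peak sites and true substitutions; in practice the cut-offs would need to be justified from the data rather than fixed in advance.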