Sample information and PCR amplification product sequencing
Of 2484 vivax malaria cases reported between 2013 and 2019, 78 (3.1%) involved recurrent episodes; all of these patients were living in Yunnan Province (97°31′ E to 106°11′ E; 21°8′ N to 29°15′ N). Almost all infections could be traced to Myanmar (98.7%, 77/78), and the male-to-female ratio in the sample was 5:1. Most patients had a single recurrence (97.4%, 76/78), while one patient had two recurrent episodes and another had three. General demographic information and the place of original infection for the 78 patients are shown in Table 1 (Additional file 2).
Table 1
Demographic and clinical characteristics of the study cohort
Variable | Total | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
Total | 78 | 0 | 14 | 20 | 17 | 9 | 9 | 7 | 2 |
1. Gender | | | | | | | | | |
Male | 65 | 0 | 12 | 17 | 13 | 7 | 9 | 5 | 2 |
Female | 13 | 0 | 2 | 3 | 4 | 2 | 0 | 2 | 0 |
2. Age (in years) | | | | | | | | |
0–20 | 8 | 0 | 1 | 0 | 4 | 3 | 0 | 0 | 0 |
21–60 | 67 | 0 | 12 | 20 | 13 | 5 | 9 | 6 | 2 |
above 60 | 3 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
3. Malaria recurrence | | | | | | | | |
1 episode | 76 | 0 | 13 | 20 | 17 | 9 | 8 | 7 | 2 |
2 episodes | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
3 episodes | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4. Infection source | | | | | | | | |
Myanmar | 77 | 0 | 13 | 20 | 17 | 9 | 9 | 7 | 2 |
Africa | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Yunnan indigenous | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5. Interval between recurrences (days) | | | | | | |
Longest | 1561 | 0 | 939 | 882 | 1561 | 426 | 319 | 367 | 268 |
Shortest | 28 | 0 | 54 | 46 | 62 | 80 | 43 | 28 | 264 |
Average | 279 | 0 | 271 | 291 | 292 | 277 | 275 | 285 | 264 |
NOTE. Data are no. (%) of paired samples, unless otherwise indicated.
A total of 81 recurrent episodes occurred among these 78 patients infected with P. vivax, giving 159 blood samples from all reported original infections and recurrent episodes. From these, 156 PCR amplification products (> 1,000 bp in length) of the pvcsp gene were obtained, an acquisition rate of 98.1% (156/159). Paired full-length CDS strands (807–1179 bp in length) of the pvcsp gene were obtained from the blood samples of 75 patients (96.2%, 75/78), but only those from 59 patients could be used for homology analysis of the gene sequences (see Fig. 1).
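The sample accounting in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only; the variable names are assumptions, and the counts are taken from Table 1 and the text above, on the assumption that each patient contributes one primary-infection sample plus one sample per recurrent episode.

```python
# Recurrences per patient, as reported in Table 1:
# 76 patients with one recurrence, 1 with two, 1 with three.
recurrences_per_patient = {1: 76, 2: 1, 3: 1}

patients = sum(recurrences_per_patient.values())
recurrent_episodes = sum(k * n for k, n in recurrences_per_patient.items())

# Assumption: one primary-infection sample per patient plus one per recurrence.
blood_samples = patients + recurrent_episodes

pcr_products = 156  # amplification products reported in the text
acquisition_rate = pcr_products / blood_samples

print(patients, recurrent_episodes, blood_samples)  # 78 81 159
print(f"{acquisition_rate:.1%}")                    # 98.1%
```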
The amino acid chains translated from the 78 CDS strands contained all structural regions of PvCSP: the conserved region near the 5'-end (encoding aa 1–90; α-N region), the RI (KLKQP) encoding region, the region near the 3'-end (encoding aa 276–393; TSR), and the highly variable central region of pvcsp encoding the CRR (aa 96–275) (see Fig. 2).
Diversity of the pvcsp gene and the CRR array of PvCSP
The 121 CDS strands of the pvcsp gene obtained from the paired blood samples of 59 patients contained 475 variable (polymorphic) sites, comprising 20 singleton variable sites and 455 parsimony-informative sites (Additional file 3), with a nucleotide diversity index (π) of 0.1384 (± 0.0056). Among them, 32 sites were double alleles (double peaks in the sequencing chromatogram), at positions 112, 113, 233, 234, 240, 261, 264, 270, 274, 282, 295, 309, 327, 347, 354, 426, 491, 507, 511, 534, 545, 552, 572, 579, 615, 684, 742, 761, 769, 805, 892, and 999. At each of these sites, the sequences from the original infection and from the relapse each called only one of the two bases, usually the one with the stronger sequencing signal (Additional file 3). The 32 double alleles were distributed across all regions of the pvcsp gene but were concentrated in the CRR region (62.5%, 20/32) (Table 2). Furthermore, 56.3% (18/32) of the polymorphic sites
Table 2
Single-nucleotide polymorphic loci in the CDS strands of the pvcsp gene
Regions | Loci | Called alleles (Major/Minorᵃ) | Coding | Amino acid variation | No. of CDS (n = 127) | Frequency |
α-Nᵇ | c.112 | A/G | AAC/GGC | N38G | 2 | 0.0157 |
| c.113 | A/G | | | 2 | 0.0157 |
| c.233 | A/G | GAG/GGG | E78G | 2 | 0.0157 |
| c.234 | G/T | GAG/GAT | E78D | 20 | 0.1575 |
| c.240 | A/G | AAA/AAG | K80K | 18 | 0.1417 |
| c.261 | A/C | CCA/CCC | P87P | 14 | 0.1102 |
| c.264 | T/G | CGT/CGG | R88R | 4 | 0.0315 |
| c.270 | A/T | AAA/AAT | K90N | 6 | 0.0472 |
RIᶜ (91st–95th aa) | c.274 | T/C | TTG/CTG | L92L | 16 | 0.1260 |
| c.282 | A/G | CAA/CAG | Q94Q | 2 | 0.0157 |
| c.295 | C/A | CGA/AGA | R99R | 2 | 0.0157 |
| c.309 | G/A | CAG/CAA | Q103Q | 2 | 0.0157 |
CRR (96th–290th aa) | c.327 | A/C | GGA/GGC | G109G | 2 | 0.0157 |
| c.354 | A/C | GGA/GGC | G118G | 4 | 0.0315 |
| c.426 | C/A | GGC/GGA | G142G | 2 | 0.0157 |
| c.491 | G/C | GGT/GCT | G164A | 2 | 0.0157 |
| c.507 | A/C | GGA/GGC | G169G | 2 | 0.0157 |
| c.511 | G/A | GGA/AGA | G171R | 2 | 0.0157 |
| c.518 | C/A | GCT/GAT | A173D | 2 | 0.0157 |
| c.534 | C/A | GGC/GGA | G178G | 2 | 0.0157 |
| c.545 | A/C | GAT/GCT | D182A | 2 | 0.0157 |
| c.552 | T/A | CAT/CAA | Q184R | 2 | 0.0157 |
| c.572 | G/C | AGG/AGC | R286S | 2 | 0.0157 |
| c.579 | G/A | CAG/CAA | Q193Q | 2 | 0.0157 |
| c.615 | A/C | GGA/GGC | G205G | 2 | 0.0157 |
| c.684 | A/C | GGA/GGC | G228G | 6 | 0.0472 |
| c.742 | C/G | CCA/GCA | P248A | 2 | 0.0157 |
| c.761 | C/G | AGC/AGG | G254R | 2 | 0.0157 |
| c.769 | C/G | CCA/GCA | P257A | 2 | 0.0157 |
| c.805 | A/C | ACC/CCC | P269T | 2 | 0.0157 |
TSRᵈ (291st–393rd aa) | c.892 | C/T | CTT/TTT | L298F | 2 | 0.0157 |
| c.999 | A/G | AAA/AAG | K333K | 2 | 0.0157 |
a: At a double allelic base in the DNA sequencing chromatogram, the base with the higher peak is the major allele and the base with the lower peak is the minor allele; b: The region near the N-terminus of the PvCSP amino acid chain; c: The coding region of the five amino acids KLKQP; d: The C-terminus of the PvCSP amino acid chain.
fell at the third base of the amino acid codon, and only 27.8% (5/18) of these produced an amino acid change. The second and first codon bases accounted for 17.6% (6/34) and 26.5% (9/34), respectively (Table 2). The highest double-allele frequency was 0.1575 at position 234, the minor allele frequency (MAF) at position 240 was 0.1417, and 75.0% (24/32) of the double alleles were present in only one set of paired sequences (Table 2).
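To make the codon-position reasoning concrete, the following sketch maps a 1-based CDS coordinate to its codon number and position within the codon, and checks whether a major/minor codon pair from Table 2 changes the amino acid. It is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the standard genetic code (NCBI translation table 1) is assumed.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def locate(cds_pos):
    """Return (codon number, base position within codon) for a 1-based CDS coordinate."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1

def effect(major_codon, minor_codon):
    """Describe the amino acid consequence of replacing one codon with another."""
    a, b = CODON_TABLE[major_codon], CODON_TABLE[minor_codon]
    return "synonymous" if a == b else f"{a}->{b}"

# Two third-base sites from Table 2 with different consequences.
print(locate(234), effect("GAG", "GAT"))  # (78, 3) E->D
print(locate(240), effect("AAA", "AAG"))  # (80, 3) synonymous
```

Both example sites sit at the third base of their codon, yet only c.234 alters the residue, which is the pattern the paragraph summarises for third-base sites in general.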
In addition, the 121 CDS strands resolved into 84 haplotypes (H01 to H84), with an He of 0.9940 (± 0.0040). Of these, only haplotypes H08 and H13 were also found in sequences from other paired samples. Haplotypes H05, H50, H51, H63 and H64 had CRR repeat units (PRMs) of the VK247 genotype (Fig. 3B), while the remaining 79 had PRMs characteristic of the VK210 genotype (Fig. 3A).
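The text does not state how He was calculated. As a hedged illustration, the sketch below assumes He denotes Nei's unbiased gene diversity computed on haplotype frequencies, He = n/(n − 1) × (1 − Σ pᵢ²); the function name and the toy labels are hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased gene diversity for a list of haplotype labels.

    Assumption: this is the estimator behind the reported He; the source
    does not confirm the formula or the software used.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))

# Toy labels (not study data): two haplotypes, each seen twice.
print(round(haplotype_diversity(["H01", "H01", "H02", "H02"]), 4))  # 0.6667
# A pair of identical sequences gives 0.
print(haplotype_diversity(["H01", "H01"]))  # 0.0
```

Under this estimator, a value close to 1 means that almost every sampled strand carries a distinct haplotype.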
Among the VK210-type haplotypes, there were 39 CRR forms built from peptide repeat motifs (PRMs) (Fig. 3A). These comprised 15 PRM unit types, of which GDRAAGQPA and GDRADGQPA were the most frequent, at 0.4700 (987/2100) and 0.3833 (805/2100), respectively. The remaining 13 PRMs included five newly detected units: GNRANGQPA (0.0033, 7/2100), GNRANGQAA (0.0005, 1/2100), GDRADGQTA (0.0005, 1/2100), GDRADGHPA (0.0005, 1/2100), and GNGAAGQPA (0.0005, 1/2100) (Fig. 3A). In general, VK210-type CRRs consisted of 14–20 PRMs, with 18 being the most common, and 96.8% (38/39) ended with a GNGAGGQAA unit (Fig. 3A). Notably, the CRR deduced from the paired pvcsp sequences of patient 24, the one imported case infected in Africa (Additional file 1), was assigned to Hap-23 of the VK210 type and showed no divergence from the Myanmar strains (Fig. 3A).
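The PRM frequencies above are counts of nine-residue units divided by the total number of units. A minimal sketch of that tally follows, assuming the CRR has already been extracted as an amino acid string whose length is a multiple of nine; the helper name and the toy sequence are hypothetical and are not study data.

```python
from collections import Counter

def split_prms(crr, unit_len=9):
    """Cut a CRR amino acid string into consecutive repeat units."""
    if len(crr) % unit_len:
        raise ValueError("CRR length is not a multiple of the unit length")
    return [crr[i:i + unit_len] for i in range(0, len(crr), unit_len)]

# Toy VK210-like CRR built from motifs named in the text (illustrative only).
toy_crr = "GDRAAGQPA" * 2 + "GDRADGQPA" + "GNGAGGQAA"

counts = Counter(split_prms(toy_crr))
total = sum(counts.values())
freqs = {prm: n / total for prm, n in counts.items()}

for prm, n in counts.most_common():
    print(prm, n, round(freqs[prm], 4))
```

Pooling the units from all VK210-type strands before counting would give frequencies on the same basis as those quoted in the text (for example, 987/2100 = 0.47).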
Among the five VK247-type haplotypes, there were three CRR forms consisting of 17–21 PRMs (Fig. 3B), built from eight PRM unit types. The most frequent were ANGAGNQPG (0.7414, 86/116), ANGAGGQAA (0.0517, 6/116), and ANGDDQPGA (0.0172, 2/116), while the remaining two were newly detected PRMs (Fig. 3B).
Comparison of the pvcsp gene between paired blood samples and confirmation of relapse episodes
Comparison of the pvcsp CDS strands within the 59 sets of paired blood samples showed that in 31 sets (52.5%, 31/59) the paired strands had a single haplotype and no variant sites, so the He and V values were both 0. This indicated that each of these 31 pairs was homologous and that the paired P. vivax derived from a single clone with complete genetic homology, belonging to the same mosquito-bite inoculated population (Table 3). The subsequent episodes of P. vivax were therefore caused by activation of hypnozoites from the same population as the primary infection strains. In the other 28 sets (47.5%), the paired CDS strands differed at 1–6 polymorphic sites. All of these were double allelic bases (Table 2), with two exceptions, at positions 39 (0.0082, 1/121) and 1027 (0.0082, 1/121), which were true base substitutions (Table 3, Additional file 3). These 28 sets showed no evidence of DNA fragment insertion (or deletion), suggesting that the paired sequences were heterologous with respect to a few base substitutions but had not undergone intra-helical recombination events. Their heterology therefore came from parallel sibling
Table 3
Identification of homology between P. vivax strains within each group of paired samples, based on alignment of pvcsp gene sequences
No. of paired samples | Length of aa chains | Nucleotide diversity (Pi) | Ins or Del in CRR (bp) | Combined variable (polymorphic) sites: No. | Variable sites in α-N (1st–90th aa) | Variable sites in RI (91st–95th aa) | Variable sites in CRR (96th–290th aa) | Variable sites in TSR (291st–393rd aa) | Paired strains | Single infection |
Ⅰ. Paired (groups = 59) |
31 | 364–393 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | com-Homo | Yes |
8 | 340–376 | 0.0009 | 0 | 1a | 234, 240, 261, 264 | 274 | 295 | 0 | non-re-Sibl | Yes |
9 | 340–376 | 0.0018 | 0 | 2b | 234, 240, 261, 270 | 274 | 327, 354, 507, 552, 572, 615, 742 | 999, 1027* | non-re-Sibl | Yes |
4 | 333–386 | 0.0030 | 0 | 3c | 234, 240, 261, 264 | 274, 282 | 491, 511, 545 | 0 | non-re-Sibl | Yes |
3 | 372–376 | 0.0036 | 0 | 4d | 234, 240, 261 | 274 | 426, 518, 534, 742, 761, 769 | 0 | non-re-Sibl | Yes |
3 | 336–375 | 0.0049 | 0 | 5e | 112, 113, 233, 234, 240, 261, 270 | 274 | 309, 579, 684, 805 | 892 | non-re-Sibl | Yes |
1 | 358–368 | 0.0054 | 0 | 6f | 39*, 234, 240, 261, 270 | 274 | 0 | 0 | non-re-Sibl | Yes |
Ⅱ. Any two CDSs of non-paired samples |
69 and 109** | 372 and 368 | 0.0263 | + 12g | 29f | 240, 247*, 259*, 264 | 0 | 291*,294*,318*,321*,345*, 348*,363*,444*,453*,552, 561*,572,579,588*,599*, 606*,615,642*,783*,798*, 800*,803*,805 | 987*, 1015* | dif-Popu | No |
156 and 110## | 393 and 368 | 0.0245 | + 54g | 27f | 261 | 0 | 291**,294*,318*,321*,345, 348*,363*,364*,372*,373*, 376*,390*,444*,453*,552, 561*,572,579,588*,599*,669*,687*,696*,707*,714*,796* | 0 | dif-Popu | No |
49 and 125&& | 275 and 249 | 0.0204 | +(51 + 27)h | 22f | 240, 261, 264 | 274 | 404*,435*,567*,570*,591*, 594*,597*,598*,621*,645*, 648*,650*,651*,652*,702*, 705*,729*,780* | 0 | dif-Popu | No |
a–e: One polymorphic locus, or any two, three, four, or five polymorphic loci, respectively; f: All loci polymorphic at the same time; g: “+” indicates one inserted peptide composed of multiple amino acids; h: Two inserted peptides composed of multiple amino acids; *: No double peak at this base site in the sequencing chromatogram; **: Alignment of the CDS strands of the original infection strains from patient 1 and patient 3; ##: Alignment of the CDS strand of the original infection strain from patient 4 with the CDS strand of the relapse strain from patient 3; &&: Alignment of the CDS strands of the original infection strains from patient 53 and patient 5.
strains that were not genetically identical but were later generations descended from a common ancestor inoculated in a single mosquito-bite population (Table 3). The subsequent episodes of P. vivax were thus still caused by activation of hypnozoites from the same population as the primary infection strains.
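The Pi values in Table 3 summarise how many sites differ between sequences. As a rough illustration, the sketch below assumes π is the average number of pairwise differences per site over equal-length, pre-aligned sequences with no gaps; the function names are hypothetical, and the source does not state which estimator or software produced the reported values.

```python
from itertools import combinations

def pairwise_differences(a, b):
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def nucleotide_diversity(seqs):
    """Average pairwise differences per site (assumed definition of pi)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / len(pairs) / len(seqs[0])

# Toy sequences (not study data).
print(nucleotide_diversity(["ACGT", "ACGT"]))  # 0.0
print(nucleotide_diversity(["ACGT", "ACGA"]))  # 0.25
```

For a single pair, π reduces to the fraction of differing sites, so one differing base in a CDS of roughly 1,100 bp gives a value near 0.0009, the order of magnitude seen for the one-site groups in Table 3.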
By contrast, similarity matching of any two sequences selected at random from unpaired samples among the 121 CDS strands showed substantially more base substitutions between the two sequences. In addition, the strands differed in overall length because of oligonucleotide insertions (or deletions) of various lengths (Table 3). This indicated that sequences from unpaired samples were more heterologous and that the P. vivax parasites they came from were more likely to belong to different populations (Table 3).
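The three outcomes in Table 3 (com-Homo, non-re-Sibl, dif-Popu) can be read as a simple decision rule on a pair of CDS strands. The sketch below is a deliberately simplified illustration of that reasoning, not the authors' procedure: it uses a length difference as a stand-in for an insertion or deletion, and the function name is hypothetical.

```python
def classify_pair(seq_a, seq_b):
    """Assign a Table 3 style label to two CDS strands.

    Simplifying assumptions (not from the source): a length difference
    is treated as evidence of an insertion/deletion, and equal-length
    strands are compared position by position.
    """
    if len(seq_a) != len(seq_b):
        return "dif-Popu"      # length polymorphism: likely different populations
    differences = sum(x != y for x, y in zip(seq_a, seq_b))
    if differences == 0:
        return "com-Homo"      # identical strands: completely homologous
    return "non-re-Sibl"       # substitutions only: non-recombinant siblings

# Toy strands (not study data).
print(classify_pair("ATGAAGAAC", "ATGAAGAAC"))     # com-Homo
print(classify_pair("ATGAAGAAC", "ATGAAGGAC"))     # non-re-Sibl
print(classify_pair("ATGAAGAAC", "ATGAAGAACGGA"))  # dif-Popu
```

A fuller treatment would align the strands first and would also weigh the number of substitutions, since the unpaired comparisons in Table 3 differ in both respects.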