Population Genetic Analyses Inferred a Limited Genetic Diversity in the Pvama-1 of Plasmodium Vivax Isolates from Khyber Pakhtunkhwa Regions of Pakistan

Background: Plasmodium vivax apical membrane antigen-1 (pvama-1) is an important vaccine candidate. Assessing the genetic composition of pvama-1 is an important preliminary step in planning vaccine design strategies based on this antigen. Methods: Blood samples were collected from 84 vivax malaria patients in the Khyber Pakhtunkhwa (KP) province of Pakistan. The pvama-1 domain I (DI) region was amplified and sequenced. Quality-based filtering of the raw sequence data was done using the DNASTAR package. The downstream population genetic analyses were performed using MEGA4, DnaSP, Arlequin v3.5 and Network 5. Results: The analyses revealed a total of 57 haplotypes of pvama-1 DI among the 84 KP P. vivax samples, with H14 and H5 the most prevalent haplotypes. Limited to moderate pairwise genetic differentiation was observed among the samples collected from different districts of KP. Likewise, no geography-specific genetic correlation was inferred among KP samples. In the context of worldwide available data, the KP samples showed major genetic differentiation from Korean samples, with Fst = 0.40915 (P-value = 0.0001), while low differentiation was observed against Indian and Iranian samples. An excess of low-frequency polymorphisms and a negative Tajima's D indicate purifying selection signatures across pvama-1 in KP P. vivax samples. Comparison of KP pvama-1 DI with the reference pvama-1 Sal I (AF063138) revealed a total of nine KP-specific novel non-synonymous single nucleotide polymorphisms (nsSNPs), including several trimorphic and tetramorphic substitutions. A few of these nsSNPs mapped within the predicted B-cell epitopic motifs of pvama-1, suggesting that they may modulate immune response mechanisms. Conclusion: The genetic composition of pvama-1 appeared uniform across the KP regions of Pakistan. However, the KP samples exhibited marked genetic distinction in the context of worldwide samples. This information may be valuable for understanding the genetic nature of Pakistani P. vivax and may inform future pvama-1 vaccine design.


Background
Malaria is an acute febrile infectious disease caused by vector-borne apicomplexan parasites of the genus Plasmodium. P. vivax and P. falciparum are the predominant species responsible for malaria [1]; of these, P. vivax is the most widely distributed human malaria parasite, endemic in tropical and subtropical countries of Asia, the South Pacific, Central and South America, the Middle East, and North Africa [2]. According to the latest WHO report, about 229 million cases and 409,000 deaths occurred due to malaria in 2019 [3].
Treatment and control of P. vivax and P. falciparum have become a serious challenge due to drug resistance and the lack of a proper vaccine. The wide distribution, antigenic variation, relapse, and co-infection have led to collective interest in developing an effective vaccine against P. vivax [4]. Implementation of RTS,S/AS01 started in three African countries in 2019, and it is considered the most effective malaria vaccine to date. Furthermore, trial results for the R21/Matrix-M vaccine, tested among children in Burkina Faso, identified it as the first malaria vaccine to meet the WHO's efficacy goal, with efficacy of up to 77% [5]. Several surface antigens of Plasmodium species, such as apical membrane antigen 1 (AMA-1), circumsporozoite protein (CSP), merozoite surface proteins (MSP), and Duffy binding protein (DBP), are reported as potent malaria vaccine candidates, and many studies have evaluated these candidate antigens for developing an effective vaccine [6].
Assessing the genetic composition of vaccine candidate loci is indispensable for developing a malaria vaccine. Ample evidence suggests that AMA-1 of Plasmodium species is one of the promising malaria vaccine candidate antigens [7]. AMA-1 is a type I integral membrane protein with a molecular size of 83 kDa and is mainly expressed in the merozoite and sporozoite stages of Plasmodium parasites [8,9]. The main biological function of AMA-1 is not yet clearly understood, but its stage-specific expression and localization suggest a crucial role during invasion of erythrocytes and hepatocytes by malaria parasites [10][11][12]. The protein consists of a cysteine-rich ectodomain with three separate domains (i.e. Domains I, II, and III), a transmembrane region, and a conserved cytoplasmic region [13]. The ectodomain is highly immunogenic and evokes immune responses in patients naturally exposed to P. falciparum and P. vivax [14,15,16,17]. AMA-1 is also reported to elicit antibodies that effectively block the invasion of erythrocytes by malaria parasites and hence confer protective immunity [16,18], making it a leading malaria vaccine candidate.
Domain I of AMA-1 exhibits a high level of genetic polymorphism, and this region appears to be a major target of anti-AMA-1 protective antibodies [19][20][21][22]. It is therefore important to monitor genetic variation and polymorphism of this vaccine candidate antigen among malaria isolates circulating in endemic areas worldwide in order to design an effective malaria vaccine [23]. Several studies of antigenic variation in pvama-1 have been conducted in malaria-endemic countries around the world [24][25][26][27][28]. However, few studies have reported the genetic features of pvama-1 from Pakistan. In particular, no study to date has been reported from the remote malaria-endemic regions of the KP province of Pakistan. The current study was therefore undertaken to evaluate the genetic composition of pvama-1 in P. vivax isolates from KP regions of Pakistan (Figure 1).

Ethical approval, blood sample collections and DNA extraction
The current study was approved by the Ethical Committee of Abdul Wali Khan University Mardan (AWKUM/Biochem/Dept/Commit/eth/18). Blood samples were obtained from 100 consenting patients who tested positive for P. vivax and presented to different hospitals and private laboratories in Mardan, Swat, Buner, Hangu, Swabi, Kohat, Bannu, Timergara and Peshawar, covering a broad area of KP province, Pakistan (Figure 1). The region has an average annual rainfall of 384 mm across two seasons, from March to May and from August to November, during which malaria incidence peaks. The mean temperature in the region ranges from 20°C to 40°C. The blood samples were collected prior to treatment, spotted on filters, air-dried, and kept in individual sealed plastic bags at ambient temperature until use. Genomic DNA was extracted from the spotted blood samples using a QIAamp blood kit (Qiagen, CA, USA) according to the manufacturer's instructions. The DNA samples were stored at -20°C.
Amplification and sequencing of pvama-1 DI
A DNA fragment flanking the DI of pvama-1 was amplified by polymerase chain reaction (PCR) using the specific primers and amplification conditions described previously [21,29]. The resulting PCR products were analyzed on a 1.5% agarose gel, purified, and cloned into the T&A vector (Real Biotech Corporation, Banqiao City, Taiwan). The ligation mixture was transformed into Escherichia coli DH5α competent cells, and positive clones were selected by colony PCR. The nucleotide sequences of the cloned inserts were determined by automated DNA sequencing with M13 forward and M13 reverse primers (Genotech Inc., Daejeon, Korea). The raw data were filtered for quality using the DNASTAR Lasergene package.

Population Genetic Analyses
The DnaSP v6.12 software package was used to estimate parsimony-informative sites and haplotype composition from the KP pvama-1 sequences [30]. Population genetic statistics, including the pairwise fixation index (Fst), analysis of molecular variance (AMOVA), haplotype frequencies, and nucleotide diversity based on Nei's net distance (DA), were computed using Arlequin v3.5 [31]. The haplotype network was generated by the median-joining method implemented in NETWORK 5.0 [32]. The output of the median-joining calculation was then used to draw a refined network plot.
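As a minimal sketch of two of the summary statistics reported in this study, haplotype diversity (Hd) and nucleotide diversity (π) can be computed directly from an alignment. The Python below is not the DnaSP or Arlequin implementation; the function names and the four toy sequences are hypothetical and only illustrate the standard formulas (Nei's Hd with the n/(n-1) correction, and π as the mean proportion of pairwise differences).

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: Hd = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    counts = Counter(seqs)  # identical sequences form one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs) / length


# Hypothetical toy alignment (not data from this study).
toy_alignment = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "TCGAACGT"]
hd = haplotype_diversity(toy_alignment)
pi = nucleotide_diversity(toy_alignment)
print(f"Hd = {hd:.4f}, pi = {pi:.4f}")
```

The sketch assumes gap-free sequences of equal length; real alignments would need gaps and ambiguous bases handled before counting differences.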

Recombination And Linkage Disequilibrium
The recombination parameter (R) between adjacent nucleotides per generation and the minimum number of recombination events (Rm) were calculated using DnaSP v6.12 [30]. Likewise, linkage disequilibrium (LD) between the polymorphic sites was estimated based on the R² index using DnaSP v6.12 [33].
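To make the LD index concrete, the following sketch computes the squared allelic correlation between two biallelic sites from phased haplotypes. It is an illustration under stated assumptions, not the DnaSP routine: the function name and the toy allele strings are hypothetical, and both sites are assumed to be polymorphic with exactly two alleles.

```python
def r_squared(site_a, site_b):
    """Squared allelic correlation between two biallelic sites.

    site_a and site_b hold one allele per haplotype, in the same order.
    """
    n = len(site_a)
    a_ref, b_ref = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = sum(x == a_ref for x in site_a) / n
    p_b = sum(y == b_ref for y in site_b) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both sites must be polymorphic")
    p_ab = sum(x == a_ref and y == b_ref for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical toy sites across four haplotypes.
complete_ld = r_squared("AAGG", "CCTT")  # alleles always co-occur
no_ld = r_squared("AAGG", "CTCT")        # alleles independent
partial_ld = r_squared("AAAG", "CCTT")
print(complete_ld, no_ld, round(partial_ld, 4))
```

Plotting such pairwise values against the nucleotide distance between sites gives the kind of LD decay curve described for the KP samples.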

Functional Prediction of nsSNPs
The BepiPred-2.0 server [34] was used to predict linear B-cell epitopes of PvAMA-1 with a threshold score of 0.5. A higher score indicates a higher predicted probability that the residue is part of an epitope. The nsSNPs were then checked for mapping within the top predicted epitopes of pvama-1. The IURs and RBC binding sites within PvAMA-1 have been predicted previously [35,25], and their annotations were used to check whether the novel nsSNPs identified in the current study map within these motifs of pvama-1.
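The mapping step described above can be sketched as follows: per-residue scores are thresholded into contiguous windows, and each substitution is kept if its residue number falls inside a window. This is only an illustration; the function names are hypothetical, and the score values are invented placeholders, not BepiPred-2.0 output for PvAMA-1.

```python
import re


def epitope_windows(scores, threshold=0.5, start=1):
    """Return (first, last) residue ranges whose scores exceed the threshold."""
    windows, run_start = [], None
    for offset, score in enumerate(scores):
        pos = start + offset
        if score > threshold and run_start is None:
            run_start = pos  # a new window opens here
        elif score <= threshold and run_start is not None:
            windows.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:  # window still open at the end
        windows.append((run_start, start + len(scores) - 1))
    return windows


def substitutions_in_windows(substitutions, windows):
    """Keep substitutions (e.g. 'R240C') whose residue number lies in a window."""
    hits = []
    for sub in substitutions:
        pos = int(re.search(r"\d+", sub).group())
        if any(first <= pos <= last for first, last in windows):
            hits.append(sub)
    return hits


# Invented placeholder scores for residues 238-245 (not real predictions).
toy_scores = [0.42, 0.48, 0.55, 0.61, 0.58, 0.53, 0.47, 0.60]
windows = epitope_windows(toy_scores, threshold=0.5, start=238)
hits = substitutions_in_windows(
    ["R240C", "N241D", "D242E", "W243R", "M171T", "K120R/S/G"], windows
)
print(windows, hits)
```

The same helper can be reused for other annotations, such as IUR or RBC binding ranges, by passing those ranges as the window list.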

Genetic polymorphic features of KP pvama-1
The 416 bp sequences of pvama-1 flanking the DI (nucleotide positions 322-737) were successfully amplified in 84 P. vivax KP samples. Comparison with the reference sequence Sal I (AF063138) classified the 84 Pakistani pvama-1 sequences into 57 haplotypes (Figure 2) and identified a large number of single nucleotide polymorphisms (SNPs). Among these, 68 were non-synonymous SNPs (nsSNPs) causing amino acid substitutions, including 53 dimorphic, 10 trimorphic, 3 tetramorphic, and 2 pentamorphic changes. The two pentamorphic amino acid changes were R112K/T/E/S and S228D/N/R/K. The ten trimorphic amino acid substitutions were N132D/G, A141E/G, E145A/G, K190E/Q, T191K/P, A199T/V, S209G/C, P210S/L, P223L/S, and V233L/P, while the three tetramorphic amino acid changes were K120R/S/G, E189N/K/G, and E227V/K/G. These amino acid substitutions were observed at varied frequencies in the KP samples. Of the 68 nsSNPs, 59 have previously been reported in the literature for P. vivax isolates from different geographical origins. The remaining 9 nsSNPs were specific to the KP sample set (Table 1) and were observed at low frequencies (1.19%). A few nsSNPs, such as K120R, N132D, L140I, A141E, K190E, E227V, and S228D, were commonly observed at high frequency in KP as well as in pvama-1 sequences from other continents (Table 1). The KP pvama-1 sequences showed an overall haplotype diversity (Hd) of 0.978 ± 0.008. A total of 62 segregating sites (S) and 67 mutations were identified for the samples. Fu and Li's D* test inferred an effect of natural selection on genetic composition. The negative values of Tajima's D implied an excess of low-frequency polymorphisms, suggesting population size expansion (Table 2).
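To show why an excess of low-frequency variants produces a negative Tajima's D, the statistic can be sketched from three inputs: the number of sequences n, the number of segregating sites S, and the mean number of pairwise differences k. The function below follows the standard formula; its name and the toy input values are hypothetical and are not taken from Table 2.

```python
import math


def tajimas_d(n, seg_sites, mean_pairwise_diff):
    """Tajima's D from sample size, segregating sites and mean pairwise differences."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Numerator: pairwise-difference estimate of theta minus Watterson's estimate.
    return (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)


# Hypothetical toy values: many segregating sites but few pairwise differences.
d_toy = tajimas_d(n=10, seg_sites=16, mean_pairwise_diff=3.888889)
print(round(d_toy, 3))
```

When most variants are rare, k falls below S/a1 and D becomes negative, which is the pattern reported for the KP samples.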

Haplotype Networking Analysis
A total of 57 KP pvama-1 haplotypes were identified among the 84 isolate sequences, with a haplotype diversity (Hd) of 0.978 (±0.008). Haplotype H14 was identified at high frequency and was shared among samples collected from six different KP districts: Kohat, Hangu, Buner, Swat, Timergara and Bannu.
Haplotype H5 was the second most predominant haplotype, shared among samples collected from five different KP districts (i.e. Mardan, Swat, Hangu, Bannu and Kohat). Haplotype H3 was also found at high frequency, in samples collected from Swat, Mardan, Peshawar and Bannu. The pairwise AMOVA (analysis of molecular variance) inferred the pairwise distances among haplotypes. Haplotype H53, which was predominant in Peshawar samples, was identified as distinct and showed significant genetic differentiation from haplotypes H6 and H55. H6 and H55 were identified at high frequency in samples collected from the Mardan and Peshawar regions, respectively. The size of each node in the haplotype network plot indicates the frequency of a particular haplotype. The length of the line between nodes is proportional to the number of nucleotide substitutions separating the haplotypes. The most widely shared haplotypes of KP samples collected from different districts appeared on shared nodes; however, some haplotypes from samples collected from Timergara, Peshawar, Kohat and Hangu districts occupied distinct nodes in the network plot, which indicated their distinctive features (Figure 3).
The functional impact of the novel nsSNPs was assessed with respect to amino acid substitutions in the IUR motifs of pvama-1. This region is considered important in pvama-1-based vaccine design and diagnosis [30]. None of the residues substituted by the KP-specific novel SNPs mapped within the disordered regions of pvama-1. Two SNPs (M171T and V172T) mapped within the IUR motifs and four SNPs (R240C, N241D, D242E and W243R) were detected in the RBC binding region, while most of the amino acid changes caused by nsSNPs mapped within the predicted B-cell epitopes of pvama-1. Among the novel nsSNPs identified in the current study, G117R, S209G, A212V, and P223L mapped within the epitopic region of pvama-1 and are predicted to modulate the host immune response. The top epitopes were predicted based on a BepiPred-2.0 threshold score of > 0.5. The region comprising residues 240-254 contains four SNPs (i.e. R240C, N241D, D242E, and W243R) that mapped within both the B-cell epitopes and the RBC binding sites.
Recombination events across pvama-1 and a decline of the LD index R² with increasing nucleotide distance were identified for the KP samples. This suggests frequent meiotic recombination across pvama-1 in the KP samples (Figure 4). The R value for the KP samples was higher than those of the East Asian (i.e. China-Myanmar border and Korea) and South Asian (Sri Lanka) samples, but lower than that of the Myanmar samples. The highest R value, observed for the Myanmar samples, points to a high opportunity for multiclonal infections, cross-fertilization and recombination. The higher recombination values and rapid LD decay observed in KP and some other geographical samples indicate high meiotic recombination in pvama-1, supporting recombination as a possible factor driving genetic diversity (Table 3).
Abbreviations: Ra, recombination parameter between adjacent sites; Rb, recombination parameter for the whole gene; Rm, minimum number of recombination events.

Nucleotide diversity across pvama-1 in context of global isolates
The sequences of the KP isolates (n = 84) were compared to the global pvama-1 sequences deposited in GenBank. The values of K and π observed for the KP sequences were broadly similar to those of previously reported sequences from Iran and India, while differing from the remaining global sequences (Table 2). The fixation index (Fst) was used to assess genetic differentiation across the pvama-1 gene among KP samples collected from different regions, as well as in the context of global samples. The pairwise analysis inferred genetic distinction of samples collected from Swabi district compared to the rest of the KP regions. The highest Fst differentiation was detected between the Bannu and Swabi isolates (Fst = 0.16258, P-value = 0.00977), followed by and (Fst = 0.12932; P-value = 0.04199) samples. The lowest Fst values were found between the Swat and Bannu groups (Fst = -0.07427; P-value = 0.96973), followed by the Swat and Hangu groups (Fst = -0.06635; P-value = 0.89551) (Figure 5A). In the context of global samples, marked genetic distinction was inferred for KP samples compared to India, Iran, Thailand, Sri Lanka, Korea, Venezuela, Myanmar, PNG, and China-Myanmar. High genetic differentiation was observed between KP and Korean samples (Fst = 0.40915). The Korean samples showed significant genetic distinction in pairwise comparison with the rest of the global samples. Meanwhile, the least differentiation was observed among Iranian and Indian samples (Figure 5B).
The highest pairwise net number of nucleotide differences (DA) and mean pairwise differences (πxy) were observed between Bannu and Swabi samples (Figure 5a), congruent with the Fst analysis. The highest within-population genetic diversity (π) was found for Korean samples, followed by South East Asian samples (Figure 5b). The Pearson correlation plot showed a relationship among KP, Sri Lanka, Iran, India and Myanmar samples, congruent with the pairwise Fst (Figure S1). The plot showed correlation among the populations in hierarchical order. However, the Korean samples showed high genetic distinction in terms of Fst value, probably due to geographical separation. The distinction of the Korean samples was also evident in the correlation plot. Likewise, the PCA plot showed the samples clustering in a broadly similar fashion (Figure S2).
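The relationship between πxy, within-population π and Nei's net distance DA can be sketched in a few lines, which also shows how a pairwise Fst-like ratio can turn negative when within-population diversity exceeds between-population divergence. This is a simplified illustration, not Arlequin's AMOVA-based Fst estimator; the function names and toy populations are hypothetical.

```python
from itertools import combinations, product


def _diffs(s1, s2):
    """Number of differing sites between two aligned sequences."""
    return sum(a != b for a, b in zip(s1, s2))


def mean_within(pop):
    """Mean pairwise differences within one population (pi_xx)."""
    pairs = list(combinations(pop, 2))
    return sum(_diffs(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_x, pop_y):
    """Mean pairwise differences between two populations (pi_xy)."""
    pairs = list(product(pop_x, pop_y))
    return sum(_diffs(a, b) for a, b in pairs) / len(pairs)


def net_distance(pop_x, pop_y):
    """Nei's net number of differences: DA = pi_xy - (pi_x + pi_y) / 2."""
    within = (mean_within(pop_x) + mean_within(pop_y)) / 2
    return mean_between(pop_x, pop_y) - within


def simple_fst(pop_x, pop_y):
    """Fst-like ratio DA / pi_xy (simplified; negative if within > between)."""
    return net_distance(pop_x, pop_y) / mean_between(pop_x, pop_y)


# Hypothetical toy populations (not data from this study).
pop_a = ["AAAA", "AAAT", "AATT"]
pop_b = ["GGAA", "GGAT", "GGTT"]
pop_c = ["AAAA", "TTTT"]
pop_d = ["AAAA", "TTTT"]
print(round(net_distance(pop_a, pop_b), 4), round(simple_fst(pop_a, pop_b), 4))
print(simple_fst(pop_c, pop_d))
```

Negative values such as those reported for the Swat and Bannu comparison are usually read as no detectable differentiation between the two groups.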
The AMOVA test was performed to determine genetic differentiation at single and multiple loci arising from variation within population groups as well as between population groups. The AMOVA indicated that genetic diversity in the KP sample set arose mainly from population-level differentiation rather than among-group differentiation (-0.24%). This indicates limited or no genetic differentiation among KP samples despite their geographical separation across the KP region of Pakistan (Table 4).

Discussion
Comprehensive knowledge of antigenic variants in Plasmodium parasites is a prerequisite for designing effective vaccine strategies that work across different endemic regions [36]. The current study aimed to analyze the genetic composition of pvama-1, a leading malaria vaccine candidate antigen, in P. vivax isolates from different districts of KP, Pakistan.
The southern and northern regions of the KP province of Pakistan are geographically and environmentally distinct. However, limited genetic diversity of P. vivax pvama-1 was identified in the current study, suggesting no significant genetic heterogeneity between the isolates from southern and northern KP. The low genetic diversity across the DI domain of pvama-1 in the KP region might be due to low endemicity of Plasmodium genotypes [37], as low-endemicity regions are generally characterized by limited parasite genetic diversity [38]. The low transmission and endemicity of P. vivax in KP might result from the active malaria control program in these regions over the last several years. Additionally, the limited genetic diversity suggests that malaria infection in the KP region might be monoclonal and might be combated with a vaccine based on a single type of pvama-1. The pairwise genetic analyses indicate that the KP samples are genetically close to South/Central Asian samples from India and Iran. This might be due to the geographical proximity of these countries. The negative values of Tajima's D imply an excess of low-frequency polymorphisms and indicate population size expansion. In addition, this points toward stronger diversifying selection and a host immune selection signature across pvama-1 in the KP samples. Tajima's D indicates balancing selection across pvama-1 in global samples; however, it was not statistically significant for the sample set in the current study.
The analysis of KP samples in the context of global samples revealed unique genetic features, and 9 KP-specific nsSNPs were identified in the samples. The genetic polymorphisms (nsSNPs) identified in the current study were further analyzed with respect to their possible functional consequences in the RBC binding sites, B-cell epitopes, and IURs of pvama-1. Several nsSNPs were located at the predicted binding sites, B-cell epitopes and IURs of pvama-1. However, most of these nsSNPs mapped to predicted B-cell epitopic regions, indicating a high degree of balancing natural selection in the B-cell epitope region of PvAMA-1. Amino acid changes due to these nsSNPs may alter the protein structure and physicochemical properties of PvAMA-1, which might help the parasite escape host protective immunity. The IURs play an important role in molecular recognition, assembly and protein modification [39]. The pvama-1 IURs are indispensable for attachment and invasion of the parasite into red blood cells [40]. Several pvama-1 SNPs were detected in the PvAMA-1 IURs. However, none of the KP-specific nsSNPs identified in the current study mapped within the IURs of pvama-1.

Conclusion
pvama-1 is considered a promising malaria vaccine candidate targeting the blood stage of P. vivax. The partial DNA sequencing and analysis revealed limited genetic diversity of pvama-1 across the KP regions. It is inferred that a pvama-1-based vaccine might be promising for effectively combating malaria and contributing to its eradication throughout the KP province of Pakistan.

Supplementary Files
Supplementary files associated with this preprint: ibraretal2021SupplyMaterial.docx

Figure 1 Map of different districts of Khyber Pakhtunkhwa (KP), Pakistan.

Figure 2 Frequency of different haplotypes of Pvama-1 in KP region, Pakistan

Figure 3 [A] The pairwise haplotype differences between the Pakistani pvama-1 populations. [B] The network plot was generated by PopArt to detect haplotypes; each circle represents a haplotype, and the size of each circle is proportional to the number of individual samples with that haplotype. The lines connecting them reflect the distance between haplotypes, while the colors show the distinction between population groups.

Figure 4 Patterns of linkage disequilibrium (LD) based on the linear regression line: (A, B) illustrate the relationships between the distance between loci (expressed in nucleotides) and |D′| and r², respectively.

Figure 5 (a) The graph represents the average number of pairwise differences between sampled population groups (πxy; green, above diagonal), within populations (πxx; orange, diagonal), and the net number of nucleotide differences among population groups (Nei's distance DA; blue, below diagonal), based on Pvama-1 gene variants among Pakistani groups. (b) The same quantities (πxy, green above diagonal; πxx, orange diagonal; DA, blue below diagonal), based on Pvama-1 gene variants in the context of other world populations. (A) Heat map of pairwise Fst between Pakistani populations based on Pvama-1 gene sequences. (B) Heat map of pairwise Fst among the Pakistani population and other world population groups based on Pvama-1 gene sequences.

Table 1
The sixty-eight SNPs identified in KP P. vivax samples in comparison to the reference pvama-1 Sal I (AF063138) sequence
* Common nsSNPs identified in KP and other P. vivax samples deposited in GenBank, NCBI.
# Novel amino acid polymorphisms in KP samples acquired from different regions, observed at high and low frequencies.

Table 2
The neutrality test and genetic polymorphism estimation for Pvama-1 domain-1 DNA sequences of KP-Pakistan and global samples
S: number of polymorphic (segregating) sites, K: average number of pairwise nucleotide differences, H: haplotype, Hd: haplotype diversity, π: observed average pairwise nucleotide diversity, D* (F&L): Fu and Li's D* value, F* (F&L): Fu and Li's F* value. A P-value < 0.05 is considered significant.

Table 3
Comparison of different estimates of recombination in Pvama-1 (domain-1) among KP and global P. vivax samples

Table 4
AMOVA-based genetic differentiation analysis across Pvama-1 (domain-1) in samples acquired from different districts of KP, Pakistan