Malaria is a vector-borne disease caused by infection with Plasmodium spp. and transmitted through the bites of infected female Anopheles mosquitoes (1). Of the five Plasmodium species that cause disease in humans, Plasmodium falciparum is responsible for more than 90% of global malaria mortality, with the greatest burden in sub-Saharan Africa (1).
Malaria pathogenesis depends on multiple socio-economic, host and parasite factors, resulting in outcomes that range from asymptomatic or mild infection to severe disease and death (2). Some parasite virulence factors are associated with evasion of the host immune response through antigenic variation, and with prolonged, efficient infection through cytoadherence and rosette formation (2),(3).
Variant surface antigens (VSAs) comprise several protein families that are synthesized at distinct phases of the parasite blood-stage life cycle and subsequently exported to, and expressed on, the surface of infected erythrocytes (IEs) (4). These hypervariable families show a high degree of sequence variation between members, are highly antigenic, and have been associated with a number of important biological processes in malaria pathogenesis, including sequestration and cytoadherence (3),(5). The best-characterized family is the P. falciparum Erythrocyte Membrane Protein 1 family (PfEMP1). PfEMP1 variants have been shown to be important for parasite survival, transmission and virulence because of their role in IE cytoadherence in the microvasculature and in rosetting. Ongoing research aims to decipher the associations between PfEMP1 variants, severe malaria and their shared pathogenic mechanisms, with the goal of targeting these for therapeutic intervention (9). Classifying PfEMP1 sequences and characterizing their diversity could enable prediction of host receptor interactions and provide a foundation for the development of anti-adhesion strategies (9).
The Repetitive Interspersed Protein family (RIFIN; n = 160) has also been studied, although less extensively than the var gene products. By contrast, the Sub-Telomeric Variable Open Reading-frame protein family (STEVOR; n = 40) remains relatively uncharacterized (5). The PfEMP1 family, the protein product of the var genes, is encoded by approximately 60 var genes per parasite genome with limited overlap between genotypes, enabling the parasite to escape the host immune system and to exhibit different tissue tropisms, particularly in cerebral and placental malaria (6). This remarkable diversity is concentrated in a few protein domains, such as the Duffy Binding-Like (DBL) and Cysteine-Rich Interdomain Region (CIDR) domains, which are also associated with binding to specific endothelial receptors (7). A subgroup of PfEMP1 variants is frequently expressed in severe malaria cases, and particular var gene subsets are linked to adverse clinical outcomes across all ages and geographical populations, with specific var phenotypes associated with cerebral malaria in children (8).
STEVOR is a multi-copy gene family encoded in the subtelomeric regions of all P. falciparum chromosomes except chromosome 5, with multiple variants per chromosome. A single parasite genome encodes approximately 40 stevor genes, each coding for a different variant, with each parasite expressing a single variant within a polyclonal infection (5). The primary structure of a STEVOR protein comprises a short signal peptide (SP), followed by a short variable domain (V1), a conserved five-amino-acid protein export motif (PEXEL), a semi-conserved domain (SC), a large variable domain (V2) flanked by two transmembrane domains (TM), and a short conserved C-terminal domain (C), as shown in Fig. 1 (10).
Functionally, STEVORs are trafficked to the IE membrane, where they localize in proximity to knobs, and have been shown to be involved in rosette formation and in the cytoadherence of IEs in the microvasculature (11),(12). STEVOR variants differ mostly in their large variable domain (V2), the part of the protein that protrudes into the extracellular space and that has been shown to be antigenic (10),(13). This high antigenic variability is an adaptive parasite mechanism that helps evade the host immune response during infection (5),(14). Peptide array studies have shown age- and disease-severity-dependent seroreactivity and serorecognition of the STEVOR V2 and SC domains (10). Further studies using recombinant STEVOR V2 and SC proteins suggest that age- and exposure-dependent antibody responses are generated against these domains, indicating their potential as markers of infection (15). However, before the antibody responses to these antigens and their potential as vaccine candidates or serosurveillance tools can be explored further, the diversity of this family must be comprehensively characterized so that recombinant antigens capturing as much of that diversity as possible can be expressed.
The aim of this study is to generate a library of recombinant STEVOR antigens representing the P. falciparum STEVOR protein family, for use in future serological studies exploring the potential of STEVORs as markers of P. falciparum exposure and of protection against disease.
Methods
Protein sequence alignment and inspection
A total of 546 STEVOR protein sequences from 14 sequenced isolates (1 reference strain, 6 laboratory strains and 7 clinical isolates) were obtained from the PlasmoDB database, corresponding to all available protein sequence ‘hits’ for the search terms “STEVOR and Plasmodium falciparum” (16). Sequences from the ‘ML01’ and ‘TG01’ laboratory strains were not downloaded because of the large number of corresponding ‘hits’ (236 and 272, respectively), which were annotated as STEVOR/RIFIN and therefore represent both families. Moreover, the ‘ML01’ and ‘TG01’ strains are known to derive from complex infections with more than one parasite strain, which may explain the large number of sequences under the STEVOR/RIFIN annotation. A further 19 sequences were excluded because of incorrect annotation or because they were considerably longer or shorter than the rest, leaving a database of 527 protein sequences. This database was subjected to multiple sequence alignment with MAFFT in a Windows 10 terminal, using the G-INS-i strategy, in which all pairwise alignments are computed globally with the Needleman-Wunsch algorithm (17),(18). This strategy was chosen because it is tailored to sequences with global homology (the entire length of each sequence is related to the entire length of every other sequence), as is the case for STEVORs: the global pair option aligns each pair of sequences over their entire length, rather than aligning only the portions where they match best (19). A group-to-group gap extension penalty with the default value of 0.123 was applied to discourage the introduction of gaps unless they are truly needed and to refine the alignment of a mixture of closely and distantly related sequences (20).
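For readers reproducing the alignment step, a minimal sketch of the G-INS-i call is given below, assuming MAFFT v7 is installed and on the PATH. In MAFFT, G-INS-i corresponds to the `--globalpair --maxiterate 1000` options, and `--ep` sets the group-to-group offset that behaves like a gap extension penalty. The helper names, the `--thread` setting and the file names are hypothetical and are not taken from the original analysis.

```python
import shutil
import subprocess


def build_ginsi_command(input_fasta, ep=0.123, maxiterate=1000, threads=1):
    """Return a MAFFT argument list for a G-INS-i alignment (sketch)."""
    return [
        "mafft",
        "--globalpair",                   # global pairwise alignment for all pairs
        "--maxiterate", str(maxiterate),  # iterative refinement; with --globalpair this is G-INS-i
        "--ep", str(ep),                  # group-to-group offset, acts like a gap extension penalty
        "--thread", str(threads),         # hypothetical setting, not reported in the Methods
        str(input_fasta),
    ]


def run_ginsi(input_fasta, output_fasta, **options):
    """Run MAFFT and save the alignment, which MAFFT writes to stdout."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft was not found on PATH")
    command = build_ginsi_command(input_fasta, **options)
    with open(output_fasta, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)
```

A call such as `run_ginsi("stevor_527.fasta", "stevor_527_ginsi.fasta")` (hypothetical file names) would then produce the aligned FASTA file. Building the argument list separately keeps the chosen parameters easy to log and check.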
Subsequently, 34 shorter protein sequences with large gaps and/or annotated as ‘pseudogenes’ were excluded, resulting in a final database of 493 aligned STEVOR protein sequences, summarized in Table 1.
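The exclusion criteria above are described qualitatively. The sketch below shows one way such a filter could be applied to aligned records; the 50% gap threshold, the record layout and the keyword match on "pseudogene" are illustrative assumptions, not the thresholds used in this study.

```python
def gap_fraction(aligned_seq, gap_char="-"):
    """Fraction of alignment columns occupied by gaps in one aligned sequence."""
    if not aligned_seq:
        return 1.0
    return aligned_seq.count(gap_char) / len(aligned_seq)


def filter_alignment(records, max_gap_fraction=0.5, exclude_terms=("pseudogene",)):
    """Split (identifier, description, aligned_sequence) records into kept and dropped.

    A record is dropped if its description contains an excluded term or if its
    gap fraction exceeds max_gap_fraction (both criteria are illustrative).
    """
    kept, dropped = [], []
    for rec_id, description, seq in records:
        flagged = any(term in description.lower() for term in exclude_terms)
        if flagged or gap_fraction(seq) > max_gap_fraction:
            dropped.append(rec_id)
        else:
            kept.append((rec_id, description, seq))
    return kept, dropped
```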
Domain sequences isolation
The aligned large hypervariable domain (V2), semi-conserved domain (SC) and conserved domain (C) of all 493 STEVOR protein sequences were isolated manually, as follows. The two transmembrane domains (2xTM) flanking the V2 domain were identified with the TMHMM 2.0 transmembrane domain prediction software (21), which allowed isolation of the V2 sequences, approximately 70 amino acids long and located between amino acid positions 158 and 215, consistent with the literature (22). The C domain, situated immediately after the second TM domain, was then isolated and found to be 17 amino acids long. The SC domain was isolated by taking the amino acid sequence between the five-amino-acid PEXEL motif and the first TM domain, and was approximately 123 amino acids long (23). The sequences of each isolated domain were realigned using the same method as above, and no further sequences were removed from any domain group. All isolated V2, SC and C domain sequences are provided as Additional files 1, 2 and 3, respectively.
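The manual slicing described above can be sketched as follows for a single ungapped sequence. This assumes the two TM helix coordinates have already been parsed from TMHMM output as 1-based inclusive (start, end) pairs, and it locates PEXEL with the commonly cited RxLx(E/Q/D) consensus, searching only upstream of the first TM helix. `split_domains` is a hypothetical helper; the 17-residue C domain length is taken from the text.

```python
import re

# Commonly cited PEXEL consensus: R-x-L-x-(E/Q/D)
PEXEL = re.compile(r"R.L.[EQD]")


def split_domains(seq, tm1, tm2, c_len=17):
    """Slice the SC, V2 and C domains from one STEVOR sequence.

    tm1, tm2: (start, end) 1-based inclusive coordinates of the two TM helices,
    assumed to have been parsed from TMHMM output beforehand.
    """
    tm1_start, tm1_end = tm1
    tm2_start, tm2_end = tm2
    # Search for PEXEL only upstream of the first TM helix.
    motif = PEXEL.search(seq, 0, tm1_start - 1)
    if motif is None:
        raise ValueError("no PEXEL motif found upstream of TM1")
    return {
        "SC": seq[motif.end():tm1_start - 1],  # after PEXEL, up to TM1
        "V2": seq[tm1_end:tm2_start - 1],       # between TM1 and TM2
        "C": seq[tm2_end:tm2_end + c_len],      # residues following TM2
    }
```

Applied to aligned rather than raw sequences, the coordinates would have to be mapped to alignment columns first, which this sketch does not do.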
Clustering Model and Variant Library Selection
Amino acid sequences for each domain were encoded as binary (Boolean) vectors: each position of each sequence was tested against each of the 21 amino acids found in Plasmodium falciparum and recorded as present = 1 or absent = 0, giving 21 binary vectors, one per amino acid (24). All Boolean vectors were then combined into a matrix, which was subjected to singular value decomposition (SVD) to obtain a matrix of Euclidean distances (sPC) between sequences. The sPC values were used in a principal component analysis (PCA) and visualized as 2D plots. All analyses and visualizations were performed in R 4.3. Alignment diversity for each isolated domain was calculated in two ways: as the Shannon entropy index, measuring the diversity at each position, and as the overall mean of the pairwise distances, using the Biostrings package (Bioconductor) in R 4.3 (25). The V2 PCA plot was then subdivided into nine equal sections, and 13 sequences were selected to represent the STEVOR protein family based on (i) their two-dimensional position on the PCA plot and (ii) the geographical origin of the strains. When variants from multiple isolates clustered together, the variant from a Western sub-Saharan African isolate was selected. Three SC sequences from the P. falciparum reference strain 3D7, previously found to have potential as markers of infection, were also selected to serve as antigenicity controls (15).
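The encoding, ordination, diversity and grid steps can be sketched in Python with NumPy as a stand-in for the R workflow described above; it is not the authors' code or the Biostrings implementation. Assumptions, labeled here and in the comments: the 21 states are the 20 standard amino acids plus the alignment gap, PCA scores are obtained directly by SVD of the column-centred one-hot matrix, the pairwise distance is the p-distance (proportion of differing columns), and the "nine equal sections" are a 3 × 3 grid spanning the plot.

```python
import math
from collections import Counter

import numpy as np

# ASSUMPTION: 21 states = 20 standard amino acids plus the gap character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(aligned):
    """0/1 matrix of shape (n_sequences, alignment_length * 21)."""
    width = len(ALPHABET)
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    X = np.zeros((len(aligned), len(aligned[0]) * width))
    for row, seq in enumerate(aligned):
        for pos, aa in enumerate(seq):
            X[row, pos * width + index[aa]] = 1.0
    return X


def pca_scores(X, k=2):
    """Principal component scores from the SVD of the column-centred matrix."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def shannon_entropy(aligned):
    """Shannon entropy (bits) of each alignment column."""
    entropies = []
    for column in zip(*aligned):
        n = len(column)
        counts = Counter(column).values()
        entropies.append(-sum((c / n) * math.log2(c / n) for c in counts))
    return entropies


def mean_pairwise_distance(aligned):
    """Mean p-distance (ASSUMED distance measure) over all sequence pairs."""
    total, pairs = 0.0, 0
    for i in range(len(aligned)):
        for j in range(i + 1, len(aligned)):
            diff = sum(a != b for a, b in zip(aligned[i], aligned[j]))
            total += diff / len(aligned[i])
            pairs += 1
    return total / pairs


def grid_cell(scores, n_bins=3):
    """Index (0 .. n_bins**2 - 1) of the equal grid cell holding each 2D point."""
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    bins = np.clip(np.floor((scores - lo) / span * n_bins).astype(int), 0, n_bins - 1)
    return bins[:, 0] * n_bins + bins[:, 1]
```

PCA scores of the centred matrix coincide, up to sign, with classical multidimensional scaling of the pairwise Euclidean distances between the encoded sequences. Representative sequences could then be picked per grid cell, with the geographical tie-break applied by hand.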
Recombinant Antigens Library Expression
E. coli BL21(DE3) competent cells (Trans, China) were first transformed with the pMJS226 CyDisCo plasmid (University of Oulu, Finland), which carries genes for the sulfhydryl oxidase Erv1p and protein disulfide isomerase (PDI), together with a chloramphenicol (CMP) resistance cassette (26). Transformed cells were grown in LB medium with 100 µg/mL CMP and then further transformed with pGEX-5X-1(RBS) plasmids (GenScript, UK), each containing one of the 13 STEVOR V2 variants or one of the three Pf3D7 STEVOR SC variants, for expression as N-terminally GST-tagged recombinant proteins. Transformed bacteria were grown in ZY autoinduction medium with 100 µg/mL ampicillin (Amp) and 100 µg/mL CMP, and subsequently lysed using an LM20 microfluidizer (Analytik LTD, UK). Recombinant proteins were affinity-purified in batch mode using Glutathione Sepharose 4B beads and quantified with the Bradford protein assay (Bio-Rad, UK).
To confirm the antigenicity of the recombinant proteins, they were chemically coupled to MagPlex microspheres (Luminex, UK) following established protocols (27). Briefly, beads coupled with a six-point, eight-fold dilution series of each protein were titrated against a five-point, two-fold dilution series (1/100 to 1/1600) of a positive control serum pool from malaria-hyperimmune Ugandan adults (PRIMS) (15). Median fluorescence intensity (MFI) was recorded at each titration point, and a four-parameter logistic model was fitted to each titration curve to obtain the EC50, the midpoint of the sigmoidal curve. The median EC50 across all serum dilutions was selected as the optimal protein coupling concentration for each recombinant antigen.
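The EC50 step can be sketched as below, assuming x is the protein coupling concentration, y the MFI at one serum dilution, and the standard 4PL form y = d + (a - d) / (1 + (x/c)^b), where c is the EC50. The fitting software used in the study is not stated; as a dependency-light stand-in for a nonlinear least-squares fit (for example `scipy.optimize.curve_fit`), this sketch grid-searches the slope b and log10(c) and solves the linear parameters a and d by ordinary least squares. The grids, function names and the 100 µg/mL starting concentration in the example are hypothetical.

```python
import numpy as np


def four_pl(x, a, b, c, d):
    """4PL curve: a = response at zero dose, d = response at infinite dose,
    c = inflection point (EC50), b = slope."""
    return d + (a - d) / (1.0 + (x / c) ** b)


def fit_four_pl(x, y, slopes=None, log10_c_grid=None):
    """Grid search over (b, c); for fixed (b, c) the model is linear in (a, d)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if slopes is None:
        slopes = np.linspace(0.2, 4.0, 39)
    if log10_c_grid is None:
        log10_c_grid = np.linspace(np.log10(x.min()), np.log10(x.max()), 201)
    best = None
    for b in slopes:
        for log_c in log10_c_grid:
            c = 10.0 ** log_c
            g = 1.0 / (1.0 + (x / c) ** b)   # weight on a; (1 - g) weights d
            design = np.column_stack([g, 1.0 - g])
            (a, d), *_ = np.linalg.lstsq(design, y, rcond=None)
            sse = float(np.sum((design @ np.array([a, d]) - y) ** 2))
            if best is None or sse < best[0]:
                best = (sse, a, b, c, d)
    _, a, b, c, d = best
    return {"a": a, "b": b, "c": c, "d": d}


def optimal_coupling_concentration(x, curves, **fit_kwargs):
    """Median EC50 across serum dilutions; `curves` maps dilution -> MFI values."""
    ec50s = [fit_four_pl(x, y, **fit_kwargs)["c"] for y in curves.values()]
    return float(np.median(ec50s))


# Hypothetical six-point, eight-fold protein series starting at 100 µg/mL.
x_example = 100.0 / 8.0 ** np.arange(6)
```

A grid search is slower and coarser than a gradient-based fit, but it cannot diverge and makes the EC50 resolution explicit through the grid spacing.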