Protein kinase domain identification and assembly - The P. falciparum sequences were collected from our earlier alignment 1, while P. vivax sequences were obtained by searching the predicted P. vivax proteome in PlasmoDB v50b 17 using the term “kinase”. The resultant list was further refined to only include (i) sequences containing a Pfam ID of PF00069 or PF07714 (Protein kinase domain and protein tyrosine kinase, respectively) and (ii) sequences which contained the phrase “protein kinase” in ‘Product description’ data. These sequences were assessed using ScanProsite 18 to identify the protein kinase domain. Two regulatory subunits of CK2 (CK2β, both of which have characterised orthologues in P. falciparum 19), a sequence annotated as “putative protein kinase” but that did not have a recognisable protein kinase domain, and a protein phosphatase were removed (PVP01_0904500, PVP01_1212400, PVP01_1030400 and PVP01_1406400). Four atypical protein kinases (aPK) were identified: ABCK1 and ABCK2 from the ABC family (PVP01_1430100, PVP01_1334400), and Rio1 and Rio2 (PVP01_1449100, PVP01_0529500), all of which have orthologues in P. falciparum 12. Finally, we identified a surprising four members of the phosphatidylinositol kinase (PIK) family in P. vivax, whereas this family is represented by only three enzymes in P. falciparum. These sequences are PVP01_1018600 (orthologous to PfPIK3), PVP01_1024200 (orthologous to PfPIK4), PVP01_0529300 (orthologous to PF3D7_0419900) and PVP01_1309200. It would be interesting to determine if the additional enzyme is implicated in vivax-specific biology, e.g. the preference for reticulocytes or the ability to establish dormant forms (hypnozoites) in hepatocytes. A fifth P. vivax PIK-like kinase was initially identified (PVP01_1404700), but its low Hidden Markov Model (HMM) score (<50, as defined by HMMER 3.3 20) and subsequent manual examination of the sequence indicated that this protein is highly divergent from the consensus PIK sequence; it was therefore not included here. Phylogenetic analysis of the PIK kinase domains indicated that PVP01_1309200 is divergent (Supplementary Figure 1).
We further sought to determine if any additional protein kinases were encoded by P. vivax through a PSI-BLAST search (PlasmoDB’s Beta Blast interface – multiple parallel blast searches). We included the protein kinase domains of P. vivax (identified here) and those of P. falciparum 1 in the search query; however, no significant additional sequences were identified in this way. The typical protein kinase domains (78 in P. vivax [up from the 68 reported in the aforementioned preliminary study 16], 98 in P. falciparum) were initially aligned using Clustal Omega 21 and imported into Jalview 22 for manual alignment adjustments.
Hidden Markov Model (HMM) profiling for assignment of sequences to ePK families – To assist the development of a multiple sequence alignment (MSA) of the Plasmodium protein kinase domains, each sequence was assessed with HMMER 3.3 20 against the kinase family profiles defined in Kinomer V1.0 23. Kinomer contains a profile for the AGC, CAMK, CK1, CMGC, RGC, STE, TKL and TYR ePK groups, and also includes a profile for the apicomplexan-specific FIKK family 12,24, but does not have a profile for the NEK family (traditionally considered to belong to the OPK group [see above]). In view of the importance of NEKs in all eukaryotes 25, including malaria parasites 26, we designed a NEK profile with 21 known kinases from the NEK family 27 (see Methods for details). Using the amended Kinomer library (now containing a NEK profile), we performed a HMMER scan (hmmscan) of the kinomes and designated each sequence with the top hit based on score and E-value (see Supplementary data 1 for full hmmscan results). Each sequence was assessed to determine if it met the E-value thresholds for group assignment 23. These threshold values differ for each group according to the highest E-value obtained during each group's HMM profile construction: AGC (2.7e−7), CAMK (3.2e−14), CK1 (3.2e−5), CMGC (1.2e−7), RGC (4.8e−5), STE (1.4e−6), TK (1.1e−9), TKL (1.7e−12) (see 23 for details). To allow for conservative group assignment, we also considered assignments with a bit score <50 to be unreliable. This conservative value is double the default bit score threshold of the online HMMER tool (www.ebi.ac.uk/Tools/hmmer/). Phylogenetic trees with the H. sapiens, P. falciparum and P. vivax sequences for each of the ePK groups are available as Supplementary Figures 2-8.
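The assignment rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the input format (a list of `(group, e_value, bit_score)` tuples) and the function name `assign_group` are hypothetical, the top hit is taken here to be the hit with the highest bit score, and profiles for which no E-value threshold is listed in the text (FIKK, NEK) are assumed to be judged on the bit-score cut-off alone.

```python
# E-value thresholds per ePK group, copied from the text above.
EVALUE_THRESHOLDS = {
    "AGC": 2.7e-7, "CAMK": 3.2e-14, "CK1": 3.2e-5, "CMGC": 1.2e-7,
    "RGC": 4.8e-5, "STE": 1.4e-6, "TK": 1.1e-9, "TKL": 1.7e-12,
}
MIN_BIT_SCORE = 50.0  # hits scoring below this are treated as unreliable


def assign_group(hits):
    """Return the group of the best-scoring hit, or None if it is unreliable.

    `hits` is a hypothetical list of (group, e_value, bit_score) tuples,
    for example parsed from hmmscan tabular output for one sequence.
    """
    if not hits:
        return None
    # Assumption: "top hit" means the hit with the highest bit score.
    group, e_value, bit_score = max(hits, key=lambda hit: hit[2])
    if bit_score < MIN_BIT_SCORE:
        return None
    threshold = EVALUE_THRESHOLDS.get(group)
    # Assumption: groups without a listed threshold (FIKK, NEK) are
    # accepted on the bit-score cut-off alone.
    if threshold is not None and e_value > threshold:
        return None
    return group
```

A sequence whose best hit fails either test would be left unassigned by this sketch, which corresponds to the orphan category in the text.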
Multi-sequence alignment using the human kinome as a scaffold – The human kinome comprises approximately 478 typical protein kinases, which contain a total of 497 typical protein kinase domains (as some sequences have two kinase domains) 3. Of the 497 protein kinase domains known, over 270 have had their crystal structure solved. This is in stark contrast to the low number of solved protein kinase domain structures for P. falciparum and P. vivax (to date, 8 and 1 respectively). For P. falciparum these kinase domains are from PK5 28, PK7 29, PKG 30, CDPK3 31, CDPK4 (PDB: 4RGJ), MAPK2 (PDB: 3NIE), CK2 32 and CLK1 (PDB: 3LLT); for P. vivax only PKG 30. The large number of available kinase domain structures supported the construction of the human kinome MSA; such a structure-guided alignment was not possible for the P. falciparum and P. vivax kinomes. We therefore leveraged the homology between kinase families across species to aid in the MSA for both P. falciparum and P. vivax kinase domains (see Methods). The 17 conserved segments (230 amino acids) used for the alignment (see Methods) are depicted in Figure 1 as sequence logos for the Plasmodium kinases, to highlight the conserved motifs within the domain that are important for kinase function. Key motifs are (i) His-Arg-Asp (HRD), which is within the catalytic loop and stabilises the active site 33, (ii) Asp-Phe-Gly (DFG), which lies at the N-terminal end of the activation loop and mediates allosteric conformational changes to regulate activation/inactivation of the enzyme 34, and (iii) Ala-Pro-Glu (APE), which sits at the C-terminal end of the activation loop and stabilises the segment through docking to the domain's F-helix 35. The HRD, DFG and APE motifs are part of the conserved segments in the catalytic loop (CL), activation loop N-terminal (ALN) and activation loop C-terminal (ALC), respectively (see Figure 1).
To determine if the overall conserved segments of the kinase domain differ between Plasmodium and Homo sapiens, we generated sequence logos for the Homo sapiens sequences as well (Supplementary Figure 9). No large differences can be detected, but the consensus sequence is not as pervasive across the Plasmodium kinases, suggesting Plasmodium kinases can accommodate less stringent constraints at the primary structure level. However, this could be an artefact due to the kinomes of Plasmodium being smaller than that of humans, resulting in more divergent kinases making up a greater proportion of the kinome.
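As an illustration of the three motifs described above, the following Python sketch locates HRD, DFG and APE in a kinase domain sequence, requiring them to occur in that N- to C-terminal order. It is a simplification rather than the alignment method used here: exact string matching will miss divergent kinases that carry substitutions in these motifs, and both the function name `locate_core_motifs` and the toy sequence in the usage note are hypothetical.

```python
CORE_MOTIFS = ("HRD", "DFG", "APE")  # catalytic loop, ALN, ALC


def locate_core_motifs(sequence):
    """Return 0-based start positions of HRD, DFG and APE, in that order.

    Each motif is searched for only downstream of the previous one.
    Returns None if any motif cannot be found by exact matching.
    """
    positions = {}
    search_from = 0
    for motif in CORE_MOTIFS:
        index = sequence.find(motif, search_from)
        if index == -1:
            return None
        positions[motif] = index
        search_from = index + len(motif)
    return positions
```

For example, calling `locate_core_motifs("GKGSFGKVYRHRDLKPENILVKIADFGLARTPEYLAPEVL")` on this invented fragment returns a dictionary of start positions, whereas a sequence lacking one of the motifs returns `None`.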
Phylogenetic tree of the Homo sapiens, P. falciparum and P. vivax kinomes – A phylogenetic tree (Figure 2) was constructed from the 230-column alignment consisting of the 17 aforementioned conserved segments (see Methods), as the removal of insertions improves accuracy 36. To determine the phylogenetic relationship between the human kinome and those of P. falciparum and P. vivax, we included a total of 671 kinase domain sequences, covering all typical protein kinase sequences from these three organisms. The phylogenetic relationships were determined using the RAxML GUI 2.0 37,38. The definition of the boundaries for each PK group was guided by the HMM profiles, the tree structure and the defined family assignment reported for each of the human kinases 3. Six protein kinases previously flagged as orphans could now be confidently assigned to one of these ten groups (see Table 1 for changes in kinase group assignment). Figure 3 depicts the number of kinases in each family per organism as a percentage of their kinome. This confirms previous reports that there are no Plasmodium kinases in either the TK or RGC groups 12,15. However, we determined that P. falciparum and P. vivax both have a single kinase belonging to the STE family (previously only reported in P. vivax 39). In addition to the STEs, a clear reduction in the AGC family (in comparison to the human kinome) can be observed as well. Orphan kinases make up a much larger percentage of the kinomes of both Plasmodium species (as compared to the human kinome), which presumably reflects the fact that the ePK groups were historically defined using Opisthokont organisms (metazoans and yeasts). The FIKK family, as previously reported, remains a prominent feature of the P. falciparum kinome, with a single member present in P. vivax.
Plasmodium kinases with close homology to human kinases – Each ePK group (including FIKK and orphan kinases) was assessed to determine whether any Plasmodium kinases showed particularly strong bootstrap support for homology with human protein kinases. Surprisingly (in view of the divergent evolutionary paths of Alveolates and Metazoans), a number of noteworthy homologies were identified across most groups (Supplementary data 2). The Plasmodium kinases with bootstrap support greater than 50 to one or more human kinases are listed in Table 2, along with their human homolog(s). Table 2 shows that of the 16 Plasmodium kinases with bootstrap support above 50 to a human homolog(s), 10 belong to the CMGC group. Further, all Plasmodium kinases with bootstrap support values over 75 to a human homolog belong to the CMGC group. Most notable are the Plasmodium kinases cyclin-dependent-like kinase 3 (CLK3), serine/arginine protein kinase 1 (SRPK1), casein kinase 2 alpha subunit (CK2α), mitogen-activated protein kinase 1 (MAPK1) and glycogen synthase kinase 3 (GSK3) (Figure 4; a tree with the entire CMGC family, along with the CMGC kinases of P. vivax, is available in Supplementary Figure 2).
CLK3 (PF3D7_1114700) belongs to the CLK family of protein kinases, which, in mammalian cells, facilitate phosphorylation of splicing factors 40. In Plasmodium, PfCLK3 is essential during asexual blood stage development 41. PfCLK3 has previously been assigned to the PRP4 subfamily of dual-specificity tyrosine-regulated kinases (DYRK) 39. Our phylogenetic analysis confirms this finding, and further reports the striking homology between PfCLK3 and HsPRPF4B (bootstrap support = 100). SRPK1 (PF3D7_0302100) was initially considered to belong to the CLK family; however, it was reclassified as an SRPK following functional analysis 42. SRPKs are closely related to the CLKs and, in mammalian cells, have a number of complex functions including mRNA processing and nuclear import (reviewed in 43). In Plasmodium, PfSRPK1 (previously known as PfCLK4) is essential during asexual blood stage development 41. Here we can confirm the homology of PfSRPK1, as the kinase clusters closely with the human SRPK1-3 (bootstrap support = 99). Interestingly, PfSRPK2 branches away before the SRPK and CLK division in the tree and is likely to have evolved from the precursor gene that gave rise to SRPKs and CLKs in the Opisthokont lineage. CK2α (PF3D7_1108400) and GSK3 (PF3D7_0312400) are present in all examined apicomplexan species. In Plasmodium, PfCK2α and PfGSK3 are essential for blood stage development 41. PfCK2α has distinct homology to human CSNK2A1-3 (bootstrap support = 99) and PfGSK3 shows strong homology to human GSK3α/β. Interestingly, Plasmodium PK6, PK1, PF3D7_1316000 and GSK3 cluster in the same branch as human GSK3α/β, which could suggest that these kinases derive from a common gene.
The mitogen-activated protein kinase (MAPK) family forms two relatively tight clusters of protein kinases within the CMGC group (Figure 2). MAPKs typically function as part of a three-tiered MAPK cascade, where a MAP3K phosphorylates a MAP2K, which in turn phosphorylates a MAPK. PfMAPK1 (Pfmap-1, PF3D7_1431500) clusters closely with human MAPK15 (ERK7, bootstrap support = 97), forming a clade that branches away from the majority of the MAPKs early in the tree (Figures 2 and 4). In humans, MAPK15 (ERK7) is an atypical MAPK that is activated by auto-phosphorylation rather than in the context of a classical 3-tier pathway 44. In P. falciparum, MAPK1 has been shown to be dispensable during erythrocytic development and for sporogony in the mosquito 41,45. Curiously, however, the other MAPK encoded by P. falciparum, MAPK2 (Pfmap-2, PF3D7_1113900), has been demonstrated to be elevated in MAPK1 knockouts, suggesting the parasite is able to adaptively compensate for reduced MAPK1 levels 45. PfMAPK2 clusters within one of the primary branches of the human MAPK family (MAPK1/3/4/6/7 and NLK) (bootstrap support = 86). Further, within this family PfMAPK2 clusters closest to human MAPK1/3/4 and 6 (bootstrap support = 35, Figure 4). Although PfMAPK2 has clear homology to the above-mentioned family of MAPKs, Plasmodium does not possess a MAP2K orthologue to phosphorylate and activate it. In fact, P. falciparum and P. vivax each encode only a single member of the STE group (containing the MAP2Ks), which does not cluster closely with any specific kinase (Supplementary Figure 3). Whether these enzymes function in pathways that implicate the MAPKs remains to be determined.
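The bootstrap cut-offs used in this section (support greater than 50 for inclusion in Table 2, with the subset above 75 highlighted) can be illustrated with the support values quoted in the text. The sketch below is only an illustration: the dictionary holds the six values stated above rather than the full content of Table 2, and the pair labels and the helper name `pairs_above` are hypothetical.

```python
# Bootstrap support values quoted in the text for Plasmodium/human groupings.
BOOTSTRAP_SUPPORT = {
    "PfCLK3 / HsPRPF4B": 100,
    "PfSRPK1 / HsSRPK1-3": 99,
    "PfCK2alpha / HsCSNK2A1-3": 99,
    "PfMAPK1 / HsMAPK15": 97,
    "PfMAPK2 / HsMAPK1/3/4/6/7 and NLK": 86,
    "PfMAPK2 / HsMAPK1/3/4/6": 35,
}


def pairs_above(support, cutoff):
    """Return the groupings whose bootstrap support exceeds the cut-off."""
    return sorted(name for name, value in support.items() if value > cutoff)


listed_in_table = pairs_above(BOOTSTRAP_SUPPORT, 50)   # support > 50
strongly_supported = pairs_above(BOOTSTRAP_SUPPORT, 75)  # support > 75
```

With these values, the PfMAPK2 grouping with human MAPK1/3/4/6 (support of 35) falls below both cut-offs, while its placement within the wider MAPK1/3/4/6/7 and NLK branch passes both.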
Comparison of the P. falciparum and P. vivax kinomes – To directly compare the kinomes of P. falciparum and P. vivax, while preserving the phylogenetic tree structure, all H. sapiens branches were removed from the tree (Figure 5). As alluded to above, the kinomes of P. vivax and P. falciparum, despite their evolutionary distance, are very similar, with almost all kinases having a clear orthologue in the other species. There are only three distinct cases where no orthologue was observed (red arrows in Figure 5). First, as previously reported 12, there is only one FIKK encoded by the P. vivax genome, versus a paralogous group of 21 sequences in P. falciparum. Second, PVP01_0118800, which belongs to the TKL family, does not have an orthologue in P. falciparum. Third, PfCDPK2, from the CAMK family, does not have an orthologue in P. vivax.
To understand which clades within the Plasmodium genus possess an orthologue of CDPK2, we aligned the kinase domains of all known CDPKs encoded by six distinct species: P. falciparum, P. vivax, Plasmodium knowlesi, Plasmodium berghei, Plasmodium gaboni and Plasmodium gallinaceum (see Methods). These sequences were assessed using RAxML 37,38 and a gene tree was inferred from the results 46. From the gene tree it is clear that CDPK2 is significantly different from its next closest homologue in the Plasmodium genus (CDPK3) (Supplementary Figure 10). To more extensively assess which species in the genus Plasmodium encode an orthologue of CDPK2, we completed a BLASTP search using the kinase domain of PfCDPK2. We compared the species that contained an orthologue of CDPK2 to the mitochondrial genome phylogeny for the Plasmodium spp. on PlasmoDB 17 (see Methods for details). This showed that the bird-infecting Plasmodium gallinaceum and Plasmodium relictum, as well as species in the Laverania lineage (Plasmodium gaboni, Plasmodium reichenowi and P. falciparum), all contain an orthologue of CDPK2, while species from the murine parasite clades and the other (non-Laveranian) primate-infecting parasite lineages do not (Figure 6). This is consistent with a whole-genome-based phylogeny suggesting that the Laverania were founded by a single Plasmodium species switching from birds to African great apes (or vice versa, see below), and suggests that CDPK2 has been lost in all other Plasmodium clades, or gained after the split between the clades 47.
Regulatory subunits of kinases – Protein kinase regulatory subunits do not themselves have protein kinase activity, but are essential in the regulation of a select few protein kinases. Casein kinase 2 (CK2), which belongs to the CMGC group, forms a homo- or hetero-tetramer structure comprised of two regulatory subunits and two catalytic subunits 48. P. falciparum encodes a single CK2 catalytic subunit (PF3D7_1108400) and two different regulatory subunits (PF3D7_1103700 and PF3D7_1342400) 32. BLASTP searches confirmed that P. vivax encodes orthologues of each of these subunits (CK2 catalytic subunit: PVP01_0909200; regulatory subunits: PVP01_0904500, PVP01_1212400) and no others. Protein Kinase A (PKA) belongs to the AGC ePK group and, similar to CK2, the human holoenzyme is structured as a tetramer of two catalytic subunits and two regulatory subunits; cAMP binding to the regulatory subunits results in the release of the active catalytic subunits 49. P. falciparum has previously been reported to encode a single PKA regulatory subunit (PF3D7_1223100) and a single catalytic subunit (PF3D7_0934800) 50, and the same is true for P. vivax (regulatory subunit: PVP01_0733500; catalytic subunit: PVP01_0733500). Lastly, cyclins are a diverse family of proteins which contain a conserved 5-helix bundled region known as the cyclin box, which enables binding to cyclin-dependent kinases (CDKs), stimulating their activity and hence playing a major role in cell cycle control 51. P. falciparum encodes 3 readily identifiable cyclins, CYC1 (PF3D7_1463700), CYC3 (PF3D7_0518400) and CYC4 (PF3D7_1304700), as well as CYC2 (PF3D7_1227500), which appears to be more distantly related 52,53. We completed an HMM search using both the Pfam profiles PF08613 (Cyclin) and PF00134 (Cyclin_N), which revealed the presence of a fifth, previously unreported, putative cyclin in P. falciparum (PF3D7_0605500). Additionally, the same search identified four possible cyclins in P. vivax (PVP01_1015500, PVP01_1243100, PVP01_1143400 and PVP01_1405600). Phylogenetic analysis of the cyclin box (a highly conserved sequence among cyclins) showed that PVP01_1243100, PVP01_1015500 and PVP01_1405600 are orthologous to PfCYC1, PfCYC3 and PfCYC4, respectively. PVP01_1143400 and PF3D7_0605500 had not been identified previously and appear to be distantly related to human Cyclin A (Supplementary Figure 11). Further refined phylogenetic analysis and functional validation of these putative cyclins are warranted.
In this study we generated a complete comparison of the human kinome to those of P. falciparum and P. vivax, enabling the first comprehensive assessment of homologous protein kinases between an Apicomplexan parasite and its primary host. The striking kinome conservation observed between the evolutionarily distant species P. vivax and P. falciparum, together with previous studies of P. berghei 54 and earlier kinome assemblies of P. falciparum, confirms that there is clear pressure for Plasmodium spp. to maintain the overwhelming majority of their kinomes 12,39. Though there are examples of clade-specific gene loss, such as CDPK2 and PVP01_0118800 reported here, the vast majority of kinases remain highly conserved. In P. falciparum, PfCDPK2 is critical for male gametogenesis 55; therefore, it is likely that this function is fulfilled by another CDPK in the species where it is absent. The homology observed between CDPK2 and CDPK3 suggests that CDPK3 may fulfil this function, although this remains to be demonstrated (Supplementary Figure 10). PVP01_0118800 belongs to the TKL group and our kinome comparison indicates that it is most closely related to TKL4 of both P. falciparum and P. vivax, with a strong bootstrap support of 67 (Figures 2 and 5). A BLASTP search using the kinase domain of PVP01_0118800 as a query indicated that an orthologue exists in all Plasmodium species with annotated genomes, except for species within the Laverania clade; this supports the hypothesis that the Laverania lineage results from a transfer from the bird-infecting parasites to the great apes, rather than the reverse 47, and that TKL6 was lost after passage of the Laverania founding species from birds to great apes (Figure 6).