Analysis of the 64 previously identified IPK sequences
As a first step, we analyzed all 64 IPK sequences identified by Dellas et al. [10]. Using PfamScan, which searches the sequences in a FASTA file against a library of Pfam HMMs [8], we identified the domains contained in each sequence. As expected, all plant IPK peptide sequences shared an AAK domain (HMM accession number: PF00696.23) [10], with an average length of 241 amino acid residues. The shortest AAK domain, 163 residues long, was found in the sequence from Montastraea faveolata. All sequences shared a conserved histidine residue that has been shown to be an essential catalytic active-site residue [17]. Although IPK sequences are divergent across the tree of life, the 64 plant IPK sequences identified by Dellas et al. have been shown to form a monophyletic clade in the phylogenetic tree, indicating that plant IPKs share a single ancestral IPK sequence.
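A minimal sketch of how the AAK domain-length summary could be computed from PfamScan results is shown below. It assumes the default whitespace-delimited pfam_scan.pl output (comment lines starting with "#", envelope start and end in columns 4 and 5, HMM accession in column 6) and measures each domain by its inclusive envelope coordinates; the input rows are hypothetical and are not the data analyzed here.

```python
from statistics import mean

AAK_ACC = "PF00696"  # Pfam AAK family accession; the version suffix (e.g. ".23") is ignored


def aak_domain_lengths(pfamscan_text):
    """Return {sequence id: AAK domain length} parsed from pfam_scan.pl output.

    Assumes the default pfam_scan.pl column order
    (seq_id, aln_start, aln_end, env_start, env_end, hmm_acc, hmm_name, ...)
    and measures each domain by its inclusive envelope coordinates.
    """
    lengths = {}
    for line in pfamscan_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.split()
        seq_id, env_start, env_end, acc = cols[0], int(cols[3]), int(cols[4]), cols[5]
        if acc.split(".")[0] == AAK_ACC:
            # sum in case a sequence carries more than one AAK hit
            lengths[seq_id] = lengths.get(seq_id, 0) + env_end - env_start + 1
    return lengths


# Hypothetical rows for illustration only; trailing pfam_scan.pl columns omitted.
example = """# seq_id aln_start aln_end env_start env_end hmm_acc hmm_name
seqA 10 250 5 252 PF00696.23 AA_kinase
seqB 20 180 18 180 PF00696.23 AA_kinase
seqB 200 260 195 262 PF99999.1 OtherDomain
"""

lengths = aak_domain_lengths(example)
print(lengths)                        # {'seqA': 248, 'seqB': 163}
print(mean(lengths.values()))         # 205.5
print(min(lengths, key=lengths.get))  # seqB
```

Using envelope rather than alignment coordinates is a design choice; either could be justified, and the reported averages would shift slightly depending on which is used.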
Identification of IPK genes from sequenced plant genomes
We identified plant IPK sequences in two steps. First, we comprehensively identified AAK family members in sequenced plant genomes; we then extracted IPK genes by further identifying and grouping homologs. For the first step, we combined BLAST and HMMER searches (Materials and Methods). In total, we identified 493 AAK family members from 35 plants with readily available genome sequence data, plus 4 from S. cerevisiae. In the second step, we used OrthoMCL to identify and group orthologs among the AAK family members. Group 6 contained 37 sequences, including the Arabidopsis IPK gene (locus ID: AT1G26640.1) (Table 1). We re-checked these 37 sequences and confirmed that all of them contain an AAK domain.
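The two-step procedure above can be sketched as follows: pool the hits from BLAST and HMMER, then locate the OrthoMCL group that contains the Arabidopsis IPK. This is an illustrative sketch, not the pipeline described in Materials and Methods. The E-value cutoff of 1e-5, the input rows, and the group and taxon labels are all hypothetical, and the parsers assume BLAST tabular output (-outfmt 6, E-value in column 11), hmmsearch --tblout output (full-sequence E-value in column 5), and OrthoMCL group lines of the form "group: taxon|id taxon|id ...".

```python
def blast_hits(outfmt6_text, max_evalue=1e-5):
    """Subject IDs from BLAST tabular output (-outfmt 6); the E-value is column 11."""
    hits = set()
    for line in outfmt6_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()  # default -outfmt 6 fields contain no spaces
        if float(cols[10]) <= max_evalue:
            hits.add(cols[1])  # sseqid
    return hits


def hmmer_hits(tblout_text, max_evalue=1e-5):
    """Target names from hmmsearch --tblout output; full-sequence E-value is column 5."""
    hits = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        if float(cols[4]) <= max_evalue:
            hits.add(cols[0])  # target name
    return hits


def parse_orthomcl_groups(groups_text):
    """Parse lines of the form 'group: taxon|id taxon|id ...' into {group: [members]}."""
    groups = {}
    for line in groups_text.splitlines():
        if ":" in line:
            name, members = line.split(":", 1)
            groups[name.strip()] = members.split()
    return groups


def group_containing(groups, seq_id):
    """Return the name of the group that contains seq_id, ignoring the taxon prefix."""
    for name, members in groups.items():
        if any(member.split("|", 1)[-1] == seq_id for member in members):
            return name
    return None


# Hypothetical inputs for illustration only.
blast_out = """IPK_query AT1G26640.1 100.0 300 0 0 1 300 1 300 1e-150 600
IPK_query AT5G13280.1 30.0 250 150 5 10 250 20 260 1e-10 60
IPK_query seqX 25.0 50 30 2 1 50 1 50 0.5 20
"""
hmmer_out = """# target acc query acc E-value score bias
AT1G26640.1 - AA_kinase PF00696.23 2.1e-50 170.3 0.1
seqY - AA_kinase PF00696.23 3.0e-08 35.2 0.0
"""

# Step 1: pool candidate AAK family members found by either search.
candidates = blast_hits(blast_out) | hmmer_hits(hmmer_out)
print(sorted(candidates))  # ['AT1G26640.1', 'AT5G13280.1', 'seqY']

# Step 2: find the ortholog group that contains the Arabidopsis IPK.
groups = parse_orthomcl_groups("""grp1: ath|AT5G13280.1 osa|seqZ
grp6: ath|AT1G26640.1 smo|seqS1 smo|seqS2
""")
print(group_containing(groups, "AT1G26640.1"))  # grp6
```

Taking the union of both searches favors sensitivity; the later domain re-check is what guards against false positives that only one method reports.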
IPK peptide sequences contain a key signature histidine residue [4], [18]. We aligned the 37 peptide sequences together with two known IPK sequences, one each from Roseiflexus castenholzii and Chloroflexus aggregans; all 37 sequences contain this key His residue (Supp Fig 2). The inferred phylogeny for this group largely agrees with the phylogenetic relationships of the species from which the sequences were derived (Fig. 2).
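Checking that every sequence carries the signature His amounts to mapping the reference residue to an alignment column and reading that column in each aligned sequence. The toy alignment and the reference His position below are hypothetical; in practice the position would come from a characterized IPK such as those from R. castenholzii or C. aggregans.

```python
def parse_fasta(text):
    """Return {id: sequence} from FASTA text; the id is the first word of the header."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line  # concatenate wrapped sequence lines
    return seqs


def alignment_column(aligned_seq, residue_pos):
    """0-based alignment column holding the residue at 1-based ungapped position residue_pos."""
    count = 0
    for col, ch in enumerate(aligned_seq):
        if ch not in "-.":  # treat '-' and '.' as gaps
            count += 1
            if count == residue_pos:
                return col
    raise ValueError("residue position exceeds sequence length")


def conserved_residue_check(alignment, ref_id, ref_pos, residue="H"):
    """For each aligned sequence, report whether it has `residue` in the column
    occupied by residue ref_pos of the reference sequence."""
    col = alignment_column(alignment[ref_id], ref_pos)
    if alignment[ref_id][col] != residue:
        raise ValueError(f"reference residue {ref_pos} is not {residue}")
    return {sid: seq[col] == residue for sid, seq in alignment.items()}


# Toy alignment (hypothetical); the reference His is residue 4 of the ungapped reference.
toy = """>ref_IPK
MK-AHGS
>plant1
MKDAHGS
>plant2
MK-ANGS
"""
result = conserved_residue_check(parse_fasta(toy), "ref_IPK", 4)
print(result)  # {'ref_IPK': True, 'plant1': True, 'plant2': False}
```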
Interestingly, each of the 35 plant genomes we examined contains a single copy of the putative IPK gene, except S. moellendorffii and N. tabacum, which each contain two; this accounts for the 37 sequences in Group 6. This suggests that IPK plays an important role throughout the plant tree of life.
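The copy-number observation can be read directly off the OrthoMCL group by counting members per taxon. The sketch assumes OrthoMCL-style "taxon|id" member labels; the taxon abbreviations and sequence IDs are hypothetical placeholders, not the actual Group 6 membership.

```python
from collections import Counter


def copies_per_genome(members):
    """Count group members per taxon, assuming OrthoMCL-style 'taxon|id' labels."""
    return Counter(member.split("|", 1)[0] for member in members)


# Hypothetical group membership: one copy per genome except two taxa with two copies.
group = ["ath|AT1G26640.1", "osa|seqO1", "smo|seqS1", "smo|seqS2",
         "nta|seqN1", "nta|seqN2"]
counts = copies_per_genome(group)
multi_copy = sorted(taxon for taxon, n in counts.items() if n > 1)
print(multi_copy)            # ['nta', 'smo']
print(max(counts.values()))  # 2
# Genomes plus extra copies equals group size: 4 genomes + 2 extra copies = 6 members
print(len(counts) + sum(n - 1 for n in counts.values()) == len(group))  # True
```

The same bookkeeping applied to the real group gives 35 genomes plus 2 extra copies, i.e. 37 members.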
Phylogeny of AAK family members
In Arabidopsis, we identified 13 AAK family members, several of which have been functionally characterized. AT5G13280, AT3G02020 and AT5G14060 encode aspartate kinases [19], [20], [21]. AT1G31230 and AT4G19710 are bifunctional, with both aspartate kinase and homoserine dehydrogenase activities [19], [22]. AT2G39800 and AT3G55610 encode delta 1-pyrroline-5-carboxylate synthases [23], [24]. AT3G57560 encodes N-acetyl-L-glutamate kinase [25]. S. cerevisiae has 4 AAK family members, 3 of which have been functionally and/or structurally characterized: YER052C encodes aspartate kinase, YDR300C encodes an enzyme with gamma-glutamyl kinase activity, and YER069W has acetylglutamate kinase and N-acetyl-gamma-glutamyl-phosphate reductase activities [26], [27].
We inferred a phylogeny of these sequences together with the two known IPK sequences from Roseiflexus castenholzii and Chloroflexus aggregans. AT1G26640 formed a distinct clade with the two known IPK genes (Fig. 3). No S. cerevisiae gene fell into the IPK clade, indicating that S. cerevisiae has no IPK gene. Functionally identical or similar genes were grouped together; for example, YER052C, AT1G31230 and AT4G19710, which all encode aspartate kinases, clustered together.
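Whether a set of genes forms its own clade, and whether any yeast gene falls inside the IPK clade, can be checked on a rooted tree by comparing clade leaf sets. The sketch below uses a minimal hand-written Newick parser (no quoted labels or comments) and a toy topology with hypothetical labels Rcas_IPK and Cagg_IPK; it is not the tree in Fig. 3, and monophyly is only meaningful under the assumed rooting.

```python
def parse_newick(text):
    """Parse a simple rooted Newick string into nested (label, children) tuples.

    Branch lengths are read and discarded; quoted labels, comments and
    whitespace inside labels are not supported.
    """
    text = "".join(text.split()).rstrip(";")  # drop whitespace and the final ';'
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                  # consume ','
                children.append(parse_node())
            pos += 1                      # consume ')'
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1                      # label and optional ':length'
        label = text[start:pos].split(":")[0]
        return (label, children)

    return parse_node()


def clade_leaf_sets(node, out):
    """Append the leaf set of every clade under `node` to `out`; return node's set."""
    label, children = node
    if children:
        leaves = frozenset().union(*(clade_leaf_sets(c, out) for c in children))
    else:
        leaves = frozenset([label])
    out.append(leaves)
    return leaves


def smallest_clade(tree, taxa):
    """Leaf set of the smallest clade (most recent common ancestor) containing taxa."""
    clades = []
    clade_leaf_sets(tree, clades)
    return min((c for c in clades if set(taxa) <= c), key=len)


def is_monophyletic(tree, taxa):
    """True if taxa are exactly the leaves of one clade in the rooted tree."""
    return smallest_clade(tree, taxa) == frozenset(taxa)


tree = parse_newick("((AT1G26640,(Rcas_IPK,Cagg_IPK)),(YER052C,(AT1G31230,AT4G19710)));")
ipk = {"AT1G26640", "Rcas_IPK", "Cagg_IPK"}
print(is_monophyletic(tree, ipk))                          # True
print(smallest_clade(tree, ipk).isdisjoint({"YER052C"}))   # True: no yeast gene in the IPK clade
print(is_monophyletic(tree, {"AT1G26640", "YER052C"}))     # False
```

For real trees, a dedicated phylogenetics library would handle quoted labels, support values and rerooting more robustly than this parser.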
Based on sequence features and phylogenetic topology, we conclude that the 37 genes from the 35 plant genomes are putative IPK genes. In summary, all of these genes contain an AAK domain and the signature His residue, and phylogenetic inference placed the Arabidopsis gene AT1G26640 together with known IPK genes. From an evolutionary perspective, plant IPKs have distinctive characteristics. Unlike in other domains of life, IPKs are ubiquitous in plants, from green algae to higher plants. IPK copy number does not exceed two, at least in the 35 plant genomes we examined. The phylogenetic topology of IPK genes and other AAK family members suggests that plant IPKs share a single origin, which may date back to the emergence of green algae. These results suggest that IPKs may play important roles in plants; further physiological and enzymatic characterization should provide more insight into these roles.
Previous work showed that IPK is a key step in an alternative route of the mevalonate (MVA) pathway in different domains of life, such as archaea [4], [5], [7], [10]. Here we show that IPK homologs are present in all 35 plant genomes we examined. Given the important roles of isoprenoids in plant development and physiology, the IPKs identified here may shed light on the mevalonate pathway of plant isoprenoid biosynthesis. Our results also indicate that gaps remain to be filled in the characterization of the plant isoprenoid biosynthesis network.