Isopentenyl Phosphate Kinases are Ubiquitous and Copy Numbers are Conserved in Plant Genomes


Key message: Isopentenyl phosphate kinase (IPK) is a key enzyme in the mevalonate pathway of isoprenoid biosynthesis. We analyzed 37 putative IPK sequences from 35 plants. Plant IPKs follow a specific evolutionary pattern and can serve as a new target for studying plant isoprenoid metabolism.

Abstract: Isopentenyl phosphate kinase (IPK) is a recently discovered enzyme that plays a key role in the mevalonate pathway of isoprenoid biosynthesis. Here, we show that IPKs are ubiquitously present in plant genomes. All previously identified IPKs have an AAK domain. From 35 plant species with genome assembly data available, we extracted all AAK family members. Using OrthoMCL, we identified a group of 37 sequences that includes the Arabidopsis IPK. Further analysis showed that each peptide sequence in this group has a His residue that is a signature of the IPK enzyme, indicating that the genes in this group are IPKs. Unlike IPKs in other domains of life, which show a spotty distribution over the tree of life, virtually all plant genomes we analyzed have IPK genes. Furthermore, IPK copy numbers are highly conserved: no more than 2 copies are retained in any plant genome. Plant IPKs form a distinct clade in the phylogenetic tree of the plant AAK gene family, with a topology that conforms to that of the plant species. The IPKs identified here provide new molecular targets for characterizing the plant mevalonate pathway and shed light on the biochemistry of plant isoprenoid biosynthesis.


Introduction
Isoprenoids constitute a large group of biologically active metabolites in living cells. Cells invariably require two five-carbon (C5) building blocks, isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP), as isoprenoid precursors. Two independent pathways synthesize these compounds: the mevalonate (MVA) pathway and the methylerythritol phosphate (MEP) pathway. Both are present and functional in plants, unlike most other organisms, which use only one pathway or the other [1], [2], [3].
Plants are also unique in having an alternative route to the classical MVA pathway. Following mevalonate 5-phosphate (MVAP) biosynthesis, the classical MVA pathway uses phosphomevalonate kinase (PMK) to phosphorylate MVAP to diphosphomevalonate (MVAPP), and then uses mevalonate-5-diphosphate decarboxylase (MDD) to decarboxylate MVAPP and produce IPP, a C5 isoprenoid building block [4], [5]. Genes encoding PMK or MDD could not be identified in archaea, which was intriguing because isoprenoids are essential components of archaeal membrane lipids [6]. The identification of isopentenyl phosphate kinase (IPK) in Methanocaldococcus jannaschii led to the proposal of an alternative MVA pathway in archaea [4], [7]. In this pathway, MVAP is first decarboxylated by an as-yet unidentified enzyme, and the product is then phosphorylated by IPK to yield IPP [2], [8]. IPK was originally thought to be restricted to archaea, because PMK and MDD were missing only in archaea, where the other components of the MVA pathway were present, and because virtually no IPK was identified by homology-based approaches [4], [9]. Using more sensitive search approaches, Dellas et al. identified IPK homologs in bacteria and eukarya in addition to archaea, and these were confirmed to be enzymatically functional [5], [10].
Interestingly, despite a spotty phylogenetic distribution of homologs in fungi and animals, IPK sequences were identified in all 15 plants surveyed by Dellas et al. [10].
Here, through comprehensive identification of AAK family members followed by OrthoMCL grouping, we identified 37 putative IPK sequences from 35 plants with genome assembly data available. Through sequence and phylogenetic analysis, we discuss their evolutionary and physiological implications.

Materials And Methods
Identification of AAK family members
Thirty-seven plants with publicly available genome sequences, representing key stages in plant evolutionary history, were chosen for IPK sequence mining (Fig. 1). The 15 plant IPKs identified by Dellas et al., together with the Arabidopsis IPK, were used as queries for BLAST searches, which were performed in standalone mode using BLASTp from the BLAST+ tools (v2.2.30) (http://www.ncbi.nlm.nih.gov) with an E-value cutoff of 1e-2 [11]. In parallel, hidden Markov model searches were carried out with HMMER v3.1 against the AAK domain (accession number PF00696.23 in the Pfam-A database) [12], [13]. The two sets of search results were combined and screened further with pfam_scan.pl (ftp://sanger.ac.uk/pub/databases/Pfam) (e_seq: 1e-3, e_dom: 1e-6) [13]. Duplicated sequences and putative alternative splicing variants were removed, yielding a dataset of plant AAK family genes.
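The step of combining the two search results can be sketched as follows. This is a minimal illustration and not the script used in this study: it assumes that BLASTp was run with tabular output (-outfmt 6) and HMMER with --tblout, and the function names and the toy input lines are hypothetical.

```python
def blast_hits(lines, max_evalue=1e-2):
    """Collect subject IDs from BLAST tabular (outfmt 6) lines at or below an E-value cutoff."""
    hits = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # outfmt 6 columns: qseqid, sseqid, ..., evalue (11th), bitscore (12th)
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.add(subject_id)
    return hits


def hmmer_hits(lines):
    """Collect target names (first column) from HMMER --tblout lines."""
    hits = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hits.add(line.split()[0])
    return hits


def merge_candidates(blast_lines, hmmer_lines):
    """Union of both hit sets; using a set removes IDs found by both searches."""
    return sorted(blast_hits(blast_lines) | hmmer_hits(hmmer_lines))


# Toy input (invented values, for illustration only).
toy_blast = [
    "query1\tseqA\t45.0\t200\t100\t5\t1\t200\t1\t200\t1e-30\t150",
    "query1\tseqB\t25.0\t80\t55\t3\t1\t80\t1\t80\t0.5\t30",
]
toy_hmmer = [
    "# target name  accession  query name  accession  E-value ...",
    "seqA - AA_kinase PF00696.23 1e-40 140.0 0.1",
    "seqC - AA_kinase PF00696.23 2e-20 70.0 0.0",
]
candidates = merge_candidates(toy_blast, toy_hmmer)
print(candidates)
```

The merged list would then be passed to the domain screen (pfam_scan.pl in the text), which is not reproduced here.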

Results And Discussion
Analysis of 64 IPK sequences previously identified
As a first step, we analyzed all 64 IPK sequences identified by Dellas et al. [10]. Using Pfam Scan, which searches a FASTA file against a library of Pfam HMMs, we identified the domains contained in the sequences [8]. As expected, all plant IPK peptide sequences shared an AAK domain (HMM accession number: PF00696.23) [10], whose length was 241 amino acid residues on average.
The shortest AAK domain, 163 amino acid residues, was found in the sequence from Montastraea faveolata. All sequences shared a conserved histidine residue, which has been shown to be a catalytically essential active-site residue [17]. Although IPK genes are divergent in sequence across the tree of life, the plant IPK sequences identified by Dellas et al. formed a monophyletic clade in the phylogenetic tree, indicating that plant IPKs share a single ancestral IPK sequence.
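A domain-length summary of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: it assumes the default whitespace-separated pfam_scan.pl output (sequence ID, alignment start and end, envelope start and end, HMM accession, and further columns), it measures length from the envelope coordinates, which may differ from how the lengths reported here were measured, and all input values are invented.

```python
from statistics import mean


def aak_domain_lengths(lines, accession_prefix="PF00696"):
    """Map sequence ID -> AAK domain length, taken from the envelope coordinates."""
    lengths = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        seq_id = fields[0]
        env_start, env_end = int(fields[3]), int(fields[4])
        hmm_acc = fields[5]
        if hmm_acc.startswith(accession_prefix):
            length = env_end - env_start + 1
            # If a sequence has several AAK hits, keep the longest one.
            lengths[seq_id] = max(length, lengths.get(seq_id, 0))
    return lengths


# Toy pfam_scan-style lines (invented values; PF99999.1 is a made-up accession).
toy_scan = [
    "# <seq id> <aln start> <aln end> <env start> <env end> <hmm acc> ...",
    "toy_seq_1 12 250 10 252 PF00696.23 AA_kinase Family 1 200 200 180.5 1.2e-53 1 No_clan",
    "toy_seq_2 5 160 3 165 PF00696.23 AA_kinase Family 1 200 200 90.0 1.0e-25 1 No_clan",
    "toy_seq_2 300 380 298 382 PF99999.1 Made_up Domain 1 90 90 40.0 1.0e-10 1 No_clan",
]
lengths = aak_domain_lengths(toy_scan)
average_length = mean(lengths.values())
shortest_id = min(lengths, key=lengths.get)
print(lengths, average_length, shortest_id)
```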

Identification of IPK genes from sequenced plant genomes
We identified plant IPK sequences in two steps. First, we comprehensively identified AAK family members in sequenced plant genomes; second, we extracted IPK genes through further identification and grouping of homologs. For the first step, we combined BLAST and HMMER searches (Materials and Methods). In total, 493 AAK family members were identified from 35 plants with readily available genome sequence data, plus 4 from S. cerevisiae. In the second step, we used OrthoMCL to identify and group orthologs among the AAK family members. Group 6 contained 37 sequences, including the Arabidopsis IPK gene (locus ID: AT1G26640.1) (Table 1). We back-checked these 37 sequences and confirmed that all of them had AAK domains.
IPK peptide sequences contain a key signature histidine residue (Supp Fig 2) [4], [18]. We aligned the 37 peptide sequences together with two known IPK sequences, from Roseiflexus castenholzii and Chloroflexus aggregans, and found that all 37 peptide sequences have this key His residue (Supp Fig 2). The topology of the phylogeny inferred for this group largely conformed to the phylogenetic relationships of the species in which the sequences were found (Fig. 2).
Interestingly, each of the 35 plant genomes we checked has one copy of the putative IPK gene, except for S. moellendorffii and N. tabacum, which contain two copies each. This suggests an important role for the IPK gene throughout the plant tree of life.
We performed phylogenetic inference for these sequences, together with the two IPK sequences from Roseiflexus castenholzii and Chloroflexus aggregans. AT1G26640 formed a distinct clade together with the two known IPK genes (Fig. 3). No S. cerevisiae gene fell into the IPK clade, indicating that no IPK gene is present in S. cerevisiae. Functionally identical or similar genes grouped together, such as YER052C, AT1G31230 and AT4G19710, which encode aspartate kinases.
Based on the sequence properties and the topology of the phylogeny, we concluded that the 37 genes from 35 plant genomes are putative IPK genes. In summary, the genes have the His residue signature and an AAK domain, and phylogenetic inference grouped the Arabidopsis gene AT1G26640 with known IPK genes. From an evolutionary perspective, plant IPKs have distinctive characteristics. Unlike IPKs in other domains of life, plant IPKs are ubiquitously present, from green algae to higher plants.
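Checking an alignment for a residue conserved in every sequence can be sketched as below. This is a minimal example, not the procedure used for Supp Fig 2: the function name is hypothetical, and the short aligned sequences are invented toy data rather than real IPK sequences.

```python
def invariant_columns(aligned, residue="H"):
    """Return 0-based alignment columns where every sequence carries `residue`."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [i for i in range(length) if all(s[i] == residue for s in seqs)]


# Toy alignment ('-' marks a gap); sequences are invented for illustration.
toy_alignment = {
    "toy_plant_1": "MK-LHGGAV",
    "toy_plant_2": "MRALHGG-V",
    "toy_bacterium": "MK-IHGGSV",
}
his_columns = invariant_columns(toy_alignment)
print(his_columns)
```

A sequence lacking the residue at the signature column would make that column drop out of the returned list, which is how a missing His would be detected.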
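These counts are internally consistent: 33 genomes with one copy and 2 genomes with two copies give 33 + 2 × 2 = 37 sequences. A copy-number tally of this kind can be sketched as follows; the mapping from sequence ID to species is a hypothetical input format, and the gene and species names are placeholders.

```python
from collections import Counter


def copy_numbers(seq_to_species):
    """Count how many sequences of the group come from each species."""
    return Counter(seq_to_species.values())


def species_above(counts, max_copies):
    """List species whose copy number exceeds `max_copies`."""
    return sorted(sp for sp, n in counts.items() if n > max_copies)


# Toy mapping (placeholder names, not the real gene IDs).
toy_group = {
    "toy_gene_1": "species_A",
    "toy_gene_2": "species_B",
    "toy_gene_3": "species_B",
}
counts = copy_numbers(toy_group)
print(counts)
print(species_above(counts, 1))

# Arithmetic check of the counts reported in the text.
total_sequences = 33 * 1 + 2 * 2
print(total_sequences)
```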
Copy numbers of IPKs were kept not higher than 2, at least in the 35 plant genomes we analyzed.

Phylogenetic relationship of 37 plant species with whole genome sequence data available, extracted from NCBI Taxonomy. The phylogeny was visualized with FigTree v1.3.1. Clade lengths do not necessarily reflect divergence times.