Identification of acyl carrier protein (ACP) in different plant species
Although ACP is of high functional significance in FA synthesis, the evolutionary history of this gene family has not yet been characterized. Because most ACPs of A. thaliana have been cloned and functionally characterized [3, 24-26, 29, 30], the eight ACPs from this model plant were used as query sequences in a BLASTP search against the protein databases of nine plant species covering seven families (i.e., Cruciferae, Malvaceae, Fabaceae, Euphorbiaceae, Vitaceae, Linaceae, and Gramineae), including both monocots and eudicots. All ACP sequences were then subjected to functional annotation by InterProScan. A total of 97 non-redundant ACPs were identified, with 5-17 ACP members in each plant species (Fig. 1, Additional file 1: Table S1). The identified ACPs ranged from 78 to 254 amino acids in length, with an average of 134.8 amino acids, and 95.9% were shorter than 170 amino acids. Furthermore, most of these ACPs (88.7%) were acidic, with an isoelectric point (pI) of less than 7.0 (Additional file 1: Table S1), a property assumed to be vital for higher-order ACP structure [31]. We classified these ACPs into eight putative groups, namely ACP1-ACP5 (plastidial ACP, also termed cpACP) and mtACP1-mtACP3 (mitochondrial ACP), based on sequence similarity with the eight ACPs in A. thaliana (Fig. 1; Additional file 1: Table S1). Notably, ACP4, mtACP1, and mtACP2 were present in all 10 plant species examined, with ACP4 being the most abundant and accounting for more than one third of all ACPs detected (Fig. 1). This classification is only partly consistent with the in silico prediction of subcellular localization by WoLF PSORT [32], according to which ~96.4% of ACP1-5 are preferentially localized in the chloroplast, whereas only 59.5% of putative mtACPs are predicted to be localized in mitochondria (Additional file 1: Table S1). Nonetheless, the classification system was adopted in this study for convenience.
Phylogenetic relationships and gene structural analysis among plant ACPs
To further investigate the evolutionary relationships among ACPs from the ten plant species, phylogenetic trees were generated using Maximum Likelihood and Markov chain Monte Carlo (MCMC) methods based on the alignment of 97 full-length ACP sequences and rooted with an ACP from Chlamydomonas reinhardtii (CrmtACP2/CrACP4). The ACPs were grouped into two distinct clades (I and II), which were further partitioned into two and three subclades, respectively (Fig. 2). Most of the branch topology (91/97) generated by the two independent methods matched perfectly, with only a few differences in order (6/97) within the same branch (Fig. 2), which indicated the robustness of the two tree topologies. Subclade Ia included ACP1-5, while subclades Ib, IIa, IIb, and IIc exclusively included ACP4, mtACP1, mtACP2, and mtACP3, respectively (Fig. 2). Both clades contained ACPs from all ten plant species examined, with at least eight species represented in each subclade. Specifically, the eight ACPs from Arabidopsis (AtACPs and AtmtACPs) were distributed across all subclades. AtACP1, 2, 3, and 5 were grouped together within subclade Ia, while AtACP4 and the AtmtACPs were distributed separately in different subclades (Fig. 2, indicated in red).
With regard to protein sequence and gene structure, most ACPs within clade I had four exons, whereas most of those in clade II had two (Additional file 2: Figure S1A, Student’s t test P<0.001). As for protein length, mtACPs had an average length of 126.79 amino acids, which was significantly shorter than that of ACPs from clade I (142.73 amino acids) (Additional file 2: Figure S1B, Student’s t test P<0.001). Significant differences were also observed in molecular weight (MW), average intron length, and isoelectric point (pI) (Student’s t test P<0.05) (Additional file 2: Figures S1C, S1D, S1E). With regard to intron phase, nearly 95% of exons were interrupted by introns located between codons (phase 0), whereas only 4.1% and 2.7% were interrupted after the first (phase 1) and second (phase 2) nucleotide of a codon, respectively (Additional file 1: Table S1), and no bias was observed between clades.
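As an illustration of how intron phase can be derived from gene structure, the following minimal Python sketch computes the phase of each intron from the lengths of the coding exons that precede it. It is not the procedure used in this study; the function names and the example exon lengths are hypothetical, and the sketch counts introns rather than exons.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase (0, 1, or 2) of each intron of one gene.

    cds_exon_lengths: coding lengths (bp) of the exons, in transcript order.
    Phase 0: the intron lies between two codons.
    Phase 1: the intron follows the first nucleotide of a codon.
    Phase 2: the intron follows the second nucleotide of a codon.
    """
    phases = []
    coding_bases = 0
    # An intron follows every exon except the last one.
    for length in cds_exon_lengths[:-1]:
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


def phase_fractions(genes):
    """Pool the introns of all genes and return the fraction in each phase."""
    counts = {0: 0, 1: 0, 2: 0}
    for exon_lengths in genes:
        for phase in intron_phases(exon_lengths):
            counts[phase] += 1
    total = sum(counts.values())
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


# Hypothetical coding-exon lengths (bp) for two made-up genes.
example_genes = [[120, 90, 61, 140], [200, 211]]
example_fractions = phase_fractions(example_genes)
```

Comparing such fractions between clade I and clade II genes would be one simple way to check for a clade-specific phase bias.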
Structural conservation among different subclades
Protein function largely depends on primary structure, i.e., amino acid sequence; therefore, functionally important sites (often related to substrate binding and catalysis) tend to be retained during evolution to maintain a given physiological role. To gain insight into the extent of ACP sequence conservation, multiple alignment of these sequences was performed using MUSCLE and displayed using WebLogo 3. In general, the N-terminus was less conserved (a lower percentage of positions with an overall height > 2.0 bits) than the C-terminus in the different subclades (Fig. 3a), except in subclade IIa. In a previous study, the Asp-Ser-Leu (DSL) motif was shown to be essential for activating ACPs before they accept an acyl group and to function as a recognition site for phosphopantetheinyl transferase (PPTase) [10]. In our study, the DSL motif was conserved in 93.9% of all identified ACPs. Conservation of the motif was higher than this overall average in subclades Ia, Ib, IIb, and IIc, whereas only 78.6% of ACPs in subclade IIa retained the DSL motif (Fig. 3a, indicated by the red dashed rectangle), indicating that ACPs in subclade IIa (mtACP1) were less conserved.
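The per-subclade retention of a motif such as DSL can be tallied with a few lines of Python. The sketch below is illustrative only: it uses a plain substring search rather than the aligned motif column, and the sequences shown are invented toy strings, not real ACP sequences.

```python
def motif_conservation(sequences, motif="DSL"):
    """Return the fraction of sequences that contain the motif at least once."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if motif in seq.upper())
    return hits / len(sequences)


# Hypothetical toy sequences grouped by subclade (not real ACPs).
toy_subclades = {
    "Ia": ["MAKGLDSLDTVE", "MSSGADSLDVVE"],
    "IIa": ["MTTGLDSLDQVE", "MTTGLDALDQVE"],
}

# Fraction of sequences retaining DSL in each subclade.
per_subclade = {
    name: motif_conservation(seqs) for name, seqs in toy_subclades.items()
}
```

Because a substring search ignores position, a stricter check would restrict the comparison to the alignment columns that correspond to the motif.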
ACPs are structurally flexible in both secondary structure and spatial conformation, which allows the protein to alternately sequester the acyl moiety and deliver it to the active sites of partner enzymes [12]. We hypothesized that the consensus sequence of proteins from the same subclade might better reflect their function than a consensus of all ACP sequences from a given species. We therefore first aligned the ACP sequences within each subclade, which comprised ACPs from different plant species, and then constructed a majority consensus sequence to model the tertiary structure. The resulting structures indicated that ACPs in every subclade contained four α-helices, including three longer helices and one shorter helix (Fig. 3b, c, d, e, f). In clade I (cpACPs), the shorter helix was located between helices I and II, whereas in clade II (mtACPs), the shorter helix was found between helices II and III (Fig. 3b, c, d, e, f), suggesting that cpACPs and mtACPs might have diverged in tertiary structure, although the helix locations were conserved to some extent.
ACP family expansion within species
Most flowering plants are estimated to have experienced one or more rounds of whole genome duplication during their evolutionary history [33-35]. In addition, tandem duplication, segmental duplication, chromosomal rearrangement, and transposition contribute to genome permutation, resulting in gene gain or loss and, consequently, in the expansion or retention of gene family members [36]. To better understand the expansion history of ACP genes, we investigated potential tandem and segmental duplication events within the different plant genomes. Based on the genomic locations of the ACPs, we identified five pairs of neighboring genes separated by ten or fewer intervening genes in four species, namely Arabidopsis (one), cotton (one), Linum (two), and rice (one), indicating that these ACPs possibly experienced tandem duplication (Table 1). Thereafter, the sequence identities of proteins flanking the ACPs were calculated (thirty proteins located within, upstream of, and downstream of each; see Materials and Methods for details). Eleven ACP gene pairs were identified as potentially derived from segmental duplication (Table 1). Among the ACPs that originated by segmental duplication, soybean and Linum ACPs accounted for 54.5% and 36.3%, respectively. Notably, eight out of twelve ACPs from Linum underwent tandem or segmental duplication (Table 1). We also noticed that half (50%) of the ACPs derived from tandem or segmental duplication were related to ACP4, mtACP1, and mtACP2. Interestingly, four ACPs (LuACP4c, LuACP4d, LuACP4e, and LuACP4f) on scaffold37 and scaffold475 of the Linum genome were estimated to have experienced both tandem and segmental duplication. We reasoned that tandem duplication likely occurred prior to segmental duplication. Overall, ACPs from only six of the plant species examined, namely A. thaliana, G. max, G. raimondii, L. usitatissimum, P. vulgaris, and O. sativa, showed evidence of tandem or segmental duplication.
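The tandem-duplication criterion described above (two family members on the same chromosome or scaffold separated by ten or fewer intervening genes) can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this study; the input layout, the function name, and the gene identifiers are hypothetical.

```python
from itertools import combinations


def tandem_candidates(gene_order, family_genes, max_intervening=10):
    """Find pairs of family genes that lie close together on a chromosome.

    gene_order: dict mapping chromosome/scaffold name to the list of gene IDs
                in positional order (hypothetical input layout).
    family_genes: set of gene IDs that belong to the family of interest.
    Returns a list of (chromosome, gene_a, gene_b, n_intervening) tuples.
    """
    pairs = []
    for chrom, genes in gene_order.items():
        index = {gene: i for i, gene in enumerate(genes)}
        members = [gene for gene in genes if gene in family_genes]
        for a, b in combinations(members, 2):
            intervening = index[b] - index[a] - 1
            if intervening <= max_intervening:
                pairs.append((chrom, a, b, intervening))
    return pairs


# Hypothetical gene order: ACPa and ACPb are 2 genes apart, ACPc is far away.
toy_order = {
    "chr1": ["g1", "ACPa", "g2", "g3", "ACPb"]
    + [f"x{i}" for i in range(12)]
    + ["ACPc"],
}
toy_pairs = tandem_candidates(toy_order, {"ACPa", "ACPb", "ACPc"})
```

In this toy input only ACPa/ACPb satisfies the threshold, because twelve genes separate ACPb from ACPc.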
Further, duplicated blocks were also observed among different legume species. A set of genes containing mtACP1 and mtACP2 was collinearly located on chromosomes, spanning a physical distance of 0.43 to 0.56 Mb in the soybean, medicago, and common bean genomes (Additional file 3: Figure S2), suggesting that these ACPs diverged prior to the divergence of the legume species.
To map the divergence of these ACP genes, we calculated the Ks (synonymous substitution rate) of closely neighboring ACP gene pairs on the phylogenetic tree and estimated their time of divergence (Fig. 2a). We found that 45.8% of neighboring ACP gene pairs had experienced tandem or segmental duplication, and that these genes diverged more recently than ACP genes without traces of duplication. As expected, ACP gene pairs that diverged within the last 10 million years were derived from the same subclades, whereas gene pairs that diverged earlier were derived from different subclades. With regard to plant species, 52.9% of ACPs from soybean were involved in segmental rather than tandem duplication and diverged approximately 12.9 million years ago (Mya) on average, whereas 83.3% of ACPs from Linum underwent segmental and tandem duplication approximately 5.1 Mya on average.
To estimate the timing of divergence events among clades, pairwise Ks values between subclades were calculated. These Ks values centered on at least eight major peaks (Additional file 4: Table S2), corresponding to gene divergence windows ranging from ~237.7 Mya to ~16.4 Mya. All Ks distributions among subclades shared two peaks, at 0.63 and 1.11, corresponding to ~51.6 Mya and ~91 Mya, respectively (Additional file 5: Figure S3). In addition, most ACP gene pairs (74.2%) exhibited Ka/Ks < 1, suggesting purifying selection; conversely, only 12.9% experienced positive or neutral selection (Additional file 4: Table S2).
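The conversion of Ks to divergence time is commonly computed as T = Ks / (2λ), where λ is the synonymous substitution rate per site per year. The Python sketch below assumes λ = 6.1 × 10^-9; this value is not stated in this section and is an assumption, chosen because it reproduces the two reported peak pairs (Ks 0.63 and ~51.6 Mya; Ks 1.11 and ~91 Mya). The function names and the tolerance parameter are likewise illustrative.

```python
# Assumed rate (substitutions per synonymous site per year); not taken from
# this section, but consistent with the Ks/time pairs reported above.
SUBSTITUTION_RATE = 6.1e-9


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """Estimate divergence time in million years ago as T = Ks / (2 * rate)."""
    return ks / (2 * rate) / 1e6


def selection_mode(ka, ks, tolerance=0.0):
    """Classify a gene pair by its Ka/Ks ratio.

    Ka/Ks < 1: purifying selection; about 1: neutral; > 1: positive selection.
    Returns "undefined" when Ks is zero and the ratio cannot be computed.
    """
    if ks == 0:
        return "undefined"
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# The two shared Ks peaks reported in the text.
peak_times = [divergence_time_mya(ks) for ks in (0.63, 1.11)]
```

Under the same assumed rate, the reported window of ~16.4 to ~237.7 Mya would correspond to Ks values of roughly 0.2 to 2.9.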
ACP gene expression profile
As remarkable differences were observed among ACPs in terms of clades, gene structure, and expansion patterns, we asked whether these differences might also be reflected at the expression level. Rapidly progressing high-throughput sequencing technologies, such as RNA-seq, have provided a substantial amount of data that are readily available for investigating gene expression profiles. We collected RNA-seq data of different tissues from four plant species, namely L. usitatissimum, G. max, A. thaliana, and M. truncatula, to study ACP gene expression patterns. Data from different sources were first normalized via the Z-score method and then combined for further analysis. Approximately 90.5% of these ACPs were detectable in the seeds/embryos, leaves, stems, and roots of the different species (Fig. 4a), whereas ACP2 was detectable only in Arabidopsis, indicating that most ACPs were constitutively expressed and actively functional. Hierarchical clustering analysis indicated that ACPs of the same species tended to group together (Fig. 4a). We found three gene pairs each derived from segmental duplication (GmACP4a/GmACP4c; LuACP4a/LuACP4b; LumtACP1a/LumtACP1b) and tandem duplication (AtACP2/AtACP3; LuACP4e/LuACP4f; LuACP4c/LuACP4d) that were adjacently grouped and showed similar expression patterns (Table 1; Fig. 4a). Interestingly, seven of nine of these ACP gene pairs derived from tandem or segmental duplication were of the same ACP type (ACP4, mtACP1, and mtACP3). Correlation network analysis of the expression patterns showed that 14 and 11 ACP gene pairs were positively and negatively correlated, respectively (Fig. 4b). ACPs from the same species tended to be positively correlated, and most ACPs common to both soybean and Linum were negatively correlated (Fig. 4b).
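Z-score normalization before pooling data sets can be sketched in Python as below. The section does not specify whether values were standardized per gene or per data set, so the sketch assumes per-gene standardization across tissues within each data set; the data layout, function names, and expression values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A gene with identical values in all tissues carries no pattern.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize_and_combine(datasets):
    """Z-score each gene within its own data set, then merge all data sets.

    datasets: dict mapping data set name to {gene_id: [expression per tissue]}.
    Assumes gene IDs are unique across data sets (e.g. species-prefixed).
    """
    combined = {}
    for genes in datasets.values():
        for gene_id, values in genes.items():
            combined[gene_id] = zscore(values)
    return combined


# Hypothetical expression values (same tissue order in every data set).
toy_datasets = {
    "studyA": {"geneX": [1.0, 2.0, 3.0]},
    "studyB": {"geneY": [10.0, 10.0, 10.0]},
}
toy_combined = normalize_and_combine(toy_datasets)
```

Standardizing within each source first puts data sets with different scales on a common footing before they are clustered together.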
Regarding the average expression level of each ACP type from different species, positive correlations were observed among ACP1, ACP2, and ACP3, between mtACP1 and mtACP2, and between ACP4 and mtACP3 (Additional file 6: Figure S4A). In particular, the Pearson correlation coefficient (r) for the correlation between ACP1 and ACP3 was 0.94, which was much higher than those for mtACP1 and mtACP2, or for ACP4 and mtACP3 (0.71 and 0.62, respectively). Both ACP1 and ACP3 exhibited relatively higher expression levels in the roots and seeds/embryos than in the stems and leaves (Additional file 6: Figure S4B). From the violin plot, we also noticed relatively higher expression levels of mtACP1 and mtACP2 than of mtACP3 (Student’s t test P<0.001). The widest distribution range was observed for ACP4 (Additional file 6: Figure S4B), probably owing to the fact that the ACP4 family is the largest among ACP families.
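For reference, the Pearson correlation coefficient (r) between the expression profiles of two ACP types can be computed as in the following Python sketch. The function name and input vectors are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / sqrt(var_x * var_y)


# Hypothetical average expression of two ACP types across four tissues.
r_same_trend = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_opposite = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Values of r close to 1 indicate profiles that rise and fall together across tissues, and values close to -1 indicate opposite trends.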
Cis-regulatory element analysis within the promoter region of ACP
Cis-elements in gene promoter sequences serve as specific binding sites for trans-acting transcription factors (TFs) and additional co-factors that initiate gene transcription [38]. The type and number of cis-elements in the promoter region are assumed to be key factors in coordinating spatial and temporal gene expression patterns during plant growth and development and during responses to environmental stress conditions [37]. To better understand the cis-elements of these ACP genes, the 1000 bp upstream of the transcription start site (TSS) of each ACP coding region were analyzed in silico using the Plant Cis-acting Regulatory DNA Elements (PLACE) database [38]. In total, 249 putative cis-elements were identified, and each ACP gene contained 31-90 of these elements, among which only seven (2.8%), namely CACTFTPPCA1, DOFCOREZM, GT1CONSENSUS, CAATBOX1, ARR1AT, POLLEN1LELAT52, and GATABOX, were ubiquitous in the promoters of all ACPs examined. Further, these seven cis-elements occurred 10.3 times per gene on average, with variation among different ACPs. In terms of function, these seven ubiquitous cis-elements were related to organ-specific gene expression (i.e., in the leaf, seed, and pollen), ABA, and light-mediated gene regulation (Fig. 5a; Additional file 7: Table S3). Subclade-specific cis-elements were investigated as well. A total of 19, 7, 4, 9, and 6 cis-elements specific to subclades Ia, Ib, IIa, IIb, and IIc, respectively, were identified and found to be involved in tissue-specific gene expression, phytohormone regulation, stress response, and light-mediated gene regulation (Fig. 5a). With regard to species, 87 cis-elements were common to all species examined, with functions related to tissue-specific gene expression, phytohormones, stress response, etc. (Fig. 5b).
Each species contained 1–19 species-specific cis-elements in one or more ACP gene promoter regions; these were involved in the various functions mentioned above but differed among species (Fig. 5b). Overall, all ACP promoters (100%) contained cis-elements related to the seeds, leaves, roots, shoots, ABA, auxin, biotic and abiotic stress, and light responses (Fig. 5c). Only 23.47% of ACPs contained drought-related cis-elements, which occurred 1.91 times per gene on average, suggesting that some of these ACPs might be involved in the response to drought stress. Interestingly, cis-elements were slightly more abundant in clade II (1.16 elements per gene) than in clade I (1.09 elements per gene), but the difference was not significant (Student’s t-test, P>0.05), suggesting that both cpACP and mtACP might play equally important roles in plants.
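Counting cis-element occurrences in a promoter can be illustrated with the short Python sketch below, which scans both strands of a sequence for exact core motifs. It is not the PLACE search itself; the core sequences listed for the three elements (GATA, CAAT, and AAAG) are assumptions that should be checked against the database, and the promoter string is invented.

```python
# Assumed core sequences for three PLACE entries; verify against the database.
CORE_MOTIFS = {
    "GATABOX": "GATA",
    "CAATBOX1": "CAAT",
    "DOFCOREZM": "AAAG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq, motif):
    """Count exact occurrences of motif in seq, allowing overlaps."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def count_elements(promoter, motifs=CORE_MOTIFS):
    """Count each motif on the forward and reverse strands of a promoter.

    Note: a palindromic motif would be counted twice at the same site.
    """
    forward = promoter.upper()
    reverse = reverse_complement(forward)
    return {
        name: count_overlapping(forward, motif) + count_overlapping(reverse, motif)
        for name, motif in motifs.items()
    }


# Hypothetical 15-bp promoter fragment (not a real ACP promoter).
toy_counts = count_elements("GATAAAGCAATTATC")
```

Applying such counts to the 1000 bp upstream of each gene and averaging per clade gives per-gene abundances of the kind compared above.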