The evolutionary rate variation among genes of MVA and MEP pathways in plant terpenoid biosynthesis

Background: Terpenoids are one of the most important compounds in plants, play an significant biological defense and developmental roles in numerous plant species, and widely used for industrial chemicals. Many previous studies have completed the identification of terpenoid biosynthetic pathway and related genes. However, few studies have focused on the molecular evolution analysis of terpenoid pathway genes in plants. In this study, we researched the evolutionary rate variation pattern of 16 terpenoid pathway genes in 12 species with a broad taxonomic span. Results: We retrieved 14 genes in MVA and MEP pathways and 2 extra genes from 12 species, respectively. The evolutionary parameters d N values and d N / d S ratios are varied significantly among genes, and the d N / d S ratios of most genes are varied substantially among lineages. The MVA and MEP pathways genes have different evolutionary rate variation pattern, although no significant difference in d N / d S ratios between two pathways genes. For MVA pathway, the downstream genes exhibits the greater d N / d S ratio than upstream genes. For MEP pathway, the three midstream genes evolves more rapidly than other genes, and most of MEP pathway genes were detected the signature of positive selection under random sites models. Moreover, the d N / d S ratios of MVA and MEP pathways genes are negatively correlated with pathway position and PPI, and coding sequence length, respectively. Conclusions: Taken together, the results indicated that the evolutionary rate variation of MVA pathway genes is mainly attributed to differential selective constraint rather than the positive selection. However, the differential selective constraint relaxation and positive selection collectively shaped the evolutionary rate heterogeneity of MEP pathway genes. the significant correlations between pathway position and d N / d S were disappeared (ρ = 0.527, P = 0.143). These results suggest that PPI is an important factor affecting the evolutionary rate variation of MVA pathway genes.


Background
Terpenoids are the most abundant natural products, and present in all living organisms, and characterized by complex skeletons, kaleidoscopic structures and extensive biological activity [1,2]. According to incomplete statistics, more than 55,000 terpenoids have been identified in extant organisms. There are known to have many momentous biological and physiological functions, and are widely used in pharmaceutical, cosmetic, food, beverages, pigments and condiment industries [3][4][5][6]. Such as, some terpenoids are crucial components of cytoderm, membrane structures and electron transfer systems in plants [7]. The lineage-specific terpenoids plays an important role in plant defense against herbivores and pathogens [8,9]. Partial terpenoids of secondary metabolism have been essential to many facets of human life (e.g. artemisinin for malaria, taxol for ovarian/breast cancer) [10,11].
All terpenoids are derived from the common five-carbon structural units precursor isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP), though there have a variety of types with distinct structures and functions [12,13]. IPP/DMAPP can be generated via two independent and cooperative pathways: the mevalonate (MVA) and 2-C-methyl-D-erythritol-4-phosphate (MEP) pathways occurred in cytoplasm and plastids, respectively [1] (Figure.1). However, most organisms only use one of the MVA and MEP pathway for the biosynthesis of IPP/DMAPP [14,15]. Specifically, higher plants use both the MVA and MEP pathways for IPP/DMAPP biosynthesis, such an advantage maybe for overcoming the sessile lifestyle constraints [16]. The first step of MVA pathway is to take acetyl-CoA as the initial substrate and catalyzed by acetoacetyl-CoA thiolase (AACT). AACT condenses two molecular of acetyl-CoA to form acetoacetyl-CoA. And then, acetoacetyl-CoA is converted to 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) under the catalysis of HMG-CoA In this study, we investigated the evolutionary rate variation of MVA and MEP pathways genes and two additional genes from 12 representative species. In order to address the following questions. First, whether statistically significant variation in dN/dS ratios exists among genes and lineages? Second, whether significant difference in evolutionary rate between MVA and MEP pathways genes? Third, are there exist different evolutionary rate variation pattern in MVA and MEP pathways genes? If there is, whether this variability is explained by differential selective constraint and/or positive selection? Fourth, are there exist evidence of positive selection in any analyzed pathway genes? Finally, whether dN/dS ratios varies with any network parameters and/or gene structure?

Sequences retrieval and analyses
A total of 192 protein coding sequences of terpenoid pathway genes were obtained from the 12 species (Additional file 1). The mean coding sequence length of the 16 genes ranged from 714 bp (MECPS) to 2217 bp (HDS) ( Table 1). The ENC and G+C content were similar for the same gene across species and among genes. The average ENC and G+C content of the 16 terpenoid pathway genes were ranged from 47.81 to 55.32, and 0.43 to 0.56, respectively (Table 1). However, the average GC3 content was varied greatly among genes with the average ranging from 0.29 to 0.62 (Table 1). Gene copy number were differed among species for per gene. Elaeis guineensis, Vitis vinifera, Morous notabilis and Arabidopsis thanliana had mostly one or two copy each gene. Whereas most Glycine max terpenoid pathway genes had more than two copies (Additional file 2). According to the KEGG pathway database, we calculated the PPI value of 16 terpenoid pathway enzymes. The results showed that in addition to, AACT, HMGS, HMGR, MVK and DXS were involved in a number of pathways, the other 11 pathway enzymes only participate in terpenoid pathway (Table 1 and  Additional file 3).

Substitution models and phylogenetic analysis
A total of 24 nucleotide substitution models were performed for each gene. We determined the best subsitution model for the all of 16 terpenoid pathway genes is the GTR+I+G model, with the highest log likelihood (lnL) and lowest AIC value (Additional file 4). The results of substitution saturation showed that there were no any mutational saturation for the 16 terpenoid pathway genes, indicating that the sequences datasets were appropriate for subsequent phylogeny and substitution rate analyses. We concatenated the sequences of the 16 genes with the same order for each species, and then constructed the maximum likelihood species tree with the GTR+I+G model, delineated in figure 2.

Comparing dN, dS and dN/dS among genes
To compare dN and dS among genes, we estimated the dN and dS for each branch of the pruned tree of the 12 species. There are total of 22 evolutionary independent branches ( Figure 2). The distribution of dN and dS value of the 21 branches pairs estimated by Codeml of the PAML v4.9 for MVA and MEP pathways genes and the average of dN and dS for each gene are shown in Figure 3. Unlike dS value, dN varied significantly among genes (P < 0.01), with the highest dN (0.133 for GGPPS) being 5.5 times the lowest (0.024 for HDS). Both dN and dS, there was no significant difference between MVA and MEP pathways genes. For MVA pathway, the most upstream gene AACT  The dN/dS ratios estimated using the one-ratio model of CodeML are similar to the values of ratios calculated using the average nonsynonymous and synonymous substitution rates of 16 terpenoid pathway genes ( Table 1). The significant difference comparisons showed that the dN/dS ratios varied significantly among the 16 genes (P <0.001) with the highest value (0.193 for MCT) being 3.1 times the lowest (0.062 for HDS). However, there was no significant difference in dN/dS ratios between MVA and MEP pathways genes. For MVA pathway, according to compared pairwise statistically results, the downstream genes (MVK, PMVK and MVD) exhibit a significantly greater dN/dS ratios than the upstream genes (AACT, HMGS and HMGR) at level of P< 0.05. To MEP pathway, the downstream genes (HDS and HDR) did not exhibit a higher dN/dS ratios than the upstream genes (DXS and DXR), but the midstream genes (MCT, CMK and MECPS) exhibit a significantly higher dN/dS ratios than other MEP pathway genes. The results of two-ratios model with monocots, dicots, gymnosperms and each species as foreground branch showed that such evolutionary rate variation patterns of MVA and MEP pathways genes also exist in each individual species or lineages (Additional file 5). Additionally, we also used the free-ratio model to test whether the dN/dS ratios of the terpenoid pathway genes differed among lineages by comparing the likelihood values with one-ratio model. The LRT results showed that except for MVK, MVD and HDS genes, twice the log likelihood values difference (2Δl) of the other 13 genes were all greater than 31.41 (P < 0.05, under Chi-square test with df = 20), indicating that the dN/dS ratios of these 13 genes are different among lineages.

Detecting positive selection
In this study, we applied the two-ratios model and random-sites models to detect whether there were positive selection occurred on species-specific or lineage-specific branches and codon sites of terpenoid pathway genes, respectively. Two-ratios model results suggested that for all 16 genes only MECPS gene detected positive selection signature (ω1 = 9.908) in branch pinus taeda. The LRT statistic for two-ratios model to one-ratio model is 2Δl = 2 × 5.11 = 10.22, with P = 0.0013 and df = 1, indicating that the dN/dS ratio of MECPS gene for branch pinus taeda is significantly different for other branches. To test whether ω1 is significantly greater than 1, we reestimate the log likelihood value of the two-ratios model with ω1 = 1 fixed [59], calculating the lnL value is -6904.03. The LRT statistic for this comparison is 2Δl = 2 × 5.01 = 10.02, with P = 0.0015 and df = 1, suggesting that ω1 is significantly higher than 1 at 1% significance level.
Six random sites models were used to assume various classes of dN/dS ratios among the codon sites to the 16 terpenoid pathway genes separately. First, we compared models M3 and M0 by LRT to test whether dN/dS ratios is significantly different among sites. For all of 16 genes, the LRTs comparing M3 and M0 has the statistic 2Δl = 345.37~1674.48 much greater than the Chi-square critical values 13.28 at 1% level with df = 4. The results showed extreme variation in dN/dS ratios among codon sites. Second, we compared models M2a and M1a, models M8 and M7 to detect the positive selection sites. Comparison between M2a and M1a results suggested that, for all 16 genes, the LRTs statistic are 2Δl = 0.00~1.22, significantly lower than a χ 2 significance value 5.99 at 5% level with df = 2. Indicating that models M2a were not significantly fit the data than the null models M1a. By contrast, LRTs (2Δl = 6.32~120.58, 0.001 < P < 0.05, with df = 2) for the comparison of M8 and M7 were statistically significant for 14 of the 16 genes (except for MVD and IPPI), and identified several sites under positive selection for 7 of terpenoid pathway genes (Table 2). Interestingly, 5 of the 7 genes appear in the MEP pathway, while only 2 genes appear in the MVA pathway and its downstream.

Multivariate analyses
To determine whether the evolutionary rate variation of terpenoid pathway genes were affected by network structure and intramolecular characteristics, Spearman's rank correlation were used to estimate the correlations between these factors and the evolutionary parameters (dN and dN/dS) estimated by one-ratio model of 16 terpenoid pathway genes. As shown in Figure 4, there were no significant correlations between dN/dS ratios and other factors, except for with CDS length (ρ = -0.592, P = 0.016, with FDR correction, the same below). Similarly, the dN values was only negatively correlated with CDS length (ρ = -0.559, P = 0.024). when conducted correlation analyses for MVA and MEP pathways genes respectively, the correlations between  Since pathway position and PPI are interrelated and are each correlated with dN/dS ratios of MVA pathway genes, the partial correlation analysis were performed to estimate the relative contributions of these two factors. Partial correlation analysis showed that when controlling for pathway position, the correlation between PPI and dN/dS remained significant (ρ = 0.672, P = 0.036). However, when controlling for PPI, the significant correlations between pathway position and dN/dS were disappeared (ρ = 0.527, P = 0.143). These results suggest that PPI is an important factor affecting the evolutionary rate variation of MVA pathway genes.

Discussion
In previous research, Ramsay et al [30] have elucidated the evolutionary rate patterns of genes involved in the biosynthesis pathway of four terpenoids: lutein, abscissic acid, brassinosteroid and gibberellic acid. For this study, the fourteen MVA and MEP pathways genes and its downstream genes FPPS and GGPPS of 12 representative species with a large taxonomic span were used for the analysis of evolutionary rate variation. The results revealed that with a broad taxonomic span including monocots, dicots and gymnosperm, the dN values and dN/dS ratios were varied significantly among the 16 terpenoid pathway genes. However, with the similar dS values among 16 genes, suggesting that the differential mutation rates cannot explain the evolutionary rate variation of 16 terpenoid pathway genes [25]. In fact, the variation among genes in substitution rates is ubiquitous. Several other previous studies conducted on various biological pathways such as carotenoid pathway [60], anthocyanin pathway [25][26][27], isoflavonid pathway [61], starch pathway [33], phenylpropanoid pathway [31], isoprene pathway [29] and gibberellin pathway [32] have demonstrated it. Unlike other precursors, the precursor IPP of terpenoids can be generated by MVA and MEP pathway, respectively. Interestingly, our analysis suggests that MVA and MEP pathways genes have different evolutionary rate variation patterns, although there were no significant difference in dN/dS ratios between two pathways genes. For MVA pathway, the downstream genes evolved more rapidly than the upstream genes. This pattern are similar to the anthocyanin biosynthetic pathway genes [25,26] and isoflavonoid biosynthetic pathway genes [61]. One possible explanation for this pattern is that the upstream enzymes are subject to greater selective constraint than downstream enzymes, because of upstream enzymes exert more control over metabolic fluxes, and influence more end products than downstream enzymes [26,62]. First, in our study the evolutionary parameters dN and dN/dS ratios are much lower for the upstream genes than for the downstream genes, which means that purifying selection is stronger on upstream genes. Second, KEGG pathway analysis showed that the most two upstream enzymes (AACT and HMGS) of MVA pathway involved in 12 and 5 KEGG pathways, respectively. In contrast, the most two downstream enzymes (PMVK and MD) are only participated in the terpenoid backbone biosynthesis pathway (ko:00900). In addition, subsequent correlation analysis also indicated that the dN/dS ratios of MVA pathway genes were significant correlation with pathway position and PPI. An alternative explanation for this pattern is that downstream enzymes undergone more frequent position selection [27]. However, in this study neither the two-ratios branch model nor the two pairs random sites models detected any signature of positive selection for any species-specific branch or lineage-specific branch and codon sites of MVA pathway genes (except for AACT gene has one site under positive selection). Consequently, we demonstrate that the higher dN/dS ratios of the downstream genes in the MVA pathway is mainly put down to the relaxation of selective constraint rather than subject to more frequent positive selection. It should be noted, however, that the power to detect positive selection with the above methods is limited, particularly when data from across the taxonomic range of species are pooled together to analysis [63,64]. So we cannot completely rule out the effect of positive selection on the accelerated evolution of downstream genes in the MVA pathway. Some previous studies have provided the evidence that positive selection is responsible for accelerated rates of evolution [65][66][67].
In this study, the MEP pathway genes did not exhibit the same evolutionary rate variation pattern as MVA pathway genes, anthocyanin biosynthetic pathway genes and carotenoid biosynthetic pathway genes [25,60]. In the MEP pathway, the downstream genes HDS and HDR did not show a higher dN/dS ratios than upstream genes DXS and DXR. In contrast, three midstream genes MCT, CMK and MECPS exhibit greater dN/dS ratios than other MEP pathway genes. Subsequent correlation analysis revealed that there were no significant correlation between dN/dS ratios and pathway position, dN/dS ratios and PPI. This evolutionary rate variation of MEP pathway genes seems better explained by the pathway fluxes hypothesis, which illuminated that in biosynthetic and metabolic pathways, natural selection more likely target enzymes that control flux, and therefore trend to experience greater evolutionary constraints [68][69][70]. In MEP pathway, the early step enzymes DXS and DXR, and the last step enzyme HDR with lower dN/dS ratios are considered as key enzymes [1], consistent with the fluxes theory. We also examined this evolutionary rate variation could be explained by positive selection. Surprisingly, most of MEP pathway genes were detected with positive selection signature (Table 2), especially for the MCT gene with the highest dN/dS ratios, there were 10 positive selection sites with P-value of BEB greater than 0.95, and for the MECPS gene, which detected dN/dS ratio > 1 in branch pinus taeda under two-ratios model. In addition, 3 positive selection sites were also detected in the midstream gene CMK. In conclusion, the three midstream genes with higher dN/dS ratios were all detected stronger signature of positive selection. Consequently, we speculated that the evolutionary rate variation of MEP pathway genes is mainly attributed to the differential constraint and positive selection.
Several previous studies have demonstrated that the evolutionary rate of genes within networks and pathways are also affected by multiple network parameters and gene structure [36][37][38][39][40]. For this study, the Spearman's rank correlation test was conducted between the pathway position, PPI, GC3 content, G+C content, codon bias, gene copy number, CDS length and evolutionary rates of 16 terpenoid pathway genes. The results showed that the dN/dS ratios of all 16 genes were negatively correlated with CDS length. Several prior studies have also provided evidence that protein length play an key role in shaping the evolutionary rate of TLR signaling pathway genes of Suidae [71], HOG-MAPK pathway genes of Saccharomyces cerevisiae [72] and orthologs between Populus tremula and Populus trichocarpa [73]. While no significantly correlated with other factors. Similar to Ramsay et al [30] reported that no apparent variation in dN/dS ratios with gene copy number. Contrary to the significant correlations observed in paralogs between Arachis duranensis and Arachis ipaënsis (dN/dS ratios negatively correlated with GC3 content) [74], orthologs between Brassica rapa and Brassica oleracea (dN/dS ratios significantly correlated with G+C content and codon bias) [75] and pathway genes of converting glucose to the ABA, GA and brassinosteroids (dN/dS ratios correlate with pathway position ) [30]. When performed Spearman's rank correlation test to MVA and MEP pathway genes, respectively. The MEP pathway genes remained significantly correlated with CDS length, while the correlation with MVA pathway genes disappeared, suggesting that protein length have a greater affects on MEP pathway genes than MVA pathway genes. In addition, two new significant correlations were observed between dN/dS ratios and pathway position, dN/dS ratios and PPI. Therewith, we boldly suggested that the MVA pathway genes are subjected to stronger selective constraint than MEP pathway genes. However, limited to the identified of MVA and MEP pathways genes of plants, only a small dataset was used in this study. With the publication of more plants genome data, and the improvement of assembly level, a large dataset should be used for future analysis to provide stronger evidence for the evolutionary rate variation patterns of MVA and MEP pathways genes.

Conclusion
In this study, we have detected the dN and dN/dS ratios were varied significantly among the 16 terpenoid pathway genes. However, there were no significant difference in dN and dN/dS ratios between MVA and MEP pathways genes. The MVA and MEP pathways genes have different evolutionary rate variation patterns and attributed to differential selective constraint, selective constraint relaxation and positive selection, respectively. In addition, MECPS gene was detected dN/dS ratios >1 in branch Pinus taeda under two-ratios model, and most MEP pathway genes were detected with positive selection signature under M8 sites model. Furthermore, Spearman's rank correlation test results showed that CDS length were negatively correlated with the dN/dS ratios of all 16 genes and MEP pathway genes but MVA pathway genes, pathway position and PPI were significantly correlated with dN/dS ratios of MVA pathway genes. Our results will contribute to improving the understanding of natural selection in pathways.

Species and genes analyzed
Twelve representative plants were selected from the species that could be retrieved all of 14 MVA and MEP pathways genes, and the other 2 extra genes FPPS and GGPPS ( Figure.1). We retrieved coding sequence (CDS) of the 16 terpenoid pathway genes of Picea sitchensis, Elaeis guineensis, Zea mays, Oryza sativa, Vitis vinifera, Nicotiana attenuate, Morus notabilis, Eucalyptus grandis, Arabidopsis thaliana, Populus trichocarpa and Glycine max from NCBI or corresponding species genome database. The CDS of 16 terpenoid pathway genes of Pinus taeda were retrieved from the full-length transcriptome sequencing database (unpublished data). The retrieval methods refer to the research of Li et al [41] and Han et al. [42]. Non-redundant NCBI blast and genomic blast search were performed using the CDS of 16 terpenoid pathway genes of Arabidopsis thaliana as query sequences, and then retrieve orthologs sequences based on bidirectional blast-hit (BDBH) algorithm.
In Arabidopsis thaliana, several terpenoid pathway genes occur in multigene families, such as AACT, HMGR, MVD, DXS, IPPI, FPPS and GGPPS genes. However, HMGS, MVK, PMVK and other MEP pathway genes are known only as a single copy in Arabidopsis thaliana. For this, we selected AtAACT2 as the query sequence, which is six times more efficient than AtAACT1 and has an essential role in isoprenoid biosynthesis [1,43], and AtHMGR1 as query sequence, which is generally more highly expressed than AtHMGR2 [44][45][46], and AtMVD1 as query sequence, which is associated with the peroxisomes in vivo [47,48], and AtDXS1 as query sequence which encodes a functional DXS enzymes [49], and AtIPPI1 as sequery sequence which has overlapping functions in isoprenoid biosynthesis [50], and AtFPPS1 as query sequence which encodes two protein in the cytosol [51], and AtGGPP1 as query sequence, which is ubiquitously expressed in all organs [52].

Statistical analyses
To get a better understanding of factors that contribute to variation in evolutionary rate among terpenoid pathway genes, we counted the CDS length, pathway position, GC3 content, G+C content, gene copy number, pathway pleiotropy index (PPI) which measured using the pathway number of protein participate in and codon usage bias which evaluated by effective number of codons (ENC) of terpenoid pathway genes among each species. PPI were obtained from KEGG pathway database. ENC and G+C content were calculated with DnaSP 6.0 [53]. Pathway position were defined by the method that enzymes of MVA and MEP were sequentially numbered from DXS to IPP (1)(2)(3)(4)(5)(6)(7)(8) and AACT to IPP (1-7), respectively. Gene copy number were obtained using the method described in Chu et al [28]. First, we conducted Spearman's rank correlations for the bivariate correlation analysis between these factors and evolutionary parameters (dN and dN/dS) estimated by one-ratio model. And then, the partial correlation analysis were performed to estimate the relative contributions of the correlated factors. All of these statistical analyses were performed by applying statistical package R http://www.r-project.org .

Multiple sequence alignment and phylogenetic analyses
Multiple sequence align codons were performed using ClustalW program of MEGA 7.0 with default parameters [54], and then delete stop codons and refined manually. Finally saved the alignmented sequences files as PHYLIP and FASTA format for subsequent analyses. The sequences substitution saturation were detected using DAMBE 5.0 software [55]. The relative suitabilities of 24 nucleotide substitution models were evaluated for the 16 terpenoid pathway genes by the JmodelTest software, respectively. We selected the best subsitution model based on Akaike Information Criterion (AIC), the lower the AIC value, the better the subsitution model. The maximum likelihood phylogenetic tree of the 12 species were constructed based on combined CDS of 16 terpenoid pathway genes and the best substitution model by MEGA 7.0 with the default options.

Comparing dN, dS and dN/dS among genes
The pairwise nonsynonymous and synonymous substitution rates (dN and dS) of 12 species were estimated for the CDS of 16 terpenoid pathway genes using the CodelML program in PAML version 4.9 [56], with the parameters described in Yang and Nielsen [57]. We then used the method depicted by Yang et al [32] to detect whether the dN or dS differed among genes. We used the one-ratio model of CodelML program of PAML 4.9 to estimate a single value of dN/dS for the entire tree along each gene. we used the free-ratio model to assumes an independent dN/dS ratio for each branch of the tree along each gene. However, in consideration of the rich parameters need to be estimated in free-ratio model maybe increase the bias in the estimates, with a concomitant decrease in the variance. We also used two-ratios model to assumes the dN/dS for each branch in the phylogeny along each gene. The significance of differences in dN/dS ratio among genes was analyzed using the method described in the research of Lu and Ransher [26]. We conducted likelihood ratio test (LRT) to detect whether the free-ratio and two-ratios model fits the data significantly better than the one-ratio model.

Detection of positive selection
The nonsynonymous to synonymous substitution rate ratio (ω = dN/dS) provides a practical measure of natural selection at the gene level, with ω >1, ω =1 and ω <1 indicating positive selection, neutral evolution and purifying selection, respectively. In this study, except for using the two-ratios model to examine whether there existed a few specific-branches with the dN/dS ratio significantly greater than 1. We also applied six random sites models to detect whether specific sites are subject to positive selection, which estimate variable dN/dS ratio among sites, but no variation among lineages. Model M0 assumes a single ω value for all sites. M1a (neutral) assumes the conserved sites with ω0 <1 and the neutral sites with ω1 =1. Compared to M1a, model Ma2a allow a free ω2 value estimated from the data. M3 (discrete) assumes three site classes. M7 (beta) assumes a beta distribution of ω over sites. M8 (beta & ω) allow an extra site class with a free ω ratio estimated from the data. We then examined for a site class with ω >1 by comparing likelihood values under M2a vs. M1a and M8 vs. M7. The Bayes Empirical Bayes (BEB) analysis was used to calculate the posterior probabilities of positive selection sites [58]. We also detected for variable evolutionary rates among sites by comparing likelihood value under M3 and M0.

Declarations
Ethics approval and consent to participate Not applicable

Consent for publication Not applicable
Availability of data and materials All data generated or analysed during this study are included in this published article and its supplementary information. Additional file 1: Table S1. The coding sequence information of 16 terpenoid pathway genes. Additional file 2: Table S2. Gene copy number of terpenoid pathway genes among species for per gene. Additional file 3: Table S3. Summary statistic participated in multiple pathways of the five terpenoid pathway genes. Additional file 4: Table S4. Comparison of 24 nucleotide substitution modles for each gene. Additional file 5: Table S5. The dN/dS ratios of each foreground under two-ratios model.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
MJP and HSW conceived and designed the research and drafted the manuscript together. MJP, HLW and HJ performed the research and analyzed the data. LTY participated in the data analysis. All authors have read and approved the final manuscripts.