Determination of essential oil in German and Roman chamomile flowers
GC-MS analysis indicated that the essential monoterpenoid and sesquiterpenoid constituents in the flowers of German chamomile and Roman chamomile are significantly different (Figure 1 and Figure 2, Additional file 1). The relative quantities of monoterpenoids and sesquiterpenoids in disk flowers were always higher than in ray flowers in both chamomiles. The main compounds from German chamomile essential oil were sesquiterpenoids, such as α-Bisabolol oxide A, Chamazulene, α-Bisabolol oxide B, α-Bisabolol and Espatulenol. Meanwhile, the major constituents from Roman chamomile essential oil were n-Hexadecanoic acid, linoleic acid and other esters. The contents of α-Bisabolol oxide A, α-Bisabolol oxide B and Chamazulene in German chamomile were 50, 30, 10 times more than that in Roman chamomile. In addition, the content of these compounds in disk flowers were 2-10-fold greater than that in ray flowers in two chamomiles. These results are consistent with the previous findings reported by Yao et al.[27].
De novo transcriptome assembly and comparative RNA-seq analyses
After removing the terminal adaptor sequences, duplicated and ambiguous sequences, and low-quality reads, we generated approximately 106.72 Gb of Illumina RNA-seq data from mRNA extracted from ray flowers and disk flowers of German and Roman chamomile. There were 53.31Gb and 53.41Gb of clean read data used in the assemblies for German and Roman chamomile, respectively. The Q20 and Q30 scores were greater than 98% and 95% for the German and Roman chamomile transcriptome data, respectively. The final German chamomile cDNA assembly consisted of 117,203 unigenes; the average length was 1,056 bp and the N50 length was 1,686 bp. The final assembly for Roman chamomile had 147,616 unigenes with an average length of 914 bp and an N50 length of 1,506 bp (Table 1). There were 50,881 (43.41%) and 48,957 (33.17%) unigenes >1,000 bp in length in German and Roman chamomile, respectively.
Functional annotation and classification
All unigenes from both German chamomile and Roman chamomile were annotated using several public databases: Nr (NCBI non-redundant protein sequences), Nt (non-nucleotide), Swiss-Prot, COG (Clusters of Orthologous Groups of proteins), KEGG (the Kyoto Encyclopedia of Genes and Genomes), and GO (Gene ontology). There were 89,796 (60.83%) and 73,699 (62.88%) sequences annotated for German chamomile and Roman chamomile, respectively. We annotated 68,325 (NR: 58.30%), 47,469 (NT: 40.50%), 48,902 (Swissprot: 41.72%), 29,222 (COG: 24.93%), 52,423 (KEGG: 44.73%), 11,385 (GO: 9.71%), and 51,613 (Interpro: 44.04%) in the Roman chamomile transcriptome. Similarly, 80,594 (NR: 54.60%), 49,537 (NT: 33.56%), 56,793 (Swissprot: 38.47%), 35,889 (COG: 24.31%), 62,060 (KEGG: 42.04%), 13,766 (GO: 9.33%), and 63,945 (Interpro: 43.32%) unigenes were annotated in the German chamomile transcriptome using the seven functional databases (Table 2).
Overall, 29,222 (24.93%) unigenes in the German chamomile transcriptome were assigned to 25 COG categories (Supplementary Figure 1 A). Among these groups, unigenes belonging to “general function prediction” occupied the largest part (8,163), followed by transcription (4,382). In addition, 35,889 of the total 117,203 Roman chamomile unigenes were classified into 25 COG categories. The assignments (10,007) were mostly enriched in the ‘‘general function prediction’’, followed by the “transcription” clusters (5,192) (Supplementary Figure 1 B). KEGG pathway analysis was performed to further predict gene function in the biological pathways of the assembled unigenes in the German chamomile and Roman chamomile transcriptomes. In total, 62,060 unigenes in German chamomile were assigned to 135 signal pathways. Among these, 1,520 unigenes were annotated in “Metabolism of terpenoids and polyketides”. Also, 52,423 unigenes in Roman chamomile were categorized into 136 pathways, and 1,633 unigenes were annotated in “Metabolism of terpenoids and polyketides” (Supplementary Figure 2).
Identification of DEGs and the further analysis on the terpenoids pathway
We identified DEGs by comparing the FPKM (Fragment Per Kilobase of exon model per Million mapped reads) values between the different libraries; thresholds were log2 fold-change >1 and FDR (False Discovery Rate) ≤0.001 [28]. We used the number of DEGs mapping to a pathway/total number of genes mapped to this pathway (rich factors) to estimate the relative degree of enrichment in these pathways. The maximum rich factor (0.5) for pathways in the MC_DF vs. MC_RF comparison was benzoxazinoid biosynthesis, followed by Zeatin biosynthesis (0.38), sesquiterpenoid and triterpenoid biosynthesis (0.35), and flavone and flavonol biosynthesis (0.34). Also, the rich factor of terpenoid backbone biosynthesis was 0.29 and diterpenoid biosynthesis was 0.22. In CN_DF vs. CN_RF, the top three were the ribosome (0.22), vancomycin resistance (0.12), and terpenoid backbone biosynthesis (0.11). In CN_RF vs. MC_RF the top three were benzoxazinoid biosynthesis (0.24), histidine metabolism (0.23), and photosynthesis - antenna proteins (0.22); in addition, carotenoid biosynthesis was 0.19 and limonene and pinene degradation was 0.15. In CN_DF vs. MC_DF, the top three were glucosinolate biosynthesis (0.089), histidine metabolism (0.084), and benzoxazinoid biosynthesis (0.081); terpenoid backbone biosynthesis was 0.057 (Figure 3 and Additional file 2). DEGs identified by comparing RF with DF clustered in the pathways for disease and pest resistance and terpenoid metabolism. The DEGs between German chamomile and Roman chamomile were clustered in disease and pest resistance pathways. A higher rich factor between two different stages in either German or Roman chamomile gives an indication that the secondary metabolism involved in the terpenoid metabolic pathways is different.
Meanwhile, DEGs were identifed in the terpenoid biosynthetic pathways of German and Roman chamomile. A schematic representation of the DEGs and annotated genes in the biosynthetic pathways for these compounds is shown in Figure 4. Terpenoid biosynthesis utilizes isoprenoid precursors from terpenoid backbone biosynthesis (MVA and MEP pathways). In the MVA pathway, two AACT, four HMGS, and two HMGR were up regulated in MC-DF vs. MC-RF. In the MEP pathway, one DXS (1-deoxy-D-xylulose-5-phosphate synthase) and two DXR (1-deoxy-D-xylulose-5-phosphate reductoisomerase) genes were down regulated in MC-DF vs MC-RF. Also, DXS and DXR may play rate-limiting roles in controlling metabolic flux through the MEP pathway [29]. A previous study reports that DXS and HDR are both encoded by small gene families in higher plants, and influence the accumulation of downstream isoprenoids [30]. For example, the three DXS genes in maize (Zea mays) encode functional enzymes, and two different HDR genes have been identified in loblolly pine (Pinus taeda) [31, 32].
In the MC-DF vs. MC-RF comparison, AACT and HMGS are both up-regulated in the terpenoid biosynthesis pathway. There are two and four down-regulated DEGs related to GPPS (geranyl diphosphate synthase) and FPPS (farnesyl pyrophosphate synthase), respectively, and DEGs related to GGPPS (geranylgeranyl pyrophosphate synthase), SS (squalene synthase), and PSY (phytoene synthase) were all up-regulated. Also in MC-DF vs MC-RF, 17 DEGs related to terpene synthase (TPS) were identified, and among them, the relative expression of 15 TPS genes was down-regulated. Terpenoid biosynthesis in plants is catalyzed by a family of enzymes known as terpene synthases that convert prenyl diphosphates to various subclasses of terpeneoids [33]. In the CN-DF vs. CN-RF comparison, there were one and three DEGs related to FPPS and TPS that were down-regulated in expression. These results indicate that the gene expression levels of the rate-limiting upstream enzymes and a variety of TPS genes downstream could result in the observed differences in both the variety and contents of terpenoids in flowers of German and Roman chamomile.
TFs play a diverse role in regulating secondary metabolism pathways by turning genes on and off in plants [34]. We searched for candidate TF genes in the transcriptomes of German chamomile and Roman chamomile, and identified 94 differentially expressed transcription factor genes (52 up-regulated and 42 down-regulated) in CN_DF vs. CN_RF. We also identified 59 differentially expressed (31 and 28 up- and down-regulated, respectively) transcription factor genes in CN_DF vs. MC_DF, 328 (167 up-regulated and 161 down-regulated) in CN_RF vs. MC_RF, and 479 (267 up-regulated and 212 down-regulated) in the in MC_DF vs. MC_RF comparison (Additional file 3).
PPIN construction and analysis between German chamomile and Roman chamomile
A total of 477 and 505 interaction pairs involved in terpenoid biosynthesis pathway were identified from German chamomile and Roman chamomile, respectively. We selected the interaction proteins for further analysis, and the PPIN related to terpenoid biosynthesis pathway were shown in figure 5. And we found these proteins were involved in “Sesquiterpenoid and triterpenoid biosynthesis”, “Brassinosteroid biosynthesis”, “Propanoate metabolism”, “Steroid biosynthesis”, “Carotenoid biosynthesis” and “Valine, leucine and isoleucine degradation” in German chamomile and Roman chamomile. In addition, we also identified three, three and four proteins involved in “Glycosphingolipid biosynthesis - ganglio series”, “Diterpenoid biosynthesis” and “beta-Alanine metabolism” in German chamomile. Meanwhile, we identified three, one, two and one proteins involved in “Butanoate metabolism”, “Stilbenoid, diarylheptanoid and gingerol biosynthesis”, “Flavone and flavonol biosynthesis” and “Tryptophan metabolism” in Roman chamomile.
We found amount of CYPs interaction with protein involved in terpenoid biosynthesis pathway in PPIN of German chamomile and Roman chamomile. And there were more types and amount of CYPs interaction protein in German chamomile than that in Roman chamomile. In addition, we found transcription factors such as MYB, WD-40 repeat and Zinc finger protein in PPIN of German chamomile and Roman chamomile. And the types and amount were also different. Moreover, interactions between proteins related to terpenoid biosynthesis pathway between German chamomile and Roman chamomile. (Figure 5 and Additional file 4).
Comparison of nucleotide sequence identity and divergence analysis between German and Roman chamomile
A high level of nucleotide sequence similarity was found in genes that are homologous between German and Roman chamomile, with 54% of the genes sharing >75% identity. We obtained 60,338 transcripts that were common to German Chamomile and Roman chamomile. We also detected 56,865 and 87,278 special transcripts in German and Roman chamomile, respectively (Figure 6 A). In addition, the special transcripts identified in the two species were used as queries to search the KEGG databases; we found that 2,960 (43%) and 3,861 (49%) of the unigenes are involved in secondary metabolite biosynthesis in German and Roman chamomile, respectively. Also, 1,000 unigenes (46%) were related to plant-pathogen interactions in German chamomile and 1,295 (50%) in Roman chamomile. A total of 27,886 putative ortholog pairs were identified between German and Roman chamomile. Of these ortholog pairs, 584 had a Ka/Ks value more than 1, suggesting that they were under positive selection. Among these positive selection genes, there were 1 unigenes (CMK: 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase) related to terpenoids backbone biosynthesis, 1 unigenes (germacrene D synthase) related to sesquiterpenoids biosynthesis, 2 unigenes ((E)-beta-ocimene synthase and (+)-neomenthol dehydrogenase) related to monoterpenoids biosynthesis and 1 unigenes (Artemisia annua cytochrome P450 mono-oxygenase) related to diterpenoids biosynthesis (Table 3). Since the Ka/Ks value is widely used to detect selective pressure (environmental factors) acting on protein-coding sequences, rapid evolution of the genes involved in terpenoids backbone biosynthesis, monoterpenoids biosynthesis, sesquiterpenoids biosynthesis and diterpenoids biosynthesis might be associated with adaptive selection between German and Roman chamomile.
Comparison analysis in German chamomile, Roman chamomile and other composite family plants
A total of 92,393 gene families were identified by orthofinder, among them 1,232 single copay genes were identified. There were 6 gene families specifically in German chamomile, and 8 gene families specifically in Roman chamomile. In gene families which specifically from German chamomile, we found one gene family related to disease resistance. A total of 5, 718 gene families were specifically both in German chamomile and Roman chamomile (Figure 6B). The KEGG enrichment results showed that gene families specifically both in German chamomile and Roman chamomile were mainly enriched in ‘Metabolism’, and 472 gene families were enriched in ‘Biosynthesis of secondary metabolites’ (Additional file 5). The phylogenetic tree of single copay genes showed that German chamomile and Roman chamomile had a high identity to C. nankingense (Figure 6C). In addition, we analyzed unigenes of German chamomile, Roman chamomile and other composite family plants based on the Pfam database, and the results showed that unigenes related to TFs such as WD-repeat and Zinc−finger were more than other composite family plants. Furthermore, there were more MATE and ABC transporter in German chamomile and Roman chamomile (Figure 6D).
qRT-PCR analysis and correlation analysis
We used qRT-PCR (quantitative real-time PCR) analyses to examine the expression levels of 21 DEGs from the transcriptomic libraries of German chamomile (14 DEGs) and Roman chamomile (7 DEGs). Of these genes, 19 showed differential expression levels that matched the RNA-seq data. The agreement rate was 90.48% (Supplementary Figure 3 and Additional file 6). These results verified the accuracy of the transcriptome data. The three samples of German chamomile and Roman chamomile showed good correlation; the correlation index was >0.84 for CN-DF-1, CN-DF-2 and CN-DF-3, and it was >0.99 for CN-RF-1, CN-RF-2, and CN-RF-3; MC-DF-1, MC-DF-1, MC-DF-2 and MC-DF-3; and MC-RF-1, MC-RF-2 and MC-RF-3, which indicated that the transcriptome data was relatively accurate. However, the correlation index between MC-DF and CN-DF was ~0.8, and the correlation index between MC-RF and CN-RF was ~0.85. The correlation index between MC-DF and MC-RF was only 0.48-0.49, and between CN-DF and CN-RF it was 0.5-0.63 (Supplementary Figure 4).