Determination of essential oil constituents in flowers of German and Roman chamomile
GC-MS analysis indicated that the essential monoterpenoid and sesquiterpenoid constituents in the flowers of German chamomile and Roman chamomile are significantly different (Figure 1 and Figure 2, Additional file 1). The relative quantities of monoterpenoids and sesquiterpenoids in disk flowers were always higher than in ray flowers in both chamomile species. The main compounds present in German chamomile essential oil are sesquiterpenoids, such as α-Bisabolol oxide A, Chamazulene, α-Bisabolol oxide B, α-Bisabolol, and Espatulenol. The major constituents in Roman chamomile essential oil are n-Hexadecanoic acid, linoleic acid, and other esters. The levels of α-Bisabolol oxide A, α-Bisabolol oxide B, and Chamazulene in German chamomile were 50-, 30-, and 10-fold higher than in Roman chamomile. In addition, the contents of these compounds in disk flowers were 2 to 10-fold greater than in the ray flowers of the two chamomiles. These results are consistent with previous findings reported by Yao et al..
De novo transcriptome assembly and comparative RNA-seq analyses
After removing the terminal adaptor sequences, duplicated and ambiguous sequences, and low-quality reads, we generated approximately 106.72 Gb of Illumina RNA-seq data from mRNA extracted from ray flowers and disk flowers of German and Roman chamomile. We used 53.31 Gb and 53.41 Gb of clean read data in the transcriptome assemblies for German and Roman chamomile, respectively. The Q20 and Q30 scores were greater than 98% and 95%, respectively, for the German and Roman chamomile transcriptome data. The final German chamomile cDNA assembly consisted of 117,203 unigenes; the average length was 1,056 bp and the N50 length was 1,686 bp. The final assembly for Roman chamomile consisted of 147,616 unigenes with an average length of 914 bp and an N50 length of 1,506 bp (Table 1). There were 50,881 (43.41%) and 48,957 (33.17%) unigenes >1,000 bp in length in German and Roman chamomile, respectively.
Functional annotation and classification
All unigenes from both German chamomile and Roman chamomile were annotated using several public databases: Nr (NCBI non-redundant protein sequences), Nt (non-nucleotide), Swiss-Prot, COG (Clusters of Orthologous Groups of proteins), KEGG (the Kyoto Encyclopedia of Genes and Genomes), and GO (Gene Ontology). There were 89,796 (60.83%) and 73,699 (62.88%) sequences annotated in the German chamomile and Roman chamomile transcriptomes, respectively. We annotated 68,325 (NR: 58.30%), 47,469 (NT: 40.50%), 48,902 (Swissprot: 41.72%), 29,222 (COG: 24.93%), 52,423 (KEGG: 44.73%), 11,385 (GO: 9.71%), and 51,613 (Interpro: 44.04%) unigenes in the Roman chamomile transcriptome. Similarly, 80,594 (NR: 54.60%), 49,537 (NT: 33.56%), 56,793 (Swissprot: 38.47%), 35,889 (COG: 24.31%), 62,060 (KEGG: 42.04%), 13,766 (GO: 9.33%), and 63,945 (Interpro: 43.32%) unigenes were annotated in the German chamomile transcriptome using the seven functional databases (Table 2).
Overall, 29,222 (24.93%) unigenes in the German chamomile transcriptome were assigned to 25 COG categories (Supplementary Figure 1A). Among these groups, unigenes belonging to “general function prediction” occupied the largest part (8,163), followed by “transcription” (4,382). In addition, 35,889 of the total 117,203 Roman chamomile unigenes were classified into 25 COG categories. The assignments (10,007) were mostly enriched in the ‘‘general function prediction’’, followed by the “transcription” clusters (5,192) (Supplementary Figure 1 B). KEGG pathway analysis was performed to further predict gene function in the biological pathways of the assembled unigenes in the German chamomile and Roman chamomile transcriptomes. In total, 62,060 unigenes in German chamomile were assigned to 135 signal pathways. Among these, 1,520 unigenes were annotated in “Metabolism of terpenoids and polyketides”. Also, 52,423 unigenes in Roman chamomile were categorized into 136 pathways, and 1,633 unigenes were annotated in “Metabolism of terpenoids and polyketides” (Supplementary Figure 2).
Identification of DEGs and further analysis of the terpenoid biosynthesis pathway
We identified DEGs by comparing the FPKM (Fragment Per Kilobase of exon model per Million mapped reads) values between the different libraries; thresholds were log2 fold-change >1 and the FDR (False Discovery Rate) was ≤0.001 . We used the number of DEGs mapping to a pathway/total number of genes mapped to the pathway (enrichment factors) to estimate the relative degree of enrichment in these pathways. The maximum enrichment factor (0.5) for pathways in the MC_DF (German chamomile disk florets) vs. MC_RF (German chamomile ray florets) comparison was benzoxazinoid biosynthesis, followed by zeatin biosynthesis (0.38), sesquiterpenoid and triterpenoid biosynthesis (0.35), and flavone and flavonol biosynthesis (0.34). Also, the enrichment factors for terpenoid backbone biosynthesis and diterpenoid biosynthesis were 0.29 and 0.22, respectively. In the CN_DF (Roman chamomile disk florets) vs. CN_RF (Roman chamomile ray florets) comparison, the top three were the ribosome (0.22), vancomycin resistance (0.12), and terpenoid backbone biosynthesis (0.11). In CN_RF vs. MC_RF the top three were benzoxazinoid biosynthesis (0.24), histidine metabolism (0.23), and photosynthesis - antenna proteins (0.22); in addition, carotenoid biosynthesis was 0.19 and limonene and pinene degradation was 0.15. In CN_DF vs. MC_DF, the top three were glucosinolate biosynthesis (0.089), histidine metabolism (0.084), and benzoxazinoid biosynthesis (0.081); terpenoid backbone biosynthesis was 0.057 (Figure 3 and Additional file 2). DEGs identified by comparing RF with DF clustered in the pathways for disease and pest resistance and terpenoid metabolism. The DEGs between German chamomile and Roman chamomile were clustered in disease and pest resistance pathways. The enrichment factors in the MC_DF vs. MC_RF comparison were higher than in the CN_DF vs. CN_RF comparison, and the enrichment factors in CN_RF vs. MC_RF were higher than in CN_DF vs. MC_DF.
We also identifed DEGs in the terpenoid biosynthetic pathways of German and Roman chamomile. A schematic representation of the DEGs and annotated genes in the biosynthetic pathways for these compounds is shown in Figure 4. Terpenoid biosynthesis utilizes isoprenoid precursors from terpenoid backbone biosynthesis (MVA and MEP pathways). In the MVA pathway, two AACT, four HMGS, and two HMGR were up regulated in MC-DF vs. MC-RF. In the MEP pathway, one DXS (1-deoxy-D-xylulose-5-phosphate synthase) and two DXR (1-deoxy-D-xylulose-5-phosphate reductoisomerase) genes were down regulated in MC-DF vs MC-RF. Also, DXS and DXR may play rate-limiting roles in controlling metabolic flux through the MEP pathway . A previous study reports that DXS and HDR are both encoded by small gene families in higher plants, and influence the accumulation of downstream isoprenoids . For example, the three DXS genes in maize (Zea mays) encode functional enzymes, and two different HDR genes have been identified in loblolly pine (Pinus taeda) [31, 32].
In the MC-DF vs. MC-RF comparison, AACT and HMGS are both up-regulated in the terpenoid biosynthesis pathway. There are two and four down-regulated DEGs related to GPPS (geranyl pyrophosphate synthase) and FPPS (farnesyl pyrophosphate synthase), respectively, and DEGs related to GGPPS (geranylgeranyl pyrophosphate synthase), SS (squalene synthase), and PSY (phytoene synthase) were all up-regulated. Also in MC-DF vs MC-RF, 17 DEGs related to terpene synthase (TPS) were identified, and among them, the relative expression of 15 TPS genes was down-regulated. Terpenoid biosynthesis in plants is catalyzed by a family of enzymes known as terpene synthases that convert prenyl diphosphates to various subclasses of terpeneoids . In the CN-DF vs. CN-RF comparison, there were one and three DEGs related to FPPS and TPS that were down-regulated in expression. These results indicate that the gene expression levels of the rate-limiting upstream enzymes and a variety of TPS genes downstream could result in the observed differences in both the variety and contents of terpenoids in flowers of German and Roman chamomile.
TFs play a diverse role in regulating secondary metabolism pathways by turning genes on and off in plants . We searched for candidate TF genes in the transcriptomes of German chamomile and Roman chamomile, and identified 94 differentially expressed transcription factor genes (52 up-regulated and 42 down-regulated) in CN_DF vs. CN_RF. We also identified 59 differentially expressed (31 and 28 up- and down-regulated, respectively) transcription factor genes in CN_DF vs. MC_DF, 328 (167 up-regulated and 161 down-regulated) in CN_RF vs. MC_RF, and 479 (267 up-regulated and 212 down-regulated) in the in MC_DF vs. MC_RF comparison (Additional file 3).
Construction and analysis of a protein–protein interaction network (PPIN) between German chamomile and Roman chamomile
A total of 477 and 505 interacting pairs involved in terpenoid biosynthesis were identified from German chamomile and Roman chamomile, respectively. We selected the interacting proteins for further analysis, and the PPIN related to the terpenoid biosynthesis pathway is shown in Figure 5. We found that these proteins are involved in “sesquiterpenoid and triterpenoid biosynthesis”, “brassinosteroid biosynthesis”, “propanoate metabolism”, “steroid biosynthesis”, “carotenoid biosynthesis” and “valine, leucine and isoleucine degradation” in German chamomile and Roman chamomile. In addition, we also identified three, three, and four proteins involved in “glycosphingolipid biosynthesis - ganglio series”, “diterpenoid biosynthesis”, and “beta-alanine metabolism” in German chamomile. In Roman chamomile, we identified three, one, two, and one proteins involved in “butanoate metabolism”, “stilbenoid, diarylheptanoid and gingerol biosynthesis”, “flavone and flavonol biosynthesis” and “tryptophan metabolism”.
We found a number of CYPs (cytochrome P450 enzymes) that interact with proteins involved in terpenoid biosynthesis in the PPINs for German chamomile and Roman chamomile. There were more types and more CYP-protein interactions in German chamomile than in Roman chamomile. In addition, we found transcription factors such as MYB, WD-40 repeat, and zinc finger proteins in the PPINs of German chamomile and Roman chamomile, and the types and number of interactions were also different (Figure 5 and Additional file 4).
Comparison of nucleotide sequence identity and divergence analysis between German and Roman chamomile
A high level of nucleotide sequence similarity was found in genes that are homologous between German and Roman chamomile, with 54% of the genes sharing >75% identity. We obtained 60,338 transcripts that were common to German Chamomile and Roman chamomile. We also found 56,865 and 87,278 transcripts that are specific to German and Roman chamomile (special transcripts), respectively (Figure 6A). In addition, the special transcripts identified in the two species were used as queries to search the KEGG database; we found that 2,960 out of 6,811 (43%) and 3,861 out of 7,772 (49%) of the unigenes are predicted to be involved in secondary metabolite biosynthesis in German and Roman chamomile, respectively. Also, 1,000 unigenes out of 2,194 (46%) 1,295 out of 2,588 (50%) were related to plant-pathogen interactions in German and Roman chamomile, respectively (Additional file 5). A total of 27,886 putative ortholog pairs were identified between German and Roman chamomile. Of these ortholog pairs, 584 had Ka/Ks values >1, suggesting that they are under positive selection. Among these positive selection genes, there was one unigene (CMK: 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase) related to terpenoid backbone biosynthesis, one unigene (germacrene D synthase) related to sesquiterpenoid biosynthesis, two unigenes ((E)-beta-ocimene synthase and (+)-neomenthol dehydrogenase) related to monoterpenoid biosynthesis and one unigene (Artemisia annua cytochrome P450 mono-oxygenase) related to diterpenoid biosynthesis (Table 3). Since the Ka/Ks ratio is widely used to detect selective pressure (environmental factors) acting on protein-coding sequences, rapid evolution of the genes involved in terpenoid backbone biosynthesis, monoterpenoid biosynthesis, sesquiterpenoid biosynthesis, and diterpenoid biosynthesis could be associated with adaptive selection between German and Roman chamomile.
Comparative transcriptomic analysis of German chamomile, Roman chamomile, and other studied species
A total of 92,393 gene families were identified by orthofinder, among which there were 1,232 single-copy genes. There were six gene families specific to German chamomile, and eight gene families were specific to Roman chamomile. In the German chamomile-specific gene families, we identified one gene family related to disease resistance. A total of 5,718 gene families were found specifically in both German chamomile and Roman chamomile (Figure 6B). The KEGG enrichment results showed that gene families specific to both German chamomile and Roman chamomile were mainly enriched in ‘metabolism’, and 472 gene families were enriched in ‘biosynthesis of secondary metabolites’ (Additional file 6). The phylogenetic tree constructed from single-copy gene sequences shows that German chamomile and Roman chamomile are in a clade with Chrysanthemum nankingense (Figure 6C). In addition, we analyzed unigenes from German chamomile, Roman chamomile and other studied plants based on the Pfam database, and the results showed that unigenes encoding TFs such as WD-repeat and zinc-finger proteins were more prevalent in German chamomile and Roman chamomile than in other studied specie. Furthermore, there were more MATE and ABC transporter genes in German chamomile and Roman chamomile (Figure 6D).
qRT-PCR analysis and correlation analysis
We used qRT-PCR (quantitative real-time PCR) analyses to examine the expression levels of 21 DEGs from the transcriptomic libraries of German chamomile (14 DEGs) and Roman chamomile (7 DEGs). Of these genes, 19 showed differential expression levels that matched the RNA-seq data. The agreement rate was 90.48% (Supplementary Figure 3 and Additional file 7). These results verified the accuracy of the transcriptome data. The three replicate samples from both German chamomile and Roman chamomile showed good correlations; the correlation index was >0.84 for CN-DF-1, CN-DF-2 and CN-DF-3, and it was >0.99 for CN-RF-1, CN-RF-2, and CN-RF-3; MC-DF-1, MC-DF-2 and MC-DF-3; and MC-RF-1, MC-RF-2 and MC-RF-3, which indicated that the transcriptome data was relatively accurate. However, the correlation index between MC-DF and CN-DF was ~0.8, and the correlation index between MC-RF and CN-RF was ~0.85. The correlation index between MC-DF and MC-RF was only 0.48-0.49, and it was 0.5-0.63 between CN-DF and CN-RF (Supplementary Figure 4).