To select the optimal stages for transcriptome sequencing, the oil accumulation patterns during V. galamensis seed development were analyzed (Fig. 1). The results showed that only a little oil was produced in the developing seeds at 10 and 17 DAP (1.77% and 1.84%, respectively), but a most noticeable increase in seed oil content occurred from 24 to 38 DAP. Thereafter, the growth rate of oil content changed gradually till 45 DAP when the maximum oil content (35.8%) was reached (Fig. 1a). We further monitored the dynamic transition for fatty acid components in V. galamensis seed oils, including vernolic acid (EFA), palmitic acid (16:0), stearic acid (18:0), oleic acid (18:1), linoleic acid (18:2) and linolenic acid (18:3). On the whole, the percentage of EFA showed a similar increasing trend with total oil accumulation in developing V. galamensis seeds. Accompanying the accumulation of EFA, oleic acid and linoleic acid contents were largely decreased. Palmitic acid level gradually declined during seed development. Additionally, there was little change in stearic acid and linolenic acid which make up a lower relative proportion in developing seeds (Fig. 1b).
To identify candidate genes associated with EFA biosynthesis and metabolism, we selected a mixed sample (T01) and three seed developing stages at 17 (T02), 38 (T03), and 45 DAP (T04), which are consistent with the initial, fast and final stages of EFA accumulation, for transcriptome sequencing. 12 cDNA libraries considering three biological replicates at each sample generated around 265 million raw reads. After quality-control processing, a total of 77.6 Gb nucleotide bases were obtained. The average Q30 base ratio (sequencing quality value) and GC percentage for each library was 92.75 and 44.94%, respectively (Table 1). These reads were then de novo assembled by Trinity toolkit, resulting in 67,114 unigenes with a mean length of 729.96 nt (nucleotides) and N50 length of 1316 nt (Table 2). The assembled unigenes mainly ranged from 200 to 2000 nt, and unigene numbers had a trend of gradual decrease as the length of unigene increased. Of the unigenes, 25,321 (37.73%) sequences had a shorter length range between 200 to 300 nt, 9132 (13.61%) transcripts were between 1000 to 2000 nt, and 5261 (7.84%) sequences were longer than 2000 nt in length (Fig. 2a).
Functional annotation, classification and DEG analysis
Based on sequence similarity search, 35,680 (53.16%) unigenes were functionally annotated in at least one public database, whereas the function of rest sequences (31,434) remained unknown (Table 3). The E-value distribution of BLASTX hits showed that 38% of unigenes shared strong homology to previously reported sequences (< 1.0E-100) (Fig. 2b). Unigenes from V. galamensis had the most similarities with genes from Vitis vinifera (7981), followed by Solanum lycopersicum (4183) and Theobroma cacao (3088) (Fig. 2c). Overall, the longer sequences were more easily annotated with BLAST hits (Fig. 2d).
Table 3
Summary of functional annotations
Annotated Database
|
Annotated Unigene Number
|
300<=length<1000
|
length>=1000
|
COG_Annotation
|
12043
|
3975
|
5884
|
GO_Annotation
|
26882
|
10248
|
11191
|
KEGG_Annotation
|
8371
|
3254
|
3363
|
KOG_Annotation
|
20521
|
7701
|
8530
|
Pfam_Annotation
|
23945
|
8570
|
11869
|
Swissprot_Annotation
|
23667
|
9337
|
10131
|
nr_Annotation
|
35308
|
14017
|
13682
|
All_Annotated
|
35680
|
14156
|
13705
|
Based on Nr annotation, a total of 26,882 (40.05%) unigenes could be assigned to three main GO categories, including biological process (21,364 unigenes, 31.83%), molecular function (21,031 unigenes, 31.34%) and cellular component (17,749 unigenes, 26.45%). Within the biological process, metabolic process and cellular process represented the dominant GO terms. In category of cellular component, cell part and cell were the two top sub-categories. The terms related to binding and catalytic activity were most abundant in molecular function (Fig. 3a). Eukaryotic Orthologous Groups (KOG) analysis showed that the matched 28,896 unigenes were divided into 26 function classes, with the signal transduction mechanism (2406 unigenes) as the largest group (Fig. 3b). From KEGG system, 8371 unigenes were annotated to 118 metabolic pathways (Additional file 2). The most predominant pathways were ribosome (ko03010, 771), followed by oxidative phosphorylation (ko00190, 350) and protein processing in endoplasmic reticulum (ko04141, 317). Moreover, we also annotated several regulation networks of lipid accumulation, including fatty acid metabolism (ko00071, 101), biosynthesis of unsaturated fatty acids (ko01040, 77), glycerolipid metabolism (ko005610, 72) and alpha-linolenic acid metabolism (ko00592, 21).
In our study, FPKM value was used to quantify the expression level of each unigene. Gene expression levels were further analyzed by pairwise comparisons across the developmental stages (initial, fast and final phases of EFA accumulation) to identify differentially expressed genes (DEGs). Our criteria led to the identification of 2345 DEGs across three developing samples. There were 531, 464, and 121 genes that were differentially expressed specifically in the group of 38 DAP vs 17 DAP, 45 DAP vs 38 DAP, and 45 DAP vs 17 DAP, respectively (Fig. 4a). In the comparison of 38 DAP vs 17 DAP, there existed the largest number of DEGs (1629) and up-regulated genes (999). Only 441 DEGs were identified in 45 DAP vs 17 DAP, including 138 down- and 303 up-regulated genes (Fig. 4b). The results showed that a more remarkable change in gene expression occurred in 38 DAP vs 17 DAP, as compared to 45 DAP vs 38 DAP. It was noteworthy that the number of up-regulated genes outnumbered the down-regulated genes in all comparison groups.
To explore the functional differences among DEGs in developing V. galamensis seeds, GO and KEGG enrichment analyses were conducted. Firstly, 651 annotated DEGs were assigned to 42 functional subgroups within three main GO categories, implying that these DEGs were functionally involved in diverse physiological processes (Fig. 5). Among these unigenes, the most enriched terms were those involved in regulatory activity for DNA replication, G2/M transition of mitotic cell cycle, and histone phosphorylation. Furthermore, many DEGs were related with the process of fatty acid hydrolysis, fatty acid alpha-oxidation and unsaturated fatty acid biosynthesis (Additional file 3). We highlighted that these enzymes may play critical roles in determining the fatty acid composition in V. galamensis seeds. Further, 495 DEGs were mapped to 87 KEGG pathways (Additional file 3). After pathway enrichment, we noticed that abundant DEGs were included in fatty acid biosynthesis, biosynthesis of unsaturated fatty acid, glycerolipid metabolism, glycosphingolipid biosynthesis, and sphingolipid metabolism, which were closely related to lipid accumulation in V. galamensis seed.
Characterization of genes contributing to high EFA accumulation in V. galamensis seeds
Although EFA is greatly enriched in V. galamensis seed oil, the molecular mechanism underlying its high accumulation is still unclear. The primary purpose of our study is to uncover crucial genes involved in the high accumulation of EFA, and we expect these genes would be used in genetic manipulation for commercial production of industrial-valued epoxy oil in traditional oilseed crops. Based on functional annotations, we identified 116 unigenes relevant to epoxy oil metabolism (Additional file 4), including the previously reported VgDGAT1 and VgDGAT2 [13], further showing that transcriptome analysis is a powerful tool for mining genes of interest. We then fully characterized all critical steps and enzymes involved in EFA-enriched oil biosynthetic pathway (Fig. 6).
It is generally known that FA synthesis begins with the transformation of acetyl-CoA to molonyl-CoA catalyzed by rate-limiting acetyl-CoA carboxylase (EC: 6.4.1.2, ACCase), an enzyme complex composed of four subunits: alpha-carboxyltransferase (α-CT), beta-carboxyltransferase (β-CT), biotin carboxyl carrier protein (BCCP) and biotin carboxylase (BC). We detected nine unigenes encoding ACCase subunits (one for α-CT, one for β-CT, four for BCCP, and three for BC protein, respectively). The expression of these genes exhibited a coordinated pattern, with the highest expression level observed at 38 DAP. Next, an acyl carrier protein (ACP) was covalently linked to malnonyl group by malonyl-CoA:ACP transacylase (EC: 2.3.1.39, MCAT) to form malonyl-ACP, an important substrate for following acyl chain elongation. During FA elongation process, butyryl-ACP was initially generated after the first condensation cycle. This reaction was completed within the participation of four successive enzymes: beta-ketoacyl-ACP synthase III (EC: 2.3.1.180, KAS III), NADPH-dependent beta-ketoacyl-ACP reductase (EC: 1.1.1.100, KAR), 3-Hydroxyacyl-ACP dehydratase (EC: 4.2.1.159, HAD) and enoyl-ACP reductase (EC: 1.3.1.9, EAR). Butyryl-ACP was further elongated to C16-ACP during the following six cyclic reactions, each of which used the same enzymes as the former but KAS III was replaced by KAS I (EC: 2.3.1.41). Whereas the last elongation of C16-ACP to C18-ACP required a third condensing enzyme (KAS II, EC: 2.3.1.179). We detected nearly all unigenes involved in the biosynthesis of C18-ACP, but MCAT was missed in our data, which may be due to its rare expression abundance in V. galamensis seed. Considering malonyl-ACP produced from MCAT was an important substrate for following acyl chain elongation, it is possible that other unknown genes may determine the transfer of malonyl moiety to ACP. Most 18:0-ACP obtained from final elongation process could be desaturated by stearoyl-ACP desaturase (EC: 1.14.19.2, SAD). There are seven SAD family genes in Arabidopsis, and AtFAB2 (At2g43710) has a primary role in dehydrogenation reaction. We found that there existed only one SAD gene (homolog of AtFAB2) in V. galamensis seed, with the highest expression at 38 DAP. The elongation of FA was terminated when acyl group was released by acyl-ACP thioesterase (EC: 3.1.2.14, Fat), yielding free FA molecular. Two Fat gene families, FatA with a preference for unsaturated 18:1-ACP and FatB showing marked activity on saturated FAs, were detected with 1 and 4 unigenes, respectively. The expression levels of FatA and FatB exhibited opposing trends, and only FatA expression was the highest at 38 DAP. These free FAs were ultimately linked to CoA by long chain acyl-CoA synthetase (EC: 6.2.1.3, LACS) and exported to endoplasmic reticulum (ER) for modification. In our dataset, 22 unigenes encoding LACS were successfully annotated. Two unigenes, one for ER-localized LACS8 and one for plastid-localized LACS9, were deemed as DEGs throughout seed development.
In ER, most acyl-CoAs needed to be transferred to PC pools via lysophosphatidylcholine acyltransferase (EC: 2.3.1.23, LPCAT) for further desaturation or modification. This reaction and its reversible reacylation catalyzed by LPCAT was the key component of acyl editing. Generally, oleic acid (C18:1) was channeled into PC, and then sequentially desaturated by ER-associated FAD2 and FAD3 to form C18:2 and C18:3. V. galamensis seed contained 80% EFA, and this modified fatty acid was controlled by a divergent form of FAD2. In our transcriptome, we detected four unigenes encoding FAD2 desaturase. These genes were expressed at a relatively steady level, and quickly dropped at the final stage sampled, 45 DAP. We also discovered a novel FAD2 variant, which was designated VgFAD2-like. The FAD2-like enzyme was most closely related to delta 12 fatty acid epoxygenase identified in S. laevis (Fig. 7a), and this enzyme contained the typical histidine box domains (Fig. 7b). In addition, VgFAD2-like gene showed bell-shaped expression pattern with a peak at 38 DAP. The increased expression of this gene is coincided with the increased accumulation of EFA-containing TAG during V. galamensis seed development, suggesting that VgFAD2-like was likely a fatty acid epoxygenase in V. galamensis. FAD3 genes were detected with two copies, and their expressions were very low, especially when seeds progressed to mid- and late-stages. After modification, PC releases FA into acyl-CoA pool through a constant deacylation reaction by LPCAT. Alternatively, PC can remove its phosphocholine head group and then degrade into DAG by phosphatidylcholine:diacylglycerol cholinephosphotransferase (PDCT). Phospholipid:diacylglycerol acyltransferase (PDAT) can also directly transport FA from PC into the sn-3 position of DAG to produce TAG, which is referred to as PC-mediated TAG assembly (Fig. 6). The resulted FAs on CoA (including de novo synthesized, desaturated and modified FAs) were subsequently assembled into glycerol-3-phosphate (G3P) to form TAG, referred to as the Kennedy pathway. In acyl editing pathway, we isolated one unigene of LPCAT1 and two unigenes of LPCAT2, and only LPCAT1 gene was highly expressed throughout all developmental stages. For Kennedy pathway-related enzymes, we found that there were four unigenes encoding GPAT, five encoding LPAT, three encoding PAP and four encoding DGAT (including two DGAT1 and two DGAT2). Among 16 unigenes in this pathway, 9 unigenes exhibited a high transcript level in V. galamensis seed, and five unigenes substantially up-regulated their expression level, including two GPAT genes, one DGAT1 and two DGAT2 at 38 DAP. For PC-mediated TAG synthesis, PDAT1 was ubiquitously expressed in developing seeds at all stages, while PDAT2 showed a linear rise during seed development up to 38 DAP, and then decreased to a low level at 45 DAP. However, the expression level of PDCT was negligible in V. galamensis seed.
Once synthesized, TAG is protected in the form of oil bodies which is stabilized by phospholipid-containing proteins, such as oleosins, caleosins and steroleosins. We identified eight unigenes encoding oleosin, four encoding caleosin and two encoding steroleosin. In V. galamensis seeds, expression levels of oleosin genes were the most abundant, followed by caleosins and steroleosins. It is generally known that fatty acid β-oxidation is essential for catabolism of stored TAG in seed-oil storing plants. The suppression of the TAG-disassembling genes might be conducive to oil accumulation in oilseeds. Unexpectedly, we noticed the expression levels of multiple genes related to fatty acid β-oxidation, including acyl-CoA oxidase (ACX), enoyl-CoA hydratase (ECH), and 3-ketoacyl-CoA thiolase (KAT) displayed a significant up-regulation from 17 DAP to 38 DAP (Additional file 5). Finally, we examined the transcriptional profiles of certain transcription factors with potential roles in oil accumulation. WRINKLED1 (WRI1) is a master regulator of plant oil synthesis. This TF belongs to the Apetala2 ethylene response element binding factor (AP2/EREB) family. We found that VgWRI1 was quickly up-regulated in the early period of oil accumulation, then kept a relatively high expression in middle period, and finally declined in the late developing period in V. galamensis seed. These results supported a critical role for VgWRI1 in the early and middle period of V. galamensis oil accumulation. FUSCA3 (FUS3) and ABSCISIC ACID INSENSITIVE4 (ABI4), two other lipid-metabolism-related transcription factors, were highly expressed in the late period of oil accumulation in V. galamensis seeds. Except for these unigenes, we did not identify the transcripts for other TFs relevant to lipid metabolism, such as LEAFY COTYLEDON1 (LEC1), LEC2, and MYB89.
Transient expression of selected genes for enhancing EFA-enriched TAG accumulation in tobacco leaves
To determine whether the VgFAD2-like gene isolated from our transcriptome data has catalytic specificity for EFA formation, we transiently expressed this gene in tobacco leaves mediated by Agrobacterium infiltration. As expected, the presence of a large amount of EFA (8.2%) was detected in the GC trace, suggesting the potential role of VgFAD2-like enzyme in manipulation of high-valued expoy oil (Table 4). Besides, we also tested the effect of VgLPCAT1 and VgPDCT on EFA incorporation in TAG. Using tobacco transient assay system, coexpression of VgLPCAT1 and VgFAD2-like gene resulted in further increased amounts of EFA compared to the individual expression of VgFAD2-like gene alone (Table 4). By contrast, there was no significant change in fatty acid composition for additional expression of VgPDCT gene compared to the VgFAD2-like alone (Table 4). These observations indicated that VgLPCAT, rather than VgPDCT, contributed to more EFA accumulation in the TAG, possibly due to the increased efficiency of EFA removal from PC. In addition, the above-mentioned gene sets represent a likely promising resource for future metabolic engineering of EFA production in commercial oilseed crops like soybean.
Table 4
Fatty acid composition of Nicotiana benthamiana leaves transiently expressing the selected genes
Lines
|
Fatty acid composition (% of total)
|
|
|
|
16:0
|
18:0
|
18:1
|
18:2
|
18:3
|
20:0
|
EFA
|
Control
|
21.6 ± 0.3
|
7.3 ± 0.1
|
3.9 ± 0.1
|
16.4 ± 0.3
|
42.7 ± 0.7
|
7.1 ± 0.2
|
0
|
VgFAD2-like
|
21.5 ± 0.5
|
7.9 ± 0.3
|
4.1 ± 0.1
|
17.5 ± 0.5
|
33.2 ± 0.9
|
5.8 ± 0.1
|
8.6 ± 0.2
|
VgFAD2-like + VgPDCT
|
22.3 ± 0.5
|
7.6 ± 0.1
|
3.7 ± 0.4
|
17.9 ± 0.6
|
32.4 ± 1.2
|
6.4 ± 0.5
|
8.2 ± 0.1
|
VgFAD2-like + VgLPCAT1
|
21.4 ± 1.6
|
7.9 ± 0.9
|
4.3 ± 0.2
|
21.3 ± 0.9
|
22.5 ± 1.9
|
4 ± 0.3
|
19.4 ± 0.4
|
Error bars ± SD, n = 3. |
Validation of gene expression using quantitative real-time PCR
To validate the RNA-seq data, twelve genes (α-CT, FatB, SAD, FAD2, FAD2-like, FAD3, GPAT, LPCAT, PDAT, WRI1, Oleosin, ACX) associated with lipid biosynthesis were analyzed by qRT-PCR. The results showed that the gene expression levels detected by qRT-PCR analysis are consistent with those obtained by transcriptome data (Fig. 8, Additional file 6), thus confirming that the data from RNA-Seq were reliable.