RNA-Seq analysis of longan early SE aligned with the Dimocarpus longan Draft Genome
To provide a comprehensive understanding of longan SE at a transcriptional level, we sequenced the four cDNA libraries constructed from the four in vitro embryo developmental stages (NEC, EC, ICpEC, and GE, Figure 1). A total of 243,783,126 clean reads (comprising approximately 24.38 G of nucleotides) were obtained after data cleaning and quality checks. After aligned with longan reference genome , 48,798,229 (81.62%), 52,623,741 (81.1%), 48,346,067 (81.14%), and 48,871,200 (82.08%) reads in four cDNA libraries were mapped to longan reference genome, respectively. Among these, 44,655,772 (74.69%), 48,333,703 (74.50%), 44,490,292 (74.67%), and 44,924,511 (74.45%) reads were uniquely mapped to one location, respectively. Meanwhile, 34,380,246 (57.51%), 35,386,494 (54.54%), 30,535,088 (51.25%), and 29,214,788 (49.07%) reads in four cDNA libraries were mapped to gene, respectively. A summary of mapping statistics obtained for each sample is given in Table 1.
The transcribed regions/units of four different stages samples were constructed independently, generated 22,743, 19,745, 21,144, and 21,102 expressed transcripts, showed 57.89%, 50.26%, 53.82%, and 53.71% overlapped with longan genome (39,282 genes), respectively. After filtering out short sequences which less than 180 bp and low sequencing depth that lower than two, 1,935, 1,710, 1,816, and 1,732 novel transcripts in four samples were detected, respectively. Among these, 1,025, 819, 832, and 806 novel genes were identified as coding RNAs, and 910, 891, 984, and 926 novel genes were identified as non-coding RNAs in longan genome.
Alternative splicing (AS) events represented in our transcriptome were predicted by TopHat2. We analyzed the exon level of the four samples, 110,864, 103,200, 107,592, and 107,971 expressed exon were detected (Table 1) . A total of 130,354 AS events were checked across the four stages, including exon skipping, intron retention, alternative 5’ splicing and alternative 3’ splicing. The largest number of AS events were detected in GE (39,768), followed by ICpEC (36,446), and NEC (35,084), and the smallest in EC (19,056). Exon skipping is the least type in all samples, and intron retention is the most popular type of AS events in NEC, ICpEC and GE (Figure 2).
Global analysis of gene expression across the four distinct developmental processes
There were 22,743, 19,745, 21,144, and 21,102 expressed genes in NEC, EC, ICpEC, and GE stage. Among these, more than 75.3% of the expressed genes were present in all four developmental stages, while 2645 genes were only expressed in NEC. However, only 366, 505 and 588 genes were unique present in EC, ICpEC, and GE stage, respectively (Figure 3a), which suggested that distinct spatial transcriptional patterns were present in the four developmental processes. To evaluate the differences of molecular response among four samples, gene expression were normalized to FPKM by RSEM software. After filtering with FPKM>60, a total of 2961 (11.40%), 3445 (13.26%), 3445 (13.26%), and 3442 (13.25%) genes were highly expressed in NEC, EC, ICpEC, and GE, respectively (Table 2). The Top10 most enriched (FPKM) genes were range from 5476 to 58812, 2766 to 15114, 2343 to 10330, and 2091 to 4004, respectively. The top 20 most expressed genes from the four libraries were shown in Tables 3, SE-related genes such as leafy cotyledon 1 (LEC1), leafy cotyledon 1-like (L1L), Protodermal factor 1 (PDF1), lipid transfer protein (LTP), Heat-Shock protein 90 (HSP90), chitinase (CHI), Indole-3-acetic acid-amido synthetase GH3.6, glutathione S-transferase (GST), root meristem growth factor 3 (RGF3) were highly expressed in EC, ICpEC or GE stage.
To reveal the potential key genetic factors involved in early SE, we filtered the significantly differentially expressed genes (DEGs) with |log2fold change| ≥ 1 and FDR < 0.001 between these four pairwise comparisons as follow: NEC_vs_EC, EC_vs_ICpEC, EC_vs_GE, and ICpEC_vs_GE. Among these four comparisons (Figure 3b), a total of 10,642, 4,180, 5,846 and 1,785 DEGs were identified, respectively. Compared with NEC, EC had 4,887 up-regulated and 5,755 down-regulated genes. Compared with EC, ICpEC had 2,689 up-regulated and 1,491 down-regulated genes, GE had 3,451 up-regulated and 2,395 down-regulated genes. Compared with ICpEC, GE had 832 up-regulated and 953 down-regulated genes. DEGs analysis revealed that longan transcriptome undergoes significantly dynamic changes during SE, particularly during the transition period from NEC to EC. Therefore, the longan SE transcriptome datasets given here may serve as a valuable molecular resource for future studies.
Functional classification of DEGs base on GO and KEGG
To evaluate the potential functions of the DEGs, we used GO terms assignment to classify the functions of DEGs in pairwise comparisons under three GO main categories: biological process, cellular component and molecular function (Additional file 7: Fig. S1). In all pairwise comparisons, the term with the largest proportion in “biological process” was ‘metabolic process’, followed by ‘cellular process’, ‘single-organism process’, ‘respond to stimulus’ and ‘localization’, the term with the largest proportion in “cellular component” were ‘cell’ and ‘cell part’, followed by ‘organelle’ and ‘membrane’, the term with the largest proportion in “molecular function” was ‘catalytic activity’, followed by ‘binding’, ‘transporter activity’, ‘molecular transducer activity’ and ‘nucleic acid binding transcription factor activity’ .
To investigate the biological pathways of the DEGs, we used the KEGG database to classify the DEGs function with emphasis on biological pathways (Additional file 8: Fig. S2). According to KEGG annotation, 6516 DEGs (NEC_vs_EC) were assigned to 128 pathways, 2514 DEGs (EC_vs_ICpEC) were assigned to 126 pathways, 3555 DEGs (EC_vs_GE) were assigned to 126 pathways, 1062 DEGs (ICpEC_vs_GE) were assigned to 111 pathways. The annotated changes in all comparisons were mainly enriched in ‘metabolic pathway’ (21.38%, 22.43%, 23.12% and 25.52%, respectively), ‘biosynthesis of secondary metabolites’ (11.97%, 11.46%, 11.70% and 14.52%, respectively), ‘plant-pathogen interaction’ (8.01%, 8.23%, 7.59% and 6.40%, respectively) and ‘plant hormone signal transduction’ (5.22%, 5.41%, 5.40% and 8.38%, respectively) pathway. Furthermore, dozens of genes involved in ‘flavonoid biosynthesis’, ‘phenylpropanoid biosynthesis’, ‘zeatin biosynthesis’, ‘fatty acid biosynthesis’ and ‘biosynthesis of unsaturated fatty acids’.
Differential expression analysis of plant hormones signaling pathway related genes during longan SE
Based on the KEGG and other annotation, plant hormone signal transduction, zeatin biosynthesis and tryptophan metabolism were the representative pathways in our study. A large number of genes invovled in auxin (97 DEGs) and cytokinin (94 DEGs) biosynthesis and signal transduction pathway were differentially expressed when compared EC with NEC (Additional file 9: Fig. S3) and early SE. For example, the expression level of PIN1, IAA (IAA6, IAA6-like, IAA9, IAA11, IAA14, IAA16, IAA29, IAA31 and IAA33), ARFs (ARF1, ARF1-like, ARF2, ARF2-like, ARF5, ARF10, ARF16, ARF17, ARF18 , ARF18-1 and ARF24), GH3 (GH3.6, GH3.1, GH3.17), and three SAUR, genes involved in auxin signal transduction, were significantly up-regulated from NEC to EC, most of them remained highly expression in EC, ICpEC and GE stages. Nevertheless, AUX1, TIR1, IAA (IAA1, IAA4, IAA13, IAA26, IAA26-like, IAA27), ARFs (ARF4, ARF4-like, ARF10-like), GH3.9 and GH3.17-like, and 12 SAUR were mainly expressed in NEC stage and down-regulated in EC. From EC to ICpEC and GE stages, AUX1 (Dlo_024286.1, Dlo_031956.2), IAA (IAA4, IAA14, IAA26-like, IAA27, IAA13), ARFs (ARF4, ARF4-like, ARF10-like), two SAUR showed noteworthy up-regulated expression (Figure 4a). In IAA biosynthesis, except PAI, Trp synthesis key genes ASA, IGS, TSA, TSB, were up-regulated in EC and remained high expression during early SE. CYP83B1, one ST5a, five YUCCAs, three CYP71A13 and NIT showed NEC-specific expression pattern. Three YUCCAs, three AAO1, one NIT, CYP71A13 and three ST5a were up-regulated in EC and remained high during early SE, and YUCCA_Dlo_013505.1 kept up-regulated during early SE (Figure 4b).
As showed in Figure 4c, TRIT1, a gene involved in cis-zeatin synthesis was up-regulated from NEC to GE. CisZOG family involved in Cis-zeatin O-glycosylation were highly expressed in NEC, and significantly down-regulated from NEC to EC. During early SE, five CisZOG were up-regulated from EC to ICpEC, four CisZOG were down-regulated from ICpEC to GE. In trans-zeatin biosynthesis, six IPT1,4, five CYP735A, four CKX, three UGT76C were noteworthy down-regulated from NEC to EC; two IPT1,4, four CYP735A, one CKX, three UGT76C were up-regulated in EC. During early SE, IPT1,4 family, five CYP735A, two CKX, four UGT76C were up-regulated during early SE with minimal FPKM. Among the cytokinin signal pathway, two A_ARR, 10 B_ARR, 15 CRE1 were mainly expressed in NEC, and down-regulated in EC. One A_ARR, five B_ARR, seven CRE1 were up-regulated in EC. 13 CRE1, seven B_ARR and all A_ARR showed up-regulated expression during early SE, two B_ARR and five CRE1 were down-regulated during early SE (Figure 4d).
In addition, numerous genes involved in abscisic acid, gibberellin, ethylene, salicylic acid, jasmonic acid and brassinosteroid signal transduction pathway were differentially expressed during longan SE (Additional file 1: Table S1 a-h; Additional file 10: Fig. S4). Such an observation suggested an essential role of hormones and their complicated crosstalk during early SE. Therefore, the plant hormones signaling pathway may be the key regulator during longan early SE.
Flavonoids and fatty acid biosynthesis related genes were differential expressed during longan SE
Flavonoid biosynthesis and fatty acid biosynthesis were the representative KEGG pathways, a total of 125 significant DEGs were assigned to ‘flavonoid biosynthesis’ across the early SE processes (Figure 5). In the transition from NEC to EC, the flavonoid biosynthesis key genes, C4H, CHS, CHI, F3H, F3’5’H, DFR, LDOX/ANS, ANR, LAR, CCoAOMT were mainly expressed in NEC, while drastic down-regulated from NEC to EC and remained very low expression level in ICpEC and GE stages, except that F3H_Dlo_011012.1, F3’5’H_ Dlo_010496.1, LAR_ Dlo_022420.1, CCoAOMT_ Dlo_005144.2 were up-regulated in EC, but down-regulated during early SE. Besides, most of the FLS and F3’H family were mainly expressed in NEC, significantly down-regulated in EC and kept low FPKM during early SE, especially, 15 F3’H and 9 FLS belonged to NEC-specific genes. Only four FLS and six F3’H were up-regulated from NEC to EC and then down-regulated or kept low expression level during early SE (Additional file 2: Table S2).
Several R2R3-MYB transcription factors are involve in the regulation of flavonoid biosynthesis in Arabidopsis [49-51]. For example, AtMYB11, -12, -111 regulated flavonol biosynthesis by up-regulated CHS, CHI, F3H, F3’H and FLS [49, 52]. AtMYB75, -90, -113, -114 controlled anthocyanin biosynthesis in vegetative . AtMYB123 controlled the biosynthesis of proanthocyanidins in the seed coat . MtMYB5, -14 played the key role in seed coat polymer biosynthesis . AtMYB4 negative controlled sinapate ester biosynthesis through down-regulated C4H in a UV-dependent manner . In our study, 11 R2R3-MYB transcripts were differentially expressed. During longan SE, MYB12 and MYB111 were barely expressed in NEC, significant up-regulated from NEC to EC and remained high during early SE. MYB75, MYB113, MYB4 and MYB123 were significant down-regulated in EC, and kept relative low expression during early SE.
The fatty acid composition rapidly changed during SE in Daucus carota , and Gossypium hirsutum . In our study, a total of 35 fatty acid biosynthesis related genes were differently expressed during SE (Additional file 3: Table S3). From NEC to EC, except ACCase (Dlo_000360.1), three FabG, two FabZ, SAD (Dlo_031652.1), most of the ACCase, FabD, FabF, FabG, FabZ, FabI, FatB and SAD were significantly up-regulated in EC. During early SE, most of the DEGs remained high expression, part of them with slightly up/down-regulated expression. For example, ACCase (Dlo_023270.1) and SAD (Dlo_019646.1) were up-regulated from NEC to EC, and highly expressed during early SE. Our results indicated that flavonoids were mainly expressed in NEC, while fatty acid were mainly accumulated in early SE stages, especially in EC.
Extracellular protein encoding genes effect on the transition from NEC to EC
It had been reported that extracellular protein germins and germin-like (GLPs), Arabinogalactan proteins (AGPs), chitinases (CHIs), lipid transfer proteins (LTPs) and glycoprotein were critical to SE, and can be served as protein marker during early SE . In our study, 16 CHIs were differentially expressed, and most of them were preferential expressed in NEC, and remarkable down-regulated in EC, only seven CHIs were up-regulated during early SE with low FPKM. Among the 14 identified LTPs, only LTP (Dlo_013012.1, Dlo_013014.1) were highly and specific expressed in early SE, most of them were mainly expressed in NEC and down-regulated from NEC to EC. Meanwhile, 12 GLPs and two secreted glycoprotein genes (EP1-like) were mainly expressed in NEC and kept very low FPKM during early SE. Except AGP10 was first up-regulated in EC and down-regulated during early SE, most of the AGPs were down-regulated in EC, and kept relative low expression level during early SE (Additional file 4: Table S4). The results indicated that most of the extracellular protein encoding genes were mainly expressed in NEC, they were predicted to involve in the transition from NEC to EC.
Characterization of Molecular Markers for longan SE
Several genes have been reported to molecular marker of SE, such as somatic embryogenesis receptor-like kinase (SERK), leafy cotyledon1 (LEC1), BABYBOOM (BBM), wuschel (WUS), WUS-homeobox (WOX). In order to characterize the full-scale of molecular markers for early SE, the comparative analysis of FPKM in nine tissues of longan , including root, stem, leaf, flower, flower bud, young fruit, pericarp, pulp and seed were employed to select the molecular marker genes during SE. For our purposes here, it is crucial to identify the reliable molecular marker genes for distinguishing NEC stage from EC, ICpEC and GE stages. In our study, several embryogenesis-labeled genes that had been reported previously were differentially expressed in each stage (Additional file 5: Table S5). However, some of them showed down-regulated or slightly up-regulated in EC, and kept low expression level from NEC to GE, such as late embryogenesis abundant protein (LEA14A, LEAD34, LEA76), SERK1, SERK3, WUS, WOX5, WOX3, AIL6, AGL15, CLV1, EMB8, suggesting that they were unseemly markers for longan SE.
In our study, a total of 55 genes were identified as representative molecular markers, which were closely related to SE, can be classified as two main categories: NEC markers and SE molecular markers by their specific expression profiles in all test-samples (Table 4). The SE marker genes were barely or undetected in NEC, highly expressed during early SE, they also can be divided into SE-specific and SE-expressed genes. The SE-specific genes were highly transcribed only in somatic embryos, including LEC1, LEC2, WOX9, WOX2, Agamous-like 80 (AGL80), PIN-FORMED1 (PIN1), BBM, PLETHORA2 (PLT2), mannan endo 1,4-beta-mannosidase7 (MAN7), Glycine-rich protein 5 (GRP-5), GRF-interacting factor 2 (GIF2), root meristem growth factor 3 (RGF3), 60S ribosomal protein L17e (RPL17e), zeta-carotene desaturase (ZDS), 3-ketoacyl-CoA synthase (KCS), CYP78A5, CYP87A3 and three unknown genes (DlU1, DlU2, DlU3) (Table 4). These SE-specific genes might play a key role in longan SE. The SE-expressed genes were similar to SE-specific genes, except that these genes also highly expressed in one or some tested tissues included in this study, including LEC1-like (L1L), ABA-insensitive protein 3 (ABI3), FUSCA3 (FUS3), Indole-3-acetic acid-amido synthetase (GH3.6), Protodermal factor 1.3 (PDF1.3), Lipid transfer protein (LTP, Dlo_013012.1) and Lipid binding protein (LBP). For instance, L1L, FUS3 and ABI3 showed very strong transcription level not only in somatic embryos but also in seed. GH3.6 was highly expressed in flower, PDF1.3 and LBP showed high expression level in pulp, LTP also highly transcribed in pulp, flower bud, flower and stem, suggesting their multifunctional on SE and other development processes (Table 4).
On the contrary, 28 representative NEC marker genes were highly and preferentially expressed in NEC, barely or undetected in EC, ICpEC and GE, including LEA5, CCR4-NOT transcription complex subunit 3 (CNOT3), pathogenesis-related protein (PR1-1, PR1-like, PR4), 14 kDa proline-rich protein DC2.15 (DC2.15), chitinases (CHI: Dlo_030517.1, Dlo_024175.1), catalase (CAT), Lipid transfer proteins (NsLTP2, DIR1), aquaporins (PIP1, PIP2.1, TIP2-1), peroxidases (POD-P7, POD5), osmotin-like protein 1 (OSM1), expansin-like B1 (EXLB1), Pectinesterase precursor (PME1), chalcone synthase (CHS), thaumatin-like protein (TLP1), Gibberellic Acid Stimulated Transcript-like (GAST1), ethylene-responsive transcription factor 114 (ERF114), glutathione S-transferase (GST, Dlo_032871.1), germin-like protein 3 (GLP3), and three unknown genes (DlU4, DlU5, DlU6) (Table 4). The NEC-specific marker genes maybe the key inhibitor of the transition from NEC to EC, while the SE markers may function on SE development.
qRT-PCR Verification of selected molecular markers
To experimentally confirm that the molecular markers were indeed expressed and played a key role during longan SE, 16 molecular markers, including 8 transcription factors DlLEC1_Dlo_017092.1, DlL1L_Dlo_020821.1, DlABI3_Dlo_012160.1, DlWOX9_ Dlo_022316.1, DlWOX2_Dlo_032045.1, DlAGL80_Dlo_017585.1, DlBBM_Dlo_011527.1 and DlPLT2_Dlo_004646.1, auxin metabolism gene DlGH3.6_ Dlo_020986.1, auxin polar transport gene DlPIN1_Dlo_020694.1, 3 meristem growth regulation genes DlPDF1.3_ Dlo_030812.1, DlRGF3_Dlo_026048.1, DlGIF2_Dlo_026819.1, 2 extracellular protein encoding genes DlLTP_Dlo_013012.1 and DlCHI_Dlo_030517.1, a late embryogenesis abundant protein gene DlLEA5_Dlo_019949.1, were selected for qRT-PCR identification in the synchronized cultures at distinct developmental stages during longan SE, including NEC, EC, ICpEC, GE, torpedo-shaped embryos (TE) and cotyledonary embryos (CE).
Base on the qRT-PCR results, all selected genes were expressed at varying levels at different development stages (Figure 6). The selected molecular markers DlLEC1, DlPDF1.3, DlGH3.6, DlPIN1, DlWOX9, DlWOX2, DlGIF2, DlRGF3, DlPLT2 and DlAGL80 were barely or undetected in NEC, while they mainly expressed during early SE, they all highest expressed in EC and then down-regulated during SE, showed relative low expression in TE and CE, indicated that those molecular markers played an important role in EC induction and maintainance. Meanwhile, DlL1L, DlBBM, DlABI3 and DlLTP were highly expressed or up-regulated during SE processes, and minimally or undiscovered expressed in NEC, suggested that those marker genes may positive regulated the longan SE development. In addition, the transcription level of DlLEA5 and DlCHI were highly and specific expressed in NCE, they may the inhibitor of the transition from NEC to EC. qRT-PCR validation of SE-related genes also showed a high correlation between RNA-seq and qRT-PCR data.