Identification and expression patterns of lncRNAs at 50d after flowering in soybean
A number of lncRNAs were previously identified in young soybean pods 30-40 d after flowering [16]. On the basis of this information, additional lncRNAs were further identified in soybean pods 50 d after flowering. Six cDNA libraries were constructed including ‘MT72’ (libraries MT_50d) and ‘JN18’ (libraries JN_50d), both 50 d after flowering (Additional file 1: Table S1). A total of 10,500 lncRNAs and 54,370 target mRNAs were identified, and 7,520 lncRNAs were identified in MT_50d as well as JN_50d, accounting for 71.6% of the total (Fig. 1a, Additional file 2: Table S2). Then, to further explore the transcription and genetic characteristics of the identified lncRNAs, the mRNAs encoded by soybean proteins were compared and analyzed. Based on the location of the corresponding protein-coding genes, it was observed that 5,129 (48.8%) lncRNAs were mainly produced in intron regions, accounting for the largest number, whereas 196 (1.9%) antisense lncRNAs were detected, accounting for the smallest number (Fig. 1b). By comparing the number of exons, it was found that most lncRNAs had fewer exons than mRNAs. In particular, 8,527 (81.2%) lncRNAs contained two exons, accounting for the highest proportion. However, the number of mRNAs containing exons was relatively average, and most mRNAs contained approximately two exons (Fig. 1c). In addition, the length and chromosome distribution of the two RNAs types were compared. When the length distribution of lncRNAs was less than 1kb, the proportion of lncRNAs was higher than that of mRNAs, and it subsequently decreased (Fig. 1d). Chromosome distribution results showed that different types of lncRNAs were also associated with different chromosomal regions. Denser lines indicate an increased lncRNA distribution (Fig. 1e-f).
Differentially expressed lncRNAs and mRNAs in five groups at different stages after soybean flowering
Previously published data on the differential expression of lncRNAs and mRNAs in soybean 30 d and 40 d after flowering were combined with the newly identified data 50 d after flowering to conduct a comprehensive analysis. The criteria for identifying differentially expressed lncRNAs and mRNAs were |log2 (fold change) | value > 1 and a statistically significant p value of ≤ 0.05. In total, 1,805 differentially expressed lncRNAs were identified in five different comparison groups (Additional file 3: Table S3). Among them, 336, 489, 44, 485 and 451 were identified in JN_50d versus JN_30d (‘JN18’ 50 d after flowering versus ‘JN18’ 30 d after flowering), JN_50d versus JN_40d (‘JN18’ 50 d after flowering versus ‘JN18’ 40 d after flowering), MT_50d versus JN_50d (‘MT72’ 50 d after flowering versus ‘JN18’ 50 d after flowering), MT_50d versus MT_30d (‘MT72’ 50 d after flowering versus ‘MT72’ 30 d after flowering), and MT_50d versus MT_40d (‘MT72’ 50 d after flowering versus ‘MT72’ 40 d after flowering) (Fig. 2a). Additionally, among the 1,805 differentially expressed lncRNAs identified in the five groups, 1,017 were upregulated and 788 were down-regulated (Fig. 2b). Volcano maps were drawn to further analyze their overall distribution. Most lncRNAs presented different multiples of differences (Fig. 3). Then a systematic cluster analysis was performed for all 1,805 differentially expressed lncRNAs (Fig. 4). Most of them had specific expression patterns in different materials at different stages.
Also, 39,313 target mRNAs were found to be differentially expressed in five comparison groups, among them, 11,127 were up-regulated and 28,186 were down-regulated (Fig. 5a-b, Additional file 4: Table S4). Cluster analysis was conducted on the 39,313 differentially expressed target mRNAs, and it was found that most mRNAs showed specific expression patterns in the same period in different materials (Fig. 6).
Functional analysis of differentially expressed genes
To further explore the potential functions of lncRNAs, GO and KEGG analysis were conducted on the potential target mRNAs of differentially expressed lncRNAs. A total of 322 target mRNAs were found to be related to fatty acid synthesis (Additional file 5: Table S5). The GO term indicated that the main functions of the target mRNAs of differentially expressed lncRNAs included oxidation-reduction process (GO:0055114), integral component of membrane (GO:0016021), ATP binding (GO:0005524), metal ion binding (GO:0046872) (Fig. 7). More importantly, the 20 most significant GO terms showed that the functions related to fatty acid synthesis of the target mRNAs included fatty acid biosynthetic process (GO:0006633), unsaturated fatty acid biosynthetic process (GO:0006636), and long-chain fatty-acyl-CoA metabolic process (GO:0035336). In particular, the XM_003536636.3, XM_003521793.3, XM_003539327.2 and XM_003551789.3 genes have the potential functions related to fatty acid biosynthetic process (GO:0006633). The XM_003534019.3 gene has the potential functions related to fatty acid transporter (GO:0015908) and fatty acid transporter activity (GO:0015245). The XM_003540190.3 gene has the potential functions related to fatty-acyl-CoA reductase (alcohol-forming) activity (GO:0080019), lipid metabolic process (GO:0006629) and long-chain fatty-acyl-CoA metabolic process (GO:0035336).
Subsequently, KEGG analysis showed that the 20 most significant metabolic pathways in each comparison group mainly included photosynthesis (ko00195), car-bon fixation in photosynthetic organisms (ko00710), tryptophan metabolism (ko00380), and plant hormone signal transduction (ko04075) (Fig. 8). Among them, six metabolic pathways may be related to soybean fatty acid synthesis, including fatty acid degradation (ko00071), fatty acid elongation (ko00062), fatty acid biosynthesis (ko00061), biosynthesis of unsaturated fatty acids (ko01040), alpha-Linolenic acid metabolism (ko00592), and linoleic acid metabolism (ko00591) (Fig. 9). This indicates that some mRNA-encoding proteins related to lncRNAs are likely to regulate lipid synthesis by influencing the metabolic pathways of fatty acids in soybean. Through KEGG pathway analysis, it was shown that the XM_003552194.3 and XM_003544690.3 target genes may be involved in fatty acid biosynthesis (Ko00061) and degradation (ko00071). The XM_003518957.3 gene has the potential functions related to alpha-Linolenic acid metabolism (ko00592).
Validation of differentially expressed lncRNAs and target genes
To further verify the RNA-seq data, 9 differentially expressed lncRNAs and 15 target mRNAs were randomly selected using qRT-PCR. The results of 7 lncRNAs and 12 target mRNAs were consistent with high-throughput sequencing data (Fig. 10, Additional file 6: Table S6). For example, the expression patterns of lncRNAs MSTRG. 11445.3 and MSTRG. 5822.1 were up-regulated in qPCR, confirming the RNA-seq results. The expression patterns of target mRNAs XM_003552194.3 and XM_003544690.3 showed a downregulation trend, which also confirmed the RNA-Seq results. This indicates that using RNA-seq to detect the expression of lncRNAs is highly reliable.
Construction of lncRNAs and mRNAs network, and prediction analysis of lncRNAs functions related to fatty acid synthesis
Genes or proteins in the same expression module can be modulated together. Therefore, using computational construction of coding and non-coding genes, it would be possible to obtain co-expression networks to deduce the potential biological functions of lncRNAs. The targets of 1,805 differentially expressed lncRNAs were predicted. A network was constructed for each comparison using the identified differentially expressed genes. In total, 604 lncRNAs and 1,484 mRNAs were found to be related (Additional file 7: Table S7). The lncRNAs-mRNAs network was constructed using Cytoscape (version 3.7.0, http:// www. cytoscape. org/) (Fig. 11).
GO and KEGG pathway analysis were performed for the target mRNAs of the lncRNAs. It was found that a total of 77 lncRNAs may regulate 115 target genes to participate in fatty acid synthesis and, therefore, they have the potential to influence oil synthesis in soybean (Additional file 8: Table S8). In particular, XM_003536636.3, NM_001250756.1, NM_001248321.1, and NM_001251763.1 may have potential functions including fatty acid biosynthetic process (GO:0006633), lipid metabolic process (GO:0006629), and fatty acid metabolic process (GO:0006631). In addition, XM_006588443.1, NM_001251413.1, and XM_003524392.3 may also be involved in biosynthesis of unsaturated fatty acids (ko01040), fatty acid biosynthesis (ko00061), and fatty acid degradation (ko00071).