Identification and characterization of lncRNAs across two soybean genotypes under different P levels
To identify LP-responsive lncRNAs in soybean roots, we constructed 12 cDNA libraries from soybean root samples from two genotypes with contrasting responsiveness to LP stress, namely, Bogao (BG, a LP-sensitive genotype) and Nannong 94156 (NN94156, a LP-tolerant genotype), after exposure to high/normal phosphorus (HP, 500 μM, control) and low phosphorus (LP, 5 μM) conditions [32]. Three biological replicates of each condition were used to minimize the individual variation. The libraries were sequenced using the Illumina HiSeq 4000 platform, and 125-bp paired-end reads were generated. Approximately 1,087 million raw sequencing reads were generated from all 12 libraries, and each sample contained reads ranging from 75.5 to 100.7 million in number. After discarding adaptor sequences and low-quality reads (Q-value ≤ 20), more than 90% of the total reads were retained [33]. We mapped these clean reads to the soybean reference genome sequence (Wm82.a2.v1). In total, 4,166 novel lncRNAs were predicted using the coding-noncoding index (CNCI) [34] and coding potential calculator (CPC) [35] under all tested conditions (Table S1).
The classification of these lncRNAs showed that the majority (2,865, 68.77%) of the 4,166 lncRNAs were located in intergenic regions, and the remaining 1,301 (31.23%) resided within genic regions and included 454 bidirectional lncRNAs, 498 antisense lncRNAs, 121 sense lncRNAs, and 228 others that were not classified into these types (Fig. 1a). The type of lncRNA might be related to its functions; for example, overexpressed LAIR (a lncRNA transcribed from the antisense of the neighboring gene LRK cluster) regulates the expression of several LRK genes and increases the grain yield in rice [36]. We subsequently analyzed the chromosomal location of all the lncRNAs in the soybean genome. The distribution of the lncRNAs was uneven: chr13 and chr18 contained more than 250 lncRNAs, and chr05, chr11, and chr16 contained approximately 150 lncRNAs (Fig. 1b). In addition, we analyzed the number of exons and introns in each lncRNA transcript. Most of the lncRNAs contained one exon and no introns (3,597), and the number of exons and introns was as high as seven and six, respectively (Fig. 1c). The GC content of the lncRNAs varied greatly, with a range of 20.68% to 64.1% and an average of 35.88%.
The majority of lncRNAs have GC percent in the range of 30% to 45% (Fig. 1d). A majority (94.43%) of the lncRNAs were shorter than 2,000 nucleotides (Fig. 1e).
Differentially expressed (DE) lncRNAs in two soybean genotypes under different P levels
To identify the lncRNAs that are responsive to LP stress, we identified the differentially expressed (DE) transcripts of lncRNAs through pairwise comparisons between the two soybean genotypes under HP and LP conditions. The FPKM (fragments per kilobase of transcript per million mapped reads) values were used to evaluate the transcript abundance of lncRNAs. Differently expressed lncRNAs (referred to as DE lncRNAs hereafter) were defined as lncRNAs with Log2FC > 1 and FDR < 0.05. In total, 525 DE lncRNAs were identified among the two different genotypes under HP and LP conditions, and these included 116 DE lncRNAs between different P levels in the same genotype, 456 DE lncRNAs between different genotypes at the same P level, and 47 shared DE lncRNAs (Table S2). To identify the effect of LP stress on lncRNAs, we compared the DE lncRNAs of different genotypes under the same P condition and in the same genotype at different P levels (Fig. 2). As shown in the volcano plot, the LP treatment of Bogao and NN94156 resulted in more downregulated DE lncRNAs than upregulated DE lncRNAs, and the downregulated DE lncRNAs presented a more substantial change in differential expression than the upregulated DE lncRNAs (Fig. 2a, Fig. 2b). The number and fold change in expression of the upregulated and downregulated DE lncRNAs were relatively consistent in the Bogao and NN94156 genotypes under the same P level (Fig. 2c, Fig. 2d, Fig. S1).
Because the two genotypes showed markedly different responses to LP stress, we performed a Venn diagram analysis to elucidate the DE lncRNAs between the two genotypes under LP conditions. The number of common and unique DE lncRNAs between the two genotypes is indicated in the Venn diagram (Fig. 3a). NN94156 and Bogao shared 21 common DE lncRNAs in the HP vs. LP comparisons, and Bogao exhibited more genotype-specific DE lncRNAs (72) than NN94156 (23) (Fig. 3b), which is consistent with the results shown in the volcano plot (Fig. 2a, Fig. 2b). We found that the 21 common DE lncRNAs in Bogao were all downregulated under LP conditions, whereas most of these downregulated lncRNAs (20, all except TCONS_00029009) were also downregulated in NN94156 (Fig. 3d). To determine whether the effect of LP stress on lncRNAs is related to genotype, we compared the changes in DE lncRNAs between Bogao and NN94156 under LP or HP conditions. The results identified 133 and 139 unique DE lncRNAs under the LP and HP conditions, respectively (Fig. 3c). The 184 common DE lncRNAs showed the same up- or downregulation trend: 123 were downregulated, and 61 were upregulated (Fig. 3e).
Validation and quantification of lncRNAs
To validate the expression of these LP-responsive lncRNAs, 14 lncRNAs were randomly selected and analyzed by quantitative PCR (qPCR). As shown in Fig. 4a, the expression patterns of the LP/HP lncRNAs determined by RNA-seq and qPCR were relatively consistent and presented similar trends. Both the qPCR and RNA-seq assays revealed a positive correlation in the expression fold-change with an R2 of 0.7878 (Fig. 4b), which indicated the robustness of our analysis and the reliability of the lncRNA expression patterns identified in the current study. These findings confirm that these lncRNAs are responsive to LP stress in soybean roots.
Functions and expression patterns of DE lncRNAs and their target genes
To reveal the potential functions of the differentially expressed lncRNAs under LP stress in two contrasting genotypes, we predicted the candidate targets of cis-, trans- and antisense-acting DE lncRNAs. In total, 785 targets of 374 DE lncRNAs were identified, and for 960 pairs, one lncRNA might have several targets and/or one mRNA target might be targets of several lncRNAs (Table S3). To explore the putative functions of DE lncRNAs, we analyzed the Gene Ontology (GO) terms (Table S4) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of the putative target genes (Table S5).
The GO analysis of DE lncRNAs in one genotype at different P levels revealed that 403 GO terms (195 in the biological process category, 146 in the molecular function category, and 62 in the cellular component category) were significantly enriched (P < 0.05) (Fig. 5a). The analysis of the DE lncRNAs in Bogao or NN94156 exposed to the same P level showed that 1,086 GO terms (497 in the biological process category, 362 in the molecular function category, and 227 in the cellular component category) were significantly enriched (P < 0.05) (Fig. 5b). Although the numbers of GO terms in the two genotypes were different, their trends were relatively similar. In brief, the most significant GO terms related to biological process were metabolic process, single-organism process, and cellular process, and the analysis of molecular functions revealed that catalytic activity and binding were the important significantly enriched GO terms. In addition, cell, cell part, membrane and organelle were the most important significant terms belonging to the cellular component categories. Taken together, these results show that these lncRNAs might play roles in a variety of biological processes that are responsive to LP stress.
We subsequently analyzed the enrichment of the predicted target genes of DE lncRNAs in KEGG pathways (Table S5). The targets of DE lncRNAs in the same genotype between different P levels were enriched in 42 KEGG pathways, including several KEGG pathways related to carbohydrate metabolism, lipid metabolism, and amino acid metabolism (Fig. 6a). For example, propanoate metabolism, glycolysis/gluconeogenesis, pyruvate metabolism, starch and sucrose metabolism, and the pentose phosphate pathway belong to carbohydrate metabolism, and fatty acid biosynthesis, alpha-linolenic acid metabolism and fatty acid degradation are lipid metabolism pathways. The analysis of targets of DE lncRNAs between Bogao and NN94156 under the same conditions (HP and LP) showed that 74 KEGG terms were enriched, and these included environmental adaptation, carbohydrate metabolism, biosynthesis of other secondary metabolites, lipid metabolism, and signal transduction. Among the top 20 enriched pathways (Fig. 6b), circadian rhythm-plant and plant-pathogen interactions belong to environmental adaptation and were significantly enriched (Q-values < 0.05). Terms related to three secondary metabolite pathways, including flavonoid biosynthesis, isoflavonoid biosynthesis, and phenylpropanoid biosynthesis, were enriched. These findings suggest that DE lncRNAs might regulate genes involved in many biological processes, including molecular metabolism, energy synthesis and signal transduction, in response to LP stress.
Putative P-related lncRNAs based on miRNAs
miRNAs are endogenous noncoding RNAs with a size of 20 to 24 nucleotides that are generated from a single-stranded RNA precursor with a hairpin secondary structure. LncRNAs can be spliced by miRNAs into multiple small RNAs, and as a result, the function of lncRNAs can be regulated by miRNAs via posttranscriptional regulation. For example, miR399 is the first miRNA that was found to be upregulated specifically by P deficiency and rapidly decreased after P readdition, and certain miRNA families are responsive to P deficiency in various species [37].
Because our research mainly focused on LP stress, various targets, including lncRNAs and mRNAs of P-related miRNAs, such as miR399, miR827, miR395, miR319, miR156, miR159, miR166, miR169, miR398 and miR447 [27, 37, 38], were selected. The lncRNA TCONS_00090111 was identified as a target of five miRNAs, namely, gma-miR156aa, gma-miR156z, gma-miR159b-3p, gma-miR159c, and gma-miR159f-3p. Similarly, TCONS_00015352 was predicted as a target of miR447-y (Table 1). The P-related miRNA targets were then predicted, and we found that nine mRNAs were also targets of lncRNAs (Table S6). As shown in Fig. 7, Glyma.19G121000, Glyma.02G109500 and TCONS_00068024 (a novel identified mRNA) were targets of lncRNAs and several miRNAs. Glyma.06G290000 and Glyma.12G117000 were both annotated as ethylene-responsive transcription factor 9-like mRNAs and were targets of the lncRNAs TCONS_00030280 and TCONS_00068008, respectively. Both genes were also targets of gma-miR169l-3p, which is a P-related miRNA [39]. Another example is Glyma.19G193900, which was predicted to be purple acid phosphatase 22-like, is the target of TCONS_00105416 and miR398-x, which belong to miR398 and have a demonstrated role in coping with P starvation stress [37].
Table 1. LncRNAs identified as targets of P-related miRNAs
|
miRNA
|
Target lncRNA
|
gma-miR156aa
|
TCONS_00090111
|
gma-miR156z
|
TCONS_00090111
|
gma-miR159b-3p
|
TCONS_00090111
|
gma-miR159c
|
TCONS_00090111
|
gma-miR159f-3p
|
TCONS_00090111
|
Construction of the network of transcription factors (TFs) and P-related and plant hormone-associated lncRNAs and mRNAs
TFs regulate a diverse group of genes during stress responses and are important components of gene regulatory networks, and many TFs belonging to some families have been proven to play an important role in the maintenance of P homeostasis, such as the phosphate starvation response (PHR), bHLH, WRKY, ZAT, and MYB [40]. The P-mediated regulation of the root system architecture is driven by the local perception of PO4– at the root tip and involves changes in multiple plant hormones, such as auxin, gibberellins and ethylene, as well as hormonal changes coordinated with the root developmental responses to P availability [38]. P-related genes such as PHO2 and PHR1 play important roles in the P starvation response. To further study the function of lncRNAs in the responses of soybean roots to LP stress, we constructed a lncRNA-mRNA network of mRNAs of interest (including transcription factors and P-related and plant hormone targets) and corresponding lncRNAs according to the GO, KEGG and functional annotations of the target genes (Table S7). As shown in Fig. 8, the lncRNA-mRNA network consisted of 52 lncRNAs and 109 targets in total. Twenty-three lncRNAs might be involved in the regulation of gene transcription because their target genes have transcription factor activity; in addition, three of the lncRNAs have two TF targets each, and 20 lncRNAs only have one TF target. The number of lncRNA targets varied from one to five, and Glyma.02G226800 and Glyma.02G226700 were targets of five and four lncRNAs, respectively. Twenty-six TFs belong to diverse families, such as MYB, bHLH, NAC, and AP2, and among these, MYB and bHLH reportedly play roles in the maintenance of P homeostasis [40]. Interestingly, we found that eight lncRNAs might be involved in ethylene regulation because their targets were annotated as ethylene-responsive TFs. Among the P-related genes, Glyma.17G172700 and Glyma.19G193900 were annotated as purple acid phosphatases (PAPs), and Glyma.20G021600 was predicted to be a phosphate transporter that is known as a PHT and is involved in LP stress.