Genome-wide identification of lncRNAs in 186 wheat spike samples
To comprehensively obtain lncRNA related to wheat spike, we used a total of 186 RNA-seq datasets of the spike samples collected from 93 bread wheat accessions to identify the lncRNA. Totally, about 4.6 Tb clean reads were aligned to the wheat reference sequence (IWGSC Refseq v1.1) and all samples showed that the average alignment rate was more than 90%. Reference guided transcriptome assembly strategies were further applied to identify lncRNAs and then 35,913 lncRNAs were obtained (Fig. 1a, Additional file 1: Table S1). Of these, a total of 4,146 (11.5%) lncRNAs were found in the plant lncRNA database (http://greenc.sequentiabiotech.com/wiki2/Main_Page) and the average sequence identity was 92.4%.
Compared to mRNA, these wheat lncRNAs showed different basic characteristics. The average of expressed samples of lncRNAs was 119, whereas that of mRNA was 161 (Fig. 1b). The median length of lncRNAs and mRNA were 1,584 bp and 3,953 bp, respectively (Fig. 1c). Meanwhile, the number of exons of lncRNA transcripts was less than protein-coding transcripts and the majority of lncRNA transcripts (89.2%) had only one exon (Fig. 1d). However, there was no significant difference of GC content between mRNAs (50.0%) and lncRNAs (49.1%) (Fig. 1e). The median expression levels of mRNAs were higher than that of lncRNAs (Fig. 1f). Furthermore, the mRNA and lncRNA density in different chromosome and subgenomes were calculated with 1Mb interval (Fig. 2a). The results showed that the whole genome had an average of 2.71 mRNAs and 2.41 lncRNAs per Mb, respectively. Meanwhile, the number of mRNA on each chromosome was moderately correlated (R = 0.40) with the number of lncRNA through regression analysis. Moreover, most lncRNAs were identified on the B subgenome (36.1 %), followed by the A (34.9 %) and D subgenome (26.6 %) (Fig. 2b). The mRNA density of D genome was significantly higher than that of A and B subgenomes (A: 2.53/Mb; B: 2.48/Mb; D: 3.33/Mb), while the opposite trend was found on lncRNAs that D subgenome displayed the lowest lncRNA density among the three subgenomes (A: 2.49/Mb; B: 2.45/Mb; D: 2.37/Mb). Among all lncRNA, long intergenic noncoding RNA (lincRNA) (30,974) has the largest abundance (Fig. 2c). Interestingly, 36 lincRNA were identified in 15 centromeric intervals, of which 2D centromeric region had the most lncRNAs. Regression analysis showed a significant correlation between lincRNA number and centromere region size (R = 0.47, P-value = 0.0054). The mean expression level (FPKM) of these lincRNA was 1.25, which was relatively lower than other lncRNAs.
Identification of coding genes potentially targeted by lncRNAs
Correlation of expression level between lncRNAs and protein-coding genes can imply involvement in common biological processes [20]. Then, a total of 1,619 lncRNA-mRNA pairs (187 cis- and 1,432 trans-pairs) were identified in 443 lncRNA and 464 mRNA based on the physical position, expression correlation and LncTar tool (Fig. 2d, Additional file 1: Table S2). The distance between cis-lncRNAs and target mRNAs ranged from 1 bp to 4,782 bp with a mean length of 503 bp (Fig. 2e). The relationship cis-lncRNAs and target mRNAs was one to one, while one-to-many relationship was mainly identified between trans-lncRNAs and target genes (Fig. 2f). Subsequently, we performed GO and KEGG enrichment analysis of the targeting genes. Results showed that 63 GO term were significant enriched (FDR < 0.05), of which 2 and 7 terms belonged to cellular component and molecular function, respectively. The top enriched term contained pollen tube development (GO:0048868, 8.90E-16), pollen tube growth (GO:0009860, 1.50E-13), and cell tip growth (GO:0009932, 3.70E-13) (Additional file 1: Table S3, Additional file 2: Figure S1a). KEGG results showed that 11 pathways were significantly enriched, of which also contained regulation of pollen tube growth pathway (Additional file 2: Figure S1b). Enrichment analysis showed that most target coding genes were related to growth and development, especially pollen growth, suggesting the potential biological function of these lncRNAs.
Comparative analysis of conserved lncRNA sequences in homolog species
To understand the evolution of lncRNA, we performed similarity alignment and conservation analysis of the lncRNA among A. thaliana, S. bicolor, Z. mays, O. sativa, H. vulgare, Ae. tauschii, T. urartu, T. turgidum and T. aestivum. Results showed that 8,133 out of 35,193 wheat lncRNAs (23.1%) possessed homologous relationships with other species (Additional file 1: Table S4). The most homologous relationships were identified between T. aestivum and Ae. tauschii, followed by H. vulgare, T. turgidum, T. urartu, Z. mays, O. sativa, S. bicolor and A. thaliana. Most of homologous lncRNA were located in intergenic region. (Fig. 3a). GO enrichment analysis of target genes of homologous lncRNAs showed that conserved lncRNAs were involved in important biological process, such as auxin biosynthetic process (GO:0009851), protein phosphorylation (GO:0006468), and recognition of pollen (GO:0048544) (Fig. 3b).
In order to reduce the impact of different identification methods on the identification of homologous lncRNA. We further investigated whether these lncRNA sequences are highly conserved across different species genome. Wheat lncRNAs were aligned to the genomes of these eight species, and a conservation score was calculated as the product between the percentage of lncRNA sequence coverage and identity. Compared with the homologous analysis between lncRNAs and other species, more hits were obtained from the alignment of wheat lncRNAs and other species genome. Of these, the most of conserved sequences were identified between T. turgidum and T. aestivum (Additional file 1: Table S5). Based on the single copy genes among the nine species, we constructed a phylogenetic evolutionary tree (Fig. 3c). Compared with other species, the closest evolutionary relationship was also found between T. turgidum and T. aestivum. In contrast, Z. mays and O. sativa, both cereal crops, were far away from T. aestivum in the evolutionary relationship of lncRNA. These results showed that fragment introgression may be occurred between hexaploid wheat and tetraploidwheat.
Population genetic analysis of lncRNAs and target mRNAs
Based on genotyping data of 261 accessions from Triticum and Aegilops genera [21] (Additional file 1: Table S6), the genetic variations of lncRNA-mRNA pairs in A1 (wild einkorn wheat, Triticum monococcum L. ssp. aegilopoides, AA), A2 (domesticated einkorn wheat, Triticum monococcum L. ssp. monococcum, AA), A3 (urartu wheat, Triticum urartu, AA), AB1 (wild emmer wheat, Triticum turgidum L. ssp. dicoccoides, AABB), AB2 (domesticated emmer wheat, Triticum turgidum L. ssp. dicoccon, AABB), AB3 (durum wheat, Triticum turgidum L. ssp. durum, AABB), ABD1 (landrace, Triticum aestivum L. ssp. aestivum, AABBDD), ABD2 (cultivar, Triticum aestivum L. ssp. aestivum, AABBDD) and D1 (diploid, Aegilops tauschii, DD) were investigated. The fixation index (Fst) and nucleotide diversity (π) were calculated. The genetic divergence between wheat subgenome and its relatives were also detected (Fig. 4a-c). For each group, the Fst value between A2 and A3 (lncRNA: 0.82; mRNA: 0.89) tended to be largest, indicating a larger degree of divergence. The lowest Fst was observed between ABD1 and ABD2 (lncRNA: 0.0047; mRNA: 0.0049). For each subgenome, the B subgenome had the lowest degree of divergence and there was no significant difference between A and D subgenome. The π of wild species was always larger than that of cultivated species in lncRNA and target mRNAs of different populations and subgenomes. The average π value of B subgenome was larger than that of A and D subgenome (Additional file 1: Table S7). Then, SNP density of lncRNAs and target mRNAs showed that genetic bottleneck events were found in A and D subgenome (Fig. 4d). The shifting of genetic diversity also indicated more genetic bottleneck events appeared in domesticated process (Fig. 4e). During domestication and improvement, the genetic diversity continued to shrink and the large bottleneck event occurred between A1 and A2/A3. In contrast, the B and D genomes showed the bottleneck event in domestication but genetic expansion in improvement, especially between landraces and varieties. The durum wheat and hexaploid bread wheat appeared to possess more genetic diversity after improvement, indicating the breeders would select specific genetic characteristics of wheat according to their needs in different environments.
By constructing phylogenetic evolutionary tree using genetic variation of lncRNAs and target mRNAs, we found each subgroup, expecting ABD1 and ABD2, can be almost clustered in the same branch (Additional file 2: Figure S2-S4), indicating that there were potential gene flow between ABD1 and ABD2 group and even with other subgroups. Furthermore, we identified the haplotype organization and frequency of each functional lncRNA and target mRNAs in these populations based on the resequencing data (Additional file 1: Table S8). A total 192 lncRNA and 213 mRNA were found to have the genetic variations among these populations. Among them, 3 lncRNAs and 8 target mRNAs were identified to have different haplotypes in different populations. For instance, haplotype GGGGGGTTGGGGTTCCCCTTGGAACC (76.92%) was the major haplotype in the AB3 population, but it was replaced by GGGGGGTTGGGGTTCCCCTTGGAAAA in ABD1 (93.33%) and ABD2 (100%). These results showed that these lncRNAs suffered a stronger genetic bottleneck and some loci with important candidate genes may be included in the process of domestication and improvement.
Endogenous competitive role of lncRNA in gene regulatory network
The competing endogenous RNAs (ceRNA) is also the important role of lncRNAs [22, 23]. lncRNAs can act as decoys for miRNAs to competitively inhibit their interaction with target mRNAs. Therefore, we constructed a ceRNA network of wheat spike-related lncRNA, showing the interaction relationships among lncRNA, miRNA and mRNA in wheat spike (Fig. 5a).
A total of 122 lncRNAs, 102 miRNAs and 119 mRNAs were contained in ceRNA network and 238 unique triangle pairs of lncRNA-miRNA-mRNA were identified (Additional file 1: Table S9). For instance, lncRNA.2204.2 was competitive in tae-miR319-TraesCS2A02G226000.1(TaBuB) pairs, showing the significantly negative impact between the expression level of lncRNA.2204.2 and TaBuB (Fig. 5b). The 369 bp to 389 bp of lncRNA.2204.2 cDNA is the same as the 314 to 334 bp of TaBuB, which can be cleaved by tae-miR319 (Fig. 5c-e). Moreover, tae-miR9773 involved the most of interaction relationship pairs in the ceRNA network, such as tae-miR9667b, tae-miR9780, tae-miR159, tae-miR156, tae-miR1127b and tae-miR1120b, indicating that these miRNAs may play important roles.
Functional analysis of lncRNA-mRNA pairs
In order to further explore and confirm the function of lncRNA, we focused on the 1619 lncRNA-mRNA pairs and performed the GWAS analysis with 5 agronomic traits and co-localized analysis with 124 known QTLs that associated with 13 agronomic traits. The lncRNAs or mRNAs overlapped with the known QTL or within 5 Mb upstream and downstream of GWAS signal point was considered as trait candidate gene.
Then, 54 lncRNAs and 52 target mRNAs were identified to be associated with GWAS signals, of which contained 318 lncRNA-mRNA pairs (Additional file 1: Table S10). Meanwhile, there were 130 lncRNAs and 128 mRNAs associated with known QTLs (Additional file 1: Table S11). Of these, 14 lncRNAs and 13 mRNAs were shared by GWAS signals and known QTL regions. GWAS results showed that more lncRNAs and target mRNAs were associated with spike length signals, followed by heading and flowering stage (Fig. 6). Additionally, kernel width-related QTLs had the most lncRNAs and target mRNAs. For example, lncRNA.7517.1 was identified to be related to heading stage, and its expression level was negatively correlated with the target gene. TraesCS5B02G326100.1, as the target gene of lncRNA.7517.1, is an orthologue of AtSCL5 that is a transcription factor involved in plant development [24].
Further, according to the introgression region in previous study [25], we screened the overlap regions of lncRNA and introgression. Then, 85 lncRNAs and 80 mRNAs were found in introgression interval (Additional file 1: Table S12). Meanwhile, we performed fisher's exact test and indicated a significant correlation (P-value = 2.8e-14) between the introgression intervals and lncRNA-mRNA pairs. Interestingly, the distribution of lncRNA-mRNA pairs showed that most of lncRNAs and target mRNAs were enriched in chromosome 1BS (Fig. 6). Because of the differences of wheat pedigree, we suspect that the enrichment region of 1BS may be related to wheat 1BL/1RS translocation (1B1R) line. Furthermore, a total of 55 loci (lncRNA, 33; mRNA, 32) on 1B1R region was divided into two clusters in all experimental samples by analyzing expression abundance (Additional file 2: Figure S5a). The expression level of lncRNA and mRNA between non-1B1R samples (158) and 1B1R samples (28) was also significance difference (Additional file 2: Figure S5b). GO enrichment analysis showed that chloroplast envelope (GO:0009941), cytosol (GO:0005829), maintenance of DNA methylation (GO:0010216) and chloroplast stroma (GO:0009570) were significant enriched (Additional file 2: Figure S5c). Furthermore, we performed qRT-PCR analysis for five randomly selected 1B1R lines-specific lncRNAs and five non-1B1R lines-specific lncRNAs from ten samples (five 1B1R and five non-1B1R lines). The results showed that five 1B1R lines-specific lncRNAs were highly expressed in 1B1R group but low or even not expressed in non-1B1R lines, while the expression pattern of non-1B1R lines-specific lncRNAs was opposite (Additional file 2: Figure S6).
LncRNA-mRNA related to heading and flowering stage
Gene expression is often regulated by cis-acting elements on its promotor region, such as enhancer, terminator and transcription factor (TF) binding site [26]. Six SNPs were found in gene promotor region (3) and coding region (3) of TraesCS2A02G518500.1 which was co-expressed with lncRNA.127690.1. Of these, one GWAS signal that associated with wheat heading and flowering stage (AX-110948179, C to T, Chr2A:738974737, P-value = 8.36E-05) was in the upstream of TraesCS2A02G518500.1 (Fig. 7a). We identified that the C genotype of AX-110948179 (Chr2A:742131947) form the MBSI cis-acting element, while TT genotype was the opposite (Fig. 7b). MBSI (aaaAaaC(C/G)GTTA), a TF binding region, can be targeted by MYB transcription factor gene in regulating plant growth and development, metabolism, biological and abiotic stress, hormone response and other physiological processes [27, 28]. Interestingly, genotype C corresponded to a shorter interval from heading to flowering, and a higher expression level of TraesCS2A02G518500.1, while T genotype had a longer interval heading and flowering stage and a lower expression level (Fig. 7c and 7d). The phylogenetic tree based on SNP of TraesCS2A02G518500.1 showed genetic flow might occur between tetraploid wheat and hexaploidy wheat populations (Additional file 2: Figure S7). Interestingly, TraesCS2A02G518500.1 could encode a pectin methylesterase inhibitor family protein. PMEI in rice has been reported to associated with heading stage [29]. Moreover, lncRNA.127690.1 can remotely target TraesCS2A02G518500.1 with a significant positive relationship of expression level (Fig. 7e). These results supported that lncRNA.127690.1-TraesCS2A02G518500.1 pair may play a key role in heading and flowering stage of wheat.
LncRNA-mRNA related to spike length
In addition, a GWAS signal that associated with spike length (AX-110507832, Chr6A:24725411, P-value = 2.39E-05) was identified in the downstream of lncRNA.104854.1 and its cis-regulatory mRNA (TraesCS6A02G050300.1) (Fig. 8a). Meanwhile, three SNP were found in the eighth exon region of mRNA and these three SNP changed the amino acids (Fig. 8a). Then, three main haplotypes (Hap1: CTT, Hap2: CCT, Hap3: CCC) of this mRNA were identified. The spike length of Hap1 was the longest, followed by Hap2 and Hap3 (Fig. 8b and 8c). Interestingly, lncRNA.104854.1 can compete with tae-miR5175-5p to bind to TraesCS6A02G050300.1. The expression level of lncRNA.104854.1 in Hap1 was the lowest, followed by Hap2 and Hap3 (Fig. 8d). Notably, the expression level of lncRNA.104854.1 and TraesCS6A02G050300.1 showed a significant negative relationship (Fig. 8e). Moreover, TraesCS6A02G050300 is an orthologous gene of AtATG5 encoding autophagy protein. Autophagy protein has been reported to be related to plant growth and development [30, 31]. These results indicated that TraesCS6A02G050300.1 play an important role in spike growth, and lncRNA.104854.1-TraesCS6A02G050300.1 is likely to be a key regulatory module in spike length of wheat.