De novo transcriptome assembly yielded 99,056 unigenes
Illumina deep sequencing of the F. macclureana transcriptome generated 140.94 Gb of data, including 471,537,304 clean reads in 18 unique samples (Additional file 1: Table S1). The average Q20 (sequencing error rate less than 1%) and Q30 (sequencing error rate less than 0.1%) percentages were 95.64% and 89.95% respectively. The GC content of all samples ranged from 53.78% to 55.86%, with an average of 54.81%. Sample data were assembled into 289,122 transcript scaffolds, with an N50 and average length of 1,765 bp and 1,183 bp, respectively. The final de novo assembly included 99,056 unigenes, with an N50 and average length of 1,587 bp and 926 bp, respectively. Among these unigenes, 71.02% (70,354) were shorter than 1,000 bp and 12.06% (11,950) were longer than 2,000 bp (Table 1).
Most unigenes were functionally annotated and classified
A total of 47,306 unigenes were annotated (Additional file 2: Table S2). Of these, 45,516 (96.22%) unigenes were found to encode products that showed significant similarity to characterized proteins in the non-redundant protein sequence database (Nr) at an E-value threshold of 10–5 (Table 2). We also found that 7,027 (15.45 %) unigenes showed similarity to genes found in rice, 11.33% were similar to those found in Brachypodium distachyon, and we also found a significant proportion of the unigenes that were similar to those found in Setaria italica, Oryza brachyantha, and Zea mays (Fig. 2a). We identified 24,847 (52.52%), 28,317 (59.86%) and 43,909 (92.82%) unigenes that showed significant matches to entries in the Swiss-Prot, Pfam, and eggnog databases, respectively (Table 2). Many unigenes expressed in the F. macclureana transcriptome were functionally annotated as regulators of plant responses to evolutionarily important phenotypes, including membrane stabilization, heat stress response and pathogen defense (Additional file 2: Table S2).
Functional annotation indicated that many unigenes were involved in metabolism and genetic information processing
We were able to annotate 13,128 unigenes (27.75% of the total) in 25 different categories of the COG (clusters of orthologous groups) classification database (Fig. 2b). Of these, the cluster for “General function prediction only” (3,277, representing 24.96% of the 13,128 unigenes annotated by this database) was the largest group, followed by “Replication, recombination and repair” (2,202, 16.77%), “Transcription” (1,571, 11.97%), and “Translation, ribosomal structure and biogenesis” (1,429, 10.88%). The “Signal transduction mechanisms”, “posttranslational modification, protein turnover, chaperones”, “carbohydrate and amino acid transport and metabolism” and “transport and metabolism” categories also contained a significant proportion of the annotated unigenes.
GO enrichment analysis indicated that these predicted unigenes were categorized into three main categories—i.e. biological process (BP), cellular component (CC), and molecular function (MF). As shown in Fig. 2c, for unigenes that were enriched in the BP category, they were mainly involved in biological processes related to reproduction, posttranslational modification and signal transduction; as for those in the CC category, they were mainly involved in cellular components related to membrane, ubiquitin ligase complex, mitochondrion, chloroplast and etc.; while for those in the MF category, they were mainly involved in molecular functions related to signaling transduction (e.g. “ATP binding”, “zinc ion binding”, “protein kinase activity”, and etc.) (Additional file 3: Table S3).
We also mapped 14,307 unigenes (representing 30.24% of the total) to six different KEGG subsystems, including metabolism, genetic information processing, environmental information processing, cellular processes, and organismal systems. As shown in Fig. 3, the majority of these unigenes (7,922, representing 66.17% of the 14,307 unigenes classified using KEGG annotations) were assigned to metabolic pathways, including carbohydrate metabolism, energy metabolism, and others. In addition, 4,024 unigenes (28.13%) were assigned to genetic information processing, including transcription, translation, and folding, and 474 unigenes (3.31%) were found to be related to membrane transport and signal transduction. We also found 707 genes (4.94%) that were related to transport and catabolism and 377 genes (2.64%) related to environmental adaptation.
Most BEUs were involved in genetic information processing, environmental adaptation and signal transduction
As shown in the Venn diagram (Fig. 4a), we found nearly equal numbers of unigenes that were broadly and specifically expressed in I-spikelets, P-spikelets, F-branchlets, and F-leaves. COG analysis indicated that most BEUs were clustered in signal transduction mechanisms (T), replication, recombination and repair (L), and transcription (K), besides general function prediction only (R). GO enrichment analysis for these BEUs indicated that they were also mainly involved in reproduction, environmental adaptation and signal transduction, which was largely similar with that for all predicted unigenes (Additional file 4: Table S4-a).
KEGG enrichment analysis also indicated that these BEUs were mainly enriched in pathways related to environmental adaptation (including circadian rhythm, endocytosis, and plant-pathogen interactions), signal transduction (including plant hormone signal transduction, phosphatidylinositol signaling system, and inositol phosphate metabolism) and genetic information processing (including spliceosome, mRNA surveillance, and RNA transport and degradation; Additional file 4: Table S4-b).
The SEUs were mostly involved in carbohydrate metabolism, energy metabolism, and environmental adaptation
As shown in Fig. 4a, we identified 10,653 unigenes that were specifically expressed in spikelets, including 5,528 and 5,025 unigenes in I- and P-spikelets, respectively. We also found 9,067 and 7,437 unigenes that were specifically expressed in F-branchlets and F-leaves, respectively. COG annotation indicated that the distribution patterns of SEUs among the 26 terms were similar, with the number of SEUs within each term varying among the three tissues (Fig. 4b).
The GO enrichment analysis indicated that these SEUs not only shared some common GO terms, but also had some particular ones. As shown in Fig. 4c and Additional file 4: Table S4-c, for those SEUs that were enriched in the BP category, they were broadly involved in several important biological processes, including “protein phosphorylation”, “regulation of flower development”, “protein ubiquitination”, “regulation of transcription, DNA-templated”, “reciprocal meiotic recombination” and “meiotic chromosome segregation”. In addition, SEUs in I- and P- spikelets were also involved in some processes related to reproduction; and those in F-branchlets were mainly involved in processes related to posttranslational modification; while those in F-leaves were mainly involved in processes related to plant-pathogen interaction. As for those in the CC category, they were broadly involved in several important cellular components, including “mitochondrion”, “plasma membrane” and “plastid”. In addition, SEUs in I- and P-spikelets were also involved in ribosome and mitochondria; and those in F-branchlets were mainly involved in endoplasmic reticulum and proteasome; and those in F-leaves were mostly involved in chloroplast. As for those in the MF category, they were broadly involved in several molecular functions, including “ATP binding”, “ubiquitin-protein transferase activity” and “protein tyrosine kinase activity”. In addition, SEUs in I- and P-spikelets were also involved in DNA and microtubule binding; those in F-branchlets were also enriched in oxidoreductases activities; and those in F-leaves were also enriched in enzymes involved in carbohydrate metabolism.
As shown in Additional file 5: Fig. S1, KEGG pathway analysis indicated that SEUs in I- and F-spikelets mainly mapped to the ribosome pathway, with those in F-branchlets mainly mapped to the ribosome, amino acid biosynthesis, and carbon metabolism pathways, and those in F-leaves mainly mapped to KEGG pathways related to energy metabolism (including oxidative phosphorylation, fatty acid metabolism, and photosynthesis), environmental adaptation (e.g. proteasomes), genetic information processing, and various unrelated metabolic pathways (e.g. tryptophan metabolism, beta-alanine metabolism, and N-glycan biosynthesis).
DEUs were mostly involved in carbohydrate and energy metabolism, signal transition and environmental adaptation
As shown in Table 3, many unigenes showed differential expressions across all 15 groups sampled. The number of DEUs in each sample pair ranged from 970 between I- vs P-spikelets to 13,577 in NF-leaves vs I-spikelets. For most pairwise comparisons, the number of up- and down-regulated DEUs was approximately the same, except for four groups, including I- vs P-spikelets, F-branchlets vs both I- and P- spikelets, and F-leaves vs P-spikelets.
The Venn diagram of DEU sets shows that 5,494 unigenes were differentially expressed in F-branchlets/F-leaves vs I- and P-spikelets. For those DEUs that were up-regulated in spikelets, they are mainly mapped to KEGG pathways related to carbohydrate metabolism, plant-pathogen interactions and DNA repair (Fig. 5a). Notably, among the 970 DEUs identified between I- and P-spikelets, 916 up-regulated DEUs were mapped to KEGG pathways related to metabolic activity (Additional file 6: Table S5).
A total of 5,494 unigenes were differentially expressed in the DEU sets of spikelets/F-leaves vs F- branchlets. Upregulated DEUs in F-branchlets were mapped to KEGG pathways including phenylalanine metabolism, phenylpropanoid biosynthesis, ABC transporters, and flavone and flavonol biosynthesis (Fig. 5b). Those that were upregulated in F- and NF-leaves vs F- branchlets were mainly mapped to plant hormone signal transduction, homologous recombination, base excision repair, and mismatch repair (Additional file 6: Table S5). Notably, 3,275 (50.20% of the total) DEUs found between NF- and F-branchlets were upregulated; these were mainly mapped to KEGG pathways related to replication and recombination (Additional file 6: Table S5). Those that were downregulated were mainly mapped to carbon fixation and photosynthesis (Additional file 6: Table S5).
We also found that 6,966 (43.69% of the total) DEUs found in spikelets/F-branchlets vs F-leaves were up-regulated, and were mainly mapped to KEGG pathways related to carbohydrate metabolism (Fig. 5c). 2,492 (49.52%) DEUs in NF-vs F-leaves were up-regulated, and these were mainly mapped to starch and sucrose metabolism (Additional file 6: Table S5). In contrast, downregulated DEUs were mainly mapped to KEGG pathways related to photosynthesis (Additional file 6: Table S5).
Two putative FT orthologs were regulated differently in the circadian rhythm–plant pathway
Among the 5,032 DEUs identified between NF- and F-leaves, 70 were mapped to the circadian rhythm–plant KEGG pathwayAdditional file 7: Fig. S2) and 10 of them showed differential expressions (Additional file 8: Table S6). Notably, c109220.graph_c0 and c110963.graph_c4 were both annotated as FT orthologs: the former was a putative bambooortholog of Heading date 3a (Swissprot: PE = 1 SV = 1), and the latter was another ortholog of rice FT; we designated them as FmHd3a and FmFT, respectively.
As shown in Fig. 6a, protein sequence alignment indicated that both FmFT and FmHd3a had high amino acid sequence similarities (77.14%) with the known FT/TFL1 proteins and had the critical amino acids of FT/Hd3a proteins. For example, they both carry a conserved phosphatidylethanolamine-binding protein (PEBP) domain, D-P-D-x-P and G-x-H-R motifs, and the invariant histidine (asterisk), all of which are relevant to the ability of PEBP proteins to bind phosphoryl ligands, hence interfering with certain kinases and effectors [27–28]. Furthermore, all five proteins carry tyrosine–139 and tryptophan–143 (triangles) in the PEBP domain, two conserved sites in FT homologs acting as flowering inducers [29]. Whereas, our comparison also revealed that there were differences in amino acid sequences between FTs and Hd3as.
Additionally, phylogenetic analysis indicated that these ten proteins were subdivided into two distinct subgroups (Fig. 6b). FT and Hd3a proteins from O. sativa (OsFT, OsRFT1 and OsHd3a), F. macclureana (FmFT and FmHd3a) and A. thaliana (AtFT) were clustered in a branch. Furthermore, OsFT and FmFT were clustered together, just like OsHd3a and FmHd3a; they were both closer to each other than they were to AtFT.
Interestingly, two FT orthologs were regulated differently in NF- vs F- leaves. FmHd3a was significantly upregulated in F-leaves (FDR = 4.23, log2FC = 5.55), while FmFT was significantly downregulated in F-leaves (FDR = 4.25E–07, log2FC = –4.81). RT-qPCR analysis also showed that FmHd3a was significantly more highly expressed in I- /P- spikelets and F-leaves than in NF-leaves or NF- branchlets (Fig. 6c).
WGCNA results identified gene modules related to specific tissues
WGCNA results showed that unigenes expressed in the six different tissues of flowering and nonflowering plants tested here clustered into 18 branches representing 18 different genetic modules (Additional file 9: Fig. S3a). Unigenes within each module were highly co-expressed, while those in different modules were co-expressed to a lower degree (Additional file 9: Fig. S3b). In six of the samples collected, we identified nine significant gene modules including 1,344 unigenes. Here, correlation coefficient of a module with a related trait > 0.7 was used as a threshold of significance (Additional file 9: Fig. S3c). Notably, these six tissues were more strongly divided into clades according to whether they were flowered or not rather than by the differences among tissues (Additional file 9: Fig. S3d).
In addition, the unigenes in gene modules relating to I - and P- spikelets were most strongly enriched in KEGG pathways related to carbohydrate metabolism, genetic information processing, and environmental information processing. In contrast, those related to F- and NF- branchlets were mostly enriched in KEGG pathways related to metabolism, plant hormone signal transduction, and genetic information processing. The gene modules related to F-leaves were enriched in pathways related to plant hormone signal transduction and protein processing, while the gene modules related to NF-leaves were enriched in KEGG pathways related to oxidative phosphorylation (Additional file 10: Table S7).
Identification of SSRs
We detected a total of 9,296 SSRs in 7,668 unigenes longer than 1,000 bp (Additional file 11: Table S8). 1,628 (21.23%) unigenes contained more than one SSR. Mono-nucleotide repeats were the most common (46.28% of all SSRs) at a density of 71 SSRs per Mb, followed by tri- (26.32%) and di- (22.06%) nucleotide repeats, with densities of 40 and 32 SSRs per Mb, respectively (Fig. 7).