Revealing the full-length transcriptome of caucasian clover rhizome development

doi:10.21203/rs.3.rs-18210/v2

Download PDF

Research article

Revealing the full-length transcriptome of caucasian clover rhizome development

https://doi.org/10.21203/rs.3.rs-18210/v2

This work is licensed under a CC BY 4.0 License

Journal Publication

published 16 Sep, 2020

Read the published version in BMC Plant Biology →

You are reading this latest preprint version

Background: Caucasian clover (Trifolium ambiguum M. Bieb.) is a strongly rhizomatous, low-crowned perennial leguminous and ground-covering grass. The species may be used as an ornamental plant and is resistant to cold, arid temperatures and grazing due to a well-developed underground rhizome system and a strong clonal reproduction capacity. However, the posttranscriptional mechanism of the development of the rhizome system in caucasian clover has not been comprehensively studied. Additionally, a reference genome for this species has not yet been published, which limits further exploration of many important biological processes in this plant.

Result: We adopted PacBio Sequencing and Illumina Sequencing to identify differentially expressed transcripts in five tissues taproot (T1), horizontal rhizome (T2), swelling of taproot (T3), rhizome bud (T4) and rhizome bud tip (T5) of the caucasian clover rhizome. In total, we obtained 19.82 GB clean data and 80,654 nonredundant transcripts were analysed. Additionally, we identified 78,209 open reading frames (ORFs), 65,227 coding sequences (CDSs), 58,276 simple sequence repeats (SSRs), 6,821 alternative splicing (AS) sites, 24,29 long noncoding RNAs (lncRNAs) and 4,501 putative transcription factors (TFs) from 64 different families. Compared with other tissues, T5 exhibited more differentially expressed genes, and co-upregulated genes in T5 are mainly annotated as involved in phenylpropanoid biosynthesis. We also identified betaine aldehyde dehydrogenase (BADH) as a highly expressed gene-specific to T5. A weighted gene co-expression network analysis (WGCNA) cluster analysis of transcription factors and physiological indicators were combined to reveal 11 candidate genes (MEgreen-GA3), three of which belong to the HB-KNOX family, that are up-regulated in T3. We analysed 276 differential transcripts involved in hormone signaling and transduction, and the largest number of transcripts are associated with the IAA signaling pathway, with significant up-regulation in T2 and T5.

Conclusions: Taken together, this study contributes to our understanding of gene expression across five different tissues and provides preliminary insight into rhizome growth and development in caucasian clover.

Plant Physiology and Morphology

Plant Molecular Biology and Genetics

Caucasian clover (Trifolium ambiguum. Bieb.)

full-length transcriptome

RNA-Seq

rhizome

plant hormone

TFs

Caucasian clover (Trifolium ambiguum M. Bieb.) also known as Kura clover, is low-crowned perennial legume that is strongly rhizomatous[1]; the species originates from the region encompassing Caucasian Russia, eastern Turkey and northern Iran[2]. Caucasian clover can protect lawns as a flowering species and an ornamental plant[3]. Compared with white clover, caucasian clover has lower fiber concentrations, greater protein concentrations and forage digestibility [4]，outperforming in high aluminum soils[5] and provided high-quality fodder during the year when white clover grows insufficient[6]. Caucasian clover has deep, semi-woody, usually branched main roots, and many branched roots grow new plantlets, either at the ends or nodes[7-9]. The species can tolerate continuous grazing by cattle (Bos taurus)[7], extreme winter temperatures[8], seasonal moisture deficit and many serious diseases that affect other types of clover[9, 10]. These features are attributed to its prominent primary roots, low-spread crowns and well-developed rhizome systems[11].

Recently, high-throughput RNA sequencing (RNA-seq) technology has become a powerful and cost-efficient means to facilitate an understanding of differential gene expression and regulatory mechanisms, especially for plant species without a reference genome[12, 13, 14]. RNA-seq has been widely used in the study of rhizome transcriptomics, including in various rhizomatous species such as sorghum (Sorghum halepense and Sorghum propinquum)[15, 16], bamboo (Phyllostachys praecox)[17], Oryza longistaminata[18-20], Equisetum hyemale[21], Panax ginseng[22], Phragmites australis[23], tropical lotus (Nelumbo nucifera)[24], CangZhu (Atractylodes lacea)[25], Ligusticum chuanxiong[26], ginger (Zingiber officinale)[27], and Miscanthus lutarioriparius[12]. Identification of the energy, metabolism, hormones and protein genes associated with rhizome development has revealed that plant rhizomes are also rich in growth-related regulatory factors. For example, 48 important transcription factors (TFs) belonging to the bHLH families YABBY, NAM, TCP, TALE, and AP2 are expressed specifically or in abundance in the shoot tip and elongation regions of Oryza longistaminata [18-20].

In recent years, an increasing number of Pacific Biosciences (PacBio) sequencing full-length transcriptomes have been generated. The PacBio sequencing is a single-molecule sequencing technology with a longer read length than second-generation sequencing and an average read length of up to 15 kb, also named Single Molecule, Real-Time (SMRT). It does not require assembly and can completely retain the entire sequence from the 3' to 5' ends of an RNA, but with a higher error rate; moreover, the second-generation stepwise approach can correct mistakes[28-31]. PacBio has been utilized to detect more than 42,280 different splicing isoforms and a large number of AS events were found to be associated with the rhizome system and assist genome annotation in orchardgrass[32,33]. The results indicate that posttranscriptional regulation plays an important role in the rhizome system. Moreover, a combination of Illumina and PacBio sequencing applied to various root tissues, particularly the periderm, has provided a more complete view of the Danshen (Salvia miltiorrhiza) transcriptome[34]. PacBio sequencing and RNA-seq analysis together have also been used to identify differentially expressed transcripts along a developmental gradient from the shoot apex to the fifth internode of Populus P. deltoides×P. euramericana cv.‘Nanlin895’, showing 15,838 differentially expressed transcripts, of which 1,216 are TFs[35].

Compared with traditional herbage legumes, such as white clover and red clover, the rhizome is one of the most distinctive characteristics of caucasian clover[9, 36]. The rhizome system has important functions regarding energy storage, transport and vegetative reproduction[37-40]. Elucidating the molecular mechanisms underlying rhizome initiation and development will not only contribute to a better understanding of this important biological process but will also serve as a theoretical basis for efficiently identifying finding important rhizome genes. In this study, we combined Illumina and PacBio sequencing to generate a more complete caucasian clover full-length transcriptome, with analysis of gene expression in five different tissues of caucasian clover rhizomes. To the best of our knowledge, this is the first report on transcriptome profiling of candidate genes related to rhizome development in caucasian clover. These genes are excellent candidates for further functional characterization to elucidate their roles in rhizome differentiation, growth and development.

Analysis of PacBio sequencing datasets

Transcriptome sequencing of the caucasian clover rhizome was completed, and 19.82 GB of clean data were obtained using one cell. We identified 658,323 reads of inserts (ROIs) with a mean length of 2,286 bp, quality of 0.94; 12 passes from 720,832 polymerase reads with full passes>=0 and a predicted consensus accuracy>0.8 (Table 1). In total, the ROIs included 62.87% (449,460) full-length (FL) reads and 29.4% (193,513) non full-length reads of the entire transcriptome sequence from the 5’ to the 3’ end and polyA tail. Additionally, the number of full-length nonchimeric (FLNC) reads was 441,885, with an average full-length nonchimeric read length of 1,969 bp (Table 1). The main number distribution of cDNA and ROIs were shown in Additional file 1: Figure S1a and S1b.

As PacBio sequencing results have a high error rate, FLNC reads were clustered using the iterative isoform-clustering (ICE) algorithm and corrected with the Illumina Hiseq2500 platform to correct errors. We generated 227,516 consensus isoforms with an average consensus isoform length of 2,086 bp, including 148,836 high-quality isoforms (Table 1). We successfully obtained 80,654 non-redundant transcripts using CD-HIT for caucasian clover rhizomes analysis.

Prediction of ORFs, SSRs and lncRNAs and identification of AS events

To identify putative protein-coding sequences, we predicted 78,209 open reading frames (ORFs) using TransDecoder. In total, 65,227 coding sequences (CDSs) were identified with start and stop codons, and the distribution of the numbers and lengths of complete CDSs is shown in Fig.1a. Among them, 12,630 transcripts were distributed in the 100-200 bp range.

A total of 79,424 sequences (167,351,883 bp) were examined, including 58,276 simple sequence repeats (SSRs) and 36,110 SSR-containing sequences (Additional file 2: Table S1). The number of sequences containing more than one SSR was 13,856, and the number of SSRs present in compound form was 10,041. In addition, most are mononucleotides (33,533), dinucleotides (8,610) and trinucleotides (14,026).

In this study, 2,429 long noncoding RNAs (lncRNA) transcripts were predicted by a coding potential calculator (CPC), coding-non-coding index (CNCI), pfam protein structure domain analysis and coding potential assessment tool (CPAT) (Fig.1b), revealing candidate lncRNAs for future research.

Regarding alternative splicing (AS) events, 6,821 were detected. Because no reference genome is available for caucasian clover, we could not identify the types of AS. Nonetheless, as AS is an important mechanism for regulating gene expression and producing proteome diversity, we show the results of these AS events in the KEGG enrichment (Fig.1c), and the genes were found to be highly enriched in the following categories: “glycolysis/Gluconeogenesis” (147), “spliceosome” (129), “carbon metabolism” (129), “protein processing in endoplasmic reticulum” (109) and “biosynthesis of amino acids” (96).

Transcript annotation

We annotated 77,927 (96.61%) transcripts in at least one of seven databases, including NCBI non-redundant protein (NR), Swiss-Prot (a manually annotated and reviewed protein sequence database), Gene Ontology (GO), Clusters of Orthologous Groups (COG), EuKaryotic Orthologous Groups (KOG), Protein family (Pfam) and Kyoto Encyclopedia of Genes and Genomes (KEGG). The number of detailed annotations for five of the databases (GO, COG, NR, KEGG and Swiss-Prot) is shown in a Venn diagram (Additional file 3: Figure S2).

Through homologous species analysis comparing transcriptome sequences in the NR database, 77,721 transcripts were annotated. Approximately 65.23% (50,689) of the sequences were aligned to Medicago truncatula sequences, followed by Cicer arietinum (23.72%, 18,435) (Fig.2a).

All the assembled transcripts were subjected to searches against the COG database to evaluate the effectiveness and completeness of the transcriptome annotation, and the results were divided into 26 main categories (Fig.2b). The clusters “general function predicted only” (10,388), “transcription” (6,199), “replication, recombination and repair” (5,988), represented three of the largest groups, followed by “signal transduction mechanisms” (5,828) and “posttranslational modification, protein turnover, chaperone” (2,943).

A 18,529 KEGG pathway analysis was performed to identify associated biochemical pathways (Fig.2c). A total of 33,383 (42.84%) transcripts were matched in the KEGG database and further classified into 128 KEGG pathways. “biosynthesis of amino acids” (1396), “carbon metabolism” (1389), “protein processing in endoplasmic reticulum” (1232), “starch and sucrose metabolism” (1189) and “spliceosome” (1170) were the most represented pathways.

Based on the GO analysis, 57,583 transcripts were enriched in the three ontologies (Fig.2d) “biological process”, “molecular function” and “cellular component”. Transcripts involved in biological processes mainly included “metabolic process” (39,010), “cellular process” (33,383), and “single-organism process” (26,933). In molecular function, transcripts were mainly enriched in “binding” (32,350), “catalytic activity” (32,206), and “transporter activity” (3,404). Regarding the “cellular component” category, the major classes of transcripts were related to “cell part” (24,076) “cell” (23,984) and “organelle” (16,648).

Expression of specific genes and statistics ofdifferentially expressed genes (DEGs)

We investigated the transcript expression levels in the five tissues, and T1 had the highest number of expressed genes (76,124), followed by T4 (75.978), T2 (75,885), T3 (74,396) and T2 (74,327) (Additional file 4: Figure S3a). The number of genes co-expressed in each tissue was 68,241. Fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM) values were used to represent the expression levels of genes, and the distributions in each tissue are shown in Additional file 4: Figure S3b. To determine specific genes expressed in tissues and provide insight into the specialized developmental process, specific genes (at least two repeats and FPKM>0.1) with the top 5 expression levels were selected for investigation(Table 2). Among them, FPKMs were higher in T2 and T5: F01_cb16574_c994/f1p0/1592 (mitogen-activated protein kinase, MAPK3) and F01_cb71 58_ c 94/f1p0/1000 (betaine aldehyde dehydrogenases, BADH).

To identify the gene expression differences in the development of the caucasian clover rhizome, we focused on the identification of differentially expressed genes (DEGs). As shown in the diagram of the DEG distribution in different rhizome tissues (Fig.3a), T3 and T4 had the largest number (33,612) of DEGs, with 18,372 up-regulated and 15,240 down-regulated genes. Moreover, T5 and other tissues (T1, T2, T3 and T4) had more DEGs; 4,585 co-upregulated genes and 4,196 co-downregulated genes were found in T5 compared with different tissues (Fig.3b and c). With regard to the up-regulated genes in T5, only 65 were enriched in the KEGG analysis: “phenylpropanoid biosynthesis” (9), “protein processing in endoplasmic reticulum” (8), “phenylalanine metabolism” (7), “carbon metabolism” (7) and “isoflavonoid biosynthesis” (6) (Fig.3d). For the down-regulated genes in T5, 231 genes were annotated, mainly involving “carbon metabolism” (35), “biosynthesis of amino acids” (34), “glycolysis/gluconeogenesis” (29), “protein processing in endoplasmic reticulum” (25) and “spliceosome” (21) (Fig.3e).

TFs prediction and WGCNA analysis

Transcription factors (TFs) play critical roles in plant growth and development. We examined 4,501 putative TFs from 64 different families((Additional file 5: Table S2), and the top 20 TF families are shown in Fig.4a, with the AP2/ERF-ERF (374) , C3H (372), BHLH (324), WRKY (305), GRAS (302), NAC (270), BZIP (246), C2H2 (239), and MYB-related (180) families having the most. These TFs are widely involved in plant growth and responses to stress and are related to rhizome development.

We used weighted gene co-expression network analysis (WGCNA) to further explore the relationship between TFs (filtering TF with FPKM value <1 and K-ME<0.7) and physiological characteristics in the rhizome of caucasian clover (Additional file 6: TableS3). Highly correlated TFs genes clusters are defined as modules, in which TFs genes within the same cluster are highly correlated. WGCNA analysis identified eight distinct modules (labelled with different colours in Fig.4b). The correlation coefficient between the characteristic genes of each module of 10 different modules and each different sample (trait) is presented in Fig.4c. Notably, IAA-trait and GA3-trait were significantly correlated with the MEgreenyellow modules and MEgreen modules (r² > 0.8, p <10^-4). The majority genes of the green module were up-regulated for six traits, except for ABA; most of these TFs were mainly up-regulated in T3 (Additional file 7: FigureS4). We identified 11 hub genes based on the criteria of KME (eigengene connectivity) ≥0.99 and edge weight value ≥0.5 in the green module based on the regulatory network (Fig.4d). These hub genes mainly belong to the HB-KNOK, AP2/ERF-ERF, GRAS, C2H2, C3H and NAC families (MEgreen-GA3); moreover, these TF families were upregulated in T1, T2 and T3, particularly in T3 (Fig. 4e), and may be related to the formation of nodules in the rhizome.

Identiﬁcation of hormone signalling-related genes in rhizome development

Plant hormones play an important role in all aspects of development. Accordingly, we mapped DEGs to hormone signalling and transduction pathways for caucasian clover and analysed their expression in different tissues. In total, 276 transcripts are related to the synthesis and metabolism of eight hormones, including auxin (IAA), abscisic acid (ABA), ethylene (ETH), cytokinin (CTK), gibberellic acid (GA), brassinosteroid (BR), jasmonic acid (JA) and salicylic acid (SA) (Fig.5). The maximum number of transcripts (62) is related to IAA synthesis and metabolism, followed by ABA at 60, JA at 52, SA at 24, ETH at 28, BR at 20, CTK at 18 and GA at 10. All these significant genes related to hormones synthesis and metabolism exhibited different expression in the different tissues. In addition, most genes associated with the IAA pathway were up-regulated in T2 and T5. Regarding SA signalling, almost all genes belonged to the TGA family, with up-regulation only in T4. Most transcripts associated with BR signaling showed higher levels in T4 and T5. For CTK transduction, all crucial genes associated with CTK signalling pathway were identified as DEGs. Only three genes showed no up-regulation trend in T3, which may be the tissues in which cells divide in large numbers. Only ten DEGs were associated with GA signalling; five transcripts were classified as GID1 and significantly up-regulated in T2. Genes related to ABA signalling and transduction displayed no significant change. For JA signalling, 52 transcripts all were annotated as JAZ, with up-regulation in T1, T2 and T3. In contrast, most of ETH signaling genes exhibited higher expression in T3, and all were down-regulated in T2.

Verification of gene expression by qRT-PCR

To confirm the accuracy of the genes obtained by RNA-seq, twelve genes including six plant hormone signal transduction genes, three TFs and three genes belonging to other classes were randomly selected for quantitative real-time RT-PCR (qRT-PCR) analysis. Good reproducibility between the qRT-PCR and RNA-seq results was indicated by Pearson’s correlation analysis, verified the accuracy and reliability of the RNA-seq data (Additional file 8: TableS4).

The rhizome of caucasian clover is unique among legume species, endowing it with particular clonal reproduction characteristics and resistance to stress. In this study, we obtained high-quality transcript sequences for the caucasian clover rhizome by PacBio and Illumina sequencing, and the results will contribute to our understanding of rhizome growth and development and lay a molecular foundation for further study.

AS is a vital mechanism regulating gene expression and producing protein diversity[41]. The numbers of AS events identified for the first time in the caucasian clover rhizome was found to be lower than that in Medicago sativa (7,568)[42] but higher than that in Trifolium pratense (5,492)[13]. Our study on the special characteristics of the caucasian clover rhizome was hindered by the lack of a reference genome, and it is impossible to judge the type of AS.

Many amino acids (phenylalanine, tyrosine and tryptophan) are not only important components of proteins but are also precursors of many secondary metabolites. These secondary metabolites are crucial for plant growth[43]. Similar to the rhizome of other plants (Oryza longistaminata and Miscanthus lutarioriparius)[19, 12], some basal metabolism plays an important role in the rhizome of caucasian clover, for example, carbon metabolism (1,389) and starch and sucrose metabolism(1,189).

By analysing the specific gene expression in each tissue, we found that MAPK3 (F01_cb16574_c994/f1p0 /1592) is mainly specifically expressed gene in T2 (Table 2). The MAPK family has been studied in tobacco, and it may be involved in growth, development, response to plant hormone signals and environmental signals[44]. T5 is very different from root tissue (T1, T2 and T3) (Fig.3a), with more DEGs, and it is where new aboveground parts are formed. A specific gene with the highest FPKM in T5 was observed for betaine aldehyde dehydrogenase 1 (F01_cb7158_c94/f1p0/1000) (Table 2). BADHs are involved in glycine betaine synthesis and act as plant osmotic regulators, with important roles in abiotic stress[45]. Experiments have shown that BADH can increase the abiotic stress tolerance of sweet potato and carrot, such as salt stress, oxidative stress and low-temperature stress, maintaining cell membrane integrity[46, 47]. BADH was specifically expressed in a large amount in T5. This may be because T5 is relatively more fragile than other tissues, and BADHs may protect T5 for promoting the growth of new plants. Thus, we speculate that defence and stress response play a vital role in the development of caucasian clover. This may be the reason for its ability to grow in extreme winter temperatures, or it may be a necessary condition for a large rhizome system.

TFs can offer knowledge into the gene-regulating networks controlling developmental programs and are recognized as major players in better understanding root tissue differentiation and root development in response to internal growth regulators as well as environmental signals[48, 49]. It has been reported that the genes involved in hormone metabolism, cellulose synthesis energy, metabolism substance synthesis and transportation stress as well as expansion-related protein genes and TFs such as HLH, TCP ,WRKY, bZIP, MYB and NAC participate in the formation of lotus root rhizomes. In addition, the AP2/ERF TF family had the greatest number in our rhizome(Fig.4a), among which ethylene response factors, such as BBM/PLT4 and PLT1-3, have been described as master regulators of root meristem initiation and maintenance in Arabidopsis thaliana[50, 51]. In Raphanus sativus and Medicago sativa, the abiotic stress response mechanism regulated by AP2/ERF has been carefully studied[52, 53]. The Arabidopsis NAC family member NAC1 transduces auxin signals downstream of TIR1 to promote lateral root development[54]. In the WRKY family, WRKY75 was reported to be involved in regulating nutrient starvation response and root development[55]. It is worth noting that we performed WGCNA clustering for TFs and found that 3 of 11 hub genes belong to the HB-KNOX family in MEgreen-GA3 (Fig. 4d). GA can suppress the effect of elevated KNOX gene expression, and there is a possibility of modifying KNOX gene expression to alter plant structure through local changes in GA levels[56]. In Arabidopsis, the GA20ox1 mRNA level is reduced in leaves overexpressing the KNOX proteins STM or BREVIPEDICELLUS[55]. Moreover, in model plants , such as Arabidopsis, maize and tobacco, KNOX gene expression is confined to the shoot meristem and stem[58]. However, in the underground rhizome of caucasian clover, KNOX genes were identified as hub genes, especially in the main tissue of the swollen taproot (T3). Whether it is consistent with KNOX regulation of Arabidopsis meristem stems and buds are worth further investigation[59]. Xi Cheng found that in pear plants co-expressing KNOX and PbKNOX1 these factors are involved in cell wall thickening and lignin biosynthesis, with inhibition of key structural genes involved in lignin synthesis[60].

There is growing evidence that hormones affect tillering growth and the formation of storage organs[61, 12]. Yi Kun’s research has illustrated 600 mg·L^{- 1} GA₃can promote the growth and development of caucasian clover rhizome and increase the content of endogenous IAA, ZT and GA₃[62]. T1 and T2 showed high GID1 expression and GA may be involved in photoperiod induction and regulation of the formation of storage organs and rhizome elongation[63]; therefore, TI and T2 may be key organs for nutrient storage. In addition, JAZ accumulated in roots (T1, T2 and T3). It is believed that when pathogens invade and abiotic stress occurs, JAZ-MYC form an immune network, followed by JAZ protein degradation and MYC TF release[64]. T3 is closely linked to ETH and CTK-mediated pathways, and it may be responsible for root swelling. It has been demonstrated that auxin activates root formation and that cytokinins mediate root identity, early primordial disintegration and early loss of bud development initiation[65]. CTKs are conducive to rhizome enlargement but not to rhizome induction.

Genes related to the IAA anabolic pathway were downregulated in T3 and T4. It may be that IAA does not participate in root enlargement or induce bud production but that it is closely related to lateral root development. T1, T2 and T3 might mainly function as storage organs, providing energy for plant growth. In addition, T4 tissue may be relatively fragile and require the SA pathway to mediate immunity to prevent pathogen infection and to grow new plants. BR signaling is mainly involved in plant growth and plant morphology development and related genes were upregulated in T5[66]. T4 and T5 are mainly associated with resistance to stress and secondary metabolic pathways. Of course, hormones are not the only factors that regulate the development of apical meristems and lateral organs; they often cooperate with TFs to balance the maintenance of meristems and organogenesis.

In summary, we provided a full-length transcriptome of the caucasian clover rhizome based on PacBio sequencing and Illumina sequencing, revealing gene expression and annotation for different tissues. We highlighted the role of hormones and TFs in the rhizome of caucasian clover, investigated the expression of hormone-pathway related genes in different tissue of caucasian clover and identified 11 candidate genes in TF- and GA-related modules by WGCNA. In this study, a set of genes related to rhizome development was identified, laying the foundation for further functional genomics research on rhizome development.

Plant materials and RNA preparation

The taproot (T1), horizontal rhizome (T2), swelling of taproot (T3), rhizome bud (T4) and rhizome bud tip (T5) of 3-year-old caucasian clover (Fig.6) were collected from a test field at Northeast Agricultural University (E 126º14'; N 45º05' in August 2018 ). We placed each sample collected into a 1.5 ml centrifuge tube with three replicates from five individual plants for each tissue to ensure RNA quantity. Each sample showed good correlation (R²>0.8; Additional file 9: Figure S5). The original sources of the plant materials were introduced from Inner Mongolia Grass Variety Engineering Technology Research Center of Inner Mongolia Agricultural University. The Inner Mongolia Grass Variety Engineering Technology Research Center of Inner Mongolia Agricultural University undertook the formal identification of the samples, provided details of the specimens deposited and allowed the collection. The IPNI Life Sciences Identifier (LSID) for caucasian clover is urn:lsid:ipni.org:names:522843-1. Plants were removed from the soil bed, and the roots were washed gently with running water, frozen in liquid nitrogen and immediately stored at -80℃. Total RNA was extracted using Trizol. RNA degradation and contamination were monitored by 1.2% agarose gel electrophoresis. The quantity and integrity of the extracted total RNA were determined using a NanoDrop and an Agilent 2100 bioanalyzer[13]. For each RNA sample, 1μg was pooled and sequenced by PacBio single-molecule long-read sequencing (PacBio Sequel, Menlo Park, USA) and Illumina sequencing (Illumina NovaSeq6000, California, U.S.A) in parallel.

PacBio cDNA library preparation and sequencing

Full-length cDNA was synthesized using the SMARTerTM PCR cDNA Synthesis Kit and then subjected to full-length cDNA PCR amplification and repair of cDNA ends. The concentration and quality of the cDNA library were determined using the Qubit 2.0 Fluorometer and an Agilent 2100 bioanalyzer[67]. The 1-6-KB library was sequenced via PacBio Sequel.

Illumina cDNA library construction and sequencing

First, 15 samples of eukaryotic mRNA were enriched with magnetic beads with oligo(dT) and randomly broken into small fragments in a fragmentation buffer. First-strand cDNA was synthesized using six-base random hexamers with a small fragment of mRNA as a template. The second cDNA strand was synthesized by adding buffer, dNTPs, RNase H and DNA polymerase I, and the cDNA was purified by AMPure XP beads. The purified double-stranded cDNA was subjected to end repair, a tail was added, and the sequencing linker was ligated; the fragment size was then selected using AMPure XP beads. The final cDNA library was assessed by PCR, and the quality of the cDNA library was determined using an Agilent 2100 Bioanalyzer (Santa Clara, CA). The libraries were sequenced from both 5’ and 3’ ends using Illumina NovaSeq.

PacBio sequencing data analysis

Raw reads were processed into error-corrected reads of insert (ROIs) using the Iso-seq pipeline with minFullPass=0 and minPredictedAccuracy=0.80. Next, full-length, non-chimeric (FLNC) transcripts were determined by searching for the polyA tail signal and the 5’ and 3’ cDNA primers in ROIs. ICE was used to obtain consensus isoforms and FL consensus sequences from ICE data, which were further processed using Quiver. High-quality FL transcripts were classified with the criterion postcorrection accuracy above 99%. Iso-Seq high-quality FL transcripts were obtained, and redundancy was removed using cd-hit (identity > 0.99)[68].

Illumina sequencing data analysis

Raw data (raw reads) in FASTQ format were first processed through in-house Perl scripts. In this step, clean data (clean reads) were obtained by removing reads containing adapters, reads containing poly-N and low-quality reads from the raw data. At the same time, the Q20, Q30, GC-content and sequence duplication level of the clean data were calculated. All downstream analyses were based on these clean data with high quality. These clean reads were then mapped to the PacBio reference genome sequence. Clean data were normalized by converting the fragment counts to fragments per kilobase of transcript per million mapped reads (FPKM). A differential expression analysis was performed using DESeq (v 1.10.1); a fold change ≥4 and FDR < 0.01 based on DESeq was considered differential expression[69].

Detection of SSR, ORFs, AS and lncRNA

Simple sequence repeats (SSRs) in the transcriptome were identified using MISA (http://pgrc.ipk-gatersleben.de/misa/), with only transcripts ≥ 500 bp being detected.

TransDecoder software (https://github.com/TransDecoder/TransDecoder/releases) was employed to identify reliable potential coding sequences (CDSs) from the transcript sequences based on the open reading frame (ORF) length, log-likelihood score, and amino acid sequence comparison in the Pfam database.

We used Iso-SeqTM data directly to run all-vs-all BLAST with high-identity settings[70]. BLAST alignments that met all criteria were considered products of candidate alternative splicing (AS) events[71], with two HSPs (high segmentation pairs)＞1000 bp in the alignment. Two HSPs have the same forward/reverse orientation, and one sequence should be contiguous in the same alignment or have a small overlap of less than 5 bp. The other should be different to show "AS Gap", and the contiguous sequence should align completely with the different sequences. The AS gap should be greater than 100 bp and at least 100 bp from the 3'/5' end.

Four computational approaches include CPC[72]/CNCI/CPAT[73]/Pfam[74] were combined to sort nonprotein-coding RNA candidates from putative protein-coding RNAs among the transcripts. Putative protein-coding RNAs were filtered using a minimum length and exon number threshold. Transcripts with lengths more than 200 nt and more than two exons were selected as long noncoding RNA (lncRNA) candidates and further screened using CPC/CNCI/CPAT/Pfam, which have the power to distinguish protein-coding genes from noncoding genes.

Functional annotation

Annotation information on the obtained nonredundant transcript sequences was based on BLAST in the following databases: NR, Pfam[74], KOG[75], COG;, swiss-Prot[76], KEGG[77] and GO[78].

Real-time RT- PCR

Quantitative real-time RT-PCR (qRT-PCR) was conducted in a 10-μl volume containing 0.5μl diluted cDNA, 0.2μl forward primer, 0.2μl reverse primer, and 1×SYBR Premix Ex Taq II (TaKaRa) with the following conditions: 95ºC for 180 sec, followed by 40 cycles of 95ºC for 15 sec, 59ºC for 15 sec and 72ºC for 15 sec. The 2^-^△△Ct method was used to calculate relative expression levels. All reactions were performed with three replicates. All primers used are shown in Additional file 10: Table S5.

Ethics approval and consent to participate

Not applicable.

Consent to publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

Raw reads of one combined PacBio library and one Illumina RNAseq library generated in this study are available from BioProject at NCBI (https://www.ncbi. nlm.nih.gov/bioproject/) under accession numbers PRJNA586585 and PRJNA588309, respectively.

Funding

This research was funded by the National Natural Science Foundation of China (31802120), Research and Demonstration of Large-scale Artificial Grassland Combined Plant and Circular Mode (2017YFD0502106) and Academic Backbone Fund Project of Northeast Agricultural University. The funders did not design the experiment or draft and revise the manuscript.

Authors’ contributions

XY designed the experiment and revised the manuscript. KY performed data processing and drafted the manuscript. YZ, YH and XC prepared the materials and performed the experiments. TH and JL assisted in manuscript preparation. GC conceived the study. All authors read and approved the ﬁnal version of the manuscript.

Acknowledgements

We thank the National Natural Science Foundation of China (31802120) for supporting this work.

OFRs: Open reading frames; CDSs: Coding sequences; SSRs: Simple sequence repeats; AS: Alternative splicing; LncRNA: Long non-coding RNA; TF: Transcription factors; KB: Kilobase; ROI: Reads of insert; ICE: Iterative isoform-clustering; FL: full-length; FLNC: Full-length non chimera; CNCI: Coding-Non-Coding Index; CPC: Coding potential; CPAT: Coding potential assessment tool calculator; FLNC: Full-length non chimera; COG: Clusters of Orthologous Groups; NR: NCBI non-redundant protein; GO: Gene Ontology; KOG: EuKaryotic Orthologous Groups. Pfam: Protein family; KEGG: Kyoto encyclopedia of Genes and Genomes; DEG: Differentially expressed genes; FPKM: Fragments per kilobase of transcript sequence per million base pairs sequenced; WGCNA: Weighted gene co-expression network analysis; IAA: Auxin; ABA: Abscisic acid; ETH: Ethylene; CTK: Cytokinin; GA: Gibberellic acid; BR: Brassinosteroid; JA: Jasmonic acid; SA: Salicylic acid; BADHs: Betaine aldehyde dehydrogenases; MAPK: Mitogen-activated protein kinase; qRT-PCR: Quantitative real-time RT-PCR.

Black AD, Lucas RJ. Caucasian clover was more productive than white clover in grass mixtures under drought conditions. NZGA. 2000;62:183-8.
Davis PH. Flora of Turkey and the East Aegean Islands. Q Rev Biol. 1965;35(1):103.
Lane I, Watkins E, Spivak M. Turfgrass species affect the establishment and bloom of Kura Clover (Trifolium ambiguum) in lawns. Hortscience. 2019;54(5):824.
Andrzejewska J, Contreras-Govea FE, Pastuszka A, Albrechtc KA. Performance of Kura clover compared to that of perennial forage legumes traditionally cultivated in central Europe. Acta Agr Scand. 2016;66(6):516-22.
Caradus JR, Crush JR, Ouyang L, FRASER W. Evaluation of aluminium-tolerant white clover (Trifolium repens) selections on East Otago upland soils. NZGA.2001;44:141-50.
Watson RN, Neville FJ, Bell NL, Harris SL. Caucasian clover as a pasture legume for drylanddairying in the coastal Bay of Plenty.NZCA.1996;58:83-8.
Brummer EC, Moore KJ. Persistence of perennial cool-season grass and legume cultivars under continuous grazing by beef cattle. Agron J. 2000;92(3):466-71.
Sheaffer CC, Marten GC. Kura clover forage yield, forage quality, and stand dynamics. Can J Plant Sci. 1991;71(4):1169-72.
Taylar NL, Smith RR. Kura Clover (Trifolium ambiguum M. B.) Breeding, Culture, and Utilization. Adv Agron. 1997;63(08):153-78.
Andrzejewska J, Contreras-Govea FE, Pastuszka A, Albrecht KA. Performance of Kura clover compared to that of perennial forage legumes traditionally cultivated in central Europe. Acta Agr Scand B-S P. 2016;66(6):516-22.
Speer GS, Allinson DW. Kura clover (Trifolium ambiguum): Legume for forage and soil conservation. Econ Bot. 1985;39(2):165-76.
Hu RB, Yu CJ, Wang XY, Jia CL, Pei SQ, He K, He G, Kong YZ, Zhou GK. De novo transcriptome analysis of Miscanthus lutarioriparius identifies candidate genes in rhizome development. Front Plant Sci. 2017;8(482).
Chao YH, Yuan JB, Li SF, Jia SQ, Han LB, Xu LX. Analysis of transcripts and splice isoforms in red clover (Trifolium pratense L.) by single-molecule long-read sequencing. BMC Plant Biol. 2018;18:300-12
Huang LK, Yan HD, Zhao XX, Zhang XQ, Wang J, Frazier T , Yin G, Huang X, Yan DF, Zang WJ , et al. Identifying differentially expressed genes under heat stress and developing molecular markers in orchardgrass (Dactylis glomerata L.) through transcriptome analysis. Mol Ecol Resour. 2015;151(5):497-509.
Jang CS, Kamps TL, Tang H, Bowers JE, Lemke C, Paterson AH. Evolutionary fate of rhizome-specific genes in a non-rhizomatous Sorghum genotype. Heredity. 2009;102(3):266-73.
Zhang T, Zhao XQ, Wang WS, Huang LY, Liu XY, Zong Y, Zhu LH, Yang DC, Fu BY, Li ZK. Deep transcriptome sequencing of rhizome and aerial-shoot in Sorghum propinquum. Plant Mol Biol. 2014;84(3):315-27.
Wang KH, Peng HZ, Lin EP, Jin QY, Hua X, Sheng Y, Bian HW, Ning H, Pan JW, Wang JH. Identification of genes related to the development of bamboo rhizome bud. J Exp Bot. 2010;61(2):551-61.
Hu FY, Tao DY, Sacks E, Fu BY, Xu P, Li J, Yang Y, Mcnally K, Khush GS, Paterson AH. Convergent evolution of perenniality in rice and sorghum. P Natl Acad Sci USA. 2003;100(7):4050-4.
Hu F. Identification of rhizome-specific genes by genome-wide differential expression analysis in Oryza longistaminata. BMC Plant Biol. 2011;11(1):18-32.
He RF, Salvato F, Park JJ, Kim MJ, Nelson W, Balbuena TS, Willer M, Crow JA, May GD, Soderlund CA, et al. A systems-wide comparison of red rice (Oryza longistaminata) tissues identifies rhizome specific genes and proteins that are targets for cultivated rice improvement. BMC Plant Biol. 2014;14(1):46-67.
Balbuena TS, He RF, Salvato F, Gang DR, Thelen JJ. Large-scale proteome comparative analysis of developing rhizomes of the ancient vascular plant Equisetum hyemale. Front Plant Sci. 2012;3(131).
Yang B\W, Hahm YT. Transcriptome analysis using de novo RNA-seq to compare ginseng roots cultivated in different environments. Plant Growth Regul. 2018;84(1):149-57.
Ruifeng H, Min-Jeong K, William N, Balbuena TS, Ryan K, Robin K, Crow JA, May GD, Thelen JJ, Soderlund CA. Next-generation sequencing-based transcriptomic and proteomic analysis of the common reed, Phragmites australis (Poaceae), reveals genes involved in invasiveness and rhizome specificity. Am J Bot. 2012;99(2):232-47.
Yang M, Zhu L, Pan C, Xu L, Liu Y, Ke W, Yang P. Transcriptomic analysis of the regulation of rhizome formation in temperate and tropical Lotus (Nelumbo nucifera). Sci Rep. 2015;5:13059.
Huang QQ, Huang X, Deng J, Liu HG, Liu YW, Yu K, Huang BS. Differential gene expression between leaf and rhizome in Atractylodes lancea: a comparative transcriptome analysis. Front Plant Sci. 2016;7(348).
Song T, Liu ZB, Li JJ, Zhu QK, Tan R, Chen JS, Zhou JY, Liao H. Comparative transcriptome of rhizome and leaf in Ligusticum Chuanxiong. Plant Syst Evol. 2015;301(8):2073-85.
Koo HJ, McDowell ET, Ma XQ, Greer KA, Kapteyn J, Xie ZZ, Descour A, Kim H, Yu Y, Kudrna D, et al. Ginger and turmeric expressed sequence tags identify signature genes for rhizome identity and development and the biosynthesis of curcuminoids, gingerols and terpenoids. BMC Plant Biol. 2013;13(1):27-44.
Pop M, Salzberg SL. Bioinformatics challenges of new sequencing technology. Trends in Genet. 2008;24(3):142-9.
Mason CE. Faster sequencers, larger datasets, new challenges. Genome Biol. 2012;13(3):314.
Rhoads A, Au KF. PacBio Sequencing and Its Applications. Genom Proteom Bioinf. 2015;13(5):278-89.
Li PH, Ponnala L, Gandotra N, Wang L, Si YQ, Tausta SL, Kebrom TH, Provart N, Patel R, Myers CR, et al. The developmental dynamics of the maize leaf transcriptome. Nat Genet. 2010;42(12):1060-7.
Wang T, Wang H, Cai D, Gao Y, Zhang H, Wang Y, Lin C, Ma L, Gu L. Comprehensive profiling of rhizome-associated alternative splicing and alternative polyadenylation in moso bamboo (Phyllostachys edulis). Plant J. 2017;91(4):684-99.
Huang LK, Feng GY, Yan HD, Zhang ZG, Bushman BS, Wang JP, Bombarely A, Li MZ, Yang ZF, Nie G, et al. Genome assembly provides insights into the genome evolution and flowering regulation of orchardgrass. Plant biotechnol J. 2020;18(2):373-83.
Xu ZC, Peters RJ, Weirather J, Luo HM, Liao BS, Zhang X, Zhu YJ, Ji AJ, Zhang B, Hu SN, et al. Full-length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis.Plant J.2015;82(6):951- 61.
Chao Q, Gao ZF, Zhang D, Zhao BG, Dong FQ, Fu CX, Liu LJ, Wang BC. The developmental dynamics of the Populus stem transcriptome. Plant Biotechnol J. 2019;17(1):206-19.
Lin WH, W HR, Stephen S. Physiological responses of five species of Trifolium to drought stress. Chin J Appl Environ Biol. 2011;17(4):580-4.
Schmid B, Bazzaz FA. Clonal integration and population structure in perennials: effects of severing rhizome connections. Ecology. 1987;68(6):2016-22.
Schmid B, Bazzaz FA. Growth responses of rhizomatous plants to fertilizer application and interference. Oikos. 1992;65(1):13-24.
Humphrey LD, Pyke DA. Clonal foraging in perennial wheatgrasses: a strategy for exploiting patchy soil nutrients. J Eco. 1997;85(5):601-10.
Huber-Sannwald E, Pyke DA, Caldwell MM, Durham S. Effects of nutrient ptches and root systems on the clonal plasticity of a rhizomatous grass. Ecology. 1998;79(7):2267-80.
Chaudhary S, Khokhar W, Jabre I, Reddy ASN, Byrne LJ, Wilson CM, Syed NH. Alternative splicing and protein diversity: plants versus animals. Front Plant Sci. 2019;10(10):14.
Chao YH, Yuan JB, Guo T, Xu LX, Mu ZY, Han LB. Analysis of transcripts and splice isoforms in Medicago sativa L. by single-molecule long-read sequencing. Plant Mol Biol. 2019;99(3):219-35.
Tzin V, Galili G. New Insights into the shikimate and aromatic amino acids biosynthesis pathways in plants. Mol Plant. 2010;3(6):956-72.
ZHANG S. MAPK cascades in plant defense signaling. Trends Plant Sci. 2001, 6.
Fitzgerald TL, Waters DLE, Henry RJ. Betaine aldehyde dehydrogenase in plants. Plant Biol. 2009;11(2):119-30.
Kumar S, Dhingra A, Daniell H. Plastid-expressed betaine aldehyde dehydrogenase gene in carrot cultured cells, roots, and leaves confers enhanced salt tolerance. Plant Physiol. 2004;136(1):2843-54.
Fan WJ, Zhang M, Zhang HX, Zhang P. Improved tolerance to vrious abiotic stresses in tansgenic sweet potato (Ipomoea batatas) expressing spinach betaine aldehyde dehydrogenase. PLoS One. 2012;7(5):14.
Jin JP, Zhang H, Kong L, Gao G, Luo JC. PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Res. 2014;42(D1):D1182-7.
Montiel G, Gantet P, Jay-Allemand C, Breton C. Transcription factor networks. Pathways to the knowledge of root development. Plant Physiol. 2004;136(3):3478-85.
Licausi F, Ohme-Takagi M, Perata P. APETALA/Ethylene responsive factor (AP2/ERF) transcription factors: mediators of stress responses and developmental programs. New Phytol. 2013;199(3):639-49.
Mizoi J, Shinozaki K, Yamaguchi-Shinozaki K. AP2/ERF family transcription factors in plant abiotic stress responses. Bba-Gene Regul Mech. 2012;1819(2):86-96.
Karanja BK, Xu L, Wang Y, Tang M, M'Mbone Muleke E, Dong J, Liu L. Genome-wide characterization of the AP2/ERF gene family in radish (Raphanus sativus L.): Unveiling evolution and patterns in response to abiotic stresses. Gene. 2019;718:144048.
Jin XY, Yin XF, Ndayambaza B, Zhang ZS, Min XY, Lin XS, Wang YR, Liu WX. Genome-wide identification and expression profiling of the ERF gene family in Medicago sativa L. under various abiotic stresses. DNA Cell Biol. 2019;13.
Xie Q, Frugis G, Colgan D, Chua NH. Arabidopsis NAC1 transduces auxin signal downstream of TIR1 to promote lateral root development. Gene dev. 2000;14(23):3024-36.
Devaiah BN, Karthikeyan AS, Raghothama KG. WRKY75 transcription factor is a modulator of phosphate acquisition and root development in Arabidopsis. Plant Physiol. 2007;143(4):1789-801.
Singh DP, Filardo FF, Storey R, Jermakow AM, Yamaguchi S, Swain SM. Overexpression of a gibberellin inactivation gene alters seed development, KNOX gene expression, and plant development in Arabidopsis. Physiol Plant. 2010;138(1):74-90.
Bolduc N, Hake S. The maize transcription factor KNOTTED1 directly regulates the gibberellin catabolism gene ga2ox1. Plant Cell. 2009;21(6):1647-58.
Hay A, Tsiantis M. KNOX genes: versatile regulators of plant development and diversity. Development. 2010;137(19):3153-65.
Xu L, Shen WH. Polycomb silencing of KNOX Genes confines shoot stem cell niches in Arabidopsis. Curr Biol. 2008;18(24):1966-71.
Cheng X, Li ML, Abdullah M, Li GH, Zhang JY, Manzoor MA, Wang H, Jin Q, Jiang TS, Cai YP, et al. In silico genome-wide analysis of the pear (Pyrus bretschneideri) KNOX family and the functional characterization of PbKNOX1, an arabidopsis BREVIPEDICELLUS orthologue gene, involved in cell wall and lignin biosynthesis. Front Genet. 2019;10(17).
Dello Ioio R, Linhares FS, Scacchi E, Casamitjana-Martinez E, Heidstra R, Costantino P, Sabatini S. Cytokinins determine Arabidopsis root-meristem size by controlling cell differentiation. Curr Biol. 2007;17(8):678-82.
Yi K, Zhao YH, Hu Y, Liu JX, He TT, Li X, Song P, Cui GW, Yin XJ. Effect of GA₃ and 6 -BA on rhizome segment growth and endogenous hormone content of caucasian clover. Acta Prataculturae Sinica. 2020;29(2):22-30.
Sun T. Gibberellin-GID1-DELLA: A pivotal regulatory module for plant growth and Development. plant physiol. 2010;154(2):567-70
Garrido-Bigotes A, Valenzuela-Riffo F, Figueroa CR. Evolutionary analysis of JAZ proteins in plants: an approach in search of the ancestral sequence. Int J Mol Sci. 2019;20(20).
Pernisova M, Grochova M, Konecny T, Plackova L, Harustiakova D, Kakimoto T, Heisler MG, Novak O, Hejatko J. Cytokinin signalling regulates organ identity via the AHK4 receptor in Arabidopsis. Development. 2018;145(14):11.
Nie SM, Huang SH, Wang SF, Mao YJ, Liu JW, Ma RL, Wang XF. Enhanced brassinosteroid signaling intensity via SlBRI1 overexpression negatively regulates drought resistance in a manner opposite of that via exogenous BR application in tomato. Plant Physiol Biochem. 2019;138:36-47.
Gordon SP, Tseng E, Salamov A, Zhang J, Meng X, Zhao Z, Kang D, Underwood J, Grigoriev IV, Figueroa M. Widespread polycistronic transcripts in Fungi revealed by single-molecule mRNA sequencing. Plos One. 2015;10(7): e0132628.
Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658-9.
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):106-18.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research. 1997;25(17):3389-402.
Liu XX, Mei WB, Soltis PS, Soltis DE, Barbazuk WB. Detecting alternatively spliced transcript isoforms from single-molecule long-read sequences without a reference genome. Mol Ecol Resour. 2017;17(6):1243-56.
Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007;35:345-9.
Wang L, Park HJ, Dasari S, Wang SQ, Kocher JP, Li W. CPAT: coding-potential assessment tool using an alignment-free logistic regression model. Nucleic Acids Research. 2013;41(6):7.
Lamb J, Jarmolinska AI, Michel M, Menendez-Hurtado D, Sulkowska JI, Elofsson A. PconsFam: an interactive database of structure predictions of pfam families. J Mol Biol. 2019;431(13):2442-8.
Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, et al, A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol. 2004;5(2):28.
Renaux A, UniProt C. UniProt: the universal protein knowledgebase. Nucleic Acids Research. 2018;46(5):2699.
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004;32:D277-80.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. Nat genet. 2000;25(1):25-9.

cDNA library	1-6KB
PacBio sequencing
polymerase reads	720,832
Reads of Insert	658,323
Read Bases of Insert	1,505,553,284
Mean Read Length of Insert	2,286
Mean Read Quality of Insert	0.94
Mean Number of Passes	12
Date size(GB)	19.82
ROI databases
Number of five prime reads	506,205
Number of three prime reads	521,059
Number of poly-A reads	514,174
Number of filtered short reads	15,350
Number of non-full-length reads	193,513
Number of full-length reads	449,460
Number of full-length non-chimeric reads	441,885
Average full-length non-chimeric read length	1,969
ICE clustering
Number of consensus isoforms	227,516
Average consensus isoforms read length	2,086
Number of polished high-quality isoforms	148,836
Number of polished low-quality isoforms	78,366
Percent of polished high-quality isoforms(%)	65,42
Non-redundant transcripts	80,654

Table 1 Statistics of the PacBio sequencing data, ROI databases and ICE clustering

KB (kilobase): A commonly used length unit of DNA, indicating that a certain DNA molecule contains a thousand base pairs. GB: Used to measure the amount of data, where 1GB=1,000,000 bp.

	Gene	FPKM		Description
T1	F01_cb14545_c33/f1p0/902	1.04	Diphosphoinositol-polyphosphate
	F01_cb17480_c9/f1p0/696	0.90	ADP-ribosylation factor 1
	F01_cb15376_c1/f1p0/1605	0.66	Protein of unknown function
	F01_cb14185_c1/f2p0/1231	0.50	Putative DNA-binding protein
	F01_cb8972_c16/f1p0/1728	0.42	Cytochrome P450
T2	F01_cb16574_c994/f1p0/1592	18.39	mitogen-activated protein kinase
	F01_cb17761_c3939/f1p0/3262	1.57	SNF2 family N-terminal domain;
	F01_cb10989_c11/f1p0/1938	1.21	Cytochrome P450
	F01_cb16704_c9/f2p2/873	0.60	Plant invertase/pectin methylesterase inhibitor
	F01_cb8297_c8/f1p0/2003	0.56	EH-domain-containing protein;
T3	F01_cb8987_c27/f1p0/456	4.07	ADP-ribosylation factor 1-like
	F01_cb8782_c40/f44p1/1360	3.55	peroxidase
	F01_cb7280_c24/f1p1/2016	2.57	NAC domain-containing protein
	F01_cb7489_c3/f1p0/2140	2.10	PHD-finger
	F01_cb8820_c25/f1p0/1072	2.08	ribosomal protein
T4	F01_cb17761_c89877/f1p0/2957	11.28	Auxin response factor
	F01_cb17761_c104162/f2p1/2284	11.02	BURP domain
	F01_cb16338_c23/f1p0/313	5.44	Calmodulin-like protein
	F01_cb17761_c21336/f1p0/3638	3.97	Leucine Rich Repeat
	F01_cb1066_c94/f1p0/2823	3.35	probable galactinol—sucrose galactosyltransferase 2-like
T5	F01_cb7158_c94/f1p0/1000	146.47	betaine aldehyde dehydrogenase 1
	F01_cb5102_c38/f1p0/684	38.70	probable protein phosphatase
	F01_cb16574_c20256/f2p0/1154	36.04	60S ribosomal protein L2
	F01_cb9053_c18/f1p0/304	30.32	-
	F01_cb11585_c6/f2p0/314	20.12	-

Table 2 Specific genes of the top five FPKMs for each tissue

Download PDF

Journal Publication

published 16 Sep, 2020

Read the published version in BMC Plant Biology →

Editorial decision: Major revision
08 Jun, 2020
Review #2 received at journal
21 May, 2020
Reviewer #3 agreed at journal
17 May, 2020
Reviewer #2 agreed at journal
15 May, 2020
Review #1 received at journal
11 May, 2020
Reviewers invited by journal
07 May, 2020
Reviewer #1 agreed at journal
07 May, 2020
Editor assigned by journal
26 Apr, 2020
Submission checks completed at journal
25 Apr, 2020
Editor invited by journal
25 Apr, 2020

You are reading this latest preprint version

Revealing the full-length transcriptome of caucasian clover rhizome development

Status:

Journal Publication

Version 2

Abstract

Figures

Background

Results

Discussion

Conclusion

Methods

Declarations

Abbreviations

References

Tables

Supplementary Files

Status:

Journal Publication

Version 2