Transcriptome sequencing and quality assessment
Total RNA of tobacco roots, isolated separately from 18 individual plants, was employed for RNA sequencing (RNA-Seq) library construction. The 18 RNA-Seq libraries were sequenced using the Illumina platform. After filtering out low quality sequences (quality scores <25), 105 Gb of cleaned data was obtained, representing approximately 6 Gb per sample. The cleaned sequence GC content varied from 42.1% to 42.7% (Table S1). The mapping rates for the cleaned sample reads aligned against the reference genome sequence ranged from 91.6% to 97.8% (Table S1). The sequencing quality and gene expression levels were generally consistent across the sequenced samples (Figure S1).
Identification and verification of differentially expressed genes (DEGs)
The expression levels of the genes from the tobacco transcriptomes were calculated and normalized to FPKM values (Fragments Per Kilobase of transcript per Million fragments mapped). The values of Pearson's Correlation Coefficient across biology replicates exceeded 0.82. In terms of the correlation between samples from different time points, some samples displayed higher values with those from other time points. For example, BWR3-2A showed a correlation coefficient of 0.96 as compared to BWR24-1A, and 0.95 with BWR5-2A (Figure S2). Further experiments will be required to elucidate this unexpected observation. Through comparison of the samples at each time point to the t=0 sample, and using a fold-change (FC) > 2, and a false discovery rate (FDR) < 0.05 as selection criteria, 4,830 DEGs were identified after topping. An almost identical number (2,082 and 2,075 genes) came from the N. tomentosiformis and N. sylvestris genomes, respectively (Table S2). Notably, the number of DEGs at 0.5 h (2,562) was much greater than those at any other time point, indicating that more genes respond to topping at earlier times. The number of DEGs dropped to its lowest level (815) at 1 h after topping (Figure 1a). However, a second burst of differential gene expression was observed at t=8 h (1,756), followed by a decrease at t=24 h (Figure 1a). The results imply that the N. tabacum root produces two discrete peaks of transcriptional activity, at 0.5 h and 8 h after topping. This result is consistent with the gene numbers identified as specifically induced at each of the six time points after topping, the largest number being 1,186 at t=0.5 h and the second highest number being 585 at t=8 h after topping (Figure 1b).
To validate the transcriptional results obtained by RNA-Seq, we selected nine genes related to nicotine biosynthesis and phytohormone signal transduction, and examined their transcriptional responses by qRT-PCR. The expression trends of these genes analyzed by qRT-PCR were consistent with the RNA-Seq analysis performed at the corresponding time points (Figure 2). The changes of selected DEGs obtained by RNA-Seq analysis had good correlations with those obtained by qRT-PCR (R2 = 0.674). These results confirm that the alterations in gene expression detected by RNA-Seq accurately reflect transcript differences at the different time points after topping.
Functional classification and enrichment analysis of DEGs
4,830 DEGs showing significant variation at the different time points after topping were selected for further analysis. Based on their relative expression levels, the DEGs were divided into different categories using hierarchical clustering, being distinguishable in terms of the temporal patterns of the transcriptional responses of the roots at the various time points after topping (Figure S3). The predicted functions of the DEGs were then obtained from their GO (Gene Ontology) annotations, and using KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis. According to GO term annotation, the DEGs were distributed across 42 functional terms, as follows: 19 terms for biological process, 12 terms for molecular functions, and 11 terms for cellular component (Figure S4).
GO enrichment analyses were performed to classify the putative functions of DEGs in the comparisons of libraries prepared from the different time points (Figure 3). The DEGs in GO enriched categories of biological process were mainly involved in response to oxidative stress (GO:0006979), the phenylpropanoid metabolic process (GO:0009698 ), the lignin metabolic process (GO:0009808), and response to abiotic stimulus (GO:0009628). The DEGs of GO enriched categories of cellular component were mainly involved in the apoplast (GO:0048046), the extracellular region (GO:0005576), the external encapsulating structure (GO:0030312), and the cell wall (GO:0005618). The DEGs of GO enriched categories of molecular function were mainly associated with peroxidase activity (GO:0004601), antioxidant activity (GO:0016209), and a series of transporter activities (GO: GO:0006857, GO:0008272, GO:0008509, and GO:0008271) (Figure 3).
To further investigate the functions of differentially expressed transcripts in response to topping, we performed enrichment analyses by mapping the sequences to the KEGG database categories. The DEGs with KEGG annotation were assigned to 28 classes, mainly related to signal transduction (221), carbohydrate metabolism (212), biosynthesis of other secondary metabolites (166), and metabolism of terpenoids and polyketides (69) (Figure S5). KEGG enrichment analyses also indicated the DEGs were significantly enriched in the main pathways of phenylpropanoid biosynthesis (ko00940), of starch and sucrose metabolism (ko00500), and in the plant MAPK signaling pathway (ko04016, their responses to wounding and their roles in the biosynthesis of secondary metabolism have been illustrated previously [23-25] (Figure 4).
DEGs involved in nicotine synthesis and transport
We further investigated whether the genes activated by topping were involved in nicotine biosynthesis and transport. As expected, nine gene families involved in nicotine biosynthesis (AO, QS, ODC, ADC, SAMS, PMT, A622, MPO, and BBL) (Figure 5), and two gene families involved in nicotine transport (MATE, NUP), as identified by showing at least 93% identities with the primary sequences of previously reported enzymes, were found in the DEG dataset (Table S3). All genes showed transcriptional up-regulation, with most being up-regulated at 8h and 24h after topping; our qPCR assay also verified the expression changes of four genes (PMT1a, PMT1b, MPO, ODC) at the corresponding time points (Figure 2). Both gene families encoding MATE and NUP in nicotine transport were found to be up-regulated (Table S3). Similar to the situation across all DEGs, most of those involved in nicotine synthesis and transport were found in both subgenomes. One DEG encoding AO was derived from the T-subgenome and all the DEGs encoding MPO and ODC were from the S-subgenome.
Transcription factors (TFs) of DEGs, and gene clustering by expression patterns
To investigate the upstream regulatory mechanisms of nicotine biosynthesis after topping, we next focused on the types of TFs represented in the DEGs from the tobacco root transcriptome. In our study, a total of 549 DEGs encoding TFs were identified (Table S4), being divided into 49 TF families. Amongst these, the number of TFs was highest at t=0.5 h (355), accounting for 65% of all TFs, with 240 being up-regulated and 115 down-regulated. This was followed by the t=8 h timepoint (253), accounting for 46% of all TFs, with 94 being up-regulated and 159 down-regulated. This suggests that many TFs genes participate in instant early gene activation. Notably, and representing most of these TFs, 18 families were found to contain more than 10 gene members: AP2-EREBP (75), MYB (69), bHLH (44), NAC (30), bZIP (30), Orphans (26), HB (24), WRKY (23), HSF (19), C2C2-Dof (17), GRAS (16), LOB (15), MADS (12), GNAT (11),AUX / IAA (11), G2-like (11), C3H (10) and C2H2 (10) (Figure 6).
To further examine the contributions of specific TFs to the regulatory network of nicotine biosynthesis, we performed clustering using the 549 TFs and the up-regulated structural genes associated with the nicotine biosynthesis pathway. Fifteen clusters showing similar expression profiles were obtained (Figure 7 and Table S5). It can be observed that several clusters are similar but with minor differences. For instance, the TFs of Clusters 2 and 12 were up-regulated at t=0.5 h, and the TFs in Cluster 9 and 10 were up-regulated at 0.5-1 h. They then returned to the expression levels found before topping (Figure 7). Notably, most of the up-regulated DEGs in nicotine biosynthesis were in Cluster 11 (20 DEGs), which displayed the greatest up-regulation at t=8h and at t=24 h after topping (Figure 7 and Table S5). Seventeen genes from the bHLH family and the AP2-EREBP family were found in Cluster 11, including ERF189 (Nitab4.5_0003090g0030 and Nitab4.5_0015055g0010), and ERF91 (Nitab4.5_0004620g0030) (Table S5).
DEGs connected with phytohormone signal transduction
As phytohormones are known to quickly respond to tobacco topping and to also influence nicotine biosynthesis in tobacco roots, we examined the role of phytohormone signal transduction in the transcriptional responses induced by topping. We identified 336 DEGs, including those related to the biosynthesis, metabolism and action of auxin (IAA), abscisic acid (ABA), ethylene, gibberellin (GA), and jasmonic acid (JA) (Table S6). The 53 DEGs involved in IAA signal transduction included the ARF (auxin response factor) family (4), the AUX/IAA (auxin responsive protein) family (11), the AUX1 (amino acid transporter protein) family (22), the GH3 (GH3 auxin-responsive promoter) family (7), and the SAUR (auxin responsive SAUR protein) family (9). Most of the DEGs associated with the IAA signaling pathway showed significant up-regulated expression changes, 21 of 36 genes being up-regulated at t=0.5 h, and 10 of 18 genes being up-regulated at t=8 h. For the ABA signal transduction pathway, six gene families were identified, including the PYL/PYR (abscisic acid receptor) family (5), the SAPK (Serine threonine protein kinase) family (2), the PP2C (protein phosphatase 2C) family (28), the CIPK (CBL-interacting protein kinase) family (19), the CDPK (Calcium-dependent protein kinase) family (9), and the Calmodulin (Calmodulin-like protein) family (8). 43 expression changes were detected at t=0.5 h, and 19 at t=8 h. 83 DEGs were implicated in ethylene signaling, including the AP2-EREBP (ethylene responsive transcription factor) family (75), and the ETR (ethylene receptor) family (8), with most DEGs identified at t=0.5 h (55). The GA and the JA signaling pathways (four and three gene families, respectively) also showed significant transcriptional changes after topping.
Quantification of phytohormones and nicotine
Phytohormones play a vital role in regulating plant defense and development. To gain insights into the mechanisms whereby phytohormones affect the responses of tobacco to topping, we measured the levels of IAA, JA, JA-Ile, and ABA in the root samples at the various timepoints after topping. Both JA and auxin signaling pathways were induced by topping at t=3 h. JA levels at t=3 h were significantly increased by almost 34% (P=0.035, paired t test), and reduced by 23.5% and 18.9% at t=8h and t=24h (Figure 8). The dynamics of JA-Ile levels elicited by topping closely followed those of JA, the levels of JA-Ile significantly increasing to about 3-fold at t=24 h (P=0.014, paired t test) as compared with untreated plants. The levels of IAA significantly increased at t=3 h (P=0.024, paired t test), whilst declining to initial levels at t=24h (Figure 8). The levels of ABA gradually increased to 2.3-fold at t=8 h (P=0.0003, paired t test), and to 1.6-fold at t=24 h (P=0.009, paired t test), as compared to the untreated plants. We also measured nicotine levels after topping. Our analyses indicated the nicotine levels significantly increased to 1.5-fold at t=24 h (P=0.01, paired t test) after topping (Figure 8).