Physiological responses of M. falcata under low-temperature stress
We examined the electrolyte leakage; malondialdehyde (MDA) content; superoxide dismutase (SOD), catalase (CAT) and peroxidase (POD) activities; and superoxide anion radical (O2.-), soluble protein, reduced glutathione (GSH), proline and soluble sugar contents to investigate the physiological changes in M. falcata roots exposed to low-temperature stress for 2 h (Fig. 1). Under low-temperature stress, the electrolyte leakage increased gradually with decreasing temperature (Fig. 1A). The highest electrolyte leakage was observed at -15 °C, which was 4.18 times higher than that in the control environment. The MDA content decreased gradually and peaked at -10 °C (relative to the control) and slightly decreased at -15 °C (Fig. 1B). No obvious difference was observed in SOD activity, and this value increased after low-temperature treatment and decreased from 4 °C to -15 °C (Fig. 1C). The changes in CAT and POD activities were completely contrasting under different temperatures. CAT activity first increased, and POD activity first decreased (Fig. 1D - E). The content of O2.- increased significantly after low-temperature treatment and reached a peak at -10 °C (Fig. 1F). The content of soluble protein decreased first after low-temperature treatment and increased after 4 °C and then decreased significantly after 0 °C (Fig. 1G). The GSH content increased significantly after low-temperature treatment and reached a peak at -5 °C (Fig. 1H). The proline content decreased slightly in the low-temperature environment (Fig. 1I). The highest soluble sugar content was measured at -10 °C (Fig. 1J).
M. falcata transcriptome sequencing
To identify and characterize the transcriptome of M. falcata roots under low-temperature stress, we measured the roots under CK (room temperature), 4 °C, 0 °C, -5 °C, -10 °C and -15 °C for 2 h by joining the PacBio SMRT and NGS technologies for whole-transcriptome profiling. In total, 125.48 Gb clean data were obtained by RNA-Seq, yielding 418,495,643 reads with a GC content of 42.61% and a QC 30 of 85.51% (Additional file 1: Table S1). With SMRT, a total of 8 cells were used to obtain 19.27 Gb clean data. A total of 1,202,336 polymerase reads were obtained, and then the polymerase reads with fragments less than 50 bp and sequence accuracy less than 0.75 were filtered. Then, 8,428,385 subreads were obtained by filtering the remaining sequences from the linker, filtering out the linker sequence and filtering out the subreads with fragments less than 50 bp (Table 1). A total of 552,818 reads of inserts (ROIs) were extracted from the original sequence, of which 270,750 were full-length nonchimeric reads (FLNC), and 223,319 were nonfull-length reads (Table 1). The full-length sequences were clustered using the RS_IsoSeq module of SMRT Analysis software. A total of 131,118 consensus isoforms were obtained; 99,490 high-quality isoforms were obtained using the nonfull-length sequence alignment, and 31,628 low-quality isoforms were obtained and corrected using RNA-Seq data [39]. The redundancy in the high-quality and corrected low-quality transcript sequences of each sample were eliminated by CD-HIT software [40], and 115,153 nonredundant sequences were obtained. Based on the nonredundant sequences of each sample, we predicted a total of 8,849 alternative splicing (AS) events with the IsoSeq_AS_de_novo script (Additional file 2: Table S2) [41], 73,149 simple sequence repeats (SSRs) with the MIcroSAtellite identification tool (Additional file 3: Table S3), and 4,189 long noncoding RNAs (LncRNAs) with Coding Potential Calculator (CPC) [42], Coding-Non-Coding Index (CNCI) [43], Coding Potential Assessment Tool (CPAT) [44], and Protein family (Pfam) (Additional file 4: Figure S1).
Annotation and expression of transcripts under low-temperature stress
To acquire the most comprehensive annotation, all full-length transcripts from SMRT were aligned with NCBI nonredundant protein database (NR), SwissProt, Gene Ontology (GO), Clusters of Orthologous Groups (COG), euKaryotic Ortholog Groups (KOG), Pfam, and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using BLAST software (version 2.2.26) [45], and a total of 111,587 genes from SMRT were annotated, of which 7,155 genes’ length were >= 300 bp and <1,000 bp, and 104,432 genes’ length were >= 1,000 bp (Additional file 5: Table S4). Among these annotated sequences, 111,384 sequences had significant matches in the Nr database, 89,117 sequences had significant matches in the Pfam database, 82,433 sequences had matches in the SwissProt database, and 3,385 had effective matches in the GO database. Based on the homology with sequences of different species, 97,831 (87.87%) sequences were found against M. truncatula, and 3,463 (3.11%) sequences had clandestine hits with Cicer arietinum, followed by M. sativa (1,662, 1.49%), Glycine max (1,011, 0.91%), Rhizoctonia solani (528, 0.47%), Fusarium oxysporum (388, 0.35%), G. soja (369, 0.33%), Phaseolus vulgaris (349, 0.31%), Vitis vinifera (266,0.24%), and M. falcata (181, 0.16%). Only 5,283 (4.75%) annotated sequences had similarity with other plant species (Additional file 6: Figure S2).
To evaluate gene expression levels in response to low-temperature stress, we mapped all clean data back onto the assembled transcriptome, and the readcount for each gene was obtained from the mapping results by using RNA-Seq by Expectation Maximization (RSEM) software (Additional file 7: Table S5) [46]. The mapped readcount for each gene was then converted into the expected number of fragments per kilobase of transcript per million mapped reads (FPKM) (Additional file 8: Table S6). FPKM values can eliminate the impact of transcript length and sequencing differences on computational expression. The boxplot diagram of FPKM values indicated that gene expression levels were not evenly distributed in the different experimental environments (Fig. 2A). The Pearson correlation coefficient was used to evaluate the correlations of each biological sample, with r2 values close to 1 indicating a strong correlation between two replicate samples (Fig. 2B). Then, all sequences were used for further DEG analysis after excluding the abnormal samples.
Analysis of DEGs in response to low-temperature stress
In total, 11,369 DEGs displaying up- or downregulation between samples (fold change≥2 and false discovery rate (FDR)<0.01) collected at any pair of temperature points were identified by comparing gene expression levels under low-temperature stress (Additional file 9: Table S7). Clustering patterns of DEGs under low-temperature stress were determined by hierarchical cluster analysis of all DEGs (Fig. 3A). All DEGs with the same or similar expression levels were clustered, and a set of genes was quickly activated during the early stage of low-temperature stress (4 °C), and other genes were activated under a freezing temperature (-10 °C). The 11,369 DEGs identified were grouped into six subclusters by K-means coexpression cluster analysis (Fig. 3B). The expression level of genes in subcluster 1 (1,271 genes) began to increase after the temperatures dropped to -5 °C and reached a maximum at -10 °C; then, the expression levels decreased rapidly. KEGG analysis of genes in subcluster 1 revealed that most were involved in starch and sucrose metabolism, plant-pathogen interaction, and galactose metabolism. The expression level of genes in subcluster 2 (2,226 genes) increased significantly after low-temperature treatment, then the expression level first decreased slowly and then rapidly at each temperature point below 4 °C. Genes in this subcluster functioned mostly in circadian rhythm, plant-pathogen interaction, and plant hormone signal transduction. The genes in subcluster 3 (1,460 genes) and subcluster 4 (2,695 genes) were both upregulated after low-temperature treatment and downregulated after 4 °C treatment. The difference was that the decrease in the expression level of genes in subcluster 3 fluctuated while that in subcluster 4 was continuous. Genes involved in starch and sucrose metabolism, protein processing in endoplasmic reticulum, fatty acid degradation, and tyrosine metabolism were enriched in subcluster 3. Genes in subcluster 4 were enriched in plant-pathogen interaction, starch and sucrose metabolism and plant hormone signal transduction. Genes in subcluster 5 (1,613 genes) were weakly downregulated after low-temperature treatment, then strongly upregulated from 4 °C to 0 °C, and then upregulated again under -15 °C treatment. Genes in this subcluster mostly functioned in plant-pathogen interaction, phenylalanine metabolism, and the plant hormone signal transduction pathway. Genes in subcluster 6 (2,104 genes) were downregulated at all times, and most of them functioned in phenylpropanoid biosynthesis, circadian rhythm, and ubiquitin-mediated proteolysis.
Identification of putative TFs
TFs play an important role in cell function and development and directly regulate gene expression through interactions with themselves and other proteins to participate in plant stress regulations, including low-temperature-related processes. In this study, 1,538 TF genes were differentially expressed between different temperature points and were classified into 45 TF gene families according to PlantTFDB (http://planttfdb.cbi.pku.edu.cn/) (Additional file 10: Table S8). The most abundant TF family was the WRKY (186 genes) family, followed by the ERF (165 genes), MYB (143 genes), bHLH (131 genes) and NAC (111 genes) families.
Comparison of coexpressed genes between the low-temperature-treated and control samples
Under low-temperature stress, a total of 8,683 DEGs were identified by a comparison of each temperature point to the control environment. Interestingly, the number of upregulated DEGs (5,876 genes) was much higher than the number of downregulated DEGs (2,807 genes), and 134 genes were differentially coexpressed at all five temperature points (Additional file 11: Table S9). As shown in Fig. 4A, there were 101 upregulated genes and 33 downregulated genes in all five comparisons. These 134 coexpressed genes were assigned into the GO categories of biological process, cellular component and molecular function (Fig. 4B). In the biological process category, “metabolic process”, “cellular process”, “single-organism process” and “biological regulation” were the most enriched terms. In the cellular component category, “cell”, “cellular component” and “organelle” were the most enriched. In the molecular function category, “binding” was the most enriched term, followed by “catalytic activity”. COG functional classification of the 134 coexpressed genes showed that most of the genes were enriched in “General function prediction only”, “Transcription”, “Replication, recombination and repair”, “Signal transduction mechanisms” and “Carbohydrate transport and metabolism” (Fig. 4C). KEGG enrichment showed that most of the coexpressed genes were enriched in the “Circadian rhythm - plant”, “Cysteine and methionine metabolism”, “Arginine and proline metabolism”, “Phenylpropanoid biosynthesis”, “Galactose metabolism”, “Plant-pathogen interaction” and “Biosynthesis of amino acids” pathways (Fig. 4D).
Weighted gene coexpression network analysis (WGCNA) of DEGs in response to low-temperature stress
WGCNA was performed to obtain a better understanding of which genes within these complex signaling networks were the most connected hubs. The number of genes in the module was clustered according to the expression levels, and genes with a high clustering degree were allocated to the same models. The 8,683 DEGs identified by comparing low-temperature-treated and control samples were clustered based on topological overlap, and then the gene modules were obtained from the dynamic tree cut. Finally, 12 gene modules were identified after merging modules with similar expression patterns (Fig. 5A). The magenta modules contained the most genes, 1,092 genes, and the violet nodules contained the fewest genes, 70 genes (Additional file 14: Table S10). The gray module was not a true module but a place to categorize the remaining genes that were not well correlated with any one of the significant colored modules. The kME (module eigengene-based connectivity) was calculated for each gene to every module, and 449 genes were found to act as a hub in more than one module.
All 12 genetic modules with a module characteristic value P < 0.05 were used to find the modules that were highly correlated with samples and physiological indicators (Fig. 5B). The control samples were highly correlated with the MEblue module. The samples under 4 °C treatment were highly correlated with the MEmagenta module. The samples under 0 °C treatment were highly correlated with the MElightcyan module. The samples under -5 °C treatment were highly correlated with the MElightyellow module. The samples under -10 °C treatment were highly correlated with the MEbrown module. The samples under -15 °C treatment were highly correlated with the MEgray60 module. The MEbrown module was found to be significantly correlated with physiological indicators and may play a key role in low-temperature tolerance in M. falcata. COG classification results showed that the MEbrown module genes were involved in 23 major categories, including “General function prediction only”, “Signal transduction mechanisms”, “Transcription” and “Replication, recombination and repair” (Additional file 13: Figure S3A). GO analysis results showed that the MEbrown module genes were mainly involved in 13 biological processes, such as “protein phosphorylation”, “regulation of transcription, DNA-templated”, and “oxidation-reduction process”, were distributed to eight cellular component terms, such as “integral component of membrane”, “plasmodesma”, “chloroplast stroma” and “nucleus”, and consisted of 14 molecular functions, which included ATP binding, protein serine/threonine kinase activity, and cation binding (Additional file 13: Figure S3B). KEGG enrichment showed that most of the genes in the MEbrown module were enriched in “Starch and sucrose metabolism”, “Plant-pathogen interaction”, “Circadian rhythm - plant”, “Protein processing in endoplasmic reticulum”, “Galactose metabolism” and “Plant hormone signal transduction” (Additional file 13: Figure S3C).
Confirmation of RNA-Seq sequencing data by qRT-PCR analysis
The DEGs associated with low-temperature stress were selected for qRT-PCR assays to confirm the SMRT sequencing data. Eighteen genes were selected randomly from 134 DEGs coexpressed at all five temperature points. We found that the fold-changes in the expression calculated by sequencing data did not exactly match the expression values detected by qRT-PCR analysis, but the expression profiles were basically consistent for all 18 genes (Additional file 14: Figure S4). These analyses confirmed the reliability of the gene expression values generated from SMRT sequencing data.