Physiological characterization of tomato seed maturation
Different physiological traits acquired during seed maturation were characterized based both on seed age calculated after tagging the flowers (i.e. 15–90 DAF) and on the fruit ripening stages (i.e. from mature green (MG) to over-ripe fruit (red fruit + 14 d)) as shown in Fig. 1A. Seed filling occurred between 21 and 49 DAF as observed by the increase in seed dry weight (DW) and decrease in water content (Fig. 1B). During seed filling, desiccation tolerance (i.e. the ability to germinate after fast drying) was acquired from 35 DAF onwards, at the start of endosperm solidification, until 56 DAF, consistent with previous studies [5, 6]. Thereafter, seed water content remained high at 1 g H2O/g DW (50% fresh weight basis) throughout fruit ripening, highlighting that developing tomato seeds do not undergo maturation drying while they remain in the fruit. Germination percentage of artificially dried seeds progressively increased from 42 DAF until 83 DAF, corresponding to over-ripened fruits (Fig. 1C). This increase in germination was attributed to a gradual release of primary dormancy. Indeed, 100% of the dried immature seeds from the mature green stage onwards (56 DAF) germinated when they were directly imbibed in 30 mM KNO3, a treatment known to break dormancy [36]. Furthermore, germination speed increased, with the time to reach 50% germination (t50) decreasing from 4.4 d at 56 DAF to 2.3 d at 90 DAF (Fig. 1C, Additional file 1: Fig. S1). From 42 DAF onwards, seeds gradually acquired the ability to germinate at low water potential (-0.3 MPa). In parallel, longevity (measured as P50, the period required for the seed batch to lose 50% of germination during storage, Additional file 1: Fig. S2) also increased progressively, even after the red fruit stage, which is considered to be optimal for seed quality (Fig. 1D). Altogether, these data pinpoint a long late maturation phase lasting over 40 days from 42 DAF onwards, during which seed vigour progressively increases.
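As an illustration of how a t50 value can be read from a germination time course, the sketch below interpolates linearly between the two scoring times that bracket 50% germination. The interpolation method and the scoring data are assumptions made for this example; the text does not state how t50 was computed.

```python
def t50(times, cumulative_pct, target=50.0):
    """Time at which cumulative germination first reaches `target` percent.

    Linear interpolation between scoring times is an assumption of this
    sketch, not necessarily the method used in the study.
    """
    points = list(zip(times, cumulative_pct))
    # Target already reached at the first scoring time.
    if points and points[0][1] >= target:
        return points[0][0]
    # Find the interval that brackets the target and interpolate within it.
    for (t0, g0), (t1, g1) in zip(points, points[1:]):
        if g0 < target <= g1:
            return t0 + (target - g0) * (t1 - t0) / (g1 - g0)
    return None  # target never reached during scoring


# Hypothetical scoring data: days of imbibition and cumulative germination (%).
days = [0, 1, 2, 3, 4, 5]
germination = [0, 0, 20, 60, 90, 100]
t50_example = t50(days, germination)
```

A lower t50 corresponds to faster germination, which is why the decrease from 4.4 d to 2.3 d is read as an increase in germination speed.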
Spatial and temporal description of the tomato seed transcriptome
To obtain a spatial and temporal representation of the seed transcriptome during development, an RNA sequencing (RNA-seq) data series of whole seeds and isolated seed tissues was obtained at 14 stages throughout seed development: from 15 to 28 DAF for entire seeds (S), from 35 DAF to stage R14 for embryo (Em) and endosperm (End), and from 35 to 49 DAF for seed coat (SC). Principal component analysis (PCA) was performed to compare changes between the transcriptomes of the different seed tissues and developmental stages (Fig. 2). This revealed a distinct clustering of transcript profiles corresponding to tissue types throughout development (Fig. 2A). PCA carried out separately on embryo and endosperm (Fig. 2B and C) indicated that the major factor responsible for the variance of both datasets (Dim1, 43% of the variation explained) was aligned with the developmental stages in chronological order between 35 and 49 DAF. A major transcriptional switch occurred between 49 DAF and mature green fruits. Thereafter, the variation was partially explained by the second dimension, which aligned transcriptome changes with the progress of fruit ripening.
The timing of molecular events during tomato seed development was further examined via the expression of major regulatory genes involved in embryogenesis, seed filling and seed vigour acquisition (Fig. 3, Additional file 1: Fig. S3). Transcript levels of SlFUSCA3, the ortholog of one of the four master transcriptional regulators forming the LAFL maturation network, increased from 15 DAF onwards and were maximum around 40 DAF (Fig. 3A). Thereafter, transcripts rapidly decreased and were no longer detectable at the mature green stage. As a marker of seed filling, we chose the ortholog of WRI1, a target of LEC2 regulating oil accumulation in Arabidopsis seeds [37]. SlWRI1 transcript abundance followed a pattern similar to that of SlFUSCA3, being higher in the endosperm than in the embryo (Fig. 3B). Two ABI3 orthologs were detected in the tomato genome. Transcript levels for SlABI3-1 increased between 15 DAF and 42 DAF, as for SlFUSCA3, whereas SlABI3-2 transcript levels reached a maximum 20 d later, around 60 DAF, when fruits were mature green (Fig. 3C, D). The expression of both genes preceded the acquisition of desiccation tolerance (Fig. 1C). Thereafter, for both genes, transcript levels remained high throughout the rest of maturation, with a slightly lower expression in the embryo for SlABI3-1 (Fig. 3C). The other two members of the LAFL network, LEC1 and LEC2, could not be identified with enough confidence and were not included in this study. As other representatives of the ABA signalling pathway, we selected ABI4 and ABI5, the latter controlling the accumulation of protective molecules involved in seed longevity in legumes [38]. Two orthologs of ABI4 were identified, and their transcripts were detected specifically in the embryo. Transcript levels increased between 42 and 76 DAF, when fruits became red, and remained high upon further ripening (Fig. 3E, F).
SlABI5 transcript levels increased later than SlABI3, but earlier than SlABI4, first in the endosperm, with a maximum at 42 DAF, then in the embryo, with a maximum at 56 DAF, when fruits were mature green (Fig. 3G). We also included DOG1, a key regulator of seed dormancy that is expressed during seed maturation in Arabidopsis downstream of LEC1 [39], and PROCERA, the only annotated DELLA gene in tomato, whose GRAS domain regulates dormancy and longevity [22]. Transcript levels of two SlDOG1 genes increased from 42 DAF onwards, with higher levels in the embryo than in the endosperm (Fig. 3I, J, Additional file 1: Fig. S3). Whereas SlDOG1-1 (Solyc02g072570.2.1) remained high upon further maturation, transcript levels of SlDOG1-2 (Solyc03g006120.4.1) peaked at 49 DAF and decreased progressively upon further maturation, in parallel with the release of dormancy (Fig. 1C). Transcript levels of SlPROCERA were high during early development around 15 DAF in whole seeds, consistent with previous data [26]. Thereafter, they decreased to very low levels around 35 DAF, and increased again progressively until 80 DAF (Fig. 3H). Transcript levels of SlPROCERA were 3-fold higher in the endosperm than in the embryo, and followed a profile similar to that of SlABI4. Both SlABI4 orthologues and SlPROCERA transcripts correlated with the increase in longevity and germination under osmotic stress (compare Fig. 3 with Fig. 1D).
Identification of temporal and tissue-specific gene modules via WGCNA
To identify and characterize the temporal and tissue-specific gene modules, we carried out a WGCNA [14, 40]. After normalization, genes with low expression value and low coefficient of variation between replicates were discarded, resulting in 15,173 genes that were used for the network analysis. The soft thresholding power to calculate adjacency was based on the criterion of approximate scale-free topology while limiting the loss of mean connectivity [41]. Here, the scale-free topology value was set at 0.8 (Additional file 1: Fig. S4A). Hierarchical clustering analysis revealed 21 distinct expression modules containing between 42 and 4602 genes, which were represented by their module eigengenes (ME, Fig. 4A, Additional file 1: Fig. S4B). The gene modules were sorted in five groups based on their expression pattern and tissue specificity (Fig. 4A). The first group corresponded to genes expressed early during seed development, both in whole seeds and in the seed coat, but not during maturation (ME20, ME15, ME1, ME9 and ME5). The second (ME7, ME8, ME17) and the third group (ME6, ME16, ME11, ME18, ME4) corresponded to modules with genes exhibiting an embryo and an endosperm specific expression characterizing the second half of the seed development (Fig. 4A) respectively. The fourth group (ME10, ME19, ME2) was characterized by modules with transcripts expressed both in embryo and endosperm and specific to late maturation. The fifth group (ME3, ME13, ME14) corresponded to modules containing genes expressed first early during seed development and again later during the end of maturation. ME12 was a hybrid module containing genes expressed in the embryo or in the endosperm. ME0 consisted of genes whose expression did not fit any modules.
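To make the soft-thresholding step concrete, the sketch below builds an unsigned adjacency by raising the absolute Pearson correlation between two expression profiles to a power beta, and sums adjacencies to obtain a gene's connectivity. It is a plain-Python illustration of the idea, not the WGCNA package itself; the gene names, the profiles, the unsigned network type and beta = 6 are all assumptions, since the text does not report them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def adjacency(profiles, beta):
    """Unsigned soft-threshold adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


def connectivity(adj, gene):
    """Sum of adjacencies between `gene` and every other gene."""
    return sum(w for (g, _), w in adj.items() if g == gene)


# Hypothetical expression profiles across four samples.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],  # perfectly correlated with geneA
    "geneD": [1, 3, 2, 4],  # correlation of 0.8 with geneA
}
adj = adjacency(profiles, beta=6)  # beta = 6 is an assumed power
k_geneA = connectivity(adj, "geneA")
```

Raising correlations to a power keeps strong links close to 1 while pushing moderate ones towards 0 (here 0.8 becomes about 0.26), which is what drives the network towards an approximately scale-free topology as the power increases.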
To visualize the gene modules, the weighted gene co-expression network was examined using Cytoscape [42]. The network contained 6399 nodes and 411,826 edges on which the expression modules were projected in different colours for a topological representation (Fig. 4B). The network was made up of a central core and several sub-networks of variable size, two of them being connected to the central core. To further characterize the network topology, we determined transcripts that were preferentially expressed in endosperm, seed coat or embryo (Additional file 2: Table S1). Preferentially expressed genes were defined as transcripts at least 10-fold more abundant in one tissue than in any other seed tissue. Projection of these genes on the network revealed that embryo preferential transcripts were mostly found in three disconnected subnetworks (ME7, ME17, ME19) and at the edge of ME3 within the central core of the network (Fig. 4B-C). Endosperm preferential transcripts localized mostly in ME6 and in the tightly connected modules ME18, ME11 and ME4. The seed coat transcriptome was also well delimited within the network, mostly represented in ME9 and ME1 and in the loosely connected subnetwork within ME15 and ME20 (Fig. 4B-C). Seed preferentially expressed genes were identified as transcripts whose abundance was at least 10-fold higher than in other tomato plant tissues, for which data were extracted from published studies [26, 43, 44]. They were found throughout the gene network (Fig. 4D), but with a high concentration in the endosperm and embryo specific subnetworks ME4, ME19 and ME10 (Fig. 4B, D).
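The 10-fold criterion for tissue-preferential expression can be sketched as a simple filter. The example assumes one summary abundance value per tissue and gene; how abundance was summarized across stages is not specified in the text, and the tissue labels and values below are hypothetical.

```python
def preferential_tissue(abundance, fold=10.0):
    """Return the tissue in which a transcript is at least `fold` times more
    abundant than in every other tissue, or None if no tissue qualifies.

    `abundance` maps tissue name -> summary expression value (assumed to be
    one value per tissue for the purpose of this sketch).
    """
    for tissue, value in abundance.items():
        others = [v for t, v in abundance.items() if t != tissue]
        if value > 0 and others and all(value >= fold * v for v in others):
            return tissue
    return None


# Hypothetical abundance values for two genes.
embryo_gene = {"embryo": 120.0, "endosperm": 8.0, "seed_coat": 3.0}
shared_gene = {"embryo": 50.0, "endosperm": 8.0, "seed_coat": 3.0}

embryo_call = preferential_tissue(embryo_gene)
shared_call = preferential_tissue(shared_gene)
```

In this example the first gene is called embryo-preferential, whereas the second is not assigned to any tissue because its embryo abundance is less than 10-fold that of the endosperm.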
Characterization of biological processes in tissue-specific modules
To gain insight into the biological processes that are specific to the different modules, Gene Ontology (GO) enrichment analysis was performed. The five most significantly over-represented GO terms per module are depicted in Fig. 5 (see Additional file 3: Table S2 for the full dataset). The modules ME5 and ME1 contained genes that are mainly expressed during early seed development and for which it was not possible to dissect the different tissues because they were not well differentiated. These modules were described by GO terms associated with cell division/growth and morphogenesis, likely reflecting the end of embryogenesis. ME1 included several GO terms associated with response to a stimulus, light signalling, signalling molecules (karrikins, ABA, GA, brassinosteroids) and “positive regulation of seed maturation”. This latter category contained seven bZIP transcription factors involved in developmental reprogramming, including orthologues of bZIP44 and bZIP53, known to play a role in regulating endo-β-mannanase genes and germination [45]. ME1 also contained ABA synthesis genes, including NOTABILIS and two orthologs of other NINE-CIS-EPOXYCAROTENOID DIOXYGENASE (NCED) genes. NOTABILIS and SlNCED2 transcripts were highly induced early during seed embryogenesis, with NOTABILIS mostly present in the seed coat, peaking at 35 DAF (Additional file 1: Fig. S5). Transcripts of SlNCED6 increased both in the embryo and endosperm between 35 and 60 DAF. This is consistent with the Arabidopsis model where both maternal and zygotic sources of ABA govern seed development [46]. Over-ripening (83–90 DAF) resulted in higher transcript levels in the endosperm. Both SlABI3 genes were also found in ME1.
ME3 contained genes that were highly expressed between 15 and 42 DAF in all three tissues. The module was enriched in many GO terms associated with cell division, cell plate formation, growth and plant organ morphogenesis that are typical of embryogenesis (Additional file 3: Table S2). SlWRI1, which controls oil synthesis by regulating C allocation between fatty acids and sucrose in developing Arabidopsis seeds (Fig. 3B, Additional file 1: Fig. S3), was present in this module, consistent with the presence of GO terms reminiscent of its function such as “response to sucrose” and “acetyl-CoA biosynthetic process from pyruvate” (Additional file 3: Table S2). This module also included the master regulator SlFUSCA3 (Fig. 3A) and SlDOG1-1 (Fig. 3I). Since the increase in seed weight parallels the expression profile of the ME3 eigengene, ME3 may be a gene module regulating lipid storage reserve deposition.
Seed coat-preferentially expressed gene modules
Module ME20 contained genes that were highly expressed between 15 and 35 DAF and was enriched in GO terms associated with growth, cell wall biogenesis and mucilage biosynthetic process (Fig. 5A), highlighting the expansion phase of the seed and differentiation of the seed coat layers. Likewise, ME15, characterized by genes transiently expressed with a peak at 35 DAF, was also enriched with cell wall associated GO terms and included terms related to “lignin biosynthesis process” and “defense response to insects” (Fig. 5A, Additional file 3: Table S2). This suggests that these two expression modules could represent a seed coat differentiation program that will set up the protective barrier just before the acquisition of germination capacity, thereby preventing precocious germination later during maturation and protecting the reproductive organ [12]. ME9 contained 399 genes with increased expression from 35 to 49 DAF. This module was characterized by GO terms associated with “pectin catabolism” and “abscission/senescence processes” (Fig. 5A) that include MAPKKK and several NAC transcription factors (Additional file 3: Table S2). Interestingly, ME9 contained several pectin lyase genes such as Solyc04g015530.3.1 that are implicated in fruit abscission. Its expression profile was consistent with the reported timing of the seed detachment from the fruit tissue in tomato [5]. It also contained the GO “response to ABA” (Additional file 3: Table S2) with several genes involved in sugar transport and synthesis of galactinol (GolS2, Solyc02g062590.3.1), the precursor of the raffinose family oligosaccharide (RFO, Additional file 3: Table S2, Additional file 1: Fig. S6).
Embryo-preferentially expressed gene modules
ME7, the largest embryo-specific module, with genes expressed throughout the second phase of maturation, was enriched in many biological functions, including “response to ABA” (Fig. 5B) and “regulation of seed germination” (Additional file 3: Table S2), which might reflect repressing activities to avoid vivipary. ME8 contained 406 genes whose transcripts increased between 40 and 80 DAF in the embryo. It was characterized by an over-representation of four terms associated with organelle RNA editing (chloroplast and mitochondria RNA processing, C to U editing) that were represented only by genes encoding pentatricopeptide repeat proteins. Another significant term was “resolution of meiotic recombination intermediates”. A closer look at the gene list revealed seven genes encoding helicases and topoisomerases associated with DNA repair and the stability of repetitive sequences within the genome. The small ME17, characterizing late embryo maturation from 72 DAF onwards, did not exhibit revealing GO terms (Additional file 3: Table S2), but we noted the presence of four orthologs of HECATE genes, which encode transcription factors with versatile functions throughout development [47], and of ABA2, involved in ABA synthesis. ME13 contained genes that were highly expressed between 15 and 42 DAF, reflecting a transcriptional program related to embryogenesis (Fig. 5B).
Endosperm-preferentially expressed gene modules
The ME6 module, connected to the core ME1 and the embryo specific part of ME3 (Fig. 4B-C), was composed of genes that were highly expressed early during development in whole seeds and in the endosperm from 15 to 42 DAF. GO terms associated with “jasmonic acid metabolism” were over-represented in this module (Fig. 5C). This GO term contained genes associated with several other phytohormones including salicylate, auxins, brassinosteroids and ABA (Additional file 3: Table S2), probably because of cross-talk between the hormone pathways, evident from the ortholog of JASMONATE RESISTANT 1 (JAR1), an auxin responsive protein implicated in jasmonic acid (JA) synthesis. The GO terms “somatic embryogenesis” (Fig. 5C) and “ABA activated signalling pathway” both revealed the presence of 10 orthologues of LEC1-like. ME6 was also enriched with GO terms associated with the regulation of fatty acid biosynthetic process, suggesting that ME6 represents an endosperm-specific expression module that is associated with late embryogenesis and seed filling. Genes in the ME18 module showed a transient expression profile with a peak at 49 DAF, and enriched functions were related to suberin biosynthesis, very-long-chain fatty acids and cutin synthesis. This might reflect the differentiation of an epidermal barrier [48]. ME4 was a large expression module representing the late phase of maturation from 42 to 90 DAF. It was enriched with “gene silencing” (Fig. 5C). ME11, with 229 genes, was enriched with genes involved in “chromatin silencing”, “histone deacetylation” (Fig. 5C), “negative regulation of RNA transcription” and “signal transduction” (Additional file 3: Table S2).
Quantification of module-physiological trait associations
Our next objective was to incorporate the acquisition of the physiological traits (Fig. 1) into the gene network (Fig. 4B) to identify modules and hub genes that might govern these traits. To achieve this, we determined which modules were significantly correlated with each of the measured traits based on the Pearson correlation coefficient (PCC, Fig. 6). Acquisition of desiccation tolerance was highly correlated with ME10 (PCC = 0.87). This module was also correlated with P50 values, albeit with a lower level of significance. Modules ME3 (PCC = -0.82) and ME6 (PCC = -0.81) were negatively correlated with desiccation tolerance. Both modules were enriched in genes involved in fatty acid biosynthesis (Fig. 6, Additional file 3: Table S2). All the other vigour traits (germination/dormancy release, germination speed, germination under osmotic stress and longevity (P50)) were highly associated with module ME2. ME2 captured the late maturation phase and represented an independent subnetwork (Fig. 4). No other modules were found to be highly correlated with any of these traits (Fig. 6, Additional file 3: Table S2).
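The module-trait association described above amounts to computing a PCC between each module eigengene and each trait across the same samples. The following sketch shows that computation in plain Python; the module labels, eigengene values and trait values are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def module_trait_correlations(eigengenes, trait):
    """PCC between every module eigengene and one trait, sample by sample."""
    return {module: pearson(values, trait) for module, values in eigengenes.items()}


# Hypothetical values for five samples ordered by developmental stage.
desiccation_tolerance = [0, 0, 50, 100, 100]  # % of tolerant seeds
eigengenes = {
    "ME_rising": [0.1, 0.2, 0.5, 0.9, 1.0],   # increases during maturation
    "ME_falling": [1.0, 0.9, 0.5, 0.2, 0.1],  # decreases during maturation
}
pcc = module_trait_correlations(eigengenes, desiccation_tolerance)
```

A module whose eigengene rises with the trait yields a PCC close to 1, and one whose eigengene falls yields a PCC close to -1, mirroring the positive association reported for ME10 and the negative associations reported for ME3 and ME6.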
Identification of a conserved regulatory gene module associated with desiccation tolerance
Module ME10 was highly correlated with the acquisition of desiccation tolerance and contained 280 genes, with a large number of them being preferentially expressed in seeds (Fig. 7). Over-representation analysis revealed functions related to the “TCA cycle”, “seed oil body biogenesis”, “response to desiccation” and “response to ABA” (Additional file 3: Table S2). Since genes with high connectivity are more likely to exert large effects on physiological traits ([49] and references therein), we identified those genes with the highest intramodular connectivity (i.e. with the highest correlation with the eigengene of the module), as well as with the highest correlation with the acquisition of desiccation tolerance. The value of module membership (MM) of each individual gene (i.e. the PCC between the gene expression profile and the module eigengene) was plotted against the gene significance value (GS, i.e. the PCC between the expression profile of each individual gene and the seed physiological trait) (Fig. 7A). The resulting correlations between MM and GS values allowed the identification of 106 hub genes having both a high MM and GS value > 0.8 (Fig. 7A, squared box, Additional file 4: Table S3). Analysis of the 25 top connected genes showed numerous genes with protective functions, namely 13 Late Embryogenesis Abundant (LEA) proteins from different families (dehydrins, D-34), one small heat shock protein (sHSP) and two oleosins (Table 1). To investigate whether this module is conserved, we compared our desiccation tolerance gene module with two datasets representing genes that are activated upon acquisition of desiccation tolerance in seeds of M. truncatula [50] and in Arabidopsis [51]. A total of 68 transcripts of the ME10 module were present in at least one of these datasets (Additional file 5: Table S4), and were clustered in the core of ME10 (Fig. 7C). They were mostly associated with protective, detoxification and repair functions.
The presence of these genes across species suggests that they represent a highly conserved regulatory network governing desiccation tolerance. The conserved network also contained several known regulators that are important for seed vigour (Additional file 5: Table S4), such as SOMNUS, a CCCH-type zinc finger transcription factor known to negatively regulate seed germination by activating ABA biosynthesis and inhibiting GA biosynthesis downstream of phytochrome [52], Heat Shock Factor A2 (HSFA2) and two genes involved in phospholipid signalling (including FLT/TERMINAL FLOWER 1, an ortholog of MOTHER OF FLOWERING TIME that is regulated by the ABA signalling pathway) (Table 1, Additional file 5: Table S4). Fifty-nine percent of genes (22 genes) belonging to this conserved desiccation tolerance network represented experimentally validated direct targets of ABI3 in Arabidopsis [53]. They were tightly connected within ME10 (Fig. 7D). Outside the ABI3 regulon, additional transcription factors (TF) and several ABA signalling/responsive proteins were found associated with the desiccation tolerance network, such as an ortholog of an ABSCISIC ACID-INSENSITIVE 5-like protein 4 and an HVA22-like protein (Additional file 5: Table S4).
Table 1
Top 25 hub genes of ME10 with transcripts that are highly connected with the module eigengene and highly correlated with the acquisition of desiccation tolerance
Gene ID | Description | GS.DT | p.GS.DT | MM.ME10 | p.MM.ME10 |
Solyc02g077980.3.1 | Unknown protein | 0.840 | 1.7E-23 | 0.987 | 9.5E-67 |
Solyc09g082110.4.1 | Late embryogenesis abundant protein D-34 | 0.878 | 5.1E-28 | 0.986 | 3.6E-65 |
Solyc10g078780.2.1 | 11 kDa late embryogenesis abundant protein | 0.864 | 4.1E-26 | 0.986 | 4.0E-65 |
Solyc07g066400.1.1 | seed maturation protein | 0.887 | 3.2E-29 | 0.984 | 4.2E-63 |
Solyc03g115370.3.1 | Diacylglycerol kinase | 0.883 | 1.2E-28 | 0.984 | 7.0E-63 |
Solyc11g042800.2.1 | Embryonic protein DC-8 | 0.842 | 1.1E-23 | 0.984 | 8.2E-63 |
Solyc02g079290.3.1 | FLT/ TERMINAL FLOWER 1-like protein | 0.857 | 2.4E-25 | 0.982 | 3.1E-61 |
Solyc04g072250.4.1 | 17.5 kDa class I heat shock protein | 0.873 | 2.8E-27 | 0.981 | 2.7E-60 |
Solyc09g082100.3.1 | Late embryogenesis abundant protein D-34 | 0.889 | 1.6E-29 | 0.980 | 3.7E-59 |
Solyc12g098900.2.1 | Late embryogenesis abundant protein D-29 | 0.905 | 3.8E-32 | 0.980 | 4.1E-59 |
Solyc03g025810.4.1 | Low-temperature-induced 65 kDa protein | 0.839 | 2.0E-23 | 0.979 | 4.4E-58 |
Solyc07g065990.1.1 | Oleosin S1-2-like | 0.828 | 2.5E-22 | 0.979 | 4.7E-58 |
Solyc09g008770.3.1 | Late embryogenesis abundant protein | 0.842 | 1.0E-23 | 0.978 | 6.7E-58 |
Solyc01g098850.3.1 | NAD(P)-binding Rossmann-fold superfamily protein | 0.882 | 1.8E-28 | 0.977 | 5.1E-57 |
Solyc02g084840.3.1 | Dehydrin | 0.885 | 5.6E-29 | 0.977 | 1.1E-56 |
Solyc02g091390.3.1 | Cold-regulated protein | 0.819 | 1.8E-21 | 0.975 | 1.2E-55 |
Solyc07g062990.2.1 | Late embryogenesis abundant protein 1-like | 0.803 | 4.4E-20 | 0.975 | 1.3E-55 |
Solyc01g060070.3.1 | Outer envelope pore protein 16-2, chloroplastic | 0.824 | 6.3E-22 | 0.974 | 7.3E-55 |
Solyc03g113510.2.1 | Hypothetical protein | 0.817 | 2.7E-21 | 0.973 | 4.1E-54 |
Solyc12g010820.2.1 | Late embryogenesis abundant protein | 0.815 | 3.9E-21 | 0.973 | 7.2E-54 |
Solyc02g062770.2.1 | Late embryogenesis abundant protein | 0.894 | 2.5E-30 | 0.973 | 1.2E-53 |
Solyc09g015070.3.1 | NAD(P)-linked oxidoreductase superfamily protein | 0.821 | 1.1E-21 | 0.971 | 1.2E-52 |
Solyc02g071760.4.1 | Oil body-associated protein 2A-like | 0.809 | 1.3E-20 | 0.969 | 7.8E-52 |
Solyc12g008430.3.1 | Malic enzyme | 0.928 | 7.1E-37 | 0.969 | 1.1E-51 |
Solyc12g038160.2.1 | Lipase | 0.816 | 3.1E-21 | 0.969 | 1.5E-51 |
GS.DT, gene significance value associated with desiccation tolerance; p.GS.DT, p-value of the correlation; MM.ME10, module membership with ME10; p.MM.ME10, p-value of the correlation.
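The hub gene selection summarized in Table 1 can be sketched as a filter on module membership (MM) and gene significance (GS), followed by ranking on MM. The three Solyc entries below reuse the MM.ME10 and GS.DT values listed in Table 1; the fourth entry and the assumption that the 0.8 cut-off applies to both MM and GS are illustrative only.

```python
def select_hubs(stats, mm_min=0.8, gs_min=0.8):
    """Return genes with MM > mm_min and GS > gs_min, ranked by decreasing MM.

    `stats` maps gene ID -> (MM, GS). Applying the same 0.8 cut-off to both
    values is an assumption of this sketch.
    """
    hubs = [g for g, (mm, gs) in stats.items() if mm > mm_min and gs > gs_min]
    return sorted(hubs, key=lambda g: stats[g][0], reverse=True)


stats = {
    # (MM.ME10, GS.DT) values taken from Table 1
    "Solyc12g008430.3.1": (0.969, 0.928),  # Malic enzyme
    "Solyc02g077980.3.1": (0.987, 0.840),  # Unknown protein
    "Solyc09g082110.4.1": (0.986, 0.878),  # LEA protein D-34
    # Hypothetical gene that belongs to the module but is weakly trait-related
    "hypothetical_gene": (0.910, 0.450),
}
hubs = select_hubs(stats)
```

Ranking by MM reproduces the ordering used in Table 1, where the most highly connected genes appear first regardless of their GS rank.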
Regulatory networks associated with the acquisition of seed vigour
An approach similar to that used for the desiccation tolerance module ME10 was used to infer the hub genes of ME2, the module that was correlated with dormancy release, germination under osmotic stress and longevity (Fig. 8). First, the highly connected genes with an MM > 0.9 (448 genes) were selected from the plot between MM and GS of each trait (Fig. 8A-C). A Venn diagram shows the overlap between these gene lists and revealed genes that were more correlated with a specific trait (Fig. 8D). No hub gene was found specifically associated with germination under stress or longevity, whereas 45 genes were found exclusively for dormancy release (Fig. 8G). Over-representation analysis of the 83 common hub genes revealed an enrichment of GO terms related to “translation” and “mRNA processing/modification both in chloroplast and mitochondria” (Additional file 6: Table S5). Many genes encoding penta- and tetratricopeptide repeat containing proteins were detected. Among the top correlated genes, we also found an ortholog of SHK1 KINASE BINDING PROTEIN 1, also known as PROTEIN ARGININE METHYLTRANSFERASE 5 (PRMT5, Solyc08g005970.3.1), a gene known to modulate pre-mRNA splicing, seed development and stress response ([54] and references therein), and an mRNA adenosine methylase (Solyc08g066730). The list of the 176 common genes between dormancy release and longevity was also significantly enriched with mRNA processing terms associated with the chloroplast and mitochondria (Fig. 8F, Additional file 6: Table S5). The same observation was made for the genes only correlated with dormancy. From this analysis, ME2 appeared to represent a subnetwork of regulators that couples the acquisition of seed vigour with the induction of post-transcriptional regulation both in the embryo and endosperm.
Among the genes correlating with dormancy and longevity, we also detected several homologues of genes connecting light, circadian rhythm and control of flowering, such as PHYTOCHROME AND FLOWERING TIME 1 (PFT1, Solyc05g009710.4.1), EMBRYO DEFECTIVE 1507 (Solyc06g065300.4.1), a DEAD helicase implicated in the splicing of FLOWERING LOCUS C (FLC) [55], EARLY FLOWERING 4 (ELF4, Solyc06g076960.2.1), which synchronizes the circadian clock with light and temperature [56], a homologue of FRIGIDA (Solyc04g072200.3.1) and PRMT5 mentioned above.
Identification of tissue specific modules correlating with physiological traits
So far, the approach taken to identify modules that correlated with seed vigour identified only those for which the module eigengene profile correlated with an increase in the acquisition of seed vigour both in the endosperm and embryo (Additional file 1: Fig. S7). However, this analysis excluded modules containing genes that were only expressed in either endosperm or embryo, because their expression would have correlated strongly with the trait in only one of the seed tissues, thereby decreasing the overall PCC value with the trait across all the samples (Additional file 1: Fig. S7). To identify embryo- and endosperm-specific gene modules associated with seed vigour, we retained those genes whose transcript level correlated with the acquisition of seed vigour traits either in the 11 samples of the endosperm tissue or in the 11 samples of the embryo tissue. Projection of these highly correlated genes in the embryo (Fig. 9A-C) or endosperm (Fig. 9D-F) on the network highlighted, as expected, the previously identified ME2 for both tissues (PCC < -0.8 or PCC > 0.8, Fig. 9A). Comparison of the highlighted modules in Fig. 9 with the tissue-specific modules shown in Fig. 4B identified in addition one embryo-specific module (ME7, see arrow Fig. 9B), and one endosperm-specific module (ME4, see arrow Fig. 9D).
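The reason for restricting the correlation to the samples of one tissue can be shown with a small numerical sketch: a gene expressed only in the embryo tracks the trait perfectly within embryo samples, yet its PCC over all samples falls below the 0.8 cut-off. All values below are hypothetical and use three samples per tissue rather than the 11 used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical trait values (e.g. % germination) at three stages; the same
# trait value applies to the embryo and endosperm samples of a given stage.
trait_by_stage = [10, 50, 90]

# Hypothetical embryo-specific gene: rises in the embryo, absent in endosperm.
embryo_expr = [1, 5, 9]
endosperm_expr = [0, 0, 0]

# Correlation restricted to embryo samples versus correlation over all samples.
pcc_embryo = pearson(embryo_expr, trait_by_stage)
pcc_all = pearson(embryo_expr + endosperm_expr, trait_by_stage + trait_by_stage)
```

Here the embryo-only PCC is 1, whereas the PCC over all six samples drops to about 0.48, so the gene would be missed by a cut-off applied across tissues.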
The embryo-specific ME7 contained mostly genes that were highly correlated with germination under osmotic stress (Fig. 9B). A Venn diagram of the ME7 genes correlating with the different traits shows that only two genes (Solyc03g095977.1.1 and Solyc03g095973.1.1) were found in common between the three vigour traits (Fig. 9G). Both were identified as paralogs of ABI4, a versatile transcription factor known to regulate dormancy in Arabidopsis [57]. The two genes that were in common between dormancy release and longevity encoded a GDSL-type esterase/lipase and AINTEGUMENTA-like 5 (AIL5), also known as CHOTTO1. Among the 31 genes only correlated with germination under stress, three TFs and two enzymes point to a role of the cell wall, such as orthologues of BEL1-Like HOMEODOMAIN 4 (SlBLH4, Solyc02g065490.4.1) and BLH2 (Solyc04g079830.2.1), both implicated in the regulation of pectin demethylesterification, a homologue of TRICHOME BIREFRINGENCE Like 37 (SlTBL39, Solyc03g006220.4.1), a UDP-glucosyl transferase (Solyc05g053120.1.1) implicated in lignin metabolism and a member of the lipid-transfer protein family (Solyc10g075050.2.1, Additional file 2: Table S1). The regulation of cellular growth and organellar cell identity also appeared to be implicated, as shown by the presence of homologues of BLH8 (Solyc11g069890.3.1), GROWTH-REGULATING FACTOR 5 (Solyc07g041640.3.1), NAC DOMAIN CONTAINING PROTEIN 33 (Solyc12g017400.3.1) and KANADI2 (Solyc06g066340.4.1).
The endosperm-specific module ME4 contained many genes that correlated strongly with dormancy release and longevity, and to a lesser extent with germination under osmotic stress (arrow Fig. 9D-F). Most of the positively correlated genes in this module (442) were found for all three traits (26%, 117 genes) or correlated with dormancy release and longevity (58%, 256 genes). A GO enrichment analysis revealed functional overlap between these two datasets (Fig. 9I, Additional file 7: Table S6). Both of them exhibited several terms broadly associated with defence response against pathogens. A more detailed look at the genes present in these gene lists identified several paralogs encoding different transposases, a DDE-4 domain-containing protein that is also closely associated with transposase activity and a SNF2 helicase, associated with DNA repair. Six paralogs encoding a serine/threonine-protein phosphatase 7 long form-like protein were found in common between germination and longevity, and seven were common to the three traits. This gene (aka MAINTENANCE OF MERISTEMS LIKE-3, MAIL3) participates in transposable element silencing [58]. Additional genes associated with DNA repair (Solyc01g105520.3.1, Solyc01g105530.3.1, Solyc01g109460.3.1) were found among genes correlating with the three traits. This suggests that a process related to genome stability is actively initiated. Next, both datasets contained genes with signalling functions associated with dormancy, namely an OPDA reductase (Solyc11g032230.3.1), GA signalling genes (GAMYB, three paralogs of RGA-like 3, a GA signalling repressor inhibiting testa rupture) and two ETHYLENE INSENSITIVE 3 genes (Additional file 7: Table S6). The common lists also showed many genes associated with cell wall activity, including 14 UDP-glycosyl transferases. The list of genes specifically correlated with the release of dormancy contained many genes with unknown function (Additional file 7: Table S6).
A closer look at the 14 genes correlated specifically with longevity revealed orthologs of known regulators of ABA/GA signalling pathways such as PYL5, an ABA receptor and RAV1 (Related to ABI3/VIP1), a B3 transcription factor modulating the expression of ABI3, ABI4, and ABI5 [59].