The seed coat area of hybrid and its parents was significantly different on 8DAP.
Seed coat area and kernel weight from 0 to 15DAP of hybrid and parental lines were measured continually (Fig.1A). At 0DAP, seed coat area of hybrid was lower than both mid-parents value (MPV) and maternal line. On 6DAP, the hybrid values began to exceed the MPV but were still lower than maternal line. With the expansion of seed coat, hybrid value continued to be higher than MPV on 7DAP and slightly exceeded the maternal line. On 8DAP, there was a statistical difference (P value = 0.034) between the hybrid and the MPV of seed coat area (Fig.1B). From 9 to 15DAP, the seed area of the hybrid was more and more different from the MPV and paternal line values (Table S1). In conclusion, 8DAP is the inflection point of heterosis in maize seed coat, and also the entry point of heterosis in seed coat of maize. The kernel weight of hybrid was statistically different from that of mid-parents after 10DAP (Fig.1D). These results indicated that the heterosis of seed coat area was prior to seed weight.
In seed coat area, the relative growth rate of hybrid and paternal line peaked at 3DAP, and the maternal line grew most rapidly on 2DAP (Fig.1C). In terms of the relative increasing rate of kernel weight, the value of hybrid was the highest on 3DAP, 2DAP for maternal line and 4DAP for paternal line (Fig.1E). In general, seed coat area and kernel weight showed a large increase in 2-4DAP. More importantly, 3DAP is a critical time point for hybrid seed coat development.
Relative cell numbers in seed coat
In order to understand whether the increased seed coat area in hybrid was due to changes in cell number or cell size, we counted the seed coat cell number per unit area (0.1mm2) (Fig. 2). The middle longitudinal section of seed embryos collected on 8DAP were performed (Fig.2A-C). Surprisingly, the number of cells per unit area of the paternal line (319 cells per 0.1mm2) was higher than that of the hybrid (149.7 cells per 0.1mm2), the maternal line (138 cells per 0.1mm2) and the MPV (228.5 cells per 0.1mm2) on 8DAP (Fig. 2D). Although there was no difference in number of cells per unit area between hybrid and maternal line at 8DAP, there was a significant difference in seed coat surface area between them (Fig.2D, E). Thus, compared with maternal line, the cell number of hybrid is determined by its larger seed coat area. The difference in cell number between hybrid and MPV was significant, suggesting that the difference in seed coat size between hybrid and parents was mainly due to cell number rather than cell size.
Introduction of RNA-seq data
To investigate the expression of genes before and after the time points at which the seed coat of hybrid lines and parents showed statistical differences, we performed RNA-seq on 18 samples of 6 time points (2-4DAP and 8-10DAP) of hybrid and parental lines, with 3 biological replicates for each sample. 54 libraries in total, 2.4 billion high-quality reads were generated. The reads were then mapped to the maize B73 reference genome (RefGen_V4 [26]) by Hisat [27]. An average (87%) of reads were uniquely mapped (Table S2) to calculate the normalized gene expression level as fragments per kb of transcript per million mapped reads (FPKM). There was a high correlation among the three biological replicates in each time point (average R2=0.93) (Table S3). All the above results indicated that the sampling quality of this survey was exact, and RNA-seq sequencing data were accurate and reliable. Thus, we took the average FPKM value of the three replicates as the expression level for the sample at each time point.
Expressed genes
The number of expressed genes of Yudan888 hybrid and parental lines at 6 time points were investigated. Followed by the number of overlapping expressed genes of each material counted. The results showed that the total number of genes expressed by hybrid, maternal and paternal line at six time points were 23516, 23596 and 23247, respectively. 17691, 18328 and 17604 genes were overlapped at 6 time points, accounting for 75.23%, 77.67% and 75.73% of the total expressed genes (Fig.3A-C), respectively. Taken together, the results of overlapping expressed genes in hybrid and parental lines showed that the growth and development of seed coat was mainly due to gene expression quantity rather than the change of gene expression type and number.
Differentially expressed gene identification
It was suggested that the hybrid showed statistical differences with its parents on 8DAP, while the relative growth rate of hybrid seed coat area and kernel weight reached the peak at 3DAP. Therefore, we focused on RNA-seq data of 3DAP and 8DAP for subsequent analysis. In DEG analysis, the gene expression amounts in hybrid were compared one-by-one with their parental lines, setting the difference fold change (FC) ≥ 2 and (FC) ≤ -2 as up-regulated expression and down-regulated expression, respectively. Some DEGs were high or low expressed either in hybrid or parental lines. Among 3DAP, a total of 3915 DEGs varied between hybrid and maternal line (2134 (54.50%) up-expressed and 1781 (45.50%) down-expressed, 2647 DEGs varied between hybrid and paternal line (1248(47.15%) up- expressed and 1399 (52.85%) down-expressed). Compared with 3DAP, the proportion of differential genes up-regulated in 8DAP increased sharply. 2182 DEGs varied between hybrid and maternal line (1499(68.70%) up- expressed and 683(31.30%) down-expressed), and 2724 DEGs varied between hybrid and paternal line (1847(67.80%) up- expressed and 877(32.20%) down-expressed) (Fig.4A). The differences vary greatly in the reconciliation and/or down regulation of gene expression between hybrid and different parent combinations in the comparison of 3DAP and 8DAP. Some DEGs have genotype specific expression and thus identified as genotype specific uni-genes (Fig.4B). Venn diagram analysis indicated that 1156 genes showed DEGs (highlighted as black bold font) between hybrid and parents at both time points (Fig.4B). Out of the 1156 genes, 834 DEGs (sum of bold font numerals) were commonly observed between M vs H and P vs H at 3DAP. Whereas, 430 DEGs (sum of bold font numerals) were commonly observed between M vs H and P vs H at 8DAP (Fig.4B). In hybrids at both time points, 108 DEGs were found commonly differentially expressed (Fig.4B).
Expression patterns of DEGs
In order to explore and categorize the expression alteration trend, DEGs differentially expressed in 3DAP and 8DAP samples were divided into 13 possible hybridization and parental expression patterns (Table 1). Gene expression patterns were divided into additive expression and non-additive expression. We investigated that there were no significant changes in the number of additive expressed DEGs between the two time points. The expression pattern of most genes in hybrid at two time points was additive expressed (85.18% at 3DAP and 86.45% at 8DAP). However, the effects of non-additive DEGs are quite different, especially the dominantly expressed ones (classes 5, 7, 10 and 12). That is to say, the gene expression in hybrid is similar to that of one parent, which accounts for more than 13% of all DEGs at both time points. On 3DAP, the number of higher paternal dominantly expressed genes accounted for the highest proportion in non-additive pattern, while on 8DAP, the number of higher maternal dominance genes was the most abundant. In the number of over-dominantly expressed genes in transitional down expression (0.35% at 3DAP and 0.14% at 8DAP) is relatively higher than that in transitional up expression (0.10% at 3DAP and 0.08% at 8DAP).
We summarized 13 possible hybridization and parental expression patterns into mid-parent, high-parent and low-parent to analyse the expression changes at either time points. By comparing the number of DEGs in three expression patterns at two time points, it was found that the number of low-parents DEGs changed the most, followed by the high-parents DEGs, while the number of mid-parents expressed DEGs changed the least (Table S4). According to the analysis of the fold changes at two time points, it was suggested that the change of the number of DEGs expressed by low-parents and high-parents may play a role in Yudan888 seed coat heterosis.
Functional classification by gene ontology and KEGG pathway enrichment analysis
The DEGs with high and low parental expression at 3DAP and 8DAP were annotated by Gene Ontology (GO). The enrichment of GO with low-parental expression of 3DAP showed that a large number of DEG categories related to stress were significantly enriched in biological process, followed by cellular component, and categories related to plasma membrane (Fig.5A). 3DAP highly expressed GO enrichment results suggested that categories related to chromosome assembly and processing were enriched in cellular component, and categories related to cell cycle and DNA replication were enriched in biological process (Fig.5B). The enrichment results of GO with low-parental expression of 8DAP demonstrated that the categories related to cytoskeleton and microtubule were enriched in cellular component, and the categories enriched in biological process were mainly related to cell cycle, nuclear division and microtubule (Fig.5C). 8DAP highly expressed GO enrichment results indicated that categories related to metabolism were enriched (Fig.5D).
Leverage OmicShare Tools (https://www.omicshare.com/tools/Home/Report/ koenrich) Dynamic KEGG pathway enrichment analysis was carried out to further understanding of gene biological function and gene interactions. We performed KEGG enrichment analysis on DEGs with high and low-parental expression at 3DAP and 8DAP time points, respectively (Fig.6). Among 3DAP low-parental expressed DEGs, metabolic pathways, biosynthesis of secondary metabolites, protein processing in endoplasmic reticulum and glutathione metabolism were significantly enriched (Fig.6A). In the 3DAP high-parental expressed mode, biosynthesis of secondary metals, DNA replication and photosynthesis were significantly enriched (Fig.6B). Metabolic pathways, biosynthesis of secondary metabolites, alpha linolenic acid metabolism were significantly enriched in 8DAP low-parental expressed DEGs (Fig.6C). By contrast to 3DAP high parental expression, DNA replication was also enriched in 8DAP low-parental expression mode. In 8DAP high-parental expression pattern, among which the circadian rhythms, biosynthesis of secondary metals, glycolysis/gluconeogenesis, thiamine metabolism, metabolic pathways, galactose metabolism and other pathways were significantly enriched (Fig.6D).
Weighted gene co-expression network analysis (WGCNA)
In order to identify the specific genes highly related to seed coat area, we constructed co-expression networks by using the transcriptome data of 6 time points between hybrid and its parents, and associated the co-expression module with seed coat area traits. The results showed that these genes should to be divided into 41 modules (Fig.S1). By observing the correlation between modules and samples, it is found that the absolute value of the correlation coefficient between MEyellow (0.86) and MEsienna3(0.82) modules is the highest (Fig.S1). In other words, these two modules have the highest correlation with seed coat area. In these two modules, the genes with the top 100 weight values were screened to construct the regulation network diagram (Fig.7, Fig.S2). Firstly, the module MEyellow showed that Zm00001d019306 is a hub gene (Fig.7). However, in module MEsienna3, the regulatory network is not as simple and clear as that in MEyellow. We chose genes with a degree value of more than 10 as hub genes. Therefore, the hub genes in module MEsienna3 are Zm00001d006330, Zm00001d043289, Zm00001d007181, Zm00001d043290, Zm00001d033510, Zm00001d024732, Zm00001d015025, Zm00001d006331 and Zm00001d047796 (Fig.S2, Table S5).
Validtion of DEGs by Quantitative real-time PCR
The expression patterns of nine differentially expressed genes were further confirmed by quantitative real-time PCR (qRT-PCR). We compared the transcriptional profiles of each gene with the seed coat RNA samples from hybrid and parents at different time points. The data confirmed that the expression patterns of all 9 differentially expressed genes were consistent with the expression levels obtained by RNA-seq analysis (Fig.8, Fig.S3)