We performed high-throughput RNA-seq using the Illumina HiSeq 2500 platform for embryo and endosperm tissues dissected from sorghum seeds at 14 DAP (days after pollination) of four genotypes (two parental pure lines BTx623 [B] and Y49 [Y], and their reciprocal F1 hybrids, BY and YB). We obtained a total of 487.43 million 150 bp paired-end clean reads which were comprised of 77.9 Gb of high-quality raw data (Table S1). The sequence reads were aligned to the BTx623 v3 reference genome (https://phytozome.jgi.doe.gov/pz/portal.html) by HISAT2. An overview of the distribution of the aligned reads along each of the 10 sorghum chromosomes showed that, as expected, most of the reads were located in the distal chromosome arm regions, while transposable elements (TEs) were mainly aggregated to gene-poor, pericentromeric regions (Fig. 1). Moreover, approximately 80% of the mapped reads were in exonic regions and only a small number of the reads were mapped to non-genic regions (Table S1). The aligned reads located in exonic and intronic regions were used to detect gene expression and alternative splicing (AS) for each gene in each sample (tissue and genotype). We assembled RNA-seq alignments into potential transcripts for each sample with StringTie. Gene and transcript expression levels were estimated with Ballgown and measured as transcript per million base pairs (TPM). The cutoff value for determining the expressed genes or transcripts was TPM > 0.
We found that although most genes were commonly expressed in both tissues of all genotypes, transcript isoform abundance seems higher in embryo than in endosperm, consistent with results in a maize transcriptome analysis (Lu et al. 2013). We detected a total of 46,035 genes expressed in the developing sorghum seed transcriptome, with 43,676 and 33,763 genes expressed in embryo and endosperm, respectively, which corresponded to 80.3–64.1% of the annotated genes, producing 81,800 and 66,632 transcripts, with 31,404 (63.2%) genes being commonly expressed in both tissues (Table S2). In addition, we detected expression of 15,690 and 6,637 unannotated genes, which produced 44,710 and 32,054 transcripts in sorghum embryo and endosperm, respectively. Of note, we found that the total numbers of expressed genes in F1 hybrids were higher than those in the parental lines in both tissues (Table S2).
Differentially expressed genes (DEGs) between embryo and endosperm, and between hybrids and parents
Differential expression analyses between the tissues and among the genotypes were calculated by DESeq2. Tissue-specific or tissue-preferential expressed genes were categorized based on the following criteria: (i) false discovery rate (FDR) was less than 0.01; (ii) genes with > 10 reads in one tissue and ≤ 1 read in another are defined as tissue-specific; (iii) genes showing > 2-fold differences in expression level are defined as tissue-preferential. According to these criteria, we identified 1,994 (4.6%) embryo-specific genes, 14,073 (32.2%) embryo-preferential genes, 2,335 (6.9%) endosperm-specific genes and 4,078 (12.1%) endosperm-preferential genes, respectively (Fig. 2a). Clearly, a greater proportion of the genes were specifically or exclusively expressed in endosperm than in embryo, whereas the proportion of preferentially expressed genes was 3.5 times more in embryo than in endosperm (Fig. 2a). However, for imprinted genes (defined as ≥ 5-fold differences in read counts between parental alleles), the great majority is endosperm-specific or -preferential, with embryo-preferential genes accounting < 10% of the total number of imprinted genes, and none imprinted gene showed embryo-specific expression.
Next, we analyzed distribution of the four types of DEGs (embryo-specific genes, embryo-preferential genes, endosperm-specific genes and endosperm-preferential genes) in the different genotypes, including a pair of reciprocal F1 hybrids (BY and YB) and their two parental pure lines (B and Y) (Fig. 2b, c). We found that tissue-specific expressed genes were less shared by the genotypes, i.e., they are more prone to showing genotype-specific expression. Approximately 18% and 26% of the embryo- and endosperm-specific genes, respectively, were common across all four genotypes. Additionally, ~ 52% and ~ 34% of the embryo- and endosperm-preferential expressing genes, respectively, were shared by all four genotypes (Fig. 2b). Notably, we found that the numbers of embryo-specific expressing genes in hybrids were significantly higher than those in their parents, whereas the other three types of between-tissue DEGs showed similar gene expression between hybrids and parental lines (Fig. 2b). Finally, there were more genes expressed at a higher level in embryo than in endosperm (Fig. 2c), which is consistent with a previous study in maize (Lu et al. 2013).
Additive and nonadditive gene expression patterns in embryo and endosperm of F1 hybrids
To determine the relative expression levels of genes in the sorghum reciprocal F1 hybrids (BY and YB) as compared to their parental pure lines (B and Y), and to identify genes displaying expression levels that deviate from additivity in the hybrids, we analyzed the RNA-seq data and identified nonadditively expressed genes in the hybrids. Here, nonadditive gene expression is defined as expression level in a hybrid that is significantly deviated from the averaged parental value, i.e., mid-parent value (MPV) (FDR < 0.05). In embryo, no significant differences between the reciprocal hybrids with respect of the proportions of genes showing additive vs. nonadditive expression were detected; both hybrids showed a predominant proportion of additive expression (99.6%) and very small proportion of nonadditive expression (0.39%) (Fig. 3). In contrast, in endosperm, the reciprocal F1 hybrids showed clear difference in the proportions of genes showing additive vs. nonadditive expression. Moreover, the proportions of nonadditively expressed genes in endosperm were markedly greater than those in embryo, with 10.3% and 7.6% genes showing nonadditive expression in endosperm of the reciprocal hybrids, BY and YB, respectively (Fig. 3). This suggests the influence of cytoplasm and other maternal effects were significantly exacerbated by nuclear genome dosage as the triploid endosperm has a 2:1 maternal vs. paternal nuclear genome contribution.
Differential alternative splicing (AS) between tissues and among genotypes
The RNA-seq data also enable global analysis of dynamic changes in alternative splicing (AS) patterns. We investigated spliced events in types of genes, between tissues and among the four genotypes using the algorithm ASTALAVISTA (Foissac and Sammeth 2007). Overall, of the 46,035 total expressed genes, 27,048 (58.8%) were multiexonic genes, and 41.9% of these genes were alternatively spliced, resulting in 31,012 AS events in total (Table 1). We found the overall AS frequency is slightly higher than that reported previously for sorghum (Wang et al, Genome Res 2018; Min et al. BMC Genomics, 2015), but similar to those reported in maize (Thatcher et al. 2014) and rice (Lu et al. 2010), and significantly lower than other plant species, especially Arabidopsis and soybean, in which AS is reported to occur in > 60% of multiexonic genes (Marquez et al. 2012; Shen et al. 2014). Specifically, 38.6% and 37.4% of expressed multiexonic genes were alternatively spliced and showing 29,600 and 25,427 AS events in sorghum embryo and endosperm, respectively (Table 1). Among these, a total of 22,859 AS events occurred in both tissues but embryo-specific AS events (6,741) were 2.6 times more than endosperm-specific AS events (2,568) (Fig. S1). Then, we analyzed AS events, in separation, for the four types of the DEGs and of imprinted genes. We found that 30.4% and 56.4% of embryo-specific and –preferential genes were alternatively spliced respectively, however, the ratios were much lower in both endosperm-specific and –preferential genes than in embryo DEGs, showing only 17.8% and 29.4%, respectively (Table 1). Additionally, the frequency of AS in imprinted genes was 46.3%, which was higher than those of total expressed genes and endosperm DEGs (Table 1).
Table 1
AS events in sorghum embryo and endosperm
| Expressed genes | Expressed multiexonic genes | AS genes (% of multiexonic genes) | AS events of multiexonic genes |
Total | 46,035 | 27,048 | 11,332 (41.9) | 31,012 |
Embryo | 43,676 | 25,745 | 9,949 (38.6) | 29,600 |
Endosperm | 33,763 | 24,859 | 9,299 (37.4) | 25,427 |
Embryo-specific | 1,994 | 1,466 | 446 (30.4) | 589 |
Embryo-preferential | 14,073 | 12,681 | 7,153 (56.4) | 20,503 |
Endosperm-specific | 2,335 | 1,561 | 278 (17.8) | 374 |
Endosperm-preferential | 4,078 | 2,942 | 865 (29.4) | 1,526 |
Imprinted genes | 101 | 80 | 37 (46.3) | 70 |
Next, we projected these comparisons with emphasis on genotype differences in both tissues. Results showed that (i) 16,920 and 10,847 events were conserved among all four genotypes in embryo and endosperm, respectively; (ii) compared with parents, 3,093 and 3,522 AS events were unique to embryo and endosperm of at least one hybrid, and 624 and 537 of both hybrids; (iii) compared with parents, 3,093 and 3,522 AS events were lost from embryo and endosperm from at least one hybrid, and 444 and 552 from both hybrids (Fig. 4a).
We further investigated the different AS types in tissue-specific or tissue-preferential expressing genes, as well as in imprinted genes. The seven AS modes were illustrated (Fig. 4b). Overall, the four basic AS types accounted for ~ 78% of the total AS events in both tissues, and the distribution of different types of AS events was also similar between embryo and endosperm. Specifically, as described in other plant species, we found that intron retention (IR) accounted for the most abundant AS type in both tissues (~ 25%), though it was only slightly higher than the second type, alternative 3’ splicing site (A3SS). A similar trend was observed in 20 DAP sorghum BTx623 embryo and endosperm (Wang et al. 2018). Exon skipping (ES) events varied in different plant species from 3% in Arabidopsis (Marquez et al. 2012) to 25% in rice (Zhang et al. 2010) and 33% in maize (Thatcher et al. 2014). The number of ES events in our data fell in the middle, accounting for ~ 17% of the total AS events. Alternative 5’ splice site (A5SS) was the least frequent mode of the four main AS types (~ 13%) in both tissues (Fig. 4c). We also calculated the rare AS types. For example, both the skipping two exons, “ES1 + ES2” and “A5SS or A3SS” counted ca. ~2.5% of the total AS events. The mutually exclusive exon (MXE) AS type, which has been detected in rice (Lu et al. 2010; Zhang et al. 2010), but not in Arabidopsis (Marquez et al. 2012), accounted for ~ 0.6% of the AS in sorghum, ranked the least AS type in our data (Fig. 4c). This is consistent with the results in maize (Lu et al. 2013). For those complex AS models, we put them all together to the type of “others”, which likely resulted from different combinations of the four basic types or arose from different mechanisms (Syed et al. 2012). These complex models accounted for approximately 17% of the total AS events in embryo and endosperm of sorghum (Fig. 4c).
For the DEGs, no significant differences in the proportion of AS patterns were observed in tissue-preferentially expressed genes of both embryo and endosperm compare to all genes from the sorghum genome. However, the tissue-specific genes in both tissues showed obviously different splice patterns. Specifically, (i) in embryo-specific genes, IR was increased 16%, whereas in endosperm-specific genes, IR was decreased 28%, relative to those of the total expressed genes; (ii) the exon skipping type did not change in embryo-specific expressing genes but was increased by 27% in endosperm-specific expressing genes; (iii) in both embryo- and endosperm-specific genes, the “others” AS type was decreased compared to the total expressed genes (Fig. 4c).
Interestingly, we found that all the four common types of AS, especially IR, were decreased in sorghum imprinted genes, while those of the rare types, like “skip 2 exons”, “A5SS or A3SS”, MXE and other complex types were increased, led to nearly 30% of the AS events were accounted by the “others” type in imprinted genes (Fig. 4c). This suggests that compared with embryo, endosperm likely has unique AS splicing mechanisms.
We further analyzed the sequence length covered by each AS type in sorghum embryo and endosperm. We found that the four major types of AS events showed different sequence lengths. Specifically, (i) the retained intron length increased from 60 to 100 bp; (Fig. 5); (ii) both of A3SS and A5SS showed similar features of sequence length, with the most frequent length < 10 bp; (iii) the skipped exon lengths ranged from 1 to 200 bp and peaked at 70–80 bp, but this peak was not as sharp as the other types of AS events (Fig. 5). These features of AS are consistent with previous findings from other species (Campbell et al. 2006; Shen et al. 2014).
Generation of mature mRNAs from genes requires precise removal of introns from precursor mRNAs (pre-mRNAs) and joining of exons. Four loosely conserved core sequence elements in introns, the 5’SS with a conserved GT, the 3’SS with a conserved AG, therefore, usually splice at a 5'-GT-AG-3' exon-intron junction (Reddy et al. 2013). We compared splice motif preference among the different AS types and between tissues. We found that the splicing junction site of 5’-GT-AG-3’ pair represented the highest proportion of all splicing sites, followed by the 5’-GC-AG-3’ pair. Specifically, of the IR and A5SS events, splice junctions (SJs) were comprised of ca. 96% GT-AG and 4% GC-AG, whereas in A3SS and ES, SJs were comprised of 98% and 99% GT-AG, 2% and 1% GC-AG, respectively. (Fig. S2). This result was in line with previous studies in other plant species (Li et al. 2014; Marquez et al. 2012; Shen et al. 2014). Further Chi-square test indicated that there was no significant difference in the frequencies of the types of splicing sites between embryo and endosperm.
Experimental validation of differential expression and AS events
To validate the DEGs and AS events detected in our transcriptomic data, we selected 20 AS genes (17 of which were also DEGs between tissues) and interrogated the accuracy via RT-PCR (primers were listed in Table S3). Results showed that most of the DEGs and AS events were very well supported by this independent experiment, with 82% of the DEGs and 80% of the AS events were supported in both tissues (Fig. 6; Table S3).
Gene ontology (GO) enrichment analyses of DEGs and AS genes
To assess the possible functional relevance of the DEGs, we conducted Gene Ontology (GO) enrichment analysis by using a hypergeometric statistic (R function phyper). The significantly enriched (p value < 0.05) top GO terms in each category of the DEGs were presented in Fig. 7 and all enriched GO terms were listed in Table S4-S6. We found that more GO terms were enriched in embryo than in endosperm when controlling FDR at the 0.01 level. Those genes that exhibited embryo-specific expression were involved in all three main GO categories: biological processes, molecular function, and cellular component. For example, the top significantly enriched GO subcategories in the biological process category were “cellular glucan metabolic process”, “regulation of transcription” and “protein amino acid phosphorylation”. In the molecular function category, “ADP binding”, “protein dimerization activity”, “heme binding”, and “transporter activity” were significantly enriched. We also found that the embryo-specific genes were enriched in a few cellular component categories including “cell wall”, “apoplast” and “membrane” (Fig. 7 and Table S4). In contrast, there were limited functional terms involved in endosperm-specific genes when controlling FDR at 0.01. “Biological processes” of endosperm-specific genes were mainly enriched in “regulation of transcription (DNA-dependent)” and “proteolysis”. In molecular function category, “transcription factor activity”, “enzyme inhibitor activity” and “aspartic-type endopeptidase activity” were significantly enriched (Fig. 7 and Table S4). GO enrichment was not detected for endosperm-DEGs in the “cellular component” category.
Analysis of GO enrichment for nonadditive expressing genes, using all sorghum genes as background, revealed that they were significantly overrepresented by some specific GO terms (Fig. 7). All the enriched GO terms were listed in Table S5. Importantly, GO terms “defense response”, “response to biotic stimulus”, and “oxidoreductase activity” were significantly enriched in both tissues, indicating that nonadditively expressed genes in hybrids were possibly related to increased resistance or resilience to biotic and abiotic stresses of the hybrids. In addition, in embryo, the significantly enriched top GO terms were “transferase activity (transferring hexosyl groups)” and “ribonuclease T2 activity”, while the nonadditively expressed genes in endosperm were highly enriched in “carbohydrate metabolic process”, “proteolysis”, “oxidation reduction”, “serine-type carboxypeptidase activity”, “extracellular region”, and so on (Fig. 7 and Table S5).
We also conducted GO enrichment analysis for AS events in the different tissues and genotypes. The numbers of genes categorized to the GO terms were listed in Table S6, and the significant top GO terms in each category were presented in Fig. 7. Overall, fewer AS genes were involved in functional terms compared to the DEGs when controlling FDR at the 0.01 level. Nevertheless, the AS genes of the “IR” type were highly enriched in protein metabolism process in both tissues. Importantly, the GO term “nutrient reservoir” (GO:0045735) was found to be significantly enriched in the genes of the “A3SS” and “A5SS” AS types in endosperm but not in embryo of all four genotypes, indicating functional difference of genes underwent specific types AS between the tissues. In addition, genes spliced by “A3SS”, “ES” and “A5SS” patterns of both tissues showed highly enriched in GO terms “nucleic acid binding” or “RNA binding” (FDR corrected P values, 0.01) (Fig. 7 and Table S6).