Combined sequencing approach to analyze the ovule of P. tabuliformis
To investigate the process of FNM in ovule development in female fertile and sterile lines of P. tabuliformis, reveal the reasons for the FNMM discontinue, we selected three different time points to further analyze the ovule development. Including the early February (early period of FNM process), the early March (middle period of FNM process), and the early April (late period of FNM process). Female fertile line (FER) ovule samples at above periods were marked by FER-FNM1, FER-FNM2, and FER-FNM3, respectively, while female sterile line (STE) ovule samples were marked by STE-FNM1, STE-FNM2, and STE-FNM3, respectively, each sample in triplicate. Eighteen mRNA samples from different stages ovule samples were subjected to 2*150 paired-end sequencing using the HiSeqTM 4000 platform, with 961,064,004 reads produced (Fig. 1). The repeatability of the RNA-seq libraries was evaluated using a PCA analysis; the reproducibility of the libraries was good (Fig. S1). Then combined in equal amounts of total RNA from three stages (FNM1, FNM2, and FNM3). Finally, we got two pooled samples (FER and STE). Two samples were normalized and subjected to an SMRT sequencing using the PacBio RS II platform (Table S1). In total, 276,887 (FER) and 307,741 (STE) ROIs were generated respectively (Fig. 1), and most of them (73.27% in FER, 64.33% in STE) were FLNC reads, which the entire transcript region from the 5´ primer to the 3´ primer, and the 3´ poly(A) tail were observed (Fig. 2A), and the length of FLNC is longer in STE than FER (Fig. 2B, 2C).
As shown in Fig. 2F, there was a significant variation of transcript length distribution between two platforms, and the read length was longer by PacBio, only 1.5% of the transcripts from the Pacio reads were <600 bases, but nearly 60% of the assembled transcripts from NGS reads were in this range. Based on Illumina short-read data, the length of N50 was 1,767bp, whereas N50 was 3,564bp based on PacBio, demonstrated that data assembling by PacBio was better than Illumina. Therefore, NGS reads corrected using SMRT subreads is more valuable than relying on single platform data. Besides, combined sequencing is more worthwhile for the species without a reference genome.
Functional annotation of full-length transcriptomes of FER and STE ovule in P. tabuliformis
For annotation of genes, alignment searches were conducted against public bioinformatics databases. And to further evaluate the function distribution of our transcriptome, we used GO annotation to classify these genes in FER and STE, respectively (Fig. S2). Annotated genes were divided into three categories, biological process, cellular component, and molecular function. In the biological process category, both in FER and STE, annotated genes were significantly enriched in metabolic processes, cellular processes, and single-organism processes. Furthermore, cell, cell part, and organelle enriched more obviously in the cellular component in each line; in the molecular function category, the genes from different lines were enriched into similar processes, mainly for catalytic activity and binding.
The Kyoto Encyclopedia of Genes and Genomes (KEGG) was also used to investigate the pathway and network of the molecular regulation of FER and STE. Both in FER and STE, annotated genes were mapped into 129 pathways (Table S2, S3). Same to the GO annotation, the classes of KEGG pathway between two lines terms were quite similar. Same in two lines, metabolic pathways, biosynthesis of secondary metabolites, biosynthesis of antibiotics and microbial metabolism in diverse environments were the top 4 pathways enriched the most abundant transcripts, but compared with FER, there were more genes enriched in these 4 pathways in STE (Table S2, S3). Although the number of genes in FER and STE enriched in each pathway was slightly different, however, there were no significantly differences in pathway annotation between FER and STE.
We got a through functional characterization for the full length transcripts of the FER and STE ovule, relied on the results from GO and KEGG functional annotation and classification. The GO and KEGG results also indicated that the functional annotation and classification of transcripts between two lines were similar on the whole, showing the pathway level was conservative for the transcriptom of FER and STE in P. tabuliformis. However, the expression level of individual genes might be different, which might related to the ovule abortion.
Candidate genes with opposite expression patterns during the FNM process in STE and FER
According to the variety of gene expression during the whole development process in FER and STE ovules, genes could be clustered into eight profiles by Short Time-series Expression Miner (STEM) software. These model profiles have been chosen to summarize the expression pattern of the genes. Among the eight patterns, we identified four patterns of genes that shown very significant p-values (colored boxes) (Fig. S3), these four profiles are significant both in FER and STE, containing two down-regulated patterns (profile 1 and 0) and two up-regulated patterns (profile 6 and 7). Genes in profile 1 and 0 in STE have been combined, however, profile 6 and 7 have been added in FER. Then 23 DEGs up-regulated in FER also down-regulated in STE (part 1) have been screened (Fig. 2D). 69 DEGs up-regulated in STE as well as down-regulated in FER (part 2) could be filtered equally (Fig. 2E). These genes expressed in these two sections established contrary trends in FER and STE during the FNM process, which could probably play a pivotal role in ovule development.
Two parts of genes were subjected to GO-term analysis, respectively. They were still classified into three main categories, including “cellular component”, “molecular function” and “biological process” (Fig. 2G, 2H). In the biological process category, metabolic process, cellular process, and single-organism process were the most significant in both two parts. In the cellular component category, cell and cell parts are more obvious. Most DEGs were in catalytic activity and binding two parts in the molecular function category. Then DEGs were subjected to KEGG pathway enrichment analysis. The KEGG pathways have been mapped in are shown in Table S4, S5. In part 1, annotated genes enriched in 14 KEGG pathways and 24 pathways in part 2.
Combining two parts of DEGs, Fig. 3 and Table S6 had listed the DEGs which expression level existed obvious differences in varying lines, that were likely associated with female sterility in P. tabuliformis. Energy metabolism and signal transduction are important for FNMM development. Therefore, we investigated the genes associated with carbohydrate metabolism and signal transduction, including sucrose synthase, phosphofructokinase, chorismate synthase, clavata 1-like protein, tetraspanin, reticulons and plasma membrane ATPase related genes. Among these DEGs, most showed significantly lower transcript levels in STE ovules. Specifically, the transcript levels of sucrose synthase genes SUS (Iso_0064722, Iso_0065236, Iso_0045687), AGPL1 (Iso_0015412), clavata 1-like protein related genes CLV1 (Iso_0025308, Iso_0025527, Iso_0073847, Iso_0079334), and some other transmembrane protein related genes RTNLB8 (Iso_0006259), TET18 (Iso_0006259), NPF3.1 (Iso_0006259) were markedly lower in STE compared to FER.
There were also severel DEGs related to mitosis and apoptosis, encoding cyclin (CYCB, Iso_0016253), tubulin (TUBA, Iso_0012780), dihydrofolate reductase (DHFR, Iso_0053543) and cysteine proteinase inhibitor (CPI, Iso_0031083), which differentially expressed between FER and STE also might lead to FNM half-stop. DEGs associated with mitosis like CYCB, TUBA and DHFR all kept a high expression level in FER and low in STE. However, CPI was high expressed in FER, but the low expression level of CPI maybe result in programmed cell death and active oxygen accumulated [32]. Similarly, the expression of bHLH66 (Iso_0067886) was obviously different between FER and STE. Furthermore, the transcript levels of several genes related to the stress response, such as PER (Iso_0053653, Iso_0049597, Iso_0010250), SOBIR1 (Iso_0024626), exgA (Iso_0058895) and CALS3 (Iso_0078524) showed opposite expression levels in varying lines.
AS analysis of DEGs with different expression patterns during the FNM process in two lines
Describe the complexity of AS at the whole transcriptome scale is one advantage of full-length sequencing. To analyze the AS events of transcript isoforms, we partition transcripts into gene families, then reconstructed into a coding reference genome based on k-mer similarity (Fig. 4A). A total of 42.69% had one isoform in FER and a slightly lower percentage (36.91%) in STE. However, there were more contigs with more than ten isoforms in FER (4.49%, 653) compared with STE (4.44%, 608) (Fig. 4C). Seven AS modes (skipping exon, mutually exclusive exons, alternative 5´ splice-site, alternative 3´ splice-site, retained intron, alternative first exon, and alternative last exon) existed in the ovule of P. tabuliformis (Fig. 4B). In total, 1,830 and 1,243 AS events were identified in FER and STE, respectively. The retained intron was the most significant AS mode, which also the predominant mode in plants [33]. Together with alternative 5´ and 3´ splice-site AS events, these three types of AS event accounted over 80% of detected events. On the contrary, mutually exclusive exons was the least AS event (Fig. 4D).
To explore whether AS exists in the genes might affect female sterility in P. tabuliformis, we researched AS in the DEGs, which analyzed above with opposite expression patterns during the FNM process in varying lines. Finally, isoforms of AGPL1 (associated with starch metabolism), bHLH66 (associated with plant growth and development), and TUBA (associated with mitosis) were diverse in FER and STE (Table S7), RI was the main AS event and existed in all three DEGs, TUBA also had A3 splicing (Fig. 4E, S4). The number of isoforms was varied between FER and STE in AGPL1 and TUBA, but bHLH had only one isoform in each line, however, the splicing isoforms were different.
Development and characterization of SSR marker
To further investigate the differences of regulatory mechanisms between FER and STE ovules, thus the microsatellites were identified in this study. 40,828 (in FER) and 44,195 (in STE) genes were generated to discover potential microsatellites and defined as di- to hexa-nucleotide motifs in this study. Using the MIcroSAtellite (MISA, http://pgrc.ipk-gaterSTEeben.de/misa/), a total of 5,530 potential simple sequence repeats (SSRs) were identified in 4,276 genes in FER. Similarly, 5,096 SSRs were identified from 3948 genes in the STE ovule. Of the 4,276 genes, 868 contained more than one SSR in FER, however, in STE, this radio is 796 of 3,948 (Table 1).
In the present study, the characterizations of the potential 5,530 SSRs in FER and 5,096 in STE were further analyzed. As shown in Table 1, the di-nucleotide repeats (45.5% in FER, 49% in STE) ranked first, and next came the tri- (38.4% in FER, 36.4% in STE), tetra- (8.4% in FER, 8% in STE), hexa- (5.2% in FER, 4.5% in STE) and penta- nucleotide (2.4% in FER, 2.6% in STE) repeats. Besides, the number of repeat units from Di- to hexa-nucleotide motifs were further summarized. As shown in Table S8 and Fig. 5A, 5B, the mostly represented repeat units of potential SSRs was 4-7, which accounts for 74.8% in FER, and 72.2% in STE. Tri- nucleotide repeats was the most abundant in this part, both in FER and STE. FER has more potential SSRs than STE, and the quantity difference of 4-7 repeat unit between FER and STE was more significant than the other levels of repeat unit. From Fig. 5C, 5D, we found that the SSR motifs in two lines shared significant similarity. The AT/TA di-nucleotide repeat was the most abundant motif in FER (30.4%) and STE (32.1%), followed by AG/CT (11% FER, 12.1% STE), AGC/CTG (8.9% FER, 6.7% STE), AAG/CTT (7.5% FER, 7% STE), AAT/ATT (6% FER, 6.7% STE) and ATC/ATG (5.9% FER, 5.6% STE). These six types of nucleotide repeats mentioned above represented about 70%.
Verification of the Gene Expression Profile by qRT-PCR
To confirm the transcriptomic analysis results, DEGs were randomly selected for quantitative real-time PCR (qRT-PCR) validation using the same type of samples compared with formerly used samples in NGS analysis, including SUS, PER12, PFK2, CLV1, SOBIR, UBQ10, TUBA, AGPL1, and TET18. The primers of these DEGs were listed in Table S9. The expression profiles of the candidate DEGs revealed by qRT-PCR data were consistent with those derived from sequencing. As shown in Fig. 6, the energy metabolism related genes SUS, AGPL1 and PFK2 were high expressed in FER, the expression level of PKF2 was significantly high in FNM3 stage, SUS and AGPL1 were high in FNM1. The expression of CLV1 and TET18, which is associated with signal transduction, and TUBA were markedly lower in the STE ovules compared with FER ovules. In addition, the expression levels of PER12 and UBQ10 that related to stress response were higher in FER than STE, especially in FNM1 stage. On the contrary, SOBIR1 was obviously up-regulated in STE, which is assiociated with apoptosis. The expression profiles of the candidate DEGs revealed by qRT-PCR data were consistent with those derived from sequencing.