Data Filtering and High Quality Clean Reads Mapped with Reference Genome
The base composition and quality value before and after filtering is shown in Additional file 1: Figure S1, Additional file 2, Table S1 and Additional file 3: Table S2. The Q20 value of all samples was >98.9% and the Q30 value 96%. The alignment rates to the reference genome for each sample, including the parental cultivars ‘Red Globe’ (R.G) and ‘Centennial seedless’ (C), and pooled seeded (S) and seedless (SL) progeny, were >95% (Table 1).
Identification and Classification of Single Nucleotide Polymorphisms (SNPs) and Small Indels Based on Genome Re-sequencing
To catalog genetic diversity in grapevine and its potential influence on gene function in the context of ovule development, we carried out whole genome re-sequencing of the parental cultivars ‘Red Globe’ and ‘Centennial seedless’, as well as their pooled seeded and seedless progeny. A total of 6,379,614 SNPs and 745,433 small indels were identified among these four samples. All of the identified variants were indexed based on their genomic position (Figure 2a). Most variants were found in the intergenic regions, followed by intronic, upstream, downstream and exonic regions. The least number of variants were found in the 5’ UTR regions. A total of 222,652 variants were located within exons. We classified these further based on their potential effects on gene function (Figure 2b). We focused our attention on those variants predicted to cause amino acid changes in protein coding genes (frameshift, substitution, premature termination, and loss of termination). Four kinds of transition and eight kinds of transversions were identified, with C to G transversion and G to A transition the most common. The ratio of transition to transversion was 0.48, which is close to the ratio of 0.5 in the ideal condition.
Gene Ontology (GO) and Pathway Analysis of Identified DEGs Containing Functional Variants
To identify functional variants most closely related to seedlessness, we analyzed these results in the context of transcriptome data using the same progeny as plant materials, aiming to further identify functional variations which may alter the expression of the gene. Functional variations were identified from genome data and the number of corresponding associated genes are shown in Table 2. In addition, a relative comprehensive statistical analysis was conducted in combination of both genome and transcriptome data, and the number of functional variants containing genes identified as DEGs at least at one stage and at all three developmental stages were counted based on function classification of variants (Table 2). With respect to ovule development comparing the seeded and seedless progeny at Stage 1, Stage 2 and Stage 3, a total of 2,425, 2,988 and 2,328 DEGs containing functional variants were identified, respectively (Additional file 4:Table S3). At each ovule developmental stage, the number of up-regulated DEGs (seedless progeny/seeded progeny) were found to be higher than down-regulated ones. Moreover, the number of DEGs containing functional variations were found to be highest at Stage 2, irrespective of whether the gene was up- or down-regulated.
To explore the biological processes and pathways associated with DEGs containing functional variations, we performed a GO and pathway analysis of the DEGs containing functional variants at three developmental stages. In general, results of both enriched GO and pathway were basically consistent and focused on aspects involved in seed development. A total of 162 GO terms were found to be significantly enriched at all three developmental stages (Figure 3a). Those GO terms associated with up-regulated DEGs were mainly involved in biological processes like plant defense, regulation of reproductive development, regulation of hormone balance, seed coat development, endosperm and seed development, and cell death (Figure 3b). On the other hand, GO terms associated with down-regulated DEGs were mainly related to ‘lignin biosynthetic process’, ‘secondary metabolic process’, ‘oxidation reduction’ and ‘iron ion transport’ (Figure 3b). A total of 124 pathways were found to be enriched at all three ovule developmental stages (Figure 4a), and these enriched pathways were mainly related to regulation of hormone balance (‘cytokinins degradation’, ‘cis-zeatin biosynthesis’, ‘jasmonic acid biosynthesis’ and ‘ethylene biosynthesis from methionine’), seed coat development ( ‘cellulose biosynthesis’ and ‘suberin biosynthesis’), secondary metabolic biosynthesis (‘flavonoid biosynthesis’) and oxidation-reduction (‘glutathione-mediated detoxification’) (Figure 4b).
Functional and Expression Analysis of DEGs Containing Functional Variants
We focused further attention on functional variant-containing genes that were differentially expressed in the developing ovules, between seeded and seedless cultivars, at all three ovule developmental stages (Table 2). We estimated gene function based on their annotation, and focused on genes encoding potential disease resistance protein, protein kinase, transcription factor, cytochrome P450 and other factors affecting seed development.
Functional variants within genes encoding protein kinases were found to be nonsynonymous SNP, frameshift deletion, frameshift insertion, stopgain and stopless (Figure 5a). The expression patterns of these DEGs were almost similar at all three ovule developmental stages, i.e., mainly up-regulated in seedless progeny compared with seeded ones. In addition, disease resistance proteins are important during plant defense responses. We found that many functional variant-containing DEGs encoded disease resistance proteins; these variants comprised nonsynonymous SNP, frameshift deletion, frameshift insertion and stopless (Figure 5b).The expression pattern of all the disease resistance protein associated DEGs at the three development stages was nearly the same, irrespective of the functional effects of the variants, with most DEGs up-regulated in the ovules of seedless progeny compared with seeded progeny. Many DEGs containing functional variants (nonsynonymous SNP, frameshift deletion, frameshift insertion, stopgain and stopless) were identified within cytochrome P450-like genes, and most of these DEGs were up-regulated in seedless progeny compared to seeded ones at all three developmental stages. It is noteworthy that, irrespective of whether the gene was up- or down-regulated, the expression pattern of these DEGs showed no differences among all three developmental stages.
Moreover, many DEGs identified as transcription factors (TF) were found to contain functional variants including nonsynonymous SNP, frameshift deletion, stopgain and stopless (Figure 6). These TFs were classified into 15 families, among which MYB and BHLH were the top two families with the most number of DEGs containing function variants. Expression of DEGs associated with 10 TF families (WRKY, BHLH, SBP, AP2, Myb, HB, TCP, HSF, CCAAT and GRAS) were found to be up-regulated at all three ovule developmental stages in seedless progeny compared to seeded progeny. In contrast, an assortment of genes were down-regulated at all three ovule developmental stages in seedless progeny compared to seeded ones; these included one NAC, two MADS-box, one GATA, one bZIP, and one ERF family gene.
DEGs containing functional variants were also classified into 12 other functions associated with seed development, including heavy-metal-associated protein, cellulose-related protein, ankyrin repeat-containing protein, senescence-associated protein, hormone-related protein, calmodulin, methyltransferase, cell cycle, ABC transporter protein, thaumatin, aspartic proteinase and flavonoid synthase. Expression pattern of these DEGs was shown in Figure 7. Two kinds of functional variants were identified in heavy-metal-associated protein (nonsynonymous SNP and frameshift insertion) and methyltransferase (nonsynonymous SNP and frameshift deletion), while only one kind of function variant (nonsynonymous SNP) was identified in others. Expression of DEGs encoding cellulose-related protein, ankyrin repeat-containing protein, senescence-associated protein, calmodulin and cell cycle were up-regulated at all three ovule developmental stages in seedless progeny compared to seeded ones, while expression of DEGs encoding several of the other functions were down-regulated. Except for one DEG encoding methyltransferase which was found to be down-regulated at Stage 1 but up-regulated at Stages 2 and 3, all other DEGs showed consistency in expression pattern among all three ovule developmental stages.
Identification of seedless candidate genes based on SNP-index analysis of two extreme pools
For each progeny bulk, the frequency of each SNP variant (SNP-index) was calculated, and a delta SNP-index was determined (Additional file 5: Figure S2). A number of 3 peaks on chromosomes 11, 13 and 17, as well as 3 valleys on chromosomes 6, 7 and 8 under the threshold for delta-SNP index were identified. A total of 7,923 SNPs were identified within the peak regions, while 360 SNPs were identified in the valley regions. A total of 13 nonsynonymous SNPs were identified in the valley regions, including 1 variant on Chromosome 6 associated with 1 gene, 6 variants on Chromosome 7 associated with 4 genes, and 6 variants on Chromosome 8 associated with 4 genes (Additional file 6:Table S4). In the peak regions, a total of 38 functional variants were identified (Additional file 7:Table S5), among which 35 variants (34 nonsynonymous SNP and 1 stopgain) were located on Chromosome 13 and were associated with 18 genes. The other 3 nonsynonymous SNP were found on Chromosome 17 and associated with 2 genes.
All in all, a total of 33 genes were identified as potential preliminary candidates based on the SNP-index analysis, among which some genes were found to contain more than one functional variant. We examined the expression pattern of these genes during ovule development in previously generated datasets derived from the same progeny. It is noteworthy that the arginine–197-to-leucine substitution on Chromosome 18 in VviAGL11 [18] was also identified in our results, but as this SNP was not performed good enough in our SNP-index analysis compared to other selected genes. And consistent with the expression pattern reported, VviAGL11 was not identified as significant DEGs according to transcriptome analysis. In consideration of all this, we did not carry further research on this gene. And nine candidate genes were further yield by the standard of being DEGs at at least one ovule developmental stage of 3 and corresponding information can be found in Table 3 and Additional file 8: Figure S3. One of these nine genes, G2,was down-regulated at all three ovule developmental stages in seedless progeny compared to seeded ones. Expression of the other eight candidate genes was up-regulated at at least one ovule developmental stage in seedless progeny compared to the seeded ones.
Expression analysis of candidate genes related to seedlessness
To further investigate a potential role for these nine candidate genes in ovule development and seedlessness and explore whether the different expression pattern between seeded and seedless progeny was widely applied, we analyzed their expression in different grapevine tissues (Figure 8) and in ovules at different development stages (Figure 9) from seeded and seedless cultivars. All candidate genes were expressed in reproductive tissues (flowers, tendrils and fruits). Both the G1 and G2 genes show strong expression in reproductive tissues, compared with nutritional tissues, whereas the G8 gene was expressed more strongly in nutritional tissues (Figure 8). The expression pattern of G1, G8 and G9 genes were similar between seeded and seedless varieties, while G5, G6 and G7 genes were differentially expressed. For example, G5 expression was relatively high in fruit of ‘Red Globe’ and leaves of ‘Thompson Seedless’, while G6 and G7 were relatively highly expressed in the root of ‘Thompson Seedless’.
As nine candidate genes showed different expression pattern during ovule development of seeded and seedless progeny, to further test whether this different expression pattern also exist in multiple grape cultivars, and to evaluate whether the observed cultivar-associated difference in expression pattern might be related to seedlessness, we analyzed expression of the nine candidate genes in two seeded (‘Kyoho’ and ‘Red Globe’) and two seedless cultivars (‘Thompson Seedless’ and ‘Flame Seedless’) (Figure 9). The expression pattern of most of the candidate genes was consistent with the transcriptome data, except that some genes showed differences at specific stages, likely related to cultivar difference (Figure 9). Six of the candidate genes exhibited a consistent expression pattern at all three ovule developmental stages between cultivars: four genes (G4, G7, G8 and G9) showed higher expression in seedless cultivars compared to seeded ones, while two genes (G2 and G5) showed the opposite. In transcriptome data (Table 3), expression of G5 gene was first up-regulated at stage 1 then down-regulated at stage 2 and 3 (seedless progeny/seeded progeny), while G5 genewas down-regulated (seedless cultivars/seeded cultivars) at all three developmental stages between multiple cultivars. Despite G5 gene showing some difference, the expression pattern of the remaining five candidate genes was consistent with transcriptome data, which indicates the widely applicability of different expression during ovule development of seeded and seeded cultivars and underscored the utility of these candidate genes underlying seed phenotype differences.