QTL-seq Analysis of Seed Size Trait in Grape Reveal New Insight on The Genetics of Seedlessness


 Background Seedlessness in grape ( Vitis vinifera ) is an important commercial trait for both the fresh and drying markets. However, despite numerous studies, the mechanisms and key genes regulating grape seedlessness are mostly unknown. Results In this study, we sequenced the genomes of the V. vinifera seeded cultivar ‘Red Globe’, the seedless cultivar ‘Centennial’, as well as the derived hybrids. Nonsynonymous SNPs were identified and analyzed with respect to published transcriptome data. All the DEGs containing nonsynonymous SNPs were further analyzed in terms of expression patterns, Gene Ontology and pathway enrichment. A potential QTL region associated with seed size was characterized based on SNP indices for both seedless and seeded progeny. Expression analysis of candidate genes during ovule development in multiple seeded and seedless grape cultivars further indicates their potential function in grape seed development. Conclusion In summary, DEGs containing nonsynonymous SNPs were mainly protein kinase, transcription factors, cytochrome P450 and other factors related to seed development, which were mainly involved in biological processes like hormone balance, seed coat and endosperm development, reproductive organ development, oxidation and reduction, senescence and cell death. Based on SNP-index and expression pattern analysis, three genes were further identified as potential seedlessness-related genes. Overall the data cast light on the differences of seed development between seeded and sedless progeny in perspective of both functional variants and expression pattern,which provides valuable candidates for future functional study.

been identified. A gene named Chloroplast Chaperonin 21 ( ch-Cpn21) was found to be differentially expressed in the inflorescence of seedless and seeded grapes, and silencing of this gene promoted seed abortion in tobacco and tomato fruits [11].
In addition, an EF-hand calcium-binding protein designated VvCBP1 was reported to be related to seedlessness, and silencing of this gene in tomato reduced seed number in fruit [12]. And VvAGL11, an ortholog of the AGAMOUS-like 11 gene of Arabidopsis, was identified to be the major function candidate gene for seedlessness [13]. Moreover, the grape berry-specific basic helix-loop-helix transcription factor VvCEB1 was reported to affect cell size [14]. In addition, some MADS-box genes, which show strong homology to an Arabidopsis ovule identity gene, were found to be differentially expressed during ovule development in multiple seeded and seedless grape cultivars, indicating a potential role in ovule development [15].
Moreover, nowdays omics analysis has cast light on mechanism related to grape seedlessness, evidence showed that parthenocarpy in the grapevine seedless somatic variant Corinto bianco was caused by the absence of a mature macrogametophyte which was probably due to meiosis arrest [16]. A transcriptome analysis of ovule development in grape hybrids was used to develop a network map of possible mechanisms influencing seed size [17]. Recently, the major origin of seedless grapes was found to be associated with a missense mutation in the MADSbox gene VviAGL11 with two F1 mapping populations [18]. In general, despite all the progress made in molecular biology underlying grape seedlessness, the grape ovule identity gene and corresponding regulation mechanism was still not clear enough.
Recently, decreasing sequencing costs and increasing quality of output has facilitated the generation of large amounts of sequence data. Whole-genome resequencing combined with QTL analyses (QTL-seq) has enabled rapid identification of the genomic regions underlying specific traits [19]. For example, QTL-seq in rice successfully identified QTLs associated with seedling vigor and resistance to the fungal rice blast disease [20]. Likewise, sequencing of bulkedsegregant populations has contributed to a number of scientific studies. For example, a study combining transcriptome sequencing (RNA-seq) with BSA developed single nucleotide polymorphisms (SNPs) in polyploidy species [21], whereas another identified gene associated with disease resistance against enteric septicemia in catfish [22]. In addition, deep genome sequencing of bulkedsegregant populations identified a MIXTA-like R2R3 MYB gene influencing epidermal cell development and carotenoid pigmentation in Mimulus lewisii [23].
In this study, we exploited whole genome sequencing to identify genomic regions co-segregating with the seedless trait in bulked populations derived from a cross between a seedless and seeded cultivar. Sequence variants likely to affect gene function were identified and analyzed with respect to previously published ovule transcriptome data [17] in order to identify variants and genes potentially affecting seed size. The whole experiment design was shown in Figure 1. The results of our study will enhance the knowledge of the fundamental mechanisms of grape seed development and have potential to accelerate breeding efforts for seedless cultivars.

Data Filtering and High Quality Clean Reads
Mapped with Reference Genome The base composition and quality value before and after filtering is shown in Additional file 1: Figure S1, Additional file 2, Table S1 and Additional file 3: Table  S2. The Q20 value of all samples was >98.9% and the Q30 value 96%. The alignment rates to the reference genome for each sample, including the parental cultivars 'Red Globe' (R.G) and 'Centennial seedless' (C), and pooled seeded (S) and seedless (SL) progeny, were >95% (Table 1).

Identification and Classification of Single
Nucleotide Polymorphisms (SNPs) and Small Indels Based on Genome Re-sequencing To catalog genetic diversity in grapevine and its potential influence on gene function in the context of ovule development, we carried out whole genome resequencing of the parental cultivars 'Red Globe' and 'Centennial seedless', as well as their pooled seeded and seedless progeny. A total of 6,379,614 SNPs and 745,433 small indels were identified among these four samples. All of the identified variants were indexed based on their genomic position (Figure 2a). Most variants were found in the intergenic regions, followed by intronic, upstream, downstream and exonic regions. The least number of variants were found in the 5' UTR regions.
A total of 222,652 variants were located within exons. We classified these further based on their potential effects on gene function ( Figure 2b). We focused our attention on those variants predicted to cause amino acid changes in protein coding genes (frameshift, substitution, premature termination, and loss of termination).
Four kinds of transition and eight kinds of transversions were identified, with C to G transversion and G to A transition the most common. The ratio of transition to transversion was 0.48, which is close to the ratio of 0.5 in the ideal condition.
Gene Ontology (GO) and Pathway Analysis of

Identified DEGs Containing Functional Variants
To identify functional variants most closely related to seedlessness, we analyzed these results in the context of transcriptome data using the same progeny as plant materials, aiming to further identify functional variations which may alter the expression of the gene. Functional variations were identified from genome data and the number of corresponding associated genes are shown in Table 2. In addition, a relative comprehensive statistical analysis was conducted in combination of both genome and transcriptome data, and the number of functional variants containing genes identified as DEGs at least at one stage and at all three developmental stages were counted based on function classification of variants ( Table 2). With respect to ovule development comparing the seeded and seedless progeny at Stage 1, Stage 2 and Stage 3, a total of 2,425, 2,988 and 2,328 DEGs containing functional variants were identified, respectively (Additional file 4 Table S3). At each ovule developmental stage, the number of up-regulated DEGs (seedless progeny/seeded progeny) were found to be higher than down-regulated ones. Moreover, the number of DEGs containing functional variations were found to be highest at Stage 2, irrespective of whether the gene was up-or down-regulated.
To explore the biological processes and pathways associated with DEGs containing functional variations, we performed a GO and pathway analysis of the DEGs containing functional variants at three developmental stages. In general, results of both enriched GO and pathway were basically consistent and focused on aspects

Containing Functional Variants
We focused further attention on functional variant-containing genes that were differentially expressed in the developing ovules, between seeded and seedless cultivars, at all three ovule developmental stages ( Table 2). We estimated gene function based on their annotation, and focused on genes encoding potential disease resistance protein, protein kinase, transcription factor, cytochrome P450 and other factors affecting seed development.
Functional variants within genes encoding protein kinases were found to be nonsynonymous SNP, frameshift deletion, frameshift insertion, stopgain and stopless ( Figure 5a). The expression patterns of these DEGs were almost similar at all three ovule developmental stages, i.e., mainly up-regulated in seedless progeny compared with seeded ones. In addition, disease resistance proteins are important during plant defense responses. We found that many functional variant-containing DEGs encoded disease resistance proteins; these variants comprised nonsynonymous SNP, frameshift deletion, frameshift insertion and stopless ( Figure  5b).The expression pattern of all the disease resistance protein associated DEGs at the three development stages was nearly the same, irrespective of the functional effects of the variants, with most DEGs up-regulated in the ovules of seedless progeny compared with seeded progeny. Many DEGs containing functional variants (nonsynonymous SNP, frameshift deletion, frameshift insertion, stopgain and stopless) were identified within cytochrome P450-like genes, and most of these DEGs were up-regulated in seedless progeny compared to seeded ones at all three developmental stages. It is noteworthy that, irrespective of whether the gene was up-or down-regulated, the expression pattern of these DEGs showed no differences among all three developmental stages.
Moreover, many DEGs identified as transcription factors (TF) were found to contain functional variants including nonsynonymous SNP, frameshift deletion, stopgain and stopless ( Figure 6). These TFs were classified into 15 families, among which MYB and BHLH were the top two families with the most number of DEGs containing function variants. Expression of DEGs associated with 10 TF families (WRKY, BHLH, SBP, AP2, Myb, HB, TCP, HSF, CCAAT and GRAS) were found to be up-regulated at all three ovule developmental stages in seedless progeny compared to seeded progeny.
In contrast, an assortment of genes were down-regulated at all three ovule developmental stages in seedless progeny compared to seeded ones; these included one NAC, two MADS-box, one GATA, one bZIP, and one ERF family gene.
DEGs containing functional variants were also classified into 12 other functions associated with seed development, including heavy-metal-associated protein, cellulose-related protein, ankyrin repeat-containing protein, senescence-associated protein, hormone-related protein, calmodulin, methyltransferase, cell cycle, ABC transporter protein, thaumatin, aspartic proteinase and flavonoid synthase.  Table S4). In the peak regions, a total of 38 functional variants were identified (Additional file 7 Table S5), among which 35 variants (34 nonsynonymous SNP and 1 stopgain) were located on Chromosome 13 and were associated with 18 genes. The other 3 nonsynonymous SNP were found on Chromosome 17 and associated with 2 genes.
All in all, a total of 33 genes were identified as potential preliminary candidates based on the SNP-index analysis, among which some genes were found to contain more than one functional variant. We examined the expression pattern of these genes during ovule development in previously generated datasets derived from the same progeny. It is noteworthy that the arginine-197-to-leucine substitution on Chromosome 18 in VviAGL11 [18] was also identified in our results, but as this SNP was not performed good enough in our SNP-index analysis compared to other selected genes. And consistent with the expression pattern reported, VviAGL11 was not identified as significant DEGs according to transcriptome analysis. In consideration of all this, we did not carry further research on this gene. And nine candidate genes were further yield by the standard of being DEGs at at least one ovule developmental stage of 3 and corresponding information can be found in Table 3 and Additional file 8: Figure S3. One of these nine genes, G2,was downregulated at all three ovule developmental stages in seedless progeny compared to seeded ones. Expression of the other eight candidate genes was up-regulated at at least one ovule developmental stage in seedless progeny compared to the seeded ones.
Expression analysis of candidate genes related to seedlessness To further investigate a potential role for these nine candidate genes in ovule development and seedlessness and explore whether the different expression pattern between seeded and seedless progeny was widely applied, we analyzed their expression in different grapevine tissues ( Figure 8) and in ovules at different development stages (Figure 9) from seeded and seedless cultivars. All candidate genes were expressed in reproductive tissues (flowers, tendrils and fruits). Both the G1 and G2 genes show strong expression in reproductive tissues, compared with nutritional tissues, whereas the G8 gene was expressed more strongly in nutritional tissues (Figure 8). The expression pattern of G1, G8 and G9 genes were similar between seeded and seedless varieties, while G5, G6 and G7 genes were differentially expressed. For example, G5 expression was relatively high in fruit of 'Red Globe' and leaves of 'Thompson Seedless', while G6 and G7 were relatively highly expressed in the root of 'Thompson Seedless'.
As nine candidate genes showed different expression pattern during ovule development of seeded and seedless progeny, to further test whether this different expression pattern also exist in multiple grape cultivars, and to evaluate whether the observed cultivar-associated difference in expression pattern might be related to seedlessness, we analyzed expression of the nine candidate genes in two seeded ('Kyoho' and 'Red Globe') and two seedless cultivars ('Thompson Seedless' and 'Flame Seedless') ( Figure 9). The expression pattern of most of the candidate genes was consistent with the transcriptome data, except that some genes showed differences at specific stages, likely related to cultivar difference (Figure 9). Six of the candidate genes exhibited a consistent expression pattern at all three ovule developmental stages between cultivars: four genes (G4, G7, G8 and G9) showed higher expression in seedless cultivars compared to seeded ones, while two genes (G2 and G5) showed the opposite. In transcriptome data ( Table 3), expression of G5 gene was first up-regulated at stage 1 then down-regulated at stage 2 and 3 (seedless progeny/seeded progeny), while G5 genewas down-regulated (seedless cultivars/seeded cultivars) at all three developmental stages between multiple cultivars. Despite G5 gene showing some difference, the expression pattern of the remaining five candidate genes was consistent with transcriptome data, which indicates the widely applicability of different expression during ovule development of seeded and seeded cultivars and underscored the utility of these candidate genes underlying seed phenotype differences.

Discussion
In this study, we combined whole genome sequencing with bulk segregation analysis of seeded and seedless progeny derived from the seeded maternal parent 'Red Globe' and the seedless paternal parent 'Centennial seedless' to identify genes with potential function in ovule/seed development. These genes were further assessed in relation to published ovule transcriptome data using the same plant progeny materials. SNP-index analysis combined with expression profiles resulted in the identification of potential candidate genes containing function variants. This adds to our knowledge of grape seedlessness and may contribute to seedless grape breeding in the future.

Overall Differences of DEGs Containing
Functional Variants between Seeded and Seedless Grape Progeny DEGs containing functional variants were identified by combining data from genome sequencing and RNA-seq using the same grape progeny. Further functional analysis showed that overall differences of these DEGs between seeded and seedless grape progeny were mainly enriched in important biological processes and pathways involved in seed development, including reproductive organ development, hormone balance regulation, seed coat development, endosperm development, senescence and cell death (Figure 3 and 4). In addition, many DEGs containing functional variants were identified as protein kinases, TFs, TRs, cytochrome P450 and genes related to seed development. Protein kinases are key factors in plant biological processes such as energy metabolism and signal transduction. Protein kinases participate in hormonal signaling cascades in plants, and there have been in increasing reports of protein kinases regulating plant reproductive tissue development. In Petunia, a Pollen-Expressed Receptor-like Kinase gene (PRK1) was found to play a role in signal transduction events during pollen development and pollination [24]. In Arabidopsis thaliana, a receptor-like protein kinase, HAESA, was reported to control floral organ abscission [25], and GSK3/Shaggy like kinase AtSK3-2 was found to play an important role in the control of cell elongation during flower development [26]. Based on the fact that plant protein kinases participate in the regulation of reproductive tissue development, we infer that DEGs containing functional variants, identified as protein kinases, may be related to phenotypic differences of ovules between seeded and seedless progeny.
Other factors influencing seed development include cytochrome P450s, ankyrin repeat-containing proteins, calmodulins, ABC transporter proteins and so on. In plants, ankyrin repeat-containing proteins mediate protein-protein interactions to regulate signal transduction related to development and plant defense response. In lily, an ankyrin repeat-containing protein characterized as a ubiquitin ligase was found to be required for pollen germination and pollen tube growth [27]. Moreover, plant calcium ions can influence pollen tube growth, and the VvCBP1 gene encoding an EF-hand calcium-binding protein was reported to be associated with fruit seedlessness of grapevine [12]. Other factors such as ABC transporter proteins have also been reported to influence plant reproductive tissue development. For example, an ATP-binding cassette transporter from rice, OsABCG15, was found to be required for anther development and pollen fertility [28]. Because the DEGs containing functional variants identified in this study represented these various functional classes, we propose that they may participate in biological processes and pathways involved in seed development and influence the seedless trait.

Some DEGs Containing Functional Variants were
Identified as Candidates associated with Grape Seed Size by QTL-Seq Analysis Based on the potential function of candidate genes, we further carried out gene expression analysis during ovule development among multiple cultivars. The results, in combination with both gene annotation and previous related functional studies, suggest that three genes in particular, G2, G4 and G8, may be determinants of grape seedlessness.
In Arabidopsis, the AUXIN RESPONSE FACTOR 2 ( ARF2) gene links auxin signalling, cell division, and the size of seeds and other organs [29]. As auxin regulators, ARFs can interact with AUX/IAA proteins to repress the content of auxin and regulate corresponding subsequent biological processes. The grapevine VvCEB1 gene, which was reported to influence cell size, functions by influencing auxin metabolism and cell signaling [14]. Prior to fertilization, ovary growth is blocked by the repressive action of AUX/IAA-ARF complexes on auxin-responsive genes. With pollination and fertilization, the level of auxin in ovules or ovary increases and derepresses ovary growth, which is necessary for fruit initiation [30]. ARFs function to guarantee the corresponding biological process proceeds in an orderly manner to ensure seed viability. For example, in Arabidopsis and tomato, mutation of ARF8 and SlARF7 both lead to seedlessness [30,31]. In our results, the G2 gene, identified as encoding a putative auxin response factor, was differently expressed in ovules between seeded and seedless progeny, and this expression pattern was also consistent in ovules of multiple other cultivars. In view of the reported function of ARFs, this implies that G2 may be a key gene related to seedlessness.
In addition, the G4 was annotated as a putative cytochrome P450 monooxygenase. This class of gene is known to participate in the balance regulation of GA and auxin.
In rice, ELONGATED UPPERMOST INTERNODE ( EUI) encodes a cytochrome P450 monooxygenase that epoxidizes gibberellins to reduce their concentration [32]. The Arabidopsis cytochrome P450 monooxygenasegene 71A13 and YUCCA were reported to be involved in auxin synthesis [33,34]. These studies indicate that the influence of monooxygenases on plant growth and development is via regulating GA and auxin content. In the seedless vs. seeded progeny populations used for transcriptional profiling, the content of both GA and auxin was significantly higher in seedless progeny compared to seeded ones during ovule development [17]. Also, overexpression of a gene encoding a cytochrome P450, CYP78A9, caused large and seedless Arabidopsis fruit [35]. In view of these reports, we speculate that the content of GA and auxin is critical for seed formation, and that G4 is a key gene associated with seedlessness. Moreover, the G8 gene was annotated as a putative Aluminum-activated Malate Transporter (ALMT). Most studies about ALMT demonstrated its important function in malic acid transport [36]. In apple, a natural mutation leading to truncation of the ALMT protein at the Ma locus was reported to be associated with low fruit acidity [37], but to our knowledge, there are no reports of ALMT related to seed development. Our results show that G8 is found within the genomic interval identified by QTL-seq analysis as influencing seedlessness, that the expression of the G8 gene is significantly different during ovule development in multiple seedless vs seeded cultivars. These results implicate G8, and ALMT activity as key factors influencing seedlessness, and thus G8 is an excellent candidate for further functional studies.

Conclusions
In summary, genome sequencing analysis in combination with transcriptome data was conducted using grape progeny derived from the seeded maternal parent 'Red Globe' and the seedless paternal parent 'Centennial seedless'. All DEGs containing functional variants were identified, so as relate biological processes and pathways.
Based on SNP index and expression analysis, a total of three genes in the QTL region were identified as candidates related to grape seedlessness, which provides valuable candidates for further functional research and adds our knowledge for seedless grape breeding.

Plant Material
A total of 114 F1 individuals were obtained from a cross between the seeded maternal parent 'Red Globe' (V. vinifera) and the seedless paternal parent The resulting raw Illumina reads were processed to remove duplicated read pairs (defined as those having identical bases in the first 100 bp of both left and right reads). Illumina adaptor sequences and low quality sequences were trimmed from the reads using Trimmomatic [38]. Read pairs with read < 40 bp were excluded from further analysis. Resulting paired-end reads were mapped to a grape reference genome (PN40024; 12X) [39] using BWA [40]. Only reads that mapped uniquely (i.e. having one single best hit) to the genome were used for SNP calling. SNPs and small indels (1-5 bp) were identified based on the alignment information in the pileup format generated by SAMtools [41].
DEGs containing nonsynonymous SNPs were identified using published transcriptome data (NCBI SRA accession SRP081137), previously generated from the same progeny populations [17]. GO and pathway enrichment analysis were performed using Plant MetGenMAP [42]. The raw sequencing data has been deposited in NBCI SRA under the accession number SRP174903.

QTL Analysis of Seed Size based on SNP Index
A SNP index was calculated for each SNP, and sliding window analysis was applied with a 2 Mb window size and 100 kb increment. The average SNP index within the window was used for sliding window plot drawing, and SNPs with a number less than 10 were excluded. To filter spurious SNPs called due to sequencing and/or alignment errors, SNP positions with SNP-index of <0.3 and read depth <7 were also excluded.
A △(SNP-index), derived from SNP-index for each bulked pool, was further calculated [20]. The QTL region associated with seed size was then identified based on an arbitrary threshold of the top 0.5% of △(SNP-index) [43].

Quantitative RT-PCR Analysis
Quantitative real-time RT-PCR was carried out using a Bio-Rad iQ5 thermocycler

Availability of data and material
The raw sequencing data has been deposited in NBCI SRA under the accession number SRP174903. The datasets supporting the conclusions of this article are included within the article and its additional files.

Competing interests
The authors declare that they have no competing interests.   Overview of the experimental design used in this study. Scale bars are 1 cm.  Real-time quantitative PCR expression levels of candidate genes containing functional varian

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download.
Supplemental files.rar