QTL-seq Analysis of Seed Size Trait in Grape Provides New Molecular Insight on Seedlessness

Background: Seedlessness in grape (Vitis vinifera) is an important commercial trait for both the fresh and drying markets. However, despite numerous studies, the mechanisms and key genes regulating grape seedlessness are mostly unknown. Results: In this study, we sequenced the genomes of the V. vinifera seeded cultivar ‘Red Globe’, the seedless cultivar ‘Centennial’, as well as the derived hybrids. Nonsynonymous SNPs were identied by genome sequencing and analyzed with published transcriptome data. Nonsynonymous SNPs were found in genes which were differential expressed during seeded and seedless grape ovule development, and corresponding Gene Ontology and pathway enrichment were conducted. A potential QTL region associated with seed size was characterized based on SNP-index for both seedless and seeded progeny. Expression analysis of SNP associated candidate genes during ovule development in multiple seeded and seedless grape cultivars were conducted and 3 SNP were further subjected to SNaPshot analysis in 40 progeny for validation. Conclusion: In summary, nonsynonymous SNPs happened in genes related seed development, which identied to be protein kinase, transcription factors, cytochrome P450 and showed differential expression during seeded and seedless grape ovule development. These nonsynonymous SNP associated genes were mainly involved in biological processes like hormone balance, seed coat and endosperm development, reproductive organ development, oxidation and reduction, senescence and cell death. Based on SNP-index and expression pattern analysis, candidate genes in the QTL region were identied and a SNP in G8 showed 67.5% eciency in the grape progeny validation. Overall, the data cast light on the differences of seed development between seeded and seedless progeny in genomic level, which provides valuable resources for future functional study and grape breeding.


Background
Seedlessness in grape is an important focus of research, since seedless fruit are preferred for fresh consumption and processing. Based on distinctions related to pollination and fruit development, seedlessness in grape can be classi ed as parthenocarpy, where fertilization is absent, or stenospermocarpy, characterized by normal fertilization with subsequent abortion [1,2]. Most commercial seedless grape cultivars are stenospermocarpic, and since this trait is heritable which can pass hereditary feature to the next generation and usually used as parent in breeding, such as the typical seedless cultivar 'Thompson Seedless'.
Over the past several decades, signi cant progress has been made in characterizing the genetic mechanisms underlying grape seedlessness. On one hand, research has been focused on mapping and identi cation of genetic loci in uencing this trait, aiming to develop molecular markers to increase the speed and e ciency of seedless grape breeding. Seedlessness has been reported to be a quantitative trait mapping for at least seven loci [3][4][5][6], comprising one major-effect locus and several complementary recessive loci [4]. One Random Ampli ed Polymorphic DNA (RAPD) marker, UBC-269500, was found to be linked to a major seedlessness locus using bulked segregant analysis (BSA) [7]. Two RAPD markers were obtained and tightly linked to the dominant I gene, and the closest marker was further developed into a SCAR (Sequence Characterized Ampli ed Region) marker, designated SCC8 [8,9]. In addition, another SCAR marker, SCF27, was identi ed [10]. Grape seedlessness was further studied using QTL (Quantitative Trait Locus) analyses, and a major QTL VMC7F2 underlying seed fresh weight was developed [3]. On the other hand, an assortment of candidate genes associated with grape seed development has also been identi ed. A gene named Chloroplast Chaperonin 21 (ch-Cpn21) was found to be differentially expressed in the in orescence of seedless and seeded grapes, and silencing of this gene promoted seed abortion in tobacco and tomato fruits [11].
In addition, an EF-hand calcium-binding protein designated VvCBP1 was reported to be related to seedlessness and silencing of this gene in tomato reduced seed number in fruit [12]. And VvAGL11, an ortholog of the AGAMOUS-like 11 gene of Arabidopsis, played an important role in stenospermocarpic seedlessness in grapevine [13]. Moreover, the grape berry-speci c basic helix-loop-helix transcription factor VvCEB1 was reported to affect cell size [14]. In addition, some MADS-box genes, which show strong homology to an Arabidopsis ovule identity gene, were found to be differentially expressed during ovule development in multiple seeded and seedless grape cultivars, indicating a potential role in ovule development [15]. Moreover, nowdays omics analysis has cast light on mechanism related to grape seedlessness, evidence showed that parthenocarpy in the grapevine seedless somatic variant Corinto bianco was caused by the absence of a mature macrogametophyte which was probably due to meiosis arrest [16]. A transcriptome analysis of ovule development in grape hybrids was used to develop a network map of possible mechanisms in uencing seed size [17]. Recently, the major origin of seedless grapes was found to be associated with a missense mutation in the MADS-box gene VviAGL11 with two F1 mapping populations [18]. In general, despite all the progress made in molecular biology underlying grape seedlessness, the grape ovule identity gene and corresponding regulation mechanism was still not clear enough.
Recently, decreasing sequencing costs and increasing quality of output has facilitated the generation of large amounts of sequence data. Whole-genome resequencing combined with QTL analyses (QTL-seq) has enabled rapid identi cation of the genomic regions underlying speci c traits [19]. For example, QTL-seq in rice successfully identi ed QTLs associated with seedling vigor and resistance to the fungal rice blast disease [20]. Likewise, sequencing of bulked-segregant populations has contributed to a number of scienti c studies. For example, a study combining transcriptome sequencing (RNA-seq) with BSA developed single nucleotide polymorphisms (SNPs) in polyploidy species [21], whereas another identi ed gene associated with disease resistance against enteric septicemia in cat sh [22]. In addition, deep genome sequencing of bulked-segregant populations identi ed a MIXTA-like R2R3 MYB gene in uencing epidermal cell development and carotenoid pigmentation in Mimulus lewisii [23].
In this study, we exploited whole genome sequencing to identify genomic regions co-segregating with the seedless trait in bulked populations derived from a cross between a seedless and seeded cultivar. Variants were identi ed by genome sequencing and analyzed with respect to previously published ovule transcriptome data [17] in order to identify variants and genes potentially affecting seed size. The whole experiment design is shown in Figure 1. The results of our study will enhance the knowledge of the fundamental mechanisms of grape seed development and have potential to accelerate breeding efforts for seedless cultivars.

Data Filtering and High Quality Clean Reads Mapped with Reference Genome
The base composition and quality value before and after ltering is shown in Additional le 1: Figure S1, Additional le 2: Table S1, and Additional le 3: Table S2. The Q20 value of all samples was >98.9% and the Q30 value 96%. The alignment rates to the reference genome for each sample, including the parental cultivars 'Red Globe' (R.G) and 'Centennial seedless' (C), and pooled seeded (S) and seedless (SL) progeny, were >95% (Table 1). The depth of sequencing coverage is 16X in 'C', 19X in 'R', 15X in 'SL', and 17X in 'S'. To catalog genetic diversity in grapevine and its potential in uence on ovule development, we carried out whole genome resequencing of the parental cultivars ('Red Globe' and 'Centennial seedless'), and their pooled seeded and seedless progeny. A total of 6,379,614 SNPs and 745,433 small indels were identi ed. All of the identi ed variants were indexed based on their genomic position ( Figure 2a). Most of the variants were found in the intergenic regions, followed by intronic, upstream, downstream and exonic regions. The least number of variants were found in the 5' UTR regions. A total of 222,652 variants were located within exons. These were further classi ed based on their potential effects on gene function (Figure 2b). And more attention was given to those variants predicted to cause amino acid changes in protein coding genes (frameshift, substitution, premature termination, and loss of termination). Four kinds of transitions and eight kinds of transversions were characterized, with C to G transversion and G to A transition the most common. The ratio of transition to transversion was 0.48, which is close to the ideal ratio of 0.5.
Gene Ontology (GO) and Pathway Analysis of Functional Variants associated DEGs Functional variations were identi ed from genome data and the number of genes containing these variants are shown in Table 2. In addition, a relative comprehensive statistical analysis was conducted in combination of both genome and transcriptome data, which used the same progeny population, and the number of functional variants containing genes identi ed as DEGs at least at one stage and at all three developmental stages were counted based on function classi cation of variants (Table 2). With respect to ovule development comparing the seeded and the seedless progeny at Stage 1, Stage 2 and Stage 3, a total of 2,425, 2,988 and 2,328 functional variants containing genes showed signi cant differential expression pattern during seeded and seedless progeny ovule development, respectively (Additional le 4 Table S3). At each ovule developmental stage, the number of up-regulated DEGs (seedless progeny/seeded progeny) were found to be higher than down-regulated ones. Moreover, the number of DEGs containing functional variations were found to be highest at Stage 2, irrespective of whether the gene was up-or down-regulated. To explore the biological processes and pathways in which the genes with functional variations involved, we performed a GO and pathway analysis of the DEGs containing functional variants at three developmental stages. In general, results of both enriched GO and pathway were basically consistent and focused on aspects involved in seed development. A total of 162 GO terms were found to be signi cantly enriched at all three developmental stages ( Figure 3a). The GO terms associated with up-regulated DEGs were mainly involved in biological processes like 'plant defense', 'regulation of reproductive development', 'regulation of hormone balance', 'seed coat development', 'endosperm and seed development', and 'cell death' (Figure 3b). On the other hand, GO terms associated with down-regulated DEGs were mainly related to 'lignin biosynthetic process', 'secondary metabolic process', 'oxidation reduction' and 'iron ion transport' (Figure 3b). A total of 124 pathways were found to be enriched at all three ovule developmental stages ( Figure   4a), and these enriched pathways were mainly related to the regulation of hormone balance ('cytokinins degradation', 'cis-zeatin biosynthesis', 'jasmonic acid biosynthesis' and 'ethylene biosynthesis from methionine'), seed coat development ( 'cellulose biosynthesis' and 'suberin biosynthesis'), secondary metabolic biosynthesis (' avonoid biosynthesis') and oxidation-reduction ('glutathione-mediated detoxi cation') ( Figure 4b).

Functional and Expression Analysis of DEGs Containing Functional Variants
We focused further attention on functional variant-containing genes that were differentially expressed in the developing ovules, between seeded and seedless cultivars, at all three ovule developmental stages (Table 2). We estimated gene function based on their annotation, and focused on genes encoding potential disease resistance protein, protein kinase, transcription factor, cytochrome P450 and other factors affecting seed development.
Nonsynonymous SNP, frameshift deletion, frameshift insertion, stopgain and stopless were found to happen in genes encoding protein kinases (Figure 5a). The expression patterns of these DEGs with functional variants were almost similar at all three ovule developmental stages, i.e., mainly up-regulated in seedless progeny compared with seeded ones. We observed that functional variant (nonsynonymous SNP, frameshift deletion, frameshift insertion and stopless) happened in DEGs encoding disease resistance proteins( Figure 5b). Among which, the expression pattern of all the disease resistance related DEGs was almost same during three developmental stages, and most of the DEGs were up-regulated in the ovules of seedless progeny compared with seeded progeny. Functional variants (nonsynonymous SNP, frameshift deletion, frameshift insertion, stopgain and stopless) were also found to happen in DEGs encoding cytochrome P450-like proteins, and most of these DEGs were up-regulated in seedless progeny compared to seeded ones at all three developmental stages. It is noteworthy that, the expression pattern of these DEGs was almost the same during all developmental stages.
Moreover, many functional variants including nonsynonymous SNP, frameshift deletion, stopgain and stopless happen in DEGs identi ed as transcription factors (TF)( Figure 6). These TFs were classi ed into 15 families, among which MYB and BHLH were families with the most number of DEGs havining function variants. Expression of DEGs associated with 10 TF families (WRKY, BHLH, SBP, AP2, Myb, HB, TCP, HSF, CCAAT, and GRAS) were found to be up-regulated at all three ovule developmental stages in seedless progeny compared to seeded progeny. In contrast, a group of genes was down-regulated at all three ovule developmental stages in seedless progeny compared to seeded ones; these included one gene from each family of NAC, GATA, bZIP, ERF and two genes from MADS-box family.
Functional variants also happened in DEGs related to seed development, including heavy-metal-associated protein, cellulose-related protein, ankyrin repeat-containing protein, senescence-associated protein, hormone-related protein, calmodulin, methyltransferase, cell cycle, ABC transporter protein, thaumatin, aspartic proteinase and avonoid synthase. Expression pattern of these DEGs is shown in Figure 7. Two kinds of functional variants were identi ed in heavy-metal-associated protein (nonsynonymous SNP and frameshift insertion) and methyltransferase (nonsynonymous SNP and frameshift deletion), while only one kind of function variant  Table S4). In the peak regions, a total of 38 functional variants were identi ed (Additional le 7 Table S5), among which 35 variants (34 nonsynonymous SNP and 1 stopgain) were located on Chromosome 13 and were associated with 18 genes. The other 3 nonsynonymous SNP were found on Chromosome 17 and associated with 2 genes.
In general, based on the SNP-index analysis, a total of 33 genes were identi ed as potential preliminary candidates containing functional variant and some of them have more than one functional variant. We generated transcriptomics data using ovules from progeny of the same hybrid population as the genomic sample and examined the expression pattern of these genes during three different stages of ovule development. We de ned DEGs in at least one of the three stages of ovule development with functional variant as the standard and nine candidate genes were further analyzed in this way and corresponding information can be found in Table 3 and Additional le 8: Figure S3. Among nine selected genes, G2, was down-regulated during all ovule developmental stages in the seedless progeny. Whereas, the expression of the other eight candidate genes was up-regulated at least during one ovule developmental stage in seedless progeny compared to the seeded ones. It is notable that the arginine-197-to-leucine substitution on Chromosome 18 in VviAGL11 [18] was also identi ed in our results, however, compared with other genes, this gene did not perform well in our SNP-index analysis, so not listed as a key candidate gene for further analysis. According to our transcriptomic analysis, VviAGL11 is not identi ed as DEGs, and our results are consistent with the previous ndings, so we have not considered this gene for further research. Motif analysis of the mutation region was carried out, and results showed that mutation in G1 happened in 'ML domain', G2 and G3 in 'Low complexity region', and mutation in G6 caused stopgain (Additional le 9: Table S6). To further explore the potential roles of nine candidate genes in ovule development or abortion, their expression patterns were observed in different grapevine tissues ( Figure 8) and in ovules at different development stages (Figure 9) from seeded and seedless cultivars. All candidate genes were expressed in reproductive tissues ( owers, tendrils and fruits). Both the G1 and G2 genes showed strong expression in reproductive tissues, compared with nutritional tissues, whereas the G8 gene was expressed more strongly in nutritional tissues (Figure 8). The expression pattern of G1, G8, and G9 genes were similar between seeded and seedless varieties, while G5, G6 and G7 genes were differentially expressed. For example, G5 expression was relatively high in fruit of 'Red Globe' and leaves of 'Thompson Seedless', while G6 and G7 were relatively highly expressed in the root of 'Thompson Seedless'.
To further justify and validate our results, we analyzed expression of the nine candidate genes in two seeded ('Kyoho' and 'Red Globe') and two seedless cultivars ('Thompson Seedless' and 'Flame Seedless') ( Figure 9). The expression pattern of most of the candidate genes was consistent with the transcriptome data, except that some genes showed differences at speci c stages, likely related to cultivar difference ( Figure 9). Six of the candidate genes exhibited a consistent expression pattern at all three ovule developmental stages between cultivars: four genes (G4, G7, G8 and G9) showed higher expression in seedless cultivars compared to seeded ones, while two genes (G2 and G5) showed the opposite results. In transcriptome data ( Table 3), expression of G5 gene was rst up-regulated at stage 1 then down-regulated at stage 2 and 3 (seedless progeny/seeded progeny), while G5 gene was downregulated (seedless cultivars/seeded cultivars) at all three developmental stages in multiple cultivars. Despite G5 gene showing some difference, the expression pattern of the remaining ve candidate genes was consistent with transcriptome data, which indicates these candidate genes may be related to seed phenotype differences.
SNaPshot analysis of 5 SNP in 40 grape progeny A total of 3 SNPs in candidate genes (G2, G4, and G8) were selected to test our genome-sequencing results in 20 seeded and 20 seedless progeny (Table 4). Results showed that SNP in G4 failed in distinguish seed phenotype, while SNP in G2 and G8 showed 30% and 67.5% e ciency in identifying seeded and seedless progeny, respectively. And SNP in G8 may be important and close related to grape seed development.

Discussion
In this study, we combined whole genome sequencing with bulk segregation analysis of seeded and seedless progeny derived from the seeded maternal parent 'Red Globe' and the seedless paternal parent 'Centennial seedless' to identify genes with potential function in ovule/seed development. These genes were further assessed in relation to published ovule transcriptome data using the same plant progeny materials. SNP-index analysis combined with expression pro les resulted in the identi cation of potential candidate genes containing function variants. And SNaPshot analysis were conducted for further validation. This adds to the knowledge of grape seedlessness in the genomic level and will provide new molecular resources for seedless grape breeding in the future.

Overall Differences of DEGs in which Functional Variants Happened
DEGs in which functional variants happened were identi ed by combining data from genome sequencing and RNA-seq using the same grape progeny. Further functional analysis showed that overall differences of these DEGs between seeded and seedless grape progeny were mainly enriched in important biological processes and pathways involved in seed development, including reproductive organ development, hormone balance regulation, seed coat development, endosperm development, senescence and cell death (Figure 3 and 4). In addition, many DEGs containing functional variants were identi ed as protein kinases, TFs, TRs, cytochrome P450 and genes related to seed development. Protein kinases are key factors in plant biological processes such as energy metabolism and signal transduction. Protein kinases participate in hormonal signaling cascades in plants, and there have been in increasing reports of protein kinases regulating plant reproductive tissue development. In Petunia, a Pollen-Expressed Receptor-like Kinase gene (PRK1) was found to play a role in signal transduction events during pollen development and pollination [24]. In Arabidopsis thaliana, a receptor-like protein kinase, HAESA, was reported to control oral organ abscission [25], and GSK3/Shaggy like kinase AtSK3-2 was found to play an important role in the control of cell elongation during ower development [26]. Based on the fact that plant protein kinases participate in the regulation of reproductive tissue development, we infer that DEGs containing functional variants, identi ed as protein kinases, may be related to phenotypic differences of ovules between seeded and seedless progeny.
Other factors in uencing seed development include cytochrome P450s, ankyrin repeat-containing proteins, calmodulins, ABC transporter proteins and so on. In plants, ankyrin repeat-containing proteins mediate protein-protein interactions to regulate signal transduction related to development and plant defense response. In lily, an ankyrin repeat-containing protein characterized as a ubiquitin ligase was found to be required for pollen germination and pollen tube growth [27]. Moreover, plant calcium ions can in uence pollen tube growth, and the VvCBP1 gene encoding an EF-hand calcium-binding protein was reported to be associated with fruit seedlessness of grapevine [12]. Other factors such as ABC transporter proteins have also been reported to in uence plant reproductive tissue development. For example, an ATP-binding cassette transporter from rice, OsABCG15, was found to be required for anther development and pollen fertility [28]. As the DEGs containing functional variants represent various functional classes, we surmise that they may have a role in different biological processes and pathways involved in seed development or ovule abortion.

Some DEGs Containing Functional Variants were Identi ed as Candidates associated with Grape Seed Size by QTL-Seq Analysis
Based on the potential function of candidate genes, we further carried out gene expression analysis during ovule development among multiple cultivars. The results, in combination with both gene annotation and previous related functional studies, suggest that three genes in particular, G2, G4 and G8, may be determinants of grape seedlessness.
In Arabidopsis, the AUXIN RESPONSE FACTOR 2 (ARF2) gene links auxin signalling, cell division, and the size of seeds and other organs [29]. As auxin regulators, ARFs can interact with AUX/IAA proteins to repress the content of auxin and regulate corresponding subsequent biological processes. The grapevine VvCEB1 gene, which was reported to in uence cell size, functions by in uencing auxin metabolism and cell signaling [14]. Prior to fertilization, ovary growth is blocked by the repressive action of AUX/IAA-ARF complexes on auxin-responsive genes. After the pollination and fertilization, the level of auxin in ovules or ovary increases and derepresses ovary growth, which is necessary for fruit initiation [30]. ARFs function to guarantee the corresponding biological process proceeds in an orderly manner to ensure seed viability. For example, in Arabidopsis and tomato, mutation of ARF8 and SlARF7 both lead to seedlessness [30,31]. In our results, the G2 gene, identi ed as encoding a putative auxin response factor, was differently expressed in ovules between seeded and seedless progeny, and this expression pattern was also consistent in ovules of multiple cultivars.Furthermore, the G8 gene was annotated as a putative Aluminum-activated Malate Transporter (ALMT). Previously, ALMT has been well studied for its important function in malic acid transport [32]. In apple, a natural mutation leading to truncation of the ALMT protein at the Ma locus was reported to be associated with low fruit acidity [33], but to our knowledge, ALMT has not been explored for its role in seed development. According to our results, G8 is found within the genomic interval identi ed by QTL-seq analysis in uencing seedlessness. Moreover, the expression of the G8 gene is signi cantly different during ovule developmental stages in seedless and seeded cultivars. And during 40 grape progeny, the SNP in G8 showed 67.5% e ciency in distinguish seeded and seedless progeny. These results implicate that, G8 and ALMT genes may function as key factors in uencing seedlessness and the SNP in G8 may close to grape seed development.

Conclusions
In summary, QTL-seq analysis in combination with transcriptome data was conducted using grape progeny derived from the seeded maternal parent 'Red Globe' and the seedless paternal parent 'Centennial seedless'. Many functional variants happened in genes related to seed development, which involved in biological processes and pathways like hormone balance, seed coat and endosperm development, reproductive organ development, senescence and cell death. In combination with transcriptome data, many functional variants associated genes were DEGs during ovule developmet between seeded and seedless progeny. Based on SNP-index and expression analysis, genes in the QTL region were identi ed as candidates and a SNP in G8 may be close related to grape seedlessness, which provides valuable candidates for further functional research and adds our knowledge for seedless grape breeding.

Plant Material
A total of 114 F1 individuals were obtained from a cross between the seeded maternal parent 'Red Globe' (V. vinifera) and the seedless paternal parent 'Centennial seedless' (V. vinifera) in 2009, which is the same population used in a ovule transcriptome analysis [17]. All seedling plants were developed and maintained at the Zhengzhou Fruit Institute, Zhengzhou, China (113°42'N 34°43'E), and seed phenotypes were evaluated for three years. A total of 65 progeny with stable seed phenotype were obtained, including 31 seedless and 34 seeded. Of these, 19 seeded and 19 seedless individuals were randomly selected as experimental materials for genome sequencing. All the plant materials used and collected in this work comply with China's guidelines and legislation.
Sample Collection and Genomic DNA Library Construction, Sequencing and Data Analysis For expression analysis, plant structures (young roots, stems, leaves, tendrils, owers at the fully opening stage, and fruit at 30 days post-anthesis) were collected from 'Red Globe' and 'Thompson Seedless' seedlings. Ovules were dissected from two seeded ('Kyoho' and 'Red Globe') and two seedless ('Thompson Seedless' and 'Flame Seedless') cultivars at different developmental stages and weighed, as described previously [15]. All plants were identi ed and maintained in the grape germplasm resource orchard of Northwest Agriculture & Forestry University, Yangling, China (34°20′N 108°24′E). Ovules from seedless and seeded grapes at 24,27,30,33,36,39 and 42 DAF were collected. At each stage, one hundred ovules were selected randomly for average ovule weight measurements, and three key stages were de ned based on seedless grape ovule weight change (initial stage, stage with the highest average weight, and stage with the lowest average weight) were chosen for each seedless grape as done previously [17]. Samples were immediately frozen in liquid nitrogen and stored at −80 °C for subsequent gene expression analysis. And the plant materials used and collected in this work comply with China's guidelines and legislation.
For genomic sequencing, young leaves from 19 seedless or 19 seeded progeny were collected and pooled. DNA was extracted from each pooled leaf sample, using equivalent mass of sample from each pool. Sequencing libraries for the bulk and the parents were prepared using the Genomic DNA Sample Prep kit (Illumina, San Diego, CA) according to the manufacturer's protocol. The libraries were sequenced using an Illumina HiSeq 2500 system for 151 cycles, operating in the paired-end mode. The resulting raw Illumina reads were processed to remove duplicated read pairs (de ned as those having identical bases in the rst 100 bp of both left and right reads). Illumina adaptor sequences and low quality sequences were trimmed from the reads using Trimmomatic [34]. Read pairs with read < 40 bp were excluded from further analysis. Resulting paired-end reads were mapped to a grape reference genome (PN40024; 12X) [335] using BWA [36]. Only reads that mapped uniquely (i.e. having one single best hit) to the genome were used for SNP calling. SNPs and small indels (1-5 bp) were identi ed based on the alignment information in the pileup format generated by SAMtools [37].
DEGs containing nonsynonymous SNPs were identi ed using published transcriptome data (NCBI SRA accession SRP081137), previously generated from the same progeny populations [17]. GO and pathway enrichment analysis were performed using Plant MetGenMAP [38]. The raw sequencing data has been deposited in NBCI SRA under the accession number SRP174903.

QTL Analysis of Seed Size based on SNP Index
By aligning the sequence data of 'Seeded' and 'Seedless' DNA bulks to the reference sequence of one parent cultivar 'Red Globe', we counted the number (k) of short reads harboring SNPs that are different from the reference sequence. We de ned the proportion of k in the total short reads (n) covering a particular genomic position (=k n -1 ) as the SNP-index [20]. A SNP index was calculated for each SNP, and sliding window analysis was applied with a 2 Mb window size and 100 kb increment. The average SNP index within the window was used for sliding window plot drawing, and SNPs with a number less than 10 were excluded. To lter spurious SNPs called due to sequencing and/or alignment errors, SNP positions with SNP-index of <0.3 and read depth <7 were also excluded. A △(SNP-index), derived from SNP-index for each bulked pool, was further calculated [20]. The QTL region associated with seed size was then identi ed based on an arbitrary threshold of the top 0.5% of △(SNP-index) [39].

Quantitative RT-PCR Analysis
Quantitative real-time RT-PCR was carried out using a Bio-Rad iQ5 thermocycler (Bio-Rad, Hercules, CA, USA). For each sample, 1 µg of total RNA was converted into cDNA using PrimeScript™ RTase and an oligo dT primer (TaKaRa Biotechnology, Dalian, China) and was subsequently diluted 6-fold with sterile water. The grapevine Actin1 gene (GenBank Accession number AY680701) which was ampli ed with the primers F (5'-GATTCTGGTGATGGTGTGAGT-3') and R (5'-GACAATTTCCCGTTCAGCAGT-3') was used as reference gene. Gene-speci c primers can be found in Additional le 10: Table S7. For each reaction, 100 ng cDNA was used and carried out in triplicate with a reaction volume of 20 µl containing 0.8 µl each primer (1.0 µM), 1.0 µl of cDNA, 10 µl of SYBR green (TaKaRa Bio Inc.), and 7.4 µl sterile distilled water. The ampli cation parameters were 95°C for 30 s, followed by 40 cycles of 95°C for 5 s and 60°C for 30 s. Melting-curve analyses were performed with initial incubation at 95°C for 15 s and then a constant increase from 60°C to 95°C. ACTIN1 was used as a reference gene. Relative expression levels were analyzed using the IQ5 software and the normalized-expression method [40].

SNaPshot analysis
Speci c primers were designed for each SNP and details can be found in Additional le 11: Table S8. PCR ampli cation was performed using a 15 µL reaction system, including 1.5 µL MgCl 2 (25mmol), 1.5µL 10×PCR Buffer (Mg 2+ Free), 0.3µL dNTP (10 mmol·L -1 ), mixed primer 0.3µL, template DNA 1 µL, 0.3 µL Platinum Taq (5 U), and 10.1µL ddH 2 O. The PCR reaction procedure was: pre-denaturation at 94℃ for 3 min, denaturation at 94℃ for 15s, annealing at 55℃ for 15s, elongation at 72℃ for 30 s for 30 cycles, and elongation at 72℃ for 3 min. PCR product puri cation and extension reaction were carried out by referring to ABI SNaPshot technology. 1 µL of the extension product mix with 9 µL of the HIDI was taken, denaturation at 95°C for 3min was followed by immediate ice bath and genotyping was performed using a sequencer [41]. The SNaPshot Multiplex Kit was purchased from ABI company, Platinum® Taq DNA polymerase was purchased from Invitrogen, SAP and ExoI enzymes from Fermentas.

Consent for publication
Not applicable.

Availability of data and material
The raw sequencing data has been deposited in NBCI SRA under the accession number SRP174903. The datasets supporting the conclusions of this article are included within the article and its additional les.

Competing interests
The authors declare that they have no competing interests.