Genetic diversity, population structure and ancestry components
The first two components of the PCA explained 8.8% of the genotypic variation in 149 accessions (Fig. 1A). The first component showed a clear separation between Croatian V. sylvestris and locally grown V. vinifera along the x-axis of the bidimensional plot. One single spontaneous accession (Sy 10) from the Modro jezero population laid in an intermediate position along PC1 nearby widespread cultivars with known high European sylvestris ancestry such as ‘Cabernet Sauvignon’, ‘Chasselas Blanc’ and ‘Welschriesling’, the latter two of which are locally grown in Croatia under the synonymous of ‘Plemenka bijela’ and ‘Graševina’, and the Dalmatian cultivar ‘Divjaka’. It is therefore possible that Sy 10 is either an abandoned local variety or a seedling that were involuntarily introduced into the anthropic environment of Modro jezero or a feral accession deriving from crop-to-wild gene flow in the once wild population of Modro jezero. The cultivated germplasm in Croatia laid in a confined part of the bidimensional space that is delimited by well-known grapevines, which are cultivated in Croatia and beyond in Europe. The few exceptions are represented by three table grapes ‘Krivaja Bijela’, ‘Krivaja Crvena’ and ‘Mijajuša’, the latter of which is a synonym of the popular Greek/Levantine cultivar ‘Xeromachairouda’ [41] also known as ‘Asswad Karech’ [42]. In order to rescale the diversity in Croatia relative to the global grapevine diversity, we generated a PCA using 224,695 common variant sites between the sample set of this paper and a WGS diversity panel. The bidimensional space in Fig. 1B shows that grapevine diversity in Croatia is relatively rich, local sylvestris diversity and diversification are relatively small and low compared to other European wild populations, and Croatian natural sites may contain genuine sylvestris and sylvestris-vinifera admixed individuals. A number of individuals from all but one natural site laid, indeed, midway in the bidimensional space between the sylvestris and sativa distributions and colocalized with KE-06, a known sylvestris-vinifera hybrid from a German natural site.
To substantiate the findings inferred from the PCA, we analysed population structure using a model-based clustering approach that simultaneously estimates population allele frequencies along with ancestry proportions, implemented in the ADMIXTURE software [34]. The cross-validation error indicated that three ancestry components (K = 3) contributed most likely to the genome composition of the analysed germplasm (Supplementary Table S3). According to ADMIXTURE, one ancestry component, which we refer hereafter to as sylvestris, is predominant in natural sites and is present as a minor contributor (< 0.2) in a few local cultivars (Fig. 2). ADMIXTURE confirmed that ‘Cabernet Sauvignon’, ‘Plemenka Bijela’, ‘Graševina’, which is widely grown in Continental Croatia, and ‘Divjaka’, which is grown exclusively in the Pelješac peninsula, contain a relevant fraction of this ancestry component. Natural sites are home to individuals with pure sylvestris ancestry but also to individuals with varying proportions of sylvestris-vinifera admixed ancestry. In all but one of the cases, sylvestris ancestry was largely predominant and the vinifera ancestry component contributed for a fraction from 0.63 to 1 in each individual genome, with a mean of 0.95 and a median of 1 considering the whole sample set. The most extreme case is represented by the individual Sy10 from Modro jezero that showed a predominant 0.65 vinifera ancestry component and is part of the peculiarity of the Modro jezero spontaneous population, wherein all individuals show some degree of admixture. The two ancestry components that, cumulatively, are predominant in the individual genomes of all cultivars, which we refer hereafter to as vinifera, contribute almost equally to the global composition of the cultivated germplasm analysed in this paper (Fig. 2). One vinifera ancestry component is associated with cultivars typically grown in Continental Croatia, northern Adriatic Croatia (Kvarner and Istrian peninsula) and it makes up the entirety of the genomes of the popular cultivars ‘Heunisch Weiss’ (syn. 'Belina Starohrvatska') and ‘Blank Blauer’ (syn. ‘Bljuzgavac’ and ‘Vulpea’). The other vinifera ancestry component is associated with cultivars typically grown in Dalmatia (i.e. central and southern Adriatic Croatia) and it makes up the entirety of the genomes of the cultivars ‘Plavac Mali’ and ‘Bombino Bianco’. The majority of Croatian cultivars showed admixed proportions of these two vinifera ancestry components. Among the most remarkable ones, ‘Malvazija Istarska’ showed approximately three quarters of Continental/Istrian vinifera ancestry component and one quarter of Dalmatian vinifera ancestry component. ‘Tribidrag’ showed the opposite condition.
Origin of admixed ancestries and genealogical relationships
PCA and ADMIXTURE suggested that spontaneous populations in natural sites are in part composed of admixed individuals. The results of ADMIXTURE also conveyed a sense of strong kinship influencing the population structure of the cultivated germplasm. To address these points, we investigated the introgression events that have generated admixed individuals, using the algorithm developed by Pickrell and Pritchard and implemented in the TREEMIX software [35], and searched for complete parent-offspring trios in all possible combinations and permutations of the analysed accessions.
Figure 3 shows the TREEMIX models using a block size of 200 SNPs that explained from 98.3–99.6% of the variance of relatedness among groups, with increasing numbers of migration events. Tree topologies confirm that the spontaneous populations in more remote forest environments in Continental Croatia and in Bosnia Herzegovina are the least related ones to local cultivars, while the spontaneous population in Dalmatia from the site that is most subject to the effects of anthropization on the banks of Modro jezero show signatures of vinifera introgression. The combination of evidence from ADMIXTURE and TREEMIX suggests that at least part of the spontaneous population from the Paklenica National Park have also received some gene flow from vinifera. The f3 test confirmed that the spontaneous populations of Modro jezero and Paklenica National Park are compatible with being the result of admixture between more genuine sylvestris populations, similar to those found at Cerovica and Psunj, and cultivated germplasm.
Fourteen complete parent-offspring trios were identified in the dataset of this paper (Table 1, Supplementary Figs. 1–14). ‘Heunisch Weiss’ (syn. 'Belina Starohrvatska'), ‘Blank Blauer’ (syn. ‘Bljuzgavac’ and ‘Vulpea’), ‘Plavac Mali’ and ‘Bombino Bianco’ were recurrent parents in 11 out of 14 trios. The pair 'Belina Starohrvatska' and 'Bljuzgavac' parented 4 cultivars. One of these full-siblings is the cultivar ‘Surina’, which is reported here for the first time along with 'Svjetljak’, ‘Ranfol’, and ‘Plavec žuti’. The comparison based on IBD segment length of four full-siblings is graphically shown in Supplementary Figs. 15–21. The pair 'Plavac mali' and 'Bombino Bianco' parented two cultivars, i.e. the full-siblings ‘Ninčuša’ and ‘Ljutun’. A total of 72 parent-offspring pairwise relationships were identified in the dataset of this paper (Supplementary Table S4). All these relationships are summarised graphically in the kinship network of Fig. 4.
As a result of the presence of complete trios and many other unoriented parent-offspring pairwise relationships, 23 cultivars are linked to 'Belina Starohrvatska' and 'Bljuzgavac' through a chain of parent-offspring relationships and 36 cultivars are linked to both 'Plavac Mali' and 'Bombino Bianco'. The two kinship groups, one named after its founders and referred to as 'Belina Starohrvatska' and 'Bljuzgavac' and the other one referred to as 'Plavac Mali' and 'Bombino Bianco' represent collectively 56% of the analyzed cultivars. Consanguinity within each group and between groups is furtherly exacerbated by the fact that these four founders are themselves highly consanguineous. 'Belina Starohrvatska', 'Bljuzgavac', 'Plavac Mali' and 'Bombino Bianco' share identical by descent haplotypes across between 23.8% and 31.7% of their diploid genome length in all 6 possible pairwise comparisons. In particular, the members of the pair 'Belina Starohrvatska' and 'Bljuzgavac' and the members of the pair 'Plavac Mali' and 'Bombino Bianco' share by descent 31.7% and 29.3%, respectively, of their diploid genome length.
Table 1
List of complete parent-offspring trios reporting the percentage of Mendelian inconsistencies, the number of informative sites and previous references reporting the parentage.
Offspring
|
Parent 1
|
Parent 2
|
Mendelian error
|
Informative sites
|
Previous reference
|
Ljutun (HR24)
|
Bombino bianco (HR44)
|
Plavac mali (HR33)
|
0.66%
|
387,242
|
[17]
|
Ninčuša (HR32)
|
Bombino bianco (HR44)
|
Plavac mali (HR33)
|
0.67%
|
378,126
|
[22]
|
Gegić (HR41)
|
Bombino bianco (HR44)
|
Bilina privlačka (AF15)
|
0.66%
|
430,842
|
[17]
|
Krstičevica (HR67)
|
Bombino bianco (HR44)
|
Plavina (HR28)
|
1.17%
|
266,891
|
[17]
|
Debit (HR64)
|
Bombino bianco (HR44)
|
Lasina (HR23)
|
0.64%
|
427,447
|
[22]
|
Kurtelaška (AF24)
|
Bombino bianco (HR44)
|
Maraština omiš (HR19)
|
0.53%
|
411,318
|
[22]
|
Ranfol (AF18)
|
Belina Starohrvatska (AF13) (Heunisch Weiss)
|
Bljuzgavac (AF21) (Blank blauer)
|
0.70%
|
285,861
|
[22]
|
Surina (IPTPO08)
|
Belina Starohrvatska (AF13) (Heunisch Weiss)
|
Bljuzgavac (AF21) (Blank blauer)
|
0.66%
|
287,812
|
this paper
|
Plavec žuti (AF12)
|
Belina Starohrvatska (AF13) (Heunisch Weiss)
|
Bljuzgavac (AF21) (Blank blauer)
|
0.71%
|
287,823
|
[17]
|
Svjetljak (AF32)
|
Belina Starohrvatska (AF13) (Heunisch Weiss)
|
Bljuzgavac (AF21) (Blank blauer)
|
0.71%
|
287,881
|
[17]
|
Mejsko bijelo (AF34)
|
Duranija (IPTPO07)
|
Žumić (AF05)
|
0.73%
|
225,938
|
[17]
|
Glavinuša (HR18)
|
Vugava vis (HR37)
|
Plavac mali (HR33)
|
0.55%
|
397,147
|
[22]
|
Dolcin (IPTPO10)
|
Rukatac (HR01) (Malvasia bianca lunga)
|
Verdić (AF19)
|
0.51%
|
481,631
|
[57, 58]
|
Pošip obični (HR05)
|
Zlatarica blatska (HR61)
|
Bratkovina blatska (HR60)
|
0.66%
|
428,187
|
[59]
|
While Mendelian inconsistencies have low frequencies and are randomly distributed across the genome in bona fide parent-offspring trios, their incidence is higher and their chromosomal distribution is uneven for the proposed parent-offspring trio that was claimed for explaining the origin of ‘Plavac Mali’ [43]. ‘Plavac Mali’ is the most widely planted red cultivar in Dalmatia and its pedigree was particularly intriguing because it seemed to result from a cross between the two local cultivars ‘Tribidrag’ and ‘Dobričić’ based on microsatellite DNA allele sizes. In the present study, IBD analysis of the proposed parentage of ‘Plavac Mali’ showed 1.7% Mendelian inconsistencies. Unmatching SNPs densely clustered on several chromosomal regions, thereby representing intervals of IBD0 that were incompatible with their origin from either ‘Tribidrag’ or ‘Dobričić’ (Fig. 5A). We validated this finding using WGS data that became publicly available recently (Fig. 5B). A percentage of 1.2% of the 472,828 SNPs that were informative in all three individuals with respect of the reference genome revealed Mendelian inconsistencies across the same chromosomal regions shown by reduced-representation genome sequencing data. A higher saturation of variant sites and a lower background noise from WGS data compared to reduced-representation genome sequencing data, allowed us to estimate data that 19 chromosomal regions in ‘Plavac Mali’, involving 12 out of 19 chromosomes and covering cumulatively 23.3% of haploid genome, are incompatible with being inherited from either ‘Tribidrag’ or ‘Dobričić’. While we could confirm that ‘Plavac Mali’ has a parent-offspring relationship with both ‘Tribidrag’ and ‘Dobričić’, we can exclude that both ‘Tribidrag’ and ‘Dobričić’ are the parents of ‘Plavac Mali’.
‘Plavac Mali’ showed 8 unoriented parent-offspring relationships in our dataset (Fig. 4), beside those with ‘Tribidrag’, ‘Dobričić’, and three cultivars for which the direction of the relationship was resolved. We therefore tested each individual of those 8 ones (Fig. 4) and their compatibility with being the parent of ‘Plavac Mali’ in a parental combination with either ‘Tribidrag’ or ‘Dobričić’. None of these combinations could explain the origin of ‘Plavac Mali’ (Supplementary Table S5). In a similar way, we tested each individual of the 5 ones that showed an unoriented parent-offspring relationships with ‘Tribidrag’ (Fig. 4), their compatibility with being the parent of ‘Tribidrag’ in a parental combination with ‘Plavac Mali’. None of these combinations could explain the origin of ‘Tribidrag’.
Genome wide association study results
Berry colour
The Manhattan plot in Fig. 6A shows 143 SNPs above the significance threshold that are associated with berry colour, 140 of which are located on chromosome 2. Of these (Supplementary Table S6), 139 SNPs span an interval of approximately 8.4 Mbp, which is centered on the array of MybA genes that is known to encode two transcription factors (MybA1 and MybA2, Fig. 6C) controlling the expression of a key structural gene for anthocyanin biosynthesis [44]. One SNP outside that interval is located at chr2:7,159,705 in the VIT_00013173001 gene sequence, which has been annotated as a BSK2-like serine/threonine protein kinase. Three associated SNPs are located on chromosome 12 of the reference genome in a short interval that is highly similar in nucleotide sequence to an interval on chromosome 2 that is located at 16.7 Mbp, within the region shown in Fig. 6C, where several associated SNPs are in linkage disequilibrium with the MybA genes. The latter may be considered as a kind of false association, because a true positive SNP-trait association revealed by the reads is displaced from the correct genomic location due to misalignments on the reference genome.
Leaf hairiness
Two SNPs on chromosome 5 were significantly associated with the density of prostate hairs between the main veins on the abaxial leaf lamina (Fig. 6B). This trait is also known in viticulture as ampelographic descriptor OIV 084 [38]. The first SNP (A/T) is located at chr5:1,424,308, in an exon of the gene VIT_05s0077g01800 that translates into an amino acid sequence with no predicted function and no known protein domains, according to NCBI and InterPro databases [45]. The second SNP (T/A) is located at chr5:1,525,871 in an exon of the gene VIT_05s0077g01940 that is predicted to encode an IBR3 acyl-CoA dehydrogenase, which oxidizes Indole-3-butyric acid (IBA) into indole-3-acetic acid (IAA) in auxin biosynthesis [46]. However, given the extent of linkage disequilibrium in our dataset, it is unlikely that exactly those genes may have a role in explaining the observed phenotypic variation. Figure 6D shows, indeed, an interval of approximately 1 Mbp that is likely to contain the causal factor. No other OIV descriptor showed statistically significant associations.
as leaf hairiness in the cultivated compartment might be an adaptive trait, we sought to investigate possible relations between leaf hairiness and ancestry components. When we sorted cultivars in categorical phenotypes (i.e. the only one assigned to OIV 084 class 9 was merged to those assigned to class 7 in the following analysis), we did not detect differences in the distribution of vinifera ancestry components among hairiness categories, but we found an inverse trend of decreasing sylvestris ancestry component as hairiness increased (Fig. 7). In particular, the distributions of the sylvestris ancestry component were significantly different (p = 0.041) at a two-sided Wilcoxon test between cultivars with hairiness class 3 and cultivars with hairiness classes 7–9.