Marker quality analysis
SNP markers
A total of 65 510 SNP markers with an average reproducibility of 99.6% were obtained using the DArTseq method for 188 individuals of V. officinalis. Over 98% of SNPs had reproducibility above 95%, of which 55 101 was completely reproducible (Fig. 2a). The call rate was in the range from 20% to 100% (Fig. 2b). Approximately 63% of SNPs had a call rate above 75%. When the quality criteria were applied, i.e. reproducibility above 90% and call rate above 95%, 23 507 SNP markers were assigned to subsequent analysis. The polymorphism information content (PIC) was in the range of 0.01-0.5 with an average of 0.181 and a median of 0.129. As much as 40% of SNP markers had a PIC below 0.1, and only 1% had a maximum value of 0.5 (Fig.2c).
SilicoDArT markers
A total of 61 633 silicoDArT markers with an average reproducibility of 99.8% were obtained. All markers had the repeatability above 95% and as many as 58 6244 had the repeatability equal to 100% (Fig. 2d). The call rate ranged from 0.8 to 1 (Fig. 2e). About 80% of silicoDArT had a call rate above 90%. After applying quality criteria, 41 925 markers were approved for further analysis. The PIC value ranged from 0.01-0.5 to 0.170 and the median 0.123. The frequency analysis showed that over 40% of markers had a very low PIC, i.e. below 0.1, and only 0.02% had a maximum PIC value (0.5).
Genetic diversity
SNP markers
In individual populations, the percentage of polymorphic loci ranged between 31-61% (Fig. 3a). The occurrence of unique alleles populations ranged from 0 to 1059 (Fig. 3b). No unique alleles were found in two populations (PL 403186 and PL 403187) originating from the Świętokrzyskie Voivodeship. 22% of alleles that occurred in wild populations were not represented in the cultivated form. The contribution of SNP markers unique for the regions is shown in Fig. 3c. The observed heterozygosity (Ho) ranged from 0.196 (PL 401941) to 0.354 (PL 504920) (Fig. 3d). The mean heterozygosity of wild populations was 0.250. The expected heterozygosity (uHe) varied between 0.274 (PL 403192) and 0.348 (PL 403183), with an average 0.315 for wild populations (Fig. 3d). The lowest value of inbreeding coefficient (FIS) was found for cultivated form PL 504920 (-0.104) while the highest for population PL 401941 (0.354) (Fig. 3d). In general, in wild populations the coefficient reached values above zero, which indicates the presence of inbreeding. The population PL 401945 from the Świętokrzyskie Voivodeship was the closest to the state of Hardy-Weinberg equilibrium. Negative F-value for the cultivated form indicates excessive heterozygosity. The genetic distance according to Jaccard's coefficient between the examined individuals ranged from 0.08 (individuals from population PL 406738 and PL 403192) to 0.380 (individuals from population PL 401941 and PL 406738). The mean distance between individuals from natural populations was 0.239.
SilicoDArT markers
The percentage of polymorphic silicoDArT loci in the investigated samples was 17-49% (Fig.3a). Unique fragments were found in different numbers between 20 (PL 401938) and 2282 (PL 504920) (Fig. 2b). The cultivated form did not include 21 010 fragments, which were present in natural populations. The contribution of unique fragments in specific regions was presented in Fig. 3c. The Nei unbiased genetic diversity coefficient (uHe) ranged between 0.328 and 0.409 for populations PL 403192 and PL 403183, respectively, and it showed a strong positive correlation (0.807; p<0.0001) with uHe calculated for SNP markers. The mean genetic distance between 188 individuals was 0.861. The maximum distance (0.941) was between individuals from population PL 401941 and PL 403192, while the minimum distance was 0.294 for individuals from population PL 406738 and PL 403192.
Mantel test
The comparison of genetic distance matrices obtained for SNPs and silicoDArTs showed a strong uphill (0.752; p<0.0001) linear relationship of them (Fig 4a). The test in which genetic distance matrices between populations were collated with a matrix of geographical distance showed a moderate positive relationship for both SNPs and silicoDArTs (0.304 and 0.329; p<0.0001 respectively) (Fig. 4b, c).
AMOVA
The Analysis of Molecular Variance (AMOVA) for SNP markers of all samples showed that 23% of the variance in allele frequency could be attributed to differences between populations (Fig. 5). In case when only wild populations were analyzed, the value of differentiation among populations (ɸPT) coefficient was 0.199 (p<0.001). The hierarchical AMOVA of wild populations revealed that 78% of total variance was found among individuals, 16% was found among populations while the rest of variation (6%) was among regions. The genetic variation between the cultivated form and wild populations was 29%. AMOVA results of silicoDArTs indicated greater intrapopulation than interpopulation variation (~83% vs ~17%). The ɸPT coefficient for the natural populations exclusively was 0.166 (p<0,001). The hierarchical AMOVA of wild populations confirmed that the largest fraction of variation (82%) was within populations, while the interpopulation variance was 13% and inter-regions variation was restricted to 5%. 16% of the variation was found between the cultivated form and the natural populations.
Population structure
SNP markers
The Agglomerative Hierarchical Clustering based on SNP markers showed the presence of four main groups (Fig. 6a). The first one, most distant, was composed of 20 individuals. It consisted of the cultivated form individuals and four individuals from population PL 406738 and one of PL 403192. The second cluster consisted of 55 individuals, which came from six populations collected in the Podkarpackie Voivodeship. Nine specimens from Mazowieckie Voivodeship formed the third and the smallest group. The last group included as many as 104 individuals. Its core was constituted by populations located in the Świętokrzyskie Voivodeship. In addition, it contained the population from Kujawsko-Pomorskie and four individuals from the Mazowieckie Voivodeship.
The first three main coordinates of Principal Coordinate Analysis (PCoA) explained only 24.5% of the total variability (Fig. 6b, c). In the 1 and 2 coordinate system, the presence of four groups, i.e. the cultivated form, population originating from the Podkarpackie Voivodeship with a separate population PL 403183 and a large group consisting of populations from the other three Voivodships, was clearly visible. The third coordinate allowed to isolate as well as a group of individuals from Mazowieckie Voivodeship.
In order to further elaborate the genetic structure of the populations, a model-based Bayesian clustering was conducted using the Structure 2.3.4 software. In order to find the suitable number of clusters (K) it was tested in the range from 1 to 10 and was plotted against ΔK which showed a sharp peak for K=5 (Fig. 6d), hence, five distinct groups were found to contribute significant genetic information across tested accessions. The sub-populations were denoted as POP1 – POP5, and each sub-population contributed in 10%, 5%, 56%, 21% and 7% of total V. officinalis genetic makeup, respectively (Fig. 6d). The proportion of membership of individuals in each population was illustrated in the bar plot of the population assignment test in structure analysis (Fig 6e, f). Genotypes that the estimated proportion of membership (Q) scored >0.80 were considered as pure while <0.80 as admixed. It suggested that all cultivated form’s individuals were assigned entirely in POP1, the same as four ones from population PL 406738 and one from PL 403192 that were originated from the Mazowieckie Voivodeship. The POP2 was comprised of seven individuals from both Mazovian populations. The remaining six plants from that region showed intermediated genetic composition and were determined as heterogeneous. POP3 dominated in wild populations originating from Świetokrzyskie and Kujawsko-Pomorskie Voivodships, although in some part of the accessions which collection sites were located further to the South, the presence of about 10-20% of genetic information from POP5 was also noted. Most of populations collected in Bieszczady Mountains i.e. Podkarpackie Voivodship comprised admixtures of POP4 and POP3, however population PL 403183 represented POP4 and POP5 genetic makeup.
SilicoDArT markers
Ward's hierarchical grouping showed the presence of five main clusters (Fig. 7a). The groups were identified in a very similar way to the above-mentioned SNP analysis. The only difference corresponded to the classification of PL 403183 from Podkarpackie Voivodship as a separate group. Also, PCoA results for silicoDArT data showed high agreement with those reported for SNP markers (Fig. 7b, c). The three main coordinates explained 14.9% of the total variation.
Bayesian cluster analysis of silicoDArT markers using the ΔK method to determine optimal K values suggested that the most likely number of clusters was two (Fig. 7d), hence two distinct groups were found, and they correspond to cultivated and wild populations (Fig. 7e). The subpopulations were determined as POP1 and POP2 and constituted 11% and 89% of the total genetic information, respectively (Fig. 7f). The structure of the lower order for K=3 showed the existence of third group composed of populations originating from Podkarpackie Voivodeship (Fig. 7g, h).
Joint analysis
Generalized Procrustes Analysis was performed based on the first ten principal coordinates obtained from PCoA for SNP and silicoDArT data. The most effective reduction of the total variability was scaling and rotation (p<0.0001). The residuals by individuals after transformation were in the range 0.0001-0.009. On average, the highest value was observed in the cultivated form. The residuals by configuration after transformation were equal (0.169) indicating that both methods gave consensus results. A consensus test allowed to determine that the observed Rc (results of the consensus test) value was significantly higher than 95% of the results that were obtained when permuting data. The eigenvalues showed that the first three axes represented 53.56% of variability. The variability splitted between the methods showed that the first axis explained a bit more variation of SNP than silicoDArT data. The first three GPA factors explained 53.5% of total variance. Finally, scatterplots presented the consensus results. Due to the high consistency of the results obtained for SNP and silicoDArT analysis, as before, five major groups could be identified. There was a clearly visible distinctiveness between the cultivated form and natural populations. Among the wild populations, the ones coming from the Bieszczady Mountains were distinct and moreover they showed additional internal differentiation. Also, a part of individuals from the Masovian region forms a separate aggregation. On the other hand, the population originating from Kujawsko-Pomorskie Voivodeship does not differ from the large group of populations collected in the Świętokrzyskie Voivodeship (Fig. 8).