Genetic diversity analysis
Amplification of 270 candidate SSR primer pairs yielded 91 polymorphic SSR loci with clear amplified bands. Across the 33 standard varieties of flue-cured tobacco, these 91 loci revealed 304 alleles (2–6 alleles per locus), an average of 3.34 alleles per locus. Of these, 67 were rare alleles with allele frequencies ≤ 0.05. Loci with 4 or 5 alleles carried the most rare alleles, 28 and 22, respectively, which together accounted for 75% of all rare alleles. No rare alleles were detected at loci with 2 alleles. The average polymorphic information content (PIC), Nei’s index (H), and Shannon’s information index (I) of the 91 SSR loci were 0.3603, 0.4040, and 0.7228, respectively. A boxplot of PIC values by allele number showed that the polymorphism of a locus increased with its number of alleles. Cluster analysis showed that the average genetic similarity between varieties was 0.5640 ± 0.1744. In the clustering tree built with the unweighted pair group method using arithmetic average (UPGMA), the 91 pairs of SSR markers fully distinguished the 33 flue-cured tobacco varieties from each other (Fig. 1).
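The three per-locus indices reported above can be computed from allele frequencies alone. The sketch below is illustrative only: it assumes the commonly used Botstein et al. definition of PIC, Nei's gene diversity for H, and the natural-log Shannon index for I, none of which are stated explicitly in this section. The function name `diversity_indices` is hypothetical.

```python
import math


def diversity_indices(freqs):
    """Return (PIC, H, I) for one locus, given its allele frequencies.

    Assumed definitions (not confirmed by the text):
      H   = 1 - sum(p_i^2)                        (Nei's gene diversity)
      PIC = H - sum_{i<j} 2 * p_i^2 * p_j^2       (Botstein et al.)
      I   = -sum(p_i * ln p_i)                    (Shannon index)
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")

    # Nei's gene diversity: probability that two random alleles differ.
    h = 1.0 - sum(p * p for p in freqs)

    # Cross term over all unordered pairs of distinct alleles.
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = h - cross

    # Shannon index; alleles with zero frequency contribute nothing.
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)
    return pic, h, shannon
```

For a locus with two equally frequent alleles, `diversity_indices([0.5, 0.5])` gives PIC = 0.375, H = 0.5, and I = ln 2 ≈ 0.693. Under these definitions PIC is never larger than H, which is consistent with the averages reported above.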
Evaluation of the minimum number of primers required for genetic diversity analysis
To evaluate the minimum number of primers required for genetic diversity analysis, we analyzed how the measured genetic diversity varied with the number of primers. For each subset size from 1 to 90 markers, random sampling was repeated 50 times and the average PIC value of each sample was calculated. A scatter plot of the results showed that the PIC values converge on the overall average PIC value as the number of markers increases (Fig. 2). Accordingly, using more markers decreases the coefficient of variation (CV) between repeats, as shown in the histogram at the bottom of Fig. 3. From the CV trend line, we found that using more than 25 markers resulted in a CV < 5.0%, indicating that the PIC values were stable. Therefore, a subset of 25 markers (out of the 91 markers tested in this study) is sufficient to reveal the genetic diversity of the population.
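The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes that markers are drawn without replacement, that each repeat is summarized by the mean PIC of the sampled markers, and that the CV is the standard deviation of those means divided by their mean. The function name `subsample_cv` and the fixed seed are hypothetical.

```python
import random
import statistics


def subsample_cv(pic_values, subset_size, repeats=50, seed=0):
    """CV (%) of the mean PIC across repeated random marker subsets.

    pic_values  : per-locus PIC values for the full marker set
    subset_size : number of markers drawn (without replacement) per repeat
    repeats     : number of random draws (50 in the text)
    Assumes the overall mean PIC is non-zero.
    """
    if not 1 <= subset_size <= len(pic_values):
        raise ValueError("subset_size must be between 1 and len(pic_values)")

    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    means = [
        statistics.mean(rng.sample(pic_values, subset_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means) / statistics.mean(means) * 100.0


def smallest_stable_size(pic_values, threshold=5.0, repeats=50):
    """Smallest subset size whose CV falls below the threshold, or None."""
    for size in range(1, len(pic_values)):
        if subsample_cv(pic_values, size, repeats) < threshold:
            return size
    return None
```

The text reads the threshold from a fitted CV trend line rather than from the raw CV values, so `smallest_stable_size` is only a rough stand-in for that step.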
SSR marker screening to construct the genetic fingerprints of the studied varieties
Following the principle of using two markers for each linkage group, we selected 48 pairs of SSR markers from the 91 markers tested for the construction of the genetic fingerprints of the DUS-tested standard flue-cured tobacco varieties. The PIC, H, and I values of the 48 markers were 0.3736, 0.4223, and 0.7534, respectively. The 48 pairs not only met the requirement for the minimum number of primers but were also sufficient to fully distinguish the 33 varieties from each other. Furthermore, to compare the genetic relationships revealed by the 48 and 91 markers, we calculated genetic similarity matrices from both sets of markers and plotted them against each other. The points in the scatter plot are arranged along a diagonal line with significant linearity, all within the 95% confidence interval of the linear fit. Subsequent correlation analysis revealed a significant correlation between the genetic relationships determined by the two sets of markers, with a Pearson correlation coefficient of 0.967 (Fig. 4).
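The comparison of the two similarity matrices can be illustrated with a short sketch. It assumes that both matrices are square, symmetric, and ordered identically by variety, and that the correlation is taken over the pairwise (off-diagonal) similarities; the similarity coefficient itself is not specified in this section. The helper names are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix (one value per pair)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def matrix_correlation(sim_a, sim_b):
    """Correlate pairwise similarities from two marker sets."""
    return pearson(upper_triangle(sim_a), upper_triangle(sim_b))
```

With 33 varieties, each matrix contributes 33 × 32 / 2 = 528 pairwise values to the correlation.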
Construction of SSR genetic fingerprints for the DUS test standard flue-cured tobacco varieties
The genetic fingerprints of the 33 standard varieties were constructed using 48 pairs of SSR markers and produced the banding patterns shown in Fig. 5a. The fingerprints contained 162 alleles with allele frequencies ranging from 0.0303 to 0.9394 and an average allele frequency of 0.2963 ± 0.2897. There were 39 rare alleles with allele frequencies ≤ 0.05. Eleven of the varieties carried at least one rare allele; the varieties SV15, SV22, SV11, and SV20 carried 15, 7, 6, and 4 rare alleles, respectively. The number of differentiated loci between pairs of tested varieties ranged from 4 to 40, with an average of 20.15 ± 7.716. Fig. 5b shows that SV22, SV15, and SV20 have more differentiated loci than the other varieties, indicating that they differ markedly from the rest.
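Counting differentiated loci between varieties amounts to a locus-by-locus comparison of their fingerprints. The sketch below assumes that each fingerprint is a list with one genotype call per locus, in the same locus order for every variety, and that two varieties differ at a locus whenever their calls are not identical. The data layout and function names are hypothetical.

```python
from itertools import combinations


def differing_loci(fp_a, fp_b):
    """Number of loci at which two fingerprints carry different genotype calls."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must cover the same loci")
    return sum(1 for a, b in zip(fp_a, fp_b) if a != b)


def pairwise_differences(fingerprints):
    """Map each unordered pair of varieties to its number of differing loci.

    fingerprints: dict of variety name -> list of genotype calls per locus.
    """
    return {
        (u, v): differing_loci(fingerprints[u], fingerprints[v])
        for u, v in combinations(sorted(fingerprints), 2)
    }


def min_pairwise_difference(fingerprints):
    """Smallest number of differing loci over all pairs of varieties."""
    return min(pairwise_differences(fingerprints).values())
```

A marker set distinguishes all varieties when `min_pairwise_difference` is at least 1; the 48-marker set described above gives a minimum of 4.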
Core SSR markers for molecular DUS test of flue-cured tobacco
The 48 SSR pairs revealed at least four differentiated loci between every pair of varieties; this set of markers can therefore be used for molecular DUS testing of new varieties of flue-cured tobacco. Accordingly, we screened reference varieties for each allele according to the augmented band pattern. We selected 16 varieties to be used as reference varieties: SV02, SV03, SV04, SV08, SV10, SV11, SV12, SV14, SV15, SV18, SV19, SV20, SV22, SV23, SV30, and SV32. Each of these 16 varieties had typical and clear augmented bands for a specific allele. In DUS testing of new flue-cured tobacco varieties using the 48 pairs of SSR markers, these varieties can be included as references for evaluating the bands of the new varieties according to the results presented in Table 2.