Genetic diversity analysis
Amplification of 270 candidate SSR markers led to the selection of 91 polymorphic SSR primer pairs that produced clear bands (Additional file 1: Table S1). Genotyping the 33 standard varieties at these 91 SSR loci revealed 304 alleles (2–6 alleles per locus), an average of 3.34 alleles per locus. These included 67 rare alleles with allele frequencies ≤ 0.05. Loci with 4 or 5 alleles carried the most rare alleles, 28 and 22, respectively, which together accounted for 75% of all rare alleles. No rare alleles were detected at loci with 2 alleles. The polymorphic information content (PIC), Nei index (H), and Shannon information index (I) values of the 91 SSR pairs were 0.3603, 0.4040, and 0.7228, respectively. A boxplot of the PIC values by allele number showed that the polymorphism of a locus increased with its number of alleles (Fig. 1). Cluster analysis showed that the average genetic similarity between varieties was 0.5640 ± 0.1744. According to the unweighted pair group method with arithmetic mean (UPGMA) clustering tree, the 33 standard varieties can be fully distinguished from one another using the 91 pairs of SSR markers (Fig. 2).
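The three diversity statistics reported above can be illustrated with a minimal Python sketch. It assumes the widely used per-locus definitions (H = 1 − Σp², PIC = H − ΣΣ 2p_i²p_j² over allele pairs, I = −Σp ln p, with natural logarithms); the text does not state which software or exact formulas were used, and the function names and example counts below are hypothetical.

```python
import math


def diversity_indices(freqs):
    """Return (PIC, H, I) for one locus from its allele frequencies.

    Assumed formulas (not confirmed by the source text):
      H   = 1 - sum(p_i^2)                       Nei's gene diversity
      PIC = H - sum_{i<j} 2 * p_i^2 * p_j^2      polymorphic information content
      I   = -sum(p_i * ln p_i)                   Shannon information index
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    sum_sq = sum(p * p for p in freqs)
    h = 1.0 - sum_sq
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = h - cross
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)
    return pic, h, shannon


def count_rare(freqs, threshold=0.05):
    """Count alleles whose frequency is at or below the rare-allele threshold."""
    return sum(1 for p in freqs if p <= threshold)


# Hypothetical locus: three alleles observed across 33 varieties (made-up counts).
counts = [20, 12, 1]
freqs = [c / sum(counts) for c in counts]
pic, h, shannon = diversity_indices(freqs)
print(f"PIC={pic:.4f}  H={h:.4f}  I={shannon:.4f}  rare alleles={count_rare(freqs)}")
```

Values for a whole marker set would then be obtained by averaging the per-locus results, assuming the reported figures are means across loci.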
Evaluation of the minimum number of primers required for genetic diversity analysis
To evaluate the minimum number of primers required for genetic diversity analysis, we analyzed how the measured genetic diversity varied with the number of primers. For each marker number from 1 to 90, random sampling was repeated 50 times, and the average PIC value for each marker number was calculated. A scatter plot of the results revealed that the PIC values gradually converge on the overall average PIC value as the number of markers increases (Fig. 3). Accordingly, using more markers decreases the coefficient of variation (CV) between repeats, as shown by the histogram at the bottom of Figure 3. By calculating the CV trend line, we found that using more than 25 markers resulted in a CV < 5.0%, indicating that the PIC values were stable. Therefore, a subset of 25 markers (out of the 91 markers tested in this study) is sufficient to reveal the genetic diversity of a population.
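The resampling procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script: markers are assumed to be drawn without replacement, the CV is taken as the standard deviation of the 50 subset means divided by their mean, and the stability point is found by a direct threshold check rather than the fitted trend line used in the study. The PIC values are randomly generated placeholders, and all function names are hypothetical.

```python
import random
import statistics


def cv_by_marker_number(pic_values, repeats=50, seed=0):
    """For each subset size k, return the CV (%) of the mean PIC across repeats."""
    rng = random.Random(seed)
    cv_by_k = {}
    for k in range(1, len(pic_values)):
        # Mean PIC of k markers sampled without replacement, repeated `repeats` times.
        means = [statistics.mean(rng.sample(pic_values, k)) for _ in range(repeats)]
        cv_by_k[k] = statistics.stdev(means) / statistics.mean(means) * 100.0
    return cv_by_k


def stable_from(cv_by_k, threshold=5.0):
    """Smallest k such that the CV is below `threshold` for every size >= k."""
    stable = None
    for k in sorted(cv_by_k, reverse=True):
        if cv_by_k[k] < threshold:
            stable = k
        else:
            break
    return stable


# Placeholder per-locus PIC values (randomly generated, not the study's data).
demo_rng = random.Random(1)
demo_pic = [demo_rng.uniform(0.1, 0.6) for _ in range(20)]
demo_cv = cv_by_marker_number(demo_pic)
print("CV with 1 marker: %.1f%%" % demo_cv[1])
print("CV with 19 markers: %.1f%%" % demo_cv[19])
print("CV stays below 5% from k =", stable_from(demo_cv))
```

A fixed seed keeps the sketch reproducible; with real data, the per-locus PIC values of the 91 markers would replace `demo_pic`.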
SSR marker genotyping to construct the genetic fingerprints of the studied varieties
Following the principle of using two markers for each linkage group, we selected 48 pairs of SSR markers from the 91 markers tested to construct the genetic fingerprints of the standard flue-cured tobacco varieties commonly used in DUS testing. The PIC, H, and I values of the 48 markers were 0.3736, 0.4223, and 0.7534, respectively. The 48 pairs not only met the requirement for the minimum number of primers but were also sufficient to fully distinguish the 33 varieties from one another. Furthermore, to compare the genetic relationships revealed by the 48-marker and 91-marker sets, we calculated genetic similarity matrices with each set and plotted them against each other. The points in the scatter plot fall along a diagonal line with significant linearity, all within the 95% confidence interval of the linear fit. Subsequent correlation analysis revealed a significant correlation between the genetic relationships determined by the two sets of markers, with a Pearson correlation coefficient of 0.967 (Fig. 4).
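The comparison of the two marker sets can be illustrated with a short Python sketch. Two assumptions are made that the text does not confirm: each variety is scored with one allele per locus, and genetic similarity is the proportion of loci at which two varieties share the same allele (a simple matching coefficient). The toy genotypes and function names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_similarity(genotypes, loci):
    """Similarity for every pair of varieties: share of `loci` with identical alleles."""
    names = sorted(genotypes)
    return [
        sum(genotypes[a][l] == genotypes[b][l] for l in loci) / len(loci)
        for a, b in combinations(names, 2)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Toy data: 4 varieties scored at 6 loci (allele codes are made up).
toy = {
    "A": [1, 1, 2, 3, 1, 2],
    "B": [1, 2, 2, 3, 1, 1],
    "C": [2, 2, 1, 3, 2, 1],
    "D": [2, 1, 1, 1, 2, 2],
}
full_set = pairwise_similarity(toy, loci=range(6))     # all markers
subset = pairwise_similarity(toy, loci=[0, 2, 4])      # reduced marker set
print("r between full and reduced sets: %.3f" % pearson(full_set, subset))
```

Only the upper triangle of each similarity matrix is compared, so each pair of varieties contributes one point, as in a scatter plot of one matrix against the other.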
Construction of SSR genetic fingerprints of the 33 standard varieties
The genetic fingerprints of the 33 standard varieties were constructed using the 48 pairs of SSR markers, which produced the banding patterns shown in Figure 5a. The fingerprints contained 162 alleles with allele frequencies that ranged from 0.0303 to 0.9394 and an average allele frequency of 0.2963 ± 0.2897. There were 39 rare alleles with allele frequencies ≤ 0.05. Eleven of the varieties carried at least one rare allele; SV15, SV22, SV11, and SV20 carried 15, 7, 6, and 4 rare alleles, respectively. The number of differentiated loci between pairs of tested varieties ranged from 4 to 40, with an average of 20.15 ± 7.716. Figure 5b shows that SV22, SV15, and SV20 have more differentiated loci than the other varieties, indicating that they differ markedly from the rest.
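The fingerprint statistics above (allele frequencies and the number of differentiated loci between varieties) can be sketched in Python. The sketch assumes each variety is scored with a single allele per locus, which the text does not state explicitly; the toy genotypes and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Per-locus allele frequencies: a list of {allele: frequency} dicts."""
    names = sorted(genotypes)
    n_loci = len(genotypes[names[0]])
    freqs = []
    for locus in range(n_loci):
        counts = Counter(genotypes[name][locus] for name in names)
        freqs.append({allele: c / len(names) for allele, c in counts.items()})
    return freqs


def differentiated_loci(genotypes):
    """Number of loci at which each pair of varieties carries different alleles."""
    return {
        (a, b): sum(x != y for x, y in zip(genotypes[a], genotypes[b]))
        for a, b in combinations(sorted(genotypes), 2)
    }


# Toy data: 4 varieties scored at 6 loci (allele codes are made up).
toy_fingerprints = {
    "A": [1, 1, 2, 3, 1, 2],
    "B": [1, 2, 2, 3, 1, 1],
    "C": [2, 2, 1, 3, 2, 1],
    "D": [2, 1, 1, 1, 2, 2],
}
diffs = differentiated_loci(toy_fingerprints)
freqs_per_locus = allele_frequencies(toy_fingerprints)
print("fewest differentiated loci:", min(diffs.values()))
print("most differentiated loci:", max(diffs.values()))
print("allele frequencies at locus 4:", freqs_per_locus[3])
```

Under the one-allele-per-locus assumption, the frequencies at each locus sum to 1, so the mean allele frequency equals the number of loci divided by the number of alleles; for 48 loci and 162 alleles this gives 48/162 ≈ 0.2963, which agrees with the average reported above.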
Core SSR markers for molecular DUS testing of flue-cured tobacco
With the 48 SSR pairs, any two varieties differed at four or more loci. Therefore, this set of markers can be used for molecular DUS testing of new varieties of flue-cured tobacco. To support such testing, we screened reference varieties for each allele according to the PCR band pattern. We selected 16 varieties as reference varieties: SV02, SV03, SV04, SV08, SV10, SV11, SV12, SV14, SV15, SV18, SV19, SV20, SV22, SV23, SV30, and SV32. Each of these 16 varieties had typical and clear amplified bands for a specific allele. In DUS testing that employs the 48 pairs of SSR markers, these varieties can be included as references for evaluating the banding patterns of candidate varieties according to the results presented in Table 1.