Recent advances in high-throughput DNA sequencing technology offer new information to accelerate the development of molecular markers, which are widely used in plant genetic and genomic studies. Microsatellites, or SSR markers, are among the most frequently used molecular markers for genetic and molecular studies in diverse plant species. SSR markers are generally distributed in both transcribed and non-coding sequences, and are referred to as EST-SSRs and genomic-SSRs, respectively. Because of their advantages over other molecular markers (they are co-dominant, PCR-based, highly polymorphic, chromosome-specific, reproducible and consistent [42]), SSR markers have been widely used in genetic and molecular studies in sweetpotato, including variety identification, genetic diversity analysis and construction of linkage maps [19-21]. Although several studies have reported the use of SSR markers in sweetpotato, most of these markers were derived from transcriptome libraries and ESTs. Moreover, the published SSR markers do not cover the whole genome, are limited in number and availability, and include only a few polymorphic markers compared with those of other crops. Therefore, developing novel SSR markers that are highly polymorphic and distributed throughout the genome should make genetic analysis in sweetpotato more effective.
To identify valuable genomic-SSR markers for sweetpotato genetic improvement, the sweetpotato genome was searched and a total of 2,431 SSR markers (Additional file 3: Table A1) were successfully developed from the SSR-containing sequences. The distribution density was 159.77 SSRs per Mb, or one SSR per 6.26 kb on average. This average spacing was shorter than that previously recorded for sweetpotato (7.1 kb), pigeon pea (8.4 kb), cotton (20.0 kb) and soybean (23.80 kb), almost the same as that of sesame (6.55 kb), and longer than that of rice (3.4 kb) and radish (4.93 kb) [28, 43, 44]. However, differences in frequency and abundance could be attributed to the size of the database, the tools used for SSR data mining, the length of repeat motifs and the repeat unit thresholds applied; hence, it is difficult to compare frequency and abundance estimates directly across studies [45]. In the current study, mono-, di- and trinucleotides were the most common SSRs, with dinucleotides showing the highest frequency (38.50%), followed by trinucleotides (31.46%) and mononucleotides (12.77%; Additional file 2: Figure A1). Feng et al. [46] identified dinucleotides (9439, 51.52%) as the most abundant repeats in sweetpotato, followed by trinucleotides (7636, 41.68%), which is consistent with the results of this study. Our findings contrast with previous reports showing trinucleotides as the most dominant repeat motifs in sweetpotato, followed by dinucleotides [28, 31]. Other studies also identified trinucleotides as the second most predominant repeat motifs in sweetpotato, in agreement with our findings [30].
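The two density figures quoted above are reciprocal expressions of the same quantity: a count of SSRs per Mb and an average spacing in kb per SSR. A minimal sketch of the conversion follows, assuming 1 Mb = 1000 kb; the function names are illustrative and not part of the original analysis.

```python
def kb_per_ssr(ssrs_per_mb: float) -> float:
    """Average spacing between SSRs (kb), given a density in SSRs per Mb."""
    if ssrs_per_mb <= 0:
        raise ValueError("density must be positive")
    return 1000.0 / ssrs_per_mb  # assumes 1 Mb = 1000 kb


def ssrs_per_mb(spacing_kb: float) -> float:
    """Density in SSRs per Mb, given an average spacing in kb per SSR."""
    if spacing_kb <= 0:
        raise ValueError("spacing must be positive")
    return 1000.0 / spacing_kb


# Value reported in the text: 159.77 SSRs per Mb
print(round(kb_per_ssr(159.77), 2))  # 6.26
```

A smaller kb-per-SSR value therefore corresponds to a denser distribution of SSRs, which is worth keeping in mind when comparing the spacing values reported for different crops.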
The main repeat types among the identified SSRs were A/T (12.61%), AT/AT (61.51%), AAT/ATT (27.42%), and AAAT/ATTT (13.32%; Additional file 2: Figure A1). In agreement with our current study, Wang et al. [28] identified AAT/ATT as the most dominant SSR motif in sweetpotato. Similarly, Yang et al. [6] identified AAAT/ATTT as the most frequent repeat motif among tetranucleotides in Welsh onion. However, previous studies identified AG/CT, AAG/CTT, and AT/TA as the most dominant motif types in sweetpotato [47], which conflicts with our findings.
In this study, 100 primer pairs were randomly selected to validate the SSR markers and to assess their usefulness in sweetpotato. Of these, 50 primer pairs (50%) produced clear, stable bands. The 50% PCR amplification efficiency recorded in this study was much lower than the 75%-90% EST-SSR amplification rate reported in sweetpotato [28, 29, 32]. However, the amplification efficiency of genomic-SSRs has consistently been lower than that of EST-SSRs in sweetpotato, which is in line with our results [29, 48]. This is because genomic-SSR primers are designed randomly from genomic libraries, whereas EST-SSRs come from relatively highly conserved transcribed regions. For this reason, EST-SSRs are reported to be highly applicable and transferable to related species but less polymorphic than genomic-SSRs [49]. The 50 working primer pairs amplified 251 alleles in the 24 sweetpotato materials (Table 4). The number of alleles per locus averaged 5.02 and ranged from 1 to 13. Several studies that used SSR markers to study the genetic diversity of sweetpotato germplasm have also reported a high number of alleles, ranging from 2 to 23 per locus, which is similar to that reported in this study [8, 9, 48, 50, 51]. This indicates high polymorphism among the sweetpotato accessions studied. Conversely, Hwang et al. [52] observed low polymorphism, recording 1 to 4 alleles per SSR using varied annealing temperatures and SSR primers. The result of our current study confirms the exceptional discriminatory ability of SSR markers [53]. Because sweetpotato is hexaploid, distinguishing between homozygous and heterozygous sites is difficult; hence, dominant markers are preferred over co-dominant markers [14, 15]. Previous studies reported high polymorphism in sweetpotato, which is attributed to its large genome size and high heterozygosity [52], influenced by its mating systems (self-incompatibility and outcrossing).
Furthermore, the polyploidy (autohexaploidy) of sweetpotato, combined with its large chromosome number (2n = 6x = 90), makes sweetpotato SSR primers highly polymorphic [21, 54]. Hence, sweetpotato genotypes are likely to show large genetic distances even within small populations [55]. In this study, 27 of the 50 working primer pairs (54%) exhibited polymorphism. This value was higher than the 41.9% polymorphism recorded by Wang et al. [28] in the eight cultivated sweetpotato varieties tested, but lower than the 67.2% and 62.5% polymorphism reported in different sweetpotato test materials [29, 32]. Such differences in polymorphism may be attributed to the different geographic origins of the samples and the number of DNA samples used. For instance, Chavarriaga-Aguirre et al. [56] observed relatively high polymorphism in cassava after increasing the number of samples from the initial 38 to about 500 or more. Generally, studies involving comparative genomics, genetic linkage mapping, diversity analysis, gene-based association, and evolutionary analysis require polymorphic markers. Thus, the SSR markers developed in this study could be used for such studies in sweetpotato. Polymorphic information content (PIC) denotes the degree of SSR variation and also assesses the discriminatory efficiency of SSR markers [57]. However, SSR polymorphism of a particular observed size may be derived from two or more homoeologous loci. In other words, a clear single SSR band amplified by one pair of primers may well come from two or more loci (where one allele may mask or override the effect of another), and this overlapping problem is very severe in sweetpotato because of its hexaploid nature. Thus, in principle, a pair of primers may amplify six alleles, making it difficult to distinguish homozygous sites from heterozygous sites.
The reported high heterozygosity in sweetpotato reflects homoeoallelic variation more than allelic variation and hence may not be considered true heterogeneity. Therefore, the PIC of the SSRs was not determined, because of the aforementioned problem with SSR markers in sweetpotato.
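The validation figures reported above (100 primer pairs tested, 50 amplifying, 27 polymorphic, 251 alleles in total) can be cross-checked with simple arithmetic. The sketch below only recomputes the summary statistics from those counts; the variable and function names are illustrative, and the polymorphism rate is assumed to be expressed relative to the 50 working primer pairs, as stated in the text.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts taken from the text
tested_pairs = 100       # primer pairs randomly selected for validation
working_pairs = 50       # pairs producing clear, stable bands
polymorphic_pairs = 27   # working pairs showing polymorphism
total_alleles = 251      # alleles amplified across the 24 materials

amplification_rate = percent(working_pairs, tested_pairs)       # 50.0
polymorphism_rate = percent(polymorphic_pairs, working_pairs)   # 54.0
mean_alleles_per_locus = round(total_alleles / working_pairs, 2)  # 5.02

print(amplification_rate, polymorphism_rate, mean_alleles_per_locus)
```

Note that the mean of 5.02 treats each primer pair as a single locus; as discussed above, in a hexaploid a single primer pair may amplify bands from more than one homoeologous locus.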
The SSR-based genetic distance among the 24 sweetpotato varieties averaged 0.740, with values ranging between 0.605 and 1.00 (Additional file 1: Table A1). The genetic similarity coefficient ranged from 0.66 to 0.87 with a mean of 0.765, which is high and indicates low diversity among the sweetpotato materials studied (Figure 2). This result is consistent with Hwang et al. [52], who recorded a high similarity coefficient of 0.64 on average and thus concluded that diversity was low among the accessions studied. On the contrary, Yada et al. [50] reported an average similarity coefficient of 0.57 when evaluating the genetic diversity of cultivars from Uganda. Zhang et al. [58] observed a low similarity coefficient (0.588) among sweetpotato varieties from South America. Tumwegamire et al. [9] also recorded a similarity coefficient of 0.54 on average when the genetic diversity of farmer varieties of both white- and orange-fleshed sweetpotato from East Africa was assessed. Similarly, David et al. [59] reported a low genetic similarity coefficient of 0.54 on average and concluded that diversity was high among the studied accessions. These differences could be attributed to the number and type of markers used and to genotypic variation. The clustering results revealed no direct relationship with the national or regional sources of the germplasm, indicating frequent exchange of germplasm in sweetpotato cultivation and breeding. The results from this study provide background information for genomic-SSR markers in sweetpotato.
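To illustrate how a pairwise similarity coefficient of the kind discussed above can be obtained from SSR band data, the following minimal sketch computes the Jaccard coefficient between two accessions scored as presence (1) or absence (0) of each band. This is an assumption for illustration only: the text does not state which similarity coefficient or software was used in this study, and the band profiles below are hypothetical.

```python
from typing import Sequence


def jaccard_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity between two binary band profiles (1 = band present)."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for x, y in zip(a, b) if x and y)   # bands present in both
    either = sum(1 for x, y in zip(a, b) if x or y)    # bands present in at least one
    if either == 0:
        raise ValueError("no bands present in either profile")
    return shared / either


# Hypothetical band profiles for two accessions (not data from this study)
accession_1 = [1, 1, 0, 1, 0]
accession_2 = [1, 0, 0, 1, 1]

print(jaccard_similarity(accession_1, accession_2))  # 0.5
```

Presence/absence scoring treats each band as a dominant marker, which sidesteps the difficulty of assigning allele dosage in a hexaploid but also means that shared absences carry no weight in the Jaccard coefficient.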