SSR site distribution in S. chinensis
We searched 59,786 unigene sequences from the transcriptome data and detected 6254 SSR sites in 4989 sequences, giving an SSR frequency of 10.46%. In total, 897 unigene sequences contained two or more EST-SSR sites, and all of these sequences included a complex SSR site.
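The text does not define "SSR frequency" explicitly. The reported value is numerically consistent with the number of SSR sites divided by the number of unigenes searched, not with the share of unigenes that carry an SSR. We read it that way in this worked check:

$$\text{SSR frequency} = \frac{\text{SSR sites detected}}{\text{unigenes searched}} \times 100\% = \frac{6254}{59\,786} \times 100\% \approx 10.46\%$$

By the same arithmetic, the proportion of unigenes containing at least one SSR would be 4989/59,786 ≈ 8.34%.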
The EST-SSRs detected in the transcriptome were of several types, and their frequencies differed significantly (Table 2). Mono-, di-, and trinucleotide repeats were the most common, accounting for 60.06%, 31.61%, and 7.84% of all SSRs, respectively. Tetra-, penta-, and hexanucleotide repeats were rare, together accounting for only 0.49% of all SSRs. The most common repeat number was 10, found at 22.91% of SSR sites.
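The source does not name the SSR search tool or its settings. The following minimal sketch shows how perfect SSRs of one to six nucleotides can be located in unigene sequences. It assumes hypothetical minimum repeat numbers (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides), chosen to resemble commonly used MISA-style settings. The helper names `find_ssrs`, `is_primitive`, and `MIN_REPEATS` are illustrative. The scanner takes the shortest qualifying motif at each position and does not merge adjacent tracts into complex (compound) SSRs.

```python
ALPHABET = set("ACGT")
# Hypothetical minimum repeat numbers per motif length (not stated in the source).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if `unit` is not itself a repeat of a shorter unit (e.g. 'AGAG' is not)."""
    n = len(unit)
    return all(unit != unit[:k] * (n // k) for k in range(1, n) if n % k == 0)


def find_ssrs(seq):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in `seq`."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        hit = None
        for k in range(1, 7):  # mono- to hexanucleotide motifs, shortest first
            unit = seq[i:i + k]
            if len(unit) < k or not set(unit) <= ALPHABET or not is_primitive(unit):
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == unit:
                n += 1  # count consecutive copies of the unit
            if n >= MIN_REPEATS[k]:
                hit = (i, unit, n)
                break
        if hit:
            hits.append(hit)
            i += hit[2] * len(hit[1])  # skip past the whole repeat tract
        else:
            i += 1
    return hits


# Hypothetical unigene containing a dinucleotide, a mononucleotide, and a trinucleotide SSR.
seqs = ["TT" + "AG" * 7 + "TT" + "C" * 12 + "GAA" * 5 + "T"]
hits = [h for s in seqs for h in find_ssrs(s)]
frequency = 100 * len(hits) / len(seqs)  # SSR sites per 100 sequences searched
```

The per-type percentages in Table 2 would then be tallies of `len(motif)` over all hits.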
Characteristics of EST-SSRs
We analyzed di-, tri-, tetra-, penta-, and hexanucleotide repeats; mononucleotide repeats were excluded because sequencing quality can be poor in homopolymer stretches [16]. In total, 82 motif types were identified: 8 di-, 30 tri-, 25 tetra-, 5 penta-, and 14 hexanucleotide motifs (a motif and its complementary sequence were counted as one type) (Table 3). Dinucleotides were the most highly represented EST-SSR type (35.71%); TC/GA was the most common dinucleotide motif (48.46%), followed by CT/AG (39.52%), with all other motif types constituting just 12.02% of EST-SSR dinucleotides. There were 30 types of trinucleotide repeat motif among S. chinensis ESTs; the most frequent were GAA/TTC and AGA/TCT, which accounted for 16.59% (846) and 13.49% (688) of all trinucleotide repeats, respectively.
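The text counts a motif and its complementary sequence as one type (GAA/TTC, AGA/TCT) but lists TC/GA and CT/AG separately, so rotations of a motif appear not to be merged. The sketch below assumes exactly that rule: a motif is paired only with its reverse complement. The function names are hypothetical, and the `observed` list is illustrative, not data from this study.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    return motif.translate(COMPLEMENT)[::-1]


def is_primitive(unit):
    """Exclude units that are repeats of a shorter unit (e.g. 'AA', 'ACAC')."""
    n = len(unit)
    return all(unit != unit[:k] * (n // k) for k in range(1, n) if n % k == 0)


def motif_class(motif):
    """Label a motif together with its reverse complement, e.g. 'TC' -> 'GA/TC'."""
    return "/".join(sorted({motif, reverse_complement(motif)}))


def possible_classes(length):
    """Every motif class of the given length under the reverse-complement rule."""
    motifs = ("".join(p) for p in product("ACGT", repeat=length))
    return {motif_class(m) for m in motifs if is_primitive(m)}


observed = ["TC", "GA", "AG", "GAA", "TTC"]  # hypothetical detected motifs
counts = Counter(motif_class(m) for m in observed)
```

Under this rule there are exactly 8 possible dinucleotide classes and 30 possible trinucleotide classes. These numbers agree with the 8 dinucleotide and 30 trinucleotide motif types reported above.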
The frequency of different motifs in the EST dataset varied, with 20572 (79.39%) dinucleotides, 5100 (19.68%) trinucleotides, 206 (0.81%) tetranucleotides, 27 (0.1%) pentanucleotides, and six (0.02%) hexanucleotides (Fig. 1a). The number of motif repeats ranged from five to 12, and six repeats were the most common (Fig. 1b).
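The distributions summarized in Fig. 1 are tallies over the detected SSRs: the share of each motif length and the most frequent repeat number. The following minimal sketch runs on hypothetical (motif, repeat number) records. The function name `summarize` is illustrative.

```python
from collections import Counter


def summarize(ssrs):
    """ssrs: iterable of (motif, repeat_number) records, mononucleotides excluded."""
    ssrs = list(ssrs)
    by_length = Counter(len(motif) for motif, _ in ssrs)
    # Percentage of records per motif length (2 = di-, 3 = tri-, ...), as in Fig. 1a
    share = {k: round(100 * v / len(ssrs), 2) for k, v in sorted(by_length.items())}
    # Most frequent repeat number, as in Fig. 1b
    repeat_counts = Counter(n for _, n in ssrs)
    modal_repeat = repeat_counts.most_common(1)[0][0]
    return share, modal_repeat


records = [("AG", 6), ("TC", 6), ("GAA", 5), ("AG", 7)]  # hypothetical records
share, modal_repeat = summarize(records)
```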
Development of S. chinensis EST-SSR primer pairs and detection of polymorphisms
To obtain high-quality SSR primers capable of detecting polymorphisms, we randomly selected 50 primer pairs and evaluated them on four S. chinensis accessions (Yanhong, Zaohong, Jinwuwei, and 12-(-2)-1). Fourteen primer pairs were effective (Additional file 1), a success rate of 28% (14 of 50).
Discrimination between different S. chinensis genotypes using EST-SSR primer pairs
In the genetic diversity analysis, the 14 EST-SSR primer pairs identified above were used to differentiate among 42 S. chinensis accessions. Analysis of the genotype data with NTSYS-pc software classified the accessions into four groups at a similarity coefficient of 0.63 (Fig. 2). The dendrogram revealed clear distinctions between accessions, reflecting high genetic diversity that could be exploited for DNA fingerprint-based identification of S. chinensis. Pairwise similarity coefficients among the 42 accessions ranged from 0.61 to 0.97. Group I, the largest group, comprised 28 varieties, mostly originating in Jilin, and contained four subgroups, with a similarity coefficient of 0.682. Group II included four varieties, most from Heilongjiang and one from Jilin. ‘18-10-3’ and ‘162-1-4’ did not cluster with any other group and were designated group IV, and the remaining eight accessions constituted group III. All 42 accessions were distinguishable with the 14 EST-SSR markers, and their clustering pattern was concordant with their geographic distribution. This indicates that EST-SSR markers developed from transcriptome data can reveal the genetic relatedness of S. chinensis germplasm resources.
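The text states only that NTSYS-pc was used and that the dendrogram was cut at 0.63. It does not name the similarity coefficient or the clustering method. The following minimal sketch assumes the Dice coefficient on 0/1 band-presence data and UPGMA (average-linkage) clustering, a common pairing for SSR dendrograms, and cuts at the 0.63 threshold. The accession names and band vectors are hypothetical.

```python
from itertools import combinations


def dice(a, b):
    """Dice similarity between two 0/1 band-presence vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 1.0


def upgma_groups(profiles, threshold):
    """UPGMA clustering on Dice similarity, stopped at the `threshold` cut line."""
    names = list(profiles)
    sim = {}
    for a, b in combinations(names, 2):
        sim[a, b] = sim[b, a] = dice(profiles[a], profiles[b])

    def linkage(c1, c2):
        # UPGMA: unweighted mean of all pairwise similarities between members
        return sum(sim[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[n] for n in names]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:  # every remaining join lies below the cut line
            break
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i still points at the same cluster
        clusters[i] = merged
    return clusters


# Hypothetical band profiles (1 = band present) for five accessions.
profiles = {
    "A": [1, 1, 0, 1, 0, 1],
    "B": [1, 1, 0, 1, 0, 0],
    "C": [0, 0, 1, 0, 1, 1],
    "D": [0, 0, 1, 0, 1, 0],
    "E": [1, 1, 0, 1, 0, 1],
}
groups = upgma_groups(profiles, 0.63)
```

Average linkage never produces inversions, so stopping at the first merge below the threshold gives the same groups as cutting the finished dendrogram at that similarity.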
Accession identification
There is a need for a simple, practical, and reliable method of identifying S. chinensis accessions. Of the 14 primer pairs tested, ten were needed to clearly distinguish all 42 accessions (Fig. 3). The accessions were first sorted into groups based on different combinations of the 220-, 270-, and 280-bp bands amplified by primer pair no. 30 (Fig. 3). The smallest group contained only two strains, 18-10-3 and 17-N1-N1, which were further distinguished by a 240-bp band amplified by primer pair no. 11. Likewise, each of the other five groups could be resolved using the primers shown in Fig. 3. Thus, all accessions could be identified with 10 primer pairs, which were used to construct a manual cultivar identification diagram (MCID).
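The MCID in this study was built by hand. The logic behind it can be sketched as a recursive split. Accessions are grouped by their band pattern at the first primer, and any group with more than one member is split again by the next primer, until every accession stands alone. The primer order is supplied by the caller (here primer no. 30 came first). The following sketch is hypothetical: the function name, primer labels, accession names, and band sizes are illustrative, not the study's data.

```python
def build_mcid(profiles, primer_order):
    """profiles: {accession: {primer: set of band sizes in bp}}.
    Returns a nested dict keyed by (primer, band pattern); leaves are accession
    names, or a sorted list if accessions remain indistinguishable."""
    def split(group, primers):
        if len(group) == 1:
            return group[0]
        if not primers:
            return sorted(group)  # no primers left to separate these
        primer, rest = primers[0], primers[1:]
        branches = {}
        for acc in group:
            pattern = tuple(sorted(profiles[acc][primer]))
            branches.setdefault(pattern, []).append(acc)
        if len(branches) == 1:  # this primer does not split the group; try the next
            return split(group, rest)
        return {(primer, pattern): split(members, rest)
                for pattern, members in branches.items()}

    return split(sorted(profiles), list(primer_order))


# Hypothetical band data for three accessions and two primer pairs.
profiles = {
    "acc1": {"p1": {220, 270}, "p2": {240}},
    "acc2": {"p1": {220, 270}, "p2": set()},
    "acc3": {"p1": {280}, "p2": {240}},
}
diagram = build_mcid(profiles, ["p1", "p2"])
```

Here `p1` separates `acc3` from the other two accessions, and `p2` then resolves `acc1` from `acc2` by the presence or absence of a 240-bp band. This mirrors the two-step split described for 18-10-3 and 17-N1-N1.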