3.1. SSR screening of black cumin transcriptome
The black cumin transcriptome, comprising 67,591 unigenes with a total size of 51,740,837 bp, was screened for SSRs. As a result, 8,038 SSRs were identified in 6,709 unigenes (9.93% of total unigenes). The size of the unigenes ranged from 200 nt to 14,144 nt with a mean value of 2,578.5 ± 1,289.1 nt (mean ± SE). SSR length ranged from 10 nt to 96 nt with a mean value of 14.66 ± 0.03 nt (Online Resource 1, Table S1). Most of the SSR-containing unigenes (86.3%) carried a single SSR. Dinucleotide and trinucleotide repeats were the major repeat types, comprising 52.4% and 44.7% of total SSRs, respectively. The remaining repeat types (tetranucleotide, pentanucleotide, and hexanucleotide repeats) occurred at low frequency and together comprised 2.9% of the total SSRs (Table 2). TC/GA was the most frequent repeat motif, comprising 11.8% of total SSRs. Each of the other SSR motifs had a frequency of less than 10% (Table 3). TCT/AGA was the most abundant trinucleotide SSR motif (4.7% of total SSRs).
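The screening step can be illustrated with a minimal sketch that finds perfect SSRs with regular expressions. The text does not name the screening tool or its thresholds, so the minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are assumptions made only for illustration; the dinucleotide threshold of five repeats is chosen to match the reported minimum SSR length of 10 nt.

```python
import re

# Assumed minimum repeat counts per motif length (not stated in the text).
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count, length) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "TTT" or "TCTC", so each SSR is reported once.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            length = len(m.group(0))
            hits.append((m.start(), motif, length // k, length))
    return sorted(hits)


# Hypothetical toy sequence with one dinucleotide and one trinucleotide SSR.
toy = "GGA" + "TC" * 6 + "ACGTC" + "GAA" * 5 + "CC"
for start, motif, repeats, length in find_ssrs(toy):
    print(start, motif, repeats, length)
```

The motif reported for a repeat depends on the phase at which the leftmost match begins, which is one reason motifs are usually grouped with their shifted and reverse-complement forms.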
Table 2. SSR types identified in the black cumin transcriptome.
Motif length | Number of SSRs | Frequency (%)
Dinucleotide | 4,215 | 52.4
Trinucleotide | 3,590 | 44.7
Tetranucleotide | 174 | 2.2
Pentanucleotide | 27 | 0.3
Hexanucleotide | 32 | 0.4
Total | 8,038 |
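As a quick arithmetic check, the frequencies in Table 2 and the share of SSR-containing unigenes can be recomputed from the counts reported above. The variable names are illustrative only.

```python
# Counts taken from Table 2 and from the text of Section 3.1.
ssr_counts = {
    "Dinucleotide": 4215,
    "Trinucleotide": 3590,
    "Tetranucleotide": 174,
    "Pentanucleotide": 27,
    "Hexanucleotide": 32,
}

total_ssrs = sum(ssr_counts.values())

# Percentage of all SSRs contributed by each repeat type, to one decimal place.
frequency = {name: round(100 * n / total_ssrs, 1) for name, n in ssr_counts.items()}

# Percentage of unigenes that contain at least one SSR.
unigene_share = round(100 * 6709 / 67591, 2)

print(total_ssrs, frequency, unigene_share)
```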
Table 3. The most frequent SSR motifs
SSR motif | Number of SSRs | % of total SSRs
TC/GA | 946 | 11.8
CT/AG | 736 | 9.2
AG/CT | 720 | 9.0
GA/TC | 637 | 7.9
TCT/AGA | 380 | 4.7
TTC/GAA | 349 | 4.3
CTT/GAA | 295 | 3.7
GAA/TTC | 289 | 3.6
AGA/TCT | 243 | 3.0
AAG/CTT | 239 | 3.0
3.2. Marker development for SSR polymorphism
A total of 4,779 primer pairs targeting 5,297 SSRs (65.9% of total SSRs) were developed for amplification. Detailed primer data are given in Online Resource 2. Twenty randomly selected SSR primer pairs were tested in a subset of the Nigella sativa population, with a Nigella damascena genotype as an out-group. A total of 13 SSR markers (65%) amplified clearly, and one of them (nsSSR2136) was polymorphic in agarose gel electrophoresis (Figure 1).
The markers that appeared non-polymorphic on agarose gels were therefore separated by capillary electrophoresis for better resolution. A total of nine SSR markers (69.2%) were transferable to Nigella damascena. When Nigella damascena was included, the number of PCR fragments per marker ranged from 1 to 6 with a mean value of 3.31 ± 1.26. All SSR markers except for nsSSR2567 were polymorphic. Together, the markers produced 42 polymorphic PCR fragments (97.7% of total PCR fragments). Interspecific gene diversity of the markers ranged from 0.18 to 0.37 with a mean value of 0.25. nsSSR1912 had the highest gene diversity (Table 4). Within the Nigella sativa genotypes, the markers produced 30 polymorphic PCR fragments (90.9% of the total fragments). The number of fragments per marker ranged from 1 to 5 with a mean value of 2.54 ± 1.28. All SSR markers except for nsSSR2221 and nsSSR2567 were polymorphic. Intraspecific gene diversity of the markers ranged from 0.10 to 0.31 with a mean value of 0.25. nsSSR1912, nsSSR1941, and nsSSR3047 had the highest gene diversity (Table 4).
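The text does not state which estimator was used for gene diversity. As a minimal sketch, the following assumes that fragments are scored as presence (1) or absence (0) per genotype and that diversity is Nei's gene diversity per fragment, h = 1 − p² − (1 − p)², averaged over the fragments of a marker. The function names and the toy scores are hypothetical.

```python
def fragment_diversity(calls):
    """Gene diversity of one fragment from 0/1 presence calls across genotypes."""
    p = sum(calls) / len(calls)  # frequency of fragment presence
    return 1 - p ** 2 - (1 - p) ** 2


def marker_summary(fragments):
    """Return (polymorphic fragments, total fragments, mean gene diversity)."""
    # A fragment is polymorphic if it is present in some but not all genotypes.
    polymorphic = sum(1 for calls in fragments if 0 < sum(calls) < len(calls))
    mean_h = sum(fragment_diversity(calls) for calls in fragments) / len(fragments)
    return polymorphic, len(fragments), mean_h


# Hypothetical marker with two fragments scored in four genotypes.
toy_marker = [
    [1, 1, 0, 0],  # polymorphic fragment
    [1, 1, 1, 1],  # monomorphic fragment
]
print(marker_summary(toy_marker))
```

Under this assumption a monomorphic fragment contributes zero, which is consistent with the gene diversity of 0 reported for markers with no polymorphic fragments.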
Table 4. Gene diversity values of Nigella sativa specific SSR markers.
SSR marker | SSR motif | Repetitions | Nigella sativa genotypes & Nigella damascena: no. of polymorphic fragments/total no. of fragments (%) | Nigella sativa genotypes & Nigella damascena: gene diversity ± SE | Nigella sativa genotypes: no. of polymorphic fragments/total no. of fragments (%) | Nigella sativa genotypes: gene diversity ± SE
nsSSR446 | TCACAG/CTGTGA | 8 | 6/6 (100) | 0.28 ± 0.03 | 5/5 (100) | 0.28 ± 0.06
nsSSR701 | GAA/TTC | 5 | 5/5 (100) | 0.28 ± 0.05 | 5/5 (100) | 0.28 ± 0.05
nsSSR1716 | GAT/ATC & AG/CT | 6 & 6 | 4/4 (100) | 0.32 ± 0.07 | 3/3 (100) | 0.28 ± 0.01
nsSSR1912 | AG/CT | 7 | 3/3 (100) | 0.37 ± 0.08 | 2/2 (100) | 0.31 ± 0.13
nsSSR1941 | GAA/TTC | 5 | 4/4 (100) | 0.31 ± 0.06 | 4/4 (100) | 0.31 ± 0.06
nsSSR2136 | CTT/AAG | 5 | 3/3 (100) | 0.23 ± 0.04 | 1/2 (50) | 0.11 ± 0.09
nsSSR2139 | CTC/GAG | 5 | 3/3 (100) | 0.30 ± 0.06 | 2/2 (100) | 0.23 ± 0.09
nsSSR2221 | TTC/GAA | 6 | 2/2 (100) | 0.18 | 0/1 (0) | 0
nsSSR2239 | CCA/TGG & TC/GA | 5 & 5 | 2/2 (100) | 0.2 | 2/2 (100) | 0.2
nsSSR2419 | GAA/TTC | 5 | 3/3 (100) | 0.27 ± 0.04 | 2/2 (100) | 0.13 ± 0.05
nsSSR2543 | TA/TA & CTG/CAG | 5 & 7 | 4/4 (100) | 0.22 ± 0.03 | 2/2 (100) | 0.10 ± 0.05
nsSSR2567 | AG/CT | 5 | 0/1 (0) | 0 | 0/1 (0) | 0
nsSSR3047 | CCT/AGG | 5 | 3/3 (100) | 0.36 ± 0.07 | 2/2 (100) | 0.31 ± 0.13
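The fragment totals summarized in Section 3.2 can be recomputed from the per-marker counts in Table 4. The sketch below only transcribes the (polymorphic, total) pairs from the table, in row order; the function name `summarise` is illustrative.

```python
# (polymorphic fragments, total fragments) per marker, in Table 4 row order.
with_damascena = [(6, 6), (5, 5), (4, 4), (3, 3), (4, 4), (3, 3), (3, 3),
                  (2, 2), (2, 2), (3, 3), (4, 4), (0, 1), (3, 3)]
sativa_only = [(5, 5), (5, 5), (3, 3), (2, 2), (4, 4), (1, 2), (2, 2),
               (0, 1), (2, 2), (2, 2), (2, 2), (0, 1), (2, 2)]


def summarise(pairs):
    """Return (polymorphic, total, % polymorphic, mean fragments per marker)."""
    polymorphic = sum(p for p, _ in pairs)
    total = sum(t for _, t in pairs)
    return (polymorphic, total,
            round(100 * polymorphic / total, 1),
            round(total / len(pairs), 2))


print(summarise(with_damascena))
print(summarise(sativa_only))
```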
3.3. Diversity analysis
The performance of the newly developed SSR markers in terms of diversity was tested using principal coordinate analysis (PCoA). The PCoA plot contained two clusters (cluster A and B). Cluster A (green-colored) contained two landraces collected from Amasya and Eskişehir and one cultivar (Çameli). Cluster B (red-colored) contained four landraces collected from Şanlıurfa, Burdur, Tokat, and Samsun. Although the landrace collected from Konya was located close to the other Turkish landraces, it was not assigned to either cluster. The landrace from Syria was located in the upper right quadrant of the PCoA plot, whereas Nigella damascena was located in the lower right quadrant, apart from the rest of the material (Figure 2). Interspecific diversity (including Nigella damascena) ranged from 0.29 (71% similarity) to 0.99 (1% similarity) with a mean value of 0.66. Intraspecific diversity (excluding Nigella damascena) ranged from 0.29 (between landraces collected from Tokat and Burdur) to 0.82 (between landraces collected from Konya and Syria) with a mean value of 0.57.
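A minimal sketch of how a PCoA of this kind can be computed from binary fragment scores is given below. The text does not state the dissimilarity coefficient or the software used, so the Jaccard dissimilarity (1 − similarity) and the function names are assumptions; the PCoA itself is the classical procedure of double-centering the squared distances and taking the leading eigenvectors. NumPy is assumed to be available.

```python
import numpy as np


def jaccard_dissimilarity(binary):
    """Pairwise Jaccard dissimilarity between the rows of a 0/1 matrix."""
    b = np.asarray(binary, dtype=bool)
    n = b.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = np.logical_or(b[i], b[j]).sum()
            shared = np.logical_and(b[i], b[j]).sum()
            d[i, j] = d[j, i] = 1 - shared / union if union else 0.0
    return d


def pcoa(dist, n_axes=2):
    """Classical PCoA: double-centre -0.5 * D^2 and eigendecompose."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (d ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_axes]  # largest eigenvalues first
    vals = np.clip(eigvals[order], 0, None)     # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(vals), vals


# Hypothetical presence/absence scores: rows are genotypes, columns are fragments.
scores = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 1]]
coords, eigenvalues = pcoa(jaccard_dissimilarity(scores))
print(coords)
```

Each row of `coords` gives the position of one genotype on the first two axes, which is what a plot such as Figure 2 displays.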
An MCMC-based hierarchical clustering analysis was performed using a distance matrix calculated from the binary data of the SSR markers developed in the present study. As expected, Nigella damascena did not cluster with the Nigella sativa genotypes. The dendrogram contained two clusters (blue and green colored clusters). While the blue-colored cluster contained the landraces from Burdur and Tokat, the red-colored cluster contained six landraces collected from Syria, Şanlıurfa, Samsun, Konya, Eskişehir, and Amasya and one cultivar (Çameli) (Figure 3). The landraces collected from Syria and Konya had the longest branch lengths.
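To illustrate how a dendrogram is derived from a distance matrix, the sketch below uses plain average-linkage (UPGMA) agglomerative clustering. This is a simpler stand-in and not the MCMC-based method used in the study; the function name `upgma`, the labels, and the toy distances are hypothetical.

```python
def upgma(dist, labels):
    """Average-linkage clustering; returns a nested (left, right, height) tree."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean of the two old distances.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b, height), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances among four genotypes.
toy_dist = [[0.0, 0.2, 0.8, 0.8],
            [0.2, 0.0, 0.8, 0.8],
            [0.8, 0.8, 0.0, 0.4],
            [0.8, 0.8, 0.4, 0.0]]
print(upgma(toy_dist, ["A", "B", "C", "D"]))
```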
3.4. Functional annotation of the transcriptome
Functional annotation of the unigenes containing SSR markers was performed. SSR markers had 2,972 hits for 539 molecular functions. ATP, metal ion, DNA, nucleic acid, RNA, and zinc ion binding were the most frequent functions, with 292, 160, 154, 145, 134, and 105 SSR markers, respectively. For biological processes, SSR markers had 1,432 hits in 530 processes. Regulation of transcription and translation were the most prevalent biological processes, with 83 and 52 SSR marker hits, respectively. Cellular components of the SSR markers were also predicted: SSR markers had 2,002 hits for 217 cellular components. Integral membrane and nucleus were the most frequent cellular components, with 1,063 and 243 hits, respectively. Overall, 2,141 markers had hits for uncharacterized proteins and 6 for unplaced genic scaffolds. A total of 167 SSR markers did not have any hits, while 2,456 SSR markers (51% of the total) were annotated. Detailed annotation results of the SSR markers are given in Online Resource 3.
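Counts such as "hits per molecular function" can be tallied from per-marker annotation lists. The sketch below assumes a mapping from marker name to annotated terms; the marker names, the mapping, and the function name `tally_terms` are hypothetical and do not reproduce the annotation pipeline used in the study.

```python
from collections import Counter


def tally_terms(annotations):
    """Count how many markers hit each term (each term once per marker)."""
    hits = Counter()
    for terms in annotations.values():
        hits.update(set(terms))
    return hits


# Hypothetical annotation records for three markers.
toy_annotations = {
    "marker_1": ["ATP binding", "metal ion binding"],
    "marker_2": ["ATP binding"],
    "marker_3": [],  # no hit
}

hits = tally_terms(toy_annotations)
total_hits = sum(hits.values())   # total marker-term hits
distinct_terms = len(hits)        # number of different terms
without_hits = sum(1 for t in toy_annotations.values() if not t)
print(hits.most_common(), total_hits, distinct_terms, without_hits)
```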