Frequency and Distribution of SSRs in the G. max TF Genes
In the present study, a total of 6150 TF coding sequences of G. max, with an average length of 1170 bp, were mined for SSRs and used to design TF-derived genic markers (Table 1). The microsatellite search was performed with MISA, which detected a total of 1550 SSRs across the 6150 TF genes (25.2%), corresponding to a distribution frequency of one SSR locus per 4.6 kb. This density is higher than those reported earlier among leguminous crops for TF-derived SSRs in chickpea (one per 7.1 kb) (Kujur et al., 2013) and for EST-derived SSRs in soybean (one per 7.7 kb) [17] and peanut (one per 7.3 kb) [18].
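The kind of search MISA performs can be illustrated with a minimal sketch. The code below is not MISA itself; it is a simplified regular-expression scan for perfect repeats, and the minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen for illustration, since the thresholds used in this study are not stated here.

```python
import re

# Hypothetical thresholds (motif length -> minimum number of repeats).
# These are illustrative assumptions, not the settings used in the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as 'AA' that are really a shorter repeat unit.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGG" + "AAC" * 6 + "TT" + "A" * 12 + "CG"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Compound SSRs (two repeats separated by a short interruption) are not handled in this sketch; a fuller tool would merge neighbouring hits that lie within a chosen distance of each other.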
Table 1. Summary of TF-derived SSR searches in G. max

| Search Items | Numbers |
| --- | --- |
| Total number of TFs examined | 6150 |
| Total number of identified SSRs | 1550 |
| Number of SSR-containing TFs | 1138 |
| Number of TFs containing more than 1 SSR | 282 |
| Repeat type | |
| Mononucleotide | 26 |
| Trinucleotide | 1455 |
| Pentanucleotide | 1 |
| Total length of sequences searched (kb) | 7200 |
| Frequency of SSRs | One per 4.64 kb |
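The headline figures in Table 1 follow from simple arithmetic on the counts reported in the text. The sketch below recomputes them, assuming that the total searched length is the number of TFs multiplied by the average coding-sequence length (1170 bp); the function and key names are illustrative only.

```python
def summarize_ssr_search(n_tfs, mean_len_bp, n_ssrs, n_ssr_tfs, n_multi_ssr_tfs):
    """Recompute SSR density and proportions from raw counts."""
    # Assumption: total length = number of sequences x mean length.
    total_kb = n_tfs * mean_len_bp / 1000
    return {
        "total_kb": total_kb,
        "kb_per_ssr": total_kb / n_ssrs,
        "ssrs_per_100_tfs": 100 * n_ssrs / n_tfs,
        "pct_tfs_with_ssr": 100 * n_ssr_tfs / n_tfs,
        "pct_ssr_tfs_with_multiple": 100 * n_multi_ssr_tfs / n_ssr_tfs,
    }


summary = summarize_ssr_search(
    n_tfs=6150, mean_len_bp=1170, n_ssrs=1550, n_ssr_tfs=1138, n_multi_ssr_tfs=282
)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Keeping the ratio of SSRs to TFs separate from the share of TFs that contain at least one SSR makes it clear which denominator each percentage refers to.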
Varied repeat motifs were detected in the G. max TF-derived SSRs, and the SSRs were unevenly distributed across motif types and locations (Table 1). Of the 1138 SSR-containing TF genes, 282 (24.7%) contained more than one SSR; 978 (85.9%) carried simple repeat motifs, while 160 (14.1%) carried compound motifs. Among the simple repeat motifs, tri-nucleotide motifs were the most abundant (93.8%), followed by di- (3.1%), mono- (1.6%) and hexa-nucleotide motifs (1.1%). Only one tetra-nucleotide motif (AAAG/CTTT) and one penta-nucleotide motif (ACACT/AGTGT) were found in the G. max TF sequences. Tri-nucleotide repeats have been reported as the most common motif for SSR markers developed in most species, followed by di- and tetra-nucleotide repeats. In cereals, tri-nucleotide repeats were the most frequent motif in ESTs (54–78%), while di-nucleotide repeats accounted for 17.1–40.4% and tetra-nucleotide repeats for 3–6% (Fig. 1). It has been reported that in wheat more than 70% of tri-nucleotide repeats were found in coding sequences, whereas di-nucleotide repeats (~80%) were abundant in non-coding regions [19]. In the present study on soybean, the most abundant repeat type was likewise tri-nucleotide, followed by di-nucleotide.
In this study, the abundance of tri-nucleotide repeats in the ORFs of G. max TF genes can be ascribed to the fact that length variation in tri-nucleotide SSRs adds or removes whole codons and therefore does not cause frameshift mutations in coding sequences. AAC/GTT was the most frequent tri-nucleotide repeat type, followed by ACC/GGT. Among mono-nucleotides, A/T repeats occurred at high frequency, while only two C/G repeats were detected. Among di-nucleotides, the AT/AT repeat was the least abundant. Compared with the other classes, tetra-, penta- and hexa-nucleotide repeats were very rare. The repeat types observed in these classes, with their frequencies, were AGCATC/ATGCTG (5), AAAG/CTTT (1), ACACT/AGTGT (1), AACCCG/CGGGTT (2), AACCCT/AGGGTT (1), AAGCCC/CTTGGG (1), AAGGAG/CCTTCT (3), ACCAGC/CTGGTG (2), ACCATG/ATGGTC (2) and ACTAGT/ACTAGT (2).
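Motif labels such as AAC/GTT group a repeat unit with all of its rotations and with its reverse complement. The sketch below shows one common way to derive such a label, together with a small check of why tri-nucleotide length changes keep the reading frame. The function names are illustrative and the grouping rule is an assumption about how the labels in this section were formed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def smallest_rotation(motif):
    """Alphabetically smallest cyclic rotation of the motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def motif_class(motif):
    """Label such as 'AAC/GTT' covering a motif, its rotations and its reverse complement."""
    motif = motif.upper()
    forward = smallest_rotation(motif)
    reverse = smallest_rotation(reverse_complement(motif))
    first, second = sorted((forward, reverse))
    return f"{first}/{second}"


def preserves_reading_frame(unit_size, repeat_change):
    """True if gaining or losing `repeat_change` units leaves the codon frame intact."""
    return (unit_size * repeat_change) % 3 == 0


if __name__ == "__main__":
    for unit in ("TTG", "GGT", "CTTT", "AT", "AGTGT"):
        print(unit, "->", motif_class(unit))
    print(preserves_reading_frame(3, 1), preserves_reading_frame(2, 1))
```

Under this rule TTG, GTT and CAA all map to AAC/GTT, so counts per class are independent of which strand or register the repeat was read in.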
Chromosomal Distribution of SSR-Containing TF Genes
After removing redundant SSR loci, we mapped 1127 SSR-containing TF genes onto the 20 chromosomes using the MapChart tool; the remaining 11 belonged to unassigned scaffold regions and could not be placed (Fig. 2). A total of 49 TFs carrying a compound SSR motif were anchored to the different chromosomes. This may be due to a whole-genome duplication event or to tandem duplication. The number of mapped TF-derived SSR markers per chromosome varied from a minimum of 18 (chromosomes 1, 14, 15 and 18) to a maximum of 49 (chromosome 13), followed by chromosomes 6 and 10 (48 SSRs each).
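A per-chromosome tally of this kind can be sketched from the gene identifiers alone. The code below assumes that the two digits after `Glyma.` in an identifier such as Glyma.13G146400.1.p give the chromosome number, and that identifiers without that pattern belong to unassigned scaffolds; both are assumptions about the naming scheme, and the scaffold identifier in the example is hypothetical.

```python
import re
from collections import Counter

# Assumed identifier layout: Glyma.<two-digit chromosome>G<gene number>...
ID_PATTERN = re.compile(r"^Glyma\.(\d{2})G\d+")


def chromosome_of(gene_id):
    """Chromosome number encoded in the gene ID, or None if it cannot be read."""
    match = ID_PATTERN.match(gene_id)
    return int(match.group(1)) if match else None


def count_per_chromosome(gene_ids):
    """Count genes per chromosome; unparseable IDs are tallied as 'unassigned'."""
    counts = Counter()
    for gene_id in gene_ids:
        chrom = chromosome_of(gene_id)
        counts["unassigned" if chrom is None else chrom] += 1
    return counts


if __name__ == "__main__":
    ids = [
        "Glyma.13G146400.1.p",
        "Glyma.13G236800.1.p",
        "Glyma.16G141300.1.p",
        "Glyma.U000100.1.p",  # hypothetical scaffold-style ID
    ]
    print(dict(count_per_chromosome(ids)))
```

For drawing the map itself, physical positions from the genome annotation would still be needed; the identifier only provides the chromosome assignment.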
Functional Classification of SSR-Containing TF Genes
In the present study, the potential functions of 176 SSR-containing TF genes were evaluated by searching against the Gene Ontology (GO) database using the Blast2GO and WEGO software. Figure 3 summarizes the categorization of these TF genes by biological process, cellular component and molecular function. A total of 1138 TF genes were divided into 33 GO categories, of which 1100 were fully annotated. Of these, 1013 genes were assigned to cellular component, 1009 to molecular function and 913 to biological process. In the cellular component category, cell and cell part dominated (997 genes, 90.6% each), followed by organelle (989 genes, 89.9%); the least abundant term was membrane-enclosed lumen (29 genes, 2.6%). Based on molecular function, 932 TF genes (84.7%) were assigned to binding, followed by transcription regulation (637 genes, 57.9%), catalytic activity (56 genes, 5.1%) and structural molecule activity (1 gene, 0.6%). The rest had molecular transducer activity, antioxidant activity or molecular function regulator activity (2 genes each, 0.2%). In the biological process category, the two most over-represented GO terms were cellular process (895 genes, 81.4%) and metabolic process (880 genes, 80.0%), followed by biological process (872 genes, 79.3%) and response to stress and response to stimulus (147 genes each, 13.4%).
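The GO percentages in this paragraph appear to be expressed relative to the 1100 fully annotated TF genes. A short sketch of that calculation, using a subset of the counts quoted above and treating the denominator as an assumption, is:

```python
# Assumption: percentages are relative to the 1100 fully annotated TF genes.
ANNOTATED_GENES = 1100

# Gene counts per GO term, as quoted in the text (subset).
go_counts = {
    "cell": 997,
    "organelle": 989,
    "membrane-enclosed lumen": 29,
    "binding": 932,
    "transcription regulation": 637,
    "catalytic activity": 56,
    "cellular process": 895,
    "metabolic process": 880,
    "response to stimulus": 147,
}


def go_percentages(counts, total):
    """Percentage of annotated genes assigned to each GO term, to one decimal place."""
    return {term: round(100 * n / total, 1) for term, n in counts.items()}


percentages = go_percentages(go_counts, ANNOTATED_GENES)

if __name__ == "__main__":
    for term, pct in sorted(percentages.items(), key=lambda item: -item[1]):
        print(f"{term}: {pct}%")
```

Because a gene can carry several GO terms, the percentages within a category are not expected to sum to 100.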
Development of G. max TF-Derived Markers
From the 1138 SSR-containing G. max TF genes, a total of 1339 primer pairs were successfully designed (96.26%); the remainder either had SSR flanking sequences that were too short or did not meet the required criteria for primer design. Details of the designed primer pairs are provided as supplementary data (Supplementary Table 1). A set of twenty markers with a total of 57 alleles was validated in eight different soybean genotypes (Fig. 4), and distinct polymorphism was observed. Because they belong to diverse gene families, these functional markers can be correlated with traits for further use in plant breeding research related to hormone signaling, pathogen defense response, abiotic stress tolerance, etc.
Genetic Diversity Analysis of Soybean Accessions
Twenty TF-derived SSR primer pairs were randomly selected as candidate transferable SSR markers in soybean and tested against eight soybean accessions to verify their potential use in diversity studies (Supplementary Table 2). A total of 57 alleles were detected from the 20 polymorphic TFGM markers. The number of alleles per primer pair ranged from two (Glyma.10G029700.1.p, Glyma.11G216500.1.p, Glyma.13G236800.1.p, Glyma.09G207300.2.p and Glyma.04G242200.1.p) to four (Glyma.13G146400.1.p and Glyma.16G141300.1.p), with an average of 2.85. The highest polymorphism information content (PIC) value was observed for primer Glyma.16G031400.1.p (0.99) and the lowest for Glyma.20G014400.2.p (0.63), with an average PIC value of 0.83 (Table 2).
Table 2. Details of the twenty polymorphic TF-derived SSR markers with their genetic parameter values

| Sl. No. | SSR-containing TF gene | No. of alleles | PIC value | TF Family |
| --- | --- | --- | --- | --- |
| 1 | Glyma.15G063300.1.p | 3 | 0.80 | GeBP family |
| 2 | Glyma.10G029700.1.p | 2 | 0.68 | C2H2 family |
| 3 | Glyma.11G216500.1.p | 2 | 0.73 | GRAS family |
| 4 | Glyma.20G014400.2.p | 3 | 0.63 | HD-ZIP family |
| 5 | Glyma.13G146400.1.p | 4 | 0.86 | C2H2 family |
| 6 | Glyma.13G236800.1.p | 3 | 0.84 | C3H family |
| 7 | Glyma.11G136600.1.p | 3 | 0.85 | G2-like family |
| 8 | Glyma.17G170100.1.p | 3 | 0.93 | ERF family |
| 9 | Glyma.07G109500.1.p | 3 | 0.95 | SBP family |
| 10 | Glyma.05G200400.1.p | 3 | 0.88 | Trihelix family |
| 11 | Glyma.20G005800.1.p | 3 | 0.90 | C2H2 family |
| 12 | Glyma.16G141300.1.p | 4 | 0.94 | GRAS family |
| 13 | Glyma.10G225200.1.p | 3 | 0.79 | Trihelix family |
| 14 | Glyma.09G117200.2.p | 3 | 0.70 | B3 family |
| 15 | Glyma.06G121300.2.p | 3 | 0.64 | GRAS family |
| 16 | Glyma.05G027000.1.p | 3 | 0.82 | MYB family |
| 17 | Glyma.13G236800.1.p | 2 | 0.79 | C3H family |
| 18 | Glyma.09G207300.2.p | 2 | 0.98 | G2-like family |
| 19 | Glyma.04G242200.1.p | 2 | 0.98 | MYB family |
| 20 | Glyma.16G031400.1.p | 3 | 0.99 | WRKY family |
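The summary statistics for Table 2 can be recomputed directly from its allele and PIC columns. The sketch below does this with the values as listed in the table (rows 1 to 20, in order); the variable and key names are illustrative.

```python
# Values transcribed from Table 2, rows 1-20 in order.
alleles = [3, 2, 2, 3, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 2, 2, 3]
pic_values = [0.80, 0.68, 0.73, 0.63, 0.86, 0.84, 0.85, 0.93, 0.95, 0.88,
              0.90, 0.94, 0.79, 0.70, 0.64, 0.82, 0.79, 0.98, 0.98, 0.99]


def summarize_markers(alleles, pic_values):
    """Totals, means and PIC threshold counts for a set of markers."""
    n = len(alleles)
    return {
        "n_markers": n,
        "total_alleles": sum(alleles),
        "mean_alleles": sum(alleles) / n,
        "min_pic": min(pic_values),
        "max_pic": max(pic_values),
        "mean_pic": round(sum(pic_values) / n, 2),
        "n_pic_above_0_5": sum(p > 0.5 for p in pic_values),
        "n_pic_above_0_7": sum(p > 0.7 for p in pic_values),
    }


marker_summary = summarize_markers(alleles, pic_values)

if __name__ == "__main__":
    for name, value in marker_summary.items():
        print(f"{name}: {value}")
```

The thresholds of 0.5 and 0.7 are applied as strict inequalities here; a marker with a PIC of exactly 0.70 is therefore not counted in the upper class.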
As suggested earlier, markers with PIC values greater than 0.5 are considered informative, and loci with PIC values above 0.7 are highly suitable for genetic mapping [20]. In the present study, all the SSR markers tested had PIC values greater than 0.5, and sixteen had PIC values above 0.7 (Table 2), which indicates the high level of polymorphism of these genic SSR markers and their potential use in genetic diversity studies and genetic mapping. Overall, the results highlight the value of the newly developed TFGM markers, which can be recommended for cultivar identification as well as for assessment of genetic diversity in soybean genotypes.
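For reference, PIC is computed from allele frequencies at a locus. The formula used in this study is not stated, so the sketch below assumes the widely used definition PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i; some SSR studies instead report the simpler 1 - Σ p_i² under the same name.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one locus (assumed standard formula).

    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1 - homozygosity - cross_term


def classify_pic(value):
    """Apply the 0.5 and 0.7 thresholds discussed in the text."""
    if value > 0.7:
        return "highly suitable for mapping"
    if value > 0.5:
        return "informative"
    return "less informative"


if __name__ == "__main__":
    # Hypothetical allele frequencies for a three-allele locus.
    freqs = [0.5, 0.3, 0.2]
    value = pic(freqs)
    print(round(value, 3), classify_pic(value))
```

The frequencies in the usage example are hypothetical; in practice they would come from allele counts scored across the genotyped accessions.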