Characteristics of E. sibiricus draft genome
The development of next-generation sequencing (NGS) has given researchers an affordable and accessible way to study plant genomes, especially those of non-model grass species such as Elymus sibiricus. In this study, a survey draft genome of E. sibiricus was sequenced and assembled de novo from fourteen Illumina libraries with an insert size of 270 bp. The moderate GC content (45.68%) of the draft genome indicates that the GC-related sequencing bias of the Illumina platform was largely avoided. The final assembly had a contig N50 of 2,510 bp, far lower than that of Lolium perenne L. (contig N50 = 16,370 bp) [19] but slightly higher than that of the allotetraploid Arachis hypogaea L. (contig N50 = 1,782 bp) [9]. This may be due to the relatively large estimated genome size (6.86 Gb) and high repeat content (73.23%) of E. sibiricus, together with the short insert size of the libraries. The estimated genome size of E. sibiricus (6.86 Gb) was smaller than that of the related allohexaploid Triticum aestivum (17 Gb) [20], but larger than that of many other important species in Gramineae, such as Hordeum vulgare (5.1 Gb) [21], Aegilops tauschii (4.5 Gb) [22], Triticum urartu (5.0 Gb) [23], Brachypodium distachyon (260 Mb) [24], Oryza sativa (466 Mb) [25], Sorghum bicolor (730 Mb) [26], Lolium perenne (2 Gb) [27] and Zea mays (2.3 Gb) [28]. The low heterozygosity of E. sibiricus (0.01%) estimated by k-mer analysis probably reflects its self-pollinating mating system, and indicates that the species is well suited to genome sequencing. This is the first draft genome of E. sibiricus, and it will be useful for molecular marker development and functional gene mining. This work also provides a basis for further whole-genome sequencing using larger-insert libraries and newer technologies such as single-molecule real-time sequencing.
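For readers unfamiliar with the contig N50 statistic compared above, the following is a minimal sketch of how it is commonly computed. The function name `n50` and the toy contig lengths are illustrative assumptions, not the assembler output or software used in this study.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L of the contig at which, walking from the longest
    contig downwards, the running total first reaches half of the total
    assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional arithmetic.
        if running * 2 >= total:
            return length


# Toy example with made-up contig lengths (bp), for illustration only.
toy_contigs = [10, 20, 30, 40]
toy_n50 = n50(toy_contigs)
```

Because a few long contigs can dominate the running total, N50 is sensitive to genome size and repeat content, which is consistent with the explanation given for the low value observed here.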
SSR marker development
SSR markers have been widely applied in genetic studies and molecular breeding. Of the 293,362 SSRs identified, the vast majority (97.95%) had mono-, di- or tri-nucleotide motifs, which is similar to the result obtained by restriction site-associated DNA sequencing (RAD-seq) in E. nutans [29]. In the transcriptome sequencing study of E. sibiricus, however, tri-nucleotide motifs were the most numerous [30], which could reflect the difference between sequences in non-coding and coding regions. Coding regions typically have a higher percentage of trinucleotide repeats because of the enrichment of triplet codons under selection pressure [30]. The most abundant tri-nucleotide motif in monocotyledons is usually CCG/CGG [32], whereas in this study it was CTC/GAG. This could be the result of differences in codon usage bias among species [33]. The A/T-rich tendency of SSRs in E. sibiricus was also consistent with reports for eukaryotes [34]. For each motif type, motif abundance decreased as the repeat number increased, in accordance with a previous study [35].
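As an illustration of how mono-, di- and tri-nucleotide SSRs and complementary motif pairs such as CTC/GAG can be identified, a minimal regular-expression sketch is given below. It is not the tool used in this study; the minimum repeat thresholds in `MIN_REPEATS`, the function names and the test sequence are all assumptions made for the example.

```python
import re

# Assumed minimum repeat counts per motif length (hypothetical thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Group a motif with its rotations and reverse complement.

    Returns the alphabetically smallest member, so that CTC and GAG
    (or CCG and CGG) fall into the same class.
    """
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))]
    return min(rotations)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples sorted by start position."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # One motif copy captured, followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip di/tri motifs that are really homopolymers (e.g. "AAA").
            if unit_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Made-up sequence: a (CTC)5 repeat followed by an (A)10 repeat.
example_seq = "GG" + "CTC" * 5 + "TT" + "A" * 10 + "GC"
example_hits = find_ssrs(example_seq)
```

Grouping motifs by rotation and reverse complement is what makes CTC and GAG count as a single motif class, since the same repeat can be read from either strand and in any phase.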
For polyploid species, alleles are usually hard to distinguish because fragments from different loci overlap and their allelic relationships are uncertain [36], which complicates genotype scoring. Single-locus SSR markers are therefore considered the best choice, and the development of single-locus SSRs by genome survey has been reported in barley, peanut and Luffa [9, 12, 37]. In this study, 10 single-locus SSR markers were developed via the genome survey of E. sibiricus, with great potential for use in genetic variation studies and linkage map construction.
Effectiveness comparison between single- and multi-loci markers
The genetic diversity of 27 wild E. sibiricus accessions was evaluated using 30 markers developed in this study and another 30 reported previously. We found that some SSRs predicted to be single-locus by in silico analysis still produced multi-locus amplicons when separated on polyacrylamide gels. This may be caused by poor conservation of their flanking sequences [9]. In the end, only 10 single-locus (ESGA-SL) markers and 20 multi-locus (ESGA-ML) markers were obtained in this study for genetic diversity analysis of the 27 wild E. sibiricus accessions.
The average number of alleles amplified by the 10 ESGA-SL markers was 2.9, close to the values reported for the allotetraploid species Arachis hypogaea (3.85) and Brassica napus (3.23) [9, 36]. The PIC value of the 10 ESGA-SL markers varied from 0.069 to 0.595 with an average of 0.391, indicating abundant polymorphism and high application value [38]. No significant difference in PIC was detected between ESGA-SL and the other three marker systems (ESGA-ML, ES and ESGS), which may be due to the different calculation criteria for single-locus and multi-loci markers or to the limited number of loci amplified by single-locus markers. According to the Mann-Whitney test, G-SSR (ESGA and ESGS markers) was more efficient and polymorphic than EST-SSR (ES markers) in terms of PIC, MI and Rp, which may be driven by the more conserved flanking sequences of EST-SSRs [39, 40]. In addition, ESGA-ML markers had significantly (P < 0.05) higher PIC values than ESGS markers, demonstrating the superiority of sequencing-based SSR marker development over the traditional method.
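To make the PIC values above easier to interpret, the sketch below computes PIC for a single co-dominant locus from its allele frequencies. It assumes the commonly used form PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the text here does not state which formula was applied, and multi-loci (dominant-scored) markers are often handled with a different criterion. The function name and the example frequencies are illustrative.

```python
from itertools import combinations


def pic(freqs, tol=1e-9):
    """Polymorphism information content for one co-dominant locus.

    Assumed form: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2,
    where freqs are allele frequencies summing to 1.
    """
    if abs(sum(freqs) - 1.0) > tol:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with two equally frequent alleles.
two_allele_pic = pic([0.5, 0.5])
```

Under this form, a locus with two equally frequent alleles has a PIC of 0.375 and a monomorphic locus has a PIC of 0, so PIC rises with both the number of alleles and the evenness of their frequencies.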
The UPGMA- and PCoA-derived cluster analyses based on ESGA-SL markers divided the 27 wild E. sibiricus accessions into two groups, and the Bayesian structure analysis revealed the same pattern. The other three types of multi-loci markers, however, were less able than ESGA-SL markers to reveal the actual genetic relationships. Notably, all the genetic diversity parameters (Na, Ne, I, He and PP) of each geo-group calculated from ESGA-SL markers were higher than those calculated from ESGA-ML markers, which suggests that single-locus markers reveal more accurate genetic information and are therefore more suitable for further genetic analysis [36]. However, slightly higher pairwise Fst values among the geo-groups were observed with ESGA-ML markers. Because multi-loci SSRs amplify at multiple sites in the genome, part of the genetic information is unavoidably masked. Although the advantage of single-locus markers over multi-loci markers was evident in this study, a large number of single-locus markers covering every chromosome of the genome are still needed for E. sibiricus. Higher-quality whole-genome sequencing and assembly of E. sibiricus are therefore necessary.
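Three of the diversity parameters compared above (Ne, He and I) can be illustrated for a single locus with the short sketch below. It uses the standard textbook definitions (effective number of alleles Ne = 1 / sum(p_i^2), expected heterozygosity He = 1 - sum(p_i^2), Shannon's index I = -sum(p_i ln p_i)); the exact software and any sample-size corrections used in this study are not stated here, so the function and its inputs are assumptions.

```python
import math


def locus_diversity(freqs, tol=1e-9):
    """Return (Ne, He, I) for one locus from its allele frequencies.

    Uses uncorrected textbook definitions; no small-sample correction.
    """
    if abs(sum(freqs) - 1.0) > tol:
        raise ValueError("allele frequencies must sum to 1")
    sum_sq = sum(p * p for p in freqs)
    ne = 1.0 / sum_sq                                   # effective number of alleles
    he = 1.0 - sum_sq                                   # expected heterozygosity
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)  # Shannon's index
    return ne, he, shannon


# Hypothetical locus with two equally frequent alleles.
ne, he, shannon = locus_diversity([0.5, 0.5])
```

All three statistics increase when alleles are more numerous and more evenly distributed, which is why masked alleles at multi-loci markers can depress them.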