Genome sequencing survey and identification of SSR of Lycium ruthenicum

Background Lycium ruthenicum has high economic and ecological value in western China owing to its high content of active substances and its tolerance to drought and salinity stress. However, genomic information for the species is lacking, which seriously hinders breeding and afforestation. In this study, we surveyed the genome size and developed SSRs for L. ruthenicum using next-generation sequencing technology, to lay a theoretical foundation for further genomic research. Results A total of 451,721,828 bp of raw data were generated, and 4,596,439 scaffolds were obtained after assembly. The estimated genome size of L. ruthenicum was 3,249.33 Mb, the heterozygosity rate was 1.13%, and the repeat rate was 73.13%. A total of 958,619 SSRs were identified, with an average density of 163.95 SSRs/Mb. Dinucleotide repeats accounted for a large proportion of all motifs, and AT/AT, AC/GT and AG/CT were the dominant repeat motifs in the L. ruthenicum genome. These results could lay a foundation for subsequent genome sequencing, and the SSR data could enlarge the molecular resources available for L. ruthenicum and its relatives in genetic mapping, QTL analysis and population genetic studies.

Research has shown that appropriate salt stress is beneficial to the seed germination of L. ruthenicum (Chen et al. 2010; Wang et al. 2014). Because of its drought resistance and saline-alkali resistance, L. ruthenicum has been widely planted for saline-alkali land improvement, water and soil conservation, desert management and similar purposes (Jalali et al. 2012; Peng et al. 2013). L. ruthenicum is cultivated by asexual propagation from wild seedlings, and good cultivars are lacking. Because of its important medicinal value, the wild resources have also become endangered through overexploitation.
With the development of genomic technology and bioinformatic analysis, many genomic studies of Solanaceae plants have been reported, such as tomato. However, few studies have been reported on L. ruthenicum, and its genetic information is lacking. The lack of knowledge about the genomics, genetics and cell biology of L. ruthenicum has restricted its cultivation and the breeding of disease-resistant and new varieties.
Considering its important role in the economy and ecology, it is urgent to reveal the genetic background underlying the synthesis of its active components, as well as the genetic mechanisms related to its drought and salinity resistance. With this genetic information, the synthesis pathways and regulatory mechanisms of its specific active components can be clarified, laying the foundation for targeted genetic improvement by means of molecular biology.
In this study, we surveyed the genome size and assessed the genomic characteristics of L. ruthenicum, such as heterozygosity and repeat content, using next-generation sequencing technology. The aim was to provide evidence and guidance for a complete genome sequencing and assembly program for L. ruthenicum.
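Genome size in a survey of this kind is commonly estimated from the depth distribution of k-mers counted in the reads: the total number of k-mers is divided by the depth of the main peak. The sketch below illustrates only that arithmetic. It is an assumption that a k-mer approach of this type applies here; the function name, the error cutoff `min_depth`, and the toy histogram are hypothetical and are not taken from the study.

```python
def estimate_genome_size(kmer_histogram, min_depth=5):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    kmer_histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are ignored as likely sequencing errors; the cutoff
    is an illustrative assumption, not a value from the study.
    """
    usable = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak is the depth carrying the most distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Each distinct k-mer at depth d contributes d k-mer occurrences.
    total_kmers = sum(d * n for d, n in usable.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram (hypothetical counts, for illustration only).
toy_histogram = {1: 1000, 2: 300, 31: 40, 32: 80, 33: 100, 34: 70, 35: 30, 66: 10}
peak, size = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.1f} bp")
```

Dedicated survey tools additionally model the heterozygous and repeat peaks of the distribution, which this simple ratio does not attempt.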

Results
A total of 451,721,828 bp of raw data were generated on the Illumina HiSeq 2000 sequencing platform. The Q20 and Q30 values were 95.25% and 89.12%, respectively, and the GC content was 41.66% (Table 1). The depth distribution peaked at 33, and the estimated genome size of L. ruthenicum was 3,249.33 Mb (Table 2). The heterozygosity rate was 1.13% and the repeat rate was 73.13%. About 10,000 high-quality reads were randomly selected (5,000 read 1 and 5,000 read 2) and aligned against the NCBI nucleotide database with the Blast program; the sample was considered free from potential contamination when the reads aligned to homologous species. Blast identified Solanum lycopersicum (0.71) and Nicotiana tabacum (0.33) as the top two homologous species for L. ruthenicum.
After assembly, a total of 5,257,494 contigs were obtained, with an N50 of 1,145 bp and an N90 of 150 bp. In total, 4,596,439 scaffolds were assembled, with an N50 and N90 of 1,693 bp and 150 bp, respectively. All assembly statistics are shown in Table 3. The GC content of the assembled scaffolds was also analyzed statistically.
Among the dinucleotide repeat motifs, AT/AT (134,917; 55.12%) was the dominant type.
High heterozygosity has been regarded as a reason for fitness and ecological success (Vrijenhoek 1994) and is related to the morphological and adaptive differentiation of species. The heterozygosity rate of L. ruthenicum was 1.13%, suggesting that the structure of the L. ruthenicum genome varies greatly. We speculate that this high heterozygosity resulted from a long process of evolution and adaptation. Given the high heterozygosity, the genome is not well suited to assembly from second-generation sequencing data, and third-generation sequencing technology with longer read lengths is recommended.
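The contig and scaffold N50 and N90 values reported in the Results follow a standard definition: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of all assembled bases. A minimal sketch of that calculation is given below; the function name and the toy lengths are hypothetical and do not come from the assembly itself.

```python
def n_stat(lengths, fraction=0.5):
    """Return the Nx statistic (N50 for fraction=0.5, N90 for fraction=0.9).

    It is the length L such that sequences of length >= L together contain
    at least `fraction` of all bases in the assembly.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = fraction * sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Toy contig lengths in bp (hypothetical, for illustration only).
toy_lengths = [100, 200, 300, 400, 500]
print("N50 =", n_stat(toy_lengths, 0.5))
print("N90 =", n_stat(toy_lengths, 0.9))
```

A short N50 relative to the genome size, as seen here, indicates a highly fragmented assembly, which is consistent with the recommendation to use longer reads.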
The repeat rate of L. ruthenicum was estimated at 73.13%, which is higher than that of potato (62.2%) (Potato Genome Sequencing Consortium 2011) but lower than that of pepper (81%) (Varshney et al. 2012). According to Uozu (1997), the amount of repetitive sequence contributes to nuclear DNA content. Therefore, the different proportions of repetitive elements contribute to the genome size variation within the Solanaceae family.
Repetitive sequences play an important role in the evolutionary process; they were expanded and

Figure 1
Length distribution and percentage of microsatellites in L. ruthenicum.
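
Microsatellite counts and densities such as those reported here are typically obtained by scanning the assembled sequences for short motifs repeated in tandem. The sketch below shows one simple way to do this with regular expressions. The minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen for illustration, since the search criteria are not stated in this section, and all function names and the toy sequence are hypothetical. Unlike the AT/AT-style notation used above, the sketch does not merge a motif with its reverse complement.

```python
import re

# Minimum number of tandem repeats per motif length (1-6 bp).
# These thresholds are illustrative assumptions, not the study's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        # A k-bp motif followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. "AA" so a poly-A run is counted only as mononucleotide.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def ssr_density(n_ssrs, total_bp):
    """SSRs per megabase of sequence searched."""
    return n_ssrs / (total_bp / 1e6)


# Toy sequence (hypothetical): an (AT)8, an (A)12 and an (AGC)5 repeat.
toy_seq = "GGC" + "AT" * 8 + "CCG" + "A" * 12 + "TGT" + "AGC" * 5 + "TT"
for start, motif, count in find_ssrs(toy_seq):
    print(start, motif, count)
```

The SSR lengths (motif length times repeat count) collected this way are the kind of data summarized in a length distribution such as Figure 1.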