EST-SSR markers identification based on RNA-sequencing and application in Schisandra chinensis

Background: Schisandra chinensis, a climbing woody vine, is the best-known and most representative species of the family Schisandraceae and an important plant in Chinese herbal medicine; however, the application of molecular breeding to this species is restricted by the small number of available genetic markers. Results: In this study, we performed transcriptome sequencing of S. chinensis on the Illumina HiSeq platform to establish a library of expressed sequence tag-simple sequence repeat (EST-SSR) markers. A total of 59,786 unigenes were obtained, and 6254 putative SSR sites were detected, a frequency of 10.46%. The predominant repeat motif type was dinucleotide (35.71%), followed by trinucleotide (13.22%), hexanucleotide (0.50%), pentanucleotide (0.22%), and tetranucleotide (0.06%). We randomly selected 50 EST-SSR primer pairs and used 14 of them for genetic diversity analysis of 42 S. chinensis genotypes. All 42 accessions were successfully distinguished and formed four major clusters, indicating that these SSR markers can be used for genetic diversity analysis and genetic linkage map construction. In addition, using the polymorphic bands generated by 10 markers as DNA fingerprints, we constructed a manual cultivar identification diagram that distinguishes all 42 accessions by their polymorphic band patterns. Conclusion: S. chinensis transcriptome data are an effective resource for developing SSR markers. These results provide a basis for identifying S. chinensis accessions and constructing genetic linkage maps as part of future selective breeding and conservation efforts for this valuable plant.


Background
Schisandra chinensis (Turcz.) Baill., a member of the family Schisandraceae, is an important plant in Chinese herbal medicine [1][2]. Although only S. sphenanthera and S. chinensis are listed in the Chinese Pharmacopoeia, most Schisandra species have medicinal value [3][4]. The fruit in particular has multiple therapeutic and physiological properties, including adaptogenic, hepatoprotective, anticancer, antioxidant, and anti-inflammatory activities. These are mostly attributable to dibenzocyclooctadiene lignans, widely referred to as Schisandra lignans [5][6], and about 150 lignan derivatives with a dibenzocyclooctadiene skeleton have been identified in fruit extracts [7]. Owing to habitat loss and excessive economic exploitation, the abundance of Schisandra species has decreased markedly in recent years, to the point that they are now endangered [8].
Molecular marker-assisted selective breeding can be applied to the genetic diversification of Schisandra varieties. This requires the identification of DNA markers and the construction of linkage maps for localizing target genes. However, only a few types of markers have been used in Schisandra to date, namely inter simple sequence repeats (ISSRs) [9][10], simple sequence repeats (SSRs) [11][12], amplified fragment length polymorphisms (AFLPs) [13][14], and randomly amplified polymorphic DNA (RAPD) [15]. To expand the set of available markers, in the present study we screened EST-SSR markers for Schisandra. Our results provide a foundation for research on germplasm resources and functional gene localization, as well as for marker-assisted breeding of Schisandra.

SSR site distribution in S. chinensis
We searched 59,786 unigene sequences from the transcriptome data and detected 6254 SSR sites in 4989 sequences, giving an SSR frequency of 10.46% (6254/59,786). There were 897 unigene sequences containing two or more EST-SSR sites, and all sequences included a complex SSR site.
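The SSR search can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the search tool and its thresholds are not stated, so the minimum repeat numbers below (10 units for mononucleotide, 6 for dinucleotide, and 5 for tri- to hexanucleotide motifs, the commonly cited MISA-style defaults) are assumptions, only perfect repeats are detected, and `find_ssrs` and `summarize` are hypothetical helper names. SSR frequency is computed as SSRs per 100 unigenes, which reproduces the 10.46% reported for 6254 SSRs in 59,786 unigenes.

```python
import re

# ASSUMPTION: MISA-style minimum repeat units per motif length; the
# thresholds actually applied in this study are not stated.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already counted as 'A'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def summarize(unigenes):
    """unigenes: dict name -> sequence. Frequency = SSRs per 100 unigenes."""
    per_seq = [find_ssrs(s) for s in unigenes.values()]
    n_ssrs = sum(len(h) for h in per_seq)
    return {
        "ssrs": n_ssrs,
        "unigenes_with_ssr": sum(1 for h in per_seq if h),
        "unigenes_with_2plus": sum(1 for h in per_seq if len(h) >= 2),
        "frequency_pct": 100.0 * n_ssrs / len(unigenes),
    }


# With the counts reported above: 6254 SSRs in 59,786 unigenes.
print(round(100.0 * 6254 / 59786, 2))  # 10.46
```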
The types of EST-SSR detected in the transcriptome varied, and their frequencies differed markedly (Table 2): mono-, di-, and trinucleotide repeats were the most common, accounting for 60.06%, 31.61%, and 7.84% of all SSRs, respectively. Tetra-, penta-, and hexanucleotide repeats were rare, together accounting for 0.49% of all SSRs. SSRs with 10 repeat units were the most frequent (22.91%).
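The motif-type percentages and the most frequent repeat number summarized in Table 2 are simple tallies over the detected SSRs. The sketch below assumes each SSR is available as a hypothetical (motif, repeat_count) record; the toy records are illustrative only and are not data from this study.

```python
from collections import Counter

MOTIF_CLASS = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def motif_type_percentages(ssrs):
    """ssrs: list of (motif, repeat_count). Percent of all SSRs per motif type."""
    counts = Counter(MOTIF_CLASS[len(motif)] for motif, _ in ssrs)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def modal_repeat_number(ssrs):
    """Most frequent repeat-unit count and its share of all SSRs (%)."""
    counts = Counter(reps for _, reps in ssrs)
    reps, n = counts.most_common(1)[0]
    return reps, 100.0 * n / sum(counts.values())


# Hypothetical toy records, not data from this study.
toy = [("A", 10), ("T", 10), ("AG", 6), ("AAG", 5)]
print(motif_type_percentages(toy))  # {'mono': 50.0, 'di': 25.0, 'tri': 25.0}
print(modal_repeat_number(toy))     # (10, 50.0)
```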

Development of S. chinensis EST-SSR primer pairs and detection of polymorphisms
To obtain high-quality SSR primers capable of detecting polymorphisms, we randomly selected 50 primer pairs and evaluated them on four S. chinensis accessions (Yanhong, Zaohong, Jinwuwei, and 12-(-2)-1). Fourteen of the 50 primer pairs were effective (Additional file 1), an amplification success rate of 28%.
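The primer screen can be expressed as a filter over band calls. In this sketch the acceptance rule (every accession amplifies and at least two accessions differ in band pattern) is an assumption standing in for the "effective" criterion, whose exact definition is not given here; the primer and accession identifiers are hypothetical, and only two accessions are used for brevity. The success rate is simply the proportion of tested pairs retained, 14/50 = 28%.

```python
def screen_primer_pairs(profiles):
    """profiles: primer_id -> {accession: frozenset of band sizes (bp)}.

    ASSUMED criteria: keep a pair if every accession gave at least one band
    and the band patterns are not all identical across accessions.
    """
    kept = []
    for primer, by_accession in profiles.items():
        patterns = list(by_accession.values())
        amplified_everywhere = all(patterns)   # empty frozenset = no product
        polymorphic = len(set(patterns)) > 1   # at least two distinct patterns
        if amplified_everywhere and polymorphic:
            kept.append(primer)
    return kept


def success_rate(n_kept, n_tested):
    """Percentage of tested primer pairs that passed the screen."""
    return 100.0 * n_kept / n_tested


# Hypothetical band calls (not data from this study).
toy = {
    "p1": {"acc1": frozenset({220}), "acc2": frozenset({220, 270})},
    "p2": {"acc1": frozenset({180}), "acc2": frozenset({180})},
    "p3": {"acc1": frozenset(), "acc2": frozenset({150})},
}
print(screen_primer_pairs(toy))  # ['p1']
print(success_rate(14, 50))      # 28.0
```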

Discrimination between different S. chinensis genotypes using EST-SSR primer pairs
In the genetic diversity analysis, the 14 EST-SSR primer pairs identified above were used to differentiate the 42 S. chinensis accessions. Analysis of the genotype data with NTSYS-pc software classified the accessions into four groups at a similarity coefficient of 0.63 (Fig. 2). The dendrogram revealed clear distinctions between the accessions, reflecting high genetic diversity that can be exploited to identify S. chinensis accessions by DNA fingerprinting. Pairwise similarity coefficients among the 42 accessions ranged from 0.61 to 0.97. Group I, the largest group, comprised 28 accessions, mostly originating from Jilin, and contained four subgroups with a similarity coefficient of 0.682. Group II included four accessions, most from Heilongjiang and one from Jilin. '18-10-3' and '162-1-4' did not cluster with any other group and were designated group IV, and the remaining accessions formed group III. All 42 accessions were distinguishable with the 14 EST-SSR markers, and their clustering pattern was concordant with their geographic distribution, indicating that EST-SSR markers derived from transcriptome data can reveal the genetic relatedness of S. chinensis germplasm resources.
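The grouping in Fig. 2 was produced with NTSYS-pc. As a rough illustration only, the sketch below clusters binary band profiles with UPGMA (average linkage) and cuts the tree at a similarity of 0.63. The similarity coefficient used in the study is not stated, so the Dice coefficient here is an assumption, as are the hypothetical accessions and band identifiers; this is not NTSYS-pc code.

```python
from itertools import combinations


def dice(a, b):
    """Dice similarity of two band sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def upgma_groups(profiles, threshold):
    """Cluster accessions with UPGMA and cut the tree at `threshold`.

    profiles: accession -> set of scored bands, e.g. {("SSR30", 220), ...}.
    Merging stops once the best average-linkage similarity falls below the
    threshold, which matches cutting the (monotone) UPGMA dendrogram there.
    """
    names = list(profiles)
    sim = {frozenset(p): dice(profiles[p[0]], profiles[p[1]])
           for p in combinations(names, 2)}

    def average_similarity(c1, c2):
        # UPGMA: mean of all pairwise similarities between the two clusters.
        return sum(sim[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [[n] for n in names]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), average_similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical band profiles (integers stand in for scored bands).
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {6, 7, 8}, "D": {6, 7, 9}}
print(upgma_groups(toy, 0.63))  # [['A', 'B'], ['C', 'D']]
```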

Accession identification
There is a need for a simple, practical, and reliable method of identifying S. chinensis accessions. Of the 14 primer pairs tested, ten were required to clearly distinguish all 42 accessions (Fig. 3). The accessions were first divided into groups based on different combinations of the 220-, 270-, and 280-bp bands amplified by primer pair no. 30 (Fig. 3). The smallest group contained only two accessions, 18-10-3 and 17-N1-N1, which were further distinguished by a 240-bp band amplified by primer pair no. 11. Likewise, each of the other five groups could be resolved using the primer pairs shown in Fig. 3. Thus, all accessions could be identified with 10 primer pairs, which were used to construct a manual cultivar identification diagram (MCID).
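The MCID procedure described above, in which accessions are first split by the band pattern of one primer pair and any groups that remain are split by further primer pairs, can be sketched as follows. The accession names and band calls are hypothetical and do not reproduce the band data in Fig. 3; a missing primer entry is treated as "no band".

```python
def mcid_split(profiles, primer_order):
    """Sequentially split accessions by band pattern, MCID-style.

    profiles: accession -> {primer_id: frozenset of band sizes (bp)}.
    Returns (groups, primers_used). Each primer is applied only to groups
    that still contain more than one accession.
    """
    groups = [sorted(profiles)]
    used = []
    for primer in primer_order:
        if all(len(g) == 1 for g in groups):
            break  # every accession is already resolved
        next_groups, informative = [], False
        for g in groups:
            if len(g) == 1:
                next_groups.append(g)
                continue
            by_pattern = {}
            for acc in g:
                pattern = profiles[acc].get(primer, frozenset())
                by_pattern.setdefault(pattern, []).append(acc)
            next_groups.extend(by_pattern.values())
            informative = informative or len(by_pattern) > 1
        if informative:
            used.append(primer)
        groups = next_groups
    return groups, used


# Hypothetical band calls for four accessions and two primer pairs.
toy = {
    "acc1": {"SSR30": frozenset({220, 270}), "SSR11": frozenset({240})},
    "acc2": {"SSR30": frozenset({220}), "SSR11": frozenset({240})},
    "acc3": {"SSR30": frozenset({280}), "SSR11": frozenset({240})},
    "acc4": {"SSR30": frozenset({280}), "SSR11": frozenset()},
}
print(mcid_split(toy, ["SSR30", "SSR11"]))
```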

Discussion
S. chinensis (Turcz.) Baill., which is used to treat asthma and cough in traditional Chinese herbal medicine, is mainly distributed in northeastern China, as well as in Korea, the Russian Far East, and northern Japan [17][18][19]. However, appreciation of its medicinal value has led to over-exploitation, which together with habitat destruction has severely depleted natural S. chinensis populations. There is growing interest among herbalists and the general public in preserving natural sources of important medicinal plants, including S. chinensis [8][20][21]. Sustainable use of S. chinensis requires an understanding of its population genetic structure and diversity.
DNA markers, and SSRs in particular, are useful tools for genetic diversity analysis owing to their abundance, codominance, reproducibility, and high degree of polymorphism. A variety of markers, including rbcL, internal transcribed spacers, AFLPs, ISSRs, and SSRs, have been described in S. chinensis [11-13, 20, 22].
Dinucleotides (37.83%) and trinucleotides (14.00%) were the main SSR types in Schisandra; the predominance of dinucleotides over trinucleotides (about 2.7-fold) is consistent with the trend observed in peanut [32], precocious trifoliate orange [37], and litchi [20], although other studies have found trinucleotides to be the most common SSR type [20,23,26,28,31]. This difference may be related to the characteristics and quantity of the EST resources analyzed in each plant species. TC/GA was the most common dinucleotide repeat motif in S. chinensis, as in pigeon pea [34], but in contrast to barley, wheat, corn, sorghum, rice [28], eggplant [27], peanut [32], Miscanthus [26], thorn pear [35], and T. chinensis [30]. GAA/TTC was the most abundant trinucleotide repeat motif, as reported for pansy [38].
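Motif pairs such as TC/GA and GAA/TTC are counted as single classes because each pair describes the same repeat read on opposite strands (and, for longer motifs, from different starting bases). A minimal sketch of this grouping, assuming the common convention of naming each class by the lexicographically smallest rotation of the motif or of its reverse complement, so that TC/GA falls in class AG and GAA/TTC in class AAG:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(motif):
    """All cyclic shifts of a motif, e.g. 'GAA' -> ['GAA', 'AAG', 'AGA']."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif):
    """Representative of a repeat-motif class: the smallest rotation of the
    motif or of its reverse complement."""
    motif = motif.upper()
    return min(rotations(motif) + rotations(reverse_complement(motif)))


print(motif_class("TC"), motif_class("GA"))    # AG AG
print(motif_class("GAA"), motif_class("TTC"))  # AAG AAG
```

Tallying motifs with `collections.Counter(motif_class(m) for m in motifs)` then gives class counts in which both strand orientations are pooled.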
Of the 50 EST-SSR primer pairs that we tested, 14 produced stable, polymorphic bands of the expected size, an amplification efficiency of 28%. This is comparable to Morinda officinalis How (24%) [39], higher than in pigeon pea [34], and lower than in onion (60%) [23], thorn pear (54.76%) [35], and T. chinensis (53.23%) [30]. Using these primer pairs, the accessions could be divided into four distinct groups. Prior to this study, only nine EST-SSR markers had been reported for S. chinensis.
In this study, we used 10 EST-SSR markers to generate an MCID that distinguishes different S. chinensis accessions.

We developed specific SSR markers from the transcriptome sequencing data of S. chinensis and analyzed the frequency and distribution of the SSRs. Using these markers, we successfully analyzed the genetic diversity of S. chinensis and identified different accessions. The results demonstrate that transcriptome sequencing is an effective approach for developing molecular markers. Our work may lay a foundation for genetic diversity analysis, genetic mapping, and marker-assisted selection in S. chinensis, and may facilitate the breeding of S. chinensis as well as studies of other Schisandra species with economic and medicinal value.

Sample collection and total RNA extraction
The plant materials were obtained from the germplasm resource garden of Jilin Agricultural University [41]. RNA was visualized by 1.5% agarose gel electrophoresis.

Transcriptome analysis
Transcriptome data were obtained in 2018 by Illumina (San Diego, CA, USA) high-throughput deep sequencing. RNA was extracted by the CTAB method from six to eight S. chinensis seedlings at the true-leaf stage and reverse-transcribed into cDNA, which was sent to Biomarker Technologies (Beijing, China) for transcriptome sequencing. A total of 59,786 unigenes were assembled with Trinity [42] and used as background data for the analysis.

PCR amplification and data analysis
The 50 EST-SSR primer pairs were tested to identify those that amplified stably and detected polymorphisms. Each 16-µl PCR reaction contained 8 µl of 2× Ex Taq Master Mix, 0.8 µl of each primer, 5.4 µl of ultrapure water, and about 20 ng of DNA template. PCR amplification was carried out as previously described [10]. The products were separated on 5% polyacrylamide gels and visualized by silver staining.
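The reaction recipe can be scaled into a master mix for many tubes. In this sketch, "0.8 µl of each primer" is read as 0.8 µl each of forward and reverse primer, the template volume (not stated) is inferred as the remainder of the 16-µl total (1.0 µl), and the 10% pipetting overage is an assumption. When each tube receives a different primer pair, as in the primer screen, the primer entries would be added per tube rather than to the shared mix.

```python
# Per-reaction volumes (µl) from the text. ASSUMPTION: "each primer" means
# separate forward and reverse primers at 0.8 µl each.
REACTION_UL = {
    "2x Ex Taq Master Mix": 8.0,
    "forward primer": 0.8,
    "reverse primer": 0.8,
    "ultrapure water": 5.4,
}
TOTAL_UL = 16.0


def template_volume_ul():
    """Volume left for the ~20 ng DNA template in each reaction (inferred)."""
    return TOTAL_UL - sum(REACTION_UL.values())


def master_mix_ul(n_reactions, overage=0.10):
    """Shared-component volumes for n reactions plus an assumed pipetting overage."""
    scale = n_reactions * (1 + overage)
    return {name: round(vol * scale, 2) for name, vol in REACTION_UL.items()}


print(round(template_volume_ul(), 2))  # 1.0
print(master_mix_ul(42))
```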

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The raw RNA-seq data have been deposited in the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/bioproject/609148) under BioProject accession number PRJNA609148. Other datasets supporting the conclusions of this article are included within the article and its additional file.

The authors declare that they have no conflict of interest.