De novo transcriptome analysis and development of novel EST-SSR markers in Bergenia ciliata (Haw.) Sternb. (Saxifragaceae) through Illumina sequencing

Materials And Methods

Plant material and RNA extraction

Rhizomes of B. ciliata were collected from Galla Dhar, Rajgarh, Jammu and Kashmir (N 33°11'39.41'' E 75°22'44.74'') and planted in a glasshouse at CSIR-National Botanical Research Institute, Lucknow, India. Fresh leaf samples were collected, cleaned, frozen immediately in liquid nitrogen, and kept at -20℃ until RNA extraction. Total RNA was isolated using a commercially available QuickRNA Plant Miniprep Plus kit (ZYMO Research) as per the manufacturer’s instructions. RNA degradation and contamination were monitored on a 1% denaturing RNA agarose gel. RNA purity was checked using the NanoPhotometer® spectrophotometer (IMPLEN, CA, USA), and RNA integrity and quantity were assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA).

cDNA Library Construction And RNA-sequencing

A total of 1 µg RNA per sample was used as input material for library preparation. More than 30 ng of RNA was equally pooled from the three individuals for preparing a complementary DNA (cDNA) library. Sequencing libraries were generated using the NEBNext® Ultra™ RNA Library Preparation Kit for Illumina® (NEB, USA) as per the manufacturer’s protocol, and index codes were added to attribute sequences to each sample. mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). The first strand of cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNase H), and the second strand was synthesized subsequently using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of the 3’ ends of the DNA fragments, the NEBNext Adaptor with hairpin loop structure was ligated to prepare for hybridization.

To preferentially select cDNA fragments of 250–300 bp in length, the library fragments were purified with the AMPure XP system (Beckman Coulter, Beverly, USA). Then 3 µl of USER Enzyme (NEB, USA) was incubated with the size-selected, adaptor-ligated cDNA at 37°C for 15 min followed by 5 min at 95°C before PCR. PCR was performed with Phusion high-fidelity DNA polymerase, universal primers, and index (X) primer. Finally, PCR products were purified (AMPure XP system) and library quality was assessed (Agilent Bioanalyzer 2100 system). The clustering of the index-coded samples was performed on a cBot Cluster Generation System using the PE Cluster Kit, cBot-HS (Illumina) according to the manufacturer’s instructions. After cluster generation, the library preparations were sequenced, and paired-end reads were generated.

The raw sequence data were stored in FASTQ format. RNA isolation, Illumina library preparation, and sequencing were outsourced to Eurofins Scientific, Bengaluru, India. The raw reads of the transcriptome sequence were submitted to the Sequence Read Archive (SRA) under accession number PRJNA775824.

De novo transcriptome assembly and analysis of sequence data

De novo assembly of the transcriptome was carried out in Trinity (Grabherr et al. 2011). Gene Ontology (GO) annotation, KOG functional analysis, and pathway analysis of the unigenes were performed based on four public databases (Cai et al. 2018; Yue et al. 2017). The GO annotation and functional classification of the unigenes were carried out with the automated, high-throughput software Blast2GO (Conesa et al. 2005).

Identification And Development Of Simple Sequence Repeat Markers

The MIcroSAtellite program (MISA; http://pgrc.ipk-gatersleben.de/misa/) was used to detect SSRs in the unigenes. Simple sequence repeats consist of a repeating unit of 1–6 bp and can be of three types according to their distribution and composition (i.e., compound, pure, and interrupted). In this study, SSRs comprised at least four uninterrupted repeats of 2–6 bp. The minimum number of repeats for each unit size (unit size–minimum repeats) was: 1–10; 2–6; 3–5; 4–5; 5–5; and 6–5. SSRs occurred at a frequency of 1/kb of cDNA. PCR primers (20 bp) with a melting temperature (T_m) of 55–65 ºC were designed to produce an amplification product of 100–200 bp using the primer design tool Primer3 v 2.3.5 (Untergasser et al. 2012).
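The repeat thresholds listed above can be illustrated with a small scanner for perfect (uninterrupted) SSRs. This is only a minimal sketch and not the MISA program itself: the function names `find_ssrs` and `is_primitive` are hypothetical, compound and interrupted SSRs are ignored, and the only values taken from the text are the unit-size and minimum-repeat pairs.

```python
import re

# Unit size -> minimum number of repeats, as listed in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return perfect SSRs as (start, motif, repeat_count) tuples, sorted by start.

    Hypothetical helper for illustration; positions are 0-based.
    """
    seq = seq.upper()
    hits = []
    for unit, minimum in min_repeats.items():
        # One motif of length `unit`, then at least `minimum - 1` further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, minimum - 1))
        pos = 0
        while pos < len(seq):
            match = pattern.match(seq, pos)
            if match and is_primitive(match.group(1)):
                repeats = (match.end() - match.start()) // unit
                hits.append((match.start(), match.group(1), repeats))
                pos = match.end()  # continue after the reported repeat
            else:
                pos += 1
    return sorted(hits)


if __name__ == "__main__":
    demo = "CCGT" + "AG" * 7 + "TTGCC" + "A" * 12 + "GC" + "CTT" * 5 + "G"
    for start, motif, repeats in find_ssrs(demo):
        print(start, f"({motif}){repeats}")
```

Skipping non-primitive motifs keeps a mono-nucleotide run from being reported a second time as a di- or tri-nucleotide repeat.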

DNA Extraction, EST-SSR Amplification And SSR Marker Validation

Genomic DNA was extracted from the fresh leaves of eight B. ciliata accessions collected from eight different geographical locations (Table 1) using the CTAB method (Doyle and Doyle 1990). The SSR-PCR reactions were carried out in a 10 µL volume containing 2 µL water, 6 µL PCR Master Mix 2X (Thermo Scientific Pvt. Ltd.), 0.5 µL of each primer at a concentration of 10 µM, and 1 µL of genomic DNA (30–50 ng). The reaction conditions for the SSR primers were optimized to an initial denaturation at 95 ºC for 4 min, followed by 35 cycles of 94 ºC for 45 sec, the primer-specific annealing temperature for 45 sec, and 72 ºC for 1 min, and a final extension at 72 ºC for 5 min. The amplified products were separated on 2.5% agarose gels and stained with ethidium bromide (10 mg/mL) for visualization. A Step Up (50 bp) DNA Ladder (GeNei Laboratories Pvt. Ltd.) was used as a reference for comparing the band sizes of the amplified products. The gels were documented using a UV Tech Gel Documentation System (UK), and the gel patterns were saved as digital images for further processing.
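As a worked example of the reaction set-up above, the following sketch scales the stated per-reaction volumes to a pooled mix for one primer pair across several accessions. The component volumes come from the text; the function name `pooled_mix`, the 10% pipetting reserve, and the choice to add template DNA separately to each tube are assumptions made for illustration.

```python
# Per-reaction volumes (µL) as stated in the text; they sum to 10 µL.
PER_REACTION_UL = {
    "water": 2.0,
    "pcr_master_mix_2x": 6.0,
    "forward_primer_10uM": 0.5,
    "reverse_primer_10uM": 0.5,
    "genomic_dna": 1.0,
}


def pooled_mix(n_reactions, overage=0.10):
    """Volumes (µL) of the shared components for `n_reactions` tubes.

    Template DNA is excluded because it differs per accession.
    `overage` is an assumed pipetting reserve, not a value from the paper.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "genomic_dna"
    }


if __name__ == "__main__":
    # Eight accessions, one primer pair.
    for name, volume in pooled_mix(8).items():
        print(f"{name}: {volume} µL")
```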

Table 1

Accessions of *B. ciliata* used for validation of EST-SSRs
SN	Accession number	Locality	Latitude	Longitude	Altitude (m)
1	255845	J&K, Rajouri, Dera Gali Forest, RJR	33.58432°	74.35805°	2036
2	253516	HP, Kothi to Rahla, Kullu, KLU	32.34808°	77.22172°	2277
3	335347	UK, Dehradun, Chakarata, CHK	30.681388°	77.880828°	2097
4	335316	UK, Dehradun, Mussoorie, MSR	30.447476°	78.091282°	1902
5	335301	UK, Bageshwar, Gansi, GSI	30.04853°	79.97393°	1615
6	253459	UK, Almora, Binsar Wildlife Sanctuary, BWS	29.68313°	79.72982°	1923
7	255752	SK, East Sikkim, Penlong, PLG	27.37408°	88.62213°	1646
8	255715	WB, Darjeeling, DRG	27.05383°	88.25267°	2087

Results

Illumina sequencing and de novo assembly

Illumina sequencing of the cDNA library generated a total of 21,490,725 paired-end raw reads, which included null reads, low-quality sequences, and adapter-primer sequences. A total of 21,277,286 high-quality clean reads with 98.02% (Q20) and 93.98% (Q30) bases were obtained after a stringent quality check and data filtering. The GC content of the clean reads was 44.71%, and the total number of nucleotides was 133,174,507 (66,597,750 in transcripts + 66,576,757 in unigenes) (Table 2). A total of 65,010 unigenes were identified, with a total length of 66,576,757 bp and an average length of 1,024.10 bp. The N50 and N90 values of the assembly were 1,349 and 494 bp, respectively. Of the 65,010 unigenes, 17,527 (26.96%) had a length of 200–500 bp; 21,591 (33.21%) ranged from 500 bp to 1 kbp; 19,547 (30.06%) ranged from 1 to 2 kbp; and 6,345 (9.76%) were > 2 kbp (Fig. 1).
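For readers unfamiliar with the N50 and N90 statistics reported here, the sketch below shows the usual definition: the sequence length at which the cumulative length of the sequences, sorted from longest to shortest, first reaches 50% (or 90%) of the total assembly length. The function name `nxx` and the toy lengths are illustrative assumptions; the assembler's own statistics script may differ in detail.

```python
def nxx(lengths, fraction=0.5):
    """Length L such that sequences of length >= L hold at least `fraction` of all bases."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    toy_lengths = [80, 70, 50, 40, 30, 20]  # made-up lengths, not from the study
    print("N50:", nxx(toy_lengths, 0.5))
    print("N90:", nxx(toy_lengths, 0.9))
```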

Table 2

Statistics of the transcriptome data generated after Illumina sequencing in *B. ciliata*
Category	Items	Number
Raw reads	Total raw reads	21,490,725
Clean reads	Total clean reads	21,277,286
	Total clean nucleotides (nt)	133,174,507
	Q20 percentage	98.02%
	Q30 percentage	93.98%
	GC percentage	44.71%
Unigenes	Total sequence number	65,010
	Total sequence base	66,576,757
	Largest	6,531
	Smallest	301
	Average	1,024
	N50(bp)	1,349
	N90 (bp)	494
EST-SSR	Total number of examined sequences	65,010
	Total size of examined sequences (bp)	66,576,757
	Total number of identified SSRs	18,226
	Number of SSR-containing sequences	14,497
	Number of sequences containing more than one SSR	2,913
	Number of SSRs present in compound formation	1,468

Frequency And Distribution Of EST-SSRs In The Unigenes

Out of 65,010 unigenes, 18,226 potential EST-SSRs were identified, of which 1,468 were present in compound formation (Table 3). The SSR frequency of unigenes in B. ciliata was 28.03%. Among the 18,226 EST-SSRs, the most frequent repeat type was di-nucleotide (8,728, 47.89%), followed by mono-nucleotide (6,387, 35.04%) and tri-nucleotide repeats (2,899, 15.91%) (Table 4; Fig. 2).

Table 3

EST-SSR markers identified from *de novo* transcriptome sequencing in *B. ciliata*
Searching Items	Numbers
Total number of sequences examined	65,010
Total size of examined sequences (bp)	66,576,757
Total number of identified SSRs	18,226
Number of SSR containing sequences	14,497
Number of sequences containing more than 1 SSR	2,913
Number of SSRs present in compound formation	1,468
Mono-nucleotide	6,387
Di-nucleotide	8,728
Tri-nucleotide	2,899
Tetra-nucleotide	101
Penta-nucleotide	41
Hexa-nucleotide	70
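
The shares of each repeat type can be recomputed directly from the counts in Table 3. The sketch below is illustrative only; the dictionary keys and the helper name `share` are assumptions, while the counts are copied from the table.

```python
# SSR counts per repeat type, copied from Table 3.
SSR_COUNTS = {
    "mono": 6387,
    "di": 8728,
    "tri": 2899,
    "tetra": 101,
    "penta": 41,
    "hexa": 70,
}


def share(count, total):
    """Percentage of `count` in `total`, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 2)


if __name__ == "__main__":
    total = sum(SSR_COUNTS.values())
    print("Total SSRs:", total)
    for repeat_type, count in SSR_COUNTS.items():
        print(f"{repeat_type}: {count} ({share(count, total)}%)")
```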

Table 4

Length distributions of microsatellites in *B. ciliata* based on the number of nucleotide repeat units
Number of repeats	Mono-	Di-	Tri-	Tetra-	Penta-	Hexa-	Total	Percentage (%)
5			1,737	72	30	54	1,893	10.38
6		2,468	593	24	8	9	3,102	17.01
7		1,709	297	2		7	2,015	11.05
8		1,292	147	1	1		1,441	7.90
9		901	79	2	1		983	5.39
10	3,194	741	29				3,964	21.74
11	1,127	448	4				1,579	8.66
12	683	340	8		1		1,032	5.66
13	364	217	5				586	3.21
14	275	234					509	2.79
15	198	190					388	2.12
16	123	24					147	0.80
17	108	50					158	0.86
18	56	25					81	0.44
19	50	19					69	0.37
20	43	17					60	0.32
21	13	23					36	0.19
22	20	10					30	0.16
23	18	5					23	0.12
24	20	4					24	0.13
25	12	5					17	0.09
26	15	4					19	0.10
27	8	1					9	0.04
28	6	1					7	0.03
29	25						25	0.13
30								0
> 30	29						29	0.15
Total	6,387	8,728	2,899	101	41	70	18,226
Percentage (%)	35.04	47.89	15.91	0.55	0.22	0.38

EST-SSRs with ten tandem repeats (3,964, 21.74%) were the most common (Table 5), followed by those with six (3,102, 17.01%), seven (2,015, 11.05%), five (1,893, 10.38%), eleven (1,579, 8.66%), and eight (1,441, 7.90%) tandem repeats, while each of the remaining repeat numbers contributed < 10% of the EST-SSRs. Among the di-nucleotide repeats, AG/CT was the dominant motif (7,351; 40.33%), followed by AT/AT (1,050; 5.76%), AC/GT (301; 1.65%), and CG/CG (26; 0.14%). Among the tri-nucleotide repeats, the most abundant motif was AAG/CTT (706; 3.87%), followed by ACC/GGT (519; 2.84%), ATC/ATG (387; 2.12%), AGC/CTG (309; 1.69%), and AGG/CCT (281; 1.54%) (Table 5; Fig. 2).
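Motif labels such as AG/CT or AAG/CTT pool a motif with its cyclic rotations and with its reverse complement, so that (GA)n, (AG)n, (CT)n, and (TC)n are counted together. A minimal sketch of that grouping is given below, assuming this pooling convention applies to the tables; the function names are hypothetical and are not taken from MISA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rotations(motif):
    """All cyclic rotations of a motif, e.g. 'AAG' -> ['AAG', 'AGA', 'GAA']."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_class(motif):
    """Strand- and frame-independent label such as 'AG/CT' for a repeat motif."""
    motif = motif.upper()
    forward = min(rotations(motif))
    reverse = min(rotations(reverse_complement(motif)))
    return "/".join(sorted((forward, reverse)))


if __name__ == "__main__":
    for motif in ("GA", "TC", "TA", "TTC", "CAC", "TTTA"):
        print(motif, "->", motif_class(motif))
```

Taking the alphabetically smallest rotation on each strand gives every member of a class the same label, regardless of the strand or reading frame in which the repeat was detected.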

Table 5

Frequency and distribution of microsatellites in *B. ciliata* based on SSRs repeat motifs
Repeat motifs	Number of repeats
Repeat motifs	5	6	7	8	9	10	> 10	Total	Frequency (%)
Mono-nucleotide
A/T	—	—	—	—	—	3164	3081	6245	34.26
C/G	—	—	—	—	—	30	112	142	0.77
								6387	35.04
Di-nucleotide
AG/CT	—	2033	1471	1088	749	634	1376	7351	40.33
AT/AT	—	293	165	163	125	89	215	1050	5.76
AC/GT	—	121	70	41	25	18	26	301	1.65
CG/CG	—	21	3	—	2	—	—	26	0.14
								8,728	47.88
Tri-nucleotide
AAG/CTT	377	159	73	47	31	11	8	706	3.87
ACC/GGT	308	108	62	32	6	3	—	519	2.84
ATC/ATG	254	67	32	12	13	6	3	387	2.12
AGC/CTG	202	49	33	20	1	4	—	309	1.69
AGG/CCT	189	52	26	5	7	2	—	281	1.54
Others	407	158	71	31	21	3	6	697	3.82
								2,899	15.90
Tetra-nucleotide
AAAT/ATTT	24	—	—	—	—	—	—	24	0.13
AAAC/GTTT	11	2	—	—	—	—	—	13	0.07
AATC/ATTG	11	2	—	—	—	—	—	13	0.07
AGAT/ATCT	7	3	1	—	2	—	—	13	0.07
Others	19	17	1	1	—	—	—	38	0.21
								101	0.55
Penta-nucleotide	30	8	—	1	1	—	1	41	0.22
Hexa-nucleotide	54	9	7	—	—	—	—	70	0.38
								111	0.60
Total	1,893	3,102	2,015	1,441	983	3,964	4,828	18,226	100
Frequency (%)	10.38	17.01	11.05	7.90	5.39	21.74	26.49	100

Functional annotation of Bergenia transcriptome

De novo assembled unigenes of B. ciliata were annotated against public functional databases: Nt (non-redundant nucleotide sequences), Nr (non-redundant protein sequences), KO (KEGG Orthology), Swiss-Prot, Pfam, GO (Gene Ontology), and KOG (Eukaryotic Orthologous Groups) (Table 6; Fig. 3). The sequences were compared against these databases using BLAST and a splicing algorithm to retrieve relevant matching sequences and their associated annotations.

Table 6

Summary of functional annotation of unigenes of *B. ciliata* with seven databases
	Number of Unigenes	Percentage (%)
Annotated in NR	53577	82.41
Annotated in NT	44297	68.14
Annotated in KO	22540	34.67
Annotated in Swiss-Prot	42287	65.05
Annotated in PFAM	20609	31.7
Annotated in GO	29477	45.34
Annotated in KOG	15027	23.11
Annotated in all Databases	4954	7.62
Annotated in at least one Database (overall*)	54732	84.19
Total Unigenes	65010	100
* the number of unigenes which can be annotated with at least one functional database

Annotation Of Non-redundant And Nucleotide Database

Out of the 65,010 unigenes, 54,732 were successfully annotated in at least one database. In total, 53,577 (82.41%) unigenes showed homology with proteins in the Nr database, while 44,297 (68.14%) matched entries in the Nt database (Fig. 3). In the species distribution of the annotations of B. ciliata, Vitis vinifera (Vitaceae) showed the highest similarity (25.7%), followed by Quercus suber (Fagaceae; 7.6%), Juglans regia (Juglandaceae; 5.1%), Nelumbo nucifera (Nelumbonaceae; 3.3%), and Hevea brasiliensis (Euphorbiaceae; 3.2%) (Fig. 4).

KOG Classification, GO Annotation, KEGG Pathway And Swiss-Prot Annotation

Functional annotation in the KOG database was based on 25 functional groups, including metabolic functions, cellular structure, and signal transduction. Post-translational modification, protein turnover, and chaperones (2,266 genes, 15.07%) represented the largest group, followed by general function prediction (1,861 genes, 12.38%) and translation, ribosomal structure, and biogenesis (1,638 genes, 10.90%), while nuclear structure and cell motility were the smallest groups (Fig. 5).

Annotation in the GO database grouped 29,477 (45.34%) unigenes into three major categories, namely cellular component (51,232, 37.8%), biological process (50,744, 37.5%), and molecular function (33,274, 24.6%), with 51 subcategories (Fig. 6). Most of the unigenes in the molecular function category were assigned to binding (15,346) and catalytic activity (14,136), while metabolic process (14,927) and cellular process (14,264) were the major subcategories in the biological process category. In total, 22,540 (34.67%) unigenes were identified in the KO database and were assigned to 125 metabolic pathways.

Annotation in the KEGG database categorized the metabolic pathways into 11 main divisions. Metabolic information processing (21,717 genes) was the largest division, followed by genetic information processing (5,167), organismal systems (4,282), cellular processing (2,482), and environmental information processing (2,286). Further, 42,287 (65.05%) unigenes had matches in the Swiss-Prot database (Table 6; Fig. 7).

Development And Validation Of Novel EST-SSR Markers

A total of 96 primer pairs were synthesized and checked for amplification and polymorphism. Out of the 96 primer pairs, 37 amplified successfully, while the remaining 59 did not show any amplification even at different annealing temperatures. Among the 37 primer pairs, 32 produced amplified products of the expected size, while the remaining 5 yielded PCR products larger or smaller than expected (ESM Fig. 1). Eight individuals from eight different populations of B. ciliata were used as PCR templates; of the amplifying primer pairs, 18 were polymorphic (ESM Fig. 2; ESM Fig. 3), whereas 14 were monomorphic (Table 7).
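The primer-pair tallies above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 18 polymorphic and 14 monomorphic pairs together make up the 32 pairs that gave products of the expected size; that reading, and the names `TALLY`, `tally_is_consistent`, and `pct`, are assumptions made for illustration.

```python
# Counts taken from the text.
TALLY = {
    "tested": 96,
    "amplified": 37,
    "no_amplification": 59,
    "expected_size": 32,
    "unexpected_size": 5,
    "polymorphic": 18,
    "monomorphic": 14,
}


def tally_is_consistent(t):
    """Check that each subgroup adds up to its parent group."""
    return (
        t["amplified"] + t["no_amplification"] == t["tested"]
        and t["expected_size"] + t["unexpected_size"] == t["amplified"]
        # Assumed reading: polymorphic + monomorphic = pairs with expected-size products.
        and t["polymorphic"] + t["monomorphic"] == t["expected_size"]
    )


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    print("Tallies consistent:", tally_is_consistent(TALLY))
    print("Amplified (% of tested):", pct(TALLY["amplified"], TALLY["tested"]))
    print("Polymorphic (% of tested):", pct(TALLY["polymorphic"], TALLY["tested"]))
```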

Table 7

Characterization of the 18 novel EST-SSR polymorphic primer pairs synthesized from the transcriptome of *B. ciliata*
SSR	Repeat Type	Repeat motif	Forward Primer	Tm (ºC)	Reverse Primer	Tm (ºC)	Product Size (bp)
BC2	Di	(AG)10	CTGAGGCCAAAGAAAGTGCG	59.7	ACAAAGTCACACGGGCATCT	59.8	190–250
BC7	Di	(AG)6	ACAATCAACAAGGCATCATGC	57.7	TCCAACTTACTGGGCAGGAA	58.5	180–250
BC8	Di	(AG)7	TGGTCTGACAGTGAGTTCGC	59.9	TCGCCATCACAGAAGCCTTT	59.9	140–160
BC17	Di	(AT)6	TACAAATACACCGGTGCAGG	57.8	AAATCTGGAGGGTTGCCAGG	59.9	125–150
BC23	Di	(CT)10	TCACTCGTAAAGTCGACCCT	58.0	GGACGTCGAGCGAACAAATG	59.9	140–180
BC26	Di	(CT)11	CAGCCAGTACTCTGCCCAAA	59.9	ACTCTCCACCTCCTGACCTC	59.9	130–150
BC29	Di	(CT)6	ACGCCATTCTCACTGTACCT	58.7	TCAGCGGAGAAACAACCTCC	59.9	180–210
BC33	Di	(CT)6	CATTGTTTCCTCCGTTGCCC	59.7	CTCCGTTTGGTTCTCGGGAA	59.9	200–250
BC38	Di	(CT)8	TCGCAAACTCTCTCACTCTCC	59.4	AAACTTCAACCGCGGGATCT	59.9	140–160
BC50	Di	(GA)8	TCCTCGAGTATTTGTCGCAG	57.4	GCGTTGAGAATCATTCGCCC	59.9	140–160
BC53	Di	(GA)9	ACCGCCAAGAGCTTGATGTA	59.3	TGTTGAGTCGTTCGTCTTCC	57.8	110–150
BC58	Di	(TA)8	ACACATGTTTACACGCGCAT	59.1	GAAGTGCACCCAAAGCATGA	59.0	175–210
BC67	Tri	(AGA)5	ACCAATGTGAGGGTTCCTTCT	58.9	CCAACACACAGCAAGACAGC	59.9	160–180
BC71	Tri	(CAC)6	AGAGGCACAATGTGGAAGAGA	59.0	TTCATGTAGTCCGGCAGCTC	59.8	190–210
BC73	Tri	(GAA)5	AGTGTGGTACTCCTCGCTCT	59.9	ATCACGTCGTCGGAGAATCG	59.9	220–250
BC74	Tri	(GAC)5	GGCAAACCTCCTCCCAAGAA	59.8	TTCCCTTGCCAGTTCCTCAC	59.8	230–250
BC84	Tri	(TTC)5	GCTTGCAGTTTACACCCACA	58.9	CGCCTCCACGTCTATGTCTC	59.9	160–190
BC87	Tetra	(TTTA)5	GGAAAGGTTGGATTGCTCCC	58.8	GATCTGCTGCAGAACTGGGT	60.0	100–130

Discussion

Characterization of B. ciliata transcriptome

Next-generation sequencing has been increasingly used for transcriptome sequencing and assembly in various plants owing to its accuracy, high efficiency, high speed, and low cost (Zhou et al. 2018; Dhiman et al. 2020; Shah et al. 2020; Xie et al. 2021). EST-SSR markers have been used widely in assessing genetic diversity, DNA fingerprinting, marker-assisted breeding, and gene mapping due to their high polymorphism, codominant inheritance, and repeatability (Ercisli et al. 2011; Palumbo et al. 2018; Wang et al. 2020; Gaurav et al. 2021). The enormous amount of data generated by transcriptome sequencing provides extensive detail and is a significant source for SSR and gene discovery in several life forms (Li et al. 2019; Chabikwa et al. 2020; Saina et al. 2021). Therefore, transcriptome sequences have been successfully employed in both model and non-model plants for detection of marker-based functional variation and gene-associated genetic analysis (Klepikova et al. 2016; Zhang et al. 2020; Raizada et al. 2021). The lack of transcriptome data and EST sequences for B. ciliata prompted us to sequence its transcriptome and develop EST-SSR markers through NGS technology. A total of 21,490,725 paired-end raw reads were generated, yielding 21,277,286 high-quality clean reads with a Q20 level (base quality > 20) of 98.02% (Table 2), which confirms the sequencing quality. This is in congruence with earlier studies on Neolitsea sericea (Chen et al. 2015) and Stephanandra incisa (Zhang et al. 2021). The N50 size (1,349 bp) of the unigenes assembled in this study was smaller than that of Stephanandra incisa (7,212 bp; Zhang et al. 2021) and Paeonia cultivars (1,780 bp; He et al. 2020). However, a total of 65,010 unigenes were assembled for the transcriptome of B. ciliata, with an average length of 1,024 bp, which was larger than in earlier transcriptome studies of Cynanchum komarovii (604 bp; Ma et al. 2015), Raphanus sativus (576 bp; Wu et al. 2015), Rhododendron rex (526.74 bp; Zhang et al. 2017), and Magnolia wufengensis (695 bp; Wang et al. 2019), but shorter than in Cyamopsis tetragonoloba (1583.43 bp; Rawal et al. 2017), Phragmites karka (1354.16 bp; Nayak et al. 2020), and Lathyrus sativus (1250 bp; Hao et al. 2017). These differences could be due to a mismatch between the parameters and the assembler, as well as differences in the nature of the species (Xing et al. 2017). These unigenes were enriched for proteins that maintain the essential functions of B. ciliata. Ultimately, the large amount of transcriptome sequence generated in this study will be valuable for investigating gene function and molecular mechanisms.

The assembled unigenes of B. ciliata were successfully annotated against known public databases: Nt, Nr, KOG, KEGG, GO, Pfam, and Swiss-Prot (Table 6). These annotations may provide useful information for future molecular studies in the genus Bergenia. The limited annotation or short sequence length available for the genus Bergenia and related species in the present databases could explain why some genes did not match any functional annotation (Xing et al. 2017). In the KOG classification, the two largest groups found in this study were post-translational modification, protein turnover, chaperones, and general function prediction (Fig. 5), which was similar to the findings of Du et al. (2019) but distinct from those of Cao et al. (2018). Overall, 4,954 (7.62%) unigenes in the database had significant matches and were classified into seven main categories, including 271 KEGG pathways. In this study, the largest groups in the GO categories were metabolic process and cellular process under biological process, and cell and cell part under cellular component (Fig. 6), which was similar to Hevea brasiliensis (Li et al. 2012). These outcomes suggest that B. ciliata has active metabolic processes and can synthesize a plethora of metabolites. GO functional annotation helped to describe gene functions at the macro level and to predict the physiological role of each unigene (Kumar et al. 2014). The results revealed various molecular functions of the assembled unigenes, suggesting their involvement in diverse metabolic pathways. These findings indicate that B. ciliata makes a large investment in gene transcription control and capacity, as well as in cell maintenance and defence. In brief, the functional analysis showed that RNA-seq-based de novo transcriptome analysis of B. ciliata, a non-model organism with a complex genome, will facilitate further research on the physiology, molecular genetics, and biochemistry of B. ciliata and related species.

Frequency And Distribution Of EST-SSRs

EST-SSR markers play a significant role in the analysis of genetic diversity and population structure, development of genetic maps, marker-assisted selection, comparative genomics, breeding, and species conservation (Chen et al. 2005; Marconi et al. 2011; Ahmad et al. 2018; Sahoo et al. 2021). A total of 18,226 potential EST-SSRs were identified in 65,010 unigenes in this study, which has substantially increased the SSR resources available for marker development in B. ciliata and other closely related taxa of the family Saxifragaceae. Di-nucleotide repeats (47.88%) were the most abundant type, which corresponds with other plant species studied earlier (Chen et al., 2015; Jia et al., 2016; Biswas et al., 2020), followed by mono-nucleotide (35.04%) and tri-nucleotide (15.90%) repeats (Fig. 2; Table 5). In dicotyledonous plants, di-nucleotide repeats have been found to be the most abundant SSR repeat type (Yang et al. 2020). The over-representation of UTRs compared with open reading frames may account for the abundance of di-nucleotide repeats (Qiu et al. 2010). Additionally, AG/CT was the most common di-nucleotide repeat (40.33%).

In an mRNA population, the AG/CT motif can represent UCU and CUC codons, which translate to the Ala and Leu amino acids that are found in proteins at a higher frequency than other amino acids (Chen L. Y. et al. 2015). As a result, AG/CT motifs can be found in abundance in plant EST libraries (Morgante et al. 2002; Chen L. Y. et al. 2015). In our study, CG/CG (0.14%) was the least frequent motif. This could be due to cytosine methylation, which inhibits transcription in some plants (Chen L. Y. et al. 2015; Xing et al. 2017). AAG/CTT was the most dominant motif among the tri-nucleotide repeats (3.87%). Earlier studies on Ricinus communis, Neolitsea sericea, Sesamum indicum, and Cucumis sativus also revealed that the tri-nucleotide AAG motif is highly dominant and valuable in dicotyledonous plants (Qiu et al. 2010; Cavagnaro et al. 2010; Wei et al. 2011; Chen L. Y. et al. 2015; Zhang et al. 2021).

The EST-SSR frequency of unigenes in B. ciliata (Table 2) was higher than in Citrus reticulata (Long 2014), Quercus austrocochinchinensis (An et al. 2016), Rosa roxburghii (Yan et al. 2015), and Rhododendron fortunei (Yang et al. 2018), but lower than in Tagetes erecta (Zhang et al. 2018) and Lonicera caerulea (Zhang et al. 2016). The frequency of SSRs varies from species to species and depends on the SSR database size, database-mining tools, and search criteria used in different studies (Gao et al. 2003; Varshney et al. 2005). The large number of high-quality EST-SSRs generated in the present investigation will be helpful for better understanding genetic diversity and population structure, and in breeding programs.

EST-SSR Markers Validation

A total of 96 primer pairs were randomly selected for PCR validation, of which 37 (38.54%) successfully amplified genomic DNA of B. ciliata and 18 (18.75%) were polymorphic among the eight geographically isolated populations of B. ciliata tested. The remaining 59 (61.46%) primer pairs either failed to yield PCR products at different annealing temperatures or produced products larger than the expected size (Fig. S1).

The PCR success rate was higher in B. ciliata than the rates reported in other species such as Curcuma alismatifolia (11.33%; Taheri et al. 2019), Magnolia sinostellata (15.33%; Wang et al. 2019), and Salix psammophila (16.07%; Jia et al. 2016). However, the success rate was lower than in Sesamum indicum (80%; Wei et al. 2011) and Hevea brasiliensis (55.45%; Li et al. 2012). Thus, in this study the polymorphic ratio of EST-SSR markers was found to be high. This indicates that the newly developed EST-SSR markers will be useful for studies of population genetic structure and evolution among Bergenia species.

The transferability of EST-SSR markers among related species is high because they originate from transcribed regions of the genome and contain sequences that are conserved among homologous genes (Wu et al. 2014; Guo et al. 2014). The 18 polymorphic markers newly developed in the current study will be applied to estimate genetic diversity in the genus Bergenia and to test cross-transferability in allied species of the family Saxifragaceae. Thus, these markers will provide valuable sequence resources for the development of molecular markers in Bergenia species.

References

Ahmad A, Wang J-D, Pan Y-B, et al (2018) Development and Use of Simple Sequence Repeats (SSRs) Markers for Sugarcane Breeding and Genetic Studies. Agronomy 8:260. https://doi.org/10.3390/agronomy8110260
An M, Deng M, Zheng S-S, et al (2016) De novo transcriptome assembly and development of SSR markers of oaks Quercus austrocochinchinensis and Q. kerrii (Fagaceae). Tree Genet Genomes 12:103. https://doi.org/10.1007/s11295-016-1060-5
Asolkar LV, Kakkar KK, Chakre OJ (1992) Second supplement to glossary of Indian medicinal plants with active principles: part-1 (A-K), (1965-1981). Publications and Information Directorate, New Delhi
Biswas MK, Bagchi M, Nath UK, et al (2020) Transcriptome wide SSR discovery cross-taxa transferability and development of marker database for studying genetic diversity population structure of Lilium species. Sci Rep 10:18621. https://doi.org/10.1038/s41598-020-75553-0
Cao D, Liu Y, Ma L, et al (2018) Transcriptome analysis of differentially expressed genes involved in selenium accumulation in tea plant (Camellia sinensis). PLoS One 13:e0197506. https://doi.org/10.1371/journal.pone.0197506
Cavagnaro PF, Senalik DA, Yang L, et al (2010) Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.). BMC Genom 11:569. https://doi.org/10.1186/1471-2164-11-569
Chabikwa TG, Barbier FF, Tanurdzic M, et al (2020) De novo transcriptome assembly and annotation for gene discovery in avocado, macadamia and mango. Sci Data 7:9. https://doi.org/10.1038/s41597-019-0350-9
Chen H (2005) Development, chromosome location and genetic mapping of EST-SSR markers in wheat. Chinese Sci Bull 50:2328. https://doi.org/10.1360/982005-379
Chen L-Y, Cao Y-N, Yuan N, et al (2015) Characterization of transcriptome and development of novel EST-SSR makers based on next-generation sequencing technology in Neolitsea sericea (Lauraceae) endemic to East Asian land-bridge islands. Mol Breeding 35:187. https://doi.org/10.1007/s11032-015-0379-1
Conesa A, Götz S, García-Gómez JM, et al (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21:3674–3676. https://doi.org/10.1093/bioinformatics/bti610
Dhiman N, Kumar A, Kumar D, et al (2020) De novo transcriptome analysis of the critically endangered alpine Himalayan herb Nardostachys jatamansi reveals the biosynthesis pathway genes of tissue-specific secondary metabolites. Sci Rep 10:17186. https://doi.org/10.1038/s41598-020-74049-1
Doyle JJ, Doyle JL (1990) Isolation of plant DNA from fresh tissue. Focus 12:13–15. https://doi.org/10.1007/978-3-642-83962-7_18
Du X, Zhu X, Yang Y, et al (2019) De novo transcriptome analysis of Viola ×wittrockiana exposed to high temperature stress. PLoS One 14:e0222344. https://doi.org/10.1371/journal.pone.0222344
Ercisli S, Ipek A, Barut E (2011) SSR Marker-Based DNA Fingerprinting and Cultivar Identification of Olives (Olea europaea). Biochem Genet 49:555–561. https://doi.org/10.1007/s10528-011-9430-z
Gao L, Tang J, Li H, et al (2003) Analysis of microsatellites in major crops assessed by computational and experimental approaches. Mol Breeding 12:245–261. https://doi.org/10.1023/A:1026346121217
Gaurav AK, Namita, Raju DVS, et al (2022) Genetic diversity analysis of wild and cultivated Rosa species of India using microsatellite markers and their comparison with morphology based diversity. J Plant Biochem Biotechnol 31:61–70. https://doi.org/10.1007/s13562-021-00655-3
Grabherr MG, Haas BJ, Yassour M, et al (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652. https://doi.org/10.1038/nbt.1883
Guo R, Mao Y-R, Cai J-R, et al (2014) Characterization and cross-species transferability of EST–SSR markers developed from the transcriptome of Dysosma versipellis (Berberidaceae) and their application to population genetic studies. Mol Breeding 34:1733–1746. https://doi.org/10.1007/s11032-014-0134-z
Hao X, Yang T, Liu R, et al (2017) An RNA Sequencing Transcriptome Analysis of Grasspea (Lathyrus sativus L.) and Development of SSR and KASP Markers. Front Plant Sci 8:1873. https://doi.org/10.3389/fpls.2017.01873
He D, Zhang J, Zhang X, et al (2020) Development of SSR markers in Paeonia based on De Novo transcriptomic assemblies. PLoS One 15:e0227794. https://doi.org/10.1371/journal.pone.0227794
Hina F, Yisilam G, Wang S, et al (2020) De novo Transcriptome Assembly, Gene Annotation and SSR Marker Development in the Moon Seed Genus Menispermum (Menispermaceae). Front Genet 11:380. https://doi.org/10.3389/fgene.2020.00380
Jia H, Yang H, Sun P, et al (2016) De novo transcriptome assembly, development of EST-SSR markers and population genetic analyses for the desert biomass willow, Salix psammophila. Sci Rep 6:39591. https://doi.org/10.1038/srep39591
Kim JM, Lyu JI, Lee M-K, et al (2019) Cross-species transferability of EST-SSR markers derived from the transcriptome of kenaf (Hibiscus cannabinus L.) and their application to genus Hibiscus. Genet Resour Crop Evol 66:1543–1556. https://doi.org/10.1007/s10722-019-00817-2
Kirtikar KR, Basu BD (1935) Indian Medicinal Plants. Lalit Mohan Basu Publication. Allahabad.
Klepikova AV, Kasianov AS, Gerasimov ES, et al (2016) A high resolution map of the Arabidopsis thaliana developmental transcriptome based on RNA-seq profiling. Plant J 88:1058–1070. https://doi.org/10.1111/tpj.13312
Kumar S, Shah N, Garg V, Bhatia S (2014) Large scale in-silico identification and characterization of simple sequence repeats (SSRs) from de novo assembled transcriptome of Catharanthus roseus (L.) G. Don. Plant Cell Rep 33:905–918. https://doi.org/10.1007/s00299-014-1569-8
Kumpatla SP, Mukhopadhyay S (2005) Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species. Genome 48:985–998. https://doi.org/10.1139/g05-060
La Rota M, Kantety RV, Yu J-K, et al (2005) Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley. BMC Genom 6:23. https://doi.org/10.1186/1471-2164-6-23
Li D, Deng Z, Qin B, et al (2012) De novo assembly and characterization of bark transcriptome using Illumina sequencing and development of EST-SSR markers in rubber tree (Hevea brasiliensis Muell. Arg.). BMC Genom 13:192. https://doi.org/10.1186/1471-2164-13-192
Li Y, Jia L-K, Zhang F-Q, et al (2019) Development of EST-SSR markers in Saxifraga sinomontana (Saxifragaceae) and cross-amplification in three related species. Appl Plant Sci 7:e11269. https://doi.org/10.1002/aps3.11269
Long D (2014) Large scale development of SSR markers based on transcriptome sequencing of precocious trifoliate orange. M.Sc. dissertation, Huazhong Agricultural University, Wuhan, China
Ma X, Wang P, Zhou S, et al (2015) De novo transcriptome sequencing and comprehensive analysis of the drought-responsive genes in the desert plant Cynanchum komarovii. BMC Genom 16:753. https://doi.org/10.1186/s12864-015-1873-x
Marconi TG, Costa EA, Miranda HR, et al (2011) Functional markers for gene mapping and genetic diversity studies in sugarcane. BMC Res Notes 4:264. https://doi.org/10.1186/1756-0500-4-264
Morgante M, Hanafey M, Powell W (2002) Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet 30:194–200. https://doi.org/10.1038/ng822
Nayak SS, Pradhan S, Sahoo D, et al (2020) De novo transcriptome assembly and analysis of Phragmites karka, an invasive halophyte, to study the mechanism of salinity stress tolerance. Sci Rep 10:5192. https://doi.org/10.1038/s41598-020-61857-8
Palumbo F, Galla G, Vitulo N, et al (2018) First draft genome sequencing of fennel (Foeniculum vulgare Mill.): identification of simple sequence repeats and their application in marker-assisted breeding. Mol Breeding. https://doi.org/10.1007/s11032-018-0884-0
Pandey R, Kumar B, Meena B, et al (2017) Major bioactive phenolics in Bergenia species from the Indian Himalayan region: Method development, validation and quantitative estimation using UHPLC-QqQLIT-MS/MS. PLoS One 12:e0180950. https://doi.org/10.1371/journal.pone.0180950
Qiu L, Yang C, Tian B, et al (2010) Exploiting EST databases for the development and characterization of EST-SSR markers in castor bean (Ricinus communis L.). BMC Plant Biol 10:278. https://doi.org/10.1186/1471-2229-10-278
Raizada A, Souframanien J (2021) SNP genotyping and diversity analysis based on genic-SNPs through high resolution melting (HRM) analysis in blackgram [Vigna mungo (L.) Hepper]. Genet Resour Crop Evol 68:1331–1343. https://doi.org/10.1007/s10722-020-01064-6
Rana MS, Samant SS (2011) Diversity, indigenous uses and conservation status of medicinal plants in Manali wildlife sanctuary, North Western Himalaya. Indian J Tradit Knowl 10(3)
Rawal HC, Kumar S, Mithra SVA, et al (2017) High Quality Unigenes and Microsatellite Markers from Tissue Specific Transcriptome and Development of a Database in Clusterbean (Cyamopsis tetragonoloba, L. Taub). Genes 8:313. https://doi.org/10.3390/genes8110313
Sahoo A, Behura S, Singh S, et al (2021) EST-SSR marker-based genetic diversity and population structure analysis of Indian Curcuma species: significance for conservation. Braz J Bot 44:411–428. https://doi.org/10.1007/s40415-021-00711-1
Shah M, Alharby HF, Hakeem KR, et al (2020) De novo transcriptome analysis of Lantana camara L. revealed candidate genes involved in phenylpropanoid biosynthesis pathway. Sci Rep 10:13726. https://doi.org/10.1038/s41598-020-70635-5
Taheri S, Abdullah TL, Rafii MY, et al (2019) De novo assembly of transcriptomes, mining, and development of novel EST-SSR markers in Curcuma alismatifolia (Zingiberaceae family) through Illumina sequencing. Sci Rep 9:3047. https://doi.org/10.1038/s41598-019-39944-2
Taramino G, Tarchini R, Ferrario S, et al (1997) Characterization and mapping of simple sequence repeats (SSRs) in Sorghum bicolor. Theor Appl Genet 95:66–72. https://doi.org/10.1007/s001220050533
Tiwari V, Mahar KS, Singh N, et al (2015) Genetic variability and population structure of Bergenia ciliata (Saxifragaceae) in the Western Himalaya inferred from DAMD and ISSR markers. Biochem Sys Ecol 60:165–170. https://doi.org/10.1016/j.bse.2015.04.018
Tiwari V, Meena B, Nair NK, et al (2020) Molecular analyses of genetic variability in the populations of Bergenia ciliata in Indian Himalayan Region (IHR). Physiol Mol Biol Plants 26:975–984. https://doi.org/10.1007/s12298-020-00797-z
Untergasser A, Cutcutache I, Koressaar T, et al (2012) Primer3—new capabilities and interfaces. Nucleic Acids Res 40:e115. https://doi.org/10.1093/nar/gks596
Varshney RK, Graner A, Sorrells ME (2005) Genic microsatellite markers in plants: features and applications. Trends Biotechnol 23:48–55. https://doi.org/10.1016/j.tibtech.2004.11.005
Wang L, Gong X, Jin L, et al (2019a) Development and validation of EST-SSR markers of Magnolia wufengensis using de novo transcriptome sequencing. Trees 33:1213–1223. https://doi.org/10.1007/s00468-019-01853-2
Wang X, Chen W, Luo J, et al (2019b) Development of EST-SSR markers and their application in an analysis of the genetic diversity of the endangered species Magnolia sinostellata. Mol Genet Genomics 294:135–147. https://doi.org/10.1007/s00438-018-1493-7
Wang Y, Jia H-M, Shen Y-T, et al (2020) Construction of an anchoring SSR marker genetic linkage map and detection of a sex-linked region in two dioecious populations of red bayberry. Hortic Res 7:53. https://doi.org/10.1038/s41438-020-0276-6
Wei W, Qi X, Wang L, et al (2011) Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genom 12:451. https://doi.org/10.1186/1471-2164-12-451
Wu G, Zhang L, Yin Y, et al (2015) Sequencing, de novo assembly and comparative analysis of Raphanus sativus transcriptome. Front Plant Sci 6:198. https://doi.org/10.3389/fpls.2015.00198
Xie X, Jiang J, Chen M, et al (2021) De novo Transcriptome Assembly of Myllocerinus aurolineatus Voss in Tea Plants. Front Sustain Food Syst 5:631990. https://doi.org/10.3389/fsufs.2021.631990
Xing W, Liao J, Cai M, et al (2017) De novo assembly of transcriptome from Rhododendron latoucheae Franch. using Illumina sequencing and development of new EST-SSR markers for genetic diversity analysis in Rhododendron. Tree Genet Genomes 13:53. https://doi.org/10.1007/s11295-017-1135-y
Yan XQ, Lu M, An HM (2015) Analysis on SSR information in transcriptome and development of molecular markers in Rosa roxburghii. Acta Hortic Sin 42:341–349. https://doi.org/10.1016/j.gene.2015.02.054
Yan Z, Wu F, Luo K, et al (2017) Cross-species transferability of EST-SSR markers developed from the transcriptome of Melilotus and their application to population genetics research. Sci Rep 7:17959. https://doi.org/10.1038/s41598-017-18049-8
Yang B, Xu QW, Niu MY, et al (2018) SSR analysis and molecular marker development of the transcriptome of Rhododendron fortunei Lindl. Acta Agri Nucl Sin 32:53–63
Yang Y, He R, Zheng J, et al (2020) Development of EST-SSR markers and association mapping with floral traits in Syringa oblata. BMC Plant Biol 20:436. https://doi.org/10.1186/s12870-020-02652-5
Yue L, Twell D, Kuang Y, et al (2017) Transcriptome Analysis of Hamelia patens (Rubiaceae) Anthers Reveals Candidate Genes for Tapetum and Pollen Wall Development. Front Plant Sci 7:1991. https://doi.org/10.3389/fpls.2016.01991
Zhai L, Xu L, Wang Y, et al (2014) Novel and useful genic-SSR markers from de novo transcriptome sequencing of radish (Raphanus sativus L.). Mol Breeding 33:611–624. https://doi.org/10.1007/s11032-013-9978-x
Zhang C, Wu Z, Jiang X, et al (2021) De novo transcriptomic analysis and identification of EST-SSR markers in Stephanandra incisa. Sci Rep 11:1059. https://doi.org/10.1038/s41598-020-80329-7
Zhang H, Cong R, Wang M, et al (2018) Development of SSR molecular markers based on transcriptome sequencing of Tagetes erecta. Acta Hortic Sin 45:159–167. https://doi.org/10.16420/j.issn.0513-353x.2017-0166
Zhang K, Li Y, Zhu W, et al (2020) Fine Mapping and Transcriptome Analysis of Virescent Leaf Gene v-2 in Cucumber (Cucumis sativus L.). Front Plant Sci 11:570817. https://doi.org/10.3389/fpls.2020.570817
Zhang QT, Li XY, Yang YM, et al (2016) Analysis on SSR information in transcriptome and development of molecular markers in Lonicera caerulea. Acta Hortic Sin 43:557–563. https://doi.org/10.16420/j.issn.0513-353x.2017-0283
Zhang Y, Zhang X, Wang Y-H, et al (2017) De Novo Assembly of Transcriptome and Development of Novel EST-SSR Markers in Rhododendron rex Lévl. through Illumina Sequencing. Front Plant Sci 8:1664. https://doi.org/10.3389/fpls.2017.01664
Zhou Q, Zhou P-Y, Zou W-T, et al (2021) EST-SSR marker development based on transcriptome sequencing and genetic analyses of Phoebe bournei (Lauraceae). Mol Biol Rep 48:2201–2208. https://doi.org/10.1007/s11033-021-06228-w
Zhou S, Wang C, Frazier TP, et al (2018) The first Illumina-based de novo transcriptome analysis and molecular marker development in Napier grass (Pennisetum purpureum). Mol Breeding 38:95. https://doi.org/10.1007/s11032-018-0852-8