DOI: https://doi.org/10.21203/rs.3.rs-2145830/v1
Bergenia ciliata (Haw.) Sternb. is an important herb predominantly found in Indian Himalayan Region (IHR). It is widely used in medicines and healthcare system, cosmetics, food, fodder, and ornamental purposes. Illumina sequencing and de novo transcriptome assembly were carried out in B. ciliata to develop and identify simple sequence repeat markers for genetic diversity and conservation studies. The assembled data generated a total of 65,010 unigenes that showed significant similarities when compared with seven functional databases including 53,577 (Non-Redundant Protein Sequence Database: 82.41%), 44,297 (Nucleotide Sequence Database: 68.14%), 42,287 (Swiss Prot: 65.05%), 15,027 (Eukaryotic Orthologous Groups: 23.11%), 22,540 (KEGG Orthology: 34.67%), 29,477 (Gene Ontology: 45.34%) and 20,609 (Pfam: 31.7%) unigenes. In this study, a total of 18,226 SSRs and 14,497 SSR containing sequences were identified. Dinucleotides were found to be abundant (47.88%) in B. ciliata followed by mononucleotides (35.04%), and trinucleotides repeat (15.90%). AG/CT was the most common di-nucleotide repeat (40.33%). A total of 11,839 EST-SSR primers were designed, of which 96 primer pairs were synthesized randomly. Finally, 18 primer pairs were selected that revealed clear, distinct polymorphic bands when examined in eight diverse B. ciliata accessions. Furthermore, the transcriptome data and the EST-SSR markers will be an important resource for investigating genetic diversity in B. ciliata and other species of the family Saxifragaceae.
The genus Bergenia Moench (Saxifragaceae) consists of about 10 species of perennial rhizomatous herbs distributed mainly in the temperate and subtropical regions in Central and East Asia (Pandey et al. 2017). In India, the four species of Bergenia viz., Bergenia ciliata (Haw.) Sternb., Bergenia ligulata (Wall.) Engl., Bergenia stracheyi (Hook. & Thoms.) Engl., and Bergenia purpurascens (Hook. & Thoms.) Engl., are mainly found in the Himalayan regions (Tiwari et al. 2015). Bergenia ciliata (Haw.) Sternb. (Saxifragaceae), commonly known as ‘Pashanbheda’ is a perennial, rhizomatous herb occurs in temperate and sub-temperate Himalayan regions at an elevation of 350–4000 m from Afghanistan to eastward in Pakistan, Nepal, Bhutan and India (Ahmad et al. 2018; Tiwari et al. 2020). In India, it is found mostly in humid and moist habitats, crevices, on rock surfaces, hill slopes and near to water channels (Tiwari et al. 2020). B. ciliata is medicinally important and traditionally being used to treat several diseases like kidney stones, urinary problems, piles, boils, burns, fever, liver disorders, thirst, cuts and wounds, asthma, ophthalmia, and diarrhoea in cattle (Kirtikar and Basu 1935; Rana and Samant 2011). It is used as an important botanical source for preparation of many herbal medicines due to the presence of numerous bioactive metabolites. It is one of the significant constituents in indigenous drugs such as Calculi and Cystone used in clinically to cure urinary bladder and kidney stones (Asolkar et al. 1992). To estimate the genetic diversity and population genetic structure with high reproducibility within a large number of species, microsatellites or SSRs (Simple sequence repeats) are recognized as significant molecular markers (Taheri et al. 2019). SSRs are tandem repeats of 2–6 nucleotides widely distributed in both eukaryotic and prokaryotic genomes (Zhai et al. 2014). Microsatellites can be developed either from EST database or from coding sequences by RNA sequence technology and found to be less expensive as compared to genomic SSRs which are costly, time-consuming, and labor-intensive (Taramino et al. 1997; Rota et al. 2005). EST-SSRs are generated from the highly conserved regions of the genome, therefore, show high rates of transferability than gSSRs and are therefore considered as an efficient tool for exploring the population dynamics, evolutionary studies, comparative genetic mapping, and determination of differentially expressed genes (Hina et al. 2020). A large number of transcribed sequences with EST-SSR markers have been generated for studying cross transferability in medicinal plant species such as Saxifraga sinomontana (Li et al. 2019), Melilotus germplasm (Yan et al. 2017), Hibiscus cannabinus (Kim et al. 2019), and Phoebe bournei (Zhou et al. 2021). Earlier genetic diversity studies on B. ciliata were restricted to universal, multi-allelic, dominant, single primer amplification reaction (SPAR) methods like Inter Simple Sequence Repeat (ISSR) and Directed Amplification of Minisatellites region-DNA (DAMD). Development of SSR markers in B. ciliata will be valuable tools for studies of genetic diversity, population structure, cross-transferability, conservation biology, and plant breeding. As such there is no report available on the development of microsatellites and transcriptome analysis (RNA-seq) of B. ciliata, therefore, to the best of our knowledge, the present study seems to be the first report on de novo transcriptome analysis, and development of SSR markers based on Illumina HiSeq 4000 sequencing for exploring the population genetics of B. ciliata in India and establishing its relationship with allied species of the family Saxifragaceae.
Rhizomes of B. ciliata were collected from the Galla Dhar, Rajgarh, Jammu and Kashmir (N33°11'39.41'' E 75°22'44.74'') and planted in a glasshouse at CSIR-National Botanical Research Institute, Lucknow, India. The Fresh leaf samples were collected, cleaned, and stored immediately in liquid nitrogen and kept at -20℃ until RNA extraction. Total RNA was isolated using a commercially available QuickRNA Plant Miniprep Plus kit (ZYMO Research) as per the manufacturer’s instruction. Qualitative analysis of the isolated RNA samples, RNA degradation and contamination were monitored on 1% denaturing RNA agarose gel. RNA purity was checked using the Nano Photometer® spectrophotometer (IMPLEN, CA, USA) and RNA integrity and quantitation were assessed using RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA).
A total amount of 1µg RNA per sample was used as input material for preparation of RNA samples. More than 30 ng of RNA was equally pooled from the three individuals for preparing a complementary DNA (cDNA) library. Sequencing libraries were generated using NEBNext® Ultra™ RNA Library Preparation Kit for Illumina® (NEB, USA) as per the manufacturer’s protocol and index codes were added to attribute sequences to each sample. Purification of mRNA was done from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). The first strand of cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNase H), and the second strand was synthesized subsequently using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of 3’ends of DNA fragments, The NEBNext Adaptor with hairpin loop structure was ligated to prepare for hybridization.
To select cDNA fragments of preferentially 250 ~ 300 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter, Beverly, USA). Then 3µl of USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37°C for 15 min followed by 5 min at 95°C before PCR. PCR was performed with Phusion high-fidelity DNA polymerase, universal primers, and index (X) primer. At last, PCR products were purified (AMPure XP system) and library quality was assessed (Agilent Bioanalyzer 2100 system). The clustering of the index-coded samples was performed on a cBot Cluster Generation System using PE Cluster Kit, cBot-HS (Illumina) according to the manufacturer’s instructions. After cluster generation, the library preparations were sequenced, and paired-end reads were generated.
The sequenced raw data were stored in FASTAQ format. RNA isolation, Illumina library preparation, and sequencing were outsourced to Eurofins Scientific, Bengaluru, India. The raw reads of the transcriptome sequence were submitted to the sequence read archive (SRA) under accession number PRJNA775824.
De novo transcriptome assembly and analysis of sequence data
De novo assembly of the transcriptome was carried out in Trinity (Grabherr et al. 2011). Gene Ontology (GO) annotations as well as KOG functional analysis of the unigenes and pathway were performed based on four public databases (Cai et al. 2018; Yue et al. 2017). The GO annotation and functional classification of the unigenes were carried out in automatic and high throughput software, Blast2GO (Conesa et al. 2005).
The MIcroSAtellite program (MISA; http://pgrc.ipk-gate rsleben.de/misa/) was used to detect SSRs in the unigenes. Simple sequence repeats consist of a repeating unit of 1–6 bp, and can be of three types according to their distribution and composition (i.e., compound, pure, and interrupted). In this study, SSRs comprised at least four uninterrupted repeats of 2–6 bp. Minimum number of repeats of each unit size obtained were: 1–10; 2–6; 3–5; 4–5; 5–5 and 6 − 5. SSRs occurred at a frequency of 1/kb of cDNA. PCR primers (20 bp) with a melting temperature (Tm) of 55–65 ºC, were designed to produce an amplification product of 100–200 bp, using primer designing tool, Primer3 v 2.3.5 (Untergasser et al. 2012).
The extraction of genomic DNA from the fresh leaves of eight B. ciliata accessions collected from eight different geographical locations (Table 1.) was carried using CTAB method (Doyle and Doyle 1990). The SSR-PCR reactions were carried in 10 µL volume, containing 2 µL water, 6 µL PCR Master Mix 2X (Thermo Scientific Pvt. Ltd.), 0.5 µL of each primer at a concentration of 10 µM, and 1 µL of genomic DNA (30–50 ng). The reaction conditions for SSR primers were optimized to an initial denaturation at 95 ºC for 4 min, followed by 35 cycles of 45 sec at 94 ºC, annealing temperature for 45 sec, and 72 ºC for 1 min, and a final extension at 72 ºC for 5 min. The amplified products were separated on 2.5% agarose gels and stained with ethidium bromide (10 mg/mL) for visualization. Step Up (50 bp) DNA Ladder (GeNei Laboratories Pvt. Ltd.) was used as a reference for comparison of the band size of the amplified products. The gel was documented using UV Tech Gel Documentation System (UK) and the gel patterns were documented as digital images for further processing.
SN | Accessions number | Locality | Latitude | Longitude | Altitude (m) |
---|---|---|---|---|---|
1 | 255845 | J&K, Rajouri, Dera Gali Forest, RJR | 33.58432° | 74.35805° | 2036 |
2 | 253516 | HP, Kothi to Rahla, Kullu, KLU | 32.34808° | 77.22172° | 2277 |
3 | 335347 | UK, Dehradun, Chakarata, CHK | 30.681388° | 77.880828° | 2097 |
4 | 335316 | UK, Dehradun, Mussoorie, MSR | 30.447476° | 78.091282° | 1902 |
5 | 335301 | UK, Bageshwar, Gansi, GSI | 30.04853° | 79.97393° | 1615 |
6 | 253459 | UK, Almora, Binsar Wildlife Sanctuary, BWS | 29.68313° | 79.72982° | 1923 |
7 | 255752 | SK, East Sikkim, Penlong, PLG | 27.37408° | 88.62213° | 1646 |
8. | 255715 | WB, Darjeeling, DRG | 27.05383° | 88.25267° | 2087 |
Illumina sequencing and de novo assembly
In Illumina sequencing platform, the cDNA library was constructed using a total of 21,490,725 paired-end raw reads, which included null reads, low-quality sequences, and adapter-primer sequences. Total of 21,277,286 high-quality clean reads with 98.02% (Q20) and 93.98% (Q30) bases were obtained after a stringent quality check and data filtering. The GC percentage for the clean reads had 4.71 and clean reads were total nucleotide number of 133,174,507 (66,597,750 transcripts + 66,576,757 unigenes) (Table 2). Total unigenes identified with paired-end reads were 65,010 bp, with total length of 665,767,57 bp encompassing an average length of 1024.10 bp. From a given set of contigs, the values of N50 and N90 were generated as 1349 and 494 bp, respectively. In the 65,010 unigenes, 17,527 unigenes (26.96%) had a length of 200-500bp; 21,591 unigenes (33.21%) ranged from 500-1 kbp; 19,547 bp unigenes (30.06%) ranged from 1kbp-2kbp, and the length of 6,345 unigenes (9.76%) ranged > 2kbp (Fig. 1).
Category | Items | Number |
---|---|---|
Raw reads | Total raw read | 21,490,725 |
Clean reads | Total clean reads | 21,277,286 |
Total clean nucleotides (nt) | 133,174,507 | |
Q20 percentage | 98.02% | |
Q30 percentage | 93.98% | |
GC percentage | 44.71% | |
Unigenes | Total sequence number | 65,010 |
Total sequence base | 665,767,57 | |
Largest | 6,531 | |
Smallest | 301 | |
Average | 1,024 | |
N50(bp) | 1,349 | |
N90 (bp) | 494 | |
EST-SSR | Total number of examined sequences | 65,010 |
Total size of examined sequences (bp) | 665,767,57 | |
Total number of identified SSRs | 18,226 | |
Number of SSR-containing sequences | 14,497 | |
Number of sequences containing more than one SSR | 2,913 | |
Number of SSRs present in compound formation | 1,468 |
Out of 65,010 unigenes, 18,226 potential EST-SSRs were identified, and 1,468 compound microsatellites obtained from the 18,226 EST-SSRs (Table 3). The SSR frequency of unigenes in B. ciliata was 28.03%. A total of 18,226 EST-SSRs were identified, wherein the most prominent type repeat was dinucleotides (8,728, 47.89%), followed by mono (6,327, 35.04%) and trinucleotide repeats (2,899, 15.91%) (Table 4; Fig. 2).
Searching Items | Numbers |
---|---|
Total number of sequences examined | 65,010 |
Total size of examined sequences (bp) | 66,576,757 |
Total number of identified SSRs | 18,226 |
Number of SSR containing sequences | 14,497 |
Number of sequences containing more than 1 SSR | 2,913 |
Number of SSRs present in compound formation | 1,468 |
Mono-nucleotide | 6,387 |
Di-nucleotide | 8,728 |
Tri-nucleotide | 2,899 |
Tetra-nucleotide | 101 |
Penta-nucleotide | 41 |
Hexa-nucleotide | 70 |
Number of repeats | Mono- | di- | Tri- | tetra | Penta- | Hexa- | Total | Percentage (%) |
---|---|---|---|---|---|---|---|---|
5 | 1,737 | 72 | 30 | 54 | 1,893 | 10.38 | ||
6 | 2,468 | 593 | 24 | 8 | 9 | 3,102 | 17.01 | |
7 | 1,709 | 297 | 2 | 7 | 2,015 | 11.05 | ||
8 | 1,292 | 147 | 1 | 1 | 1,441 | 7.90 | ||
9 | 901 | 79 | 2 | 1 | 983 | 5.39 | ||
10 | 3,194 | 741 | 29 | 3,964 | 21.74 | |||
11 | 1,127 | 448 | 4 | 1,579 | 8.66 | |||
12 | 683 | 340 | 8 | 1 | 1,032 | 5.66 | ||
13 | 364 | 217 | 5 | 586 | 3.21 | |||
14 | 275 | 234 | 509 | 2.79 | ||||
15 | 198 | 190 | 388 | 2.12 | ||||
16 | 123 | 24 | 147 | 0.80 | ||||
17 | 108 | 50 | 158 | 0.86 | ||||
18 | 56 | 25 | 81 | 0.44 | ||||
19 | 50 | 19 | 69 | 0.37 | ||||
20 | 43 | 17 | 60 | 0.32 | ||||
21 | 13 | 23 | 36 | 0.19 | ||||
22 | 20 | 10 | 30 | 0.16 | ||||
23 | 18 | 5 | 23 | 0.12 | ||||
24 | 20 | 4 | 24 | 0.13 | ||||
25 | 12 | 5 | 17 | 0.09 | ||||
26 | 15 | 4 | 19 | 0.10 | ||||
27 | 8 | 1 | 9 | 0.04 | ||||
28 | 6 | 1 | 7 | 0.03 | ||||
29 | 25 | 25 | 0.13 | |||||
30 | 0 | |||||||
< 30 | 29 | 29 | 0.15 | |||||
Total | 6,387 | 8,728 | 2,899 | 101 | 41 | 70 | 18,226 | |
Percentage (%) | 35.04 | 47.89 | 15.91 | 0.55 | 0.22 | 0.38 |
The ten tandem repeats of EST-SSR (3,964, 21.74%) were found to be the most common (Table 5), followed by six (3,102, 17.01%), seven tandem repeats (2,015, 11.05%), and five, eleven, and eight tandem repeats were 1,893 (10.38%), 1,579 (8.66%) and 1,441 (7.90%), respectively, while the remaining of tandem repeat for individual contributed < 10% of EST-SSR. The AG/CT was most dominant motif (7,351; 40.33%) followed by AT/AT (1050; 5.76%), AC/GT (301; 1.65%), and CG/CG (26; 0.14%) in the di-nucleotide repeats. The most abundant repeat motif was AAG/CTT (706; 3.87%) followed by ACC/GGT (519; 2.84%), ATC/ATG (387; 2.12%), AGC/CTG (309; 1.69%), and AGG/CCT (281; 1.54%) in the tri-nucleotide repeats (Table 5; Fig. 2).
Repeat motifs | Number of repeats | ||||||||
---|---|---|---|---|---|---|---|---|---|
5 | 6 | 7 | 8 | 9 | 10 | > 10 | Total | Frequency (%) | |
Mono-nucleotide | |||||||||
A/T | — | — | — | — | — | 3164 | 3081 | 6245 | 34.26 |
C/G | — | — | — | — | — | 30 | 112 | 142 | 0.77 |
6387 | 35.04 | ||||||||
Di-nucleotide | |||||||||
AG/CT | — | 2033 | 1471 | 1088 | 749 | 634 | 1376 | 7351 | 40.33 |
AT/AT | — | 293 | 165 | 163 | 125 | 89 | 215 | 1050 | 5.76 |
AC/GT | — | 121 | 70 | 41 | 25 | 18 | 26 | 301 | 1.65 |
CG/CG | — | 21 | 3 | — | 2 | — | — | 26 | 0.14 |
8,728 | 47.88 | ||||||||
Tri-nucleotide | |||||||||
AAG/CTT | 377 | 159 | 73 | 47 | 31 | 11 | 8 | 706 | 3.87 |
ACC/GGT | 308 | 108 | 62 | 32 | 6 | 3 | — | 519 | 2.84 |
ATC/ATG | 254 | 67 | 32 | 12 | 13 | 6 | 3 | 387 | 2.12 |
AGC/CTG | 202 | 49 | 33 | 20 | 1 | 4 | — | 309 | 1.69 |
AGG/CCT | 189 | 52 | 26 | 5 | 7 | 2 | — | 281 | 1.54 |
Others | 407 | 158 | 71 | 31 | 21 | 3 | 6 | 697 | 3.82 |
2,899 | 15.90 | ||||||||
Quad—nucleotide | |||||||||
AAAT/ATTT | 24 | — | — | — | — | — | — | 24 | 0.13 |
AAAC/GTTT | 11 | 2 | — | — | — | — | — | 13 | 0.07 |
AATC/ATTG | 11 | 2 | — | — | — | — | — | 13 | 0.07 |
AGAT/ATCT | 7 | 3 | 1 | — | 2 | — | — | 13 | 0.07 |
Others | 19 | 17 | 1 | 1 | — | — | — | 38 | 0.21 |
101 | 0.55 | ||||||||
Penta-nucleotide | 30 | 8 | — | 1 | 1 | — | 1 | 41 | 0.22 |
Hexa-nucleotide | 54 | 9 | 7 | — | — | — | — | 70 | 0.38 |
111 | 0.60 | ||||||||
Total | 1,893 | 3,102 | 2,015 | 1,441 | 983 | 3,964 | 4,829 | 18227 | 100 |
Frequency (%) | 10.38569 | 17.01871 | 11.05503 | 7.905854 | 5.393098 | 21.74796 | 26.49366 | 100 |
Functional annotation of Bergenia transcriptome
De novo assembled unigenes of B. ciliata were annotated against the functional public databases; Nt (Non-redundant nucleotide sequence), Nr (Non-redundant protein sequence), KO (KEGG Orthology), Swiss-Prot, Pfam, GO (Gene Ontology), and KOG (Eukaryotic Orthologous Group) databases (Table 6; Fig. 3). All the sequences were generated by Blast and splicing algorithm, which was applied for the comparison and to obtain a relevant sequence and associating annotation.
Number of Unigenes | Percentage (%) | |
---|---|---|
Annotated in NR | 53577 | 82.41 |
Annotated in NT | 44297 | 68.14 |
Annotated in KO | 22540 | 34.67 |
Annotated in Swissport | 42287 | 65.05 |
Annotated in PFAM | 20609 | 31.7 |
Annotated in GO | 29477 | 45.34 |
Annotated in KOG | 15027 | 23.11 |
Annotated in all Databases | 4954 | 7.62 |
Annotated in at least one Database (overall*) | 54732 | 84.19 |
Total Unigenes | 65010 | 100 |
* the number of unigenes which can be annotated with at least one functional database |
Out of the 65,010 unigenes, 54,732 unigenes were successfully annotated. However, 53,577 (82.41%) unigenes showed efficient homology with the proteins in the Nr database, while 44,297 (68.14%) of the control sequences relate with the Nt database entries (Fig. 3). As for the species distribution annotations of B. ciliata, Vitis vinifera (Vitaceae) has the highest similarity score (25.7%, followed by Quercus suber (Fagaceae) 7.6%, Juglans regia (Juglandaceae) 5.1%, Nelumbo nucifera (Nelumbonaceae) 3.3% and Hevea brasiliensis (Euphorbiaceae) 3.2% (Fig. 4).
Functional annotation in the KOG database was based on 25 functional groups, including metabolic functions, cellular structure, and signal transduction. The post-translational modification, protein turnover and chaperones (2266 genes, 15.07%) represents the largest group, followed by general function prediction (1861 genes, 12.38%), translation, ribosomal structure, and biogenesis (1638 genes, 10.90%), and nuclear structures and cell motility as the smallest groups (Fig. 5).
Annotation in GO database grouped 29,477 (45.34%) unigenes into three major categories such as cellular component (51232, 37.8%), biological process (50744, 37.5%), and molecular functions (33274, 24.6%), with 51 subcategories (Fig. 6). Most of the unigenes in the molecular function are specified for binding (15346) and catalytic activity (14136), while metabolic process (14927) and cellular process (14264) are the major subcategories in the biological process. In total, 22,540 (34.67%) of unigenes were identified in the database, which were significantly assigned to 125 metabolic pathways.
Annotation in KEGG database, categorized metabolic pathways into 11 main divisions. Metabolic information processing (with 21717 genes) found to be the largest division, followed by genetic information processing (5,167), organismal systems (4,282), cellular processing (2,482), and environmental information processing (2,286) Further, in Swiss-Prot database 42,287 (65.05%) unigenes were found matched (Table 6; Fig. 7).
Total 96 primer pairs were synthesized and checked for their amplification and polymorphism. Out of 96 primer pairs, 37 were successfully amplified, while the remaining 59 primers did not show any amplification even at different annealing temperatures. Among the 37 primer pairs, 32 successfully produced the desired amplified products, while the remaining 5 PCR products were larger or smaller than the expected size (ESM Fig. 1). Total of eight individuals from eight different populations of B. ciliata were used as PCR templates, and from 37 primers, 18 primer pairs were found polymorphic (ESM Fig. 2; ESM Fig. 3), whereas 14 pairs were identified as monomorphic (Table 7).
SSR | Repeat Type | Repeat motif | Forward Primer | Tm (ºC) | Reverse Primer | Tm (ºC) | Product Size (bp) |
---|---|---|---|---|---|---|---|
BC2 | Di | (AG)10 | CTGAGGCCAAAGAAAGTGCG | 59.7 | ACAAAGTCACACGGGCATCT | 59.8 | 190–250 |
BC7 | Di | (AG)6 | ACAATCAACAAGGCATCATGC | 57.7 | TCCAACTTACTGGGCAGGAA | 58.5 | 180–250 |
BC8 | Di | (AG)7 | TGGTCTGACAGTGAGTTCGC | 59.9 | TCGCCATCACAGAAGCCTTT | 59.9 | 140–160 |
BC17 | Di | (AT)6 | TACAAATACACCGGTGCAGG | 57.8 | AAATCTGGAGGGTTGCCAGG | 59.9 | 125–150 |
BC23 | Di | (CT)10 | TCACTCGTAAAGTCGACCCT | 58.0 | GGACGTCGAGCGAACAAATG | 59.9 | 140–180 |
BC26 | Di | (CT)11 | CAGCCAGTACTCTGCCCAAA | 59.9 | ACTCTCCACCTCCTGACCTC | 59.9 | 130–150 |
BC29 | Di | (CT)6 | ACGCCATTCTCACTGTACCT | 58.7 | TCAGCGGAGAAACAACCTCC | 59.9 | 180–210 |
BC33 | Di | (CT)6 | CATTGTTTCCTCCGTTGCCC | 59.7 | CTCCGTTTGGTTCTCGGGAA | 59.9 | 200–250 |
BC38 | Di | (CT)8 | TCGCAAACTCTCTCACTCTCC | 59.4 | AAACTTCAACCGCGGGATCT | 59.9 | 140–160 |
BC50 | Di | (GA)8 | TCCTCGAGTATTTGTCGCAG | 57.4 | GCGTTGAGAATCATTCGCCC | 59.9 | 140–160 |
BC53 | Di | (GA)9 | ACCGCCAAGAGCTTGATGTA | 59.3 | TGTTGAGTCGTTCGTCTTCC | 57.8 | 110–150 |
BC58 | Di | (TA)8 | ACACATGTTTACACGCGCAT | 59.1 | GAAGTGCACCCAAAGCATGA | 59.0 | 175–210 |
BC67 | Tri | (AGA)5 | ACCAATGTGAGGGTTCCTTCT | 58.9 | CCAACACACAGCAAGACAGC | 59.9 | 160–180 |
BC71 | Tri | (CAC)6 | AGAGGCACAATGTGGAAGAGA | 59.0 | TTCATGTAGTCCGGCAGCTC | 59.8 | 190–210 |
BC73 | Tri | (GAA)5 | AGTGTGGTACTCCTCGCTCT | 59.9 | ATCACGTCGTCGGAGAATCG | 59.9 | 220–250 |
BC74 | Tri | (GAC)5 | GGCAAACCTCCTCCCAAGAA | 59.8 | TTCCCTTGCCAGTTCCTCAC | 59.8 | 230–250 |
BC84 | Tri | (TTC)5 | GCTTGCAGTTTACACCCACA | 58.9 | CGCCTCCACGTCTATGTCTC | 59.9 | 160–190 |
BC87 | Tri | (TTTA)5 | GGAAAGGTTGGATTGCTCCC | 58.8 | GATCTGCTGCAGAACTGGGT | 60.0 | 100–130 |
Characterization of B. ciliata transcriptome
Next-generation sequencing has been gradually used to analyze the transcriptome sequencing and assembly in various plants due to the accuracy, high efficiency, high speed, and low cost (Zhou et al. 2018; Dhiman et al. 2020; Shah et al. 2020; Xie et al. 2021). EST-SSR markers have been used widely in assessing genetic diversity, DNA fingerprinting, marker-assisted breeding, and gene mapping due to their high polymorphism, codominant inheritance, and repeatability (Ercisli et al. 2011; Palumbo et al. 2018; Wang et al. 2020; Gaurav et al. 2021). An enormous data generated by transcriptome sequencing delivers extensive details and a significant source for the SSRs and gene development in several life forms (Li et al. 2019; Chabikwa et al. 2020; Saina et al. 2021). Therefore, transcriptome sequences have been successfully employed in both model and non-model plants for detection of marker-based functional variation and gene-associated genetic analysis (Klepikova et al. 2016; Zhang et al. 2020; Raizada et al. 2021). The lacks information on transcriptome data and EST sequences prompted us to sequence the transcriptome, and eventually develop the EST-SSR markers through NGS technology in B. ciliata. A total of 21,490,725 paired-end raw reads were constructed, and 21,277,286 best-quality clean reads were generated with 98.02% Q20 level (base quality > 20) during the present study (Table 2), which assures the sequencing quality. This is in congruence with the earlier studies on Neolitsea sericea (Chen et al. 2015), and Stephanandra incisa (Zhang et al. 2021). The mean N50 sizes (1,349 bp) of unigenes constructed in this study were found smaller than Stephanandra incisa (7,212 bp; Zhang et al. 2021) and Paeonia cultivars (1,780bp; He et al. 2020). However, total of 65,010 unigenes were assembled to transcriptome of B. ciliata with the average length of 1,024 bp, which was significantly larger than earlier reported transcriptome studies of Cynanchum komarovii (604 bp; Ma et al. 2015), Raphanus sativus (576 bp; Wu et at. 2015), Rhododendron rex (526.74 bp; Zhang et al. 2017), and Magnolia wufengensis (695 bp; Wang et al. 2019), but shorter than Cyamopsis tetragonoloba (1583.43 bp; Rawal et al. 2017), Phragmites karka (1354.16 bp; Nayak et al. 2020), and Lathyrus sativus (1250 bp; Hao et al. 2017). This could be due to a mismatch between the parameters and the assembler, as well as differences in the nature of the species (Xing et al. 2017). These unigenes were enriched for proteins that maintain the essential functions of B. ciliata. Ultimately, the massive amount of transcriptome sequences generated in this study will be further significant for investigating gene function and molecular mechanisms.
The assembled unigenes of B. ciliata were annotated to known public databases successfully; Nt, Nr, KOG, KEGG, GO, Pfam, and Swiss-Prot (Table 6). These annotations may provide useful facts for the future molecular studies in the genus Bergenia. The limited annotation or short sequence length of the genus Bergenia and related species in the present database could be the reason for the genes which did not match any functional annotations (Xing et al. 2017). In the KOG classification, the first and second groups found in this study were the post-translational modification, protein turnover chaperones, and general function prediction (Fig. 5), which were similar to the studies performed by Du et al. (2019) but distinct from the work of Cao et al. (2018). Overall, 4,954 (7.62%) unigenes in the database had significant matches and were classified into seven main categories, including 271 KEGG pathways. In this study the biggest group in three GO categories were metabolic process, and cellular process under the biological process and the cell and cell part in cellular component (Fig. 6), which were similar to Hevea brasiliensis (Li et al. 2012). These outcomes revealed that B. ciliata has active metabolic processes and can synthesize plethora of metabolites. GO functional annotation helped to describe the macro level of gene functions and predict the physiological role of each unigene (Kumar et al. 2014). The results revealed various molecular functions of assembled unigenes, suggesting their involvement in diverse metabolic pathways. These findings indicate that B. ciliata makes a huge investment in gene transcription control and capacity, as well as cell maintenance and defence. Briefly, functional analysis revealed that RNA-seq-based de novo transcriptome analysis for B. ciliata a non-model organism with a complex genome, will facilitate further research on the physiology, molecular genetics, and biochemistry of B. ciliata or related species.
EST-SSR markers play a significant role in the genetic diversity and population structure analysis, development of genetic maps, marker-assisted selection, genomics comparison, breeding, and species conservation (Chen et al. 2005; Marconi et al. 2011; Ahmad et al. 2018; Sahoo et al. 2021). Total of 18,226 potential EST-SSRs were recognized to 65,010 unigenes in this study, which has significantly enhanced the SSR resources availability for marker development in the B. ciliata and other closely related taxa of the family Saxifragaceae. The di-nucleotide (47.88%) repeats were the most abundant type, which are corresponding with other species of the plants studied earlier (Chen et al., 2015; Jia et al., 2016; Biswas et al., 2020), followed by mono- nucleotide (35.04%) and tri-nucleotide (15.90%) repeats (Fig. 2; Table 5). In dicotyledonous plants, di-nucleotide repeats have been found to be the most abundant SSR repeat type (Yang et al. 2020). Overexpression of UTRs as compared to open reading frames may lead to di-nucleotide repeat sequence (Qiu et al. 2010). Additionally, AG/CT was the most common di-nucleotide repeat (40.33%).
In an mRNA population, the AG/CT motif can reveal UCU and CUC codons, which translate to the Ala and Leu amino acids, are found in proteins at a higher frequency than other amino acids and can be identified in an mRNA population (Chen L. Y. et al. 2015). As a result, AG/CT motifs can be found in abundance in plant EST libraries (Morgante et al. 2002; Chen L. Y. et al. 2015). In our study, the CG/CG (0.14%) was the lowest prominent motif. It could be due to cytosine methylation, which inhibits the transcription in some plants (Chen L. Y. et al. 2015; Xing et al. 2017). The AAG/CTT repeat motif was the most dominant in tri-nucleotide repeat (3.87%). The earlier studies on Ricinus communis, Neolitsea sericea, Sesamum indicum, and Cucumis sativus, also revealed the tri-nucleotide AAG motif is highly dominant and valuable in dicotyledonous plants (Qiu et al. 2010; Cavagnaro et al. 2010; Wei et al. 2011; Chen L. Y. et al. 2015; Zhang et al. 2021).
EST-SSRs frequency of unigenes in B. ciliata was (Table 2) is significantly higher than Citrus reticulata (Long 2014), Quercus austrocochinchinensis (An et al. 2016), Rosa roxburghii (Yan et al. 2015), Rhododendron fortune (Yang et al. 2018), but lower than Tagetes erecta (Zhang et al. 2018), and Lonicera caerulea (Zhang et al. 2016). The frequency of SSRs, diversified from species to species depends upon the SSR database size, database-mining tools, and the search criteria, in different studies (Gao et al. 2003; Varshney et al. 2005). A significant number of high-quality EST-SSRs generated during present investigation will be immensely helpful in better understanding of the genetic diversity, population structure, and in breeding programs.
A total 96 primer pairs were randomly selected for PCR validation, out of which 37 (38.54%) were successfully amplified by genomic DNA of B. ciliata, 18 (18.75%) primer pairs were polymorphic among the eight geographically isolated populations of B. ciliata tested, and the remaining 59 (61.45%) primer pairs failed to amplify the PCR products at different annealing temperatures or else that were formed bigger than expected size of PCR products (Fig. S1).
The PCR success rate was higher in B. ciliata than the rates reported in other species like Curcuma alismatifolia (11.33%; Taheri et al. 2019), Magnolia sinostellata (15.33%; Wang et al. 2019), Salix psammophila (16.07%; Jia et al. 2016). However, the success rate was found lower than Sesamum indicum (80%; Wei et al. 2011) and Hevea brasiliensis (55.45%; Li et al. 2012). Thus, in this study the polymorphic ratio of EST-SSR markers was found significantly high. This indicates that newly developed EST-SSR markers will be significant and useful for the population genetic structure and evolutionary studies among the Bergenia species.
The transferability of EST-SSRs markers among related species is higher because it contains conserved sequences among homologous genes and their origin from the transcribed regions in genomes (Wu et al. 2014; Guo et al. 2014). In the current study, the 18 newly developed polymorphic markers will be applied to estimate genetic diversity in the genus Bergenia and cross transferability in allied species of the family Saxifragaceae. Thus, these markers will be helpful in providing valuable sequence resources for the development of molecular markers in Bergenia species.
Complete transcriptomic information obtained from B. ciliata, was useful in mining SSR markers, thus providing a large and well-assembled transcript dataset. A total 65,010 unigenes with the mean length of 1,024 bp were generated, 54,732 unigenes were efficiently annotated into the Nr (53,577: 82.41%), Nt (44,297: 68.14%), Pfam (20,609: 31.7%), Swiss-Prot (42,287: 65.05%), KEGG (22,540: 34.67%), KOG (15,027: 23.11%), and GO (29477: 45.34%) databases. Based on these unigene sequences, 18,226 potential EST-SSRs were identified and developed. The ninety-six primer pairs were selected randomly, synthesized, and utilized for the evaluation of polymorphism. Eighteen primer pairs (18.75%) were found polymorphic and showed polymorphism among the eight individuals of eight populations. The present study has provided a rich dataset of B. ciliata transcripts, that will enrich the available genomic resources and serve as an important resource for future genomic and genetic studies. These outcomes would also be helpful in exploring the biosynthetic pathways and regulatory mechanisms of active phytoconstituents of therapeutic value present in B. ciliata.
Acknowledgements
The authors are thankful to the Director, CSIR-National Botanical Research Institute, Lucknow for facilities. The Science and Engineering Research Board (SERB), Government of India, New Delhi (SERB No: EEQ/2016/000096) is thankfully acknowledged for financial support, and we also thank the Forest Departments of Jammu and Kashmir, Himachal Pradesh, and Uttarakhand for permission to collect the representative samples of B. ciliata from different forest locales. This manuscript has the institutional communication number: CSIR-NBRI_MS/2022/05/10.
Author contributions
Tikam Singh Rana conceived and designed the study. Harish Chandra Singh and Vandana Tiwari conducted the experiments and wrote the first draft of the manuscript. Tikam Singh Rana and Avinash Tiwari finalized the manuscript. All authors read and approved the manuscript.
Availability of data and materials
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive of NCBI under accession number PRJNA775824.
Funding
This research was partially funded by the Science and Engineering Research Board (SERB), Government of India, New Delhi (SERB No: EEQ/2016/000096).
Conflict of interest
The authors declare that they have no conflict of interest.