In vitro propagation of W. coagulans
Multiple shoot morphogenesis from the nodal explants with an average of 50 shoots/explant and 100% regeneration efficiency was noted. Callus formation at the cut end and cluster of shoots at each axil of the nodal segments was observed (Figure 1 ab). Healthy and elongated shoots when transferred to rooting medium resulted in formation of dense network of thin and robust roots (Figure 1 c).
RNA isolation and cDNA library preparation
Both leaf (L1 & L2) and root (R1 & R2) tissues yielded ~10 µg and ~6 µg total RNA, respectively. Leaf and root paired-end libraries were prepared using Illumina library preparation kit and its profiling was done using DNA HS Chip. The average size of leaf and root cDNA libraries were 330 bp and 370 bp, respectively. These libraries were then sequenced on Illumina platform using 2x150 bp PE chemistry.
Sequencing, de novo assembly and unigene identification
Sequencing of leaf (WcL) and root (WcR) libraries generated 58,878,942 and 42,683,928 raw reads respectively. The raw read sequences were submitted to NCBI sequence read archive (SRA) and were published with SRX9006975 (leaf) and SRX9006976 (root) accession numbers. Average 8.08 GB (leaf) and 6.35 GB (root) of raw reads were obtained, which were further filtered for HQ reads followed by de novo assembly. De novo assembly produced 292,074 (WcL) and 16,474 (WcR) transcripts with 410 bp (WcL) and 265.58 bp (WcR) average transcript length. Transcripts length of leaf assembly ranged from 201 bp – 57,369 bp, while that of root assembly ranged from 401 bp – 4637 bp (Table 1). Transcript length distribution pattern indicated that majority of the leaf (~ 61%) and root (~85%) transcripts ranged between 200 – 300 bp (Fig. 2 a).
CD-HIT-EST was used for identification of unigenes from the assembled HQ reads. Total 267,119 and 15,758 unigenes with 392 bp and 265.24 bp average length were identified from leaf and root transcripts, respectively (Table 1). Unigene length distribution pattern for both tissues are represented in Fig. 2 b.
Functional characterization of unigenes
For functional annotation, all the predicted unigenes were aligned against different protein databases using BLASTx program. No significant variations were obtained in the annotation pattern of the replicates. Among total unigenes annotated (18,8475 – WcL; 9,519 - WcR) using different protein databases, 11,1785 (WcL) and 4,278 (WcR) were annotated using only one database, such that matches for 99% of both WcL and WcR were found only in Nr database, while that for 0.04%, 0.15% and 0.08% of WcL and 0.4%, 0.11%, and 0.09% of WcR were found only in Uniprot, Pfam and KOG databases, respectively (Fig. 3 ab).
The BLAST hits were obtained for 188,174 (70.44%) leaf and 9,448 (59.9%) root unigenes against NCBI Nr protein database, out of which 50.04% (leaf) and 39.65% (root) showed ≥ 99% similarity with known proteins. In contrast, BLAST annotation with Uniprot and Pfam returned matches for 32,976 and 53,908 unigenes, respectively. Lower proportion of leaf unigenes (29.8%) and approximately equivalent proportion of root unigenes (46.2%) were collectively annotated using Uniprot and Pfam (Fig. 3ab). The unigenes were further classified into different protein families based on their annotations derived from Pfam database. Distribution pattern of unigenes across different protein families was determined by identifying number of unigenes coding for similar proteins, and it was revealed that Pkinase, ACR_tran, ABC_tran, Pkinase_Tyr were the most abundant domains in leaf transcriptome, while HSP70, ACR_tran, ABC_tran, GTP_EFTU were most abundant in root transcriptome.
Further, unigenes were annotated against the cluster of eukaryotic orthologous groups (KOG) and were categorised into various KOG functional groups. Out of 40,124 leaf and 2,129 root unigenes identified through KOG database, 4108 (10.2%) leaf and 290 (13.6%) root annotated unigenes belonged to class R (general function prediction only) and class J (translation, ribosomal structure and biogenesis), respectively (Fig. 4 ab).
For functional classification, GO annotation of the unigenes identified through Nr database was performed using blast2GO. A total of 102,029 leaf and 3,506 root unigenes were classified into (sub-categories) 47 and 45 functional (sub-categories) groups, respectively (Fig. 5 ab). From the total GO assigned leaf unigenes, 80.1% of them were classified under ‘molecular functions’, 73.7% under ‘biological processes’ and 51.5% under ‘cellular components’. Similar GO distribution pattern with molecular functions being most enriched GO class with 83% (2910) of total annotated unigenes followed by 76.5% of unigenes in biological processes and 54.3% of unigenes in cellular components, was also recorded for annotated root unigenes. Majority of the unigenes of both tissues were involved in 2 or more functions, while only 26.2% of leaf and 22.8% of root unigenes were classified just under either one of the GO categories. Leaf unigenes were dominantly involved in ‘metabolic process’, ‘catalytic activity’, and ‘membrane’ functions grouped under biological process, molecular functions and cellular component of GO categories, respectively (Fig. 5 ab). Unlike leaf, majority of the root unigenes classified under ‘cellular components’ were grouped into a different functional category ‘cell’, while the most enriched functional groups under ‘biological process’ and ‘metabolic functions’ were similar to that of leaf (Fig. 5 ab).
Pathway analysis of unigene libraries
Ortholog assignment and mapping of the unigenes to the biological pathways was performed using KAAS. The active biological pathways in leaf and root tissues were identified by mapping the annotated leaf and root unigene sequences to the reference pathways available in KEGG. In total, 20,927 (WcL) and 2,474 (WcR) unigene sequences were assigned 491 (leaf) and 458 (root) different biological pathways (level 3) which were classified into 7 categories (level 1) and further subdivided into 32 (leaf) and 31 (root) subcategories (level 2). Out of the total pathway assigned unigenes, only 49% of leaf and 38% of root unigene sequences represented single pathway, while majority of the unigenes were assigned to as many as 118 and 72 different pathways/processes in leaves and roots, respectively (Table, Fig. 6ab). Total 12298 (25%) WcL and 1682 (23.14%) WcR unigenes were involved in various types of metabolisms, from which ‘carbohydrate metabolism’ represented largest cluster of unigenes followed by ‘amino acid metabolism’ and ‘energy metabolism’ (Fig. 6 ab). Since the study was aimed to identify putative genes involved in steroidal lactone biosynthesis, unigene clusters representing secondary metabolite biosynthesis were also identified. Among all the secondary metabolite biosynthesis pathways, maximum unigenes clustered for “terpenoid backbone synthesis [PATH:ko00900]”, followed by “ubiquinone and other terpenoid-quinone biosynthesis [PATH:ko00130]” and “phenylpropanoid biosynthesis [PATH:ko00940]” with 129, 103 and 80 leaf unigenes and 22, 7 and 7 root unigenes, respectively. BRITE functional hierarchies were assigned to 17,188 leaf and 2286 root unigenes, out of which 2935 WcL and 348 WcR were classified under “protein families: metabolisms”.
Identification of withanolide precursor biosynthesis genes
The precursor molecule(s) for withanolide biosynthesis are derived from terpenoid backbone and steroid biosynthesis pathways (Gupta, et al. 2013). Pathway annotations of unigenes were used to identify different enzymes leading to biosynthesis of withanolide precursor in both leaf and root tissues. The withanolide precursor, 24-Methylenecholesterol is derived from Farnesyl-pyrophosphate through a 12 step steroid biosynthesis pathway, which is derived from Acetyl-CoA via mevalonate pathway (7 steps) and D-glyceraldehyde-3-phosphate via MEP/DOXP pathway (8 steps) of the terpenoid backbone biosynthesis (Fig. 7). In total, 16 and 13 enzymes are involved in biosynthesis of precursors derived from terpenoid backbone and steroid biosynthesis pathways, respectively. All the 29 enzymes were present in the both leaf transcriptome assemblies, while only few enzymes (10) could be annotated from both root assemblies (Table 2). Moreover, all the enzymes involved were encoded by more than one unigenes annotated from leaf transcriptome, while in root transcriptome, majority of the enzymes were encoded by a single unigene.
Putative genes related to biosynthesis of withanolides
In W. somnifera, biosynthesis of different types of withanolides from its precursor (24-methylenecholesterol) is mediated through members of cytochrome P450 (CYP 450), glycosyltransferase (GT) and methyltransferase (MT) gene families (Senthil et al. 2010; Dhar et al. 2015). With the help of functional annotation, 3642 leaf and 185 root unigenes were classified as members of CYP450, GT and MT gene families, out of which only few (8 - CYP450; 16 – GT; 56 - MT) were present in both leaves and roots, while rest of the members were tissue specific. Of the total unigenes mapped against these gene families, ‘cytochrome P450 CYP2 subfamily’, ‘cytochrome P450 CYP4 CYP19 CYP26 subfamilies’, ‘glycosyl transferase’, ‘glycosyl transferase family 1’, ‘glycosyl transferase family (trehalose-6-phosphate synthase)’, ‘methyltransferase’, ‘SAM dependent methyltransferase’, ‘serine hydroxymethyltransferase’ and ‘DNA (cytosine-5)-methyltransferase 1’ were most abundant in leaves, while ‘cytochrome P450’, ‘cytochrome P450 CYP4 CYP19 CYP26 subfamilies’, ‘glycosyl transferase’, ‘glycosyl transferase family 1’, ‘dolichyl-diphosphooligosaccharide---protein glycosyltransferase’, ‘SAM dependent methyltransferase’, ‘methayltransferase’, ‘RNA methyltrasnferase’ and ‘glycine hydroxymethylatransferase’ were the major class of CYP540, GT and MT gene families.
Differential gene expression analysis
The abundance estimates of leaf and root unigenes were calculated in the form of transcripts per million (TPM), which were then compared for identification of differentially expressed genes. Digital expression profile for the genes commonly expressed in both leaf and root tissues showed significant variations in the expression levels of (putative) genes involved in withanolide precursor biosynthesis and conversion of precursor to different withanolides with average expression ranging from 0.1 – 7 and 0.01 – 9.2, respectively (Fig. 8 ab). Among the commonly expressed genes in both tissues, only 6 withanolide precursor biosynthesis genes and 53 putative withanolide biosynthesis genes expressed differentially between leaf and root tissues. Out of total 59 differentially expressed genes, 34 were upregulated and 25 were downregulated (Fig. 8 cd). Hierarchical clustering further revealed the co-expression pattern of the different genes involved in precursor biosynthesis and conversion of precursor to withanolides (Fig. 8 ab).
SSR markers
28,852 leaf and 886 root SSRs were identified through screening of 297,412 leaf and 16,474 root transcripts, respectively. 24,841 (8.3%) leaf and 772 (4.7%) root transcripts accounted for the total SSRs in both tissues, out of which 11.6% (leaf) and 9.5% (root) sequences harboured more than one SSR. Comparatively higher proportion of root SSRs (10.5%) than that of leaf SSRs (6.7%) were present in compound formation (Table 3). Leaf SSRs were distributed across 6 repeat type classes, while hexa-nucleotide repeats were absent in root transcript sequences. Mono-nucleotide repeats were most abundant (22791 – leaf; 676 – root) in both tissues, whereas hexa-nucleotide repeats (64) in leaf and tetra-nucleotide repeats (4) in root were less (Fig. 9). From the 82 types of SSRs identified from the leaf assembly, A/T, AG/CT, AAG/CTT, AAAG/CTTT, AATGG/ATTCC, AATGGT/ACCATT were most mono-, di-, tri-, tetra-, penta- and hexa- nucleotide repeats. Similarly, 23 different types of repeat units contributed to the SSR pool of the root library, out of which A/T, AC/GT, AAG/CTT, AAAC/GTTT, AAGG/CCTT and AATGG/ATTCC were most common in their respective repeat classes.
Validation of withanolide accumulation through qualitative and quantitative profiling
HPLC analysis revealed significant variations in the withanolide profile of leaves and roots (Figure 10). Among the 5 standard withanolides used in the present study, none of them were present in root extracts, while WL B (724.83 ± 0.5 ng/µl of extract), WF A (8.63 ± 0.5 ng/µl of extract) and WDL (10.25 ± 0.5 ng/µl of extract) were present in leaf extracts. Further, UV-DAD absorption maxima for the unidentified peaks/compounds in the chromatogram matched with that of characteristic absorption spectrum of withanolides given by Chaurasiya et al. (2008). Thus, the peaks obtained in chromatograms for both leaf and root (withanolide) extracts predominantly represent withanolide compounds. Based on these absorption spectrum studies and assuming that each peak corresponds to a different withanolide, approximately 50 withanolides were detected in leaves, while only few (~10) withanolides were detected in roots. Moreover, the relative abundance of various withanolides was significantly higher in leaves than that of roots (Figure 10).