Leaves and Roots of Micropropagated Plants Adopt Different Pathways for Withanolide Biosynthesis in Withania Coagulans: A Comparative Transcriptome Study

Withania coagulans (Stocks) Dunal commonly known as paneerbandh due to its milk coagulating properties, is a widely used medicinal herb, and micropropagated plants of the plant to produce some novel withanolides. Trasncscriptome sequencing of leaves and roots of micropropagated plants of W. coagulans assembled total 8.08 and 6.35 GB of raw reads into 292,074 and 16,474 high quality reads, out of which 267,119 and 15,758 unigenes were identied in WcL and WcR, respectively. Further, 40.6% WcL and 55.05% WcR unigenes were annotated using more than one database. Metabolic process and cellular components were identied as dominant categories in gene ontology, while total 20,927 WcL and 2,474 WcR unigenes were mapped to different biological pathways. KEGG classication aided in identication of genes involved in biosynthesis of withanolide precursor, 24-methylenecholesterol. All the genes related to withanolide precursor biosynthesis were present only in WcL, indicating de novo biosynthesis of withanolides, while absence of some rate limiting enzymes in WcR indicated towards biosynthesis of withanolides through salvage pathways. GTs, MTs and CYP450s were identied as putative genes involved in conversion of 24-methylenecholesterol to different withanolides. Differential expression of these genes further revealed details of enzymes involved in biosynthesis of tissue-specic withanolides. Further, withanolide proling through HPLC analysis ascertained the differential biosynthesis and accumulation of withanolides in both the tissues in relatively very less concentration conrmed their biosynthesis through salvage mechanism in roots. Therefore, the present study can be fruitful for future research and product development of W. coagulans through pathway engineering. Moreover, the SSRs identied in this study can be used in marker assisted breeding and selection of biochemically elite varieties of W. coagulans. root and EST The data for development species-specic outcome validated of vitro raised plants of W. coagulans.


Introduction
Withania coagulans (Stocks) Dunal, commonly known as "paneer bandh" or "Indian cheese maker" is one of the highly valued medicinal species of family Solanaceae, used as natural milk coagulant (Jain et al. 2012). The plant has been used in treatment of various diseases such as ulcers, rheumatism, dropsy, diabetes and cancer (Bhandari 1995;Jain et al. 2012). Anti-in ammatory, antitumor, immunomodulatory, cardioprotective, antioxidant, neuroprotective and antimicrobial properties of the plant have been attributed to a speci c class of secondary metabolites known as 'withanolides' (Maurya et al. 2010;Jain et al. 2012). These ergosterone derivatives are the principle bioactive components of genus Withania, and represent a class of naturally occurring C-28 steroidal lactones in which C-22 and C-23, or C-26 and C-23 of C-28 steroidal backbone are oxidised to form either -or -lactone rings, respectively (Gupta et al. 2015;Agarwal et al. 2017). Withanolides are pharmaceutically active compounds used in formulations for the treatment of cancer, immune disorders and neurodegenerative diseases such as Alzheimer's and Parkinson's (Chaurasiya et al. 2012). These critical medicinal properties led to an exponential rise in global demand of withanolides (Sharada et al. 2007).
Variations in the withanolide pro le of the plants growing in different agro-climatic conditions has greatly affected the e cacy of the pharmaceutical preparations and limits its uses at industrial scale. The plant has almost been extinct from its natural habitat due to overexploitation, reproductive failure and poor seed setting (Jain et al. 2009). Thus, there is an urgent need for the development of conservation strategies and enhanced production of bioactive compounds from the plants.
In vitro propagation has provided as an effective approach for both conservation of elite germplasms of endangered species and higher production of bioactive compounds from cultured plant cells (Jain et al. 2009;Nagella and Murthy 2010;Jain et al. 2011). Sangwan et al. (2007) also concluded that in vitro cultures could serve as potent alternative to the eld plants for production of therapeutically valuable compounds.
Manipulation of culture and growth conditions, use of elicitors, inducers and stress conditions to enhance the production of different bioactive compounds in various plant parts has been well documented (Zhao et al. 2005;Sivanandhan et al. 2014). Identi cation of regulatory compounds in metabolite biosynthesis is a key component for targeted and increased secondary metabolite production. However, due to limited information about withanolide biosynthesis pathways in W. coagulans, it is di cult to identify potential target genes to modulate withanolide biosynthesis process.
With recent advances in sequencing technology, elucidation of biosynthesis pathways and identi cation of regulatory proteins involved in biosynthesis of secondary metabolites(s) through transcriptome sequencing has become one of the most favored technique (Senthil et al. 2015;Han et al. 2016). Transcriptome sequencing and functional annotation has widely been used to identify regulatory mechanism(s) of important metabolites biosynthesised in different plants such as Catharanthus roseus (Shukla et al. 2006), Chamaemelum nobile (Liu et al. 2019), Pueraria lobate (Wang et al. 2015), Amaranthus palmeri (Salas-Perez et al. 2018), Swertia japonica  and Withania somnifera (Gupta et al. 2013b). Moreover, expressed sequence tags (ESTs) have also provided a useful tool for gene discovery, particularly in the non-model plants where no reference genome sequences are available (Gupta et al. 2013b).
Metabolic divergence from 24-methylene cholesterol into different types of withanolides is primarily mediated through various chemical reactions including cyclization, oxidation, hydroxylation and desaturation (Gupta et al. 2015;Agarwal et al. 2017). However, enzymes catalysing conversion of 24methylene cholesterol to different types of withanolides is still not clearly known. Studies in past decades have reported signi cant role of glycosyl transferases, methyl transferases, cytochrome P450s, and transcription factors in conversion of 24-methylene cholesterol to different types of withanolides in leaves and roots of W. somnifera (Senthil et al. 2015;Tripathi et al. 2017).
Further, withanolide production has been associated with the morphological differentiation, which corresponds to biosynthesis of tissue-, chemotypes-and species speci c withanolides, thereby indicating the variations in their corresponding gene expression pro les (Sharada et al. 2007). Till date, transcriptome studies are only limited to leaf and root tissues of W. somnifera, and not W. coagulans. Therefore, this study aimed to perform comprehensive transcriptome pro ling of in vitro cultured leaf and root tissues of W. coagulans with an objective to establish the basic understanding of withanolide biosynthesis pathway(s) in different plant parts and to provide a comparative account of gene expression pro les leading to withanolide biosynthesis. We have reported the total transcriptome data of leaf and root of W. coagulans using NGS technology and identi ed putative genes involved in withanolide biosynthesis for the rst time. The EST collection generated in this study has also been analysed for differentially expressed candidate genes which might be involved in withanolide biosynthesis in leaf and root tissues. The transcriptome data was further screened for SSRs, for development of species-speci c molecular markers that could facilitate its marker-assisted breeding and selection. The outcome of transcriptome analysis was further validated through comparative withanolide pro ling in leaves and roots of micropropagated/in vitro raised plants of W. coagulans.

Materials And Methods
Explant collection and in vitro propagation of W. coagulans Explants (nodal segments) were procured from W. coagulans plants grown in the garden located at Manipal University Jaipur, Rajasthan, India. The explants were thoroughly rinsed under running tap water for 5 min and were then washed with 5% (v/v) liquid detergent, Tween®-20 (HiMedia, Mumbai, India) for 5 min followed by washing with running tap water. These explants were then surface sterilised with 0.1% (w/v) HgCl 2 (HiMedia, Mumbai India) for 3 min under aseptic environment, followed by rinsing with sterile distilled water thrice.
After 4 wk of incubation, healthy elongated shoots (> 3 cm long) were excised and transferred onto root induction medium containing MS + 0.25 mgL -1 IBA (Indole-3-butyric acid) and were incubated into the plant growth chamber for 4 wks. The rooted plantlets were then used as source of plant material for further transcriptome sequencing and analysis. Isolation and quality assessment of RNA Plant material (leaf and root) was crushed in liq. N 2 using pre-chilled mortar and pestle and followed by total RNA extraction using Xcelgen Plant RNA Kit (Xcleris, Ahmedabad, India) as per manufacturer's protocol. The quality and quantity of the RNA isolated from leaf and root tissues was determined on 1% formaldehyde denaturing agarose gel and Qubit® 2.0 Fluorometer (ThermoFisher Scienti c, United States), respectively. cDNA synthesis, library preparation and quality analysis Illumina TruSeq Stranded mRNA Library Preparation Kit (Illumina Inc., USA) was used for cDNA synthesis and paired-end library preparation as per instructions given in the manufacturer's manual. Total RNA (~1 µg) was treated with OligodT beads to enrich mRNA fragments which were then puri ed, fragmented and primed for cDNA synthesis. First and second cDNA strands were synthesised using fragmented mRNA as template. cDNA was subjected to A-tailing, adapter-index ligation and ampli cation through PCR for library preparation. Quality and quantity of the ampli ed library was analysed on Bioanalyzer 2100 (Agilent Technologies, USA) using High Sensitivity (HS) DNA Chip (Agilent Technologies, USA).
Cluster generation, sequencing, de novo assembly and unigene prediction cDNA libraries of leaf and root tissues were loaded into Illumina platform (Illumina Inc., USA) for cluster generation and sequencing. The libraries were subjected to paired-end sequencing on Illumina platform (NextSeq500, Illumina Inc., USA) using 2x150 bp chemistry, generating ~ 5 GB data for both leaf and root libraries. To ensure high quality (HQ), the raw reads obtained from Illumina were ltered such that low quality reads and adapter sequences were trimmed using Trimmomatic version 0.36 (Bolger et al. 2014).
The HQ reads were then assembled into unique sequences using Trinity Transcriptome Assembler (version 2.1.1) with default (25 k-mers) parameters. Due to unavailability of the reference genome of W. coagulans, transcriptome assembly was prepared de novo.
The transcripts thus obtained were clustered using CD-HIT (v 4.6.1) package (Fu et al. 2012) for unigenes identi cation. Short redundant transcripts with 100% overlap and > 90% identity were identi ed and removed using CD-HIT-EST executable to obtain the non-redundant clustered transcripts termed as "unigenes".

Functional annotation, gene ontology and pathway assignment
The unigenes were annotated against the NCBI non-redundant (Nr) protein (http://www.ncbi.nlm.nih.gov), Uniprot (https://www.uniprot.org), KOG (http://www.ncbi.nlm.nih.gov/COG) and Pfam (https://pfam.xfam.org) databases using BLASTx (version 2.2.28+) sequence alignment tool with E-value threshold of 1e-5 such that only top hit for each sequence were recorded. Further, to assign functions to the unigenes, Nr database annotated proteins were subjected to gene ontology (GO) using blast2GO cli (version 1.4.1) software. The unigenes were classi ed into 3 different domains namely, biological process, cellular component and molecular function, in a way that unigenes associated with similar functions were assigned the same GO functional group and those having more than one GO term were assigned 2 or 3 GO domains.
Leaf and root unigenes were mapped against KEGG pathway repository and were assigned EC numbers using BLASTx with default bit-score threshold of 60. Category wise distribution of unigenes into metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, brite hierarchies and unclassi ed processes to identify unigene enriched pathways using KEGG automatic annotation server, KAAS (https://www.genome.jp/kegg/kaas/).

Digital gene expression pro ling
Transcript level abundance estimates for HQ paired-end reads were computed in terms of transcripts per million (TPM) using RSEM abundance estimation and Bowtie alignment method (Langmead et al. 2009).
The TPM values for all the transcripts were summed to obtain gene level TPM score for all enzymes involved in withanolide biosynthesis pathway. Differential gene expression pro ling of enzymes involved in terpenoid backbone and steroid biosynthesis pathways along with those belonging to cytochrome P450, methyltransferase and glycosyltransferase gene families was carried out using Web MeV (http://mev.tm4.org). Leaf and root TPM values were used in Web MeV and normalised using Deseq method followed by identi cation and preparation of heat map of differentially expressed genes using LIMMA (Ritchie et al. 2015). Genes with log fold change ≤ -1 or ≥ 1 and q-value ≤ 0.005 were considered differential and heat map of their expression levels was generated. Hierarchical clustering (HCl) of the differentially expressed genes was performed using Euclidean distance matrix and average linkage algorithm, as described previously (Gupta et al. 2013b). Further, a volcano plot of the log fold change vs average expression was also constructed to determine downregulated or upregulated genes in roots with respect to leaves.

Identi cation of Simple sequence repeats (SSRs)
All the assembled leaf and root transcripts were submitted to microsatellite identi cation tool, MISA (https://webblast.ipk-gatersleben.de/misa/index.php) for identi cation of SSRs (Beier et al. 2017). The microsatellite identi cation parameters were de ned as minimum 10, 6, 5, 5, 5 and 5 repeats for mono-, di-, tri-, tetra-, penta-and hexa-nucleotide, respectively. For compound microsatellites, 100 bases were set as maximum number between two SSR motifs. The occurrence and distribution frequency of different SSR motifs was also determined using MISA.

Qualitative and Quantitative Pro ling of Withanolides
Withanolides were extracted from dried plant material as per previously reported/optimized protocol (Jain et al. 2011).
Qualitative and quantitative pro ling of withanolides in leaves and roots was done using HPLC as per our previously reported methods with slight modi cations (Jain et al. 2011). 10 ml of each sample was injected through autosampler into 1260 In nity II HPLC system (Agilent, Germany), and separation was achieved through reverse phase column (Eclipse Plus C-18, 5 µm, 4.6 x 250 mm) in a solvent gradient of (A) deionized water (Merck Millipore, India) and (B) methanol (Qualigens, Mumbai, India) each containing 0.1% acetic acid (HiMedia, Mumbai, India) at 27 o C. The solvent gradient was set as A:B, 60:40 to 25:75 for 0 -30 min; 10:90 for 31 -45 min with 0.6 ml min -1 ow rate. The constituent withanolides were detected at 227 nm using UV-Diode array detector (UV-DAD). The characteristic absorbance spectra and retention time of withanolide standards were then used for their identi cation and quanti cation in leaf and root extracts (Jain et al. 2011). Five withanolide standards namely, withaferin-A (WF A), withanoside IV (WS IV), withanoside V (WS V), withanolide-B (WL B) and wedelolactone (WDL) procured from Natural Remedies Pvt. Ltd., India were used as markers for pro ling.

Results
In vitro propagation of W. coagulans Multiple shoot morphogenesis from the nodal explants with an average of 50 shoots/explant and 100% regeneration e ciency was noted. Callus formation at the cut end and cluster of shoots at each axil of the nodal segments was observed ( Fig. 1 ab). Healthy and elongated shoots when transferred to rooting medium resulted in formation of dense network of thin and robust roots (Fig. 1 c).

RNA isolation and cDNA library preparation
Leaf and root tissues yielded 10.4 µg and 6.6 µg total RNA with 694 ngµl -1 and 446 ngµl -1 qubit concentration, respectively. Leaf and root paired-end libraries were prepared using Illumina library preparation kit and its pro ling was done using DNA HS Chip. The average size of leaf and root cDNA libraries were 330 bp and 370 bp, respectively. These libraries were then sequenced on Illumina platform using 2x150 bp PE chemistry.
Sequencing, de novo assembly and unigene identi cation Sequencing of leaf (WcL) and root (WcR) libraries generated 58,878,942 and 42,683,928 raw reads respectively. The raw read sequences were submitted to NCBI sequence read archive (SRA) and were published with SRX9006975 (leaf) and SRX9006976 (root) accession numbers. Total 8.08 GB (leaf) and 6.35 GB (root) of raw reads were obtained, which were further ltered for HQ reads followed by de novo assembly. De novo assembly produced 292,074 (WcL) and 16,474 (WcR) transcripts with 410 bp (WcL) and 265.58 bp (WcR) average transcript length. Transcripts length of leaf assembly ranged from 201 bp -57,369 bp, while that of root assembly ranged from 401 bp -4637 bp (Table 1). Transcript length distribution pattern indicated that majority of the leaf (~ 61%) and root (~85%) transcripts ranged between 200 -300 bp (Fig. 2 a).
CD-HIT-EST was used for identi cation of unigenes from the assembled HQ reads. Total 267,119 and 15,758 unigenes with 392 bp and 265.24 bp average length were identi ed from leaf and root transcripts, respectively (Table 1). Unigene length distribution pattern for both tissues is represented in Fig. 2

Functional characterization of unigenes
For functional annotation, all the predicted unigenes were aligned against different protein databases using BLASTx program. Among total unigenes annotated (18,8475 -WcL; 9,519 -WcR) using different protein databases, 11,1785 (WcL) and 4,278 (WcR) were annotated using only one database, such that matches for 99% of both WcL and WcR were found only in Nr database, while that for 0.04%, 0.15% and 0.08% of WcL and 0.4%, 0.11%, and 0.09% of WcR were found only in Uniprot, Pfam and KOG databases, respectively (Fig. 3 ab).
The BLAST hits were obtained for 188,174 (70.44%) leaf and 9,448 (59.9%) root unigenes against NCBI Nr protein database, out of which 50.04% (leaf) and 39.65% (root) showed ≥ 99% similarity with known proteins. In contrast, BLAST annotation with Uniprot and Pfam returned matches for 32,976 and 53,908 unigenes, respectively. Lower proportion of leaf unigenes (29.8%) and approximately equivalent proportion of root unigenes (46.2%) were collectively annotated using Uniprot and Pfam (Fig. 3ab). The unigenes were further classi ed into different protein families based on their annotations derived from Pfam database. Distribution pattern of unigenes across different protein families was determined by identifying number of unigenes coding for similar proteins, and it was revealed that Pkinase, ACR_tran, ABC_tran, Pkinase_Tyr were the most abundant domains in leaf transcriptome, while HSP70, ACR_tran, ABC_tran, GTP_EFTU were most abundant in root transcriptome.
Further, unigenes were annotated against the cluster of eukaryotic orthologous groups (KOG) and were categorised into various KOG functional groups. Out of 40,124 leaf and 2,129 root unigenes identi ed through KOG database, 4108 (10.2%) leaf and 290 (13.6%) root annotated unigenes belonged to class R (general function prediction only) and class J (translation, ribosomal structure and biogenesis), respectively (Fig. 4 ab).
For functional classi cation, GO annotation of the unigenes identi ed through Nr database was performed using blast2GO. A total of 102,029 leaf and 3,506 root unigenes were classi ed into (subcategories) 47 and 45 functional (sub-categories) groups, respectively (Fig. 5 ab). From the total GO assigned leaf unigenes, 80.1% of them were classi ed under 'molecular functions', 73.7% under 'biological processes' and 51.5% under 'cellular components'. Similar GO distribution pattern with molecular functions being most enriched GO class with 83% (2910) of total annotated unigenes followed by 76.5% of unigenes in biological processes and 54.3% of unigenes in cellular components, was also recorded for annotated root unigenes. Majority of the unigenes of both tissues were involved in 2 or more functions, while only 26.2% of leaf and 22.8% of root unigenes were classi ed just under either one of the GO categories. Leaf unigenes were dominantly involved in 'metabolic process', 'catalytic activity', and 'membrane' functions grouped under biological process, molecular functions and cellular component of GO categories, respectively (Fig. 5 ab). Unlike leaf, majority of the root unigenes classi ed under 'cellular components' were grouped into a different functional category 'cell', while the most enriched functional groups under 'biological process' and 'metabolic functions' were similar to that of leaf (Fig. 5 ab).

Pathway analysis of unigene libraries
Ortholog assignment and mapping of the unigenes to the biological pathways was performed using KAAS. The active biological pathways in leaf and root tissues were identi ed by mapping the annotated leaf and root unigene sequences to the reference pathways available in KEGG. In total, 20,927 (WcL) and 2,474 (WcR) unigene sequences were assigned 491 (leaf) and 458 (root) different biological pathways (level 3) which were classi ed into 7 categories (level 1) and further subdivided into 32 (leaf) and 31 (root) subcategories (level 2). Out of the total pathway assigned unigenes, only 49% of leaf and 38% of root unigene sequences represented single pathway, while majority of the unigenes were assigned to as many as 118 and 72 different pathways/processes in leaves and roots, respectively (Fig. 6ab). Total 12298 (25%) WcL and 1682 (23.14%) WcR unigenes were involved in various types of metabolisms, from which 'carbohydrate metabolism' represented largest cluster of unigenes followed by 'amino acid metabolism' and 'energy metabolism' (Fig. 6 ab). Since the study was aimed to identify putative genes involved in steroidal lactone biosynthesis, unigene clusters representing secondary metabolite biosynthesis were also identi ed. Among all the secondary metabolite biosynthesis pathways, maximum unigenes clustered for "terpenoid backbone synthesis [PATH:ko00900]", followed by "ubiquinone and other terpenoid-quinone biosynthesis [PATH:ko00130]" and "phenylpropanoid biosynthesis [PATH:ko00940]" with 129, 103 and 80 leaf unigenes and 22, 7 and 7 root unigenes, respectively. BRITE functional hierarchies were assigned to 17,188 leaf and 2286 root unigenes, out of which 2935 WcL and 348 WcR were classi ed under "protein families: metabolisms".

Identi cation of withanolide precursor biosynthesis genes
The precursor molecule(s) for withanolide biosynthesis are derived from terpenoid backbone and steroid biosynthesis pathways (Gupta, et al. 2013). Pathway annotations of unigenes were used to identify different enzymes leading to biosynthesis of withanolide precursor in both leaf and root tissues. The withanolide precursor, 24-Methylenecholesterol is derived from Farnesyl-pyrophosphate through a 12 step steroid biosynthesis pathway, which is derived from Acetyl-CoA via mevalonate pathway (7 steps) and Dglyceraldehyde-3-phosphate via MEP/DOXP pathway (8 steps) of the terpenoid backbone biosynthesis (Fig. 7). In total, 16 and 13 enzymes are involved in biosynthesis of precursors derived from terpenoid backbone and steroid biosynthesis pathways, respectively. All the 29 enzymes were present in the leaf transcriptome assembly, while only few enzymes (10) could be annotated from the root assembly (Table  2). Moreover, all the enzymes involved were encoded by more than one unigenes annotated from leaf transcriptome, while in root transcriptome, majority of the enzymes were encoded by a single unigene.

Differential gene expression analysis
The abundance estimates of leaf and root unigenes were calculated in the form of transcripts per million (TPM), which were then compared for identi cation of differentially expressed genes. Digital expression pro le for the genes commonly expressed in both leaf and root tissues showed signi cant variations in the expression levels of (putative) genes involved in withanolide precursor biosynthesis and conversion of precursor to different withanolides with average expression ranging from 0.1 -7 and 0.01 -9.2, respectively ( Fig. 8 ab). Among the commonly expressed genes in both tissues, only 6 withanolide precursor biosynthesis genes and 53 putative withanolide biosynthesis genes expressed differentially between leaf and root tissues. Out of total 59 differentially expressed genes, 34 were upregulated and 25 were downregulated (Fig. 8 cd). Hierarchical clustering further revealed the co-expression pattern of the different genes involved in precursor biosynthesis and conversion of precursor to withanolides (Fig. 8 ab).

SSR markers
Total 28852 leaf and 886 root SSRs were identi ed through screening of 297,412 leaf and 16,474 root transcripts, respectively. Total 24841 (8.3%) leaf and 772 (4.7%) root transcripts accounted for the total SSRs in both tissues, out of which 11.6% (leaf) and 9.5% (root) sequences harboured more than one SSR. Comparatively higher proportion of root SSRs (10.5%) than that of leaf SSRs (6.7%) were present in compound formation (Table 3). Leaf SSRs were distributed across 6 repeat type classes, while hexanucleotide repeats were absent in root transcript sequences. Mono-nucleotide repeats were most abundant (22791 -leaf; 676 -root) in both tissues, whereas hexa-nucleotide repeats (64) in leaf and tetra-nucleotide repeats (4) in root were less (Fig. 9). From the 82 types of SSRs identi ed from the leaf assembly, A/T, AG/CT, AAG/CTT, AAAG/CTTT, AATGG/ATTCC, AATGGT/ACCATT were most mono-, di-, tri-, tetra-, penta-and hexa-nucleotide repeats. Similarly, 23 different types of repeat units contributed to the SSR pool of the root library, out of which A/T, AC/GT, AAG/CTT, AAAC/GTTT, AAGG/CCTT and AATGG/ATTCC were most common in their respective repeat classes.
Validation of withanolide accumulation through qualitative and quantitative pro ling HPLC analysis revealed signi cant variations in the withanolide pro le of leaves and roots ( Figure 10). More than 50 different types of withanolides were detected in leaves, while less than 10 different types of withanolides were present in roots. The relative abundance of various withanolides was signi cantly higher in leaves than that of roots ( Figure 10). Moreover, none of the 5 withanolide standards were present in root extracts, while WL B (724.83 ng/µl of extract), WF A (8.63 ng/µl of extract) and WDL (9.74 ng/µl of extract) were present in leaf extracts.

Discussion
Withania coagulans has popularly been used as a natural coagulant for a long time and has gained unprecedented attention over past few decades due to its bioactive secondary metabolites, i.e., withanolides. Withanolides are the principle bioactive components of genus Withania which possess diverse array of pharmacological properties including immunomodulatory, neuroprotective, cardioprotective, antitumor, etc (Jain et al. 2012). Limitations in propagation strategies and variations in its withanolide pro le of the plant due to stresses are major concerns in developing novel medicinal formulations. Plant tissue culture strategies have emerged as an e cient tool not only for ex situ conservation and mass multiplication of plants but also for enhanced biosynthesis of bioactive metabolites (Jain et al. 2009). Further, biosynthesis of some novel withanolides has also been reported in the in vitro cultures of W. coagulans (Jain et al. 2011), but due to lack of information about withanolide biosynthesis pathway(s) the underlaying gene regulatory phenomenon is di cult to predict. There are various reports on characterization of withanolide biosynthesis pathway(s) genes in different parts of both eld grown and micropropagated plants of W. somnifera (Senthil et al. 2015;Mishra et al. 2016;Mishra et al. 2020), yet the information about speci c enzymes which lead to formation of different withanolides is still incomplete and is also subjected to vary at both species and tissue level.
In the present study, whole transcriptome sequencing and analysis of in vitro raised leaf and root tissues of W. coagulans was performed to identify the withanolide biosynthesis pathway and putative genes involved in this process. The in vitro cultures of the plant were established as per our previously optimized protocol for clonal mass multiplication (Jain et al. 2009;Jain et al. 2011). As per the report in W. somnifera (Gupta et al. 2013b), leaf and root samples from the in vitro raised plants were harvested for transcriptome analysis to ensure that the transcript libraries represent all the genes involved in withanolide biosynthesis and could indicate if the biosynthesis of withanolides is de novo in both tissues or imported from one tissue to other.
Transcriptome sequencing using Illumina second generation sequencing platform can generate 100x more reads with atleast 5x higher read depth than that generated through 454-based sequencers (Xiao et al. 2013). The proportion of transcripts annotated for functional assignment and number of full-length unigenes produced is also relatively higher for Illumina based platforms (Xiao et al. 2013). Therefore, this techniques has been widely used to generate reference libraries and study specialized metabolic pathways through de novo assembly and annotation, in various non-model plants including Asparagus racemosus (Upadhyay et al. 2014), Pueraria lobata (Wang et al. 2015), Withania somnifera (Senthil et al. 2015), Swertia japonica , Chamaemelum nobile (Liu et al. 2019), Plumbago zeylanica (Sundari et al. 2020), etc. Based on these recent reports, RNA sequencing of leaf and root tissues harvested from micropropagated plants of W. coagulans using Illumina NextSeq sequencing platform was done. Total 14.43 GB data with 58,878,942 leaf and 42,683,928 root HQ reads were generated, which were then submitted to NCBI's SRA database. Low quality reads and adapter sequences may produce misassemblies or truncated contigs by interfering with the transcriptome assembly (Upadhyay et al. 2014), therefore ltration of the raw reads was performed prior to transcriptome assembly. Since no reference genome was available for W. coagulans, the HQ reads were assembled de novo into 292,074 leaf and 16,474 root transcripts, out of which 282,877 (267,119 -leaf; 15,758 -root) unigenes were identi ed. From the total unigenes identi ed, matches in Nr, Uniprot, Pfam and KOG databases were found for 188,475 (70.6%) leaf and 9,519 (60.4%) root unigenes, From the total annotated unigenes, 40.7% (76,669) leaf and 55% (5,241) root unigenes were mapped in 2 or more databases, indicating extensive coverage of the transcriptome assembly. Upadhyay et al. (2014) reported that unigenes without any matches either represent the untranslated regions and non-coding RNA, or might be due to assembly mistakes. Transcriptome sequencing performed using 454 GC-FLX Titanium platform produced comparatively lesser number of contigs and singletons, out of which only ~40% were annotated against Nr protein database (Gupta et al. 2013b), suggesting lesser coverage of the assembly generated through 454 sequencing platforms over Illumina platforms.
Further, 35% (leaf) and 33% (root) of the total annotated unigenes were assigned to functional domains (5,872 WcL, 1,454 WR) and functional groups (25) using Pfam and KOG databases, respectively. Assignment of these unigenes to large number of functional groups indicates the diversity of the genes encoded in leaf and root transcriptome of the plant. The rich diversity in the gene functions of W. coagulans transcriptome was also corroborated by the GO annotations assigned to each unigene mapped using Nr database. Total 102,029 annotated leaf and 3,506 root unigenes were assigned to biological process (3815 -WcL, 1087 -WcR), cellular component (375 -WcL, 493 -WcR) and molecular functions (4493 -WcL, 1237 -WcR), thereby highlighting the diversity of the W. coagulans genes identi ed from its leaf and root transcriptome assembly (Gupta et al. 2013b;Upadhyay et al. 2014).
Approximately 74% of GO annotated unigenes were involved in more than one type of functions/processes, probably due to the multiple roles of the functional groups represented. The dominant functions as identi ed from GO annotations of WcL and WcR did not show any signi cant similarity with that of W. somnifera, thus despite of their close phylogenetic relation and shared morphological & pharmaceutical properties, W. somnifera could not be used as a reference plant for transcriptome assembly and annotation of the counterpart species.
The annotated unigenes were mapped to various KEGG biological pathways using KAAS. Total 2,274 and 709 enzymes functional in 491 and 458 pathways of leaves and roots, respectively were identi ed. Most of the enzymes were encoded by two or more unigenes, as these unigenes are either fragments of a single transcript encoding the enzyme or different members of an enzyme family. Similar studies on W. somnifera have identi ed 124 and 139 functional pathways in tissues of led grown and in vitro raised plants (Gupta et al. 2013b;Senthil et al. 2015). Identi cation of potential unigenes encoding for regulatory enzymes involved in pathways of interest could aid in metabolic engineering of pathways for targeted biosynthesis. From the total annotated unigenes, 1566 (leaf) and 145 (root) unigenes were assigned to 'unclassi ed metabolic', 'genetic information processing' and 'signaling & cellular process' pathways, while 603 (leaf) and 34 (root) unigenes were 'poorly characterized'. These unknown and partial unigenes can further be characterized to identify novel genes involved in biosynthesis of different types of withanolides in W. coagulans tissues.
24-methylenecholesterol has been identi ed as a precursor for biosynthesis withanolides and is derived through steroid biosynthesis and terpenoid backbone biosynthesis (MVA and DOXP) pathways (Chaurasiya et al. 2012;Gupta et al. 2013b). Since W. coagulans is closely related to W. somnifera, in this study all the unigenes involved in steroid and terpenoid backbone biosynthesis pathways of micropropagated leaves and roots are identi ed. Unlike in W. somnifera, all the 29 enzymes involved in precursor biosynthesis were identi ed only in leaves, while only few were identi ed in roots. As per recent reports, enzymes mediating the conversion of 5-dehydroepisterol to 24-methylene cholesterol (DWF5) & 24-methylenecholesterol to 24-methyldesmosterol (sterol isomerase, DWF1), and cyclisation of 2,3oxidosqualene into cycloartenol (cycloartenol synthase, CAS), are critical for de novo withanolide biosynthesis in both leaf and root tissues of W. somnifera (Gupta et al. 2015;Mishra et al. 2016;Knoch et al. 2018). In the present study, all these regulatory enzymes (DWF5, DWF1, CAS, etc.) were identi ed only in leaves, while only DWF5 was present in roots also. This clearly indicated that de novo withanolide biosynthesis pathway does not occur in in vitro cultured roots. Agarwal et al. (2017) reported that DWF5 and DWF1 are localized in endoplasmic reticulum and are important for biosynthesis of tissue-speci c withanolides. Withanolides are biosynthesized in W. somnifera, independently through de novo pathway in leaf and root tissues of the plant (Senthil et al. 2010;Gupta et al. 2013b). Among the enzymes commonly present in W. coagulans leaves and roots, ~43% of them showed differential expression, such that enzyme catalysing the conversion of acetyl Co-A to acetoacetyl Co-A [EC:2.3.1.9] showed highest average expression. Enzyme(s) HMGR (hydroxymethylglutaryl-CoA reductase) was downregulated, while farnesyl diphosphate synthase and 1-deoxy-D-xylulose-5-phosphate synthase were upregulated w.r.t that in leaves. Senthil et al. (2015) reported high HMGR expression in roots of W. somnifera after 30 days of in vitro culture and concluded that the expression level in the cultures varies in a time-dependent manner. Further, co-transformation of a fungal elicitor protein in hairy roots of W. somnifera was reported by Mishra et al. (2016), which downregulates HMGR and FPPS enzymes, and reduces withanolide content. Therefore, lower expression levels of HMGR in invitro roots of W. coagulans can be attributed to the absence of other pathway enzymes leading to precursor formation. Since, it is more likely that withanolide biosynthesis in W. coagulans takes place in salvage mode, potential intermediate metabolites which serve as precursor feed can be predicted depending upon the expression levels and clustering pattern of these enzymes.
Two chemical reactions involving -lactonisation between C-22 & C-26 and hydroxylation at C-22 of 24methylenecholesterol have been rendered essential for biosynthesis of withanolides (Dhar et al. 2015). However, tissue speci c divergence of withanolides is mediated through secondary conversions including addition of different side chains, oxidation/reduction, hydroxylation and glycosylation reactions, etc. (Chaurasiya et al. 2012;Gupta et al. 2013b;Gupta et al. 2015). 24-methylenecholesterol has also been identi ed as key precursor for biosynthesis of another class of steroid derivatives called 'brassinolides'.
The precursor is converted into brassinolides through a series of oxidation and hydroxylation reactions, catalysed by cytochrome P450 enzymes (Bishop 2007;Ohnishi et al. 2009). Cytochrome P450s (CYP or P450) catalyse hydroxylation, oxidative demethylation, desaturation, epoxidation, desaturation, oxidative rearrangement of carbon skeleton and oxidative C-C bond cleavage reactions in various plant cellular and metabolic processes (Rana et al. 2014). Srivastava et al. (2015) reported that P450s belonging to CYP83B1 (WSCYP93Id) & CYP734A1 (WSCYP9Sm) families and CYP71 (WSCYP734B) & CYP72 (WSCYP734) clans account for synthesis of chemotype speci c withanolides in W. somnifera and suggested involvement of WSCYP93Id and WSCYP9Sm in oxygenation reactions of plants, which account for synthesis of different metabolites. Further, variations in expression pro les of two-A type P450s (WsCYP98A and WsCYP76A) and two paralogs of cytochrome P450 reductase has also been associated with differential withanolide biosynthesis in W. somnifera (Rana et al. 2014). Glycosylation commonly catalysed by glycosyltransferases (GTs), is a modi cation reaction that is usually related with biosynthesis of secondary metabolites (Dhar et al. 2015). Singh et al. (2016) reported differential biosynthesis of withaferin A, withanolide A and withanoside A is regulated by the expression of sterol glycosyltransferases (SGTs) through a positive feedback mechanism in W. somnifera. Further, role of sterol methyltransferase 1 (SMT1) in channelling the intermediates towards rst committed step of withanolide biosynthesis in W. somnifera has also been reported recently (Pal et al. 2019). In this study, unigenes encoding for P450, MT and GT in leaf assembly (3642) were signi cantly higher than those in root assembly (185), such that only 8, 16 and 56 members of P450, GT and MT, respectively were common in both tissues. Among common members, only 5 P450s, 7 GTs and 41 MTs were differentially expressed in leaves and roots, out of which 3 P450s, 2 GTs and 18 MTs were downregulated in roots w.r.t those in leaves. Presence of large number of tissue-speci c genes and their differential expression indicates diversity of withanolide between both the tissues. Since these putative genes regulate conversion of precursor to different types of withanolides, availability of lesser number of unigenes encoding these genes can be considered as an indication of comparatively less diverse withanolide pro le. The putative unigenes thus identi ed can further be used to elucidate withanolide biosynthesis pathway and mechanism(s) involved in tissue-speci c synthesis of withanolides in leaves and roots of W. coagulans. This tissue-speci c differential accumulation of withanolides was also con rmed through HPLC analysis. Some of the important withanolides including WF A, WL B and WDL were speci cally present only in leaves, a rming their tissue-speci c accumulation and biosynthesis. Though differential accumulation of withanolides in different tissues of W. somnifera has been reported in various studies (Senthil et al. 2015), but same has been reported for the rst time in W. coagulans in this study. Relatively very low abundance of withanolides in roots and their exclusivity to the tissue further supports our proposed hypothesis of de novo and salvage biosynthesis in leaves and roots, respectively.
A comparative analysis of the withanolide biosynthesis pathway and their accumulation mechanism in W. coagulans and W. somnifera plant parts can also be performed to identify the enzymes essential for de novo and salvage synthesis of withanolides and to reveal their phylogenetic origin. Such studies would enable to develop strategies for targeted withanolide biosynthesis through metabolic pathway engineering at gene level.
The leaf and root unigene sequences were also screened for identi cation of simple sequence repeats (SSRs) which can be used for development of species-, chemotype-and tissue-speci c molecular markers. From the total SSRs identi ed, ~ 17% of them were present in compound formation and ~ 21% of the unigenes harboured more than one SSR. In this study, mononucleotides were the most abundant SSRs in both leaf and root transcriptome assembly. On the contrary, trinucleotide repeats have been identi ed as most frequently occurring SSRs in leaf and root transcriptome assembly of W. somnifera (Gupta et al. 2013b). The differences in marker distribution pattern can be used to design species speci c molecular markers for W. coagulans and W somnifera.

Conclusion
This study is the rst to establish the transcriptome database for W. coagulans through NGS technology. Pathway annotation indicated that leaves of in vitro raised plantlets possess de novo withanogenesis potential. Absence of certain withanogenesis enzymes in the roots revealed that the roots lack de novo withanogenic competence and withanolides are biosynthesised through one or more salvage pathway(s).
The higher diversity in putative gene families of leaf (108) transcriptome than that of root (81), further corresponded to higher metabolic diversity of withanolides in leaves than in roots. This information can be fruitful in engineering the biosynthesis pathway for enhanced production of bioactive withanolides.
Moreover, the metabolic distinctness among the two important medicinal species of Withania has also been established for the rst time in this study. The SSRs identi ed can be used to develop chemotypeand species-speci c markers, which in turn will help in identi cation of genotypes rich in speci c withanolides and distinction among different varieties of W. somnifera and W. coagulans. Therefore, sequencing and analysis of W. coagulans transcriptome will be useful for further research and development on marker-assisted selection & breeding programs for development of medicinally elite varieties of Withania. Additionally, this study has also opened new gateways for establishment of plant cell factories for production of important withanolides at industrial scale.