Molecular cloning and characterization of Triterpenoid Biosynthetic Pathway Gene HMGS in Centella asiatica (Linn.)

The flux of isoprenoids and the total accumulation of triterpenoid saponins, known as centellosides, in C. asiatica are controlled by the key genes of the mevalonate (MVA) pathway. These genes have been reported to positively regulate the pathway by providing isoprenoid moieties. Although some information is available on the pathway and its secondary metabolites, most of the pathway steps have not been functionally characterized. In this study, the full-length pathway gene hydroxymethylglutaryl-CoA synthase (CaHMGS; GenBank accession number: MZ997833) was isolated from previously annotated transcriptome data of Centella asiatica leaves. HMGS was successfully cloned and heterologously expressed in E. coli strain DH5α. The cloned gene was sequenced, submitted to NCBI and further characterized in silico using different bioinformatics tools, which revealed the nature and characteristics of the gene. The ORF of HMGS is 1449 bp, encoding 482 amino acids. The predicted molecular weight (MW) of HMGS was 48.09 kDa and the theoretical pI was 5.97. BLAST results and multiple sequence alignments showed similarity with HMGS of other plants of their respective families. Molecular Evolutionary Genetics Analysis (MEGA) version 10.1.6 was used to construct a phylogenetic tree. Differential tissue-specific expression in different plant parts was also examined; the expression level of CaHMGS was highest in the roots and lowest in the nodes of the plant. A functional complementation experiment with CaHMGS in Saccharomyces cerevisiae wild strain YSC1021 and haploid strain YSC1021 which lack HMGS protein confirmed that the CaHMGS gene encodes a functional CaHMGS that catalyzes the biosynthesis of mevalonate in yeast. This is the first report of the cloning and characterization of this gene in Centella asiatica.
Understanding this biosynthetic pathway gene will further help in the improvement of plants for enhanced secondary metabolites production.


Introduction
Centella asiatica, which belongs to the family Apiaceae, is a perennial herb. It is native to the wetlands of European and various South Asian countries and is commonly known as Indian pennywort, Jal Brahmi or Mandukaparni [1]. The plant is a rich reservoir of various secondary metabolites, including triterpenoid glycosides, saponin glycosides, flavonoids, by using OligoAnalyzer™ Tool (www.idtdna.com/pages/tools/oligoanalyzer) [27].

Total RNA isolation
Total RNA was extracted from different tissues of C. asiatica with TRI Reagent® (Sigma) using 100 mg of tissue per sample. The isolated RNA was dissolved in Milli-Q water and checked with a NanoDrop ND1000 spectrophotometer and by 0.8% agarose gel electrophoresis. The RNA was then stored at −80 °C [28] for further applications.

cDNA synthesis
The Revert Aid kit from Thermo Scientific (U.S.A.) was used to synthesize cDNA from the isolated total RNA. The synthesized cDNA was confirmed by semi-quantitative PCR with actin primers.

Amplification of the pathway gene CaHMGS by PCR
The gene was amplified with gene-specific primers (Supplementary Table 4) using the synthesized cDNA as template. The reaction contained PCR master mix (Thermo Scientific), nuclease-free water, and forward and reverse primers. Successful amplification of the HMGS gene was achieved with 35 PCR cycles under the following conditions: initial denaturation at 94 °C for 5 min; denaturation at 94 °C for 35 s, primer annealing at 53 °C for 30 s and extension at 72 °C for 2 min 30 s; and final extension at 72 °C for 7 min. To visualize the bands, a 0.8% agarose gel was cast (0.8 g/100 ml 1x TAE) with 1x TAE (Tris-acetate-EDTA) as tank buffer. The amplified DNA band of interest was excised from the gel with a scalpel or razor blade and extracted using the Gen Elute™ gene elution kit from Qiagen.
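The cycling programme described above can be written down as data, which makes it easy to check the total hold time of a run. The following is a minimal sketch in Python, assuming that the 94 °C/35 s, 53 °C/30 s and 72 °C/2 min 30 s steps are the ones repeated for 35 cycles. The names Step and total_hold_seconds are illustrative and not part of the original protocol, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold step of a thermal cycling programme."""
    name: str
    temp_c: float
    seconds: int


def total_hold_seconds(initial, cycle_steps, n_cycles, final):
    """Sum of programmed hold times only (ramping is not modelled)."""
    per_cycle = sum(step.seconds for step in cycle_steps)
    return initial.seconds + n_cycles * per_cycle + final.seconds


# Values taken from the amplification conditions in the text.
initial = Step("initial denaturation", 94, 5 * 60)
cycle = [
    Step("denaturation", 94, 35),
    Step("annealing", 53, 30),
    Step("extension", 72, 2 * 60 + 30),
]
final = Step("final extension", 72, 7 * 60)
N_CYCLES = 35  # assumption: the three steps above are the cycled ones

runtime_s = total_hold_seconds(initial, cycle, N_CYCLES, final)
print(f"Total hold time: {runtime_s} s ({runtime_s / 60:.1f} min)")
```

Under these assumptions the programmed hold time is a little over two and a quarter hours, which is dominated by the 2 min 30 s extension step.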

Cloning and selection of positive clones
The eluted product was ligated into the cloning vector pGEM®-T Easy. The ligated product was then transformed into E. coli strain DH5α and plated on Luria agar medium containing IPTG (isopropyl β-D-1-thiogalactopyranoside) as inducer, X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside) as chromogenic substrate and ampicillin as the selection drug. White colonies were screened by PCR with the gene-specific primers to identify putative transformants.
like AACT, HMGR, SQS, among others, which are reported to positively regulate the pathway by providing isoprenoid moieties [9][10][11]. Besides, the four major triterpenoid saponins are important and are governed by glucosyltransferases and glucosidases. Genes of the mevalonate pathway are responsible for controlling the flux of isoprenoids and the overall accumulation of triterpenoid saponins in C. asiatica. Understanding the biosynthetic pathway genes is important for further improvement of the plant for secondary metabolites. Earlier efforts in this direction have generated a transcriptomic resource with annotated and mapped sequences of the pathway. HMGS has been reported in different plant and fungal species, viz. Arabidopsis thaliana [12], Chamaemelum nobile [13], Matricaria chamomilla [14], Tripterygium wilfordii [8], Santalum album [15], Ginkgo biloba [16,17], Panax notoginseng [18], Taxus media [19], Stereum hirsutum [5], Ganoderma lucidum [20], Solanum tuberosum [21], Gossypium hirsutum [22] and Aconitum balfourii [23]. HMGS has not been reported in C. asiatica to date. Overexpression of HMGS is expected to enhance the yield of centellosides in transgenic plants. In our study, the C. asiatica HMGS (CaHMGS) gene was cloned and transformed into E. coli DH5α. Plasmids of positive clones were isolated and sequenced [24], and the sequences were then submitted to NCBI. Further, the gene was characterized in silico using different bioinformatics tools. Differential expression analysis was performed by quantitative PCR (qRT-PCR) [25], and a complementation test was performed in Saccharomyces cerevisiae [15,17]. Although some information is available on the pathway and its secondary metabolites, most of the pathway steps have not been functionally characterized.
Functional elucidation of this pathway transcript in isoprenoid biosynthesis would reveal the finer control of the pathway, which will further encourage genetic engineering of the enzymes to increase the medicinal value of the plant.

Gene Sequence mining and primer designing
The full-length gene was mined from a transcriptome previously sequenced from leaf samples of C. asiatica [26]. The hydroxymethylglutaryl-CoA synthase (HMGS) gene sequence of C. asiatica was isolated, followed by primer design, RNA isolation, cDNA (complementary DNA) synthesis and cloning. Gene-specific primers (listed in Supplementary Table 4) were synthesized with CaHMGS. The sequences were submitted to SWISS-MODEL (swissmodel.expasy.org) for homology modelling. Among the different models, the one with the highest proportion of residues in the favoured regions of the Ramachandran plot was selected. The selected model was then refined using the GalaxyRefine server, and the refined model was verified again by quantifying the amino acid residues in each region of the Ramachandran plot. The stereochemical quality of the CaHMGS model was analyzed with PROCHECK [33,34] in the Structural Analysis and Verification Server (SAVES) (http://nihserver.mbi.ucla.edu/SAVES/) [25]. The model was further visualized using the UCSF Chimera molecular modelling system [35]. The modelled protein was subjected to Protein Structure Analysis (ProSA) [36] and ERRAT (SAVES) for quality determination and validation [37].

Tissue-Specific Expression Analysis
Differential tissue-specific expression of the CaHMGS gene was studied in various tissues (node, petiole, leaves and root) of Centella asiatica using quantitative real-time PCR. For the study, total RNA was isolated from leaves, node, petiole and root [16] using TRI Reagent® from Sigma. 5 µg of total RNA was used to prepare cDNA with the Gene Sure First Strand cDNA Synthesis Kit (Puregene) [38]. PRIMER 3 was used to design primers for real-time PCR with Tm > 50 °C, GC content of 50-60%, primer length of approximately 18-24 nucleotides and an expected amplicon size of 100-120 bp. The RT-PCR-specific primer sequences of CaHMGS are listed in Supplementary Table 4. The concentration of the synthesized cDNA was optimized to 100-150 ng/µl. Relative expression of the gene was also checked by semi-quantitative PCR. The actin gene was used as the endogenous control for normalizing the expression of the target gene (Fig. 4(a)). Relative expression was calculated by the 2^(−ΔΔCt) method [25]. All reactions were performed in triplicate to ensure the credibility of the data.
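The relative expression calculation named above can be illustrated with a short worked example. The sketch below, in Python, averages technical triplicates and applies the standard ΔΔCt formula with actin as the reference gene. All Ct values are hypothetical placeholders chosen for easy arithmetic, not data from this study, and the choice of node as calibrator tissue is likewise an assumption made only for illustration.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change by the 2^(-ddCt) method.

    Each argument is a list of Ct values from technical replicates.
    dCt  = mean Ct(target) - mean Ct(reference) within one tissue
    ddCt = dCt(sample) - dCt(calibrator)
    """
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_ct_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)


# Hypothetical triplicate Ct values (NOT measured data).
root_target = [21.5, 22.0, 22.5]   # target gene in root
root_actin = [17.5, 18.0, 18.5]    # reference gene in root
node_target = [24.5, 25.0, 25.5]   # target gene in node (calibrator)
node_actin = [17.5, 18.0, 18.5]    # reference gene in node

fold = relative_expression(root_target, root_actin, node_target, node_actin)
print(f"Root vs node fold change: {fold}")
```

With these placeholder values ΔCt is 4 in root and 7 in node, so ΔΔCt is −3 and the fold change is 2^3 = 8. The method assumes roughly equal and near-100% amplification efficiency for the target and reference primer pairs.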

Functional complementation
Yeast (Saccharomyces cerevisiae) requires the MVA pathway for its survival [15], and interrupting MVA pathway genes in yeast can be lethal. pYES2 is an autonomously replicating expression vector that contains a yeast galactose-dependent promoter. Hence, to characterize the function of CaHMGS, the pYES2-CaHMGS construct was prepared. For this, the cDNA prepared in the previous experiment was amplified using primers (Supplementary Table 4) carrying SacI and XbaI restriction sites, eluted and ligated into the pYES2 vector downstream of the yeast galactose-dependent GAL1 promoter [39]. pYES2-CaHMGS construct was

Colony PCR for screening the recombinant colonies and isolation of plasmid DNA from E. coli DH5α
Positive clones were confirmed by colony PCR [29]. The PCR conditions were 94 °C for 7 min; 94 °C for 30 s, 53 °C for 40 s and 72 °C for 2 min 30 s; and a final extension at 72 °C for 7 min. The colony PCR product was checked on a 0.8% agarose gel. Plasmid was isolated by the alkaline lysis method and treated with RNase A to remove RNA contamination. Positive clones were sequenced on an automated sequencer (Genetic Analyzer model 3130xl) using M13 forward and reverse universal primers [30].

Primary sequence analysis and phylogenetic study
Gene sequences were translated using the EMBOSS Transeq tool (https://www.ebi.ac.uk/Tools/st/emboss_transeq). The translated sequences were then subjected to in silico proteomics studies. First, the primary protein sequence of HMGS (GenBank ID: MZ997833) was subjected to BLASTp against the non-redundant (NR) database of NCBI to search for the closest homologs of CaHMGS. For multiple sequence alignment, all sequences with significant alignment were aligned with the ClustalW tool [24] (https://www.genome.jp/tools-bin/clustalw) [25]. The cut-offs applied were identity > 90%, E-value 0.0 and query coverage 100% with CaHMGS. To study the evolutionary relationship of the CaHMGS gene with HMGS of other species, a phylogenetic tree was constructed by the Neighbour-Joining method with 1000 bootstrap replicates [25,28] using Molecular Evolutionary Genetics Analysis (MEGA) version 10.1.6 [31]. Various bioinformatics tools were used for protein sequence analysis: the protein family and domain arrangement were explored with the Pfam database (http://pfam.sanger.ac.uk/) [32]; domains and motifs were deduced through InterProScan and the Simple Modular Architecture Research Tool (SMART) (http://smart.embl-heidelberg.de); and the Conserved Domain Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) was used to deduce the conserved domains of the protein.
Besides, various physicochemical properties of the amino acid sequence of the CaHMGS protein were determined with the ExPASy ProtParam tool (http://web.expasy.org/protparam) of the proteomics server [32].
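The translation step at the start of this workflow can be sketched in a few lines. The following Python example is a minimal illustration using the standard genetic code, not a reimplementation of EMBOSS Transeq. The function names and the 12-nucleotide toy ORF are hypothetical. It also shows why a 1449 bp ORF corresponds to 482 amino acids: 1449 / 3 = 483 codons, one of which is the stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna):
    """Translate frame 1 of a DNA string, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def protein_length_from_orf(orf_bp):
    """Amino acid count for an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


toy_orf = "ATGGCTTGGTAA"  # hypothetical: Met-Ala-Trp followed by a stop codon
print(translate_orf(toy_orf))
print(protein_length_from_orf(1449))
```

The lookup table is built from the compact 64-character string rather than typed out codon by codon, which keeps the sketch short; ambiguous bases such as N are not handled.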

Theoretical modelling and its validation
The CaHMGS primary protein sequence was subjected to BLAST against the Protein Data Bank database. The closest homolog found was Brassica juncea HMGS (7CQT.1) with > 81% identity and > 95% coverage

C terminal (Pfam: HMG_CoA_Synt_C), positioned between amino acids 211 and 465, and Hydroxymethylglutaryl-coenzyme A synthase N terminal (Pfam: HMG_CoA_Synt_N), comprising amino acids 5-165 (Supplementary Table 1). The physicochemical analysis revealed that CaHMGS is weakly acidic, close to neutral, with a theoretical pI of 5.97. The instability index and aliphatic index of HMGS were 31.8 and 67.97, respectively. The molecular mass of the CaHMGS protein was 48.09 kDa (Supplementary Table 2). The derived amino acid sequence was also searched for homology through PSI-BLAST, and the result showed high sequence homology with HMGS sequences from different plant species.

Molecular cloning of HMGS gene
The cDNA of the HMGS gene was successfully cloned and sequenced (Fig. 1(a)-(d)). The sequence was subjected to BLASTx. After confirmation, the sequence was submitted to the NCBI database [40] under GenBank accession number MZ997833 (https://submit.ncbi.nlm.nih.gov/about/bankit/). The gene was further analyzed with various bioinformatics software.

Bioinformatics analysis of CaHMGS
The CaHMGS nucleotide sequence was translated with the ExPASy Translate tool (https://web.expasy.org), which revealed a 1449 bp ORF encoding 482 amino acids. The domains present in the synthase superfamily were identified by SMART against the Pfam database [41]. The results indicated that CaHMGS belongs to the HMG_CoA_synt_C superfamily (Supplementary Fig. 2). It contains two putative domains viz. Hydroxymethylglutaryl-coenzyme A synthase

Determination and description of 3D structure of Centella asiatica HMGS
The three-dimensional structure of CaHMGS (Fig. 3(a)) was constructed on the basis of sequence identity with the Brassica juncea HMGS template (7CQT.1) by homology modelling in SWISS-MODEL [43]. The model showed upper and lower structural regions similar to those of other HMGS proteins [44]. In addition, the interface between the lower and upper regions forms the binding pocket for acetoacetyl-CoA. The upper structural region contains the active site and the conserved motif, i.e. NxD/NE/VEGI/VDx(2) NACF/YxG, which lies around the five-layered core structure. The conserved motif was responsible for regulating the catalytic activity of the CaHMGS on the substrate by being localized

Phylogenetic Analysis
The CaHMGS protein was aligned with its homologues in MEGA v10.1.6 to study their evolutionary relationship [31,42] (Fig. 2(a)). The top BLASTp hits with the lowest E-value and the highest query coverage and identity, as listed above, were used to construct the phylogenetic tree. On the basis of differences in clade, the phylogenetic tree of CaHMGS was divided into cluster I and cluster II (Fig. 2(b)). These findings suggest that the ancestral gene from which CaHMGS evolved subsequently diverged into other lineages of plant species.

Fig. 2 (a) Multiple sequence alignment. The left side indicates the origin of the HMGS sequence and the right side indicates the number of amino acid residues. Red: identity = 100%; dark green: 75% ≤ identity < 100%; light yellow: 50% ≤ identity ≤ 75%. Dashes show gaps introduced to optimize the alignment. The maroon box shows the conserved motif and the star sign indicates the active site; (b) Phylogenetic tree of CaHMGS with its closest homologues, prepared by the Neighbor-Joining method in MEGA v10.1.6. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is displayed adjacent to the branches. The tree is drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer it. The p-distance method was used to estimate the evolutionary distances on the basis of the number of amino acid differences per site.
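The p-distance mentioned in the figure legend is simply the proportion of aligned sites at which two sequences differ. The Python sketch below computes it for a pair of aligned protein sequences. It assumes pairwise deletion of gapped positions, which may differ from the gap treatment actually selected in MEGA, and the two short sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are skipped
    (pairwise deletion; an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


# Hypothetical aligned fragments, six columns with one gapped column.
aligned_a = "MKT-LV"
aligned_b = "MRTALV"
print(p_distance(aligned_a, aligned_b))  # 1 difference over 5 compared sites
```

A matrix of such pairwise distances is the input to the Neighbor-Joining algorithm; because p-distance applies no correction for multiple substitutions at the same site, it is most suitable for closely related sequences.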

The molecular modelling and model validation of CaHMGS
The secondary structure of the CaHMGS protein, predicted using PSIPRED (bioinf.cs.ucl.ac.uk/psipred) and the UCSF Chimera molecular modelling system [35], showed that the protein was an α-helical protein that contains in the region of active site. Moreover, a catalytic cysteine residue is found at the C-terminus, which is responsible for enhancing HMG-CoA synthesis [45]. These findings will help us to better understand the functions of CaHMGS and also pave the way for further study of terpenoid metabolic processes.

Conclusions
In our study, the HMGS gene is reported for the first time in Centella asiatica. The full-length HMGS gene was cloned from C. asiatica and heterologously expressed in E. coli DH5α. Positive clones were sequenced, and the confirmed HMGS gene sequence was submitted to NCBI. Further, bioinformatics analysis of CaHMGS was carried out. BLAST results and multiple sequence alignments showed similarity with HMGS of other plants of their respective families, and a phylogenetic tree was constructed on this basis. The phylogenetic tree showed that the CaHMGS is the rate-limiting enzyme of the MVA pathway involved in the production of triterpenoids. The gene was characterized with different bioinformatics tools. Differential tissue-specific expression in C. asiatica was examined; the expression of the gene was highest in the roots of the plant and lowest in the nodes. The complementation test showed that the CaHMGS gene encodes a functional protein that catalyses the synthesis of mevalonate in the MVA pathway. The present study will lead to a better understanding of the role of this gene in the pathway for the production of triterpenoid saponins. It may also open up avenues for genetic engineering that could improve the medicinal properties of Centella asiatica.

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1007/s11033-

17 α-helices and 10 β-sheets (CaHMGS). The secondary structure obtained for CaHMGS was further confirmed by 3D modelling in the SWISS-MODEL server. Molecular modelling of CaHMGS showed that the N-terminal and C-terminal ends of the protein hold α-helices in conjunction with β-sheets, and β-sheets, respectively.
The protein model was assessed residue by residue, and its stereochemical quality was analyzed in detail using different model validation servers [46]. Ramachandran plot analysis of CaHMGS (Fig. 3(b)) showed that 90.4% (671) of the amino acid residues were in the most favourable region, 8.8% (65) were in generously favourable regions, 0.4% (3) were in additionally favourable regions and only 0.4% (3) were in unfavourable regions. ProSA assessed the overall quality of the modelled CaHMGS structure using the Z-score (Fig. 3(c)) and the knowledge-based energy graph (Fig. 3(d)). From these observations, it can be concluded that the CaHMGS model has good resolution and stereochemistry in comparison with the template 7CQT (Supplementary Table 3).
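The Ramachandran percentages quoted above follow directly from the residue counts, and the arithmetic can be checked in a few lines. The Python sketch below uses the four counts given in the text; the region labels are copied from the text, and the total of 742 residues is simply the sum of those counts rather than a separately reported figure.

```python
# Residue counts per Ramachandran region, as reported in the text.
region_counts = {
    "most favourable": 671,
    "generously favourable": 65,
    "additionally favourable": 3,
    "unfavourable": 3,
}

total = sum(region_counts.values())  # residues included in the plot statistics

# Percentage of residues in each region, rounded to one decimal place.
region_percent = {
    region: round(100 * count / total, 1)
    for region, count in region_counts.items()
}

for region, pct in region_percent.items():
    print(f"{region:>24}: {region_counts[region]:4d} residues ({pct}%)")
print(f"{'total':>24}: {total:4d} residues")
```

The recomputed values agree with the reported 90.4%, 8.8%, 0.4% and 0.4%. A common rule of thumb for PROCHECK output is that a good-quality model has more than 90% of residues in the most favoured regions, a threshold this model meets.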

Differential tissue-specific expression
Quantitative PCR revealed that the CaHMGS gene was expressed ubiquitously in the different tissues of C. asiatica, but its expression level was highest in the roots and lowest in the nodes (Fig. 4(b) and (c)).

Functional complementation
The wild-type yeast strain YSC1020 was able to grow on the non-induction medium YPD with G418 as well as on the induction medium YPG with G418, whereas the mutant yeast strain harbouring pYES2-CaHMGS survived only on YPG medium and did not survive on the non-induction YPD medium [39], either with or without G418 (Fig. 5). The