Metabolic Stimulation-Elicited Transcriptional Responses and Biosynthesis of Acylated Triterpenoids in the Medicinal Plant Helicteres angustifolia

Yuying Huang, Guangzhou University of Chinese Medicine
Wenli An, Guangzhou University of Chinese Medicine
Zerui Yang, Guangzhou University of Chinese Medicine
Chunzhu Xie, Guangzhou University of Chinese Medicine
Shanshan Liu, Guangzhou University of Chinese Medicine
Ting Zhan, Guangzhou University of Chinese Medicine
Xiasheng Zheng (xszheng@gzucm.edu.cn), Guangzhou University of Chinese Medicine
Huaigeng Pan, Guangzhou University of Chinese Medicine


Introduction
Helicteres angustifolia, a well-known species of the genus Helicteres in the family Sterculiaceae, is a traditional medicinal herb used in Southern China [1]. According to modern pharmacological studies, alcoholic and aqueous extracts of H. angustifolia are often used clinically to treat influenza, headache, carbuncles, hemorrhoids, tonsillitis, pharyngitis, parotitis, inflammatory diseases, and cancer [2][3][4]. These pharmacological benefits are attributed to its chemical components, particularly the triterpenoids [5]. So far, 14 triterpenoids have been isolated and identified from H. angustifolia (summarized in Table 1), including betulinic acid [6], oleanolic acid [7], helicteric acid, and helicterilic acid [6]. These compounds can be grouped into two parent scaffold types, namely the lupane type (Figure 1A) and the oleanane type (Figure 1B). Intriguingly, most of these pentacyclic triterpenoids are decorated with acyl groups at the C-3 and/or C-27 position. In addition, to the best of our knowledge, acylated triterpenoids have been found mainly in the genus Helicteres and only rarely in other plants [8,9].
Modern pharmacological studies indicate that acylated triterpenoids exhibit multiple pharmacological effects, such as lowering transaminase levels [9], promoting apoptosis in cancer cells, and providing anti-tumor [4], anti-hepatitis B virus [10], anti-liver fibrosis [11,12], and liver-protecting effects [13,14]. However, the content of these compounds in H. angustifolia is limited, so relying exclusively on isolation from herbal material is impractical. Enhancing the biosynthetic pathway of the target products in homologous or heterologous hosts could be a more effective strategy for addressing this shortage than the traditional extraction approach [15,16]. For example, researchers have successfully improved the yield of triterpenoids in S. cerevisiae strains by introducing and overexpressing the genes that encode the key rate-limiting enzymes of the triterpenoid synthesis pathways of Crataegus pinnatifida [17].
Regrettably, the lack of basic knowledge of the metabolic network of H. angustifolia hinders further investigation of its biosynthesis and metabolic engineering. Owing to the rapid development of high-throughput sequencing, de novo transcriptome analysis provides a powerful tool for screening candidate genes involved in the biosynthetic pathways of a target herb without a reference genome, which will support further investigation [18,19].
Among them, farnesyl pyrophosphate (FPP) is an important intermediate in triterpenoid biosynthesis. Squalene synthase (SS) condenses two units of FPP in a tail-to-tail manner to yield the hydrocarbon squalene. Subsequently, another important precursor, 2,3-oxidosqualene, is generated by squalene monooxygenase (SM) in the presence of O2 and the coenzyme NADPH [24,25]. 2,3-Oxidosqualene can then be converted into various triterpenoid skeletons by oxidosqualene cyclases (OSCs) [26]. Based on the known compounds and intermediates, biosynthetic routes were proposed for the triterpenoids of H. angustifolia. As shown in Figure 2, the OSCs involved in the triterpenoid synthesis pathway of H. angustifolia are mainly lupeol synthase and β-amyrin synthase, which act on 2,3-oxidosqualene to produce lupeol and β-amyrin, respectively. C-28 oxidation catalyzed by a cytochrome P450 enzyme (CYP450) is then essential for converting lupeol and β-amyrin into betulinic acid and oleanolic acid. Acylation is presumably the final step in the modification of the triterpenoids of H. angustifolia. Unique triterpenoids such as helicteric acid and helicterilic acid are formed by acetylation at C-3 and benzoylation at C-27 of betulinic acid and oleanolic acid (Figure 2).
Triterpenoids are important metabolic substances [27,28], and their biosynthesis can be affected by metabolic stimulation of their source plants, including exogenous phytohormone treatment and mechanical damage [29,30]. In this study, transcriptomes of H. angustifolia samples subjected to different metabolic stimulation treatments were sequenced and analyzed to obtain a genetic basis for the biosynthesis and metabolic regulation of triterpenoids in vivo. Moreover, cloning and functional characterization of two key enzyme-encoding genes revealed their involvement in the biosynthesis of the direct precursors of the medicinal acylated triterpenoids. Overall, the results presented here support our understanding of the regulation and synthesis of triterpenoids in H. angustifolia, laying the groundwork for further studies of the potential manipulation of this pharmaceutical resource.

With wetting of the leaf surface as the endpoint of spraying, the total triterpenoid content of H. angustifolia was analyzed after 24 h of treatment, and the optimal treatment concentration was determined. Then, one-third of the leaf blade area was scratched with scissors, and samples were taken 24 h later as the mechanical damage (MD) group. Leaves were sprayed with the optimal concentration of methyl jasmonate (MeJA) or salicylic acid (SA) and sampled after 24 h as the MeJA treatment group and the SA treatment group, respectively. Leaves sprayed with the solvent used to prepare the MeJA and SA solutions (0.04% ethanol) and treated for 24 h were sampled as the solvent control group (EtOH). Samples without any treatment were used as the negative control group (NC). Fifteen individuals from the five treatment groups were placed in liquid nitrogen for the determination of total triterpenoid content and the extraction of RNA.

Chemical analysis of total triterpenoids
The procedures for total triterpenoid extraction and measurement were adapted from Cai et al. [31] and Oludemi et al. [32]. Briefly, leaves were sampled and extracted with 80% ethanol at a material-to-liquid ratio of 1:30 by ultrasonic extraction for 2 h at 70 ℃. The dry extract was then redissolved in n-butanol. With 5% vanillin-glacial acetic acid solution and perchloric acid as the chromogenic agents, the absorbance of the mixture was measured by ultraviolet spectrophotometry at 548 nm. Betulinic acid served as the reference standard and was used to construct the standard curve, which showed a good linear relationship, with the equation y = 4.2893x + 0.0136 (r² = 0.9995). Total triterpenoid extraction and measurement were performed in three independent replicates.
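The standard curve can be inverted to convert a measured absorbance into a betulinic acid equivalent concentration. The following Python sketch assumes that y is the absorbance at 548 nm and x is the betulinic acid concentration in whatever units were used for the standards (the units are not stated in the text); the function names and the example reading are illustrative only.

```python
# Standard curve reported in the text: y = 4.2893x + 0.0136 (r^2 = 0.9995).
# Assumption: y = absorbance at 548 nm, x = betulinic acid concentration
# in the (unstated) units used to prepare the standards.
SLOPE = 4.2893
INTERCEPT = 0.0136


def absorbance_from_concentration(conc: float) -> float:
    """Predict the absorbance for a given concentration using the fitted line."""
    return SLOPE * conc + INTERCEPT


def concentration_from_absorbance(absorbance: float) -> float:
    """Invert the fitted line to estimate concentration from absorbance."""
    return (absorbance - INTERCEPT) / SLOPE


if __name__ == "__main__":
    # Hypothetical absorbance reading, for illustration only.
    reading = 0.4425
    print(f"estimated concentration: {concentration_from_absorbance(reading):.4f}")
```

Readings outside the range covered by the standards should not be converted this way, because the linear fit is only supported within that range.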

RNA extraction and transcriptome sequencing
Total RNA of the samples (Figure S1) was extracted using a plant total RNA purification kit (TIANGEN, China) following the manufacturer's instructions. The integrity and quantity of the RNA samples were assessed by 1% agarose gel electrophoresis and with a NanoDrop 2000C spectrophotometer (Thermo Scientific, USA). For transcriptome analysis, qualified RNA samples were sent to Majorbio (Shanghai, China) for cDNA library construction and sequencing on an Illumina HiSeq4000 (Illumina Inc., San Diego, CA, USA). Paired-end reads were generated and examined with fastx_toolkit_0.0.14 (http://hannonlab.cshl.edu/fastx_toolkit/) to assess sequence quality.

DEG analyses
Fragments per kilobase per million reads (FPKM) values were used to gauge relative expression levels, and the FPKM values of the transcripts were determined using RSEM v1.2.15 (http://deweylab.github.io/RSEM/) with default parameters. DESeq2 version 1.10.1 was used to analyze the raw counts based on the negative binomial distribution. Differentially expressed genes (DEGs) between groups were identified using the thresholds p-adjust < 0.05 and |log2FC| >= 1. KEGG enrichment analyses of the DEGs were performed with KOBAS version 2.1.1 (http://www.genome.jp/kegg/). Nine differentially expressed unigenes related to triterpenoid biosynthesis were selected for qPCR verification. The gene and primer information is shown in Table S1.
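The two DEG thresholds can be applied to a table of differential expression results as in the following Python sketch. It is a minimal illustration rather than the pipeline used in the study: the function name, the labels "up", "down", and "ns", and the example rows are hypothetical, and a positive log2FC is assumed to mean higher expression in the treated group than in the control.

```python
import math

PADJ_CUTOFF = 0.05    # p-adjust < 0.05, as stated in the text
LOG2FC_CUTOFF = 1.0   # |log2FC| >= 1, as stated in the text


def classify_transcript(padj: float, log2fc: float) -> str:
    """Return 'up', 'down', or 'ns' (not significant) for one transcript."""
    # Missing values (NaN) are treated as not significant.
    if math.isnan(padj) or math.isnan(log2fc):
        return "ns"
    if padj < PADJ_CUTOFF and abs(log2fc) >= LOG2FC_CUTOFF:
        # Assumption: positive log2FC means higher expression after treatment.
        return "up" if log2fc > 0 else "down"
    return "ns"


if __name__ == "__main__":
    # Hypothetical (transcript, padj, log2FC) rows, not data from this study.
    rows = [
        ("t1", 0.001, 2.3),
        ("t2", 0.20, 3.0),
        ("t3", 0.01, -1.0),
        ("t4", 0.04, 0.8),
    ]
    for name, padj, log2fc in rows:
        print(name, classify_transcript(padj, log2fc))
```

Both conditions must hold at once: a large fold change with a non-significant adjusted p-value, or a significant but small change, is not counted as a DEG.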

Mining of the candidate genes involved in triterpenoid biosynthesis
Three approaches were employed to screen for genes related to triterpenoid biosynthesis. First, BLASTx searches of the unigenes described above were performed against public protein databases, including NR, Swiss-Prot, KEGG, GO, PFAM, and COG (E-value < 0.00001). Second, BLASTp searches were performed using the protein sequences of a set of characterized genes as queries against the TransDecoder-predicted peptide sequences of the H. angustifolia transcriptome assembly (E-value < 1e-5, identity > 40%, score > 200). Lastly, pfamscan, based on the HMMER suite (http://hmmer.janelia.org/), was run using the Pfam profiles PF13243.6, PF13249.6, PF00067.22, and PF02458.15 as queries (E-value < 10^−5). Positive hits were manually inspected to confirm the presence of the corresponding domains and were then classified as putative genes. These candidate genes were subjected to phylogenetic analysis using the neighbor-joining (N-J) method in MEGA7.0 with 1000 bootstrap replicates. Expression data were normalized by square root transformation and then used to infer co-expression network modules with the WGCNA R package, following the step-by-step network construction and module detection method; the soft-thresholding method was used to select a proper power-law coefficient β, and a dynamic hierarchical tree cut algorithm was used to detect the co-expression modules [33,34].
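The BLASTp thresholds from the second approach can be illustrated with a short Python filter. This sketch assumes that the hits are available in the standard 12-column BLAST tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that "score" refers to the bit score; neither detail is stated in the text, and the identifiers in the example lines are invented.

```python
from typing import Dict, List

EVALUE_MAX = 1e-5      # E-value < 1e-5
IDENTITY_MIN = 40.0    # identity > 40%
SCORE_MIN = 200.0      # score > 200 (assumed here to be the bit score)


def parse_hit(line: str) -> Dict[str, object]:
    """Parse one line of 12-column BLAST tabular output."""
    fields = line.rstrip("\n").split("\t")
    return {
        "query": fields[0],
        "subject": fields[1],
        "identity": float(fields[2]),
        "evalue": float(fields[10]),
        "score": float(fields[11]),
    }


def passes_thresholds(hit: Dict[str, object]) -> bool:
    """Apply the three cutoffs; all of them must be satisfied."""
    return (
        hit["evalue"] < EVALUE_MAX
        and hit["identity"] > IDENTITY_MIN
        and hit["score"] > SCORE_MIN
    )


def filter_hits(lines: List[str]) -> List[Dict[str, object]]:
    """Keep only hits passing the cutoffs; skip blank and comment lines."""
    hits = [parse_hit(l) for l in lines if l.strip() and not l.startswith("#")]
    return [h for h in hits if passes_thresholds(h)]


# Hypothetical hits: only the first one passes all three cutoffs.
EXAMPLE_LINES = [
    "refOSC\tunigene_A\t62.5\t700\t250\t5\t1\t700\t10\t709\t1e-120\t850",
    "refOSC\tunigene_B\t35.0\t300\t190\t4\t1\t300\t5\t304\t1e-20\t210",
    "refOSC\tunigene_C\t55.0\t120\t50\t2\t1\t120\t1\t120\t1e-8\t95.5",
]

if __name__ == "__main__":
    for hit in filter_hits(EXAMPLE_LINES):
        print(hit["subject"], hit["identity"], hit["evalue"], hit["score"])
```

Hits retained by such a filter would still need the manual domain inspection described above before being treated as candidates.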

Recombinant expression
The open reading frames (ORFs) of the candidate genes were amplified from sample cDNA; the primers are summarized in Table S2. Cloning vectors were then constructed and, after sequence verification, transformed into Escherichia coli for preservation. The ORFs of the candidate OSCs and CYP450s were then amplified with gene-specific primers that contained SalI and NheI restriction sites, respectively. The PCR products were subcloned into the expression vector pESC-TRP to create pESC-TRP-OSCs and pESC-TRP-CYP450s. After the integrity of the genes was verified by sequencing, the recombinant plasmids were transformed into Saccharomyces cerevisiae WAT11. Induction with 2% D-galactose activates the GAL10 and GAL1 promoters on the pESC-TRP vector in the recombinant yeast, thereby inducing expression of the target (6×His)-tagged proteins. Western blotting was used to detect the expression levels of the fusion proteins at different time points to determine the optimal induction time, following previously reported methods [35]. The ORFs of the TATs and TBT were subcloned into pCOLD-TF, an expression vector with an N-terminal 6×His tag fusion. Upon sequence confirmation, the plasmids were transformed into E. coli Rosetta (DE3) for heterologous expression. All gene-specific forward and reverse primers are listed in Table S3.

GC-MS analysis of yeast extracts and enzymatic assays
The recombinant yeast cells were harvested by centrifugation after 7 days of Gal induction, and 20 mL of lysis solution (20% KOH, 50% EtOH) was added for reflux at a gentle boil for 30 min. After cooling, the pH was adjusted to 2.0 with concentrated hydrochloric acid. The yeast lysate was extracted three times with an equal volume of n-hexane. After the hexane phase was dried, the residue was derivatized with N,O-bis(trimethylsilyl)trifluoroacetamide (Sigma-Aldrich) at 80 ℃ for 60 min before GC-MS analysis. The GC system was equipped with an HP-5MS column (30 m × 0.25 mm × 0.25 μm). The sample was injected at a flow rate of 1.2 mL/min and a temperature of 250 ℃. The GC oven temperature was programmed from 80 ℃ (held for 1 min) to 300 ℃ at 40 ℃/min and held for 10 min. For the identification of metabolites, full mass spectra were acquired by scanning the mass-to-charge ratio range of 50 to 600. Triterpenoid products in yeast were monitored according to the base peak. Yeast carrying the empty pESC-TRP vector was used as the negative control, and authentic standards were used as the positive control. Triterpenoids were identified by comparing both retention times and mass spectra with those of the authentic standards.
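The oven program described above implies a total run time that can be worked out directly. The Python sketch below models the setpoint temperature as a function of time, assuming an ideal linear ramp and assuming that the 10 min final hold starts when 300 ℃ is reached; the constant and function names are illustrative.

```python
INITIAL_TEMP_C = 80.0     # starting oven temperature
INITIAL_HOLD_MIN = 1.0    # hold at 80 C
RAMP_C_PER_MIN = 40.0     # heating rate
FINAL_TEMP_C = 300.0      # final oven temperature
FINAL_HOLD_MIN = 10.0     # hold at 300 C

# (300 - 80) / 40 = 5.5 min of ramping; 1 + 5.5 + 10 = 16.5 min in total.
RAMP_MIN = (FINAL_TEMP_C - INITIAL_TEMP_C) / RAMP_C_PER_MIN
TOTAL_RUN_MIN = INITIAL_HOLD_MIN + RAMP_MIN + FINAL_HOLD_MIN


def oven_temperature(t_min: float) -> float:
    """Return the programmed oven setpoint (in C) at time t_min."""
    if not 0.0 <= t_min <= TOTAL_RUN_MIN:
        raise ValueError("time is outside the oven program")
    if t_min <= INITIAL_HOLD_MIN:
        return INITIAL_TEMP_C
    ramp_end = INITIAL_HOLD_MIN + RAMP_MIN
    if t_min < ramp_end:
        return INITIAL_TEMP_C + RAMP_C_PER_MIN * (t_min - INITIAL_HOLD_MIN)
    return FINAL_TEMP_C


if __name__ == "__main__":
    print("total run time (min):", TOTAL_RUN_MIN)
    for t in (0.0, 1.0, 3.75, 6.5, 16.5):
        print(t, oven_temperature(t))
```

Under these assumptions, the oven spends 5.5 min ramping, so compounds eluting after 6.5 min do so isothermally at 300 ℃.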
To characterize the acyltransferases, enzymatic assays containing 50 mM Tris-HCl buffer (pH 7.0), 300 mM NaCl, 1 mM dithiothreitol, 30-50 μg protein, 200 μM acyl donor (250 μM acetyl CoA/benzoyl CoA), and substrate (500 μM oleanolic acid/betulinic acid) were carried out in a final volume of 200 μL. The reactions were incubated at 30 ℃ for 2 h. After the reaction, the mixture was extracted twice with 500 μL ethyl acetate, and the extracts were combined. The residue was redissolved in 200 μL acetonitrile and centrifuged at 16,000 rpm for 5 min, and the supernatant was collected for HPLC analysis. The HPLC system was equipped with a Hypersil Gold AQ-C18 column (250 × 4.5 mm). Samples (20 μL) were injected at a column temperature of 25 ℃. The mobile phase consisted of solvent A (0.2% aqueous phosphoric acid) and solvent B (acetonitrile). Compounds were separated over 75 min using the following gradient at a flow rate of 1.2 mL/min: 0-60 min, 80% B; 60-75 min, 80%-100% B. The detection wavelength was 210 nm.
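The HPLC gradient can be written as a piecewise function of time. The following Python sketch assumes that the change from 80% to 100% B between 60 and 75 min is linear and that solvents A and B always sum to 100%; the function names are illustrative.

```python
ISOCRATIC_END_MIN = 60.0   # 0-60 min: 80% B
GRADIENT_END_MIN = 75.0    # 60-75 min: 80% -> 100% B
START_PERCENT_B = 80.0
END_PERCENT_B = 100.0


def percent_b(t_min: float) -> float:
    """Return the percentage of solvent B (acetonitrile) at time t_min."""
    if not 0.0 <= t_min <= GRADIENT_END_MIN:
        raise ValueError("time is outside the 75 min program")
    if t_min <= ISOCRATIC_END_MIN:
        return START_PERCENT_B
    # Assumption: linear interpolation over the 60-75 min segment.
    fraction = (t_min - ISOCRATIC_END_MIN) / (GRADIENT_END_MIN - ISOCRATIC_END_MIN)
    return START_PERCENT_B + (END_PERCENT_B - START_PERCENT_B) * fraction


def percent_a(t_min: float) -> float:
    """Return the percentage of solvent A (0.2% aqueous phosphoric acid)."""
    return 100.0 - percent_b(t_min)


if __name__ == "__main__":
    for t in (0.0, 30.0, 60.0, 67.5, 75.0):
        print(t, percent_b(t), percent_a(t))
```

The program is isocratic for the first 60 min, so the final 15 min segment mainly serves to elute strongly retained compounds at a higher acetonitrile content.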

Transcriptome analysis induced by metabolic stimulation
The optimal concentrations of 500 μM methyl jasmonate (MeJA) and 400 μM salicylic acid (SA) were determined by single-factor experiments (Figure S2). Subsequently, the total triterpenoid content of samples from the five groups was measured. The results showed elevated total triterpenoid contents in the MD-, MeJA-, and SA-induced groups relative to the NC group (Figure S3) Table S4. Then 581,934 transcripts and 424,824 unigenes were obtained after trimming and assembly of the clean data. The mean size of the unigenes was 655 bp, with an N50 length of 1,300 bp (Table 2). Figure S4, Table S5). In total, among the 424,824 unigenes generated by transcriptome sequencing, 245,709 (57.84%) received at least one hit in those databases, with 13,320 unigenes sharing common annotation (Figure S5).
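Two of the assembly statistics above can be reproduced with simple arithmetic. The Python sketch below uses the common definition of N50 (the length L such that sequences of length L or longer contain at least half of the total assembled bases); the exact convention used by the assembly software is not stated in the text, and the toy length list is invented. The annotation rate is recomputed from the two unigene counts given above.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of positive sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for positive lengths")


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Toy lengths (hypothetical), total = 54, so half = 27.
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print("toy N50:", n50(toy_lengths))
    # Counts taken from the text: 245,709 annotated of 424,824 unigenes.
    print("annotated unigenes (%):", round(percent(245709, 424824), 2))
```

Because N50 weights long sequences more heavily, it is typically larger than the mean length, as is the case for the values reported here (1,300 bp versus 655 bp).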

Analysis of DEGs under different treatments
The transcriptome data of the five groups were compared pairwise to identify potential DEGs among the different groups. Compared with the NC group, the MD group had the largest number of DEGs (Figure 2A), and in each metabolic stimulation group and the EtOH group the number of upregulated DEGs was higher than the number of downregulated DEGs (Figure 2B). The Venn diagram in Figure 2C indicates that 774 DEGs were shared among the four aforementioned pairwise comparisons, and a total of 22,430 DEGs were obtained across the four comparisons. Heat maps were produced based on the expression patterns of all DEGs, which could be divided into eight clusters (Figure 2D). KEGG enrichment analyses of the DEGs in the four pairwise comparisons (Figure S6, Tables S6-S9) suggested that the most enriched pathways were mostly associated with triterpenoid synthesis. This indicates that the expression of genes related to triterpenoid synthesis was upregulated after metabolic stimulation, leading to enhanced triterpenoid accumulation for resistance, which is consistent with previous research [30,37].
In addition, weighted gene co-expression network analysis (WGCNA) was conducted to further investigate the potential genes involved in triterpenoid biosynthesis in H. angustifolia. All DEGs were used to construct a scale-free co-expression network. The dynamic hierarchical tree cut algorithm was employed to divide the cluster tree constructed from the DEGs, resulting in 10 co-expression modules. These modules are named according to their colors, namely blue (1619 genes), brown (891 genes), turquoise (2505 genes), grey (17 genes), yellow (759 genes), pink (55 genes), green (673 genes), black (87 genes), magenta (52 genes), and red (115 genes) (Figure 3). The genes within each module were subjected to KEGG pathway enrichment analysis. Significantly enriched genes (P < 0.05) were filtered for in-depth analysis. The results showed that five modules (blue, turquoise, black, green, and magenta), comprising 4,936 genes in total, were related to triterpenoid biosynthesis.
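The module sizes listed above can be tallied to check the reported total for the five triterpenoid-related modules. The following Python sketch only restates the numbers given in the text; the variable and function names are illustrative.

```python
# Module sizes as listed in the text (genes per WGCNA module).
MODULE_SIZES = {
    "blue": 1619,
    "brown": 891,
    "turquoise": 2505,
    "grey": 17,
    "yellow": 759,
    "pink": 55,
    "green": 673,
    "black": 87,
    "magenta": 52,
    "red": 115,
}

# Modules reported as related to triterpenoid biosynthesis.
TRITERPENOID_MODULES = ("blue", "turquoise", "black", "green", "magenta")


def total_genes(modules) -> int:
    """Sum the gene counts of the named modules."""
    return sum(MODULE_SIZES[name] for name in modules)


if __name__ == "__main__":
    print("triterpenoid-related modules:", total_genes(TRITERPENOID_MODULES))
    print("all modules:", total_genes(MODULE_SIZES))
```

The five selected modules sum to 4,936 genes, matching the figure reported in the text.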

Quantitative real-time PCR (qRT-PCR) verification of the DEGs
The expression patterns of 41 key candidate genes (Figure S7, Table S10) involved in triterpenoid biosynthesis were analyzed. Among them, 18 genes showed significantly different expression levels among the five treatment groups, and nine of these were chosen for qRT-PCR analysis to further verify the reliability of our RNA-Seq data. The expression patterns of these nine DEGs were consistent with the RNA-Seq data (Figure 4), which confirmed the accuracy of our transcriptome data.

Cloning and sequence analysis of the candidate genes
To shed light on acylated triterpenoid biosynthesis in H. angustifolia, putative key enzyme-encoding genes, including three OSCs (named HaOSC1, HaOSC2, and HaOSC3), four CYP450s (named HaCYPi1, HaCYPi2, HaCYPi3, and HaCYPi4), two TATs (named HaTAT1 and HaTAT2), and one TBT (named HaTBT), were targeted for functional analysis. Their nucleotide sequences are given in Table S11, the physicochemical properties of the corresponding proteins in Table S12, the two-dimensional structure information of the proteins in Table S13, and the three-dimensional structures in Figure S8.
Phylogenetic analysis of the three OSCs together with a set of well-characterized OSCs revealed that HaOSC1 and HaOSC3 grouped into the lupeol synthase clade, whereas HaOSC2 was classified as a β-amyrin synthase (Figure 5A, Table S14). In addition, all three candidate OSCs contained complete SQHop-cyclase-N and SQHop-cyclase-C domains. Furthermore, several conserved motifs were present in all three candidate OSCs, including DCTAE, MWCYCR, and QW repeats (Figure 5B).
The phylogenetic analysis also indicated that the four CYP450 candidate genes grouped into the CYP716A subfamily (Figure 5C, Figure 5D).
Meanwhile, the three candidate acyltransferase-coding genes were subjected to phylogenetic analysis with other functionally characterized BAHDs. HaTBT fell into a branch that principally includes acyltransferases related to benzenoid ester production. HaTAT1 and HaTAT2 clustered in the same branch as other anthocyanin-modifying acetyltransferases (Figure S9A, Table S16). Domain analysis showed that the transferase family domain was present in all three candidates, and that they all possess three amino acid motifs, HXXXD, YFGNC, and DFGWG (Figure S9B), which are highly conserved in BAHD enzymes [40].
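A motif check of this kind can be expressed with regular expressions. The Python sketch below is a minimal illustration, not the domain analysis used in the study: it treats X in HXXXD as any amino acid, reports 0-based start positions, and uses an invented toy sequence rather than any HaTAT or HaTBT sequence.

```python
import re
from typing import Dict, List

# Lookahead patterns so that overlapping occurrences are also reported.
# HXXXD: His, any three residues, Asp. YFGNC and DFGWG are literal motifs.
BAHD_MOTIFS = {
    "HXXXD": re.compile(r"(?=H[A-Z]{3}D)"),
    "YFGNC": re.compile(r"(?=YFGNC)"),
    "DFGWG": re.compile(r"(?=DFGWG)"),
}


def find_motifs(protein: str) -> Dict[str, List[int]]:
    """Return the 0-based start positions of each motif in a protein sequence."""
    seq = protein.upper()
    return {
        name: [match.start() for match in pattern.finditer(seq)]
        for name, pattern in BAHD_MOTIFS.items()
    }


def has_all_motifs(protein: str) -> bool:
    """True if every motif occurs at least once."""
    return all(positions for positions in find_motifs(protein).values())


# Hypothetical toy peptide containing one copy of each motif.
TOY_SEQUENCE = "MKAHTLADPLYFGNCAAADFGWGKP"

if __name__ == "__main__":
    print(find_motifs(TOY_SEQUENCE))
    print(has_all_motifs(TOY_SEQUENCE))
```

Because HXXXD is short and degenerate, it can occur by chance in unrelated proteins, so a regular-expression hit is best read alongside the domain and phylogenetic evidence described above.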

Heterologous expression and functional characterization of the OSCs
The three candidate OSCs were functionally validated by expressing their ORFs in the expression vector pESC-TRP. The recombinant plasmids were introduced into yeast (Saccharomyces cerevisiae strain WAT11) by the lithium acetate transformation method. Protein expression in the recombinant yeast was monitored over 24 hours; the recombinant yeast carrying pESC-TRP-HaOSC1 successfully expressed a protein with a molecular mass of about 86 kDa, and protein expression reached its highest level 8 hours after induction (Figure 6A).
Gal-induced transgenic yeast cells were also analyzed for the accumulation of triterpenoid metabolites. Yeast cells were extracted, and metabolites were identified by gas chromatography-mass spectrometry (GC-MS). By comparing the retention times and mass spectra of the new products with those of authentic standards (α-amyrin, β-amyrin, and lupeol), we found that HaOSC1-expressing transgenic yeast accumulated lupeol, which was not present in control yeast cells transformed with the empty vector (Figure 6B-C). The yeast expression experiments were repeated three times with very similar results. These results demonstrate that HaOSC1 encodes a lupeol synthase, which can use 2,3-oxidosqualene in yeast as the substrate to synthesize the target product lupeol (Figure 6D).

Functional validation of the CYP450s
To determine the product specificities of the HaCYP450s, the ORFs of the four candidates were expressed in yeast under the control of the galactose-inducible GAL10 promoter. Induced cells were then analyzed by Western blot, and a fusion protein band of about 54 kDa was observed for HaCYPi3. The induction time was also investigated; as with HaOSC1, protein expression reached its highest level at about 8 hours after induction (Figure 7A).
Based on the preceding sequence analysis, HaCYPi3 was predicted to be a potential C-28 oxidase acting on α-amyrin, β-amyrin, and lupeol substrates. Therefore, we first transferred pESC-HaCYPi3 and the pESC-HaOSC1 recombinant plasmid described above into the same yeast for co-expression. The metabolites produced after induction were analyzed by GC-MS. Unfortunately, the target product betulinic acid was not detected, which suggested that HaCYPi3 does not encode a lupeol oxidase. We then co-expressed HaCYPi3 with the α-/β-amyrin synthase IaAS, previously identified by our research group from Ilex asprella. Surprisingly, transgenic yeast co-expressing IaAS and HaCYPi3 accumulated new products compared with the empty vector control. Apart from the corresponding substrate peaks, the retention time and mass spectrum of one new product were consistent with those of the oleanolic acid authentic standard (Figure 7B-C). The results of three repeated experiments were consistent, which demonstrated that HaCYPi3 encodes a β-amyrin oxidase that can use the β-amyrin produced by IaAS in yeast as the substrate to synthesize the target product oleanolic acid (Figure 7D). Furthermore, the subcellular localization of HaCYPi3 was examined; the results showed that it was localized in the cytoplasm (Figure 7E), which is consistent with reports that triterpenoids are synthesized in the cytoplasm.

Discussion
Triterpenoids, as natural plant products with extensive biological activities, have attracted the attention of many researchers. The biosynthetic pathways of many triterpenoids have now been successfully elucidated, such as those of ginsenosides [41], ganoderic acid [42], and glycyrrhizic acid [43]. However, acylated triterpenoids are uncommon and have so far been reported only in plants belonging to Helicteres. Hence, H. angustifolia is a good research material for clarifying the biosynthetic pathway of acylated triterpenoids.
The biosynthesis of triterpenoids has been reported to be affected by plant metabolic stimulation, including exogenous plant hormone treatment and mechanical damage [44][45][46]. Exogenous methyl jasmonate (MeJA) and salicylic acid (SA) can act as signal transduction molecules in plant cells and induce the production of terpenoids, phenols, and other secondary metabolites [47,48]. As a form of abiotic stress, mechanical damage can effectively regulate the growth, development, stress responses, and metabolites of plants [28]. In this study, the total triterpenoid content of H. angustifolia was analyzed after hormone stimulation at different concentrations, and the optimal treatment conditions were determined. Furthermore, transcriptome sequencing of H. angustifolia under the optimized treatment conditions was carried out to reveal the biosynthetic pathway at the molecular level, which provides a theoretical basis for the effective development and utilization of triterpenoids in H. angustifolia.
Bioinformatics analyses, including phylogenetic and conserved motif analyses, suggested that HaOSC2 might be a β-amyrin synthase. In this experiment, however, no new product was detected in the metabolites of the induced yeast, possibly because the protein expressed in yeast could not fold properly and was therefore inactive. In the future, we will try to construct a plant expression vector for HaOSC2 and further study its function in model plants such as Nicotiana benthamiana or Arabidopsis thaliana.
Acylation is the final and most critical step in the biosynthesis of acylated triterpenoids. It plays an important role in many modifications of plant metabolism, such as changing the polarity, volatility, chemical stability, and biological activity of metabolites [49]. However, there have been only a few studies of plant acyltransferases, predominantly focusing on Arabidopsis thaliana [50], Oryza sativa [51], and Populus simonii [52], which limits further understanding of this family of enzymes. Our study of triterpenoid acyltransferases in H. angustifolia therefore helps to fill these gaps. In this study, prokaryotic expression vectors carrying the three candidate acyltransferases were successfully constructed, and expression of the fusion proteins was induced in the Escherichia coli Rosetta (DE3) strain. The three target proteins were successfully purified as (6×His)-tagged proteins using a nickel-nitrilotriacetic acid agarose column (Figure S10). Unfortunately, comparison of the enzyme assays containing inactivated protein with those containing active protein showed that no new products were produced in the reactions of the three candidate proteins. A possible reason is that the catalytic efficiency of the enzymes is too low, or that the substrates of these three proteins are not oleanolic acid or betulinic acid. Hence, their functions still need to be confirmed.

Conclusion
Helicteres angustifolia, a shrub distributed in Southern China and used in traditional Chinese medicine, is rich in triterpenoids such as betulinic acid, oleanolic acid, helicteric acid, helicterilic acid, and similar derivatives. The biosynthetic pathway of these compounds is of great value because of their medicinal activity. Here, we report the first transcriptomic study of H. angustifolia for the exploration of functional genes. Three OSCs, four CYP450s, two TATs, and one TBT were screened out as candidate genes.

Figure 3
Gene co-expression modules in the hormone- and mechanical damage-treated transcriptomes, showing the cluster hierarchical tree constructed from the module eigengenes and a heatmap of the correlation coefficients between modules.

Figure 4
Validation of genes involved in triterpenoid biosynthesis using qRT-PCR. The left Y-axis and black bars represent the relative expression measured by qPCR, while the right Y-axis and grey bars show the FPKM values from the RNA-Seq data.
