We carried out transcriptome sequencing of three tissues (leaf, stem and root) of fenugreek plant and constructed a combined assembly along with functional annotation. We investigated the orthology relationship with other plant species and also assessed the differential gene expression across tissues. Further, we explored the secondary metabolite landscape of fenugreek considering metabolites with reported medicinal properties from several classes of compounds (alkaloids, saponins, volatiles, flavonoids, etc.) and mined for the enzymes involved in their synthesis.
3.1 Transcriptome assembly and functional annotation
Combined transcriptome assembly was obtained from three different tissues (leaf, stem and root) with N50 of 1382 bases. The 169.5 Mb assembly contained 209,831 transcripts. This assembly was 92.2% complete (95.49% using partial core genes) as assessed through the BUSCO 20.
The candidate coding regions within the fenugreek transcriptome were predicted using Transdecoder program. A set of 64,670 ORFs were predicted from the assembly which varied from 200 to 2000 base pairs. We obtained annotations, using BLASTX against UniProt Viridiplantae database, for 26% of transcripts (Table 1). Whereas, among the predicted ORFs, we obtained annotation for 90% of ORFs, through BLASTP using E-value threshold of 10− 5 and 50% query coverage cut-off. Most of these ORFs (76%) were associated with Medicago truncatula, a closely related species, as the best hit.
Table 1
Transcriptome assembly and annotation statistics
Details of the transcriptome assembly |
Assembly | Total number of transcripts | Number of transcripts > 1 Kb | N50 (bp) | Size (Mb) | GC% | BUSCO completeness |
Combined assembly | 209831 | 48685 | 1382 | 169.56 | 36.58 | 92.2% (95.49% including partial genes) |
Functional annotation of transcripts and predicted amino acid sequences |
Annotation of transcripts (209831): |
BLASTX | 54577 (26%) |
GO | 60798 (30%) |
rRNAs (RNAmmer) | 8 (0.04%) |
Annotation of predicted amino acid sequences (Total ORFs: 64670): |
Sequence hits (BLASTP) | 58504 (90%) |
Protein domains (Pfam-hmmscan) | 42524 (66%) |
Signal peptides (SignalP v.4.0) | 3743 (6%) |
Transmembrane helices (TMHMM v.2.0) | 12417 (19%) |
Orthologous groups (eggNOG/COG) | 37243 (58%) |
Association with KEGG Pathway | 38930 (60%) |
GO terms | 38790 (59%) |
Among the predicted ORFs, 66% (42524) could be associated with Pfam domains, representing 4400 unique Pfam domains. The leucine-rich repeat (LRR), pentatricopeptide repeat (PPR), tetratricopeptide repeats (TPR) and several transcription factor binding (WD40, F-box, AAA) domain families were the most abundant annotated domains observed. Similar to other angiosperms, several other Pfam domains such as, protein serine/threonine kinases, protein tyrosine kinases, calcium binding EF-hand and ankyrin repeats were found in abundance (Supplementary Fig. 1A). We could associate GO terms to more than 50% of total ORFs. The top enriched GO terms were classified into molecular function (MF), biological process (BP) and cellular component (CC) (Supplementary Fig. 1B). The enriched terms under MF included transporter activity, transcription regulator, antioxidant activity and nutrient reservoir. Whereas, some of the most enriched BP terms were metabolic process, response to stimuli and immune system process. It was observed that transmembrane helices were present in 19% of ORFs, signal peptides were predicted in 6% of ORFs, whereas eight rRNAs were identified. Of the total number of amino acid sequences obtained from the transcriptome assembly, 59% were mapped to functional annotation databases, namely eggNOG (orthogroups), GO and KEGG pathways (Table 1; complete annotation data available in Supplementary Data 1).
3.2 Detection of orthologous relationships
The orthologs were detected using OrthoMCL and ProteinOrtho for fenugreek and closely related plants (Figs. 2A and 2B and Supplementary Fig. 2). Among the five plants used for OrthoMCL analysis, 8822 orthogroups were common to all whereas, for ProteinOrtho, 114 orthogroups were common across 39 plants considered for the analysis. It was observed that the orthogroup distribution was highly similar across all the plants from subfamily Faboideae (G. max, P. vulgaris, T. pratens, M. truncatula and T. foenum-graecum). We constructed the phylogenetic tree using selected plants from subclass Rosids and O. sativa. (Fig. 2B).
We observed 20,230 fenugreek specific paragroups and singletons, among which only the longest ORFs (15,712, 24% of all predicted gene products) were assessed for their functional annotation (hereafter referred to as singletons). A total of 6300 of the singletons in fenugreek proteins show presence of 1771 unique Pfam families. PPR, LRR, protein kinases, protein tyrosine kinases, zinc knuckle and reverse transcriptase domain were among the most abundant Pfam domains in the singleton proteins (Supplementary Fig. 3A). Almost half of the fenugreek singletons (7235) were found to have Gene Ontology (GO) annotation. Nucleotide/nucleic acid binding, metal ion binding were abundant in molecular functions within GO terms (Supplementary Fig. 3B). Translation, regulation of transcription and defence response were the most abundant biological process (Supplementary Fig. 3C). The most common cellular component term within these GO terms was ‘integral component of membrane’ (Supplementary Fig. 3D).
3.3 Differential expression analysis
The transcript abundance across all three tissues (leaf, stem and root) was assessed. The top 20 significantly expressing transcripts were identified from each tissue and are represented as a heatmap (Fig. 3). While observing variation between highly expressed transcripts in each tissue, we could identify six transcripts (protease inhibitor, metallothionein, dehydrin B, thioredoxin H, plant invertase inhibitor and translationally controlled tumor-like protein) highly expressed across three tissues. Considering the leaf sample, the highly expressed transcripts were mostly related to photosynthesis, for example, ribulose bisphosphate carboxylase, chlorophyll a-b binding protein, carbonic anhydrase etc. In the stem, apart from photosynthesis related transcripts, few cytochrome genes and pathogenesis related protein families were highly expressed. In the root, the top three highly expressed annotated transcripts included metallothionein, dehydrin B and Nmr-A like family of genes. These genes are well documented to be involved in drought tolerance in crops 47–50. The tissue-wide expression values for enzymes involved in synthesis of several secondary metabolites considered in this work are mentioned in the subsequent section.
3.4 Identification and analysis of genes involved in synthesis of secondary metabolites
We explored the secondary metabolites landscape in fenugreek with an aim to identify the transcripts involved in their biosynthesis. The candidate metabolites and the enzymes involved in their biosynthesis were chosen following an extensive literature survey. These spanned several classes of secondary metabolites: alkaloids, saponins, volatiles and flavonoids.
3.4.1 Alkaloids
Alkaloids are heterocyclic nitrogen containing compounds displaying a wide array of pharmacological properties. These compounds are not only associated with plant growth and defence but are a storage source of nitrogen 51. We considered trigonelline and glycine-betaine from the alkaloid class found in fenugreek. We pursued identification of enzymes involved in their biosynthesis by adopting the strategy described in Joshi et al (2020) 38.
Trigonelline is a derivative of nicotinic acid, first isolated from fenugreek (Trigonella foenum-graecum). Trigonelline has several medicinal properties such as hypoglycaemic, neuroprotective, antibacterial, and anti-tumour activities 52. It is synthesised through purine nucleotide cycling, wherein, nicotinate is methylated to methyl nicotinate (trigonelline) by nicotinate N-methyltransferase (NNMT) (EC: 2.1.1.7). We selected and curated protein sequences for this enzyme found in other plants, as a start point to carry out sequence search in fenugreek transcriptome. There are five functionally important residues (FIRs), a substrate-binding motif Asn 21, Tyr 120, His 124 a catalytic motif Thr 264 and a SAM binding motif in the NNMT enzyme 42. Although 12 hits were obtained, a single hit (Tfoe_c19814_g2_i1_m7872) was identified based on co-clustering with Medicago truncatula sequence (NCBI RefSeq ID: XP_013464471.1) (Supplementary Fig. 4E) with a sequence identity of 92.6 %and a 100 %query coverage. Further this hit was validated through mapping of the FIRs (Supplementary Fig. 4A-D). Based on the TPM values, this transcript was expressed highest in the leaf tissue sample (Fig. 4, Supplementary Table 3). This trend was also similar to the expression level estimation through qRT-PCR (Supplementary Fig. 5A). Trigonelline, as a compound, was quantified across different tissues of fenugreek as described further in section 3.5.
Glycine betaine is one of the major osmoprotectants in plants and also contributes to maintaining membrane and cell stability. It plays a protective role against inflammation, carcinogenesis and neurodegenerative diseases 53–55. It was identified in the extract of seeds of fenugreek 56. It is synthesized in plants by the oxidation of choline through a two-step process (Supplementary Data 2). The first step is catalyzed by choline monooxygenase (CHMO) (EC: 1.14.15.7) which is an unusual iron-sulfur containing enzyme. We identified one hit (Tfoe_c38951_g1_i2_m30577) from fenugreek which contained both the motifs (Rieske 2Fe-2S and non-heme binding region). The second step catalyzes the formation of glycine-betaine from betaine-aldehyde by betaine aldehyde dehydrogenase (BADH) (EC: 1.2.1.8) 57. We identified one hit for BADH (Tfoe_c82300_g1_i1_m72031) and mapped the FIRs (proton acceptor: Glu260, Nucleophile: Cys294 and cation-pi interacting Trp285, Trp456). The hits for both the enzymes in fenugreek co-clustered with the M.truncatula proteins (Supplementary Data 2). Both these transcripts were relatively abundant in the leaf tissue sample (Supplementary Table 3).
3.4.2 Saponins
Saponins are amphipathic glycosides grouped phenomenologically by the soap-like foaming they produce when shaken in aqueous solutions. These metabolites are found in the plant families- Sapindaceae, Aceraceae, Hippocastanaceae and Fabaceae 58. We focussed on the biosynthetic pathway of diosgenin, a steroidal sapogenin attributed to treatment of metabolic disorders such as diabetes and hypercholesterolemia and also implicated in treatment of cancer, inflammation and several infections 10. Tissue-wide diosgenin content varies in fenugreek plant and is reported to be present in highest quantity in seeds. Moreover, its quantity differs across several cultivars and is considered one of the criterion in crop improvement programs 59.
From the proposed biosynthetic pathway of diosgenin, as per Ciura et al. 2017 60, (Supplementary Data 2), putative sequences corresponding to the last two enzymes (sterol-3β-glucosyltransferase and β-glucosidase) were identified from the set of protein sequences obtained from the transcriptome assembly. Sterol-3β-glucosyltransferase (EC: 2.4.1.173) contains two motifs- putative steroid binding domain (PSBD) motif and plant secondary product glucosyltransferase (PSPG) motif 61. The hits obtained from the searches against the proteins sequences were filtered for the presence of these two motifs and of the four hits that contained this motif, from the FIR analysis, two transcripts (Tfoe_c26467_g1_i1_m12030 and Tfoe_c33087_g1_i3_m19456) expressing significantly in all three tissues of the plant were considered as true hits (referred henceforth as sterol-3β-glucosyltransferase-1 and − 2). While sterol-3β-glucosyltransferase-1 (Tfoe_c26467_g1_i1_m12030) clustered with a sequence from a closely related plant, M. truncatula, the sterol-3β-glucosyltransferase-2 (Tfoe_c33087_g1_i3_m19456) clustered with the enzyme sequence from Glycine soja. Sterol-3β-glucosyltransferase-1 expressed well in all tissues of the plant (leaf, stem and root) while sterol-3β-glucosyltransferase-2 showed relatively higher expression in leaves and this was further validated using qRT-PCR (Supplementary Fig. 5B, 5C).
The last enzyme in the biosynthetic pathway of diosgenin, β-glucosidase (EC: 3.2.1.21), is known to contain two catalytic glutamate residues which are part of the (I/V)TENG and TFNEP motifs and these have been identified from crystal structures of β-glucosidase from rice in apo and bound forms (2RGL and 2RGM) 62. We identified the β-glucosidase in fenugreek (Tfoe_c120410_g1_i1_m78951) and confirmed the presence of these motifs. Although, we found a total of 22 such hits which had > 70% query coverage and > 30% sequence identity, with the query β-glucosidase sequences from Viridiplantae, the selected candidate protein Tfoe_c120410_g1_i1_m78951 clustered with the enzyme from M. truncatula and was thus considered a true hit. This transcript showed high expression in root followed by stem and the same relative expression pattern was observed from qRT-PCR validation (Supplementary Fig. 5D, Supplementary Table 3). The metabolic pathway, sequence motif, phylogenetic tree and the expression data for all diosgenin synthesising enzymes is documented in Supplementary Data 2.
3.4.3 Volatiles
The volatiles are a class of compounds which typically include aromatic organic compounds. These compounds are used in perfumes, flavours, topical formulations, etc. The medicinal properties of essential volatiles have been known since ancient times. We have explored the enzymes involved in the biosynthesis of eugenol and linalool which are volatile oils found in fenugreek 63,64. The therapeutic potential of eugenol and isoeugenol, which are phenolic compounds, is attributed to antioxidant potency and anti-inflammatory benefits 65,66. Linalool, a terpene, has been reported to show anticancer, analgesic, anxiolytic and other neuroprotective properties 67. Eugenol and isoeugenol are phenylpropene volatiles that have a phenyl ring with a propenyl side chain synthesized from coniferyl alcohol precursor in multiple enzymatic reactions (Supplementary Data 2). We studied three enzymes in the eugenol synthesis pathway: Coniferyl alcohol acetyltransferase, eugenol synthase and isoeugenol synthase 68. Coniferyl alcohol acetyltransferase (EC: 2.3.1.224) associated with two proteins (Tfoe_c37673_g1_i1_m27189, Tfoe_c102741_g1_i1_m76275), however, Tfoe_c37673_g1_i1_m27189 was selected to be the true candidate gene (marked in Supplementary Data 2) since it co-clustered with the M. truncatula protein and was also supported by the presence of the FIR. Similarly, the candidate proteins from fenugreek for eugenol synthase (EC: 1.1.1.318) and the isoeugenol synthase (EC: 1.1.1.319) co-clustered with clades belonging to known enzymes (highlighted in Supplementary Data 2). The TPM values across tissues for each of the enzymes involved in synthesis of eugenol are represented in Supplementary Table 3. The transcripts corresponding to both the enzymes were found to express highest in root compared to the other tissues.
In the linalool synthesis pathway we studied three enzymes: Farnesyl-pyrophosphate synthase, (3S)-linalool synthase and (3R)-linalool synthase 69,70. (3S)-linalool and (3R)-linalool are two enantiomeric forms of naturally occurring monoterpenoids. The candidate protein from fenugreek for farnesyl-pyrophosphate synthase (EC: 2.5.1.1) (Tfoe_c5299_g1_i1_m1972) co-clustered with another legume Lupinus albus (Fabaceae family). The (3S)-linalool synthase (EC: 4.2.3.25) (Tfoe_c11093_g1_i1_m4195, Tfoe_c11093_g1_i2_m4196) and (3R)-linalool synthase (EC: 4.2.3.26) (Tfoe_c44084_g4_i2_m55446) co-clustered with other sequence homologues (as highlighted in the Supplementary Data 2). The TPM values for each of the enzymes involved in synthesis of linalool are represented in Supplementary Table 3. While the first enzyme, i.e., farnesyl-pyrophosphate synthase is expressed highest in root compared to the other tissues, (3S)-linalool synthase and (3R)-linalool synthase is expressed highly in leaves.
3.4.4 Flavonoids
Flavonoids, an important class of secondary metabolites with a polyphenolic structure, widely found in fruits, vegetables and certain beverages. This class of compounds has been linked to a variety of nutraceutical, pharmaceutical, and medicinal properties, including antioxidative, anti-inflammatory, antimutagenic, and anticarcinogenic properties 71. We studied biosynthetic pathways of four flavonoids (quercetin, vitexin, isovitexin and rutin) known to be present in fenugreek 63. These metabolites are synthesised from coumaroyl-CoA through different enzymatic reactions. The flavonoids share a common flavonoid backbone structure with differences in their hydroxylation patterns.
Quercetin biosynthesis pathway involves six enzymes (4-Coumarate-CoA ligase (EC: 6.2.1.12), chalcone synthase (EC: 2.3.1.74), chalcone flavone isomerase (EC: 5.5.1.6), flavanone 3-hydroxylase (EC: 1.14.11.9) / flavonol synthase (EC: 1.14.20.6), tricin synthase (EC: 2.1.1.175) and flavonoid 3′-monooxygenase (EC: 1.14.14.82)) that convert coumaroyl-CoA to quercetin through several intermediates 17. For each of the six enzymes involved in the pathway, multiple transcript hits were identified in fenugreek as mentioned in Supplementary Table 3. The sequence hits for these enzymes were validated as true hits, based on co-clustering with well annotated sequence of same sub-family observed in the phylogenetic tree (as described in Methods section 2.5). It was observed that the selected enzymes are highly expressed in roots of the plant followed by abundance of expression in the stem (Supplementary Table 3, Supplementary Data 2).
We further studied biosynthetic pathways for vitexin and isovitexin, which are two enantiomeric metabolites. The enzymes involved their pathway are identical 72, and they produce these metabolites in a mixture. A three-step reaction is known to be involved in the conversion of naringenin to vitexin/isovitexin. The first two reactions are enzymatic while the third step (hydration reaction) 73, although, has been hypothesized to be an enzymatic one, there is no definitive enzyme that has been isolated for it. For the first two enzymes (naringenin 2-hydroxylase (EC: 1.14.14.162) and C-glucosyltransferase (EC: 2.4.1.360)) known in the reaction, four and 13 hits were obtained from the fenugreek transcriptome Supplementary Table 3. These hits were validated using phylogenetic analysis of the enzymes from the same sub-family (Supplementary Data 2). The highest expression of the naringenin 2-hydroxylase in the pathway was observed in the stem, while for C-glucosyltransferase, although there were 13 hits identified fenugreek, most of them were highly expressed in leaf.
The third flavonoid, rutin, is synthesised in plants from taxol. The pathway involves two enzymatic reactions. While vitexin/isovitexin are obtained by C-glycosylation, rutin is obtained by O-glycosylation of the substrates. The enzymes involved in the pathway are flavonol-3-o-glucosyltransferase (EC: 2.4.1.91) and flavonol-3-o-glucoside rhamnosyltransferase (EC: 2.4.1.15) 74. There were 11 hits identified for flavonol-3-o-glucosyltransferase, and most of them had higher expression value in leaf and root tissue. In case of flavonol-3-o-glucoside rhamnosyltransferase a single hit (Tfoe_c121323_g1_i1_m79650) was identified with a high expression in leaf (Supplementary Table 3, Supplementary Data 2).
3.5 Quantification of metabolites in different tissue samples
Fenugreek is a traditionally well-known plant for its medicinal properties and the presence of complex mixtures of key phytochemicals renders this plant such benefits 4,63. In the literature, details of these chemical constituents in fenugreek tissue extracts have mainly been demonstrated by bio-analytical techniques. We quantified, presence of important metabolites- diosgenin, trigonelline and 4-hydroxyisoleucine (4-HIL) in different tissues (leaf, stem, root and seed) of fenugreek, by HPLC and mass spectrometry. Although high-quality RNA could not be purified from the seed tissue, we used it nonetheless for metabolite quantification. Diosgenin and trigonelline were quantified using HPLC-PDA and 4-HIL was quantified using LC-MS analysis from extracts of different tissues as described in Supplementary Methods section 1.2. Supplementary Figs. 6A and B show the standard curve of diosgenin and trigonelline respectively. The concentration of each compound was quantitatively determined by comparison of the peak area of the standard with that of the samples. Supplementary Figs. 7A-D and 7E-H refers to the HPLC peaks which showed the presence of trigonelline and diosgenin, and Supplementary Fig. 7L-N refers to LC profile from LC-MS data of 4-HIL. As seen in Supplementary Table 4, seed gave the highest concentration of diosgenin i.e. 1.56 µg/ml of the tissue followed by the root (1.036 µg/ml). Leaf tissue showed the highest concentration of trigonelline i.e. 4.715 µg/ml of the tissue followed by the seed (2.698 µg/ml). 4-HIL was detected at a very high concentration of 84.77 µg/ml in the dry seed and 77.48 µg/ml from the leaf. The concentration of 4-HIL was observed to significantly decrease with water-soaked seeds post 12 hours and 24 hours. The concentrations of these medicinally important metabolites vary across several cultivars of fenugreek 75,76.