The total steroids contents (TSCs), metabolite profiles and obtained RNAseq data
The total steroidal saponins contents (TSCs) in the roots (Rs), spears (Ss) and flowering twigs (Fs) of both green and purple asparagus were determined and the results showed that there was no significant difference between the same organs of cultivars, however, the TSCs of Rs in both cultivars of asparagus were significantly higher than that of the Ss or Fs respectively (Fig. 1A G). A total of 437 metabolites were detected in18 samples, including 2 terpenoids (C09, C13) and 18 steroids (including cholesterol(C10), 3β-alcohol-5-β progesterone-16-ene-20-one-3-O-a-L-arabinopyranosyl(C14), 4 furostane saponins (C06, C07, C18, C19), 7 isospirostane saponins (C01 ~ 05, C08, C11), and 6 spirostane saponins (C12, C15 ~ 17, C20) (Fig S1). Furthermore, most steroidal compounds were accumulated with a higher content in Rs which is demonstrated by the heatmap of metabolites abundance (Fig. 1H). The steroidal compounds' profiles and contents are consistent with the TSCs determined with the spectrophotometric method. In contrast, C10 and C14 showed a high abundance in all three detected organs, which may be due to CHOL being an important component of cell membranes and being the precursor for synthesizing brassinosteroids (BRs) as well [14], while the latter (C14), which had endocrinologic modulation in pregnancy with fascinating immunomodulatory capabilities [15], is unknown for its function in asparagus with unknown mechanism why it highly accumulated in all detected organ of asparagus cultivars. The results also showed that C05 and C06 are highly accumulated in Rs of green and Fs of purple asparagus cultivars, while C08, C12 and C15 only have higher concentrations in Rs of purple asparagus cultivars (Fig. 1G, Fig S1).
A total of ~ 1.3 G clean reads were obtained by quality control from raw Illumia read data with 18 samples of Rs, Ss and Fs of both green and purple asparagus cultivars with an average length of 149 ~ 150 bp respectively. A total clean base of 192 Gb with high average quality(Q) value of 35.9 and an average genome matching rate of 74.3% to the reference asparagus genome was used for gene expression analysis (Table S1). The mixture of RNAs of Rs, Ss and Fs of male, female and andromonoecious “purple passion” was further sequenced with PACBIO RSII platform to obtain 43.98, 16.98 and 28.58 Gb raw data respectively. The raw reads of insertion (ROI) were processed with IsoSeq3 pipelines to get 1.6, 0.6 & 1.0 Gb circular consensus sequence (CCS) with an average length of 2.4, 2.0 & 1.7 kb and a mean number of passes of 32, 24 & 25 respectively (Table S 2). The resulting Pacbio RNAseq data were used for optimizing annotation of the detected synthetic and regulatory genes of SSP in asparagus, and detection of their possible exon alternative splicings (ASs) and genes fusions (fusions).
Analysis of differential metabolites accumulation and correlated genes expression among organs and between asparagus cultivars
The principal components analyses (PCAs) were used for both metabolite abundances and all genes expression analyses. PCA results of genes expression (FPKM) from Illumia RNAseq data showed a similar clustering pattern among Rs, Ss and Fs of both green and purple asparagus cultivars, in which Ss and Fs clustered together within both green and purple asparagus samples respectively, however, they have separated distribution between the two cultivars. Moreover, GRs and PRs themselves were clustered in one direction with obvious separation, and Rs are significantly different from all Ss and Fs samples respectively (Fig S2A, Fig. 2A, B). In addition, the PCA results of all detected steroidal metabolites (Fig. 2A) also showed a similar clustering pattern as well, indicating the detected metabolites (including steroids) biosynthesis and accumulation were closely correlated to the detected expression genes of metabolism and regulation of those metabolic compounds. The TSCs (Fig. 1G) and steroid metabolites profiles (Fig. 1H) in detected samples showed that there are greater differences in steroids content and profiles between Rs vs Ss and Rs vs Fs, and Ss vs Fs within the green and purple asparagus cultivars show non-significant differences, while Rs, Ss and Fs between green and purple cultivars show significant differences (Fig. 2). These PCA results suggest there were similar expression patterns of SSP genes within and between organs and cultivars of both green and purple asparagus. Therefore, the differential metabolites (DMs) and differential expression genes (DEGs) analyses of the four groups, GRs vs GSs, GRs vs GFs, PRs vs PSs, and PRs vs PFs, were selected to perform further analyses. The Venn diagrams of DMs and DEGs from the above 4 groups showed that 104 metabolites and 3,303 genes were intersected for both metabolites contents and genes expression which were differential between Rs vs Ss and Rs vs Fs (Fig S2 F, G), in which 33 up-regulated differential metabolites (UDMs) and 1309 up-regulated differentially expressed genes (UDEGs) with higher contents and higher expression in Rs were selected for further analyses (Fig. 2C,D).
For analyzing the diffident metabolite and gene expression of the same organs between green and purple asparagus cultivars, GRs vs PRs, GSs vs PSs, GFs vs PFs of both Illumia RNAseq reads and metabolic compounds were also conducted for their DEGs and DMs analyses. The Venn diagram of DMs and DEGs from the above 3 groups showed that 23 metabolites and 3174 genes were intersected for both metabolite contents and genes expression with differences among GRs vs PRs, GSs vs PSs and GFs vs PFs (Fig S2 E). The results showed that both the DEGs of GRs vs PRs and GSs vs PSs were only enriched in the pathway phenylpropanoid biosynthesis. While the DEGs of GFs vs PFs were not only enriched in pathways of phenylpropanoid biosynthesis but also the biosynthesis of the steroid-related pathway, steroids hormone biosynthesis &signaling transduction and second metabolite modification catalyzed by CYP450 family’s genes as well(FigS2B ~ D), indicating purple and green asparagus cultivars have differentiations in flavonoids and other important nutrient components including steroids. The results were consistent with the different phenotypes of green and purple colors of their spears and different nutritional components detected by metabolomics analysis in Fs and Ss of the cultivars respectively (Fig. 1G, Fig S2F, G).
To fully screen the key genes of the biosynthesis and regulation of steroidal metabolites, except the above DGE analysis method, the alternative strategy with co-expression genes modules correlated steroidal compounds contents was conducted by WGCNA as well. A total of 18 samples with gene expression matrix normalized by FPKM were used to conduct co-expression analyses. The results showed that all the expressed genes were clustered into 19 co-expression genes modules labeled with different colors (Fig S3), and the genes module correlated the abundances of 18 steroids and 2 terpenoids were analyzed (Fig S1, Fig. 2E). According to the result, 7 modules, including 6 positive and 1 negative, were selected with a critical standard of p-value ≤ 0.05 and absolute correlation value ≥ 0.7. Then the genes in the modules positively correlated to steroids were named co-expression genes modules of SSP (SSPGM). To improve the reliability of key genes of SSPGM, the intersection of the UDEGs and SSPMG were merged to obtain 814 genes, furthermore, GO and KEGG enrichment analyses were conducted using the resulting 814 intersected genes. The GO enrichment results of 814 intersected genes sets showed that the genes were enriched in terms of steroid biosynthesis and responding to wounding, etc in the biological process of GO, while molecular function enriched in steroid hydroxylase activities, which was essential for steroidal saponin synthesis for CHOL modification with P value ≤ 0.05 (Fig. 3A). The KEGG enrichment results (Fig. 3B) also showed that steroids and brassinosteroid biosynthesis pathways were enriched. Based on the above enrichment analysis, 29 key candidates of SSP genes were screened, including 12 cytochrome P450s (P450) family genes, 2 glycosyltransferases (GTs) family genes, 3 glucosidases (GHs) genes for DSSP and several CHOL synthesis genes for USSP (Fig S4, Fig S5).
Identification of the genes involved in USSP of asparagus
Due to strict selections in the above analyses, the above-selected USSP synthetic genes, which may only code for the key steps of catalytic enzymes, were not the complete gene sets of asparagus USSP. Therefore, the protein sequences of enzymes encoded for the cholesterol synthesis genes whose functions have been characterized were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/) or/and UniProt (https://www.uniprot.org/) for homologous searching with BLAST, following the construction of clustering trees (Fig S4) by MEGAX to obtained possible CHOL synthetic genes sets in asparagus. The results showed that each step of enzymatic genes of CHOL synthesis was predicted with critical standards with protein similarity of 50% and average coverage of 85%. The CHOL synthesis homologous genes of asparagus were obtained which include 1 acetyl-CoA C-acetyltransferase (AACT), 2 hydroxymethylglutaryl-CoA synthases (HMGSs), 2 hydroxymethylglutaryl-CoA reductases (HMGRs), 2 mevalonate kinases (MVKs), 1 phosphomevalonate kinase (PMVK), 1 diphosphomevalonate decarboxylase (MVD), 2 isopentenyl-diphosphate Delta-isomerases (IDIs), 1 geranylgeranyl diphosphate synthase (GGPPS), 1 farnesyl diphosphate synthase (FPPS), 1 (S)-squalene synthase (SS), 2 (S)-squalene-2,3-epoxide hydro-lyases (SEs), 2 cycloartenol synthases (CASs), 3 sterol side chain reductases ( SSRs), 4 C-4 sterol methyl oxidases (SMOs), 1 cyclopropylsterol isomerase (CPI), 1 sterol C-14 demethylase (CYP51), 2 sterol C-14 reductases (C14Rs), 3 C-4-OH-sterol methyl oxidases (3βHSDs), 2 sterol C-5 (6) desaturase (C5-SDs), and 1 7-dehydrocholesterol reductase (7-DR༉(Fig S4 ). These asparagus CHOL biosynthetic genes were manually annotated optimizely and detected their possible exon alternative splicings (As) and gene fusion (fusion) with RNAseq data of both Illumina and PacBio platforms (Table S3). The results showed that all the USSP genes except a 3βHSD (evm. model. AsparagusV1-06.1107, (06.1107), same as below) have full CDS with transcripts harboring both 5’ and 3’ translational regions (5’UTR and 3’UTR), additionally, a PMVK (3.1112), a CYP51 (Unassigned.240, (Un.240) same as below), an SMO (07.13), a C14R (Un. 725) have ASs; and an HMGS (5.1543) have fusion, while a βHSD (01.3101) have both fusion and ASs indicting the genes of USSP in asparagus annotated at isoform level and the ASs and Fusions of these genes may have contributed to the regulation of USSP for different synthesis and accumulation of CHOL among organs and between asparagus cultivars (Table S3, Fig S4).
Identification of the genes involved in DSSP of asparagus
It is generally believed that CHOL is used as an important intermediate metabolite in the biosynthesis of steroidal saponins, further CHOL modification catalyzed by a series of hydroxylases, oxidases, glycosyltransferases (GTs) and/or glucosidases (GHs) to get the final steroids and steroidal saponins. Therefore, to screen the key hydroxylases genes involved in DSSP, the homologs searching with functionally characterized P450 genes involved in steroidal saponins biosynthesis were performed, and the phylogenetic tree was conducted as well (Fig. 4A ~ C, Fig S5). The results showed that 3 asparagus genes (03.2646, 03.698, 03.2646) of the P450 superfamily from asparagus (AoCYP450s) clustered together with the known function of CYP90s. Further, multiple protein sequence fine alignment (MSA) showed that 03.2646 was highly conserved in 4 key amino acid residues, which are essential for catalytic activities in AtCYP90B1, a representative of the enzyme catalyzing the CHOL to C22-hydroxylatide CHOL for brassinosteroids biosynthesis in Arabidopsis thaliana [16](Fig S6); and the 03.698 is highly conservative in those amino acid residues in functionally characterized DzCYP90B71 [11] and VcCYP90B27 [9], while the 03.2424 was conserved in 5 key amino acid residues with the characterized DzCYP90G6 and PpCYP90G4 playing roles as a difunctional enzyme of C16 hydroxylase and C22-keto sterol oxidase (S22O-S16H)(Fig S6, Fig S7) (Zhou et al., 2022). Therefore, 03.2646 was referred to as brassinosteroid biosynthesis-related C22 Sterol hydroxylase (BR-S22H), 03.698 as steroidal saponin-related C22R hydroxylase (S22H), and 03.2424 as s bifunctional enzyme of S22O-16H for CHOL modification based on their expression patterns in both green and purple asparagus (Fig. 4A ~ B, Fig S5 ~ S7). Based on the structure of detected steroidal saponins which includes furosteroidal, isospirosteroidal and spirosteroidal saponins, such as, pseudoprotodioscin (C06), asparasaponin Ⅱ (C18), dioscin (C04) and asparanin B (C17) (FigS1, Fig S5 ~ S7), the glycosyltransferases ( GTs), glycoside hydrolases(GHs) and steroidal C26 hydroxylase (S26H) should play important roles in DSSP as well, therefore all detected GTs and GHs families genes in both UDEGs and SSPGM intersection set were blasted with known functional enzymes in CAZY database (http://www.cazy.org/ Home) to screen candidate furostanol glycoside 26-O-β-glucosidases(F26Gs) and GTs in DPPS of asparagus (Fig. 4D), The results showed that 3 F26Gs (Un.946, 09.871 and 09.1129) and 2 C3 steroids-β-glycosyltransferases (S-3β-GTs) (04.318, 04.386) were selected based on their list in UDEGs and SSPGM sets, but also closely clustered with functionally characterized F26Gs and S3GTs with their conserved amino acid motifs (Fig. 4D, Fig. 4E ). By using similar methods, 6 S26H (05.2861, 05.2864, 07.923, 08.1961, 08.2023, 08.2077) were selected (Fig. 4C, Fig S5). The resulting 2 genes located on chromosome (Chr) 5 (05.2861, 05.2864) clustered to a clade of CYP72A with functional confirmation in P. polyphylla (PpCYP72A616 with NCBI accesses No: QDS03631.1 same as below); and 1 gene located on Chr 8 (08.2023) clustered to CYP94D108 with functional confirmation in P. polyphylla (PpCYP94D108, QDS03629.1); while, 3 additional genes (07.923, 08.1961 and 08.2077) located on Chr 7 and Chr 8 respectively have high expression levels in Rs in both cultivars, thus were selected as additional possible candidate genes of S26Hs in asparagus as well(Fig. 4C, Fig. 5 and Fig S5 ~ S7).
In addition, it was reported that S16DOX and St16DOX belonged to the 2-OGD family members were involved in the steroidal C16 hydroxylation [17] in steroidal alkaloid biosynthesis, so these proteins were used for BLAST and multiple sequence alignments to search for additional steroid C16 hydroxylases in asparagus belonging to the 2-OGD family. 1 gene (Un.946), was identified as an additional possible C16 hydroxylase in asparagus (Fig S8).
Correlation network of steroidal compounds synthesis regulatory transcription factor (TF) genes
To screen transcriptional factors (TFs) or regulators of steroidal saponins synthesis, all the above-selected intersections of both UDEGs and SSPGM gene were submitted to PlanTFDB (http://planttfdb.gao-lab.org/) for TFs prediction, and resulted in 61 TF genes in 19 TF families including 12 bHLHs, 9 MYBs, 4 C2H2 & 2 C3H1 type zinc fingers (ZFs), 4 ERFs, 2 TALEs, 4 WRKYs and 1 DOF were predicted (Fig. 5A). Based on transcription factor binding sites (TFBSs) prediction in the promoters, which are 2000 bp upstream flanking from the start coding (ATG) of the SSP synthetic or regulatory genes respectively (Fig S9). The TFs and steroids synthetic genes promoter with corresponding TFBSs were regarded as regulation pairs, correlation analysis of expression (FPKMs) between TFs and synthetic genes pairs was conducted based on Pearson correlations with R package psych [18], and the resulting correlation network was visualized by Cytoscape [19] (Fig. 5C). The results showed that 16 TFs, 9 genes involved in 7 catalytic steps of USSP and 5 genes involved in 4 catalytic steps of DSSP were found to have a positive correlation in the network (≥ 0.8 & p-value ≤ 0.05) (Fig. 5A). In detail, the USSP genes (07.2410, 04.1647, 04.2945, 04.1135, 05.1543, 07.130), are correlated with C2H2 zinc finger families, and the gene of 04.1647 was correlated with 3 WRKY families, while genes of 07.130, 02.908, 03.1030 and 01.1069 were correlated TALE families [20]. While the DSSP genes seem to be related to multiple TFs; 03.698 (S22H) is regulated by 2 C2H2(08.1158 and 08.1302); 04.386 (S-3βGT) is corelated with 3 C2H2 type ZFs (04.3454, 10.1786, 08.1158) and 1 Dof (05.2213); 08.196 (S26H) is correlated by 1 Dof (05.2213), 2 C3H type ZFs (01.1829, 05.2823) and 1 C2H2 type ZF (08.1302); while 03.2424 (S22O-16H) is correlated by 5 MYBs (03.863, 07.13, 07.1206, 0.1562, 04.76) and 1TALE (01.2664). These correlated TFs in the network may be important for the regulation of synthetic genes in SSP, especially the DSSP. Further detailed cis-element prediction results (Fig S8, Table S4) showed that ~ 21 kinds of cis-elements were predicted, including 141 ABREs (involved in the abscisic acid responsiveness), 112 MYCs (motifs responding to chilling), 96 STREs (stress response element), 70 CGTCA-motifs (involved in the MeJA responsiveness), 55 as-1(salicylic acid- and auxin-responsive element), 55 TGACG-motifs (involved in the MeJA-responsiveness), 53 AREs (antioxidant response element related to the anaerobic environment), besides these, EREs (ethylene-responsive element), WRE3 and WUN-motifs (wound responsible element), DREs (dehydration-responsive element), GARE-motifs (gibberellin-responsive element) were also found in both TFs and synthetic genes of SSP in asparagus (Fig S9, Table S4). These results suggest that the cis-elements in both synthetic and TFs genes of SSP might be regulated by stresses including cool, wound and drought, and the stress signaling integrated into phytochrome including abscisate (ABA), ethylene (ETH), salicylate(SA) and jasmonate (JA) and GA signaling network to induce steroidal compounds production to respond to environmental stress stimuli for adaptation and survival of asparagus.
To confirm the key synthetic and regulatory genes expression of SSP in asparagus from RNASeq data, 14 genes including TFs coding genes were selected for qRT-PCR amplification to conduced gene expression analysis in different organs from purple asparagus with primers listed in Table S3. The results showed that the expression patterns of both RNAseq and qRT-PCR analysis were nearly consistent (Fig. 5B) indicating the correctness of genes expression detected from RNAseq data. The PacBio RNASeq data were further used to optimize annotation of the predicted DSSP genes and all correlated TFs with both 5’UTRs and 3’UTRs. Further ASs and fusions detection of DSSP-related genes found 09.871 encoding F26GH, 03.698 coding an S22H, and a correlated 04.3454 encoding TF of C2H2 type ZF have detected with gene fusion; while the 09.871gene have alternative splicings as well. It is interesting to find that the DSSP-related genes have less exon alternative splicing and gene fusion than the genes found in USSP (Table S5, Fig S10), indicating gene fusion and AS of SSP play important roles in steroidal compounds synthesis and accumulation, especially the genes of USSP for CHOL biosynthesis.
The proposed full steroidal saponins biosynthetic pathway in asparagus
The biosynthesis pathway of steroidal saponins in asparagus was inferred including the key steroid metabolites, their synthesis genes and regulatory TFs as well (Fig. 6). The biosynthesis of steroidal saponins of asparagus can be simply divided into USSP (de novo synthesis of cholesterol) and DSSP (steroid skeleton modification). It is believed that the USSP is the synthesis of CHOL through the MVP pathway in the cytosol, and the genes encoding corresponding enzymes of each step of CHOL are clear in model plant. In this study, steroids including CHOL, a progesterone derivative, 4 furostanol saponins and 13 spirostane or isospirostane saponins were detected (Fig. 1H, Fig S1), suggesting that the DSSP of asparagus is the multi-branched pathway to synthesis these detected steroids starting from the CHOL modification. The de novo synthesis of CHOL in USSP is catalyzed by many enzymes, including AACTs, HMGSs, HMGRs, MVKs, PMVKs, MVDs, IDIs, GPPSs, FPPSs, SSs, SEs, CASs, SSRs, SMOs, CPIs, CYP51s, C14Rs, 8,7-SIs, C5-SDs, 7-DRs, etc as described in model plant[6], while the DSSP starts with 03.698 catalysts CHOL to 22R-OH-CHOL; then form 22-keto-16-OH CHOL catalyzed by 03.2424; there is also another possible catalytic pathway: a steroidal C16 hydroxylase of 2-OGD family catalysts 22R-OH-CHOL to 16S,22R-diOH-CHOL, then catalyzed to 22-keto-16-OH-CHOL by S22O-16SH e.g. 03.2424 or unknown oxidase(s). The 22-keto-16-OH-CHOL is unstable and may be able to cyclize to form furostanol spontaneously or catalyzed by the unknown enzyme(s). Then the C26 of the steroidal skeleton is hydroxylated by 08.2077 or 08.1961(S26Hs) to produce 26-OH-Furostanol, and the C26- hydroxyl (OH) group of 26-OH-Furostanol can be further modified by GTs/ GHs to form a glycosidic bond to produce furostanol-type saponins. The modified glycosyl group can also be broken by glycoside hydrolases to form C26-OH again to form other saponins. C26-OH and C22-OH of Furostanol can be oxidized at first, then cyclize to produce two chemical isomers of yamogenin and diosgenin respectively, the former is further modified by C5R to produce sarsasapogenin, which is the steroidal aglycone of 5 saponins which were detected in the metabolites data sets. After that, the 3β-OH group of all saponins is modified with one more sugar group by GTs including S3GTs to get different types of steroidal saponins in asparagus. The sugar group modification will increase the solubility of steroidal saponins in water leading to high concentrations in asparagus organs especially in Rs to respond to abiotic and biotic stresses for adaptation to the environment for perennial survival.