Transcriptome Analysis of Cultivated and Wild Ginseng in Different Growth Years Revealed the Regulatory Mechanism of Ginsenosides


 Background: As a famous Chinese medicine, ginseng has been used in the world for nearly 5,000 years. Wild ginseng is endangered due to environmental damage. Thus, cultivated ginseng is developed to replace wild ginseng. The morphological and physiological characteristics of both wild ginseng and cultivated ginseng change during growth, and the mechanism of this change is not yet understood. Results: This study performed transcriptome sequencing on the roots, stems and leaves of cultivated ginseng and wild ginseng with different growth years, exploring the effect of growth years on gene expression in ginseng. The number of DEGs in cultivated ginseng is more than that in wild ginseng. Based on the weighted gene co-expression network analysis, we found that the growth years significantly affected the gene expression of MAPK signaling pathway - plant and terpenoid backbone biosynthesis pathway in cultivated ginseng, but had no effects in wild ginseng. Furthermore, the growth years had significant effects on the genes related to ginsenoside synthesis in cultivated ginseng, and the effects were different in the roots, stems and leaves. However, it had little influence on the expression of genes related to ginsenoside synthesis in wild ginseng and no effect on leaves. These results showed wild ginseng was less affected by growth years than cultivated ginseng. Furthermore, HMGR, SS, DXS, DS, IspF, AACT, CYP450 and UGTs were related with MYB, NAC, AP2/ERF, bHLH and WRKY transcription factors. Growth years may regulate genes for ginsenoside synthesis by influencing these transcription factors, thereby affecting the content of ginsenosides.﻿Conclusions: This study complemented the gaps in the genetic information of wild ginseng in different growth periods and different tissues and provided a new insight into the mechanism of ginsenoside regulation.

The content of ginsenosides exhibited various among different tissues and growing years for ginsengs. Liu et al (2017) suggested the content of ginsenoside was different in various tissues, and it was signi cantly higher in roots than that in the stems and leaves of ginseng [14]. It is generally believed that the longer the ginseng grows, the higher the content of ginsenosides, which has been reported by many previous studies [15]. For example, Zhang et al (2014) found that the content of ginsenoside risen with the increase of cultivation years by comparing the content of ginsenosides from one to thirteen years of ginsengs [16]. So far, transcriptome analysis has been widely used to study the complex biosynthetic pathways of ginsenosides and expression characteristics of related genes. Jayakodi et al (2015) found 38 encoding enzymes involved in ginsenoside biosynthesis pathway by transcriptome sequencing of oneyear-old and six-year-old ginseng roots [17]. In different tissues of ve-year-old ginseng, the expression level of the MEP pathway genes were similar to those of the MVA pathway gens in roots, but higher in leaves [18]. However, the effects of different tissues and growth years of ginseng on the expression levels of genes related to ginsenoside biosynthesis have not been studied in detail.
Moreover, due to endangered resource of wild ginseng resulted from excessive consumption, ginsengs are mainly obtained through cultivation via modern agricultural technologies [19]. Cultivated ginseng is planted in farmland that was once forested. Wild ginseng grows naturally in the forest, and the growth rate of wild ginseng is much slower than that of cultivated ginseng. Generally, cultivated ginseng faces great pressure from various plant diseases and does not usually survive beyond sixth year. In contrast, wild ginseng could grow up to hundreds of years under natural conditions [20][21][22]. These differences in growth environment and growth time may be related to differences in pharmacological activity in wild ginseng and cultivated ginseng. However, the difference in these mechanisms is currently unclear. Hence, in this study, high-throughput transcriptome sequencing was performed on roots, stems and leaves of wild ginseng and cultivated ginseng with different growing years to explore the effect of growing years and tissues on gene expression, especially the DEGs involved in ginsenoside biosynthesis and regulation mechanism of ginsenoside biosynthesis. It may provide information resources for improving transcriptome data of ginseng and also has profound signi cance for further research on ginseng cultivation breeding and related functional candidate genes.

Results
Overview of RNA-seq data A total of 33 individuals were sequenced with an average amount of clean bases per individual was 6.5 Gb. After ltering low-quality reads, the clean reads were mapped to the ginseng reference genome with an average mapping rate of 87.44%. We rst evaluated the variation in gene expression among biological replicates for each sample. The correlation values were high between them (Additional le 1: Fig. S1, r 2 > 0.80), except for between the two replicates in TSBT_R (Additional le 1: Fig. S1, r 2 = 0.56). However, considering that only two samples in the TSBT_R and the r 2 > 0.50. Therefore, all samples were used for subsequent analysis.

Differential expression analysis in three groups
The all ginsengs were divided into three comparison groups, including YNJY_vs_JYSH, TSBT_vs_PHQH and YONE_vs_YSWD. There were 18,923, 11,009 and 1,241 DEGs in YNJY_vs_JYSH, TSBT_vs_PHQH and YONE_vs_YSWD, respectively (Fig. 1). The number of DEGs in cultivated ginseng is higher than that in wild ginseng. Furthermore, the number of DEGs in YNJY_vs_JYSH was larger than that in TSBT_vs_PHQH. In addition, the numbers of DEGs in different tissues of three groups were also different.
In YNJY_vs_JYSH, the number of DEGs in roots were almost the same as that in leaves, while DEGs in stems were the least (6,392) (Fig. 1A). In TSBT_vs_PHQH, the number of DEGs in leaves was the highest (7,198), while DEGs in roots were the least (2,243) (Fig. 1B). In YONE_vs_YSWD, the number of DEGs in root was the highest (696), then DEGs in leaves were the least (181) (Fig. 1C).
Co-expression modules related to growing years of ginseng In WGCNA, the module eigengene (ME) is representative of the corresponding module's gene expression pro le correlated with a de ned trait (growing years). Several modules were signi cantly correlated with the growing years (p < 0.05). In YNJY_vs_JYSH, blue, royalblue and red modules were signi cantly correlated with growing years, and the correlation coe cient were 0.91, 0.9 and -0.96, respectively (p < 0.05) (Additional le 1: Fig. S3). In blue module, the enrichment pathways included Citrate cycle (TCA cycle) (ko00020), C5-Branched dibasic acid metabolism (ko00660) and Carbon metabolism (ko01200), and the most genes in these pathways were highly expressed in YNJY (Fig. 2). In red module, genes were enriched in protein processing in endoplasmic reticulum (ko04141) (Fig. 3A). Differential clustering of genes involved in this pathway showed that most of the genes were expressed at higher levels in YNJY than in JYSH (Fig. 3B). In royalblue module, the enriched pathway was MAPK signaling pathway-plant (ko04016) (Fig. 3C), in which genes were up-regulated in JYSH (Fig. 3D).
In TSBT_vs_PHQH, three modules were signi cantly correlated with growing years, namely white, grey60 and skyblue3 modules (p < 0.05). However, only Alpha-linolenic acid metabolism (ko00592) and terpenoid backbone biosynthesis (ko00900) were enriched in grey60 module (Fig. 4A), and the expression levels of most genes in these pathways were lower in PHQH than that in TSBT (Fig. 4B, C). Nevertheless, there were no enriched pathways in the signi cant related modules (orangered4 and salmon4 modules) for YONE_vs_YSWD (Additional le 1: Fig. S3).
Identi cation of DEG related to ginsenoside biosynthesis The expression level of genes in the ginsenoside biosynthesis pathway were further inspected among different tissues and sample. These genes were signi cantly various among cultivated ginseng with different growth years. In YNJY_vs_JYSH, most DEGs involved in the ginsenoside biosynthetic pathway showed higher expression levels in JYSH than in YNJY. Only genes encoded MVD was highly expressed in three tissues of YNJY. The numbers of DEGs involved in the ginsenoside biosynthesis pathway were more in roots and leaves than in stems ( In TSBT_vs_PHQH, the numbers of DEGs associated with ginsenoside biosynthesis were signi cant more in leaves than that in roots and stems. In roots and stems, the expression level of six enzymes (ISPE, ISPH, SE, β-As, CYP450 and UGTs) were different, only SE and CYP450 showed higher expression levels in PHTS than TSBT, while the other DEGs were highly expressed in TSBT, including ISPE, ISPH, β-As and UGTs. In the leaves, sixteen enzymes (AACT, HMGS, HMGR, DXS, ISPD, ISPE, ISPF, ISPG, ISPH, FPS, SS, SE, β-As, DS, CYP450 and UGTs) were highly expressed in TSBT than PHQH (Fig. 7).
In contrast, there were fewer DEGs for ginsenoside biosynthesis in the roots, stems and leaves of wild ginseng group (YONE_vs_YSWD) compared with cultivated ginseng comparison groups. Five enzymes, HMGS, DXS, SE, CYP450 and UGTs were highly expressed in the root of YSWD, however, only ISPH was highly expressed in the stem of YONE (Fig. 7). The result indicated that the growth years had little effect on ginsenoside synthesis in wild ginseng.

Validation of DEGs by qRT-PCR
To ensure the reliability and accuracy of the results, we randomly selected fteen DEGs of different ginseng tissue samples to verify the relative expression levels by qRT-PCR analysis. The validation results showed that the expression trends of examined genes were consistent with that derived from RNA-seq (Additional le 1: Fig. S4). The correlation between RNA-seq results and qRT-PCR experimental results was signi cant and the correlation coe cient was 0.864 by spearman test (p < 0.01).

Discussion
Expression pro les of the DEGs in different growth years of cultivated ginseng Ginseng is a kind of perennial herbaceous plant, its morphology and physiology changes greatly during the growth process. Here we describe dynamic gene expression pro les of cultivated and wild panax ginseng with different growth years. In the WGCNA analysis of YNJY_vs_JYSH, the genes in blue module related to growth years were enriched in citrate cycle (TCA cycle) (ko00020), C5-Branched dibasic acid metabolism (ko00660) and Carbon metabolism (ko01200), and the most genes in these pathways are highly expressed in YNJY. These pathways are common primary metabolic pathways in plants. This might be that one-year-old cultivated ginseng (YNJY) was in a stage of rapid growth and development and requires more energy, so the primary metabolic pathway was more active. The morphological and physiological components of cultivated ginseng grown to six years have been basically matured, so the energy required for growth is decreased.
The genes in royalblue module related to growth years were enriched in MAPK signaling pathway -plant in YNJY_vs_JYSH. This pathway is involved in plant disease resistance and abiotic stress response [23][24][25]. In Arabidopsis thaliana, overexpression of MKK2 gene increased MPK4 and MPK6 activities, and increased plant frost resistance and salt tolerance [26]. Moreover, the most genes of this pathway were expressed a higher level in JYSH, indicating that six-years-old ginseng (JYSH) may have higher stress resistance. When cultivated ginseng increases with planting years, it faces various plant diseases and great environmental pressure [19]. Hence, the expression of genes related to stress resistance pathways is continuously increased to resist the bad environment [27].
For ginseng that has been grown for more than six years, we found the genes in the growth age-related modules were mainly related to the terpenoid backbone biosynthesis in the WGCNA analysis of TSBT_vs_PHQH. And the expression level of most genes in terpenoid backbone biosynthesis was higher in TSBT than in PHQH. Terpenoid backbone biosynthesis is a key step in the synthesis of ginsenosides for ginseng [28]. Previous studies have reported 15 years represents a turning point in the maturation of ginseng growth and the accumulation of these bioactive components in ginseng, moreover, most ginsenosides accumulated during years 5-15 and decreased after 15 years [29]. This might be related to the low expression of terpenoid backbone biosynthesis genes in ginseng that had been grown for more than 15 years (PHQH), resulting low ginsenoside content.
Effects of growth years on the expression levels of genes related to ginsenosides synthesis in cultivated ginseng Ginsenosides have been used in the treatment of various diseases [13]. In our study, we explored the effect of growth years on genes related to ginsenoside biosynthesis in cultivated and wild ginseng. In YNJY_vs_JYSH, most genes involved in the ginsenoside biosynthetic pathway showed higher expression levels in JYSH than in YNJY. Previous studies have suggested that the contents of ginsenosides increased with cultivation ages in cultivated ginseng [30,31]. The high expression level of these DEGs might contribute to the accumulation of ginsenosides in six-year-old ginseng (JYSH). Moreover, the amounts of secondary metabolites produced by plants for defense purposes increases under environmental stress conditions such as nutrient de ciency [32,33]. Currently, ginsenosides, as secondary metabolites, have been shown to play an important physiological role to protect plants from pathogens [34]. When ginseng grows to six years old, it faces greater environmental pressure. Therefore, it is more necessary to synthesize ginsenoside to cope with many biotic or abiotic stresses [35,36].
Interestingly, we only found the genes encoding MVD was expressed higher in YNJY than in JYSH. MVD are key enzymes in the MVA pathway, and the low expression levels of MVD may be in uenced by the downstream products of MVA pathway and secondary metabolites [37,38]. Moreover, the number of DEGs involved in the ginsenoside biosynthetic pathway was more in roots and leaves than in stems for YNJY_vs_JYSH, suggesting the in uence of growth years on the genes related to ginsenoside synthesis was greater in roots and leaves than in stems. Schramek et al (2014) proposed that ginsenosides were transported from leaves to roots [39]. More recently, Xue et al (2019) showed that genes related to ginsenoside synthesis pathway were expressed in living cells, but not in vascular and xylem cells of ginseng [18]. Therefore, we consider that the content of ginsenoside synthesis in roots and leaves is more than in stems, and roots and leaves are greatly affected by grown years.
For TSBT_vs_PHQH, the most DEGs related to ginsenoside biosynthesis were highly expressed in TSBT than in PHQH, such as ISPE, ISPF and UGTs et al. When the ginseng grows more than six years old, the plant becomes ligni ed [40]. Thus, the activity of genes in related metabolic pathways decreases, which leads to a decrease in the expression of certain genes. Furthermore, the number of DEGs related to ginsenoside biosynthesis was more in leaves than in stems and roots. The different expression pro les of genes related to ginsenoside synthesis in ginseng roots, stems and leaves suggested that there might be signi cant differences in ginsenosides or other triterpenoid precursors synthesized in different tissues of ginseng. This differentiation probably leaded to the accumulation of different types of ginsenosides in the roots, stems and leaves, thus explaining tissue-speci c ginsenosides production. Kim et al (2015) indicated the main ginsenosides in root and leaf were Rb1, Rb2, Rc, Rd (PPD-type) and Re, Rg1, Rf (PPT-type), respectively [34]. Furthermore, Kim et al (2018) also showed that the content of most ginsenosides, Rb2, Rg1, Rf, F1, F2, Rf and Rh1, were obviously different in leaves of 1, 4 and 6-years old ginsengs [41].

Effects of growth years on the expression levels of genes related to ginsenosides synthesis in wild ginseng
The germplasm resources of wild ginseng are scarce, so there are few studies on gene expression level of wild ginseng. In our study, we found the numbers of DEGs in YNJY_vs_JYSH and TSBT_vs_PHQH was more than that in YONE_vs_YSWD, suggesting gene expression levels were less affected by growth years in wild ginseng compared with cultivated ginseng. Wild ginseng grows more slowly than cultivated ginseng and can be grown in nature for hundreds of years, so it may be less affected by a few short decades. Furthermore, ginsenoside biosynthesis related genes, including HMGS, DXS and SE were more A regulated gene network for ginsenoside biosynthesis Biosynthesis of particular compounds is in uenced by a variety of factors by in uencing transcription factors (TFs), which regulate the expression of relevant genes [42,43]. In this study, we used WGCNA to screen the growth age-related modules, and found genes related to ginsenoside biosynthesis and TFs in four modules (blue, royalblue, grey60 and orangered4 MYB family is the most widely distributed and powerful transcription factor. The MYB domain typically consists of one to four incomplete repeats, each containing approximately 52 amino acid residues [44]. A ginseng PgMYB2, was con rmed to positively regulate the expression of DS in ginseng [45]. In our study, MYB was also positively correlated with DS.  Deng et al. (2017) also found that the expression levels of DS and SS and total saponins content in PnERF1 transgenic Panax notoginseng cell lines were higher than those in non-transgenic cell lines [47]. Moreover, WRKY was highly related to SS, SE, CYP450 and UGTS. WRKY transcription factors mainly exist in plants and play important roles in many biological processes [48]. A PqWRKY1 transcription factor isolated by transcriptomic sequencing from Panax quinquefolius enhances the content of saponins by regulating the transcription of some genes related to biosynthesis of saponins [49]. There were 8 genes encoded bHLH, which were signi cant positive correlation with genes related to ginsenoside biosynthesis, such as UGTs, CYP450, SS, DS, IspF, AACT and SE et al. In Panax notoginseng, the overexpression of transcription factor PnbHLH caused the increase of saponins [50]. In YONE_vs_YSWD, only bZIP was highly positively related to IspE in orangered4 module. However, there are few studies on the effect of bzip on the synthesis of ginsenosides in Panax genus, which required subsequent yeast one-hybrid or two-hybrid to verify its function. These result suggested growth years might regulate the expression of ginsenoside synthesis genes by regulating TFs (MYB, NAC, AP2/ERF, bHLH, bZIP and WRKY), thereby affecting the content of ginsenosides.

Conclusions
In this study, through transcriptome analysis of wild and cultivated ginseng with different growth years, we found the DEGs were much more in cultivated ginsengs than that in wild ginseng. WGCNA analysis identi ed modules obviously affected by growth years in cultivated ginseng, and genes in these modules were signi cantly enriched in MAPK signaling pathway -plant and terpenoid backbone biosynthesis pathway. However, there was no enrichment pathway for genes in modules signi cantly related to growth years in wild ginseng. Furthermore, the growth years had a signi cant effect on the gene expression of ginsenoside synthesis in cultivated ginseng, and the degree of in uence varies in different tissues. However, it had little effect on genes related to ginsenoside synthesis in wild ginseng and no effect on leaves. These results indicated that the in uence of growth years on cultivated ginseng was greater than that of wild ginseng. In addition, HMGR, SS, DXS, DS, IspF, AACT and HMGS were highly correlated with MYB, NAC, AP2/ERF, bHLH, bZIP and WRKY. Growth years might regulate genes for ginsenoside synthesis by in uencing these TFs, then it affected the content of ginsenosides. This study provides comprehensive transcriptional information for wild and cultivated ginseng at different growth stages, and also supplies new insights for understanding the metabolic regulation of ginsenosides.

Plant materials and preparation
One-and six-year-old cultivated ginsengs were collected from Jingyu county, Jilin province of China (marked as YNJY and JYSH, respectively), and the one-and twenty-year-old wild ginsengs were collected from Korean Autonomous County of Changbai, Jilin province (marked as YONE and YSWD, respectively).
In addition, we sampled six-year-old cultivated ginsengs from Taishang town of Jilin province (marked as TSBT), meanwhile, we also collected three samples of more than fteen-year-old cultivated ginsengs from the same location (marked as PHQH) (Additional le 1: Table S1).
All cultivated ginseng and wild ginseng samples were collected in August 2018. After cleaning with distilled water on the spot, the roots, stems and leaves of all ginseng were separately cut into small pieces quickly and packaged in aluminum foils. Then these samples are immediately frozen in the liquid nitrogen and stored at -80℃. According to sampling location, these ginsengs were divided into three comparison groups, namely YNJY_vs_JYSH, TSBT_vs_PHQH and YONE_vs_YSWD.

RNA extraction, cDNA library construction and sequencing
Total RNA was extracted from all tissues of each sample using TRIzol (invitrogen, USA) and digested with DNase I (Waryong, China). Finally, the RNA was dissolved by adding 50µl DEPC-treated water. After con rmed the integrity and quality, the total RNA was immediately stored at -80℃ for sequencing. The RNA samples with high purity (28S/18S ≥ 1.4) and high integrity (RIN ≥ 7.0) were employed for cDNA library construction.
Then, library construction and sequencing were performed by the Beijing Genomics Institute (BGI, China).
Brie y, Oligo(dT)-attached magnetic beads were used to isolate mRNA. Puri ed mRNA was broken into short fragments by mixing with fragmentation buffer at appropriate temperature. The short fragments were puri ed and resolved with the Elution buffer for end repair and single nucleotide A addition, and then connected with adapters. Target fragments were selected as templates for PCR ampli cation to construct the cDNA sequencing libraries. Each cDNA library was sequenced by an Illumina HiSeq X-ten platform with paired end (PE) reads of 150 bp. The clean data were deposited in the NCBI (National Centre for Biotechnology Information) under the accession numbers PRJNA762437.

Data ltering, mapping and gene expression pro ling
To obtain high-quality clean reads, the adaptor sequences and the reads with unknown nucleotides larger than 5% or with low-quality nucleotides (≤10) larger than 20% were ltered out. Then the clean reads were mapped to the ginseng reference genome v1 (Ginseng Genome Database) using Bowtie2 (v2.2.5) [51], followed by gene expression levels of each sample were calculated using RSEM with default parameters, and the FPKM (Fragments Per Kilobase of transcript per Million mapped reads) value was used to quantify gene expression levels [52]. Pearson correlation coe cients (r 2 ) of expression levels were calculated between each pair of ginseng samples using R package (v4.2.0), and samples with r 2 < 0.5 across samples were removed. Differential expression analysis of each comparison group was performed according to the method described by DESeq2 (v1.16.1) [53]. The genes with a threshold of |log2(fold change)| ≥ 1 and p ≤ 0.05 were identi ed as differentially expressed genes (DEGs).

Weighted gene co-expression network construction
Weighted gene co-expression network analysis (WGCNA) was performed to construct co-expression networks of genes that were differentially expressed in different growth years of ginseng using the R package (v4.2.0) [54]. After background correction and quantile normalization, the all genes with top 50% variant in the variance analysis for WGCNA analysis. Firstly, the hclust function was used to conduct cluster analysis on the expression data of the samples. Average was selected as the clustering method to remove the outlier samples. Then the function pickSoftThreshold was used to lter Soft Threshold

qRT-PCR validation
Fifteen DEGs were randomly selected for qRT-PCR to validate the RNA-seq data. Total RNAs as described for RNA-seq were used for qRT-PCR. Total RNA from each sample was reverse transcribed into cDNA using PrimeScript TM RT reagent Kit with gDNA Eraser (TaKaRa, Japan) following manufacturer's protocol.
Gene-speci c primers were designed by Primer Premier 6 software [57]. The qRT-PCR reactions were performed in a nal volume of 10µl using 2x SYBR Green

Acknowledgments
We thank reviewers and editors for their constructive comments that greatly improved our manuscript. We also thank Wei Zhang for helping to collected the samples in this study. Con ict of Interest    Gene co-expression subnetwork in the (A) blue, (B) royalblue, (C) grey60 modules. Figure 6