Transcriptome Comparison of Rhizome and Leaf of Rhizome of Ligusticum Chuanxiong on the Effects of Active Metabolites

The rhizome of Ligusticum chuanxiong is a well-known traditional Chinese medicine. The major bioactive compounds found in Ligusticum chuanxiong are Ferulic Acid with 4- hydroxyl -3- methoxybenzene acrylic acid. However, the epigenetic mechanisms and biosynthetic pathways of active metabolites based on the transcriptomic analysis of Ligusticum chuanxiong rhizome have remained unclear. To clarify the key genes in rhizome formation and active metabolite synthesis pathways of Rhizome of Ligusticum chuanxiong, we focus on the change between the investigate the rhizome and leaf of Ligusticum chuanxiong through transcriptomic analysis. The RNA extracted from rhizomes and of Ligusticum chuanxiong were sequenced using the Illumina platform, yielding 295,725 unique transcripts with N50 of 778 bp. The against annotation databases including (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) to identify differentially expressed (DEGs).

Through the transcriptome analysis, we have identi ed the regulatory mechanisms and epigenetic in uences in the secondary metabolic biosynthetic pathway of Ligusticum chuanxiong. Among them, DEGs analysis revealed ferulic acid and other secondary metabolites are some of the major products in the biosynthetic pathway of Ligusticum chuanxiong.

Background
Ligusticum chuanxiong (called Chuanxiong, CX in Chinese), the dried rhizome of Ligusticum Chuanxiong Hort, is mainly distributed in Sichuan and cultivated and bred in Yunnan, Guizhou, Guangxi and other regions [1,2] . It shows an ideal therapeutic effect on cardiovascular and cerebrovascular diseases, and is also used as a preventive medicine to supply the vitamins necessary to maintain tness [3] . In clinical practice, Ligusticum chuanxiong is widely used to treat thromboembolism heart diseases [4] , headache [5,6] , wind-damp [7] . Recent studies have found that the active metabolites of Ligusticum chuanxiong including ligustrazine [8] , ferulic acid [9] , ligustilide [10] , polysaccharide and the others [11] . Ferulic acid, a kind of phenolic acid, is one of the most important active ingredients in Ligusticum chuanxiong [12] . Previous studies provided that ferulic acid has strong antioxidant properties and has a good function scavenge hydrogen peroxide, hydroxyl radicals, superoxide radicals and peroxynitrite [13] . In terms of the application of disease treatment, ferulic acid has antibacterial and anti-in ammatory properties [9] , which can prevent coronary heart disease [14] , protect ovaries [15] and inhibit liver damage [16] , and even play a role in cancer prevention [17,18] . Due to the special therapeutic effects of ferulic acid [19] , much previous research has been done on the organic synthesis of ferulic acid, and a large number of ferulic acid derivatives have been synthesized and demonstrated their activity [20,21] .
However, there are few reports on the interaction between genetic information and functional phenotypes based on transcriptome data analysis of Ligusticum chuanxiong. With the development of the Nextgeneration sequencing technology [22] , whole transcriptome sequencing plays a crucial role in deciphering genome structure and function, identifying genetic networks underlying cellular, physiological, biochemical and biological systems and establishing molecular biomarkers that respond to diseases, pathogens and environmental challenges [23] . Unlike the genome, the transcriptome is data that shows the expression level of all genes in a speci c organ, tissue or cell at a speci c stage of development. Thus, it is especially suitable for the discovery and identi cation of genes related to the medicinal function of plants and the study of the regulation mechanism of gene expression [24] .
In our study, we collected the rhizome and leaf tissues of Ligusticum chuanxiong and extracted the total RNA. We used HiSeq2000 high-throughput sequencing technology to conduct whole transcriptome sequencing on the leaf and rhizome of Ligusticum chuanxiong, and a total of 25 Gb of raw data was obtained. The clean reads after ltering were assembled and annotated against public databases to assess their differences in gene function of Ligusticum chuanxiong. Furthermore, we provided a perspective on the genetic intervention strategy and molecular mechanism of Ligusticum chuanxiong.

Plant materials
Ligusticum chuanxiong was collected from a cultivating in Pengzhou, Sichuan province, China. The leaf and rhizomes of Ligusticum chuanxiong were labeled, cleaned and stored in cryostat tubes, sealed and packed in liquid nitrogen, transported to the laboratory and stored at -80°C in the refrigerator for RNA extraction. Each experiment was repeated three times ( gure 1).

RNA extraction
Precisely weigh 1g of rhizome and leaf tissue of Ligusticum chuanxiong, ground into powder with liquid nitrogen, and extract total RNA according to TRIzol protocol [25] .RNA quality and integrity were assessed using the Nanodrop based on the absorbance ratios at 260/280 and 260/230 numbers greater than 2.0.
All samples selected for sequencing had an RNA integrity number over 8. Total RNA was restored in the refrigerator at the temperature of -80℃. Each experiment was repeated three times.

RNA sequencing
Enrichment of poly(A) mRNA using magnetic beads with Oligo(dT) Synthesize cDNA strands to construct a library. The prepared library was sequenced on the Illumina platform to obtain the raw reads. The raw reads were cleaned by removing adaptor sequences and the low-quality reads (more than 40% bases with Q-value <30 or more than 5% Ns in the read), which is performed by fastp (version 0.20.0).

Transcriptome de novo assembly
The clean reads were then assembled using Trinity (v2.9.1). The longest transcript of each cluster was regarded as the unigenes for further downstream analysis. The TransDecoder (version 5.5.0) was used to predict the ORFs for differentially expressed gene analysis, working with the unigene mentioned previously. The quality and completeness of assembly were assessed by BUSCO (version 3.0.1). The databases virdiplantae_odb10 and embryophyta_odb10 were used as the lineage-speci c dataset.
Identi cation of the microsatellite loci was performed by perl script misa (version 2.0).

Transcriptome functional annotation
To predict putative gene function, genes were annotated by online functional annotation tool eggNOGmapper, which is a web-tool for fast functional annotation of novel sequence, related to KEGG, GO and other databases. The result of eggNOG-mapper was parsed by homemade scripts. The R package cluster Pro ler (version 3.16.0) was used for enrichment clustering.

Differentially expressed genes (DEGs)
Relative gene expression levels of each unigene were determined by calculating the sum of the reads mapping to each contig, and then normalized by converting the reads counts to (reads per kilobase per million reads, or RPKM) across the transcriptome. The DEGs were obtained by using R package edgeR (version 3.30.3). A gene was considered as DEG if its expression level in one tissue was signi cantly higher than that in the other based on the Fold Change (FC) ratio >= 3 and p <0.01. All the data were corrected according to the Benjamini-Hochberg formula.

Data Records
The ltered original RNA sequencing reads have been deposited at the NCBI Sequence Read Archive under the BioProject study accession PRJNA672790 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA672790). Functional annotation of the Trinity transcriptome assembly is available as a supplementary gff le at gshare (Functional annotation of the Trinity transcriptome assembly). Summary of the microsatellite loci identi ed by the MISA tool in the Trinity assembly based on the ltered reads is available as a supplementary excel le at gshare (Summary of the microsatellite loci identi ed by the MISA tool in the Trinity assembly based on the ltered reads). PCR primers designed for these microsatellite loci using Primer3 online tool are available as a supplementary excel le at gshare (PCR primers designed for the microsatellite loci identi ed by the MISA tool in the Trinity assembly based on the ltered reads using Primer3 online tool).

Quality control and assemblies
Two cDNA libraries were sequenced using Illumina technology, respectively. A total of 25 Gb was obtained with Q20 of 97.34% and Q30 of 90.49%. After removing adaptor sequences and the low-quality reads, 119,604,818 and 126,274,844 clean reads were generated from the rhizome and leaf, respectively.
Then all clean reads were pooled up together for Trinity assembly. A total of 295,725 transcripts (N50 length 778) were generated and the longest transcripts in its cluster were treated as unigenes, which got 248,454 unigenes (N50 length 468). Trans Decoder software was used to identify possible coding regions within Trinity transcriptome assemblies, which got 72,363 (N50 length 1041) coding regions.

Functional annotation
Both of them were uploaded to eggNOG website to perform function annotation. From the perspective of annotating functional proteins, the sequence which treats the longest transcripts in its cluster as unigenes, only 17762 sequences hit eggNOG database. By contrast, the predicted coding sequencing performed better, hit 61448 sequences in the database. The followed analysis is based on the latter result.

Differential expression gene analysis
According to the de nition of DEGs, there were a totally of 1,708 unigenes expressed differently between two tissue, 1,246 of which were down-regulated and 462 unigenes were up-regulated. The rst ten differentially expressed genes were labeled ( gure 2). In order to nd the most important key genes related to the synthesis of active metabolites of Ligusticum chuanxiong, the top ten up-and downregulated differentially expressed genes were marked in the volcano map. It is very interesting to note that among the up-regulated genes, there are three genes with the most signi cant differences, including two Cupin (DN72188_c1_g1_i1, DN72188_c1_g1_i2) and a BURP domain-containing protein (DN99636_c2_g1_i1). The Cupin family is a family of multi-functional proteins within which several allergic proteins have been identi ed. However, most of the cupin allergenic proteins are pea globulin and soy globulin. BURP domain-containing protein is a plant-speci c family of proteins with multiple protein functions involving in the development and expression in embryogenesis. In the three most signi cant down-regulated genes, there were two Ribulose bisphosphate carboxylase activase (RCA) (DN73345_c0_g1_i2, DN73345_c0_g1_i1) and an Involved in Biosynthesis of the Thiamine precursor Thiazole (DN63265_c0_g1_i2). RCA is a Protein that binds adenosine 5'-triphosphate (ATP), ribonucleotide adenosine that carries three phosphate groups esteri ed to the sugar moiety. It is the cell's source of energy and phosphate.

Gene Ontology analysis
Most of the annotated DEGs were enriched in chloroplasts and photosynthesis, which was due to tissue speci city. Notably, some of the DEGs were enriched in biological processes, biosynthesis, IGE a nity showing the highest enrichment, followed by glutathione transferase activity, immunoglobulin a nity, IGE a nity and immunoglobulin a nity all showing very high enrichment, which is related to some storage proteins produced by plants that can induce high immunity. Allergy is one of the causes of autoimmune reactions, therefore, the immune-related side effects in the application of Ligusticum chuanxiong should also be taken into consideration. Among them, it is worth noting that the activity of phenylalanine aminolytic enzyme, one of the important synthetic enzymes in the ferulic acid synthesis pathway, shows a relatively high enrichment (Figure 3). In terms of molecular function, monooxygenase and oxidoreductase activities are the most prominent. Plasmodial spheres, vesicle-like cavities and chloroplast membranes are the major cellular components.

Clusters of Orthologous Groups analysis
Overall, 1593 DEGs were assigned to COG classi cations ( Figure 4). Regardless of the most function unknown category (448, 28.1%), the following categories were posttranslational modi cation, protein turnover, chaperones (136, 8.53%), carbohydrate transport and metabolism (126, 7.90%), signal transduction mechanisms (1.14, 7.16%), secondary metabolites biosynthesis, transport and catabolism (95, 5.96%), energy production and conversion (88, 5.52%). Please see the attachment for the rest categories (Supplementary Table S2). Most of the genes in the COG database are genes of unknown function, and the role of these genes in plant biosynthetic and metabolic networks is still unknown to us. What interests us is that some genes were enriched in categories where posttranslational modi cations, protein turnover, chaperones showed high enrichment in categories, and these proteins are essential biological processes for the completion of life activities in chuanxiong rhizome. We also found a high enrichment of carbohydrate transport and metabolism and secondary metabolites biosynthesis, transport and catabolism, which is consistent with the fact that chuanxiong produces a variety of active metabolites and Storage of nutrients is closely related.

KEGG pathways analysis
To explore the speci cation of biological pathways which were active in the root and leaf of L. chuanxiong, 497 DEGs were assigned to KEGG pathways ( gure 5

BUSCO analysis
To test the transcriptome assemblies for completeness, a survey on the conserved orthologous genes was done using BUSCO (Table 1).

Discussion
The rhizome of Chuanxiong is used for its mature rhizome, which is mainly used for promoting blood circulation, dispelling wind and relieving pain [26] . The quality and content of chuanxiong rhizome grown in different regions have been found to be different, and the content of active ingredients of different plants in the same region has great variability and instability with the pharmacological effects and e cacy of chuanxiong rhizome [27] . Due to the importance of chuanxiong rhizome in traditional Chinese medicine, there have been increasing attempts to understand its physiological properties, epigenetic and molecular mechanisms. Therefore, we constructed whole transcriptomes of rhizomes and leaves of Ligusticum chuanxiong and investigated the mechanisms of its genetic properties and metabolites.
In this study, we collected tissues from the rhizomes and leaf of Chuanxiong rhizome in Pengzhou region for transcriptome sequencing. Functional annotation and differentially expressed genes (DEGs) analysis were mainly performed from the sequencing results. In the DEGs of leaf and roots of Ligusticum chuanxiong, we found that most of the DEGs were enriched in chloroplasts and photosynthesis, which was shown by data comparison to being due to tissue speci city. It is worth noting that the enrichment of DEGs class to a biological process, biosynthesis, IGE a nity presented the highest concentration, the second is the activity of glutathione transferase, immunoglobulin a nity, IGE a nity and immunoglobulin a nity are showing very high concentration, which related to plant to produce some storage proteins, these proteins can cause high immune reaction, such as allergy is one of the reasons. Therefore, we suggest that some immune-related side effects should also be paid attention to in the application of Chuanxiong. In the results of GO database analysis, there is a high enrichment of phenylalanine aminolytic enzyme in the biological processes, and the production of ferulic acid is closely related to phenylalanine aminolytic enzyme. Phenylalanine ammonia lyase catalyzes an important ratelimiting step [28] in the phenylalanine pathway, providing precursors such as cinnamic acid, vanillin acid and ferulic acid to a variety of secondary metabolites including bitter glycosides [29] .The most important active component of Chuanxiong is ferulic acid [30] , which is used by the Chinese Pharmacopoeia as the main indicator for the quality evaluation of these herbs [31] .
There are two main biological sub-pathways related to ferulic acid metabolism: one is the synthesis of caffeic acid and ferulic acid from phenylalanine via PAL and cinnamate-4-hydroxylase (C4H), and ferulic acid can be metabolized to 5-hydroxyferulic acid via ferulic acid 5-hydroxylase (F5H) [32] ; the second is the synthesis of caffeic acid and ferulic acid via 4-coumaroyl-coenzyme (4-Coumaroyl-coenzyme); and the second is the metabolism of ferulic acid via 4-Coumaroyl-coenzyme (4-Coumaroyl-coenzyme) [33,34] . The rst pathway can generate active substances such as caffeic acid and ferulic acid, which have antiin ammatory and immunomodulatory effects, and ferulic acid also has antithrombotic effects; the second pathway can generate non-pharmacological active substances such as pineal and lignin. Therefore, by inhibiting ferulic acid decomposing enzyme downstream of PB pathway through biomolecular technology, ferulic acid can be accumulated in excess without increasing the content of lignin (ash), thus enhancing the medicinal value of Ligusticum Chuanxiong. The key enzyme genes of the ferulic acid synthesis pathway were identi ed by transcriptome analysis, including four PAL genes and one cinnamoyl-coenzyme A-reductase (CCR) gene, two BDP genes, three late embryonic developmental enrichment protein genes and one Rop guanine nucleotide exchange factor gene, which are related to the formation of rhizome. These genes will be potential target genes for our future transgenic engineering or genetic modi cation of the herbal quality of Ligusticum chuanxiong.
In recent years, with the development of high-throughput sequencing technology and the reduction of the cost of second-generation sequencing as well as the update of SSR development software, SSRs for medicinal plants have been developed and applied [35] . Since SSRs have interspeci c variability, we were able to use some high-frequency SSRs as biomarkers for interspeci c kinship analysis of Ligusticum chuanxiong. We analyzed simple repeat sequences in chuanxiong and found 499 SSRs represented microsatellite loci with mononucleotide motifs. The trinucleotide repeats of Ligusticum chuanxiong SSR were dominated by ATC/GAT and AAG/CTT, which were basically consistent with the trinucleotide repeat characteristics of Salvia miltiorrhiza [36] . Therefore, SSR molecular markers are of great value in the study of the genetic diversity of Ligusticum chuanxiong.

Conclusion
In conclusion, we constructed a high-quality transcriptome of rhizome and leaf of Ligusticum Chuanxiong using the Illumina sequencing platform. By comparing transcriptome analyses, we identi ed candidate genes in these DEGs related to the active metabolite of chuanxiong rhizome, ferulic acid, for phenylalanine aminolytic enzymes as well as terpene synthesis and some other secondary metabolites. These candidate genes can provide a basis for future studies of the molecular mechanisms of rhizome formation, development and secondary metabolism, and will be useful in future clinical applications in Chinese medicine. Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare no competing interests.