Quality control and assemblies
Two cDNA libraries were sequenced using Illumina technology, respectively. A total of 25 Gb was obtained with Q20 of 97.34% and Q30 of 90.49%. After removing adaptor sequences and the low-quality reads, 119,604,818 and 126,274,844 clean reads were generated from the rhizome and leaf, respectively. Then all clean reads were pooled up together for Trinity assembly. A total of 295,725 transcripts (N50 length 778) were generated and the longest transcripts in its cluster were treated as unigenes, which got 248,454 unigenes (N50 length 468). Trans Decoder software was used to identify possible coding regions within Trinity transcriptome assemblies, which got 72,363 (N50 length 1041) coding regions.
Functional annotation
Both of them were uploaded to eggNOG website to perform function annotation. From the perspective of annotating functional proteins, the sequence which treats the longest transcripts in its cluster as unigenes, only 17762 sequences hit eggNOG database. By contrast, the predicted coding sequencing performed better, hit 61448 sequences in the database. The followed analysis is based on the latter result.
Differential expression gene analysis
According to the definition of DEGs, there were a totally of 1,708 unigenes expressed differently between two tissue, 1,246 of which were down-regulated and 462 unigenes were up-regulated. The first ten differentially expressed genes were labeled (figure 2). In order to find the most important key genes related to the synthesis of active metabolites of Ligusticum chuanxiong, the top ten up- and down-regulated differentially expressed genes were marked in the volcano map. It is very interesting to note that among the up-regulated genes, there are three genes with the most significant differences, including two Cupin (DN72188_c1_g1_i1, DN72188_c1_g1_i2) and a BURP domain-containing protein (DN99636_c2_g1_i1). The Cupin family is a family of multi-functional proteins within which several allergic proteins have been identified. However, most of the cupin allergenic proteins are pea globulin and soy globulin. BURP domain-containing protein is a plant-specific family of proteins with multiple protein functions involving in the development and expression in embryogenesis. In the three most significant down-regulated genes, there were two Ribulose bisphosphate carboxylase activase (RCA) (DN73345_c0_g1_i2, DN73345_c0_g1_i1) and an Involved in Biosynthesis of the Thiamine precursor Thiazole (DN63265_c0_g1_i2). RCA is a Protein that binds adenosine 5'-triphosphate (ATP), ribonucleotide adenosine that carries three phosphate groups esterified to the sugar moiety. It is the cell's source of energy and phosphate.
Gene Ontology analysis
Most of the annotated DEGs were enriched in chloroplasts and photosynthesis, which was due to tissue specificity. Notably, some of the DEGs were enriched in biological processes, biosynthesis, IGE affinity showing the highest enrichment, followed by glutathione transferase activity, immunoglobulin affinity, IGE affinity and immunoglobulin affinity all showing very high enrichment, which is related to some storage proteins produced by plants that can induce high immunity. Allergy is one of the causes of autoimmune reactions, therefore, the immune-related side effects in the application of Ligusticum chuanxiong should also be taken into consideration. Among them, it is worth noting that the activity of phenylalanine aminolytic enzyme, one of the important synthetic enzymes in the ferulic acid synthesis pathway, shows a relatively high enrichment (Figure 3). In terms of molecular function, monooxygenase and oxidoreductase activities are the most prominent. Plasmodial spheres, vesicle-like cavities and chloroplast membranes are the major cellular components.
Clusters of Orthologous Groups analysis
Overall, 1593 DEGs were assigned to COG classifications (Figure 4). Regardless of the most function unknown category (448, 28.1%), the following categories were posttranslational modification, protein turnover, chaperones (136, 8.53%), carbohydrate transport and metabolism (126, 7.90%), signal transduction mechanisms (1.14, 7.16%), secondary metabolites biosynthesis, transport and catabolism (95, 5.96%), energy production and conversion (88, 5.52%). Please see the attachment for the rest categories (Supplementary Table S2). Most of the genes in the COG database are genes of unknown function, and the role of these genes in plant biosynthetic and metabolic networks is still unknown to us. What interests us is that some genes were enriched in categories where posttranslational modifications, protein turnover, chaperones showed high enrichment in categories, and these proteins are essential biological processes for the completion of life activities in chuanxiong rhizome. We also found a high enrichment of carbohydrate transport and metabolism and secondary metabolites biosynthesis, transport and catabolism, which is consistent with the fact that chuanxiong produces a variety of active metabolites and Storage of nutrients is closely related.
KEGG pathways analysis
To explore the specification of biological pathways which were active in the root and leaf of L. chuanxiong, 497 DEGs were assigned to KEGG pathways (figure 5). The top 14 pathways with the highest enrichment factor were shown in KEGG enrichment analysis. The most representative pathway where DEGs enriched is phenylpropanoid biosynthesis[ko00940] (37, 7.44%), photosynthesis[ko00195] (27, 5.43%), glycolysis/gluconeogenesis[ko00010] (25, 5.03%), carbon fixation in photosynthetic organisms[ko00710] (24, 4.82%), starch and sucrose metabolism[ko00500] (24, 4.82%), glyoxylate and dicarboxylate metabolism[ko00630] (22, 4.43%), porphyrin and chlorophyll metabolism[ko00860] (20, 4.02%), tryptophan metabolism[ko00380] (18, 3.62%). And also, the pathway about Sesquiterpenoid and triterpenoid biosynthesis[ko00909] (10, 2.01%) were enriched. The phenylpropanoid biosynthesis was the most diverse and highly enriched among the KEGG pathway, which may be due to the tissue specificity of Ligusticum chuanxiong. In addition, phenylpropanoid biosynthesis was the most differentially enriched pathway in the KEGG pathway, which may be due to the tissue specificity of Ligusticum chuanxiong. Phenylpropanoids have been shown to be an important intermediate in the pathway of the synthesis of the active metabolite ferulic acid of Ligusticum chuanxiong.
BUSCO analysis
To test the transcriptome assemblies for completeness, a survey on the conserved orthologous genes was done using BUSCO (Table 1).
Microsatellite discovery and testing
The microsatellite discovery of the unigenes sequences found 3394 microsatellite loci in 72,363 sequences. (Supplementary Table S1) Among them, 499 SSRs represented microsatellite loci with mononucleotide motifs, 966-dinucleotide, 1,891- trinucleotide, 20-tetranucleotide, 5-pentanucleotide, and 13-hexanucleotide motifs. As for the frequency of the classified repeat types, A/T represents mononucleotide motif, AG/CT represents dinucleotide motif, ATC/ATG represents trinucleotide motif, followed by AAG/CTT. From the analysis of the composition base types of the repeat units, a total of 34 different composition repeat units were found, including 2 mononucleotide repeats, 4 dinucleotide repeats, 10 trinucleotide repeats, 5 tetranucleotide repeats, 5 pentanucleotide repeats, and 8 hexanucleotide repeats. For frequency of classified repeat types, A/T represents mononucleotide motif, AG/CT represents dinucleotide motif, ATC/ATG represents trinucleotide motif, followed by AAG/CTT. Among mononucleotides G/C is less (32, 0.94%) and A/T is most (467, 13.76); among dinucleotides CG/CG is least (5, 0.15%) and AG/CT is most (697, 20.54%) and most in total nucleotide repeats; among trinucleotides ACG/CGT repeats are least (25, 0.74%). ATC/GAT repeats were the most frequent (428, 12.61%); tetranucleotide, pentanucleotide, and hexanucleotide numbers each had fewer repeated units, with ACAT/ATGT repeats most frequent in tetranucleotides (13, 0.38), less frequent in pentanucleotides, and ACTCAT/AGTATG repeats most frequent in hexanucleotides (3, 0.09%).