Contents of anthocyanin in the leaves
In order to understand the mechanism of pigment formation in A. pseudosieboldianum leaves, we carried out qualitative analysis of anthocyanin components in the middle (M) and last stage (A) of leaf color transformation. According to our UPLC–Q–TOF–MS data five anthocyanins were identified (Fig. 1). They were Peonidin O-hexoside, Rosinidin O-hexoside, Cyanidin 3-O-glucoside, Cyanidin 3 5-O-diglucoside, and Pelargonidin 3-O-beta-D-glucoside The content of five anthocyanin metabolites were different during the middle stage (M) and last stage (A). Rosinidin O-hexoside and Pelargonidin 3-O-beta-D-glucoside were rarely found in leaves at two periods. Peonidin O-hexoside and Cyanidin 3 5-O-diglucoside, especially Cyanidin 3 5-O-diglucoside in the leaves were abundantand, and displayed significant differences in content change at two periods, meaning they may be the key substances for the final color of A. pseudosieboldianum.
Production statistics of sequencing data
In order to understand the molecular mechanism of color change in A. pseudosieboldianum leaves in autumn, sequencing was performed using the Illumina Hiseq 2500 (Additional file 1: Table S1). A total of 67.47 Gb of clean data was obtained from these sequencing results, and the percentage of Q30 bases was 93.10% or more. After assembly, 50,501 unigenes were identified. Among these there were 20,706 unigenes over 1 kb in length, and the error rate of sequencing was less than 0.1% , which indicates that the quality of sequencing data was good and could be used for subsequent analysis.
Statistics of sequencing data assembly results
These recombinant sequence dataset yielded 115,413 transcripts and 50,501 unigenes, among which, the N50 (accounting for 50% of the maximum length nucleotide sequence of all single genes) was 2267 nt and 1979 nt, respectively. There were 17,366 (34.39%) unigenes between 300 and 500 nt, 23,580 (46.69%) unigenes between 500 and 2000 nt, and 9,555 (18.92%) unigenes longer than 2000 nt (Table 1).
Table 1 Length distributions of the transcripts and unigenes from de novo assembly
Length range
|
Transcript
|
Unigene
|
300-500
|
24236(21.00%)
|
17366(34.39%)
|
500-1000
|
26476(22.94%)
|
12429(24.61%)
|
1000-2000
|
33114(28.69%)
|
11151(22.08%)
|
2000+
|
31587(27.37%)
|
9555(18.92%)
|
Total Number
|
115413
|
50501
|
Total Length
|
179159431
|
62348493
|
N50 Length
|
2267
|
1979
|
Mean Length
|
1552.33
|
1234.60
|
Functional annotation and classification
Unigene sequence was then compared with gene sequences in the NR, Swiss-Prot GO, COG, KOG, eggNOG 4.5, and KEGG databases using BLAST software (e < 0.00001). 35,316 unigenes were identified, accounting for 70.01% of the 50,501 unigenes. 12,984 unigenes were annotated in the COG database, 25,375, 12,487 and 19,460 unigenes were annotated in the GO, KEGG, and KOG databases respectively. 25,226 unigenes were annotated in the Pfam database. 19,796 unigenes and 32498 unigenes were also annotated in the Swanshot and eggNOG databases respectively (Table 2).
According to NCBI NR database and E-value distribution, the number of unigenes annotated in our dataset was 35,024, of which 71.53% of these unigenes (E < 10 -50) had strong homology and 47.87% of these unigenes (E < 10-100) had very strong homology (Fig. 2a).Ten popular-related species were also annotated based on the NCBI NR database (Fig. 2b). The highest homology to A. pseudosieboldianum was Citrus sinensis, accounting for 12.25% homology, followed by Citrus clementina, which accounted for 9.74% homology.
Table 2 Statistics of comparisons with databases
Anno_ Database
|
300<=length<1000
|
length>=1000
|
Annotated Number
|
COG_Annotation
|
4534
|
8450
|
12984
|
GO_Annotation
|
11027
|
14348
|
25375
|
KEGG_Annotation
|
4861
|
7626
|
12487
|
KOG_Annotation
|
7536
|
11924
|
19460
|
Pfam_Annotation
|
9281
|
15945
|
25226
|
Swissprot_Annotation
|
6715
|
13081
|
19796
|
eggNOG_Annotation
|
14214
|
18284
|
32498
|
Nr_Annotation
|
16192
|
18832
|
35024
|
All_Annotated
|
16431
|
18885
|
35316
|
GO databases are divided into three categories: biological process, cellular component and molecular function, which are further divided into 42 functional subgroups. Biological process had the largest number of annotated unigenes, included metabolic process and cellular process with 13,141 (51.78%) unigenes and 11,546 (45.5%) unigenes, respectively. The cellular component class mainly included cell and cell part, with 11,886 (46.84%) unigenes and 11,806 (46.53%) unigenes, respectively. The molecular function category mainly included catalytic activity and binding, and there were 12,691 (50.01%) unigenes and 1, 1049 (43.54%) unigenes (Fig. 3).
In addition, Annotation data about COG and KEGG were found in additional file 2: Fig. S1 and Additional file 3: Table S2, respecially.
Differentially expressed genes (DEGs)
In order to explore the genes related to anthocyanin biosynthesis in A. pseudosieboldianum at different color-changing stages, the differential expression of A. pseudosieboldianum samples at different color-changing stages were then analyzed. The results showed that there were 16,521 DEGs in the three color-changing periods of A. pseudosieboldianum (Fig. 4a). Comparing between the early stage (B) and the middle stage (M), there were 87 significant DEGs, with 52 up-regulated and 35 down-regulated. Between with the early stage (B) and the final stage (A), there were 14,855 DEGs, of which 7984 were up-regulated and 6871 were down-regulated. In a comparison of the middle stage (M) and the final stage (A), there were 12,402 DEGs, 5683 up-regulated and 6719 down-regulated, in A. pseudosieboldianum (Fig. 4b).
In order to further understand the function of these respective DEGs, we carried out KEGG pathway enrichment analysis in the three stages of A. pseudosieboldianum. Our results showed that there were 16,521 differentially expressed genes in the three stages (B, M and A). The anthocyanin biosynthesis pathways related to leaf tone control were significantly enriched in B vs M and B vs A up-regulated genes. Phenylalanine metabolic pathways were significantly enriched in B vs M and B vs A up-regulated genes (Additional file 4: Table S3; Additional file5: Table S4).
Candidate genes involved in the anthocyanin biosynthesis Pathway
Twenty candidate genes were identified that covered almost all known enzymes involved in anthocyanin biosynthesis. Four PAL genes (c118011.graph_c0, c118229.graph_c0, c60818.graph_c0, c97964.graph_c0), one CHS gene was detected (c100615.graph_c0), one CHI gene (c108255.graph_c0), two F3H genes (c114916.graph_c0, c56266.graph_c0) were detected in the upstream phenylalanine pathway, and two F3’H genes (c110935.graph_c0, c108910.graph_c0), one ANS genes, two DFR genes, and six GT genes also detected in the downstream phenylalanine pathway. Combined with contents of metabolites, the gene pathways related to anthocyanin synthesis were analyzed in A. pseudosieboldianum (Fig. 5).
Screening of different transcription factors for anthocyanin biosynthesis
Transcription factors play an important role in plant development and secondary metabolism. In this experiment, we screened out 31 MYB genes, 15 bHLH genes, and 28 WD40 protein genes from the three DEGs of B, M and A stages of A. pseudosieboldianum. In the 31 MYB genes, 17 were up-regulated and 14 down regulated (Additional file 6: Table S5). In the 15 bHLH genes, 6 were up-regulated and 9 down regulated. In the 28 WD40 protein genes, 25 were up-regulated and 3 down regulated.
qRT-PCR confirmation of RNA-seq data
In order to verify the accuracy of our sequencing data, we selected eight genes involved in anthocyanin biosynthesis, and analyzed the expression level in leaves of different color from these three different stages of A. pseudosieboldianum by qRT- PCR. The results showed that all of these selected genes had similar expression patterns than identified in the RNA sequencing data (Fig. 6). Therefore, the data obtained in our study can be used to analyze the anthocyanin biosynthesis and metabolism gene in A. pseudosieboldianum.