Metabolite profiling of ‘ZM’ and ‘GQ’ during various fruit development stages
As premature fruits of Citrus grandis ‘Tomentosa’ are utilized for medicinal purpose, we harvested ‘ZM’ and ‘GQ’ fruits at 2, 4, 6, and 8 cm diameter (indicated as S1, S2, S3, and S4, Fig. 1A) and employed UPLC-Q-Orbitrap MS/MS to perform a global untargeted metabolite analysis. Figure S1 illustrates the total ion chromatographs of all samples in positive and negative modes, respectively. Quality control (QC) was conducted to ensure the reliability of the methodology and data. As shown in Figure S2A,B, principal component analysis (PCA) revealed densely gathered data patterns in both positive and negative modes indicating reliability of the data [12]. In order to discover biomarkers, the relative standard deviation (RSD) of potential characteristic peaks in QC samples should not exceed 30%, otherwise the relevant characteristic peaks should be deleted [12] which is the quality assurance. In Figure S1B, the proportion of characteristic peaks with RSD < 30% is higher than 70% in positive mode which further validates the reliability of the obtained data [13]. A total of 25,005 and 22,434 precursor molecules were obtained in positive and negative ion modes, respectively. Furthermore, 407 compounds were identified and detailed information including retention time (rt), mass-to-charge ratio (m/z), exact mass, formula, KEGG ID, and the accurate molecular weight of metabolite errors (ppm) is provided in Table S1.
A total of 39 flavonoids including 14 flavanones/flavones, five isoflavonoids, 12 flavonols, six anthocyanins, and two backbones for the synthesis of flavonoids were identified, and their average intensities in the different samples are presented in Table S2. The total intensity of each category was calculated to primarily examine the contents of flavonoids in each sample. In order to further elucidate the chemical composition of bioactive flavonoids, a targeted metabolic analysis was performed. The qualitative contents of 16 flavonoids including naringin, rhoifolin, prunin, naringenin, isovitexin, rutin, vitexin, glycitin, cynaroside, diosmin, apigenin, isoquercetin, catechin, daidzin, luteolin and taxifolin were determined (Fig. 1B). Naringin, rhoifolin, prunin and naringenin are the top four flavonoids highly contained in the fruits of Citrus grandis ‘Tomentosa’ which are all in the mg/g range. The contents of naringin are extremely high, however, there is insignificant difference between ‘ZM’ and ‘GQ’, while the content of rhoifolin is significantly higher in ‘ZM’ than ‘GQ’ at all four fruit development stages. The amounts of naringenin in ‘ZM’ fruits at S1 and S2 are higher than in ‘GQ’ (p < 0.05). In addition, there is significant difference in the contents of rutin (p < 0.01), isovitexin (p < 0.05), vitexin (p < 0.05), glycitin (p < 0.05), and daidzin (p < 0.01) between S1 fruits of ‘ZM’ and ‘GQ’. In summary, ‘ZM’ fruits contain more flavonoids than ‘GQ’.
RNA sequencing of ‘ZM’ and ‘GQ’ fruits of various development stages
To investigate genes contributing to flavonoid synthesis in C. grandis ‘Tomentosa’ fruits, we sequenced the transcriptomes of the same samples and annotated the sequencing reads to the genome of C. grandis. An overview of the transcriptome sequencing dataset including quality check data is presented in Table 1. For each sample, the Q20 and Q30 percentages were higher than 96% and 90%, respectively, and the average GC content was 46.0% in all libraries. The percentage of total valid reads mapped to the reference genome was higher than 94% in each library. In addition to 30,123 genes from the reference genome, a total of 1,387 novel genes were identified from all libraries together.
The expression levels of genes are presented by FPKM values calculated from the read counts by the method of Reads Per kb per Million reads (RPKM). As expected, global gene expression profiles were similar in all samples analyzed (Fig. 2A). Differentially expressed genes (DEGs) were identified by comparing each sample pair under the condition of a false discovery rate (FDR) < 0.05 and a |log2FC(fold change)| > 1. Figure 2B shows the number of up- and down-regulated genes within varieties (to compare different fruit developmental stages) and between varieties (to compare same fruit stages). The number of DEGs was largest in the comparisons ZM-2 vs. ZM-4 (3,522 genes, 1,857 up, 1,665 down) and GQ-2 vs. GQ-4 (3,431 genes, 1,843 up, 1,588 down), indicating major developmental shifts between these two fruit stages in both varieties. Many genes were also differentially expressed when identical fruit stages of different cultivars where compared, the largest difference was observed in the comparison ZM-4 vs. GQ-4 (2,724 genes). Furthermore, Venn analysis revealed 646 genes to be differently expressed between ‘ZM’ and ‘GQ’ throughout all four developmental fruit stages (Fig. 2C and Table S3).
To evaluate the potential functions of the 646 DEGs, a Gene Ontology (GO) enrichment analysis was conducted for categories ‘biological process’, ‘molecular function’ and ‘cellular component’ (Fig. S3). With respect to ‘biological process’, ‘metabolic process’, ‘cellular process’ and ‘single-organism process’ are the top three processes enriched in the 646 DEGs. The term with the largest proportion in ‘molecular function’ is ‘catalytic activity’. In accordance with these results, KEGG enrichment analysis revealed 65 metabolic pathways (Table S4), whereby ‘biosynthesis of secondary metabolites’ showed the strongest enrichment (Figure S4).
Transcription factors (TFs) are crucial cellular regulators in essentially all developmental and physiological processes, and they likely also contribute to the pharmacological differences between ‘ZM’ and ‘GQ’ fruits. We, therefore, analyzed our transcriptome data to identify TFs differentially expressed between the two varieties in the four developmental fruit stages. A total of 109, 169, 106, and 108 TFs, respectively, were differentially expressed in the groups ZM-2 vs. GQ-2, ZM-4 vs. GQ-4, ZM-6 vs. GQ-6, and ZM-8 vs. GQ-8. Expression levels and statistical data of the TFs are presented in Table S5. Considering the crucial role of MYB and bHLH TFs for controlling flavonoid biosynthesis, genes encoding them were further examined. Interestingly, two C. grandis MYB genes, i.e., Cg4g016300 and Cg5g039960, are significantly differently expressed between ‘ZM’ and ‘GQ’ at all fruit developmental stages, suggesting they are particularly important for the differential flavonoid accumulation in the two cultivar´s fruits. To further narrow down genes and TFs controlling flavonoid biosynthesis it was important to correlate transcriptome with metabolome data. Therefore, WGCNA (weighted gene co-expression network analysis) was performed for all samples. First, the transcriptome data were filtered to ensure accuracy of the network building. Then, a gene clustering tree was constructed taking into account correlations between gene expression levels. The gene modules were divided based on the clustering relationship which means that genes with similar expression patterns are classified into the same module. The modules were generated by cutting branches off the dendrogram. Dynamic Tree Cut was the primary classification method to define modules [14]. Finally, similar modules were merged together, and 18 modules were obtained as shown in Figure 3A, each color represents a module, and gray represents genes that could not be classified into any module. The number of genes and TFs in each module is shown in Figure S5. Paleturquoise, thistle3, and darkmagenta are the top three modules including the highest number of genes, i.e., 4,064, 2,313, and 1,885, respectively, while the numbers of TFs in modules paleturquoise (234), darkmagenta (179), and brown (151) are each above hundred. The WGCNA result provides important information about modules of highly correlated genes which can be integrated with metabolomic data for the identification of relevant gene clusters.
Combined transcriptional and metabolic analysis
To understand flavonoid synthesis in C. grandis ‘Tomentosa’ fruits, the metabolite and transcriptional data were investigated together. To this end, the correlation between the WGCNA modules and the contents of 16 flavonoids was analyzed. Figure 3B presents the values of correlation and corresponding p values. According to the correlation values, the paleturquoise module is significantly (p < 0.001) positively correlated with catechin (r = 0.81), vitexin (r = 0.64), and isovitexin (r = 0.62). The darkmagenta module showed significant (p < 0.001) positive correlations with naringenin (r = 0.79), prunin (r = 0.77), and isoquercetin (r = 0.69). The highest correlation value was observed for mediumorchid and isovitexin (r = 0.88, p = 2e-08). For naringin and rhoifolin, lightcyan and blue are the two important modules, respectively.
The pathway of flavonoid biosynthesis, beginning with phenylalanine, including the results of the metabolite profiling, is illustrated in Figure 4. The expression of the main structural genes in ‘ZM’ and ‘GQ’ throughout fruit development is shown in heatmaps (Fig. 4), next to the pathways. The FPKM value and WGCNA module of each structural gene is provided in Table S5. The pathways of flavonoid backbone biosynthesis (isoflavonoid, flavanone and flavone, flavonol and anthocyanin) are illustrated in different colors and the summary of the metabolite ion intensity of each pathway is shown in the heatmaps (Fig. 4). The total intensity data are provided in Table S2. As can be seen, the contents of flavonoids derived from the isoflavonoid, flavanone and flavone pathways are higher in ‘ZM’ than ‘GQ’ fruits, while the total metabolite ion intensity from the flavonol and anthocyanin pathways is stronger in ‘GQ’ fruits. As shown in Table S5, structural genes involved in flavonoid biosynthesis mainly fall into the paleturquoise and darkmagenta modules. Considering this, we focused our further analysis on TFs of the MYB and bHLH families, and WD40-repeat proteins, which form a complex with the two TFs to regulate flavonoid biosynthesis (Table S6). Twenty-eight MYB, nine bHLH, and one WD40 genes sorted in the paleturquoise module, while16 MYB, 12 bHLH, and one WD40 genes are included in the darkmagenta module. We selected 21 MYB, 12 bHLH, and one WD40 genes, considering their expression levels and connectivity, to illustrate their transcriptional regulatory network for flavonoid biosynthesis in the paleturquoise and darkmagenta modules (Fig. 5). The gene IDs and names of each TF are listed in Table S7.
Next, we searched for potential correlations between DEGs and the 16 selected flavonoids in the groups ZM-2 vs. GQ-2, ZM-4 vs. GQ-4, ZM-6 vs. GQ-6, and ZM-8 vs. GQ-8. The correlation and p values of main structural genes and selected MYB, bHLH and WD40 family genes with relatively high expression levels are shown in Table S8. Pearson correlation coefficients in the four compared groups were further examined to identify genes involved in flavonoid biosynthesis. In the group of ZM-2 vs. GQ-2, five MYB members (CgMYB4, CgMYB15, CgMYB74, CgMYB103, and CgMYB167) were sorted out. Among them, the expression level of CgMYB103 was positively related to the content of naringenin with a correlation coefficient (r) of 0.83, while CgMYB4 and naringenin were negatively correlated (r = -0.95). CgMYB74 (r = 0.95) and CgMYB167 (r = 0.96) were significantly positively related with rutin, while CgMYB4 and rutin were negatively correlated (r = -0.91).
In the group of ZM-4 vs. GQ-4, seven MYB and three bHLH genes (CgMYB4, CgMYB59, CgMYB74, CgMYB77, CgMYB88, CgMYB108, CgMYB111.1, CgbHLH92, CgbHLH140, CgbHLH162.4) were selected. Among them, two MYBs (CgMYB74 and CgMYB108) and one bHLH (CgbHLH162.4) were highly positively correlated with naringenin with a Pearson correlation coefficient greater than 0.90. Expression of CgMYB74 was also positively correlated with the levels of luteolin (r = 0.92), taxifolin (r = 0.94), and glycitin (r = 0.94). It is worth noting that CgMYB4 expression is negatively correlated with almost all flavonoids except catechin (r = 0.83) and prunin (r = 0.94). Expression of CgMYB111.1 and CgbHLH140 is similar to that of CgMYB4. Since prunin is the precursor metabolite of flavanones and flavones, we can infer that CgMYB4, CgMYB111.1 and CgbHLH140 repress the production of flavonoids in C. grandis ‘Tomentosa’ fruits. Nevertheless, other TFs including CgMYB59, CgMYB74, CgMYB108, CgMYB77, CgMYB88, CgbHLH92, and CgbHLH162.4 activate the biosynthesis of flavonoids. In the group of ZM-6 vs. GQ-6, the expression levels of three MYBs (CgMYB15, CgMYB46 and CgMYB77) and one WD40 (Cg1g012600) were significantly positively correlated with the content of rhoifolin (r > 0.90, p < 0.05). In the group of ZM-8 vs. GQ-8, a WD40 member (Cg2g017030) was highly (r < -0.95, p < 0.01) negatively correlated with the contents of diosmin and rhoifolin which are derived from the flavanone and flavone biosynthesis pathway. In summary, nine MYB (CgMYB15, CgMYB46, CgMYB59, CgMYB74, CgMYB77, CgMYB88, CgMYB103, CgMYB108, CgMYB167), two bHLH (CgbHLH162.4 and CgbHLH92), and one WD (Cg1g012600) genes were identified as candidate activators, and two MYB (CgMYB4 and CgMYB111.1), one bHLH (CgbHLH140) and one WD (Cg2g017030) genes were identified as promising repressors for the biosynthesis of flavonoids in the fruits of C. grandis ‘Tomentosa’.
RT-qPCR
To verify the transcriptome sequencing date, RT-qPCR was conducted for selected flavonoid biosynthesis and TF genes. As shown in Figure 6, the relative expression levels of the main structural genes in the flavonoid pathway and CgMYB genes are consistent with the results obtained by RNA-seq analysis.