Quantification of purine alkaloids in different genotypes
Purine alkaloids determination showed that the total purine alkaloids content was different in the four genotypes, highest in YH9 (45.57 mg/g) and lowest in K11 (28.87 mg/g). The content of caffeine (11.33-42 mg/g) was relatively abundant in all genotypes, whereas the content of theobromine (2.46-4.56 mg/g) was low. On the other hand, the theacrine presented the greatest difference, with more than 27-fold higher in Kucha (13.16 mg/g in K6 and 14.94 mg/g in K11, respectively) compared with the conventional varieties (0.48 mg/g in YH9 and 0.37 mg/g in QX1, respectively) (Fig. 1 and Table S1).
Transcriptome data analysis
Totally, 12 libraries were constructed and designated as follows: YH9-1 to 3, QX1-1 to 3, K6-1 to 3 and K11-1 to 3. After stringent quality filtering, the transcriptome sequencing produced ~ 671.61 million of high quality (HQ) clean reads (2 ´ 150 bp) (average: about 55.97 million reads for each sample). The averaged Q30 and GC content value were 96.67% and 45.50%, respectively (Table 1 and S2).
HQ clean reads, excluding rRNA, were mapped to the tea plant reference genome (cv. Shuchazao). As a result, the total align ratio was ranked from 71.42% to 76.87%, with a uniquely mapped rate of 69.22% to 74.53%. As assembled transcriptome were annotated according to the 33,932 reference genes, a total of 30,369 known genes and 8,696 novel genes were identified in all samples (Table 1 and S2). For the annotation of novel genes, a total of 623 (7.16%) novel genes could be identified in the Nr database. Additionally, KEGG enrichment analysis demonstrated that “Metabolism” and “Genetic Information Processing” were the main categories (Fig. S1).
The pair-wise Pearson’s correlation coefficients were computed to test the reproducibility of the sequencing data. As shown in Fig. 2A, the robust correlation values within each genotype indicated high repeatability between replicates. PCA analysis indicated that the major transcriptomic variation (86.8%) could be explained by the first two principal components (Fig. 2B). The PCA cleanly clustered the samples into four groups, with the assignment of QX1 similar to that of K6.
Identification of differentially expressed genes (DEGs)
The DEGs were determined between Kucha and conventional varieties using a pairwise approach. In detail, 10,732 (4,298 unregulated and 6,434 downregulated) and 11,461 (5,077 upregulated and 6,384 downregulated) DEGs were identified in YH9 vs K6 and YH9 vs K11comparisons, respectively. The total number of DEGs in QX1 vs K6 and QX1 vs K11 comparisons were 6,032 (3,091 unregulated and 2,941 downregulated) and 9,290 (5,209 upregulated and 4,081 downregulated), respectively. Based on the analysis, the number of downregulated DGEs was higher than the upregulated DEGs in the comparisons of YH9 vs K6 and YH9 vs K11, but lower in QX1 vs K6 and QX1 vs K11 comparisons. Interestingly, the number of DEGs in QX1 vs K6 was far less than other comparisons (Fig. 3A). Moreover, we obtained 1,142 common DEGs from the four comparisons using a Venn diagram (Fig. 3B).
Functional characterization of DEGs
To understand the functions of the obtained DEGs, GO subcategory and KEGG pathway analyses were performed. Among these DEGs, 6,606 annotated genes were assigned into one or more GO terms under three domain categories (Fig. 3C). Within the biological process category, the top-enriched GO terms were metabolic process (3,110, 47.08%), cellular process (2,893, 43.79%) and single-organism process (2338, 35.39%). Within the molecular function, the terms related to catalytic activity (3,732, 56.49%) and binding (2,776, 42.02%) were the most enriched. For the cellular component, the dominant terms were membrane (2,342, 35.45%) and cell (2,105, 31.86%), followed by cell part (2,078, 31.46%). A total of 5,054 DEGs were distributed into 134 KEGG pathways of five main categories, among which the largest categories were metabolism and genetic information processing, followed by organismal systems, environmental information processing and cellular processes. (Fig. 3D). The most represented pathways were “Metabolic pathway” (1972, 39.02%), “Biosynthesis of secondary metabolites” (1298, 25.68%), “Plant-pathogen interaction” (467, 9.24%), “Biosynthesis of antibiotics” (440, 8.71%) and “Microbial metabolism in diverse environments” (362, 7.16%). Additionally, “Purine metabolism” pathway accounted for a considerable portion of the DEGs (125, 2.47%) (Table S3).
Putative genes involved in the theacrine biosynthesis
The putative pathway of theacrine biosynthesis was proposed according to the KEGG database (Fig. 4). In brief, the formation of theacrine is initiated by the assembly of xanthosine, which act as methyl acceptor for caffeine biosynthesis, then conversion of caffeine to theacrine occurs by sequential oxidation and methylation steps with 1,3,7-trimethyluric acid acting as the intermediate. Based on this, we grouped genes that are related to theacrine accumulation into three categories: synthesis of xanthosine, synthesis of caffeine and synthesis of theacrine.
A total of 38 genes related to enzymes of xanthosine synthesis were identified, including four genes encoding adenosine kinase (ADK), 12 genes encoding adenine phosphoribosyltransferase (ARPT), seven genes encoding AMP deaminase (AMPD), two genes encoding IMP dehydrogenase (IMPDH), 11 genes encoding 5’-Nase and two genes encoding guanine deaminase (GAD) (Table S4). Expression pattern analysis showed that these genes were expressed quite differently among different genotypes (Fig. 5A). In particular, three genes assigned as common DEGs, including one ADK gene (TEA022322), one AMPD gene (TEA007189) and one 5’-Nase gene (XLOC_057370), were highly expressed in Kucha.
For the 13 caffeine synthase (TCS) genes involved in caffeine synthesis, TEA015791 and TEA028050 exhibited very high expression level in all experimental materials (FPKM > 700), while two common DEGs (TEA010054 and TEA022559) were highly expressed in Kucha (Fig. 5B and Table S5). Although most of the five genes encoding S-adenosylmethionine synthetase (SAMS) in the methyl donor synthesis exhibited diverse expression patterns, none was classified as common DEGs (Fig 5C and Table S5).
In the case of the conversion of caffeine to theacrine, genes related to enzymes of oxidoreductase (OR) and methyltransferase (MT) might be bottlenecks, and it would require up-regulated in high theacrine content genotypes. Thus, we focused on common DEGs encoding the enzymes mentioned above. The data showed that 44 genes annotated as OR, of which 20 genes were highly expressed in Kucha (Fig. 5D and Table S6). Meanwhile, a total of 21 genes annotated as MTs were identified and seven of them showed enhanced expression levels in Kucha (Fig 5E and Table S7). Furthermore, four of the above-mentioned up-regulated MTs were grouped into N-methyltransferase (TEA009098, TEA010054, TEA022559, XLOC_035717), two into O-methyltransferase (TEA030958, XLOC_047820) and one into C-methyltransferase (TEA023802).
Validation of gene expression using qRT-PCR
To validate the quantification of the RNA-seq analysis in this study, eight genes including one IMPDH gene (TEA024175), three TCS genes (TEA010054, TEA022559 and TEA030024), two MT genes (TEA023802 and XLOC_047820) and two OR genes (TEA007843 and XLOC_026296) were selected for RT-qPCR analysis. As shown in Fig. 6, all the genes, except for TEA007843, shown a similar expression pattern in qRT-PCR analysis as obtained in the RNA-seq results, indicating the reliability of our RNA-seq data.