Comparative transcriptome analysis to reveal putative genes responsible for high theacrine content in Kucha (Camellia kucha (Chang et Wang) Chang)

Background: The purine alkaloid theacrine, synthesized in a high amount in Kucha (Camellia kucha (Chang et Wang) Chang), has shown multiple pharmacological effects such as anti-depression, dehydration and hypnotic activities. It is critical to better understand the theacrine metabolism at the molecular level for future breeding programs. Results: In this study, we carried out comparative transcriptome analysis of Kucha (Kucha-6 and Kucha-11) and conventional varieties (Yinghong 9 and Qingxin 1). HPLC showed that Kucha synthesized more than 27-fold theacrine in comparison to conventional varieties. A total of ~ 671.61 million high quality clean reads were yield, and 71.42 ~ 76.87% of these were aligned to the reference genome for each library. Among the 19,095 differentially expressed genes (DEGs) identied by a pairwise comparison approach, a pool of 1,142 common DEGs between Kucha and conventional varieties were screened. Functional annotation allowed the identied of 20 oxidoreductase (OR) genes and seven methyltransferase (MT) genes with enhanced expression in Kucha, which might be associated with theacrine diversity in the examined materials. Particularly, two N-methyltransferase-encoding genes (TEA010054 and TEA022559) may catalyze the nal methylation step during theacrine synthesis in Kucha. Conclusion: The current study focused on screening genes of theacrine formation in tea plant, which contributes to the elucidation of candidate genes and breeding tea plant with high theacrine content.


Background
Purine alkaloids (PAs), occurred in about 100 species in the plant kingdom, are secondary metabolites formed from purine nucleotides [1]. As one of the most important non-alcoholic beverages, tea plant synthesized diverse high-value pharmaceutically important PAs including fatigue resistance caffeine, antimicrobial agent theobromine and potential asthma drug theophylline. The theacrine, only accumulated in high amounts in Kucha, also displayed several bene cial effects in improving learning and memory, anti-depressive and hypnotic activities [2][3][4].
To elucidate the rate-limiting steps of PAs biosynthesis in tea plant, numerous studies have been conducted, and a few genes involved in PAs turnover have been characterized. Among these genes, tea caffeine synthase 1 (TCS1) was rst isolated in tea leaves and shown to control caffeine accumulation by two molecular macular mechanisms: with reduced transcription level or the encoded proteins with only N-3 methylation activity [8,9]. Recently, the theacrine synthase CkTcs with key residues that are required for the N9-methylation, was con rmed to catalyze the nal methylation steps during theacrine synthesis in Kucha, and thus were suggested as rate-limiting enzyme in PAs biosynthesis [10].
Usually, caffeine is the predominant purine alkaloids (2-5%) in young leaves of tea plant, followed by theobromine [11]. In sharp contrast, theacrine is the major purine alkaloid in Kucha, which is a special tea resource rst disclosed in Yunnan province [12]. Tracer experiments concluded that theacrine was synthesized from the downstream branch of caffeine metabolism pathway [7]. In this sense, Kucha can be considered as a class of mutants of normal tea and thus are ideal materials to elucidate PAs biosynthesis pathways and identify related candidate genes.
RNA sequencing (RNA-seq) has become a useful tool for identifying and characterizing secondary metabolism genes and their pathways in plants [13][14][15]. The recent release of complete tea genome sequence and detailed annotations allowed exploratory analysis of DEGs associated with important secondary metabolisms [16]. In this context, we established complete transcriptome of Kucha (Kucha-6 and Kucha1-11) and conventional varieties (YingHong9 and Qingxin1) using an Illumina HiSeq TM 2500 platform. Based on this data, potential candidate genes encoding oxidoreductase and methyltransferase were revealed. This study will provide useful gene resources for the investigations of PAs biosynthesis and genetic improvement of tea plant.

Quanti cation of purine alkaloids in different genotypes
Purine alkaloids determination showed that the total purine alkaloids content was different in the four genotypes, highest in YH9 (45.57 mg/g) and lowest in K11 (28.87 mg/g). The content of caffeine (11.33-42 mg/g) was relatively abundant in all genotypes, whereas the content of theobromine (2.46-4.56 mg/g) was low. On the other hand, the theacrine presented the greatest difference, with more than 27-fold higher in Kucha (13.16 mg/g in K6 and 14.94 mg/g in K11, respectively) compared with the conventional varieties (0.48 mg/g in YH9 and 0.37 mg/g in QX1, respectively) ( Fig. 1 and Table S1).
HQ clean reads, excluding rRNA, were mapped to the tea plant reference genome (cv. Shuchazao). As a result, the total align ratio was ranked from 71.42% to 76.87%, with a uniquely mapped rate of 69.22% to 74.53%. As assembled transcriptome were annotated according to the 33,932 reference genes, a total of 30,369 known genes and 8,696 novel genes were identi ed in all samples (Table 1 and S2). For the annotation of novel genes, a total of 623 (7.16%) novel genes could be identi ed in the Nr database.
Additionally, KEGG enrichment analysis demonstrated that "Metabolism" and "Genetic Information Processing" were the main categories (Fig. S1).
The pair-wise Pearson's correlation coe cients were computed to test the reproducibility of the sequencing data. As shown in Fig. 2A, the robust correlation values within each genotype indicated high repeatability between replicates. PCA analysis indicated that the major transcriptomic variation (86.8%) could be explained by the rst two principal components (Fig. 2B). The PCA cleanly clustered the samples into four groups, with the assignment of QX1 similar to that of K6.

Identi cation of differentially expressed genes (DEGs)
The DEGs were determined between Kucha and conventional varieties using a pairwise approach. In detail, 10,732 (4,298 unregulated and 6,434 downregulated) and 11,461 (5,077 upregulated and 6,384 downregulated) DEGs were identi ed in YH9 vs K6 and YH9 vs K11comparisons, respectively. The total number of DEGs in QX1 vs K6 and QX1 vs K11 comparisons were 6,032 (3,091 unregulated and 2,941 downregulated) and 9,290 (5,209 upregulated and 4,081 downregulated), respectively. Based on the analysis, the number of downregulated DGEs was higher than the upregulated DEGs in the comparisons of YH9 vs K6 and YH9 vs K11, but lower in QX1 vs K6 and QX1 vs K11 comparisons. Interestingly, the number of DEGs in QX1 vs K6 was far less than other comparisons (Fig. 3A). Moreover, we obtained 1,142 common DEGs from the four comparisons using a Venn diagram (Fig. 3B).
Putative genes involved in the theacrine biosynthesis The putative pathway of theacrine biosynthesis was proposed according to the KEGG database (Fig. 4). In brief, the formation of theacrine is initiated by the assembly of xanthosine, which act as methyl acceptor for caffeine biosynthesis, then conversion of caffeine to theacrine occurs by sequential oxidation and methylation steps with 1,3,7-trimethyluric acid acting as the intermediate. Based on this, we grouped genes that are related to theacrine accumulation into three categories: synthesis of xanthosine, synthesis of caffeine and synthesis of theacrine.
For the 13 caffeine synthase (TCS) genes involved in caffeine synthesis, TEA015791 and TEA028050 exhibited very high expression level in all experimental materials (FPKM > 700), while two common DEGs (TEA010054 and TEA022559) were highly expressed in Kucha ( Fig. 5B and Table S5). Although most of the ve genes encoding S-adenosylmethionine synthetase (SAMS) in the methyl donor synthesis exhibited diverse expression patterns, none was classi ed as common DEGs (Fig 5C and Table S5).
In the case of the conversion of caffeine to theacrine, genes related to enzymes of oxidoreductase (OR) and methyltransferase (MT) might be bottlenecks, and it would require up-regulated in high theacrine content genotypes. Thus, we focused on common DEGs encoding the enzymes mentioned above. The data showed that 44 genes annotated as OR, of which 20 genes were highly expressed in Kucha ( Fig. 5D and Table S6). Meanwhile, a total of 21 genes annotated as MTs were identi ed and seven of them showed enhanced expression levels in Kucha (Fig 5E and Table S7). Furthermore, four of the abovementioned up-regulated MTs were grouped into N-methyltransferase (TEA009098, TEA010054, TEA022559, XLOC_035717), two into O-methyltransferase (TEA030958, XLOC_047820) and one into Cmethyltransferase (TEA023802).

Validation of gene expression using qRT-PCR
To validate the quanti cation of the RNA-seq analysis in this study, eight genes including one IMPDH gene (TEA024175), three TCS genes (TEA010054, TEA022559 and TEA030024), two MT genes (TEA023802 and XLOC_047820) and two OR genes (TEA007843 and XLOC_026296) were selected for RT-qPCR analysis. As shown in Fig. 6, all the genes, except for TEA007843, shown a similar expression pattern in qRT-PCR analysis as obtained in the RNA-seq results, indicating the reliability of our RNA-seq data.

Discussion
Kucha, a kind of special germplasm resource found in individual areas of China, is often used as traditional medicine by local resident [17]. Though it is di cult to discern them from conventional varieties because of the morphological similarity, the remarkably bitter taste in both fresh leaves and made tea made Kucha very distinctive. The theacrine, in spite of showing varied content in different sources of Kucha, is de ned as a characteristic substance and account for the bitterness in this unique germplasm [10,12,[18][19][20][21]. In recent years, theacrine has attracted increasing attention for its potential therapeutic applications in the pharmaceutical industry, and thus motivated research efforts to unveil the molecular basis underlying theacrine biosynthesis. For instance, two RNA-seq based studies have reported that MT genes were involved in the nal step of theacrine formation [10,20]. However, oxidoreductase-encoding genes that catalyze the critical step before methylation have not been mentioned yet.
Guangdong province, an important tea producing region in China, is rich in diverse tea germplasms. Earlier, we collected Kucha germplasms throughout this region and found that K6 and K11 displayed the highest concentration of theacrine. Hence, we performed transcriptome analysis with interest in identifying possible genes involved in theacrine biosynthesis. A total of 671.61 million of HQ clean reads were generated in this study, which was adequate for accurate quanti cation of gene expression pattern [22]. The alignment rate to the reference genome was quite comparable between Kucha and conventional varieties, and the percentage of annotated genes was similar to that recently reported in tea plant [23,24], suggesting that the relatively conserved functions and our project has captured the majority of the tea transcriptome. Comparative transcriptome pro ling of Kucha and conventional varieties identi ed 19,095 DEGs and 1,142 of them was assigned as common DEGs. Surprisingly, the number of DEGs in QX1 vs K6 was nearly half that of other comparisons. Besides, the PCA analysis indicated that K6 was relatively close to QX1 rather than K11. These results illustrated the fact that tea has not undergone long-term arti cial directional selection and frequent genetic exchange may occur between K6 and parental sources of QX1 due to the geographical proximity [25].
The current research on the metabolic pathway of theacrine in tea plant is relatively clean [1,5,7]. We divided the entire pathway into three modules, i.e. xanthosine synthesis pathway, caffeine synthesis pathway and theacrine synthesis pathway, according to the formations of key intermediates and the nal product (Fig. 4). In this paper, the expression pattern of xanthosine biosynthesis-related genes that encoding ADK, ARPT, AMPD, 5'-Nase and GAD were investigated. Considering the variation of total purine alkaloids content among the tested materials, we proposed that the dynamic expression level of these genes in different routes may in uence an adequate supply of xanthosine for the downstream pathway.
In particular, IMPDH was previously reported to be implicated as an inhibitor of the enzyme reduced the rate of caffeine biosynthesis in tea and coffee leaves [26]. Our data showed that the expression pattern of an IMPDH gene (TEA024175) was consistent with the total purine alkaloids content among the four genotypes. TCS is the most important enzyme that catalyzed the methylation of xanthosine to caffeine [8,9,27]. The RNA-seq data revealed that two TCS genes (TEA015791 and TEA028050) showed strong expression levels (FPKM > 700) in all samples, implying their important and conserved role in the formation of caffeine. SAM is the methyl donor for the three methylation steps in the caffeine biosynthetic pathway [28]. However, the expression pattern of SAMS genes did not show a consistent trend with purine alkaloids in our study, which might be due to the availability of SAM is not a principal factor in the control of caffeine biosynthesis in our four genotypes.
Caffeine converts to theacrine by oxidation at the C8 and methylation at N9 positions, and genes related to enzymes of OR and MT might control these steps [7]. Based on the annotation results, we screened 20 OR genes and seven MTs genes that were signi cantly enhanced in Kucha. A former RNA-seq analysis of Niedu Kucha from Jiangxi province showed that an NMT gene (TEA024443) was highly expressed in high-theacrine sample and therefore was suggested to be involved in the nal synthesis process of theacrine [20]. Unfortunately, this gene displayed consistent moderate expression levels in K6, K11 and QX1 (FPKM: 5.00, 8.89, 8.72, respectively), but relatively high in YH9 (FPKM: 14.92) in our study. Recently, the TCS gene (CkTcs) encoding N9-methyltransferase was proved to catalyze the last step of theacrine formation [10]. Among the four common DEGs encoding N-methyltransferase in the current research, two genes (TEA010054 and TEA022559) showing homology to CkTcs seems most likely to be involved in the last step leading to theacrine biosynthesis from 1,3,7-trimethyluric acid. Thus, further functional characterization of these genes and their substrate speci city assay is required to determine whether they are genotype-speci c in catalyzing the rate-limiting steps of theacrine.

Conclusions
Our study comprehensively reported transcriptomic resource from Kucha and conventional varieties. A total of 19,095 DEGs were obtained, of which 6,606 and 5,054 genes were annotated based on the GO and KEGG databases, respectively. A Venn diagram of the DEGs showed that 1,142 genes were shared between all pairwise comparisons. Genes involved in putative theacrine biosynthesis pathway were identi ed. In particularly, candidate genes encoding oxidoreductase and methyltransferase that might catalyze the last two steps of theacrine formation should provide insight into the molecular mechanisms underlying the high level of theacrine in Kucha. Taken together, the target genes produced in this study can be good candidates for genetic improvement and metabolic engineering studies in the future.

Plant materials
Two Kucha genotypes (Kucha-6 (K6) and Kucha-11 (K11)) and two conventional varieties (Yinghong 9 (YH9) and Qingxin 1 (QX1)) with extremely contrasting theacrine content were used in this study. Both genotypes were grown in Germplasm Bank of Tea Plant in Guangdong Province (Yingde, Guangdong, China, 24°18′N 113°23′E), and details on agro-techniques (fertilization and irrigation) are available upon request. The rst leaves of each genotype were harvested on May 2018. Part of the materials were frozen immediately in liquid nitrogen and stored an -80 ℃ for RNA isolation. The remaining samples were subject to purine alkaloids (caffeine, theobromine and theacrine) analysis. Tea Research Institute, Guangdong Academy of Agricultural Sciences owned the land/materials and had approved the study. No speci c permissions were required for these locations/activities.

Extraction and HPLC determination of purine alkaloids
The harvested leaves were freeze-dried and ground nely. Purine alkaloids was extracted with H 2 O (1/100, w/v) in a water bath at 100 ℃ for 30 minutes and centrifuged at 5000×g for 5min. Then the obtained supernatant was metered and ltered through a 0.22 μm membrane prior to HPLC analysis.
Purine alkaloids were determined using Agilent 1200 HPLC system (Agilent Technologies, USA). The HPLC was made to run with a column temperature of 35 ℃, a wavelength of 231nm , an injection volume of 10 μL, and ow rate of 1.0 mL/min. 5 μm Zorbax SB-C18 column (Agilent) with solvent A (0.1% formic acid) and solvent B (acetonitrile) used as mobile phase. The gradient programs performed as follows: 0-15 min, linear gradient from 4% to 6% A; 15-30min, linear gradient from 6% to 12% A; 30-55min, linear gradient from 12% to 18% A; 55-58min, linear gradient from 18% to 4% A. All samples were analyzed using three biological replicates.

RNA extraction, library construction and RNA-Seq analysis
Total RNA was extracted using a TRIzol Kit following the manufacturer's protocol (Promega, Beijing, China). DNase I (TaKaRa, Dalian, China) was used to remove residual gDNA. RNA quality was assessed on an Agilent 2100 Bioanalyzer (Agilent Tech-nologies, USA). RNA samples with 260 nm/ 280 nm ratios between 1.8 and 2.2 were used for further analysis. Next, mRNA was enriched by Oligo (dT) beads, and sequencing libraries were created using an Illumina Truseq TM RNA Sample Prep Kit (Illumina, SD, USA) according to the standard high-throughput protocol. In total, 12 libraries (i.e., four genotypes, with three biological replicates) were sequenced on the Illumina HiSeq TM 2500 platform by Genedenovo Biotechnology Co., Ltd (Guangzhou, China).
The original reads were ltered with the following parameters: 1) reads containing adaptors, 2) reads with N (unknown base) ratio more than 10%, 3) reads containing more than 50% of low quality (Q-value ≤ 20) bases (Jiang et al., 2019). Then, the obtained clean reads were aligned to rRNA database to remove the remaining rRNA reads using Bowtie2 [29]. The high quality clean reads were mapped to reference genome by TopHat2 (version 2.0.3.12) with maximum read mismatch of 2, distance between mate-pair reads of 50 bp and error of distance between mate-pair reads of ± 80 bp [16,30]. We adopted Cu inks software to perform reconstruction of transcripts and the reference annotation based transcripts (RABT) program was preferred [31]. Genes with classcode "u" (the transcripts were unknown or in the intergenic spacer region) with transcript length longer than 200 bp and exon number more than 2 were de ned as reliable novel genes. Novel genes were further annotated through alignment with the Nr and KEGG databases.
Gene abundances were quanti ed by software RSEM [32]. The gene expression level was normalized by using FPKM (Fragments Per Kilobase of transcript per Million mapped reads) method, which is able to eliminate the in uence of different gene lengths and sequencing data amount on the calculation of gene expression. Pearson correlation coe cient and PCA on all samples were performed with R package gmodels (http://www.r-project.org/) in this experience.

Quantitative real-time PCR (qRT-PCR) validation
The qRT-PCR was performed to validate the transcriptomic data from the extracted total RAN. Eight genes related to theacrine synthesis were selected and analyzed. The corresponding primers were designed with Primer Premier 5.0 and listed in

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.