De novo characterization of the Camellia sinensis transcriptome and comprehensive analysis of the diploid and triploid leaf morphological differences

Background: Polyploidization has undergone a series of significant in Result: In this study, we found that triploid tea leaves had an obvious growth advantage over diploid leaves, with a leaf area 59.81% larger than that of diploid leaves. The morphological structure of the triploid leaves also changed markedly: the xylem of the veins was more developed, the intercellular spaces between the palisade tissue and the spongy tissue were larger, and the stomata were enlarged. Transcriptome sequencing analysis showed that, after triploidization of tea, the changes in leaf morphology and physiological characteristics were affected by the specific expression of some key regulatory genes. We identified a large number of transcripts and genes that might play important roles in leaf development, especially those involved in cell division, photosynthesis, hormone synthesis, and stomatal development. Conclusion: This study will improve our understanding of the molecular mechanisms of tea leaf and stomatal development and provide a basis for molecular breeding of high-quality, high-yield tea varieties. Furthermore, it provides information that may enhance understanding of triploid physiology. These results provide a framework for the regulatory network of leaf development in diploid and triploid tea.

development of leaves and the regulation of key genes, and enrich the theoretical basis of molecular biology of plant organs. On the other hand, the study of the molecular mechanism of its leaf growth and development can provide a theoretical basis for its molecular breeding, breeding excellent new varieties and solving practical problems in production.

Morphological characteristics of diploid and triploid tea leaves
The leaf phenotypes of diploid and triploid annual shoots were determined under field growth conditions. As shown in Fig. 1, triploid tea leaves have an obvious growth advantage over diploid leaves: leaf length, leaf width and leaf area were larger than in the diploid by 23.37%, 41.12% and 59.81%, respectively, and the differences were extremely significant. However, diploid leaves were significantly thicker than triploid leaves (295.63 μm versus 252.33 μm), and this difference was also extremely significant. Petiole length did not differ significantly between triploid and diploid tea (0.7 cm and 0.667 cm, respectively). The petiole diameter of the triploid was significantly larger than that of the diploid (2.31 mm versus 2.05 mm). The dry weight of triploid leaves was 39.29% higher than that of diploid leaves, and the difference was extremely significant (Table 1).
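The relative differences quoted above follow the usual percent-change formula. A minimal sketch in Python, using only the leaf thickness values reported in the text; the function and variable names are illustrative and not part of the original analysis:

```python
def percent_change(reference: float, value: float) -> float:
    """Return the change of `value` relative to `reference`, in percent."""
    return (value - reference) / reference * 100.0


# Leaf thickness values reported in the text (micrometres).
diploid_thickness_um = 295.63
triploid_thickness_um = 252.33

# A negative result means the triploid leaf is thinner than the diploid leaf.
thickness_change = percent_change(diploid_thickness_um, triploid_thickness_um)
print(f"Triploid leaf thickness relative to diploid: {thickness_change:.2f}%")
```

The same function can be applied to any pair of diploid and triploid means in Table 1, with the diploid value as the reference.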

Analysis of the number and size of stomata in diploid and triploid leaves
To investigate differences in stomatal appearance on the epidermis of diploid and triploid tea leaves, we statistically analyzed the size and density of epidermal stomata at the same position on the leaves. Stomatal length and width in triploid leaves were significantly larger than in diploid leaves, by 57.20% and 84.44%, respectively. However, the stomatal density of triploid leaves was significantly lower: the stomatal density of diploid leaves was twice that of triploid leaves (Fig. 2).

Paraffin section analysis of diploid and triploid leaves
To understand the differences in growth and development between diploid and triploid leaves, we observed paraffin sections. As shown in Fig. 3, the leaf veins of the tea tree changed significantly after triploidization, most obviously in the xylem. The xylem of the triploid vein is more developed than that of the diploid and has a larger area. The area of the triploid xylem is three times that of the diploid, which is 0.476 mm². The increase in triploid xylem area has two causes: an increase in cell area and an increase in the number of xylem cells. The numbers of xylem cell layers in the diploid and triploid averaged 19 and 25, respectively, and the number of triploid xylem cell layers increased by 30.67% compared with the diploid. There were no significant changes in the size of the veins, the phloem or the cambium between diploid and triploid.
As shown in Fig. 4, the shape and size of the diploid and triploid mesophyll cells differ significantly, and the epidermal cells of triploid leaves are 22.28% thicker than those of the diploid, an extremely significant difference. The mesophyll cells of diploid tea leaves are closely arranged and relatively uniform in shape and size. The mesophyll cells of triploid leaves are loosely arranged, with large intercellular spaces and variable cell shape and size. The average length of the diploid palisade tissue cells is 65 μm, which is 15.65% longer than in the triploid. The palisade tissue cells of the triploid were significantly wider than those of the diploid, by 70%.
Diploid spongy tissue cells are small and dense, and they are about twice as numerous as those of the triploid.

Illumina sequencing and reads assembly
To investigate the molecular mechanisms of diploid and triploid leaf growth, and to understand the metabolic processes involved in leaf growth and development, we analyzed the gene expression profiles of diploid and triploid leaves. Six cDNA libraries (CaS419_1, CaS419_2, CaS419_3, CaS4_1, CaS4_2, and CaS4_3) were generated from diploid and triploid mRNAs and sequenced on the Illumina HiSeq™ 2000 platform. De novo transcriptome sequencing yielded an average of about 40 million raw reads per sample, of which more than 99% were high-quality reads.
The raw data obtained from the sequencer were filtered, the sequencing error rate and the GC content distribution were checked, and the GC content analysis was used to detect any A/T or G/C separation. Clean reads from the six samples were thus obtained for subsequent analysis. The filtered data are summarized in the table below. Among all the raw reads, 96% had Phred-like quality scores at the Q20 level (an error probability of 1%). After removing adapters, low-quality sequences and ambiguous reads, we obtained approximately 45 million, 43 million, 60 million, 58 million, 62 million and 48 million clean reads from the diploid samples (CaS419_1, CaS419_2, and CaS419_3) and the triploid samples (CaS4_1, CaS4_2, and CaS4_3), respectively (Table 2). The filtered reads were assembled de novo with Trinity. The assembled sequences were then clustered and further assembled with TGICL to remove redundancy, yielding the longest non-redundant unigene set, for which further statistics were computed.
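The link between the Q20 level and a 1% error probability follows from the Phred definition, Q = -10 log10(P). The sketch below illustrates this conversion and the fraction of bases at or above Q20. It assumes Phred+33 quality encoding, which the text does not state, and all function names are illustrative:

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into a base-call error probability."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string, ASSUMING Phred+33 (Sanger) encoding."""
    return [ord(char) - 33 for char in quality_string]


def fraction_at_or_above(scores: list[int], threshold: int = 20) -> float:
    """Fraction of base calls whose quality score is >= threshold."""
    if not scores:
        raise ValueError("no quality scores given")
    return sum(score >= threshold for score in scores) / len(scores)


print(phred_to_error_probability(20))  # error probability at the Q20 level
```

With real data, `fraction_at_or_above` would be applied to the decoded quality values of all bases in a library to obtain a Q20 percentage like the one reported above.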

Functional annotation and Cluster Analysis
Due to the lack of a complete genome sequence for Camellia sinensis, only 27,031 unigenes were co-annotated in all six databases (NR, NT, Swiss-Prot, COG, GO, and KEGG), accounting for 26.13% of the 103,448 unigenes. The NCBI NR and NT databases annotated the most unigenes, 90,547 and 89,933 (87.53% and 86.93% of all unigenes), respectively, while 35,298 (34.12%) and 61,318 (59.27%) unigenes were annotated in the COG and Swiss-Prot databases. A further 45,820 (44.29%) and 67,980 (65.71%) unigenes were annotated in the GO and KEGG databases (Fig. 5).
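The annotation rates above are each database's unigene count divided by the total of 103,448 unigenes. A short sketch that recomputes them from the counts given in the text (the dictionary layout and function name are illustrative); the output agrees with the reported percentages to within rounding of the last digit:

```python
TOTAL_UNIGENES = 103_448

# Unigene counts per database, as reported in the text.
annotated_counts = {
    "NR": 90_547,
    "NT": 89_933,
    "COG": 35_298,
    "Swiss-Prot": 61_318,
    "GO": 45_820,
    "KEGG": 67_980,
    "all six databases": 27_031,
}


def annotation_rate(count: int, total: int = TOTAL_UNIGENES) -> float:
    """Percentage of all unigenes that received an annotation."""
    return count / total * 100.0


for database, count in annotated_counts.items():
    print(f"{database:>18}: {count:>7,} unigenes ({annotation_rate(count):.2f}%)")
```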
The three main GO categories are biological process (BP), cellular component (CC), and molecular function (MF). Based on sequence homology, 45,820 unigenes were categorized into 55 functional groups (Fig. 6). In the BP category, cellular processes and metabolic processes were the two largest groups. Approximately 24,750 genes were annotated to the metabolic process category, which may allow the identification of novel genes involved in secondary metabolism pathways in the triploid. In the MF category, unigenes with binding and catalytic activity formed the largest groups. In the CC category, the three largest groups were cell, cell part, and membrane. To further evaluate the reliability of our transcriptome results and the effectiveness of our annotation process, we searched the annotated sequences for genes with COG classifications (Fig. 7). Among the 26 COG categories, the cluster for "General function prediction only" (9,286) was the largest group, followed by "Transcription" (5,108), "Posttranslational modification, protein turnover, chaperones" (4,438), and "Replication, recombination and repair" (4,369). The category "Extracellular structures" (6) was the smallest group.
To analyze the biological functions of the unigenes, we compared the annotated sequences against the KEGG database. In total, 67,980 annotated unigenes were assigned to 136 known pathways based on the KEGG BLAST analysis. The top 19 pathways with the largest numbers of unigenes are listed in Fig. 8. The majority of the unigenes (22,658; 31.95%) were involved in Global and overview maps pathways, followed by pathways in Carbohydrate metabolism (7,072 unigenes; 9.97%), Translation (6,418 unigenes; 9.05%), and Folding, sorting and degradation (4,408 unigenes; 6.22%).

Camellia sinensis
To confirm the results of the Solexa/Illumina sequencing, twelve unigenes were selected for quantitative RT-PCR (qRT-PCR) assays. The qRT-PCR analysis of ten up-regulated and two down-regulated growth-related DEGs confirmed the transcriptomic changes detected by RNA-seq (Fig. 9). Although the expression levels did not match exactly, the patterns of gene expression shown by qRT-PCR were consistent with the RNA-seq results. Thus, the qRT-PCR results validated the reliability of the RNA-seq data.

Analysis of key gene expression of stomatal development based on transcriptome results
To analyze the difference in stomatal development between diploid and triploid tea leaves, we identified 16 differentially expressed stomata-related genes (P < 0.005); differences in the expression of these genes lead to changes in triploid stomatal density and size. Nine genes belong to the negative regulatory family, whose members are key regulators of stomatal development in plants. Within this family, SDD1, SERK1/2 and EPF1/2 negatively regulate stomatal development, while EPFL9/STOMAGEN regulates it positively. Among them, the SDD1 and SERK1 genes were up-regulated and the EPF1 gene was down-regulated (Table 3). SERKs can interact with TMM in a non-ligand-dependent manner to form a multiprotein receptor complex and negatively regulate stomatal development through signal transduction. These genes take part in the biosynthesis and signal transduction of stomatal development through different biological processes. For example, during stomatal development, cysteine-rich secretory peptides belonging to the EPF/EPFL family act as ligands that interact with the corresponding receptors to transmit developmental signals, ensuring proper stomatal density and distribution. The key transcription factors involved in stomatal development in plants are bHLH-type proteins, among which SPCH, MUTE and FAMA play important regulatory roles. The expression of these three transcription factors did not differ significantly between diploid and triploid.
The COP and HIC genes are stomatal development factors that respond to light and CO2 signaling; the COP gene was down-regulated and HIC was up-regulated.

Identification and expression analysis of candidate genes involved in leaves development based on transcriptome results
The morphological structure of leaves seems simple, but the mechanisms regulating their development are very complicated. The final size of leaves is tightly controlled by environmental and genetic factors that must spatially and temporally coordinate cell expansion and cell cycle activity. In this study, we identified 28 putative genes associated with leaf development that belong to different pathways, including cell division, photosynthesis, transcription factors, and auxin synthesis, and that showed significant differential expression between diploid and triploid.
Transcripts involved in the light reaction phase of photosynthesis were differentially expressed, including genes encoding photosystem I (PSI), photosystem II (PSII), the cytochrome b6/f complex, and ATP synthase. The reaction center pigment of PSI is P700, and PsaA and PsaB are key genes for the synthesis of the P700 chlorophyll a apoproteins A1 and A2. The reaction center pigment of PSII is P680, and the PsbA and PsbE genes are involved in the synthesis of the P680 reaction center D1 protein. PetB is a key gene involved in the synthesis of the cytochrome b6/f complex. F1B encodes the β subunit of the F-type H+-transporting ATPase in ATP synthase. All these genes, which are functionally essential for carbon dioxide assimilation, were down-regulated in triploid leaves. Interestingly, in contrast to the genes of the light reaction phase, all of the key enzymes of the carbon reaction phase were up-regulated. Among them, the genes for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), phosphoenolpyruvate carboxylase (PPC) and malate dehydrogenase (MDH1) were significantly up-regulated (Table S1).
To explore the intracellular transcriptional activity of diploid and triploid plants, we analyzed the expression of genes involved in the regulation of RNA polymerases and transcription initiation factors. The results showed that RNA polymerase I, RNA polymerase II and transcription initiation factors were all up-regulated. The RPB2 and TFIIA1 genes, which regulate RNA polymerase II and transcription initiation, were significantly up-regulated, by about 2.5-fold compared with the diploid (Table 4). The CDK1 gene plays a key role in controlling the eukaryotic cell cycle by regulating the centrosome cycle and the initiation of mitosis, promoting the G2-M transition, and regulating G1 progression and the G1-S transition by binding to multiple interphase cyclins.
Cyclin-dependent kinase 7 (CDK7) is the catalytic subunit of the CDK-activating kinase (CAK) complex that regulates cell cycle progression. Relative to the diploid, CDK7 expression was up-regulated by 22.66% in triploid tea leaves (Table 5). The serine/threonine kinase gene BUB1, which is involved in cell cycle control and RNA polymerase II-mediated transcription, was also up-regulated in the triploid, to about twice its expression in the diploid.
Taken together, these results provide a framework for the regulatory network of leaf development response in diploid and triploid.

Analysis of molecular mechanism difference between diploid and triploid leaf development
Plant polyploid cells carry additional genetic material, and the ratio of nuclear genetic material to cell size, i.e., the karyoplasmic ratio, is usually fixed in eukaryotes [17,18]. Here, we mainly discuss the key growth regulators for which relationships.
Transcription factors are regulatory molecules that bind to either the promoter or the enhancer regions of a gene and up- or down-regulate its expression [22,23]. They form a complex system controlling cellular growth, differentiation, genetic responses to the environment, and organismal development and evolution [24]. In our results, GIF, GRF, and TCP transcription factors showed significant up- or down-regulation in the triploid, indicating that these transcription factors are involved in the growth and development of triploid leaves (Table 5). Studies have shown that when GIF is overexpressed in plants, leaves are larger and contain more cells [25,26].
The GIF1 protein is involved in the regulation of leaf growth: it interacts with members of the putative transcription factor family GROWTH-REGULATING FACTOR, namely GRF1, GRF2 and GRF5, to regulate cell proliferation [21,26,27]. A study by Horiguchi found that overexpression of GRF5 increased cell number, resulting in increased leaf area [26,28]. In these plants, the initial size and growth of the leaves did not change, but growth at later stages was faster and lasted longer, indicating that the GRF5 gene mainly acts in the late stage of leaf growth and development [28]. As transcriptional coactivators of GRF proteins, GIF proteins also contribute to leaf development. In Arabidopsis leaf development, GIF is essential for cell division [27,29]. TCP (TEOSINTE BRANCHED 1/CYCLOIDEA/PCF) is another family of transcription factors associated with leaf development. Studies have shown that Arabidopsis thaliana TCP genes play a role in leaf cell growth and division.
AtTCP20 is involved in cell division, cell expansion and growth coordination [30,31]. In the Arabidopsis jaw-D mutant, miR319 is overexpressed, which inhibits the expression of TCP2, TCP3, TCP4, TCP10 and TCP24 and produces large, crinkled leaves [32]. AtTCP seems to have antagonistic effects on cell growth and division [33]. Ectopic expression of AtTCP3 inhibits the formation of shoot apical meristems [34]. Expression of the TCP family genes TCP2, TCP3, TCP4, TCP14, TCP15, TCP20 and TCP24 was down-regulated in triploid tea, indicating that down-regulation of TCP genes promoted the division and growth of triploid leaf cells (Table 5).
Besides transcription factors, other regulators can affect the duration of cell division during leaf development. Plant hormones play an important role in the growth and development of plant organs [32,35]. In our analysis of tea gene expression, the brassinosteroid-6-oxidase gene BR6OX1 and the receptor protein gene BRI1 were up-regulated; the main physiological role of brassinosteroids (BRs) is to promote cell elongation and division [36]. BRs also affect the polar elongation of cells [37,38], indicating that BRs may affect the growth and development of the cell structure of triploid tea leaves. AVP1 regulates the transport of auxin, while ARF6 mediates the synthesis of auxin [39,40]. Overexpression of the AVP1 and ARF6 genes increases cell number and leaf size [26,31]. The AUXIN-REGULATED GENE INVOLVED IN ORGAN SIZE (ARGOS) gene is induced by auxin [41]. ARGOS positively regulates cell division and expansion in the leaf. Overexpression of ARGOS in Arabidopsis promoted leaf enlargement [41,42], while loss of ARGOS function decreased leaf size [41]. In maize, overexpression of Zea mays ARGOS1 (ZAR1) enhances leaf, stalk and ear size and grain yield through an increased cell number, and promotes drought-stress tolerance [43].
Genes that regulate the cell cycle can affect organ size and morphology. Cell division is controlled by cyclin-dependent kinases (CDKs). The CDKs that promote chromosome duplication in S phase and segregation at mitosis require binding of a cyclin and phosphorylation of the activation segment (T-loop) by a CDK-activating kinase (CAK) for full activity [44]. The CDK1 gene was up-regulated more than 6-fold in the triploid compared with the diploid (Table 4).
In this study, the triploid stomata were significantly larger than those of the diploid, and the palisade and spongy tissue cells of triploid leaves were more loosely arranged than those of diploid leaves, with larger intercellular spaces (Figs. 3 and 4). External carbon dioxide enters the leaf through the stomata and passes through the intercellular spaces to reach the surface of the chloroplast-containing mesophyll cells. The cell surface area exposed to the intercellular space is related to photosynthetic strength. Stomatal development accompanies leaf growth, and studying the relationship between stomatal morphology and photosynthetic function during leaf development contributes to a comprehensive understanding of the entire life process of plant leaves. In the diploid and triploid tea leaves studied here, the stomatal density of diploid leaves was significantly higher than that of triploid leaves, while the triploid stomata were significantly larger (Fig. 2). The number and size of the stomata on each leaf play an important role in leaf gas exchange. The reduced stomatal density is mainly due to the increase in leaf area. The larger triploid stomata indicate that the leaves absorb CO2 more effectively and have a stronger carbon-fixing capacity.
In the differential analysis of key regulatory genes of photosynthesis in diploid and triploid tea leaves, genes of the light reaction phase were down-regulated in triploid leaves, while key regulatory genes of the dark reaction phase were up-regulated (Table S1). This indicates that the larger stomata of triploid tea leaves and the increased gaps between the palisade tissue and the spongy tissue are more conducive to the absorption and utilization of CO2 in photosynthesis.
Plant stomata are small pores surrounded by a pair of guard cells. Stomatal development is generally regulated by a three-step transcriptional cascade of three structurally similar basic helix-loop-helix (bHLH) transcription factors: SPEECHLESS (SPCH), MUTE and FAMA [46]. In the diploid and triploid tea transcriptomes, SPCH and FAMA expression did not differ, and MUTE was up-regulated, but not significantly. This indicates that the differences in triploid stomatal development are not related to these three regulatory factors.
Many negative regulatory factors are involved in stomatal development in plants [47], such as the EPIDERMAL PATTERNING FACTORs (EPFs), the leucine-rich repeat receptor-like protein TMM, the subtilisin-like protease SDD1 and the ERECTA family (ERf) of receptor-like kinases. EPF1 [48] and EPF2 [49,50] have been identified as negative regulators of stomatal development. The EPFs, a family of secreted peptide signals, are proposed to compete for a putative cell-surface receptor believed to comprise the receptor-like protein TOO MANY MOUTHS (TMM) and a putative leucine-rich repeat receptor-like protein kinase [51,52]. Evidence suggests that receptor binding activates an intracellular mitogen-activated protein kinase cascade, which phosphorylates and destabilises a bHLH transcription factor required, early in leaf development, for cells to enter the stomatal lineage [53]. EPF2 is very similar to EPF1 in amino acid sequence and also inhibits stomatal development, which it achieves by blocking the production of meristemoid cells. EPF2 regulates the differentiation of protodermal cells into meristemoid mother cells (MMCs), whereas EPF1 regulates the orientation of the spacing divisions that generate satellite meristemoids. When EPF2 is overexpressed, the plant epidermis cannot form stomata, and stomatal and non-stomatal epidermal cells tend to cluster together [47]. This suggests that the gene can block the development of the initial stomatal lineage. Sugano's group [54] studied positive regulators of stomatal development in Arabidopsis thaliana and found that STOMAGEN competes with EPF1/EPF2 and can bind to the TMM receptor protein; it is the only member of this otherwise negative regulatory family that has a positive effect on stomatal development. Overexpression of the STOMAGEN gene increases the number of stomata, whereas its inhibition reduces the number of stomata [51].
SERKs can interact with TMM in a non-ligand-dependent manner to form a multiprotein receptor complex and negatively regulate stomatal development through signal transduction.
In the negative regulatory family, STOMATAL DENSITY AND DISTRIBUTION 1 (SDD1) negatively regulates stomatal development independently of other signaling pathways [55,56]. The sdd1 mutant has an increased stomatal density and forms stomatal clusters [57]. SDD1 is believed to proteolytically process certain negative signaling factors, such as EPF1 and EPF2. However, overexpression of either EPF1 or EPF2 in the sdd1 background reduced stomatal densities as in wild-type plants, suggesting that the function of these signaling peptides is independent of SDD1 [47-49]. It is possible that the negative signaling receptors (TMM and ERf) are modulated by SDD1 [47]. The SDD1 gene was significantly up-regulated in triploid tea leaves (Table 3). This indicates that SDD1 negatively regulates stomatal development in triploid tea leaves, which reduces their stomatal density (Fig. 2).
We currently know little about how environmental factors, such as CO2 and light, regulate stomatal development, although the transpiration rate of mature leaves correlates with stomatal development in developing leaves. The only gene products known to modulate stomatal development in response to elevated CO2 are the carbonic anhydrases [58] and the HIGH CARBON DIOXIDE (HIC) protein, which is believed to be involved in the biosynthesis of epicuticular waxes [59-61]. In Arabidopsis, the HIC gene negatively regulates stomatal development in response to changes in CO2 concentration. For example, when the CO2 concentration is increased, the leaf stomatal density of wild-type plants decreases, while that of hic-deficient mutants increases significantly. This indicates that the HIC gene is responsible for regulating the number of stomata in a high-CO2 environment.
In summary, plant stomatal development is closely regulated at multiple levels by a large number of genes as well as self-development programs and environmental signals.

Conclusions
In this study, we examined the morphological characteristics of diploid and triploid tea leaves and analyzed the molecular mechanisms underlying their differences by transcriptome analysis. A potential connection was observed between gene expression and morphological variation in diploid and triploid tea leaves. The leaves and stomata of triploid tea are larger than those of the diploid. Analysis of leaf microstructure showed that the xylem cells in the veins of triploid tea leaves are larger and more numerous than those of the diploid. Comparative transcriptome analyses demonstrated that genes involved in cell division and expansion were more highly expressed in the triploid than in the diploid, especially key enzymes and transcription factors that function at branch points in cell formation pathways, which might explain the differences in the morphological characteristics of tea leaves. The transcriptome data obtained in this study will improve our understanding of the molecular mechanisms of tea leaf and stomatal development and provide a basis for molecular breeding of high-quality, high-yield tea varieties.

Plant materials and growth conditions
The diploid tea tree 'QianMei 419' is a small-to-medium-leaf cultivar bred by the Tea Science Institute of Guizhou Academy of Agricultural Sciences. The triploid 'QianFu No. 4' was obtained by using 60Co γ-rays to mutagenize seeds of 'QianMei 419' strictly self-pollinated by the

RNA Extraction
Total RNA from diploid and triploid annual shoot leaves of tea trees was extracted and purified using an After the extracted total RNA was treated with DNase I, eukaryotic mRNA was enriched with oligo(dT) magnetic beads. Fragmentation reagent was then added to break the mRNA into short fragments, first-strand cDNA was synthesized with random hexamers using the fragmented mRNA as a template, and second-strand cDNA was synthesized by adding buffer, dNTPs and DNA polymerase I and then purified with a kit. After recovery, end repair, addition of an "A" base at the 3' end and ligation of the sequencing adapters, the resulting fragments were size-selected and amplified by PCR. The constructed library was quality-checked with an Agilent 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System and sequenced on the Illumina sequencing platform.

De novo assembly and functional annotation
Raw reads were filtered by removing adapter sequences, low-quality sequences, and sequences with an N base ratio of more than 10%. The filtered reads are referred to as clean reads and were used for de novo transcriptome assembly with Trinity (trinityrnaseq_r20140717; http://trinityrnaseq.sourceforge.net/), without digital normalization, using min_kmer_cov 3 and default values for all other parameters.
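One of the filtering criteria above, discarding reads in which more than 10% of the bases are N, can be sketched as follows. This is an illustration only, not the pipeline actually used: adapter removal and the low-quality criterion are not implemented because their thresholds are not given in the text, and the function names and example reads are hypothetical.

```python
def n_ratio(sequence: str) -> float:
    """Fraction of bases in a read that are ambiguous (N)."""
    return sequence.upper().count("N") / len(sequence)


def passes_n_filter(sequence: str, max_n_ratio: float = 0.10) -> bool:
    """Keep a read unless MORE than `max_n_ratio` of its bases are N.

    Empty reads are rejected so that n_ratio never divides by zero.
    """
    if not sequence:
        return False
    return n_ratio(sequence) <= max_n_ratio


# Hypothetical reads, for illustration only.
reads = ["ACGTACGTAC", "ACGTNACGTA", "ACGTNNACGT", "NNNNNNNNNN"]
clean_reads = [read for read in reads if passes_n_filter(read)]
print(clean_reads)
```

A read with exactly 10% N bases is kept here, since the text states the cut-off as "more than 10%".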
The sequences obtained by Trinity assembly are called transcripts. TGICL (v2.1) was then used to remove redundancy and further assemble the transcripts into the final unigenes. The unigenes produced by TGICL fall into two classes. Clusters contain several unigenes with high similarity (greater than 70%) and have identifiers starting with CL followed by the number of the gene family. The rest are singletons (identifiers starting with unigene), each representing a single unigene. These unigenes were annotated using BLASTx alignment (E-value ≤ 10⁻⁵) to the following databases: National Centre for

GO and KEGG pathway enrichment analyses for differentially expressed unigenes
After the GO annotation of each unigene was obtained, the WEGO software was used to perform GO functional classification for all unigenes and to characterize the overall distribution of gene functions in the species. GO and KEGG pathway enrichment analyses for the differentially expressed unigenes were then carried out. The GO annotation was enriched and refined using the topGO package (v2.16.0). The biologically complex behavior of genes can be further studied with KEGG, and the pathway annotation of each unigene was obtained from the KEGG annotation information. Read counts were normalized by calculating reads per kilobase per million mapped reads (RPKM) for each transcript in each tissue, and the relative expression of each gene was calculated as log2(YH29/WH10). A P-value cut-off of ≤ 0.05 together with at least a two-fold change was used to identify significantly differentially expressed transcripts.
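The normalization and cut-offs described above can be sketched as follows, using the standard RPKM definition and the stated thresholds (P ≤ 0.05 and at least a two-fold change, i.e. |log2 ratio| ≥ 1). The read counts, transcript length and library sizes are hypothetical, taking the triploid as the numerator of the ratio is an assumption, and zero expression values are not handled:

```python
import math


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log2_fold_change(treatment: float, control: float) -> float:
    """log2 ratio of two expression values (both must be positive)."""
    return math.log2(treatment / control)


def is_differentially_expressed(log2_fc: float, p_value: float,
                                min_abs_log2_fc: float = 1.0,
                                max_p_value: float = 0.05) -> bool:
    """Apply the two cut-offs: at least two-fold change and P <= 0.05."""
    return abs(log2_fc) >= min_abs_log2_fc and p_value <= max_p_value


# Hypothetical unigene: 2,000 bp long, 40 million mapped reads per library.
diploid_rpkm = rpkm(500, 2_000, 40_000_000)
triploid_rpkm = rpkm(2_000, 2_000, 40_000_000)
fold = log2_fold_change(triploid_rpkm, diploid_rpkm)
print(diploid_rpkm, triploid_rpkm, fold, is_differentially_expressed(fold, 0.01))
```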

Statistical analysis
All data were expressed as mean ± standard error. For qRT-PCR, three biological replicates were assessed. Microsoft Excel and GraphPad Prism 5.0 software were used for data analysis. One-way ANOVA with Duncan's multiple range test was used for post hoc comparison of multiple variables. A significant difference relative to the control was recognized at *P < 0.05 or **P < 0.01.
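As an illustration of the "mean ± standard error" summary, the sketch below computes both quantities for three replicate values with Python's standard library. The replicate values are hypothetical, and the sketch is not a substitute for the Excel and GraphPad Prism analysis described above:

```python
import math
import statistics


def mean_and_standard_error(values: list[float]) -> tuple[float, float]:
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


# Hypothetical relative expression values from three biological replicates.
replicates = [1.0, 2.0, 3.0]
mean, standard_error = mean_and_standard_error(replicates)
print(f"{mean:.2f} ± {standard_error:.2f}")
```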

Consent for publication
Not applicable.

Availability of data and material
The datasets analyzed during the current study are available from the corresponding author on reasonable request. All data generated or analyzed during this study are included in this published article [and its Additional files].

Competing interests
The authors declare that they have no competing interests.

Figure 1 Morphological characteristics of diploid and triploid tea leaves.

Unigene KEGG annotation function distribution statistics and differential expression gene Pathway enrichment analysis

Figure 9 Expression pattern of genes related with growth and development of diploid and triploid tea leaves by qRT-PCR