De novo characterization of the Camellia sinensis transcriptome and comprehensive analysis of the diploid and triploid leaf morphology and physiological differences

Background Polyploidization brings about a series of significant changes in the morphology and physiology of tea plants, especially increased growth rate and genetic gains. Result In this study, we found that the leaves of triploid tea had obvious growth advantages over diploid tea leaves, with a leaf area 59.81% larger than that of diploid leaves. The morphological structure of the triploid leaves also changed markedly: the xylem of the veins was more developed, the intercellular gaps in the palisade and spongy tissues became larger, and the stomata were enlarged. After polyploidization, the content of secondary metabolites in tea leaves also changed significantly. Transcriptome sequencing analysis showed that, after triploidization, the changes in leaf morphology and physiological characteristics were affected by the specific expression of some key regulatory genes. We identified a large number of transcripts and genes that might play important roles in leaf development, especially those involved in cell division, photosynthesis, hormone synthesis, and stomatal development. Conclusion This study will improve our understanding of the molecular mechanisms of tea leaf and stomatal development and provide a basis for molecular breeding of high-quality, high-yield tea varieties. Furthermore, it provides information that may enhance understanding of triploid physiology.
NR: NCBI non-redundant protein database; NT: NCBI nucleotide database; COG: Clusters of Orthologous Groups; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; BP: biological process; CC: cellular component; MF: molecular function; RIN: RNA integrity number; SDD1: STOMATAL DENSITY AND DISTRIBUTION 1; SERK: somatic embryogenesis receptor kinase; EPF: EPIDERMAL PATTERNING FACTORs; TMM: TOO MANY MOUTHS; bHLH: basic helix-loop-helix; SPCH: SPEECHLESS; FAMA: bHLH family transcription factor; MUTE: bHLH family transcription factor; COP: Constitutive Photomorphogenesis; HIC: HIGH CARBON DIOXIDE; PS I: photosystem I; PS II: photosystem II; ATPase: adenosine triphosphatase; PPC: phosphoenolpyruvate carboxylase; MDH1: malate dehydrogenase; RPB2: DNA-directed RNA polymerases I; TFIIA1: transcription initiation factor; CDK: cyclin-dependent kinase; CAK: CDK-activating kinase; BUB1: serine/threonine kinase; GIF: GRF1-interacting factor 1; GRF: GROWTH-REGULATING FACTOR; TCP: TEOSINTE BRANCHED 1/CYCLOIDEA/PCF; BR6OX1: brassinosteroid-6-oxidase; BRI1: BRassinosteroid Insensitive 1; ARF: auxin response factor; ARGOS: AUXIN-REGULATED GENE INVOLVED IN ORGAN SIZE

mitosis and biomass storage of hexaploid wheat are more rapid than in tetraploid wheat [13]. The doubling of chromosomes first causes a series of changes in the plant genome, which in turn affect the growth and development of the plant [14]. Whole-genome duplication in polyploid plants causes changes in genomic structure, resulting in re-regulation of gene expression and changes in gene expression levels.
A large body of research has shown that plant polyploidization causes changes in genome interactions, DNA sequences and gene expression. This change is called genome shock [15]. Kashkush used cDNA-AFLP technology to determine the expression levels of genes in allopolyploids and natural polyploids of Arabidopsis, cotton and wheat [16]. Studies have shown that gene expression levels in plants of different ploidy vary with the growth and development of plant organs.
Current research indicates that polyploidization can indeed alter the physiological activities and developmental processes of plants. However, the reported results differ from study to study. As research deepens, the general pattern by which polyploidization affects plant growth and physiological metabolism will gradually become clear. Here, we discuss how changes in chromosome ploidy can affect final leaf size and describe the genes and networks involved in each of these processes. Studying the molecular mechanism of growth and development in diploid and triploid tea leaves can reveal the molecular-level changes that occur during leaf growth and development and the regulation of key genes, and can enrich the theoretical basis of the molecular biology of plant organs. In addition, it can provide a theoretical basis for molecular breeding, for breeding excellent new varieties and for solving practical problems in production.

Morphological characteristics of diploid and triploid tea leaves
The leaf phenotypes of diploid and triploid annual shoots were determined under field growth conditions. As shown in Fig. 1, the leaves of triploid tea trees have obvious growth advantages over diploid tea leaves: leaf length, leaf width and leaf area were 23.37%, 41.12% and 59.81% greater than in the diploid, respectively, and the differences were extremely significant. However, diploid leaves were significantly thicker than triploid leaves (295.63 μm and 252.33 μm, respectively), and this difference was also extremely significant. There was no significant difference in petiole length between triploid and diploid tea (0.7 cm and 0.667 cm, respectively). The petiole diameter of the triploid was significantly larger than that of the diploid (2.31 mm versus 2.05 mm). The dry weight of triploid leaves was 39.29% higher than that of diploid leaves, and the difference was extremely significant (Table 1).
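The relative differences quoted above are plain percent changes with respect to the diploid value. As a minimal sketch, the calculation can be reproduced for the two traits whose absolute values are given in the text (leaf thickness and petiole length); the function name `percent_change` is illustrative and not part of the original analysis.

```python
def percent_change(reference: float, value: float) -> float:
    """Return the change of `value` relative to `reference`, in percent."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (value - reference) / reference * 100.0


# Values reported in the text, ordered as (diploid, triploid).
leaf_thickness_um = (295.63, 252.33)
petiole_length_cm = (0.667, 0.7)

thickness_change = percent_change(*leaf_thickness_um)
petiole_change = percent_change(*petiole_length_cm)

print(f"Leaf thickness, triploid vs diploid: {thickness_change:+.2f}%")
print(f"Petiole length, triploid vs diploid: {petiole_change:+.2f}%")
```

A negative result means the triploid value is smaller than the diploid value, as is the case for leaf thickness.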

Analysis of the number and size of stomata in diploid and triploid leaves
To investigate differences in the stomata on the epidermis of diploid and triploid tea leaves, we performed a statistical analysis of stomatal size and density in the same region of the leaves. The results showed that the stomata of triploid leaves were significantly larger than those of the diploid: length and width increased by 57.20% and 84.44%, respectively. However, the stomatal density of triploid leaves was significantly lower than that of diploid leaves; the stomatal density of diploid tea leaves was twice that of the triploid (Fig. 2).

Paraffin section analysis of diploid and triploid leaves
To understand the differences in growth and development between diploid and triploid leaves, we observed paraffin sections. As shown in Fig. 3, the leaf vein changes markedly after triploidization, and the change is most obvious in the xylem. The xylem of the triploid vein is more developed than that of the diploid, and its area is also larger: the area of the triploid xylem is three times that of the diploid, at 0.476 mm². The increase in triploid xylem area has two causes: one is an increase in cell area, and the other is an increase in the number of xylem cells. The number of xylem cell layers averaged 19 in the diploid and 25 in the triploid, an increase of 30.67% in the triploid. There were no significant changes in the size of the veins, the phloem or the cambium between diploid and triploid.
As shown in Fig. 4, the shape and size of diploid and triploid mesophyll cells differ significantly, and the epidermal cells of the triploid are 22.28% thicker than those of the diploid, an extremely significant difference. The mesophyll cells of diploid tea leaves are closely arranged and relatively uniform in shape and size. The mesophyll cells of triploid leaves are loosely arranged, with large intercellular gaps and variable cell shape and size. The diploid palisade tissue cells are longer on average than those of the triploid, at 65 μm, which is 15.65% longer than in the triploid. The triploid palisade tissue cells were significantly wider, by 70%, than those of the diploid. Diploid spongy tissue cells are small and dense, and about twice as numerous as those of the triploid.

Illumina sequencing and reads assembly
To investigate the molecular mechanisms of diploid and triploid leaf growth, and to understand the metabolic processes involved in leaf growth and development, we analyzed the gene expression profiles of diploid and triploid leaves. Six cDNA libraries (CaS419_1, CaS419_2, CaS419_3, CaS4_1, CaS4_2, and CaS4_3) were generated from diploid and triploid mRNAs and sequenced using the Illumina HiSeq™ 2000 deep-sequencing platform. De novo transcriptome sequencing yielded an average of about 40 million raw reads per sample, of which more than 99% were high-quality reads.
The raw sequencing data were filtered, the sequencing error rate and GC content distribution were checked, and the GC content analysis was used to detect any A/T or G/C separation. This yielded clean reads from six samples for subsequent analysis; the filtered data are summarized in Table 2. Among all the raw reads, 96% had Phred-like quality scores at the Q20 level (an error probability of 1%). After removing adapters, low-quality sequences and ambiguous reads, we obtained approximately 45 million, 43 million, 60 million, 58 million, 62 million and 48 million clean reads from the diploid samples (CaS419_1, CaS419_2, and CaS419_3) and the triploid samples (CaS4_1, CaS4_2, and CaS4_3), respectively (Table 2). The filtered reads were assembled with the de novo assembly software Trinity. The assembled sequences were then clustered and spliced with TGICL to remove redundancy and obtain the longest non-redundant unigene set, on which further statistics were computed.
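The statement that Q20 corresponds to an error probability of 1% follows from the standard Phred definition, Q = -10·log10(p). A minimal sketch of the conversion in both directions is shown here; the function names are illustrative and are not taken from any sequencing toolkit.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into a base-call error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability into a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(f"Q{q}: error probability = {phred_to_error_probability(q):.4f}")
```

Under this definition, each increase of 10 in the quality score lowers the expected error rate tenfold.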

Functional annotation and Cluster Analysis
Due to the lack of a complete genome sequence for Camellia sinensis, only 27,031 unigenes were co-annotated in all six databases (NR, NT, Swiss-Prot, COG, GO, and KEGG), accounting for 26.13% of the 103,448 unigenes. The largest numbers of unigenes were annotated in the NCBI NR and NT databases, with 90,547 and 89,933 unigenes (87.53% and 86.93% of all unigenes), respectively, while 35,298 (34.12%) and 61,318 (59.27%) unigenes could be annotated in the COG and Swiss-Prot databases, respectively. We annotated 45,820 (44.29%) and 67,980 (65.71%) unigenes in the GO and KEGG databases, respectively (Fig. 5).
The main GO categories were biological process (BP), cellular component (CC), and molecular function (MF). Based on sequence homology, 45,820 unigenes were categorized into 55 functional groups (Fig. 6). In the BP category, the two major groups, cellular processes and metabolic processes, accounted for the highest proportion. Approximately 24,750 genes were annotated to metabolic process categories, which may allow the identification of novel genes involved in secondary metabolism pathways in the triploid. In the MF category, unigenes with binding and catalytic activity formed the largest groups. For CC, the three largest categories were cell, cell part, and membrane. To further evaluate the reliability of our transcriptome results and the effectiveness of our annotation process, we searched the annotated sequences for genes with COG classifications (Fig. 7). Among the 26 COG categories, the cluster "General function prediction only" (9,286) was the largest group, followed by "Transcription" (5,108), "Posttranslational modification, protein turnover, chaperones" (4,438), and "Replication, recombination and repair" (4,369). The category "Extracellular structures" (6) was the smallest group.
To analyze the biological functions of the unigenes, we compared the annotated sequences against the KEGG database. In total, 67,980 annotated unigenes were assigned to 136 known pathways based on the KEGG BLAST analysis. The top 19 pathways with the largest numbers of unigenes are listed in Fig. 8. The largest share of the unigenes (22,658; 31.95%) was involved in Global and overview maps pathways, followed by Carbohydrate metabolism (7,072 unigenes; 9.97%), Translation (6,418 unigenes; 9.05%), and Folding, sorting and degradation (4,408 unigenes; 6.22%).
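The annotation percentages above are each database's unigene count divided by the total of 103,448 unigenes. A minimal sketch of that bookkeeping, using only the counts quoted in the text, is given below; the dictionary layout and the name `annotation_rate` are illustrative choices, not part of the original pipeline.

```python
TOTAL_UNIGENES = 103_448

# Annotated unigene counts as quoted in the text.
annotated = {
    "NR": 90_547,
    "NT": 89_933,
    "Swiss-Prot": 61_318,
    "COG": 35_298,
    "GO": 45_820,
    "KEGG": 67_980,
    "all six databases": 27_031,
}


def annotation_rate(count: int, total: int = TOTAL_UNIGENES) -> float:
    """Return the share of unigenes annotated in a database, in percent."""
    return count / total * 100.0


rates = {db: annotation_rate(n) for db, n in annotated.items()}

for db, rate in rates.items():
    print(f"{db:>18}: {annotated[db]:>7,} unigenes ({rate:.2f}%)")
```

The recomputed rates agree with the reported percentages to within rounding in the second decimal place.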

Differentially Expressed Genes and qRT-PCR Validation Between Diploid and Triploid Camellia sinensis
To confirm the results of the Solexa/Illumina sequencing, twelve unigenes were selected for quantitative RT-PCR (qRT-PCR) assays. The qRT-PCR analysis of ten up-regulated and two down-regulated growth-related differentially expressed genes (DEGs) confirmed the transcriptomic changes detected by RNA-seq (Fig. 9). Although the expression levels did not match exactly, the qRT-PCR analysis showed that the patterns of gene expression were consistent with the RNA-seq results. Thus, the qRT-PCR results validated the reliability of the RNA-seq data.

Analysis of key gene expression of stomatal development based on transcriptome results
To analyze the differences in stomatal development between the diploid and triploid leaves of tea plants, we identified 16 differentially expressed stomata-related genes (P < 0.005); differences in the expression of these genes lead to changes in triploid stomatal density and size. Nine genes belong to the negative regulatory factors, which are key regulators of stomatal development in plants. In the negative regulatory family, SDD1, SERK1, SERK2, EPF1 and EPF2 have a negative regulatory effect on stomatal development, while EPFL9/STOMAGEN has a positive regulatory effect. Among them, the SDD1 and SERK1 genes were up-regulated and the EPF1 gene was down-regulated (Table 3). SERKs can interact with TMM in a ligand-independent manner to form a multiprotein receptor complex and negatively regulate stomatal development through signal transduction. These genes take part in the biosynthesis and signal transduction underlying stomatal development through different biological processes. For example, during stomatal development, cysteine-rich secretory peptides belonging to the EPF/EPFL family act as ligands that interact with the corresponding receptors to transmit developmental signals, ensuring proper stomatal density and distribution. The key transcription factors involved in stomatal development in plants are bHLH-type proteins, among which SPCH, FAMA and MUTE play important regulatory roles. There was no significant difference in the expression of these three transcription factors between diploid and triploid.
The COP and HIC genes are stomatal development factors that respond to light and CO2 signaling; the COP gene was down-regulated and HIC was up-regulated.

Identification and expression analysis of candidate genes involved in leaf development based on transcriptome results
The morphological structure of the leaf seems simple, but the mechanism regulating its development is very complicated. The final size of leaves is tightly controlled by environmental and genetic factors that must coordinate cell expansion and cell cycle activity in space and time. In this study, we identified 28 putative genes associated with leaf development that belong to different pathways, including cell division, photosynthesis, transcription factors, and auxin synthesis, and that showed significant differential expression between diploid and triploid.
Transcripts involved in the light-reaction phase of photosynthesis were differentially expressed, including genes encoding photosystem I (PS I), photosystem II (PS II), the cytochrome b6/f complex, and ATP synthase. The reaction center of PS I is the pigment molecule P700, and PsaA and PsaB are key genes regulating the synthesis of the P700 chlorophyll a apoproteins A1 and A2. The reaction center pigment of PS II is P680, and the PsbA and PsbE genes are involved in the synthesis of the P680 reaction center D1 protein. PetB is a key gene involved in the synthesis of the cytochrome b6/f complex. F1B encodes the F-type H+-transporting ATPase subunit β of ATP synthase. All of these genes, which are functionally essential for carbon dioxide assimilation, were down-regulated in triploid leaves. Interestingly, in contrast to the genes of the light-reaction phase, all of the key enzymes of the carbon-reaction phase were up-regulated. Among them, the genes for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), phosphoenolpyruvate carboxylase (PPC) and malate dehydrogenase (MDH1) were significantly up-regulated (Table S1).
To explore the intracellular transcriptional activity of diploid and triploid plants, we analyzed the expression of genes involved in the regulation of RNA polymerases and transcription initiation factors. The results showed that RNA polymerase I, RNA polymerase II and transcription initiation factor genes were all up-regulated. The RPB2 and TFIIA1 genes, which regulate RNA polymerase II and the transcription initiation factor, respectively, were significantly up-regulated, by about 2.5-fold compared to the diploid (Table 4).
The expression of cell cycle-associated genes that regulate cell division can alter the organ volume of a plant through an increase in cell number and an expansion of cell volume in time and space. Cyclin-dependent kinase (CDK), also known as the cell cycle engine, plays a central role in the regulation of the cell cycle. The cyclin-dependent kinase 1 (CDK1) gene was up-regulated 6-fold compared to the diploid (Table 4). The CDK1 gene plays a key role in controlling the eukaryotic cell cycle by regulating the centrosome cycle and the initiation of mitosis, promoting the G2-M transition, and regulating G1 progression and the G1-S transition by binding to multiple interphase cyclins. Cyclin-dependent kinase 7 (CDK7) is the catalytic subunit of the CDK-activating kinase (CAK) complex that regulates cell cycle progression. Relative to the diploid, expression of the CDK7 gene was up-regulated by 22.66% in triploid tea leaves (Table 5). The serine/threonine kinase BUB1 gene, which is involved in cell cycle control and RNA polymerase II-mediated transcription, was also up-regulated in the triploid, to about twice its expression in the diploid.
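The expression changes above are quoted in mixed units (a fold change, a percent increase, and "about twice"). A minimal sketch of how they can be placed on a common log2 fold-change scale is given here. It assumes that "6-fold" and "about twice" denote triploid/diploid expression ratios of 6 and 2, which the text does not state explicitly; the helper names are illustrative.

```python
import math


def percent_increase_to_fold(percent: float) -> float:
    """Convert a percent increase (e.g. 22.66) into a ratio (e.g. 1.2266)."""
    return 1.0 + percent / 100.0


def log2_fold_change(fold: float) -> float:
    """Return log2 of a triploid/diploid expression ratio."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold)


# Ratios assumed from the wording in the text (triploid relative to diploid).
reported_fold = {
    "CDK1": 6.0,                              # "up-regulated 6-fold"
    "CDK7": percent_increase_to_fold(22.66),  # "up-regulated by 22.66%"
    "BUB1": 2.0,                              # "about twice"
}

for gene, fold in reported_fold.items():
    print(f"{gene}: {fold:.2f}-fold, log2FC = {log2_fold_change(fold):.2f}")
```

On this scale a value of 1 means a doubling of expression, which makes the modest CDK7 change easy to compare with the much larger CDK1 change.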
Taken together, these results provide a framework for the regulatory network of leaf development response in diploid and triploid.

Discussion
Analysis of molecular mechanism difference between diploid and triploid leaf development
Plant polyploid cells carry additional genomic material, and the ratio of nuclear genetic material to cell size, i.e., the karyoplasmic ratio, is usually fixed in eukaryotes [17,18], so an increase of genetic material in the nucleus usually causes a change in cell size. After polyploidization, plant tissues and organs are correspondingly larger, and leaf growth indices, pollen size, seed size and stem diameter increase accordingly. This characteristic of polyploids is called gigantism. The gigantism of polyploid organs leads to growth advantages. Under the same growth conditions, the height and diameter at breast height of the hybrid triploid Populus tomentosa B301 cultivated by Zhu were significantly greater than those of the diploid [19]. The leaves of the homologous triploid tea tree 'QianFu 4' selected in this study were significantly larger than the diploid tea leaves. The enlargement of plant organs is generally caused by an increase in cell number or an expansion of cell volume. Previous studies have found that an increase in plant cell ploidy does contribute to an increase in cell volume [8,9].
In paraffin sections of diploid and triploid tea leaves, we found that the xylem cells in the leaf veins of triploid tea trees are larger and more numerous than those of diploids. The shape and size of the palisade and spongy tissue cells in the same part of the leaf also changed significantly in the triploid: the cells of the triploid palisade and spongy tissues became bigger, and the intercellular gaps expanded. The final size of a plant leaf is determined by cell division, differentiation, and expansion. The number of cells produced during the cell division phase of leaf development determines the final leaf size, and in most cases there is a direct correlation between cell number and organ size [20]. Therefore, the duration of cell division has a major impact on the size of the final leaf. A number of factors have been described that regulate the developmental transition between cell division and cell expansion [21]. Here, we mainly discuss the key growth regulators.
Transcription factors are regulatory molecules of gene expression that bind to either the promoter or enhancer regions of a gene and up- or down-regulate its expression [22,23]. They form a complex system controlling cellular growth, differentiation, genetic responses to the environment, and organismal development and evolution [24]. Based on our results, the GIF, GRF, and TCP transcription factors showed significant up- or down-regulation in the triploid, indicating that these transcription factors are involved in the growth and development of triploid leaves (Table 5). Studies have shown that when GIF is overexpressed in plants, the leaves are larger and contain more cells [25,26]. The GIF1 protein regulates leaf growth by interacting with members of the putative transcription factor family GROWTH-REGULATING FACTOR, namely GRF1, GRF2 and GRF5, to regulate cell proliferation [21,26,27]. A study by Horiguchi found that overexpression of GRF5 increased cell number, resulting in increased leaf area [26,28]. In these plants, the initial size and growth of the leaves did not change, but growth at later stages was faster and lasted longer, indicating that the GRF5 gene acts mainly in the late stage of leaf growth and development [28]. As transcriptional coactivators of GRF proteins, GIF proteins also contribute to leaf development. In Arabidopsis leaf development, GIF is essential for cell division [27,29]. TCP (TEOSINTE BRANCHED 1/CYCLOIDEA/PCFs) is another family of transcription factors associated with leaf development. Studies have shown that Arabidopsis thaliana TCP genes play a role in leaf cell growth and division. AtTCP20 is involved in cell division, cell expansion and growth coordination [30,31].
In the Arabidopsis jaw-D mutant, miR319 is overexpressed, inhibiting the expression of TCP2, TCP3, TCP4, TCP10 and TCP24 and producing large, crinkled leaves [32]. AtTCP seems to have antagonistic effects on cell growth and division [33]. Ectopic expression of AtTCP3 inhibits the formation of shoot tip meristems [34]. In the tea triploid, the TCP family genes TCP2, TCP3, TCP4, TCP14, TCP15, TCP20 and TCP24 were down-regulated, indicating that down-regulation of TCP genes promoted the division and growth of triploid leaf cells (Table 5).
Besides transcription factors, other regulators can affect the duration of cell division during leaf development. Plant hormones play an important role in the growth and development of plant organs [32,35]. In the analysis of tea gene expression, the brassinosteroid-6-oxidase gene BR6OX1 and the receptor protein gene BRI1 were up-regulated. The main physiological role of brassinosteroids (BRs) is to promote cell elongation and division [36], and BRs also affect the polar elongation of cells [37,38], indicating that BRs may affect the growth and development of the cell structure of triploid tea leaves. AVP1 regulates the transport of auxin, while ARF6 mediates the synthesis of auxin [39,40]. Overexpression of the AVP1 and ARF6 genes increases cell number and leaf size [26,31]. Expression of the AUXIN-REGULATED GENE INVOLVED IN ORGAN SIZE (ARGOS) gene is induced by auxin [41]. The ARGOS gene positively regulates cell division and expansion in the leaf. Overexpression of ARGOS in Arabidopsis promoted leaf enlargement [41,42], while loss of ARGOS function caused leaf size to decrease [41]. In maize, overexpression of Zea mays ARGOS1 (ZAR1) enhances leaf, stalk and ear size and grain yield through an increased cell number, and promotes drought-stress tolerance [43].
Genes that regulate the cell cycle can affect organ size and morphology. Cell division is controlled by cyclin-dependent kinases (CDKs). The CDKs that promote chromosome duplication in S phase and segregation at mitosis require binding of a cyclin and phosphorylation of the activation segment (T-loop) by a CDK-activating kinase (CAK) for full activity [44]. The cyclin-dependent kinase CDK1 gene was up-regulated more than 6-fold compared to the diploid (Table 4). The CDK1 gene plays a key role in controlling the eukaryotic cell cycle by regulating the centrosome cycle and the initiation of mitosis. Both CDK1 and CDK2 require cyclin binding and T-loop phosphorylation for full activity. The only known CAK in metazoans is CDK7, which is also part of the transcription machinery.

Molecular analysis of the difference in stomatal formation between diploid and triploid leaves
The stomata in the epidermis of plant leaves serve as the main channel through which CO2 and H2O enter and exit the leaves [45]. They are generally composed of two guard cells and the pore between them. The size of the pores and their degree of opening and closing directly determine the transpiration and photosynthesis of plants. In this study, we found that triploid stomata were significantly larger than diploid stomata, and that the palisade and spongy tissue cells of triploid leaves were more loosely arranged than those of diploid tea leaves, with large intercellular gaps (Figs. 3 and 4). External carbon dioxide enters the leaf through the pores and passes through the intercellular spaces to reach the surface of the chloroplast-containing mesophyll cells. The cell surface area exposed to the intercellular space is related to photosynthetic capacity. Stomatal development accompanies leaf growth, and understanding the relationship between stomatal morphology and photosynthetic function during leaf development contributes to a comprehensive understanding of the entire life process of plant leaves. In the study of the stomata of diploid and triploid tea leaves, the diploid leaves had a significantly higher stomatal density than the triploid, while the triploid stomata were significantly larger than those of the diploid (Fig. 2). The number and size of stomata on each leaf play an important role in leaf gas exchange. The reduction in stomatal density is mainly due to the increase in leaf area. The enlargement of the triploid stomata indicates that the leaves absorb CO2 more effectively and have a stronger carbon-fixing capacity.
In the differential analysis of key regulatory genes of photosynthesis in diploid and triploid tea leaves, genes of the light-reaction phase were down-regulated in triploid tea leaves, while the key regulatory genes of the dark-reaction stage were up-regulated (Table S1). This suggests that the larger stomata of triploid tea leaves and the larger gaps in the palisade and spongy tissues are more conducive to the absorption and utilization of CO2 in photosynthesis.
Plant stomata are small pores surrounded by a pair of guard cells. Stomatal development is generally regulated by a three-step transcriptional cascade of three structurally similar bHLH (basic helix-loop-helix) transcription factors, SPCH (SPEECHLESS), MUTE and FAMA [46]. In the diploid and triploid transcriptome results for tea, there was no difference in the expression of the SPCH and FAMA genes, and although MUTE was up-regulated, the change was not significant. This indicates that the differences in triploid stomatal development are not related to these three regulatory factors.
Many negative regulatory factors are involved in stomatal development in plants [47], such as the epidermal patterning factor EPF, the leucine-rich receptor-like protein TMM, the subtilisin-like protease SDD1 and the ERECTA family (ERf) of receptor-like kinases. EPF1 [48] and EPF2 [49,50] have been identified as negative regulators of stomatal development. A family of secreted peptide signals known as the EPIDERMAL PATTERNING FACTORs (EPFs) is proposed to compete for a putative cell surface receptor, believed to comprise the receptor-like protein TOO MANY MOUTHS (TMM) and a putative leucine-rich repeat receptor-like protein kinase [51,52]. Evidence suggests that receptor binding activates an intracellular mitogen-activated protein kinase cascade, which phosphorylates and destabilises a bHLH transcription factor that is required, early in leaf development, for cells to enter the stomatal lineage [53]. EPF2 is very similar to EPF1 in amino acid sequence and can also inhibit stomatal development, which it achieves by blocking the production of meristemoid cells. EPF2 regulates the differentiation of protodermal cells into meristemoid mother cells (MMCs), whereas EPF1 regulates the orientation of the spacing division that generates satellite meristemoids. When EPF2 is overexpressed, the plant epidermis cannot form stomata, and stomatal and non-stomatal epidermal cells tend to cluster together [47]. This suggests that the gene can block the initiation of the stomatal lineage. The Sugano group [54] studied the regulatory factors that favor stomatal development in Arabidopsis thaliana and found that STOMAGEN and EPF1/EPF2 compete with each other and that STOMAGEN is able to bind to the TMM receptor protein; it is the only member of this otherwise negative regulatory family that has a positive effect on stomatal development. Overexpression of the STOMAGEN gene increases the number of stomata. Conversely, inhibition of the STOMAGEN gene reduces the number of stomata [51].
SERKs can interact with TMM in a ligand-independent manner to form a multiprotein receptor complex and negatively regulate stomatal development through signal transduction.
In the negative regulatory family, STOMATAL DENSITY AND DISTRIBUTION 1 (SDD1) negatively regulates stomatal development independently of other signaling pathways [55,56]. The sdd1 mutant has increased stomatal density and forms stomatal clusters [57]. SDD1 is believed to proteolytically process certain negative signaling factors, such as EPF1 and EPF2. However, overexpression of either EPF1 or EPF2 in the sdd1 background reduced stomatal density as in wild-type plants, suggesting that the function of these signaling peptides is independent of SDD1 [47-49]. It is possible that the negative signaling receptors (TMM and ERf) are modulated by SDD1 [47]. The SDD1 gene was significantly up-regulated in triploid tea leaves (Table 3). This indicates that SDD1 negatively regulates stomatal development in triploid tea leaves, which reduces their stomatal density (Fig. 2).
We currently know little about how environmental factors, such as CO2 and light, regulate stomatal development, although the transpiration rate of mature leaves correlates with stomatal development in developing leaves. The only gene products known to modulate stomatal development in response to elevated CO2 are the carbonic anhydrases [58] and the HIGH CARBON DIOXIDE (HIC) protein, which is believed to be involved in the biosynthesis of epicuticular waxes [59-61]. In Arabidopsis, the HIC gene negatively regulates stomatal development in response to changes in CO2 concentration. For example, when the CO2 concentration is increased, the leaf stomatal density of wild-type plants decreases, while that of hic-deficient mutants increases significantly. This indicates that the HIC gene is responsible for regulating the number of stomata in a high-CO2 environment.
In summary, plant stomatal development is closely regulated at multiple levels by a large number of genes as well as self-development programs and environmental signals.

Conclusions
In this study, we examined the morphological characteristics of diploid and triploid tea leaves and analyzed the molecular mechanisms underlying their differences by transcriptome analysis. A potential connection was observed between gene expression and morphological variation in diploid and triploid tea leaves. Both the leaves and the stomata of triploid tea are larger than those of diploid tea. Analysis of leaf microstructure showed that the xylem cells in the leaf veins of triploid tea trees are larger and more numerous than those of diploids. Comparative transcriptome analyses demonstrated that genes involved in cell division and expansion were more highly expressed in triploid than in diploid leaves, especially key enzymes and transcription factors that function at branch points in cell formation pathways, which might explain the morphological differences between diploid and triploid tea leaves. The transcriptome data obtained in this study will improve our understanding of the molecular mechanisms of tea leaf and stomatal development and provide a basis for the molecular breeding of high-quality, high-yield tea varieties.

Plant materials and growth conditions
Diploid tea tree: 'QianMei 419' (Guizhou Institute of Agricultural Sciences, Guizhou Academy of Agricultural Sciences -Tea Garden, Tea Lake Institute, Meitan County).
The annual shoots of diploid and triploid tea trees grown under field conditions were used as experimental materials.

RNA Extraction
Total RNA from the annual shoot leaves of diploid and triploid tea trees was extracted and purified using an RNeasy mini kit (Qiagen, Valencia, CA, USA; Qiagen China, Shanghai, China) according to the manufacturer's instructions. Total RNA was then checked for degradation and contamination by 1% agarose gel electrophoresis. RNA purity was measured using a spectrophotometer (IMPLEN, CA, USA), RNA concentration was determined using a fluorescence spectrophotometer (Life Technologies, CA, USA), and RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 System (Agilent Technologies, CA, USA). Samples with an A260/A280 ratio of 1.8-2.0 and a 28S/18S ratio of 1.6-2.0 were considered qualified for subsequent sequencing. Library construction and RNA-Seq were performed by Shenzhen Hengchuang Gene Technology Co., Ltd. (Shenzhen, China).
After the extracted total RNA was treated with DNase I, eukaryotic mRNA was enriched using oligo(dT) magnetic beads. A fragmentation reagent was then added to break the mRNA into short fragments. First-strand cDNA was synthesized with random hexamers using the fragmented mRNA as a template, and second-strand cDNA was then synthesized by adding buffer, dNTPs and DNA polymerase I and purified with a kit. After recovery, cohesive end repair, addition of an "A" base at the 3' end and ligation of the sequencing adapters, the resulting fragments were size-selected and amplified by PCR. The constructed library was quality-checked using an Agilent 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System and sequenced on the Illumina sequencing platform.

De novo assembly and functional annotation
The raw sequencing reads were filtered by removing adapter sequences, low-quality sequences, and sequences with an N base ratio of more than 10%. The filtered reads, referred to as clean reads, were used for de novo transcriptome assembly with Trinity (trinityrnaseq_r20140717; http://trinityrnaseq.sourceforge.net/) without digital normalization, with min_kmer_cov set to 3 and default values for all other parameters.
The sequences obtained from the Trinity assembly are called transcripts. TGICL (v2.1) was then used to remove redundancy and further splice the transcripts to obtain the final unigenes. The unigenes produced by TGICL fall into two classes. Clusters contain several unigenes with high similarity (greater than 70%); their IDs start with CL, followed by the number of the gene family. The rest are singletons (IDs starting with Unigene), each representing a single unigene. The unigenes were annotated by BLASTx alignment (E-value ≤ 10^-5) against the following databases: the National Center for Biotechnology Information (NCBI) non-redundant protein (Nr) and nucleotide collection (Nt) databases, Swiss-Prot, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Clusters of Orthologous Groups of proteins (COG). Based on the Nr annotation results, GO functional annotation of the unigenes was performed with Blast2GO (v2.5.0).
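As an illustration of the N-content criterion only, the following minimal Python sketch drops reads whose proportion of N bases exceeds 10%. It is not the pipeline used in this study: adapter removal and low-quality filtering are not shown, and the function names and toy reads are hypothetical.

```python
def n_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are N (case-insensitive)."""
    if not seq:
        return 1.0  # assumption: treat empty reads as fully ambiguous so they are dropped
    return seq.upper().count("N") / len(seq)


def keep_read(seq: str, max_n_fraction: float = 0.10) -> bool:
    """Keep a read unless its N fraction is more than the threshold (10%)."""
    return n_fraction(seq) <= max_n_fraction


def filter_reads(reads):
    """Yield (read_id, seq) pairs that pass the N-content filter."""
    for read_id, seq in reads:
        if keep_read(seq):
            yield read_id, seq


# Hypothetical toy reads, for illustration only.
example_reads = [
    ("read1", "ACGTACGTAC"),  # 0% N: kept
    ("read2", "ACGTNCGTAC"),  # 10% N: kept (not more than 10%)
    ("read3", "ANGTNCGTAC"),  # 20% N: removed
]
clean_reads = list(filter_reads(example_reads))
```

A read with exactly 10% N is retained here because the criterion is stated as "more than 10%".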
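The naming convention described above (cluster IDs beginning with CL, singleton IDs beginning with Unigene) can be used to tally the two classes. The sketch below is a hedged illustration; the exact ID strings are hypothetical and may differ from those produced by the actual assembly.

```python
def classify_unigene(unigene_id: str) -> str:
    """Classify an ID as 'cluster' (CL prefix) or 'singleton' (Unigene prefix)."""
    if unigene_id.startswith("CL"):
        return "cluster"
    if unigene_id.lower().startswith("unigene"):
        return "singleton"
    return "unknown"  # anything that follows neither convention


def count_classes(ids):
    """Count how many IDs fall into each class."""
    counts = {"cluster": 0, "singleton": 0, "unknown": 0}
    for uid in ids:
        counts[classify_unigene(uid)] += 1
    return counts


# Hypothetical IDs, for illustration only.
example_ids = ["CL1.Contig1", "CL1.Contig2", "Unigene42", "CL7.Contig1"]
summary = count_classes(example_ids)
```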

GO and KEGG pathway enrichment analyses for differentially expressed unigenes
After the GO annotation of each unigene was obtained, WEGO was used to compute GO functional classification statistics for all unigenes, giving an overview of the distribution of gene functions in the species. GO and KEGG pathway enrichment analyses of the differentially expressed unigenes were then carried out. The GO annotation was enriched and refined using the topGO package (v2.16.0). KEGG was used to further study the complex biological behavior of the genes, and the pathway annotations of the unigenes were obtained from the KEGG annotation information. Read counts were normalized by calculating reads per kilobase per million mapped reads (RPKM) for each transcript in each individual tissue, and the relative expression of each gene was calculated as log2(YH29/WH10). A P-value cut-off of ≤ 0.05 together with at least a two-fold change was used to identify significantly differentially expressed transcripts.
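The normalization and filtering criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: RPKM is computed with the standard definition (reads × 10^9 / (transcript length in bp × total mapped reads)), the P-value is taken as a precomputed input because the statistical test is not specified here, and all numeric values are hypothetical.

```python
import math


def rpkm(read_count: float, length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def log2_fold_change(rpkm_a: float, rpkm_b: float) -> float:
    """log2(A/B); both values must be positive."""
    return math.log2(rpkm_a / rpkm_b)


def is_differential(log2fc: float, p_value: float,
                    p_cutoff: float = 0.05, min_abs_log2fc: float = 1.0) -> bool:
    """P <= 0.05 and at least a two-fold change (|log2FC| >= 1)."""
    return p_value <= p_cutoff and abs(log2fc) >= min_abs_log2fc


# Hypothetical counts for one 2,000 bp unigene in the two samples.
rpkm_yh29 = rpkm(read_count=500, length_bp=2000, total_mapped_reads=10_000_000)
rpkm_wh10 = rpkm(read_count=125, length_bp=2000, total_mapped_reads=10_000_000)
lfc = log2_fold_change(rpkm_yh29, rpkm_wh10)
significant = is_differential(lfc, p_value=0.01)  # the P-value is assumed, not computed
```

In practice, a transcript with zero expression in one sample would need special handling (for example a small pseudocount), which this sketch omits.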

Gene validation and expression analysis
For gene validation and expression analysis, all DEGs related to leaf growth were subjected to quantitative real-time PCR (qRT-PCR). qRT-PCR was performed using a 7500 Fast Real-Time PCR instrument (Applied Biosystems, Waltham, Massachusetts, USA) and SYBR Premix Ex Taq (TaKaRa) according to the manufacturer's instructions. The synthesized cDNA was used to analyze transcript abundance by qRT-PCR with the primers shown in Supplementary Table S2. Relative transcript levels were determined by normalizing expression against actin transcript levels. Experiments were replicated three times.
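The text states only that expression was normalized against actin; the calculation method is not given. One common way to do this is the 2^-ΔΔCt method, sketched below as an assumption rather than as the method actually used. The Ct values are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Ct of the target gene minus Ct of the reference gene (here, actin)."""
    return ct_target - ct_reference


def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the sample relative to the control by 2^-ddCt (assumed method)."""
    ddct = (delta_ct(ct_target_sample, ct_ref_sample)
            - delta_ct(ct_target_control, ct_ref_control))
    return 2.0 ** (-ddct)


# Hypothetical mean Ct values: target gene and actin in a sample and a control.
fold = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,    # dCt = 4
    ct_target_control=24.0, ct_ref_control=18.0,  # dCt = 6
)
```

This form assumes approximately equal amplification efficiency for the target and reference primers.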

Statistical analysis
All data were expressed as mean ± standard error. For qRT-PCR, three biological replicates were assessed.
Microsoft Excel and GraphPad Prism 5.0 software were used for data analysis. One-way ANOVA with Duncan's multiple range test was used for post hoc comparison of multiple variables. A significant difference relative to the control was recognized at *P < 0.05 or **P < 0.01.
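For reference, the mean ± standard error summary for three biological replicates can be computed as in the following minimal Python sketch. The replicate values are hypothetical, and the ANOVA and Duncan's test performed in GraphPad Prism are not reproduced here.

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error), with SE = sample SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for a standard error")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean, se


# Hypothetical relative expression values from three biological replicates.
replicates = [1.0, 2.0, 3.0]
mean, se = mean_and_se(replicates)
report = f"{mean:.2f} ± {se:.2f}"
```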

Consent for publication
Not applicable.

Availability of data and material
The datasets analyzed during the current study are available from the corresponding author on reasonable request. All data generated or analyzed during this study are included in this published article [and its Additional files].

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
YQ, LTL and DGZ conceived and designed the research. YQ and LTL conducted the experiments. YQ and XZY contributed analytical tools and analyzed data. YQ wrote the manuscript. All authors read and approved the manuscript.

Figure 1 Morphological characteristics of diploid and triploid tea leaves.

GO classification map. The abscissa is the GO term at the next level below the three major GO categories, and the ordinate is the number of genes annotated to the term (including its subterms). The three different classifications represent the three basic tree classifications of GO terms (from left to right: biological processes, cellular components, molecular functions).

Unigene KEGG annotation function distribution statistics and differential expression gene Pathway enrichment analysis

Figure 9 Expression pattern of genes related to the growth and development of diploid and triploid tea leaves by qRT-PCR