Genome-wide identiﬁcation and expression analysis of the NF-Y family of transcription factors in tea plant

Background Nuclear factor-Y(NF-Y) is a heterotrimeric transcription factor commonly found in higher eukaryotes. It usually consists of three subunits: NF-YA, NF-YB and NF-YC, which binds to CCAAT sequences to regulate the expression of target genes. Although NF-Y proteins have been reported in many plants like Arabidopsis and wheat, a comprehensive and systematic analysis of NF-Y genes in tea plant has not yet been reported. In this study, a total of 35 CsNF-Y genes, including 10 CsNF-YA, 16 CsNF-YB, and 9 CsNF-YC genes, were identified in the sequenced genomes of tea plant. The conserved domains analysis and multiple alignments suggested that domains were conserved within subgroups but differed between subgroups in the CsNF-Y family. Functional prediction and protein interaction analysis were performed by phylogenetic analyses with orthologous genes. Further, expression pattern analysis in various tissues of tea plant by transcriptome data indicated that these genes play different roles in tea plant development and many exhibit tissue-specific expression patterns.


Background
Nuclear factor-Y(NF-Y) is a heterotrimeric transcription factor commonly found in higher eukaryotes. It usually consists of three subunits: NF-YA, NF-YB and NF-YC, which binds to CCAAT sequences to regulate the expression of target genes. Although NF-Y proteins have been reported in many plants like Arabidopsis and wheat, a comprehensive and systematic analysis of NF-Y genes in tea plant has not yet been reported.

Results
In this study, a total of 35 CsNF-Y genes, including 10 CsNF-YA, 16 CsNF-YB, and 9 CsNF-YC genes, were identified in the sequenced genomes of tea plant. The conserved domains analysis and multiple alignments suggested that domains were conserved within subgroups but differed between subgroups in the CsNF-Y family. Functional prediction and protein interaction analysis were performed by phylogenetic analyses with orthologous genes. Further, expression pattern analysis in various tissues of tea plant by transcriptome data indicated that these genes play different roles in tea plant development and many exhibit tissue-specific expression patterns.

Conclusions
In this study, we characterized 35 CsNF-Ys based on their distributions on tea plant genome, gene structures, phylogenetic relationship with Orthologous NF-Y genes, and their expression patterns. This study provides extensive evaluation of the tea plant NF-Y family and may contribute to the future studies about the CsNF-Y gene family.

Background
The CCAAT box is a cis-acting element that is widely found in 30% of eukaryotes promoters and plays a key role in regulating gene expression [1].The promoter element are usually located in the region 60 to 100 bp upstream of the transcription start site [2].Among homologous genes of different species, the CCAAT box is highly conserved, especially in the positional direction and nearby nucleotide sequences [3]. According to the regulatory role of transcription factors, genes under the control of a promoter containing a CCAAT sequence may be expressed universally, or may be expressed in different tissues and special stages [4]. Numerous experiments have demonstrated the importance of the biological function of the CCAAT box, whether in yeast ,mammals and plants [5][6][7].
Nuclear factor-Y (NF-Y), also known as CCAAT-binding factor (CBF) and heme activator protein (HAP) is a class of transcription factor which can be combined with CCAAT box in the promoter sequence [8]. Complex transcription factors have multiple subunits, and the spatial and temporal deployment of individual subunits (such as expression levels, posttranslational modifications, and subcellular localization) has profound effects on their function [9]. The NF-Y was first founded in yeast [10], which includes at least three subunits: NF-YA (CBF-B or HAP2), NF-YB (CBF-A or HAP3), and NF-YC (CBF-C or HAP5). In addition, researcher only can found the fourth subunit so-called HAP4 in fungi where it provides the transcriptional activation domain [11]. All subunits contain evolutionarily conserved DNA binding and subunit interaction domains to form heterotrimeric complexes [12,13].
Each type of NF-Y protein in mammals and yeast is encoded by a single exon, but the three subunits of plants are encoded by multiple exons, indicating that NF-Y proteins have more complex regulation in plants [14].In yeast and animals, each of NF-Y protein is encoded by one or two exons, which act as a heterotrimer. In contrast, each NF-Y subunit has evolved into a polygene-encoded family with a large number of heterotrimeric combinations in plants [15,16]. For example, mice and humans encode only one copy of each subunit [17], while 10 NF-YAs, 13 NF-YBs, and 13 NF-YCs are encoded by the dicot model plant Arabidopsis thaliana and in Triticum aestivum, there are 10, 11 and 14 copies of NF-YA, NF-YB and NF-YC ,respectively [18].
Numerous studies have shown that NF-Y proteins specifically bind to highly conserved CCAAT sequences in their target gene promoters, and each NF-Y subunit contains conserved domain associated with DNA binding and protein interactions [19,20]. The NF-Y subunits A is normally localized to the nucleus, and most NF-YA proteins bind to CCAAT cis elements in the promoter regions of the target gene, but have different functions [21,22].
It is known that NF-YA proteins have two conserved domains, one in the C-terminal region, which is related to DNA-binding ,the other in the N-terminus, which is related to the interaction with the NF-YB/NF-YC heterodimer [23]. The NF-Y subunits B and NF-Y subunits C contain highly conserved Histone Folding Domains (HFDs) associated with histone H2B and H2A structures [24], respectively, which functions in protein-DNA and protein-protein interactions [25]. It is worth noting that the NF-YB protein has no nuclear localization signal (NLS), so it must interact with NF-YC in the cytoplasm to transfer to the nucleus, and combine the heterodimer with NF-YA to form the final heterotrimer [26,27]. The HFD domains of the NF-YB and NF-YC subunits assemble into a dimer in a head-to-tail fashion, resulting in DNA binding and structural scaffold interaction with the NF-YA subunit [28].
This research suggests that the regulation of various NF-Y proteins in plants is more diverse and complex than in other organisms. These numerous subunits implies the possibility of genetic redundancy and functional differentiation [29]. There subunits assemble into heterotrimers and bind to the promoter CCAAT box to regulate the expression of many genes [30].
In mammals and yeast, the NF-Y gene possesses many important biological functions, like regulation of energy metabolism [31],response to sugar signal [32], activation of mitochondria-related biosynthesis [33] and coordinating gene expression related to the cell cycle [34]. For example, the degree of NF-YA and CCAAT combination is closely related to human age [35,36].In Kluyveromyces lactis, an additional subunit (HAP4) is proved that can provide powerful acidic activation surface to the complex by associates with the trimer [37]. Many studies have shown that the NF-Y gene family also play an important role in plants, like embryonic development [38], photosynthesis [39], controlling florescence [40] and stress response [41]. Arabidopsis thaliana LEAFY COTYLEDON1 (LEC1) and LEAFY COTYLEDON 1 LIKE (L1L) genes are the first nuclear transcription factors of NF-YB family reported in plants, which play an important role in regulating embryo development and embryo maturation [42,43]. In addition, the AtNF-YB2 gene can promote early flowering by activating the FLOWERINGLOCUS (FT) and SUPPRESSOR OF OVEREXPRESSION OF CO1 (SOC1) genes [44,45].Recent studies have shown that AtNF-YA5 expression is up-regulated under drought stress and abscisic acid (ABA) response, and AtNF-YA5 loss-of-function mutant is sensitive to drought stress, suggesting that AtNF-YA5 may enhance the expression of drought-resistance genes [46].OsNF-YB7/L1L in rice not only plays a role in vegetative growth stage, but also participates in the regulation of flower meristems.
Overexpressing OsNF-YB7/L1L plants are short, densely flocculated, and the inflorescence axis is abnormally developed [47].In barley, overexpression of HvNF-YB1 leads to early flowering time, and NF-Ys regulate the flowering process by combining with the CCT domain in VRN2 (the vernalization pathway) and CO (photoperiod pathway) in wheat [48].
Recently, more and more plants have been discovered and studied in the NF-Y gene family, such as Brachypodium distachyon [49], Zea mays [50], Solanum lycopersicum [51], Glycine max [52], Ricinus communis [53], Poplar [54], Physcomitrella patens [55], and Vitis vinifera L. Tea plant is an important evergreen, thermophilic, hydrophilic and woody perennial crop belonging to the Camellia family [58].The tea made from the young leaves of tea plants is the oldest in the world (from AD) and the most popular non-alcoholic caffeinated beverages since the first 3,000 years [61].In addition to its attractive aroma and pleasant taste, tea beverages also have many health and medicinal properties for humans due to many unique secondary metabolites in tea, such as polyphenols [62], Caffeine [63], and theanine [64]. During the growth of tea plant, the environment is a key factor affecting the growth and development of tea plant. For example, changes in temperature, soil moisture and salinity will affect the growth and harvesting of tea plant to varying degrees. Therefore, improving the stress tolerance of tea plant is essential for sustainable tea production. At present, many researchers have identified some stress-responsive tea plant genes, including genes heat shock proteins (HSPs) [65], HD-Zip [66] and SBP-Box[67]genes.The recently available whole-genome sequences offer a great opportunity for identification and characterization of the NF-Y family in tea plant, which could provide valuable information in understanding the functions of the NF-Y genes underlying the regulation of growth and development of tea plant.
In this study, we identified nine NF-YAs, sixteen NF-YBs and ten NF-YCs in Camellia sinensis using sequence information from Arabidopsis thaliana. Subsequently, we systematically analysis the physicochemical properties of NF-Y genes including gene structure, isoelectric point , hydrophobic property and protein stability. We also studied their conserved domains, phylogeny, divergence time and putative cis-regulatory elements in the promoters of these genes were conducted. Furthermore, expression difference of the CsNF-Y genes in different tissues and under cold and drought treatments were also detected by transcriptome analysis. The results obtained in this study will provide a reference for future studies of NF-Y family genes in tea plant growth and

Identification of NF-Y genes in Camellia sinensis
All Arabidopsis NF-Y protein sequences were used as queries for tBLASTn program to search the CsNF-Y proteins. In total, there are 45 putative NF-Y genes were founded. After Pfam search for the conserved domains of NF-Y proteins, a total of 35 NF-Y genes were identified and confirmed, including 10 NF-YA,16 NF-YB and 9 NF-YC genes ( Table 1).

Table 1 NF-Y transcription factors in Camellia sinensis
Protein properties and sequence analysis of CsNF-Y genes Characteristics of the 35 CsNF-Ys are shown in Table 2 Table 2 The detailed information of CsNF-Y members Gene structure and conserved motifs In order to gain an insight into the diversity of NF-Y genes in tea plant, the exon/intron organization and conserved motifs were further analyzed. The GSDS2.0 online tool was used to determine the gene structure of each predicted CsNF-Y gene (Fig. 1). According to the statistics, 34.3% of genes contain no introns, 11.4% of genes containing 1 to 3 introns, and 54.3% of genes containing more than 3 introns. Using MEME web server, we predicted 10 conserved motif distributions with the full-length protein sequences of the three CsNF-Y subfamilies. Experiments show that all of the genes have functional motifs, and there are significant differences in the distribution of motifs before the different groups, that basically consistent with the predictions of the previous conserved domains.
For 10 NF-YA proteins, all of them shared motif 3 and motif 4, and 9 members contained motif 8. In the case of 16 NF-YB members, two motifs (motif 1,2) were widely distributed.
All of NF-YC members had the motif 1 and motif 3, and 5 NF-YC members contained motif 6, 7 and 9.

Gene duplication and divergence time analysis
We first obtained 10 pairs of paralogous gene pairs based on the phylogenetic tree of the CsNF-Y gene family, and calculated the nonsynonymous(Ka) and synonymous(Ks) rate of each group. According to their ratio, which displayed from table3, we found that Ka/Ks were all less than 1, and we determined that the CsNF-Y gene family experienced a purifying selection.
In addition, we have obtained the replication events of these CsNF-Y genes based on the molecular evolution rate of dicotyledons, which occurred between 32.18 million and 3.9 million years ago. Then we chose kiwifruit as the near-source plant of tea plant, and we selected 11 pairs of orthologous genes based on the NF-Y gene family phylogenetic tree of two plants. It was found through calculation that the differentiation of tea plant and kiwifruit occurred between 49 and 13 million years ago. As shown in table 4, we found that the two pairs of genes CsNF-YB3_Ach28g200451.1 and CsNF-YB9_Ach24g427181.2 differentiated much earlier than other gene pairs. Excluding these two pairs of genomes, we found that the average time for tea tree and kiwifruit differentiation was 18 million years ago. Table 3 The Ka/Ks rate of CsNF-Y replication genes Table 4 The Ka/Ks rate of tea tree and kiwifruit NF-Y genes GO   After counting all the cis-elements as shown in Figure 6, we found that the functional components of light response of tea plant are the most widely distributed. We speculate that the CsNF-Y gene may regulate the light conditions of tea plant. In addition, it was found that only CsNF-YA4 possesses the element motif I related to root growth, so we predict that CsNF-YA4 is closely related to the growth of tea roots. Moreover, we found that CsNF-YB6 has a functional element WUN-motif associated with plant damage and recovery function, and we predict that this gene is involved in the damaged recovery function of tea plant.

Gene expression analysis
In this experiment, the transcriptomes of different tissues of tea plant were analyzed to obtain the differential expression of CsNF-Y gene family in different tissues. According to the heat map as shown in Figure 7, we can clearly find that many genes in the CsNF-Y gene family are differentially expressed in different tissues, and some of them are significantly expressed in a single tissue.
Using detailed analysis of the heat map, we found that CsNF-YA6 is significantly expressed in young leaves, while CsNF-B10, CsNF-B13 and CsNF-C2 are specifically expressed in mature tea leaves. From this we conclude that these genes have an important relationship with the growth and development of tea leaves. At the same time, we also found that CsNF-B12 is differentially expressed in tea flower and CsNF-B4 and CsNF-B15 are abundantly expressed in tea fruit. Based on this, we conclude that these three genes are involved in the growth of tea flower and fruit. In the heat map, we can also find that CsNF-A8, CsNF-B9, CsNF-C4 and CsNF-C9 are only abundantly expressed in the roots of tea plant. We speculate that these genes play an active role in promoting the growth of tea root. In current study, a total of 35 NF-Y subunits, including 10 NF-YA, 16 NF-YB, and 9 NF-YC subunits, were identified based on the genome data of tea plant. In our study, we found that the CsNF-Y gene family have CBF conserved domain sequences, and Histone conserved sequences in both NF-YB and NF-YC gene sequences. These conserved sequences are intrinsic gene sequences of NF-Y gene family. We used the GSDS2.0 program to visualize the CsNF-Y gene structure. The results showed that compared with the relatively consistent intron/exon structure in the NF-YA subunit, most of the NF-YB and NFYC gene structures were less, which were consistent with that observed in Arabidopsis, common bean and Physcomitrella patens. These structures of the CsNF-Y gene family are diverse and contain more genes in introns, and the proportion of genes containing more than 3 introns reached 54.3%.
In order to understand the phylogenetic relationship of the CsNF-Y gene family, many orthologous genes are constructed into phylogenetic trees after multiple sequence alignments.
From the phylogenetic tree of the three groups of CsNF-Y gene family, it can be found that NF-Y gene members of the dicotyledonous and monocotyledon are staggered and not form separate branches, which indicates CsNF -Y gene members are more conservative in the evolutionary system. By observing the phylogenetic tree, we predicted that CsNF-YA1 gene plays an important biological function in the seed development stage of tea plant.
The CsNF-YA5 gene is predicted to be involved in the tea plant maturation process, and CsNF-YA2 and CsNF-YA8 genes are predicted to participate in the drought-resistant process of tea plant. At the same time, it is predicted that CsNF-YB12, CsNF-YB14 and CsNF-YC7 can regulate the flowering process of tea plants. And the ratio of the nonsynonymous substitution rate (Ka) to the synonymous substitution rate (Ks) was less than 1, thereby determining that the CsNF-Y gene family experienced a purification selection.
In addition, we have predicted the replication events of these CsNF-Y genes according to the molecular evolution rate of dicotyledons, which occurred between 32.18 million and 3.9 million years ago. The differentiation of tea plant and kiwifruit occurred between 49 and 13 million years ago.
The analysis of gene expression patterns in different tissues will help us to understand the dynamic gene expression of NF-Y genes in tea plant. Therefore, we displayed heat map of the gene expression patterns of 35 identified CsNF-Y genes using published transcriptome data of tea plant different tissues downloaded from NCBI database. According to the analysis of the heat map, our study found that many genes in the CsNF-Y gene family are differentially expressed in different tissues, and some of them are significantly expressed in a single tissue.
After further analysis, we can predict that CsNF-YA6, CsNF-YB10, CsNF-YB13 and CsNF-YC2 have an important relationship with the growth and development of tea leaves. At the same time, we also predicted that CsNF-YB12, CsNF-YB4 and CsNF-YB15 are involved in the growth of tea flower and fruit. In the heat map, we can also find that CsNF-YA8, CsNF-YB9, CsNF-YC4 and CsNF-YC9 are only abundantly expressed in the roots of tea plants. We speculate that these genes play an active role in promoting the growth of tea roots. These results might suggest that CsNF-Y genes had many putative functions in processes of tea plant growth and development.

Conclusions
In the present study, a comprehensive analysis of NF-Y gene family in tea plant was carried out for the first time. The conserved domain, gene structure, phylogenetic relationship, , gene duplication and expression pattern analysis of CsNF-Y genes were investigated. The CsNF-Y genes were divided into three clades: A (10 genes), B (16 genes), C (9 genes), which were supported by gene structural and conserved domain analysis. The biological function of the CsNF-Y proteins was predicted by phylogenetic analysis with orthologous proteins, and this provided a preliminary understood for the less-studied CsNF-Ys. The differential expression patterns of CsNF-Y genes in different tissues of tea plant showed that these genes play different roles in tea plant development and many exhibit tissue-specific expression patterns. This study is the first report to identify CsNF-Y genes in tea plant and may contribute to the future studies about the CsNF-Y gene family.

Identification of the NF-Y gene family in tea plant
The sequences of 36 Arabidopsis NF-Y proteins were downloaded from the TAIR database (http://www.arabidopsis.org) and PlantTFDB database (http://planttfdb.cbi.pku.edu.cn).
The Camellia sinensis geneome were obtained from the TPIA(http://tpia.teaplant.org). The TBLASTN program was applied with an E-value cut-off of 1.0e −5 to identify predicted CsNF-Y sequences in the tea plant genome. In addition, a Hidden Markov Model (HMM) profile for the NF-Y domain from the Pfam database (http://pfam.janelia.org) was also used as a query to confirm all candidate sequences by searching for NF-Y domain sequences.
The identified NF-Ys were named according to their order in the scaffold from top to bottom.

Protein properties and sequence analysis
The protein molecular weights (MWs) and isoelectric points (pIs) were predicted using the ExPASy tools (http://web.expasy.org/compute_pi/) based on the deduced NF-Y sequences.
The subcellular localization of the NF-Y proteins were predicted using the TargetP 1.1 tool based on the N-terminal amino acid sequence. The MEME program was used to identify the conserved motifs in the CsNF-Y proteins (parameter setting: maximum number of motifs, 10; maximum width, 50). The gene exon/intron structure of the CsNF-Y genes were determined using the Gene Structure Display Server2.0 (GSDS2.0) [39] by comparing their predicted coding sequence (CDS) with their genomic sequences.

Multiple alignments and phylogenetic analysis
The multiple sequence alignments for CsNF-Y and AtNF-Y protein sequences were performed using CLUSTAL X software with manually adjustment where appropriate for the alignment of the NF-Y domain, then displayed using DNAMAN software. The phylogenetic trees were constructed based on the alignment with the Neighbor Joining method using MEGA7.0 software with the pairwise deletion option and bootstrap tests of 1,000 replicates. The combined phylogenetic tree of selected ZmNF-Y, SlNF-Y and OsNF-Y, proteins was generated using the same method.

Gene duplication and divergence time analysis
The predicted CsNF-Y proteins were first aligned by ClustalW2 at MEGA7.0 software to a gene duplication analysis. The MCScanX software (http://chibba.pgml.uga.edu/ mcscan2/) was used to identify the duplications of CsNF-Y proteins. The synonymous (Ks) and nonsynonymous substitution rate (Ka) of the different gene duplication pairs were estimated using DnaSP 5 software. The Ks values were used to predict the divergence time (T) occurred in tea plant, according the formula: T = Ks/2λ × 10 -6 Mya (λ =6.5 × 10 −9 ).

GO and KEGG enrichment analysis
In this experiment, the GO and KEGG annotation files of all the genes of tea plant were used as background genes, and the CsNF-Y gene family was used as the foreground gene.
The background gene and foreground genes of tea plant were analyzed by the clusterprofiler package in R language to obtain GO and KEGG enrichment analysis of the CsNF-Y gene family.

Putative promoter region analysis
For the promoter analysis, the 1,500 bp upstream sequences of the start codon of each CsNF-Y genes were extracted from the tea plant genome sequence as putative promoter region. The PLACE website was used to search for cis-acting regulatory elements in the putative promoter regions.

Gene expression analysis
To study gene expression levels of CsNF-Y genes in different tissues or development stages. The raw RNA-seq reads from different tea plant tissues were obtained from the NCBI SRA database, including apical bud, flower, fruit, young leaf, mature leaf, old leaf, stem and root .
There raw reads were trimmed to remove low quality adaptor sequences with Trimmomatic tool. Then, the paired clean reads were mapped to the C. sinensis reference genome using RSEM with defaults parameters. To visualize the expression patterns of the NF-Y genes in different tea plant tissues, the fragments per kilobase million (FPKM) and heat map were created with EdgeR software.      Expression profile analysis of CsNF-Y genes in different tea plant tissues.
Transcriptome data (FPKM) were used to measure the expression levels of CsNF-Y genes in apical bud, flower, fruit, young leaf, mature leaf, old leaf, stem and root . The colored scale for the different expression levels is shown.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to