Identification and Phylogenetic Analysis of CONSTANS-like Genes in C. sinense
Eight CONSTANS-like genes were identified in the C. sinense transcriptomic database based on the functional annotation of isoforms and sequence similarity analysis. Eight primer pairs (Table S1.1) were designed and used to clone the ORFs of these genes, which were named CsCOL1 and CsCOL3-9. Their sequences are listed in Table S2, and their GenBank accession numbers are GU168786, OR526963, OR526964, OR526965, OR526966, OR526967, OR526968, and OR526969, respectively. The physicochemical properties and subcellular localization of the encoded proteins were then predicted. The coding sequences (CDS) of the eight CsCOL genes ranged from 831 bp to 1380 bp in length, and the encoded proteins ranged from 227 to 460 amino acids. Molecular weights (MWs) ranged from 30.86 to 49.94 kDa, and isoelectric points (pIs) from 5.03 to 7.53. All eight proteins were predicted to localize to the nucleus (Table S3).
An amino acid-based phylogenetic tree, constructed with MEGA11, was used to assess the evolutionary relationships between the eight CsCOL genes and the COL genes of other plants, namely A. thaliana, Oryza sativa, Zea mays, and Hordeum vulgare. Based on this phylogenetic analysis (Fig. 1), the eight CsCOL proteins were classified into three groups according to the number and structure of conserved B-box domains. CsCOL1/3/4 clustered in Group I. CsCOL1/3 contained two B-boxes and a CCT domain, similar to AtCO and AtCOL1-5. CsCOL1/3 formed a sister group and shared 69% identity with each other. CsCOL3 also shared 36% and 34% identity with AtCO from A. thaliana and OsHd1 from O. sativa, respectively. CsCOL4, like HvCO3/8, possessed only one B-box and a CCT domain, but was nevertheless classified into Group I. CsCOL4 was only 30% and 24% identical to CsCOL1 and CsCOL3, respectively, yet it shared somewhat higher identity with AtCO (33%) and OsHd1 (32%). Among the eight CsCOL genes, only CsCOL9 was found in Group II. CsCOL9 contained one B-box domain and a CCT domain, similar to AtCOL6-8 and AtCOL16. CsCOL5-8, which clustered in Group III, contained a normal B-box domain, a divergent B-box domain and a CCT domain, similar to AtCOL9-15. CsCOL6/7 formed a sister group that was closely related to OsCOL9 and AtCOL9/10. CsCOL6/7 showed 65% identity with each other and also shared high identity with OsCOL9 (61%) and AtCOL9 (63%). Most CO/COL homologs in the same group possessed the same protein domain structure. However, in addition to CsCOL4 in Group I, the phylogenetic placement of CsCOL6/7, OsCOL9 and ZmaCOL12 in Group III also differed from the classification based on B-box domain structure: these proteins contained two B-boxes, similar to the domain structure of Group I proteins (Fig. 1). This diversity of CsCOL genes may indicate functional divergence in C. sinense.
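The pairwise identity values quoted above can be illustrated with a minimal sketch. It assumes identity is computed as the share of matching residues over alignment columns in which neither sequence has a gap; the source does not state which definition or tool was used, and other conventions (for example, dividing by the full alignment length) give different numbers. The function name `percent_identity` and the toy sequences are hypothetical and are not CsCOL data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are ignored,
    and identity = matches / compared columns * 100.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only)
toy_a = "MKC-DACG"
toy_b = "MRCEDA-G"
print(round(percent_identity(toy_a, toy_b), 1))
```

In the toy pair, six columns are gap-free and five of them match, so the function returns five sixths expressed as a percentage.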
To further analyze the differences among COL homologs in the three groups, the amino acid sequences of the conserved B-box and CCT domains of eight CsCOL and 17 AtCOL proteins were aligned (Fig. 2). The alignment indicated that the B-box1 and B-box2 domains displayed 82% and 73% identity, respectively, between CsCOL1/3 and AtCOL proteins in Group I (Fig. 2A, 2B). The consensus sequence of B-box1 was C-X2-C-X8-C-X-A-D-X-A-X-L-C-X2-C-D-X3-H-S-A-N-X-L-X2-R-H, and 17 out of 38 (44.7%) amino acids were fully conserved (Fig. 2A). The consensus sequence of B-box2 was C-X11-C-X2-D-X-A-X-L-C-X2-C-D-X3-H-X7-R-H, and 11 out of 38 (28.9%) amino acids were fully conserved (Fig. 2B). The B-box1 and divergent B-box2 domains from CsCOL5–8 and AtCOL9–15 in Group III showed 74% and 60% identity, respectively. In particular, in the divergent B-box2 domain, only five out of 29 amino acids (17.2%) were fully conserved (Fig. 2D). The B-box1 domain showed 81% identity among CsCOL9 and AtCOL6/7/8/16 in Group II. Its consensus sequence was C-X2-C-X5-A-X-W-Y-C-X2-A-F-L-C-X2-C-D-X3-H-S-A-N-X2-A-X2-H, and 20 out of 38 (52.6%) amino acids were fully conserved (Fig. 2E). The CCT domain showed 76% identity among eight CsCOL and 17 AtCOL proteins, and 16 out of 42 (38%) amino acids were fully conserved (Fig. 2F). Thus, relative to the corresponding AtCOL domains, the most conserved domain was the B-box1 domain of CsCOL9 in Group II and the least conserved domain was the divergent B-box2 of CsCOL5-8 in Group III.
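The "fully conserved" fractions reported above are simple ratios of invariant alignment columns to domain length. The sketch below shows one way such a count could be obtained, assuming a column counts as fully conserved only when every sequence carries the same residue and none has a gap. The helper names `fully_conserved` and `percent` and the toy alignment are hypothetical; only the final ratios are taken from the text.

```python
def fully_conserved(aligned: list[str]) -> tuple[int, int]:
    """Return (number of fully conserved columns, alignment length).

    Assumption: a column is fully conserved when all sequences share the
    same residue and that residue is not a gap ('-').
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have equal length")

    conserved = sum(
        1
        for column in zip(*aligned)
        if column[0] != "-" and len(set(column)) == 1
    )
    return conserved, length


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Toy alignment (hypothetical): columns 1, 3 and 4 are invariant
toy = ["CAAC", "CGAC", "CTAC"]
print(fully_conserved(toy))

# Ratios quoted in the text, recomputed from the stated counts
for count, total in [(17, 38), (11, 38), (5, 29), (20, 38), (16, 42)]:
    print(count, total, percent(count, total))
```

Recomputing the ratios from the stated counts reproduces the percentages given in the text, with the CCT domain value (16 of 42) coming out at 38.1% before rounding to a whole number.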
Subcellular Localization of Five CsCOL Proteins
To test whether CsCOL proteins are localized in the nucleus and whether the CCT domain lies near the C-terminus, five CsCOL proteins (CsCOL3/4/6/8/9) from three groups were selected to analyze their subcellular localization. For each, a translational fusion of yellow fluorescent protein (YFP) and CsCOL proteins (35S:YFP-CsCOL) was constructed, and a nuclear localization marker (AtCO-mCherry) was used to identify the localization of CsCOL proteins using the transient expression system of Nicotiana tabacum epidermal cells. The five YFP-CsCOL fusion proteins were mainly detected in the nuclei (Fig. 3). The mCherry signal of the AtCO-mCherry nuclear localization marker overlapped with the signals of YFP-CsCOL3/4/6/8/9, indicating that CsCOL3/4/6/8/9 were clearly localized in the nucleus of N. tabacum epidermal cells, similar to AtCO from A. thaliana [6] and PhalCOL from Phalaenopsis hybrida [24]. Based on this finding, we conclude that CsCOL3/4/6/8/9 are nucleus-localized proteins.
Expression Patterns of Five CsCOL Genes in Various Organs and Developmental Stages
Five CsCOL genes (CsCOL3/4/6/8/9) from the three groups were selected to analyze their tissue-specific expression patterns by qRT-PCR. The five CsCOL genes were detected in almost all organs (roots, pseudobulbs, leaves, sepals, petals, lips, columns and ovaries) at the initial flowering stage. All five genes were expressed mainly in leaves, and their expression was lowest in roots (Fig. 4). In addition to leaves, expression of the CsCOL genes was also high in floral organs, particularly in sepals or lips (Fig. 4).
To comprehensively compare the expression levels of the five CsCOL genes in various organs, a heat map was constructed from the qRT-PCR results (Figure S1). The expression of CsCOL3 in roots was set to 1, and the relative expression of the other genes was scaled accordingly. The gene expression profile was visualized with TBtools software [25]. Based on the color code, CsCOL3/4 in Group I were more highly expressed in all organs than the other CsCOL genes. CsCOL6/8 in Group III were mainly expressed in leaves, but their expression levels were lower than those of CsCOL3/4 in Group I, while CsCOL9 in Group II was specifically and highly expressed in leaves relative to other organs (Figure S1).
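The rescaling step used for the heat map, in which CsCOL3 expression in roots is set to 1, can be sketched as follows. This is only an illustration under the assumption that every value is divided by the single reference value; the source does not spell out how the other values were adjusted. The function `normalize_to_reference` and all numbers are hypothetical and are not measured data.

```python
def normalize_to_reference(
    expression: dict[str, dict[str, float]],
    ref_gene: str,
    ref_organ: str,
) -> dict[str, dict[str, float]]:
    """Rescale every value so that expression[ref_gene][ref_organ] becomes 1.

    Assumption: rescaling is a plain division by the reference value.
    """
    reference = expression[ref_gene][ref_organ]
    if reference <= 0:
        raise ValueError("reference expression must be positive")
    return {
        gene: {organ: value / reference for organ, value in organs.items()}
        for gene, organs in expression.items()
    }


# Hypothetical relative expression values, not taken from the study
toy_expression = {
    "CsCOL3": {"root": 2.0, "leaf": 8.0},
    "CsCOL9": {"root": 0.5, "leaf": 4.0},
}

scaled = normalize_to_reference(toy_expression, "CsCOL3", "root")
for gene, organs in scaled.items():
    print(gene, organs)
```

After rescaling, all genes and organs share one scale, so values can be compared directly across rows and columns of a heat map.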
To analyze the expression patterns of the five CsCOL genes at different developmental stages, their expression levels were measured at four representative stages: vegetative growth (VG), flower bud differentiation (FD), pedicel development (PD), and initial flowering (IF) (Fig. 5). The expression levels of CsCOL3/4/9 gradually increased from VG to IF and peaked at the IF stage, except that CsCOL4 expression was lowest at the FD stage. Expression of CsCOL6/8 was highest at the PD stage and began to decrease at IF. Moreover, CsCOL6 was barely expressed during VG but was strongly expressed at the other floral developmental stages. These results suggest that the five CsCOL genes might play different roles at different floral developmental stages of C. sinense.
Expression Patterns of Five CsCOL Genes in Different Photoperiods
To further study the photoperiodic rhythm of the five CsCOL genes, their expression patterns in leaves under different photoperiods were analyzed by qRT-PCR. As shown in Fig. 6, the diurnal oscillation of the five CsCOL genes exhibited three patterns after LD or SD treatment. CsCOL3/6 expression exhibited similar diurnal fluctuations with a single peak in the first 24 h in LD: it was lowest after 4 h of light, peaked after 4 h of darkness, and then gradually decreased during the remainder of the first 24-h period. Circadian expression in the second 24-h period was similar to that in the first 24-h period. CsCOL3/6 expression patterns during 48 h in constant light were similar to those in LD. CsCOL3/6 expression in SD also exhibited similar diurnal fluctuations with a single peak in the first 24 h. Expression increased in light, peaked after roughly 4 h of darkness, and then decreased until 24 h after dawn. In the subsequent 48 h of constant light, fluctuations in expression repeated the pattern in SD. CsCOL3/6 expression was higher in SD than in LD, suggesting that CsCOL3/6 were more strongly induced in SD than in LD. The circadian expression patterns of CsCOL4/9 showed no significant differences between LD and SD. Their expression was repressed in light, lowest at dusk, and increased dramatically in the dark, peaking at dawn. In continuous light, the rhythm of CsCOL4/9 expression was repeated, similar to LD or SD. These results suggest that the diurnal expression rhythm of CsCOL4/9 was not affected by the duration of light. The expression of CsCOL8 in SD exhibited diurnal fluctuations and peaked twice in a 24-h period. Expression was gradually up-regulated and first peaked at dusk, while the second peak occurred at 16 h in the dark in SD. The expression of CsCOL8 in LD was more erratic, and its expression level was higher in SD than in LD in the subsequent 48 h of constant light. These results suggest that CsCOL3/4/6/8/9 respond differently to photoperiod and may have different functions in photoperiodic regulation in C. sinense.
Overexpression of CsCOL3/4/6/8 in A. thaliana Affects Flowering and Growth
The phylogenetic analysis showed that CsCOL3/4 were grouped with CsCOL1 in Group I (Fig. 1). CsCOL1 promotes early flowering in A. thaliana (Zhang et al., 2020). CsCOL5/6/7/8 were grouped with AtCOL9 in Group III (Fig. 1). AtCOL9 delays flowering in LD (Cheng et al., 2005). Moreover, the second B-box domain differed between CsCOL3 and CsCOL4, and between CsCOL6 and CsCOL8 (Fig. 1). Consequently, we first selected CsCOL3/4/6/8 for functional analysis. The 35S::CsCOL vectors were constructed and transformed into A. thaliana, and three independent T2 transgenic lines per construct were randomly selected to examine flowering time in LD and SD. We confirmed that transgenic lines carrying an empty vector did not differ significantly from wild type (WT) plants. All transgenic lines showed high expression levels of CsCOL3/4/6/8 in A. thaliana (Figure S2).
Three CsCOL3 transgenic lines (CsCOL3-ox1/2/3) showed an early flowering phenotype under LD, with shortened flowering time (Fig. 7A, 7C). In contrast, overexpression of CsCOL3 in A. thaliana had no significant effect on flowering time in SD, but induced more inflorescences (Fig. 7B, 7C). Compared with WT plants, the number of rosette leaves in transgenic lines decreased significantly under both LD and SD (Fig. 7D). The expression levels of AtCO and AtFT were significantly upregulated in 35S::CsCOL3 transgenic plants under both LD and SD, relative to WT (Fig. 7E, 7F).
The 35S::CsCOL4 transgenic lines (CsCOL4-ox1/2/3) had an early flowering phenotype under both LD and SD (Fig. 8A, 8B). Compared with WT, flowering time was reduced significantly in transgenic plants under LD and SD (Fig. 8C). The number of rosette leaves at bolting in transgenic plants was reduced significantly under LD, but increased under SD (Fig. 8D). The expression levels of AtCO and AtFT increased significantly in 35S::CsCOL4 transgenic A. thaliana under both LD and SD (Fig. 8E, 8F).
Overexpression of CsCOL6 in A. thaliana did not promote early flowering under LD (Fig. 9A), but produced an early flowering phenotype under SD (Fig. 9B). Under LD, neither flowering time nor the number of rosette leaves differed significantly between 35S::CsCOL6 transgenic and WT plants (Fig. 9C, 9D). Under SD, however, flowering time was reduced and the number of rosette leaves at bolting increased significantly in transgenic plants (Fig. 9C, 9D). The expression levels of AtCO and AtFT were significantly up-regulated in 35S::CsCOL6 transgenic A. thaliana under SD (Fig. 9E, 9F), consistent with the phenotypic results.
Overexpression of CsCOL8 in A. thaliana produced different flowering phenotypes under LD and SD: a late flowering phenotype in LD (Fig. 10A), but earlier flowering in SD (Fig. 10B). Compared with WT plants, flowering time in transgenic lines increased under LD and was reduced under SD (Fig. 10C). However, the number of rosette leaves in transgenic lines increased under both LD and SD (Fig. 10D). The expression level of AtCO was decreased in 35S::CsCOL8 transgenic plants under both LD and SD, relative to WT (Fig. 10E), whereas the expression level of AtFT increased only under SD (Fig. 10F).