Genome-wide scanning of CCT genes
Based on the genome assembly and annotation of the wheat cultivar Chinese Spring (CS), candidate wheat CCT genes were identified by searching with the CCT HMM profile in the HMMER 3.0 package. After removing redundant sequences, verifying candidates with Batch CD-Search and FGENESH+, and complementing the gene set by BLAST searches, we obtained a total of 127 candidate sequences. These genes were named according to the available sequences of Brachypodium, rice, and wheat [22, 38]. This yielded 54 CO/COLs, 46 CMFs, 15 PRRs, and 12 Zinc-finger protein expressed in Inflorescence Meristem genes (ZIMs), belonging to 18 CO/COL, 13 CMF, 5 PRR, and 4 ZIM clusters, respectively. Among them, the TaCMF6 and TaCMF8 clusters contain 7 and 6 members, respectively. In this nomenclature, TaCMF8-A2, for example, denotes the second copy of the eighth CMF gene in the A genome of T. aestivum. The encoded proteins range in length from 170 (TaCMF14-D) to 763 (TaPRR73-D) amino acids, with predicted molecular weights from 19.48 kDa (TaCMF14-D) to 83.21 kDa (TaPRR73-D) and theoretical isoelectric points from 4.32 (TaCO13-B) to 10.22 (TaCMF14-B). Most of the 127 proteins (111) were predicted to localize to the nucleus, whereas only 11 were predicted to be chloroplastic, 3 cytoplasmic, and 2 extracellular (Table S1). This predominantly nuclear localization is consistent with the proposed role of CCT proteins as transcriptional regulators of flowering.
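The naming convention can be made explicit with a small parser. This is a hypothetical helper, not part of the original pipeline. It assumes that a name consists of "Ta", a family prefix, and a number, optionally followed by a hyphen, a subgenome letter (A, B, D, or U for unassigned, as in TaPPD1-U), and a copy number. It also assumes that a missing copy number means the first copy.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: "Ta" + family prefix + number, then optionally
# "-" + subgenome letter (A, B, D, or U = unassigned) + optional copy number.
NAME_RE = re.compile(
    r"^Ta(?P<family>CO|CMF|PRR|ZIM|PPD|TOC)(?P<number>\d+)"
    r"(?:-(?P<subgenome>[ABDU])(?P<copy>\d+)?)?$"
)


class CCTName(NamedTuple):
    family: str               # e.g. "CMF"
    number: str               # gene number, kept as text (e.g. "73" in TaPRR73)
    subgenome: Optional[str]  # "A", "B", "D", "U", or None if there is no suffix
    copy: int                 # copy within the subgenome; assumed 1 when omitted


def parse_cct_name(name: str) -> CCTName:
    """Split a wheat CCT gene name into its components."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized CCT gene name: {name!r}")
    copy = int(m.group("copy")) if m.group("copy") else 1
    return CCTName(m.group("family"), m.group("number"), m.group("subgenome"), copy)


print(parse_cct_name("TaCMF8-A2"))
# CCTName(family='CMF', number='8', subgenome='A', copy=2)
```

Names that fall outside the assumed pattern, such as the collective "TaZIM4-B/D", raise an error rather than being guessed.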
The phylogenetic reconstruction of CCTs
Phylogenetic reconstruction of the CCT proteins from wheat, Arabidopsis, Brachypodium, and rice used the UPGMA method with the Jones-Taylor-Thornton (JTT) model and 1,000 bootstrap replicates, and it classified the proteins into 8 groups. Consistent with previous studies, Groups I to III contain all of the CO/COLs, and we adopted the same group designations. Group I comprises 24 wheat, 7 rice, 7 Brachypodium, and 6 Arabidopsis members. Group II comprises 9 wheat, 3 rice, 3 Brachypodium, and 4 Arabidopsis members, and Group III comprises 21 wheat, 7 rice, 6 Brachypodium, and 7 Arabidopsis members. Although Group IV failed to cluster with Group I, we retained its previous designation, even though its members are regarded as CMFs because they lack a B-box domain [22, 36]. Group IV contains 9 wheat, 2 rice, 1 Brachypodium, and no Arabidopsis members. The remaining CMFs were assigned to Groups V and VI, except for AtCMF5 and AtCMF7, which were placed in Group VII together with the PRRs. Group V consists of 28 wheat, 9 rice, 7 Brachypodium, and 9 Arabidopsis members. Group VI consists of 9 wheat, 2 rice, 3 Brachypodium, and 4 Arabidopsis members, and Group VII consists of 15 wheat, 5 rice, 5 Brachypodium, and 7 Arabidopsis members. Interestingly, the PRRs were further divided into 3 clades, namely TOC1, PRR3/7, and PRR5/9. Group VIII contains all of the ZIMs (12 wheat, 4 rice, 6 Brachypodium, and 3 Arabidopsis members) and is roughly divided into the ZIM1/ZIM3 and ZIM2/ZIM4 clades (Fig. 1).
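The core of the UPGMA method can be illustrated as a short sketch on a precomputed distance matrix. The algorithm repeatedly merges the two closest clusters and then sets the distance from the merged cluster to every other cluster to the size-weighted mean of the two old distances. The sketch rests on several simplifying assumptions. It does not reproduce the JTT distance estimation or the 1,000 bootstrap replicates used for Fig. 1, it omits branch lengths, and the four-taxon matrix is hypothetical.

```python
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Cluster taxa by UPGMA and return the topology as a Newick string (no branch lengths)."""
    # Active clusters: id -> (Newick subtree, number of leaves)
    clusters: Dict[int, Tuple[str, int]] = {i: (n, 1) for i, n in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # 1. Find and merge the closest pair of clusters.
        (a, b), _ = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        new = next_id
        next_id += 1
        # 2. UPGMA update: the distance from the merged cluster to each remaining
        #    cluster is the size-weighted mean of the two old distances.
        for k in clusters:
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            d[(k, new)] = (d_ak * size_a + d_bk * size_b) / (size_a + size_b)
        clusters[new] = (f"({tree_a},{tree_b})", size_a + size_b)
    tree, _ = next(iter(clusters.values()))
    return tree + ";"


# Hypothetical distances between four sequences A-D
names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, dist))  # (D,(C,(A,B)));
```

UPGMA assumes a roughly constant evolutionary rate across lineages. The bootstrap support reported for a tree comes from rerunning this clustering on resampled alignments, which the sketch leaves out.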
The conserved domain and gene structure of CCTs in wheat
The conserved domains predicted by the Batch CD-Search tool, the motifs identified by the MEME suite, and the phylogenetic relationships of the wheat CCTs together show that domain architecture is consistent with phylogenetic grouping (Fig. 2). The CCT domain corresponds to MEME motif 1, and motifs 2 to 8 were identified in the same MEME analysis. In most proteins the CCT domain lies at the C-terminus, and it is absent only from TaCO13-B. Group VIII proteins (ZIMs), however, carry the CCT domain in the middle of the sequence, with an additional tify domain (motif 5) at the N-terminus and a Znf_GATA domain (motif 4) at the C-terminus. Group VII proteins (PRRs) carry a Response_reg domain (motif 6 plus motif 3) at the N-terminus. Group I to III proteins (CO/COLs) carry one or two B-box zinc finger domains (motif 2 or 7), and motif 2 is present in all of them. Specifically, TaCO13 and TaCO14 carry two B-boxes, whereas the remaining Group III proteins and all Group II proteins carry only one. Group I contains proteins with either one or two B-boxes, and for TaCO1 and TaCO7 the B-box predictions from the MEME suite and the Batch CD-Search tool differed noticeably (Fig. S1). Finally, Group IV to VI proteins (CMFs) contain no domain other than CCT, except for an unknown motif (motif 8) at the N-terminus of TaCMF6 in Group V.
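The domain architectures described above can be restated as a simple rule-based classifier. This hypothetical sketch does not reproduce the authors' procedure. It assumes that domain predictions are available as label strings ordered from N- to C-terminus, and the exact labels ("CCT", "tify", "Znf_GATA", "Response_reg", "B-box") are assumptions.

```python
from typing import Sequence


def assign_family(domains: Sequence[str]) -> str:
    """Assign a wheat CCT protein to a subfamily from its predicted domains.

    `domains` lists domain labels from N- to C-terminus; the label strings
    are assumed. The rules only restate the architectures described in the text.
    """
    if "tify" in domains and "Znf_GATA" in domains:
        return "ZIM"      # Group VIII: tify (N) - CCT (middle) - Znf_GATA (C)
    if "Response_reg" in domains:
        return "PRR"      # Group VII: Response_reg at the N-terminus
    if "B-box" in domains:
        return "CO/COL"   # Groups I-III: one or two B-boxes
    if "CCT" in domains:
        return "CMF"      # Groups IV-VI: CCT as the only annotated domain
    return "unassigned"


print(assign_family(["B-box", "B-box", "CCT"]))  # CO/COL
print(assign_family(["B-box"]))  # CO/COL: B-box present, CCT absent
```

Such rules assign family names by architecture only, whereas group numbers come from the tree. For example, AtCMF5 and AtCMF7 would be labeled CMF by these rules even though they fall in Group VII.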
The conserved domains of these genes are often interrupted by introns. For example, the tify domains usually span exons 1 and 2, the Znf_GATA domains span exons 5 and 6, and the Response_reg domains span the first three coding exons. The B-box domains, by contrast, always lie within the first coding exon without interruption. The CCT domains display distinct interruption patterns that coincide with the phylogenetic classification. In the ZIMs (Group VIII), the CCT domain spans the 3rd and 4th coding exons and is interrupted at the 32nd residue of the 43-amino-acid domain. The CCT domains of Group III proteins are interrupted at the 16th residue, and those of Groups V, VI, and VII just after the 22nd, 37th, and 20th residues, respectively. In these genes the CCT domain spans the last two exons, with two exceptions: in TaCMF4 and TaCMF10 (Group V) it spans the antepenultimate and penultimate exons, and in TaTOC1 (Group VII) it lies in the last exon. The remaining CCT genes, i.e., the members of Groups I, II, and IV, carry the CCT domain uninterrupted in the last exon (Fig. 2 and Fig. 3). Furthermore, alignment of the CCT domains revealed that this 43-amino-acid region is highly conserved in wheat. Eight residues (R1, K11, Y23, R26, A30, R35, G38, and F40) are invariant, whereas R15 and K27 are conserved in all proteins except TaCMF7 and TaCMF9, respectively. In addition, a PL insertion occurs upstream of L17 in TaCMF7 (Fig. 3 and Fig. S2).
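Invariant and nearly invariant residues such as those listed above can be found by scanning the columns of a multiple alignment. The sketch below is hypothetical and assumes the alignment is already computed. It numbers alignment columns, whereas the residue numbers in the text refer to positions within the 43-residue CCT domain. An insertion such as the PL in TaCMF7 would therefore shift column numbers unless they are mapped back to a reference sequence. The toy sequences are invented and are not real CCT domains.

```python
from collections import Counter
from typing import Dict, List, Tuple


def conserved_columns(aln: Dict[str, str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Find invariant columns and columns identical in all sequences but one.

    Columns are labelled by the consensus residue and 1-based alignment
    position (e.g. "R1"); gap-dominated columns are skipped.
    """
    names = list(aln)
    seqs = [aln[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    invariant: List[str] = []
    all_but_one: List[Tuple[str, str]] = []
    for col in range(length):
        residues = [s[col] for s in seqs]
        top, count = Counter(residues).most_common(1)[0]
        if top == "-":
            continue
        label = f"{top}{col + 1}"
        if count == len(seqs):
            invariant.append(label)
        elif count == len(seqs) - 1:
            # Name the single sequence that deviates at this column
            outlier = next(n for n, r in zip(names, residues) if r != top)
            all_but_one.append((label, outlier))
    return invariant, all_but_one


toy = {"toy1": "RKYAGF", "toy2": "RKYSGF", "toy3": "RQYAGF"}
print(conserved_columns(toy))
# (['R1', 'Y3', 'G5', 'F6'], [('K2', 'toy3'), ('A4', 'toy2')])
```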
The number of exons per gene also varies widely, from 1 to 9. Most Group I, II, and IV genes contain 2 exons, with TaCO8 (Group I) and TaCO12 (Group II) as exceptions. The Group VI genes, together with the Group I gene TaCO7, contain 3 exons. Group III genes contain 4 exons, except for TaCO13-B. Groups V, VII, and VIII show the most variable exon numbers, ranging from 2 to 5, 6 to 9, and 7 to 8, respectively (Fig. 2, Table S1).
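Exon numbers like those summarized in Table S1 are usually derived from a gene annotation file. This section does not state the annotation format used. The following sketch assumes a GFF3 file in which each exon record names its transcript in the Parent attribute, and it counts exons per transcript. The transcript IDs in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def exons_per_transcript(gff_lines: Iterable[str]) -> Dict[str, int]:
    """Count exon features per parent transcript in GFF3 lines."""
    counts: Counter = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # only nine-column exon records are counted
        # Column 9 holds tag=value pairs separated by ';'
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        # An exon may list several parents, separated by ','
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                counts[parent] += 1
    return dict(counts)


example = [
    "##gff-version 3",
    "chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=g1",
    "chr1\tdemo\texon\t100\t300\t.\t+\t.\tParent=tx1",
    "chr1\tdemo\texon\t500\t900\t.\t+\t.\tParent=tx1",
    "chr2\tdemo\texon\t10\t400\t.\t-\t.\tParent=tx2",
]
print(exons_per_transcript(example))  # {'tx1': 2, 'tx2': 1}
```

When a gene has several isoforms, a per-gene exon number depends on which transcript is chosen as representative. That choice is not specified here.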
Genome distribution analysis
The distribution of the 127 CCT genes was further analyzed across the 21 wheat chromosomes, which range in length from 474 Mb to 831 Mb. Chromosome 3 carries the fewest CCT genes, with only one on each subgenome, followed by chromosome 2 with three. Chromosomes 4B, 4D, and 7D carry the most (10 each), followed by chromosomes 7A and 7B (9 each). Interestingly, TaCMF6 and TaCMF8 each have more than one copy in every subgenome, and these copies are tandemly arranged near the telomeres. By contrast, TaCMF13 and TaCMF14 are tandemly arranged on chromosome 5 but do not belong to the same cluster. TaCMF13 is instead more closely related to TaCMF15 on chromosome 4 (Fig. 1). Its pairwise identity is 63.58% with TaCMF15 but only 43.37% with TaCMF14. Furthermore, the gene order on chromosome 4A differs somewhat from that on chromosomes 4B and 4D. The transposition of TaCMF4 and TaCMF8 from chromosome 4A to chromosome 5A also points to a recent recombination between chromosomes 4A and 5A near the telomeres. Finally, there are three TaPPD1 genes, and two of them are designated TaPPD1-A and TaPPD1-D, so TaPPD1-U is likely the B-genome copy of TaPPD1. Likewise, the inconsistent genomic arrangement of TaCO20 suggests that TaCO20-U may be located in the B genome and that another recombination event has occurred (Fig. 4).
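Pairwise identity values such as the 63.58% and 43.37% above depend on how identity is defined. The alignment tool and definition used here are not stated. The sketch below assumes one common definition: identical positions divided by the aligned positions at which neither sequence has a gap. The example sequences are hypothetical, and the sketch does not recompute the reported values.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two sequences that are already aligned (gaps as '-').

    Assumed definition: identical positions / positions where neither
    sequence has a gap, times 100. Other tools also count gap columns in
    the denominator, which gives lower values.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("MKT-LV", "MRTALV"))  # 80.0
```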
The expression pattern of CCT genes under vernalization
To clarify the potential functions of CCT genes during vernalization in wheat, we performed a transcriptome analysis of leaf tissue from the winter wheat cultivar Shiluan 02-1, collected before, during, and after vernalization. Group II and VI genes showed no significant change in expression during vernalization. Group VI genes remained at consistently low levels, whereas Group II genes, particularly TaCO10, were highly expressed throughout. Within Group III, TaCO13, TaCO16, and TaCO15 showed the highest expression, with TaCO16 upregulated and TaCO15 downregulated during vernalization. Another Group III gene, TaCO18, which is barely expressed under normal conditions, was also upregulated during vernalization. Group I contained the most differentially expressed CCT genes. TaCO3, TaCO4, and TaCO6 showed the highest expression levels, with TaCO6 upregulated, whereas TaCO2 and TaCO8 were barely expressed and TaCO1 was downregulated. Although Group IV contains six copies of TaCMF8, only TaCMF8-B showed slight expression, and no expression data were obtained for TaCMF8-B2. By contrast, its paralog TaCMF11 was continuously upregulated during vernalization. Most Group V genes were expressed at low levels, except for TaCMF6, which was significantly upregulated under low temperature. The PRR genes (Group VII), except TaPPD1, were expressed at relatively high levels throughout, with TaPRR95 and TaPRR73 upregulated. TaZIM4-A was the only highly expressed ZIM gene (Group VIII); however, no expression data were obtained for TaZIM4-B and TaZIM4-D (Fig. 5).
To identify significant expression changes of CCTs during vernalization, we compared expression levels at each time point during and after vernalization with those before vernalization. In total, 49 genes were upregulated and 31 downregulated at one or more time points. Of the upregulated genes, 8 were upregulated continuously, 11 at most time points, and 10 only transiently. TaCMF6, TaCMF11, TaCO18, TaPRR95, and TaCO16 were the most significantly upregulated genes and remained upregulated even after vernalization. In contrast, only 2 genes were continuously downregulated, 7 were downregulated at most time points, and 11 only transiently. TaCO1 and TaCO15 showed the most significant downregulation, which persisted even after vernalization (Fig. 6, Table S3).
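The grouping of genes into continuous, most-of-the-time, and transient responses can be sketched as follows. The cut-offs and category boundaries are not given in this section. This hypothetical sketch therefore assumes a change is called when |log2 fold change| ≥ 1 and the adjusted p-value is below 0.05. It also treats "most time points" as more than half but not all of them.

```python
from typing import Dict, Optional, Sequence


def duration_label(flags: Sequence[bool]) -> Optional[str]:
    """Summarize at how many time points a change was called."""
    hits = sum(flags)
    if hits == 0:
        return None
    if hits == len(flags):
        return "continuous"
    if hits > len(flags) / 2:
        return "most time points"
    return "transient"


def classify_response(log2fc: Sequence[float], padj: Sequence[float],
                      fc_cut: float = 1.0, p_cut: float = 0.05) -> Dict[str, Optional[str]]:
    """Classify up- and downregulation of one gene across time points.

    Each value compares one time point during/after vernalization with the
    pre-vernalization sample. The cut-offs are illustrative defaults.
    """
    if len(log2fc) != len(padj):
        raise ValueError("log2fc and padj must have one value per time point")
    up = [f >= fc_cut and p < p_cut for f, p in zip(log2fc, padj)]
    down = [f <= -fc_cut and p < p_cut for f, p in zip(log2fc, padj)]
    return {"up": duration_label(up), "down": duration_label(down)}


# Hypothetical gene measured at four time points
print(classify_response([2.0, 1.5, 0.2, 1.2], [0.01, 0.01, 0.6, 0.02]))
# {'up': 'most time points', 'down': None}
```

The function reports up- and downregulation separately because a gene can be upregulated at one time point and downregulated at another.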
Expression analysis using real-time PCR
To validate the expression profiles of the vernalization-related CCT genes, we performed real-time PCR on the most differentially expressed genes and on the most widely studied CCTs. TaCMF6, TaCMF11, TaCO18, and TaPRR95 were significantly upregulated and TaCO16 was slightly upregulated during vernalization, consistent with the RNA-sequencing results. Surprisingly, their expression rapidly returned to pre-vernalization levels as soon as the temperature increased. In contrast, the remaining genes were downregulated during continuous exposure to low temperature and stayed at low levels even after vernalization. Interestingly, expression of the other downregulated genes decreased gradually during vernalization, whereas that of TaCMF8 and TaCO1 dropped rapidly to remarkably low levels (Fig. 7).
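This section does not state how the real-time PCR data were quantified. A widely used option is the comparative 2^-ΔΔCt method, sketched below under that assumption. The sketch uses a single reference gene and the pre-vernalization sample as calibrator, and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target: float, ct_ref: float,
                     ct_target_cal: float, ct_ref_cal: float) -> float:
    """Relative expression by the 2^-ddCt method.

    ct_target / ct_ref: Ct values of the target and reference gene in the sample.
    ct_target_cal / ct_ref_cal: the same genes in the calibrator sample
    (assumed here to be leaves collected before vernalization).
    Assumes both assays amplify with ~100% efficiency.
    """
    d_ct_sample = ct_target - ct_ref       # normalize the sample to the reference gene
    d_ct_cal = ct_target_cal - ct_ref_cal  # normalize the calibrator the same way
    dd_ct = d_ct_sample - d_ct_cal
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: after normalization the target crosses threshold 3 cycles earlier
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
```

If the amplification efficiencies of the target and reference assays differ markedly, an efficiency-corrected model would be more appropriate than this simple form.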