Tracing the origin, evolutionary history, and biological functions of CKI genes: a focus on Gossypium spp

Background: Casein kinase I (CKI) is a serine/threonine protein kinase that is highly conserved in plants and animals. Although the molecular functions of individual CKI family members have been investigated in Arabidopsis, little is known about the origin and evolutionary history of the family in the plant kingdom. Results: In this study, seven representative plant species (with a focus on cotton) were used to study gene family evolution and characterize the origin of CKI genes. Three important insights were gained: (i) the ancestral CKI genes were traced back to 250 million years ago, and the family expanded in different plant species through independent genome duplication events; (ii) the CKI genes were classified into two types on the basis of their structural characteristics; (iii) expression profile analysis revealed that cotton CKI genes had various expression patterns in different tissues and exhibited inducible expression in response to photoperiod (circadian clock), light signal, and heat stress during cotton anther development. Conclusion: This study provides genome-wide insights into the evolutionary history of cotton CKI genes and lays a foundation for further investigation of their roles in specific developmental processes and/or environmental stress conditions.

GhCKI9A/D, GhCKI12A/D, GhCKI13A/D, GhCKI19A/D, GhCKI26A/D, GhCKI28A/D, GhCKI29A/D, GhCKI30A/D, and GhCKI31A/D were very high in anthers. The transcripts of some other genes, such as GhCKI4A/D, GhCKI14A/D, GhCKI20A/D, and GhCKI27A/D, were preferentially expressed in leaves. These results provide additional insight into their roles during different growth and development processes in cotton.

Plant-specific Casein Kinase 1, is critical for maintaining proper flowering time [19]. In cotton, GhCKI was speculated to regulate not only tapetal programmed cell death (PCD) and anther dehiscence [20], but also somatic embryogenesis by modulating auxin homeostasis [21]. Since the functions of most CKI genes in higher plants are largely unknown, their identification and characterization are particularly relevant for understanding their roles and for potentially employing them as genetic resources to improve crop defense against biotic and abiotic stresses. In cotton, no genome-wide characterization of the CKI gene family has been reported so far. However, the recently published genome sequences of cotton species provide a solid foundation for characterizing CKI genes at a genome-wide level. Here we investigated several fundamental questions regarding the evolution of the CKI gene family: (i) the evolutionary expansion of the CKI gene family; (ii) the diversity of gene structure and domain architecture; and (iii) the expression profiles of the CKI genes under different conditions. In summary, we retraced the evolution of the CKI genes to better understand their essential elements and thus be able to exploit this knowledge to improve plant growth and development.

Identification and Classification of the Casein Kinase I (CKI) in Gossypium
To extract CKI sequences, BLASTP searches of the complete genomes of three sequenced cotton species (G. raimondii, G. arboreum, and G. hirsutum acc. TM-1) [22-24] were performed using a homology-based method, with 13 Arabidopsis CASEIN KINASE 1-LIKE (CKL) proteins and HEADING DATE 16 (Hd16, a CKI protein) as queries. As a result, 31, 30, and 61 CKI members were identified in G. raimondii (D genome), G. arboreum (A genome), and G. hirsutum (AD genome), respectively (Table 1). We named the 31 G. raimondii CKI genes GrCKI1 to GrCKI31. Considering that G. hirsutum is an allotetraploid cotton species containing both A and D genomes, we named the 61 putative G. hirsutum CKI genes GhCKI1A/D to GhCKI31A/D, following the same nomenclature system applied to the G. raimondii CKI genes. To better understand the phylogenetic relationships between CKIs, an unrooted phylogenetic tree was generated using the CKI protein sequences from G. raimondii, G. arboreum, and G. hirsutum acc. TM-1 (Figure 1). The CKIs were classified into type I and type II. Most CKI genes have one homolog in the diploid G. raimondii, one in G. arboreum, and two copies in the tetraploid G. hirsutum acc. TM-1 (Table 1). The inconsistent presence of single homologous genes among these cotton species might result from gene gains or losses during their individual evolution or from assembly errors in some chromosomal regions, which needs to be further confirmed.
Tetraploid cotton species such as G. hirsutum L. are thought to have formed by a polyploidization event that occurred approximately 1-2 million years ago and involved D and A genome species (Wendel, 1989) [25]. The sequenced D-progenitor genome (G. raimondii) has been well assembled and annotated, and its collinearity with the G. hirsutum acc. TM-1 genome is evident [22,24]. Consequently, G. raimondii genome information was used to characterize the CKI family genes. The phylogenetic tree was independently constructed using the MEGA 6 software (Figure S1). All G. raimondii CKI proteins fell into two distinct groups, which was consistent with the results of Figure 1. The type I CKI proteins were further divided into three subclasses (Groups A, B, and C), and the type II CKI proteins were classified into two subclasses (Groups D and E) (Figure S1 and Table S1).

Gene Structure, Conserved Motifs and Domains of CKI Genes in G. raimondii
Gene structure is often related to function. Hence, the structure of the CKI proteins in G. raimondii was investigated. The putative conserved domains of GrCKI proteins were identified using the online program Conserved Domain Search Service (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). The type I GrCKI proteins were highly conserved within their kinase domains, but differed significantly in the length and primary structure of their N-terminal (8-53 aa) and C-terminal domains (176 aa up to more than 206 aa) (Figure 2a, Figure S2a). The structure of type I GrCKI proteins was consistent with previous reports [3,4,26]. The type II GrCKI proteins also presented conserved kinase domains. However, in contrast to the type I GrCKI proteins, type II proteins possessed a variable N-terminal region (1-159 aa) and a conserved C-terminal region (278-355 aa) (Figure 2b and Figure S2b). In addition, several short sequences are absolutely conserved among the CKIs and not found in other kinases. To confirm whether these conserved fragments are present in type I and type II GrCKI proteins, sequence alignment was performed. The result showed that type I GrCKI proteins possess the four short sequences LLGPSLEDLF, HIPXR, EXSRRDD, and LPWQGLKA (Figure 2c). Type II GrCKI proteins contained shorter versions of three of the four conserved sequences (LGPSL, SRRDD, and LPWQG). In addition, two specific fragments, LGKGGFGQV and HGDVKPEN, were present in the type II GrCKI proteins only. Overall, LGPSL, SRRDD, and LPWQG appeared in both type I and type II GrCKI proteins.
The 3D structure of GrCKI proteins was predicted with the software SWISS-MODEL (https://swissmodel.expasy.org) [27]. The 3D structure of type I GrCKIs was conserved and essentially the same for all 16 of them (Figure S3). However, the 3D structures of type II GrCKIs were variable: only some proteins showed similar structures (GrCKI17, GrCKI18, and GrCKI21; GrCKI24, GrCKI25, and GrCKI29). To obtain more insight into the diversity of motif compositions among GrCKI proteins, the conserved motifs in the C-terminal region were further analyzed with the program MEME. As shown in Figure 2d, the conserved motifs 1-8 were identified. Based on the motif composition, GrCKI proteins were divided into five groups (A-E): type I GrCKI proteins were divided into groups A to C, and type II GrCKI proteins were classified into groups D and E.
Most of the type I GrCKI proteins within the same group shared similar motif compositions, while high divergence was observed among different groups, implying that the type I GrCKI members within the same group may perform similar functions and that some motifs may play an important role in the specific functions of the group. For example, all the type I GrCKI proteins in Group A except GrCKI5 possessed motifs 1, 2, 3, 4, and 5, while all members in Group C contained motifs 1 and 4 (Figure 2d). GrCKI22 and GrCKI23, which were type II GrCKI proteins, had short sequences and thus showed no motif. Except for GrCKI20 and GrCKI29, which contained seven motifs, the vast majority of type II GrCKI proteins shared a similar composition of eight motifs. Generally, the consistency of the motif compositions of GrCKI proteins with the phylogenetic groups further supported the close evolutionary relationships among GrCKIs, as well as the reliability of our phylogenetic analysis. To better understand the diversification of type I and type II GrCKI genes in G. raimondii, the exon/intron organization was analyzed. As expected, most GrCKI genes within the same group showed very similar exon/intron distribution patterns in terms of exon length and intron number (Figure S4). For example, most type I GrCKI genes in groups A, B, and C had thirteen or fourteen exons of similar length, whereas type II GrCKI genes within Groups D and E contained sixteen or seventeen exons, except for GrCKI22 (Gorai.013G175600.1), GrCKI23 (Gorai.006G042700.1), and GrCKI29 (Gorai.007G343500.1), which possessed one, one, and fourteen exons, respectively. Thus, members belonging to the same group showed both similar exon/intron organization and similar motif composition, indicating their functional similarities. The diverse evolutionary patterns in exon numbers of the CKI genes may also hint at their functional diversification in gene expression. These results further supported the classification into type I and type II GrCKI genes.

The CKI Gene Family Mainly Expanded through Genome Duplication
To examine the evolutionary history of the CKI genes, we first carried out a phylogenetic analysis of the eudicots (G. raimondii and Arabidopsis) and a monocot (rice) (Figure S5). The CKI genes from the eudicots (G. raimondii and Arabidopsis) and the monocot (rice) were present in all subgroups (Table S1), indicating that the appearance of most CKI genes in plants preceded the divergence of monocots and eudicots. The phylogenetic analysis also showed that the CKI members from different plant species were not evenly distributed among the subgroups. Some CKI genes in Arabidopsis have two or more counterparts in G. raimondii. For example, Group D contained five G. raimondii CKIs but only one Arabidopsis member, and Group E contained ten G. raimondii CKIs but only three Arabidopsis CKI genes. These findings may indicate that the CKIs from different plant species underwent differential expansion, though the mechanism is still not understood. Plant evolution is characterized by genome duplication events, which in turn resulted in the expansion of many gene families. Four duplication events have been detected in cotton from the early period of angiosperm evolution to the stage of tetraploid cotton species: an ancient whole genome duplication (WGD) very early in angiosperm evolution, one triplication event, the recent WGD event specific to cotton, and the tetraploid event [22,25,28]. Previous reports calculated the absolute dates for the large-scale gene duplications using an assumed clock-like rate of synonymous substitution of 2.6 × 10^-9 substitutions/synonymous site/year [29]. Hence, based on Ks values, we calculated the divergence time of homologous CKI genes in G. raimondii. Comparing the result with the estimated time of the WGD suggests that there were three ancestral CKI genes for type I and two for type II. During the genome triplication event shared by flowering plants, one of the type II CKIs was duplicated into three copies (Figure 3a).
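As a rough illustration of the dating step, the sketch below converts a Ks value into a divergence time. It assumes the commonly used relation T = Ks / (2r); the text states only the rate r (2.6 × 10^-9 substitutions per synonymous site per year) and that Ks values were used, so the relation, the function name, and the example Ks values are assumptions for illustration rather than the authors' exact procedure.

```python
# Assumed relation (not stated in the text): T = Ks / (2 * r),
# where r is the synonymous substitution rate per site per year.
SYNONYMOUS_RATE = 2.6e-9  # rate quoted in the text


def divergence_time_mya(ks: float, rate: float = SYNONYMOUS_RATE) -> float:
    """Return the estimated divergence time in million years for a Ks value."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    years = ks / (2.0 * rate)
    return years / 1e6


if __name__ == "__main__":
    # Hypothetical Ks values, not taken from the study.
    for ks in (0.05, 0.52, 1.0):
        print(f"Ks = {ks:.2f} -> {divergence_time_mya(ks):.1f} million years")
```

Under this assumed relation, the time scales linearly with Ks, so the estimate is only as reliable as the clock-like rate assumption.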
After the subsequent ancient triplication event, the three type I CKI genes and one of the type II genes in Group D tripled. Interestingly, only Group A of the type I CKI genes showed triplication; the rest of the CKI genes only duplicated. It is also possible that they all underwent triplication but then lost some copies, though this is difficult to determine. However, in all these events the type II CKI genes in Group E did not undergo duplication at all (Figure 3a). G. raimondii and Theobroma cacao originated from a common ancestor 18-58 million years ago [23]. Because of this, the numbers of type I and type II CKI genes should match those of the common ancestor of G. raimondii and T. cacao. The CKI gene numbers in T. cacao were consistent with this speculation (Figure 3b and 3c; Table S1). Also, each CKI gene in T. cacao had one or more counterparts in G. raimondii. Both type I and type II CKI genes conformed to this rule. After the species divergence, both types underwent duplication events. Furthermore, because of the tetraploid event, one type II CKI gene in Group E duplicated and resulted in two different genes, Gorai.006G042700.1 and Gorai.013G175600.1. In short, the CKI subgroups showed various degrees and patterns of duplication, probably associated with the many known WGD events that greatly expanded the number of genes in seed plants.

Expression Profiles of CKI Genes in Different Tissues from G. hirsutum
Several findings suggested that CKI genes are involved in plant development [16,17,20]. To further associate the biological functions of CKI genes with specific developmental processes in cotton, the expression profiles in different organs/tissues (including roots, stems, leaves, petals, anthers, and 5 DPA [days post anthesis] ovules) were examined by quantitative RT-PCR (qRT-PCR). Because of the high sequence similarity between GhCKIA and GhCKID cDNAs, we designed one common primer pair for analyzing CKIA/D gene expression. After the specificity of each primer pair was verified, suitable qRT-PCR primer pairs for 44 (26 type I CKI genes and 18 type II CKI genes) of the 61 CKI genes (33 type I CKI genes and 28 type II CKI genes) were obtained (Table S2). As shown in Figure S6, the majority of the type I and type II CKI genes exhibited tissue-dependent expression. GhCKI2A/D, GhCKI3A/D, GhCKI8A/D, GhCKI10A/D, GhCKI11A/D, GhCKI14A/D, GhCKI15A/D, GhCKI18A/D, GhCKI20A/D, and GhCKI27A/D were constitutively expressed in every tested tissue, implying that these genes may play regulatory roles at multiple developmental stages. In addition, some genes were highly expressed in specific tissues only.

CKI Genes Respond to High Temperature during Cotton Anther Development Process
CKI genes exhibited the highest expression in cotton anthers, except for GhCKI14A/D and GhCKI27A/D (Figure S6). Our previous studies showed that one member of the CKI gene family, GhCKI (Gh_A07G0121/GhCKI11A), was induced in H05 (the high temperature (HT)-sensitive line) anthers, but not in 84021 (the HT-tolerant line) anthers under HT conditions [20]. Genome-wide analyses of G. hirsutum CKI genes in response to HT during anther development may lay a foundation for further understanding the mechanisms involved in HT tolerance or HT sensitivity.
In the present study, a heat map representing expression profiles was produced using transcriptomic data covering three anther development stages (TS, tetrad stage; TDS, tapetal degradation stage; ADS, anther dehiscence stage) of 84021 and H05 under HT and normal temperature (NT) [30]. It was observed that 34 of 61 GhCKI genes (19 from type I and 15 from type II) were differentially expressed in 84021 and H05 under HT and NT, even after filtering with an absolute log2 ratio threshold of ≥ 1 (Figure 4a). We found that most genes were more highly expressed in H05 after HT exposure, such as the type I genes GhCKI10 and GhCKI11 and the type II genes GhCKI20 and GhCKI27 (Figure 4a). To verify the result, qRT-PCR experiments were also performed (Figure 4b), but no obvious differences were observed after HT treatment in anthers, consistent with the RNA-seq data. These results indicated that HT influenced the expression of both type I and type II GhCKI genes during anther development.
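The filtering criterion mentioned above (an absolute log2 ratio of at least 1) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the pseudocount, the function names, and the gene names and expression values are all hypothetical.

```python
import math


def log2_ratio(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 of (treated + pseudocount) / (control + pseudocount).

    The pseudocount is an assumption added here to avoid division by zero.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


def differentially_expressed(expr: dict, threshold: float = 1.0) -> dict:
    """Keep genes whose |log2 ratio| between HT and NT meets the threshold.

    expr maps a gene name to a (HT value, NT value) tuple.
    """
    hits = {}
    for gene, (ht, nt) in expr.items():
        ratio = log2_ratio(ht, nt)
        if abs(ratio) >= threshold:
            hits[gene] = ratio
    return hits


if __name__ == "__main__":
    # Hypothetical expression values (HT, NT); not data from the study.
    example = {"geneA": (31.0, 7.0), "geneB": (10.0, 9.0), "geneC": (1.0, 7.0)}
    for gene, ratio in differentially_expressed(example).items():
        print(gene, round(ratio, 2))
```

A threshold of 1 on the log2 scale corresponds to at least a two-fold change in either direction, which is why both up-regulated and down-regulated genes pass the filter.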

Circadian Rhythm and Light Signal Regulation of GhCKI Gene Expression
To determine whether the expression of cotton CKI genes is regulated by photoperiod (circadian clock), the transcription levels of G. hirsutum CKI genes under different diurnal conditions were investigated. Thirty-six G. hirsutum CKI genes were expressed at a sufficient level to evaluate their circadian regulation (Figure 5). To further investigate whether CKIs are involved in light signaling, the expression of G. raimondii (Figure S7) and G. hirsutum (Figure 6) CKI genes in cotyledons under light and dark conditions was examined by qRT-PCR. The expression of most CKI genes in both G. raimondii and G. hirsutum was up-regulated under light, except for GhCKI19A/D (Figure 6), GrCKI6, GrCKI14, GrCKI25, and GrCKI26 (Figure S7). However, there was only a slight difference between the two types of CKI genes. The results indicated that both type I and type II CKI genes might be involved in plant light signaling.

Discussion
Structural Characteristics of G. raimondii CKI genes
In animals, certain characteristics of CKI that influence its activity have been identified at the protein level: structure-related regulation, subcellular localization, interaction with other proteins, and posttranslational modifications [31]. As CKI is a member of the superfamily of serine/threonine-specific kinases, its phosphorylation function is a priority to focus on. However, the characteristics of phosphorylation and sequences of plant CKI genes were not systematically analyzed in this study. Instead, we systematically identified CKI genes in the eudicots (Arabidopsis thaliana, Theobroma cacao, G. raimondii, G. arboreum, and G. hirsutum acc. TM-1) and in one monocot (Oryza sativa) (Figure 1, Figure 3, and Figure S5). Based on sequence comparison and phylogenetic analysis, plant CKI genes were first divided into two groups, namely type I and type II CKI genes (Figure 1, Figure S1, and Figure S5). Motif compositions, 3D protein structure, and exon/intron distribution patterns in terms of exon number in G. raimondii agreed with our hypothesis (Figure 2, Figure S3, and Figure S4).

Regarding the functional characteristics of the N-terminal and conserved C-terminal regions, previous reports in mammals showed that CKI has an N-terminal lobe composed of β-sheets and a mainly α-helical C-terminal lobe, which are connected by a hinge region forming a catalytic cleft for substrate and ATP binding [32,33]. Within the C-terminal region, a specific phosphate moiety binding motif has been identified that allows the recognition of phosphorylated protein substrates and is believed to be involved in CKI regulatory interactions. These reports showed that the N-terminal and C-terminal lobes play an important role in substrate phosphorylation. In this study, type I CKI proteins were highly conserved within their kinase domains but differed significantly in the length and primary structure of their N-terminal and C-terminal domains (Figure 2a), which is consistent with previous reports [3,4,26,31,34]. Interestingly, type II CKI proteins also presented conserved kinase domains but, in contrast to type I, possessed variable N-terminal and conserved C-terminal regions (Figure 2b). Furthermore, the 3D structures of type I CKI proteins were always identical, while those of type II were inconsistent (Figure S3). Based on these characteristics of plant CKI genes, we hypothesize that the type I and type II CKI genes possess different functions on account of their structural differences, especially regarding phosphorylation. Functional studies will be needed in the future to further explore this difference.

Evolutionary Expansion of the G. raimondii CKI Gene Family
The evolution of Gossypium CKI genes was characterized by a history of multiple gene duplications at different stages, as in the four duplication events (an ancient angiosperm WGD event, one triplication event, a specific and recent cotton WGD event, and the tetraploid event) [22,25,28]. The phylogeny obtained in this study suggested that Gossypium CKI genes may be traced back to 250 million years ago, before the ancestral angiosperm WGD event (Figure 3a). At this node, duplication created five ancestral genes (three type I CKI genes and two type II CKI genes) (Figure 3a). Thereafter, with the divergence of angiosperms, independent duplications occurred in different lineages (such as eudicots [G. raimondii and Arabidopsis] and monocots [rice]), generating specific duplication events. The five ancestral genes duplicated in the four events to reach the current CKI gene numbers. However, the 31 members of the CKI gene family in G. raimondii were not equally derived from the five ancestral genes. Two of the five ancestral genes (Group A and Group E) duplicated vigorously, creating 19 members of the CKI gene family in G. raimondii. At another key node, the rapid amplification of CKI genes was due to a whole-genome duplication (WGD) event in the Gossypium genus after its separation from T. cacao (Figure 3b) [23]. In addition, the CKI members from eudicots (G. raimondii and Arabidopsis) and the monocot (rice) were not equally distributed among the subgroups (Figure S5), which indicated that the CKI genes from different plant species went through differential expansion. Therefore, we speculated that the origin of the CKI genes was very ancient and that the expansion might have occurred in different plant species and arisen from independent duplications. Besides, gene families with an ancient evolutionary history tend to have important functions, such as MADS-box genes [35], auxin response factors [36], or ASYMMETRIC LEAVES2-LIKE/LOB-DOMAIN transcription factors [37]. Thus, we speculate that CKI genes may play important roles in plant development and be involved in very diverse biological roles to adapt to environmental stress.

Expression Patterns of the CKI Genes Suggest Functional Diversification
To date, although the function of one CKI gene has been characterized in tetraploid cotton [20], no systematic analysis of expression patterns for the different groups of the tetraploid cotton CKI gene family has been done. In this study, we demonstrated that CKI genes displayed expression divergence in roots, leaves, and anthers (Figure S6). For instance, GhCKI2A/D, GhCKI3A/D, GhCKI8A/D, GhCKI10A/D, GhCKI11A/D, GhCKI15A/D, and GhCKI18A/D were constitutively expressed in every tested tissue (such as roots), implying that these genes may play regulatory roles at multiple developmental stages. In Arabidopsis, AtCKL2 and AtCKL3 were required for ABA regulation of seed germination, root growth, and gene expression [16,17]. In rice, OsCKI1 deficiency resulted in shorter primary roots and fewer lateral and adventitious roots [2]. Similar expression patterns suggest that these preferentially or specifically expressed G. hirsutum CKI genes might play important roles in root formation and development. Among the 61 identified CKI genes, GhCKI (namely GhCKI11A) is the only one that was speculated to regulate tapetal programmed cell death and anther dehiscence in cotton [20]. Under HT conditions, AtCKL2 and AtCKL7 were expressed in the tapetum, in anther microspores at stages 9-12, and in anther pollen grains at stages 13-14, implying that AtCKL2 and AtCKL7 may be key regulators of tapetal development under HT [38]. Apart from GhCKI11A (formerly GhCKI), the GhCKI1A/D, GhCKI5A/D, GhCKI9A/D, GhCKI12A/D, GhCKI13A/D, GhCKI19A/D, GhCKI26A/D, GhCKI28A/D, GhCKI29A/D, GhCKI30A/D, and GhCKI31A/D genes were exclusively expressed in anthers. This finding suggested that CKI genes were components of a complex transcriptional network regulating anther development.
Previous studies have shown that HT stress causes premature programmed cell death of the tapetum, resulting in male sterility and catastrophic loss of crop production [30,39-41]. However, the mechanism underlying successful male reproductive development under HT remains largely unknown. Except for the GhCKI11A (formerly GhCKI) gene, which was reported to regulate tapetum development under HT [20], no other CKI genes have been demonstrated to participate in the regulation of anther development. This study, using the transcriptomic data of cotton anthers in response to HT, found that 34 CKI genes changed their expression at different anther development stages under HT (Figure 4). These genes are further candidates for regulating stamen development under HT and will be examined in future work.
Circadian clocks are molecular timekeepers. Many plants use photoperiod (circadian clock) information to prepare for daily environmental changes and increase their fitness in changing environments [42]. Circadian clocks share a similar network architecture of feedback loops formed by transcriptional and post-translational regulation among the clock components [43,44]. Phosphorylation is a common post-translational modification and is an integral part of circadian regulation [44]. CKI participates in a series of biological processes by phosphorylating various substrates [3]. In rice, Early flowering 1 (EL1)/Heading date 16 (Hd16), a CKI protein, regulated the rice flowering pathway by enhancing the photoperiod response through the phosphorylation of the DELLA protein SLR1 and Ghd7 [13,14]. In Arabidopsis, both CK1.3 and CK1.4 showed a high expression peak before dark under long-day (LD) conditions, which indicated that the expression of CK1.3 and CK1.4 was strictly regulated by a circadian rhythm; overexpression of either CK1.3 or CK1.4 delayed flowering under LD conditions [18]. However, it is not clear whether the tetraploid cotton CKI gene family is involved in the photoperiod (circadian clock) response. In the present study, the expression of CKI genes was circadian. Under LD conditions, the expression in the light was higher than in the dark. Under short-day (SD) conditions, all 36 CKI genes showed high expression in the dark (Figure 5). The results indicated that the expression of CKI genes was strictly regulated by a circadian rhythm. Based on this evidence, we propose that at least some CKI genes are components of a complex diurnal rhythm network. The functions of these GhCKI genes in regulating the circadian clock and diurnal rhythms will be further characterized in future work.
The reception of light signals by plants and their subsequent response are important for both growth and development. In Arabidopsis, CK1 genes are involved in the light signaling pathway mainly through phosphorylation; the casein kinase 1 proteins CK1.3 and CK1.4 phosphorylate CRYPTOCHROME 2 (CRY2), the blue-light receptor, to regulate blue light signaling [18]. Our results showed that the expression of most type I and type II CKI genes was upregulated under light conditions compared to dark (Figure 6, Figure S7). We thus believe that both type I and type II CKI genes are involved in the response to light signals. However, we do not know how these genes respond to light signals in cotton, or whether they participate in light signaling by phosphorylating the photoreceptor CRY2 or light signal components, including HY5, HF5, HFR1, COP1, and PIF1, which in Arabidopsis are phosphorylated by casein kinase 2 proteins [45]. We also do not know whether the pathways and capabilities of these two types of genes are the same.

Conclusion
Our study offers a promising landscape to unravel the underlying structural characteristics and evolutionary expansion of CKI genes and further elucidate their expression patterns in different tissues and various conditions; this is crucial to better understand their characteristics and to elucidate their precise functions in regulating various facets of plant growth and development.

Database search and identification of CKI genes
The genomic databases of three cotton species (G. raimondii, G. arboreum, and G. hirsutum acc. TM-1) were downloaded from https://www.cottongen.org/. The protein databases of A. thaliana, O. sativa, and T. cacao were obtained from The Arabidopsis Information Resource (TAIR: http://www.arabidopsis.org/) and http://www.phytozome.net/, respectively. BLAST searches were performed using as queries the 13 CKL genes of the Arabidopsis casein kinase 1-like gene family and heading date 16 (Hd16), which encodes a casein kinase I protein. The candidate CKI proteins were further aligned to remove redundant sequences. Subsequently, the four specific fragments (LLGPSLEDLF, HIPXR, EXSRRDD, and LPWQGLKA) were used to further confirm the presence of the CKI protein sequences.
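The fragment-based confirmation step can be sketched as a simple pattern search. This is an illustrative sketch only: it assumes that X in HIPXR and EXSRRDD stands for any single residue, and the function names and the toy sequence are hypothetical rather than taken from the study.

```python
import re

# The four fragments named in the text; X is treated as "any residue" (assumption).
CKI_FRAGMENTS = ("LLGPSLEDLF", "HIPXR", "EXSRRDD", "LPWQGLKA")


def fragment_pattern(fragment: str):
    """Compile a fragment into a regex, mapping X to a single-residue wildcard."""
    parts = ("[A-Z]" if aa == "X" else re.escape(aa) for aa in fragment)
    return re.compile("".join(parts))


def fragments_found(protein: str) -> dict:
    """Return {fragment: True/False} for each CKI fragment in a protein sequence."""
    seq = protein.upper()
    return {frag: fragment_pattern(frag).search(seq) is not None
            for frag in CKI_FRAGMENTS}


def has_all_fragments(protein: str) -> bool:
    """True if every one of the four fragments occurs in the sequence."""
    return all(fragments_found(protein).values())


if __name__ == "__main__":
    # Toy sequence for illustration; not a real CKI protein.
    toy = "MKLLGPSLEDLFAAHIPYRGGEKSRRDDTTLPWQGLKA"
    print(fragments_found(toy))
    print(has_all_fragments(toy))
```

Because the Results report that type II proteins carry only the shorter cores LGPSL, SRRDD, and LPWQG, an all-four test like this one would be expected to pass for type I proteins only; the shorter cores could be substituted in `CKI_FRAGMENTS` to screen both types.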

Gene structure and phylogenetic analysis
Sequence alignments were generated with CLUSTALX [46], and the alignments among CKIs were adjusted before the tree was constructed. The online Gene Structure Display Server 2.0 [47] (http://gsds.cbi.pku.edu.cn/) was used to identify the exon/intron organization. G. raimondii CKI protein sequences were submitted to the online MEME (Multiple EM for Motif Elicitation) program [48] (http://meme-suite.org/tools/meme) to identify conserved protein motifs. The 3D structure of GrCKI proteins was predicted by SWISS-MODEL (https://swissmodel.expasy.org) [27]. Phylogenetic trees were constructed by the Maximum likelihood method in MEGA 6 [49].

Plant materials, growth conditions and stress treatments
The four cotton accessions used in this experiment were provided by Huazhong Agricultural University. For the expression profiles in different organs/tissues, samples of G. hirsutum cv. YZ1 were collected from roots, stems, leaves, petals, anthers, and ovules excised carefully from bolls at five DPA. To analyze the expression patterns of Gossypium hirsutum CKI genes at different anther developmental stages under NT and HT conditions, two cotton (Gossypium hirsutum) lines with obvious differences in performance under HT were employed in this study: 84021, which is tolerant to HT, and H05, which is sensitive to HT [30]. The plants were grown in a greenhouse at 28 °C to 35 °C/20 °C to 28 °C day/night as the normal condition. During HT treatment, the plants were cultivated at 35 °C to 39 °C/29 °C to 31 °C day/night in a greenhouse. When the plants had been treated with HT for 7 d, buds of different lengths (6)(7)(9)(10)(11)(12)(13)(14), and more than 24 mm) were collected under HT and NT. The anthers were excised and immediately frozen in liquid nitrogen; they were then stored at -80 °C until use. The transcriptome profiles of CKI genes were extracted from the RNA-seq data [30]. To analyze the diurnal regulation of G. hirsutum CKI gene expression, G. hirsutum cv. YZ1 was grown under short-day conditions (8 h light/16 h dark) and long-day conditions (16 h light/8 h dark). Leaves were harvested from cotton plants at the four-leaf stage. We also collected cotyledons from G. raimondii and G. hirsutum cv. YZ1 under light and dark conditions.

qRT-PCR
Various plant samples were collected, immediately frozen in liquid nitrogen, and stored at -80 °C. Total RNA was isolated from the collected cotton tissues using previously published methods [50]. First-strand cDNA was generated from 3 μg total RNA using the M-MLV reverse transcriptase (Invitrogen). The cDNA was used as a template for qRT-PCR. The qRT-PCR reactions were performed using the 7500 Real-Time PCR System (Applied Biosystems). The primers used in this study were listed in Table S2, Table S3

Figures
Figure 1 Phylogenetic tree of the Casein Kinase I (CKI) in Gossypium spp. The phylogenetic tree was constructed using 122 cotton CKI protein sequences from G. hirsutum (61), G. arboreum (30) and G. raimondii (31) with the Maximum likelihood method in MEGA 6. The three different symbols represent the three cotton species: green for G. arboreum, red for G. raimondii, and blue for G. hirsutum. The GeneIDs of CKI genes from G. hirsutum, G. arboreum, and G. raimondii were listed in Table 1.
The specific fragments exist in type I and type II GrCKI proteins. The red font indicates the specific fragments from type II GrCKI proteins. (d) Motif composition of GrCKI proteins. Conserved motifs in the GrCKI proteins are indicated by colored boxes. The GeneIDs of type I and type II GrCKI genes were listed in Table S1.
Figure 3 Evolutionary expansion of the G. raimondii CKI gene family and phylogenetic analysis of CKIs in G. raimondii and T. cacao. (a) The putative evolutionary history of the GrCKI genes. The four colored boxes represent the four duplication events. (b) Phylogenetic analysis of type I and type II CKI genes in G. raimondii and T. cacao, respectively. (c) Number of CKI genes in G. raimondii and T. cacao in different groups. The Gene IDs of type I and type II CKI genes from G. raimondii and T. cacao were listed in Table S1.
