Identification and Evolutionary Analysis of the GOLDEN 2-LIKE Gene Family in Foxtail Millet

GOLDEN 2-LIKE (GLK) transcription factors are members of the GARP superfamily and are involved in chloroplast development and stress tolerance. These transcription factors have been studied at the genome level across several plant species. However, no study has comprehensively analyzed the GLK family in foxtail millet (Setaria italica L.). The present study discovered 59 GLK genes in foxtail millet. Multiple sequence alignment and gene motif analysis revealed the presence of Myb-SHAQKYF and Myb-CC-LHEQLE motifs in all the members. In addition, the gene promoters mainly had light- and stress-responsive cis-regulatory elements. Syntenic analysis showed that whole-genome duplications (WGD) and dispersed duplications contributed to the expansion of the SiGLK gene family. Gene expression analysis revealed that SiGLKs exhibited preferential expression in bundle sheath than in the mesophyll cells. Meanwhile, analyzing these genes in foxtail millet under abiotic stress demonstrated a significant role for SiGLK30, 38, 48, and 52 in regulating salt, cold, and drought stress responses. Finally, qRT-PCR confirmed that cold stress increased SiGLK52 and SiGLK31 expression. The study thus provides an overview of GLK gene evolution in foxtail millet and a clue to its function in bundle sheath and mesophyll cell and stress tolerance.


Introduction
C4 plants originated from C3 plants, and the process was reported to have occurred at least 66 times separately across several lineages (Shi et al. 2020;Sage et al. 2012). During the process, C4 plants have evolved a special leaf structure, known as the Kranz anatomy, consisting of two rings of cells around the vascular bundle of leaf veins in most C4 species, enabling them to maintain high carbon dioxide utilization to survive in low ambient CO2 and high temperatures (Gowik et al. 2011;Osborne and Freckleton 2009;Edwards et al. 2001;Lundgren et al. 2014;Sedelnikova et al. 2018). Based on Kranz structure, ambient CO2 was first fixed in OAA in the mesophyll (M) by phosphate pyruvate carboxylase (PEPC) and then transferred as malate into the bundle sheath (BS). With the help of NADP-ME, the fixed carbon dioxide was then liberated from malate and re-fixed by Rubisco. Meanwhile, enzymes involved in this process, such as carbonic anhydrase (CA), pyruvate orthophosphate dikinase (PPDK), phosphoenolpyruvate carboxykinase (PCK), malate dehydrogenase (NADP-MDH), and the previously mentioned phosphate pyruvate carboxylase (PEPC), NADP-dependent malic enzyme (NADP-ME), had a clear accumulation and enrichment. GLK genes are a subset of the few transcription factors known to be involved in mesophyll cells (M) and bundle sheath (BS) compartmentalization in C4 plants (Rossini et al. 2001). GLK genes have two conserved domains, the Myb-DNA binding (containing the HLH motif) and the C-terminal domain (containing the conserved GCT box), as members of the GARP family (Riechmann et al. 2000).
Members of the GLK gene family have previously been found to regulate chloroplast formation and development conservatively. Mutations in the goden2 gene resulted in the destruction of BS and pale green leaves in maize (Hall et al. 1998); in birch, deletion of BpGLK1 reduced chlorophyll content and disrupted chloroplast development (Gang et al. 2019); in tomato, suppression of GLK1 resulted in pale leaves with reduced chlorophyll (Nguyen et al. 2014). In Ppatents, Arabidopsis and rice, simultaneous loss of function of GLK1 and GLK2 resulted in pale leaves (Fitter et al. 2002;Wang et al. 2013;Bravo-Garcia et al. 2009;Yasumura et al. 2005). However, the two GLK genes regulated chloroplast development in different ways. In maize and sorghum, for example, loss of GLK2 caused chloroplast disruption in BS cells but not in M cells. In non-C4 species, however, the two GLK genes acted redundantly to regulate chloroplast development (Wang et al. 2013;Yasumura et al. 2005;Bravo-Garcia et al. 2009;Chen et al. 2016). The fact that only double mutation of the two GLKs results in phenotypic difference from wildtype plants shows that GLK1 and GLK2 function separated after the C3-C4 split (Wang et al. 2013).
GLK genes have also been found to play a function in abiotic stress and disease resistance. In cotton, silencing GhGLK1 caused more damage to plants under drought and cold stress than the wild type (Liu et al. 2021). Overexpression of GLK1 in Arabidopsis conferred resistance to Fusarium graminearum (Savitch et al. 2007;Schreiber et al. 2011). Studies also indicated that AtGLKs exhibited tolerance to cucumber mosaic virus ). Furthermore, ectopic overexpression of GLK1 conferred pathogen resistance in rice (Nakamura et al. 2009). GLKs were also involved in the control of leaf senescence (Qin et al. 2021).
In this study, we firstly explored GLK genes in the foxtail millet at the genome-wide level, and analyzed the subcellular localization, chromosomal distribution and gene structure. Of the 59 GLK genes identified in foxtail millet, SiGLK30, 38, 48, and 52 responded to drought, salt and cold treatments according to the stress analysis. This study will provide a reference for further research on the role of GLKs playing in foxtail millet.

Identification of GLK Genes in Foxtail Millet
BLAST analysis using the Arabidopsis and maize GLK protein sequences (GLK1, G2) against the foxtail millet genome identified 59 SiGLK genes (Fig. 1, Table 1). The SiGLK proteins were 154 (SiGLK53) to 702 (SiGLK29) amino acids long, with an isoelectric point (pI) ranging from 5.03 (SiGLK32) to 10.02 (SiGLK6). The 59 SiGLK genes were found unevenly distributed across the nine chromosomes of foxtail millet, with genes ranging from 1 to 11 per chromosome; chromosome 4 had the most genes (11) and chromosome 8 had the least (1). These genes were renamed based on their position in the genome (Table 1). Subcellular localization analysis showed that all the SiGLKs are located in the nucleus (Table 1).
Further analysis based on the MEME predicted 15 potential motifs in the 59 SiGLK proteins (Fig. 2B, Supplementary Table 1). The MEME annotation classified these proteins into six groups(I-VI) based on the composition of the motif ( Fig. 2A). GLK gene-specific motifs 1 (Myb-SHLQKYF) and 2 (PELHRR) were found in I, II, III, IV, V, and VI groups, with 20, 15, 5, 6, 10, and 3 genes, respectively. All members of the group I contained one extra motif, the motif 3 (Myb-CC-LHEQLE) which distinguishes them from the other groups and is a symbol for the subfamily involved in phosphate starvation signaling (Kumar et al. 2017;Meng et al. 2021). Furthermore, group V was identified as the REC superfamily that acts as a phosphorylation-mediated switch within response regulators (RRs) with motifs 4, 5, 6, and 7. Meanwhile, group III contained an extra unannotated motif (motif 9).

Analysis of SiGLK Promoters
In silico analysis of the SiGLK promoter sequences (2000 bp upstream of the transcription start site, TSS) using plantcare discovered cis-acting elements associated.
Further analysis recognized ABRE and CGTCA as the most abundant defense and stress-responsive elements in the SiGLK promoters. SiGLK8, SiGLK34, and SiGLK59 had the maximum ABREs (10, 9, and 7, respectively), and SiGLK49, SiGLK43, and SiGLK12 contained the maximum number of CGTCA motifs (12, 12, and 12, respectively). These genes also had stress-responsive elements, such as LTR (low-temperature responsive) and MBS (drought-induced) elements. SiGLK31, SiGLK52, and SiGLK40 contained the most LTRs, indicating their role or response under cold stress. In addition, almost all SiGLKs had one to three MBS, implying their potential to respond to drought.

Syntenic Analyses of SiGLKs in Foxtail Millet
Collinearity analysis can provide information about gene duplication events and mode in plants. Therefore, we performed syntenic analysis to determine the gene duplication pairs and the method of duplication that led to SiGLKs during evolution. Five large-scale collinear blocks were discovered in the foxtail millet genome, including Chr1-Chr4, Chr1-Chr7, Chr2-Chr6, Chr2-Chr9, and Chr3-Chr5 (Fig. 4), consistent with previous studies in foxtail millet (Zhang et al. 2012). The analysis found that 27 SiGLKs were derived via WGD or segmental duplication and formed 15 syntenic duplication gene pairs (Supplementary Table 2); 30 SiGLKs were derived via dispersed duplication, whereas two were derived from tandem duplication (Fig. 4). Among these, 25 SiGLKs were found within these large collinear blocks, and therefore, 14 of the 15 duplicated SiGLK pairs were considered to have expanded via WGD (Supplementary Table 2).
Further, we calculated the Ka/Ks ratio (non-synonymous substitution rate, Ka and synonymous substitution rate, Ks) of the 15 duplicated SiGLK pairs using TBtools (Ka_ Ks calculator) to identify whether or not the SiGLK genes are under selective pressure. All duplicated SiGLK pairs except three pairs (SiGLK15,18;SiGLK15,57;SiGLK18,57) underwent purifying selection because the Ka_Ks ratio ranged from 0.14 to 0.46 (Supplementary Table 3). Meanwhile, the Ks of 5 of the 15 duplicated SiGLK pairs ranged from 0.6 to 0.8, and that of 6 of the 15 duplicated SiGLK pairs ranged from 1.0 to 1.7; only one pair had a Ks of 0.16.

Comparative Syntenic analysis Across Species
Further, BLAST analysis was carried out using the GLK protein sequences of Arabidopsis and maize (GLK1, G2) against the genomes of 26 species ranging from lower mosses to tall gymnosperms, from monocots to eudicots, and C3 to C4 plants (2 mosses, 1 lycophyte, 1 basal angiosperm, 2 gymnosperms, 11 monocots, and 9 eudicots) to investigate the expansion of GLK genes across the plants; the analysis discovered 1758 GLKs in total. Subsequently, a species tree was created using the 28 species with the NCBI taxonomy browser (Fig. 5). Excluding Glycine max (128) and.
Musa acuminata (117), most species, from Physcomitrella patens to Zea mays, had similar numbers of GLK genes, demonstrating the importance of GLK in development regardless of mosses or angiosperms. However, the number of GLK genes appeared unrelated to genome size, particularly in gymnosperms like Ginkgo biloba and Gnetum montanum. Ginkgo biloba, with a 9555 Mb genome, has 38 GLK genes. The Daucus carota genome, on the other hand, of 422 Mb in size, contains 85 GLK genes. Hordeum vulgare had nearly the same number of GLK genes as maize, but it had a genome that was more than twice the size of maize. Furthermore, the closely related species, rice, sorghum, and maize, were chosen for comparing the synteny with foxtail millet using MCscanX ( Supplementary Fig. 1), considering  Arabidopsis as the outgroup. We firstly discovered 71, 54, and 54 GLK genes in maize, sorghum, and rice, respectively, and performed syntenic analysis on each of them, as performed in foxtail millet ( Supplementary Fig. 2, Supplementary Table 5). Except for a few uncertain GLKs, rice, maize, and sorghum have 30, 51, and 24 GLKs, respectively, originated from WGD or segmental duplication, and 19, 16, and 26 GLKs from dispersed duplication; only 3 GLKs (SbGLK48, 49, 50) derived from tandem duplication in sorghum, but none in rice or maize ( Supplementary Fig. 2D). Besides, 96, 77, 74, and 13 syntenic gene pairs were identified between foxtail millet and maize, rice, sorghum, and Arabidopsis, respectively (Supplementary Table 5). Among these, 11 SiGLKs had at least one collinear gene in Arabidopsis, implying that they originated before the monocotdicot split; 33 SiGLKs had at least one collinear gene in rice, sorghum, and maize; and three SiGLKs had at least one collinear gene in sorghum and maize, but none in rice ( Supplementary Fig. 3); these observations imply that most SiGLKs originated before the C3-C4 split and after the dicotmonocot divergence. Our analysis showed that each GLK in foxtail millet had two or more orthologs, probably due to gene duplication, resulting in additional paralogous genes. For example, SiGLK15, SiGLK18, and SiGLK57 are paralogous and have two or three orthologs in rice. To further clarify the evolution of GLK genes, we investigated the genome collinearity of rice, foxtail millet, sorghum, and maize using TBtools (Find Path by Gene Pair) (Chen et al. 2020) (Supplementary Table 6). The analysis showed that most SiGLK genes corresponded to a particular orthologous gene in rice at the expected colinear location, such as OsGLK30-SiGLK18-SbGLK43-ZmGLK47 and OsGLK2-SiGLK33-SbGLK15-ZmGLK21 (Fig. 6). SiGLK27 and SiGLK28 were the only two GLKs produced by tandem duplication and occurred after foxtail millet-sorghum split, as both tandem genes shared the same rice and sorghum orthologs (OsGLK36, SbGLK50) at the expected colinear location. SiGLK27 and SiGLK28 were paralogous to SiGLK2, SiGLK26, and SiGLK29 and formed a gene cluster (excluding SiGLK2). Further, the evolutionary history was determined by comparing them to the rice genome collinearity. According to our findings, one WGD event doubled SiGLK26 and SiGLK2, then SiGLK26 duplicated (dispersed duplication) to produce SiGLK29, and finally uneven crossing over produced SiGLK27 and SiGLK28 (Zhang 2003;Panchy et al. 2016).

Comparative Phylogenetic Analysis Across Species
We further aligned the amino acid sequences of 59 SiGLKs with rice, foxtail millet, sorghum, maize, and Arabidopsis to elucidate the evolutionary relationships between GLK genes (Fig. 7). A neighbor-joining tree constructed using the 279 GLK proteins with the MEGA software divided them into 11 groups (A-L), consistent with the grouping based on MEME analysis ( Supplementary Fig. 5). Group K (50) was the largest subfamily, with members containing motifs 1, 2, and 3. Group A was the second-largest, with members harboring motif 8 preceding motif 1 and motif 2; group A was followed by groups B (36), J (35), and L (29). Group H included all orthologs of maize goden2 (ZmGLK21) and GLK1(ZmGLK59), SiGLK33 and SiGLK23 of foxtail millet, OsGLK2.
and OsGLK33 of rice, SbGLK15 and SbGLK46 of sorghum, and AtGLK13 and AtGLK37 of Arabidopsis thaliana. GLKs from monocot plants (rice, foxtail millet,

Expression of SiGLKs in Different Foxtail Millet Tissues and Cell Types
We further investigated the transcriptome data (Zhang et al. 2012) obtained from four foxtail millet tissues (root, leaf, stem, and spica) to determine the expression patterns of the SiGLK genes (Fig. 8). TBtools kit created an expression heatmap, and gene expression was normalized to determine the differences. The majority of SiGLK transcripts were detected in all four organs. SiGLK52 had the highest expression in the root; SiGLK23 had the highest expression in the leaf and was the only gene specifically expressed in leaves; SiGLK17 and SiGLK49 had the highest SiGLK12, SiGLK15, SiGLK19, SiGLK21, SiGLK54, and SiGLK59 were found expressed preferentially in stems, whereas SiGLK14, SiGLK42, SiGLK49, SiGLK50, and SiGLK53 were expressed preferentially in spica. Interestingly, we found that all highly expressed genes were members of groups I and III-V. Almost all group I members showed high expression in the four tissues, implying a broader role. Foxtail millet is a C4 plant with a dimorphic chloroplast and photosynthesis compartmentalized into bundle sheath (BS) and mesophyll cells (M); therefore, the present study analyzed the expression of SiGLKs in M and BS separately ( Fig. 9). The analysis revealed differential cellular accumulation for 39% of the SiGLKs (23 SiGLKs); 6 were M-enriched, and 17 were BS-enriched.
We also compared the cell-specific GLKs across four species. Rice, sorghum, and maize had 12, 26, and 17 GLKs, respectively, with differential cellular expression (Supplementary Fig. 6). The differentially expressed BS-enriched GLK genes in rice, sorghum, and maize were similar to those in foxtail millet. SiGLK15, SiGLK18, and SiGLK57, which are paralogous, were found expressed mainly in the BS as their rice (OsGLK9), sorghum (SbGLK21), and maize (ZmGLK35) orthologs. Similarly, the paralogs SiGLK3 and SiGLK25, and their orthologs OsGLK9, ZmGLK35, and SbGLK21 showed a BS-enriched expression pattern. SiGLK33 demonstrated BS-biased expression Fig. 3 Counts of cis-acting elements in SiGLKs promoter regions. TBtools was used to retrieve the 2,000 bp fragment upstream from the start codon (ATG) of each SiGLK. Then, The online software plantcare was used to analyzed the promoters. The number of different cisacting elements was represented by various colored boxes.with light and stress response as the most commonly distributed ones (Fig. 3). The light-responsive elements (LREs) identified included G-box, GT elements, I-box, H-box, and AT-rich sequences. The promoters of SiGLK genes had maximum G-box elements, ranging from 0 to 9. SiGLK8 and SiGLK17 contained the most G-boxes (9 and 6, respectively), indicating a significant influence of light on their expression. SiGLK23, SiGLK41, and SiGLK43 lacked a G-box, while the other SiGLKs had at least one G-box consistent with its sorghum and maize orthologs (SbGLK15 and ZmGLK21), but the opposite was true for its ortholog OsGLK2 (M-biased) in rice.

Expression Pattern of SiGLKs Under Various Abiotic Stresses
The study further analyzed the expression patterns of SiGLKs during salt, drought, and cold stress using publicly available RNA-seq data (Yi et al. 2022;Meng et al. 2021) to elucidate the possible role of these genes in abiotic stress tolerance in foxtail millet (Fig. 10). Under drought stress, 24 SiGLKs showed significant differences in expression. Here, 12 of 24 SiGLKs were found upregulated, while the others were downregulated. SiGLK19, SiGLK38, SiGLK48 and SiGLK52 showed maximum upregulation in response to drought stress compared with the control samples, while SiGLK18, SiGLK31 and SiGLK47 showed maximum downregulation (Fig. 10A). Under salt stress, five SiGLKs were upregulated, whereas three were downregulated (Fig. 10B). SiGLK52 showed maximum upregulation under salt stress. On the other hand, SiGLK10 and SiGLK31 were greatly reduced. Under cold stress, only SiGLK23 displayed a rapid increase within 1 h (Fig. 10CC). However, the number of cold-responsive SiGLKs gradually increased after 3 h. The genes SiGLK23, SiGLK52, SiGLK3, SiGLK10, and SiGLK31 upregulated at 1, 3, 6, 16, and 24 h under cold stress. Meanwhile, at 3, 6, 16, and 24 h under cold stress, SiGLK18, SiGLK27, SiGLK23, and SiGLK38 were downregulated (Fig. 10C). All three stress treatments resulted in the upregulation of four SiGLKs (SiGLK30,38,48,and 52). Meanwhile, only SiGLK18 was negatively regulated in response to all three stressors. SiGLK52 had a higher expression than the other three upregulated SiGLKs under normal conditions and showed quick and continuous upregulation when exposed to stress (Fig. 10D). Further qRT-PCR confirmed the expression of SiGLK52 and 31 during the cold treatment (Fig.10D).

Discussion
Foxtail millet is one of the world's oldest cultivated crops and is widely grown in northern China as a major food crop (Barton et al. 2009). Moreover, as a typical C4 plant with a small genome (~ 500 Mb), foxtail millet has emerged Fig. 4 Syntenic analysis of SiGLK genes. All syntenic blocks in the Setaria italica genome are represented by colored lines. The red lines highlight the syntenic GLK gene pairs.The chromosomes are represented by the outer circle, and the gene density is represented by the inner circle. The number of genes was shown in sliding windows of 100 kb in each chromosome.. The names of the genes from dispersed, WGD or segmental, and tandem duplication are shown on the outside of different chromosome and designated in black, red, and blue, respectively as a convenient model plant for the study of C4 processes (Brutnell et al. 2010;Lata et al. 2013;Li and Brutnell 2011;Doust et al. 2009). GLK genes, governing plant cell differentiation in C4 plants and chloroplast formation in both C4 and C3 plants, have been discovered and described in maize, sorghum, rice, Arabidopsis, and various other plants (Wang et al. 2013;Fitter et al. 2002;Rossini et al. 2001;Liu et al. 2021;Nguyen et al. 2014). It can tolerate various abiotic stresses such as drought and low temperature because it prefers arid and semiarid environments (Bettinger et al. 2010). A thorough examination of the GLK gene family in foxtail millet will help us better understand this species' growth, development, and stress tolerance.
The present study discovered 59 GLK genes in foxtail millet and named them based on their chromosomal location. The number of GLK genes is similar to the C4 relatives, such as Sorghum bicolor, Setaria viridis, and Panicum hallii, but less than maize (Zea mays;71) due to one independent WGD that occurred in maize after the sorghum-maize split (Swigonová et al. 2004). Except for Musa acuminata and Glycine max, mosses to angiosperms had almost the same number of GLKs, highlighting the role of GLK in development in various members of the plant kingdom. We also found that the number of GLKs had little correlation with genome size. The gymnosperms Gnetum montanum and Ginkgo biloba with a large genome (9555 and 3653 Mb) contain fewer (38 and 27) GLKs, while Brachypodium distachyon has a tiny genome (271 Mb) with more (61) GLKs. Among the 27 species, foxtail millet has a moderately sized genome and SiGLKs. Detailed analysis revealed differences in the sequence length, relative molecular weight, and isoelectric point among the SiGLK proteins, which explained the GLK family's complexity. All SiGLKs were found localized in the nucleus, indicating their conserved transcriptional regulatory roles. Additionally, SiGLKs contained motifs 1 (Myb-SHLQKYF) and 2 (PELHRR), generally used to distinguish the GLKs. Meanwhile, motif3 (Myb-CC-LHEQLE), a type of MYB-like domain, was detected in all members of group I, the largest subfamily of GLKs in foxtail millet similar to maize . Gene duplication has long been considered an important mechanism for acquiring numerous genes with various biological functions, and WGD operates as a key driving force allowing adaptive evolution to proceed more efficiently (Wood et al. 2009;Soltis and Soltis 2016). Meanwhile, progress in genome sequencing techniques and the availability of various genomes have improved our understanding of plant evolution. Researchers have indicated that the foxtail millet ancestor lived through at least five WGD events (Fig. 4) (Wu et al. 2020;Liu et al. 2022;Zhang et al. 2020). However, the surge in SiGLKs mainly was attributed to the last WGD event in the foxtail millet, which occurred 70 mya (Zhang et al. 2012). Moreover, only 11 Fig. 6 Syntenic analysis of GLK genes between foxtail millet and other three species. TBtools was used to visualize the syntenic relations. The long gray rectangles represent the chromosome regions that contain GLKs. The boxes above the long gray rectangles represent the genes during the region; the length and color of the boxes, represent the size and orientation of the genes, respectively. Yellow lines indicate syntenic GLK gene pairs, while gray lines depict other collinear genes ◂ Fig. 7 Phylogenetic analysis of GLK proteins in foxtail millet, maize, sorghum, rice, and Arabidopsis. The GLK proteins were grouped into 12 distinct clades (A-L), which are indicated by colored branches.
Genes on branch ends from different species are denoted by different colored circles SiGLK genes had orthologous genes in Arabidopsis thaliana's colinear region, and 33 SiGLKs had collinear genes in rice, which indicates that most SiGLK genes originated after the dicot-monocot divergence but before the foxtail millet-rice split. However, only one WGD (70 mya) was detected during the process, based on previously recognized WGDs (Zhang et al. , 2012. The study also investigated the expression of SiGLKs in various tissues and two cell types (BS and M). SiGLK52 had the highest expression in roots, although it was not explicitly expressed in this tissue. There are no reports on the role of GLKs in roots, so the importance of SiGLK52 needs to be investigated; Meanwhile, SiGLK23 and SiGLK33 displayed the highest expression in the leaf.
Furthermore, SiGLK23 was only expressed in the leaf, whereas SiGLK33 was expressed in the leaf, stem, and spica but not in the root. Spica showed a high and specific expression of SiGLK49. In addition, we discovered that 24% of SiGLKs showed differential accumulation between M and BS, implying the cell-specific accumulation as a general feature of the GLK gene family. When we analyzed various species, the same expression pattern was detected in maize (24%), sorghum (44%), and rice (20%). The SiGLKs and their orthologs in the C4 maize and sorghum had a similar cell-preferential expression but not in rice. This finding suggested a complex regulatory divergence in the evolution of C3-C4 plants. Moreover, the observations indicate the possible role of SiGLK23 and SiGLK33 Fig. 8 Heatmap of the expression of SiGLKs in leaf, root, stem, and spica. The color of rectangles from blue to red represents an increase in gene expression (normalized by column). The size of red circles represents the maximum transcripts per million (TPM) of SiGLKs in regulating cell differentiation and chloroplast formation, similar to the orthologous maize GLK1 (ZmGLK59) and goden2 (ZmGLK59) with similar expression patterns.
The study also discovered numerous cis-elements associated with light and abiotic stress, such as G-box, AREB, and LTR, in the promoters of SiGLKs. G-box was the most abundant light-related element. It was discovered as a light regulatory element in the small subunit of ribulose 1,5-bisphosphate carboxylase/oxygenase (RbcS) (Giuliano et al. 1988). Deletions or point mutations in G-box modulated the interaction with GBF (G-box binding factors) and influenced light-related gene expression (Donald and Cashmore 1990;Schindler et al. 1992). In addition, SiGLK33 and SiGLK23 are orthologs of maize G2 and GLK1 regulated by light; the study found more G-box elements in SiGLK33 (7) than in SiGLK23 (0), probably resulting in differences in their expression patterns. Meanwhile, all SiGLKs, except seven, had ABRE (ABA-responsive element), the primary ciselement for ABA-responsive gene expression (Nakashima and Yamaguchi-Shinozaki 2013), as the most common stress-related element, indicating the role of SiGLKs in stress responses. Further analysis of the gene expression under stress using publicly available transcriptome data showed that drought and cold treatment affected 24 and 23 SiGLKs, respectively, with 14 SiGLKs (5 upregulated and 9 downregulated) being differentially expressed under both drought and cold stress. The gene expression patterns were consistent with the widely distributed drought-and cold-responsive cis-elements, such as ABRE and MBS. Meanwhile, RNA-seq data obtained from samples under cold stress and qRT-PCR revealed that SiGLK31 and SiGLK52, two of the three SiGLKs (SiGLK 31, 40, and 52) with the most LTRs, were upregulated. These findings confirmed the role of SiGLKs in stress response.

Conclusion
The present study identified and characterized 59 GLK genes in foxtail millet based on bioinformatic analysis. Syntenic analysis revealed WGD and dispersed duplication as the driving forces for the expansion of SiGLK genes. Analysis of the foxtail millet transcriptome revealed expression of these genes in almost all tissues, indicating their role in plant growth and development. Meanwhile, compartmentalized expression in the BS and M cells suggested their potential roles in dimorphic chloroplast development. Further, analysis of the expression of these genes in foxtail millet under abiotic stress demonstrated their roles in regulating the stress response. Among the various SiGLKs, SiGLK30,38,48,and 52 were implicated in salt, cold, and drought stress. Thus, the study's findings advance our understanding of GLKs and propose candidate genes for improving stress tolerance in foxtail millet. Further studies should focus on the functional analysis of SiGLKs for a comprehensive understanding of the stress regulatory pathways.

Identification of GLK Genes
The GLK protein sequences of Arabidopsis (https:// www. arabi dopsis. org) and maize (GLK1, G2) were used as query sequences to blast against the genomes of 27 species, including mosses (Sphagnum magellanicum, Physcomitrium patens), lycophytes (Selaginella moellendorffii), gymnosperms (Gnetum montanum, Ginkgo biloba), basal angiosperms ( (Goodstein et al. 2011). The datas were downloaded from the phytozome (https:// phyto zome-next. jgi. doe. gov) and ensemble (https:// plants. ensem bl. org) databases. As a result, sequences with a similarity of more than 30% and an E-value less than E −10 were retained. And the retained sequences were uploaded to the Swissport and Conserved Domain Database (https:// www. ncbi. nlm. nih. gov) for further validation to ensure that the sequences belonged to the GLK family (Lu et al. 2020). Setaria italica, Zea mays, Oryza sativa, Sorghum bicolor, and Arabidopsis were chosen for multiple align, collinear comparison, and other data analysis. all 28 species were used to establish the species tree generated by searching in the NCBI taxonomy browser (https:// www. ncbi. nlm. nih. gov/ Taxon omy/ Brows er/ wwwtax. cgi) with default parameters. The genome sizes of the 28 species were calculated by TBtools based on the assembled genome files. The pI (isoelectric points) and MW (molecular weights) of the GLK proteins were calculated using the ExPASy website (https:// web. expasy. org).

Multiple Sequence Alignment and Phylogenetic Analysis
ClustalW module within the MEGA11 software package was used to align the full-length of the GLK proteins, The the phylogenetic tree was constructed by the neighbor-joining (NJ) method with default parameters and 1000 bootstrap replicates (Kumar et al. 2018), and was visualized with the online tool iTOL (https:// itol. embl. de).

Chromosomal Localization and Syntenic Analysis of GLK Genes in Foxtail Millet, Maize, Sorghum and Rice
All of the GLK genes found were mapped to chromosomes using information from the phytozome. MCscanX (Wang et al. 2012) was used for collinearity analysis (blast hits E-value cutoff < 1E-10, number of blast hits is 5). TBtools (Chen et al. 2020) was used to visualize the collinearity relationship across foxtail millet, maize, rice, sorghum, and Arabidopsis thaliana. T = Ks/2λ (λ = 6.561029 × 10 -9 ) was used to calculate the divergence time (Lynch and Conery 2000;Blanc and Wolfe 2004).

Predicted Protein Motifs and Gene Structure Characterization GLK Genes in Foxtail Millet
The following settings were used with the MEME (https:// meme-suite. org, v5.5.0) to predict the protein motifs of the GLK genes in foxtail millet: the discovery mode was classic and the 0-order model of sequences was used as the background; the motif sites distribution was zero or one (C) Heatmap of the SiGLKs in response to cold treatment. The red rectangle represents up-regulated SiGLKs, the blue rectangle represents down-regulated SiGLKs, and the white rectangle indicates unchanged SiGLKs. (D) The qRT-PCR of SiGLK52 and SiGLK31 after 6, 12 and 12 h cold treatment. The relative expression levels were calculated using actin as an internal control by the 2 −∆∆Ct method. Error bars indicate SD. Three biological replicates were performed ◂ occurrence per sequence (zoops); the maximum number of distinct motifs was 20, the minimum motif width was 6, and the maximum motif width was 50. TBtools was used to derive the exon-intron structures for the SiGLK genes based on the genome annotation data.

Cis-Regulatory Element Analysis of SiGLKs in Foxtail Millet
The 2,000 bp fragment upstreamed from start codon (ATG) of every SiGLK was retrieved by TBtools, and analyzed using online software PlantCARE (Lescot et al. 2002)). After TATA-box and CAAT-box were deleted, TBtools was used to visualize the cis-acting elements.
Quality checking was performed using fastqc. The adaptor and low-quality reads were then removed by TRIM-MOMATIC. Using Hisat2, clean readings were aligned to reference genomes. The reads were then quantified using Stringtie. Finally, differential expressed gene (DEG) analysis was performed using edgeR. DEGs were defined as those with |log2 foldchange|> 1 and p < 0.05.

RNA Extraction and qRT-PCR Analysis
The foxtail millet (Setaria italica) was grown in a greenhouse at a temperature of 28 °C (16 h of light/8 h of darkness). Seedlings were given a cold treatment after three weeks. Leaves were collected at 0 h, 6 h, 12 h, and 24 h after being exposed to cold stress (4 °C). Total RNA was extracted from all of the samples in this study using TRIzol Reagent (Sigma, USA) according to the manufacturer's instructions. PCR was performed using SYBR Green Real-Time PCR Master Mix (TransStart) according to the manufacturer's instructions. The relative expression levels were calculated using actin as an internal control by the 2 −∆∆Ct method. Three biological replicates were performed.
Author Contributions HFC, XLW designed the study, HFC performed the research, analyzed the data and wrote the paper. LQ, JT and XLW modified the manuscript. All authors have read and approved the final manuscript.
Funding This work was supported by grants from the National Natural Science Foundation of China (31901552 and 32172013).
Data Availability All data used in this study were obtained from publicly accessible databases, and all generated data were included in this published article and its supplementary information file.

Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.