Genome-wide identification and expression analysis of the 14-3-3 gene family in cotton (Gossypium hirsutum L.)

Background: As a ubiquitous family of acidic regulatory proteins in eukaryotes, 14-3-3 proteins are widely involved in plant growth and development. With the development of third-generation sequencing technology and the completion of the cotton genome assemblies, it has become possible to explore the occurrence and distribution of the 14-3-3 protein family in cotton. Results: In this study, 33, 33, 17 and 18 members of this family were identified in Gossypium hirsutum ((AD)1), G. barbadense ((AD)2), G. arboreum (A2) and G. raimondii (D5), respectively. Evolutionary, structural and expression analyses of the family were then carried out in G. hirsutum ((AD)1). Phylogenetic comparison with Arabidopsis and rice, together with gene structure, clearly divided the G. hirsutum ((AD)1) 14-3-3 family into two groups, the ε group and the non-ε group. Transcriptome expression patterns showed that the family was significantly induced under abiotic stress. Most 14-3-3 genes carry a large number of cis-acting elements related to growth, development and abiotic stress in their promoter regions, among which drought-related elements account for the largest proportion. qRT-PCR showed that the expression of 14-3-3 genes differed significantly under drought stress. Conclusions: This study indicates that the 14-3-3 protein family is relatively conserved during the evolutionary expansion of cotton and may be involved in plant growth and development and in stress resistance. These results provide a theoretical and experimental basis for further analysis and verification of the function of 14-3-3 proteins in cotton.

that the molecular weight of the 33 GhGF14s ranged from 10.80 kDa (GhGF14-32) to 30.40 kDa (GhGF14-18), and the isoelectric point from 4.69 (GhGF14-05, 13) to 6.29 (GhGF14-32). Subcellular localization prediction with Softberry ProtComp 9.0 shows that 30 of the 33 GhGF14s are located in the cytoplasm and nucleus, and 3 are located in the nucleus and cell membrane (GhGF14-07, 19, 23). The physical and chemical properties of the GhGF14s therefore differ considerably, suggesting that GhGF14s may have a complex division of labor and diverse functions during cotton development.
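The excerpt does not say which tool produced the molecular weights, so the following is only a minimal sketch of how such a value can be estimated from a protein sequence: sum the average residue masses and add one water molecule. The function name and the toy peptide are hypothetical and are not taken from the study.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight_kda(sequence):
    """Estimate the average molecular weight (kDa) of an unmodified protein."""
    sequence = sequence.strip().upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return (sum(RESIDUE_MASS[aa] for aa in sequence) + WATER) / 1000.0


# Hypothetical toy peptide, used only to show the call.
toy_peptide = "MASREENVYMAK"
toy_weight = molecular_weight_kda(toy_peptide)
```

Dedicated tools additionally report the isoelectric point, which requires a pKa model and is not attempted here.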

Chromosome Location and Phylogenetic Tree Analysis
To explore the homology of GF14s in G. hirsutum ((AD)1), G. barbadense ((AD)2), G. arboreum (A2) and G. raimondii (D5), we first mapped the GF14s to their chromosomes (Fig. 1B-E). In G. hirsutum ((AD)1) and G. barbadense ((AD)2), the same number of GF14s are distributed on the same 15 chromosomes of their respective A and D subgenomes (Fig. 1B, C), including 2 GF14s on chromosome A13 and 3 GF14s on chromosome D13. In G. arboreum (A2), the distribution of GF14s is roughly the same as on the A-subgenome chromosomes of G. hirsutum ((AD)1) and G. barbadense ((AD)2) (Fig. 1D). Notably, however, G. arboreum (A2) has no GF14 on Chr02 and two on Chr03, whereas G. hirsutum ((AD)1) and G. barbadense ((AD)2) each have one GF14 on A02 and one on A03. In G. raimondii (D5), the distribution and number of GF14s on the chromosomes differ somewhat from those on the D-subgenome chromosomes of G. hirsutum ((AD)1) and G. barbadense ((AD)2). This suggests that some genes may have been lost or translocated during the evolution of the D subgenome in cotton.
A phylogenetic tree was constructed from all 101 GF14 protein sequences of the four cotton species (Table S1); it divides clearly into 4 subgroups (Fig. 1A). In the tree, GF14s located at the same chromosomal positions in G. hirsutum ((AD)1) and G. barbadense ((AD)2) cluster tightly, and the GF14s of G. arboreum (A2) and G. raimondii (D5) cluster with those on the most closely related A- and D-subgenome chromosomes, respectively. Notably, within each of the four subgroups, G. hirsutum ((AD)1) and G. barbadense ((AD)2) each have roughly twice as many genes as G. arboreum (A2) or G. raimondii (D5), which is in line with the evolutionary relationships of these cotton species.
Based on previous studies of the GF14 protein family in Arabidopsis [5] and rice [4], a phylogenetic tree of the GF14s from G. hirsutum ((AD)1), Arabidopsis and rice was constructed (Fig. 2). This tree is likewise divided into four subgroups. Based on the clustering positions of the AtGF14s, the four subgroups fall into the two broad groups characteristic of the GF14 family: the ε group and the non-ε group.

Identification and analysis of GhGF14 gene homology
Unlike many other species, Gossypium hirsutum ((AD)1) is an allotetraploid (AADD), so studying homology within its own genome is important for understanding the evolution of the GF14 protein family in cotton. Homology analysis of the GhGF14s on the A- and D-subgenome chromosomes of G. hirsutum ((AD)1) (Fig. 3A) showed that the number and distribution of GhGF14s on the A and D chromosomes are similar, suggesting that the expansion of this family has been relatively conserved during the evolution of cotton species. The GhGF14s form 54 pairs of non-tandem duplicates and 2 pairs of tandem duplicates, that is, 56 pairs of paralogous genes, of which 10 pairs are between A and A, 34 between A and D, and 12 between D and D. Most genes show one-to-many collinearity, which suggests that these genes may have played an important role in the evolution and expansion of this family.
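The A-A, A-D and D-D tallies above amount to classifying each paralogous pair by the subgenome of its two chromosomes. A minimal sketch of that bookkeeping follows, assuming chromosome labels of the form "A13" or "D05"; the function names and the toy pairs are hypothetical, and only the reported totals come from the text.

```python
from collections import Counter


def subgenome(chromosome):
    """Return 'A' or 'D' from a chromosome label such as 'A13' or 'D05'."""
    label = chromosome.strip().upper()
    if label[:1] not in ("A", "D"):
        raise ValueError(f"unexpected chromosome label: {chromosome!r}")
    return label[0]


def classify_pairs(pairs):
    """Count gene pairs by the subgenomes of their two chromosomes."""
    counts = Counter()
    for chrom_1, chrom_2 in pairs:
        # Sorting makes ('D05', 'A05') and ('A05', 'D05') fall in the same class.
        key = "-".join(sorted((subgenome(chrom_1), subgenome(chrom_2))))
        counts[key] += 1
    return counts


# Hypothetical toy pairs, only to show the call.
toy_pairs = [("A13", "D13"), ("A02", "A03"), ("D05", "A05"), ("D13", "D12")]
toy_counts = classify_pairs(toy_pairs)

# Totals reported in the text: the three classes sum to the 54 non-tandem
# plus 2 tandem pairs.
reported = {"A-A": 10, "A-D": 34, "D-D": 12}
assert sum(reported.values()) == 54 + 2
```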
Rice and Arabidopsis are model monocotyledonous and dicotyledonous plants, respectively, and to some extent serve as representatives of these groups [40,41]. Collinearity analysis among the genomes of G. hirsutum ((AD)1), rice and Arabidopsis thaliana can therefore clarify the evolutionary relationships of the GhGF14 protein family with respect to monocots and dicots. As shown in Fig. 3B, there are 19 pairs of orthologous genes between the genomes of Arabidopsis and G. hirsutum ((AD)1), indicating a relatively close evolutionary relationship. By contrast, no collinear gene pairs of this family were found between rice and G. hirsutum ((AD)1), which suggests that the members of this family may have evolved and expanded mainly after the angiosperms split into monocots and dicots. These results provide important clues for using existing findings on this protein in Arabidopsis to study its function in G. hirsutum ((AD)1).

Gene structure analysis of GhGF14 protein family
The GF14 protein family has a characteristic gene structure [10]. The clustering from the phylogenetic tree (Fig. 2), the conserved motifs predicted by MEME [38] and the GhGF14 gene structure information obtained from CottonGen [34], the cotton genome database, were combined to obtain the grouping pattern characteristic of the GF14 protein family (Fig. 4). In the phylogenetic clustering, genes with high sequence similarity and similar gene structure cluster together (Fig. 4A).
According to the gene structure information (Fig. 4C), the ε group of this family has 6-7 exons and 4-6 introns, whereas the non-ε group has 4 exons and 3 introns, and the first exon of the non-ε group is the longest. This is consistent with the classification criteria for the ε and non-ε groups specific to the GF14 protein family [11,12]. On this basis, the ε and non-ε groups of the GhGF14 protein family were defined (Table 1): the ε group has 8 genes (GhGF14-06, 09, 12, 16, 22, 25, 27, 33), and the non-ε group has 24 genes (GhGF14-01, 02, 03, 04, 05, 07, 08, 10, 11, 13, 14, 15, 17, 18, 19, 20, 21, 23, 24, 26, 28, 29, 30, 31). The relatively short GhGF14 gene at the bottom (GhGF14-32) may reflect imperfect annotation in the genome file, so it is left unclassified for now. The number and position of conserved motifs (Fig. 4B) are broadly similar within each cluster, indicating that these family members may have similar functions and regulatory mechanisms. In addition to the shared motifs 1-7, motif 8, unique to the non-ε group, and motif 9, unique to the ε group, deserve attention, because they may be one basis for the grouping of the GF14 protein family and for functional differences between the groups.
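The exon-count criterion described above can be written as a simple rule. The sketch below uses only the exon numbers stated in the text (6-7 for the ε group, 4 for the non-ε group); the study itself combines phylogeny, motifs and gene structure, so this rule is a simplification, and the gene names and exon counts in the example are hypothetical.

```python
def classify_by_exons(exon_count):
    """Assign a GF14 gene to a group using only its exon count."""
    if exon_count in (6, 7):
        return "epsilon"
    if exon_count == 4:
        return "non-epsilon"
    # Anything else (for example a truncated annotation) is left unclassified.
    return "unclassified"


# Hypothetical gene names and exon counts, only to show the call.
toy_genes = {"geneA": 7, "geneB": 4, "geneC": 2}
toy_groups = {gene: classify_by_exons(n) for gene, n in toy_genes.items()}
```

Leaving unexpected exon counts unclassified mirrors how the text handles the short GhGF14-32 annotation.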

Analysis of the expression pattern of GhGF14s gene family
The expression pattern of a gene reflects its function to a certain extent. Tissue-specific expression analysis (Fig. 5) shows that the expression patterns of the GhGF14 gene family differ considerably among tissues. GhGF14-09, 25, 05 and 21 are highly expressed in different tissues, indicating that they may participate in many important processes in cotton growth and development. GhGF14-16 and 33 are highly expressed only in stamens, indicating that they may be involved in regulating cotton pollen development. GhGF14-11 and 08 are highly expressed in petals and pistils, indicating that they may play a role in cotton reproductive growth. GhGF14-13, 28, 14, 30, 20, 04, 23 and others are highly expressed in roots, stems and leaves, indicating that they may play an important role in cotton vegetative growth.
Similarly, under different abiotic stresses, the expression patterns of GhGF14 family members also differed greatly (Fig. 6) 20,30,12,27,17), and as the stress duration increased, expression also showed a fluctuating upward trend.

Analysis of cis-acting elements in the promoter region
As specific binding sites involved in the initiation and regulation of gene transcription, cis-acting elements in the promoter region are of great significance for understanding the overall regulation of plant gene expression [42]. Fig. 5A shows that each promoter region of the GhGF14 family contains multiple cis-acting elements related to stress resistance or to growth and development. These include hormone-related elements responsive to abscisic acid, auxin, salicylic acid, jasmonic acid, flavonoids and others, as well as stress-related elements associated with low temperature, drought, and defense and stress responses. This indicates that the GhGF14 protein family may be involved in different regulatory mechanisms of stress resistance during plant growth and development.
An UpSet plot (Fig. 5B) was constructed to show the distribution and proportion of these abiotic stress-related elements in the GhGF14 family. MYB (C/TAACNA/G), a cis-acting element reported to be involved in drought, salt and low temperature stress [43], is present in the promoter regions of 100% (33/33) of the GhGF14 genes; MYC (CANNTG), considered to be involved in drought stress and ABA biosynthesis [44], in 97% (32/33); ABRE (ACGT), an important element involved in ABA biosynthesis and drought stress [45], in 64% (21/33); W-box (TTGACC), which binds WRKY transcription factors and participates in plant responses to disease, drought, ABA and other stresses [46], in 55% (18/33); TC-rich repeats (GTTTTCTTAC), involved in plant defense and stress responses [47], in 30.3% (10/33); DRE (CCGAC), involved in drought, salt and low temperature stress [48], in 21.2% (7/33); and LTR (CCGAAA), involved in low temperature stress [49], in 33.3% (11/33).
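The presence percentages above come from asking, for each promoter, whether a given core motif occurs at least once. A minimal sketch of that check follows, using the core sequences quoted in the text translated into regular expressions. It scans the forward strand only, the promoter sequences are hypothetical toy strings, and it is not the prediction tool used in the study.

```python
import re

# IUPAC codes needed for the motifs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "Y": "[CT]", "R": "[AG]"}

# Core sequences as given in the text; MYB "C/TAACNA/G" is written as YAACNR.
ELEMENTS = {
    "MYB": "YAACNR", "MYC": "CANNTG", "ABRE": "ACGT",
    "W-box": "TTGACC", "DRE": "CCGAC", "LTR": "CCGAAA",
}


def to_regex(motif):
    """Translate an IUPAC motif into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


def presence_table(promoters):
    """Map each gene to the set of elements found in its promoter."""
    patterns = {name: to_regex(motif) for name, motif in ELEMENTS.items()}
    return {
        gene: {name for name, pat in patterns.items() if pat.search(seq.upper())}
        for gene, seq in promoters.items()
    }


def percent_with(table, element):
    """Percentage of genes whose promoter carries the element."""
    return 100.0 * sum(element in found for found in table.values()) / len(table)


# Hypothetical toy promoters, only to show the call.
toy_promoters = {
    "geneA": "TTCAACTGAAACGTGG",
    "geneB": "GGTTGACCTTCCGACAA",
    "geneC": "AAAATTTTAAAA",
}
toy_table = presence_table(toy_promoters)
```

A fuller analysis would also scan the reverse complement and restrict the search to a defined upstream window.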
Overall, cis-acting elements related to stress resistance are widely distributed and account for a large proportion of the elements in the promoter regions of GhGF14 family members. In particular, drought-related cis-acting elements (MYB, MYC, ABRE, W-box, DRE, etc.) are present in almost every GhGF14 gene promoter region. This indicates that the GhGF14 protein family may play an important role in plant stress resistance. It is therefore worthwhile to study the role of the GhGF14 protein family in plant stress resistance, especially in drought-related response mechanisms.

qRT-PCR analysis
Based on the transcriptome expression analysis and the UpSet plot of cis-acting elements in the promoter region (Fig. 5B), GhGF14 genes with clear expression changes under drought stress and the largest intersection of drought-related cis-acting elements were selected. Seven GhGF14 genes were finally chosen (GhGF14-09, 14, 20, 21, 23, 25, 30). To verify the accuracy of the RNA-seq expression analysis, the same leaf tissue was used for qRT-PCR analysis.
The results (Fig. 7) show that the relative expression levels of the 7 GhGF14 genes in leaves of the drought-resistant material (KK1543) and the drought-sensitive material (Xinluzao 26) changed in opposite directions at 1 h, 3 h and 6 h under drought stress. In the drought-sensitive material (Xinluzao 26), expression was significantly higher than under normal treatment at certain time points, such as 6 h and 24 h. This is broadly consistent with the results of the RNA-seq expression analysis.
In addition, to explore the expression of the GhGF14 family in different cotton tissues, the roots, stems and leaves of Xinluzao 26 at 0 h and 6 h of drought stress were selected for qRT-PCR analysis (Fig. 8). Under the normal 0 h treatment, the 7 GhGF14 genes generally had the highest expression in roots, followed by stems; after 6 h of stress, expression in roots and stems generally decreased significantly. In leaves, by contrast, expression was up-regulated from the 0 h normal treatment to the 6 h stress treatment, significantly so for GhGF14-21 and GhGF14-30.
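This excerpt does not state how the relative expression levels were calculated. A commonly used approach for qRT-PCR data is the 2^-ΔΔCt method, sketched below as an assumption rather than as the authors' procedure; the function name, the reference gene and the Ct values in the example are hypothetical.

```python
def relative_expression(ct_target_treated, ct_reference_treated,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Each Ct is first normalized to a reference gene in the same sample,
    then the treated sample is compared with the control sample.
    """
    delta_treated = ct_target_treated - ct_reference_treated
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target amplifies two cycles earlier after stress,
# while the reference gene is unchanged.
toy_fold_change = relative_expression(
    ct_target_treated=22.0, ct_reference_treated=18.0,
    ct_target_control=24.0, ct_reference_control=18.0,
)
```

With these toy values the ΔΔCt is -2, which corresponds to a four-fold increase relative to the control.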

Discussion
GF14 proteins are highly conserved in eukaryotes. They form homodimers or heterodimers, bind phosphorylated proteins that regulate various metabolic processes to form complexes, regulate the activity of those phosphorylated proteins, and play an important role in plant stress responses and signal transduction [50]. In this study, 33 GF14 protein family members were identified in allotetraploid G. hirsutum ((AD)1). Compared with existing reports in plants such as rice (8) [4], Arabidopsis (13) [5], tomato (12) [51], grape (11) [8] and poplar (12) [9], G. hirsutum ((AD)1) has the most GF14 family members.
G. arboreum (A2), G. raimondii (D5) and G. barbadense ((AD)2) are the closest relatives of G. hirsutum ((AD)1). Studying the evolutionary relationships among cotton species [20,21,24] is therefore important for understanding the expansion of the GhGF14 protein family in cotton. Unlike many other species, G. hirsutum ((AD)1) is an allotetraploid that has undergone polyploidization and therefore carries two subgenomes, A and D. This study shows that in G. hirsutum ((AD)1) there are 34 pairs of collinear paralogous genes between the A and D subgenomes. This suggests that the expansion of GF14 genes is relatively conserved between the cotton A and D subgenomes and is related to the polyploidization experienced by G. hirsutum ((AD)1). Among the four major cotton species, there is only one dense tandem repeat segment on the A and D chromosomes of G. arboreum (A2), G. raimondii (D5), G. barbadense ((AD)2) and G. hirsutum ((AD)1), and the remaining repeat segments are roughly the same. This indicates that the GF14 family has evolved and expanded slowly and that homology between the cotton A and D subgenomes is relatively high.
About 200 million years ago, when angiosperms split into monocotyledonous and dicotyledonous plants, the GF14 protein family comprised two groups, the ε group and the non-ε group [11]. In this study, based on amino acid sequences, gene structure information and phylogenetic relationships among G. hirsutum ((AD)1), rice and Arabidopsis, the 33 GhGF14 gene members of G. hirsutum ((AD)1) were divided into two broad groups, the ε group and the non-ε group (Fig. 4). This result is consistent with studies in other species [4-12]. Collinearity analysis of the Arabidopsis, rice and G. hirsutum ((AD)1) genomes (Fig. 3B) identified 19 pairs of collinear orthologous genes between G. hirsutum ((AD)1) and Arabidopsis but none between G. hirsutum ((AD)1) and rice, which in turn suggests that the evolution of the GF14 protein family may indeed have taken place before the differentiation of angiosperms into monocots and dicots.
As functional regulatory proteins, GF14 proteins in cotton have so far been studied mostly in the context of fiber development. For example, Shi Haiyan et al. [52] found that the Gh14-3-3L gene is mainly expressed in the early stage of fiber development and peaks 10 days post anthesis, suggesting that this gene may be involved in regulating fiber elongation; Zhang Zeting et al. [53] found that six Gh14-3-3 proteins in cotton may be preferentially expressed in fibers and participate in the regulation of fiber cell elongation; Zhou Ying et al. [54] found that cotton 14-3-3 proteins can participate in the regulation of fiber initiation and elongation by regulating brassinolide signal transduction. In this study, we found that the same 10 members of the GhGF14 gene family (GhGF14-09, 25, 05, 21, 01, 14, 23, 04, 20, 30) are specifically expressed under cold, heat, drought and salt stress. This indicates that GF14 proteins may also play an important role in the response of cotton to abiotic stress, and these genes may therefore be the main functional genes of the GF14 family. Notably, most of these genes belong to the non-ε group of the GhGF14 family, which suggests that some structural feature of the non-ε gene sequences may underlie this function.
Most functional research on GF14 proteins in plants has been carried out in model plants. These studies found that GF14 proteins can interact with a range of drought stress response-related proteins, enzymes or hormones, such as ion channel proteins [55], plasma membrane H+-ATPase [56] and ABA [57]. The qRT-PCR results of this study (Fig. 8) showed that the seven GhGF14 genes responded significantly to drought stress in different upland cotton materials as the stress time increased. This suggests that the GhGF14 protein family may interact with certain transcription factors or kinases and have a negative regulatory effect on drought stress. The tissue-specific results (Fig. 9) showed that, with increasing stress time, the relative expression levels of the seven GhGF14 genes were down-regulated in roots and stems but up-regulated in leaves. This suggests that under drought stress GhGF14 proteins may be regulated by some mechanism, decreasing in roots or increasing in leaves so as to inhibit or enhance the transport and synthesis of certain drought-responsive proteins or hormones and thereby complete the response to drought stress. It also indirectly indicates that the GhGF14 protein family may indeed participate in the drought stress response of plants, but further research is needed.

Conclusion
Phylogenetic and homology analyses showed that the evolution and expansion of the GF14 protein family are highly conserved between the A and D subgenomes of cotton. Based on the evolutionary relationships among rice, Arabidopsis and G. hirsutum ((AD)1), the 33 GhGF14 protein family members were systematically divided into two GF14 groups, the ε group and the non-ε group. RNA-seq expression pattern analysis indicated that this protein family may be involved in regulating multiple stress response mechanisms in cotton. Promoter element prediction and qRT-PCR analysis suggested that it may have important regulatory functions in the drought stress response. These results therefore provide a scientific basis for further research on the stress-related functions of the GF14 protein family in cotton, especially the regulatory mechanism of the drought stress response, and may help to further improve the adaptability and yield of cotton.

Plant material
In this study, the Gossypium hirsutum ((AD)1) materials KK1543 (drought-resistant, Xinjiang Academy of Agricultural Sciences) and Xinluzao 26 (drought-sensitive, Xinjiang Bazhou Agricultural Science Research Institute) [22], both widely grown in Xinjiang, were used as experimental materials. Seedlings were grown hydroponically in 1/2 Hoagland nutrient solution in a greenhouse at 25 ℃ under a 12 h light/12 h dark cycle. When the cotton seedlings reached the three-leaf stage, they were treated with 15% PEG6000 solution [23] to simulate drought stress. Three cotton seedlings with uniform growth vigor were selected at each of 0 h, 1 h, 3 h, 6 h, 12 h, 24 h, 48 h and 72 h, and the roots, stems, leaves and other tissues were sampled and stored in liquid nitrogen.