Identification of TaGATAs in Wheat
In total, 79 GATA family members were identified in wheat. Detailed information on the genes and proteins is listed in Table S1. The 79 TaGATA proteins range from 146 to 499 amino acids in length, and their molecular weights range from 16.1 to 54.1 kDa. The GATA domain sequences are listed in Table S2.
Phylogenetic Analysis of TaGATA Proteins
To clarify the phylogenetic relationships among the GATA proteins, we constructed a phylogenetic tree from the alignment of 79 wheat TaGATAs and 29 Arabidopsis AtGATAs (Figure 1). The AtGATA protein sequences are listed in Table S3. The 29 AtGATA proteins have been reported to fall into four clusters (Reyes et al., 2004). Following the classification used for Arabidopsis, the wheat GATA proteins were classified into four groups. Groups I, II, III, and IV consist of 35, 21, 12, and 11 TaGATA proteins, respectively (Figures 1 and 2A).
Gene Structure and Protein Motif Analysis of TaGATA
We used the GSDS web server to analyze the structures of the TaGATA genes. The TaGATA genes contained between one and eight exons (Figure 2B). Protein motifs were identified with MEME. In total, 10 conserved motifs were found in the TaGATA proteins and designated motifs 1-10 (Figure 2C). Detailed information on the conserved motifs is listed in Table S4. Of the 79 TaGATAs, 19 contain only motif 1 and 35 contain motifs 1 and 2. Motif 1 was present primarily in subfamilies I and II, and motifs 3-10 were detected in members of subfamilies II and IV. In summary, the similar gene structures and conserved motifs within each subfamily strongly support the subfamily classification derived from the phylogenetic analysis.
In addition, GATA domain analysis showed that TaGATAs in subfamilies I, II, and IV have 18 residues in the zinc finger loop between the second and third Cys residues, whereas TaGATAs in subfamily III have 20 residues, with the exception of TaGATA4 and TaGATA15, which have 18. Many amino acid sites in the GATA domain are highly conserved, such as the LCNACG residues (Figure 3).
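The loop lengths reported above can be checked directly from a domain sequence. The following is a minimal sketch, assuming the zinc finger follows a C-X2-C-Xn-C-X2-C layout with a loop of 18 to 20 residues; the function name `loop_length` and the example sequences are hypothetical and are not taken from Table S2.

```python
import re

# Assumed zinc finger layout: C-X2-C-Xn-C-X2-C, where the captured group
# is the loop between the second and third Cys (18 to 20 residues here).
ZF_PATTERN = re.compile(r"C.{2}C(.{18,20})C.{2}C")

def loop_length(domain_seq):
    """Return the number of residues between the 2nd and 3rd Cys, or None."""
    match = ZF_PATTERN.search(domain_seq)
    return len(match.group(1)) if match else None

# Hypothetical sequences for illustration only (not real TaGATA domains).
seq18 = "C" + "AA" + "C" + "G" * 18 + "C" + "AA" + "C"
seq20 = "C" + "AA" + "C" + "G" * 20 + "C" + "AA" + "C"
print(loop_length(seq18), loop_length(seq20))
```

Because the loop positions are matched with a wildcard, a real sequence with additional Cys residues in or near the loop could match ambiguously, so results from such a pattern should be compared against the alignment in Figure 3.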
Chromosomal Location and Genome Synteny of TaGATA Genes
The chromosomal distribution of the TaGATA genes was analyzed. In total, 79 TaGATAs were mapped to the wheat genome (Figure 4). The TaGATA genes were distributed approximately evenly among the A (29), B (25), and D (25) subgenomes, consistent with the finding that a large proportion of TaGATAs have three homoeologous copies, one on each subgenome. Three TaGATA genes were located on each of chromosomes 3 and 5, six on each of chromosomes 1 and 2, and four on chromosome 6. Five TaGATA genes were located on chromosome 4A and three on each of chromosomes 4B and 4D. Chromosome 7 carried two TaGATAs, the lowest number. Using BLAST and MCScanX, we detected 96 segmental duplication events among the TaGATAs (Figure 6; Table S5). Almost all events occurred between different chromosomes. Four duplication events occurred within the A subgenome, three within the B subgenome, four within the D subgenome, and 85 across the A, B, and D subgenomes. These results suggest that many TaGATA genes arose through gene duplication and that segmental duplication played an important role in the expansion of the TaGATA family in wheat.
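The tally of duplication events within and across subgenomes can be illustrated with a short sketch. It assumes each duplicated pair is given as two chromosome names whose last character is the subgenome letter (for example `4A`); the function names and the example pairs are hypothetical and do not come from Table S5.

```python
from collections import Counter

def subgenome(chromosome):
    """Return the subgenome letter of a wheat chromosome name such as '4A'."""
    return chromosome[-1].upper()

def classify_pairs(pairs):
    """Count duplicated pairs within each subgenome and across subgenomes."""
    counts = Counter()
    for chrom_a, chrom_b in pairs:
        sg_a, sg_b = subgenome(chrom_a), subgenome(chrom_b)
        # Same letter: within-subgenome event; otherwise it spans subgenomes.
        counts[sg_a if sg_a == sg_b else "across"] += 1
    return counts

# Hypothetical pairs for illustration only.
pairs = [("1A", "2A"), ("1A", "1B"), ("4B", "5B"), ("2B", "2D"), ("3D", "6D")]
print(classify_pairs(pairs))
```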
The collinearity of TaGATA genes with genes in the Hordeum vulgare, Arabidopsis thaliana, and Oryza sativa genomes was compared. Three and ten TaGATA genes showed syntenic relationships with AtGATA and OsGATA genes, respectively (Figure 7; Tables S6 and S7). For example, AT2G45050 showed a syntenic relationship with TaGATA3, TaGATA9, and TaGATA14 (Table S6). By contrast, 54 TaGATAs showed syntenic relationships with GATAs in barley (Table S8), implying that these genes may have contributed to the evolution of the TaGATA family.
To assess the evolutionary constraints acting on the GATA family genes, we calculated Ka values, Ks values, Ka/Ks ratios, and divergence times for the paralogous and orthologous gene pairs (Table S9). The Ka/Ks ratio was less than 1 for several segmentally duplicated TaGATA gene pairs, whereas that of TaGATA26/TaGATA31 was greater than 1. Because a Ka/Ks ratio below 1 indicates purifying selection, these results suggest that the TaGATA family has probably undergone strong purifying selection in the course of evolution.
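The interpretation of Ka/Ks ratios and the conversion of Ks to a divergence time can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for Table S9: the divergence time uses the common approximation T = Ks / (2 × rate), and the default rate of 6.5e-9 synonymous substitutions per site per year is a value often used for grass genomes that is assumed here, since the text does not state the rate. The function names are hypothetical.

```python
def selection_type(ka, ks):
    """Interpret a Ka/Ks ratio: <1 purifying, >1 positive, =1 neutral."""
    if ks == 0:
        return "undefined"  # the ratio cannot be computed when Ks is zero
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"

def divergence_time_mya(ks, rate=6.5e-9):
    """Approximate divergence time in million years as Ks / (2 * rate).

    The default rate is an assumption, not a value given in the text.
    """
    return ks / (2 * rate) / 1e6

# Hypothetical Ka and Ks values for illustration only.
print(selection_type(0.1, 0.5), divergence_time_mya(0.13))
```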
Cis-element Analysis of TaGATA Promoters
To explore the potential functions of the TaGATA genes, we used PlantCARE to detect cis-elements in their promoters. Cis-elements were predicted in the promoters of all 79 TaGATAs, including ABRE, circadian, G-box, LTR, MSA, P-box, TCA, TGA, TGACC-motif, and MBSI elements, which are involved in ABA response, circadian control, light response, low-temperature response, cell cycle regulation, gibberellin response, salicylic acid response, auxin response, MeJA response, drought inducibility, and regulation of flavonoid biosynthetic genes (Figure 5; Table S10). Specifically, 69 TaGATA genes (87.3%) carried ABRE cis-elements, 75 (94.9%) carried G-box cis-elements, and 63 (79.7%) carried TGACC-motif cis-elements. In summary, the cis-element analysis suggests that a large proportion of TaGATA genes are likely to respond to various environmental stresses.
Expression Analysis of TaGATAs in Wheat Tissues
The expression patterns of the 79 TaGATAs were compared across five tissues of Chinese Spring: roots, leaves, stems, spikes, and grains (Figure 8; Table S11). Based on their expression patterns, the genes were classified into two groups. Group 1 includes nine genes that were expressed in only some tissues. For example, TaGATA4 was expressed only in spikes and not in the other tissues. Group 2 includes 70 genes that were expressed in all tissues analyzed in this study. Group 2 can be divided into three subgroups. Twelve TaGATAs were assigned to subgroup 1, with high expression levels (log2(TPM+1) > 2) in all tissues. Ten TaGATAs were assigned to subgroup 2, with low expression levels (log2(TPM+1) < 0.5) in all tissues. The remaining 48 genes belong to subgroup 3. These results indicate that the TaGATAs differ in expression level and that genes within the same subfamily can also show different expression profiles.
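The grouping described above can be expressed as a small decision rule. The sketch below assumes the thresholds refer to log2(TPM + 1) and treats a TPM of 0 as "not expressed"; the text does not define the cutoff for expression, so that choice, the function name, and the example values are all hypothetical.

```python
import math

def classify_expression(tpm_by_tissue, high=2.0, low=0.5):
    """Assign a gene to a group from its TPM values across tissues."""
    values = [math.log2(tpm + 1) for tpm in tpm_by_tissue.values()]
    # Assumption: a gene is "not expressed" in a tissue when TPM is 0.
    if any(v == 0 for v in values):
        return "group 1"
    if all(v > high for v in values):
        return "group 2, subgroup 1"  # high expression in all tissues
    if all(v < low for v in values):
        return "group 2, subgroup 2"  # low expression in all tissues
    return "group 2, subgroup 3"      # everything else

# Hypothetical TPM values for illustration only.
print(classify_expression({"root": 0.0, "spike": 5.0}))
print(classify_expression({"root": 10.0, "leaf": 8.0}))
```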
Expression Patterns of TaGATAs under Abiotic Stress
We analyzed the expression levels of TaGATA genes under different abiotic stresses, including drought, heat, cold, and P starvation, using recently published wheat transcriptome data. Overall, the expression levels of TaGATA genes changed significantly under diverse abiotic stresses (Figure 9; Table S12). Several TaGATA genes responded to heat stress or P starvation. For example, the expression of TaGATA74, TaGATA76, and TaGATA78 was strongly increased by P starvation, and TaGATA54, TaGATA57, and TaGATA60 showed high expression levels in response to heat stress. Meanwhile, some TaGATA genes were repressed by cold stress, such as TaGATA53 and TaGATA59, or by P starvation, such as TaGATA19. In contrast, several TaGATAs did not respond to any of the abiotic stresses. For example, TaGATA4 and TaGATA20 displayed almost no change in expression in response to any of the treatments analyzed. In addition, several genes displayed opposite expression patterns under different abiotic stresses. For instance, TaGATA78 responded strongly to all treatments, being down-regulated under drought stress but up-regulated under the other treatments.