Identification of the HD-Zip gene family in wheat
Wheat genome data used in this study were downloaded from the Chinese Spring IWGSC RefSeq v1.1 reference genome assembly (https://wheat-urgi.versailles.inra. fr/). We firstly converted the wheat genome into a local BLAST database using the UNIX pipeline. Then, we used 90 Arabidopsis and rice HD-Zip protein sequences to perform a BLAST search (BLASTP) against this local blast database using cut-off E-value < 1e-10. After remove the all redundant sequences using CD-hit program, the rest of protein sequences were further subjected to identify the HD domain and LZ motif using the Simple Modular Architecture Research Tool (SMART; http://smart. embl-heidelberg.de/smart/set_mode.cgi? NORMAL=1). In a recently study, a total of 46 HD-Zip genes were identified in wheat by a genome-wide bioinformatic survey . In this study, we further identified 67 additional HD-Zip genes in wheat latest genome and extended the total member to 113. Based on the genomic position information, 113 HD-Zip genes were located across all the 21 wheat chromosomes, ranging from 3 to 8 per chromosome. Chromosome 5A/B/D have the most HD-Zip genes (24 total, 8 per chromosome), followed by chromosome 4A/B/D (18 total, 6 per chromosome) (Table 1; Additional file 1: Figure S1). The 113 HD-Zip proteins were grouped into 40 clusters based on their phylogenetic relationship. Among them, 39 clusters were assigned to different A, B or D sub-genomes, which were considered as the homoeologous copies of one HD-Zip gene. Finally, We designated wheat HD-Zip genes as TaHDZX_ZA, TaHDZX_ZB, or TaHDZX_ZD, where X denotes the gene number and Z the wheat chromosome where it is located. The detailed information of HD-Zip family genes in wheat, including nomenclature proposed in the previous study  was listed in Table 1. As shown in Table 1, the identified HD-Zip genes in wheat encode proteins ranging from 192 (TaHDZ12-6D) to 890 (TaHDZ35-1B) amino acids (aa) in length with an average of 501 aa. Furthermore, the computed molecular weights of these HD-Zip proteins ranged from 20.88 (TaHDZ12-6D) to 96.02 (TaHDZ35-1B) kDa. The theoretical pI of the deduced HD-Zip proteins ranged from 4.59 (TaHDZ5-6A) to 9.79 (TaHDZ12-6D).
Phylogenetic analysis of HD-Zip gene family
Our study aimed to understand the phylogenetic relationships between plant HD-Zip proteins. We began by identification of HD-Zip genes from seven other plant species with varying levels of complexity for which entire genomes were accessible: an algae (Chlamydomonas reinhardtii), a moss (Physcomitrella patens), the monocotyledonous angiosperms Oryza sativa, Zea mays, and Brachypodium distachyon, and the dicotyledonous angiosperms Arabidopsis thaliana, Populus trichocarpa, and Vitis vinifera. From this analysis, we found that the HD-Zip gene family seems to be restricted to land plants; all genomes except that of the algae contained genes for HD-Zip proteins. We then used the neighbour-joining phylogenetic tree method with full-length HD-Zip protein sequence alignments from eight plant species to describe their evolutionary relationships. All HD-Zip proteins sorted into four well-conserved subfamilies, HD-Zip I to IV (Fig. 1A). The phylogenetic tree revealed that the plant HD-Zip sequence distribution predominates with species bias (Fig. 1B). HD-Zip I genes generally consisted of the largest subfamilies in the plant species except for Brachypodium distachyon and wheat, where HD-Zip II and IV were the largest respectively. In contrast, HD-Zip III genes composed of the fewest numbers of HD-Zip members except for moss (Fig. 1C). Subfamily I included 31 TaHDZ genes, grouped into 11 clusters (TaHDZ1-4A/B/D, TaHDZ2-5A/B/D, TaHDZ3-4A/B/D, TaHDZ4-5A/B/D, TaHDZ5-6A/D, TaHDZ6-5A/B/D, TaHDZ7-2A/B/D, TaHDZ8-6A/B/D, TaHDZ9-4A/B/D, TaHDZ10-2B/D, and TaHDZ11-2A/B/D); Similarly, subfamily II embraces 31 TaHDZs, grouped into 12 clusters (TaHDZ12-6A/B/D, TaHDZ13-6A/B/D, TaHDZ14-7A/B/D, TaHDZ15-1A/B/D, TaHDZ16-4B/D, TaHDZ17-3B/D, TaHDZ18-5A/B/D, TaHDZ19-3A/B/D, TaHDZ20-1A/B/D, TaHDZ21-2A/B/D, TaHDZ22-4A, and TaHDZ23-7A/D); While subfamily III is the smallest, and contained 14 TaHDZs, which grouped into 5 clusters (TaHDZ24-3A/B/D, TaHDZ25-1A/B/D, TaHDZ26-4B/D, TaHDZ27-5A/B/D, and TaHDZ28-5A/B/D); Subfamily IV contained 36 TaHDZs, and grouped into 12 clusters (TaHDZ29-3A/B/D, TaHDZ30-4A/B/D, TaHDZ31-5A/B/D, TaHDZ32-3A/B/D, TaHDZ33-6A/B/D, TaHDZ34-7A/B/D, TaHDZ35-1A/B/D, TaHDZ36-6A/B/D, TaHDZ37-2A/B/D, TaHDZ38-5A/B/D, TaHDZ39-7A/B/D, and TaHDZ40-2A/B/D) (Table 1).
To clarify the paralog and ortholog relationships among this family, we further divided the subfamilies into subclasses. According to the phylogenic tree (Fig. 2), each subfamily contain rice, Arabidopsis and wheat HD-Zip genes, suggesting that this subfamily in plants were generated before the dicot-monocot split. Consistent with the nomenclature in previous studies of Arabidopsis and rice , HD-Zip I subfamily was divided into seven subclasses, i.e., α, β, γ, δ, ε, φ and ζ (Fig. 2). Clade ε and φ contains only sequences from Arabidopsis. Clade ζ contains sequences from both rice and wheat, while no members of Arabidopsis were detected in this subclades, suggesting that Arabidopsis lost its members of this group during the long period of evolution. The HD-Zip II subfamily was divided into ten subclasses, from α to κ, according to Hu et al. (2012) . Clade β contains only sequences from Arabidopsis. Clade α and γ contains sequences from both rice, wheat, and Arabidopsis. While the other clades only contains sequences from rice and wheat. The HD-Zip III subfamily was classified into three subclades, designated as α, β and γ according to the previous studies . Each clade contains sequences from both rice, wheat, and Arabidopsis. The HD-Zip IV subfamily was also divided into six subclades, designated clade α, β, γ, δ, ε and ζ as in a previous study . Clade δ excluded genes from rice and Arabidopsis, while clade ζ included only sequences from rice and wheat. Eudicot- and monocot-specific clustering patterns of HD-Zip genes emerged when tree topology was examined. This pattern may reflect evolutionary history of these subgroups: HD-Zip genes in eudicots were likely retained after they diverged from monocots and then expanded.
Gene structure and motif composition analysis
Exon-intron structural divergence can play an important role in the evolution of multiple gene families . We constructed a phylogenetic tree using only the 113 full-length wheat HD-Zip protein sequences fo further examine patterns in wheat. We found that wheat HD-Zip proteins also fell into the four subfamilies described previously (Fig. 3A). We further mapped the exon/intron organization in the coding regions of each TaHDZ gene. Specifically, 21 TaHDZ genes had two introns, 28 had three introns, 15 had four introns, two had five introns, two had seven introns, 11 had eight introns, 12 had nine introns, three had 10 introns, five had 11 introns, two had 15 introns, two had 16 introns, and 10 had 18 introns (Fig. 3B, C). In general, orthologous genes are highly conserved with respect to gene structure, and this conservation is sufficient to reveal their evolutionary relationships . In wheat, HD-Zip genes within the same subfamily shared similar gene structures (intron number and exon length), especially the members of the HD-Zip I and HD-Zip III subfamilies, i.e, most wheat HD-Zip I genes had two or three introns, and most HD-Zip III genes had 18 introns in their coding regions. Nonetheless, the gene structures in wheat HD-Zip subfamilies II and IV appear to be more variable and display the largest number of exon/intron structural variations, i.e., HD-Zip II members possessed two to four introns, and the number of introns in HD-Zip IV family members varied from 4 to 11 (Fig. 3B, C).
The allohexaploid bread wheat genome is known to have formed by fusion of the T. urartu (subgenome A), Aegilops speltoides (subgenome B), and A. tauschii (subgenome D) genomes prior to several hundred thousand years ago. A majority (60.1-61.3%) of genes in the A, B, and D sub-genomes have orthologs in all the related diploid genomes. To obtain intron gain/loss information for all of the TaHDZ genes in the A, B, and D sub-genomes, we also compared the intron/exon structures of the genes that clustered together based on the phylogenetic tree. Among these, fourteen clusters showed changes in their intron/exon structure, including TaHDZ1-4A/B/D, TaHDZ3-4A/B/D, TaHDZ5-6A/D, TaHDZ10-2B/D, TaHDZ12-6A/B/D, TaHDZ20-1A/B/D, TaHDZ24-3A/B/D, TaHDZ25-1A/B/D, TaHDZ30-4A/B/D, TaHDZ32-3A/B/D, TaHDZ35-1A/B/D, TaHDZ38-5A/B/D, TaHDZ39-7A/B/D, and TaHDZ40-2A/B/D (Fig. 3B). Because there are many orthologs in the wheat A, B, and D sub-genomes, intron gain/loss of these orthologs significantly increases the transcriptome and proteome complexity in wheat.
To further examine the diverse structurse of wheat HD-Zip proteins, the conserved motifs were identified by searching the SALAD database along with subsequent annotation with InterPro (Additional file 2: Figure S2). Seven of these motifs were found to be associated with the functionally defined domains. Motifs 1 and 2 were referred to the HD domain, which is the typical conserved domain found in the middle of all the TaHDZ proteins, and motif 5 was associated with the adjacent LZ domain. Motifs 17 and 34 were referred to the MEKHLA domain, which is specific to subfamily III, and was found only in subfamily III proteins in wheat (14 members). Motifs 3 and 4 were associated with the START region, which has been identified in subfamily III and IV proteins (Additional file 2: Figure S2). Similar motif compositions are shared by TaHDZ proteins which cluster together, and this indicates that members of a given group possess similar functionalities.
Tissue-specific expression profile of TaHDZ genes
Gene family members can exhibit different expression patterns in different tissues to accommodate various physiological processes. To gain insight into the temporal and spatial expression patterns and putative functions of HD-Zip genes in wheat growth and development, the tissue-specific expression patterns of the 113 TaHDZ genes were investigated using RNA-seq data from 10 different tissues. All TaHDZ genes were found to be expressed in at least one of the tissues examined (Fig. 4; Additional file 3: Table S1). Subfamily I TaHDZ genes were found to be much more highly expressed in seedling roots, stems, leaves, flag leaves, young spikes, and 5-day-old grains; for example, TaHDZ1-4A/B/D are highly expressed in leaves and 5-day-old grains, TaHDZ8-6A/B/D are highly expressed in leaves and young spikes (15-days-old), and TaHDZ11-2A/B/D are highly expressed in leaves and 5-day-old spikes (Fig. 4; Additional file 4: Figure S3). Subfamily II TaHDZ genes are more highly expressed in seedling roots, stems, leaves, flag leaves, and young spikes; for example, TaHDZ19-3A/B/D are highly expressed in young spikes, while TaHDZ20-1A/B/D are highly expressed in seedling stems, leaves, and 5-day-old spikes (Fig. 4; Additional file 5: Figure S4). Subfamily III TaHDZ genes showed relatively higher expression levels in seedling stems, leaves, and young spikes; TaHDZ24-3A/B/D are highly expressed in seedling leaves, and TaHDZ27-5A/B/D are highly expressed in seedling stems and leaves (Fig. 4; Additional file 6: Figure S5). Subfamily IV TaHDZ genes are highly expressed in seedling stems, young spikes, and grains; TaHDZ29-3A/B/D are highly expressed in 10-day-old grains, TaHDZ32-3A/B/D are highly expressed in 5-20 day-old grains, and TaHDZ38-5A/B/D are highly expressed in seedling stems and young spikes (Fig. 4; Additional file 7: Figure S6). Thus, genes in the four wheat HD-Zip subfamilies display obvious differences in expression patterns and levels, which indicates that these genes have undergone functional differentiation and redundancy. It is worth mentioning that most homologous genes show similar expression patterns during development. However, it should also be noted that many clustered expression profiles do not reflect gene similarities, and this includes the copies of individual HD-Zip gene types from the sub-genomes. Some of them even show the opposite expression patterns. For instance, TaHDZ7, which is located on chromosome 2D, is preferentially expressed in the seedling leaves and flag leaves, whereas the homologous TaHDZ7 gene from 2A is only expressed in the flag leaves, and the TaHDZ7 homolog from 2B is preferentially expressed in flag leaves and 5-day-old spikes (Fig. 4; Additional file 4: Figure S3). TaHDZ37 on 2A shows relatively higher expression in 10-15 day-old grains, while its homologous TaHDZ37 from 2B is preferentially expressed in seedling leaves and 20-day-old grains, and the homologous from 2D is highly expressed in 15-days-old grains (Fig. 4; Additional file 7: Figure S6). The divergences in expression profiles between homologous genes from the different subgenomes reveals that some of them may have lost their function or acquired a new function after polyploidization during the evolution of wheat.
Expression patterns of TaHDZ genes in response to drought stress
Wheat productivity is severely affected by drought stress, and therefore the study of drought responsive genes is important to increase wheat yield. Many studies have shown that the HD-Zip genes play a crucial role in the response to abiotic stresses in plants. To gain more insight into the roles of wheat HD-Zip genes in drought tolerance, we reanalyzed the expression profiles of all wheat HD-Zip genes using RNA-seq data from roots and leaves that were subjected to drought treatment. We found that the wheat HD-Zip genes could be mainly classified into two groups based on their expression patterns (Fig. 5A, B; Fig. 6A, B). In leaves, the expression levels of 45 TaHDZ genes were up-regulated at one or more time point during drought stress treatment; this included 20 genes from the HD-Zip I subfamily (TaHDZ2-5A/B/D, TaHDZ4-5A/B/D, TaHDZ5-6A/D, TaHDZ6-5A/B/D, TaHDZ7-2A/B/D, TaHDZ8-6A/B/D, TaHDZ9-4B/D, and TaHDZ11-2D), 19 genes from the HD-Zip II subfamily (TaHDZ18-5A/B/D, TaHDZ20-1A/B, TaHDZ16-4A/B/D, TaHDZ12-6A/D, TaHDZ13-6A/B/D, TaHDZ14-7A/B, TaHDZ15-1A/B/D, and TaHDZ17-3D), one gene from the HD-Zip III subfamily (TaHDZ24-3A), and five genes from the HD-Zip IV subfamily (TaHDZ29-3A, TaHDZ30-4B, TaHDZ31-5D, TaHDZ37-2A/B) (Fig. 5A, C, and D). In contrast, 50 TaHDZ genes showed down-regulated expression under drought stress, including seven genes from subfamily I, six genes from subfamily II, 12 genes from subfamily III, and 25 genes from subfamily IV (Fig. 5A, C, D). In roots, 34 TaHDZ genes were found to be up-regulated in response to drought stress, including 16 genes from subfamily I (TaHDZ4-5A/B/D, TaHDZ6-5A/B, TaHDZ7-2A/B/D, TaHDZ8-6A/B/D, TaHDZ9-4B/D, and TaHDZ11-2A/B/D), 16 genes from subfamily II (TaHDZ15-1A/B/D, TaHDZ16-4A/B/D, TaHDZ17-3B, TaHDZ19-3A/B/D, TaHDZ20-1A/B/D, TaHDZ21-2A/B, and TaHDZ22-4A) and two genes from subfamily IV (TaHDZ37-2B and TaHDZ40-2B) (Fig. 6A, C, D). In contrast, 51 TaHDZ genes were down-regulated under drought stress in roots, including 12 genes from subfamily I, 8 genes from subfamily II, 13 genes from subfamily III, and 18 genes from subfamily IV (Fig. 6A, C, D). These results indicate that most TaHDZ genes in subfamilies I and II may play important roles in the response to drought stress.
TaHDZ5-6A confers drought tolerance in Arabidopsis
The phylogenetic analysis and gene expression profiles suggest that TaHDZ5-6A/D may participate in regulating the drought stress response in wheat. Protein sequence analysis revealed that TaHDZ5-6A and TaHDZ5-6D share 95% sequence similarity (Additional file 8: Figure S7). In order to further comfirm the potential role of TaHDZ5 in the drought stress response, we performed quantitative real-time PCR (qRT-PCR) using RNA isolated from different tissues and drought conditions. The PCR primers were designed to amplify the homologous alleles of TaHDZ5. The results showed that TaHDZ5 is expressed at higher levels in the seedling leaves, flag leaves and young spikes, with the highest expression detected in the seedling leaves, and TaHDZ5 was upregulated throughout the testing period by drought stress (Additional file 9: Figure S8). To further investigate the role of TaHDZ5 in the drought stress response, we generated 35S::TaHDZ5-6A transgenic Arabidopsis lines. Three independent transgenic lines (OE1, OE2, and OE3) were chosen for analysis based on their TaHDZ5-6A expression levels (Fig. 7B). Under well-watered conditions, there were no evident morphological differences between transgenic and wild type (WT) Arabidopsis throughout their life cycles. WT and 35S::TaHDZ5-6A transgenic plants were grown for 3 weeks in soil before water was withheld for 14 d. At the early stages of drought treatment, TaHDZ5-6A transgenic and WT plants grew normally, with no notable phenotypic differences between them (Fig. 7A). After the drought treatment and four days of rewatering, 65-74% of the 35S::TaHDZ5-6A plants had survived, whereas only 6% of the WT plants were alive (Fig. 7A, C). The survival rates of lines OE-1, OE-2, and OE-3 plants were 68, 65, and 74%, respectively, which were much higher (t-test, P < 0.01) than in WT (18%). Thus, the overexpression of TaHDZ5-6A greatly improved drought tolerance in transgenic Arabidopsis.
The stomatal apertures of leaves from 35S::TaHDZ5-6A and WT plants grown in soil were measured. The stomatal aperture indices of of the OE1, OE2, and OE3 plants were 0.41, 0.42 and 0.41, respectively, while that of the WT plants was 0.40, when grown under normal conditions (Fig. 7D). After being subjected to 10 d of drought stress, the stomatal aperture indices of the OE1, OE2, and OE3 plants decreased to 0.22, 0.18, and 0.22, respectively, significantly reduced as compared to that of the WT (Fig. 7D, E). In keeping with these results, after dehydration, detached leaves of 35S::TaHDZ5-6A transgenic plants lost water much more slowly than those of WT plants (Fig. 7E). These results indicate that the 35S::TaHDZ5-6A transgenic plants removed water from the soil more slowly than did the WT plants, reducing the rate of wilting. To explore whether TaHDZ5-6A overexpression influences proline accumulation, we measured the free proline contents in the transgenic and WT plants. After drought treatment, the proline contents of the transgenic lines were much higher than those of the WT plants (Fig. 7F). These findings collectively indicate that TaHDZ5-6A can enhance drought tolerance in transgenic Arabidopsis.
Changes in gene expression in 35S::TaHDZ5-6A transgenic Arabidopsis
RNA sequencing allowed us to understand how drought tolerance was conferred by the overexpression of TaHDZ5-6A. The transcriptome of the 35S::TaHDZ5-6A transgenic plants was compared to that of WT plants under normal, non-stress conditions. In transgenic plants, a total of 495 and 111 genes were upregulated and downregulated by at least 2-fold (P < 0.001, FDR < 0.05) as compared with the WT (Fig. 8A, B; Additional file 10: Table S2). The upregulated genes included genes related to water deprivation, abscisic acid, hormones, and abiotic stimuli, and downregulated pathways included those responsive to auxin stimuli, oxidative stress, and defense responses (Fig. 8C). We then chose 10 genes upregulated in transgenic plants and known to be involved in response to drought: DREB2A , RD29A , RD29B , RD26 , RD17 , PP2CA , RAB18 , ANAC019 , NCED3 , and RD20 . We used qRT-PCR to measure their relative expression levels under normal and drought conditions in transgenic and WT plants (Fig. 8D). The results of qRT-PCR were in alignment with those of RNA-seq, indicating that TaHDZ5-6A may positively regulate the transcription of these 10 genes, and thereby play a role in the response, including rapid stomatal closure and reduction of water loss, of transgenic Arabidopsis plants under drought conditions.