Identification and basic characteristics of TaHsf genes
The newly constructed wheat-specific hidden Markov model (HMM) file was used to search the whole wheat protein sequences and 94 candidate Hsf genes were obtained, which were consistent with the wheat candidate Hsf genes obtained by alignments with Arabidopsis and rice Hsf genes. After identifying the DBD domain and coiled-coil structure, 61 non-redundant genes were finally identified in the wheat HSF family (Additional file 1: Table S1, Additional file 2: Table S2). These 61 TaHsfs were shown to be unequally distributed on 21 chromosomes (Additional file 3: Figure S1). The 5A chromosome contained the most TaHsf genes (6) and one to five TaHsfs were unevenly distributed on other chromosomes. We renamed these genes as TaHsf-01 to -61 according to their respective chromosome distribution. The gene name, transcript ID, gene position, exon number, aa length, molecular weight, and theoretical isoelectric point are listed for each of the TaHsfs in Table 1. The UniParct ID and old versions (IWGSC2.2) of the gene IDs are also included for clarity. The protein lengths of the TaHsfs ranged from 227 (TaHsf-22) to 569 (TaHsf-44), the molecular weights varied from 24.69 (TaHsf-22) to 59.76 (TaHsf-44), and the predicted isoelectric points ranged from 4.85 (TaHsf-12) to 9.52 (TaHsf-43).
Phylogenetic analysis of TaHsfs
The full-length amino acid sequences of 25 Hsf genes in rice, 22 Hsf genes in Arabidopsis, and 61 Hsf genes in wheat were obtained to co-construct a phylogenetic tree to classify the TaHsfs (Figure 1). Based on the clustering results, 61 TaHsfs were classified into three major categories: A, B, and C. The TaHsfs in class A belonged to six subclasses of A1–A6, the class B TaHsfs were divided into three subclasses of B1, B2, and B4; and the class C TaHsfs were divided into two subclasses of C1 and C2. Compared with rice and Arabidopsis Hsf members, the TaHsf family lacked members belonging to A7, A8, A9, and B3. In every subclass, at least one OsHsf was found to be highly homologous with the Hsf genes in wheat, e.g., OsHsf-13 (A1) was highly homologous with TaHsf-46, TaHsf-32, and TaHsf-51 in A1; OsHsf-07 (A3) shared high homology with TaHsf-10, TaHsf-14, and TaHsf-07 in A3; OsHsf-04 (A4b) in A4 was highly homologous with TaHsf-28, TaHsf-18, and TaHsf-13; and OsHsf-06 (A5) and TaHsf-52, TaHsf-53, TaHsf-54, in A5 were clustered together due to their high homology. OsHsf-17 (A6a) in A6, OsHsf-23 (B1) in B1, OsHsf-14 (B2a) and OsHsf-24 (B2c) in B2, OsHsf-10 (B4d) in B4, OsHsf-02 (C1b) and OsHsf-03 (C1a) in C1, and OsHsf-05 (C2a) and OsHsf-16 (C2b) in C2 were also highly homologous with some Hsfs in wheat. However, only TaHsf-01 in A4 and OsHsf-15 (A4d), and TaHsf-13 in B4 and OsHsf-19 (B4d) were orthologous genes. No orthologous genes were detected in the HSF family members in Arabidopsis and wheat, which indicates that the genetic relationship between wheat and rice is closer than that between wheat and Arabidopsis. Nineteen pairs of paralogous genes were found among the TaHsfs. A2 had the most paralogous genes with five pairs, as well as three pairs in C1 and B2, two pairs in C2, and one pair in each of the other subclasses. In addition to the initial the roles of heat stress response, those paralogous genes confered new functions for wheat HSF family, which is very important for the functional differentiation of this gene family.
Synteny analysis for TaHsfs
The synteny between species is strongly related to their evolution and kinship. In order to compare the kinship between wheat and other species based on HSFs, we performed synteny analysis between wheat with Arabidopsis thaliana, Chinese cabbage, rice, maize, sorghum, and Brachypodium. The results showed that TaHsfs were distantly related to the Hsfs in Arabidopsis thaliana and Brassica rapa, where only two and three genes were syntenic genes, respectively (Figure 2A). By contrast, the number of syntenic genes between wheat and rice (25), sorghum (39), corn (30), and Brachypodium distachyon (34), which all belong to the Gramineae, were higher (Figure 2B, Figure 2C). Arabidopsis and Brassica rapa are both cruciferous, so their 21 Hsf genes had synteny relationships (Figure 2D). Thus, the Hsf genes in plants from the same family were more closely related, thereby suggesting that the evolution of Hsfs is consistent with the evolutionary direction of the whole genome. To further investigate the evolutionary relationships in the TaHsf family, we performed synteny analysis within the TaHsf genome. Only two pairs of tandem duplications comprising TaHsf-20/TaHsf-19 and TaHsf-30/TaHsf-29 occurred in the TaHsf family on the 3A and 3D chromosomes, respectively (Additional file 3: Figure S1), and their KA/KS values were 0.367731 and 0.388683. Therefore, these two pairs of genes were subject to purifying selection. In addition, 52 TaHsfs participated in segmental duplication (Additional file 4: Table S3) and one to five TaHsfs were involved in each chromosome (Figure 3). The results showed that the probability of segmental duplication was greater for TaHsfs in the same chromosome than that between chromosomes. For instance, the segmental duplication on chromosome 2 only occurred between 2A, 2B, and 2D. Thus, the Hsf gene family in wheat mainly formed via segmental duplication.
Gene structure and motif composition in TaHsfs
Multiple sequence alignments were performed to further analyze the degree of conservation in the DBD and OD domains in the TaHsfs. The results showed that the DBD amino acid lengths in the TaHsfs were 76–94 aa, and thus they were found to be highly conserved (Figure 4A). In particular, TaHsf-28 (76 aa) had a 17 aa deletion compared with most of the TaHsfs. The secondary structure of the DBD in the TaHsfs was predicted to comprise three alpha helices and four beta folds. According to the OD structure, the classification of the gene family was consistent with the results obtained by phylogenetic analysis, with 21 and seven aa insertions between the HR-A and HR-B in the class A and class C TaHsfs, respectively, and no insertion between the HR-A and HR-B in the class B TaHsfs (Figure 4B). The close phylogenetic relationships between the TaHsfs indicated the high evolutionary conservation of the ODs, and the sequences of the ODs in some pairs of paralogous genes were even identical. Figure 5 shows that the number of introns was very small in the TaHsf family members, ranging from 0 to 4, and the lengths of the introns varied greatly. Motif prediction identified 15 conserved motifs with high confidence ranging between 11–50 aa in the TaHsf family, where motif 1 was the longest (50 aa) and motif 6 was the shortest (Additional file 5: Table S4). None of the TaHsf contained all 15 motifs. Three TaHsfs in subclass A1 contained the most motifs (10) and TaHsf-41 contained the least motifs (three). Motifs 1, 2, 5, and 6 corresponded to the DBD domain, motif 3 and 4 to the OD domain, motif 7 to the NLS, and motif 10 to the AHA motif.
Predicted cis-elements in TaHsf promoters and interaction network analysis
In order to facilitate subsequent mining of the functions and mechanisms of TaHsfs, we predicted the cis-elements in their upstream promoter regions (Figure 6). We screened six cis-elements comprising three related to stress responses (DRECRTCOREAT (dehydration-responsive element), CURECORECR (copper-response element), and SORLIP1A (over-Represented in Light-Induced Promoters)) and the other three were CGCGBOXAT (Ca++/calmodulin binds), MYBPLANT (plant MYB binding site), and POLLEN1LELAT52 (one of two co-dependent regulatory elements responsible for pollen specific activation). The results showed that except TaHsf-14, there was at least one cis-element binding site in each TaHsf. The combination of TaHsf with DRECRTCOREAT, CURECORECR, and SORLIP1A also showed that TaHsfs function in the resistance to heat stress, but some TaHsfs also respond to drought, copper, and light stresses. The Ca++/calmodulin pathway is the primary signaling pathway and almost every TaHsf gene was shown to contain more than one binding site. In our previous study, we found that TaHsf interacts with MYB, and this result was confirmed by the MYBPLANT binding sites in TaHsf-15 (B2), TaHsf-16 (C1), TaHsf-28 (A4), and TaHsf-35 (A2). Cis-element analysis allowed us to obtain the upstream factors that regulate their activity, but we determined their interactions using PlantReg Map (Figure 7A). The predicted results indicated interactions between 10 TaHsfs, including self-regulation between TaHsf-02 and TaHsf-44, and both these and nine other genes had regulatory or regulated effects. In addition, we predicted their interacting genes using STRING and found that three main types of genes had interaction relationships with most of the TaHsfs, i.e., the target genes of TaHsfs comprised Hsp70s and Hsp90s, as well as the chaperone protein htpG (high-temperature protein G) (Figure 7B).
Expression profiles of TaHsfs
In order to fully understand the expression patterns of the TaHsfs during each plant growth period and their organization, we analyzed RNA-seq data from the roots, stems, and leaves in the seedling, vegetative growth, and reproductive growth stages, as well spikes from the vegetative growth and reproductive growth stages, and the mature grain (Wheat Expression Browser, http://www.wheat-expression.com/, developmental time-course of Chinese Spring). The results showed that in the roots during the seedling stage, only TaHsfs in subclasses A2 (TaHsf-39 and TaHsf-49), C1 (TaHsf-16), and C2 (TaHsf-25, TaHsf-36, and TaHsf-34) were upregulated. In the roots during the vegetative growth and reproductive growth stages, the TaHsfs with upregulated expression belonged to seven subclasses comprising A1(TaHsf-46, TaHsf-32, and TaHsf-51), A4 (TaHsf-23, TaHsf-18, TaHsf-28, and TaHsf-01), A5 (TaHsf-52, TaHsf-53, and TaHsf-54), B1 (TaHsf-37, TaHsf-43, and TaHsf-47), B2 (TaHsf-38 and TaHsf-44), C1 (TaHsf-16), and C2 (TaHsf-25, TaHsf-36, TaHsf-34, and TaHsf-42). In the stems and leaves during the seedling stage, the upregulated genes only belonged to subclasses A2 (TaHsf-05) and A3 (TaHsf-56). In the stems and leaves during the vegetative growth period, the upregulated genes belonged to subclasses A2 (TaHsf-03 and TaHsf-04) and C1 (TaHsf-20, TaHsf-24, and TaHsf-29). In the stems and leaves during the reproductive growth period, the upregulated genes belonged to subclasses A2 (TaHsf-40, TaHsf-31, TaHsf-33, TaHsf-35, TaHsf-02, TaHsf-03, TaHsf-04, TaHsf-39, and TaHsf-49) and B2 (TaHsf-55, TaHsf-57, TaHsf-59, TaHsf-15, TaHsf-08, TaHsf-11, TaHsf-38, TaHsf-44, and TaHsf-48). In the spikes during the vegetative growth phase, the upregulated genes mainly belonged to subclasses B4 (TaHsf-06, TaHsf-13), A2 (TaHsf-45), A6 (TaHsf-07, TaHsf-10, TaHsf-14), and A5 (TaHsf-54). In the spikes during the reproductive growth phase, the upregulated TaHsfs mainly belonged to C1 (TaHsf-17, TaHsf-19, TaHsf-22, TaHsf-27, TaHsf-29, and TaHsf-21) and A2 (TaHsf-50, TaHsf-45, TaHsf-40, TaHsf-05, TaHsf-09, and TaHsf-12). Among the mature grains, the upregulated TaHsfs mainly belonged to subclasses A2 (TaHsf-05, TaHsf-09, TaHsf-12, TaHsf-40, TaHsf-50), A3 (TaHsf-56, TaHsf-58), C2 (TaHsf-61, TaHsf-55, TaHsf-25, TaHsf-60), and A6 (TaHsf-14, TaHsf-07, TaHsf-10) (Figure 8A, Additional file 6: Table S5).
The analysis of cis-elements demonstrated that TaHsfs also regulate drought stress, so we investigated the expression patterns of TaHsfs under heat stress and drought stress (Wheat Expression Browser, http://www.wheat-expression.com/, drought and heat stress time-course in seedlings). The RNA-seq data showed that (Figure 8B, Additional file 7: Table S6) compared with the no stress control, the TaHsfs did not exhibit differences in their expression levels in response to drought stress for 1 h, but the TaHsfs in C1 (TaHsf-16, TaHsf-21, TaHsf-17, TaHsf-22, TaHsf-26, and TaHsf-27) and C2 (TaHsf-60, TaHsf-36, TaHsf-61, TaHsf-25, and TaHsf-42) were significantly upregulated under drought stress for 6 h. Under heat stress for 1 h and the combination of heat stress and drought stress for 1 h, the expression levels of the TaHsfs were similar, where the main upregulated TaHsfs belonged to A2 (TaHsf-02, TaHsf-03, TaHsf-04, TaHsf-31, TaHsf-39, TaHsf-49, TaHsf-33, TaHsf-35, TaHsf-50, TaHsf-45, and TaHsf-40), A3 (TaHsf-56 and TaHsf-58), A4 (TaHsf-23, TaHsf-18, and TaHsf-28), A5 (TaHsf-52), B2 (TaHsf-08, TaHsf-15, TaHsf-11, TaHsf-57, TaHsf-55, and TaHsf-59), and B1 (TaHsf-37, TaHsf-43, and TaHsf-47). TaHsfs from A6 (TaHsf-50, TaHsf-45, and TaHsf-40) and A4 (TaHsf-23, TaHsf-28, TaHsf-18) were upregulated under heat stress for 6 h and the combination of heat stress and drought stress for 6 h, but TaHsfs from B2 (TaHsf-48, TaHsf-38, and TaHsf-44) were upregulated in the former conditions whereas TaHsfs from A6 (TaHsf-10, TaHsf-07, and TaHsf-14) were upregulated under the latter conditions.
In addition, in order to study the responses of TaHsfs in thermo-sensitive male sterile wheat to different fertility conditions, we analyzed the RNA-seq data for 40 TaHsfs under two fertility conditions in three different periods. The results showed that compared with the sterile conditions, TaHsfs from A1 (TaHsf-51, TaHsf-32, and TaHsf-46), A2 (TaHsf-50, TaHsf-02, TaHsf-04, TaHsf-05, TaHsf-09, TaHsf-35, TaHsf-12, and TaHsf-33), B1 (TaHsf-43 and TaHsf-47), and B2 (TaHsf-11 and TaHsf-55) were upregulated in the uninucleate anthers under fertile conditions. TaHsfs from A2 (TaHsf-12, TaHsf-33, TaHsf-45, and TaHsf-50), A4 (TaHsf-18 and TaHsf-28), A5 (TaHsf-54), A6 (TaHsf-10), B1 (TaHsf-43), and B4 (TaHsf-06) were significantly upregulated in binucleate anthers, and TaHsfs from A2 (TaHsf-35, TaHsf-12, and TaHsf-33), A4 (TaHsf-07), A5 (TaHsf-53), and B2 (TaHsf-55) in the trinucleate anthers under fertile conditions. However, no TaHsfs in subclass C were upregulated under fertile conditions during these three periods (Figure 8C, Additional file 8: Table S7).
The results indicated that the responses of TaHsfs from different subclasses varied in different developmental stages, tissues, stresses, and ecological conditions, thereby demonstrating that the different subclasses of TaHsf have diverse functions and they may participate in different regulatory links and pathways.