The phylogeny and evolution of HSFs in land plants
We identified 287 new candidate HSF sequences from 24 species on the HEATSTER website, of which 228 belonged to known subfamilies (A1-A9, B1-B5, C1-C2) and 59 were HSF-like (Table 1). A further 442 classified sequences were obtained from the data sets downloaded from HEATSTER. In total, 670 HSF sequences (Table S4) from 44 species were used for phylogenetic analysis. Across the sampled taxa, the number of HSF genes identified varies greatly, ranging from 2 in chlorophytes to 30 in angiosperms. The unrooted phylogenetic tree inferred from amino acid sequences resolved three main clades, HSFA-HSFC, HSFB, and HSFC (Fig. 1). The classification of the newly identified HSF genes was confirmed on the phylogenetic tree. Most subfamily clades (A3, A4, A5, A8, A9, B2, B3, B5, C1, C2) were strongly supported, whereas the relationships among these clades were weakly supported. Because the HSF subfamilies are highly diversified in structure, composition, and function [5–7], substantial genetic differentiation between clades, especially within HSFA and HSFB, probably accounts for the unstable topology. Group HSFA was found in all sampled taxa, whereas group HSFB was absent from chlorophytes and group HSFC was present only in angiosperms.
The HSFA group, which contains the major regulators of the heat stress (HS) response in plants [6], has diversified during plant evolution, so its classification varies among taxa. Strikingly, subfamily clades A4, A5, A6, A7, A8, and A9 were identified only in angiosperms, and A9 only in eudicots. Some subfamilies clustered on a single branch (e.g., A3, A5, A9), while others were split across several branches (Fig. S1, Fig. S2). HSFA1, a master regulator that cannot be replaced by any other HSF [5], is likely the most ancient group within HSFA. Although all angiosperm HSFA1 and HSFA8 genes clustered as a clade, most HSFA1 genes from pteridophytes and gymnosperms were dispersed across several clades. This deep divergence of HSFA1 in pteridophytes and gymnosperms indicates that HSFA1 diversified before the radiation of seed plants. HSFA2, HSFA6, HSFA7, and HSFA9 were intermixed in a complex clade; HSFA9 formed a monophyletic group, but the relationships among the others remained unresolved. Interestingly, HSFA2 and HSFA6 genes clustered together with very little genetic difference in some angiosperm species, such as O. sativa, Phoenix dactylifera, and Citrullus lanatus, as did HSFA6 and HSFA7 genes in C. lanatus. Two HSFA5 genes clustered with HSFA4 genes, indicating a close relationship between these two clades. It has previously been suggested that subclass HSFA3 and group HSFC form a cluster. However, with the larger number of fern and gymnosperm species sampled in this study, gymnosperm HSFA1 genes, rather than HSFA3, formed a single cluster with HSFC (Fig. S1, Fig. S2). We hypothesize that a duplication event in ancestral angiosperms contributed to the rise of HSFC.
The number of genes in HSFC1, a widely distributed subfamily, differed between monocots, which typically had two members, and eudicots, which had only one (Table 1, Fig. S2). These results indicate that HSFC expanded steadily during monocot evolution and may be involved in important developmental pathways [6]. Notably, HSFC2 was present only in monocots, whereas HSFC1 occurred in all angiosperm species except A. trichopoda. In monocots, HSFC1 and HSFC2 together formed a strongly supported cluster. The monocot HSFC1-HSFC2 clade and eudicot HSFC1 formed a branch, with HSFC1 of basal angiosperms at its base. This suggests that HSFC2 resulted from recent duplications early in the divergence between monocots and eudicots.
Contrary to a previous study [6], our results suggest that the HSFB subfamilies (B1, B2, B3, B4, and B5) form a moderately supported monophyletic group (Fig. S1, Fig. S3). While HSFB1, HSFB2, and HSFB4 are widely distributed across land plants, HSFB3 and HSFB5 are present only in eudicots and basal angiosperms. Unlike other members of the HSFB group, HSFB5 has a conserved tetrapeptide, LFGV, in its C-terminal domain and is therefore closely related to HSFB3 (Fig. 1). In addition, gymnosperms carry far more HSFB1 genes than angiosperms, with the average number falling from 3.4 in gymnosperms to 1.2 in angiosperms (Table 1). In particular, the HSFB1 gene count in the conifers Picea abies, Pinus taeda, and Picea glauca is significantly higher than in other seed plants. The multiple HSFB1 copies of P. abies, P. taeda, and P. glauca clustered together in a strongly supported monophyletic group. This suggests that the evolution of these three conifers probably involved both polyploidy and repetitive element activity [10–12]; the multiple copies may be attributable to two whole genome duplication (WGD) events in the ancestry of the major conifer clades [11]. Although many angiosperm lineages have undergone additional rounds of genome duplication [11,13,14], the number of HSFB1 genes in angiosperms shows no significant increase, consistent with the speculation that WGDs in angiosperms did not lead to a marked expansion of HSFB1. The angiosperm, gymnosperm, and pteridophyte HSFB1 genes each clustered independently, suggesting that HSFB1 is an ancient group that diverged separately during the evolutionary history of these taxa. In gymnosperms, HSFB1 underwent several expansions, including ancient duplications, whereas such expansions were rare in angiosperms apart from a few recent duplicates. The HSFB2 genes of gymnosperms and angiosperms formed separate clusters.
We found no marked expansion of HSFB2 in gymnosperms, whereas the occurrence of more than two HSFB2 genes in angiosperms is presumably the result of recent duplication. In some species, such as Selaginella moellendorffii, genes assigned to different subfamilies (e.g., HSFB1 and HSFB4) showed high genetic similarity and fell into strongly supported clades. The complicated relationship between these two subfamilies may reflect recent duplication events. Subfamilies HSFB3 and HSFB5 were present only in eudicots and basal angiosperms, probably as a result of duplication events in ancestral angiosperms followed by loss of the paralogues in monocots.
Gene duplication analysis
To examine expansion patterns and genetic divergence, and to identify the gene duplication events that shaped the evolution of the HSF gene family, we performed synteny analysis across twenty-one species (Table S3). Synteny analysis between species was also conducted for pairs of closely related taxa.
Gene duplication events were identified in eleven species spanning pteridophytes, basal angiosperms, monocots, and eudicots (Table 2, Table S5, Table S6, Table S7). We detected no HSF-containing synteny blocks in the green alga, moss, or gymnosperm species. This suggests that ancient HSF gene duplications are difficult to detect because most duplicates have been lost. In S. moellendorffii, the only pteridophyte, we identified one pair of duplicated genes, SelmoHSFB1b and SelmoHSFB4, which were identified as syntenic to each other but belong to different HSF subclasses. They may have arisen from an ancient tandem duplication and subsequently diverged at the sequence level. In L. chinense, the only basal angiosperm, we identified five pairs of duplicated genes: in four pairs both genes belonged to the same subclass (HSFA2, HSFB1, HSFB2, HSFC1), while the remaining pair comprised genes from different subclasses (HSFA4-HSFA5). Gene duplication events were detected in all sampled eudicot and monocot species (Table S7). In five eudicots (A. thaliana, Populus trichocarpa, Prunus persica, S. lycopersicum, Mimulus guttatus), we identified a total of thirty-three pairs of duplicated genes: twenty-nine pairs consisted of genes from the same subclass (HSFA1, HSFA4, HSFA5, HSFA6, HSFA8, HSFB2, HSFB3, HSFB4, HSFB5), and the remaining four pairs consisted of genes from different subclasses (HSFA2-HSFA9, HSFA6-HSFA7). The same number of duplicated pairs was identified in four monocots (O. sativa, Sorghum bicolor, Z. mays, Brachypodium distachyon); here, twenty-nine pairs consisted of genes from the same subclass (HSFA1, HSFA2, HSFA4, HSFA6, HSFB1, HSFB2, HSFB4, HSFC1, HSFC2), while the remaining four pairs consisted of genes from different subclasses (HSFA2-HSFA6, HSFB1-HSFB2, HSFB2-HSFB4).
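The tallies above separate duplicated pairs whose two members share a subclass from pairs that span two subclasses. As an illustration only, the minimal Python sketch below applies that rule to the five L. chinense pairs listed in the text. The pairs are encoded by subclass label alone, since gene identifiers are not given here, and the names `L_CHINENSE_PAIRS`, `classify_pairs`, and `cross_subclass_pairs` are hypothetical.

```python
from collections import Counter

# Hypothetical encoding of the five duplicated pairs reported for L. chinense.
# Each pair is (subclass of gene 1, subclass of gene 2); gene IDs are omitted.
L_CHINENSE_PAIRS = [
    ("HSFA2", "HSFA2"),
    ("HSFB1", "HSFB1"),
    ("HSFB2", "HSFB2"),
    ("HSFC1", "HSFC1"),
    ("HSFA4", "HSFA5"),
]


def classify_pairs(pairs):
    """Count pairs as within-subclass ("same") or cross-subclass ("different")."""
    counts = Counter(same=0, different=0)
    for subclass_a, subclass_b in pairs:
        key = "same" if subclass_a == subclass_b else "different"
        counts[key] += 1
    return counts


def cross_subclass_pairs(pairs):
    """Return only the pairs whose members belong to different subclasses."""
    return [pair for pair in pairs if pair[0] != pair[1]]


if __name__ == "__main__":
    print(dict(classify_pairs(L_CHINENSE_PAIRS)))
    print(cross_subclass_pairs(L_CHINENSE_PAIRS))
```

The cross-subclass pairs are the ones that point to paralogy between subfamilies, which is why the sketch returns them separately.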
In general, all HSF subfamilies except HSFA3 were involved in duplication events. The results also showed that pairs of genes from different subfamilies, such as HSFA2-HSFA6, HSFA2-HSFA9, HSFA4-HSFA5, HSFA6-HSFA7, HSFB1-HSFB4, HSFB1-HSFB2, and HSFB2-HSFB4, are paralogous.
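The statement that every subfamily except HSFA3 took part in duplication events can be checked against the subclass lists reported for each lineage. The Python sketch below is a simple set-based check under two stated assumptions: that the 16 subfamilies A1-A9, B1-B5, and C1-C2 form the complete reference set (HSF-like sequences are ignored), and that the subclass labels transcribed from the text are complete. The names `ALL_SUBFAMILIES`, `DUPLICATED_BY_LINEAGE`, and `missing_from_duplications` are illustrative, not part of any published pipeline.

```python
# Assumed reference set: subfamilies A1-A9, B1-B5, C1-C2 (16 in total).
ALL_SUBFAMILIES = (
    {f"HSFA{i}" for i in range(1, 10)}
    | {f"HSFB{i}" for i in range(1, 6)}
    | {"HSFC1", "HSFC2"}
)

# Subclasses named in duplicated pairs for each lineage, transcribed from the
# text (same-subclass pairs plus both members of cross-subclass pairs).
DUPLICATED_BY_LINEAGE = {
    "S. moellendorffii": {"HSFB1", "HSFB4"},
    "L. chinense": {"HSFA2", "HSFB1", "HSFB2", "HSFC1", "HSFA4", "HSFA5"},
    "eudicots": {
        "HSFA1", "HSFA4", "HSFA5", "HSFA6", "HSFA8",
        "HSFB2", "HSFB3", "HSFB4", "HSFB5",
        "HSFA2", "HSFA9", "HSFA7",
    },
    "monocots": {
        "HSFA1", "HSFA2", "HSFA4", "HSFA6",
        "HSFB1", "HSFB2", "HSFB4", "HSFC1", "HSFC2",
    },
}


def missing_from_duplications(all_subfamilies, duplicated_by_lineage):
    """Return, sorted, the subfamilies that never occur in any duplicated pair."""
    involved = set().union(*duplicated_by_lineage.values())
    return sorted(all_subfamilies - involved)


if __name__ == "__main__":
    print(missing_from_duplications(ALL_SUBFAMILIES, DUPLICATED_BY_LINEAGE))
```

Pooling the lineages before taking the set difference mirrors the section-wide claim; checking a single lineage instead would list every subfamily absent from that lineage's duplications.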
Synteny analysis between species identified orthologous genes in different taxa (Table 3, Table S8, Table S9). Between the two gymnosperms G. montanum and G. biloba, only HSFA1 genes were detected as orthologous. HSFA1, HSFA4, and HSFA5 genes were detected as orthologous between G. biloba (gymnosperm) and L. chinense (basal angiosperm). Whereas several orthologous gene pairs, such as HSFA6-HSFA7, HSFA4-HSFA5, HSFA2-HSFA9, and HSFB2-HSFB5, were found between basal angiosperms (A. trichopoda, L. chinense) and eudicots (S. lycopersicum, A. thaliana), only the HSFA2-HSFA6 and HSFA2-HSFA7 pairs were detected between basal angiosperms and monocots (O. sativa, Z. mays). Interestingly, the eudicot-monocot comparison yielded the same result as the basal angiosperm-monocot comparison. Among monocots, the orthologous pairs HSFA1-HSFA5 and HSFA2-HSFA7 were found, whereas among eudicots, HSFA6-HSFA7 was detected.
Overall, the results indicate that duplication of HSF genes has been common during plant evolution and has contributed significantly to the expansion and functional diversification of the family (Fig. 2). They suggest that HSFA4 and HSFA5 are closely related and that their origin may be linked to an ancient duplication of HSFA1. HSFA6 and HSFA7 may have originated from gene duplication, most probably involving HSFA2. HSFA9 appears to have been derived from HSFA2 after the divergence of ancestral angiosperms. Moreover, HSFB1 is considered the most ancient member of the HSFB group, and given the close relationship among them, we predict that HSFB2 and HSFB4 were derived from HSFB1.