Deep Evaluation to the Evolution History of Heat Shock Factor (HSF) Gene Family and Its Expansion Pattern in Seed Plants

Abstract
Background
Heat shock factor (HSF) genes play essential, irreplaceable roles in some of the basic developmental pathways of plants. Despite extensive studies on the structure, functional diversification, and evolution of HSFs, their divergence history and gene duplication patterns remain unresolved. To further clarify the probable divergence patterns among these subfamilies, we revisited the evolutionary history of HSFs via phylogenetic reconstruction and genomic synteny analyses, taking advantage of the increased sampling of genomic data for pteridophytes, gymnosperms and basal angiosperms.

Results
We identified a novel clade comprising HSFA2, HSFA6, HSFA7 and HSFA9 with complex internal relationships, very likely because orthologous or paralogous genes were retained after frequent gene duplication events. We suggest that HSFA9 was derived from HSFA2 through gene duplication in ancestral eudicots and then expanded in a lineage-specific way. Our findings indicate that HSFB3 and HSFB5 emerged before the divergence of ancestral angiosperms but were lost in the common ancestor of monocots. We also infer that HSFC2 was derived from HSFC1 in ancestral monocots.

Conclusion
This work proposes that during the early differentiation of angiosperms, in the radiation of flowering plants, the size of the HSF gene family was also being adjusted, accompanied by considerable sub- or neo-functionalization. HSFs evolved independently in eudicots and monocots: lineage-specific gene duplication gave rise to a new gene in ancestral eudicots and in ancestral monocots, and lineage-specific gene loss occurred in ancestral monocots. Our analyses provide essential insights for studying the evolutionary history of multigene families.

Background
Heat shock factors (HSFs), the central regulators of the expression of heat shock proteins and other heat shock-induced genes, play crucial roles in enhancing thermotolerance in plants. Heat shock proteins function as molecular chaperones in protein folding and assembly, protecting cells against proteotoxic damage under heat stress (HS) [1][2][3]. Besides their involvement in the HS response, HSFs have been identified in most eukaryotes, including non-plant organisms, where they participate in growth and development [4,5]. In plants, especially angiosperms, HSFs have also been widely studied as essential elements for coping with various environmental stresses [5]. The gene family has expanded greatly: the number of HSF genes ranges from one or two in green algae to more than 50 in angiosperms [6].
HSFs generally contain a DNA-binding domain (DBD), an oligomerization domain (OD), and a flexible linker between the DBD and OD regions [5,7]. Based on the topology of these domains, HSFs are classified into three groups: HSFA, HSFB and HSFC. These groups are further divided into 16 subfamilies distinguished in angiosperms: the HSFA group (A1-A9), the HSFB group (B1-B5) and the HSFC group (C1-C2) [1,5,8,9]. The first overview of HSFs was presented for Arabidopsis thaliana by Nover et al. [8], in which HSFC was discovered. Subsequently, valuable reviews of the structure, function and evolution of HSFs were compiled, drawing on data from nine angiosperm species and from over 50 species covering all lineages of land plants [5,6]. These reports pointed out that HSF family members and their functions diverged greatly among the higher plant lineages in response to environmental stresses. However, the evolutionary relationships among the subfamilies are still obscure, as some of the deepest nodes of the HSF phylogenetic tree, such as the positions of HSFB5 and HSFA9, remain elusive. This may be attributed to the limited availability of complete HSF data for representative seed plant lineages, including gymnosperms and basal angiosperms. It may also be influenced by unpredictable gene copy turnover after recurring tandem or genome-wide duplication events.
Here, we expanded the data collection to basal angiosperms, gymnosperms and pteridophytes to reconstruct the diversification history of HSFs during seed plant evolution. Moreover, we detected syntenic relationships of HSFs across a wide range of species, providing crucial information for addressing fundamental questions on the evolution of gene families. We also estimated the divergence times of representative genes from their ancestors, based on reliable gene orthology. Our results present critical evidence for explaining the expansion of the HSF subfamilies in seed plant lineages.

Results
The phylogeny and evolution of HSFs in land plants
We identified 287 new candidate HSF sequences from 24 species using the HEATSTER website, of which 228 belonged to known subfamilies (A1-A9, B1-B5, C1-C2) and 59 were HSF-like (Table 1). In the data sets downloaded from HEATSTER, 442 sequences had already been classified. A total of 670 HSF sequences (Table S4) from 44 species were used for phylogenetic analysis. Across the sampled taxa, the number of identified HSF genes varied greatly, ranging from 2 in Chlorophyta to 30 in angiosperms. The unrooted phylogenetic tree inferred from amino acid sequences resolved three main clades, HSFA-HSFC, HSFB, and HSFC (Fig. 1). The classification of the newly identified HSF genes was confirmed on the phylogenetic tree. Most subfamily clades (A3, A4, A5, A8, A9, B2, B3, B5, C1, C2) were strongly supported, while the relationships between these clades were weakly supported. The HSF subfamilies display strong diversification in structure, composition and function [5][6][7]; significant genetic differentiation between clades, especially for HSFA and HSFB, therefore probably accounts for the unstable topology.
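The sequence tallies reported above can be cross-checked with a few lines of Python. This is only a bookkeeping sketch: the variable names are illustrative, and it assumes that the 670 sequences used for the tree are the 228 newly classified candidates plus the 442 classified HEATSTER sequences, with HSF-like sequences excluded.

```python
# Counts as reported in the text; variable names are illustrative only.
new_known_subfamily = 228   # new candidates assigned to A1-A9, B1-B5, C1-C2
new_hsf_like = 59           # new candidates classified as HSF-like
heatster_classified = 442   # classified sequences downloaded from HEATSTER

# Assumption: HSF-like sequences are not part of the phylogenetic data set.
new_candidates = new_known_subfamily + new_hsf_like
used_for_phylogeny = new_known_subfamily + heatster_classified

print(new_candidates)       # 287
print(used_for_phylogeny)   # 670
```

Both sums agree with the totals given in the text.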
Group HSFA was observed in all sampled taxa, whereas group HSFB was absent from Chlorophyta and group HSFC was present only in angiosperms.
The HSFA group, which contains the major regulators of the HS response in plants [6], diversified during plant evolution, so its classification varies among taxa. Strikingly, subfamilies A4, A5, A6, A7, A8 and A9 were identified only in angiosperms, and A9 only in eudicots. Some subfamilies, such as A3, A5 and A9, each clustered on a single branch, while others were spread over several branches (Fig. S1, Fig. S2). HSFA1, a master regulator that cannot be replaced by any other HSF [5], is likely the most ancient group in HSFA. Although all angiosperm HSFA1 and HSFA8 genes clustered as one clade, most HSFA1 genes from pteridophytes and gymnosperms were dispersed among several clades. This deep divergence of HSFA1 in pteridophytes and gymnosperms indicates that it diversified before the radiation of seed plants. HSFA2, HSFA6, HSFA7 and HSFA9 were blended into one complex clade; HSFA9 formed monophyletic groups, whereas the relationships among the others remained unclear. Interestingly, HSFA2 and HSFA6 genes clustered together with very little genetic difference in some angiosperm species, such as O. sativa, Phoenix dactylifera and Citrullus lanatus, as did HSFA6 and HSFA7 genes in C. lanatus. Two HSFA5 genes clustered with HSFA4 genes, indicating a close relationship between the two clades. It was previously suggested that subclass HSFA3 and group HSFC form a cluster. However, with the increased number of ferns and gymnosperms used in this study, gymnosperm HSFA1 and HSFC, rather than HSFA3, formed a single cluster (Fig. S1, Fig. S2). We therefore assume that a duplication event in ancestral angiosperms contributed to the rise of HSFC.
The number of genes in HSFC1, a common subfamily, differed between monocots, which typically had two members, and eudicots, which had only one (Table 1, Fig. S2). These results indicate that HSFC expanded steadily during the evolution of monocots and may be involved in important developmental pathways [6]. Notably, HSFC2 was present only in monocots, whereas HSFC1 was found in all angiosperm species except A. trichopoda. In monocots, HSFC1 and HSFC2 together formed a strongly supported cluster. The monocot HSFC1-HSFC2 clade and eudicot HSFC1 formed a branch, with HSFC1 of basal angiosperms at its base. This suggests that HSFC2 is the result of duplication occurring in the early stages of divergence between monocots and eudicots.
Contrary to a previous study [6], the results presented here suggest that the HSFB subfamilies (B1, B2, B3, B4, B5) form a moderately supported monophyletic group (Fig. S1, Fig. S3). While HSFB1, HSFB2 and HSFB4 are widely observed across land plants, HSFB3 and HSFB5 are present only in eudicots and basal angiosperms. Unlike other members of the HSFB group, HSFB5 has a conserved tetrapeptide LFGV in the C-terminal domain; it is closely related to HSFB3 (Fig. 1). Additionally, gymnosperms have far more HSFB1 genes than angiosperms, with the average number falling from 3.4 in gymnosperms to 1.2 in angiosperms (Table 1). In particular, the number of HSFB1 genes in conifers (Picea abies, Pinus taeda, Picea glauca) is significantly higher than in other seed plants. The multiple copies of HSFB1 from P. abies, P. taeda and P. glauca clustered together and formed a strongly supported monophyletic group. This result indicates that the evolution of these three conifers probably involved both polyploidy and repetitive element activity [10][11][12]. The multi-copy genes may be attributed to two whole genome duplication (WGD) events in the ancestry of major conifer clades [11].
Although many angiosperm lineages have experienced additional rounds of genome duplication [11,13,14], the number of HSFB1 genes in angiosperms showed no significant increase, which is consistent with the speculation that WGD in angiosperms did not give rise to a remarkable expansion of HSFB1 genes. The HSFB1 groups of angiosperms, gymnosperms and pteridophytes clustered independently, suggesting that HSFB1 is an ancient group that diverged during the evolutionary history of the different taxa. In gymnosperms, the HSFB1 group experienced several expansions, including ancient duplications; such expansions were generally rare in angiosperms, apart from a few recent duplicates. The HSFB2 genes of gymnosperms and of angiosperms formed separate clusters. We could not trace any remarkable expansion of HSFB2 in gymnosperms, whereas the presence of more than two genes in angiosperms is assumed to result from recent duplication. In some species, such as Selaginella moellendorffii, genes assigned to different subfamilies, such as HSFB1 and HSFB4, showed high genetic similarity and formed highly supported clades. The complicated relationship between these two subfamilies may be the result of recent duplication events. In this study, subfamilies HSFB3 and HSFB5 were present only in eudicots and basal angiosperms, probably because they arose from duplication events in ancestral angiosperms and the paralogous genes were later lost in monocots.

Gene duplication analysis
To examine expansion patterns and genetic divergence, and to identify the gene duplication events that affected the evolution of the HSF gene family, synteny analysis was performed across twenty-one species (Table S3). Synteny analysis between species was also conducted on pairs of closely related taxa.
Gene duplication events were identified in eleven species among the pteridophytes, basal angiosperms, monocots and eudicots (Table 2, Table S5, Table S6, Table S7). We did not detect any synteny blocks containing HSF genes in the green alga, moss or gymnosperms. This result indicates that ancient HSF gene duplications are not easy to detect, because most duplicates have been lost. In S. moellendorffii, the only pteridophyte, we identified one pair of duplicated genes. The two genes, 'SelmoHSFB1b' and 'SelmoHSFB4', belong to different subclasses of the HSF gene family and were found to be syntenic to each other. We speculate that they derive from an ancient tandem duplication and have since diverged to some degree at the sequence level. In L. chinense, the only basal angiosperm, we identified five pairs of duplicated genes; in four pairs both members belonged to the same subclass (HSFA2, HSFB1, HSFB2, HSFC1), while the members of the fifth pair belonged to different subclasses (HSFA4-HSFA5). Gene duplication events were detected in all sampled eudicot and monocot species (Table S7). In five eudicots (A. thaliana, Populus trichocarpa, Prunus persica, S. lycopersicum, Mimulus guttatus), we identified a total of thirty-three pairs of duplicated genes; in twenty-nine pairs both members belonged to the same subclass (HSFA1, HSFA4, HSFA5, HSFA6, HSFA8, HSFB2, HSFB3, HSFB4, HSFB5), while the members of the remaining four pairs belonged to different subclasses (HSFA2-HSFA9, HSFA6-HSFA7). The same number of duplicated pairs was identified in four monocots (O. sativa, Sorghum bicolor, Z. mays, Brachypodium distachyon); here, in twenty-nine pairs both members belonged to the same subclass (HSFA1, HSFA2, HSFA4, HSFA6, HSFB1, HSFB2, HSFB4, HSFC1, HSFC2), while the members of the remaining four pairs belonged to different subclasses (HSFA2-HSFA6, HSFB1-HSFB2, HSFB2-HSFB4).
In general, all subfamilies of HSF genes except HSFA3 were involved in duplication events. The results demonstrate that pairs of genes from different subfamilies, such as HSFA2-HSFA6, HSFA2-HSFA9, HSFA4-HSFA5, HSFA6-HSFA7, HSFB1-HSFB4, HSFB1-HSFB2 and HSFB2-HSFB4, are paralogous.
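The distinction drawn above, between duplicated pairs whose members share a subclass and pairs that span two subclasses, can be illustrated with a small Python sketch. It assumes that gene names follow the pattern seen in 'SelmoHSFB1b' (a species prefix, then "HSF", a group letter, a subfamily digit and an optional copy letter); the function names and the 'ExampleHSFA2a'/'ExampleHSFA2b' identifiers are hypothetical.

```python
import re

# Assumption: names end in "HSF" + group letter (A/B/C) + subfamily digit,
# optionally followed by one lowercase copy letter, e.g. "SelmoHSFB1b".
_SUBFAMILY = re.compile(r"HSF([ABC])(\d)[a-z]?$")

def subfamily(gene_name):
    """Return the subfamily label (e.g. 'HSFB1'), or None if the name does not match."""
    match = _SUBFAMILY.search(gene_name)
    if match is None:
        return None
    return f"HSF{match.group(1)}{match.group(2)}"

def classify_pair(gene_a, gene_b):
    """Label a duplicated gene pair by whether both members share a subfamily."""
    sub_a, sub_b = subfamily(gene_a), subfamily(gene_b)
    if sub_a is None or sub_b is None:
        return "unclassified"
    return "same-subfamily" if sub_a == sub_b else "cross-subfamily"

# The S. moellendorffii pair named in the text spans two subfamilies.
print(classify_pair("SelmoHSFB1b", "SelmoHSFB4"))        # cross-subfamily
# Hypothetical identifiers for a pair within one subfamily.
print(classify_pair("ExampleHSFA2a", "ExampleHSFA2b"))   # same-subfamily
```

Name-based labelling only restates the existing classification; it does not by itself establish paralogy, which in the study rests on synteny and phylogeny.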
On the whole, the results indicate that duplication of HSF genes has been a common event during the evolution of plants, contributing significantly to the expansion and functional diversification of the family (Fig. 2). We therefore suggest that HSFA4 and HSFA5 have a close genetic relationship and that their origin may be related to an ancient duplication of HSFA1. It is possible that HSFA6 and HSFA7 originated from gene duplication, most probably involving HSFA2. HSFA9 was inferred to be derived from HSFA2 after the divergence of ancestral angiosperms. Moreover, HSFB1 is considered the most ancient subfamily within HSFB, and, given the close relationship among them, we predict that HSFB2 and HSFB4 were derived from HSFB1.

Divergence time analysis
The estimated divergence time of subfamilies HSFA2 and HSFA9 in eudicots ranges between 131 Mya and 155.2 Mya, which falls within the Late Jurassic and Lower Cretaceous periods (Fig. 3). The estimated split time of clades HSFC2 and HSFC1 in monocots ranges between 125 Mya and 190.4 Mya, which falls within the Jurassic and Lower Cretaceous (Fig. 4). The timing of these gene duplication events is consistent with the age of the most recent common ancestor of all living angiosperms, which likely existed ~140-250 Mya [15,16]. Although uncertainty remains for other characters, our reconstruction of the time scale of differentiation between gene subfamilies allows us to propose a new, plausible scenario for the early diversification of angiosperms at the genomic level. The origin and rapid diversification of angiosperms is one of the most intriguing topics in evolutionary biology [17], and research on the evolution of gene families (such as the origin, expansion and loss of genes) provides an unprecedented opportunity to explore long-standing questions that probably hold important clues for understanding present-day biodiversity and adaptation to the environment.
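The consistency claimed above can be made explicit by intersecting each estimated range with the cited age range of the angiosperm common ancestor. The following Python sketch treats every estimate as a simple closed interval in Mya, which is a simplifying assumption (the underlying posterior distributions are not reproduced here); the function and variable names are illustrative.

```python
def overlap(a, b):
    """Return the shared (low, high) range of two closed intervals, or None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None

# Ranges in Mya, as given in the text.
angiosperm_mrca = (140.0, 250.0)   # most recent common ancestor of living angiosperms [15,16]
hsfa2_hsfa9 = (131.0, 155.2)       # HSFA2/HSFA9 divergence in eudicots
hsfc1_hsfc2 = (125.0, 190.4)       # HSFC1/HSFC2 split in monocots

print(overlap(hsfa2_hsfa9, angiosperm_mrca))   # (140.0, 155.2)
print(overlap(hsfc1_hsfc2, angiosperm_mrca))   # (140.0, 190.4)
```

Both duplication ranges share part of their span with the 140-250 Mya window, although each also extends to ages younger than 140 Mya.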

Discussion
Previous phylogenetic studies of the HSF gene family in plants have provided valuable insights into its evolutionary history [5,6]. However, limited sampling in pteridophytes, gymnosperms and basal angiosperms left unresolved questions about the origin of the subclasses in the HSF gene family, their phylogenetic relationships, and gene expansion patterns in different taxa. HSFs play a key role in the adaptation of plants to changing habitats and in overcoming stresses. However, our understanding of land plant evolution at the genetic level, and in relation to environmental change, remains limited [5,6,[18][19][20][21][22][23].
Although ongoing plant genome projects will certainly uncover additional species- or family-specific deletions and duplications, the general features are unlikely to change [24]. In this study, the number and diversity of plants examined allowed us to address the evolutionary history of this gene family in a broader taxonomic context. Our phylogenetic analyses revealed divergence of the HSF subfamilies and their independent evolution in plants, especially in angiosperms. As the number of simultaneously analyzed genomes increases, it becomes more difficult to organize and display syntenic relationships. This is due to the ubiquity of ancient and recent polyploidy events, as well as smaller-scale events that derive from tandem and transposition duplications [25][26][27][28]. Nevertheless, by combining phylogeny and synteny analyses, this study begins to reveal how gene expansion occurred in different land plant taxa. The combined evidence indicates that the puzzling clades (HSFA2, HSFA6, HSFA7, HSFA4, HSFA5), which contain members from other subfamilies, were caused by recent gene duplication events.
Our study of HSF family members from pteridophytes and gymnosperms reveals that this gene family is quite complex in terms of gene number and sequence diversity. We identified four HSF subfamilies (HSFA1, HSFA2, HSFB1, HSFB4) among candidates from six pteridophyte species, and five subfamilies (HSFA1, HSFA2, HSFB1, HSFB2, HSFB4) from sixteen gymnosperm species. Although pteridophytes and gymnosperms have significantly fewer HSFs than angiosperms, they have more HSFA1 and HSFB1 genes than angiosperms. We assume that pteridophytes and gymnosperms tend to retain more ancient members of the HSF subfamilies. In pteridophytes and gymnosperms, both HSFA1 and HSFB1 clustered in multiple clades with low support on the phylogenetic tree, which is consistent with findings that more ancient duplication events affect more distant taxonomic comparisons [26]. The synteny analysis detected only two genes (SelmoHSFB1b and SelmoHSFB4) in S. moellendorffii that appeared to result from duplication events. These findings indicate that HSFA1, HSFA2, HSFB1 and HSFB4, which were already present in the ancestor of all land plants, are the more ancient groups.
Gymnosperm lineages diverged from one another between the Late Carboniferous and the Late Triassic, and were dominant through most of the Mesozoic [29,30]. However, major gymnosperm extinctions occurred in the Cenozoic, and the surviving gymnosperm genera have diversified more slowly than angiosperms [31]. Ancient gene subfamilies such as HSFB1 and HSFA1 accompanied these plants through a long period of differentiation and successive variation, which may explain the molecular phylogenetic uncertainty within gymnosperms. Ancient WGD is found in the ancestry of all extant seed plants, and angiosperm and gymnosperm lineages have experienced additional rounds of genome duplication [11,13,14,[32][33][34]. Although no syntenic genes were detected in gymnosperms, two or more genes from different subclasses formed strongly supported clades (such as PintaHSFA1a and PintaHSFA2, Abi RHSFB1a and Abi RsfB4a), so the absence of syntenic genes in gymnosperms may result from incomplete genome information or from assembly and annotation problems. Ancient interspersed segmental duplications of those genes could be detected by a combination of phylogenetic and synteny analyses.
By comparison, in angiosperms this gene family has undergone extensive duplications that gave rise to complicated relationships of orthology, paralogy and functional heterogeneity. The results showed that, besides having a remarkably higher number and diversity of HSF family members than older taxa, angiosperms had multiple paralogous and orthologous genes. Most of the gene copies generated by WGD events are lost through fractionation and subsequent "rediploidization" or non-functionalization [35]. Gene duplication is an important mechanism that contributes to genomic novelty [11], and the functional divergence of duplicate genes retained after WGD is thought to promote evolutionary diversification. The multiple recent WGDs in angiosperm lineages allowed the expansion and variation of HSFs, as confirmed by previous studies in Fagopyrum tataricum [36] and the genus Brassica [21]. The results of the synteny analysis supported the inference that subfamily HSFA9, present only in eudicots, was derived from HSFA2, and that HSFC2, present only in monocots, was derived from HSFC1. These new genes originated through the divergence of paralogous genes produced by duplication events. The two duplication events occurred in the early stages of angiosperm divergence, which is consistent with the angiosperm radiations that occurred in the Late Jurassic and Lower Cretaceous [37]. Approximately 132 Mya, angiosperms underwent massive adaptive radiations to become the most diverse and successful plant group on land [38]. The coincidence of retained duplication events with key moments in evolution, such as biological innovations and survival in the face of mass extinctions, underlines the importance of this crucial process [39]. Lineage-specific duplications will provide keys to understanding both the common underlying regulatory mechanisms and the species-specific differences that generate diversity. Subfamilies HSFB3 and HSFB5 were found to be absent in monocots but present in most basal angiosperms and eudicots.
Consequently, we deduce that HSFB3 and HSFB5 were completely lost in monocots; nevertheless, their origin and evolutionary history remain poorly understood. We speculate that these gene loss events took place during the early divergence of angiosperms. The above results indicate that, just as the species experienced early rapid radiation, diversification and mass extinction [37,[40][41][42][43], their genes also went through expansion, diversification and loss. After the divergence of angiosperms, eudicots and monocots experienced different evolutionary processes.

Conclusions
The recent upward trend in the number of completely sequenced plant genomes from different phylogenetic lineages has advanced our evolutionary understanding of gene families with important functions. Our comprehensive analysis reveals that the diversification of HSFs in plants resulted from extensive gene duplication, gene loss and sub- or neo-functionalization during the evolution and diversification of land plants. Lineage-specific expansions in angiosperms, especially in eudicots and monocots, may reflect the potential evolutionary advantage of plasticity and flexibility in complex and changing environments. The patterns of gene duplication and the evolutionary history of HSFs in plants provide novel insight into their diversity, which facilitates plant diversification, adaptation and evolution in various habitats. Our analyses provide essential insights for studying the evolutionary history of multigene families.

Methods
The identified and classified candidate genes, excluding HSF-like sequences, were aligned with MUSCLE (http://www.drive5.com/muscle). Phylogenetic analyses were conducted with RAxML version 8.0.19 [46], using 100 bootstraps, the PROTGAMMAAUTO model, and maximum likelihood reconstruction with the new rapid hill-climbing and rapid bootstrap analysis (-f ad). Phylogenetic trees were examined and manipulated with Evolview v2 [47]. To better understand the evolutionary relationships within the subfamilies, and for in-depth phylogenetic analyses of the HSFB and HSFA-HSFC clades, all identified HSFB genes, and all identified HSFA and HSFC genes, were used separately for phylogenetic tree reconstruction.
In addition, to better understand the complicated evolutionary relationships within the clade of subfamilies HSFA2, HSFA6, HSFA7 and HSFA9, and within the HSFC clade, these two groups of genes were extracted separately for construction of phylogenetic trees, with Chlamydomonas reinhardtii as the outgroup. Multiple protein sequence alignment and phylogenetic analysis followed the same steps as described above.
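For orientation, the tree-building step described above could be scripted roughly as follows. This is a hedged sketch, not the authors' actual invocation: it assumes the `raxmlHPC` executable name, uses RAxML's documented `-f a` option (rapid bootstrap analysis combined with a search for the best-scoring ML tree) in place of the "-f ad" string given in the text, and the random seeds, alignment file name and run name are hypothetical. The command is only assembled and printed, not executed.

```python
import shlex

def raxml_command(alignment, run_name, bootstraps=100, seed=12345):
    """Assemble an argument list for a RAxML rapid-bootstrap run (illustrative only)."""
    return [
        "raxmlHPC",              # assumed executable name; installations differ
        "-f", "a",               # rapid bootstrap + best-scoring ML tree search
        "-m", "PROTGAMMAAUTO",   # protein model with automatic matrix selection, as in the text
        "-p", str(seed),         # parsimony random seed (hypothetical value)
        "-x", str(seed),         # rapid bootstrap random seed (hypothetical value)
        "-N", str(bootstraps),   # number of bootstrap replicates (100 in the text)
        "-s", alignment,         # input alignment (hypothetical file name)
        "-n", run_name,          # suffix for output files (hypothetical)
    ]

cmd = raxml_command("hsf_alignment.phy", "hsf_tree")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it could later be passed to `subprocess.run` without shell quoting issues.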

Synteny analysis and molecular dating analyses
The evolutionary relationships of HSFs among the main plant taxa were investigated further using MCScanX [48] to detect gene duplication events. Twenty-one plant genomes, covering green algae, mosses, ferns, gymnosperms, basal angiosperms and angiosperms, were subjected to synteny analysis (Table S3). We analyzed all protein models from these genomes in all possible intra- and inter-species genome-wide comparisons. Genome annotations and the corresponding protein sequences were downloaded for each species. Paralogous and orthologous genes within or between genomes were identified through synteny detection using MCScanX with default parameters (minimum match size for a collinear block = 5 genes, maximum gaps allowed = 25 genes). The output files from all intra- and inter-species comparisons were integrated into a single file named "Total_Synteny_Blocks", with the headers "Block_Index", "Locus_1", "Locus_2" and "Block_Score", which served as the database file. The all-vs-all protein sequence comparisons required by MCScanX were performed using DIAMOND v 0.8.25 [49]. The list of all candidate HSF genes was queried against the "Total_Synteny_Blocks" file, and from the results we checked whether HSF genes occurred in syntenic blocks. For synteny analysis between closely related taxa, eight representative species were chosen: gymnosperms (Gnetum montanum, Ginkgo biloba), basal angiosperms (L. chinense, Amborella trichopoda), monocots (Oryza sativa, Zea mays) and eudicots (A. thaliana, Solanum lycopersicum). The methods and procedures were the same as described above.
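The query of the HSF gene list against the "Total_Synteny_Blocks" file can be sketched in Python as below. The four column headers are taken from the text; everything else is an assumption made for illustration, including the tab-separated layout, the toy rows and scores, the non-HSF locus names, the 'ExampleHSFA2a' identifier and the `query_synteny` helper.

```python
import csv
import io

# Toy stand-in for the "Total_Synteny_Blocks" file. Tab separation, the scores
# and all locus names other than the two Selmo genes are hypothetical.
TOY_TABLE = (
    "Block_Index\tLocus_1\tLocus_2\tBlock_Score\n"
    "1\tSelmoHSFB1b\tSelmoHSFB4\t350\n"
    "1\tgeneX1\tgeneX2\t350\n"
    "2\tgeneY1\tExampleHSFA2a\t410\n"
)

def query_synteny(table_handle, hsf_genes):
    """Return (block, locus_1, locus_2, both_hsf) for rows containing an HSF gene."""
    hsf_genes = set(hsf_genes)
    hits = []
    for row in csv.DictReader(table_handle, delimiter="\t"):
        first_is_hsf = row["Locus_1"] in hsf_genes
        second_is_hsf = row["Locus_2"] in hsf_genes
        if first_is_hsf or second_is_hsf:
            hits.append((row["Block_Index"], row["Locus_1"], row["Locus_2"],
                         first_is_hsf and second_is_hsf))
    return hits

hits = query_synteny(io.StringIO(TOY_TABLE),
                     ["SelmoHSFB1b", "SelmoHSFB4", "ExampleHSFA2a"])
for hit in hits:
    print(hit)
# ('1', 'SelmoHSFB1b', 'SelmoHSFB4', True)
# ('2', 'geneY1', 'ExampleHSFA2a', False)
```

Rows in which both loci are HSF genes correspond to candidate duplicated HSF pairs, while rows with a single HSF locus only show that the gene lies inside a collinear block.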
Genes of subfamilies HSFC1 and HSFC2, and of subfamilies HSFA2 and HSFA9, were extracted from the database and used separately to estimate divergence times. We calibrated a relaxed molecular clock by placing a prior on the node for the divergence of monocots and eudicots (represented by the divergence of A. thaliana and O. sativa), with a minimum age of 140 Mya and a maximum age of 200 Mya [50]. We performed a Bayesian dating analysis in MCMCTree [51], using approximate likelihood calculation for the branch lengths, an auto-correlated model of among-lineage rate variation, the GTR substitution model, and a uniform prior on the relative node times. We used Markov chain Monte Carlo sampling to estimate posterior distributions of node ages, with samples drawn every 2 steps over 200,000 steps following a burn-in of 10,000 steps.
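The sampling schedule stated above implies a fixed number of retained posterior samples, which the following Python sketch works out. It assumes that the 200,000 sampled steps are counted after the 10,000-step burn-in, as the wording suggests; the `mcmc_schedule` helper is illustrative and is not part of MCMCTree.

```python
def mcmc_schedule(burn_in, sampled_steps, sample_every):
    """Return (retained_samples, total_steps) for a simple MCMC sampling schedule."""
    if sample_every <= 0:
        raise ValueError("sample_every must be positive")
    if sampled_steps % sample_every != 0:
        raise ValueError("sampled_steps should be a multiple of sample_every")
    retained_samples = sampled_steps // sample_every
    total_steps = burn_in + sampled_steps   # assumption: burn-in precedes the sampled steps
    return retained_samples, total_steps

n_samples, total_steps = mcmc_schedule(burn_in=10_000, sampled_steps=200_000, sample_every=2)
print(n_samples)     # 100000
print(total_steps)   # 210000
```

Under these assumptions each node age is summarized from 100,000 posterior samples.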

Consent to publish
Not applicable.

Availability of data and materials
The dataset supporting the results of this article is available as Additional files.

Competing interests
The authors declare that they have no competing interests.