Evolutional analysis of heat shock transcription factors in wild and cultivated rice (Oryza sativa L.)

Background (cid:0) Heat shock transcription factors (Hsfs) take part in many physiological and biochemical pathways in plants by regulating the expression of various stress-responsive genes, such as heat shock proteins (Hsps). With the development of rice genome re-sequencing projects, some researches had been carried out to identify Hsf gene family members in rice at the whole genomic scale. However, Hsfs in cultivated and wild rice genomes has not been fully studied and compared, although genetic diversity in cultivated rice is limited compared to wild rice. Results (cid:0) In this research work, Hsfs genes were screened and evolutionally compared in the genomes of 6 wild rice and 1 cultivated rice varieties, including O. barthii, O. glumaepatula, O. meridionalis, O. nivara, O. punctate, O. rupogon and O. sativa & Nipponbare. Total 22, 23, 24, 24, 25, 25 and 25 Hsf genes were identied in the tested 7 rice genomes, respectively. The different number of Hsf genes between wild and cultivated rice genotypes was due to dispersed duplication and whole genome duplication (WGD) events, reversely contributed to different stress-tolerant ability between wild and cultivated rice. The evolutional analysis on the Hsf genes conrmed that O. rupogon was the immediate ancestral progenitors of O. sativa. The expression prole of Hsf genes in Nipponbare and O. rupogon under different stage of salinity stress showed that 4 root Hsf genes, including HsfA3a, HsfA4d, HsfC2a and HsfC2b, were simultaneously up-regulated by salinity stress in cultivated rice and its ancestral progenitor, implying that these 4 Hsf genes played conserved roles in rice in response to salinity stress. However, a substantial number of Hsf genes were exclusively regulated only in Oryza rupogon rice seedling, suggesting that some of genuine salinity stress tolerance genes might be missing in cultivated rice. Conclusion (cid:0) The results of this study would give insight into the evolution and function of Hsf gene members in rice, and hint to the use of wild relative genes to improve rice performance.


Abstract
Background Heat shock transcription factors (Hsfs) take part in many physiological and biochemical pathways in plants by regulating the expression of various stress-responsive genes, such as heat shock proteins (Hsps). With the development of rice genome re-sequencing projects, some researches had been carried out to identify Hsf gene family members in rice at the whole genomic scale. However, Hsfs in cultivated and wild rice genomes has not been fully studied and compared, although genetic diversity in cultivated rice is limited compared to wild rice.
Results In this research work, Hsfs genes were screened and evolutionally compared in the genomes of 6  Total 22,23,24,24,25,25 and 25 Hsf genes were identi ed in the tested 7 rice genomes, respectively. The different number of Hsf genes between wild and cultivated rice genotypes was due to dispersed duplication and whole genome duplication (WGD) events, reversely contributed to different stress-tolerant ability between wild and cultivated rice. The evolutional analysis on the Hsf genes con rmed that O. ru pogon was the immediate ancestral progenitors of O. sativa. The expression pro le of Hsf genes in Nipponbare and O. ru pogon under different stage of salinity stress showed that 4 root Hsf genes, including HsfA3a, HsfA4d, HsfC2a and HsfC2b, were simultaneously up-regulated by salinity stress in cultivated rice and its ancestral progenitor, implying that these 4 Hsf genes played conserved roles in rice in response to salinity stress. However, a substantial number of Hsf genes were exclusively regulated only in Oryza ru pogon rice seedling, suggesting that some of genuine salinity stress tolerance genes might be missing in cultivated rice.
Conclusion The results of this study would give insight into the evolution and function of Hsf gene members in rice, and hint to the use of wild relative genes to improve rice performance.

Background
Rice is one of the most important crops in the world, more than half of the population in the world use rice as the main source of food. However, the rice plants are growing under various biotic and abiotic stresses. For a case, abiotic stresses is serious threats to agricultural production and have become the primary cause of crop loss worldwide, reducing crop productivity by an estimated 50% annually [1]. Unlike animals, plants could not escape from the unfavorable environment because of immobility. During longterm evolution, plants have developed certain ability to survive under environmental stresses, such as the dominance of sporophyte that encloses the sensitive gametophyte, the presence of leaf epidermis with stomata for gas exchange, the formation of stress resistant dormant organs, and the presence of conducting tissues in long-lived and big plants for long-distance nutrient and water transport [2,3]. A network of interconnected cellular stress response systems is a prerequisite for plant survival and productivity under environmental stresses [4]. Among them, the interaction between Heat shock proteins (Hsps) and heat shock factors (Hsfs) plays a critical role. Hsps possess molecular chaperone activities and contribute to cellular homeostasis in cells under both normal and adverse growth conditions [4,5], while Hsf proteins regulate the expression of various stress-responsive genes and play a key role in providing tolerance to multifarious abiotic stresses [6]. Hsf proteins contain a DNA-binding domain and bind to Heat shock elements (Hses) with a consensus sequence of GAAnnTTCnnGAA [7][8][9], which is presented at the upstream of the TATA box of HS-inducible genes [10], such as Hsp genes [7][8][9]. As transcriptional activators of Hsps, Hsfs cooperated with Hsps to form a network responding to multiple environmental stress treatments apart from heat stress [11,12].
With the complement of genome DNA sequencing project, Hsf family members had been identi ed in more than 20 plant species at genome-wide scale [7], including Oryza sativa L. [8], Pyrus bretschneideri [13], Sorghum bicolor [14] and Populus [15]. The maximum of Hsf genes among monocots is 56 which was identi ed in wheat, while the maximum of Hsf genes among dicots were found in soybean [4]. Although plant Hsfs share highly conserved structure, their remarkable diversi cation across plants re ects their numerous functions as well as their integration into the complex stress signaling and response networks.
In another word, there is functional divergency between Hsf orthologs among and within plant species.
Therefore, it is necessary to adjust the research direction of Hsfs function from model plant to economic plant, including those with desired trait.
The genetic diversity in cultivated rice is lower than that of wild rice species, and the extent of DNA sequence diversity and phenotypic variation in wild Oryza is still being established [16]. Recent progress in sequencing the genomes of wild and cultivated rice genus is a good step in identifying their genetic differences associated with abiotic and biotic stresses-resistant ability. With increasing climate variability, selecting for traits allowing for the mitigation of abiotic stresses has now moved to the forefront of rice improvement. Wild rice species demonstrated abiotic stress tolerance, in which Hsfs played key role.
However, to the best of our knowledge, Hsf evolution in cultivated and wild rice genomes had not been fully studied. In this paper, we would identify and compare the Hsfs in 6 wild rice accessions and 1 cultivated rice variety, and analyze Hsf evolution between wild and cultivated rice.

Results
Identi cation and classi cation of Hsf genes HsfC1b gene. And two Hsf genes were found to be translocated among the 6 tested rice varieties. The HsfA8a gene was located on chromosome 3 in Nipponbare genomes, but the gene was located on chromosome 6 in O. nivara genome. HsfB1a gene was located on chromosome 9 in Nipponbare genome, but it was located on chromosome 3 in O. nivara genome.

Evolutionary analysis of Hsf genes
The full-length amino acid sequences of Hsf genes in the 6 wild and 1 cultivated rice genomes were aligned to evaluate the evolutionary relationship of Hsf proteins, and the orthologous pair of Hsf genes among the test 7 rice varieties were also identi ed by BLASTp. ObHsfA7b, ObHsfB2c and ObHsfC2a were removed because their alignment length was less than 200 bp. The numbers of orthologous pairs As shown in Fig. 1, all the Hsf proteins were divided into 3 different clades corresponding to the main Hsf classes A, B, and C. Within the Hsf gene A clade, eight distinct subclades (A1, A2, A3, A4, A5, A6, A7 and A8) were resolved and contained all of the Hsf A genes. The class A Hsf genes were divided into 13 orthologous gene clusters (OGCs). The C-type Hsf genes from the tested 7 rice varieties also constituted one distinct clade, had 4 OGCs and appeared to be closely related to the Hsf A group. Correspondingly, the B type Hsf genes were grouped into a separate clade subdivided into three groups (B1, B2 and B4). Notably, the B2 sub-clade was obviously distinct from the other two subclades and formed earlier than other subgroups, hinting that the B2 subgroup was the ancestor of other subgroups. Particularly, OrHsfB2b and ObHsfB2c genes were separated, suggesting that these two genes might be the ancestors of the rice Hsf genes, and other Hsf genes might have evolved through random replication and mutation of these two genes.

Selective constraints on OGCs of Hsf genes
To test the selective constraints on the evolution of each Hsf-family OGC, we detected the variations in dN/dS ratio among codon positions within each cluster, which was shown in Table 1. Overall, the average dN/dS ratio for the 25 clusters under model M0 was 0.217 (with a standard deviation of 0.112), which was statistically lower than 1 but greater than 0 (one-sample t-test, p < 0.01), illustrating that purifying selection was the predominant constraint on the evolution of Hsf gene family in rice. However, the loglikelihood differences between M3 and M0 were statistically signi cant for 15 OGCs, suggesting that the selective-constraint levels differed across these 15 Hsf OGCs. To evaluate whether positive selection facilitated the evolution of each Hsf OGC in rice, we compared M8 and M7 models to determine whether they had undergone positive selection. According to Yang et al. (2014), the following 3 criteria was used, including an estimate of ω > 1 under M8, any sites found to be under positive selection and a statistically signi cant likelihood ratio test (LRT) [17]. Finally, we found 8 Hsf OGCs had undergone positive selection during the evolution of rice because they satis ed the above 3 criteria (Table 1). The 3 main types of gene duplication result from chromosomal rearrangements, including retrotranspositions, tandem duplications or segmental duplications, and whole genome duplication (WGD), might drive the evolution of protein-coding gene families [18]. In this research work, MCScanX package was used to detect the origins of duplicate genes of Hsf gene family in the tested 7 rice variety genomes. As shown in Table 2, each member of Hsf gene family was grouped into one of ve different categories: singleton, WGD, tandem, proximal or dispersed, while percentage of total Hsf family member in each category was calculated.

Expression pro le of Hsf genes
Studying the gene-expression patterns of all members of a gene family would provide insight into their functional divergence [19]. To analyze the expression pro le of Hsf genes in cultivated rice and their immediate ancestral progenitors in response to same stress condition, we searched and downloaded two transcription data from the previous studies, one was Nipponbare rice seedling under 1 h salinity stress while the other was Oryza ru pogon rice seedlings under 12 days salinity stress. In early response of Nipponbare rice seedling to salinity stress, three up-regulated shoot Hsf genes, 7 up-regulated and 7 down-regulated root Hsf genes were found. Whilst, in the late response of Oryza ru pogon rice seedling to salinity stress, six up-regulated and 5 down-regulated shoot Hsf genes, 4 up-regulated and 1 downregulated root Hsf genes were detected (Table 3). This result indicated that more Hsf genes were regulated in the shoot of rice seedling in response to long-term than short-term salinity stress, the reverse was true in the root. However, the same tissue of rice seeding under short and long-term salinity stress did not have the common regulated Hsf genes, expect 4 root Hsf genes (Table 3). This result implied that different Hsf family members were function to response to different stage of salinity stress, or that cultivated rice and its immediate ancestral progenitor employed different Hsf genes to response to salinity stress. Four root Hsf genes, including HsfA3a, HsfA4d, HsfC2a and HsfC2b, were simultaneously up-regulated in Nipponbare under 1 h and Oryza ru pogon under 12d salinity stress, showing that they might function critically in the early and late response to salinity stress. Discussion Cultivated rice is considered to have been domesticated from wild rice thousands of years ago. However, in the long-term domestication, cultivated rice lost various valuable traits with regard to tolerance to cold, drought and salinity which derived from wild rice [19,20]. Hsf proteins are regarded as key role in providing tolerance to multifarious abiotic stresses [6], the Hsf genes had therefore been identi ed and analyzed in various plants species. In this study, we achieved 25 Hsf genes in the genome of cultivated rice variety, Nipponbare. This was consistent with the research report of Guo et al. [8]. Meanwhile, we found 22 Hsf genes in O. barthii genome, the same number as the research of Guo et al. [6]. Duplication drives the evolution of gene in planta [21]. In this research work, different numbers of Hsf genes were found in 6 wild rice varieties. This could be inferred from the results that dispersed duplication and WGD played an important role in the expansion of Hsf family genes in rice taxa, which lead to rice polyploid. Song et al. observed that Hsf genes were ampli ed by gene replication in Chinese cabbage and that the replication of Hsf genes promoted the polyploidy of Brassica [22]. Therefore, WGDs and additional recent lineage-speci c WGDs had presumably resulted in varying numbers of Hsf genes within owering plants [ 18] . Meanwhile, in this study, O. ru pogon genomes, as well as Nipponbare, had 25 Hsf genes, implying that O. ru pogon was the immediate ancestral progenitors of O. sativa [23].
The evolutionary tree of Hsfs in the test 7 rice varieties was all divided into three categories, A, B and C. The class A was the largest one. Qiao et al. [13] constructed the Hsfs evolution tree of Chinese white pear, poplar and Arabidopsis, and found that Hsfs of three species were clustered to the corresponding A, B and C [13]. But the number of China pear Hsf A branch was 19, while B and C branch Hsfs were 8 and 2. Nagaraju et al. used the same strategy to build the evolutionary tree of rice, Arabidopsis and sorghum Hsfs [14]. They also found that the most Hsfs was clustered into A branches followed by B branches, and the least was C branches. Meanwhile, they concluded that the difference between the three species of Hsfs in the A and B branches lead to the different responses of different species to stress [14]. Therefore, the results in this research work suggested that the different stress-tolerant ability between wild and cultivated rice might be induced by the difference of Hsfs between 6 wild rice and cultivated rice on the evolutionary branches.
The determination of orthology is an essential part of comparative genomics, which had been carried out by using synteny analysis in many previous studies [24,25]. In this research work, collinear genes and collinear regions were not found in Hsfs of Oryza meridionalis, which probably due to the serious loss of homologous genes in the Hsf regions or its imperfect genome assembly and annotation. The number of collinear regions of Hsf genes detected in the wild and cultivated rice genomes was lower, because some of collinear region was associated with the loss of Hsf genes. The collinear region plays an important role in understanding the replication, deletion and expansion of Hsf genes. In this research work, we found that different patterns of gene duplication contributed differentially to the expansion of the Hsf gene family in the investigated rice taxa (Table 1). This result further con rmed that divergence in Hsf gene number in these species was the result of Hsf gene duplication and/or loss in rice taxa.
As Nipponbare rice root was only exposed to 1 h salinity stress, less Hsf genes were regulated in the shoot than in the root. When only the root was exposed to environmental stresses, it might take time to transduce the signal and induce the expression of more genes in rice shoot [26]. OsTPP1 was found to be up-regulated in rice shoot after exposure to abiotic stress for 10h [27].The reverse results were saw in Oryza ru pogon rice seedling under 12d salinity stress, the immediate ancestral progenitor of cultivated rice. After the root exposed to long-term stress, the rice shoot could sense the stress and induce more Hsf genes to response to the stress. Meanwhile, more regulated Hsf genes might hint the critical stressresponse time stage for rice tissue. Ma et al. [28]found more genes in blast-resistant rice variety were regulated at 24 h than 48 h after blast fungus infection, and they revealed 24 h after infection was the critical response time for rice plants to resist blast fungus. Therefore, we could make a conclusion here that the root and shoot Hsf gens in rice seedling were regulated to response to short-term and long-term salinity stress, respectively.
Some common root Hsf genes, including HsfA3a, HsfA4d, HsfC2a and HsfC2 were up-regulated by salinity stress in Nipponbare and its ancestral progenitor, indicating that these four Hsf genes played important roles in rice during salinity stress. However, a substantial number of Hsf genes were exclusively regulated only in Oryza ru pogon rice seedling, suggesting that some of genuine salinity stress tolerance genes might be missing in Nipponbare. Many of the genes in cultivated rice have been selected by humans under eld conditions, not by environmental stress [26]. During rice domestication, some of stress-responsive genes might be omitted in cultivated rice in order to achieve high rice yield. However, the use of crop wild relative genes to improve crop performance was well established with important examples dating back more than 60 years, such as wheat and tomato. Therefore, these essentially missing genes in ancestral progenitor of the cultivated rice could serve as potential genetic resources for the improvement of cultivated crops.

Methods
Identi cation of Hsf genes in wild and cultivated rice genotype  [29] was used to search against the protein sequence dataset of the tested genome using HMMER software with a strict cut-off E-value of 10 − 5 and removing the repetitive sequence. All the candidate Hsf protein sequences were analyzed in the Pfam database to verify the presence of DBD domains with a strict cut-off E-value of 10 − 3.

Phylogenetic analyses
ClustalW software was used to multiple align HSF domains from the investigated rice varieties with default settings. After that, MEGA6.06 [32] was used to build up the maximum-parsimony phylogenetic trees with Neighbor-Joining method and 1000 bootstrap replicates, and displayed trees using TreeView (http://www.taxonomy.zoology.gla.ac.uk/ rod/treeview.html).
Hsf duplication patterns and collinearity analysis Local BLASTp was performed to search potential homologous gene pairs with a strict cut-off E-value of 1e-5, Identity > 80%, length > 200 bp, and output the top ve matches in each alignment results. Then the output gff le was used as input le for MCScanX to identify syntenic chains, as well as the duplication patterns in the Hsf gene family. At last, Hsf genes and their collinearity were labeled on the rice chromosomes by Circos [33] software.

Detection of positive selection
The program MEGA6 was used to obtain aligned sequence les and tree les, and DAMBE was used to obtain PML format which was input into the codeml program in PAML [34]. Using the program codeml, we detected variation in the ω parameter among sites by employing likelihood-ratio tests (LRTs) for M0 vs. between ω = 0 and ω = 1 among the sites, while the alternative model M8 adds a free parameter to the null model and allows positive selection to occur. In each LRT, twice the difference of the log likelihood of the two models was compared to the chi-squared (χ2) statistic,with the degrees of freedom (DFs) being equal to the difference in the number of parameters. In our analyses, the DFs were 4 for the M0 vs. M3 test and 2 for the M7 vs. M8 test [35].
Transcript pro ling of Hsf genes in wild and cultivated rice under high salt stress The RNA-seq data of Oryza ru pogon and Nipponbareare under high salt stress was derived from previous studies [26,36]. Brie y, The Cuffdiff software was employed to evaluate the differential gene expression between any two samples. The gene abundance differences between sample pairs were calculated based on the ratio of the FPKM values. The signi cance of gene expression differences was calculated using the FDR (False Discovery Rate) control method to justify the p-value. Here, only genes with an absolute value of log2 ratio ≥ 0.76 (fold-change ≥ 1.7) and a FDR signi cance score ≤ 0.001 were used for subsequent analysis.