Identification and characterization of radish HSF proteins
To comprehensively identify the candidate HSF proteins in radish, a profile Hidden Markov Model (HMM) [32] search against NODAI radish genome protein sequences with the HSF domain (Pfam: PF00447) was performed. A total of 55 putative RsHSF gene sequences were obtained from the whole genome. SMART (http://smart.embl-heidelberg.de/) and Pfam (http://pfam.xfam.org/) were used to remove proteins with incomplete hsf-type DBD domain, while MARCOIL (http://toolkit.tuebingen.mpg.de/marcoil) was used to confirm OD (HR-A/B) domain presence. After removing sequences that encoded proteins without the complete DBD, OD (HR-A/B) and/or start/stop codons, 33 RsHSF genes remained for further analysis (Additional file 1: Table S1-S3). Subsequently, the conserved domains of all retained genes were identified by Heatster. The DBD, which consists of approximately 100 amino acids (AAs) at the N-terminus, was the most conserved domain. Overall, 31 AAs of the DBD domain were highly conserved among all RsHSFs (Additional file 2: Figure S1). NLS and NES are crucial for HSF intracellular distribution between the nucleus and cytoplasm. All RsHSFs had an NES, whereas only 13 contained the NLS (Additional file 3: Table S4).
Furthermore, the physical and chemical properties of 33 RsHSF proteins were analyzed with ExPASy (Additional file 3: Table S4). Among these, the RsHSF protein sizes varied from 238 to 2,427 AAs with molecular weights (MWs) from 27.80 to 274.05 KDa, respectively. Additionally, the theoretical pI of most RsHSF proteins was < 7, with the exception RsHSF-04, RsHSF-05, RsHSF-06 and RsHSF-25. All RsHSFs were classified as unstable proteins according to the instability index. The aliphatic index may be a positive factor that increased globular protein thermostability. RsHSF-25 had the largest aliphatic index (77.14). Most RsHSFs showed similar physical and chemical properties, while different AA sequences in non-conserved regions may alter some of the molecular characteristics.
Intron–exon structure and conserved motif distribution
To obtain information about the diversity of RsHSF gene structure, full-length complementary DNA (cDNA) sequences were compared with the corresponding genomic DNA sequences through GSDS. To further analyze RsHSF protein motif distribution, whole sequences were subjected to the MEME web server and a total of 25 AA motifs were generated (Additional file 2: Figure.S2).
Overall, 25 of 33 RsHSF genes had one intron, while the ramaining members have two or more introns. Additionally, proteins within the same subgroups had similar motif structures. All RsHSF proteins harbored motifs 1, 2 and 4, all of which highly correspond to the conserved DBD. Moreover, RsHSF-07, RsHSF-14 and RsHSF-26 in subgroup A6 presented a similar gene and motif structure. However, some motifs were only detected in specific RsHSF protein classes. For example, motif 3 was identified in classes A and C, while motifs 9 and 17 only existed in class A1. A proportion of RsHSF proteins in the same class exhibited similar motif distribution, indicating that these proteins might have conserved functions (Fig.1).
Comparative and phylogenetic analysis of RsHSF genes
To investigate the distribution of HSF subfamilies and the evolutionary relationship among different species, 33 RsHSFs combined with 35 BrHSFs, 19 VvHSFs, 25 OsHSFs and 21 AtHSFs identified from Chinese cabbage (B. rapa ssp. pekinensis) [33], grape (V. vinifera) [34,35] , rice (O. sativa) [36] and Arabidopsis [37] were employed for comparative analysis (Additional file 4: Table S5). Based on the number of AA residues between the A and B portions of the HR-A/B region, RsHSFs were classified into three classes, namely A, B and C. Intriguingly, the motifs 5 and 19 were only identified in class B (Fig.1).
A phylogenetic tree was constructed using maximum likelihood methods by IQ-TREE [38] with the full sequences of the 133 HSF proteins (Fig.2). All proteins were categorised into three groups corresponding to the HSF classes A, B and C. Additionally, there were 9 RsHSF class A subgroups clearly obtained on the basis of the bootstrap values and phylogenetic analysis of Arabidopsis and rice. According to the phylogenetic tree, class A consisted of 8 subclasses (A1, A3, A4, A5, A6, A7, A8, and A9) and contained the most RsHSF numbers, while class C accounted for the least proportion of HSFs among these five species. Interestingly, no RsHSFs, AtHSFs or BrHSFs clustered into classes A2, whereas AcHSFs did not appear in classes A6 and A7. This finding suggests that most RsHSF genes are more closely related to their corresponding homologous genes in Chinese cabbage and Arabidopsis.
Chromosomal location and duplication of RsHSF genes
To obtain the chromosomes distributions of RsHSF genes, their DNA sequences were physically plotted on the chromosomes through blast searches against the genomic sequences (Additional file 5: Table S6). In total, 28 RsHSF genes were mapped on the chromosomes with uneven distribution, with exception of five genes (RsHSF-22, 23, 25, 29 and 33) (Fig.3). Chromosome (Chr.) 5 had the largest number of RsHSF genes among three classes, followed by Chr. 4 (5 RsHSF genes from class A and B). Only one RsHSF gene was present in Chr. 8, while Chr. 1, 6 and 9 harbored two genes. Moreover, all the class B RsHSF genes were mapped on the chromosomes.
Genome duplications have been recognized and contributed to the expansion of gene family in plants [39, 40]. MCScanX was used to obtain information about origins of duplicate RsHSF genes in radish genome [41]. 13 (46.4%) RsHSF genes were duplicated and retained from a whole genome duplication (WGD) event and 11 (39.3%) were duplicated and retained from a singleton event (Additional file 6: Table S7). Eight WGD events of 10 duplicated RsHSF genes were identified and classified into four groups. Among these four groups, two harbored three genes (RsHSF-07, RsHSF-14 and RsHSF-26; RsHSF-08, RsHSF-16 and RsHSF-17), and these six genes were located on five chromosomes (Chr.2, 4, 5, 6 and 9). The other two groups contained two WGD events of four genes, and one duplication event harbored RsHSF-01 and RsHSF-03 which were located on Chr. 2 and 4, respectively (Fig.3). Only one WGD event (RsHSF-07and RsHSF-14) took place within the same chromosome (Fig.3, Additional file 7: Table S8). These results revealed that WGD or segmental duplication played an important role in the expansion of the radish HSF gene family.
Identification of orthologous and paralogous HSF genes
Polyploidization events contributed significantly to the evolution of flowering plants [42]. Previous studies showed that the common ancestor of Brassica and Raphanus have experienced a’ whole-genome triplication (WGT) event since its divergence from Arabidopsis. However, the gene losses of orthologous groups between Arabidopsis, Brassica and Raphanus for protein-coding genes occurred in both the Brassica and Raphanus lineages [39, 42]. Orthologous HSF genes among radish, Arabidopsis and rice were identified for triplication assessment using the Orthomcl-pipeline [43]. In total, there were 23 pairs of orthologous HSFs between radish and Arabidopsis, and eight paralogous gene pairs were identified in radish (Additional file 8: Table S9). It was reported that there are only six orthologous gene pairs between Arabidopsis and rice [23]. In addition, seven orthologous HSF gene pairs were identified between radish and rice (Additional file 8: Table S9). Among the orthologous HSF gene pairs between radish and Arabidopsis, AtHSF-11 and AtHSF-20 had three orthologous genes, three AtHSFs have two orthologous genes and 11 AtHSFs only had one orthologous gene (Additional file 2: Figure S3). These results indicate that many orthologous groups experienced gene losses. Moreover, five genes (RsHSF-01, RsHSF-12, RsHSF-13, RsHSF-20 and RsHSF-32) had no orthologous gene in either Arabidopsis or rice. These results provide useful resource and reference for further exploring the functions of RsHSF genes in radish (Fig.4).
Expression pattern of RsHSF genes in different tissues
To determine spatiotemporal expression patterns of RsHSF genes, the Reads Per Kilobase Per Million (RPKM) values of the 33 RsHSF genes in different tissues and developmental stages were collected from RNA-Seq data and presented in a heatmap (Fig.5, Additional file 9: Table S10). In general, a proportion of RsHSF genes were not expressed in several tissues and developmental stages. As shown in Fig.5, the RPKM value varied from 0 to 127.5 among the 33 RsHSFs genes. The number of RsHSF genes with RPKM >1 was 14 (42.4%). It was found that the RPKM values of four RsHSFA1s (RsHSF-15, 19, 23, 25) were <6 during all tissues and stages. Notably, among the other three RsHSFA1 (RsHSF-02, 18, 24) genes, RsHSF-18 and RsHSF-24 were highly expressed in the cortex, cambium and xylem. The majority of RsHSF genes exhibited differential expression patterns in the various tissues and developmental stages. RsHSF-21 and RsHSF-29 (class C) were relatively lowly expressed during all developmental stages. RsHSF-21 and RsHSF-29 (class C) were relatively lowly expressed during all developmental stages. In addition, RsHSF-13 expression increased significantly in both root and leaf after 7 days after sowing (DAS), and it was in leaves after 40 DAS. RsHSF-30 (class A8) expression was higher compared to RsHSF-28 (class A3) during all developmental stages. Besides, RsHSFB genes exhibited stage-specific expression patterns. Among class B, both RsHSF-04 and RsHSF-10 were lowly expressed during all stages, while RsHSF-03 was highly expressed.
RsHSF gene expression profile under abiotic stresses
RNA-Seq data showed that some HSF genes were differentially expressed under disparate stresses. The transcription levels (log2 (Fold Change) and log2TPM value) of certain RsHSF genes from the NCBI SRA were further analyzed (Additional file11: Table S12; Additional file 2: Figure S4). Several HSF genes, including HSFA1s were up-regulated in response to heat and salt stresses. RsHSFA4s (RsHSF-08, RsHSF-16 and RsHSF-22) were up-regulated under salt and lead (Pb) stresses, while they were down-regulated under HS. Additionally, HSFC1 (RsHSF-21) expression increased after Cd and salt treatments. HSFA1s as master regulators play vital roles in the HSR and activation of transcriptional networks. Expression levels of genes that encode HS responsive TFs including DREB2A, HSFA7s and HSFBs, are directly regulated by HSFA1s [44]. HsfA4a is induced by oxidative stress, including HS, and regulates APX1 expression [45]. To gain insight into the expression patterns of HSF genes in radish, 12 RsHSF genes that belonged to different groups were verified by RT-qPCR analysis under heat, salt, Pb, and Cd stress (Additional file 10: Table S11). Overall, the gene expression patterns varied significantly under the various treatments (Fig.6). Most genes were up-regulated after 24h HS exposure, while some genes were down-regulated after HS for 6h (Fig.6), such as RsHSF-16, RsHSF-22 and RsHSF-29. Moreover, RsHSF-18 expression was highly increased after 6h HS, implying that it can quickly respond to this stress. Further, RsHSF-18 was rapidly up-regulated under Cd treatment and during all heat and salt stress time points, while RsHSF-33 was only up-regulated in response to Pb stress. RsHSF-05, RsHSF-06, and RsHSF-33 were up-regulated under 200mg/L Pb treatment and are likely involved in the Pb stress response. RsHSF-11 and RsHSF-22, which were down-regulated under Cd stress, exhibited relatively high expression level under salt stress and thus they may contribute to salt tolerance.