Identification of Hsf genes in C. moschata and their physical and chemical characteristics
A total of 36 CmHsf genes were identified after the removal of false positives and the same genes (Table 1), and they were designated CmHsf1 to CmHsf36 according to the starting positions of these genes on the chromosomes (from Cmo_Chr00 to Cmo_Chr20, from top to bottom). The physicochemical parameters of each CmHsf were generated, and the predicted open reading frames (ORFs) ranged from 543 bp (CmHsf32) to 4380 bp (CmHsf13), with predicted proteins of 179-1458 amino acids. The physical and chemical parameters of these genes are similar to those seen in A. thaliana and O. sativa [44]. Furthermore, the molecular weights (MW) of these CmHsfs ranged from 20.5642 to 161.5554 kiloDaltons (kDa) (Table 1). Although the deduced heat shock transcription factors presented diversity in terms of the parameters mentioned above, most of the CmHsfs exhibited low isoelectric points (pI) (average 6.3) (Table 1). Subcellular localization prediction indicated that only 2 heat shock transcription factors (CmHsf12 and CmHsf17) were predicted to be localized to the cell membrane, cytoplasm and nucleus, while the remaining CmHsfs were predicted to be localized to the nucleus.
Classification and conserved domain analysis of 36 CmHsfs
To identify the phylogenetic relationships of the 36 CmHsfs, an unrooted phylogenetic tree was produced. These CmHsfs can be divided into three subfamilies (subfamily I, subfamily II and subfamily III; Fig. 1A) according to the amino acid sequence identity. Subfamily I (containing 21 members) was the largest group, and subfamily III included 13 members, while subfamily II presented the fewest members (2 members) (Fig. 1A). Furthermore, based on the structural characteristics of the conserved DBDs and HR-A/B domains, we can divide the 36 CmHsfs into three groups (A, B, and C) (Table 2). All CmHsfs contained a DBD and an HR-A/B domain (Table 2), and the DBD was composed of approximately 100 conserved amino acids (Additional file 2: Fig. S1). In addition, except for CmHsf27 and CmHsf32, all of the CmHsfs contained an NLS. The CmHsfs in group A contained an AHA domain, while the CmHsfs in groups B and C did not contain an AHA domain, and only the proteins in Group B contained an RD (Table 2). To further reveal conserved domains, all CmHsfs were submitted to MEME, and 10 different motifs were identified (Fig. 1B; Additional file 2: Fig. S2). Overall, the CmHsfs exhibited 4-9 motifs, and motifs 1, 2 and 4 were present in all CmHsf proteins. Motif 3 was present in all proteins except for CmHsf20 and CmHsf5. In addition, we found that motif 5 existed only in subfamily I, while motif 9 appeared only in subfamily III (Fig. 1B). The CmHsfs from the same clade usually present conserved domains or similar motif compositions, suggesting functional similarities among these proteins.
Exon-intron analysis of 36 Hsfs in C. moschata
An exon-intron organization map of the 36 CmHsf genes was also produced (Fig. 2). Different numbers of exons (from 2 to 26) were found in the 36 CmHsf genes, suggesting that CmHsfs are quite diverse. In subfamily III, except for CmHsf1, CmHsf10 and CmHsf35, which contained 9, 8 and 3 exons, respectively, the other CmHsf genes all contained 2 exons. CmHsf genes on the same branch usually presented similar intron-exon distributions, such as CmHsf26_CmHsf9. Some genes in the same family exhibited significantly different intron-exon distributions. For example, CmHsf12 contained 26 exons, which was different from the other CmHsfs, indicating that CmHsf12 may have a special function.
Chromosomal distribution and gene duplication of Hsf genes in C. moschata
Chromosomal distribution analysis in the genome revealed that the 36 CmHsf genes were unevenly distributed on 19 of the 21 chromosomes (Fig. 3). The chromosome Cm_Chr06 exhibited the most CmHsf genes, with 5 genes, followed by chromosome Cm_Chr05, with 4 genes. A total of 3 genes were present on each of chromosomes Cm_Chr03, Cm_Chr07 and Cm_Chr14, and 2 genes were present on each of chromosomes Cm_Chr02, Cm_Chr04, Cm_Chr10, Cm_Chr11 and Cm_Chr16, while no genes were distributed on chromosomes Cm_Chr00, Cm_Chr08 and Cm_Chr20.
Two genes, whose putative amino acid identity is >85% and gene alignment coverage is >0.75, were defined here as a recently duplicated gene pair [45-46]. A total of 18 duplicated genes were identified and divided into nine groups, each of which contained two duplicated genes. Eight duplicated gene pairs were distributed on different chromosomes (Fig. 3), which demonstrated that segmental duplication events were involved in the expansion of the CmHsf genes. CmHsf10 and CmHsf12 were separated by a region of more than 100 kb, indicating that all duplicated gene pairs had undergone segmental duplication events. The Ka and Ks ratios were less than 1.0, which suggested that the pairs had evolved mainly under functional constraints with negative or purifying selection (Table 3). We also calculated evolutionary times and divergence times of the duplicated C. moschata Hsf gene pairs ranging from 10.17 to 65.74 million years ago (Mya), averaging 21.11 Mya (Table 3).
Phylogenetic relationship of Hsfs in C. moschata, C. sativa and A. thaliana
To better evaluate the molecular evolution and phylogenetic relationship of plant Hsf, a phylogenetic tree of 79 Hsf proteins in C. moschata, C. sativa and A. thaliana was established. Based on the previous classification of C. moschata Hsf proteins (Fig. 1A), they were divided into 9 clades (Clade Ia-b, Clade II and Clade IIIa-e) (Fig. 4). Subfamily I was divided into Clade Ia and Clade Ib, and subfamily III was divided into Clade IIIa-e. This classification was consistent with the phylogenetic classification of AtHsf proteins [44]. In general, genes from subfamily I (Clade Ia and Clade Ib) (including 51 Hsfs) constituted the largest branch and accounted for 65% of the total Hsfs. Subfamily II contained 2 proteins. The remaining Hsfs belong to subfamily III and contain a total of 26 Hsf proteins. From the perspective of phylogenetic branch, the homology of Hsfs between C. moschata and C. sativa was higher than that between C. moschata and A. thaliana, which was consistent with the evolutionary rules of the three species.
Synteny analysis of Hsf genes in C. moschata
According to the synteny analysis of Hsfs in C. moschata and 5 other species (A. thaliana; Lagenaria siceraria; Cucumis sativus; Cucurbita maxima; Citrullus lanatus), we found that C. lanatus exhibited the most Hsf homologous genes (56), followed by L. siceraria (52), C. maxima (51) and C. sativus (51). A. thaliana presented the fewest (18) homologous genes (Fig. 5). Furthermore, the syntenic genes of the CmHsfs could be found on all chromosomes of A. thaliana, L. siceraria, C. sativus, C. maxima, and C. lanatus, indicating that the CmHsfs have remained closely related to those of these five species during the process of evolution. In addition, we found that certain CmHsf genes on chromosomes Cm_Chr02, Cm_Chr06, Cm_Chr08, and Cm_Chr016 corresponded to two or more Hsf genes in A. thaliana. This phenomenon was more fully reflected in the collinear diagram of C. moschata with L. siceraria, C. sativus, C. maxima and C. lanatus. In general, the collinear relationship between C. moschata and L. siceraria, C. sativus, C. maxima or C. lanatus) was closer than that for A. thaliana, suggesting that these species may have originated from the same ancestor. The collinear analysis showed that C. moschata and L. siceraria, C. sativus, C. maxima, and C. lanatus had frequent collinearity (Fig. 5), indicating that genes with collinear relationship may have similar functions.
Expression pattern of Hsf genes in C. moschata
To understand the physiological role of CmHsfs, we analysed the expression patterns of 36 heat shock transcription factors in the roots, stems, cotyledons and true leaves of C. moschata via quantitative real-time PCR. The transcriptional abundance of 36 C. moschata heat shock transcription factors can be obtained from at least one of the four tissues (Fig. 6; Additional file 1: Table S1). Heat map and cluster analyses showed that 21 CmHsfs were highly expressed in cotyledons and true leaves, such as CmHsf4, CmHsf32, CmHsf35, CmHsf19 and CmHsf15. Two genes (CmHsf9 and CmHsf10) were expressed more highly in the roots and stem than in the cotyledons and true leaves. Some genes were highly expressed only in one tissue. For example, CmHsf23 was mainly expressed in the roots, and its relative expression level was 100-258 times that in other tissues. Based on the above analysis, 36 heat shock transcription factors showed tissue specificity.
Cis-acting element analysis of Hsf genes in C. moschata
To explore the potential function of Hsfs, the cis-elements in the promoters (2 kb before the start codon) of the 36 Hsf genes in C. moschata were predicted. A total of 429 cis-elements were found among all CmHsfs. They were involved in 9 abiotic stresses, including showing salicylic acid responsiveness, defence and stress responsiveness, low-temperature responsiveness, abscisic acid responsiveness, gibberellin responsiveness, MeJA responsiveness, auxin responsiveness, drought inducibility and wound responsiveness (Fig. 7A; Additional file 1: Table S2). A total of 31% of the 429 cis-acting elements were involved in abscisic acid responsiveness, which existed in 32 of the 36 CmHsfs (Fig. 7B, 7C). In addition, 27% and 45% of the cis-acting elements were MeJA response elements (harboring CGTCA and TGACG motifs) and auxin response elements, respectively (Fig. 7B). Among the 36 heat shock transcription factors, 28 genes were involved in the MeJA response, and 22 genes were involved in the auxin response. A total of 14 heat shock transcription factors exhibited low-temperature response elements. Since the Hsf genes involved in abscisic acid responsiveness, low-temperature responsiveness, MeJA responsiveness and auxin responsiveness account for a high proportion of these genes, we speculated that these genes might play important roles in these stresses.
By analyzing the cis-acting elements of individual genes, we found that both CmHsf34 and CmHsf27 contained 12 abscisic acid response elements (Additional file 1: Table S2)In addition, CmHsf17, CmHsf26, CmHsf9 and CmHsf35 contained 8 MeJA response elements, and CmHsf23 and CmHsf35 contained the greatest number (3) of low-temperature response elements, which indicates that these key CmHsfs may play an important role in the corresponding stress response.
The response of CmHsf genes to temperature stress
To explore the response of CmHsfs to temperature stress, we cultured C. moschata seedlings at 4 °C and 38 °C. Under cold treatment, 44% of the CmHsfs (16 genes) were significantly upregulated, and 27% of the CmHsfs (10 genes) were significantly downregulated (Fig. 8; Additional file 1: Table S3). For instance, CmHsf3, CmHsf5, CmHsf23, CmHsf24, CmHsf27, CmHsf35 and CmHsf36 were highly expressed under cold stress. In addition, the CmHsf4, CmHsf15, CmHsf31 and CmHsf32 genes exhibited low expression levels under cold stress. At the same time, two genes (CmHsf28 and CmHsf30) were not expressed under cold stress, indicating that the expression of these genes may be limited under cold stress. Under heat treatment, 24 genes were significantly upregulated, and 12 genes were significantly downregulated (Fig. 8; Additional file 1: Table S3). The expression levels of CmHsf9 and CmHsf31 under heat stress were 128.38 and 66.39 times those in the control plants, respectively, suggesting that these two genes may play important roles under heat stress. Some genes presented low expression levels under heat treatment, such as CmHsf17, CmHsf11, CmHsf21, CmHsf22, CmHsf23 and CmHsf35. Considering the expression levels of the CmHsf genes under cold and heat stress together, we found that CmHsf9, CmHsf11, CmHsf21, CmHsf23, CmHsf31, CmHsf34 and CmHsf35 showed opposite trends under the two stresses, so we speculate that these genes may play important roles in temperature stress.
The response of CmHsf genes to hormones and salicylic acid
According to the prediction of cis-acting elements in the CmHsfs promoter, a total of 28, 32, and 19 CmHsf genes were found to be involved in the MeJA response, abscisic acid responsiveness and salicylic acid responsiveness, respectively (Fig. 7; Additional file 1: Table S2). Therefore, we analysed the responses of these genes to MeJA, ABA, and SA. The results of qRT-PCR analysis showed that 31 CmHsfs responded to MeJA to varying degrees, and the expression of CmHsf20 was 5.1 times that in the control (Fig. 9; Additional file 1: Table S4). Under ABA treatment, 21 CmHsfs were significantly upregulated, and 15 genes were significantly downregulated. The expression levels of CmHsf3, CmHsf4, CmHsf5, CmHsf6, CmHsf7, CmHsf8, CmHsf12, CmHsf25, CmHsf29 and CmHsf31 under ABA stress were 20~86 times those of the control plants, indicating that these genes play important roles under ABA stress. All CmHsfs responded to SA, among which CmHsf25, CmHsf27, CmHsf29 and CmHsf32 were significantly increased under SA treatment, while CmHsf1, CmHsf2, CmHsf23 and CmHsf28 were significantly decreased under SA treatment. Based on the above analysis, we conclude that CmHsf family genes are involved in multiple stresses and may play different roles in these stresses.