Genome-wide identification and expression analysis of Hsf and Hsp gene families in cucumber (Cucumis sativus L.)

Heat shock proteins (Hsps) are molecular chaperones that participate in plant growth and development, and in plant responses to environmental stresses, including heat stress, Hsps' expression is monitored and regulated by specific types of transcription factors known as heat shock factors (Hsfs). Although the roles of Hsfs and Hsps in stress response have been investigated in some plants, their roles are still poorly understood in cucumber (Cucumis sativus L.). To reveal the number of Hsfs and Hsps and their members coping with various stresses in cucumber, the comprehensive analyses of Hsf and Hsp gene families were conducted. A total of 23 Hsfs and 72 Hsps were identified in the cucumber genome (v3.0), and their gene structure and motif composition were found to be relatively conserved in each subfamily. At least 23 pairs of heat shock genes may have undergone duplication in cucumber. The cis-element analysis indicated that the promotors of CsHsfs and CsHsps possess at least one hormone or stress response cis-element, suggesting that CsHsf and CsHsp genes might respond to different stress conditions. RNA-seq showed that most CsHsf and CsHsp genes were sensitive to heat stress, but not all. Interestingly, CsHsf and CsHsp genes up-regulated by heat stress were also up-regulated by NaCl, indicating that cucumber shares common pathways in the responses to heat and salt stress. In addition, our results demonstrated that the fewer introns in the stress genes, the more sensitive they are, which leads to the stronger adaptability of plants to various developmental conditions and environmental stimuli. For instance, comparing with other Hsp families, the genes of CsHsp20 family are shorter in length and have fewer introns, and 39% of them are intronless, however, most of CsHsp20s were highly induced by heat stress. These results provide valuable information to further clarify the functional characterization of the CsHsf and CsHsp genes for plant breeding purposes.


Introduction
During growth and development, plants are usually exposed to a variety of strenuous stresses from the surrounding environment, including not only abiotic stresses such as high temperatures, drought, salinity, but also biotic stresses such as pathogen invasion (Nejat and Mantri 2017). Plants, as sessile organisms, could not avoid the damage of adverse environments by changing their position. Therefore, a complex response and regulatory network has evolved at the biochemical, physiological, and molecular levels to adapt to adversity (Lamers et al. 2020).
Under the trend of global warming, high temperature has become a major issue for agriculture worldwide (Hayes et al. 2020). To prevent the negative effects of high temperature on plants, such as water loss, growth reduction, photosynthesis Communicated by Hua Wang.
Xueqian Chen and Zhiyuan Wang contributed equally to this work. efficiency reduction, some genes are induced quickly by heat stress for adaptive responses (Aleem et al. 2020).
Heat shock factors (Hsfs), being a specific type of transcription factors, normally exist as inactive proteins and can be induced rapidly to modulate growth responses under high temperature (Ohama et al. 2017). Previous studies have shown that the protein structure of Hsfs with a DNA-binding domain (DBD) and an adjacent bipartite oligomerization domain (HR-A/B or OD) in the N-terminal, and a nuclear localization signal (NLS), a nuclear output signal (NES) and an activation peptide motif in the C-terminal (AHA), is very conserved (Scharf et al. 1990). Besides, according to the number of amino acid residues inserted between HR-A and HR-B, plant Hsfs can be divided into three main classes, namely: A, B, and C (Guo et al. 2016).
Heat shock proteins (Hsps), working as molecular chaperones in both plant and animal species and being regulated by Hsfs, are synthesized in large quantities to prevent incorrect aggregation of proteins when plants subject to heat stress (Ohama et al. 2017). Hsps not only are transcriptionally regulated by Hsfs, but also are able to regulate the activity of Hsfs via feedback loop caused by their physical interaction. For instance, Hsp70 represses the activity of HsfA1, while Hsp90 stimulates the DNA binding activity of HsfB1 in tomato (Hahn et al. 2011). According to their apparent molecular weight, plant Hsps are divided into five groups: small Hsps (sHsps), Hsp60, Hsp70, Hsp90, and Hsp100 (Wang et al. 2004), which are distributed in many cell components, such as cytoplasm, nucleus, plastids, mitochondria, chloroplasts, endoplasmic reticulum, and peroxisomes, protecting different organelle structures (Jacob et al. 2017).
So far, the functions of Hsf and Hsp genes in various stress responses have been reported in many plants. In Arabidopsis thaliana, it has been proved that AtHsfA1 is the central regulator of heat stress regulation and response network, which could induce the expression of many downstream heat shock genes, for example, other Hsf subclasses (A2, A3, A7, B1, and B2) (Liu and Charng 2012;Yoshida et al. 2011).
AtHsfA2 not only performs an essential function in heat and osmotic stress response, but also exerts an important role in the growth and development of plants (Charng et al. 2007;Liu and Charng 2013;Ogawa et al. 2007). OsMSR3, coding a small heat shock protein, confers enhanced tolerance to cadmium and copper stresses in Arabidopsis (Cui et al. 2019). GhHsp24.7, a mitochondrial small heat shock protein in cotton regulates seed germination by inducing reactive oxygen species (ROS) via thermal sensing (Ma et al. 2019). CaHsp26 and CaHsp22.5 of sweet pepper (Capsicum annuum L.) play significant roles in the protection of PSII by enhancing photochemical activity and oxidation resistance during chilling stress . In tobacco (Nicotiana tabacum L.), potentiation of Hsp gene expression increased the resistance exposed to NaCl stress (Li et al. 2014b). Besides, ZmHsp16.9, a cytosolic class I small heat shock protein of maize, confers heat tolerance in transgenic tobacco (Sun et al. 2012). Hsp100/ClpB located in the chloroplast plays an active role in regulating the heat resistance of tomato (Solanum lycopersicum) (Yang et al. 2006). The overexpression of SlHsp70-1 resulted in internode elongation, suggesting that it functions in the shoot growth of tomato (Vu et al. 2019). MeHsp90.9 is critical for droughtinduced stress resistance in cassava by regulating hydrogen peroxide (H 2 O 2 ) and abscisic acid (ABA) (Wei et al. 2020). Moreover, Hsp90 genes in barley play roles in regulating hypersensitive responses to stripe rust (Pei et al. 2015).
Cucumber (Cucumis sativus L.) is a warm-season crop, but is vulnerable to heat stress (Li et al. 2014a). High temperature is one of the main abiotic stress factors affecting cucumber yield and quality in summer cultivation or protected land production (Ding et al. 2016). However, in cucumber, it is still unknown that which Hsf and Hsp genes play important roles in heat stress tolerance. Moreover, the updated cucumber genome ('Chinese Long' v3) provided a good chance to comprehensively analyze Hsf and Hsp families . In this study, we identified 23 Hsf and 72 Hsp genes including Hsp20, Hsp60, Hsp70, Hsp90, and Hsp100 in the cucumber v3 genome. Then we carried a comprehensive analysis of Hsf and Hsp genes in cucumber, including their sequence features, chromosomal locations, phylogenetic relationships, and dynamic expression patterns in response to heat, NaCl, and downy mildew stresses, which not only help to identify the function of cucumber Hsf and Hsp genes, but also provide some candidate genes for breeding new stress-resistant cucumber varieties.

Multiple sequence alignment and phylogenetic analysis
Multiple sequence alignments of CsHsf and CsHsp proteins were performed by using the ClustalX (version 1.83) program with default parameters (Thompson et al. 1997). The full-length amino acid sequences of Hsfs and Hsps in cucumber, Arabidopsis, rice, and tomato were uploaded to the MEGA v. 7.0 software (Kumar et al. 2016). The ClustalW alignment method was applied to the sequence alignment. Phylogenetic trees were then constructed based on the alignments using the neighbour-joining (NJ) method.

Chromosomal localization, gene duplication, and evolutionary analysis
All the chromosomal locations of CsHsf and CsHsp genes were performed with the MapChart software to visualize the positions and relative distances of genes on cucumber chromosomes (Voorrips 2002) based on the cucumber v3 genome. Multiple Collinearity Scan toolkit (MCScanX) was used to analyze the gene duplication events, with the default settings . The duplication of the CsHsf and CsHsp gene was determined according to two criteria: (a) the length of the shorter aligned sequence covered > 70% of the longer sequence, and (b) the two aligned sequences shared > 70% amino acid sequence similarity (Yang et al. 2008). In the 100 kb chromosomal fragment, two genes separated by less than five intermediate genes are considered to have undergone tandem duplication (Zhang et al. 2018). Codon alignment of duplicated genes was done by MEGA v.7.0 using the ClustalW codon alignment tool, and the homologous (Ks) and non-homologous (Ka) rates of the tandem and segmental duplications of cucumber Hsf and Hsp genes were calculated using KaKs_Calculator 2.0 (Wang et al. 2010). The divergence time (T) of each duplicated gene pair was calculated as T = Ks / 2r (r = 6.56 × 10 -9 ) × 10 -6 million years ago (Mya), with Ks being the synonymous substitutions per site and r being the rate of divergence for nuclear genes from plants (Koch et al. 2000). To reveal the synteny relationship between the orthologous Hsf and Hsp genes of cucumber, Arabidopsis, and rice, syntenic analysis maps were constructed using the Dual Systeny Plotter software (https:// github. com/ CJ-Chen/ TBtoo ls) (Liu et al. 2017).

Plant materials, stress treatments, and quantitative real-time PCR analysis
The 'Chinese long' 9930, a typical north-China cucumber, was used for expression analyses. The selected seeds after immersion and imbibition for 12 h were sown in pots containing vermiculite and matrix and then they were cultivated in a growth chamber under day/night temperatures of 25/18 ℃, with a 16 h/8 h photoperiod/dark period. Threeweek-old seedlings were subjected to heat treatment at 42 ℃ and the leaves were collected at 0 (control), 3, and 6 h. For NaCl treatment, the seedlings were subjected to 75 mM NaCl stress for 24 h. For downy mildew treatment, the seedlings were inoculated with Pseudoperonospora cubensis for 6 and 24 h. All the collected samples were quickly frozen in liquid nitrogen and stored in a − 80 °C refrigerator until RNA isolation. For the hormone treatments and quantitative real-time PCR analysis was performed as described previously (Luan et al. 2019).

RNA isolation and RNA-Seq analysis
Three independent biological samples were used in each experiment. We used the total RNA extraction kit (TRIzol, B511311, Sangon, China) to isolate the total RNA of each sample, and used RNase-free DNase I to remove genomic DNA contamination. The RNA Nano 6000 analysis kit of the Bioanalyzer 2100 System (Agilent Technologies, California, USA) was used to evaluate RNA integrity. The total RNA amount of each sample is 1 µg, which is used as input material for RNA sample preparation. According to the manufacturer's recommendations, use the NEBNext® UltraTM RNA Library Prep Kit for Illumina® (NEB, USA) to generate a sequencing library, and add the index code to the attribute sequence of each sample. The PCR products were purified on the Agilent Bioanalyzer 2100 system (AMPure XP system) and the library quality was evaluated. According to the manufacturer's instructions, using TruSeq PE Cluster Kit v3-cBot-HS (Illumia) to cluster the index code samples on the cBot cluster generation system. After the clusters were generated, the library preparation was sequenced on the Illumina Novaseq platform, and 150 bp paired-end reads were generated. HISAT2 (version 2.0) was used to map clean reads to the cucumber reference genome (v3, http:// cucur bitge nomics. org/ organ ism/ 20) using default parameters. StringTie (v1.3.3b) uses a reference-based method to assemble the mapped reads for each sample. FeatureCounts v1.5.0-p3 was used to calculate the number of reads corresponding to each gene. Then calculate the FPKM of each gene according to the length of each gene, and count the reads of the gene. Differentially expressed genes (DEGs) were identified by the DESeq2 R package (1.16.1). The transcript abundance of cucumber Hsf and Hsp genes was calculated based on the number of fragments per kilobase of transcript per million fragments mapped (FPKM) in exon patterns. The RNA-seq data of CsHsf and CsHsp genes in different tissues and in responses to NaCl and downy mildew stresses were obtained from the Cucurbit Genomics Database (http:// cucur bitge nomics. org/ rnaseq/ home, PRJNA80169, PRJNA437579, and PRJNA388584). Based on the converted data of log2 (FPKM + 1) value, heatmaps were generated with HemI1.0.

Identification and analysis of Hsf and Hsp genes in the cucumber genome
After HMM analysis, BLASTP and keyword search against the Cucumber Genomic Database (Version 3.0), a total of 23 Hsf and 72 Hsp genes in cucumber were identified, and among the 72 Hsps, 33, 15, 12, 6 and 6 genes belonged to Hsp20, Hsp60, Hsp70, Hsp90 and Hsp100 families, respectively. These Hsf and Hsp genes from Cucumis sativus were abbreviated as CsHsfs and CsHsps, and they were named according to their chromosomal orders. Detailed information of the CsHsf and CsHsp genes was listed in Table S1, including the gene name, gene ID, chromosome location, length of the open reading frame, number of exons, number of amino acids, molecular weight, protein isoelectric point, and subcellular localization prediction.
Then, we conducted a prediction of conserved motifs shared among the related proteins in each subfamily using the Multiple Expectation Maximization for Motif Elicitation (MEME) and identified 10 putative motifs in each family. In general, most closely related members of the phylogenetic trees in different subfamilies had the similar motifs (Fig. 1). Details of these motifs were shown in Table S2.
Like Hsfs in other plants, the protein structure of CsHsfs is also highly conserved. Based on the analyses of Pfam, CDD, and SMART, we found all the CsHsf proteins contain the DBD domain composed of motif 1 and motif 2. Multiple alignments of the CsHsf protein sequences revealed a highly conserved DBD domain existed in all CsHsfs (Fig. S1A). The HR-A/B region was an essential domain in Hsfs, which was characterized by the predicted coiled-coil structure (Guo et al. 2016). The HR-A/B domain in CsHsfs was composed of two typical motifs (motif 3 and motif 4) (Fig. 1). In addition, 21 amino acids were inserted between the HR-A and HR-B regions in class A CsHsf proteins and 7 amino acids in class C (Fig. S1B). However, there was no insertion between the HR-A and HR-B regions in Class B. AHA motif (motif 6) was distinctive in the great majority of class A CsHsf proteins, and 8 (66.7%) CsHsfA proteins have AHA motif. NLS and NES are of vital importance for the intercellular distribution and interactions of Hsf proteins in the nucleus and the cytoplasm (Heerklotz et al. 2001). 17 (73%) CsHsfs contain NLS domain, and 12 (52%) NLS domain (Table S1-1).
All the CsHsp20s have a highly conserved α-crystallin domain (ACD) at the C-terminus. Multiple sequence alignment analysis and sequence logo showed that the ACD domain consisted of two conserved regions, region I and region II (Fig. S2). In addition, the consensus region I contains motif 3, and the region II motif 1, and the motif 6 and motif 4 were inserted between the consensus region I and II (Fig. 1b).

Phylogenetic analysis of CsHsf and CsHsp
We further investigated the evolutionary relationships of Hsfs and Hsps in cucumber, Arabidopsis, tomato, and rice, and based on the full-length amino acid sequences, we generated an unrooted phylogenetic tree by MEGA 7.0 using the neighbor-joining method. The CsHsf gene family could be divided into three classes: Class A (12 genes), class B (9 genes), and class C (2 genes), and each class can be further subdivided into subclasses according to the branches (Fig.  S3). In addition, each CsHsp gene family could be allocated into the subfamilies according to the predicted subcellular location of the protein. For example, 130 Hsp20s from cucumber, Arabidopsis, rice and tomato, were divided into 17 different subfamilies, including CI (Cytosol I) (42 genes), CII (7 genes), CIII (3 genes), CIV (3 genes), CV (4 genes), CVI (4 genes), CVII (3 genes), CVIII (10 genes), CIX (5 genes), CX (9 genes), CXI (5 genes), MI (mitochondria I) (6 genes), MII (7 genes), MIII (4 genes), P (plastids) (7 genes), Po (peroxisome) (3 genes) and ER (endoplasmic reticulum) (8 genes) (Fig. S4). The Hsp60 family consisted of four subfamilies. According to the prediction, 24 and 6 Hsp60s of the I and IV subfamilies were mainly located in cytoplasm, 9 Hsp60s belonging to the II subfamily in mitochondria, and 15 Hsp60s of the III subfamily in chloroplast (Fig. S5). The 81 Hsp70s were divided into seven groups (I to VII), among which the I to V groups were clustered in the DnaK subfamily, and the VI and VII groups were clustered in the HSP110 / SSE subfamily (Fig. S6). The group I contained four CsHsp70s, which was the largest group mainly located in cytoplasm. Only CsHsp70-8 belonged to the group II, being involved in the metabolism of endoplasmic reticulum. The two members (CsHsp70-2 and CsHsp70-11) of group III function mainly in chloroplast and only CsHsp70-5 belonging to the group IV participates in the metabolism in mitochondrion. CsHsp70-3 belonged to the group V, being widely distributed in various subcellular regions (cytoplasm, chloroplast, mitochondria, plastid, and unknown regions). The Hsp90 family could be divided into five classes, among which the members of class I (CsHsp90-1 and CsHsp90-2) and II (CsHsp90-4) are mainly effective in the cytoplasm (Fig. S7). According to the prediction of subcellular localization, the members of class III (CsHsp90-3), IV (CsHsp90-6), and V (CsHsp90-5) play roles in the endoplasmic reticulum, mitochondria, and chloroplast, respectively. The Hsp100 family consisted of four groups, in which the members of groups I (CsHsp100-5) and IV (CsHsp100-1 and CsHsp100-4) produce marked effects in chloroplast, and the members of groups II (CsHsp100-6) and III (CsHsp100-2 and CsHsp100-3) make contributions to the homeostasis of mitochondria and cytoplasm, respectively (Fig. S8). Although the roles of most heat shock genes in cucumber are yet to be elucidated, Hsf and Hsp genes with conserved functions in different plants might show a tendency to aggregate to the same subgroup and have a recent common evolutionary origin.

Chromosomal location and synteny analysis of CsHsf and CsHsp genes
Although CsHsf and CsHsp genes are distributed on the 7 chromosomes of cucumber, the distribution appears uneven (Fig. 2). A relatively less density of CsHsf and CsHsp genes was found on chromosome 4 (10 genes), 6 (9 genes) and 7 (6 genes), while more CsHsf and CsHsp genes are located on chromosome 1 (22 genes), 2 (12 genes), 3 (18 genes) and 5 (18 genes), and most of CsHsps are distributed at both ends of chromosomes.
During the process of plant evolution, gene duplication, especially tandem and segmental duplication events, being the main mechanism for the expansion of gene families, made great contributions to the diversity of gene families (Kotak et al. 2004;. Through the analysis of duplication events of CsHsf and CsHsp genes, we found that only CsHsp20 genes had tandem duplication. Among the 33 CsHsp20 genes, 16 (49%) CsHsp20 genes had tandem duplication events, resulting in the formation of 6 tandem duplication clusters (Fig. 2). On chromosome 1 and 3, three tandem duplication clusters were composed of three pairs of genes (CsHsp20-7/CsHsp20-8, CsHsp20-9/CsHsp20-10 and CsHsp20-14/CsHsp20-15). In addition, two groups of tandem duplicated genes were located on chromosome 1 and 5, each of which includes three genes (CsHsp20-3/CsHsp20-4/ CsHsp20-5 and CsHsp20-21/CsHsp20-22/CsHsp20-23). Only one tandem duplication cluster was composed of the four similar genes (CsHsp20-24/CsHsp20-25/CsHsp20-26/ CsHsp20-27), which are present on chromosome 5. The above results showed that tandem duplications greatly promoted the expansion of the CsHsp20 gene family. Besides tandem duplication events, using BlastP and MCScanX, we also identified 13 segmental duplication events including 10 CsHsf and 14 CsHsp genes (Fig. 2). Furthermore, to analyze the selection of the aboveduplicated gene pairs, the non-synonymous to synonymous substitution ratios (Ka/Ks) were calculated (Table S3). Ka and Ks values are used to analyze the selective pressure on a protein-encoding gene as well as the approximate date of duplication events. Ka/Ks ratio = 1 is commonly used to identify the genes under neutral mutation or no selection, and Ka/Ks > 1 indicates the genes evolved under positive selection, while Ka/Ks < 1 indicates the negative purifying selection. In this study, the Ka/Ks values of 22 pairs (96%) of duplicated genes were < 1, indicating that they had experienced strong purifying selection, and only one pair of tandem duplicated genes showed a Ka/Ks ratio > 1, suggesting that these genes may have experienced positive selection. The Ks values of these duplicated gene pairs ranged from 0.2153 to 6.1528, corresponding to divergence times of 16.41 to 468.96 Mya (Table S3).
To further infer the phylogenetic mechanism of the CsHsf and CsHsp families, we performed a synteny analysis of the heat shock genes in cucumber, Arabidopsis, and rice (Fig.  S9). A total of 52 pairs of heat shock genes showed the synonymous relationship between cucumber and Arabidopsis, followed by rice was 33. Many cucumber heat shock genes were homologous to both Arabidopsis and rice, and most of the homologous genes had Ka / Ks < 1 (Table S4), suggesting that these genes are essential in plant evolution and contribute greatly to maintaining the function of heat shock genes.

Analysis of putative cis-acting elements in the promoters of CsHsf and CsHsp genes
To identify potential cis-acting elements in the promoter regions of CsHsf and CsHsp genes, 1500 bp upstream sequences from translational start sites extracted from the cucumber genome database, were submitted to the PlantCARE database, and 12 types of hormone response elements and 9 types of stress-induced components were found (Table S5). Among the 95 genes, 60, 34, 55, 36, 50 and 38 (63%,36%,58%,38%,53% and 40%) genes have at least one type of abscisic acid (ABA)-responsive element, auxin (IAA)-responsive elements, ethylene (ER)-responsive elements, gibberellin (GA)-responsive elements, Jasmonic acid (MeJA)-responsive elements and salicylic acid (SA)responsive elements, respectively (Fig. 3). In order to verify whether these genes can respond to hormone treatment, we performed different hormone treatments on cucumber plants, including ABA, MeJA, SA, and then detected the gene expression level by qRT-PCR. The results showed that if a gene's promoter has a certain hormone-responsive element, indeed the gene can respond to the corresponding hormone treatment (Fig. S10). For stress-induced components, one or more ARE(s) existed in 14 (61%) CsHsf and 61 (87%) CsHsp genes, being involved in Hypoxia-inducible response. WUN-motifs were presented in 9 (39%) CsHsf and 31 (43%) CsHsp genes, participating in wound response. What's more, heat shock elements (HSEs) were found in 12 (52%) CsHsf and 40 (56%) CsHsp genes. In addition, other elements, such as MBS, LTR, TC-rich repeats, and W-box, known to function as stress-induced components, were also found in CsHsf and CsHsp genes, being effective at variable positions and effectively responding to drought, low temperature, and biotic stresses, while GC-motif, which is useful in anoxic specific inducible response, only existed in 4 CsHsp genes (Fig. 3). The above analysis of cis-elements showed that CsHsf and CsHsp genes might respond rapidly to different stress conditions to reduce the damage caused by unfavorable environments, through maintaining physiological and metabolic balance for the normal growth of cucumber.

Expression patterns of CsHsf and CsHsp genes in different tissues
To address the expression patterns of CsHsf and CsHsp genes in different tissues, we analyzed the cucumber Illumina RNA-seq data obtained from the Cucurbit Genomics Database (http:// cucur bitge nomics. org/ rnaseq/ home), and obtained the expression patterns of CsHsf and CsHsp genes in different tissues including root, stem, leaf, male flower, female flower, ovary(unexpanded), expanded ovary (fertilized), expanded ovary (unfertilized), and tendril (Fig. 4). The expression patterns of CsHsf and CsHsp genes varied largely in different tissues. Among the CsHsfA family, CsHsf-9 (CsHsfA1) and CsHsf-7 (CsHsfA2) were expressed at relatively high levels in all tissues, while CsHsf-4 (CsHsfA9) was not expressed in all tissues. Especially, CsHsf-23 (CsHsfA7) was more highly expressed in the tendril compared with other tissues, indicating that it might function in tendril. In CsHsfB family, CsHsf-12 and CsHsf-17 were expressed at high abundances in all tissues. In contrast, CsHsf-15 (CsHsfB3) was not expressed in all tissues. Notably, CsHsf-20 (CsHsfB1) more highly expressed in the root than in other tissues, indicating that it might play an important role in root. As for CsHsp genes, except for that CsHsp20-6, CsHsp20-24, CsHsp20-26, and CsHsp20-30 were almost not expressed in any tissue or organ, most of the CsHsp genes were expressed in at least one tissue. Almost all of the CsHsp60s had low transcript levels in male flowers. Some genes had high levels in tendril, such as CsHsp20-4, CsHsp20-5 and CsHsp70-10. The results implied that these genes might play different roles in the growth and development of cucumber plants.

Expression profiles of CsHsf and CsHsp genes in response to abiotic and biotic stresses
To explore the responses of CsHsf and CsHsp genes to biotic and abiotic stresses, the expression patterns of CsHsf and CsHsp genes in response to stresses were investigated using three sets of RNA-seq data. Our RNA-seq data were used to analyze the responses of CsHsf and CsHsp genes to heat stress, and the RNA-seq data response to NaCl and downy mildew stresses were obtained from the Cucurbit Genomics Database (http:// cucur bitge nomics. org/ rnaseq/ home) for the expression pattern of CsHsf and CsHsp genes. According to the log2 (FPKM + 1) value, the expression of many CsHsf and CsHsp genes was increased after heat treatment (Fig. 5). Interestingly, CsHsf and CsHsp genes up-regulated under heat stress were usually also up-regulated under NaCl treatment, indicating that cucumber share similar responses to heat and salt stresses. The expression of CsHsf-7 (HsfA2) increased sharply after 3 h of heat treatment compared with the control, indicating that CsHsf-7 is a very sensitive factor during the heat stress response in cucumber. Previous studies have shown that HsfA1 is a master regulator that triggers the thermal response, leading to the acquired thermotolerance in tomatoes and Arabidopsis, but it is not induced by heat stress (Mishra et al. 2002;Yoshida et al. 2011). Here, CsHsf-9 (CsHsfA1) has the same expression pattern with AtHsfA1 and SlHsfA1 (Fig. 5a), suggesting that CsHsf-9 might also be a master regulator in cucumber heat stress tolerance. The expression level of most CsHsp20s increased significantly in the first 3 h after heat treatment compared with the control and then decreased gradually in the next 3 h (Fig. 5b). After heat or NaCl treatment, the expression level of many CsHsp60s slightly decreased (Fig. 5c). The infection of downy mildew reduced the expression level of most CsHsfs and CsHsps in cucumber (Fig. 5), suggesting that CsHsfs and CsHsps might play contrary roles between abiotic and biotic stresses.

Potential protein-protein interactions between CsHsfs and CsHsps
To further explore the possible protein-protein interactions between CsHsfs and CsHsps, an interaction network was constructed using the STRING program. As one of the most important heat shock transcription factors, CsHsf-7 have potential interactions with CsHsp20s, CsHsp70s, CsH-sp90s, and CsHsp100s, implying that CsHsf-7 is an activator of the downstream CsHsps and acts as a key regulator to improving adaptability of cucumber to various stresses (Fig. 6). According to the prediction, there are strong interactions between CsHsf-9 and many CsHp70s (CsHsp70-6, CsHsp70-9, and CsHsp70-10), reflecting previous studies that Hsp70 and Hsp90 interact with HsfA1 under normal conditions and inhibit HsfA1 activity (Hahn et al. 2011;Ohama et al. 2017). In addition to the potential regulatory relationship between CsHsfs and CsHsps, we also detected many other interactions among CsHsps of different subfamilies, such as interactions among CsHsp60, CsHsp70, and CsHsp90, suggesting that the CsHsps of different subfamilies might also be activated or inhibited through interactions with each other when cucumber was subjected to various stresses.

Discussion
The high temperature usually causes the expression changes of related genes in organisms (Ohama et al. 2017). Among the changed genes, Hsps encode a type of conserved proteins, which is synthesized rapidly by organisms under heat stress, and was first found in the salivary glands of Drosophila melanogaster (Tissiéres et al. 1974). Studies have shown that when plants are subjected to heat stress, they can develop tolerance to lethal high temperatures, in which Hsps play important roles. In addition, low temperature, drought, heavy metals, salinization, and ABA can induce plants to produce Hsps (Wang et al. 2004). The expression of Hsps is regulated by heat shock factors (Hsfs), which are the central regulators of Hsps expression and participates in heat stress response (Guo et al. 2016). Therefore, it is important to clarify the number, tissue specific expression and expression pattern induced by stresses of CsHsfs and CsHsps for understanding the mechanism of cucumber stress tolerance, especially heat tolerance. In this study, we identified 95 heat shock genes composed of 23 CsHsfs and 72 CsHsps (33 CsHsp20s,15 CsHsp60s,12 CsHsp70s,6 CsHsp90s, in cucumber, and then analyzed their gene structure, conserved motifs, phylogenetic relationships, chromosome distribution, duplication events, cis-acting elements, expression patterns under stress and potential protein-protein interaction among CsHsfs and CsHsps. Phylogenetic analysis is often used to provide insights into the gene evolutionary relationships of species and to help to identify orthologs between species and paralogs within species. Based on the full-length protein sequences of Arabidopsis, rice, tomato, and cucumber, we constructed unrooted phylogenetic trees of Hsps and Hsfs, respectively. Though the total number of Hsf genes was similar to that of Arabidopsis and tomato (Guo et al. 2008), the members of some specific Hsf subclasses in cucumber were different from the other two species. For instance, the number of subclass HsfA1 members in cucumber was less, only CsHsf-9 belongs to the HsfA1 subclass, but the number in subclass HsfA4 was more than that in tomato and Arabidopsis (Fig.  S3). Most of CsHsp20 genes were classified into nucleocytoplasmic subfamilies (Fig.S4), which are similar to the distribution in Arabidopsis, rice, and tomato (Sarkar et al. 2009;Scharf et al. 2001;Yu et al. 2016). Among these subfamilies, CI was the largest subfamily, containing 14 CsHsp20 genes. Since the proteins are mainly transported to cytoplasm (Nelson et al. 2014), we speculated that cytoplasm may be the main place for Hsp20 proteins as molecular chaperones to interact with the target proteins to regulate their folding, positioning, accumulation and degradation. Hsp70 gene family was divided into 7 groups, and each group contains Hsp70s from Arabidopsis, rice, and cucumber, indicating that the Hsp70 family genes occurred in the early stage of plant evolution, earlier than the divergence of dicots and monocots (Fig. S6).
In terms of intron number, intron, and exon length, the most closely related members of the same Hsf or Hsp subfamily have similar gene structures (Fig. 1). The number of introns usually affects the activity of gene transcription regulation (Le Hir et al. 2003). Some studies suggest that under various stress conditions, the splicing mechanism of RNA is often disturbed, and transcripts without introns can be transferred from the nucleus to the cytoplasm rapidly without splicing, which improves the response efficiency of plants, thus leading to the tendency to select intronless genes during evolution (Le Hir et al. 2003). The fewer introns the Fig. 4 Heat map of the expression profiles of CsHsf and CsHsp genes in different tissues. RNA-seq relative expression data from 9 cucumber tissues including root, stem, leaf, male flower, female flower, ovary(unexpanded), expanded ovary (fertilized), expanded ovary (unfertilized), and tendril were used to reconstruct the expression patterns of CsHsf and CsHsp genes. The raw data were obtained from the Cucurbit Genomics Database (http:// cucur bitge nomics. org/ rnaseq/ home, PRJNA80169). FPKM values of CsHsf and CsHsp genes were transformed by log2 (FPKM + 1) and the heat map was constructed by HemI software. Heat maps are presented in blue/yellow/red colors that represent low/medium/high expression, respectively. Color scales representing the relative expression values are shown on the upper of heat map. The normal relative expression levels of 23 CsHsf and 72 CsHsp genes in different tissues are shown in Table S6 ◂ CsHsp100-1 CsHsp100-3 CsHsp100-4 CsHsp100-2 CsHsp100-5  (Chung et al. 2006). Comparing with other Hsp families, the CsHsp20 family genes are shorter in length and have fewer introns. 39% of CsHsp20s were intronless (Fig. 1), and most of them were highly induced under high-temperature stress (Fig. 5), which may prove the above standpoints. The expansion of gene families in angiosperms may be the result of gene duplication and whole-genome duplication (WGD) at different time points in the evolutionary process . Like most gene families, Hsfs and Hsps seem to have undergone a complex evolutionary process (Swindell et al. 2007). The diversity of Hsf and Hsp genes in cucumber may be the result of multiple gene duplication events including multiple segmental and tandem duplications. In this study, 23 CsHsf and 72 CsHsp genes were found to be unevenly distributed on 7 cucumber chromosomes, and the gene duplication analysis showed that 10 CsHsf and 23 CsHsp genes were duplicated in the Cucumber genome, forming 13 segmentally duplicated gene pairs and 5 tandem duplicated gene groups (Fig. 2), in cucumber. Although the Hsp genes identified in cucumber was relatively fewer than that in rice (Hu et al. 2009), all the subfamilies existed in the cucumber genome, and a total of 52 pairs of heat shock genes showed a syntenic relationship between cucumber and Arabidopsis, and 33 pairs between cucumber and rice (Fig. S9), which shows that the heat stress response (HSR) system is conserved in composition, but is differentiated in duplicated gene numbers. The Ka / Ks value of most duplicated gene pairs was less than one, indicating that most of them undergone purifying selection (Table S3  and Table S4), which might be the key point for maintaining the conserved structure of Hsf and Hsp genes throughout the evolution. Similar expression patterns under heat stress were found within the tandem duplicated gene groups (Fig. 5), indicating that tandem duplicated CsHsp20 genes with similar structure should have functional redundancy.
A large of studies have shown that Hsf and Hsp genes produced a marked effect in response to a variety of abiotic stresses (Guo et al. 2016;Scharf et al. 2012) In this study, we analyzed the dynamic expression levels of CsHsf and CsHsp genes under heat, NaCl, and downy mildew stresses by RNA-seq (Fig. 5). We have observed that most CsHsf and CsHsp genes were induced by heat stress, but not all. Interestingly, CsHsf and CsHsp genes up-regulated under heat stress were usually also up-regulated under NaCl treatment, indicating that cucumber has similar responsive mechanism between heat and salt stresses. The results showed that the expression of most heat shock genes was up-regulated, while some CsHsp60 genes were downregulated after heat treatments, which may be related to mitochondrial activity and programmed cell death (PCD) (Rikhvanov et al. 2007). Different Hsps were induced to different levels under heat stress, indicating that CsHsps in different subfamilies may have distinct regulatory properties. In addition, the CsHsf and CsHsp gene promoters have multiple stress response components (Fig. 3), and there are many potential protein-protein interactions among CsHsfs and CsHsps (Fig. 6). These provide many chances to form a complex regulatory network to respond to different stress conditions in cucumber.
In the end, we believe that CsHsf-7 is a very ideal candidate gene for plant breeding of heat and salt tolerance. First, from the results of RNA-seq (Fig. 5), among all 23 hsfs, CsHsf-7 is the most strongly induced one by heat and salt stress. At the same time, from the perspective of phylogenetic relationship (Fig. S3), CsHsf-7 belongs to the A2 subfamily, and its homologous gene, AtHSFA2 in Arabidopsis plays an important role in responding to multiple stress responses. For example, the overexpression of AtHSFA2 in Arabidopsis increased heat, salt and osmotic stress tolerance (Guo et al. 2016). Therefore, we could screen the lines with higher induced CsHsf-7 among various cucumber varieties, and compare the difference of CsHsf-7 promotor based on the expression of CsHsf-7. Then, the marker linked with the highly induced CsHsf-7 could be developed for plant breeding of abiotic stress tolerance. In addition, in terms of the possible protein-protein interaction between CsHsf and CsHsp (Fig. 6), CsHsf-7 had potential interactions with multiple CsHsps, implying that CsHsf-7 might co-operate with CsHsps to play roles in abiotic stress tolerance. Then, the CsHsps, such as CsHsp70-1, 70-7 and 90-4, could be driven under CsHsf-7 promotor for their coordination in abiotic stress tolerance. In addition, CRISPR-Cas9 or RNA interference techniques could be used to knock out or knock down the genes inhibited by P. cubensis, such as CsHsf-5 and -12 (Fig. 5), to increase the resistance of cucumber to downy mildew. Of course, their specific functions need to be verified by subsequent transgenic experiments.  5 Expression profiles of CsHsf and CsHsp genes under heat, NaCl and downy mildew stresses. Our own RNA-seq data were used to analyze the response of CsHsf and CsHsp genes to heat stress, and the RNA-seq data for CsHsf and CsHsp genes in response to NaCl and downy mildew stresses were obtained from the Cucurbit Genomics Database (http:// cucur bitge nomics. org/ rnaseq/ home, PRJNA437579 and PRJNA388584). FPKM values of CsHsf and CsHsp genes were transformed by log2 (FPKM + 1) and the heat map was constructed by HemI software. Heat maps are presented in blue/white/red colors that represent low/medium/high expression, respectively. Color scales representing the relative expression values are shown on the upper of heat map. The normal relative expression levels of 23 CsHsf and 72 CsHsp genes under heat, NaCl and downy mildew stresses are shown in Table S7 ◂

Conclusions
In summary, we identified 95 members of cucumber Hsf and Hsp (including Hsp20, Hsp60, Hsp70, Hsp90, and Hsp100) gene families in this study, and then analyzed their gene structures, conserved motifs, phylogenetic relationships, chromosome distribution, duplication events, cis-acting elements, and potential protein-protein interaction. Then, their tissue specific expression patterns and induced expression patterns under heat, NaCl and downy mildew stresses were also determined using RNA-seq data.
This study provided comprehensive information on the Hsf and Hsp gene families in cucumber and will aid further research on the functions of CsHsf and CsHsp genes for improving cucumber tolerance to biotic and abiotic stresses.