Identification of RPD3 genes in nine species
In this study, a total of 108 RPD3 protein sequences from nine species were identified after eliminating redundant sequences, and they are named by the position on the chromosome. The corresponding relationship between gene ID number and gene name is shown in Additional file 1: Table S1. A total of 18 genes (GhHDA1-GhHDA18) containing Hist_deacetyl (PF00850) domains were identified from G. hirsutum; 9 genes were located on the At genome, and 9 genes were mapped on the Dt genome. Furthermore, 18 genes (GbHDA1-GbHDA18) from G. barbadense, 9 genes (GaHDA1-GaHDA9) from G. aboreum, and 9 genes (GrHDA1-GrHDA9) from G. raimondii were detected. Tetraploid cotton possessed twice as many RPD3 genes as diploid cotton, indicating that no RPD3 cotton gene was lost in the process of polyploidy. The numbers of RPD3 genes in the other five species were 10 (Arabidopsis), 14 (Oryza sativa L.), 11 (Populus trichocarpa), 8 (Theobroma cacao), and 11 (Zea mays L.). The GhRPD3 protein length ranged from 232 to 635 aa with an average of 459 aa. The physicochemical parameters showed that the isoelectric point (pl) of GhRPD3 proteins varied from 4.47 to 8.65 with an average value of 5.68, and the molecular weight of GhRPD3 proteins varied from 25.79 to 73.01 kDa with an average value of 51.21 kDa. The subcellular localization results indicated that most of the GhRPD3 genes were located in cytoplasmic (10) or nuclear (8), suggesting that GhRPD3 genes might possess multiple regulatory functions (Table 1). The predicted length, pI, MW and subcellular localization of the RPD3 proteins in other eight species are shown in Additional file 1: Table S1.
Phylogenetic analysis of the RPD3 gene family
A total of 108 identified RPD3 protein sequences from G. raimondii (9), G. arboreum (9), G. hirsutum (18), G. barbadense (18), A. thaliana (10), T. cacao (8), Oryza sativa (14), Zea Mays (11) and P. trichocarpa (11) were employed to construct an unrooted phylogenetic tree using the neighbor-joining method for investigating the evolutionary relationships of RPD3 proteins. The RPD3 proteins were phylogenetically classified into 4 subfamilies (Class I, Class II, Class III, and unclassified) according to the formulated subfamilies in Arabidopsis [7]. The Class I subgroup was the largest subfamily with 49 RPD3 genes, whereas the Class III subgroup has the fewest members, only containing one gene in seven diploid species genomes and two genes in two tetraploid cotton genomes (Fig. 1). Among these four classes, each subfamily contained RPD3 genes from all nine species, indicating this gene family was relatively conserved in different species during evolution.
Exon-intron structure and conserved motif analysis
The domains of the RPD3 sequences in cotton were investigated and exhibited according to the results of the SMART database using TBtools, revealing that all cotton RPD3 genes contained a Hist_deacetyl domain (Additional file 2: Table S2 and Additional file 3: Figure S1). An unrooted phylogenetic tree with the predicted cotton RPD3 genes was constructed (Fig. 2a), and then exon-intron structure (Fig. 2b) and conserved motifs (Fig. 2c) were analyzed to better understand the similarity and differences of cotton RPD3 members. The results showed that the length of RPD3 cotton genes was relatively conserved in Class I and Class III, but there were twelve longer sequences in Class II and the unclassified group. The RPD3 cotton genes included from 3 to 17 exons and most RPD3 genes (48/54) contained more than five exons (Additional file 4: Table S3), which might be associated with the diversification of their functions. In terms of the distribution of motifs, most RPD3 cotton genes belonging to the same subfamily showed a similar motif composition, except in the unclassified group (Fig. 2c). Most Class I subfamily members contained 9 motifs, whereas GrHDA5 and GhHDA4 contained 4 and 6 motifs, respectively. Class III subfamily genes had three or four motifs, and most Class II subfamily members possessed 7 motifs, except for GhHDA12 with 6 motifs. There were differences in the exon-intron structure and motif arrangement among the four categories, but they were highly conserved on the same branches, indicating that the RPD3 members classified into the same branch might perform a relatively conserved function in cotton growth and development.
Chromosomal distribution, gene duplication and selection pressure
The chromosomal distributions of GrRPD3, GaRPD3, GbRPD3, and GhRPD3 genes were visualized according to the genomic position of 54 cotton RPD3 genes (Additional file 5: Table S4 and Fig. 3). In G. hirsutum, 18 GhRPD3 genes were unevenly mapped on 13 chromosomes. A03 contained the most GhRPD3 genes (3), whereas the other 12 chromosomes only contained one or two GhRPD3 genes (Fig. 3a). The chromosomal distribution of 18 GbRPD3 genes in G. barbadense was similar to that of GhRPD3 genes in G. hirsutum (Fig. 3b). In G. arboreum, 9 GaRPD3 genes were unevenly located on 6 chromosomes. Chr01 and Chr13 contained three and two GaRPD3 genes, respectively, and the other 4 chromosomes contained only one GaRPD3 gene (Fig. 3c). In G. raimondii, the chromosomal distribution of 9 GrRPD3 genes was highly consistent with the corresponding D subgenome of G. hirsutum (Fig. 3d), showing conserved numbers and chromosomal distribution of RPD3 genes between diploid and tetraploid cotton species. In addition, the lopsided chromosomal distribution of the cotton RPD3 genes indicated that genetic variation occurred during evolution. Notably, most of the RPD3 genes were distributed on the opposite ends of the chromosomes in four cotton species (Fig. 3).
In general, tandem and segmental duplication are two of the main reasons for gene family generation during evolution [21]. The analysis of gene duplication indicated that all RPD3 family members were amplified only through segmental duplication (Additional file 6: Table S5), suggesting that segmental duplication played a vital role in the evolution of the RPD3 gene family. The homologous gene pairs obtained by collinearity analysis among RPD3 genes in G. arboreum, G. raimondii, and G. hirsutum were visualized using circular maps (Fig. 4). The Ka/Ks ratios of most homologous gene pairs were lower than one, indicating that purified selection was essential during the evolution of cotton RPD3 genes, whereas the Ka/Ks ratios of two gene pairs (GhHDA2/GaHDA3 and GhHDA6/GaHDA6) were more than 1, suggesting that these two pairs might have experienced positive selection pressure. The study also predicted the occurrence time of segmentally duplicated RPD3 gene pairs by the formula “ t=Ks/2r” (r=2.6X10-9) [22]. Except for the GhHDA6/GaHDA6 gene pair, the other segmental duplication events of three cotton species might have occurred 0.6 to 144.44 million years ago (MYA) with an average time of 18.39 million years ago (Additional file 6: Table S5).
Analysis of cis-elements in predicted promoter regions of GhRPD3
To explore the possible regulatory functions of GhRPD3 genes under various environmental stresses and hormone regulation pathways, the 2,000-bp promoter regions of 18 GhRPD3 genes were submitted to the PlantCARE database for the identification of putative stress-associated and plant hormone-related cis-elements. A total of 9 kinds of elements related to plant hormones, containing AuxRE-core (auxin), TGA-element (auxin), P-box (gibberellin), TATC-box (gibberellin), GARE-motif (gibberellin), CGTAC-motif (MeJA), TGACG-motif (MeJA), TCA-element (SA), and ABRE (ABA), and 4 kinds of elements responding to stresses, including TC-rich repeats (defense and stress responsiveness), MBS(drought), WUN-motif (wound) and LTR (cold stress), were predicted in the promoters of GhRPD3 genes. As shown in Fig. 5, the promoters of some GhRPD3 genes contained various hormone-responsive and stress-responsive components, such as GhHDA2 (2 MBS, 2 LTR, 2 TC-rich repeats, 1 GARE-motif, 2 ABRE, 1 TGACG-motif) and GhHDA13 (1 MBS, 1 LTR, 1 TC-rich repeats, 1 AuxRE-core, 2 GARE-motif, 1 TCA-element, 4 ABRE, 2 TGACG-motif). Among the 18 GhRPD3 genes, there are large numbers of light-responsive elements distributed in their promoter regions (Additional file 7: Table S6). In addition, MeJA-responsive and ABA-responsive elements are more common than other hormone-related elements (Additional file 8: Figure S2). These results revealed that GhRPD3 genes might be involved in MeJA and ABA hormone signaling pathways as well as response to environmental stresses.
Expression profiles of GhRPD3 genes in different tissues and under different abiotic stresses
To understand the potential functions of GhPRD3 genes in the growth and development of cotton, we studied their expression in various cotton tissues, including the anther, pistil, bract, sepal, petal, filament, torus, root, leaf, stem, ovules, and fibers, using publicly available transcriptomic data provided by Hu et al. [23]. Transcripts of all the GhRPD3 genes were detected in at least three tissues with fragments per kilobase million (FPKM) ≥ 1. Furthermore, ten genes exhibited high expression levels in all selected tissues (Additional file 9: Table S7). These results indicated that GhRPD3 genes are widely expressed in both reproductive organs and vegetative organs and thus might have multiple biological functions. After log2-conversion of FPKM values, the expression profiles of GhRPD3 genes in different tissues are shown in Fig. 6a. Seven GhRPD3 genes exhibited relatively high expression levels in at least eight tissues (log2-transformed FPKM value≥2.6); in particular, one pair of homologous genes (GhHDA1/GhHDA10) showed a high expression level in all the tissues with a similar expression pattern. Nevertheless, three GhRPD3 genes (GhHDA4, GhHDA14, GhHDA18) were expressed at relatively low levels in at least eight tissues (log2-transformed FPKM value<1), of which GhHDA14 showed the lower expression in all tissues except for the pistil. These homologous gene pairs (GhHDA1/GhHDA10, GhHDA4/GhHDA11, GhHDA2/GhHDA13, GhHDA6/GhHDA15, GhHDA7/GhHDA16, and GhHDA9/GhHDA18) were located on At and Dt subgenomes and exhibited similar expression patterns. For example, homologous gene pairs (GhHDA4/GhHDA11 and GhHDA9/GhHDA18) showed relatively low expression in all twelve tissues. The gene pair GhHDA2/GhHDA13 exhibited relatively high expression in torus and ovule but relatively low expression in petals (Fig. 6a).
Based on the analysis of cis-elements in promoter regions and previous reports on RPD3 genes in other plants, GhRPD3 gens might respond to abiotic stresses. To test this hypothesis, we investigated the expression characteristics of 18 GhRPD3 genes under cold, heat, PEG, and salt treatments using available transcriptomic data [23] (Fig. 6b). The expression of most GhRPD3 genes were induced by the four stresses to varying degrees. GhHDA1, GhHDA2, GhHDA6, GhHDA10, GhHDA12, and GhHDA18 showed upregulated expression under four stress treatments. However, one gene (GhHDA4) exhibited marked downregulation in the presence of the four abiotic stresses. Some genes can respond to one specific abiotic stress. For example, the expression of GhHDA13 and GhHDA16 was significantly induced by PEG treatment. Four genes (GhHDA7, GhHDA11, GhHDA5) showed upregulated expression under heat treatment. The expression of GhHDA9 was significantly upregulated under cold and salt treatments. According to the results, we can conclude that GhRPD3 genes play an essential role in response to abiotic stresses.
Characterization of GhRPD3 genes expression during flower bud differentiation
To explore expression differences of GhRPD3 genes between early-maturity and late-maturity cottons during flower bud differentiation, we selected nine genes showing relatively high expression in floral organ tissues (anther, pistil, bract, sepal, petal, filament and torus) for qRT-PCR. The buds of an early-maturity variety (CCRI50) and a late-maturity variety (GX11) from the one-leaf to five-leaf stage were used for qRT-PCR (Fig. 7). The results revealed that more than half of these genes (5/9) possessed relatively higher expression in early-maturity cotton compared with late-maturity cotton during flower bud differentiation. GhHDA5 showed marked differences at the two-leaf and three-leaf stages, and these two stages were regarded as the important period of flower bud differentiation. A homologous gene pair (GhHDA6/GhHDA15) located on At and Dt respectively, showed the same expression trend. Both of them presented the highest expression at three-leaf stage and then exhibited downregulated expression in next two stages in CCRI50. In addition, all nine genes showed relatively higher expression at the two-leaf or three-leaf stage in CCRI50 compared with GX11. The results showed that GhRPD3 genes are associated with the early maturity of cotton.
Responses of GhRPD3 genes to MeJA and ABA treatment
MeJA and ABA play important roles in plant stress resistance. To further explore the possible functions of GhRPD3 genes, we selected the GhRPD3 genes containing MeJA- and ABA-responsive elements in the predicted promotors to analyze their expression characteristics under MeJA and ABA treatment by qRT-PCR (Fig. 8 and Fig. 9). Most GhRPD3 genes (8/13) were markedly upregulated at 9 h after MeJA treatment. Three genes (GhHDA7, GhHDA13, and GhHDA18) exhibited significantly upregulated expression at three or more time points, whereas four genes (GhHDA2, GhHDA8, GhHDA9, and GhHDA11) showed marked transcriptional downregulation at least three time points after MeJA treatment (Fig. 8). More than half of the GhRPD3 genes (6/11) were significantly upregulated at 9 h after ABA treatment. Three GhRPD3 genes (GhHDA14, GhHDA15, and GhHDA18) showed relatively high expression at three or more time points, whereas three GhRPD3 genes (GhHDA10, GhHDA11, and GhHDA17) showed early downregulated and then upregulated expression patterns under ABA treatment (Fig. 9). The results showed that the exogenous application of MeJA and ABA significantly induced the transcription of most GhRPD3 genes containing MeJA-responsive and ABRE elements in their promoter regions.