Genome-wide identification of PHD proteins in cotton
Based on the homology of protein sequences, 108, 52, and 55 PHD proteins were identified in three cotton species G. hirsutum, G. arboreum, and G. raimondii, respectively. In addition, 39 and 43 PHD proteins were identified in Arabidopsis and rice, respectively (Table S1). Among 108 GhPHD proteins, 56 members belong to the At subgenome and 52 members belong to the Dt subgenome. The predicted biophysical characteristic of GhPHDs (Table 1) indicates that the length of GhPHD proteins ranges from 159 aa (GhPHD28) to 2231 aa (GhPHD39) with an average length of 741 aa. Moreover, the molecular weight of GhPHD proteins ranges from 17.76 kD (GhPHD28) to 247.42 kD (GhPHD39) with an average value of 93.09 kD. The isoelectric point (pI) of GhPHD proteins ranges from 4.58 (GhPHD38) to 10.41 (GhPHD103) with an average value of 6.89. Furthermore, the predicted subcellular localization indicated that 93 GhPHD proteins are located in nucleus, ten in cytoplasm, and five are extracellular.
Phylogenetic analysis, chromosomal location, and gene duplication
In order to understand the phylogenetic relationship of PHD proteins in rice, Arabidopsis, and cotton, we constructed a NJ phylogenetic tree and classified PHD proteins into five groups (A-E) (Fig. 1). Among them, most of the orthologous PHD proteins between the diploid and allotetraploid cotton are grouped in same clade exhibiting maximum homology in phylogenetic relationship. Each group contains PHD proteins of these five species, of which group A and D are the first and second largest groups, containing 97 and 79 members, respectively. While, there are relatively few PHD members in groups B, C, and E. Chromosome location analysis showed that 108 GhPHD genes are positioned on 26 chromosomes, including 13 chromosomes from the At subgenome and 13 chromosomes from the Dt subgenome (Fig. 2 and Table S2). Deeper insights indicated that At_05, At_07, and Dt_05 chromosomes contain more number of genes (eight GhPHD genes on each) and display a dense distribution at the top. However, some chromosomes contain only two GhPHD genes, such as At_10, At_11, Dt_03, and Dt_11.
We further investigated the whole genome duplication (WGD) event experienced by GhPHD genes. As a result, 73 GhPHD gene pairs depict segmental duplication and four gene pairs show tandem duplication events (Table 2), indicating that WGD is the main contributor of GhPHD gene family expansion. Duplication gene pairs may have undergone three alternative fates during the evolution process, namely non-functionalization, neo-functionalization, and sub-functionalization . In order to study the evolutionary history of GhPHD genes, the Ka/Ks calculator 2.0 is used to calculate the synonymous and non-synonymous substitution rates. The Ka/Ks ratio of 76 duplicated gene pairs is less than 1, indicating that GhPHD genes underwent purification selection pressure with limited functional divergence. However, there is only one gene pair with the Ka/Ks greater than 1, indicating the occurrence of positive selection pressure. Collectively, these results indicated that the great contribution of purification selection pressure in the functional maintenance of GhPHD genes in upland cotton.
Gene structure and conserved motifs analysis
To better understand the similarity and diversity of GhPHD proteins in upland cotton, we analyzed the phylogenetic tree, exon-intron structure, and conserved motif. Phylogenetic tree grouped GhPHD proteins according to protein homology, conserved gene structure, and motif distribution (Fig. 3). GhPHD49 shows the longest genomic sequence with 26 exons, while GhPHD12 displays the shortest genomic sequence with only two exons (Fig. 3 and Table S3). Furthermore, a total of three motifs are identified in all GhPHD proteins, and all GhPHD proteins have a typical PHD domain (i.e., motif 1). Phylogenetic tree showed that 21 GhPHD proteins are clustered in a clade. Except for GhPHD28, all other GhPHD proteins contain three motifs with similar gene structure and motif distribution (Fig. 3).
Protein sequence alignment shows that GhPHD proteins have a typical Cys4-His-Cys3 motif, which consists of about 60 amino acids and is accompanied by nine conserved amino acid residues (Fig. S1). The conserved histidine (H) is separated from the fourth conserved cysteine (C) by four amino acids and two amino acids from subsequent conserved cysteine (C) residue. The third and fourth conserved cysteine (C) before histidine (H) are separated by one or two amino acids, but the interval number between other conserved amino acids is uncertain. However, GhPHD17, GhPHD27, GhPHD71, and GhPHD81 exhibit maximum homology, but show less conserved PHD domain (Fig. 3 and Fig. S1).
Cis-acting element analysis
Many studies have showed that PHD genes are involved in various stress responses [30, 31, 46]. To elucidate the putative function of GhPHDs under different stresses, we first identified the cis-acting elements in the promoter region that respond to stresses and phytohormones. We identified many cis-acting elements that respond to ABA (ABRE), auxin (TGA and AuxRR-core), GA (TATC-box, P-box, CARE, and GARE), ethylene (ERE), SA (TCA), and MeJA (CGTCA). These results indicated that a total of 85 GhPHD genes are responsive to ethylene, followed by ABA, GA, and MeJA. 73 GhPHD genes have cis-acting elements that respond to three or more phytohormones. Interestingly, the promoters of GhPHD5, GhPHD47, GhPHD56, and GhPHD65 genes contain cis-elements that respond to the above six phytohormones. In addition, we found that many abiotic stresses response elements (TC-rich repeat, MBS, and LTR), circadian control elements, and light-responsive elements (G-box) are also present in the promoters of various GhPHD genes (Fig. 4 and Table S5). These results indicated that GhPHD genes may participate in various signal transduction pathways, such as phytohormones, light response, and abiotic stresses, and play important roles in regulating plant growth and development.
Tissue-specific expression pattern of GhPHD genes
To predict the physiological functions of GhPHD genes in cotton growth and development, we used the online transcriptome data to analyze the tissue-specific expression profile of GhPHD genes in different tissues such as root, stem, leaf, petal, stamen, pistil, ovule, and fiber. According to the expression features and hierarchical clustering (Fig. 5), GhPHD genes are mainly clustered into four groups (A-D). The nine GhPHD genes in group A are highly expressed in all tissues, indicating that they may play important roles in plant growth and development. In particular, GhPHD23 and GhPHD77 show maximum expression levels in ovule and fiber tissues, demonstrating that these two genes may be involved in the development of ovule and fiber. Further, 43 GhPHDs in group B show lower expression levels in all tissues, while six GhPHD genes (GhPHD56, GhPHD108, GhPHD40, GhPHD93, GhPHD19, and GhPHD73) are predominantly expressed in the early stage of ovule development, indicating that they may play important roles in ovule and seed development. Moreover, GhPHD genes in group C show higher expression levels in ovule. However, GhPHD genes in group D show poor expression in all observed tissues. These results indicated that GhPHDs may be involved in regulating cotton growth and development, especially in the development of ovule and fiber.
Identification of stress-related PHD genes in upland cotton
Analysis of the transcriptome data showed that 66 GhPHD genes have higher expression levels under heat, cold, salt, and drought treatments (Fig. S2). In order to further estimate the responses of GhPHDs under abiotic stresses, we treated four-week-old cotton seedlings with heat, cold, salt, and drought, and observed the relative expression level of 12 GhPHD genes (Fig. 6). The relative expression level of GhPHD18 is up-regulated under all stresses, indicating that GhPHD18 may be involved in multiple stresses response mechanisms. GhPHD23 is up-regulated only under heat treatment, indicating that GhPHD23 responds positively to heat stimuli. Further, GhPHD34, GhPHD40, and GhPHD43 are up-regulated after heat and salt treatment, while GhPHD80 and GhPHD88 are up-regulated after heat and drought tolerance at various time points. In addition, we found that GhPHD5 is up-regulated against salt and drought, while GhPHD72 and GhPHD107 are up-regulated against salt and heat, respectively. These results indicated that GhPHD genes may be involved in abiotic stress to improve plant tolerance in adverse environments.
Identification of GhPHD genes in response to phytohormones
To further determine whether GhPHD genes respond to phytohormones, we treated four-week-old cotton seedlings with GA, MeJA, IAA, SA, and BL, and identified changes in the relative expression of GhPHD genes (Fig. 7). The relative expression level of GhPHD5 increases significantly after MeJA, IAA, and BL treatment. While GhPHD5 shows higher expression after 0.5 hour after SA treatment indicating that GhPHD5 may respond to multiple phytohormones signal transduction pathway, which is consistent with the fact that GhPHD5 promoter contains cis-acting elements related to multiple phytohormones. GhPHD40 is significantly up-regulated under SA treatment, indicating that GhPHD40 responds positively to SA signal. Similarly, GhPHD43 is significantly up-regulated under all phytohormone treatments, especially under BL. The relative expression levels of GhPHD80 and GhPHD88 reach at peak after 0.5 hour of GA treatment. The relative expression level of GhPHD88 increases gradually under SA treatment. Moreover, GhPHD107 expression significantly increases to the maximum level after one hour of GA, IAA, and BL treatment. These results indicated that GhPHD genes are involved in regulating multiple phytohormone signal transduction pathways.
Co-expression network with functional modules for G. hirsutum and G. arboreum
Gene co-expression network analysis is a network diagram constructed on the basis of similarity of gene expression data, reflecting the relationship of expression regulation between genes . We analyzed the co-expression network of GhPHD genes using ccNET software, and predicted many co-expressed genes and interaction proteins (Table S6). Among these, GhPHD5 is positively co-expressed with a plant-specific DNA ligase, which is related to seed germination and DNA repair. In addition, GhPHD5 is also positively co-expressed with SLOMO protein, which is a F-box protein required for auxin homeostasis and the normal timing of lateral organ initiation at the shoot meristem  illustrating that GhPHD5 may be involved in the regulation of auxin signal transduction pathway, and mediates seed germination and organ formation to regulate plant growth and development. Similarly, GhPHD18 interacts with highly hydrophilic proteins that regulate FLC (Flowering locus C) expression  and shows positively co-expressed with SHAGGY-related kinases involved in meristem organization, indicating that GhPHD18 may affect the flowering time of meristem. Further, GhPHD34 negatively co-expressed with ERF (Ethylene response factor) subfamily B-1, participating in ethylene signaling pathway and responding to abiotic stresses. GhPHD107 positively co-expressed with ARF-GAP and ERF genes, and may be involved in the signal pathways of auxin and ethylene. More interestingly, we predicted many proteins that interact with GhPHD88, such as leucine-rich repeat protein kinase (LRRK), late embryogenesis abundant (LEA) protein, AP2/B3 transcription factor, R2R3 factor, DREB subfamily A-2, cellulose synthase, gibberellin-regulated family protein (GRP), and ethylene response factor (ERF) (Fig. 8A and Table S6), suggesting that GhPHD88 may be involved in many physiological processes such as plant growth and development, phytohormone signal transduction, and stress response. Further, Gene Ontology (GO) analysis of GhPHDs indicated that protein binding and zinc ion binding are the most abundant functional terms (Fig. 8B), which is consistent with the existing results that the cysteine residues exhibit high affinity for zinc ions (Zn2+), and Zn2+-cysteine complexes are key medium for protein structure, catalysis, and regulation .
In summary, GhPHDs were involved in regulating cotton growth and development, especially ovule and fiber development. Further, GhPHDs not only respond to multiple phytohormones signal transduction pathways, but also improve cotton’s tolerance to adverse environments such as heat, salt, and drought. Particularly, GhPHD5, GhPHD80, GhPHD88 are prominent in their responses. Combining the predicted results of co-expressed genes and interacting proteins, we inferred that phytohormones could improve plant tolerance to abiotic stresses through GhPHD genes and their cofactors, but their regulatory mechanism and interaction network still need further research.