Genome-wide identification of PHD proteins in cotton
Based on the homology of protein sequences, 108, 52, and 55 PHD proteins were identified in three cotton species G. hirsutum, G. arboreum, and G. raimondii, respectively. In addition, 39 and 43 PHD proteins were identified in Arabidopsis and rice, respectively (Table S1). Among 108 GhPHD proteins, 56 members belong to the At subgenome and 52 members belong to the Dt subgenome. The predicted biophysical characteristic of GhPHDs (Table 1) indicated that the length of GhPHD proteins ranges from 159 aa (GhPHD28) to 2231 aa (GhPHD39) with an average length of 741 aa. Moreover, the molecular weight of GhPHD proteins ranges from 17.76 kD (GhPHD28) to 247.42 kD (GhPHD39) with an average value of 93.09 kD. The isoelectric point (pI) of GhPHD proteins ranges from 4.58 (GhPHD38) to 10.41 (GhPHD103) with an average value of 6.89. Furthermore, the predicted subcellular localization indicated that 93 GhPHD proteins were located in nucleus, ten in cytoplasm, and five were extracellular.
Phylogenetic analysis, chromosomal location, and gene duplication
In order to understand the phylogenetic relationship of PHD proteins in rice, Arabidopsis, and cotton, we constructed a NJ phylogenetic tree and classified PHD proteins into five groups (A-E) (Fig. 1). Among them, most of the orthologous PHD proteins between the diploid and allotetraploid cotton were grouped in same clade exhibiting maximum homology in phylogenetic relationship. Each group contains PHD proteins of these five species, of which group A and D were the first and second largest groups, containing 97 and 79 members, respectively. While, group B, C, and E exhibited relatively fewer PHD members. Chromosome location analysis showed that 108 GhPHD genes were positioned on 26 chromosomes, including 13 chromosomes from the At subgenome and 13 chromosomes from the Dt subgenome (Fig. 2 and Table S2). Deeper insights indicated that At_05, At_07, and Dt_05 chromosomes contained more number of genes (eight GhPHD genes on each) and displayed a dense distribution at the top. While, some chromosomes such as At_10, At_11, Dt_03, and Dt_11 contained only two GhPHD genes on each.
We further investigated the whole-genome duplication (WGD) event experienced by GhPHD genes. As a result, 73 GhPHD gene pairs depicted segmental duplication and four gene pairs showed tandem duplication events (Table 2), indicating that WGD is the main contributor of GhPHD gene family expansion. Duplication gene pairs may have undergone three alternative fates during the evolution process, namely non-functionalization, neo-functionalization, and sub-functionalization . In order to study the evolutionary history of GhPHD genes, the Ka/Ks calculator 2.0 program was used to calculate the synonymous and non-synonymous substitution rates. The Ka/Ks ratio of 76 duplicated gene pairs was less than 1, indicating that GhPHD genes underwent purification selection pressure with limited functional divergence. However, only one gene pair Ka/Ks ratio was greater than 1 indicating the occurrence of positive selection pressure. Collectively, these results indicated that the great contribution of purification selection pressure in the functional maintenance of GhPHD genes in upland cotton.
Gene structure and conserved motifs analysis
To better understand the similarity and diversity of GhPHD proteins in upland cotton, we analyzed the phylogenetic tree, exon-intron structure, and conserved motif. Phylogenetic tree grouped GhPHD proteins according to protein homology, conserved gene structure, and motif distribution (Fig. 3). GhPHD49 showed longest genomic sequence with 26 exons, while GhPHD12 displayed shortest genomic sequence with only two exons (Fig. 3 and Table S3). Furthermore, a total of three motifs were identified in all GhPHD proteins, and all GhPHD proteins have a typical PHD domain (i.e., motif 1). Phylogenetic tree showed that 21 GhPHD proteins were clustered in a clade. Except for GhPHD28, all other GhPHD proteins contain three motifs with similar gene structure and motif distribution (Fig. 3).
The protein sequence alignment results showed that GhPHD proteins have a typical Cys4-His-Cys3 motif that is composed of about 60 amino acids, has along with nine conserved amino acid residues (Fig. S1). The conserved histidine (H) was separated from the fourth conserved cysteine (C) by four amino acids and two amino acids from subsequent conserved cysteine (C) residue. The third and fourth conserved cysteine (C) before histidine (H) were separated by one or two amino acids, but the interval number between other conserved amino acids was uncertain. However, GhPHD17, GhPHD27, GhPHD71, and GhPHD81 exhibited maximum homology, however, showed less conserved PHD domain (Fig. 3 and Fig. S1).
Cis-acting element analysis
Many studies showed that PHD genes are involved in various stress responses [41-43]. To elucidate the putative function of GhPHDs under different stresses, we first identified the cis-acting elements in the promoter region that respond to stresses and phytohormones. We identified many cis-acting elements that respond to ABA (ABRE), auxin (TGA and AuxRR-core), GA (TATC-box, P-box, CARE, and GARE), ethylene (ERE), SA (TCA), and MeJA (CGTCA). Results showed that a total of 85 GhPHD genes responding to ethylene, followed by ABA, GA, and MeJA. 73 GhPHD genes showed cis-acting elements that respond to three or more phytohormones. Interestingly, the promoters of GhPHD5, GhPHD47, GhPHD56, and GhPHD65 genes contain cis-elements that respond to all above six phytohormones. In addition, we found that many abiotic stresses response elements (TC-rich repeat, MBS, and LTR), circadian control elements, and light-responsive elements (G-box) were also present in the promoters of various GhPHD genes (Fig. 4 and Table S5). These results indicated that GhPHD genes may involved in multiple signal transduction pathways, phytohormones, light response, abiotic stresses, and play important roles in regulating plant growth and development.
Tissue-specific expression pattern of GhPHD genes
To predict the physiological functions of GhPHD genes in cotton growth and development, we used the online transcriptome data to analyze the tissue-specific expression profile of GhPHD genes in different tissues such as root, stem, leaf, petal, stamen, pistil, ovule, and fiber. According to the expression features and hierarchical clustering (Fig. 5), GhPHD genes were mainly clustered into four groups (A-D). The nine GhPHD genes in group A exhibited high expression in all tissues, indicating that these genes are involved in plant growth and development. In particular, GhPHD23 and GhPHD77 showed maximum expression levels in ovule and fiber tissues, demonstrating that these two genes are involved in the development of ovule and fiber. Further, 43 GhPHDs in group B showed lower expression levels in all tissues, while six GhPHD genes (GhPHD56, GhPHD108, GhPHD40, GhPHD93, GhPHD19, and GhPHD73) were predominantly expressed in the early stage of ovule development, indicating that they may play important roles in ovule and seed development. Moreover, GhPHD genes in group C showed higher expression levels in ovule. However, GhPHD genes in group D showed poor expression in all observed tissues. These results indicated that GhPHDs may be involved in regulating cotton growth and development, especially in the development of ovule and fiber.
Identification of stress-related PHD genes in upland cotton
Analysis of the transcriptome data showed that 66 GhPHD genes had higher expression levels under heat, cold, salt, and drought treatments (Fig. S2). In order to further estimate the responses of GhPHDs under abiotic stresses, we treated four-weeks-old cotton seedlings with heat, cold, salt, and drought, and observed the relative expression level of 12 GhPHD genes (Fig. 6). The relative expression level of GhPHD18 was up-regulated under all stresses, indicating that GhPHD18 may be involved in multiple stresses response mechanisms. GhPHD23 was up-regulated only under heat treatment, indicating that GhPHD23 respond positively to heat stimuli. Further, GhPHD34, GhPHD40, and GhPHD43 were up-regulated after heat and salt treatment, while GhPHD80 and GhPHD88 were up-regulated after heat and drought tolerance at various time points. In addition, we found that GhPHD5 was up-regulated against salt and drought, while GhPHD72 and GhPHD107 were up-regulated against salt and heat, respectively. These results indicated that GhPHD genes may be involved in abiotic stress responses to improve plant tolerance in adverse environments.
Identification of GhPHD genes in response to phytohormones
To further determine whether GhPHD genes respond to phytohormones, we treated four-week-old cotton seedlings with GA, MeJA, IAA, SA, and BL, and observed changes in the relative expression of GhPHD genes (Fig. 7). The relative expression level of GhPHD5 increased significantly after MeJA, IAA, and BL treatment. While GhPHD5 exhibited higher expression after 0.5 hour after SA treatment indicating that GhPHD5 may respond to multiple phytohormones signal transduction pathways, which is consistent with the fact that GhPHD5 promoter contains cis-acting elements related to multiple phytohormones. GhPHD40 was significantly up-regulated under SA treatment, indicating that GhPHD40 respond positively to SA signals. Similarly, GhPHD43 expression was significantly up-regulated under all phytohormone treatment, in particular under BL. The relative expression levels of GhPHD80 and GhPHD88 reached at peak after 0.5 hour of GA treatment. The relative expression level of GhPHD88 increased gradually under SA treatment. Moreover, GhPHD107 expression significantly increased to the maximum level after one hour of GA, IAA, and BL treatment. These results indicated that GhPHD genes are involved in regulating multiple phytohormone signal transduction pathways.
Co-expression network with functional modules for G. hirsutum and G. arboreum
Gene co-expression network analysis is a network diagram constructed on the basis of similarity of expression data between genes, which reflect the relationship of expression regulation between genes . We analyzed the co-expression network of GhPHD genes using ccNET software, and predicted many co-expressed genes and interaction proteins (Table S6). Among these, GhPHD5 was positively co-expressed with a plant-specific DNA ligase, which is related to seed germination and DNA repair. In addition, GhPHD5 was also positively co-expressed with SLOMO protein, which is a F-box protein required for auxin homeostasis and the normal timing of lateral organ initiation at the shoot meristem  illustrating that GhPHD5 may be involved in the regulation of auxin signal transduction pathway, and mediate seed germination and organ formation to regulate plant growth and development. Similarly, GhPHD18 interacted with highly hydrophilic proteins that regulate FLC (Flowering locus C) expression  and showed positively co-expressed with SHAGGY-related kinases involved in meristem organization, indicating that GhPHD18 may affect the flowering time of meristem. Further, GhPHD34 negatively co-expressed with ERF (Ethylene response factor) subfamily B-1, participating in ethylene signaling pathway and responding to abiotic stresses. GhPHD107 positively co-expressed with ARF-GAP and ERF genes, and may be involved in the signal pathways of auxin and ethylene. More interestingly, we predicted many proteins that interact with GhPHD88, such as leucine-rich repeat protein kinase (LRRK), late embryogenesis abundant (LEA) protein, AP2/B3 transcription factor, R2R3 factor, DREB subfamily A-2, cellulose synthase, gibberellin-regulated family protein (GRP), and ethylene response factor (ERF) (Fig. 8A and Table S6), suggesting that GhPHD88 may be involved in many physiological processes such as plant growth and development, phytohormone signal transduction, and stress response. Further, Gene Ontology (GO) analysis of GhPHDs indicated that protein binding and zinc ion binding were the most abundant functional terms (Fig. 8B), which was consistent with the existing results that the cysteine residues exhibit high affinity for zinc ions (Zn2+), and Zn2+-cysteine complexes are key medium for protein structure, catalysis, and regulation .
In summary, GhPHDs were involved in regulating cotton growth and development, especially ovule and fiber development. Further, GhPHDs not only respond to multiple phytohormones signal transduction pathways, but also improve cotton’s tolerance to adverse environments such as heat, salt, and drought. Particularly, GhPHD5, GhPHD80, GhPHD88 were prominent in their responses. Combining the predicted results of co-expressed genes and interacting proteins, we inferred that phytohormones can improve plant tolerance to abiotic stresses through GhPHD genes and their cofactors, but their regulatory mechanisms and interaction networks need further study.