Genome-wide identification and classification of Zf-BED encoding gene in land plants
The whole-genome screening of Zf-BED-encoding genes in 35 plants demonstrated that three lower land plants do not contain any sequences resembling the Zf-BED gene family. In total, we identified 750 Zf-BED encoding genes, which were distributed across the remaining 32 plant species. Interestingly, the highest number of genes were observed in the cotton plants including Gossypium barbadence (77 genes), G. hirsutum (73 genes), G. arboreum (52 genes), G. herbaceum (36 genes), however Oryza sativa (59 genes) and Zea mays (58 genes) also had large number of genes followed by Aquilegia coerulea (34 genes), Theobroma cacao (32 genes), Medicago truncatula (31 genes), Hordeum vulgare (30 genes), Prunus persica (27 genes), Populus trichocarpa (24 genes), Gossypium turneri (21 genes), Solanum lycopersicum (19 genes), Solanum tuberosum (19 genes), Solanum melongena (18 genes), Capsella rubella (14 genes), Brachypodium distachyon (13 genes), Betula pendula (12 genes), Cucumis sativus (12 genes), Sorghum bicolor (9 genes), Nelumbo nucifera (9 genes), Gossypium raimondii (9 genes), Arabidopsis lyrata (9 genes), Picea abies (8 genes), Vitis vinifera (8 genes), Citrus clementina (8 genes), Gossypium thurberi (8 genes), Spirodela polyrhiza (7 genes), Arabidopsis thaliana (7 genes), Marchantia polymorpha (5 genes) and Amborella trichopoda (2 genes) (Figure 1, Table S1-S2).
Based on the conserved domains, motifs, and their patterns in the primary sequence of proteins, we have classified all 750 genes into 22 major classes named class I-XXII. Of these 22 classes, class I (Zf-BED--DUF-domain--Dimer_Tnp_hAT) and class II (Zf-BED ) were more common in all plant species, containing 390 and 144 genes, respectively, followed by class III (Zf-BED--Zf-BED--DUF-domain--Dimer_Tnp_hAT), IV (Zf-BED--DUF-domain), V (Zf-BED--Dimer_Tnp_hAT), and VI (Zf-BED--PHD). The classification of Zf-BED-encoding genes reported several new domains associated with Zf-BED domains. The vast majority of these domains are involved in plant defense functions such as GRAS domain (gibberellin signaling regulator), PHD (Cys4-His-Cys3 motif in the plant homeodomain), WRKY (DNA-binding domain and biotic and abiotic stress regulator), NBS (Nucleotide-Binding Site; one of the major superfamilies of plant resistance genes), Sina domain (N-terminal RING finger domain), F-box with LRR-motif, etc. We have also observed diversity in the presence and absence of these 22 classes in 32 plant species such as the Marchantia polymorpha, which is a liverwort (bryophyte) and has only five genes i.e., one gene from class V (Zf-BED--Dimer_Tnp_hAT) and four genes from class XIII (GST_N_3--GST_C_3--Zf-BED; Glutathione S-transferase, C-terminal domain). Similarly, in the Picea abies genome, a gymnosperm plant, there were only eight genes, and all belonged to class II (Zf-BED). The Amborella trichopoda, considered a basal species among the flowering plants, has only two genes (one from class I and another from class V). The Spirodela polyrhiza, a monocot near to basal of angiosperm, has seven genes i.e., three genes in class I and one in each class III, IV, V and XXII. In summary, a distinct pattern could be identified among monocots since all studied species, except Oryza sativa, contained Zf-BED genes that belonged to the first six classes (I-VI), while the dicots plants showed a more diverse sampling of the Zf-BED genes across various classes. In the dicot plants, especially in Gossypium sp, the G. raimondii genome exhibited only the first three classes (I-III), which is consistent with its conserved genome, while other Gossypium species showed a large number of genes in other classes. It was also observed that most of the classes were only specific to Gossypium sp. For example, the class VIII (GRAS--Zf-BED--DUF-domain--Dimer_Tnp_hAT), IX (GRAS--Zf-BED--DUF-domain--Dimer_Tnp_hAT--Peptidase_C48) and XI (Zf-BED--DUF-domain--Dimer_Tnp_hAT--Peptidase_C48) were found only in Gossypium sp. (Figure 2, Figure S1, Table S3-S4). Overall, we found that diverse plant species possess different Zf-BED gene classes, highlighting the genomic diversity in land plants.
Evolutionary study of Zf-BED in land plants
To find the evolutionary relationship among the Zf-BED-encoding genes, we selected 25 land plants including Aquilegia coerulea (Aqu), Arabidopsis thaliana (At), Betula pendula (Bpe), Brachypodium distachyon (Bra), Capsella rubella (Car), Citrus clementina (Cic), Cucumis sativus (Cuc), Gossypium arboreum (Gar), Gossypium barbadence (Gba), Gossypium herbaceum (Ghe), Gossypium hirsutum (Ghi), Gossypium thurberi (Got), Gossypium raimondii (Gra), Vitis vinifera (Gsv), Nelumbo nucifera (Nnu), Oryza sativa (OsR), Picea abies (Pab), Solanum tuberosum (Sot), Prunus persica (Pru), Solanum melongena (Sme), Sorghum bicolor (Sob), Solanum lycopersicum (Sol), Spirodela polyrhiza (Spi), Theobroma cacao (Tca), and Gossypium turneri (Gtu), and used their protein sequences to determine comparative genomics relationships, gene duplication events, gene trees, orthogroups, orthologs, putative xenologs, and species trees.
The comparative genomics analyses revealed that all Zf-BEDs were divided into 9 orthogroups (OG0; 330 genes, OG1;168 genes, OG2; 23 genes, OG3; 20 genes, OG4; 15 genes and OG6 to OG8 each with 2 genes) (Figure S3, Table S7) covering 564 genes (99.8% of genes in orthogroups) with only one unassigned gene. Of these orthogroups, only one orthogroup was common to all 26 species, while three orthogroups were species-specific containing only six genes each. Overall, the orthogroup mean and the median were computed as 62.7 and 15 genes, respectively (Figure S2, Table S5).
The species-wise orthogroups distribution showed that all species shared orthogroups (100% genes in orthogroups). Some orthogroups were species-specific; for instance, O. sativa had two species-specific orthogroups containing four genes (6.9% of genes). Similarly, S. melongena had one species-specific orthogroup containing two genes (13.3% of genes) (Figure S4, Table S6, Table S8, and Table S10-13).
The orthogroup duplication events demonstrated that only five orthogroups (OG0 to OG4) passed through duplication events during the evolutionary time scale. The highest duplications were recorded in OG0 (194 duplications) followed by OG1 (84 duplications), OG3 (12 duplications), OG4 (9 duplications), and OG2 (3 duplications events) (Figure 3, Table S9).
For the species-based phylogenetic tree analysis, 26 representative species were included. The phylogenetic tree was divided into three major clades representing two clades of dicots plants and one clade of monocots. The gene duplication event at internal nodes and terminal nodes also demonstrated remarkable gene duplication in higher plants. For instance, in the case of the
monocots clade, the highest gene duplication was observed in O. sativa (40 duplications) followed by P. abies (8 duplications). Similarly, in the dicots clades, the highest duplications are found in tetraploid species of cotton (G. barbadence and G. hirsutum with 29 and 31 gene duplication events). Furthermore, several gene duplication events were also found in the internal nodes, and the highest duplications occurred at N10 (49 duplications) (Figure 3, Table S14). In summary, higher plants experienced more duplication events during evolutionary processes and different plants families carry species-specific orthogroups displaying their unique genetic makeup.
Gene ontologies, conserved residues, and motifs in Zf-BEDs
The gene ontology prediction of all Zf-BED genes indicated that all the Zf-BED genes were involved in limited molecular functions, mainly “DNA binding”, “nucleic acid binding”, and “protein dimerization” activities. However, different classes exhibited diverse functions such as “hydrolase activity”, “regulation of transcription”, and “catabolic process”. The class-wise gene ontology demonstrated their putative roles in different molecular functions. For instance, class I is involved in protein dimerization and nucleic acid-binding. Similarly, class II is implicated in DNA
binding/nucleic acid binding activity. However, some classes were involved in more than then one molecular function, such as class XIII, which is required for DNA binding, metabolic process, aromatic amino acid metabolism, catalytic activity, and protein binding activity. Class XXII contained different domain architectures, and consequently its members engaged in diverse functions including photosynthesis, chlorophyll-binding, RNA-DNA hybrid ribonuclease activity, hydrolase activity, etc. (Figure 4, Table S15). In summary, the Zf-BED gene ontology varied with the addition of associated domains. Therefore, the associated domains appear to have a high impact on the function of Zf-BED proteins.
The multiple sequence alignment of Zf-BED proteins demonstrated highly conserved functional residues in the active sites. However, the lower plants showed highly diverse residues at functional sites. For instance, W7 was conserved in 87% sequences. Similarly, H9, F10, C20-C23 (X2-4 motif), G33, G38, T39, L42, K43, H45 (His motif), L46, C52, D65, P83, F92, E124, W146, C247, D259, L318, W346, P372, K409, and L592 sites were conserved in more than 60% of the Zf-BED proteins (Fig. S5). Most of the sequence variations at these sites were either genus-specific or species-specific. Conserved motif analysis also demonstrated the insertion or deletion of a specific motif in different classes of Zf-BED proteins. We also observed similar motif patterns in the same class members in the same genome. For instance, Class_I of GbaZf-BEDs had similar motifs with the same pattern. The CX2-4C motif (motif-5) was conserved in all classes and all land plants. WX[YH]F (motif_14) at the C-terminus was also found in most of the classes except class XIII, XIV, XV, and XXI. Similarly, motif GTXXLXXH[LT] (motif_13) was also conserved in the majority of the genes. In summary, most of the functional motifs were conserved in more than 90% of the analyzed sequences. However, some motifs were species-specific, while the others were Zf-BED classes-specific (Figure S5-S6).
Homology-based modeling of Zf-BED classes
The homology-based 3D structures of different classes displayed different associations with their ligands. Most of the classes’ structures were associated with zinc ions in addition to DNA. Some other ligands exhibited their tendency to interact with different molecules. We have determined representative 3D structures with their associated ligands for each class. The comparative study provided clear differences in structure, active sides, and ligands. For instance, class VI, VII, XX, and XXI proteins showed their association with zinc ion ligand in addition to some additional small molecules. Whereas, some classes e.g. X, XVI, and XVIII did not show any ligands in their structures. Furthermore, some classes (XV, XII, XIV, and XIX) were associated with DNA molecules. Similarly, class XIII was associated with glutathione (GSH) and NAG molecules (Figure 5).
Evolution of Zf-BED gene family in Gossypium sp.
To provide a detailed and comprehensive analysis of Zf-BED in Gossypium species, we also presented the evolutionary relationship among five Gossypium sp., (G. hirsutum, G. barbadence, G. arboreum, G. herbaceum and G. raimondii) with T. cacao. The comparative genomics summarized that all Zf-BED from 6 species were divided into 15 orthogroups encompassing 179 genes (98.9% of genes in orthogroups). Of these orthogroups, only four orthogroups were common in all species, while two orthogroups were species-specific. Similarly, only four genes were in species-specific orthogroups. Overall, the orthogroup mean and medians were computed as 11.9 and 11 genes, respectively.
The species-wise orthogroups distribution represented that most of the identified genes belonged to orthogroups. For instance, 98.0% of genes were in orthogroups with all genes of Tca, Ghe, Gra, and Gba and the minimum genes in orthogroups were present in Ghi and Gar. The orthogroup sharing analysis demonstrated that the highest number of orthogroups were shared between Ghe and Ghi with Gba. Similarly, the orthologs multiplicity also depicted significant
similarity and uniqueness among six species. The species-based phylogenetic tree showed that Tca is the common ancestor for all Gossypium species, and Ghe has the most conserved Zf-BED encoding genes with Tca, followed by the other cotton four species. The duplication events at each terminal demonstrated that Tca has high duplication, followed by Ghe, Ghi Gar, and Gba, while Gra has experienced only a single duplication event (Figure 6).
Gene features and physio-chemistry of Zf-BED proteins sequence
The analyzed gene features included physio-chemistry and gene-specific attributes. The physio-chemistry included molecular weight, protein charge, isoelectric point, and grand average of hydropathy. Similarly, gene features comprised transcript length, CDS length, GC contents in CDS sequence, number of exons and introns, lengths of exons and introns, etc. We provided a detailed study of the five species. In the case of G. arboreum Zf-BED, the transcript length ranged between 0.25 to 5.5 kb, the coding sequence length ranged from 0.25 to 5.5 kb, the percentage of GC content in the coding sequence was 34% to 46%, exon number of genes of G. arboreum ranged from a minimum of 1 to a maximum of 17 exons, while the mean exon length ranged between 1 kbp to 6 kbp, mean intron length of genes of G. arboreum ranged from 0 kb to 7 kb, longest protein consisted of 1,713 amino acids and the shortest protein had 95 amino acid residues, molecular weight was observed between 10.58 to 197.52 kDa, protein charge range was -20 to +47, the isoelectric point range was 5.15 to 10.28 and the grand average of hydropathy range was -0.75 to-0.21. Similar results were also observed in other Gossypium species (Figure S7-S8).
Mapping of Zf-BED genes chromosomes
To understand the effect of polyploidization on Zf-BED genes between diploid and tetraploid species, a genome-wide comparative distribution of Zf-BED genes on chromosomes was determined in all Gossypium species. Diverse species had a varied number of genes on different chromosomes. In A-like genomes (Gar and Ghe), all genes were unevenly distributed on all chromosomes except Chr#2, while in the D-like genome (Gra), all genes were localized to only six chromosomes. In the case of tetraploid species (Ghi and Gba), 20 chromosomes and 18 chromosomes possessed Zf-BED genes in Ghi and Gba, respectively. However, some chromosomes showed the highest number of genes in all species. For instance, D02 has the highest number of Zf-BED genes in G. barbadense and G. hirsutum, followed by Chr#07 in all species except G. arboreum. Similarly, some chromosomes did not contain any Zf-BED genes, like Ga2, Ga5, Ga7, Ga10 did not bear any Zf-BED genes in G. arboreum. A similar pattern was also observed in G. raimondii (Gr2, Gr3, Gr5, Gr8, Gr10, Gr11, and Gr12), G. barbadence (GbA01, GbA04, GbD01, and GbD10), and in G. hirsutum (GhA01, GhA04, GhA09, GhA10, GhD07, and GhD10). In addition to Gossypium sp., the chromosomal location of the AtZf-BED and ZmZf-BED genes were also mapped on their respective chromosomes. We observed that all AtZf-BED genes were localized only on three chromosomes (AT1, AT3, and AT4) while ZmZf-BED genes were distributed on all chromosomes except chromosome no. 6 (Zm6) (Figure S9).
Cis-acting regulatory elements prediction of Zf-BED genes
The identified cis-acting regulatory elements were classified into different categories including response to hormones, response to stresses, growth, and developments. We observed that most of the gene promoters had gibberellin, abscisic acid, salicylic acid, jasmonic acid, and auxin hormone-responsive elements. Similarly, in stress-responsive elements, we observed low temperature, light, wound, and drought, anoxic and anaerobic responsive factors. Furthermore, the growth and developmental cis-acting elements included zein metabolism regulation, seed-specific regulation, meristem expression, endosperm expression, circadian control, and cell cycle regulation. Some additional elements including MYBHv1 binding site, ATBP-1, MYB binding site involved in flavonoid biosynthetic genes regulation, and CMA3 were also observed in some gene regulatory regions. The comparison of Cis-acting elements among G. arboreum, G. hirsutum, and G. barbadence demonstrated common regulatory elements including gibberellin-responsive element, low-temperature responsiveness, wound-responsive element, light-responsive element, salicylic acid responsiveness, etc. (Table S24-S25).
Expression profiling of Zf-BED genes in Cotton, Maize, and Arabidopsis
The expression profiling of Zf-BED genes showed variation with respect to tissues and stresses. In this study, we have included expression profiling of Zf-BED gene in Gossypium sp, maize, and Arabidopsis in different tissues and stresses. The expression profiling and cluster correlation of Zf-BED in G. arboreum demonstrated multiple clusters in different tissues including fiber, leaf, ovule, flower, stem, and root. For instance, a cluster containing GaZf-BED20, GaZf-BED17, and GaZf-BED31 showed relatively high expression in leaf, flower, stem, and root. Similarly, another cluster (GaZf-BED11, GaZf-BED33, GaZf-BED30, GaZf-BED09, GaZf-BED16) had high transcript abundance in the ovule, flower, stem, and root. Some genes showed high expression levels in stem and root tissues, while other genes displayed little expression in ovule, flower, stem, and root. Interestingly, however, none of the genes displayed any expression in leaf and fiber. In summary, only a few GaZf-BED genes presumably play roles in tissue development (Figure S10, Table S27).
The expression profiling of Zf-BED genes in G. hirsutum in diverse tissues (like fiber, flower, ovule, and leaf at different time intervals) showed different expression patterns of genes. The gene clustering cladogram demonstrated that some genes have a putative role in all tissues. For instance, gene Gh_A08G102900.1 had increased transcript levels in all four tissues. On the other hand, a set of genes exhibited tissue-specific expression such as Gh_D08G097100.1, Gh_D07G157800.1, Gh_A08G102900.1, and Gh-D07G157800.1. Gh_D06G202500.1 had the highest expression in the leaf (Table S28). The stress-specific expression of Zf-BED demonstrated their putative function under different abiotic stresses including cold, drought, heat, salt at different time intervals. A cluster of genes such as Gh_A08G102900.1, Gh_D08G097100.1, Gh_D07G157800.1, Gh_D06G202500.1, and Gh_D09G000500.1 showed high expression in all stresses including cold, drought, heat, and salt (Table S28). However, several genes exhibited tissue-specific expression. Moreover, we observed that a cluster of genes did not show any expression during the applied stresses.
The expression profiling of Z. mays Zf-BED genes also demonstrated tissue-specific and stress-specific. For instance, ZmZf-BED43 (Zm00001d033361) gene is highly expressed only in anther and endosperm, while a cluster of genes including ZmZf-BED43 (Zm00001d013336), ZmZf-BED26 (Zm00001d017846), ZmZf-BED34 (Zm00001d022534), ZmZf-BED38 (Zm00001d026358), and ZmZf-BED02 (Zm00001d003128) possess a co-expression pattern in all tissues (Figure S11). In addition, the abiotic stress response of ZmZf-BED genes also depicted the same pattern. A cluster of genes (ZmZf-BED34, ZmZf-BED45, ZmZf-BED17, and ZmZf-BED26) co-expressed under different abiotic stresses with a variable range of FPKM values (Figure S12). In contrast to biotic and tissues specific stresses, the response of ZmZf-BED was highly variable for biotic stresses. For instance, a pair of genes (ZmZf-BED26 and ZmZf-BED17) displayed high expression under all biotic stresses indicating their key roles in biotic stress regulation in Z. mays. Whereas other ZmZf-BED genes were highly specific e.g. ZmZf-BED34 had high transcript levels under pathogenndisease stresses including Maize Mosaic Virus, Fusarium virgliforme, etc. (Figure S13, Table S29).
The tissue-specific expression profiling of AtZf-BED in Arabidopsis demonstrated that all genes had high relative expression except AtZf-BED05 (AT3G48770; class-II; Zf-BED) and AT1G36095 (class-IV; Zf-BED--DUF-domain) forming a common cluster on the dendrogram. Another cluster including AtZf-BED06 (AT4G15020), AtZf-BED01 (AT1G18560), AtZf-BED03 (AT1G79740), and AtZf-BED04 (AT3G17450) had high transcripts abundance in all tissues (pollen, root, shoot, leaf, seedling, endosperm, meristem, seed, embryo, stem, flower and silique). However, the level of transcripts varied among the tested tissues. In contrast to tissue-specific expression, the biotic stress-specific expression was highly variable from stress sources. For instance, AtZf-BED01 (AT1G18560), AtZf-BED03 (AT1G79740), AtZf-BED04 (AT3G17450), and AtZf-BED06 (AT4G15020) had increased expression when challenged with Verticillium dahliae followed by Fusarium graminearum, Tobacco Mosaic Virus, and bacterial disease. Similarly, under abiotic stresses, AtZf-BED06 (AT4G15020), AtZf-BED03 (AT1G79740), and AtZf-BED04 (AT3G17450) showed heightened transcripts under all stresses including cold, dark, drought, salt, and wounding and water deficit (Figure 7, Table S30).
Experimental verification of Zf-BED genes expression in Arabidopsis and Micro-Tom
To decipher the relationship of Zf-BED family members in Arabidopsis and Micro-Tom in response to biotic and abiotic stress, three Zf-BED genes in Arabidopsis and four orthologous genes in Micro-Tom were selected to examine the transcript level changes under various stresses using qRT-PCR. In Arabidopsis, AtZf-BED01 (AT1G18560) and AtZf-BED03 (AT1G79740) were downregulated upon DC3000 infection compared to HrcC-, while AT4G15020 showed the inverse trend (Figure 8). Under heat treatment, AtZf-BED01 (AT1G18560) transcripts were elevated whereas other Zf-BED genes didn’t display significant changes. Furthermore, AtZf-BED01 (AT1G18560) and AtZf-BED03 (AT1G79740) were upregulated significantly in response to drought stress. In Micro-Tom, all Zf-BED genes showed lower expression levels under DC3000 treatment compared to HrcC-. As for abiotic stress, Solyc08g007470.2, Soly09g005660.3, and Solyc03g119830 expression were upregulated upon drought, while Solyc03g007510.3 were downregulated (Figure 8). Overall, the representative Zf-BED genes in Arabidopsis and Micro-Tom were shown to be involved in biotic and abiotic stresses.