Identification and analysis of AREB/ABF family in plants
Based on the 34 genomes listed in the Phytozome database, we performed a genome-wide BLAST search using the Arabidopsis ABF1, AREB1/ABF2, AREB2/ABF4, and ABF3 amino acid sequences as queries. We found candidate ABFs in only 29 land plants, including moss, lycophyte, monocots, and eudicots. Some of the ABF sequences we identified were short (fewer than 200 amino acids); these were eliminated from subsequent analyses. In the end, 190 ABF-like sequences were collected for further analysis. We subjected these 190 protein sequences to SMART and Pfam analyses, and all of them were classified into the protein family containing bZIP domains (Pfam: PF00170).
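The length filter described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the FASTA parsing helper, the function names, and the toy records are assumptions, and only the 200-residue cutoff comes from the text.

```python
from typing import Dict, Iterable

MIN_LENGTH = 200  # cutoff stated in the text: sequences shorter than this are dropped


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA-formatted lines into a {sequence_id: sequence} dictionary."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def drop_short(records: Dict[str, str], min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Keep only sequences with at least min_length residues (a trailing '*' is ignored)."""
    return {n: s for n, s in records.items() if len(s.rstrip("*")) >= min_length}


# Hypothetical toy records: one 250-residue and one 120-residue sequence.
example = [">toy_long", "M" * 150, "K" * 100, ">toy_short", "M" * 120]
kept = drop_short(parse_fasta(example))
print(sorted(kept))
```

In practice the input would be the BLAST hit sequences exported as a FASTA file, passed to `parse_fasta` as an open file handle.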
Previous studies have reported that the plant group-A bZIP family proteins can be phylogenetically clustered into two major groups, the AREB/ABF and the ABI5/AtDPBF subfamilies [14]. We therefore constructed a maximum likelihood (ML) tree using the 190 full-length ABF-like gene sequences (Additional file 1: Figure S1). Our results show that these ABF-like sequences are divided into two major clades, designated groups A and B, each containing 95 sequences. Group A contains all experimentally characterized AREB/ABFs, including the Arabidopsis, Thellungiella salsuginea, and rice ABFs [14-15]. Because it contains previously characterized genes such as ABI5/AtDPBF1, AtDPBF2, AREB3/AtDPBF3 and EEL/AtDPBF4 [14], group B was classified as the ABI5/AtDPBF subfamily. Therefore, group A sequences were designated ABFs and were included in further analyses (Additional file 2: Table S1). The number of ABFs in each species is shown in Fig. 1. In summary, the moss Physcomitrella patens and the lycophyte Selaginella moellendorffii each have two copies of ABFs. In monocots, all species contain only four copies of ABFs, with the exception of wheat, maize and Panicum virgatum. In eudicots, the ABF copy number varied from one to seven, indicating that several duplication events took place. The numbers of ABF paralogs in rice, Arabidopsis, and Thellungiella observed in this study are in line with previous research [14-15].
We further analyzed the protein length, molecular mass, and pI values of the 95 ABF proteins (Additional file 2: Table S1). The length and molecular mass of ABFs ranged from 254 to 485 amino acid residues and from 27.81 to 52.95 kDa, with means of 389 amino acid residues and 42.16 kDa. Aquilegia coerulea_ABF1 is the longest and largest ABF (485 amino acid residues and 52.95 kDa), while Citrus sinensis_ABF3 is the shortest and smallest ABF (254 amino acid residues and 27.81 kDa). Zea mays_ABF5 has the lowest pI value (5.44), while Citrus sinensis_ABF3 has the highest (10.42). ABFs in clades III, VI and VII have very close pI values, while the pI values of ABFs in clades I, II, IV, and V varied widely. Interestingly, ABFs from clade V tended to have acidic pI values, with an average of 6.97, while more alkaline pI values (greater than 7) were observed in 83 out of 95 ABFs belonging to other clades (Table 1; Additional file 2: Table S1).
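For orientation, the molecular mass of a protein can be estimated from its sequence by summing average residue masses and adding one water molecule. The sketch below assumes this standard approach and the commonly tabulated average residue masses; the text does not state which tool produced the reported values, and the function name is illustrative.

```python
# Average masses (Da) of amino acid residues, i.e. each amino acid minus one water.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # one water is added back for the free N- and C-termini


def molecular_mass_kda(sequence: str) -> float:
    """Estimate the average molecular mass (kDa) of an unmodified protein chain.

    Raises KeyError for non-standard residue codes such as 'X'.
    """
    seq = sequence.upper().rstrip("*")
    total_da = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return total_da / 1000.0


# Hypothetical toy peptide, not an ABF sequence.
print(round(molecular_mass_kda("MKRGS"), 4))
```

The isoelectric point requires an iterative charge calculation over the ionizable groups and is not covered by this sketch.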
Phylogenetic and structural analysis of plant ABFs
To better understand the evolutionary relationships of AREB/ABF members in land plants, we constructed an ML tree using the full-length protein sequences of the 95 ABFs. Based on the support values (85% or greater) of the phylogenetic tree, the ABFs can be divided into seven clades (clades I to VII) (Fig. 1; Fig. 2). In the phylogenetic tree, ABFs from the moss Physcomitrella patens and the lycophyte Selaginella moellendorffii form two independent clades, designated clades I (P.patens_ABFs) and II (S.moellendorffii_ABFs). The monocot ABFs fall into two clades, IV and V. The eudicot ABFs can be divided into three clades: III, VI, and VII.
It is worth mentioning that the phylogenetic tree (Fig. 2) agrees with the species tree shown in Phytozome (Fig. 1), with the ABFs from moss (Physcomitrella patens) and lycophyte (Selaginella moellendorffii) forming the two basal lineages of land plants. Monocot and eudicot ABFs are closer on the phylogenetic tree and form two monophyletic clades. To further assess the reliability of the ABF phylogenetic tree, we analyzed the exon/intron organization of each gene (Additional file 3: Figure S2). Of the 95 ABFs, one has one exon, three have two exons, four have three exons, 71 have four exons, 13 have five exons, two have six exons, and two have seven exons. Within each clade, the gene structure of ABFs is relatively conserved, and adjacent ABFs have similar exon/intron structures. We then investigated the intron phases of all ABF gene structures. There are three categories of intron phase: phase 0, phase 1, and phase 2. Our analysis indicated that the intron phase patterns (0, 0, 0) and (0, 0, 0, 0) are predominant across the 95 land plant ABFs (Additional file 3: Figure S2). The conservation of gene structure and intron phase within clades is consistent with the clade assignments and supports the reliability of the ABF phylogenetic tree.
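An intron's phase is determined by how many coding nucleotides precede it, modulo 3: phase 0 introns lie between codons, phase 1 introns after the first nucleotide of a codon, and phase 2 introns after the second. The sketch below shows this calculation, assuming only the coding portion of each exon is supplied; the exon lengths and function name are hypothetical and not taken from any ABF gene.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def intron_phases(cds_exon_lengths: Sequence[int]) -> Tuple[int, ...]:
    """Return the phase of each intron, given the coding length of each exon in order.

    An intron follows every exon except the last, so n exons yield n - 1 phases.
    """
    phases: List[int] = []
    coding_so_far = 0
    for length in cds_exon_lengths[:-1]:
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return tuple(phases)


# Hypothetical genes described by the coding lengths (nt) of their exons.
toy_genes = {
    "toy_gene_1": [300, 150, 72, 600],  # four exons, three introns
    "toy_gene_2": [300, 150, 72, 603],  # four exons, three introns
    "toy_gene_3": [100, 50, 60],        # three exons, two introns
}

# Tally how often each phase pattern occurs across the genes.
pattern_counts = Counter(intron_phases(exons) for exons in toy_genes.values())
print(pattern_counts.most_common())
```

A four-exon gene whose first three coding exon lengths are all multiples of three gives the pattern (0, 0, 0), the form reported as predominant here.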
Motif composition and arrangement of plant ABFs
To better understand the phylogenetic relationships between plant ABFs, we aligned all of the ABF sequences to identify conserved amino acid residues. Based on the alignment, 35 amino acid residues are completely conserved in 88 ABFs (except for eight shorter ABFs). We further identified the conserved motifs in the 95 plant ABFs using the SMART program. We found five conserved protein motifs in all ABFs: the BRLZ domain and four low complexity regions (LCR 1-4; Fig. 3A; Additional file 4: Figure S3). ABFs belong to the basic-leucine zipper (bZIP) domain transcription factor family, and we found that the BRLZ domains are highly conserved in all plant sequences (Fig. 4).
However, the ability of SMART to comprehensively identify the motifs present in ABFs is limited, so we used the MEME program to identify conservation and variation in the motif arrangements among ABFs. We identified 20 distinct motifs in ABFs. The occurrences and arrangements of the motifs in ABFs from the seven major clades are shown in Fig. 3B and Additional file 5: Figure S4. Among the 20 motifs, 8 motifs are shared by all ABFs; these are components of the BRLZ domain (motifs 1 and 2) and the four conserved low complexity regions (motifs 3 and 6 for LCR1, motif 5 for LCR2, motif 4 for LCR3, and motif 7 for LCR4). Next, we examined the non-conserved motif composition in land plant ABFs. We split the ABFs into four regions based on the locations of the LCR motifs and the BRLZ domain (Fig. 3B): Region 1 is the part before LCR1, Region 2 is the part between LCR1 and LCR2, Region 3 is the part between LCR3 and the BRLZ domain (there were no motifs between LCR2 and LCR3), and Region 4 is the part between the BRLZ domain and LCR4. Of these four, Regions 2 and 4 are highly conserved in land plants (they are mainly comprised of motifs 15 and 16). Region 1 is less conserved and is primarily comprised of motif 11 in clades III, VI, and VII. Region 3 is the most divergent region: motif 8 was observed in clades I, III, IV, V, VI, and VII; motifs 9 and 10 were found in clades III, IV, VI, and VII; motifs 12 and 17 were found in clades III, V, VI, and VII; motifs 13 and 14 were found in clades III, VI, and VII; motif 18 was found in clades III and VII; motif 19 was found in clades IV and VI; and motif 20 was found in clade V (Fig. 3B). Taken together, the conserved and non-conserved motif patterns of plant ABFs that we identified match the pattern of clades in the phylogenetic tree.
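The Region 3 motif distribution listed above can be inverted to show which motifs each clade carries. The sketch below encodes only the motif-to-clade assignments stated in the text; the data structure and function names are illustrative choices.

```python
from typing import Dict, List, Set

CLADES = ["I", "II", "III", "IV", "V", "VI", "VII"]

# Region 3 motifs and the clades in which each was observed, as listed in the text.
REGION3_MOTIFS: Dict[int, List[str]] = {
    8: ["I", "III", "IV", "V", "VI", "VII"],
    9: ["III", "IV", "VI", "VII"],
    10: ["III", "IV", "VI", "VII"],
    12: ["III", "V", "VI", "VII"],
    17: ["III", "V", "VI", "VII"],
    13: ["III", "VI", "VII"],
    14: ["III", "VI", "VII"],
    18: ["III", "VII"],
    19: ["IV", "VI"],
    20: ["V"],
}


def motifs_by_clade(motif_to_clades: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Invert a motif -> clades mapping into a clade -> set-of-motifs mapping."""
    result: Dict[str, Set[int]] = {clade: set() for clade in CLADES}
    for motif, clades in motif_to_clades.items():
        for clade in clades:
            result[clade].add(motif)
    return result


def clade_specific(motif_to_clades: Dict[int, List[str]]) -> Dict[int, str]:
    """Return the motifs observed in exactly one clade, with that clade."""
    return {m: c[0] for m, c in motif_to_clades.items() if len(c) == 1}


by_clade = motifs_by_clade(REGION3_MOTIFS)
for clade in CLADES:
    print(clade, sorted(by_clade[clade]))
print("clade-specific:", clade_specific(REGION3_MOTIFS))
```

Laid out this way, the eudicot clades (III, VI, VII) share most Region 3 motifs, the monocot clades (IV, V) each carry a smaller and distinct set, and motif 20 is the only motif confined to a single clade.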
Expression analysis of plant ABF genes
To obtain the expression profiles of Arabidopsis ABFs, we extracted expression data from the Arabidopsis eFP Browser (http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi). We found that the expression of Arabidopsis ABF paralogs differed among tissues. For example, A.thaliana_ABF1 displayed significantly higher expression in roots, and A.thaliana_ABF2 displayed significantly higher expression in seeds, indicating that ABF paralogs have undergone tissue subfunctionalization. We found that ABF paralogs in clade III (A.thaliana_ABF2) have higher expression levels than clade VI paralogs (A.thaliana_ABF1, A.thaliana_ABF3; Additional file 6: Figure S5A). We next investigated the expression profiles of other plant ABF genes. Our results demonstrated that soybean (Glycine max) and common bean (Phaseolus vulgaris) ABF paralogs are expressed more in leaves, roots, and flowers than in other tissues, and that ABF paralogs in clade III (G.max_ABF3, P.vulgaris_ABF2) have higher expression levels than clade VII paralogs (G.max_ABF1, G.max_ABF1, P.vulgaris_ABF1) (Additional file 6: Figure S5B and C). Within monocots, we studied the expression of the four ABF paralogs from rice (Oryza sativa), and found that O.sativa_OsABF2 (clade III) had higher expression levels than O.sativa_TRAB1, O.sativa_OsAREB1, and O.sativa_OsAREB2 (Additional file 6: Figure S5D). The five ABFs in maize (Zea mays) display similar expression patterns among tissues, except for Z.mays_ABF3, which is expressed at lower levels in all tissues (Additional file 6: Figure S5E). The expression divergence of plant ABFs suggests functional differentiation among ABFs.
We also investigated the expression of Arabidopsis ABFs under abiotic stresses using microarray expression data. The results showed that the expression of all Arabidopsis ABFs was induced by ABA, cold, drought, and high salinity, but the degrees of induction differed. A.thaliana_ABF1 was significantly induced by cold; A.thaliana_ABF2 was significantly induced by drought; A.thaliana_ABF3 was significantly induced by ABA, drought and salt; and A.thaliana_ABF4 was significantly induced by drought and salt (Additional file 7: Figure S6). We further investigated the expression of other plant ABFs in response to abiotic stresses (ABA, drought and high salinity) using quantitative real-time PCR (qRT-PCR). The heatmap shows that the expression of most ABFs was induced by ABA, drought, and high salinity. Except for B.rapa_ABF7, G.max_ABF2, and Z.mays_ABF3, all ABFs were significantly induced by drought. ABFs are known for their importance in ABA-mediated abiotic stress responses, so the significant induction of ABF genes might play a crucial role in plant adaptation to environmental stresses (Fig. 5).
Molecular characterization and expression analysis of TaABFs
Phylogenetic analyses suggest that TaABFs might play a role in regulating the abiotic stress response in wheat. We cloned three TaABF genes from the wheat cv. Chinese Spring. Each gene had three homeologous copies in the A, B, and D genomes of wheat; we named them TaABF1-5A/B/D, TaABF2-7A/B/D, and TaABF3-6A/B/D. Additional phylogenetic analyses indicated that TaABF1 was most closely related to rice OsAREB2, TaABF2 to rice OsABF2, and TaABF3 to rice OsAREB1 (Fig. 6A). Protein sequence analysis revealed that the TaABFs share 55%-98% sequence similarity (Fig. 6B). We then analyzed the subcellular localization of TaABF3 by constructing an expression cassette in which TaABF3 was fused to GFP. The fusion protein was transiently expressed in Arabidopsis protoplasts. Fluorescence microscopy revealed that the TaABF3-GFP fusion protein was exclusively localized in the nucleus of transformed cells, while the control GFP was uniformly distributed throughout the cell (Fig. 7A). These results confirmed that TaABF3 is a nuclear-localized protein.
To examine the expression pattern of TaABFs, we first identified the cis-elements in their promoter regions (~2 kb upstream of the transcription initiation codon) and found a number of cis-acting elements related to stress response. These include LTR (low temperature-responsive element), MYB (MYB recognition site), MYC (MYC recognition site), MBS (MYB binding site involved in drought-inducibility), ABRE (ABA-responsive element), and DRE (dehydration-responsive element) (Fig. 7B). To better understand the role that TaABFs play in response to drought, we performed quantitative real-time PCR (qRT-PCR) on RNA from various tissues and drought conditions. Considering the high sequence similarity of wheat homeologous genes, the PCR primers were designed to amplify a region conserved among the three homeologs of each TaABF; for example, the relative expression level of TaABF1 represents the combined expression of its three homeologs (TaABF1-5A, TaABF1-5B and TaABF1-5D). The results demonstrated that TaABFs were expressed at higher levels in seedling leaves (Fig. 7C) and that under drought stress, all TaABFs in wheat leaves were up-regulated (Fig. 7D).
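Relative expression from qRT-PCR is often computed with the comparative Ct (2^-ΔΔCt) method. The text does not state which quantification method or reference gene was used, so the sketch below is an assumption for illustration only; the Ct values and the function name are hypothetical.

```python
def relative_expression(
    ct_target_treated: float,
    ct_reference_treated: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target gene (treated vs. control) by the 2^-ddCt method.

    Each target Ct is first normalized to a reference gene in the same sample,
    then the treated sample is compared with the control sample.
    """
    delta_ct_treated = ct_target_treated - ct_reference_treated
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies three cycles earlier under drought
# while the reference gene is unchanged, giving an 8-fold up-regulation.
fold_change = relative_expression(
    ct_target_treated=22.0,
    ct_reference_treated=18.0,
    ct_target_control=25.0,
    ct_reference_control=18.0,
)
print(fold_change)
```

Because the primers amplify a region shared by the three homeologs, a value computed this way would reflect their combined expression, as noted above.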
Overexpression of TaABF3 confers drought tolerance in Arabidopsis
To better understand how TaABFs function in plant abiotic stress tolerance, we generated 35S::TaABF3-GFP transgenic Arabidopsis lines. We selected three independent transgenic lines that exhibited higher expression levels of TaABF3 to further analyze their response to drought stress (Additional file 8: Figure S7). We then compared the drought tolerance of transgenic and vector-transformed (WT) plants. We grew WT plants and plants of each 35S::TaABF3-GFP transgenic line for three weeks in soil before withholding water for ~14 d. After the drought treatment and six days of re-watering, ~65-75% of the transgenic plants survived, while only ~8% of the WT plants survived (Fig. 8A and B).
We next assayed the proline, malondialdehyde (MDA), and soluble sugar contents in 35S::TaABF3-GFP transgenic and WT plants (Fig. 8C-E). In the transgenic lines, the proline and soluble sugar contents were significantly higher and the MDA contents were significantly lower than in WT under both well-watered and drought conditions. We also examined the expression of several well-known drought-responsive genes in the transgenic lines, including Arabidopsis-homologous LEA14 [29], RD29A [30], DREB2A [31], RAB18 [32], RD20 [33], and GolS2 [34]. All of these genes were up-regulated in the 35S::TaABF3-GFP transgenic lines (Fig. 8F). Collectively, these findings indicate that overexpression of TaABF3 in Arabidopsis can enhance the drought tolerance of transgenic plants.