Identification of the JAZ family in the maize genome
Thirty-six putative protein sequences were obtained from the maize genome by searching for the ZIM domain [9] in the GRASSIUS (Grass Regulatory Information Services, http://www.grassius.org) database [37]. Although all of these sequences contained the TIFY/ZIM domain, some also contained a CCT motif and/or a C2C2-GATA motif (Group I TIFY proteins) and were therefore predicted to belong to the ZML subfamily. Other sequences contained only the TIFY motif and were assigned to the TIFY subfamily. Of the 28 proteins that contained both a TIFY domain and a Jas motif, two lacked the conserved PY motif at the C-terminal end, two contained an incomplete motif, and eight did not have a typical TIFY motif. To identify the most likely functional JAZ candidates, only proteins carrying both characteristic motifs (“TIFYXG” and “SLX2FX2KRX2RX5PY”) were retained in this study (Group II TIFY proteins). Other variants, including those with incomplete motifs, were manually eliminated from the search results. Overall, 16 members were identified as the ZmJAZ family (Table 1), and these genes were named according to their grouping in the phylogenetic (Fig. 1) and synteny analyses (Fig. 3, 4) described below. We also conducted genome-wide searches for JAZ homologs in three other monocot databases and identified 16, 9, and 11 candidate JAZ genes in the rice (Supplemental Table 2), sorghum (Supplemental Table 3), and Brachypodium (Supplemental Table 4) genomes, respectively.
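The motif-based filtering described above can be illustrated with a short sketch. It is not the procedure used in this study (variants were eliminated manually); it only assumes that the two consensus strings can be read as regular expressions in which X is any residue and Xn is a run of n residues. The function name and the toy sequences are hypothetical.

```python
import re

# "TIFYXG": the TIFY core, any one residue, then glycine.
TIFY_RE = re.compile(r"TIFY.G")
# "SLX2FX2KRX2RX5PY": Jas consensus, with Xn written as .{n}.
JAS_RE = re.compile(r"SL.{2}F.{2}KR.{2}R.{5}PY")


def is_candidate_jaz(protein_seq: str) -> bool:
    """Return True if the sequence contains both a typical TIFY motif
    and a complete Jas motif (assumed reading of the two consensus strings)."""
    seq = protein_seq.upper()
    return bool(TIFY_RE.search(seq)) and bool(JAS_RE.search(seq))


# Hypothetical toy sequences, not real ZmJAZ proteins.
complete = "MAAATIFYGGAAAASLHRFLEKRKDRLVAKAPYAA"
no_py = "MAAATIFYGGAAAASLHRFLEKRKDRLVAKAAAAA"      # Jas motif lacks the terminal PY
atypical_tify = "MAAATLFYGGAAAASLHRFLEKRKDRLVAKAPYAA"  # TIFY core is altered

for name, seq in [("complete", complete), ("no_py", no_py),
                  ("atypical_tify", atypical_tify)]:
    print(name, is_candidate_jaz(seq))
```

A strict pattern like this mirrors the conservative choice made in the text: proteins with incomplete or atypical motifs are excluded even though they may be true homologs.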
Based on information from MaizeGDB, the 16 JAZ genes were distributed on seven maize chromosomes: chromosomes 1, 2, and 7 each had four ZmJAZ genes, and chromosomes 4, 6, 9, and 10 each contained one ZmJAZ gene. Because of their possible role in the herbivore defense pathway, we asked whether any of the ZmJAZ genes were located in insect-resistance QTLs known for two lepidopteran species, fall armyworm (FAW) and southwestern corn borer (SWCB) [38-40]. As shown in Table 1, six loci were found in regions of FAW QTLs and three were found in regions of SWCB QTLs. Specifically, ZmJAZ1a and ZmJAZ5-1a were located in the SWCB QTL on chromosome 7, bin 0.02; ZmJAZ2b and ZmJAZ3-1b were located in the FAW QTL on chromosome 2, bins 0.02 and 0.08, respectively; ZmJAZ3-1a and ZmJAZ4-5 were in the FAW QTL on chromosome 7, bins 0.04 and 0.03, respectively; and the tandem repeats ZmJAZ4-1a and ZmJAZ4-2 were in the FAW QTL on chromosome 1, bin 0.02.
Consistent with a role in transcriptional regulation, almost all ZmJAZ proteins had a predicted nuclear localization sequence, but four (ZmJAZ3-2, ZmJAZ4-2, ZmJAZ4-4, and ZmJAZ4-5) had predicted chloroplast or Golgi targeting signals (Table 1). According to the transcriptional analysis by Sekhon [41], the organs with the highest expression were typically leaves or roots; the expression patterns of the individual ZmJAZ genes are also listed in Table 1. There was no clear correlation between sequence similarity and gene expression patterns.
Phylogenetic tree of the JAZ orthologs from maize, rice, sorghum, Brachypodium, and Arabidopsis
To reveal the evolutionary relationships of the JAZ gene family in plants, a phylogenetic tree was constructed using the deduced protein sequences from maize and the orthologous proteins from the three other monocot genomes used in this study: Oryza sativa (12 OsJAZ; Supplemental Table 2), Sorghum bicolor (9 SbJAZ; Supplemental Table 3), and Brachypodium distachyon (11 BdJAZ; Supplemental Table 4). In addition, 12 JAZ genes from Arabidopsis thaliana, a eudicot, were included (Supplemental Table 1). The 60 plant genes analyzed in this study clustered into six orthologous JAZ groups according to the phylogenetic tree (1 to 6; Fig. 1).
Each clade followed a similar topology, (((ZmJAZa/b, SbJAZ), ZmJAZb/a), (OsJAZ, BdJAZ), AtJAZ), with minor variations. One such variation was the homeologous pair ZmJAZ2a and ZmJAZ2b, which possibly derived from a chromosome duplication event and were therefore more closely related to each other than to SbJAZ2. Each monocot species had a similar number of JAZ proteins in each orthologous group except for group 4, which appeared to have undergone a major expansion in both protein number and sequence divergence. It is noteworthy that groups 1, 2, 3, 5, and 6 contained a mixture of members from both monocots and dicots, whereas group 4 appeared to be a monocot-only JAZ clade in this study. Similar results were reported in other studies, indicating that group 4 might be specific to monocots [42-45]. For example, three ZmJAZ genes (4-3, 4-4, 4-5) and one rice gene, OsJAZ4-5, had no orthologous sequences in the other plant genomes.
Results from the phylogenetic analysis showed that all JAZ groups descended from one ancient origin. Groups 1, 3, and 4 and groups 2, 5, and 6 each formed a loose cluster, indicating a large evolutionary distance between these two clusters. The results also corresponded to the subclades of AtJAZ proteins proposed in a previous analysis of Arabidopsis JAZ proteins [3]. According to the information in the maize genome database, JAZ genes from the same species in groups 1, 2, and 3 were paralogous, while genes in JAZ groups 4, 5, and 6 were not paralogous with each other. As stated previously, many homologous sequences were not included in this study because they had incomplete or substantially altered versions of one or both of the conserved TIFY and Jas motifs. For this reason, group 6 contains homologous sequences only from rice, Brachypodium, and Arabidopsis: one homologous sequence in maize (AC187560.5_FGT003) and one in sorghum (Sb02g003130) were manually eliminated.
Sequence comparison and structure analysis of the maize JAZ genes
To gain more insight into the divergence of the 16 maize JAZ genes, a phylogenetic tree was generated using the deduced protein sequences identified in this study (Fig. 2a). The ZmJAZ proteins fell into five clades, and members with similar sequences tended to cluster together. ZmJAZ proteins from phylogenetic groups 1, 3, and 4 were more closely related to each other than to those from groups 2 and 5, and this topology was in line with the phylogenetic tree in Fig. 1, which used JAZ sequences from all five plant species.
Exon/intron structures of the maize JAZ gene family were compared to examine their evolutionary lineages (Fig. 2b). The results showed that ZmJAZ genes with close phylogenetic relationships had similar exon-intron patterns, including the number of exons, exon length, intron phases, and splicing patterns (Table 1). As shown in Fig. 2b, groups 1, 2, and 3 had five to six exons, group 4 had one to two exons, and group 5 had six to seven exons. However, since exon loss/gain and sequence polymorphisms were identified in the ZmJAZ genes, there is likely functional diversity in the gene family as well. JAZ gene structures in rice (Supplemental Fig. 1), sorghum (Supplemental Fig. 2), and Brachypodium (Supplemental Fig. 3) were also examined. Again, members of the same phylogenetic group shared the same exon-intron structure across the listed monocot species.
Although the gene sequences in the ZmJAZ family were fairly diverse, two characteristic domains were retained because of their importance for protein-protein interactions: the TIFY/ZIM domain is crucial for interactions of JAZ with other transcriptional regulators (e.g. NINJA, TPL), and the Jas domain is important for interactions with bHLH transcription factors (e.g. MYC2) and for COI1-mediated protein degradation in response to JA-Ile [8, 17, 46-50]. Within the Jas domain in particular, studies have revealed a degron sequence, LPIAR(R/K), at the N-terminal end and the consensus sequence RX5PY at the C-terminal end; the former is important for COI1/JA/JAZ complex formation and the latter serves as a nuclear localization signal (NLS) [12, 45, 51]. The phylogenetic relationship was also analyzed (Fig. 2a). To further examine the two conserved domains in ZmJAZ proteins, sequence logos for the TIFY and Jas domains (Fig. 2 and Supplemental Fig. 4) were created with WebLogo [52]. The results revealed that both domains (Fig. 2c and 2d) were highly conserved at multiple amino acid sites. Core domain sequences of the JAZ proteins from the four grass species are listed in Table 1 and Supplemental Tables 2-4, and the sequences from the same phylogenetic group were found to be highly conserved, with limited amino acid variation. In addition, another conserved motif, the cryptic MYC-interaction domain (CMID; FAX2CX2LSX3K/R), was found near the N-terminus of JAZ proteins (Fig. 2e) using a MEME motif search [53]. In Arabidopsis, functional CMIDs have been identified only in AtJAZ1 and AtJAZ10 [45]. In maize, the CMID was more commonly present in JAZ sequences from groups 1, 3, and 4, and the maize CMID logo was more similar to the CMID of AtJAZ1. Similar results were found in rice, sorghum, and Brachypodium (Supplemental Fig. 5). Interestingly, expression results from a previous study in rice suggested that only proteins containing this motif were induced by both JA and cold stress [42].
The ethylene-response factor amphiphilic repression (EAR) motif (LXLXL) was present at the N-terminus in group 2. This motif is found in NOVEL INTERACTOR OF JAZ (NINJA) and in some Arabidopsis JAZ proteins that recruit TOPLESS (TPL) scaffolding proteins to repress jasmonate responses [49].
Intraspecies synteny analysis and expansion patterns of the JAZ genes
Maize chromosomes contain large duplicated regions, implying that a whole genome duplication (WGD) occurred previously [54]. Such syntenic regions, derived from the same ancestral chromosomes, could provide some insight into the expansion of the ZmJAZ family. The self-self syntenic dotplot of the whole maize genome is presented in Fig. 3; because only syntenic gene pairs were plotted, it provides visual evidence for duplicated regions between maize chromosomes. On the dotplot, a high density of syntenic gene pairs between two chromosomes is represented by lines of various slopes, color-coded by the synonymous substitution rate Ks shown in Fig. 3b. When we examined the synteny blocks, three significant syntenic JAZ pairs were identified: ZmJAZ1a/1b and ZmJAZ3-1a/1b were located on the large syntenic block shared by chromosomes 2 and 7, and ZmJAZ2a/2b was located on another large syntenic block shared by chromosomes 2 and 10 (Fig. 3a). The other two pairs were observed on syntenic blocks shared by chromosomes 1 and 9 for the pair JAZ4-1a/1b and by chromosomes 7 and 2 for the pair JAZ5-1a/1b, where syntenic gene pairs are labeled with colored lines (Fig. 3c, d).
After WGD, both copies of certain duplicated genes were retained in the genome, such as the five JAZ homeolog pairs described above. Often, however, one or both copies were lost through deletion over time [55]. ZmJAZ3-2, ZmJAZ4-2, and ZmJAZ5-2 lost their own duplicated copies; however, they still shared a small syntenic region with ZmJAZ3-1a, ZmJAZ4-1b, and ZmJAZ5-1a, respectively, most likely as a result of an older WGD [56]. ZmJAZ4-2 and ZmJAZ4-1a were defined as a tandem duplication cluster on chromosome 1 because no more than one intervening gene lay between these two adjacent homologous genes [13]. This was the only tandem duplication event for JAZ genes in the maize chromosomes. Three genes (ZmJAZ4-3, ZmJAZ4-4, and ZmJAZ4-5) had neither synteny with other genes nor orthologs in other grass genomes (Fig. 1). The genes in group 4 also had the most variation in exon number (one to nine), indicating that loss and gain of exons/introns occurred throughout the evolution of the ZmJAZ family. For example, ZmJAZ4-3, ZmJAZ4-4, and ZmJAZ4-5 shared a common first exon, but the latter two acquired extra sets of small exons and large introns. A search of the Plant Genome Duplication Database [57] found retrotransposons mostly in genes from group 4. Given the presence of transposon repeats, together with the lack of synteny and of corresponding orthologs, ZmJAZ4-3, 4-4, and 4-5 might be the result of transposon duplication. In summary, 13 out of 16 JAZ genes were associated with chromosomal duplications, suggesting that these duplication events have contributed to the expansion of the maize JAZ gene family.
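The tandem duplication criterion used above (two homologous genes on the same chromosome separated by at most one intervening gene) can be sketched as follows. The function name, the gene order, and the placeholder gene IDs are hypothetical and serve only to illustrate the rule; they are not taken from the maize annotation.

```python
def tandem_pairs(gene_order, family, max_intervening=1):
    """Find neighbouring family members on one chromosome that are separated
    by at most `max_intervening` non-family genes.

    gene_order: gene IDs listed in their order along a single chromosome.
    family: set of gene IDs that belong to the gene family of interest.
    """
    positions = [i for i, gene in enumerate(gene_order) if gene in family]
    pairs = []
    for left, right in zip(positions, positions[1:]):
        # Number of genes lying strictly between the two family members.
        if right - left - 1 <= max_intervening:
            pairs.append((gene_order[left], gene_order[right]))
    return pairs


# Hypothetical gene order; "geneA" .. "geneE" are placeholders.
chromosome = ["geneA", "ZmJAZ4-1a", "geneB", "ZmJAZ4-2",
              "geneC", "geneD", "geneE", "otherJAZ"]
jaz_family = {"ZmJAZ4-1a", "ZmJAZ4-2", "otherJAZ"}
print(tandem_pairs(chromosome, jaz_family))
```

With this toy order, only the first two family members qualify, because three genes separate the second and third.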
Interspecies synteny analysis of the JAZ family among maize, rice, sorghum, and Brachypodium
Since all grass species have undergone multiple whole genome duplications (WGDs) and descend from a common paleopolyploid ancestor that lived some 70 million years ago (MYA) [58, 59], synteny is evident among different grass species. In this study, four published plant genomes (maize, sorghum, rice, and Brachypodium) were used to represent the grass lineages. To identify orthologous regions between maize and the other monocots, we generated several syntenic maps using the maize genome as a reference [60] (Fig. 4). Large-scale synteny blocks containing JAZ orthologs were present across the grass family, suggesting that JAZ genes in the grasses share a common ancestor.
Because of the recent WGD in maize, each orthologous region in the genomes of rice, sorghum, and Brachypodium corresponded to two homeologous regions in the maize genome [56]. For example, ZmJAZ1a/1b and 5-1a/1b from maize chromosome (chr) 2 and chr 7 aligned with the homologous region in rice chr 9, sorghum chr 2, and Brachypodium chr 4 (Fig. 4a). ZmJAZ2a/2b from maize chr 2 and chr 10 were syntenic with rice chr 4, sorghum chr 6, and Brachypodium chr 5 (Fig. 4b). ZmJAZ4-1a/1b and ZmJAZ4-2 from maize chr 1 and chr 9 were syntenic with rice chr 3, sorghum chr 1, and Brachypodium chr 1 (Fig. 4c). A summary of the syntenic blocks for ZmJAZ genes is given in Fig. 4d, including five primary syntenic regions (the five duplicated pairs from Fig. 3: ZmJAZ1, 2, 3-1, 4-1, 5-1) and three secondary syntenic regions for the JAZ singletons (ZmJAZ3-2, 4-2, and 5-2) in the four plant genomes. Notably, conservation of syntenic JAZ gene pairs was greatest between sorghum and maize, which corresponds to the shorter divergence time between these two species (12-18 MYA), although genomic rearrangements were also extensive in those genomes.
Strong purifying selection for JAZ genes in maize
Since most of the maize JAZ family was expanded by genome duplications, distances in terms of synonymous (dS or Ks) and nonsynonymous (dN or Ka) substitution rates were calculated using pair-wise comparisons of each JAZ orthologous group between maize and the four other plant species (Table 2). Within each inter-species comparison (maize-rice, maize-sorghum, maize-Brachypodium, and maize-Arabidopsis), dS and dN values were homogeneous across most of the orthologous gene groups; however, they differed widely between comparisons (ranging from 0.129 to 0.683 for dS and from 0.043 to 0.593 for dN). dS can often be used to estimate the relative age of homologous sequences [61]. Ranked by synonymous distance from maize, the four other plant species fell in the descending order Arabidopsis, Brachypodium, rice, and sorghum, which agrees with the divergence times expected from the phylogenetic lineage. The average dN and dS values between and within each maize syntenic JAZ gene pair were also estimated and are listed in Table 3. dS values varied among the syntenic pairs (0.181-0.434), with approximate values of 0.1-0.2 for ZmJAZ2 and 4 and 0.2-0.3 for ZmJAZ1 and 3, consistent with the timing of the recent WGD event that occurred 11-15 MYA [54]. The exception was the ZmJAZ5 gene pair, whose higher dS (0.434) indicated an older divergence between the two copies. Relatively higher dS values were also observed between different syntenic pairs, suggesting a longer divergence time between JAZ groups.
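As a rough illustration of how dS relates to relative age, a synonymous distance can be converted to an approximate divergence time with T = Ks / (2r), where r is the synonymous substitution rate per site per year. The text does not state which rate, if any, was used; the value of 6.5 × 10^-9 below is an assumption (a rate commonly applied to grass genes), and the function name is hypothetical.

```python
def divergence_time_mya(ks, rate_per_site_per_year=6.5e-9):
    """Approximate divergence time in million years from a synonymous distance.

    Uses T = Ks / (2 * r). The default rate is an assumed value, not one
    reported in this study.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    years = ks / (2.0 * rate_per_site_per_year)
    return years / 1e6


# The lowest and highest dS values reported for the maize syntenic pairs.
for ks in (0.181, 0.434):
    print(f"Ks = {ks}: about {divergence_time_mya(ks):.1f} MYA")
```

Under this assumed rate, the lowest pairwise dS falls inside the 11-15 MYA window cited for the recent WGD, while the ZmJAZ5 value implies a considerably older split. The absolute numbers shift in proportion to the rate chosen, so only the relative ordering should be read from such a calculation.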
Comparing orthologs from two species using the dN/dS ratio can reveal the type of selection pressure acting on the genes: a ratio of 1 indicates neutral evolution, a ratio >1 indicates positive selection, and a ratio <1 indicates purifying selection. In addition, a codon-based Z-test was conducted for each JAZ gene using the Nei-Gojobori method [62], testing the null hypothesis of neutrality (dN=dS) against the alternative of purifying selection (dN<dS); the results are listed in Tables 2 and 3 with p-values. Comparison of dS and dN showed that almost all groups of homologous JAZ genes were under strong purifying selection, with p-values less than 0.05. The only exception was group 4, for which the p-value exceeded 0.05, indicating that these genes may be evolving neutrally. As mentioned before, ZmJAZ4-1a and ZmJAZ4-2 were tandem repeats, and ZmJAZ4-3, 4-4, and 4-5 were transposon repeats without known orthologs in other plant species. The expansion of JAZ group 4 might therefore have happened after the recent WGD, since a higher dN/dS ratio suggests a more recent duplication event [63].
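The logic of the selection test can be sketched as below. This is a simplified illustration, not the analysis pipeline used here: it assumes that dN, dS, and their variances are already available (in practice the variances are usually estimated by bootstrap), and it applies a one-sided normal test of dN = dS against dN < dS. All function names and input values are hypothetical.

```python
import math


def purifying_z_test(d_n, d_s, var_d_n, var_d_s):
    """One-sided Z-test of H0: dN = dS against H1: dN < dS.

    Returns (z, p), where z = (dS - dN) / sqrt(Var(dS) + Var(dN)) and
    p is the upper-tail probability of a standard normal at z.
    """
    z = (d_s - d_n) / math.sqrt(var_d_n + var_d_s)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p


def interpret(d_n, d_s, p, alpha=0.05):
    """Label the outcome of the purifying-selection test."""
    if p < alpha and d_n < d_s:
        return "purifying selection"
    return "neutrality not rejected"


# Hypothetical values chosen only to show the arithmetic.
z, p = purifying_z_test(d_n=0.05, d_s=0.20, var_d_n=0.0004, var_d_s=0.0021)
print(f"z = {z:.2f}, p = {p:.4f}, {interpret(0.05, 0.20, p)}")
```

A p-value above the threshold, as reported for group 4, means only that neutrality could not be rejected, which is a weaker statement than positive evidence for neutral evolution.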
Cloning and characterizing three major homeologous JAZ genes from Mp708 and Tx601
This part of the study was undertaken to determine whether there were sequence differences in the JAZ genes of the insect-resistant genotype Mp708 and the susceptible genotype Tx601, since these two maize inbred lines differ in endogenous JA levels and in resistance against Lepidoptera. Based on the genomic identification of JAZ genes from the maize inbred B73, six of the 16 candidate JAZ genes were selected for further analysis: ZmJAZ1a/1b from group 1, ZmJAZ2a/2b from group 2, and ZmJAZ3-1a/3-1b from group 3. Genes from JAZ groups 1, 2, and 3 were selected for three reasons. First, they had the most conserved sequences when compared across plant JAZ families (Fig. 1), so there was a higher chance that JA regulatory function was preserved in these genes. Second, they had the highest reported expression in leaves and a predicted nuclear localization (Table 1). Third, since ZmJAZ1 and ZmJAZ3 were both phylogenetically and functionally closer to each other than to ZmJAZ2, they provided some diversity in the group. Both genomic DNA (gDNA) and cDNA sequences were amplified from Mp708 and Tx601 leaves. The amplified fragments were then cloned and sequenced and are listed in Table 4.
A comparison of the ZmJAZ protein sequences from Table 4 with the corresponding sequences from B73 is shown in Fig. 5a, with the conserved domains (TIFY and Jas) labeled accordingly. Amino acid sequences were quite conserved within homeologous pairs in all three inbreds: all ZmJAZ pairs exhibited >60% nucleotide sequence identity and >80% peptide sequence identity (Table 5a). In pair-wise comparisons between inbreds (Mp708 vs Tx601, Mp708 vs B73, and Tx601 vs B73), some polymorphism was present at both the nucleotide level (99%-100% identity) and the amino acid level (94%-100% identity) (Fig. 5 and Table 5b). Phylogenetic analysis using the aforementioned protein sequences (Fig. 2a) showed that ZmJAZ sequences from the inbreds Mp708, Tx601, and B73 clustered according to JAZ group, and a mini-cluster was formed for each homeologous pair. As in the previous analysis in Fig. 1, ZmJAZ proteins from groups 1 and 3 were more closely related to each other than to JAZ group 2. Protein sequence identity was highest between groups 1 and 3, ranging from 43% to 54%, and lower between groups 1 and 2 and between groups 2 and 3, ranging from 29% to 44% and from 24% to 38%, respectively.
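For readers unfamiliar with how identity percentages of this kind are obtained, the sketch below computes percent identity from two sequences that have already been aligned. The text does not specify the tool or the gap treatment used for Table 5; ignoring gapped columns is an assumption made here, and the function name and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped (an assumed
    convention; other tools count gaps differently).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not real ZmJAZ sequences.
print(round(percent_identity("TIFYGGKV", "TIFYAGKV"), 1))
print(round(percent_identity("TIFY-GKV", "TIFYAGKV"), 1))
```

Because the choice of denominator changes the result, identity values from different tools are comparable only when the same gap convention is used.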
To further explore variation in the conserved TIFY and Jas regions, detailed cDNA sequence alignments are shown in Fig. 5b and 5c, using the sequences of ZmJAZ1a/b, ZmJAZ2a/b, and ZmJAZ3-1a/b from Mp708, Tx601, and B73. The TIFY and Jas domains were very strongly conserved among the three inbreds, although polymorphisms existed at multiple sites. In general, there were more nucleotide substitutions between Mp708 and Tx601 than in comparisons with B73. Twelve out of 29 and 16 out of 27 amino acid sites were identical for the TIFY and Jas domains, respectively. Within each paralogous gene pair, polymorphisms were mostly at synonymous sites, as expected under purifying selection after the recent WGD. In contrast, polymorphisms were more prevalent at nonsynonymous sites in comparisons between inbreds, suggesting the possibility of functional divergence among inbreds.
To confirm the chromosomal location of each cloned ZmJAZ gene, PCR products were generated using gDNA from oat-maize addition lines [64] together with gDNA from the three maize inbred lines Mp708, Tx601, and B73 (Fig. 6). Chromosome specificity was defined by the presence of an amplified band from the maize (donor) gDNA and its absence from oat gDNA [64]. All ZmJAZ genes tested were at the locations predicted by the bioinformatics analysis, except for ZmJAZ3-1a. This gene was predicted to be located on chromosome 7 but showed a chromosome 2 band on the gel. One possible explanation is that a chromosomal rearrangement between chromosomes 7 and 2 occurred in the specific maize genome used to make the oat-maize addition lines, so that the location of the gene changed accordingly.
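The band-scoring rule for the oat-maize addition lines can be written as a small decision function. This is only a sketch of the presence/absence logic stated above; the data layout, the lane names, and the example scores are hypothetical and do not reproduce the gel in Fig. 6.

```python
def assign_chromosomes(bands, oat_lane="oat"):
    """Infer maize chromosome(s) carrying a gene from PCR band scores.

    bands: dict mapping a lane label to True (band present) or False.
           The oat parent lane is labelled `oat_lane`; every other key is
           the number of the maize chromosome added to that oat line.
    Returns a sorted list of chromosomes, or an empty list when the primers
    also amplify from oat and the assay is therefore uninformative.
    """
    if bands.get(oat_lane, False):
        return []
    return sorted(chrom for chrom, present in bands.items()
                  if chrom != oat_lane and present)


# Hypothetical scores for one primer pair.
scores = {"oat": False, 1: False, 2: True, 7: False, 10: False}
print(assign_chromosomes(scores))
```

A gene whose predicted and observed chromosomes disagree, as described for ZmJAZ3-1a, would show up as a mismatch between the returned list and the annotated location.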
At the sequence level, the three ZmJAZ gene pairs showed no major variations between Mp708 and Tx601, but differences were present at the transcriptional level (data to be published). Notably, there were several cases in which cDNAs of variable lengths were found in Mp708. These differences were clearly visible in the gene structure analysis using cDNA sequences (Fig. 2b). One example was ZmJAZ1b, which was significantly shorter in Mp708 than the corresponding gene in Tx601 because of the loss of the first two exons. Another example was ZmJAZ2a, for which there were two cDNA products in Mp708 (ZmJAZ2a and ZmJAZ2a’) but only one in Tx601. In particular, the two middle exons of ZmJAZ2a’ were merged in Mp708 but not in the other lines, indicating that alternative splicing may have occurred. A further significant difference between the Tx601 and Mp708 transcripts was that no cDNA product of ZmJAZ2b was amplified from Mp708, even when multiple sets of different primers were used. This suggested that ZmJAZ2b might not be expressed in Mp708 leaves, although expression was detected in Tx601. Overall, the three cloned ZmJAZ gene pairs showed only minor variations at the sequence level between the two inbreds, whereas more obvious differences were observed at the transcriptional level, suggesting genotype specificity in the expression of maize JAZ genes.