Identification and chromosomal distribution of WRKYs in soybean
To identify WRKY family genes in soybean, we used the WRKY domain (PF03106) as a query for the BLAST search in soybean, and the results were further confirmed in the SMART and NCBI CDD databases. A total of 185 GmWRKY members were identified in soybean, which is consistent with the number recorded in the PlantTFDB. The 185 GmWRKYs were unevenly positioned among the 20 soybean chromosomes (Chrs) and were named from GmWRKY1 to GmWRKY185 according to their physical location on the Chrs (Table S2 and Figure S1). Of the 185 WRKYs, 15 (8.1%) were located on Chr06, which harbored the greatest number of WRKY genes, followed by 14 genes (7.6%) on Chr09, while only 3 genes were mapped on Chr11, Chr12, and Chr20 (Figure S1).
Phylogenetic classification of the GmWRKYs
To reveal the phylogenetic relationship and evolutionary history of soybean GmWRKY proteins, a phylogenetic tree was constructed based on the full-length sequence alignment of 185 GmWRKYs and 72 Arabidopsis AtWRKYs (Fig. 1). Phylogenetic tree analysis reviewed that all the GmWRKYs were clustered into three groups (Group I, Group II and Group III) according to previously reported classification criteria [45], and Group II was further divided into five subgroups in this neighbor-joining (NJ) tree, designated as Group IIa to IIe (Table S2 and Fig. 1). Group I included 30 GmWRKYs (16.2%), featuring two WRKY domains and one C2H2 motif. Among them, GmWRKY4 and GmWRKY21 contained the WRKYGEK heptapeptide, while the others contained WRKYGQK (Table S2 and Figure S2). However, most of Group I members contained C2H2 (C-X4-C-X23-HXH) zinc finger structure apart from GmWRKY2, GmWRKY126 and GmWRKY127, which contained C2H2 (C-X5-C-X23-HXH, Figure S2). Group II included 130 GmWRKYs (70.3%), harboring one WRKY domain and one C2H2 motif, including Group IIa (7.6%), Group IIb (17.8%), Group IIc (23.8%), Group IId (10.3%), and Group IIe (10.8%). Among them, 115 GmWRKYs (88.5%) contained WRKYGQK, whereas GmWRKY105 in IIa and GmWRKY130/162/29/36/47/52/58/60/82 in IIc contained WRKYGKK, GmWRKY75, and GmWRKY108 in IIc contained WRKYGEK and WRKYGLK, respectively (Table S2). Additionally, 72 and 51 GmWRKYs (55.4% and 39.2%, respectively) contained C2H2 (C-X4-C-X23-HXH) and C2H2 (C-X5-C-X23-HXH), respectively (Table S2). Group III included 25 GmWRKYs (13.5%), harboring one WRKY domain and one C2HC motif. Except for GmWRKY46, the others all contained the WRKYGQK heptapeptide. Additionally, AtWRKY49 and GmWRKY70/168, AtWRKY40 and GmWRKY71, which are supposed to belong to Group IIc and Group IIa respectively, were all classified into Group I, indicating that they might be more homologous to members in Group I. Strikingly, a WRKYGQK-like stretch was lost in GmWRKY20, -104, -129; whereas C2H2 or C2HC was missing in GmWRKY20, -42, -71, -78, -93, -105, -111, -163 (Table S2 and Figure S2), however, the effects on functions need further investigation.
To further investigate the number and proportion of WRKY members in the plant genome, eight representative plant species, namely, Glycine max, Arabidopsis thaliana, Triticum aestivum, Cucumis sativus, Nicotiana tabacum, Zea mays, Akebia trifoliata, and Cymbidium sinense, were counted (Table S3 and Figure S3). The number of WRKY genes in Glycine max far exceeded that in other species, except for Triticum aestivum and Nicotiana tabacum, which are both allopolytraploid plant species. The proportions of GmWRKY proteins in Group I was generally the same in different species except for Akebia trifoliata and Cymbidium sinense. The number of WRKY proteins in Group II was generally greater than that in Group I and Group III for the eight plant species, ranging from 54.8–70.5%. Additionally, Group IIc had the most WRKY members in the eight representative plant species. These results suggested that the majority of members of Group II might experience a significant expansion in soybean and other species, which could be caused by whole-genome duplications, tandem duplications, or segmental duplications of genomes.
Conserved motif analysis and gene structure of GmWRKY genes
To further analyze the structural diversity of GmWRKYs in soybean, the phylogenetic tree has been constructed with only GmWRKY members (Fig. 2A). Each GmWRKY was clustered into the corresponding group, as shown in Fig. 1, suggesting that the phylogenetic tree was reasonable. The conserved motifs were subsequently analyzed using the MEME-suite (Fig. 2B). A total of ten distinct motifs (designated Motif 1to Motif 10) were identified in the GmWRKY proteins (Figure S4). Among them, motif 1 contains a WRKY domain (Figure S4), which is present in all the GmWRKY proteins except for GmWRKY62/30/104/49/129/119/20. The types and orders of motifs in the different groups appeared to be similar (Fig. 2B). For example, most of the GmWRKYs in Group I contained more than five motifs, namely, Motif 1, -2, -3, -4, and − 10, and the GmWRKYs in Group IIIcontained Motif 1, -2, -9 or -8 (Table S2). In addition, the GmWRKYs in Group II had various motif types and orders, for example, Group IIa contained Motif 1, -2, -5, and − 6, Group IIb contained Motif 1, -2, -3, -4, -5, -6, and − 7, Group IIc contained Motif 1, -2, -3, and − 4, Group IId contained Motif 1, -2, -4, -8, and − 9, Group IIe contained Motif 1, -2 and − 4 (Table S2), indicating that the GmWRKYs in Group II might have diverse functions in soybean.
To gain insight into the structural diversity and evolution of GmWRKYs in soybean, the number and arrangement of exons and introns were analyzed. The results showed that the GmWRKYs clustered into the same group had similar exon-intron structures (Fig. 2C). The number of introns varied from zero to seven, for example, GmWRKYs in Group I had three to five introns, GmWRKYs in Group II had zero to seven introns, and GmWRKYs in Group III had one, two or five introns (Table S2 and Figure S5A). Notably, GmWRKYs in IIb generally had more introns and more complex exon-intron structures than did those in the other groups (Table S2 and Figure S5B). However, the effects of diverse exon-intron organizations on the function of GmWRKYs need further study.
Duplication events and syntenic analysis of GmWRKY members
Two diploid soybean (Glycine max) species experienced two sequential WGD events, including a polyploidy event at ~ 59 MYA and a Glycine-specific WGD at ~ 8–13 MYA [46, 47]. Duplicated genes in soybean have been classified into five categories, namely, singletons, dispersed, proximal, tandem, and WGD (including segmental duplications), which strongly correlate with gene function and mode of duplication [46]. Therefore, we analyzed the duplication types of GmWRKY genes via MCScanX. The results showed that WGD events accounted for 96.7% (29/30), 90.9% (118/130), and 72.0% (18/25) in Group I, II, and III, respectively (Fig. 3A). In addition, dispersed duplication events accounted for 3.3% (1/30) and 5.4% (7/130) in Group I and II, respectively. Tandem and proximal duplication events occurred mainly in Group III, accounting for 24.0% (6/25) and 4.0% (1/25), respectively. According to previous report, tandem duplication events occur when genes are in a chromosomal region within 200 kb [48]. In this study, three sets of GmWRKYs (GmWRKY107/108/109, GmWRKY119/120/121/122, and GmWRKY135/136) in Groups IIc and III, were identified as tandem duplicate pairs on Chr10, 13 and 14 (Figure S1 and Table S2). WGD events were subsequently visualized within the soybean genome, and 220 segmental duplication pairs were found between 165 GmWRKY genes that were unevenly distributed on 20 chromosomes (Fig. 3B and Table S4). The Ka/Ks (nonsynonymous/synonymous substitutions) of 220 WGD gene pairs were calculated to be less than 1 (Table S4). Therefore, these findings indicated that the expansion of GmWRKYs was caused mainly by the WGD events under negative selection pressure during their evolution.
To further investigate the evolutionary mechanisms of GmWRKYs, the synthetic analysis of WRKYs in Arabidopsis was performed with GmWRKYs (Fig. 3C). The results showed that 76 GmWRKYs (41.1%) displayed syntenic relationships with 35 AtWRKYs (48.6%) in Arabidopsis, and 96 orthologous pairs were identified according to the whole-genome-wide comparative analysis (Table S5), indicating that these WRKYs may be derived from the same ancestor with similar functions.
Promoter analysis of GmWRKY genes
To explore the regulatory mechanisms of GmWRKYs, 32 types of cis-elements were identified in the promoters of GmWRKYs [49, 50], which were involved in hormone responses, stress responses, plant growth, and developmental response, and light responses (Fig. 4 and Table S6). Among them, twelve hormone-responsive elements were found, including Abscisic Acid (ABRE), Auxin (AuxRR-core, TGA-element and TGA-box), Gibberellin (GARE-motif, P-box and TATC-box), MeJA (CGTCA-motif and TGACG-motif), and Salicylic Acid (TCA-element and SARE), and Ethylene (ERE). Among the hormone-responsive elements, the ERE and ABRE were widely distributed among the promoters of GmWRKYs (Fig. 4A) and occupied the largest proportions (28.53% and 26.25% respectively) (Fig. 4B). Eight stress-responsive related elements were identified, including ARE (anaerobic induction), GC-motif, LTR (low-temperature), MBS (drought), TC-rich repeats (stress-responsive), W-box, WUN-motif (wound), and STRE (heat shock response). Remarkably, the W-box existed in 94 GmWRKYs, and can be specifically recognized and bound by WRKY TFs, indicating that these GmWRKYs may be regulated by themselves or other WRKY TFs at the transcriptional level. Eight elements were involved in plant growth and developmental response, including MBSI (MYB binding site), CAT-box (meristem specific element), CCGTCC-motif (meristem specific activation), GCN4-motif (endosperm-specific element), circadian (circadian control responsiveness), HD-Zip 1 (palisade mesophyll cells differentiation), O2-site (zein metabolism regulation), and RY-element (seed-specific regulation). Four elements involved in light response were detected, including Box 4, G-box, AE-box and MRE. Among them, the Box4 elements have a wide distribution in GmWRKY genes, which were identified in 177 (95.7%) members of GmWRKYs and occupied the largest proportions among light response-related motifs (68.33%), providing a reference for further study the functions of GmWRKYs in light response. Additionally, no cis-elements were found in the promoters of GmWRKY30 and GmWRKY78. Taken together, these findings suggested that GmWRKYs not only play vital roles in stress responses, but also in plant growth and development.
Expression patterns of GmWRKYs during SMV infection
In our previous studies, we found that NO and H2O2 played synergistic roles in callose deposition during SMV infection, which is crucial for restricting virus transport [15, 16]. To investigate the roles of GmWRKYs in response to SMV infection, we identified 60 GmWRKY genes (Fig. 5) that were differentially expressed during SMV infection in H2O2-associated transcriptome data [16]. Among them, seven and eight GmWRKYs belonged to Group I and Group III, respectively. In addition, 45 members belonged to Group II, including eight in Group IIa, nine in Group IIb, eighteen in Group IIc, three in Group IId, and eitht in Group IIe (Fig. 5A). Among them, GmWRKY162 had high expression levels in the incompatibility combination and was downregulated after imidazole treatment (Fig. 5A and 5B), indicating that GmWRKY162 play a positive role in response to SMV. Therefore, GmWRKY162 was selected for further functional analysis.
GmWRKY162 directly targets GmGSL7c to suppress virus spread through increasing callose deposition
To further investigate the function of GmWRKY162 in response to SMV infection, GmWRKY162-silenced Jidou 7 plants were generated via TRV-mediated VIGS, the results of which revealed an efficiency of more than 70% (Fig. 6A). Compared to that of the control (TRV:00), the area of cells with callose fluorescence under SMV infection was significantly decreased at 48, 72, and 120 dpi in GmWRKY162-silenced plants (Fig. 6B and 6C). Furthermore, compared to those in the control, disease symptoms were observed in the non-inoculated upper leaves of GmWRKY162-silenced plants at 15 dpi, and the expression of the coat protein (CP) gene of SMV was significantly induced (Fig. 6D and 6E), indicating that the silencing of GmWRKY162 reduces callose deposition and enhances virus spreading during SMV infection.
To further understand how GmWRKY162 works during SMV infection, 693 differentially expressed genes (DEGs) were identified from the transcriptome database [16], which have similar expression trends with GmWRKY162 (Fig. 7A and Table S7). Functional enrichment analysis revealed that genes associated with catalytic activity, oxidoreductase activity, membrane, and obsolete oxidation-reduction process were enriched (Fig. 7B). Among them, the catalytic activity term enriched the most genes (Table S8), which contained a glucan synthase-like gene (named GmGSL7c) involved in callose synthesis; the functional study of GmGSL7c was carried out in our previous report [51]. Furthermore, a W-box (GTCAA, -1901 bp to -1905 bp) was found through the prediction of the cis-element in the GmGSL7c promoter region (Fig. 7C and Table S9). To test this hypothesis, electrophoretic mobility shift assays (EMSAs) in vitro were performed, which revealed that GmWRKY162 could directly bind to the W-box motif of the GmGSL7c promoter (Fig. 7D). In this, the biotin-labeled probe containing a combination of the W-box from the GmGSL7c promoter was constructed and incubated with the recombinant protein His-tagged GmWRKY162, which caused a mobility shift, indicating that GmWRKY162 bound to W-box motif of the GmGSL7c promoter in vitro (Fig. 7D). Next, ChIP-qPCR was used to determine the enrichment of GmWRKY162-GFP in the P1 region (-1947 bp to -1879 bp, containing a W-box motif) and the P2 region (-781 bp to -735 bp, without a W-box motif) of the GmGSL7c promoter (Fig. 7E). These results showed that the enrichment of GmWRKY162-GFP was markedly greater in the P1 region than in the P2 region in the 35S:GmWRKY162-GFP plants, but not in the 35S:GFP plants (Fig. 7E). In summary, these results demonstrated that GmWRKY162 directly binds to the promoter of GmGSL7c in vitro and in vivo to promote its transcription, which increases callose deposition and suppresses virus spread (Fig. 7F).