Identification of CsWRKYs
A total of 243 candidate WRKYs were initially obtained. All putative WRKYs were further identified for the presence of complete WRKY-specific domains. One predicted WRKY protein (XP_019082498.1) was removed due to incomplete WRKY domain. The remaining 242 WRKYs were renamed CsWRKY1-CsWRKY242 respectively (Additional file 2: Table S2). Their characteristics were showed including the group and subgroup, gene locus ID, start and end position in the chromosomes, SL, MW and PI. The annotation of 242 WRKYs were scanned using available C. sativa genome database. All 202 CsWRKY genes coded 242 CsWRKYs. The formation of 70 CsWRKYs was by alternative splicing of 30 CsWRKY genes, the rest of CsWRKY genes only had one coding protein (Additional file 3: Table S3). The length of protein sequences ranged from 136 (CsWRKY136) to 1699 amino acids (CsWRKY47), with an average of 371 amino acids. The MW ranged from 15.98 kDa (CsWRKY136) to 190.70 kDa (PvWRKY34), with an average of 41.33 kDa. The PI ranged from 4.13 (CsWRKY204) to 10.47 (CsWRKY107), with an average of 7.38 (Additional file 2: Table S 2).
Multiple sequence alignment and comparative phylogenetic analysis
According to the highly conserved domain (about 60 amino acids) of each WRKY sequence, 242 CsWRKYs were aligned using Bioedit software. Some CsWRKYs selected randomly were showed in Figure 1 because of excessive sequences (Multiple sequence alignment of 242 CsWRKYs were displayed in Additional file 6: Figure S1).
The CsWRKYs included the highly conserved heptapeptide sequence (WRKYGQK) and the zinc-finger motif. A total number of 11 CsWRKYs in the group 2c only were different due to a change in one amino acid of WRKYGQK sequence, CsWRKY202 contained a WRKYGXK sequence, the others had a WRKYGKK sequence (CsWRKY127, CsWRKY128, CsWRKY135, CsWRKY223, CsWRKY228, CsWRKY229, CsWRKY230, CsWRKY232, CsWRKY233 and CsWRKY235).
The comparative phylogenetic tree was built using MEGA 7.0 based on the highly conserved WRKY domains (approximately 60 amino acids) of 242 CsWRKYs and 75 AtWRKYs. The classification of WRKYs was confirmed according to the specific conserved sequences of WRKYs and number of their WRKY zinc-finger motifs [32], which was consistent with some previous findings of A. thaliana [32] and Common bean [33]. According to the constructed phylogenetic tree of CsWRKYs and AtWRKYs, the WRKYs were classified into three main groups (group 1, 2 and 3) (Figure 2), 69 members belonged to group 1 (56 CsWRKYs), 198 members belonged to group 2 (149 CsWRKYs) and 50 members (37 CsWRKYs) to group 3. Group 2 was subdivided into five subgroups (2a, 2b, 2c, 2d and 2e), which had 16 (12 CsWRKYs), 35 (26 CsWRKYs), 82 (63 CsWRKYs), 34 (24 CsWRKYs) and 31 members (24 CsWRKYs) respectively, most of the CsWRKYs were present in subgroup 2c.
The CsWRKYs were put into the group 1 owing to their two WRKY conserved domains, which were located in the NT and C-terminal (CT) of these sequences respectively. In 242 CsWRKYs, 56 CsWRKYs of the group 1 had two WRKY conserved domains and the zinc-finger domains (C–X4–C–X22-23–H–X1–H). The remaining 186 CsWRKYs contained one WRKY conserved domain. 149 CsWRKYs of the group 2 had the zinc-finger domain of C–X4–5–C–X23–24–H–X1–H. 50 CsWRKYs of the group 3 had the zinc-finger domain of C–X7–C–X23–H–X1–C. There were exceptions, a number of CsWRKYs lacked some amino acids (CsWRKY100, CsWRKY101, CsWRKY111 CsWRKY186, CsWRKY201, CsWRKY208, CsWRKY226, CsWRKY240, CsWRKY241, CsWRKY242), which might be the reason of the incomplete sequencing database.
Each WRKY domain had one highly conserved intron structure except the N-terminal conserved WRKY domain of the WRKYs in the group 1, and the position also was rather conservative (Figure 1). There were two kinds of introns, the constant amino acid PR intron was between the WRKY sequence and the zinc-finger domain in some groups (group 1-CT, 2c, 2d, 2e and 3). The other intron VQR was located in the interior of zinc-finger structure (C–X4–5–C–X5–VQR–X18–19–H–X1–H) in the group 2a and 2b [32]. The conserved intron made the identification of the correct WRKY domain easier.
Some specific WRKYs contained R proteins and the WRKY domains that belonged to the group R, which were one of the most significant features of the WRKY family genes in flowering plants [25]. For example, A. thaliana had three R protein-WRKY genes (AtWRKY16, AtWRKY19 and AtWRKY52) [26]. In the phylogenetic tree, AtWRKY19 and CsWRKY47 were gathered to the group 1, the other four sequences (AtWRKY16, AtWRKY52, CsWRKY186 and CsWRKY220) were clustered into the group 2d. The classified standard of R protein-WRKY genes have been characterized by Rinerson et al [26]. According to the further protein architecture analysis of CsWRKY47, CsWRKY186 and CsWRKY220, CsWRKY47 contained the typical domain of PAH-WRKY(1-NT)-WRKY(1-CT)-NB-ARC and the protein kinase domain in the C-terminal end of the sequence, which belonged to the group RW3 and was the only member. CsWRKY186 had the typical domain of NB-ARC-LRR-WRKY. CsWRKY220 had the TIR-NB-ARC-LRR-WRKY domain, which indicated that CsWRKY186 and CsWRKY220 were peculiar and not belong to any group. The three CsWRKYs formed possibly in order to adapt to especial change during the growth.
Motifs analysis and chromosomal mapping
A total of 10 motifs in all CsWRKYs were analyzed using the MEME, the motifs identified ranged from 15 to 50 residues in width (Figure 3) and shown with colored boxes according to the scale (Additional file 7: Figure S2). The CsWRKYs were clustered into groups according to the similar quantity and structure of motifs. The information of each motif was displayed in Figure 3. Motif 1, 2 and 3 were found to code the WRKY conserved domain. All CsWRKYs had more than two motifs, and some CsWRKYs holded 7 kinds of motif at most. The motifs were diverse among different groups. For example, the group 1 had 9 motifs (motif 1, 2, 3, 4, 5, 6, 8, 9 and 10), the number of motif 1, 2 and 3 was two, and motif 6 and 9 were unique to these members of group 1, which may be significant to the specific functional WRKYs. Motif 10 only appeared both group 1 and 2c. Motif 4 and 8 were present in group 1, 2b and 2c. Motif 7 was present in group 2d, 2e and 3. Motif 5 was absent in group 2c and 2e merely. The number of motif 5 was two in all members of the group 2a. As a whole, the motif analysis of CsWRKYs showed that every group possibly had peculiar conservation, corresponding to the grouping of the phylogenetic tree.
All identified CsWRKY genes were distributed across all C. sativa chromosomes using MapChart (Chr1–Chr20, Figure 4). Some genes had alternative splicing, so some protein name represented one gene name (in the Additional file 7: Figure S2), the rest of WRKY genes were represented by corresponding protein labels. The map revealed the position of CsWRKY179 was not mapped to any chromosome because it’s present on scaffold region. The remaining distribution of 201 CsWRKY genes on the chromosomes was uneven. Some chromosomes had the same number of CsWRKY genes. For example, the minimal number of CsWRKY genes only was three in several chromosomes (chromosome 1, 15 and 19). Chromosome 3 and 17, chromosome 12 and 20, chromosome 16 and 7 had 6, 10, 15 genes respectively. Some chromosomes (chromosome 2, 8, 9 and 13) had 11 CsWRKY genes. A maximum number of 20 CsWRKY genes on chromosome 11 accounted for 8.3% of all CsWRKY genes. WRKY genes had variant distributions in different species, such as Chickpea [34], Kiwifruit [24] and Vitis vinifera [35].
Gene duplication, synteny analysis and selection pressure analyses
Gene duplication played a vital driving force in gene evolution including tandem and segmental duplication. The two main evolutionary patterns played a prominent role in the generation of new genes and the evolutionary formation of new functions, which was also the pivotal reason to expand the member of gene family in plants [22, 29, 36-38]. In order to analyze the expansion patterns of CsWRKY genes, tandem duplication event (the distance of two or more genes within 200kb in equal chromosome) was confirmed on the basis of the method by Guo et al [35]. The results indicated that the gene duplication of the CsWRKYs contained no tandem duplication. Meanwhile, the gene duplication of CsWRKYs among C. sativa chromosomes was identified using Blastp and MCScanX. The analysis showed that there was no tandem duplication, which was consistent with above result. There were 137 segmental duplication events (Figure 5, Additional file 4: Table S4). These revealed that certain CsWRKYs appeared with the segmental duplication events to a great extent. It has been reported that there were 17 and 55 segmental duplication events in pineapple and kiwifruit chromosomes [24, 27]. The above result indicated that segmental duplication was an uppermost mode of the evolution of CsWRKY genes.
The comparative synteny maps of two different genomes (C. sativa and A. thaliana, C. sativa and B. rapa) were performed in order to explore the origin and evolution of CsWRKY genes (Figure 6, Additional file 5: Table S5). The corresponding orthologs between 173 CsWRKY genes and 65 AtWRKY genes were confirmed and had 173 gene pairs. The corresponding orthologs of 166 CsWRKY genes and 111 BrWRKY genes had 282 gene pairs. The multiple CsWRKY genes corresponded to a single AtWRKY gene. These syntenic relationship showed the expansion of CsWRKY genes possibly occurred after A. thaliana in evolution.
Expression profiling analysis
The transcript expression levels of 202 CsWRKY genes were displayed in Figure 7. The expression number of CsWRKY genes approximately reached 89.11% in R, followed by 88.61% (F), 81.68% (IF), 80.69% (S), 78.71% (OL), 77.23% (ESD), 76.73% (EMSD) and 75.25% (GS). The percent of CsWRKY genes expression in other tissues were 68.32% (C), 67.82% (LMSD) and 64.85% (YL) respectively. The lowest expression percent was 52.48% in LSD. The 70 CsWRKY genes (34.65%) expressed in twelve tissues including 28 CsWRKY genes in group 1, 41 CsWRKY genes in group 2 and 1 CsWRKY gene in group 3. Simultaneously, the 24 CsWRKY genes highly expressed in every tissue. Furthermore, CsWRKY36 and CsWRKY227 not expressed in every tissue. Some CsWRKY genes also had the tissue specific expression pattern (Figure 7). In group 1, there were two genes (CsWRKY25 and CsWRKY34) co-expressed in IF and LSD. CsWRKY174 in group 2b was co-expressed in GS and R. In group 2c, CsWRKY182 and CsWRKY202 specifically were expressed in F, CsWRKY204 only was expressed in ESD, CsWRKY73 was co-expressed in F and OL, CsWRKY203 was co-expressed in F and GS. CsWRKY118 in group 2e was co-expressed in GS and S. In group 3, there were three genes (CsWRKY111, CsWRKY208 and CsWRKY226) merely expressed in R, CsWRKY100 was expressed both in ESD and R.
The functions of most AtWRKY genes had been verified. It has been reported that AtWRKY25 and AtWRKY33 were sensitive to NaCl, and overexpression of either gene enhanced tolerance to NaCl [39], which belonged to group 1. According to structural features and phylogenetic analysis of CsWRKYs and AtWRKYs, we found that AtWRKY33 was the ortholog of CsWRKY8, CsWRKY9 and CsWRKY10, AtWRKY25 was the ortholog of CsWRKY48, CsWRKY49 and CsWRKY50. These CsWRKY genes belonged to group 1. To provide further clues to the function of CsWRKY genes under SS, the expression levels of six CsWRKY genes in roots and shoots under SS and normal conditions (NC) were displayed in Figure 8. These CsWRKY genes had higher levels in shoots than that in roots under same conditions. Compared with gene expression levels in NC, three CsWRKY genes (CsWRKY8, CsWRKY9 and CsWRKY10) under SS had a significant down-regulated expression in roots, three other CsWRKY genes (CsWRKY48, CsWRKY49 and CsWRKY50) under SS had no obvious change in roots. These CsWRKY genes had higher levels of expression to varying degrees under SS in shoots. In particular, CsWRKY50 were significantly up-regulated in shoots under SS. The expression analyses implied that these CsWRKY genes played an important part in the response to SS. These results meant that most genes were deliberately expressed in space and time, which were not useless genes. They may have functional differentiation in the process of evolution to cope with different environmental changes.