Identification of 242 WRKY family members in Camelina sativa
To identify the WRKY proteins encoded in the camelina genome, all 72 Arabidopsis AtWRKY protein sequences were used as queries to search the publicly available genome sequences of C. sativa by BLAST (Basic Local Alignment Search Tool), and then they were examined using an HMM (hidden Markov model) with the WRKY-domain (PF03106). A total of 243 putative WRKYs were initially obtained from Camelina sativa. One predicted WRKY protein (XP_019082498.1) was removed due to the incomplete WRKY domain. The remaining 242 WRKYs (see Additional file 2: Table S2) were renamed from CsWRKY1 to CsWRKY242, and their characteristics were further specified, including their gene locus ID, gene start and end position in the chromosomes, member classification, protein sequence length (SL), molecular weight (MW) and isoelectric point (pI). The length of the CsWRKY proteins ranged from 136 (CsWRKY136) to 1699 amino acids (CsWRKY47), with an average of 371 amino acids. Their MWs ranged from 15.98 kDa (CsWRKY136) to 190.70 kDa (PvWRKY34), with an average of 41.33 kDa. The pIs of the CsWRKYs ranged from 4.13 (CsWRKY204) to 10.47 (CsWRKY107), with an average of 7.38 (Additional file 2: Table S2).
Compared with the numbers of WRKY family members reported in other plant species, the CsWRKY family with 224 gene loci coding for 242 proteins is one of the largest WRKY families, and it is slightly smaller than that of Brassica napus (287 WRKY genes) (Additional file 1: Table S1), indicating that this TF family was extensively expanded in C. sativa during its evolutionary process.
Alternative splicing (AS) occurs among 15 CsWRKY genes, producing 33 spliced ORFs
Alternative splicing (AS) is an important post-transcriptional regulatory mechanism that causes one gene locus to generate two or more ORFs (open reading frame) that differ in their initiation and termination sites as well as splice donor and acceptor sites. These alternative ORFs encode different protein variants from the same gene locus. In this study, we found that 15 CsWRKY genes have different types of AS events such as exon skipping and intron retention, producing 33 spliced variants of CsWRKY (see Additional file 3: Table S3, Additional file 8: figure S1). All the splice variants were completely matched by the corresponding mRNAs identified in camelina. For example, Csa06g039950 had two spliced variants (CsWRKY21 and 22) while Csa10g016180 generated three spliced variants (CsWRKY41, 35, and 38). These results indicate that alternative splicing might act in regulating the function of WRKY family members.
Multiple sequence alignment and phylogenetic analysis of CsWRKY proteins
To investigate the evolutionary relationships among CsWRKY proteins, a multiple protein sequence alignment of all 242 CsWRKYs was conducted using ClustalW with Bioedit software. The alignment result of the randomly-selected CsWRKYs is shown in Figure 1 (for details, see Additional file 9: Figure S2). Subsequently, based on the highly conserved WRKY domains of 242 CsWRKYs and 72 AtWRKYs, a phylogenetic tree was built with MEGA7.0 software using the neighbour-joining method (Figure 2).
The WRKYGQK heptapeptide is considered as the signature of WRKY proteins. As shown in Figure 1, however, WRKY domain amino acid sequence variants were detected among 11 CsWRKYs (all in group IIc), although WRKYGQK was the most common variant present in 231 CsWRKYs. For the WRKYGQK sequence, most variations involved Q to K substitutions, with a few cases from Q to other amino acids. For example, the variant WRKYGXK sequence (X is any amino acid) was detected in CsWRKY202, but the variant WRKYGKK sequence was present in 10 CsWRKYs including CsWRKY127, 128, 135, 223, 228, 229, 230, 232, 233 and 235. All these variants occurred in group IIc, showing that the WRKYGQK sequence in subgroup IIc WRKY proteins are prone to mutation, similar to the WRKY families from peanuts [34] and soybeans [32].
According to the constructed phylogenetic tree containing CsWRKYs and AtWRKYs (Figure 2), the CsWRKYs were classified into three primary groups (groups I, II and III), with 56 CsWRKYs in group I, 149 CsWRKYs in group II, and 37 CsWRKYs in group III. Moreover, the CsWRKYs in group II were further classified into five subgroups (groups IIa, IIb, IIc, IId and IIe) containing 12, 26, 63, 24, and 24 members of CsWRKYs, respectively. Notably, among all these groups or subgroups, most members of the CsWRKYs were present in subgroup IIc, which is similar to AtWRKY proteins. The combined description above indicates that WRKYGQK sequence variants were only present in subgroup IIc, suggesting that subgroup IIc WRKY proteins might be involved in a variety of biological functions.
Of the 242 CsWRKYs found here, 56 CsWRKYs in group I had two conserved WRKY domains located in the N- and C-termini of the protein as well as the zinc-finger domains of C-X4-C-X22-23-H-X1-H. A total of 186 CsWRKYs in groups II and III contained one WRKY domain. The 149 CsWRKYs in group II had the zinc-finger domain C-X4-5-C-X23-24-H-X1-H while 37 CsWRKYs in group III had the zinc-finger domain C-X7-C-X23-H-X1-C. It is notable that based on the multiple sequence alignment, either one or two more mismatched amino acids were detected within the conserved regions of 144 CsWRKY proteins, including CsWRKY100, 101, 111, 186, 201, 208, 226, 240, 241, and 242. These annotation errors might be caused by genomic sequencing or gene prediction software.
As shown in Figure 1, each WRKY domain had at least one conserved intron structure except the N-terminal WRKY domain of the group I WRKYs. The intron insertion position was also rather conserved. Two primary types of intron structures were present in the conserved regions of the CsWRKY genes, much like those conserved in AtWRKYs [35]. One of the introns is a PR intron, which was spliced at the codon of R amino acid. The other is the VQR intron. The PR intron is located in the WRKY domains of CsWRKY genes in group III and subgroups Ic and IIc–e. However, the VQR intron is present within the zinc-finger structure (C-X4-5-C-X5-VQR-X18-19-H-X1-H) in groups IIa and IIb.
A number of WRKY proteins contain both R-protein conserved domains and the typical WRKY domain, and thus they are named R-protein WRKYs, which are one of the most significant features of the WRKY proteins in flowering plants [25]. For example, Arabidopsis had three R protein-WRKYs (AtWRKY16, AtWRKY19 and AtWRKY52) [26]. Here, the phylogenetic analysis showed that AtWRKY19 and CsWRKY47 were clustered closely in group I while AtWRKY16, AtWRKY52, CsWRKY186 and CsWRKY220 were highly classified together into subgroup IId. This result indicated that CsWRKY47, CsWRKY186 and CsWRKY220 may belong to the R-protein WRKYs. A further protein architecture analysis identified three CsWRKY proteins as the R-protein WRKYs. CsWRKY47 contained the typical domain PAH-WRKY (1-N terminus)-WRKY (1-C terminus)-NB-ARC and the protein kinase domain at the C-terminal end of the protein. CsWRKY186 had the typical domain NB-ARC-LRR-WRKY. CsWRKY220 had the TIR-NB-ARC-LRR-WRKY domain. The three R-protein CsWRKYs might be involved in camelina responses to biotic and abiotic stress.
Ten conserved motifs were detected in CsWRKY proteins
To gain insight into the functional regions of CsWRKY proteins, the MEME (Multiple Em for Motif Elicitation) program was employed to reveal the conserved motifs among 242 CsWRKY proteins. A total of 10 conserved motifs were identified, namely Motifs 1-10 (Figure 3). These 10 conserved motifs are indicated with coloured boxes according to their scale (Additional file 10: Figure S3), with sizes ranging from 15 to 50 aa residues in width (Figure 3). Among these individuals, motif 1 was found to encode the conserved WRKY domain and motifs 2 and 3 were found to encode the conserved zinc finger structure. One or two WRKY motifs were detected in all the CsWRKYs as described in the sequence alignment (Figure 1). In addition to the conserved WRKY motif, other conserved motifs (motifs 4-10) were also predicted to exist among the CsWRKY proteins. Each CsWRKY protein had at least three conserved motifs, with the maximum number being seven conserved motifs for several CsWRKYs. The distributions of the conserved motifs were diverse among the different CsWRKY groups. For example, group I CsWRKYs had nine motifs (motifs 1, 2, 3, 4, 5, 6, 8, 9 and 10), with each having at least two of the motif 1 and one zinc finger (motifs 2 and 3) as well as motifs 6 and 9, which were unique to the CsWRKY members in group I. Motif 10 only appeared in group I and subgroup IIc. Motifs 4 and 8 were present in group I and subgroups IIb and IIc. Motif 7 was present in subgroups IId and IIe and group III. Motif 5 was only absent from subgroups IIc and IIe. However, all the CsWRKY members of group IIa contained two conserved Motif 5s. On the whole, the motif analysis of CsWRKYs showed that every group or subgroup of CsWRKYs had similar motif compositions corresponding to the grouping given by the phylogenetic tree analysis.
CsWRKY genes were unevenly distributed on chromosomes
To determine the genomic distribution of the CsWRKY genes, all 242 identified CsWRKY mRNAs/ORFs were mapped onto their corresponding chromosome by BLAST against the released genome for C. sativa. A total of 224 CsWRKY gene loci were distributed across all C. sativa chromosomes visualized by MapChart (Chr1-Chr20, Figure 4), with 15 CsWRKY gene loci generating 33 alternatively spliced variants of CsWRKY ORFs (Additional file 3: Table S3) (for details, see Additional file 8: figure S1). However, the distribution and density of the WRKY genes on each chromosome were uneven. For example, the minimal number of CsWRKY genes (three loci) were located on chromosomes 1, 15 and 19, whereas the largest number of CsWRKY genes (20 loci) were detected on chromosome 11, which accounted for 8.3% of all the CsWRKY genes. Moreover, the same numbers of CsWRKY genes (11 loci) were present on chromosomes 2, 8, 9 and 13. One exception is that CsWRKY179 was not mapped to any chromosome since it is present on a scaffold region. Interestingly, a few regions with a higher density of CsWRKY genes were observed on some chromosomes, such as chromosomes 10, 11, and 16, suggesting that there might be WRKY gene hot spots in the camelina genome.
CsWRKY gene family experienced 137 segmental duplication events, with high synteny with WRKYs from Arabidopsis and Brassica rapa
As described above, the CsWRKY gene family expanded greatly compared to other plant WRKY families, ranking as the second most common in the quantity of WRKY members tested so far. To elucidate the mechanism underlying WRKY gene family expansion in C. sativa, BLASTp and MCScanX (Multiple Collinearity Scan toolkit) were employed to identify gene duplication patterns (tandem and segmental duplication), which were considered to provide genetic materials to generate new genes and support the evolutionary formation of new gene biological functions [32, 36-39]. The results showed that 137 segmental duplication events were detected for 146 CsWRKY genes (Figure 5, Additional file 4: Table S4). However, no tandem duplication was observed for any CsWRKY gene. No tandem duplication event was further confirmed on the basis of the method by Guo et al [40]. Compared to other tested plant WRKY gene families (see Additional file 5: Table S5), the CsWRKY genes experienced the largest number of segmental duplication events. These events revealed that segmental duplication was a major driving force for CsWRKY gene evolution.
The comparative synteny maps of two related genomes (C. sativa VS A. thaliana, and C. sativa VS Brassica rapa (B. rapa)) were created to explore the origin and evolution of CsWRKY genes (Figure 6, Additional file 6: Table S6). Orthologous relationships were detected between 173 CsWRKY genes and 65 AtWRKY genes, and then 173 orthologous WRKY gene pairs were identified accordingly, with most of them located on the syntenic locus in Arabidopsis and C. sativa chromosomes. Similarly, the orthologous relationships were also present between 166 CsWRKY genes and 111 BrWRKY genes, and the corresponding 282 orthologous WRKY gene pairs were built, with many found on the syntenic locus in the chromosomes of C. sativa and B. rapa. Remarkably, multiple CsWRKY genes were identified as putative orthologs of a single AtWRKY gene. For example, CsWRKY123, CsWRKY124 CsWRKY125 and CsWRKY126 were the orthologs of AtWRKY39. This syntenic relationship detection in these WRKY genes indicates that the expansion of CsWRKY genes may have occurred after that of A. thaliana in evolution.
To investigate whether these orthologous WRKY genes underwent selection pressure (purifying and positive selection), the synonymous substitution rates (Ks) and non-synonymous substitution rates (Ka) of the identified orthologous CsWRKY gene pair were calculated using KaKs Calculator 2.0, followed by calculations of the Ka/Ks to determine if the selective pressure acts on protein-coding CsWRKY genes or not [41]. Interestingly, the Ks and Ka values for all the orthologous CsWRKY gene pairs had a ω value of <1, indicating that purifying selection occurred in these gene pairs, which is consistent with the reports on B. rapa by Tang et al [18].
Expression patterns of CsWRKY genes in twelve camelina tissues
The gene expression pattern can provide essential information to determine the biological function of a gene. To explore the possible functions of CsWRKY genes in C sativa growth and development, transcriptome data from C. sativa under normal growth conditions were downloaded from the publicly available database [6] and used to examine the expression patterns of 202 CsWRKY genes in twelve tissues/organs, including the root (R), stem (S), young leaf (YL), mature leaf (OL), flower (F), inflorescence (IF), early seed development (ESD), early-mid seed development (EMSD), late-mid seed development (LMSD), late seed development (LSD), germinating seed (GS) and cotyledon (C). A heat map illustration of the expression profiles for the CsWRKY genes is shown in Figure 7.
Notably, half more tested CsWRKY genes were found to be expressed in at least one of those tissues, although CsWRKY36 and 227 did not display any detectable expression. For example, 180 CsWRKY genes (89.11%) were expressed in R while 106 CsWRKY genes (52.48%) accumulated in LSD. The percentages of expressed CsWRKY gene numbers were 88.61% in F, 81.68% in IF, 80.69% in S, 78.71% in OL, 77.23% in ESD, 76.73% in EMSD, 75.25% in GS, 68.32% in C, 67.82% in LMSD, and 64.85% in YL. The 70 CsWRKY genes (34.65%) were expressed in all twelve tissues, including 28 CsWRKYs in group I, 41 CsWRKYs in group II, and one CsWRKY gene in group III. In particular, 24 CsWRKY genes displayed high expression levels in all the tissues. Furthermore, a number of CsWRKY genes exhibited a tissue-specific expression pattern. For example, genes that were expressed preferentially in two tissues included CsWRKY25 and 34 from group I in IF and LSD, CsWRKY174 of group IIb in GS and R, CsWRKY73 and 203 from group IIc in F and OL or GS, CsWRKY118 from group IIe in GS and S, and CsWRKY100 from group III in ESD and R. The genes that were specifically expressed in one tissue were CsWRKY182 and 202 as well as 204 from group IIc in F and ESD, respectively, and CsWRKY111, 208 and 226 from group III in R. This expression analysis indicates that most CsWRKY genes may act constitutively during plant organ development, with a number of them working differentially in different tissues.
Number of CsWRKY genes expressed in response to various abiotic stresses
The functions of most AtWRKY genes have been verified, which can be used to infer the potential roles of the CsWRKY genes clustered together with the AtWRKYs in the phylogenetic tree. For example, AtWRKY25 and 33 were sensitive to NaCl stress, and overexpressing either of them enhanced NaCl tolerance in Arabidopsis [42]. According to the structural features and phylogenetic analysis of CsWRKYs and AtWRKYs, AtWRKY33 was identified to be the ortholog of CsWRKY8, 9 and 10, while AtWRKY25 was the ortholog of CsWRKY48, 49, and 50, indicating that these six CsWRKY genes may participate in plant responses to NaCl stress. Subsequently, to verify this prediction, the expression profiles of the six CsWRKY genes were examined in the roots and shoots under salt stress (SS) and normal conditions (NC) by quantitative RT-PCR (qRT-PCR) (Figure 8). These six genes (CsWRKY8, 9, 10, 48, 49, and 50) showed higher levels of expression in the shoots than in the roots under SS. Compared with the gene expression levels in NC, the expression of three CsWRKY genes (CsWRKY8, 9, and 10) under SS were significantly downregulated in the roots, whereas the other three CsWRKY genes (CsWRKY48, 49, and 50) showed no obvious change in the roots. In particular, the CsWRKY50 transcript was significantly upregulated in the shoots under SS. These results implied that the six CsWRKY genes may positively regulate the SS response in the shoot, whereas three of them (CsWRKY8, 9, and 10) may act negatively in the roots.
As described above, 15 CsWRKY gene loci contain alternative structures. To investigate whether all the alternative splice variants from a CsWRKY gene locus are involved in plant responses to various stresses, a total of 5 splice transcripts derived from two CsWRKY gene loci were selected to examine their expression levels in camelina seedlings treated with cold stress (CS), drought stress (DS) and, and SS, respectively. As shown in Figure 9, these five splice isoforms showed different expressions under the three stress conditions, with three of them being expressed 10-fold higher than the other two. CsWRKY21 and 22, the two splice variants from gene locus Csa06g039950, both showed the highest expression at almost the same level under SS among the three treatments. However, CsWRKY21 exhibited significantly greater expression under CS than under DS. By contrast, the CsWRKY22 expression was greater under DS than under CS. Similarly, CsWRKY35, 38 and 41, the three spliced variants from gene locus Csa10g016180, were expressed at different levels, with CsWRKY35, 38, and 41 being the predominant transcripts in response to DS, CS, and SS, respectively. More strikingly, these data led to unexpected findings that alternative slice variants from the same gene locus may function differentially in plant responses to different stresses.