Identification of 242 WRKY family members in Camelina sativa
To identify WRKY proteins encoded in camelina genome, all 75 Arabidopsis AtWRKY protein sequences were used as queries to search the publicly available genome sequences of C. sativa by BLAST (Basic Local Alignment Search Tool), and then examined using HMM (Hidden Markov Model) with the WRKY-domain (PF03106). A total of 243 putative WRKYs were initially obtained in Camelina sativa. One predicted WRKY protein (XP_019082498.1) was removed due to incomplete WRKY domain. The remaining 242 WRKYs (see Additional file 2: Table S2) were renamed from CsWRKY1 to CsWRKY242, respectively, and their characteristics were further detected, including gene locus ID, gene start and end position in the chromosomes, member classification, the protein sequence length (SL), molecular weight (MW) and isoelectric point (pI). The length of CsWRKY proteins ranged from 136 (CsWRKY136) to 1699 amino acids (CsWRKY47), with an average of 371 amino acids. Their MWs ranged from 15.98 kDa (CsWRKY136) to 190.70 kDa (PvWRKY34), with an average of 41.33 kDa. The pIs of CsWRKYs ranged from 4.13 (CsWRKY204) to 10.47 (CsWRKY107), with an average of 7.38 (Additional file 2: Table S2).
Compared with the numbers of WRKY family members reported in other plant species, CsWRKYs family with 224 gene loci coded 242 proteins is one of the largest WRKY families, just less than that in Brassica napus (287 WRKY genes) (Additional file 1: Table S1), indicating that this TF family was extensively expanded in C. sativa during evolution process.
Alternative splicing (AS) occurs among 15 CsWRKY genes, producing 33 spliced ORFs
Alternative splicing (AS) is an important post-transcriptional regulatory mechanism, which causes one gene locus to generate two or more ORFs (open reading frame) different in initiation and termination sites, as well as splice donor and acceptor sites. These alternative ORFs encode different protein variants from the same gene locus. In this study, we identified that 15 CsWRKY genes have different types of AS events such as exon skipping and intron retention, producing 33 spliced variants of CsWRKY (see Additional file 3: Table S3, Additional file 8: figure S1). All the splice variants were completely matched by their corresponding mRNAs identified in camelina. For example, Csa06g039950 had two spliced variants (CsWRKY21 and 22) while Csa10g016180 generated three spliced variants (CsWRKY41, 35, and 38). These results indicate that alternative splicing might be worked in regulating the function of WRKY family members.
Multiple sequence alignment and phylogenetic analysis of CsWRKY proteins
To investigate the evolutionary relationship among CsWRKY proteins, a multiple protein sequence alignment of all 242 CsWRKYs was conducted using ClustalW with Bioedit software. Alignment result of the randomly-selected CsWRKYs was showed in Figure 1(For details to see Additional file 9: Figure S2). Subsequently, based on the highly conserved WRKY domains of 242 CsWRKYs and 75 AtWRKYs, a phylogenic tree was built by MEGA7.0 software using the neighbor-joining method (Figure 2).
WRKYGQK heptapeptide is considered as the signature for WRKY proteins. As shown in Figure 1, however, WRKY domain amino acid sequence variants were detected among 11 CsWRKYs (all in group IIc) although WRKYGQK was the most common variant present in 231 CsWRKYs. For WRKYGQK sequence, most variations involved Q to K substitutions, with a few cases from Q to other amino acids. For example, the variant WRKYGXK sequence (X is any amino acid) was detected in CsWRKY202, but the variant WRKYGKK sequence were present in 10 CsWRKYs including CsWRKY127, 128, 135, 223, 228, 229, 230, 232, 233 and 235. All these variants were occurred in group IIc, showing that WRKYGQK sequence in subgroup IIc WRKY proteins prone to mutate, like in WRKY families from peanut [34] and soybean [32].
According to the constructed phylogenetic tree containing CsWRKYs and AtWRKYs (Figure 2), the CsWRKYs were classified into three main groups (groups I, II and III), with 56 CsWRKYs in group I, 149 CsWRKYs in group II, and 37 CsWRKYs in group III, respectively. Moreover, CsWRKYs in group II were further classified into five subgroups (groups IIa, IIb, IIc, IId and IIe), containing 12, 26, 63, 24, and 24 members of CsWRKYs, respectively. Notably, among all these groups or subgroups most members of the CsWRKYs were present in subgroup IIc, which is similar to AtWRKY proteins. Combined description above that WRKYGQK sequence variants were only present in subgroup IIc, this suggests that subgroup IIc WRKY proteins might possibly involve in a variety of biological functions.
Of the 242 CsWRKYs found, 56 CsWRKYs in group I had two conserved WRKY domains located in the N- and C-terminus of the protein, respectively, as well as the zinc-finger domains of C-X4-C-X22-23-H-X1-H. Total of 186 CsWRKYs in group II and III contained one WRKY domain. 149 CsWRKYs in group II had the zinc-finger domain of C-X4-5-C-X23-24-H-X1-H while 37 CsWRKYs in group III had the zinc-finger domain of C-X7-C-X23-H-X1-C. It was worth noting that based multiple sequence alignment, either one or two more mismatched amino acids were detected within the conserved regions of 144 CsWRKY proteins including CsWRKY100, 101, 111, 186, 201, 208, 226, 240, 241, and 242. Such annotation errors might be caused by genomic sequencing or gene prediction software.
As shown in Figure 1, each WRKY domain had at least one conserved intron structure except the N-terminal WRKY domain of group I WRKYs. Intron insertion position was also rather conserved. Two main kinds of intron structures were present in the conserved regions of CsWRKY genes, much likely to those conserved in AtWRKYs [35]. One of the introns is PR intron which was spliced at the codon of R amino acid. The other is the VQR intron. The PR intron is located in the WRKY domains of CsWRKY genes in group III, and subgroups Ic and IIc–e. However, the VQR intron exists within the zinc-finger structure (C-X4-5-C-X5-VQR-X18-19-H-X1-H) in group IIa and IIb.
A number of WRKY proteins contain both R-protein conserved domains and the typical WRKY domain, and thus named R-protein WRKYs, which are one of the most significant features of the WRKY proteins in flowering plants [25]. For example, Arobidopsis had three R protein-WRKYs (AtWRKY16, AtWRKY19 and AtWRKY52) [26]. Here, the phylogenetic analysis showed that AtWRKY19 and CsWRKY47 were clustered closely in group I while AtWRKY16, AtWRKY52, CsWRKY186 and CsWRKY220 were highly classified together into subgroup IId. This indicated that CsWRKY47, CsWRKY186 and CsWRKY220 may belong to R-protein WRKYs. Further protein architecture analysis identified three CsWRKY proteins as the R-protein WRKYs. CsWRKY47 contained the typical domain of PAH-WRKY (1-N terminus)-WRKY (1-C terminus)-NB-ARC and the protein kinase domain in the C-terminal end of the protein. CsWRKY186 had the typical domain of NB-ARC-LRR-WRKY. CsWRKY220 had the TIR-NB-ARC-LRR-WRKY domain. The three R-protein CsWRKYs might be involved in camelina responses to biotic and abiotic stress.
Ten conserved motifs were detected in CsWRKY proteins
In order to gain insight into the functional regions of CsWRKY proteins, the MEME (Multiple Em for Motif Elicitation) program was employed to reveal the conserved motifs among 242 CsWRKY proteins. A total of 10 conserved motifs were identified, namely Motif 1-10 (Figure 3). These 10 conserved motifs were shown with colored boxes according to the scale (Additional file 10: Figure S3), with sizes ranged from 15 to 50 aa residues in width (Figure 3). Among them, motif 1was examined to encode the conserved WRKY domain, motif 2 and 3 were examined to encode the conserved zinc finger structure. One or two WRKY motifs were detected in all CsWRKYs as described in the sequence alignment (Figure 1). In addition to the conserved WRKY motif, other conserved motifs (motifs 4-10) were also predicted to exist among CsWRKY proteins. Each CsWRKY protein had at least three conserved motifs, with the maximum number of seven conserved motifs for several CsWRKYs. The distributions of conserved motifs were diverse among different CsWRKY groups. For example, group I CsWRKYs had nine motifs (motif 1, 2, 3, 4, 5, 6, 8, 9 and 10), with each having at least two of motif 1, one zinc finger (motif 2 and 3), as well as motif 6 and 9 which were unique to the CsWRKY members in group I. Motif 10 only appeared in both group I and subgroup IIc. Motif 4 and 8 were present in group I, and subgroup IIb and IIc. Motif 7 existed in subgroup IId, IIe and group III. Motif 5 was only absent in subgroup IIc and IIe. However, all CsWRKY members of group IIa contained two conserved Motif 5. As a whole, the motif analysis of CsWRKYs showed that every group or subgroup of CsWRKYs had similar motif compositions, corresponding to the grouping by the phylogenetic tree analysis.
CsWRKY genes were uneven distributed on chromosomes
To determine the genomic distribution of CsWRKY genes, all 242 CsWRKY mRNAs/ORFs identified were mapped on their corresponding chromosome by BLAST against the released genome of C.sativa. A total of 224 CsWRKY gene loci were distributed across all C. sativa chromosomes visualized by MapChart (Chr1-Chr20, Figure 4), with 15 CsWRKY gene loci generating 33 alternatively spliced variants of CsWRKY ORFs (Additional file 3: Table S3) (For details to see Additional file 8: figure S1). However, the distribution and density of the WRKY genes on each chromosome were uneven. For example, the minimal number of CsWRKY genes (three loci) were located chromosome 1, 15 and 19 whereas the largest number of CsWRKY genes (20 loci) were detected on chromosome 11, which accounted for 8.3% of all CsWRKY genes. Moreover, the same number of CsWRKY genes (11 loci) existed on chromosome 2, 8, 9 and 13). One exception is that CsWRKY179 was not mapped to any chromosome since it’s present on an scaffold region. Interestingly, a few regions with a higher density of CsWRKY genes were observed on some chromosomes such as chromosome 10, 11, and 16, suggesting that there might be WRKY gene hot spots in the camelina genome.
CsWRKY gene family experienced 137 segmental duplication events, having a high synteny with WRKYs from Arabidopsis and Brassica rapa
As described above, CsWRKY gene family was expanded much greatly compared to other plant WRKY families, ranking as the second in the quantity of WRKY member tested so far. To elucidate the mechanism underlying WRKY gene family expansion in C. sativa, Blastp and MCScanX (Multiple Collinearity Scan toolkit) were employed to identify gene duplication patterns (tandem and segmental duplication), which were considered to provide genetic materials for the generation of new genes and the evolutionary formation of gene new biological functions [32, 36-39]. Results showed that 137 segmental duplication events were detected for 146 CsWRKY genes (Figure 5, Additional file 4: Table S4). However, no tandem duplication was observed for any CsWRKY gene. No tandem duplication event was further confirmed on the basis of the method by Guo et al [40]. Compared to other plant WRKY genes families tested (see Additional file 5: Table S5), CsWRKY genes were experienced the largest number of segmental duplication events. These revealed that segmental duplication was a major driving force for CsWRKY gene evolution.
The comparative synteny maps of two related genomes (C. sativa VS A. thaliana, and C. sativa VS Brassica rapa (B. rapa)) were performed in order to further explore the origin and evolution of CsWRKY genes (Figure 6, Additional file 6: Table S6). The orthologous relationships were detected between 173 CsWRKY genes and 65 AtWRKY genes, and then 173 orthologous WRKY gene pairs were identified accordingly, with most of them located on the syntenic locus in Arabidopsis and C. sativa chromosomes. Similarly, the orthologous relationships were also present between 166 CsWRKY genes and 111 BrWRKY genes, and the corresponding 282 orthologous WRKY gene pairs were built, with lots of them found on the syntenic locus in chromosomes of C. sativa and B. rapa. Remarkably, multiple CsWRKY genes were identified to be putative orthologs of a single AtWRKY gene. For instance, CsWRKY123, CsWRKY124 CsWRKY125 and CsWRKY126 were the orthologs of AtWRKY39. Such syntenic relationship detected form these WRKY genes indicates the expansion of CsWRKY genes possibly occurred after A. thaliana in evolution.
To investigate whether these orthologous WRKY genes underwent selection pressure (purifying and positive selection), the synonymous substitution rates (Ks) and non-synonymous substitution rates (Ka) of the identified orthologous CsWRKY gene pair were calculated using KaKs Calculator 2.0, followed by calculating the Ka/Ks to determine if the selective pressure acts on protein-coding CsWRKY genes or not [41]. Interestingly, Ks and Ka values for all orthologous CsWRKY gene pairs had a ω value of <1, indicating that purifying selection happen in these gene pairs, which is consistent with the reports on B. rapa by Tang et al [18].
Expression patterns of CsWRKY genes in twelve camelina tissues
Gene expression pattern can provide essential information to determine the biological function of a gene. To explore the possible functions of CsWRKY genes in C sativa growth and development, transcriptome data of C. sativa under normal growth conditions were downloaded from the publically available database [6] and used to examine the expression patterns of 202 CsWRKY genes in twelve tissues/organs, including root (R), stem (S), young leaf (YL), mature leaf (OL), flower (F), inflorescence (IF), early seed development (ESD), early-mid seed development (EMSD), late-mid seed development (LMSD), late seed development (LSD), germinating seed (GS) and cotyledon (C). The heat map illustration of expression profiles of CsWRKY genes is shown in Figure 7.
Notably, half more CsWRKY genes tested were found to express in at least one of those tissues although CsWRKY36 and 227 did not show any detectable expression. For example, 180 CsWRKY genes (89.11%) were expressed in R while 106 CsWRKY genes (52.48%) accumulated in LSD. The percent of expressed CsWRKY gene number were 88.61% in F, 81.68% in IF, 80.69% in S, 78.71% in OL, 77.23% in ESD, 76.73% in EMSD, 75.25% in GS, 68.32% in C, 67.82% in LMSD, and 64.85% in YL, respectively. 70 CsWRKY genes (34.65%) were expressed in all twelve tissues, including 28 CsWRKYs in group I, 41 CsWRKYs in group II, and one CsWRKY gene in group III. Particularly, 24 CsWRKY genes displayed high expression levels in all tissues. Furthermore, a number of CsWRKY genes exhibited a tissue-specific expression pattern. For example, genes expressed preferentially in two tissues included CsWRKY25 and 34 from group I in IF and LSD, CsWRKY174 of group IIb in GS and R, CsWRKY73 and 203 from group IIc in F and OL or GS, CsWRKY118 from group IIe in GS and S, and CsWRKY100 from group III in ESD and R. The genes expressed specifically in one tissue were CsWRKY182, 202 as well as 204 from group IIc in F, and ESD, respectively, and CsWRKY111, 208 and 226 from group III in R. Such expression analysis indicates that most of CsWRKY genes may function constitutively for plant organ development, with a number of them working differentially in different tissues.
A number of CsWRKY genes expressed in responses to various abiotic stresses
The functions of most AtWRKY genes had been verified, which can be used to infer potential roles of CsWRKY genes clustered together with the AtWRKYs in phylogenetic tree. For instance, AtWRKY25 and 33 were sensitive to NaCl stress, and overexpression of either of them enhanced tolerance to NaCl in Arabidopsis [42]. According to structural features and phylogenetic analysis of CsWRKYs and AtWRKYs, AtWRKY33 was identified to be the ortholog of CsWRKY8, 9 and 10, while AtWRKY25 was the ortholog of CsWRKY48, 49, and 50, indicating that these six CsWRKY genes may participate in plant responses to NaCl stress. Subsequently, to verify this prediction, expression profiles of the six CsWRKY genes were examined in roots and shoots under salt stress (SS) and normal conditions (NC) by quantitative RT-PCR (qRT-PCR) (Figure 8). These six genes (CsWRKY8, 9, 10, 48, 49, and 50) showed higher levels of expressions in shoots than that in roots under SS. Compared with gene expression levels in NC, expressions of three CsWRKY genes (CsWRKY8, 9, and 10) under SS were significantly down-regulated in roots, whereas the other three CsWRKY genes (CsWRKY48, 49, and 50) had no obvious change in roots. In particular, CsWRKY50 transcript was significantly up-regulated in shoots under SS. These results implied that the six CsWRKY genes may positively regulate SS response in shoot, whereas three of them (CsWRKY8, 9, and 10) may act negatively in roots.
As described above, 15 CsWRKY gene loci contain alternative structures. To investigate whether all alternative splice variants from a CsWRKY gene locus involve in plant responses to various stress, total of 5 splice transcripts derived from two CsWRKY gene loci were selected to examine their expression levels in camelina seedlings treated by cold stress (CS), drought stress (DS) and, and SS, respectively. As shown in Figure 9, these five splice isoforms had different expressions under the three stress conditions, with three of them expressing 10-fold higher than the other two. CsWRKY21 and 22, the two splice variants from gene locus Csa06g039950, both showed the highest expression at almost the same level under SS among the three treatments. However, CsWRKY21 exhibited significantly larger expression under CS than under DS. On the contrary, CsWRKY22 expression was more under DS than under CS. Similarly, CsWRKY35, 38, and 41, the three spliced variants from gene locus Csa10g016180, expressed at different levels, with CsWRKY35, 38, and 41 as the predominant transcript in response to DS, CS, and SS, respectively. More strikingly, these data led to unexpected findings that alternative slice variants from the same gene locus may function differentially in plant responses to different stresses.