Identification and phylogeny analysis of TaSnRK genes
After genomic retrieval by BLASTp with 38 AtSnRK and 47 OsSnRK proteins (Table S1), 1240 putative TaSnRK proteins (734 HC members and 506 LC members) were hunted [5, 24–25]. However, after validation by NCBI CDD and SMART online tools, only 186 protein sequences (transcribed from 149 genes), including splice variants (181 HC members and 5 LC members), were confirmed. The SnRK family which contained 21 SnRK1 proteins (20 HC members and 1 LC member), 38 SnRK2 proteins (38 HC members), and 127 SnRK3 proteins (123 HC members and 4 LC members) (Fig. 1; Table S2). Because the naming method of SnRK genes in wheat was not consistent currently, with several genes having several synonymous names, we renamed all genes according to their subfamily (SnRK1, SnRK2, and SnRK3) and sub-genome location (A, B, or D) (Table S2). Gene’s name started with an abbreviation for the acronym of species name Triticum aestivum (Ta), followed by subfamily (SnRK1, SnRK2, and SnRK3). Genes belonging to triads share the same gene number, but use suffixes A, B, and D to distinguish subgenomes they located. Consecutive minuscules separated by a semicolon distinguished splice variants (e.g., TaSnRK1.3-B;a and TaSnRK1.3-B;b) [26]. Phylogenetic trees generated by SnRK protein sequences of Arabidopsis, rice, and T. aestivum showed that TaSnRKs belong to well-defined subfamilies (Fig. 1). A previous study found that 94 proteins from 30 genes of the TaSnRK3 subfamily were identified [23]. This article matched all these genes except TaCIPK6 because we did not retrieve its gene or CDS (Coding Domain Sequence) sequence. The remaining 29 genes belonging to the SnRK3 subfamily were firstly identified in this paper (Fig. 1; Table S2).
Chromosomal localization, gene duplication, synteny and Ka/Ks analysis
TaSnRK genes are located on 21 chromosomes, except gene TaSnRK3.44-U in an unanchored contig (Fig. 2a; Table S2). And their chromosome distributions are uneven. There are eight, seven, and eight genes on the 1A, 1B, and 1D chromosomes, respectively, and three, one, and two genes on the 7A, 7B, and 7D chromosomes, respectively. That may be related to the different sizes and structure of the chromosomes. Overall, TaSnRK genes tended to located in the more central segments of the chromosomes (R2a, R2b, and C; 73.3% of genes) were more than genes in the distal telomeric parts of the chromosomes (R1 and R3; 26.7% of genes) (Fig. 2a) [26].
Compared with the number of SnRK genes in Arabidopsis (47) and rice (38), the number in wheat was much higher (149) (Fig. 3a-c). This partially due to the hexaploid nature of wheat. However, even when corrected for ploidy-level, the number of SnRK genes in bread wheat was significantly higher than in Arabidopsis and rice (1.71- and 1.38-fold higher, respectively). Specifically, genes in SnRK1 and SnRK3 subfamilies are largely more than expectation (Fig. 3a-d). To better understand why SnRK genes are so abundant in the wheat, we analyzed homoeologous groups in detail (Table 1). At the genome-wide level, 35.8% of wheat genes were present triads [26]. By contrast, the triad proportion of TaSnRK genes is as high as 84.86%. If only “HC” SnRK genes (HC defines genes with high confidence in the wheat genome) were considered, this ratio was even higher (85.42%, Table 1). Moreover, the loss of one homolog was less pronounced in SnRK genes (2.68% vs. 13.2%, Table 1). And only 19 genes were singletons (12.75%; 11.81% for HC only) because they could not be identified as groups. Therefore, the high homologous retention rate could partly explain the high number of TaSnRK genes. That is also evidence that genome-wide doubling events (WGD) contribute to the expansion of gene families.
Table 1
Groups of homoeologous SnRK genes in wheat.
Homoeoligous group (A:B:D) | HC-only1 | all wheat genes2 | wheat SnRK (excluding splice variants, HC and LC) | wheat SnRK (excluding splice variants, HC-only) |
number of groups | number of genes | % genes | number of groups | number of genes | % genes |
1:1:1 | 41.3% | 35.8% | 425 | 126 | 84.6% | 41 | 123 | 85.4% |
n:1:1/1:n:1/1:1:n3 | | 5.7% | / | / | / | / | / | / |
1:1:0/1:0:1/0:1:1 | | 13.2% | 26 | 4 | 2.7% | 2 | 4 | 2.8% |
orphans/singletons | | 37.2% | 19 | 19 | 12.8% | 17 | 17 | 11.8% |
other ratios4 | | 8.0% | / | / | / | / | / | / |
total | | 99.9% | | 149 | 100.0% | | 144 | 100.0% |
1,2according to IWGSC (2018); 3 n > 1; 4 either n:1:n or 0:1:n, n > 1; 5 Compare to HC-only, one triads (TaSnRK3.30-A, TaSnRK3.30-B, TaSnRK3.41-D) was added; 6 Compare to HC-only, one triads (TaSnRK3.30-A, TaSnRK3.30-B) was changed to another triads (TaSnRK3.31-A, TaSnRK3.43-D) |
Furthermore, to investigate the gene duplication events in wheat, tandem and segmental duplication were also analyzed. Through screening sequence’s similarity and matching rate, 49 paralogous pairs belonging to segmental copies were found, and there was no tandem repetition (triads were not considered) (Fig. 2b; Table S3). Combined with the phylogenetic tree (Fig. 1), we found that the segment duplications occurred within each subfamily (SnRK1, 17 pairs; SnRK2, three pairs; SnRK3, 29 pairs). Additionally, synteny analysis between wheat, Arabidopsis, and rice were also compared to explore the evolutionary constraints acting on the SnRK gene family (Fig. 3e). The results revealed that 15 (wheat-Arabidopsis; SnRK1, 0 pairs; SnRK2, two pairs; SnRK3, 13 pairs) and 159 (wheat-rice; SnRK1, nine pairs; SnRK2, 35 pairs; SnRK3, 115 pairs) orthologous pairs were detected, respectively (Fig. 3e; Table S4). That reflects the conservative evolution nature of the SnRK gene family in the plant.
Meanwhile, we calculated Ka (non-synonymous substitution rate), Ks (synonymous substitution rate), and the Ka/Ks ratio of each pair (triad pairs are taken into account) to estimate evolutionary rates and determine the relative divergence times [27–29]. Except that the Ka/Ks ratios of two gene pairs (TaSnRK3.27-B/OsSnRK3.26, TaSnRK3.27-D/OsSnRK3.26) were > 1, the remaining pairs were all less than 0.87, implying that the SnRK genes were under different natural selection during plant adaptation to terrestrial environments (Fig. 3f; Table S3-4). Also, these genes' duplication events were estimated to have occurred around ~ 10.05 MYA (million years ago; Ks values average at 0.13). In addition, the Ka/Ks ratios of the orthologous gene-pairs between wheat and other two model plants were also calculated (Fig. 3g; Table S3-4). The average Ka/Ks value was maximum between wheat and Arabidopsis (0.54), followed by rice (0.13), suggesting the genes pairs between wheat and those two species appeared to have undergone extensive, intense purifying selection. The divergence time was about 41.50 and 10 MYA for Arabidopsis and rice, respectively. On the whole, the differentiation of gene pairs is concentrated in the α WGD period, which flowering plants experienced together (Fig. 3g) [27].
Motif composition, exon-intron structure, protein feature and structure analysis
The conserved domain and gene structure analysis provide information regarding gene duplication and their functional conservation during evolution [30]. We analyzed TaSnRKs according to the following criteria: (1) one gene of the triads was retained, (2) only the first variant was kept, the 65 representative TaSnRKs were chosen (Fig. 4). First, we found six of 20 conserved motifs related to the functional domains in plant SnRK proteins (Fig. 4b; Fig. S1-2, Table S5). Motif 2 and motif 4 contained Ser/Thr active site and ATP-binding region, respectively. Motif 17, motif 15, and motif 8 contained specific conserved domain of SnRK1 (KA1), SnRK2 (glutamine-303 to proline-318), and SnRK3 (NAF), respectively. Besides, ABA-specific box hid in motif 18 (Fig. 4b; Fig. S1-2, Table S5). Overall, the same subfamily members shared similar motif compositions, supporting the clustering of the phylogenetic tree. Second, we found that all genes of the SnRK1 and SnRK2 subfamily had introns. However, there were 40% of SnRK3 genes without the intron. Furthermore, there were 28.6%, 5.3%, and 18.9% genes from SnRK1, SnRK2, and SnRK3 had UTR (untranslated region), respectively (Fig. 4c). We also noted that the exon number of TaSnRKs varied from 1 to 13. In short, the difference of gene and protein structure within three subfamilies illustrates the possibility related to the existence of gene subfunctionalization or neofunctionalization in the TaSnRK family.
To further understand the features of TaSnRKs, protein properties of 186 TaSnRK proteins were analyzed (Table S2). The length of TaSnRKs ranged from 103 (TaSnRK1.10-B) to 885 (TaSnRK3.32-B;c) amino acids (aa). The average molecular weight (MW) of SnRK1, SnRK2, and SnRK3 subgroup proteins were 53, 40, and 51 kDa, respectively. Meanwhile, the theoretical isoelectric point (pI) were ranged from 4.38 (TaSnRK2.8-D;a) to 9.51 (TaSnRK3.12-B and TaSnRK3.21-B). Compared to the other two subfamilies, the members of SnRK2 have lower MW values, and pIs are all less than 7 (Fig. 5a). All the TaSnRKs belonged to hydrophilic proteins due to their grand average of hydropathicity (GRAVY) values are negative. The secondary and tertiary structure analyses revealed the prominence of helices and loops in TaSnRK proteins (> 73%) (Fig. 5b; Table S2). In addition, SnRK1 proteins matched models 6b1u.2 and 6c9h.1, while SnRK2s matched three different models (3zuu.1, 3ujg.1, and 5wax.1). Members of the SnRK3 were matched to five models, two of which were shared with SnRK1s, and the remaining three were 5iso.1, 4czt.2, and 6c9d.1, respectively. Generally, protein properties were similar between the SnRK1s and SnRK3s, and SnRK2s were significantly different from them, which supports the functional differentiation of this gene family.
Promoter analysis
After analyzing the 1.5 kb upstream region of TaSnRK genes, we found that most of them have been broadly categorized into growth and development relate cis-acting elements. CAAT-box, a common element in promoter and enhancer regions that plays an essential role in transcription, is predominant among these genes (98.5%) [31]. Other cis-acting elements, such as A-box, AT-rich sequence, CAT-box, CCAAT-box, GCN4_motif, and O2-site, were also found in various genes and known for participating in multiple growth regulation processes (Fig. 6a; Fig. S3, Table S5) [31–33]. Furthermore, many identified elements are related to hormone signaling pathways, namely auxin (AuxRR-core and TGA-element), abscisic acid (ABRE), gibberellin (TATC-box, GARE-motif, and P-box), salicylic acid (TCA-element and SARE), and methyl jasmonate (CGTCA-motif and TGACG-motif) [31, 34]. In addition, a few elements were predicted to be involved in abiotic stresses, such as wounding (WUN-motif), cold (LTR), and light (Box 4, GATA-motif, G-Box, I-box, Sp1, and MRE) [31–34]. We also found that the number of cis-acting elements distributed in the TaSnRK genes varied from 2 (TaSnRK3.42-B and TaSnRK3.44-U) to 63 (TaSnRK3.11-A) (Fig. 6b). The different numbers and types of cis-elements presenting indicated the diversified regulatory networks that the TaSnRK genes may involve.
Expression patterns of tissue developmental stages and stress conditions
To determine TaSnRK genes’ expression patterns, we analyzed 209 RNA-seq samples under nonstress conditions of ‘Azhurnaya’ (a hexaploid wheat variety) (Fig. 7a; Fig. S4, Table S7-8) [26]. About 80.65% (genes) TaSnRKs were expressed at least one developmental stage with a broad expression range from 1 to 285 TPM (Transcripts Per Kilobase of exon model per Million mapped reads). The remaining 18.81% (35 genes) showed a deficient expression with a TPM < 1 and were considered not expressed (SnRK1s, six genes; SnRK2s, three genes; SnRK2s, 26 genes). TaSnRK genes were expressed in multiple tissues, including leaves, roots, sheaths, spikelets, anthers, and grains (Fig. 7a; Table S8), which was consistent with previous reports in Arabidopsis [4–5, 35], tomato [11], potato [36], pea [37], and maize [38–39]. To verify the expression patterns in wheat, we made hierarchical clustering according to expression similarity, and then group the data into 15 different expression modules (Fig. 7b; Fig. S4). The result showed that only four expression patterns in SnRK1, while the SnRK3 contained the most types of expression patterns (up to 12). Moreover, the expression modules I and V were the most widely distributed, with multiple gene anastomoses in all three subfamilies (Fig. 7b-c). In general, the genes of TaSnRK1s and TaSnRK2s were both ubiquitously expressed during the plant life cycle (Fig. 7c). In addition to being generally expressed (39/127, 30.7%), the TaSnRK3 genes were also explicitly highly expressed in reproductive organ spikelets (modules XIII (66/127, 60.0%) and XIV (2/127, 1.6%)). Interestingly, not or low-expressed genes (module VIII) were also concentrated in the SnRK3 subfamily (20/127, 15.7%). That may be related to a large number of genes and functional redundancy in this subfamily. The multi-tissue and multi-stage expression suggest the TaSnRK gene family's critical role in wheat growth and development.
To verify the response pattern of TaSnRKs to abiotic and biotic stresses in wheat, nine genes belonging to different subfamilies with universally expression patterns were randomly selected. Their expression patterns under various stresses were further illustrated by heatmap and analyzed by qRT-PCR.
At low temperature, TaSnRK2.4-B’ expression was decreased (Fig. 8a), which was similar to the expression pattern of ZmSnRK2.10 in maize [40]. In the SnRK3 subfamily, two genes were responsive to cold stress. One is TaSnRK3.35-A, whose expression was notably increased, may form a complex with CBL1 to regulate cold stress like homologous gene AtSnRK3.10 [41]. Another is TaSnRK3.16-D, which was drastically down-regulated than control, implying that TaSnRK3 genes participate in cold response in various ways. Also, TaSnRK1.1-A, TaSnRK2.4-B, and TaSnRK2.7-A responded to phosphorus deficiency only in the root. Under drought treatment (Fig. 8a), the expression of TaSnRK2.4-B, TaSnRK2.7-A, and TaSnRK3.35-A were increased significantly. In heat stress, except for TaSnRK3.37-D, the remaining eight genes all reached a peak at six hours. When subjected dual stress of drought and heat, TaSnRK2.7-A and TaSnRK3.37-D showed a gradual increase, and the remaining seven genes were down-regulated first and then up-regulated. In the qRT-PCR result (Fig. 8b), these nine genes also respond to drought stress. Under osmotic stress, the expression levels of TaSnRK1.2-A, TaSnRK2.4-B, TaSnRK2.10-A, TaSnRK3.16-D, TaSnRK3.35-A, and TaSnRK3.37-D changed significantly. For ABA treatment, each gene notably responded to the ABA signal but reached its maximum expression value at different times.
For biotic treatment, the response patterns of the nine genes were also varied (Fig. 8a). In the infection of Zymoseptoria triticipowdery in leaf, all TaSnRK genes showed down-regulated expression after 14 days post inoculation (dpi), showing the rife response pattern. Moreover, three different modes of expression were observed when threatened by stripe rust. First, the expression of TaSnRK1.2-A, TaSnRK3.35-A, and TaSnRK3.37-D was increased over time. Second, TaSnRK1.1-A, TaSnRK2.4-B, TaSnRK2.7-A, TaSnRK2.10-A, and TaSnRK3.16-A were decreased first and then increased gradually. Finally, the expression of TaSnRK1.9-A was lower and lower. When inoculated with powdery mildew, TaSnRK1.2-A, TaSnRK2.7-A, TaSnRK2.10-A, and TaSnRK3.16-A had significant positive responses to stress. In contrast, the expression of TaSnRK2.4-B decreased with the infestation time. Interestingly, in our qRT-PCR results (Fig. 8c), the genes that maintain continuous response to powdery fungus are TaSnRK1.1-A, TaSnRK3.16-A, and TaSnRK3.37-A. Similarly, through the RNA-seq data, we only found two genes that significantly responded to Fusarium graminearum, namely TaSnRK1.2-A and TaSnRK2.7-A. However, in our study, all nine genes were responded to Fusarium graminearum infection. The inconsistencies in response patterns may be related to the difference between cultivars and stress treatment time. In conclusion, wheat SnRK genes are involved in varieties of abiotic and biotic stress regulation.
Gene Ontology (GO), Kyoto Encyclopedia of Gene and Genome (KEGG) and Protein-protein interaction(PPI)network Analysis
To understand the functions of the biological processes, all TaSnRKs were searched for GO and KEGG databases. In total, 86.5% (129/149) TaSnRKs were assigned to one or more GO terms in the biological process (129 genes), molecular function (129 genes), and cellular component (102 genes) categories (Fig. 9a-b; Fig. S5, Table S10). We also mapped the TaSnRKs in the KEGG pathway database (Fig. 9b). The result revealed that most genes were enriched in signal transduction of environmental information processing pathways, supporting the previous studies [1–2, 4–5]. By grouping comparison, we found that the members from the TaSnRK1 and the TaSnRK3 subfamilies may perform similar functions on plant physiological regulation (Fig. S6). The reason is they shared eight identical pathways, except the pathways of “Folding, sorting and degradation” and “Membrane transport”, which were only annotated with the SnRK3 subfamily. By contrast, the TaSnRK2 members were only enriched on the “signal transduction pathway” of environmental information processing. That signals functional differentiation among the TaSnRK subfamilies and supports the SnRK genes regulation network drawn by Cramer et al. [42].
Furthermore, we used the model plant Arabidopsis protein database as a reference to constructed the interaction network of the three TaSnRK subfamilies, respectively. At first, 26 protein pairs were predicted with confidence to interact (score > 0.47) between 21 TaSnRK1s and 12 other proteins in the SnRK1 subfamily (Fig. 9c; Table S11). Among these protein pairs, AT4G16360, AKINBETA1, SEX4, and SNF4 code subunit β, and SNF4 code subunit γ, together with subunits α to form TaSnRK1 heterotrimers (Table S11) [7, 10]. And, TaSnRK1s maybe regulate sugar signals and control plant energy transfer through interacting with B3-domain transcription factor FUSCA3 (FUS3), which have been verified by experiments in Arabidopsis [35, 43]. Simultaneously, TaSnRK1s also bind to a myristoylated 2C-type protein phosphatase ABI1, which contributes to balance carbon and nitrogen and ABA-free signaling pathways [44]. Another PP2C protein that binds TaSnRK1s is PP2C74, which involved the early development in the plant as one of the substrates of AtNMT1 in Arabidopsis [45]. Secondly, 52 protein pairs associated with TaSnRK2s were identified. Of these, four pairs occurred within the TaSnRK2 subfamily, and 48 pairs were associated with 38 TaSnRK2s and other five functional proteins (Fig. 9c; Table S11). TaSnRK2s and their interaction proteins are all concentrated in the PYR/PRL-PP2C-SNRK-ABF signaling pathway [8, 13–15, 22, 24–25, 35, 43–44]. When ABA content is low, the activities of SnRK2s were inhibited by clade A PP2C phosphatases (ABI1, ABI2, and HAB1) [46–48]. After the concentration increases, ABA is bound to receptor PYR/PRL and PP2C phosphatases to form regulating complexes and release the inhibition of SnRK2s and downstream stress signaling [46–48]. Meanwhile, uninhibited SnRK2s [49–51] activate downstream bZIP-like transcription factor (ABF), induces ABA response gene expression, and regulates plant growth and development [52]. Ultimately, the signal transduction process is inhibited by SnRK-Calcium-binging Sensor (OZS1) and scaffold protein ABA Terminator (Fig. 9c, 10a) [53–54]. Finally, the SnRK3 subfamily has been shown to participate in complex networks of interactions. We identified 135 protein pairs involving 24 SnRK3 proteins and ten other proteins. Among them, 18.52% (25/135) of interactions occurred within the subfamily, while the remaining protein pairs (81.48%, 110/135) were concentrated in the calcium-dependent CBL-SnRK3 signaling pathway (Fig. 9c; Table S11) [1, 3, 9, 16–17, 23, 25, 41–42, 55–58]. We speculate that the complex of TaSnRK3s and CBL can regulate Arabidopsis K+ transporter (AKT1), which balances the absorption and release of K+ in root cells under low K+ concentration [55]. Meanwhile, TaSnRK3s may participate in the SOS (SOS1, SOS2, SOS3, SIP3, and SIP4) pathway to regulate the transport of Na+/H+ reverse transporters (NHX1) and NO3- (NRT1.1) to enhance tolerance to abiotic stress [16–17, 23, 56–60].
Path analysis, subcellular localization and transient Agro-Infiltration assays of TaSnRK2.4-B
In previous studies, TaSnRK2.7 (renamed as TaSnRK2.4-D in this study) responded to polyethylene glycol and NaCl stress, but not to ABA application [59]. However, TaSnRK2.4-B, the triplet of TaSnRK2.4-D, showed significant change under ABA stress (100 uM) (at 2 h, 12 h, and 24 h), which was at odds with previous research (Fig. 9b) [59]. To investigate whether TaSnRK2.4-B indeed responds to ABA signaling, we quantified the expression levels of several key genes involved in the ABA pathways concerning to SnRK2 subfamily (Fig. 10a; Fig. S7) [8, 22, 24, 44, 60]. On the one hand, TaPYR1 (homologous to AtPYR1) was weakly expressed after ABA treatment, and TaPYR4 (homologous to AtPYR4) and TaPP2C (homologous to AtPP2CA and AtABI2) was continuously decreased with stress time. Moreover, TaSnRK2.4-B and TaABF (homologous to AtABF) both increased and reached a peak at 12 h and then decreased continuously. On the other hand, in the SCS-SnRK2 signaling pathway [60], TaSCS (homologous to AtSCS-A and AtSCS-B) was highly expressed at all time points, especially at 12 h, whose expression level was 731 times compared to control. Furthermore, its expression trend is utterly consistent with TaSnRK2.4-B. That suggests that ABA signaling may cause an outbreak of ROS (Reactive Oxygen Species), which induce SCS overexpression in a calcium- or non-calcium- dependent manner [60]. After that, TaSCS may inhibit the expression of TaSnRK2.4-B by directly interacting with it, as similarly reported in Arabidopsis [60].
To validate the subcellular localization of TaSnRK2.4-B, we successfully constructed the part27-35S:TaSnRK2.4-B-GFP fusion expression vector into Agrobacterium GV3101 and transferred it to tobacco leaves. The results showed that TaSnRK2.4-B was located in the cell membrane, cytoplasm, and nucleus (Fig. 10b), which was consistent with the localization results of its homologous gene TaSnRKs [22, 60].
In this study, we have demonstrated that TaSnRK2.4-B responds to many biotic stresses, and we wanted to examine whether this response exists in Phytophthora infestans (P. infestans) challenge. Therefore, the mature protein-coding regions of TaSnRK2.4-B were cloned and transiently expressed in N. benthamiana using Agrobacterium-mediated expression followed by a P. infestans infection. After six days post inoculation (dpi), significantly smaller lesions were observed in areas expressing TaSnRK2.4-B compared to that of free GFP (Fig. 10c-d; P < 0.05), suggesting that the expression of TaSnRK2.4-B inhibited P. infestans invasion to the host. This inhibition is likely to enhance plants’ ability to resist stress by responding to signal transduction pathways of hormones such as salicylic acid or gibberellin [61–62]. However, the specific mechanism of action still needs further experimental investigation.