Worldwide, ~ 20% of projected bread wheat (Triticum aestivum) production is lost to pests and disease every year1. The deployment of genetic variation for disease resistance is the most sustainable and environmentally friendly way to protect wheat crops2. For over 100 years, breeders have conducted numerous crosses to enrich the wheat gene pool with novel resistance genes. Notably, more than 200 of the 467 currently designated resistance genes in bread wheat have their origin outside of the bread wheat gene pool3. However, the deployment of these interspecific resistance genes is often hampered by linkage drag, that is, the co-introduction of deleterious alleles from linked genes. Moreover, single resistance genes tend to be rapidly overcome by the emergence of resistance-breaking pathogen strains4. Cloning individual resistance genes would enable their introduction as genetically modified polygene stacks, which are likely to provide more durable resistance5.
Most of the ~ 283 plant disease resistance genes cloned to date encode either intracellular receptors of the nucleotide binding and leucine-rich repeat (NLR) class or extracellular membrane-anchored receptor-like proteins (RLPs, called RLKs when they contain an intracellular kinase) (Supplementary Table 1)3,6. A new group of resistance genes has recently come to light, whose members encode two protein kinases fused as one protein. These tandem kinase genes include Rpg1, Yr15, Sr60, Sr62, Pm24, WTK4, and Rwt4 (refs. 7–13). Other resistance genes offer some variation to this architecture with protein kinases fused to a steroidogenic acute regulatory protein-related lipid transfer domain (Yr36)14, a C2 domain and a multi-transmembrane region (Pm4)15, a major sperm protein (Snn3)16, an NLR (Tsn1, Rpg5, and Sm1)17–19, and a von Willebrand factor type A domain (Lr9)20.
These kinase fusion protein-encoding resistance genes appear to be unique to the Triticeae, the clade of grasses that arose 12 million years ago and encompasses the cereals wheat, rye (Secale cereale) and barley (Hordeum vulgare)21. However, the fusion events that gave rise to these genes, far from being rare and isolated, happened multiple times between different classes of kinases and spawned diverse combinations8,10. This genomic innovation resulted in resistance against phylogenetically distinct fungal pathogens spanning the ~ 300-million-year-old ascomycete/basidiomycete divide.
Here, we cloned the stem rust resistance gene Sr43, which was transferred from tall wheatgrass (Thinopyrum elongatum) into bread wheat 45 years ago22,23. The dominant resistance gene Sr43 was introgressed into chromosome 7D of hexaploid wheat (Fig. 1a,b). We mutagenized grains of the Sr43 introgression line with ethyl methanesulfonate (EMS) and screened 2,244 surviving M2 families for susceptibility to Puccinia graminis f. sp. tritici (Pgt). We identified 23 families segregating for stem rust susceptibility, of which we confirmed 10 independent mutants by progeny testing (Supplementary Table 2) and genotyping (Supplementary Figs. 1–11).
To clone Sr43, we performed chromosome flow sorting and sequenced the wheat-Th. elongatum recombinant chromosome 7D in the parental line and eight mutants (Extended Data Fig. 1 and Supplementary Tables 3 and 4). Sequence assembly of the parental line and mapping of the mutant reads identified a 10-kb window in a scaffold containing a mutation in all eight mutants. To determine the gene structure of the Sr43 candidate, we (i) conducted transcriptome deep sequencing (RNA-seq) analysis of Sr43 seedling leaves and mapped the reads to the Sr43 genomic scaffold (Extended Data Fig. 2a), and (ii) sequenced Sr43 clones obtained by PCR from a full-length cDNA library. We detected four different splice variants (Extended Data Fig. 2b and Supplementary Tables 5 and 6). Splice variant 1 contained all eight mutations and consisted of 18 exons with a predicted open reading frame of 2,598 bp (Fig. 1c). The eight mutations were all G/C-to-A/T transitions typical of EMS mutagenesis and introduced non-synonymous changes (seven mutants) or an early stop codon (one mutant) in the predicted coding sequence (Fig. 1c,d; Supplementary Tables 7 and 8). The probability that all mutants would have a mutation in the same gene by chance alone out of the 5,822 non-redundant genes of chromosome 7D (ref. 24) was 4 × 10⁻⁶, indicating that the identified gene is a good candidate for Sr43.
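The EMS signature described above (G/C-to-A/T transitions) is straightforward to check programmatically. The following is a minimal Python sketch, not the pipeline used in this study: the function names and the example variant calls are hypothetical and serve only to illustrate the classification.

```python
# Minimal sketch: flag single-nucleotide changes that match the canonical
# EMS signature. EMS alkylates guanine, giving G>A changes, which appear
# as C>T when the variant is reported on the opposite strand.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def is_ems_type(ref: str, alt: str) -> bool:
    """Return True if ref>alt is a G>A or C>T transition."""
    return (ref.upper(), alt.upper()) in EMS_TRANSITIONS


def summarize(variants):
    """Count EMS-type and other changes.

    `variants` is an iterable of (mutant_id, ref_base, alt_base) tuples.
    """
    counts = {"ems_type": 0, "other": 0}
    for _mutant_id, ref, alt in variants:
        key = "ems_type" if is_ems_type(ref, alt) else "other"
        counts[key] += 1
    return counts


# Hypothetical variant calls, for illustration only (not data from the study).
example_variants = [
    ("mutant_A", "G", "A"),
    ("mutant_B", "C", "T"),
    ("mutant_C", "A", "G"),
]

if __name__ == "__main__":
    print(summarize(example_variants))
```

In a mutational genomics screen, a candidate interval in which every independent mutant carries an EMS-type change is more convincing than one containing other substitution types, which are more likely to reflect background variation or sequencing error.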
As all identified EMS mutations affected the predominant full-length Sr43 transcript (Fig. 1c), we used its predicted 866–amino acid sequence to search for functional domains and homologs. We determined that Sr43 harbors an N-terminal kinase domain and two domains of unknown function (DUFs) in its C terminus (Fig. 1d). Five of the sequenced mutations resided within the kinase domain, and the remaining three were located in one or the other of the two DUFs (Fig. 1d).
The closest homolog of the Sr43 kinase domain was the serine/threonine kinase interleukin-1 receptor associated kinase (STKc IRAK) (Supplementary Fig. 12), which indicates that the Sr43 kinase is conserved between animals and plants. Further homology searches suggested that the kinase domain is intact (Supplementary Fig. 13). Mutant 1013a disrupted one of the conserved glycine residues in the glycine-rich loop, suggesting that kinase activity is required for Sr43 function (Supplementary Table 9). The C terminus of Sr43, containing DUF3475 and DUF668, has a domain architecture similar (44% identity) to that of the N terminus of PHYTOSULFOKINE SIMULATOR (PSI) proteins from Arabidopsis (Arabidopsis thaliana), which are critical for plant growth26. Unlike Arabidopsis PSI1, Sr43 lacked a putative nuclear localization signal or a putative myristoylation site. Sr43 had no transmembrane domain, as predicted by InterPro. However, we established that Sr43 likely localizes to the nucleus and plastids, as evidenced by the fluorescence detected from the transient expression of an Sr43-GFP (green fluorescent protein) construct in Nicotiana benthamiana leaf epidermal cells (Extended Data Fig. 3).
The domain structure of Sr43 was thus clearly different from that of proteins encoded by the ~ 283 cloned plant resistance genes, which were largely (78%) extracellular or intracellular immune receptors (Supplementary Table 1). To explore the unusual structure of Sr43 in more detail, we used the AlphaFold artificial intelligence–augmented system to generate a 3D model27 (Supplementary Data 1). We determined that Sr43 adopts a modular structure, with the kinase and the two DUFs separated by flexible linker loops (Fig. 1e). The kinase domain contained α-helices and anti-parallel β-strands, whereas the DUFs were entirely α-helical. A comparison of the predicted Sr43 structure with structures in the Protein Data Bank28 suggested a protein kinase–like structure for DUF668. Therefore, we searched for ATP-binding sites using the small-molecule docking program HADDOCK (ref. 25) and identified one high-confidence ATP-binding site in DUF668 (Fig. 1f, Supplementary Table 10, and Supplementary Data 2).
We cloned a 14-kb genomic Sr43 fragment including 3.2 kb of upstream and 2.5 kb of downstream regulatory sequence (Fig. 2a and Supplementary Table 11) and introduced the resulting binary construct into the wheat cultivar Fielder. We obtained one primary (T0) transgenic plant, which carried three copies of the transgene, based on qPCR of the hygromycin selectable marker. We tested homozygous T1 and T2 lines against a geographically and phenotypically diverse panel of 11 Pgt isolates from North America, the Middle East, Europe, and Africa. In 10 cases, the Sr43 transgenic and wild-type introgression lines were resistant, whereas the cultivars Chinese Spring (the introgression parent) and Fielder were susceptible (Fig. 2b, Extended Data Fig. 4a,b, and Supplementary Table 12). By contrast, the Pgt isolate 75ND717C was intermediately virulent on the Sr43 introgression and transgenic lines (Fig. 2b). For Pgt isolate 69MN399, we compared the phenotype at 21 °C and 26 °C and noticed a marked reduction in Sr43-mediated resistance at the higher temperature, in line with previous observations29 (Fig. 2c). Taken together, these results confirm (i) the broad-spectrum efficacy of Sr43 (ref. 29), (ii) that a 14-kb Sr43 genomic fragment is sufficient for function, and (iii) that the transgenic line faithfully recapitulates the race-specific and temperature-sensitive resistance of wild-type Sr43.
We searched for Sr43 homologs to investigate its evolutionary origin. We identified proteins harboring either the kinase domain or the two DUFs alone across the Poaceae family spanning 60 million years of evolution (Supplementary Tables 13 and 14; Extended Data Figs. 5 and 6). We detected the Sr43 protein domain arrangement only within the Thinopyrum, Triticum, Aegilops and Secale genera of the Triticeae tribe, but not within Hordeum, suggesting that Sr43 likely arose between 6.7 and 11.6 million years ago (Fig. 3 and Supplementary Table 15). In those lineages lacking a clear Sr43 homolog, the genes encoding the kinase and the DUFs present in Sr43 mapped either to different chromosomes (e.g., Sorghum bicolor, Zea mays, T. urartu and Ae. sharonensis) or to the same chromosome but 6 to 36 Mb apart (Ae. tauschii and Setaria italica), suggesting that the recruitment of the kinase domain to the DUFs at the Sr43 locus involved an ectopic recombination event (Supplementary Table 15). In Thinopyrum elongatum, the ancestral state and Sr43 were retained as an intraspecies polymorphism; some species of Aegilops and Triticum retained the ancestral state (e.g., Ae. tauschii), whereas others retained the Sr43 innovation (e.g., the T. aestivum and T. durum B genomes) (Fig. 3).
In summary, we cloned the wheat stem rust resistance gene Sr43, which encodes a protein kinase fused to two DUFs. Of the 68 Triticeae resistance genes cloned to date, most encode NLRs (n = 41), followed by protein kinase fusion proteins (n = 15) (Supplementary Table 16). Of the latter, seven are tandem kinases, whereas Sr43, Pm4, Snn3, Sm1, Tsn1, Yr36, Rpg5 and Lr9 are single or tandem protein kinases fused to different domains14–20 (Extended Data Fig. 7 and Supplementary Table 16). Little is known about the function of kinase fusion proteins, but most confer race-specific resistance that is phenotypically indistinguishable from NLR-mediated resistance. Their encoding genes do not fall into the Lr34/Lr67 category of adult, broad-spectrum and multi-pathogen resistance30,31. To explain the role of these kinases in resistance, we sought clues from NLRs whose modus operandi is now well understood. NLRs can act as guards that monitor host components targeted by pathogen effectors32. These guards detect the interaction between an effector and its target, leading to a conformational change in the NLR that triggers downstream defense responses. This tripartite interaction creates an evolutionary “tug-of-war” that imposes selective pressure (i) on the effector to evade detection by the NLR while maintaining its ability to coerce the pathogenicity target, (ii) on the NLR to recognize new effector variants, and (iii) on the pathogenicity target to avoid being disrupted by the effector while maintaining its cellular function. Duplication of the pathogenicity target can release it from this functional constraint and provide a ‘decoy’ for the effector. This diversification may also result in the decoy behaving genetically as the resistance gene33. In about 10% of all NLRs, the decoy has become integrated into the NLR itself34. Such a guardee-decoy fusion ensures that both components are inherited as a single operational unit.
By extrapolation, protein kinase fusion proteins may be pathogenicity targets that are guarded by NLRs. All protein kinase fusion proteins have one apparently functional kinase that is fused to a second, typically non-functional, kinase domain but sometimes to an altogether different domain, as in, for example, Sr43, Lr9 and Pm4 (Extended Data Fig. 7). Perhaps similarly to those NLRs that carry an integrated decoy, this second domain might be an integrated decoy, while the apparently functional kinase exerts the signaling function. Indeed, plants produce various enzymes, including protein kinases with different integrated domains, to catalyze reactions of various substrates. In the case of protein kinase resistance proteins, the integrated domain would define the specific substrates of pathogen Avr proteins, whereas the kinase would catalyze the phosphorylation of the Avr protein, the integrated domain, the kinase itself, or a third signaling partner to trigger downstream defense, possibly via an NLR guard (Extended Data Fig. 8). EMS mutagenesis of Yr15, Pm24, Sr62, and Sr43 has shown a preponderance of missense mutations affecting the kinase active site or ATP-binding pocket of the apparently functional kinase domain (Extended Data Fig. 7), supporting the notion that kinase-mediated signaling is required for function.
The transgenic expression of Sr43 in a different background allowed us to confirm the broad-spectrum efficacy of Sr43, highlighting its potential value in resistance breeding. However, it is possible to obtain gain-of-virulence pathogen mutants that have lost AvrSr43 function under laboratory conditions35. Therefore, Sr43 should be used in combination with other broad-spectrum resistance genes to maximize its longevity in the field.