Characterization of stem rust responses
A total of 388 F2 plants from the cross of PI 272557 × PI 306540 were used to separate SrTm4 from the other three Sr genes (Sr21, Sr60 and Sr22b) present in PI 306540. From this population, we obtained one F3 family TmR4-260 carrying only SrTm4 and another family TmS4-110 carrying no Sr gene. In the seedling tests, plants from the TmR4-260 family exhibited mesothetic resistant infection types (ITs = ‘;3’ to ‘31’) to Pgt races TTTTF and 34C3RTGQM, whereas plants from family TmS4-110 displayed susceptible infection types of ‘3+’ to ‘4’ (Fig. S2). Using the software ASSESS v.2, we analyzed images of infected leaves from families TmR4-260 and TmS4-110 and measured the percentage of leaf areas covered by rust pustules. The data showed that the average percentage was significantly lower (P < 0.001) in the plants with SrTm4 relative to those without this gene (Fig. S2).
The Pgt race TTTTF, which is virulent on parental resistance genes Sr21, Sr60, and Sr22b, but avirulent on SrTm4 (Briggs et al. 2015; Chen et al. 2018a), was used to determinate disease reactions in the G3116 × PI 306540 population. At the seedling stage, F3:4 plants homozygous for the presence of SrTm4 showed ITs ranging from ‘;3’ to ‘31’ (similar to PI 306540), whereas plants homozygous for the absence of the gene displayed susceptible infection types (ITs = ‘3+’ - ‘4’, similar to G3116) (Fig. 1). In this population, measurement of the percentage of leaf areas covered by stem rust pustules was also significantly lower (P < 0.001) in the plants homozygous for the resistant allele than those homozygous for the susceptible allele (Fig. 1).
High-resolution genetic map of SrTm4
SrTm4 was previously mapped within a 2.1 cM interval on chromosome arm 2AmL (Briggs et al. 2015). To accelerate the development of markers in the candidate region, we first performed an RNA-seq experiment to identify single nucleotide polymorphisms (SNPs) between the two parental lines G3116 and PI 306540. Approximately 40.9 million and 28.1 million PE150 reads were generated for PI 306540 and G3116, respectively. After removing low-quality reads and adaptors, ~ 94% of the reads were mapped to the ‘Chinese Spring’ wheat reference genome RefSeq v2.1, and a total of 84,495 polymorphisms were identified between the two T. monococcum parents.
Based on these polymorphisms, we developed 12 new markers in the candidate region (Fig. 2b, Table S2). Meanwhile, screening of 811 F2 plants from the G3116× PI 306540 yielded 48 plants with recombination events between SrTm4-flanking markers BQ461276 (IWGSC RefSeq v2.1: 760,094,323 bp) and gwm526 (763,867,267 bp), resulting in a genetic distance of 2.96 cM (Fig. 2b). The 12 new markers were used to genotype the 48 lines with recombination events and to construct a genetic map of the SrTm4 region (Fig. 2b). Based on this result, SrTm4 was mapped within a 0.37 cM interval flanked by markers CD903048 and DK658885 and completely linked to markers CK167245.1, CK167245.2, BJ314745.1 and BJ314745.2 (Fig. 2b).
To define better the position of SrTm4, we screened another 3,950 F2 plants with the new flanking markers CD903048 and DK658885. The genetic distance between these two markers was estimated to be 0.27 cM based on the 20 plants with informative recombination events identified in this screen plus another 6 plants found in the previous 811 plants. Using these informative recombination events and six new markers developed in this candidate region (Table S2), SrTm4 was finally mapped within a 0.06 cM interval flanked by marker loci CS4211 (IWGSC RefSeq v2.1: 762.67 Mb) and 130K1519 (IWGSC RefSeq v2.1: 762.67 Mb, Fig. 2c).
Analysis of the candidate gene region revealed that it partially overlapped with a 1.8 Mb chromosomal inversion between IWGSC RefSeq v2.1 (763.4 Mb to 765.2 Mb) and IWGSC RefSeq v1.1 (Fig. S3 and Table S3). Comparisons among published reference genomes of diploid, tetraploid and hexaploid wheat showed that this 1.8-Mb inversion is present only in IWGSC RefSeq v1.1 but absent in all other sequenced wheat varieties (Fig. S4), indicating that the IWGSC RefSeq v2.1 is the correct assembly for this region.
Candidate genes for SrTm4 within the colinear region of hexaploid wheat genome
The 0.06 cM candidate region between SrTm4-flanking markers CS4211 and 130K1519 defines a 1.0-Mb region (762.67-763.67 Mb, Fig. 2d) in the reference genome of hexaploid wheat ‘Chinese Spring’ (IWGSC RefSeq v2.1) that contains only 19 high-confidence annotated genes (TraesCS2A03G1276000 - TraesCS2A03G1280800, Table S4).
Among these 19 candidate genes, four were differentially expressed (FDR < 0.05; p-value < 0.01; and |log2 foldchange| > 1) between the inoculated susceptible and resistant sister lines in the RNA-seq experiment (TraesCS2A03G1276800, TraesCS2A03G1278700, TraesCS2A03G1280400, and TraesCS2A03G1280700). Two of the differentially expressed genes (DEGs) showed that their transcripts were significantly higher in the plants carrying the susceptible SrTm4 allele compared to the resistant allele, whereas the other two DEGs were downregulated in the same lines (Table S5).
Among the DEGs identified in the RNAseq data (accession number PRJNA932462), 149 genes were significantly upregulated and 136 genes were significantly downregulated in the homozygous resistant lines (PI 306540, R-F14 and R-K18) relative to the susceptible lines (G3116, S-A13 and S-E14). The principal component analysis (PCA) of the RNAseq samples confirmed that the transcriptomes of the sister monogenic lines with and without SrTm4 are very similar to each other and intermediate between the two parental lines (Fig. S1). It also shows that the small number of DEGs detected between the sister lines are not sufficient to generate a clear difference between the two groups in the PCA.
Physical maps of the SrTm4 region
To determine whether additional genes were present in the candidate region in T. monococcum, we screened the BAC libraries of the susceptible line DV92 and the resistant parent PI 306540 using the two flanking markers (CS4211 and 130K1519) and three markers completely linked to SrTm4 (PRRF4R4, CK167245.2 and BJ314745.1, Fig. 2d). In addition, we developed new markers (Table S2) from genes located in the orthologous region in the ‘Chinese Spring’ reference genome to accelerate the screening process.
By chromosome walking, we identified 12 and 11 overlapping BACs from the BAC libraries of DV92 and PI 306540, respectively, covering the candidate gene region (Fig. S5 and S6). Based on sequencing and assembly of BAC sequences, we determined that the 0.06 cM candidate region defines a contiguous sequence of 754-kb region in T. monococcum accession PI 306540 (GenBank accession QQ503488). The 12 overlapping BACs of DV92 were sequenced at lower depth using the WideSeq approach, which yielded 30 non-overlapping contigs covering 652-kb excluding gaps.
High confidence genes in the SrTm4 candidate region
Our annotation of the 754-kb sequence showed no additional genes in the SrTm4 candidate region in PI 306540 relative to Chinese Spring and DV92. TraesCS2A03G1276100 is a pseudogene in both T. monococcum genotypes (DV92 and PI 306540), suggesting it is not a good candidate gene for SrTm4. For the other genes, we focused on those that were either differentially expressed or polymorphic in their coding regions between the resistant parent PI 306540 and the susceptible line DV92 (Table S5 and S6). We detected amino acid changes between PI 306540 and DV92 for nine expressed candidate genes (Table S6), and calculated SIFT scores to estimate their predict effects on protein structure and function. We found six genes with SIFT scores lower than 0.05, which indicate a high probability of deleterious effects (Table S6). Based on the functional annotation of these genes, we prioritized TraesCS2A03G1276200, TraesCS2A03G1276800, and TraesCS2A03G1278900 given their known role in plant defense against pathogens (Hopkins et al. 2008; Laluk et al. 2011; Qiu et al. 2021; Ruffel et al. 2002; Zhang et al. 2019a; Zhang et al. 2021; Zhao et al. 2022).
The four DEGs identified in the candidate gene region are also potential candidates for SrTm4 (Table S5). For each of these genes, we sequenced the promoter and open chromatin regions identified by ATAC-seq (Debernardi et al. 2022) (Fig. S7). We found multiple polymorphisms in these regions that may explain their differential expression.
Among the four DEGs, we eliminated TraesCS2A03G1280400, which is a short putative gene with a single predicted exon encoding a 120-amino acid peptide with no similarity to any known function protein. Attempts to annotate the orthologous regions on chromosomes 2B and 2D revealed multiple frame-shift mutations, and no possible functional orthologs. Based on these results, we hypothesize that this is not a real gene, and we did not consider it further as a candidate for SrTm4.
The other three DEGs are annotated in the genome of Chinese Spring (RefSeq v2.1) as eukaryotic initiation factor 4A-III homolog B-like (TraesCS2A03G1276800), S-acyltransferase 11-like (TraesCS2A03G1278700), and acyl-activating enzyme 5 (TraesCS2A03G1280700). Their transcript levels were analyzed in Pgt-inoculated and mock-inoculated T. monococcum plants by qRT-PCR at 0 h, 3- and 6- days post inoculation (dpi). Since we were not able to detect transcripts of TraesCS2A03G1276800 in PI 306540 and TraesCS2A03G1280700 in G3116 based on RNAseq data (Table S5), their transcripts were evaluated only in PI 306540 or in G3116. We found that transcript levels of TraesCS2A03G1276800 and TraesCS2A03G1278700 in G3116 were significantly higher (P < 0.05) in Pgt-inoculated plants than in mock-inoculated controls only at 6 dpi (Fig. S8a, b). There was no significant difference in the transcript levels of TraesCS2A03G1278700 and TraesCS2A03G1280700 in PI 306540 between Pgt-inoculated and mock-inoculated plants (Fig. S8c, d). In addition, we also detected higher transcript levels of TraesCS2A03G1278700 in PI 306540 than in G3116 (Fig. S8b, c), supporting the RNAseq data analysis (Table S5).
Detection of a 593-kb chromosomal inversion in the candidate region that disrupts a potential candidate gene
Based on the chromosomal walking experiments, we observed that the order of the markers was reversed within a ~ 600-kb region between PI 306540 and DV92 (Fig. S5 and S6), indicating a potential chromosomal inversion in the candidate region. We then compared the two SrTm4 physical maps with the Chinese Spring reference genome sequence (RefSeqv2.1), and confirmed the presence of an inverted segment in PI 306540 (Fig. 4a and 4b) relative to CS and DV92. Figure 4a shows that the PI 306540 inverted region was approximately 0.8 Mb long, extending from 762.7 Mb to 763.5 Mb based on CS RefSeq v2.1 coordinates.
Further sequence analysis revealed the inversion breakpoints in PI 306540, as shown in Figure S9. On the proximal side, we delimited the inversion breakpoint to a 3.3-kb region in PI 306540. This 3.3-kb region includes the border of the inversion located at ~ 762.7 Mb (762705407–762705985 bp) and the inverted part located at ~ 763.5 Mb (763515576 − 763513146 bp) based on CS RefSeq v2.1 coordinates. The two parts of the sequences are located ~ 807 kb apart in the Chinese Spring reference genome but they are adjacent in the 3.3 kb region in PI 306540. On the distal side of the inversion, we also identified a 5.2-kb region in PI 306540 that contains two segments located far apart in Chinese Spring (762705509 − 762704291 bp and 763515638–763518339 bp, Fig. S9). Using the physical map of SrTm4, we were able to determine that the inverted region in PI 306540 was ~ 593 kb.
The proximal inversion breakpoint disrupted one gene, TraesCS2A03G1276600LC.1, which was annotated as encoding an L-type lectin-domain containing receptor kinase protein (designated here as LLK1). The inversion breakpoint is located in the coding region of this gene in PI 306540 and, therefore affects the protein structure and function. TraesCS2A03G1276600LC.1 is a truncated gene in Chinese Spring since it carries premature stop codons, but its B- and D-genome homeologs encode complete proteins of 759 and 764 amino acids, respectively. Using the publicly available transcriptome databases of DV92 and G3116 (Fox et al. 2014) and other wheat genome sequences, we found that LLK1 is expressed in susceptible T. monococcum DV92 and G3116, and encodes proteins containing ~ 763 amino acids in different wheat species (Fig. S10). Using qRT-PCR analysis, we found that the transcript levels of LLK1 in G3116 were significantly higher in Pgt-inoculated than in mock-inoculated plants at both 3- and 6-dpi (P < 0.05; Fig. S11). LLK1 is completely linked to SrTm4 in the 4,761 F2 plants, and is of particular interest because this type of proteins has been previously associated with disease resistance (Wang et al. 2015a; Wang et al. 2015b; Wang et al. 2018; Woo et al. 2016).
Based on BLASTN searches using the 3.3-kb and 5.2-kb segments carrying the inversion breakpoints in the published reference genomes, we determined that this chromosomal inversion was not present in T. urartu (G1812), Aegilops tauschii (AL8/78), T. turgidum subsp. dicoccoides (Zavitan), T. turgidum subsp. durum (Svevo) and T. aestivum (10 + wheat varieties in the Wheat Pan Genome project).