Phenotypic evaluation of southern stem canker resistance in soybean accessions
All accessions were inoculated with mycelium from the CMES 480 isolate using the toothpick method under greenhouse conditions [26, 27]. The results of the inoculation experiment were expressed as the percentage of dead plants (%DPs), and all the differential genotypes showed a small lesion at the point on the stem where the toothpick penetrated, indicating that an infection had successfully occurred in all the inoculated plants. The cultivars Tracy-M (Rdm1/Rdm2), Crockett (Rdm3) and Hutcheson (Rdm5), which are sources of SSC resistance, showed complete resistance against the D. aspalathi isolate CMES 480, PI 398469 (Rdm?) also showed a high degree of resistance, but we still observed 3% DPs. On the other hand, the interactions between CMES 480 and the accessions harboring the Rdm1 (D85-10404), Rdm2 (D85-10412) and Rdm4 (cv. Dowling) genes were all compatible, such that these accessions were all highly susceptible (Table 1). The isolate CMES 480 was recognized by multiple R genes, resulting in the possibility of identifying different resistance loci if they are distributed in the GWAS panel.
Southern stem canker symptoms were evaluated at 60 days after inoculation and, as expected, known resistant (cv. Tracy-M) and susceptible (cv. BR 23) accessions showed highly contrasting results (Figure 1a). The resistant plants showed only a small area of necrosis in the stem tissue around the toothpick, the presence of a callus at the toothpick insertion point and no damage to plant development. On the other hand, the susceptible accessions presented both infected and dead plants, where the infected plants were identified on the basis of the absence of a callus, a reduction in the development of the aerial parts of the plant, a large necrotic region at the point of inoculation, and the presence of chlorotic and withered plants. Another parameter that easily distinguished resistant and susceptible plants was the length of the internal lesion; resistant plants usually showed a lesion length of less than 1 cm, unlike susceptible plants, which presented lesions greater than 1 cm (Figure 1b).
The pathogenicity test was carried out for all 295 accessions included in the GBS panel, where 205 were considered resistant, and 90 were susceptible. To highlight the diversity of the panel, among the resistant plants, 26% of the accessions came from China, 22% from Brazil, 20% from Japan and 12% from the USA. In the susceptible group, Brazil contributed 33% of the susceptible accessions; the USA contributed 20%; China contributed 18%; and South Korea contributed 17%. Based on the year of the release/cataloguing of the materials, the oldest resistant accessions in the panel (1930s) came from China and North Korea, while cultivars Tropical and cv. Doko were the oldest resistant Brazilian materials (1980s). PI 090763 from China (1930s), PI 196170 (South Korea), accessions from Japan (1950s), cv. Santa Rosa (1957), and the American cultivars Bragg and Davis (1960s) were examples of the oldest susceptible materials in this panel.
Identification and mapping of the southern stem canker resistance locus
The Fast-GBS pipeline produced approximately 50,000 high-quality SNPs from the GBS data. Using an MAF of ≥0.05 as a cut-off, we selected a total of 32,836 polymorphic SNP markers that we used in GWAS. The resulting SNPs were distributed over the whole genome. These SNPs proportionally covered all soybean chromosomes, with a mean SNP density of one SNP every 29.1 Kbp and a mean of 1,642 SNP markers per chromosome. The greatest number of SNPs was detected on chromosome 18 (2,845 SNPs), followed by chromosome 4 (2,145 SNPs), and the lowest numbers were observed on chromosomes 12 (951 SNPs) and 11 (959 SNPs) (Additional File 1). Regarding population structure, a principal component analysis (PCA) was performed, in which PC1 explained approximately 9% of the observed variance, PC2 approximately 7% and PC3 approximately 4%; together, the three PCs explained approximately 20% of the total genetic variance (Figure 2a and 2b). The GWAS was performed with the compressed mixed linear model (cMLM), which accounted for population structure (PCA) and relatedness by the kinship matrix (K matrix). The quantile-quantile plot showed that the observed p-values strongly deviated from the expected p-values for a few SNPs, which indicated that the cMLM model was appropriate for the performed GWAS (Figure 2c). We identified a single locus on chromosome 14 at which a total of 19 SNPs showed significant associations (FDR < 0.001) with SSC resistance (Figure 2d). Among these significant SNPs, the FDR-adjusted p-value ranged between 6.35E-27 and 4.13E-09, with SNPs explaining approximately 40% to 70% of the total phenotypic variation (Table 2).
The interval delimited by the significant SNPs extended just over 400 kbp, although the three most significant SNPs were located within a span of 34 kbp, thus identifying a very specific region. Within this region, the most significant SNP resided within Glyma.14g024300 (a DEA(D/H)-box RNA helicase family protein), the second most significant SNP resided within Glyma.14g024100 (a Rho GTPase-activating protein), and the third most significant SNP was located within Glyma.14g23900 (a methionine sulfoxide reductase).
Based on the results, the peak SNP by itself was sufficient to separate the resistant and susceptible accessions with a high level of concordance. At the peak SNP (1,744,370 – SNP1), the C allele was detected in 194 resistant accessions, while four resistant accessions were heterozygous, and the remaining seven resistant accessions showed the T allele. Similarly, an elevated concordance between the phenotype and genotype was observed among the susceptible materials. Among 90 susceptible accessions, 71 showed the T allele. Of the 19 apparent discrepancies, 16 accessions were heterozygous, and the remaining three carried the C allele. A comprehensive description of the SNP genotypes (at all 19 significant positions) and phenotypes for each accession is provided in Additional File 2.
Among the differential accessions, the C allele was detected at the peak SNP in all accessions that showed resistance to isolate CMES 480 as well as in the susceptible accession D85-10404, which is a line derived from cv. Tracy-M. On the other hand, cv. Dowling and the D85-10412 line showed both the susceptible phenotype and the T allele (Additional File 3).
We performed a haplotype analysis of the 295 accessions using SNPs associated with SSC resistance. First, from the initial 19 SNPs showing significant associations, we eliminated redundant SNPs (i.e., SNPs associated with SSC that provided the same information). Thereafter, we obtained four haplotypes containing the combination of four SNPs that were able to discriminate the main SSC resistance sources and grouped the accessions presented in the panel (Table 3). Haplotype 1 was present in the majority of resistant materials and was shared by cv. Hutcheson and the PI 398469 and was present in just one susceptible accession. Haplotype 2 was shared only by cv. Crockett and 35 resistant accessions. Haplotype 3, shared by cv. Tracy-M and line D85-10404, was also present in 22 resistant and two susceptible accessions. Finally, haplotype 4 was distributed in 70 susceptible accessions, in Dowling and line D85-10412 and in 5 other resistant accessions.
Whole-genome sequencing in the resistance locus interval reveals additional allelic variation
Analysis of the region associated with resistance against Da was performed by examining allelic variation 278 kb upstream and 200 kb downstream of the first peak SNP of the GWAS in the resequencing soybean dataset. This specific interval was based on SNPs with r2 values higher than 0.3, according to the LD analysis. (Additional File 4). We observed a total of 4,440 SNPs and 1,105 InDels in this interval (Table 4). Among the SNPs, 3,375 were identified in noncoding regions, 421 in intronic regions, 247 in UTRs, and 397 in exons. Among the last group, 248 nonsynonymous SNPs were observed in 39 different genes. Moreover, there were 69 InDels in UTRs, 98 InDels in introns, and 37 InDels in exons. Twenty-three InDels were responsible for a frameshift modification in 9 different genes.
The most significant SNP was a nonsynonymous modification located at exon 6 of the Glyma.14G024300 gene (encoding a DEAD/DEAH box RNA helicase). We also identified three other nonsynonymous SNPs associated with this gene (Figure 3), which were in perfect LD with the first peak SNP and could not be detected by the GBS strategy due the lower coverage of the technique compared to whole-genome sequencing. Unsurprisingly, given the large size of the haplotype block comprising the peak SNP, we observed 216 SNPs and 46 InDels in perfect LD (r2 = 1) with the first peak SNP of the GWAS, at a distance up to 224 Kbp from the described allele (Additional File 4). Some of these allelic variations were distributed within genes in the interval that presented structural domains commonly found in resistance genes, revealing other potential candidate genes for SSC resistance. Fifteen nonsynonymous SNPs were observed in eight genes, including two leucine-rich-repeat receptor-like protein kinases (LRR-RPK) (Glyma.14G026300, and Glyma.14G026500), a serine-threonine protein kinase (PRSTK) (Glyma.14G026700), a PH domain LRR-containing protein phosphatase 1 (Glyma.14G024400), a methyltransferase (Glyma.14G026600), an acid phosphatase-related gene (Glyma.14G024700), and a gene involved in DNA repair (Glyma.14G026900) (Table 5). Finally, an insertion of two nucleotides responsible for a frameshift modification in the exon of an LRR-RPK gene (Glyma.14G026500) was observed only in susceptible cvs. based on our analysis. To confirm the association of these allelic variations and the role of potential candidate genes in resistance to SSC, functional validation should be conducted in future studies.
Allelic discrimination using the Rdm SNP KASP assay
The peak SNP (1,744,370) was selected to develop a KASP assay to confirm the alleles obtained by GBS and to apply this assay in future MAS. Thus, a subset of 146 accessions from the GWAS panel were analyzed with this assay, and as expected, all of the same alleles/genotypes obtained by GBS were obtained using the KASP assay (Additional File 5). Furthermore, the developed assay was able to correct the heterozygous genotypes obtained by GBS (Figure 4). Among the accessions shown to be heterozygous at the peak SNP, 15 accessions were present in the subset analyzed with the assay, and all were found to be homozygous.
Therefore, the efficiency of the SNP marker and type I/II error rates were calculated and are shown in Table 6. The SNP1 marker was present in 98% of the accessions phenotyped as resistant, resulting in a low type I error rate (2.4%), which suggests a low probability of erroneously selecting a susceptible line based on the marker genotype. In addition, the marker also presented a low type II error rate or false negative rate of 1.19%.