In plant breeding, markers are used to construct genetic linkage maps, perform phylogenetic and evolutionary analyses, select desired alleles, and map genes/QTLs. These activities started with the identification of several low-throughput markers, such as RFLP, RAPD, and AFLP. Later, comparatively more abundant, highly polymorphic, and co-dominant markers, known as SSRs, complemented the markers mentioned above in soybean genetics, molecular biology, and breeding research[13]. Large numbers of QTLs associated with different biotic stresses have been identified in soybean using these markers (Table 1). In addition, marker-assisted selection (MAS) has sped up the breeding process, particularly for producing disease- and insect pest-resistant cultivars[14].
Earlier, a high-density genetic map, the ‘Soybean Consensus Map (version 4.0)’, was constructed by combining available genetic and physical maps involving SSRs and SNPs[15]. Genomic applications in soybean became more frequent once its whole genome sequence became available; soybean was one of the first legumes to be sequenced and published[16]. In addition, sequencing-based genotyping approaches such as genotyping by sequencing (GBS) have become more cost-effective and efficient owing to recent advances in NGS technologies. The GBS approach gained popularity because of its simplicity and cost-effectiveness and has recently been modified and streamlined for soybean. However, sequencing-based genotyping approaches demand computational expertise and considerable time to interpret the generated data, which limits their application in marker-assisted breeding (MAB), where rapid selection is critical.
Recently, array-based genotyping technologies, which are higher-throughput, more reliable, and faster, have been introduced. They include the following six SNP genotyping arrays: (i) SoySNP6K Infinium BeadChip[17], (ii) SoySNP50K iSelect BeadChip[18], (iii) Axiom SoyaSNP array for approximately 180,000 SNPs[19], (iv) NJAU 355 K SoySNP array[5], (v) BARCSoySNP6K Infinium BeadChip[20], and (vi) Affymetrix SoySNP618K array[21].
Increasing evidence suggests that one to several reference genomes do not fully represent the genetic diversity of a species[22], limiting the identification of genetic variants, particularly for larger structural variants such as presence/absence variants and copy number variants that play key roles in the genetic determination of various traits[23]. As a result, the construction of pan-genomes is becoming increasingly popular in crops including soybean[21, 24]. Li et al. [24] published the first soybean pan-genome in 2014, which was produced by assembling seven wild soybeans and decoding them using second-generation sequencing technology. Liu et al. [21] used long-read sequencing to create a soybean pan-genome by de novo assembly of 26 representative wild and domesticated soybeans, producing golden-grade genomes for each accession as well as a graph-based genome for the first time, providing a viable foundation for future in-depth soybean functional genomic studies[21]. Most recently, a pan-genome of cultivated soybean (dubbed PanSoy) was developed by de novo genome assembly of 204 phylogenetically and regionally representative improved accessions from the larger GmHapMap collection[25].
1.1 QTL and meta-QTL-assisted breeding for biotic stress resistance in soybean
Several interval mapping studies have been conducted in soybean, identifying hundreds of QTLs for various traits, including those contributing resistance to different biotic stresses (Table 1). Among the identified QTLs, some directly affect traits, while others influence traits via epistatic interactions. For instance, in soybean, three main effect QTLs for antibiosis and antixenosis mechanisms conferring resistance to common cutworm explained 3.9–27.4% of the total phenotypic variation, while three epistatic QTL pairs explained 3.8–14% of the total phenotypic variation [26]. In another study, four QTLs providing resistance to SDS were identified and mapped on chromosomes 4, 8, 12, and 18, explaining 5.3–22.1% of the phenotypic variation. Furthermore, QTLs located on chromosomes 8 and 18 demonstrated epistatic interactions and explained almost 70% of the total phenotypic variation along with their main effects [27].
QTLs should be validated and fine mapped before they are applied in MAB. Zhu et al. [28] fine mapped a major-effect QTL for resistance to corn earworm and soybean looper to a 0.5 cM region in the soybean genome. In another study, a major QTL conferring resistance to SMV was fine mapped to a 79 kb region flanked by markers gmssr_13-14 and gm-indel_13-3. Furthermore, two important candidate genes encoding Toll Interleukin Receptor (TIR) nucleotide-binding–leucine-rich repeat (NBS-LRR) resistance proteins were reported in this region [29]. In 2021, the foxglove aphid resistance gene Raso2 was fine mapped from a 13 cM interval to a 77 kb region in the soybean genome, and one promising candidate gene encoding an NBS-LRR domain-containing protein was identified[30]. A QTL, qSCN18, providing resistance to SCN was fine mapped to a 166 kb genomic region containing 23 genes [31]. Efforts to clone QTLs have also been undertaken in soybean; for instance, two QTLs, Rhg1 and Rhg4, which provide resistance to SCN, were cloned and functionally characterized [32, 33].
Furthermore, meta-analyses of available QTLs can narrow down QTL regions most involved in trait variation, discovering candidate genes that confer resistance to different biotic stresses. In a meta-analysis of QTLs providing resistance against SCN, Guo et al.[34] identified several meta-QTLs (MQTLs) on linkage groups G, A2, B1, E, and J. Other studies identified six consensus and stable MQTLs conferring resistance to multiple insects[35] and 23 MQTLs providing resistance to different diseases, including soybean rust, brown stem rust, phytophthora root rot, Asian soybean rust, sclerotinia stem rot, white mold, SDS, frog eye leaf rust, brown stem rot, Rhizoctonia root and hypocotyl rot, and phomopsis seed decay [35]. Different criteria have also been proposed for selecting the most promising MQTLs for breeding programs [36]. Meta-analysis studies are urgently required for traits contributing to resistance to different biotic stresses in soybean. The predicted MQTLs will help MQTL-assisted breeding for improved resistance to different biotic stresses.
1.2 Genome-wide association mapping for biotic stress resistance
Interval mapping using biparental populations is limited by restricted allelic diversity and low mapping resolution. In contrast, the genome-wide association study (GWAS) approach allows researchers to examine the enormous allelic diversity in natural crop germplasm. The millions of recombination events that accumulated in germplasm during evolution increase the mapping resolution of GWAS. While GWAS is used widely for many plant species [37, 38], relatively few studies on the traits contributing to resistance to different biotic stresses have been reported in soybean (Table 2). Recently, GWAS for sclerotinia stem rot using 22,048 SNPs identified 18 marker-trait associations (MTAs) and 243 underlying candidate genes [39]. The two most promising candidate genes, Glyma.03G196000 and Glyma.20G095100, both encoding pentatricopeptide repeat proteins, were characterized using haplotype analysis [39]. In another study, four significant MTAs were identified for frog eye leaf spot [40]. Subsequently, annotation of 45 genes in the two resistance-related haplotype blocks identified the following promising candidate genes: (i) mitogen-activated protein kinase 7, (ii) pyruvate dehydrogenase, and (iii) calcium-dependent protein kinase 4. Tran et al. [41] identified 12 significant MTAs for SCN in 461 soybean accessions using GWAS. They also reported 24 potential genes, including three genes colocalized with previously reported R genes on chromosome 7. Despite its advantages, the GWAS approach remains limited by unanticipated linkage disequilibrium (LD), alleles with low frequency, disease heterogeneity, synthetic associations, missing genotypes, and disparities in outcomes among the methods used.
Frequent discrepancies are observed when comparing GWAS results for the same trait, possibly due to variations in allele frequency between populations, a lack of population structure control, or environmental factors. With the availability of standardized marker data across the USDA soybean germplasm collection, several analyses have combined historical records with GWAS to map important major-effect MTAs, including those associated with one bacterial, five fungal, two nematode, and three viral diseases [42] and with the insect pests beet armyworm, Mexican bean beetle, potato leafhopper, soybean aphid, soybean looper, and velvet bean caterpillar [43]. However, using raw measurements from different environments for many quantitative traits, including disease resistance, may reduce the power to detect significant MTAs. While data from the same environment(s) share a common environmental component, combining multiple panels grown under different conditions causes environmental effects to be incorrectly attributed to genetic variation across the panels involved. Meta-analysis offers a better solution to these GWAS challenges and can be performed using data from independent studies and available statistical methodologies. For instance, in 2021, a meta-GWAS involving 73 published studies covering 17,556 diverse soybean accessions [44] identified numerous potential candidate genes, including 33 for disease resistance traits.
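As an illustration of how summary statistics from independent GWAS can be combined, the following minimal sketch applies inverse-variance-weighted fixed-effect meta-analysis to per-study effect estimates for a single SNP. This is a generic, widely used method and is not necessarily the one used in the meta-GWAS cited above; the function name and the input values are hypothetical, and the sketch assumes that all studies report the effect of the same allele on a comparable scale.

```python
import math


def fixed_effect_meta(estimates):
    """Combine per-study (beta, standard_error) pairs for one SNP.

    Assumption: every study reports the effect of the same allele on a
    comparable phenotype scale. Each study is weighted by 1 / SE^2, so
    more precise studies contribute more to the pooled estimate.
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal approximation.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z_score, p_value


# Hypothetical effect estimates for one SNP from two independent panels.
example_studies = [(0.4, 0.2), (0.6, 0.2)]
pooled_beta, pooled_se, z_score, p_value = fixed_effect_meta(example_studies)
print(f"pooled beta={pooled_beta:.3f}, SE={pooled_se:.3f}, z={z_score:.2f}, p={p_value:.2e}")
```

Because only effect sizes and standard errors are pooled, each panel can first be analyzed within its own environment, which avoids mixing raw measurements from different environments. A random-effects model may be preferable when effects are expected to differ among panels.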
1.3 QTL-seq, MutMap, M2-seq, and ChIP-seq: progress and prospects
Michelmore et al. [45] proposed a powerful concept, bulk segregant analysis (BSA), for mapping traits of interest showing Mendelian inheritance. This concept has been adapted to re-sequencing-based techniques such as MutMap and QTL-seq, which are cost-effective extensions of BSA that align short reads from bulks to the reference genome instead of re-sequencing all the progenies of the biparental population. Several traits in soybean have been mapped using MutMap and QTL-seq approaches [46]; for instance, QTL-seq identified genomic regions on chromosomes 5, 8, and 14 conferring resistance to charcoal rot [47]. Thus, MutMap and QTL-seq are potential tools for future biotic stress resistance studies in soybean.
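The core statistic behind QTL-seq can be sketched in a few lines. The SNP-index of a bulk at a site is the fraction of aligned reads carrying the non-reference allele, and the ΔSNP-index is the difference between the two bulks; regions linked to the trait are expected to show values far from zero. The sketch below is a simplified illustration, not the published pipeline: the function names, the depth threshold, the subtraction order (resistant minus susceptible), and the read counts are all assumptions made for the example.

```python
def delta_snp_index(resistant_bulk, susceptible_bulk, min_depth=8):
    """Compute the delta SNP-index at sites covered in both bulks.

    Each bulk maps (chromosome, position) -> (alt_reads, total_reads).
    Sites with fewer than min_depth reads in either bulk are skipped
    (min_depth=8 is an arbitrary illustrative threshold).
    """
    deltas = {}
    for site, (alt_r, depth_r) in resistant_bulk.items():
        if site not in susceptible_bulk:
            continue
        alt_s, depth_s = susceptible_bulk[site]
        if depth_r < min_depth or depth_s < min_depth:
            continue
        # SNP-index = alternate-allele reads / total reads in that bulk.
        deltas[site] = alt_r / depth_r - alt_s / depth_s
    return deltas


def window_mean(deltas, chromosome, start, end):
    """Average the delta SNP-index over sites with start <= position < end."""
    values = [d for (chrom, pos), d in deltas.items()
              if chrom == chromosome and start <= pos < end]
    return sum(values) / len(values) if values else None


# Hypothetical read counts; chromosome names and positions are made up.
resistant = {("Chr05", 100): (18, 20), ("Chr05", 200): (9, 10), ("Chr08", 50): (5, 10)}
susceptible = {("Chr05", 100): (2, 20), ("Chr05", 200): (1, 10), ("Chr08", 50): (5, 10)}
deltas = delta_snp_index(resistant, susceptible)
print(window_mean(deltas, "Chr05", 0, 1000), window_mean(deltas, "Chr08", 0, 1000))
```

In practice, window averages are compared against confidence intervals obtained by simulation under the null hypothesis of no linkage, a step omitted here for brevity.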
However, these techniques have certain limitations, such as the requirement for backcrossing and selfing generations, which makes them time-consuming. Later, an extension of these techniques, named ‘direct whole genome re-sequencing’, was proposed to overcome these limitations[48]. The potential of this technique was demonstrated by mapping genomic regions associated with SCN resistance using two mutants of the soybean genotype ‘PI 437654’ [48]. Comparing the short reads of six re-sequenced genomes of the mutants to the reference genome ‘Williams 82’ narrowed the genomic region to three candidate genes, encoding an ankyrin repeat family protein, an LRR receptor-like protein kinase family protein, and a signal peptidase, carrying ten mutations for the resistance gene. Bhattacharyya and Feng (2021) proposed another re-sequencing strategy for soybean, known as M2-seq. The authors successfully mapped the genomic regions containing several causal mutations for mutant phenotypes among ten independent M2 populations and identified Glyma.08G193200 as a candidate gene controlling trichome density, which is involved in broad-spectrum insect resistance. The technique is less laborious than MutMap and has great potential for exploiting disease and insect resistance in soybean.
Biotic stresses trigger a plethora of genes, transcription factors (TFs), and regulatory elements in the plant genome, inducing a complex network of stress resistance pathways. Understanding genome-level changes in genes involved in defense mechanisms requires integrating knowledge of regulatory and non-regulatory genomic regions, particularly epigenetic modifications. The specific sites of TF binding and their related target genes, including those involved in biotic stresses, can be identified using chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) [49]. Several ChIP-seq reports are available for humans and the model plant species Arabidopsis but not for biotic stress resistance in soybean. This technology has the potential to unravel biotic stress dynamics and epigenetic modification in soybean.
1.4 Genome editing for developing resistant soybean cultivars
With the advent of large sequencing datasets, mapped QTLs, identified candidate genes, and trait-associated SNPs and InDels, the architecture of plant genomes can be reshaped using genome editing approaches to suit the needs of the growing population. Among genome editing tools, CRISPR/Cas9 has emerged as the most successful and widely applicable owing to its specificity and minimal off-target effects. Genes involved in biotic stress responses can be edited using CRISPR/Cas9 to impart resistance and generate multiple sources of defense response to different pests and pathogens. For instance, editing of the effector molecule Avr4/6 of P. sojae inhibited its identification by corresponding R proteins in soybean (Rps4 and Rps6), resulting in immunity activation and resistance against damping-off disease [50]. Another study reported resistance to SMV and enhanced isoflavone content after simultaneously knocking out three genes (viz., GmF3H1, GmF3H2, and GmFNSII-1) in the soybean genome using CRISPR/Cas9 [51]. Recently, novel R gene paralogs in soybean were developed through targeted cleavage of tandemly duplicated regions of NBS-LRR genes and subsequent repair of DNA [52]. Since NBS-LRRs are common targets of pathogenicity, targeting these genes with double-strand breaks induced by genome editing followed by natural DNA repair could trigger underlying immunity. Two genes, Rpp1L and Rps1, which interact with Phakopsora pachyrhizi[53] and P. sojae, respectively, were targeted to demonstrate the efficacy of the technology. Genes for biotic stress resistance have been preserved in crop wild species, landraces, and diverse germplasms; however, substantial variation has been eroded in the domestication process. Thus, genome editing can offer a diverse set of biotic stress resistance genes for mitigating the challenges posed by various pests and pathogens.
1.5 Role of genomic selection in improving resistance against biotic stresses in soybean
Genomic selection (GS) can be used to address the limitations associated with linkage-based mapping, GWAS, and marker-assisted breeding for biotic stress resistance[54–56]. GS has been used in soybean for various traits contributing to resistance to different biotic stresses[57, 58]. For instance, GS for SCN resistance in soybean was more effective than MAS, with a prediction accuracy of 0.59–0.67 [58]. A GS study with 697 accessions in the training population and 18,955 accessions in the breeding population predicted susceptibility to tobacco ring spot virus; a significant correlation (r² = 0.67) was observed between the predicted and actual severity of the disease[59]. In another study, GS was performed for quantitative disease resistance against P. sojae in two diverse panels of more than 450 plant introductions; the study estimated a relative efficiency per breeding cycle of 0.57–0.83, which may significantly improve the genetic gain for P. sojae disease resistance in soybean breeding programs [57].
Studies have also shown that incorporating significant SNPs identified through GWAS as fixed effects in GS models can increase the prediction accuracy for complex traits[60, 61]. For instance, Ravelombola et al. [62] identified 22, 14, and 16 SNPs associated with chlorophyll content indices (CCI) of uninfected and SCN-infected plants and the reduction of CCI SCN, respectively, in a GWAS of 172 soybean genotypes using 4,089 SNPs. The average GS accuracy ranged from 0.31 to 0.46 when all SNPs were considered for GS, whereas it ranged from 0.55 to 0.76 when GWAS-derived SNP markers were used as fixed effects in the GS models. In another study, when marker effects were evaluated using SNPs obtained from GWAS with the GS model, the maximum accuracy (~2-fold) was achieved at each level of cross-validation [62]. Collectively, we have summarized the current breakthroughs in soybean genomics that are being utilized to improve soybean resistance to different biotic stresses, as represented in Fig. 2.
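The idea of treating GWAS-derived SNPs as fixed effects can be sketched with a ridge-regression form of Henderson's mixed model equations, in which the remaining markers are shrunk toward zero while the fixed-effect SNPs are estimated without a penalty. This is a minimal illustration under stated assumptions, not the model used in the studies cited above: the function names, the 0/1/2 genotype coding, the toy data, and the penalty `lam` (standing in for the ratio of residual to marker variance) are all hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_gs_with_fixed_snps(fixed, markers, phenotypes, lam):
    """Estimate unpenalized fixed effects and ridge-shrunk marker effects.

    fixed:   rows of fixed-effect covariates (intercept plus GWAS-derived SNPs)
    markers: rows of genotype codes for the remaining genome-wide markers
    lam:     ridge penalty applied to marker effects only (an assumed value)
    """
    design = [f + m for f, m in zip(fixed, markers)]
    n_fixed, n_cols = len(fixed[0]), len(design[0])
    lhs = [[sum(row[a] * row[b] for row in design) for b in range(n_cols)]
           for a in range(n_cols)]
    for j in range(n_fixed, n_cols):
        lhs[j][j] += lam  # shrink only the random marker effects
    rhs = [sum(row[a] * y for row, y in zip(design, phenotypes)) for a in range(n_cols)]
    coefficients = solve_linear(lhs, rhs)
    return coefficients[:n_fixed], coefficients[n_fixed:]


def predict(fixed_row, marker_row, fixed_effects, marker_effects):
    """Genomic estimated breeding value plus fixed effects for one line."""
    return (sum(x * b for x, b in zip(fixed_row, fixed_effects))
            + sum(z * u for z, u in zip(marker_row, marker_effects)))


# Toy data: intercept and one GWAS-derived SNP as fixed, two background markers.
fixed = [[1, 0], [1, 1], [1, 2], [1, 0], [1, 1], [1, 2]]
markers = [[0, 0], [0, 2], [1, 0], [2, 1], [1, 1], [2, 2]]
phenotypes = [1.0, 3.0, 5.5, 2.0, 3.5, 6.0]
fixed_effects, marker_effects = fit_gs_with_fixed_snps(fixed, markers, phenotypes, lam=1.0)
print(fixed_effects, marker_effects)
```

Prediction accuracy would then typically be estimated by cross-validation as the correlation between predicted and observed values in held-out lines. Selecting the fixed-effect SNPs within each training fold, rather than on the full dataset, avoids inflating that estimate.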
1.6 Challenges and prospects of soybean genomics for biotic stress resistance
With substantial advances in NGS technologies, sequencing and genotyping have become more available and affordable, enhancing the application of genomics approaches for soybean improvement. However, the true challenge is discovering genome features that define the biology; an assembled reference genome sequence is merely a foundation[16]. Even though every cell has the same DNA sequence, epigenetic changes and gene expression differ substantially depending on the environment, developmental stage, and tissue type. Future efforts are needed to fully understand these biologically active states of DNA in the soybean genome. Such investigations may involve several forms, including integrating maps of DNA methylation, small RNA, histone modification, and transcript abundance measured across multiple tissues and environments. Furthermore, pangenomics is gaining popularity among scientists for tapping into species diversity, owing to genome sequencing and re-sequencing data availability. The newly constructed graph-based soybean pangenome revitalizes earlier omics data while revolutionizing functional and evolutionary genomic studies in soybean[21]. We suggest a more comprehensive approach in which we aspire for a super-pangenome, with the developing NGS technologies and decreasing costs.
Genomics-assisted breeding (GAB) offers the potential to solve future soybean improvement challenges, but this potential can only be realized by precisely detecting marker-trait relationships (through interval mapping and GWAS) and estimating GEBVs (via GS). Both interval mapping and GWAS require precise genotyping and phenotyping data for gene identification. Breeders have reported considerable gaps between identifying QTLs/genes and their subsequent application in soybean improvement. The use of manual phenotyping and low-throughput marker systems, which has generated a genotype–phenotype (GP) gap, is the fundamental reason for the minimal use of MAS in soybean improvement. Progress in high-throughput phenotyping and genotyping can reduce the GP gap for crop selection and gene identification in soybean.
Continuous publishing of raw phenotypic results from screenings will improve the power of identifying relevant genes through meta-GWAS for both mean and plastic responses, reducing the expense and time load on individual programs while helping future breeders and researchers. Furthermore, as QTLs conferring resistance to different diseases and pests are mapped to the soybean genome, their spatial relationships (in the chromosomal context) can be assessed using meta-analysis of available QTLs. Co-localization of QTLs for different diseases and pests offers evidence for multiple disease resistance loci, as observed in different crops[36].
The advent of long-read sequencing technologies is hastening the identification of haplotypes that aid genome assembly[63], which can be used for various applications, such as haplotype-based GWAS analysis. The superiority of haplotype-assisted GS over SNP-based genomic prediction has been reported for charcoal rot resistance in soybean[64]. Developments in high-throughput phenotyping will also help identify superior haplotypes for subsequent application in disease and pest resistance breeding in soybean. Haplotype-based breeding aims to transfer superior haplotypes underlying genetic variants in strong LD with putative genomic regions linked to traits of interest. In recent years, haplo-pheno analysis has been used to detect superior haplotypes in some crop species[65]. However, haplo-pheno analyses have not been reported for soybean traits providing resistance to different biotic stresses. Efforts are needed to identify superior haplotypes for biotic stress resistance in soybean; the recently constructed haplotype map for soybean using WGS data may prove useful in this regard [63].