Multi-omics assisted breeding for biotic stress resistance in soybean

Biotic stress is a critical factor limiting soybean growth and development. Soybean responses to biotic stresses such as insects, nematodes, and fungal, bacterial, and viral pathogens are governed by complex regulatory and defense mechanisms. Next-generation sequencing has made new research techniques and strategies available in genomics and post-genomics. This review summarizes the available information on marker resources, quantitative trait loci, and marker-trait associations involved in regulating biotic stress responses in soybean. We discuss the differential expression of related genes and proteins reported in different transcriptomics and proteomics studies and the role of signaling pathways and metabolites reported in metabolomic studies. Recent advances in omics technologies offer opportunities to reshape and improve biotic stress resistance in soybean by altering gene regulation and/or other regulatory networks. We suggest using ‘integrated omics’ to precisely understand how soybean responds to different biotic stresses. We also discuss the potential challenges of integrating multi-omics for the functional analysis of genes and their regulatory networks and the development of biotic stress-resistant cultivars. This review will help direct soybean breeding programs to develop resistance against different biotic stresses.


Introduction
Globally, soybean [Glycine max (L.) Merrill] is an economically important legume contributing a huge portion of edible oil (approximately 59%). It is aptly known as a 'wonder crop' due to its rich nutritional composition of protein (40%) and edible oil (20%), providing nutritional security to a vast population around the globe [1]. It has diverse applications as food, biofuel, and animal feed (SoyStat 2022). It is an excellent source of protein, which provides all the essential amino acids, with major proteins including glycinin, conglycinin, lectin, and trypsin inhibitors. Its fatty acid composition comprises palmitic, stearic, oleic, linoleic, and linolenic acids. Soybean also contains trace amounts of minerals, vitamins, phytin, and phenolics [2]. Furthermore, as a leguminous crop, soybean fixes atmospheric nitrogen with the help of symbiotic Rhizobium bacteria, improving soil fertility and sustainability.
Soybean is a paleopolyploid with 40 diploid chromosomes [3]; thus, it is difficult to pinpoint its center of origin due to the complexity of its genome. According to two hypotheses, viz., single and multiple centers of origin, central China [4], northern China [5], Japan [6], or Korea [7] could be its center/s of origin. Soybean is cultivated on 124.9 million hectares (ha) worldwide, producing 339 million metric tons and averaging 2.79 t/ha (FAOSTAT 2018; SoyStat 2022). Brazil, the United States, Argentina, China, India, Paraguay, and Canada account for approximately 94% of the world's soybean production (http://soystats.com/).
Plants are sessile organisms that must acclimatize in response to changes in their growing environment. Changing climatic conditions can directly cause abiotic stresses (heat, drought, salt, cold, nutrient deficiency, and heavy metals) and indirectly change the distribution and emergence of new pests and pathogens. Biotic stresses are caused by many organisms, such as fungi, bacteria, viruses, insects, and nematodes. Abiotic and biotic stresses act as selection pressures on plant populations by changing their plant architecture or weakening their defense mechanisms. These stresses cause significant yield losses, as they hamper plant growth, development, survival, and biomass production and pose a major threat to food security worldwide [8].
The rise in high-throughput next-generation sequencing (NGS) in the past few decades has led to the emergence of 'omics' approaches such as genomics, transcriptomics, proteomics, metabolomics, and phenomics. Integrated omics approaches can provide insights into the cellular mechanisms of an organism and their relationships with external environmental factors to produce a characteristic phenotype (Fig. 1). Omics have expanded our knowledge by unraveling complex biological interactions (genotype × environment). Furthermore, omics approaches have enabled the rapid identification of putative genes/quantitative trait loci (QTLs) associated with biotic stress resistance, the pathways involved, and functional analysis of intermediate or final proteins and metabolites produced. However, there is a lack of comprehensive information on using omics approaches for biotic stress resistance in soybean. Therefore, we review various omics approaches used in soybean to improve crop performance under different biotic stresses to assist future soybean breeding in developing novel resistant cultivars.

Soybean genomics
In plant breeding, markers are used to construct genetic linkage maps, perform phylogenetic and evolutionary analyses, select desired alleles, and map genes/QTLs. These activities started with the identification of several low-throughput markers, such as RFLP, RAPD, and AFLP. Later, comparatively more abundant, highly polymorphic, and co-dominant markers, known as SSRs, complemented the markers mentioned above in soybean genetics, molecular biology, and breeding research [13]. Large numbers of QTLs associated with different biotic stresses have been identified in soybean using these markers (Supplementary Table 1). In addition, marker-assisted selection (MAS) has sped up the breeding process, particularly for producing disease- and insect pest-resistant cultivars [14].
Earlier, a high-density genetic map, 'Soybean Consensus Map (version 4.0)', was constructed by combining available genetic and physical maps involving SSRs and SNPs [15]. Genomic applications in soybean became more frequent with the availability of the whole genome sequence, one of the first legume genomes to be sequenced and published [16]. In addition, sequencing-based genotyping approaches such as genotyping by sequencing (GBS) have become more cost-effective and efficient owing to recent advances in NGS technologies. The GBS approach gained much popularity due to its simplicity and cost-effectiveness and has been recently modified and streamlined for soybean. However, sequencing-based genotyping approaches necessitate computational knowledge and significant time for interpreting the generated data, limiting their application in marker-assisted breeding (MAB), where rapid selection is critical. Recently, faster, more reliable, and higher-throughput array-based genotyping technologies have been introduced, including the following six SNP genotyping arrays: (i) SoySNP6K Infinium BeadChip [17], (ii) SoySNP50K iSelect BeadChip [18], (iii) Axiom SoyaSNP array for approximately 180,000 SNPs [19], (iv) NJAU 355K SoySNP array [5], (v) BARCSoySNP6K Infinium BeadChip [20], and (vi) Affymetrix SoySNP618K array [21].
Increasing evidence suggests that one to several reference genomes do not fully represent the genetic diversity of a species [22], limiting the identification of genetic variants, particularly for larger structural variants such as presence/
Fig. 1 Various biotic stresses that cause diseases in soybean. A multi-omics approach integrating various omics technologies can be used to understand the complete dynamic interaction between host and external abiotic agents and thus produce healthy plants
variation. Furthermore, QTLs located on chromosomes 8 and 18 demonstrated epistatic interactions and explained almost 70% of the total phenotypic variation along with their main effects [27].
QTLs should be validated and fine mapped before they are applied in MAB. Zhu et al. [28] fine mapped a major effect QTL for corn earworm and soybean looper to a 0.5 cM region in the soybean genome. In another study, a major QTL conferring resistance to soybean mosaic virus (SMV) was fine mapped to 79 kb flanked by markers gmssr_13-14 and gm-indel_13-3. Furthermore, two important candidate genes encoding Toll Interleukin Receptor (TIR) nucleotide-binding-leucine-rich repeat (NBS-LRR) resistance proteins were reported in this region [29]. Most recently, in 2021, the foxglove aphid resistance gene Raso2 was fine mapped from a 13 cM interval to a 77 kb region in the soybean genome, and one promising candidate gene encoding an NBS-LRR domain-containing protein was identified [30]. A QTL, qSCN18, providing resistance to soybean cyst nematode (SCN), was fine mapped to a 166 kb genomic region containing 23 genes [31]. Efforts to clone QTLs have also been undertaken in soybean; for instance, two QTLs, Rhg1 and Rhg4, which provide resistance to SCN, were cloned and functionally characterized [32,33].
Furthermore, meta-analyses of available QTLs can narrow down QTL regions most involved in trait variation, discovering candidate genes that confer resistance to different biotic stresses. In a meta-analysis of QTLs providing resistance against SCN, Guo et al. [34] identified several meta-QTLs (MQTLs) on linkage groups G, A2, B1, E, and J. Other studies identified six consensus and stable MQTLs conferring resistance to multiple insects [35] and 23 MQTLs providing resistance to different diseases, including soybean rust, brown stem rust, phytophthora root rot, Asian soybean rust, sclerotinia stem rot, white mold, SDS, frog eye leaf rust, brown stem rot, Rhizoctonia root and hypocotyl rot, and phomopsis seed decay [35]. Different criteria have also been proposed for selecting the most promising MQTLs for breeding programs [36]. Meta-analysis studies are urgently required for traits contributing to resistance to different biotic stresses in soybean. The predicted MQTLs will help MQTL-assisted breeding for improved resistance to different biotic stresses.

Genome-wide association mapping for biotic stress resistance
Interval mapping using biparental populations has a major limitation of restricted allelic diversity and resolution. In contrast, the genome-wide association study (GWAS) approach allows researchers to examine the enormous allelic diversity in natural crop germplasm. The millions of crossing events in germplasm during evolution increases
absence variants and copy number variants that play key roles in the genetic determination of various traits [23]. As a result, the construction of pan-genomes is becoming increasingly popular in crops, including soybean [21,24]. Li et al. [24] published the first soybean pan-genome in 2014, which was produced by assembling seven wild soybeans and decoding them using second-generation sequencing technology. Liu et al. [22] used long-read sequencing to create a soybean pan-genome by de novo assembly of 26 representative wild and domesticated soybeans, producing golden-grade genomes for each accession as well as a graph-based genome for the first time, providing a viable foundation for future in-depth soybean functional genomic studies [21]. In 2021, a pan-genome of cultivated soybean (dubbed PanSoy) was developed by de novo genome assembly of 204 phylogenetically and regionally representative improved accessions from the larger GmHapMap collection [25]. In 2022, another pan-genome was developed by assembling and analyzing more than 1,000 soybean accessions derived from the USDA Soybean Germplasm Collection involving both cultivated and wild lineages [25]. The pan-genomes discussed above could be valuable resources for developing more efficient and targeted molecular breeding approaches for improving soybean resistance to different biotic stresses.
Further, soybean genomics research requires an informatics infrastructure that makes the various genomic resources comprehensive and easily accessible in order to decipher the soybean genome. To this end, major research groups worldwide have created soybean bioinformatics databases based on various bioinformatics resources and tools. Table 1 lists some soybean-specific bioinformatics tools and database resources that are useful for genomic and other omics studies in soybean, along with their websites and the information they provide.

QTL and meta-QTL-assisted breeding for biotic stress resistance in soybean
Several interval mapping studies have been conducted in soybean, identifying hundreds of QTLs for various traits, including those contributing resistance to different biotic stresses (Supplementary Table 1). Among the identified QTLs, some directly affect traits, while others influence traits via epistatic interactions. For instance, in soybean, three main effect QTLs for antibiosis and antixenosis mechanisms conferring resistance to common cutworm explained 3.9-27.4% of the total phenotypic variation, while three epistatic QTL pairs explained 3.8-14% of the total phenotypic variation [26]. In another study, four QTLs providing resistance to SDS were identified and mapped on chromosomes 4, 8
Frequent discrepancies are observed when comparing GWAS results for the same trait, possibly due to variations in allele frequency between populations, a lack of population structure control, or environmental factors. With the availability of standardized marker data across the USDA soybean germplasm collection, several analyses have used historical records and GWAS to map important major-effect MTAs, including those associated with one bacterial, five fungal, two nematode, and three viral diseases [42] and with beet armyworm, Mexican bean beetle, potato leafhopper, soybean aphid, soybean looper, and velvet bean caterpillar [43]. However, using raw measurements from different environments for many quantitative traits, including disease resistance, may reduce the detection power for significant MTAs. While data from the same environment(s) have a common environmental component, combining multiple panels grown in different conditions results in the incorrect assignment of environmental effects to genetic variations across the panels involved. Meta-analysis can provide a better solution for the above-mentioned GWAS challenges and can be performed using data from independent studies and available statistical methodologies.
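To make the idea of combining independent GWAS results concrete, the following minimal sketch pools per-study effect estimates for a single marker by inverse-variance weighting (a standard fixed-effect meta-analysis). It is an illustration under stated assumptions, not the method of any study cited here: the function name, the input layout, and the example numbers are hypothetical, and a real meta-GWAS would also need allele harmonization and heterogeneity checks.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect estimates for one marker (inverse-variance weights).

    effects    : list of effect-size estimates, one per study (hypothetical input)
    std_errors : list of the matching standard errors
    Returns (pooled_effect, pooled_se, z_score, two_sided_p).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and equal in length")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Precise studies (small standard error) receive larger weights.
    weights = [1.0 / (se * se) for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled / pooled_se

    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled, pooled_se, z_score, p_value


# Illustrative values only: three panels reporting the same marker.
pooled, se, z, p = fixed_effect_meta([0.42, 0.35, 0.51], [0.15, 0.20, 0.25])
print(f"pooled effect={pooled:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Working from summary statistics rather than raw phenotypes is one way to sidestep the environment-confounding problem described above, because each effect is estimated within its own panel before pooling.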
For instance, in 2021, a meta-GWAS involving 73 published studies covering 17,556 diverse soybean accessions
the mapping resolution of GWAS. While GWAS is used widely for many plant species [37,38], relatively few studies on the traits contributing to resistance to different biotic stresses have been reported in soybean (Table 2). Recently, GWAS for sclerotinia stem rot using 22,048 SNPs identified 18 marker-trait associations (MTAs) and 243 underlying candidate genes [39]. The two most promising candidate genes, Glyma.03G196000 and Glyma.20G095100, encoding pentatricopeptide repeat proteins, were characterized using haplotype analysis [39]. In another study, four significant MTAs were identified for frogeye leaf spot [40]. Subsequently, annotation of 45 genes in the two resistance-related haplotype blocks identified the following promising candidate genes: (i) mitogen-activated protein kinase 7, (ii) pyruvate dehydrogenase, and (iii) calcium-dependent protein kinase 4. Tran et al. [41] identified 12 significant MTAs for SCN in 461 soybean accessions using GWAS. They also reported 24 potential genes, including three genes colocalized with previously reported R genes on chromosome 7. Despite several advantages, unanticipated linkage disequilibrium (LD), alleles with low frequency, disease heterogeneity, synthetic associations, missing genotypes, and outcome disparities between the various methods used continue to bottleneck the GWAS approach.
immunoprecipitation followed by high-throughput sequencing technologies (ChIP-seq). Several reports of ChIP-seq are available for humans and the model plant species Arabidopsis but not for biotic stress resistance in soybean. This technology has the potential to unravel biotic stress dynamics and epigenetic modification in soybean.

Genome editing for developing resistant soybean cultivars
With the advent of large sequencing data, mapped QTLs, identified candidate genes, and trait-associated SNPs and InDels, the architecture of plant genomes can be reshaped using genome editing approaches to suit the needs of the growing population. Among genome editing tools, CRISPR/Cas9 has emerged as the most successful and widely applicable owing to its specificity and minimal off-target effects. Genes involved in biotic stress responses can be edited using CRISPR/Cas9 to impart resistance and generate multiple sources of defense response to different pests and pathogens. For instance, editing of the effector molecule Avr4/6 of P. sojae inhibited its identification by corresponding R proteins in soybean (Rps4 and Rps6), resulting in immunity activation and resistance against damping-off disease [50]. Another study reported resistance to SMV and enhanced isoflavone content by simultaneously knocking out the expression of three genes (viz., GmF3H1, GmF3H2, and GmFNSII-1) in the soybean genome using CRISPR/Cas9 [51]. Recently, novel R gene paralogs in soybean were developed through targeted cleavage of tandem duplicated regions of NBS-LRR genes and subsequent repair of DNA [52]. Since NBS-LRRs are common targets of pathogenicity, targeting these genes with double-strand breaks induced by genome editing followed by natural DNA repair could trigger underlying immunity. Two genes, Rpp1L and Rps1, which interact with Phakopsora pachyrhizi [53] and P. sojae, were targeted to demonstrate the efficacy of the technology. Genes for biotic stress resistance have been preserved in wild crop species, landraces, and diverse germplasms; however, substantial variation has been eroded in the domestication process. Thus, genome editing can offer a diverse set of biotic stress resistance genes for mitigating the challenges posed by various pests and pathogens [54].
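As a small illustration of the first practical step in a CRISPR/Cas9 experiment, the sketch below scans a DNA sequence for candidate target sites, assuming the commonly used Streptococcus pyogenes Cas9, which recognizes a 20-nucleotide protospacer followed by an NGG PAM. The function name and the toy sequence are hypothetical; the sketch covers the forward strand only and does not score off-target risk, which real guide design tools must do.

```python
import re


def find_cas9_targets(sequence, guide_len=20):
    """List candidate SpCas9 sites (protospacer + NGG PAM) on the forward strand.

    Returns a list of (start_index, protospacer, pam) tuples, where
    start_index is the 0-based position of the protospacer.
    """
    seq = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy sequence for illustration only (not a real soybean gene fragment).
toy = "ATGCTAGCTAGGATCCGATCGATCGTAGCTAGCTAGGCTAGCTAGCGGATCGATTGG"
for start, protospacer, pam in find_cas9_targets(toy):
    print(start, protospacer, pam)
```

To cover both strands, the same function could be applied to the reverse complement of the sequence, with positions converted back to forward-strand coordinates.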

Role of genomic selection in improving resistance against biotic stresses in soybean
Genomic selection (GS) can be used to address the limitations associated with linkage-based mapping, GWAS, and marker-assisted breeding for biotic stress resistance [55-57]. GS has been used in soybean for various traits
identified numerous potential candidate genes, including 33 for disease resistance traits [44].

QTL-seq, MutMap, M2-seq, and ChIP-seq: progress and prospects
Michelmore et al. [45] proposed a powerful concept, bulk segregant analysis (BSA), for mapping traits of interest showing Mendelian inheritance. This concept has been modified for its applicability in re-sequencing-based techniques such as MutMap and QTL-seq, which are cost-effective extensions of BSA that align short reads of bulks to the reference genome instead of re-sequencing all the progenies of the biparental population. Several traits in soybean have been mapped using MutMap and QTL-seq approaches [46]; for instance, QTL-seq of soybean identified genomic regions on chromosomes 5, 8, and 14 conferring resistance to charcoal rot [47]. Thus, MutMap and QTL-seq are potential tools for future biotic stress resistance studies in soybean. However, these techniques have certain limitations, such as the requirement for backcrossing and selfing generations, rendering them time-consuming. Later, an extension of these techniques named 'direct whole genome resequencing' was proposed to overcome the limitations of the above tools [48]. The potential of this novel technique was also demonstrated by mapping genomic regions associated with SCN resistance using two mutants of the soybean genotype 'PI 437654' [48]. Comparing the short reads of six re-sequenced genomes of the mutants to the reference genome 'Williams 82' narrowed the genomic region to three candidate genes encoding an ankyrin repeat family protein, an LRR receptor-like protein kinase family protein, and a signal peptidase carrying ten mutations for the resistance gene. Zhou et al. [49] proposed another re-sequencing strategy for soybean known as M2-seq. The authors successfully mapped the genomic regions containing several causal mutations for mutant phenotypes among ten independent M2 populations and identified Glyma.08G193200 as a candidate gene controlling trichome density involved in broad-spectrum insect resistance.
The technique is less laborious than MutMap and has great potential for exploiting disease and insect resistance in soybean.
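The bulk-based mapping idea behind QTL-seq can be sketched in a few lines. In the usual formulation, a SNP-index (the fraction of reads carrying the non-reference allele) is computed for each bulk, and the difference between bulks, Δ(SNP-index), is averaged in sliding windows; windows with values far from zero point to candidate regions. The sketch below assumes read counts have already been extracted per position; the data layout, function names, depth threshold, and numbers are hypothetical, and the statistical confidence intervals used in practice are omitted.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads supporting the non-reference allele at one position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def delta_snp_index(resistant_bulk, susceptible_bulk, min_depth=8):
    """Per-position SNP-index difference (resistant minus susceptible).

    Each bulk maps position -> (alt_reads, total_reads). Positions missing
    from either bulk, or covered by fewer than min_depth reads, are skipped.
    """
    deltas = {}
    for pos in sorted(set(resistant_bulk) & set(susceptible_bulk)):
        r_alt, r_total = resistant_bulk[pos]
        s_alt, s_total = susceptible_bulk[pos]
        if r_total < min_depth or s_total < min_depth:
            continue
        deltas[pos] = snp_index(r_alt, r_total) - snp_index(s_alt, s_total)
    return deltas


def sliding_mean(deltas, window, step):
    """Average delta values in windows [start, start + window) along a chromosome."""
    positions = sorted(deltas)
    if not positions:
        return []
    out = []
    start, last = positions[0], positions[-1]
    while start <= last:
        values = [deltas[p] for p in positions if start <= p < start + window]
        if values:
            out.append((start, sum(values) / len(values)))
        start += step
    return out


# Illustrative read counts only: position -> (alt_reads, total_reads).
resistant = {1000: (18, 20), 2500: (17, 20), 4000: (11, 20)}
susceptible = {1000: (3, 20), 2500: (4, 20), 4000: (10, 20)}
deltas = delta_snp_index(resistant, susceptible)
for start, mean_delta in sliding_mean(deltas, window=2000, step=1000):
    print(start, round(mean_delta, 2))
```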
Biotic stresses trigger a plethora of genes, transcription factors (TFs), and regulatory bodies in the plant genome, inducing a profound network of stress resistance pathways. Unleashing genome-level changes in genes involved in defense mechanisms requires integrating knowledge of regulatory and non-regulatory genomic regions, particularly epigenetic modifications. The specific sites of TF binding and their related target genes, including those involved in biotic stresses, can be identified using chromatin
genotypes using 4,089 SNPs. An average GS accuracy ranging from 0.31 to 0.46 was reported when all SNPs were considered for GS, whereas it ranged from 0.55 to 0.76 when GWAS-derived SNP markers were used as fixed effects in the GS models. In another study, when marker effects were evaluated using SNPs obtained from GWAS with the GS model, the maximum accuracy was achieved (~2-fold) at each level of cross-validation [63]. Collectively, we have summarized the current breakthroughs in soybean genomics that are being utilized to improve soybean resistance to different biotic stresses, as represented in Fig. 2.

Challenges and prospects of soybean genomics for biotic stress resistance
With substantial advances in NGS technologies, sequencing and genotyping have become more available and affordable, enhancing the application of genomics approaches for soybean improvement. However, the true challenge is discovering genome features that define biology; an assembled reference genome sequence is merely a foundation [16].
contributing to resistance to different biotic stresses [58,59]. For instance, using GS for SCN resistance in soybean was more effective than using MAS, with a prediction accuracy of 0.59-0.67% [59]. A GS study with 697 accessions in the training population and 18,955 accessions in the breeding population was used to predict susceptibility to tobacco ring spot virus; a significant correlation (r² = 0.67) occurred between the predicted and actual severity of the disease [60]. In another study, GS was performed for quantitative disease resistance against P. sojae in two diverse panels of more than 450 plant introductions; the study estimated a relative efficiency per breeding cycle of 0.57-0.83, which may significantly improve the genetic gain for P. sojae disease resistance in soybean breeding programs [58].
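The workflow behind such GS studies (train a model on genotyped and phenotyped accessions, predict breeding values for genotyped-only accessions, and report accuracy as the correlation between predicted and observed values) can be sketched as follows. Ridge regression is used here as a simple stand-in that is closely related to RR-BLUP; it is an assumption for illustration, not necessarily the model used in the cited studies. All function names, the penalty value, and the toy data are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_marker_effects(genotypes, phenotypes, lam=1.0):
    """Estimate marker effects by solving (X'X + lam*I) beta = X'y on centered data.

    genotypes  : list of rows, one per accession, coded 0/1/2 per marker
    phenotypes : list of trait values (e.g. disease scores)
    """
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    x = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y_mean = sum(phenotypes) / n
    y = [v - y_mean for v in phenotypes]
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve_linear(xtx, xty), col_means, y_mean


def predict_gebv(genotypes, beta, col_means, y_mean):
    """Predicted values for new accessions from their marker genotypes."""
    return [y_mean + sum((g - m) * b for g, m, b in zip(row, col_means, beta))
            for row in genotypes]


def pearson_r(a, b):
    """Prediction accuracy as the Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Toy data for illustration only: 6 training and 3 validation accessions, 3 markers.
train_x = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2], [1, 0, 0], [2, 2, 1]]
train_y = [3.1, 2.0, 1.2, 3.9, 1.5, 2.8]
valid_x = [[0, 2, 1], [2, 1, 0], [1, 1, 2]]
valid_y = [3.4, 1.6, 2.9]
beta, means, intercept = ridge_marker_effects(train_x, train_y, lam=1.0)
predicted = predict_gebv(valid_x, beta, means, intercept)
print("prediction accuracy (r):", round(pearson_r(predicted, valid_y), 2))
```

In practice the penalty would be chosen by cross-validation or estimated from variance components, and accuracy would be averaged over repeated cross-validation folds rather than taken from a single split.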
Studies have also shown that incorporating significant SNPs identified through GWAS as a fixed effect in GS models can increase the prediction accuracy for complex traits [61,62]. For instance, Ravelombola et al. [63]
subsequent application in disease and pest resistance breeding in soybean. Haplotype-based breeding aims to transfer superior haplotypes underlying genetic variants in strong LD with putative genomic regions linked to traits of interest. This haplotype-based breeding approach may also be quite helpful in soybean hybrid breeding, particularly in the selection of parents based on the availability of diverse and superior haplotypes. The selected parental lines with a diverse set of haplotypes could be better suited for developing the next generation of superior haplotypes. If the parental lines do not contain the superior haplotypic combinations, new parental lines with the intended haplotypes can also be produced via haplotype-based breeding. In recent years, haplo-pheno analysis has been used to detect superior haplotypes in some crop species [66]. However, haplo-pheno analyses have not been reported for soybean traits providing resistance to different biotic stresses. Efforts are needed to identify superior haplotypes for biotic stress resistance in soybean; the recently constructed haplotype map for soybean using WGS data may prove useful in this regard [64].
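A haplo-pheno analysis of the kind mentioned above can be sketched very simply: accessions are grouped by the haplotype they carry at a locus of interest, and the phenotype is summarized per group to nominate a superior haplotype. The sketch below is a minimal illustration; the function names, the minimum group size, and the toy disease scores are hypothetical, and a real analysis would add a statistical test for differences between haplotype groups.

```python
from collections import defaultdict
from statistics import mean


def haplo_pheno(haplotypes, phenotypes, min_count=2):
    """Summarize phenotype per haplotype.

    haplotypes : accession -> tuple of alleles at the SNPs defining the locus
    phenotypes : accession -> trait value (e.g. disease severity score)
    Returns haplotype -> (number of accessions, mean phenotype); haplotypes
    carried by fewer than min_count phenotyped accessions are dropped.
    """
    groups = defaultdict(list)
    for accession, hap in haplotypes.items():
        if accession in phenotypes:
            groups[tuple(hap)].append(phenotypes[accession])
    return {hap: (len(vals), mean(vals))
            for hap, vals in groups.items() if len(vals) >= min_count}


def superior_haplotype(summary, lower_is_better=True):
    """Pick the haplotype with the best mean phenotype, or None if empty."""
    if not summary:
        return None
    pick = min if lower_is_better else max
    return pick(summary, key=lambda hap: summary[hap][1])


# Toy data for illustration only: disease severity on a 1-5 scale (lower is better).
haps = {"acc1": ("A", "T"), "acc2": ("A", "T"), "acc3": ("G", "C"),
        "acc4": ("G", "C"), "acc5": ("A", "C")}
scores = {"acc1": 1.0, "acc2": 2.0, "acc3": 4.0, "acc4": 5.0, "acc5": 0.5}
summary = haplo_pheno(haps, scores)
print(summary)
print("candidate superior haplotype:", superior_haplotype(summary))
```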

Soybean transcriptomics
Transcriptome analysis of protein-coding, non-coding, and regulatory genes is essential for comprehending the complex molecular mechanisms and gene regulatory networks governing resistance against different biotic stresses. Protein-coding genes include functional genes that are involved in downstream signaling in defense-related pathways. In contrast, non-coding genes such as microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), together with regulatory genes (TFs), regulate the expression of functional genes. Several methods, including northern blotting, microarrays, and RNA-sequencing (RNA-seq) analysis, have been used to conduct transcriptomic studies in soybean. Consequently, microarray profiling and deep sequencing are being used to identify many biotic stress-responsive miRNAs, which could be exploited to engineer biotic stress resistance in soybean.

Northern blot analysis for biotic stress resistance in soybean
Northern blot analysis is a method for analyzing the size and steady-state level of specific RNA in a complex sample. It helped in understanding the relationship between constitutive expression levels of defense-related genes and varying degrees of partial resistance to P. sojae in soybean cultivars [67]. The accumulated levels of PR1a, MMP, IPER, and beta-1,3-endoglucanase transcripts 48-72 h post inoculation (hpi) were associated with high levels of partial
Even though every cell has the same DNA sequence, epigenetic changes and gene expression differ substantially depending on the environment, developmental stage, and tissue type. Future efforts are needed to fully understand these biologically active states of DNA in the soybean genome. Such investigations may involve several forms, including integrating maps of DNA methylation, small RNA, histone modification, and transcript abundance measured across multiple tissues and environments. Furthermore, pangenomics is gaining popularity among scientists for tapping into species diversity, owing to genome sequencing and resequencing data availability. The newly constructed graph-based soybean pangenome revitalizes earlier omics data while revolutionizing functional and evolutionary genomic studies in soybean [21]. We suggest a more comprehensive approach in which we aspire for a super-pangenome, with the developing NGS technologies and decreasing costs.
Genomics-assisted breeding (GAB) offers the potential to solve future soybean improvement challenges but can only be realized by precisely detecting marker trait relationships (through interval mapping and GWAS) and estimating GEBVs (via GS). Both interval mapping and GWAS require precise genotyping and phenotyping data for gene identification. Breeders have reported considerable gaps between identifying QTLs/genes and their subsequent application in soybean improvement. The use of manual phenotyping and low-throughput marker systems, which have generated a genotype-phenotype (GP) gap, is the fundamental reason for the minimal use of MAS in soybean improvement. Progress in high-throughput phenotyping and genotyping can reduce the GP gap for crop selection and gene identification in soybean.
Continuous publishing of raw phenotypic results from screenings will improve the power of identifying relevant genes through meta-GWAS for both mean and plastic responses, reducing the expense and time load on individual programs while helping future breeders and researchers. Furthermore, as QTLs conferring resistance to different diseases and pests are mapped to the soybean genome, their spatial relationships (in the chromosomal context) can be assessed using meta-analysis of available QTLs. Co-localization of QTLs for different diseases and pests offers evidence for multiple disease resistance loci, as observed in different crops [36].
The advent of long-read sequencing technologies is hastening the identification of haplotypes that aid genome assembly [64], which can be used for various applications; for instance, haplotype-based GWAS analysis. The superiority of haplotype-assisted GS over SNP-based genomic prediction has also been reported for charcoal rot resistance in soybean [65]. Developments in high-throughput phenotyping will also help identify superior haplotypes for
economically important diseases and pests have been studied using RNA-seq (Table 3).
Successful use of RNA-seq data for detecting and identifying soybean viruses was demonstrated by Jo et al. [72] for five viruses, including SMV, for marker-based diagnosis of crop pathogens. Similarly, Díaz-Cruz et al. [73] harnessed the potential of the RNA-seq approach for identifying major and residual pathogens, including bacteria, fungi, and viruses. For SCN resistance, DEGs associated with hormone signal transduction pathways, MAPK signaling, and WRKY-controlled transcriptional regulation were found to be important [74]. The identified disease resistance genes could be used as molecular markers for breeding SCN-resistant soybean. RNA-seq also helped to characterize three-way interactions between soybean, soybean aphids (SBA), and SCN [75], which could be used to mine disease resistance genes involved in complex regulatory pathways. Moreover, the intricate process behind the non-host resistance (NHR) response to SCN was also understood using this approach [76]. This laid the foundation for NHR research that could lead to the development of disease-resistant soybean cultivars.
Pseudomonas syringae resistance, thus establishing the role of these proteins in the defense response. This may assist in identifying the underlying genes/QTLs encoding proteins participating in defense pathways, which is the primary step in resistance breeding programs.

Microarray analysis for biotic stress resistance in soybean
While northern blotting is an efficient technology for studying the expression of a few genes, microarray analysis simultaneously involves high-throughput detection of a large number of genes encoding proteins and small RNAs (sRNAs). Microarray-based identification of genes followed by validation using real-time polymerase chain reaction (RT-PCR) has helped identify target genes. Khan et al. [68] used a microarray carrying 1,300 cDNA inserts to detect differentially expressed genes (DEGs) in soybean root cells two days post-invasion (dpi) of SCN. Genes encoding for repetitive proline-rich glycoprotein, endoglucanase, peroxidase, defense, and signaling proteins were highly expressed. Another study by the same group identified several DEGs, such as WRKY6 TF, trehalose phosphate synthase, EIF4a, Skp1, and CLB1 [69]. Affymetrix Soybean GeneChip containing 37,500 soybean probe sets was used by Ibrahim et al. [70] to investigate the expression of soybean genes in root galls formed by the root knot nematode 12 days and ten weeks after infection. Differential gene expression patterns and Kyoto Encyclopedia of Genes and Genomes pathway analysis identified enzymes involved in changing carbohydrate and cell wall metabolism, cell cycle control, and plant defenses. Likewise, Studham et al. [71] studied the transcriptional response toward aphid resistance. The study revealed changes in a single transcript in the resistant cultivar compared to the induction of numerous defense-related genes in a susceptible cultivar that are otherwise constitutively expressed in resistant plants. Associations between the salicylic acid (SA) and jasmonate pathways and successful defense responses in resistant cultivars were established. However, aphids suppressed these pathways in susceptible cultivars. These findings are significant in revealing the role of disease resistance genes in insect resistance.
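The core calculation behind calling a gene differentially expressed can be illustrated with a log2 fold change between infected and control samples. The sketch below assumes expression values have already been normalized (for example, background-corrected microarray intensities); the gene names, values, pseudocount, and the threshold of 1.0 (a two-fold change) are hypothetical choices for illustration. Published analyses also apply statistical tests and multiple-testing correction, which are omitted here.

```python
import math
from statistics import mean


def log2_fold_changes(control, infected, pseudocount=1.0):
    """Per-gene log2(infected / control) from replicate expression values.

    control, infected : gene -> list of normalized expression values
    A pseudocount avoids division by zero for genes with no signal.
    """
    result = {}
    for gene in sorted(set(control) & set(infected)):
        c = mean(control[gene]) + pseudocount
        t = mean(infected[gene]) + pseudocount
        result[gene] = math.log2(t / c)
    return result


def select_degs(fold_changes, threshold=1.0):
    """Label genes whose absolute log2 fold change meets the threshold."""
    degs = {}
    for gene, lfc in fold_changes.items():
        if lfc >= threshold:
            degs[gene] = "up"
        elif lfc <= -threshold:
            degs[gene] = "down"
    return degs


# Toy values for illustration only (two replicates per condition).
control = {"geneA": [120.0, 110.0], "geneB": [300.0, 280.0], "geneC": [50.0, 55.0]}
infected = {"geneA": [480.0, 500.0], "geneB": [70.0, 65.0], "geneC": [52.0, 49.0]}
lfc = log2_fold_changes(control, infected)
for gene, value in lfc.items():
    print(gene, round(value, 2))
print(select_degs(lfc))
```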

RNA-seq analysis for biotic stress resistance in soybean
While several microarray studies have been conducted to understand disease resistance mechanisms in soybean, RNA-seq has become the technology of choice with the advent of NGS platforms. Transcriptomics via NGS allows us to study the expression of novel genes and the regulation of gene expression at the post-transcriptional level.

Single-cell methods have the potential to revolutionize the study of soybean development and tissue-specific responses to biotic stresses. A comprehensive resource such as a "soybean cell atlas" may prove a valuable and distinctive reference data set for soybean improvement. Combining this understanding at the single-cell level with modern methods for editing gene expression, such as CRISPR/Cas technologies, can also enable targeted gene expression modifications that will enhance important traits and result in soybean plants that are resistant to biotic stresses.

Involvement of microRNAs in biotic stress resistance in soybean
MicroRNAs (miRNAs) are short non-coding RNAs of 18-24 nucleotides that regulate gene expression at the post-transcriptional level by degrading target mRNAs or inhibiting their translation. In soybean, several studies have reported the integral roles of miRNAs in providing resistance to different biotic stresses (Supplementary Table 2). For instance, microarray profiling of soybean infected with P. sojae revealed 20 differentially expressed miRNAs (targeting LRR proteins, protein kinases, and TFs). In the same year, microarray profiling of soybean plants infected with P. sojae revealed that kinases play major roles in biotic and abiotic stress resistance [84]. Other examples include ASR [85], bean pyralid larvae [86], SCN [33,87], and viruses [88].

Involvement of lncRNAs in biotic stress resistance in soybean
Long non-coding RNAs (lncRNAs) are defined as non-coding transcripts that are at least 200 nucleotides long. They act through multiple mechanisms, including epigenetic, transcriptional, and post-transcriptional regulation of gene expression, in response to various stresses. For instance, they regulate miRNA activity by binding and sequestering target miRNAs and participate in the regulation of mRNA expression. To understand the role of lncRNAs in providing defense against nematode infection in soybean, Khoei et al. [89] identified 384 lncRNAs responsive to SCN and 283 lncRNAs responsive to reniform nematode (Rotylenchulus reniformis) infection. A few of these lncRNAs interacted with DEGs, whereas a few others were involved in miRNA-mediated regulation of gene expression in soybean in response to SCN and reniform nematode infection.
is a non-host-specific pathogen causing chlorosis by producing the coronatine (COR) toxin. Analysis of COR-treated and control soybean plants revealed DEGs belonging to JA production, ethylene synthesis, phenylpropane metabolism, photosynthesis, and SA signaling pathways [77]. Kim et al. [78] revealed the genes involved in the plant defense mechanism against bacterial leaf pustule (BLP) caused by Xanthomonas axonopodis pv. glycines using two near-isogenic lines (NILs). The BLP-resistant NIL showed high expression of genes associated with pathogen-associated molecular patterns (PAMPs) and damage-associated molecular patterns (DAMPs). Additionally, the resistant and susceptible NILs differentially regulated components of the JA signaling pathway, particularly MYC2 and JASMONATE ZIM-motif. GmMYB TFs were demonstrated to confer resistance by regulating lignin biosynthesis and phenylpropanoid pathways, mediated by Rpp2 and Rpp5 and upregulated in resistant genotypes [79]. Differential gene expression analysis under Aphis glycines infestation identified DEGs related to stress and detoxification, including cytochrome P450s, glutathione-S-transferases, carboxylesterases, and ABC transporters [80]. Comparative transcriptome profiling of the defense response to bean pyralid larvae (Lamprosema indicata) revealed that most DEGs were involved in the synthesis of substances that resist insect attack [81]. Overall, these studies shed light on the molecular mechanisms driving resistance, which could be useful for biotic stress resistance breeding.

Soybean expression atlas
The soybean expression atlas was developed by Severin et al. [82] using expression data from 14 diverse tissue sets, revealing tissue-specific gene expression of highly expressed genes and genes specific to legume seed development and nodal tissue. Further, Machado et al. [83] established an extensive soybean expression atlas (http://venanciogroup.uenf.br/resources/) comparing 1,298 publicly available soybean transcriptome datasets with the reference genome. These tools will help to identify tissue-specific genes and their expression patterns across different tissues. The promising candidate genes could be introgressed to develop soybean cultivars resistant to biotic stress.
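Tissue specificity of the kind an expression atlas exposes can be summarized numerically. One widely used summary is the tau index, which ranges from 0 (uniform expression) to 1 (expression in a single tissue). The sketch below is an illustration under stated assumptions: the atlas papers cited here are not claimed to use tau, and the tissue names and expression values are invented.

```python
def tau_index(expression):
    """Tissue-specificity index tau for one gene.

    expression: non-negative expression values, one per tissue.
    tau = sum(1 - x_i / max(x)) / (n - 1)
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values from at least two tissues")
    peak = max(expression)
    if peak == 0:
        return 0.0  # gene not expressed anywhere; treated here as non-specific
    return sum(1 - value / peak for value in expression) / (n - 1)


# Hypothetical expression values across three tissues (root, leaf, seed).
root_specific = tau_index([10.0, 0.0, 0.0])
uniform = tau_index([5.0, 5.0, 5.0])
graded = tau_index([10.0, 5.0, 0.0])
print(root_specific, uniform, graded)
```

Genes with tau close to 1 would be natural candidates when searching an atlas for tissue-specific expression; any cutoff would need to be chosen per dataset.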

Prospects of single cell sequencing in breeding for biotic stress resistance
The term "single-cell sequencing technologies" refers to sequencing the genome or transcriptome of a single cell to obtain genomic, transcriptomic, or other multi-omics data that reveal differences between cell populations and cellular evolutionary relationships. By exposing hitherto unknown players and gene regulatory processes,

of protein surpasses coding sequences as there is no one-to-one relationship between protein numbers and coding sequences, making protein identification more complex than gene identification. Biotic stresses have multifarious effects on proteome content, altering the quantity, cellular localization, PTMs, protein-protein interactions, and biological function of proteins [93]. Proteomics is crucial for providing comprehensive information on protein categories in the cell/tissue/plant under stress conditions, as proteins regulate the plant epigenome, transcriptome, and metabolome. They are also directly involved in signaling and triggering immune responses to external pathogens and pest attacks. Soybean proteomics mainly focuses on comparative analyses of protein abundance between tolerant and susceptible cultivars and species, and on cultivar-specific metabolic processes and pathways that help reveal differentially expressed proteins (DEPs).

Approaches in proteomics
Proteomics includes aspects such as sequence, structure, function, and expression of proteins. Prior to proteomics, chromatography-based techniques such as ion-exchange chromatography, size exclusion chromatography, and affinity chromatography were used routinely for protein purification. More recently, high-performance liquid chromatography has been used to determine amino acid sequences and characterize single proteins. Information on three-dimensional protein structure is important for predicting putative function using computer modeling approaches and visualization techniques such as nuclear magnetic resonance, electron microscopy, crystallography, and X-ray diffraction. Yeast hybrid assays and microarray analysis can assess protein function. Western blotting and enzyme-linked immunosorbent assays are commonly used to detect target proteins; however, various modern methods are available to identify and quantify thousands of DEPs simultaneously (Fig. 3), as discussed in the following sections.

Gel-based approaches for protein profiling
The most commonly used gel-based techniques for protein purification/isolation are one-dimensional gel electrophoresis (1D-GE), two-dimensional gel electrophoresis (2D-GE), and two-dimensional polyacrylamide gel electrophoresis (2D-PAGE). Several proteomics studies have reported the presence of disease/defense-associated proteins under different diseases (Table 4). The first proteomic study documenting biotic stress resistance in soybean used 2D-GE to unravel the role of allele rhg1 in destroying giant cells formed in roots by SCN. Interestingly, the Kunitz-type trypsin inhibitor protein, a major anti-nutritional factor in

Challenges and prospects of transcriptomics for biotic stress resistance
Transcriptome studies have helped to unravel the complex mechanisms involved in disease resistance. The number of publications on soybean transcriptomics under biotic stress has doubled in the past decade [46]. Key genes involved in disease resistance mechanisms belong to hormonal signaling pathways (JA, SA), TFs such as MYBs, WRKY, and typical NLRs specifically carrying NBS-LRRs. Promising genes could be exploited to develop functional markers for soybean breeding programs.
Hundreds of miRNAs have been identified in soybean using high-throughput or deep sequencing technologies. Many studies have reported differential expression of miRNAs and their targeted genes under various biotic stresses. Therefore, miRNAs are potential candidates for improving our understanding of biotic stress responses in soybean at the molecular level. Identified miRNAs could be manipulated through overexpression/repression of stress-responsive miRNAs and/or their target mRNAs, miRNA-resistant target genes, target mimics, and artificial miRNAs (amiRNAs) to engineer plants for enhanced biotic stress tolerance. For example, an amiRNA was designed in tobacco to confer resistance against the cucumber mosaic virus by inhibiting the expression of the viral 2b gene [90]. Most of the target genes of identified miRNAs have been predicted via computational analysis, and only a few have been validated experimentally [91,92]. Future experiments could focus on validating important miRNAs. Recently, miRNA-SSRs have been used to study genetic diversity among wheat genotypes and to discriminate genotypes contrasting for nitrogen and phosphorus use efficiency. Such miRNA-SSRs could also be designed in soybean for MAS to develop biotic stress-resistant soybean varieties. An increasing number of studies suggest that mRNAs, miRNAs, lncRNAs, and TFs play important roles in biological processes associated with defense mechanisms against various biotic stresses. However, information on the interplay among them is lacking; such information would help determine the precise regulatory mechanisms of miRNAs, lncRNAs, TFs, and mRNAs in biotic stress resistance in soybean.

Soybean proteomics
An organism's genome, inherited from its parents, is static in nature and remains the same in all cells throughout its life cycle. However, proteins are dynamic in nature as they are subject to alternative splicing and post-transcriptional and post-translational modifications (PTMs). Thus, the number

Gel-free: Label-free approaches for protein profiling
Substantial advances in proteomics have occurred in the past two decades through numerous 'gel-free' methods for separating proteins. There has been a shift from 'gel-based' methods to 'gel-free' approaches due to their increased sensitivity, reproducibility, detection of low-abundance proteins, precision, multiplexing, and identification of PTMs [97]. Mass spectrometry (MS) is the most common gel-free method, which can be label-based or label-free. Label-free approaches quantify the relative MS ion signals and compare them with reference databases [98]. Label-free approaches can generate a large catalogue of proteins in complex cell extracts and detect low-abundance proteins using subcellular fractionation. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a basic label-free approach for high-throughput proteomics.
Further improvements in this technology for identifying proteins in complex mixtures involve fractionation methods, known as 'shotgun proteomics' or a 'bottom-up' strategy. Other methods, including ultra-performance liquid chromatography-mass spectrometry, selected reaction monitoring, and multiple reaction monitoring, quantify proteins using triple-quadrupole MS. The molecular mass of proteins can also be assessed using electrospray ionization and collision-induced dissociation [99,100]. However, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and liquid chromatography/tandem mass spectrometry (LC-MS/MS) are the commonly used approaches for high-throughput detection [101] and have improved the pipeline for de novo assembly [98]. Other advances in MS include Fourier transform and Orbitrap-based MS.
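Label-free quantification, as described above, compares relative MS ion signals between samples. The following is a minimal sketch of that idea, assuming each protein already has a summed ion intensity per sample and that dividing by the total ion signal is an acceptable normalization. Both are simplifying assumptions, and the protein names and intensities are invented. Real pipelines add peptide-level aggregation, missing-value handling, and statistical testing.

```python
def normalize_by_total(intensities):
    """Scale each protein's ion intensity by the sample's total ion signal."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("sample has no ion signal")
    return {protein: value / total for protein, value in intensities.items()}


def abundance_ratios(numerator_sample, denominator_sample):
    """Relative abundance (numerator / denominator) for proteins seen in both samples."""
    num = normalize_by_total(numerator_sample)
    den = normalize_by_total(denominator_sample)
    return {
        protein: num[protein] / den[protein]
        for protein in num
        if protein in den and den[protein] > 0
    }


# Hypothetical summed ion intensities for three proteins in two cultivars.
resistant = {"protA": 200.0, "protB": 600.0, "protC": 200.0}
susceptible = {"protA": 100.0, "protB": 800.0, "protC": 100.0}

ratios = abundance_ratios(resistant, susceptible)
print(ratios)
```

Proteins with ratios far from 1 would be candidates for differentially expressed proteins, pending replicate-based statistics.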

Gel-free: Label-based approaches for protein profiling
Label-based approaches require chemical/metabolic labeling of proteins or peptides [95,98]. Major label-based approaches that allow quantification and comparative/differential proteomics are as follows: stable isotope labeling by amino acids in cell culture (SILAC), isobaric tags for relative and absolute quantification (iTRAQ), isotope-coded affinity tag (iCAT), dimethyl and 18O labeling, tandem mass tag, and 15N labeling. The SILAC method is based on in vivo metabolic labeling of cell cultures. However, cell cultures do not fully mimic the behavior of fully grown plants. Also, the culture must be auxotrophic for the amino acid used for labeling, but plants are autotrophic, which decreases labeling efficiency [95]. Nevertheless, SILAC is an extremely powerful dynamic technique that can quantify constant modifications in the proteome.

soybean, was differentially expressed in the non-infested genotype [94], indicating its importance in SCN resistance.
However, the gel-based approach lacks sensitivity for detecting low-abundance proteins, such as those involved in signal transduction or regulatory mechanisms, especially in complex mixtures [95]. Nevertheless, 2D-GE remains popular as thousands of proteins can be detected with relatively easy sample preparation. Coomassie blue can be replaced by fluorescent dyes to increase sensitivity [96], such as in difference gel electrophoresis and Refraction-2D™ by NH DyeAGNOSTICS. While advances in 2D-GE have improved resolution and reproducibility, it lacks automation for high-throughput protein analysis.

the interaction between the host and rust-causing pathogen (P. pachyrhizi) using 2D-GE and LC-MS/MS and identified two key proteins (viz., pathogenesis-related protein 10 and chalcone isomerase 1). Several soybean proteomic studies have used 2D-GE and MALDI-TOF for different biotic stresses, including SMV [109], root and stem rot [110], soybean rust [111], cotton worm (Prodenia litura) [112], and MYMIV and MYMV [113]. Non-host resistance (NHR) is an extremely powerful line of defense in plants. Dong et al. [114] investigated the NHR response in soybean to Bipolaris maydis (a fungal pathogen commonly affecting maize), which evoked major cell proteins while interacting with fungal spores. The study revealed myriad complex interactions between plant and pathogen.

Soybean databases for proteomics analyses
Different repositories have been developed containing proteome information from different sources and datasets to facilitate soybean proteomics (Table 1). The first 2D map of the soybean proteome became available as a database in 2005 [115]. Later, Sakata et al. [116] created a soybean proteome database using 21 reference maps. The same group published a newer database with 23 reference maps, analyzing proteins using 2D-GE and nano-LC-MS [117]. The Soybean Knowledge Base is a comprehensive web resource developed to bridge soybean translational genomics and molecular breeding research. However, the presence of numerous proteins with unknown functions highlights the limitations of bioinformatic prediction tools; further functional analyses are required to understand the protein makeup of soybean. SoyProDB is another freely accessible database comprising seed proteomics data of the 'PI 423954' soybean

The iCAT and iTRAQ methods label proteins after protein extraction/proteolytic digestion and consequently reflect a 'static' assessment rather than the 'dynamic' view in SILAC. Zeng et al. [102] performed proteomics analyses using the iTRAQ approach to understand resistance against soybean webworm (Lamprosema indicata); the resistant plants produced proteins that inhibited the growth and development of the pest. Recently, Bai et al. [103] used this approach to determine the effectiveness of soybean inoculated with Funneliformis mosseae (arbuscular mycorrhizal fungus; AMF) against root rot disease. In another study, this method helped identify a biological control agent, Sinorhizobium fredii strain Sneb183, as a promising candidate against SCN [104].
A comparative post-iTRAQ approach could be used to identify PTMs, such as proteolysis, glycosylation, nitrosylation, phosphorylation, and ubiquitination, as well as interactions among proteins. The phospho-proteomic approach detects phosphorylated amino acid residues quantitatively and qualitatively [105]. It aided in the identification of resistant and susceptible wheat and grapevine varieties against Septoria tritici and phytoplasma, respectively [106,107]. This approach could be extended to soybean for identifying PTMs under various biotic stresses. Most of the techniques mentioned above are used in combination to better understand the protein composition of crops under different biotic stress conditions.

Combining gel-based and gel-free approaches for soybean proteomic studies
Gel-based and gel-free approaches are often combined for proteomic analysis. For instance, researchers investigated

accumulation after integrating post-transcriptional and post-translational regulations. Thus, metabolite abundance and the associated relative change are considered the ultimate response of biological systems to genetic or environmental changes [122,123]. Therefore, metabolomics approaches that capture small molecules present in a biological sample are a promising avenue for studying real-time stress-induced perturbations in plants. Consequently, metabolomic studies have been used widely to study the metabolic reprogramming of plants in response to various stresses [124-126], including in soybean [127,128].

Types of metabolomics approaches
Metabolomics is categorized broadly into two main categories: targeted and untargeted metabolomics [123]. As the name suggests, the targeted approach is used when metabolites of interest are known [129]. For instance, Murakami et al. [130], using 13C isotope labeling, observed an increase in the abundance of isoflavone aglycones and glucosides in response to infestation by, and oral secretions of, tobacco cutworm (S. litura). Such analyses are sometimes referred to as quantitative analyses since targeted analysis generally involves quantifying target metabolites using reference standards and calibration curves, providing high confidence in metabolite identity. However, considering that a wide array of plant metabolites, especially PSMs, are unknown and difficult to purify, targeted approaches can be limited. Some of these limitations can be circumvented using untargeted approaches that profile all metabolites present in each biological sample at a given moment. The analyses are performed in conjunction with MS, with metabolites identified by matching the mass, fragmentation spectra, and retention time of compounds with libraries and databases [131]. Further, internal standards are used for quality control between samples and batches. For instance, Zhu et al. [132] compared the differential metabolomic response of resistant (R) and susceptible (S) soybean lines to P. sojae at 12 and 36 hpi. Using GC-MS, they identified 311 metabolites; isoflavonoid and daidzein accumulated in R and S plants, along with reduced metabolites involved in benzoate degradation [132]. Untargeted analyses offer the advantage of identifying novel metabolites; however, the data analysis is complex, with most of the metabolites unidentified [129].
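The calibration-curve step in targeted analysis can be sketched as follows. This is a minimal illustration, assuming a linear detector response over the calibrated range and using invented standard concentrations and peak areas; the units and function names are hypothetical. Laboratory workflows would also use internal standards and check linearity and limits of quantification.

```python
def fit_calibration(concentrations, peak_areas):
    """Ordinary least-squares line: peak_area = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(peak_areas):
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(peak_areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, peak_areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(peak_area, slope, intercept):
    """Invert the calibration line to estimate concentration from a peak area."""
    return (peak_area - intercept) / slope


# Hypothetical reference standards (concentration in arbitrary units) and peak areas.
standard_conc = [0.0, 1.0, 2.0, 4.0]
standard_area = [5.0, 105.0, 205.0, 405.0]

slope, intercept = fit_calibration(standard_conc, standard_area)
unknown_conc = quantify(255.0, slope, intercept)
print(slope, intercept, unknown_conc)
```

Estimates are only trustworthy for peak areas that fall inside the range spanned by the standards.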
Metabolomic analysis involves three main steps: (i) sample preparation, (ii) instrument analysis, and (iii) data analysis [123,133]. Sample preparation covers all steps from sample harvest up to instrument analysis, including metabolite extraction and/or derivatization, to ensure the sample is amenable to instrument analysis. The exact details vary depending on the matrix of study, metabolites of interest, and instruments used for analysis [131,134]. Instrument

cultivar [118]. Since SCN is a highly damaging pest of soybean, the proteins associated with SCN are available [118], which could be helpful for developing resistant varieties.

Challenges and prospects of soybean proteomics for biotic stress resistance
Even with the technological advances in proteomics, soybean protein profiling under biotic stresses lags behind that of model plants such as Arabidopsis or rice. Soybean is a recalcitrant species for proteomic analysis as it is a paleopolyploid with a complex genome and far more protein isoforms than the model plant species. Maintaining the secondary and tertiary structures of proteins is difficult while performing proteomic analysis. Therefore, a standard method for protein isolation, separation, visualization, and identification is required for accurate analysis. High levels of phenolic compounds, proteolytic and oxidative enzymes, organic acids, carbohydrates, and secondary metabolites also complicate protein extraction from soybean, affecting downstream processing for proteomics [119]. Moreover, proteomic studies require expensive and sophisticated instrumentation.
As discussed earlier, low-abundance proteins and PTMs, leading to the formation of transient proteins, are hard to detect. Further, data processing and evaluation pose a major challenge. A soybean protein reference map specifically developed for various biotic stresses is required. Therefore, it is vital to have an appropriate, publicly accessible database, algorithms, and open-source programs to create a pipeline for identifying and quantifying proteins. Interactions between proteins and other proteins and metabolites can be explored to understand the functional biology and intercellular cross-talk between plant cells under stress conditions. Identification of specific molecular signatures under biotic stress could assist in cloning the corresponding genes and developing biomarkers. Thus, these findings can be used to create soybean varieties with improved resistance.

Soybean metabolomics
Plants produce various physical and chemical defenses to protect themselves against biotic stresses [120]. Chemical defenses include the production of low-molecular-weight (< 1,500 Da) compounds, commonly known as plant secondary metabolites (PSMs), to counter biotic stress agents [121]. Based on the biosynthesis pathway, PSMs are classified as polyphenols (such as phenolic acids, flavonoids, stilbenes, and lignans), terpenoids, alkaloids, or tannins [122,123]. These defense-associated metabolites are the end products of cellular machinery produced as a result of enzyme activity that depends on gene expression and protein

Metabolomics in understanding biotic stress responses in soybean
Metabolites play direct and indirect roles in plant responses to different biotic stresses. The direct role includes producing metabolites (comprising volatile and non-volatile compounds) that act against attacking agents or attract their parasites, predators, or other natural enemies. Indirectly, the compounds produced by attacking pests, such as pathogen effectors, trigger induced systemic resistance in hosts. Moreover, the saliva or regurgitant of caterpillars, comprising fatty acid conjugates, β-glucose oxidase, and peroxidase, acts as an elicitor activating the octadecanoid pathway, leading to several biochemical changes and production of various plant metabolites [121,135]. Further, phytohormones such as SA, ethylene, JA, and brassinosteroids mediate the stress-induced signaling cascades underlying plant defense response [121,135], as represented in Fig. 4. As a result, studies have reported increased biosynthesis of secondary

analysis usually involves separating metabolites based on their chemical structure via chromatography, followed by chemical analysis using MS, NMR, ultraviolet (UV)-visible (Vis) absorbance, and capillary electrophoresis. The most commonly used platforms for metabolomics analysis are liquid chromatography (LC) or gas chromatography (GC) coupled with mass spectrometers (LC-MS, GC-MS) due to their wide adaptability, high sensitivity, resolution, and high throughput [133]. For instance, Silva et al. [133] confirmed the enhanced production of phenylpropanoids in soybean leaves inoculated with P. pachyrhizi using UHPLC/MS/MS. Following instrument analysis, the acquired data are processed and analyzed for treatment comparison [131]. Several open and proprietary databases serve as a significant resource for processing untargeted metabolomics data (Supplementary Table 3).

Similar to pathogens, the role of metabolites in mediating interactions between soybean and insect pests has been explored [141]. For instance, Sato et al. [142] conducted a time-course experiment to examine the impact of the foxglove aphid (Aulacorthum solani) on the metabolism of soybean leaves. The disruption in metabolism occurred as soon as 6 hpi, including the accumulation of phenylpropanoid precursors, shikimate, and trans-cinnamate, suggesting a flow of resources to the biosynthesis of secondary metabolites. The chemical composition of soybean also influences insect behavior. Silva et al. [143] reported two times higher oviposition of Euschistus heros on soybean line 'BRS 267', which contains higher sugars and lower isoflavones than other lines. Further, volatile organic compounds, also known as green leaf volatiles, play a significant role in plant-insect interactions by repelling insect pests or attracting their natural enemies [144]. However, the role of these compounds in mediating the soybean response, along with their diversity and composition, remains largely unknown.
Understanding of dynamic changes in the plant metabolome due to nematode infection in soybean is in its infancy, with only a few studies deciphering changes in metabolite composition due to nematode infestation. For instance, Afzal et al. [94] used global metabolomics and transcriptomics approaches to identify the pathways underlying the incompatibility of the soybean line 'PI437654' to SCN strain HG1.2.3.5.7 by comparing its response to three compatible lines. Fourteen metabolites were disrupted in 'PI437654', including upregulation of N-acetyltranexamic acid, nicotine, and tryptophan and their associated KEGG pathways (including tropane, piperidine, and pyridine alkaloid biosynthesis; alanine, aspartate, and glutamate metabolism; sphingolipid metabolism; and arginine biosynthesis), indicating their role in defense against HG1.2.3.5.7 infection [141]. This study revealed a significant shift in metabolic fluxes during nematode infections in compatible and incompatible lines. Further, the accumulation of phenolic acids and flavonoids (neobavaisoflavone, glycitin, genistin, and genistein) has been reported in soybean roots inoculated with the nematode Aphelenchoides besseyi, confirming the role of PSMs, specifically phenylpropanoids, in nematode infections. Kang et al. [128] used omics approaches to characterize the transcripts and metabolites underlying the tolerance mechanism provided by Bacillus simplex strain Sneb545 against SCN and found that Sneb545-treated soybeans had higher concentrations of nematicidal metabolites than non-treated soybeans. These studies used global metabolomics approaches to identify potential bioactive compounds and pathways that could be used in soybean breeding to produce incompatible interactions against SCN and could be extended to other host-pest combinations.

metabolites that enhance plant resistance against pathogens, insects, and nematodes in soybean [133,136].

Metabolite profiling in soybean
Several studies have unraveled metabolomic changes in soybean in response to bacterial and fungal pathogens [136,137]. The earliest studies in soybean reported the accumulation of isoflavones (daidzein, formononetin, and glyceollins) and their glucosides (daidzin, genistin, and ononin), secondary compounds produced from the phenylpropanoid pathway as a plant defense mechanism to overcome biotic stress [138]. For instance, Rivera-Vargas et al. [139] investigated the effect of flavonoids on the growth of the fungal pathogen P. sojae. They reported that the inhibitory effects of compounds were a function of the chemical structure and concentration of the particular compound. One set of compounds, such as coumestrol, biochanin A, genistein, naringenin, and isorhamnetin, was inhibitory at lower concentrations (60-120 μM) and fungicidal at higher concentrations (240 μM). Another class of compounds, comprising quercetin, quercetin glucoside, and isoquercitrin, had growth inhibitory effects but did not show any fungicidal effects, while the third set of compounds, including daidzein, formononetin, kaempferol, apigenin, chrysin, and rutin, had no inhibitory effects in the studied range [139]. Ranjan et al. [140] confirmed the production of benzoic, caffeic, and ferulic acids in the stem of the resistant soybean line against the fungus Sclerotinia sclerotiorum. Another fungal pathogen, Aspergillus sojae, increased naringenin, isoflavone, and coumestan levels in soybean [137]. Silva et al. [133] confirmed the enhanced production of phenylpropanoids in soybean leaves inoculated with P. pachyrhizi. Interestingly, the abundance of many terpenes decreased in response to infection. It has also been observed that pipecolic acid, a non-protein amino acid formed during stressful conditions, regulates local and systemic resistance acquired by plants following invasion by bacterial pathogens.
Subsequent production of nucleotides by aspartate amino acid oxidation was also reported as an important metabolic pathway for plant defense in soybean before and after pathogen invasion. In addition to amino acids, disruption in the abundance of other primary metabolites has been reported. Zhu et al. [132] reported an accumulation of sugars, organic acids, and amino acids at 12 and 36 hpi with P. sojae. Copley et al. [136] also observed an increase in the abundance of sugars, tricarboxylic acid metabolites, and amino acids along with antioxidants associated with reactive oxygen species signaling post-infection with fungus Rhizoctonia solani strain AG1-IA, indicating a significant reprogramming of plant metabolism to cope with biotic stresses.
understand the activity at each moment, potentially missing important details that might be helpful for integrated pest management strategies. Also, there is no global standard for extraction and sampling procedures, which reduces data reproducibility [131]. Optimum extraction protocols that work under a practical set of conditions while maintaining sample purity are still a long way off. Furthermore, sample preparation methods must be devised to reduce chemical impurities, as even trace levels of interfering molecules can change the results drastically.
It is also challenging to analyze metabolomics data, especially untargeted data comprising thousands of features [129]. Therefore, databases should be comprehensive and updated to include recent information on the metabolome in question. Robust computational software and skills are needed to meet this need. In conclusion, metabolomics has great potential as a standalone approach and as an integral part of other omics approaches, and it has gained significant attention in the past decade. However, significant research on optimizing metabolomics approaches and developing metabolite databases is needed to solve the challenges of integrated pest management and fully utilize the benefits of this approach.

Soybean phenomics
A phenotype is the outcome of multidimensional interactions of environment, genetics, and crop management practices [148]. Crop improvement programs require precise and rapid assessments of phenotypes for thousands of germplasm and breeding lines over space and time, which is feasible using low- and high-throughput phenomics tools. Plant breeders reap selection gains as a result of precise phenotyping [149,150]. The last two decades have witnessed advances in the genomics and transcriptomics of major food crops, including soybean, that have greatly enhanced crop production [151]. Molecular genetics coupled with precise phenotypes deepens our understanding of plant genetic architecture. However, progress in high-throughput phenotyping is much slower than in high-throughput genotyping, although the two are interdependent in developing stress-resilient genotypes [152]. Thus, developments in efficient phenotyping are indispensable for unleashing the benefits of high-throughput genomics. Phenomics is a multidisciplinary technology integrating physics, mathematics, life sciences, and earth sciences coupled with artificial intelligence to analyze multidimensional high-throughput phenotypic data of crops in complex environments [153]. Advanced phenomics techniques should bridge the gap in high-throughput phenotyping. Advanced phenomics tools can be used to characterize phenomes expressed in response

Soybean roots also release compounds, including isoflavones, triterpene saponins, and others, into the rhizosphere that could mediate plant-pest interactions underground [145]. However, comprehensive information on the role of these metabolites in mediating host-pest interactions in soybean is limited. In addition to improving our understanding of plant-pest interactions, metabolomics is a powerful tool for plant breeding [146], enabling rapid molecular phenotyping to estimate molecular variability in plant collections and screen individuals with a pre-selected composition of metabolites.
Thus, defense-associated metabolites, in addition to being candidates for direct introgression into crops, can serve as metabolite markers for selection. The high phenotypic resolution of metabolomics is also attractive for diagnostic purposes and product quality testing [122]. It is indeed the method of choice for estimating the unintended effect of trait introgression on plant yield or the levels of other cellular metabolites and has been used for quality testing of transgenics in soybean [147]. However, the utility of metabolomics approaches could be improved by integrating them with other omics strategies. For instance, the simultaneous application of metabolomics, transcriptomics and/or proteomics could increase our understanding of the regulatory networks of metabolic pathways and improve gene annotation. In soybean, such integrative approaches have been used in response to biotic stresses [128,136], yielding important insights into cellular responses to biotic stimuli. Metabolomics can also be associated with a genetic map to identify pathway-based QTLs from a population or with GWAS that considers genetic diversity in a large germplasm collection to obtain molecular and metabolic associations with important traits. Thus, incorporating metabolomic approaches in soybean breeding programs as a selection and validation tool could improve breeding efficiency and reduce the time required for producing new varieties [131].

Challenges and prospects of soybean metabolomics for biotic stress resistance
With such enormous and far-reaching benefits, plant metabolomics concerning biotic stress has made commendable advances in elucidating defense-associated metabolites [124,125]. However, due to the extensive diversity of phytochemicals, hundreds to thousands of metabolites and their potential defense-associated roles remain to be discovered. Further, metabolomics is in its infancy compared with genomics and transcriptomics. No single metabolomics technique can analyze all metabolites present; instead, the same sample must be examined using multiple platforms to expand coverage of the metabolomic landscape [129]. Moreover, the complexity of the plant system can change the metabolome in a few seconds. Hence, it becomes difficult to ...
Information on the bands from multispectral and hyperspectral imaging can be used to extract various vegetation indices, such as anthocyanin reflectance indices, simple ratio, photochemical reflectance indices, and green normalized difference vegetation indices. These indices provide information on different physiological and stress attributes of plants. Tractor-mounted multispectral sensors have been used to detect sudden death syndrome (SDS) in soybean before visual symptoms appear on the canopy, which could assist in planning precautionary measures. Herrmann et al. [158] applied partial least squares regression to multispectral data collected with a spectrophotometer to classify SDS-affected soybean plants using reflectance from SDS-inoculated and control fields.
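As a rough illustration of how band reflectances are turned into vegetation indices of the kind mentioned in this section, the sketch below computes four of them for a single canopy pixel. The function names, the wavelengths used (531, 550, 570, and 700 nm), and the sample reflectance values are assumptions chosen for illustration from commonly used index definitions; they are not taken from the studies cited here.

```python
def simple_ratio(nir, red):
    """Simple ratio: near-infrared reflectance divided by red reflectance."""
    return nir / red


def gndvi(nir, green):
    """Green normalized difference vegetation index."""
    return (nir - green) / (nir + green)


def pri(r531, r570):
    """Photochemical reflectance index from reflectance at 531 and 570 nm."""
    return (r531 - r570) / (r531 + r570)


def ari(r550, r700):
    """Anthocyanin reflectance index from reflectance at 550 and 700 nm."""
    return 1.0 / r550 - 1.0 / r700


def summarize(pixel):
    """Return all four indices for one pixel given as a dict of reflectances."""
    return {
        "SR": simple_ratio(pixel["nir"], pixel["red"]),
        "GNDVI": gndvi(pixel["nir"], pixel["green"]),
        "PRI": pri(pixel["r531"], pixel["r570"]),
        "ARI": ari(pixel["r550"], pixel["r700"]),
    }


# Hypothetical reflectances (0-1 scale); not measured data.
healthy = {"green": 0.10, "red": 0.05, "nir": 0.50,
           "r531": 0.09, "r550": 0.10, "r570": 0.10, "r700": 0.125}
stressed = {"green": 0.14, "red": 0.10, "nir": 0.35,
            "r531": 0.10, "r550": 0.14, "r570": 0.12, "r700": 0.16}

for label, pixel in (("healthy", healthy), ("stressed", stressed)):
    print(label, summarize(pixel))
```

In practice, such indices would be computed per pixel or per plot from calibrated reflectance and then passed to a classifier or regression model; the comparison of two hand-picked pixels here only shows the arithmetic.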

Thermal imaging
Thermal imaging sensors measure radiation emitted in the infrared region to estimate canopy temperature and transpiration rate, which can be linked to different biotic stresses in plants. These imaging tools have been used to detect biotic and abiotic stresses, plant water status, and maturity in soybean [149]. Sandmann et al. [159] used thermography to detect biotic stresses in lettuce; the same technique could be used to detect and differentiate various biotic stresses in soybean.

Fluorescence imaging
Fluorescence imaging uses the fluorescence emitted by photosystem II to detect photochemical changes in plants [149]. Fluorescence sensors provide information on chlorophyll content, photosynthetic rate, and various physiological processes that are indirectly linked to different biotic stresses in plants. Das et al. [150] identified 58% of plants infested with SCN and SDS using fluorescence imaging. These spectral systems can also be mounted on microscopes, tractors, robots, aircraft, and satellites.

X-ray computed tomography
X-ray computed tomography (CT) provides 3D tomographic images of objects reconstructed from multiple radiographic images. It has been used to study root architecture by separating objects based on their densities, as well as quality traits and plant morphology. Various other tools are available to study different biotic stresses in crop plants, including light detection and ranging (LiDAR), time-of-flight sensing, positron emission tomography, radio detection and ranging, sound detection and ranging, and magnetic resonance imaging [149,151,156].
A plethora of techniques combined with machine learning and complex computer algorithms are currently being explored for accurate and timely detection of diseases caused by biotic factors in soybean [55,154]. Advances in phenotyping technologies have significantly facilitated the detection, differentiation, and quantification of different diseases at early stages. Early and reliable disease detection is important, especially in soybean, as it is a major commercial crop with current annual yield losses of approximately 40% due to various biotic agents [155]. Several phenotyping platforms, such as unmanned aerial vehicles (UAVs), satellites, aircraft, and ground-based sensing robots, are equipped with imaging tools such as optical imaging, chlorophyll fluorescence imaging, spectroscopy, and red, green, and blue (RGB), multispectral, hyperspectral, and thermal imaging for detecting biotic stresses in crop plants. Below, we provide an overview of different imaging sensors and how they have been used for biotic stress detection in soybean.

Visible imaging
Color (RGB) cameras capture real-time images of an object, equivalent to human perception. This is one of the cheapest and most widely used imaging techniques for providing phenotyping data for phenomics-assisted selection [156]. RGB imaging provides three bands (red, green, and blue) of the electromagnetic spectrum, which can be used to detect and classify biotic stresses in crop plants. Visible imaging has been used to detect diseased patches caused by soil-borne diseases in soybean [157]. These imaging sensors exploit the differential optical response of plants across the electromagnetic spectrum, as stress induces changes in leaf morphology, color, reflectance, and fluorescence. Tetila et al. [157] mounted RGB cameras on a UAV platform to capture images of soybean foliar diseases, including SDS, IDC, and FLS, and used image segmentation, feature extraction, and classification of leaf physical properties to separate diseased from healthy plants.
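A common first step in RGB-based pipelines is separating plant pixels from the soil background before any features are extracted. The sketch below uses the excess green index (ExG = 2g - r - b, computed on chromatic coordinates) with a fixed threshold. This is a generic technique used here for illustration only and is not the segmentation method of Tetila et al. [157]; the threshold of 0.1, the function names, and the toy pixel values are assumptions.

```python
def excess_green(r, g, b):
    """Excess green index on chromatic coordinates: 2g - r - b."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: no color information
    rn, gn, bn = r / total, g / total, b / total
    return 2 * gn - rn - bn


def segment_plants(image, threshold=0.1):
    """Return a boolean mask that is True where a pixel looks like vegetation.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    The threshold is an assumed value and would need tuning per dataset.
    """
    return [[excess_green(*pixel) > threshold for pixel in row] for row in image]


def plant_fraction(mask):
    """Fraction of pixels classified as vegetation."""
    flat = [value for row in mask for value in row]
    return sum(flat) / len(flat) if flat else 0.0


# Toy 2 x 2 image (hypothetical values): top row leaf-like, bottom row soil-like.
image = [
    [(40, 120, 30), (35, 110, 40)],
    [(120, 100, 80), (130, 110, 90)],
]

mask = segment_plants(image)
print(mask)
print(plant_fraction(mask))
```

A fixed threshold like this is sensitive to lighting and to chlorotic or necrotic tissue, which is less green than healthy leaves, so field pipelines usually rely on adaptive thresholds or learned segmentation instead.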

Multi and hyperspectral imaging
These imaging techniques cover the visible and infrared regions of the electromagnetic spectrum to provide an overview of different physiological processes occurring in crop plants. Hyperspectral imaging uses very narrow bands and generates a wealth of data for decision-making. It can scan at 1 nm intervals, improving detection limits manifold [152].
Consequently, thorough investigations integrating numerous 'omics' methodologies have the potential to decipher the mechanisms underlying complicated biotic stress pathways. The availability of the soybean genome sequence has accelerated the study of functional architecture, responses, and signaling under stress conditions. Soybean breeding has gained momentum in the post-genomics era. However, further efforts are needed to understand the plant response to numerous biotic stresses in the field, as the information generated must be collated, interpreted, and validated in proteomic and metabolomic experiments. Furthermore, a major issue with breeding for biotic stress resistance is the continuous evolution of pathogens, which outpaces the development of resistant cultivars and creates a pressing need for genetic diversity and variation in the breeder's pool. The information gleaned from a multi-omics approach will help guide breeding for resistance within the soybean gene pool.

Funding This research received no external funding.

Data availability Not applicable.
Code availability Not applicable.

Declarations
Conflicts of interest/Competing interests There are no competing interests declared by the authors.
Ethics approval Not applicable.

Challenges and perspective of phenomics for biotic stress resistance in soybean
Challenges continually accompany technological developments, including high-throughput phenotyping in plants, especially for biotic stresses in soybean. Most phenomics studies on biotic stress detection are conducted under controlled conditions; hence, methods need to be developed for their field implementation. The biggest challenge in image processing is separating the background and adjusting for environmental variations. Most soybean biotic stresses cause similar chlorotic and necrotic symptoms, making it difficult to distinguish infections with RGB cameras. Moreover, image processing becomes more complicated if multiple symptoms are observed on a single plant. Research is underway using hyperspectral imaging to remove these bottlenecks.
High-throughput phenotyping involves various steps, mainly image capture, data storage, image processing, trait extraction, feature selection, machine learning/classification, and decision-making modeling. New imaging sensors are needed that adjust for environmental variation, especially cloud cover and light conditions, which are among the limiting factors in data collection. New phenotyping platforms that handle many sensors at a reasonable cost could provide information for inferring different biotic and abiotic stress conditions. Current portable sensors are limited to a few fixed points, which restricts high-throughput implementation.
Machine and deep learning have been used for image classification and disease detection; however, their accuracy remains questionable. More robust models should be built using transfer and reinforcement learning for timely detection and prediction of stress conditions in crop breeding. Deep learning models have great potential owing to their autonomous learning of patterns in images and data using various non-linear activation functions. Important deep learning models used for high-throughput phenotyping include convolutional neural networks, generalized neural networks, recurrent neural networks, and multilayer perceptrons.

Conclusion and future perspectives
High-throughput scientific and technological advances have generated enormous data at the genomic, transcriptomic, proteomic, metabolomic, and phenomic levels. With the blurring of disciplinary boundaries and the emergence of computational biology, the objective is no longer limited to identifying key genetic players or regulatory elements underlying the stress response but extends to measuring modulations at the protein and metabolite expression levels within cells.