Exploiting genomic tools for genetic dissection and improving the resistance to Fusarium stalk rot in tropical maize

Fusarium stalk rot (FSR) is a globally destructive disease of maize, and the efficiency of phenotypic selection for improving FSR resistance has been low. Novel genomic tools, namely genome-wide association study (GWAS) and genomic prediction (GP), provide an opportunity for genetic dissection and improvement of FSR resistance. In this study, GWAS and GP analyses were performed on 562 tropical maize inbred lines from two populations evaluated in four environments under artificial inoculation. In total, 15 SNPs significantly associated with FSR resistance were identified across the two populations and the CombinedPOP consisting of all 562 inbred lines, with P-values ranging from 1.99×10⁻⁷ to 8.27×10⁻¹³ and phenotypic variance explained (PVE) values ranging from 0.94 to 8.30%. The effects of the 15 favorable alleles ranged from -4.29 to -14.21%. One stable genomic region spanning 0.95 Mb, from 250,089,724 bp to 251,044,933 bp on chromosome 1, was detected across all populations, and the PVE values of the SNPs detected in it ranged from 2.16 to 5.18%. Medium GP accuracy for FSR severity, 0.29 to 0.51, was observed in two cross-validation (CV) schemes. Incorporating genotype-by-environment interaction improved GP accuracy from 0.36 to 0.40 in the CV1 scheme and from 0.42 to 0.55 in the CV2 scheme. Selecting a subset of molecular markers by considering both genome coverage and the total PVE of the SNPs further improved GP accuracy. These findings extend the knowledge of exploiting genomic tools for genetic dissection and improvement of FSR resistance in tropical maize.


Introduction
The Food and Agriculture Organization (FAO) estimates that by 2050 the world's population will surpass 9 billion (United Nations 2019). To fulfil the food and feed demand, the average genetic gain per year should accelerate to more than 2%, which will be a big challenge under climate change (Prasanna et al. 2021). With changes in precipitation, temperature, and humidity, crop diseases become a key factor affecting genetic gain. Among the major maize diseases, stalk rot and ear rot are the two that are most strongly affected by climate change (Prasanna et al. 2021).
Stalk rot, a complex fungal disease, can be caused by Fusarium verticillioides (F.v), F. graminearum (Gibberella), Colletotrichum graminicola (Anthracnose), and Pythium aphanidermatum, and some bacterial species of Erwinia cause symptoms similar to those caused by Fusarium spp. (Chambers 1987). Fusarium stalk rot (FSR), caused by F.v, is one of the most devastating diseases worldwide, especially in tropical and subtropical zones (Savary et al. 2019; Chivasa et al. 2021). In South and Central America, the incidence of FSR is usually above 50% (Christensen et al. 2014). Generally, the incidence of FSR ranges from 30 to 70%, and in some years it can surge to 90% in India, China, and the Philippines (Duan et al. 2019). FSR can cause 38-100% yield loss in maize. Furthermore, the pathogen can produce low-molecular-weight secondary metabolites known as mycotoxins in the grain and the plant, which can be fatally harmful to humans and other animals (Maize AICRP 2014; Subedi et al. 2016; Mueller et al. 2022). F.v is an aggressive pathogen that can infect any part of the maize plant from the beginning to the end of the cropping season and survive on maize residue or on rotation crops during winter (Munkvold 2003; White 1999). No fungicides are currently available for managing FSR; hence, the efficiency of field control of FSR is very low (Zhu et al. 2021; Holland et al. 2020). Alternatively, the development and promotion of disease-resistant varieties is a cost-effective and environment-friendly approach.
Novel genetic tools such as the genome-wide association study (GWAS) provide an opportunity to explore the genetic architecture of FSR resistance for selecting and breeding disease-resistant varieties. Various genetic studies have reported that FSR resistance is a complex, quantitatively inherited trait. Hundreds of quantitative trait loci (QTLs) or genomic regions associated with resistance to stalk rot caused by different pathogens have been detected across the genome, including qRfg1 (Yang et al. 2010; Wang et al. 2017), qRfg2 (Zhang et al. 2012; Ye et al. 2018), qRfg3 (Ma et al. 2017), and Rgsr8.1 (Chen et al. 2017) for Gibberella stalk rot; Rpi1 (Yang et al. 2005), RpiQI319-1 and RpiQI319-2 (Song et al. 2015), and RpiX178-1 and RpiX178-2 (Duan et al. 2019) for Pythium stalk rot; and Rcg1 (Jung et al. 1994) for Anthracnose stalk rot. The causal genes of qRfg1 and qRfg2 have been cloned, and functional markers have been developed to implement marker-assisted selection for improving stalk rot resistance.
A key genomic region on Chr. 6 at 168 Mb conferring FSR resistance, with PVE values ranging from 6.16 to 8.38%, was recently identified by GWAS and further validated by linkage mapping in two F2:3 populations (Rashid et al. 2022). The candidate gene conferring FSR resistance in this region is annotated as a nucleic acid binding protein that plays an integral part in gene silencing pathways and responds to diverse abiotic stresses in maize (Zhai et al. 2019; Qian et al. 2011). However, no further GWAS has been conducted to map FSR resistance in tropical maize, and the genetic loci conferring resistance need to be characterized comprehensively for a better understanding of the genetic architecture of FSR resistance at the post-flowering stage.
Genomic prediction (GP), another novel genomic tool, provides opportunities to improve the breeding efficiency of developing FSR-resistant inbred lines and hybrids. GP, also known as genomic selection (GS), offers an attractive alternative to conventional or marker-assisted selection (Meuwissen et al. 2001). In GS, the effects of all molecular markers across the entire genome are estimated to predict the genomic estimated breeding value (GEBV) of the selection candidates. However, the potential of GP for improving FSR resistance is still unclear, and further studies are needed to estimate the prediction accuracy and explore GS breeding strategies for improving FSR resistance.
In this study, GWAS and GP analyses were performed on 562 tropical and subtropical inbred lines, all of which were screened in four environments under artificial inoculation to evaluate their response to FSR and genotyped by genotyping-by-sequencing. The main objectives of the present study were to (1) dissect the genetic architecture of FSR resistance and estimate the effects of the favorable alleles conferring FSR resistance; (2) assess the prediction accuracy within and between populations to explore the potential of GP for improving FSR resistance; and (3) investigate the effects of key factors on the estimated prediction accuracy for FSR resistance, including employing various prediction models in different cross-validation schemes, incorporating genotype-by-environment interaction into prediction, utilizing phenotypic datasets from different environments for prediction, and selecting a subset of molecular markers for prediction by considering both genome coverage and the P-value threshold of the SNPs.

Plant materials
In the present study, 562 tropical and subtropical maize inbred lines from two populations were used to conduct GWAS to dissect the genetic architecture of FSR resistance and to estimate the prediction accuracy of FSR resistance under different scenarios. [...] was used for all experiments, with three replications per location and a single-row plot per replication. Each plot was 2.5 meters long with 11 plants. The distance between rows was 0.8 meters, and the distance between plants within a plot was 0.25 meters.
The environment was defined as a combination of year and location. Therefore, each population had phenotypic data from 4 environments and 12 data points per line (4 environments × 3 replications). For example, the CML population was screened in four environments, designated as 2018AF, 2018TL, 2019AF, and 2019TL.

Artificial inoculation and evaluation
All 562 inbred lines were artificially inoculated with the pathogen Fusarium verticillioides (F.v), which is the main pathogen in Mexico (Prasanna et al. 2021). The pathogen was cultured on fresh potato dextrose agar plates in which sterile toothpicks had been inserted. The culture was incubated at 25°C for 2 weeks, and the infected toothpicks were used for inoculation (Lal and Singh 1984). Fourteen days after flowering, all plants in each plot were inoculated by inserting an infected toothpick into a hole drilled in the first stem segment (approximately 0.1 meters above the soil surface).
Before harvest (four weeks after inoculation), the plants were cut off at cob height, approximately 0.5 to 1.0 meters above the ground, and the stalks were split longitudinally through the point of inoculation. Disease severity was estimated by the formula below:
FSR severity (%) = visible lesion area / whole longitudinal cut area × 100%
FSR severity ranges from 0 to 100%. An FSR severity close to 0% (no visible disease symptoms or lesions identifiable on the stalk) indicates the highest level of resistance to FSR, whereas an FSR severity close to 100% indicates the lowest level of resistance, or no resistance, to FSR.
In the mixed linear model used for the phenotypic data analysis, $Y_{ijk}$ is the FSR severity, $\mu$ is the overall mean, and $G_i$, $E_j$, and $GE_{ij}$ are the effects of the i-th genotype, the j-th environment, and the interaction between the i-th genotype and the j-th environment, respectively. $R_{k(j)}$ is the effect of the k-th replication within the j-th environment, and $\varepsilon_{ijk}$ is the residual effect of the i-th genotype, j-th environment, and k-th replication. Genotype was treated as a fixed effect, whereas all other effects were declared random. Moreover, there is no $GE_{ij}$ term (interaction between genotype and environment) in the single environment analysis.

Phenotypic data analysis
Environments with an estimated heritability below 0.40 were excluded from the CombinedENV analysis. The H² of FSR severity in the individual environment analysis and the CombinedENV analysis was calculated as:
Individual environment analysis: $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_\varepsilon^2 / r)$
CombinedENV analysis: $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_{ge}^2 / e + \sigma_\varepsilon^2 / (e \cdot r))$
where $\sigma_g^2$ is the genetic variance, $\sigma_{ge}^2$ is the variance of the interaction between genotype and environment, $\sigma_\varepsilon^2$ is the error variance, $e$ is the number of environments, and $r$ is the number of replications within each environment.
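As a minimal sketch of how entry-mean broad-sense heritability can be computed from variance components, the Python function below assumes the standard form in which the genotype-by-environment variance is divided by the number of environments and the error variance by the number of environments times the number of replications. The function name and the variance values are hypothetical and are not taken from the study.

```python
def broad_sense_heritability(var_g, var_e, n_rep, var_ge=0.0, n_env=1):
    """Entry-mean broad-sense heritability from variance components.

    With n_env == 1 and var_ge == 0 this reduces to the single-environment
    form var_g / (var_g + var_e / n_rep).
    """
    if min(var_g, var_e, var_ge) < 0:
        raise ValueError("variance components must be non-negative")
    denominator = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    return var_g / denominator


# Hypothetical variance components, chosen only for illustration.
h2_single = broad_sense_heritability(var_g=60.0, var_e=90.0, n_rep=3)
h2_combined = broad_sense_heritability(
    var_g=60.0, var_e=90.0, n_rep=3, var_ge=40.0, n_env=4
)
print(round(h2_single, 3), round(h2_combined, 3))
```

With such a helper, an environment could be flagged for exclusion when its single-environment value falls below the 0.40 cut-off mentioned above.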
In addition, descriptive statistics of the phenotypic data were computed in IBM SPSS Statistics, version 22.0 (IBM Corp. 2022). The distributions of FSR severity in the individual environment and CombinedENV analyses were plotted in R (R Core Team 2020) using the 'ggplot2' package (Wickham 2016). The Pearson correlations of FSR severity among the single and combined environments in the CML and DTMA populations were calculated using the BLUE values and visualized in R using the package 'ggcorrplot' (Wickham et al. 2016). Moreover, the 10 lines with the lowest FSR severity and the 10 lines with the highest FSR severity were identified within each population.

Genotyping, GBS, and SNP calling
Total genomic DNA was extracted from bulked young leaves of all lines using a CTAB procedure (Doyle and Doyle 1987). Genotyping was performed at the Cornell University Biotechnology Resource Center (Ithaca, NY). Genomic DNA was digested with the restriction enzyme ApeKI. Genotyping-by-sequencing (GBS) libraries were constructed in 96-plex and sequenced on an Illumina HiSeq 2000 (Elshire et al. 2011). SNP calling was performed using the TASSEL GBS Pipeline, in which the GBS Version 2.7 TOPM (tags on physical map) file downloaded from Panzea (www.panzea.org) was used to anchor reads to the maize B73 RefGen_v2 reference genome (Glaubitz et al. 2014; Wang et al. 2020). For each inbred line, 955,690 SNPs were called; 955,120 of them were evenly distributed across the ten maize chromosomes, while the other 570 had no position information.

Genome-wide association study (GWAS)
Quality control of the genotypic data before GWAS is an important step to ensure the accuracy of the subsequent analyses. The combined population, consisting of all 562 inbred lines from the CML and DTMA populations, was abbreviated as CombinedPOP. The raw GBS datasets were filtered in TASSEL V5.0 to retain SNPs with a minor allele frequency (MAF) above 0.05, a missing data rate below 30%, and a heterozygosity rate below 5% in the CML, DTMA, and CombinedPOP populations, respectively. Imputation was then performed with the default parameters in TASSEL 5.0 (Bradbury et al. 2017) using the LD KNNi method (Money et al. 2015). The imputed GBS datasets and the BLUE values of FSR severity were used to conduct GWAS analyses in all three populations mentioned above.
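The filtering criteria can be illustrated with a short Python sketch. It is not the TASSEL implementation: it assumes genotypes coded as allele dosages (0, 1, 2) with None for a missing call, and it assumes that all three thresholds are applied per SNP, which the text does not state explicitly. The function name and the toy genotypes are hypothetical.

```python
def snp_passes_qc(genotypes, min_maf=0.05, max_missing=0.30, max_het=0.05):
    """Return True if one SNP passes the MAF, missing-rate, and heterozygosity filters.

    genotypes: list of allele dosages (0, 1, 2), with None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    het_rate = sum(1 for g in called if g == 1) / len(called)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf > min_maf and missing_rate < max_missing and het_rate < max_het


# Hypothetical SNPs scored on ten lines.
snps = {
    "snp_a": [0, 0, 2, 2, 0, 2, 0, 0, 2, None],           # passes all filters
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, None],           # monomorphic, MAF = 0
    "snp_c": [0, 1, 1, 2, 0, 2, 0, 1, 2, 0],              # heterozygosity 30%
    "snp_d": [0, 2, None, None, None, None, 2, 0, 0, 2],  # missing rate 40%
}
kept = [name for name, calls in snps.items() if snp_passes_qc(calls)]
print(kept)
```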
The Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK) model (Huang et al. 2019) was chosen to detect associations between the SNPs and FSR severity in the GWAS analysis because this model effectively reduces false positives. In addition to incorporating principal components (PC) and kinship (K) as covariates to reduce false positives, BLINK iteratively incorporates associated markers as covariates to eliminate their unclear connection to the individuals.
Moreover, the SNPs sampled in the BLINK model are selected according to linkage disequilibrium, optimized by the Bayesian information criterion (BIC), and re-examined across multiple iterations to reduce false positives. BLINK runs two fixed-effect models and one filtering process (Huang et

Effect analysis of the favorable allele
For each SNP detected by GWAS, the allele with the lower average FSR severity was designated the favorable allele, and the allele with the higher average FSR severity was designated the unfavorable allele. The effect of each favorable allele was calculated as follows:
Effect of each favorable allele = average FSR severity of the lines carrying the favorable allele - average FSR severity of the lines carrying the unfavorable allele
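The calculation above can be sketched in a few lines of Python. The function name, the allele labels, and the severity values are hypothetical; the sketch assumes homozygous inbred lines, so each line carries exactly one of two alleles, and lines with a missing call (None) are skipped.

```python
from statistics import mean


def favorable_allele_effect(alleles, severity):
    """Return (favorable allele, effect) for one biallelic SNP.

    The effect is the mean severity of lines carrying the favorable allele
    minus the mean severity of lines carrying the unfavorable allele,
    so it is negative whenever the two group means differ.
    """
    groups = {}
    for allele, value in zip(alleles, severity):
        if allele is not None:
            groups.setdefault(allele, []).append(value)
    if len(groups) != 2:
        raise ValueError("expected exactly two alleles")
    means = {allele: mean(values) for allele, values in groups.items()}
    favorable = min(means, key=means.get)
    unfavorable = max(means, key=means.get)
    return favorable, means[favorable] - means[unfavorable]


# Hypothetical calls and FSR severity (%) for six lines.
alleles = ["A", "A", "G", "G", "A", "G"]
severity = [40.0, 50.0, 60.0, 70.0, 45.0, 65.0]
favorable, effect = favorable_allele_effect(alleles, severity)
print(favorable, effect)
```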

Candidate gene analysis
The average linkage disequilibrium (LD) decay for each chromosome was measured in TASSEL V5.0, using sliding window analysis with a window size of 50 SNPs. The squared Pearson correlation coefficient (r²) between vectors of SNPs was used to assess the level of LD decay on each chromosome, and the average LD decay distance across the ten chromosomes at r² = 0.1 was used to measure the LD decay distance (Sharma) in the CML, DTMA, and CombinedPOP populations (Yan et al. 2009). The LD decay results were plotted against physical distance (kb) in R with the package 'ggplot2' (Wickham et al. 2016).
Considering the LD decay distance, the interval defined by the physical position of each SNP ± the LD decay distance was defined as a genomic region. Overlapping or partially overlapping genomic regions were merged into one region. Putative genes located in all the genomic regions were considered candidate genes conferring FSR resistance. Annotation of candidate genes was performed on NCBI (https://www.ncbi.nlm.nih.gov) and MaizeGDB (https://www.maizegdb.org).
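The region definition can be sketched as an interval-merging step. The Python function below is an illustration, not the pipeline used in the study; the function name and the SNP positions are hypothetical, and the 3,600 bp window is used only as an example of an LD decay distance.

```python
def merge_genomic_regions(snps, ld_decay_bp):
    """Build SNP +/- LD-decay intervals and merge those that overlap.

    snps: iterable of (chromosome, position_bp) tuples.
    Returns a sorted list of (chromosome, start_bp, end_bp) regions.
    """
    intervals = sorted(
        (chrom, max(0, pos - ld_decay_bp), pos + ld_decay_bp)
        for chrom, pos in snps
    )
    merged = []
    for chrom, start, end in intervals:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps the previous region on the same chromosome: extend it.
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical SNP positions on chromosomes 1 and 4.
snps = [(1, 10_000), (1, 14_000), (1, 50_000), (4, 10_000)]
regions = merge_genomic_regions(snps, ld_decay_bp=3_600)
print(regions)
```

Genes whose coordinates fall inside any returned region would then be taken forward as candidates.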

Genomic prediction analysis
A five-fold cross-validation scheme with 20 replications was used to generate the training and validation sets randomly and to assess the prediction accuracy. The average Pearson correlation between the true breeding values and the genomic estimated breeding values in the testing population was defined as the prediction accuracy (Liu et al. 2021). GP analysis was conducted using the genome-wide SNPs and the BLUE values of FSR severity from the single environment analysis in the CML, DTMA, and CombinedPOP populations, respectively. The GP analysis was conducted using the BGLR library (Pérez et al. 2014) in R, and the Deviance Information Criterion (DIC) value was calculated for each model at the same time. A lower DIC value indicates a more precise model (Tomohiro 2011).
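The cross-validation loop can be sketched as follows. This is a simplified illustration, not the BGLR workflow: the predictor is a stand-in that uses the training mean of each genotype class at a single marker, and the toy marker and severity values are hypothetical. Only the fold construction (five folds, 20 replications) and the accuracy definition (mean Pearson correlation in the test folds) follow the text.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cv_accuracy(observed, fit_predict, n_folds=5, n_reps=20, seed=1):
    """Mean test-fold Pearson correlation over n_reps random k-fold splits."""
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_reps):
        order = list(range(len(observed)))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            predicted = fit_predict(train_idx, test_idx)
            actual = [observed[i] for i in test_idx]
            correlations.append(pearson(predicted, actual))
    return sum(correlations) / len(correlations)


# Toy data (hypothetical): one marker with dosage 0 or 2 and a severity per line.
dosage = [0 if i % 2 == 0 else 2 for i in range(40)]
severity = [60.0 - 5.0 * d + ((i * 7) % 11 - 5) for i, d in enumerate(dosage)]


def class_mean_predictor(train_idx, test_idx):
    """Stand-in predictor: training mean of lines sharing the test line's genotype."""
    overall = sum(severity[i] for i in train_idx) / len(train_idx)
    by_class = {}
    for i in train_idx:
        by_class.setdefault(dosage[i], []).append(severity[i])
    means = {g: sum(v) / len(v) for g, v in by_class.items()}
    return [means.get(dosage[i], overall) for i in test_idx]


accuracy = cv_accuracy(severity, class_mean_predictor)
print(round(accuracy, 2))
```

In practice the `fit_predict` callback would wrap a genomic prediction model fitted on the training lines; the callback interface here is an assumption made for the sketch.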
Two cross-validation schemes (CVs) were applied. The first cross-validation scheme, CV1, was used to mimic a breeding scenario in which newly developed lines are predicted, meaning that these lines have not been observed in any environment. The second cross-validation scheme, CV2, was used to mimic sparse testing, in which some lines were observed in some environments but absent in others (Mageto et al. 2020).
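The difference between the two schemes can be sketched as two ways of assigning line-by-environment records to folds. This is a simplified illustration with hypothetical function names and line labels: CV1 assigns whole lines to folds, whereas CV2 assigns individual records at random, so a line in the test fold is usually still observed in other environments.

```python
import random


def cv1_folds(lines, environments, n_folds=5, seed=1):
    """CV1: a test line is masked in every environment (newly developed lines)."""
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    fold_of_line = {line: i % n_folds for i, line in enumerate(shuffled)}
    return {(line, env): fold_of_line[line] for line in lines for env in environments}


def cv2_folds(lines, environments, n_folds=5, seed=1):
    """CV2: single line-by-environment records are masked (sparse testing)."""
    rng = random.Random(seed)
    cells = [(line, env) for line in lines for env in environments]
    rng.shuffle(cells)
    return {cell: i % n_folds for i, cell in enumerate(cells)}


# Hypothetical line labels; the environment names follow the CML example in the text.
lines = [f"L{i}" for i in range(1, 11)]
environments = ["2018AF", "2018TL", "2019AF", "2019TL"]
cv1 = cv1_folds(lines, environments)
cv2 = cv2_folds(lines, environments)

# Records of the first test fold under each scheme.
cv1_test = sorted(cell for cell, fold in cv1.items() if fold == 0)
cv2_test = sorted(cell for cell, fold in cv2.items() if fold == 0)
print(len(cv1_test), len(cv2_test))
```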
To compare the prediction accuracy between phenotypic selection and GS, and to assess the effect of incorporating genotype-by-environment interaction on prediction accuracy, three prediction models were applied. The first prediction model, M1, is a phenotypic prediction model in which the effects of environment and lines are used for prediction. The second prediction model, M2, is a general GP model in which the marker effects are added. The third prediction model, M3, is an extension of M2 that incorporates genotype-by-environment interaction. More details of these three models are described in Method S1. Hereinafter, CV2 and M3 were applied for further GP analyses.
To evaluate the effects of year and location on the estimated prediction accuracy, the phenotypic data of FSR severity were analyzed within the CML population and the DTMA population by combining the data from the same location (CombinedAF and CombinedTL) or the same year (Combined2018 and Combined2019 in the CML population, Combined2014 and Combined2019 in the DTMA population). Within each population, the prediction accuracy was estimated using the BLUE values of FSR severity from the same location or the same year.
To investigate the GP accuracy obtained with SNPs significantly associated with FSR resistance, sets of SNPs detected by GWAS at P-value thresholds of 10⁻³, 10⁻⁴, and 10⁻⁵ were selected for GP analyses with M3 in CV2. Only the unique SNPs across all the GWAS analyses were selected for GP analyses.
GP accuracy was also estimated between the CML and DTMA populations by using one population as the training set to predict the other as the testing set. Both the genome-wide SNPs and the significant SNPs conferring FSR resistance detected by GWAS at a P-value threshold of 10⁻³ were used for these analyses with all the prediction models and the CV2 scheme.

Phenotypic variation of FSR severity and correlation analysis
FSR severity showed broader variation and higher average values in the CML population than in the DTMA population across the individual environment and CombinedENV analyses, except in 2019TL (Table 1, Fig. 1a and c). In the CombinedENV analysis, FSR severity in the CML population ranged from 29.17 to 92.50%, with an overall mean of 56.24%. FSR severity in the DTMA population ranged from 17.41 to 79.86%, with an overall mean of 46.70%. The phenotypic differences between these two populations indicated differences in their genetic variation for FSR resistance and in the disease pressure that occurred in different years.
The estimated heritabilities of FSR severity were medium to high in both populations, ranging from 0.67 to 0.85 in the CML population and from 0.53 to 0.79 in the DTMA population after excluding the lowest heritability of 0.38, observed in the 2014AF environment. In the CombinedENV analysis, the heritability of FSR severity in the CML and DTMA populations was 0.77 and 0.55, respectively (Table 1).
The Pearson correlation coefficients of FSR severity among all the individual and combined environments were positive and moderate to high (Fig. 1b and d). The coefficients between the CombinedENV and the individual environments were higher than those between individual environments in both populations. In the CML population, the correlation coefficients between individual environments ranged from 0.26 to 0.56, and those between the CombinedENV and the individual environments ranged from 0.56 to 0.82, which were 0.15 to 0.

Population structure analysis and LD decay distance
After QC, 215,914, 209,111, and 221,190 SNPs were retained for further genetic analysis in the CML, DTMA, and CombinedPOP populations, respectively. The high-quality SNPs were distributed evenly on the ten chromosomes in all three populations. The average MAF was 0.22, 0.19, and 0.20, and the average missing rate was 9%, 3%, and 6% in the CML, DTMA, and CombinedPOP populations, respectively (Fig. S1).
The population structure of all three populations is illustrated by PCA plots, in which the first two principal components, PC1 and PC2, together explained 5.6%, 8.1%, and 4.7% of the phenotype variation in the CML, DTMA, and CombinedPOP populations, respectively (Fig. 2a-c). All three populations were divided into two clusters, tropical lines and subtropical lines, based on their pedigree information.
The average LD decay distance at r² = 0.10 across the ten chromosomes was 3.60 kb, 3.47 kb, and 2.83 kb in the CML, DTMA, and CombinedPOP populations, respectively (Fig. 2d-e).
Significantly associated SNPs, the effect of favorable alleles, genomic regions conferring FSR resistance detected by GWAS, and annotation of candidate genes
In total, 15 SNPs significantly associated with FSR resistance were detected in the GWAS analyses across all three populations at the P-value threshold of 0.05/n (where n is the number of genome-wide SNPs), i.e., 2.3×10⁻⁷, 2.4×10⁻⁷, and 2.3×10⁻⁷ in the CML, DTMA, and CombinedPOP populations, respectively (Table 2, Fig. 3a, c, e). The QQ plots from the three GWAS analyses indicated that population structure was well controlled and that the BLINK model applied in the present study is powerful for identifying reliable SNPs conferring FSR resistance (Fig. 3b, d, f). These 15 SNPs were distributed on all chromosomes except chromosomes 9 and 10. Five of them were detected in the CML population, located on chromosomes 1, 2, 3, and 5. Seven were detected in the DTMA population, distributed on chromosomes 1, 3, 4, 6, 7, and 8. Three were detected in the CombinedPOP, on chromosomes 1 and 4. The P-values of the 15 significantly associated SNPs ranged from 1.99×10⁻⁷ to 8.27×10⁻¹³, and their phenotypic variance explained (PVE) values ranged from 0.94 to 8.30%, with an average of 3.63% (Table 2). These results show that resistance to FSR in tropical maize is controlled by multiple loci with minor effects (PVE < 10%).
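The 0.05/n significance threshold is a Bonferroni-style correction and can be checked with a few lines of Python. The SNP counts are the post-QC numbers reported in this study; the function name is hypothetical.

```python
def bonferroni_threshold(n_snps, alpha=0.05):
    """Genome-wide P-value threshold: alpha divided by the number of SNPs tested."""
    return alpha / n_snps


# Post-QC SNP counts reported for the three populations.
snp_counts = {"CML": 215914, "DTMA": 209111, "CombinedPOP": 221190}
thresholds = {pop: bonferroni_threshold(n) for pop, n in snp_counts.items()}
for pop, value in thresholds.items():
    print(f"{pop}: {value:.1e}")
```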
The effect analysis showed that the 15 favorable alleles had significant or extremely significant effects in the CML, DTMA, and CombinedPOP populations (Fig. 4). In the CML population, the effects ranged from -7.19 to -14.12, with an average of -9.62; in the DTMA population, they ranged from -4.29 to -9.65, with an average of -7.09; and in the CombinedPOP, they ranged from -5.26 to -12.22, with an average of -8.47. All favorable alleles were major alleles, except S2_41485521 in CML with an MAF of 0.11, S3_165448326 in CML with an MAF of 0.29, S6_112215613 in DTMA with an MAF of 0.32, and S8_21865355 in DTMA with an MAF of 0.27, whose effects were -9.21, -7.19, -7.13, and -9.07, respectively (Fig. 4). This is the expected result: most of the [...] Additionally, the rest of the putative candidate genes were directly associated with intracellular signal transduction or the response to environmental stress. The candidate genes revealed in this genomic region were associated with biotic or abiotic stress responses, indicating that this aggregated genomic region is likely associated with FSR resistance in tropical maize.
Prediction accuracy of FSR severity estimated with different prediction models and CV schemes
The prediction accuracy of FSR severity estimated with the phenotypic prediction model M1 was low in the CV1 scheme: the average prediction accuracy in the CML, DTMA, and CombinedPOP populations was 0.03, 0.03, and -0.04, respectively. The prediction accuracy estimated with M1 improved in the CV2 scheme, to 0.48, 0.29, and 0.48 in the CML, DTMA, and CombinedPOP populations, respectively (Fig. 5a-c).
The prediction accuracy estimated with M2, a GP model incorporating molecular marker effects, was higher than that estimated with M1 in both CV1 and CV2. With M2, the prediction accuracy in the CV1 scheme was 0.36, 0.29, and 0.34 in the CML, DTMA, and CombinedPOP populations, respectively, and it increased in the CV2 scheme to 0.51, 0.34, and 0.51, respectively (Fig. 5a-c).
The prediction accuracy of FSR severity could be further improved by applying M3, which incorporates the effects of both molecular markers and G×E interaction (Fig. 5a-c). In CV1, the average prediction accuracy estimated with M3 was 0.40, 0.36, and 0.36 in the CML, DTMA, and CombinedPOP populations, respectively. In CV2, the average prediction accuracy estimated with M3 improved to 0.55, 0.42, and 0.53, respectively.
Compared with M1 and M2, M3, which incorporates G×E interaction, achieved the highest prediction accuracy, indicating that M3 is more powerful for predicting complex traits that are strongly affected by the environment. Comparing the cross-validation schemes, the prediction accuracy of FSR severity estimated in CV2 was higher than that estimated in CV1 for all three models, indicating that predictions benefit from previous records of lines whose FSR severity has already been observed in other environments. However, the prediction accuracy estimated in CV1 was also acceptable for a breeding program, which means GP has the power to assist in selecting newly developed lines.
Prediction accuracy of FSR severity with the phenotypic data from four individual environments and the combined phenotypic data from the same location or same year
The prediction accuracy of FSR severity estimated with the BLUE values from the combined analysis of the same location or the same year was higher than that estimated with the BLUE values from the four individual environments. In the CML population, the prediction accuracy of FSR severity estimated with the BLUE values from the individual environment analysis was 0.55 in M3 and CV2, which was improved to 0.60 using

Discussion
Overall, FSR resistance in maize is a complex quantitative trait controlled by several minor-effect loci and greatly affected by temperature, precipitation, humidity, plant nutrition, and stalk and root insect activity. Similar observations have been reported in maize for resistance to stalk rot caused by different pathogens (Pè et al. 1993). In the present study, the prediction accuracy of FSR severity estimated with the phenotypic prediction model (M1) and the GP models (M2 and M3) was evaluated. In CV1, the prediction accuracy estimated from M1 was close to zero in all three populations, indicating that the phenotypic prediction model is ineffective for predicting the FSR resistance of newly developed inbred lines. In contrast, the prediction accuracy estimated from M2, which adds the SNP markers to the prediction model, ranged from 0.29 to 0.36 in the CV1 scheme and from 0.34 to 0.51 in the CV2 scheme, accounting for nearly or more than half of the heritability of FSR severity estimated from the multi-location trials. This result shows that GP is a promising genomic tool for improving FSR resistance by selecting newly developed inbred lines based on their GEBVs.
Furthermore, selecting a subset of molecular markers by considering both genome coverage and the P-value threshold of the SNPs further improved GP accuracy. When genotype-by-environment interaction was incorporated into the prediction, the prediction accuracy of FSR severity improved from 0.36 to 0.40 in the CV1 scheme and from 0.42 to 0.55 in the CV2 scheme. These findings extend the knowledge of exploiting genomic tools for improving FSR resistance in tropical maize.
The gains in prediction accuracy for the GP model depended on the molecular markers sampled in the model. Generally, GP using the genome-wide SNPs was expected to achieve the highest prediction accuracy (2018). However, the reasons why sampling a subset of molecular markers can improve prediction accuracy are still unclear. In this study, the effects of the P-value threshold of the SNPs, genome coverage, and the percentage of total PVE estimated with a subset of molecular markers on prediction accuracy were further examined (Fig. 7). To select a subset of molecular markers that achieves a prediction accuracy equivalent to that estimated with the genome-wide markers, both genome coverage and the P-value threshold of the SNPs have to be considered, so that the subset captures more of the genotypic information conferring FSR resistance and yields a relatively high percentage of total PVE. A certain number of molecular markers with lower P-values is required to obtain a relatively high percentage of the total PVE. In the present study, the percentage of the total PVE estimated with the 2105 selected SNPs was 71.9, 64.3, and 67.1% in the CML, DTMA, and CombinedPOP populations, respectively, which was higher than the values estimated with the genome-wide SNPs, the 197 SNPs, and the 26 SNPs. Moreover, the percentages of the total PVE estimated with the 2105 selected SNPs in all three populations were close to the heritabilities of FSR severity estimated from the multi-location trials. Furthermore, the variance components estimated with different numbers of molecular markers showed that prediction using the 2105 significantly associated markers reduced the residual variance and had the lowest Deviance Information Criterion (DIC) value, which means this prediction model was the most effective and precise (Supplementary Table 4).
GP models that incorporated genotype-by-environment interaction achieved higher prediction accuracy in both CV1 and CV2 schemes relative to models that did not include genotype-by-environment interaction (Burgueño et al. 2012; Guo et al. 2013; Jarquín et al. 2014; Lopez-Cruz et al. 2015; Zhang et al. 2015; Monteverde et al. 2018). In this study, the variance components of the random effects in each prediction model and the percentage of the total variance explained by each random effect were estimated (Supplementary Table 5). In the CML population, the environment effect in M2 explained the largest proportion of the total variance, 39.18%, and the proportion explained by the environment and genotype-by-environment interaction effects together increased to 51.62% in M3. Meanwhile, the proportion of the residual effect decreased from 37.16% in M2 to 23.59% in M3, and the DIC value decreased from 6371.36 in M1 to 6354.09 in M2 and 6185.37 in M3, which means M3 was the most effective and precise model.
The same trend was observed in the DTMA population. Overall, genotype-by-environment interaction, the variable response of genotypes across environments that results in different trait values, can be used to better drive selection, and statistical models incorporating environmental information can achieve more accurate genomic predictions (Braz et al. 2021).
The effects of environment and genotype-by-environment interaction on the estimated prediction accuracy depend on how much environmental variance is contained in the phenotypic data. In this study, using the combined BLUE values estimated from the same location across years improved the prediction accuracy by up to 12.7% in the CML population and 23.3% in the DTMA population, compared with the average prediction accuracy estimated with the data from four individual environments. In a GP study in barley, a higher accuracy for plant height was also observed when using combined phenotypic data across two years (Oakey et al. 2016). In both populations, the DIC values decreased significantly when the prediction was implemented using BLUE values from the combined analysis (Supplementary Table 6). Hence, this is a reasonable and effective way to improve prediction accuracy.

For the CML and DTMA populations, the best linear unbiased estimate (BLUE) values and broad-sense heritability (H²) of FSR severity were analyzed within each single environment and in the combined analysis across environments (CombinedENV) using META-R software (https://hdl.handle.net/10883/20997) (Alvarado et al. 2020) with a mixed linear model. The mixed linear model in META-R was implemented with the 'lmer' function of the 'lme4' R package (Bates et al. 2015), and the variance components were estimated by restricted maximum likelihood (REML). The formula was as follows: Individual environment: Combined environments:
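For orientation, broad-sense heritability on an entry-mean basis is commonly derived from the variance components of such a mixed model as H² = σ²g / (σ²g + σ²ge/e + σ²ε/(e·r)), where e is the number of environments and r the number of replicates. The sketch below assumes this standard form. It is not taken from the META-R source, and the function name and example values are hypothetical.

```python
def broad_sense_heritability(var_g, var_e, n_rep, var_ge=0.0, n_env=1):
    """Entry-mean broad-sense heritability from variance components.

    H2 = var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))

    var_g:  genotypic variance
    var_ge: genotype-by-environment interaction variance
    var_e:  residual (error) variance
    With n_env=1 and var_ge=0 this reduces to the single-environment form.
    """
    if n_rep < 1 or n_env < 1:
        raise ValueError("n_rep and n_env must be at least 1")
    denominator = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    if denominator <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / denominator


# Hypothetical variance components, for illustration only.
h2_single = broad_sense_heritability(var_g=4.0, var_e=8.0, n_rep=2)
h2_combined = broad_sense_heritability(
    var_g=4.0, var_e=8.0, n_rep=2, var_ge=4.0, n_env=4
)
```

Under this form, averaging over more environments and replicates shrinks the interaction and residual terms in the denominator.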

(Burgueño et al. 2012; Guo et al. 2013; Jarquín et al. 2014; Lopez-Cruz et al. 2015; Zhang et al. 2015; Monteverde et al. 2018). In this study, the variance components of the random effects in each prediction model and the percentage of the total variance explained by each random effect were estimated (Supplementary Table

Figures 1 to 6. See the figure images for the legends.
al. 2010; Wang et al. 2017), qRfg2 (Zhang et al. 2012; Ye et al. 2018), qRfg3 (Ma et al. 2017) and Rgsr8.1 (Chen et al. 2017) for Gibberella stalk rot, Rpi1 (Yang et al. 2005), RpiQI319-1, RpiQI319-2 (Song et al. 2015), RpiX178-1 and RpiX178-2 for Pythium stalk rot (Duan et al. 2019), and Rcg1 (Yu et al. 2022;2019;Nyaga et al.Oakey et al. 2016. 2017)
Furthermore, moderate to high prediction accuracy for FSR severity was observed for the CML and DTMA populations. To increase the prediction accuracy, the effects of trait heritability, prediction model, marker density, genotype × environment (G×E) interaction, and the relationship between the training and testing populations on prediction accuracy were dissected in this study. The present study enhances the understanding of the genetic architecture of FSR resistance in tropical maize and increases breeding efficiency through an improved GP strategy.
One stable aggregated genomic region was detected on chromosome 1 at 250-251 Mb in all three populations (CML, DTMA, and CombinedPOP) in this study; it was the only one of the 13 regions that overlapped in all populations. Similarly, a recent study on FSR also detected a stable genomic region on chromosome 6 at 168 Mb, using a CAAM panel assembled at CIMMYT and consisting of 342 tropical/sub-tropical inbred lines bred in Asia (Rashid et al. 2022). Interestingly, near this region (162-168 Mb), six SNPs were detected by the mixed linear model in GWAS using the populations from the present study (Supplementary Table 3). The P-values of the six SNPs ranged from 2.88×10⁻⁵ to 7.07×10⁻⁴, and the PVE ranged from 2.60 to 6.13%. This indicated that the data, the analysis methods, and the results of the present study were reliable. Several QTLs associated with stalk rot resistance have been reported on chromosome 1: one QTL associated with Pythium stalk rot resistance (Song et al. 2015) was reported in bin 1.03; one genomic region detected in one of our previous studies, also conferring FSR resistance, was located in bin 1.06; and another study on Pythium stalk rot (Duan et al. 2019) reported one major QTL in bin 1.09, which is the QTL nearest to the stable region detected in this study. Unsurprisingly, however, none of them overlapped with one another. Differences in pathogens and inoculation methods could explain some of the lack of congruency, but the complexity of the trait cannot be ignored. In this aggregated stable region, many of the putative candidate genes were reported to be highly associated with
Sitonik et al. 2019; Kuki et al. 2020; Holland et al. 2020). In this study, the prediction accuracy for FSR resistance was ~0.40 with M3 under CV1, and increased to ~0.50 with M3 under CV2. The moderate to high prediction accuracy indicated that GP can be used in maize breeding to improve FSR resistance. Previous GP studies gave promising results in improving disease resistance in maize, although the prediction accuracy is strongly influenced by trait heritability, prediction model, marker density, genotype × environment (G×E) interaction, the relationship between the training and testing populations, etc. (Sitonik et al. 2019; Nyaga et al. 2019; Zhang et al. 2017). Extensive research has been conducted on evaluating the efficiency of GP in crop improvement for various target traits (Yu et al. 2022; Guo et al. 2020; Oakey et al. 2016
(Zhang et al. 2015;Werner et al. 2018lakoti et al. 2020). However, several previous studies also demonstrated the potential of low-density markers because of their cost-effectiveness (Zhang et al. 2015; Werner et al. 2018). In the present study, the highest prediction accuracy for FSR severity was achieved using 2105 significantly associated markers, which was higher than the accuracies estimated with the whole genome-wide SNPs, 197 SNPs, and 26 SNPs. A similar result was observed in a GP study in barley, where GP accuracy estimated with 2000 SNPs was relatively high and cost-effective (Abed et al.