Genetic Mapping and Genomic Prediction for Northern Corn Leaf Blight (Exserohilum turcicum (Pass.) Leonard and Suggs) Resistance

Background: Northern corn leaf blight (NCLB) of maize, caused by Exserohilum turcicum, is a serious foliar disease. Resistance to NCLB is complexly inherited, and the highly significant genotype × environment interaction makes selection of resistant genotypes difficult through conventional breeding methods. Hence, an attempt was made to identify the genomic regions associated with NCLB resistance and to perform genomic selection (GS) in two F2:3 populations derived from the crosses CM212 × MAI172 (Population-1) and CM202 × SKV50 (Population-2).
Results: Two populations, each comprising 366 progenies, were phenotyped at three different locations in disease screening nurseries. Linkage analysis using 297 polymorphic SNPs in Population-1 and 290 polymorphic SNPs in Population-2 revealed 10 linkage groups spanning 3623.88 cM and 4261.92 cM, with an average distance of 12.40 cM and 14.9 cM, respectively. Location-wise data and data pooled across locations indicated that QTL expression was population and environment specific. Genomic prediction accuracies of 0.83 and 0.79 were achieved for NCLB in Population 1 and Population 2, respectively. The resistant progenies from both populations were advanced to derive inbred lines and crossed with four different testers in a line × tester mating design to test their combining ability. High overall general combining ability was exhibited by 21 inbred lines. Among the crosses, 48 % were assigned high overall specific combining ability status. Out of 136 single crosses, seven recorded significant positive standard heterosis over the best check for grain yield. The clustering pattern of the inbred lines developed from the two populations revealed high molecular diversity.
Conclusions: In this study, comparatively better genomic prediction accuracies were achieved for NCLB and the worth of F 3 progenies with high genomic predictions was proved by advancing them to derive inbred lines and establishing their higher combining ability for yield and yield related traits. compared earlier accuracies of GS in for on the for NCLB accuracies on [55], and maize chlorotic mottle virus (0.32, 0.78, 0.47 and 0.21) [56] in maize. Higher prediction accuracies of 0.72 and 0.80 were reported for different agronomic traits in sugar beet [57].

Parents differed significantly in their reaction to NCLB (Table 2). The parents MAI 172 and SKV 50 showed a resistant reaction at all locations. The susceptible inbreds CM 212 and CM 202 recorded significantly higher disease incidence. Mean disease incidence was 57.92 %, 50.53 % and 57.93 % at Hassan, Mandya and Davanagere, respectively, and 55.34 % when pooled across locations. The widest range of disease incidence (7.00 to 99.00 % for Population 1 and 9.00 to 99.00 % for Population 2) was recorded at Hassan, followed by Davanagere (20.00 to 90.00 % and 20.00 to 92.00 % for Populations 1 and 2, respectively) and Mandya (20.00 to 81.00 % and 14.00 to 88.00 % for Populations 1 and 2, respectively). The pooled NCLB incidence ranged from 7.00 to 99.00 % for Population 1 and from 9.00 to 99.00 % for Population 2. The frequency distribution of F3 families for NCLB was negatively skewed and platykurtic at all three locations and across locations in Populations 1 and 2 (Table 2, Figs. 1 and 2). The estimates of phenotypic and genotypic coefficients of variation were high in both populations. High heritability and high genetic advance, a measure of genetic gain under selection, were observed at all locations and across locations (Table 2).

Construction of linkage map
Parental polymorphism survey using SNPs

QTL analysis
The QTLs were detected using the disease incidence data from three locations during the rainy season of 2014 and the data pooled across locations.
In population 1, six QTL positions were identified for NCLB resistance at the Hassan location. Out of three QTLs found at Davanagere, the QTL located on chromosome 4 explained 3.66 % of the phenotypic variation, and the other two QTLs, both located on chromosome 6, explained 6.32 and 4.53 %, respectively. In the combined QTL analysis, three QTLs were identified: the QTL located on chromosome 1 explained 3.63 % of the phenotypic variation, and the remaining two QTLs, located on chromosome 6, explained 11.08 and 3.69 %.
In population 2, three QTL positions were identified at Hassan (Table 3, Fig. 4). The QTL located on chromosome 8 was the major one, explaining 27.37 % of the phenotypic variation, followed by the second major QTL on chromosome 5 with 14.45 %. The third major QTL, located on chromosome 10, explained 15.58 % of the phenotypic variation. Only one QTL position was identified at Mandya, on chromosome 9, which explained 4.73 % of the phenotypic variation. The favorable allele for this QTL, which showed additive gene action, was contributed by the susceptible parent CM 202. At Davanagere, the first QTL, located on chromosome 4, explained 4.08 % of the phenotypic variation with a LOD of 3.84, and the second and third QTLs, located on chromosomes 5 and 10, explained 5.61 and 3.52 %, respectively. In the combined QTL analysis, two QTL positions were identified for NCLB resistance: the QTL located on chromosome 3 explained 6.55 % of the phenotypic variation, and the second QTL, located on chromosome 8, explained 10.85 %.

Genomic selection
Best linear unbiased predictors (BLUPs)
The genotypic data of 297 polymorphic SNPs in Population 1 and 290 polymorphic SNPs in Population 2 were used to estimate the effect of each marker using the RR-BLUP procedure. The estimated marker effects ranged from −0.0866 to 0.0857 for Population 1 and from −0.0101 to 0.0105 for Population 2 (Supplementary Table S1). The estimated effects were included in the model to predict the genomic estimated breeding value (GEBV) of each individual (Supplementary Table S2).
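To illustrate how shrunken marker effects translate into GEBVs, the following minimal Python sketch solves the ridge equations (Z'Z + λI)u = Z'(y − μ) for a toy data set and sums the marker effects for each individual. The marker coding (−1, 0, 1), the fixed shrinkage parameter `lam`, the toy values and all function names are assumptions made for illustration. In RR-BLUP proper, λ is derived from estimated variance components rather than chosen by hand, so this is a sketch of the idea and not the procedure used in the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_marker_effects(Z, y, lam):
    """Return (mu, u): the phenotypic mean and the shrunken marker effects."""
    n, p = len(Z), len(Z[0])
    mu = sum(y) / n
    # Left-hand side: Z'Z + lam * I
    lhs = [[sum(Z[i][a] * Z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    # Right-hand side: Z'(y - mu)
    rhs = [sum(Z[i][a] * (y[i] - mu) for i in range(n)) for a in range(p)]
    return mu, solve(lhs, rhs)


def gebv(Z, mu, u):
    """GEBV of each individual: mean plus the sum of its marker effects."""
    return [mu + sum(z * e for z, e in zip(row, u)) for row in Z]


# Toy data (hypothetical): 5 progenies x 3 markers coded -1/0/1.
Z = [[1, 0, -1], [0, 1, 1], [-1, -1, 0], [1, 1, 1], [-1, 0, -1]]
y = [62.0, 48.0, 55.0, 41.0, 70.0]
mu, u = ridge_marker_effects(Z, y, lam=1.0)
predictions = gebv(Z, mu, u)
```

Larger values of `lam` shrink every marker effect towards zero, which is why individual effects in such models are typically small even when the summed GEBVs separate the progenies well.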

Prediction accuracy and cross validation
All 366 F2:3 progenies from population 1 were cross validated 100 times using two different ratios of validation set to training set. The first case was two-fold cross validation, in which the validation and training sets were randomly drawn in equal proportions (1:1) each time. In this case the prediction accuracy of the GEBVs was 24 % (Table 4). The second case was five-fold cross validation, in which the population was divided into five equal parts and four parts were used as the training set against one validation set. In this case the prediction accuracy of the GEBVs was 26 %. The same cross validation was done for Population 2, where the GEBV prediction accuracies were 29 % and 32 % for the validation ratios of 1:1 and 1:5, respectively (Table 4, Figs. 5 and 6). The prediction accuracy measured as the correlation between the BLUEs and the GEBVs was positive (Table 5): Population 1 had a correlation of 0.79 and Population 2 a correlation of 0.83. However, when predictions were made using the top 10 per cent selections, the correlation between GEBVs and their BLUEs was reduced to 0.35 in Population 1 and 0.50 in Population 2.

The analysis of variance for combining ability with respect to 12 quantitative traits indicated that the crosses differed at a high level of significance for all the traits (Table 6). The variance due to crosses was further divided into variance due to lines, testers and line × tester interaction. The variance due to lines was significant for ear height, kernel rows per cob and kernels per row, whereas the variance due to testers was significant for days to 50 per cent anthesis, days to 75 per cent dry husk, plant height, ear height, kernel rows per cob, test weight and plot yield. The line × tester interaction variance was highly significant for all the traits. The GCA/SCA variance ratio was less than unity. Table S6).
The best single cross hybrids based on mean, sca effects and standard heterosis for grain yield, compared to the best commercial check DKC9144, were MAI-E2-72 × MAI105, MAI-E9-46 × NAI137, MAI-E2-70 × NAI137, MAI-E2-81 × V351, MAI-E9-220 × V351, MAI-E9-211 × NAI137 and CIL1218 × V351 (Table 7).

Diversity was assessed in the 45 inbreds, and most of the inbred lines could be discriminated with SSR markers. Out of 64 SSR markers tested, 35 were polymorphic for the genotypes studied (Supplementary Table S7). The polymorphism information content (PIC) ranged from 0.11 (bnlg1327) to 0.85 (umc1951) with a mean value of 0.45. Allelic frequency ranged from 0.52 to 0.93. Markers umc1139, mmc0111 and umc2269 showed the highest allelic frequency (0.93), followed by umc1080 (0.90), and the lowest was observed for bnlg1449 (0.52). Gene diversity ranged from 0.24 (bnlg1942) to 0.50 (umc1139, bnlg1063, umc1085 and bnlg1887) with a mean value of 0.46.
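The marker statistics reported above can be illustrated with a short Python sketch. It assumes the commonly used definitions, gene diversity H = 1 − Σp_i² and PIC = H − Σ_{i<j} 2p_i²p_j², where p_i is the frequency of allele i at a locus. The text does not state which formulas or software produced the reported values, so the definitions, the function names and the allele frequencies below are assumptions for illustration only.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity at a locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content: gene diversity minus the pairwise correction term."""
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


# Hypothetical allele frequencies at two SSR loci (each list sums to 1).
two_alleles = [0.5, 0.5]
four_alleles = [0.4, 0.3, 0.2, 0.1]

h_two, pic_two = gene_diversity(two_alleles), pic(two_alleles)
h_four, pic_four = gene_diversity(four_alleles), pic(four_alleles)
```

Under these definitions PIC never exceeds gene diversity, and a locus with only two alleles cannot have a gene diversity above 0.50.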

Clustering of inbreds based on SSR marker data
The UPGMA-based dendrogram was obtained from the DNA profiles of the samples analysed (Fig. 7). The forty-five maize inbreds were clustered using Jaccard's similarity coefficients, which ranged from 0.17 to 0.81, and four main groups were observed at a similarity coefficient of 0.41. Cluster CI consisted of 21 inbred lines, followed by cluster CII (16) and cluster CIII (6), and cluster CIV had the fewest inbred lines (2).

Discussion
Parents differing in NCLB infection were used to develop the mapping populations, which was reflected in significant differences among the progenies as revealed by the analysis of variance at each location and over the three locations. Most progenies in both populations showed moderate resistance to NCLB. The disease pressure varied across locations, and the Hassan location had the highest disease incidence compared to Davanagere and Mandya. The genotype × environment interaction component was significant, indicating the influence of environment on the expression of NCLB. The data from the three locations were pooled because Bartlett's test showed homogeneity of the error mean sum of squares for NCLB data across locations.

Distribution of the F 3 progenies with respect to NCLB incidence
Descriptive statistics are commonly employed for a better understanding of the breeding material. The nature of gene action [21] and the number of genes responsible for the trait [22] are indicated by the coefficients of skewness and kurtosis, respectively. The frequency distribution of 366 F3 progenies from the two crosses CM 212 × MAI 172 and CM 202 × SKV 50 was non-normal. A negatively skewed distribution was observed at all locations and in the pooled data. The distribution of disease expression was skewed towards the susceptible parent CM 212, which denoted the dominance of susceptibility. However, the distribution was made near normal through arcsine transformation of the per cent disease incidence data. A near normal distribution of phenotypic data in F2:3 populations was reported by several workers [7,9,23,24], with negative skewness for the Ht1 gene [25].
The involvement of a relatively large number of segregating genes with epistasis in the inheritance of resistance to NCLB was indicated by the platykurtic and skewed distribution of the F2:3 populations [26-29].
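The two coefficients discussed above can be computed directly from family means. The following Python sketch uses the moment-based definitions (skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2² − 3, where m_k is the k-th central moment); whether the study used these or bias-corrected estimators is not stated, so this choice, the function names and the incidence values are assumptions for illustration.

```python
def central_moment(values, order):
    """k-th central moment of a sample (divisor n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Negative values indicate a longer tail towards low scores."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Negative values indicate a platykurtic (flatter than normal) distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical per cent disease incidence of eight F3 families.
incidence = [20.0, 40.0, 60.0, 70.0, 80.0, 85.0, 90.0, 95.0]
g1 = skewness(incidence)
g2 = excess_kurtosis(incidence)
```

For this toy sample both coefficients are negative, the same pattern (negatively skewed and platykurtic) that is reported for the two populations.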
Genetic parameters in F2:3 mapping populations
The phenotypic coefficient of variation (PCV) was higher than the genotypic coefficient of variation (GCV), reflecting the direct possibility of selecting resistant phenotypes for NCLB [30-32]. High heritability coupled with high genetic advance over mean also indicated the scope for selecting disease resistant genotypes in these populations [7,10,30,33-35]. The Mandya location had comparatively lower heritability and genetic advance, which could be due to low disease incidence.
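As a worked illustration of how these genetic parameters relate to the variance components, the sketch below assumes the widely used entry-mean formulas: σ²p = σ²g + σ²ge/l + σ²e/(r·l), PCV = 100·√σ²p / mean, GCV = 100·√σ²g / mean, h² = σ²g / σ²p, and genetic advance GA = k·h²·√σ²p with k = 2.06 (5 % selection intensity). The source does not spell out its formulas, so these definitions, the function name and every numeric input below are assumptions.

```python
from math import sqrt


def genetic_parameters(var_g, var_ge, var_e, mean, n_loc, n_rep, k=2.06):
    """PCV, GCV, broad-sense heritability and genetic advance as per cent of mean."""
    # Phenotypic variance of a family mean across locations and replicates
    var_p = var_g + var_ge / n_loc + var_e / (n_loc * n_rep)
    h2 = var_g / var_p
    ga = k * h2 * sqrt(var_p)  # expected genetic advance in trait units
    return {
        "PCV": 100.0 * sqrt(var_p) / mean,
        "GCV": 100.0 * sqrt(var_g) / mean,
        "h2": h2,
        "GAM": 100.0 * ga / mean,
    }


# Hypothetical variance components and mean (not values from the study).
params = genetic_parameters(var_g=120.0, var_ge=30.0, var_e=60.0,
                            mean=55.0, n_loc=3, n_rep=2)
```

Because σ²p contains σ²g plus non-genetic terms, PCV is always at least as large as GCV under these definitions, and a small gap between the two corresponds to high heritability.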

QTL mapping for resistance to NCLB
The foremost requirement for QTL analysis is the availability of mapping populations developed from parents with contrasting reactions, and this was clearly accomplished in the present study. Both populations showed a near normal distribution of F3 progenies, suggesting their reliability for identifying QTLs for resistance to NCLB. Since substantial environmental variation was observed, each environment was analysed separately and a pooled analysis was done to estimate the overall QTL effects. Most of the QTLs were environment specific. The QTLs were identified on chromosomes 1, 3, 4, 6, 7, 8, 9 and 10, and the UMC reference map of maize [36] aids in the comparison of QTL positions across experiments and diverse genetic backgrounds [39,40]. The major QTLs detected on chromosomes 3, 7 and 10 were also reported earlier [8,10,23]. These authors also reported minor QTLs on chromosomes 3, 4, 6, 8 and 9, as in the present study. The NCLB resistance alleles present in the resistant parents could be used effectively in MAS to support maize breeding programs [8,9,23,35,41,42]. Since QTL expression was population specific, fine mapping could be an option to clarify this. The role and nature of QTL × environment interaction is also crucial, as most of the QTLs were environment specific; it deserves more attention, critical analysis and a deeper understanding of its molecular basis for the benefit of disease resistance breeding.

Estimation of Genomic Estimated Breeding values (GEBVs) of the selection candidates for the NCLB resistance and prediction of GS accuracies
All methods employed in genomic selection efficiently utilize genome-wide markers to predict a trait with enough accuracy to enable selection on that prediction alone. These developments enable a potential acceleration of the breeding cycle and an increase in selection intensity [43,44]. It is possible to reduce a four-year breeding cycle that includes three years of field testing to only the four months needed to grow and cross a plant. GS also helps in the evaluation of thousands of selection candidates without ever taking them out to the field. However, field trials are still very much a part of a breeding program using GS, although selection is not based on the phenotypes. Phenotypes are only used to train a prediction model [45].

Higher variation in the inbred lines was evident from the wide range of PIC values, and the mean PIC value was comparable with previous findings [72,73]. The inbred lines derived from the two crosses E2 (CM212 × MAI172) and E9 (CM202 × SKV50) were distributed across all four clusters, indicating the presence of molecular diversity among the inbred lines developed.

Conclusion
In this study, comparatively better genomic prediction accuracies were achieved for NCLB and the worth of F 3 progenies with high genomic predictions was proved by advancing them to derive inbred lines and establishing their higher combining ability for yield and yield related traits.

Plant material
The plant material required for the present study was developed at the Zonal Agricultural Research Station (ZARS), V.C. Farm, Mandya, Karnataka, India, using two resistant inbreds (MAI172 and SKV50) and two susceptible inbreds (CM212 and CM202). Two F1s (CM212 × MAI172 and CM202 × SKV50) were produced during summer 2013. They were grown during the rainy season of 2013 and selfed. Plots were 4-meter single rows with a spacing of 60 cm between rows and 20 cm between plants. The susceptible checks for NCLB, CM202 and the Pioneer Hibred Seeds commercial hybrid P3522, were planted after every 10th row to assess the disease pressure and to serve as spreader rows. Artificial inoculation was used to create epiphytotic disease conditions.

Creation of artificial epiphytotic condition
For uniform disease development, artificial inoculation was carried out [74]. Diseased leaves were obtained from the field and washed three times with sterile distilled water. These leaf tissues were cultured on potato dextrose agar medium to obtain the pathogen inoculum. Sorghum seeds soaked overnight were transferred to sterilized conical flasks and the pathogen inoculum was added. The flasks were shaken once every two days, and after one week an equal amount of fresh sorghum seeds was added. The infected sorghum seeds were ground to a fine powder, and 1 to 1.5 g of the ground inoculum was applied to the leaf whorl. A light spray of water was given to create humidity, and the inoculum was sprayed 20 days after sowing between 3.00 and 6.00 PM. Another inoculum spray was made after one week.

Disease scoring methodology
Northern corn leaf blight severity was recorded at flowering (60 days after planting) and at the dough stage (80 days after sowing) using a standard scale ranging from 1 (susceptible) to 9 (resistant) (www.pioneer.com). Disease severity was assessed based on lesion development on the middle to upper leaves.
(Rating scale table with columns: Score, % Leaf loss, Remarks.)
The disease scores were converted into per cent disease severity for data analysis using the following formula [75].

Statistical analysis
Since the data did not follow a normal distribution, the arcsine transformation was applied [76] with the expectation that the means and variances would become independent and the data normally distributed. The analysis of variance was conducted on the transformed phenotypic data using the PROC GLM procedure of SAS version 9.3. Bartlett's test was used to test for homogeneity of the data obtained from the three environments before combining them [77]. The analysis of variance was performed considering seasons, replicates and F2:3 families as random effects in the statistical model. Transformed entry means were used to carry out the combined analyses of variance and covariance across seasons [78]. Estimates of the variance components σ²g (genotypic variance), σ²ge (G × E interaction variance) and σ²e (error variance) of the F2:3 families were computed [79] (Searle, 1971) ... (Endelman, 2011). In step 1, we estimated the environment-wise adjusted phenotypic values of each F3 progeny using ASReml version 3.0. The adjusted phenotypic values in each environment were then used for genomic prediction. We estimated combined BLUEs across environments using the adjusted phenotypic values. The BLUEs were then used as phenotypes to fit the whole-genome prediction model.
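The arcsine transformation applied to the per cent disease incidence data before the analysis of variance can be sketched in Python as follows. The sketch assumes the usual angular form, θ = arcsin(√(p/100)) expressed in degrees; the source does not state whether degrees or radians were used, and the function name is hypothetical.

```python
from math import asin, degrees, sqrt


def arcsine_transform(percent):
    """Angular transformation of a percentage (0-100), returned in degrees."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    return degrees(asin(sqrt(percent / 100.0)))


# Incidence values quoted in the results (pooled minimum, pooled mean, maximum).
raw = [7.00, 55.34, 99.00]
transformed = [arcsine_transform(p) for p in raw]
```

The transformation stretches values near 0 and 100 % relative to mid-range values, which is what weakens the dependence of the variance on the mean for percentage data.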

GS Prediction Accuracies
Evaluating GEBV accuracy through cross validation (CV)
GS studies on empirical data use cross validation (CV), in which the data are split into a training set and a validation set. We employed two-fold and five-fold CV in this study. In five-fold CV, the data set was randomly divided into five sets; four sets were combined to form the training set and the remaining set was used as the validation set. In two-fold CV, the observations were divided equally into training and validation sets. In either case, a model was developed using the training data and GEBVs were calculated using the genotypes of the validation set. The sampling of training and validation sets was repeated 100 times. The estimated marker effects were used to predict the genomic breeding values of each family across locations. The data sets were divided into a training set (TS), used to estimate marker effects, and a validation set (VS), in which the predictive ability (Pearson correlation rMP) between the observed BLUEs and the predicted genotypic values was estimated as a measure of prediction accuracy. Correlations were estimated either as accuracy of prediction (rp = rMP) or as standardized accuracy of prediction.

with two replications during the rainy season of 2018. Each genotype was sown in a single row of 3 m length with 20 cm between plants and 60 cm between rows. The data recorded on 12 morphological characters were analysed for combining ability and heterosis. Five plants of each genotype were tagged randomly in both replications and used to record observations. The mean of the five plants was used for the combining ability analysis [89]. Combining ability analysis helps in identifying superior lines to be used in breeding programs and in identifying promising cross combinations for the development of varieties [90].
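The repeated cross-validation scheme described in this section (random k-fold splits, with the Pearson correlation between predicted and observed values in the validation set as the accuracy measure) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the data are simulated, `marker_sum_predictor` is a deliberately simple hypothetical stand-in for the RR-BLUP model, and every fold serves once as the validation set in each repetition, which may differ in detail from the sampling used in the study.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k near-equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def marker_sum_predictor(X_train, y_train, X_test):
    """Hypothetical stand-in model: regress phenotype on the sum of marker codes."""
    s = [sum(row) for row in X_train]
    ms, my = sum(s) / len(s), sum(y_train) / len(y_train)
    slope = (sum((a - ms) * (b - my) for a, b in zip(s, y_train))
             / sum((a - ms) ** 2 for a in s))
    return [my + slope * (sum(row) - ms) for row in X_test]


def repeated_cv(X, y, fit_predict, k, repeats=100, seed=1):
    """Mean validation-set correlation over `repeats` random k-fold splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        for fold in k_fold_indices(len(y), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in fold])
            accuracies.append(pearson(preds, [y[i] for i in fold]))
    return sum(accuracies) / len(accuracies)


# Simulated toy data (not the study's data): 60 progenies x 10 markers.
data_rng = random.Random(42)
X = [[data_rng.choice([-1, 0, 1]) for _ in range(10)] for _ in range(60)]
y = [2.0 * sum(row) + data_rng.gauss(0.0, 1.0) for row in X]

acc_two_fold = repeated_cv(X, y, marker_sum_predictor, k=2)
acc_five_fold = repeated_cv(X, y, marker_sum_predictor, k=5)
```

Any function with the same `(X_train, y_train, X_test)` signature can be passed as `fit_predict`, so a ridge-regression model could replace the stand-in without changing the validation loop.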
General combining ability (GCA) describes the breeding value of a parent and is generally associated with additive genetic effects, while specific combining ability (SCA) is the relative performance of a cross that is associated with non-additive gene action, predominantly contributed by dominance, epistasis, or genotype × environment interaction effects [91,92]. Therefore, both gca and sca effects are important in the selection or development of breeding populations [93]. Heterosis over the best commercial check (standard heterosis) was computed [94,95].
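To make the quantities above concrete, the following Python sketch computes gca effects, sca effects and standard heterosis from a table of cross means. It assumes the usual line × tester definitions (gca of a parent = its mean over crosses minus the grand mean; sca of a cross = cross mean − line mean − tester mean + grand mean) and standard heterosis = 100 × (F1 − check) / check. The function names and the yield values are hypothetical, and the sketch omits the significance tests that accompany a full analysis.

```python
def combining_ability(table):
    """table[i][j] is the mean of the cross between line i and tester j."""
    n_l, n_t = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_l * n_t)
    line_means = [sum(row) / n_t for row in table]
    tester_means = [sum(table[i][j] for i in range(n_l)) / n_l for j in range(n_t)]
    gca_lines = [m - grand for m in line_means]
    gca_testers = [m - grand for m in tester_means]
    # sca: deviation of a cross from what the two parental gca effects predict
    sca = [[table[i][j] - line_means[i] - tester_means[j] + grand
            for j in range(n_t)] for i in range(n_l)]
    return gca_lines, gca_testers, sca


def standard_heterosis(f1_mean, check_mean):
    """Per cent superiority of a hybrid over the best commercial check."""
    return 100.0 * (f1_mean - check_mean) / check_mean


# Hypothetical grain yield means for 2 lines x 2 testers.
yield_means = [[6.0, 7.0],
               [8.0, 11.0]]
gca_lines, gca_testers, sca = combining_ability(yield_means)
best_cross_heterosis = standard_heterosis(11.0, 10.0)
```

By construction the gca effects of the lines sum to zero, as do those of the testers and the sca effects within any row or column, which is a convenient check on a hand calculation.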
Overall general combining ability (gca) status of parents, specific combining ability (sca) and heterotic status of crosses
The gca status of parents and the sca or heterosis of crosses were estimated and expressed in the desirable direction. The overall status of a parent or cross with respect to gca and sca or heterosis over a number of characters was assessed [96,97].
Assessment of molecular diversity in the inbred lines using SSR markers
Genomic DNA from the 45 inbreds developed was extracted from the leaves of three-week-old maize seedlings using a modified CTAB procedure [98]. The genotyping of the inbred lines was carried out with 64 SSR markers. For each primer, the amplified DNA fragments were scored as 1 or 0 based on whether the DNA fragment was present or absent. The binary data thus obtained were used to estimate pairwise similarity coefficients [99]. The NTSYS software was employed to subject the resulting similarity matrix to hierarchical cluster analysis using the Unweighted Pair Group Method with Arithmetic average (UPGMA) algorithm [100].
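The clustering workflow described above (binary band scores, pairwise similarity, UPGMA) can be sketched in plain Python as follows. The sketch assumes Jaccard's coefficient, which the results section names, and it is an independent illustration rather than a reproduction of the NTSYS analysis; the band profiles, inbred labels and function names are hypothetical.

```python
def jaccard(u, v):
    """Jaccard similarity of two 0/1 band profiles: shared bands / bands present in either."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def upgma(names, profiles):
    """Agglomerative average-linkage clustering; returns merges as (a, b, similarity)."""
    clusters = {name: [i] for i, name in enumerate(names)}  # label -> member indices
    merges = []
    while len(clusters) > 1:
        best = None
        labels = list(clusters)
        for x in range(len(labels)):
            for y in range(x + 1, len(labels)):
                a, b = clusters[labels[x]], clusters[labels[y]]
                # UPGMA: arithmetic mean over all between-cluster pairs
                sim = (sum(jaccard(profiles[i], profiles[j]) for i in a for j in b)
                       / (len(a) * len(b)))
                if best is None or sim > best[2]:
                    best = (labels[x], labels[y], sim)
        a, b, sim = best
        clusters["(" + a + "," + b + ")"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, sim))
    return merges


# Hypothetical band scores (1 = present, 0 = absent) for four inbreds.
names = ["A", "B", "C", "D"]
profiles = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
]
merges = upgma(names, profiles)
```

Each entry of `merges` records the similarity level at which two groups join, which corresponds to the height of a node when the dendrogram is drawn on a similarity axis.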