Genome resequencing facilitates high-resolution exploration of a maize quantitative trait locus for resistance to aflatoxin accumulation

Aflatoxin contamination is a major threat to maize production in the southeastern United States. Screening for genetic resistance to aflatoxin has led to the identification of aflatoxin-resistance quantitative trait loci (QTL) in maize inbred lines. However, these QTLs typically span large DNA regions, making identification of actual resistance-associated sequences difficult. In this study, we took the portion of the maize B73 genome at chromosome bin 4.06 (APG v3) containing a 22-Mbp QTL (QTL-4.06) associated with aflatoxin resistance and used it as a reference to identify single-nucleotide polymorphisms (SNPs) and insertion/deletion variants (INDELs) that differ between resistant (Mp313E and Mp715) and susceptible (B73 and Va35) maize inbred lines. Our study provides a rich list of SNPs and INDELs that can be used as markers in the fine-mapping of candidate regions in QTL-4.06 and demonstrates the great potential of resequencing in generating higher-density molecular maps that can be leveraged in molecular breeding-based crop improvement.


Introduction
Aflatoxins are carcinogenic fungal secondary metabolites most notably produced by the pathogenic fungus Aspergillus flavus (Castegnaro and McGregor 1998;Robens and Cardwell 2005).Pre-harvest maize plants are prone to A. flavus infection and subsequent aflatoxin contamination in climates with high heat and humidity (Gourma and Bullerman 1995;Payne 1992;Windham and Williams 1998a, b).An ideal approach to protect maize production is to identify natural maize A. flavus/aflatoxin genetic resistance and breed for resistant maize lines (Williams and Windham 2005;Windham andWilliams 1998a, b, 2002).Characterization of the genetic variants related to maize host plant resistance/susceptibility to A. flavus infection and aflatoxin accumulation is important in the development of resistant maize germplasm.
Studies on maize genetic variation and levels of aflatoxin accumulation have led to the identification et al. 2013;McKernan et al. 2009).In maize, wholegenome resequencing of Mp313E, Mp715, and Va35 and alignment of sequences with a quality de novo genome sequence (e.g., B73) will allow genome regions harboring resistance QTL to be compared base by base with the same loci in susceptible lines.Investigation of whole-genome sequences can reveal the genetic structure of QTL and benefit maize-breeding projects.The objectives of this study were to utilize NGS to conduct a comparative genome sequence analysis of a maize QTL region.Through genome sequence analysis, DNA markers in high density can be developed to facilitate the maize-breeding process and increase host plant resistance through markerassisted breeding.

Plant materials
Plant materials used in this study were obtained from the United States Department of Agriculture (USDA), Agricultural Research Service (ARS), and Corn Host Plant Resistance Research Unit (USDA-ARS-CHPRRU) at Mississippi State, Mississippi.Three maize inbred lines-Mp313E, Mp715, and Va35were used in this experiment.Mp313E and Mp715 are inbred lines that carry the resistance to aflatoxin accumulation in maize kernels, whereas Va35 is a maize inbred line with good agronomic traits but with susceptibility to A. flavus infection and aflatoxin accumulation.The maize inbred lines were planted at the R. R. Foil Plant Science Farm at Mississippi State University in Starkville, MS, USA.The experimental design was a randomized complete block design including three replications.All primary ears were self-pollinated.Developing kernels were collected from primary ears 21 days after pollination (Kelley et al. 2012).The maize kernel samples were collected and immediately flash-frozen with liquid nitrogen in the field.The samples were then stored at − 80 °C.

DNA sample preparation
To prepare DNA samples for whole-genome sequencing, the Qiagen DNeasy Plant Maxi Kit (Qiagen, Germantown, MD, USA) was used for DNA extraction.The frozen maize kernels were ground to a powder with a mortar and pestle in the presence of liquid nitrogen.DNA was extracted from ground samples following the manufacturer's protocol with minor modifications.Buffer AP1 (400 μl) was added to 200 mg of ground maize samples.The samples were vortexed, RNase A (4 μl) was added, and the samples were mixed well.The samples were incubated at 65 °C for 10 min and then centrifuged (14,000 g for 1 min) to pellet debris.Buffer P3 was added to the supernatants and solutions were gently mixed to prevent damage to DNA.The samples were incubated on ice for 5 min to precipitate protein, polysaccharides, and other impurities.The lysates were centrifuged and each supernatant was poured into a QIAshredder spin column.The subsequent steps were carried out as described in the Qiagen DNeasy Plant Handbook (https:// www.qiagen.com/ us/ resou rces/).DNA was eluted from the DNeasy columns by adding 50 μl of AE buffer.

Next-generation sequencing
High-molecular-weight, RNase-treated genomic DNA samples were prepared with the Qiagen DNeasy Plant Maxi kit and quantified by NanoDrop (Fisher Scientific Company LLC, Hanover Park, IL, USA).At least 5 µg (in 100 µl AE buffer) purified DNA sample for each of the three maize inbred lines (Mp313E, Mp715, and Va35) was prepared for the whole-genome next-generation sequencing.DNA samples were submitted to the genome sequencing service at Georgia Genomics Facility (GGF) (https:// dna.uga.edu/).The GGF prepared the libraries using TruSeq DNA-PCR free library prep kits (Illumina, San Diego, CA, USA), performed the cluster generation, conducted 150-bp paired-end sequencing using an Illumina platform 2 × 150 bp (PE150) High Output Flow Cell, generating paired-end reads (2 × 151 bp).Adapters were removed in silico by the GGF to generate 'raw reads' ready for the subsequent genome sequence alignment and mapping analysis (Kircher et al. 2012).

Aligning and mapping of sequencing reads
The large volume of raw Illumina sequencing reads was obtained in the fastq format.Genomic sequence from maize B73 genome sequences (AGP v3) (Schnable et al. 2009) was downloaded from the MaizeGDB website (https:// www.maize gdb.org) and used as the reference against which re-sequenced genomes would be compared.Bowtie2 (v2.1.0-macos-86-64)(Langmead and Salzberg 2012) was used to align sequencing reads to the B73 genome QTL-4.06 region sequence using the pipeline functions as described in the software manual (both software and manual were downloaded from SourceForge website: http:// bowtie-bio.sourc eforge.net/ bowti e2/ index.shtml).Command-line programs were run on a Mac OS X (× 86) platform.Sequencing reads were mapped to the reference maize B73 genome QTL-4.06 sequences with default parameters for paired-end alignment.Mapped sequencing reads were then processed to convert them from SAM files to sorted and indexed BAM files using SAMtools (Li 2009(Li , 2011a(Li , 2011b) (v1.2, http:// www.htslib.org/).BCFtools was also used to call variants (DNA polymorphisms) among the maize inbred lines Mp313E, Mp715, Va35, and the reference B73.Sorted and indexed BAM files were viewed with the Integrative Genomics Viewer (Robison et al. 2011(Robison et al. , 2017;;Thorvaldsdottir et al. 2013) (IGV_2.3.91,https:// softw are.broad insti tute.org/ softw are/ igv/) to directly visualize and compare the genomic sequences side-by-side on the maize QTL-4.06 region from maize inbred line Mp313E, Mp715, Va35, and B73.

Variant calling and filtering
Variant calling is the process of extracting singlenucleotide polymorphisms (SNPs) and other genetic variants from BAM files and recording the variant data in Variant Call Format (VCF) files (Danecek et al. 2011;Li 2011aLi , 2011b;;Robinson et al. 2011).Variant calling was performed by first using SAMtools (v1.2) 'mpileup' function and then by using BCFtools (v1.data (Knaus and Grunwald 2016;2017).Variants in raw VCF files were then subjected to extensive filtering based on the quality scores to reduce the number of low-quality SNPs and INDELs.In this study, variant filtering was performed based on the quality scores for read depths (DP) and sequencing reads quality (QUAL), using an R program (https:// www.rproje ct.org/) we developed.To analyze the chromosome positions of the identified SNPs and INDELs in relation to those of the genes in the specific genomic region, all known genes and their chromosome positions from maize B73 chromosome 4.06 region were downloaded from the MaizeGDB website (https:// www.maize gdb.org/) and integrated into the final data table (Supplemental Table 1).

Next-generation sequencing
Illumina NextSeq sequencing was performed on maize inbred lines Mp313E, Mp715, and Va35 in a multiplexed setting with the DNA of each maize inbred line labeled with its own Illumina adapter.The yield of Illumina NextSeq data was 138.24 Gbp, with %PF at 85.09%, % ≥ Q30 at 74.08%, and %error rate at 1.08%.After demultiplexing and removal of low-quality reads, there were 143,740,380 pairedend reads (each read 151 bp long) for Mp313E, 150,377,954 paired-end reads for Mp715, and 142,946,537 paired-end reads for Va35.The average genome equivalent coverage for each of the maize inbred lines was 15 × coverage based on a maize genome size of 1C = 2.3 Gb.The Illumina sequencing reads were obtained in fastq format.

Mapping of sequencing reads to maize reference B73 genome chromosome QTL region
In this study, we focused on the identification of SNPs and INDELs in a maize chromosome QTL region that is strongly associated with resistance to aflatoxin accumulation.The genomic sequence for the QTL-4.06region (a segment of 22 Mbp at chr4:151,000,000.. 173,000,000, AGP v3) from the maize B73 genome was used as the reference sequence for the assembly of resequencing reads.B73 genome version APG v3 is used because it has well-curated gene annotation.Our strategy was to take advantage of the memory-efficient sequencing alignment tools of Bowtie 2 to process only the target regions of interest in the genome.Specifically, genomic sequences for maize inbred lines Mp313E, Mp715, and Va35 were assembled, aligned with the B73 QTL-4.06 region, displayed along with their Illumina sequencing reads, and stored in SAM/ BAM files for further variant calling.The BAM files for maize inbred lines were viewed with the Integrative Genomics Viewer (IGV_2.3.91) for side-by-side comparisons of the genome sequences.Large numbers of SNPs were revealed in the QTL-4.06region among maize-inbred lines Mp313E, Mp715, Va35, and B73 (Fig. 1).

Variant calling and filtering for high-quality SNPs and INDELs
Genetic variants (including SNPs and INDELs) and associated data were extracted from the sequencing mapping (SAM/BAM) files to generate VCF files.From our study, a total of 348,571 genetic variants were extracted from the 22-Mbp maize QTL-4.06 region.Figure 2 is an example visualization of the raw VCF variant data along the first 2-Mbp segment at the beginning of the QTL-4.06showing the distribution of the DP, QUAL, and MQ quality scores obtained through variant calling.The distribution of the DP, QUAL, and MQ quality scores along with genomic data provided a quick survey of the quality of the genetic variants.Such a survey was necessary for the determination of criteria for variant filtering to maximize the number of true genetic variants and minimize error.In this study, the raw VCF file was filtered stepwise, first by DP ≥ 500 and then by QUAL ≥ 600.By stepwise filtering, low-quality variants with scores below the thresholds were removed from the downstream analysis (Fig. 3).A total of 44,229 genetic variants (SNPs and INDELs) met our filtering thresholds.Figure 3 shows the distribution of the DP and QUAL scores of all 348,571 raw variants before variant filtering and the changes in MQ scores before and after variant filtering.In practice, we only used very high QUAL scores (999) as the threshold for mapping quality because there were plenty of variants with high QUAL and DP values (Fig. 3).The genotypes on each variant site were also extracted Examination of the 44,229 variants in the maize inbred lines revealed that Mp313E, Mp715, and Va35 shared more common genotypes at these variant sites and were more distant to B73 in the QTL-4.06region.Subsequently, we screened for variants polymorphic among Mp313E, Mp715, and Va35 maize inbred lines, in addition to being polymorphic to B73, and obtained 1081 genetic variants.Figure 4 shows a survey of the distribution of MQ scores in the set of 1081 polymorphic genetic variants.Supplemental Table 1 1 describes a variant, which is a SNP, the reference allele (from B73) being 'A' and the alternate allele being 'G'.The genotypes at this chromosome position for the maize inbred lines were listed under the three fields labeled with 'Mp313E', 'Mp715', and 'Va35'.SNPs were converted from the character code into integer code as 0 and 1, representing 'REF' and 'ALT', respectively.The genotype in 'Mp313E' '0/1' means that Mp313E is heterozygous 'A/G' at this position.The genotype of 'Va35' '1/1' means Va35 is homozygous for the alternate allele 'G/G' at this position.This data line also includes quality scores (DP = 3111 and QUAL = 999) showing that this SNP was associated with high-quality scores.Similarly, on the data line labeled as SNP ID 2758, the variant is an INDEL.The reference allele at this variant site is 'TGAAG' and the alternate allele here is 'TGA AGA AG' (DP = 731 and QUAL = 999).We have established a workflow to identify large numbers of SNPs and INDELs by genome resequencing.As an example, we obtained 1081 highly polymorphic SNPs and INDELs along with 311 genes providing the genomic variation data spanning the 22-Mbp maize QTL-4.06 region (Supplemental Table 1).It is worth noting that we identified eight SNPs that distingush the two resistant corn inbred lines from the two susceptible corn inbred lines, providing important DNA markers for future breeding aoolications.

Discussion
Maize resistance to aflatoxin accumulation is a quantitative trait controlled by multiple genetic and environmental factors.QTL mapping studies on two populations developed from Mp313E × B73 and Mp313E × Va35, respectively, revealed common maize chromosome regions (Brooks et al. 2005;Davis et al. 2000;Willcox et al. 2012).Since both QTL mapping populations utilized the resistant maize inbred line Mp313E by crossing with two different susceptible maize inbred lines (Va35 or B73), the common QTL regions identified from Mp313E can be considered with high significance (Brooks et al. 2005; Davis et al. 2000;Willcox et al. 2012).The molecular genetic map of the population derived from Mp313E xVa35 was created using only 103 available RFLP and SSR DNA markers (Willcox et al. 2012).The Mp313E × B73 map was only based on 85 available SSR markers to cover 1553 cM with a mean interval of 18.1 cM.Areas involved in resistance to aflatoxin accumulation were contributed from Mp313E in both studies (Brooks et al. 2005).For example, the QTL-4.06region of Mp313E was identified in both studies as a major QTL that significantly contributed to reduced aflatoxin levels across different environments (Brooks et al. 2005;Willcox et al. 2012).However, due to the lack of sufficient DNA markers at the time, chromosome regions harboring resistance QTL are often large, making it difficult to use the QTL in breeding.Therefore, highdensity DNA markers like those found in this study are needed to break into large intervals and precisely pinpoint the DNA sequence(s) underlying the trait of interest.Increasing the density of DNA markers per QTL region is important to identify the resistance to aflatoxin accumulation, which is the priority of our genome sequencing analysis.Whole-genome resequencing has proven to have great potential in the identification of large numbers of SNPs and INDELs, which can be utilized as DNA markers for breeding (Belkadi et al. 2015;Koboldt et al. 2013;McKernan et al. 2009).In this study, genome resequencing was conducted on two resistant maize inbred lines (Mp313E and Mp715) and one susceptible maize inbred line (Va35).Taking advantage of the reference genome B73 being one of our maize breeding parental lines, our study aimed to reveal maize SNPs and INDELs and increase the density of DNA markers in the previously identified QTL-4.06.Large amounts of genetic variants (SNPs and INDELs) were revealed from our analysis.Assembly of the genome sequences on maize QTL-4.06 for the four maize inbred lines revealed 348,571 variants spanning the 22-Mbp-long chromosome segment.This is in sharp contrast to the previous studies in which only two DNA markers flanking the 22-Mbp region were available (Brooks et al. 2005).Nevertheless, the large volume of SNPs adds a new level of complexity to the study of the QTL-4.06region.Rigorous filtering must be applied to remove lowquality SNPs and INDELs.In this study, we took a multi-step filtering strategy, first by the DP value (the number of sequencing reads identifying that variant) and then by the QUAL value (the likelihood of the alternate allele is misidentified, the larger value the better).A total of 44,229 variants with high-quality scores were obtained after filtering.We added a filter step that narrowed our pool to 1081 polymorphic genetic variants.The 1081 potential DNA markers from the 22-Mbp QTL-4.06 region on maize chromosome 4 will be very helpful in dissecting this significant QTL region carrying the resistance to aflatoxin accumulation.In this study, we also demonstrated a strategy to assemble and analyze sequencing reads in a specific genomic region at one time.Extraction of genetic variants for all genomic positions is timeconsuming and unnecessary for many applications.Therefore, it is advantageous and feasible to only focus on the target chromosome regions of interest.
From our research, we also observed the interestingly high level of heterozygosity of SNPs revealed in the QTL-4.06region among the corn inbred lines in our research.The heterozygosity in this region was revealed after we assembled and compared the genome sequencing data.We noticed significant numbers of 0/1 genotypes versus 0/0 and 1/1 genotypes of SNPs.The heterozygosity may indicate that the admixture in this region exist in these inbred lines.For example, these inbred lines may have higher levels of heterozygosity in certain regions due to hybridization or introgression events in the process of selection for traits.Theoretically, some genomic regions may be more prone to heterozygosity due to structural variation or other factors.It may be helpful if such regions contain important genes or regulatory elements that contribute to phenotypic variation or are involved in important biological functions such as fungal resistance.The study of the QTL-4.06region serves as a first step towards the analysis of the whole-genome sequencing data for resistant corn inbred lines.Since aflatoxin resistance is a complex trait controlled by multiple genes, eventually more genetic regions will be analyzed in the whole genome.Modern crop breeding relies on sufficient molecular markers, particularly for quantitative agronomic traits.The NGS-assisted crop breeding is very powerful as an enhanced version of the marker-assisted molecular breeding method.
2) functions to generate VCF files (v4.2, http:// samto ols.github.io/ bcfto ols/).VCF files usually contain a long list of variants, such as SNPs and INDELs, extracted from the aligned sequencing data along with the quality scores.A VCF file has a typical text format for listing genetic variant data, with fields labeled as 'CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO' and 'FORMAT', indicating the variant chromosome position (POS), reference allele (REF), alternative allele (ALT), various quality scores (QUAL and INFO), and the genotype call for each sample (FORMAT).The R package vcfR was used for the visualization of the raw VCF 104 Page 4 of 11 Vol:.(1234567890) from VCF files for maize inbred lines Mp313E, Mp715, Va35, and B73.

Fig. 1
Fig. 1 Genomic graph produced by the Integrative Genomics Viewer (IGV_2.3.91)showing a portion of assembled genome sequences side-by-side for the maize inbred lines in this study.The graph is composed of six tracks.The top track shows the scale of this chromosome segment, followed by the VCF track displaying the chromosome positions of the called variants in this region.The next three tracks display the assembled genome sequences of Mp313E, Mp715, and Va35 (from top

Fig. 2 Fig. 3
Fig. 2 Chromosome plot (a Chromoqc plot generated by R package 'vcfR') showing the raw VCF variant data along the first 2-Mbp segment at the beginning of the QTL-4.06region.This visualizes the chromosome positions of the variants extracted by variant calling and the distribution of the DP, MQ, and QUAL (Phred-scaled) quality scores associated with the called variants.It provides a quick view of the quality of the variants.Such a survey is necessary for the determination of criteria for subsequent variant filtering contains a list of 1081 polymorphic genetic variants with chromosome positions of known genes in the maize QTL-4.06 region.Table 1 is a simplified example showing major features of the genetic variant data to help navigation for data displayed in Supplemental Table 1.The first data line in Table 1 describes a gene starting from the chromosome position chr4: 151,098,011 and ending at the position chr4:151,109,982 in the maize QTL-4.06 region.The second data line in Table

Table 1
Example showing major features of the variant (SNPs and INDELs) data identified by variant calling methods