Identification of Accession-Specific Variants and Development of KASP Markers for Assessing the Genetic Makeup of Crop Seeds

Most crop seeds are F1 hybrids. Seed providers and plant breeders must be confident that the seed supplied to growers is of known, and uniform, genetic makeup. This requires maintenance of pure genotypes of the parental lines and testing to ensure the genetic purity of the F1 seed. Traditionally, seed testing for purity was done with a grow-out test (GOT) in the field, but these tests are time consuming and costly. Seed testing with molecular markers was introduced as a replacement for GOT early in the last decade. However, the markers available at that time could be inaccurate and could be used with only a small number of accessions or varieties due to the limited genetic information and reference genomes available. Recently, Kompetitive allele specific PCR (KASP) markers have emerged as promising tools for genetic testing of seeds. Here, we identified 4,925,742 SNPs in 50 accessions of the Brassica rapa core collection. Of these, a total of 2,925 SNPs were selected as accession-specific SNPs, considering the properties of the flanking regions harboring accession-specific SNPs and genic region conservation among accessions by NGS analysis. In total, 100 accession-specific SNPs were developed into accession-specific KASP markers. Based on the results of our validation experiments, the accession-specific markers successfully distinguished individuals from a mixed population including 50 target accessions from the B. rapa core collection and an outgroup. This study provides efficient methods for developing KASP markers to distinguish individuals from a mixture of breeding lines and germplasm using the resequencing data of Chinese cabbage (Brassica rapa ssp. pekinensis).



Background
Most growers of vegetable crops rely on F1 hybrid seeds, and suppliers of these seeds must maintain genetically pure stocks. Suppliers need seeds of known genetic makeup not only for sales but also for their ongoing breeding programs. Until the late 1990s, seed providers relied on what is known as the grow-out test (GOT), in which the seeds were planted in the field and the traits of the test plants were assessed by inspection [1].
However, this method is time consuming, requires a large amount of land, and is partly subjective, as plant phenotype can be affected by the environment [2]. Thus, seed providers seek precise and efficient tools to assess the genetic makeup and purity of hybrid seeds.
In response to these limitations of the GOT, various types of molecular markers have been developed and used to characterize the genotypes of crop plants. This endeavor began in the early 1990s and has resulted in the identification of numerous types of markers. These include restriction fragment length polymorphism, amplified fragment length polymorphism, simple sequence length polymorphism, simple sequence repeat (SSR), and sequence tagged site (STS) markers. The PCR-based SSR or STS markers can be rapidly acquired and are easy to assay, and they have been used for crop breeding or assessment of hybrid seeds in rice, maize, pigeon pea, and pepper [1][2][3][4]. However, these markers were developed for specific breeding lines or varieties and are not sufficient to assess the purity of hybrid seeds.
Application of molecular markers to a wide range of situations that require accurate assessment of the genetic makeup of a plant must entail investigating genetic variants in the core collections and commercial lines. Previously, investigation of genetic variants of core collections and commercial lines of crops was limited because of the expense of sequencing and the absence of reference genomes. With the advent of next-generation sequencing technology, reference genomes have been constructed for a number of crops, including tomato [5], pepper [6], cucumber [7], melon [8,9], wheat [10], and Chinese cabbage [11]. Whole-genome resequencing of various crops has also been undertaken. This has allowed the development of widely applicable molecular markers, accomplished by resequencing analyses of core collections. Also, the development of the Kompetitive Allele Specific PCR (KASP) genotyping assay has permitted the development of accession-specific markers for large-scale seed purity assessments [12][13][14].
Here, we present pipelines for the detection of accession-specific genetic variants and the development of accession-specific markers from 50 Chinese cabbage accessions.
The pipelines combine variant calling, detection of accession-specific variants, and determination of KASP marker candidate sequences. Accession-specific single nucleotide polymorphisms (SNPs) were identified from 50 accessions of the Chinese cabbage core collection, and 100 accession-specific KASP markers for the 50 accessions were developed from a pool of these SNPs. The KASP markers were then evaluated using the core collection and 35 non-core accessions. We have identified 100 KASP markers that we believe will be useful in assessing hybrid seed purity.

Detection and evaluation of accession-specific variants
We performed genome resequencing analysis of 50 accessions from the Brassica rapa core collection, with the goal of developing markers specific to each accession. This core collection is composed of four different groups: non-pekinensis, Chinese, Japanese, and Korean breeding lines (Fig. 1 and Supplementary Table 1). The reads from the analysis of these accessions were mapped to the Brassica rapa reference genome (ver 3.0) [11] with BWA-MEM (ver 0.1.17) using the default parameters. We detected a total of 4,925,742 SNPs from the 50 accessions (Table 1 and Supplementary Data 1). Our goal was to identify genetic variants from the B. rapa core collection. To this end, we constructed a variant-identification pipeline that combines the calling and filtering of variants (Supplementary Figure 1). First, SNPs of individual accessions were detected and merged in the joint variant calling step.
Then, SNPs at which a single accession carried a homozygous alternative allele were identified as accession-specific SNPs by comparing the genotypes of all individual accessions in the core collection. To develop KASP markers, each accession-specific SNP was evaluated by considering the non-redundancy of its flanking sequences, overlap with repeat sequences, and annotation of the SNP. Finally, SNPs with unique flanking sequences that did not overlap repeat sequences were identified as candidates for development of KASP markers. We identified 2,925 accession-specific SNPs as such candidates (Table 1).
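The selection rule described above can be illustrated with a minimal Python sketch. It is not the in-house script used in this study. It assumes VCF-style genotype strings, and it assumes that every other accession must be homozygous for the reference allele (so a heterozygous or missing call disqualifies the site); the function and variable names are hypothetical.

```python
from typing import Dict, Optional

HOM_REF = "0/0"  # homozygous reference genotype (VCF-style)
HOM_ALT = "1/1"  # homozygous alternative genotype (VCF-style)


def normalize(gt: str) -> str:
    """Treat phased calls (1|1) like unphased calls (1/1)."""
    return gt.replace("|", "/")


def private_carrier(genotypes: Dict[str, str]) -> Optional[str]:
    """Return the single accession carrying a homozygous ALT allele.

    Assumption: all remaining accessions must be homozygous reference;
    heterozygous or missing calls make the site unusable.
    """
    calls = {acc: normalize(gt) for acc, gt in genotypes.items()}
    carriers = [acc for acc, gt in calls.items() if gt == HOM_ALT]
    if len(carriers) != 1:
        return None
    others = [gt for acc, gt in calls.items() if acc != carriers[0]]
    return carriers[0] if all(gt == HOM_REF for gt in others) else None


# Hypothetical genotypes at one SNP position for three accessions.
site = {"acc01": "0/0", "acc02": "1/1", "acc03": "0/0"}
print(private_carrier(site))  # acc02
```

A stricter or looser treatment of missing calls would change the number of candidates, so the threshold used in practice should follow the filtering applied to the joint VCF file.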
Almost all of these SNPs were in flanking sequences of genes and 2,806 of them, or approximately 95.9%, were in genic regions (Table 2). Of the 2,925 SNPs, 456, or approximately 15.6%, resulted in non-synonymous mutations, and 19 variants led to abnormal termination of translation. These genetic variants may be important in future investigation of trait-associated genes or markers. Our next step in the development of accession-specific markers was to validate the SNPs detected by genome resequencing analysis, which we did with Sanger sequencing (Fig. 2).
Eight flanking sequences of the accession-specific SNP candidates were selected from the four groups of the core collection. Primers for Sanger sequencing were designed (Supplementary Table 2). From the results of the Sanger sequencing, we found that 7 of the SNP candidates were specific to a single accession (Figure 8), leading us to conclude that SNPs with conserved flanking sequences were the best candidates for developing accession-specific markers with PCR. Also, candidate SNPs with highly conserved flanking sequences that are suitable for primers may be necessary for developing wide-ranging KASP markers that will apply to accessions not in the core collection or to commercial cultivars. Clearly, determination of primer sites is important for the development of accession-specific KASP markers.

Development and evaluation of KASP markers
Our next step was to develop accession-specific KASP markers for assessment of hybrid seed purity. Five of the accession-specific SNP candidates identified as described above were selected from each accession for further analysis. Primer sites played an important role in successful marker development, and conserved flanking sequences of SNPs in our core collection were surveyed (Fig. 3a). Flanking regions containing unsequenced sites, shown as N in the reference genome, were removed from the primer candidate sequences (Fig. 3b). Then, five flanking sequences for each accession-specific SNP were selected for further evaluation of KASP markers. It was necessary to consider the genomic position of the SNP in the development of a wide range of markers, as overlapping genomic positions among markers may lead to inefficiency or false positive results when seed purity is assessed.
To avoid this redundancy, the genomic positions of five candidate SNPs from individual accessions were investigated and the positions unique to the accessions were selected (Fig. 4). In total, two SNPs in each accession were selected for validation of KASP markers (Supplementary Table 3). Many of the KASP markers that were in genic regions caused non-synonymous variation, although almost all accession-specific SNPs were detected in the flanking regions of genes (Table 2).
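The position check described above can be sketched as a simple greedy selection. This is only an illustration under stated assumptions: the minimum spacing between markers (`min_gap`, here the 501 bp target length) and the greedy order are our assumptions and are not taken from the pipeline, and the accession names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def pick_markers(
    candidates: Dict[str, List[Site]],
    per_accession: int = 2,
    min_gap: int = 501,  # assumed minimum spacing between any two markers
) -> Dict[str, List[Site]]:
    """Greedily keep up to `per_accession` SNPs per accession whose
    positions do not fall within `min_gap` bp of an already chosen SNP."""
    chosen: Dict[str, List[Site]] = {}
    used: List[Site] = []
    for accession in sorted(candidates):
        picks: List[Site] = []
        for chrom, pos in candidates[accession]:
            clash = any(c == chrom and abs(p - pos) < min_gap for c, p in used)
            if not clash:
                picks.append((chrom, pos))
                used.append((chrom, pos))
            if len(picks) == per_accession:
                break
        chosen[accession] = picks
    return chosen


# Hypothetical candidate SNPs (the study surveyed five per accession).
example = {
    "acc01": [("A01", 1000), ("A01", 1200), ("A02", 5000)],
    "acc02": [("A01", 1100), ("A03", 700), ("A03", 5000)],
}
print(pick_markers(example))
```

In this example the candidates at A01:1200 and A01:1100 are skipped because they lie too close to the marker already chosen at A01:1000.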
Validation of KASP markers was carried out using 50 accessions from the core collection and 35 from non-core collections or commercial cultivars to determine their applicability to a wide range of seed purity assessments (Fig. 5, Table 3, and Supplementary Data 2). Based on the results, we conclude that the accession-specific markers successfully distinguished individual accessions in both the core collection and the outgroup (Fig. 5). We suggest that accession-specific markers developed using a large amount of individual resequencing data can be used to assess the purity of seed from non-sequenced accessions or cultivars. The accession-specific markers developed here should be useful in a wide range of seed purity assessments in crop breeding and commercial seed production.

Discussion
Crop breeders need to maintain accessions and varieties of crop plants of known genetic makeup for the successful development of new varieties. Seed purity is traditionally estimated with the GOT in the field [1], but this test is both time consuming and expensive. The use of molecular markers to identify the genotype of seeds and plant material holds promise for replacing the GOT. Although the markers are faster and less expensive than the GOT, at present they can be inaccurate, and most of them were developed early in the 2010s for only a small number of accessions or varieties. With the advent of next-generation sequencing technology, high-quality reference genomes have been constructed and genetic information has been generated for many different cultivars and species. This information should provide the background necessary for the development of molecular markers that will provide accurate information and will be useful in a wide range of applications. These will include studying genetic variants in individual accessions, varieties, and large populations [25]. Reference genomes also provide useful detailed information on genetic variants such as gene structures, repetitive sequences, and accurate positions of various genetic features. This technology can also be applied to correlation analyses of phenotypes and may prove useful in analyses such as quantitative trait locus mapping and genome-wide association studies (GWAS) [26-28].
In the current study, we identified SNPs in the B. rapa core collection with genome resequencing (Fig. 1). Examining accession-specific genetic variants, we identified 4,925,742 SNPs in 50 accessions and, among these, we identified 2,925 SNPs that were specific to a single accession (Table 1). Almost all of the genetic variants we detected were in flanking regions of genes, but KASP markers were developed from SNPs that caused non-synonymous variations and were in genic regions. Conservation of the genic regions could maintain the function of the genes, accounting for our observation that the ratio of conserved sequences was relatively greater than for the other regions. The non-synonymous mutations might be involved in phenotypic or morphological differences among accessions and should be useful in the investigation of trait-associated genes or markers.
Until quite recently, molecular markers were developed for only a limited set of crops or cultivars, and their application has been limited. Our development of molecular markers using the core collection of B. rapa was performed, in part, to address this problem: we sought to develop markers for a wide range of applications by considering conserved sequences for primer sites (Fig. 3). Furthermore, the genomic positions of accession-specific markers were investigated to avoid overlap among the KASP markers (Fig. 4). In total, 100 accession-specific SNPs were developed into accession-specific KASP markers. Based on the results of our validation experiments, we conclude that the accession-specific markers successfully distinguished individual accessions in test populations from non-core or commercial cultivars (Fig. 5).

Conclusions
This study shows efficient methods for developing KASP markers to distinguish individuals from a mixture of breeding lines and germplasm using the resequencing data of Chinese cabbage (Brassica rapa ssp. pekinensis). We show that the accession-specific SNPs identified by NGS data pipelines are feasible targets for developing KASP markers. We believe the KASP markers developed here will be applicable to assessment of seed purity in a wide variety of situations, including core collections, other non-sequenced accessions, or commercial cultivars. These markers should also prove useful to breeding programs of B. rapa, facilitating the essential maintenance of pure parental lines. Furthermore, the non-synonymous mutations detected here should aid investigations of genes or markers associated with traits and functional studies of genes. This study will aid marker development for checking the purity of commercial F1 seed samples, that is, whether or not they were produced by unintended crossing.

Plant materials
To develop accession-specific KASP markers, 50 accessions of the Brassica rapa core collection [15] were used for whole-genome resequencing analysis. These accessions were characterized as inbred lines or doubled haploid lines. For assessments of KASP markers, 35 accessions (F1 hybrids and germplasm) donated by Chungnam National University (CNU) and the Rural Development Administration were used as the control panel for validation of the KASP markers.

Genome resequencing of core collection
TruSeq Nano DNA libraries were constructed according to the manufacturer's instructions. In total, 100 ng or 200 ng of high molecular weight genomic DNA was sheared with a Covaris S2 system to yield DNA fragments for a large (550 bp) insert size. Blunt-ended DNA fragments were generated with a combination of fill-in reactions and exonuclease activity. A single A-base was then added to the blunt ends of each strand in preparation for ligation to the indexed adapters. Each adapter contained a single T-base overhang for ligating the adapter to the A-tailed fragmented DNA. Ligated products were amplified with reduced-bias PCR. The quality of the amplified libraries was verified with capillary electrophoresis (Bioanalyzer, Agilent).
After qPCR using SYBR Green PCR Master Mix (Applied Biosystems), we combined index-tagged libraries in equimolar amounts in the pool. Whole-genome resequencing was performed with an Illumina NovaSeq 6000 system, following the protocols provided for 2×100 sequencing.

Construction of a pipeline for accession-specific variant calling
Positions of SNPs that had homozygous alternative alleles for one accession in the population variant call format (VCF) file were selected as accession-specific variants by an in-house Perl script. To reduce the possibility of primer amplification at multiple loci, target sequence redundancy in the B. rapa genome was estimated with the megablast task of BLASTN [22] using 501 bp sequences harboring accession-specific SNPs. Accession-specific SNPs without flanking sequence redundancy were selected for KASP primer design. Also, accession-specific SNPs whose flanking sequences overlapped predicted repeat sequences were filtered out using the GFF file provided with the B. rapa reference genome ver 3.0. Accession-specific variants in exon regions were given priority for KASP primer design after SNP annotation by SnpEff [23]. A total of 20 accession-specific SNP positions from 10 accessions were confirmed by Sanger sequencing.
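As an illustration of the repeat filter, the following Python sketch builds the 501 bp target window around a SNP and tests it against repeat intervals. It assumes the window is symmetric (250 bp on each side, which places the SNP at bp 251) and that repeat intervals are 1-based and inclusive, as in GFF files; the helper names and coordinates are hypothetical, and parsing of the GFF file is omitted.

```python
from typing import List, Tuple

FLANK = 250  # assumed: 250 bp on each side of the SNP gives a 501 bp target


def target_window(pos: int, flank: int = FLANK) -> Tuple[int, int]:
    """Return the 1-based inclusive window centred on the SNP position."""
    return pos - flank, pos + flank


def overlaps_repeat(pos: int, repeats: List[Tuple[int, int]]) -> bool:
    """True if the target window shares at least one base with a repeat.

    `repeats` holds (start, end) pairs on the same chromosome, 1-based
    and inclusive, as they would be read from a GFF file.
    """
    start, end = target_window(pos)
    return any(r_start <= end and r_end >= start for r_start, r_end in repeats)


# Hypothetical repeat annotations on one chromosome.
repeats_a01 = [(100, 400), (1250, 1300)]
print(target_window(1000))                 # (750, 1250)
print(overlaps_repeat(1000, repeats_a01))  # True
print(overlaps_repeat(800, repeats_a01))   # False
```

SNPs closer than 250 bp to a chromosome end would give a window start below 1 and would need separate handling, which this sketch does not attempt.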

Construction of pipeline for KASP marker development
We sought to minimize the failure of primer amplification due to insertions or deletions at the marker target sites (Fig. 3a). This led us to develop a pipeline for producing KASP candidate sequences for accession-specific variants. The pipeline generates flanking region sequences that harbor accession-specific variants from the BAM files of each accession and aligns them to the reference genome sequence with ClustalW (-OUTPUT=CLUSTAL -TYPE=DNA -GAPOPEN=10 -ENDGAPS -GAPDIST=0.05) [24]. The pipeline evaluates the proportion of missing or alternative alleles at each aligned position and produces consensus sequences in which variable positions (positions where non-reference alleles exceed 10%) are masked with N (Fig. 3b). Accession-specific variants located at bp 251 of the consensus sequences were used directly for KASP primer design by LGC Genomics.
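The masking step can be sketched in Python as follows. This is a simplified illustration rather than the pipeline itself: it assumes the aligned sequences have already been projected onto reference coordinates (one character per reference base, with "-" or "N" for missing bases), and the function name and toy sequences are hypothetical.

```python
from typing import List


def masked_consensus(
    reference: str, aligned: List[str], max_variable: float = 0.10
) -> str:
    """Replace reference bases with N where the fraction of sequences
    showing a non-reference or missing base exceeds `max_variable`."""
    if not aligned:
        raise ValueError("at least one aligned sequence is required")
    if any(len(seq) != len(reference) for seq in aligned):
        raise ValueError("aligned sequences must match the reference length")
    consensus = []
    for i, ref_base in enumerate(reference):
        # Gaps ('-') and N count as non-reference because they differ
        # from the reference base at this column.
        differing = sum(1 for seq in aligned if seq[i].upper() != ref_base.upper())
        masked = differing / len(aligned) > max_variable
        consensus.append("N" if masked else ref_base)
    return "".join(consensus)


# Toy example with ten sequences; two of them differ at the third base.
ref = "ACGTA"
seqs = ["ACGTA"] * 8 + ["ACCTA", "AC-TA"]
print(masked_consensus(ref, seqs))  # ACNTA
```

With 50 accessions, a variant private to one accession affects 2% of the sequences, so the accession-specific SNP itself stays unmasked under the 10% threshold.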

Evaluation and application of KASP markers
Validation of the KASP markers was performed with the Nexar system (LGC Douglas Scientific, Alexandria, USA) at the Seed Industry Promotion Center of the Foundation of Agricultural Technology Commercialization and Transfer (Gimje, Korea). An aliquot (0.8 μL) of 2x Master mix, 0.02 μL of 72x KASP assay mix (both from LGC Genomics), and 5 ng genomic DNA template from the 85 B. rapa accessions were mixed into 1.6 μL of KASP reaction mixture in a 384-well Array Tape. Duplicate reactions were run, and non-template controls were included in each run. KASP amplification was performed with the following thermal cycling profile: 15 min at 94℃; a touchdown phase of 10 cycles at 94℃ for 20 s and at 61℃-55℃ for 60 s, in which the annealing temperature decreased by 0.6℃ per cycle; and 26 cycles at 94℃ for 20 s and 55℃ for 60 s (first PCR stage). Next, recycling was performed with three cycles of 94℃ for 20 s and 57℃ for 60 s (second PCR stage). The recycling was performed twice, and the fluorescence read was taken for KASP genotyping after PCR amplification.
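The touchdown schedule can be worked through with a few lines of Python. The sketch assumes that the first touchdown cycle anneals at 61℃ and that "performed twice" means two rounds of three recycling cycles; ramp times between steps are ignored, so the total is a lower bound on the run time. All names are hypothetical.

```python
def touchdown_temperatures(start_c: float = 61.0, step_c: float = 0.6, cycles: int = 10):
    """Annealing temperature for each touchdown cycle, first cycle at start_c."""
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


TOUCHDOWN_CYCLES = 10
MAIN_CYCLES = 26
RECYCLE_ROUNDS = 2          # assumed reading of "performed twice"
CYCLES_PER_RECYCLE = 3
SECONDS_PER_CYCLE = 20 + 60  # denaturation + annealing/extension
INITIAL_HOLD_S = 15 * 60

temps = touchdown_temperatures()
total_cycles = TOUCHDOWN_CYCLES + MAIN_CYCLES + RECYCLE_ROUNDS * CYCLES_PER_RECYCLE
programmed_seconds = INITIAL_HOLD_S + total_cycles * SECONDS_PER_CYCLE

print(temps)               # 61.0 down to 55.6 in 0.6 steps
print(total_cycles)        # 42
print(programmed_seconds)  # 4260 s, i.e. 71 min of programmed hold time
```

Under these assumptions the last touchdown cycle anneals at 55.6℃, and the 55℃ annealing temperature is reached in the 26 cycles that follow.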

Declarations
Ethics approval and consent to participate

Figure 1 Morphological features of eight representative accessions from four groups of the Brassica rapa core collection.

Development of KASP markers. a) Potential problem of primer alignments by possible sequence variation from core collection during KASP marker development, b) Process for development of KASP markers.