Development of KASP Markers for Germplasm Characterization and Fingerprinting Identification of Broccoli in China

Background: Broccoli (Brassica oleracea var. italica) is a vegetable widely cultivated in China. Over the past three decades, Chinese breeders have bred many new broccoli cultivars. However, broccoli cultivar nomenclature is unclear, and detailed information on the genetic relationships among broccoli germplasms is lacking. Results: The present study identified millions of SNPs by next-generation sequencing of 23 representative broccoli lines. Through several selection steps, 100 SNPs were successfully converted into KASP markers. These markers were used to evaluate the genetic diversity, genetic relationships, and population structure of 392 broccoli accessions, which represent the main broccoli breeding materials in China. The initial, introduced and improved accessions clustered well, although some accessions overlapped between groups, probably reflecting genetic similarities introduced by breeding activities. To make KASP genotyping more efficient and cost-effective, 25 of the 100 KASPs were selected for fingerprinting all accessions, and 2D barcodes containing the fingerprinting information were generated for elite varieties. Conclusion: The KASP markers developed in this study provide an efficient tool for germplasm characterization, DNA fingerprinting, seed purity identification, and marker-assisted selection of broccoli in China.


Background
Broccoli (Brassica oleracea var. italica) is an economically important vegetable in many countries. It is becoming an increasingly popular part of the human diet because of its high nutritional value and the anti-cancer glucosinolates in its florets [1]. It originated in the Mediterranean basin and was introduced to China in the 20th century [2]. In recent years, the broccoli industry in China has made tremendous progress, with a cultivated area of about 80,000 ha in 2019. In addition, Chinese breeders have bred many new broccoli cultivars over the past three decades. These improved varieties have significantly increased productivity, but they share a narrow genetic base. Consequently, it is difficult to distinguish similar broccoli varieties in the seed market. The International Union for the Protection of New Varieties of Plants (UPOV) established distinctness, uniformity and stability (DUS) testing for new varieties before registration [3]. As a supplement to DUS testing, DNA fingerprinting can help distinguish varieties that differ only minimally in traditional morphological characteristics [4].
Over the past three decades, several DNA marker technologies have been widely applied in DNA fingerprinting of varieties, genetic diversity analysis, population structure analysis, and marker-assisted breeding [5]. These include amplified fragment length polymorphisms (AFLPs), simple sequence repeats (SSRs), and single nucleotide polymorphisms (SNPs). SSRs are routinely used for fingerprinting because of their high level of polymorphism [6][7][8][9]. However, SSRs have some disadvantages. For instance, detection throughput is low, and integrating or comparing data across detection platforms is difficult [5]. SNPs, by contrast, are the most abundant polymorphisms in any species [10,11]. They have been used in many genetic applications, including germplasm characterization (genetic diversity, genetic relationships, and population structure), quantitative trait loci (QTL) mapping, and marker-assisted selection [12]. Some SNP array-based marker sets can also be used for fingerprinting [5,13-15], although this may not be their primary purpose.
Kompetitive allele-specific PCR (KASP) assays are based on competitive allele-specific PCR and enable bi-allelic scoring of SNPs and of insertions and deletions (InDels) at specific loci [16]. Compared with fixed chip arrays, KASP is a more flexible and cost-effective technology for SNP genotyping, especially when a small number of markers is used to genotype a large number of samples. KASP assays have been used in various crops, such as rice [17,18], wheat [19], and maize [12]. In some horticultural plants, KASP markers have been used primarily for fine mapping of candidate QTL [20][21][22]. To our knowledge, the development of KASP markers for broccoli has not yet been reported.
In this study, millions of SNPs were identified by re-sequencing 23 broccoli lines. Through several selection steps, 100 SNPs were successfully converted into KASP markers. These markers were used to analyze the genetic diversity, genetic relationships, and population structure of 392 broccoli accessions in China. Then 25 of the 100 KASPs were used to fingerprint all accessions. This KASP marker set provides an efficient tool for broccoli germplasm characterization and marker-assisted selection.

KASP marker development
Twenty-three broccoli lines were used for next-generation sequencing, with an average sequencing depth of 28.0×. A total of 346.188 G of raw data was initially obtained. After quality filtering and adapter trimming, 345.577 G of clean data was available for further processing (Additional file 1: Table S1). A total of 2303.9 million paired reads were generated (87.7 to 124.9 million per line). Of these, 2272.9 million (98.7%) paired reads were successfully aligned to the broccoli reference genome [23] (Additional file 1: Table S2). As a result, between 899,926 and 1,908,908 SNPs were detected in each line (Additional file 1: Table S3).
A total of 28,220 SNPs were retained after filtering out three kinds of SNPs: those with a high proportion of missing data (> 20%), those with a low minor allele frequency (MAF < 0.05), and those with other variants within 50 bp upstream or downstream. The PIC values of these SNPs were calculated, and 13,621 SNPs with PIC values between 0.2 and 0.5 were retained. Of these, 8,768 (64.4%) could be successfully designed as KASP markers. Based on the physical positions of the SNPs and the genome structural annotation, 2,515 SNPs were located in exonic regions (including non-synonymous, stop-gain and stop-loss variants), in intergenic regions, or upstream or downstream of functional genes. These SNPs may affect important agronomic traits and could therefore be useful for molecular breeding. Ultimately, 500 SNPs distributed uniformly across the nine chromosomes were selected to develop KASP markers.
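The three numeric filters above (missing data, MAF and PIC) can be sketched for a single bi-allelic SNP as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The 0/1/2 genotype coding and the function names are hypothetical. The sketch also assumes the PIC formula of Botstein et al., since the exact PowerMarker implementation is not described here. Under that formula a bi-allelic SNP has a maximum PIC of 0.375, so the upper bound of 0.5 would never exclude a SNP.

```python
from typing import Optional, Sequence

# Assumed coding for this sketch: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternative, None = missing call in that line.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of lines without a genotype call at this SNP."""
    return sum(c is None for c in calls) / len(calls)


def alt_allele_freq(calls: Sequence[Genotype]) -> float:
    """Alternative-allele frequency over the non-missing diploid calls."""
    called = [c for c in calls if c is not None]
    return sum(called) / (2 * len(called))


def minor_allele_freq(calls: Sequence[Genotype]) -> float:
    p = alt_allele_freq(calls)
    return min(p, 1 - p)


def pic_biallelic(calls: Sequence[Genotype]) -> float:
    """Botstein et al. (1980) PIC for two alleles: 1 - (p^2 + q^2) - 2 * p^2 * q^2."""
    p = alt_allele_freq(calls)
    q = 1 - p
    return 1 - (p ** 2 + q ** 2) - 2 * p ** 2 * q ** 2


def passes_filters(calls: Sequence[Genotype],
                   max_missing: float = 0.20,
                   min_maf: float = 0.05,
                   pic_range: tuple = (0.2, 0.5)) -> bool:
    """Apply the missing-data, MAF and PIC thresholds quoted in the text."""
    if missing_rate(calls) > max_missing:
        return False  # also guards against SNPs with no calls at all
    if minor_allele_freq(calls) < min_maf:
        return False
    low, high = pic_range
    return low <= pic_biallelic(calls) <= high


# Usage with 23 hypothetical lines: 10 reference homozygotes, 6 heterozygotes, 7 alternative homozygotes.
example = [0] * 10 + [1] * 6 + [2] * 7
print(round(minor_allele_freq(example), 3), round(pic_biallelic(example), 3), passes_filters(example))
```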

Genotyping of 392 broccoli accessions
To evaluate the quality of the KASP markers, the 23 broccoli lines used for next-generation sequencing (NGS) were genotyped with the 500 selected KASP markers. As a result, 347 (69.4%) KASP markers were genotyped successfully, that is, they gave clear genotype calls for most of the lines (Fig. 1A). However, 54 of these KASPs produced genotypes that were inconsistent between NGS and the KASP assay, including some KASPs with no polymorphism (Fig. 1B). The remaining 293 KASPs were then used to genotype the 392 broccoli accessions. To ensure reliable results, markers with more than 10% missing values or ambiguous SNP calls were removed (Fig. 1C). The 100 remaining high-quality, evenly distributed markers were used for further analysis (Fig. 2; Additional file 2).
Across the 392 accessions, the heterozygous marker ratio ranged from 5.5% to 87.2%. A total of 387 accessions (98.7%) had a heterozygous marker ratio above 30%, indicating that most of these accessions were heterozygous (Fig. 3A).
In the phylogenetic tree (Fig. 4), the 392 broccoli accessions were clustered into three groups. Cluster I contained 103 accessions, most of which were improved accessions. Cluster II contained eight accessions introduced from Japan, such as bck2, bck3, bck5 and bck6. Cluster III, the major group, contained 281 accessions, most of which were initial accessions. Most of the improved and introduced accessions showed strong growth vigor and high-round flower heads with fine, uniformly sized buds. The initial accessions were inferior in some of these characteristics.
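Two quantities used above are the missing rate of each marker, which sets the 10% removal cutoff, and the heterozygous marker ratio of each accession. Both can be sketched as below. The "A:G" call format, the dictionary layout and all function names are hypothetical, and the text does not state which software computed these values.

```python
from typing import Dict, List, Optional

# Assumed call format for this sketch: "A:G"-style allele pairs, None for a failed call.
Call = Optional[str]


def is_heterozygous(call: str) -> bool:
    first, second = call.split(":")
    return first != second


def heterozygous_marker_ratio(calls: List[Call]) -> float:
    """Share of one accession's successfully called markers that are heterozygous."""
    called = [c for c in calls if c is not None]
    return sum(is_heterozygous(c) for c in called) / len(called)


def drop_sparse_markers(by_marker: Dict[str, List[Call]],
                        max_missing: float = 0.10) -> Dict[str, List[Call]]:
    """Keep markers whose missing rate across accessions does not exceed max_missing."""
    return {marker: calls for marker, calls in by_marker.items()
            if sum(c is None for c in calls) / len(calls) <= max_missing}


def share_above(ratios: List[float], threshold: float = 0.30) -> float:
    """Fraction of accessions whose heterozygous marker ratio exceeds the threshold."""
    return sum(r > threshold for r in ratios) / len(ratios)
```

The marker filter operates on columns (one marker across all accessions). The heterozygous marker ratio operates on rows (one accession across all markers).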
Principal component analysis (PCA) indicated that cluster II (introduced accessions) was contained within cluster III (initial accessions). Cluster I (improved accessions) overlapped with the other groups (Fig. 5). The first and second axes explained 8.1% and 6.5% of the overall variance, respectively.
The population structure of the 392 broccoli accessions was inferred with STRUCTURE version 2.3.4. The number of populations K was set from 1 to 10, with five runs for each K value. Evanno's ΔK was maximal at K = 2 (Fig. 6A). Therefore, the 392 broccoli accessions were divided into two groups, POP1 and POP2 (Fig. 6B). POP1 was an improved-type subgroup of 155 accessions, and POP2 was an initial-type subgroup of 237 accessions. The FST value between POP1 and POP2 was 0.0008, indicating very low genetic differentiation between the two populations. Compared with the phylogenetic tree, 98 of the 103 accessions in cluster I belonged to POP1. All accessions in cluster II and 223 of the 281 accessions in cluster III belonged to POP2 (Additional file 4).
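Evanno's ΔK, used above to choose K = 2, divides the second-order change in the STRUCTURE log-likelihood between successive K values by the standard deviation among replicate runs. The minimal sketch below assumes the five replicate ln P(D) values per K have already been parsed from the STRUCTURE output files (parsing is not shown). It uses the mean ln P(D) across replicates for each K, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(ln_p: Dict[int, List[float]]) -> Dict[int, float]:
    """Evanno et al. (2005) Delta K from replicate STRUCTURE runs.

    ln_p maps each K (consecutive integers) to the ln P(D) values of its replicate runs.
    With replicate means L(K): Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(ln P(D) at K).
    Delta K is undefined for the smallest and largest K tested, so those are omitted.
    """
    ks = sorted(ln_p)
    means = {k: mean(values) for k, values in ln_p.items()}
    delta_k = {}
    for k in ks[1:-1]:
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        # Raises ZeroDivisionError if all replicates at K are identical (sd = 0).
        delta_k[k] = second_diff / stdev(ln_p[k])
    return delta_k
```

The K with the largest ΔK is then taken as the most likely number of groups, as with K = 2 above.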
Overall, these 100 KASP markers were effective at resolving the population structure of the accessions. However, accessions of different types overlapped, probably reflecting genetic similarities introduced by breeding activities.

Selection of core KASP markers for fingerprinting of broccoli accessions
To establish a rapid and cost-effective method of variety identification, we selected 25 KASP markers from the genotyping database to fingerprint every accession. These KASP markers distinguished effectively among the 392 examined accessions and were evenly distributed across the nine chromosomes (Fig. 2). For each accession, the genotype-based KASP barcode was converted into a 2D barcode using an online tool (www.cli.im). Scanning this 2D barcode retrieves the information used to create it. Figure 7 shows the barcode of a representative broccoli variety used in the present study.
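Turning the 25 core genotype calls into a compact text string is the step that precedes 2D barcode generation. The sketch below covers only that text payload and a pairwise comparison, under stated assumptions. The one-character encoding (base for homozygotes, IUPAC ambiguity code for heterozygotes, "-" for failed calls) and the function names are hypothetical. QR generation itself, which the study performed with the online tool, is not shown.

```python
from typing import List, Optional

# Hypothetical encoding: one character per marker. Homozygotes keep their base,
# heterozygotes use the IUPAC ambiguity code, and failed calls become "-".
IUPAC_HET = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AT"): "W", frozenset("CG"): "S",
}


def encode_call(call: Optional[str]) -> str:
    if call is None:
        return "-"
    first, second = call.split(":")
    return first if first == second else IUPAC_HET[frozenset((first, second))]


def fingerprint(accession_id: str, calls: List[Optional[str]]) -> str:
    """Compact text fingerprint, e.g. for pasting into a 2D (QR) barcode generator."""
    return accession_id + "|" + "".join(encode_call(c) for c in calls)


def count_differences(fp_a: str, fp_b: str) -> int:
    """Number of markers whose codes differ, skipping markers that failed in either sample."""
    codes_a = fp_a.split("|", 1)[1]
    codes_b = fp_b.split("|", 1)[1]
    return sum(1 for x, y in zip(codes_a, codes_b) if x != y and "-" not in (x, y))
```

Two accessions whose strings differ at no successfully called position cannot be told apart by the core marker set alone.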

Discussion
KASP marker development by next-generation sequencing
Markers have been extensively used for genetic diversity assessment, population structure determination, QTL mapping and molecular breeding. SNPs are third-generation markers and are the most commonly used marker type, because of their high density in genomes and the availability of high-throughput genotyping methods. In the present study, we identified millions of SNPs by next-generation sequencing of 23 representative broccoli lines. Mining SNPs from this many lines allows the SNP set to capture a wide range of variation.
A few hundred SNPs are sufficient for germplasm characterization and fingerprinting. To keep genotyping cost-effective and flexible, we used the KASP assay technology developed by LGC for SNP genotyping. However, not all SNPs can be converted into KASP markers. Two parameters are used to evaluate conversion efficiency. The design success rate is the number of SNP sites for which primers can be designed divided by the total number of SNP sites. The work success rate is the number of SNP sites that yield genotype calls divided by the number of SNP sites with successfully designed primers [17]. In this study, we raised the filtering stringency by requiring primer GC content above 0.3, so the design success rate was only 64.4%. This rate did not affect the final result or the experimental cost, however, because the KASP primers were designed before synthesis. In addition, 347 (69.4%) of the 500 KASP markers were successfully genotyped in the broccoli accessions. This work success rate is lower than in some other reports, possibly because the quality and quantity of DNA provided by different breeding groups were inconsistent. To address this problem, we selected 100 high-quality KASP markers for germplasm characterization. To our knowledge, this is the first time that hundreds of KASP markers have been developed in broccoli. Thanks to the support of these breeding groups, the accessions represent nearly all types and important agronomic traits of broccoli in China.
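The two conversion-efficiency parameters are simple proportions, and plugging in the figures reported above reproduces the quoted percentages. The primer sequence and helper names below are hypothetical. The GC check only illustrates the 0.3 threshold and does not reproduce LGC's primer design software.

```python
def rate(successes: int, attempts: int) -> float:
    """Simple success proportion, used for both conversion-efficiency parameters."""
    return successes / attempts


def gc_fraction(primer: str) -> float:
    """GC content of a primer sequence as a fraction between 0 and 1."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Figures reported in the text.
design_success = rate(8768, 13621)  # SNPs with designable primers / SNPs submitted for design
work_success = rate(347, 500)       # markers with clear genotype calls / markers tested
print(f"design success rate: {design_success:.1%}")
print(f"work success rate: {work_success:.1%}")

# Hypothetical primer, checked against the GC > 0.3 requirement described above.
print(gc_fraction("ATGCATTTAGCCAAGT") > 0.3)
```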

Characterization of 392 representative broccoli accessions in China
Broccoli is a popular vegetable in China, not only because of its good taste but also because of the rich nutrients in its flower head and stem. However, broccoli breeding in China has a history of only 30 years, and a few elite varieties have been used as core germplasm. This has resulted in a narrow genetic base among accessions. In this study, based on the 100 selected KASP markers, the 392 accessions were clustered into two (Fig. 6) or three subgroups (Fig. 4). In particular, the PCA showed that group II was encompassed by group III and that group I overlapped with the other groups (Fig. 5). This indicates that there was no clear boundary between the initial and introduced types.
Increasing favourable alleles in a breeding population, while reducing inbreeding and avoiding genetic drift, is of great importance for achieving long-term breeding gains [24]. Genotyping Chinese broccoli accessions improves knowledge of the genetic relationships among them. This can help reduce the chance of introgressing undesirable alleles and guide the selection of parents to avoid mating among related lines [25].

Application value of fingerprinting for broccoli accessions
Because of the narrow genetic base of broccoli, it is sometimes difficult to distinguish varieties by their agronomic performance. Compared with conventional field inspection, marker-based fingerprinting is advantageous for identifying varieties because it is free from environmental influence [26]. As in other horticultural species, spurious varieties are easily mixed with genuine ones in the broccoli seed market, which ultimately harms the interests of consumers and breeders. In the present study, barcodes were established for all broccoli accessions, including some widely used varieties. These barcodes contain the fingerprinting information of the accessions and can help verify the authenticity of varieties.
In addition, commercial F1 cultivars of broccoli are produced using cytoplasmic male sterile lines. During seed production, parental or other off-type seeds are easily mixed in through pollen contamination, mechanical admixture, or human mislabeling. Loss of parental materials can cause irreparable damage to breeders. To solve these problems, it is essential to test seed purity before commercial distribution. Although we developed 100 KASPs for broccoli fingerprinting, in our experience 3-5 KASPs are adequate for seed purity identification. This marker subset can be selected from the 100 KASPs according to the parental genotypes of a specific variety.
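The text notes that 3-5 KASPs chosen according to the parental genotypes suffice for purity testing. The sketch below shows one common way to choose and use such markers, under stated assumptions. It keeps markers at which the two parents are homozygous for different alleles, so that a true F1 seed must be heterozygous there. The marker names, the "A:G" call format and the three-way classification are hypothetical illustrations, not the authors' protocol.

```python
from typing import Dict, List, Optional

Genotypes = Dict[str, Optional[str]]  # marker name -> "A:G"-style call, None if the call failed


def alleles(call: str) -> tuple:
    """Order-independent allele pair, so "G:A" and "A:G" compare equal."""
    return tuple(sorted(call.split(":")))


def informative_markers(female: Genotypes, male: Genotypes) -> List[str]:
    """Markers where both parents are homozygous for different alleles."""
    chosen = []
    for marker, f_call in female.items():
        m_call = male.get(marker)
        if f_call is None or m_call is None:
            continue
        f, m = alleles(f_call), alleles(m_call)
        if f[0] == f[1] and m[0] == m[1] and f[0] != m[0]:
            chosen.append(marker)
    return chosen


def classify_seed(seed: Genotypes, female: Genotypes, male: Genotypes,
                  markers: List[str]) -> str:
    """Label a seed 'hybrid', 'female-type' (same genotype as the female parent) or 'off-type'."""
    called = [m for m in markers if seed.get(m) is not None]
    if not called:
        return "undetermined"
    # Expected F1 genotype: one allele from each homozygous parent.
    expected_f1 = {m: tuple(sorted((alleles(female[m])[0], alleles(male[m])[0]))) for m in called}
    if all(alleles(seed[m]) == expected_f1[m] for m in called):
        return "hybrid"
    if all(alleles(seed[m]) == alleles(female[m]) for m in called):
        return "female-type"
    return "off-type"


def seed_purity(labels: List[str]) -> float:
    """Proportion of classifiable seeds that are true hybrids."""
    determined = [label for label in labels if label != "undetermined"]
    return determined.count("hybrid") / len(determined)
```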
The exchange of germplasm resources is a prerequisite for effectively developing and utilizing their value, avoiding duplication and waste, and creating more wealth for society. However, the intellectual property of breeders must be taken into consideration before germplasm is exchanged. From this perspective, fingerprinting can provide an identity for every germplasm and also evidence for apportioning shared benefits. That is, breeders or merchants can receive corresponding benefits from a hybrid variety according to the proportion of fingerprinting markers it shares with its parents.
Owing to its efficiency and cost-effectiveness, the KASP assay is a promising technology for molecular breeding. In this study, most of the SNP loci selected for KASPs were chosen based on the physical positions of functional genes. For example, the KASP markers GZKASP_1 and HFKASP_6 were developed from reported loci controlling clubroot resistance and black rot resistance in cabbage, respectively [27,28]. Some KASPs were located in homologs of Arabidopsis thaliana genes controlling glucosinolate content, flowering, anthocyanin content and other traits. Further study is needed to evaluate the function of these KASPs in broccoli so that they can be applied in marker-assisted selection.

Materials and Methods
Materials for next-generation sequencing and fingerprinting
Based on their phylogenetic relationships, 23 broccoli lines were selected for NGS. Ten of these were doubled haploid (DH) lines, and the others were advanced-generation inbred lines. These materials are representative of many important agronomic traits of broccoli, such as head color, head shape, leaf angle, stem height and flowering time.
A total of 392 broccoli accessions provided by 11 breeding groups in China were used for fingerprint identification. These materials cover nearly all ecological types of broccoli in China.

Selection of SNP loci
After SNP loci were obtained from next-generation sequencing of the 23 broccoli lines, statistics were estimated for each SNP using PowerMarker v3.25 [29]. These included minor allele frequency (MAF), polymorphic information content (PIC) and heterozygous marker ratio.
Several steps were performed to select the available SNP loci for KASP marker development.
(1) SNPs with missing data in more than 20% of the materials were filtered out. The remaining loci were selected according to their physical positions so as to cover the chromosomes as evenly as possible.
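Step (1) is stated above. Two further criteria can be sketched per chromosome: the flanking-variant criterion from the Results (no other variant within 50 bp) and the even-coverage selection. The greedy nearest-to-target rule and all names are assumptions. The text does not specify how even coverage was achieved or how the 500 SNPs were allocated among chromosomes.

```python
from bisect import bisect_left, bisect_right
from typing import List


def isolated_snps(candidates: List[int], all_variants: List[int], flank: int = 50) -> List[int]:
    """Keep candidate SNPs with no other variant within `flank` bp on either side.

    Positions are on one chromosome; every candidate must also appear in all_variants.
    """
    variants = sorted(all_variants)
    kept = []
    for pos in candidates:
        lo = bisect_left(variants, pos - flank)
        hi = bisect_right(variants, pos + flank)
        if hi - lo == 1:  # the only variant in the window is the SNP itself
            kept.append(pos)
    return kept


def evenly_spaced(positions: List[int], n: int, chrom_length: int) -> List[int]:
    """Greedily pick up to n SNPs closest to n evenly spaced targets along the chromosome."""
    pool = sorted(positions)
    chosen = []
    for i in range(n):
        if not pool:
            break
        target = (i + 0.5) * chrom_length / n  # midpoint of the i-th equal-length bin
        best = min(pool, key=lambda p: abs(p - target))
        chosen.append(best)
        pool.remove(best)  # each SNP can be chosen only once
    return sorted(chosen)
```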

KASP marker genotyping
Genomic DNA for genotyping was extracted from young leaf tissue using a modified cetyltrimethylammonium bromide (CTAB) method [30]. DNA quality and quantity were determined with a NanoDrop 2000 spectrophotometer, and a concentration of 10-30 ng/µl was suitable for the KASP assay.
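Bringing each extracted sample into the working concentration range is a C1V1 = C2V2 calculation. The sketch below assumes concentrations in ng/µl, the unit in which the NanoDrop reports nucleic acid concentration. It uses a hypothetical 20 ng/µl target within the stated range, and the function name is illustrative.

```python
def dilution_volumes(stock_conc: float, target_conc: float, final_volume: float) -> tuple:
    """Return (stock volume, diluent volume) from C1 * V1 = C2 * V2.

    Concentrations must share one unit (ng/µl assumed here); volumes are in µl.
    """
    if not 0 < target_conc <= stock_conc:
        raise ValueError("target must be positive and not exceed the stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Hypothetical example: a 150 ng/µl NanoDrop reading diluted to 20 ng/µl in 100 µl.
stock_ul, water_ul = dilution_volumes(150.0, 20.0, 100.0)
print(f"{stock_ul:.1f} µl DNA + {water_ul:.1f} µl water")
```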
PCR amplification was performed in 96-well microplates with a reaction volume of 10.14 µl. Each well contained 5 µl of template DNA, 5 µl of 2× KASP Master mix, and 0.14 µl of KASP assay mix. After the genotyping mix was dispensed onto the reaction plate, PCR was run using a three-step program.
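Every well receives the same 5 µl of 2× KASP Master mix and 0.14 µl of KASP assay mix. These two components therefore make up the genotyping mix dispensed across the plate, and its volume scales with the number of wells. The 10% pipetting overage and the function name below are assumptions for illustration. The thermal program is not reproduced because its steps are not given here.

```python
# Per-well volumes from the text (µl); template DNA is pipetted separately into each well.
DNA_UL = 5.0
MASTER_MIX_UL = 5.0   # 2x KASP Master mix
ASSAY_MIX_UL = 0.14   # KASP assay mix


def genotyping_mix(n_wells: int, overage: float = 0.10) -> dict:
    """Volumes of Master mix and assay mix for n_wells, including a pipetting overage."""
    factor = n_wells * (1 + overage)
    return {
        "2x KASP Master mix (µl)": round(MASTER_MIX_UL * factor, 2),
        "KASP assay mix (µl)": round(ASSAY_MIX_UL * factor, 2),
    }


print("reaction volume per well (µl):", round(DNA_UL + MASTER_MIX_UL + ASSAY_MIX_UL, 2))
print(genotyping_mix(96))
```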

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Figure 1
Representative results of the KASP genotyping assay. (A) A KASP marker with good polymorphism that was successfully developed. (B) A KASP marker with no polymorphism that should be discarded. (C) A KASP marker with more than 10% missing values or ambiguous SNP calls that should be discarded.

Figure 2
Distribution of KASP markers on broccoli chromosomes. The horizontal lines perpendicular to each chromosome represent the KASP markers developed in this study; the red ones are the core KASP markers selected for fingerprinting broccoli accessions.

Figure 4
… Technology Co.,Ltd; Japan: Sakata Seed Corporation, Japan. The colors in the tree correspond to subpopulations: cluster I (blue) contains improved accessions, cluster II (yellow) contains introduced accessions, and cluster III (red) contains initial accessions.

Figure 5
Principal component analysis (PCA) of the 392 broccoli accessions genotyped with 100 genomic KASPs.
Cluster I (blue) contains improved accessions, cluster II (yellow) contains introduced accessions, and cluster III (red) contains initial accessions.