Linkage and Association Mapping Identifies Loci and Develops Kompetitive Allele Specific PCR Markers for Improving Wheat Grain Protein Content

Peng Jiang, Jiangsu Academy of Agricultural Sciences, https://orcid.org/0000-0001-7840-8362
Peng Zhang, Jiangsu Academy of Agricultural Sciences
Lei Wu, Jiangsu Academy of Agricultural Sciences
Yi He, Jiangsu Academy of Agricultural Sciences
Chang Li, Jiangsu Academy of Agricultural Sciences
Hongxiang Ma, Jiangsu Academy of Agricultural Sciences
Xu Zhang (xuzhang@jaas.ac.cn), Jiangsu Academy of Agricultural Sciences


Introduction
Wheat is one of the most widely grown food crops and provides 20% of human food energy requirements and 25% of proteins consumed on a daily basis, worldwide (Wheat Initiative, www.wheatinitiative.org/). Grain protein content (GPC) of wheat is also related to the end-use quality. Therefore, improving GPC is an important breeding objective for programs that aim to increase the nutritional value and quality of wheat.
(2015) used 163 recombinant inbred lines (RILs) derived from two widely divergent parents and detected 11 QTL on 9 chromosomes; two of these QTL had major effects, explaining up to 16.5% and 16.9% of phenotypic variation. Fatiukha et al. (2020) identified 12 GPC QTL in an RIL population derived from the elite durum cultivar Svevo and a wild emmer wheat accession Y12-3, and four major QTL with the highest LOD scores and relatively high and stable phenotypic variance explained (PVE) were found on chromosomes 4BS, 5AS, 6BS, and 7BL. The QTL on 6BS had a physical position similar to that of the cloned QTL Gpc-B1 (Uauy et al. 2006). A worldwide bread wheat core collection, including 372 accessions, was genotyped using DArT, SSR, and single nucleotide polymorphism (SNP) markers on the whole wheat genome, and 14 stable markers across 15 chromosomes were found to be associated with GPC (Bordes et al. 2011). A genome-wide association study (GWAS) was conducted using 189 spring bread wheat genotypes from the International Center for Agricultural Research in the Dry Areas, and 37 significant GPC SNPs were identified, some of which also exhibited an effect on gluten content (Suliman et al. 2021). However, the mapping results varied with the mapping populations, methods, and environments, and most studies focused mainly on the identification of related loci; subsequent validation and application have seldom been reported, except for Gpc-B1.
SNP markers have emerged as powerful tools for many genetic applications due to their low assay cost, high genomic abundance, locus specificity, and codominant inheritance. The Kompetitive allele specific PCR (KASP) assay provides a rapid and cost-effective way to precisely determine SNP genotypes and is well suited to marker-assisted selection (MAS) on a large scale (Semagn et al. 2014). Rasheed et al. (2016) developed and validated 70 functional KASP markers that are associated with wheat adaptability, grain yield, quality, and biotic and abiotic stress resistances, revealing that the KASP assays were 45 times faster compared with gel-based PCR markers. Lin et al. (2021) identified a major QTL related to grain number per spikelet on chromosome 2D and generated a closely linked KASP marker to validate its effects in different genetic backgrounds. Zhang et al. (2021) identified 22, 22, and 23 loci associated with seedling resistance to leaf rust, adult plant resistance to leaf rust, and adult plant resistance to stripe rust, respectively, using GWAS, and converted 12 associated SNPs into KASP markers, which were verified in bi-parental populations.
A combination of linkage mapping and GWAS analysis can provide richer and more accurate information for QTL detection. Dhakal et al. (2018) identified a QTL for wheat curl mite resistance on chromosome 6DS in the wheat cultivar TAM 112 using linkage and association analyses, thus demonstrating the effectiveness of this locus in reducing symptom severity. QTrl.saw-2D.2, which controls total root length, was identified in the RILs from SY95-71×CH7034 using linkage mapping and was validated in another population comprising 215 diverse lines, suggesting that the result is effective and reliable (Zheng et al. 2019).
The middle and lower reaches of the Yangtze River form the largest soft wheat production area and the second-largest wheat belt in China (He et al. 2002). The quality traits of wheat, including GPC, have received more attention in recent years; however, limited progress has been made in the genetic improvement of GPC in this area (He et al. 2018; Zhao et al. 2020). In this study, we performed linkage and association mapping in wheat grown in this area to identify the genetic loci related to GPC, and successfully developed and validated the corresponding KASP markers.
On the basis of these results, we obtained effective molecular markers that can be utilized in MAS for GPC in the middle and lower reaches of the Yangtze River and are expected to assist quality breeding.

Materials
A RIL population derived from Ningmai 9 × Yangmai 158 (282 F8 RILs) was used for linkage mapping. Ningmai 9 is a soft wheat variety, whereas Yangmai 158 is a hard wheat variety. The RILs and their parents were planted using an augmented design with single row plots, 60 seeds per row with 1. [...] Table 1) was planted in a randomized complete block design experiment with the same plots as that of the RIL population, and two [...] All mature grains were harvested for GPC testing, and the 1163 F4 lines and 164 F6 lines were harvested from the breeding population in 2019-2020 at Nanjing, China. The GPC was measured using a Perten DA7200 (Sweden) device, according to AACC Method 39-10.

Genotyping
Total genomic DNA was isolated from young leaf tissues using the CTAB method (Porebski et al. 1997). The RILs and their parents were genotyped using the Illumina 90 K SNP assay (Wang et al. 2014). The genetic map, containing 2285 bin markers with a total length of 3002 cM, was published in a previous study (Jiang et al. 2020). The natural population was genotyped using the Affymetrix 50 K assay (CapitalBio Technology, Beijing, China). All marker sequences were aligned against the IWGSC Reference Sequence v1.0 using BLAST to obtain their physical positions (The International Wheat Genome Sequencing Consortium 2018).

Phenotypic analysis and QTL analysis
Preliminary statistical analysis and correlation analysis of the phenotype were performed in Microsoft Excel 2016. Analysis of variance (ANOVA) was conducted using IBM SPSS Statistics 19.0. The broad sense heritability (h²) was estimated according to He et al.
QTL mapping was conducted using the BIP function in IciMapping 4.1, and the algorithm of inclusive composite interval mapping (ICIM) was selected (Li et al. 2007). The walking step for QTL detection was set as 0.1 cM, and the threshold of LOD scores was set as 2.5.
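To make the LOD threshold concrete, the following sketch computes a LOD score for a single marker by simple regression. It is not the ICIM algorithm implemented in IciMapping, which additionally controls for background markers; the function name `single_marker_lod` and the toy data are hypothetical and serve only as an illustration. A LOD of 2.5 means that the data are about 10^2.5 (roughly 316) times more likely under the QTL model than under the null model.

```python
import math

def single_marker_lod(genotypes, phenotypes):
    """LOD for one marker under a normal model: (n/2) * log10(RSS0 / RSS1).

    genotypes: 0/1 codes for the two parental alleles of each RIL.
    phenotypes: trait values (e.g. GPC in %) in the same order.
    Illustrative single-marker regression only, NOT ICIM.
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    # Null model: one common mean for all lines.
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)
    # QTL model: a separate mean for each allele class.
    rss1 = 0.0
    for allele in (0, 1):
        group = [y for g, y in zip(genotypes, phenotypes) if g == allele]
        if group:
            m = sum(group) / len(group)
            rss1 += sum((y - m) ** 2 for y in group)
    return (n / 2) * math.log10(rss0 / rss1)

LOD_THRESHOLD = 2.5  # threshold used in the text

# Hypothetical toy data: eight RILs, four per allele class.
geno = [0, 0, 0, 0, 1, 1, 1, 1]
gpc = [12.1, 12.4, 11.9, 12.2, 13.0, 13.3, 12.9, 13.4]
lod = single_marker_lod(geno, gpc)
print(round(lod, 2), lod > LOD_THRESHOLD)
```

In practice the scan is repeated at every walking step along the chromosome, and positions whose LOD exceeds the threshold are reported as QTL.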

Association mapping
Quality control for the genotype data from Affymetrix 50 K assay was performed in TASSEL software V5.2.13 (Bradbury et al. 2007), and the SNP markers with a minor allele frequency ≤ 5% and missing data >10% were removed. For sequencing the candidate genes, multiple pairs of primers were designed to amplify the whole gene using Primer 6.0, and the sequencing services were provided by Shanghai Sangon Biotech Co., Ltd.
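The marker filter described above can be sketched as follows. This is a minimal illustration rather than the TASSEL implementation: it assumes that a marker is discarded when it fails either criterion, that genotypes are coded as 0, 1, or 2 copies of the alternate allele, and that missing calls are stored as `None`. The function name `passes_qc` and the example SNPs are hypothetical.

```python
def passes_qc(calls, maf_min=0.05, max_missing=0.10):
    """Return True if one SNP passes the MAF and missing-data filters.

    calls: genotype calls for one SNP across all lines, coded 0, 1 or 2
           (copies of the alternate allele), with None for a missing call.
    Assumption: a marker is removed if it fails EITHER filter.
    """
    observed = [c for c in calls if c is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if missing_rate > max_missing or not observed:
        return False
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    return maf > maf_min  # markers with MAF <= 5% are removed

# Hypothetical example with 20 lines per SNP.
snps = {
    "snp_a": [0] * 10 + [2] * 10,            # MAF 0.50, no missing data
    "snp_b": [0] * 19 + [2],                 # MAF 0.05, removed
    "snp_c": [0] * 8 + [2] * 8 + [None] * 4, # 20% missing, removed
}
kept = [name for name, calls in snps.items() if passes_qc(calls)]
print(kept)
```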

Phenotype analysis
The description of genotype for the RIL and natural populations is shown in Table 1. In the RIL population, the parent Ningmai 9 showed a lower GPC than Yangmai 158 in all environments, which was consistent with their production performance. The range (max-min) and coefficient of variation (CV) were above 5%, except for E1. The GPC in the natural population also showed great variation, with CV ranging from 6.28% to 10.40% across the environments. GPC showed a significant positive correlation among almost all the environments in the RIL and natural populations (Table 2), and the heritability reached 0.56 and 0.74, respectively (Table 3). Both genotype and environment had a significant influence on the GPC of the RIL and natural populations, and their interaction also had a significant influence on the GPC of the natural population, but not on that of the RIL population (Table 3).
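For reference, the two dispersion statistics used above can be computed as in the following sketch. It assumes that CV is the sample standard deviation divided by the mean, expressed in percent; the helper names and the GPC values are hypothetical and are not taken from Table 1.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

def value_range(values):
    """Difference between the maximum and minimum (max-min)."""
    return max(values) - min(values)

# Hypothetical GPC values (%) for one environment.
gpc_e = [12.0, 13.0, 14.0]
cv = coefficient_of_variation(gpc_e)
print(round(cv, 2), value_range(gpc_e))
```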

Association mapping
The natural population was genotyped using the Affymetrix 50 K assay. After quality control, 36360 SNPs were retained for further analysis. The [...] Table S3). According to the population structure analysis, the population was divided into three distinct subgroups (ΔK = 3) (Fig. 2A), and a similar result was obtained from the kinship matrix analysis (Fig. 2C). Approximately one fifth of the materials were assigned to different subgroups by the two programs because of the different statistical methods used (Supplemental Table S1).
Using association mapping, we obtained 17 chromosome intervals containing significant markers associated with GPC distributed on 14 [...]

Development and validation of KASP markers for candidate intervals
To further utilize the related QTL and associated markers, we attempted to convert them into user-friendly KASP markers. First, nine of these intervals were selected according to the following three criteria (Table 6): (1) QTL detected in multiple environments; (2) Type A associated intervals; (3) regions detected by both QTL mapping and association mapping. Second, loci with low homology in these intervals were chosen for marker development (Supplemental Table S2). Finally, KASP genotyping was performed on materials from the RIL population and the natural population to test the developed markers.
Association mapping was performed in a large breeding population (1163 F4 lines) with the nine successfully developed KASP markers based on GLM (Fig. 4). Then, we compared the GPC of the lines carrying GPC-increased alleles with that of the lines carrying GPC-decreased alleles at 1-9 markers, adding the markers in order of P value from lowest to highest (Fig. 5). The difference between the lines with GPC-increased alleles and those with GPC-decreased alleles increased as the number of markers increased, and remained relatively stable once the three markers with the lowest P values were used in selection. Therefore, the three markers Kgpc-2B, Kgpc-2D, and Kgpc-4A, which had low P values (< 10^-10), were applicable for GPC selection, and their combination was more effective.
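The marker-stacking comparison can be illustrated with the following sketch. It assumes each line is scored 1 (GPC-increased allele) or 0 (GPC-decreased allele) at every marker, and that markers are added in a fixed order; the function name `stacking_differences`, the marker order, and the GPC values are hypothetical and do not reproduce the data behind Fig. 5.

```python
from statistics import mean

def stacking_differences(lines, marker_order):
    """For k = 1..len(marker_order), compare lines carrying the GPC-increased
    allele at all of the first k markers with lines carrying the
    GPC-decreased allele at all of them.

    lines: list of (genotype dict, GPC) pairs; genotype values are
           1 (increased allele) or 0 (decreased allele).
    Returns a list of (k, mean difference); the difference is None
    when either class is empty.
    """
    results = []
    for k in range(1, len(marker_order) + 1):
        used = marker_order[:k]
        high = [gpc for geno, gpc in lines if all(geno[m] == 1 for m in used)]
        low = [gpc for geno, gpc in lines if all(geno[m] == 0 for m in used)]
        diff = mean(high) - mean(low) if high and low else None
        results.append((k, diff))
    return results

# Hypothetical lines and marker order (assumed sorted by P value).
order = ["Kgpc-2B", "Kgpc-2D", "Kgpc-4A"]
lines = [
    ({"Kgpc-2B": 1, "Kgpc-2D": 1, "Kgpc-4A": 1}, 14.8),
    ({"Kgpc-2B": 1, "Kgpc-2D": 1, "Kgpc-4A": 0}, 14.2),
    ({"Kgpc-2B": 1, "Kgpc-2D": 0, "Kgpc-4A": 0}, 13.8),
    ({"Kgpc-2B": 0, "Kgpc-2D": 1, "Kgpc-4A": 1}, 13.6),
    ({"Kgpc-2B": 0, "Kgpc-2D": 0, "Kgpc-4A": 1}, 13.4),
    ({"Kgpc-2B": 0, "Kgpc-2D": 0, "Kgpc-4A": 0}, 13.0),
]
diffs = stacking_differences(lines, order)
for k, d in diffs:
    print(k, d)
```

Because each added marker narrows both classes, the number of lines per class shrinks quickly, which is one reason a small set of strong markers can be preferable in practice.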

Application of significant KASP markers
Further, we used the three markers to test 164 F6 lines. The 15 lines with GPC-increased alleles showed an average GPC of 14.85%, which was significantly higher than the 13.15% of the eight lines with GPC-decreased alleles (Fig. 6 and Supplemental Table S4), indicating a good selection effect.

Exploration of candidate genes in the significant intervals
The published IWGSC reference sequence provided detailed information for the identification of candidate genes, and 607, 42, and 235 high confidence (HC) genes were identified in the intervals of Kgpc-2B, Kgpc-2D, and Kgpc-4A, respectively (Supplemental Table S5). We also analyzed the homologous genes of Gpc-B1, and a homologous gene, TraesCS4A01G242700, was found in the interval of Kgpc-4A. Further, TraesCS4A01G242700 of the parents, Ningmai 9 and Yangmai 158, was sequenced (Supplemental Table S6); however, no sequence differences were found. Gene expression data for TraesCS4A01G242700 were extracted from WheatEXP, indicating that TraesCS4A01G242700 was expressed in the grain, leaf, spike, and roots, with the highest expression level in roots (Supplemental Table S7). In addition, 23 of the HC genes in the interval of Kgpc-2D were expressed in different tissues at the reproductive stage, and the 17 genes expressed in the grain were analyzed (Supplemental Table S8). [...] indicating that it was difficult to be directly utilized in our practical breeding.

Discussion
Varieties with low GPC also contained some GPC-increased alleles, and lines with the expected high GPC could be produced by pyramiding more GPC-increased alleles. In this study, neither parent of the RIL population performed well in GPC. The additive effects of eight and nine QTL were contributed by Ningmai 9 and Yangmai 158, respectively, and some derived RILs that pyramided their GPC-increased alleles showed high GPC, which is consistent with many previous studies (Fatiukha et al. 2020; Li et al. 2009). Therefore, it is more feasible to improve GPC by exploring and pyramiding the GPC-increased alleles in local varieties.

Comparison of the candidate intervals for GPC with previous studies
In this study, we identified 17 QTL and 17 associated intervals using linkage mapping and association mapping, respectively. Most of the loci had a similar or close physical position to those reported in previous studies, particularly the ones selected for KASP marker development. Li et al. (2009) identified an interval on 1B related to GPC, grain hardness, and wet gluten content, and its physical position was close to Kgpc-1B. [...] Using high-density SNP assays, some QTL, such as Kgpc-2D and Kgpc-5A, were mapped to small intervals, which provided closer markers and facilitated gene exploration. Some adjacent loci were reportedly related to several quality traits and are worthy of further research and utilization. In addition, some loci, such as Qgpc-5B.1 and Qgpc-5B.2, and the associated intervals on 3D, were reported here for the first time, enriching the genetic study of GPC.

Development of available KASP markers for GPC
For quantitative traits, pyramiding more loci may produce a better effect. However, as the number of unlinked loci increases, the breeding population needs to be large enough to obtain a target genotype. When the number of unlinked loci is three or four, the minimum population sizes are 293 and 1177, respectively. As the number of loci increases to five and six, the population sizes expand rapidly to 4714 and 18861, respectively (Wang et al. 2007). In our practical breeding program, a large number of loci and large population sizes can make selection impracticable due to the high labor and time costs involved; therefore, 3-4 loci may be acceptable.
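The population sizes quoted above can be reproduced with a simple probability argument, sketched below. The source does not state the formula, so the following is an assumption: each unlinked locus is fixed for the target allele with frequency 1/4 (as for homozygotes in an F2), and the population must contain at least one target genotype with probability 0.99. The function name `min_population_size` and its parameter names are hypothetical; Wang et al. (2007) should be consulted for the original derivation.

```python
import math

def min_population_size(n_loci, p_success=0.99, p_locus=0.25):
    """Smallest N such that at least one individual carries the target
    genotype at all n_loci unlinked loci with probability p_success.

    Assumptions (not stated in the text): target genotype frequency of
    p_locus per locus, independent loci, and p_success = 0.99.
    Solves 1 - (1 - p_locus**n_loci)**N >= p_success for N.
    """
    p_target = p_locus ** n_loci
    return math.ceil(math.log(1 - p_success) / math.log1p(-p_target))

for n in (3, 4, 5, 6):
    print(n, min_population_size(n))
# prints 293, 1177, 4714 and 18861 for 3, 4, 5 and 6 loci
```

Each additional locus multiplies the required population size by about four, which illustrates why restricting selection to three or four loci keeps the breeding population manageable.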
In this study, we obtained three efficient molecular markers, and their combination produced a good selection effect, which depended on appropriate materials, a reliable mapping method, and strict validation. In the application trial, 15 lines with GPC-increased alleles were selected from 164 F6 lines and were used as candidate lines with high GPC. In practical breeding, to obtain more candidate lines, it is better to apply the markers in early generations, such as F2 and F3. F2 enrichment increases the frequency of selected alleles and is a strategy worth considering in MAS (Bonnett et al. 2005).

Exploration of candidate genes for GPC in the significant intervals
We obtained three significant markers by genetic analysis, and gene exploration in the intervals will be the next topic of study. On the one hand, gene exploration is a precondition for the development of functional markers; on the other hand, it can help to clarify the mechanism underlying grain protein synthesis. The physical sizes and numbers of HC genes of the intervals differed greatly, so the methods used for further gene exploration may differ. High-quality genome and transcriptome databases provided great support (Kuzay et al. 2019; Zheng et al. 2019). Gpc-B1, which encodes a NAC transcription factor, has been previously cloned (Uauy et al. 2006). We found a homologous gene, TraesCS4A01G242700, in the interval of Kgpc-4A, but both the sequencing results and the expression pattern of TraesCS4A01G242700 argued against a possible role in GPC accumulation. Therefore, fine mapping should be considered for the intervals of Kgpc-4A and Kgpc-2B, which contain too many genes. By combining the transcriptome data, 17 candidate genes expressed in the grain were identified in the interval of Kgpc-2D; however, further genetic and functional studies are needed.

Conclusion
In this study, we identified 17 QTL and 17 associated intervals by linkage and association mapping, respectively, and 9 of them were selected for KASP marker development. Large-scale association mapping was conducted on the basis of nine KASP markers and 1163 F4 breeding lines, and the markers Kgpc-2B, Kgpc-2D, and Kgpc-4A proved to be applicable for GPC selection, with their combination being more effective. The three markers were then used to test 164 F6 lines, and candidate lines with high GPC were successfully obtained. Further, the strategies for gene exploration in the three significant intervals were discussed. These results are expected to be useful for wheat quality breeding in the middle and lower reaches of the Yangtze River.

Figure 1
Quantitative trait loci (QTL) mapping for grain protein content
The numbers in the figure are consistent with the sequence numbers of QTL in Table 4.

Figure 4
Significance of the candidate markers for grain protein content

Figure 5
Comparison of grain protein content among the lines with different numbers of candidate markers
The numbers below the boxes indicate the numbers of candidate markers used for selection, and the markers were added in order of P value, from lowest to highest. The horizontal dashed line indicates the mean value of all lines; the horizontal line in each box indicates the median; '×' in each box indicates the mean value.