Candidate Genes and Stable QTL for Grain Yield and Seed Size in Durum Wheat

Background: In wheat grain yield is expressed as the product of different components. Among these, thousand kernels weight (TKW) reects the combination of several grain related traits including grain length (GL), grain width (GW) and area. Grain weight is also affected by phenological traits, such as heading time (HT) and plant height (PH). To detect stable QTL and candidate genes involved in phenotypic control of grain yield, a recombinant inbred line (RIL) population derived from two elite durum wheat cultivars (Liberdur and Anco Marzio) was evaluated for yield components and grain related traits for three growing seasons in southern Italy. The mapping population was genotyped with a 90K SNP array and a high-density genetic linkage map with 5134 markers was obtained. Results: A total of 30 QTL were detected on the durum RIL population including 9 stable QTL for TKW (2 QTL), GL, GW (2 QTL), AREA, HT and PH (2 QTL) distributed on 1B, 2A, 3A and 6B chromosomes. Interestingly, a QTL cluster mapped on 2A included a major QTL for HT explaining at least 70% of phenotypic variance and co-located with a QTL for YLD, TKW, GL and GW and AREA, respectively. In the physical position of this QTL cluster a photoperiod sensitivity gene (Ppd-A1) was found. Serine carboxypeptidase, Big Grain 1 and β-fructofuranosidase candidate genes were mapped in clusters containing stable QTL. Candidate genes involved in auxin metabolism were also found in QTL clusters in which a QTL for AREA was declared. Conclusions: This study showed that yield components and phenological traits had higher inheritances than grain yield, allowing an accurate stable QTL cluster detection. This was a powerful requisite to physically map QTL on the reference durum wheat genome and to identify candidate genes strongly affecting the genetic grain yield network.

Understanding the genetic and molecular determinants of grain size might provide valuable information on the markers to be used for improving grain yield. As extensively reviewed by Nadolska-Orczyk et al. [5], the major genes determining yield-related traits can be classi ed in several groups. Among these, there are: transcription factors, which can affect grain number by regulating spike development; genes involved in metabolism or signaling of growth regulators, affecting plant architecture; genes determining cell division and proliferation, related to grain size; oral regulators, which regulate in orescence architecture and seed number, and genes involved in carbohydrate metabolism, affecting plant architecture and grain yield.
The most advanced knowledge on the genetic factors controlling grain size were detected in rice. Indeed, quantitative genetic studies and map-based cloning have allowed to isolate several genes associated with grain size and weight in rice [6]. Rice OsGW2 gene was found to affect both grain weight and width [7], as well as its wheat orthologue, named TaGW2, which was mapped on chromosome 6A [8] and found to be related to grain weight, grain width and length [9]. Similarly, allelic variation at TaGS-D1, the wheat orthologue of rice GS3 [10], showed main effects on grain weight and kernel size [11]. Genes involved in starch and sucrose metabolism pathways were also shown to affect grain weight, such as TaSus1 and TaSus2 [12,13]and TaCwi-A1 [14]. In addition, major phenology loci can exhibit pleiotropic effects on spike and kernel traits. These include the locus Rht1, which have been intensively used since the Green Revolution in breeding programs worldwide, and the major photoperiod sensitivity locus Ppd1, which is known to affect heading time as well as a range of other traits [15]. In particular, Ppd1 genes play an important role in wheat growth and development regulation, by affecting the accumulation and distribution of dry matter within the plant and modifying source-sink equilibrium [16,17]. Indeed, it has been estimated that up to 35% of bread wheat's grain yield increment observed in Europe has been due to photoperiod insensitivity [18].
In the last decade, SNP markers have become fundamental for both genetic studies and breeding programs, and the development of wheat high-density SNP array [19] provided the most innovative tool for candidate genes searching by both QTL mapping or GWAS studies. Moreover, the recently released durum wheat genome, could provide powerful information to decipher complex marker-traits associations and simplify the searching and discovery of genes underlying important agronomic traits [20].
In the present study, a high-density genetic linkage map based on iSelect 90K SNP markers was realized using the durum wheat Liberdur x Anco Marzio recombinant inbred line (RIL) population that was evaluated to detect QTL controlling grain yield components and grain size. Furthermore, the availability of a full reference durum wheat genome [20], allowed the physical projection of the previously identi ed QTL genetic intervals on the reference Svevo genome, as well as the detection of candidate genes involved in phenotypic control of grain yield and grain size.

Phenotypic characterization
The two parental line, Liberdur and Anco Marzio, and the RILs mapping population were evaluated for yield components and grain size traits (YLD, PH, HT, TKW, GL, GW and AREA) for three growing seasons (2016, 2017 and 2018) in southern Italy (Valenzano, Bari, Italy). Analysis of variance revealed highly signi cant differences among RILs for all traits in each year (Additional le 1: Table S1), while the  combined analysis across years revealed signi cant effects of RILs, years and a strong genotype x year  interaction (Table 1). However, although the strong season effect, the genotype variability was higher than genotype x year component for all traits with the exception of YLD.
Mean values of Liberdur, Anco Marzio, and RILs in single season and across the three led trials are reported in Table 2. The two parents showed signi cant differences for HT, PH, AREA in all three seasons, and for TKW, GL and GW in two trials. In particular, Anco Marzio has generally larger grains compared to those of Liberdur, which were narrower. Analysis of the frequency distribution of traits in the RIL population was performed to have a preliminary idea of the genetic basis for each trait. A bimodal distribution was observed for HT indicating a single locus segregating in the RIL mapping population (Fig. 1). By contrast, a normal distribution obtained for YLD, PH, TKW and grain size traits would be indicative of several to many loci each contributing to a small proportion of the total variation observed. High transgressive segregation was recorded for all traits suggesting the presence of superior alleles for grain yield components in both parental lines. Low values of broad sense heritability were estimated for YLD, con rming that the trait was strongly affected by environmental conditions. High heritability values, exceeding 0.55, were found for HT, GL and AREA while TKW heritability values ranged from 0.45 to 0.72 (Table 2).
Phenotypic correlation among the traits across the three seasons are reported in Table 3. HT was negatively correlated to TKW (r = -0.24), GW (r = -0.36) and AREA (r = -0.19), while a positive correlation was shown with YLD (r = 0.55). As expected, the grain size traits were inherently correlated, such as AREA and GL (r = 0.86). TKW showed a high positive correlation with AREA (r = 0.95), GL (r = 0.70) and GW (r = 0.83).

Genetic Linkage Map
Out of 81,587 SNPs assayed, 5543 (6.8%) resulted as failed and 67,999 (83.3%) were monomorphic across the mapping population. The remaining 8,045 (9.86%) were polymorphic; however, 2,686 had more than 10% missing data and 225 with distorted segregation (at p ≥ 0.05 value) were excluded from further analysis. Hence, 5,134 (6.29%) markers were used for the genetic map construction. The 5,134 loci were grouped in 21 linkage groups when using a LOD score 5. Linkage groups were assigned to the A and B genome chromosomes using the durum consensus map [21]. Eight chromosomes (1B, 2A, 3B, 4B, 6A, 6B, 7A and 7B) were assembled in a single linkage group (Table 4). Twenty-tree loci assembled in a linkage group of 4A chromosome resulted to be coincident in the same position. Therefore, this linkage group was discarded from the QTL analysis. A total of 2085 markers were localized on the A genome with a total length of 1,145.25 cM, whereas 3,049 were mapped on the B genome (total length 1,062.69 cM). The entire map covered 2,207.94 cM with an average chromosome length of 157.71 cM. The lengths of individual chromosomes varied from 73.25 cM (chromosome 5B) to 197.22 cM (chromosome 7A). The overall SNP density was 2.3 markers/cM, with a maximum of 3.8 for chromosome 1B and a minimum of 1.2 for chromosome 4A. The SNP markers were generally well distributed throughout the genome, although some chromosomes exhibited higher densities.

QTL detection
A total of 30 putative QTL were detected on ten chromosomes in the durum wheat Liberdur x Anco Marzio RIL mapping population (Table 5). Two YLD QTL were located on chromosomes 2A and 7B respectively, explaining individually 11.9-67.9% of the phenotypic variance. QYLD.mgb-2A, declared in 2016 and across years, showed the additive effect being provided by the parental line Liberdur (Fig. 2). QYLD.mgb-7B was detected in 2018 and displayed the additive effect being conferred by Anco Marzio.
Two QTL conferring HT were detected on chromosomes 2A and 7A, accounting for 2.4 to 85.7% of the explained phenotypic variance. The -log10(P) scores of QHT.mgb-2A ranged from 44.6 to 69.7 and explained over 70% of the phenotypic variance in each year. This locus was stably detected in three years and across years, resulting as a major QTL, which additive effect was contributed by Liberdur.
Four QTL were found to be signi cantly associated with PH on chromosomes 2B, 3A, 6B and 7A, accounting for 7.8 to 17.5% of the explained phenotypic variance. QPH.mgb-3A and QPH.mgb-6B were stably detected in two eld experiments and across years. The additive effects at QPH.mgb-3A and QPH.mgb-6B were provided by Liberdur and Anco Marzio, respectively.  2) in the interval between 87.5 and 89.6 cM. Interestingly, among the ve QTL for TKW, two co-located together with QTL for GL, GW and AREA on chromosome 2A (both regions), and two co-located with QTL for GL or AREA on chromosomes 3A and 6B, respectively. Four out of 6 QTL for GL resulted co-located with QTL for GW on chromosomes 1B, 2A (two QTL) and 6B. QTL for AREA always co-located with grain traits QTL (TKW, GL, and GW) on chromosomes 2A (2 QTL), 3A, 4B and 7B.

Candidate genes involved in grain yield related traits
To identify candidate genes in physical regions underlying the QTL detected in the RILs mapping population, each region of interest was projected on the recently released reference durum wheat genome of cv. Svevo [20]. Regions with two or more co-locating QTL were considered as a QTL cluster (Table 6), and the sequences of anking markers of each of them were anchored to their physical position on durum wheat genome. A total of eight clusters were searched for gene content, and the retrieved IDs genes were screened for their annotated functional role and involvement in cellular metabolisms, which were con rmed by searching for paralogues and orthologous genes in closely related species. The QTL cluster size ranged from 0.4 Mbp (on chromosome 3A, where two QTL for TKW and AREA co-located) to 57.1 Mbp (on chromosome 6B, with a QTL for TKW and a GL co-locating).
Among the several candidate genes localized within the QTL cluster regions, three were particularly noteworthy, as previously reported to be directly involved in grain yield: TRITD2Av1G019250, encoding for pseudo-response regulator (Ppd-A1) in the 2A cluster that included QHT.mgb-2A (Table 5); TRITD4Bv1G171270, encoding for a Big Grain 1 protein in the 4B cluster, and two candidate genes encoding for an acid β-fructofuranosidase (TRITD6Bv1G005370 and TRITD6Bv1G005450), paralogues of the cell wall invertase gene, both in the 6B cluster containing QTL for GL, GW and PH.
Speci c protein classes were frequently observed, such as proteins involved in ubiquitination processes, including E3 ubiquitin-protein ligase and RING U-box superfamily proteins (identi ed in six QTL clusters), cytochrome P450 (identi ed in four QTL clusters) and thioredoxins (identi ed in two QTL clusters), as well as serine carboxypeptidase proteins (identi ed in two QTL clusters).
Interestingly, in all cluster in which a QTL for AREA was detected, candidate genes involved in auxin metabolism were found: TRITD1Bv1G118820, TRITD2Av1G189400, TRITD7Bv1G173200, encoding for protein involved in auxin response; TRITD4Bv1G175480, involved in auxin signalling, and TRITD3Av1G012070, a paralog gene of the YUC family encoding for a Flavin-containing monooxygenase, which is directly involved in auxin biosynthesis. Similarly, except for one cluster on 2A chromosome, in all QTL clusters in which the QTL for GW were co-located, candidate genes encoding for cytochrome P450 were found.

Discussion
Grain yield is considered a typical quantitative trait, controlled by a complex genetic system and strongly in uenced by both environmental factors and agronomic management. Grain yield re ects the combination of thousand kernels weight and grain number per area. Besides being one of the key components of grain yield, thousand kernels weight is also often used as standard parameter both for our-milling yield and marketing standard. Thousand kernels weight is mainly and tightly underpinned by grain morphology including grain length, grain width and area. Grain yield can be maximized by growing varieties which heading time allows the crop to avoid stresses during grain-lling phase [17]. Indeed, heading time is a critical stage that delimits the duration of spike formation and marks the transition into the grain-lling period, during which grain per spike and grain weight are both de ned. In addition, in the last century, grain yield improvement was obtained through increased harvest index and straw strength with the introduction of major genes conferring reduced plant height.
In this study, YLD, TKW, morphology grain traits (GL, GW and AREA), HT and PH were investigated by using a RILs mapping population consisting of 133 lines derived by crossing the elite durum wheat cultivars Liberdur and Anco Marzio. Plant materials were grown for three years in eld trials.
A total of 30 QTL were identi ed and localized on 10 out of 14 durum wheat chromosomes. Among them, there were 9 stable QTL for TKW (2 QTL), GL, GW (2 QTL), AREA, HT and PH (2 QTL), distributed on 1B, 2A, 3A and 6B chromosomes, which were detected at -log10(P)>3.0 in at least two years and across years (Table 5). Interestingly, 6 out of 9 stable QTL were co-located in QTL clusters on chromosome 2A, while the other three were detected on 1B, 3A and 6B chromosomes (Fig. 2). Furthermore, 4 out of 8 QTL clusters were mapped on 2A and 6B chromosomes, and the remaining ones located on 1B, 3A, 4B and 7B.
The rst QTL cluster on chromosome 2A, comprised within 35.6-36.4 cM, included two stable QTL (QHT.mgb-2A and QTKW.mgb-2A.1) along with a QTL for YLD, GL, GW and AREA. In this region, the QHT.mgb-2A can be considered a major QTL involved in the phenotypic control of HT as it was declared in each eld trial with a -log10(P) ranging from 44.0 to 69.7, and a R 2 from 74.3 to 85.7, respectively. The candidate genes analysis performed on durum wheat genome detected a pseudo-response regulator protein (TRITD2Av1G019250), corresponding to a photoperiod sensitivity gene (Ppd-A1), within the physical position of this QTL, thus con rming the effect of Ppd-A1 gene on YLD. QHT.mgb-2A.1 was previously detected in the same genomic region co-locating with a QTL for YLD in a durum multi-parental cross population [22]. Wilhelm et al. [23], found two large deletions within the Ppd-A1 gene in durum wheat (1027 and 1117 bp deletion designated as alleles 'GS-100' and 'GS-105, respectively), which remove a common region from the wild-type sequence. Therefore, our results suggested that the parental lines Liberdur and Anco Marzio could be different for the alleles at Ppd-A1 locus. Similar conclusions were reported by Maccaferri et al. [24], who identi ed a QTL reducing heading date associated with Ppd-A1 gene in a RILs population derived from the cross 'Kofa' ('GS-100' allele) × 'Svevo' ('GS-105' allele), thus suggesting that these alleles decrease photoperiod sensitivity at different degrees. Heading time resulted strongly related with grain yield (r=0.55), underlining the importance of phenology tting in maximizing grain yield as observed by Royo et al. [25]. 2) declared in three years and across years. The positive additive effect was provided by the parental line Liberdur for all four QTL. As shown by correlation analysis data, TKW was always positively and signi cantly correlated to GL, GW and AREA. Considering both these results, it can be suggested that TKW improvement could be due to the grain size increase. A QTL for TKW was previously detected in this region by a genome wide association mapping study on a tetraploid wheat collection [27], thus validating the presence of a metaQTL regardless of the identi cation approach. Six putative candidate genes were included in this QTL cluster, and among them, a serine carboxypeptidase protein. Previous studies have shown that serine carboxypeptidase enzymes are involved in several biological processes, including development of plant organs [28,29], cell division [30] and cell elongation [31]. Indeed, by promoting cell division, these enzymes determine larger grain size due to an increased cell number [30]. So far, our result was not unexpected considering that also other studies on dissecting grain size detected a strong relation with serine carboxypeptidase proteins [32,33].
A QTL cluster on 6B chromosome included the stable QTL QPH.mgb-6B, which was detected in the same region where QTL for PH were already reported by Canè et al. [34] and Soriano et al. [35]. Two additional QTL co-localized in this cluster, QGL.mgb-6B.1 and QGW.mgb-6B, where Canè et al. [34] detected a QTL for test weight, a trait which is known to be affected by grain size [36]. Therefore, it could be supposed that high values of test weight might be determined by increasing GL and GW. The co-localization of QTL affecting different traits such as PH, GL and GW, implies closely linked genes involved in different biological processes related to yield [37]. Among the identi ed putative candidates, two genes, TRITD6Bv1G005370 and TRITD6Bv1G005450, both encoding for acid β-fructofuranosidase enzyme. These two genes have 22 paralogues in the B genome of durum wheat, which probably diverged from a common ancestral gene and evolved by duplication. More interesting is the fact that these genes are paralogues of cell wall invertase genes. β-fructofuranosidase (EC 3.2.1.26) are indeed commonly known as invertase in plants. These genes are involved in the carbohydrate metabolic process and have a common molecular function, as invertase/fructofuranosidase belong to the Glycoside hydrolase family 32 [38]. Ma et al. [14] reported an association between a cell wall invertase gene (TaCwi-A1) and TKW, useful to improve grain yield. Li et al. [39] showed that expression of the cell wall invertase genes signi cantly improved shoot growth, grain yield and starch content in transgenic maize plants, and speci cally increased both grain size and grain number.
QTKW.mgb-6B and QGL.mgb-6B.2 were co-located on the second QTL cluster on 6B, suggesting that TKW improvement could be due to GL increase. In the same region, Eloua and Nachit [40] reported a QTL for TKW using a linkage mapping population. Notably, the positive additive effect of the ve QTL colocating in the QTL clusters on 6B chromosome, was provided by the parental line Anco Marzio, indicating that this durum cultivar has potentially useful alleles for grain yield improvement.
Two QTL for grain size (QGW.mgb-4B, QAREA.mgb-4B) were found to co-localize on 4B chromosome. No QTL for grain size has been previously mapped in this region, indicating that could be considered a new QTL cluster associated to grain size. The positive additive effect was provided by the parental line Liberdur. Among the candidate genes identi ed within the 4B cluster, one was particularly noticeable, a Big Grain 1 protein (TRITD4Bv1G171270), which has been annotated as a positive regulator of auxin response and transport, as well as a regulator of grain size. The rice orthologue gene has been well characterized by Liu et al. [41], who showed how its activation signi cantly improved grain size. This protein, localized in plasma membrane, is induced by auxin treatment and its expression in vascular tissues could improve plant productivity, the most signi cant change being observed for increased grain size and grain weight.
The QTL cluster located on 1B was reported for the rst time in this study, as no previous QTL for GW and GL have been reported in this region. Nevertheless, the positive additive effect of the stable GW QTL negatively affected the GL, suggesting that linked and/or pleiotropic genes for grain size map in this region.
QTL clusters on 3A and 7B chromosomes included a QTL for AREA, which co-localized with a TKW and a GW QTL, respectively, con rming the positive and highly signi cant correlation among TKW and grain size.
Candidate genes searching highlighted two more interesting regions on 3A and 5A chromosomes, respectively coincident with a PH QTL and a GL QTL (Fig. 2). Indeed, the candidate genes screening on 3A chromosome identi ed a serine carboxypepdidase, TRITD3Av1G187020, and a NADH-dependent glutamate synthase (NADH-GOGAT), TRITD3Av1G177110, con rming their involvement in grain yield [32,33,42]. Additionally, an interesting gene was found in the GL QTL region on 5A chromosome, TRITD5Av1G037950, encoding for an expansin protein, a paralogue of TRITD2Av1G019110 gene, reported in the rst 2A QTL cluster, which belong to a gene family previously found related to both grain size and yield [43,44].

Conclusions
In this study showed how yield improvement could be pursued considering yield components (TKW, GL, GW and AREA), as well as phenology related traits (PH and HT). These yield sub-components showed higher inheritance than grain yield, allowing a more accurate and powerful stable QTL detection. Physical anchoring of these QTL on the reference durum wheat genome cv. Svevo, enabled the identi cation of candidate genes strongly affecting the genetic grain yield network. Therefore, the availability of SNPs markers within candidate genes sequences, might represent a new breeding strategy based on functional markers, determining a more e cient grain yield genetic gain. During each eld trial heading time (HT) was recorded as the number of days from March 1 st , to 50% earemergence, corresponding to stage 55 on the Zadoks et al. [45] scale. Plant height (PH) was measured at complete maturity of plants, and grain yield (YLD) was measured after harvesting the plots. The morphometric grain related traits were determined by digital imaging analysis. For each replication of each line, 10 g of kernels were scanned using high resolution scanner-based image analysis. The images were processed using the Image-Pro Plus 7.0 software (Media Cybernetics, USA). Grain length (GL), grain width (GW), grain area (AREA), and kernels number were measured. The kernels number was used to calculate the thousand kernels weight (TKW).

SNP genotyping and linkage map construction
Genomic DNA from each RIL and parental line (Liberdur and Anco Marzio) was diluted to 50 ng/μL and further analyzed with the wheat 90K iSelect array [19]. Genotyping was performed by TraitGenetics GmbH [46] following the manufacturer's recommendations as described in Akhunov et al. [47]. The genotyping assays were carried out using the Illumina iScan reader and performed using GenomeStudio software version 2011.1 (Illumina, San Diego, CA, USA).
Chi-squared tests were used to determine the goodness-of-t at p > 0.001 of segregation ratios to expected 1:1 ratio for each SNP. All markers with more than 10% missing data or segregating as presence/absence in the mapping population were excluded from further analysis. Linkage analysis between markers and determination of the linear order of loci was performed by QTL IciMapping 4.1 using the BIN and MAP functions [48]. Grouping was performed using the independence LOD parameter, with groups showing a LOD 5. The Kosambi mapping function was used to calculate map distances. SNPs data from the durum consensus map [21] were used as anchor loci and for assigning linkage groups to speci c chromosomes. Linkage groups were named accordingly to the wheat chromosome nomenclature followed by a number linkage groups within chromosomes.

Statistical analysis and QTL detection
Analysis of variance (ANOVA) was performed to test the signi cance of differences among the RILs and replications, using the MSTAT-C software. The effects of replicates and genotypes were considered in the model in each year. All data from each eld trial were tested by combined analysis of variance using the statistical model as follows: Y ijk = μ + g i + α k + δ ik + β jk + ijk (1) Where μ is the general mean, g i is effect of the i-th line, α k is effect of the k-th year, δ ik is interaction effect between the i-th line and the k-th year, β jk is effect of the j-th replication within the k-th year. The g i , δ ik and 's effects are assumed independently and randomly distributed, with zero means and variances of σ 2 G , σ 2 GY and σ 2 , respectively.
The phenotypic traits' average values of each eld trial (2016, 2017 and 2018) and the mean across trials were used for QTL analysis. Inclusive Composite Interval Mapping (ICIM) method was employed for QTL mapping using IciMapping 4.1 software [48]. A threshold P value of 0.001 (−log10(P) ≥ 3.0) was used for QTL detection [51][52][53], while suggestive QTL were considered at the sub-threshold 2.5<-log10(P)< 3.0 when declared at least in one year. A QTL was considered stable when detected at -log10(P)≥3.0 in at least two years [35,[53][54][55]. The phenotypic variation explained (PVE = R 2 ) and additive effect were estimated for each detected QTL. Positive or negative additive effect indicates the increasing or decreasing effect of the parental line Liberdur. QTL were named according to the rules of International Rules of Genetic Nomenclature [56]. The QTL name combined the traits evaluated ('YLD', 'PH', 'HT', 'TKW', 'GL', 'GW', and 'AREA') and the Research Institute (Genetics and Plant Breeding Section, University of Bari, 'mgb') that carried out the experiments. Graphical representation of linkage groups and QTL was carried out using MapChart 2.2 software [57].
Candidate genes research QTL intervals detected on genetic map were physically mapped on the durum wheat reference genome Svevo [58]. Left and right anking markers of each con dence interval were rst searched in the durum consensus map [21], and then projected on Svevo genome. When the marker on RIL mapping population was not mapped on the consensus map, the closest one was chosen.
The projected anking markers were searched and positioned on the durum reference genome, and the annotated genes within each interval were screened based on their con dence and functional annotation.