Genome-Wide Association Study of Multi-Traits in Common Bean (Phaseolus Vulgaris L.)

Background: Micronutrient deciencies caused by lack of Iron (Fe), Zinc (Zn) and Vitamin A have negative effects on human health worldwide. Even though common bean meets Fe and Zn deciencies based on recommended dietary allowances, the cooking time remains an important challenge. Understanding the genome organization, the loci and genes localization controlling multiple traits is crucial for developing enhanced micronutrient content and improved common bean with short cooking time. Results: In this study, GWAS method was used to determine SNPs associated with four traits studied, including water uptake, cooking time, Fe and Zn concentration level, using ~5,000 SNPs for a panel of 206 genotypes. By the analysis of population structure, 206 common bean genotypes were formed in three groups, the Mesoamerican gene pool, the Andean gene pool, and the admixture gene pool using the Bayesian method. With the signicance levels between p = 2.94 x 10 -08 to p = 8.33 x 10 -03 and phenotypic variation of 8.26% to 18.19%, total of 10 SNPs was found to be signicantly associated with traits on chromosomes Pv03, Pv04 and Pv10. Among of these SNPs, four SNPs were associated with water uptake, another four SNPs with the cooking time, one SNP with Zn concentration and one of less signicant SNP associated with Fe concentration. Within the range or nearby the peak of signicant SNP, putative loci encoding pectin degradation, cell structure and cellular exchange/transport were identied, suggesting these loci may play important roles in modulating expression of traits. Conclusion: The resultant data in this study contributed on deciphering the molecular mechanisms associated with water uptake, cooking time and micronutrient contents in common bean. The investigation of association between phenotypic and genotypic data has showed that GWAS approach is a powerful tool to dissect genetic architecture of complex traits and cooperate molecular breeding for advanced genotypes with enhanced micronutrients and short cooking time.


Background
Common bean (Phaseolus vulgaris L.) was domesticated in two centers, the Andean Center of Domestication (ACD) or andean gene pool and Mesoamerican Center of Domestication (MCD) or Mesoamerican gene pool from Andes mountains of Peru and Lerma-Santiago basin of Mexico respectively [1]. Common been is one of the most important legume crops that contributes to human nutrition [2]. It provides an important source of nutrients for more than 300 million people in different parts of the world, including Eastern Africa and Latin America where it represents 65% of total protein consumed, and 32% of energy [3], [4]. It is also the major source of micronutrients including Iron (Fe) and Zinc (Zn) [5] that are essential for the normal growth, development, reproduction and other physiological functions in the human body [6]. Fe and Zn bioavailability may be impeded by phytate content in common bean even though they provide a rich source of Fe and Zn. Some studies showed that the iron bioavailability, i.e. the amount that a person would absorb, is higher in the quicker cooking beans from different bean market classes [7] as micronutrients gradually decreased with the increasing of cooking time [8]. In order to improve micronutrient bioavailability, pre-soaking, germination and or fermentation of seeds in water for at least 12 hours are adopted to reduces the phytic acids [9] for accelerating the cooking time. It is believed that fast cooking with enhanced micronutrient bioavailability has impact on bean consumption.
More than 2 billion people around the world are suffering from vitamin and micronutrient de ciencies [10]. Imbalances of these nutrients lead to a signi cant risk of illness and mortality among children under ve years of age, pregnant women and lactating mothers [11]. Common beans provide a rich source of Fe and Zn based on recommended dietary allowances; however, they often take a long time to cook [12]. Many communities around the world have limited fuels for cooking. They mainly rely on burning wood, charcoal or other available source of biofuels. Cooking of common bean is time consuming; therefore, leading to deforestation in areas using burning wood as main source of energy [13]. Traits of cooking time and micronutrient contents in common bean varies as changes in physical and chemical property (i.e. seed size, thickness and seed coat color) and environment (relative humidity and storage temperature) [14], which is genetically regulated. Understanding the genomic regions related to the variation of cooking time and micronutrient contents allows to genetically manipulate these traits and helps to improve this crop with enhanced micronutrient content and short cooking time, which may lead to higher consumption.
Genome-wide association study (GWAS) has been used to determine genomic variants associated with traits of interest using recombinant inbreed line population or landraces population [15], [16]. Literatures have demonstrated that some genomic regions have been identi ed to be associated with traits, such as bruchid resistance [17], cooking time of andean diversity panel [18], agronomic traits [19], drought tolerance [20], anthracnose and angular leaf spot diseases [21] and symbiotic nitrogen xation [22] in common bean. However, there is a limited information on grain micronutrient contents of Fe and Zn as well as seed water imbibition during pre-soaking and cooking time. Therefore, the aim of this study was to dissect these complex traits, identify genomic regions associated with variations of traits, and reveal putative genes mediating traits using the GWAS technology in a diverse population of common bean.

Phenotyping multiple traits
There were large variations in traits of water uptake and cooking time among genotypes studied. The initial moisture level across the samples ranged between 10-14% (data not shown). After soaking in water overnight, the maximum water uptake was 61.54% and minimum was 3.7% in this collection of genotypes ( Figure 1a). The longest cooking time was 76 minutes and 15 seconds while the shortest was 14 minutes and 17 seconds (Figure 1b). Negative correlation of r = -0.20 at p<0.001 was observed between water uptake and cooking time among the common bean genotypes.
Traits for seed micronutrient contents of Fe and Zn showed signi cant difference (p<0.001) among the genotypes and environments. The concentration of Fe varied from 47.00 to 112.70 mg/kg (Figure 1c) while Zn concentration had a range of 21.73 to 42.50 mg/kg (Figure 1d). There was a positive correlation of r = 0.51 at p<0.001 between Fe and Zn contents. The positive correlation illustrated that Fe concentration increased as Zn increased in both locations, suggesting micronutrients Fe and Zn might share a similar metabolic pathway, such as transport and translocation, for uptake and assimilation of Fe and Zn.
Population structure A large number of genome-wide DNA markers could cover most recombination variation in a natural population. In this study, the SNP array containing 5052 SNP markers was used to assess population structure through the principal component analysis. The collection of 206 common bean genotypes is divided into three clusters (K=3) using the Bayesian method, in which the rst cluster (K=1) consisted of 113 genotypes belonging to the Mesoamerican gene pool; the second cluster (K=2) consisted of 72 genotypes belonging to the Andean gene pool and the third cluster (K=3) consisted of 21 genotypes belonging to the admixture gene pool (Figures 2). Since this collection contained some breeding lines derived from different sources of parents, there was a possibility that offspring shared Mesoamerican and Andean gene pools leading to these "mosaic" chromosomes in the third cluster.
To evaluate the difference of micronutrient concentration levels among gene pools, the concentrations of Fe or Zn of genotypes in each group were calculated for their median concentration at different locations. The result demonstrated that the median concentration of Fe or Zn in Mesoamerican and Andean gene pools was not signi cantly different, except that the concentration of Zn was signi cant difference between two gene pools at SARI. However, the median concentration of Fe and Zn in the admixture group was signi cantly similar to Mesoamerican gene pool than to Andean gene pool depended on the location plants grow (Figure 3).

Marker-trait associations
Water uptake and cooking time The water uptake of a dried mature seed represents the capability of seed imbibition that is related with the cooking time, i.e. higher water uptake, shorter cooking time. A total of 10 signi cant SNPs associated with water uptake was identi ed on the Pv03, Pv04, and Pv10 ( Figure 4). The most signi cant SNP was ss715649433 (p = 2.94 x 10 -08 ) located on Pv04, which accounted for 18.19% of phenotypic variability (Table 1). For cooking time trait, 11 SNPs were detected on the Pv03, Pv04, and Pv10 and explained about 9.31% -10.06% phenotypic variations and the highest signi cant SNP was ss715643803 (p = 2.14 x 10 -05 ) on Pv03.

Micronutrient concentration
Micronutrients are essential for human growth and play an important role in balanced human health. Common bean provides a rich source of Fe and Zn for human. In this study, Fe and Zn considered as signi cant invisible traits in common bean genotypes. Signi cant SNPs were identi ed for Zn on the Pv04 ( Figure 4) and the most signi cant (p = 5.98 x 10 -05 ) SNP ss715646134 explained 11.09% of phenotypic variation (Table 1). Signi cant SNPs were identi ed for Fe on Pv10 with the most signi cant SNP (ss715646330) at p = 8.33 x 10 -03 explaining 8.26% of phenotypic variation (Table 1).

Putative causative loci
Traits evaluated in this study displayed a quantitative feature as continuous distribution of values in this collection. The GWAS approach was used to dissect genetic architectures of these complex traits and to uncover the causative loci for traits. Signi cant SNPs identi ed on chromosome regions associated with various traits facilitate the identi cation of putative causative loci nearby SNP peaks. In this study, several putative candidate genes related to traits studied were identi ed at the anking sequence of signi cant SNPs. For instance, nuclear pore complex protein GP210 (PHVU_004G008000g), galactinol-sucrose galactosythransferase 5 (PHVU_004G007100g), and glycosyltransferase BC10 (PHVU_004G007000g) were linked to signi cant SNPs located on Pv04 for the trait of water uptake; oligoudylate-binding protein 1-like (PHVU_003G129100g), probable polygalacturonase (PHVU_003G129300g), and ubiquinone biosynthesis protein COQ9 (PHVU_003G128500g) were near signi cant SNPs in chromosome 3 for cooking time; transmembrane 9 superfamily member 3-like (PHVU_004G145000g); transcriptional factor MYB52 (PHVU_004G144900g); and probable cyclic nucleotide-gated ion channel 20 (PHVU_004G144200g) were associated with Zn concentration within signi cant SNP peak on Pv04. Several putative causative loci underlying traits were listed in Table 2.

Discussion
Genetic diversity of common bean provides a source of variations in various agronomic traits for improvement of varieties. The genetic variation characterized by different molecular markers can be utilized in plant breeding via marker-assisted selection to pyramid favorable loci into a cultivar. Two gene pools are available in common bean and most of the genotypes collected in this study belongs to Mesoamerican gene pool. However, a separated group of genotypes (admixture) was observed between the two gene pools after the analysis of population structure. This admixture group consists of breeding lines that stemmed from the hybridization of genotypes of the two gene pools. Similar results were observed from previous studies [23]- [25]. Breeding for crop improvement is depending on the enhanced of single or multiple traits that is desirable for food and nutrition systems [26]. In this study, the median concentration of Fe and Zn from genotypes in the admixture group was higher than the median concentration of genotypes in the two gene pools, suggesting that the Fe or Zn concentration in hybridization bean is mediated by minor additive effect and in uenced by environment. Micronutrient uptake and accumulation traits in plant are heritable [27]; few additive genes controlling Zn concentration were also reported in rice, maize and soybean [28]- [30]. Trait controlled by additive effect has been bene cial to effectively breeding for micronutrient improvement.
Common bean contains high levels of important micronutrients especially Fe and Zn compared to the low level of Fe and Zn in many other staple food crops [31]. A previous study demonstrated that QTLs related to the concentration levels of Fe and Zn were identi ed on Pv02, Pv06, Pv07 and Pv08 by utilizing microsatellite and RAPD markers [32]. Another study reported that SSRs markers associated to the Fe concentration level on Pv02, Pv05, Pv06, Pv07, Pv09 and Pv10, while markers associated with Zn on Pv01, Pv03, Pv05, Pv07 and Pv10 with one marker in Pv04 [33]. In this study, probable cyclic nucleotidegated ion channel 20 (CNGC20) locus was observed nearby the signi cant SNP associated with the Zn concentration on Pv04. Zinc is micronutrient essential for all living organisms with key roles related to growth, development, and defense. Zinc ions are involved in oxidation/reduction processes, thus have both bene cial and toxic effects on plant cells. CNGCs are believed to be involved in the uptake of both essential and toxic cations [34]. The identi ed CNGC20 in common bean may play a role for maintaining Zn homeostasis. Although Zn seems to affect the capacity for water uptake and transport in plants [35], the correlation (r = 0.06) between Zn concentration and water uptake obtained in this study did not support this conclusion.
For the trait of Fe concentration, although we did not identify highly signi cant SNPs associated with Fe concentration, a putative causative locus was detected very closed to a less signi cant SSR (8.33 × 10 − 03 ) located on Pv10. This gene is annotated as cation/calcium exchanger 4 (CCX4) and identi ed as cation transporter. In Arabidopsis, AtCCX4 is expressed throughout the plant [36]. Although the mechanisms underlying the tra cking of iron are not clearly understood, chloroplast is the iron-richest system in plant cell [37]. We also identi ed another locus cytochrome P450 (CYP) (PHVU_010G126900g) within same peak of signi cant SNP. The CYPs are enzymes that oxidize substance using iron [38]. Because iron is highly reactive and need to shield Fe ions interaction with other molecules causing oxidative damage to the cell [37]. The CCX4 identi ed in this study may play a key role for cellular iron homeostasis as CNGC20. The correlation (r = 0.13) between Fe concentration and water uptake may suggest that a higher concentration of Fe may have a synergized effect with water uptake.
Interestingly, nuclear pore complex protein GP210 locus was detected on Pv04 and the most signi cant SNP was exactly resided within the sequence range of this locus. The nuclear pore complex (NPC) controls all tra cking of molecules in and out of the nucleus [39]. GP210 is a membrane-spanning glycoprotein and a major component of the NPC, mediating molecular transport between nucleoplasm and cytoplasm [40], [41]. Whether GP210 gene affects water uptake in common bean, needs further detail investigation because GP210, not only plays a role in molecules transit; but also, is required for nuclear pore assembly. Knockdown research study of the GP210 gene in the C. elegans embryos, prevented the accumulation of aberrant structures on the nuclear membranes [41]. In addition, a gene encoding subtilisin protease SBT1.4 was observed in the anking region of signi cant SNP on Pv10. Subtilisin proteases (SBTs) are serine proteases controlling diverse developmental processes with six distinct subfamilies [42]. For instance, MtSBT1.1 in M. truncatula controls seed size in legume while the gene AtSBT1.7 is essential for mucilage release from seed coats, triggering the accumulation and/or activation of cell wall modifying enzymes necessary for the loosening of seeds in Arabidopsis [40]. It is believed that the SBT1.4 was associated with leaf senescence in Arabidopsis [43], [44]. The function of SBT1.4 in common bean is not fully understood, analysis of loss-of-function mutants would provide evidence and elucidate the function of SBT1.4 in seeds. A study in common bean has revealed a correlation between water uptake and the seed coat luster [45]. Because shiny seeds have rmer texture controlled by the Asper (Asp) gene, this gene is the major determinant of water uptake and is located on Pv07. However, Asp gene is not detected within the regions of signi cant SNPs on Pv07 in this study, suggesting that this gene may not be directly related to water uptake.
The cooking time in common bean has become an important trait due to its association with the volume of fuel, water and time spent on food preparation. Among traits interested by consumers, cooking time was prioritized in farmers with proportional of 63% followed by taste (51%) and atulence (50%) as stated in consumer acceptance report [38]. Cooking time is accompanied with water uptake trait. In this study, water uptake and cooking time were signi cantly associated with negative correlation r=-0.20. These two variables indicated that a decrease in cooking time predictably relies on the increase in water uptake or vice versa.
Reports from previous studies showed that genomic regions associated with cooking time were identi ed on different chromosomes, such as Pv06 [18]; Pv02, Pv03 and Pv06 [46]. In this study, in the upstream of the signi cant SNP on Pv03, a probable polygalacturonase homolog was detected as a candidate causal locus. Polygalacturonase is a major component of the pectin remodeling and disassembly network [47]. It is involved in catalyzing the degradation of pectin in the cell wall via the hydrolytic cleavage of glyosidic bonds [48], probably making seeds soften and shorten the cooking time.

Conclusion
GWAS technology is a powerful tool to determine the chromosome regions associated with various traits and facilitate the identi cation of candidate causative genes within these chromosome regions. We have demonstrated that several putative causal loci mediating four traits were identi ed through this GWAS analysis. Further validated loci could be used in common bean breeding to develop varieties with higher concentration level of micronutrients, e cient water uptake for shorter cooking time.

Plant materials and eld experiment
A natural population of 206 common bean genotypes collected from International Centre for Tropical Agriculture, CIAT -Uganda (179), Ethiopia (12), Kenya (10), Tanzania (3) and Rwanda (2) were used in this study [49]. The experimental materials were planted in the northern Tanzania at medium altitudes with 1407 m above sea level (a.s.l) of S03 21.690' and E36 37.879' (Selian Agricultural Research Institute, abbreviated SARI) and at low altitudes with 992 m a.s.l of S03 19.905'and E037 14.067' (Tanzania Coffee Research Institute, commonly called Lyamungo) in 2017 and 2018. The soil characteristics of these areas were Eutrophic Brown Soils on volcanic and Alluvial sediments -medium texture (loamy soils) [50]. The range of Fe was 29.85 mg/kg to 39.24 mg/kg and Zn was 0.33 mg/kg to 0.60 mg/kg. In both seasons and agro-ecologies, the experiments were laid-down in Incomplete Block Design with two replications for 206 genotypes. The plot size was 3.2 m length and 1.5 m width with 4 rows per plot and 0.5 m inter row spacing and 0.2 m intra row spacing under rainfed. Middle two rows were harvested and used for data collection.

Sampling and sample preparation for Fe and Zn determination
At full maturity, 30 well lled pods of each genotype were randomly harvested from two centered rows in each plot. Total of 100 seeds/genotype/location were sampled and dried in an Oven (Binder drying chamber, Model: ED 115, Tuttlingen, German) for 12 hours at 60℃. The moisture of dried seeds was measured by Grain Moisture Tester -Soybeans (Baton cooperation. Model: 8500, Michigan 48084) and seeds were ground by Retsch SK 100 (Retsch GmbH, Model: SK 1001C, 42781, Haan, Germany) with a stainless-steel grinding ring to produce bean ne powder. About 5 -10 g our of each genotype was scanned for the content of Fe and Zn (mg/kg) from different locations using X-ray Fluorescence (XRF) (Bruker AXS GmbH, Model: Tracer 5i, Östliche Rheinbrückenstr, 76187 Karlsruhe Germany).
Sampling and sample preparation for water uptake and cooking time determination The rest of harvested seeds from each plot was bulked, sun dried in a paper bag sized 6.5 cm ×14 cm and stored at room temperature. When seed moisture reached between 10-14%, 30 seeds from each genotype were weighted (g) using electronic scale (SONASH ®, Model: SKS-006, Frances). The dried seeds were soaked in a jar with 150 ml water overnight. Thirty-seed weight was measured again to quantify water uptake (ml) using the formula (i) below. Then, 25 seeds were placed on the top of each well in a Matson Cooker (Customized Machining and Hydraulics Co., Winnipeg, Canada), which consists of a plate with 25 wells for individual seeds and 25 metal pins [51]. Cooking time in minutes were recorded when the 20 th pin drop down, which represents the time of 80% seeds to be completely pierced with 85 g stainless steel rod with a 2 mm pin [51], [52]. The average cooking time was determined for each genotype based on two locations.

Phenotypic data analysis
Analysis of Variance (ANOVA) from GenStat software was used to determine the signi cant effects (p<0.05) among the genotypes and locations used and Fisher's protected least signi cant differences (LSD) was applied to identify the populations whose means differs statistically [53], [54]. The formula in excel analysis software and Pearson correlation were used for plotting graph, correlation, regression, scatter plots and pie charts. . The ideal number of sub-populations was determined using K method [57] implemented in the HARVESTER software [58].

Marker-trait association analysis
Filtering the SNPs with monomorphic marker was set at <2% of minor allele frequency (MAF) and 5052 SNP markers were retained and used in principal component analysis (PCA) and population structure analysis. The obtained SNP markers in respective distribution of chromosomes were employed in association analyses with TASSEL 5.0 software [59]. The following mixed linear model (MLM) formula (ii) was used: Where, Y is Phenotype of each genotype; γ is the xed effects of the SNP; ρ is the xed effect of population structure from PCA results; k is the random effect of kinship relative; ε is the term error under normally distribution with mean =0 and variance δ2. The statistical model was used to test for traitmarker association studies [60].
Putative candidate loci linked to traits By GWAS analysis, signi cant SNPs associated with traits can be detected in speci c regions of chromosomes. To identify putative candidate loci in uencing on traits, coding genes were explored within or nearby the signi cant SNP region based on the common bean reference genome (https://plants.ensembl.org/Phaseolus_vulgaris/Location/Genome? fbclid=IwAR1rLGjP0bqPtzMoJTGKoeU7KCRGYmXtcUlRfYgbltmf-KfHLhLgjKxNk8k) and annotated by BLAST in the NCBI (https://www.ncbi.nlm.nih.gov).