Genome Wide Association Studies for Four Physiological Traits in Groundnut (Arachis hypogaea L.) Minicore Collection

Background In genotypes, identification of genomic regions/genetic markers for drought surrogate traits is essential. We used SNP markers for a genetic analysis of the ICRISAT groundnut minicore collection for genome wide marker-trait association for some physiological traits and to determine the magnitude of linkage disequilibrium (LD) present in the genetic resources. Results The LD analysis showed that about 36% of loci pairs were in significant LD (P < 0.05 and r2 > 0.2) and 3.14% of the pairs were in complete LD. There was rapid decline in LD with distance and the LD was <0.2 at a distance of 41635 bp. The marker trait association (MTAs) studies revealed 20 significant MTAs (p <0.001) with 11 markers for leaf area index (4), canopy temperature (13), chlorophyll content (1) and NDVI (2). The markers explained 2 to 21% of the phenotypic variation observed. Most of the MTAs identified on the A subgenome were also identified on the respective homeologous chromosome on the B subgenome. The duplications of effect observed could be due to common ancestor of the A and B genome which explains the linkage detected between markers lying on different chromosomes seen in the current study. present study a total 20 highly significant associations with 11 for four LAI, CT, SCMR The markers identified in this study serve as useful genomic to initiate marker-assisted selection and trait of for The identified markers this study useful after


Abstract Background
In order to integrate genomics in breeding and development of drought tolerant groundnut genotypes, identification of genomic regions/genetic markers for drought surrogate traits is essential.
We used SNP markers for a genetic analysis of the ICRISAT groundnut minicore collection for genome wide marker-trait association for some physiological traits and to determine the magnitude of linkage disequilibrium (LD) present in the genetic resources.

Results
The LD analysis showed that about 36% of loci pairs were in significant LD (P < 0.05 and r2 > 0.2) and 3.14% of the pairs were in complete LD. There was rapid decline in LD with distance and the LD was <0.2 at a distance of 41635 bp. The marker trait association (MTAs) studies revealed 20 significant MTAs (p <0.001) with 11 markers for leaf area index (4), canopy temperature (13), chlorophyll content (1) and NDVI (2). The markers explained 2 to 21% of the phenotypic variation observed. Most of the MTAs identified on the A subgenome were also identified on the respective homeologous chromosome on the B subgenome. The duplications of effect observed could be due to common ancestor of the A and B genome which explains the linkage detected between markers lying on different chromosomes seen in the current study.

Conclusions
The present study identified a total of 20 highly significant marker trait associations with 11 markers for four physiological traits of importance in groundnut; LAI, CT, SCMR and NDVI. The markers identified in this study can serve as useful genomic resources to initiate marker-assisted selection and trait introgression of groundnut for drought tolerance. The identified markers in this study may be useful for marker assisted selection after further validation.

Background
Groundnut or peanut (Arachis hypogaea L.) is an important food legume grown worldwide and is considered to be a rich source of protein for both humans and animals. Groundnut seed contain high quality edible oil (50%) easily digestible protein (25%) and carbohydrate (20%). The crop is grown on 27.9 million hectare worldwide with a total production of 47.1 million metric tons (1).
Developing countries account for 96% groundnut areas and 92% of the global production. Despite the developing countries been the largest producers of groundnut, the average yield per hectare (China, 2490 kgha -1 and Nigeria 840 kgha -1 ) is low when compared to America (3673 kgha -1 ) (1). Among the factors contributing to low yield, drought adversely affects the crop's performance (2). Drought stress has adverse influence on water relations, photosynthesis, mineral nutrition, metabolism, growth and yield of groundnut (3). Plants, being sessile, have evolved specific acclimation and adaptation mechanisms to respond to and survive short-long-term drought stresses (4). Traits that allow adaptation to water deficit include root traits, stomatal conductance, SPAD chlorophyll meter reading, leaf area, and canopy temperature (5).
In order to integrate genomics in breeding and development of drought tolerant groundnut genotypes, identification of genomic regions associated with drought tolerance traits is essential. With the advent of genomic tools, marker assisted breeding (MAB) has been deployed to enhance efficiency of selection of target traits in groundnut (6-10). Very few informative and good quality single nucleotide polymorphisms (SNPs) are available in peanut in contrast to availability of thousands of simple sequence repeats (SSRs) (9), though SNPs can be more easily generated than SSRs and SNPs are usually preferred due to low cost. In recent times, restriction site associated DNA sequencing (RADseq) (11) and genotyping by sequencing (GBS) (12) methods has allowed researchers to identify and genotype thousands of SNPs in plants. Diversity Arrays Technology methodology (DArT) which is based on genome complexity reduction and SNP detection through hybridization of PCR fragments (13), has been used in genome-wide association studies (GWAS), construction of dense linkage maps and mapping quantitative trait loci (QTL) (7,(14)(15)(16). The DArTseq technology from DArT produces data on biallelic SNP markers as well as the older dominant DArT markers. GWAS discovers marker-trait association (MTA) involving representative markers and genetically diverse populations (17,18). Furthermore, the magnitude of linkage disequilibrium (LD) present in genetic resources are important pre-requisites to deduce the genetic makeup, composition and genomic predictions of traits of interest during selection. Linkage disequilibrium per se could also be used as a predictor of the resolution at which significant genomic regions with influence on traits can be detected through marker-trait-association analysis (19).
In this study, SNP markers were used for the genetic analysis of a groundnut minicore collection from International crop research institute for the semi-arid tropics (ICRISAT) for genome wide marker-trait association for some physiological traits associated with drought tolerance and to determine the magnitude of LD present in the genetic resource.

Phenotypic evaluation
Weather data was not available for the test sites. However, drought is a common occurrence and each environment was exposed to drought though the extent can't not be quantified.
The groundnut mini core collection represents the global diversity of about 14,000 accessions conserved in the ICRISAT genebank. The result of the phenotypic evaluation showed highly significant differences (p<0.01) between lines for canopy temperature (CT), SPAD chlorophyll meter reading (SCMR) and normalized difference vegetative index (NDVI) but no significant genetic variation for leaf area index (LAI) ( Table 1). The interaction between lines and environment was significant (p<0.01) for only SCMR ( Table 1). The heritability of the traits was moderate to high except for LAI that had a low heritability (0.03) ( Table 1). The correlation of SCMR with LAI and CT was negative and non-significant but was positive with NDVI. A negative and significant correlation was observed between CT and LAI as well as CT with NDVI. The correlation between LAI and NDVI was negative and non-significant (Table 2).

Marker Data
The DArTseq genotyping produced 3,591 biallelic SNP markers of which 3,396 had a call rate that exceeded 0.6999. The average polymorphism information content of the 3,396 markers was 0.077. Of the 3,396 markers, just 396 had a minor allele frequency that exceeded 0.05. A total of 3,124 markers were given chromosome assignments: 368 (11.8%) were aligned to only the A genome, 449 (14.4%) only to the B genome, and 2,308 (73.8%) to both genomes. Over 73% of the markers that aligned with both the A and B genomes were assigned to homeologous chromosomes and the correlation of their position on those two set of homologues was 0.87. A principal component (PC) analyses of the data from the 3,124 markers assigned to a chromosome (s) did not reveal a strong discernible population structure in the first PC that accounted for 61% of the variation (Figure 1).
Cluster analysis of the marker data suggested two groups of lines with one group having two subgroups ( Figure 2

Linkage disequilibrium
Linkage disequilibrium analysis was conducted using 305,919 loci pairs within chromosomes. The LD was significant (P < 0.05) for 36.3% of loci pairs were ( Figure 3, Table S1). Furthermore, 9,592 (3.14%) of the pairs were in complete LD (r 2 = 1). There was rapid decline in LD with distance and the correlation analysis revealed negative correlation (r = -0.0795) between the LD R 2 and the physical distance; as well as between the P-value and R 2 (r = -0.5381), revealing the existence of linkage decay ( Figure 4).

Marker-trait association
Due to the insignificant genotype by environment interaction the GWAS was performed using phenotypic best linear unbiased predictions (BLUPs) estimated over all environments. None of the 396 biallelic SNPs were significantly associated with any trait at P<0.001. Only the marker-trait associations (MTAs) that had P values < 0.001 (Table 3) were considered as significant for all traits (details of significant MTAs with P values between 0.05 and 0.001 are presented in Table S2). We found 20 MTAs with 11 markers (Table 3). Two markers (M1, M2) identified four possible loci for leaf area index (LAI). Marker M1 was associated with chromosomes A03 and B03 while M2 was associated with chromosomes A06 and B07. The individual markers explained 6.8 to 7.3% of the total phenotypic variation observed (Table 3). For CT, seven markers (M3-M9) identified 13 possible loci: six of these markers (M3-M8) identified regions on both the A and B genomes. The CT markers explained 9.6 to 16.6% of the variation observed. One marker (M10) on chromosome B05 was found to be associated with SCMR and explained 20.8% of the observed phenotypic variation. One marker (M11) identified regions of chromosome A04 and B02 associated with NDVI. The summaries of the number of SNP markers observed are presented in Table 4 and these may be useful for marker assisted selection after further validation. From all the 20 associations found, the B genome had 11 associations and the A genome had nine. Equal number of associations was found on each genome except for canopy temperature were A genome had six and B genome seven.

Gene identification and description
Most of the MTAs identified were located in genes that have functions corresponding to the gene description ( Table 5). Two of the MTAs (M1_A03 and M1_B03) identified for LAI were residing in genes (Aradu.02I1H and Araip.E67FB, respectively) that both have the functions of transition of mitotic cell cycle and regulation of meristem structural organization. Nine out of the 13 MTAs identified for canopy temperature were located in the coding sequence of already identified for genes which are responsible primarily for phospholipase, ATP and shoot gravitopism activities ( Table 5).
The marker identified for SCMR was located in Araip.5TB63.1 gene which is responsible for zinc ion binding. For NDVI, the two MTAs identified, M11_AA04 and M11_B02, were located in Aradu.DA17R and Araip.I1WMT which regulates structural constituent of ribosome and jasmonic acid carboxyl methyltransferase, respectively. All the genes identified in A genome and their corresponding chromosomes on the B genome were having the same descriptions.

Discussion
The phenotypic evaluation showed that the groundnut panel varied significantly for the physiological traits except for LAI ( Table 1). The physiological traits, LAI, CT, SCMR and NDVI are important in improving productivity and are used as indirect indices for improving drought tolerance. Traits such as NDVI have been reported to be highly associated with yield and can be effective as in-season prediction of yield (20,21). The heritability of the traits ranged from moderate to high. The most correlations between traits were significant though the values were low suggesting that the traits are fairly independent ( Table 2).
The marker data suggested that the population was not highly structured. The SNP and silico markers gave similar results. The two groups of minicore collections and one group having two subgroups as suggested by the marker data may be due to different origins of the collections. Both types of markers produced more markers in the B genome than the A genome. Both markers systems assigned a similar portion of the markers to both genomes. While many polymorphic markers were detected, a large portion had MAF < 0.05 and the average PIC values for both types of markers was very low, about 0.07. Groundnut is known to have a low polymorphism rate and low genetic diversity (22)(23)(24) and our results support that conclusion.
In general, most markers assigned to the A and B genomes were assigned to homeologous positions and their genetic positions in the A and B genome were highly correlated and suggest that their position in the genomes have been mostly conserved since evolving from their common ancestor. Of the markers mapped to non-homeologous chromosomes in the A and B genome there was some evidence for exchange of chromosome segments between chromosomes A02 and B09 and then A08 and B07.

Linkage disequilibrium
The linkage disequilibrium analysis showed that about 36% of loci pairs were in significant LD (P < 0.05) and 3.14% of the pairs were in complete LD with an average distance of 31 kb among these pairs, indicating that LD extended quite some distance in groundnut. This is not surprising given the low polymorphism rate and PIC values in groundnut which means that detectable recombination is likely to be very low. There was a decline in LD with distance. Several studies have reported LD decay with distance (7,25,26).

Marker-trait association
The MTA studies revealed 20 significant MTAs (p <0.001) involving 11 markers. Eight markers (M1-M8) identified possible MTA in both the A and B genomes. Our analysis cannot distinguish if just one or if both chromosomes identified by such markers are affecting the trait. Two of these markers (M2 and M7) found MTA on non-homeologous chromosomes. It is possible that some chromosome rearrangement has caused the marker sequence to appear on different homeologous, or is possible that there is some error in the sequencing or bioinformatics alignment process. Validation studies will be needed to see if these markers are identifying one locus or perhaps a locus duplicated in the two genomes.
Pandey et al. (7) used SSR markers to identify some significant MTAs for physiological traits in groundnut including LAI and SCMR. These two traits were also observed in the present study. Four MTA were identified for LAI among which the MTA on chromosomes A06 and B07 may be the same as those previously identified by Pandey et al. (7) for total leaf area and leaf area respectively. Marker M2 identified MTAs for LAI at similar positions on chromosome A06 and B07. Thirteen (13) MTAs were identified for CT among which six markers were located in both A and B genome and four were identified on the same chromosome in A and B genome. One MTA was found for SCMR and two for NDVI. The MTA identified on B05 is different from the previously reported loci of A06 (7). The marker associated with NDVI was located on both the A and B genome but on different chromosomes.
From the result of the MTAs analysis, most of the MTAs identified on the A subgenome were also identified on the respective homeologous chromosome on the B subgenome. Argawal et al. (10) reported that a significant proportion of marker loci with assigned physical locations to chromosome of one genome were mapped to respective homeologous positions on chromosomes of the other genome. Our results support this as the correlation of markers positions between the A and B genome exceeded 0.87 for both marker types. Most of the homeologous MTAs were seen between chromosomes A03 and B03 which is similar to the study of Argawal et al. (10). Other homeologous MTAs in the present study are between chromosomes A02 and B02, and A05 and B05. Homeologous mapping of QTLs in groundnut has also been reported between chromosomes A07 and B07, and A08 and B08 (27). We found some evidence for genetic exchanges occurring between the groundnuts genomes as reported earlier (24). Also, a large number of similar markers were placed on the genetic map on different chromosomes. The possible translocation we noted do not appear to be terminal or reciprocal unlike the translocations noted by Farre et al. (28). Translocations of markers have been previously reported in groundnut (10,27). Some of these observed 'translocated' markers might be also due to miss-assignments because of the highly repetitive structure of groundnut genome (24).
The markers identified for LAI are needed for cell expansion and growth in plant leafs. Markers identified for canopy temperature were linked to genes associated with phospholipase, shoot gravitopism and ATP productions. In response to a cold exposure, it was shown that the phospholipase D (PLD) and the phospholipase C (PLC)/diacylglycerol kinase pathways are simultaneously activated (29). Natalini et al. (30) showed that temperature significantly affect the production of PLD and PLC in tomato plants. Furthermore, Ryu et al. (31) and Kim et al. (32) reported that shoot gravitopism 5 (SGR5) alternative splicing is accelerated at high temperatures, resulting in a high-level accumulation of SGR5β proteins. They proposed that the thermos-responsive alternative splicing of SGR5 gene provides an adaptation strategy by which plants protect the shoots from aerial heat frequently occurring in natural habitats.
The marker identified for SCMR was located in a gene which is responsible for zinc ion binding. Zinc compounds have been shown to increase chlorophyll content by at least 50% in cilantro plant (33).
Plant growth, chlorophyll contents, crude proteins and zinc contents were noted to be higher when greater supply of zinc doses was applied (34).
The two genes identified for NDVI regulates structural constituent of ribosome and jasmonic acid carboxyl methyltransferase respectively. Defective ribosome biogenesis inhibits cell division, cell expansion, and pavement cell differentiation (35). The depletion further downregulates auxin and up regulates Jasmonic acid (JA) (35). It can therefore be proposed that when the nutrient condition is sufficient for the plants, the increased ribosome biogenesis will enhance cell division and expansion thereby promoting rapid growth. But if the nutrient condition is limited, then there will be an increase in JA (which causes leaf senescence) and it is associated with upregulation of the gene in the B genome.
The present study identified a total of 20 highly significant marker trait associations for four physiological traits of importance in groundnut; LAI, CT, SCMR and NDVI. The markers identified in this study can serve as useful genomic resources to initiate marker-assisted selection and trait introgression of groundnut for drought tolerance. The identified MTAs can also be used for fine mapping and cloning of the underlying genes. Further studies are required to validate the significant markers identified in the present study using a larger population.

Methods
The research was carried out in the research field of ICRISAT located at Minjibir (2015 wet season) and Bayero University (2017 dry season) that are both in Kano State. The annual mean rainfall of both location is between 800 mm to 900 mm; and variations about the annual mean values are up to ±30%.
One hundred and twenty five groundnut minicore core collections including five check varieties were evaluated using randomized incomplete block design with three (3) replications. Each plot in a replication consisted of one row measuring 5m in length with an inter and intra row spacing of 75cm and 10cm respectively with a 1m alley between replications.
Crop specific practices such as plant protection were followed in the sowing process. Fertilizer application was done alongside planting. A basal application of Nitrogen, Phosphorous, and Potassium was done to all plots at the rate of 20 N kg/ha, 40 P 2 O 5 kg/ha and 40 K 2 O kg/ha at planting. Hand weeding was done using hoes at 3 rd , 8 th and 12 th week after planting to prevent weed infestation and competition between plants and weeds. Data was collected for CT using leaf thermometer (Meco IRT550 Infrared Thermometer), SCMR using SPAD 502 PLUS Chlorophyll meter, NDVI using Hand Held optical sensor unit (Model 505 NTech Industries, Inc) and LAI was taken by the use of Leaf Area Meter (ACCUPAR LP-80). The data were collected at 8 weeks after sowing from five randomly selected plants in each plot.

DNA extraction and genotyping
Groundnut leaves were collected into 96 deep well samples collection plates and sent to Integrated Genotyping Service and Support (IGSS) platform located at Biosciences Eastern and Central Africa (BecA-ILRI) Hub in Nairobi for Genotyping. DNA extraction was done using Nucleomag Plant Genomic DNA extraction kit. The genomic DNA extracted was in the range of 50-100ng/ul. DNA quality and quantity were checked on 0.8% agarose. Libraries were constructed according to Kilian et al. (36).
DArTSeq complexity reduction method through digestion of genomic DNA and ligation of barcoded adapters was done followed by PCR amplification of adapter-ligated fragments. Libraries were sequenced using Single Read sequencing runs for 77 bases. Next generation sequencing was carried out using Hiseq2500.
The IGSS platform uses a GBS DArTseq TM technology, which provides rapid, high quality and affordable genome profiling, even from the most complex polyploid genomes. DArTseq markers scoring was achieved using DArTsoft version 14, which is an inhouse marker scoring pipeline based on algorithms. Two types of DArTseq markers were scored, SilicoDArT markers (scored as presence or absence, 1,0) and biallelic SNP markers which were both scored for presence of the reference allele, the alternative allele, or both. In genomic representation of the sample, both SilicoDArT markers and SNP markers were aligned to the reference genomes of Arachis duranensis (V14167, A-genome ancestor) and A. ipaensis (K30076, B-genome ancestor) to identify chromosome.

Data Analysis
The analysis of phenotypic data was done to obtain BLUP by fitting the following mixed linear model in R package "lme4". y ijk = u + g i +e j +r(e) jk + ge ij + error ijk where g i is the effect of the i th line, e j is the effect of the j th environment, r(e) jk is the effect of the k th rep nested in the j th environment, ge ij is the genotype by environment interaction and error ijk is the error associated with the observation. Replication and environments were considered random effects.
The analysis was run in R with the lme4 package (37).

Entry mean heritability was calculated as
[Formula can be found in the supplmental files as an image file] Where σ g 2 is the variance among lines, is the σ ge 2 genotype by environment interaction variance, σ 2 error is the error variance, r is the number of replications, and e is the number of environments.

Linkage disequilibrium and marker trait association
The parameter r 2 was used to estimate LD between SNPs on each chromosome via the software package TASSEL 5.0 (38). Marker trait association analysis, probability values, and % of the variation modelled by the markers were calculated using the GAPIT package via the KDCompute interface (https://kdcompute.igss-africa.org/kdcompute/home). The first three principal components and the relationship matrix were included in the model. SNPs with MAF <5% and missing data >20% were excluded from the analyses. Missing values were imputed using the choice of nearest neighbor algorithm using TASSEL 5.0 (38).
We used the unweighted pair-group method to cluster the lines and form a dendrogram. We also used the marker data in a principal component analysis. Both of these analyses were executed using KDCompute.

Gene identification and description
The identified markers were searched at peanut base (https://www.peanutbase.org) to identify genes in the regions and their descriptions were retrieved from legume mine (https://mines.legumeinfo.org).    Figure 1 Graph of the scores of the first two principal components of the analysis of the marker data from the groundnut minicore collection Linkage disequilibrium among markers scored in the groundnut minicore collection