Profile and diversity of SNP markers
A total of 12,689 SNPs markers were generated from the DArT-seq genotyping of 274 cowpea accessions (Table 1) from 33 countries across the globe (Fig. 1). High number of markers (9562 SNPs) were removed during filtering based on low minor allele frequency (MAF < 0.05), high percentage (> 20%) of missing data, and lack of information on their position in the genome. The rest of the markers, 3217 SNPs (24.65%) spanning the 11 linkage groups of cowpea matched the quality criteria and were used in the diversity analyses.
The profiles of the 3127 SNP markers are presented in Table 2. The markers showed high reproducibility (0.99) with a mean call-rate value of 0.87. Markers diversity analysis revealed that these markers had an average minor allele frequency (MAF) of 0.22, and the majority (93.51%) had a PIC value above 0.1 with a mean value of 0.24 (Fig. 2, Table 2). The mean expected heterozygosity (0.23) of the markers was higher than the average observed heterozygosity (0.07)
Table 1
Geographical distribution of the 274 cowpea accessions
Regions
|
Accessions
|
Countries of origin
|
West and Central Africa
|
113
|
Benin, Burkina-Faso, Côte d’Ivoire, Ghana; Liberia; Mali; Niger; Nigeria; Senegal, Central Africa Republic
|
East and Southern Africa
|
93
|
Kenya, Malawi, Mozambique, Uganda,
|
Botswana, Lesotho, South Africa, Swaziland
|
North Africa
|
3
|
Egypt and Mauritania
|
Asia
|
53
|
India, Siri Lanka, Iran, Pakistan, Yemen
|
America
|
9
|
Brazil, Colombia, Guatemala, Honduras, Nicaragua, Puerto Rico, US
|
Oceania
|
3
|
Australia
|
Table 2
Quality and diversity of SNP markers used to investigate genetic diversity and population structure of
1-Markers quality parameters
|
Mean
|
Min
|
Max
|
Call rate
|
0.87
|
0.80
|
0.98
|
One ratio—reference allele
|
0.66
|
0.05
|
1.00
|
One ratio—SNP allele
|
0.37
|
0.05
|
0.99
|
Reproducibility
|
0.99
|
0.91
|
1.00
|
2-Markers diversity
|
Mean
|
Min
|
Max
|
Major Allele Frequency
|
0.78
|
0.50
|
0.96
|
Minor Allele Frequency
|
0.22
|
0.05
|
0.5
|
Expected Heterozygosity (He)
|
0.30
|
0.08
|
0.5
|
Observed Heterozygosity (Ho)
|
0.07
|
0
|
0.40
|
Polymorphism information content
|
0.24
|
0.08
|
0.37
|
Population structure
Two different approaches (STRUCTURE and DAPC) were used to identify the fitting number of clusters within the cowpea germplasm. Results of the structure analysis showed that the likelihood of DeltaK (ΔK) peaked at K = 3 (Fig. 3a), indicating that three clusters contribute to the total variation in the diversity panel under study. Consequently, the 274 cowpea accessions can be grouped into 3 subpopulations/clusters. The distribution of the cowpea accessions among different clusters revealed that cluster 1 had the highest percentage of membership (53.2%), followed by cluster 2 (31.8%) and cluster 3 which recorded the lowest percentage (14.9%). However, as shown in the inferred ancestry bar plot (Fig. 3b), some accessions were in admixture i.e. represent a sum of variation from more than one cluster. Based on the probability value of assignment of each accession to a specific cluster i.e. coefficient of ancestry, 31 accessions fell into the group of admixed individuals (Additional file 2).
In the DAPC approach, the curve of BIC versus number of clusters shows a rapid decline from K = 1 to K = 4 followed by a very slow increase (Fig. 4a), suggesting that K = 4 is the optimum number of clusters inferred through this approach. Furthermore, 3 discriminant functions (DA) were detected, which explained 30.12%, 16.27%, and 12.80% of the variation in the dataset respectively (Fig. 4b). The graph of probability values of assignment of each accession to a specific cluster showed a perfect inference of the accessions with no admixed individuals (Fig. 4c). However, the plot of the densities of individuals on the first discriminant function displays a slight overlapping between three clusters (cluster 1, cluster3, and cluster 4), while cluster 2 was a very distinct group (Fig. 4d). Cluster 4 had the highest number of membership (31.75%; 87 accessions), followed by cluster 1 (28.83%; 79 accessions), cluster 3 (26.64%; 73 accessions) and finally cluster 2 with the lowest population size (13.50%; 37 accessions), (Table 3).
Table 3
Distribution of 274 cowpea accessions in subpopulations/clusters based on discriminant analysis of principal components (DAPC) clustering method
Country
|
Number of clusters
|
Number of accessions
|
Regions
|
Clusters/subpopulations
|
1
|
2
|
3
|
4
|
Benin
|
4
|
42
|
West and Central Africa
|
19
|
17
|
41
|
36
|
Burkina-Faso
|
1
|
5
|
Côte d’Ivoire
|
2
|
3
|
Ghana
|
4
|
13
|
Liberia
|
2
|
2
|
Mali
|
2
|
2
|
Niger
|
1
|
2
|
Nigeria
|
4
|
40
|
Senegal
|
1
|
2
|
Central Africa Republic
|
2
|
2
|
Total Accessions
|
113
|
Kenya
|
2
|
3
|
East and southern Africa
|
40
|
15
|
14
|
26
|
Malawi
|
2
|
3
|
Mozambique
|
3
|
5
|
Uganda
|
4
|
76
|
Botswana
|
2
|
3
|
Lesotho
|
1
|
1
|
Swaziland
|
1
|
1
|
South Africa
|
1
|
1
|
Total Accessions
|
93
|
Egypt
|
2
|
2
|
North Africa
|
1
|
0
|
0
|
2
|
Mauritania
|
1
|
1
|
Total Accessions
|
3
|
India
|
4
|
49
|
Asia
|
15
|
4
|
14
|
20
|
Iran
|
1
|
1
|
Pakistan
|
1
|
1
|
Siri Lanka
|
1
|
1
|
Yemen
|
1
|
1
|
Total Accessions
|
53
|
Brazil
|
2
|
2
|
America
|
2
|
0
|
4
|
3
|
Colombia
|
1
|
1
|
Guatemala
|
1
|
1
|
Honduras
|
1
|
1
|
Nicaragua
|
1
|
1
|
Puerto Rico
|
1
|
1
|
US
|
2
|
2
|
Total Accessions
|
9
|
Australia
|
2
|
3
|
Oceania
|
2
|
1
|
0
|
0
|
Diversity and genetic relationship between accessions
The grouping of the accessions according to their regions of origin showed that West and Central Africa (113 accessions), East and central Africa (93 accessions) and Asia (53) were the three dominant origins of the accessions while the rest (15 accessions) of the accessions were from North Africa, America, and Oceania (Table 1). The genetic distance values between pairs of accessions based on differences at marker loci varied from 0.005 to 0.44 (Additional file 1). The unrooted phylogenetic tree constructed based on the distance matrix and the NJ clustering algorithm showed three main sub-roots (Fig. 3c), which confirms the presence of three clusters in the germplasm. Cluster 1 consisted of 53.28% (146 accessions) of the accessions, while cluster 2 and clusters3 had 36.86% (101 accessions), and 9.85% (27 accessions), respectively.
The phylogenetic relationship showed that accessions from all regions, except the 3 accessions from Oceania, were distributed in two or three subpopulations (Table 4). Accessions from West and Central Africa, and accessions from East and Southern Africa were predominant in cluster 2 and Cluster 1, respectively. In terms of within country diversity, Nigerian accessions were highly represented in cluster 1 (20 accessions) and cluster 2 (17 accessions). The majority of accessions from Benin were found in cluster 2 (27 accessions). Ugandan accessions were predominant (76 accessions) among accessions from East and Southern Africa region, with the majority (53 accessions) in cluster1. Indian accessions (49 accessions), the most predominant in the group of accessions from Asia, were distributed in cluster 1 (28 accessions), cluster 2 (15 accessions) and cluster 3 (6 accessions).
Principal Component Analysis (PCA) was performed and the scatter plot of the 274 accessions based on first two principal component axes, which together explained 23.57% of the variation among accessions, showed that the accessions can be grouped into 3 main subpopulations with some of them in admixture (Fig. 3d). These patterns are consistent with the structure analysis
Table 4
Distribution of 274 cowpea accessions in subpopulations/clusters based on Neighbour Joining clustering method
Country
|
Number of clusters
|
Number of accessions
|
Regions
|
Clusters/Subpopulations
|
1
|
2
|
3
|
Benin
|
3
|
42
|
West and Central Africa
|
44
|
59
|
10
|
Burkina-Faso
|
2
|
5
|
Côte d’Ivoire
|
2
|
3
|
Ghana
|
3
|
13
|
Liberia
|
1
|
2
|
Mali
|
2
|
2
|
Niger
|
1
|
2
|
Nigeria
|
3
|
40
|
Senegal
|
1
|
2
|
Central Africa Republic
|
2
|
2
|
Total Accessions
|
113
|
Kenya
|
2
|
3
|
East and Southern Africa
|
61
|
18
|
14
|
Malawi
|
1
|
3
|
Mozambique
|
3
|
5
|
Uganda
|
3
|
76
|
Botswana
|
2
|
3
|
Lesotho
|
1
|
1
|
Swaziland
|
1
|
1
|
South Africa
|
1
|
1
|
Total Accessions
|
93
|
Egypt
|
2
|
2
|
North Africa
|
2
|
0
|
1
|
Mauritania
|
1
|
1
|
Total Accessions
|
3
|
India
|
3
|
49
|
Asia
|
29
|
18
|
6
|
Iran
|
2
|
1
|
Pakistan
|
1
|
1
|
Siri Lanka
|
1
|
1
|
Yemen
|
1
|
1
|
Total Accessions
|
53
|
Brazil
|
2
|
2
|
America
|
4
|
5
|
0
|
Colombia
|
1
|
1
|
Guatemala
|
1
|
1
|
Honduras
|
1
|
1
|
Nicaragua
|
1
|
1
|
Puerto Rico
|
1
|
1
|
US
|
2
|
2
|
Total Accessions
|
9
|
Australia
|
1
|
3
|
Oceania
|
0
|
0
|
3
|
Genetic diversity and population differentiation of observed groups
Genetic diversity and population differentiation in the germplasm were examined by estimating diversity parameters and analysing molecular variance among and between inferred clusters/subpopulations (Table 5, Table 6 and Table 7). From the structure analysis, the genetic variability among the three inferred subpopulations represented herein by Nei’s net nucleotide distance (Table 5), varied from 0.17 (between cluster1 and cluster 2) to 0.26 (between cluster 2 and cluster 3). This indicates a degree of relatedness between subpopulations, with cluster1 more related to cluster 2 than cluster3 while cluster 2 and cluster 3 were more distant. The highest within population variation (expected heterozygosity) was observed in cluster 3 (0.32), followed by cluster1 (0.26). Cluster 2 contained the highest proportion of genetic variance (FST=0.56) whilst cluster1 and cluster 3 had similar mean values of population variance (FST =0.49 and 0.48). Results from the NJ clustering showed that the number of effective alleles (Ne) and polymorphic information content (PIC) values varied across the three subpopulations (Table 6). The highest number of effective allele (Ne = 1.48) was recorded in cluster 3 while the highest PIC mean value was observed in cluster 2 (PIC = 0.23). The observed heterozygosity (Ho) values were generally lower than the expected heterozygosity (He) across subpopulations (Table 6). Cluster 3 had the highest observed heterozygosity (Ho = 0.11) while cluster 1 and cluster 2 recorded the lowest values for this parameter, 0.07 and 0.06 respectively.
Table 5
Genetic variability among (net nucleotide distance) and within (expected heterozygosity) populations, proportion of membership, and mean value of the fixation index (Fst) observed from the population structure of 274 cowpea cultivars
Population
|
Net Nucleotide Distance
|
Expected Heterozygosity
|
Mean Fixation Index (Fst)
|
% of Individuals
|
|
Cluster 2
|
Cluster 3
|
|
|
|
Cluster 1
|
0.17
|
0.21
|
0.26
|
0.49
|
53.2
|
Cluster 2
|
|
0.26
|
0.22
|
0.56
|
31.8
|
Cluster 3
|
|
|
0.32
|
0.48
|
14.9
|
Table 6
Diversity parameters of the identified subpopulations using Neighbour-Joining and discriminant analysis of principal components clustering methods
Subpopulations
|
Subpopulation Size
|
Ne
|
He
|
Ho
|
PIC
|
NJ clustering
|
Cluster 1
|
146
|
1.46
|
0.27
|
0.07
|
0.22
|
Cluster 2
|
101
|
1.46
|
0.28
|
0.06
|
0.23
|
Cluster 3
|
37
|
1.48
|
0.27
|
0.11
|
0.22
|
DAPC clustering
|
Cluster 1
|
78
|
1.47
|
0.24
|
0.05
|
0.19
|
Cluster 2
|
37
|
1.47
|
0.28
|
0.27
|
0.23
|
Cluster 3
|
72
|
1.42
|
0.26
|
0.03
|
0.21
|
Cluster 4
|
87
|
1.48
|
0.29
|
0.05
|
0.23
|
Ne = Number of Effective Alleles, He = Expected Heterozygosity, Ho = Observed Heterozygosity |
The genetic diversity estimates based on the DAPC clustering method (Table 6) showed that the number of effective alleles ranged from 1.48 in cluster 4 to 1.42 in cluster 3. Cluster 2 and cluster 4 had the same PIC mean value (0.23) while the lowest value for this parameter (0.19) was observed in cluster 1. Overall, expected heterozygosity (0.24 ≤ He ≤ 0.29) was higher than observed heterozygosity (0.03 ≤ Ho ≤ 0.27) in all subpopulations. The highest observed heterozygosity was obtained in cluster 2 (Ho = 0.27) which was approximately equal to the expected heterozygosity (He = 0.28).
Analysis of molecular variance (AMOVA) revealed similar patterns in the repartition of the genetic variance irrespective of the clustering approaches used (Table 7). The total genetic variation in the germplasm was partitioned mainly into among accessions variation (99%) and among subpopulations variation (1%). The inbreeding coefficient (FIS) was very high implying that the inferred subpopulations are mainly composed of inbred lines.
Table 7
Analysis of molecular variance (AMOVA) for variation among and within sub populations of 274 cowpea accessions
Source
|
df
|
SS
|
MS
|
% Est.Var.
|
F-Statistics
|
Probability
|
NJ based grouping
|
Among subpopulations
|
2
|
2552.44
|
1276.22
|
1
|
|
Among accessions
|
271
|
243531.44
|
898.64
|
99
|
FIS
|
1
|
0.001
|
Total
|
273
|
246083.88
|
|
100
|
|
DAPC based grouping
|
Among subpopulations
|
3
|
4097.61
|
1365.87
|
1
|
|
Among accessions
|
270
|
241987.5
|
896.25
|
99
|
FIS
|
1
|
0.001
|
Total
|
273
|
246085.11
|
|
100
|
|
FIS= Inbreeding coefficient, % Est.Var. = percentage of estimated variance |
Linkage disequilibrium
Linkage disequilibrium (LD) was analysed for the 3127 SNP markers across the cowpea genome. Out of the 158025 pairs of comparisons between SNP markers, only 47262 (29.90%) show highly significant LD (pDiseq < 0.001), (Additional file 3). A strong LD was observed on chromosome 6 (Fig. 5b). The means of LD (R2) estimates were plotted along different interval of physical genetic distance between markers across the genome (Fig. 5a). High LD values were observed over a short physical genetic distance which decayed rapidly, with very low decrease at longer distances between markers across the genome (Fig. 4a and Table 8). At MAF ≥ 0.05, the mean of LD (R2) ranged from 0.53 (0-0.1Mbp) to 0.10 (24-24Mbp), and declined to half of its value LD (R2) = 0.27 at 0.2-0.3Mbp of physical genetic distance (Table 8). The variation of MAF thresholds showed significant effect on LD and its decay (Fig. 6). Considering a physical distance of 0-8Mbp, means of LD estimates (R2) at MAF ≥ 0.2 were generally higher as compared to those obtained at MAF ≥ 0.05 and MAF ≥ 0.1 thresholds, while little variation was observed among the values for the different thresholds at longer distance between markers (Dist ≥ 8Mbp) (Fig. 6 and Additional file 3).
Table 8
Mean of linkage disequilibrium LD (R2) estimates among SNP markers with MAF ≥ 0.05 at different physical distance across the genome
Dist_Mbp
|
Mean LD (R2)
|
SD (R2)
|
N (R2)
|
0-0.1
|
0.53
|
0.35
|
3057
|
0.1–0.2
|
0.32
|
0.27
|
1924
|
0.2–0.3
|
0.27
|
0.24
|
1713
|
0.3–0.4
|
0.24
|
0.21
|
1488
|
0.4–0.5
|
0.22
|
0.21
|
1403
|
0.5–0.6
|
0.20
|
0.18
|
1278
|
0.6_0.7
|
0.20
|
0.19
|
1171
|
0.7_0.8
|
0.18
|
0.16
|
1113
|
0.8–0.9
|
0.18
|
0.16
|
1051
|
0.9-1
|
0.17
|
0.15
|
1095
|
1–2
|
0.16
|
0.14
|
8587
|
2–3
|
0.15
|
0.13
|
6213
|
3–4
|
0.14
|
0.12
|
4919
|
4–5
|
0.14
|
0.11
|
3386
|
5–6
|
0.14
|
0.12
|
1986
|
6–7
|
0.13
|
0.10
|
1284
|
7–8
|
0.14
|
0.10
|
964
|
8–9
|
0.14
|
0.10
|
728
|
9–10
|
0.14
|
0.10
|
571
|
10–11
|
0.13
|
0.09
|
373
|
11–12
|
0.13
|
0.09
|
388
|
12–13
|
0.14
|
0.10
|
367
|
13–14
|
0.14
|
0.11
|
356
|
14–15
|
0.14
|
0.09
|
417
|
15–16
|
0.12
|
0.07
|
225
|
16–20
|
0.13
|
0.10
|
879
|
20–24
|
0.13
|
0.09
|
248
|
24–34
|
0.10
|
0.06
|
77
|
N (R2) = number of R2 pairwise values, SD standard deviation, Mbp = 106 base physical distance |