Profile and diversity of SNP markers
A total of 12,689 SNPs markers were generated from the DArT-seq genotyping of 274 cowpea accessions (Table 1) from 33 countries across the globe (Fig. 1). High number of markers (9562 SNPs) were removed during filtering based on low minor allele frequency (MAF<0.05), high percentage (>20%) of missing data, and lack of information on their position in the genome. The rest of the markers, 3217 SNPs (24.65%) spanning the 11 linkage groups of cowpea matched the quality criteria and were used in the diversity analyses.
The profiles of the 3127 SNP markers are presented in Table 2. The markers showed high reproducibility (0.99) with a mean call-rate value of 0.87. Markers diversity analysis revealed that these markers had an average minor allele frequency (MAF) of 0.22, and the majority (93.51%) had a PIC value above 0.1 with a mean value of 0.24 (Fig. 2, Table 2). The mean expected heterozygosity (0.23) of the markers was higher than the average observed heterozygosity (0.07)
Two different approaches (STRUCTURE and DAPC) were used to identify the fitting number of clusters within the cowpea germplasm. Results of the structure analysis showed that the likelihood of DeltaK (ΔK) peaked at K =3 (Fig. 3a), indicating that three clusters contribute to the total variation in the diversity panel under study. Consequently, the 274 cowpea accessions can be grouped into 3 subpopulations/clusters. The distribution of the cowpea accessions among different clusters revealed that cluster 1 had the highest percentage of membership (53.2%), followed by cluster 2 (31.8 %) and cluster 3 which recorded the lowest percentage (14.9 %). However, as shown in the inferred ancestry bar plot (Fig. 3b), some accessions were in admixture i.e. represent a sum of variation from more than one cluster. Based on the probability value for assignment of individual accession to a specific cluster, 31 accessions fell into the group of admixed individuals (Additional file 2).
In the DAPC approach, the curve of BIC versus number of clusters shows a rapid decline from K=1 to K=3 followed by a very slow increase (Fig. 4 a), suggesting that K=3 is the optimum number of clusters inferred through this approach. Furthermore, 2 discriminant functions (DA) were detected, which explained 59.09% and 24.92 % of the variation in the dataset respectively (Fig. 4b). The graph of probability values of assignment of each accession to a specific cluster showed a perfect inference of the accessions with no admixed individuals (Fig. 4c). The plot of the densities of individuals on the first discriminant function showed that cluster 1 was distant from the two other clusters (Fig. 4d). Cluster 2 had the highest number of membership (42.70%; 117 accessions), followed by cluster 3 (28.83%; 79 accessions), and cluster 3 (28.46%; 78 accessions) (Supplementary file2).
Diversity and genetic relationship between accessions
The grouping of the accessions according to their regions of origin showed that West and Central Africa (113 accessions), East and central Africa (93 accessions) and Asia (53 accessions) were the three dominant origins of the accessions while the rest (15 accessions) of the accessions were from North Africa, America, and Oceania (Table 1). The genetic distance values between pairs of accessions based on differences at marker loci varied from 0.005 to 0.44 (Additional file 1). The Neighbour Joining phylogenetic tree depicted three main sub-roots (Fig. 3 c), which confirms the presence of three clusters in the germplasm. Cluster 1 consisted of 53.28% (146 accessions) of the accessions, while cluster 2 and clusters3 had 36.86% (101 accessions), and 9.85% (27 accessions), respectively.
The phylogenetic relationship showed that accessions from all regions, except the accessions from Oceania, were distributed in two or three subpopulations (Additional file 2). Accessions from West and Central Africa, and accessions from East and Southern Africa were predominant in cluster 2 and Cluster 1, respectively. In terms of within country diversity, Nigerian accessions were highly represented in cluster 1 (20 accessions) and cluster 2 (17 accessions). The majority of accessions from Benin were found in cluster 2 (27 accessions). Ugandan accessions were predominant (76 accessions) among accessions from East and Southern Africa region, with the majority (53 accessions) in cluster1. Indian accessions (49 accessions), the most predominant in the group of accessions from Asia, were distributed in cluster 1 (28 accessions), cluster 2 (15 accessions) and cluster 3 (6 accessions).
Principal Component Analysis (PCA) was performed and the scatter plot of the 274 accessions based on first two principal component axes, which together explained 23.57% of the variation among accessions, showed that the accessions can be grouped into 3 main subpopulations with some of them in admixture (Fig. 3 d). These patterns are consistent with the structure analysis.
Genetic diversity and population differentiation of observed groups
Genetic diversity and population differentiation in the germplasm were examined by estimating diversity parameters and analysing molecular variance among and between inferred clusters/subpopulations (Table 3, Table 4 and Table 5). From the structure analysis, the genetic variability among the three inferred subpopulations represented herein by Nei’s net nucleotide distance (Table 3), varied from 0.17 (between cluster 1 and cluster 2) to 0.26 (between cluster 2 and cluster 3). This indicates a degree of relatedness between subpopulations, with cluster 1 more related to cluster 2 than cluster 3 while cluster 2 and cluster 3 were more distant. The highest within population variation (expected heterozygosity) was observed in cluster 3 (0.32), followed by cluster 1 (0.26). Cluster 2 contained the highest proportion of genetic variance (FST=0.56) whilst cluster 1 and cluster 3 had similar mean values of population variance (FST =0.49 and 0.48). Results from the NJ clustering showed that the number of effective alleles (Ne) and polymorphic information content (PIC) values varied across the three subpopulations (Table 4). The highest number of effective allele (Ne=1.48) was recorded in cluster 3 while the highest PIC mean value was observed in cluster 2 (PIC=0.23). The observed heterozygosity (Ho) values were generally lower than the expected heterozygosity (He) across subpopulations (Table 4). Cluster 3 had the highest observed heterozygosity (Ho=0.11) while cluster 1 and cluster 2 recorded the lowest values for this parameter, 0.07 and 0.06, respectively.
The genetic diversity estimates based on the DAPC clustering method (Table 4) showed that the number of effective alleles ranged from 1.48 in cluster 3 to 1.44 in cluster 2. Cluster 2 had the highest PIC mean value (0.23) while the lowest value for this parameter (0.19) was observed in cluster 3. Overall, expected heterozygosity (0.23≤He≤0.29) was higher than observed heterozygosity (0.05≤Ho≤0.10) in all subpopulations. The highest observed heterozygosity was obtained in cluster 2 (Ho=0.10).
Analysis of molecular variance (AMOVA) revealed similar patterns in the repartition of the genetic variance irrespective of the clustering approaches used (Table 5). The total genetic variation in the germplasm was partitioned mainly into among accessions variation and among subpopulations variation. Low contribution of between subpopulations variance was observed; 1% and 19% following the NJ clustering and DAPC, respectively. The inbreeding coefficient (FIS) was very high implying that the inferred subpopulations are mainly composed of inbred lines.
Linkage disequilibrium (LD) was analysed across the cowpea genome. Out of 3127 SNPs, 1754 SNPs spanning the whole genome were in true LD based on comparison between pairs of markers in window size of 500kb ( =0.2 threshold). These markers were almost evenly distributed across the genomes with the highest numbers of SNPs in LD with each other registered on chromosomes 3 and 7 (Fig5.a). LD estimates (Rvs2), corrected by structure and genetic relationship observed in the cowpea germplasm, were plotted along physical genetic distance between markers across the genome (Fig5.b). The LD decay plot showed that on average LD estimates were in general low (Rvs2 =0.02). However, few high LD values were observed over a short physical genetic distance which decayed rapidly, with very low decline at longer distances between markers across the genome. LD decayed below Rvs2 =0.2, more precisely to Rvs2=0.1, with an increasing physical distance of 8.75 to 25.16 kb (Fig. 5b).