Genetic diversity, population structure and linkage disequilibrium analysis
A total of 583 polymorphic simple sequence repeat (SSR) markers, randomly distributed across the genome, were used to genotype the association mapping panel of 292 peanut accessions. These markers produced 3,663 alleles, with an average of 6.28 alleles per locus and a range of 2 to 20 (Table 1 and Additional file 1: Table S1). The major allele frequency ranged from 0.15 to 0.98, with a mean of 0.60. Genetic diversity ranged from 0.03 to 0.90, with an average of 0.51. The polymorphic information content (PIC) ranged from 0.03 to 0.90, with an average of 0.45 (Additional file 1: Table S1). Of the 3,663 alleles, 629 were unique alleles (allele frequency < 0.05%), 1,471 were rare alleles (0.05% ≤ allele frequency < 5%), 1,547 were polymorphic alleles (5% ≤ allele frequency < 95%), and 15 were fixed alleles (allele frequency ≥ 95%), corresponding to 17.17%, 40.16%, 42.23%, and 0.41% of all alleles, respectively (Additional file 1: Table S2).
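The per-locus statistics reported above can be illustrated with a short sketch. It assumes the commonly used definitions (genetic diversity as 1 − Σp², and the standard PIC formula), which the text does not state explicitly, and it applies the allele frequency classes given above. The function names and the example frequencies are hypothetical and are not taken from the study.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


def classify_allele(freq):
    """Frequency class of one allele (freq given as a proportion, not a percentage)."""
    if freq < 0.0005:   # allele frequency < 0.05%
        return "unique"
    if freq < 0.05:     # 0.05% <= allele frequency < 5%
        return "rare"
    if freq < 0.95:     # 5% <= allele frequency < 95%
        return "polymorphic"
    return "fixed"      # allele frequency >= 95%


# Hypothetical locus with three alleles whose frequencies sum to 1.
locus = [0.60, 0.30, 0.10]
print("major allele frequency:", max(locus))
print("genetic diversity:", round(gene_diversity(locus), 4))
print("PIC:", round(pic(locus), 4))
print("classes:", [classify_allele(p) for p in locus])
```

Averaging these per-locus values over all markers would give panel-level summaries of the kind listed in Table 1.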
Population structure was analyzed using the multi-allelic SSR genotyping data. The largest change in LnP(D) occurred when K increased from 1 to 2, and ΔK was highest at K = 2 (Fig. 1a and 1b). Previously available information suggested two subgroups in the peanut panel, and the phylogenetic and principal component analyses (PCA) in this study further supported a clear division of the 292 accessions into two subgroups (G1 and G2), consistent with the population structure results (Fig. 1c and 1d). All the landraces in the G1 subgroup were subsp. hypogaea, while the landraces in the G2 subgroup belonged to subsp. fastigiata (Fig. 1c). The pairwise FST value between the two subgroups was 0.16, and Nei’s (1972) genetic distance was 0.27. Compared with G1, G2 had relatively higher genetic diversity (0.47 vs. 0.40) and PIC (0.42 vs. 0.36). However, the number of alleles per locus was higher in G1 than in G2 (Table 1).
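To make the ΔK criterion concrete, the following sketch computes ΔK from replicate LnP(D) values using one common formulation: the absolute second-order difference of the mean LnP(D), divided by the standard deviation of LnP(D) at that K. The text does not state which formulation or software was used, so this is an assumption, and the LnP(D) values below are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnpd_runs):
    """Return {K: delta K} from {K: [LnP(D) of replicate runs]}.

    Uses |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)), so a value
    exists only for K with both neighbours present. Each K needs at least
    two replicates with non-identical values (the standard deviation must be > 0).
    """
    means = {k: mean(v) for k, v in lnpd_runs.items()}
    result = {}
    for k in sorted(lnpd_runs):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnpd_runs[k])
    return result


# Hypothetical LnP(D) values (three replicate runs per K), not the study's data.
runs = {
    1: [-9000.0, -9002.0, -8998.0],
    2: [-8000.0, -8003.0, -7997.0],
    3: [-7900.0, -7910.0, -7890.0],
    4: [-7850.0, -7870.0, -7830.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, "best K:", best_k)
```

Because the statistic needs both K − 1 and K + 1, it is undefined at the smallest and largest K tested, which is why the change in LnP(D) from K = 1 to K = 2 is reported alongside ΔK.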
Table 1 Summary statistics for population diversity.
| Population | Sample Size | Allele Number | Major Allele Frequency | Genetic Diversity | PIC |
| --- | --- | --- | --- | --- | --- |
| G1 | 161 | 5.36 | 0.72 | 0.40 | 0.36 |
| G2 | 131 | 5.29 | 0.66 | 0.47 | 0.42 |
| Total | 292 | 6.28 | 0.61 | 0.51 | 0.45 |
The peanut association mapping panel consisted of cultivars from 17 provinces of China. Most accessions (93.2%) came from nine provinces (Hebei, Shandong, Henan, Sichuan, Hubei, Jiangsu, Fujian, Guangdong, and Guangxi). The proportions of the two subgroups differed markedly among these provinces (Fig. 2a). In Northern China (Hebei, Shandong, and Henan provinces), the proportion of the G1 subgroup ranged from 77.42% to 85.71%. Similarly, the proportion of G1 ranged from 66.10% to 91.67% among accessions from the Yangtze River region (Sichuan, Hubei, and Jiangsu provinces). By contrast, the proportion of the G1 subgroup was below 11.11% in Southern China (Fujian, Guangdong, and Guangxi provinces). These results suggest that genetic diversity was closely linked to geographic distribution. In the phylogenetic tree, the provinces clustered clearly into two clades (Fig. 2b): the provinces in Southern China (Fujian, Guangdong, and Guangxi) clustered together, whereas the provinces from Northern China and the Yangtze River region formed the other clade.
Linkage disequilibrium (LD) was estimated using the coefficient r² for 280 SSR markers mapped on 20 linkage groups [25]. The average r² was 0.11, and approximately 53.4% of the r² values were statistically significant (P < 0.01). The 95th percentile of the r² distribution for unlinked marker pairs, r² = 0.28, was set as the background level. Because the r² plot dropped to the background level at an average marker-pair distance below 1 cM, LD decay in the peanut panel was estimated at 1 cM (Additional file 2: Fig. S1).
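The background-level approach described above can be sketched as follows. The percentile interpolation, the 1 cM distance bins, and the rule of reporting the start of the first bin whose mean r² falls below the background are illustrative assumptions, since the text does not specify how the decay curve was summarized. All input values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between order statistics."""
    data = sorted(values)
    if not data:
        raise ValueError("no values")
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def ld_decay_distance(linked_pairs, background, bin_width=1.0):
    """Start (cM) of the first distance bin whose mean r2 is below background.

    linked_pairs: iterable of (distance_cM, r2) for marker pairs on the same
    linkage group. Returns None if no bin falls below the background level.
    """
    bins = {}
    for dist, r2 in linked_pairs:
        bins.setdefault(int(dist // bin_width), []).append(r2)
    for idx in sorted(bins):
        if sum(bins[idx]) / len(bins[idx]) < background:
            return idx * bin_width
    return None


# Hypothetical r2 values for unlinked pairs: 0.00, 0.01, ..., 0.20.
unlinked = [i / 100 for i in range(21)]
background = percentile(unlinked, 95)

# Hypothetical (distance in cM, r2) values for linked pairs.
linked = [(0.2, 0.45), (0.6, 0.35), (1.4, 0.15), (1.8, 0.10), (2.5, 0.05)]
print("background r2:", background)
print("LD decay distance (cM):", ld_decay_distance(linked, background))
```

With real data, the choice of bin width affects the reported decay distance, so it should be stated together with the estimate.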
Phenotypic variation for oil content among peanut accessions
Oil content was phenotyped for the 292 Chinese peanut accessions using seeds harvested from four environments. Oil content in the association panel ranged from 45.85% to 59.72% in 2015WH, 43.82% to 55.88% in 2016WH, 44.22% to 54.97% in 2017NC, and 45.11% to 56.69% in 2017WH (Table 2). The median oil content in the four environments varied from 48.47% to 51.89%, and the standard deviation ranged from 1.78 to 2.39. Two elite cultivars with superior yield potential (Zhonghua 15 and Yuhua 9326) showed stably high oil content across the four environmental trials (average oil content > 55%). The continuous distributions of phenotypic values are shown in Additional file 2: Fig. S2. Based on the Shapiro-Wilk normality test, the phenotypic data in 2015WH, 2016WH, and 2017WH followed a normal distribution (Table 2). Analysis of variance across the four environmental trials showed that genotype, environment, and genotype × environment interaction significantly influenced oil content at the P < 0.001 level (Additional file 2: Fig. S2). The broad-sense heritability for oil content was estimated at 0.76 in the peanut panel.
We further examined phenotypic differences among genotypes from different geographic regions of China in this genetically diverse association mapping panel. The oil content of accessions from Northern China was significantly higher than that of accessions from Southern China in all field trials. Similarly, accessions from the Yangtze River region had higher oil content than accessions from Southern China in 2016WH, 2017NC, and 2017WH. The phenotypic difference between Northern China and the Yangtze River region was not statistically significant in three of the four environments. We also compared cultivated peanuts released in different periods (Additional file 2: Fig. S3). In general, there was no obvious difference in oil content between cultivars released in different periods.
Table 2 Phenotypic variation for oil content (%) for 292 peanut accessions across four environments.
| Env | Min (%) | Max (%) | Median (%) | SD | Kurt | Skew | w (Sig) | H² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2015WH | 45.85 | 59.72 | 51.89 | 2.39 | -0.04 | 0.22 | 1.00 (0.51) | 0.76 |
| 2016WH | 43.82 | 55.88 | 50.13 | 1.78 | 0.54 | 0.00 | 0.99 (0.36) | |
| 2017NC | 44.22 | 54.97 | 48.47 | 1.94 | 0.66 | 0.06 | 0.99 (0.04) | |
| 2017WH | 45.11 | 56.69 | 51.53 | 1.86 | 0.32 | 0.06 | 1.00 (0.50) | |
Env, environment; SD, standard deviation; Kurt, kurtosis; Skew, skewness; w, Shapiro-Wilk statistic value; Sig, significance; H², broad-sense heritability
Association analysis for oil content
A mixed linear model (MLM) with the K + Q matrices was used for association mapping between the SSR markers and the oil content data of the 292 peanut accessions in four environments. The marker-trait association analysis identified two associated loci in 2015WH, eight in 2016WH, three in 2017NC, and five in 2017WH. Twelve loci significantly associated at P < 0.00186 explained 4.54% to 9.94% of the phenotypic variance across the four environments (Table 3 and Additional file 2: Fig. S4). Among them, AGGS1014_2, with a phenotypic variance explained (PVE) of up to 9.94%, was detected repeatedly in three environments (2016WH, 2017NC, and 2017WH). These markers were distributed across nine linkage groups based on previously reported genetic maps (Additional file 1: Table S2). The physical positions of the associated markers were 12.7 Mb on B01 (AGGS1014_2), 57.1 Mb on B07 (AGGS1081), 47.4 Mb on A03 (AGGS1149), 124.9 Mb on B06 (AHGS0798), 30.1 Mb on B08 (AHGS1388), 20.8 Mb on B06 (AHGS1431), 36.9 Mb on A04 (AHGS1679), 57.1 Mb on B07 (AHGS2053), 67.8 Mb on B07 (AHS0127), 119.6 Mb on A09 (pPGPseq8D9), 5.1 Mb on A10 (TC11B4_2), and 35.5 Mb on A08 (TC9F10_2).
The number of alleles at these associated loci ranged from two (pPGPseq8D9 and AGGS1014_2) to six (TC11B4_2). The most favorable alleles, i.e., those with the largest effect values, were pPGPseq8D9-131bp, TC9F10_2-256bp, TC11B4_2-298bp, AHGS1679-293bp, AGGS1149-192bp, AGGS1081-201bp, AGGS1014_2-215bp, AHGS2053-256bp, AHS0127-188bp, AHGS1431-260bp, AHGS0798-174bp, and AHGS1388-304bp (Table 3, Additional file 1: Table S3). Accessions carrying different alleles differed significantly in oil content averaged over the four environments (Fig. 4a). Compared with accessions from Southern China (FJ, GD, and GX), the genotypes from Northern China and the Yangtze River region (SD, HEB, HN, JS, SC, and HUB) carried more alleles with relatively high effects (Fig. 4b). The frequencies of the most favorable alleles also showed geographic differences. For ten associated loci (pPGPseq8D9, TC11B4_2, AHGS1679, AGGS1149, AGGS1014_2, AHGS2053, AHS0127, AHGS1431, AHGS0798, and AHGS1388), the frequency of the most favorable allele was highest in Northern China, intermediate in the Yangtze River region, and lowest in Southern China (Fig. 4c). However, for another two associated loci (TC9F10_2 and AHGS1431), the frequency of the most favorable allele was highest in Southern China.
Table 3 Marker–trait associations across four environments for oil content.
| Marker | Environment | F-value | P-value | PVE (%) | Favorable allele |
| --- | --- | --- | --- | --- | --- |
| pPGPseq8D9 | 2017NC | 13.22 | 3.29E-04 | 4.61 | pPGPseq8D9-131bp |
| TC9F10_2 | 2017WH | 6.48 | 3.09E-04 | 7.59 | TC9F10_2-256bp |
| TC11B4_2 | 2017WH | 4.08 | 1.43E-03 | 8.84 | TC11B4_2-298bp |
| AHGS1679 | 2017WH | 5.95 | 6.04E-04 | 6.39 | AHGS1679-293bp |
| AGGS1149 | 2016WH | 6.54 | 1.68E-03 | 4.54 | AGGS1149-192bp |
| AGGS1081 | 2016WH | 5.47 | 1.15E-03 | 5.76 | AGGS1081-201bp |
| AGGS1014_2 | 2016WH | 23.23 | 2.50E-06 | 9.94 | AGGS1014_2-215bp |
| | 2017NC | 14.89 | 1.45E-04 | 6.90 | |
| | 2017WH | 21.43 | 5.90E-06 | 8.75 | |
| AHGS2053 | 2016WH | 6.29 | 3.81E-04 | 6.65 | AHGS2053-256bp |
| AHS0127 | 2016WH | 10.04 | 6.13E-05 | 6.99 | AHS0127-188bp |
| AHGS1431 | 2016WH | 9.11 | 1.52E-04 | 7.35 | AHGS1431-260bp |
| AHGS0798 | 2015WH | 9.54 | 1.03E-04 | 7.28 | AHGS0798-174bp |
| AHGS1388 | 2016WH | 8.84 | 1.94E-04 | 6.78 | AHGS1388-304bp |
PVE, phenotypic variance explained
Evaluation of RIL population and confirmation of associated markers
To estimate the potential value of the associated loci in peanut breeding, a recombinant inbred line (RIL) population derived from two additional accessions (Zhonghua 10 and ICG12625) was used as a test population. Oil content of the RIL population ranged from 47.45% to 60.88% in Env1, 45.30% to 58.96% in Env2, 42.89% to 55.07% in Env3, and 45.98% to 58.37% in Env4. The oil content of the female parent was 51.88 ± 1.41%, whereas that of the male parent was 53.32 ± 1.47%. Three markers (AGGS1014_2, AHGS0798, and AHGS1431) showed association with oil content in the RIL population. A significant difference in oil content between the homozygous alleles from P1 and P2 at the AHGS1431 locus was observed in Env1 (Additional file 1: Table S4). At the AGGS1014_2 locus, lines carrying the homozygous allele from P2 had significantly higher oil content than lines carrying the homozygous allele from P1 in two environments, Env2 and Env4 (Fig. 5a and Additional file 1: Table S4). For marker AHGS0798, the oil content associated with the homozygous allele from P2 was significantly higher than that associated with the homozygous allele from P1 in two environments (Fig. 5a and Additional file 1: Table S4). The combined allele effect of AGGS1014_2 and AHGS0798 showed that the oil content associated with the homozygous alleles from P2 was significantly higher than that associated with the homozygous alleles from P1 across environments (Fig. 5c).
Among the 292 peanut accessions, the alleles at the AGGS1014_2 (X) locus and the AHGS0798 (Y) locus formed six combined genotypes, namely X-205bp/Y-170bp, X-205bp/Y-172bp, X-205bp/Y-174bp, X-215bp/Y-170bp, X-215bp/Y-172bp, and X-215bp/Y-174bp (Additional file 2: Fig. S5). Oil content was highest in X-215bp/Y-174bp (51.49 ± 1.30%), intermediate in X-215bp/Y-172bp (51.04 ± 1.11%), X-215bp/Y-170bp (50.88 ± 1.17%), X-205bp/Y-174bp (50.66 ± 1.38%), and X-205bp/Y-170bp (49.72 ± 2.10%), and lowest in X-205bp/Y-172bp (49.62 ± 1.19%). The genotypic frequency of X-215bp/Y-174bp was 4.00% in peanut varieties released before 1980 and increased to 22.13% in varieties released after 2000. Similarly, the frequency of X-205bp/Y-172bp increased from 12.00% in varieties released before 1980 to 32.79% in varieties released after 2000. The frequencies of X-205bp/Y-172bp and X-215bp/Y-170bp were lower in varieties released after 2000 than in varieties released before 1980.
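The combined-genotype comparison above amounts to grouping accessions by their alleles at the two loci and summarizing oil content and frequency within each group. A minimal sketch of that bookkeeping is given below; the record layout, function names, and all values are hypothetical and are not the study's data.

```python
from collections import Counter, defaultdict
from statistics import mean


def combined_label(x_bp, y_bp):
    """Label a two-locus genotype in the X-...bp/Y-...bp style used in the text."""
    return f"X-{x_bp}bp/Y-{y_bp}bp"


def mean_oil_by_genotype(records):
    """Mean oil content (%) per combined genotype.

    records: iterable of (allele_x_bp, allele_y_bp, oil_percent).
    """
    groups = defaultdict(list)
    for x_bp, y_bp, oil in records:
        groups[combined_label(x_bp, y_bp)].append(oil)
    return {label: mean(values) for label, values in groups.items()}


def genotype_frequencies(records):
    """Share (%) of accessions carrying each combined genotype."""
    counts = Counter(combined_label(x, y) for x, y, _ in records)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical accessions: (X allele size, Y allele size, oil content in %).
accessions = [
    (215, 174, 52.0),
    (215, 174, 51.0),
    (205, 172, 49.5),
    (205, 172, 49.7),
    (215, 170, 50.9),
]
means = mean_oil_by_genotype(accessions)
freqs = genotype_frequencies(accessions)
print(max(means, key=means.get), means)
print(freqs)
```

Applying `genotype_frequencies` separately to varieties from each release period would give period-wise frequencies of the kind compared above.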