Genotype analysis of potato population
All SNPs and indels were filtered for minor allele frequency (MAF) > 0.05 and Hardy–Weinberg equilibrium P value > 0.001, yielding 20,382 high-quality SNPs and 2,107 indels. Annotation of these variants showed that 18,683 (83.07%) were located in intergenic regions and 3,806 (16.92%) were located in genic regions, of which 951 were in untranslated regions, 2,796 were in introns, and only 1,682 SNPs were in coding regions. In the coding regions, 771 SNPs produced synonymous (silent) mutations and 911 SNPs produced missense mutations, giving a nonsynonymous-to-synonymous ratio of 1.18 (Table 1).
Table 1
Summary of single-nucleotide polymorphisms and indels
| Total SNPs and indels | Intergenic | 3' UTR | 5' UTR | Intron | Coding sequence (total) | Synonymous | Missense | Nonsyn/syn ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 22,489 | 18,683 | 538 | 413 | 2,796 | 1,682 | 771 | 911 | 1.18 |
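The filtering criteria stated at the start of this section (MAF > 0.05 and Hardy–Weinberg P > 0.001) can be illustrated with a minimal sketch. It assumes biallelic sites scored with diploid-style genotype counts and a one-degree-of-freedom chi-square test; the text does not say which test or software was used, and the function names are hypothetical.

```python
import math

def maf_and_hwe_p(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Return (minor allele frequency, HWE chi-square P value) for one biallelic site.

    Assumption: diploid-style genotype counts (AA, AB, BB); not the authors' pipeline.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples at this site")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0.0:
        return 0.0, 1.0  # monomorphic site: HWE cannot be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return maf, p_value

def passes_filter(n_aa: int, n_ab: int, n_bb: int,
                  min_maf: float = 0.05, min_hwe_p: float = 0.001) -> bool:
    """Apply the two thresholds quoted in the text."""
    maf, p_value = maf_and_hwe_p(n_aa, n_ab, n_bb)
    return maf > min_maf and p_value > min_hwe_p
```

A site with genotype counts (50, 100, 50) matches Hardy–Weinberg proportions exactly and is kept, whereas a site with no heterozygotes at all, such as (100, 0, 100), is rejected by the HWE criterion.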
Identification and analysis of the late blight resistance of potato leaves in vitro
Among the 284 materials tested, 37 germplasm resources were asymptomatic or had only necrotic spots at the inoculation site and were assigned a disease severity grade of 0; 15 had a lesion diameter (d) ≤ 5 mm with no chlorotic halo at the lesion edge (grade 1); 30 had 5 mm < d ≤ 10 mm with a water-soaked lesion and a chlorotic halo at the edge (grade 2); 107 had 10 mm < d ≤ 20 mm with a water-soaked lesion and a chlorotic halo at the edge (grade 3); and 95 had d > 20 mm with the lesion covered by a uniformly thick layer of mildew (grade 4) (Table S1).
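The grading rule can be written as a small function. This is a simplified sketch that uses only the lesion diameter and a flag for the asymptomatic case; the qualitative criteria (chlorotic halo, water-soaking, mildew layer) are not modelled, and the function name is hypothetical.

```python
def severity_grade(lesion_diameter_mm: float, has_lesion: bool = True) -> int:
    """Map a lesion diameter (mm) to the 0-4 disease severity grade.

    Simplification: has_lesion=False stands for 'asymptomatic or only necrotic
    spots at the inoculation site' (grade 0).
    """
    if not has_lesion:
        return 0
    if lesion_diameter_mm <= 5:
        return 1
    if lesion_diameter_mm <= 10:
        return 2
    if lesion_diameter_mm <= 20:
        return 3
    return 4

# Counts per grade as reported in the text; they sum to the 284 materials tested.
GRADE_COUNTS = {0: 37, 1: 15, 2: 30, 3: 107, 4: 95}
TOTAL_MATERIALS = sum(GRADE_COUNTS.values())
```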
Analysis of population structure
ADMIXTURE was used to analyse the 22,489 high-quality SNPs and indels. The number of subgroups (K) was set to each integer from 1 to 12, and the cross-validation (CV) error was calculated for each K value (Fig. 1A). When K was 1–4, the CV error gradually increased. When K was greater than 4, the CV error dropped rapidly to a nadir at K = 8, and for K > 8 it gradually increased. Therefore, K = 8 was taken as optimal, that is, the entire potato population was divided into eight subgroups.
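Selecting K by the lowest CV error can be sketched as follows. ADMIXTURE, when run with cross-validation enabled, reports lines of the form `CV error (K=8): 0.52`; the log text and the error values here are made up for illustration and are not the study's results.

```python
import re

# Matches the cross-validation lines that ADMIXTURE writes to its log.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def best_k(log_text: str) -> tuple[int, dict[int, float]]:
    """Return the K with the smallest CV error and the full K -> error mapping."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no 'CV error' lines found")
    return min(errors, key=errors.get), errors

# Hypothetical log excerpt (illustrative values only).
EXAMPLE_LOG = """
CV error (K=7): 0.60
CV error (K=8): 0.55
CV error (K=9): 0.58
"""

chosen_k, cv_errors = best_k(EXAMPLE_LOG)
```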
PCA was performed in R using all high-quality SNPs and indels, and the samples were plotted in R, grouped by the eight subgroups inferred by ADMIXTURE (Fig. 1C). The eight subgroups could largely be distinguished along the PC1 axis, and the clustering results were consistent with the population structure division. Potatoes are native to the Andes of South America, and the history of their cultivation can be traced back to southern Peru between 8,000 and 5,000 BC.
According to the Q values of each material across the eight subgroups, each material was assigned to the subgroup with the largest Q value (Fig. 1B). Subgroups 1 to 8 contained 11, 33, 25, 16, 12, 30, 51, and 86 germplasm resources, respectively. The subgroups differed in their distribution along the PC1 axis, consistent with the population structure division. Not all of the eight subgroups clustered together on the phylogenetic tree.
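The assignment rule, in which each material goes to the subgroup with its largest Q value, amounts to a row-wise argmax over the Q matrix. A minimal sketch with a toy three-subgroup Q matrix (the values are invented, and ties are resolved in favour of the lower-numbered subgroup, which is an arbitrary choice):

```python
from collections import Counter

def assign_subgroups(q_rows: list[list[float]]) -> list[int]:
    """Return the 1-based subgroup index with the largest Q value for each sample."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in q_rows]

# Toy Q matrix: one row per sample, one column per subgroup (rows sum to 1).
TOY_Q = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.30, 0.60, 0.10],
    [0.10, 0.10, 0.80],
]

labels = assign_subgroups(TOY_Q)
subgroup_sizes = Counter(labels)  # number of samples assigned to each subgroup
```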
(A) The population structure of the 284 potato materials was analysed using ADMIXTURE, and the CV error was calculated for K = 1–12. (B) Population structure at K = 8. Each individual is represented by a line partitioned into eight colours; the subgroup to which a variety belongs can be inferred from the colour proportions. (C) PCA of all 284 potato samples based on the high-quality polymorphic loci. Each dot represents a sample.
The neighbour-joining method was used to construct a phylogenetic tree, which was drawn with iTOL to explore the genetic relationships among the 284 potato germplasms. Overall, the clustering results were consistent with the division of the population structure: subgroups 1, 2, and 6 clustered together well, while samples of the other subgroups could be clustered together but showed some cross-over between samples (Fig. 2).
Genetic diversity revealed by SNP markers
Based on the 22,489 high-quality SNPs and indels, the genetic diversity (π) of all 284 potato germplasm resources was 0.2161, and the genetic diversity index of the eight subgroups ranged from 0.1638 to 0.2502. Subgroup 8 had the lowest genetic diversity index (0.1638), and subgroup 6 had the highest (0.2502) (Table 2). These data show that there was rich genetic diversity in the 284 potato germplasm resources.
Table 2
Summary of genetic diversity (π)
| Subgroup | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| π | 0.2378 | 0.2387 | 0.2353 | 0.2007 | 0.2080 | 0.2502 | 0.2130 | 0.1638 | 0.2161 |
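For orientation, nucleotide diversity at a biallelic site is the probability that two randomly drawn chromosomes differ. A minimal sketch of the per-site estimator and its average over sites is given here; how π was actually computed for Table 2 is not stated in the text, so the estimator, the function names, and the example counts are all assumptions.

```python
def site_pi(alt_count: int, n_chrom: int) -> float:
    """Unbiased per-site diversity: share of chromosome pairs that differ at the site."""
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = alt_count / n_chrom
    return (n_chrom / (n_chrom - 1)) * 2.0 * p * (1.0 - p)

def mean_pi(sites: list[tuple[int, int]]) -> float:
    """Average per-site diversity over (alt_count, n_chrom) pairs."""
    return sum(site_pi(a, n) for a, n in sites) / len(sites)

# Illustrative allele counts only, not data from the study.
example_sites = [(5, 10), (1, 10), (0, 10)]
example_pi = mean_pi(example_sites)
```

With 5 alternate alleles among 10 chromosomes, 25 of the 45 possible chromosome pairs differ, which is what `site_pi(5, 10)` returns as a fraction.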
The population pairwise F-statistic (FST), a measure of population differentiation, was used to evaluate the degree of differentiation between subgroups of the 284 potato germplasm resources (Table 3). FST among the subgroups ranged from 0.0251 to 0.1489; subgroups 1 and 8 had the highest FST (0.1489), and subgroups 3 and 7 had the lowest (0.0251). Subgroups 2 and 3, 2 and 7, 3 and 7, 3 and 8, and 7 and 8 were relatively weakly differentiated, and their genetic relationships were relatively close, whereas there was a moderate degree of differentiation between the other subgroups.
Table 3
Summary of population pairwise F-statistics (FST)
| Subgroup | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | | | | | |
| 2 | 0.0920 | | | | | | | |
| 3 | 0.0881 | 0.0285 | | | | | | |
| 4 | 0.1442 | 0.0602 | 0.0714 | | | | | |
| 5 | 0.1310 | 0.0802 | 0.0719 | 0.1228 | | | | |
| 6 | 0.0737 | 0.0642 | 0.0661 | 0.1170 | 0.0936 | | | |
| 7 | 0.1076 | 0.0325 | 0.0251 | 0.0611 | 0.0726 | 0.0869 | | |
| 8 | 0.1489 | 0.0600 | 0.0479 | 0.0972 | 0.1098 | 0.1051 | 0.0298 | |
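The weak and moderate categories can be reproduced from the Table 3 values with a short script. The 0.05 cut-off is the commonly used Wright guideline (below 0.05 little differentiation, 0.05 to 0.15 moderate); that the authors applied exactly this cut-off is an assumption, although it recovers the five weakly differentiated pairs named in the text.

```python
# Lower-triangular FST values from Table 3: row subgroup -> values against subgroups 1..row-1.
FST_ROWS = {
    2: [0.0920],
    3: [0.0881, 0.0285],
    4: [0.1442, 0.0602, 0.0714],
    5: [0.1310, 0.0802, 0.0719, 0.1228],
    6: [0.0737, 0.0642, 0.0661, 0.1170, 0.0936],
    7: [0.1076, 0.0325, 0.0251, 0.0611, 0.0726, 0.0869],
    8: [0.1489, 0.0600, 0.0479, 0.0972, 0.1098, 0.1051, 0.0298],
}

# Flatten to {(smaller subgroup, larger subgroup): FST}.
pairs = {(col + 1, row): value
         for row, values in FST_ROWS.items()
         for col, value in enumerate(values)}

WEAK_CUTOFF = 0.05  # assumed guideline threshold, not stated in the text

weak_pairs = sorted(pair for pair, value in pairs.items() if value < WEAK_CUTOFF)
moderate_pairs = sorted(pair for pair, value in pairs.items() if value >= WEAK_CUTOFF)
most_differentiated = max(pairs, key=pairs.get)
least_differentiated = min(pairs, key=pairs.get)
```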
GWAS analysis of late blight resistance of potato germplasms
Using the obtained SNPs and indels, a mixed linear model was used to perform association analysis on the lesion diameter measured on detached leaves affected by late blight (Fig. 3) and on the disease resistance grade (Fig. 4). For lesion diameter, P < 4.4 × 10⁻⁷ was set as the threshold to screen significant loci, and 18 candidate genes were obtained after annotating genes located at or near the significant loci. For disease resistance grade, P < 5 × 10⁻² was set as the threshold to determine the significant loci, and 22 candidate genes were obtained after annotating genes located at or near the significant loci.
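The lesion-diameter threshold of 4.4 × 10⁻⁷ is numerically close to 0.01 divided by the 22,489 markers, that is, a Bonferroni-style correction. The text does not state how the threshold was derived, so this reading is an assumption; the marker names and P values in the sketch are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_markers: int) -> float:
    """Per-marker significance threshold for a family-wise error rate alpha."""
    return alpha / n_markers

def significant_markers(p_values: dict[str, float], threshold: float) -> list[str]:
    """Return the markers whose P value is below the threshold."""
    return [marker for marker, p in p_values.items() if p < threshold]

N_MARKERS = 22_489  # high-quality SNPs and indels reported in the text
threshold = bonferroni_threshold(0.01, N_MARKERS)  # approximately 4.4e-7

# Hypothetical association results (marker name -> P value).
example_p = {"chr01_A": 3.0e-8, "chr05_B": 9.0e-7, "chr12_C": 4.0e-7}
hits = significant_markers(example_p, threshold)
```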
Potato genome-wide LD
The LD-decay curve was obtained by analysing the LD of all 284 potato germplasm resources using the 22,489 SNP markers obtained from the whole genome (Fig. 5). LD decreased as the physical distance between SNPs increased. Taking the coefficient of determination r² = 0.1 as the decay threshold, the decay distances of subgroups 3 and 8 were approximately 0.9 kb, those of subgroups 5 and 7 were approximately 1 kb, those of subgroups 2 and 4 were approximately 1.1 kb, those of subgroups 1 and 6 were approximately 1.2 kb, and that of the entire population was approximately 0.9 kb. All of these are far lower than the LD-decay distances of cultivated rice (123 kb), cultivated soybean (133 kb), cultivated corn (30 kb), and cultivated cassava (8 kb) and slightly smaller than that of the maize inbred line population (1.5 kb) [23].
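One simple way to read a decay distance off pairwise LD values is to average r² in distance bins and report the first bin whose mean falls below the threshold. The sketch assumes this binning approach and uses invented (distance, r²) pairs; the software and smoothing used for Fig. 5 are not described in this section.

```python
from typing import Optional

def decay_distance(pairs: list[tuple[int, float]],
                   bin_size: int = 100,
                   threshold: float = 0.1) -> Optional[float]:
    """Return the midpoint (bp) of the first distance bin whose mean r2 is below threshold.

    pairs: (physical distance in bp, r2) for marker pairs.
    Returns None if the mean r2 never drops below the threshold.
    """
    sums: dict[int, tuple[float, int]] = {}
    for distance, r2 in pairs:
        b = distance // bin_size
        total, count = sums.get(b, (0.0, 0))
        sums[b] = (total + r2, count + 1)
    for b in sorted(sums):
        total, count = sums[b]
        if total / count < threshold:
            return (b + 0.5) * bin_size
    return None

# Invented marker pairs for illustration only.
example_pairs = [(50, 0.5), (80, 0.4), (150, 0.2), (250, 0.08), (350, 0.05)]
example_decay = decay_distance(example_pairs)
```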
Candidate genes
Genome-wide association analysis of disease severity (lesion spot diameter)
We annotated the genes located at the significant loci and within 100 kb of these loci, and 18 candidate genes were found (Table 4). Among these genes, PGSC0003DMG400028682 encodes chitinase 1, which is related to the immune response; PGSC0003DMG400036902 encodes a glycine-rich protein involved in cell wall defence; and PGSC0003DMG400027651, PGSC0003DMG401013892, and PGSC0003DMG400015828 encode ethylene receptor 2, ethylene-responsive transcription factor 7, and ERF transcription factor 5, respectively, which are all related to the response to ethylene. Ethylene can act in coordination with jasmonic acid during pathogen invasion. PGSC0003DMG400000043, PGSC0003DMG400016913, PGSC0003DMG400044850, and PGSC0003DMG400007634 encode protein kinases, which catalyse protein phosphorylation, and PGSC0003DMG400006739 encodes a serine/threonine protein phosphatase (Table 4); such enzymes are regulatory factors mediating signal transduction in response to external stimuli. PGSC0003DMG400031455 and PGSC0003DMG400047159 encode late blight resistance proteins, and PGSC0003DMG400031878 encodes an NBS-LRR resistance protein; these are directly related to potato late blight resistance. Among them, NBS-LRR proteins are the most important late blight resistance proteins. PGSC0003DMG400023057 and PGSC0003DMG400016899 encode leucine-rich repeat family proteins that may be involved in the specific recognition of pathogen effectors.
Table 4
Candidate genes of significant association markers
| CHR | Position | Candidate gene | Start | End | Description |
| --- | --- | --- | --- | --- | --- |
| 1 | 72247021 | PGSC0003DMG400000043 | 72233966 | 72243677 | Protein kinase |
| 3 | 17084761 | PGSC0003DMG400016913 | 17041192 | 17046193 | 3-phosphoinositide dependent protein kinase 1 |
| 3 | 17084761 | PGSC0003DMG400031455 | 17182186 | 17186379 | Late blight resistance protein |
| 5 | 5568522 | PGSC0003DMG400023057 | 5564035 | 5567888 | Leucine Rich Repeat family protein |
| 5 | 23042227 | PGSC0003DMG400016899 | 23031367 | 23033031 | Leucine Rich Repeat family protein |
| 5 | 23042227 | PGSC0003DMG400016900 | 23089073 | 23090433 | Gene of unknown function |
| 6 | 935787 | PGSC0003DMG400031878 | 920425 | 927075 | NBS-LRR resistance protein |
| 6 | 17021861 | PGSC0003DMG400006739 | 16991723 | 17001511 | Serine/threonine-protein phosphatase PP1 isozyme 3 |
| 6 | 17021861 | PGSC0003DMG400043784 | 17040350 | 17041875 | Gene of unknown function |
| 6 | 18227724 | PGSC0003DMG400044850 | 18176603 | 18183507 | Calcium-dependent protein kinase |
| 7 | 51816228 | PGSC0003DMG400027651 | 51816086 | 51822785 | Ethylene receptor 2 |
| 8 | 11363802 | PGSC0003DMG400047159 | 11343551 | 11343883 | Late blight resistance protein |
| 8 | 37714191 | PGSC0003DMG400007634 | 37744509 | 37747990 | Serine/threonine protein kinase |
| 11 | 6786565 | PGSC0003DMG400028682 | 6788959 | 6789991 | Chitinase 1 |
| 12 | 6421567 | PGSC0003DMG401013892 | 6424671 | 6425243 | Ethylene-responsive transcription factor 7 |
| 12 | 12912464 | PGSC0003DMG400036902 | 12954240 | 12957322 | Glycine rich protein 2 |
| 12 | 40498866 | PGSC0003DMG400015828 | 40428885 | 40430635 | ERF transcription factor 5 |
| 12 | 40498866 | PGSC0003DMG400046173 | 40523269 | 40524198 | Gene of unknown function |
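The 100 kb search window around each significant locus can be expressed as a distance check between a marker position and a gene interval. The sketch uses three rows taken from Table 4; the function names are hypothetical, and the actual annotation tool is not named in this section.

```python
WINDOW_BP = 100_000  # flanking distance stated in the text

def distance_to_gene(position: int, start: int, end: int) -> int:
    """Distance in bp from a marker to a gene interval; 0 if the marker lies inside it."""
    if start <= position <= end:
        return 0
    return start - position if position < start else position - end

def within_window(position: int, start: int, end: int, window: int = WINDOW_BP) -> bool:
    """True if the gene lies at or within `window` bp of the marker."""
    return distance_to_gene(position, start, end) <= window

# (chromosome, marker position, gene, gene start, gene end) from Table 4.
ROWS = [
    (1, 72247021, "PGSC0003DMG400000043", 72233966, 72243677),
    (3, 17084761, "PGSC0003DMG400031455", 17182186, 17186379),
    (7, 51816228, "PGSC0003DMG400027651", 51816086, 51822785),
]

distances = {gene: distance_to_gene(pos, start, end)
             for _chrom, pos, gene, start, end in ROWS}
```

In these rows the chromosome 7 marker falls inside the ethylene receptor 2 gene, while the late blight resistance gene on chromosome 3 lies just inside the 100 kb limit.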
Genome-wide association analysis of disease severity grade
By annotating genes at and near the significant loci, 22 candidate genes were found (Table 5). Five of these genes, PGSC0003DMG400027651, PGSC0003DMG400000043, PGSC0003DMG400006739, PGSC0003DMG400047159, and PGSC0003DMG400028682, were also identified in the genome-wide analysis of lesion diameter. PGSC0003DMG400025989 and PGSC0003DMG400026048 encode ERF1, and PGSC0003DMG400026821, PGSC0003DMG400027651, and PGSC0003DMG400036493 encode ethylene-responsive transcription factor 4, ethylene receptor 2, and an AP2/ERF domain-containing transcription factor, respectively, all of which are related to the response to ethylene. The six genes PGSC0003DMG400023584, PGSC0003DMG400019737, PGSC0003DMG400005532, PGSC0003DMG400008506, PGSC0003DMG400033667, and PGSC0003DMG400016323 encode serine/threonine protein kinases. PGSC0003DMG400033661 encodes a mitogen-activated protein kinase. PGSC0003DMG400023346 encodes a serine/threonine protein phosphatase. Plant protein kinases catalyse protein phosphorylation, which is the main mechanism of signal transduction. PGSC0003DMG400019926 encodes a plant resistance protein that is directly involved in potato late blight resistance. PGSC0003DMG400047228, PGSC0003DMG400041609, PGSC0003DMG400042169, and PGSC0003DMG400033671 encode proteins of unknown function.
Table 5
Candidate genes of significant association markers
| CHR | Position | Candidate gene | Start | End | Description |
| --- | --- | --- | --- | --- | --- |
| 1 | 69527286 | PGSC0003DMG400025989 | 69503694 | 69504447 | ERF1 |
| 1 | 69527286 | PGSC0003DMG400026048 | 69548531 | 69549211 | ERF1 |
| 1 | 72247021 | PGSC0003DMG400000043 | 72233966 | 72243677 | Protein kinase |
| 3 | 44871940 | PGSC0003DMG400036493 | 44865282 | 44866451 | AP2/ERF domain containing transcription factor |
| 4 | 8095786 | PGSC0003DMG400023584 | 8093725 | 8096358 | Serine/threonine protein kinase |
| 4 | 9276044 | PGSC0003DMG400019737 | 9274796 | 9280465 | Serine/threonine protein kinase |
| 4 | 20429483 | PGSC0003DMG400005532 | 20327311 | 20330083 | Serine/threonine protein kinase |
| 4 | 20429483 | PGSC0003DMG400047228 | 20522006 | 20522377 | Gene of unknown function |
| 5 | 16350485 | PGSC0003DMG400041609 | 16320408 | 16320749 | Gene of unknown function |
| 5 | 16350485 | PGSC0003DMG400008506 | 16418030 | 16428188 | Receptor serine threonine protein kinase |
| 5 | 31701650 | PGSC0003DMG400033661 | 31687765 | 31689237 | Mitogen-activated protein kinase |
| 5 | 31701650 | PGSC0003DMG400042169 | 31960558 | 31964367 | Gene of unknown function |
| 5 | 32532025 | PGSC0003DMG400019926 | 32513930 | 32514640 | Plant resistance protein |
| 5 | 51527765 | PGSC0003DMG400023346 | 51518931 | 51524033 | Serine/threonine protein phosphatase |
| 6 | 7094249 | PGSC0003DMG400033671 | 7082039 | 7082791 | Gene of unknown function |
| 6 | 7094249 | PGSC0003DMG400033667 | 7164679 | 7168294 | Serine/threonine protein kinase |
| 6 | 17031409 | PGSC0003DMG400006739 | 16991723 | 17001511 | Serine/threonine-protein phosphatase PP1 isozyme 3 |
| 6 | 39855777 | PGSC0003DMG400016323 | 39852449 | 39854311 | Serine/threonine protein kinase |
| 7 | 49532127 | PGSC0003DMG400026821 | 49531657 | 49532565 | Ethylene-responsive transcription factor 4 |
| 7 | 51816228 | PGSC0003DMG400027651 | 51816086 | 51822785 | Ethylene receptor 2 |
| 8 | 11363730 | PGSC0003DMG400047159 | 11343551 | 11343883 | Late blight resistance protein |
| 11 | 6786565 | PGSC0003DMG400028682 | 6788959 | 6789991 | Chitinase 1 |
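The overlap between the two candidate lists can be checked with a set intersection. The gene identifiers are those listed in Tables 4 and 5; only the shared `PGSC0003DMG` prefix is factored out to keep the listing short.

```python
PREFIX = "PGSC0003DMG"

# Candidate genes from the lesion-diameter analysis (Table 4).
LESION_IDS = [
    "400000043", "400016913", "400031455", "400023057", "400016899", "400016900",
    "400031878", "400006739", "400043784", "400044850", "400027651", "400047159",
    "400007634", "400028682", "401013892", "400036902", "400015828", "400046173",
]

# Candidate genes from the severity-grade analysis (Table 5).
GRADE_IDS = [
    "400025989", "400026048", "400000043", "400036493", "400023584", "400019737",
    "400005532", "400047228", "400041609", "400008506", "400033661", "400042169",
    "400019926", "400023346", "400033671", "400033667", "400006739", "400016323",
    "400026821", "400027651", "400047159", "400028682",
]

lesion_genes = {PREFIX + suffix for suffix in LESION_IDS}
grade_genes = {PREFIX + suffix for suffix in GRADE_IDS}

shared_genes = sorted(lesion_genes & grade_genes)  # detected for both traits
```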
Expression patterns of candidate genes
Four candidate genes were randomly selected, and their expression patterns were verified by qRT-PCR in the late blight-resistant varieties A1, CIP10-1, and 0422-19 and the susceptible varieties D8, UK7, and Favorita. Green leaf tissue around the lesion, sampled five days after in vitro inoculation with Phytophthora infestans, and non-inoculated leaves were used, and relative expression levels were calculated (Fig. 6). Most of the candidate genes were up-regulated after inoculation. The expression levels of PGSC0003DMG400028682 and PGSC0003DMG401013892 were higher in resistant varieties than in susceptible varieties, whereas the expression of PGSC0003DMG400000043 was higher in the susceptible variety UK7 than in the resistant variety 0422-19, and that of PGSC0003DMG400006739 was higher in the susceptible variety Favorita than in the resistant variety A1. These results suggest that resistance to late blight is a quantitative trait controlled by multiple genes.
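This section does not say how relative expression was calculated. A common choice for qRT-PCR is the 2^(-ΔΔCt) method, sketched here under that assumption; the Ct values and the use of a single reference gene are hypothetical.

```python
def relative_expression(ct_target_inoculated: float, ct_ref_inoculated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene by the 2^(-ddCt) method (assumed, not confirmed by the text).

    Each Ct is normalised to a reference gene, and the inoculated sample is then
    compared with the non-inoculated control.
    """
    delta_inoculated = ct_target_inoculated - ct_ref_inoculated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_inoculated - delta_control)

# Hypothetical Ct values: the target amplifies two cycles earlier after inoculation.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```

A fold change above 1 corresponds to up-regulation after inoculation, and a value below 1 to down-regulation.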