A genomic variation map of autotetraploid potatoes. To capture the diversity of modern commercial autotetraploid potato cultivars, we selected 137 varieties from European, Oceania and North American countries and performed Illumina deep resequencing. This yielded a total of 7.3 Tb data with a mean sequencing depth of 66.73-fold. Previously reported resequencing data of 19 autotetraploid potato cultivars from The United States and Canada, and 10 tetraploid potato landraces distributed in South America [6], were also included in this study (Table S1). The resultant collection comprised 166 accessions from 14 countries of Europe, North America, Oceania and South America, which also covered major potato market classes including fresh market, crisp or French fry processing and starch industry (Fig. 1a and Table S1).
Resequencing reads of the 166 varieties were mapped to the potato reference genome DM version 6.1 [16], with a mean mapping rate of 98.90% (Table S1). We next identified a total of 70,265,669 SNPs and 9,639,375 small insertions and deletions (InDels, ≤ 30 bp), which were evenly distributed across the genome (Table 1 and Fig. S1). Only 2.9% of these variants were localized at gene exons, of which 1.4%, 0.1% and 0.04% led to amino acid changes, frameshift and gain or loss of stop codons, respectively (Table S2). This genomic variation map provides starting materials for functional mining of variants underlying agronomic traits and also offers copious markers that are potentially useful in potato breeding.
Table 1
Summary of genetic variation and population statistics of the autotetraploid potato accessions used in this study.
| | | Groups |
| | All | European | Starch | American |
Number of accessions | | 166 | 102 | 23 | 27 |
Variation | SNPs | 70,265,669 | 58,003,325 | 41,316,793 | 40,790,409 |
| InDels | 9,639,375 | 7,650,164 | 4,879,140 | 4,770,147 |
| Nucleotide diversity (π) | / | 2.76 × 10− 3 | 3.13 × 10− 3 | 3.24 × 10− 3 |
LDa | LD decay (kb) | / | 2.4 | 6.1 | 0.8 |
| | | European vs. Starch | Starch vs. American | European vs. American |
Population divergence | FST, top 5% threshold | 0.21 | 0.28 | 0.13 |
| Mean divergence level (FST) | 0.062 | 0.093 | 0.038 |
| Regions | 94 | 161 | 151 |
| Length (Mb) | 43.45 | 47.40 | 46.15 |
| Genes | 2,019 | 2,476 | 3,333 |
aLD decay denotes the distance where LD reduces to the half of the maximum measurement of LD (r2). |
Population structure of autotetraploid potatoes. Based on market niches, the potato accessions used in this study could be classified into four groups: 72 varieties bred for fresh consumption market (Fresh), 49 varieties that are used in processing crisps or French fries (Processing), 23 varieties specifically bred for starch industry (Starch) and 12 breeding clones that have served as progenitors for potato cultivars (Others). Elucidation of their population structure and relationships is a prerequisite to understand how human selection has altered the genomes of these different potato varieties; however, this knowledge of autotetraploid potatoes is still limited. We initially investigated their relatedness by principal component analysis (PCA) and observed that varieties in the Starch group could be distinguished with a large proportion of the other autotetraploid accessions by the first principal component, whereas those in the Fresh and the Processing groups could hardly be separated (Fig. 1b).
Using the 10 tetraploid potato landraces from South America as the outgroup, the constructed phylogenetic relationships indicated that all 23 Starch varieties, along with 10 out of the 12 breeding clones, were clustered into a monophyletic clade (Fig. 1c and Fig. S2), consistent with the result of PCA. According to the passport information and the potato pedigree database [28], these 10 breeding clones have been frequently exploited for introducing resistance against nematodes, late blight and viruses to breed starch varieties (Table S1). The majority of Fresh and Processing potato accessions were distributed over three monophyletic clades, while 9 Processing varieties were also grouped into the lineage which Starch potatoes formed (Fig. 1c). For example, the variety “Allure” was selected from a cross between a starch variety “Astarte” and a breeding clone “AM 66 − 42” [28], two accessions that are phylogenetically close to Allure (Fig. 1c and Fig. S3). Allure has a high Under Water Weight (UWW, an indicator of dry matter content; 504 g/5kg), which might have originated from its genetic background that is largely from Starch potatoes. We also found that 25 out of the 27 North American cultivars were distributed in a monophyletic group, in which 9 non-starch varieties from European countries were also represented (Fig. 1c).
Model-based clustering supported the distinction between Starch varieties and other potatoes when the number of group (K) was equal to 2 (Fig. 1d and Figs. S4 and S5). We also observed that 12 out of the 23 Starch varieties and 24 out of the 27 North American cultivars are largely genetically homogeneous, while most non-starch European potatoes show admixture (Fig. 1d). Seven breeding clones (black lines in Fig. 1c and Fig. S3), such as MPI 19268, AM 78-3736 and VTN 62-33-3, also contained large proportions of “Starch” genomic composition (yellow bars; Fig. 1d and Fig. S5), consistent with the fact that they have been frequently used as donors of disease and pest resistance throughout the breeding of starch potato varieties (Table S1) [28]. Collectively, these results suggest that potato varieties bred for starch industry show obvious genetic distinction compared with other potatoes for the markets of fresh consumption and frying processing, and tetraploid potato cultivars from North American countries have quite different genetic background than their European counterparts. We thus divided our potato accessions into three groups for further analyses: 102 non-starch potato accessions from European countries (European), 23 starch industry potato varieties (Starch) and 27 cultivars from North America (American).
Diversity of autotetraploid potatoes. A global view of genetic diversity level among tetraploid potatoes is essential to understand population dynamics during human selection. We calculated nucleotide diversity (π) [29] for potato accessions from the European, Starch and American groups and observed that American potato varieties had the highest diversity (π = 3.24×10− 3; Fig. 1e and Table 1). Counterintuitively, the nucleotide diversity of the Starch varieties (π = 3.13×10− 3) was higher than that of European potatoes (π = 2.76×10− 3; Fig. 1e, Table 1 and Table S3), since Starch potatoes have been selected for high dry matter weight and high yield from non-starch potato accessions. This may be because of frequent efforts to introduce resistance genes during the development of starch varieties, by introgression from wild potato species [28], which has possibly increased diversity in Starch potatoes.
We extracted private variants (those only present in one given group of potatoes) in the three potato groups, Starch, European and American. Starch potatoes contain the lowest number of private variants (1,741,745), only accounting for 3.77% of all identified variants, which was significantly lower than that of European potatoes (20.80%; p = 2.20 × 10− 168 in χ2 test). This may be due to the markedly lower number of starch varieties in our collection (23) compared to that of European potatoes (102). Among these private variants in starch varieties, we identified 24,828 nonsynonymous SNPs and InDels that led to changes of amino acids, potentially having high impact to protein functions to 9,762 genes in the potato DM reference genome. We observed that protein domains related to nucleotide-binding and leucine-rich repeat immune receptor (NLR) type of disease resistance genes, including coiled-coiled, NB-ARC, and leucine-rich repeat, were significantly enriched in these 9,762 genes (Table S4 and Fig. S6), suggesting possible exotic origin due to wild introgression.
The decay of linkage disequilibrium (LD) with physical distance between pairs of SNPs to half of the maximum values occurred at 6.1 kb in Starch potatoes (r2 = 0.23), considerably higher than that in European (2.4 kb; r2 = 0.20) and American (0.8 kb; r2 = 0.22) accessions (Fig. 1f and Table 1), suggesting possible bottlenecks during the selection of starch industry potato varieties.
Divergence between starch and European potato varieties. Starch potato varieties contain significantly higher tuber UWW (an indicator of dry matter content) compared with other non-starch European potatoes (Fig. 2a). Exploitation of potato wild species during the breeding of starch industry varieties has successfully introduced disease resistance loci [28] but also significantly elevated tuber SGA content (Fig. 2b). These efforts may have jointly produced starch potatoes with a markedly different genomic composition compared to other potatoes (Fig. 1b, d); however, the genomic basis of divergence between starch varieties and non-starch European potatoes remains elusive. We computed fixation index (FST), a measure of population differentiation, between 95 European (excluding those accessions that were used in breeding of starch potato varieties) and 23 Starch potatoes. This led to the identification of 94 highly diverged genomic regions (FST > 0.21; the genome-wide average was 0.062) that contained 2,019 genes, occupying 43.45 Mb of the potato genome (Table 1, Fig. 2c, Figs. S7 and S8 and Table S5). The majority of these genes were involved in cell metabolic, biosynthetic and protein modification processes (Table S6). We found that 27 previously reported quantitative trait loci (QTLs) and genome-wide association study (GWAS) peaks underlying potato agronomic traits, including tuber starch content, tuber SGA content, tuber yield, and disease resistance (Table S7), co-localized with these highly diverged genomic regions (Fig. 2c).
To further identify genes that may have undergone substantial differentiation in starch varieties compared with their non-starch European counterparts, we identified 2,782 non-synonymous SNPs that displayed a high degree of population divergence (FST > 0.3; Fig. 2d) from a total of 70,265,669 SNPs, of which 52.7% (1,466) were located on chromosome 3, consistent with the most obvious FST peak detected in this chromosome (Fig. 2c). These variants impacted coding regions of 1,000 genes (Table S8), and 271 out of the 1,000 genes were expressed in tubers (transcripts per million [TPM] ≥ 1) and also overlapped with previously reported QTLs underlying tuber starch content (tsc; Fig. 2d). Notably, among the 271 genes, two genes, encoding sucrose synthase 6 (Soltu.DM.03G019120.1), starch synthase I (Soltu.DM.03G022350.3), respectively, are involved in the proposed starch biosynthetic pathway [30]. The other gene encodes a phosphoglucan phosphatase (Soltu.DM.03G024560.1) that participates in starch degradation [30]. All three genes were localized in tsc3.1 (Fig. 2d and Table S8) and therefore, might have important roles in regulating tuber starch metabolism in starch industry potato varieties. Our analysis provides a genomic foundation for further functionally characterizing genes that contribute to the extremely high content of starch in potatoes bred for starch produciton. It can, however, not be excluded that these identified highly diverged genomic regions, may also echo possible introgressions in starch varieties derived from potato wild species, as compared to non-starch European potatoes.
Selection in starch varieties. The development of potato varieties for starch processing industry involves intensive pursuit of high yield and high tuber starch content, which has possibly exerted strong selection pressure in certain genomic regions. Dissecting the genomic basis of selection in starch varieties may facilitate the identification of loci underlying these traits. We thus scanned the potato genome and identified regions that showed drastic reduction of genetic diversity in starch varieties compared with non-starch European potatoes (πEuropean/πStarch > 1.85; selective sweeps; Fig. 3a, b). A total of 161 regions that encompassed 44.75 Mb of the potato reference genome were identified, which contained 2,380 protein-coding genes (Tables S9 and S10). Most of these genes were annotated to participate in metabolic and biosynthetic processes (Table S11). We observed that only 10.5% (4.71 Mb out of 44.75 Mb) of selective sweeps and 368 out of the 2,380 genes within selective sweeps overlapped with those in domestication sweeps (Fig. 3b), which are regions with obvious decrease of diversity between S. candolleanum, the proposed progenitor of diploid potato landraces [6]. This suggests that selection in starch varieties may have targeted distinct parts of the potato genome compared with the domestication from wild species. We found that 9 out of the 16 yield-related QTLs (QTLs of tuber starch content [tsc], tuber starch yield [tsy] and tuber yield [ty]) were shared between domestication and selective sweeps, suggesting that these loci, despite having potentially been selected during potato domestication, may have undergone a second round of selection during the development of starch industry varieties. Among the 16 yield-related QTLs, seven (3 tsc, 2 tsy and 2 ty QTLs) were exclusively detected to overlap with selective sweeps (Fig. 3a, b), implying that selection from European potatoes to starch varieties might have targeted an additional set of loci underlying potato yield. The landscape of selective sweeps provides a starting point for further identifying genes contributing to superior agronomic important traits in potato starch industry varieties.
We observed that nine genes involved in starch biosynthesis [30], which encoded four β-amylases (BAM), one α-glucan, water dikinase (GWD), one starch synthase (SS), two sucrose synthases (SuSy) and one ADP-glucose pyrophosphorylase small subunit (AGPS), overlapped with the identified selective sweeps (Fig. 3c and Table S12). Intriguingly, 3 out of the 4 BAM-encoding genes were highly expressed in potato leaves and not expressed in tubers (Fig. 3c, d), suggesting that selection for starch degradation pathway in starch industry varieties might have mainly targeted on BAMs that function in leaves. The two genes encoding SuSy, which catalyzes the conversion from sucrose to ADP-Glucose in cytosol, a key step of starch biosynthesis [31], were predominantly expressed in tubers (Fig. 3c,d). Starch phosphate content is important to potato starch manufacturing industry, as numerous starch-derived products require the use of phosphorylated starch [32]. GWD catalyzes the phosphorylation of starch in both potato leaves and tubers (Fig. 3d), which has been proposed to be a candidate gene for starch phosphorylation degree [33]. We observed that GWD was within selective sweeps but not domestication sweeps, suggesting that GWD might have recently been selected to develop potato verities suitable for starch processing industry. These results offer perspectives how artificial selection has occurred in the starch metabolic pathway in starch industry potato varieties.
Wild introgression in starch varieties. To introduce in cultivated potatoes resistance to pathogens such as Phytophthora infestans (the late blight pathogen), potato cyst nematodes and viruses, potato wild species have been exploited over the breeding history of starch industry potato varieties, and also more recently for other markets [28, 34]. These efforts have been employed more extensively in starch potatoes than other market types of varieties such as table potatoes and processing potatoes, due largely to the fact that fewer number of traits need to be simultaneously considered in starch varieties: only yield, tuber starch content (or dry matter content) and disease resistance was crucial to meeting the demand of the starch processing industry. However, how introgression breeding has shaped the genomes of starch potato varieties remains unexplored. To examine possible introgression patterns in potatoes, we employed ABBA BABA statistics (also known as Patterson’s D statistics) [35] to compare allelic composition in the genomes of starch varieties and 50 diploid potato wild species [6], using 3 non-tuber bearing species from Solanum section Etuberosum as the outgroup. We observed significantly more wild-origin alleles in starch varieties than in non-starch European varieties, compared with 47 out of the 50 wild potato species, setting the significance threshold (Z-score) to 3. Wild species from nuclear clades 3 + 4 [36] displayed markedly higher Patterson’s D than did their clades 1 + 2 counterparts (Fig. 4a). This reflects hybridization barriers between autotetraploid cultivated potatoes with 2 or 4 endosperm balance numbers (EBN) and diploid wild potato species belonging to clades 1 + 2 (1 EBN) [36]. The most significant introgression was detected in Solanum vernei (Fig. 4a,b), a wild potato species that has been extensively exploited in starch potato breeding programs for disease and pest resistance [36, 37]. We estimated that 6.24% of the genomes of starch varieties have possibly been introgressed from S. vernei derived genomic fragments (Fig. 4b). We also detected significant signals of introgressions from several wild species that are phylogenetically close to S. vernei, such as Solanum chacoense, and Solanum brevicaule (Fig. 4a) [24], which were reported to have been used in potato breeding [36]. Interestingly, wild introgression in starch varieties was not random: chromosome 3 and 6 showed the most obvious wild admixture in several wild potato species, whereas nearly no introgression was detected on chromosome 10 (Fig. 4c).
We next divided the potato genome into 500-SNP windows with 250-SNP of step size (mean window size = 831.3 kb) and computed fdM statistics, an extension of Patterson’s D and f4-ratio that proves useful in estimating introgression signatures along chromosomes [38], between the 23 starch varieties and S. vernei. The mean introgressed proportion from S. vernei into starch varieties was estimated to be 6.81% (54.5 Mb), consistent with the estimation based on f4-ratio (6.24%; Fig. 4b). We detected clear signals of introgressions close to previously reported resistance genes, including Gpa5 conferring resistance to Globodera pallida cyst nematode [39], Ny(on)sto [37] and Rx [40], which provides resistance to Potato virus Y and Potato virus X, respectively (Fig. S9a). We observed that recently developed starch varieties tended to carry higher proportions of introgressions from S. vernei (Fig. S9b), and this proportion varied among the 23 starch industry varieties, with the highest being observed in “Valiant” (7.66%), whereas the lowest was only 1.72% in “Ehud” (Fig. 4d), in accordance with its lowest proportion of “Starch” genomic composition (yellow bars; Fig. 1d). We also observed diverse distribution of putative introgressed genomic fragments across the potato genome (Figs. S10 and S11). One of the most striking differences was found between the two starch varieties, Altus and Kardent: introgression signals on chromosomes 3 and 6 observed in Altus were remarkably reduced in Kardent, whereas obvious signatures of alien introgression on chromosome 12 in Kardent could barely be detected in corresponding genomic regions in Altus (Fig. 4e), suggesting diverse efforts that have been applied to handle wild introgressions in starch varieties. Therefore, if breeders intend to remove putative wild-origin segments on chromosomes 3 and 6 in Altus, one possible approach would be crossing between Altus and Kardent followed by selecting descendants that do not carry these fragments. Our analyses offer a holistic view of wild introgression in starch potato varieties, providing a theoretical foundation for further mitigating the impact of possible wild species-derived linkage drag in starch potato varieties.
Contribution of alien introgression to yield-related traits. We observed that overall introgression proportion was significantly positively correlated with UWW, an indicator of tuber dry matter and starch content (Fig. 5a), while no significant correlation could be detected between overall introgression proportion and total SGA content (Fig. 5b). Of the 54.5 Mb potential introgressed regions from S. vernei among the 23 starch varieties, 6.35 Mb (11.6%) contained NLR-encoding disease resistance genes (Table S13) [24], whereas 3.81 Mb (7.0%) and 8.56 Mb (15.7%) overlapped with previously reported QTLs underlying tuber SGA content and yield-related traits (tuber starch content, tuber yield and tuber starch yield; Table S7), respectively (Fig. 5c). Note that due to the limited resolution of QTL mapping [41–44], those yield-related QTLs occupied 169.1 Mb of the potato genome, which was much larger than the genomic fragments that contain NLR-encoding genes (44.6 Mb). This may lead to overestimation of wild introgression segments that underlie potato yield. The remaining 65.7% of alien introgression may include genomic segments displaying possible linkage drag. We then visualize the genome-wide introgression proportion along with previously reported QTLs underlying yield-related traits (Fig. 5d). We identified that tsc3.1, tsc3.2, tsy3.1, ty5.1, ty5.2, tsy12.1, tsc12.1 and ty12.1 resided in regions with relatively high degree of introgression (Fig. 5d). These results suggest that in addition to resistance genes, introgression from wild potato species may also incorporate favorable alleles contributing to yield-related traits, such as starch content, into starch industry potato varieties.
We further performed genome-wide association studies (GWAS) for UWW normalized by maturity (an indicator of tuber dry matter and starch content) using the potato varieties collected in this study, and identified 14 association signals (Fig. 5e and Tables S14 and S15). We observed significantly higher proportions of wild introgression within the flanking genomic regions of these 14 GWAS signals, as compared with that of the overall potato genome (Fig. 5f; two-tailed Wilcoxon rank sum test, p = 0.049). For example, one GWAS signal overlapped with a 4.2-Mb region at the end of chromosome 8 (55.0 Mb – 59.2 Mb), showing obvious introgression signature (Fig. 5e,f). Among the 23 starch varieties investigated in this study, 16 displayed clear introgression signals within this region (Figs. S10 and S11). Despite that, no nearby NLR-encoding disease resistance genes were identified (Table S13) [24], implying that this likely introgression event may not be due to the selection of disease resistance. Notably, starch varieties carrying this introgressed segment have higher tuber content dry matter (estimated by UWW) than those lacking this fragment, whereas non-starch European potatoes, which did not show signal of introgression within this region, have significantly lower dry matter content (Fig. 5g). Within this region, we identified two genes involved in starch biosynthesis pathway (Soltu.DM.08G025340.1 and Soltu.DM.08G025360.1; Fig. 5h and Table S15). Soltu.DM.08G025340.1 encodes a trehalose 6-phosphate phosphatase, whose substrate, trehalose 6-phosphate, was reported to regulate starch synthesis in Arabidopsis plastids [45]. Soltu.DM.08G025360.1 encodes a β-fructofuranosidase (invertase) that catalyzes the conversion of sucrose to reducing sugars, a crucial step in starch biosynthetic pathway [5]. Our analysis underlines the potential significance of wild introgression in the genetic determination of potato yield-related traits, providing promising candidates for further investigating the contribution of wild potato-origin alleles to the high dry matter content in starch potato varieties.
New genomic loci associated with agronomic traits. Leveraging the diversity panel of 137 potato accessions, we performed GWAS for five agronomic traits (Blackspot bruise, maturity, tuber flesh color, tuber skin color and resistance to common scab; Table S16 and Fig. S12) and identified 11 significantly associated signals, of which eight were newly detected in this study (Fig. 6a–d, Tables S17 and S18 and Fig. S13). Blackspot bruise, which results from mechanical damage during harvesting or transportation, leads to black or blue patches beneath the skin of potato tubers, representing one of the major consumer complaints and also compromising manufacturing quality [46, 47]. Potato varieties bred for industry starch production contain a high content of dry matter, which has been reported to associate with increased bruising susceptibility [46]. We identified a clear peak on chromosome 6 for tuber blackspot bruise, within which the leading SNP (G–T substitution) had a p value of 3.05 × 10− 08 (Fig. 6a,e). Potato accessions that carry increasing numbers of alternate allele (T) displayed a significantly higher resistance to blackspot bruise (Fig. 6f). Notably, the gene closest to the leading SNP (7,042 bp), Soltu.DM.06G024040, encodes an ethylene-responsive transcription factor belonging to the APETALA2/Ethylene-Responsive Factor (AP2/ERF) superfamily (Fig. 6e). Genes in this superfamily are known to play key regulatory roles in plant stress response [48]. The expression of its Arabidopsis ortholog (AT5G25190) was reported to be significantly induced by ethylene [49]. Previous studies indicated that pretreatment of tubers with a low concentration of ethylene mitigates tuber blackspot bruise [50, 51]. Another research also found that AP2/ERF transcription factors were significantly differentially expressed between potatoes with high and low resistance to tuber blackspot bruise [52]. Therefore, Soltu.DM.06G024040 may serve as a candidate gene for resistance to tuber blackspot bruise and its function necessitates further investigation.
Potato plant maturity has been used as an estimate of several important traits such as the time point of tuberization and total plant life cycle length [53], and it has also been reported to strongly associate with tuber dry matter content [41, 44]. Potatoes grown for industry starch production are mostly late season varieties that will mature in ~ 120 days, leading to a high tuber dry matter content as measured by UWW (Table S16). The potato CYCLING DNA-BINDING WITH ONE FINGER FACTOR 1 (StCDF1) gene was reported to be a central regulator for plant maturity [53]; however, other minor-effect loci underlying this important phenotype remain largely unknown. Apart from the most significant GWAS peak on chromosome 5, which harbored StCDF1 (Fig. 6c), we detected another clear signal on chromosome 9 for plant maturity (Fig. 6c,g,h). A phytochrome A-associated F-box protein-encoded gene (Soltu.DM.09G023440) was found to be localized 71.1 kb downstream of the leading SNP (Fig. 6g). Notably, this gene is orthologous to EID1 (AT4G02440) in Arabidopsis (Fig. 6i), which functions as an important component in phytochrome signaling cascade [54]. Its tomato ortholog (Solyc09g075080) was reported to be responsible for slower circadian rhythms in domesticated tomatoes [55]. Given the established regulatory role of circadian clock genes in plant growth and adaptation, such as flowering time and tuber onset [56, 57], we propose that Soltu.DM.09G023440 represents a potential candidate gene for regulating potato’s plant maturity, whose function warrants further experimental validation.