Phylogeny of sweet potato and its wild relatives
To investigate the phylogenetic relationship of sweet potato and its wild relatives, we analyzed all putative genetic donors of sweet potato as well as 23 sweet potato cultivars and landraces (Fig. 1). Because IbT-DNAs have reportedly been transferred horizontally from Agrobacterium spp. into the genomes of sweet potato and some of its wild relatives, IbT-DNA1 and IbT-DNA2 can serve as ideal natural genetic markers to trace the progenitors of the hexaploid cultivated sweet potato 17,18. Therefore, we sampled the diploid and tetraploid relatives based on IbT-DNA screening (Supplementary Table 1 and Supplementary Figs. 1-4). As diploid relatives, we included three accessions of I. trifida, the species most likely to be the diploid progenitor of sweet potato, and the two wild relatives I. triloba and I. sp. PI553012. We also sampled ten representative IbT-DNA-positive tetraploid wild relatives (ten accessions of I. batatas 4x) from all geographical locations according to Quispe-Huamanquispe et al. 17 and the collection records of the International Potato Center (CIP) and the USDA (Supplementary Fig. 1).
Phylogenetic analyses (Fig. 1; Supplementary Fig. 5) based on whole-genome variations (50,062,627 SNPs) revealed that the diploid wild relatives form the basal clade in the phylogeny of sweet potato and its wild relatives. The ten accessions of I. batatas 4x are not monophyletic. Among them, the basal I. batatas 4x lineage (accessions CIP403270, CIP695141, CIP695150 and PI518474) resides at the base of a large clade composed of sweet potato cultivars and the monophyletic Ecuador I. batatas 4x lineage (accessions PI561246, PI561247, PI561248, PI561255, PI561258 and PI561261). The phylogenetic reconstruction showed that all sweet potato cultivars and the Ecuador I. batatas 4x lineage form two independent monophyletic lineages, thus suggesting a sister relationship between these two lineages.
The basal I. batatas 4x lineage was nested between the diploid progenitor (I. trifida) and the large clade composed of sweet potato cultivars and the Ecuador I. batatas 4x lineage. Thus, the basal I. batatas 4x lineage was likely the progenitor of both sweet potato cultivars and the Ecuador I. batatas 4x lineage (Fig. 1; Supplementary Fig. 5). Specifically, the accession CIP695141 had the smallest branch length relative to sweet potato, suggesting that CIP695141 is the tetraploid accession most closely related to sweet potato. Notably, CIP695141 is the only tetraploid accession examined in this study that harbors both the IbT-DNA1 and the IbT-DNA2 insertions (Supplementary Table 1).
The closest tetraploid accession revealed by haplotype-based phylogenetic analysis (HPA)
To identify the closest tetraploid accession and the possible chromosomal donor to sweet potato, we developed an HPA pipeline (Supplementary Fig. 6). First, we independently phased the genome sets of five representative hexaploid sweet potato cultivars and ten tetraploid I. batatas 4x accessions. For the representative sweet potato cultivars, we chose one cultivar from each lineage in the sweet potato phylogeny (Fig. 1; Supplementary Fig. 5), i.e., Huameyano, NASPOT5/58, NK259L, Y601, and Yuzi7. Each cultivar was paired with each I. batatas 4x accession to extract syntenic haplotype blocks. We obtained 439,555-760,769 haplotype blocks in the five sweet potato cultivars (Supplementary Table 2; Supplementary Fig. 7) and 380,895-542,596 haplotype blocks in the ten I. batatas 4x accessions (Supplementary Table 3; Supplementary Fig. 7). Second, we extracted the syntenic haplotype blocks shared between each sweet potato cultivar and each I. batatas 4x accession by comparing their genomic positions. In doing so, we identified 606,246-1,050,700 syntenic haplotype blocks (Supplementary Table 4; Supplementary Fig. 8). Third, we removed (i) redundant syntenic haplotype blocks that overlapped with other blocks, and (ii) blocks that consisted of very short sequences (less than 3 bp). Ultimately, 406,488-642,341 syntenic haplotype blocks were extracted, which accounted for 33.8-42.7% of the sweet potato genome (Supplementary Table 5; Supplementary Fig. 9).
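The third, filtering step can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Block` type, the `min_len` parameter, the 1-based inclusive coordinates, and the rule of keeping the longer of two overlapping blocks are all assumptions made for illustration, since the text states only that overlapping redundant blocks and blocks shorter than 3 bp were removed.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """A syntenic haplotype block (hypothetical representation)."""
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def filter_blocks(blocks, min_len=3):
    """Drop blocks shorter than min_len, then resolve overlaps per chromosome."""
    long_enough = [b for b in blocks if b.length >= min_len]
    kept = []
    for b in sorted(long_enough, key=lambda b: (b.chrom, b.start, b.end)):
        if kept and kept[-1].chrom == b.chrom and b.start <= kept[-1].end:
            # Overlap with the last kept block: keep the longer one
            # (an assumed rule; the source does not specify which is retained).
            if b.length > kept[-1].length:
                kept[-1] = b
        else:
            kept.append(b)
    return kept


example = [
    Block("chr1", 1, 2),     # 2 bp, removed as too short
    Block("chr1", 10, 20),   # overlaps the next block and is shorter
    Block("chr1", 15, 40),
    Block("chr1", 50, 60),
    Block("chr2", 55, 58),   # same coordinates range, different chromosome
]
filtered = filter_blocks(example)
```

Because the blocks are sorted by start position, a block that replaces its predecessor cannot overlap any earlier retained block, so the result is always non-overlapping within each chromosome.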
The syntenic haplotype blocks identified above between each sweet potato cultivar and each I. batatas 4x accession were used to perform phylogenetic reconstructions independently. The phylogenetic trees were inferred by two methods: Unweighted Pair-Group Method with Arithmetic Mean (UPGMA) and maximum likelihood (ML). We calculated the monophyletic ratio and the Nsp-Nwr distance to measure the relationship between the investigated tetraploid accession and the hexaploid sweet potato (Supplementary Fig. 6d). The monophyletic ratio is defined as the proportion of trees in which the sweet potato haplotypes form a monophyletic clade (Supplementary Fig. 6d). The Nsp-Nwr distance is defined as the tree branch length between the most recent common ancestor (MRCA) node of the sweet potato haplotypes (i.e., Nsp) and the MRCA node of the tetraploid accession haplotypes (i.e., Nwr) (Supplementary Fig. 6d). Smaller values of either index indicate a closer relationship between the investigated tetraploid accession and the hexaploid sweet potato. To increase accuracy, we included only trees that received the same monophyly judgement from both tree-building methods, and these trees were used to calculate the monophyletic ratio and the Nsp-Nwr distance. Among all syntenic haplotype blocks, the 6:4 data set (composed of six haplotypes of sweet potato and four haplotypes of I. batatas 4x) produced the most robust results based on the two indices (Fig. 2; Supplementary Figs. 10-24).
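The two indices defined above can be sketched on toy trees as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: trees are represented as a hypothetical mapping from each node to its (parent, branch length) pair, the function names are invented for this example, the toy trees use three and two haplotypes rather than six and four, and the concordance filter between UPGMA and ML trees is omitted.

```python
def path_to_root(tree, node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while tree[node][0] is not None:
        node = tree[node][0]
        path.append(node)
    return path


def mrca(tree, nodes):
    """Most recent common ancestor of a non-empty collection of nodes."""
    nodes = list(nodes)
    common = set(path_to_root(tree, nodes[0]))
    for n in nodes[1:]:
        common &= set(path_to_root(tree, n))
    # Walking up from any node, the first shared ancestor is the MRCA.
    return next(n for n in path_to_root(tree, nodes[0]) if n in common)


def leaves_under(tree, node):
    """All leaves that descend from `node`."""
    parents = {parent for parent, _ in tree.values()}
    return {n for n in tree if n not in parents and node in path_to_root(tree, n)}


def is_monophyletic(tree, leaves):
    """True if the MRCA of `leaves` has no other descendant leaves."""
    return leaves_under(tree, mrca(tree, leaves)) == set(leaves)


def depth(tree, node):
    """Sum of branch lengths from `node` up to the root."""
    return sum(tree[n][1] for n in path_to_root(tree, node) if tree[n][0] is not None)


def node_distance(tree, a, b):
    """Branch length along the path connecting nodes a and b."""
    top = mrca(tree, [a, b])
    return depth(tree, a) + depth(tree, b) - 2 * depth(tree, top)


def monophyletic_ratio(trees, leaves):
    """Proportion of trees in which `leaves` form a monophyletic clade."""
    return sum(is_monophyletic(t, leaves) for t in trees) / len(trees)


SP = {"sp1", "sp2", "sp3"}  # sweet potato haplotypes (toy example)
WR = {"wr1", "wr2"}         # wild relative haplotypes (toy example)

# Tree A: sweet potato haplotypes form their own clade.
tree_a = {
    "root": (None, 0.0), "Nsp": ("root", 0.25), "Nwr": ("root", 0.5),
    "sp1": ("Nsp", 0.125), "sp2": ("Nsp", 0.125), "sp3": ("Nsp", 0.125),
    "wr1": ("Nwr", 0.125), "wr2": ("Nwr", 0.125),
}
# Tree B: wr1 is nested inside the sweet potato clade.
tree_b = {
    "root": (None, 0.0), "X": ("root", 0.25), "wr2": ("root", 0.5),
    "sp1": ("X", 0.125), "sp2": ("X", 0.125), "Y": ("X", 0.125),
    "sp3": ("Y", 0.125), "wr1": ("Y", 0.125),
}

ratio = monophyletic_ratio([tree_a, tree_b], SP)
nsp_nwr = node_distance(tree_a, mrca(tree_a, SP), mrca(tree_a, WR))
```

In tree A the Nsp-Nwr distance is the sum of the two branches joining Nsp and Nwr through the root, whereas tree B counts as non-monophyletic because a wild relative haplotype falls inside the clade spanned by the sweet potato haplotypes.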
The monophyletic ratio and the Nsp-Nwr distance both indicated that the accession CIP695141 was the closest tetraploid relative of the hexaploid sweet potato, regardless of which sweet potato cultivar was used as the hexaploid reference (Fig. 2). In addition, the accessions CIP403270, CIP695141, CIP695150, and PI518474, which belong to the basal I. batatas 4x lineage, showed closer relationships with sweet potato than any accession in the Ecuador I. batatas 4x lineage (Fig. 2). This result confirms the close relationship between the basal I. batatas 4x lineage and sweet potato as revealed by the SNP-based phylogenetic analysis, and demonstrates that HPA is suitable to distinguish the closest tetraploid accessions of sweet potato.
Gene conversion between sweet potato subgenomes
Gene conversion in polyploids refers to sequence exchanges between homologous genes from different subgenomes, in which one progenitor allele overwrites another 29-31. The sweet potato genome comprises two B1 and four B2 subgenomes (B1B1B2B2B2B2). Subgenomes B1B1 were donated by the diploid progenitor and subgenomes B2B2B2B2 by the tetraploid progenitor 13 (Fig. 3a). If no conversion events occurred, each syntenic haplotype block should have two copies of the B1 subgenome from sweet potato, four copies of the B2 subgenome from sweet potato, and four copies of the B2 subgenome from I. batatas 4x (Fig. 3a, c). If a gene were converted between the B1 and B2 subgenomes, the copy numbers of the subgenomes and the tree topology should deviate from the standard 2:8 ratio between B1 and B2 in the hexaploid sweet potato and I. batatas 4x (Fig. 3c-e). To detect possible gene conversion events, we first filtered the syntenic haplotype blocks, retaining only blocks in gene regions that contained six haplotypes of sweet potato and four haplotypes of I. batatas 4x. We then used 786-1,634 homogeneous haplotype blocks in gene regions of sweet potato and the closest I. batatas 4x accession (CIP695141) to identify gene conversion events between subgenomes (Supplementary Table 6). Using different sweet potato cultivars as references, 49.1-53.4% of gene regions in sweet potato showed evidence of conversion between subgenomes (Fig. 3b; Supplementary Table 6). We found that B1 to B2 subgenome gene conversions (41.1-43.8%) were much more common than B2 to B1 conversions (8.0-9.7%) (Fig. 3c; Supplementary Table 6). This was to be expected, as gene conversion is known to be a copy number-dependent process 32.
Genomic signatures of selective sweeps in sweet potato
To compare genetic diversity between sweet potato and its tetraploid wild relatives, we estimated the genome-wide nucleotide diversity (π) of 23 sweet potato cultivars and landraces and of ten I. batatas 4x accessions. Sweet potato and I. batatas 4x have very similar genome-wide nucleotide diversities (Supplementary Fig. 26; π sweet potato = 0.0227, π I. batatas 4x = 0.0231).
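For readers unfamiliar with π, a minimal sketch of one common estimator is given here: the average proportion of pairwise differences per site, computed from per-site allele counts. The function name and input layout are hypothetical, and the estimator and software actually used in this study are not stated in this section, so the sketch should be read as an illustration of the quantity rather than of the method.

```python
def nucleotide_diversity(site_allele_counts, seq_len):
    """Average pairwise differences per site.

    site_allele_counts: one tuple of allele counts per variable site,
        e.g. (3, 1) for three reference and one alternate allele copies.
    seq_len: total number of sites considered, including invariant ones.
    """
    total = 0.0
    for counts in site_allele_counts:
        n = sum(counts)
        if n < 2:
            continue  # fewer than two sampled copies: no pairs to compare
        # Fraction of ordered pairs of copies that carry the same allele.
        same = sum(c * (c - 1) for c in counts) / (n * (n - 1))
        total += 1.0 - same
    return total / seq_len


# Two variable sites among ten sites; four sampled copies per site.
pi_example = nucleotide_diversity([(3, 1), (4, 0)], seq_len=10)
```

In the example, the first site contributes 0.5 (half of all pairs differ) and the second contributes nothing because it is fixed, so the value is 0.5 divided by the ten sites considered.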
To detect potential signatures of selection during sweet potato domestication, we used three metrics, the π ratio (π wild relative/π sweet potato), Fst and XP-CLR, to identify selective sweeps associated with natural selection and domestication. Using a 100 kb sliding window with 10 kb steps, a total of 466 potential selective sweeps in the top 1% of π ratio, Fst and XP-CLR scores were detected. These regions contained 1559, 1438 and 8814 genes, respectively (Supplementary Fig. 27). Many of these genes are associated with root initiation and development, cell wall organization, phytohormone biogenesis and response, sugar transport, starch and sucrose metabolism, and plant defense (Supplementary Table 7). We highlighted the 20 genes supported by at least two metrics in Manhattan plots (Fig. 4; Supplementary Table 8). Among them, NAC domain-containing protein 100 (NAC100), Agamous-like MADS-box protein AGL14, homeobox protein knotted-1-like (KNOX1), ethylene-responsive transcription factor RAP2-13-like (RAP2-13), and fimbrin-like protein 2 (FIM2) have been reported to be involved in storage root initiation and/or development 33-38. Root hair defective 3 (RHD3), topless-related protein 3-like (TPR3), FAR1-related sequence 5-like (FRS5), and wall-associated kinase (WAK) are functionally related to root development in Arabidopsis 39-41. These nine genes may play important roles in storage root development in sweet potato. Pectin is important for cell wall properties and storage root development 42,43. Two genes involved in pectin biogenesis and acetylation were identified, arabinosyltransferase 1 (ARAD1) and pectin acetylesterase 8-like (PAE8) 44,45.
In addition, the sugar transporter SWEET1, which is known to mediate both low-affinity uptake and efflux of sugar across the plasma membrane 46, and three well-known plant defense genes (i.e., Xa21 encoding a receptor kinase-like protein, DDS encoding dammarenediol II synthase-like, and N encoding TMV resistance protein N-like) were identified 47-49. Sporamin B, a major storage protein in sweet potato storage roots that plays additional roles in defense and development 50, was also detected. In polyploids, maintenance of genomic stability poses particular challenges due to the complex meiotic behavior of the chromosome sets and recombination 51,52. Two genes required for maintenance of genomic stability were identified: spindle and kinetochore-associated protein 1 (SKA1), which is essential for proper chromosome segregation 53, and RECQ helicase L2 (RECQL2), which prevents recombination events and channels repair processes into non-recombinogenic pathways 54.
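The window-based scan described in this section (100 kb windows, 10 kb steps, top 1% of scores, and support from at least two metrics) can be sketched as follows. The function names, the half-open window coordinates, the omission of a trailing partial window, and the rounding of the 1% cutoff are assumptions made for illustration; per-window scores for the π ratio, Fst and XP-CLR are taken as given inputs rather than computed here.

```python
import math
from collections import Counter


def sliding_windows(chrom_len, size=100_000, step=10_000):
    """Yield half-open (start, end) windows; a trailing partial window is skipped."""
    start = 0
    while start + size <= chrom_len:
        yield start, start + size
        start += step


def top_fraction(scores, fraction=0.01):
    """Indices of the highest-scoring windows (at least one), in window order."""
    n_top = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_top])


def supported_by(hits_per_metric, min_metrics=2):
    """Window indices flagged by at least `min_metrics` different metrics."""
    counts = Counter(i for hits in hits_per_metric.values() for i in set(hits))
    return sorted(i for i, c in counts.items() if c >= min_metrics)


# A hypothetical 1.09 Mb chromosome gives 100 windows.
windows = list(sliding_windows(1_090_000))

# Toy scores: window 42 is the clear outlier for this metric.
toy_scores = [1.0] * len(windows)
toy_scores[42] = 9.0
outliers = top_fraction(toy_scores)

# Toy outlier sets for the three metrics.
shared = supported_by({"pi_ratio": [1, 2], "fst": [2, 3], "xp_clr": [3, 4]})
```

With 100 windows, the top 1% corresponds to a single window, and in the toy outlier sets only windows 2 and 3 are flagged by two metrics.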