Genomic insights into the recent chromosome reduction of autopolyploid sugarcane Saccharum spontaneum

Saccharum spontaneum is a founding Saccharum species and exhibits wide variation in ploidy levels. We have assembled a high-quality autopolyploid genome of S. spontaneum Np-X (2n = 4x = 40) into 40 pseudochromosomes across 10 homologous groups, that better elucidates recent chromosome reduction and polyploidization that occurred circa 1.5 million years ago (Mya). One paleo-duplicated chromosomal pair in Saccharum, NpChr5 and NpChr8, underwent fission followed by fusion accompanied by centromeric split around 0.80 Mya. We inferred that Np-X, with x = 10, most likely represents the ancestral karyotype, from which x = 9 and x = 8 evolved. Resequencing of 102 S. spontaneum accessions revealed that S. spontaneum originated in northern India from an x = 10 ancestor, which then radiated into four major groups across the Indian subcontinent, China, and Southeast Asia. Our study suggests new directions for accelerating sugarcane improvement and expands our knowledge of the evolution of autopolyploids. A high-quality autopolyploid genome of Saccharum spontaneum accession Np-X and resequencing of 102 accessions provide insights into the recent chromosome reduction and polyploidization in Saccharum.

Polyploidization is a major force in the evolution of plants, and up to 70% of flowering plant species originated shortly after polyploidization 1 . The study of autopolyploid genome evolution has been hampered by the low availability of information, as few homologous chromosome-level genome assemblies have been completed for autopolyploid taxa.
Modern sugarcane (Saccharum spp., Poaceae) is a crucial crop with an economic value of US$90 billion and provides 80% of the world's sugar and 40% of its ethanol yield (https://www.fao.org/faostat/zh/#data/QV). Saccharum spontaneum is a founding Saccharum species that is widely distributed from the Mediterranean to the Pacific. Notably, the stress tolerance of modern sugarcane hybrids has been improved by the donation of alleles from S. spontaneum to their genetic background, a major breakthrough during sugarcane breeding. Even more importantly, this species exhibits wide variation in chromosome numbers, which range from 2n = 40 to 2n = 128, with ploidy levels ranging from 4x to 16x (ref. 2 ). Moreover, the hexadecaploid genome of S. spontaneum is particularly notable for the highest known degree of polyploidy within its genus 2 and exhibits three basic chromosome numbers 3 , x = 8, x = 9 or x = 10. Because of the extensive variation in the ploidy of its genome, S. spontaneum presents an extreme model for the study of the evolution of autopolyploid genomes in plants.
Recently, the release of four Saccharum genomes has provided an opportunity to jointly characterize the evolutionary history of Saccharum 4-6 (J. Zhang et al., submitted). S. spontaneum Np-X (2n = 4x = 40), which grows along the Himalayas at over 1,300 m above sea level, has the lowest total number of chromosomes among natural Saccharum accessions and, to our knowledge, is the only S. spontaneum accession with x = 10. Here, we describe a high-quality autotetraploid genome assembly of the S. spontaneum Np-X genome that we have generated using circular consensus sequencing (CCS) 7 and the resequencing of 102 S. spontaneum accessions with various ploidy levels and basic chromosome numbers. In addition to describing the very recent chromosome reduction and polyploidization events in Saccharum, our study presents a hereditary blueprint for the genomic basis of sugarcane biology and sheds light on the evolution of autopolyploids.
in the Hi-C chromatin contact signals, the corrected contig assembly has an N50 of 382 Kb and represents a significant improvement over the previous S. spontaneum AP85-441 genome (Table 1 and Supplementary Fig. 1). The total size of this contig-level Np-X genome assembly was 2.76 Gb, which accounts for 97.53% of its genome size as estimated by a genome survey based on k-mers (2n = 4x, ~2.83 Gb) ( Table 1 and Supplementary Fig. 2). A total of 59.97 billion (98.73%) of 60.74 billion Illumina short reads could be aligned, and covered 99.05% of the assembly (Supplementary Table 2).
Five rounds of sequential fluorescence in situ hybridization (FISH) 9 suggested that there are ten homologous groups with four allelic chromosomes within each group in Np-X (Fig. 1c and Extended Data Fig. 1). We then used ~105× of Hi-C data to scaffold the allele-aware, chromosome-level autotetraploid S. spontaneum Np-X genome using ALLHiC 5,10 , and the resulting assembly was further corrected with the assistance of 3D-DNA 11 using JuiceBox 12 (Extended Data Fig. 2 and Supplementary Table 3). A total of 2.73 Gb (98.60%) of contigs were anchored into 40 pseudochromosomes composed of ten homologous groups of four allelic chromosomes with a mean size variation of 5.9% within each group (Fig. 1, Supplementary Tables 4-6, Supplementary Fig. 3 and Supplementary Note). A total of 241 (96.27%) complete gene models among the 248 ultraconserved core eukaryotic genes in CEGMA 13 and 1,389 (96.46%) among 1,440 conserved genes in BUSCO 14 were recalled in our assembly (Supplementary Tables 7 and 8).
As an independent validation, a total of ~26 Gb of Oxford Nanopore ultralong reads with a raw error rate of 5-8% and an N50 of 121 Kb were used to verify the genome assembly (Supplementary Table 1). We could align 90.4% of these ultralong reads that have a mean identity of 95.6% to the assembled S. spontaneum Np-X genome (Extended Data Fig. 3). We further calculated that the rate of false joins between haplotypes is 3.94%, with an estimated 0.05 switch errors per megabase (Mb) (Extended Data Fig. 4 and Supplementary Table 9). This highly contiguous assembly allowed us to predict 37 potential centromeric regions with lengths ranging from 0.41 Mb to 8.08 Mb and 52 telomere regions with monomer copy numbers ranging from 104 to 6,914 along the 40 chromosomes (Supplementary Tables 10 and 11).
Genome restructuring in S. spontaneum. The decrease in the basic chromosome number from 10 to 8 in the previously reported S. spontaneum AP85-441 was caused by fission followed by fusion of its ancestral sorghum chromosome homologs 5 and 8 and its rice chromosome homologs 11 and 12, which are paleo-duplicated chromosome pairs (PdCPs) that originated from the ρ event in the Poaceae 5,17,18 . In contrast, these chromosomal reductions and restructuring events were not found in S. spontaneum Np-X (Fig.  1d,e). Relative to sorghum, two inversions occurred in S. spontaneum AP85-441 chromosomes APChr2AB and APChr7AB and three chromosomal fragments occurred in AP85-441 chromosome APChr6ABD 5 , but these chromosomal inversions were also absent in S. spontaneum Np-X or the related genus Miscanthus (Extended Data Figs. 5 and 6a). These results indicated that chromosomal inversions occurred after the event resulting in the chromosome number decrease in S. spontaneum AP85-441, further confirming that S. spontaneum Np-X retained the chromosome forms of the last common ancestor of S. spontaneum.
To study the fission and fusion of Chr5 and Chr8, we assessed the synteny between S. spontaneum Np-X and S. spontaneum AP85-441 chromosomes (Fig. 2) and found that the recombination breakpoints in S. spontaneum Np-X were located on the centromeres of NpChr5 and NpChr8. APChr5 is composed of the ancestral short arms of NpChr5 and NpChr6, APChr6 is composed of the ancestral long arms of NpChr5 and NpChr7, and the ancestral centromere-specific sequences of NpChr5 were only retained in APChr6 (Fig. 2a,c). Similarly, APChr2 is composed of the ancestral long arms of NpChr8 and NpChr2, and its centromere-specific sequences were retained from only ancestral NpChr2. APChr7, which is composed of the ancestral short arms of NpChr8 and NpChr9, retained two centromere-specific sequences from the two Np-X chromosomes. Comparative analysis of the homologous genomic regions in Np-X and AP85-441 showed that a 6.5-Mb genomic region of NpChr5 containing 15 genes and a 7.1-Mb segment of NpChr8 containing 28 genes appear to have been lost from the centromeric regions of the AP85-441 genome (Fig. 2d and Supplementary Tables 16 and 17). The chromosome-specific oligo-FISH experiment further verified the chromosome rearrangement between Np-X and AP85-441. The centromere satellite repeat FISH probe displayed strong signals in one centromere of each chromosome in Np-X, whereas signals from dicentric chromosomes were identified in two of the fusion chromosomes, APChr6 and APChr7 (Fig. 2a and Extended Data Fig. 6a). These results were also consistent with observations of Hi-C interaction signals in the assembled genomes of Np-X and AP85-441 (Extended Data Fig. 6b,c), further supporting the reliability of the S. spontaneum Np-X genome assembly.
Gene redundancy on PdCPs. The genes on NpChr5 and NpChr8 displayed lower transcript expression than those located on the other chromosomes in the examined tissues, and a similar trend was observed in the AP85-441 (Fig. 3a) and S. officinarum transcriptomes ( Supplementary Fig. 7). Moreover, the homologous regions between these two chromosomes displayed significantly higher mean K s values (K s = 0.036) than the homologous regions of other chromosomes (K s = 0.022) in Np-X (Fig. 3b), suggesting more rapid evolution of these PdCPs.

Change in the chromatin compartments of PdCPs.
To investigate the evolution of the three-dimensional (3D) genome for PdCPs in Poaceae, we analyzed chromatin compartments in PdCPs of AP85-441 5 , Np-X, sorghum 16,21 and rice 20,22 . In these four genomes, 45-56% and 44-55% of genomic regions exist in compartment A and compartment B, respectively (Supplementary Table 19 and Supplementary Note). The genes of the homologous chromosomes of NpChr5, NpChr6, NpChr7 and NpChr9 (74.0-93.5%) reside mainly in the most conserved regions among the four genomes, whereas the genes (66.7-85.4%) that underwent switching from compartment B to compartment A reside mainly on the homologs of NpChr2, NpChr7 and NpChr9 (Extended Data Fig. 7). It is noteworthy that NpChr5 and NpChr8 displayed the lowest degree of regions exhibiting B-to-A compartment switching among the homologous chromosome sets in Np-X compared with AP85-441 (Fig. 3c and Extended Data Fig. 7b), and that genes located in these regions are more highly expressed in AP85-441 than in Np-X (Extended Data Fig. 8). These results indicate that the restructuring of NpChr5 and NpChr8 might have suppressed the switching of chromatin status from inactive to active.
Fig. 2 caption (start missing): The experiments were repeated at least three times with similar results. a1,a4, FISH mapping of chromosome-specific probes (Chr2, Chr5, Chr6, Chr7, Chr8 and Chr9) in the same metaphase cells from S. spontaneum Np-X and AP85-441. FISH signals of six chromosome-specific probes were detected in individual chromosomes in Np-X, whereas in AP85-441, FISH with Chr5-specific probe (green) showed colocalization signals with the Chr6-specific probe (yellow) and Chr7-specific probe (red), and FISH with Chr8-specific probe (cyan) showed colocalization signals for the Chr2-specific probe (pink) and Chr9-specific probe (blue). a2,a5, FISH mapping of centromere satellite-specific probe in S. spontaneum Np-X and AP85-441. FISH signals of centromere satellite-specific probes were detected as a single centromere for each chromosome in Np-X, whereas FISH signals of dicentric chromosomes were detected in two sets of homologous fusion chromosomes, APChr6 (NpChr5-NpChr7) and APChr2 (NpChr8-NpChr9), in AP85-441. a3,a6, DAPI-stained metaphase chromosomes in S. spontaneum Np-X and AP85-441. Scale bars = 5 μm. b, Characteristics of the evolution of ancestral NpChr5 and NpChr8 in the Poaceae. Genomic collinearity blocks are linked by gray lines. The synteny blocks in ancestral NpChr5 and NpChr8 between different species are shown with blue and yellow lines, respectively. AP, Saccharum spontaneum AP85-441; Ms, Miscanthus; Np, Saccharum spontaneum Np-X; Os, Oryza sativa; Sb, Sorghum bicolor. c, The collinearity blocks between chromosomes are indicated with gray lines, and the blue line with peaks represents the density of centromere-specific sequences. The red arrows and bars indicate the corresponding centromeric regions in breakpoints, and the blue arrows and bars indicate the (corresponding) centromeric regions in four chromosomes including NpChr6, NpChr7, NpChr2 and NpChr9. d, The recombination breakpoints are indicated by red lines, together with the segment sizes and the number of genes lost. The synteny blocks of NpChr5 and NpChr8 in S. spontaneum Np-X compared with the corresponding regions in S. spontaneum AP85-441 are highlighted with blue and yellow lines, respectively.
Fig. 1 caption (start missing): internodes (2, 4 and 6 for Np-X; 3, 6 and 9 for AP85-441; scale bars = 2 cm), leaf 1 (scale bars = 5 cm), a cross-sectional shape of leaves (long rectangular shape for S. spontaneum AP85-441 and elliptical shape for S. spontaneum Np-X, scale bars = 100 μm) and anatomical structure of leaves (scale = 20 μm). BS, bundle sheath cell; M, mesophyll cell. The experiments were repeated independently at least three times with similar results. b, The genomic features of S. spontaneum Np-X.
The tracks indicate (from outermost to innermost) 40 pseudochromosomes of S. spontaneum Np-X in Mb (A), GC content (B), gene density (C), distribution of genes putatively encoding nucleotide-binding site (NBS) proteins (D), TE density (E), SNP density (F), InDel density (G), leaf transcriptome (H), stem transcriptome (I), π values (J) and Tajima's D values (K); links in the inner circle indicate matching synteny blocks between ChrXA and ChrXB (red), ChrXA and ChrXC (green) and ChrXA and ChrXD (blue), with X representing chromosome numbers 1 through 10 (L). c, FISH mapping of chromosome-specific oligo probes in the same metaphase cell of S. spontaneum Np-X (2n = 4x = 40). The experiments were repeated independently at least three times with similar results. From left to right in the upper panel, S. spontaneum Np-X chromosome-specific oligo probes for NpC1, NpC3, NpC6, NpC7 and NpC9 are visualized in red, and probes for NpC2, NpC4, NpC5, NpC8 and NpC10 are in green. The lower panel shows karyotypes of S. spontaneum Np-X NpChr1 through NpChr10 used to assess the chromosome specificity of the FISH probes in its genome. Scale bars = 10 μm. d,e, Graphical alignment of S. spontaneum Np-X chromosomes with Sorghum (d) and S. spontaneum AP85-441 chromosomes (e). A set of four homologous Np-X chromosomes aligned to a single sorghum chromosome (d) and a set of four homologous AP85-441 chromosomes (e). The labels (0, 20k, 40k, …) on the x and y axes indicate the gene rank along the length of the chromosomes.
Recent polyploidization in Saccharum. Based on K s estimates, Saccharum diverged from sorghum and Miscanthus ~6.4 Mya (K s = 0.08) and ~4.0 Mya (K s = 0.05), respectively. Within Saccharum, S. spontaneum split from S. officinarum about 1.6 Mya (K s = 0.02), and the two S. spontaneum accessions, Np-X and AP85-441, separated ~0.8 Mya (K s = 0.01) (Fig. 4a), demonstrating that the chromosome reduction in AP85-441 occurred very recently. Considering that x = 10 appears to be the ancestral chromosome number in Saccharinae and Sorghinae, the autooctoploids of Saccharum, S. spontaneum SES208 (2n = 8x = 64) and S. officinarum (2n = 8x = 80), should have experienced two rounds of whole-genome duplication (WGD), whereas the autotetraploid S. spontaneum Np-X experienced only one round of WGD after its divergence from sorghum (Fig. 4) diverged at an estimated K s value of 0.01 in both Np-X and S. officinarum, and at a K s value of 0.00 in AP85-441. Np-X is supposed to have more highly diverged sets of homologous chromosomes than the halved homologous chromosome sets of AP85-441, which is a haploid (1n = 4x = 32) of a natural octoploid SES208. The timing of the WGD leading to the autopolyploidization of natural S. spontaneum (Np-X) is similar to that of S. officinarum, although these WGDs were inferred to be independent events (Fig. 4a).
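The K s values and dates quoted above are related by the standard molecular-clock formula T = K s / (2r). The sketch below is only an illustration of that arithmetic: the substitution rate r = 6.25 × 10^-9 per synonymous site per year is not stated in this section and is back-calculated here from the reported K s and Mya pairs, so it should be read as an assumption. The function name is hypothetical.

```python
# Assumed rate, back-calculated from the reported pairs (e.g. Ks = 0.08 -> 6.4 Mya);
# it is not given explicitly in this section.
ASSUMED_RATE = 6.25e-9  # synonymous substitutions per site per year


def divergence_time_mya(ks: float, rate: float = ASSUMED_RATE) -> float:
    """Return divergence time in million years using T = Ks / (2 * rate)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate) / 1e6


# Ks values as reported in the text.
reported_ks = {
    "Saccharum vs sorghum": 0.08,
    "Saccharum vs Miscanthus": 0.05,
    "S. spontaneum vs S. officinarum": 0.02,
    "Np-X vs AP85-441": 0.01,
}

for label, ks in reported_ks.items():
    print(f"{label}: Ks = {ks:.2f}, T = {divergence_time_mya(ks):.1f} Mya")
```

With this assumed rate, each reported K s value reproduces the date given in the text, which suggests that a single clock rate was applied throughout.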
In the four genomes 4-6 (J. Zhang et al., submitted), LTRs appear to have undergone continuing and recent amplification bursts ranging from 0 Mya to 2 Mya (Fig. 4b). Ten distinct insertion peaks with identities ranging from 65% to 98% were detected in the four genomes (Fig. 4c,d). A Gaussian probability density function (GPDF) analysis estimated that the earliest TE insertion events (P1 and P3) occurred ~2.
Genes related to the key characteristics of Saccharum. Given that Np-X exhibits the ancestral chromosome forms of S. spontaneum, comparative analysis of the genes in the three Saccharum and sorghum genomes may offer clues to the evolution of key agronomic characteristics of Saccharum. We therefore analyzed the core gene families related to sugar accumulation, photosynthesis and leaf width in Saccharum (Supplementary Note and Supplementary Data 3).
C 4 photosynthesis pathway. Compared with sorghum, gene expansions occurred in the NAD-ME gene family in both Np-X and AP85-441, but not in S. officinarum (Fig. 5a, Supplementary Fig. 13
Fig. 4 caption (start missing): The horizontal and vertical axes represent the insertion time of intact LTR retrotransposons and the density of intact LTR retrotransposons, respectively. c, Classification of intact LTR retrotransposons in the S. spontaneum Np-X genome. LTR families with more than 100 copies are shown. d, Sequence identity distribution of TE hits represented in a swarm plot for representative species. The most recent and longest LTR/Gypsy sequence among LTR families was chosen as the representative sequence for detecting additional TE hits in the genomes. We identified a total of 85,022 dots (TE hits) in S. spontaneum AP85-441, 88,087 in S. spontaneum Np-X, 53,927 in S. officinarum, 28,135 in Miscanthus and 22,849 in sorghum. The TE hits in areas inside the boxes (P1-P10) represent ten distinct LTR/Gypsy burst events in different genomes. e, Number of TE hits with the representative intact LTR retrotransposons as the query sequence and their associated identity values. The calculated burst time based on the GPDF fitting of each peak is indicated at the arrow. The ten peaks, P1-P10, defined in d are highlighted as shaded columns. The first red triangle represents the divergence between S. spontaneum and S. officinarum ~1.6 Mya, and the second red triangle represents the divergence between S. spontaneum Np-X and S. spontaneum AP85-441 ~0.8 Mya. f, A schematic of the evolution of Saccharum. The stars represent the WGDs, and the circles represent chromosome rearrangement events. The dates shown for the WGDs are inexact and are for illustrative purposes only, as we lack the definitive resolution from the K s results alone.
Gene-family copy numbers (six columns; column labels and first row label missing):
(missing)   2    1    1    1    1    1
SPP1B       1    1    1    1    1    1
SPP2        0    0    1    1    1    1
HK          10   10   7    10   7    7
PPG         5    12   5    6    5    6
PFK         10   14   11   12   11   14
PFP         5    8    6    6    6    6
UGPases     6    16   5    5    7    4
GPI         3    5    2    2    6    3
USPase      1    3    1    1    1    1
AGPase      1    1    1    (row truncated)
(Fig. 5b). Therefore, it seems that the types of component genes involved in the C 4 photosynthesis pathway in Saccharum and the regulation of their respective expression might have converged.
Narrow-leaf genes. S. officinarum has much larger leaves than S. spontaneum, and among S. spontaneum varieties, Np-X has much smaller leaves than AP85-441 (Fig. 1a). About six narrow-leaf (NAL) genes controlling leaf width have been reported in rice 23-28 , among which NAL1 has the strongest effect on leaf width. We identified 13 NAL genes in the three Saccharum genomes (Fig. 5a,c and Extended Data Fig. 9a). NAL1 and NAL10 transcripts are expressed at much lower abundance in Np-X than in either AP85-441 or LA-Purple, and both exhibit lower transcript expression in leaves than in stems (Fig. 5c and Extended Data Fig. 9b), suggesting NAL1 and NAL10 as candidate genes affecting leaf width.
Origin and independent polyploidization of S. spontaneum. We generated a total of 4,682 Gb of resequencing reads for 102 S. spontaneum accessions and 14 related species for population genetics analysis using the reference S. spontaneum Np-X genome (Fig. 6a and Supplementary Table 21). A total of 3,345,380 high-confidence variants, including 3,140,400 single nucleotide polymorphisms (SNPs) and 204,980 insertion-deletion polymorphisms (InDels), were identified in these data. Using the 14 related species as the outgroup (one Sorghum, one Miscanthus, seven S. officinarum and five S. robustum), principal component analysis (PCA) of these SNPs revealed substantial genetic diversity among S. spontaneum groups (Fig. 6b). PC1 (35.81%) was sufficient to separate the S. spontaneum accessions from the outgroup accessions, and PC1 together with PC3 clearly divided the S. spontaneum accessions into four groups, which was consistent with our maximum likelihood-based phylogenetic analysis (Fig. 6a,c). These four groups displayed continuous geographic distribution from the Indian subcontinent to eastern and southern Asia (Supplementary Note). Further, we estimated admixture proportions and individual ancestry based on the SNP dataset (Fig. 6d and Extended Data Fig. 10). In the admixture plot at K = 5 (Fig. 6d), each of the five groups exhibited distinct, relatively monophyletic ancestry that matched our maximum likelihood phylogenetic analysis and further supported a low level of genetic exchange among the four S. spontaneum groups due to their geographic isolation. Only weak gene flow was detected between group I and group II (P value = 2.2 × 10^−308; F = 0.397) (Fig. 6e). Further, population structure analysis showed that 12 accessions in group I had some group II ancestry and that 11 accessions in group II had some group I ancestry (Fig. 6d), supporting the hypothesis of limited gene flow between the two groups. All of these findings indicate that the four S. spontaneum groups evolved relatively independently after they originated on the Indian subcontinent (group I) and spread step by step to the regions where group II, group III and group IV are now distributed. These results are also supported by linkage disequilibrium (LD), nucleotide diversity (π) and genetic differentiation (F ST ) values among these four groups (Extended Data Fig. 10 and Supplementary Note).

Evolution of basic chromosome numbers in S. spontaneum.
With the availability of the S. spontaneum Np-X genome (x = 10), basic chromosome numbers can be determined based on the coverage of mapping reads on the restructured chromosomes (NpChr5 and NpChr8). Accessions with x = 9 typically show signs of restructuring around NpChr5 but not NpChr8 (Fig. 7a).
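The decision logic described above can be sketched in a few lines. This is a hedged illustration, not the authors' pipeline: it assumes that "signs of restructuring" appear as a drop in read depth over the centromeric breakpoint region of NpChr5 or NpChr8 relative to the chromosome-wide median, and the 0.25 threshold, the binned depth input and all function names are hypothetical.

```python
from statistics import median


def depth_ratio(depths, start, end):
    """Mean depth of bins [start, end) relative to the chromosome-wide median."""
    region = depths[start:end]
    baseline = median(depths)
    if not region or baseline == 0:
        raise ValueError("empty region or zero baseline depth")
    return (sum(region) / len(region)) / baseline


def is_restructured(depths, start, end, threshold=0.25):
    """Hypothetical rule: call restructuring when depth drops below the threshold."""
    return depth_ratio(depths, start, end) < threshold


def basic_chromosome_number(chr5_restructured, chr8_restructured):
    """Map the two restructuring calls to a basic chromosome number."""
    if chr5_restructured and chr8_restructured:
        return "x = 8"
    if chr5_restructured:
        return "x = 9"   # restructuring around NpChr5 but not NpChr8
    if not chr8_restructured:
        return "x = 10"  # both chromosomes intact, as in Np-X
    return "undetermined"  # NpChr8-only pattern is not described in the text


# Toy binned depths: 100 bins, with a depth drop over bins 45-54.
intact = [40] * 100
dropped = [40] * 45 + [2] * 10 + [40] * 45

call = basic_chromosome_number(
    is_restructured(dropped, 45, 55),  # NpChr5
    is_restructured(intact, 45, 55),   # NpChr8
)
print(call)
```

In practice the breakpoint coordinates and the threshold would need to be calibrated against accessions of known karyotype such as Np-X and AP85-441.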
A total of 4, 7 and 91 of the S. spontaneum accessions have x = 10, x = 9 and x = 8, respectively (Fig. 7b). It is noteworthy that the accessions with x = 8 were distributed across all four groups, whereas the accessions with x = 10 and x = 9 formed a clade that was nested within group I and were mainly located in Pakistan, northern India and Tibet (Fig. 6c). As accessions with x = 9 and x = 10 are found only in group I, whereas those with x = 8 appear in all four groups of S. spontaneum accessions, we assumed that these fluid ploidy levels have evolved independently from ancestral progenitors with x = 8 in group II, group III and group IV. In addition, gene flow could not be detected between any pairs of these three groups of accessions (Fig. 6e). The population of accessions with x = 8 shows much lower LD decays (mean, 0.043) than those with x = 9 and x = 10 (means of 0.127 and 0.256, respectively) (Fig. 7c), supporting the notion that they had not undergone artificial selection (Supplementary Note).
S. spontaneum experienced a prominent expansion in effective population size (N e ) (N e ~ 500,000) around 120-140 thousand years ago (Kya) and a subsequent N e contraction (N e ~ 10,000-60,000) around 8-14 Kya (Fig. 7d). Interestingly, the time of the N e expansion corresponded to that of the marine isotope stage (MIS) 5e interglacial period, the warmest interval of a warming phase during which the earth emerged from an extreme glacial phase 29-33 . The time of the N e contraction corresponded to the Younger Dryas cold event that occurred ~11-12 Kya, during which the global climate changed dramatically 34-36 . The S. spontaneum population of accessions with x = 10 diverged demographically from the populations with x = 9 and x = 8 with a much smaller N e in recent history.

Discussion
Studies of recent genomic polyploidization are rare because genomic analysis of autopolyploids is notoriously challenging. The autopolyploid Saccharum genomes can bridge ancient and recent polyploidization events, as the Saccharum genomes experienced a recent WGD less than 0.80 Mya and retained a set of PdCPs that originated from a much older WGD affecting all cereal species that occurred ~80 Mya. K s estimates for the PdCPs (NpChr5 and NpChr8) in S. spontaneum Np-X indicate relatively higher levels of nucleotide diversity between alleles (Fig. 3b). Interestingly, in AP85-441, this pair of chromosomes coincides with the site of the chromosomal fission and inversions. The rearrangement between this paleo-chromosome pair might have largely suppressed the illegitimate recombination between the duplicated genes, which led to the decrease of ongoing gene conversions, and resulted in fewer converted gene pairs. In contrast, Np-X retained the intact paleo-chromosome pair (Chr5 and Chr8) and has thus kept some recurring recombination between this paleo-chromosome pair. Functional redundancy probably resulted in genes on one PdCP, NpChr5, displaying the lowest transcript abundances among genes on the ten S. spontaneum Np-X chromosome groups (Fig. 3a). Ancestral NpChr5 split into two major segments and experienced translocations in the ancestors of NpChr6 and NpChr7 in all of the examined accessions with x = 9 (Fig. 1e and Fig. 2). Therefore, we hypothesize that the ancestral NpChr5 split was the first step in the chromosome number reduction followed by rearrangement of ancestral NpChr8 during the evolution of the x = 8 form. But we cannot exclude the possibility that the ancestral NpChr8 split was the initial step for the chromosome reduction, as we might not have identified all of the accessions with x = 9. We might answer the questions raised in ref. 17 and consider the split of NpChr5 in the x = 9 form as a stepping stone in the process of chromosome number reduction rather than assuming that the x = 9 form originated from crosses between the x = 8 and x = 10 forms. However, we can still hypothesize that S. spontaneum evolved after recent polyploidization followed by chromosome number reduction in diploids (Fig. 7e).
The phylogeny and current geographical distribution indicate that S. spontaneum originated from northern India, a biodiversity hotspot near the Himalayas with a climate that has been influenced by monsoons in both eastern and southern Asia 37 , and then radiated along the Middle East, East Asia and Southeast Asia mainly with founder species of x = 8. S. spontaneum with x = 8 is more adaptable to the environment, which facilitates diversification and radiation. However, the x = 10 forms of Saccharum, including S. spontaneum, S. robustum and S. officinarum, have very limited geographical distributions. This phenomenon might be explained by the recent WGD having generated only genetic redundancy rather than genetic variation, whereas both the recent genome restructuring and genes involved in selective sweeps for functions related to polysaccharide metabolism and stress tolerance have contributed more substantially to the adaptive evolution of S. spontaneum.
An ancestral S. spontaneum population with x = 10 and a small N e diverged during a population contraction 8 Kya and might not be widely adapted to the environment. Around the same time, other S. spontaneum populations with x = 8 and x = 9 also underwent population contractions. Thus, the forms of S. spontaneum with x = 8 and x = 9 likely evolved from the form with x = 10. The LD values for the different subpopulations (x = 8, x = 9 and x = 10) dropped quickly but tended to decay at different rates (Fig. 7c). This may be due to the different N e values for these subpopulations. The 'effective', recombining population size for x = 8 is probably the largest (consistent with the larger diversity spanning multiple subpopulations), followed by x = 9 and x = 10. The genome restructuring that occurred within S. spontaneum has shown interesting functional and adaptive implications for this species on the population scale.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-022-01084-1.

Methods
Sampling and sequencing. Fresh, young leaves were collected from individual S. spontaneum Np-X plants cultivated in a greenhouse kept at 25-30 °C with 16 h of light per day. Genomic DNA was isolated from these young leaves using a cetyltrimethylammonium bromide (CTAB) method 38 . The extracted DNA was then packaged with dry ice and sent to Biomarker Technologies Corporation for the construction of CCS libraries and Illumina short-read libraries that were subsequently sequenced on the PacBio Sequel and Illumina Hiseq platforms, respectively. A total of ~52 Gb of PacBio HiFi reads and ~417 Gb of Illumina reads were generated for de novo assembly of the S. spontaneum Np-X genome.
Total RNA was extracted from mature stems and leaves of S. spontaneum Np-X using TRIzol (Invitrogen). These two RNA samples were sent to Novogene for the construction of transcriptome libraries and were sequenced on an Illumina platform.
Hi-C library construction and sequencing. Fresh, tender leaves were collected from S. spontaneum Np-X plants and used to prepare chromatin crosslinked to DNA and fixed with formaldehyde 5 , and then we digested the crosslinked DNA with HindIII. The produced sticky ends were biotinylated and proximity-ligated to form chimeric junctions. The processed DNA was further enriched and physically sheared into fragments of 300-500 base pairs (bp). After that, all of the prepared DNA fragments were processed into paired-end sequencing libraries. Finally, a total of ~290 Gb (~105×) 150-bp paired-end Hi-C reads were produced from the Illumina platform and further used for genome scaffolding (Supplementary Table 3).
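The ~105× figure quoted for the Hi-C data is a simple ratio of sequenced bases to genome size. The short sketch below reproduces it; the choice of the 2.76-Gb assembly size as the denominator is an assumption made here because it matches the reported depth, and the function name is hypothetical.

```python
def sequencing_depth(total_bases_gb: float, genome_size_gb: float) -> float:
    """Return fold coverage as total sequenced bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


# ~290 Gb of Hi-C reads over the 2.76-Gb contig-level assembly (assumed denominator).
hic_depth = sequencing_depth(290, 2.76)
print(f"Hi-C depth: ~{hic_depth:.0f}x")
```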
Genome survey for genome size estimation. About 417 Gb of Illumina short reads from S. spontaneum Np-X were filtered using Trimmomatic 39 (v0.36) software with default parameters. The clean reads were used to create k-mer sets using Jellyfish 40 (v2.2.6). The genome size was estimated using k-mer (k = 21) frequency-based methods with k-mer number/k-mer depth. The genome size of S. spontaneum Np-X was estimated as 2,832.7 Mb with a heterozygosity rate of 1.14% based on a 21-k-mer distribution ( Supplementary Fig. 2).
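The "k-mer number/k-mer depth" estimate can be illustrated with a small script. This is a minimal sketch under stated assumptions, not the authors' survey code: it expects a two-column histogram of the kind written by `jellyfish histo` (multiplicity, then number of distinct k-mers), discards low-multiplicity k-mers as likely sequencing errors using a hypothetical cutoff, and takes the tallest remaining peak as the k-mer depth. For a heterozygous autotetraploid, choosing the appropriate peak is a judgment call that this sketch does not address, so `peak_depth` can be supplied explicitly.

```python
def parse_histo(text):
    """Parse 'multiplicity count' lines into a list of (multiplicity, count) tuples."""
    histo = []
    for line in text.strip().splitlines():
        depth, count = line.split()
        histo.append((int(depth), int(count)))
    return histo


def estimate_genome_size(histo, min_depth=5, peak_depth=None):
    """Estimate genome size as total k-mer occurrences divided by k-mer depth.

    min_depth is a hypothetical cutoff below which k-mers are treated as errors.
    If peak_depth is None, the multiplicity with the most distinct k-mers is used.
    """
    solid = [(d, c) for d, c in histo if d >= min_depth]
    if not solid:
        raise ValueError("no k-mers above the error cutoff")
    if peak_depth is None:
        peak_depth = max(solid, key=lambda dc: dc[1])[0]
    total_kmers = sum(d * c for d, c in solid)
    return total_kmers / peak_depth


# Toy histogram: an error spike at multiplicity 1 and a main peak at 20.
toy = parse_histo("""
1 1000
19 50
20 100
21 50
""")
toy_size = estimate_genome_size(toy)
print(toy_size)
```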
Genome assembly and scaffolding. About 52 Gb of CCS clean reads were used to assemble the S. spontaneum Np-X genome with Canu (v1.9) 8 using the optimized parameters 'batOptions = -dg 3 -db 3 -dr 1 -ca 500 -cp 50', HiCanu (v2.0) 41 with the parameters 'batOptions = -eg 0.0 -sb 0.001 -dg 0 -db 3 -dr 0 -ca 2000 -cp 200', and hifiasm (v0.11-r302) 42 with the parameters '-l0 -u'. This resulted in an assembly of 2.76 Gb and an N50 length of 405 Kb by Canu, an assembly of 2.97 Gb and an N50 length of 3,541 Kb by HiCanu, and an assembly of 2.89 Gb and an N50 length of 1,962 Kb by hifiasm. However, many chimeric contigs were identified in the latter two assemblies, so they could not be scaffolded with Hi-C using ALLHiC 10 . We attempted to resolve the chimeras using Hi-C interaction signals, but scaffolding still failed after chimera correction. After comprehensive comparison of the results obtained by the different assembly methods, the Canu assembly was chosen to serve as the reference genome for subsequent analysis.
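The assemblies above are compared by N50 length. As a reminder of how that statistic is defined, the following sketch computes N50 from a list of contig lengths: the length of the contig at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total assembly size. The function name and the toy lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First contig at which the cumulative length covers half the assembly.
        if running * 2 >= total:
            return length


toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # total length 32
print(n50(toy_contigs))
```

Because N50 rewards long contigs regardless of correctness, a higher value does not imply a better assembly, which is consistent with the choice of the Canu assembly despite its lower N50.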
A total of 290 Gb of Hi-C reads were mapped to the contig-level assembly using Burrows-Wheeler Aligner (BWA, v0.7.15) 43 with default parameters. The mapping results were pruned using an in-house script that generated a BAM file. We then corrected the contig-level assembly based on the chromatin contact signal. All of the corrected contigs were remapped with Hi-C reads and reordered and scaffolded using the ALLHiC 5,10 pipeline, and the resulting assembly was manually corrected with the assistance of 3D-DNA 12 according to the visualization of chromatin contact patterns. Finally, we generated a pseudochromosome-level genome assembly comprising 40 chromosomes with a total length of 2,760 Mb (Np-X.assembly.v5) (Table 1). We mapped ~60.7 Gb of Illumina short reads onto the assembled chromosome-level genome using Bowtie2 (v2.3.4.3) 44 to estimate the quality of the assembly.

Switch error evaluation. High-molecular-weight genomic DNA was prepared by the CTAB method, followed by purification using the QIAGEN Genomic kit (13343). Ultralong reads were generated using the Oxford Nanopore Technologies (ONT) high-throughput sequencing platform. Base calling was performed with Guppy (v5.0) using the high-accuracy model. The longest 10,000 ultralong ONT reads, with an N50 length of 193 Kb and a mean per-base accuracy rate (QV) of 92-95%, were selected to assess the haplotype error rate. These high-quality ONT reads have a cumulative length of 1.98 Gb (Supplementary Table 10). We sliced each high-quality read into 10-Kb windows and used each window to search for the best alignment across the assembled genome. If no switch error occurred within the read, all windows should align to the same or consecutive contigs on the same haplotype; otherwise, the read could indicate a potential haplotype switch error.
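The window-based test can be sketched in Python as follows. This is a simplified illustration rather than the script used in this study: it assumes that the best alignment of each window has already been reduced to a haplotype label (for example, a hypothetical 'Chr1A'), drops the trailing partial window, and does not check contig adjacency.

```python
WINDOW = 10_000  # 10-Kb windows, as described in the text


def slice_read(read_length, window=WINDOW):
    """Return (start, end) coordinates of full windows along a read.

    The trailing partial window is dropped (an assumption).
    """
    return [(s, s + window) for s in range(0, read_length - window + 1, window)]


def has_potential_switch(window_haplotypes):
    """Flag a read whose windows best-align to more than one haplotype.

    window_haplotypes: one haplotype label per window, or None when the
    window has no alignment (unaligned windows are ignored).
    """
    observed = {h for h in window_haplotypes if h is not None}
    return len(observed) > 1


print(slice_read(35_000))
print(has_potential_switch(["Chr1A", "Chr1A", None]))
print(has_potential_switch(["Chr1A", "Chr1B", "Chr1A"]))
```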
Gene annotation. Transcriptomic data were obtained from RNA isolated from stem and leaf tissues of S. spontaneum Np-X. Trimmomatic was used to filter the RNA sequencing (RNA-seq) data, and HISAT2 (v2.1.0) 45 , which blocks duplicates, was then used to align reads to the reference genome sequence. After alignment, different coverage thresholds were set according to the sequencing depth of each aligned region to obtain reliable intron and optimal transcript information. We then used TransDecoder (v5.5.0) (https://github.com/TransDecoder/TransDecoder/wiki) to predict the open reading frames (ORFs) of the optimal transcripts and define gene models. The optimal gene models were screened and used to train AUGUSTUS (v3.3.2) 46 . We chose protein sequences from the related species maize, sorghum, rice and sugarcane and used them as input for Genewise (wise2-4-1) (https://github.com/brewsci/homebrew-bio/blob/master/Formula/genewise.rb) for gene prediction in the S. spontaneum Np-X genome. Next, we collected exon information from homologous proteins and transcripts, and intron information from read alignments. AUGUSTUS was used to combine this intron and exon information for gene prediction. The results of the three methods were integrated and then screened against the Pfam database 47 to obtain the final gene predictions. Finally, we used a Perl script to analyze the final assembled genome for eukaryotic genes and obtained a total of 123,128 high-confidence gene models.
Gene function annotation. Gene functions in the S. spontaneum Np-X genome were predicted from the best matches of alignments against the eggNOG 48 (http://eggnog5.embl.de/#/app/downloads), Nr 49 (ftp://ftp.ncbi.nih.gov/blast/db) and SWISS-PROT 50 (https://www.uniprot.org/downloads) databases using BLASTP (v2.10.0+) (E value = 1 × 10⁻⁵). The eggNOG, Nr and SWISS-PROT databases were downloaded to our local server. Unigenes were then used to query the National Center for Biotechnology Information (NCBI) nonredundant nucleotide sequence (Nt) database using BLASTN (v2.10.0+) with a cutoff E value of 1 × 10⁻⁵. We used a Perl script provided by the European Bioinformatics Institute (EBI, https://www.ebi.ac.uk/) for InterPro (http://www.ebi.ac.uk/interpro) annotation. The script sends the sequences to the official InterProScan 51 web server for InterPro annotation and returns the results to our local server.
To determine whether the proteins encoded by these genes might participate in any functional pathways, all gene models were aligned (E value = 1 × 10⁻⁵) to the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO) database. The putative functions of the gene models were predicted and classified using the Clusters of Orthologous Groups (COG) database and the Eukaryotic Orthologous Groups (KOG) database. The online KEGG Automatic Annotation Server was used to assign assembled sequences to KEGG pathways.
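Best-match selection under an E-value cutoff can be sketched in Python as follows. The sketch assumes BLAST tabular output (-outfmt 6, with the E value in column 11 and the bit score in column 12). Ranking hits by bit score is an assumption, as the text does not state how the best match was chosen, and the function name and example records are hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Keep the top-scoring hit per query from BLAST -outfmt 6 lines.

    Returns {query: (subject, evalue, bitscore)} for hits with
    evalue <= max_evalue; ties keep the first hit encountered.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical records: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
records = [
    "g1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "g1\tP2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t250",
    "g2\tP3\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30",
]
print(best_hits(records))
```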
Design of ten oligo libraries and Oligo-FISH. The S. spontaneum Np-X genome (chromosome set A) was used for oligo design. RepeatMasker (v4.0.8) (http://www.repeatmasker.org/) was used to filter potential repetitive sequences with the Saccharum, sorghum and maize repeat libraries (https://www.girinst.org/server/RepBase/index.php, RepBaseRepeatMaskerEdition-20181026.tar.gz). The remaining sequences were then used to design the oligo probe sets using Chorus2 (v2.0) 53 . The oligo libraries were synthesized by CustomArray (GenScript) and labeled with FAM-green (direct) and Cy3-red (direct) according to ref. 9 . FISH was performed as described in ref. 9 with minor adjustments: 10 μl of hybridization mixture containing 1.5 μl of FAM (green) probes and 1.5 μl of Cy3 (red) probes was placed onto the selected slide. The slides were then heated for 3 min at 55 °C as a prehybridization step, followed by overnight incubation at 37 °C. Coverslips were then gently removed, and the slides were washed sequentially for 3 min in 2× SSC, 10 min in 2× SSC and 3 min in 1× PBS. Finally, the slides were dried and counterstained with 4′,6-diamidino-2-phenylindole (DAPI). For multiple rounds of FISH, the coverslips were removed and the probes were washed off the slides in 4× SSC (including Tween-20) three times (5 min each) and in 2× SSC three times (3 min each). The slides were then dehydrated in 70% and 100% ethanol for 3 min and denatured again in 70% formamide at 70 °C for 1 min before the next round of hybridization with the remaining painting probes.
Preparation of chromosome spreads and FISH. Chromosomes were prepared according to a previously described method 54 . In brief, root tips were collected from potted plants and pretreated in 2 mM 8-hydroxyquinoline at 25 °C for 2 h. The root tips were fixed in 3:1 (v/v) ethanol:glacial acetic acid at 25 °C overnight. After washing in distilled water for 5 min, the root tips were incubated in an enzyme mixture consisting of 2% cellulase Onozuka R-10 (Yakult Pharmaceutical Industry Co., Ltd.) and 1% pectolyase from Aspergillus niger (Sigma-Aldrich). Afterward, the root tips were washed again in distilled water and then fixed in an ethanol:acetic acid (3:1) fixative solution. The root tips were broken by using a pipette tip to separate cells from one another. The cell suspension was dropped onto glass slides. Finally, the slide was air-dried and then kept at −20 °C until use.
The FISH experiment was performed as previously described 54 with minor modifications. A 10-μl hybridization solution containing 1 μl of biotin-labeled probes and 1 μl of digoxigenin-labeled probes was applied to the target slide. The slides were then incubated overnight at 37 °C. Coverslips were gently removed, and the slides were washed sequentially for 3 min in 2× SSC, 10 min in 2× SSC and 3 min in 1× PBS. Hybridization signals were detected with Alexa Fluor 488 streptavidin (Thermo Fisher Scientific) for biotin-labeled probes and with rhodamine-conjugated anti-digoxigenin (Roche Diagnostics) for digoxigenin-labeled probes. The chromosomes were counterstained with DAPI (2 mg ml⁻¹) and mounted in VECTASHIELD (Vector Laboratories). Metaphase plates with fluorescent signals were observed under an Olympus BX63 fluorescence microscope, photographed with an Olympus DP80 CCD camera and visualized using cellSens Dimension (v1.9) software (Olympus).
Hi-C read mapping and normalization. After quality filtering using Trimmomatic, clean Hi-C data were mapped to the S. spontaneum Np-X genome using Bowtie2 with default parameters. Singleton reads, multimapped reads and duplicated read pairs were removed by the quality control module of HiC-Pro (v2.11.1) 55 . Thus, only pairs for which both reads could be uniquely aligned were retained to identify valid interactions. Raw contact matrices were constructed with a bin size of 500 Kb and normalized using the iterative correction and eigenvector decomposition (ICE) method implemented in HiC-Pro.
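The idea behind iterative correction can be illustrated with a minimal pure-Python sketch: the contact matrix is repeatedly divided by the product of row and column coverage factors until all bins have equal total coverage. This is not the HiC-Pro implementation, which additionally filters low-coverage bins and operates on sparse matrices; the function name, iteration limit, tolerance and toy matrix are assumptions.

```python
def ice_normalize(matrix, max_iter=100, tol=1e-8):
    """Balance a symmetric contact matrix by iterative correction.

    Returns (normalized_matrix, bias), where bias[i] is the accumulated
    per-bin correction factor. Bins with zero coverage are left untouched.
    """
    n = len(matrix)
    m = [[float(x) for x in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        covered = [s for s in sums if s > 0]
        if not covered:
            break
        mean = sum(covered) / len(covered)
        # Relative coverage of each bin; 1.0 means already balanced.
        scale = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    return m, bias


# Toy example: uniform contacts distorted by per-bin biases 1, 2 and 4.
b = [1, 2, 4]
observed = [[bi * bj for bj in b] for bi in b]
normalized, bias = ice_normalize(observed)
print([round(sum(row), 6) for row in normalized])
```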

Identification of compartments A and B. PCA implemented in the R package HiTC (v1.28.0) 56 was applied to identify compartments A and B on the 500-Kb corrected matrix of each chromosome in S. spontaneum Np-X, S. spontaneum AP85-441, sorghum and rice. For each chromosome, genomic bins were assigned to compartment A or B according to the sign of the first eigenvector (PC1). Because the sign of PC1 is arbitrary, the direction of PC1 associated with a greater number of genes and higher expression levels was taken as compartment A (positive values), whereas bins with PC1 in the opposite direction (negative values) were assigned to compartment B. To identify regions that had switched compartment status during genome polyploidization, we considered only regions that showed changes of PC1 values from positive to negative or vice versa in both biological replicates. Smaller, local-scale chromatin structures in the S. spontaneum Np-X genome, such as topologically associating domains (TADs), were not analyzed, as the depth of our Hi-C data precludes a high-resolution analysis. Therefore, the relationship between compartments and local-scale chromatin structures in polyploids cannot yet be reliably described.
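The orientation and assignment step can be sketched in Python as follows. The sketch assumes that PC1 values have already been computed per bin (for example, by HiTC) and that a gene count is available for each bin; the function names and toy values are hypothetical, and expression levels and the replicate-consistency requirement are omitted for brevity.

```python
def _mean(values):
    return sum(values) / len(values) if values else 0.0


def assign_compartments(pc1, gene_counts):
    """Label bins 'A' or 'B' after orienting PC1 by gene density.

    The PC1 direction with the higher mean gene count is treated as
    positive (compartment A); bins with PC1 == 0 are labeled 'NA'.
    """
    positive = [g for v, g in zip(pc1, gene_counts) if v > 0]
    negative = [g for v, g in zip(pc1, gene_counts) if v < 0]
    sign = 1 if _mean(positive) >= _mean(negative) else -1
    labels = []
    for value in pc1:
        oriented = sign * value
        labels.append("A" if oriented > 0 else "B" if oriented < 0 else "NA")
    return labels


def switched_bins(labels_ref, labels_query):
    """Return (bin_index, ref_label, query_label) for A/B switches."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(labels_ref, labels_query))
        if {a, b} == {"A", "B"}
    ]


# Toy values: the gene-rich bins carry negative PC1, so the sign is flipped.
print(assign_compartments([0.5, 0.2, -0.3, -0.6], [2, 3, 20, 30]))
print(switched_bins(["A", "B", "A"], ["A", "A", "B"]))
```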
Repeat sequence annotation. Consensus TE sequences were generated using RepeatModeler (open-1.0.8) with a combination of de novo and homology strategies, including two de novo repeat-finding programs, RECON (v1.08) 57 and RepeatScout (v1.0.5) 58 , which we imported into RepeatMasker to identify and cluster repetitive elements. Tandem repeats were identified using the Tandem Repeats Finder (TRF) (v4.07b) package 59 , and unknown TEs were classified using TEclass (v2.13) 60 . Next, the outputs from the above processes were used to identify telomeres and centromeres. We also integrated results from LTR_FINDER (v1.06) 61 and LTRharvest (v1.5.10) 62 and removed false positives from the initial predictions using the LTR_retriever (v2.6) pipeline 63 . These LTRs were also classified as either intact or nonintact LTRs.
LTR burst time estimation. The most recent and longest LTR/Gypsy sequence was selected as the representative sequence for detecting additional TE hits in the genomes. Full-length and truncated LTRs with various lengths and identities were identified across genomes, and then each sequence (l, length) was divided into 30-bp units to determine the number of dots (TE hits) (n = l/30) with the same identity following a previously reported method 64 . All dots were used to generate a box plot according to their identities. Single peaks in the TE identity distribution curves were separated for GPDF fitting and burst time calculation, and the average nucleotide substitution ratio (K) was defined as 2.58 s.d. The formula t = K/r was used to calculate the TE burst time point for single peaks, where r refers to the nucleotide substitution rate for sugarcane species (6.5 × 10⁻⁹).
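The two formulas above, n = l/30 and t = K/r, can be written out as a short Python sketch. The value of K below is purely illustrative and is not a result of this study; the use of integer division for n and the interpretation of r as substitutions per site per year are assumptions.

```python
SUBSTITUTION_RATE = 6.5e-9  # r, as given in the text
UNIT_BP = 30                # length of one unit (dot), as given in the text


def n_dots(length_bp, unit=UNIT_BP):
    """Number of dots (TE hits) for a sequence of length l: n = l/30.

    Integer division is an assumption; partial units are discarded.
    """
    return length_bp // unit


def burst_time(k, r=SUBSTITUTION_RATE):
    """TE burst time t = K/r (in years if r is per site per year)."""
    return k / r


print(n_dots(3000))        # a 3,000-bp hit contributes 100 dots
print(burst_time(0.0065))  # illustrative K only
```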
Population genomics analysis. The high-confidence set of variants was then used to perform population genetic analysis. Values of π and Tajima's D were calculated using VCFtools in 500-Kb windows with 100-Kb steps based on the high-confidence filtered SNPs. PLINK (v1.90) 67 was used to convert the VCF file into PLINK binary format and to perform PCA. The results of PCA were plotted using R 68 . ADMIXTURE (v1.3) 69 was then applied to infer population stratification among the 102 sugarcane accessions, with the predefined number of genetic clusters K ranging from 1 to 10. After the best value of K was determined, the population structure of S. spontaneum was inferred using fastSTRUCTURE (v1.0) 70 for K = 1 through K = 10. A maximum likelihood tree was constructed using RAxML (v8.12) 71 , and the input file was converted using vcf2phylip (v2.0) (https://github.com/edgardomortiz/vcf2phylip). Finally, FigTree (v1.4.4) (https://github.com/rambaut/figtree/releases) was used to visualize the tree.
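The sliding-window scheme (500-Kb windows, 100-Kb steps) and a per-site estimate of π can be sketched in Python as follows. This is a conceptual illustration and not a reimplementation of VCFtools, whose window boundaries and handling of missing data may differ; the function names, the biallelic-site assumption and the toy values are hypothetical.

```python
def sliding_windows(chrom_length, size=500_000, step=100_000):
    """Return 0-based half-open (start, end) windows along a chromosome."""
    windows = []
    start = 0
    while start < chrom_length:
        windows.append((start, min(start + size, chrom_length)))
        start += step
    return windows


def site_pi(alt_count, n_chrom):
    """Expected pairwise differences at one biallelic site."""
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def window_pi(sites, windows):
    """Per-site nucleotide diversity in each window.

    sites: iterable of (position, alt_allele_count, n_sampled_chromosomes).
    """
    sites = list(sites)
    values = []
    for start, end in windows:
        total = sum(site_pi(a, n) for pos, a, n in sites if start <= pos < end)
        values.append(total / (end - start))
    return values


print(len(sliding_windows(700_000)))
print(window_pi([(10, 1, 4)], [(0, 100)]))
```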
Analysis of population demographic history. We inferred a demographic history for S. spontaneum by applying the pairwise sequentially Markovian coalescent (PSMC, v0.6.5-r67) model 24 to the complete diploid genome sequences. This method reconstructs the history of changes in population size over time using the distribution of the time to the most recent common ancestor (TMRCA) between two alleles in an individual. Consensus sequences were obtained using SAMtools. Bases with low sequencing depth (less than one-third of the mean depth) or high depth (more than twice the mean depth) were masked. The analysis was performed using the parameters -N25 -t15 -r5 -p '4+25*2+4+6'. The mutation rate per generation per site was set to 6.5 × 10⁻⁹, with a generation time of g = 1. PSMC modeling was performed using a bootstrapping approach, with sampling performed 100 times to estimate the variance of the simulated results.
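The -p pattern specifies how atomic time intervals are grouped into free population-size parameters: a term 'a*b' denotes a parameters that each span b intervals, and a bare term 'b' denotes one parameter spanning b intervals. A minimal Python sketch of this bookkeeping follows; the function name is hypothetical and the parser is not part of PSMC.

```python
def parse_psmc_pattern(pattern):
    """Return (n_atomic_intervals, n_free_parameters) for a -p pattern."""
    n_intervals = 0
    n_params = 0
    for term in pattern.split("+"):
        if "*" in term:
            # 'a*b': a parameters, each spanning b atomic intervals
            repeats, width = (int(x) for x in term.split("*"))
        else:
            # 'b': a single parameter spanning b atomic intervals
            repeats, width = 1, int(term)
        n_intervals += repeats * width
        n_params += repeats
    return n_intervals, n_params


print(parse_psmc_pattern("4+25*2+4+6"))
```

For the pattern used here, this gives 64 atomic time intervals and 28 free interval parameters.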
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All raw sequencing data and assembled genome sequences for S. spontaneum were deposited in the Sequence Read Archive (under BioProject accession PRJNA721787). Genome assemblies and annotation files of S. officinarum (LA-Purple) are available from NCBI with the same accession number and under BioProject accession PRJNA744175.
Fig. 6 | Validation of restructured chromosomes. (a) FISH signals of centromere satellite repeat probes marked a single centromere on each chromosome in Np-X, whereas in AP85-441 signals indicating dicentric chromosomes were detected on some chromosomes (arrowhead). The experiments were repeated independently at least three times with similar results. Scale bars = 5 μm. (b) Hi-C reads of S. spontaneum AP85-441 (upper) and S. spontaneum Np-X (bottom) were each mapped to the S. spontaneum AP85-441 genome. The chromosomal rearrangement breakpoints are indicated by blue arrows that show matching discontinuities in the contrasting Hi-C contact maps. (c) Hi-C reads of S. spontaneum Np-X (upper) and S. spontaneum AP85-441 (bottom) were each mapped to the S. spontaneum Np-X genome. The chromosomal rearrangement breakpoints are indicated by blue arrows. AP: S. spontaneum AP85-441; Np: S. spontaneum Np-X.
Fig. 8 | Transcript expression of genes within B to A compartment switching in S. spontaneum Np-X and corresponding genes in S. spontaneum AP85-441. Gene expression in leaf and stem is shown in (a), and the transcript expression of genes located in A to B switching, B to A switching or conserved compartments in leaf and stem is shown in (b) and (c), respectively. n indicates the number of genes. The centerline in each box represents the median; the lower and upper hinges represent the 25th and 75th percentiles, respectively. The whiskers represent 1.5× the interquartile range. The dots beyond the whiskers are outliers. P-values were calculated using the two-sided Wilcoxon rank-sum test.