Isolation and sequencing of a single copy of an introgressed chromosome from a complex genome for gene and SNP identification

This manuscript describes the identification, isolation and sequencing of a single chromosome containing high-value resistance genes from a complex polyploid where sequencing the whole genome is too costly. The large complex genomes of many crops constrain the use of new technologies for genome-assisted selection and genetic improvement. One method to simplify a genome is to break it into individual chromosomes by flow cytometry; however, in many crop species most chromosomes cannot be isolated individually. Flow sorting of a single copy of a chromosome has been developed in wheat, and here we demonstrate its use to identify markers of interest in an Erianthus/Saccharum hybrid. Erianthus/Saccharum hybrids are of interest because Erianthus is known to be highly resistant to soil-borne diseases which cause extensive sugarcane yield losses in Australia. Sugarcane (Saccharum) cultivars are autopolyploids with a highly complex genome and over 100 chromosomes. Flow cytometry for sugarcane, as in most crops, does not resolve individual chromosomes to a karyotype peak for sorting. To isolate a single chromosome, we used genomic in situ hybridization (GISH) to identify the flow karyotype region containing the Erianthus chromosomes, flow sorted single chromosomes from this region, PCR screened for the Erianthus chromosomes and sequenced them. One Erianthus chromosome amplified and sequenced well, and from these data we could identify 57 resistance-type genes and SNPs in nearly half of these genes. We developed KASP SNP assays and demonstrated that the identified SNP markers segregated as expected in a small introgression population. The pipeline we developed here to flow sort and sequence single chromosomes could be used in any crop with a large complex genome to rapidly discover and develop markers to important loci.


Introduction
Many grass crop species of major economic significance, for example wheat, barley, oat, rye and sugarcane, have large genomes (Kellogg 1998). This presents challenges in the implementation of new technologies for genome-assisted selection and genetic improvement of crops. One approach that has been successfully used in many plants is to break down the complexity of the genome by using flow cytometric sorting to isolate chromosomes or groups of chromosomes according to their size. For most grass crops, usually only one or a few chromosomes can be identified and separated by monovariate flow sorting, while the others cluster together, forming composite peaks (Giorgi et al. 2013). We have recently developed the use of flow cytometry in sugarcane for chromosome sorting (Metcalfe et al. 2019). Sugarcane cultivars are derived from crosses between the high sugar producing S. officinarum (2n = 10x = 80) and the wild species S. spontaneum (2n = 8x = 40-128) followed by backcrossing to S. officinarum (Roach 1972). These crosses have resulted in the modern hybrids with ploidy levels of 10 or more and large genomes (R570 cultivar, 10,000 Mb and 2n = 112) (D'Hont 2005; D'Hont and Glaszmann 2001; Piperidis et al. 2010a, b). In the flow karyotype of the two S. officinarum accessions and three hybrid cultivars, five main peaks, representing groups of chromosomes of similar DNA content or size, were identified (Metcalfe et al. 2019). Using microsatellite markers to screen chromosomes from each of the peaks, it was determined that at least one homo(eo)log of each of the 10 chromosomes was present in each peak. This indicates that, as in most other grass crops, a single peak does not contain a single chromosome. The composite peaks could represent ancestral genomes from previous whole genome duplications in this autopolyploid genome (Metcalfe et al. 2019). Individual chromosomes therefore cannot be isolated from a single peak by standard flow sorting in sugarcane.
Fluorescence in situ hybridization in suspension (FISHIS) can be used to isolate chromosomes using a fluorescent label. However, this has only been successfully demonstrated in wheat and barley with two microsatellite probes, and it isolates only some chromosomes with > 90% purity (Giorgi et al. 2013; Sánchez-Martín et al. 2016). Attempts to use FISHIS in other species have been unsuccessful (unpublished data and pers. comm. Jaroslav Doležel). As an alternative, isolation of a single copy of a chromosome by flow sorting has been developed in wheat (Cápal et al. 2015). Flow sorting of a single chromosome yields very low quantities of DNA, which need to be amplified before sequencing. This amplification process results in uneven coverage and some missing genes because of the random nature of the amplification (Cápal et al. 2015). However, it has been successfully used in wheat to identify chromosomes carrying a transgene and in pea to identify chromosome translocations (Kreplak et al. 2019). Here we demonstrate how flow sorting of one single copy of a chromosome can also be used for gene identification and SNP discovery.
We chose to isolate an Erianthus chromosome from an Erianthus-Saccharum hybrid because of the potential value to sugarcane breeders. The genetic base of sugarcane cultivars is limited because relatively few S. spontaneum and S. officinarum clones were used in their initial development (Aitken and McNeil 2010; Arceneaux 1965). To incorporate new genes or alleles for resistance into sugarcane cultivars, sugarcane breeders have established introgression breeding programs with sugarcane-related genera of the 'Saccharum complex' (Santchurn et al. 2014; Waclawovsky et al. 2010). Of these, Erianthus arundinaceus is of most interest because some E. arundinaceus clones are highly resistant to root knot and root lesion nematodes and to Pachymetra root rot (Magarey and Croft 1996). Yield losses from nematodes are estimated at 12-15%, costing the Australian sugar industry about AU$82 million per year (Blair and Stirling 2007). Similar yield losses, 10 to 15%, are estimated for Pachymetra root rot, with up to nearly 40% losses in susceptible varieties (Magarey and Bull 2003). E. arundinaceus is a hexaploid and contains 60 chromosomes, and with six copies of each chromosome, whole genome sequencing and assembly are very costly and difficult. Introgression of E. arundinaceus by crossing into a sugarcane background results in genotypes that contain mostly Saccharum chromosomes with the addition of a few E. arundinaceus chromosomes. The first fertile genuine hybrids between S. officinarum and E. arundinaceus were reported at the Hainan Sugarcane Breeding Station of the Guangzhou Sugarcane Industry Research Institute in China in 2002 (Cai et al. 2005; Deng et al. 2002). The initial hybrid between S. officinarum and E. arundinaceus has 40 Saccharum chromosomes and 30 Erianthus derived chromosomes. This has been backcrossed to Saccharum to recover the high-sucrose phenotype, resulting in reduced numbers of Erianthus chromosomes; BC4 plants have 0-3 or 4 Erianthus chromosomes (Piperidis et al. 2010a, b). Recombination does occur between the Erianthus and Saccharum chromosomes, resulting in stable inheritance (unpublished data), so with further crossing, recombination will reduce the linkage block and allow the incorporation of resistance genes into sugarcane.
The process of introgression in Saccharum using conventional breeding procedures is relatively long and risky. Following initial crosses with wild clones, generally two or more backcrosses to elite commercial type parents are made, and then lengthy (up to 10 years) field evaluation and selection programs are conducted before the commercial type progeny are developed (Hotta et al. 2010; Zhang et al. 2014). Without clear identification of specific traits or genes (markers) derived from the exotic germplasm, there is no guide for selection for these traits or genes, and valuable components from the exotic source may easily be lost after several generations of backcrossing. There is also the possibility that undesirable traits from the exotic genome may be inherited, making some progeny agronomically inferior. For these reasons, markers specific to chromosomes carrying resistance genes of interest are desirable and can be used to select progeny that contain the resistance gene from the exotic germplasm.
To demonstrate that sequencing of a single flow-sorted chromosome can be used to identify markers linked to resistance genes, we isolated an Erianthus chromosome from the BC4 Erianthus-Saccharum hybrid, QC12-20,006. The BC3 parent of QC12-20,006 is known to have some resistance to Pachymetra root rot and root knot nematodes (Shamsul Bhuiyan, pers. comm.). To achieve our objective, we first identified the region of the flow karyotype representing the Erianthus chromosomes by genomic in situ hybridization (GISH) to chromosomes flow sorted onto slides. Single chromosomes from this region were flow sorted into plates and amplified using the non-PCR based DNA amplification method, multiple displacement amplification (MDA) (Dean et al. 2002). MDA products were then screened by PCR with an Erianthus specific sequence to isolate Erianthus chromosomes. Single Erianthus chromosomes were sequenced and used to identify genes and SNP markers for use in introgression breeding. The BC3 parent (KQ08-6013) has resistance to Pachymetra root rot and root knot nematode, while the Saccharum parents are not resistant (Bhuiyan et al. 2016; Shamsul Bhuiyan, pers. comm.).

Flow Cytometry and identification of composite peaks with Erianthus chromosomes
QC12-20,006 for flow cytometry was grown in a glasshouse at CSIRO (St Lucia, Queensland) with a natural photoperiod, 12 h 30 °C, 12 h 24 °C and humidity > 55%/70% (day/night relative humidity). Chromosome suspensions were prepared and sorted according to Metcalfe et al. (2019). The protocol was modified from the method developed by Vrána et al. (2016). To identify flow karyotype peaks containing Erianthus chromosomes, 2000-3000 chromosomes from each composite peak were flow sorted onto slides and probed by GISH with whole Erianthus genomic DNA. Peak 4 was not clearly a single peak, so it was split into two collection points, IVa and IVb. DNA was extracted from the Erianthus arundinaceus clone IJ76-346 using a standard CTAB method (CIMMYT Laboratory Protocols 2005), replacing the octanol with isoamyl alcohol. One μg of DNA was used in a 25 μL labelling reaction with the Digoxigenin (DIG)-11-dUTP-Nick Translation Mix (Roche). Hybridization and detection were performed according to Vrána et al. (2016), except that salmon sperm DNA was used instead of calf thymus DNA in the hybridization mix and the signal of probe hybridization was detected using Anti-DIG Rhodamine (Roche) (1 µg/mL).

Chromosome amplification, evaluation and sequencing
Proteinase K (2 µg/mL) in 1 × TE buffer was filter sterilized through a Millex 0.22 µm syringe-driven filter unit (Merck Millipore), aliquoted at 7.5 µL per well into a 96 well plate and then UV-irradiated for 30 s in a Bio-Rad XR+ Gel Documentation System. Chromosomes were flow sorted into the 96 well plate at a rate of 100 events per second. One hundred chromosomes were flow sorted into each well of the first row as a positive control, single chromosomes into rows B to G, and no chromosomes into the last row as an amplification and flow cytometry negative control. Chromosomes were purified and amplified according to Vrána et al. (2016) and Cápal et al. (2015), with the REPLI-g Single Cell Kit (Qiagen) and the protocol 'Whole genome amplification from genomic DNA using the REPLI-g Single Cell Kit with increased sample volumes', using half the volumes described. We tested an 8 h incubation time against 16 h for increased amplification.
The multiple displacement amplification (MDA) products were evaluated by agarose gel electrophoresis, diluted 1:10 and quantified with the Qubit dsDNA Broad Range Assay kit (ThermoFisher). Diluted amplified DNA was PCR screened for sugarcane and Erianthus genomic DNA, and for bacterial contamination. For all PCR reactions, 5 ng of purified DNA was used as a template in a 15 μL PCR reaction with 10 µM of each primer and the MyTaq HS DNA Polymerase (Bioline) kit, with dNTPs, MgCl2 and enhancers included in the buffer provided. The following cycling conditions were used: initial denaturation at 94 °C for 1 min, 35 cycles of (94 °C 30 s, 60 °C 1 min, 72 °C 30 s), followed by a final extension at 72 °C for 5 min.
Three sets of primers were used to identify an Erianthus chromosome (Table S1). The first set of primers are universal bacterial 16S rDNA primers (Wang et al. 2003). The second set of primers were designed against a conserved domain of a highly abundant transposable element in the sugarcane clone R570 (Domingues et al. 2012). These primers were used to verify that Saccharum or Erianthus DNA had been amplified and that there was low or no bacterial contamination. The third set of primers are specific to Erianthus (Besse and McIntyre 1998) and were used to identify the Erianthus or Saccharum/Erianthus hybrid chromosome.
Two MDA products identified as Erianthus DNA were prepared with the Nextera Illumina library kit and sequenced on an Illumina (https://www.illumina.com/) HiSeq 2500 platform at the Australian Genome Research Facility (AGRF).
To check for bacterial contamination, 5000 reads were randomly selected using a custom Perl script and compared with the NCBI nucleotide database using blastn version 2.7.1 (Altschul et al. 1990; Sayers et al. 2011). A custom R script was used to summarize the results.
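The random read selection described above can be illustrated with a short sketch. This is not the custom Perl script used in the study; it is a minimal Python example assuming standard four-line FASTQ records, and the function names `iter_fastq` and `sample_fastq_reads` and the fixed seed are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, ...]  # header, sequence, separator, qualities


def iter_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield four-line FASTQ records from an iterable of text lines."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def sample_fastq_reads(lines: Iterable[str], k: int = 5000, seed: int = 1) -> List[Read]:
    """Reservoir-sample up to k reads uniformly at random in a single pass."""
    rng = random.Random(seed)  # fixed seed (an assumption) makes the sample repeatable
    reservoir: List[Read] = []
    for i, read in enumerate(iter_fastq(lines)):
        if i < k:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                reservoir[j] = read
    return reservoir
```

Reservoir sampling is used here only because it avoids loading a whole sequencing run into memory; any uniform random subsample would serve the same purpose before the blastn comparison.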
Reads from the two MDA products and the simulated R570 reads were mapped to the R570 STP (Garsmeur et al. 2018), while the two MDA products were also mapped to the gene regions. Mapping was done with the Burrows-Wheeler Aligner (BWA-MEM) using default settings (Li 2013). Alignments were quality filtered (q = 30) to keep only uniquely mapped reads and filtered so that both reads of a pair mapped. Mapping statistics were obtained using flagstat and view from the SAMtools package, version 1.9.0 (Li et al. 2009). Bedtools version 2.28.0 genomecov (Quinlan and Hall 2010) with the -bga option was used to generate coverage reports. Custom R scripts were used to generate graphs and information for the tables.
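As an illustration of how a coverage report of this kind can be summarized, the sketch below computes breadth and mean depth per chromosome from `bedtools genomecov -bga` output, which lists chromosome, start, end and depth for every interval, including zero-depth intervals. It is a minimal Python example, not the custom R scripts used in the study; the function name `summarize_bedgraph` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_bedgraph(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Per-chromosome breadth and mean depth from bedGraph-style coverage lines."""
    total = defaultdict(int)        # bases reported per chromosome
    covered = defaultdict(int)      # bases with depth > 0
    depth_sum = defaultdict(float)  # depth summed over all bases
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end, depth = line.split()[:4]
        span = int(end) - int(start)  # bedGraph intervals are 0-based, half-open
        d = float(depth)
        total[chrom] += span
        depth_sum[chrom] += d * span
        if d > 0:
            covered[chrom] += span
    return {
        chrom: {
            "breadth": covered[chrom] / total[chrom],
            "mean_depth": depth_sum[chrom] / total[chrom],
        }
        for chrom in total
    }
```

Because the -bga option reports uncovered intervals explicitly, the fraction of a chromosome with no mapped reads is simply one minus the breadth returned here.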

SNP identification and verification
Samtools version 1.9.0 mpileup and bcftools version 1.9.0 from the SAMtools package (Li et al. 2009) were used to identify SNPs from mapping to the sugarcane monoploid genome. An R570 resistance genes bed file was created by extracting annotations that included the terms LRR, NBS, RAG, NBC-ARC, resistance, TIR and WAK from the R570 monoploid reference GeneStructureAndFunction.gff3 file (https://sugarcane-genome.cirad.fr/content/download) (Garsmeur et al. 2018). Bedtools version 2.28.0 slop (Quinlan and Hall 2010) was used to expand the gene coordinates by 100 bp for primer design. Bedtools version 2.28.0 intersect was then used to extract the resistance genes from the vcf file (-header -wa options) and to annotate the resulting file with resistance gene descriptions (-wb option). A custom R script was used to filter out SNPs that had another SNP within 20 bp, as primers could not be designed to these regions.
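The keyword selection and the 20 bp neighbour filter described above can be sketched as follows. This is a minimal Python illustration, not the custom R script used in the study. The function names are hypothetical, and case-sensitive substring matching of the search terms is an assumption, since the matching rule is not stated.

```python
from typing import Dict, List

# Search terms as listed in the text; how they were matched is an assumption.
RESISTANCE_TERMS = ("LRR", "NBS", "RAG", "NBC-ARC", "resistance", "TIR", "WAK")


def is_resistance_annotation(attributes: str) -> bool:
    """True if a GFF3 attribute string contains any of the search terms."""
    return any(term in attributes for term in RESISTANCE_TERMS)


def drop_clustered_snps(positions: Dict[str, List[int]], min_gap: int = 20) -> Dict[str, List[int]]:
    """Keep only SNPs with no other SNP within min_gap bp on the same chromosome."""
    kept: Dict[str, List[int]] = {}
    for chrom, pos in positions.items():
        ordered = sorted(pos)
        last = len(ordered) - 1
        kept[chrom] = [
            p
            for i, p in enumerate(ordered)
            if (i == 0 or p - ordered[i - 1] > min_gap)
            and (i == last or ordered[i + 1] - p > min_gap)
        ]
    return kept
```

Both members of a close pair are dropped, because a primer placed next to either SNP would overlap the other.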
KASP primers were designed to the selected SNPs using Primer3Plus (Untergasser et al. 2007). A pre-amplification step was used for the SNP genotyping to improve the allelic discrimination of the genotypes. The pre-amplification was carried out with 10 × buffer, 25 mM MgCl2, 10 mM dNTPs, Taq (1 U/µl) and forward and reverse primers at 10 µM in a final volume of 6 μl. The PCR cycle was as follows: 94 °C 3 min, (94 °C 20 s, 55 °C 30 s, 72 °C 30 s) for 30 cycles, then 72 °C for 5 min. The product was diluted 1 in 100 and then run on the ABI ViiA 7 instrument using the KASP (Biosearch Technologies) genotyping program, with 2.5 µl of pre-amplification product in a 5 µl reaction containing 2.5 µl KASP master mix which contained 0.07 µl KASP assay mix and 0.1 µl of primer mix. Amplification conditions were 95 °C for 15 min, 95 °C 20 s, 65-57 °C 60 s then touchdown over 10 cycles (95 °C 10 s and 57 °C 60 s) for 20 cycles. Once the run was complete, the data were analysed and the allelic discrimination plot was used to call the genotypes.
To determine the segregation ratio, the SNP markers were screened across 29 clones from the Saccharum-Erianthus BC3 population, from the QN80-3425 × QBYC06-30,415 cross. The SNP markers were tested against the expected ratio for a single dose marker (1:1) using a χ² test. The clones were screened for Pachymetra root rot response using the method in Magarey and Croft (1996). To test for association with the Pachymetra root rot rating, the data were analysed using a two-sample t-test assuming unequal variances.
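The test against a 1:1 single dose ratio can be written out explicitly. The sketch below is a minimal Python illustration with no continuity correction. The function name is hypothetical and the genotype counts in the usage note are invented for illustration, not data from this study. With one degree of freedom the upper-tail probability of the χ² statistic equals erfc(sqrt(χ²/2)), so no statistics library is needed.

```python
import math
from typing import Tuple


def chi_square_1_to_1(n_het: int, n_hom: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test against a 1:1 ratio (1 degree of freedom)."""
    expected = (n_het + n_hom) / 2
    chi2 = ((n_het - expected) ** 2 + (n_hom - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `chi_square_1_to_1(15, 14)` on a hypothetical split of 29 clones gives a statistic of 1/29 and a large p-value, so the 1:1 hypothesis would not be rejected, whereas a strongly skewed split such as 25:4 would be.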

Results
Chromosomal composition of the BC4 hybrid, QC12-20,006, by GISH
Genomic in situ hybridization analysis of QC12-20,006 with whole Erianthus DNA identified three Erianthus chromosomes in the sugarcane background. The red chromosomes are inherited from Erianthus and the green chromosomes are inherited from Saccharum (Fig. 2). One of these chromosomes, indicated with a yellow arrow, which has a majority of red signal also contains a small introgression of green indicating a recombinant chromosome (Fig. 2). In total QC12-20,006 contained approximately 105 chromosomes of which three were of Erianthus origin.

Flow cytometry and identification of peaks with Erianthus chromosomes
Five composite peaks were identified in the QC12-20,006 flow karyotype (Fig. 3). The QC12-20,006 flow karyotype is similar to other sugarcane cultivars but distinct from S. officinarum (Metcalfe et al. 2019). Composite peaks I and II in QC12-20,006 are the highest and equally abundant, with composite peaks III and IV also of equal height (Fig. 3a).
To ensure enrichment of Erianthus chromosomes when sequencing, GISH with whole Erianthus genomic DNA was used to narrow down the screening of single flow sorted chromosomes to one or two composite peaks (Fig. 3b). No signals were observed on chromosomes flow sorted onto slides from composite peaks I, II or V. Signals from the distinctive Saccharum-Erianthus recombinant chromosome were observed in QC12-20,006 composite peaks III and IV (Table 1).

Amplification, screening and sequencing of chromosomes
Chromosomes from composite peaks III and IV were flow sorted into six 96 well plates and amplified. Using the standard amplification reaction time suggested with the REPLI-g Single Cell Kit (Qiagen), the amplification success rate was very poor. Only one out of the five samples processed resulted in a product at the expected yield and was positive for the Saccharum/Erianthus transposable element (TE). After increasing the amplification incubation time to 16 h, as described in the supplementary protocol for maximum yield, the success rate improved to four successfully amplified products out of five samples.
Thirty single chromosomes from the two QC12-20,006 flow karyotype composite peaks were purified, amplified and PCR screened. The DNA from the single chromosome amplifications was evaluated on a 1.5% agarose gel. An example of the amplifications is shown in Fig. S1. In Fig. S1, all amplifications show high molecular weight DNA, except for Lane 2, which represents an amplification failure. The negative control produces a product of the same size as the samples because DNA is generated during the REPLI-g Single Cell reaction by random extension of primer dimers. Total yields ranged from 6 to 31 µg, with an average of 19 μg.
Fig. 2 GISH analysis of the Saccharum-Erianthus BC4 hybrid, QC12-20,006. QC12-20,006 metaphases were probed with whole Erianthus DNA labelled with biotin-14-dUTP and detected with avidin-conjugated Texas red, and with genomic DNA from S. officinarum labelled with DIG and detected with FITC, according to Piperidis et al. (2010a, b) and Piperidis (2014). The recombinant chromosome is indicated with the yellow arrow (colour figure online)
Purified and amplified samples were PCR screened with the three sets of primers (Table S1). Wells with no chromosomes flow sorted into them were PCR negative, indicating no contamination from the flow sorter or amplification process. Of the thirty single chromosome amplification products screened, 24 (80%) were PCR positive for the Saccharum/Erianthus repeat, and 17 of these 24 chromosomes (71%) were negative for the universal bacterial 16S rDNA sequence. After screening with the Erianthus specific repeat, there were two MDA products that were positive for the Saccharum/Erianthus TE and the Erianthus specific repeat, and negative for the universal bacterial 16S rDNA. We would expect 3 out of 105 chromosomes to be Erianthus (2.9%), and we identified two Erianthus chromosomes out of 17 tested (11.8%), so flow sorting from a selected peak gave approximately a fourfold enrichment for Erianthus chromosomes. Amplification yields for the MDA products that were Saccharum/Erianthus PCR positive and bacterial 16S rDNA negative were between 15 and 31 μg in a total volume of 25 μL.
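The enrichment estimate above is a ratio of two proportions and can be checked with a few lines of Python. The function name `fold_enrichment` is hypothetical; the counts are those given in the text (2 Erianthus chromosomes among 17 screened, against 3 among approximately 105 in the whole karyotype).

```python
def fold_enrichment(hits: int, screened: int, target: int, total: int) -> float:
    """Observed proportion of target chromosomes divided by the expected proportion."""
    observed = hits / screened  # proportion among the screened, sorted chromosomes
    expected = target / total   # proportion in the unsorted karyotype
    return observed / expected


enrichment = fold_enrichment(hits=2, screened=17, target=3, total=105)
```

With these counts the ratio is 210/51, or about 4.1, consistent with the approximately fourfold enrichment reported. With only two positive chromosomes the estimate is necessarily imprecise.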

Read trimming, alignment and mapping
Around 9 Gb of sequence was obtained for each of the two single chromosomes sequenced (Table S2), resulting in approximately 110 × coverage, based on the estimated chromosome size of 80 Mb for sugarcane (cultivar R570: 10 Gb/112 chromosomes; Piperidis et al. 2010a, b). After trimming and quality checking, the coverage dropped to approximately 62 ×, with a mean read length of 107 bp and approximately 80% of the reads retained (Table S2). The top hit for about half of a random sample of 5000 reads in a blastn search against the NCBI nucleotide database was Saccharum, followed by Sorghum (Table 2). The top hit was Saccharum rather than Erianthus because there is very little sequence information available for Erianthus. There was no indication of bacterial or human contamination.
Approximately 70% of the reads were properly paired and mapped to the R570 STP (Garsmeur et al. 2018), which dropped to 43% for the first MDA product and 36% for the second MDA product after quality filtering (Table S2). For mapping to gene regions only, 24 and 18% of reads from MDA products one and two were properly paired and mapped, respectively. This dropped to 17% (7,602,141 reads) and 12% (10,056,900) after BAM mapping quality filtering (Table S2).
Filtered MDA product reads were separately mapped to the R570 STP chromosomes and genes (Garsmeur et al. 2018). Mapping to genes was used to identify which chromosome had been amplified and to examine gene coverage. Mapping to STP chromosomes was used to identify resistance genes and SNPs and to examine overall coverage. The highest proportion of reads from the first MDA product mapped to genes from chromosome 7, consistent with chromosome mapping results (Table 3, S3 and S4). Results for the second MDA product were not concordant, where reads mapped with higher frequency to gene sequences on chromosome 4, but overall mapped best across the entirety of chromosome 5. However, the highest mean depth and second highest mean breadth of read mapping were to chromosome 5 genes (Table 3, S3 and S4). Results for the second MDA product are therefore shown for chromosome 5.
Coverage of STP chromosome seven (MDA one) and chromosome five (MDA two) was uneven, showing very high coverage in some regions and very low coverage in other regions (Fig. 4a). Figure 4b shows that for the first MDA product 80% of chromosome 7 had no reads mapped, 16% had less than 200 reads, while 2.4% had over 200 reads, while for the second MDA product 98% of chromosome 5 had no reads mapped, 1.39% had less than 200 reads, while only 0.27% had over 200 reads.
The average gene read depth and percentage mapping breadth were much higher for the first MDA product (74.05 and 44.15%, respectively) than for the second MDA product (30.68 and 12.59%, respectively) (Tables 3 and S4). This is reflected in Fig. 5a: a higher proportion of reads from MDA one map with higher coverage to chromosome 7 than to any other chromosome. Reads from the first MDA product covered 5% of the R570 STP of chromosome 7, with 5% of genes 80-100% covered, 16% of genes 40-80%, 42% of genes > 0-40%, while 38% of genes were missing (Table S5). One hundred resistance genes were identified on chromosome 7 and 68 on chromosome 5. For the first MDA product, over half had some read coverage and 16% had over 40% coverage. Only 16% of the chromosome 5 resistance genes had some coverage with reads from the second MDA product (Fig. 5 and Table S5).
Fig. 3 Summary of results. (a) Histogram of relative DNA fluorescence of chromosomes (flow karyotypes) obtained by flow cytometric analysis of DAPI-stained chromosome suspensions of QC12-20,006. The flow karyotype was divided by peaks into five regions (dotted lines) and 2000 to 3000 chromosomes from each region were flow sorted onto microscope slides. (b) Flow sorted chromosomes (grey) were probed with whole Erianthus genomic DNA (red) to identify regions with Erianthus chromosomes. (c) Single chromosomes from regions most enriched for the Erianthus chromosomes (dark grey region in (a)) were flow sorted into a 96 well plate, purified and MD amplified. (d) Amplified DNA from 30 single flow-sorted chromosomes was PCR screened using 3 sets of primers: a primer specific to a Saccharum/Erianthus transposable element, an Erianthus specific repeat primer, and a universal bacterial 16S rDNA primer. Two single chromosomes positive for the Saccharum/Erianthus transposable element and the Erianthus specific repeat, but negative for the universal bacterial primer, were Illumina sequenced and analysed for contamination and coverage. (e) 136 SNPs within 31 annotated resistance genes were identified from mapped Illumina reads from the two chromosomes. (f) Six KASP primers were designed to SNPs within 5 classic resistance genes and used to determine the segregation ratio of the SNP markers across 29 clones from a Saccharum-Erianthus BC3 population using the KASP system. One SNP segregated as a single dose marker, separating the progeny into two groups, one containing the Erianthus heterozygous SNP (red points) and the other the homozygous group inherited from Saccharum (green points) (colour figure online)
Fig. 5 ... resistance genes. MDA one clearly maps to chromosome 7, while MDA two does not map clearly to a single chromosome. MDA one also has a higher fraction of genes with > 80% coverage breadth, suggesting that MDA one is the product of a more successful amplification

SNP identification and verification
As the test case was to see if we could identify SNP markers inherited from the Erianthus chromosomes, genes that were annotated as resistance genes were targeted. SNPs were identified in 24 classic resistance genes in MDA 1 on chromosome 7, i.e. nearly half of the resistance genes with some read coverage. Only 7 of the resistance genes in MDA 2 on chromosome 5 contained SNPs, but this was over half of the resistance genes with some read coverage. A total of 136 SNPs within 31 annotated resistance genes were identified as potential candidate markers (Table S6). Genes were identified that had SNPs at a coverage of 3 to 19 reads or above. Four KASP primers were designed against four genes identified from the first MDA product and two KASP primers against one gene identified from the second MDA product (Table 4). The KASP primers were optimized for annealing temperature and run across the parents of the BC3 Erianthus/Saccharum population. Five of the SNP markers did not segregate as single dose markers. One SNP, in gene ID Sh07_g011420, segregated as single dose in a small BC3 population (Fig. 3f). This SNP segregated as expected for a single dose marker (1:1) and verified the high quality of the sequence information. A preliminary association test between the data for this SNP from the BC3 clones and their Pachymetra root rot rating revealed a significant association (p ≤ 0.01).

Discussion
We aimed to demonstrate the use of single chromosome flow sorting as part of a genome-assisted selection and genetic improvement strategy in any crop with a large complex genome. Routine isolation of single chromosomes, either by flow cytometry or other methods, results in quantities of DNA too low to sequence without amplification (Arumuganathan et al. 1994). Whole genome amplification (WGA) was developed for analysis of small pools of cells or single cells, for example, for forensic analysis, prenatal or preimplantation diagnosis, oncogenetics and paleo-archeology (Dean et al. 2002; Spits et al. 2006). The first methods were PCR-based and were hampered by non-specific amplification artefacts (Cheung and Nelson 1996), incomplete coverage of loci (Paunio et al. 1996) and the small size of the DNA products (Zhang et al. 1992; Telenius et al. 1992). Multiple displacement amplification is a newer WGA method with limited sequence representation bias (Dean et al. 2002; Lizardi et al. 1998). Although MDA generates fewer amplification artefacts than the PCR-based methods, there is still considerable variation in MDA amplification. Amplification and sequencing of single or small numbers of chromosomes (Hotta et al. 2010) has been developed in humans for haplotype phasing (Fan et al. 2011; Ma et al. 2010; Yang et al. 2011; Kirkness et al. 2013), in the Japanese eel to identify genetic linkage group sequences (Matsubara et al. 2018) and in plants for wheat and pea (Cápal et al. 2015; Kreplak et al. 2019).
Sugarcane (Saccharum) cultivars present particular challenges: they are polyploid aneuploid hybrids with eight or more homeologous chromosome copies and in total between 100 and 120 chromosomes (Piperidis et al. 2010a, b; D'Hont et al. 1998). We have recently developed the use of flow cytometry with Saccharum (Metcalfe et al. 2019), which provided the ability to carry out chromosome flow sorting to isolate a single targeted chromosome. The method we used was based on that developed by Cápal et al. (2015), combining flow sorting of a single chromosome into a 96 well plate, amplification and sequencing. In Cápal et al. (2015) the amplified wheat chromosome 3B corresponds to only 0.002 ng template DNA. The total sugarcane genome is about 10 Gb (10.22 pg) (D'Hont and Glaszmann 2001; Garsmeur et al. 2018) and the majority of the chromosomes are small and of similar size (Meng et al. 2019). On average a single sugarcane chromosome is therefore about 0.0001 ng, an order of magnitude smaller than a wheat chromosome. GISH was used successfully to narrow down the two karyotype composite peaks that putatively contained Erianthus chromosomes by flow sorting chromosomes from individual peaks onto slides for detection. This allowed easier isolation, enrichment and sequencing. Two individual Erianthus chromosomes were isolated from flow-sorted chromosomes and verified by PCR with an Erianthus specific repeat. The two chromosomes were sequenced to high depth with Illumina reads and used to identify SNPs against the R570 monoploid sugarcane genome. To determine if the SNPs identified from the sorted chromosome were high quality and segregated as single dose, we tested several SNP markers identified in two genes that have been annotated as resistance genes across a small Erianthus introgression population. Although this population was small, Pachymetra root rot resistance ratings were available, so we tested for association with resistance to Pachymetra root rot.
This pathogen was chosen as loci (or QTL) located on Erianthus chromosomes in QC12-20,006 have been shown to contain resistance to this pathogen (Magarey and Croft 1996). The SNPs were converted to a KASP assay and the BC3 population genotyped to identify regions associated with resistance to Pachymetra infection.
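The template mass quoted above for a single sugarcane chromosome follows from simple arithmetic, sketched here in Python. The function names are hypothetical, and the conversion factor of 0.978 × 10^9 bp per pg of double-stranded DNA is an assumption taken from the general literature rather than from this manuscript. The chromosome number of 112 is the R570 value given in the Introduction.

```python
BP_PER_PG = 0.978e9  # assumed conversion: base pairs per picogram of double-stranded DNA


def genome_size_gb(mass_pg: float) -> float:
    """Convert a DNA mass in picograms to a genome size in gigabases."""
    return mass_pg * BP_PER_PG / 1e9


def mean_chromosome_mass_ng(genome_pg: float, n_chromosomes: int) -> float:
    """Average DNA mass of one chromosome in nanograms (1 ng = 1000 pg)."""
    return genome_pg / n_chromosomes / 1000.0


size_gb = genome_size_gb(10.22)                 # total genome, in Gb
mass_ng = mean_chromosome_mass_ng(10.22, 112)   # one average chromosome, in ng
```

Under these assumptions 10.22 pg corresponds to roughly 10 Gb, and the average chromosome carries about 0.00009 ng of DNA, which rounds to the 0.0001 ng given in the text.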
Two out of the three Erianthus chromosomes from an introgression line were isolated, amplified and sequenced. The second amplification product (MDA) had very low sequence coverage and depth, which could be due to a lower amount of amplified product. The coverage and depth of the first MDA were comparable to, but lower than, those reported by Cápal et al. (2015) for a single wheat chromosome and Kirkness et al. (2013) for a single sperm. Uneven coverage (Figs. 4, 5) was also observed by Cápal et al. (2015) and Kirkness et al. (2013). The total region examined with no coverage (80%) is slightly higher than that reported by Cápal et al. (2015) (70%) and Kirkness et al. (2013) (57-72%) (Fig. 4). The per cent of genes 80-100% covered, 5%, is slightly lower than reported for the three wheat chromosomes (7-15%) (Fig. 5) (Cápal et al. 2015). The depth is similar (10% of chromosome 7 covered by at least 10 reads) to that reported by Cápal et al. (2015). There are probably several factors contributing to the low coverage observed for the first MDA. The input (0.0001 ng) is many fold lower than that recommended for the MDA amplification kit, which is designed for an input of 10 ng, and tenfold lower than a wheat chromosome (Cápal et al. 2015). The sequencing reads were mapped onto the evolutionarily closest available genome, the sugarcane monoploid genome, because an Erianthus genome is not available, whereas Cápal et al. (2015) were able to map directly back onto the published sequence of the same chromosome that was isolated. The sugarcane monoploid genome sequence was generated by selecting gene-rich Saccharum BAC clones aligned to the sorghum genome sequence, so centromeric and other repeat regions are under-represented.
Despite the low coverage of the amplified chromosomes, we were able to identify SNP markers in nearly a quarter of the resistance genes identified on chromosome 7 (24 out of 100) with the first MDA product. Segregation of these SNP markers was verified using a KASP genotyping assay on 29 clones from the BC3 population, which includes KQ08-6013, the intergeneric parent carrying the Erianthus chromosomes. In most cases the KASP primers identified SNPs that were not segregating in the BC3 population. This was probably because the SNPs were either multi-dose in the BC3 population, whose clones contain from 3 to 7 Erianthus chromosomes, or were not in fact Erianthus-specific but were also present in Saccharum, just not in the R570 monoploid sequence, and so did not segregate. The SNP located in an IAA-alanine resistance gene was segregating in the BC3 population and was tested for association with the Pachymetra root rot ratings for these clones. This SNP was significantly associated (p ≤ 0.01) with resistance to Pachymetra root rot, and a larger population should be screened to verify this result. Interestingly, IAA-alanine resistance genes are members of the ZIP transporter family, which are probably involved in zinc uptake (Grotz et al. 1998). IAA is the major form of auxin in plants, and in rice, auxin signalling is closely associated with zinc efficiency (Begum et al. 2016). The homolog of the IAA-alanine resistance gene in Arabidopsis, IAR1, has also been shown to participate in auxin metabolism or response (Lasswell et al. 2000). Auxin has been known as a regulator of plant growth and development ever since its discovery; however, studies on plant-pathogen interactions have also identified auxin as a key factor in pathogenesis and plant defence (Fu and Wang 2011).
We have demonstrated that GISH, single chromosome sorting, PCR screening and sequencing can be coupled to rapidly identify genes and markers on a particular introgressed chromosome. This method allowed us to identify single-dose segregating SNP markers that, when tested, showed potential association with an important trait in sugarcane. The genes and markers identified in this research will be tested on a larger population to verify their association with resistance. The level of sequence coverage was very uneven across the chromosome, which could make it impossible to identify a SNP in a particular gene of interest. For MDA one, 38% of genes had no read coverage, although a number of strategies could be used to improve the low depth and the low and uneven coverage, and so improve the chances of identifying resistance genes and markers. The increased yield of the kit we used would allow long read sequencing combined with Illumina sequencing. Amplification bias could be reduced by amplification in a nanolitre reactor or by using a microfluidic digital droplet MDA technique (Marcy et al. 2007; Rhee et al. 2016), which could be used in combination with single chromosome flow sorting. The strategy of Cápal et al. (2015) and Kreplak et al. (2019) was to merge reads from three amplification products and to evaluate the amplification products before sequencing by PCR using a set of single loci specific to the 3B chromosome. We were unable to do this because of the lack of sequence information for Erianthus, but this strategy could be used to isolate single sugarcane chromosomes.