Targeted sequencing of the short arm of chromosome 6V of a wheat relative Haynaldia villosa for marker development and gene mining

Background: The short arm of chromosome 6V (6VS) of Haynaldia villosa has been used in wheat breeding programs to introduce the powdery mildew resistance gene Pm21 and some other genes.

Results: In this work, 6VS was isolated from a wheat (Triticum aestivum)-6VS telosome addition line by flow cytometric sorting and sequenced using Illumina technology. The assembly length was 230.39 Mb with a contig N50 of 9,788 bp. Sequence annotation identified 3,276 high-confidence genes supported by RNA sequencing data, representing about 2.3% of the chromosome arm sequence; repetitive elements accounted for 74.91% of the arm sequence. Sequences homologous to 6VS genes were identified on the short arms of chromosomes 6A of T. urartu, 6D of Aegilops tauschii, 6A and 6B of T. dicoccoides, 6A, 6B and 6D of T. aestivum and 6H of Hordeum vulgare, revealing synteny relationships among these chromosome arms. Based on differences in intron size between the homologous genes on 6VS and on 6AS/6BS/6DS of T. aestivum, 222 primer pairs were designed. Of these, 120 amplified 6VS-specific products and are suitable as intron-target (IT) markers to trace the 6VS chromatin introduced into wheat.

Conclusions: The results obtained and the markers developed in this work will facilitate the introduction of important genes into common wheat from its wild relative, while reducing the presence of unfavorable genes due to linkage drag.

Abbreviations: GO: gene ontology; NR: non-redundant protein database; NT: nucleotide database; COG: clusters of orthologous groups; KEGG: Kyoto Encyclopedia of Genes and Genomes; FISH: fluorescence in situ hybridization; TREP: Triticeae repeat sequence database; LTR: long terminal repeat; TGW: thousand grain weight; IT: intron targeting; SNP: single nucleotide polymorphism; SINE: short interspersed nuclear elements; TIR: terminal inverted repeat.

frost and drought of wheat [7,8]. These characters make H. villosa a highly attractive source of important genes and alleles for wheat improvement [1]. In previous studies, several useful genes were mapped on the short arm of chromosome 6V, such as the Pm21 locus, which provides immunity or high resistance to all powdery mildew isolates, and NAM-V1, which contributes to increased grain protein content (GPC) in the wheat-H. villosa 6AL/6VS translocation lines [9, 10]. However, the lack of a genome sequence has hampered efforts to mine other important genes from H. villosa and to use molecular tools to introduce them into wheat while avoiding unfavorable alien chromatin.
Progress in DNA sequencing technology now makes it feasible to produce whole genome sequence assemblies by whole genome shotgun approaches, and the number of sequenced genomes of wheat relatives keeps increasing [11][12][13]. However, if the chromosomal location of the loci of interest is known, an alternative is to sequence only the chromosome or chromosome arm of interest. This approach significantly reduces project costs and thus enables sequencing of chromosomes from multiple lines of a species, if needed. It also simplifies bioinformatic analyses because the volume of sequence data is smaller.
Purification of a particular chromosome by flow sorting may be hampered by the inability to discriminate it from the other chromosomes in a karyotype if its size or relative DNA content is not different. Various strategies have been developed to overcome this difficulty, and one of them is to sort translocation or deletion chromosomes with altered size [15][16][17][18][19]. Larger deletions are not viable in diploids, but they may be developed from wild type chromosomes after these are introduced into a polyploid species, such as wheat, which tolerates aneuploidy. Thus, Tiwari et al. sorted chromosome 5Mg from a wheat/Ae. geniculata disomic substitution line [20]. In a similar way, Xiao et al. used a wheat-alien ditelosomic addition line "NAU1201" to isolate chromosome arm 4VS of H. villosa. Thus, the line need not be prepared exclusively for flow cytometric sorting and may already be available [21].
In this work, a wheat-alien addition line containing a pair of short arms of chromosome 6V of H. villosa was used to isolate, sequence and assemble 6VS. The draft sequence made it possible to characterize the molecular composition of 6VS, including its DNA repeat content, to identify genes and to characterize syntenic relationships with the genomes of the tribe Triticeae and other sequenced grasses. The 6VS sequences were also used to develop PCR-based 6VS-specific markers, which will support alien introgression breeding of wheat and the cloning of favorable genes from 6VS.

Results
Flow sorting and sequencing of chromosome arm 6VS of H. villosa
The population representing the 6VS telosome was sought among all populations with lower DAPI fluorescence, which were expected to correspond to smaller chromosomes. Microscopic analysis of flow-sorted particles after FISH with probes for pSc119.2 and Afa family repeats enabled unambiguous identification of the population representing 6VS telosomes (Figure 1). A detailed microscopic analysis showed that the 6VS telosome could be sorted at an average purity of 89.41%. The sorted DNA was amplified by multiple displacement amplification (MDA) before Illumina sequencing.
Sequencing of the DNA amplified from flow-sorted 6VS on the Illumina MiSeq system generated 47.7 Gb of high-quality paired-end reads from two libraries with insert sizes of 500 bp and 1,000 bp, respectively.
De novo assembly was performed using the software Hecate (http://bgiinternational.com/us/, unpublished) with different k-mer sizes (41, 45, 49 and 63). The 45-mer run provided the assembly with the best sequence coverage and N50 size, and was therefore used to generate the 6VS scaffolds. The sequencing data and the detailed assembly statistics for 6VS are summarized in Table 1. In total, 230.39 Mb of assembled sequence was obtained, comprising 153,177 scaffolds. The maximum and minimum scaffold lengths were 138,620 bp and 100 bp, respectively, with a contig N50 length of 9,788 bp and a mean length of 1,464 bp.
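For readers unfamiliar with the N50 statistic reported above, the following minimal Python sketch shows how N50 and mean length are conventionally computed from a list of sequence lengths. It is not the code used by the assembler in this work; the function names and the toy lengths are illustrative assumptions only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    return sum(lengths) / len(lengths)


# Toy lengths in bp (hypothetical, not taken from the 6VS assembly).
toy_lengths = [100, 200, 300, 400, 500]
print("N50:", n50(toy_lengths))
print("Mean length:", mean_length(toy_lengths))
```

In practice the lengths would be read from the assembly FASTA file; comparing N50 across runs with different k-mer sizes is one common way to choose among candidate assemblies, as was done here for the 45-mer run.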

Identification of repetitive DNA elements
The overall repetitive DNA composition of the 6VS assembly, including transposable elements (TEs) and tandem repeats, was analyzed. When compared against two combined DNA repeat databases, the Repbase Update library [30] and the TREP library [31], a total of 74.91% of the 6VS assembly corresponded to repeat elements.

To test the annotation quality of 6VS, we used the genes NLR-V [32], STPK-V [33] and NAM-V1 [9], all cloned from 6VS, as queries in BLASTn searches. We found homologous sequences with identities of 99.93%, 100.00% and 99.93%, respectively, implying a high quality of the H. villosa 6VS annotation. Thus, the 6VS draft sequence obtained in this work will facilitate extensive mining of 6VS genes for wheat breeding.

Comparative analysis of 6VS sequence composition
We then used a set of "toplevel" wheat sequences consisting of molecule-level assemblies (http://plants.ensembl.org/) released by the IWGSC [34] to identify 6VS syntenic regions on wheat chromosomes 6A, 6B and 6D. We also identified 6VS syntenic regions on chromosomes 6A and 6B of tetraploid T. dicoccoides, 6D of Ae. tauschii, 6A of T. urartu and 6H of H. vulgare. All predicted genes were used to identify possible syntenic regions in the genomes of the related grass species. After filtering, 2,867 predicted 6VS genes had 1,499, 1,577 and 1,430 blastn hits with homologous genes on wheat chromosomes 6A, 6B and 6D, respectively; the number of hits with homologous genes in T. dicoccoides

Genes encoding NB-ARC domain proteins are commonly associated with disease resistance. In the 6VS assembly, a total of 45 genes were predicted to encode NB-ARC domain proteins using an HMMER model [35]. In a separate project, we analyzed the transcriptome of the wheat-H. villosa translocation line T6VS/6AL after treatment with two Blumeria graminis f. sp. tritici (Bgt) isolates, E26 and E31 (data not shown). We found that 28 genes were expressed within 24 hours after inoculation with both isolates, with 15 genes up-regulated two-fold or more compared to the control (Figure 3). As the 6VS chromatin introduced into wheat was the main contributor to resistance to various Bgt isolates, these 6VS genes might be involved in the innate immunity of H. villosa to powdery mildew.
Wheat cultivars carrying the 6VS/6AL translocation have been grown extensively, on more than four million hectares in China [32], not only because of their broad-spectrum resistance to powdery mildew, but also because of their higher thousand-grain weight (TGW) [36]. In previous studies, TaGW2-6A was described as a negative regulator of grain width and grain weight [37][38][39]. Four SNPs in the promoter region of TaGW2-6A, at positions -998 bp, -739 bp, -593 bp and -494 bp, were reported to be associated with TGW; of these, the SNP at -494 bp showed a significant association with TGW and was located in the 'CGCG' motif [37]. SNP-494 has the greatest effect on TaGW2-6A expression level and TGW: haplotypes with the A allele have significantly lower TaGW2-6A expression and higher TGW than those with the G allele. To find out whether the increased TGW of 6VS/6AL translocations was due to the substitution of 6AS with 6VS, the TaGW2-6A homologue HvGW2-6V was identified in the 6VS assembly. HvGW2-6V in H. villosa carries the G allele at SNP-494, which is associated with low TGW (Figure 4). We speculate that the higher TGW of 6VS/6AL translocations might be caused by genes other than GW2-6V, or that expression of the alien gene is suppressed by genomic shock in the wheat background, even though its genotype at position -494 is the one associated with low TGW.

villosa translocation line T6AL·6VS, but not in common wheat, the primer pair was considered 6VS-specific. In total, 120 markers were obtained, a success rate as high as 54.05% for developing 6VS chromosome arm-specific molecular markers (Table S1). All IT markers were tested on three different translocation lines, NAU418, NAU419 and NAU1203, which carry different introgressed 6VS segments. The chromosome arm could be dissected into four bins, bin1 to bin4 (Fig 5), which contained 34, 11, 46 and 29 markers, respectively.
Given that all three translocation lines are resistant, the resistance gene Pm21 was mapped within bin3. The 40 markers within this physical bin are suitable for marker-assisted breeding.
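The marker screening logic and the reported success rate can be illustrated with a short Python sketch. It assumes that each primer pair is scored simply for presence or absence of a product in the translocation line and in common wheat; the class, field and function names are hypothetical and do not describe the authors' actual scoring pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerResult:
    """Presence/absence scores for one primer pair (hypothetical layout)."""
    name: str
    product_in_translocation_line: bool  # product seen in T6AL·6VS
    product_in_common_wheat: bool        # same product seen in common wheat


def is_6vs_specific(result):
    """A primer pair counts as 6VS-specific when its product appears in the
    translocation line but not in common wheat."""
    return result.product_in_translocation_line and not result.product_in_common_wheat


def success_rate(n_specific, n_tested):
    """Percentage of tested primer pairs that yielded specific markers."""
    return 100.0 * n_specific / n_tested


# Figures reported in the text: 120 specific markers out of 222 primer pairs.
print(f"Success rate: {success_rate(120, 222):.2f}%")

# Marker counts per bin as reported in the text; they add up to the 120 markers.
bin_counts = {"bin1": 34, "bin2": 11, "bin3": 46, "bin4": 29}
print("Markers across bins:", sum(bin_counts.values()))

# Toy scoring example with made-up primer names.
toy = [
    PrimerResult("IT-toy-1", True, False),
    PrimerResult("IT-toy-2", True, True),
]
print([r.name for r in toy if is_6vs_specific(r)])
```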

Discussion
Aneuploid germplasm facilitates flow sorting of target chromosomes or their arms
In order to characterize the short arm of H. villosa chromosome 6V (6VS) at the DNA level, we combined flow cytometric chromosome sorting and next-generation DNA sequencing.
When compared to whole genome sequencing, this approach provided a massive and lossless reduction of DNA sample complexity and facilitated DNA sequence analysis. A chromosome can be purified by flow sorting if it differs in relative DNA content from the other chromosomes in a karyotype, which is not the case for chromosome 6V in H. villosa. Thus, we sorted the 6VS chromosome arm from a T. aestivum-H. villosa 6VS ditelosomic addition line, in which the telocentric chromosome 6VS is smaller than the other chromosomes. To achieve high resolution of 6VS, we employed bivariate analysis of DNA content (DAPI fluorescence) and the amount of GAA microsatellites labelled by FITC following the FISHIS protocol [40]. This approach permitted sorting of the 6VS arm at almost 90% purity.
The 6VS sequences would accelerate breeding programs
H. villosa has been an important donor of disease resistance in wheat breeding, and Pm21, transferred from H. villosa to wheat, remains the most effective powdery mildew resistance gene [10]. Although Pm21 has been cloned, its introduction by genetic transformation may not be accepted by the market [41]. On the other hand, Pm21 transferred from the wheat-H. villosa translocation line T6AL·6VS has been successfully utilized in wheat breeding, and more than 20 wheat varieties have been released in China [42]. Thus, the introgression of alien chromatin harboring traits of interest by chromosome engineering remains a priority.
However, due to linkage drag, this strategy often introduces favorable traits together with deleterious loci, compromising yield and quality [43]. Thus, advanced chromosome engineering is needed to minimize alien chromatin during alien introgression breeding.
The main procedures for reducing alien chromatin in wheat are to induce chromatin break-rejoining by ionizing radiation, or to induce meiotic recombination between the alien chromatin and its homoeologous common wheat counterpart. In order to preserve beneficial genes and remove deleterious loci, it is important to know the location of

Development of specific molecular markers using a chromosome sorting strategy
Development of molecular markers is now much easier than before due to the falling costs of next-generation sequencing. As shown in this work, this is also true for species without a genome sequence, especially if a chromosome of interest can be purified by flow sorting.
The sequences from the alien chromosome can then be combined with the available wheat genome sequence to develop molecular markers suitable for detecting alien chromatin.
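One way to combine the two sequence sets, in the spirit of the intron-size comparison used in this work, is sketched below in Python. The sketch assumes that exon coordinates are available for a 6VS gene and for its homologues on 6AS, 6BS and 6DS, and it flags introns whose length on 6VS differs from every wheat homologue by at least a chosen threshold. The function names, the coordinate layout and the 20 bp threshold are illustrative assumptions, not parameters reported by the authors.

```python
def intron_lengths(exons):
    """Return intron lengths for a gene given its exons.

    exons: list of (start, end) tuples, 1-based inclusive coordinates.
    """
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def candidate_introns(v_exons, wheat_exons_by_arm, min_diff=20):
    """Flag introns suitable for intron-target primer design.

    Returns (intron_number, length_on_6VS, smallest_difference) for each
    intron whose 6VS length differs from all wheat homologues by at least
    min_diff bp. Genes whose exon structure differs between arms are skipped.
    """
    v_introns = intron_lengths(v_exons)
    wheat_introns = {arm: intron_lengths(ex) for arm, ex in wheat_exons_by_arm.items()}
    if not wheat_introns or any(len(w) != len(v_introns) for w in wheat_introns.values()):
        return []
    hits = []
    for i, v_len in enumerate(v_introns):
        smallest = min(abs(v_len - w[i]) for w in wheat_introns.values())
        if smallest >= min_diff:
            hits.append((i + 1, v_len, smallest))
    return hits


# Toy gene models (hypothetical coordinates in bp).
gene_6vs = [(1, 100), (301, 400), (451, 550)]
gene_wheat = {
    "6AS": [(1, 100), (251, 350), (401, 500)],
    "6BS": [(1, 100), (261, 360), (411, 510)],
    "6DS": [(1, 100), (241, 340), (391, 490)],
}
print(candidate_introns(gene_6vs, gene_wheat))
```

Primers placed in the exons flanking a flagged intron would then be expected to give a product of distinct size from 6VS, which still has to be confirmed experimentally on the translocation line and on common wheat.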

Development of Intron Target markers
In the first step, we chose a set of genes to calculate exon-exon junction size in the genome sequences of the homologous arms 6AS, 6BS and 6DS of wheat as well as in the 6VS assembly.

The SNPs in the promoter region of TaGW2-6A and HvGW2-6V. Characters highlighted in red are the SNPs reported to be associated with thousand-grain weight (TGW) within the promoter region of TaGW2-6A and HvGW2-6V.