Development of high-resolution GenoBaits probes for wheat-Th. elongatum
Chinese Spring (CS) and Abbondanza are the wheat cultivars most widely used in distant hybridization. To design the GenoBaits array for Th. elongatum target capture-based sequencing, whole-genome resequencing data from CS and Abbondanza were aligned to the Th. elongatum reference genome (Wang et al. 2020). Genomic regions covered by neither CS nor Abbondanza reads were identified as Th. elongatum-specific regions; these comprised 11830821 windows with a total length of 1.85 Gb (40.77% of the reference genome) (Fig. S1).
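The window filter described above can be sketched in Python. This is a minimal illustration, not the pipeline used for the array: it assumes per-window mean depths for each cultivar have already been computed (for example with a coverage tool) and treats "not covered" as zero depth in both cultivars. The `Window` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One window on the Th. elongatum reference (hypothetical layout)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    depth_cs: float          # mean CS read depth in the window
    depth_abbondanza: float  # mean Abbondanza read depth in the window


def specific_windows(windows):
    """Keep windows that neither wheat cultivar covers (zero depth in both)."""
    return [w for w in windows if w.depth_cs == 0 and w.depth_abbondanza == 0]


def summarize(windows, genome_length):
    """Return (number of windows, total bp, percent of genome) for specific windows."""
    kept = specific_windows(windows)
    total_bp = sum(w.end - w.start for w in kept)
    return len(kept), total_bp, 100.0 * total_bp / genome_length


if __name__ == "__main__":
    demo = [
        Window("1E", 0, 100, 0.0, 0.0),
        Window("1E", 100, 200, 3.2, 0.0),  # covered by CS, so not Th. elongatum-specific
        Window("1E", 200, 300, 0.0, 0.0),
    ]
    print(summarize(demo, genome_length=300))
```

Requiring zero depth in both cultivars is the strictest reading of "not covered"; a real pipeline might instead use a low-depth threshold.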
The design of the GenoBaits probes used in this study is described in the Experimental Procedures. For Th. elongatum, the goal of probe design was to maximize the number of genes that the probes could trace. After filtering through a selection pipeline, 40100 regions were selected as targets, and 80000 Th. elongatum bait probes were developed to capture them. Each chromosome was covered by 7880 to 14315 probes (Fig. 1a). The distance between each gene and its adjacent probe was mostly 0 to 3 kb, which was the case for 87.0% (38391/44144) of the genes on the seven assembled pseudochromosomes (Wang et al. 2020) (Fig. 1b). We observed that 42.2% of the probes were located within gene bodies, covering 18646 distinct genes (Fig. 1c). Only 4.4% (1933/44144) of the gene-probe pairs were more than 100 kb apart. Additionally, to increase haplotyping resolution for genes related to seed quality and disease resistance (e.g., high-molecular-weight glutenin, Fhb7 and R genes), a total of 7027 probes were co-located with 458 such genes, an average of 15.3 probes per gene. Overall, these results indicate that we largely achieved our goal of developing a roadmap that tags each gene in the Th. elongatum genome.
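A minimal sketch of the gene-to-probe distance calculation summarized in Fig. 1b and 1c, assuming the probes on each chromosome are non-overlapping and sorted by start coordinate. The function names and distance bins are illustrative choices, not the authors' code.

```python
import bisect


def nearest_probe_distance(gene_start, gene_end, probe_starts, probe_ends):
    """Gap in bp between a gene [gene_start, gene_end) and its nearest probe.

    Returns 0 when a probe overlaps the gene body. Probes are parallel lists
    sorted by start and assumed non-overlapping, so ends are sorted too.
    """
    # Index of the first probe that starts at or after the gene end.
    i = bisect.bisect_left(probe_starts, gene_end)
    best = float("inf")
    if i < len(probe_starts):
        best = probe_starts[i] - gene_end  # downstream gap
    if i > 0:
        j = i - 1  # last probe starting before the gene end
        if probe_ends[j] > gene_start:
            return 0  # this probe overlaps the gene body
        best = min(best, gene_start - probe_ends[j])  # upstream gap
    return best


def distance_class(distance):
    """Illustrative bins mirroring those discussed in the text."""
    if distance == 0:
        return "gene body"
    if distance <= 3_000:
        return "<=3 kb"
    if distance <= 100_000:
        return "3-100 kb"
    return ">100 kb"
```

Because the probes are sorted and non-overlapping, only the two probes flanking the gene end need to be inspected, so each lookup is logarithmic in the number of probes.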
For wheat, we primarily aimed to select species-specific loci that uniformly cover all chromosomes, ensuring an even distribution of detection signals along each chromosome. The final array contained 10000 probes targeting 5035 loci in the wheat genome. Chromosome 4D had the fewest probes (332) and chromosome 3B the most (711), a 2.1-fold difference (Fig. S2a). Nevertheless, probe density was similar across chromosomes, ranging from 0.61 to 0.88 probes per megabase (Mb). Of these probes, 82.7% were located in genic regions, 10.2% lay within 100 kb of a gene, and 7.1% were more than 100 kb from the nearest gene (Fig. S2b).
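Because chromosomes differ in length, raw probe counts can vary 2.1-fold while density remains even. The sketch below, with hypothetical function names, computes both metrics; chromosome lengths must come from the reference assembly and no values are assumed here.

```python
def probe_density(probe_counts, chrom_lengths_mb):
    """Probes per megabase for each chromosome.

    probe_counts: {chromosome: number of probes}
    chrom_lengths_mb: {chromosome: length in Mb}, taken from the reference
    assembly (not supplied here).
    """
    return {c: probe_counts[c] / chrom_lengths_mb[c] for c in probe_counts}


def fold_variation(values):
    """Ratio of the largest to the smallest value."""
    values = list(values)
    return max(values) / min(values)


if __name__ == "__main__":
    # Extreme probe counts reported in the text: 4D = 332, 3B = 711.
    print(round(fold_variation([332, 711]), 2))  # prints 2.14
```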
Finally, a total of 90000 capture probes (each 110 nt long) derived from the two species were integrated into one GBTS liquid chip, named GenoBaits®WheatplusEE. Probe density decreased along each chromosome from the distal regions toward the centromere (Fig. 1a and 1d), consistent with the gene-rich nature of distal chromosome regions (IWGSC 2018; Wang et al. 2020). Taken together, the probes in the GenoBaits®WheatplusEE array were well distributed throughout both the wheat and Th. elongatum genomes.
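The distal-to-centromere trend can be quantified by counting probes in fixed bins along a chromosome. This sketch assumes a 10-Mb bin size and a known centromere bin; both are hypothetical parameters chosen for illustration.

```python
from collections import Counter


def binned_counts(probe_positions, chrom_length, bin_size=10_000_000):
    """Number of probes in consecutive bins along one chromosome (bin_size in bp)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    # Clamp positions at the very end of the chromosome into the last bin.
    counts = Counter(min(p // bin_size, n_bins - 1) for p in probe_positions)
    return [counts.get(i, 0) for i in range(n_bins)]


def distal_to_proximal_ratio(bins, centromere_bin, flank=1):
    """Mean count of the two terminal bins divided by the mean count of the
    bins within `flank` bins of the centromere."""
    distal = (bins[0] + bins[-1]) / 2
    lo = max(0, centromere_bin - flank)
    hi = min(len(bins), centromere_bin + flank + 1)
    proximal = sum(bins[lo:hi]) / (hi - lo)
    return distal / proximal if proximal else float("inf")
```

A ratio well above 1 corresponds to the pattern in Fig. 1a and 1d, where probes are denser toward the chromosome ends.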
Sensitivity and reproducibility of the GenoBaits®WheatplusEE array
To evaluate the efficiency of target capture, we tested the array on the wheat varieties CS and Aikang58 (AK58), each with two DNA replicates, as well as on Th. elongatum. Following target capture and sequencing on the MGISEQ-2000 platform (PE150), approximately 4.9×10⁷ paired-end reads were generated, giving an expected average sequencing depth of 107.9×. Overall, about 96.2% of the reads aligned to the original reference genome sequences (Table S1), with the highest mapping rate for CS (98.5%) and the lowest for Th. elongatum (89.1%). In addition, about 78.1% of reads mapped back to the extended references from which the probes were designed (the 110-bp capture probe sequence plus 1000-bp flanking sequences on each side), ranging from 76.0% (CS, replicate 2) to 83.0% (Th. elongatum).
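The on-target figure (reads mapping to a probe plus its 1000-bp flanks) can be computed by padding and merging probe intervals. A minimal sketch follows; counting a read as on-target when its alignment start falls inside a padded interval is an assumption, and the helper names are hypothetical.

```python
import bisect


def extended_targets(probes, flank=1000):
    """Pad each probe by `flank` bp on both sides and merge overlaps per chromosome.

    probes: iterable of (chromosome, start, end), 0-based half-open coordinates.
    """
    merged = {}
    for chrom, start, end in sorted(probes):
        s, e = max(0, start - flank), end + flank
        ivs = merged.setdefault(chrom, [])
        if ivs and s <= ivs[-1][1]:
            ivs[-1][1] = max(ivs[-1][1], e)  # overlaps the previous interval: extend it
        else:
            ivs.append([s, e])
    return merged


def on_target_rate(reads, targets):
    """Fraction of reads (chromosome, alignment start) inside an extended target."""
    if not reads:
        return 0.0
    starts = {c: [iv[0] for iv in ivs] for c, ivs in targets.items()}
    hits = 0
    for chrom, pos in reads:
        ivs = targets.get(chrom)
        if not ivs:
            continue  # chromosome carries no targets
        i = bisect.bisect_right(starts[chrom], pos) - 1  # last interval starting <= pos
        if i >= 0 and pos < ivs[i][1]:
            hits += 1
    return hits / len(reads)
```

Merging first keeps the intervals disjoint and sorted, which is what makes the single binary search per read valid.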
To make the data independent of the absolute number of sequences, the depth at each targeted locus was normalized as depth per million reads (DPM) in each sample. For CS, 99.5% of the wheat-targeted loci received a detection signal (DPM > 0) and 97.5% received a detection signal with DPM > 1, whereas 0.5% were not covered in this experiment (Fig. 2a-b). Similar results were observed for AK58, except for chromosome 1B (Fig. 2b). Weak detection signals occurred on chromosome 1B of AK58 (Fig. 2c), consistent with a wheat-rye 1BL.1RS translocation in which the 1BS arm is replaced by rye 1RS; this suggests that the array could be used to confirm 1BL.1RS translocation lines. In contrast, more than 78.6% of the wheat-targeted loci were not covered in Th. elongatum, whereas Th. elongatum showed a high detection rate (67.9%) for the Th. elongatum-derived target loci, compared with 1.7% for AK58 and 0.1% for CS. The normalized depth distributions of the two DNA replicates were highly consistent (Fig. 2d and Fig. S3-S4), indicating high reproducibility. In summary, the GenoBaits®WheatplusEE array generated clear and highly reproducible signal patterns for each chromosome, allowing individual chromosomes of wheat and Th. elongatum to be identified.
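The DPM normalization and the detection thresholds (DPM > 0, DPM > 1) can be written as below. This sketch assumes DPM is the per-locus depth scaled by one million divided by the sample's total read count; whether total or mapped reads were used as the denominator is not stated here, and the function names are hypothetical.

```python
def depth_per_million(depths, total_reads):
    """Scale raw per-locus depths to depth per million reads (DPM)."""
    scale = 1_000_000 / total_reads
    return [d * scale for d in depths]


def detection_summary(dpm_values):
    """Fractions of loci with DPM > 0, with DPM > 1, and with no coverage."""
    n = len(dpm_values)
    return {
        "dpm_gt_0": sum(v > 0 for v in dpm_values) / n,
        "dpm_gt_1": sum(v > 1 for v in dpm_values) / n,
        "uncovered": sum(v == 0 for v in dpm_values) / n,
    }
```

Scaling by library size lets replicates or samples sequenced to different depths be compared locus by locus, which is what the replicate comparison in Fig. 2d relies on.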
Characterization of chromosome constitution in wheat-Th. elongatum derivatives
We applied the array to the targeted capture sequencing of various wheat-Th. elongatum derivatives to evaluate its specificity and efficiency. We first tested 14 wheat (CS)-Th. elongatum disomic addition lines. In the 7E addition line, chromosome 7E and the 21 wheat chromosomes showed dense, evenly distributed detection signals, whereas the loci captured on the remaining six Th. elongatum chromosomes were sparse and randomly distributed (Fig. 3a). Similar results were observed for the addition lines CS-1E, CS-2E, CS-3E, CS-4E, CS-5E and CS-6E (Fig. S5). Furthermore, the array clearly revealed chromosome deletions and the corresponding breakpoints (Fig. 3b). Based on the distribution of successfully captured loci across a set of Th. elongatum telosomic addition lines, we calculated the arm ratio, expressed as the length of the short arm relative to that of the long arm (Gill et al. 1996), for each Th. elongatum chromosome (Fig. 3c). For example, 7E is the longest Th. elongatum chromosome (~744.1 Mb; Wang et al. 2020). The loci captured in the 7ES and 7EL telosomic lines overlapped in the centromeric region, which spanned 19.8 Mb. On this basis, 7ES accounted for 0.51 of the chromosome length and 7EL for 0.49, giving an arm ratio of 1.02 for chromosome 7E. However, we did not observe robust capture signals in the centromeric region for CS-7ES compared with CS-7E and CS-7EL, indicating that 7ES might lack a natural centromere.
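One way to derive arm fractions from telosomic addition lines is sketched below. It assumes the short arm lies at the start of the reference coordinates and that the centromeric overlap between the two telosome signals is split at its midpoint; both are simplifying assumptions rather than the documented procedure, and the function name is hypothetical.

```python
def arm_fractions(chrom_length, short_arm_end, long_arm_start):
    """Estimate arm fractions and arm ratio from telosomic-line capture signals.

    Assumes the short arm occupies the start of the chromosome coordinates.
    short_arm_end: coordinate of the last locus captured in the short-arm telosome line.
    long_arm_start: coordinate of the first locus captured in the long-arm telosome line.
    The overlap [long_arm_start, short_arm_end] is treated as the centromeric
    region and, as a simplifying assumption, split at its midpoint.
    """
    if long_arm_start > short_arm_end:
        raise ValueError("telosome signals do not overlap; centromere not bracketed")
    centromere_length = short_arm_end - long_arm_start
    split = (long_arm_start + short_arm_end) / 2
    short_frac = split / chrom_length
    long_frac = 1 - short_frac
    # Arm ratio as defined in the text: short-arm length / long-arm length.
    return short_frac, long_frac, short_frac / long_frac, centromere_length
```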
We also collected 16 wheat-Th. elongatum substitution lines for further evaluation. Based on the successfully captured loci, the array determined the chromosome constitution of each line as expected (Fig. S6). As shown in Fig. 3d, most probes captured only their corresponding chromosomes, and very few detection signals were randomly distributed on absent chromosomes. For example, in three disomic substitution lines, one of the three group-1 homoeologous wheat chromosomes (1A, 1B or 1D) was replaced by chromosome 1E; these lines are designated CS-1E(1A), CS-1E(1B) and CS-1E(1D), respectively. The sequencing results showed that the array not only revealed the presence of the alien chromosome but also distinguished among the three homoeologous wheat chromosomes. These results demonstrate the high specificity and efficiency of the array in capturing wheat and Th. elongatum chromosomes.
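Calling which chromosomes are present or substituted reduces to comparing per-chromosome capture rates. The sketch below flags a chromosome as present when its rate exceeds a fraction of the median rate of its panel (for example, the 21 wheat chromosomes); the 0.2 cutoff and the function names are hypothetical.

```python
from statistics import median


def capture_rates(loci, min_dpm=1.0):
    """Per-chromosome fraction of target loci with DPM above `min_dpm`.

    loci: iterable of (chromosome, dpm) pairs for one sample.
    """
    totals, hits = {}, {}
    for chrom, value in loci:
        totals[chrom] = totals.get(chrom, 0) + 1
        if value > min_dpm:
            hits[chrom] = hits.get(chrom, 0) + 1
    return {c: hits.get(c, 0) / totals[c] for c in totals}


def call_present(rates, frac_of_median=0.2):
    """Call a chromosome present if its capture rate exceeds a fraction of the
    panel median (hypothetical cutoff; pass one panel at a time, e.g. wheat only)."""
    cutoff = frac_of_median * median(rates.values())
    return {c: r > cutoff for c, r in rates.items()}
```

Evaluating wheat and Th. elongatum panels separately keeps the mostly absent alien chromosomes from dragging down the median used for the wheat calls.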
Validation of the array for chromosome identification in closely related species
Elytrigia Desv. and Pseudoroegneria (Nevski) A. Löve are important resources for distant hybridization in wheat breeding. To examine whether the array can be applied to other wild relatives of wheat, three representative species (Ps. spicata, Th. bessarabicum and Th. intermedium) and two wheat-Th. ponticum derivatives were used for targeted capture and sequencing analyses. A large number of probes designed from the Th. elongatum reference genome captured sequences in these three closely related species (Fig. 4a-c, Fig. S7a-d). The successfully captured loci were evenly distributed across the chromosomes, indicating good conservation among homoeologous groups. Using DPM > 1 as the threshold, the highest capture rate was obtained for Th. bessarabicum (37.0%, 14829/40100), followed by Th. intermedium (26.7%) and Ps. spicata (23.9%, 9602/40100) (Fig. S7e-f). This result suggests that, of the three species, Th. bessarabicum is the most closely related to Th. elongatum.
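The capture-rate percentages are simple ratios over the 40100 target regions. The sketch reproduces the reported values for Th. bessarabicum and Ps. spicata; capture rate is only a proxy for sequence conservation, not a formal phylogenetic distance. Function names are hypothetical.

```python
TOTAL_TARGETS = 40100  # Th. elongatum target regions on the array (from the text)


def capture_rate(n_captured, total=TOTAL_TARGETS):
    """Percentage of target loci captured (e.g. loci with DPM > 1), to one decimal."""
    return round(100 * n_captured / total, 1)


def rank_species(captured_counts, total=TOTAL_TARGETS):
    """Sort species by capture rate, highest first."""
    return sorted(
        ((sp, capture_rate(n, total)) for sp, n in captured_counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    # Locus counts with DPM > 1 reported in the text.
    print(rank_species({"Th. bessarabicum": 14829, "Ps. spicata": 9602}))
    # prints [('Th. bessarabicum', 37.0), ('Ps. spicata', 23.9)]
```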
Previous studies have shown that individual chromosomes of Th. ponticum (Host) D.R. Dewey (2n = 10x = 70) are difficult to identify using current tandem repeat-based probes (Li and Wang 2009; Xi et al. 2019). CH18058 is a stable line derived from a cross involving the wheat (7182)-Th. ponticum partial amphiploid Xiaoyan784 (2n = 8x = 56). GISH using Th. ponticum genomic DNA as the probe showed that CH18058 contained three Th. ponticum chromosomes (Fig. 4d). FISH analysis revealed that a pair of 1D chromosomes was absent and replaced by two alien chromosomes, which was further confirmed by GenoBaits®WheatplusEE array detection. The array data also showed that the third alien chromosome was the short arm of a Th. ponticum group 5 chromosome (Fig. 4e and Fig. S8a). Therefore, CH18058 carried 40 wheat chromosomes (14 A-genome, 14 B-genome and 12 D-genome chromosomes), two Th. ponticum chromosomes and one Th. ponticum chromosome arm.
Analysis of chromosome compositions of the partial amphiploid
XiaoYan693 is a wheat-Th. ponticum partial amphiploid (2n = 8x = 56). Using GISH and FISH, we observed that XiaoYan693 possessed 16 Th. ponticum chromosomes and 40 wheat chromosomes, comprising 14 A, 12 B (lacking a pair of 6B) and 14 D chromosomes (Fig. 5a-b), consistent with previous studies (Li et al. 2021; Zheng et al. 2014b). In the array data, chromosome 1B showed a markedly lower capture rate than the other wheat chromosomes (17.1% vs an average of 84.3%) (Fig. 5c). Comparable numbers of successfully captured loci were observed across the seven Th. elongatum chromosomes, ranging from 775 (4E) to 1533 (2E). However, 6E showed a higher sequencing depth than the other chromosomes, with a 1.44-fold difference (Fig. 5d). Similarly, in CH18058 the sequencing depth of the single chromosome arm was 23.1% lower than that of the pair of Th. ponticum chromosomes (Fig. S8b). These results indicate that sequencing depth can be used to estimate chromosome copy number.
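A simple depth-based copy-number estimate scales each chromosome's mean normalized depth to that of chromosomes assumed to be disomic. This is a hypothetical sketch; in CH18058 the single arm showed a 23.1% rather than a 50% depth reduction, so outputs should be read as relative indicators, not integer copy calls.

```python
from statistics import median


def relative_copy_number(mean_depths, reference_chroms, reference_copies=2):
    """Estimate copies per chromosome from mean normalized target depth.

    Each chromosome's depth is divided by the median depth of `reference_chroms`
    (chromosomes assumed to be present as `reference_copies` copies, e.g. a
    disomic pair) and multiplied by that copy number.
    """
    ref = median(mean_depths[c] for c in reference_chroms)
    return {c: reference_copies * d / ref for c, d in mean_depths.items()}
```

Using the median of several reference chromosomes, rather than one, limits the influence of a single chromosome with unusual capture efficiency.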
We also compared the alien chromosomes of XiaoYan693 with other distantly related species and derivatives to infer its subgenome constitution. From the mapped sequence reads, we retained 644180 high-quality biallelic SNPs for constructing phylogenetic trees. The most SNPs were identified on chromosome 2E (17.9%, 115547), followed by 7E (17.9%, 115483), 6E (15.0%, 96590), 5E (14.0%, 90133), 3E (13.0%, 83853), 1E (12.4%, 79562) and 4E (9.8%, 63012). As expected, Th. elongatum and its derivatives clustered together. For XiaoYan693, the group 5 alien chromosome was more closely related to the E genomes of Th. elongatum (Ee) and Th. bessarabicum (Eb), whereas the remaining alien chromosomes clustered with the St genome of Ps. spicata. Based on these findings, we speculate that the alien genome of XiaoYan693 comprises six chromosomes of the St genome and one of the E genome (group 5). Similarly, the alien genome of CH18058 may consist of one pair of St chromosomes (1St) and the short arm of an Eb chromosome (5EbS) (Fig. 5e and Fig. S8).
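The per-chromosome SNP shares can be checked with the sketch below, which uses the counts reported above. The `p_distance` helper illustrates one common input for distance-based trees; the tree-building method used in the study is not specified here, so this choice is an assumption.

```python
# SNP counts per Th. elongatum chromosome, as reported in the text.
SNPS_PER_CHROM = {
    "1E": 79562, "2E": 115547, "3E": 83853, "4E": 63012,
    "5E": 90133, "6E": 96590, "7E": 115483,
}


def percent_share(counts):
    """Return the total and each chromosome's share of SNPs in percent (1 decimal)."""
    total = sum(counts.values())
    return total, {c: round(100 * n / total, 1) for c, n in counts.items()}


def p_distance(gt_a, gt_b, missing="N"):
    """Proportion of differing sites between two genotype strings,
    ignoring sites where either sample has a missing call."""
    pairs = [(a, b) for a, b in zip(gt_a, gt_b) if a != missing and b != missing]
    if not pairs:
        return float("nan")
    return sum(a != b for a, b in pairs) / len(pairs)
```

A matrix of such pairwise distances among samples could then be passed to a neighbor-joining or UPGMA implementation.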
An investigation of polymorphism in Fhb7 homologs from Thinopyrum and its derivatives
Fhb7 is a gene horizontally transferred from a fungus that confers broad resistance to Fusarium head blight (FHB) and crown rot in wheat without a yield penalty (Wang et al. 2020). We designed 26 probes in GenoBaits®WheatplusEE to capture the gene body and flanking sequences (± 2 kb) of Fhb7 (Fig. 6a). Targeted sequencing of Th. elongatum showed that both the coding and flanking regions of Fhb7 were successfully captured (Fig. 6b). Our results further confirmed that Fhb7 homologs are present in the genus Thinopyrum, including Th. bessarabicum, Th. intermedium and Th. ponticum (Guo et al. 2022). XiNong 511, a leading cultivar in China, was derived from a common wheat-Th. ponticum line and exhibits good resistance to FHB (Fig. 6c). However, Fhb7 was present in XiaoYan693 but absent in XiNong 511 (Fig. 6b), suggesting that the FHB resistance of XiNong 511 is probably conferred by a gene other than Fhb7. Among Thinopyrum species, only 12 high-quality biallelic SNPs occurred in the coding region of Fhb7, whereas considerably more nucleotide variation (154 SNPs) was found in the flanking regions. Furthermore, large deletions were observed in the Fhb7 flanking region of Th. intermedium and XiaoYan693. In summary, unlike solid chip-based technology, the GBTS platform is highly convenient because capture takes place in solution (liquid chip), which will better serve functional studies of Thinopyrum species and wheat improvement.
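Raw SNP counts (12 vs 154) do not account for the ±2 kb flanks being longer than the coding region. The sketch below reports SNPs per kb for each region; the function name and the coordinates used to exercise it are hypothetical, and real CDS intervals would come from the Fhb7 annotation.

```python
def snp_density_by_region(snp_positions, cds_intervals, locus_start, locus_end):
    """Count SNPs in coding vs flanking parts of a captured locus and give SNPs per kb.

    cds_intervals: list of (start, end) half-open coding intervals inside the locus.
    Everything else in [locus_start, locus_end) is treated as flanking sequence.
    """
    cds_len = sum(end - start for start, end in cds_intervals)
    flank_len = (locus_end - locus_start) - cds_len
    in_locus = [p for p in snp_positions if locus_start <= p < locus_end]
    n_cds = sum(any(s <= p < e for s, e in cds_intervals) for p in in_locus)
    n_flank = len(in_locus) - n_cds
    return {
        "cds": (n_cds, 1000 * n_cds / cds_len if cds_len else 0.0),
        "flank": (n_flank, 1000 * n_flank / flank_len if flank_len else 0.0),
    }
```

Comparing per-kb densities rather than raw counts makes it clear whether the flanks are truly more variable or simply longer.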