Genome Evolution and Diversity of Wild and Cultivated Rice Species

doi:10.21203/rs.3.rs-4350570/v1

Biological Sciences - Article

This work is licensed under a CC BY 4.0 License

Version 1

Rice (Oryza sativa L.) is a vital staple food globally, but its genetic diversity has decreased due to extensive breeding. However, research on genome evolution and diversity of wild rice species, particularly those with BB, CC, BBCC, CCDD, EE, FF, and GG genome types, is limited, impeding their use in rice breeding^1,2. This study presents chromosome-scale genomes of thirteen representative wild rice species from the Oryza genus. By integrating these genomes with four previously published ones, a total of 101,723 gene families were identified across the genus, including 9,834 (9.67%) core gene families. Additionally, 63,881 new gene families absent in cultivated rice species were discovered. Comparative genomic analysis among Oryza genomes reveals potential mechanisms underlying genome size variation, centromere evolution, and gene number and expression influenced by transposable elements. Extensive structural rearrangements, large-scale subgenome exchanges, and widespread allelic and regulatory sequence variations were discovered in wild rice. We identified an inversion that is pervasive in Oryza rufipogon and Oryza sativa japonica and is tightly linked to a locus that might have contributed to the expansion of geographical range. Interestingly, a notable expansion but less diversity in disease resistance genes in cultivated genomes was observed, likely due to the random loss of some R genes and extensive amplification of others for specific diseases during domestication and artificial selection. This comprehensive study not only makes a previously hidden genetic legacy accessible to genetic studies and breeding but also deepens our understanding of rice evolution and biology.

Biological sciences/Genetics/Genome/Genetic variation

Biological sciences/Plant sciences/Plant evolution

Rice (Oryza sativa L.) is a crucial crop globally and serves as a key model species for monocot and crop plant research³. Rice production will need to double by 2050 to meet the demand of the increasing world population⁴. Exploiting presence/absence genes and interspecies allelic diversity is expected to break the current production bottleneck that has caused rice yields to stagnate^5,6.

The Oryza genus comprises two cultivated rice species, Asian and African rice, along with 20 extant wild rice species. The wild species comprise six diploid genome types (AA, BB, CC, EE, FF and GG) and five allotetraploid genome types (BBCC, CCDD, HHJJ, HHKK and KKLL). They exhibit significant genetic and phenotypic diversity and are adapted to various ecological environments across Asia, Africa, America, and Australia^7,8.

Recent advancements in obtaining high-quality genome assemblies of cultivated rice have allowed for a thorough characterization of structural variation through comparative genomic analysis^9–11. Various studies have contributed to the construction of pangenomes that integrate cultivars and the AA genome of wild rice^12,13. Currently, high-quality reference genomes for 11 rice wild relatives (BB, BBCC, CC, EE, FF, and GG) and a comprehensive super pangenome of the rice genus are lacking. Here, we de novo assembled high-quality genomes from 13 wild rice accessions (Supplementary Table 1) and analyzed them together with published genomes of two cultivated Oryza sativa accessions (cv. Nipponbare, referred to as NIP, of the japonica subspecies and R498 of the indica subspecies), Oryza glaberrima (cv. CG14, African cultivated rice) and Oryza rufipogon (W1943)^9,14–16. The Oryza genus super pangenome ushers in a new era for rice researchers and breeders, providing abundant new resources to improve cultivated rice and meet future food demands.

High-quality assemblies of thirteen representative wild rice species

We selected thirteen representative wild rice species, including six allotetraploids and seven diploids, for de novo genome assembly. These accessions exhibit significant variation in geographical distribution and phenotype, as illustrated in Fig. 1a. A total of 331.3 Gb of high-fidelity (HiFi) reads were generated for the 13 wild rice accessions, representing approximately 40-fold coverage relative to the NIP genome size of around 400 Mb (Supplementary Table 1). Through the integration of high-throughput chromosome conformation capture^17,18, all 13 wild rice accessions were assembled at the chromosome level, resulting in genome sizes ranging from 393.77 Mb to 903.31 Mb with an average contig N50 of 36.80 Mb (Supplementary Table 2, Supplementary Fig. 1, Extended data Figs. 1, 2). Mapping Illumina short reads and HiFi reads to the 13 wild rice genome assemblies yielded alignment rates of more than 99.88% and 99.95%, respectively (Supplementary Fig. 2). On average, 99.44% of the 1614 single-copy orthologs (BUSCO) were identified in these assemblies¹⁹. Furthermore, the LTR assembly index (LAI) averaged 23.64 across all assemblies²⁰, which, along with high consensus quality values, indicates their superior quality, continuity and completeness (Supplementary Table 2, Supplementary Fig. 1).
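The contig N50 reported above is the length of the contig at which the cumulative length, counted from the longest contig downwards, first reaches half of the total assembly length. The following is a minimal sketch of that calculation; the function name and the toy contig lengths are illustrative assumptions and are not taken from the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (same unit as the input)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in Mb), not data from this study.
toy_contigs = [40, 30, 20, 5, 5]
print(n50(toy_contigs))
```

Because the statistic depends only on the sorted contig lengths, it can be computed directly from a FASTA index or any list of sequence lengths.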

Gene structure annotation was performed on the repeat-masked genomes using a combination of de novo, homology-based, and transcript-based prediction methods. This resulted in a higher number of protein-coding genes (ranging from 37,711 to 85,846) compared to previous reports on O. brachyantha and NIP²¹ (41,096) (Supplementary Table 2). Transcript data supported a high percentage (71.1% to 87.7%) of the predicted protein-coding genes, indicating the quality of the gene annotations (Supplementary Table 1). A phylogenetic tree based on 3555 single-copy orthologous genes grouped the 17 rice species into 5 clusters, which contradicted previous findings²² (Fig. 1b). Phylogenetic inference using 1000 randomly selected single-copy orthologs showed that 66.42% supported this topology, suggesting potential incomplete lineage sorting or gene flow (Fig. 1b). Additionally, a maximum-likelihood (ML) tree using chloroplast data was constructed to trace the maternal progenitors of allopolyploid wild rice (Supplementary Fig. 3). Transposable elements (TEs) play a significant role in shaping large plant genomes and driving genome evolution through periodic bursts of amplification (Supplementary Fig. 4). The TE content in the 13 wild rice genomes ranged from 35.11% to 76.35%, with long terminal repeat retrotransposons (LTR-RTs) being the most abundant (Fig. 1).

Construction of the super pangenome of the Oryza genus helps to reveal untapped genes and hidden genomic variations

The rice pangenome was expanded to include 16 species (17 subspecies) in the Oryza genus, incorporating genomes from three AA-genome Oryza species. Through OrthoFinder analysis²³, a super pangenome was constructed by clustering 808,478 predicted gene models from 13 wild rice species, three previously published AA genome rice genomes and a reference genome of Oryza sativa (NIP), resulting in 101,723 pangenome gene families (Extended data Fig. 3a, Extended data Fig. 4a). A total of 9.67% of the gene families (9,834) were shared among all 17 rice accessions and were classified as core gene families. Dispensable gene families, present in 2–15 individuals, made up 56.84% of the Oryza genus pangenome, while 33.48% were identified as species-specific gene families (Extended data Fig. 3a, b). Compared with cultivated rice, the super pangenome provides an additional 63,881 new gene families.
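The core, dispensable and species-specific categories above follow from counting, for each gene family, how many accessions carry at least one member. A minimal sketch of that bookkeeping is given below. The input layout (a dictionary from family ID to the set of accessions) and the function names are assumptions made for illustration, and the sketch simply treats every family found in more than one but not all accessions as dispensable, which is a simplification of the 2–15 range stated in the text.

```python
from collections import Counter


def classify_families(presence, n_accessions):
    """Label each gene family by how many accessions carry it.

    presence: dict mapping family ID -> set of accession names (hypothetical
    layout; any orthogroup table can be converted to this form).
    """
    labels = {}
    for family, accessions in presence.items():
        n = len(accessions)
        if n == 0:
            raise ValueError(f"family {family} has no members")
        if n == n_accessions:
            labels[family] = "core"
        elif n == 1:
            labels[family] = "specific"
        else:
            # Assumption: everything between one and all accessions.
            labels[family] = "dispensable"
    return labels


def category_percentages(labels):
    """Return the percentage of families in each category, rounded to 2 dp."""
    counts = Counter(labels.values())
    total = len(labels)
    return {cat: round(100 * n / total, 2) for cat, n in counts.items()}


# Toy example with three accessions; not data from this study.
toy = {"F1": {"A", "B", "C"}, "F2": {"A", "B"}, "F3": {"C"}, "F4": {"A"}}
print(category_percentages(classify_families(toy, n_accessions=3)))
```

The same arithmetic underlies the reported core fraction: 9,834 of 101,723 families is 9.67% when rounded to two decimal places.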

The 17 rice accessions, comprising 11 diploids and 6 allotetraploids, were categorized into diploid genome types labeled A to G (Extended data Fig. 4b). Additionally, a syntenic pangenome was constructed to analyze the differentiation among the 7 diploid genome types, revealing 16,849 core gene families shared across these genomes. Dispensable families accounted for 29.73% of the total gene sets (Extended data Fig. 4, Supplementary Table 3), and the largest proportion, 52.90% of the total gene sets, consisted of private gene sets unique to individual genome types (Supplementary Table 3).

Within the Oryza genus, approximately 81.14% of the core genes could be assigned to protein domains in the Pfam and InterPro databases, nearly twice the percentage for dispensable genes (41.82%) and more than seven times that for accession-specific genes (10.83%) (Extended data Fig. 3d). Core genes exhibited 6- to 20-fold higher expression levels compared to shell and private genes (Extended data Fig. 3e). This tendency was also reflected in gene length, and core genes showed significantly lower (0.15-fold on average) pairwise nonsynonymous/synonymous substitution ratios (Ka/Ks) than the dispensable genes (Extended data Figs. 3f, g), indicating conservation of function among core genes in the Oryza genus, while variable genes evolved more rapidly to adapt to diverse environments. The average LTR insertion ratio of core genes was notably lower than that of shell and species-specific genes (Extended data Fig. 3i), likely due to the lower number of exons per gene in accession-specific genes compared to dispensable and core genes (Supplementary Fig. 5), suggesting that exon shuffling or loss contributes to the specificity of these genes to each species.

Intriguingly, the core genes in the Oryza genus are primarily involved in fundamental functions such as transposition, iron ion binding, transport, and electron transport, indicating their role in maintaining essential activities of the Oryza genus (Extended data Fig. 3h, Supplementary Table 3). A further GO analysis of the private genes of the different rice genome types showed distinct functional differences, except for disease resistance (Supplementary Fig. 6). This Oryza genus super pangenome opens the door to utilizing non-AA genome wild rice resources in rice biology and breeding.

Transposon signatures contribute to variation in genome and centromere size in rice

The selective removal and retention of TEs has significantly influenced the genome size, adaptation and evolution of the Oryza genus²⁴. Which specific TE subfamilies influence genome size has long been an open question²⁵. To address this, we comprehensively classified TEs within the Oryza genus and analyzed in depth the expansion profiles of long terminal repeat (LTR) subfamilies (the top 6 selected from the largest genome in Oryza). Each LTR-RT subfamily displayed unique patterns of amplification across the species, impacting gene number and expression (Extended data Fig. 5b, Supplementary Fig. 7a, Supplementary Table 4). In addition to the amplification of Gypsy superfamily members (Ogre, Retand, and Tekay), the Angela LTR subfamily, which belongs to the Copia superfamily, emerged as a significant contributor to the large genome size of O. australiensis (E genome type), distinguishing it from other rice species (Extended data Fig. 5b, c, Supplementary Fig. 7a). The genome sizes of the B, C, and D genome types were primarily influenced by the top three Gypsy subfamilies, in descending order: Ogre, Retand and Tekay (Extended data Fig. 5c, Supplementary Table 4). The genome size of the G genome type (O. meyeriana) was predominantly influenced by Retand LTR amplification (Extended data Fig. 5c), and the abundance of Tekay in O. glumaepatula surpassed that in O. sativa, resulting in its slightly larger genome. Notably, O. brachyantha (F genome type) exhibited significant elimination of all LTR subfamilies compared to O. sativa (Extended data Fig. 5c), in line with the observed LTR density patterns across Oryza genomes (Supplementary Fig. 8). Genome size showed a stronger correlation with the Retand LTR subfamily than with the other subfamilies within Oryza (Extended data Fig. 5d). Although the SIRE and CRM subfamilies made up a small portion of the entire genome, they influenced the expression of a certain proportion of genes (Extended data Fig. 5c, Supplementary Fig. 7b, Supplementary Table 4).

The distribution of whole-genome intact long terminal repeats (LTRs) indicated that the majority of LTR bursts occurred in proximity to centromeric regions (Supplementary Fig. 8). Rice centromeres consist of organized satellite repeats (SRs), interrupted by centromere-specific retrotransposons (CRRs)²⁶. It remains unclear whether other lines or wild rice species possess unique centromere satellites and whether centromere repositioning events occurred during centromere evolution. In cultivated rice (MH63 and ZS97), 155 bp and 165 bp CentO satellite repeats were categorized into seven distinct subsets across the 12 chromosomes²⁷. Interestingly, only a few copies of satellite repeats were identified in the wild rice species O. meyeriana and O. brachyantha. Compared to the cultivated rice (NIP) genome, the C genome contains centromere-specific 126 bp and 366 bp Cent satellite repeats (Supplementary Table 4). Phylogenetic analysis showed that the satellite repeats in Oryza can be classified into four groups (Extended data Fig. 5e), with chromosomes of the same type tending to cluster together, supporting models of repeated amplification events involving the central domain and local homogenization. Examination of the genetic characteristics of centromeres in the Oryza genus revealed a decrease in gene density as centromeres are approached, along with an increase in transposon density and k-mer frequency (Extended data Fig. 5f). While the centromeres of wild rice, except Oryza brachyantha, were notably larger than those of cultivated rice, the opposite trend was observed for the number of genes (Extended data Fig. 5g, Supplementary Fig. 7c). A comparative sequence map of centromere synteny between cultivated rice and wild rice highlighted extensive structural rearrangements in centromeric and pericentric regions across the Oryza genus (Extended data Fig. 5h, Supplementary Fig. 9, Supplementary Table 4). Additionally, several centromere repositioning events were noted in the synteny analysis (Supplementary Fig. 9).

Large-scale chromosomal rearrangements and the inferred genome evolutionary history of Oryza lineages

To compare the karyotype stability between cultivated rice and wild diploid genomes, we created a synteny map and conducted whole-genome pairwise alignments to identify large segment variations, such as translocations and inversions⁷. Notably, large inversions (with more than 5 consecutive genes) shared by at least two consecutive species in the Oryza genus were prevalent in the genome alignments¹¹ (Extended data Fig. 6). Reports on segregating inversions in wild rice are scarce and have not yet included natural polyploid wild rice. Most of these events were observed in the low-recombining pericentromeric regions of the Oryza chromosomes, with a few inversions being species-specific (Extended data Fig. 6). The AA-type genomes displayed a significant level of chromosomal conservation. Furthermore, the gene synteny findings provide support for the phylogenetic position of the rice genus (Supplementary Fig. 10). Through multiple species/genome comparisons, many large-scale genomic rearrangements were validated, including an inversion of an approximately 2.53 Mb segment comprising 166 genes on Chr6 in the modern cultivated rice NIP (Fig. 2a). The inversion occurred only in common wild rice and in Oryza sativa japonica, which has a high-latitude distribution, suggesting that it has contributed to the expansion of geographical range. OsMFT1, which contributes to later flowering in Oryza sativa, was located in the inversion region, 89 kb from the breakpoint in Oryza glumaepatula²⁸ (Fig. 2b).
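The criterion of more than 5 consecutive genes in reversed order can be illustrated on a simple gene-order representation. The sketch below is not the alignment-based procedure used in this study. It assumes that orthologous genes along a query chromosome have already been translated into their rank on the reference chromosome, and the function name and toy ranks are hypothetical.

```python
def find_inverted_runs(ref_ranks, min_genes=6):
    """Return (start, end) index pairs of runs whose reference order is reversed.

    ref_ranks: rank of each orthologous gene on the reference chromosome,
    listed in query-chromosome order. A run is reported only if it contains
    at least min_genes genes (6 corresponds to "more than 5").
    """
    runs = []
    start = 0
    for i in range(1, len(ref_ranks) + 1):
        # A run ends at the end of the list or when the ranks stop decreasing.
        if i == len(ref_ranks) or ref_ranks[i] >= ref_ranks[i - 1]:
            if i - start >= min_genes:
                runs.append((start, i - 1))
            start = i
    return runs


# Toy gene order with one inverted block of seven genes; not real data.
toy_ranks = [0, 1, 2, 9, 8, 7, 6, 5, 4, 3, 10, 11]
print(find_inverted_runs(toy_ranks))
```

A real analysis would also need to tolerate missing orthologs and small local rearrangements inside an inverted block, which this strict run-based sketch does not attempt.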

Chromosomal rearrangements involving homoeologous groups 1, 3, and 6 of allotetraploid wild rice were initially identified in Oryza species (Extended data Fig. 7a). The comparison of syntenic blocks revealed that chromosome 6D_t in O. latifolia displayed complete collinearity with the corresponding 3D_t chromosomes in O. alta and O. grandiglumis. However, a reciprocal translocation was observed between 3D_t and 1C_t in O. alta and O. grandiglumis (Extended data Fig. 7a), due to a fragmental collinearity between 3C_t in O. latifolia and 1C_t in O. alta and O. grandiglumis. Additionally, a translocation between 1C_t and 6C_t was identified, with the 1C_t segment translocated to the end of 6C_t. Furthermore, a translocation was detected in homoeologous groups 4 and 7 in Oryza (Extended data Fig. 7c-i).

By aligning allotetraploid wild rice resequencing data to the corresponding diploid BB and CC or CC and EE genomes, homoeologous exchanges on each chromosome were identified based on the coverage depth calculated from unique reads. Several translocations of large segments between the subgenomes after tetraploidization were discovered (Extended data Fig. 7b, c). Chromosomes B_t1 and C_t1 exhibited high synteny, but the coverage depth of the reads mapped to the BB and CC genomes indicated a significant homoeologous exchange between them (Extended data Fig. 7b). The potential history of homoeologous exchange is depicted in Extended data Fig. 7d.
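The coverage-depth reasoning above can be sketched as follows: when a segment of one subgenome has been replaced by its homoeologous counterpart, reads from the tetraploid pile up at roughly double depth on one diploid reference and drop to near zero on the other. The function below is an illustrative sketch only. The normalization to a depth of about 1.0 in unaffected windows and the two thresholds are assumptions chosen for the example, not parameters reported in this study.

```python
def flag_exchange_windows(depth_b, depth_c, low=0.25, high=1.75):
    """Flag windows whose read depth suggests a homoeologous exchange.

    depth_b, depth_c: per-window depth of unique reads on homoeologous B and C
    chromosomes, each normalized so that an unaffected window is about 1.0.
    low and high are hypothetical thresholds. Returns one label per window.
    """
    if len(depth_b) != len(depth_c):
        raise ValueError("depth lists must have the same length")
    labels = []
    for b, c in zip(depth_b, depth_c):
        if b >= high and c <= low:
            labels.append("B replaced C")  # B depth doubled, C depth lost
        elif c >= high and b <= low:
            labels.append("C replaced B")  # C depth doubled, B depth lost
        else:
            labels.append("balanced")
    return labels


# Toy depths for three syntenic windows; not data from this study.
print(flag_exchange_windows([1.0, 2.0, 0.1], [1.0, 0.1, 1.9]))
```

Restricting the depth calculation to uniquely mapped reads, as described in the text, matters here because reads that map equally well to both subgenomes would blur the contrast between the two references.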

Discovery of untapped SVs in the Oryza genus sequence and their influence on agronomic traits

Despite extensive efforts to analyze genetic variants in cultivated rice and its ancestor species O. rufipogon ²⁹, the genetic diversity in distantly related wild rice species such as O. punctata, O. rhizomatis, and O. meyeriana remains poorly understood. We identified 2781–10656 insertions, 2680–10419 deletions, 4–52 translocations, and 7–22 inversions in the 16 rice accessions, with sizes ranging from 162.49-278.65 Mb, 182.13-705.17 Mb, 8.64-887.29 kb, and 41.51–11.33 Mb, respectively (Extended data Fig. 8a, Supplementary Table 5). Interestingly, the cultivated rice and the AA genome of wild rice showed a higher number of structural variations compared to the non-AA genome of wild rice, although the size of variation was smaller, likely due to more regions aligning with the reference genome (Extended data Fig. 8a). Wild rice species-specific SVs accounted for a significant portion of the total variation, indicating untapped genetic diversity in wild rice compared to cultivated varieties (Extended data Fig. 8a).

The majority of insertions, deletions, and inversions in the cultivars were shorter than 5 kb. As the length of SVs increased, the number of SVs decreased significantly in both cultivated and wild rice, although wild rice had a higher number of SVs larger than 250 kb, leading to the presence of numerous private genes in the wild rice genomes (Extended data Fig. 8b, Extended data Fig. 9a). Intergenic regions were the most common locations for SVs within the Oryza genus, followed by regions ±10 kb around genes (Extended data Fig. 9b, Supplementary Table 5), in line with previous research findings⁹. Surprisingly, insertion and deletion variations were less frequent at the chromosome ends (Extended data Fig. 8c). The private genes in each rice genome type and their corresponding presence-absence variations (PAVs) relative to NIP were also recorded (Extended data Fig. 8c, d). When the NIP genome sequence was used as a reference, cultivated rice displayed a higher number of SVs compared to wild rice, consistent with earlier observations (Extended data Fig. 8a). However, wild rice, particularly O. australiensis and O. meyeriana, exhibited substantial variation in SV sizes, indicating an enrichment of SVs in repetitive DNA regions (Extended data Fig. 9c). Further examination of transposable elements in PAV sequences revealed that other and DNA transposable elements were the primary components of both deletion and inversion variation (Extended data Fig. 9d).

By analyzing a large number of SVs across different rice genomes within a phylogenetic framework, we were able to uncover evolutionary events that would have otherwise gone undetected with a limited number of genomes. Recent findings suggested that gene loss could be linked to insertion/deletion events. For example, a 500 kb insertion relative to the NIP genome was identified on chromosome 12 at 14.50 Mb (Extended data Fig. 8e). This insertion occurred only in O. eichingeri and the C_t subgenome of O. punctata. Further detailed investigation revealed that the PAV region contained a gene (OPUW363G084108/OEIW71G043491) specific to the C genome wild rice³⁰ (Extended data Fig. 8e). This gene might have arisen de novo and contributed to the ability of wild rice to adapt to poor and problem soils in Sri Lanka. The phylogenetic results revealed that the gene originated in O. eichingeri and was then transferred to O. punctata, providing evidence that O. eichingeri was the progenitor of tetraploid O. punctata, consistent with our chloroplast evolution results. This result demonstrated that insertion variation occurred during the speciation of C genome wild rice, and that cultivated rice may have acquired SVs via introgression from hybridization with O. eichingeri.

A multitude of QTLs for brown planthopper resistance have been identified in O. officinalis, but the lack of genome sequences for wild rice has hindered the cloning of these genes³¹. In this study, we identified the Bph4 gene through a combination of comparative genome analysis and gene annotation within the QTL region (Supplementary Fig. 11). Haplotype analysis indicated that Bph4 is highly conserved in cultivated rice but diverse in wild rice.

Among the 13 wild rice accessions studied, only three species retained the functional S28 locus³², while the others lacked either the ribosomal protein S27 gene or the nearby UDPGT gene (Extended data Fig. 8f). The phylogenetic analysis of the Oryza genus suggested that the HS locus likely originated from O. australiensis and diverged from the C genome of wild rice (Extended data Fig. 8f).

Allelic and regulatory elements variations

The natural allelic variation of genes is essential for phenotypic diversity, environmental adaptation, and the process of domestication^33–35. Our analysis focused on variations in whole-genome alleles and their regulatory sequences (gene ±10 kb) in the rice genome, as there are very few highly collinear blocks between non-AA genome wild rice and cultivated rice (Extended data Fig. 10, Supplementary Table 6). As the divergence from cultivated rice increased, the number of collinear genes between wild rice diploids and cultivated rice decreased, ranging from 18,463 to 23,812, with an average of 20,288 (Supplementary Table 6). Including 19 published, high-quality, chromosome-level cultivated rice genomes (Supplementary Table 1) in our study allowed us to identify comprehensive SV resources for both wild and cultivated rice⁹. By mapping collinear genes with their 10 kb flanking regions onto the corresponding region of Nipponbare, we identified SNPs and InDels of 50 bp or greater as PAV targets. The total number of SVs increased with the number of accessions, with cultivated rice showing a higher percentage of nonredundant SVs compared to wild rice (Extended data Fig. 11a). The wild rice genomes exhibited a greater number of alleles and gene haplotypes than cultivated rice (Extended data Fig. 11b, c), indicating a rich source of novel genetic variations. To delve deeper into the functional impact of SVs on genes or proteins, it is essential to combine the variant alleles detected in each species into haplotypes and to annotate each accession independently. Wild rice (sub)genomes displayed a higher number of alleles in collinear genes within the core genome compared to cultivated rice (Extended data Fig. 11c). The number of gene haplotypes (gHap) and gene-coding sequence (CDS) haplotypes (gcHap) in wild rice was significantly greater than in cultivated rice (Extended data Fig. 11c).
Analyses of protein diversity in collinear genes between wild and cultivated rice have provided insight into their functional differentiation. A genome-wide protein cluster was created based on their domain similarity, revealing that wild rice had approximately 7 clusters, corresponding to the number of wild rice genome types, whereas cultivated rice predominantly clustered into one group, corresponding to the AA genome type (Extended data Fig. 11d). Furthermore, analysis of gene presence-absence variations (PAVs) distinguished major species and highlighted significant differences between wild and cultivated rice (Extended data Fig. 10b-d). The majority of group-unbalanced genes, accounting for 87.33%, were more prevalent in wild rice but less common in cultivated rice, underscoring the substantial legacy of mutations in wild rice (Extended data Fig. 11e). Notably, the selection for grain coat color during rice domestication is evident, with wild rice species predominantly displaying black and red grain, while most cultivars exhibit white seed coat color. Structural variation analyses revealed distinct haplotypes of the Rc protein among cultivated and wild rice accessions³⁶. Compared to the Rc haplotype in cultivated rice, wild rice exhibited 7 haplotypes corresponding to different genome types, suggesting that genetic divergence in Rc played a role in grain pericarp development during domestication (Extended data Fig. 11f).
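The haplotype counts discussed in this section amount to asking, for each collinear gene, how many distinct sequences occur across a set of accessions. The sketch below shows that counting step in its simplest form. It assumes that one sequence per gene and accession has already been extracted and aligned consistently, and the function name, input layout and toy sequences are hypothetical.

```python
def count_haplotypes(sequences_by_gene):
    """Count distinct sequences (haplotypes) per gene.

    sequences_by_gene: dict mapping gene ID -> dict of accession -> sequence
    (hypothetical layout). Case differences are ignored.
    """
    return {
        gene: len({seq.upper() for seq in by_accession.values()})
        for gene, by_accession in sequences_by_gene.items()
    }


# Toy CDS fragments for two genes in three accessions; not real data.
toy = {
    "geneA": {"acc1": "ATGGCC", "acc2": "ATGGCC", "acc3": "ATGGCT"},
    "geneB": {"acc1": "ATGAAA", "acc2": "atgaaa", "acc3": "ATGAAA"},
}
print(count_haplotypes(toy))
```

Running the same count separately on the wild and cultivated accession sets would give per-group haplotype numbers of the kind compared above, with gene-level and CDS-level haplotypes differing only in which sequence is supplied.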

Gene CNVs and NLR repertoire in rice

Recent studies have highlighted the significant role of genomic copy number variations (gCNVs) in the evolution and domestication of crops^37,38. However, the accurate identification of gCNVs in highly repetitive genome sequences within the rice genus poses notable challenges. Leveraging our high-quality assemblies, we systematically investigated gCNVs by aligning collinear blocks of the rice accessions against the Nipponbare reference genome, assessing their potential impact on important agronomic traits. Through whole-genome comparisons, we identified 207 genes with tandem repeats across the 14 wild rice assemblies, potentially influencing yield, grain quality, heading date, and biotic and abiotic resistance (Supplementary Table 7). To gain further insights into the functional roles of gCNVs in rice, we analyzed 4400 genes with known functions from a previous study³⁹. Among these genes, 36 exhibited tandem repeats in the rice genus, impacting various agronomic traits related to yield, disease and pest resistance (e.g., blast, bacterial blight, rice brown planthoppers), biotic stress tolerance, element transport, and other important adaptation traits like heading date and hybrid sterility (Fig. 3a). Additionally, we assessed the expression levels of selected gCNVs to investigate potential alterations in their expression profiles. Notably, several variations linked to the Pi9 cluster, with gCNVs in the 10.38 Mb region of chromosome 6 of the NIP genome, were also identified. Pi9 is a well-known gene in rice that offers strong and long-lasting resistance to the fungus M. oryzae⁴⁰. Interestingly, Pi9 is a typical NLR gene with copy number variation, which contributed to the environmental adaptation of rice species (Supplementary Fig. 12).

Genes encoding nucleotide-binding domain and leucine-rich repeat (NLR) proteins play a crucial role in plant immune systems⁴¹. Therefore, it is essential to have a comprehensive and accurate NLR dataset for the rice genus. Plant NLRs often occur in clusters, making their identification challenging. To address this issue, we used the RGAugury⁴² and DupGen_finder⁴³ tools, identifying a total of 7,048 NLR genes across the rice genus (Supplementary Table 7). The number of NLR genes varied from 419 in O. glaberrima to 511 in O. sativa indica (R498) in cultivated rice and from 159 in O. australiensis to 669 in O. punctata in wild rice (Fig. 3b, Supplementary Table 7). This suggests that the immune system in wild rice has a more diverse evolutionary history compared to cultivated rice.

Our study focused on identifying and categorizing NLRs in different rice species to establish a comprehensive understanding of NLR diversity within rice genus. Interestingly, the diploid rice genome exhibited a lower number of NLR in wild rice compared to cultivated rice, despite the larger genome size in wild rice (Fig. 3f). For instance, the genomes of O. australiensis and O. meyeriana, although twice the size of NIP, contain only half the number of NLRs of that in cultivated rice (Fig. 3f). Analysis of NLR distribution showed that while R gene singletons were similar between wild and cultivated rice, cultivated rice tended to have a higher number of R genes in pairs or clusters compared to wild rice (Fig. 3b-d). Redundancy analysis revealed that 55.64% of NLR signatures were shared across all genomes, with 15 unique signatures in the cultivated group and 162 unique signatures in the wild group (Fig. 3g). The study found that as the number of cultivated rice accessions increased, the number of core NLR signatures also tended to increase. Redundancy analysis of the NLR gene in wild and cultivated rice revealed that 78.8% of the NLR genes in cultivated rice were dispensable, slightly lower than in wild rice (Fig. 3h). More than 90% of NLR genes in the core NLR genome were expressed in both wild and cultivated rice, while around 20% of NLR genes in the dispensable genome exhibited low or no expression under normal conditions, suggesting specific expression upon encountering disease (Fig. 3i).
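The distinction between singleton, paired and clustered NLR genes can be illustrated with a simple distance-based grouping along one chromosome. This is a hypothetical sketch and not the RGAugury or DupGen_finder procedure used in this study. The function name, the 200 kb gap threshold and the toy coordinates are all assumptions made for the example.

```python
def group_nlrs(positions, max_gap=200_000):
    """Group NLR genes on one chromosome into singletons, pairs and clusters.

    positions: start coordinates (bp) of NLR genes on a single chromosome.
    A gene joins the current group if it lies no more than max_gap bp
    (hypothetical threshold) from the previous gene in that group.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    summary = {"singleton": 0, "pair": 0, "cluster": 0}
    for group in groups:
        if len(group) == 1:
            summary["singleton"] += 1
        elif len(group) == 2:
            summary["pair"] += 1
        else:
            summary["cluster"] += 1
    return summary


# Toy coordinates for six NLR genes; not data from this study.
toy_positions = [100_000, 150_000, 1_000_000, 2_000_000, 2_100_000, 2_250_000]
print(group_nlrs(toy_positions))
```

The summary counts groups rather than genes, so comparing wild and cultivated genomes on this basis would require the same gap threshold for every genome, and results would shift with the threshold chosen.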

We classified the NLRs of the rice genus into 369 clusters, of which 167 were sharply expanded in cultivated rice (Supplementary Table 7), including well-studied rice R gene families that provide resistance to bacterial blight caused by Xanthomonas oryzae pv. oryzae (Xoo), such as WRKY61 (Fig. 3j). Additionally, an NLR expansion event was observed in the wild rice pangenome (Fig. 3j), enabling these plants to adapt to various environments compared to cultivars. By leveraging NLR genes lost during domestication and artificial selection, we can enhance the resistance resources of cultivated rice and enrich the diversity of modern commercial rice. The total number of NLRs in cultivated rice species has increased compared to diploid wild rice species, despite some NLR gene losses, indicating that NLR expansion into clusters may be driven by breeding for specific pathogen resistance.

We integrated high-quality assemblies of 13 wild rice species, three cultivated rice accessions and one common wild rice to construct a comprehensive super pangenome of the rice genus. Compared with cultivated rice, the super pangenome provides an additional 63,881 new gene families. Notably, we reconstructed the phylogenetic tree of Oryza at the genome level and corrected the evolutionary positions of the BB, CC, FF and GG rice species. Our analysis delved into pervasive structural variations, examining their size and distribution across Oryza.

In addition to the Oryza pangenome resources we present, our study also exemplifies how these new resources can enhance our understanding of the role of SVs, gCNVs, and allelic variation in the processes of environmental adaptation, domestication, differentiation, and artificial selection in rice. Moreover, our examination of why genome sizes vary significantly during evolution in Oryza, and of which component of repeat sequences contributes predominantly to rice genome size, serves as a model for similar analyses in other plant species. Additionally, we observed that the number of NLRs in cultivated rice exceeded that in diploid wild rice genomes, although cultivated rice exhibited lower disease resistance than wild rice (Fig. 3e-j). The number of clustered NLRs in cultivated rice was notably higher than in wild rice, suggesting that some additional copies of NLRs may be redundant in ensuring resistance in cultivated rice. This aligns with the notion that multiple NLRs are necessary for the broad-spectrum resistance of Tetep to blast⁴⁴.

The next step in Oryza pangenomics will be to use gene editing to assess how private genes and alleles affect yield, resistance to various diseases, and adaptation to changing environments.

Acknowledgments

We thank all members of the Longan Yan group at the Jiangxi Academy of Agricultural Sciences for collecting and preserving the wild rice resources. We thank Dr. Zhilan Fan at the Guangdong Academy of Agricultural Sciences for providing the GG genome type wild rice O. meyeriana, and Dr. Shengyi Liu at the Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, for constructive suggestions. This work was supported by the China Agriculture Research System (CARS-01-08), the National Key Research and Development Program of China (2017YFD0100302, 2023YFD1201203), the National Natural Science Foundation of China (31960400), the Major Discipline Academic and Technical Leaders Training Program of Jiangxi Province (20213BCJL22044), and the Jiangxi Technology Innovation Guidance Program (20223AEI91010).

Author contribution

L. Y. and Y. C. supervised the work. L. L., L. L., W. X., and Y. L. collected samples for resequencing, Hi-C, and HiFi sequencing. M. W. performed the genome assembly. Q. H. performed the genome annotation. Y. W. and Y. W. constructed the rice super pangenome. Q. H. and Y. W. conducted SV and CNV identification. J. W., Z. Y., and W. C. collected samples for RNA-seq and conducted expression validation. W. L., H. D., and H. X. designed the experiments and wrote the manuscript.

Declaration of interests

The authors declare no competing interests.

References

1. Yu, H. et al. A route to de novo domestication of wild allotetraploid rice. Cell 184, 1156-1170 e1114, doi:10.1016/j.cell.2021.01.013 (2021).
2. Huang, C., Chen, Z. & Liang, C. Oryza pan-genomics: A new foundation for future rice research and improvement. The Crop Journal 9, 622-632, doi:10.1016/j.cj.2021.04.003 (2021).
3. Wing, R. A., Purugganan, M. D. & Zhang, Q. The rice genome revolution: from an ancient grain to Green Super Rice. Nat Rev Genet 19, 505-517, doi:10.1038/s41576-018-0024-z (2018).
4. Walkowiak, S. et al. Multiple wheat genomes reveal global variation in modern breeding. Nature 588, 277-283, doi:10.1038/s41586-020-2961-x (2020).
5. Wang, W. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557, 43-49, doi:10.1038/s41586-018-0063-9 (2018).
6. Khan, A. W. et al. Super-Pangenome by Integrating the Wild Side of a Species for Accelerated Crop Improvement. Trends Plant Sci 25, 148-158, doi:10.1016/j.tplants.2019.10.012 (2020).
7. Stein, J. C. et al. Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza. Nat Genet 50, 285-296, doi:10.1038/s41588-018-0040-0 (2018).
8. Ge, S., Sang, T., Lu, B. R. & Hong, D. Y. Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proc Natl Acad Sci U S A 96, 14400-14405, doi:10.1073/pnas.96.25.14400 (1999).
9. Qin, P. et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell 184, 3542-3558 e3516, doi:10.1016/j.cell.2021.04.046 (2021).
10. Li, N. et al. Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species. Nat Genet, doi:10.1038/s41588-023-01340-y (2023).
11. Zhou, Y. et al. Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice. Nat Commun 14, 1567, doi:10.1038/s41467-023-37004-y (2023).
12. Zhao, Q. et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat Genet 50, 278-284, doi:10.1038/s41588-018-0041-z (2018).
13. Shang, L. G. et al. A super pan-genomic landscape of rice. Cell Research 32, 878-896, doi:10.1038/s41422-022-00685-z (2022).
14. Xie, X. et al. A chromosome-level genome assembly of the wild rice Oryza rufipogon facilitates tracing the origins of Asian cultivated rice. Sci China Life Sci 64, 282-293, doi:10.1007/s11427-020-1738-x (2021).
15. Du, H. et al. Sequencing and de novo assembly of a near complete indica rice genome. Nat Commun 8, 15324, doi:10.1038/ncomms15324 (2017).
16. Wang, M. et al. The genome sequence of African rice (Oryza glaberrima) and evidence for independent domestication. Nat Genet 46, 982-988, doi:10.1038/ng.3044 (2014).
17. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol 31, 1119-1125, doi:10.1038/nbt.2727 (2013).
18. Kaplan, N. & Dekker, J. High-throughput genome scaffolding from in vivo DNA interaction frequency. Nat Biotechnol 31, 1143-1147, doi:10.1038/nbt.2768 (2013).
19. Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210-3212, doi:10.1093/bioinformatics/btv351 (2015).
20. Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res 46, e126, doi:10.1093/nar/gky730 (2018).
21. Chen, J. et al. Whole-genome sequencing of Oryza brachyantha reveals mechanisms underlying Oryza genome evolution. Nat Commun 4, 1595, doi:10.1038/ncomms2596 (2013).
22. Zou, X. H. et al. Analysis of 142 genes resolves the rapid diversification of the rice genus. Genome Biol 9, R49, doi:10.1186/gb-2008-9-3-r49 (2008).
23. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20, 238, doi:10.1186/s13059-019-1832-y (2019).
24. Pulido, M. & Casacuberta, J. M. Transposable element evolution in plant genome ecosystems. Curr Opin Plant Biol 75, 102418, doi:10.1016/j.pbi.2023.102418 (2023).
25. Kidwell, M. G. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115, 49-63, doi:10.1023/a:1016072014259 (2002).
26. Comai, L., Maheshwari, S. & Marimuthu, M. P. A. Plant centromeres. Curr Opin Plant Biol 36, 158-167, doi:10.1016/j.pbi.2017.03.003 (2017).
27. Song, J. M. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol Plant 14, 1757-1767, doi:10.1016/j.molp.2021.06.018 (2021).
28. Song, S. et al. OsMFT1 increases spikelets per panicle and delays heading date in rice by suppressing Ehd1, FZP and SEPALLATA-like genes. J Exp Bot 69, 4283-4293, doi:10.1093/jxb/ery232 (2018).
29. Kou, Y. et al. Evolutionary Genomics of Structural Variation in Asian Rice (Oryza sativa) Domestication. Mol Biol Evol 37, 3507-3524, doi:10.1093/molbev/msaa185 (2020).
30. Gamuyao, R. et al. The protein kinase Pstol1 from traditional rice confers tolerance of phosphorus deficiency. Nature 488, 535-539, doi:10.1038/nature11346 (2012).
31. Hu, J. et al. Fine mapping and pyramiding of brown planthopper resistance genes QBph3 and QBph4 in an introgression line from wild rice O. officinalis. Molecular Breeding 35, doi:10.1007/s11032-015-0228-2 (2015).
32. Yamagata, Y. et al. Mitochondrial gene in the nuclear genome induces reproductive barrier in rice. Proc Natl Acad Sci U S A 107, 1494-1499, doi:10.1073/pnas.0908283107 (2010).
33. Bai, F. et al. Natural allelic variation in GRAIN SIZE AND WEIGHT 3 of wild rice regulates the grain size and weight. Plant Physiol 193, 502-518, doi:10.1093/plphys/kiad320 (2023).
34. Sun, X. et al. Natural variation of DROT1 confers drought adaptation in upland rice. Nat Commun 13, 4265, doi:10.1038/s41467-022-31844-w (2022).
35. Huang, X. et al. Natural variation at the DEP1 locus enhances grain yield in rice. Nat Genet 41, 494-497, doi:10.1038/ng.352 (2009).
36. Furukawa, T. et al. The Rc and Rd genes are involved in proanthocyanidin synthesis in rice pericarp. Plant J 49, 91-102, doi:10.1111/j.1365-313X.2006.02958.x (2007).
37. Wang, Y. et al. Copy number variation at the GL7 locus contributes to grain size diversity in rice. Nat Genet 47, 944-948, doi:10.1038/ng.3346 (2015).
38. Deng, Y. et al. Epigenetic regulation of antagonistic receptors confers rice blast resistance with yield balance. Science 355, 962-965, doi:10.1126/science.aai8898 (2017).
39. Huang, F. et al. New Data and New Features of the FunRiceGenes (Functionally Characterized Rice Genes) Database: 2021 Update. Rice (N Y) 15, 23, doi:10.1186/s12284-022-00569-1 (2022).
40. Qu, S. et al. The broad-spectrum blast resistance gene Pi9 encodes a nucleotide-binding site-leucine-rich repeat protein and is a member of a multigene family in rice. Genetics 172, 1901-1914, doi:10.1534/genetics.105.044891 (2006).
41. Feehan, J. M., Castel, B., Bentham, A. R. & Jones, J. D. Plant NLRs get by with a little help from their friends. Curr Opin Plant Biol 56, 99-108, doi:10.1016/j.pbi.2020.04.006 (2020).
42. Li, P. et al. RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants. BMC Genomics 17, 852, doi:10.1186/s12864-016-3197-x (2016).
43. Qiao, X. et al. Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. Genome Biol 20, 38, doi:10.1186/s13059-019-1650-2 (2019).
44. Wang, L. et al. Large-scale identification and functional analysis of NLR genes in blast resistance in the Tetep rice genome sequence. Proc Natl Acad Sci U S A 116, 18479-18487, doi:10.1073/pnas.1910229116 (2019).

Supplementary Files

Supplementary Fig. 1 (Fig.S1.tif)
Supplementary Fig. 2 (Fig.S2.tif)
Supplementary Fig. 3 (Fig.S3.tif)
Supplementary Fig. 4 (Fig.S4.tif)
Supplementary Fig. 6 (Fig.S6.tif)
Supplementary Fig. 7 (Fig.S7.tif)
Supplementary Fig. 9 (Fig.S9.tif)
Supplementary Table 1 (TableS1.xlsx)
Supplementary Table 3 (TableS3.xlsx)
Supplementary Table 4 (TableS4.xlsx)
Supplementary Table 5 (TableS5.xlsx)
Supplementary Table 6 (TableS6.xlsx)
Supplementary Table 7 (TableS7.xlsx)
Extended Data Fig. 1 (ExtendeddataFig.1.jpg)
Extended Data Fig. 2 (ExtendedDataFig.2.jpg)
Extended Data Fig. 3 (ExtendeddataFig.3.tif)
Extended Data Fig. 5 (extendeddataFig.5.tif)
Extended Data Fig. 7 (ExtendeddataFig.7.tif)
Extended Data Fig. 8 (ExtendeddataFig.8.jpg)
Extended Data Fig. 10 (ExtendeddataFig.10.tif)
