Population Structure Analysis
We first performed global comparisons between accessions from Asia and those from Africa using a Principal Component Analysis (PCA) (Fig. 1b). A total of 2710 individuals were studied, with a total of 7.315.477 unrelated SNPs spread across the entire genome. The accessions are well separated in three main clusters reflecting well-known genetic groups on the first two axes. On the first axis (39% of variance), accessions representing the subspecies Japonica (GJ) are differentiated from the subspecies Indica (XI) accessions, represented by varied shades of red-orange. On the second axis (21% of the variance), a third group, composed of the cA ecotype, appears separated from the GJ and XI subspecies. On the second component, the basmati rice varieties seem to be more related to the GJ group. There is no dimensional variation among the three GJ subgroups (GJ-trp, GJ-tmp, and GJ-sbtrp). The South Asian subpopulation XI-2 is distinct from the other XI subpopulations. It has a genetic similarity to the cA group. Most of these results have already been described in studies of rice diversity (Wang et al. 2018; Wang et al. 2018). On the basis of the PCA, the Japonica accessions from Africa appear undifferentiated from those from Asia. They are fully encompassed in the Japonica cloud. The predominant Japonica subgroup in Africa is GJ-trp. There is only one GJ-tmp accession (from Tanzania) and one GJ-sbtrp accession (from Côte d'Ivoire) among the 2710 accessions. Circum-Basmati rice (cB) and circum-Aus (cA) ecotypes are very rare in the African 3K-RG sample. We identified only three cB accessions in Africa, two of which are in Madagascar, and one in Liberia, and only one cA accession, from Kenya. These four accessions do not show any peculiarity in the PCA space. In contrast, all Indica subgroups (XI-1A, XI-1B, XI-2, and XI-3) are found in Africa. Their distribution in the PCA feature space conforms to the Asian variation. Altogether there was no obvious differentiation of African groups and subgroups compared to their Asian homologs.
The ADMIXTURE method served to test whether the African accessions exhibit specific constitutions compared to the major Asian groups (Fig. 1c). The cross-validation error is lowest for K = 9 (Fig. 1d). At K = 9, we find the same genetic groups that have previously been identified for this species. There was no cluster specific to African accessions, either Indica or Japonica, in this global inference analysis. The ADMIXTURE analysis assigned African accessions to the same cluster as Asian accessions.
PCA analyses on accessions representing only the GJ-trp reveal a slight geographical structuring of tropical Japonica groups (Fig. 2). The first axis (13% of variance) separates some of the GJ-trp from Indonesia from other Asian and African GJ-trp. The GJ-trp from the East Asian countries are distinguished from the other GJ-trp on the second axis (9% of variance). These results correlate well with the patterns observed by (Gutaker et al., 2020) within Japonica and more finely among varieties classified as tropical. On either side of axis 2, the West African GJ-trop (upland accessions) are divided into two groups (Fig. 2a). The West African GJ-trp stand out from most of the other GJ-trp on axis 3, together with accessions from Madagascar (Fig. 2b). As a result, axis 3, which accounts for 7% of the variance, allows to distinguish Japonica accessions originating from different geographic area (from Africa versus Asia). At the interface, a set of ten accessions originating from Indonesia are located between the African group and the GJ-trp accessions from Asia on the PCA (Fig. 2).
Local Ancestry Estimates
We investigated the genomic differentiation between African and Asian Japonica by chromosome painting with two approaches, PCA-KDE (Principal Components Anaysis – Kernel Density Estimation) and ELAI (Efficient Local Ancestry Inference). PCA-KDE (Santos et al., 2019) was used to locally assign blocks of 150 SNPs to Japonica, Indica and cA references. The individuals selected to represent the ancestral poles faithfully reflect the perfect genetic diversity structure of the Oryza sativa species (Fig. S1). For each chromosome, we represented the corresponding ideogram (Fig. S2). Within chromosome 1, seven GJ-trp accessions from Asia (Fig. S2) share a haplotype of more than 10 Mb (from 9.2 Mb to 19.3 Mb). This haplotype block is not found in African GJ-trp (Fig. S2). A significant extended haplotype on chromosome 6 (Fig. 3) of 3.8 Mb (between 17.9–21.7 Mb) and accounting for 2% of cA ancestry (yellow colour) is shared by 14 accessions (Table S1) of tropical Japonica (GJ-trp) in West Africa (Fig. 3). Two accessions (IRIS_313-11103, IRIS_313-11104) share a shorter cA haplotype (between 17.9 and 20.26 Mb) with the previous 14 accessions. The blocks assigned to cA are highlighted in Fig. 6c. We also note on the end of chromosome 6, between 25.8 and 26.7 Mb, a block of haplotype of cA ancestry on half of the GJ-trp accessions from Africa.
The findings of the ELAI allelic assay also highlight regions on chromosome 6 of cA ancestry in the GJ-trp rice population from Africa (Fig. S3). Within the regions assigned to cA by the PCA-KDE method, ELAI consistently identifies an occurrence of two cA alleles. ELAI differs from the PCA-KDE approach by directly estimating ancestry blocks by considering recombination through switch methods between different layers of the hidden Markov chain. The smoothness of the ideograms produced by the ELAI method is strongly influenced by the mg parameter (generation time after hybridisation). Therefore, this parameter affects the precision of allele inference in the source populations (Fig. S4-S7). The results with mg = 20 are comparable with the PCA-KDE approach (Fig. S8). The above results suggest a hybridization between the GJ-trp accessions and the circumAus subgroup in the generation of West African upland accessions.
Testing For Introgression
Local ancestry results identify at chromosome 6, a haplotype block of cA on sixteen GJ-trp accessions from West Africa. To assess whether this trace was left by hybridization or rather by incomplete lineage sorting (ILS), we applied the ABBA-BABA method (Durand et al., 2011) on the whole genome, and per chromosome. The analysis was performed twice by consciously changing the cA accessions. The sizes of the respective populations P1, P2, P3 and P4 (Fig. 4b) used are 322, 50, 201 (two randomly selected sets) and 3, respectively. O. glaberrima was used as an outgroup (P4). The results of the D-statistics are indicative of gene flow between cA and GJ-trp from Africa. The significance of the D values was tested by computing the standard deviations (SD) with the Jackniffe block procedure. We subdivided the genome into 379 blocks. The genome-wide mean value is D = 0.039, which is not very far from D = 0 (null hypothesis). The associated Z-score value is 0.56, which means that D is not significantly different from 0 (D is significant when Z is less than − 3 or greater than 3). As a result, we cannot conclude for the occurrence of gene flow between cA and GJ-trp at the genome-wide level. We quantified the admixture proportion (f), from the P3 population (cA) to the P2 population (African GJ-trp accessions), following the procedure described by S. Martin (https://github.com/simonhmartin/tutorials/blob/master/ABBA_BABA_whole_genome/README.md). Consistent with the ideogram plots, the percentage of cA genome in the African GJ-trp found is 0.22%.
Looking for footprints of introgression on each chromosome (Fig. 4a) we detected a significantly positive D value only for chromosome 6 (D = 0.55, Z = 4.39) (Table S2). This result suggests that the derived allele (alternative allele to the Nipponbare reference) is shared by the GJ-trp accessions from West Africa more often than expected by chance, strongly pointing to introgression of cA material into the West African GJ-trp accessions genetic background.
Using TreeMix, we were able to determine the genomic mixing patterns between cA and the geographically subdivided (Fig. 5a) tropical Japonica accessions (GJ-trp) between Asia and Africa (see Materials and Methods). The variance-covariance matrix shows that the model with no migration, m = 0, explains 95.3% of the total variance. In the absence of gene flow, this model predicts the most probable population splits. In the case where m = 1, the results show a first hybridization signal (in this case, the most relevant one) involving O. glaberrima species to generate the improved NERICA rice varieties. The inferred tree explains 98.3% of the variation with two simulated migration events (m = 2). Implemented on each chromosome, we could again detect a signal of gene flow from cA accessions towards West African GJ-trp (Fig. 5b).
The Southeast Asian GJ-trp (As4 and As5) and West African GJ-trp are likewise genetically close. The TreeMix results show that the African GJ-trp accessions are more closely linked to the Southeast Asian GJ-trp, but also highlight a hybridization signal between the cA and the West African GJ-trp on chromosome 6, as suggested by Patterson's D statistic.
Genetic Diversity Between Populations
The fixation index, abbreviated as Fst, assesses the degree of genetic differentiation between populations or distinct groups, ranging from 0 (no differentiation) to 1 (complete differentiation). The most striking global differentiation involves GJ-trp As7 vs GJ-trp Int+ (Fst = 0.55). (Fig. S9). The Fst values are generally low among Asian populations. Fst is comparatively quite low between the tropical Japonica accessions of As2, As4, As5, and As6 (between minimum 0.05 and maximum 0.11) while it shows a higher range when As7 is involved (Fst value between 0.20 and 0.37), confirming that tropical Japonica from North-East Asia (As7: Japan and Republic of Korea) appears to be distinct, possibly due to gene exchange with temperate forms. Genetically, the tropical Japonica accessions from Madagascar are more closely linked to those from Southeast Asia (As5), with a relatively low Fst of 0.06. These same Madagascan accessions differ slightly (Fst = 0.08) from West African upland rice without the cA introgression (GJ-trp Int1-). This Fst is 0.26 for accessions with the cA introgression (GJ-trp Int1+). The measured difference between GJ-trp Int1 + and GJ-trp Int1- is 0.19.
On chromosome 6, where we observed a signal of cAus introgression in the West African Tropical accessions, we found a very high value of Fst, differentiated between Int + and Int-. (Fig. 6a). The level of differentiation measured along chromosome 6 is quite similar between the West African tropical Japonica accessions with introgression and the other populations in Africa (tropical Japonica without introgression and the Madagascar accessions) and Asia (As2, As4, As5, As6 and As7). In all comparisons, a peak in Fst value was found on chromosome 6 between 18 and 22 Mb, corresponding to the region of cA introgression (Fig. S10). The highest values were found between GJ-trop Int + vs GJ-trop As6 (0.8) or vs GJ-trop As7 (near 1). Unlike in comparisons of other Japonica groups without introgression, the Fst values are very heterogeneous, and the peaks appear to be located at the telomeric regions on q arm (Fig. S11).
The level of intra-population diversity has no impact on Dxy, which measures the absolute genetic divergence between populations. It is more influenced by ancestral alleles and substitution rate. In contrast to the rest of chromosome 6, our results show a high level of absolute divergence around the introgression region of GJ-trop Int + relative to the different subgroups of tropical Japonica (Fig. S12). This level of absolute divergence was not observed when we compared West African Japonica that did not introgress the cA fragment to Asian populations, nor when we compared Asian populations to Asian populations (Fig. S13).
The degree of sequence polymorphism in a population is measured by nucleotide diversity (pi). The level of nucleotide diversity along chromosome 6 between tropical Japonica accessions from Africa shows a difference depending on whether the cAus introgression is present (Fig. 6d). Accessions with the cAus introgression (Int+) have a significantly lower level of genetic diversity than those without the introgression (Int-). GJ-trop As4 and GJ-trop Int + have remarkably low (near zero) pi values on chromosome 6 between 11 and 20 Mb (Fig. S14). The GJ-trop Int + and GJ-trop AS7 populations have the lowest degrees of polymorphism, 0.0034 and 0.0087 respectively.
Tajima's D value, which detect non-random evolution of sequences, is overall negative for more than three quarters of the chromosome 6 in the African accessions with the cA introgression. This pattern was not observed in the other populations. Interestingly, African accessions that were not introgressed with cA have a high positive Tajima D value along chromosome 6, except between 18 and 20 Mb, where it falls below zero (Fig. 6e).
Origin of the introgression
We investigated the diversity in and around the main introgressions in order to derive information on the origin and the process that led to the observed patterns. We focussed on the main two cA introgressions in African Japonica accessions on chromosome 6.
For the larger one (Int1), we extracted polymorphism data within the 3.8 Mb, between 17.9 and 21.7 Mb on chromosome 6 and compared the African upland GJ-trp that bear the introgression with the cA accessions. The African upland accessions that bear the full length Int1 all cluster on one clade in the resulting tree, whereas the two accessions with the smaller cA Int1 fragment (Fig. S15) fall together further in the tree. The cluster of the full Int1-bearing accessions is homogenous and clearly separated within a large branch of cA accessions that displays hardly any structure. Altogether this suggests that Int1 has a unique origin but that it does not point to a particular compartment of the cA group. For the smaller one (Int2), we extracted data for the 350 kb segment between 25.70 and 26.05 Mb and ran the same analysis. All but one (GBRA) of the GJ-trp accessions that fell with the cA varieties, that is bearing Int2, formed a small cluster together with 13 cA varieties (Fig. S16a,b). Despite their low number, these accessions have very diverse geographic origins. Note that these 13 varieties are part of the large cA branch that bears the cluster of varieties which have Int1. Accession GBRA falls in the cA cluster, but in another sub-branch, suggesting that it does not have Int2 per se.
Regarding the location of “integration”, we looked at the most informative external borders, i.e. those that are shared by a majority of the introgression patterns and that fall in a region with an easily traceable origin. They were the left external border of Int1, that we characterized on the basis of the 17.7–17.8 Mb segment, and the right outer border of Int2, that we characterized with the 21.7–21.8 Mb segment. The other borders are less stable in location and fall into regions with complex origins, as illustrated by the intermediate colours on the ideograms of Fig. 3. The selected regions of 100 kb are expected as speakers in the most recent introgression reduction.
The Int1 left outer border analysis reveals eight clearly distinct branches corresponding to cA, cB and several clusters within XI and within GJ. West African cultivars that have Int1 form a long branch with a sawtooth (annotated with asterisk in Fig. 6f) tip that connects close to the root of the tree, with varieties that do not have Int1 and originate from Africa, Madagascar and the Islands of Southeast Asia, Philippines and Indonesia (For more details see Fig. S17).
Given the broad diversity observed on this border, we extended the analysis to higher values along the chromosome and resolved the diversity patterns at the finest scale, enabling localization of the latest recombination point within less than a kb (between 17,904,819 and 17,905,605 see Table S3).
For Int2, the analysis of the 100 kb between 26.75 and 26.85 Mb shows a classical pattern with Japonica, Indica and cA major branches bearing a limited number of exceptions. Japonica appears divided into several sub-branches and the West African Japonica varieties bearing Int2 are distributed in a GJ-trp sub-branch, in company with varieties from insular Southeast Asia, continental Africa and Madagascar (Fig. S16b).
Considering the stability of the borders and the length of Int1 on chromosome 6 introgression, we attempted to extract loci which display specificity among the Int1 + materials in order to try and reconstitute the history of the introgression. The results are given in Table S3. Two main cases were found: on one side 40 loci displayed an allele present in (almost) all Int1 + materials and absent or very rare in the Int1-; on the other side, 6 loci displayed an allele found in only one Int1+. No intermediate case was found, thus impeding meaningful phylogenetic analysis. The former 40 cases are helpful to trace the initial source materials. Most of the occurrence outside the Int1 + materials was concentrated in 13 accessions that have a broad geographic distribution, including outside Asia, and that are classified Indica, cA or admixed. It features multiple haplotypic combinations that suggest ancient populational relationships rather than recent cut and paste recombinational relationships. The latter 6 cases are likely to characterize recent mutations that occurred after the start of the introgression process. These data do not allow any clear/firm conclusion but they can be useful for further analyses with broader materials.
Positive Selection Of Candidate Region
Using the rehh v3.3.2 program (Gautier et al., 2017), we detected signals of selection on four regions on chromosome 6 in the African GJ-trp group. These locations are related with a list of fifteen outlier SNPs with p-values greater than the -log(p-value = 5) threshold. The iHS statistic values for these fifteen SNPs are all negative, indicating that the alternative allele (non Nipponbare allele) is driving the selection. The candidate regions are shown in Fig. 6b. Two of these areas are situated between 17.9 and 21.7 Mb in the cA introgression region described above, while one region is located between 25.8 and 26.7 Mb in the second cA introgression region.
When performed on the Asian population, we detected signal of selection in two regions. They are between chr6:14,554,337 and 14,995,883 bp and chr6:21,497,039 and 21,702,660 bp (Fig. S18). Positive selection would involve the reference allele (Nipponbare) with positive iHS values.
Identification Of Candidate Genes
In a putative region with a substantial cA introgression (17.9–21.7 Mb) on chromosome 6, we found 46 non-redundant genes (Table 1). In the short cA introgression (25.8–26.7 Mb) of chromosome 6, twenty-one genes (Table 1) were discovered. Genes identified between 17.9–21.7 Mb are highly enriched in 43 biological processes (Table S4), 28 molecular activities (Table S5) and 4 cellular components (Table S6). We only kept the GO terms annotations that had adjusted p-values below the cutoff of 0.01 in overall. Subsequently, twenty-three GO terms are evaluated, and they are divided into five key process categories (Table S7). These terms, in turn, play an important role in: phosphatase activities, protein downregulation, abscisic acid (ABA) response pathway and cellular responses (Fig. 7a-b).
Table 1
List of candidate genes identified in the tropical West African japonica population between the two cA introgression regions.
gene | name | description |
Os06g0504900 | OsWRKY31 | Similar to WRKY transcription factor 31 |
Os06g0507200 | Prol-17 | Bifunctional inhibitor/plant lipid transfer protein/seed storage domain containing protein |
Os06g0507300 | Os06g0507300 | Similar to GAMYB-binding protein |
Os06g0507400 | Os06g0507400 | Similar to GAMYB-binding protein (Fragment) |
Os06g0507900 | Os06g0507900 | Conserved hypothetical protein |
Os06g0508700 | MetRS | Similar to methionyl-tRNA synthetase-like protein (ISS) |
Os06g0517700 | Os06g0517700 | Similar to cDNA, clone: J075167K24, full insert sequence |
Os06g0523900 | Os06g0523900 | RNA-processing protein, HAT helix domain containing protein |
Os06g0524300 | Os06g0524300 | Protein of unknown function DUF3133 domain containing protein |
Os06g0526100 | OsbHLH054 | Hypothetical conserved gene |
None | None | None |
Os06g0526350 | Os06g0526350 | Hypothetical conserved gene |
Os06g0526400 | OsPYL7 | Similar to AT-rich element binding factor 3 |
Os06g0526466 | OsEnS-89 | Conserved hypothetical protein |
Os06g0526600 | OsABP | DEAD-box helicase ATP-binding protein, Response to abiotic stress (salt, dehydration, ABA, blue and red light |
Os06g0526650 | Os06g0526650 | Non-protein coding transcript |
Os06g0526700 | OsPP2C55 | Probable protein phosphatase 2C 55 |
Os06g0526800 | OsPP2C56 | Probable protein phosphatase 2C 56 |
Os06g0527100 | OsCNGC12 | Similar to Cyclic nucleotide-gated channel A (Fragment) |
Os06g0527201 | Os06g0527201 | Hypothetical gene |
None | None | None |
Os06g0527300 | OsCNGC13 | Similar to cDNA clone:J023078M02, full insert sequence |
Os06g0527500 | Os06g0527500 | Conserved hypothetical protein |
Os06g0527800 | OsPYL/RCAR7 | Similar to Polyketide cyclase |
Os06g0528300 | OsPYL/RCAR8 | Similar to Polyketide cyclase |
Os06g0528600 | OSSPDS2 | Aminopropyl transferase |
Os06g0532500 | FCP2 | Protein with similar CLE domain, Meristem maintenanc |
Os06g0533700 | Os06g0533700 | Hypothetical conserved gene |
Os06g0534200 | Os06g0534200 | Conserved hypothetical protein |
None | None | None |
Os06g0534500 | Os06g0534500 | Zinc finger, RING-type domain containing protein |
None | None | None |
None | None | None |
Os06g0534800 | Os06g0534800 | Zinc finger, RING-type domain containing protein |
None | None | None |
Os06g0534900 | Os06g0534900 | Hypothetical conserved gene |
Os06g0534950 | Os06g0534950 | Similar to zinc finger, C3HC4 type family protein |
None | None | None |
Os06g0547900 | GSK4 | Similar to Shaggy-related protein kinase eta (EC 2.7.1.-) (ASK-eta) (BRASSINOSTEROID-INSENSITIVE 2) (ULTRACURVATA1) |
Os06g0551400 | OsRab11 | Ras-related small GTP-binding protein, Regulation of vesicular trafficking from trans-Golgi network to plasma membrane or vacuole, Jasmonic acid (JA)-mediated defense signalin |
Os06g0553200 | PFL | Similar to Meiosis 5 |
Os06g0556000 | OsAAP12A | Amino acid permease, Transport of amino acid |
Os06g0559400 | Os06g0559400 | Conserved hypothetical protein |
Os06g0561000 | OsMIOX | Myo-inositol oxygenase, Drought stress toleranc |
Os06g0562200 | OsPYL/RCAR9 | Bet v I allergen family protein |
Os06g0633300 | OsPSK | Similar to Phytosulfokines 1 |
Os06g0633500 | Os06g0633500 | Zinc finger, RING/FYVE/PHD-type domain containing protein |
Os06g0637500 | R2R3-MYB | Similar to MYB transcription factor R2R3 type |
Os06g0638000 | Os06g0638000 | Zinc finger, CCCH-type domain containing protein |
Os06g0642900 | Os06g0642900 | Ubiquitin system component Cue domain containing protein |
Os06g0642950 | Os06g0642950 | Non-protein coding transcript |
Os06g0643000 | Os06g0643000 | Phox-like domain containing protein |
Os06g0643050 | Os06g0643050 | Hypothetical protein |
Os06g0643100 | OsPBC1 | Proteasome subunit beta type 3 (EC 3.4.25.1) (20S proteasome alpha subunit C) (20S proteasome subunit beta-3) |
Os06g0643300 | Os06g0643300 | Development protein-like protein |
Os06g0643500 | Os06g0643500 | Similar to ADR11 protein (Fragment) |
Os06g0643600 | MutM | DNA glycosylase/AP lyase domain containing protein |
Os06g0643900 | OsPAP26 | Purple acid phosphatase (EC:3.1.3.2), Regulation of phosphate remobilization, Utilization of organic phosphoru |
Os06g0644200 | OVP1 | Similar to Pyrophosphate-energized vacuolar membrane proton pump (Pyrophosphate-energized inorganic pyrophosphatase) (H+-PPase) (Vacuolar H+-pyrophosphatase) |
Os06g0644800 | OsOS-9 | Glucosidase II beta subunit-like domain containing protein |
Os06g0645700 | OsHAF701 | Similar to HAF1 |
Os06g0649000 | WRKY28 | PAMP (pathogen-associated molecular pattern)-responsive transrepressor, Defense respons |
Os06g0649600 | Os06g0649600 | Non-protein coding transcript |
Os06g0649800 | Sdt97 | Similar to DNA-3-methyladenine glycosylase I |
Os06g0650300 | OsglHAT1 | Histone H4 acetyltransferase, Regulation of grain weight, yield, and plant biomas |
Os06g0652000 | OsRpo | Similar to T3/T7-like RNA polymerase (Fragment) |
GO terms enrichment categorises candidate genes into two primary functional groupings (Table S8). The first cluster includes 8 GO terms that are all involved in binding functions. The second cluster comprises genes with GO term enrichments involved in protein regulation/inhibition (Fig. 7c-d).