Population structure
The structure analysis performed on the 1,929 accessions of the A-panel indicated K = 3 as the most likely number of populations, i.e., the lowest rate (10%) of accessions with ancestry coefficients below the threshold of 0.80. The three populations corresponded to the indica, japonica and cAus. Assignment of individual accessions to one of the three populations was almost completely in line with the Wang et al. (2018) classification of the 3K RGP accessions, based on ~ 4.8 million SNPs. The rate of identical classification was of 96% for the indica population (1,090 out of 1,139), 100% for the japonica population (497 out of 495), and 92% for cAus (156 out of 169). The 187 admix accessions, with estimated ancestry coefficients below 0.80, included 90 accessions classified as Admix, XI-adm or GJ-adm by Wang et al. (2018), and 16, 62 and 19 accessions assigned by the same authors to cAus, cBas and indica populations, respectively. The 62 cBas accessions derived a large share (0.68%) of their ancestry coefficient from the japonica population (Supplementary Fig. 2). The hypotheses of K = 4 and K = 5 subdivided the indica population into its components XI-1, XI-3 and XI-2, and at K = 6, the cBas accessions formed a separate population. However, under these hypotheses, the rate of accessions with ancestry coefficients below 0.80 increased drastically, reaching 39%, 46% and 43%, respectively. Thus, one can consider that our 24K SNP genotypic dataset provided a reasonably accurate description of the global structure of the A-panel.
When a similar clustering analysis was performed on the A-panel plus M-panel of 620 accessions, the hypothesis of K = 2 separated a population including the indica accessions from another including the japonica accession (Fig. 1A). At K = 3, the population including the indica accessions split into 2, and at K = 4 it split into 3 subpopulations. At K = 5, the population including the japonica accessions split into 2 subpopulations. K values above five did not improve population discrimination and did not reduce the number of Admix. Among the populations identified at K = 5, four corresponded to indica, japonica, cAus and cBas from the corresponding A-panel, plus 21%, 7%, 0.2% and 0.3% of accessions from the M-panel, respectively. The fifth population, exclusively composed of Malagasy accessions (38% of the accessions of the M-panel), corresponded to the group specific to Madagascar first described by Ahmadi et al. (1988). Thirteen percent of accessions of A-panel and 33% of accessions of M-panel with less than 80% of their estimated ancestry deriving from one of the identified populations were considered as admixed.
Lastly, when structure analysis was performed on the 620 accessions of the M-panel alone, the most likely number of populations was K = 3, among which two corresponded to indica and japonica and the third to the group specific to Madagascar (Fig. 2A). The M-Panel representatives of indica, hereafter called G1 group, accounted for 25% of the accessions, the japonica accessions, hereafter called G6 group, for 7%, and the group specific to Madagascar hereafter called Gm for 36%. The remaining 32% accessions were mainly admixtures of the G1 and Gm groups, except two Malagasy aromatic varieties with more than 70% of their estimated ancestry deriving from G6, and one cAus accessions. The G1 group consisted of lowland rice varieties cultivated in low elevation areas (altitudes < 800 m). The G6 was composed of upland rice varieties cultivated in the low elevation forest areas of the east coast (altitude < 750 m) on the one hand, and a few lowland rice varieties cultivated in high elevation areas (altitude > 1250 m) on the other hand. Lastly, the Gm assembled lowland rice varieties mainly cultivated at altitudes between 800 m and 1600 m (Supplementary Fig. 3).
The distance-based neighbour-joining tree supported the results of the structure analyses for the joint analysis of the A-panel and M-panel (Fig. 1B) as well as for the analysis of the M-panel alone (Fig. 2B). It also allowed to separate rather clearly the component subpopulations of the indica and japonica, as subdivided by Wang et al. (2018). The indica XI-1A subpopulation mainly originated from China, the XI-1B subpopulation of diverse origins, the XI-2 subpopulation mainly originated from the Indian subcontinent, and the XI-3 subpopulation mainly originated from South-East Asia. The GJ-temp component mainly originated from Japan, China and Korea, the GJ-trop component, mainly originated from Indonesia, Philippines, Malaysia and Sri Lanka, and the GJ-sbtrop component mainly originated from Bhutan, India, Laos, Myanmar and Thailand (Fig. 1B). The Malagasy G1 accessions split into two ensembles, one clustering with the Asian XI-1B subpopulation of very diverse origins, and the other, together with the Malagasy admix accessions, with the Asian XI-2 subpopulation of South-Asian origin. The Malagasy G6 accessions formed a distinct cluster among the Asian GJ-trop subpopulation, mainly of Indonesian origin. The Malagasy Gm group formed a distinct cluster at the vicinity of the Asian XI-2 subpopulation.
Genetic characteristics of the Asian subpopulations and Malagasy groups
In the A-panel, average gene diversity over the 24 k common SNPs ranged from 0.030 ± 0.0142 for GJ-temp subpopulation to 0.177 ± 0.084 for XI-2 (Table 1). Average gene diversity in Malagasy G1 (0.138 ± 0.089) and G6 (0.100 ± 0.048) was smaller compared to their Asian counterpart indica (0.183 ± 0.086) and japonica (0.111 ± 0.037). The Gm group had a medium level gene diversity, 0.094 ± 0.044.
Table 1
Gene diversity, linkage disequilibrium and genetic differentiation between populations estimated by FST statistic
|
|
|
|
|
Asian panel
|
Malagasy panel
|
Population
|
N
|
Gene diversity
|
Distance* (kb) for r² ≤ 0.2
|
Genetic differentiation between subpopulations estimated by FST statistic
|
indica
|
XI-1A
|
XI-1B
|
XI-2
|
XI-3
|
cAus
|
japonica
|
GJ- trop
|
GJ-sbtrop
|
GJ-temp
|
cBas
|
G1
|
Gm
|
G6
|
Indica (XI)
|
1,139
|
0.19 ± 0.09
|
75–100
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
- XI-1A
|
178
|
0.13 ± 0.06
|
250–300
|
|
0.00
|
|
|
|
|
|
|
|
|
|
|
|
|
- XI-1B
|
49
|
0.15 ± 0.07
|
250–300
|
|
0.25
|
0.00
|
|
|
|
|
|
|
|
|
|
|
|
- XI-2
|
179
|
0.18 ± 0.08
|
125–150
|
|
0.23
|
0.17
|
0.00
|
|
|
|
|
|
|
|
|
|
|
- XI-3
|
392
|
0.16 ± 0.06
|
125–150
|
|
0.25
|
0.18
|
0.15
|
0.00
|
|
|
|
|
|
|
|
|
|
cAus
|
169
|
0.16 ± 0.08
|
200–250
|
0.42
|
0.53
|
0.49
|
0.42
|
0.49
|
0.00
|
0.73
|
|
|
|
|
|
|
|
japonica (GJ)
|
495
|
0.11 ± 0.04
|
350–400
|
0.66
|
|
|
|
|
|
|
|
|
|
|
|
|
|
- GJ-trop
|
210
|
0.09 ± 0.04
|
500–600
|
|
0.74
|
0.74
|
0.68
|
0.68
|
0.70
|
|
0.00
|
|
|
|
|
|
|
- GJ-sbtrop
|
105
|
0.05 ± 0.02
|
700–800
|
|
0.76
|
0.78
|
0.69
|
0.69
|
0.71
|
|
0.25
|
0.00
|
|
|
|
|
|
- GJ-temp
|
150
|
0.03 ± 0.01
|
700–800
|
|
0.79
|
0.83
|
0.72
|
0.71
|
0.75
|
|
0.30
|
0.43
|
0.00
|
|
|
|
|
cBas
|
62
|
0.12 ± 0.06
|
500–600
|
0.54
|
0.65
|
0.62
|
0.56
|
0.59
|
0.56
|
0.51
|
0.48
|
0.56
|
0.64
|
0.00
|
|
|
|
G1
|
156
|
0.14 ± 0.09
|
125–150
|
0.07
|
0.14
|
0.14
|
0.11
|
0.16
|
0.44
|
0.73
|
0.69
|
0.70
|
0.74
|
0.56
|
0.00
|
|
|
Gm
|
220
|
0.09 ± 0.05
|
600–700
|
0.42
|
0.57
|
0.58
|
0.41
|
0.52
|
0.55
|
0.78
|
0.77
|
0.80
|
0.82
|
0.70
|
0.46
|
0.00
|
|
G6
|
45
|
0.10 ± 0.05
|
1250–1500
|
0.66
|
0.70
|
0.69
|
0.63
|
0.65
|
0.64
|
0.17
|
0.13
|
0.34
|
0.43
|
0.46
|
0.62
|
0.75
|
0.00
|
N: number of accessions; in the cases of indica (XI) and japonica (GJ), N includes 342 XI-adm and 30 GJ-adm accessions, respectively. r²: Linkage disequilibrium (LD); *: threshold of pairwise distance between markers above which the mean r² reaches values below 0.2. The 64 Admix accessions of the A-panel, the 196 Admix, 1 cAus and 2 cBas accessions of the M-Panel were excluded from LD, gene diversity and genetic differentiation analyses. |
Important differences were observed between subpopulations for r² estimate of the initial LD, i.e. pairwise distance between markers below 25 kb (from 0.405 for XI-2 subpopulation to 0.724 for GJ-sbtrop) and for its subsequent decay (Supplementary table 3). While the distance at which the LD went below 0.2 was between 125 and 150 kb for XI-2, it was above 700 kb for GJ-sbtrop and GJ-temp. Pattern of LD decay in G1 was similar to the one of the indica population but the absolute values were slightly higher (0.05 point) whatever the pairwise distance between markers (Fig. 3). The G6 group had a high initial LD (r² = 0.594) and rather low rate of LD decay (r² < 0.2 at pairwise distances between markers above 1.5 Mb). The Gm group had the highest initial LD (r² = 0.648 for distances below 25 kb) but the LD decay was rather fast (r² < 0.2 at distances above 600 kb) and similar to the one of G1 for distance above 2 Mb. Compared to G1 group and Asian subpopulations, G6 and Gm showed larger inter-chromosome variations of LD pattern. For instance, in Gm, the average r² values (over 28 levels of pairwise distance between markers) of chromosomes 6 and 11 were + 42% and + 33% higher than in other chromosomes, respectively.
Genetic differentiation was very high between the three Malagasy groups, ranging from 0.46, for G1-Gm, to 0.75, for G6-Gm (Table 1). However, “among population” variance, calculated by AMOVA, represented only 40% of the total molecular variance, indicating large within-group differentiation. Differentiations between the Malagasy G1 and G6 groups and their Asian counterparts were rather low but highly significant suggesting a specific history of the Malagasy rice varieties. The G1 - cAus FST and the G1 - cBas FST were of the same magnitude as the indica - cAus and the indica - cBas FST, respectively. Similar pattern was observed for G6 - cAus and G6 - cBas FST, and its Asian counterpart japonica. The FST between Gm and the Asian subpopulations were also high, ranging from 0.41 to 0.82 (Table 1), confirming the fact that Gm could not be directly affiliated to one of those subpopulations. The FST between Gm and the Asian subpopulations was lowest with XI-2.
Haplotype level genetic variations
Results of the KDE-based local classification are represented for each chromosome in Fig. 4 and are summarised over the whole genome in Table 2. As expected, a large percentage of haplotypes of accessions of the A-panel that were considered as members of the indica, japonica or cAus subpopulations at the whole genome level (Wang et al., 2018), was attributed by the KDE-based classification to their homologous classes of haplotypes, ind, jap and cAus. On average, the percentage was of 68.7, 58.4 and 53.1 for indica, japonica and cAus accessions, respectively. However, variations of up to 35% of these shares existed between chromosomes. The percentages of cross-classification of haplotypes of accessions among the three subpopulations (i.e. indica to jap and vice-versa, indica to cAus and vice-versa, and japonica to cAus and vice-versa) were low, rarely exceeding 2%. The percentage of haplotypes attributed to each of the four intermediate classes (ind-jap, ind-cAus, jap-cAus, and ind-jap-cAus) depended on the classification of accessions at the global level. In the case of indica accessions, on average 12.3% of the haplotypes were classified as ind-cAus and 9% as ind-jap-cAus, while the average share of the ind-jap and jap-cAus haplotypes were of 4.8% and 2.9%, respectively. In the case of the japonica accessions, on average 13.8% of the haplotypes were classified as ind-jap and 17.7% as ind-jap-cAus, while the average share of ind-cAus and jap-cAus haplotypes were of 3.8% and 4.0%, respectively. Accessions belonging to the cAus subpopulation were characterised by a large share of haplotypes (on average 26.4%) classified as ind-cAus. The proportions of haplotypes of indica, japonica and cAus accessions classified as outliers were very low, less than 1%, on average. Accessions of the A-panel belonging to the cBas subpopulation of Wang et al (2018) were characterised by large shares of jap haplotypes, 31.2% on average, and outlier haplotypes, 11.0% on average.
Table 2
Mean shares (%) of the eight categories of haplotypes identified by the KDE-based local classification, in each population and subpopulation of the Asian and Malagasy diversity panels.
Diversity panel
|
Global classification
|
KDE-based local classification
|
ind
|
cAus
|
Jap
|
ind-jap
|
ind-cAus
|
jap-cAus
|
ind-jap-cAus
|
Outlier
|
Asian Panel
|
indica
|
68.9
|
1.2
|
0.9
|
4.8
|
12.3
|
2.9
|
8.9
|
0.1
|
(59.8–84.8)
|
(0.9–1.9)
|
(0.4–2.7)
|
(2.1–7.4)
|
(5.8–19.2)
|
(1.6–4.5)
|
(3.8–15.7)
|
(0–0.4)
|
- XI-1
|
67.6
|
0.5
|
1.0
|
5.2
|
12.4
|
3.4
|
9.8
|
0.1
|
(57.6–85.6)
|
(0.2–1.6)
|
(0.3–2.9)
|
(2–7.9)
|
(4.3–17.9)
|
(1.7–9)
|
(4.1–18.9)
|
(0–0.2)
|
- XI-2
|
65.6
|
2.9
|
0.5
|
4.4
|
14.5
|
2.8
|
8.5
|
0.8
|
(51.4–80)
|
(1.9–4.8)
|
(0.2–1.6)
|
(1.7–6.6)
|
(8.4–23.2)
|
(-0.4–4.7)
|
(1.2–15.1)
|
(0–8.6)
|
- XI-3
|
72.7
|
0.3
|
0.5
|
4.6
|
10.9
|
2.7
|
8.3
|
0.1
|
(60.8–88.5)
|
(0.1–0.7)
|
(0.1–2.4)
|
(2–9.3)
|
(4.1–19.9)
|
(1.5–4.5)
|
(3.3–13.9)
|
(0–0.3)
|
- X-adm
|
67.2
|
1.5
|
1.4
|
5.0
|
12.6
|
3.0
|
9.2
|
0.1
|
(57.5–83.5)
|
(1.2–2.3)
|
(0.7–3.3)
|
(2.2–7.2)
|
(6.2–18.9)
|
(1.7–4.6)
|
(4.1–16.1)
|
(0–0.5)
|
japonica
|
1.0
|
0.4
|
58.4
|
13.8
|
3.8
|
4.0
|
17.7
|
0.8
|
(0.4–1.8)
|
(0.3–0.7)
|
(36.4–71.6)
|
(3–23.2)
|
(2.7–6.5)
|
(2.7–6.5)
|
(9.7–27.3)
|
(0.1–7.2)
|
- GJ-trop
|
1.1
|
0.5
|
58.2
|
14.7
|
3.7
|
3.9
|
16.0
|
2.0
|
(0.3–2.0)
|
(0.3–1.1)
|
(36.3–71.9)
|
(3.1–23.2)
|
(1.9–6.4)
|
(1.8–6.3)
|
(8.4 − 27.4)
|
(0.1–9.8)
|
- GJ-sbtrop
|
0.8
|
0.4
|
58.4
|
14.5
|
4.1
|
4.2
|
16.5
|
1.3
|
(0.1–1.7)
|
(0.1–0.9)
|
(40.9–73.4)
|
(2.9–23.9)
|
(2.8–6.9)
|
(2.8–6.4)
|
(9.1–25.5)
|
(0.1–6.6)
|
- GJ-temp
|
1.0
|
0.2
|
55.7
|
14.6
|
4.5
|
4.6
|
18.2
|
0.8
|
(0.4–2.4)
|
(0.0–0.6)
|
(33.7–71.4)
|
(3.2–23.0)
|
(2.8–6.9)
|
(2.9–7.1)
|
(11.1–28.1)
|
(0.0–3.9)
|
- GJ-adm
|
2.7
|
0.7
|
54.8
|
14.8
|
4.4
|
4.4
|
17.1
|
1.5
|
(1.4–5.0)
|
(0.3–1.1)
|
(35.3–70.3)
|
(5.1–22.7)
|
(3.0–6.5)
|
(2.8–6.6)
|
(9.8–27.9)
|
(0.0 -6.6)
|
cAus
|
1.8
|
53.1
|
0.8
|
3.3
|
26.4
|
3.5
|
10.8
|
0.2
|
(1.3–2.3)
|
(37.6–63.8)
|
(0.5–2)
|
(1.7–6.7)
|
(14.4–42.2)
|
(1.8–6.7)
|
(4.8–18.4)
|
(0–0.4)
|
cBas
|
16.9
|
13.2
|
31.2
|
6.9
|
7.1
|
2.6
|
11.1
|
11.0
|
(10.6–27.4)
|
(4.9–22.4)
|
(20.7–38.7)
|
(2.2–12.9)
|
(2.9–10.9)
|
(-0.9–5.4)
|
(4.4–21.2)
|
(6.1–23.7)
|
Admix
|
24.4
|
12.4
|
24.3
|
7.4
|
10.6
|
4.3
|
10.3
|
6.2
|
(20.1–27.2)
|
(8.7–16.4)
|
(16.4–30.9)
|
(2.3–11.3)
|
(7.5–14.5)
|
(2–13.3)
|
(1.8–17.2)
|
(4.9–9.7)
|
Malagasy panel
|
G1
|
69.7
|
1.3
|
0.6
|
5.0
|
11.9
|
2.8
|
8.4
|
0.4
|
(59.9–84.2)
|
(0.6–2.3)
|
(0.1–2.1)
|
(2.0–7.6)
|
(4.5–17)
|
(1.6–4.2)
|
(3–16.2)
|
(0.1–1.2)
|
G6
|
2.7
|
2.0
|
57.4
|
12.5
|
3.9
|
3.8
|
15.0
|
3.1
|
(1.5–4.8)
|
(0.8–4.1)
|
(38.3–69.0)
|
(3.0–18.9)
|
(2–7)
|
(1.6–6.8)
|
(7.1–23.1)
|
(1.2–10.8)
|
Gm
|
50.6
|
8.2
|
5.4
|
5.7
|
14.3
|
3.3
|
9.8
|
2.6
|
(34.9–79.6)
|
(2.4–19.8)
|
(1.2–24)
|
(2.1–13.2)
|
(6–21.5)
|
(1.8–6.3)
|
(3.7–19.2)
|
(1.3–3.8)
|
Admix
|
58.9
|
6.2
|
3.4
|
4.6
|
12.5
|
2.8
|
7.4
|
4.1
|
(49.2–75.0)
|
(2.4–8.3)
|
(2.6–5.1)
|
(1.9–8.8)
|
(5.0–20.5)
|
(1.0–5.2)
|
(1.6–12.9)
|
(2.3–6.2)
|
Supplementary material |
Among accessions of the M-Panel, the shares of the eight classes of haplotypes in the G1 group were similar to the ones observed for its Asian counterpart, indica. In the case of G6 the shares differed slightly from its Asian counterpart, japonica, with higher percentages of haplotypes classified as ind and cAus. Patterns of distribution of the eight classes of haplotypes among the accessions of Gm group, significantly diverged from the ones observed for all other groups. It displayed a significantly lower share of ind haplotypes (50.6% on average) and significantly higher shares of cAus, jap and outlier haplotypes than the Asian indica and the Malagasy G1 group, 8.2%, 5.4% and 2.6% on average, respectively. The Malagasy Admix had a haplotype pattern similar to Gm, with slightly lower shares of cAus and jap haplotypes.
To further investigate the relationship between the Gm and the Asian indica, we compared the patterns of distribution of haplotypes of Gm accessions with the ones of each of the four indica subpopulations, XI-1, XI-2, XI-3 and XI-adm (Wang et al. 2018). Patterns of distribution of eight classes of haplotypes in the Gm group diverged from the ones of the four indica subpopulations by lower share of ind haplotypes and higher shares of cAus and jap haplotypes. The divergence was the lowest with the XI-2 subpopulation.
The Gm accessions also displayed larger variations of the shares of each class of haplotype across the 12 chromosomes. For instance, chromosome 6 displayed an exceptionally high share of jap haplotypes (24% on average), chromosome 8 a large share of cAus haplotypes (19.8% on average) and chromosome 11 a low share of ind and high shares of ind-jap and ind-jap-cAus haplotypes, 34.9%, 13.2% and 19.2%, respectively. In some cases, these haplotypes were contiguous and covered a large segment of the chromosome. For instance, on chromosome 6, one of the jap haplotype segments spread over 4.7 Mb (13.0 to 17.7 Mb) and on chromosome 8 one of the cAus haplotype segments spread over 5.4 Mb (10.9–16.3 Mb).
Dissymmetry based neighbour-joining tree constructed with the genotypic data of individual chromosomes (Supplementary Fig. 4) confirmed the heterogeneity of shares of different classes of haplotype among the 12 chromosomes. It also showed the discriminatory power of genetic diversity at the individual chromosome level. Accessions belonging to Gm were always clustering in a well-individualised branch of the tree, often near the cAus branch. M-Admix accessions almost systematically clustered with the Gm accessions. The chromosome level neighbour-joining trees also allowed, in some cases, to identify accessions of the M-panel and A-panel that displayed similar patterns of haplotypes shares. In the case of the M-panel, the three accessions most often clustering with Gm belonged to G1 and were originated from the Mahajanga region on the west coast of Madagascar. In the case of the A-panel, more than 95% of accessions clustering near Gm were of Indian subcontinental origin and belonged to XI-2 subgroup (66%), XI-adm (21%) and cAus (13%). Three accessions of the A-panel most frequently present in the vicinity of the Gm cluster were of Indian origin: Larha Mugad (IRGC 52339-1) from Karnataka, Adukkan (IRGC 81783-1) from Kerala and Cuttack 29 (IRGC 49573-1) from Odisha. Larha Mugad and Adukkan belonged to XI-2, Cuttack 29 was an XI-adm.
Characteristics of the large cAus and Jap chromosomic segments in the Gm group.
The two largest jap and the largest cAus chromosomic segments in the Gm accessions were investigated for kinship with their homologous segments in other Malagasy groups and in Asian subpopulations, and for their functional proprieties. Dissimilarity based phylogenetic neighbour-joining trees were constructed using the genotypic data from each of the three segments (240, 221 and 225 SNP loci for chromosome 6, 8 and 11, respectively) for kinship analysis. Annotation of the genomic sequence of the three chromosomic segments was retrieved from MSU database, for analysis of gene function.
In the neighbour-joining trees constructed with the genotypic data from the 13.0 to 17.7 Mb segment of chromosome 6, the Gm accessions and a significant share of the Malagasy Admix accessions formed a rather compact branch in the vicinity of the japonica cluster (Supplementary Fig. 5). Japonica accessions most closely clustering with Gm accessions were members of the Malagasy G6 group, (Fig. 5). The non-Malagasy accessions clustering near Gm were GJ-trop from Indonesia, Malaysia and the Philippines, plus one Admix accession from India. The segment encompassed 678 genes, including 330 genes described as coding for transposon or retrotransposon proteins, and at least 25 genes belonging to families involved in rice response to temperature-related stresses.
In the neighbour-joining trees constructed with the genotypic data from the 10.6 to 15.7 Mb segment of chromosome 11, Gm accessions formed again a rather compact branch near the japonica cluster. Japonica accessions most closely clustering with Gm belonged to G6, to GJ-trop from Indonesia and, to a lesser extent, to GJ-sbtrop from India and GJ-tem from China (Supplementary Fig. 5). The segment encompassed 673 genes, including 366 genes described as coding for transposon or retrotransposon proteins, and at least 14 genes belonging to families involved in rice response to temperature-related stresses.
In the neighbour-joining trees constructed with the genotypic data from the 10.9–16.3 Mb segment of chromosome 8, Gm accessions formed a compact cluster near the cAus subpopulation. Two cAus accessions from Bangladesh and India were embedded in the Gm cluster. The closest neighbours to the Gm cluster were 30 cAus, 4 XI-2 and one XI-adm accessions of Bangladeshi and Indian origin (Supplementary Fig. 6). The segment encompassed 695 genes, including 426 genes described as coding for transposon or retrotransposon proteins, and at least 25 belonging to families involved in rice response to temperature-related stresses.