Resequencing of 672 native rice accessions to explore genetic diversity and trait associations along Vietnam

doi:10.21203/rs.3.rs-70697/v1


Original article


This work is licensed under a CC BY 4.0 License


BACKGROUND:

Vietnam possesses a vast diversity of rice landraces due to its geographical situation, latitudinal range, and variety of ecosystems. This genetic diversity constitutes a highly valuable resource at a time when the highest rice production areas in the low-lying Mekong and Red River Deltas are enduring increasing threats from climate change, particularly in rainfall and temperature patterns.

RESULTS:

We analysed 672 Vietnamese rice genomes, 616 newly sequenced, that encompass the range of rice varieties grown in the diverse ecosystems found throughout Vietnam. We described four Japonica and five Indica subpopulations within Vietnam, each likely adapted to its region of origin. We compared the population structure and genetic diversity of these Vietnamese rice genomes to the 3,000 genomes of Asian cultivated rice. The subpopulation named Indica-5 (I5) was expanded in Vietnam and contained lowland Indica accessions, which had very low shared ancestry with accessions from any other subpopulation and were previously overlooked as admixtures. We scored phenotypic measurements for nineteen traits and identified 453 unique significant genotype-phenotype associations comprising twenty-one QTLs (quantitative trait loci). The strongest associations were observed for grain size traits, while weaker associations were observed for a range of characteristics, including panicle length, heading date and leaf width.

CONCLUSIONS:

Our results highlight differences in genome composition and trait associations among traditional Vietnamese rice accessions, which are likely the product of adaptation to multiple environmental conditions and regional preferences in a very diverse country. They also highlight traits and their associated genomic regions that are a potential source of novel loci and alleles to breed a new generation of sustainable and resilient rice.

Plant Molecular Biology and Genetics

Plant Physiology and Morphology

Rice

breeding

adaptation

QTL

genetic diversity

GWAS

landraces

Rice production in Vietnam is of great value for export and for providing daily food for more than 96 million people. However, agricultural production, especially rice cultivation, is inherently vulnerable to climate variability across all regions in Vietnam. Based on the records of monthly precipitation and temperature from 1975 to 2014 (Nguyen et al. 2019), the areas of highest crop production in the low-lying Mekong and Red River Deltas are particularly vulnerable to the increasing threat from climate change. In 2017, the total planted area of rice in Vietnam was 7.7 million hectares. This includes 4.2 million hectares in the Mekong River Delta and 1.1 million hectares in the Red River Delta (GSO-Database 2017). These are also the areas where most of the population of the country is concentrated. In the Mekong River Delta, the damaging effects of salinisation and drought on rice production have increasingly manifested themselves in recent years (Parker et al. 2019; Son 2018; Tran et al. 2019; Yen et al. 2019).

Vietnam possesses a vast diversity of native and traditional rice varieties due to its geographical situation, latitudinal range and diversity of ecosystems (Fukuoka et al. 2003). This diversity constitutes a largely untapped and highly valuable genetic resource for local and international breeding programs. Vietnamese landraces are disappearing as farmers switch to modern elite varieties. To limit this erosion of genetic resources, several rounds of collection of landraces, particularly from the northern upland areas, have been undertaken since 1987. Thousands of rice accessions have been deposited in the Vietnamese National Genebank at the Plant Resources Center (PRC, Hanoi, Vietnam), together with passport information detailing their traditional name and province of origin. One hundred and eighty-two traditional Vietnamese accessions were selected for a genotyping-by-sequencing (GBS) study in 2014 (Phung et al. 2014). This study yielded 25,971 single nucleotide polymorphisms (SNPs) that were used to describe four Japonica and six Indica subpopulations. These subpopulations were classified by region, ecosystem and grain type using passport information (province and ecosystem) and phenotyping. This dataset has subsequently been used for genome-wide phenotype-genotype association studies (GWAS) relating to root development (Phung et al. 2016), panicle architecture (Ta et al. 2018), drought tolerance (Hoang et al. 2019b), leaf development (Hoang et al. 2019a) and jasmonate regulation (To et al. 2019).

An international effort to re-sequence Asian rice accessions known as the “3000 Rice Genomes Project” (3K RGP) has provided the rice community with a better understanding of Asian rice diversity and evolutionary history, as well as providing valuable knowledge to enable more efficient use of these accessions for rice improvement (Wang et al. 2018; Wing et al. 2018). However, only 56 of these accessions originated from Vietnam, suggesting that the rice diversity within this country may not be fully captured within the 3K RGP. While the original 3K RGP analysis described nine subpopulations (Wang et al. 2018), subsequent reanalysis has shown that the 3K RGP could be further subdivided into fifteen subpopulations (Zhou et al. 2020).

In this paper, we newly sequenced 616 Vietnamese rice accessions using whole-genome sequencing (WGS), most of them being native landraces. Of these accessions, 164 were in common with a previous study (Phung et al. 2014) based on a genotyping-by-sequencing (GBS) approach. We supplemented this dataset with all 56 Vietnamese genotypes from the 3K RGP to form a native diversity panel with 672 accessions. We analysed this diversity panel to explore how breeding and environmental pressures have shaped the rice genome in Vietnamese accessions. We also carried out a comprehensive analysis of the population structure of the 3,635 rice genomes obtained from joining our diversity panel and the complete 3K RGP datasets. We completed a GWAS on the diversity panel with 672 accessions (and separately for the Japonica and Indica subtypes within it) on thirteen phenotypes, which are available for around two-thirds of the samples. Our results highlight genomic differences and trait associations in traditional Vietnamese landraces, which are likely the product of adaptation to multiple environmental conditions and regional culinary preferences in a very diverse country.

Sequencing rice diversity from Vietnam

Whole-genome sequencing was carried out on 616 rice accessions. Of these, 511 were obtained from the PRC (Plant Resource Centre, Hanoi, Vietnam, http://csdl.prc.org.vn), together with their passport data, which shows that they were collected from all eight administrative regions of Vietnam (Table S1). The remaining samples were obtained from AGI’s collection (Agricultural Genomics Institute, Hanoi, Vietnam). Reference accessions of three varieties (Nipponbare, a temperate Japonica; Azucena, a tropical Japonica; and two accessions of IR64, an Indica), obtained from the PRC, were included in the dataset. A total of 1,174 giga base-pairs (Gbp) of data was generated for the 616 samples, representing an average sequencing depth of 30x for 36 “high coverage” samples and 3x for 580 “low coverage” samples (Table S1). These 616 newly sequenced accessions were classified into 379 Indica and 202 Japonica subtypes, with the remaining 35 (including the Aus and Basmati varieties) being classified as admixed, based on the STRUCTURE (Pritchard et al. 2000) output for K=2 using a subset of 163,393 SNPs.

Population structure of rice within Vietnam

The population structure of rice within Vietnam was analysed using the diversity panel of 672 samples, comprising 616 newly sequenced accessions and 56 Vietnamese genotypes from the 3K RGP. We assigned the 672 samples to four Japonica subpopulations and five Indica subpopulations (Table S1) using (i) the population structure information obtained from the STRUCTURE analysis (Fig. 1), (ii) the previous characterisation of a panel of Vietnamese native rice varieties using GBS (Phung et al. 2014), and (iii) the assessment of the optimal number of subpopulations (Fig. S1) using the method described in Evanno et al. (2005). Subpopulations were named as in Phung et al. (2014), except that we considered the I6 subpopulation to be part of the I3 subpopulation. Although the previous study used a limited number of GBS markers, 129 of the 164 common samples were assigned to the same subpopulations in both studies. Most differences were due to samples being classified as admixed in either one of the studies. We classified 48 (11%) of the Indica (Im), and eight (4%) of the Japonica samples (Jm) as admixed. The reference varieties Nipponbare (Temperate Japonica), Azucena (Tropical Japonica), and IR64 (Indica) were classified as J4, J1 and I1, respectively.
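The ΔK criterion of Evanno et al. (2005) referred to above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it uses the common simplification of taking the second difference of the mean log-likelihoods across replicate runs and dividing by the standard deviation at that K. The function name `evanno_delta_k` and the log-likelihood values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: deltaK} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to a list of Ln P(D) values from replicate STRUCTURE runs.
    deltaK = |L(K+1) - 2*L(K) + L(K-1)| / sd(L(K)), with L the replicate mean.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # the second difference needs both neighbours
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # deltaK is undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sd
    return delta


# Hypothetical replicate log-likelihoods (NOT values from this study).
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -792.0, -788.0],
    4: [-785.0, -789.0, -781.0],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

In this toy input the likelihood gain flattens after K=2, so ΔK peaks there. In practice the peak is read alongside the biological interpretability of the clusters, as was done here with the earlier GBS classification.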

Each Indica subpopulation contained shared ancestry (admixed components) with other Indica subpopulations (Fig. 1a). The admixed components are shown in detail for the 43 samples in the I5 subpopulation (Fig. 1c), namely 38 samples from our dataset and the following five samples from the 3K RGP: IRIS 313-11384 (IRGC 127275), B184 (IRGC 135862), IRIS 313-11383 (IRGC 127274), IRIS 313-10751 (IRGC 127577) and IRIS 313-11893 (IRGC 127519). The subtropical Japonica J1 subpopulation shared ancestry (between 0 and 25% of the genome) with the tropical Japonica J3 subpopulation, whereas the two temperate subpopulations, J2 and J4, shared ancestry predominantly with each other. The tropical J3 subpopulation contained four samples with around 20% of the haplotypes in common with the temperate J4 subpopulation. Using the passport information available from the PRC, the proportion of each subpopulation originating from each of the “administrative regions” of Vietnam is shown in Fig. 1d. Only the I1 and I2 Indica subpopulations were collected from the Mekong River Delta regions, with I2 being almost exclusively grown there, whereas I1 was more widespread. The I4 and J4 subpopulations were mainly collected from the Red River Delta areas. The J1 and J3 subpopulations were closely related; the J1 subpopulation was predominantly from the North of Vietnam whereas the J3 subpopulation was concentrated around the South-Central Coast region. Small variations in the percentage of reads mapping were observed for each of the subpopulations (Fig. S2).

A Principal Component Analysis (Fig. 2a and 2b) showed the relationship between these nine Vietnamese subpopulations. Concerning the Vietnamese genotypes from the 3K RGP dataset included in the diversity panel, the Indica I1 subpopulation included two XI-1B modern varieties and eight admixed (XI-adm) accessions. I2 included fourteen genotypes from XI-3B1, a subpopulation that comprises Southeast Asian accessions; similarly, I3 and I4 included one and ten XI-3B2 genotypes, respectively. Finally, I5 included five XI-adm accessions and clustered distinctly away from all the other subpopulations (Fig. 2a). Among the Japonica subpopulations, J1 included the two subtropical (GJ-sbtrp) accessions from the Vietnamese 3K RGP genotypes, and J3 included one tropical (GJ-trp1) accession from the Vietnamese 3K RGP genotypes (Fig. 2b). These results correlate well with the latitudinal distinction between these subpopulations. J2 and J4 included two and one temperate (GJ-tmp) accessions, respectively, and split into two clear subpopulations in Vietnam, in contrast to the single East Asian temperate subpopulation described by the 3K RGP.

Population structure of the combined 3,635 Asian cultivated rice genomes

612 of the 616 newly sequenced accessions from this study and the 3,023 accessions from the 3K RGP were combined and classified into 9 and 15 subpopulations (Table S2), and compared with the subpopulations from the 3K RGP analysis (Wang et al. 2018; Zhou et al. 2020). For clarity, we used the prefix Jap- and Ind- to label these subpopulations from our analysis.

When the combined dataset of 3,635 samples was classified into nine subpopulations (Fig. S3a), we found that 95% of the 3K RGP accessions (2,882 out of 3,023) were assigned to the same subpopulations. The remaining 5% of lines were either (i) previously classified as admixed and placed into a subpopulation by our analysis, or (ii) previously classified in a subpopulation and now classified as admixed. The 612 newly sequenced Vietnamese accessions were placed in three Indica clusters (187 accessions), three Japonica clusters (176 accessions), the Basmati and Sadri aromatic cB group (11 accessions), or the Aus cA subpopulation (one accession). In more detail, the three Indica clusters included three Im accessions in the East Asian cluster (Ind-1A), seventy-six I1 accessions in the cluster of modern varieties of diverse origins (Ind-1B), and 108 accessions (I2, I3 and Im) in the Southeast Asian cluster (Ind-3). The three Japonica clusters included 54 accessions (J2, J4 and Jm) in the primarily East Asian temperate cluster (Jap-tmp), 119 accessions (J1, J3 and Jm) in the Southeast Asian subtropical cluster (Jap-sbtrp) and three J3 accessions in the Southeast Asian tropical cluster (Jap-trp). Any remaining accession whose admixture components summed to over 65% Indica or over 65% Japonica was classified as Ind-adm (191 accessions) or Jap-adm (27 accessions), respectively. Finally, the remaining accessions were considered as Admix (19 accessions). Notably, all thirty-seven I5 accessions were placed in Ind-adm, and ten of the sixteen J3 accessions were placed in Jap-adm.
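The assignment logic described above can be illustrated with a short sketch. The 65% group threshold for Ind-adm and Jap-adm comes from the text; the threshold for assigning an accession to a single cluster (`cluster_min`) is not stated in this section and is set to 0.65 here purely as an assumption. The function name, the `GROUP_OF` mapping and the example admixture fractions are hypothetical.

```python
# Map each cluster to its varietal group (labels follow the text above).
GROUP_OF = {
    "Ind-1A": "Ind", "Ind-1B": "Ind", "Ind-3": "Ind",
    "Jap-tmp": "Jap", "Jap-sbtrp": "Jap", "Jap-trp": "Jap",
    "cA": "cA", "cB": "cB",
}


def classify_accession(components, cluster_min=0.65, group_min=0.65):
    """Classify one accession from its admixture components.

    components maps cluster name -> ancestry fraction (fractions sum to ~1).
    cluster_min is an ASSUMED cutoff; group_min (65%) is taken from the text.
    """
    top_cluster = max(components, key=components.get)
    if components[top_cluster] >= cluster_min:
        return top_cluster  # clearly dominated by one cluster
    group_totals = {}
    for cluster, fraction in components.items():
        group = GROUP_OF.get(cluster, "other")
        group_totals[group] = group_totals.get(group, 0.0) + fraction
    if group_totals.get("Ind", 0.0) > group_min:
        return "Ind-adm"  # mostly Indica, but split across Indica clusters
    if group_totals.get("Jap", 0.0) > group_min:
        return "Jap-adm"
    return "Admix"


# Hypothetical admixture profiles (not taken from Table S2).
print(classify_accession({"Ind-3": 0.90, "Ind-1B": 0.10}))
print(classify_accession({"Ind-3": 0.40, "Ind-1B": 0.35, "Jap-tmp": 0.25}))
print(classify_accession({"Ind-3": 0.40, "Jap-tmp": 0.40, "cA": 0.20}))
```

Under a rule of this kind, a genuinely distinct group that is absent from the reference clusters, such as I5, would surface as Ind-adm rather than as a cluster of its own, which is consistent with the placement reported above.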

When the combined dataset of 3,635 samples was reclassified into 15 subpopulations (K15_new, Fig. S3b), we noticed the following differences in the distribution of subpopulations compared to the 3K RGP analysis for the same number of 15 subpopulations (K15_3KRGP): we did not observe the division of the Aus samples into cA-1 and cA-2, and we subdivided the Indica subtypes and Japonica subtypes into eight and five subpopulations, respectively. A Principal Coordinate (PCO) analysis of the Indica and Japonica subpopulations is shown in Fig. 3, highlighting our new eight Indica and five Japonica subpopulations (the Vietnamese and 3K RGP subpopulations are additionally shown in Figs. S5 and S6).

The relation between the subpopulations in our comprehensive analysis (3,635 accessions) and the 3K RGP (3,023 accessions) was as follows: (i) The Ind-1A, Ind-1B.1 and Ind-1B.2 were equivalent to XI-1A, XI-1B1 and XI-1B2, respectively. Forty-three of the Vietnamese I1 accessions were in the Ind-1B.1 subpopulation, and the remaining 102 I1 accessions were classified as admixed. (ii) The Ind-2 was equivalent to XI-2A and XI-2B, and as expected, this geographically distant South Asian subpopulation was not present in Vietnam. (iii) The previously observed split of the Indica-3 subpopulation into 3A and 3B was also observed in our analysis, where Ind-3.1 was equivalent to XI-3A and did not contain any Vietnamese accessions. (iv) The remaining Ind-3.2, Ind-3.3 and Ind-3.4 were a rearrangement of the XI-3B1 and XI-3B2 subpopulations. (v) The 89 Vietnamese I2 accessions belonged to Ind-3.2, which was a subset of XI-3B1. (vi) Ind-3.3 contained 16 of the 37 Vietnamese I3 accessions. (vii) 72% of the accessions in Ind-3.4 were from Vietnam; this subpopulation contained 13 of the 37 I3 accessions, 61 of the 62 I4 accessions, and all I5 accessions. Within Ind-3.4, the admixture components of the I3, I4 and I5 subpopulations (Fig. S7) showed that I3 accessions were highly admixed, and that some I4 and I5 accessions were completely within Ind-3.4, while other I4 and I5 accessions showed admixture with Ind-3.3 (I5) or Ind-2, Ind-3.2, and Ind-3.3 (I4). To clarify these relations, a principal component analysis (PCA) with a reduced number of accessions was carried out using the 723-sample dataset (672 Vietnamese accessions and 51 genotypes from neighbouring Southeast Asian countries; Fig. S8). This supported the close relationships of I2 with XI-3B1, I4 with XI-3B2, I5 with XI-adm, and J1 with GJ-sbtrp, and showed that both J2 and J4 were within GJ-tmp.

Phenotypic and genetic diversity analysis of the Vietnamese Indica and Japonica subpopulations

Phenotypic measurements for 19 traits were scored in field conditions in the Hanoi area by breeders from the Agricultural Genomics Institute (AGI) for approximately two-thirds of the samples in our study. For five of these traits, additional scores were also included from trials by the Vietnamese Plant Resource Centre. In addition, phenotypic data were available for eleven of the traits in 38 of the 56 samples sourced from the 3K RGP dataset (Tables S3 and S4). Finally, the grain length to grain width ratio (GL/GW) was calculated to give a total of 20 traits (Table S5). Scores were available for between 328 and 503 of the 672 samples (Indica subpanel, 170–297 samples; Japonica subpanel, 134–178 samples).

There were significant differences in measurements between the Indica and Japonica subtypes for ten of the traits; these are detailed in Table S5, and histograms are shown in Fig. 4 for selected phenotypes. The Indica subtypes had significantly (p-value < 0.0001) higher values for grain length to width ratio, leaf pubescence, culm number, culm length, and floret pubescence. In contrast, the Japonica subtypes had significantly higher values for grain width, leaf width, flag leaf angle, panicle length, and floret colour. The Indica I1 subpopulation (mostly elite varieties) was the most phenotypically distinct when compared to the rest of the Indica samples (mostly native landraces). I1 samples had longer grains (p-value = 2.2e-16), earlier heading date (p-value = 9.9e-12), higher culm strength (p-value = 2.2e-16), shorter leaf length (p-value = 2.7e-14) and shorter culm length (p-value < 2.2e-16). Similar values were obtained when comparing I1 to just the I5 subpopulation (Fig. 4). The I5 subpopulation was not phenotypically distinct (at a threshold of p-value < 0.001) from the other landrace subpopulations I2, I3 and I4, except for a significantly lower measurement of leaf pubescence (p-value = 0.0007). The Japonica J2 subpopulation had a significantly lower grain length to width ratio than J1 (p-value = 1.8e-13) and J3 (p-value = 5.7e-07). A correlation analysis carried out between the 20 phenotypes (Fig. S9) showed that the highest correlation (r = 0.6) was between leaf length and culm length (excluding the correlations of grain length to width ratio with grain length and grain width). Histogram and correlation plots are available for the 13 traits used for the GWAS analysis in Fig. S10, comparing the Indica and Japonica subtypes, and in Fig. S11, comparing subpopulations I1 and I5. Further boxplots showing the phenotypic distribution according to subpopulation for culm length, grain length, grain width and heading date are available in Fig. S12.

The Japonica subtypes had a lower nucleotide diversity (π = 0.000912) than the Indica subtypes (π = 0.00167). Looking at the individual subpopulations (Table S6), the elite I1 subpopulation is the most diverse (π = 0.00144), and the I5 subpopulation is the least diverse (π = 0.00103). Regions of the genome with low diversity in all Indica subpopulations, and regions with low diversity in specific subpopulations, were observed when plotting diversity along each chromosome (Fig. S13). The J3 subpopulation is the most diverse of the four Japonica subpopulations (π = 0.000697). Large genomic regions with very low diversity were observed in chromosomes 2, 3, 4 and 5 in all Japonica subpopulations (Fig. S14).
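For readers less familiar with the statistic, nucleotide diversity is the average number of pairwise differences per site between sampled sequences. The sketch below computes it from per-site alternate-allele counts at biallelic SNPs. It illustrates the definition only: the tool and window settings used in this study are not described in this section, and the function name and the toy counts are hypothetical.

```python
def nucleotide_diversity(alt_counts, n_chrom, seq_len):
    """Average pairwise differences per site for biallelic SNPs.

    alt_counts: alternate-allele count at each variable site.
    n_chrom:    number of sampled chromosomes (haplotypes), assumed constant.
    seq_len:    number of sites surveyed, including invariant ones.
    """
    if n_chrom < 2 or seq_len <= 0:
        raise ValueError("need at least two chromosomes and a positive length")
    n_pairs = n_chrom * (n_chrom - 1) / 2
    total = 0.0
    for j in alt_counts:
        # j * (n - j) pairs of chromosomes differ at this site
        total += j * (n_chrom - j) / n_pairs
    return total / seq_len


# Toy example: 4 haplotypes, 10 sites, two SNPs with 1 and 2 alternate alleles.
pi = nucleotide_diversity([1, 2], n_chrom=4, seq_len=10)
print(round(pi, 4))
```

In the toy example, 3 of the 6 haplotype pairs differ at the first SNP and 4 of 6 at the second, giving 7/6 differences over 10 sites.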

Genome-wide genotype-phenotype association analysis

Three independent GWAS were conducted using the full panel (672 samples, 361,191 SNPs), the Indica subpanel (426 samples, 334,935 SNPs) and the Japonica subpanel (211 samples, 122,881 SNPs). Thirteen of the 20 traits were suitable for GWAS based on their variance (coefficient of variation, CV < 56% for the full panel). The full list of phenotypic measurements is available in Table S3. We found 643 significant phenotype-genotype associations. These associations were organised into 21 QTLs (Table 1, Table S7). The GWAS Manhattan and Quantile-Quantile plots are available in Fig. S17 and Fig. S18. The QTLs ranged in size from 41 kb (16_FP) to 3,148 kb (5_GS). The 21 QTLs contained 1,730 genes, covered a total of 11 Mbp over ten chromosomes, and contained 453 SNPs with a significant association to a trait in at least one diversity panel (Fig. 5). The list of genes within each QTL is available in Table S8. Functional enrichment was found within nine of the QTLs (Table S9).
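The trait filter mentioned above (CV < 56% in the full panel) can be sketched as follows. This is an illustration under assumptions: the coefficient of variation is computed with the sample standard deviation and missing scores are simply dropped, neither of which is specified in this section. The function names, trait names and values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent, ignoring missing scores encoded as None."""
    scored = [v for v in values if v is not None]
    return 100.0 * stdev(scored) / mean(scored)


def select_traits(trait_values, max_cv=56.0):
    """Return the names of traits whose CV is below max_cv (percent)."""
    return [name for name, values in trait_values.items()
            if coefficient_of_variation(values) < max_cv]


# Hypothetical scores for two traits (not data from Table S3).
traits = {
    "grain_length": [5.5, 6.0, 6.5, None, 7.0, 8.0],
    "culm_number": [2, 3, 10, 25, 4],
}
selected = select_traits(traits)
print(selected)
```

Here the tightly distributed trait passes the filter, while the trait dominated by a few extreme scores (CV above 100%) is excluded.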

Seventeen QTLs were identified in the full diversity panel significantly associated with eight traits: grain length, grain width, grain length-to-width ratio, leaf width, panicle length, floret pubescence, heading date and internode diameter. A further 4 QTLs associated with grain length and grain width were observed only in the Japonica subpanel. Three of the QTLs, which were found in the full panel, were also observed in the Indica subpanel.

The set of 3.8M SNPs (see methods), representing one SNP every 99 bases, was annotated based on the potential effect of each SNP on protein function using SnpEff (Table S12). 526,138 (4.79%) of the SNPs were in genes. There were 21,639 (0.197%) SNPs in 11,125 genes classified as having a putative “High impact” effect (e.g. exon changes, frameshifts, gene fusions or rearrangements, protein structural changes, etc.). Following additional minor allele frequency (MAF) filtering, there were 11,906 “High impact” SNPs in 7,396 genes in the Indica dataset (MAF 5%, 2,027,294 SNPs) and 6,240 “High impact” SNPs in 4,439 genes in the Japonica dataset (MAF 5%, 1,125,716 SNPs), of which 2,818 were present in both Indica and Japonica.
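The MAF filter applied above can be illustrated with a minimal sketch. It assumes diploid genotypes coded as the number of alternate alleles (0, 1 or 2) with missing calls as None, and it keeps sites with MAF of at least 5%; whether the study's cutoff was inclusive is not stated here. The function names, site identifiers and genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded 0/1/2 (alt-allele count); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0  # no information at this site
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(sites, min_maf=0.05):
    """Keep the site ids whose MAF is at least min_maf."""
    return [site_id for site_id, genotypes in sites.items()
            if minor_allele_frequency(genotypes) >= min_maf]


# Hypothetical sites (identifiers and genotypes are made up).
sites = {
    "snp_common": [0] * 9 + [2],            # alt frequency 2/20 = 0.10
    "snp_rare": [0] * 19 + [1],             # alt frequency 1/40 = 0.025
    "snp_mostly_alt": [2] * 8 + [None, 0],  # alt frequency 16/18, MAF 2/18
}
kept = filter_by_maf(sites)
print(kept)
```

Because the filter is applied within each subpanel, a variant that is rare across the full panel can still pass in Indica or Japonica alone, which is why the two subpanels retain different SNP sets.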

None of the 453 SNPs with a significant association was annotated as resulting in protein changes (“High impact” SNPs). However, “High impact” effects were identified in other SNPs within the QTLs. Among the total 1,730 genes in the 21 QTLs, we annotated 309 genes with “High impact” SNPs in the Indica subpanel and 248 genes with “High impact” SNPs in the Japonica subpanel, including 137 “High impact” SNPs common between the two sets. 129 of the 309 genes and 94 of the 248 genes had functional annotations in PhytoMine (Goodstein et al. 2012), but no functional overrepresentation was found for these sets of genes. In addition, we looked for overlaps with the QTLs in five published Vietnamese studies (Hoang et al. 2019a; Hoang et al. 2019b; Phung et al. 2016; Ta et al. 2018; To et al. 2019), which used 25,971 SNPs in 182 samples (164 in common). We found that 2_GL and 6_GS overlapped with QTLs for panicle morphological traits (Ta et al. 2018): 2_GL overlapped with QTL9 for secondary branch number and spikelet number (SBN and SpN), and 6_GS overlapped with QTL12 for secondary branch average length (SBL). 4_GW_jap overlapped with “q1” for longest leaf length (LLGHT) (Phung et al. 2016).
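The overlap check between QTLs from different studies reduces to an interval comparison on the same chromosome. A minimal sketch follows, assuming closed intervals on a shared reference coordinate system. The QTL labels reuse names from the text for readability only; every chromosome and coordinate below is a made-up placeholder, not a value from Table S7 or the cited studies.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: int
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlaps(a, b):
    """True when two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_overlaps(ours, published):
    """Return (our_name, published_name) for every overlapping pair."""
    return [(a.name, b.name) for a in ours for b in published if overlaps(a, b)]


# Placeholder coordinates for illustration only.
ours = [
    Interval("2_GL", 1, 1_000_000, 1_400_000),
    Interval("6_GS", 2, 5_000_000, 5_300_000),
    Interval("9_PL", 5, 200_000, 450_000),
]
published = [
    Interval("QTL9", 1, 1_350_000, 1_600_000),
    Interval("QTL12", 2, 5_300_000, 5_500_000),  # touches 6_GS at one base
    Interval("q1", 5, 900_000, 1_000_000),
]
pairs = find_overlaps(ours, published)
print(pairs)
```

The all-pairs comparison is adequate for a few dozen QTLs; sorting the intervals or using an interval tree would only matter at much larger scales.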

Indica and Japonica rice subpopulations within Vietnam

Whole-genome sequencing of 616 Vietnamese rice accessions, predominantly landraces, plus 56 Vietnamese genotypes previously sequenced by the 3K RGP, provides us with a diversity panel to clarify the structure of rice subpopulations in Vietnam. Here, we describe five Indica subpopulations and four Japonica subpopulations using phenotypic measurements from this study, passport information available from the Vietnamese National Genebank (PRC), and the agronomic and geographical annotations from Phung et al. (2014). In general terms, our population structure within Vietnam agreed with the previous study, which used a smaller number of markers and 182 samples, approximately a third of the size of our diversity panel (Phung et al. 2014). Subpopulation I1 is the most phenotypically distinct of the Indica subpopulations and shows typical phenotypes of ‘elite’ varieties, such as short height, high culm strength, long slender grains and a short growth-duration (less than 120 days from sowing to harvest). I1 accessions are grown throughout Vietnam in irrigated ecosystems but predominantly in the Mekong River Delta in the south of the country. Subpopulation I2 is mainly composed of long growth-duration (over 140 days), tall varieties grown in the rainfed lowland and irrigated ecosystems of the Mekong River Delta with a broad diversity of grain shapes. The remaining three Indica subpopulations are intermediate between I1 and I2 for growth-duration, height and culm strength, have a broad diversity of grain shapes, and are not grown in the Mekong River Delta. Subpopulation I3 has the highest proportion of upland varieties but also includes some lowland varieties from the “South Central Coast” region, many of which were classified as an independent subpopulation (I6) by Phung et al. (2014). Subpopulation I4 is mainly grown in the rainfed lowland and irrigated ecosystems of the Red River Delta. Subpopulation I5 is grown in a range of ecosystems but is concentrated around the North Central Coast and Red River Delta regions and is absent from the Northwest region, suggesting that it is the main lowland subpopulation.

The J1 and J3 subpopulations are closely related upland varieties and the J2 and J4 subpopulations are closely related lowland varieties. Subpopulation J1 is mostly composed of medium growth-duration upland varieties from the mountainous regions in the North of Vietnam, with long large grains typical of upland varieties. Subpopulation J2 is grown throughout Vietnam in a range of ecosystems but has consistently short grains. Subpopulation J3 is mainly grown in the “South Central Coast” region and has long large grains. Subpopulation J4 is primarily grown in the Red River Delta region in lowland and mangrove ecosystems and has short grains.

The drought tolerance of these subpopulations can be inferred from the root traits measured by Phung et al. (2016). The J1 and J3 upland subpopulations have deeper and thicker roots than the thinner shallower roots in the J2 and J4 subpopulations, which are grown in irrigated and mangrove ecosystems (Phung et al. 2016). This suggests that the J1 and J3 subpopulations, which are grown mainly in rainfed upland regions, would be more drought tolerant than the others. Similarly, the I3 subpopulation has the deepest and thickest roots. It would, therefore, be more drought tolerant than the I1 and to a lesser extent the I5 subpopulation, which has the thinnest, shallowest root systems.

A comprehensive analysis of the available 3,635 Asian cultivated rice genomes

The comprehensive analysis of the combined 3,635 Asian cultivated rice genomes obtained by joining our diversity panel with the full 3K RGP dataset resulted in the same assignment as the previous 3K RGP analysis in 84% of the cases. The largest differences were that the 3K RGP split the cA and XI-2 subpopulations, while our analysis split the GJ-tmp and rearranged the two XI-3B subpopulations into Ind-3.2, Ind-3.3 and Ind-3.4. The single temperate subpopulation (GJ-tmp) from the 3K RGP is further split in our study between the Jap-tmp.1 and Jap-tmp.2 subpopulations, with 88% of the samples in Jap-tmp.2 coming from Vietnam and forming the J2 subpopulation. These differences are likely due to changes in the distribution of genetic variants in subpopulations expanded within Vietnam.

Vietnamese rice subpopulations in the context of the 3K RGP Asian cultivated rice subpopulations

The Indica I1 subpopulation, which contains a high proportion of elite varieties, clustered with the XI-1B1 subpopulation of modern varieties. The Southeast Asian native subpopulations (XI-3B1 and XI-3B2) clustered with the I2 and I4 subpopulations, respectively. I3 appeared to include both XI-3B1 and XI-3B2 accessions. The subpopulations from East and South Asia (XI-1A, XI-2A, XI-2B, XI-3A) had no representatives from Vietnam and fell outside of the Vietnamese subpopulation clusters, as expected. Our four Vietnamese Japonica subpopulations relate to the tropical (J3), subtropical (J1) and temperate (J2 and J4) Japonica subpopulations from the 3K RGP according to their latitudinal origin from South to North Vietnam, respectively.

The most notable subpopulation is I5. When all 3,635 samples were considered, the subpopulation Ind-3.4 included half of the I3, all but one of the I4 and all of the I5 Vietnamese accessions, as well as half of the Southeast Asian native XI-3B2 genotypes from the 3K RGP. The remaining XI-3B2 were classified as Indica admix (Ind-adm). However, when only the Vietnamese samples were considered in the analysis, I5 clustered distinctly away from the I3 and I4 subpopulations (Fig. 2a) and included five accessions from the 3K RGP, which had very low shared ancestry (admixture components) with other 3K RGP samples. Notably, Vietnamese landrace IRIS 313-11384 (IRGC 127275) had no shared ancestry with any other Vietnamese 3K RGP genotypes. Remarkably, a recent study on genomic signals of admixture and alien introgression in a core collection of 948 accessions representative of the earlier Asian rice landraces (Santos et al. 2019) included IRIS 313-10751 (IRGC 127577) and IRIS 313-11383 (IRGC 127274) from the I5 subpopulation.

Genome-wide association analysis in Vietnamese rice landraces highlighted 21 QTL

We have also extended five published GWAS (Hoang et al. 2019a; Hoang et al. 2019b; Phung et al. 2016; Ta et al. 2018; To et al. 2019), which focussed on specific traits but used a smaller number of markers and a third of the samples from the Vietnamese dataset. We took a similar approach of carrying out the analysis on both the full panel and the Indica and Japonica subpanels. Showing the QTLs for the various traits together in Fig. 7 highlighted some interesting overlaps, notably the overlap of QTLs for panicle morphology with our QTLs for grain size (2_GL and 6_GS). These previous studies found QTLs in the full panel and in the Indica subpanel, but not in the Japonica subpanel. However, we found QTLs for grain size that were only present in the Japonica subpanel, and all the QTLs found in the Indica subpanel were also found in the full panel. These differences probably reflect our larger dataset. Comparing our results with the GWAS results from the 3K RGP (https://snp-seek.irri.org/) (Mansueto et al. 2017; Mansueto et al. 2016), the QTL 5_GS on chromosome 3 is in the same region as a marker associated with grain length, and the QTL 10_GS on chromosome 5 is in the same region as a marker associated with both grain width and grain length. Underlying these two QTLs, there are genes that have a putative role in the control of grain size in rice (Li et al. 2018), namely GS3 (Os03g0407400) in 5_GS and GSE5 (LOC_Os05g09520, Os05g0187500) in 10_GS. We also looked for genes with “High impact” SNPs in the QTLs. Relevant candidates include bip130 (Zhou et al. 2019) (LOC_Os05g02260, Os05g0113500), with a stop-gain mutation, underlying the QTL 9_PL for panicle length, and OsSPX-MFS3 (LOC_Os06g03860, Os06g0129400) (Wang et al. 2015), with a splice acceptor variant at the end of an intron, underlying the QTL 11_GL for grain length.

Subpopulation I5 constitutes an untapped resource of cultivated rice diversity

The analysis restricted to Vietnamese accessions allowed us to observe differences among the accessions within the country. Although 38 accessions (including two genotypes from the same accession in our study) are deposited in the PRC in Hanoi, and the remaining five accessions are available from the 3K RGP, there is limited information from the passport and phenotypic data to be able to understand the distinctiveness of this subpopulation fully. Further analysis of this subpopulation should encompass ‘Indica specific genes’ which may have been overlooked in our study as we used a Japonica reference.

Phung et al. (2014) described subpopulation I5 as “medium growth-duration accessions from various ecosystems of the North and South Central Coast regions, with rather small and non-glutinous grains”. Our I5 accessions are predominantly from the Red River Delta and contiguous coastal departments, the “North Central Coast” and “Northwest” administrative regions, but remarkably excluding the higher altitude Northwest region in the North, the more upper “Central Highlands”, as well as the whole Mekong River Delta in the south. This suggests that I5 accessions are common traditional low-yielding lowland varieties with specific environmental or culinary values.

Comparing the Vietnamese subpopulations to the fifteen Asian rice subpopulations identified from the 3K RGP highlighted the I5 subpopulation as a potential source of novel variation, as it forms a well-separated cluster. Subpopulation I5 originates from lowland areas such as the Red River Delta and adjacent regions. For the range of phenotypes measured in this study, the I5 subpopulation did not differ phenotypically from the other landraces, which have undergone breeding selection within Vietnam. However, compared to the ‘elite’ I1 subpopulation, I5 accessions have shorter grains, take longer to flower, and have lower culm strength and longer culms and leaves.

In this study, we generated a large genome-variation dataset for rice by sequencing 616 accessions from Vietnam and supplementing these with the data obtained for the 3K RGP. Using this resource, we incorporated the Vietnamese rice diversity within the population structure of the Asian cultivated rice. A GWAS analysis yielded the strongest associations for grain characteristics and weaker associations for a range of characteristics such as panicle length, heading date and leaf width. We used these associations together with published QTLs obtained using a subset of our accessions to give us an insight into traits underlying the regions identified as being under breeding selection.

Vietnam is currently experiencing increasing variability in the local climate due to global changes and the growing severity of the El Nino-Southern Oscillation phenomenon, creating notable inter-annual variations in precipitation ranging from severe drought to large-scale floods (Yen et al. 2019). The Mekong River Delta region is an essential region for rice production globally, but the adverse effects of salinisation have damaged rice production in recent decades (Tran et al. 2019). In addition, long-term trends in rainfall and temperature patterns have been identified in areas with a high proportion of agricultural land. Genomic studies on the locally adapted varieties and subpopulations will provide a potential source of novel alleles which can be exploited in rice breeding programs, such as the new generation of sustainable ‘Green Super Rice’ which are designed to have lower inputs, enhanced nutritional content and suitability for growing on marginal lands (Wing et al. 2018).

Sequencing of 616 accessions from Vietnam

We sequenced a total of 616 rice samples: 612 accessions from Vietnam and three reference accessions, namely Nipponbare, a temperate Japonica; Azucena, a tropical Japonica; and IR64, an Indica (2 samples). Of these, 511 accessions are available from the Vietnamese National Genebank (PRC) at http://csdl.prc.org.vn (Table S1). All Vietnamese native rice landraces were grown at Dai Dong Experimental Farm (Dai Dong commune, Thach That district, Hanoi, Vietnam) in 2015. Healthy seeds from one mature spikelet of an individual plant of each landrace were harvested and dried separately. The selected seeds (35-40 seeds/landrace) were then incubated, sown and grown for two weeks to collect leaf samples (30 g/sample) for genomic DNA extraction.

Total genomic DNA was extracted from young leaf tissue of each rice landrace using the Qiagen DNeasy kit (Qiagen, Germany). DNA concentration and purity of the samples were measured with a UV-VIS NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific) using the OD 260/280 and OD 260/230 absorbance ratios.

Sequencing was performed by Genomic Services at the Earlham Institute (Norwich, UK). Around 1 µg of genomic DNA from each sample was used to construct a sequencing library. For the 36 high coverage samples (prefix: SAM), the Illumina TruSeq DNA protocol was followed, and the samples were sequenced on the HiSeq 2000 for 100 cycles. For the low coverage samples (prefix: LIB), genomic DNA was sheared to 500 bp using the Covaris S2 Sonicator (Covaris and Life Technologies), and samples were processed using the KAPA High Throughput Library Prep Kit (Kapa Biosystems, MA, USA). The ends of the DNA were repaired for the ligation of barcoded adapters. The resulting libraries were quality checked, pooled, and quantified by qPCR. The libraries were sequenced on a HiSeq 2500 instrument following the manufacturer’s instructions.

Phenotyping

Phenotyping experiments were conducted at the Thach That Experimental Farm of AGI in 2014 and 2015 (Dai Dong commune, Thach That district, Hanoi, Vietnam). The seeds of each rice landrace were incubated in an oven at 45 °C for five days to break seed dormancy. All rice seeds were soaked in tap water for two days and incubated at 35-40 °C for four days for germination. The fully germinated seeds of each rice landrace were directly sown in a paddy field plot (1.5 m² in area). Fifteen days after sowing, 24 seedlings of each landrace were carefully transplanted by hand into field plots (2x4m²). Fertiliser and pesticide applications followed the conventional methods of rice cultivation in Vietnam. The phenotypic and agronomic characteristics were scored following the method of the International Rice Research Institute (IRRI 2014).

In addition, phenotypic data were available for eleven of the traits in 38 of the 56 genotypes sourced from the 3K-RGP dataset. These eleven traits were included in our analysis because we did not observe a significant difference (p-value > 0.07) between our dataset and the 3K-RGP dataset for the I2 subpopulation (Table S5).

Merging the SNPs called in the sequenced materials with the complete 3K RGP dataset

Raw sequencing reads were mapped to the Nipponbare reference genome Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0) using BWA-MEM with default parameters except for “-M -t 8”. Alignments were compressed, sorted and merged using samtools. Picard tools were then used to mark optical and PCR duplicates and add read group information. We used freebayes v1.1.0 for variant calling with default parameters. A total of 21.2 M variants were identified, of which 16.4 M were SNPs and 4.8 M were indels. The resulting VCF file was then filtered for biallelic SNPs with a minimum SNP quality of 30, resulting in 16.0 M variants. PLINK v1.9 was used to convert the VCF into PLINK BED format. These variants were then combined with the 3K-RGP 29 M biallelic SNPs dataset v1.0, downloaded as PLINK BED files from the “SNP-seek” database (https://snp-seek.irri.org), excluding variants on scaffolds and 26,553 SNPs that were flagged as triallelic upon merging, resulting in 36.9 M SNPs. The SNPs present in both datasets were then extracted and filtered using an identical approach to Wang et al. (2018), resulting in 5.9 M SNPs. For that, PLINK v1.9 “--hardy” (Purcell et al. 2007) was used to obtain observed and expected heterozygosity for 100,000 SNPs. We removed SNPs in which heterozygosity exceeded the Hardy–Weinberg expectation for a partially inbred species. The inbreeding coefficient (F) was estimated as the median value of “1−Hobs/Hexp”, where Hobs and Hexp are the observed and expected heterozygosity, over SNPs with “Hobs/Hexp <1” and a minor allele frequency >5%. The cut-off value was 0.479508 for the entire dataset of 3,622 samples. A further filtered set of 3.4 M SNPs was obtained by removing SNPs with >20% missing calls and MAF < 1%.
Finally, a core set of 361,279 SNPs was obtained with PLINK by LD pruning SNPs with a window size of 10 SNPs, window step of one SNP and r² threshold of 0.8, followed by another round of LD pruning with a window size of 50 SNPs, window step of one SNP and r² threshold of 0.8. Samples with more than 50% missing data in this core set were then removed, which dropped seven newly sequenced samples and one genotype from the 3K-RGP dataset.
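The inbreeding-coefficient estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function names and the toy per-SNP values are hypothetical, and it assumes that the reported cut-off is applied to the ratio Hobs/Hexp.

```python
from statistics import median


def estimate_inbreeding(h_obs, h_exp, maf, maf_min=0.05):
    """Median of 1 - Hobs/Hexp over SNPs with Hobs/Hexp < 1 and MAF > maf_min."""
    values = []
    for ho, he, freq in zip(h_obs, h_exp, maf):
        if he <= 0 or freq <= maf_min:
            continue  # skip monomorphic sites and rare alleles
        ratio = ho / he
        if ratio < 1:
            values.append(1 - ratio)
    if not values:
        raise ValueError("no SNP passes the MAF and Hobs/Hexp filters")
    return median(values)


def excess_het(h_obs, h_exp, cutoff):
    """Flag SNPs whose Hobs/Hexp ratio exceeds the cut-off (assumed usage)."""
    return [he > 0 and ho / he > cutoff for ho, he in zip(h_obs, h_exp)]


# Toy values (hypothetical), standing in for columns of a PLINK --hardy report.
h_obs = [0.02, 0.05, 0.30, 0.45, 0.01]
h_exp = [0.40, 0.50, 0.42, 0.40, 0.02]
maf = [0.30, 0.50, 0.30, 0.28, 0.01]

print(estimate_inbreeding(h_obs, h_exp, maf))
print(excess_het(h_obs, h_exp, cutoff=0.479508))
```

Using the median rather than the mean keeps the estimate robust to the minority of SNPs with inflated heterozygosity, which are the ones the filter is meant to remove.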

Population structure of the combined 3,635 samples

The population structure was analysed using the ADMIXTURE software (Alexander and Lange 2011) on the SNP set obtained in the previous section. First, ADMIXTURE was run from K=5 to K=15 in order to compare it with the analysis from IRRI (Wang et al. 2018; Zhou et al. 2020). For each K, ADMIXTURE was then run 50 times with varying random seeds. Each matrix was then annotated using the subpopulation assignment from the 3K-RGP nine subpopulations. Then, up to 10 Q-matrices belonging to the largest cluster were aligned using the CLUMPP software (Jakobsson and Rosenberg 2007) and averaged to produce the final matrix of admixture proportions. Finally, the group membership for each sample was defined by applying a threshold of ≥ 0.65 to this matrix. Samples with all admixture components <0.65 were classified as follows: if the sum of components for subpopulations within one of the major groups (Ind or Jap) was ≥ 0.65, the sample was classified as Ind-adm or Jap-adm, respectively; the remaining samples were deemed admixed (admix).
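The assignment rule described above can be written as a small function. The sketch below is illustrative only: the subpopulation labels (`ind1`, `jap1`, `aus`) and the mapping of each label to a major group are hypothetical placeholders, not the labels used in the study.

```python
def classify_sample(q, major_group, threshold=0.65):
    """Assign a sample from its admixture proportions.

    q: dict mapping subpopulation label -> admixture proportion.
    major_group: dict mapping subpopulation label -> "Ind", "Jap" or another group.
    """
    best = max(q, key=q.get)
    if q[best] >= threshold:
        return best  # one component reaches the threshold
    for major in ("Ind", "Jap"):
        total = sum(p for label, p in q.items() if major_group.get(label) == major)
        if total >= threshold:
            return f"{major}-adm"  # admixed within one major group
    return "admix"  # admixed between major groups


# Hypothetical labels and proportions.
major_group = {"ind1": "Ind", "ind2": "Ind", "jap1": "Jap", "jap2": "Jap", "aus": "Other"}

print(classify_sample({"ind1": 0.70, "ind2": 0.20, "jap1": 0.10}, major_group))
print(classify_sample({"ind1": 0.40, "ind2": 0.35, "jap1": 0.25}, major_group))
print(classify_sample({"ind1": 0.40, "jap1": 0.40, "aus": 0.20}, major_group))
```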

Multi-dimensional scaling analysis was performed using the ‘cmdscale’ function in R, using a distance matrix obtained in R using the ‘Dist’ function from the amap package (Lucas 2018). The resulting file was then passed to Curlywhirly (https://ics.hutton.ac.uk/curlywhirly/) and rgl v0.100.19 (https://r-forge.r-project.org/projects/rgl/) for visualisation.

Recalling the diversity panel with 723 samples

The 616 rice samples were mapped to the Japonica Nipponbare (IRGSP-1.0) reference with BWA-MEM using default parameters, duplicate reads were removed with Picard tools (v1.128), and the BAM files were merged using SAMtools v1.5 (Li et al. 2009). Variant calling was completed again on the merged BAM file with FreeBayes v1.0.2 (Garrison and Marth 2012) separately for each of the 12 chromosomes, but using the option “--min-coverage 10”. Over 6.3 M bi-allelic SNPs with a minimum allele count of ≥3, a quality value above 30 and missing in <50% of samples were obtained with VCFtools v0.1.13 (Danecek et al. 2011). BAM alignment files to the Nipponbare IRGSP 1.0 reference genome were downloaded from http://snp-seek.irri.org/ (Mansueto et al. 2017; Mansueto et al. 2016) for 107 selected samples. Alignment statistics are included in Table S12. These BAM files were merged, and variant calling was similarly completed using FreeBayes v1.0.2 (Garrison and Marth 2012) separately for each of the 12 chromosomes using the option “--min-coverage 10”, and filtered with VCFtools v0.1.13 as before to obtain 6.8 M bi-allelic SNPs with a minimum allele count of ≥3, a quality value above 30 and missing in <50% of samples. The two sets of 6.3 M and 6.8 M SNPs were merged using BCFtools v1.3.1 isec to obtain 4.4 M SNPs which were present in both sets and in at least 70% of samples. These 4.4 M SNPs were then filtered to remove positions which fell outside the expected level of heterozygosity for this dataset, as previously indicated. The resulting estimate of F for the 723 samples was 0.882, so a SNP whose heterozygosity is >5x higher than the most likely value for a given frequency and the dataset’s inbreeding rate was deemed to have an excessive number of heterozygotes. The cut-off value was 0.591, which resulted in 3.8 M SNPs passing this filter. A scatter plot indicating the SNPs that were kept and removed is shown in Fig. S15.
Missing data were imputed in this latest dataset using Beagle v4.1 with default parameters (Browning and Browning 2016). A comparison between the imputed and non-imputed SNP sets using PCA showed that imputation did not change the clustering of these 723 samples (Fig. S16). The 3.8M SNPs were subsequently filtered for minor allele frequency (MAF), linkage disequilibrium (LD pruning or filtering), and distance between polymorphisms (thinning) in different subsets of samples to obtain fourteen sets of SNPs that ranged from 59K to 3.8M SNPs, which were appropriate for the various downstream analyses described below (Table S11).
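The link between the estimate of F and the heterozygosity cut-off reported above can be checked with a short calculation. The sketch below assumes one reading of the rule, namely that the most likely heterozygosity of a partially inbred sample is Hexp × (1 − F), so that a fivefold excess corresponds to Hobs/Hexp > 5 × (1 − F). This interpretation and the function names are assumptions, not statements from the study.

```python
def het_ratio_cutoff(f_inbreeding, fold=5.0):
    """Cut-off on Hobs/Hexp implied by an inbreeding coefficient (assumed rule)."""
    return fold * (1.0 - f_inbreeding)


def implied_f(cutoff, fold=5.0):
    """Inbreeding coefficient implied by a reported cut-off (inverse of the above)."""
    return 1.0 - cutoff / fold


# F = 0.882 for the 723 samples; the reported cut-off was 0.591.
print(het_ratio_cutoff(0.882))
# Cut-offs reported for the 723-sample and the 3,622-sample datasets.
print(implied_f(0.591))
print(implied_f(0.479508))
```

Under this reading, F = 0.882 gives a cut-off of about 0.59, which agrees with the reported 0.591 once rounding of F is taken into account, and the cut-off of 0.479508 used for the combined dataset would correspond to F of about 0.904.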

Population structure and diversity analysis for the panel of 672 Vietnamese samples

SNP sets were filtered for MAF 5%, followed by LD filtering using PLINK “--indep-pairwise 50 10 0.2”, with further thinning if required. We ran STRUCTURE v2.3.5 (Pritchard et al. 2000) using the default admixture model parameters; each run consisted of 10,000 burn-in iterations followed by 50,000 data collection iterations. STRUCTURE was run using K=2 for the 616 samples using SNP set 1 (163,393 SNPs). Samples with admixture components <0.75 were classified as admixed, and the remaining samples were classified as Indica or Japonica. STRUCTURE was run varying the assumed number of genetic groups (K) from 3 to 10 with three runs per K value for the 672 Vietnamese samples (SNP set 9 – 80,000 SNPs), and from 1 to 8 with ten runs per K value for the 426 Indica subtypes from Vietnam (SNP set 10 – 108,420 SNPs) and the 211 Japonica subtypes from Vietnam (SNP set 11 – 59,815 SNPs). The output files were visualised using the R package POPHELPER v2.2.7 (Francis 2017), including the calculation of the number of clusters (K) using the Evanno method (Evanno et al. 2005; Zheng et al. 2012). Using the combined-merged CLUMPP output from POPHELPER, Indica (K=5) and Japonica (K=4) samples were classified into Indica I1 to I5 and Japonica J1 to J4 subpopulations using a threshold of ≥ 0.6, with the remaining samples being classified as mixed (Im and Jm). The principal component analysis (PCA) was performed using the R package SNPRelate v1.16.0 (Zheng et al. 2012) using method = ‘biallelic’. Nucleotide diversity (π) was measured for each of the subpopulations with VCFtools v0.1.13 using 100-kbp windows and a step size of 10 kbp.
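As an illustration of the Evanno method mentioned above, the following sketch computes ΔK from replicate STRUCTURE log-likelihoods. It is not the POPHELPER implementation: the second difference is taken on the mean log-likelihood per K, a common simplification, and the log-likelihood values are invented for the example.

```python
from statistics import mean, stdev


def evanno_delta_k(loglik):
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)).

    loglik: dict mapping K -> list of ln P(X|K) values from replicate runs.
    L(K) is the mean over replicates; the first and last K have no Delta K.
    """
    means = {k: mean(v) for k, v in loglik.items()}
    delta = {}
    for k in sorted(loglik):
        if k - 1 not in means or k + 1 not in means:
            continue  # needs both neighbouring K values
        sd = stdev(loglik[k])
        if sd == 0:
            continue  # Delta K is undefined without variation between runs
        delta[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / sd
    return delta


# Invented log-likelihoods for three replicate runs per K.
runs = {
    2: [-1000, -1002, -998],
    3: [-800, -801, -799],
    4: [-780, -790, -770],
    5: [-775, -765, -785],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

The K with the largest ΔK marks the sharpest change in the slope of the likelihood curve, which is why the method cannot evaluate the smallest or largest K tested.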

Determining the effect of SNPs

The effects of all bi-allelic SNPs (low, medium and high effects) on the genome were determined based on the pre-built release 7.0 annotation from the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/) using SnpEff (Cingolani et al. 2012) release 4.3, with default parameters. The complete set of 3,750,621 SNPs (SNP set 2), which contained on average one variant every 99 bases, was annotated. Using sequence ontology terms, the effect of each SNP was classified as described by SnpEff. A summary of the SNP effect analysis is available in Table S20.

Genome-wide association analysis

Three independent analyses were conducted using the full panel (672 samples, 361,191 SNPs), the Indica subpanel (426 samples, 334,935 SNPs) and the Japonica subpanel (211 samples, 122,881 SNPs), SNP sets 12, 13 and 14 respectively (Table S11). The GWAS analysis was performed using the R package Genome Association and Prediction Integrated Tool (GAPIT) version 3.0 (Lipka et al. 2012; Tang et al. 2016). The covariate matrix was generated in STRUCTURE. We used the combined-merged output from POPHELPER for the full panel (K=8), the Indica subpanel (K=5) and the Japonica subpanel (K=4). The covariate matrix and the kinship calculated in GAPIT were included in the GWAS model to control for false positives. The SUPER (Settlement of MLM Under Progressively Exclusive Relationship) method (Wang et al. 2014) integrated into GAPIT, designed to increase the statistical power, was used to perform the association mapping analysis. The SUPER method was implemented in GAPIT by setting the parameters “sangwich.top” and “sangwich.bottom” to CMLM and SUPER, respectively. A quantile-quantile (Q–Q) plot was used to check if the model was correctly accounting for both confounding variables. Peaks with -log₁₀(p-value) ≥ 8.0 were declared significant associations. The genes lying within the QTL regions were extracted and subjected to enrichment analysis using PhytoMine implemented within Phytozome (Goodstein et al. 2012; https://phytozome.jgi.doe.gov/) for Gene Ontology, Protein Domain and Pathway enrichment using a max p-value of 0.05 with Bonferroni correction.
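To put the significance threshold in context, the sketch below compares the fixed -log₁₀(p-value) threshold of 8.0 with the Bonferroni threshold at α = 0.05 for the marker counts of the three panels, and selects SNPs that pass the fixed threshold. The comparison is our own illustration and is not stated in the study; the SNP identifiers and p-values are hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """-log10 of the Bonferroni-corrected p-value threshold for n_tests markers."""
    return -math.log10(alpha / n_tests)


def significant_snps(pvalues, threshold=8.0):
    """Return SNP identifiers whose -log10(p-value) reaches the threshold."""
    return [snp for snp, p in pvalues.items() if p > 0 and -math.log10(p) >= threshold]


# Marker counts for the three panels, as given in the text.
panels = {"full": 361_191, "indica": 334_935, "japonica": 122_881}
for name, n_snps in panels.items():
    print(name, round(bonferroni_threshold(n_snps), 2))

# Hypothetical SNP identifiers and p-values.
print(significant_snps({"snp_a": 5e-9, "snp_b": 1e-7, "snp_c": 2e-12}))
```

For all three marker counts the Bonferroni threshold falls below 8.0, so the fixed threshold is the more conservative of the two in this setting.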

Acknowledgements

We thank Professor Giles Oldroyd for his contributions to the conception of this project. We are grateful for the support from Dr. Nelzo Ereful and Matt Heaton during outreach activities in Vietnam; Dr. Luca Venturi, Dr. Ricardo Ramirez Gonzalez and Dr. Graham Etherington for their support during summer training activities in the UK; and Dr. Chris Watkins, Dr. Helen Chapman and the Genomics Pipelines team at the Earlham Institute for the sequencing support.

Funding

This work was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) through the grants BB/N013735/1 (Newton Fund) and BBS/E/T/000PR9818, and the Newton Fund Institutional Links (Project 172732508), which is managed by the British Council.

Availability of data

All sequence data used in this manuscript have been deposited as study PRJEB36631 in the European Nucleotide Archive.

Author contributions

TDK, KHT, AH, SD, LHH, MC and JDV designed and conceived the research. TDK, KHT, TDD, NTPD, NTK, DTTH, NTD, KTD, CNP, TTT, NTT, HDT, NTT, HTG, TKN, CDT, SVL, LTN, NVG and LHH performed the phenotyping and laboratory experiments. JH and BS performed the data analysis with assistance from TDD, NTPD, DTTH, NTD, KTD, NTT, LTN, TDX, MC and JDV. JH, BS and JDV wrote the paper. All authors read and approved the final manuscript.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that there is no conflict of interest regarding the publication of this article.

Abbai R, Singh VK, Nachimuthu VV, Sinha P, Selvaraj R, Vipparla AK, Singh AK, Singh UM, Varshney RK, Kumar A (2019) Haplotype analysis of key genes governing grain yield and quality traits across 3K RG panel reveals scope for the development of tailor-made rice with enhanced genetic gains. Plant Biotechnol J 17:1612–1622
Alexander DH, Lange K (2011) Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics 12:246
Browning BL, Browning SR (2016) Genotype Imputation with Millions of Reference Samples. Am J Hum Genet 98:116–126
Chen R, Cheng Y, Han S, Van Handel B, Dong L, Li X, Xie X (2017) Whole genome sequencing and comparative transcriptome analysis of a novel seawater adapted, salt-resistant rice cultivar - sea rice 86. BMC Genom 18:655
Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group (2011) The variant call format and VCFtools. Bioinformatics 27:2156–2158
de Freitas GM, Thomas J, Liyanage R, Lay JO, Basu S, Ramegowda V, do Amaral MN, Benitez LC, Bolacel Braga EJ, Pereira A (2019) Cold tolerance response mechanisms revealed through comparative analysis of gene and protein expression in multiple rice genotypes. PLoS One 14:e0218019
Delteil A, Blein M, Faivre-Rampant O, Guellim A, Estevan J, Hirsch J, Bevitori R, Michel C, Morel JB (2012) Building a mutant resource for the study of disease resistance in rice reveals the pivotal role of several genes involved in defence. Mol Plant Pathol 13:72–82
Du H, Liu L, You L, Yang M, He Y, Li X, Xiong L (2011) Characterization of an inositol 1,3,4-trisphosphate 5/6-kinase gene that is essential for drought and salt stress responses in rice. Plant Mol Biol 77:547–563
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611–2620
Francis RM (2017) Pophelper: an R package and web app to analyse and visualize population structure. Mol Ecol Resour 17:27–32
Fukuoka S, Alpatyeva NV, Ebana K, Luu NT, Nagamine T (2003) Analysis of Vietnamese rice germplasm provides an insight into Japonica rice differentiation. Plant Breeding 122:497–502
Garrison E, Marth G (2012) Haplotype-based variant detection from short-read sequencing
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40:D1178–D1186
GSO-Database (2017) General Statistic Office in Vietnam, Database. (accession May 15, 2020)
Hoang GT, Gantet P, Nguyen KH, Phung NTP, Ha LT, Nguyen TT, Lebrun M, Courtois B, Pham XH (2019a) Genome-wide association mapping of leaf mass traits in a Vietnamese rice landrace panel. PLoS One 14:e0219274
Hoang GT, Van Dinh L, Nguyen TT, Ta NK, Gathignol F, Mai CD, Jouannic S, Tran KD, Khuat TH, Do VN, Lebrun M, Courtois B, Gantet P (2019b) Genome-wide Association Study of a Panel of Vietnamese Rice Landraces Reveals New QTLs for Tolerance to Water Deficit During the Vegetative Phase. Rice (N Y) 12:4
Institute IRRI IRR (2014) Homepage of the International Rice Research Institute, Philippine 2002 (accession May 15, 2020)
Jain M, Nijhawan A, Arora R, Agarwal P, Ray S, Sharma P, Kapoor S, Tyagi AK, Khurana JP (2007) F-box proteins in rice. Genome-wide analysis, classification, temporal and spatial gene expression during panicle and seed development, and regulation by light and abiotic stress. Plant Physiol 143:1467–1483
Jakobsson M, Rosenberg NA (2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23:1801–1806
Kim EH, Kim YS, Park SH, Koo YJ, Choi YD, Chung YY, Lee IJ, Kim JK (2009) Methyl jasmonate reduces grain yield by mediating stress signals to alter spikelet development in rice. Plant Physiol 149:1751–1760
Kim SK, Park HY, Jang YH, Lee KC, Chung YS, Lee JH, Kim JK (2016) OsNF-YC2 and OsNF-YC4 proteins inhibit flowering under long-day conditions in rice. Planta 243:563–576
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079
Li N, Xu R, Duan P, Li Y (2018) Control of grain size in rice. Plant Reprod 31:237–251
Lipka AE, Tian F, Wang Q, Peiffer J, Li M, Bradbury PJ, Gore MA, Buckler ES, Zhang Z (2012) GAPIT: genome association and prediction integrated tool. Bioinformatics 28:2397–2399
Lira-Ruan V, Ruiz-Kubli M, Arredondo-Peter R (2011) Expression of non-symbiotic hemoglobin 1 and 2 genes in rice (Oryza sativa) embryonic organs. Commun Integr Biol 4:457–458
Liu X, Lan J, Huang Y, Cao P, Zhou C, Ren Y, He N, Liu S, Tian Y, Nguyen T, Jiang L, Wan J (2018) WSL5, a pentatricopeptide repeat protein, is essential for chloroplast biogenesis in rice under cold stress. J Exp Bot 69:3949–3961
Lombardo F, Kuroki M, Yao SG, Shimizu H, Ikegaya T, Kimizu M, Ohmori S, Akiyama T, Hayashi T, Yamaguchi T, Koike S, Yatou O, Yoshida H (2017) The superwoman1-cleistogamy2 mutant is a novel resource for gene containment in rice. Plant Biotechnol J 15:97–106
Lucas A (2018) amap: Another Multidimensional Analysis. CRAN package, https://CRAN.R-project.org/package=amap
Lyu J, Li B, He W, Zhang S, Gou Z, Zhang J, Meng L, Li X, Tao D, Huang W, Hu F, Wang W (2014) A genomic perspective on the important genetic mechanisms of upland adaptation of rice. BMC Plant Biol 14:160
Macovei A, Vaid N, Tula S, Tuteja N (2012) A new DEAD-box helicase ATP-binding protein (OsABP) from rice is responsive to abiotic stress. Plant Signal Behav 7:1138–1143
Mansueto L, Fuentes RR, Borja FN, Detras J, Abriol-Santos JM, Chebotarov D, Sanciangco M, Palis K, Copetti D, Poliakov A, Dubchak I, Solovyev V, Wing RA, Hamilton RS, Mauleon R, McNally KL, Alexandrov N (2017) Rice SNP-seek database update: new SNPs, indels, and queries. Nucleic Acids Res 45:D1075–D1081
Mansueto L, Fuentes RR, Chebotarov D, Borja FN, Detras J, Abriol-Santos JM, Palis K, Poliakov A, Dubchak I, Solovyev V, Hamilton RS, McNally KL, Alexandrov N, Mauleon R (2016) SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa. Current Plant Biology 7–8:16–25
Mukherjee S, Sengupta S, Mukherjee A, Basak P, Majumder AL (2019) Abiotic stress regulates expression of galactinol synthase genes post-transcriptionally through intron retention in rice. Planta 249:891–912
Nguyen DK, Ancev T, Randall A (2019) Evidence of climatic change in Vietnam: Some implications for agricultural production. J Environ Manage 231:524–545
Ouyang J, Cai Z, Xia K, Wang Y, Duan J, Zhang M (2010) Identification and analysis of eight peptide transporter homologs in rice. Plant Sci 179:374–382
Park SH, Chung PJ, Juntawong P, Bailey-Serres J, Kim YS, Jung H, Bang SW, Kim YK, Do Choi Y, Kim JK (2012) Posttranscriptional control of photosynthetic mRNA decay under stress conditions requires 3' and 5' untranslated regions and correlates with differential polysome association in rice. Plant Physiol 159:1111–1124
Parker L, Bourgoin C, Martinez-Valle A, Laderach P (2019) Vulnerability of the agricultural sector to climate change: The development of a pan-tropical Climate Risk Vulnerability Assessment to inform sub-national decision making. PLoS One 14:e0213641
Peng B, Kong H, Li Y, Wang L, Zhong M, Sun L, Gao G, Zhang Q, Luo L, Wang G, Xie W, Chen J, Yao W, Peng Y, Lei L, Lian X, Xiao J, Xu C, Li X, He Y (2014) OsAAP6 functions as an important regulator of grain protein content and nutritional quality in rice. Nat Commun 5:4847
Phung NT, Mai CD, Hoang GT, Truong HT, Lavarenne J, Gonin M, Nguyen KL, Ha TT, Do VN, Gantet P, Courtois B (2016) Genome-wide association mapping for root traits in a panel of rice accessions from Vietnam. BMC Plant Biol 16:64
Phung NT, Mai CD, Mournet P, Frouin J, Droc G, Ta NK, Jouannic S, Le LT, Do VN, Gantet P, Courtois B (2014) Characterization of a panel of Vietnamese rice varieties using DArT and SNP markers for association mapping purposes. BMC Plant Biol 14:371
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575
Saeng-ngam S, Takpirom W, Buaboocha T, Chadchawan S (2012) The role of the OsCam1-1 salt stress sensor in ABA accumulation and salt tolerance in rice. Journal of Plant Biology 55:198–208
Sakamoto T, Kitano H, Fujioka S (2017) Rice ERECT LEAF 1 acts in an alternative brassinosteroid signaling pathway independent of the receptor kinase OsBRI1. Plant Signal Behav 12:e1396404
Santos JD, Chebotarov D, McNally KL, Bartholome J, Droc G, Billot C, Glaszmann JC (2019) Fine Scale Genomic Signals of Admixture and Alien Introgression among Asian Rice Landraces. Genome Biol Evol 11:1358–1373
Son NY, Sebastian BT LS (2018) Development of Climate-Related Risk Maps and Adaptation Plans (Climate Smart MAP) for Rice Production in Vietnam’s Mekong River Delta. Agriculture and Food Security (CCAFS) Working Paper No. 220, pp. 1–30
Sumiyoshi M, Nakamura A, Nakamura H, Hakata M, Ichikawa H, Hirochika H, Ishii T, Satoh S, Iwai H (2013) Increase in cellulose accumulation and improvement of saccharification by overexpression of arabinofuranosidase in rice. PLoS One 8:e78269
Ta KN, Khong NG, Ha TL, Nguyen DT, Mai DC, Hoang TG, Phung TPN, Bourrie I, Courtois B, Tran TTH, Dinh BY, La TN, Do NV, Lebrun M, Gantet P, Jouannic S (2018) A genome-wide association study using a Vietnamese landrace panel of rice (Oryza sativa) reveals new QTLs controlling panicle morphological traits. BMC Plant Biol 18:282
Tanaka W, Toriba T, Hirano HY (2017) Three TOB1-related YABBY genes are required to maintain proper function of the spikelet and branch meristems in rice. New Phytol 215:825–839
Tang Y, Liu X, Wang J, Li M, Wang Q, Tian F, Su Z, Pan Y, Liu D, Lipka AE, Buckler ES, Zhang Z (2016) GAPIT Version 2: An Enhanced Integrated Tool for Genomic Association and Prediction. Plant Genome 9(2):1–9
To HTM, Nguyen HT, Dang NTM, Nguyen NH, Bui TX, Lavarenne J, Phung NTP, Gantet P, Lebrun M, Bellafiore S, Champion A (2019) Unraveling the Genetic Elements Involved in Shoot and Root Growth Regulation by Jasmonate in Rice Using a Genome-Wide Association Study. Rice (N Y) 12:69
Tran TV, Tran DX, Myint SW, Huang CY, Pham HV, Luu TH, Vo TMT (2019) Examining spatiotemporal salinity dynamics in the Mekong River Delta using Landsat time series imagery and a spatial regression approach. Sci Total Environ 687:1087–1097
Tu B, Hu L, Chen W, Li T, Hu B, Zheng L, Lv Z, You S, Wang Y, Ma B, Chen X, Qin P, Li S (2015) Disruption of OsEXO70A1 Causes Irregular Vascular Bundles and Perturbs Mineral Nutrient Assimilation in Rice. Sci Rep 5:18609
Vemanna RS, Bakade R, Bharti P, Kumar MKP, Sreeman SM, Senthil-Kumar M, Makarla U (2019) Cross-Talk Signaling in Rice During Combined Drought and Bacterial Blight Stress. Front Plant Sci 10:193
Wang C, Yue W, Ying Y, Wang S, Secco D, Liu Y, Whelan J, Tyerman SD, Shou H (2015) Rice SPX-Major Facility Superfamily3, a Vacuolar Phosphate Efflux Transporter, Is Involved in Maintaining Phosphate Homeostasis in Rice. Plant Physiol 169:2822–2831
Wang Q, Tian F, Pan Y, Buckler ES, Zhang Z (2014) A SUPER powerful method for genome wide association study. PLoS One 9:e107684
Wang W, Mauleon R, Hu Z, Chebotarov D, Tai S, Wu Z, Li M, Zheng T, Fuentes RR, Zhang F, Mansueto L, Copetti D, Sanciangco M, Palis KC, Xu J, Sun C, Fu B, Zhang H, Gao Y, Zhao X, Shen F, Cui X, Yu H, Li Z, Chen M, Detras J, Zhou Y, Zhang X, Zhao Y, Kudrna D, Wang C, Li R, Jia B, Lu J, He X, Dong Z, Xu J, Li Y, Wang M, Shi J, Li J, Zhang D, Lee S, Hu W, Poliakov A, Dubchak I, Ulat VJ, Borja FN, Mendoza JR, Ali J, Li J, Gao Q, Niu Y, Yue Z, Naredo MEB, Talag J, Wang X, Li J, Fang X, Yin Y, Glaszmann JC, Zhang J, Li J, Hamilton RS, Wing RA, Ruan J, Zhang G, Wei C, Alexandrov N, McNally KL, Li Z, Leung H (2018) Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557:43–49
Wing RA, Purugganan MD, Zhang Q (2018) The rice genome revolution: from an ancient grain to Green Super Rice. Nat Rev Genet 19:505–517
Xie W, Wang G, Yuan M, Yao W, Lyu K, Zhao H, Yang M, Li P, Zhang X, Yuan J, Wang Q, Liu F, Dong H, Zhang L, Li X, Meng X, Zhang W, Xiong L, He Y, Wang S, Yu S, Xu C, Luo J, Li X, Xiao J, Lian X, Zhang Q (2015) Breeding signatures of rice improvement revealed by a genomic variation map from a large germplasm collection. Proc Natl Acad Sci U S A 112:E5411–E5419
Yang S-h, Niu X-l, Luo D, Chen C-d, Yu X, Tang W, Lu B-r, Liu Y-s (2012) Functional Characterization of an Aldehyde Dehydrogenase Homologue in Rice. Journal of Integrative Agriculture 11:1434–1444
Yen BT, Quyen NH, Duong TH, Van Kham D, Amjath-Babu TS, Sebastian L (2019) Modeling ENSO impact on rice production in the Mekong River Delta. PLoS One 14:e0223884
Yi J, Kim SR, Lee DY, Moon S, Lee YS, Jung KH, Hwang I, An G (2012) The rice gene DEFECTIVE TAPETUM AND MEIOCYTES 1 (DTM1) is required for early tapetum development and meiosis. Plant J 70:256–270
Ying Y, Yue W, Wang S, Li S, Wang M, Zhao Y, Wang C, Mao C, Whelan J, Shou H (2017) Two h-Type Thioredoxins Interact with the E2 Ubiquitin Conjugase PHO2 to Fine-Tune Phosphate Homeostasis in Rice. Plant Physiol 173:812–824
Yu M, Yau CP, Yip WK (2017) Differentially localized rice ethylene receptors OsERS1 and OsETR2 and their potential role during submergence. Plant Signal Behav 12:e1356532
Yuenyong W, Chinpongpanich A, Comai L, Chadchawan S, Buaboocha T (2018) Downstream components of the calmodulin signaling pathway in the rice salt stress response revealed by transcriptome profiling and target identification. BMC Plant Biol 18:335
Zang D, Li H, Xu H, Zhang W, Zhang Y, Shi X, Wang Y (2016) An Arabidopsis Zinc Finger Protein Increases Abiotic Stress Tolerance by Regulating Sodium and Potassium Homeostasis, Reactive Oxygen Species Scavenging and Osmotic Potential. Front Plant Sci 7:1272
Zhang B, Wang X, Zhao Z, Wang R, Huang X, Zhu Y, Yuan L, Wang Y, Xu X, Burlingame AL, Gao Y, Sun Y, Tang W (2016) OsBRI1 Activates BR Signaling by Preventing Binding between the TPR and Kinase Domains of OsBSK3 via Phosphorylation. Plant Physiol 170:1149–1161
Zhang T, Li R, Xing J, Yan L, Wang R, Zhao Y (2018) The YUCCA-Auxin-WOX11 Module Controls Crown Root Development in Rice. Front Plant Sci 9:523
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012) A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28:3326–3328
Zhou X, Ni L, Liu Y, Jiang M (2019) Phosphorylation of bip130 by OsMPK1 regulates abscisic acid-induced antioxidant defense in rice. Biochem Biophys Res Commun 514:750–755
Zhou Y, Chebotarov D, Kudrna D, Llaca V, Lee S, Rajasekar S, Mohammed N, Al-Bader N, Sobel-Sorenson C, Parakkal P, Arbelaez LJ, Franco N, Alexandrov N, Hamilton NRS, Leung H, Mauleon R, Lorieux M, Zuccolo A, McNally K, Zhang J, Wing RA (2020) A platinum standard pan-genome resource that represents the population structure of Asian rice. Sci Data 7:113

Table 1: The 21 QTLs identified for plant description traits in the full panel and in the Indica and Japonica subpanels. For each QTL, the table gives the panel(s) in which significant associations were detected (significance threshold: -log₁₀(p value) ≥ 8.0), the highest level of significance across all panels, any overlap with selected regions in the four Japonica or five Indica subpopulations, and any overlap with published QTLs for Vietnamese rice populations or for the 3K RGP.

QTL Name	Trait	Chrom	Panel	Segment position (bp)	Sig SNPs nb	min P.value	Number of genes	FST I5 vs I2,I3,I4 ^	Overlap with selected regions	Overlap with QTLs	Enrichment ID (Phytozome) *	Enrichment description (Phytozome) *
1_DI	Diameter_Internode	2	FP	6,805,273 - 6,923,410	3	3.12E-08	18	0.14	I2,I4
2_GL	Grain_Length	2	FP & Jap	15,480,976 - 16,798,043	27	2.69E-12	197	0.01	J1,J2,J3,J4	panicle morphology [Ta 2018]	IPR003480	Transferase
3_GL_jap	Grain_Length	2	Jap	35,638,527 - 35,927,940	4	3.16E-11	58	0.12	I3
4_GW_jap	Grain_Width	3	Jap	3,334,516 - 3,532,506	3	5.26E-09	34	0.05		Leaf Length [Phung 2016]
5_GS	Grain_Length	3	FP & Ind & Jap	16,520,656 - 16,908,475	30	9.26E-17	53	0.10		grain width and grain length [Mansueto 2017, Li 2018]
6_GS	Grain_Width	3	FP & Jap	17,686,248 - 20,833,777	355	2.02E-13	471	0.18	J2	panicle morphology [Ta 2018]	PWY-861	dhurrin biosynthesis
7_GL	Grain_Length	4	FP	12,043,539 - 13,108,767	14	5.51E-11	167	0.06	J2		IPR001283	Cysteine-rich secretory protein
8_HD	Heading_Date	4	FP	16,165,354 - 16,384,087	4	1.72E-08	37	0.10	I4		PWY-5733, PWY-6275	Terpenoid Biosynthesis
9_PL	Panicle_Length	5	FP	667,557 - 767,557	2	6.17E-08	20	0.38	I5
10_GS	Grain_Width	5	FP & Ind	4,802,345 - 5,383,914	57	2.40E-11	75	0.18		grain width and grain length [Mansueto 2017, Li 2018]
11_GL	Grain_Length	6	FP & Ind	1,561,006 - 1,664,716	16	2.68E-10	17	0.17
12_GL	Grain_Length	6	FP & Ind	6,680,831 - 7,190,137	51	1.81E-14	78	0.17	I4,I5		GO:0071554	cell wall organization or biogenesis
13_GL	Grain_Length	6	FP	7,453,914 - 7,553,914	2	5.90E-08	13	0.11	I2		PWY-4203	volatile benzenoid biosynthesis I
14_PL	Panicle_Length	6	FP	20,400,110 - 20,500,110	2	2.72E-08	13	0.40	I5
15_GL_jap	Grain_Length	7	Jap	11,519,294 - 12,296,525	3	5.76E-08	99	0.04	J4
16_FP	Floret_Pubescence	8	FP	18,004,654 - 18,104,654	2	1.64E-08	17	0.14	J1
17_FP	Floret_Pubescence	8	FP	26,175,268 - 26,275,268	2	6.06E-08	15	0.05			IPR001607	Zinc finger, UBP-type
18_FP	Floret_Pubescence	9	FP	6,656,837 - 7,940,621	51	7.23E-12	168	0.16	I4		IPR004158	DUF247
19_HD	Heading_Date	9	FP	14,067,272 - 14,807,406	7	6.86E-09	115	0.10	I2		GO:0002438	response to stimulus
20_GW_jap	Grain_Width	10	Jap	1,098,998 - 1,404,807	6	3.61E-12	52	0.21
21_LW	Leaf_Width	12	FP	17,445,137 - 17,561,823	2	2.14E-09	13	0.08

* For the full list of enriched protein domains, Gene Ontology Biological Processes and MetaCyc pathways (maximum p-value 0.05 with Bonferroni correction), and the underlying genes, see Table S9. ^ FST between the 43 accessions in subpopulation I5 and the 190 accessions in subpopulations I2, I3 and I4.

FP: full panel; Ind: Indica subpanel; Jap: Japonica subpanel; Chrom: chromosome; Sig SNPs nb: number of significant SNPs. References: Ta 2018 (Ta et al. 2018), Phung 2016 (Phung et al. 2016), Mansueto 2017 (Mansueto et al. 2017), Li 2018 (Li et al. 2018).
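
As a reading aid for Table 1, the short Python sketch below converts a "min P.value" entry to the -log₁₀ scale used for the significance threshold and computes the length of a QTL segment from its "Segment position (bp)" entry. The helper names `parse_segment` and `neg_log10` are hypothetical and are not part of the published analysis; the input values are copied from the 5_GS row of the table.

```python
import math


def parse_segment(segment):
    """Parse a 'start - end' position string into two integers.

    Thousands separators are optional, so '3334516 - 3,532,506' also parses.
    """
    start_text, end_text = segment.split("-")
    start = int(start_text.strip().replace(",", ""))
    end = int(end_text.strip().replace(",", ""))
    return start, end


def neg_log10(p_value):
    """Return -log10(p), the scale on which the threshold is expressed."""
    return -math.log10(p_value)


# Values copied from the 5_GS row of Table 1 (illustration only).
start, end = parse_segment("16,520,656 - 16,908,475")
segment_length = end - start      # segment length in base pairs
score = neg_log10(9.26e-17)       # min P.value on the -log10 scale

print(f"5_GS: {segment_length:,} bp, -log10(p) = {score:.2f}")
```

The same two helpers can be applied to any other row, which makes it straightforward to rank QTLs by segment size or by strength of association.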

Review #2 received at journal
20 Feb, 2021
Editorial decision: Major revision
20 Feb, 2021
Reviewer #2 agreed at journal
29 Jan, 2021
Review #1 received at journal
14 Oct, 2020
Reviewers invited by journal
14 Sep, 2020
Reviewer #1 agreed at journal
14 Sep, 2020
Editor assigned by journal
02 Sep, 2020
First submitted to journal
01 Sep, 2020
Submission checks completed at journal
01 Sep, 2020
Editor invited by journal
01 Sep, 2020
