Assessment of Genetic Diversity in Traditional Landraces and Improved Cultivars of Rice

Background: Rice is the staple food for more than half of the world's population. Rice cultivation needs expansion to meet the increasing food demands across the globe. Genetic diversity is desired for crop breeding because it serves as the backbone for improving cultivars. Domestication and the modern plant breeding technologies applied to rice have contributed to the erosion of genetic diversity. Current breeding programs have extensively shaped the genetic diversity of elite rice cultivars.
Results: We explored the genetic diversity of traditional landraces and improved cultivars by inspecting the whole-genome SNP markers of 20 rice accessions. We found a higher proportion of polymorphic loci (76.70%) and higher observed heterozygosity (0.024) in landraces than in improved cultivars. The principal component analysis also revealed higher genetic diversity among the landraces, while the phylogenetic tree grouped the accessions according to rice subspecies. The fixation index, FST, was used to estimate the genetic differentiation of rice, which revealed weak genetic differentiation (0.121) and nucleotide diversity (0.314) in modern rice cultivars. Genome-wide genetic differentiation (FST) analysis identified two domestication genes, Kala4 (pericarp color) and Ghd7 (heading date), and eight improvement genes, Sd1, Ghd8, GW2, NRT1.1b, GW6a, and Hd3a, that coincide with the candidate selective sweeps. The high inbreeding coefficient (0.68617) among the modern cultivars suggests no genetic gain in future breeding efforts and compels the use of exotic material in breeding programs.
Conclusion: These findings demonstrate that modern cultivars have a narrow genetic base compared to landraces. Therefore, exploring the genome of landraces at a large scale to identify the genes responsible for stability and adaptation to abiotic stresses can help design varieties that can survive vulnerable climates.


Background
Rice is one of the most important staple food crops worldwide and is cultivated under a broad range of ecological and climatic conditions across the globe. Asian cultivated rice (Oryza sativa) is the primary food crop that satisfies the food demands of more than half of the world's population [1]. Creating food surpluses for the rapidly growing world population is one of the biggest concerns today.
According to predictions, the current production rate is insufficient for the projected global population in 2035. To feed the expanding population, a further 116 million tons of rice will be required [2]. Rice farming needs expansion to fulfill the global demand for rice. Since the onset of agriculture, crop plants have undergone a series of genetic manipulations to meet the needs of the expanding world population [3]. Currently, rice breeders focus on both the quality and the yield of the major food crops that fulfill human dietary needs. Achieving future production goals is not straightforward because of the narrow genetic diversity within the breeding stocks.
Genetic diversity plays an essential role in the evolution of species [4]. It favors the improvement of crops and allows them to adapt to changing environmental conditions. Diversity in crop plants gives plant breeders the opportunity to develop improved varieties with favorable traits such as high yield, large grain size, and resilience to biotic and abiotic stresses. Artificial selection and rice's adaptability to various habitats have resulted in a diverse range of improved varieties. Almost 780,000 rice accessions are held in gene banks worldwide [5]. To explore the genetic diversity in rice (Oryza sativa), an international effort re-sequenced 3000 rice accessions from different parts of the world and made the data publicly accessible to the scientific community [6]. This single nucleotide polymorphism (SNP) dataset opens a new way to outline the genetic diversity within the spectrum of plant germplasm, including traditional landraces, improved cultivars, and wild ancestors. Information from genomic sequences provides an opportunity for breeders to select the desired diversity for improved farming varieties. This dataset of genetic variations lays the ground for the characterization of population diversity, population structure, and species [7]. The 3000 rice genome data have recently been employed to study structural variants [8], genetic variations, population structure and diversity [9], and transposable element insertions in rice [10].
Domestication and plant breeding technologies have contributed to the erosion of genetic diversity, leaving crop plants defenseless against dynamic climate conditions and with lower genetic potential in the future. The Irish potato famine [11] and southern corn leaf blight [12] are examples of this. Modern breeding practices have also narrowed the genetic base of advanced lines because of artificial selection pressure on improvement-related genes. Hence, the present study examines the phenomenon of genetic diversity erosion and the footprints of artificial selection. Pooling many accessions together and using shallow genetic variation data has provided limited information [13].
Therefore, a small sample size with deep sequencing is a more reliable strategy [14]. To obtain more in-depth information, we collected whole-genome re-sequencing genetic variant data from 20 diverse accessions belonging to four main rice groups, i.e., indica, japonica, aus, and admixed. We used the whole-genome re-sequencing data of these 20 rice accessions to (a) assess the population structure using distance-based methods and principal component analysis, (b) estimate the genetic diversity among traditional landraces and improved cultivars through population genetics analysis, and (c) examine the footprints of genetic erosion and artificial selection.
The findings from this population genetic analysis provide insight into genetic diversity within traditional landraces and improved cultivars and identify variants that are highly variable between the populations and are associated with important traits. These variants can be useful in marker-assisted selection for modern breeding programs.

Genotyping and variant calling
Genetic diversity is the foundation for improving the genetic gain of a crop. This study estimates the genetic diversity between traditional landraces and improved cultivars. We first downloaded the whole genome sequencing data of 20 rice accessions (Fig. 1 [17]. We analyzed the whole-genome biallelic SNPs for two subpopulations, traditional landraces and improved cultivars, to perform principal component analysis (PCA). PCA clustered the 20 rice accessions into three groups (Fig. 3). Interestingly, the clustering of samples is the same as that observed in the phylogenetic tree. Traditional landraces are scattered along the axes of the PCA plot, indicating higher genetic diversity among the landraces than among improved cultivars. The first principal component explains 21% and the second explains 12% of the total variance. Hence, the first two principal components together explain 33% of the total SNP variation, which is higher than in previous studies [14,17,18]. PCA scores of SNPs were analyzed in relation to the axes. Two principal components are enough to summarize the variance between the two populations. SNPs on the first principal component explained more variance than those on the second, so the first component differentiates the populations more strongly. The top 1000 SNPs with the highest variance values in the overall population were plotted based on the first principal component (Fig. 4). PCA allows identifying the contribution of SNPs to structuring the population by using F-statistics. Spatial dependencies of SNPs reveal regions within the genome that are responsible for structuring the populations and can be identified as selection signals [19].

Estimation of population genetics
We further investigated genetic divergence between traditional landraces and improved cultivars. To measure the genetic differentiation in the selected samples, we calculated pairwise nucleotide diversity, F-statistics, expected heterozygosity, and the inbreeding coefficient for each SNP locus. To investigate the genetic diversity of the sub-populations, we calculated nucleotide diversity (π) in each group. The nucleotide diversity in landraces and varieties is 0.314 and 0.321, respectively, an unremarkable difference.
The inbreeding coefficient (FIS) values in landraces and varieties are 0.68173 and 0.68617, respectively, which is an insignificant difference. However, these FIS values suggest high inbreeding in the selected rice genotypes.
We further investigated the genetic differentiation between the landraces and improved cultivars. For this purpose, we calculated the fixation index (FST) at the whole-genome level between the two subpopulations, which showed weak genetic differentiation of 0.121.
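As a rough illustration of what a fixation index measures, the sketch below computes FST for biallelic SNPs from the alternate-allele frequencies of two populations, using the heterozygosity-based form FST = (HT - HS) / HT. This is a simplified assumption for illustration only: the function names are hypothetical, the two populations are weighted equally, and the estimator is not necessarily the one implemented in the software used for this study.

```python
def _heterozygosities(p1: float, p2: float) -> tuple[float, float]:
    """Return (H_T, H_S) for one biallelic SNP.

    p1, p2 are alternate-allele frequencies in the two populations,
    which are assumed (for illustration) to carry equal weight.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                # pooled expected heterozygosity
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return h_total, h_within


def fst_single_snp(p1: float, p2: float) -> float:
    """F_ST = (H_T - H_S) / H_T for a single SNP (0.0 if monomorphic overall)."""
    h_total, h_within = _heterozygosities(p1, p2)
    if h_total == 0.0:
        return 0.0
    return (h_total - h_within) / h_total


def fst_genome_wide(freq_pairs) -> float:
    """Ratio-of-sums F_ST over many SNPs given (p1, p2) frequency pairs."""
    numerator = denominator = 0.0
    for p1, p2 in freq_pairs:
        h_total, h_within = _heterozygosities(p1, p2)
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0.0 else 0.0
```

Values near 0 indicate that the two groups share similar allele frequencies, and values near 1 indicate that they are fixed for different alleles.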
Traditional landraces show higher genetic diversity, with 76.70% polymorphic loci and 847,986 private alleles. The proportion of polymorphic loci in improved varieties is 75.19%, and the number of private alleles is 786,955. The observed heterozygosity in landraces and improved cultivars is 0.024 and 0.017, respectively (Table 1), while the observed homozygosity is 0.975 and 0.983, respectively. The pattern of homozygosity and heterozygosity in landraces and improved cultivars also suggests a slight decline in variability and increased homozygosity in varieties (Table 1). These findings are consistent with the general expectation that traditional landraces possess more genetic diversity than modern cultivars [14,20].
Whole-genome sequencing data from landraces and improved cultivars provide an opportunity to identify regions under selection. We calculated the ratio of genetic diversity in modern cultivars to the diversity in landraces (π ratio = π improved cultivars / π landraces) in non-overlapping windows of 10 kb along the entire genome (Fig. 5). To determine the candidate selective sweeps, the top 10% of windows ranked by this ratio were selected. To support these results, we further estimated the genetic differentiation between improved cultivars and landraces using the same non-overlapping windows of 10 kb along the entire genome, and used the same top 10% threshold to select candidate selective sweeps from the genetic differentiation (FST) results. We noticed many regions with strong selection signals where FST between modern cultivars and landraces was extremely low. To identify the key selection signals, we selected SNPs located in domestication- and improvement-related genes. We selected the 13 most well-characterized domestication genes, including Prog1 (tiller angle) [21], Rc (pericarp color) [22], qSH1 (seed shattering) [23], sh4 (reduced seed shattering) [24], Ghd7 (heading date) [25], LABA1 (barbless awns) [26], Kala4 (pericarp color) [27], LG1 (grain width) [28], OsLG1 (alteration in the laminar joint and ligule development, forming closed panicles) [29], GW5 (grain width) [30], Bh4 (hull color) [31], An-1 (awn length) [32], and GAD1 (awn length) [33].
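The window-based scan described above can be sketched in a few lines. The example below is a minimal illustration, assuming per-site values (for example, π or FST) are already available as arrays with 1-based positions; the function names, the NumPy-based implementation, and the default arguments are assumptions for illustration and do not reproduce the tools used in the study.

```python
import numpy as np


def window_means(positions, values, window_size=10_000):
    """Average per-site values in non-overlapping windows.

    positions are 1-based coordinates on one chromosome; the result maps
    a 0-based window index to the mean value of the sites in that window.
    """
    positions = np.asarray(positions)
    values = np.asarray(values, dtype=float)
    index = (positions - 1) // window_size
    return {int(w): float(values[index == w].mean()) for w in np.unique(index)}


def diversity_ratio(pi_improved, pi_landrace):
    """Per-window ratio pi_improved / pi_landrace, skipping windows where
    the landrace diversity is zero or missing."""
    return {w: pi_improved[w] / pi_landrace[w]
            for w in pi_improved
            if w in pi_landrace and pi_landrace[w] > 0}


def top_fraction(scores, fraction=0.10):
    """Window indices whose score is at or above the (1 - fraction) quantile."""
    keys = sorted(scores)
    vals = np.array([scores[k] for k in keys], dtype=float)
    cutoff = np.quantile(vals, 1.0 - fraction)
    return [k for k, v in zip(keys, vals) if v >= cutoff]
```

Which tail of the distribution marks a putative sweep depends on how the ratio is oriented, so the selection step should be checked against the definition of the ratio actually used.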

Discussion
Rice is one of the most ancient and extensively consumed staple food crops. Its cultivation and domestication played a significant role in the rise of agricultural civilization in Asia. Rice is considered to have been domesticated from Asian wild rice, O. rufipogon, 10,000 years ago [10,59,60,61]. The split between the two progenitors from which the cultivated types indica and japonica originated occurred 800,000 years ago [10], long before the origin of agriculture. The split of the aus/boro lineage from indica appears to be more recent, at ~540,000 years ago [10].
During the process of domestication, rice has experienced significant phenotypic changes in traits such as grain size, color, shattering, seed dormancy, and tillering, as recently identified and verified through quantitative trait loci mapping [21,22,62,63]. Since the domestication of rice, a series of artificial selection procedures have been applied in rice breeding programs, leading to a decline in genetic variability [64]. After domestication, rice breeders mainly focused on selecting lines with long grain, more tillering, and high yield potential, rather than biotic and abiotic stress tolerance and quality traits. Such unidirectional selection resulted in a narrow genetic base among modern cultivars. The present study assessed the genetic diversity of 20 rice accessions, including landraces and modern cultivars, using SNP markers.
Our results revealed that modern cultivars have a narrow genetic base compared to landraces. As in a previous study, a genetic bottleneck has limited the diversity of cultivated varieties [15]. There is an urgent need to harness genetic variation for further improvement and to enhance the genetic gains in crop yield. Higher heterozygosity was observed in landraces (HT = 0.02444) than in improved varieties (HT = 0.01671), as also observed by Alvarez et al. (2007) [65]. Low FST values between these sub-populations and increased observed homozygosity in varieties suggest high inbreeding depression. Thus, the genetic diversity within landraces will be significant for designing new commercial varieties and broadening the variability of new genotypes.
Genetic diversity is desired for crop breeding because it serves as the backbone for improving cultivars. It assists in designing varieties capable of coping with changing climatic conditions by manipulating genetic makeup [66]. Developing elite rice cultivars with increased genetic variability has become a leading challenge for crop breeders, one that recent advances in breeding technologies can help address. The collection of diverse and valuable germplasm in gene banks is one of the keys to enhancing genetic diversity [14,67].
Modern crop breeding techniques and advances in crop management practices contribute an annual gain of 0.8-1.2% in crop productivity [68]. Genomic breeding is one of the modern breeding technologies, integrating diverse accessions, genomic resources, molecular technology, and breeding tools. Large-scale dense genotyping of various germplasm resources has become an essential part of crop germplasm characterization and its further utilization. Genetic and morphological characterization of germplasm additionally helps dissect the genetic basis of quantitative traits and identify novel genes [16,41,69]. Utilizing critical genetic loci and pyramiding them through breeding leads to the development of new germplasm. This advanced breeding approach is named "genome-based breeding by design." The genome-based breeding by design strategy has been used successfully to develop green super rice (GSR) cultivars [70,71,72].
Genomic selection is one of the most crucial breeding strategies for increasing genetic gains and has advantages over traditional approaches [73]. It can be improved by incorporating high-throughput SNP chips, next-generation sequencing (NGS)-based platforms, and high-throughput phenotyping technologies. This advanced technology helps identify suitable parents for breeding programs, ultimately resulting in genetic gain in future crops. The genome-wide association study is another genomic strategy used in rice [14,15,17]. This technique is used to decipher the genetic basis of important quantitative traits, identify novel genes underlying the traits under study, and provide knowledge about valuable haplotypes. Implementing such beneficial haplotypes in breeding programs results in genetic gain in advanced cultivars. A haplotype consists of two or more SNPs in strong linkage disequilibrium. Sometimes one SNP linked with an undesirable trait causes linkage drag. In this regard, gene-editing technology plays a vital role in reducing linkage drag and regulating the expression of critical genes. The CRISPR-Cas9 tool enables multiplex gene editing and is considered a non-GMO approach [74]. This advanced genome editing technique helps breed new cultivars by activating the homeoalleles of a gene or deactivating the alleles causing the linkage drag.
In the future, collecting diverse germplasm, especially early domesticated cultivars (landraces), will enrich gene pools with multiple genetic backgrounds. Traditional landraces are a rich source of genetic variability and are adaptable to stressful environmental conditions. Therefore, exploring the genome of landraces at a large scale to identify the genes responsible for stability and adaptation to abiotic stresses can help design varieties that can survive vulnerable climates. Further, effective implementation of these advanced breeding techniques on one platform will bring next-generation crops with higher genetic gains. These next-generation crops will help to meet the food security demands of the projected global population in 2050.

Conclusion
The present study, based on whole-genome SNPs, assessed the genetic diversity in traditional landraces and improved varieties of rice. Crop breeding requires genetic diversity for developing new cultivars and improving varieties. Genetic diversity estimation suggests that there is an immediate need to include more diverse donor parents in breeding programs to broaden the genetic base of improved varieties.
Malaysia, Indonesia, and China) were selected from the 3K Rice Genome Project. Whole-genome resequencing data of these accessions were retrieved from NCBI SRA.

Structure analysis
Principal component analysis (PCA) was performed to investigate the genetic structure of the populations and their relationships in association with SNPs, using the "adegenet" package in R [77].
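For readers unfamiliar with PCA on SNP data, the following is a minimal sketch of the underlying computation, written with NumPy rather than the R package used in this study. It assumes a complete genotype matrix with samples in rows and biallelic SNPs in columns, coded as 0, 1, or 2 copies of the alternate allele, with at least one variable SNP; the function name and return layout are hypothetical.

```python
import numpy as np


def snp_pca(genotypes, n_components=2):
    """PCA of a samples x SNPs genotype matrix via singular value decomposition.

    Returns (scores, explained, loadings):
      scores    - sample coordinates on the leading components
      explained - fraction of total variance carried by each component
      loadings  - per-SNP weights, usable to rank SNPs by their contribution
    Assumes no missing genotypes and at least one variable SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)                         # center every SNP column
    u, s, vt = np.linalg.svd(g, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)            # variance share per component
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    return scores, explained[:n_components], loadings
```

Summing the first two entries of `explained` gives the share of variance captured by the first two components, and sorting SNPs by the absolute value of their first-component loading is one simple way to list the SNPs that contribute most to the separation of samples.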

Phylogenetic analysis
Phylogenetic analysis was performed to check the integrity of the variant calling pipeline. SNPs from the whole genome were selected for neighbor-joining (NJ) tree construction. The distance matrix based on the p-distance model was calculated for all SNPs, and an unrooted NJ tree was constructed using TASSEL (standalone v.5.0) [78] and visualized using Interactive Tree of Life (iTOL) [79].
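To make the distance step concrete, the sketch below computes a pairwise p-distance matrix from genotype calls. It is an illustrative assumption, not the routine implemented in TASSEL: genotypes are coded as 0, 1, or 2 copies of the alternate allele, missing calls are coded as -1, and the distance is the mean allelic difference per site, scaled to the range 0 to 1. The function name and the missing-data code are hypothetical.

```python
import numpy as np


def p_distance_matrix(genotypes, missing=-1):
    """Pairwise p-distances between samples (rows) over SNPs (columns).

    For each pair, sites with a missing call in either sample are skipped;
    the distance is mean(|g_i - g_j|) / 2 over the remaining sites.
    Pairs with no shared sites get NaN.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = (g[i] != missing) & (g[j] != missing)
            if shared.any():
                d = np.abs(g[i, shared] - g[j, shared]).mean() / 2.0
            else:
                d = np.nan
            dist[i, j] = dist[j, i] = d
    return dist
```

A symmetric matrix of this kind is the input that a neighbor-joining algorithm uses to build the tree.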

Estimation of Genetic Diversity
Genetic analysis was performed to estimate the genetic diversity between and within the traditional landrace and improved cultivar populations. The populations program from Stacks [80] was used to compute F-statistics measuring pairwise genetic differentiation (FST) [81], nucleotide diversity (π) [82], heterozygosity as a measure of genetic variability, and the percentage of polymorphic loci across the genome.
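The per-locus quantities named above follow from simple genotype counts. The sketch below is a minimal illustration under stated assumptions: one biallelic SNP, diploid genotypes coded as 0, 1, or 2 copies of the alternate allele, no missing calls, and textbook definitions (FIS = 1 - Hobs / Hexp, with π as the sample-size-corrected expected heterozygosity). The function name is hypothetical, and the exact estimators in Stacks may differ in detail.

```python
def locus_diversity(genotypes):
    """Diversity statistics for one biallelic SNP in one population.

    genotypes: list of 0/1/2 alternate-allele counts, one per diploid individual.
    Returns a dict with observed and expected heterozygosity, the inbreeding
    coefficient F_IS, and per-site nucleotide diversity pi.
    """
    n = len(genotypes)
    h_obs = sum(1 for g in genotypes if g == 1) / n      # share of heterozygotes
    p = sum(genotypes) / (2 * n)                         # alternate-allele frequency
    h_exp = 2 * p * (1 - p)                              # Hardy-Weinberg expectation
    f_is = 1 - h_obs / h_exp if h_exp > 0 else float("nan")
    n_chrom = 2 * n
    pi = h_exp * n_chrom / (n_chrom - 1)                 # unbiased for sample size
    return {"h_obs": h_obs, "h_exp": h_exp, "f_is": f_is, "pi": pi}
```

A strongly positive FIS, as reported here for both groups, means that far fewer heterozygotes are observed than expected under random mating, which is typical of a predominantly self-pollinating crop such as rice.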

Genomic Fingerprints for Selective Sweeps
To identify selective sweeps, we employed two approaches: (1) identifying the genomic regions that lost diversity during selection from landraces to modern varieties, resulting in a narrow genetic base, and (2) fingerprinting the regions under selection pressure that were strongly selected for crop improvement. For this purpose, two population statistics, i.e., genetic diversity (π) and genetic differentiation (FST), were employed. The ratio of genetic diversity in the landraces to that in modern varieties (π landrace / π variety) was measured. Window-based π was calculated using vcftools [83] with a window size of 100 kb, and candidate selective sweeps were selected based on the top 10% of values. Genetic differentiation (FST) was also calculated with a window size of 100 kb using vcftools.