Genetic diversity
We sequenced ten unlinked gene segments from 594 accessions from 28 rice landrace populations in Yunnan (Table S1 and Figure S1). The length of the aligned sequences ranged from 420 to 627 bp for each gene locus, with a total length of 4,994 bp (Table S2 and Figure S2). The standard statistics of sequence polymorphisms for each locus are summarized in Table S3. The term θπ indicates the nucleotide diversity, which was calculated for each locus. This term ranged from 0.0023 (STS22) to 0.0149 (CatA) in rice landraces, with an average of 0.0060. The haplotype diversity varied from 0.501 (Os1977) to 0.788 (GS5), with an average of 0.655. Four-gamete testing revealed that the Rm ranged from 0 to 7 in rice landraces, with an average of 3, indicating that there was low heterozygosity in the rice landraces due to high levels of inbreeding. Neutrality test values for most gene loci were not significant, except Os1977 (D = 2.856) and Ehd1 (D = 2.280), indicating that most were neutral gene loci. Based on SSR markers, a total of 629 alleles were detected in all accessions, with an average of 13.10 per locus. The average gene diversity and PIC were 0.755 and 0.725, respectively (Table S4).
Standard sequence polymorphism statistics for each location are summarized in Table S5 and Fig. 1. For the gene loci, we observed genetic diversity, estimated by θπ, ranging from 0.0039 (Jianchuan, JC) to 0.0063 (Longchuan, LN), with an average of 0.0055. The haplotype diversity varied from 0.475 (Jianchuan, JC) to 0.671 (Ximeng, XM) (with an average of 0.610). Based on SSR markers, gene diversity, PIC, and heterozygosity ranged from 0.5082 to 0.7289, 0.4717 to 0.6928, and 0.0355 to 0.1264, respectively. Globally, the mean genetic diversity, PIC, and heterozygosity were 0.6729, 0.6318, and 0.0690, respectively. Correlation analysis indicated that genetic diversity was significantly negative with respect to latitude (Figure S3). In other words, the genetic diversity of rice landraces in Yunnan province, China showed a geographical decline from south to north.
Population Structure And Genetic Relationships
The STRUCTURE analysis suggested K = 2 as the optimal number of clusters based on the calculation of ∆K (Figure S4). This suggests that the rice landraces can be grouped into two groups, referred to as P1 and P2. We found the 271 P1 accessions are japonica or japonica-like, and the 323 P2 accessions are indica or indica-like. This result indicated that rice landraces from Yunnan were clearly differentiated into japonica and indica groups (Figure S5a). We found both japonica- and indica-type landraces distributed in each location (Figure S5b). We further detected a significant positive correlation between the proportion of japonica rice landraces in various locations and latitudes (r = 0.367, P < 0.05), but a significant negative correlation between the proportion of indica rice landraces in various locations and latitudes (r = -0.367, P < 0.05) (Figure S6). In other words, the proportion of japonica landraces in each location in Yunnan increased from low to high latitudes. By contrast, the proportion of indica landraces in each location decreased from low to high latitudes. In each japonica or indica group, the neighbor-joining tree showed that the rice accessions could be clearly differentiated according to their geographic locations, including a cluster from northern locations, a cluster from southern locations, and other cluster from middle locations (Fig. 2). Moreover, we found more obvious geographic structure in the japonica group than in the indica group.
In the japonica and indica groups, the first two components of the PCA accounted for 64.1 and 51.3% of total variation, respectively, which were used to visualize the dispersion of the populations in a graph (Fig. 3). The result of PCA also showed geographic structure in the japonica group: we found that the rice accessions from northern locations were assigned to one cluster, the rice accessions from southern locations were assigned to one cluster, and rice accessions from locations in the middle were assigned to one cluster (Fig. 3a). Additionally, we found the accessions from the indica group were more highly dispersed than the japonica group (Fig. 3b).
Isolation By Distance And Environment
Pairwise FST was calculated using Arlequin 3.5 (Excoffier and Lischer 2010 ). Population-level pairwise genetic differentiation as FST/(1-FST) (Slatkin 1995) was measured. Isolation by distance (IBD) was evaluated by assessing the correlation matrix between pairwise geographical distance and genetic differentiation between locations was assessed using Mantel tests. A total of 100,000 random permutations were performed. The analyses above were performed using methods implemented in Arlequin 3.5 (Excoffier and Lischer 2010 ).
We reduced the environmental variables using principal component analysis (PCA) based on the 24 environmental variables (Table S15) in SAS version 8.02. With PCA, we kept the first two axes as environmental variables because they explained 82.26% of the environmental variation, and they were used to calculate the environmental dissimilarity. Prior to analysis, all environmental variables were standardized. The correlations between genetic differentiation and geographic/environmental factors were determined by partial Mantel tests. The environmental distance between populations was the Euclidean distance calculated with the values of the PCA axes. Partial Mantel tests with 100,000 permutations were performed between genetic factors and one factor under the influence of the other, using the “ecodist” package in R (Goslee and Urban, 2007). Because we identified two genetic clusters corresponding to the indica and japonica subpopulations (see results), partial Mantel tests were also conducted separately in each subpopulation.
Differential Selection In The Different Geographic Locations
We used a composite likelihood ratio method (CLR) (Nielsen et al. 2009) to identify genomic regions differentially selected in the subgroups from different geographic locations. (Fig. 5). Regions with the strongest 5th percentile of CLR selection signals were considered. In the japonica group, 179 regions and 170 regions were identified as the most affected by selection in subgroup Jap-N (rice landraces from northern locations) and Jap-S (rice landraces from southern locations), respectively (Table S6-7). The selected regions of Jap-N had a mean size of 105.9 kb, covering 5.1% of the rice genome, whereas those selected regions of Jap-S covered 5.0% of the rice genome with a mean size of 110.95 kb. The selected regions of Jap-N and Jap-S contained 2,165 and 1,785 protein-coding genes, respectively (Table S6-7). In the indica group, we identified 140 potential selected regions with an average size of 134.7 kb in Ind-N (rice landraces from northern locations), comprising approximately 5.0% of the rice genome, and 77 potential selected regions with an average size of 234.5 kb in Ind-S (rice landraces from southern locations), comprising approximately 4.8% of the rice genome (Table S8-9). The selected regions of Ind-N and Ind-S contained 1,964 and 1,875 protein-coding genes, respectively (Table S8-9). 37 selected regions of Jap-N overlapped with 35 selected regions of Jap-S, containing 309 genes, and 52 selected regions of Ind-N overlapped with 37 selected regions of Ind-S, containing 984 genes. These results suggested that although different target genes were selected in different subgroups, some of the targets were common to the different subgroups.
As expected, we found a number of candidate genes included in the potential selected regions were important agronomic genes/quantitative trait loci (QTLs) released by Q-TARO (Yamamoto et al. 2012) (Table S10-11). We found the gene MYBS3 included in the selected regions of Jap-N on chromosome 10 (21.796–22.338 Mb, CLR = 332.87) (Fig. 5a), which were involved in cold stress response and cold tolerance (Su et al. 2010). Meanwhile, the region on chromosome 6 containing OsiSAP8 showed a selective-sweep signal (24.469–24.593 Mb, CLR = 1148.06) in Ind-N (Fig. 5c). This gene was also related to cold tolerance (Kanneganti and Gupta 2008). Whereas, the selection signals were detected in regions of Jap-S and Ind-S, which includes the TOGR1 and Spl7 genes (Fig. 5b and 5d). These genes are involved in response to high temperature (Wang et al. 2016; Yamanouchi et al. 2002). Additionally, many important genes such as Hd1 (Yano et al. 2000), Ehd4 (Gao et al. 2013), Gn1a (Ashikari et al. 2005), GIF1 (Wang et al. 2008), xa25 (Chen et al. 2002) and Bph14 (Du et al. 2009) were found in selected regions (Table S10). In total, we found 208 cloned agronomic genes in selected regions of Jap-N, Jap-S, Ind-N and Ind-S. Notably, these selective-sweep regions were extremely enriched in grain yield and plant type gene categories, followed by the abiotic stress resistance category (Figure S7). Furthermore, nearly all (96.6%) of our selected regions were overlapped with previously reported agronomic QTLs (Table S11).
Genomic signatures of local adaptation
To screen genomes for signatures of local adaptation, we tested for associations between genetic variation and environmental variables using Latent Factor Mixed Models (LFMM) (Frichot et al. 2013). A total of 1346 SNPs that obtained |z|-scores greater than 6 (P < 10− 10) (Table S12) were associated with environmental variables. Among these SNPs, we found the SNP (|z| = 7.02) located at 2.22 Mb on chromosome 3 was in the gene region of sd37 involving in plant height in rice (Zhang et al. 2014), which indicated this candidate gene might be related to adaptation to the local environment (Table S13). We also found a significant SNP (|z| = 6.90) on chromosome 9 in the gene OsMYBc (Table S13), which is involved in salt tolerance (Wang et al. 2015), reflecting its adaption to the local environment. Among these SNPs significantly correlated with environmental variables, several notable examples include candidate genes involved in heading date (OsRR1 and OsRRM) (Cho et al.2016; Chen et al. 2007), grain shape (GL7 and BG2) (Yang et al., 2013; Wang et al., 2015), plant height (CYP734A4 and OsIAGLU) (Choi et al. 2012; Qian et al. 2017), disease resistance (OsCERK1 and JAmyb) (Cao et al. 2015; Huang et al. 2020), and heat tolerance (HSP70) (Breusegem et al. 1994), etc. (Table S13). All of these significant associations between loci/genes and environmental variables imply adaptation to the local environment. In total, in terms of the loci with high levels of association with environmental variables, we found 190 loci were also in the selected regions, which indicates that loci involving local adaptation have been particularly targeted in selection.