Population divergence and relative kinship analysis of the rice accessions
After SNP calling and filtering, 56456 SNPs distributed on all 12 chromosomes were used for the final genetic analysis. Only about 0.3% of inter-SNP distances were greater than 50 kb (Additional files 1: Figure S1A). Based on the nucleotide polymorphism, we first calculated the genetic component of each variety. The results showed that the value of Evanno’s ΔK had the highest value at K = 2 (Fig. 1A). Therefore, combined with varieties’ original information, we speculated that two subspecies, indica (Pop1) and japonica (Pop2), were present in our natural population (Fig. 1B). The neighbor-joining (NJ) tree showed the same results (Fig. 1C). Similarly, principle component analysis (PCA) results distinguished the accessions into two subpopulations, PC1 and PC2, which accounted for 39.39% and 5.57% of the genetic variation, respectively (Fig. 1D). Together, results of kinship relatedness of the pairwise accessions showed that a few accessions within Pop2 had relatively strong relatedness, while the kinship relatedness among the accessions was weak in the Pop1 (Fig. 1E). Similar results were observed from the NJ tree, in which the accessions in Pop2 showed stronger kinship relatedness than the accessions in Pop1. Moreover, the population differentiation statistics (FST) value between the two subpopulations was 0.71, indicating that a high level of subpopulation differentiation existed in the 145 accessions. Finally, we analyzed the linkage disequilibrium (LD) rate for three populations using the SNP data. The genome-wide LD decay rates of Pop1, Pop2, and the full population were estimated to be about 200 kb, 300 kb, and 600 kb, respectively (Fig. 1F). These results showed that the density of the SNP used in our study is sufficient for the GWAS.
Evaluation of the rooting ability of rice
In our research, four rooting-related traits were measured at the seedling stage to evaluate rice rooting ability. These four traits root growth ability (RGA), maximum root length (MRL), root length (RL), and root number (RN) were all close to normal distribution and had abundant variations (Fig. 2A, Additional files 1: Table S1). Among them, RGA had the highest coefficient of variation value (CV = 59.97%), followed by MRL, RL, and RN, and all of them had a CV value larger than 30% (Table 1). RN had the highest H’ value of diversity index at 0.41 (Table 1), followed by RGA (0.39), MRL (0.38), and RL (0.37). These results indicated that our natural population had abundant phenotypic variations. Correlation analysis results revealed that RGA had a high positive correlation with the other three traits (Fig. 2B). The correlation coefficient between MRL and RL was the highest (r = 0.95). RN was moderately correlated with RGA (r = 0.68) and weakly correlated with RL and MRL. These results may indicate that RN is a relatively independent trait for evaluating rooting ability compared with the other traits in our research. The four traits’ phenotypic variation explained by population structure (RQ2) were significant (p < 0.01), but all RQ2 values were relatively low. The RGA had the lowest RQ2 value at 0.08 (Table 1). In addition, the comparison of the mean value of the four traits between the two subgroups (Pop1 and Pop2) by one-way analysis of variance (ANOVA) revealed obvious differences in MRL (p = 0.0436), RL (p = 0.0118), and RN (p = 0.0133), with the exception of RGA (p = 0.9266) (Additional files 1: Figure S1B). These results suggest that the effect of population structure should be considered in the next GWAS analysis.
Table 1
phenotypic variation for the root growth ability and related traits in 145 rice accessions
trait | mean ± SD | range | CV | diversity of H' value | RQ2 |
MRL | 5.59 ± 2.63 | 1.50 ~ 17.77 | 47.05% | 0.38 | 3.73% |
RL | 4.17 ± 1.89 | 1.12 ~ 10.36 | 45.32% | 0.37 | 5.76% |
RN | 4.85 ± 1.70 | 1.56 ~ 12.22 | 35.05% | 0.41 | 4.48% |
RGA | 20.76 ± 12.45 | 2.34 ~ 60.47 | 59.97% | 0.39 | 0.08% |
GWAS analysis results for rooting ability traits in rice
Association mapping was conducted in three association panels for the full, Pop1, and Pop2 to minimize the impact of the population structure on the power of GWAS. In total, 88 (p < 1.8 × 10− 5), 23 (p < 2.1 × 10− 5), and 43 (p < 3.2 × 10− 5) suggestive SNPs were detected in the full, Pop1 (Additional files 1: Figure S2), and Pop2 (Additional files 1: Figure S3) association panels for the four traits, respectively. These SNPs were distributed on all chromosomes with the exception of chromosome 10. Figure S4 clearly displayed SNP hotspots (Additional files 1: Figure S4).
Under a general linear model (GLM), the full population had most significant SNPs detected among the three panels. RN-related SNPs were detected most commonly (Fig. 3) (Additional files2: Table S2), followed by RGA, MRL, and RL. The SNP seq4_15403602 had the lowest p-value and the highest explanation rates at 33.85%, 32.61%, and 28.92% for the RL, MRL and RGA, respectively. Additionally, there were three regions contained most of the cluster SNPs which were correlated with RN. And thelead SNPs were seq5_6945054, seq1_2164532, and seq3_35311659, respectively. Among them, the SNP seq1_2164532 had the lowest p-value and the highest explanation rate of RN variation as 17.59%.
A total of 23 significant SNPs were detected in the Pop1 among the three GWAS panels, and most of them were related to MRL, followed by RL and RGA. No significant SNP was detected for RN (Additional files 2: Table S3). The important SNP seq9_11941704, also detected in the full panel, had the lowest p-value for RL, MRL, and RGA and explained 39.33%, 47.56%, and 37.50% of phenotypic variation, respectively. Only one region containing clustering SNPs was detected for MRL, and the leading SNP seq8_18343904 could explain 43.66% of MRL variation.
In the Pop2 panel, the number of SNPs detected for RN was the highest, followed by RGA, RL, and MRL (Additional files 2: Table S4). The SNP seq4_15403602, which had been also detected in the full panel with the lowest p-value, had the highest phenotypic explanation rate for RL, MRL, and RGA in this panel. The lowest p-value SNP seq8_18343904 in this panel explained 30.20% of RN variation, and was also identified in Pop1 for MRL. In addition, there were another three regions containing the clustering SNPs with the lead SNPs seq6_1153742, seq8_15949787, and seq5_6932106, where seq6_1153742 and seq8_15949787 explained 28.15% and 32.07% of the RN variation rate, respectively, and seq5_6932106 explained 24.55% of RGA variation, which was also detected in the full panel.
Identifying the candidate genes associated with the rooting ability of rice
To obtain candidate regions from these significant SNPs, we combined the suggestively significant SNPs with quantile-quantile (Q-Q) plots of the GWAS results. We then focused on the SNPs distributed in clusters and detected by more than one panel to avoid false positives if the Q-Q plot was not well-fitted. Based on this, a total of 32 SNPs (Table 2) were obtained
Table 2
The 29 significant SNPs and their rice genomic information
Trait | marker | population-identyfied | allele | Chr | Pos | candidate genes |
RN | seq1_2164532 | full | C/T | 1 | 2164532 | OsHsp17.0(Ham et al, 2013) |
RN | seq2_20578075 | full,Pop2 | T/C | 2 | 20578075 | |
RN | seq3_1727891 | full,Pop2 | C/T | 3 | 1727891 | CDPK13(Ho et al, 2013) |
RN | seq3_35381634 | full | T/C | 3 | 35381634 | |
MRL,RGA,RL | seq4_5315034 | Pop2 | A/G | 4 | 5315034 | OsCyc1(Otomo et al, 2004) OsDTS2 (Wilderman et al, 2004) |
MRL,RGA,RL | seq4_15403602 | full,Pop2 | T/C | 4 | 15403602 | |
MRL | seq4_18740406 | full,Pop1 | C/G | 4 | 18740406 | OsSAD5/SLL1 (Shelly et al, 2013) |
RN,RGA | seq5_6758272 | full | T/C | 5 | 6758272 | OsPYL(Kim et al, 2014); |
RN,RGA | seq5_6784506 | full | A/G | 5 | 6784506 | OsGA2ox10(Fang et al, 2008) |
RN,RGA | seq5_6784672 | full | A/C | 5 | 6784672 | |
RGA | seq5_6882162 | full,Pop2 | T/C | 5 | 6882162 | |
RGA | seq5_6884530 | full,Pop2 | C/A | 5 | 6884530 | |
RGA | seq5_6890004 | full,Pop2 | G/A | 5 | 6890004 | |
RGA | seq5_6908893 | full,Pop2 | T/C | 5 | 6908893 | |
RGA | seq5_6935074 | full,Pop2 | T/G | 5 | 6935074 | |
RN,RGA | seq5_6945054 | full,Pop2 | T/C | 5 | 6945054 | |
RGA | seq5_6959356 | full,Pop2 | G/A | 5 | 6959356 | |
RGA | seq5_6981293 | full,Pop2 | T/C | 5 | 6981293 | |
RN | seq6_11153742 | full,Pop2 | C/T | 6 | 11153742 | |
MRL,RL | seq6_12537542 | full,Pop2 | T/C | 6 | 12537542 | OsPT9(Wang et al, 2014) |
MRL,RL | seq6_29405495 | full,Pop1 | C/T | 6 | 29405495 | OsMPK4(Agrawal et al, 2003) |
MRL,RL,RGA | seq7_5182658 | full,Pop2 | A/G | 7 | 5182658 | SQS(Manavalan et al, 2012) |
MRL,RL,RGA | seq8_433198 | full,Pop1 | G/T | 8 | 433198 | |
RN | seq8_15949787 | Pop2 | G/C | 8 | 15949787 | qtl(Lilley et al, 1996) |
MRL,RN | seq8_18343904 | Pop1,Pop2 | T/G | 8 | 18343904 | |
MRL,RN | seq8_18364190 | full,Pop1,Pop2 | C/T | 8 | 18364190 | |
MRL | seq8_18365309 | full,Pop1 | C/T | 8 | 18365309 | |
MRL,RL,RGA | seq9_11941704 | full,Pop1 | T/C | 9 | 11941704 | |
MRL,RL | seq9_15520580 | Pop1 | A/C | 9 | 15520580 | OsGL1-1(Islam et al, 2009) |
Traditionally speaking, the gene nearest to SNP is not always the causal gene. Therefore, we screened the candidate genes from the LD blocks mentioned according to their annotation information from the China Rice Data Center (http://www.ricedata.cn/gene/) and the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/index.shtml). We mainly selected genes with GO information related to metabolic processes, catabolic processes, or stress responses or genes that have a high expression level at seedling stages. As a result, some known genes and QTLs reported previously were obtained (Table 2). By summarizing the given annotation information, these genes and QTLs could be classified into four categories. The first category included six genes related to endogenous hormone metabolism. Asr1, located at about 310 kb from the SNP seq11_3585036, encodes an abscisic acid-stress-ripening-inducible 5 protein, and is regulated by both abscisic acid and gibberellin (Takasaki et al. 2008), which is also involved in drought tolerance (Philippe et al. 2010). OsCDPK13 was located at about 99 kb upstream of the SNP seq3_1727891. It has been found to negatively regulate the expression level of the enzymes essential for gibberellin biosynthesis and prevent drought stress injuries in the seedling development stage (Ho et al. 2013). OsMPK4 was mapped at about 4 kb upstream of the SNP seq6_29405495. Its transcript’s level is up-regulated upon wounding (by cut), hormones, and heavy metals, et al (Agrawal et al. 2003). In the second category, two genes are related to phosphate uptake. OsMYB2P-1 was identified at about 106 kb upstream of the SNP seq5_2421894. Longer primary roots and adventitious roots were produced in the OsMYB2P-1-overexpression plants than in wide-type plants under Pi-deficient conditions (Dai et al. 2012). OsPT9 was identified at about 142 kb downstream of the SNP seq6_12537542. It encodes a phosphorus transporter and its expression is specifically induced by Pi starvation (Wang et al. 2014).In the third category, two genes were responsible for root drought tolerance. OsHSP17.0 was found at about 3 kb from the nearest SNP, seq1_1954975. Overexpression OsHSP17.0 transgenic lines plants displayed a higher tolerance to drought and salt stress compared to wild-type plants (Ham et al. 2013). OsGL1-1 was located at 24 kb upstream of the nearest SNP seq9_15520580. OsGL1-1 is a Glossy1-homologous gene and osgl1-1 mutants show increased sensitivity to drought (Islam et al. 2009). In the fourth category, two genes were correlated with morphogenesis of roots. SQS and SLL1 directly influence root architecture and were found at 257 kb downstream of the SNP seq7_5182658 and 348 kb upstream of the SNP seq4_18740406, respectively. RNAi of the SQS (farnesyltransferase/squalene synthase) plants grown in plates with Yoshida’s nutrient solution with 1.2% agar show increased root length and an enhanced number of lateral roots (Manavalan et al. 2012). Over-expression SLL1 plants produced significantly longer lateral roots compared to wild-type plants (Shelley et al. 2013). Meanwhile, one region containing the lead SNP seq8_15949787 belonged to a previously reported QTL (Lilley et al. 1996) of root dehydration tolerance.
In total, 26 genes were used to conduct the quantitative real-time PCR (qRT-PCR) The primers used for the qRT-PCR are listed in the supplementary materials (Additional files 2: Table S5). Considering these loci responsible for different traits and the correlation among them, we mainly chose varieties with contrasting RN or RL. Taking into consideration that these two traits exhibit differences between the indica and japonica varieties, we mainly choose the varieties within the same subpopulation. More importantly, we also sequenced possible genes in varieties with contrasting phenotypes to find possible excellent haplotypes, taking into consideration the fact that few genes had low expression levels in roots and proved to be outstanding candidate genes for improved abiotic stress resistance. Interestingly, we not only found two genes that differ in relative expression level between differing RLs (Additional files 2: Table S6), but we also found that one gene’s sequence variation is associated with RN. The results are as follows.
One candidate region qRL-4 of about 480 kb for RL, MRL, and RGA including the significant SNP is located at about 5.3 Mb on chromosome 4 (Fig. 4A). In qRL-4, we determined that LOC_Os04g09900(OsCyc1) encodes syn-copalyl diphosphate (syn‐CDP) synthase, which is responsible for phytoalexin biosynthesis (Otomo et al. 2004). A comparison of expression levels of this gene among the 10 accessions with extreme RL differences (p < 0.01) was conducted by qRT-PCR. LOC_Os04g09900 had a higher expression level in the group with longer root length (LRL group) than the group containing varieties with shorter root length (SRL group) (p = 0.0330) (Fig. 5). Another gene, LOC_Os04g10060 ( OsDTS2), encodes 9β-pimara-7,15-diene synthase. Previous qRT-PCR results (Wilderman et al. 2004) showed that varieties whose root length was longer had higher expression level. Another region, qRN-5, for RN and RGA is about 300 kb. It contains a cluster of significant SNPs and the lead SNP is seq5_6945054 (Fig. 4B). Two known genes, LOC_Os05g11810 (OsGA2ox10) and LOC_Os05g12260(OsPYL), were identified in qRN-5. LOC_Os05g11810 encodes the gibberellin 2-oxidases (GA2oxs), which can regulate plant growth by inactivating endogenous bioactive gibberellin (GA) (Lo et al. 2008), and LOC_Os05g12260 encodes a rice orthologue of the ABA receptor (Kim et al. 2014). From the sequencing results, we only found sequence variations of LOC_Os05g11810 among the varieties with root number differences (Fig. 6). Based on these, we divided varieties into two haplotypes (hap1 and hap2). Hap2, sharing the same haplotype with nipponbare rice, had fewer roots compared with hap1 (p = 0.0005). Additionally, both of the two groups had low transcripts which may cause that although hap2 group had a higher mean value of relative expression level than hap1, it was not statistically significant (p = 0.3727).