Statistical analysis of phenotypic variation in anaerobic germination tolerance
A total of 209 indica and japonica rice accessions from all over the world (Table S1) were tested for anaerobic germination. CL, CD, CSA and CV on the second, third and fourth days of germination were used as the indexes of submergence tolerance. Abundant genetic variation was detected in this indica and japonica rice accession population (Fig. 1). Analysis of variance showed that the 209 rice accessions had significant differences in submerged coleoptile phenotype at different time points. At the same time, both genotype and length of submergence time had significant influences on coleoptile phenotype (Table 1). The variation coefficients of CL, CSA and CV were greater than 10% on the 2nd, 3rd and 4th days of submergence, indicating that the CL, CSA and CV of the different varieties were significantly different. The variation coefficients of CL, CD, CSA and CV were the greatest on the second day of submerging. With increasing submerging time, the variation coefficients of each phenotype decreased gradually, indicating that the differences in each phenotype were significant on the second day of submerging, and the coleoptiles grew rapidly to avoid the anaerobic environment.
Table 1 Phenotypic variation of rice coleoptile during submergence germination.
Phenotype
|
Environments
|
Mean ± SE
|
Range
|
Coefficient of variation (%)
|
G × E
|
CL
|
AN2d
|
0.2226 ± 0.0132 cm
|
0.0062 ~ 0.6463 cm
|
18.7571
|
***
|
AN3d
|
0.6629 ± 0.0258 cm
|
0.0334 ~ 1.7719 cm
|
13.0543
|
|
AN4d
|
1.2068 ± 0.0498 cm
|
0.1659 ~ 3.8798 cm
|
12.9767
|
|
CSA
|
AN2d
|
0.0266 ± 0.0051 cm2
|
0.0045 ~ 0.6463 cm2
|
56.1438
|
|
AN3d
|
0.1104 ± 0.0048 cm2
|
0.0098 ~ 0.3259 cm2
|
14.3195
|
|
AN4d
|
0.2030 ± 0.0089 cm2
|
0.0201 ~ 0.7689 cm2
|
13.7986
|
|
CD
|
AN2d
|
0.4543 ± 0.0272 mm
|
0.1092 ~ 0.6597 mm
|
22.4859
|
|
AN3d
|
0.5155 ± 0.0137 mm
|
0.3908 ~ 0.6837 mm
|
7.9094
|
|
AN4d
|
0.5496 ± 0.0065 mm
|
0.4452 ~ 1.0680 mm
|
3.5583
|
|
CV
|
AN2d
|
0.0008 ± 0.0001 cm3
|
0.0001 ~ 0.0625 cm3
|
35.6686
|
|
AN3d
|
0.0015 ± 0.0001 cm3
|
0.0049 ~ 0.0015 cm3
|
18.3201
|
|
AN4d
|
0.0030 ± 0.0010 cm3
|
0.0003 ~ 0.0419 cm3
|
20.8486
|
|
It can be seen from the histogram of frequency distribution (Fig. 2) that after 4 days of continuous oxygen-deficiency treatment, the AG tolerance indexes were consistent with a normal distribution (except CV at 2 and 4 days of submergence), indicating that these traits are regulated by many genes with small effects.
Correlation Analysis Of Phenotypic Traits
Correlation analysis showed correlations among the coleoptile phenotypes after 2 days, 3 days and 4 days of submergence stress (Table 2). Among them, CV-AN2d/CL-AN2d and CSA-AN2d/CSA-AN3d/CSA-AN4d/CD-AN2d/CD-AN3d were correlated, but there was no correlation between CV-AN4d and CD-AN2d. CV-AN2d was negatively correlated with CL-AN3d, CL-AN4d, CD-AN4d, CV-AN3d and CV-AN4d. CL-AN3d/CSA-AN3d/CV-AN3d, CL-AN4d/CSA-AN4d, and CSA-AN3d/CV-AN3d had very significant positive correlations with a P-value > 0.800.
Table 2 Correlation analysis of coleoptile phenotpye.
Trait
|
CL - AN2d
|
CL - AN3d
|
CL - AN4d
|
CSA -AN2d
|
CSA - AN3d
|
CSA - AN4d
|
CD - AN2d
|
CD - AN3d
|
CD - AN4d
|
CV - AN2d
|
CV - AN3d
|
CV- AN4d
|
CL - AN2d
|
1
|
|
|
|
|
|
|
|
|
|
|
|
CL - AN3d
|
0.521**
|
1
|
|
|
|
|
|
|
|
|
|
|
CL - AN4d
|
0.475**
|
0.739**
|
1
|
|
|
|
|
|
|
|
|
|
CSA -AN2d
|
0.767**
|
0.442**
|
0.405**
|
1
|
|
|
|
|
|
|
|
|
CSA - AN3d
|
0.567**
|
0.948**
|
0.725**
|
0.385**
|
1
|
|
|
|
|
|
|
|
CSA - AN4d
|
0.445**
|
0.657**
|
0.912**
|
0.403**
|
0.668**
|
1
|
|
|
|
|
|
|
CD - AN2d
|
0.290**
|
0.268**
|
0.291**
|
0.304**
|
0.353**
|
0.250**
|
1
|
|
|
|
|
|
CD - AN3d
|
0.215**
|
0.180**
|
0.189**
|
0.266**
|
0.313**
|
0.293**
|
0.632**
|
1
|
|
|
|
|
CD - AN4d
|
0.226**
|
0.148*
|
0.181**
|
0.435**
|
0.195**
|
0.272**
|
0.404**
|
.729**
|
1
|
|
|
|
CV - AN2d
|
0.066
|
-0.034
|
-0.023
|
0.128
|
0.042
|
0.002
|
0.054
|
0.005
|
-0.01
|
1
|
|
|
CV - AN3d
|
0.549**
|
0.885**
|
0.710**
|
0.463**
|
0.924**
|
0.687**
|
0.394**
|
0.452**
|
0.354**
|
-0.032
|
1
|
|
CV- AN4d
|
0.217**
|
0.296**
|
0.436**
|
0.205**
|
0.324**
|
0.730**
|
0.01
|
0.204**
|
0.175*
|
-0.016
|
0.409**
|
1
|
** At 0.01 level (2-tailed), the correlation was significant.
|
|
|
|
|
|
|
|
* At 0.05 level (2-tailed), the correlation was significant.
|
|
|
|
|
|
|
|
The dynamic test results of coleoptile phenotypes showed significant correlations among CL and CD measurements at various time points, and these traits play key roles in seed tolerance of anaerobic conditions. This conclusion was also supported by previous work. Therefore, GWAS analysis was performed with CL and CD as phenotypes.
Whole-genome Resequencing And Polymorphism Identification
In this study, the genomes of 209 rice germplasm accessions were resequenced. Burrows-Wheeler Aligner (BWA) software was used to compare the sequencing data with the Japanese Haruka reference genome (IRGSP 1.0.27). Finally, 1,243.37 Gbp of clean data was obtained, and the Q30 reached 93.15%. The average contrast ratio between each sample and the reference genome was 93.06%, the average sequencing depth was 12 ×, and the average coverage of the reference genome was 90.90% (at least one base coverage).
A total of 2,123,725 SNP sites were detected in this dataset. Among them, the SNP types were mainly C:G > T:A and T:A > C:G (Fig. S1a). Using SnpEff software to annotate the detected SNPs, we found 663,818 SNPs with synonymous mutations and 712374 with nonsynonymous mutations (Fig. S1b).
Population Genetics And Evolutionary Analysis
Phylogenetic and population structure analysis
From the phylogenetic tree (Fig. 3a), it can be seen that the population structure used in this experiment is uniform, without strong population stratification. Based on the SNPs, Admixture (Alexander et al. 2009) software was used to analyse the group structure of the research materials. Cross-validation error analysis showed that the error peak was lowest at K = 5, indicating the optimal grouping (Fig. 3c). The population structure analysis showed no obvious pedigree differentiation in the selected plant materials, confirming that they were suitable for subsequent GWAS analysis. The phylogenetic tree results showed that the selected population could be divided into 5 subgroups, which verified the conclusion that K = 5 was the optimal result in the population structure analysis. The Q matrix with K = 5 was selected for the subsequent association analysis (Fig. 3d).
Based on the SNP data, EIGENSOFT software was used to perform principal component analysis (PCA) (Price et al. 2006) to cluster the samples (Fig. 3b). The results of PCA supported the evolutionary analysis, further confirming that the degree of discreteness of individual kinship in the population was small.
Dynamic genome-wide association analysis of coleoptile phenotypic traits under submergence
Based on the developed high-density SNP marker data, we performed a GWAS on the dynamic changes in two phenotypic traits, CL and CD, to mine new genetic loci associated with AG. The Manhattan plots and quantile-quantile (QQ) plots are shown in Fig. 4 and Fig. 5. Taking − log10 (P) value > 7.3271 as the threshold, 26 significant SNPs were detected (Table 3). Among them, 24 SNPs were significantly associated with CD (9 at AN2d, 1 at AN3d, and 17 at AN4d) and were distributed on chromosomes 1, 2, 3, 5, 6, 7, 8, and 11. Two SNPs were significantly correlated with CL (0 at AN2d, 1 at AN3d, and 2 at AN4d), and they were distributed on chromosomes 1 and 3.
Table 3 Genome-wide significant associations for coleoptiles phenotype under anaerobic germination (AG) using MLM.
Traits
|
SNPa
|
Chr.
|
Peak positionb
|
P value
|
—log10 ( P )
|
Allleles
|
MAF
|
Known loci
|
CD-AN2d
|
chr1_ 39674301
|
1
|
39674301
|
1.48E-10
|
9.83
|
A/C
|
0.2
|
Hsu & Tung, 2015; qCD-1(ES)(Yang et al. 2019)
|
chr2_ 35360509
|
2
|
35360509
|
7.98E-11
|
10.1
|
A/G
|
0.2
|
|
chr2_ 35376690
|
2
|
35376690
|
4.84E-09
|
8.32
|
C/A
|
0.2
|
|
chr5_ 7135836
|
5
|
7135836
|
2.32E-11
|
10.63
|
C/T
|
0.1
|
|
chr5_ 22190175
|
5
|
22190175
|
2.69E-11
|
10.57
|
G/A
|
0.2
|
|
chr6_ 20797781
|
6
|
20797781
|
8.49E-12
|
11.07
|
G/A
|
0.1
|
Hsu & Tung, 2015
|
chr8_ 13123867
|
8
|
13123867
|
2.65E-11
|
10.58
|
G/T
|
0.1
|
|
chr8_ 11595732
|
8
|
11595732
|
5.64E-10
|
9.25
|
A/T
|
0.1
|
|
CD-AN3d
|
chr5_ 27918754
|
5
|
27918754
|
1.69E-09
|
8.77
|
A/G
|
0.4
|
Hsu & Tung, 2015
|
CD-AN4d
|
chr1_ 32866906
|
1
|
32866906
|
1.24E-09
|
8.91
|
G/A
|
0.2
|
|
chr3_ 27854371
|
3
|
27854371
|
5.11E-16
|
15.29
|
A/G
|
0.3
|
|
chr3_ 2828271
|
3
|
2828271
|
4.16E-11
|
10.38
|
A/G
|
0.2
|
|
chr3_ 28290634
|
3
|
28290634
|
3.02E-10
|
9.52
|
G/A
|
0.1
|
|
chr3_ 28191513
|
3
|
28191513
|
3.83E-10
|
9.42
|
C/T
|
0.1
|
|
chr3_ 28336693
|
3
|
28336693
|
3.90E-10
|
9.41
|
C/T
|
0.2
|
|
chr3_ 32459722
|
3
|
32459722
|
1.20E-09
|
8.92
|
C/T
|
0.5
|
|
chr3_ 7243651
|
3
|
7243651
|
4.89E-09
|
8.31
|
C/A
|
0.2
|
|
chr3_ 15571447
|
3
|
15571447
|
2.23E-08
|
7.65
|
G/T
|
0.3
|
|
chr5_ 27918754
|
5
|
27918754
|
1.72E-11
|
10.76
|
A/G
|
0.4
|
Hsu & Tung, 2015
|
chr5_ 18621890
|
5
|
18621890
|
9.76E-11
|
10.01
|
A/C
|
0.1
|
|
chr5_ 21585755
|
5
|
21585755
|
1.21E-08
|
7.92
|
T/A
|
0.2
|
|
chr7_ 18722403
|
7
|
18722403
|
3.45E-09
|
8.46
|
T/C
|
0.2
|
qAG 7.2 (Septiningsih et al. 2013; Hsu & Tung, 2015)
|
chr8_ 22721012
|
8
|
22721012
|
1.31E-09
|
8.88
|
A/G
|
0.2
|
|
chr8_ 9946213
|
8
|
9946213
|
2.51E-08
|
7.6
|
T/A
|
0.2
|
Hsu & Tung, 2015
|
chr11_ 19165397
|
11
|
19165397
|
4.62E-09
|
8.34
|
C/A
|
0.3
|
qAG 11(Jiang et al. 2006; Angaji et al. 2009; Zhang et al. 2017)
|
CL-AN3d
|
chr3_ 24689629
|
3
|
24689629
|
4.71E-08
|
7.33
|
T/C
|
0.4
|
|
CL-AN4d
|
chr1_ 25099585
|
1
|
25099585
|
1.73E-08
|
7.76
|
G/A
|
0.2
|
|
chr3_ 24689629
|
3
|
24689629
|
1.83E-08
|
7.74
|
T/C
|
0.4
|
|
Note: MAF minor allele frequency;
|
|
|
|
|
|
|
|
a The SNP positions were based on the annotation data on Os-Nipponbare-Reference-IRGSP-1.0 (RAP-DB, http://rapdb .dna.affrc .go.jp/);
|
|
|
b Position of the SNP showing the most significant association for AG.
|
|
|
|
|
|
To verify the accuracy of our results, we compared them with previously reported QTLs (Jiang et al. 2006; Yang et al. 2019) related to the control of coleoptile elongation and the survival rate of plants under flooding conditions (Angaji et al. 2010; Baltazar et al. 2014; Septiningsih et al. 2013). We found that some of our QTLs has been identified before in rice submerged germination tolerance. We found that 6 genomic regions were colocalized (grey and orange blocks in Fig. 6). A SNP on chromosome 1 (S1_ 39674301), which was significantly associated with CD in this study, was located in the qCD-1(ES) gene interval reported by Yang et al. (2019) as closely related to CD during the early cropping season. One SNP (S6_20797781) located on chromosome 6 was very close to a gene locus detected in another association study (Hsu & Tung, 2015). The physical distance between these two SNPs was only approximately 4 kb. Another SNP (S7_18722403) significantly correlated with CD at AN4d was found inside the genomic interval of qAG7.2 on chromosome 7; this locus was previously detected using linkage mapping and association mapping (Septiningsih et al. 2013; Hsu & Tung, 2015), validating its genetic effect. On chromosome 8, a SNP (S8_ 9946213) significantly associated with CD was very close to a site detected in another related study as closely related to the anaerobic response index (treated CL vs. control CL; Hsu & Tung, 2015), and the physical distance between them was approximately 95 kb. On chromosome 11, another SNP (S11_ 19165397) significantly associated with CD was located in the genomic region of qAG 11, which was closely related to the submerged CL (Hsu and Tung, 2015) and the anaerobic response index (qAG 11, Jiang et al. 2006; Angaji et al. 2009).
In addition, some loci were identified at multiple anoxia time points. CD was correlated with SNP S5_ 27918754 at AN3d and AN4d (Fig. 5), as reported in a previous study (Hsu & Tung, 2015). In addition, CL was corelated with SNP S3_24689629 (Fig. 4) at AN3d and AN4d, and this result had not been reported previously.
Identification Of Candidate Genes
We selected 7 loci from the GWAS results, including S3_ 24689629 and S5_ 27918754, which were repeatedly detected for CD and CL, respectively, at both AN3d and AN4d, indicating that these two loci were highly correlated with coleoptile elongation and thickening under hypoxic conditions. In addition, S1_39674301, S6_20797781, S7_18722403, S8_9946213, and S11_19165397 (Hsu & Tung, 2015; Yang et al. 2019; Septiningsih et al. 2013; Zhang et al. 2017; Jiang et al. 2006; Angaji et al. 2009) were consistent with previous research results and were included. Based on the linkage disequilibrium (LD) decay distance (100–350 kb) (Fig. S2), we detected a total of 106 differentially expressed genes (DEGs) in these seven trait-associated SNP sites. To more effectively identify candidate genes related to AG in rice seeds, RNA-seq analysis was carried out on RNA extracted from germinated seeds of the flooding-tolerant variety YZX at AN0d, AN2d, AN3d, AN4d, AN3dA1d, and A3dAN1d. We normalized the FPKM values of these DEGs as follows (Fig. 6): log2 FC (AN3dA1d/AN3d), log2 FC (AN3dA1d/AN4d), log2 FC (A3dAN1d/A3d), and log2 FC (A3dAN1d/A4d). Based on the candidate gene selection criteria | log2 FC | ≥ 1, FDR ≤ 0.05 and FPKM ≥ 1 (special: log2 FC (AN3dA1d/AN3d), log2 FC (AN3dA1d/AN4d) ≤ 1; log2 FC (A3dAN1d/A3d), and log2 FC (A3dAN1d/A4d) ≥ 1), a total of 4 DEGs were obtained.
According to the gene annotation information of the rice reference genome, we concluded that these four DEGs, gibberellin 2-beta-dioxygenase (OsGA2ox8, Os05g0560900), berberine bridge enzyme-like 13 (Os06g0548200), protein dehydration-induced 19 (OsDi19-1, Os05g0562200), and B3 domain-containing protein VP1 (OSVP1, Os01g0911700), had major impacts on AG in rice seeds. The results of qRT-PCR (Fig. 7) were similar to the results of RNA-sEq. These four genes were specifically expressed under hypoxic conditions, their expression levels increased with increasing hypoxia time, and their expression levels decreased after transfer to aerobic conditions. These results indicated that these four DEGs had strong correlations with submerged germination in rice.
Four New Genes Related To The Anaerobic Germination Phenotype
To further verify the associations between the candidate genes and the germinating coleoptile phenotype under AG conditions, we performed a full-length sequencing analysis of the four candidate genes in materials with extreme phenotypes (11 high-AG materials and 11 low-AG materials) (Fig. 8).
Os01g0911700 (OsVP1) encodes the VP1 protein, which contains the B3 domain and is a key transcription factor in the abscisic acid (ABA) signal transduction pathway. Among the 11 high-AG materials and 11 low-AG materials, we identified 11 allelic variants and 3 haplotypes of this gene (Fig. 8a). Of the 11 allelic variations, 10 included mutation sites within the gene, and these sites were mainly distributed in the first, third and sixth exon regions. The 11 high-AG varieties were different from the low-AG varieties at these 10 loci. The haplotype analysis of the extreme materials showed that Hap.2 (compared to Hap.3, all the variable sites except + 471, downstream of the start codon, were different) was associated with the high-AG phenotype based on the traits of the coleoptile at AN4d. The average CL value was 2.932 cm. Plants carrying Hap.3 (in which none of the 11 variable sites were mutated) all had the low-AG phenotype, and the average CL value at AN4d was 0.257 cm (Table S4). Therefore, we think that the mutations at the remaining 10 sites have a greater impact on the AG phenotype than the mutation at site + 471 (downstream of the start codon).
The Os05g0560900 (OsGA2ox8) gene encodes a gibberellin 2-beta-dioxygenase, which is involved in GA catabolism. From the gene sequences of the 11 high-AG materials and 11 low-AG materials, we identified 17 allelic variations (Fig. 8b), which were mainly located in the first intron region. Two were located in the first and third exon regions, and the first was located at position 431 (C → G) after the start codon, causing a missense mutation of glycine (G) to cysteine; the second was located at position 1323 (A → T) after the start codon, causing a missense mutation from glutamine (Q) to leucine (L). Haplotype analysis showed 8 haplotypes in these 22 extreme materials (Table 4). Among them, Hap.2 was associated with the high-AG phenotype. From the perspective of gene sequence, the high-AG phenotype appeared to be caused by a heterozygous mutation or deletion at + 1235 (downstream of the start codon) and was apparent in seedlings at AN4d, when the average CL was 2.931 cm. Hap.5 and Hap.6 had missense mutations in the first exon (+ 431) and the third exon (+ 485); plants with these haplotypes showed both high-AG and low-AG phenotypes, and their average CL values at AN4d were 2.412 and 2.274 cm, respectively. In contrast, Hap.8 was associated with the low-AG phenotype, and the average CL value of plants with this haplotype at AN4d was 0.257 cm (Table S4).
Os05g0562200 (OsDi19-1) encodes the dehydration-induced Dil9 protein, which contains two zf-Di19 and one Di19_C domains. From the gene sequences of 22 (11 high-AG and 11 low-AG) materials, we identified 15 allelic variants, all of which were located within the gene (Fig. 8c). The gene sequences of the low-AG material were mainly similar to the sequence of the reference genome (IRGSP-1.0.27), and the gene sequences of the high-AG material harboured different mutations at these 15 loci. We speculated that these 15 allelic variations were closely related to the high-AG phenotype. In particular, three missense mutations were located in the UTR, and two were located in the third and fourth exon regions. The mutation at 120 bases upstream of the start codon is T → G, the mutation at 80 bases upstream of the start codon is C → T, the mutation 1395 bases downstream of the start codon is A → G, the mutation at1860 bases downstream of the start codon is T → C (isoleucine to threonine), and the mutation at 2727 bases downstream of the start codon is G → A. Haplotype analysis showed 7 haplotypes among these 22 extreme materials (Table 4). Hap.2 was the reference genome sequence, and Hap.2 plants mainly showed the low-AG phenotype, with an average CL value at AN4d of 0.288 cm. The average CL values were 2.391 and 2.269 cm, respectively (Table S4).
Os06g0548200 encodes berberine bridge enzyme (BBE). We selected 11 high-AG and 11 low-AG materials and identified 9 allelic variations in their gene sequences (Fig. 8d) that were potentially related to the AG phenotype. One G → A mutation was located 31 bases upstream of the start codon; a missense mutation was located 645 bases downstream of the start codon (C → G, proline (P) to arginine (R)); a missense mutation (T → G) located 1607 bases downstream of the start codon resulted in an aa substitution from isoleucine (I) to serine (S); a missense mutation (A → T) was located 1685 bases downstream of the start codon (asparagine (N) to I). These four allelic variants had only one genotype (mutant type) in the high-AG materials, while there were two different genotypes (mutant and reference genotypes) in the low-AG materials, and these had polymorphisms located in the UTR and outside the exon regions. Haplotype analysis showed 10 haplotypes in these 22 extreme materials (Table 4), of which Hap.5 showed a high-AG phenotype, and its average CL at AN4d was 2.274 cm. Hap.8 showed a low-AG phenotype, and its average CL at AN4d was 0.273 cm (Table S4). Hap.5 had mutations at -31, + 645, +1607, and + 1685, and Hap. 8 had the opposite pattern, so we believe that these four sites may be closely related to the high-AG phenotype.
Metabolic Spectrum Detection Of Rice Seeds Under Anaerobic Germination
LC-MS/MS was used to obtain the mass spectrum data of samples, and qualitative and quantitative analysis was carried out on the basis of a self-established database (MWDB) by Metware Company and public metabolite information database. In total, 730 metabolites were identified, including 32 substances and their derivatives (Table S2).
Through the detection of metabolites in the seeds of the YZX rice variety during AG, we found that the contents of metabolites regulated by the four genes identified above were specifically correlated with hypoxic stress. Among them, we detected three biologically active GAs, GA9, GA15, and GA20, which are related to the regulation of GA catabolism by the OsGA2ox8 (Os05g0560900) gene. The contents of GA9 and GA15 increased within 2 days after submergence, and the content at AN4d decreased compared to that at AN3d; the content of GA20 decreased to its lowest value within 2 days after flooding and then did not change (Table S3, Fig. 10). OsVP1 (Os01g0911700) and OsDi19-1 (Os05g0562200) participated in the regulation of ABA-related phenotypes. Quantitative results showed that ABA content decreased within 3 days after flooding and then suddenly increased on the fourth day (Table S3, Fig. 10). The BBE-like enzyme encoded by Os06g0548200 is involved in plant phenylpropanoid metabolism. Quantitative results showed that the contents of shikimic acid and Phe increased with increasing submerging time (Table S3, Fig. 9).