Grain Type Phenotype Investigation
The wild-type (WT) material used in this study was Huahang No. 31 (Fig. 2a). Seeds were harvested separately from each of 3872 M2-generation plants. Grain length ranged from 8.00 to 10.22 mm, with an average of 9.28 mm and a coefficient of variation of 1.89%; the WT grain length was 9.31 mm (Fig. 2c). Grain width ranged from 1.54 to 2.87 mm, with an average of 2.02 mm and a coefficient of variation of 0.56%; the WT grain width was 2.03 mm (Fig. 2d). Both grain length and grain width conformed to a normal distribution and varied over a wide range. Compared with the WT, many materials showed large differences in grain type, indicating that there were several potential grain type mutations (Fig. 2b).
Table 1
Grain type phenotype survey
Trait | Min | Max | Average | Standard deviation | Coefficient of variation (%)
Length (mm) | 8.00 | 10.22 | 9.28 | 0.18 | 1.89
Width (mm) | 1.54 | 2.87 | 2.02 | 0.01 | 0.56
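The summary statistics in Table 1 can be reproduced from per-plant measurements with a few lines of Python. The sketch below is illustrative only: the function names and the example measurements are hypothetical, and it assumes the sample standard deviation was used, which the text does not state.

```python
from statistics import mean, stdev


def cv_percent(sd, avg):
    """Coefficient of variation in percent: 100 * standard deviation / mean."""
    return 100 * sd / avg


def summarize(values):
    """Return min, max, mean, sample standard deviation and CV (%) for one trait."""
    avg = mean(values)
    sd = stdev(values)  # assumption: sample (n - 1) standard deviation
    return {
        "min": min(values),
        "max": max(values),
        "mean": avg,
        "sd": sd,
        "cv_percent": cv_percent(sd, avg),
    }


# Hypothetical grain lengths in mm, not data from the study.
example_lengths = [9.10, 9.25, 9.31, 9.40, 9.34]
print(summarize(example_lengths))

# Recomputing the CV from the rounded values in Table 1.
print(round(cv_percent(0.18, 9.28), 2))  # length
print(round(cv_percent(0.01, 2.02), 2))  # width
```

Recomputing from the rounded table entries gives values close to, but not identical with, the reported 1.89% and 0.56%; the small gap is presumably due to rounding of the standard deviation in the table.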
Targeted Sequencing of Mixed Samples to Screen a Mixed Pool of Potential Mutations
A total of 484 mixed samples and 1 WT sample were obtained by pooling the DNA of 3872 individual plants in equal amounts, with eight plants per pool. The target fragments were the major grain length gene GS3 (Fig. 3a) and the major grain width gene GW5 (Fig. 3b). After evaluating the location of the target segments, a set of primers covering the entire target segments (Supplementary Table S1) was designed, and targeted sequencing was performed on all 485 samples (Fig. 3c). After sequencing, the variant allele frequency (VAF) of each SNP site was calculated (the calculation method is shown in Fig. S1), and the WT sample was used as the reference (parent) to screen for mutation sites: 1) when the parent is homozygous at a site, the site is retained if the offspring genotype differs from the parent at that site; 2) when the parent is heterozygous at a site, the site is retained if the mutation frequency difference between the offspring and the parent at that site is greater than or equal to 1/16.
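The pooling arithmetic and the two retention rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the VAF is assumed to be alternate reads divided by total reads (the actual method is in Fig. S1), rule 1 is read as "retain the site when the pool differs from a homozygous parent", and all function names are hypothetical. Note that in a pool of eight diploid plants, a single heterozygous mutant contributes 1 of 16 alleles, which is one plausible reading of the 1/16 threshold.

```python
from fractions import Fraction

PLANTS_PER_POOL = 8                # from the text: 8 plants per mixed sample
MIN_VAF_SHIFT = Fraction(1, 16)    # threshold stated in the text


def vaf(alt_reads, total_reads):
    """Assumed VAF definition: alternate-allele reads / total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return Fraction(alt_reads, total_reads)


def retain_site(parent_vaf, pool_vaf):
    """Apply the two screening rules to one site.

    Simplification: a parent VAF of exactly 0 or 1 is treated as homozygous;
    real data would need a tolerance for sequencing error.
    """
    if parent_vaf in (0, 1):
        # Rule 1 (assumed reading): pool differs from the homozygous parent.
        return pool_vaf != parent_vaf
    # Rule 2: heterozygous parent, frequency shift of at least 1/16.
    return abs(pool_vaf - parent_vaf) >= MIN_VAF_SHIFT


print(3872 // PLANTS_PER_POOL)                    # number of mixed samples
print(retain_site(vaf(0, 100), vaf(6, 96)))       # homozygous parent, pool differs
print(retain_site(vaf(50, 100), vaf(52, 100)))    # heterozygous parent, small shift
```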
According to the above screening method, a total of 179 mutation sites were obtained, all of which were homozygous mutations; 110 sites were in the GS3 interval and 69 were in the GW5 interval. The total mutation frequency was calculated to be 4.05 × 10⁻⁵ in the GS3 interval and 9.02 × 10⁻⁵ in the GW5 interval (total mutation frequency = number of mutated bases/gene fragment length). Among the 179 mutation sites, 63.57% were located in intron regions and 30% were located in exon regions (Fig. 3d). We retained only the exonic nonsynonymous and nonsense mutations, which can cause amino acid changes. Reliable sites with relatively high read counts were then selected, yielding a total of 15 SNPs (Table 2), of which 14 were nonsynonymous mutations and 1 was a nonsense mutation. These comprised 11 loci in the GS3 interval and 4 loci in the GW5 interval (Fig. 3e), distributed across 12 mixed samples.
Table 2
Targeted sequencing mutation site information
SNP number | Mixed sample number | Gene | Location | Ref | Mut | Function type
SNP-1 | 4-101 | GS3 | 16732992 | G | C | nonsynonymous SNV
SNP-2 | 4-101 | GS3 | 16734441 | G | T | stopgain
SNP-3 | 5-6 | GS3 | 16734009 | G | C | nonsynonymous SNV
SNP-4 | 6-23 | GS3 | 16731920 | C | A | nonsynonymous SNV
SNP-5 | 6-44 | GS3 | 16729753 | A | T | nonsynonymous SNV
SNP-6 | 6-44 | GS3 | 16729861 | G | T | nonsynonymous SNV
SNP-7 | 7-112 | GS3 | 16735064 | G | C | nonsynonymous SNV
SNP-8 | 7-41 | GS3 | 16729815 | C | T | nonsynonymous SNV
SNP-9 | 7-78 | GS3 | 16729903 | G | A | nonsynonymous SNV
SNP-10 | 7-78 | GS3 | 16729863 | C | A | nonsynonymous SNV
SNP-11 | 9-5 | GS3 | 16729886 | C | A | nonsynonymous SNV
SNP-12 | 5-35 | GW5 | 5365411 | A | G | nonsynonymous SNV
SNP-13 | 7-82 | GW5 | 5366479 | C | G | nonsynonymous SNV
SNP-14 | 10-14 | GW5 | 5366520 | T | G | nonsynonymous SNV
SNP-15 | 10-51 | GW5 | 5366501 | A | G | nonsynonymous SNV
Screening of Individual Mutant Plants and Identification of Their Authenticity
To further screen out the mutant individual plants from the mixed samples, we separated the individual plants in the 12 mixed samples, which contained a total of 96 individual plants, and amplified fragments spanning each of the 15 SNPs with 200 bp of flanking sequence on either side. The amplified fragments from the 96 individual plants were subjected to Sanger sequencing, and the results were compared with those of targeted sequencing to determine the target mutant individual plants. A total of 13 loci were consistent with the targeted sequencing results, whereas the Sanger sequencing results of SNP-5 and SNP-6 differed from the targeted sequencing results (Fig. 4a), so the mutants at these two loci were excluded. The concordance rate between targeted sequencing and Sanger sequencing was 86.67%, and a total of 13 mutants were screened. The complete results for the 15 SNPs are shown in Fig. S2.
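The counts in this step follow from simple arithmetic, sketched below in Python. The function name is hypothetical; the inputs (12 mixed samples, 8 plants per mixed sample, 15 candidate SNPs, 13 confirmed) are taken from the text.

```python
def concordance_rate(confirmed, total):
    """Percentage of candidate sites confirmed by the second method."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * confirmed / total


mixed_samples = 12      # mixed samples carrying the 15 candidate SNPs
plants_per_pool = 8     # plants pooled per mixed sample
plants_to_sequence = mixed_samples * plants_per_pool

candidate_snps = 15     # SNPs reported by targeted sequencing
confirmed_snps = 13     # SNPs confirmed by Sanger sequencing

print(plants_to_sequence)                                          # 96
print(f"{concordance_rate(confirmed_snps, candidate_snps):.2f}%")  # 86.67%
```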
To verify the authenticity of the selected mutants, we tested the 13 selected single plants according to the technical regulations for the identification of rice varieties (SSR marking method) issued by the Ministry of Agriculture, using a total of 10 pairs of SSR markers. The agarose gel electrophoresis results of the 13 mutant individual plants were consistent with those of the WT, indicating that they were all true mutants derived from the WT background (Fig. 4b).
Phenotypic Verification and Protein Function Analysis of Individual Mutant Plants
After verification, the grain type and phenotype data of the true mutant individual plants were retrieved according to their plant numbers. Only 6 of the 13 individual plants showed significant changes in grain type, whereas the grain types of the remaining 7 individual plants did not differ significantly from those of the control (Table 3). According to the screening results of targeted sequencing, all 13 SNPs are nonsynonymous or nonsense mutations, which theoretically lead to amino acid changes. However, some SNPs did not cause significant changes in phenotype, presumably because they do not change the structure or function of the protein or because mutations in other genes affect the phenotype.
Table 3
Amino acid mutation and phenotypic information of single mutant plants
Mutant number | Single plant number | Amino acid mutation | Grain length (mm) | Grain width (mm)
WT | WT | - | 9.31 | 2.03
GS3-1 | 7-112-2-4 | A5G | 9.21 | 2.12
GS3-2 | 4-101-1-3 | C55X | 10.22** | 2.03
GS3-3 | 5-6-1-4 | C135W | 8.06** | 2.04
GS3-4 | 4-101-2-2 | S141C | 9.41 | 2.17
GS3-5 | 6-23-1-4 | C173F | 8.51** | 1.9
GS3-6 | 7-78-1-3 | G167C | 9.33 | 1.94
GS3-7 | 9-5-1-1 | C190F | 9.97** | 1.86
GS3-8 | 7-78-2-3 | S184F | 9.33 | 2.09
GS3-9 | 7-41-1-4 | R200H | 9.30 | 1.9
GW5-1 | 5-35-1-2 | V411G | 9.53 | 2.29**
GW5-2 | 7-82-1-4 | A405G | 9.42 | 2.02
GW5-3 | 10-14-2-1 | K397E | 9.46 | 2.03
GW5-4 | 10-51-2-3 | D97G | 9.40 | 1.84**
*: represents significant variation at p < 0.05; **: represents highly significant variation at p < 0.01
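As a quick consistency check, the rows of Table 3 can be tallied in Python to list the mutants flagged as highly significant and to split the amino acid notation into its parts. The sketch below is illustrative: the helper names are hypothetical, the values are copied from Table 3, and "X" is assumed to denote a stop codon, as implied by the nonsense mutation GS3-2.

```python
import re

# Rows copied from Table 3: (mutant, amino acid change, grain length, grain width).
# A trailing "**" marks a highly significant difference from the WT (p < 0.01).
TABLE3 = [
    ("GS3-1", "A5G", "9.21", "2.12"),
    ("GS3-2", "C55X", "10.22**", "2.03"),
    ("GS3-3", "C135W", "8.06**", "2.04"),
    ("GS3-4", "S141C", "9.41", "2.17"),
    ("GS3-5", "C173F", "8.51**", "1.9"),
    ("GS3-6", "G167C", "9.33", "1.94"),
    ("GS3-7", "C190F", "9.97**", "1.86"),
    ("GS3-8", "S184F", "9.33", "2.09"),
    ("GS3-9", "R200H", "9.30", "1.9"),
    ("GW5-1", "V411G", "9.53", "2.29**"),
    ("GW5-2", "A405G", "9.42", "2.02"),
    ("GW5-3", "K397E", "9.46", "2.03"),
    ("GW5-4", "D97G", "9.40", "1.84**"),
]

AA_CHANGE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_change(change):
    """Split a change such as 'C55X' into (original residue, position, new residue)."""
    match = AA_CHANGE.match(change)
    if match is None:
        raise ValueError(f"unrecognized amino acid change: {change!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def significant_mutants(rows):
    """Names of mutants whose grain length or width carries a '**' flag."""
    return [
        name
        for name, _change, length, width in rows
        if length.endswith("**") or width.endswith("**")
    ]


changed = significant_mutants(TABLE3)
print(len(changed), "of", len(TABLE3), "mutants changed:", ", ".join(changed))
print(parse_change("C55X"))
```

The tally agrees with the statement that 6 of the 13 individual plants showed significant changes in grain type.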
We screened a total of 9 grain length mutants, of which 2 had significantly longer grains and 2 had significantly shorter grains than the WT (Fig. 5a). Nine SNPs in GS3 were identified, including 8 nonsynonymous mutations and 1 nonsense mutation (GS3-2). GS3-1 was located in exon 1, GS3-2 was located in exon 2, and the remaining seven mutations were located in exon 5 (Fig. 5b). The GS3-1 mutation lies near the start of the protein, outside the functional domains, and has no effect on the structure and function of the protein, so the grain length does not change significantly (Fig. 5b). GS3-2 is located in the OSR domain: the 55th amino acid is mutated to a stop codon, the OSR domain is truncated, and the protein structure and function are severely affected (Fig. 5d), resulting in a significant increase in grain length. Both GS3-3 and GS3-4 are located in the Cys-rich TNFR domain. The grain length of GS3-3 is significantly reduced; protein structural analysis showed that the mutation of amino acid 135 leads to two additional β sheets in the secondary structure of the protein and presumably impairs TNFR domain function (Fig. 5d), whereas the grain length of GS3-4 is not significantly altered. GS3-5, GS3-6, GS3-7, GS3-8 and GS3-9 are all located in the Cys-rich VWFC domain, among which only the grain length of GS3-5 is significantly reduced. The GS3-5 mutation may impair the function of the VWFC domain, but there is no significant difference in protein structure compared with WT plants (Fig. S3). The phenotypes of GS3-6, GS3-8 and GS3-9 were similar to that of GS3-4: an amino acid point mutation occurred in a functional domain, but the phenotype did not change significantly.
It was speculated that the mutation of these 4 amino acids may not affect the function of the protein or that mutations in other genes have an impact on the phenotype. In contrast, the grain length of GS3-7 was significantly increased, and functional site analysis of its protein showed that the mutation of amino acid 183 of GS3-7 was located in the ligand binding site of the protein (Fig. 5c). After the mutation, the function of the protein was affected, so the grain shape changed. However, GS3-4, GS3-5, GS3-6, GS3-8 and GS3-9 showed no significant difference from the WT in terms of protein structure and function (Fig. S3).
We screened 4 grain width mutants, of which 1 had a significantly increased grain width, 1 had a significantly decreased grain width, and the remaining two showed no significant change (Fig. 6a). The four SNPs identified in GW5 were all nonsynonymous mutations, of which GW5-4 was located in exon 1 and the other three were located in exon 2 (Fig. 6b). GW5-1, GW5-2 and GW5-3 are all located in the calmodulin-binding domain; however, only GW5-1 has a significantly increased grain width, and the other two mutants show no obvious change in phenotype. We predicted the protein structure and function of the mutants and found that the protein structures of GW5-1 and GW5-2 were more similar to each other than to the WT, while the protein structure of GW5-3 had no obvious change (Fig. 6c). The phenotypes of GW5-1 and GW5-2, which carry mutations in the same domain, differ, presumably due to interference from other genes. The GW5-4 mutation lies near the start of the protein, outside the functional domains, and is not predicted to affect the structure and function of the protein (Fig. 6b); nevertheless, the grain width is significantly narrowed, and the predicted protein structure is relatively compact (Fig. 6c).
WGS of Key Mutants Identifies New SNPs Affecting the GS3 Mutation Effect
To explore the reasons for the contradiction between genotype mutation and phenotype mutation and to clarify whether allelic variation in other grain length-related genes had an impact on the phenotype, we extracted DNA from mutants GS3-5 and GS3-7 and mixed equal amounts to construct mixed sample GS3-M1. Mutants GS3-4, GS3-6, GS3-8 and GS3-9, which carry genotype mutations but show no change in phenotype, were used to construct mixed sample GS3-M2, and WT plants were used to prepare the WT sample. A total of 3 samples were subjected to WGS.
A total of 2,084,534 SNPs and 336,039 InDels were obtained by sequencing GS3-M1, and 2,116,343 SNPs and 341,777 InDels were obtained from GS3-M2. After screening, three new allelic variants related to grain length were finally obtained (Table 4; Fig. 7a). GS3-G1 is located in the second exon of OsNST1 and changes serine 65 to threonine. At present, there are few reports on this gene, and its protein structure cannot be predicted. Mutants of this gene exhibit reduced cell wall cellulose content and structural changes, resulting in reduced mechanical strength and abnormal plant development, such as dwarf plants and smaller seed size (Song et al., 2011). GS3-G2 is located in the first exon of OsMAPK6 and changes the aspartic acid at position 131 to arginine, which affects only one of its functional domains. Inhibition of OsMPK6 expression can make rice panicles denser and grains smaller, and mutation of this gene can significantly reduce grain length, grain width and thousand-grain weight (Guo et al., 2018). GS3-G3 is located in the second exon of RAE2 and causes a frameshift insertion at amino acid 99, impairing the function of the cysteine-rich region of the encoded protein EPFL1. Variation in this gene has been associated with a decreased number of kernels, longer kernels and an increased proportion of awned kernels (Jin et al., 2016) (Fig. 7b).
Table 4
Whole-genome mutation site information
Number | Chr | Location | Ref | Alt | Structure gene | Function type
GS3-G1 | 2 | 24236917 | C | T | Os02g0614100 | nonsynonymous SNV
GS3-G2 | 6 | 2812755 | T | C | Os06g0154500 | nonsynonymous SNV
GS3-G3 | 8 | 23999540 | - | C | Os08g0485500 | frameshift insertion