Main agronomic characteristics of lsl2
To elucidate the genes that regulate flower development in rice, we screened for a floret mutant phenotype among an EMS-mutagenized population and identified a long sterile lemma 2 (lsl2) mutant in the ZH11 background. Phenotypic comparisons between the lsl2 mutant and wild-type ZH11 are presented in Table 1. The results showed no significant differences in major agronomic traits, including plant height, panicle length, number of effective panicles, spikelets per panicle, seed-setting rate, 1,000-grain weight, grain length and grain width.
Phenotypic observation and analysis of the lsl2 mutant
In the vegetative stage, ZH11 and lsl2 plants were phenotypically indistinguishable, but their spikelets differed from the booting stage to the mature stage (Table 1, Fig. 1a, b). The lsl2 mutant exhibited a much longer sterile lemma than ZH11, whereas the other components of the spikelet were the same (Fig. 1a, b). Interestingly, there was no significant difference in grain size or brown rice size between lsl2 and ZH11 after maturation (Table 1, Fig. 1c, d).
We compared the seed germination rates of lsl2 and ZH11 and found that on the second day, wild-type ZH11 had started sprouting (69.3%), whereas the lsl2 mutant had barely begun to germinate (2.3%) (Fig. 2 and Table 2). Compared with the wild type, the lsl2 mutant showed markedly reduced germination rates from the second day to the fourth day (Table 2).
Genetic analysis of the lsl2 mutant
To determine whether the lsl2 phenotype is caused by a single gene, we crossed the lsl2 mutant with ZH11. The F1 generation showed normal phenotypes, and the F2 population showed Mendelian segregation (Table 3). Indeed, segregation between the wild-type and mutant plants fit a 3:1 ratio in the two F2 populations (χ² = 0.124–0.462, P>0.5), indicating that the lsl2 mutant phenotype is controlled by a single recessive gene.
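The 3:1 goodness-of-fit test used above can be illustrated with a minimal sketch. The plant counts below are hypothetical (the real counts are in Table 3), the function name is chosen only for illustration, and no continuity correction is applied; whether the original analysis used one is not stated.

```python
import math
from typing import Tuple


def chi_square_3to1(wild_type: int, mutant: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test against a 3:1 (wild type : mutant) ratio.

    Returns (chi2, p_value) for 1 degree of freedom, without Yates correction.
    """
    total = wild_type + mutant
    expected_wt = 0.75 * total   # expected wild-type plants under 3:1
    expected_mut = 0.25 * total  # expected mutant plants under 3:1
    chi2 = ((wild_type - expected_wt) ** 2 / expected_wt
            + (mutant - expected_mut) ** 2 / expected_mut)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical F2 counts, for illustration only (not the data in Table 3).
chi2, p = chi_square_3to1(wild_type=160, mutant=40)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

A small χ² with a large P value means the observed counts do not deviate detectably from 3:1, which is the pattern expected for a single recessive gene.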
Initial localization of the lsl2 gene
To determine which mutation causes the lsl2 phenotype, we next mapped the lsl2 gene. Two SSR markers on rice chromosome 7, RM4584 and RM2006, were found to be linked to the mutant phenotype in 193 F2 individuals. Based on the recombination frequency, the genetic distance between RM4584 and RM2006 was calculated to be 28.8 cM. Therefore, lsl2 is located in a 28.8-cM region on chromosome 7 flanked by the SSR markers RM4584 and RM2006 (Fig. 3a).
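The conversion from a recombination fraction to a genetic distance in centimorgans can be sketched as follows. The text does not say which mapping function was used, so the two standard choices (Haldane and Kosambi) are both shown as assumptions, and the recombination fraction in the example is a hypothetical value, not one taken from the mapping population.

```python
import math


def _check_fraction(r: float) -> None:
    """Mapping functions are defined for recombination fractions in [0, 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")


def haldane_cm(r: float) -> float:
    """Haldane mapping function (assumes no crossover interference)."""
    _check_fraction(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi mapping function (allows for partial interference)."""
    _check_fraction(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical recombination fraction, for illustration only.
r = 0.25
print(f"Haldane: {haldane_cm(r):.1f} cM, Kosambi: {kosambi_cm(r):.1f} cM")
```

For small recombination fractions both functions return values close to 100 × r; they diverge as r approaches 0.5, which matters for an interval as wide as the one reported here.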
Fine mapping of the lsl2 gene
To narrow the interval, a finer map was constructed between RM4584 and RM2006 using published markers (Table 4). Genetic linkage analysis placed the lsl2 gene between the molecular markers RM8059 and RM427, a distance of 7.6 cM (Fig. 3b). For further mapping, all recombinant plants were genotyped with 9 polymorphic markers (Table 4), which placed the lsl2 gene between the molecular markers Indel7-13 and Indel7-15, a physical distance of 205 kb (Fig. 3c and Table 4). For fine mapping, seven polymorphic Indel markers were used for recombinant screening (Table 4); they detected one, one, three, three, six, seven and eleven recombinant plants, respectively (Fig. 3d). Thus, we localized the lsl2 gene between the molecular markers Indel7-22 and Indel7-27, a physical distance of 25.0 kb.
Candidate genes in the 25.0-kb region
Four candidate genes are annotated (LOC_Os07g04660, LOC_Os07g04670, LOC_Os07g04690, LOC_Os07g04700) in this 25.0-kb region (Fig. 3e). According to the available annotation database, these four genes all have a corresponding full-length cDNA. LOC_Os07g04660 encodes white-brown complex homologue protein 16, LOC_Os07g04670 a DUF640 domain containing protein, LOC_Os07g04690 UDP-arabinose 4-epimerase 1 and LOC_Os07g04700 an MYB family transcription factor.
Sequence analyses of the lsl2 gene
To determine which gene causes the mutant phenotype, we sequenced the four genes above in ZH11 and lsl2 and found a single 1-bp change (T to C) in LOC_Os07g04670; no differences were observed in the other three genes. Thus, we speculated that the LOC_Os07g04670 locus corresponds to lsl2. Interestingly, the G1/ELE gene, which encodes a DUF640 domain-containing protein, is present at this locus. Based on phenotypic similarity and the mapping results, we hypothesized that the long sterile lemma phenotype of lsl2 is caused by a functional change in the product of LOC_Os07g04670, and that lsl2 may therefore be allelic with G1/ELE.
Analysis of the open reading frame (ORF) showed that the LSL2 gene (LOC_Os07g04670) has one exon and no intron. The lsl2 allele carries a 1-bp substitution that replaces a serine (Ser) with a proline (Pro) (Fig. 4). Ser is a polar amino acid, whereas Pro is nonpolar, so such a substitution may alter the function of the protein.
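As a check on how a single T-to-C change can convert Ser to Pro, the sketch below enumerates every serine codon in the standard genetic code and lists the T-to-C substitutions that yield proline. The actual codon affected in lsl2 is not given in the text, so nothing here identifies it; the function name is illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def ser_to_pro_by_t_to_c():
    """Return (ser_codon, position, pro_codon) for every single T->C change
    that turns a serine codon into a proline codon."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != "S":
            continue
        for pos, base in enumerate(codon):
            if base != "T":
                continue
            mutated = codon[:pos] + "C" + codon[pos + 1:]
            if CODON_TABLE[mutated] == "P":
                hits.append((codon, pos, mutated))
    return sorted(hits)


for ser_codon, pos, pro_codon in ser_to_pro_by_t_to_c():
    print(f"{ser_codon} -> {pro_codon} (codon position {pos + 1})")
```

Only the TCN serine codons qualify, and only through a change at the first codon position; a T-to-C change in the AGT serine codon gives AGC, which still encodes Ser.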
The lsl2 gene is responsible for the long sterile lemma phenotype
To confirm that the mutation phenotype can be attributed to lsl2, we examined whether knockout of LSL2 in the cultivar ZH11 would lead to the long sterile lemma phenotype. One sequence-specific guide RNA (sgRNA) was designed to knock out the LSL2 gene by using the CRISPR/Cas9 gene editing system. A total of three plants from three independent events were obtained and confirmed by sequencing to carry insertions and deletions in the target sites (Table 5).
We then investigated the panicle characteristics of these three homozygous lines after maturity and found that all three exhibit a long sterile lemma phenotype (Fig. 5), indicating that knockout of LSL2 in ZH11 leads to the long sterile lemma mutation phenotype.
Comparison of the 3-D structures of the LSL2 and lsl2 proteins
Simulation of the 3-D protein structures revealed differences between the lsl2 protein and the LSL2 protein (Fig. 5). Moreover, the protein structure changed markedly when residue 79 of LSL2 was changed from Ser to Pro (Fig. 6).
Haplotype analysis of the LSL2 gene
To further investigate the genetic and evolutionary characteristics of the LSL2 gene, we performed SNP calling and haplotype analysis of the 3,000 sequenced rice genomes available in the CNCGB and CAAS databases and found 492 haplotypes for the LSL2 gene, 49 of which were found in more than 15 rice resource materials (Supplementary Table 2). However, none of the haplotypes or SNPs in the 3,000 sequenced rice genomes corresponded to the lsl2 mutation.
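The haplotype tally described above can be sketched as a simple grouping step. This sketch assumes that each accession has already been reduced to a string of alleles at the SNP sites within the gene, and that the figure of 15 is a minimum number of accessions per haplotype. The accession names, allele strings and function names are hypothetical and do not come from the 3,000-genome data.

```python
from collections import Counter
from typing import Dict


def count_haplotypes(genotypes: Dict[str, str]) -> Counter:
    """Count how many accessions carry each haplotype.

    `genotypes` maps an accession name to its allele string across the
    SNP sites of the gene (one character per site).
    """
    return Counter(genotypes.values())


def common_haplotypes(counts: Counter, min_accessions: int) -> Dict[str, int]:
    """Keep haplotypes carried by more than `min_accessions` accessions."""
    return {hap: n for hap, n in counts.items() if n > min_accessions}


# Hypothetical toy data: five accessions genotyped at three SNP sites.
toy = {"acc1": "TGA", "acc2": "TGA", "acc3": "CGA", "acc4": "TGA", "acc5": "CAA"}
counts = count_haplotypes(toy)
print(len(counts), "haplotypes in total")
print(common_haplotypes(counts, min_accessions=2))
```

Checking whether a mutant allele occurs in natural germplasm then amounts to asking whether any counted haplotype carries the mutant base at the relevant site.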