Fine Mapping and Candidate Gene Analysis For A Novel Male-Sterile Mutant Ms40 in Maize

Liu Xiaowei, Sichuan Agricultural University
Yue Yujing, Sichuan Agricultural University
Gu Zicheng, Sichuan Agricultural University
Huang Qing, Sichuan Agricultural University
Pan Zijin, Sichuan Agricultural University
Zhao Zhuofan, Sichuan Agricultural University
Zheng Mingmin, Sichuan Agricultural University
Zhang Zhiming, Shandong Agricultural University
Li Chuan, Sichuan Agricultural University
Moju Cao (caomj@sicau.edu.cn), Sichuan Agricultural University, https://orcid.org/0000-0002-5883-5882


Introduction
Maize is one of the most important and widely cultivated crops in the world and was one of the earliest crops in which heterosis was exploited. In hybrid seed production, artificial emasculation is the most common method, but it is time-consuming and laborious, and the purity of the hybrid seed is difficult to guarantee. Using male-sterile lines in seed production largely addresses these problems. Maize male sterility is divided into cytoplasmic male sterility (CMS) and genic male sterility (GMS). The application of CMS has some obvious problems, such as unstable sterility and the difficulty of finding strong and stable restorer lines. For GMS, it is difficult to find complete maintainer lines, which makes GMS hard to apply directly in hybrid seed production.
Seed production technology (SPT) has brought hope for applying GMS in hybrid seed production; as a result, a growing number of studies on GMS genes have been reported and have attracted increasing attention from breeders. However, few male-sterile mutants with independent intellectual property rights are available, so it is particularly important to create such mutants in China.
To date, approximately 19 genes of GMS mutants have been successfully cloned. The reported maize male sterility genes encode different types of proteins, including secretory proteins, lipid transporters, redox proteins, enzymes, and transcription factors. MSCA1 (MULTIPLE ARCHESPORIAL CELLS1) encodes a plant-specific glutathione reductase; the msca1 mutant has a deletion of the GSH-binding site, which may affect the initiation of archesporial cells (Marc et al. 2009). Its homologous genes, OsTDL1A and AtTPD1, have been reported to be related to anther development (Wang et al. 2012 (Cigan et al. 2001), all of them are required for the formation of pollen exine and anther cuticles in maize. Several transcription factors have been reported to be associated with genic male sterility in maize. OCL4 (OUTER CELL LAYER) encodes an HD-ZIP transcription factor that plays a major role in trichome differentiation and in the division of the anther cell wall in maize (Vernoud et al. 2009). MS9 encodes an R2R3 plant-specific MYB transcription factor (MC et al.). IG1 (INDETERMINATE GAMETOPHYTE1) encodes a LOB domain protein that regulates the proliferative phase of female gametophyte development (Evans 2007). MS7 encodes a PHD-finger transcription factor, which has been used for hybrid seed production through a multi-control sterility system. MS23 and MS32 encode bHLH transcription factors responsible for tapetal development and programmed cell death (PCD) (Moon et al. 2013; Nan et al. 2017).
The bHLH transcription factors (TFs) in flowering plants form large families; 213 bHLH-encoding genes are annotated in maize (Lin et al. 2014), making this the largest transcription factor family in maize. MS23 encodes the bHLH16 transcription factor (Nan et al. 2017), which plays an important role in the differentiation of the endothecium and tapetum cells of the anther and a direct or indirect role in the biogenesis of 24-nt phasiRNAs. MS32 encodes the bHLH66 transcription factor (Moon et al. 2013) and is specifically expressed in anthers at the premeiotic stage. Moreover, MS32 can interact with the protein encoded by MAC1 to regulate the periclinal division of L2-layer cells and the differentiation of anther sporogenous cells, thus affecting pollen development. Although these bHLH transcription factors have been cloned, the mechanisms by which they regulate pollen abortion have not been clearly elucidated. Identifying additional bHLH transcription factors underlying maize male-sterile mutants may help clarify the regulatory relationships among them.
Transferring a specific male-sterile gene from one genetic background into another elite inbred line is a long process, so the best strategy is to create a male-sterile mutant carrying a single base change directly in an elite inbred background, which can effectively accelerate the application of the GMS gene. The maize inbred line RP125, bred by Sichuan Agricultural University, is widely planted in southwest China. Its high combining ability, high yield, high resistance to northern and southern leaf blight, moderate resistance to sheath blight and other major diseases of the southwest corn production area, and efficient utilization of phosphorus have made it one of the most popular parents in southwest China in the 21st century.
In this study, we identified a no-pollen male-sterile mutant, ms40, derived from the maize inbred line RP125 by EMS mutagenesis. Cytological observation showed that the tapetum of ms40 anthers expanded abnormally, and defects in Ubisch bodies and pollen exine were also observed. The sterility gene of ms40 was mapped to a 282-kb interval on chromosome 4 by map-based cloning, and Zm00001d053895 was identified as the key candidate gene. This study provides a new genetic resource both for applying GMS in hybrid seed production and for interpreting the regulatory mechanism of maize anther development.

Plant materials
In the spring of 2015, the maize inbred line RP125, bred by Sichuan Agricultural University, was planted in the experimental field of Sichuan Agricultural University, Sichuan. Pollen was treated with ethyl methanesulfonate (EMS) and used for self-pollination to produce M1 seeds, which were planted in the Yunnan experimental field in the autumn of 2015. The M1 plants were self-pollinated to obtain M2 seeds, which were planted in the experimental field in Sichuan in the spring of 2016. Among the M2 plants, a male-sterile mutant, termed ms40, was found and pollinated with RP125 pollen to obtain (ms40×RP125)F1 seeds. Two inbred lines, B73 and Mo17, were also used in this study.

Phenotype identification and genetic analysis
The (ms40×RP125)F2 seeds were obtained by self-pollinating (ms40×RP125)F1 plants. Field fertility identification and staining of pollen grains with 1% (m/v) I2-KI solution were used to evaluate anther fertility. Once the sterility phenotype of ms40 was shown to be stably inherited, the inbred lines B73 and Mo17 were used as test lines for genetic analysis, and a chi-square test was used to analyse phenotypic segregation.
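The segregation test described above can be sketched as a chi-square goodness-of-fit calculation. This is a minimal illustration, not the authors' script. It uses the fertile and sterile counts reported for the (ms40×Mo17)F2 and (ms40×RP125)BC1F1 populations in the co-segregation analysis. It also assumes a 1:1 expectation for the backcross and a 5% significance level with one degree of freedom; the function name is hypothetical.

```python
# Chi-square goodness-of-fit test for a Mendelian segregation ratio.
# Assumption: alpha = 0.05 with 1 degree of freedom (critical value 3.841).
CHI2_CRITICAL_DF1 = 3.841


def chi_square(observed, ratio):
    """Return the chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# (ms40×Mo17)F2: 1218 fertile and 371 sterile plants, expected 3:1.
f2_stat = chi_square([1218, 371], [3, 1])
# (ms40×RP125)BC1F1: 101 fertile and 96 sterile plants, expected 1:1 (assumed).
bc1_stat = chi_square([101, 96], [1, 1])

for name, stat in [("F2 (3:1)", f2_stat), ("BC1F1 (1:1)", bc1_stat)]:
    verdict = "fits" if stat < CHI2_CRITICAL_DF1 else "deviates from"
    print(f"{name}: chi-square = {stat:.3f}, {verdict} the expected ratio")
```

A statistic below the critical value means the observed counts do not differ significantly from the ratio expected for a single recessive nuclear gene.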
A Canon M3 digital camera and an Olympus SZX16 stereomicroscope were used to photograph the plants and anthers. Pollen grains were stained with 1% (m/v) I2-KI solution and photographed with a Leica DM2000 microscope.

Semi-thin section and scanning electron microscopy of anthers
For cytological observation, anthers of fertile and male-sterile plants from the sister-cross population (ms40/ms40×Ms40/ms40) at different developmental stages were fixed in formaldehyde-acetic acid-ethanol (FAA) overnight and dehydrated through a graded ethanol series. The anthers were then infiltrated with a graded mixture of ethanol and 7100 hardener II solution (Technovit 7100, Germany) and embedded in Spurr resin. Sections were cut with a Leica DM2255 microtome and stained with 0.1% (m/v) toluidine blue solution. A Leica DM2000 microscope was used to observe and photograph the sections.

Map-based cloning of ms40 male sterile gene
The (ms40×B73)F2 population was used as the mapping population for ms40, and 134 InDel markers evenly covering the 10 maize chromosomes were developed from differences between the genome sequences of RP125 and B73. Genomic DNA was extracted using the CTAB (hexadecyl trimethyl ammonium bromide) method (Luan et al. 2008) with minor modifications. For bulked-segregant analysis (BSA), a fertile DNA pool and a male-sterile DNA pool were constructed by mixing equal amounts of DNA from twenty fertile and twenty male-sterile plants randomly selected from the (ms40×B73)F2 population. The 134 InDel markers were used to detect polymorphisms between the two DNA pools. The polymorphic markers were then used to genotype 115 male-sterile individuals from the (ms40×B73)F2 population to determine whether they were linked to the sterility phenotype. Based on the primary mapping, 4 new polymorphic InDel markers were developed, and 1230 sterile plants from a larger (ms40×B73)F2 population were used for fine mapping. Information on all markers is provided in Table S1.

Key candidate gene prediction and analysis
Candidate gene predictions and functional annotations were obtained from the Gramene database (http://ensembl.gramene.org/). The conserved domains of the candidate genes were predicted with the NCBI Conserved Domain Search tool (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi), and expression patterns were derived from an RNA-seq expression database (https://www.maizegdb.org/). The sequences of the candidate genes were amplified from ms40 and RP125, and the PCR products were sequenced and analysed.
Based on the sequence difference in Zm00001d053895 between ms40 and RP125, the relationship between fertility and the SNP in Zm00001d053895 was analysed. An SNP marker was developed from the flanking sequence of the mutation site (SNP-F: 5'-TGTCATTGTACGTACGGCGG-3', SNP-R: 5'-CGTGGGATGTACGGCGATG-3'). Co-segregation analysis of phenotypes and genotypes was performed with the SNP marker in individuals of the (ms40×Mo17)F2 and (ms40×RP125)BC1F1 populations. To determine whether the SNP in Zm00001d053895 exists only in ms40, the fragment containing the mutation site was amplified with the SNP marker from 30 maize inbred lines and sequenced.

Rapid amplification of cDNA ends (RACE) assay
Total RNA was extracted from RP125 anthers at different developmental stages using TRIzol reagent (Invitrogen). The N711 Kit (Vazyme, Nanjing, China) was used for the RACE assay according to the manufacturer's instructions. The gene-specific primers GSP1 and GSP2 were designed from the reference cDNA sequence of Zm00001d053895. Their GC content was required to be 50%-70% and their Tm approximately 60°C (GSP1: 5'-ACCTGCCTCCATCAATCCAGCTCG-3', GSP2: 5'-AATGAGGTGGCAGTGCAGGCGGA-3'). GSP1 and GSP2 were used for 5' RACE and 3' RACE, respectively. The PCR products were cloned into the pEASY-Blunt cloning vector and sequenced. The CDS of Zm00001d053895 was predicted with the ORF finder tool (https://www.ncbi.nlm.nih.gov/orffinder/).
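The GC-content requirement for the RACE primers can be checked with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, and the Tm criterion is not evaluated because the text does not state which Tm formula was used.

```python
def gc_content(primer):
    """Return the percentage of G and C bases in a primer sequence."""
    seq = primer.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Gene-specific primers as given in the text (5' to 3').
PRIMERS = {
    "GSP1": "ACCTGCCTCCATCAATCCAGCTCG",
    "GSP2": "AATGAGGTGGCAGTGCAGGCGGA",
}

for name, seq in PRIMERS.items():
    gc = gc_content(seq)
    status = "within" if 50.0 <= gc <= 70.0 else "outside"
    print(f"{name}: {len(seq)} nt, GC = {gc:.1f}% ({status} the 50-70% window)")
```

Both primers fall inside the stated 50%-70% window.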

RNA extraction and qRT-PCR
Total RNA was extracted from leaves, roots, stems, and anthers at various developmental stages of RP125 plants using TRIzol reagent (Invitrogen, USA). Total RNA was reverse transcribed using the Reverse Transcription Kit (Vazyme, China), and qPCR was performed using SYBR Green PCR Master Mix (TaKaRa, Japan). A Bio-Rad CFX96 system was used to detect the relative expression of Zm00001d053895 with the primers q-51-F/R (5′-CTCTGGGTCCCCCTGCAT-3′, 5′-GGTGGTGGTGGGGTGGA-3′). Three biological replicates and four technical replicates were performed for each sample. ZmACTIN was used as the internal control to normalize the expression data (Chen et al. 2017), and its amplification primers were 5′-TCACCCTGTGCTGCTGACCG-3′ and 5′-GAACCGTGTGGCTCACACCA-3′. Relative expression levels were calculated using the 2^-ΔΔCt method, and all results were expressed as the mean ± standard error of the mean (SEM).
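The relative quantification step can be sketched as follows. All Ct values here are hypothetical and chosen only to illustrate the arithmetic. The choice of leaf tissue as the calibrator is also an assumption, since the text does not state which sample served as the reference condition.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change by the 2^-ddCt method.

    dCt is target minus internal control within a sample;
    ddCt is the sample dCt minus the calibrator dCt.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


def mean_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, (variance / n) ** 0.5


# Hypothetical Ct values for three biological replicates of one anther stage.
anther_target_ct = [22.0, 22.5, 21.5]   # Zm00001d053895
anther_actin_ct = [18.0, 18.0, 18.0]    # ZmACTIN internal control
leaf_target_ct, leaf_actin_ct = 26.0, 18.0  # hypothetical calibrator sample

folds = [
    relative_expression(t, r, leaf_target_ct, leaf_actin_ct)
    for t, r in zip(anther_target_ct, anther_actin_ct)
]
mean, sem = mean_sem(folds)
print(f"relative expression = {mean:.2f} ± {sem:.2f} (mean ± SEM)")
```

A lower Ct for the target relative to the internal control gives a larger fold change, so anther samples with strong expression appear as values well above 1.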

Subcellular localization
To analyse the subcellular location of Zm00001d053895, we first used the TargetP-2.0 server (http://www.cbs.dtu.dk/services/TargetP/) to predict its putative subcellular location. Then, the CDS of Zm00001d053895 (without the stop codon) was amplified from RP125 and cloned into the pCAMBIA2300-eGFP vector. The construct and the empty vector were each cotransformed with the nuclear marker (p35S::NLS-RFP) (Wu et al. 2016) into tobacco (Nicotiana benthamiana) mesophyll cells. Fluorescence signals were observed with a laser scanning confocal microscope (Zeiss 800). eGFP and RFP fluorescence signals were detected at 488 nm and 561 nm, respectively.

Transactivation activity analysis
The Zm00001d053895 CDS was inserted into the pGBKT7 vector using the In-Fusion cloning method (Vazyme ClonExpress II One Step Cloning Kit, Vazyme Biotech, China). The recombinant plasmid was then transformed into the yeast strain AH109 (Tiandz, China) by the lithium acetate-mediated method. Growth of positive transformants was examined on SD/-Trp medium and on SD/-His-Trp medium containing 50 mg/l X-α-gal (Coolaber, China) for 2-4 days at 28°C. The empty pGBKT7 vector and pGBKT7-GAL4 AD were used as negative and positive controls, respectively.

Phylogenetic analysis
To determine the evolutionary relationship between Zm00001d053895 and its homologs in various species, homologous sequences were searched in the NCBI database (https://www.ncbi.nlm.nih.gov/) using the Zm00001d053895 amino acid sequence as the query, and 14 homologs from Oryza sativa, Solanum lycopersicum, Brachypodium, foxtail millet, Hordeum vulgare, A. thaliana, Solanum tuberosum, Sorghum and Triticum aestivum were retrieved. Multiple sequence alignment was performed using CLUSTALW with default settings within MEGA 6 (Higgins 1996). An unrooted phylogenetic tree was constructed in MEGA 6 by the neighbour-joining method and tested with 1000 bootstrap replicates, and the tree was rendered using EvolView (https://www.evolgenius.info/evolview/) (Zhang et al. 2012).

Co-expression analysis
Co-expression analysis was used to identify potential interacting proteins of Zm00001d053895. Expression data for approximately 40,000 maize genes from 8 tissues of B73 were downloaded from the q-teller database (http://www.qteller.com); expression was measured as fragments per kilobase of exon per million fragments mapped (FPKM). The Pearson correlation coefficient (PCC) of each gene with Zm00001d053895 was calculated from the expression data, and co-expressed genes with PCC > 0.8 and P-values < 0.05 were selected. The FPKM values of the co-expressed genes were normalized as log2(FPKM + 1), and Z-scores were then calculated (Sekhon et al. 2011). A gene with a Z-score larger than 2 in a tissue was considered tissue-specific. To characterize the putative functions of the genes co-expressed with Zm00001d053895, GO terms for each co-expressed gene were obtained from Gramene (http://www.gramene.org/), and GO enrichment analysis was performed using OmicShare tools (http://www.omicshare.com/tools).
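The co-expression filter and the tissue-specificity score described above can be sketched in plain Python. The expression vectors below are hypothetical, and the function names are illustrative. The Z-score here uses the population standard deviation across the 8 tissues, which is an assumption about a detail the text does not specify. The P-value filter is omitted for brevity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def log2_fpkm(values):
    """Normalize FPKM values as log2(FPKM + 1)."""
    return [math.log2(v + 1) for v in values]


def z_scores(values):
    """Z-score of each tissue relative to all tissues (population SD, assumed)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def specific_tissues(fpkm, threshold=2.0):
    """Indices of tissues in which the gene's Z-score exceeds the threshold."""
    scores = z_scores(log2_fpkm(fpkm))
    return [i for i, score in enumerate(scores) if score > threshold]


# Hypothetical FPKM values across 8 tissues; index 7 stands for anther.
query_gene = [1, 2, 1, 3, 2, 1, 2, 120]
candidate_gene = [0, 1, 0, 2, 1, 0, 1, 95]

pcc = pearson(query_gene, candidate_gene)
print(f"PCC = {pcc:.3f}, co-expressed: {pcc > 0.8}")
print("tissue-specific indices:", specific_tissues(candidate_gene))
```

With only 8 tissues, a Z-score above 2 is reached only when expression is concentrated in essentially one tissue, which is why the threshold works as a tissue-specificity criterion.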

Results
Male sterile mutant ms40 is controlled by a recessive nuclear gene
Pollen of the maize inbred line RP125 was treated with EMS, and M1 plants were self-pollinated to obtain the M2 generation. The male-sterile mutant ms40 was found in the M2 generation and was fertilized with RP125 pollen. All individuals of the (ms40×RP125)F1 population were fertile. The F1 individuals were then both self-pollinated and reciprocally crossed with RP125. The male-sterile phenotype segregated again in both the F2 and the backcross populations, which implied that ms40 is a genic male-sterile (GMS) mutant. Male-sterile plants were then pollinated with pollen of B73 and Mo17 separately, and all individuals of (ms40×B73)F1 and (ms40×Mo17)F1 were fertile. These F1 plants were self-pollinated, and the F2 seeds were planted in both Yunnan and Sichuan for tassel fertility identification. The male-sterile phenotype segregated in the F2 populations at both locations. Moreover, the segregation ratio of fertile to male-sterile plants in the (ms40×B73)F2 and (ms40×Mo17)F2 populations fit 3:1 according to the chi-square test (Tab 1). These results indicated that the sterile phenotype of ms40 is controlled by a single recessive nuclear gene. No obvious differences in agronomic traits were found between RP125 and ms40 (Fig. 1A, B). After tasseling, RP125 tassels and anthers appeared normal, whereas ms40 tassels failed to exsert anthers and shed no pollen, and the anthers were smaller and thinner than those of RP125 (Fig. 1C-F). With I2-KI staining, RP125 pollen grains were round and stained dark blue, whereas no pollen grains were found in ms40 (Fig. 1G-H).

Anther development is defective in ms40
To characterize ms40 anthers, we examined the epidermis and inner surface of anthers by scanning electron microscopy (SEM). The anthers of ms40 were markedly smaller, shorter and more withered (Fig. 2A) than those of RP125 (Fig. 2B). In addition, the anther epidermal surface of RP125 showed a lattice of waxy crystals (Fig. 2C), whereas that of ms40 was smooth and lacked any cuticle (Fig. 2D). Ubisch bodies covered the whole inner surface of RP125 anthers (Fig. 2E), while none were observed on the inner surface of ms40 anthers (Fig. 2F). In broken anthers, round pollen grains filled the RP125 anthers (Fig. 2G, I), but no pollen grains were found within ms40 anthers (Fig. 2H), which further verified that ms40 is a no-pollen type male-sterile mutant. These results show that ms40 is associated with a series of developmental defects in the anther and pollen.

Mutant ms40 exhibits delayed degradation of the anther tapetum
Various anther developmental abnormalities have been observed in different male-sterile individuals, and understanding the cytological characteristics of pollen abortion helps explain the mechanism of failure in a male-sterile mutant. Therefore, anthers of male-sterile and fertile plants from the (ms40×RP125)BC1F1 population were examined at different stages using semi-thin sections. At the sporogenous cell and pollen mother cell stages, no substantial difference was observed between the anthers of RP125 and ms40 (Fig. 3A-D). At the meiosis stage, the RP125 tapetum took on a paliform shape and began to degrade, while the ms40 tapetum remained almost intact, suggesting that tapetum degradation in ms40 was delayed (Fig. 3E, F). Subsequently, the contents of the tapetal cells in RP125 began to condense and stained more deeply, while the ms40 tapetum swelled severely and irregular microspores were observed (Fig. 3G, H). At the large vacuolated stage, large vacuoles had formed in the centre of the microspores and the tapetal cells were further condensed and degraded in RP125; in ms40, however, the microspores began to shrink and were unable to form large vacuoles, and the tapetal cells were clearly visible with no signs of disintegration (Fig. 3I, J). At the binucleate stage, the vacuolated microspores of RP125 underwent asymmetric mitotic division and appeared falcate, accompanied by complete disintegration of the tapetum, while the microspores of ms40 gradually degraded and vacuolation of the tapetum became more obvious (Fig. 3K, L). At the mature pollen grain stage, abundant starch-filled pollen grains were observed in the anther locules of RP125; in contrast, the microspores of ms40 were almost completely degraded, leaving only remnants in the locules, and a vacuolated tapetum could still be observed (Fig. 3M, N). In conclusion, the male-sterile mutant ms40 exhibited markedly delayed tapetum disintegration and a no-pollen type of abortion.

Male sterile gene of ms40 was mapped on chromosome 4 within a 282-kb region
In this study, the (ms40×B73)F2 population was used as the mapping population, and 134 InDel markers covering the whole maize genome were developed from whole-genome re-sequencing data of RP125 and B73. Screening of the 134 InDel markers between ms40 and B73 yielded 73 polymorphic markers. These 73 markers were then screened between the male-sterile and fertile DNA pools, and markers umc1940 and umc1649 were polymorphic between the pools; both are located in bin 4.10 on maize chromosome 4. We therefore developed additional InDel markers between ms40 and B73, of which 6 were polymorphic. Genotyping of 115 male-sterile individuals from the (ms40×B73)F2 population with these 6 markers identified 1 and 7 recombinants with X98 and X72, respectively, and no recombinants with X76. Therefore, the ms40 locus was located between X98 and X72 on chromosome 4 (Fig. 4A).
To narrow the mapping region, 1230 male-sterile individuals from the (ms40×B73)F2 population were genotyped with X98 and X72, and 11 and 6 recombinants were identified, respectively. We then developed 4 further polymorphic InDel markers within the interval between X98 and X72. Among these, markers X214 and X242 each detected only 1 recombinant among the 1230 male-sterile individuals, and marker X168 detected none. The male-sterile gene of ms40 was therefore mapped to chromosome 4 between X214 and X242, a physical distance of 282 kb (Fig. 4B). According to the MaizeGDB database, a total of 5 open reading frames (ORFs) were identified within the fine-mapping region (Tab 2). Information on all markers used in this study is provided in Table S1.
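The logic of narrowing the interval from recombinant counts can be sketched as below. Two assumptions are made for illustration and are not stated in the text. First, the markers are taken to lie in the order X98, X214, X168, X242, X72 along the chromosome. Second, each recombinant sterile plant is counted as contributing one recombinant chromosome out of the two carried by a homozygous recessive F2 individual. Function names are hypothetical.

```python
N_STERILE = 1230

# (marker, recombinants among the 1230 sterile plants); marker order is assumed.
MARKERS = [("X98", 11), ("X214", 1), ("X168", 0), ("X242", 1), ("X72", 6)]


def recombination_fraction(recombinants, n_plants):
    """Approximate recombination fraction, counting two chromosomes per plant."""
    return recombinants / (2 * n_plants)


def flanking_interval(markers):
    """Return the closest recombinant markers on either side of the
    co-segregating (zero-recombinant) marker or markers."""
    zero = [i for i, (_, rec) in enumerate(markers) if rec == 0]
    left = max(i for i in range(zero[0]) if markers[i][1] > 0)
    right = min(i for i in range(zero[-1] + 1, len(markers)) if markers[i][1] > 0)
    return markers[left][0], markers[right][0]


for name, rec in MARKERS:
    fraction = recombination_fraction(rec, N_STERILE)
    print(f"{name}: {rec} recombinants, approx. {100 * fraction:.3f}% recombination")

print("candidate interval:", flanking_interval(MARKERS))
```

Recombinant counts fall toward zero as markers approach the locus from either side, so the interval is bounded by the nearest markers that still show a recombinant.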

Zm00001d053895 is the key candidate gene of ms40
The 5 ORFs were cloned from RP125 and ms40, sequenced and aligned. Only Zm00001d053895 harboured an SNP (G to A), at position 2851, between RP125 and ms40; no sequence differences were found in the other 4 genes. The SNP is located in the seventh exon of Zm00001d053895 and changes an amino acid from Gly (GGG) to Arg (AGG) (Fig. 5A). Zm00001d053895 is predicted to be an anther-specific gene encoding a bHLH transcription factor (https://www.maizegdb.org/gene_center/gene/Zm00001d053895). Mutants of its homologous genes OsTDR and AtAMS (tdr1 and ams) are male sterile and display developmental defects in the anther tapetum (Ferguson et al. 2017; Li et al. 2006).
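The effect of the G-to-A substitution on the codon can be verified against the standard genetic code. The sketch below builds the standard codon table from its conventional compact encoding. It assumes that the SNP falls on the first base of the GGG codon, which follows from the reported GGG-to-AGG change. The helper name is hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def substitute(codon, position, base):
    """Return the codon with the base at a 0-based position replaced."""
    return codon[:position] + base + codon[position + 1:]


wild_type = "GGG"
mutant = substitute(wild_type, 0, "A")  # G-to-A at the first codon position
print(f"{wild_type} ({CODON_TABLE[wild_type]}) -> {mutant} ({CODON_TABLE[mutant]})")
```

The substitution replaces glycine (G) with arginine (R), a small neutral residue with a large positively charged one. A G-to-A transition is also the type of change typically induced by EMS.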
To determine whether the SNP (G/A) at position 2851 of Zm00001d053895 is related to male fertility, we developed an SNP marker (SNP-F/SNP-R) from the flanking sequence of the mutation site in ms40. Co-segregation analysis with the SNP marker (SNP-F: 5'-TGTCATTGTACGTACGGCGG-3', SNP-R: 5'-CGTGGGATGTACGGCGATG-3') was performed on 1589 individuals from the (ms40×Mo17)F2 population, comprising 1218 fertile and 371 male-sterile plants, and on 197 individuals from the (ms40×RP125)BC1F1 population, comprising 101 fertile and 96 male-sterile plants. The 1218 fertile plants of the (ms40×Mo17)F2 population carried either the homozygous G/G or the heterozygous A/G genotype, and the 101 fertile plants of the (ms40×RP125)BC1F1 population carried the heterozygous A/G genotype at position 2851 of Zm00001d053895. In contrast, the homozygous A/A genotype was detected in all 467 male-sterile plants: 371 sterile individuals from the (ms40×Mo17)F2 population and 96 sterile individuals from the (ms40×RP125)BC1F1 population. These results showed that the A/A genotype at position 2851 co-segregated with the male-sterile phenotype of ms40. Moreover, 30 different inbred lines were examined using the SNP marker. Only ms40 had the homozygous A/A genotype at position 2851 of Zm00001d053895, and the 30 inbred lines were all homozygous G/G at the corresponding locus (Fig. 5B; Table S2), suggesting that the G at nucleotide 2851 is conserved and that position 2851 of Zm00001d053895 may be a key functional site.

Zm00001d053895 encodes a bHLH transcription factor and has transcriptional activation ability
To illustrate the evolutionary relationship of Zm00001d053895, we performed phylogenetic analysis based on 14 orthologous genes from 10 plant species that share high sequence similarity with Zm00001d053895. Multiple sequence alignment revealed a classic HLH domain in all 14 homologous genes (Fig. 6A), suggesting that Zm00001d053895 is a typical bHLH transcription factor and that its orthologs might have a conserved function among plant species. Moreover, the mutation site of Zm00001d053895 in ms40 is located within the conserved HLH domain. Phylogenetic analysis showed that these genes were divided into three clades, which indicates that their molecular functions have a degree of evolutionary conservation. Zm00001d053895 shared the highest homology (81.2%) with Sb04g001650 of Sorghum (Fig. 6B), a putative TDR bHLH transcription factor associated with the development of the anther tapetum, which suggests that Zm00001d053895 plays an important part in regulating tapetum development.
A transactivation activity assay was performed to investigate the transcriptional activation ability of Zm00001d053895. The construct pGBKT7-Zm00001d053895 was generated, and the empty pGBKT7 vector and pGBKT7-GAL4 AD were used as negative and positive controls, respectively. All transformants grew well on SD/-Trp medium. On SD/-His-Trp medium containing 50 mg/l X-α-gal, however, the empty pGBKT7 transformant could not grow, whereas the pGBKT7-GAL4 AD and pGBKT7-Zm00001d053895 transformants grew normally and turned the indicator blue (Fig. 7), indicating that Zm00001d053895 has transcriptional activation ability.

Four transcripts of Zm00001d053895 were identified in maize anther by RACE
A rapid amplification of cDNA ends (RACE) assay was performed to determine the structure of Zm00001d053895 transcripts, and four transcripts were amplified from total RNA of RP125 anthers at different developmental stages. The CDS of each transcript was predicted with the ORF finder tool (https://www.ncbi.nlm.nih.gov/orffinder/). The four transcripts encode 560 aa (Zm00001d053895-T001), 597 aa (Zm00001d053895-T002), 603 aa (Zm00001d053895-T003) and 628 aa (Zm00001d053895-T004), respectively (Fig. 8A). Zm00001d053895-T001, Zm00001d053895-T003 and Zm00001d053895-T004 contain 7 exons and 6 introns, whereas Zm00001d053895-T002 contains 8 exons and 7 introns. Compared with the shortest transcript, Zm00001d053895-T001, both Zm00001d053895-T003 and Zm00001d053895-T004 result from an alternative 5' splice site (A5SS), while Zm00001d053895-T002 results from exon skipping (ES). Sequence alignment also showed that the four transcripts share the same stop codon position but differ in start codon position, which accounts for the diversity of the CDSs and reflects the different transcription start sites (TSSs) of the four transcripts.
To determine the expression specificity of Zm00001d053895, qRT-PCR was performed on roots, stems, leaves and anthers at different developmental stages of RP125 with the specific primers q-51-F/R. Zm00001d053895 was preferentially expressed in anthers at the pollen mother cell and tetrad stages, and its expression was low at the uninucleate and binucleate stages. These results are consistent with the phenotypic differences observed at the meiotic stages between RP125 and ms40. Compared with anthers, expression in roots, stems and leaves was very low (Fig. 8B), which suggests that Zm00001d053895 is an anther-specific gene and plays an important role in anther development.

Zm00001d053895 was localized in the nucleus and coexpressed with some anther-specific genes
The TargetP-2.0 server was first used to predict the putative subcellular location of Zm00001d053895, and transient expression assays were then performed in tobacco leaves. The vector p35S::Zm00001d053895-eGFP was constructed by fusing the Zm00001d053895 coding sequence to the N-terminus of eGFP under the control of the 35S promoter, and the empty vector p35S-eGFP was used as a positive control. The vector p35S::NLS-RFP, which carries a nuclear localization signal (NLS), was also constructed as a nuclear marker. As expected, the signal of p35S::NLS-RFP was confined to the nucleus of tobacco mesophyll cells, the eGFP signal of p35S-eGFP was distributed throughout the cell, and the two fluorescence signals merged in the nucleus when p35S::NLS-RFP and p35S-eGFP were cotransformed (Fig. 9A-D). When p35S::Zm00001d053895-eGFP and p35S::NLS-RFP were cotransformed, the fluorescence signals merged in the nucleus (Fig. 9E-H). These results suggest that Zm00001d053895 is localized in the nucleus, which is consistent with the location predicted by the TargetP-2.0 server.
We identified 1192 genes co-expressed with Zm00001d053895 across the whole genome using maize gene expression data downloaded from the q-teller database. Among them, the male-sterile gene Zm00001d02680 (ms7) and five putative GMS genes (Wan et al. 2019), Zm00001d031312, Zm00001d033335, Zm00001d013732, Zm00001d013991, and Zm00001d035841, had PCC values of 0.91, 0.98, 0.95, 0.97, 0.88 and 0.92 with Zm00001d053895, respectively. Such high correlations suggest that Zm00001d053895 may be related to male fertility.
Next, the GO terms of the 1192 co-expressed genes were analysed. In the biological process category, metabolic process, cellular process and single-organism process were highly enriched. In the cellular component category, 107 GO terms were enriched, mostly in membrane and membrane part. In the molecular function category, binding and catalytic activity were the most abundant subcategories (Fig. 10A); these functions have been reported to be related to alterations in male fertility (Cunmin et al. 2015; Mei et al. 2016; Zhu et al. 2015). After normalization of the expression data of the co-expressed genes, we found that 647 genes were specifically expressed in anthers and pollen (Fig. 10B). These results indicate that Zm00001d053895 is very likely related to anther development.

Discussion
The male-sterile mutant ms40, derived from the progeny of the EMS-treated inbred line RP125, showed stable male sterility in multi-year tests in both Sichuan and Yunnan. The mutant exserted no anthers and belongs to the no-pollen type. Genetic analysis showed that ms40 is controlled by a recessive nuclear gene. Through map-based cloning, we mapped the gene to the long arm of chromosome 4 within a 282-kb region containing five open reading frames. Cloning and sequencing revealed a G-to-A SNP in the seventh exon of Zm00001d053895, which encodes a bHLH transcription factor (https://www.maizegdb.org/gene_center/gene/Zm00001d053895); its homologous genes OsTDR1 and AtAMS have been reported to be related to anther development (Li et al. 2006).
An intragenic marker was developed from the SNP locus, and co-segregation analysis was conducted in two different fertility-segregating populations. All male-sterile individuals had the A/A genotype, which further supported Zm00001d053895 as the male-sterile gene of ms40. In addition, 30 maize inbred lines were used to analyse the sequence conservation of Zm00001d053895. PCR amplification and sequencing showed that all the inbred lines had the G/G genotype, whereas only ms40 individuals had the A/A genotype at the same locus, indicating that the G-to-A base change is responsible for the ms40 mutant phenotype. Therefore, Zm00001d053895 is considered the key candidate gene of mutant ms40.
Understanding how the anther fails is necessary for exploring the abortion mechanism of a male-sterile mutant. The tapetum, the innermost layer of the anther wall, is closely associated with microspore development, and its normal development is vital to pollen formation.
Cytological observation showed that the anther tapetum of ms40 exhibited obvious vacuolization and delayed degradation. These abnormalities in tapetal development are consistent with the previously reported functions of bHLH transcription factors such as AtAMS, OsTDR1, OsUDT1, OsEAT1, ZmMS23 and ZmMS32 (Ferguson et al. 2017; Jung et al. 2005; Ko et al. 2014; Moon et al. 2013; Nan et al. 2017; Niu et al. 2013), whose mutants all show different degrees of tapetal abnormality. Thus, the cytological observations also support Zm00001d053895 as the key candidate gene of ms40. qRT-PCR analysis showed that Zm00001d053895 was expressed specifically in anthers, preferentially at the pollen mother cell and tetrad stages of anther development. Consistently, the cytological differences in anther development between RP125 and ms40 occurred at the meiotic stages. Moreover, we found that ms40 had no cuticle on the anther epidermis and no Ubisch bodies on the inner anther surface. Not all male-sterile mutants lack the anther cuticle and Ubisch bodies, for example OsLSP1 (Luo et al. 2020) and OsMS1 (Yang et al. 2019). Compared with RP125, the tapetal cell contents in ms40 anthers were particularly sparse, which probably hindered formation of the anther epidermal cuticle and of the Ubisch bodies on the inner anther surface.
Zm00001d053895 has the classical HLH domain of bHLH transcription factors, which is necessary for these factors to form homodimers or heterodimers and thereby regulate the expression of target genes (R et al. 2013). bHLH-bHLH complexes have been confirmed to be relevant to plant fertility.
Our transcriptional activation assay showed that Zm00001d053895 has transcriptional activation activity. In addition, the NCBI-CDD tool predicted that the G-to-A SNP lies within the binding sites of Zm00001d053895. Different haplotypes of this SNP in the seventh exon of Zm00001d053895 may therefore directly affect its transcriptional activation ability and thereby disturb pollen development by altering the expression of target genes.
It is well known that maize male-sterile mutants play an important part in hybrid seed production.
Although various male-sterile genes have been cloned, there is still a considerable gap between discovering a male-sterile mutant and applying it in seed production for a particular cross combination. Therefore, RP125, an elite inbred line with high yield and high combining ability, was selected as the starting material for EMS treatment. Fortunately, the male-sterile mutant ms40 was obtained from its progeny, so ms40 can be used for hybrid seed production in cross combinations that use RP125 as the female parent. Across different years and locations, no obvious difference in agronomic traits was found between RP125 and ms40, and other studies have shown that RP125 tolerates low phosphorus and resists a variety of diseases. Moreover, exploring the abortion mechanism of ms40 can not only contribute to research on maize anther developmental biology but also substantially promote the use of male-sterile lines in hybrid seed production.

Phylogenetic tree of Zm00001d053895 and its homologs in other plant species. The percentage identities represent the sequence similarity between the corresponding proteins and Zm00001d053895. The yellow star represents Solanum tuberosum, the red circle represents Solanum lycopersicum, the blue star represents Arabidopsis thaliana, the black star represents Glycine max, the green circle represents Oryza sativa, the black rectangle represents Foxtail millet, the red star represents Zea mays, the red triangle represents Sorghum, the blue rectangle represents Brachypodium, the red rectangle represents Hordeum vulgare, and the green triangle represents Triticum aestivum.

Figure 7
Transcriptional activation assay of Zm00001d053895 in the AH109 yeast strain. Empty pGBKT7 and pGBKT7-GAL4 AD served as the negative and positive controls, respectively. Yeast transformants were spotted onto control medium (SD/-Trp) and selective medium (SD/-Trp/-His).

The bar indicates the relative gene expression level. These gene expression data were retrieved from the qTeller database and log2-normalized (original data + 1).
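
The log2 normalization with a pseudocount of 1 mentioned in the caption can be sketched in Python as follows. The function name and the input values are hypothetical and chosen only for illustration; they are not expression values from qTeller or from this study.

```python
import math


def log2_plus_one(values):
    """Apply log2(x + 1) to each raw expression value.

    The pseudocount of 1 keeps zero expression defined (it maps to 0).
    """
    return [math.log2(v + 1.0) for v in values]


# Hypothetical raw expression values, for illustration only.
raw = [0.0, 1.0, 3.0, 1023.0]
normalized = log2_plus_one(raw)
print(normalized)  # [0.0, 1.0, 2.0, 10.0]
```

The transform compresses the wide dynamic range of expression values, so that highly and weakly expressed genes can be displayed on the same colour scale.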