Identification of Nucleus Sterility Candidate Genes using a Resequencing Technique in Sweet Pepper Sterile Line AB91

Breeding hybrids with nuclear male sterile lines is an important method for the cross breeding of sweet peppers. Approximately 20 pepper nuclear male sterile lines are known worldwide, yet few reports have been published on the nuclear male sterility genes of sweet pepper. Using self-developed material, the sweet pepper nuclear male sterile dual-purpose line AB91, we applied genome-wide resequencing and found for the first time that the mutation site causing pollen abortion in AB91 is on chromosome 5. The mutated gene Capana05g000747 was filtered out, validated by time-of-flight mass spectrometry genotyping, and determined to be the gene causing pollen abortion in AB91. Capana05g000747 contains eight exons and seven introns, and its mutation is a non-synonymous substitution in the 6th exon: base C is mutated to A, and the amino acid changes from alanine to serine. Sequence alignment analysis showed that Capana05g000747 is similar in predicted function to At2g02148, which encodes a pentatricopeptide repeat protein. Such proteins have important physiological functions in organelle gene expression and are closely related to the action of male sterility genes. Therefore, Capana05g000747 was selected as an important candidate gene for the sweet pepper nuclear male sterile line AB91.

Background
Sweet pepper (Capsicum annuum var. grossum) is an annual or perennial crop in the Solanaceae and a variety of pepper. Heterosis in sweet pepper has high utilization value, and using male sterility to breed hybrids is an important way to avoid artificial emasculation. The topic of male sterility has therefore drawn a great deal of attention from researchers in China and abroad. Because sweet pepper lacks nuclear restorer genes for cytoplasmic male sterility and good restorer lines are difficult to find, the breeding and application of cytoplasmic male sterility in sweet pepper hybrids is limited. In contrast, the genetics of nuclear male sterility in sweet pepper is relatively simple and restorer sources are plentiful, which is a great advantage in breeding sweet pepper hybrids. Martin and Crawford [1] first reported Capsicum sterile material controlled by a single recessive gene. Since then, new genes have been discovered in pepper male sterile material. To date, nearly 20 nuclear sterility genes have been discovered in peppers [2][3][4]. However, there are very few reports on sweet pepper nuclear male sterility genes. Fan et al. [5] discovered a natural sterile sweet pepper plant. After years of research, improvement, and breeding, the sweet pepper nuclear male sterile dual-purpose line AB91 was developed; its sterility is controlled by a pair of recessive nuclear genes named msc2 [6]. The sterile line AB91 is completely and stably sterile, has no adverse cytoplasmic effect, and has excellent agronomic traits and wide restorer sources. Most inbred lines can serve as its restorer lines, so parents can be combined freely and superior combinations are more likely to be obtained. So far, 14 sweet pepper hybrids have been bred using AB91 and have been widely planted [7].
We have found that, at the cytological level, abortion in sterile plants occurs after the tetrad stage of microspore development; the causes of microspore abortion are disintegration of the tetrad callose wall, tapetum cell dysplasia, and delayed disintegration [8]. However, there are no reports of co-segregating markers or candidate genes for this locus.
In recent years, with the rapid development of modern molecular biology technology in plants, male sterility research has advanced to the level of genetic engineering. At present, many genes have been linked to sterility traits in pepper sterile lines. Ma et al. [9] and Li et al. [10] discovered the gene PAP3 in peppers, which was found to cause atrophy of pollen grains leading to male sterility.
Fan et al. [13] cloned a new gene, CaCTS, from a pepper male sterile line; it is highly expressed in flowers and seeds, moderately expressed in the placenta and pericarp, and weakly expressed in stems and leaves. Deng et al. [14] amplified the malate dehydrogenase (MDH) gene by reverse transcription PCR. The expression level of MDH is low during abortion and its expression trend is unstable, which may disturb the energy metabolism balance of the sterile line. Guo et al. [15] discovered the protein CaAMS, the down-regulation of which causes partial filament shortening, withering, non-dehiscent stamens, and pollen abortion. CaAMS plays an important role in the development of the pepper tapetum and pollen through a complex regulatory network. Deng et al. [16] amplified the complete coding sequence of the triose phosphate isomerase (TPI) gene by reverse transcription PCR. During abortion, the activity and expression level of TPI in the anthers of male sterile lines decreased significantly, whereas TPI levels in F1 hybrids and maintainer lines remained normal, which indicated that stable TPI transcripts keep energy metabolism at a normal level. Qing et al. [17] performed genomic resequencing and comparative analysis of a male sterile line and a fertile line of Capsicum and selected Capana02g002096 as a candidate gene for the msc1 locus according to genetic variation and annotation. Capana02g002096 encodes a homolog of AtDYT1, a bHLH transcription factor involved in the early development of the tapetum. In addition, a 7 bp deletion was found in an exon of Capana02g002096, which introduces a premature stop codon and results in sterility through loss of function. These prior genetic studies have provided important insights into the mechanism of male sterility and a theoretical foundation for related research.
Building on the studies mentioned above, our study used the sweet pepper nuclear male sterile line AB91 as the test material. Using the published pepper genome [18], the variant sites and differential candidate genes between fertile and sterile plants of AB91 were obtained through whole-genome resequencing, and the resequencing results were analyzed by bioinformatics methods. The candidate sites were filtered and validated by a mass spectrometry genotyping method, and the candidate gene for male sterility in AB91 was finally identified, which provides a theoretical and technical foundation for further cloning, transformation, and utilization of the gene.

Fertility results and statistical analysis
As Figure 1 shows for the sweet pepper male sterile dual-purpose line AB91, the anthers of the fertile plant are full and bright yellow, the stigma is lower than the anthers, and the whole anther is covered in pollen; in contrast, the anthers of the sterile plant are small, lavender, and shriveled, the stigma is higher than the anthers, and there is no pollen. A total of 473 F2 plants, obtained by selfing fertile plants (Msms) of the AB91 dual-purpose line, were planted in a field, and fertility identification was performed at the flowering stage. As the investigation results for the F2

Bulked-segregant analysis and data analysis
The MsMs and msms pools were sequenced by the bulked-segregant analysis sequencing (BSA-Seq) technique; 417.04 million and 467.1 million filtered reads were obtained from the MsMs and msms pools, respectively. For the MsMs pool, the mapping rate was 99.31%, the average coverage depth of the reference genome (excluding N regions) was 19.26X, and the 1X coverage (bases covered by at least one read) was 92.78%. For the msms pool, the mapping rate was 99.03%, the average coverage depth of the reference genome (excluding N regions) was 21.55X, and the 1X coverage was 92.95%. According to the results in Table 1, the alignment results are normal and can be used for subsequent variant detection and association analysis. To reduce the impact of sequencing and alignment errors, the polymorphic sites were filtered by SNP index and InDel index, leaving 11,348,482 polymorphic marker sites after filtering. The difference in SNP index and InDel index between the two progeny pools was then calculated, and windows exceeding the threshold at the 95% confidence level were selected as candidate intervals; the numbers of polymorphic marker sites selected for SNPs and InDels were 27,541 and 1,865, respectively. From the ANNOVAR annotation of the candidate sites, genes carrying stop-loss or stop-gain variants were selected with priority, followed by genes carrying non-synonymous mutations or alternative splicing site variants. The number of SNP candidate genes filtered out from the fertile and sterile pools was 35 (no InDel fell into the annotation classes above), and 33 of these candidate genes are located on chromosome 5 (Figure 2).

Functional annotations of genes
The 33 genes in the associated region were compared against the NR, SwissProt [19], GO [20], and KEGG [21] databases using BLAST software, which identified 10 candidate genes and 11 candidate sites putatively associated with sweet pepper nuclear male sterility.
A mass spectrometry genotyping technique was used to test the 11 mutation sites obtained by BSA-Seq in a population of 222 F2 plants obtained by selfing fertile plants (Msms) of the sweet pepper nuclear male sterile dual-purpose line AB91. Sites 19194137 and 34599677 did not co-segregate completely with msc2, so the two corresponding genes, Capana05g000617 and Capana05g000896, were excluded. The remaining 9 sites showed co-segregation, although with slight differences from the msc2 phenotypes scored in the field. The mutation site 28594037 agreed best with msc2: its accuracy rate (the number of plants consistent with the msc2 phenotype scored in the field divided by the total F2 population size) was 99.5%, and the corresponding gene for this site is Capana05g000747. Hence, Capana05g000747 was determined to be the strongest candidate gene for msc2 (Figure 3).

Sequence analysis of male sterility gene msc2 for sweet pepper line AB91
The gene msc2 of the sweet pepper male sterile line AB91 is composed of eight exons and seven introns. The cDNA sequence is 1350 bp in length and encodes 450 amino acids. The mutation site of the sterility gene is located in the 6th exon of Capana05g000747: base C is mutated to A, and the amino acid changes from alanine to serine, which alters fertility in the sterile plants of the sweet pepper nuclear male sterile dual-purpose line AB91 and leads to male sterility (Figure 4).
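The base change and the amino acid change reported above can be cross-checked against the standard genetic code. The sketch below is illustrative only: it enumerates every single-base substitution that turns an alanine codon into a serine codon. The source does not state on which strand the C to A change is reported, so the strand interpretation printed here is an assumption, not a result from this study.

```python
from itertools import product

# Standard genetic code: codons for alanine and serine (DNA alphabet).
ALA = {"GCT", "GCC", "GCA", "GCG"}
SER = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def single_base_changes(src_codons, dst_codons):
    """Return (src, dst, codon_position, ref_base, alt_base) for every
    pair of codons that differ at exactly one position."""
    changes = []
    for src, dst in product(sorted(src_codons), sorted(dst_codons)):
        diffs = [i for i in range(3) if src[i] != dst[i]]
        if len(diffs) == 1:
            i = diffs[0]
            changes.append((src, dst, i + 1, src[i], dst[i]))
    return changes


ala_to_ser = single_base_changes(ALA, SER)
for src, dst, pos, ref, alt in ala_to_ser:
    print(
        f"{src}->{dst}: codon position {pos}, "
        f"coding strand {ref}>{alt}, "
        f"opposite strand {COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    )
```

Every alanine-to-serine change reachable by one substitution is G to T at the first codon position, which reads as C to A on the opposite strand. A C to A change causing alanine to serine is therefore consistent with the variant being reported on the strand complementary to the coding strand, although the strand of Capana05g000747 would need to be confirmed from the genome annotation.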

Homology of nuclear male sterility gene msc2 in sweet pepper AB91 with other species
To study the function of msc2 further, we performed a sequence alignment and homology analysis of msc2 against other species. The analysis showed that msc2 is highly conserved and falls on the same branch as its tomato, potato, and tobacco homologs, indicating that they are closely related and that msc2 is likely to have the same function as these close homologs.
However, the functions of the msc2 homologs in tomato, potato, and tobacco are only predicted; they are annotated as similar to the At2g02148 gene, which encodes a pentatricopeptide repeat (PPR) protein. This finding preliminarily implies that the msc2 gene is connected with the PPR protein (Figure 5).

Discussion
Employing male sterility to breed superior hybrids is the most economical and efficient approach. To date, two types of recessive nuclear male sterility resources for sweet peppers have been discovered in China: one is the pepper male sterile material discovered by Shizhou Yang, with its nuclear sterility gene named msc1, and the other is the sweet pepper male sterile line AB91 used in this study, with its nuclear sterility gene named msc2. To date, AB91 has been used to breed five nationally approved vegetable varieties (JiYan5, JiYan6, JiYan15, JiYan16, JiYan108) and nine provincially approved vegetable varieties (JiYan8, JiYan12, JiYan13, JiYan19, JiYan4, JiYan105, JiYan102, JiYan20, JiYan28), which are widely planted throughout China. It is therefore of great significance to study the sterility gene msc2 of AB91. However, the sterility mechanism of the msc2 gene has not yet been reported. Whole-genome sequencing (WGS) is currently the most effective method for mining functional genes in populations, providing comprehensive, efficient, and accurate information. WGS-based analysis compares the sequences of individuals or groups at the genomic level to filter out functional genes [22]. WGS has been widely used in rice [23], cucumber [24], potato [25], watermelon [26], sorghum [27], and other crops. Therefore, this study used sequencing and genotyping technology to analyze in depth the cause of abortion associated with the msc2 gene.

Material with prominent and genetically stable objective traits
The sweet pepper male sterile dual-purpose line AB91 is maintained by sister crossing; its agronomic traits within the population have remained stable after years of breeding, and its sterility is controlled by a pair of recessive nuclear genes. The homozygous dominant fertile plants have a genetic background consistent with that of the homozygous recessive sterile plants, which avoids the interference that differences in genetic background between fertility materials would otherwise introduce into the analysis.

Capana05g000747 is an important candidate gene of msc2
Nuclear male sterility caused by a point mutation has been reported in cucumber [28], maize [29], and other crops, but has not been reported in sweet pepper. In this study, the mutation site that causes abortion in the sweet pepper male sterile line AB91, located on chromosome 5, was discovered for the first time through WGS technology. Time-of-flight mass spectrometry genotyping was used to verify the differential genes and filter out Capana05g000747, which is most likely the gene leading to abortion in AB91. Capana05g000747 contains eight exons and seven introns, and the mutation site 28549037 is a non-synonymous mutation in its 6th exon; the mutation of base C to A, with the resulting change of the amino acid from alanine to serine, is what causes male sterility in the sweet pepper male sterile line AB91.

Sequence analysis of the Capana05g000747 gene
To explore the function of Capana05g000747 further, we performed a sequence alignment analysis, which showed that the gene is closely homologous to genes in tomato, potato, and tobacco; however, their functional annotations are all predicted and are similar to that of At2g02148. At2g02148 encodes a PPR protein. PPR proteins are encoded by nuclear genes and consist of tandem repeats of a degenerate 35-amino-acid motif; most of them are transported into organelles to fulfill their functions [30].
PPR proteins have important physiological functions in organelle gene expression and are involved in almost all of its stages, including transcription [31], RNA splicing [32], RNA editing [33], translation [34], and maintenance of RNA stability [35]. Previous studies have shown that PPR genes are dispersed across the entire pepper genome [36][37], while most of the Rf candidate genes are on chromosome 6 [38][39] and act by controlling genes related to male sterility. Our research shows that the PPR candidate gene of the nuclear male sterile line AB91, on chromosome 5, carries a base mutation that changes protein function and results in microspore abortion. However, the molecular mechanism of the PPR protein in AB91 is still unclear, and its verification will require further research.

Materials And Methods
The sweet pepper recessive male sterile dual-purpose line AB91 was provided by the Sweet Pepper and hybrid plants (Msms) were separated and identified according to offspring fertility to complete the basic group construction.

Library construction and Illumina sequencing
The homozygous fertile plants (MsMs) and recessive sterile plants (msms) in the F2 segregating population of sweet pepper AB91 were used as test materials, and DNA was extracted from the young leaves of the plants by the CTAB method [40] to construct the MsMs and msms libraries. The DNA sample was fragmented to a size of 350 bp by sonication; the DNA fragments were then end-polished, A-tailed, and ligated with the full-length adapter for Illumina sequencing, followed by PCR amplification. Finally, the PCR products were purified (AMPure XP system). The size distribution of the libraries was analyzed on an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA), and the libraries were quantified using real-time PCR. The libraries were sequenced on an Illumina HiSeq4000 platform (Illumina, San Diego, CA, USA), and 150 bp paired-end reads were generated with an insert size of approximately 350 bp.

Data analysis
The raw reads from the msms and MsMs pools were filtered and aligned to the pepper reference genome (http://peppersequence.genomics.cn/page/species/download.jsp) using the Burrows-Wheeler Aligner (BWA) [41]. GATK software was used to detect single-nucleotide polymorphisms (SNPs) and insertions-deletions (InDels) [42]. The read depth information for the homozygous SNPs/InDels in the offspring pools was used to calculate the SNP/InDel index [43]. We filtered out sites whose SNP/InDel index was less than 0.3 in both pools. A sliding window method was used to present the SNP/InDel index across the whole genome: the average of all SNP/InDel indexes in each window was taken as the SNP/InDel index of that window, with a default window size of 1 Mb and step size of 10 kb. The difference between the SNP/InDel indexes of the two pools was calculated as the delta SNP/InDel index. The differential candidate genes between male sterile and fertile plants of the sweet pepper male sterile line AB91 were determined from the values of the SNP/InDel index and Δ(SNP/InDel index).
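The index calculation described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: it assumes the common definition of the SNP/InDel index as the alternate-allele read depth divided by the total read depth at a site, and it assumes the delta is taken as the second pool minus the first. The `Site` record and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    pos: int      # 1-based position on one chromosome
    alt_a: int    # alternate-allele read depth in pool A (e.g. MsMs)
    depth_a: int  # total read depth in pool A
    alt_b: int    # alternate-allele read depth in pool B (e.g. msms)
    depth_b: int  # total read depth in pool B


def snp_index(alt_depth, total_depth):
    """Assumed definition: fraction of reads carrying the alternate allele."""
    return alt_depth / total_depth if total_depth else 0.0


def filter_sites(sites, min_index=0.3):
    """Drop sites whose index is below min_index in BOTH pools."""
    return [
        s for s in sites
        if not (snp_index(s.alt_a, s.depth_a) < min_index
                and snp_index(s.alt_b, s.depth_b) < min_index)
    ]


def sliding_delta(sites, chrom_len, window=1_000_000, step=10_000):
    """Average the index per window and return
    (start, end, mean_a, mean_b, delta) for each non-empty window."""
    results = []
    start = 1
    while start <= chrom_len:
        end = start + window - 1
        in_win = [s for s in sites if start <= s.pos <= end]
        if in_win:
            mean_a = sum(snp_index(s.alt_a, s.depth_a) for s in in_win) / len(in_win)
            mean_b = sum(snp_index(s.alt_b, s.depth_b) for s in in_win) / len(in_win)
            # Assumed sign convention: pool B minus pool A.
            results.append((start, min(end, chrom_len), mean_a, mean_b, mean_b - mean_a))
        start += step
    return results
```

Calling `sliding_delta(filter_sites(sites), chrom_len)` with the default 1 Mb window and 10 kb step mirrors the settings stated in the text. Windows whose delta exceeds a confidence threshold would then be kept as candidate intervals; how that threshold is derived is not specified here and is left out of the sketch.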

Mass spectrometry method identification
The F2 generation obtained by selfing fertile plants (Msms) of the sweet pepper male sterile line AB91 was genotyped by time-of-flight mass spectrometry on the Sequenom platform [44], and the results were read in real time and analyzed with the MassARRAY® software (Agena Bioscience Inc., San Diego, CA, USA). The detection for all samples was repeated to verify accuracy.
The genotyping results were compared with the results of phenotypic identification in the field to calculate the accuracy rate (the number of genotyping results consistent with the phenotype identified in the field divided by the total number of F2 plants), and the candidate genes for sweet pepper nuclear male sterility msc2 were filtered according to the accuracy rate.
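The accuracy rate defined above amounts to a simple concordance count. The sketch below is illustrative and rests on assumptions not spelled out in the text: genotypes are encoded as two-letter strings, the mutant allele is A, and under the recessive model only plants homozygous for the mutant allele are predicted to be sterile. The function names are hypothetical.

```python
def predicted_phenotype(genotype, mutant_allele="A"):
    """Recessive model (assumed): sterile only when both alleles are mutant."""
    return "sterile" if genotype == mutant_allele * 2 else "fertile"


def accuracy_rate(genotypes, field_phenotypes):
    """Fraction of plants whose genotype-predicted phenotype matches
    the phenotype scored in the field."""
    if len(genotypes) != len(field_phenotypes):
        raise ValueError("genotype and phenotype lists differ in length")
    if not genotypes:
        raise ValueError("no plants to compare")
    consistent = sum(
        predicted_phenotype(g) == p
        for g, p in zip(genotypes, field_phenotypes)
    )
    return consistent / len(genotypes)


# Toy data (hypothetical): the fourth plant is genotyped AA but scored fertile.
calls = ["CC", "CA", "AA", "AA"]
field = ["fertile", "fertile", "sterile", "fertile"]
print(f"accuracy rate: {accuracy_rate(calls, field):.2%}")
```

As a point of arithmetic, 221 consistent plants out of 222 would give about 99.5%, so a reported accuracy of 99.5% in a population of 222 is what a single discordant plant would produce; the actual number of discordant plants is not stated.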

Conflicts of interest
The authors declare that they have no competing interests.