A comprehensive analysis of the polygalacturonase family of wheat (Triticum aestivum L.) shed light on important genes affecting pollen development and anther dehiscence


 Background: Polygalacturonase (PG) belongs to a large family of hydrolases that play important roles in cell separation during plant growth and development by degrading pectin. The pollen-specific expression of PG genes may be of great significance for research on plant male sterility and for hybrid wheat breeding. However, the PG gene family has not been characterized in wheat (Triticum aestivum L.).
Results: We systematically studied the PG gene family using the latest published wheat reference genome. A total of 113 PGs were identified and named TaPG01-TaPG113 according to their chromosomal positions. They were unevenly distributed across the 21 chromosomes and were classified into six classes (A-F). Analysis of gene structures and conserved motifs revealed that the TaPGs of classes C and D had relatively short gene sequences and few introns, whereas class E TaPGs were the least conserved and all of their members lacked conserved domain III. Segmental duplication was one of the major drivers of the expansion of the wheat PG gene family. The cis-element predictions indicated that wheat PGs have a wide range of functions, including responses to light, low temperature, anaerobic conditions and hormones, as well as involvement in meristem expression. In addition, twelve spike-specific TaPGs were screened out using RNA-seq data, and three important genes were finally identified by expression analysis in the sterile and fertile anthers of thermo-sensitive male sterile wheat. TaPG93 may be involved in pollen development and pollen tube elongation, and TaPG87 and TaPG95 may play important roles in pollen grain separation and anther dehiscence.
Conclusions: We performed a thorough analysis of the wheat PG gene family and obtained three TaPGs that may affect wheat fertility. This work lays a solid foundation for exploring the functions of the wheat PG gene family and provides new insight into the fertility conversion mechanism of male sterile wheat.

Keywords: gene family, genome-wide, pectin, pollen, polygalacturonase, male sterility, wheat

Background
Polygalacturonase (PG, EC 3.2.1.15) is a cell wall-bound protein that catalyzes the cleavage of α-(1,4)-polygalacturonic acid in pectin. Among the numerous enzymes that degrade pectin, PGs form the largest family of hydrolases. PG was first studied in 1956 and was confirmed to be closely related to fruit softening. Subsequent studies showed that the main function of this enzyme is to degrade polygalacturonic acid in the fruit cell wall into galacturonic acid, which disintegrates the cell wall structure and leads to fruit softening [1,2]. Fabi et al. found that overexpression of cp-PG1 in papaya promotes pulp softening [3]. With the application of molecular biology techniques to PG research, the understanding of these enzymes has deepened. Roongsattham et al. identified 14 PG genes during the maturation and shedding of oil palm fruit, all of which were expressed in the abscission zone at the base of the fruit during ripening [4]. RDPG1, a PG gene in Brassica napus, was expressed mainly in the dehiscence zones of pods and anthers, and also in the abscission zone of floral organs and the nodes of stems and pedicels during the growth of pollen tubes [5].
The Arabidopsis thaliana PG gene QRT3 participates in the separation of tetrad microspores during pollen development: the secreted enzyme degrades the pollen mother cell wall surrounding the tetrad microspores [6]. Bergey [7] and Orozco-Cardenas [8] suggested that wounding induces the expression of PG genes, which produce plant-derived endogenous elicitors that further activate the defense genes PPO and pis in mesophyll cells. These findings show that members of the PG gene family can be expressed in different tissues and at different stages of plant development. In addition to their role in fruit ripening, PGs are associated with leaf and flower abscission, pod dehiscence, pollen maturation, pathogen defense, and plant-host interaction [9,10]. Therefore, although the role of PGs in fruit ripening has always been the focus of attention, their other functions, including those in non-fruit crops, are also worthy of further study.
PGs can be classified into endo-polygalacturonase (endo-PG, EC 3.2.1.15), exo-polygalacturonase (exo-PG) and rhamno-galactosidase (oligo-PG) depending on where they act on the substrate [11]. On the basis of amino acid sequence characteristics, Hadfield et al. classified PGs into three clades, A-C: clade A consists of genes expressed in non-pollen tissues that encode PGs lacking a presequence; clade B consists of all PG genes encoding a presequence; and clade C consists of PG genes expressed in pollen that encode exo-PGs [12]. Although this classification was derived from PG sequences, it also reflects functional differences, and different PG subclasses have distinct biological functions. However, Park  of Class F cannot be classified into any of properties of PG [11]. The six classes (A-F) of PG genes in Arabidopsis differ in expression pattern, with members of some classes tending to be expressed in vegetative tissues and others in reproductive organs [13]. Classifying PGs by phylogenetic clustering therefore provides a more detailed and convenient basis for screening function-related genes, and it is suitable for PG gene family analysis in most crops.
Based on these clustering classification rules, systematic analyses of the PG gene family have been carried out in Arabidopsis and rice [13], soybean [14], apple [15], poplar [16] and other crops, which greatly facilitates functional mining of plant PG gene families. However, no analysis of the PG gene family has been reported in wheat. Wheat is an allohexaploid with a total of 42 chromosomes, and research progress was long limited by its large and incomplete genomic information. The completion and release of the wheat genome sequence now make it possible to explore important functional genes of wheat at the genome-wide level [17].
Because pollen-specific PG genes have already been studied in tobacco [18], cotton [19] and Arabidopsis [20], the study of wheat pollen-specific PG genes is well founded and will provide a new perspective for research on wheat male sterility.
KTM3315A is a thermo-sensitive male sterile wheat line with excellent traits and great potential for hybrid seed production, and it has long been a focus of our team's research. In a previous study, we found that KTM3315A was male sterile under normal autumn sowing conditions in Shaanxi, China, but was fertile and could self-pollinate when sown in spring in the same area. Microscopic observation identified the binucleate stage as the critical period of fertility conversion, from which the microspore mother cells of sterile pollen exhibited abnormal shapes such as shrinkage and irregularity [21]. Transcriptome analysis indicated that this abortive morphology was closely related to the expression of pectinase [21]. It is therefore necessary to use this male sterile material to study the pollen-specific PG genes involved in pollen development, which can help us to breed more stable male sterile lines and accelerate wheat hybrid breeding.
In this study, a total of 113 members of the wheat PG gene family were identified, and a systematic overview was obtained by analyzing their evolutionary relationships, gene structures, conserved domains, cis-elements and expression patterns. Based on the expression patterns in sterile and fertile anthers, we screened out three TaPGs related to pollen development and anther dehiscence. Because the PG gene family had not previously been reported in wheat, this study provides a first systematic reference for it.

Identification and annotation of wheat PG gene family
The Hidden Markov Model (HMM) file of glycoside hydrolase family 28 (GH28, EC 3.2.1.) and the amino acid sequences of Arabidopsis PGs were searched against all wheat protein sequences, and a total of 113 wheat PGs were finally obtained after strict screening and identification (Additional file 1: Table S1). These 113 wheat PGs were unevenly distributed across the 21 chromosomes, and each chromosome carried at least one PG. However, five PG genes each were located on the A, B and D chromosomes of groups 2 and 5, as well as on chromosomes 1D, 4A and 4B (Additional file 2: Figure S1). According to their chromosomal locations, the 113 wheat PG genes were named TaPG01-TaPG113 (Table 1). Accurate identification and consistent naming are essential for further research on this gene family.

Evolutionary analysis of wheat PG gene family
To understand the evolutionary pattern of wheat PGs, a total of 238 PG proteins from wheat, Arabidopsis and rice were used to construct a phylogenetic tree (Fig. 1). The 238 PG proteins of these three species clustered into six branches (A-F). Class A contained 12 TaPGs, and classes B to F contained 16, 13, 28, 30 and 14 TaPGs, respectively. In addition, there were 39 pairs of paralogous genes among the 113 TaPGs: class A had four pairs, classes B, C and F each had five pairs, and classes D and E had nine and eleven pairs, respectively. In contrast, only one pair of rice-wheat orthologous PGs (LOC_Os01g33300/TaPG44) was found, in class D. It can therefore be speculated that the expansion of the wheat PG family mainly occurred after the species diverged from their common ancestor.
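As a quick consistency check on the figures reported above, the per-class counts can be summed and compared with the stated totals. The snippet below is only a bookkeeping sketch: the dictionaries simply restate the numbers given in the text, and the variable names are ours.

```python
# Per-class TaPG counts and paralogous pair counts, restated from the text.
class_counts = {"A": 12, "B": 16, "C": 13, "D": 28, "E": 30, "F": 14}
paralog_pairs = {"A": 4, "B": 5, "C": 5, "D": 9, "E": 11, "F": 5}

total_genes = sum(class_counts.values())
total_pairs = sum(paralog_pairs.values())

# Both totals should match the values stated in the text (113 and 39).
print(f"TaPGs across classes A-F: {total_genes}")
print(f"Paralogous pairs across classes A-F: {total_pairs}")
```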

The duplications of wheat PGs
To investigate the expansion dynamics and evolutionary direction of the wheat PG gene family, we examined its gene duplication events. Analysis of duplications in the wheat genome identified eight pairs of tandem duplications and 87 pairs of segmental duplications among TaPGs. The KA/KS values of the eight tandemly duplicated TaPG pairs were all less than 1, indicating that they were under purifying selection (Additional file 3: Table S2). The much larger number of segmental than tandem duplication pairs suggests that segmental duplication promoted the expansion of the wheat PG gene family.
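The interpretation of KA/KS used above can be made explicit with a small sketch. The function name, the tolerance and the example values are illustrative assumptions and are not taken from Table S2; the KA and KS estimates themselves would come from a dedicated tool.

```python
def selection_mode(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Classify a gene pair by its Ka/Ks ratio.

    Ka/Ks < 1 is conventionally read as purifying selection,
    Ka/Ks > 1 as positive selection, and Ka/Ks = 1 as neutral evolution.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# Hypothetical (Ka, Ks) values for illustration only.
example_pairs = {
    ("gene_1", "gene_2"): (0.05, 0.40),
    ("gene_3", "gene_4"): (0.60, 0.30),
}
for pair, (ka, ks) in example_pairs.items():
    print(pair, round(ka / ks, 3), selection_mode(ka, ks))
```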

Gene structure and conserved motif analysis of wheat PGs
The alignment of the amino acid sequences of the 113 wheat PGs indicated that most of them contained four intact or nearly intact conserved domains (Ⅰ 'SPNTDGI', Ⅱ 'GDDC', Ⅲ 'CGPGHGISIGSLG', Ⅳ 'RIK') (Fig. 3). However, some PGs lacked a domain; for example, in class B, the conserved domains of TaPG07 and TaPG81 were missing or incomplete. The most obvious deletions occurred in class E: as in apple [15], conserved domain III was missing from all wheat class E PGs. In addition, some PGs had complete conserved domains in which individual amino acids were substituted. In most class E PGs, the serine (S) in the conserved motif 'SPNTDGIH' was replaced by alanine (A) or threonine (T). Domain IV was the most conserved region: apart from the deletion in TaPG07 and the substitutions in class E PGs, it was completely conserved in the remaining TaPGs. In summary, the class E PGs were the most variable members of the family.
The MEME tool was used to search for conserved motifs in the 113 wheat PGs, and ten conserved motifs were obtained (Fig. 4, Additional file 4: Figure S2). No PG had all ten motifs. Wheat PGs of classes A and C shared the same five motifs, while all class D members except TaPG92 shared the same nine motifs. Motif 1 and motif 3 both contained conserved domains I 'SPNTDGI' and II 'GDDC', and motif 1 also contained part of conserved domain III 'CGPGHGISIGSLG', while motif 2 and motif 6 contained conserved domain IV 'RIK'. Apart from these four motifs, motif 5 and motif 9 were the most conserved because they were present in almost all PGs, including those of class E.
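A rough way to flag which of the four conserved domains are present in a protein sequence is an exact substring search, sketched below. This is a simplification and not the alignment-based procedure used in this study: exact matching will miss the nearly intact or substituted domains described above, and the toy sequence is invented for illustration.

```python
# The four conserved PG domains as given in the text.
DOMAINS = {
    "I": "SPNTDGI",
    "II": "GDDC",
    "III": "CGPGHGISIGSLG",
    "IV": "RIK",
}


def find_domains(seq: str) -> dict:
    """Return the 0-based start of each domain in seq, or -1 if absent."""
    seq = seq.upper()
    return {name: seq.find(motif) for name, motif in DOMAINS.items()}


def missing_domains(seq: str) -> list:
    """List the domains with no exact match in seq."""
    return [name for name, pos in find_domains(seq).items() if pos == -1]


# Invented toy sequence that carries domains I, II and IV but not III.
toy = "MKSPNTDGIAAGDDCLLRIKVV"
print(find_domains(toy))
print(missing_domains(toy))
```

A position-tolerant variant (for example, allowing S to A or S to T in domain I) would be needed to reproduce the class E observations.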
The gene structures of TaPGs showed that most PG genes had 1-9 introns. Class C and D PGs, which belong to the exo-PGs, had relatively short gene sequences and few introns; class E PGs, which belong to the oligo-PGs, generally had longer introns; and class F PGs generally had more introns (7-9). The analysis of motifs and gene structures thus not only reveals the structural characteristics of each PG class but also provides a basis for classifying PG functions.

GO annotation and cis-elements prediction of wheat PGs
Searching annotation databases and predicting upstream regulatory elements is an effective way to infer gene function. We obtained the molecular function, biological process and cellular component annotations of the 113 TaPGs (Additional file 5: Table S3). Cis-acting elements of TaPGs were predicted, and ten of them were analyzed here. They included elements related to responses to light, low temperature, anaerobic conditions and hormones, as well as meristem expression.

Analysis of the expression patterns of TaPGs
To predict functions from the expression specificity of PGs, a heatmap was generated using RNA-seq data from Chinese Spring wheat at 15 different developmental stages and tissues (Fig. 7). Obviously, 21, 7, 11 TaPGs were significantly up-regulated in the anthesis spikes In order to identify TaPGs that play an important role in wheat pollen development, we focused on twelve completely spike-specific TaPGs (TaPG09, 14, 18, 28, 40, 41, 49, 50, 54, 87, 93, 95).
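Screening for spike-specific genes from an expression matrix can be sketched as a simple threshold filter. The thresholds, tissue labels and gene names below are hypothetical placeholders; the criteria actually applied to the RNA-seq data are not specified in this section.

```python
def tissue_specific(expr, target_tissues, min_on=1.0, max_off=0.5):
    """Return genes expressed in the target tissues and silent elsewhere.

    expr maps gene -> {tissue: expression value}. A gene is kept when its
    highest value in the target tissues reaches min_on and every value
    outside the target tissues is at most max_off.
    """
    selected = []
    for gene, by_tissue in expr.items():
        on = [v for t, v in by_tissue.items() if t in target_tissues]
        off = [v for t, v in by_tissue.items() if t not in target_tissues]
        if on and max(on) >= min_on and all(v <= max_off for v in off):
            selected.append(gene)
    return sorted(selected)


# Invented expression values for illustration only.
toy_expr = {
    "gene_A": {"root": 0.0, "leaf": 0.1, "spike_anthesis": 12.4},
    "gene_B": {"root": 5.2, "leaf": 3.3, "spike_anthesis": 9.8},
    "gene_C": {"root": 0.0, "leaf": 0.0, "spike_anthesis": 0.2},
}
print(tissue_specific(toy_expr, {"spike_anthesis"}))
```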

Pollen-specific TaPGs identified
Real-time quantitative PCR (qRT-PCR) was performed on eight of the spike-specific TaPGs using sterile and fertile anthers of the thermo-sensitive male sterile wheat KTM3315A to examine their expression patterns in anthers. Three TaPGs (TaPG14, 40, 41) that were specifically expressed in spikes at the flag leaf stage showed a decreasing trend as the anthers developed, while the anthesis spike-specific TaPGs (TaPG18, 28, 87, 93, 95) were up-regulated during late anther development (Fig. 8). Together with the RT-PCR results (Additional file 6: Figure S3), these data on differential expression during anther development support the conclusion that these spike-specific TaPGs are actually expressed in anthers. Although TaPG14, TaPG40 and TaPG41 were up-regulated at the uninucleate stage, a role in fertility conversion could be ruled out because they were not significantly differentially expressed between sterile and fertile anthers. TaPG18, TaPG28, TaPG87, TaPG93 and TaPG95 all had their highest expression levels at the trinucleate or binucleate stage in fertile anthers and differed significantly in expression level from sterile anthers. However, the expression of TaPG87 was significantly down-regulated in fertile anthers compared with sterile anthers, whereas the other four TaPGs showed the opposite pattern. These non-uniform expression patterns suggest that the genes have different roles in anther development and that their functions are not redundant, even though TaPG87 and TaPG95 both belong to class B of this gene family.
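Relative expression from qRT-PCR is often computed with the 2^-ΔΔCt method. The quantification method is not stated in this section, so the sketch below is an assumption rather than a description of the procedure used here; it also assumes equal and perfect amplification efficiency, and the Ct values are invented.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Each Ct is normalised to a reference gene within its own sample,
    and the sample is then compared with the control condition.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Invented Ct values: the target appears 3 cycles earlier (relative to the
# reference gene) in fertile anthers than in sterile anthers.
fold = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,    # fertile anthers
    ct_target_control=25.0, ct_ref_control=18.0,  # sterile anthers
)
print(fold)
```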
Alignment of the above five differentially expressed TaPGs showed that their protein sequences were highly similar to four Arabidopsis proteins (AT4G18180, PGA4, ADPG2, ADPG1). TaPG18 (401 aa) and TaPG28 (401 aa) both matched AT4G18180 (422 aa) with a bit-score of 364.8; the bit-scores were 263 for TaPG93 (294 aa) and PGA4 (AT1G02790, 431 aa), 399.4 for TaPG87 (402 aa) and ADPG2 (AT2G41850, 433 aa), and 401 for TaPG95 (420 aa) and ADPG1 (AT3G57510, 431 aa). AT4G18180 is a pectin lyase-like superfamily protein whose function is described as polygalacturonase activity, but its role in anthers has not been determined. PGA4, in contrast, has been identified as an exo-polygalacturonase that plays a role in the depolymerization of pectin during pollen development, pollen grain germination and pollen tube growth. Its interacting proteins include five pectin lyase-like superfamily proteins, two beta-D-xylosidases, two alpha-L-arabinofuranosidases and a glycosyl hydrolase 9A1 involved in cellulose biosynthesis. ADPG2 has been identified as a polygalacturonase involved in cell separation in the final stages of pod shatter, in anther dehiscence and in floral organ abscission, and some of its interacting proteins function in organ shedding. ADPG1 is a polygalacturonase involved in silique and anther dehiscence, and most of its interacting proteins are pectin lyase-like superfamily proteins (Additional file 7: Table S4). By identifying sequence-similar PG genes in the well-studied model plant Arabidopsis and using them to predict the functions of the corresponding TaPGs, we identified three TaPGs that merit further study.
TaPG93 was significantly up-regulated at the binucleate stage and is presumed to be involved in pollen development and pollen tube growth, while TaPG87 and TaPG95, which were up-regulated at the trinucleate stage, are presumed to play essential roles in pollen separation and anther dehiscence.

Discussion
Polygalacturonase, a member of glycosyl hydrolase family 28, is also known as pectic enzyme or pectin depolymerase. PGs form a gene family closely related to plant development; its members act through expression at different times and in different tissues, through different hydrolysis patterns, or by binding different substrates. In recent years, advances in genome sequencing have made it possible to study systematically the evolution, localization, structure and expression of plant PG genes. A total of 66 PG genes were identified in Arabidopsis [12], 46 in rice [13], 62 and 53 in watermelon and cucumber [22], and 75, 85, 112 and 99 in Populus, apple, soybean and Brassica rapa, respectively [14-16, 23]. In this study, we identified 113 wheat PG genes and showed that the PG gene family plays complex roles in wheat. Whole-genome duplication and other duplication events have generated many duplicated genes in the PG gene family during evolution [13]. Functional redundancy and functional differentiation may both occur among the numerous PG members because of their high sequence similarity [24]. The duplication analysis of TaPGs showed that synteny was more likely between the A, B and D subgenomes within the same chromosome group than between different chromosome groups. There were eight pairs of tandem duplications and 87 pairs of segmental duplications in the wheat PG gene family, and the proportion of segmental duplications among TaPGs was much higher than that in the whole genome. Therefore, we speculate that the main driving forces for the expansion of the wheat PG gene family were genome duplication and segmental duplication.
In general, a gene encoding a protein with the above four conserved domains is considered to belong to the PG gene family, although the conservation of domain III is relatively poor [12]. This is consistent with our results.
All TaPGs  As early as the 1990s, PGs were found to be widespread in the late stages of plant anther development [18,36,40,41], and their mechanisms of action have been explored intensively in recent years. BcMF2 was specifically expressed in the tapetum and pollen grains at the late tetrad stage; in bcmf2 plants, in which its expression was inhibited, the inner wall of the mature pollen grains was disordered, the pollen tube failed to germinate normally, and the fertility of the plant was reduced [42]. BcMF9 was also expressed in the tapetum and microspores at late stages of pollen development, and it affected the development of pollen and pollen tubes by affecting the formation of the inner and outer walls [43]. In addition, in Chinese cabbage, BcMF6, BcMF16 and BcMF17 have also been identified as PG genes that play important roles in pollen development [44-46]. In Arabidopsis, three endo-PGs, ADPG1, ADPG2 and QRT2, were critical for pollen grain separation and contributed to anther dehiscence, whereas overexpression of QRT2 caused male sterility [20].
In this study, a sequence similarity search showed that the protein sequences of TaPG87 and TaPG95 were highly similar to ADPG2 and ADPG1, respectively. The qRT-PCR results showed that both were expressed at the late (trinucleate) stage in fertile anthers, like many of the pollen-specific PGs described above, so we speculate that they are responsible for the separation of

Plant Materials
In this experiment, we used the wheat material KTM3315A, which was selected by our research team in 2001 at Northwest A&F University, Yangling, Shaanxi, China. KTM3315A is a thermo-sensitive male sterile wheat line that our research group has focused on because of its strong agronomic traits and stable fertility performance. It is sterile under low-temperature conditions and fertile under high-temperature conditions, so it can both achieve self-crossing and be an advantageous The roots, stems, leaves and anthers of both sterile and fertile plants at three stages (uninucleate, binucleate and trinucleate), as well as grains of fertile plants at the early grain-filling stage, were stored at -80 °C.

Identification of the wheat PG gene family
The complete set of wheat protein sequences and the HMM file of GH28 were downloaded from Ensembl Plants (http://plants.ensembl.org/Triticum_aestivum/Info/Index) [52] and the Pfam database (http://pfam.xfam.org/family/PF00295) [53], respectively. A wheat-specific HMM file was constructed from the wheat protein sequences that matched the GH28 HMM with an e-value of less than 1e-20. The newly constructed wheat-specific HMM file was then searched against the wheat protein sequences again, and hits with an e-value of less than 0.01 were retained as candidate wheat PGs. To avoid omissions, we also aligned all wheat protein sequences with the Arabidopsis PG sequences downloaded from Phytozome [13]. These two strategies together yielded all candidate wheat PGs, and the final wheat PGs were identified after screening with CD-search (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) [54], SMART (http://smart.embl.de/) [55] and Pfam (http://pfam.xfam.org/) [53]. Their chromosomal location information was obtained from the reference genome annotation. The amino acid length, molecular weight and isoelectric point of the PG proteins were predicted with ExPASy (https://www.expasy.org/structural_bioinformatics).
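The two e-value cut-offs described above can be applied to a tabular HMM search report with a short parser. The sketch assumes the HMMER per-target table layout (`--tblout`), in which the first whitespace-separated field is the target name and the fifth is the full-sequence E-value; the search software is not named in this section, so this is an assumption, and the rows below are invented and truncated to their first seven columns.

```python
def filter_tblout(lines, max_evalue):
    """Keep targets whose full-sequence E-value is below max_evalue.

    Comment lines (starting with '#') and blank lines are skipped.
    Returns {target_name: best E-value}.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Invented example rows (protein names are placeholders).
SAMPLE = """\
# target name  accession  query name      accession  E-value  score  bias
protein_1      -          Glyco_hydro_28  PF00295    1.2e-85  290.1  0.1
protein_2      -          Glyco_hydro_28  PF00295    3.0e-05   25.4  0.0
protein_3      -          Glyco_hydro_28  PF00295    0.5       10.2  0.0
""".splitlines()

seed_hits = filter_tblout(SAMPLE, 1e-20)   # strict pass: seeds for the wheat-specific HMM
candidates = filter_tblout(SAMPLE, 0.01)   # relaxed pass: candidate PGs
print(sorted(seed_hits), sorted(candidates))
```

The strict pass keeps only high-confidence hits for building the species-specific model, while the relaxed pass trades specificity for sensitivity and relies on the later domain-database screening to remove false positives.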

Constructing phylogenetic tree of wheat PGs, and analysis of gene duplications
To clarify the evolutionary relationships of wheat PGs, the amino acid sequences of 66 Arabidopsis PGs, 59 rice PGs and 113 wheat PGs were aligned with ClustalW, and a phylogenetic tree was constructed in MEGA. The tree was built with the Neighbor-Joining (NJ) method with 1000 bootstrap replicates.

Tandem duplications and segmental duplications of the wheat PG gene family were identified with MCScanX [56], and the ratios of the nonsynonymous substitution rate to the synonymous substitution rate (KA/KS) of tandemly duplicated PGs were calculated with KaKs_Calculator 2.0 [57].

Analysis of gene structure and conserved domain of wheat PGs
From the multiple sequence alignment of conserved regions, we identified the four conserved domains of the PG gene family and visualized them using GeneDoc. Using MEME, 10 conserved motifs with widths of 6-50 residues were obtained. Combined with the wheat genome annotation, the gene structures and conserved motifs of wheat PGs were visualized with TBtools [58].

GO annotation, cis-element and function predictions of TaPGs
To understand the molecular functions, biological processes and cellular components of wheat PGs, their GO annotations were obtained from Ensembl Plants. Analyzing cis-elements helps to identify the upstream regulatory factors of PGs. Therefore, the 1500 bp sequences upstream of the TaPGs were used to predict cis-elements with PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) [59]. To predict the functions of TaPGs, STRING was used to identify Arabidopsis PGs with sequences similar to wheat PGs, together with their functions and interactions [60].
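Extracting a 1500 bp upstream window requires attention to gene strand. The sketch below computes the window in 1-based inclusive coordinates and reverse-complements the sequence for minus-strand genes. It assumes that the window is measured from the annotated gene start, which is not specified above, and the function names and coordinates are illustrative.

```python
def upstream_window(start, end, strand, chrom_len, length=1500):
    """Return (win_start, win_end) of the upstream region, or None if empty.

    Coordinates are 1-based and inclusive. For '+' genes the window lies
    before the gene start; for '-' genes it lies after the gene end.
    The window is clipped at the chromosome boundaries.
    """
    if strand == "+":
        win_start, win_end = max(1, start - length), start - 1
    elif strand == "-":
        win_start, win_end = end + 1, min(chrom_len, end + length)
    else:
        raise ValueError("strand must be '+' or '-'")
    return (win_start, win_end) if win_start <= win_end else None


def reverse_complement(seq):
    """Reverse-complement a DNA string (needed for '-' strand promoters)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


print(upstream_window(2001, 3000, "+", 10000))
print(upstream_window(2001, 3000, "-", 10000))
print(reverse_complement("AACG"))
```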

Expression analysis of TaPGs
In order to screen genes based on expression preference, RNA-seq data from 15 different tissues

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Competing interest
The authors declare that they have no competing interests.

Additional Files
Additional file 1: Table S1. Amino acid sequences of 113 wheat PGs.
Additional file 3: Table S2. Position information of wheat PG genes involved in segmental duplications.
Additional file 4: Figure S2. Ten motif sequences of wheat PGs.
Additional file 5: Table S3. Details of the GO annotation of the TaPGs.
Additional file 7: Table S4. Gene annotations and interaction annotations of Arabidopsis proteins with sequences similar to the important TaPGs.

Figure 1
Phylogenetic tree of Arabidopsis thaliana, rice and wheat PGs. The shades of different colors on the tree are used to distinguish between different branches, and the letters A-F indicate the classification of the PG gene family.

Figure 2
The synteny relationships of the wheat PGs. The gray lines in the middle represent the synteny of the whole wheat genome, and the orange lines represent the synteny between the PGs.

qRT-PCR results of 10 TaPGs.