Identification and annotation of INV gene family members in wheat
By aligning all wheat protein sequences against the INVs of Arabidopsis (17), rice (19) and Brachypodium distachyon (19), followed by strict screening, 130 wheat INVs were identified (Table S1, Table S2). Twenty of them were classified as A/NINVs belonging to the GH100 family and were mainly located in the cytoplasm and chloroplast. According to their chromosomal position, these 20 A/NINVs were named TaA/NINV1-TaA/NINV20. In addition, 68 INVs predicted to be located in the cell wall were named TaCWINV1-TaCWINV68, and 42 INVs whose predicted subcellular location was the vacuole were named TaVINV1-TaVINV42. Analysis of their physicochemical properties showed that the A/NINVs were 505-653 aa in length with molecular weights of 56.31-72.8 kDa, and the VINVs were 509-670 aa with molecular weights of 56.61-74.97 kDa. The CWINVs tended to be smaller than the other two groups, at 332-657 aa and 37.88-74.94 kDa. The theoretical isoelectric points of the 130 TaINVs spanned a wide range, from 4.69 to 9.31 (Table S3). Chromosome mapping showed that the TaINVs were unequally distributed among the A, B and D sub-genomes (A:B:D = 52:33:41); nevertheless, their uniform positional distribution across the three sub-genomes suggests that most genes have three homologous copies (Fig. S1).
Phylogenetic analysis of TaINVs
To understand the evolutionary relationships within the wheat INV gene family, the 130 TaINVs, 19 rice INVs (OsINVs) and 19 Brachypodium distachyon INVs (BdINVs) were used to construct a phylogenetic tree for cluster analysis. The classification of the TaINVs was similar to that reported previously for rice (Ji et al. 2005) and Brachypodium (Wang et al. 2017). The 130 TaINVs fell into two major categories, AINVs and A/NINVs. The AINV group comprised CWINVs and VINVs, whereas the A/NINVs were divided into two branches, α (TaA/NINV1-11) and β (TaA/NINV12-20) (Fig. 1). In the A/NINV group, almost every OsINV and BdINV corresponded to three homologous TaINVs from the wheat A, B and D sub-genomes (wheat: 20, rice: 8, Brachypodium: 8). The situation in the AINV group was more complicated. Wheat had ten times as many AINVs as either rice or Brachypodium distachyon (wheat: 110, rice: 11, Brachypodium: 11), well beyond the threefold increase expected from the three sub-genomes alone. The difference was most pronounced for the VINVs: rice had only two and Brachypodium three, whereas wheat had 42.
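The expansion described above can be checked with simple arithmetic. The sketch below uses only the gene counts quoted in this section; the function name and data layout are illustrative choices, not part of the original analysis.

```python
# Gene counts per group as quoted in the text (wheat, rice, Brachypodium).
COUNTS = {
    "A/NINV": {"wheat": 20, "rice": 8, "Brachypodium": 8},
    "AINV": {"wheat": 110, "rice": 11, "Brachypodium": 11},
    "VINV": {"wheat": 42, "rice": 2, "Brachypodium": 3},
}


def fold_expansion(group, reference):
    """Return the wheat gene count divided by the count in the reference species."""
    return COUNTS[group]["wheat"] / COUNTS[group][reference]


if __name__ == "__main__":
    for group in COUNTS:
        for reference in ("rice", "Brachypodium"):
            fold = fold_expansion(group, reference)
            print(f"{group}: wheat vs {reference} = {fold:.1f}-fold")
```

A ratio near three is what hexaploidy alone would predict; the A/NINVs (2.5-fold) sit close to that expectation, while the AINVs and especially the VINVs lie far above it.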
Duplication of the wheat INV gene family
Gene duplication occurs throughout the genome, mainly through whole-genome duplication, tandem duplication and segmental duplication (Zhang 2003). Of the 130 TaINVs, 54 genes had one copy on each of the three homologous chromosomes of the three sub-genomes, and 43 genes had one copy on each of two homologous chromosomes (Table S4). Analysis of segmental duplications identified 82 pairs involving 101 TaINVs (Fig. 2). Except for six pairs of segmental duplications among AINVs on different chromosomes, all were duplications between the A, B and D sub-genome copies of the same chromosome. Moreover, 22 pairs of tandem duplications involving 13 TaINVs were found in the wheat INV gene family (Table S5); all of them occurred among TaAINVs and had Ka/Ks values below one. Overall, polyploidization of wheat played an important role in the expansion of the TaINVs, and segmental and tandem duplication events partly account for the markedly higher number of AINVs in wheat.
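The Ka/Ks values mentioned above are conventionally read as follows: a ratio below one suggests purifying selection, a ratio near one neutral evolution, and a ratio above one positive selection. The sketch below illustrates that reading only; the function name and the example values are hypothetical and are not taken from Table S5.

```python
def classify_selection(ka, ks):
    """Interpret a Ka/Ks ratio for one duplicated gene pair.

    ka: non-synonymous substitution rate
    ks: synonymous substitution rate (must be positive)
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(classify_selection(0.125, 0.5))  # ratio 0.25
```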
The gene structures and motifs of TaINVs
Introns are characteristic of eukaryotic genes and are subject to relatively little selective pressure, which results in rapid changes in the size and order of gene structures (Lecharny et al. 2003; Rogozin et al. 2003). However, the positional correspondence between introns and exons is usually highly conserved among homologous genes, so it has been used to classify paralogous genes into subfamilies (Park et al. 2008). The different subtypes of TaINVs had different numbers of introns and exons (Fig. 3). In TaAINVs, the number of exons ranged from two to nine. Except for a few genes, most TaCWINVs contained 5-9 exons and most TaVINVs contained 3-4 exons. In TaA/NINVs, except for TaA/NINV2 (3 exons) and TaA/NINV18 (7 exons), there were four exons in the α subgroup and six in the β subgroup. In contrast, TaINVs of the same subtype had relatively uniform intron-exon structures; for example, all TaAINVs contained a mini-exon, which encodes NDP. Furthermore, homologous copies, and paralogs formed by duplication events, had almost the same number and structure of introns and exons.
At the amino acid level, motifs, as super-secondary structures, can help identify functional differentiation within a gene family. Fifteen conserved motifs were identified in each of the TaA/NINV and TaAINV groups, and the two sets were completely different (Fig. 3). In TaAINVs, motif 1 (β-fructosidase motif NDPN), motif 6 (RDP) and motif 9 (WECP/VD) were essential markers, while motif 3 and motif 15 were specific to CWINVs and motif 10 was unique to VINVs. In TaA/NINVs, nine motifs (motifs 1-3, 5-9 and 12) were shared by all 20 A/NINVs, while two motifs (motif 11 and motif 15) were specific to the α subgroup and one motif (motif 13) was specific to the β subgroup.
Eight AINVs were specifically expressed in wheat spikes during the reproductive stage
Compared with identifying members that carry specific motifs, the temporal and spatial specificity of gene expression provides more direct information on gene functions related to the site of expression. To explore the expression patterns of the TaINVs and screen for important members, RNA-seq data from roots, leaves, spikes and grains at different growth stages were analyzed. Expression of 124 TaINVs was detected in these tissues (Fig. 4). Tissue specificity was more pronounced than stage specificity, so the expression patterns of TaINVs in the same tissue were similar across stages. Most TaINVs showed opposite expression patterns in vegetative and reproductive organs. The expression profiles of the 124 TaINVs could therefore be roughly divided into two categories (Fig. 4): 63 were preferentially expressed during the vegetative period and 61 were highly expressed during the reproductive period. Notably, eight wheat AINVs (TaCWINV40, TaCWINV53, TaVINV27, TaCWINV46, TaCWINV68, TaVINV7, TaCWINV36 and TaCWINV2) were specifically expressed in spikes.
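Tissue specificity of this kind can also be expressed as a single number per gene. The sketch below uses the tau index, a commonly used tissue-specificity measure; it is offered as an illustration and is not the method reported in this section. The expression values are hypothetical, not taken from Fig. 4.

```python
def tau_index(expression):
    """Tissue-specificity index for one gene.

    expression: non-negative expression values, one per tissue.
    Returns 0.0 for uniform expression and 1.0 for expression in a single tissue.
    """
    if len(expression) < 2:
        raise ValueError("at least two tissues are required")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    # Sum the relative shortfall of each tissue against the peak tissue.
    return sum(1 - value / peak for value in expression) / (len(expression) - 1)


if __name__ == "__main__":
    # Hypothetical values in the order root, leaf, spike, grain.
    print(tau_index([0.0, 0.0, 10.0, 0.0]))  # expressed only in spike
    print(tau_index([5.0, 5.0, 5.0, 5.0]))   # uniformly expressed
```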
Six TaINVs differentially expressed in anthers of KTM3315A under different fertility conditions
To further investigate the TaINVs that may be related to wheat male fertility, we performed qRT-PCR on the eight spike-specific TaINVs and four TaINVs highly expressed in wheat spikes. Sterile and fertile anthers of the thermo-sensitive male sterile wheat line KTM3315A at three stages (uninucleate, binucleate and trinucleate) were used as materials. Six of the genes (TaCWINV2, TaCWINV3, TaCWINV4, TaCWINV41, TaVINV7 and TaVINV27) showed no significant difference in expression between sterile and fertile anthers at any stage (Fig. 5). The other six (TaCWINV36, TaCWINV40, TaCWINV43, TaCWINV46, TaCWINV53 and TaCWINV68) not only had similar expression patterns but were also significantly up-regulated in fertile anthers relative to sterile anthers at the binucleate stage. Interestingly, three of these genes, TaCWINV40, TaCWINV46 and TaCWINV53, are orthologs of rice OsCWINV2 (LOC_Os04g33720) (Fig. 1), whose suppression by low temperature has been shown to lead to male abortion (Oliver et al. 2005).
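Relative expression from qRT-PCR is often calculated with the 2^-ΔΔCt method. The text does not state which quantification method was used, so the sketch below is an assumption for illustration; it also assumes 100% amplification efficiency, and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample relative to a control (2^-ddCt).

    Each argument is a threshold cycle (Ct) value; 'ref' is the reference
    (housekeeping) gene measured in the same material.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: fertile anthers as sample, sterile anthers as control.
    fold = relative_expression(24.0, 18.0, 26.0, 18.0)
    print(f"fold change = {fold}")
```

A value above one indicates higher expression in the sample than in the control; whether the difference is significant still depends on replicate measurements and a statistical test.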
Silencing of TaCWINV40 induces a decrease in wheat fertility
Rice OsCWINV2 has been confirmed to be an anther-specific cell wall invertase that acts mainly by affecting hexose production and starch formation (Oliver et al. 2005). Here, TaCWINV40 was chosen as a representative to test whether it functions similarly to OsCWINV2. Expression of a fusion with green fluorescent protein confirmed that TaCWINV40 is located in the cell wall (Fig. 6A). Analysis of the TaCWINV40 promoter region revealed four important cis-elements (Fig. 6B). POLLEN1LELAT52 and GTGANTG10 have been reported in the promoters of the tomato lat52 gene and the tobacco late pollen gene g10, respectively, as regulatory elements responsible for pollen-specific activation (Bate and Twell 1998; Rogers et al. 2001). WRKY71OS and MYCCONSENSUSAT are recognition sites of WRKY and MYC transcription factors, which have been described as a transcriptional repressor of the gibberellin signaling pathway and a regulator of the cold-induced transcriptome, respectively (Zhang et al. 2004; Chinnusamy et al. 2003). These four cis-elements occurred at three or more sites in the TaCWINV40 promoter, suggesting that TaCWINV40 may be expressed specifically in pollen and regulated by cold and the gibberellin pathway.
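A promoter scan of this kind can be sketched as a simple consensus search on both strands. The consensus sequences below are those commonly listed for these elements in the PLACE database and should be treated as assumptions, since the text does not give them; the toy promoter is invented for illustration and is not the TaCWINV40 promoter.

```python
import re

# Assumed consensus sequences (N = any nucleotide); not stated in the text.
CIS_ELEMENTS = {
    "POLLEN1LELAT52": "AGAAA",
    "GTGANTG10": "GTGA",
    "WRKY71OS": "TGAC",
    "MYCCONSENSUSAT": "CANNTG",
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(promoter, consensus):
    """Return sorted 0-based start positions of a motif on either strand.

    Positions are given in forward-strand coordinates, so a palindromic
    site found on both strands is reported only once.
    """
    seq = promoter.upper()
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + consensus.replace("N", "[ACGT]") + ")")
    n, k = len(seq), len(consensus)
    starts = {m.start() for m in pattern.finditer(seq)}
    starts |= {n - m.start() - k
               for m in pattern.finditer(reverse_complement(seq))}
    return sorted(starts)


if __name__ == "__main__":
    toy_promoter = "TTAGAAACGTGACCATATGTT"  # hypothetical sequence
    for name, consensus in CIS_ELEMENTS.items():
        print(name, find_sites(toy_promoter, consensus))
```

Short consensus sequences such as these occur frequently by chance, so hit counts alone are weak evidence of regulation and are best read alongside expression data.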
To investigate the effect of TaCWINV40 on wheat fertility, virus-induced gene silencing (VIGS) was carried out using KTM3315A plants grown under fertile conditions (>24℃). The infected plants showed abnormal leaves after about 14 days, and the white spots on the positive control plants (BSMV: PDS) indicated that the barley stripe mosaic virus (BSMV) had successfully infected the plants and effectively silenced the PDS gene (Fig. 7A). qRT-PCR showed that the expression of TaCWINV40 in the anthers of BSMV: TaCWINV40 plants was significantly lower than in the negative control plants BSMV: 0 (Fig. 7G), indicating that TaCWINV40 had been silenced. Although the anthers of BSMV: TaCWINV40 plants still dehisced and pollen grains were formed (Fig. 7B), the microspores stained with I2-KI and DAPI showed sterile characteristics, namely transparent, shrunken vacuoles and two round sperm nuclei. In contrast, the microspores of BSMV: 0 plants had two spindle-shaped sperm nuclei and were all stained by I2-KI as solid, regular circles because they were filled with starch (Fig. 7C, D). In addition, the pollen germination rate of BSMV: TaCWINV40 plants was extremely low, whereas almost all pollen of BSMV: 0 plants germinated (Fig. 7E). SEM and TEM observations of trinucleate microspores of BSMV: TaCWINV40 plants further revealed sparsely arranged and abnormally secreted Ubisch bodies, as well as shrunken microspores adhering to the anther wall (Fig. 7F). In mature plants, the seed setting rate of BSMV: TaCWINV40 was significantly lower than that of BSMV: 0 (Fig. 7H), which again confirmed the indispensable role of TaCWINV40 in wheat anther development and fertility determination.