Identification and Characterization of Subtilisin-Like Serine Protease Genes in Maize
Genome information was downloaded from the maize genome database. To identify SBTs in maize, we used a Hidden Markov Model (HMM) search against the maize peptide sequences from MaizeGDB (version 3.22), with the Peptidase_S8 domain (PF00082) as the query. Batch CD-Search at NCBI was then used to confirm that every candidate sequence contains the target domains, such as the Peptidase_S8 domain (Marchler-Bauer et al., 2011). In total, we identified 59 ZmSBTs in Z. mays and named them according to their phylogenetic relationships with the SBT proteins of pineapple (Supplementary Table S1). The basic physicochemical properties of the 59 ZmSBTs, including coding sequence (CDS) length, protein length, isoelectric point and molecular weight, are listed in Supplementary Table S2. The ZmSBT proteins range from 259 to 1040 amino acids in length, although most are 700 to 800 aa long. Their molecular weights range from 28.05 to 115.61 kDa, with most between 75.00 and 85.00 kDa. The isoelectric points vary widely, from 4.68 to 9.60, and 31 of the proteins have an isoelectric point below 7.0. The wide span of these values indicates that the biochemical characteristics of the ZmSBTs are diverse. The instability index indicates whether a protein is predicted to be stable or unstable; Table S2 shows that the ZmSBTs are relatively stable and that about two-thirds of them are hydrophilic proteins.
To gain insight into the molecular function of the ZmSBTs, we predicted their subcellular localization with WoLF PSORT. The ZmSBTs were predicted to be widely distributed: almost half were assigned to the chloroplast, and the rest to the vacuole, plasma membrane, peroxisome, nucleus, mitochondrion, extracellular space and cytoplasm. Only one protein each was predicted in the peroxisome and the nucleus (ZmSBT5.4 and ZmSBT2.2, respectively). ZmSBT3.3 was predicted to localize to both the chloroplast and the mitochondrion.
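The hydrophilic/hydrophobic call mentioned above is commonly based on the grand average of hydropathy (GRAVY), where a negative value indicates a hydrophilic protein. The text does not state which tool or measure was used, so the following is only a minimal sketch under that assumption. It uses the Kyte-Doolittle hydropathy scale, and the example sequences are toy strings, not real ZmSBT proteins.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Mean Kyte-Doolittle hydropathy over the standard residues of a protein."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def is_hydrophilic(protein: str) -> bool:
    """Assumed convention: GRAVY below zero means hydrophilic."""
    return gravy(protein) < 0


if __name__ == "__main__":
    # Toy sequences for illustration only (hypothetical, not ZmSBT proteins).
    for name, seq in {"toy_polar": "KKDDEESS", "toy_apolar": "IIVVLLAA"}.items():
        label = "hydrophilic" if is_hydrophilic(seq) else "hydrophobic"
        print(f"{name}\tGRAVY={gravy(seq):.2f}\t{label}")
```

Applied to all 59 protein sequences, the fraction of negative GRAVY values would give the share of hydrophilic proteins reported in Table S2, provided the same scale was used there.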
Phylogenetic Analysis of Maize SBTs
A phylogenetic tree of the pineapple and maize SBT families was constructed using the neighbor-joining method. Following the previously reported classification, the 59 ZmSBT genes were divided into six subfamilies, Group I to Group VI. The largest subfamily is Group I, with 30 genes, and the smallest is Group VI, with only 2 genes (Fig. 1A).
To further understand the characteristics of the SBT family in monocotyledons and dicotyledons, Euchlaena mexicana, Oryza sativa, Arabidopsis thaliana, Ananas comosus, Zea mays and Solanum tuberosum were selected for phylogenetic analysis. The potato genome contains the most SBT genes of the six species, and the teosinte genome the fewest (Fig. 1B and Supplementary Table S3). Comparing the number of SBT genes per group across the six species revealed an interesting pattern: in monocotyledons, Group I is the largest and Group IV the smallest, whereas in dicotyledons the opposite holds, with Group IV the largest and Group I the smallest. In addition, the size of Group VI differs little between monocotyledons and dicotyledons, which suggests that Group VI is conserved.
Gene Structure Characterization and Protein Motif Identification
Gene structure is often related to gene function, so characterizing the diversity of gene structures helps to understand gene function. To examine the gene structure of the ZmSBTs, the exon-intron organization of the ZmSBT genes was analyzed with TBtools. As shown in Fig. 2, genes within the same subgroup tend to have similar structures. Group I genes have a simple structure, often containing only one exon, and their gene length is highly conserved, except for ZmSBT1.30 and ZmSBT1.17. The gene structures of the other subgroups are more complex, with three or more exons. The exception is ZmSBT5.1, which contains only one exon and thus differs from the other members of group V. This may be due to the loss of the remaining exons of ZmSBT5.1 during evolution (Fig. 2B).
Protein motifs represent conserved binding or catalytic sites that have been retained during the evolution of a protein family from a common ancestor. Different proteins may have the same or similar functions if they contain the same structural motifs. To explore the functional diversity of the proteins, conserved and divergent motifs were identified with MEME, with the number of motifs set to 10 (Fig. 2A). Motif 7 is present in almost all ZmSBTs, except ZmSBT2.2 and ZmSBT2.6. All group V genes contain the 10 motifs. The conserved motif arrangement of group III genes is 2-6-7-4-3, and that of group IV genes is 9-10-2-6-7-4. All group I genes contain the 10 motifs, except ZmSBT1.11 and ZmSBT1.29.
The SBT gene family has three conserved domains: peptidase S8, the PA domain and the inhibitor I9 domain (Fig. 2C). All genes contain peptidase S8 except ZmSBT3.4, ZmSBT4.1, ZmSBT4.3, ZmSBT6.2, ZmSBT2.1, ZmSBT2.6 and ZmSBT2.2. Almost all ZmSBT genes contain the PA domain, which is involved in substrate recognition; the exceptions are the group VI genes and some group II genes (ZmSBT2.6 and ZmSBT2.2).
Chromosome Location and Evolutionary Analyses of ZmSBTs
The maize genome has 10 chromosomes, and the 59 ZmSBTs are unevenly distributed across them. As shown in Additional Figure S3, chromosome 4 contains the fewest ZmSBTs (only 2), while chromosome 1 contains the most (11).
To further investigate the evolutionary mechanisms of the ZmSBT family, all intragenomic and intergenomic duplication data for maize were filtered with MCScan. Only four pairs of duplicated genes were found: ZmSBT2.7/ZmSBT2.5, ZmSBT1.2/ZmSBT1.15, ZmSBT1.18/ZmSBT1.28 and ZmSBT1.2/ZmSBT1.29 (Fig. 3 and Supplementary Table S4). In all four pairs, both genes belong to the same group, which suggests that each subgroup is quite conserved. Group I has more duplication events than the other groups, indicating that it plays an important role in the expansion of the SBT family in maize. There are two main types of gene duplication, segmental and tandem. Segmental duplications are long, nearly identical DNA fragments that arise from the duplication of specific chromosomal regions; they range in extent from part of a chromosome to the whole genome. Segmental duplication is common in plants, so a large number of duplicated chromosomal blocks are retained in plant genomes (Yu et al., 2005). Tandem duplication mainly occurs in regions of chromosomal recombination (Li et al., 2001). Gene family members generated by tandem duplication are usually closely arranged on the same chromosome, forming gene clusters with similar sequences and functions (Holub, 2001). The four gene pairs identified here are segmental duplications, which indicates that segmental duplication contributed substantially to the expansion of the ZmSBT gene family.
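The observation that every duplicated pair falls within a single group can be checked directly from the gene names. The sketch below uses the four pairs listed in the text and assumes that the integer before the dot in a name such as ZmSBT1.15 encodes the subfamily (1 for Group I, 2 for Group II, and so on); the function names are illustrative only.

```python
from collections import Counter

# Duplicated gene pairs as listed in the text (Supplementary Table S4).
DUPLICATED_PAIRS = [
    ("ZmSBT2.7", "ZmSBT2.5"),
    ("ZmSBT1.2", "ZmSBT1.15"),
    ("ZmSBT1.18", "ZmSBT1.28"),
    ("ZmSBT1.2", "ZmSBT1.29"),
]


def group_number(gene_name: str) -> int:
    """Return the subfamily number assumed to be encoded in the gene name."""
    prefix = "ZmSBT"
    if not gene_name.startswith(prefix):
        raise ValueError(f"unexpected gene name: {gene_name!r}")
    return int(gene_name[len(prefix):].split(".")[0])


def summarize_pairs(pairs):
    """Count within-group pairs per group and collect any cross-group pairs."""
    within = Counter()
    across = []
    for gene_a, gene_b in pairs:
        if group_number(gene_a) == group_number(gene_b):
            within[group_number(gene_a)] += 1
        else:
            across.append((gene_a, gene_b))
    return within, across


if __name__ == "__main__":
    within, across = summarize_pairs(DUPLICATED_PAIRS)
    for group, n_pairs in sorted(within.items()):
        print(f"group {group}: {n_pairs} duplicated pair(s)")
    print(f"cross-group pairs: {len(across)}")
```

With the four pairs above, three fall in group 1 and one in group 2, and none spans two groups, which matches the statement that Group I has the most duplication events.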
Teosinte (Euchlaena mexicana) is the wild ancestor of maize. To understand the evolutionary relationship between the two, we carried out a collinearity analysis of the SBT genes of teosinte and maize. A total of 28 homologous genes were identified, indicating that these genes are highly conserved and arose early in evolution. In several cases, one teosinte gene is collinear with two maize genes: EmSBT1.10-ZmSBT1.2/ZmSBT1.15, EmSBT1.2-ZmSBT1.18/ZmSBT1.28, EmSBT2.1-ZmSBT2.8/ZmSBT2.6, EmSBT2.2-ZmSBT2.8/ZmSBT2.6, EmSBT2.3-ZmSBT2.7/ZmSBT2.5 and EmSBT3.2-ZmSBT3.6/ZmSBT3.13. These maize genes may have arisen after the divergence of teosinte and maize. As with the duplicated pairs, the genes in each of these 28 pairs belong to the same group, which indicates that the ZmSBT family has been highly conserved during evolution. We further calculated the Ka/Ks values of the gene pairs shown on the comparative synteny map (Supplementary Table 3). The Ka/Ks value of every gene pair is less than 1, which indicates negative (purifying) selection. Among these gene pairs, EmSBT1.8/ZmSBT1.6 has the highest Ka/Ks value, 0.93. These results indicate that the SBT family genes in maize have evolved mainly under purifying selection.
To understand the evolutionary relationship between the maize SBT genes and those of other monocotyledonous and dicotyledonous plants, we carried out collinearity analyses between maize and two representative monocotyledons (rice and teosinte) and two representative dicotyledons (Arabidopsis and potato). According to the synteny analysis (Additional Figure S4), among the monocotyledons there are 42 orthologous gene pairs between maize and teosinte and 40 between maize and rice. In contrast, only three orthologous gene pairs were found with the dicotyledons Arabidopsis and potato.
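The interpretation of Ka/Ks used above follows the usual convention: a ratio below 1 points to purifying selection, a ratio near 1 to neutral evolution, and a ratio above 1 to positive selection. A minimal sketch of that rule is given below. The Ka and Ks inputs in the example are hypothetical values chosen only to reproduce a ratio of 0.93; they are not taken from the supplementary table.

```python
def classify_selection(ka: float, ks: float, tol: float = 1e-9):
    """Return (Ka/Ks, label) for a pair of substitution rates.

    ka: nonsynonymous substitutions per nonsynonymous site
    ks: synonymous substitutions per synonymous site (must be positive)
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        label = "neutral"
    elif ratio < 1.0:
        label = "purifying"
    else:
        label = "positive"
    return ratio, label


if __name__ == "__main__":
    # Hypothetical Ka and Ks values giving a ratio of about 0.93.
    ratio, label = classify_selection(ka=0.093, ks=0.100)
    print(f"Ka/Ks = {ratio:.2f} -> {label} selection")
```

A ratio as high as 0.93 is still classified as purifying selection under this rule, but it sits close to the neutral expectation, so such a pair is under much weaker constraint than pairs with low ratios.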
Cis-Acting Elements Identification in Promoter Region of ZmSBTs
To further understand the biological role of the ZmSBT genes in maize, we extracted the 2000 bp upstream of each ZmSBT gene and predicted the potential cis-acting elements in these regions. Apart from the common TATA box, CAAT box and some elements of unknown function, a total of 1441 cis-acting elements of 27 types were identified (Supplementary Table S5). They fall into three main categories: elements related to plant growth and development, stress-related elements and phytohormone-related elements. The growth and development elements mainly include the AE-box, ACE, ATCT-motif, I-box, Box 4, G-box and GATA-motif. Box 4 is the most abundant, with 71 occurrences, followed by the G-box and GATA-motif. This suggests that these three cis-acting elements are important for the roles of the ZmSBTs in maize growth and development. The representative stress-related elements include ABRE, ARE, DRE1, ERE, LTR and STR, of which ABRE is the most abundant, with 293 occurrences.
According to the analysis of the cis-regulatory elements of the ZmSBT genes, the binding sites of key transcription factors are related to phytohormone response elements (43%), stress-related elements (36%) and growth and development elements (21%) (Fig. 4 and Supplementary Table S5). Elements responsive to various hormones, including auxin, salicylic acid, abscisic acid, gibberellin and methyl jasmonate (MeJA), were also found. To examine the relationship between the subgroups and the cis-acting elements, we analyzed the cis-acting elements of each subgroup separately (Fig. 4). Not all groups contain all 27 types of cis-acting elements. For example, group IV and group VI contain neither ACE elements nor AE-boxes, while ATC and ATCT-motif elements are found only in groups I to III. Overall, however, the distribution of cis-acting elements in the promoter regions shows no strong correlation with the phylogenetic groups.
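Extracting the 2000 bp upstream region requires attention to gene orientation: for a gene on the minus strand, the upstream region lies at higher genomic coordinates and must be reverse-complemented. The sketch below illustrates this under stated assumptions: coordinates are 1-based and inclusive (as in GFF3), the annotated gene start is used as a proxy for the transcription start site, and the function names and the short test chromosome are hypothetical. The text does not specify the tool used for this step.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, gene_start: int, gene_end: int,
                    strand: str, length: int = 2000) -> str:
    """Return up to `length` bases upstream of a gene, read 5'->3' on its strand.

    gene_start and gene_end are 1-based inclusive coordinates with
    gene_start <= gene_end. The region is truncated at chromosome ends.
    """
    if strand == "+":
        lo = max(0, gene_start - 1 - length)
        return chrom_seq[lo:gene_start - 1]
    if strand == "-":
        hi = min(len(chrom_seq), gene_end + length)
        return reverse_complement(chrom_seq[gene_end:hi])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    chrom = "ACGTTGCAAGCT"  # toy chromosome, not maize sequence
    print(upstream_region(chrom, 5, 8, "+", length=3))  # bases 2-4
    print(upstream_region(chrom, 5, 8, "-", length=3))  # revcomp of bases 9-11
```

The extracted sequences could then be submitted to a cis-element prediction tool; which tool was used is not stated in this section.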
Spatio-Temporal Expression Profiles of ZmSBTs in Maize
As an initial step towards understanding their biological function, we examined the expression patterns of the ZmSBTs across the entire growth and developmental cycle using publicly available expression data. We used the transcriptome data published by Stelpflug et al. (2016) to analyze the expression profiles of all SBT genes in different tissues and developmental stages. Knowing how the 59 ZmSBTs are expressed in the various tissues of maize helps to reveal differences in their roles among tissues and to predict their potential functions (Supplementary Table S6). As shown in Fig. 5, ZmSBT expression differs among tissues. ZmSBT1.22 and ZmSBT5.4 are highly expressed in leaves. Seven genes (ZmSBT5.10, ZmSBT2.3, ZmSBT1.7, ZmSBT1.21, ZmSBT5.3, ZmSBT1.20 and ZmSBT5.1) have high transcript levels during meiotic tassel development but low expression in other tissues. These genes may therefore play an important role in tassel development in maize. Some ZmSBTs, such as ZmSBT1.9, ZmSBT6.1 and ZmSBT2.2, are expressed in all tissues, and this broad expression suggests that they are important for the growth, development and reproduction of maize. In group IV, almost all genes were poorly expressed in all tested samples, which suggests that this group may not play a major role in maize development. Some group I genes, such as ZmSBT1.22, ZmSBT1.29, ZmSBT1.11 and ZmSBT1.14, also appear not to participate in maize development. Thus, although group I is the largest group, it includes some genes that appear not to function, possibly because they lost their original role during evolution.
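The categories used above (tissue-specific, broadly expressed, poorly expressed) can be made explicit with a simple threshold rule. The sketch below is one possible formalization, not the procedure used in the study: the expression threshold, the tissue names and all expression values are hypothetical, and the gene labels are placeholders rather than real ZmSBT profiles.

```python
def expression_breadth(profile: dict, threshold: float = 1.0) -> str:
    """Classify a gene by the tissues in which its expression reaches a threshold.

    profile maps tissue name -> expression value (e.g. FPKM).
    Returns 'not detected', 'broad', 'specific:<tissue>' or 'intermediate'.
    """
    if not profile:
        raise ValueError("expression profile is empty")
    expressed = [tissue for tissue, value in profile.items() if value >= threshold]
    if not expressed:
        return "not detected"
    if len(expressed) == len(profile):
        return "broad"
    if len(expressed) == 1:
        return f"specific:{expressed[0]}"
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical expression values for three placeholder genes.
    profiles = {
        "geneA": {"root": 0.1, "leaf": 0.2, "tassel": 35.0, "seed": 0.0},
        "geneB": {"root": 12.0, "leaf": 9.5, "tassel": 14.2, "seed": 8.8},
        "geneC": {"root": 0.0, "leaf": 0.3, "tassel": 0.1, "seed": 0.0},
    }
    for gene, profile in profiles.items():
        print(gene, expression_breadth(profile))
```

The outcome depends strongly on the chosen threshold, so any such classification should be reported together with the cutoff used.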
Protein-Protein Networks Analysis of the ZmSBT Family Genes
To understand how the ZmSBTs interact with other proteins and to better understand their potential functions, we constructed an interaction network using the STRING database (Fig. 6 and Supplementary Table S7). According to the predictions, two ZmSBT proteins specifically expressed in meiotic tassel tissues, ZmSBT1.7 and ZmSBT2.5, interact with 5 and 8 different maize proteins, respectively. ZmSBT1.7 is predicted to interact with GRMZM5G891373_P02, GRMZM5G803365_P01, GRMZM2G167338_P01, GRMZM2G077034_P01 and GRMZM5G895188_P02, while ZmSBT2.5 is predicted to interact with Oleosin Zm II, GRMZM2G108894_P01, GRMZM2G380650_P02, GRMZM2G477683_P01, GRMZM2G060481_P01, GRMZM2G447984_P01, GRMZM2G056231_P01 and GRMZM2G143373_P01. We also identified predicted interactions between other maize proteins and the ZmSBT proteins specifically expressed in roots and seeds. ZmSBT1.25, specifically expressed in maize roots, is predicted to interact with GRMZM5G866861_P02, GRMZM2G335978_P01, GRMZM2G119397_P01, GRMZM5G831795_P03, GRMZM2G098583_P01, GRMZM2G168115_P01 and GRMZM2G13. ZmSBT1.25, specifically expressed in maize seeds, is predicted to interact with GRMZM2G167338_P01, GRMZM2G008327_P01, GRMZM5G878541_P02, GRMZM2G421491_P01, GRMZM2G092174_P01, GRMZM2G077034_P01, GRMZM2G103647_P01, GRMZM2G029979_P01, GRMZM5G803365_P01 and GRMZM5G891373_P02.
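The partner lists above can be compared directly to find proteins shared between two ZmSBTs. The sketch below stores the predicted partners of ZmSBT1.7 and of the seed-expressed ZmSBT as sets, using the identifiers exactly as listed in the text, and intersects them. The dictionary keys and the function name are illustrative only.

```python
# Predicted interaction partners, copied from the lists in the text.
PARTNERS = {
    "ZmSBT1.7 (tassel)": {
        "GRMZM5G891373_P02", "GRMZM5G803365_P01", "GRMZM2G167338_P01",
        "GRMZM2G077034_P01", "GRMZM5G895188_P02",
    },
    "seed-expressed ZmSBT": {
        "GRMZM2G167338_P01", "GRMZM2G008327_P01", "GRMZM5G878541_P02",
        "GRMZM2G421491_P01", "GRMZM2G092174_P01", "GRMZM2G077034_P01",
        "GRMZM2G103647_P01", "GRMZM2G029979_P01", "GRMZM5G803365_P01",
        "GRMZM5G891373_P02",
    },
}


def shared_partners(network: dict, protein_a: str, protein_b: str) -> list:
    """Return the sorted partners that two proteins have in common."""
    return sorted(network[protein_a] & network[protein_b])


if __name__ == "__main__":
    common = shared_partners(PARTNERS, "ZmSBT1.7 (tassel)", "seed-expressed ZmSBT")
    print(f"{len(common)} shared partners:")
    for partner in common:
        print("  " + partner)
```

With the lists as given, four of the five predicted partners of ZmSBT1.7 also appear among the partners of the seed-expressed protein.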