I. Genome-wide identification of the catalase gene family in potato
1. Identification of genes encoding catalase enzymes in potato
To identify the potato genes encoding catalase antioxidant enzymes, we ran a BlastP search of the potato genome database using the catalase sequences of Arabidopsis thaliana as queries. We then used the Pfam and NCBI-CDD databases to verify the presence of conserved domains. Based on sequence homology, we identified 3 putative non-redundant genes encoding the different CAT enzymes (Table 1). All of these genes consist of 8 exons. The full-length StCAT genes range from 3471 to 5368 bp, and each encodes a protein of 492 aa. Despite this difference in gene length, their CDS and mRNA sizes showed the same profile (Figure 1).
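The hit-filtering step of such a search can be sketched as follows. This is a minimal illustration, assuming the BlastP results were exported in the standard 12-column tabular format (`-outfmt 6`); the function name, the e-value cutoff and the example identifiers are hypothetical and are not taken from the study.

```python
import csv
import io


def nonredundant_hits(blast_tsv, max_evalue=1e-10):
    """Return subject IDs passing the e-value cutoff, best bit score first.

    Expects BLAST tabular text (outfmt 6): column 2 is the subject ID,
    column 11 the e-value and column 12 the bit score.
    The cutoff of 1e-10 is an assumption, not a value from the paper.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        subject, evalue, bitscore = row[1], float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        # Several queries may hit the same gene: keep one entry per subject.
        if subject not in best or bitscore > best[subject]:
            best[subject] = bitscore
    return sorted(best, key=best.get, reverse=True)


# Hypothetical example rows (query and subject IDs are placeholders).
example = "\n".join([
    "queryA\tgene1\t85.0\t492\t74\t0\t1\t492\t1\t492\t0.0\t900",
    "queryB\tgene1\t80.0\t492\t98\t0\t1\t492\t1\t492\t0.0\t850",
    "queryA\tgene2\t78.0\t492\t108\t0\t1\t492\t1\t492\t1e-200\t800",
    "queryA\tgene3\t30.0\t100\t70\t0\t1\t100\t1\t100\t0.5\t40",
])
print(nonredundant_hits(example))
```

Candidates retained this way would still need the domain check against Pfam and NCBI-CDD described above before being accepted as catalase genes.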
2. Chromosomal localization and synteny analysis of catalase genes
The chromosomal positions of the genes encoding catalase enzymes were determined from the Phytozome database. The StCAT genes are distributed on three different potato chromosomes (Figure 2): StCAT2, StCAT3 and StCAT1 are located on chromosomes II, IV and XII, respectively. They are phylogenetically distinct, which suggests that they did not arise from a duplication event.
To further explore the phylogenetic relationships within the catalase gene family, a comparative synteny analysis was carried out in which the physically mapped StCATs were compared with those of Arabidopsis and tomato on their respective chromosomes. A synteny block involving 3 potato, 3 Arabidopsis and 3 tomato CAT genes was generated (Figure 2). Almost all StCAT genes shared the same syntenic location with their Arabidopsis or tomato orthologs.
3. Cis elements involved in transcriptional activity regulation
A region of 1500 bp upstream of the ATG of each StCAT gene was selected and searched for putative regulatory motifs. Several cis-acting regulatory elements were found, associated with cellular growth, tissue-specific expression, transcription factor binding, hormonal and light regulation, and responses to abiotic and biotic stresses (Table 2).
Finally, to explore the distribution of the cis-regulatory elements in the StCAT promoters, a word cloud was generated (Figure 3), highlighting the most frequent motifs such as DOFCOREZM, TATABOX and CAATBOX. Overall, ten main motifs were examined, including DOFCOREZM (DOF transcription factor binding site), GT1CONSENSUS (consensus GT-1 binding site found in many light-regulated genes), CAAT box (CAAT promoter consensus sequence), ROOTMOTIFTAPOX1 (root expression motif), POLLEN1LELAT52 (regulatory element responsible for pollen-specific activation in tomato), GATABOX, ARR1AT (cytokinin-responsive expression), CACTFTPPCA1 (mesophyll expression) and EBOXBNNAPA (seed expression) (Figure 3).
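Motif frequencies of this kind can be tallied with a simple scan of each promoter on both strands. The sketch below is illustrative only: the consensus sequences are those commonly listed for these motif names in the PLACE database and should be checked against the database before use, and the function names and the example sequence are hypothetical.

```python
import re
from collections import Counter

# IUPAC nucleotide codes needed for the consensus sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}

# Assumed consensus sequences (verify against the motif database).
MOTIFS = {
    "DOFCOREZM": "AAAG",
    "CAATBOX1": "CAAT",
    "GATABOX": "GATA",
    "ARR1AT": "NGATT",
    "GT1CONSENSUS": "GRWAAW",
}


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_motifs(promoter, motifs=MOTIFS):
    """Count occurrences of each motif on both strands, overlaps included."""
    promoter = promoter.upper()
    counts = Counter()
    for name, consensus in motifs.items():
        # A lookahead lets overlapping occurrences be counted separately.
        regex = re.compile("(?=" + "".join(IUPAC[b] for b in consensus) + ")")
        for strand in (promoter, revcomp(promoter)):
            counts[name] += sum(1 for _ in regex.finditer(strand))
    return counts


# Toy sequence, not a real StCAT promoter.
print(count_motifs("AAAGCAATGATA").most_common())
```

The resulting counts per promoter are the kind of input a word cloud weights by; a palindromic consensus would be counted once per strand with this approach, which is worth keeping in mind when comparing against database output.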
Various phytohormone-responsive elements were also found in the putative StCAT promoter sequences, including ABA- and cytokinin-responsive elements. The latter were the most abundant of these, with 4 to 8 copies in each catalase gene (Table 2), which suggests that StCAT expression might be strongly induced and regulated by cytokinin in potato plants. In particular, the DOF (DOFCOREZM) motif, known for enhancing transcriptional activity, was the most abundant motif in all StCAT gene promoters. The Dof proteins are a family of plant-specific transcription factors that includes Dof1, Dof2, Dof3 and PBF (Yanagisawa 2000). Maize Dof1 was suggested to regulate the expression of the C4 photosynthetic phosphoenolpyruvate carboxylase (C4PEPC) gene (Yanagisawa 2004). Dof1 also enhances the transcription of the cytosolic orthophosphate dikinase (cyPPDK) genes and the non-photosynthetic PEPC gene (Yanagisawa 2000).
The putative recognition site for MYC, which functions as a transcriptional activator upon dehydration, was identified in most StCAT genes, with 1 to 6 copies in each. Other stress-related motifs, such as salt-responsive elements (Park 2004), light-responsive elements (Terzaghi and Cashmore 2003), disease resistance response elements (Luo et al. 2005) and ABA-responsive elements (Kaplan et al. 2006), were also present in the StCAT promoters, suggesting that the catalase family is probably involved in the response to these stresses in potato through the ABA signaling pathway. However, no ABRE (ABA-responsive element) was detected within the StCAT3 promoter, suggesting that this gene is regulated by mechanisms other than ABA responsiveness. In addition, GATABOX is required for light-dependent and nitrate-dependent control of transcription in plants (Reyes et al. 2004). The GATA motif has been found in the promoter of the Cab22 gene, which encodes the Petunia chlorophyll a/b binding protein; this motif is the specific binding site of activating sequence factor-2 (ASF-2; Lam and Chua 1989).
Taken together, the putative cis-regulatory elements described here suggest that StCAT family members are involved in a variety of cellular processes and that they might respond to environmental stresses through different phytohormone signaling pathways.
II. Phylogenetic relationships and motif analysis of potato catalase enzymes
The phylogenetic tree constructed from the catalase protein sequences of S. tuberosum, A. thaliana and S. lycopersicum (Figure 4) shows that all the StCAT proteins are closer to their homologs in S. lycopersicum (SlCAT) than to those of Arabidopsis.
Furthermore, to identify the conserved motifs and consensus domains of the CAT proteins, the online MEME Suite (v4.8.2) program was used (Figure 4a). The sequence details of each motif are shown in Additional file 2. The 3 StCAT members shared similar motifs, suggesting common conserved functions within the catalase family.
In addition, a phylogenetic tree of 39 protein sequences, including CATs from several different origins, resolved two main groups that appear to cluster according to family, species and systematic classification (Figure 4b).
III. Structural characterization of StCAT proteins
1. Primary structure
The identified StCAT proteins all had the same length of 492 amino acids (aa), and their predicted molecular masses ranged from 44.9 to 59.9 kDa (Table 1). The computed pI of these proteins was ∼6.54 on average, indicating that they are likely to precipitate in either acidic or basic buffers and can be maintained in a neutral buffer.
Amino acid sequence analysis showed that the CAT protein sequences from S. tuberosum, A. thaliana and S. lycopersicum share a high level of identity with each other (Additional file 3). Similarity percentages among members of the CAT family ranged between 82.55 and 100%, whereas identity percentages varied from 75.25 to 99.39%. StCAT2 showed the highest similarity and identity levels (100 and 99.39%, respectively) with SlCAT2 of S. lycopersicum. The lowest similarity and identity percentages were observed between SlCAT1 and AtCAT3, the values being 82.55 and 75.5%, respectively.
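For reference, percent identity between two aligned sequences can be computed as in the sketch below. It assumes the sequences come from the same alignment (equal length, gaps written as "-") and that gapped columns are excluded from the denominator; alignment tools differ on this convention, so values may not match those of a given program exactly. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # assumption: columns with a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments, not real catalase sequences.
print(round(percent_identity("MDPYK-R", "MDPFKHR"), 2))
```

Similarity percentages additionally require a substitution matrix or residue grouping to decide which mismatches count as conservative, which is not shown here.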
The most abundant amino acids in the StCAT members (Figure 5) are proline (7.3%), arginine (7.1%), aspartate (6.9%), leucine (7.7%), valine (6.5%) and alanine (6.1%). The least common residues were cysteine, methionine and tryptophan, which accounted for ∼1% of the primary structure. The low proportion of cysteine residues indicates that disulfide bond formation is unlikely. Leucine, alanine and valine are hydrophobic, aliphatic and nonpolar amino acids and are thus expected to be found in the protein interior or within lipid membranes.
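Residue percentages such as these follow directly from counting each amino acid in the sequence. A minimal sketch, with a hypothetical function name and a toy sequence rather than a real StCAT protein:

```python
from collections import Counter


def aa_composition(sequence):
    """Return residue percentages, most frequent residue first."""
    sequence = sequence.upper()
    counts = Counter(sequence)
    return {aa: 100.0 * n / len(sequence) for aa, n in counts.most_common()}


# Toy peptide used only to show the output format.
print(aa_composition("LLLA"))
```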
2. Secondary and tertiary structure
The secondary structure features predicted by the NPS and GOR IV secondary structure prediction method (Combet et al. 2000) show that random coils (50%) dominated among the secondary structure elements, followed by alpha helices (25%) and extended strands (15%). The predominance of coils suggests that catalase from S. tuberosum might not be a very stable enzyme (Perticaroli et al. 2014).
The tertiary structure of the StCAT proteins was predicted by template-based modeling with PHYRE2 (Kelley et al. 2015). Six templates for each StCAT were chosen based on heuristics to maximise confidence, percentage identity and alignment coverage (Table 3). The threading templates were selected by the PHYRE2 server from the PDB database on the basis of a normalized Z-score of >1.0.
Furthermore, Clustal Omega (1.2.4) was used for multiple sequence alignment and active site identification. An alignment of the translated CDS of S. tuberosum catalase with the selected template catalases is shown in Additional file 4. Conserved residues of the catalase sequence involved in H2O2 binding (V2, H3, V44, D56, N76, F81, F82, F89) were identified from a careful study of the alignment. The results were consistent with the experimentally determined crystallographic structure of human erythrocyte catalase (1QQW) (Putnam et al. 2006). However, a few substitutions, such as isoleucine (I) by alanine (A), methionine (M) by phenylalanine (F), valine (V) by isoleucine (I) and glutamine (Q) by leucine (L), were also observed in the StCAT proteins (Additional file 4).
The quality of the 3D model was assessed on the basis of the confidence score. Validation with several tools, such as the Ramachandran plot and the energy plot, confirmed the reliability of the model. All validation parameters were within the expected range, showing that the model is compatible with its sequence and is of excellent quality (Figure 6A).
Visualization with the PyMOL software showed that the residues involved in hydrogen peroxide binding are conserved among the 3 StCATs, confirming their key role in hydrogen peroxide binding as identified by the Prosite-ProRule annotation.
Figure 6B shows that the ligand (H2O2) sits in the area lined by the predicted active site residues. Since the catalytic site is actively involved in the charge transfer reactions required for the formation and degradation of bonds, it is expected to have a high electron density (Vivekanand and Balakrishnan 2009).
IV. In silico study of the expression profile of genes encoding catalase enzymes
1. Tissue-specific expression analysis of potato catalase genes
To investigate the role of the different potato CAT family members, gene expression was first analyzed in different organs and tissues at various developmental stages of the S. tuberosum Phureja variety, using microarray data available in the Spud DB database (Figure 7). Expression data for the 3 StCAT genes were available and were retrieved for analysis. Fluorescence intensity values were used to generate a clustered heat map based on the average Euclidean distance. As shown in Figure 7, StCAT family members exhibited spatial variation in transcript abundance, with high levels in one or a few tissues and low levels in others.
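The distance underlying such a clustered heat map can be sketched as follows. The example computes pairwise Euclidean distances between expression profiles, which is the input to average-linkage clustering; the gene labels and intensity values are hypothetical placeholders, not data from Spud DB, and the function name is illustrative.

```python
import math
from itertools import combinations


def euclidean_distances(profiles):
    """Pairwise Euclidean distance between expression profiles.

    `profiles` maps a gene name to its intensity values across tissues;
    all profiles must list the tissues in the same order.
    """
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical intensities for three genes across two tissues.
toy_profiles = {"geneA": [0.0, 0.0], "geneB": [3.0, 4.0], "geneC": [0.0, 1.0]}
distances = euclidean_distances(toy_profiles)
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```

In a hierarchical clustering, the closest pair is merged first and distances to the merged cluster are then averaged over its members, which determines the row order of the heat map.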
2. Expression analysis of StCAT genes under different abiotic stresses, biotic stress elicitors and hormonal treatments
The expression of the StCAT genes was analyzed in response to a variety of stress agents to assess their specific functions. First, the effect of abiotic stresses such as salinity, drought, wounding and heat was evaluated in potato plants (Figure 8a). StCAT1 was up-regulated under all stress conditions, while StCAT2 was down-regulated in response to salt and mannitol stress. StCAT3 showed the opposite expression pattern to StCAT1: it was down-regulated in response to almost all stresses.
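One common way to label a gene as up- or down-regulated is to compare the log2 ratio of treated to control expression with a threshold. The sketch below illustrates that idea only; the source does not state which criterion was applied, so the two-fold threshold, the function name and the example values are assumptions.

```python
import math


def classify_regulation(treated, control, threshold=1.0):
    """Label a gene from its log2(treated / control) expression ratio.

    A threshold of 1.0 corresponds to a two-fold change (assumed cutoff).
    Both expression values must be positive.
    """
    log2_fold_change = math.log2(treated / control)
    if log2_fold_change >= threshold:
        return "up"
    if log2_fold_change <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical expression values, not data from the study.
print(classify_regulation(400.0, 100.0))
print(classify_regulation(50.0, 200.0))
print(classify_regulation(110.0, 100.0))
```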
Transcript abundance was then analyzed in response to three biotic stress treatments: the elicitors benzothiadiazole (BTH) and β-aminobutyric acid (BABA), and pathogen attack (Figure 8b). The transcript abundance of all three StCAT genes was modified in response to pathogen infection. StCAT1 appeared to be up-regulated in response to all biotic stresses, whereas StCAT2 was down-regulated only in response to the BABA elicitor. StCAT1 also showed the highest activation levels in response to all hormonal treatments except BAP. StCAT3 was down-regulated in response to almost all of the applied treatments (Figure 8c).
V. Functional study of genes encoding catalase in the potato variety Nicola
The expression profile of the 3 StCAT genes was investigated by semi-quantitative RT-PCR on RNA from the potato cultivar Nicola grown in the presence or absence of 100 mM NaCl. Oligonucleotides were designed and used as primers in RT-PCR reactions to amplify an internal region of each gene (Additional file 1). The RT-PCR results showed that salt stress activated the expression of the StCAT1 gene in plant leaves (Figure 9). In contrast, the expression of the StCAT2 and StCAT3 genes did not appear to be affected by salt stress. In conclusion, StCAT gene expression in the leaves of Nicola potato was consistent with the transcriptome data.