Genome-Wide in Silico Analysis of Ethylene Responsive Factors Associated with Stress Responses in Wheat


Background: Transcription factors are widely involved in the qualitative and quantitative regulation of gene expression. Among the diverse transcription factor families, the AP2 (Apetala2)/ERF (ethylene responsive factor) superfamily is a well-known family of plant proteins that controls diverse stress responses. Ethylene responsive factors (ERFs) are known to be vital for generating these stress responses.
Results: A total of 181 TaERF proteins were identified in Triticum aestivum. Phylogenetic analysis, conserved motif analysis, chromosomal localization, gene structure analysis and multiple sequence alignment of the ERF genes were performed, along with protein-protein interaction network analysis. The phylogenetic analysis classified these ERF members into 10 major groups and indicated the evolutionary relationships among the groups on the basis of shared protein motifs, which include WLG and 15 others. Further analysis revealed exon number, molecular weight, isoelectric point and protein length. Gene ontology analysis indicated that TaERFs are involved in responses to many stressors, suggesting their possible functions. Subcellular localization predicted that most of the proteins are confined to the nucleus, cytoplasm and mitochondria. Network analysis, based on the topology and betweenness centrality of the network, provided insight into the proteins involved in generating stress responses.
Conclusions: Genome-wide identification, gene structure, phylogeny and network analysis of TaERF proteins in Triticum aestivum supply a solid theoretical foundation for the functional study of the TaERF family. The bioinformatics analyses of the 181 TaERFs were carried out systematically and revealed the characteristics and structural information of the proteins.


Background
Abiotic and biotic stresses affect many economically important crops, triggering stress responses and reducing yield [1]. Stress regulation occurs at the cellular, biochemical, physiological and molecular levels. A number of stress-responsive genes and proteins associated with signaling at the molecular level have been identified and characterized [2]. Transcription factors (TFs) are DNA-binding proteins that bind to cis-regulatory elements and initiate transcription. The functional and evolutionary similarities among different TFs have led to their classification into various families, including APETALA2/ethylene response factor (AP2/ERF), WRKY, FAR1, DREB etc. [3]. The AP2/ERF transcription factor family plays a crucial role in the expression of genes related to stress responses, reproduction, defense and hormone secretion [4]. Over the past several years, there has been increasing interest in the functional and structural characteristics of AP2/ERF. The AP2/ERF superfamily has been reported to have emerged by horizontal transfer from bacteria/viruses to plants [5]. First discovered in Arabidopsis thaliana, the AP2/ERF domain comprises 60 to 70 amino acids [6]. These genes were initially cloned in tobacco plants [7]. Currently, the AP2/ERF gene family has been recognized in several species, including wheat, barley, foxtail millet, sorghum, maize, soya bean, grape, poplar and moso bamboo, and among most members of the grass family [8][9][10][11]. The superfamily is divided into the AP2, ERF and RAV families on the basis of the domains observed among these proteins. The AP2 family has two AP2/ERF domains and is mainly involved in plant developmental processes such as leaf formation and the development of flowers, embryos, ovules and fruits [12]. It also consists of two subfamilies, AP2 and ANT [13]. The ERF family has only one AP2/ERF domain and is involved in ethylene signal transduction [14,15], responses to pathogen stimuli and gene expression [16]. Recently, ERF genes have been studied in maize [17].
The RAV family contains two structural domains: one AP2/ERF domain and one B3 domain, which is also present in other transcription factors [18]. Moreover, RAV proteins show hormonal responses to ethylene and brassinosteroid [19].
The ERF family is further split into two subfamilies: CBF (C-repeat)/DREB (dehydration response element binding) and ERF (ethylene responsive factors) [20]. ERF subfamily proteins bind to the GCC box (AGCCGCC), in which G2, G5 and C7 are the core residues [21][22][23][24]. The 3-D structure of the AP2/ERF protein domain consists of a three-stranded anti-parallel beta-sheet and one alpha-helix [22][23][24]. In addition, ERF transcription factor proteins take part in various signaling cascades through complex protein-protein interactions (PPIs), forming protein networks that drive signaling pathways. PPIs are vital for a variety of cellular processes, which makes understanding them crucial for understanding cell physiology in normal and diseased states [25][26][27][28][29][30]. Protein-protein interaction networks (PPINs) are mathematical depictions of the physical connections between proteins within the cell.
Globally, wheat demand is increasing with the growth in world population and food demand [31]. Biotic and abiotic stresses greatly affect wheat production. Many TFs that respond to pathogen attack and confer tolerance to freezing temperatures have been studied in wheat [31][32]. To date, however, no report presents a detailed analysis of the structure and predicted function of ERF proteins in wheat using in silico approaches. Genome-wide analysis of a family is a relatively effective approach for classifying and differentiating the functions of plant genes or proteins, and it facilitates the study of the phylogeny of genes and genomes.
Although AP2/ERF proteins have been thoroughly studied in various plant families, the current research provides a detailed structural analysis and protein interaction network of the ERF proteins involved in stress regulation. The present study focuses mainly on genome-wide analysis of the ERF subfamily and protein-protein interaction network analysis in Triticum aestivum. Phylogenetic and evolutionary studies, gene structure analysis, protein-protein interaction network analysis and gene ontology analysis have been incorporated.
This prediction-based study should provide insight for future research strategies in functional analysis and help in designing wet-lab experiments.

Results
Bioinformatics analysis of ethylene responsive factors
A comparative analysis of ERF genes in different species showed that Triticum aestivum and sorghum have a similar number of ERF genes (181 and 172, respectively), whereas Arabidopsis, Oryza sativa and Hordeum vulgare have fewer than Triticum aestivum (Table S1). The individual genes of Triticum aestivum are listed in Table S2 with their predicted characteristics, including intron number, amino acid (a.a) sequence length, molecular weight and isoelectric point (pI). Protein length ranged from 48 to 1409 amino acids, and pI ranged from 4.4 to 12.6 (Table S2).

MSA (Multiple sequence alignment)
To identify the residues that are conserved in the ERF domain, multiple sequence alignment of the 181 ERF genes was performed using the CLC Workbench software package with the following parameters: start gap cost: 10.0, cost setting: 0.1, end gap cost: free. These parameters determine how the sequences align, that is, which positions are treated as gaps and how gap length is penalized. After alignment, the ERF protein sequences were used to construct a phylogenetic tree in ClustalW, which was viewed in CLC viewer 20.0.3. Most sequences show similarities and are highly conserved across the family. Motifs such as WLG are conserved and present in most of the wheat ERF protein sequences. Residues in the central region are more highly conserved than the C-terminal and N-terminal residues, as shown in Fig. 1.
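The notion of conserved positions described above can be illustrated with a small per-column conservation score. The following is a minimal sketch, not the CLC Workbench procedure: the function name `column_conservation` and the toy alignment are hypothetical, and the score is simply the fraction of sequences that share the most common residue in a column.

```python
from collections import Counter


def column_conservation(aligned_seqs, gap="-"):
    """Return one score per alignment column: the fraction of all sequences
    that carry the most common non-gap residue in that column."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in aligned_seqs if seq[i] != gap]
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Dividing by the total number of sequences means gaps lower the score.
        scores.append(top_count / len(aligned_seqs))
    return scores


# Hypothetical toy alignment containing a WLG-like block (not real TaERF data).
toy_alignment = [
    "MA-WLGTF",
    "MSKWLGTY",
    "M--WLGSF",
    "LAKWLGTF",
]
scores = column_conservation(toy_alignment)
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(fully_conserved)
```

In this toy input only the three columns forming the WLG block are identical in every sequence, which mirrors the kind of observation reported for Fig. 1.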

Phylogenetic relationship between the ERF genes in Triticum aestivum
Phylogenetic analysis of the 181 ERF proteins of Triticum aestivum was performed together with those of Arabidopsis thaliana, Oryza sativa, Hordeum vulgare and Zea mays. On the basis of this analysis, the wheat ERF proteins were closely related to those of Zea mays and Hordeum vulgare. The neighbor-joining (NJ) method with 1000 bootstrap replicates was used (Fig. 2).
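The bootstrap step mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the software used in the study: it resamples alignment columns with replacement and recomputes a simple p-distance, which is the kind of input a neighbor-joining tree builder would receive for each replicate. It does not build the NJ tree itself; instead it reports how often the closest pair of sequences stays the same. The sequence names and sequences are hypothetical.

```python
import random


def p_distance(a, b, gap="-"):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_pair(alignment):
    """Return the pair of sequence names with the smallest p-distance."""
    names = sorted(alignment)
    pairs = [(m, n) for i, m in enumerate(names) for n in names[i + 1:]]
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))


def bootstrap_alignments(alignment, replicates, seed=0):
    """Yield pseudo-alignments built by sampling columns with replacement."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment (names and residues are made up for illustration).
toy = {
    "TaERF_a": "MAKWLGTFDT",
    "TaERF_b": "MAKWLGTFET",
    "ZmERF_x": "MSRWLGSYDS",
}
reference = closest_pair(toy)
hits = sum(closest_pair(rep) == reference for rep in bootstrap_alignments(toy, 1000))
support = hits / 1000  # fraction of replicates that recover the same closest pair
print(reference, support)
```

In a full analysis each replicate alignment would be passed to an NJ implementation, and the support of a clade would be the fraction of replicate trees that contain it.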

Subcellular localization and gene analysis of Triticum aestivum ERF
To gain insight into ERF gene structure, introns were analyzed by obtaining the number of exons in each ERF. Most of the ERFs contain introns, and many of them have 1 or 2 introns. The number of introns was predicted with the GSDS tool. The Traes_5BL_7F0FD1538.2 protein has the largest number of exons (9). In the subcellular localization analysis, most of the ERF proteins were located in the nucleus. Plant-mPLoc suggested that ERF proteins are present in the nucleus, cytoplasm and mitochondria, with one protein present in mitochondria, while WoLF PSORT predicted ERF proteins in multiple locations (chloroplast, nucleus, cytoplasm, mitochondria, vacuole and cytoskeleton), as shown in Table S3 and Table S4.

Conserved motif Analysis in ERF family
Conserved motif analysis was performed with the MEME motif discovery software, which detected the ERF motif along with many other motifs. Some ERF protein sequences also showed EAR-like motifs, as in other plant species including Arabidopsis thaliana, rice and Zea mays. The WLG motif was conserved among most of the sequences. MEME takes multiple protein sequences and the number of conserved motifs as input and reports the motifs found across all input sequences (motifs 1, 2, 3, 4 and 5).

PPI network construction and sub-network extraction
For the construction of the PPI network, interactions were taken from the STRING database. The candidate genes were submitted to STRING, and a file containing the PPI interactions of all genes was obtained. The network was drawn using the drawing feature of the Pajek software. Topological analysis used the node properties degree (k), betweenness centrality (BC) and closeness centrality (CC) to assess the hubs in the system; k and BC in particular are two crucial parameters in network theory. Furthermore, a backbone network was constructed from the high-BC proteins and the links between them. The top 10% of the total node set by BC was taken as high BC, so as to retain a sufficient number of nodes in the backbone for studying the highly connected proteins of the network. Proteins with high BC values were the candidate hub proteins involved in various pathways, as shown in Fig. 4 and Table S5.
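The backbone construction described above (rank nodes by betweenness centrality, keep the top fraction, retain the links among them) can be sketched in a few lines. This is a minimal sketch and not the Pajek implementation: it uses Brandes' algorithm for unweighted, undirected graphs, the function names are assumptions, and the five-protein path network is hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    unweighted, undirected graph given as {node: set_of_neighbors}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):         # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted from both endpoints, so halve the totals.
    return {v: value / 2 for v, value in bc.items()}


def backbone(adj, fraction=0.10):
    """Keep the top `fraction` of nodes by BC and the edges among them."""
    bc = betweenness_centrality(adj)
    n_keep = max(1, round(fraction * len(adj)))
    hubs = set(sorted(bc, key=bc.get, reverse=True)[:n_keep])
    edges = {tuple(sorted((u, v))) for u in hubs for v in adj[u] if v in hubs}
    return hubs, edges


# Hypothetical toy network: a simple path P1 - P2 - P3 - P4 - P5.
toy_net = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4"},
}
bc = betweenness_centrality(toy_net)
hubs, edges = backbone(toy_net, fraction=0.6)
print(bc, hubs, edges)
```

The 10% cut-off is the value stated in the text; the example call uses a larger fraction only because the toy network has five nodes.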

GO Analysis
Gene ontology analysis predicted the involvement of ERF proteins in various biological processes, cellular components and molecular functions. The analysis indicated that ERF proteins are involved in responses to different stresses and in different regulatory processes. The stress responses in which ERFs were involved include responses to water stress, hormones, cold and salt stress. In terms of molecular function, ERF proteins were predicted to possess DNA-binding activity, as shown in Fig. 5.

Discussion
Transcription factors play a vital role in controlling and adjusting the responses of plants to different internal and external signals [26]. They regulate the downstream genes in stress signal transduction pathways by activating or suppressing genes on exposure to stress. Plant genomes contain multiple transcription factor families such as AP2/ERF [27]. It has been estimated that the Arabidopsis genome codes for 1533 transcription factors, accounting for over 5.9% of its total predicted genes (Rao et al., 2014). In the present analysis, 181 genes or proteins were recognized in the ERF subfamily, each having one Apetala2-like domain. Over the last decade, a large body of research has demonstrated that overexpression of ERF family genes increases the resistance of plants to stresses [25] (and Shinozaki, 2012), for example in chillies [25][26][27][28][29][30]. Identifying the genes or proteins that provide resistance against abiotic factors is therefore one of the most significant challenges. Phylogeny, structural analysis, GO analysis and motif analysis have been reported in genome-wide investigations of various plants, including the model plant Arabidopsis, rice [25] and sorghum [26], whereas only a few ERF genes in wheat have been studied, mainly in relation to pathogen attack. With the completion of the wheat genome sequencing projects, phylogenetic examination of TaERF proteins or genes should be useful for exploring the general function of ERF genes or proteins and the evolution of AP2/ERF in plants. Wheat was found to have 181 ERF genes, which is more than in Arabidopsis, sorghum, peach and moso bamboo, but fewer than in Chinese cabbage and carrot.
A similar pattern of conserved residues was reported in sorghum ERF genes [27] and in cucumber [28]. These conserved residues have a critical impact on the function of the AP2 domain and are relevant to the effect of single-residue changes in the protein. Comparative analysis of amino acid residues among various plants has demonstrated that the motifs most conserved in the AP2/ERF superfamily in different species, including Arabidopsis, rice and maize, such as WLG and EAR [25], are also seen in TaERF proteins, among which WLG is the most important. Beyond the motifs shared with Arabidopsis and rice, TaERFs play a significant role in evolutionary relationships. A significant number of exons were found in TaERF genes. In previous studies, it has been reported that most of the ERF genes ...
Apart from these findings, the protein-protein interaction network was analyzed. It showed that various proteins are physically connected with each other and have high BC values and short path lengths. Proteins with high BC values are hub proteins that play a role in regulating the whole network: a single protein can be directly linked to several proteins at a time, which is reflected in a high BC value.
To our knowledge, this is the first study of genome-wide analysis of ERF genes in wheat (Triticum aestivum). This research may also contribute functional gene resources for future genetic engineering approaches aimed at producing plants resistant to various biotic and abiotic stresses. It also identifies core proteins involved in a network pathway that could be used for further investigation. ERFs with high BC values are involved in various pathways and functions.
The structural analysis is linked to the network analysis, as it gives deeper insight into each ERF and helps to identify sites that are important and that take part in various processes. Phylogenetic analysis reveals the ancestral groups containing similar ERFs, such as those of Arabidopsis and other grasses.

Conclusion
Although the wheat genome has been sequenced and a great deal of information about it is available, much of that information remains ambiguous and needs further study. The functions of certain genes are still unidentified, and many wheat transcription factors are not well analyzed and are described by unclear information. This study aims to provide a detailed structural and PPIN analysis of wheat ERFs based on prediction using bioinformatics and computational biology. This research may also contribute functional gene resources for future genetic engineering approaches aimed at producing plants resistant to certain biotic and abiotic stresses. It also identifies core proteins involved in network or signaling pathways that initiate stress responses, which could be used for further investigation. However, further research is needed for a better understanding.

Methods
Identification of TaERF proteins by computational analysis
The AP2/ERF superfamily has been thoroughly studied in Triticum aestivum, which led to the identification of 183 ERF subfamily proteins. The 183 ERF genes identified and reported in an earlier study were considered for this study. Data for these genes were obtained from different databases, including the Plant Transcription Factor Database (http://planttfdb.cbi.pku.edu.cn/) and Phytozome (http://phytozome.jgi.doe.gov/). As a result, the sequences of 183 ERF genes were obtained. The ERF genes were then analyzed for amino acid (a.a) length, molecular weight (kDa) and isoelectric point (pI) using the ExPASy server (http://www.expasy.ch/tools).
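As an illustration of two of the sequence properties listed above, the following sketch computes protein length and an approximate average molecular weight from a one-letter amino acid sequence. It is not the ExPASy implementation: the function names are assumptions, the residue masses are standard average residue masses, and the example peptide is hypothetical. The isoelectric point is omitted because its value depends on the pK scale chosen.

```python
# Average masses (Da) of amino acid residues, i.e. after loss of one water.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(sequence):
    """Approximate average molecular weight in Da of an unmodified protein."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def summarize(sequence):
    """Return (length in residues, molecular weight in kDa)."""
    return len(sequence), molecular_weight(sequence) / 1000.0


# Hypothetical short peptide, used only to show the call.
length, weight_kda = summarize("MAKWLGTF")
print(length, round(weight_kda, 3))
```

Sequences containing ambiguous codes such as X or B raise an error here; a production tool would need an explicit policy for them.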

Phylogenetic-Domain-MSA analysis of ERFs
To observe conserved residues in the ERF domain, multiple sequence alignment of these 181 ERF genes was executed using the CLC Workbench software package. For motif analysis, the descriptors for data analysis included: number of different motifs: 20, minimum motif width: 15, maximum motif width: 60 amino acids, distribution of motif occurrence: zero or one per sequence. These parameters were chosen according to the length of the proteins and the number of motifs under study, in order to narrow down the desired outcomes.

Scanning PPIs and construction of protein-interaction network
PPIs were taken from the STRING database (https://string-db.org/cgi/network.pl) for the inspection of protein-protein interactions. A network was constructed that consists of the seed proteins, their close neighbors and the links between all proteins. The Pajek program, which supports the construction and visualization of very large networks, was used for the analysis of network linkages.
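Moving an interaction list into Pajek requires a network file in Pajek's `.net` format. The sketch below shows one way this conversion could be done; it is an assumption about the workflow rather than the procedure used in the study. The function name `to_pajek_net`, the protein names and the scores are hypothetical, and the input is assumed to be already parsed into (protein A, protein B, score) tuples.

```python
def to_pajek_net(interactions):
    """Convert (protein_a, protein_b, score) tuples into Pajek .net text.

    Vertices are numbered from 1 in order of first appearance, and each
    interaction becomes one weighted undirected edge.
    """
    index = {}
    for a, b, _score in interactions:
        for name in (a, b):
            if name not in index:
                index[name] = len(index) + 1
    lines = [f"*Vertices {len(index)}"]
    lines += [f'{number} "{name}"' for name, number in index.items()]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {score}" for a, b, score in interactions]
    return "\n".join(lines) + "\n"


# Hypothetical interactions; names and scores are placeholders.
example = [
    ("ERF1", "ERF2", 0.9),
    ("ERF2", "EIN3", 0.7),
]
net_text = to_pajek_net(example)
print(net_text)
```

The resulting text can be saved with a `.net` extension and opened in Pajek for drawing and topological analysis.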

Topological graph theory analysis
Graph-theoretic properties were used to evaluate the nodes in the network. The most important of these are the degree (k) and the betweenness centrality (BC), which is based on the shortest paths among nodes; both are essential in network theory. Global topological measures of a network include degree, mean path length and diameter, which were used to characterize the network. The proteins with high BC values were taken as the hub centers, and the connections between them make up the backbone. The most direct routes between the candidates were calculated and evaluated with the Pajek software (http://mrvar.fdv.uni-lj.si/pajek/). The sub-network was constructed by shortest or longest path calculations, with the nodes on the shortest paths taken as the hub proteins. Detecting the direct routes between selected proteins helped to identify the proteins with multiple interactions and high centrality, which were considered the central or most highly involved proteins of the network.
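The global measures named above (degree, mean path length and diameter) can be computed with breadth-first search on an unweighted network. The following is a minimal sketch under that assumption, not the Pajek routines; the function names and the five-protein path network are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def topology_summary(adj):
    """Degree per node, mean degree, mean shortest-path length and diameter.

    Path statistics are taken over connected pairs of distinct nodes only.
    """
    degrees = {v: len(neighbors) for v, neighbors in adj.items()}
    lengths = []
    for s in adj:
        d = bfs_distances(adj, s)
        lengths += [d[t] for t in d if t != s]
    return {
        "degree": degrees,
        "mean_degree": sum(degrees.values()) / len(adj),
        "mean_path_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "diameter": max(lengths, default=0),
    }


# Hypothetical toy network: a simple path P1 - P2 - P3 - P4 - P5.
toy_net = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4"},
}
summary = topology_summary(toy_net)
print(summary["mean_degree"], summary["mean_path_length"], summary["diameter"])
```

For a network with several components, the diameter reported here is the largest finite shortest-path length, since unreachable pairs are skipped.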

GO Analysis and sub cellular localization
The gene ontology analysis of the wheat ERF protein sequences was performed using Blast2GO with default settings. The number of exons was predicted using the GSDS tool. Subcellular localization was predicted with two web-based tools, WoLF PSORT and Plant-mPLoc. GO analysis of the wheat ERFs provided information on the biological processes in which ERFs are involved and the molecular functions they perform in the cell. Information on cellular localization indicates the organelle in which the proteins are situated and helps in predicting their possible function in the cell, which could contribute to different stress responses.