In-Silico Identification and Characterization of Universal Stress Protein (USP) Gene Family in Triticum aestivum

doi:10.21203/rs.3.rs-1943975/v1


This work is licensed under a CC BY 4.0 License

Version 1


Climate has changed drastically over the last decade. It is crucial to understand the needs of plants and the adaptive mechanisms that help them survive adverse environmental conditions. Abiotic stressors, mainly salinity, osmotic stress, heat, drought, and flooding, affect plants significantly. In this work, we identified and characterized the Universal Stress Protein (USP) gene family of wheat. In-silico approaches such as sequence identification, gene ontology annotation, chromosomal mapping, Circos plotting, and synteny analysis were used to analyze the reported sequences. The study revealed that domain architecture plays the most significant role in the multi-functional features of this family, which is present in all plants. Moreover, the syntenic relationships revealed conservation among the monocot genomes. The role of USPs in host cells was explored through subcellular localization and gene ontology analyses. The presence of several regulatory elements also gave insight into stress-specific modulation and regulation. Furthermore, protein modeling of the TaUSP genes revealed binding pockets containing functionally important amino acids. This work led us to report a total of 107 protein sequences on the ABD genome, grouped into 34 TaUSP genes. Further investigations such as expression profiling might help verify the stress-specific transcriptional modulation of these genes. Hence, this work could be useful in developing economically important stress-resilient varieties.

Abiotic stress

drought

salinity

Universal Stress Proteins

regulatory elements

Throughout their life cycle, plants face changes in environmental conditions from optimal to adverse. The changing climate has drastic effects on plants, such as stunted growth, cellular damage, decreased photosynthesis, abnormal flowering and germination times, and decreased yields^[1,2]. A variety of regulatory pathways comprising proteins and transcription factors help restore homeostasis. Different cellular proteins play a significant role in the cell defense system by helping the cell withstand stressful environments under abiotic stress conditions. One such class is the Universal Stress Protein (USP) family^[3].

USPs were first identified in E. coli, where they respond to several environmental stressors including temperature, nutrient scarcity, osmotic and oxidative stress, heavy metal accumulation, and antibiotics^[4]. These genes have been identified in many prokaryotes and eukaryotes, including plants. In plants, the most abundant is Universal Stress Protein A (USPA), which belongs to the Adenine Nucleotide Alpha Hydrolase (AANH) superfamily. These are small cytoplasmic proteins approximately 111–167 amino acids long. They have serine/threonine residues and autophosphorylation properties^[5]. The USP from E. coli contains only a single USP domain, but in other organisms many different domains have accumulated and function in conjunction with the USP domain. The functional diversity of USP genes is due to these extra domains^[6]. Only a few plants have been studied for the presence of USPs, for example Arabidopsis thaliana, Gossypium arboreum, and Medicago falcata^[7,8]. OsUSP1 from rice was the first USP to be reported in plants. It activates the ethylene-mediated signaling pathway under hypoxic conditions. Recently, 44 OsUSP genes were reported in rice, whereas 38 had been identified previously^[9,10]. A. thaliana has 41 identified USP genes, and increased expression was observed under abiotic stressors such as cold and drought^[11]. USPs are also involved in hormonal regulation, particularly in signaling pathways mediated by ethylene, and regulate fruit ripening^[9]. Moreover, abscisic acid upregulates the expression of GUS^[11]. GhUSP1 and GhUSP2 of cotton were also reported to be effective in the drought response^[12]. In tomato, a USP named SlRd2, together with its interacting partner SlCipk6, plays a role in decreasing reactive oxygen species (ROS)^[13]. MfUSP1 from Medicago falcata also plays a role in ROS homeostasis and in increased tolerance to salinity, cold, and osmotic stress^[14].
OeUSP2 of olive acts as a biomarker for salt stress^[15]. Pigeon pea, a member of the legume family, has fifty-one identified drought-responsive genes, of which 10 also contain a USP_A domain^[16]. SbUSP from Salicornia brachiata, when overexpressed, confers a response to salt and osmotic stress^[17]. The response to several different environmental stressors indicates the functional capabilities of these proteins^[18].

Wheat is an economically important crop as well as a staple food for almost 30% of the world population^[19]. Whole-genome sequencing of wheat has made it possible to discover the underlying mechanisms and genes involved in the abiotic stress response^[20,21]. Studying the USP family in different plants gives insight into their stress-specific modulation^[22].

2.1. Sequence retrieval and characterization of USP genes in Triticum aestivum

To characterize the USP gene family in Triticum aestivum, the amino acid sequences of the Oryza sativa (Japonica) IRGSP-1.0 USPs were retrieved from the Rice Annotation Project Database (RAP-DB) (https://rapdb.dna.affrc.go.jp/)^[23] and used as query sequences for a protein BLAST against the Triticum aestivum IWGSC RefSeq v1.1 genome, with an e-value of 1 × 10^−5, in the Ensembl Plants database (http://plants.ensembl.org/)^[24]. USP genes belong to the adenine nucleotide alpha hydrolase (AANH) superfamily and contain a conserved USP domain (PF00582). Because this domain is the characteristic feature of the gene family, its presence in the retrieved sequences was verified using Pfam (http://pfam.xfam.org/)^[25]. Isoelectric points and molecular weights were calculated using the ExPASy online server (https://www.expasy.org/resources/compute-pi-mw)^[26].
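The e-value cutoff described above can be illustrated with a short sketch. It assumes that BLAST hits are available in the standard 12-column tabular layout (query, subject, ..., e-value in column 11, bit score in column 12); the study itself used the Ensembl Plants web interface, so the function name `filter_blast_hits` and the example rows are hypothetical and serve only to show how a 1 × 10^−5 threshold separates candidate hits from weak ones.

```python
def filter_blast_hits(tabular_text, max_evalue=1e-5):
    """Return {subject_id: best_evalue} for hits at or below max_evalue.

    Assumes tab-separated BLAST output with the subject ID in column 2
    and the e-value in column 11 (standard 12-column tabular layout).
    """
    best = {}
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])
        # keep the lowest e-value seen for each subject sequence
        if evalue <= max_evalue and evalue < best.get(subject, float("inf")):
            best[subject] = evalue
    return best


# Hypothetical rows for illustration only; the numbers are made up.
example = "\n".join([
    "OsUSP1\tTraesCS1A02G145700.1\t62.5\t160\t60\t0\t1\t160\t1\t160\t3e-60\t190",
    "OsUSP1\tHypotheticalWeakHit.1\t28.0\t50\t36\t0\t1\t50\t1\t50\t0.4\t25.1",
])
hits = filter_blast_hits(example)
```

Only the first row passes the threshold; the second is discarded because its e-value of 0.4 is far above 1 × 10^−5. Candidates kept this way would still need the Pfam domain check described in the text.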

2.2. Multiple Sequence Alignment, phylogenetic evolutionary analysis, and nomenclature of USP genes of Triticum aestivum.

Complete protein sequences of Brachypodium distachyon, Oryza sativa, and T. aestivum were aligned to draw the phylogenetic tree. Multiple sequence alignment of the identified genes was done using ClustalX 2.1^[27]. The alignment file was edited using GeneDoc^[28]. Phylogenetic analysis was performed in IQ-TREE by maximum likelihood with 1,000 bootstrap replicates^[29]. The resulting trees were viewed and edited in iTOL^[30]. The genes were named according to the OsUSP genes identified earlier in 2021^[10].

2.3. Analysis of Gene structure, motif, and amino acid composition

For the analysis of gene structure and motifs in the USPs, complete coding sequences and genomic sequences were taken from the Ensembl Plants database (http://plants.ensembl.org/)^[24]. The exon-intron distribution was visualized using the Gene Structure Display Server (GSDS 2.0) (http://gsds.gao-lab.org/) by comparing the genomic sequences with the CDS sequences of the TaUSP genes^[31]. Conserved motifs were identified with MEME Suite 5.4.1 using the following parameters: (1) the total number of motifs to be found was set to 10; (2) the motif width was set to a minimum of 6 and a maximum of 50. The remaining parameters were left at their defaults^[32].
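The exon-intron comparison that GSDS performs can be pictured with a minimal sketch. It assumes exon coordinates are already known as 1-based, inclusive (start, end) pairs on the genomic sequence; the function name `introns_from_exons` and the coordinates are hypothetical and not taken from the study.

```python
def introns_from_exons(exons):
    """Derive intron intervals from exon (start, end) pairs.

    Coordinates are assumed to be 1-based and inclusive. An intron is the
    gap between two consecutive exons once the exons are sorted by start.
    """
    ordered = sorted(exons)
    introns = []
    for (_, end_prev), (start_next, _) in zip(ordered, ordered[1:]):
        if start_next > end_prev + 1:  # a real gap, not adjacent exons
            introns.append((end_prev + 1, start_next - 1))
    return introns


# Hypothetical three-exon gene model for illustration.
exons = [(100, 250), (400, 520), (700, 900)]
introns = introns_from_exons(exons)
```

With non-adjacent exons, a gene model with n exons yields n − 1 introns, which is a useful sanity check when reading exon and intron counts side by side.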

2.4. Chromosomal mapping and gene duplication analysis

The positions of the USP genes on the wheat chromosomes were taken from the Ensembl Plants database. Gene locations were presented using the advanced gene location visualization tool in TBtools by Chen J^[33]. Gene duplication analysis considered two modes of duplication: (1) tandem and (2) segmental. Pairs of genes on the same chromosome separated by ≤ 5 gene positions were termed tandem duplications. To analyze the divergence time and selection pressure on the USP genes, the Ka (non-synonymous) and Ks (synonymous) values were calculated using the Ka/Ks calculation tool in TBtools. The approximate divergence time was assessed using the formula \(T = Ks/(2x) \times 10^{-6}\) MYA, where \(x = 6.5 \times 10^{-9}\) substitutions per synonymous site per year and the factor \(10^{-6}\) converts years to million years (MYA)^[34].
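The divergence-time formula and the duplication criteria can be written out as a small sketch. The rate x = 6.5 × 10^−9 and the ≤ 5 gene-position rule come from the text; the function names, the reading of "gene positions" as a difference in gene order index, and the example values are assumptions made for illustration. TBtools, not this code, was used in the study.

```python
SUBSTITUTION_RATE = 6.5e-9  # x, substitutions per synonymous site per year


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """T = Ks / (2x) * 1e-6, i.e. divergence time in million years."""
    return ks / (2 * rate) * 1e-6


def selection_type(ka, ks):
    """Classify selection pressure from the Ka/Ks ratio."""
    if ks == 0:
        raise ValueError("Ks must be non-zero to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"  # negative selection
    if ratio > 1:
        return "positive"
    return "neutral"


def is_tandem_pair(chrom_a, index_a, chrom_b, index_b, max_gap=5):
    """Same chromosome and at most max_gap gene positions apart.

    index_a and index_b are assumed to be gene order indices along the
    chromosome (an interpretation of 'gene positions' in the text).
    """
    return chrom_a == chrom_b and abs(index_a - index_b) <= max_gap


# Hypothetical values for illustration.
t = divergence_time_mya(0.065)          # about 5 MYA
kind = selection_type(ka=0.02, ks=0.1)  # Ka/Ks = 0.2
```

A Ks of 0.065 corresponds to roughly 5 MYA under this rate, and any pair with Ka/Ks below 1 is classified as under purifying (negative) selection.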

2.5. Gene Ontology (GO) annotation and Subcellular localization

The GO enrichment analysis was done using the online tool PANTHER version 14^[35]. The wheat USP protein sequences were uploaded to the online server to check their molecular functions, biological processes, and cellular components. Subcellular localization was predicted using the WoLF PSORT web server^[36]. A heat map showing the localization of these proteins was drawn using the HeatMap illustrator in TBtools^[33].

2.6. Cis-regulatory elements (CREs) analysis of TaUSP promoters

For the analysis of CREs in the wheat USPs, promoter sequences 2.5 kbp upstream of each gene were extracted from the Ensembl Plants database. The extracted sequences were uploaded to the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) to predict the regulatory motifs present in the USP family of wheat^[37]. These motifs were represented graphically in TBtools (Toolkit for Biologists), along with a stacked bar plot, using the start and end positions of each motif^[33].
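The 2.5 kbp upstream window can be expressed as simple coordinate arithmetic. The sketch below assumes 1-based, inclusive gene coordinates and a strand label of "+" or "-"; the function name `promoter_window` and the coordinates are hypothetical, and the study retrieved the sequences directly from the database rather than computing them this way.

```python
def promoter_window(gene_start, gene_end, strand, upstream=2500):
    """Return (start, end) of the upstream window on the reference strand.

    Coordinates are 1-based and inclusive. For a '+' gene the window lies
    before gene_start; for a '-' gene it lies after gene_end, and the
    extracted sequence would then need to be reverse-complemented.
    The window is clipped at position 1 but not at the chromosome end.
    """
    if strand == "+":
        return max(1, gene_start - upstream), gene_start - 1
    if strand == "-":
        return gene_end + 1, gene_end + upstream
    raise ValueError("strand must be '+' or '-'")


# Hypothetical gene coordinates for illustration.
plus_window = promoter_window(10_000, 12_000, "+")
minus_window = promoter_window(10_000, 12_000, "-")
```

Both windows span exactly 2,500 bases here. Motif start and end positions reported for such a window can then be converted to distances from the gene start when plotting.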

2.7. Identification of Glycosylation and phosphorylation sites in TaUSP genes

Glycosylation sites in the TaUSP proteins were predicted with the NetNGlyc 1.0 online server using the preset threshold of 0.5^[38]. Phosphorylation sites were predicted using NetPhos 3.1 for all three residue types, i.e., serine, threonine, and tyrosine, with both generic and kinase-specific predictions^[39].
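As a rough illustration of what an N-glycosylation candidate looks like at the sequence level, the sketch below scans for the classical sequon N-X-S/T with X not proline. This is a deliberate simplification and not the NetNGlyc method, which scores sequons with neural networks; the function name `find_sequons` and the toy peptide are hypothetical.

```python
import re

# N, then any residue except proline, then serine or threonine.
# The lookahead lets overlapping sequons be reported separately.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein_seq):
    """Return (1-based position of N, matched tripeptide) for each sequon."""
    return [(m.start() + 1, m.group(1))
            for m in SEQUON.finditer(protein_seq.upper())]


# Toy peptide for illustration only.
sites = find_sequons("MNGSANPTNNTS")
```

The tripeptide N-P-T at position 6 is skipped because proline in the middle position is not allowed by the pattern. A predictor such as NetNGlyc would additionally assign each remaining sequon a score to compare against the 0.5 threshold.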

2.8. Protein modeling, disordered region analysis, and prediction of binding sites

Protein modeling for four TaUSP genes was done through homology modeling using the SWISS-MODEL workspace^[40]. Secondary structures of the built models were identified using the SOPMA web server^[41]. Disordered regions were analyzed using the MobiDB server (https://mobidb.bio.unipd.it)^[42], and binding pockets were identified using the DoGSiteScorer web server^[43].

3.1. Identification of USP Genes in T. aestivum

A BLASTp search using the OsUSP genes as queries revealed a total of 107 USP genes in the T. aestivum genome. Because of their presence on the ABD genome, they were further sub-grouped. The coding sequence length ranged from 186 bp (TaUSP13(A)) to 6498 bp (TaUSP16(B)). Protein length ranged from 121 aa (TaUSP7(D)) to 844 aa (TaUSP27b(D)). The molecular weight of these proteins varied greatly, from 11,495.23 g/mol (TaUSP42(D)) to 91,760.08 g/mol (TaUSP8(A)). The average pI of these proteins was 7.1098 (Supplementary Table 1).

3.2. Multiple Sequence Alignment and phylogenetic evolutionary analysis and nomenclature of USP genes of Triticum aestivum.

Multiple sequence alignment of the 107 USP genes gave insight into the conserved amino acids present in these sequences. The 107 sequences identified by the BLASTp search were grouped into a total of 34 USPs. The phylogenetic analysis revealed that not all OsUSPs identified in rice have counterparts in B. distachyon and T. aestivum. For instance, counterparts of OsUSP10 and OsUSP42 were identified in T. aestivum but not in B. distachyon. Similarly, no USP genes corresponding to OsUSP2, OsUSP6, OsUSP23, OsUSP39, OsUSP40, and OsUSP4 were identified in either plant.

The phylogenetic tree was divided into two distinct clades, a classification based on domain architecture. USP genes contain either a single USP domain or a USP domain with an additional kinase domain belonging to the major classes of protein kinases. TaUSP10, TaUSP27(a, b), TaUSP34, and TaUSP35 also had an extra U-box domain in addition to the USP and protein kinase domains. The study also showed that not all genes are present in all three wheat genomes (A, B, and D); some sequences are present only on the AB, AD, or BD genomes (Figure 1).

3.3. Gene structure and motif Analysis and amino acid content of USP of wheat

Conserved motifs were identified using the MEME online server. Ten motifs were identified, and information on them is given in Supplementary Table 2. The genes contain either 2, 7, or 10 motifs in total. Motifs 1 and 2 are characteristic of the USP domain, whereas motifs 3–10 belong to the protein kinase and U-box domains. All USPs belonging to Class A had 1–2 motifs, whereas those in Class B had 6–10 motifs. This is consistent with the presence of a single USP domain, or of a USP domain together with a protein kinase domain and a U-box domain, in these proteins (Figure 2a).

The gene structure of all identified TaUSPs was analyzed through the GSDS web server to study the exon and intron distribution. A moderate variation was seen in the distribution of introns and exons. The number of introns ranged from 1 to 10; TraesCS1A02G145700.1 and TraesCS1D02G144300.1, belonging to TaUSP37A and TaUSP37D, had the fewest introns, namely one. The sequences that contain both USP and protein kinase domains had relatively more introns and exons. The architectural similarity of the exon-intron distribution indicates that these genes are also similar at the protein level. The number of exons ranged from 1 to a maximum of 10. TraesCS4B02G228200.1 and TraesCS4D02G229100.1, which belong to TaUSP16A and TaUSP16D, had the highest number of exons (10). Only TaUSP1A has a single exon. The presence of 5′ UTR and 3′ UTR regions suggests a role in post-transcriptional regulation (Figure 2b).

Moreover, the amino acid (aa) composition indicates that USPs with the same domain pattern have similar amino acid content, whereas USPs with different domain patterns differ slightly in composition. This can be seen by comparing TaUSP3 and TaUSP35, which have different domain patterns: the former contains a single USP domain, whereas the latter has an additional domain along with the USP domain, and their compositions differ accordingly. Together, the gene structure, motif, and amino acid results are consistent with the domain architecture (Supplementary Fig. 1).

3.4. Chromosomal mapping and gene duplication analysis

The genes are distributed unevenly across the seven chromosomes of each of the A, B, and D genomes of wheat. For instance, chromosome group 5 carries the most genes, with 9 on 5A, 7 on 5B, and 10 on 5D. Similarly, 6A, 6B, and 6D carry 7, 9, and 7 genes, respectively, whereas chromosomes 7A, 7B, and 7D have only 1 gene each (Fig. 3a, Supplementary Table 1). These results indicate that not all genes are present on all three genomes of wheat, and some might have been lost during evolution. The existence of more than one gene on certain chromosomes indicates that duplication events have occurred over time, giving rise to these genes at separate chromosomal locations. The duplication analysis revealed 15 duplicated gene pairs, of which only 1 was a tandem duplication and the remaining 14 were segmental duplications (Fig. 3b). The divergence time, estimated from the Ks values, ranged from 1.51 MYA to 5.01 MYA. The selection pressure was found to be negative, as the Ka/Ks ratio was below 1. Synteny analysis between rice and wheat showed high conservation and identified several orthologs (Figure 3c).
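A per-chromosome tally like the one above can be derived from the gene identifiers themselves. The sketch assumes the identifier layout seen in the IDs quoted in this article (prefix `TraesCS`, then chromosome number 1–7 and genome letter A, B, or D); the function names are hypothetical, and the four IDs are those mentioned in the text rather than the full set of 107.

```python
import re
from collections import Counter

# Assumed layout: TraesCS + chromosome (1-7) + genome (A/B/D) + 2 digits + G + number
GENE_ID = re.compile(r"^TraesCS([1-7])([ABD])\d{2}G\d+")


def chromosome_of(gene_id):
    """Return a label such as '5A', or None if the ID does not fit the layout."""
    m = GENE_ID.match(gene_id)
    return m.group(1) + m.group(2) if m else None


def count_by_chromosome(transcript_ids):
    """Count unique genes per chromosome, ignoring transcript suffixes (.1, .3)."""
    genes = {tid.split(".")[0] for tid in transcript_ids}
    return Counter(c for c in map(chromosome_of, genes) if c is not None)


ids = [
    "TraesCS1A02G145700.1",
    "TraesCS1D02G144300.1",
    "TraesCS7A02G084400.3",
    "TraesCS1D02G108300.1",
]
counts = count_by_chromosome(ids)
```

Grouping the same labels by their trailing letter would give the distribution per genome (A, B, or D), which is how absences on one genome become visible.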

3.5. Gene Ontology (GO) annotation and Subcellular localization

Subcellular localization prediction revealed that these proteins are distributed across many compartments of the cell, where they may perform a variety of functions. They are predicted to localize to the cytoplasm, mitochondria, endoplasmic reticulum, cytoskeleton, chloroplast, and nucleus. They are also present in the extracellular matrix, and some are present in peroxisomes, which indicates a function in anoxic conditions and in peroxidase activity. The intensity of the heat map indicates that these proteins are most strongly localized to the cytoplasm. Their presence in the Golgi complex, on the nuclear plasma membrane, and in vacuoles points to their broad functioning in various biological, molecular, and cellular processes (Figure 4).

Functional annotations were checked through the PANTHER web server. GO terms for cellular components, biological processes, and molecular functions were predicted. A total of 29 out of 107 TaUSP genes were mapped against eight biological processes: protein phosphorylation (GO:0006468), phosphorylation (GO:0016310), phosphate-containing compound metabolic process (GO:0006796), phosphorus metabolic process (GO:0006793), cellular protein modification process (GO:0006464), protein modification process (GO:0036211), macromolecule modification (GO:0043412), and biological regulation (GO:0065007), for which no sequence found a match (Figure 5a).

For the cellular components, the sequences with PANTHER IDs, out of the 107, were mapped to 9 terms: cytoplasm (GO:0005737), intracellular anatomical structure (GO:0005622), nucleus (GO:0005634), cellular anatomical entity (GO:0110165), cellular component (GO:0005575), intracellular membrane-bounded organelle (GO:0043231), membrane-bounded organelle (GO:0043227), intracellular organelle (GO:0043229), and organelle (GO:0043226), whereas 67 sequences were categorized as unclassified (Figure 5b).

Lastly, for the molecular functions, sequences were mapped to functions such as AMP binding (GO:0016208), protein kinase activity (GO:0004672), phosphotransferase activity, alcohol group as acceptor (GO:0016773), kinase activity (GO:0016301), transferase activity, transferring phosphorus-containing groups (GO:0016772), adenyl ribonucleotide binding (GO:0032559), adenyl nucleotide binding (GO:0030554), purine ribonucleotide binding (GO:0032555), purine nucleotide binding (GO:0017076), ribonucleotide binding (GO:0032553), carbohydrate derivative binding (GO:0097367), ATP binding (GO:0005524), purine ribonucleoside triphosphate binding (GO:0035639), nucleotide binding (GO:0000166), nucleoside phosphate binding (GO:1901265), anion binding (GO:0043168), small molecule binding (GO:0036094), catalytic activity, acting on a protein (GO:0140096), and transferase activity (GO:0016740) (Figure 5c).

3.6. Cis-regulatory elements analysis of USP promoters of wheat

The promoter sequences downloaded from the database were used to identify cis-regulatory elements (CREs) that might control the activation of genes under certain conditions. The analysis revealed a total of 7 classes of CREs in the sequences: light-responsive elements (LREs), hormone-responsive elements (HREs), development-related elements, promoter elements, abiotic stress-related elements, biotic stress-responsive elements, and a 7th class of elements whose functions are still unknown (Fig. 6a). The results for each class are discussed below.

3.6.1. Promoter related elements

There are 7 identified promoter-related elements in these genes: CAAT-box, TATA-box, AT~TATA-box, A-box, AT-rich element, TAAT, and unnamed-1. The most abundant CREs located upstream of the start codon are the TATA-box and CAAT-box, at the −35 and −10 positions respectively, also called core promoter elements. In the present sequences, the CAAT-box is the most abundant (Figure 6b).

3.6.2. Light responsive elements

The light-responsive elements (LREs) identified were G-box, TCT-motif, AE-box, GATA-motif, Sp1, GT1-motif, Box 4, ACE, LAMP-element, I-box, GA-motif, Gap-box, ATCT-motif, 3-AF1, L-box, chs-CMA1a, chs-CMA2, chs-Unit 1 m1, AT1-motif, ATC-motif, TCCC-motif, Pc-CMA1c, and GTGGC-motif. These are thought to activate different photosystems and hence mediate the response to light. The results also showed that two or more elements of this class often lie near each other, which suggests that more than one element is required for the activation of these promoters. The G-box is the most abundant LRE in the sequences. The presence of light-responsive elements strongly suggests that these genes might be involved in the activation of pathways regulated by light (Figure 6c).

3.6.3. Hormone-related elements

The upstream regions of the USPs also contained hormone-related regulatory elements. A total of 18 hormone-related elements belonging to 6 different classes were identified. Class 1 is abscisic acid (ABA)-related: ABRE, ABRE3a, ABRE4, and AT~ABRE. Class 2 is auxin-related: AuxRE, AuxRE-Core, TGA-box, and TGA-element. Class 3 is jasmonic acid-related: CGTCA-motif, JERE, and TGACG-motif. Class 4 is salicylic acid (SA)-related: TCA, TCA-element, and the salicylic acid-responsive element (SARE). Class 5 is gibberellic acid-related: P-box, TATC-box, and the gibberellic acid-responsive element (GARE-motif). Class 6 is ethylene (ETH)-related and comprises the ethylene-responsive element (ERE). The results indicate that these genes are also activated by changes in hormone levels and respond through the corresponding pathways (Figure 6d).

3.6.4. Development-related responsive elements

A total of 25 development-related responsive elements were identified: AAGAA-motif, CCGTCC-box, AC-I, AC-II, as-I, CGGTCC-box, circadian, CAT-box, dOCT, E2Fb, F-box, GCN4-motif, HD-zip 1, HD-zip 3, MSA like, NON, CARE, re2f1, RY element, NON-box, O2 Site, Unnamed__8, Unnamed__10, Unnamed__12, and Unnamed__14. These elements play a distinct role in cellular development, including the cell cycle and cell proliferation pathways. Some of the genes might also control circadian pathways and the pathways involved in zein metabolism. These motifs also suggest a role in the tissue-specific expression of genes related to developmental processes (Figure 6e).

3.6.5. Abiotic stress-responsive elements

A total of 18 out of 113 elements were identified as abiotic stress-responsive elements. These include CCAAT-box; the drought-responsive elements DRE core and DRE 1; GC-motif; LTR (low-temperature response, also called response to cold); MBS; MBS 1; MYB along with its binding and recognition sites; MYC; MYB-like site; STRE (low pH and osmotic pressure); AT-rich sequence; and ARE (Figure 6f).

3.6.6. Biotic stress-related elements

Among the 113 elements, 4 were identified as biotic stress-related elements that participate in wound healing and the response to pathogen attack: Box S, W-box, WRE 3, and WUN-motif (Figure 6g).

3.6.7. Unidentified

The last class contains unnamed, unidentified sequences that have no known function but are present in considerable numbers: Unnamed__2, Unnamed__4, Unnamed__6, and Unnamed__16. Unnamed__4, with the motif sequence CTCC, is the most abundant (Figure 6h).

3.7. Identification of Glycosylation and phosphorylation sites in TaUSP genes

Phosphorylation is one of the significant post-translational modifications and plays a key role in the activation, deactivation, and regulation of cellular pathways. In this process, protein kinases phosphorylate amino acids such as serine, threonine, and tyrosine. A substantial number of phosphorylation sites were predicted in the 107 sequences. TraesCS7A02G084400.3 (TaUSP27) has the highest number of phosphorylation sites (122) and TraesCS1D02G108300.1 the lowest (5) (Supplementary Table 3).

Glycosylation is another post-translational modification; it helps proteins fold properly, contributes to their characteristics and functionality, and plays a role in the stability of protein structures. This study predicted 81 glycosylation sites in 19 out of 34 TaUSPs. The highest number of glycosylation sites was present in TaUSP10 (7), followed by TaUSP32 with 6 sites, TaUSP27a and TaUSP35 with 4 sites, TaUSP8 and TaUSP27b with 3 sites, and TaUSP4 and TaUSP14 with 2 sites, while the remaining 11 TaUSPs have only 1 glycosylation site. N-glycosylation scores above 0.5 and a jury score of 9/9 indicate high specificity for the glycosylation event and predict that the protein might have a stable glycosylation-mediated structure (Supplementary Table 4).

3.8. Protein modeling, disordered region analysis, and prediction of binding sites

Three-dimensional structures of four TaUSP genes, TaUSP4, TaUSP10, TaUSP21, and TaUSP30, chosen from the two groups, were modeled using the crystal structure of human IRAK1 (PDB_6BFN.1.A), the crystal structure of the USP At3g01520 from Arabidopsis thaliana (PDB_2GM3.1.A), and a hypothetical protein (PDB_1MJH.1.A), respectively. The secondary structure of all these proteins was predicted using the SOPMA web server; α-helices, β-sheets, extended strands, and random coils accounted for 31.91%–46.3%, 4.59%–6.63%, 9.85%–20.48%, and 37.22%–45.91%, respectively (Supplementary Table 5).

TaUSP4 and TaUSP30 have 19% and 15.2% disordered regions, whereas TaUSP10 and TaUSP21 have no disordered regions and contain only the functional domains. The predicted protein models were validated by Ramachandran plots (Supplementary Figs. 2, 3, 4, and 5). The favored regions were above 90% (Supplementary Table 6).

Two binding pockets were predicted in the models. The binding pockets contained several functional amino acids such as alanine (Ala), aspartate (Asp), asparagine (Asn), cysteine (Cys), glutamine (Gln), glycine (Gly), histidine (His), isoleucine (Ile), leucine (Leu), serine (Ser), threonine (Thr), valine (Val), tryptophan (Trp), and tyrosine (Tyr) (Fig. 8a, b, c, and d; Supplementary Table 7). The presence of serine and threonine residues in the binding pockets suggests that these might be sites for phosphorylation, and the presence of asparagine gives insight into N-glycosylation sites.

Universal stress proteins have been shown to be stimulated in response to several abiotic stresses. Previous studies have revealed USP gene families in different plants, including Oryza sativa japonica, Zea mays, Gossypium arboreum, Solanum lycopersicum, Solanum pennellii, pigeon pea, and Medicago falcata^[7]. Plants usually have twenty to fifty USP genes, whereas T. aestivum and Brassica napus have more than 100. The present study identified a total of 107 TaUSP genes in the wheat genome (Supplementary Table 1). Multiple sequence alignment revealed a conserved binding site and an ATP-binding motif of the form G-2X-G-9X-G(S/T)^[44].
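The ATP-binding motif G-2X-G-9X-G(S/T) can be turned into a pattern search. The sketch reads the motif as glycine, any two residues, glycine, any nine residues, glycine, then serine or threonine; that reading, the function name `find_atp_motifs`, and the toy sequence are assumptions for illustration and are not taken from the alignment in the study.

```python
import re

# G, 2 of any residue, G, 9 of any residue, G, then S or T.
# The lookahead allows overlapping occurrences to be found.
ATP_MOTIF = re.compile(r"(?=(G.{2}G.{9}G[ST]))")


def find_atp_motifs(protein_seq):
    """Return (1-based start, matched segment) for each motif occurrence."""
    return [(m.start() + 1, m.group(1))
            for m in ATP_MOTIF.finditer(protein_seq.upper())]


# Toy sequence built to contain exactly one occurrence.
toy = "AA" + "G" + "AA" + "G" + "A" * 9 + "G" + "S" + "AA"
matches = find_atp_motifs(toy)
```

Sequences lacking a match under this pattern would be candidates for the non-ATP-binding class, although such a call would need to be confirmed against the alignment.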

The phylogenetic tree showed two main clades that reflect the domain architecture. Group A contains a single USP domain, as in E. coli. Group B, on the other hand, has longer peptides as well as an additional functional kinase domain. Functional domains other than the protein kinase domain add diversity to the functioning of USP genes and help protect plants against a variety of abiotic stresses. Within these, there are also two classes, the ATP-binding class and the non-ATP-binding class^[45,46]. The presence of protein kinase and protein tyrosine kinase domains gives information on their phosphorylation properties in signaling pathways^[47]. The U-box domain is responsible for the ubiquitination of proteins; hence, its presence along with the USP and protein kinase domains suggests a role in proper folding^[48]. Structural diversity has also been verified by Arabia et al.^[10] (Figure 1).

The identified genes are spread all over the wheat genome. Structural analysis of the identified genes and their proteins also showed that domain architecture plays a vital role in their specific functioning^[49] (Figure 3a). Duplication analysis revealed 15 duplicated gene pairs, of which 14 were segmental duplications and only 1 was a tandem duplication (Fig. 3b). Tandem and segmental duplications play a significant role in the increased number of these genes in wheat. Further analysis revealed that a high number of orthologs are present between rice and wheat. Duplication contributed to the expansion of these genes within the monocot lineage^[34] (Fig. 3c). The Ka/Ks ratio gives insight into the selection pressure within the gene family. The results indicated negative selection pressure for all the duplication events over time^[50].

The GO annotation mapped the genes to several cellular components, biological processes, and molecular functions. The subcellular localization of these proteins was predominantly in the cytoplasm, chloroplast, and mitochondria, with some in the nucleus and a few in peroxisomes. This also shows their role in certain cellular pathways and their functioning in response to stress^[51]. The role of these genes in redox reactions has been shown in many studies^[52]. Studies on the expression pattern of these genes indicate regulation and modulation under different stresses. These findings agree with former studies on the stress-specific modulation and regulation of USP genes and on their overexpression, which resulted in increased thermo-tolerance and enhanced tolerance to osmotic stress^[52]. The TaUSPs might therefore have similar functions in response to stress. In support of this argument, we also found several stress-responsive regulatory elements in the upstream regions of the TaUSP genes. These include Sp1, ABRE, LTRs, ARE, MBS, GT motif, and TC-rich repeats. Previous studies on tomato USP genes show that a wide variety of these are induced by ABA, drought, salt, temperature, and ethylene stress^[53] (Fig. 6a, b, c, d, e, f, g, and h). OsUSP genes were upregulated in response to cold temperatures, whereas they were downregulated in response to ABA^[9]. Promoters of AtUSP were highly induced by different stressors and showed a multi-stress response. SbUSP of Salicornia brachiata was expressed in response to temperature, drought, and salt stress^[17]. In another study, cotton USP promoters were activated in response to heavy metals, salts, osmotic stress, and gibberellic acid stress^[54]. In general, these studies support differential modulation and regulation of TaUSP genes under abiotic stresses. Furthermore, these promoters can induce stress tolerance and are well suited to producing variable stress-responsive expression of these genes in transgenic plants (Fig. 7).

To understand this gene family in depth, the proteins encoded by these sequences were also studied. The proper functioning and stability of the protein products depend on post-translational modification mechanisms. The two major modifications studied for the TaUSPs were glycosylation and phosphorylation. Glycosylation plays a role in the proper folding of proteins, their strength, and signaling pathways^[55]. Phosphorylation has a vital role in protein activation and deactivation through alterations in conformation. It has a major function in signaling pathways and metabolic processes^[56].

Naturally, these modified proteins are reported to be significant in several biological processes. The studies have revealed that modifications in different plants occur in response to osmotic as well as cold stress. The predicted glycosylation sites give insight into the functional stability of these genes in response to stress^[55]. There are several studies in which phosphorylation of the proteins resulted in combating several stressors^[57,58]. The position-specific kinase activity of these proteins can help us further verify their roles using different laboratory techniques. The protein modeling results revealed that active binding pockets of the TaUSP contain several functional amino acids. The presence of alanine, valine, and aspartic acid within the ligand binding site predicts that it might attach hydrophobic chains as ligands^[59]. Figure 8.

In general, this study investigated the USP gene family in wheat. A total of 107 protein sequences on wheat’s ABD genome, grouped into 34 TaUSP genes, were reported. The focus of the study was a detailed investigation of the functional and structural attributes of the TaUSPs. It revealed a variety of functions among the TaUSP genes of wheat as well as their characteristic multi-stress nature. Expression analyses are still needed to confirm this multi-stress nature. In addition, further studies are needed to identify the diverse roles of these genes by producing knockout lines or by overexpressing TaUSP genes individually or in different combinations. The identified TaUSP genes can be used not only to improve wheat cultivars but are also potential candidates for developing overexpression lines and stress-tolerant crop varieties.

Data availability

The supporting datasets of the sequences are available on Ensembl Plants (http://plants.ensembl.org/), along with their UniProt IDs in Supplementary Table 1.

Author Contributions

All the authors contributed equally to this work. H.I. designed the research, performed the analyses, interpreted the results, and wrote the manuscript. R.A. and A.G. also designed the research and proofread the manuscript. R.Z.P. analyzed the results of the duplication analysis and protein modeling. F.M., R.A., and M.F.B. helped with result analysis and reviewed the write-up.

References

1. Sabagh, A. E. et al. in Plant Growth Regulators 1-38 (Springer, 2021).
2. Imran, Q. M., Falak, N., Hussain, A., Mun, B.-G. & Yun, B.-W. Abiotic Stress in Plants; Stress Perception to Molecular Response and Role of Biotechnological Tools in Stress Resistance. Agronomy 11, 1579 (2021).
3. Chi, Y. H. et al. The physiological functions of universal stress proteins and their molecular mechanism to protect plants from environmental stresses. Frontiers in plant science 10, 750 (2019).
4. Nyström, T. & Neidhardt, F. C. Cloning, mapping and nucleotide sequencing of a gene encoding a universal stress protein in Escherichia coli. Molecular microbiology 6, 3187-3198 (1992).
5. Freestone, P., Nyström, T., Trinei, M. & Norris, V. The universal stress protein, UspA, of Escherichia coli is phosphorylated in response to stasis. Journal of Molecular Biology 274, 318-324, doi:10.1006/jmbi.1997.1397 (1997).
6. Sousa, M. C. & McKay, D. B. Structure of the universal stress protein of Haemophilus influenzae. Structure 9, 1135-1141 (2001).
7. Isokpehi, R. D. et al. Identification of drought-responsive universal stress proteins in viridiplantae. Bioinformatics and biology insights 5, BBI.S6061 (2011).
8. Chi, Y. H. et al. The physiological functions of universal stress proteins and their molecular mechanism to protect plants from environmental stresses. Frontiers in plant science, 750 (2019).
9. Sauter, M., Rzewuski, G., Marwedel, T. & Lorbiecke, R. The novel ethylene‐regulated gene OsUsp1 from rice encodes a member of a plant protein family related to prokaryotic universal stress proteins. Journal of Experimental Botany 53, 2325-2331 (2002).
10. Arabia, S., Sami, A. A., Akhter, S., Sarker, R. H. & Islam, T. Comprehensive in-silico characterization of Universal Stress Proteins in rice (Oryza sativa L.) with insight into their stress-specific transcriptional modulation. Frontiers in plant science, 1589 (2021).
11. Bhuria, M., Goel, P., Kumar, S. & Singh, A. K. Genome-wide identification and expression profiling of genes encoding universal stress proteins (USP) identify multi-stress responsive USP genes in Arabidopsis thaliana. Plant Physiology Reports 24, 434-445 (2019).
12. Zahur, M. et al. Isolation and functional analysis of cotton universal stress protein promoter in response to phytohormones and abiotic stresses. Molecular biology 43, 578-585 (2009).
13. Gutiérrez-Beltrán, E., Personat, J. M., de la Torre, F. & Del Pozo, O. A universal stress protein involved in oxidative stress is a phosphorylation target for protein kinase CIPK6. Plant physiology 173, 836-852 (2017).
14. Gou, L., Zhuo, C., Lu, S. & Guo, Z. A Universal Stress Protein from Medicago falcata (MfUSP1) confers multiple stress tolerance by regulating antioxidant defense and proline accumulation. Environmental and Experimental Botany 178, 104168 (2020).
15. Sadder, M. T. et al. Characterization of putative salinity-responsive biomarkers in olive (Olea europaea L.). Plant Genetic Resources 19, 133-143 (2021).
16. Sinha, P. et al. Identification and validation of selected universal stress protein domain containing drought-responsive genes in Pigeonpea (Cajanus cajan L.). Frontiers in plant science 6, 1065 (2016).
17. Udawat, P., Mishra, A. & Jha, B. Heterologous expression of an uncharacterized universal stress protein gene (SbUSP) from the extreme halophyte, Salicornia brachiata, which confers salt and osmotic tolerance to E. coli. Gene 536, 163-170 (2014).
18. Melencion, S. M. B. et al. RNA chaperone function of a universal stress protein in Arabidopsis confers enhanced cold stress tolerance in plants. International journal of molecular sciences 18, 2546 (2017).
19. Zörb, C., Ludewig, U. & Hawkesford, M. J. Perspective on wheat yield and quality with reduced nitrogen supply. Trends in plant science 23, 1029-1037 (2018).
20. Levy, A. A. & Feldman, M. Evolution and origin of bread wheat. The Plant Cell, doi:10.1093/plcell/koac130 (2022).
21. Appels, R. et al. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361 (2018).
22. Tiwari, V. et al. in Abiotic stress management for resilient agriculture 313-337 (Springer, 2017).
23. Sakai, H. et al. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics. Plant & cell physiology 54, e6, doi:10.1093/pcp/pcs183 (2013).
24. Yates, A. D. et al. Ensembl Genomes 2022: an expanding genome resource for non-vertebrates. Nucleic acids research 50, D996-D1003 (2022).
25. Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Research 49, D412-D419, doi:10.1093/nar/gkaa913 (2020).
26. Gasteiger, E. et al. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31, 3784-3788, doi:10.1093/nar/gkg563 (2003).
27. Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947-2948, doi:10.1093/bioinformatics/btm404 (2007).
28. Nicholas, K. B. Genedoc: a tool for editing and annotating multiple sequence alignments. http://www.psc.edu/biomed/genedoc (1997).
29. Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37, 1530-1534, doi:10.1093/molbev/msaa015 (2020).
30. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic acids research 49, W293-W296 (2021).
31. Hu, B. et al. GSDS 2.0: an upgraded gene feature visualization server. Bioinformatics 31, 1296-1297 (2015).
32. Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME suite. Nucleic acids research 43, W39-W49 (2015).
33. Chen, C. et al. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Molecular plant 13, 1194-1202 (2020).
34. Panchy, N., Lehti-Shiu, M. & Shiu, S.-H. Evolution of gene duplication in plants. Plant physiology 171, 2294-2316 (2016).
35. Mi, H., Muruganujan, A., Ebert, D., Huang, X. & Thomas, P. D. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic acids research 47, D419-D426 (2019).
36. Horton, P. et al. WoLF PSORT: protein localization predictor. Nucleic acids research 35, W585-W587 (2007).
37. Lescot, M. et al. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic acids research 30, 325-327 (2002).
38. Gupta, R., Jung, E. & Brunak, S. NetNGlyc 1.0 Server. Center for Biological Sequence Analysis, Technical University of Denmark. Available from: http://www.cbs.dtu.dk/services/NetNGlyc (2004).
39. Blom, N., Sicheritz‐Pontén, T., Gupta, R., Gammeltoft, S. & Brunak, S. Prediction of post‐translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4, 1633-1649 (2004).
40. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Research 46, W296-W303, doi:10.1093/nar/gky427 (2018).
41. Geourjon, C. & Deleage, G. SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Bioinformatics 11, 681-684 (1995).
42. Piovesan, D. et al. MobiDB: intrinsically disordered proteins in 2021. Nucleic Acids Res 49, D361-D367, doi:10.1093/nar/gkaa1058 (2021).
43. Volkamer, A., Griewel, A., Grombacher, T. & Rarey, M. Analyzing the Topology of Active Sites: On the Prediction of Pockets and Subpockets. Journal of Chemical Information and Modeling 50, 2041-2052, doi:10.1021/ci100241y (2010).
44. Isokpehi, R. D. et al. Developmental regulation of genes encoding universal stress proteins in Schistosoma mansoni. Gene regulation and systems biology 5, GRSB.S7491 (2011).
45. Li, W.-T. et al. Identification, localization, and characterization of putative USP genes in barley. Theoretical and Applied Genetics 121, 907-917 (2010).
46. Wang, X.-F. et al. Functional characterization of selected universal stress protein from Salvia miltiorrhiza (SmUSP) in Escherichia coli. Genes 8, 224 (2017).
47. Andjelkovic, M. et al. Role of translocation in the activation and function of protein kinase B. Journal of Biological Chemistry 272, 31515-31524 (1997).
48. Wiborg, J., O'Shea, C. & Skriver, K. Biochemical function of typical and variant Arabidopsis thaliana U-box E3 ubiquitin-protein ligases. Biochemical Journal 413, 447-457 (2008).
49. Tkaczuk, K. L. et al. Structural and functional insight into the universal stress protein family. Evolutionary applications 6, 434-449 (2013).
50. Huang, D., Mao, Y., Guo, G., Ni, D. & Chen, L. Genome-wide identification of PME gene family and expression of candidate genes associated with aluminum tolerance in tea plant (Camellia sinensis). BMC Plant Biology 22, 1-13 (2022).
51. Cui, X. et al. Genome-wide analysis of the Universal stress protein A gene family in Vitis and expression in response to abiotic stress. Plant Physiology and Biochemistry 165, 57-70 (2021).
52. Gonzali, S. et al. Universal stress protein HRU1 mediates ROS homeostasis under anoxia. Nature Plants 1, 1-9 (2015).
53. Loukehaich, R. et al. SpUSP, an annexin-interacting universal stress protein, enhances drought tolerance in tomato. Journal of experimental botany 63, 5593-5606 (2012).
54. Gorshkova, D. & Pojidaeva, E. Members of the Universal Stress Protein Family are Indirectly Involved in Gibberellin-Dependent Regulation of Germination and Post-Germination Growth. Russian Journal of Plant Physiology 68, 451-462 (2021).
55. Jayaprakash, N. G. & Surolia, A. Role of glycosylation in nucleating protein folding and stability. Biochemical Journal 474, 2333-2347 (2017).
56. Proud, C. G. Phosphorylation and signal transduction pathways in translational control. Cold Spring Harbor perspectives in biology 11, a033050 (2019).
57. Damaris, R. N. & Yang, P. Protein phosphorylation response to abiotic stress in plants. Plant Phosphoproteomics, 17-43 (2021).
58. Zhu, M. et al. Insights into the trihelix transcription factor responses to salt and other stresses in Osmanthus fragrans. BMC genomics 23, 1-18 (2022).
59. Singh, L. R., Poddar, N. K., Dar, T. A., Kumar, R. & Ahmad, F. Protein and DNA destabilization by osmolytes: the other side of the coin. Life sciences 88, 117-125 (2011).

No competing interests reported.

Supplementaryfile.docx
