Protein sequence resources
The almond proteome (Prunus dulcis, strain cv. Texas) was retrieved from the publicly available UniProt database (www.uniprot.org). The proteome is distributed over 8 chromosomes carrying 6618, 3668, 3486, 3294, 2745, 4196, 3166 and 3074 proteins, respectively, with a further 1764 unassembled proteins. This gives a total of 31,932 proteins in the complete almond proteome (Alioto, 2020).
Identification of potential allergen proteins
To discover potential allergens in the Prunus dulcis proteome, AllergenOnline version 21 was used (www.allergenonline.org). This database contains 2233 protein sequences (913 groups, 430 species) and is maintained by the Food Allergy Research and Resource Program (FARRP), University of Nebraska–Lincoln. For identification, each FASTA sequence was searched against the database using the Full FASTA 36 search method provided within the AllergenOnline toolbox. The search criteria were an E-value < 1E-07 and identity > 50%, and the maximum number of alignments was left at the default setting.
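The two search criteria can be expressed as a simple filter over parsed search hits. The following is a minimal sketch, assuming the hits have already been exported and parsed into records; the `Hit` class, its field names and the example values are hypothetical and are not part of the AllergenOnline toolbox.

```python
from dataclasses import dataclass

E_VALUE_MAX = 1e-7    # E-value threshold stated in the text
IDENTITY_MIN = 50.0   # percent identity threshold stated in the text


@dataclass
class Hit:
    """One query/allergen alignment (hypothetical record layout)."""
    query_id: str        # almond protein identifier
    allergen_id: str     # matched database allergen
    e_value: float
    identity_pct: float


def is_potential_allergen(hit: Hit) -> bool:
    # Both criteria must hold; both comparisons are strict, as in the text.
    return hit.e_value < E_VALUE_MAX and hit.identity_pct > IDENTITY_MIN


def potential_allergens(hits):
    """Return the sorted, unique query IDs with at least one passing hit."""
    return sorted({h.query_id for h in hits if is_potential_allergen(h)})


# Made-up example values, for illustration only.
example_hits = [
    Hit("Q1", "A1", 1e-30, 82.5),
    Hit("Q2", "A2", 1e-5, 75.0),    # rejected: E-value too high
    Hit("Q3", "A3", 1e-20, 50.0),   # rejected: identity is not above 50%
    Hit("Q4", "A1", 1e-9, 61.0),
]
print(potential_allergens(example_hits))  # ['Q1', 'Q4']
```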
Motif based screening of potential cross-reactive protein sequences
The identified potential allergens were screened further. Here, allergen sequences selected from the AllFam database were used as references for screening. AllFam comprises 151 families with 1059 allergens in total and is maintained by the Medical University of Vienna. Reference allergens were chosen on two criteria: first, their route of exposure and, second, their source. Motifs in each query sequence were compared with those found in the reference sequences, and the sequences that matched most closely were selected.
Classification of putative allergens into protein families along with their gene ontology (GO)
The retrieved sequences were submitted to the Pfam database (www.pfam.xfam.org) and the Conserved Domain Database (www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) for protein family analysis. Pfam 34.0 contains 19179 families (as of May 2021) and is produced and maintained by the European Bioinformatics Institute. Protein families were retrieved using the hidden Markov model method (www.ebi.ac.uk/Tools/hmmer/search/hmmscan). Gene Ontology (GO) analysis was carried out to determine the biological process, molecular function or cellular component associated with each discovered potential allergen. GO accession numbers of the retrieved allergens were obtained from the InterPro database (www.ebi.ac.uk/interpro). To plot and visualize the ontology data, the GO accessions were submitted to Web Gene Ontology Annotation Plot (WEGO, www.wego.genomics.cn).
Orthology modelling and phylogeny
The Galaxy server (https://usegalaxy.eu/) was used to analyze orthologs of the putative allergens. Orthologs were identified with the Proteinortho tool available on the Galaxy server (Lechner, 2011), which detects orthologous genes or proteins across different species. In essence, it compares the similarities between the supplied sequences (genes or proteins) and clusters them into significant groups. The DIAMOND algorithm with an E-value threshold of 1E-03 was used for the search. Orthology analysis was carried out using the proteomes of the related species apricot (Prunus armeniaca), cherry (Prunus avium) and peach (Prunus persica). Orthology groups containing allergenic proteins were examined and selected. A phylogenetic tree of the selected sequences was built using Phylogeny.fr (Dereeper, 2010).
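Selecting the orthology groups that contain allergenic proteins can be sketched as a pass over the tabular orthology output. The example below assumes a tab-separated layout with three leading summary columns followed by one column per input proteome, comma-separated identifiers within a cell and `*` for a species with no member. This layout, the file names and all identifiers are assumptions for illustration and should be checked against the actual Proteinortho output.

```python
import csv
import io


def groups_with_allergens(table_text, allergen_ids):
    """Return the orthology groups that contain at least one allergen ID.

    Each group is returned as a dict mapping a species column name to the
    list of protein IDs from that species (empty list if none).
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    species = header[3:]  # assumed: the first three columns are summary statistics
    selected = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        group = {
            name: ([] if cell == "*" else cell.split(","))
            for name, cell in zip(species, row[3:])
        }
        members = {pid for ids in group.values() for pid in ids}
        if members & allergen_ids:
            selected.append(group)
    return selected


# Hypothetical table with made-up identifiers.
rows = [
    ["# Species", "Genes", "Alg.-Conn.", "almond.faa", "peach.faa", "apricot.faa", "cherry.faa"],
    ["4", "4", "1", "Pd01", "Pp01", "Pa01", "Pav01"],
    ["2", "2", "1", "Pd02", "*", "Pa02", "*"],
    ["3", "4", "0.5", "Pd03,Pd04", "Pp03", "*", "Pav03"],
]
table = "\n".join("\t".join(r) for r in rows)

selected = groups_with_allergens(table, allergen_ids={"Pd01", "Pd04"})
for group in selected:
    print(group)
```

The selected groups can then be exported as sequence sets for tree building.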
B-cell epitope prediction
The Immune Epitope Database (IEDB; https://www.iedb.org/) was used for prediction of B-cell epitopes. IEDB contains experimental data from antibody and T-cell epitope studies conducted in humans and other animals. The majority of the available data come from infectious disease, allergy and autoimmunity studies. Linear B-cell epitopes were predicted using BepiPred 2.0 and ABCpred (Nemati Zargaran, 2021). The BepiPred 2.0 web server uses a random forest algorithm, whereas ABCpred uses an artificial neural network (ANN) to predict B-cell epitopes. Thereafter, epitope conservancy analysis was carried out to find the epitopes conserved among the allergens and the non/putative-allergen proteins.
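The idea behind epitope conservancy can be illustrated as the fraction of protein sequences that contain a given linear epitope at or above a chosen identity level. The following is a minimal sketch of that idea using ungapped window comparison; it is not the implementation of the IEDB tool, and the identity threshold, epitope and sequences are assumptions, since the text does not state them.

```python
def best_identity(epitope, sequence):
    """Highest fraction of matching residues over all ungapped windows."""
    k = len(epitope)
    if k == 0 or len(sequence) < k:
        return 0.0
    best = 0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches)
    return best / k


def conservancy(epitope, sequences, min_identity=1.0):
    """Fraction of sequences that contain the epitope at >= min_identity."""
    if not sequences:
        return 0.0
    hits = sum(best_identity(epitope, s) >= min_identity for s in sequences)
    return hits / len(sequences)


# Made-up epitope and sequences, for illustration only.
sequences = ["AAKLMNPGG", "TTKLMQPAA", "GGGGGGGGG"]
print(conservancy("KLMNP", sequences))                    # exact matches only
print(conservancy("KLMNP", sequences, min_identity=0.8))  # allows one mismatch in five
```

With exact matching only the first sequence counts, while relaxing the threshold to 80% also admits the second sequence, which differs at one position.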
Structure prediction by molecular modelling
The SWISS-MODEL server (www.swissmodel.expasy.org/) was used for structure modelling of the selected potential allergens. SWISS-MODEL is a web server maintained by the Swiss Institute of Bioinformatics and the Biozentrum of the University of Basel. The target sequence was uploaded together with a project title to search for templates. Among the suggested templates, one was selected on the basis of its coverage, GMQE (Global Model Quality Estimate), sequence identity and the method of structure determination, and was used to build the model of the target sequence. The global quality estimate, local quality estimate and Z-score of the predicted model were used to assess its quality. Structure quality was also assessed with the PROCHECK tool. The model with the best quality indices was selected for further analysis.
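Template selection on several criteria can be sketched as a filter followed by a sort. The text names the criteria but not their relative weight, so the ordering used here (GMQE first, then identity, then coverage), the minimum coverage cut-off, the `Template` record and all example values are assumptions made purely for illustration.

```python
from typing import NamedTuple


class Template(NamedTuple):
    """Summary of one suggested template (hypothetical record layout)."""
    template_id: str
    coverage: float   # fraction of the target sequence covered, 0 to 1
    gmqe: float       # Global Model Quality Estimate, 0 to 1
    identity: float   # percent sequence identity to the target
    method: str       # experimental method, e.g. "X-ray" or "NMR"


def rank_templates(templates, min_coverage=0.0):
    """Drop templates below min_coverage, then sort best-first.

    The sort key (GMQE, identity, coverage) is an assumed priority order.
    """
    eligible = [t for t in templates if t.coverage >= min_coverage]
    return sorted(
        eligible,
        key=lambda t: (t.gmqe, t.identity, t.coverage),
        reverse=True,
    )


# Made-up candidates, for illustration only.
candidates = [
    Template("tmplA", 0.95, 0.78, 64.2, "X-ray"),
    Template("tmplB", 0.60, 0.81, 70.0, "X-ray"),
    Template("tmplC", 0.97, 0.78, 58.9, "NMR"),
]
ranked = rank_templates(candidates, min_coverage=0.8)
print([t.template_id for t in ranked])  # ['tmplA', 'tmplC']
```

In practice the final choice also involves manual inspection of the experimental method and resolution, which a simple sort does not capture.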
Molecular docking
The ClusPro 2.0 server (www.cluspro.bu.edu/home.php) was used for molecular docking. It is an online server created and maintained by the Vajda lab at Boston University. The structure of the finally selected potential allergen was uploaded as the ligand, whereas an antibody structure obtained from the Protein Data Bank was uploaded as the receptor. For antibody-protein docking, the antibody mode provided by ClusPro was used. For more precise results, the non-CDR regions of the antibody were masked.
Molecular dynamics simulation studies
MD simulations of the best docked complex, i.e. the one with the most negative energy, were carried out using the GROMACS 2020.1 software package. The simulation system consisted of protein and water. All MD simulations were executed using the CHARMM36 force field (Huang, 2016). A triclinic box was created around the protein-protein complex and solvated with the TIP3P water model. The system carried a net positive charge, so three chloride (Cl-) ions were added to neutralize it. After energy minimization, position restraints were applied, followed by NVT and NPT equilibration. NVT equilibration (constant number of particles, volume and temperature) was performed for 100 ps at 300 K with a coupling constant of 0.1 ps, using protein and non-protein as the coupling groups. NPT equilibration (constant number of particles, pressure and temperature) was performed for 100 ps at a pressure of 1 bar with a coupling constant of 2.0 ps and the same coupling groups. Finally, an MD simulation of 25 ns was executed (Lemkul, 2018).
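The equilibration and production settings can be translated into GROMACS run parameters once a time step is fixed. The sketch below assumes a 2 fs time step, the V-rescale thermostat, and the Parrinello-Rahman barostat with isotropic coupling and a water-like compressibility; none of these choices is stated in the text, and they are marked as assumptions in the comments. The temperature, pressure, coupling constants, coupling groups and durations are those given in the text.

```python
DT_PS = 0.002  # ASSUMED time step of 2 fs; the text does not state it


def n_steps(duration_ps, dt_ps=DT_PS):
    """Number of integration steps needed to cover duration_ps."""
    return round(duration_ps / dt_ps)


def render_mdp(params):
    """Format a parameter dict as 'key = value' lines for an .mdp file."""
    width = max(len(key) for key in params)
    return "\n".join(f"{key.ljust(width)} = {value}" for key, value in params.items())


nvt = {
    "integrator": "md",
    "dt": DT_PS,
    "nsteps": n_steps(100),            # 100 ps of NVT equilibration
    "tcoupl": "V-rescale",             # ASSUMED thermostat
    "tc-grps": "Protein Non-Protein",  # coupling groups from the text
    "tau_t": "0.1 0.1",                # 0.1 ps coupling constant per group
    "ref_t": "300 300",                # 300 K per group
    "pcoupl": "no",                    # no pressure coupling in NVT
}

# NPT reuses the temperature coupling and adds pressure coupling.
npt = dict(
    nvt,
    pcoupl="Parrinello-Rahman",  # ASSUMED barostat
    pcoupltype="isotropic",      # ASSUMED coupling type
    tau_p=2.0,                   # 2.0 ps coupling constant from the text
    ref_p=1.0,                   # 1 bar from the text
    compressibility=4.5e-5,      # ASSUMED value for water, in bar^-1
)

production_steps = n_steps(25_000)  # 25 ns = 25,000 ps

print(render_mdp(nvt))
print()
print(render_mdp(npt))
print()
print("equilibration steps:", nvt["nsteps"])  # 50000
print("production steps:", production_steps)  # 12500000
```

With a 2 fs step, each 100 ps equilibration phase corresponds to 50,000 steps and the 25 ns production run to 12,500,000 steps; a different time step would change these counts proportionally.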