Structural and Functional Basis of Red Palm Weevil: Pathogen Interactions Leading Towards Enhanced Crop Production in Date Palms


Food safety remains a significant challenge despite advances in agricultural research and the advent of modern biotechnological and agricultural tools. Although agriculturists strive to meet the needs of a growing population, many pathogen-borne plant diseases, through their direct impact on cell division and tissue development, cause the loss of tons of food crops every year. Conventional and traditional methods exist to address this problem, but the resources and time they require are considerable. Systems biology tools have been developed to study the root cause of the problem and address it. Host-pathogen protein interactions (HPIs) play a promising role in identifying the strategies pathogens use to overcome the host organism. In this paper, the interactions between the host Rhynchophorus ferrugineus (an invasive wood-boring pest that destroys palms) and the pathogens Proteus mirabilis, Serratia marcescens, and Klebsiella pneumoniae are comprehensively studied using protein-protein interaction prediction, molecular docking, and 200 ns molecular dynamics simulations. This study elucidates the structural and functional basis of these interactions, leading towards better plant health, production, and reliability.


Introduction
Phoenix dactylifera L. (Arecaceae), the date palm, is an ancient plant that has provided highly nutritious food for the people of the Indus Valley and the Middle East for nearly 5000 years (1,2). Date palms are considered a source of livelihood security for rural farmers, as they are extensively grown as economic or ornamental trees (3,4). Rhynchophorus ferrugineus (Olivier 1790), the red palm weevil (RPW), is the most devastating pest of palm trees worldwide and belongs to the superfamily Curculionoidea of the order Coleoptera. It is native to Melanesia and Southeast Asia, but accidental anthropogenic introductions into the Caribbean, the USA, the Mediterranean Basin, and the Middle East have spread it across the globe (5). These insects became herbivorous crop pests during evolution through specialized interactions with their host plants (6,7). The distinct genetic makeup and life cycle of the RPW, together with its cryptic nature, make the pest hard to detect and therefore difficult to control (8). The Food and Agriculture Organization of the United Nations has identified the RPW as a category-1 quarantine pest in the Middle East and North Africa (MENA) region because it threatens date palm farmers (9). Currently, around 40 palm species are affected by R. ferrugineus globally (10). The weevil uses the date palm for feeding and mating, which is regulated by a male-produced pheromone consisting of 4(RS)-methylnonan-5-one (ferrugineone) and (4RS, 5RS)-4-methylnonan-5-ol (ferrugineol) (11).
Host-pathogen protein interactions (HPIs) are highly complex processes, often involved in the pathogen's strategy to invade the host, breach its immune defenses, and multiply and persist within it (12,13). Traditional biological research, such as isolating and studying small sets of components (14), and other experimental procedures for identifying interacting proteins, such as binary approaches and co-complex methods (15), are expensive and time-consuming (16) and may not provide adequate insight into HPIs on a larger scale (14). To overcome this, systems biology approaches, which have been developed to improve the accuracy, coverage, and efficiency of recognizing interacting protein pairs, are an emerging tool for understanding the mechanisms underlying HPIs (17).
Indeed, several different systems biology approaches have demonstrated their effectiveness (18). These approaches depend on unbiased and complete knowledge of the host and pathogen transcriptomes and use high-throughput data to predict protein-protein interactions (PPIs) (19). Further computational analyses of protein sequence, structural, and genomic profiles have partially revealed the mechanisms of interaction and the functional relationships (20,21) between host and pathogen, leading to a deeper understanding of the infection process (22). Techniques such as gene neighbor and gene cluster methods (23,24), interologs (25,26), and phylogenetic profiles (27,28) are used for recognizing both PPIs (29,30) and host-pathogen protein interactions (31,32).
Based on the literature, several microorganisms, such as Bacillus thuringiensis (33), Metarhizium anisopliae, and Beauveria bassiana (34,35), have been suggested to play an essential role in controlling the weevil. We hypothesize that Proteus mirabilis, Serratia marcescens, and Klebsiella pneumoniae interact directly with the host (RPW) through protein-protein interactions and might therefore arrest the growth of the RPW and eventually lead to its death. In this study, we aim to identify the proteins most likely to interact, predict their functions and 3D structures, dock the interacting proteins, and finally perform molecular dynamics simulations. Our goal is to find the top candidate enzymes that could be used as a safe bio-pesticide to kill the RPW.

Host and Pathogen Sequence data retrieval
The host protein sequences (GenBank assembly accession GCA_014462685.1) were retrieved from the NCBI website (36) at https://www.ncbi.nlm.nih.gov/assembly/GCA_014462685.1/#/st by clicking "download assembly", selecting "GenBank", and choosing the file type "protein fasta (.faa)". The downloaded set contains more than 25,000 hypothetical proteins, that is, proteins whose functions are not yet annotated. Three pathogens, namely Proteus mirabilis, Serratia marcescens, and Klebsiella pneumoniae, were chosen for the study. The protein sequences of these three pathogens were downloaded in FASTA format from NCBI.
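As an illustration of how the share of unannotated entries in such a download could be checked, the following minimal Python sketch parses FASTA-formatted text and counts headers containing the phrase "hypothetical protein". The function names, the header-matching rule, and the example records (accessions and sequences) are assumptions made for illustration and are not taken from the study.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_hypothetical(records: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (total, hypothetical) counts, judged from the header text only."""
    total = hypothetical = 0
    for header, _sequence in records:
        total += 1
        if "hypothetical protein" in header.lower():
            hypothetical += 1
    return total, hypothetical


# Tiny in-memory example; accessions and sequences are made up.
example = """>EXAMPLE_0001.1 hypothetical protein [Rhynchophorus ferrugineus]
MKTAYIAKQR
QISFVKSHFS
>EXAMPLE_0002.1 annotated example protein [Rhynchophorus ferrugineus]
MSLLRRAV
""".splitlines()

total, hypothetical = count_hypothetical(read_fasta(example))
print(f"{hypothetical} of {total} records are hypothetical proteins")
```

For a real download, the same functions could be applied to an open file handle on the .faa file instead of the in-memory example.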

Protein-Protein Interaction
The PredHPI tool (http://bioinfo.usu.edu/PredHPI/) was used to predict and study the protein-protein interactions. For both the host and the pathogen sequences, the sequence identity and coverage thresholds were set to 90% and 80%, respectively, to obtain the best hits.
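The effect of the identity and coverage thresholds can be sketched as follows. This is not PredHPI's implementation: it assumes an interolog-style transfer in which a host-pathogen pair is predicted when both proteins have a sufficiently similar homolog in a known interacting template pair. The `Hit` record, the function names, and all identifiers in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class Hit:
    """One homology hit of a query protein against a template protein."""
    query_id: str
    template_id: str
    identity_pct: float
    aligned_len: int
    query_len: int

    @property
    def coverage_pct(self) -> float:
        # Fraction of the query covered by the alignment, in percent.
        return 100.0 * self.aligned_len / self.query_len


def passes(hit: Hit, min_identity: float = 90.0, min_coverage: float = 80.0) -> bool:
    """Apply the 90% identity / 80% coverage thresholds quoted in the text."""
    return hit.identity_pct >= min_identity and hit.coverage_pct >= min_coverage


def predict_pairs(
    host_hits: Iterable[Hit],
    pathogen_hits: Iterable[Hit],
    known_pairs: Set[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Transfer known template interactions to host-pathogen pairs (assumed scheme)."""
    host_ok = [h for h in host_hits if passes(h)]
    pathogen_ok = [p for p in pathogen_hits if passes(p)]
    return sorted(
        (h.query_id, p.query_id)
        for h in host_ok
        for p in pathogen_ok
        if (h.template_id, p.template_id) in known_pairs
    )


# Made-up hits: only host_A and path_X satisfy both thresholds.
host_hits = [Hit("host_A", "T1", 95.0, 120, 128), Hit("host_B", "T1", 85.0, 100, 100)]
pathogen_hits = [Hit("path_X", "T2", 92.0, 170, 200), Hit("path_Y", "T2", 99.0, 70, 100)]
pairs = predict_pairs(host_hits, pathogen_hits, {("T1", "T2")})
print(pairs)
```

Stricter thresholds reduce the number of predicted pairs but make each transferred interaction more plausible, which is the trade-off behind selecting only the best hits.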

Homology Modeling and Protein-protein Molecular Docking
Structures of the best hits for the host proteins and their respective pathogen partners were modeled using the Phyre2 web-server (37). All sequences were retrieved from NCBI and subjected to homology modeling in intensive mode. After homology modeling, prey-predator protein-protein molecular docking was performed using the HawkDock web-server (38).

System setup
The protein-protein complexes were prepared for molecular dynamics (MD) simulations. A cumulative 600 ns of simulation was performed with the AMBER ff14SB force field (39). Solvated systems were prepared using the pdb2gmx module of Gromacs 2020.4 (40), and the MD simulations were run with the same version. Each system consisted of the protein complex, TIP3P water, and Na+/Cl- counter ions. Each system underwent energy minimization for around 3200 steps, followed by 1000 ps of equilibration. The MD production run was 200 ns per complex, for a cumulative 600 ns of production across the three complexes. The NPT ensemble was used (1 bar) with a time step of 2 fs. The temperature was set to 300 K with a low damping coefficient, while pressure was controlled using the Nose-Hoover Langevin piston. Electrostatics were calculated using the particle mesh Ewald (PME) method. A cut-off of 12 Å was applied to short-range electrostatic and van der Waals interactions. All simulations were replicated twice with initialized random seeds to obtain average values.
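The run lengths quoted above translate into integration step counts as shown in this short Python sketch. The printed lines use standard Gromacs .mdp parameter names, but they form an illustrative fragment derived from the stated settings, not the authors' actual input file; thermostat and barostat options are deliberately omitted because the text does not give them in Gromacs terms.

```python
def steps_for(duration_ps: float, dt_ps: float) -> int:
    """Number of integration steps needed to cover duration_ps with step dt_ps."""
    return round(duration_ps / dt_ps)


DT_PS = 0.002            # 2 fs time step, in ps
EQUILIBRATION_PS = 1000  # 1000 ps equilibration
PRODUCTION_NS = 200      # production length per complex
N_COMPLEXES = 3          # K. pneumoniae, P. mirabilis, S. marcescens

equilibration_steps = steps_for(EQUILIBRATION_PS, DT_PS)
production_steps = steps_for(PRODUCTION_NS * 1000, DT_PS)
total_ns = PRODUCTION_NS * N_COMPLEXES

# Illustrative .mdp fragment (assumption: cut-offs converted from 12 Å to nm).
mdp_fragment = {
    "integrator": "md",
    "dt": DT_PS,
    "nsteps": production_steps,
    "coulombtype": "PME",
    "rcoulomb": 1.2,
    "rvdw": 1.2,
    "ref_t": 300,
    "ref_p": 1.0,
}

print(f"equilibration steps: {equilibration_steps}")
print(f"production steps per complex: {production_steps}")
print(f"cumulative production time: {total_ns} ns")
for key, value in mdp_fragment.items():
    print(f"{key:<12}= {value}")
```

With a 2 fs step, each 200 ns production run corresponds to 100 million steps, and the three complexes together account for the 600 ns cumulative total.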

Data analyses
Trajectories were analyzed using the analysis tools implemented in Gromacs (40), and data were plotted using GnuPlot (http://gnuplot.info). We calculated the root-mean-square fluctuation (RMSF) of the Cα atoms of all residues and tracked structural changes through the root-mean-square deviation (RMSD) throughout the simulation run. Hydrogen bonds, involving both backbone and side-chain atoms, were identified with a donor-acceptor distance cut-off of 3.2 Å. Other quantities, such as the radius of gyration (Rg), solvent-accessible surface area (SASA), and hydrogen-bond formation upon binding, were also calculated using Gromacs tools. RMSD, RMSF, total energy, SASA, radius of gyration, and hydrogen bonds were plotted using GnuPlot.
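For readers unfamiliar with these quantities, the following Python sketch shows textbook definitions of RMSD, RMSF, radius of gyration, and a distance-only hydrogen-bond criterion. It is a simplified illustration rather than the Gromacs implementation: it assumes that frames are already superimposed on the reference, that all atoms have equal mass, and that coordinates are given in Å. All function names and the toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Frame = Sequence[Vec]


def _sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def rmsd(frame: Frame, reference: Frame) -> float:
    """RMSD of one frame from a reference (no fitting is performed here)."""
    total = sum(_sq_dist(a, b) for a, b in zip(frame, reference))
    return math.sqrt(total / len(reference))


def rmsf(frames: Sequence[Frame]) -> List[float]:
    """Per-atom fluctuation around the time-averaged position."""
    n_frames, n_atoms = len(frames), len(frames[0])
    result = []
    for i in range(n_atoms):
        mean = tuple(sum(f[i][k] for f in frames) / n_frames for k in range(3))
        result.append(math.sqrt(sum(_sq_dist(f[i], mean) for f in frames) / n_frames))
    return result


def radius_of_gyration(frame: Frame) -> float:
    """Radius of gyration assuming equal atomic masses (simplification)."""
    n = len(frame)
    centroid = tuple(sum(a[k] for a in frame) / n for k in range(3))
    return math.sqrt(sum(_sq_dist(a, centroid) for a in frame) / n)


def within_hbond_cutoff(donor: Vec, acceptor: Vec, cutoff: float = 3.2) -> bool:
    """Distance-only hydrogen-bond test with the 3.2 Å cut-off from the text."""
    return math.sqrt(_sq_dist(donor, acceptor)) <= cutoff


# Toy trajectory: two atoms, two frames (coordinates in Å, made up).
frame0 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 0.0), (2.0, 2.0, 0.0)]

print(rmsd(frame1, frame0))
print(rmsf([frame0, frame1]))
print(radius_of_gyration(frame0))
print(within_hbond_cutoff((0.0, 0.0, 0.0), (3.0, 0.0, 0.0)))
```

A full hydrogen-bond analysis normally also applies an angle criterion; only the distance part is shown because that is the only criterion stated in the text.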

Results And Discussion
Protein-protein interactions for prey and predator systems
Based on the predictions from the PredHPI tool, several host and pathogen proteins were found to interact with each other. The most critical interacting proteins of the host and the pathogens are listed in Table 1 and Table 2, respectively, with their designated functions (based on functional analysis of similar proteins). Tables 3, 4, and 5 show the interaction profile and similarity score between the host proteins and those of the three pathogens (Klebsiella pneumoniae, Proteus mirabilis, and Serratia marcescens). The protein KAF7276687.1 is targeted by all three pathogens, which suggests that it plays a crucial role in the host. As per Table 1, the homolog of this protein is P0ACD5, annotated as the iron-sulfur cluster assembly scaffold protein IscU. This annotation suggests that the protein contributes to the structural integrity of the host organism.
Homology modeling of all unknown proteins of host and pathogens
All unknown host and pathogen proteins were modeled by homology using the Phyre2 web-server (37). The homology model of the protein WP_004244046.1 was built using chain D of the template 1W7V (41). Another pathogen protein, WP_001062737.1, was modeled using the crystal structure of a DUF3571 family protein (ABAYE3784) from Acinetobacter baumannii AYE at 1.95 Å resolution (PDB ID: 4L3U-A). The remaining information is given in Table 6.

Molecular docking
After homology models of all unknown proteins were obtained, molecular docking was performed between the most common and most critical interacting proteins of the host and the predator (data provided in Table 7) to understand the mechanism of interaction between the two proteins. The docking score was recorded as the measure of docking interaction. The best docking score was observed between the host protein KAF7276687.1 and the pathogen protein WP_016929764.1. The probable functions of the two proteins are consistent with this stronger binding: KAF7276687.1 is a probable iron-sulfur cluster assembly scaffold protein (IscU), and WP_016929764.1 is a probable master enzyme that delivers sulfur to several partners involved in Fe-S cluster assembly. A similar score was obtained for the interacting proteins KAF7276687.1 and YP_005228233.1, which have the same probable functions as those just discussed. Table 7 shows that pairs with the same probable functions achieved similar docking scores.

Molecular dynamics simulations
The best docking pose with favorable interactions was selected for molecular dynamics simulations to study long-range/short-range
To determine the structural stability of the complexes, we also monitored the solvent-accessible surface area (SASA), as shown in Figure 3 (right panel). All three complexes had a similar average SASA value of about 240 nm².
These results suggest that the complexes are very stable: the RMSD, Rg, and SASA analyses of conformational dynamics were consistent with one another, indicating a stable interaction of the protein complexes over time.
The number of hydrogen bonds formed between the two proteins of each protein-protein complex during the simulation was plotted, indicating effective interaction between the two proteins over the 200 ns of simulation (Figure 4, left panel). The high number of hydrogen bonds (around 400) indicates significant interactions within the K. pneumoniae complex during the simulation. Together, these results indicate highly stable binding of the K. pneumoniae complex proteins, and thus considerable potential to modulate the structural conformation of the complex. The P. mirabilis and S. marcescens complexes show comparably strong and stable conformations.
The number of H-bonds between solute and solvent is higher for the K. pneumoniae complex (around 1160) than for the other two complexes (around 1130 for both P. mirabilis and S. marcescens). This suggests that the K. pneumoniae complex has a greater chance of interacting freely with its environment and adapting to conformational changes during the simulation (Figure 4, right panel).
To further explore the dynamic properties of the investigated structures in our simulations, essential dynamics (ED) analysis was performed on the backbone atoms (48). ED reflects the overall conformational space of the protein-protein complex during the simulations (Figure 5, left panel). The projection of the trajectories onto the first two principal components (PC1, PC2) shows the motion of the investigated systems in phase space. In the ED analysis, all three complexes show similar phase-space behavior and explore the same conformational space. Minor irregular stretches or clusters of lines in the ED plots may be due to increased flexibility in the initial part of the simulation, suggesting greater conformational dynamics of both partner proteins upon complex formation; the complexes then stabilize and remain stable until the end, as seen in the RMSD plots.
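The idea behind the ED projection can be illustrated with a minimal principal component analysis in pure Python. This sketch is not the procedure used in the study: it assumes a small matrix with one row per frame and one column per coordinate, and it uses power iteration with deflation instead of a full eigensolver. All names and the toy data are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center(data: Matrix) -> Matrix:
    """Subtract the per-column mean (the average structure)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of the centered coordinates."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix: Matrix, iterations: int = 500) -> Tuple[float, List[float]]:
    """Largest eigenvalue and eigenvector by power iteration."""
    d = len(matrix)
    rng = random.Random(0)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(vi * mvi for vi, mvi in zip(v, mv)), v


def deflate(matrix: Matrix, value: float, vector: List[float]) -> Matrix:
    """Remove one eigen-component so the next one becomes dominant."""
    d = len(matrix)
    return [[matrix[i][j] - value * vector[i] * vector[j] for j in range(d)]
            for i in range(d)]


# Toy data: 5 frames x 3 coordinates; most variance lies along column 0.
data = [[-2.0, 0.1, 0.0], [-1.0, -0.1, 0.0], [0.0, 0.1, 0.0],
        [1.0, -0.1, 0.0], [2.0, 0.1, 0.0]]

centered = center(data)
cov = covariance(centered)
l1, pc1 = top_eigenpair(cov)
l2, pc2 = top_eigenpair(deflate(cov, l1, pc1))

# Projection of every frame onto (PC1, PC2): the points of an ED scatter plot.
projections = [(sum(a * b for a, b in zip(row, pc1)),
                sum(a * b for a, b in zip(row, pc2))) for row in centered]
print(l1, l2)
print(projections)
```

In a real trajectory the columns would be the Cartesian coordinates of the backbone atoms after fitting, and systems that explore the same conformational space produce overlapping point clouds in the PC1/PC2 plane.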
We also calculated the H-bonds between the host protein and the bacterial protein over 200 ns. As shown in Figure 5 (right panel), the number of H-bonds increases after 90 ns. This implies that hydrogen bonds stabilize the interaction between the two proteins over time and increase the stability of the complex.

Conclusions
Food safety remains a pressing concern despite the growth and development of agricultural research and the advent of modern biotechnological and agricultural tools. In this paper, host-pathogen interactions between Rhynchophorus ferrugineus (the red palm weevil, an invasive wood-boring pest that destroys palms) and the pathogens Proteus mirabilis, Serratia marcescens, and Klebsiella pneumoniae have been comprehensively studied using structure- and function-based analyses. A systems biology approach was applied, and the iron-sulfur cluster assembly protein (KAF7276687.1) of the host was found to be commonly targeted by all three pathogens through their sulfur-delivery master enzymes. Subsequent molecular dynamics simulations show stable protein complexes over the course of 200 ns. This study provides a suitable starting point for experimental work towards bio-pesticides for the control of the red palm weevil, the most dangerous pest of date cultivation.

Tables
Table 1 shows the function of homologous proteins for the host interacting proteins.
Table 7. Consensus table of host protein interactions with predator proteins and the docking scores obtained after molecular docking using the HawkDock web-server (38).
Figure 1 shows the protein-protein interactions of (A) Klebsiella pneumoniae, (B) Proteus mirabilis, and (C) Serratia marcescens with the predator protein, respectively.
An average number of hydrogen bonds formed within the solute (protein-protein complex) and between solute and solvent during the entire 200 ns of simulations.

Figure 5
Essential dynamics analysis: PCA projection (left) for all three host-pathogen systems. The projections for S. marcescens (green) and P. mirabilis (red) overlap exactly, so the red trace is not visible. The right panel shows the H-bonds between the host protein and the bacterial protein.