Designing a new bi-specific tandem single-chain variable fragment antibody against tumor necrosis factor-α and interleukin-23 via in silico studies for the treatment of Rheumatoid arthritis

Background: Rheumatoid arthritis disease is a chronic autoimmune inflammatory disease that mainly causes synovial joint inflammation and cartilage destruction. Tumor necrosis factor-α (TNF-α) is a pivotal cytokine that plays an important role in rheumatoid arthritis. The treatments focusing on a single cytokine' inhibition, are able to clinically produce meaningful responses in only about half of the treated patients due to multiple cytokines involved in this disease. In the present study, a bi-specific tandem single-chain variable fragment was designed in order to suppress both human tumor necrosis factor-α and interleukin-23 (IL23) as a potential therapeutic drug candidate for this disease. To do so, at first, eight bi-specific tandem single-chain variable fragment models were built against tumor necrosis factor-α and interleukin-23 cytokines with different domain orders and then 50 ns molecular dynamics simulation was performed for each one and thereafter structural properties were exploited. Results: MD simulation results indicate that the domains' order strongly affects tandem single-chain variable fragment properties and in overall, the fragment VLIL23+Linker+VHIL23+linker+VLTNF+Linker+VHTNF +His6 (VL is variable light and VH is variable heavy fragments and His6 is six histidine) not only separated antibody domains but also had better stability and solvation energy. Conclusions: Hence, this structure can be considered as a potential drug for rheumatoid arthritis. It is hoped that this research could shed a light for the treatment of Rheumatoid arthritis disease.


Introduction
Rheumatoid arthritis (RA) is a chronic autoimmune disease of the joints that is characterized by synovial inflammation as well as cartilage and bone destruction [1]. This disease is associated with undeniable negative consequences for the individual such as loosing his/her function and as a result work disability, which poses significant socioeconomic challenges to the society [2]. Many clinical studies have validated the pivotal contribution of cytokines in the pathogenesis of RA [3]. Tumor necrosis factor (TNF-α) antagonist is assumed as one of the most efficient agents available for rheumatoid arthritis [4]. Randomized clinical trials and extension studies have revealed that all anti-TNF drugs are highly effective in both early and established RA. Large-scale observational and registry studies have evaluated their use and long-term efficacy in clinical preparation. It is proved that all anti-TNF drugs have not only a validated and rapid efficacy, but could also inhibit the progression of structural joint damage as due to their rapid and prolonged inflammation' suppression [5]. However, according to the results of clinical trials, almost 50% of RA patients treated with TNF-α inhibitors do not achieve efficient response. Moreover, some RA patients treated with cytokine inhibitors lose their sensitivity over the time and need to switch to other agents [6].Therefore, there are still a high medical requirement and also alternative strategies to fight against inflammatory and autoimmune illnesses and to further improve patients' lives.
Applying a combined inhibitor for different cytokine targets may overcome this challenge via permitting more effective inflammation' suppression. Given the inflammatory process complexity and the heterogeneity among the patients with RA, it is very likely that the best therapeutic efficacy is obtained through using more than one single mechanism [1].
IL-23 and IL-17 plays important roles in both joint inflammation and the association of destructed bone, indicating that their inhibition would ameliorate both processes [7]. IL-23 is considered as a vital checkpoint in the generation, maintenance, and also activation determinant of pathogenic Th17 cells. This is largely attributable to the fact that only pre inflammatory Th17 cells express the IL-23 receptor (IL-23R) [8]. It could be noted that IL23 stabilizes IL17 expression by Th17 cells and seems to give 'license' to Th17 cells in order to become pathogenic. By inducing the expression of its own receptor, IL23 also induces an autocrine positive-feedback loop [3]. Proliferation and stabilization of Th17 cells also require IL-23 [9]. Therefore, therapeutic strategies that specifically block Th17 cell development (i.e., IL-23) can be used as a putative treatment method for RA. IL-23 up regulates receptor activator' expression of nuclear factor-kB ligand (RANKL) on fibroblastlike synoviocytes, myeloid precursors and on CD4 cells and eventually on driving osteoclast formation. The chief observation fueling this idea is that IL-23p19 − /− mice are fully resistant to the development of collagen-induced arthritis (an animal model of RA) [10]. In male DBA1 mice, collagen-induced arthritis is associated with increased expression levels of IL-23 and IL-23R by the synovial membrane. The intra-articular injection of an antibody to IL-23R significantly improved the arthritis and decreased the synovial expression of IL-23, IL-23R, IL-17, IL-1, IL-6, and TNF-α [11]. The intra-articular injection of IL-23 into the mice resulted in the local migration of neutrophils by increasing IL-17 production and decreasing IL-12 and IFN γ production. In vivo neutralization of IL-23 was uncovered to be the cause of diminishing IL-17 induction; so, it can be concluded that Th17 cells cannot be produced in IL-23 deficient mice [9]. These data confirm the fact that targeting IL-23p19 through a vaccination strategy is protective in collagen-induced arthritis. This specific targeting of IL-23 might constitute a promising therapeutic approach to investigate rheumatoid arthritis [12].The serum level of IL-23 was significantly elevated in the patients with RA compared to the healthy controls. In addition, there was a significant positive relationship between the serum level of IL-23 in RA patients and individual disease activity' parameters [13][14][15].
Abovementioned data, suggest that IL-23 may potentially play a role in RA pathogenesis and therefore targeting the IL-23, may provide a new promising therapeutic approach in the treatment of RA. It seems that the combination of IL-23 with TNF-α inhibitor, may represent a very promising novel approach for the treatment of rheumatoid arthritis.
Bi-specific antibodies (BsAbs) are the second antibodies' generation used for therapeutic purposes in immunotherapy [16]. At least 30 different BsAb and related molecules are currently applied in clinical development and mainly oncology, autoimmune or chronic inflammatory indications [17]. The most successful clinical application of BsAbs up to now has been its usage in oncology by redirecting cytotoxic T cells to kill tumor cells.
Simultaneous blockade of several different disease mediators may lead to a greater therapeutic efficacy and/or benefit for the patients than targeting individual disease' mediators [18].
Small recombinant bi-specific antibodies, devoid of all constant immunoglobulin parts, are assembled from the variable light (VL) and heavy chain (VH) domains of two antibodies.
ScFv fragments of different specificity can be easily included into a bi-specific molecule by joining genetically the two scFv moieties with a more or less flexible peptide linker (VL + Linker + VH). In these tandem scFv (taFv) molecules, each antigen-binding site forms a separate folding unit. Consequently, tandem scFv molecules could be considered in different arrangements of scFv and variable domains (VL or VH) [19]. The most commonly used flexible linkers have the sequences which primarily consist of Gly and Ser residues ("GS" linker). An example of the most widely used flexible linker is the sequence of (GGGGS) n . The length of this GS linker could be optimized via adjusting the copy number "n" in order to achieve appropriate separation of the functional domains, or to maintain necessary inter-domain interactions. The incorporation of Ser or Thr would maintain the linker stability in aqueous solutions by hydrogen bonds' formation with water molecules and therefore reduces unfavorable interaction between the linker and protein moieties.
The selection of (Gly) 8 linker was due to the reports accentuated that flexible linkers can increase the accessibility of an epitope to antibodies or improve protein folding. In sum, flexible linkers are generally rich in small or polar amino acids such as Gly and Ser and are able to provide good flexibility and solubility. In addition, flexible linkers can serve as passive linkers to keep a distance between the functional domains. The length of flexible linkers might be adjusted to permit proper folding or to achieve optimal biological activity of the fusion proteins [20]. Thus, tandem scFv molecules were constructed possessing short Ala-Ala-Ala or Gly-Gly-Gly-Gly-Ser linkers, or longer linkers, which can adopt secondary structures [19]. In this study, the flanking linkers (FL) within the scFv units and middle linkers (ML) connecting the two scFv (ATNF and AIL23 domains) molecules were (GGGGS) 3 .
The association between an antibody and an antigen involves myriad of non-covalent interactions between the epitope (binding site on the Ag), and the paratopes (binding site on the Ab). Six hyper variable loops within the variable domains of Abs, commonly termed complementarity determining regions (CDRs), are widely assumed to be responsible for Ag recognition; while, the constant domains are believed to mediate effector activation. A set of CDRs constitutes a paratope; three CDRs are found in the heavy chain (H1, H2, H3) and three are revealed in the light chain (L1, L2, L3) [21].
There is a considerable interest in developing more effective treatments of the diseases by simultaneously targeting multiple components of the disease pathways.
There are currently five FDA-approved TNF inhibitors using for the treatment of RA, including etanercept, adalimumab, certolizumab, infliximab, and golimumab [22].
However, almost 50% of RA patients treated with TNF-α inhibitors do not show any notable positive response; therefore, it is tempting for the researchers to speculate a fusion protein for TNF-α that participate in the pro-inflammatory cytokine cascade [28] and the treatment of RA. Previous studies also reported that the bispecific fusion protein has curative effectiveness for RA [23,24] or other inflammatory diseases [25][26][27].
Presently, the use of computational techniques in chemistry and biology, from the quantum mechanics of molecules to the dynamics of large complex molecular aggregates, is becoming quite popular. Molecular interactions steer chemical reactions, phase transitions and other physical phenomena and can be studied via molecular dynamics (MD) simulation that show detailed motion of molecules or atoms as a function of time.
The properties of a model system, minima geometries of proteins and DNA, and binding drugs' free energy can be studied via MD simulation [28].
In this study, we aim at designing a new bispecific taFv against TNF-α and IL-23 using in silico simulation as a new treatment for Rheumatoid arthritis. At first, eight bispecific taFv were designed against TNF-α and IL-23 by molecular modeling and the impact of domains' order on structure and dynamics of taFv was investigated.
In the present study, we designed a new bispecific tandem scFv antibody, which is capable of simultaneously recognizing TNF-α and IL-23. Eight arrangements of AIL23 and ATNF with one linker were considered for this protein. These structures were examined via theoretical study because it was somewhat a less expensive and faster method. It should be noted that inappropriate design of taFv leads to unsuitable antibody' generation [29].
Consequently, it could be said that this research is the first attempt to design a bispecific taFv using in silico methods. The Fab (antigen-binding) fragment of one antibody is divided into the Fv (fragment variable) (Fv = VH + VL) and other fragments. Fv is the smallest fragment that binds to antigen via the connection of both heavy and light chains. CDR is on Fv fragment and the Fv of the Fab fragment of anti-TNF-α antibody has PDB code 4nyl and anti-IL-23 has PDB code 4 m6n. These structures were used as the model building' templates. Homology models of the models were constructed using Modeller 9.16 software [38]. For each taFv structure, 1000 models were generated and sorted by their molecular probability density function (molpdf) values [39,40]. Ten models with lower molpdf energy value for each taFv structure were selected out of 1000 generated models and used for further analysis.

Methods
The best model was selected based on stereochemistry among these ten models using Procheck software [41].The amount of allowed regions' residues of the Ramachandran plots was higher than 91% in all taFv models, indicating that they are reliable homology modeled structures. For contraction, the Ramachandran plots were not exposed. In addition, the quality of final models were assessed using ProSA-web (Protein Structure Analysis) [42] and it was confirmed that all of them were appropriate energies (data not shown). Thereafter, best selected taFv models were used as starting structure for MD simulations. To do so, Pymol software was used to visualize the models [43].

Molecular Dynamics Simulation
MD simulations were carried out using the Gromacs 5 [44] under G43A1force field. Each of these models was put in a center of cubic box filled with SPC216 water model and 10 Å far from the box edges. Anionic strength of 0.14 M (Na + and Cl − ions) was also considered to simulate the physiological conditions. To avoid edge effects and better describe the condition of full hydration, periodic boundary conditions were applied for each hydrated protein system. Subsequently, the systems were energy minimized using steepest descent algorithm followed by conjugate gradients algorithm with a tolerance of 60 kJ/mol/nm. After energy minimization, the systems were equilibrated at the constant volume (NVT) for 1000 ps and constant pressure (NPT) for 2000 ps at 100 K in which, initial configuration of the structures were kept fixed. Finally, a 50 ns production of MD simulation at 300 K with a time step of 0.002 ps was performed for each system using isothermal coupling ensemble (NPT). Electrostatic interactions were computed using particle mesh Ewald (PME) method and the cut off electrostatic and van der Waals' interactions were 10 Å. All covalent bonds involving hydrogen atoms were constrained by LINCS algorithm during the production phase. Then, in total 400 ns MD of the simulation for all models were executed and analyzed via Gromacs analysis tools. Afterward, 10 snapshots were taken with 500 ps intervals during the last 5 ns of the simulations of each taFv structure. These structures were used to generate PQR files by the PDB2PQR web server (http://nbcr-222.ucsd.edu/pdb2pqr_2.0.0/). Eventually, APBS software (Adaptive Poisson-Boltzmann Solver software) was applied to calculate solvation energies [45] and the average of 10 solvation free energies was calculated for each model. The dielectric constant was set to four for the protein and 78 for the solvent.

Results And Discussion
At first, the structures were designed with modeler software and then molecular dynamics simulation was implemented for each structure. Thereafter, several factors were analyzed including: RMSD, Rg, hydrogen bond numbers, distances between domains, SASA, free energies of solvation and electrostatic and van der Waals interactions, during the last 40 ns of the simulation. These parameters demonstrate protein stability and ability to bind to antigen as well as aberrant folding and aggregation of fusion proteins [46]. The amount of residues belonging to the most favored/additional allowed regions of the Ramachandran plots was higher than 98.4% in all taFv models, which indicates the reliability of homology modeled' structures. The ProSA website showed that Z-scores (overall model quality) of all models were negative and within the range of -9.37 to-10.02 which underlines its acceptable quality.
Rmsd, Potential Energy, Temperature: To obtain the equilibrium time of each simulated protein model during MD simulation, the backbones root mean square deviation (RMSD) were calculated (Fig. 2). RMSD plot is usually used to evaluate the time required for a system to reach structural equilibrium and to estimate the duration of running a simulation. As it is illustrated in Fig. 2, all models represented stable dynamics and reached a plateau during the last 40 ns and then they were all analyzed and performed during the last 40 ns. < Fig. 2 near here > Table 1 shows the results of average potential energy, temperature, root mean square deviation (RMSD), radius of gyration (Rg), ratio of the total energy drift to the average of total energy during the last 40 ns of the simulation. The system temperature reached the plateau at 300 K after 10 ns (the temperature plot shows that all molecular systems became stabilized and remained stable throughout 50 ns of the molecular dynamics simulations) and fluctuated around 1 K during the last 40 ns of the simulation. Next, all systems reached thermal equilibration. The potential energy of all systems was measured to be negative and stable during the last 40 ns (throughout 50 ns) of the simulation. Also, the total energy drift was trivial in all systems; so that the simulation times seemed to be adequate. The results of backbone RMSD, potential and the temperature showed that the equilibrium and simulation times were sufficient in all systems. Table 1 The average of the potential energy, temperature, root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), ratio of the total energy drift to average of total energy during the last 40 ns of simulation.

Radius Of Gyration:
The radius of gyration (Rg), which is a property linked to the molecule volume and compactness, was found to fluctuate between 2.38 nm and 3.04 nm. The gyration radius of eight taFvs models were presented in Fig. 3. The Rg of all models reached the stable state after 5 ns; while, the Rg of taFv4 and 8 were greater than other models. < Fig. 3 near here > In protein engineering, the functional domains' separation to reduce interference, is considered as a very important factor. As it was discovered by Arai et al [ 47], a higher Rg is associated with increased distance between the domains of a fusion protein. In our study, Rg was consistent with the distance between ATNF scFv and AIL23 scFv domains.
These distances in taFv8 and taFv4 models were more than other models (Table 4); therefore, in this structure these domains have the least interference with each other. The absence of changes in Rg, confirms the fact that the protein properly folds; while, the changes of Rg value imply that protein structure unfolds over the time. The Rg value fluctuation of taFv4 is more than taFv8 (Table 1). Hence, it can be conclude that taFv8, remains more stable during the simulation time. Table 4 The average of free energy of solvation (× 10000 kj/mol) of all models, electrostatic and Van der Waals interactions energy (kj/mol) between protein and water and the average of short range electrostatic Solvent accessibility surface area (SASA) of all models and CDR residues (in VH and VL parts) were computed during the 50 ns of MD simulation (Fig. 4) and the average of the last 40 ns MD of the simulation were exhibited in Table 2. Although many researchers take advantage of the SASA to monitor folding process, in current research, SASA was utilized to determine whether the selected model has sufficient area for binding to IL-23 and TNF antigen. The solvent accessible area plays important role in interaction between antigen and antibody. The higher surface area could mediate high affinity of the antigen-antibody complex. As it is displayed in this table, taFv1, taFv4, taFv7 and taFv8 have similar and higher total surface area than other models. The highest SASA value of CDR residues of ATNF scFv, was observed in taFv2 (27.59 nm 2 ). Also, CDR residues of AIL-23 scFv were observed in taFv4 (25.19 nm 2 ). Moreover, the ASAS of taFv8 in both ATNF and AIL23 was remarkable (about 23 nm 2 ). The total SASA value of taFv8, taFv7, taFv4 and taFv1 were similar and greater than other constructs. The SASA of CDR in these models was more than other models (Table 2). So, it is concluded that these models have more ability for binding to the antigens. Table 2 The average of total accessible surface area of all modelsand CDR residues during the last 40nsof MD simulation. Protein solubility, and consequently aggregation, is a critical issue in several areas of biochemical and biopharmaceutical research such as the production and maintenance of protein pharmaceuticals or industrial enzymes [48]. In the present study, interactions between the protein and water was analyzed by a number of H-bond, electrostatic and van der Waals' interactions. The energies between protein and water and free energies of solvation were also investigated.
In the present study, the interaction between the VH and VL domain in each scFv was analyzed by number of H-bonds, center of mass (COM) distance, and van der Waals plus electrostatic energies between domains. The pattern of hydrogen bonding was evaluated with a donor-acceptor distance cut-off of 0.35 nm and donor-hydrogen-acceptor angle cutoff of 30°. The average of hydrogen bonds within the models and between the protein and water are articulated in Table 3. Table 3 The average of number of hydrogen bonds between VH and VL in AIL23 and ATNF domain and average of number of protein-protein hydrogen bonds and protein-waterduring the last 40 ns of MD simulation. Hydrogen Bond: The hydrogen bonds are responsible for a major factor of stable conformation' maintenance of the protein [49]. H-bond analysis was performed to examine the H-bond for eight models during the 50 ns of the simulation period. The average number of Hbonds within the proteins (intramolecular) is given in Table 3 The changes in the VH/VL association can modify relative position of the CDR loops which, in turn, can alter general shape of the antigen-binding site, as well as the arrangement of the side-chains that interact with the antigen [39]. Also, insufficient VH-VL interface stability of scFvs has often been suggested as the main cause of their aggregation and non-functionality [50]. In particular, specific H-bonds are supposed to be crucial for the conformation and stabilization of the CDRs and VH/VL interface' packing [39]. The average number of hydrogen bonds between VH and VL domains of scFvs were calculated and compared with each other. The results uncovered that in ATNF scFv domains the maximum number of hydrogen bonds was in the model 3 (11.3); while, in AIL23 scFv domain the greatest number of hydrogen bonds between VH and VL was in the model 5 (7.2) ( Table 3).
All models introduced an average of 4.2-11.3 hydrogen bonds throughout the last 40 ns of the simulation period. In addition, the numbers of hydrogen bonds were considerable in taFv8 (about 5 and 7.3 in AIL23 and ATNF respectively).
< Table 3 near here > Free Energy Of Salvation: The average of salvation free energy in all models is illustrated in Table 4. The least solvation free energy belongs to the model 8 and is -23800 kJ/mol. This energy is compatible with the number of hydrogen bonds between the models and waters (Table 4).
< Table 4 near here > Electrostatic And Van Der Waals Interactions' Energy: Attractive interactions between the protein and water are a dominant factor in ensuring high solubility. In addition, the short range van der Waals (LJ-SR) interaction, and short range coulomb (Coul-SR) interaction (electrostatic) energies were calculated between the protein and water. The lowest eletrostatic + Van der Waals energies between them were obtained for taFv4, taFv8 and taFv1 (Table 4). These results are consistent with the number of hydrogen bonds between the protein and water. In total, considering the number of hydrogen bonds, the free energy of salvation, and van der Waals and electrostatic energies, the taFv8 exhibits the best results in terms of the interactions between the protein and water.
Van der Waals plus columbic energy between the protein and water in taFv4 and taFv8 and taFv1 were more negative than other models and it is consistent with the number of hydrogen bonds between the protein and water in these models and salvation free energy in the model 8. Subsequently, the models four and eight and one and particularly model eight have more solubility than others.
Also the average of electrostatic plus van der Waals interactions' energies between VL and VH domain of ATNF and AIL23 are exposed in Table 4. The strong electrostatic plus van der Waals' interaction between them showed the protein stability and was in taFv3, taFv7 for ATNF domain and also in taFv2, taFv3 and taFv5 in AIL23 domain. A tight relationship was detected among electrostatic plus van der Waals' interaction energies and the number of hydrogen bonds between VH and VL segments. Distance: Spatial separation of the domains in hetero-functional chimeric proteins is important for independent work of domains [47]. Figure 5 and Table 4 represent the distance between the center of mass in AIL23 scFv and ATNF scFv domain or during 50 ns MD of the simulation. Figure 5 shows that among eight models, the maximum center of mass distance between anti-TNF-α scFv and anti-IL23 scFv, was belong to taFv4 and taFv8.
Thus, in these models scFvs fragment can be separated from each other better than in other models. These results are consistent with their gyration radius. Although the number of hydrogen bonds between VH and VL and also the sum of van der Waals and electrostatic energies between VH and VL in ATNF and AIL23 is not better than other models, it seems that these parameters are not necessary for the stability, solubility and binding biphasic antibody to the antigens. Consequently, according to all calculated parameters taFv8 is selected as the best model.
It is worthy to say that the properties of the taFv8 can be improved by the application of middle linker modifications between the two scFv molecules.

Conclusion
We designed a new biphasic (bispecific) antibody by using MD of the simulation to neutralize TNF-α and IL-23 cytokines. In silico analysis showed that taFv8 (VL IL23 +FL + VH IL23 +ML + VL TNF +FL + VH TNF +His6) has better structural properties comparing other taFvs. The reason is that almost all its structural properties such as Rg and solvation energy, hydrogen bonds, the distance between the domains and accessible surface area were better than other models. Overall, the methodology adopted in this study is fruitful in the drug discovery process and would improve drug efficacy and safety profiles in RA treatment. It should be noted that this is the first drug candidate with the combination of anti-IL23 and anti-TNF-α via molecular dynamics simulation which was used for the treatment of RA and it is hoped that this drug can be effective in ongoing experimental studies in our Lab. Reporting studies involving human participants, human or animal data and tissue: Not applicable, this work is only a bioinformatics study.

Consent for publication of individual person's data:
Not applicable, this work was done only via authors data.

Availability of data and material:
All data and materials of this manuscript are available. Data sharing is applicable if somebody request.

Funding:
Design of this study and collection and analysis were performed by funding the Isfahan University, research department of medical sciences, Isfahan, Iran and interpretation of data and writing the manuscript were done by funding of Shahrekord university of Iran.  Backbone RMSD of all models during 50 ns MD simulation.

Figure 4
Total salvent accessible surface area (SASA) of models during 50 ns MD simulation.

Figure 5
The center of mass distance between two scFv during 50 ns MD simulation.