Drug discovery is an arduous and expensive process with lengthy preclinical and clinical phases. The drug industry is currently witnessing a paradigm shift towards a pathology-based bioinformatics approach. The advent of software built on artificial intelligence and machine learning provides a better understanding of the molecular interactions of drugs with receptors. Through the application of computational biology, the role of drug molecules in the pathophysiology of diseases can be virtually predicted with precision.
The development of in-silico simulation has made it possible to infer detailed information on the pathways of regulatory genes, the mechanism of action of potential drug candidates and the binding energy between a molecule and its receptor. These technologies are likely to change the drug discovery process dramatically in the next few decades. [16, 17]
The research work undertaken in this study describes the construction of a 3D homology model of 18-ECFP and its molecular dynamics simulation. The current study employed computational biology to examine the interaction of the modelled protein with cancer cell death receptors, alongside binding energy calculations. The gene expression induced by 18-ECFP in the oral cancer cell line SCC-9 in vitro was also determined by wet lab techniques, providing an insight into the mechanism of cell death.
Proteomic procedures were undertaken to determine the type and sequence of amino acids present in the 18-ECFP for modeling by energy-based methods. These findings enabled the construction of the homology model of 18-ECFP that is required for in-silico evaluation (molecular docking and binding energy calculations) with the cancer cell death receptors Caspase-3 and Caspase-8.
The isolation of 18-ECFP was initiated by SDS-PAGE and followed by MALDI-TOF and MS/MS sequencing. Peptide mass fingerprinting (PMF) involves digesting a target protein with trypsin and obtaining a mass spectrum of the peptides in the mixture. The m/z values obtained were cross-matched against known protein databases. This method relies exclusively on the peptide map obtained in the mass spectrum and not on the amino acid (AA) sequence. Proteins with post-translational modifications can also be identified by PMF analysis; however, extensive post-translational changes may interfere with protein identification. Mass spectrometry (MS), including tandem mass spectrometry (MS/MS), is an advanced technique for protein analysis and an indispensable tool for proteomics research. For any MS-based experiment, the instrument type, the fragmentation strategy and the method of analysis should be considered. MS operates on the principle of the mass-to-charge ratio (m/z): the instrument measures the m/z of ions in the gas phase. It consists of an ion source that converts the protein analyte into gas-phase ions, a mass analyzer that separates the ionized analytes based on their m/z, and a detector that records the number of ions at each m/z value.
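The PMF principle described above can be illustrated with a minimal sketch. It assumes the commonly used trypsin rule (cleavage after K or R unless the next residue is P), monoisotopic residue masses and singly protonated peptides; the function names, the tolerance and the example sequence are hypothetical and are not taken from the present study or from the Mascot search engine.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum of a free peptide
PROTON = 1.00728   # mass of each proton added on ionization


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def peptide_mz(peptide, charge=1):
    """Monoisotopic m/z of a peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def match_peaks(observed_mz, theoretical_mz, tolerance=0.2):
    """Return observed peaks within `tolerance` (Da) of any theoretical m/z."""
    return [obs for obs in observed_mz
            if any(abs(obs - theo) <= tolerance for theo in theoretical_mz)]


# Hypothetical sequence, for illustration only (not the 18-ECFP sequence).
example = "AKRPGKLR"
for pep in tryptic_digest(example):
    print(pep, round(peptide_mz(pep), 4))
```

A database search engine performs the same comparison for every candidate protein and scores the number of matched peaks; the scoring itself is not reproduced here.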
Protein analysis by MS was revolutionized by the advent of electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI). These soft ionization techniques are capable of ionizing peptides or proteins efficiently and were utilized in the current study. The use of two mass analysers in tandem increases the detection specificity and helps in detecting target proteins with precision. [22, 23] In this arrangement the second analysis depends on the data from the first, hence the approach is also referred to as data-dependent.
A study by Pisamai et al., 2018, unveiled the unique PMFs of novel candidate protein markers of canine oral tumors by performing MALDI-TOF and LC-MS analysis. The authors identified several peptide fragments that play a crucial role in tumorigenesis and concluded that these data might help veterinarians choose drugs and treatment plans. In the present study, MALDI-TOF MS was employed to determine the PMF of the 18-ECFP in order to study the peptides present. This is the first report of MALDI-TOF MS analysis of ECF from EE. The MALDI-TOF MS/MS analysis followed by a Mascot search of the sample 18-ECFP showed a peptide match with Extracellular globin OS = Glossoscolex paulistus (Appendix 1). However, several amino acid details were missing; hence, nano-LCMS-based amino acid sequencing had to be performed to retrieve the complete sequence and number of amino acids present in the 18 kDa protein.
Hybrid platforms like LC-MS/MS and high-resolution LC-MS are popular analytical methods. In LC-MS-based proteomics, the complex mixture of proteins is first enzymatically cleaved into peptides, which are then analyzed by a mass spectrometer. Only a small percentage of the peptides present in the sample is usually considered for identification. The 10 most abundant peaks are selected from the first MS spectrum (MS1) for fragmentation in the second MS analysis (MS2). The data obtained are simplified by removing redundant information from isotopic peaks through a process called deisotoping.
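The two data-reduction steps mentioned above, selection of the most abundant MS1 peaks and deisotoping, can be sketched as follows. This is a simplified illustration that assumes a single known charge state and a fixed isotope spacing of about 1.00335 Da; the function names, the tolerance and the peak list are hypothetical and do not represent the instrument software used in the study.

```python
ISOTOPE_SPACING = 1.00335  # approx. 13C - 12C mass difference (Da)


def top_n_precursors(peaks, n=10):
    """Return the n most intense (mz, intensity) peaks, sorted by m/z."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(strongest)


def deisotope(peaks, charge=1, tolerance=0.01):
    """Keep only the first (monoisotopic) peak of each isotopic series.

    A peak is treated as an isotope peak when another peak lies one
    isotope step (ISOTOPE_SPACING / charge) below it, within `tolerance`.
    """
    step = ISOTOPE_SPACING / charge
    seen, kept = [], []
    for mz, intensity in sorted(peaks):
        is_isotope = any(abs(mz - prev - step) <= tolerance for prev in seen)
        if not is_isotope:
            kept.append((mz, intensity))
        seen.append(mz)
    return kept


# Hypothetical MS1 peak list: one isotopic series plus an unrelated peak.
ms1 = [(500.0, 100.0), (501.00335, 55.0), (502.0067, 20.0), (600.3, 80.0)]
print(deisotope(ms1))
print(top_n_precursors(ms1, n=2))
```

Production software additionally infers the charge state from the peak spacing and checks that the intensity pattern is plausible; both are omitted here for brevity.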
The present study utilized the Q-TOF SYNAPT G2 mass spectrometer. The instrument is designed for thorough characterization of complex mixtures and molecules, combining high MS performance, informatics support and platform versatility with high-efficiency Ultra Performance Liquid Chromatography (UPLC)-MS/MS separation. The current study preferred UPLC over HPLC for protein analysis. The operating pressure distinguishes High-Performance Liquid Chromatography (HPLC) from UPLC. The development of LC columns packed with particles smaller than 2 microns has drastically increased the efficiency of the LC procedure. [26, 27]
UPLC employs robust, high-efficiency pumps that perform at ultra-high pressures, hence the name. Because of the demands that high-speed operation places on dwell volume, detector flow cell and detector scanning rate, UPLC systems were designed with significantly smaller-diameter tubing. UPLC detectors have an increased scanning rate, which ensures that the entire protein peak is detected with no missing data. Conventional HPLC instruments employ scanning rates of only 20 Hz, whereas UPLC detectors scan at rates as high as 160 Hz. The use of a UPLC instrument in the current study supported the production of reliable results.
The application of MS technology in oncology provides a greater understanding of pathology in terms of new protein markers and possible drug targets. A study by Fiolka et al., 2019, determined the amino acid sequence of a protein-carbohydrate complex from the coelomic fluid of the earthworm Dendrobaena veneta. The authors performed LC-MS screening-based AA sequencing to delineate the fundamental composition of the complex and concluded that, owing to its anti-fungal activity, it could be used in the future for the treatment of skin and mucous membrane candidiasis. The current study performed AA sequencing for the first time for a coelomic fluid protein of EE. Sequencing of the 18-ECFP revealed a total of 158 amino acids, making this the first such report in the literature (Fig. 6 and Table 3).
Knowledge of the type, number and sequence of amino acids present enabled the construction of a homology model. Homology modeling is a tool for drug discovery that aids in the construction of a 3D model of the molecule of interest. Using simulation tools, the constructed model can be docked with known pro-apoptotic cancer receptors, since anti-cancer drugs act through these receptors in causing cancer cell death. Identifying the structure of these biomolecules could be an initial step towards drug development and holds promise for future anti-cancer research.
A recent study conducted by Abdelmonsef AH, 2019, built a valid three-dimensional (3D) model of Rab39a. The amino acid sequence of Rab39a (ID: Q14964) was retrieved from the UniProt database. The Rab39a/DENND5B interactions were examined by molecular protein–protein docking. The author concluded that Rab39a has emerged as a therapeutic target for drug development against lung cancer.
In contrast to the study above, the current study did not merely download the sequence from a protein database to construct the homology model. Instead, the complete sequence of the protein of interest was systematically determined in the laboratory by nano-LCMS-based amino acid sequencing, and the homology model was then built using Schrodinger software.
Schrodinger is a computational biology software suite that encompasses homology modeling and protein sequence analysis tools, including advanced loop prediction, chimeric model building, annotation capabilities and interactive protein structure quality analysis.
The current study used Schrodinger simulation software to build the previously unknown homology model of the 18-ECFP. The structure of the ECF anti-cancer protein was previously neither available nor modelled. To the best of our knowledge, this is the first report of a homology model of any earthworm extract to date (Fig. 7A-7D). This highlights the novelty of the present study.
As noted from the literature review, only a few studies have employed molecular dynamics simulation (MDS) to demonstrate the dynamics of an anti-cancer protein. In the current study, the MD simulation was carried out using the Desmond tool of the Schrodinger software. A study by Sreenivasan et al., 2014, performed MDS for a proposed anti-cancer protein, Nek6. The overall structural fluctuation of the Nek6 protein was evaluated by analysing the root-mean-square deviation (RMSD) of backbone atoms versus simulation time. The authors stated that the trajectories produced a stable protein that could be carried forward for molecular docking. A study by Nagpal et al., 2017, evaluated mortalin-p53 complex formation by MDS. Mortalin, with a sequence ranging from 50 to 384, showed an average standard deviation of 0.303 in the RMSD profile of the modelled protein structure. The authors stated that the simulated protein structure had not deviated much from its initial structure and was relatively stable.
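The RMSD-versus-time analysis referred to above can be sketched in a few lines. This is a minimal illustration, assuming each frame is a list of backbone-atom (x, y, z) coordinates already aligned to the first frame; the structural superposition that analysis tools usually apply beforehand is deliberately omitted, and the function names and coordinates are hypothetical.

```python
import math
import statistics


def rmsd(reference, frame):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    No superposition is performed: frames are assumed to be pre-aligned.
    """
    if len(reference) != len(frame):
        raise ValueError("frames must contain the same number of atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(reference, frame)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


def rmsd_profile(trajectory):
    """RMSD of every frame relative to the first frame of the trajectory."""
    reference = trajectory[0]
    return [rmsd(reference, frame) for frame in trajectory]


# Hypothetical three-frame trajectory of two backbone atoms.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
profile = rmsd_profile(trajectory)
print(profile, statistics.pstdev(profile))
```

A profile that plateaus with a small standard deviation is the usual indication that the simulated structure has remained close to its starting conformation.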
The predicted model of 18-ECFP in the present study was shown to be stable and reliable after MDS evaluation in terms of spatial and energy boundaries. This is the first report of MDS performed for a specific component of any earthworm extract (Fig. 8 and Fig. 9), which is an added novelty of this study. The 18 kDa model was then subjected to molecular docking studies. The modelled protein was docked with pro-apoptotic cancer receptors to study the binding affinity. The results exhibited satisfactory binding affinity with the pro-apoptotic Caspase-3 and Caspase-8 receptors responsible for cancer cell apoptosis.
It is crucial to predict protein-ligand or protein-protein interactions with an accurate assessment of their binding energies. The BioLuminate programme of Schrodinger offers a comprehensive strategy for protein-protein docking experiments. Built on a foundation of comprehensive protein modeling tools, BioLuminate offers access to additional advanced tools for protein engineering and analysis of protein-protein interactions. BioLuminate is described as the first comprehensive user interface designed with significant user input to address the key questions associated with the molecular design of biologics.
A recent study by Tahlan et al., 2019, screened benzimidazole compounds for their anti-cancer property. Docking of the data sets was carried out with Schrodinger-Maestro v11.5 using CDK-8 (PDB code: 5FGK) and ER-alpha (PDB code: 3ERT). Compound 12 exhibited the best docking score of −8.907 with the CDK-8 receptor. Another recent study by Yadav et al., 2020, evaluated PPD of tubulin protein with paclitaxel, etoposide and topotecan by molecular docking using Schrodinger software. The authors concluded that etoposide is the best drug for tubulin, with a docking score of −4.916.
The above-mentioned studies performed docking of molecules of interest whose crystal structures were already available in the PDB. The findings of the current study are unique in comparison, as the homology model of the protein of interest was first constructed and then docked with the Caspase-3 and Caspase-8 receptors (Fig. 10 and Fig. 11). Over 100 docking poses were obtained for both the Caspase-3 and Caspase-8 receptors with the homology model of the 18 kDa protein. The top 5 poses with the highest PIPER score were considered for energy calculations.
PIPER is a protein-protein docking program based on a multi-stage approach and advanced numerical methods that reliably generates accurate structures of protein-protein complexes. Based on well-validated docking code adopted from the Vajda lab at Boston University, PIPER has a confirmed track record as a predictor of protein-protein complexes. Its results have also been assessed in previous CAPRI (Critical Assessment of Prediction of Interactions) blind experiments.
The binding energies between the 18 kDa protein and the caspase receptors were calculated by the MM-GBSA method. The Prime MM-GBSA approach is employed to predict the free energy of binding of a receptor with a set of ligands or proteins. Prime MM-GBSA generates a large number of energy properties, which report interaction energies for the ligand, the receptor and the complex structures. The tool also generates energy differences pertaining to strain and binding. In the current study, pose_28 of the 18 kDa protein of ECF of EE with the 6bdv Caspase-3 receptor exhibited the top \(\Delta G_{\text{bind}}\) of −93.73 kcal/mol. Pose_28 of the 18 kDa protein of ECF of EE with the 5jqe Caspase-8 receptor exhibited the top \(\Delta G_{\text{bind}}\) of −103.21 kcal/mol. Satisfactory binding between the modelled protein and the caspase death receptors was observed in the simulations conducted. As the MM-GBSA binding energies are approximate free energies of binding, a more negative value indicates stronger binding between two molecules (Table 6).
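For reference, the binding free energy estimated by MM-GBSA is generally written as the difference between the energy of the complex and the energies of its separated partners. The expression below is the general textbook form, stated here as an assumption; the exact energy terms and minimization protocol used by Prime MM-GBSA may differ, and the entropic contribution is frequently neglected.

```latex
\[
\Delta G_{\text{bind}}
  = G_{\text{complex}} - \left( G_{\text{receptor}} + G_{\text{ligand}} \right),
\qquad
G_{x} \approx E_{\text{MM}}(x) + G_{\text{GB}}(x) + G_{\text{SA}}(x)
\]
```

Here \(E_{\text{MM}}\) is the molecular mechanics energy, \(G_{\text{GB}}\) the generalized Born polar solvation term and \(G_{\text{SA}}\) the non-polar surface area term. In protein-protein docking, the docked protein takes the place of the ligand.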
A recent study by Lokhande et al., 2019, successfully docked deguelin (CID: 107935) with cyclin D1 and cyclin E receptors. The authors employed MM-GBSA to evaluate the \(\Delta G_{\text{bind}}\) between deguelin and the cyclin D1 and cyclin E receptors. The authors reported a \(\Delta G_{\text{bind}}\) of −10.2 kcal/mol and −8.8 kcal/mol for deguelin with the cyclin D1 and cyclin E receptors, respectively.
Another recent study by Suganya et al., 2019, docked proanthocyanidin (PAC), a promising anti-cancer compound, with BCL-XL and CDK2 and compared the results with those for 5-FU. The authors employed MM-GBSA and reported binding affinities of −5.23, −5.17 and −4.43, −4.47 kcal/mol against BCL-XL and CDK2, concluding that PAC exhibited better binding affinity than 5-FU. The present study is the first report of its kind highlighting the interaction between an anti-cancer protein from an earthworm source and human caspase receptors with binding energy calculations.
After obtaining the in-silico results, the 18-ECFP was evaluated on the oral cancer cell line SCC-9 in vitro to determine the expression of apoptotic genes. Wet lab studies such as RT-PCR and Q-PCR were undertaken to achieve this objective.
Through PCR reactions, target nucleic acid sequences can be amplified with the use of a DNA polymerase, primers and nucleotides. A nucleic acid sequence can be used as a template for a PCR reaction. The source of the nucleic acid could be DNA, RNA or cDNA. Short nucleic acids synthesized in vitro are called primers. In this study, Caspase-3, Caspase-8, Bcl-2 and Bax gene expression induced by 18-ECFP was quantified in the SCC-9 cell line by semi-quantitative PCR. On evaluation by RT-PCR, the expression levels of the Caspase-3 and Caspase-8 genes in SCC-9 cells treated with 10 µg/mL and 20 µg/mL of the 18 kDa protein were upregulated when compared to non-treated cells. Bcl-2 was not significantly upregulated (Table 7–9 and Fig. 12). There was no expression of the Bax gene in either control or treated samples (Fig. 13). The internal control β-actin (housekeeping gene) was used to normalize the Caspase-3, Caspase-8, Bcl-2 and Bax gene expression.
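Normalization to a housekeeping gene in semi-quantitative PCR is commonly performed on densitometric band intensities. The sketch below shows one such calculation under that assumption; the function names and intensity values are hypothetical and are not data from the present study.

```python
def normalized_expression(target_intensity, reference_intensity):
    """Target band intensity divided by the housekeeping band intensity."""
    if reference_intensity <= 0:
        raise ValueError("reference band intensity must be positive")
    return target_intensity / reference_intensity


def fold_change(treated, control):
    """Fold change of treated versus control.

    Each argument is a (target_intensity, reference_intensity) tuple.
    Returns None when the control shows no target expression, because
    the ratio is then undefined.
    """
    control_ratio = normalized_expression(*control)
    if control_ratio == 0:
        return None
    return normalized_expression(*treated) / control_ratio


# Hypothetical band intensities (arbitrary densitometry units).
print(fold_change(treated=(200.0, 100.0), control=(50.0, 100.0)))
```

The None branch covers the situation in which a gene yields no band in the control, where a fold change cannot be computed.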
Caspase-3 is a major executioner caspase involved in both the intrinsic and extrinsic mechanisms of apoptosis. Caspase-8 is an initiator caspase that activates the executioner Caspase-3, eventually leading to DNA fragmentation. Hence, Caspase-3 and Caspase-8 were chosen for the study.
A recent study by Fiolka et al., 2019, evaluated the apoptotic potential of the coelomic fluid of Dendrobaena veneta by measuring the activity of caspases 2, 3, 4, 5, 7 and 10 after exposure to the test sample using the ELISA technique. After incubation at a concentration of 250 µg/mL, a two-fold increase in the level of caspases 3, 4, 5 and 10 was observed. A small decrease in the activity of initiator caspase 2 was also observed. A study by Liu et al., 2017, evaluated the effect of EFE from Eisenia foetida on MCF-7 cells. The authors estimated FAK and CD44v6 expression by RT-PCR and western blotting. They concluded that EFE at 80 µg/mL could inhibit cell adhesion by reducing the protein expression of FAK and CD44v6 in MCF-7 breast cancer cells. Earlier studies have used whole fluids or a single protein component, at high concentration, of other earthworm species to evaluate gene expression in cancer cells by PCR. In the current study only 10 and 20 µg/mL concentrations of the protein were used, which is an added advantage highlighting the minimal dose required to induce gene expression.
Real-time PCR (Q-PCR) was conducted to validate the results of RT-PCR for Caspase-3 and Caspase-8 gene expression. Q-PCR is a robust and sensitive method to quantitatively assess gene expression in samples and has several advantages over conventional RT-PCR.
Q-PCR-based analysis combines the older end-point detection PCR with fluorescent detection methods to report the accumulating amplicons in 'real time' during each PCR cycle.
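Relative quantification from real-time PCR data is often carried out with the comparative Ct (2^-ΔΔCt) method. The sketch below illustrates that calculation as an assumption; the source does not state which quantification model was applied. It presumes approximately 100% amplification efficiency, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change by the comparative Ct (2^-ddCt) method.

    Ct values of the target gene are first normalized to the reference
    (housekeeping) gene, then the treated sample is compared to control.
    Assumes the amount of product doubles in every cycle.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold two cycles
# earlier in treated cells, with an unchanged reference gene.
print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

A value above 1 indicates upregulation relative to the control, and a value below 1 indicates downregulation.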
Upregulation of Caspase-3 and Caspase-8 in the treatment groups in comparison to the control was found to be concentration-dependent (Table 10 and Fig. 14). The Q-PCR results establish that the 18-ECFP upregulated the pro-apoptotic genes Caspase-3 and Caspase-8 in SCC-9 cells.
In the current study, a specific 18 kDa anti-cancer protein of EE was employed and gene regulation was studied on cancer cells in vitro. This is the first report in the literature of a specific anti-cancer protein from EE upregulating pro-apoptotic genes in oral cancer cells in vitro, as estimated by three different gene expression techniques. This highlights the novelty of the current study. Identification of potent biomolecules through anti-cancer studies may facilitate their usage in drug discovery for the adjunctive management of cancer. Exploring the mechanism by which these extracts inhibit the division and proliferation of cancer cells could open novel gateways for therapeutic targeting.