The RCSB PDB database was used to retrieve the 6ncq representation of the SFPQ protein. The 6ncq model was chosen because it was an isolated protein and was free of any paraspeckle components.
The NLSdb database was used to obtain a workable set of NLSs to screen. NLSdb was chosen because it was the only comprehensive, publicly available database of NLSs.
The PepSMI tool was used to convert raw peptide sequences to SMILES format, which ensured compatibility with the downstream software. This was done for each NLS sequence, resulting in 3255 SMILES strings.
CLC Drug Discovery Workbench
The CLC Drug Discovery Workbench was the predominant tool used in this study. It was used for Lipinski's Rule of 5, initial screening, and molecular docking. Because CLC Bio has retired this workbench, the software is available only in trial mode.
The admetSAR 2.0 web tool was used to help verify the ADMET properties of the ligands that passed the earlier stages.
NLS/SFPQ Protein Handling
First, a CSV file of all 3255 signals in the NLSdb database was downloaded. Next, all 3255 NLS sequences were prepared and converted into SMILES format using the PepSMI tool. Given an NLS sequence, PepSMI assumed a linear configuration and converted it into a SMILES string. Then, each string was copied into the CLC Workbench, which generated the 2D molecular structure and the relevant statistics (size, weight).
The 6ncq representation of the SFPQ protein was downloaded as a PDB file and imported into the CLC Workbench.
In initial screening, NLSs that were certain to fail were screened out, saving time and effort in later phases of testing. Screening was based on multiple factors, such as the size of the NLS and structural deformations. If the raw peptide sequence was longer than 10 amino acids, the NLS was removed from testing.
After this step, 650 NLSs were left, about 20% of the initial 3255 NLSs from the NLSdb database.
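The length criterion above can be sketched in a few lines of Python. This is only an illustration, not the procedure run inside the CLC Workbench: the function name, the cleaning step, and the example sequences are assumptions, the cutoff is assumed to count amino acid residues, and the structural-deformation criterion is not covered.

```python
from typing import Iterable, List

MAX_LENGTH = 10  # cutoff stated in the text; assumed here to count residues


def screen_by_length(sequences: Iterable[str], max_length: int = MAX_LENGTH) -> List[str]:
    """Keep only non-empty sequences with at most max_length residues."""
    kept = []
    for seq in sequences:
        cleaned = seq.strip().upper()  # normalize whitespace and case
        if cleaned and len(cleaned) <= max_length:
            kept.append(cleaned)
    return kept


# Illustrative input only; the study screened the 3255 NLSdb sequences.
example = ["PKKKRKV", "KRPAATKKAGQAKKKK", "  rrkrq "]
survivors = screen_by_length(example)
print(survivors)
```

The 16-residue sequence is dropped because it exceeds the cutoff, while the two shorter sequences are kept.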
Lipinski’s Rule of 5
To narrow down our pool of NLSs, Lipinski’s Rule of 5 (Ro5) was applied. Ro5 is a rule of thumb for evaluating the drug-likeness of a chemical compound. It states that a molecule should have no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, a molecular mass of less than 500 daltons, and an octanol-water partition coefficient that does not exceed 5. To pass, a molecule must meet a minimum of 3 requirements. All 650 NLSs that passed initial screening were put into the CLC Workbench and underwent Rule of 5 testing. 26 molecules passed Lipinski’s Rule of 5 with 2 or fewer violations.
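The four Ro5 criteria listed above can be expressed as a small violation counter. The sketch below is not the CLC Workbench implementation: the `Descriptors` class, the function names, and the example values are hypothetical, and the descriptor values are assumed to have been computed elsewhere. The allowed number of violations is passed in explicitly rather than fixed, so the threshold used in the study can be supplied.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    h_bond_donors: int
    h_bond_acceptors: int
    mass_daltons: float
    log_p: float  # octanol-water partition coefficient


def count_violations(d: Descriptors) -> int:
    """Count how many of the four Ro5 criteria the molecule violates."""
    checks = [
        d.h_bond_donors > 5,       # more than 5 donors
        d.h_bond_acceptors > 10,   # more than 10 acceptors
        d.mass_daltons >= 500,     # mass not below 500 daltons
        d.log_p > 5,               # partition coefficient above 5
    ]
    return sum(checks)


def passes_ro5(d: Descriptors, max_violations: int) -> bool:
    """Return True if the violation count is within the allowed threshold."""
    return count_violations(d) <= max_violations


# Made-up descriptor values for illustration only.
small = Descriptors(h_bond_donors=4, h_bond_acceptors=9, mass_daltons=480.5, log_p=1.2)
large = Descriptors(h_bond_donors=12, h_bond_acceptors=14, mass_daltons=850.0, log_p=-3.0)

print(count_violations(small), passes_ro5(small, max_violations=2))
print(count_violations(large), passes_ro5(large, max_violations=2))
```

In this example the first molecule has no violations and passes, while the second violates the donor, acceptor, and mass criteria and fails.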
The 26 molecules that passed Rule of 5 testing were put through a docking simulation in the CLC Workbench to see how well each NLS would bind to the SFPQ protein. The 26 NLSs from Rule of 5 testing and the 6ncq SFPQ protein were given as input. From these, the workbench identified the best docking site on the SFPQ protein. It then ran the docking simulator to determine how well each NLS binds to the SFPQ protein. The docking simulator returned a PLANTSPLP score for each NLS, calculated by the equation $\text{Score} = S_{\text{target-ligand}} + S_{\text{ligand}}$. This scoring system rewarded hydrogen bond, lone-pair, and nonpolar interactions. Conversely, it penalized nonpolar-polar interactions, hydrogen bond donor-donor contacts, and hydrogen bond acceptor-acceptor contacts. Since the final output score mimics the potential energy change between the protein and ligand, a negative score corresponds to strong binding, while a positive score corresponds to weaker binding. [Table 2] shows that all 26 NLSs had excellent binding affinities for the SFPQ protein.
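Because more negative scores indicate stronger binding, docked ligands can be ordered by sorting their scores in ascending order. The following is a minimal sketch of that interpretation step only; the ligand names and score values are invented for illustration and are not results from this study.

```python
from typing import Dict, List, Tuple


def rank_by_score(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order ligands from most negative (strongest binding) to least."""
    return sorted(scores.items(), key=lambda item: item[1])


# Invented scores for illustration; not values from Table 2.
example_scores = {"ligand_a": -42.1, "ligand_b": -57.8, "ligand_c": -12.3}
ranking = rank_by_score(example_scores)

for name, score in ranking:
    print(f"{name}: {score}")
```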
ADMET verification was then conducted on all 26 NLSs that passed the docking simulation. ADMET, which stands for absorption, distribution, metabolism, excretion, and toxicity, is a way to assess the efficacy and safety of a molecule. The SMILES strings of all 26 molecules were entered into the admetSAR 2.0 web tool. Here, crucial properties such as Human Intestinal Absorption, Caco-2 permeability, Blood-Brain Barrier, P-glycoprotein inhibitor/substrate, Carcinogenicity (binary/ternary), Ames mutagenesis, human ether-a-go-go-related gene (hERG) inhibition, and acute oral toxicity were tested. These properties were selected for their relevance to the molecules being tested (NLSs). For example, the Blood-Brain Barrier (BBB) property was chosen because the drug is targeted at neurons. After testing each of the 26 NLSs for all 8 properties, only 2 NLSs (NLSs 551 and 544) passed all 8 properties with positive or neutral results. (Four of these properties are shown in [Table 2].)
Of the 2 NLSs that passed ADMET verification (551 and 544), NLS 551 was chosen as the lead molecule because it had better property assurance. Specifically, NLS 551 had a better assurance of Human Intestinal Absorption (0.9419) than NLS 544 (0.4308).
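The selection logic described above (keep only candidates that pass all 8 properties, then prefer the one with the higher Human Intestinal Absorption value) can be sketched as follows. The data layout, property keys, and function names are assumptions made for illustration, and the third candidate is hypothetical; only the two Human Intestinal Absorption values come from the text. The actual assessment was done by inspecting admetSAR 2.0 output.

```python
from typing import Dict, List, Optional

# The eight properties named in the text, as hypothetical dictionary keys.
PROPERTIES = [
    "human_intestinal_absorption",
    "caco2_permeability",
    "blood_brain_barrier",
    "p_glycoprotein",
    "carcinogenicity",
    "ames_mutagenesis",
    "herg_inhibition",
    "acute_oral_toxicity",
]


def passes_all(results: Dict[str, bool]) -> bool:
    """True only if every listed property has a positive/neutral result."""
    return all(results.get(name, False) for name in PROPERTIES)


def select_lead(candidates: List[dict]) -> Optional[str]:
    """Among candidates passing all properties, pick the highest HIA value."""
    passed = [c for c in candidates if passes_all(c["results"])]
    if not passed:
        return None
    return max(passed, key=lambda c: c["hia_value"])["name"]


all_pass = {name: True for name in PROPERTIES}
candidates = [
    {"name": "NLS 551", "results": all_pass, "hia_value": 0.9419},
    {"name": "NLS 544", "results": all_pass, "hia_value": 0.4308},
    # Hypothetical candidate that fails one property, for illustration.
    {"name": "example", "results": {**all_pass, "ames_mutagenesis": False}, "hia_value": 0.99},
]

lead = select_lead(candidates)
print(lead)
```

The hypothetical third candidate is excluded despite its high value because it fails one property, which leaves NLS 551 as the lead.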