Methods for Energy-Based Analysis
We obtained the myocilin (MYOC) structure (PDB entry 4WXQ) and modified it using the PyMOL computer software for molecular visualization (Fig. 2a, b)
The most recent Rosetta score function utilizes highly accurate calculation algorithms for energy approximations to a degree that allows for ΔΔG prediction in kilocalories per mole (kcal/mol). Optimization data for the Rosetta energy function for macromolecular modeling and design illustrate a strong correlation between the Rosetta-predicted values and experimental data for ΔΔG calculation (R = 0.994)45, thus we define values below 0 kcal/mol as having stabilizing thermodynamic properties. Initially, we compared our ΔΔG outputs with the Rosetta literature value for the T193V mutation (–4.95 kcal/mol) in the reverse transcriptase-RNaseH-derived peptide bound to HIV-1 protease (PDB entry 1KJG)45 and found that our calculated ΔΔG (–4.12 kcal/mol) using the updated version of the REF2015 score function (beta_nov16_cart) was slightly closer to the experimentally determined value (–1.11 kcal/mol), thereby signifying an improvement in the score function parameters for predicting free energy changes upon an introduction of a mutation.
Python programming language and the PyRosetta program (www.pyrosetta.org) were used for the Rosetta energy calculations. The program’s input is the 4wxs.clean.pdb file, a coordinate file for the 4WXS PDB structure that was cleaned from non-canonical amino acids prior to running the program.
We refined the PDB using a fast relax protocol and the beta_nov16_cart energy score function, using the following initialization parameters:
init(extra_options = "-beta_nov16_cart -in:file:s 4wxs.clean.pdb -packing:use_input_sc -relax:constrain_relax_to_start_coords -in:ignore_unrecognized_res -relax:coord_constrain_sidechains -relax:ramp_constraints false -relax:cartesian -relax:min_type lbfgs_armijo_nonmonotone")
The PyRosetta commands that we’ve implemented in our calculations matched the Rosetta options in the initialization. From this initialization, we calculated the initial PDB score for the relaxed wild type.
After obtaining an initial PDB score for the relaxed wild-type protein, we utilized the mutate_residue() function within PyRosetta to introduce a mutation of interest. We then applied the fast relax on this mutation. The program produced a PDB file and the PDB score for the relaxed mutation. We compared the difference between the PDB scores for the relaxed mutation and the relaxed wild type, and if we ran multiple tests of the relaxed mutation, we also output the average difference between the PDB scores for the mutation and wild type.
After this refinement, we applied a Cartesian version of Rosetta’s ΔΔG protocol on the output PDB files of the relaxed wild type and relaxed mutations, using the following initialization:
init (extra_options = "-beta_nov16_cart -in:file:s input_file -packing:use_input_sc -relax:constrain_relax_to_start_coords -in:ignore_unrecognized_res -relax:coord_constrain_sidechains -relax:ramp_constraints false -relax:cartesian -relax:min_type lbfgs_armijo_nonmonotone -ddg:mut_file -ddg:iterations 5 -max_cycles 200 -relax:min_type lbfgs_armijo_nonmonotone -fa_max_dis 9.0")
We also ran PyRosetta commands to apply a fast relax on the input PDB files. From this initialization, we calculated the initial PDB score for the relaxed wild type applied with ΔΔG protocol. We ran a similar procedure for all the relaxed mutations and compared the difference between the PDB scores for the relaxed mutation and the relaxed wild type, both applied with ΔΔG protocol. If we ran multiple tests of the relaxed mutation, we also output the average difference between the PDB scores for the mutation and wild type.
After running the program in the same directory, all the output PDB files of the relaxed mutations can be found, and the PDB scores can be found in the output text files.
The python program we used, mutation_scorer.py, is included in the Supplementary Information, named python_codes.docx. The program runs on Python3 and has been tested on MacOS and Linux operating systems.
The updated version of the REF2015 algorithm was used for accurate ΔΔG prediction. We ran a PyRosetta implementation that matched the Rosetta functions used by Alford, et al. (2017)45 using the PDB entry 1KJG. They obtained a ΔΔG of -4.95 kcal/mol for the T193V mutation, whereas our value was − 4.12 kcal/mol, which is slightly closer to the experimentally determined value of -1.11 kcal/mol.
To learn more about the nature of these MYOC mutations, we have utilized Rosetta scoring algorithms to predict the free energy change (ΔΔG) of known MYOC mutations45.
We obtained a high resolution (1.9 \(\AA\)) MYOC OLF domain PDB structure (entry 4WXS) with a E396D variant46 from the protein data bank (www.rcsb.org) and performed ΔΔG calculations for each somatic and germline mutation using the most updated Rosetta scoring function with Cartesian refinement (beta_nov16_cart), which provided a very efficient sampling of the global minimum45. Interestingly, stabilizing mutations were predominantly located on the exterior of the MYOC OLF domain. These variants are schematically illustrated as dotted Van der Waals structures for glaucoma-causing (Fig. 2a) and cancer-associated (Fig. 2b) mutations. Similarly to Fig. 1, the codes that generated Fig. 2c and 2d are included Supplementary Information: MYOC_Figure2c.docx and MYOC_Figure2c.docx.
Amino acid changes are color-coded based on the Venn diagram additive color model (Fig. 1d).
We found that the ratio of cancer to glaucoma stabilizing variants was quite large; 31 cancer-associated and four glaucoma-inducing mutations exhibited negative ΔΔG values. The thermodynamically stabilizing cancer-associated variants within the OLF domain are: G244E, E253Q, T281N, D289E, D289N, T293K, T293M, D294N, F299I, E300Q, M308I, K314N, I317T, T325M, V344L, Y371F, E409K, L413I, E414K, T416I, A427T, A427V, I432V, V439I, A447V, T448N, V449I, R470H, Y473F, F487L, and M494V. Five of these mutations overlapped with the germline neutral polymorphisms (T293K, T325M, E414K, A447V, and R470H) and one variant (A427T) overlapped with a germline glaucoma-causing mutation observed in older individuals (61 ± 21.1 years of age)21. A427T was previously reported to possess a relatively high Tm (48.3 ± 0.3 °C) compared to other glaucoma-causing mutations46. Additionally, the other three stabilizing glaucoma-inducing mutations, Q297H, E300K, and N450D, also had a late age of onset between 60–75 years of age21, 47.
G367R, a germline glaucoma-causing variant with an early age of onset (13–23 years of age) and a high maximum IOP (40 mmHg)48 was observed in tumors of patients with esophageal adenocarcinoma. Interestingly, this variant was not stabilizing according to Rosetta calculations with a ΔΔG value of + 4.23 kcal/mol. Several glaucoma pathogenic mutations manifested similar characteristics (i.e., overlap with somatic variants, early age of onset, and destabilizing properties) including G246R, L255P, T285M, R296C, A363T, P370L, T377M, D384N, and R470C, thus recapitulating the likelihood that these pathogenic mutations accumulate within MYOC of TM cells and largely contribute to the onset of glaucoma.
The E253Q and A447V cancer-associated variants were the most stabilizing of all mutations tested with Rosetta ΔΔG values of − 6.93 and − 6.12 kcal/mol, respectively. Interestingly, the A445V and Y473C neutral polymorphisms were also highly stabilizing according to the Rosetta energy calculations with ΔΔG values of − 4.15 and − 5.67 kcal/mol, respectively. The reported literature Tm value for A445V is 54.2 ± 0.2 °C and for Y473C is 54.2 ± 0.1 °C, which compared to the wild-type MYOC Tm (52.2 °C)46 demonstrates that these neutral variants are thermodynamically stabilizing.
To emphasize where the stabilizing somatic (Fig. 2c) and germline (Fig. 2d) variants are located along the mutation diagram, we slightly raised the mutation points above the rest of the respective variants within the OLF domain (Fig. 2c, d). Multiple thermodynamically stabilizing cancer-associated mutations appeared at amino acid positions D289 and A427 within the OLF domain. The stabilizing R470H cancer-linked and neutral variant, illustrated as a white circle with a thicker lining (Fig. 2c), overlaps with a glaucoma-causing and cancer-inducing R470C mutation, which did not exhibit favorable thermodynamic properties.
To investigate ΔΔG parameters of a somatic and neutral germline variant within the N-terminal region of MYOC, we calculated the energy of the R82H mutation that is associated with pancreatic cancer and is among the most frequently mutated cancer-linked variants. We utilized the recently published full length structure of MYOC7 49 to calculate the free energy of the frequently mutated R82H variant located in the coiled-coil region and found it to be exceptionally stabilizing (–9.69 kcal/mol). This observation is reminiscent to the disorder-to-order transition observed with disease-associated mutations within intrinsically disordered regions of proteins50.