Following various established formalisms (Wilson et al. 1955; Field 2007; Fuglebakk et al. 2013; Dasgupta et al. 2014a), we describe how NMA is performed using ENMs as a general method in this section. NMA captures vibrational signature of a given protein structure, assuming the harmonic approximation. In this method, the potential energy of the system is expressed as a Taylor series of the displacement of atomic coordinates truncated to the second differential term.
For a system of N atoms, the ith atom has coordinate (xi, yi, zi), and corresponding displacements are given as Dxi, Dyi, Dzi. The coordinate system can be defined as qk = Dxi, qk+1 = Dyi, qk+2 = Dzi where k = 1, 2, 3,..., 3N-2, 3N-1, 3N describing a total of 3N degress of freedom.
The potential for the above-described coordinate system can take the following form,
$$U\left(q\right)=U\left(q=0\right)+ \sum _{i}^{3N}{\left(dU/d{q}_{i}\right)}_{q=0}{q}_{i}+ \frac{1}{2}{\sum }_{i,j}^{3N}{\left({d}^{2}U/d{q}_{i}d{q}_{j}\right)}_{q=0}{q}_{i}{q}_{j}$$
1
Where the equilibrium condition can be imposed by limiting q, around the minimum potential energy \({\left(dU/d{q}_{i}\right)}_{q=0}=0\). We can further simplify the above equation by setting first term in the left-hand side to zero, giving,
$$U\left(q\right)= \frac{1}{2}{\sum }_{i,j}^{3N}{\left({d}^{2}U/d{q}_{i}d{q}_{j}\right)}_{q=0}{q}_{i}{q}_{j}$$
2
The potential energy curve is a parabolic function centered around the equilibrium position of the system. The partial derivatives in (2) constitute a Hessian matrix (H) of the size 3N x 3N, where each term is given by \({H}_{ij}={\left({d}^{2}U/d{q}_{i}d{q}_{j}\right)}_{q=0}\). The time-evolution of the system given in (2) using Newton’s equation of motion is given by,
$${m}_{j}{\ddot{q}}_{j}= - {\sum }_{i}^{3N}{H}_{ij}{q}_{i}$$
3
,
for each j, and j = 1, 2, 3,..., 3N and mj is the mass of atom j. Thus, the Hessian matrix is constituted by 3N simultaneous second-order differential equations, where a coupled harmonic oscillator is constructed for each degree of freedom. To solve the second-order differential equations, the following general form can be used,
$${q}_{j}={A}_{j}\text{c}\text{o}\text{s}({\omega }_{j}t+{\varphi }_{j})$$
4
,
where Aj is the amplitude of the jth oscillator with frequency wj with phase fj. By combining (3) and (4) the secular equation is as follows,
$$\text{H}\text{V}= {\Lambda }\text{V}$$
5
where, H is mass-weighted hessian matrix where each term \({H}_{ij}={\left({d}^{2}U/d{q}_{i}d{q}_{j}\right)}_{q=0}/\sqrt{{m}_{i}{m}_{j}}\), L is a diagonal matrix including oscillator frequencies, \({\Lambda } \equiv \left[{\lambda }_{ij}\right]\), \({\lambda }_{ij}={\delta }_{ij}{{\omega }_{j}}^{2}\), where \({\delta }_{ij}\) is Kronecker delta-function, which is 1 only when i = j, and otherwise 0. The V includes 3N eigenvectors V = [v1 v2 ... v3N], where each eigenvector has 3N dimensions. By diagonalizing the Hessian matrix (Eq. (5)), the displacement of atom i from equilibrium position is obtained as a sum of contribution from individual oscillators,
$${q}_{i}^{\left(0\right)}= \frac{1}{\sqrt{{m}_{i}}}\sum _{k=1}^{3N-6}\frac{\sqrt{2{K}_{B}T}}{{\omega }_{k}}{v}_{k}\text{c}\text{o}\text{s}({\omega }_{k}t+{\varphi }_{k})$$
6
where, KB is Boltzmann constant, T is temperature. The superscript ‘(0)’ indicates the native conformation. The effective eigenvectors run from 1 to (3N – 6), where the first six are related to six rigid body motions of the system (three translational and three rotational). Using Eq. (6), the covariance matrix (3N x 3N) can be calculated via the following,
$${C}_{ij}= ⟨\left({q}_{i}-{q}_{i}^{\left(0\right)}\right)\left({q}_{j}-{q}_{j}^{\left(0\right)}\right)⟩= \frac{{K}_{B}T}{\sqrt{{m}_{i}{m}_{j}}}\sum _{k=1}^{3N-6}\frac{{v}_{k,i}{v}_{k,j}}{{\omega }_{k}^{2}}= {K}_{B}T{\text{M}}^{-1/2}\text{V}{{\Lambda }}_{g}^{-1}{\text{V}}^{T}{\text{M}}^{-1/2}$$
7
where, vk,i indicates i-th component of k-th eigenvector, M is diagonal mass-matrix with k, k + 1 and k + 2 diagonal elements from atom i as \(1/\sqrt{{m}_{i}}\), and Lg is diagonal matrix with 3N-6 non-zero eigenvalues. The superscript ‘T’ indicates the transpose operation.
For standard NMA, the system of atoms should be minimized so that the whole system reaches a minimum energy conformation prior to solving Eq. (5). However, for a reasonably large system it is difficult to ensure such minimum energy conformation by energy minimization techniques, due to the complexity of the potential energy function in molecular mechanics force-fields. A reasonable approximation is derived by assuming that experimentally determined coordinates capture a low-energy conformation (a local minima, indicating native conformation), and thus, the Taylor series expresses the potential energy at the energetic minimum, avoiding costly energy minimization.
Still, obtaining each Hessian matrix term remains arduous. This can again be bypassed effectively by using ENMs. In ENMs, the harmonic oscillator is envisioned as a Hookean spring (Tirion 1996; Fuglebakk et al. 2013), fluctuating around a native conformation. To define a spring, the distance between a pair of atoms i and j, dij is used, for which a spring between atom i and j has energy,
$${u}_{ij}=\frac{1}{2}{c}_{ij}{\left({d}_{ij}-{d}_{ij}^{\left(0\right)}\right)}^{2}$$
8
where, \({c}_{ij}\) is force-constant of the spring, and \({d}_{ij}^{\left(0\right)}\) is distance between i and j in native conformation. The total potential energy is sum of individual Hookean oscillator,
$$U= \left(K/2\right){\sum }_{i,j}^{N}{{c}_{ij}\left({d}_{ij}-{d}_{ij}^{\left(0\right)}\right)}^{2}$$
9
where K is a phenomenological constant. First note that, to calculate each Hessian term, dij should be written in terms of individual coordinate differences, like, \({d}_{ij}^{2}= {{{\Delta }x}_{ij}}^{2}+ {{{\Delta }y}_{ij}}^{2}+ {{{\Delta }z}_{ij}}^{2}\). Then double differential of U is calculated with respect to change in coordinate in two different directions (say a and b where \(\alpha ,\beta \in \{x,y, z\}\)) for two atoms i and j. This constitutes mass-unweighted Hessian terms,
$${\left(\frac{{d}^{2}U}{d{q}_{i}d{q}_{j}}\right)}^{\left(0\right)}= {\left(\frac{{d}^{2}U}{d{\alpha }_{i}d{\beta }_{j}}\right)}^{\left(0\right)}= -\left(1-{\delta }_{ij}\right){h}_{ij}+ {\delta }_{ij}{\sum }_{k\ne i}{h}_{ik}$$
10a
where,
\({h}_{ij}= {c}_{ij}\frac{\left({\alpha }_{i}^{\left(0\right)}-{\alpha }_{j}^{\left(0\right)}\right)\left({\beta }_{i}^{\left(0\right)}-{\beta }_{j}^{\left(0\right)}\right)}{{\left({d}_{ij}^{\left(0\right)}\right)}^{2}}\) (10b).
A popular way to define the pair-wise force constant is,
$${c}_{ij}=exp\left[-\frac{1}{2}\left(\frac{{d}_{ij}^{\left(0\right)}}{{d}_{cutoff}}\right)\right]$$
11
where, dcutoff is a cutoff distance specified by user. This also referred as distant-dependent force-constant.
Note that, for a system with N atoms the Hessian matrix is 3Nx3N dimensional and thus, for large systems diagonalization of Hessian is a memory-intensive procedure. There are several ways to increase computational efficiency without sacrificing significantly on accuracy:
-
A popular coarse-graining scheme is to define the elastic model based on only Ca-atoms. This reduces the computational burden by considerable extent. Moreover, in this case all constituent atoms are identical so that mass-weighting of Hessian is redundant (Bahar and Rader 2005).
-
Another approach related to decreasing burden of diagonalization is rotational-translational-block (RTB) approach (Durand et al. 1994; Tama and Brooks 2005). Here, the Hessian matrix is projected to a sub-space, which is then diagonalized. The projection matrix is designed to have a dimension of 3N x 6M, where M is the number of blocks of the system. After projection the Hessian matrix is 6Mx6M in dimension, which is much simpler as M is much less than N. If M is equal to number of residues, then each residue is included in a block, and RTB and Ca-based coarse-graining are of similar size. In such a condition, one could argue that Ca-based method treats the effect of other atoms in each residue implicitly.
-
The normal mode of protein structure can be obtained in dihedral angle space (Noguti and Gō 1983; Ishida et al. 1998; Lopéz-Blanco et al. 2011). For this the above potential energy expression (1) need to be changed to internal coordinate space, i.e., degrees of freedom of the system are defined in terms of bond lengths, bond angles and dihedral angles. Then one can assume that bond lengths and bond angles do not change in functionally relevant motions, and those degrees of freedoms are fixed. Thus, only dihedral angles are allowed to fluctuate. One limitation of this method is that the normal mode displacements (as in Eq. (6)) is defined in terms of dihedral angle, requiring a back-transformation to the usual Cartesian coordinate system (Wako and Endo 2011).
An important aspect of the covariance matrix (Eq. (7)) is that when diagonalized, it returns the normal modes of the system with the reciprocal of the eigenvalues as the approximate normal mode frequencies. This acts as an equivalent method to characterize intrinsic dynamics of a system via molecular dynamics (MD) simulation (Amadei et al. 1993). An MD simulation obtains a set of conformations of a protein, from which the covariance matrix (\({C}_{ij, MD})\) can deduced from ensemble average, given by
$${C}_{ij, MD}= \frac{1}{{N}_{c}-1}{\sum }_{k=1}^{{N}_{c}}\left({x}_{ik}-<{x}_{i}>\right)({x}_{jk}-<{x}_{j}>)$$
12
where, Nc is number of conformations in the ensemble, \({C}_{ij, MD}\) is i,j-th element of the MD-based covariance matrix, \({x}_{ik}\) is i-th coordinate out of 3N degrees of freedom from k-th conformation, and \(<{x}_{i}>\) represents average of i-th coordinate over all the conformation. For MD, diagonalization of the above covariance matrix gives us principal components (eigenvalue) as variance, where each eigenvalue represents magnitude of variation of the data along the principal eigenvector. The eigenvector includes coefficients for the linear combination of observed structures (often referred as loadings for the principal components).