Bidirectional dynamic neural networks with physical analyzability

The rapid growth in research exploiting deep learning to predict mechanical systems has revealed a new route for system identification; however, the analytic model, as a white box, has not been displaced in applications because of the physical information it exposes. In contrast, models generated by end-to-end learning usually lack the ability to be physically analyzed, which makes them inapplicable in many situations. Consequently, high-accuracy modeling with physical analyzability becomes a necessity. In this paper, we introduce bidirectional dynamic neural networks (BDNNs), a deep learning framework that can infer the dynamics of physical systems from control signals and observed state trajectories. Based on the forward dynamics, we train the neural ordinary differential equations with a trajectory-backtracking algorithm. With the trained model, the inverse dynamics can be calculated, and based on Lagrangian Mechanics, the physical parameters of the mechanical system can be estimated, including inertia, Coriolis and centrifugal forces, and gravity. As a result, the model can seamlessly incorporate prior knowledge, learn unknown dynamics without human intervention, and provide information as transparent as an analytic model. We demonstrate our method on simulated 2-axis and 6-axis robots to evaluate model accuracy, including the physical parameters, and verify its applicability on a real 7-axis robot.
The experimental results show that this method is superior to existing methods. The framework offers a new approach to system identification by providing interpretable, physically consistent models of physical systems.


Introduction
The daunting challenge in dynamic deep learning lies not only in model accuracy but also in the lack of understanding of the inner workings of neural networks. Analytic models therefore retain advantages that deep learning lacks: white-box modeling using analytic parameterization is conducive to applications, while black-box modeling using machine learning is free of prior knowledge. We are therefore eager for a flexible, gray-box model that unites the advantages of both. Traditionally, we use prior theories to build theoretical models [1][2][3] and then obtain the parameters through system identification [4]. The resulting analytic models are impeccable in their integrity: not only the forward and inverse dynamics but also further physical parameters, such as inertia, Coriolis force, centrifugal force, and gravity, can be calculated. The conventional algorithms are well established, such as least squares based on the inverse dynamics identification model (IDIM-LS), the extended Kalman filter (EKF), and the instrumental variable (IV) technique. Unfortunately, the prior knowledge required by conventional methods is too detailed; for example, building a theoretical model of a 6-axis or a 7-axis robot is truly tedious work. Conventional methods also require excessively stringent filtering or numerical differentiation techniques, for example, accurately estimating acceleration without introducing extra error. Due to these factors, deep learning is becoming more and more popular in dynamics.
Compared with traditional methods, deep learning is widely used in modeling because of its function approximation capabilities [5]. Its automatic encoding of priors enables dynamical behaviors to be learned by neural networks [6,7]. Neural ordinary differential equations (ODEs) are a milestone [8] in this growing body of work, and one common strength of neural ODEs and their variants is that modeling requires only input and state data, which makes their application far simpler than traditional methods. However, only the forward dynamics can be specified, while the inverse dynamics must be remodeled by other methods. In addition, the model is usually a black box, so the information hidden in the network is usually not as sophisticated as that of an analytic model. Although various algorithms have been designed to constrain the network and improve its modeling ability [9][10][11], it remains difficult to obtain accurate physical parameters.
Reviewing the above research, the automatic and universal modeling ability of deep learning is favored. We also anticipate the excellent analyzability of traditional models. In particular, the end-to-end forward and inverse dynamics are only a small part of the model applications, while the internal parameters of the model, such as inertial force, Coriolis and centrifugal force and gravity, are the basis of wider application scenarios [12][13][14]. This is the advantage of the analytic model as a white box, and it is also one of the reasons why neural networks as a black box have not replaced the analytic model. In other words, when we value the advantages of deep learning in dynamics modeling and attempt to apply it to general works, the weakness of the lack of analyzability to physical parameters will not be ignored. In this context, we designed bidirectional dynamic neural networks (BDNNs), which use a structure that is as simple as possible but without losing constraints to combine physical priors and train the network based on the general form of forward dynamics. The obtained model can seamlessly calculate the inverse dynamics only through formal adjustment. Furthermore, benefiting from the compatibility of our networks with Lagrangian Mechanics, we can also estimate the physical parameters of the system, similar to an analytic model.

Related work
As a representation of state transformations, modeling with deep learning has become increasingly popular for physical systems in the recent decade [6-9,11,15]. An RNN is an immature solution, since it can be seen as an Euler discretization of a state-space model [16-18]. This method performs well in control tasks that are insensitive to error [19,20] but is not competent in more precise situations. In pursuit of higher accuracy, networks can be used to approximate continuous-time systems and represent the state equations; this idea can be traced back to the Runge-Kutta neural network (RK-Net) [21]. Neural ODEs (NODE) further proved that this method can be extended to any ODE solver [8]. Neural networks can also be used to fit the inverse dynamics [22], but they should ultimately be transformed into state transitions to avoid the need for accelerations [23]. Another important research direction in dynamics is the application of physical priors. One commonality of physical systems is that they have strong priors, and various exciting studies, such as LNN, HNN and their variants, were developed to encode them [6,7,9-11]. However, a problem with these studies is that they overemphasize generality and ignore the internal analyzability of the mechanical system. Therefore, the problem that the model is still a black box has not been solved.
An encouraging trend is the combination of neural networks with Lagrangian Mechanics [7,9,23], such as LNN and DeLaN. The benefit of using this physical prior is that the Coriolis force and centrifugal force can be derived from the inertial matrix. This makes it possible to analyze the internal parameters of mechanical systems using machine learning. Nonetheless, the manner in which physical priors are exploited must be carefully weighed. For example, NODE, LNN and HNN only provide the framework without constraining the internals of the model, but they can all build accurate dynamic models. DeLaN, in contrast, strictly follows Lagrangian Mechanics for forward and backward propagation, which leads to its more complex structure and higher computational cost. Unfortunately, the complex constraints did not improve the accuracy of DeLaN; in contrast, its accuracy ranks low among existing methods [11]. One of the prominent effects is that the model computed by Lagrangian Mechanics requires the second derivative of the neuron during backpropagation, which makes many well-performing activation functions suffer unnecessary gradient losses in DeLaN. Another outstanding problem is that DeLaN computes the inverse of the inertia matrix in each forward and backward propagation, and the matrix inversion error is not linear with the fitting error [24]. This means that although the inversion of the matrix is accurate when the fitting accuracy is high, the model does not asymptotically approach the true value during training but instead generates severe fluctuations and makes training difficult in some cases. Furthermore, another important fact to be aware of is that the control optimization based on a single trajectory does not represent the dynamics model accuracy, especially the internal physical parameters. 
When there are few training samples, the neural networks can easily lead to serious errors in the estimation of physical parameters, even if the end-to-end training shows high accuracy [10,25].
The research status urges us to search for novel algorithms. We need a framework that is as simple as possible while guaranteeing Lagrangian Mechanics as a prior for deep learning. The attraction of the latter lies in its physical analyzability, which makes neural dynamics no longer a black box. Additionally, this prior should be latent, because its purpose is to embed the physical properties rather than to apply them directly to the forward and inverse dynamics, thereby avoiding an increase in computational cost and a decrease in accuracy. In this paper, we design a new framework and a prior function of the inertia tensor to ensure that the inertia matrix conforms to the physical prior with a lightweight constraint. Following the observation that dynamical systems are usually observed through state transitions, we design this prior function on the forward dynamics and derive its backpropagation to couple machine learning with Lagrangian Mechanics. The resulting networks are simple in both forward and backward propagation, and no complicated operations are required in the forward and inverse dynamics. This new framework combines structural simplicity with the use of physical priors to optimize the networks, and it can invoke Lagrangian Mechanics to analyze physical parameters when needed.
Relying on this framework, the BDNN has the following advantages:
Simplified structure BDNN embeds the physical priors in a simpler form, which makes it computationally less expensive and more stable in training.
Accurate modeling and physical analyzability BDNN's end-to-end modeling accuracy is comparable to that of the best current models, while it significantly outperforms existing algorithms in physical parameter estimation.
Generalization ability The purpose of BDNN is not to model the dynamics of deterministic tasks but to build models that approximate the real system over the entire sample space. The obtained model generalizes in the forward and inverse dynamics and in the parameter analysis.

Theory
How to describe the dynamics of mechanical systems has been extensively studied, and various formalisms exist, such as the most prominent Newtonian, Hamiltonian and Lagrangian Mechanics. For fully actuated systems, all methods ultimately yield second-order ODEs of the form:

M(q)q̈ + C(q, q̇)q̇ + G(q) = u (1)

where q are the generalized coordinates that uniquely define the system configuration. M(q) is the inertia matrix, with the crucial prior of being symmetric and positive definite [1]. C(q, q̇)q̇ represents the Coriolis and centrifugal forces, and G(q) and u represent gravity and the system inputs, respectively. The left side of the equation can then be split into the inertia term and its remainder R(q, q̇) = C(q, q̇)q̇ + G(q); thus, the forward and inverse dynamics of a system can be expressed as:

q̈ = H(q)(u − R(q, q̇)), u = M(q)q̈ + R(q, q̇) (2)

where H(q) = M⁻¹(q). Another theoretical basis of this work is Lagrangian Mechanics, which can serve as a prior because of its broad generality to mechanical systems [1][2][3]. Although the form of the Lagrangian is not unique, for a given system it can always be defined as a function of q and q̇ that describes the complete dynamics. With Lagrangian Mechanics, the dynamics of the system are specified by the Euler-Lagrange equation, from which the Coriolis and centrifugal forces can be derived from M(q) as:

C(q, q̇)q̇ = Ṁ(q)q̇ − (1/2) ∂(q̇ᵀM(q)q̇)/∂q (3)

For the state transition, assume that we are given the initial conditions q_t, q̇_t and the system inputs u_t; the target we care about is q_t′ and q̇_t′ at a later time t′ = t + Δt. To achieve this prediction, the generalized acceleration is first calculated through the forward dynamics in Eq. (2) and then organized as the state equations ẋ_t = f(x_t, u_t), in which the state x_t = [q_t, q̇_t]ᵀ. Then, x_t′ can be predicted by an ODE solver such as fourth-order Runge-Kutta (RK4).
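The pipeline above, from forward dynamics to state equation to an RK4 step, can be sketched for a hypothetical 1-DoF pendulum (the mass, length, and gravity constants below are our own choices, not values from the paper):

```python
import numpy as np

# Hypothetical 1-DoF pendulum: M(q) = m*l^2 is constant, and the remainder
# R(q, qd) = C*qd + G reduces to gravity alone (no Coriolis term in 1 DoF).
m, l, g = 1.0, 0.5, 9.81

def forward_dynamics(q, qd, u):
    M = m * l**2                      # inertia M(q)
    R = m * g * l * np.sin(q)         # remainder R(q, qd), here only gravity
    return (u - R) / M                # qdd = H(q) * (u - R), with H = M^-1

def f(x, u):
    """State equation xdot = f(x, u) with state x = [q, qd]."""
    q, qd = x
    return np.array([qd, forward_dynamics(q, qd, u)])

def rk4_step(x, u, h):
    """One fourth-order Runge-Kutta step of the state equation."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.array([0.1, 0.0])              # initial q, qd
x_next = rk4_step(x, u=0.0, h=0.001)  # predicted state after one time step
```

Iterating `rk4_step` over a control sequence yields the recursive long-horizon prediction discussed later in the experiments.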

Methodology
Starting from the above theories, the dynamic model can be established by approximating H(q) and R(q, q̇). Assume that H(q) and R(q, q̇) are specified by two neural networks, H(q, θ) and R(q, q̇, ε), where θ and ε are the parameters of the two networks. Based on the theory in Sect. 3, the BDNN can be designed as shown in Fig. 1. We then train the network with the forward dynamics, since it can easily be optimized by methods such as neural ODEs. Obviously, after optimization, the inverse dynamics can also easily be realized.
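The bidirectional use of the two networks can be illustrated with simple analytic stand-ins for H(q, θ) and R(q, q̇, ε) (the toy functions below are ours; in BDNN both are neural networks):

```python
import numpy as np

def H_net(q):
    # Toy symmetric positive definite H(q), standing in for the H network
    return np.diag(1.0 / (2.0 + np.cos(q)))

def R_net(q, qd):
    # Toy remainder C(q, qd)*qd + G(q), standing in for the R network
    return 0.1 * qd + 9.81 * np.sin(q)

def forward_dynamics(q, qd, u):
    """qdd = H(q) (u - R(q, qd)) -- the training direction."""
    return H_net(q) @ (u - R_net(q, qd))

def inverse_dynamics(q, qd, qdd):
    """u = H(q)^-1 qdd + R(q, qd) -- obtained from the same nets for free."""
    return np.linalg.solve(H_net(q), qdd) + R_net(q, qd)

q, qd = np.array([0.2, -0.4]), np.array([0.5, 0.1])
u = np.array([1.0, -2.0])
qdd = forward_dynamics(q, qd, u)
u_back = inverse_dynamics(q, qd, qdd)   # recovers u: the two modes are consistent
```

The round trip from torque to acceleration and back illustrates why no remodeling step is needed for the inverse dynamics: both directions read the same two networks.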
BDNN does not use Lagrangian Mechanics in the forward and inverse dynamics, so its structure is relatively simple. Another benefit is that the second derivative of the neurons is not required for backpropagation, thus avoiding the vanishing gradients caused by the activation functions. In addition, since the training labels are state transitions, taking the inverse of the inertia matrix, H(q), rather than M(q) as the learning object frees us from the inverse operation during training. This reduces the computational cost and, more importantly, avoids the gradient oscillation caused by the nonlinearity between the matrix inversion error and the matrix error. The latter enables the networks to converge faster and ultimately reach better solutions.
Another problem we are concerned with is the embedding of the physical prior. Because the inertia matrix M(q) is symmetric and positive definite, its inverse H(q) must also be symmetric and positive definite.
The existing research uses the Cholesky decomposition to ensure this characteristic [9,22]; however, this prior knowledge can be encoded far more simply. Let the output tensor of net H be H_T(q, θ). In BDNN, we design the prior function P to define the relationship between H_T(q, θ) and H(q, θ) as H = P(H_T). Obviously, the key of BDNN lies in the design of the function P and its backpropagation. Referring to Fig. 2a, the elements of H_T successively represent the main diagonal of H, then one of the two adjacent equidistant diagonals on either side of the main diagonal, and so on out to one of the two corners of the matrix. In the backpropagation of the network, we also need the inverse operation of the function P. We define Ĥ_T = P⁻¹(H); it should be emphasized that, unlike in P, the elements of Ĥ_T other than the main diagonal are equal to the sum of the two equidistant diagonals on either side of the main diagonal, which is also shown in Fig. 2a.
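Our reading of Fig. 2a can be made concrete with a small numpy sketch of P and P⁻¹ (function names and the diagonal-by-diagonal ordering of H_T are our interpretation, not the paper's code):

```python
import numpy as np

def P(H_T, N):
    """Map the N(N+1)/2 network outputs onto a symmetric N x N matrix H.

    H_T is ordered diagonal by diagonal: first the N main-diagonal elements,
    then the N-1 values shared by the two first off-diagonals, and so on
    out to the single corner element."""
    H = np.zeros((N, N))
    idx = 0
    for k in range(N):                              # diagonal offset
        n = N - k                                   # length of that diagonal
        vals = H_T[idx:idx + n]
        H[np.arange(n), np.arange(n) + k] = vals    # upper diagonal
        H[np.arange(n) + k, np.arange(n)] = vals    # mirrored: symmetry for free
        idx += n
    return H

def P_inv(H):
    """Fold a (gradient) matrix back onto the H_T layout: the main diagonal
    passes through unchanged; each pair of equidistant diagonals is summed."""
    N = H.shape[0]
    out = [np.diag(H, 0)]
    for k in range(1, N):
        out.append(np.diag(H, k) + np.diag(H, -k))  # fold summation
    return np.concatenate(out)

H = P(np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]), 3)
```

Because each off-diagonal value is written to two mirrored positions, symmetry holds by construction, and `P_inv` implements the fold summation used for the gradient Ĥ_T.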
A series of studies have shown that physical priors can significantly speed up the training of dynamic neural networks and improve model accuracy [9][10][11]. Clearly, the function P guarantees the symmetry of the matrix H while reducing the width of the tensor H_T, and it is thus a simple but effective physical prior. The function P also provides a soft constraint on positive definiteness, which requires the cooperation of the activation functions of the output layer. In BDNN, we use Softplus [26] for the neurons corresponding to the main-diagonal elements of H_T. The other neurons in the output layers of nets H and R use linear activations, and we use Mish [27] for all hidden-layer neurons. The purpose of this design is to make the main-diagonal elements of H strictly positive, while, owing to the random initialization of the network weights, the off-diagonal elements are distributed around 0 and are symmetric. This means that H is positive definite at initialization and that this positive definiteness is preserved as training progresses. Undoubtedly, the function P and the BDNN framework have distinct advantages in complexity and computational cost over approaches employing the Cholesky decomposition.
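The initialization argument can be checked numerically: replacing the freshly initialized network with random near-zero outputs, a Softplus diagonal dominates the small symmetric off-diagonals, so H starts out positive definite (the sizes and noise scales below are our own illustration):

```python
import numpy as np

def softplus(x):            # output activation for the main-diagonal elements
    return np.log1p(np.exp(x))

def mish(x):                # hidden-layer activation used in BDNN
    return x * np.tanh(softplus(x))

rng = np.random.default_rng(0)
N = 4
off = rng.normal(0.0, 0.05, size=(N, N))      # stand-in for near-zero raw outputs
H = 0.5 * (off + off.T)                       # off-diagonals: small and symmetric
np.fill_diagonal(H, softplus(rng.normal(0.0, 0.1, size=N)))  # diagonal: positive

# softplus(x) is about 0.69 for x near 0, so the diagonal dominates the rows
eigs = np.linalg.eigvalsh(H)                  # all eigenvalues positive
```

By Gershgorin's theorem, a diagonal near softplus(0) ≈ 0.69 with off-diagonal row sums well below that guarantees the positive eigenvalues observed here.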
When optimization training is performed, we construct the state equations through the forward dynamics mode of the BDNN, as shown in Fig. 1. There are many optimization methods, such as RNN, RK-Net, and neural ODEs; neural ODEs are widely favored because of their concise form. For the state equations ẋ = f(x, u, θ, ε) formed by the BDNN, the key step in neural ODEs is to compute the gradient of the output with respect to the input, where the H network is influenced by the function P. To solve this problem, the function D is derived as shown in Fig. 2b; with the function D, we calculate the gradient from the matrix to the network output tensor so that the backpropagation can be completed. For a more detailed process, please refer to Appendix A. Moreover, we have a different focus from the classical algorithm: neural ODEs are designed to pursue minimal memory cost, whereas we are more concerned with modeling accuracy. Therefore, we adjust the neural ODEs, record the variables in the process of the ODE solver and track the numerical trajectory to perform backpropagation.

Fig. 1 The right side shows the source of the training data, and the model is trained through state-space equations that can be specified by the forward dynamics. During optimization, the ODE solver is used to estimate the state transition and obtain the loss. The completion of forward-dynamics training means that the inverse dynamics are obtained at the same time by nets H and R

Fig. 2 The mapping graph of the prior functions P and D. a In forward propagation, the tensor H_T output from the net is transformed into the matrix H through P, while in backpropagation, the gradient vector Ĥ_T is obtained by P⁻¹ in the form of a fold summation and further propagated into the network. b Since the function P transforms between H and H_T, the function D is needed when calculating the gradient; it converts any N × 1 dimensional vector s into an N × N(N+1)/2 matrix
First, we take an arbitrary norm of the estimation error as the instantaneous loss:

ℓ_t = ‖x̂_t − x_t‖ (5)

and the overall loss for the entire time series is:

L = Σ_t ℓ_t (6)

Then, we implement backpropagation based on numerical trajectory backtracking, which means that the data recorded during the forward integration are used for the reverse integration. The advantage of this is that it prevents the possible nonclosure of the forward and backward integrals when the time horizon is long. We document the details in Algorithm 1, and the adjusted method makes the gradient computation a hybrid of RK-Net and NODE. Because net H is directly designed for the forward dynamics, BDNN requires no matrix inversion during training, so there is no risk of an ill-conditioned H(q, θ) interfering with the optimization. Subsequent experiments will prove that this measure contributes greatly to the modeling accuracy, especially the estimation accuracy of the physical parameters. As training progresses, H(q, θ) gradually approaches the real value of the physical system, and the physical parameters can then be estimated. Since the derivative of the inverse of any matrix A satisfies (A⁻¹)′ = −A⁻¹A′A⁻¹, the following holds:

Ṁ(q) = −M(q)Ḣ(q)M(q) (7)

where the estimate of Ḣ_T(q, q̇, θ) is calculated by net H through the chain rule:

Ḣ_T(q, q̇, θ) = (∂H_T(q, θ)/∂q) q̇ (8)

From the prior function P, we then obtain Ḣ = P(Ḣ_T). Simultaneously, through Lagrangian Mechanics, we have:

C(q, q̇)q̇ = Ṁ(q)q̇ − (1/2) ∂(q̇ᵀM(q)q̇)/∂q (9)

According to Eq. (3), the Coriolis and centrifugal forces C(q, q̇)q̇ and the gravity G(q) can therefore be calculated as:

G(q) = R(q, q̇) − C(q, q̇)q̇ (10)

The above analysis shows that BDNN retains the advantage of deep learning, because the network only needs to be specified by the states and control signals. In addition, benefiting from the compatibility of our network with Lagrangian Mechanics, we can also compute estimates of the physical parameters, similar to an analytic model.
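The Lagrangian extraction of the Coriolis and centrifugal forces can be verified numerically on a system with a known inertia matrix. Below is a sketch on a hypothetical two-link arm with the textbook M(q) (the constants alpha, beta, delta are made up for illustration; derivatives are taken by central finite differences rather than by a network):

```python
import numpy as np

alpha, beta, delta = 2.5, 0.8, 0.9

def M(q):
    # Textbook two-link inertia matrix, depending only on the second joint angle
    c2 = np.cos(q[1])
    return np.array([[alpha + 2 * beta * c2, delta + beta * c2],
                     [delta + beta * c2,     delta]])

def coriolis_force(q, qd, eps=1e-6):
    """C(q, qd) qd = Mdot(q) qd - 0.5 * d/dq (qd^T M(q) qd), via central differences."""
    n = len(q)
    dM_dq = np.zeros((n, n, n))
    grad_quad = np.zeros(n)
    for i in range(n):
        dq = np.zeros(n); dq[i] = eps
        dM_dq[i] = (M(q + dq) - M(q - dq)) / (2 * eps)
        grad_quad[i] = (qd @ M(q + dq) @ qd - qd @ M(q - dq) @ qd) / (2 * eps)
    Mdot = np.tensordot(qd, dM_dq, axes=1)   # Mdot = sum_i (dM/dq_i) * qd_i
    return Mdot @ qd - 0.5 * grad_quad

q, qd = np.array([0.3, 0.7]), np.array([1.0, -0.5])
# Closed-form Coriolis vector of the same arm, used as a check:
s2 = np.sin(q[1])
expected = np.array([-beta * s2 * (2 * qd[0] * qd[1] + qd[1] ** 2),
                     beta * s2 * qd[0] ** 2])
```

The finite-difference result matches the closed-form Coriolis vector, which is the same consistency the paper relies on when Ḣ_T comes from the network's chain rule instead of finite differences.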
Differing from other deep learning algorithms for dynamic modeling, BDNN is committed to balancing accuracy and computational expense: Computational expense There are many methods for embedding physical priors, but the stricter the prior, the more complex it becomes. BDNN avoids the Cholesky decomposition and exploits the symmetry of H(q, θ) to reduce the width of the tensor H_T(q, θ). The framework makes H(q, θ) diagonally dominant at initialization and lets it transition smoothly to the optimized value. We also derive the prior functions P and D to simplify the network. These measures embed a physical prior with little additional computation.
Accuracy We are committed to eliminating sources of error and possible losses of accuracy, which include avoiding the second derivatives of neurons in backpropagation, improving the neural differential equations through backtracking, and avoiding the gradient fluctuations and additional errors of matrix inversion.

Algorithm 1 Backpropagation to compute the BDNN gradient
Input: dynamic parameters θ, ε; time horizon 0 ∼ T; time step h
Output: loss gradients ∂L/∂θ, ∂L/∂ε
Forward propagation:
for t = 0 : T do
  I_1,t = x_t, X_1,t = f(I_1,t, u_t, θ, ε)
  I_2,t = x_t + (h/2)X_1,t, X_2,t = f(I_2,t, u_t, θ, ε)
  I_3,t = x_t + (h/2)X_2,t, X_3,t = f(I_3,t, u_t, θ, ε)
  I_4,t = x_t + hX_3,t, X_4,t = f(I_4,t, u_{t+1}, θ, ε)
  x_{t+1} = x_t + (h/6)(X_1,t + 2X_2,t + 2X_3,t + X_4,t)
end for
return I_1,t, I_2,t, I_3,t, I_4,t, X_1,t, X_2,t, X_3,t, X_4,t
Backpropagation: initialize ∂L/∂θ = 0 and ∂L/∂ε = 0, then backtrack the recorded stages from t = T to 0, propagating the loss gradients through X_4,t, …, X_1,t and accumulating ∂L/∂θ and ∂L/∂ε.

Simple but effective priors Another ingenious application of the function P is to blunt the gradients of the nondiagonal elements. The gradient of the diagonal is unaffected, while the others are superimposed based on the symmetry of their locations. The latter values are randomly positive and negative and decrease with superposition. As a result, the diagonal gradients become significantly larger than the nondiagonal ones; that is, the diagonal elements are always trained first. Therefore, H(q, θ) is not only symmetric but also tends to be strictly diagonally dominant during training, which contributes to its positive definiteness. Compared to other, more complex constraints, the functions P and P⁻¹ are simple but effective physical priors: P ensures the symmetry of H(q, θ), and P⁻¹ promotes its positive definiteness.
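The record-then-backtrack structure of Algorithm 1 can be sketched as follows. The dynamics `f` is stubbed with a linear damped oscillator so the skeleton runs; in BDNN, `f` is the learned state equation, and the backward pass would push loss gradients through the stored stages (the gradient accumulation itself is only indicated by a comment):

```python
import numpy as np

A = np.array([[0.0, 1.0], [-4.0, -0.4]])   # stand-in damped oscillator

def f(x, u):
    return A @ x + u

def forward_record(x0, us, h):
    """RK4 integration that stores the stage inputs I_k and slopes X_k per step."""
    x, records, traj = x0, [], [x0]
    for u in us:
        I1 = x;                 X1 = f(I1, u)
        I2 = x + 0.5 * h * X1;  X2 = f(I2, u)
        I3 = x + 0.5 * h * X2;  X3 = f(I3, u)
        I4 = x + h * X3;        X4 = f(I4, u)
        records.append((I1, I2, I3, I4, X1, X2, X3, X4))
        x = x + (h / 6.0) * (X1 + 2 * X2 + 2 * X3 + X4)
        traj.append(x)
    return traj, records

def backward_backtrack(records):
    """Backward-pass skeleton: walk the recorded stages in reverse order.
    BDNN would propagate dL/dtheta and dL/deps through X4..X1 here
    (via the chain rule and P^-1); we only show the traversal."""
    for I1, I2, I3, I4, X1, X2, X3, X4 in reversed(records):
        pass  # accumulate parameter gradients from the stored stages

x0 = np.array([1.0, 0.0])
us = [np.zeros(2)] * 100
traj, records = forward_record(x0, us, h=0.01)
backward_backtrack(records)
```

Recording every stage trades memory for exactness: the backward pass revisits the very numbers the solver produced, which is what prevents the forward/backward nonclosure mentioned above.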

Experiments
In the experiments, we aim to justify the performance of our algorithm on the forward and inverse dynamics and the physical analyzability described above. According to the different applications, we verify the following three properties:
- Ability of the forward dynamics model in long sequence time-series forecasting (LSTF).
- Better inverse dynamics accuracy than existing methods in real-time computation.
- More accurate physical parameter estimation.
The experiments are conducted through both simulation and practice, and both are equally important. For real systems, we can only observe the end-to-end inputs and outputs; the internal parameters can only be accurately obtained in simulations. Furthermore, end-to-end modeling accuracy does not mean that the physical analysis is also accurate, because less intuitive statistics obscure specific problems and tend to exaggerate model performance. Therefore, the estimation of the physical parameters is one of the focuses of our experiments. After that, we verify how the BDNN performs on a real system. Before the experiments, we would like to share a few additional tips that we used:
- Loss function: in our tests, the training effect of the L1 loss is slightly better than that of the L2 loss.
- Regularizer: although strict regularization may degrade the end-to-end accuracy, in our experiments the Lipschitz regularizer [28] slightly improves the estimation of the physical parameters.
- Training samples: the extreme values of the Coriolis and centrifugal forces of the manipulator only appear in particular states, and the distribution of the samples must be designed with this specifically in mind.

Experimental setup
The experiments consist of two parts: simulations and practice. In the simulations, we use BDNN to verify the performance of the forward and inverse dynamics and the parameter estimation on a 2-axis and a 6-axis robot. For the simulated 2-axis robot, we define the arm lengths as 0.6 m and 0.5 m and the moments of inertia as 3.8 kg·m² and 1.8 kg·m², respectively. For the 6-axis robot, the data are derived from the MATLAB Robotics System Toolbox. In the practical part, we use BDNN to model a physical 7-axis robot to confirm the performance on a real system; here we use a Franka Emika Panda.
Baseline models Among a series of existing studies, we select CHNN and DeLaN to compare with our work, owing to their representativeness: CHNN shows the highest accuracy in horizontal comparisons [11,25], while DeLaN, although not accurate enough, is compatible with Lagrangian Mechanics [23]. The former provides a benchmark for model accuracy, while the latter is a reference for physical analyzability. As a further benchmark, conventional parameter identification (PI) is also included in the baselines; here we use IDIM-LS.
Model variants When approximating the objective function, both the baseline models and BDNN use fully connected neural networks. The hidden layers of the baseline models are defined with 200 neurons, while BDNN is set to 100 neurons, since it contains two parallel networks.
Data generation As a simple system, the 2-axis robot is driven open-loop with random sinusoidal torques; for the complex 6- and 7-axis robots, we use PID controllers to make the robots track random single and superimposed sinusoidal trajectories. We use a total of 200 random trajectories for the 2-axis robot, each with a sampling length of 15 s and a time step of 0.001 s. The observed states are the angular displacements and velocities, generated with the "ode45" solver in the simulations. For the 6- and 7-axis robots, we use twice as much data as for the 2-axis robot, and the extra part is used to excite the extrema of the Coriolis and centrifugal forces to balance the distribution of the data. Compared to deep learning, PI methods rely on different types of samples: we estimate the accelerations of the robots with finite Fourier series under closed-loop control and select the optimal samples for IDIM-LS based on the condition number.
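The open-loop excitation for the 2-axis robot can be sketched as below; the amplitude and frequency ranges are our own placeholder choices, since the paper does not list them:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_sine_torque(n_joints, t, n_terms=3):
    """Sum of a few random sinusoids per joint, sampled on the time grid t."""
    u = np.zeros((len(t), n_joints))
    for j in range(n_joints):
        for _ in range(n_terms):
            amp = rng.uniform(0.5, 2.0)        # N*m, placeholder range
            freq = rng.uniform(0.2, 2.0)       # Hz, placeholder range
            phase = rng.uniform(0, 2 * np.pi)
            u[:, j] += amp * np.sin(2 * np.pi * freq * t + phase)
    return u

t = np.arange(0.0, 15.0, 0.001)                # 15 s at 0.001 s steps, as in the text
u = random_sine_torque(2, t)                   # one random torque trajectory
```

Feeding such torque sequences to the simulated robot and recording q and q̇ yields one of the 200 training trajectories described above.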
Model training In gradient descent, we use the Adam optimizer [29] with 500 epochs. The sample set is split into a large number of segments, the time horizon of each segment is set to T = 5, and we choose "RK4" as the numerical integration scheme in Neural ODE.
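The segment split described above can be sketched as follows (the indexing convention and the 1-D stand-in trajectory are ours):

```python
import numpy as np

def split_segments(x_traj, u_traj, T=5):
    """Cut one trajectory into training segments with a time horizon of T steps.

    Each segment provides an initial state, the control window, and the T
    target states that the ODE solver's rollout is compared against."""
    segs = []
    for s in range(0, len(u_traj) - T, T):
        segs.append((x_traj[s], u_traj[s:s + T], x_traj[s + 1:s + T + 1]))
    return segs

x = np.arange(100.0)       # stand-in state trajectory (1-D for brevity)
u = np.arange(99.0)        # one control input per transition
segs = split_segments(x, u)
```

Short horizons keep the recursive RK4 rollout cheap per gradient step while still exposing the model to multi-step error accumulation.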

Modeling in simulations
As mentioned previously, the main purpose of modeling in simulations is to verify the model accuracy, including the forward and inverse dynamics and the physical analyzability. We demonstrate the performance of the baseline models and BDNN on a 2-axis robot and a 6-axis robot. The 2-axis robot is a common subject in existing research; it has typical multi-input multi-output (MIMO), nonlinear and coupled characteristics, but the model is relatively simple. We also extend the experiments to the 6-axis robot, whose dynamics are much more complex, to verify the adaptability of the BDNN to complex systems.

Forward dynamics
The LSTF based on the forward dynamics is the most important index in modeling because the sequences of states x_t must be estimated recursively. This application demands the highest model accuracy because, over time, even small errors are gradually amplified through integration and cause the model's behavior to diverge from the ground truth. Therefore, we set up a group of 10 prediction tasks of up to 15 s as tests, each of which is benchmarked with the symmetric mean absolute percentage error (SMAPE). We then evaluate the accuracy of each model by the arithmetic mean of the statistics and display the run closest to the mean. We observe the performance of the different models on the 2-axis robot, as shown in Fig. 3a. Apparently, all the models exhibit appreciable accuracy in all 4 dimensions of the state x_t and show no tendency to diverge in the recursive computation. However, it can be clearly observed that BDNN performs significantly better than DeLaN and even slightly better than CHNN.

Fig. 3 Long sequence time-series forecasting by the models and their absolute errors for the robots. Each model shares the same initial conditions and input series as the ground truth. The coordinate system on the left shows the trajectories predicted by the BDNN and the baseline models, and the coordinate system on the right shows their absolute errors. a Estimation and absolute error for the 2-axis robot. b Estimation and absolute error for the 6-axis robot
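For reference, one common definition of the SMAPE score used in these benchmarks is the following (the paper does not spell out which exact SMAPE variant it uses):

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, averaged over all samples.

    Ranges from 0 (perfect) to 2; the small floor on the denominator
    guards against division by zero when both values vanish."""
    num = np.abs(y_pred - y_true)
    den = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return np.mean(num / np.maximum(den, 1e-12))

# Example: a prediction of 3.0 against a true value of 1.0 scores 1.0
score = smape(np.array([1.0]), np.array([3.0]))
```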
When we turn our attention to the more complex 6-axis robot and widen the dimension of x_t to 12, the situation changes slightly: DeLaN shows errors in its predictions and a divergent trend, as shown in Fig. 3b. In contrast, BDNN continues to maintain high accuracy similar to CHNN. This means that BDNN not only achieves excellent performance in simple systems but also adapts to more complex ones.

Inverse dynamics
The inverse dynamics estimate u, the driving torque of each joint of the robot. As in Sect. 5.1.1, the demonstration of the inverse dynamics is also based on an approximate average of the statistical performance; the difference is that, unlike for the forward dynamics, only the mean squared error (MSE) is used as the evaluation index. Unlike the forward dynamics, the common application of the inverse dynamics is control, and many studies therefore focus on learning the dynamics during control rather than on modeling. Conversely, most modeling algorithms, including CHNN, cannot compute the inverse dynamics, which costs them these application opportunities. This is an advantage of BDNN, because it obtains the inverse dynamics model simultaneously through training. Among the algorithms we list, only DeLaN can be used for comparison, and the performance of the two on the 2-axis robot is shown in Fig. 4a.
To further test their performance on a complex system, we compare them on the 6-axis robot, as shown in Fig. 4b. As in the forward dynamics, the performance of DeLaN drops significantly with the complexity of the learned object, while BDNN continues its previous performance. This shows that BDNN remains better than DeLaN in adapting to complex systems.

Physical analyzability
Physical analyzability is the focus of our research, and it comes from the compatibility of the framework with Lagrangian Mechanics. Therefore, DeLaN serves as the reference for our algorithm. The contents of the analysis include the inertia matrix M(q), the Coriolis and centrifugal forces C(q, q̇)q̇ and the gravity G(q). To present the results more concisely, we replace M(q) with the inertial force M(q)q̈. The comparison based on the 2-axis robot is shown in Fig. 5.
DeLaN shows slight errors for M(q)q̈ and G(q), as shown in Fig. 5a and c. As a further derived value, the error of C(q, q̇)q̇ increases significantly, as shown in Fig. 5b. In machine learning, the fitted function is constrained in the 0th-order neighborhood of the ground truth, while the accuracy of C(q, q̇)q̇ is determined by the 1st-order neighborhood of the M(q) fit. This is also why we would rather sacrifice a little end-to-end accuracy to use the Lipschitz regularizer, as it provides a constraint on the 1st-order neighborhood. Owing to its advantages in modeling, the physical parameters estimated by the BDNN are more accurate and of practical value. To confirm this, we also examine the performance of the BDNN on the 6-axis robot, as shown in Fig. 6.
As expected, the physical analyzability of BDNN continues to lead the baseline model; even for complex systems, it can still accurately estimate the physical parameters, which opens new possibilities for applying deep learning to dynamics.

Statistics
As a summary of the simulation experiments, statistics are used for an overall evaluation of the baseline models and BDNN. As mentioned above, SMAPE is used for forward dynamics, while MSE is used for the others. The results for the 2-axis robot are shown in Fig. 7. More importantly, our algorithm is not only applicable to 2-axis robots but also performs well on 6-axis robots, although the dynamics of the latter are much more complex, as shown in Fig. 8. According to the functions of BDNN, the statistics are presented from three aspects.
Forward dynamics: Among the deep learning methods, BDNN competes with or even slightly surpasses CHNN, the latter being one of the best models we referenced. For the 2- and 6-axis robots, the average SMAPE of LSTF for CHNN is 3.11 × 10⁻² and 3.66 × 10⁻², while that of BDNN is 1.85 × 10⁻² and 2.41 × 10⁻², respectively. As for PI, existing research shows that the accuracy of conventional methods is only similar to that of DeLaN [23]. Although we optimized PI to force better performance, it remains uncompetitive with BDNN.
Inverse dynamics: Although CHNN performs well in forward dynamics, it cannot compute inverse dynamics or support physical analysis. Compared to PI and DeLaN, the MSE of BDNN maintains a clear advantage.
Physical analysis: Statistics show that BDNN performs best overall, but PI's performance is also noteworthy, especially in estimating the Coriolis and centrifugal forces. For this single item, PI is slightly more accurate than BDNN on the 2-axis robot, but BDNN performs better on the 6-axis robot.
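For reference, the two evaluation indices can be implemented in a few lines. This is a sketch of a common SMAPE variant (range [0, 2] with a mean-of-magnitudes denominator); the paper's exact normalization may differ.

```python
import numpy as np

def smape(y_true, y_pred, eps=1e-12):
    """Symmetric mean absolute percentage error.
    Assumption: denominator is the mean of the two magnitudes, giving a
    score in [0, 2]; other SMAPE conventions scale differently."""
    num = np.abs(y_pred - y_true)
    den = (np.abs(y_true) + np.abs(y_pred)) / 2.0 + eps
    return float(np.mean(num / den))

def mse(y_true, y_pred):
    """Mean squared error, used for inverse dynamics and physical analysis."""
    return float(np.mean((y_pred - y_true) ** 2))
```

SMAPE is preferred for forward dynamics because the state trajectories of different joints span very different magnitudes, which a relative metric handles more fairly than MSE.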

Modeling in practice
The purpose of modeling in practice is to confirm the adaptability of our framework to real systems. Unlike in simulations, many state variables and physical parameters of real systems are unknown or only estimated, which makes accurate ground truth difficult to obtain. These limited observations mean that verification based on simulation can be carried out more comprehensively, while verification on physical hardware can only serve as a supplement. On a real robot, only the driving torque u and the position response q can be measured directly, and this relationship imposes strict requirements on the model when it is described recursively using forward dynamics. This is why we focus only on forward dynamics in practice. In addition, rigid bodies far from the end-effector contribute more strongly to the dynamics, which can be clearly observed in Sects. 5.1.1 and 5.1.4. Therefore, we only show the first two joints of Franka, as shown in Fig. 9a.
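The recursive use of forward dynamics mentioned above can be sketched as follows: the model is rolled out from measured torques u alone, and only the resulting position trajectory q is compared against measurements. M_hat and h_hat are toy stand-ins (an assumption for illustration) for the learned inertia and the combined Coriolis/gravity torques.

```python
import numpy as np

def M_hat(q):
    # Toy constant inertia estimate (stand-in for the learned network).
    return np.array([[1.5, 0.0],
                     [0.0, 1.0]])

def h_hat(q, qd):
    # Toy combined Coriolis/centrifugal + gravity torques; here a simple
    # viscous-friction-like term is used purely for illustration.
    return 0.1 * qd

def rollout(q0, qd0, u_seq, dt=1e-3):
    """Recursively integrate qdd = M(q)^-1 (u - h(q, qd)) from torques,
    returning the position trajectory for comparison with measured q."""
    q, qd = q0.copy(), qd0.copy()
    traj = [q0.copy()]
    for u in u_seq:
        qdd = np.linalg.solve(M_hat(q), u - h_hat(q, qd))
        qd = qd + dt * qdd          # semi-implicit Euler step
        q = q + dt * qd
        traj.append(q.copy())
    return np.array(traj)

traj = rollout(np.zeros(2), np.zeros(2), [np.array([1.0, 0.0])] * 100)
```

Because errors in the learned dynamics accumulate over the whole rollout rather than a single step, matching the measured q over long horizons is a strict test of the model, which is exactly why this setting is used for real-robot validation.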
The experimental results show that BDNN is competent for real physical systems, and its performance is consistent with that in simulation, as shown in Fig. 3b; the errors of both remain at the same order of magnitude.

Results
The experiments ranged from simple systems, exemplified by the 2-axis robot, to complex systems, exemplified by the 6-axis robot, and practicability was validated on a physical robot. Through a series of measures to improve accuracy, we ultimately optimized the performance of BDNN to be the best among the models we referenced. In this regard, PI is an excellent benchmark; it is better than DeLaN in LSTF but weaker than CHNN and BDNN, because CHNN and BDNN are based on differential dynamics and are better at state prediction. Lagrangian Mechanics is the key we emphasized; however, it does not always perform perfectly in neural networks such as DeLaN. BDNN successfully eliminated potential sources of error and brought networks based on Lagrangian Mechanics to a higher level. Moreover, although DeLaN was designed around inverse dynamics while BDNN was designed around forward dynamics, BDNN's accuracy in inverse dynamics is still higher than DeLaN's, and even higher than PI's. This indicates that the estimate of the matrix M(q) is accurate and that the physical prior we designed is effective. BDNN also generally performs best in physical analysis, being only slightly weaker than PI in estimating the Coriolis and centrifugal forces for the 2-axis robot. This is because BDNN estimates this item in the 1st-order neighborhood while PI does so in the 0th-order neighborhood. However, the same estimate from BDNN is more accurate for the 6-axis robot, because PI's performance decreases with the complexity of the system, especially since the PI method is based on closed-loop sampling and introduces control errors. All the above experiments confirm the advantages of BDNN, and we believe that its accuracy can satisfy most dynamic-model applications and, to a certain extent, substitute for the analytic model.

Conclusions
In this work, the general dynamic structure of a mechanical system was taken as prior knowledge and incorporated into the neural network to construct a novel network model with forward dynamics, inverse dynamics, and physical analysis capabilities. We extensively referenced existing methods, including conventional identification and novel deep learning methods, and selected several of the most representative as baselines, guided by existing horizontal comparative research. The results show that our work equals or even slightly outperforms the best reference model for robots. The novel structure and improved algorithm we designed enable our model to achieve better predictions, and this is achieved by learning a physically interpretable, continuous state-space model.
It is noteworthy that some papers claim online learning abilities, for example during the control process; however, BDNN is not recommended for this. Although BDNN can also establish accurate forward dynamics from a single trajectory and in few-shot learning, our tests have shown that the estimation of physical parameters in this case is not always accurate. This is a topic unique to deep learning, and a great deal of research is being conducted on it. However, we have not yet extended our work to this area, and we look forward to discussing and researching it in the future.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.