Neural network modeling and dynamic behavior prediction of nonlinear dynamic systems

In practical engineering, it is often difficult to establish complex nonlinear dynamic equations from theories of mechanics alone. In this paper, data-driven models are built with neural networks to meet the needs of high dimension, multiple scales and high precision. Based on the linear multi-step method, a two-coefficient loss function is constructed for both whole data-driven modeling and substructure data-driven modeling. The forward Euler method is combined with the trained neural networks to predict the response of a five-degree-of-freedom Duffing oscillator system. Comparative results show that the prediction accuracy of substructure data-driven modeling is higher than that of whole data-driven modeling, and the generalization and robustness of the model are verified. Meanwhile, the selection of training data and the number of hidden layers have a great impact on the prediction ability. With an adjustable learning rate, adding control parameters to the network input performs better than omitting them.


Introduction
The modeling of dynamic systems plays an important role in solving practical engineering problems. Most classical modeling methods are based on theories of mechanics, such as shear deformation theory [1], geometric nonlinearity [2], and nonlocal strain gradient theory [3]. Beams, plates and shells are the basic structures in the engineering field and are generally modeled by theories of mechanics. As the structure becomes more complicated, the difficulty of modeling increases. Wicker [4] analyzed the nonlinear vibration of the axially moving tensioned beam through Euler-Bernoulli theory. Belabed et al. [5] established system equations for functionally graded material plates and proposed a novel higher-order shear and normal deformation theory. The modified couple stress theory and the first-order shear deformation shell model were applied to derive a new formulation for the functionally graded cylindrical shell using Hamilton's principle [6]. For complex multibody systems, theoretical modeling is valuable but challenging. Rong et al. [7] concluded that the dynamic equations of different forms of flexible multibody systems can be derived from different mechanical principles. As it is difficult to set up precise dynamic equations due to the complexity of the problem backgrounds and research objects [8], the model is initially estimated roughly from data (as in epidemic models [9], inventory assessment models [10], and stock models [11]). With the advent of the big-data era, it has become possible to develop sufficiently accurate mathematical models from data.
Data-driven modeling is a method to train models through data integration and refining, and includes parametric methods and nonparametric methods. Parametric methods generally need to assume a model structure to be identified; the model error is then reduced according to the data, and the undetermined parameters are updated. Combining least squares and maximum likelihood to identify models is a classical parametric method. Nevertheless, in practical engineering problems, the equations of systems with nonlinear structures are difficult to determine. For these kinds of nonlinear systems, nonparametric methods show more flexibility. These methods simply explore the mapping relationship between input data and output data without prior information about the physical properties of the system. The nonlinear autoregressive moving average with exogenous input (NARMAX) model is widespread among nonparametric modeling methods, though NARMAX has the drawbacks of large storage requirements and low data stability.
Machine learning is a newly emerged technology for learning from data, which powers all kinds of applications: from information filtering to image recognition to intelligent monitoring. It is divided into supervised learning and unsupervised learning and includes a variety of algorithmic models, such as regression models, Naive Bayesian models, and clustering algorithms. Machine learning is suitable for classification, prediction and other problems, further promoting the development of data-driven modeling. Chiuso and Pillonetto [12] employed machine learning to identify parameters of linear and nonlinear systems. Focusing on nonlinear systems, Li and Yamamoto [13] extended the linear regression model to the polynomial regression model and verified the effectiveness of model-free predictive control. Guo et al. [14] studied the multi-input, multi-output nonlinear system described by the Takagi-Sugeno fuzzy model, applying a fuzzy C-means clustering data-driven algorithm and the least squares method to system identification.
As a significant approach to machine learning, neural networks are network structure models composed of many artificial neurons. The interconnected neurons have nonlinear transfer functions, giving neural networks strong nonlinear fitting ability [15][16][17][18][19]. For this reason, neural networks can better reconstruct complicated dynamic systems, so they are widely used in motion simulation, trajectory prediction and preventive control problems [20][21][22]. Feedforward neural networks [23], convolutional neural networks [24], long short-term memory neural networks [25,26] and echo state networks [27,28] have been selected for data-driven prediction of various systems. Over decades of research, various improvements of neural networks have been developed to raise model accuracy. Sharma and Padhy [29] added an extended B-polynomial to the network structure for time-delay systems and proved the convergence of the algorithm with the help of Lyapunov stability theory. Han et al. [30] set up an error correction step, additionally training the deep neural network to approximate the model error so as to increase network performance. Eshkevari et al. [31] designed a physics-based recurrent neural network model to estimate system dynamics and proposed several techniques to significantly accelerate the learning process using smaller datasets. They also compared the performance of first-order and second-order optimizers and presented a projection loss function to replace the traditional loss function.
The loss function is crucial in the optimization process of neural networks [32][33][34][35]. By calculating the loss function and its partial derivatives, the network parameters are updated constantly. In the following research, new loss functions are created to investigate nonlinear dynamics. Adopting the feedforward neural network, Pan and Duraisamy [36] studied long-time predictive modeling of nonlinear dynamic systems. Jacobian regularization was introduced into the loss function, which can enhance the accuracy and robustness of the model. Zhang et al. [37] pointed out that embedding physical constraints in the loss function can not only strengthen model training, but also exactly capture potential system nonlinearity.
Neural networks are also combined with other methods or models for higher precision. For example, Lu et al. [38] presented a hybrid neural-network-interpolation (NN-I) surrogate model to provide accurate responses under limited realization times. Morales and Yu [39] made use of the statistical model obtained by the Bayesian method to change the structure of the neural network, showing that the model performs better when the noise is large and the system dynamics are complex. In order to meet the requirements of efficient and precise modeling of nonlinear jointed structures, Ma et al. [40] constructed equivalent neural networks to replace nonlinear joints; a hybrid modeling method combining neural networks with simplified finite element models was proposed and used to predict the inherent nonlinear behavior in a numerical example. Yu et al. [41] advanced a physics-guided machine learning method using the recurrent neural network and multilayer perceptron, which had better generalization ability and lower training cost and only needed to simulate the unknown physical part. Li et al. [42] proposed a hybrid forecasting model composed of empirical mode decomposition, convolutional neural networks and gated recurrent units, and applied a butterfly algorithm based on quantum coding to model optimization, which achieved higher prediction accuracy.
The reconstruction of models with chaotic dynamic behavior is relatively complex due to the instability of chaos. Chaotic models developed with the fractal-fractional operator have been implemented in chaotic circuits and image encryption [43,44]. Forecasting chaotic time series via neural network techniques is still a challenging task. For chaotic circuit systems, Xu et al. [45] evaluated the performance of neural network models in short-term and long-term prediction. Huang et al. [46] designed a deep hybrid neural network to forecast chaotic time series, in which a convolutional neural network extracts spatial features and a gated recurrent unit network captures the spatio-temporal features of the combined sequence. Bompas et al. [47] compared the efficiency and accuracy of three neural network technologies in chaos simulation. Besides, the prediction of chaos using neural networks attracts considerable attention in a wide range of fields such as finance, traffic, and weather. As the above analysis shows, little research on building dynamic models with neural networks has concentrated on the influence of self-defined loss functions, the number of hidden layers and other factors on model precision.
Neural networks are not only applied to modeling, but also used to solve differential equations. A generalized residual network framework was established for solving ordinary differential equations and its feasibility was proved [48]. The physical information contained in the partial differential equations was taken as a regularization term. Then the partial differential equations were solved through this improved physics-informed neural network [49]. For nonlinear partial differential equations, wave solutions with higher precision were derived by the improved physics-informed neural networks [50,51]. An approach for solving ordinary and partial differential equations based on the fast converging neural network was introduced, eliminating the time-consuming optimization process in neural network training [52]. Derkevorkian et al. [53] incorporated the trained neural network into the ordinary differential equation (ODE) solver and the models only needed the initial conditions to forecast the response. It was mentioned that the modeling of nonlinear systems is more challenging than linear systems. In this paper, we integrate the idea of the above framework into the modeling procedure and take nonlinear systems as the objects to study the data-driven dynamics.
In general, without prior knowledge of the model structure, this article adopts a data-driven modeling method for nonlinear dynamic systems. Based on the modeling method of Cai et al. [54], substructure data-driven modeling is considered and a five-degree-of-freedom Duffing oscillator is investigated in this paper. The specific steps of modeling are presented first. The response data obtained from the known state equations under the given external excitation is taken as the measured data to train the neural network. According to the linear multi-step method and the known relationship between system input and output, a loss function with two coefficients is constructed. As the optimizer continues to update the weight parameters, training stops when the loss function reaches the corresponding precision. The trained neural network then replaces the unforced part on the right-hand side of the equations, yielding a new data model. The response data of the system is predicted through the forward Euler method and the neural networks. The above methods are then used for modeling and forecasting in a numerical example.
The displacement time series curve, amplitude-frequency response curve and phase diagram reflect the prediction ability and verify the effectiveness of the model. The influence of the loss function, the number of hidden layers, training data and noise on the accuracy of the model is further analyzed. Finally, on the basis of the adjustable learning rate, adding control parameters to the network input (APNI) and not adding them (NAPNI) are utilized to investigate the bifurcation of the system.

Problem description
A data-driven modeling method was proposed by Cai et al. [54], in which the response data obtained from the dynamic state equations was used to train the neural network. The part without excitation terms on the right-hand side of the equations was replaced by the trained neural network, which serves as a data model. The research question can be described as follows.
Consider a non-autonomous dynamic system

$$M\ddot{x}(t) + G(x(t), \dot{x}(t)) = F(t), \qquad (1)$$

where $x(t) = [x_1(t)\ x_2(t)\ \cdots\ x_n(t)]^{\mathrm T} \in \mathbb{R}^n$ is the displacement vector, $M$ represents the mass matrix, $G(x(t), \dot{x}(t))$ is the generalized restoring force and $F(t)$ represents the external excitation force.

Transform Eq. (1) into the state equations

$$\dot{x}(t) = v(t), \qquad \dot{v}(t) = -M^{-1}G(x(t), v(t)) + M^{-1}F(t), \qquad (2)$$

where $v(t) = [v_1(t)\ v_2(t)\ \cdots\ v_n(t)]^{\mathrm T} \in \mathbb{R}^n$ is the velocity vector. Equation (2) can be expressed as

$$\frac{\mathrm{d}X(t)}{\mathrm{d}t} = f(X(t)) + f(t), \qquad (3)$$

where $X(t) = [x^{\mathrm T}(t)\ v^{\mathrm T}(t)]^{\mathrm T} \in \mathbb{R}^{2n}$ is the state vector of the system, $f(X(t)) = [v^{\mathrm T}(t)\ (-M^{-1}G(x(t), v(t)))^{\mathrm T}]^{\mathrm T} \in \mathbb{R}^{2n}$, and $f(t) = [0^{\mathrm T}\ (M^{-1}F(t))^{\mathrm T}]^{\mathrm T}$ collects the excitation. Usually, the generalized restoring force $G(x(t), \dot{x}(t))$ contains nonlinear terms. In this paper, this term is assumed to be unknown, and a data-driven model of the system is to be built. Given the external excitation $f(t)$, the state vector of the corresponding state equations is solved to simulate the actual measurement data, which is input into the neural network for training. Using the trained neural network $\hat{f}(X(t))$ instead of $f(X(t))$, the following state equations can be used for response prediction:

$$\frac{\mathrm{d}X(t)}{\mathrm{d}t} = \hat{f}(X(t)) + f(t). \qquad (4)$$

Whole data-driven modeling method

The specific modeling steps are as follows.
(a) Obtain training data for the neural network. For different frequencies and amplitudes, there are $M$ time series of external excitation, denoted by $f_s(t_j)$, $s = 1, \ldots, M$, $j = 0, 1, \ldots, N$. The corresponding state equations are solved via the ODE solver, and the obtained time series of the state vector, $X^{f_s}(t_j)$, simulate the actual measurement data.

(b) Train the neural network. The data $X(t_j)$ are input into the neural network, and the output data are

$$Y(t_j) = \hat{f}(X(t_j)). \qquad (5)$$

Splitting the state into displacement and velocity blocks, $X(t_j) = [X_1^{\mathrm T}(t_j)\ X_2^{\mathrm T}(t_j)]^{\mathrm T}$ and $Y(t_j) = [Y_1^{\mathrm T}(t_j)\ Y_2^{\mathrm T}(t_j)]^{\mathrm T}$, the known structure of $f$ gives the relationship $Y_1(t_j) = X_2(t_j)$. In addition, the input and output data of the neural network satisfy the quantitative relationship formed by the linear multi-step method

$$\sum_{p=0}^{k} a_p X(t_{i-p+1}) = h \sum_{q=0}^{k} b_q \left[\hat{f}(X(t_{i-q+1})) + f(t_{i-q+1})\right], \qquad (6)$$

where $i = k-1, \ldots, N-1$, $N$ is the number of sampling points, $h$ is the sampling time interval, and $a_p$ and $b_q$ are constants. According to Eqs. (5) and (6), the loss function of the neural network is constructed. In this paper, the coefficient $c_1$ is introduced to enhance the prediction accuracy of data-driven modeling. Thus, the loss function is defined as

$$L = \sum_i \left\| c_1 \odot \left( \sum_{p=0}^{k} a_p X_2(t_{i-p+1}) - h \sum_{q=0}^{k} b_q \left[ Y_2(t_{i-q+1}) + g(t_{i-q+1}) \right] \right) \right\|^2 + \sum_j \left\| c_2 \odot \left( Y_1(t_j) - X_2(t_j) \right) \right\|^2, \qquad (7)$$

where $c_1$ and $c_2$ are constant weight vectors, $g(t)$ denotes the excitation block acting on the velocity equations, and $\odot$ is the element-wise product. The loss function is an index of the consistency between the network output data and the real data, similar to an error function. The ultimate goal of network training is to minimize the loss function. The Adam optimizer is chosen to update the weight parameters of the network in this paper. When the loss function value reaches a certain precision, the iteration stops. The training then ends, producing a trained neural network $\hat{f}(X)$.
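As an illustration, the sketch below evaluates a loss of this two-coefficient form in NumPy, using forward Euler as the simplest linear multi-step rule. The split into a multi-step residual weighted by $c_1$ and a velocity-identity residual weighted by $c_2$ is our reading of the construction, not the authors' exact implementation; the function name and array layout are illustrative.

```python
import numpy as np

# Sketch of the two-coefficient loss described above, using forward
# Euler as the simplest linear multi-step rule. Our reading of the
# text: c1 weights the multi-step residual on the velocity block,
# c2 weights the known identity that the first output block Y1
# equals the velocity data X2.
def two_coeff_loss(X2, Y1, Y2, g, h, c1, c2):
    # X2: velocity data, shape (N+1, n); Y1, Y2: network output blocks
    # g: excitation samples acting on the velocity block, shape (N+1, n)
    r_ms = X2[1:] - X2[:-1] - h * (Y2[:-1] + g[:-1])  # multi-step residual
    r_id = Y1 - X2                                    # known relation Y1 = X2
    return np.mean((c1 * r_ms) ** 2) + np.mean((c2 * r_id) ** 2)
```

In a training loop, this value would be minimized by the Adam optimizer, with gradients taken with respect to the network parameters that produce $Y_1$ and $Y_2$.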
(c) Replace $f(X(t))$ with $\hat{f}(X)$. For higher prediction accuracy, the forward Euler method and the neural network are combined to iterate and predict in this step. Given an initial value $X^{\mathrm{pre}}(t_0)$ as the input data at the first time-step, the neural network $\hat{f}(X)$ forecasts the output data $\hat{f}(X^{\mathrm{pre}}(t_0))$.

Equation (4) is solved by the forward Euler method

$$X^{\mathrm{pre}}(t_{j+1}) = X^{\mathrm{pre}}(t_j) + h\left[\hat{f}(X^{\mathrm{pre}}(t_j)) + f(t_j)\right].$$

The initial value $X^{\mathrm{pre}}(t_0)$ and the output data $\hat{f}(X^{\mathrm{pre}}(t_0))$ give the state vector $X^{\mathrm{pre}}(t_1)$ at the second time-step, which is regarded as the new input data of the neural network. In the same way, the input data at time-step $j+1$ is $X^{\mathrm{pre}}(t_j)$, and the predicted data from the neural network is $\hat{f}(X^{\mathrm{pre}}(t_j))$. Solving with the forward Euler method yields the state vector $X^{\mathrm{pre}}(t_{j+1})$ at time-step $j+2$, which is treated as the network input at that time. The network input is updated constantly and the above process repeated. Finally, the collection of state vectors at every time-step forms the predicted response data $X^{\mathrm{pre}}(t)$.
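Step (c) can be sketched as a plain iteration. Here `f_hat` stands in for the trained network and `f_ext` for the excitation term; both callables are illustrative placeholders, not the authors' code.

```python
import numpy as np

# Sketch of step (c): the trained network \hat f (a stand-in callable)
# is iterated with the forward Euler rule
#   X_pre(t_{j+1}) = X_pre(t_j) + h * ( f_hat(X_pre(t_j)) + f_ext(t_j) ).
def predict(f_hat, f_ext, X0, h, steps):
    X = np.asarray(X0, dtype=float)
    out = [X.copy()]
    for j in range(steps):
        X = X + h * (f_hat(X) + f_ext(j * h))
        out.append(X.copy())
    return np.stack(out)          # predicted response X_pre(t)
```

For example, with `f_hat = lambda X: -X` and zero excitation the loop reproduces the familiar Euler decay 1, 0.9, 0.81, ...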
The architecture of the neural network and forecast flow chart of whole data-driven modeling are shown in Fig. 1.

Substructure data-driven modeling method
In whole data-driven modeling, each input and output are presumed to be related, and a fully connected neural network with $2n$ inputs and $2n$ outputs is built. When $n$ is large, the whole data-driven model becomes slow to train and evaluate. In a practical system, however, each mass may only interact with the few masses connected to it. Based on this observation, establishing several neural networks, one per mass, reduces the effective degrees of freedom and thus the difficulty of modeling, and can improve the accuracy. This method is called substructure data-driven modeling.
The substructure data-driven modeling method focuses on the displacement and velocity data of a single object and of the objects associated with it. These data are input into a single neural network for training, and a loss function is constructed for each network. The specific modeling steps are as follows.

(a) Obtain training data as in the whole data-driven modeling and group them by mass.

(b) Establish a neural network $\hat{f}_{m_i}$ for each mass $m_i$, $i = 1, 2, \ldots, n$. For an interior mass ($i = 2, \ldots, n-1$), its input consists of $X_{m_i}$, $X_{m_{i-1}}$ and $X_{m_{i+1}}$. Since $X$ includes both the displacement and the velocity, the input data satisfy $X \in \mathbb{R}^6$ with output $Y \in \mathbb{R}^2$. For the leftmost and rightmost masses, the corresponding neural network contains 4 inputs and 2 outputs, meaning that $X \in \mathbb{R}^4$, $Y \in \mathbb{R}^2$.
Divide $X_{m_i}(t_j)$ into $X_1(t_j)$ and $X_2(t_j)$ according to displacement and velocity. Equivalently, the first component block of $Y(t_j)$ is $Y_1(t_j)$, and the last is $Y_2(t_j)$. According to the linear multi-step method and the known relationship of the system, the loss function of each network is created in the same two-coefficient form as in the whole data-driven modeling, where $c_1$ and $c_2$ are constants. When the loss function value reaches a certain precision, the iteration stops. At the end of the training, several trained neural networks $\hat{f}_{m_i}$, $i = 1, 2, \ldots, n$, are obtained.
(c) Replace $f(X(t))$ with the multiple trained neural networks $\hat{f}_{m_i}$. The initial value $X^{\mathrm{pre}}(t_0)$ is grouped by mass, in the same way as the training data. Take $X^{\mathrm{pre}}_{m_i}(t_0)$, $X^{\mathrm{pre}}_{m_{i-1}}(t_0)$ and $X^{\mathrm{pre}}_{m_{i+1}}(t_0)$ (the masses at both ends correspond to two input vectors) as the input data of the $i$-th neural network at the first time-step, which returns the output data $\hat{f}_{m_i}(X^{\mathrm{pre}}_{m_i}(t_0), X^{\mathrm{pre}}_{m_{i-1}}(t_0), X^{\mathrm{pre}}_{m_{i+1}}(t_0))$. Employing the forward Euler method, this output data and the initial value $X^{\mathrm{pre}}(t_0)$ give the state vector $X^{\mathrm{pre}}(t_1)$ at the second time-step.
This state vector is regarded as the new input data of the neural networks. Similarly, the input data at time-step $j+1$ is $X^{\mathrm{pre}}(t_j)$, and the forecast data of the $i$-th neural network is $\hat{f}_{m_i}(X^{\mathrm{pre}}_{m_i}(t_j), X^{\mathrm{pre}}_{m_{i-1}}(t_j), X^{\mathrm{pre}}_{m_{i+1}}(t_j))$. The forward Euler method is then used to obtain the state vector $X^{\mathrm{pre}}(t_{j+1})$ at time-step $j+2$, which becomes the network input at that time. The network input is updated constantly and the above process repeated. Finally, the collection of state vectors at every time-step forms the predicted response data $X^{\mathrm{pre}}(t)$.
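The per-mass grouping of the state described above can be sketched as follows; the function name and the layout (own displacement-velocity pair first, then the left and right neighbours) are our illustrative choices, not the authors' exact convention.

```python
import numpy as np

# Sketch of the substructure input layout: for mass i the i-th network
# sees its own displacement/velocity pair plus those of its neighbours
# (end masses have only one neighbour), so interior networks take 6
# inputs and the two end networks take 4.
def substructure_inputs(X, n):
    x, v = X[:n], X[n:]
    pairs = [np.array([x[i], v[i]]) for i in range(n)]
    inputs = []
    for i in range(n):
        neigh = [pairs[i]]               # own pair first
        if i > 0:
            neigh.append(pairs[i - 1])   # left neighbour
        if i < n - 1:
            neigh.append(pairs[i + 1])   # right neighbour
        inputs.append(np.concatenate(neigh))
    return inputs

ins = substructure_inputs(np.arange(10.0), 5)   # 5-DOF example state
sizes = [len(a) for a in ins]                   # input widths per network
```

Each array in `ins` would be fed to its own small network, in contrast to the single $2n$-input network of the whole data-driven modeling.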
The architecture of the neural networks and the forecast flow chart of substructure data-driven modeling are shown in Fig. 2.

A five-degree-of-freedom Duffing oscillator

Consider a five-degree-of-freedom Duffing oscillator system [55], as shown in Fig. 3. The system contains five identical masses moving in the horizontal direction. It is connected to six linear dampers with the same coefficient $c$ and six linear-nonlinear springs with the coefficients $k$ and $k'$. The third mass is subjected to sinusoidal excitation.
The system is described by the equations of motion given in Eq. (10). Non-dimensional time and displacement are then introduced; substituting the derivatives relating the dimensional and non-dimensional times into Eq. (10), the non-dimensional differential equations are obtained.
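The paper's exact non-dimensional equations are not reproduced above. As an illustration of the data-generation step, the sketch below simulates an assumed fixed-fixed Duffing chain (our assumption about the chain form) with a classical fourth-order Runge-Kutta integrator; the names `d`, `a` and `lam` follow the non-dimensional damping, cubic coefficient and excitation frequency used later in the text.

```python
import numpy as np

# Illustrative data generation for a five-mass chain of Duffing type.
# This assumes a fixed-fixed chain (x_0 = x_6 = 0) with damping d,
# cubic coefficient a, and forcing sin(lam * t) on the third mass;
# it is a stand-in for the paper's Eq. (10), not a reproduction of it.
def rhs(t, X, d=0.2, a=1.0, lam=1.2):
    x = np.concatenate([[0.0], X[:5], [0.0]])   # pad with fixed ends
    v = np.concatenate([[0.0], X[5:], [0.0]])
    acc = np.empty(5)
    for i in range(1, 6):
        lin = -(2 * x[i] - x[i - 1] - x[i + 1]) - d * (2 * v[i] - v[i - 1] - v[i + 1])
        cub = -a * ((x[i] - x[i - 1]) ** 3 + (x[i] - x[i + 1]) ** 3)
        acc[i - 1] = lin + cub + (np.sin(lam * t) if i == 3 else 0.0)
    return np.concatenate([X[5:], acc])

def rk4(f, X0, h, steps):
    # classical fourth-order Runge-Kutta integration
    X, out = np.asarray(X0, float), []
    for j in range(steps):
        t = j * h
        k1 = f(t, X); k2 = f(t + h / 2, X + h / 2 * k1)
        k3 = f(t + h / 2, X + h / 2 * k2); k4 = f(t + h, X + h * k3)
        X = X + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(X.copy())
    return np.array(out)

data = rk4(rhs, np.zeros(10), h=0.1, steps=3000)   # 300 s sampled at 10 Hz
```

The resulting `data` array plays the role of the "measured" state trajectories that the networks are trained on.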

Whole data-driven modeling
Through the linear multi-step method, the expression corresponding to this system is derived as Eq. (13), and based on Eq. (13) the corresponding loss function is constructed. According to the steps of the whole data-driven modeling, the neural network is trained with the known external excitation and response data. Replacing the term containing the generalized restoring force with the network yields the substitution equation

$$\frac{\mathrm{d}X(t)}{\mathrm{d}t} = \hat{f}(X(t)) + f(t),$$

where $f(t) = [0^{\mathrm T}\ (0\ 0\ \sin\lambda t\ 0\ 0)]^{\mathrm T}$ contains the excitation on the third mass and $\hat{f}(X(t))$ is the trained neural network. Supposing $d = 0.2$, $a = 1$, the five non-dimensional natural frequencies are 0.2846, 0.8308, 1.3097, 1.6825 and 1.9190, respectively. The frequencies of the training data are $\lambda_0 = 1.2, 1.3, 1.4, 1.5$, the sampling time is set to 300 s, and the sampling frequency is 10 Hz. The corresponding dynamic equations are solved via the ODE solver to obtain the numerical solution, i.e., the time-domain response data. The neural network has a single hidden layer with 256 neurons and a hyperbolic tangent transfer function. The number of training epochs is 500,000. Taking the trained neural network as the term containing the generalized restoring force in the state equations, the forward Euler method is used to calculate the response data at the prediction frequency $\lambda = 1.6$. The predicted displacement time series curves of the masses over different time periods are shown in Fig. 4.
In Fig. 4, the blue dotted line represents the numerical solution (accurate solution) obtained by solving the known dynamic equations with the ODE solver, and the red solid line is the predictive solution obtained from the data model built on the trained neural network. Figure 4a shows the displacement time series from 0 to 1000 s to reflect the overall trend. To observe the prediction results more clearly, Fig. 4b and c show the displacement time series from 0 to 160 s and from 900 to 1000 s. The predictive solution is close to the accurate solution.

Substructure data-driven modeling
In the five-degree-of-freedom Duffing oscillator system, each mass only acts on the masses connected to it, so the substructure data-driven modeling method is utilized to predict the dynamic behavior. A single hidden layer with 64 neurons and a hyperbolic tangent transfer function is preset.
Employing the substructure modeling method, the training data, sampling time, sampling frequency and training epochs are kept consistent with the whole data-driven modeling. The displacement time series curves obtained by substructure data-driven modeling are shown in Fig. 5.
Comparing Figs. 4b and 5b shows a better agreement between the predicted and accurate values in Fig. 5b. For further evaluation, Fig. 6 presents the absolute error in the $x_3$ dimension and the mean absolute error (MAE) over the five dimensions. The substructure modeling method has a higher degree of coincidence and builds a high-precision model. The following research is therefore based on the substructure data-driven modeling method, and the influence of the loss function, the number of hidden layers, the training data and noise on the accuracy of the model is analyzed with it. The model is used to forecast the maximum amplitude of the steady-state response $x_3$ at different frequencies $\lambda$. A single hidden layer, the loss function coefficient $c_2 = [0.1, 0.1, 0.1, 0.1, 0.1]^{\mathrm T}$ and input-data frequencies $\lambda_0 = 1.2, 1.3, 1.4, 1.5$ are chosen. By modifying the loss function coefficient $c_1$, the amplitude-frequency response curves in Fig. 7 are obtained. Figure 7a and b show the curves for $c_1 = [1, 1, 1, 1, 1]^{\mathrm T}$ and $c_1 = [1, 1, 1.8, 1, 1]^{\mathrm T}$, respectively. The blue dotted line is the accurate value and the red solid line is the predictive value based on the substructure data-driven model. The peak value of the prediction curve in Fig. 7b is lower than that in Fig. 7a and is closer to the accurate value. This indicates that properly adjusting the loss function coefficient $c_1$ can enhance the prediction capability of the substructure data-driven model, although some error between the prediction model and the theoretical model remains.
Training data with $\lambda_0 = 0.7, 0.9, 1.1, 1.3$ are selected to explore the influence of the number of hidden layers on the prediction ability. Networks with one, two, three and four hidden layers are set, as shown in Fig. 8. Figure 8b has a higher consistency than Fig. 8a, and the curves in Fig. 8c agree better than those in Fig. 8b. From a single hidden layer to two and then three hidden layers, the curves coincide increasingly well, that is, the prediction error gradually decreases. In contrast, Fig. 8d has a lower precision, reflecting the effect of network overfitting. The simulation results show that the performance of the substructure data-driven model can be increased by adding hidden layers appropriately.
To verify that the model constructed under one external excitation can predict the response under external excitation of different amplitudes, the external excitation of the training data is fixed as $1\sin\lambda_0 t$, $\lambda_0 = 0.7, 1, 1.2, 1.5$. Figures 10, 11, 12 and 13 show the predicted amplitude-frequency response curves for external excitation $1\sin\lambda t$, $0.5\sin\lambda t$, $1.5\sin\lambda t$ and $2\sin\lambda t$, respectively. Networks with a single hidden layer (64 neurons), two hidden layers (32 neurons per layer), three hidden layers (20 neurons per layer) and five hidden layers (8 neurons per layer) are compared.
Comparing Figs. 10a, 11a, 12a and 13a, the prediction ability of the same training data differs for the various external excitations. The prediction accuracy under $1\sin\lambda t$ and $0.5\sin\lambda t$ is higher than that under $1.5\sin\lambda t$ and $2\sin\lambda t$. Among the amplitude-frequency response curves for different numbers of hidden layers, the curves for two and three hidden layers almost coincide under $1\sin\lambda t$ and $0.5\sin\lambda t$. However, the prediction accuracy decreases when the networks are extended to five hidden layers, owing to a slight overfitting of the networks. Under $1.5\sin\lambda t$ and $2\sin\lambda t$, the improvement from two and three hidden layers is not obvious, whereas the performance is evidently improved with five hidden layers, although it remains worse than under $1\sin\lambda t$ and $0.5\sin\lambda t$. We conclude that a reasonable increase in the number of hidden layers effectively raises the prediction accuracy of the model, and that the training data, being generated under $1\sin\lambda t$, affects the prediction results under different external excitation.
To consider the influence of different degrees of measurement noise, independent and identically distributed Gaussian noise is used to simulate the environmental noise and is added to the training data. The mean value of the noise is zero, and its standard deviation is set to 6 and 10 percent of the standard deviation of the original data. Training data with $\lambda_0 = 0.7, 1, 1.2, 1.5$ are selected, and networks with two and three hidden layers (32 neurons per layer) are preset. Training and predicted data are both under $1\sin\lambda t$. Figure 14 shows the influence of the two noise levels on the model. With 6 percent noise, the networks with two and three hidden layers both show high prediction accuracy. With 10 percent noise, the prediction results of three hidden layers are clearly better than those of two hidden layers. This proves that the robustness of the prediction model can be enhanced by reasonably increasing the number of hidden layers, and that the substructure data-driven model has good robustness under noise.
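The noise injection can be sketched as below; scaling by the per-channel standard deviation of the clean data is one natural reading of "percent of the standard deviation of the original data", and the array shapes are illustrative.

```python
import numpy as np

# Sketch of the noise study: zero-mean Gaussian noise whose standard
# deviation is a fixed fraction (6 % or 10 %) of the standard
# deviation of each channel of the clean training data.
def add_noise(data, level, rng):
    sigma = level * data.std(axis=0)             # per-channel noise scale
    return data + rng.normal(0.0, 1.0, data.shape) * sigma

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 30, 3000))[:, None] * np.ones((1, 10))
noisy = add_noise(clean, 0.06, rng)              # 6 % noise level
```

The same call with `level=0.10` produces the 10 percent case.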
The capability of substructure data-driven modeling to forecast the nonlinear dynamic behavior of the system is analyzed next. Assuming $d = 2$, $a = 1$, the initial value of the state vector is chosen as $[0, 0, 0, 0, 0, 0, 0, 0, 0, -0.2]^{\mathrm T}$. Training frequencies $\lambda_0 = 0.7, 1, 1.2, 1.5$ are adopted to predict the response data at $\lambda = 1.6$, for which the system undergoes periodic motion. Figure 15a and c show the displacement time series on the planes $(t, x_3)$ and $(t, x_5)$; Fig. 15b and d show the phase diagrams on the two-dimensional planes $(x_3, \dot{x}_3)$ and $(x_5, \dot{x}_5)$; Fig. 15e and f show the phase diagrams in the three-dimensional spaces $(x_1, x_4, x_5)$ and $(x_2, x_3, x_5)$. The blue dotted line is the accurate value and the red solid line is the predictive value. The predictive solution is consistent with the accurate solution.
Since the learning rate is an important parameter of the neural network optimizer, a comparison between a fixed and an adjustable learning rate is presented in Fig. 19. The learning rate in Fig. 19a is fixed at 0.00001. In Fig. 19b, an adjustable learning rate is chosen (initial learning rate 0.00001, learning rate drop period 25 % of the epochs, learning rate drop factor 0.9). It should be noted that neural networks with the adjustable learning rate predict the amplitude-frequency response better.

Fig. 16 The displacement time series curve and phase diagram with the initial value $[0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1]^{\mathrm T}$ ($\lambda = 0.1$, 600-1000 s)
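The adjustable schedule just described amounts to a step decay. The function below is a minimal sketch with the stated values (initial rate 1e-5, a drop every 25 % of the total epochs, factor 0.9); the function name is ours.

```python
# Sketch of the adjustable learning-rate schedule described above:
# starting from 1e-5, the rate is multiplied by a drop factor of 0.9
# every quarter of the total training epochs.
def learning_rate(epoch, total_epochs, lr0=1e-5, factor=0.9, period_frac=0.25):
    period = max(1, int(total_epochs * period_frac))
    return lr0 * factor ** (epoch // period)

rates = [learning_rate(e, 100) for e in (0, 25, 50, 99)]
```

Over 100 epochs this yields 1e-5 at the start and three successive 0.9 drops by the final epoch.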
In what follows, on the basis of the adjustable learning rate, the effect of control parameters on the predicted dynamic characteristics is discussed. The first approach does not add control parameters to the network input (NAPNI); its flow chart is shown in Fig. 20a. Since the neural networks cannot recognize a change of parameters, one training and one prediction are conducted each time the parameters are changed. The training data of 0-400 s are obtained, and the steady-state data of 400-1000 s are predicted by the networks. In the process of forecasting the bifurcation diagram, this requires a large amount of data and a long time to retrain the networks. The second approach adds control parameters to the network input (APNI); its flow chart is shown in Fig. 20b. By varying the parameters, several groups of training data are received, and the networks, which can identify parameter changes, are fixed after training. The data of 0-1000 s and the bifurcation diagram under different parameters are then predicted. In this part, several control parameter values are selected for the training set. Figure 21a and b show the bifurcation diagrams of NAPNI and APNI, respectively, and the total training and prediction times of both are given in Table 1. Obviously, APNI has higher accuracy and efficiency in simulating the behavior of the five-degree-of-freedom Duffing oscillator system.

Fig. 17 The displacement time series curve and phase diagram with the initial value $[0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1]^{\mathrm T}$ ($\lambda = 1.3$, 800-1000 s)
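The difference between the two approaches lies mainly in how the training inputs are assembled. Below is a minimal sketch of the APNI input layout (function name, shapes and parameter values illustrative); NAPNI would instead train a separate network for each parameter value.

```python
import numpy as np

# Sketch of the APNI idea: the control parameter lam is appended to
# every network input vector, so one trained network covers all the
# parameter values seen in training.
def apni_inputs(states, lam):
    # states: array (N, 2n) of state vectors recorded for one lam
    lam_col = np.full((states.shape[0], 1), lam)
    return np.hstack([states, lam_col])          # shape (N, 2n + 1)

batches = [apni_inputs(np.zeros((4, 10)), lam) for lam in (0.7, 1.0, 1.3)]
train_X = np.vstack(batches)                     # one pooled training set
```

After training on the pooled set, sweeping the appended parameter column produces the bifurcation diagram without any retraining.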

Conclusions
Based on neural networks, data-driven modeling and prediction of nonlinear multi-degree-of-freedom systems are studied in this paper. According to the linear multi-step method and the known relationship between input and output data of the system, a loss function with two coefficients is constructed to enhance the accuracy of the data model. Then, the whole data-driven modeling method and the substructure data-driven modeling method are introduced. The former uses one fully connected neural network, while the latter is composed of several lower-degree-of-freedom neural networks that are easier to train. Numerical simulations are carried out on a five-degree-of-freedom Duffing oscillator system. The results show the following.

1. The proposed method can predict the system response at different frequencies from the external excitation and response data. Meanwhile, substructure data-driven modeling has higher precision than whole data-driven modeling.

2. The selection of training data has a major influence on the accuracy of the substructure data-driven model. If the training data cover a wider range (or contain more useful information), the model performs better. When predicting different external excitation or considering noise, properly adding hidden layers can effectively improve the prediction ability. Under the influence of noise, the model has good robustness, and reasonably increasing the number of hidden layers enhances the robustness of the substructure data-driven model.

Fig. 18 The displacement time series curve and phase diagram with the initial value $[0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1]^{\mathrm T}$ ($\lambda = 1.1$, 800-1000 s)
Furthermore, the model has high accuracy in predicting the periodic and quasi-periodic motion of the five-degree-of-freedom Duffing oscillator system.
3. Compared with the fixed learning rate, neural networks with an adjustable learning rate show better prediction results. On this basis, APNI and NAPNI are proposed and their impacts on predicting the dynamic characteristics are studied. By adding control parameters to the input of the neural networks, the capability of the model to predict the nonlinear dynamic behavior is significantly strengthened.
In summary, this work extracts potential features from data, effectively reducing the experimental cost and the difficulty of control design in solving practical engineering problems. Although the model achieves higher prediction accuracy than previous results, it can still be improved. Future research will evaluate the performance of the substructure data-driven model on more complicated, higher-degree-of-freedom systems, and the model will be further optimized to raise the accuracy of data-driven forecasting of nonlinear dynamic behavior.
Author contributions All authors contributed to the design and analysis of the model. Material preparation, data collection and analysis were performed by LZ, YS, AW and JZ. The first draft of the manuscript was written by LZ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding This project was funded by the National Natural Science Foundation of China (Nos. 11902038, 12272058 and 12272057).
Data availability The data included in this study are available from the corresponding author upon request.

Declarations
Conflict of interest The authors declare that they have no conflicts of interest.