Reconstruction of nonlinear flows from noisy time series

Nonlinear dynamics is a rapidly developing subject across all disciplines involving spatial or temporal evolution. The reconstruction of equations of motion for a nonlinear system from observed time series has long been a topic of intense interest. Nevertheless, in practice only partial information is available for many systems, and the data are very likely contaminated with noise. Here, based on the invariance of the evolution equation of an autonomous system under time translation, a globally valid local approximation of the trajectory is determined, which can be reliably used to reconstruct the vector field with unknown parameters or functional forms, or even with partial observations. Moreover, the interference of noise with nonlinearity is computed to the leading order, which together with the global consideration bestows exceptional robustness and extra accuracy on the technique. The new scheme asks only for the solutions of linear equations and thus is very efficient, as is nicely demonstrated on the Lorenz equation under different conditions, while a FitzHugh–Nagumo (FHN) neural network model is used to show its strength in high dimensions.

In nonlinear dynamics, time series analysis started quite early. In 1980, Packard et al. [14] proposed a method to reconstruct a strange attractor in the phase space of an unknown nonlinear system from a given time series. At nearly the same time, Takens [15] proposed the celebrated time-delay embedding technique, which enables the construction of strange attractors from a scalar time series alone by means of delay coordinates. On the basis of this reconstruction, many methods [16–19] were developed to estimate dynamically invariant quantities such as fractal dimensions and Lyapunov exponents. But these important quantities cannot supply all the relevant information concisely; a direct reconstruction of the equations of motion from time series seems necessary on many occasions. Polynomials are often used for this type of reconstruction [20,21], even with one or several variables absent [22]. When only one observable time series is measured, a standard system for parameter inference can be built [23–27] by repeatedly taking derivatives of the time series. In this type of calculation, however, it is hard to control the numerical error. Impressive alternative techniques have been proposed, such as the multiple shooting method [28], which demands very little data, and synchronization-based methods [29,30], which can gradually learn parameter values online.
In practice, the system is inevitably affected by uncontrollable disturbances, such as noise, which makes the reconstruction problem even more unwieldy. Many existing algorithms try, more or less, to eliminate unwanted perturbations with heuristic approaches, but most of them may become unstable under moderate or strong noise. For example, in the presence of noise, taking time derivatives amplifies the noise and may lead to failure. Although a preliminary smoothing algorithm may help a lot [31–33], serious uncertainty still remains. Lu considered the influence of noise in the polynomial fitting of the Lorenz and the Chen equations [34] and concluded that the reconstruction error increases rapidly with the noise. In [35], it is also shown that the polynomial model is sensitive not only to noise but also to the sampling rate.
Recent focus has shifted to the reconstruction of complex networks of interacting nodes, such as the reverse inference of gene regulatory networks from expression data [36,37] and the reconstruction of global climate networks [38]. For this purpose, different methods have been proposed in the literature to detect network structures based on time series [31]. The Granger causality test or information-theoretic tools can conveniently be used to infer network topology [55–61], although the weights of links are not evaluated. To overcome this difficulty, various techniques based on correlation functions [31,39,46–48,50–53] have been designed, but either a small-noise limit is required or a preliminary smoothing of the data has to be done. At the same time, methods inspired by control theory have been designed and applied. For instance, an extra local driving is used for topology inference [41,65,66], which may not be available in practice or in the given data. So, non-interference or synchronization tools have also been developed [42–45]. When dealing with network connectivity problems in real life, the actual connections are far fewer than the possible ones, which leads to sparsity in complex networks [63]. In this case, the technique of compressive sensing can be invoked for an efficient and robust reconstruction of network structures [39,40].
Here, in order to maximize the reliability of the algorithm, one global expression, justified by the time invariance of the governing autonomous equations, is invoked to fit all local short segments of typical trajectories, and is then used to derive the deterministic part of the dynamics and the noise characteristics at the same time. All the available points are used for the local fitting so that the invariant features are extracted very effectively, which grants the technique extra robustness against noise contamination. There is no need to smooth the data first with a separate program [67]. In fact, the global expression could be combined with other techniques in the literature for further extension. Also, the interplay of noise with nonlinearity is carefully examined in order to better separate the noisy and the deterministic parts. As a result, even with noise of moderate strength, the computation is still quite accurate, since the impact of noise on the orbit is properly taken care of. In the presence of hidden variables, delay coordinates are used in both orbit fitting and parameter evaluation, which constitutes an alternative technique to recover equations of motion from only part of the state variables. In all the calculations, if the dynamics is expressible with polynomials, the evaluation of all the coefficients can be carried out by solving only linear equations, which is very efficient.
The rest of the paper is organized as follows. The main idea of the current approach is detailed in Part II, and extra attention is paid to the noise impact on the trajectories. It is then applied to the famous Lorenz system [68] driven by white noise in Part III under different conditions: with or without a known functional form of the equation of motion, with time series of all or part of the state variables. In Part IV, the new scheme is applied to an FHN neural network model [69] for the evaluation of the coupling matrix with observed data in a quite high-dimensional phase space. In the final part, the results are summarized and possible future research directions are pointed out.

Dynamical systems driven by white noise
A general n-dimensional nonlinear dynamical system driven by white noise can be written as

ẋ_i = f_i(x_1, x_2, …, x_n) + η_i(t), i = 1, 2, …, n, (1)

where x = (x_1, …, x_n) is the state variable, f describes the governing equations, and η_i(t) is an extra term that originates from the microscopic or insignificant unknown part of the dynamics. It is usually fast-changing and is simplified as white noise with the following characteristics [70]:

⟨η_i(t)⟩ = 0, (2a)
⟨η_i(t) η_j(t′)⟩ = 2D δ_ij δ(t − t′), (2b)

where ⟨·⟩ denotes an ensemble average, the delta function is a Kronecker one with respect to the discrete indices i, j but a Dirac one with respect to the continuous time difference t − t′, and D represents the noise intensity of η_i. For D = 0, the differential system (1) is autonomous and thus invariant under time translation; its solution depends only on the initial point and the evolution time interval. For nonlinear dynamical systems without noise, we use the fourth-order Runge–Kutta algorithm to do the integration. However, in order to treat a system driven by white noise, this paper employs a second-order stochastic integration scheme; please refer to Appendix A for a detailed explanation.
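As a concrete illustration, a system of the form of Eq. (1) with the correlations above can be integrated with a first-order Euler–Maruyama step (the paper itself uses the second-order scheme of Appendix A); the function names below are ours, and the Gaussian increment per step has variance 2D·dt, matching the convention of Eq. (2b):

```python
import math
import random

def euler_maruyama(f, x0, D, dt, steps, rng=random.Random(0)):
    """First-order (Euler-Maruyama) sketch for dx_i/dt = f_i(x) + eta_i(t),
    with <eta_i(t) eta_j(t')> = 2*D*delta_ij*delta(t - t')."""
    x = list(x0)
    traj = [list(x)]
    sigma = math.sqrt(2.0 * D * dt)   # std of the noise increment per step
    for _ in range(steps):
        fx = f(x)
        x = [xi + fi * dt + sigma * rng.gauss(0.0, 1.0)
             for xi, fi in zip(x, fx)]
        traj.append(list(x))
    return traj

# Toy example: linear relaxation dx/dt = -x with weak noise
traj = euler_maruyama(lambda x: [-xi for xi in x], [1.0],
                      D=0.001, dt=0.01, steps=500)
```

For D = 0 the scheme reduces to the explicit Euler method; the appendix's scheme adds the next order in both the drift and the noise.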

Global polynomial fitting of the local dynamics
For the reconstruction of nonlinear systems based on time series, considering the invariance of the governing laws, we employ globally valid analytic expressions for both local trajectories and the evolution equations. Thus, the orbit of the system in the phase space is divided into thousands of tiny segments and each short segment is fitted with the same expression. Specifically, we start from an initial point on each segment and fit a few steps forward and backward by tweaking the parameter values in the assumed analytic expressions based on a combined error estimation over all the segments. The advantage of this idea is that if data points in the time series are seriously contaminated by noise, the scheme will suppress the error through fitting the whole system and averaging out the irregular noisy fluctuation. Therefore, when the scale of the time series to be processed is very large, the advantage of this method becomes obvious.
The local solution of Eq. (1) admits the Taylor series expansion

x_i(t_0 + t) = x_i(t_0) + g_i t + h_i t² + O(t³), (3)

where |t| ≪ 1 is small. Thus, Eq. (3) only describes a short-time evolution, and in many cases a polynomial expression can be employed for g_i and h_i.
Here g_i and h_i depend on the initial point x(t_0) through Eqs. (4a) and (4b). In this paper, a polynomial approximation is always used for g_i and h_i for simplicity. In more detail, if the initial state is x_i(t_0), the forward or backward evolution over a time t (here t = k t_unit, where k ∈ Z and t_unit is a chosen elementary time step) is conveniently approximated by Eq. (3). We may pick 2m + 1 points along the orbit, namely x_i(t_0 + k t_unit) with k = −m, …, m, for the fitting. The same expression should be valid for all initial conditions, and a reliable description of the system is obtained by minimizing the following cost function with respect to the unknown coefficients:

P = Σ_{t_0} Σ_i Σ_{k=−m}^{m} [x̂_i(t_0 + k t_unit) − x_i(t_0 + k t_unit)]², (5)

where x_i(t_0 + t) is the known time series and x̂_i(t_0 + t) is computed from Eq. (3). N is the total number of points in the time series and m is the number of local evolution steps forward or backward. In the following computation, t_unit = 0.01.
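Because the model of Eq. (3) is linear in its coefficients, each least-squares subproblem reduces to linear algebra. The sketch below, with hypothetical names, fits g and h on a single segment via the 2×2 normal equations; the actual scheme couples all segments by making g_i, h_i polynomials of the initial point:

```python
def fit_segment(series, j, m, dt):
    """Least-squares fit of a short segment around index j with the local
    model x(t0 + t) ~ x(t0) + g*t + h*t**2 (cf. Eq. (3)). Returns (g, h).
    Linear in the unknowns, so only a 2x2 solve is needed."""
    x0 = series[j]
    # accumulate the normal equations for the unknowns (g, h)
    s_tt = s_t3 = s_t4 = b1 = b2 = 0.0
    for k in range(-m, m + 1):
        if k == 0:
            continue
        t = k * dt
        y = series[j + k] - x0
        s_tt += t * t
        s_t3 += t ** 3
        s_t4 += t ** 4
        b1 += y * t
        b2 += y * t * t
    det = s_tt * s_t4 - s_t3 * s_t3
    g = (b1 * s_t4 - b2 * s_t3) / det
    h = (s_tt * b2 - s_t3 * b1) / det
    return g, h

# Synthetic data from x(t) = 2 + 3t + 4t^2 sampled around t0 = 0
dt = 0.01
series = [2 + 3 * (k * dt) + 4 * (k * dt) ** 2 for k in range(-5, 6)]
g, h = fit_segment(series, 5, 3, dt)   # recovers g ~ 3, h ~ 4
```

On exactly quadratic data the fit is exact, which is why small t_unit and small m already extract the local field well, as discussed later in the text.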

Separating noise from data
In reality, available data are often polluted with noise for various reasons, which should be taken into account in the dynamics reconstruction. In this paper, we add additive noise terms to the governing equations (Eq. (1)), Gaussian white with zero average and δ-function correlation. When the noise is weak, the noisy orbit may locally be viewed as a superposition of a deterministic orbit and noisy fluctuations. However, if the noise is moderate or strong, the orbit deviates from the deterministic one even locally. This influence is calculated below for a general nonlinear system [70,71]. For the dynamical system Eq. (1), we may write down its integral form, which has a (deterministically) first-order approximation x^(1) (of stochastic order 1/2) and then a (deterministically) second-order approximation x^(2) (of stochastic order 3/2). A second-order Taylor expansion is then carried out on f_i, where in the second-order term only the lowest order of the difference x^(2) − x^(1) is retained, resulting in Eqs. (9) and (10). By taking an ensemble average of Eq. (10), the deterministic part of the orbit is obtained as Eq. (11), where the last term is a second-order contribution from the noise, which serves to push the average off the deterministic orbit and should be taken into account if the noise strength D is not small. The previous local fitting is actually done for this average orbit, which contains the noise contribution. The noise can be estimated by taking a second moment of Eq. (10) to the lowest orders, Eq. (12), from which it is easy to see that if ∂f_i/∂x_i is large, the second-order modification in D should be considered for a better estimation of both the noise and hence the original vector field.
In practice, once the fitting with Eq. (3) is completed, we regard it as the average orbit given by Eq. (11). Thus, by comparing it with the given observation x_i^(2)(t), we get a rough estimate of the noise strength D, neglecting the second-order term in Eq. (12). Then we may estimate the unknown coefficients in the vector field f_i by minimizing the error function

Q = Σ_i Σ_t [x̂_i(t) − ⟨x_i^(2)(t)⟩]², (13)

where x̂_i(t) is given by Eq. (3) and ⟨x_i^(2)(t)⟩ follows from Eq. (11) with the above estimate of D. In Eq. (13), the summation is similar to that in Eq. (5). If the unknown coefficients enter f_i linearly, the minimization of Q can easily be done by repeatedly solving linear algebraic equations, starting from a first-order approximation that identifies the coefficient of t in Eq. (3) with f_i (see Eq. (11)).
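The rough first-pass estimate of D described above can be sketched as follows: at leading order, the one-step residual x(t + Δt) − x(t) − f(x(t))Δt has variance ≈ 2DΔt under the convention of Eq. (2b). The helper below is a hypothetical illustration of that idea, not the paper's exact estimator:

```python
import math
import random

def estimate_noise_strength(series, f, dt):
    """Rough leading-order estimate of the noise intensity D from one-step
    residuals, Var[x(t+dt) - x(t) - f(x(t))*dt] ~ 2*D*dt (the second-order
    correction of Eq. (12) is neglected here)."""
    resid2 = 0.0
    n = 0
    for a, b in zip(series[:-1], series[1:]):
        r = (b - a) - f(a) * dt
        resid2 += r * r
        n += 1
    return resid2 / (2.0 * dt * n)

# Check on pure Brownian motion (f = 0) with a known D
rng = random.Random(1)
D_true, dt = 0.5, 0.01
x, series = 0.0, [0.0]
for _ in range(20000):
    x += math.sqrt(2.0 * D_true * dt) * rng.gauss(0.0, 1.0)
    series.append(x)
D_est = estimate_noise_strength(series, lambda v: 0.0, dt)
```

With f known only approximately, the estimate is biased by the drift error, which is why the paper iterates between estimating D and refining f_i.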

Error estimation
After fitting the data, we examine the root-mean-square (one-step) prediction error MSE [35] and the relative error P_err, defined in Eqs. (14a) and (14b), to properly evaluate the results. There, N represents the number of data points along the temporal direction, n is the number of degrees of freedom, and m determines the number of points on a local piece of the orbit. The variables x_i(t) and x̂_i(t) are defined as in Eq. (5).
In the following, we also calculate the magnitude of the noise with a more precise evaluation of D. Explicitly, Eq. (12) is used to obtain a better estimate of D based on the observed data and the polynomial fitting of the average orbit after the vector field f_i is determined. A better evaluation of D leads to better estimates of the other unknown parameters in f_i. This can be repeated again and again for a refined determination of all the parameters. The relative error of a parameter is defined as

err_parm = |parm′ − parm| / |parm|, (15)

where parm′ is the fitted value and parm is the true value.
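A minimal sketch of these error measures follows; since the exact normalizations of Eqs. (14a) and (14b) are not reproduced above, the RMSE form below is an assumption, while the relative error matches Eq. (15):

```python
import math

def rmse(pred, true):
    """Root-mean-square one-step prediction error (assumed form of Eq. (14a))."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def rel_param_error(fitted, exact):
    """Relative parameter error of Eq. (15): |parm' - parm| / |parm|."""
    return abs(fitted - exact) / abs(exact)

# Example: the fitted Lorenz parameter a' = 10.024 versus the true a = 10
err = rel_param_error(10.024, 10.0)   # ~0.24%, as quoted later in the text
```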

System description
The Lorenz system is a typical deterministic dissipative system [68], which reads

ẋ = a(y − x),
ẏ = rx − y − xz, (16)
ż = xy − bz,

where the parameters a > 0, b > 0, r > 0. When a = 10, b = 8/3, r = 28, the dynamics is chaotic and a strange attractor manifests itself. A chaotic system like the Lorenz equation has the hallmark that nearby trajectories separate exponentially, and as a result long-term prediction is impossible.
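For reference, the deterministic trajectory used below can be generated with a plain fourth-order Runge–Kutta integrator, as stated in the text; the function names are ours:

```python
def lorenz(state, a=10.0, b=8.0 / 3.0, r=28.0):
    """Vector field of the Lorenz system, Eq. (16)."""
    x, y, z = state
    return (a * (y - x), r * x - y - x * z, x * y - b * z)

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt * (p + 2 * q + 2 * u + v) / 6.0
                 for s, p, q, u, v in zip(state, k1, k2, k3, k4))

# 5000 steps with t_unit = 0.01, as in the fitting experiments
state = (1.0, 1.0, 1.0)
for _ in range(5000):
    state = rk4_step(lorenz, state, 0.01)
```

Rows of such a trajectory, sampled at t_unit, form the data matrix A of Eq. (17).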
We may arrange the time series data in a matrix of the form

A = [ x(t_1) y(t_1) z(t_1);
      x(t_2) y(t_2) z(t_2);
      ⋮ ⋮ ⋮;
      x(t_N) y(t_N) z(t_N) ], (17)

which is extracted from a long trajectory obtained with a fourth-order Runge–Kutta simulation. Based on Eq. (17), a loss function P may be constructed, which depends on unknown coefficients as in Eqs. (4a) and (4b). By the principle of least squares, these coefficients are determined, with the results listed in Table 1. The expression Eq. (3) is in fact a quite accurate description of the orbit. Upon substituting the point x(t_0) into the model (Eq. (3)), we obtain the next point at t = t_0 + t_unit, which is again taken as the initial point for the next iteration. The above process is repeated S_t times, which yields a prediction of the next S_t points along the orbit. The x component of the original trajectory and the predicted orbit are depicted in Fig. 1, where the first 5000 points are used for the fitting of the parameters and the later ones are evaluated with Eq. (3). The prediction for the next 400 points is quite accurate, even with a small noise D = 0.01. For S_t > 400, although the main features remain similar, the prediction slowly goes awry, as reflected in the errors listed in Table 2. Also, the errors for different time steps are displayed in Table 3. In all the calculations the errors err_a, err_b, err_c are quite small and the three equation parameters are reproduced reasonably well.
From Tables 1, 2 and 3, as expected, we see that the more time points are used for the fitting, the more accurate the result and the smaller the error. In addition, it can be seen from the tables that at N = 1000 the fitting error of the system is already very small, and a further increase of the number of data points has only a minor impact on the error reduction. It is also observed that the fitting error increases gradually if the number m of local fitting points or the time step t_unit increases. Two possible reasons can be envisaged. For small t_unit, the information on the vector field is mainly stored between neighboring points along the orbit; extra points may not give much additional information on the local field if the noise is small enough and the round-off error is under control. Moreover, with more points included, higher-order terms in t should be invoked to describe distant points. However, the coefficients of these terms involve higher-order polynomials of the initial conditions x_i(t_0), and the functional forms of Eqs. (4a) and (4b) may not be valid there. For the same reason, the time interval should not be large.

The Lorenz system driven by white noise
For a Lorenz system driven by white noise, we use a second-order stochastic integration scheme to do the simulation (Appendix A). A data matrix similar to A in Eq. (17) can again be constructed. In this section we mainly study the sensitivity of our scheme to noise, so we fix N = 10000, m = 1, t_unit = 0.01 and change only the noise strength.
The computation is similar to that without noise. According to the principle of least squares, based on Eq. (3), we obtain the fitted coefficients, where the influence of noise strength is shown in Table 4.
From the table, we see that the fitting error appears to increase proportionally to D. However, even with considerably large noise, the fitting error stays within an acceptable range; the current technique therefore appears robust against noisy perturbations. Within our formulation, even if the noise is strong and not Gaussian white, extra state variables may be used to generate it from white noise [31,52], and we can still properly evaluate the noise intensity and extract the original vector field from the noisy data. Equations (11) and (12) may be used to evaluate the system parameters a, b, r and the noise intensity D with the method of least squares. With N = 10000, m = 1, t_unit = 0.01 but different noise, the computed system parameters are displayed in Table 5, while Table 6 lists the relative errors for different N and D.
In Tables 5 and 6, D′, a′, b′ and r′ denote the computed noise strength and system parameters, respectively, and the relative errors err_D, err_a, err_r and err_b are defined in Eq. (15). Interestingly, the relative error of the estimated noise strength gets smaller with stronger noise, even though the absolute error still increases as expected. Usually, the noise intensity used in the literature for dynamics reconstruction is less than 1, and very rarely is a value greater than 5 considered. Nevertheless, the proposed method works well even when the noise intensity equals 50. Of course, the method will also fail once noise dominates the dynamics. For relatively large noise (D = 50), for which a typical trajectory and time series are shown in Fig. 2, the computed parameter values are very close to the true ones, although the fitting error grows with noise. One may imagine that the current technique could fail in the presence of very strong noise, e.g., D > 100, since the perturbation calculation of the previous section may become invalid. Furthermore, the dependence of the estimation accuracy on the number of time points N is also apparent from Table 6.

Reconstruction with sparsity consideration
In the previous calculation, we assumed that the form of the system is known (Eq. (16)) and that only the unknown parameters a, b, r need to be fixed. However, in practice, the functional form of a vector field may not be known and we have to reconstruct it from scratch. In this case, a functional basis {Z_j}_{j=1,…,d} should be chosen, in terms of which the equation of motion is expressed as

ẋ_i = Σ_{j=1}^{d} a_ij Z_j(x), (18)

where the a_ij are parameters that need to be determined from the data. To incorporate the sparsity consideration, compressive sensing is invoked in the reconstruction [73–77], in which an L_1-norm term [62] is added to the cost function Eq. (13). This extra regularization term not only caters to the demand of sparsity, but also ensures robustness against noise and other perturbations, and may even enable the detection of hidden nodes in complex networks [64].

[Table 2: the error characteristics err_a, err_b, err_c given by Eq. (15).]
[Table 3: the error characteristics err_a, err_b, err_c given by Eq. (15).]
[Table 6: the error characteristics err_a, err_b, err_c given by Eq. (15).]
In the literature, the SINDy method designed by Brunton et al. [40] utilizes compressed sensing to do the fitting and has achieved remarkable success. To embrace this idea, an L_1-norm term β Σ_{i,j} |a_ij| is added to the error function here to improve the fitting, where β is the weight of the sparsity requirement. A comparison between the two approaches is made together with a high-order correlation computation (HOCC) method [31,51–53,65], a newly emerging efficient technique that utilizes correlations between different terms in the equation of motion. Table 7 compares the fitting results of our method with those of the SINDy and HOCC techniques. Without an extra smoothing algorithm, the HOCC method needs a very small time step t_unit = 0.001, and thus a large number of points N = 100000, for a reasonably good computation of the velocity. Even with a small noise D = 1, the HOCC computation is already not very precise, while the other two (our method and SINDy) remain accurate. At D = 5, the current method shows some advantage, and with even larger noise our method still works, showing its strength in dealing with noisy dynamics.
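The L_1-regularized fit described above can be sketched with a tiny iterative soft-thresholding (ISTA) solver. This is a generic stand-in for the compressive-sensing step, not the authors' implementation, and the toy basis {t, t²} below is ours:

```python
def soft_threshold(u, lam):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0

def lasso_ista(Theta, y, beta, iters=3000):
    """Sparse regression min ||Theta a - y||^2 + beta*||a||_1 by ISTA."""
    n, d = len(Theta), len(Theta[0])
    # conservative step size: trace bounds the largest eigenvalue of Theta^T Theta
    step = 0.5 / sum(Theta[i][j] ** 2 for i in range(n) for j in range(d))
    a = [0.0] * d
    for _ in range(iters):
        resid = [sum(Theta[i][j] * a[j] for j in range(d)) - y[i]
                 for i in range(n)]
        grad = [2.0 * sum(Theta[i][j] * resid[i] for i in range(n))
                for j in range(d)]
        a = [soft_threshold(a[j] - step * grad[j], step * beta)
             for j in range(d)]
    return a

# Data generated by xdot = 3*t alone; the t^2 coefficient should be driven to zero
ts = [0.1 * k for k in range(1, 11)]
Theta = [[t, t * t] for t in ts]
y = [3.0 * t for t in ts]
a = lasso_ista(Theta, y, beta=0.01)
```

The L_1 penalty zeroes out the spurious basis function, which is exactly the sparsity behavior exploited in Eq. (18).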

Reconstruction of the Lorenz system with partial observations
Very often, we only observe some of the state variables of a system. For example, in the Lorenz system we may only have access to the data in the x and z directions. In this case, the Takens embedding theorem [15] should be invoked to carry out the fitting, with delay coordinates supplying the possible extra dimensions of the system. In the current case, as the y coordinates are unknown, we are allowed to use only the first and third columns of the time series matrix A of Eq. (17) as the observed data. There is a great deal of literature concerning the proper choice of the embedding dimension and the time delay [12,78]. Here, we assume that the choice has been made: the embedding dimension is 3 and the time delay is τ = 0.06. We use x, x_τ, z as the state variables for the reconstruction, where x_τ is the value of x delayed by τ. As before, Eq. (3) gives a local fitting in terms of polynomials in x, x_τ, z, which could be used as the state variables to construct a vector field if there is no extra information about the dynamics. In contrast, if the functional form of the Lorenz system is known (with unknown parameters), we may express the unknown variable y as a combination of x, z and x_τ.
That is,

y = l(x, x_τ, z), (19)

where l, g, h are polynomials of their arguments, similar to Eq. (4a) or (4b). All the unknown coefficients, including the parameters, can be determined by substituting Eq. (19) and the local expressions for x and z into Eq. (16) and minimizing the squared error over all the available data points. The parameters of the Lorenz system obtained by this procedure are a′ = 10.0240, b′ = 2.6867, r′ = 30.7600, with relative errors err_a = 0.24%, err_b = 0.75%, err_r = 9.86%. The fitting errors defined by Eqs. (14a) and (14b) are MSE = 0.6993 and P_err = 0.0086, respectively.
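Constructing the delay coordinates themselves is straightforward. The sketch below builds delay vectors from a scalar series (an integer ramp, purely to make the indexing visible; in the text x, x_τ and z are combined into one state vector, and τ = 0.06 with t_unit = 0.01 corresponds to a delay of 6 samples):

```python
def delay_embed(x, delay_steps, dim):
    """Delay-coordinate vectors (x(t), x(t - tau), ..., x(t - (dim-1)*tau)),
    with tau = delay_steps * t_unit, in the spirit of Takens embedding."""
    start = delay_steps * (dim - 1)   # first index with a full history
    return [[x[i - k * delay_steps] for k in range(dim)]
            for i in range(start, len(x))]

# tau = 0.06 at t_unit = 0.01  ->  delay_steps = 6
vecs = delay_embed(list(range(100)), 6, 2)
```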

System description
The FitzHugh–Nagumo (FHN) system was derived by FitzHugh as a simplification of the Hodgkin–Huxley (HH) model [79]. The FHN equation is a second-order nonlinear system which can be used to simulate the dynamical behavior of neurons. One particular network form of the equation is

v̇_i = v_i − v_i³/3 − w_i + c_1 + Σ_j W_ij v_j + ξ_{1,i}(t),
ẇ_i = κ(v_i + c_2 − b w_i) + ξ_{2,i}(t), (20)

where v_i describes the membrane potential and w_i is a slowly changing recovery variable of the i-th neuron, and L is the number of neurons. In this model, the dynamical behavior of the system is controlled by the coefficients κ, b, c_1, c_2 and the interaction weights W_ij. The terms ξ_{1,i}, ξ_{2,i} are noise terms, usually taken to be white and satisfying Eqs. (2a) and (2b). When external noise is added to the system, coherence resonance may occur for suitable system parameters [80,81]. We use an FHN network with L = 10 nodes to test our scheme, shown in Fig. 3 as a directed graph. In order to reach a balanced state, the sum of the influences from other nodes on a given node should be opposite to the self-influence; therefore, the interaction between nodes can be expressed by a Laplacian matrix W_ij: for i ≠ j, W_ij ∈ [0, 1) is randomly distributed, with i, j = 1, 2, …, L, while W_ii = −Σ_{j≠i} W_ij. The simulation of the stochastic differential equation (20) is again carried out with the integration scheme presented in Appendix A. The parameters are chosen as κ = 0.08, b = 0.8, c_1 = 0, c_2 = 0.7, and the initial state variables are distributed uniformly in [0, 1). The evolution of the variable v_1 is plotted in Fig. 4 over the time interval [0, 10000]; over a fluctuating background, several spikes are clearly seen.
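The Laplacian (zero-row-sum) coupling matrix described above can be generated as follows; the function name is ours:

```python
import random

def laplacian_coupling(L, rng=random.Random(0)):
    """Random Laplacian coupling matrix: off-diagonal W[i][j] drawn from [0, 1),
    diagonal W[i][i] = -(sum of the rest of row i), so every row sums to zero
    and the self-influence balances the incoming influences."""
    W = [[rng.random() if i != j else 0.0 for j in range(L)] for i in range(L)]
    for i in range(L):
        W[i][i] = -sum(W[i])
    return W

W = laplacian_coupling(10)   # L = 10 nodes, as in the test network
```

The zero row sums are what make a homogeneous state (all nodes equal) insensitive to the coupling, the "balanced state" mentioned in the text.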

Fitting the FHN model
In the current fitting, the functional form of the vector field is known, though the dimension of the phase space is quite high. In order to reduce the amount of computation, the mutual influence among the v variables of different nodes, i = 1, 2, …, L, is approximated, to the lowest order, by the summation term on the right-hand side of Eq. (21). It turns out that such an approximation is good enough to recover the observed trajectories. To simplify the notation, we use one variable x_i to collectively denote the variables {v_i, w_i}_{i=1,…,L}, and the coefficients g_vi, h_vi, g_wi, h_wi in Eq. (21) are written as in Eqs. (22a)–(22d), where the superscript of a coefficient distinguishes between different variables. With the method of least squares, all these coefficients can be determined from a given long time series. In this model, N = 10000 observation points with time interval t_step = 0.08 are taken from a typical orbit, from which the local fitting is computed with Eqs. (21) and (22a)–(22d). The calculated coefficients include the estimates of the matrix W_ij and of the parameters in the FHN model. To refine the computation, we may treat b, c_1, c_2, D_v, D_w as known and minimize an expression similar to Eq. (13) to obtain better values of W_ij and κ, which are then held fixed in the next turn when computing the other parameters. The iteration with the method of least squares is carried out until all the parameter values become steady. The fitted coupling coefficients W_ij are plotted against the true values in Fig. 5. All the non-diagonal entries of W_ij are positive and are computed with quite high accuracy. The diagonal entries are all negative and are also well estimated, though not as accurately as the non-diagonal ones, probably due to their larger magnitudes.
The estimates of the parameters of the dynamics equation are listed in Table 8, where the error is calculated according to Eq. (15).

Conclusion
Reliable reconstruction of nonlinear dynamical systems is very important for their proper understanding, prediction or control. In this paper, globally invariant polynomials are used to fit all the local pieces of given trajectories, which are then invoked to deduce equations of motion. Specifically, the impact of noise on an average trajectory is evaluated to the lowest order, which is seen to be proportional to the noise strength D.
When noise is small, this correction may be neglected, but it can become significant when noise is moderate or strong. Owing to the use of all available points for the fitting, the current scheme appears exceedingly robust and reliable in the presence of noise, as demonstrated comprehensively in the Lorenz system. Whether or not the functional form of the equation of motion is known, the dynamics can be reconstructed with an accuracy comparable to, or even better than, that of prevailing algorithms, especially when the noise is not small. A specific procedure is designed to retrieve the parameters of the equation when only some of the state variables are available, and it is also tested successfully on the Lorenz system. The computation of all unknown coefficients can be carried out by solving linear equations, which grants high efficiency. The applicability of the current technique is also checked in quite high dimensions, on an FHN neural network model, where a simplified expression for the local trajectories is assumed and nicely restores all the unknown coefficients from the observed data.
In the current paper, we use only polynomials to do all the fitting, which is convenient and capable of handling quite general situations. In other circumstances, however, an alternative basis may better represent the system under investigation. It would be very interesting to find a systematic way of designing a proper basis of functions upon which to carry out the expansions in general [9,12]. Moreover, for very high-dimensional systems, such as turbulent fluids or brain networks [82,83], there could be an enormous number of possible nonlinear terms in the fitting expressions; how to choose the relevant ones among them is worthy of further investigation.
Although the current scheme can handle nonlinear systems in the presence of weak or moderate noise, it is likely to fail if the noise is too strong, since a perturbation calculation is performed in our derivation of the noise effects. A different approach may then be needed to disclose the regular part of the dynamics buried under possibly rampant noise. Even when colored noise is involved, the current scheme may falter if the correlation time is non-negligible; a potential solution is to model the colored noise with a system involving only white noise [82]. A proper treatment of fluctuations may even help us relegate the non-essential degrees of freedom to noise terms and achieve a considerable reduction of the complexity of a problem [83].

Acknowledgements This work was supported under contract number 2019XD-A10 and also by the Key Program of National Natural Science Foundation of China (No. 92067202).
Data availability All data, models, generated or used during the study, appear in the submitted article, and code generated during the study is available from the corresponding author by request.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.

Appendix A: Random Taylor expansion
Equation (1) is simulated with algorithms derived from a Taylor expansion. First, both sides of Eq. (1) are integrated over a small time step, giving Eq. (A1), in which the noise enters through integral terms. Next, a Taylor expansion of f_i(x_1(s), x_2(s), …, x_n(s)) about the point (x_1(t), x_2(t), …, x_n(t)) in Eq. (A1) produces the required scheme, involving the stochastic integrals ζ_{i1}(t) and ζ_{i2}(t) with ⟨ζ_{i1}(t)⟩ = 0 and ⟨ζ_{i2}(t)⟩ = 0, which are, however, not independent. A linear combination of independent Gaussian random numbers ξ_{i1}, ξ_{i2} is used to realize them [84], ζ_{i1}(t) = a ξ_{i1} and ζ_{i2}(t) = b ξ_{i1} + c ξ_{i2}, where a, b and c are determined by the three equations relating the quadratic moments of ζ_{i1}(t) and ζ_{i2}(t) and their correlation. For the stochastic Eq. (20), the following stochastic Runge–Kutta scheme [85,86] is employed to do the numerical integration, with intermediate variables F_1 and F_2 and random numbers w_0, w_1, w_2 drawn from the standard normal distribution.
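A simplified stand-in for such a scheme is the stochastic Heun method below: it is second order in the drift and reuses the same Gaussian increment in the predictor and the corrector. It is a sketch only, not the exact scheme of Refs. [85,86], and the function names are ours:

```python
import math
import random

def stochastic_heun(f, x0, D, dt, steps, rng=random.Random(0)):
    """Heun-type integration of dx_i/dt = f_i(x) + eta_i(t): an Euler-Maruyama
    predictor followed by a trapezoidal corrector on the drift, with the same
    noise increment (variance 2*D*dt per step) in both stages."""
    x = list(x0)
    for _ in range(steps):
        dW = [math.sqrt(2.0 * D * dt) * rng.gauss(0.0, 1.0) for _ in x]
        f1 = f(x)
        xp = [xi + fi * dt + wi                      # predictor
              for xi, fi, wi in zip(x, f1, dW)]
        f2 = f(xp)
        x = [xi + 0.5 * (p + q) * dt + wi            # corrector
             for xi, p, q, wi in zip(x, f1, f2, dW)]
    return x

# With D = 0 the scheme reduces to the deterministic Heun method;
# dx/dt = -x from x(0) = 1 over t = 1 should give ~exp(-1)
x = stochastic_heun(lambda v: [-v[0]], [1.0], D=0.0, dt=0.01, steps=100)
```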