Modeling of Chaotic Processes by Means of Antisymmetric Neural ODEs

The main goal of this work is to construct an algorithm for modeling chaotic processes using special neural ODEs with antisymmetric matrices (antisymmetric neural ODEs) and power activation functions (PAFs). The central part of this algorithm is the design of a neural ODE architecture that guarantees the generation of a stable limit cycle for a known time series. Then, one neuron is added to each equation of the created system until the approximating properties of this system satisfy the well-known Kolmogorov theorem on the approximation of a continuous function of many variables. Moreover, this addition of neurons launches the cascade of bifurcations that generates a chaotic attractor from stable limit cycles. We also consider the possibility of generating a homoclinic orbit whose bifurcations lead to the appearance of a chaotic attractor of another type. In conclusion, the conditions under which the found attractor adequately simulates the chaotic process are discussed. Examples are given.


Introduction
Today, recurrent neural networks (RNNs) are among the main tools for modeling various dynamic processes. It should be noted that the quality of modeling with RNNs depends on the activation functions used in the network (see [1]-[7]).
As for activation functions, good results in modeling various processes were obtained precisely with those neural networks in which the well-known rectified linear units (ReLU) were used [4,6,8]. Naturally, generalizations of ReLU deserve attention, and below we consider some of them.
We will not dwell on the advantages or disadvantages of particular activation functions, but will concentrate only on those properties that are essentially used in this work.
In recent years, an interesting idea has appeared: to interpret a system of ordinary differential equations as a suitable neural network (residual network) [9]-[12]. Precisely this interpretation is implemented in the present work: a system of differential equations (a system of so-called neural ODEs) is considered as a continuous analogue of some RNN [13]-[17]. It should be noted that in [17] the neural network was considered as a linear control system closed by nonlinear feedback, with the ReLU activation functions playing the role of the feedback functions. The task of modeling was not to bring the trajectories of the model and the real process closer together, but to find the algebraic invariants that determine the behavior of the model built for a known time series. If the corresponding invariants for different lengths of this time series turn out to be equal, then we can speak of the adequacy of the model to the real process.
Below we will focus on two areas of research, which can be formulated in the following questions.
1) If a neural network models a certain dynamic process, how can one guarantee the stability or boundedness of solutions of the system of differential equations describing a continuous analogue of this neural network?
2) In bifurcation theory, the following result is well known: in any deterministic system, chaotic processes arise as a result of bifurcations of limit cycles or homoclinic orbits [18]-[21]. How, then, should the architecture of a neural ODE system be designed so that it generates a limit cycle? (It is now known that most types of chaos in systems of differential equations begin with bifurcations of limit cycles [18]-[20].)
The answer to the first question will be successful if the activation functions are chosen so that Lyapunov analysis can be carried out for the resulting system of neural ODEs [22]-[24]. Piecewise continuous functions, each piece of which is a power function, can be proposed as such functions. The use of power activation functions (PAFs) in neural networks generalizes the rectified linear units. At present, ReLU are the standard functions for deep learning. Therefore, power activation functions are an obvious generalization of ReLU.
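For instance, an odd-type PAF with an exponent pair (α, β), written in this paper as h(u, α ∨ β), can be sketched as follows (the NumPy implementation and parameter names are our illustrative choices):

```python
import numpy as np

def paf(u, alpha, beta):
    """Power activation function: u**alpha for u >= 0, -(-u)**beta for u < 0.

    With alpha = beta the function is odd; with alpha = beta = 1 it is the
    identity, and suppressing the negative branch recovers ReLU.
    """
    u = np.asarray(u, dtype=float)
    pos = np.maximum(u, 0.0) ** alpha          # branch for u >= 0
    neg = -((-np.minimum(u, 0.0)) ** beta)     # branch for u < 0
    return np.where(u >= 0, pos, neg)
```

Unlike ReLU, such a function is differentiable away from the origin with a power-law growth controlled by α and β, which is what makes the Lyapunov analysis below tractable.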
Note that for systems of neural ODEs with PAF, the answer to the first question has already been partially obtained in article [25]. (In the present paper, the results proved in [25] will be generalized.) As for the second question, the main part of the article will be devoted to the answer to this question.
It should be said that a large part of the article discusses neural ODEs with chaotic modes. Chaos constitutes the basic form of collective neural activity for all processes and functions of perception. It acts as a controlled noise source that ensures uninterrupted access to previously memorized images and the memorizing of new ones. Chaos allows the system to be always active, ridding it of the need to wake up or enter a stable state every time the input changes [26]. Many researchers agree that the regime of ordered chaos is best from the point of view of storing and processing information [27]: on the one hand, this mode has all the advantages of chaos; on the other hand, it can be controlled. The set of states through which the trajectory of a chaotic system passes is called a chaotic attractor. Therefore, the conditions for the existence of chaotic attractors in systems of neural ODEs are the subject of research in this paper.
The final sections of the article are devoted to the reconstruction of neural ODE systems. For this purpose, several algorithms have been developed for determining the parameters of ODE systems for known time series. The essence of these algorithms lies in the fact that they use the special structure of neural ODEs (antisymmetric neural ODEs) with which it is possible to generate a limit cycle. After that, by choosing the weight coefficients, we obtain such bifurcations of the indicated cycle, which lead to the simulation of a real chaotic process.

Mathematical preliminaries
We now recall several well-known results from the theory of approximation of real functions of n variables [28] - [30].
Let X be an arbitrary set in the linear space R^n. Denote by C(X) the set of continuous real functions of n variables with domain of definition X.
Definition 1. A set of real functions F ⊂ C(X) is said to separate points of the set X ⊂ R^n if for any distinct x_1, x_2 ∈ X there exists a function f ∈ F such that f(x_1) ≠ f(x_2).
Definition 2. A collection of functions F ⊂ C(X) is called closed with respect to a function of one variable φ : R → R if φ(f) ∈ F for any f ∈ F.
Theorem 1 ([28]). Let X ⊂ R^n be a compact space and C(X) be the algebra of continuous real functions on X. Let also the set F ⊂ C(X) containing the constant 1 be a linear subspace that is closed with respect to the nonlinear continuous function φ and separates points of the set X. Then F is dense in C(X).
Theorem 1 can be interpreted as a statement about the universal approximation possibilities of arbitrary nonlinearity: using linear operations and a single nonlinear element φ, one can construct an algorithm that builds an analytical model of any continuous function with any desired accuracy.
From an applied point of view, Theorem 1 can be presented as follows. Let (u, v) ≡ ((u_1, ..., u_n), (v_1, ..., v_n)) be the scalar product of vectors u, v ∈ R^n. Let F(x_1, ..., x_n) be a real continuous function defined on a closed bounded domain D ⊂ R^n. Let also ε > 0 be an arbitrarily small number, which specifies the accuracy of the approximation.
Theorem 2 ([28]-[30]). Let ψ be a continuous nonlinear real function of one variable. Then there exist an integer m > 0, a set of vectors a_j ∈ R^n, and sets of real numbers ξ_j and b_j; j = 1, ..., m, such that the function

Φ(x) = Σ_{j=1}^{m} ξ_j ψ((a_j, x) + b_j)    (1)

approximates the given function F(x_1, ..., x_n) with the error ε in the entire domain D.
In terms of neural networks, this theorem can be formulated as follows. Any continuous function of several variables can be realized with any accuracy using a two-layer neural network with a sufficient number of neurons and one nonlinear activation function in the hidden layer.
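A minimal numerical illustration of this statement (the target function, the odd cubic nonlinearity ψ, and all sizes are our arbitrary choices): random hidden-layer weights a_j, b_j combined with least-squares output weights ξ_j already approximate a smooth function of two variables well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function of n = 2 variables on the bounded domain [-1, 1]^2.
F = lambda X: np.sin(X[:, 0]) * X[:, 1]

m = 200                                  # number of hidden neurons
a = rng.normal(size=(2, m))              # weight vectors a_j
b = rng.normal(size=m)                   # biases b_j
psi = lambda z: z ** 3 + z               # a single odd nonlinearity

X = rng.uniform(-1.0, 1.0, size=(500, 2))
H = psi(X @ a + b)                       # hidden outputs psi((a_j, x) + b_j)
xi, *_ = np.linalg.lstsq(H, F(X), rcond=None)   # output weights xi_j

err = np.max(np.abs(H @ xi - F(X)))      # approximation error on the samples
```

With these sizes the pointwise error on the samples is already of order 10^-2, which is consistent with the two-layer interpretation above.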

Periodic solutions of neural ODEs
We assume that we know the dimension n of the real phase space in which the considered dynamic process x(t) = (x 1 (t), ..., x n (t)) T ∈ R n takes place [17,25,26]. We will also assume that the functions x 1 (t), ..., x n (t) are continuous and differentiable with respect to time t on the interval [0, ∞). Our goal will be to model this process with a suitable system of neural ODEs. This system will be based on new concepts, which are demonstrated below.
2) Let us turn to Definition 1, in which we consider X = R and take F to be the set of all odd functions. Since for any f ∈ F and any a ∈ X the equation f(u) = a has a single root, the set F separates the points of the set X.
3) Let f(u, γ ∨ δ), g(u, α ∨ β) ∈ F, where F is the set of all odd functions. Then f(g(−u)) = f(−g(u)) = −f(g(u)), so the composition f(g) is again odd. Thus, ∀f, g ∈ F we have f(g) ∈ F, and the requirement of Definition 2 is satisfied. As follows from items 1)-3), all conditions of Theorems 1 and 2 hold for odd activation functions. In this regard, we can dispense with the parameters ξ_j, j = 1, ..., m, in formula (1) of Theorem 2.
Corollary of Theorem 2. Let ψ be an odd nonlinear activation function of one variable. Then there exist an integer m > 0, a set of vectors a_j ∈ R^n, and a set of real numbers b_j; j = 1, ..., m, such that the function

Φ(x) = Σ_{j=1}^{m} ψ((a_j, x) + b_j)

approximates the given function F(x_1, ..., x_n) with the error ε in the entire domain D.
Now, we can apply this Corollary to approximate the derivatives ẋ_i(t) of the functions x_i(t); i = 1, ..., n.
As a result, we obtain the following system of ordinary differential equations:

ẋ_i(t) = h_i([A − rI]_i x + b_i, α_i ∨ β_i) + Σ_{j=1}^{n} p_ij f_ij(x_j, γ_ij ∨ δ_ij), i = 1, ..., n,    (6)

with the known vector of initial values (x_10, ..., x_n0)^T.
Here h_i(u_i, α_i ∨ β_i) and f_ij(u_i, γ_ij ∨ δ_ij) are real odd power functions of one variable u_i; x = (x_1, ..., x_n)^T ∈ R^n is the vector of states; A ∈ R^{n×n}; I is the identity n × n matrix; [A − rI]_i is the i-th row of the matrix A − rI; r, b_i, and p_ij are real parameters. In what follows, we assume that the conditions of Theorem 2.2 (on local existence and uniqueness of a solution) [31] are fulfilled for system (6) with the initial data vector (x_10, ..., x_n0)^T. System (6) was created for solving approximation problems. However, in various modeling problems it can be of interest in itself. Therefore, in the next theorem we weaken the conditions under which system (6) was constructed.
Proof. The proof of this theorem basically repeats the proof of Theorem 4.1 [25].
Let us introduce a new variable into system (6) according to the formula x → y = (A − rI)x + b, where b = (b_1, ..., b_n)^T. The number r ≥ 0 can be taken large enough so that the matrix (A − rI) is invertible. Therefore, the specified change of variables is correct.
Since the exponents β_i + 1 and α_i + 1 are fractions with an even numerator and an odd denominator, the function V(x_1, ..., x_n) is positive definite.
In this case, on the one hand, there exists a moment T_0 > 0 such that if t > T_0, then W(x_1, ..., x_n) − bH(x_1, ..., x_n) < 0 and the function V decreases; thus, V(x_1(t), ..., x_n(t)) → 0 as t → ∞. On the other hand, this means that once the function V(x_1(t), ..., x_n(t)) is small enough, there exists a moment T_1 > T_0 > 0 such that if t > T_1, then W(x_1(t), ..., x_n(t)) − bH(x_1(t), ..., x_n(t)) > 0 and the positive function V(x_1(t), ..., x_n(t)) increases, and so on. Denote by H = {h_1, ..., h_n} ⊂ R^n the set of all points satisfying the condition W(h_1, ..., h_n) − bH(h_1, ..., h_n) ≤ 0. Since the function H(x_1, ..., x_n) is positive definite, H is a compact set, positively invariant with respect to (7). Therefore, if a solution x(t) of system (7) belongs to H, then it is bounded. This means that the solution V(x_1, ..., x_n) of equation (9) must also be bounded. Now assume that there exists a moment T_u such that x(T_u) ∉ H. Then inequality (12) holds.
Denote by S the ball centered at the origin whose surface passes through the point x(T_u). In this case H ⊂ S. Therefore, by virtue of (12) and according to LaSalle's Theorem [25], there is a moment T_s > T_u such that x(T_s) ∈ H. Again we get that a solution x(t) of system (7) starting in S remains in S. In addition, x(t) is attracted to the boundary of H as t → +∞. Thus, it is bounded.
Now we use the Comparison Principle [25]. It remains to compare the solution V(x_1, ..., x_n) of equation (9) with the analogous solution of inequality (10). From this, the boundedness of the solution x(t) of system (7) follows for any initial condition x_0 ∈ R^n. This completes the proof of case (a1) for strictly odd functions.
Now we can apply Theorem 4.1 [25] to equation (7). All the ideas used in the proof of Theorem 4.1 [25] can be carried over directly to the proof of Theorem 3. It remains to verify only one condition of Theorem 4.1 [25]: the symmetric matrix A + A^T − 2rI must be negative definite. This condition can always be achieved by choosing a sufficiently large parameter r ≥ 0. It is clear that the same statement also holds for system (6). This remark completes the proof.
Comment. In the general case the function V (x 1 , ..., x n ) is not the Lyapunov function for system (7). It is guaranteed to be the Lyapunov function if f (x) ≡ 0.
Let us compose from functions (4) the vector function H(x, s_1, ..., s_n), where s_j = (s_j1, ..., s_jk_j), j = 1, ..., n.
Corollary of Theorem 3. Let the vector h(x) in system (7) be replaced by the vector H(x, s_1, ..., s_n). Then, under the conditions of Theorem 3, any solution of the resulting system is bounded.
The proof repeats the proof of Theorem 3 if in this theorem the function V(x_1, ..., x_n) (8) is replaced by a function in which all the indicated integrals are integrals of power functions.
Now, we will assume that in system (7) r = 0 and f(x) = 0. Then we get the following system:

ẋ(t) = Ah(x).    (13)
Theorem 4. Let A ∈ R^{n×n} be an antisymmetric matrix such that rank A = m ≤ n. Let V ⊂ R^n be the (n − m)-dimensional subspace of R^n such that AV = 0 (if m = n, then V = 0; for odd n > 1, we always have m < n and V ≠ 0). Then any solution x(x_0, t) of system (13) starting from a point x_0 ∈ R^n is periodic.
Proof. We will again use the Lyapunov function V(x_1, ..., x_n) of form (8), which was introduced in the proof of Theorem 3.
Then the set S ⊂ R^n is compact and positively invariant with respect to (13). In addition, it follows that V̇_t(x) = 0.5 h^T(x)(A + A^T)h(x) = 0. This means that the solution x(x_0, t) of system (13) starting from a point x_0 ∈ R^n forms a closed curve on the boundary of the set S ⊂ R^n, and therefore the solution x(x_0, t) is periodic [31].
Now consider the following system of ordinary differential equations:

ẋ(t) = h(Ax + b)    (14)

with the known vector of initial values x_0. Here A_i is the i-th row of the matrix A; i = 1, ..., n.
Theorem 5. (b1) If α_k = β_k and b_k = 0 for all k ∈ {1, ..., n}, then any solution of system (14) is periodic; (b2) if at least for one k ∈ {1, ..., n} we have α_k ≠ β_k or b_k ≠ 0, then any solution of system (14) is a winding of an infinite cylinder with generatrix V. In addition, there exist numbers T > 0 and λ = λ(T) ∈ R that do not depend on x_0, such that any solution x(t) of system (14) admits the representation x(t) = x_T(t) + tλV, where x_T(t) is periodic with period T.
Proof. (b1) With the help of the change of variables y = Ax + b, where b = (b_1, ..., b_n)^T, we transform system (14) into the following system (generally speaking, not equivalent to system (14)): in the new variables y we get the system ẏ(t) = Ah(y). (This is system (13).) It is clear that in this case all the conditions of Theorem 4 are fulfilled, and we obtain a periodic solution p(t) of the system ẏ(t) = Ah(y). In addition, taking into account the new variables y, system (14) can be written in the form ẋ(t) = h(y). From here it follows that

q(t) = q(0) + ∫_0^t h(p(τ)) dτ.    (16)

It is important to note that the solutions q(t) of system (14) are, generally speaking, not periodic. (Such a solution is found from the equation p = Aq + b, where for odd n the matrix A is singular.) It is known that the indefinite integral of a continuous periodic function of period T is the sum of a periodic function of the same period T and a linear function. Thus, from (16) and (14) it follows that q(t) = q_T(t) + tλV ⊂ R^n. Here q_T(t) = (q_T1(t), ..., q_Tn(t))^T ∈ R^n is a periodic vector function.
Here tλV is a straight line in R^n passing through the origin. Thus, the set {q_T(t) + tλV} ⊂ R^n is a curve wound on the cylinder with generatrix V. (The projection of this curve along V is the periodic curve q_T(t).)
Consider the situation α_1 = β_1 and b_1 = 0. Then for A_1 · x ≥ 0 the first equation of system (14) does not change, and for A_1 · x < 0 this equation is represented as ẋ_1 = −(−A_1 · x)^{α_1}, i.e., it defines the same odd power function. Since the initial conditions are the same for both equations, then, in accordance with the well-known Cauchy theorem on the existence and uniqueness of a solution, we must have λ = 0. Repeating the same reasoning for each of the equations of system (14) with α_i = β_i and b_i = 0, we finally obtain λ = 0; i = 2, ..., n. The last statement completes the proof of the whole Theorem 5.
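The conservation argument behind Theorem 4 can be checked numerically. The sketch below (the dimension, the cubic h, and the step size are our illustrative choices) integrates ẋ = Ah(x) with an antisymmetric A and verifies that the positive definite function of the type used in the proofs above (for the cubic h it is V(x) = Σ x_i^4/4) stays constant, so the trajectory is a closed curve:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])            # antisymmetric matrix
h = lambda x: x ** 3                   # odd power activation, deg h = 3

def rk4_step(x, dt):
    """One Runge-Kutta-4 step of xdot = A h(x)."""
    f = lambda x: A @ h(x)
    k1 = f(x); k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2); k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# V(x) = sum_i x_i^4 / 4 has gradient h(x), so along solutions
# dV/dt = h(x)^T A h(x) = 0 because A is antisymmetric.
V = lambda x: np.sum(x ** 4) / 4.0

x = np.array([1.0, 0.5])
V0 = V(x)
for _ in range(5000):                  # integrate up to t = 5
    x = rk4_step(x, 1e-3)
drift = abs(V(x) - V0)                 # should stay at round-off level
```

The drift of V remains at the level of the integration error, which is the numerical counterpart of the closed-curve argument in the proof of Theorem 4.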
The next figure demonstrates the statements of Theorem 5 (see Fig. 1).
Definition 4. System (14), in which the matrix A is antisymmetric and the power activation functions h_i(u_i, α_i ∨ β_i), i = 1, ..., n, are odd, is called the unperturbed system of antisymmetric neural ODEs.
In addition, the following system

ẏ(t) = T^{−1} h(ATy + b),

obtained from system (14) of antisymmetric neural ODEs with the help of the linear invertible transformation T ∈ R^{n×n} (x = T·y), will also be called an unperturbed system of antisymmetric neural ODEs.
Definition 5. Let the matrix A of system (6) be antisymmetric and r ≥ 0. Then, under the conditions of Theorem 3, system (6) is called a perturbed system of antisymmetric neural ODEs.
On conditions for the appearance of chaos in system (7)

It is known that the following question often arises when modeling chaotic processes: can the created model generate chaotic behavior? The same question can be extended to model (7) (or (6)). Denote by deg f ∈ N the degree of the power function f, where N is the set of natural numbers. In what follows we assume that the inequality deg h(x) > deg f(x) holds.
Consider the following simplified version of system (7):

ẋ(t) = (A − rI)h(x) + Bf(x).    (18)

Here the matrices A, B ∈ R^{n×n}, and the matrix A is antisymmetric; h(x) and f(x) are the power vector functions defined above.
Let us construct for system (18) a positive definite function V(x_1, ..., x_n) (8). We denote by W the set of all points from R^n such that V̇_t(x_1, ..., x_n) ≤ 0. Let also L ⊂ W be the set of all points in W such that V̇_t(x_1, ..., x_n) = 0. We also denote by X ⊂ R^n a set of initial values.
Theorem 6. Let the point 0 be a unique equilibrium point for system (18) in W. Suppose also that: 1) the point 0 is a saddle point; 2) there exists a value of the parameter r > 0 such that for an arbitrary vector of initial values (x_10, ..., x_n0)^T ∈ X the solutions x_1(x_10, ..., x_n0, t), ..., x_n(x_10, ..., x_n0, t) of system (18) pass arbitrarily close to the point 0. Then, under the conditions of Theorem 3, system (18) exhibits chaotic dynamics.
Proof. According to Theorem 3, there exists the value r = r 0 such that all solutions of system (18) are bounded. Therefore, for r = r 0 the set W is a compact positively invariant set with respect to system (18).
According to LaSalle's theorem, every solution of system (18) starting in W approaches the largest invariant set M ⊂ L as t → ∞ (see [25,31]). In our case, by the assumption A^T + A = 0 and condition 1) of Theorem 6, the role of the set M can be played either by a limit cycle or by a homoclinic trajectory connected at 0 (see Theorem 4). If both conditions 1) and 2) of Theorem 6 are satisfied, then there exists a sequence of values r_01 > r_02 > ... > r_0k > ... of the parameter r such that lim_{k→∞} r_0k = r_c > 0 and the set M(r_c) at the critical value r_c is the homoclinic orbit.
(Indeed, let N^s and N^u be the stable and unstable manifolds of the point 0 [21,34]. Denote by x_0 = (x_01, ..., x_0n)^T ∈ N^u the starting point. Since at the point x_0 we have V̇_t(x_01, ..., x_0n) ≥ 0, the solution x(x_0, t) of system (18) should be attracted to a certain limit cycle in L. According to condition 2) of Theorem 6, this limit cycle for some value of the parameter r will pass arbitrarily close to the origin (to the manifold N^s). In other words, near the point 0 on the trajectory x(x_0, t) there will be a point x_1 = (x_11, ..., x_1n)^T such that V̇_t(x_11, ..., x_1n) ≤ 0. Therefore, there must be a value r_c of the parameter r for which N^s ∩ N^u ≠ ∅. This means the existence of a homoclinic orbit.)
The last statement enables us to construct a discrete mapping for system (18). Theorems 4 and 5 allow us to assert that a limit cycle can exist in the perturbed system (18) for some values of the parameter r. Let T_0 be the period of this cycle. In this case, the continuous relation (11) for the positive definite function V(t) ≡ V(x_1(t), ..., x_n(t)) can be rewritten on the intervals [kT_0, (k+1)T_0], where k = 0, 1, .... A discrete analogue of this relation, according to the technique described in [32]-[34], can be represented in the form of a one-dimensional mapping (19). Here p > 0; φ(u) is a linear combination of power functions u^{q_i} of one variable u > 0 (with q_i > 0) and 0 < deg φ(u) = max q_i < q, i = 1, ..., l.
In [32]-[34] it is shown that for some r = r_c mapping (19) generates chaotic dynamics. Therefore, system (18) at r = r_c will also exhibit chaotic behavior.
Comment 1. Theorems 4 and 5 guarantee the existence of periodic trajectories for unperturbed systems. This means that for small perturbations, periodic motions (limit cycles) will appear in system (18). It is precisely the existence of a limit cycle in system (18) that allows a cascade of bifurcations leading to the appearance of a chaotic attractor to start.
Comment 2. Note that the condition A^T + A = 0 cannot be omitted. Indeed, if this condition is satisfied, then ∀r > 0 we have A^T + A − 2rI = −2rI < 0. Otherwise (at A^T + A ≠ 0), the inequality A^T + A − 2rI < 0 does not hold ∀r > 0. Therefore, if A^T + A ≠ 0, condition 2) of Theorem 6, generally speaking, cannot be achieved.
Nevertheless, the results obtained for system (18) can be strengthened if the matrix A is replaced by a similar non-antisymmetric matrix A_n = H^{−1}AH (A_n^T + A_n ≠ 0), which yields system (20).
In order to check condition 2) of Theorem 6, we can proceed in the following way.
1. In system (18), set the parameter r large enough.
2. Decrease the parameter r until a limit cycle appears in system (18).
3. Continue slowly decreasing the parameter r until limit cycles of any multiplicity appear in system (18). (This means that the non-negative function V(x_1(t), ..., x_n(t)) will have any number of minima.)
If the parameter r is close enough to the value r_c (which is unknown), then the minima of the function V(x_1(t), ..., x_n(t)) will take arbitrarily small (but not zero!) values. This will mean that r ≈ r_c (see Fig. 1).
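This scan over r can be sketched as follows. All concrete choices here are ours for illustration: we assume system (18) in the form ẋ = (A − rI)h(x) + f(x) with the odd cubic h(u) = u^3 and the low-degree odd perturbation f(u) = u^{1/3} (so that deg h > deg f), and we monitor the minima of V along the tail of each trajectory while r is decreased:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])                 # antisymmetric part
h = lambda x: x ** 3                        # deg h = 3
f = lambda x: np.cbrt(x)                    # odd perturbation, deg f = 1/3
V = lambda x: np.sum(x ** 4) / 4.0          # positive definite function

def tail_min_V(r, x0, dt=1e-3, steps=20000):
    """Integrate xdot = (A - r I) h(x) + f(x); return min of V on the tail."""
    rhs = lambda x: (A - r * np.eye(2)) @ h(x) + f(x)
    x, vals = x0.copy(), []
    for _ in range(steps):
        k1 = rhs(x); k2 = rhs(x + 0.5 * dt * k1)
        k3 = rhs(x + 0.5 * dt * k2); k4 = rhs(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        vals.append(V(x))
    return min(vals[len(vals) // 2:])       # minima of V after the transient

# Step 1: start with r large; steps 2-3: decrease r and watch the minima.
minima = {r: tail_min_V(r, np.array([1.0, 0.5])) for r in (10.0, 5.0, 2.0)}
```

For large r the tail minima of V are small (strong contraction toward the origin), and they grow as r is decreased; in the scan above one would keep lowering r until the minima become arbitrarily small but nonzero, signaling r ≈ r_c.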

Homoclinic chaos
A large number of chaotic attractors arise when so-called homoclinic orbits exist in a dynamical system (see [18] - [21], [32] - [37], and references to papers on chaotic dynamics cited elsewhere).
Definition 6 [35]. A bounded trajectory x(t, x_0) ∈ R^n of system (7) is called a homoclinic orbit if it converges to the same equilibrium point as t → ±∞.
Let A = (a 1 , ..., a n ) ∈ R n×n be the antisymmetric matrix composed of columns a i ∈ R n ; i = 1, ..., n.
For each s ∈ {1, ..., n}, denote by A_s ∈ R^{n×n} (A^s ∈ R^{n×n}) the matrix obtained from the matrix A by replacing the column a_s (the row −a_s^T) with the column −a_s (the row a_s^T).
Definition 7. The matrix A_s (or A^s) will be called partially antisymmetric.
Let p < n be a positive integer. If the columns a_s1, ..., a_sp, 1 ≤ s_1 < ... < s_p ≤ n, of the matrix A are replaced by the columns −a_s1, ..., −a_sp, then the resulting matrix will also be denoted by A_{s_1...s_p} and called partially antisymmetric. A similar designation A^{s_1...s_p} is retained for the rows. Let h(x) = (h_1(x_1, α_1 ∨ β_1), ..., h_n(x_n, α_n ∨ β_n))^T and f(x) = (f_1(x_1, γ_1 ∨ δ_1), ..., f_n(x_n, γ_n ∨ δ_n))^T be two vectors of power functions.
Theorem 7. Let ψ and φ ≠ 0 be arbitrary real numbers and let h(x) be an odd vector function. Consider the system of ordinary differential equations (21), in which it is assumed that deg h(x) > deg f(x). Then any solution x(t, x_0) of system (21) is either periodic or a homoclinic orbit.
(a2) Let ψ · φ ≠ 0. Consider the corresponding two scalar products. (Here we have used the obvious equality A^s_s + (A^s_s)^T = 0.) Without loss of generality, we can assume that φ > 0. Then from the conditions of Theorem 7 it follows that for a sufficiently large norm ||x|| of the vector x we have V(x) > 0. This means that if V(x_0) = C(x_0) > 0, then the level set V(x) = C(x_0) > 0 is bounded. Consequently, the corresponding solution of system (21) is closed and therefore periodic. Now let the vector of initial conditions x_0^* be such that V(x_0^*) = 0. If the function f(x) is odd, then the function V(x) is even. From here it follows that V(x_0^*) = V(−x_0^*) = 0 and, in addition, V(0) = 0. Consequently, there is a trajectory of system (21) which at t = 0 leaves some point x_0 ≠ 0 arbitrarily close to x_0^* ≈ 0 and as t → ±∞ approaches the point 0. Now, in system (21), we change the sign of time: t → −τ. Then we obtain system (22). Further, we apply to the study of solutions of equation (22) the same technique as for the study of solutions of equation (21). As a result, instead of analyzing the equation V(x(t)) = C(x_{t=0}) > 0, we come to the analysis of the equation −V(x(τ)) = −C(x_{τ=0}) < 0. Thus, based on the structure of the function V(x(t)), we have V(x(t)) = V(x(τ)) = V(x(−t)) = V(−x(t)) = C(x_0) > 0. This means the existence of a homoclinic orbit.
(a3) Now let p > 1. Introduce the corresponding vector of power functions. Then the proof of the case p > 1 completely repeats the proof of the case p = 1 if we take into account the obvious equality A^{s_1...s_p}_{s_1...s_p} + (A^{s_1...s_p}_{s_1...s_p})^T = 0.
Since φ ≠ 0, we can assume that φ = 1. Consider the special case (23) of system (18), in which the matrix A is antisymmetric and the matrix A^{s_1...s_p} is partially antisymmetric. Thus, if the conditions of Theorems 6 and 7 are satisfied for system (23), then we get chaotic behavior of this system. (Note that if the vectors h(x) and f(x) in system (21) are replaced by the vectors H(x, s_1, ..., s_n) and F(x, s_1, ..., s_n), then by virtue of the Corollary of Theorem 3 the statement of Theorem 7 is preserved.)
Definition 8. Chaos arising in system (23) will be called homoclinic.

Suppose that in system (21) we have n = 2 and ψ = 1. Then, depending on the parameter φ, we obtain the phase portraits shown in Fig. 2; depending on the parameter ψ, we obtain the phase portraits shown in Fig. 3.
The result of Theorem 7 can be generalized in the following way. Let P ∈ R^{n×n} be a real matrix. We introduce the vector h(Px + b) (compare with system (14)). Introduce also the following l = 2^n matrices. Then equation (23) can be rewritten in the form (24). Equation (24) can be used in two directions.
3. Simulation of systems containing a limit cycle.
In this case, ψ = 0 and, instead of (24), one can use the system (25), in which the shift vector c is added to the right-hand side of system (24).
4. Simulation of systems containing a homoclinic orbit (for example, Lorenz-like systems [37]). Clearly, if ψ ≠ 0, then among the solutions of equation (24) there exists a homoclinic orbit. In addition, the homoclinic orbit can be obtained in the following way. Let ψ = −1, h(x) = f(x), c, c_0 ∈ R^n, r ≥ 0, and, instead of (24), use the system (26), where D_0 = diag(d_11, ..., d_nn) ∈ R^{n×n}; i, j ∈ {1, ..., l}. (The case D_0 = I is not excluded.) Then, for certain values of the parameters, a homoclinic orbit will exist among the solutions of system (26). (Note that in system (26) the components h_i(v_i), i = 1, ..., n, of the vector h(v) can be either even or odd activation functions.) For example, for the Lorenz system

ẋ_1 = σ(x_2 − x_1), ẋ_2 = x_1(ρ − x_3) − x_2, ẋ_3 = x_1x_2 − βx_3,    (27)

the corresponding parameters can be written out explicitly. The use of representation (26) to approximate derivatives can be motivated by the following considerations.
Indeed, without loss of generality we can assume that a ≥ 0 and c ≥ 0. Let u^* be the root of the function φ(u) (it is easy to check that this root is unique).
We introduce the following vectors. Figure 4 shows the transition of system (18) from a regular regime to chaotic behavior. Thus, we have shown that systems of form (6) can simulate chaotic processes.
2. Strange attractor. Theorems 3 and 6 are valid if r > 0. In the case r = 0, the statements of these theorems have not been proved. In this regard, consider the 3D system (20) with r = 0 (compare with system (17)). Thus, system (20) allows simulating more complex processes than system (18).

Algorithms for adjusting the weight coefficients of neural ODEs with power activation functions
Suppose that we study the behavior x(t) of some dynamical system and can determine the dimension n of the space in which this system operates. We introduce real numbers ∆t > 0 and 0 < T < ∞ such that T >> n∆t. Assume that for any t ∈ [0, T] the vectors x(t + k∆t) and ẋ(t + k∆t) ∈ R^n can be measured; k = 0, 1, .... (If the measurement of the vector ẋ(t) is impossible, then the standard approximation of the derivative ẋ(t) ≈ (x(t + ∆t) − x(t))/∆t is used to find it.) The algorithms are based on the least squares method [1] and on the fact that we know with sufficient precision the components of x(t) and its derivative ẋ(t).
Suppose that n time series
g_10, g_11, g_12, ..., g_1N,
g_20, g_21, g_22, ..., g_2N,
. . .
g_n0, g_n1, g_n2, ..., g_nN
are given on the same time interval T at N equally spaced nodes: 0, ∆t, ..., k∆t, ..., N∆t = T. Thus, ∆t = T/N. The objective is to determine the dynamical system described by the equation ẋ(t) = F(x) from the known time series; here F : R^n → R^n is realized by a multilayer neural network x_{k+1} = x_k + ∆t · F(x_k); k = 0, 1, ... (see [9,10,11,17,25]). The difference between the input x and the output F(x) is compared with ẋ(t) to generate an error e(t) ∈ R^n. This error is used to adjust the network parameters so that e(t) = ẋ(t) − F(x) → 0. The function F(x) is the right-hand side of system (6) with unknown parameters. It is necessary to minimize the error e(t) (in any norm) by adjusting some of the parameters included in (6).
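The scheme just described can be tried on a toy example (entirely our own: a known linear system plays the role of the process, forward differences estimate ẋ, and ordinary least squares recovers the right-hand side):

```python
import numpy as np

# "Unknown" process: a damped linear oscillator xdot = M x (illustrative).
M = np.array([[0.0, 1.0],
              [-1.0, -0.1]])
dt, N = 0.001, 5000

X = np.empty((N, 2))
x = np.array([1.0, 0.0])
f = lambda x: M @ x
for k in range(N):                       # sample the trajectory (RK4 steps)
    X[k] = x
    k1 = f(x); k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2); k4 = f(x + dt * k3)
    x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Standard finite-difference approximation of the derivative xdot(t).
DER = (X[1:] - X[:-1]) / dt

# Least squares fit DER ~ X @ M_hat.T recovers the right-hand side.
B, *_ = np.linalg.lstsq(X[:-1], DER, rcond=None)
M_hat = B.T
err = np.max(np.abs(M_hat - M))
```

The recovered matrix agrees with M up to the O(∆t) bias of the forward difference; the algorithms below follow the same pattern with the nonlinear right-hand side of system (6) in place of M x.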

Algorithm 1: architecture of neural ODEs does not use antisymmetric matrices
The algorithm implements a search procedure for approximating a time series by solutions of a system of differential equations. In this system:
1. There is a linear part.
2. The nonlinear part contains one fixed even and one odd activation function that must be adjusted.
3. All matrices of the system are matrices of general form (antisymmetric matrices are not used).

2. Construct a perturbation matrix.
2.5. Find the matrix Y using the least squares method. (Here I ∈ R^{(3n+1)×(3n+1)} is the identity matrix.)
3. Compute the matrix of errors ERR := (e_ij) = DER − W · Y^T ∈ R^{m×n} and the total computational error at the iterative steps k_a and k_b. (End of the double cycle over the integer variables k_a and k_b.)
6. Print the matrix Y ≡ Y_{la,lb} ∈ R^{n×(3n+1)} and the numbers α > 1, β > 1, and stop the algorithm.
7. Solve system (28). If the solutions of system (28) diverge, then increase the values γ and δ by a small number ∆ > 0 (γ := γ + ∆ > 0, δ := δ + ∆) and go to item 2. If for several values of γ and δ the solutions of system (28) still diverge, then stop the algorithm.
Comment 1. If the matrix B or D is absent in system (28), then there are no fixed degrees of activation functions. Indeed, let B = 0; then Algorithm 1 works directly. If D = 0 holds, then the redesignations of variables B → D, γ → α, and δ → β should be introduced into Algorithm 1.
Comment 2. If the matrix D + D^T is negative definite, then the resulting neural ODE model will generate bounded solutions (see [25]).
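Step 2.5 amounts to a regularized least-squares solve. In this synthetic check (the concrete content of the regressor matrix W in (28) is not reproduced here, so W and the true Y are just random stand-ins of the right shapes), a small ridge term εI keeps W^T W invertible:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 3
p = 3 * n + 1                                # number of columns of W

W = rng.normal(size=(m, p))                  # regressor matrix built from the data
Y_true = rng.normal(size=(n, p))             # "true" parameters (synthetic)
DER = W @ Y_true.T + 1e-3 * rng.normal(size=(m, n))   # noisy derivative data

eps = 1e-8                                   # ridge parameter; I is p x p
Y = np.linalg.solve(W.T @ W + eps * np.eye(p), W.T @ DER).T

err = np.max(np.abs(Y - Y_true))
```

With well-conditioned data the ridge term is inert; its role is to make the inversion robust when the columns of W (linear and activation parts) are nearly dependent.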

Algorithm 2: antisymmetric matrices are used in the architecture of neural ODEs
It is known that in most real dynamical systems, chaotic phenomena arise as a result of the development of certain periodic processes. In this regard, in order to model the chaotic behavior of such systems, it is desirable to include a mechanism generating a limit cycle in the architecture of the neural network. In turn, the cascade of bifurcations of the limit cycle will lead to the appearance of chaos in the modeled system. The following algorithm is designed to simulate limit cycles in real dynamic processes.
As an object of modeling, we choose some generalization of system (25). Consider a matrix S such that (S + D_0) + (S + D_0)^T = 0. (Thus, S + D_0 is antisymmetric.) Now, we introduce the system (30). It is known that search algorithms are based on the Jacobi matrix Jac. Let us form this matrix for system (30) using representation (29). In this case, the vector of parameters z is composed of the adjustable parameters of system (30). Let m = 1. We introduce auxiliary matrices needed to construct the Jacobi matrix for system (30), where 0 is the zero matrix, and I_{n−i} ∈ R^{(n−i)×(n−i)} and I_n ∈ R^{n×n} are the identity matrices; i = 1, ..., n − 1. Now, we briefly describe the algorithm for adjusting the weights for system (30).
1. Form the matrix (g_0, ..., g_n) ∈ R^{m×(n+1)}.

Compute the vector of errors and the total computation error at iteration step k. If the total error continues to decrease, then compute the next vector Y_k; else go to item 6. (Here I ∈ R^{(C²_n+n(n+2))×(C²_n+n(n+2))} is the identity matrix.) Print the vector Y_{k−1} ∈ R^{C²_n+n(n+2)} and stop the algorithm.
8. Solve system (30). (If the solutions of system (30) diverge, then in the initial conditions replace the value d_11 = 0 with a sufficiently small value |d_11| and go to item 2.)
Comment 1. Modeling of a chaotic dynamic process should begin with building a limit cycle. As Theorems 4 and 5 show, for a periodic trajectory to appear in system (30) it is sufficient that b_1 = ... = b_n = 0. Therefore, Algorithm 2 can be started by putting b_1 = ... = b_n = 0.
Let n = 3. In order to simulate a chaotic process, it is necessary to introduce perturbations into the right-hand sides of system (30). These perturbations are generated by adding one neuron, two neurons, three neurons (nonlinearities), etc., to the right-hand sides of system (30). As a result, we can get the system (32). Here f_i(..., v_i^{γ_i}) are real piecewise continuous power functions of one variable; k_i = 1 (k_i = 0) if f_i(...) is odd (even); i, j = 1, 2, 3. The addition of neurons can be stopped when a suitable approximation accuracy is obtained.
We assume that B = 0. Then the use of Algorithm 1 with odd activation functions leads to certain coefficients of system (28). Now, in order to simulate the processes in system (32), we assume D = 0 and again use Algorithm 1, but with even activation functions. Then we obtain other coefficients of system (28). Thus, we can assert that the use of even activation functions leads to more accurate modeling (especially for system (34)) than the use of odd activation functions.

Applications of Algorithm 2
Consider the following special case of system (26), system (36). The triple (l_1, l_2, l_3) indicates one of the possible types of homoclinic orbits that can be realized in system (36). In this subsection, Algorithm 2 will be used only to simulate systems with possible homoclinic orbits (ν = −1). Therefore, instead of the nonlinear part h(Sx + b) in system (30), the nonlinear part h(SG_1x + b) − h(SG_2x + b) should be used.
In order to check the performance of Algorithm 2, several well-known systems were used.
1.2. We form time series for system (27) and assign h(v_i) = piecewise(v_i < 0, (−v_i)^{1.5}, v_i^{1.5}); i = 1, 2, 3. Then Algorithm 2 applied to system (36) leads to the following results.
1.3. Next, consider h_1(v_i) = piecewise(v_i < 0, −(−v_i)^{1.1}, v_i^{1.1}); i = 1, 2, 3. In this case, Algorithm 2 leads to the following results. Thus, in case 1.3 (see Fig. 8(a3)), there was no improvement in the quality of approximation.
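For reference, the time series for system (27) used in case 1.2 can be generated by direct integration of the Lorenz system with the classical parameters (the step size, number of samples, and initial point are our choices):

```python
import numpy as np

def lorenz(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (27)."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def rk4_step(x, dt):
    k1 = lorenz(x); k2 = lorenz(x + 0.5 * dt * k1)
    k3 = lorenz(x + 0.5 * dt * k2); k4 = lorenz(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, N = 0.01, 5000
g = np.empty((N, 3))                     # the three time series g_1, g_2, g_3
x = np.array([1.0, 1.0, 1.0])
for k in range(N):
    g[k] = x
    x = rk4_step(x, dt)
```

The rows of g play the role of the series g_ik fed to Algorithm 2; any other chaotic benchmark could be substituted in the same way.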