Event-triggered optimal control for nonlinear stochastic systems via adaptive dynamic programming

For nonlinear Itô-type stochastic systems, this paper studies the problem of event-triggered optimal control (ETOC) and explores the adaptive dynamic programming (ADP) approach to implement it. The value function of the Hamilton–Jacobi–Bellman (HJB) equation is approximated by a critic neural network (CNN). Moreover, a new event-triggering scheme is proposed, which can be used to design the ETOC directly via the solution of the HJB equation. By utilizing the Lyapunov direct method, it is proved that the ETOC based on the ADP approach ensures that the CNN weight errors and the system states are semi-globally uniformly ultimately bounded in probability. Furthermore, an upper bound is given on the predetermined cost function. There has been no published literature on the ETOC for nonlinear Itô-type stochastic systems via the ADP method, and this work is the first attempt to fill that gap. Finally, the effectiveness of the proposed method is illustrated through two numerical examples.


Introduction
The control problem of nonlinear stochastic systems has been widely studied in different fields, such as biological systems, chemical reaction processes, and financial systems [1,2]. In existing control theories, time-triggered control (TTC) is an important method for most control problems [3][4][5][6][7]. However, TTC requires the controller to be updated at every time instant, and this drawback greatly limits its practical applications [8]. Therefore, the method of event-triggered control (ETC) was proposed, which greatly reduces the computational burden compared with the TTC approach. In ETC, the execution of the control task is determined by a well-designed event-triggering mechanism, and the control input is updated only when the triggering condition is violated. Because ETC requires less computation, it has been widely used to solve consistency problems in different systems, including multi-agent systems [9][10][11], nonlinear systems with unknown dynamics [12,13], switching systems [14,15], and networked control systems [16,17].
It has been well acknowledged that nonlinear Itô-type stochastic systems are widely used in various fields, such as biological engineering, finance, nuclear reactors, and physics [18,19]. Stochastic systems not only supplement deterministic systems, but also reflect the dynamic characteristics of engineering systems more precisely. Stochastic disturbances change the dynamic characteristics of the original system, reduce the control performance, and may even destroy the stability of the system. In recent years, many scholars have studied the stability of stochastic systems. The problem of ETC for stochastic nonlinear delay systems with exogenous disturbances was first solved by Zhu in [20]. In [21], the stability analysis and design procedure of ETC for nonlinear stochastic systems with state-dependent noise were studied. For network-based control of linear stochastic systems with state multiplicative noise, the time-delay method and the ETC approach were used in [22] to obtain sufficient conditions for exponential mean square stability. However, almost all research on ETC problems for stochastic systems focuses on the design of a feedback controller to achieve the control objectives. Different from traditional control methods, optimal control not only realizes the control goal but also optimizes a given performance cost function [23]. Optimal control has a very wide scope of applications in practice, such as enterprise planning, satellite launch, and production control. Therefore, it is of great significance to study the event-triggered optimal control (ETOC) of nonlinear stochastic systems.
As is well known, solving an optimal control problem usually comes down to solving the HJB equation. However, the analytical solution of the HJB equation for nonlinear systems is very hard to obtain [24]. The ADP method, which combines dynamic programming, reinforcement learning, and adaptive techniques, was first presented by Werbos in [25]. It provides a novel and effective method for solving the optimal control problem of nonlinear systems. Compared with dynamic programming, the advantage of ADP is that it is suitable for complex nonlinear systems. Moreover, it can effectively solve the nonlinear HJB equation and overcome the curse of dimensionality. A greedy iterative ADP algorithm was designed in [26] to solve the HJB equation for discrete-time nonlinear optimal control. For nonlinear polynomial systems, a new global ADP approach was suggested to deal with the adaptive optimal control problem [27]. The difference between this method and the known nonlinear ADP methods is that it avoids the neural network (NN) approximation and greatly improves computational efficiency.
ADP methods for deterministic nonlinear systems with ETOC have been studied by many scholars and are well developed [28][29][30][31][32][33][34][35][36]. For discrete-time nonlinear deterministic systems, the ETOC problem was studied in [28], which proposes an adaptive ETOC based on heuristic dynamic programming. In [29], an ETC based on the ADP method was suggested to solve the optimal control problem of unknown nonlinear continuous-time systems with input constraints. Dong proposed an ETOC structure and an ADP approach for nonlinear systems with control constraints in [32]. Based on an event-triggering mechanism and the ADP approach, a new optimal control method for unknown nonlinear continuous-time systems was proposed in [34]. Guo et al. [36] studied the event-triggered guaranteed cost optimal tracking control problem for a class of uncertain nonlinear systems by using the ADP approach. However, there is no research on ETOC for Itô-type stochastic systems based on the ADP method.
In addition, most works on ETOC problems for nonlinear deterministic systems first need to obtain the event-triggered HJB equation and then get approximate solutions by using ADP methods [37][38][39][40]. Since the event-triggered HJB equation is already an approximation of the given equation, the error of the solution is relatively large. Moreover, the optimal performance index in ETOC is bound to degrade to some extent because of the use of event-triggered control. Therefore, it is essential to investigate how much the performance declines under the ETC approach. However, there is little research on how the event-triggering scheme affects the optimal performance index. In the works of Dong et al. [32] and Zhu et al. [33], the prespecified cost function (C-F) for the ETOC contains an integral term, and the boundedness of the integral term was not analyzed. Luo et al. [41] proposed a new ETOC method, which guarantees an upper bound for the C-F. However, that work does not take into account the impact of stochastic disturbances on the system and the performance index, which is not in line with practical applications.
Motivated by the above discussions, the problem of ETOC for nonlinear Itô-type stochastic systems is studied in this paper by utilizing the ADP method. The main contributions are as follows.
i) ETOC for nonlinear Itô-type stochastic systems is designed for the first time in this paper by using the ADP method, which yields a numerical solution of the HJB equation. Moreover, it reduces the computational load and saves communication resources.
ii) In the existing literature, there is no study of the influence of ETOC on the corresponding performance index of the Itô-type stochastic optimal control problem. In our work, a new ETOC is presented that ensures a predetermined upper bound of the corresponding performance index for the Itô-type stochastic optimal control problem.
iii) For ETOC problems, the main purpose of most works is to obtain the numerical solution of the event-triggered HJB equation by applying the ADP method [37][38][39][40]. However, the event-triggered HJB equation is itself already an approximation of the original HJB equation. In our work, a new event-triggering scheme is proposed, which can be used to design the ETOC directly via the solution of the HJB equation, so that the accuracy of the approximate solution can be improved.
The rest of the paper is organized as follows. In Sect. 2, we give the problem statements and preliminaries. The ETOC scheme, the stability problem, and an upper bound of the C-F are developed in Sect. 3. Section 4 constructs the CNN to estimate the optimal value function. The ETOC based on the ADP method is analyzed theoretically in Sect. 5. In Sect. 6, simulation examples are presented to illustrate the application of the proposed method. Section 7 concludes this paper.

Problem statements and preliminaries
The notations used in this paper are as follows. $\mathbb{R}^n$ denotes the $n$-dimensional Euclidean space and $\|\cdot\|$ represents the vector norm. The superscript $T$ represents the transpose. $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$ denotes a complete probability space with a natural filtration $\{\mathcal{F}_t\}_{t \ge 0}$. $E\{\cdot\}$ denotes the expectation operator with respect to the probability measure $P$. $\underline{\sigma}(\cdot)$ and $\bar{\sigma}(\cdot)$ represent the minimum and maximum singular values, respectively. $X$ is a compact set of $\mathbb{R}^n$. Let $C^2(X)$ represent the family of nonnegative functions $V(x)$ and $Y(x)$ that are twice differentiable in $x \in X \subset \mathbb{R}^n$.
Consider the Itô-type nonlinear stochastic system (1), where $x = [x_1, \cdots, x_n]^T \in X \subset \mathbb{R}^n$ represents the system state, $u(x(t)) \in \mathbb{R}^p$ is the control input, and $w(t)$ is a one-dimensional Brownian motion defined on the space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$. $F(x(t))$, $G_1(x(t))$, and $G_2(x(t))$ are Lipschitz continuous on the compact set $X \subset \mathbb{R}^n$. The cost function of (1) is given by (2), where $Q(x(t))$ is a positive definite function that weights the state $x(t)$ with $Q(0) = 0$, and $R > 0$. The goal of the optimal control problem is to find the optimal value function $V^*(x(t))$, where $V^*(x(t)) \in C^2(X)$, $V^*(x(t)) \ge 0$, and $V^*(0) = 0$. If the control input $u(x(t))$ is admissible, then the Hamiltonian of $V^*(x(t))$ and $u(x(t))$ can be defined. Based on [42], $V^*(x)$ can be obtained by solving the HJB equation (5). The optimal control is then given by (6), together with the corresponding optimal performance index.

Substituting (6) into (5) yields the HJB equation (7).
Remark 1 Note that the HJB equation (7) is a time-triggered HJB equation, so the controller is required to stay active at every time instant. The time-triggered optimal control (TTOC) therefore carries a heavy computational burden and needs more communication resources. In contrast, the controller in the ETOC method is updated only when the triggering condition is violated, which overcomes the above shortcomings. In this paper, we develop a novel ETOC to achieve the approximate solution of (7).
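For orientation, the standard forms that are consistent with the definitions of this section (scalar Brownian motion $w(t)$, state weighting $Q$, control weighting $R$) can be sketched as follows. This is a sketch under assumptions, not a reproduction of equations (1)-(7): the affine-in-control drift, the quadratic control penalty, and the resulting constants $\tfrac12$ and $\tfrac14$ are the usual textbook choices and may differ from the paper's exact expressions.

```latex
\begin{align*}
dx(t) &= \bigl[F(x(t)) + G_1(x(t))\,u(x(t))\bigr]\,dt + G_2(x(t))\,dw(t), \\
J(x_0, u) &= E\Bigl\{\int_0^{\infty} \bigl[Q(x(t)) + u^{T}(x(t))\,R\,u(x(t))\bigr]\,dt\Bigr\}, \\
H(x, u, \nabla_x V^*) &= Q(x) + u^{T} R\,u + \nabla_x V^{*T}(x)\bigl[F(x) + G_1(x)\,u\bigr]
  + \tfrac12\, G_2^{T}(x)\,\nabla_{xx} V^*(x)\,G_2(x), \\
u^*(x) &= -\tfrac12\, R^{-1} G_1^{T}(x)\,\nabla_x V^*(x), \\
0 &= Q(x) + \nabla_x V^{*T}(x)\,F(x)
  - \tfrac14\, \nabla_x V^{*T}(x)\,G_1(x)\,R^{-1} G_1^{T}(x)\,\nabla_x V^*(x)
  + \tfrac12\, G_2^{T}(x)\,\nabla_{xx} V^*(x)\,G_2(x).
\end{align*}
```

The fourth line follows from setting $\partial H / \partial u = 2Ru + G_1^{T}\nabla_x V^* = 0$, and the last line is obtained by substituting this minimizer back into $H = 0$.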

Design of ETOC and stability analysis
Let the set of triggering instants $\{t_k\}$ satisfy $0 = t_1 < \cdots < t_k < \cdots$ and $\lim_{k \to \infty} t_k = \infty$, and define the sampled state as $\hat{x}(t) = x(t_k)$ for $t \in [t_k, t_{k+1})$, where $t_k$ is the triggering instant. Let $e(t)$ in (8) represent the error between the true state $x(t)$ and the sampled state $\hat{x}(t)$. The sampling interval of the ETOC is defined as $h_k = t_{k+1} - t_k$. According to (6), the ETOC $\mu(\hat{x})$ has the form (9). By system (1), error (8), and ETOC (9), the closed-loop system is written as (10).
Now, a new event-triggering scheme (11) is considered in this paper, where $c > 0$ is a predetermined constant parameter that determines an upper bound of the C-F (2). The controller is updated according to the current state when the triggering scheme (11) is violated. Therefore, the next release time instant $t_{k+1}$ is updated as in (12).
Theorem 1 The closed-loop system (10) with the event-triggered optimal controller $\mu(\hat{x})$ in (9) and the event-triggering scheme (11) is asymptotically stable in probability, and an upper bound can be given for the C-F, if $V^*(x)$ is a solution of (7).
Proof Choose $V^*(x)$ as the Lyapunov function. According to (11) and (12), we acquire (13), which implies that (10) is asymptotically stable in probability. Based on (13) and the Dynkin formula with $V(x_T) > 0$, a corresponding inequality holds for any $T > 0$. Letting $T \to \infty$ yields the upper bound of the C-F.
Corollary 1 For system (10) with the triggering scheme (11) and the triggering time sequence (12), the sampling interval $h_k$ of the ETOC is monotonically nondecreasing. That is, $h_{k_1} \ge h_{k_2}$ for any $k_1 \ge k_2 > 0$.
Proof For $t \in [0, h_{k_2}]$, the result follows according to (12).
If the parameter $c = 0$, we obtain the sampling interval $h_k = 0$ for all $k$ and $J(x_0, \mu) = J^*(x_0)$. Then, the ETOC reduces to the traditional TTOC.
Proof According to the HJB equation (7), (11), (14), and the condition $c = 0$, one obtains from (12) that $t_{k+1} = t_k$, that is, $h_k = 0$ for all $k = 0, 1, 2, \cdots$. According to the condition $c = 0$ and Theorem 1, we obtain $J(x_0, \mu) = J^*(x_0)$.
Remark 2 Based on Corollary 1, a larger $c$ leads to a larger sampling interval, which greatly reduces the computational complexity and saves communication resources. That is, $c$ is a tuning parameter between the TTOC and the ETOC.

Remark 3 It is worth noting that the proposed event-triggering scheme (11) guarantees that the C-F has an upper bound, and that both the system and the C-F are affected by stochastic disturbances. Although the ADP method was used to solve the ETOC problem of continuous systems in [32,34,43], the C-F under event-triggered control was not analyzed there. In addition, there have been no works studying the influence of ETOC on the C-F of stochastic optimal control problems.
Remark 4 In our work, the proposed ETOC approach only needs to solve the HJB equation (7) directly and does not need the event-triggered HJB equation, which makes it more practical. In contrast, the event-triggered HJB equation was needed in [37,40], where the value function $V^*(x)$ was required to satisfy both the event-triggered HJB equation and the original HJB equation.
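As a minimal illustration of the event-triggered implementation discussed in this section, the following Python sketch simulates a scalar Itô system with the Euler-Maruyama scheme and a zero-order-hold controller that is recomputed only at triggering instants. Everything specific here is an assumption made for illustration: the scalar dynamics $dx = (ax + bu)\,dt + \sigma x\,dw$, the cost $qx^2 + ru^2$, the closed-form value function $V^*(x) = px^2$ (which presumes the usual HJB form with $u^* = -\tfrac12 R^{-1} G_1^T \nabla_x V^*$), and in particular the relative-error threshold `theta`, which is a stand-in for the triggering scheme (11) and not a reproduction of it.

```python
import math
import random


def solve_scalar_hjb(a, b, sigma, q, r):
    """Positive root p of (b^2/r) p^2 - (2a + sigma^2) p - q = 0, so V*(x) = p x^2."""
    s = b * b / r
    k = 2.0 * a + sigma * sigma
    return (k + math.sqrt(k * k + 4.0 * s * q)) / (2.0 * s)


def simulate(a=1.0, b=1.0, sigma=0.2, q=1.0, r=1.0, theta=0.1,
             x0=1.0, dt=1e-3, t_end=5.0, seed=0):
    """Euler-Maruyama simulation with a held, event-triggered control input."""
    rng = random.Random(seed)
    p = solve_scalar_hjb(a, b, sigma, q, r)
    gain = b * p / r                 # u*(x) = -(1/(2r)) * b * dV*/dx = -(b p / r) x
    n_steps = int(round(t_end / dt))
    x, x_hat = x0, x0                # true state and last sampled state
    u = -gain * x_hat
    updates = 1                      # the initial sample counts as one update
    for _ in range(n_steps):
        # Hypothetical trigger (NOT scheme (11)): resample when the gap
        # between sampled and true state exceeds a relative threshold.
        if abs(x_hat - x) > theta * abs(x):
            x_hat = x
            u = -gain * x_hat
            updates += 1
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over one step
        x += (a * x + b * u) * dt + sigma * x * dw
    return p, x, updates, n_steps


if __name__ == "__main__":
    p, x_final, updates, n_steps = simulate()
    print(f"p = {p:.4f}, final state = {x_final:.3e}, "
          f"updates = {updates} of {n_steps} steps")
```

In this toy setting the control input is recomputed on only a fraction of the integration steps, while the state is still driven toward the origin. Raising `theta` trades control updates against closeness to the continuously updated controller, which is the same kind of trade-off that the parameter $c$ governs in the proposed scheme.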

Critic neural network design
$V^*(x)$ is estimated by using a CNN. The NN-based representation of the value function is
$$V^*(x) = \omega^{*T} M_L(x) + \delta(x), \tag{15}$$
where $\omega^* \triangleq [\omega_1^*, \cdots, \omega_L^*]^T$ represents the ideal CNN weight vector, $M_L(x) \triangleq [m_1(x), \cdots, m_L(x)]^T$ is the vector activation function with $m_k(x) \in C^2(X)$ and $m_k(0) = 0$, and $\delta(x)$ denotes the CNN estimation error.
Since the ideal weight $\omega^*$ is usually unavailable and difficult to obtain, the CNN is applied to approximate $V^*(x)$ as
$$\hat{V}(x) = \hat{\omega}^T M_L(x). \tag{16}$$
Substituting (16) into (9), the ETOC based on ADP is expressed as (17), where $\nabla_x M_L$ denotes the gradient of $M_L(x)$ with respect to $x$. The corresponding event-triggering scheme is (18), where $c > 0$ is the predetermined constant that determines an upper bound of the C-F with the ETOC (17), and $\hat{V}(x)$ is the approximation of $V^*(x)$. When the event-triggering scheme (18) is violated, the controller is updated according to the current state $x(t)$.
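The critic structure above can be sketched in a few lines of Python. The code below is an illustration under assumptions rather than the paper's implementation: the two-state quadratic basis `basis`, the helper names, and the control formula $\mu(\hat{x}) = -\tfrac12 R^{-1} G_1^T(\hat{x}) \nabla_x M_L^T(\hat{x}) \hat{\omega}$ (which presumes the usual form of (6) with a scalar input) are all hypothetical choices. The basis satisfies $m_k(0) = 0$, as required of the activation functions.

```python
def basis(x):
    """Hypothetical quadratic activation vector M_L(x) for two states (L = 3)."""
    x1, x2 = x
    return [x1 * x1, x1 * x2, x2 * x2]


def basis_jacobian(x):
    """Row k holds the gradient of m_k with respect to (x1, x2)."""
    x1, x2 = x
    return [[2.0 * x1, 0.0], [x2, x1], [0.0, 2.0 * x2]]


def value_estimate(w_hat, x):
    """Critic output V_hat(x) = w_hat^T M_L(x)."""
    return sum(w * m for w, m in zip(w_hat, basis(x)))


def value_gradient(w_hat, x):
    """Gradient of V_hat with respect to x, i.e. (grad M_L)^T w_hat."""
    jac = basis_jacobian(x)
    n = len(jac[0])
    return [sum(w_hat[k] * jac[k][i] for k in range(len(jac))) for i in range(n)]


def etoc_input(w_hat, x_hat, g1, r):
    """Scalar input -(1/(2r)) * g1(x_hat)^T grad V_hat(x_hat), held between events."""
    grad = value_gradient(w_hat, x_hat)
    g = g1(x_hat)
    return -0.5 / r * sum(gi * di for gi, di in zip(g, grad))


if __name__ == "__main__":
    w_hat = [0.5, 0.1, 1.0]          # illustrative weights, not from the paper
    x_hat = (1.0, -2.0)              # last sampled state
    print(value_estimate(w_hat, x_hat))
    print(etoc_input(w_hat, x_hat, lambda s: [0.0, 1.0], 1.0))
```

Because the input depends only on the sampled state and the current weights, it has to be recomputed only when the triggering condition is violated.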
The ADP approach is developed to learn the CNN weight vector $\hat{\omega}$. Motivated by [34,41,44], the following assumption is presented.

Assumption 1 Let $P(x) \in C^2(X)$ with $P(0) = 0$ be a Lyapunov function candidate.
According to (16) and (17), the approximate Hamiltonian $\hat{H}(x, \mu(\hat{x}), \nabla_x \hat{V}(x))$ can be written in a form involving the Kronecker product $\otimes$. Because of the estimation error of the CNN, replacing $V^*(x)$ and $u^*(x)$ in (5) with $\hat{V}(x)$ and $\mu(\hat{x})$ causes a residual error, that is, the approximate Hamiltonian is no longer zero. The residual error $\eta$ between $\hat{H}(x, \mu(\hat{x}), \nabla_x \hat{V}(x))$ and $H(x, u^*(x), \nabla_x V^*(x))$ is used to define the squared residual error $E(x, \hat{\omega})$, which is to be minimized. Accordingly, the gradient-descent-like rule (20) is designed for $\hat{\omega}$.
Remark 5 Two explanations for (20) are given as follows.
(1) The purpose of the first term in (20) is to minimize the target function $E(x, \hat{\omega})$ via the gradient descent method. Here, $1/(1 + \phi^T(x)\phi(x))$ is a normalization term, and $\gamma > 0$ is a constant gain.
(2) The last term in (20) is added to ensure the stability of system (1). It is derived as follows. Let $\Psi$ represent the derivative of $P(x)$ along the system trajectory. Applying the gradient descent approach to $\Psi$ yields this term, which helps to enforce $\Psi < 0$ and thus to ensure the stability of system (1).
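The role of the normalized gradient term can be illustrated with a small Python sketch. It is a simplified stand-in and not the tuning rule (20): it evaluates a fixed linear policy $u = -kx$ on the hypothetical scalar system $dx = (ax + bu)\,dt + \sigma x\,dw$ with the single basis function $m(x) = x^2$, uses a discrete-time update with gain `gamma`, and omits the stabilizing last term. Under these assumptions the Hamiltonian residual is linear in the weight, $\eta = \phi(x)\hat{\omega} + (q + rk^2)x^2$, so the exact weight is known in closed form and convergence can be checked.

```python
def regressor(x, a, b, k, sigma):
    """phi(x) = m'(x) (a - b k) x + 0.5 sigma^2 x^2 m''(x) for m(x) = x^2."""
    return (2.0 * (a - b * k) + sigma * sigma) * x * x


def running_cost(x, q, r, k):
    """Stage cost q x^2 + r u^2 under the fixed policy u = -k x."""
    return (q + r * k * k) * x * x


def exact_weight(a=1.0, b=1.0, k=3.0, sigma=0.5, q=1.0, r=1.0):
    """Weight that makes the Hamiltonian residual vanish for every x."""
    return -(q + r * k * k) / (2.0 * (a - b * k) + sigma * sigma)


def train_critic(a=1.0, b=1.0, k=3.0, sigma=0.5, q=1.0, r=1.0,
                 gamma=0.5, sweeps=200, samples=(0.5, 1.0, 1.5, 2.0)):
    """Normalized gradient descent on E = 0.5 * eta^2, starting from zero weight."""
    w_hat = 0.0
    for _ in range(sweeps):
        for x in samples:
            phi = regressor(x, a, b, k, sigma)
            eta = phi * w_hat + running_cost(x, q, r, k)     # Hamiltonian residual
            w_hat -= gamma * phi * eta / (1.0 + phi * phi)   # normalized step
    return w_hat


if __name__ == "__main__":
    print(train_critic(), exact_weight())
```

The normalization keeps the effective step size bounded however large the regressor becomes, so each update contracts the weight error by a factor strictly between 0 and 1 in this example.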
Remark 6 The ETOC based on ADP is implemented as follows. The states $x$ and $\hat{x}$ are used to evaluate the event-triggering scheme (18), and $x$ is used in (20) to calculate the CNN weight $\hat{\omega}$. The control input (17) is recomputed from $\hat{x}$ and $\hat{\omega}$ when the event-triggering scheme (18) is violated.

Theoretical analysis
The system stability and cost function bound are analyzed theoretically in this section. First, the following definition and assumptions are introduced.
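Definition 1 [45] For system (10), the notion of a trajectory $x(t)$ that is semi-globally uniformly ultimately bounded (SGUUB) in the $p$-th moment on a compact set $X \subset \mathbb{R}^n$ is used.
Define the CNN weight error as $\tilde{\omega}(t) \triangleq \hat{\omega}(t) - \omega^*$. Since $\omega^*$ is constant, the error dynamics $\dot{\tilde{\omega}}(t)$ follow from (20).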

Assumption 2 The control input $u^*(x)$ is Lipschitz continuous, that is, for every $x_1, x_2 \in X$, there exists $K > 0$ satisfying $\|u^*(x_1) - u^*(x_2)\| \le K \|x_1 - x_2\|$.

Assumption 4 Assume that
(1) $F(x)$ is Lipschitz continuous, that is, for any $x_1, x_2 \in X$, there exists $l_f > 0$ satisfying $\|F(x_1) - F(x_2)\| \le l_f \|x_1 - x_2\|$.
$F(x)$, $G_1(x)$, and $G_2(x)$ are bounded on the compact set $X$, and there exist constants $\delta_M$, $\delta_{dM}$, and $\delta_{DM} > 0$ that bound the CNN estimation error terms. The stability condition involves quantities $U$, $V$, and $Y$.
Proof Select the Lyapunov function candidate (24), where $P(x(t)) \in C^2(X)$ is given in Assumption 1, $Y(x(t)) \in C^2(X)$, and $Y(x(t)) \ge 0$ is a positive function. Since (24) includes both discrete dynamics and continuous dynamics, the stability is analyzed in the following two cases.
Case 1 Events are not triggered, that is, $t \in [t_k, t_{k+1})$. For system (10), taking the derivative of (24) and using (22), we get
$$\mathcal{L}Y(x(t)) = \mathcal{L}V^*(\hat{x}_k) + \mathcal{L}V^*(x(t)) + \tilde{\omega}^T \dot{\tilde{\omega}} + \mathcal{L}P(x(t)).$$
By applying $E[\mathcal{L}Y(x(t))] \le 0$ and using the Dynkin formula, one obtains that the true state $x(t)$, the sampled state $\hat{x}(t)$, and the CNN weight error $\tilde{\omega}(t)$ are SGUUB in probability.

Case 2 Consider the event-triggered moment $t = t_k$. Taking the difference of the Lyapunov function $Y(\hat{x}(t_k))$ defined in (24) at the event-triggered instant and using the stability of the flow dynamics, the difference $\Delta V^*(\hat{x}(t_k))$ is bounded through class-$\mathcal{K}$ functions $K(\cdot)$ [46] under the condition associated with $S_D$. The strictly increasing property of class-$\mathcal{K}$ functions ensures the decrease of $\Delta V^*(\hat{x}(t_k))$. Therefore, $\Delta Y$ also decreases under the same condition on $S_D$. Then, the true state $x(t)$, the sampled state $\hat{x}(t)$, and the CNN weight error $\tilde{\omega}(t)$ are SGUUB in probability.
Theorem 3 For system (1) with control input (17), let Assumptions 2-4 hold. Then, an upper bound is given on the cost function $\bar{J}(x_0, \mu)$.
Proof For system (1) with control (17), differentiating $V^*(x)$ in (15) along the system trajectory and applying the Dynkin formula with the condition $\delta \le \delta_M$, a corresponding bound holds for any $T > 0$ and $V(x_T) > 0$. Letting $T \to \infty$, we conclude that there exists an upper bound $(1 + c)[J^*(x_0) + 2\delta_M]$ for the real performance index $\bar{J}(x_0, \mu)$. Furthermore, the upper bound of the prespecified C-F will be close to the ideal value when the NN estimation error $\delta(x) \to 0$.

Remark 7 For deterministic continuous-time systems, many results on ETOC based on the ADP method have been reported [33,41]. However, there has been no published paper on this topic for nonlinear Itô-type stochastic systems. Our work is the first attempt to fill this gap.

Remark 8 It is worth emphasizing that an upper bound of the prespecified C-F can be obtained through the parameter $c$. Besides, according to Theorem 3, the upper bound of the C-F can also be analyzed when the NN estimation error is taken into account. In contrast, in [32,33], the prespecified C-F for the ETOC contained an integral term, and the boundedness of the integral term was not analyzed.

Remark 9 In our work, the event-triggering scheme (11) is different from those in [38,39], which ignored the effect of stochastic disturbances. In [38,39], the main purpose is to obtain the numerical solution of the event-triggered HJB equation by applying the ADP method. However, the event-triggered HJB equation is itself an approximation of the given HJB equation. It is noteworthy that, with the event-triggering scheme (11), the ADP method can be applied directly to the original HJB equation. Thus, the error of the numerical solution can be reduced.

Simulation results
Example 1 In this example, a controlled Van der Pol oscillator of the form (1) is considered as system (35), and the corresponding cost function is given in (2). Figure 1 shows the responses of the states $x$ and $\hat{x}$, from which system (35) is seen to be asymptotically stable in probability. It also shows that the state $x$ behaves similarly under the TTOC and the ETOC, while the ETOC reduces the computing load. Figure 2 shows the ETOC $\mu(t)$ and the TTOC $u(x)$; it reflects that the control updating frequency of the ETOC is greatly reduced. The convergence of the CNN weight $\hat{\omega}$ is shown in Fig. 3. In Fig. 3, the initial weights of the CNN are all set to zero, which means that the implementation of the control strategy does not require an initial stabilizing control. Figure 4 shows the intersample times $h_k$, and the minimum $h_k$ is 0.02 s. Moreover, the simulation results for $t \in [0, 30]$ indicate that only 36 state samples are used to execute the ETOC approach, which is about 0.036% of all state samples. Thus, communication resources are saved and the computational complexity is greatly decreased.
Example 2 Consider the power system with stochastic disturbance (36), where $x = [\Delta P_f, \Delta G_p, \Delta V_g]^T$, $\Delta P_f$ represents the incremental frequency deviation, $\Delta G_p$ denotes the incremental change in generator output, and $\Delta V_g$ denotes the incremental change in governor valve position. Let $x_1$, $x_2$, and $x_3$ denote $\Delta P_f$, $\Delta G_p$, and $\Delta V_g$, respectively. $P_t = 0.5$, $R_g = 0.1$, $T_t = 0.6$, $G_t = 0.5$, and $P_k = 0.6$ represent the plant model time constant, the feedback regulation constant, the turbine time constant, the governor time constant, and the plant gain, respectively.
Remark 10 Most works on power systems are based on deterministic models [38,47]. However, in practical applications, stochastic perturbations affect all aspects of power systems, such as the fluctuation of system frequency, node voltage, and generator speed. Stochastic perturbations can destroy the stability and reduce the control performance of the systems. Therefore, the ETOC problem of the power system with stochastic perturbations is studied in this paper.
Simulation figures are shown in Figs. 5, 6, 7, and 8. Figure 5 shows the responses of the states $x$ and $\hat{x}$, from which one can find that the ETOC guarantees that system (36) is asymptotically stable in probability. Figure 6 shows the trajectory of the ETOC $\mu(t)$. The approximate value function is obtained as $\hat{V}(x) = M_L^T(x)\hat{\omega} = 0.334x_1^2 - 0.3568x_1x_2 + 0.4211x_2^2 - 0.3813x_1x_3 + 0.4064x_2x_3 + 0.407x_3^2$. As illustrated in Fig. 8, the minimum inter-execution time is 0.06 s. Moreover, the simulation results for $t \in [0, 10]$ indicate that only 11 state samples are used to execute the ETOC algorithm, which is about 0.03% of all state samples. In that sense, the ETOC used in this paper maintains the control performance while effectively reducing the number of sampling and control tasks.
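As a quick consistency check on the reported approximate value function, the following Python sketch evaluates $\hat{V}(x)$ with the coefficients stated above and tests whether the associated quadratic form is positive definite by Sylvester's criterion (all leading principal minors positive). The check itself and the helper names are additions for illustration and are not part of the original simulation study.

```python
# Coefficients of the approximate value function reported for Example 2.
COEFFS = {"x1x1": 0.334, "x1x2": -0.3568, "x2x2": 0.4211,
          "x1x3": -0.3813, "x2x3": 0.4064, "x3x3": 0.407}


def v_hat(x):
    """Evaluate the reported quadratic approximation at x = (x1, x2, x3)."""
    x1, x2, x3 = x
    c = COEFFS
    return (c["x1x1"] * x1 * x1 + c["x1x2"] * x1 * x2 + c["x2x2"] * x2 * x2
            + c["x1x3"] * x1 * x3 + c["x2x3"] * x2 * x3 + c["x3x3"] * x3 * x3)


def quadratic_form_matrix():
    """Symmetric P with V_hat(x) = x^T P x (each cross term split evenly)."""
    c = COEFFS
    return [[c["x1x1"], c["x1x2"] / 2, c["x1x3"] / 2],
            [c["x1x2"] / 2, c["x2x2"], c["x2x3"] / 2],
            [c["x1x3"] / 2, c["x2x3"] / 2, c["x3x3"]]]


def leading_minors(p):
    """Leading principal minors of a 3x3 matrix (Sylvester's criterion)."""
    d1 = p[0][0]
    d2 = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    d3 = (p[0][0] * (p[1][1] * p[2][2] - p[1][2] * p[2][1])
          - p[0][1] * (p[1][0] * p[2][2] - p[1][2] * p[2][0])
          + p[0][2] * (p[1][0] * p[2][1] - p[1][1] * p[2][0]))
    return d1, d2, d3


if __name__ == "__main__":
    minors = leading_minors(quadratic_form_matrix())
    print("leading minors:", minors)
    print("positive definite:", all(m > 0 for m in minors))
```

All three leading minors are positive for these coefficients, so the reported $\hat{V}(x)$ is a positive definite function of the state with $\hat{V}(0) = 0$, which is consistent with its role as an approximation of the value function.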

Conclusion
In this paper, we study the ETOC problem for nonlinear Itô-type stochastic systems based on the ADP method. A new event-triggering scheme is proposed, which can be used to design the ETOC directly via the solution of the HJB equation. Furthermore, the stability of the nonlinear Itô-type stochastic system is analyzed, and the given ETOC scheme provides an upper bound for the predetermined C-F. In particular, the ADP approach is applied for the first time to nonlinear Itô-type stochastic systems with ETOC, which yields the numerical solution of the HJB equation.
There are some meaningful topics that can be investigated in the future. Since practical systems are inevitably affected by time delay, stochastic perturbations, and unknown dynamics, it is of great significance to study the ETOC for stochastic systems with time delay and unknown dynamics. In addition, the ETOC for nonlinear stochastic multi-agent systems is a very promising topic.