Adaptive dynamic programming-based optimal control for nonlinear state constrained systems with input delay

This paper investigates the problem of adaptive optimal tracking control for full-state constrained strict-feedback nonlinear systems with input delay. To facilitate the study, a novel control approach is developed by combining the backstepping design technique with adaptive dynamic programming (ADP) theory. First, an intermediate variable is introduced to approximate the input delay using Pade approximation. Then, barrier Lyapunov functions are incorporated into the backstepping procedure to handle the state constraints. Moreover, neural networks are employed to approximate the unknown functions in the presence of uncertainties. On this basis, an adaptive backstepping feedforward controller is developed, which converts the tracking task into an equivalent regulation problem for an affine-form nonlinear system. To obtain the optimal control of this affine-form system, a critic network is constructed within the ADP framework to approximate the solution of the Hamilton–Jacobi–Bellman equation, and online learning is utilized to obtain the optimal feedback control. The resulting controller consists of feedforward and feedback parts. Meanwhile, all signals in the closed-loop system are guaranteed to be uniformly ultimately bounded. Finally, the effectiveness of the proposed control scheme is illustrated through a numerical example.


Introduction
In the past decades, the design of tracking controllers for uncertain nonlinear systems has attracted growing attention in the field of control. Many approaches have been proposed to approximate the uncertain functions in nonlinear systems, such as fuzzy logic systems [1], support vector regression [2], and neural networks [3]; among these, neural network approximation is one of the most popular methods. For instance, an adaptive backstepping control method was developed in [4] by integrating the backstepping design technique with neural networks, which not only addressed the tracking control of nonlinear systems with uncertainty but also ensured the stability of the closed-loop system. Subsequently, various methods based on this idea were proposed for both single-input single-output nonlinear systems [5][6][7] and multi-input multi-output systems [8][9][10][11]. However, the optimality of the controller is not considered in the aforementioned approaches, even though it has a significant impact on industrial systems. Therefore, it is of great significance to further investigate optimal control design for nonlinear systems in practical applications.
The key issue in the optimal control of nonlinear systems is the solution of the Hamilton–Jacobi–Bellman (HJB) equation, which is generally difficult to solve analytically since it is a first-order nonlinear partial differential equation. To overcome this issue, Werbos [12] proposed adaptive dynamic programming (ADP) theory, which allows for online approximation of the HJB solution and has attracted a great deal of attention in the field of control. ADP-based methods [13,14] typically employ neural networks to construct an actor-critic architecture, where the critic and actor networks are used to approximate the performance index function and the control policy, respectively. In [15], the adaptive optimal control of nonlinear continuous-time systems in strict-feedback form was solved by combining the backstepping design technique with the actor-critic architecture. Subsequently, the results were extended to output feedback optimal control design in [16]. In practical engineering applications, safety must be considered first, followed by optimality. However, current ADP-based methods seldom consider the safety issue of state constraints. Furthermore, the exploration signals required by online learning algorithms may cause system instability if unsafe exploration signals are employed. Therefore, it is critical to develop ADP-based controllers that guarantee both the state constraints and optimal performance. For continuous-time systems, Marvi et al. [17] proposed a safe off-policy reinforcement learning method that incorporates a control barrier function into the performance index function to balance safety and control performance. For discrete-time systems, the same control barrier function was introduced into the utility function to address state constraints in [18]. Additionally, an adaptive optimal controller was designed based on a barrier Lyapunov function (BLF) in [19]. This approach transforms the state constraint problem into a stability control problem, allowing appropriate parameters to be selected to ensure the stability of the closed-loop system and prevent violation of the state constraints.
In addition to the state constraint problem, time delays exist in most practical systems and may degrade control performance or even destabilize the closed-loop system [20,21]. Hence, extensive studies have been conducted in various areas to address control problems with time delays [22][23][24][25][26][27]. In [22], the H∞ optimal control problem of linear time-delay systems was studied; the authors proposed a value iteration-based method to transform the linear H∞ problem into a nonlinear differential equation. In [23], an optimal control strategy was proposed with an explicit and easily computable upper bound on the time delay by employing the model transformation method. The above methods solve the problem of state delay but are not applicable to input delay. One common approach for addressing input delay is to use the Halanay inequality. In [24], the time-varying dynamic event-triggered leader-follower control problem with input delay was transformed into the stabilization of a delay differential system. Another approach is to transform the input-delay system into a new system without input delay using Pade approximation. In [25], a fuzzy sliding mode controller combined with Pade approximation was proposed to handle network-induced input delay and unknown nonlinear dynamics. Following this idea, an adaptive neural network control scheme incorporating the Pade approximation method was proposed in [26] to eliminate the effects of time-varying input delay and unknown system dynamics. Meanwhile, Sui et al. [27] transformed a triangular-structure system into an affine nonlinear system using the backstepping design technique with dynamic surface control.
In this article, we investigate an ADP-based adaptive optimal control strategy for state constrained nonlinear systems with uncertainty and input delay. The main contributions of this paper can be summarized as follows:

1. To the best of our knowledge, this is the first study to investigate online ADP-based adaptive optimal control for nonlinear systems under the combined influence of uncertainty, exploration signals, and input delay. Different from the control methods in [17,18], the proposed method ensures the stability of the system when it is affected by the exploration signals during the learning process.

2. The method takes full advantage of the global approximation capability of neural networks and avoids computing the virtual controller derivatives ẋid required in [15,16,19]. Therefore, the proposed method is easier to implement in practical applications.

3. In contrast to the control strategies in [7,26], which do not consider an optimal control objective, the proposed method guarantees the boundedness of all signals in the closed-loop system and optimizes the tracking performance.
The remainder of this paper is organized as follows. The problem statement and preliminaries are given in Sect. 2. The adaptive feedforward controller and optimal feedback controller are designed in Sect. 3. The stability analysis of the closed-loop system is provided in Sect. 4. In Sect. 5, a simulation is presented to illustrate the effectiveness of the developed method. Finally, Sect. 6 concludes this paper.

Problem statement and preliminaries
Consider the following uncertain system in strict-feedback form with input delay:

ẋ i = f i (x̄ i ) + g i (x̄ i )x i+1 , i = 1, . . ., n − 1,
ẋ n = f n (x̄ n ) + g n (x̄ n )u(t − τ ),
y = x 1 , (1)

where y ∈ R, x i ∈ R and u ∈ R are the system output, state variables and control input, respectively; x̄ i = [x 1 , . . ., x i ] T ; g i (•) is a known smooth function; f i (•) stands for an unknown smooth function; and τ denotes the input delay, which is a positive constant. All the states are required to remain in the set

Ω x = {x i : |x i | < k ci , i = 1, . . ., n},

where k ci are positive constants.
Remark 1 Many real-world systems can be expressed as, or transformed into, the nonlinear system (1), such as missile-target guidance systems [19] and robotic manipulators [7]. In various control systems, constraints and time delays are widespread and may degrade system performance. Some works have considered adaptive controller designs that address the control of uncertain nonlinear systems with both full-state constraints and input delay [25,26,28]. However, these adaptive control techniques ignore the importance of optimality in real-world systems. This paper attempts to design an adaptive optimal controller that copes with full-state constraints and input delay in a unified framework, which is more significant and general for realistic environments.
The purpose of this study is to develop an adaptive controller based on the backstepping and ADP methods such that, even when the controller suffers from input delay, the output y tracks the target trajectory y d (t) and the state constraints are never violated. In addition, the tracking performance is optimized to the extent possible.
Without loss of generality, we make the following assumptions, which hold generally in practical systems.
Assumption 1 There exist positive constants A 0 , A 1 , . . ., A n such that the target trajectory y d (t) and its ith-order derivatives satisfy |y d (t)| ≤ A 0 and |y d (i) (t)| ≤ A i , i = 1, . . ., n.

Assumption 2 There exist positive constants g i and ḡ i such that g i ≤ |g i (x̄ i )| ≤ ḡ i , i = 1, . . ., n.

These assumptions are commonly made in the existing literature [16,19,28]. Assumption 2 implies that the control gain functions g i (x̄ i ) are bounded, which is reasonable since the control gain cannot reach infinity in practical systems.
In this paper, the continuous functions f i (•) are unknown and cannot be directly used to achieve the control objective. Owing to their excellent function approximation capabilities, neural networks are frequently used to approximate nonlinear functions [11]. Thus, an unknown continuous function f (x) can be approximated by a neural network as

f (x) = W T σ (x) + δ(x),

where W = [w 1 , w 2 , . . ., w l ] T represents the ideal weight vector, l is the number of neural network nodes, σ (•) is the activation function, and δ(•) denotes the approximation error, which is bounded by a constant δ̄, i.e. |δ(•)| ≤ δ̄.
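As a concrete illustration of the structure W T σ (x) + δ(x), the following sketch fits the weights of a small Gaussian radial-basis network to a stand-in smooth function by batch least squares. This is not from the paper: the target function, the centers, and the width are all illustrative choices, and the paper instead estimates the weights online through the adaptation laws given later.

```python
import numpy as np

def rbf_features(x, centers, width=0.5):
    # Gaussian activations sigma(x), one column per neuron (center)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

# Stand-in for an unknown smooth function f_i(.)
f = lambda x: x * np.sin(2 * x)

centers = np.linspace(-2, 2, 15)      # l = 15 neurons
x_train = np.linspace(-2, 2, 200)
Phi = rbf_features(x_train, centers)

# "Ideal" weights W obtained here offline by least squares, for illustration
W, *_ = np.linalg.lstsq(Phi, f(x_train), rcond=None)

# Residual plays the role of the approximation error delta(x)
delta = np.max(np.abs(Phi @ W - f(x_train)))
print(f"max |delta(x)| on the training grid: {delta:.2e}")
```

With enough well-placed neurons the residual bound δ̄ becomes small, which is the property the adaptive laws rely on.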

Controller design
In this section, the adaptive optimal tracking controller is designed to address the state constraints and input delay. First, the system (1) is transformed into a new system without input delay using Pade approximation. Then, the backstepping design approach and a neural network adaptive control strategy are used to construct the feedforward controller, which transforms the state constrained strict-feedback nonlinear system into an affine nonlinear system without state constraints. Finally, an optimal feedback controller is developed based on ADP theory to ensure the stability of the affine nonlinear system.

Pade approximation
To begin with, the Pade approximation technique is presented to address the problem of input delay. The input delay u(t − τ ) can be described in the frequency domain by the Laplace transform

L{u(t − τ )} = e −τ s L{u(t)}, (2)

where L{u(t)} represents the Laplace transform of u(t) and s is the Laplace variable. The term e −τ s can be approximated by a rational function, and the inverse Laplace transform of this rational function then allows the input delay u(t − τ ) to be replaced with some new variables. Pade approximation provides a better approximation than a Taylor series of the same order [29]. Therefore, we use the [1/1] Pade approximation of the term e −τ s :

e −τ s ≈ (1 − τ s/2)/(1 + τ s/2). (3)

Remark 3 In this paper, we choose the [1/1] order Pade approximation to solve the small input delay problem. When the time delay τ is small, the approximation error of (3) is nearly zero. Therefore, (3) can be used to replace the e −τ s term.
In order to transform back to the time domain, an intermediate variable x n+1 (t) = u(t − τ ) + u(t) is introduced, which satisfies the Laplace-domain relation

X n+1 (s) = (e −τ s + 1)L{u(t)} ≈ (2/(1 + τ s/2))L{u(t)},

which can be rewritten as

(s + λ)X n+1 (s) = 2λL{u(t)},

where λ = 2/τ . According to the inverse Laplace transform, one has

ẋ n+1 = −λx n+1 + 2λu.

Thus, the input delay can be approximated by introducing the new variable x n+1 , i.e. u(t − τ ) ≈ x n+1 − u. Then, the system (1) can be approximated as follows:

ẋ i = f i (x̄ i ) + g i (x̄ i )x i+1 , i = 1, . . ., n − 1,
ẋ n = f n (x̄ n ) + g n (x̄ n )(x n+1 − u),
ẋ n+1 = −λx n+1 + 2λu,
y = x 1 . (4)
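To make the quality of the [1/1] Pade substitution concrete, the sketch below (illustrative, not the authors' code; the input signal, delay value, and step size are arbitrary choices) integrates ẋ n+1 = −λx n+1 + 2λu with forward Euler and compares the surrogate x n+1 − u against the true delayed input u(t − τ):

```python
import numpy as np

tau = 0.05            # small input delay
dt = 1e-4             # integration step
T = 2.0
lam = 2.0 / tau       # lambda = 2/tau from the [1/1] Pade approximation

u = lambda t: np.sin(3 * t)   # arbitrary smooth input signal

x_aux = 0.0           # intermediate variable x_{n+1}, initialized at 0
max_err = 0.0
for k in range(int(T / dt)):
    t = k * dt
    # dx_{n+1}/dt = -lam * x_{n+1} + 2 * lam * u(t)
    x_aux += dt * (-lam * x_aux + 2 * lam * u(t))
    if t > 5 * tau:   # skip the initial transient of the filter
        err = abs((x_aux - u(t)) - u(t - tau))
        max_err = max(max_err, err)

print(f"max |(x_aux - u) - u(t - tau)| = {max_err:.2e}")
```

The transfer function from u to x_aux − u is exactly (1 − τ s/2)/(1 + τ s/2), so for signals whose bandwidth is small relative to 1/τ the mismatch stays tiny, consistent with Remark 3.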

Feedforward controller design
In this subsection, the system (4) in strict-feedback form is transformed into an affine-form nonlinear system using the backstepping design technique within an adaptive neural control framework.
Based on the system (4), the state error equations are defined as

e i = x i − x id , i = 1, . . ., n, (5)

where the virtual control input is given as x id = x α id + x * id ; x α id stands for the feedforward virtual control input and x * id denotes the feedback optimal virtual control input. The actual control input u = u α + u * appears in the last step, where u α and u * represent the feedforward and feedback optimal actual control inputs, respectively.
The backstepping approach stabilizes the tracking error e 1 in n steps, which are presented next.
Step 1: Combining e 1 = x 1 − y d in (5) and the system (4), we obtain the error dynamics (6), where x 1d = y d . The unknown function f 1 (•) and ẏd in (6) can be replaced by the lumped function F 1 (z 1 ) = f 1 (x 1 ) − ẏd , which is a function of x 1 and ẏd ; the neural network input is therefore selected as z 1 = [x 1 , ẏd ] T . Applying a neural network to approximate the uncertain function F 1 (z 1 ) yields (8), where Ŵ1 is the estimate of the optimal weight vector W 1 and W̃1 = W 1 − Ŵ1 is the neural network weight error. Then, according to (7) and (8), (6) can be approximated by (9). Based on (8) and (9), (6) becomes (10). To construct the ideal feedforward virtual control input x α 2d , consider the BLF candidate (11). By using Assumption 2 and Young's inequality, one has (13). Substituting (13) into (12) yields (14). Then, the feedforward virtual control x α 2d is chosen as (15), where k 1 > 0 is a design parameter. The adaptation law is given by (16), where β i is a positive design constant. Substituting (15) and (16) into (14), we get (17).

Step i (2 ≤ i ≤ n − 1): Following the same procedure, the error dynamics of e i are given by (18), where x̄id = [x 1d , x 2d , . . ., x id ] T . Define the unknown lumped function F i (z id ) as in (19). By employing a neural network to approximate the uncertain function F i (z id ), one has (20), where Ŵi is the estimate of the optimal weight vector W i and W̃i = W i − Ŵi is the neural network weight error. According to (19) and (20), (18) can be approximated by (21); thus, (18) can be rewritten as (22).

Remark 4 Note that neural networks are used to approximate the unknown functions in the proposed approach. From (15) and (18), it can be seen that the error dynamics involve ẋid , so the input vector of the neural networks is selected to include this information. The first element of the input vector is used to recover the input of f i (•), while the remaining elements are used to recover the information of ẋid . Based on the designed neural networks in (20), ẋid in (18) can be eliminated, resulting in (21) without the unknown f i (•) and ẋid .
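The role of the log-type BLF used in the steps above can be seen numerically. The snippet below (an illustrative sketch we add, not from the paper; the constraint bound kb is arbitrary) evaluates V = ½ ln(k b ²/(k b ² − e²)) and shows that it behaves like the ordinary quadratic ½e² near the origin but grows without bound as |e| approaches k b , which is exactly why keeping V bounded keeps the error inside the constrained set:

```python
import math

def blf(e, kb):
    # Log-type barrier Lyapunov function: finite only while |e| < kb
    assert abs(e) < kb, "error left the constrained set"
    return 0.5 * math.log(kb ** 2 / (kb ** 2 - e ** 2))

kb = 1.0
for e in (0.0, 0.5, 0.9, 0.99, 0.999):
    print(f"e = {e:5.3f}  V = {blf(e, kb):8.4f}")

# Near the origin the BLF is indistinguishable from the quadratic e^2 / 2:
e = 1e-3
print(abs(blf(e, kb) - 0.5 * e ** 2) < 1e-9)
```

Because V blows up at the boundary, any controller that guarantees V stays bounded automatically guarantees |e(t)| < k b for all time, provided |e(0)| < k b.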
Consider the BLF candidate (23), where k bi is a positive constant to be designed. Differentiating (23) and using Assumption 2 together with Young's inequality, we obtain (24). Substituting (24) into (23), we have (25), where δ̄ * i = [δ̄ 1 , . . ., δ̄ i ] T .
The feedforward virtual controller x α (i+1)d is given in (26), and the neural network adaptation law is shown in (27). Then, combining (25), (26) and (27) yields (28).

Step n: According to (5), the time derivative of e n is given by (29), where Ŵn is the estimate of the optimal weight vector W n and W̃n = W n − Ŵn is the neural network weight error. Based on a procedure similar to the above, one has (30), where d(t) is a small exploratory signal added to the optimal feedback control input; it is assumed that d̄ is the upper bound of this external exploratory signal, i.e. |d(t)| ≤ d̄. The exploratory signal d(t) is employed to ensure the persistence of excitation condition, which will be explained later.
Choose the overall BLF candidate as (31). Differentiating V n and taking (28) into account generates (32). By applying Assumption 2 and Young's inequality, we obtain (33), where D = ḡ n d̄.
Substituting (28), (32) and (33) into (31) gives (34). Design the feedforward control input u α as (35), with the neural network adaptation law (36). Substituting (35) and (36) into (34) yields (37). Using Young's inequality, one has (38); thus, (37) can be rewritten as (39).

Remark 5 In the previous works [15,16,19], the feedforward controller x α id is calculated from the derivatives of the virtual controllers ẋ(i−1)d . According to (5), we know that x id = x α id + x * id . Since x α id and x * id are calculated from neural network weights and system states, it is difficult to obtain analytical expressions for ẋid in practical applications. In the proposed approach, based on (7), (19) and (29), the derivatives of the virtual controllers ẋid are approximated by neural networks. Therefore, the feedforward controllers x α id and u α can all be obtained directly from the current system parameters, without requiring the derivatives of the virtual controllers in (15), (26) and (35). Hence, compared to [15,16,19], the proposed feedforward controllers x α id and u α are easier to implement in practical applications.
Remark 6 From (39), it can be observed that the feedforward controller U α alone is insufficient to stabilize the entire closed-loop system. Therefore, it is essential to develop the optimal controller U * for the following system (40) to guarantee the stability associated with the last term on the right-hand side of (39). The feedforward adaptive backstepping controller solves the full-state constraint and input delay problems by introducing the BLF and Pade approximation. The next step is to design the feedback controller based on ADP theory, which optimizes the tracking performance of the system and guarantees the stability of the closed-loop system.

Feedback optimal controller design
In this section, based on the ADP theory, the optimal controller U * is designed to stabilize the nonlinear uncertain system (40).
The system (40) can be rewritten as (41), where E = [e 1 , e 2 , . . ., e n ] T is the state of the error system. Define the performance index function as (42), where Q and R are positive definite matrices with appropriate dimensions, and define the Hamiltonian function for the performance index function J (E) as (43), where ∇ J (E) = ∂ J (E)/∂ E represents the derivative of J (E) with respect to E. Then, by solving the HJB equation, the optimal control U * can be derived as (44), where the optimal performance index function J * (E) is obtained by the minimization in (45). Substituting (45) into (43), the HJB equation is obtained, which is a nonlinear partial differential equation in ∇ J * (E) and is difficult to solve analytically. To overcome this problem, ADP theory is applied in this paper.
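As a quick numerical sanity check of the stationarity condition behind the optimal control: for a Hamiltonian of the common form E T QE + U T RU + ∇J T (F(E) + G(E)U ), the minimizing control is U * = −½R⁻¹G T ∇J. The sketch below (illustrative only; the dimensions and random data are arbitrary assumptions, not the paper's system) verifies that random perturbations of U * never decrease the Hamiltonian:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
R = np.diag(rng.uniform(1.0, 2.0, m))   # positive definite control weight
Q = np.eye(n)
G = rng.normal(size=(n, m))             # input matrix at some state E
F = rng.normal(size=n)                  # drift term at that state
gradJ = rng.normal(size=n)              # gradient of the value function
E = rng.normal(size=n)

H = lambda U: E @ Q @ E + U @ R @ U + gradJ @ (F + G @ U)

# Stationary point of H in U (H is convex in U since R > 0)
U_star = -0.5 * np.linalg.solve(R, G.T @ gradJ)

for _ in range(100):
    U = U_star + 0.1 * rng.normal(size=m)
    assert H(U) >= H(U_star) - 1e-12
print("U* minimizes the Hamiltonian over all sampled perturbations")
```

Since H(U) − H(U*) = (U − U*) T R (U − U*) ≥ 0, the check holds exactly up to floating-point rounding.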
The optimal control scheme is built using a critic neural network. The optimal performance index function J * (E) can be approximated by a single-layer neural network as (46), where W c is the optimal critic weight vector, σ (•) denotes the activation function, l c represents the number of neurons, and ε l c (E) is the approximation error. The gradient of J * (E) with respect to E is expressed as (47). According to (46) and (47), (44) can be represented as (48), and the HJB equation can be written as (49), where ε HJB is the residual error. Since the ideal weight W c is generally unavailable, the performance index function is estimated by the critic network as (50), where Ĵ (E) and Ŵc are the estimates of J (E) and W c , respectively. The weight estimation error of the critic network is defined as W̃c = W c − Ŵc . Thus, the estimate of U * is given by (51), and the approximate HJB equation is derived as (52). Define the objective function as (53). To obtain the performance index function, the weights of the critic network are adjusted to minimize the objective function (53).
Inspired by [5] and [30], a suitable weight updating law (54) is designed, which minimizes the objective function (53) and also guarantees that Ŵc converges to W c . According to Assumption 2, the function G(X ) is bounded; then Ĝ(X ) is bounded such that ∥Ĝ(X )∥ ≤ Ḡ. F 1 and F 2 are adjustment parameters with appropriate dimensions. The stabilizing term in (54) is described as follows: V (E) is chosen as a continuously differentiable Lyapunov function candidate, and P is a positive definite function satisfying (55). Specifically, V (E) is a function of the state variable E, which can be appropriately chosen (for example, as a quadratic function of E).

Remark 7 The update law Ẇc consists of three parts: the first part is devised by the gradient descent method; the second part is introduced to guarantee that the system states remain bounded during the critic network learning process; the last part is included to ensure the stability of the system. Note that if E = 0 and H (E) = 0, then H (E, Ū , Ŵc ) = 0. If F 2 = F 1 ϕ T , then Ẇc = 0, the weight vector Ŵc will not be updated, and the optimal control may not be obtained. To prevent this from happening before Ŵc has converged, a small exploration signal d(t) is added to the control input u in (30); that is, the persistence of excitation condition is needed. It is worth pointing out that the feedforward virtual control input is designed based on the BLF, which keeps the system states within the user-defined sets. Therefore, if the small exploration signal satisfies |d(t)| ≤ d̄, the stability of the closed-loop system can still be guaranteed.
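The gradient-descent part of the critic update can be illustrated on a scalar toy problem whose HJB equation has a closed-form solution. This is a simplified sketch we supply, not the authors' update law (54): the normalization, boundedness, and stabilizing terms are omitted, and the system, cost weights, and learning rate are arbitrary choices. For ẋ = ax + bu with cost ∫(qx² + ru²)dt and critic Ĵ(x) = p x², driving the squared Hamiltonian residual to zero recovers the analytic weight p* = √2 − 1 for the values below:

```python
import math

# Toy problem: x' = a*x + b*u, cost integral of (q*x^2 + r*u^2).
# With J(x) = p*x^2 the HJB reduces to q + 2*a*p - b^2*p^2/r = 0,
# i.e. p^2 + 2p - 1 = 0 for the values below, so p* = sqrt(2) - 1.
a, b, q, r = -1.0, 1.0, 1.0, 1.0
p = 0.0          # critic weight W_c for the single feature sigma(x) = x^2
alpha = 0.01     # learning rate
x = 1.0          # a fixed "exploration" state sample

for _ in range(5000):
    u = -(b * p / r) * x                   # control induced by the critic
    grad_J = 2.0 * p * x                   # dJ/dx
    e_H = q * x**2 + r * u**2 + grad_J * (a * x + b * u)   # HJB residual
    de_dp = x**2 * (-2.0 - 2.0 * p)        # d e_H / d p (u depends on p)
    p -= alpha * e_H * de_dp               # gradient descent on e_H^2 / 2

print(f"learned p = {p:.6f}, analytic p* = {math.sqrt(2) - 1:.6f}")
```

In the paper the same residual-minimization idea runs online over the trajectory of E, with the extra terms in (54) keeping the states bounded during learning.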

Stability analysis
In this section, we discuss the stability of the system via the Lyapunov direct method. Before proceeding, we give the necessary assumptions, which have been widely used in the literature, e.g. [5,19,30].
Remark 8 Assumption 3 shows that the critic neural network has the ability to approximate the performance index function J * (E) with finite approximation error, which is the basic assumption for using neural networks.
Theorem 1 Consider the strict-feedback nonlinear system (1), with the feedforward controller and its parameter adaptation laws developed in (15), (16), (26), (27), (35) and (36). Let the optimal feedback control law be chosen as (51) and the critic neural network weights be updated according to (54). If the design parameters are selected appropriately, all signals in the closed-loop system are guaranteed to be bounded, and the output y(t) of the system optimally tracks the target signal y d (t).
Proof Please see the "Appendix".
We can derive the convergence property of the approximate optimal control Ū (E) based on Theorem 1.

Corollary 1
The estimated control Ū constructed in (51) converges to a bounded neighborhood of the optimal control U * .

Proof Please see the "Appendix".
Since W c and W̃c are bounded, the signal Ŵc = W c − W̃c is bounded. Therefore, the optimal feedback control x * id is bounded by a constant x̄ * id , i.e., |x * id | ≤ x̄ * id , i = 1, . . ., n. Furthermore, because W̃i and W i are bounded, the boundedness of Ŵi = W i − W̃i follows. From (15), the feedforward virtual control x α 2d is a function of x 1 , y d , ẏd and Ŵi ; hence, there exists a constant x̄ α 2d that upper bounds x α 2d , i.e., |x α 2d | ≤ x̄ α 2d . In the same way, it can be proved in turn that the remaining virtual and actual control signals are bounded. Accordingly, one can conclude that all the closed-loop signals are bounded and the tracking error converges to a bounded region around zero. Furthermore, the states remain within the bounded intervals.

Simulation and results
In this section, a simulation is presented to illustrate the effectiveness and control performance of the proposed approach.
Consider a second-order nonlinear system in strict-feedback form with input delay, where u and y represent the input and output of the system. The states x 1 and x 2 are constrained by |x 1 | ≤ k c1 = 0.55 and |x 2 | ≤ k c2 = 0.8, respectively, and τ = 0.032 is the input delay. The initial states are x 1 (0) = 0, x 2 (0) = 0.1 and x 3 (0) = 0. The reference signal is y d (t) = 0.5 sin(t).
In the optimal feedback controller, the initial critic network weights are selected randomly in [0, 1]. The activation function of the critic network is chosen as σ (E) = [e 1 , e 1 ², e 2 , e 2 ², e 1 e 2 ] T . In addition, the design parameters are selected appropriately, where I is the identity matrix of suitable dimensions. For the first 10 s, a small exploratory signal d(t) = sin⁵(t) cos(t) + sin⁵(10t) cos(0.1t) is added to the control input u.
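For readers reproducing the example, the delayed input u(t − τ) can be realized in a fixed-step simulation with a FIFO buffer. The sketch below is an implementation detail we supply (not the authors' code), using the example's τ = 0.032 and an arbitrary step size dt:

```python
from collections import deque

dt = 0.001
tau = 0.032
delay_steps = round(tau / dt)           # 32 steps of delay

# FIFO buffer pre-filled with zeros: u(t - tau) = 0 for t < tau
buffer = deque([0.0] * delay_steps, maxlen=delay_steps)

def delayed(u_now):
    """Push the current input, pop the input from tau seconds ago."""
    u_past = buffer[0]      # oldest entry = input delay_steps calls ago
    buffer.append(u_now)    # maxlen automatically drops the oldest entry
    return u_past

# Example: feed in u(k) = k*dt and read back the delayed values;
# outputs[k] equals u((k - 32)*dt) once the buffer has filled.
outputs = [delayed(k * dt) for k in range(100)]
print(outputs[10], outputs[50])
```

At each integration step the plant sees `delayed(u_now)` in place of u(t − τ), while the controller itself uses the Pade-transformed delay-free model (4).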
The simulation results are shown in Figs. 1, 2, 3, 4, 5, 6 and 7. Figure 1 exhibits the trajectories of the system output y(t), the reference signal y d (t) and the constraint interval. It is obvious that the designed controller is able to track the reference signal with a small error. The trajectories of the state x 2 and its constraint interval are provided in Fig. 2. Figure 3 illustrates that the trajectories of e 1 and e 2 remain within the predefined intervals. After around 10 s, the controller achieves a satisfactory tracking performance. From Figs. 1, 2, 3, 4, 5, 6 and 7, it can be observed that all the states and tracking errors remain within the constraint intervals, which proves the effectiveness of the proposed approach for handling the full-state constraints. Although an exploratory signal disturbs the system in the first 10 s, the feedforward controller still guarantees that the system states stay within the specified range. Therefore, the proposed method keeps the system within the predefined compact sets during the learning process of the optimal feedback controller, which is meaningful in practical applications. In addition, the trajectories of the feedforward control u α , the optimal feedback control u * and the system input u are shown in Fig. 4. From Fig. 5, the boundedness of the neural network weights is observed. Figure 6 shows the convergence of the critic neural network; the weights converge in about 7 s, so the 10 s exploration time is sufficient for the learning process of the critic network. To investigate the effectiveness of the Pade approximation (4) and the optimal feedback control (51), we compare the tracking errors e 1 generated by different methods in Fig. 7. The tracking error e α 1 is generated by the feedforward controller alone, e P 1 is generated by the proposed approach without Pade approximation, and e * 1 is generated by the proposed approach with both the feedforward-feedback controller and Pade approximation. Comparing e * 1 with e α 1 , it can be seen that adding the optimal feedback control achieves better tracking performance. From the trajectories of e * 1 and e P 1 , it can be observed that the stability of the closed-loop system deteriorates significantly for τ = 0.032 s without Pade approximation; furthermore, if τ > 0.032 s, the stability of the closed-loop system cannot be guaranteed without the use of Pade approximation. Therefore, the proposed approach, which combines feedforward and feedback controllers, achieves better performance in different situations and is safer than the other approaches.
From Figs. 1, 2, 3, 4, 5, 6 and 7, it is easy to conclude that the developed adaptive optimal control approach achieves tracking of the reference signal y d (t) by the system output y(t). Furthermore, the stability of the closed-loop system is guaranteed, and the system states are prevented from violating their constraints when the system is affected by input delay.

Conclusion
In this study, an ADP-based optimal tracking control technique is developed for nonlinear systems with full-state constraints and input delay. With the help of ADP theory and the backstepping design method, an adaptive controller is designed, which consists of a feedforward part and an optimal feedback part. To reduce the influence of the input delay, an intermediate variable and the Pade approximation are introduced. It is proved that the designed control approach guarantees the stability of the closed-loop nonlinear system under the impact of input delay and ensures that the states satisfy the full-state constraints. Finally, the simulation supports the validity of the proposed adaptive tracking control method. In future work, the proposed method will be extended to the online adaptive control of nonlinear discrete-time systems.

Conflict of interest
The authors declare that they have no conflict of interest.

Appendix
Proof of Theorem 1 Define the Lyapunov function candidate V HJB . From (39), the derivative of V HJB can be derived. Combining (49), (50) and (52), and then using (50), (58) and (54), one obtains the critic weight error dynamics, where m s = 1 + φ T φ. According to (57) and (61), and selecting appropriate F 1 and F 2 such that M is positive definite, one has (62). In this study, we suppose that ∥H (E) + G(X )U * ∥ ≤ c∥E∥, where c is a positive constant. Applying Young's inequality and choosing sufficiently large parameters k i , (62) becomes (63). In view of the definition of the stabilizing term in (55), (63) is divided into two cases, with thresholds determined by γ min (M) and γ min (P), where γ min (P) represents the minimum eigenvalue of P.
Therefore, we can derive V̇ HJB < 0 when the stated conditions hold. From Theorem 1, we know that the weight estimation error vector is uniformly ultimately bounded by a constant W̃ cM , such that ∥W̃c ∥ ≤ W̃ cM . According to Assumption 2, we can conclude that G(X ) is bounded, i.e., ∥G(X )∥ ≤ ḡ, where ḡ is a positive scalar. Thus, the conclusion of Corollary 1 follows. The proof is completed.
Based on the Lyapunov theorem and combining Case 1 and Case 2, it can be concluded that if ∥E∥ ≥ max(ϒ 1 , ϒ 2 ), or the corresponding threshold conditions on W̃c , ∇V (E) and B hold, then all signals in the control system are bounded. The proof is completed.