ADP-based trajectory-tracking control via backstepping method for underactuated AUV with unknown dynamics

This paper investigates the trajectory-tracking control problem for underactuated autonomous underwater vehicles (AUVs) with unknown dynamics. Compared with existing adaptive dynamic programming (ADP) schemes, the proposed control scheme achieves higher system stability and tracking accuracy. First, the backstepping approach is applied to the kinematic model of the underactuated AUV to produce a virtual velocity control, which serves as the desired velocity input of the dynamic model. Second, the error tracking system is constructed from the dynamic model of the underactuated AUV. Third, a critic neural network and an action neural network are employed to transform the trajectory-tracking control problem into an optimal control problem solved with the policy iteration algorithm. Finally, simulation results are given to verify the effectiveness of the proposed control scheme.


Introduction
The motion control of autonomous underwater vehicles (AUVs) has received increasing attention because of its wide range of applications. Many control techniques have been proposed to solve the motion problems of AUVs [1]. Trajectory-tracking control is one fundamental function of AUV motion control [2,3]. Many control schemes have been developed to solve the trajectory-tracking problem for underactuated or fully actuated AUVs, such as output feedback control [4,5], adaptive control [6], fuzzy control [7], model predictive control [8] and dynamic surface control [9].
However, an AUV is a complex system subject to nonlinear dynamics, unknown disturbances and faults [10], which can destabilize the vehicle. In recent years, neural networks have been introduced to solve the trajectory-tracking problem for AUVs because of their strong nonlinear fitting ability, robustness and self-learning ability. An adaptive neural network tracking controller [11] built on a radial basis function neural network is developed for an underwater vehicle with an unknown nonlinear function. A deterministic policy gradient method [12] using multi pseudo Q-learning is proposed to reduce the overestimation of the action-value function for underactuated AUVs with unknown dynamics and constrained inputs. Neural networks are employed to approximate the uncertain underactuated vessel dynamics and external disturbances [13]. A radial basis function neural network is employed to deal with the uncertain nonlinear dynamics of an underactuated vessel [14]. A neural network is designed to solve the cooperative path-following problem for a fleet of underactuated AUVs with uncertain nonlinear dynamics [15]. A neural network is employed to estimate the unknown nonlinear dynamics of an AUV [16]. A robust adaptive control scheme based on a fully tuned fuzzy neural network is proposed to solve the trajectory and attitude problem for an unmanned underwater vehicle with thruster dynamics and unknown disturbances [17].
As an optimization method, the adaptive dynamic programming (ADP) scheme has received growing attention [19]. ADP has been introduced to compensate for unknown dynamics, such as external disturbances, control input nonlinearities and model uncertainties, in nonlinear systems [20,21]. Policy iteration [22-26] and value iteration [27,28] are the two primary iterative ADP algorithms. In this work, ADP is introduced to solve the trajectory-tracking problem for underactuated AUVs with bounded time-varying disturbances, and higher tracking accuracy is obtained. The main contributions of this paper are as follows:
(i) The virtual velocity control input is designed from the kinematic model of the underactuated AUV using the backstepping method and the Lyapunov stability theorem. It is taken as the reference velocity input of the dynamic model of the underactuated AUV to reduce the jitter in the states of the error tracking system.
(ii) The error tracking system of the underactuated AUV is established by the augmented matrix method. The policy iteration ADP scheme based on the critic-action neural networks is employed to transform the trajectory-tracking control problem for the underactuated AUV into an optimal control problem. The ADP structure differs from the one proposed in [29]. The weights of the critic-action neural networks are updated online.
(iii) Unknown dynamics [19,29] are considered in the underactuated AUV. The stability of the error tracking system is analyzed based on the Lyapunov stability theorem. To verify the effectiveness of the proposed method, simulation results of the comparison method proposed in [30] are also given.
This paper is organized as follows. In Section 2, the kinematic model and the dynamic model of the underactuated AUV, based on the reference model of AUV in [31], are given. In Section 3, the ADP optimal controller is designed via the backstepping method. In Section 4, two simulation examples are provided. Section 5 gives the conclusions.

Problem formulation and mathematical model of AUV
The mathematical model of the underactuated AUV is shown in Fig. 1. Two coordinate systems are used: one is the universal frame $\{O_e - X_e Y_e Z_e\}$ and the other is the body-fixed frame

Kinematic model of AUV
The kinematic model of the underactuated AUV is given as follows: $\dot{\eta} = J(\eta)\xi$ (1), where the position vector $\eta = [x \; y \; z \; \phi \; \theta \; \psi]^T$ consists of the vehicle location coordinates $x$, $y$ and $z$, the roll angle $\phi$, the pitch angle $\theta$ and the yaw angle $\psi$ with respect to the universal frame; the velocity vector $\xi = [u \; v \; w \; p \; q \; r]^T$ consists of the surge $u$, sway $v$, heave $w$, roll rate $p$, pitch rate $q$ and yaw rate $r$ with respect to the body-fixed frame; and $J(\eta) \in \mathbb{R}^{6 \times 6}$ is the coordinate transformation matrix.
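The mapping from body-fixed velocities to universal-frame rates can be sketched numerically. The entries of $J(\eta)$ are not listed in this paper, so the sketch below assumes the common block-diagonal form built from a ZYX Euler-angle rotation matrix and the corresponding angular-rate transformation; the function names `transformation_matrix` and `kinematics` are illustrative only.

```python
import numpy as np

def transformation_matrix(eta):
    """Assumed 6x6 block-diagonal J(eta) for eta = [x, y, z, phi, theta, psi]."""
    phi, theta, psi = eta[3], eta[4], eta[5]
    cphi, sphi = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cpsi, spsi = np.cos(psi), np.sin(psi)
    # Rotates linear velocities from the body-fixed frame to the universal frame
    # (ZYX Euler-angle convention, an assumption of this sketch).
    R = np.array([
        [cpsi * cth, cpsi * sth * sphi - spsi * cphi, cpsi * sth * cphi + spsi * sphi],
        [spsi * cth, spsi * sth * sphi + cpsi * cphi, spsi * sth * cphi - cpsi * sphi],
        [-sth, cth * sphi, cth * cphi],
    ])
    # Maps body angular rates to Euler-angle rates; singular at theta = +/- pi/2.
    T = np.array([
        [1.0, sphi * sth / cth, cphi * sth / cth],
        [0.0, cphi, -sphi],
        [0.0, sphi / cth, cphi / cth],
    ])
    J = np.zeros((6, 6))
    J[:3, :3] = R
    J[3:, 3:] = T
    return J

def kinematics(eta, xi):
    """Evaluate eta_dot = J(eta) @ xi for xi = [u, v, w, p, q, r]."""
    return transformation_matrix(eta) @ xi
```

For example, with a yaw angle of 90 degrees a pure surge velocity moves the vehicle along the universal $y$ axis, which is a quick sanity check on the sign conventions assumed here.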

Problem formulation
The position error of the underactuated AUV is defined as follows: The time derivative of $e_\eta$ is given as follows: The velocity error of the underactuated AUV is defined as follows: where $\xi_{bs}$ is the virtual velocity control input generated by the backstepping approach. The time derivative of $e_\xi$ is given as follows:

ADP optimal control design via backstepping method
The block diagram of the proposed control scheme is shown in Figure 2.

Kinematic control via backstepping method
Considering the kinematic model of the underactuated AUV, the objective is to drive $e_\eta \to 0$ as $t \to \infty$. According to equations (4), (5) and (6), the virtual control input $\xi_{bs}$ is given as follows: where $k_1 > 0$ and $J$ is invertible. The time derivative of equation (8) is given as follows:
Lemma 1 Under the proposed virtual control input $\xi_{bs}$ and the kinematic model of the underactuated AUV (1), $e_\eta$ is uniformly ultimately bounded (UUB).
Proof Choose the Lyapunov function as follows: The derivative of $V_1$ is calculated as follows: From equation (11), the time derivative of the Lyapunov function is negative semidefinite. It can be concluded that $V_1$ is bounded and $e_\eta$ is UUB. This completes the proof.
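The kinematic backstepping step can be illustrated with a short simulation. This is a sketch under stated assumptions, not the controller of this paper: it assumes the sign convention $e_\eta = \eta - \eta_d$ and the virtual velocity $\xi_{bs} = J^{-1}(\dot{\eta}_d - k_1 e_\eta)$, uses a simplified planar model $(x, y, \psi)$ instead of the full 6-DOF model, takes a hypothetical circular reference, and assumes the velocity tracks $\xi_{bs}$ perfectly. Under these assumptions $\dot{e}_\eta = -k_1 e_\eta$, so the position error decays.

```python
import numpy as np

def J_planar(psi):
    """Planar transformation matrix for eta = [x, y, psi] (simplifying assumption)."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def virtual_velocity(eta, eta_d, eta_d_dot, k1):
    """Assumed virtual control: xi_bs = J^{-1} (eta_d_dot - k1 * e_eta)."""
    e_eta = eta - eta_d  # assumed sign convention for the position error
    return np.linalg.solve(J_planar(eta[2]), eta_d_dot - k1 * e_eta)

def simulate(k1=1.0, dt=0.01, steps=1000):
    """Integrate the kinematics with xi = xi_bs and return the error norm history."""
    eta = np.array([1.0, -1.0, 0.5])
    errors = []
    for i in range(steps):
        t = i * dt
        # Hypothetical reference: unit circle with a linearly increasing yaw angle.
        eta_d = np.array([np.cos(t), np.sin(t), t])
        eta_d_dot = np.array([-np.sin(t), np.cos(t), 1.0])
        xi = virtual_velocity(eta, eta_d, eta_d_dot, k1)
        errors.append(np.linalg.norm(eta - eta_d))
        eta = eta + dt * (J_planar(eta[2]) @ xi)  # forward-Euler step
    return np.array(errors)
```

Calling `simulate()` returns the history of $\|e_\eta\|$; a larger `k1` gives faster decay, at the cost of larger commanded velocities.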
According to equations (4)-(9), the dynamic model of the underactuated AUV (2) can be transformed as follows: Let $Z = [e_\xi \; e_\eta]^T$; the error tracking system is then obtained as follows:

Critic neural network design
The critic neural network is designed to approximate $V_2^*(Z)$ as follows: $V_2^*(Z) = W_1^T \varpi_1(Z) + \delta_1(Z)$ (20), where $W_1$ is the vector of unknown ideal constant weights of the critic neural network, $\varpi_1(Z)$ is the activation function vector of the critic neural network, and $\delta_1(Z)$ is the approximation error of the critic neural network. The derivative of equation (20) is represented as follows: where $\nabla\varpi_1(Z) \triangleq \partial\varpi_1(Z)/\partial Z$ and $\nabla\delta_1(Z) \triangleq \partial\delta_1(Z)/\partial Z$. Let $\hat{W}_1$ be the approximation of $W_1$; the approximation of $V_2^*(Z)$ can then be represented as follows: $\hat{V}_2(Z) = \hat{W}_1^T \varpi_1(Z)$ (22). Then, the HJB function can be derived as follows: (23) The square residual error $E_1(\hat{W}_1)$ is defined as follows: Given any admissible control policy $\hat{\Gamma}$, it is desired to select $\hat{W}_1$ to minimize $E_1(\hat{W}_1)$. The weight updating law based on the gradient descent algorithm is given as follows: where $\alpha_1 > 0$ is the adaptive gain of the critic neural network. According to the definition of $\varrho_1$, there exist positive constants $\varrho_{1M} > 1$ and $\varrho_{1m} > 0$ such that $\varrho_{1m} \le \|\varrho_1\| \le \varrho_{1M}$. Equation (25) can be transformed as follows: where
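A gradient-descent critic update of this kind can be sketched as follows. The exact form of the updating law (25) is not reproduced here, so the sketch assumes a commonly used normalized law: with regressor $\sigma = \nabla\varpi_1(Z)\dot{Z}$, residual $e_1 = r + \hat{W}_1^T\sigma$ and $E_1 = \frac{1}{2}e_1^2$, the weights follow $\dot{\hat{W}}_1 = -\alpha_1 \sigma e_1 / (1 + \sigma^T\sigma)^2$. The scalar test system $\dot{z} = -z$ with utility $r = z^2$ and the names `critic_step` and `critic_demo` are hypothetical; for that system the exact value function is $0.5 z^2$, so the single weight should approach 0.5.

```python
import numpy as np

def critic_step(W_hat, grad_phi, Z_dot, utility, alpha1, dt):
    """One Euler step of an assumed normalized gradient-descent critic update."""
    sigma = grad_phi @ Z_dot            # regressor, shape (n_weights,)
    e1 = utility + W_hat @ sigma        # HJB (Bellman) residual for the current weights
    rho = 1.0 + sigma @ sigma           # normalization term (assumed form)
    W_dot = -alpha1 * sigma / rho**2 * e1
    return W_hat + dt * W_dot, e1

def critic_demo(steps=20000, dt=0.01, alpha1=5.0):
    """Hypothetical example: z_dot = -z, utility z^2, single feature z^2."""
    rng = np.random.default_rng(0)
    W_hat = np.zeros(1)
    for _ in range(steps):
        z = rng.uniform(-2.0, 2.0)      # random sampling stands in for persistent excitation
        grad_phi = np.array([[2.0 * z]])  # derivative of the feature z^2 with respect to z
        Z_dot = np.array([-z])
        W_hat, _ = critic_step(W_hat, grad_phi, Z_dot, z**2, alpha1, dt)
    return W_hat
```

The normalization by $(1 + \sigma^T\sigma)^2$ keeps the step bounded when the regressor is large, which is one reason bounds on such a normalization term are useful in a stability analysis.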

Action neural network design
The action neural network is designed to approximate $\Gamma^*$ as follows: $\Gamma^* = W_2^T \varpi_2(Z) + \delta_2(Z)$ (27), where $W_2$ is the matrix of unknown ideal constant weights of the action neural network, $\varpi_2(Z)$ is the activation function vector of the action neural network, and $\delta_2(Z)$ is the approximation error of the action neural network. Let $\hat{W}_2$ be the approximation of $W_2$; the actual output can then be expressed as follows: $\hat{\Gamma} = \hat{W}_2^T \varpi_2(Z)$ (28). According to equations (18) and (28), the feedback error is defined as follows: The square residual error $E_2(\hat{W}_2)$ is defined as follows: It is desired to select $\hat{W}_2$ to minimize the objective function. The weight updating law based on the gradient descent algorithm is given as follows: where $\alpha_2 > 0$ is the adaptive gain of the action neural network.
Equation (31) can be rewritten as follows:
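The action-network update can be sketched in the same spirit. The feedback error and the law (31) are not reproduced here, so the sketch assumes $e_2 = \hat{\Gamma} - \Gamma_{target}$, where $\Gamma_{target}$ stands for the control suggested by the optimal control law evaluated with the current critic, $E_2 = \frac{1}{2}e_2^T e_2$, and the plain gradient law $\dot{\hat{W}}_2 = -\alpha_2 \varpi_2(Z) e_2^T$. The linear target policy $u = -Kz$ and the names `action_step` and `action_demo` are hypothetical.

```python
import numpy as np

def action_step(W_hat, phi, target, alpha2, dt):
    """One Euler step of an assumed gradient-descent action-network update."""
    gamma_hat = W_hat.T @ phi               # network output, shape (n_inputs,)
    e2 = gamma_hat - target                 # assumed feedback error
    W_dot = -alpha2 * np.outer(phi, e2)     # gradient of 0.5 * e2^T e2 w.r.t. W_hat
    return W_hat + dt * W_dot, e2

def action_demo(steps=5000, dt=0.01, alpha2=2.0):
    """Hypothetical example: learn u = -K z with identity features phi(z) = z."""
    rng = np.random.default_rng(1)
    K = np.array([[1.0, 0.5]])              # hypothetical target gain
    W_hat = np.zeros((2, 1))
    for _ in range(steps):
        z = rng.uniform(-1.0, 1.0, size=2)
        W_hat, _ = action_step(W_hat, z, -K @ z, alpha2, dt)
    return W_hat, -K.T                      # learned weights and the ideal weights
```

In an online critic-action scheme the target would change as the critic weights evolve; here it is held fixed so that convergence of the weights can be checked in isolation.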

Stability analysis
According to the optimal control $\hat{\Gamma}$ (28) and the error tracking system (13), the error tracking system (33) is obtained. The error tracking system (33) can be transformed using equation (27) as follows:
Assumption 2 The unknown ideal constant weights $W_1$ and $W_2$ satisfy $\|W_1\| \le W_{1M}$ and $\|W_2\| \le W_{2M}$, respectively, where $W_{1M}$ and $W_{2M}$ are positive constants.
Theorem 6 Consider the error tracking system (13) and suppose that Assumptions 2-5 hold. Let the optimal cost function and the optimal control law be given by (16) and (18), and let the weight updating laws of the critic neural network and the action neural network be given by (25) and (31). Then the tracking error $Z$ and the weight estimation errors $\tilde{W}_1$ and $\tilde{W}_2$ are asymptotically stable.
The derivative of the Lyapunov function candidate (35) along the trajectories of the error tracking system (34) is given as follows: According to equation (26), $\dot{L}_1$ can be represented as follows: According to equation (32), $\dot{L}_2$ can be represented as follows: $\dot{L}_3$ is given as follows: Then, we obtain $\dot{L}(t) = \dot{L}_1 + \dot{L}_2 + \dot{L}_3 < 0$ (40). Therefore, it can be concluded that the tracking error $Z$ and the neural network estimation errors $\tilde{W}_1$ and $\tilde{W}_2$ are UUB. This completes the proof.

Simulation
To verify the effectiveness of the proposed control technique, two simulation examples are performed, one without unknown dynamics and one with unknown dynamics. In each example, the proposed scheme is compared with the traditional ADP scheme without the backstepping method. Because the underactuated AUV does not have independent actuators in the sway and heave axes, the available controls are the surge force, the pitch moment and the yaw moment. The simulations are conducted using 5-DOF kinematic and dynamic models with $\phi = 0$ and $p = 0$.

Example one without unknown dynamics
The numerical values of the parameters used in the simulations are given as follows:
Fig. 4 Tracking error of desired velocity with ADP without backstepping method
Figure 3 and Figure 4 show the tracking errors of the desired velocity. The tracking errors of the desired position are shown in Figure 5 and Figure 6. From these simulation results, the states of the error tracking system (34) converge to zero with the proposed method, whereas the tracking error of the desired velocity with the ADP scheme without the backstepping method is only bounded. The time response of the underactuated AUV shows that the proposed control method achieves better tracking performance.
The tracking of the desired velocity of the underactuated AUV with the two control schemes is given in Figure 7. Figure 8 shows the tracking of the desired position with the two control schemes. Figure 9 illustrates the spatial trajectories of the underactuated AUV with the two control schemes. The actual position, velocity and spatial trajectory obtained with the proposed method are closer to their desired values.

Example two with unknown dynamics
In this section, the unknown dynamics are considered. Figure 10 and Figure 11 show the tracking errors of the desired velocity. The tracking errors of the desired position are shown in Figure 12 and Figure 13. The states of the error tracking system (34) converge to zero with the proposed method. Because bounded time-varying disturbances are present in this example, a small bounded chatter appears during convergence. The time response of the underactuated AUV shows that the proposed control method achieves better tracking performance.
The tracking of the desired velocity of the underactuated AUV with the two control schemes is given in Figure 14. Figure 15 shows the tracking of the desired position with the two control schemes. Figure 16 illustrates the spatial trajectories of the underactuated AUV with the two control schemes. The actual position, velocity and spatial trajectory obtained with the proposed method are closer to their desired values.
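The link between a bounded disturbance and a small residual oscillation can be illustrated with a scalar toy model. This is only an illustration under stated assumptions, not the AUV error system: it assumes the error obeys $\dot{e} = -k e + d(t)$ with $|d(t)| \le D$, for which the error ultimately stays within $D/k$ rather than settling exactly at zero. The gain, the sinusoidal disturbance and the function name `simulate_error` are hypothetical.

```python
import numpy as np

def simulate_error(k=2.0, D=0.1, e0=1.0, dt=0.001, steps=10000):
    """Forward-Euler simulation of e_dot = -k * e + d(t) with |d(t)| <= D."""
    e = np.empty(steps)
    e[0] = e0
    for i in range(steps - 1):
        d = D * np.sin(5.0 * i * dt)   # hypothetical bounded time-varying disturbance
        e[i + 1] = e[i] + dt * (-k * e[i] + d)
    return e
```

After the initial transient, the returned error keeps oscillating with a small amplitude that stays below `D / k`, which mirrors the bounded chatter described above.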

Conclusions
The stability of the error tracking system (34) is guaranteed by the Lyapunov stability criterion. The simulation results show that the error tracking system (34) converges better with the proposed scheme than with the ADP scheme without the backstepping method, and that the proposed control scheme achieves good tracking performance. In future work, we will consider a single-critic neural network based ADP scheme to solve the trajectory-tracking control problem of underactuated AUVs with actuator faults.