ADP Based Fault-Tolerant Tracking Control for Underactuated AUV with Actuators Faults via Neural Network Observer

In this work, the fault-tolerant tracking control problem of an underactuated autonomous underwater vehicle (AUV) with actuator faults is investigated. First, an output-feedback error tracking system is constructed based on the theoretical model of the underactuated AUV with actuator faults. Then, an adaptive dynamic programming (ADP) based fault-tolerant controller is developed. In the proposed control scheme, a neural-network observer is designed to approximate the system states with actuator faults. A novel ADP scheme is constructed with a critic neural network and an action neural network in order to reduce the jitter in the control input and improve the tracking accuracy. Based on the Lyapunov approach, the stability of the error tracking system is guaranteed by the proposed controller. Finally, the simulation results show that the underactuated AUV achieves better tracking performance.


Introduction
Trajectory tracking is a complex motion control task for an autonomous underwater vehicle (AUV) in an unknown underwater environment (Che et al (2019a); Qiao and Zhang (2017); Shen et al (2018); Che et al (2019b)). Many trajectory-tracking control methods have been developed for AUVs without actuator faults, such as the adaptive terminal-sliding-mode control method (Qiao and Zhang (2017); Zhang et al (2018a)), the fuzzy control method (Liu et al (2019); Yu et al (2017)), the robust control method, the cooperative control method (Wang et al (2018)) and so on.
Actuators are very important parts of an underactuated AUV. Actuator faults may degrade the performance of the underactuated AUV (Hao et al (2019); Kadiyam et al (2020)). In order to maintain system stability and acceptable tracking accuracy, many fault-tolerant control strategies have been developed for AUVs with actuator faults, such as the adaptive fault-tolerant control method (Liu et al (2018a)), the adaptive terminal sliding mode based fault-tolerant control method (Zhang et al (2015)), the backstepping based adaptive region-tracking fault-tolerant control method and so on.
Adaptive dynamic programming (ADP) is introduced into this work to transform the trajectory-tracking control problem into an optimal control problem for the underactuated AUV with actuator faults. The policy iteration (PI) algorithm and the value iteration (VI) algorithm are two important ADP procedures for solving the complex Hamilton-Jacobi-Bellman (HJB) equation (Gong et al (2019); Liu et al (2018b); Sun and Liu (2018)).
Many ADP algorithms have been developed in recent years to solve tracking control problems for nonlinear systems. An infinite-time optimal tracking control problem is investigated based on the greedy heuristic dynamic programming (HDP) iteration algorithm (Zhang et al (2008)). The output tracking control problem is solved based on an event-driven ADP scheme. Time delays are considered and HDP is designed to solve the tracking control problem for a class of nonlinear systems (Zhang et al (2011)). An ADP based tracking control scheme is designed for a coal gasification system. ADP algorithms are designed for tracking control with unknown system dynamics (Kiumarisi and Lewis (2015); Qin et al (2014)). A tracking controller based on the ADP scheme is designed for a fully-actuated AUV with current disturbances and rudder faults, in which neural-network estimators are employed to approximate the current disturbances and rudder faults (Che and Yu (2020)). A PI algorithm is developed for online fault compensation control of a class of affine nonlinear systems with actuator failures (Zhao et al (2016)).
The main contributions of this work can be summarized as follows:
- An output-feedback error tracking system is constructed based on the theoretical model of the underactuated AUV with actuator faults.
- A neural network observer is designed to approximate the actuator faults. The approximated actuator faults are introduced into an improved performance index function based on the performance index in (Zhang et al (2008); Wei et al (2018); Zhao et al (2017)).
- Because the AUV is a very complex nonlinear system, critic-action neural networks are employed in order to reduce jitter in the control input, which differs from the single critic neural network structure (Zhao et al (2017)).
- The error tracking system can be guaranteed to be uniformly ultimately bounded (UUB) based on the Lyapunov stability theorem.
The rest of the paper is organized as follows. The output-feedback error tracking system is constructed and the problem formulation is described in Section 2. In Section 3, the fault-tolerant ADP tracking controller with a neural network observer is designed. Simulation examples are provided in Section 4 to demonstrate the effectiveness of the proposed method. The conclusion is drawn in Section 5.
2 Theoretical model of underactuated AUV and problem formulation
2.1 Theoretical model of underactuated AUV
Two coordinate systems are used in the theoretical model of the underactuated AUV, as shown in Fig. 1. One is the inertial coordinate system $\{O_e\text{-}X_eY_eZ_e\}$ and the other is the body-fixed coordinate system. In the model, $\eta = [\chi\ y\ z\ \varphi\ \theta\ \psi]^T$ is the location vector with respect to the inertial coordinate system; $\xi = [u\ v\ w\ p\ q\ r]^T$ is the velocity vector with respect to the body-fixed coordinate system; $M \in \Re^{6\times 6}$ is the inertia matrix; $C(\xi) \in \Re^{6\times 6}$ is the Coriolis and centripetal matrix; $D(\xi) \in \Re^{6\times 6}$ is the hydrodynamic damping matrix; $g(\eta) \in \Re^{6\times 1}$ is the vector of gravitational forces and moments; $\tau \in \Re^{6\times 1}$ is the vector of control forces; $J(\eta) \in \Re^{6\times 6}$ is the spatial transformation matrix between the two coordinate systems. The kinematics of the underactuated AUV are described as follows: where $J = J(\eta)$.
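For orientation, the symbol definitions above match the standard six-degree-of-freedom model of a marine vehicle. Assuming that standard form (the ordering of terms and the sign conventions below are an assumption of this sketch), the dynamics and kinematics can be written as follows, together with the differentiated kinematic relation that allows the model to be rewritten in terms of $\eta$ and $\dot{\eta}$:

```latex
\begin{aligned}
M\dot{\xi} + C(\xi)\,\xi + D(\xi)\,\xi + g(\eta) &= \tau, \\
\dot{\eta} &= J(\eta)\,\xi, \\
\ddot{\eta} &= J\,\dot{\xi} + \dot{J}\,\xi .
\end{aligned}
```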

Problem formulation
The desired trajectory is given as follows: The error vectors are defined as follows: Then, substituting equations (4) and (5) into equation (3), the output-feedback error tracking system is given as follows: We define the error vector $x = [e_\eta\ \dot{e}_\eta]^T$; then the error tracking system (6) can be transformed as follows: where $I \in \Re^{6\times 6}$ is the identity matrix.
Actuator faults are described by $f \in \Re^{m\times 1}$, where $m$ is the number of actuators. The real output of the actuators with actuator faults is given as follows: where $\mu \in \Re^{m\times 1}$ is the output of the controller. The vector of control forces and control torques $e'_\tau$ of the underactuated AUV with actuator faults can be represented as follows: where $B \in \Re^{6\times m}$ is the actuator configuration matrix; $e_\tau = B\mu$; $\tau_f = Bf$. The error tracking system with actuator faults is given as follows:
Assumption 1 Because the underactuated AUV does not have independent actuators in the sway and heave axes, the available controls are the surge force, the pitch moment and the yaw moment. The actuator faults $f$ satisfy $\|f\| = \|K\mu\| \le \|\mu\| \le \delta_1$, where $K$ is a diagonal matrix whose elements $k_{ii}$ satisfy $0 \le k_{ii} < 1$ and $\delta_1$ is a positive constant. $\varpi(x)$ and $\rho(x)$ are locally Lipschitz continuous.
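The fault model in Assumption 1 can be illustrated numerically. The following is a minimal sketch, assuming that the real actuator output is $\mu - f$ with $f = K\mu$ and that the resulting generalized forces are $B(\mu - f)$; the particular $6 \times 3$ matrix `B`, the gains in `k_diag` and the helper names are hypothetical and chosen only for illustration.

```python
# Sketch of the multiplicative actuator-fault model f = K * mu.
# The 6 x 3 configuration matrix B and all numbers are illustrative assumptions.

def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def faulty_actuation(B, k_diag, mu):
    """Return (f, mu_real, tau_real) for loss-of-effectiveness gains k_diag."""
    if not all(0.0 <= k < 1.0 for k in k_diag):
        raise ValueError("each k_ii must satisfy 0 <= k_ii < 1")
    f = [k * m for k, m in zip(k_diag, mu)]        # fault vector f = K mu
    mu_real = [m - fi for m, fi in zip(mu, f)]     # assumed real output mu - f
    tau_real = mat_vec(B, mu_real)                 # generalized forces B (mu - f)
    return f, mu_real, tau_real

# Assumed layout: the actuators drive surge (row 0), pitch (row 4) and yaw (row 5);
# the sway, heave and roll rows are zero because the vehicle is underactuated.
B = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
mu = [10.0, 2.0, -4.0]
k_diag = [0.2, 0.0, 0.5]   # 20% loss on surge, none on pitch, 50% on yaw

f, mu_real, tau_real = faulty_actuation(B, k_diag, mu)
```

Because every $k_{ii}$ lies in $[0, 1)$, each component of `f` is no larger in magnitude than the corresponding component of `mu`, which is the componentwise reason behind the bound $\|f\| \le \|\mu\|$.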
The performance index function is defined as follows: where $U(x,\mu) = x^TQx + \mu^TR\mu$ is the utility function and $U(0,0) = 0$; $Q \in \Re^{12\times 12}$ and $R \in \Re^{m\times m}$ are positive definite matrices; $\hat{f}$ is the approximation of the actuator faults $f$; $\gamma$ is a discount factor with $0 \le \gamma < 1$; $\beta$ is a positive constant.
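The utility function can be evaluated directly from its definition. The sketch below computes $U(x,\mu)$ and, as an additional assumption, approximates a discounted cost along a sampled trajectory with an exponential discount $e^{-\gamma t}$; the fault-compensation term weighted by $\beta$ is deliberately left out, and the function names and the two-dimensional toy data are hypothetical.

```python
import math

def quad_form(v, M):
    """Return v^T M v for a vector v and a square matrix M (nested lists)."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def utility(x, mu, Q, R):
    """Utility U(x, mu) = x^T Q x + mu^T R mu."""
    return quad_form(x, Q) + quad_form(mu, R)

def discounted_cost(utilities, dt, gamma):
    """Rectangle-rule approximation of the integral of exp(-gamma * t) * U(t).
    The exponential discount form is an assumption of this sketch."""
    return sum(math.exp(-gamma * k * dt) * u * dt for k, u in enumerate(utilities))

# Toy dimensions (the paper uses a 12-dimensional error state).
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[2.0]]
u_sample = utility([1.0, 2.0], [3.0], Q, R)   # 1 + 4 + 2 * 9
```

With positive definite `Q` and `R`, the utility is zero only at $x = 0$, $\mu = 0$, which matches the condition $U(0,0) = 0$.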
Definition 1 A control law $\mu$ is defined as an admissible control policy for the error tracking system (10) with $f = 0$ if $\mu$ is continuous on a set $\Omega \subset \Re^{12}$, $\mu(0) = 0$, $\mu$ stabilizes the error tracking system (10) with $f = 0$, and $V_1(x_0, 0)$ is finite for all $x_0 \in \Omega$.
Based on optimal control theory, the performance index function (11) is a Lyapunov function and satisfies the following: where $V_1(0,0) = 0$ and $\nabla V_1(x,\mu) = \partial V_1(x,\mu)/\partial x$ is the partial derivative of $V_1(x,\mu)$ with respect to $x$. Then, the Hamiltonian function is defined as follows: The optimal cost function is defined as follows: where $\delta_2$ is a positive constant. The optimal cost function (14) satisfies the HJB equation, then The optimal control is expressed as follows: The PI scheme is designed as shown in Algorithm 1.
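The idea behind policy iteration can be illustrated on a problem small enough to check by hand. The sketch below is not the AUV error system: it applies policy evaluation and policy improvement to a hypothetical scalar linear-quadratic problem $\dot{x} = ax + bu$ with cost $\int (qx^2 + ru^2)\,dt$ and no discount, for which the optimal cost $V(x) = px^2$ is known in closed form. All function names and numbers are assumptions made for illustration.

```python
# Policy iteration on a scalar linear-quadratic problem (illustrative stand-in,
# not the AUV error system): dx/dt = a*x + b*u, cost = integral of q*x^2 + r*u^2.

def policy_evaluation(a, b, q, r, k):
    """Solve 2*(a - b*k)*p + q + r*k^2 = 0 for p, the cost of the policy u = -k*x."""
    a_cl = a - b * k
    if a_cl >= 0.0:
        raise ValueError("gain k is not stabilizing, so the policy is not admissible")
    return (q + r * k * k) / (-2.0 * a_cl)

def policy_improvement(b, r, p):
    """Greedy gain for V(x) = p*x^2: u = -(1/2) * (1/r) * b * dV/dx = -(b*p/r)*x."""
    return b * p / r

def policy_iteration(a, b, q, r, k0, eps=1e-10, max_iter=100):
    """Alternate evaluation and improvement until p changes by less than eps."""
    k, p_prev = k0, float("inf")
    for _ in range(max_iter):
        p = policy_evaluation(a, b, q, r, k)
        if abs(p - p_prev) < eps:
            break
        k, p_prev = policy_improvement(b, r, p), p
    return p, k

# k0 = 0 is admissible here because the open-loop system (a = -1) is stable.
p_star, k_star = policy_iteration(a=-1.0, b=1.0, q=1.0, r=1.0, k0=0.0)
```

For these numbers the scalar Riccati equation $2ap - b^2p^2/r + q = 0$ has the positive root $p = \sqrt{2} - 1$, so the iteration can be checked against a closed-form answer. The requirement that the initial gain be stabilizing mirrors the admissibility condition in Definition 1.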

Algorithm 1 Online PI
Step 1: Select an initial admissible control policy $\mu^{(0)}$, a positive constant $\epsilon$ and an initial performance index function $\nabla V$

3 Fault-tolerant ADP tracking controller design via neural network observer
Lemma 1 (Zhao et al (2017, 2016)) With Assumptions 1 and 2 and the control policy (16) for the error tracking system (10) with $f = 0$, the continuously dif- hold. So, the optimal control law (16) is a solution to the error tracking system (10) with $f = 0$, and the error tracking system (10) with $f = 0$ is UUB.
Proof The derivative of $V_1^*(x,\mu^*)$ is given as follows: From equation (15) we have Substituting equation (18) into equation (17), we can get is a Lyapunov function. The error tracking system (10) is UUB. This completes the proof.
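The step from the HJB equation to the boundedness claim can be made explicit. The sketch below assumes $f = 0$, an exponentially discounted performance index, and a Hamiltonian of the form $H = U + (\nabla V_1)^T(\varpi + \rho\mu) - \gamma V_1$. These forms are assumptions chosen to agree with the regressor $\varsigma_1$ used in the critic design; they are not a transcription of the paper's equations, and $\bar{V}$ is a hypothetical bound introduced only for this illustration.

```latex
\begin{aligned}
0 &= U(x,\mu^*) + (\nabla V_1^*)^T\big(\varpi(x) + \rho(x)\mu^*\big) - \gamma V_1^*, \\
\dot{V}_1^* &= (\nabla V_1^*)^T\big(\varpi(x) + \rho(x)\mu^*\big)
             = -x^TQx - \mu^{*T}R\mu^* + \gamma V_1^*, \\
\dot{V}_1^* &< 0 \quad \text{whenever} \quad \lambda_{\min}(Q)\,\|x\|^2 > \gamma \bar{V}
             \quad \text{with} \quad V_1^* \le \bar{V} .
\end{aligned}
```

Under these assumptions the derivative is negative outside a ball whose radius shrinks as $\gamma \to 0$, which is why the discounted formulation yields ultimate boundedness rather than asymptotic convergence.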

Design of neural-network observer
For the error tracking system (10), we develop a radial basis function (RBF) neural network to approximate the actuator faults.
where $W_0^T \in \Re^{l_0}$ is the ideal weight; $\phi_0(x) \in \Re^{l_0}$ is the activation function; $l_0$ is the number of neurons in the hidden layer; $\varepsilon_0$ is the approximation error.
Substituting equation (21) into the error tracking system (10), we can get Then the neural-network fault observer is designed as follows: where $\hat{x}$ is the approximation of $x$; $\hat{W}_0$ is the approximation of $W_0$; $L \in \Re^{12\times 12}$ is a positive matrix.
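The structure of an RBF approximator of the form $\hat{W}_0^T\phi_0(x)$ can be sketched as follows. Gaussian basis functions are assumed here, and the centers, the width, the weight values and the function names are hypothetical; the paper's own choice of basis and its weight adaptation law are not reproduced.

```python
import math

def rbf_features(x, centers, width):
    """Gaussian basis phi_i(x) = exp(-||x - c_i||^2 / (2 * width^2))."""
    feats = []
    for c in centers:
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(math.exp(-dist2 / (2.0 * width ** 2)))
    return feats

def rbf_output(weights, feats):
    """Network output W^T phi(x); weights[i] holds the output weights of neuron i."""
    n_out = len(weights[0])
    return [sum(weights[i][j] * feats[i] for i in range(len(feats)))
            for j in range(n_out)]

# Two hidden neurons and two outputs, with made-up numbers.
centers = [[0.0, 0.0], [1.0, 0.0]]
W0_hat = [[0.5, -1.0], [2.0, 0.0]]
phi = rbf_features([0.0, 0.0], centers, width=1.0)
f_hat = rbf_output(W0_hat, phi)
```

Each basis function equals 1 at its own center and decays with distance, so the weights of a neuron mainly influence the estimate in the region of the state space near that neuron's center.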

Design of critic neural network
The ADP controller consists of a critic neural network and an action neural network. The critic neural network is utilized to approximate $V_1^*(x,\mu^*)$.
where $W_c^T \in \Re^{l_1}$ is the ideal weight; $\phi_c(x,\mu) \in \Re^{l_1}$ is the activation function; $l_1$ is the number of neurons in the hidden layer; $\varepsilon_c$ is the approximation error.
Then, $V_3(x,\mu)$ is approximated as follows: where $\hat{W}_c$ is the approximation of $W_c$. The derivative of $\hat{V}_3(Z)$ can be expressed as follows: Then, the approximate Hamiltonian function can be expressed as follows: Given any admissible control policy $\mu$, it is desired to select $\hat{W}_c$ to minimize the squared residual error $E_c(\hat{W}_c)$ as The weight update law for the critic neural network is given as follows: where $\varrho_1 > 0$ is the learning rate of the critic neural network; $\varsigma_1 = \nabla\phi_c(x,\mu)(\varpi(x) + \rho(x)(\mu - \hat{f})) - \gamma\phi_c(x,\mu)$ and $\varsigma_1 \in \Re^{l_1}$. The approximate weight error of the critic neural network is defined as $\tilde{W}_c = W_c - \hat{W}_c$. Then, equation (36) can be transformed as follows:
Assumption 4 $\|\nabla\varepsilon_c(\varpi(x) + \rho(x)(\mu - \hat{f})) + \gamma\varepsilon_c\| \le \delta_9$ and $\varsigma_{\min} \le \|\varsigma_1\| \le \varsigma_{\max}$, where $\delta_9$, $\varsigma_{\min}$ and $\varsigma_{\max}$ are positive constants.
Proof Select a Lyapunov function as Then, the time derivative of $V_4$ is Hence, $\dot{V}_4 < 0$ if $\|\tilde{W}_c\| > \delta_9/\varsigma_{\min}$. The approximate weight error is UUB, according to the Lyapunov stability theorem. This completes the proof.
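Minimizing a squared residual by gradient descent can be sketched in a few lines. The code below assumes that the approximate Hamiltonian residual has the linear-in-weights form $e_c = U + \hat{W}_c^T\varsigma_1$, where `u` stands for all terms that do not depend on the critic weights, and that the update is a plain (unnormalized) gradient step on $E_c = \frac{1}{2}e_c^2$. Both the form and the numbers are assumptions made for illustration, not the paper's exact update law.

```python
def critic_residual(w_c, sigma, u):
    """Residual e_c = u + W_c^T sigma of the (assumed) approximate Hamiltonian."""
    return u + sum(w * s for w, s in zip(w_c, sigma))

def critic_step(w_c, sigma, u, rate, dt):
    """One Euler step of dW_c/dt = -rate * e_c * sigma,
    i.e. gradient descent on E_c = 0.5 * e_c^2."""
    e_c = critic_residual(w_c, sigma, u)
    return [w - rate * dt * e_c * s for w, s in zip(w_c, sigma)]

w_c = [0.0, 0.0]
sigma = [1.0, -2.0]   # hypothetical value of the regressor sigma_1
u = 3.0               # hypothetical value of the weight-independent terms
before = abs(critic_residual(w_c, sigma, u))
for _ in range(50):
    w_c = critic_step(w_c, sigma, u, rate=1.0, dt=0.01)
after = abs(critic_residual(w_c, sigma, u))
```

With a fixed regressor, each step multiplies the residual by $1 - \varrho_1\,\Delta t\,\|\varsigma_1\|^2$, so the residual shrinks geometrically as long as that factor lies between 0 and 1. This also indicates why a lower bound on $\|\varsigma_1\|$, as in Assumption 4, matters for convergence.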

Design of action neural network
The optimal control $\mu^*$ is approximated by the action neural network as where $W_a^T \in \Re^{l_2}$ is the ideal weight; $\phi_a(x) \in \Re^{l_2}$ is the activation function; $l_2$ is the number of neurons in the hidden layer; $\varepsilon_a$ is the approximation error.
Because the ideal weight $W_a^T$ is unknown, $\mu^*$ is approximated as follows: where $\hat{W}_a$ is the estimate of $W_a$. The approximate feedback error used for training the action neural network is defined as the difference between the feedback control input applied to the error tracking system (10) and the optimal control $\mu^*$ as $\hat{e}_a = \hat{W}_a\phi_a(x) + \frac{1}{2}$ The action neural network is defined to minimize the objective function as The weight updating law for the action neural network is given as follows: where $\varrho_2 > 0$ is the learning rate of the action neural network. According to equations (16), (29) and (40), we have
Gaofeng Che, Zhen Yu*
The approximate weight error of the action neural network is defined as $\tilde{W}_a = W_a - \hat{W}_a$. Then, equation (44) can be transformed as follows:
Assumption 5 $\|R^{-1}\rho^T(x)(\nabla\phi_c(x,\mu))^T\| \le \delta_{10}$, $\|\varepsilon_a + \frac{1}{2}R^{-1}\rho(x)^T\nabla\varepsilon_c\| \le \delta_{11}$ and $\phi_{a,\min} \le \|\phi_a(x)\| \le \phi_{a,\max}$, where $\delta_{10}$, $\delta_{11}$, $\phi_{a,\min}$ and $\phi_{a,\max}$ are positive constants.
Theorem 3 The approximate weight error is UUB, if the weight of the action neural network is updated by (46).
Proof Select a Lyapunov function as Then, the time derivative of $V_5$ is $\phi_{a,\min}$ . The weight approximation error is UUB, according to the Lyapunov stability theorem. This completes the proof.
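For reference, the training signal of the action network can be written out under the usual ADP assumptions. The expressions below assume the standard stationarity condition for the optimal control and a gradient-descent update on $E_a$; the bounds in Assumption 5 are consistent with this form, but it is a sketch under those assumptions rather than a transcription of the paper's equations.

```latex
\begin{aligned}
\mu^* &= -\tfrac{1}{2}R^{-1}\rho^T(x)\,\nabla V_1^*, \\
\hat{e}_a &= \hat{W}_a^T\phi_a(x) + \tfrac{1}{2}R^{-1}\rho^T(x)\,\big(\nabla\phi_c(x,\mu)\big)^T\hat{W}_c, \\
E_a &= \tfrac{1}{2}\,\hat{e}_a^T\hat{e}_a, \qquad
\dot{\hat{W}}_a = -\varrho_2\,\phi_a(x)\,\hat{e}_a^T .
\end{aligned}
```

In this form $\hat{e}_a$ vanishes exactly when the action network output equals the control implied by the current critic estimate, so the action weights are driven toward the critic's greedy policy.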
Theorem 4 With the performance index function (11), the error tracking system (10) can be guaranteed to be UUB by the approximate fault-tolerant tracking control policy (41).
Proof Select a Lyapunov function as According to equations (13) and (15), equation (50) can be transformed as follows: The error tracking system (10) is UUB, according to the Lyapunov stability theorem. This completes the proof.