Gradient iterative algorithm for rational models based on Gram-Schmidt orthogonalization method

An improved gradient iterative algorithm, termed the Gram-Schmidt orthogonalization based gradient iterative algorithm, is proposed for rational models in this paper. The algorithm can obtain the optimal parameter estimates in one iteration because the information vectors obtained by the Gram-Schmidt orthogonalization method are independent of each other. Compared with the least squares algorithm and the traditional gradient iterative algorithm, the proposed algorithm requires neither matrix inversion nor eigenvalue calculation, so it can be applied to nonlinear systems with complex structures and to large-scale systems. Since the information vector of a rational model contains the latest output, which is correlated with the noise, a biased compensation Gram-Schmidt orthogonalization based gradient iterative algorithm is introduced, by which unbiased parameter estimates can be obtained. Two simulation examples demonstrate the efficiency of the proposed algorithm.


Introduction
The major objective of control engineering is to design a controller that forces the dynamics of a control system to track the desired trajectory [1,2,3]. Robust controller design usually assumes that the parameters of the control system are known a priori. Therefore, system identification plays an important role in control engineering and has become a hot and challenging topic in research and applications [4,5,6]. Plenty of identification algorithms exist, including the least squares (LS) algorithms [7,8], the gradient iterative (GI) algorithms, and the expectation maximization algorithms [9,10,11].
The LS algorithms usually compute the parameter estimates by minimizing a quadratic cost function, so a matrix inversion is involved [12,13,14]. In addition, the derivative of the cost function must be analytic; otherwise, the parameter estimates may be unsolvable. Therefore, the LS algorithm is inefficient for systems with more complex structures [15,16]. The GI algorithm does not require solving such an equation, so it can be used for nonlinear system identification [17,18]. However, it has slower convergence rates because of its zigzagging nature. To increase the convergence rates, two approaches are commonly considered: choosing a better direction and computing a suitable step-length, e.g., the forgetting factor GI algorithm [19] and the conjugate GI algorithm [20]. Once the direction is determined, the corresponding step-length should be calculated, which raises a challenging question: how can an available step-length be obtained? In general, the largest eigenvalue of an information-related matrix must be found. When the order of the matrix is large, the eigenvalue calculation is difficult or even impossible. That is, the traditional GI algorithm is inefficient for large-scale system identification.
Rational models widely exist in the life sciences, for example in the bioengineering field [21] and in the chemical industry [22]. Rational model identification is more difficult than polynomial nonlinear model identification [23], because the information vector of the rational model is correlated with the noise, which makes the parameter estimates biased [24,25,26]. Therefore, a bias compensation term should be introduced to eliminate the biased term. For example, Chen et al. proposed a biased compensation recursive least squares algorithm for rational models with time delay [27]. Zhu et al. provided an implicit LS algorithm for rational models, in which the bias compensation term is constituted of the noise estimates [28]. The basic idea of the bias compensation based methods is to update the noise estimates in each iteration. Note that the GI algorithm has slow convergence rates, so using the initial, inaccurate parameter estimates to compute the noise estimates may lead to divergence of the GI algorithm.
The reason why the GI algorithm has slow convergence rates is that the information vectors in the information matrix are dependent on each other. Inspired by the Gram-Schmidt orthogonalization method [29,30], we can make the information vectors orthogonal to each other to improve the convergence rate of the GI algorithm. The focus of this paper is to propose a biased compensation Gram-Schmidt orthogonalization based GI algorithm for rational models, whose contributions are listed as follows: (1) The Gram-Schmidt orthogonalization based GI algorithm has quicker convergence rates (it can obtain the optimal estimates in one iteration).
(2) The Gram-Schmidt orthogonalization based GI algorithm is robust to the initial parameter estimates.
(3) The Gram-Schmidt orthogonalization based GI algorithm does not require matrix inversion and eigenvalue calculation, so it can be applied to large-scale systems and nonlinear systems.
(4) A bias compensation term is involved in each iteration, which yields unbiased parameter estimates.
The rest of this paper is organised as follows. Section 2 introduces the rational model. Section 3 proposes a biased compensation GI algorithm. Section 4 develops a Gram-Schmidt orthogonalization based GI algorithm. Section 5 provides two simulation examples. Finally, Section 6 summarises the study and discusses future directions.

Biased compensation gradient iterative algorithm
To estimate the parameters ϑ from the collected input-output data, the LS algorithms are widely used. However, the computational effort of the LS algorithms increases dramatically as the order m + n − 1 becomes larger [31,32]. To deal with this problem, a biased compensation GI algorithm is proposed in this paper.

Traditional GI algorithm
Define the cost function

J(θ) = ‖Ȳ(N) − Φ(N)θ‖².

The GI algorithm includes two steps: the first step is to determine an optimal direction which makes the estimates move toward the true values; the second step is to compute a step-length along that direction. Assume that the parameter vector estimate in iteration k − 1 is θ̂_{k−1}. We then seek a better parameter vector estimate

θ̂_k = θ̂_{k−1} + λ_{k−1} d_{k−1}

that ensures J(θ̂_k) ⩽ J(θ̂_{k−1}), where d_{k−1} is the direction and λ_{k−1} its corresponding step-length. To keep J(θ̂_k) ⩽ J(θ̂_{k−1}), the best direction d_{k−1} is the negative gradient of the cost function,

d_{k−1} = Φᵀ(N)[Ȳ(N) − Φ(N)θ̂_{k−1}].

According to [30], the corresponding step-length λ_{k−1} can be computed by

λ_{k−1} = 1/λ_max[Φᵀ(N)Φ(N)],

where λ_max[·] denotes the largest eigenvalue. Then the traditional GI (T-GI) algorithm for estimating ϑ is summarized as

θ̂_k = θ̂_{k−1} + λ_{k−1} Φᵀ(N)[Ȳ(N) − Φ(N)θ̂_{k−1}].    (7)
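As an illustration, the T-GI recursion above can be sketched in a few lines of NumPy. The data here are synthetic and noise-free, and the model sizes and iteration count are arbitrary choices for the demonstration, not the paper's setup:

```python
import numpy as np

def t_gi(Phi, Y, n_iter=500):
    """Traditional gradient iterative (T-GI) estimation:
    theta_k = theta_{k-1} + lam * Phi^T (Y - Phi theta_{k-1}),
    with step-length lam = 1 / lambda_max(Phi^T Phi)."""
    lam = 1.0 / np.linalg.eigvalsh(Phi.T @ Phi).max()  # largest eigenvalue
    theta = np.zeros(Phi.shape[1])                     # initial estimate
    for _ in range(n_iter):
        theta = theta + lam * (Phi.T @ (Y - Phi @ theta))
    return theta

# Synthetic, noise-free data: the iterates should approach the true theta.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 3))
theta_true = np.array([0.5, -1.2, 0.8])
theta_hat = t_gi(Phi, Phi @ theta_true)
```

Note the eigenvalue computation hidden in the step-length: this is exactly the cost that the GO-GI algorithm of Section 4 removes.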

Convergence property of the T-GI algorithm
Subtracting ϑ from both sides of Equation (7) yields the estimation error recursion

ε_k := θ̂_k − ϑ = ε_{k−1} + λ_{k−1} Φᵀ(N)[Ȳ(N) − Φ(N)θ̂_{k−1}].    (8)

Let Ȳ(N) = Φ(N)ϑ + E(N). Equation (8) can then be transformed into

ε_k = [I − λ_{k−1} Φᵀ(N)Φ(N)] ε_{k−1} + λ_{k−1} Φᵀ(N)E(N).

Since the eigenvalues of I − λ_{k−1}Φᵀ(N)Φ(N) lie in [0, 1), the GI algorithm is convergent. However, E(N) is correlated with the information matrix Φ(N); thus the T-GI algorithm is biased.
Remark 1. Since the information matrix contains the current output y(t), which is correlated with the noise e(t), the T-GI algorithm is biased.
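The contraction property of the error recursion is easy to check numerically. The sketch below uses an arbitrary random information matrix and the common step-length choice λ = 1/λ_max[Φᵀ(N)Φ(N)], and verifies that the eigenvalues of I − λΦᵀ(N)Φ(N) lie in [0, 1):

```python
import numpy as np

# With lam = 1/lambda_max(Phi^T Phi), the eigenvalues of the error-recursion
# matrix I - lam * Phi^T Phi are 1 - mu_i/mu_max, i.e. they lie in [0, 1),
# so the homogeneous part of eps_k = (I - lam Phi^T Phi) eps_{k-1} + ...
# is non-expansive and the iteration converges.
rng = np.random.default_rng(1)
Phi = rng.normal(size=(50, 4))
G = Phi.T @ Phi
lam = 1.0 / np.linalg.eigvalsh(G).max()
mu = np.linalg.eigvalsh(np.eye(4) - lam * G)
```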

Biased compensation gradient iterative algorithm
To obtain unbiased parameter estimates, the biased compensation GI algorithm is introduced. In iteration k − 1, the parameter estimates are θ̂_{k−1}; the noise estimates can then be computed by

ê_{k−1}(t) = y(t) − φᵀ(t)θ̂_{k−1},

and the noise estimate vector can be expressed by

Ê_{k−1}(N) = [ê_{k−1}(1), ê_{k−1}(2), …, ê_{k−1}(N)]ᵀ.

The noise estimates are applied to eliminate the biased term in the parameter estimates of the T-GI algorithm, and the biased compensation gradient iterative (BC-GI) algorithm is written as

θ̂_k = θ̂_{k−1} + λ_{k−1} {Φᵀ(N)[Ȳ(N) − Φ(N)θ̂_{k−1}] + Δ_{k−1}},

where the compensation term Δ_{k−1} is constructed from the noise estimates Ê_{k−1}(N). In the BC-GI algorithm, the parameter estimation error ε_k satisfies

ε_k = [I − λ_{k−1} Φᵀ(N)Φ(N)] ε_{k−1} + λ_{k−1} [Φᵀ(N)E(N) + Δ_{k−1}].

Note that Φᵀ(N)Φ(N) is a symmetric positive definite matrix, so all the eigenvalues of the matrix I − λ_{k−1}Φᵀ(N)Φ(N) are smaller than 1 and no less than zero. Thus ε_k tends to zero in the mean, which means that the BC-GI algorithm is convergent.
Remark 2. In the BC-GI algorithm, the compensation term is introduced in each iteration to eliminate the biased term in the parameter estimates. Thus, the parameter estimates obtained by the BC-GI algorithm are unbiased.
In summary, the BC-GI algorithm consists of the following steps:
1) Let θ̂_0 = 1/p_0, with 1 being a column vector whose entries are all unity and p_0 = 10^6.
2) Collect the input-output data and form Ȳ(N) and Φ(N).
3) Compute the step-length λ = 1/λ_max[Φᵀ(N)Φ(N)].
4) Set k = 1.
5) Compute the noise estimates ê_{k−1}(t) and form the compensation term.
6) Update the parameter estimates θ̂_k by the BC-GI recursion.
7) If the stopping criterion is satisfied, terminate the procedure and obtain θ̂_k; otherwise, increase k by 1 and go to step 5).
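Because the compensation term depends on the model structure, a concrete sketch is easiest on a toy rational model. The model below, the way the noise-carrying regressor is rebuilt from the "cleaned" output, and all numerical choices are illustrative assumptions, not the paper's example or its exact compensation formula:

```python
import numpy as np

# Toy implicit rational model: y(t) = th1*u(t) - th2*y(t)*u(t) + e(t).
# The regressor -y(t)*u(t) contains the noisy current output, so a plain
# gradient step is biased; each BC-GI iteration below re-estimates the
# equation noise from theta_{k-1} and rebuilds that regressor with the
# compensated output y(t) - e_hat(t) (an assumed compensation scheme).
rng = np.random.default_rng(2)
th_true = np.array([0.8, 0.2])
N = 5000
u = rng.uniform(-2.0, 2.0, size=N)           # keeps 1 + th2*u away from zero
e = 0.1 * rng.normal(size=N)
y = (th_true[0] * u + e) / (1.0 + th_true[1] * u)   # solve the model for y(t)

theta = np.full(2, 1e-6)                     # step 1: theta_0 = 1/p0, p0 = 1e6
for k in range(200):
    e_hat = y + theta[1] * y * u - theta[0] * u      # noise estimates from theta_{k-1}
    Phi = np.column_stack([u, -(y - e_hat) * u])     # compensated information matrix
    lam = 1.0 / np.linalg.eigvalsh(Phi.T @ Phi).max()
    theta = theta + lam * (Phi.T @ (y - Phi @ theta))
```

In the noise-free case the true parameters are an exact fixed point of this iteration; with noise, the compensated regressor keeps the estimates close to the true values where the plain T-GI step would drift.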

Gram-Schmidt orthogonalization based gradient iterative algorithm
The BC-GI algorithm needs to compute the largest eigenvalue of a matrix. When the matrix order is large, the difficulty of this calculation increases accordingly. In addition, the BC-GI algorithm has slow convergence rates because of its zigzagging nature. In this section, we develop a Gram-Schmidt orthogonalization based gradient iterative (GO-GI) algorithm, which avoids computing eigenvalues and has quicker convergence rates.

Gram-Schmidt orthogonalization based gradient iterative algorithm
In the GO-GI algorithm, each element of the parameter vector has its own direction and step-length. Let the collected input-output data be Φ(N) and Ȳ(N), as defined in Section 2.
It follows that

Ȳ(N) = [ω(1), ω(2), …, ω(n + m − 1)] ϑ + E(N),

where ω(i) denotes the i-th column of the information matrix Φ(N).
Define the cost function

J(ϑ) = ‖Ȳ(N) − [ω(1), …, ω(n + m − 1)]ϑ‖².

Since ω(i) is dependent on ω(j) when i ≠ j, the estimate of the parameter attached to ω(i) has an effect on the estimate of the parameter attached to ω(j), which leads to slow convergence rates. To overcome this difficulty, the Gram-Schmidt orthogonalization method is applied.
The new information vectors L(j) are formed as

L(1) = ω(1),
L(j) = ω(j) − Σ_{i=1}^{j−1} [Lᵀ(i)ω(j) / (Lᵀ(i)L(i))] L(i),   j = 2, …, n + m − 1.

It follows that

[ω(1), …, ω(n + m − 1)] = [L(1), …, L(n + m − 1)] W,

where W is a unit upper triangular matrix with entries w_{ij} = Lᵀ(i)ω(j)/(Lᵀ(i)L(i)) for i < j. The rational model is then equivalent to

Ȳ(N) = [ω(1), …, ω(n), ω(n + 1), …, ω(n + m − 1)]ϑ + E(N) = [L(1), …, L(n + m − 1)] W ϑ + E(N).

Let α := Wϑ = [α_1, …, α_{m+n−1}]ᵀ. Then we have

Ȳ(N) = [L(1), …, L(n + m − 1)] α + E(N).

Define the cost function

J(α) = ‖Ȳ(N) − Σ_{i=1}^{m+n−1} L(i)α_i‖².

Assume that the parameter estimates in iteration k − 1 are α̂_{k−1}. The purpose of the GO-GI algorithm is to find a new parameter vector estimate α̂_k which ensures J(α̂_k) ⩽ J(α̂_{k−1}). Each element is updated along its own direction d_{k−1,i} with its own step-length γ_{k−1,i} > 0:

α̂_{k,i} = α̂_{k−1,i} + γ_{k−1,i} d_{k−1,i}.    (20)

To keep J(α̂_k) ⩽ J(α̂_{k−1}), and because L(i) is orthogonal to L(j) for i ≠ j, we take the first-order derivative of J(α) with respect to α_1, …, α_{m+n−1}, respectively, to obtain the optimal directions

d_{k−1,i} = Lᵀ(i)[Ȳ(N) − L(i)α̂_{k−1,i}].

Substituting Equation (20) into the cost function and taking the first-order derivative of J(γ_1, …, γ_{m+n−1}) with respect to γ_1, …, γ_{m+n−1}, respectively, gives rise to

γ_{k−1,i} = 1/(Lᵀ(i)L(i)),   i = 1, 2, …, m + n − 1.
Equation (20) can thus be simplified as

α̂_{k,i} = Lᵀ(i)Ȳ(N) / (Lᵀ(i)L(i)).    (21)

Remark 3. Unlike the least squares algorithm in [33], the GO-GI algorithm avoids the matrix inversion (only the scalars Lᵀ(i)L(i) need to be calculated). Thus it requires less computational effort.
Remark 4. Once the parameter estimates α̂_k are obtained, the original parameter estimates θ̂_k can be computed by

θ̂_k = W⁻¹ α̂_k,

where the inverse is cheap to apply because W is a unit upper triangular matrix (back-substitution).
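The full GO-GI pipeline — orthogonalize, estimate each α_i separately, recover θ̂ — can be sketched as follows (classical Gram-Schmidt for clarity; the sizes and data are arbitrary test values, not a model from the paper). The one-shot estimate should coincide with the least squares solution, which is one way to check the one-iteration optimality discussed below:

```python
import numpy as np

def go_gi(Phi, Y):
    """GO-GI sketch: orthogonalize the columns of Phi by Gram-Schmidt,
    estimate each transformed parameter as alpha_i = L(i)^T Y / (L(i)^T L(i)),
    then recover theta from alpha = W theta, where W is unit upper
    triangular (so the solve amounts to back-substitution)."""
    p = Phi.shape[1]
    L = Phi.astype(float).copy()
    W = np.eye(p)
    for j in range(p):
        for i in range(j):
            W[i, j] = (L[:, i] @ Phi[:, j]) / (L[:, i] @ L[:, i])
            L[:, j] -= W[i, j] * L[:, i]
    alpha = np.array([(L[:, i] @ Y) / (L[:, i] @ L[:, i]) for i in range(p)])
    return np.linalg.solve(W, alpha)   # theta_hat = W^{-1} alpha

# The one-shot GO-GI estimate matches the least squares solution.
rng = np.random.default_rng(3)
Phi = rng.normal(size=(200, 4))
Y = Phi @ np.array([1.0, -0.5, 0.3, 2.0]) + 0.01 * rng.normal(size=200)
theta_hat = go_gi(Phi, Y)
```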

Properties of the GO-GI algorithm
In order to show the effectiveness of the GO-GI algorithm, some properties of the GO-GI algorithm are given in this subsection. Based on Equation (21),

α̂_{k,i} = Lᵀ(i)Ȳ(N) / (Lᵀ(i)L(i)),   i = 1, 2, …, m + n − 1.    (22)
Equation (22) shows that the elements of the parameter vector can be estimated separately by the GO-GI algorithm.
Remark 5. Compared with the T-GI algorithm [34], the GO-GI algorithm does not require the eigenvalue calculation and thus can be applied to large-scale system identification.
Remark 6. Equation (22) shows that the parameter estimates in iteration k are independent of the estimates in iteration k − 1, which indicates that the GO-GI algorithm reaches the local optimum in one iteration, regardless of the initial parameter estimates.
Remark 7. Assume that the noise E(N) is uncorrelated with the information matrix Φ(N). Using the traditional least squares (LS) algorithm for the considered model, the parameter estimates are

θ̂_LS = [Φᵀ(N)Φ(N)]⁻¹ Φᵀ(N)Ȳ(N),

which means that the LS algorithm can also reach the local optimum in one iteration. However, the computational effort of the LS algorithm is heavier than that of the GO-GI algorithm; see Table 1 (only multiplications and divisions are counted).
For rational models, however, E(N) is correlated with Φ(N), so a bias compensation term is again required, yielding the biased compensation GO-GI (BC-GO-GI) algorithm. Since L(i) is orthogonal to L(j) for i ≠ j, the update can be simplified to

α̂_{k,i} = Lᵀ(i)[Ȳ(N) − Ê_{k−1}(N)] / (Lᵀ(i)L(i)),   i = 1, 2, …, m + n − 1,    (25)

where the noises Ê_{k−1}(N) are estimated from the rational model using the parameter estimates α̂_{k−1}.
Remark 8. Equation (25) shows that the GO-GI algorithm can obtain the optimal parameter estimates in one iteration if the noise is Gaussian white and uncorrelated with the information vector.
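A single BC-GO-GI pass then just applies the compensated per-coordinate update. In the sketch below, the orthogonal regressors come from a QR factorization and the noise estimate vector is passed in explicitly; the exact construction of the noise estimates from α̂_{k−1} depends on the rational model structure and is assumed given:

```python
import numpy as np

def bc_go_gi_step(L, Y, E_hat):
    """One BC-GO-GI update (sketch): every alpha_i is computed
    independently from its orthogonal regressor L(i), with the current
    noise estimates subtracted from the output to compensate the bias."""
    Yc = Y - E_hat
    return np.array([(L[:, i] @ Yc) / (L[:, i] @ L[:, i])
                     for i in range(L.shape[1])])

# With zero noise estimates this reduces to the plain GO-GI update.
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.normal(size=(50, 3)))   # orthonormal columns as L(i)
Y = rng.normal(size=50)
alpha = bc_go_gi_step(Q, Y, np.zeros(50))
```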
Remark 9. Since the parameter estimates are not accurate at first, the noise estimates based on them are also inaccurate. However, as the iteration progresses, the errors between the estimates and the true values diminish. Thus, both kinds of estimates eventually approach their true values.

Example 1
Consider the rational model proposed in [27]. In simulation, the input {u(t)} satisfies N(0, 1), {e(t)} is a white noise satisfying N(0, 0.1²), and the number of input-output data is N = 1000. The simulation data are depicted in Figure 1. Applying the T-GI and BC-GI algorithms to the rational model, the parameter estimates and their errors τ = ‖θ̂_k − ϑ‖/‖ϑ‖ are shown in Figure 2 and Table 2. Next, apply the GO-GI algorithm; the parameter estimates and their errors are shown in Table 3. Finally, apply the BC-GO-GI algorithm to the rational model; the parameter estimates and their errors are shown in Table 4. The element relative errors of the parameter estimates obtained by the T-GI, BC-GI, GO-GI and BC-GO-GI algorithms with different iterations (k = 2, 3, 5) are shown in Figure 3. Assume that the outputs to be predicted over [1001, 1200] are ŷ(t). Using the parameter estimates θ̂_1000 to predict the outputs yields

ŷ(t) = φᵀ(t)θ̂_1000,   t = 1001, 1002, …, 1200,

while the true outputs over the same interval are generated by the rational model. The output estimation errors δ = y(t) − ŷ(t) are shown in Figure 4.
From these results, the following findings can be obtained: (1) Figure 2 and Table 2 show that the BC-GI algorithm is more effective than the T-GI algorithm; (2) Tables 3 and 4 demonstrate that the BC-GO-GI algorithm gives more accurate parameter estimates; (3) Tables 3 and 4 illustrate that the GO-GI algorithm can obtain the optimal estimates in one iteration; (4) Tables 2 and 4 show that the BC-GO-GI algorithm is more efficient than the BC-GI algorithm; (5) Figures 3 and 4 demonstrate that the BC-GO-GI algorithm is the most effective among the four algorithms.

Example 2
Consider the following Michaelis-Menten model in [35]:

y(t) = α u(t) / (β + u(t)) + v(t).

This model is usually applied to describe enzyme kinetics: y(t) is the initial velocity of an enzymatic reaction and u(t) is the substrate concentration. The parameters are α = 0.0641 and β = 212.6837.
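The Michaelis-Menten model y(t) = αu(t)/(β + u(t)) + v(t) can be put into the linear-in-parameters rational form used above: multiplying through by (β + u(t)) and dividing by β gives y(t) = (α/β)u(t) − (1/β)y(t)u(t) + w(t), so θ1 = α/β and θ2 = 1/β, hence α = θ1/θ2 and β = 1/θ2. A sketch of this reparameterization follows; the substrate levels, noise size, and the plain LS estimator are illustrative choices, not the paper's simulation setup, and the regressor −y(t)u(t) carries the noise, which is exactly the bias the BC-GO-GI algorithm targets:

```python
import numpy as np

alpha, beta = 0.0641, 212.6837
rng = np.random.default_rng(5)
N = 2000
u = 1.0 + 150.0 * rng.random(size=N)   # assumed positive substrate levels
v = 1e-5 * rng.normal(size=N)          # assumed small measurement noise
y = alpha * u / (beta + u) + v         # Michaelis-Menten responses

# Linear-in-parameters rational form: y = th1*u - th2*y*u + noise
Phi = np.column_stack([u, -y * u])
th1, th2 = np.linalg.lstsq(Phi, y, rcond=None)[0]
alpha_hat, beta_hat = th1 / th2, 1.0 / th2
```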
In the simulation, the input {u(t)} satisfies u(t) ∼ N(0, 1) and the noise satisfies v(t) ∼ N(0, 0.1²). Applying the GO-GI and BC-GO-GI algorithms to the Michaelis-Menten model, the parameter estimates and their estimation errors are shown in Tables 5 and 6. The tables also show that the BC-GO-GI algorithm is more effective than the GO-GI algorithm.

Conclusions
This paper proposes a Gram-Schmidt orthogonalization based gradient iterative (GO-GI) algorithm for rational models, in which each information vector in the information matrix is independent of the others. Therefore, the GO-GI algorithm can obtain the optimal estimates in one iteration no matter what the initial parameter estimates are. Since some information vectors contain the current outputs, which are correlated with the noises, the estimates of the GO-GI algorithm are biased. To overcome this difficulty, a BC-GO-GI algorithm is developed.
The simulation examples demonstrate that the BC-GO-GI algorithm is effective.
The GO-GI algorithm presented in this paper avoids matrix inversion, requires less computation, and has a quicker convergence rate. Therefore, we believe this study helps make the GI algorithm an attractive choice for system identification and may open a new direction for applying the GO-GI algorithm to different kinds of systems.

Conflict of interest
The authors declare that they have no conflict of interest regarding this work.