Efficient iterative algorithms for a class of nonlinear systems using the block matrix inversion and hierarchical principle

This paper is concerned with the identification of bilinear systems in the state-space form. The parameters to be identified are coupled with the unknown states, which makes the identification problem difficult. To deal with this difficulty, the iterative estimation theory is used to derive a joint parameter and state estimation algorithm. Specifically, a moving data window least squares-based iterative (MDW-LSI) algorithm is derived to estimate the parameters by using the window data. Then, the unknown states are estimated by a bilinear state estimator. Furthermore, to improve the computational efficiency, a matrix decomposition-based MDW-LSI algorithm and a hierarchical MDW-LSI algorithm are developed according to the block matrix inversion and the hierarchical identification principle. Finally, the computational efficiency is discussed and numerical simulations are employed to test the proposed approaches.


Introduction
Nonlinear systems widely exist in practical industrial areas. There has been great interest in the modeling, analysis, synthesis, and control of nonlinear systems, which involve various descriptive models, such as neural networks [1], nonlinear time series models [2], rational models [3], and Hammerstein-Wiener structures [4], to name just some examples. As a particular class of nonlinear systems, bilinear systems have relatively simple model structures but can approximately describe some complex dynamic processes such as nuclear fission, heat transfer and automobile braking systems [5,6]. Specifically, the output of the chemical process described in [7] was affected by the product of the external input and the system state, where the external input was the flow rate and the system state was the temperature. Motivated by the practicality and importance of bilinear systems, the studies of bilinear systems have broadened to cover parameter identification [8,9] and state estimation [10,11].
Bilinear modelling approaches such as Volterra series [12], Hartley-based modulating functions [13] and Walsh functions [14] have been widely studied. Estimating the parameters of bilinear systems is more challenging than estimating the parameters of linear systems because of the coupling terms of unknown states and parameters. For the purpose of solving this problem, deriving the input-output representation of bilinear systems is a common approach [15-17]. This approach eliminates the need to consider the effect of unknown states on the parameter estimation by removing the state variables. In addition, subspace identification algorithms have been extended to multivariable bilinear systems [18-20], but the bilinear structure results in complicated calculations and a heavy computational cost. To address this issue, Verdult and Verhaegen presented a kernel method whose cost depends on the dimensions of the kernel matrix [21]. The objective of this paper is to propose estimation algorithms that can reduce the computational cost.
For systems with large dimensions, the numbers of variables and of parameters to be estimated increase, so the computational load of the estimation method grows sharply and the implementation of the algorithm becomes difficult. Matrix decomposition is an effective way to reduce the computational cost by using the block matrix inversion; it optimizes the calculation process without changing the identification model. In addition to matrix decomposition, the hierarchical identification principle is also applied in system identification to deal with algorithms that are hard to implement because of their large amount of computation. Based on the combination of the hierarchical identification principle and various estimation strategies, new identification algorithms have been proposed for different systems [22,23], including multivariable systems [24], nonlinear systems [25] and dual-rate stochastic systems [26]. The basic idea is to transform the original identification model into several sub-models of small sizes and to identify these sub-models in an interactive way. The two-stage iterative algorithm developed by Fan and Liu, for example, dealt with the variable-gain nonlinearity problem by decomposing the original model into two parts [27].
When a batch of sampled data is collected, the iterative algorithm can use all the data to estimate the parameters in one iteration. Hence, by increasing the data utilization through multiple iterations, iterative algorithms can be applied in cases where high parameter estimation accuracy is required [28-30]. On the basis of the least squares-based iterative (LSI) algorithm in the previous work [31], this paper improves the algorithm performance by combining the LSI algorithm with the moving data window technique. The moving data window means that the data are updated over time by adding the new observations and eliminating the old ones, so the proposed algorithm uses both past and current data at each step. Unlike the gradient-based iterative algorithm in [32], whose gradient search only requires the first derivative, the algorithm proposed in this paper can obtain high-precision parameter estimates based on the least squares principle. Thus, this paper presents a least squares-based iterative algorithm over a moving data window (the MDW-LSI algorithm for short), which can improve the estimation accuracy. Moreover, in order to reduce the cost of computing the inverse of the covariance matrix in the MDW-LSI algorithm, two computationally efficient algorithms are presented from different aspects by introducing the block matrix approach and the hierarchical identification principle. The main contributions are summarized as follows.
• Exploiting the least squares principle and the moving data window technique, we present the MDW-LSI algorithm to estimate the parameters. As for the unmeasurable states, a state estimator is designed. Combining the MDW-LSI algorithm and the new state estimator, a state estimator-based MDW-LSI algorithm is obtained to realize the joint state and parameter estimation.
• Using the hierarchical identification principle, we decompose the identification model into two sub-models and derive a hierarchical MDW-LSI (H-MDW-LSI) algorithm that identifies the two sub-models with their associated terms and improves the computational efficiency.
• Introducing the inverse of the block matrix, we transform the matrix in the MDW-LSI algorithm and derive the matrix decomposition-based MDW-LSI (MD-MDW-LSI) algorithm for the purpose of reducing the computational complexity.
• The computational efficiencies of the proposed state estimator-based MD-MDW-LSI algorithm and state estimator-based H-MDW-LSI algorithm are compared with that of the state estimator-based MDW-LSI algorithm.
The remainder of this paper is organized as follows. Section 2 derives the identification model of bilinear systems. Section 3 proposes the state estimator-based MDW-LSI algorithm according to the iterative estimation theory. In Sections 4 and 5, an MD-MDW-LSI algorithm and an H-MDW-LSI algorithm are derived to improve the computational efficiency. The computational complexity of these algorithms is discussed in Section 6. The numerical example in Section 7 demonstrates the effectiveness of the proposed algorithms. Finally, some concluding remarks are given in Section 8.

System description and identification model
Some notations used throughout this paper are first introduced as follows.

Symbols and their meanings:
θ̂_{t,k}: the estimate of the parameter vector θ at iteration k and time t.
T (superscript): the transpose of a vector or matrix.
1_n: an n-dimensional column vector whose entries are all 1.
I (I_n): the identity matrix of appropriate size (or n × n).
X := A: X is defined by A.
p_0: a large positive constant, e.g., p_0 = 10^6.
Consider a bilinear state-space system described by its observer canonical form:

x_{t+1} = A x_t + F x_t u_t + b u_t,  (1)
y_t = c x_t + v_t,  (2)

where x_t := [x_{1,t}, x_{2,t}, ..., x_{n,t}]^T ∈ R^n is the state vector, u_t ∈ R and y_t ∈ R are the system input and output variables, v_t is an uncorrelated white noise with zero mean, and A ∈ R^{n×n}, F ∈ R^{n×n}, b ∈ R^n and c ∈ R^{1×n} are the parameter matrices or vectors of the system, with a := [-a_1, -a_2, ..., -a_{n-1}]^T ∈ R^{n-1} and f_i := [f_{i1}, f_{i2}, ..., f_{in}] ∈ R^{1×n}, i = 1, 2, ..., n. Assume that the dimension n of the system state is known, and that y_t = 0, u_t = 0, x_t = 0 and v_t = 0 for t ⩽ 0. Premultiplying (1) by z := [z^{-1}, z^{-2}, ..., z^{-n}] and eliminating the state variables through the intermediate relations (3)-(6) give the regression form

y_t = ψ_t^T θ + v_t,  (7)

where the parameter vector θ and the information vector ψ_t are defined by θ := [b_1, ..., b_n, f_1, ..., f_n, a_1, a_2, ..., a_n]^T ∈ R^{n²+2n} and ψ_t collects the corresponding input, input-state product and output data. Equation (7) is the parameter identification model of the bilinear state-space system. This paper considers the parameter identification problem in the case where the state x_t is unknown, presents a joint parameter and state estimation method based on the available inputs {u_t} and outputs {y_t}, and explores computationally efficient algorithms to reduce the computational cost.
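To make the state recursion concrete, (1) and (2) can be simulated directly. The sketch below uses a hypothetical second-order (n = 2) system; the matrices A, F, b, c, the noise level and the function name are illustrative assumptions, not the paper's simulation example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical n = 2 bilinear system (illustrative values only)
A = np.array([[-0.3, 1.0],
              [-0.2, 0.0]])    # observer-canonical-style transition matrix
F = np.array([[0.1, 0.2],
              [0.05, -0.1]])   # bilinear coupling matrix
b = np.array([0.5, 1.0])
c = np.array([1.0, 0.0])

def simulate(L, sigma=0.1):
    """Generate input/output data from x_{t+1} = A x_t + F x_t u_t + b u_t,
    y_t = c x_t + v_t, starting from x_0 = 0."""
    x = np.zeros(2)
    u = rng.standard_normal(L)   # persistent-excitation input
    y = np.empty(L)
    X = np.empty((L, 2))         # true states, kept for comparison
    for t in range(L):
        X[t] = x
        y[t] = c @ x + sigma * rng.standard_normal()
        x = A @ x + F @ x * u[t] + b * u[t]
    return u, y, X
```

The returned true states X would only be used to evaluate a state estimator; an identification algorithm sees u and y alone.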

The state estimator-based MDW-LSI algorithm
The following derives the parameter identification algorithm using the least squares principle and the moving data window technique. It is well known that the iterative algorithm processes all sampled data at each iteration and makes efficient use of the data, while the recursive algorithm can update the parameter estimates online [33,34]. Differing from the methods mentioned above, this section derives a state estimator-based MDW-LSI algorithm to estimate the unknown states and parameters of the considered bilinear state-space system. Consider the newest h data (h is the data window length) from t - h + 1 to t, and define the stacked output vector Y_{h,t} and the stacked information matrix Ψ_{h,t} as

Y_{h,t} := [y_t, y_{t-1}, ..., y_{t-h+1}]^T ∈ R^h,
Ψ_{h,t} := [ψ_t, ψ_{t-1}, ..., ψ_{t-h+1}]^T ∈ R^{h×(n²+2n)}.

A dynamic data window criterion function is expressed as

J_1(θ) := ||Y_{h,t} - Ψ_{h,t} θ||².

Using the least squares principle and minimizing J_1(θ) with respect to θ, we obtain the least squares estimate θ̂_t of the parameter vector θ:

θ̂_t = (Ψ_{h,t}^T Ψ_{h,t})^{-1} Ψ_{h,t}^T Y_{h,t}.  (9)

Faced with the difficulty that the information vector ψ_t in Ψ_{h,t} contains the unknown states x_{1,t-i} and x_{t-i}, i = 1, 2, ..., n, we cannot compute the parameter estimate θ̂_t by (9) directly. The optimal states are generally obtained by minimizing the state estimation error covariance matrix. Motivated by this idea, a state estimator is derived for bilinear systems based on the observation information.
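The window estimate (9) amounts to an ordinary least squares fit over the newest h rows of the stacked data. A minimal sketch follows; the construction of the regressor matrix is assumed to be done elsewhere, and `lstsq` is used instead of an explicit matrix inverse for numerical stability.

```python
import numpy as np

def mdw_ls_estimate(Psi, Y, h):
    """Least squares estimate over the moving data window: use only the
    newest h rows of the stacked regressor matrix Psi and output vector Y.
    A sketch of Eq. (9); the window is the last h samples."""
    Psi_h, Y_h = Psi[-h:], Y[-h:]
    # theta = (Psi_h^T Psi_h)^{-1} Psi_h^T Y_h, computed stably via lstsq
    theta, *_ = np.linalg.lstsq(Psi_h, Y_h, rcond=None)
    return theta
```

As new samples arrive, the slice simply shifts forward: the oldest row leaves the window and the newest enters, which is exactly the moving data window idea.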
Let x̂_t := [x̂_{1,t}, x̂_{2,t}, ..., x̂_{n,t}]^T ∈ R^n be the estimate of x_t. Referring to the method in [35], we construct a bilinear state estimator, given in (10)-(12). If the parameter matrices A and F and the vector b are known, we can generate the state estimate according to the state estimation algorithm in (10)-(12). Because the parameter matrices/vector A, F and b are unknown, the state estimate cannot be computed directly. To tackle this problem, the idea of iterative estimation is used. In the following, we define the iteration index k := 1, 2, 3, ..., and denote the estimates of A, F and b at iteration k and time t as Â_{t,k}, F̂_{t,k} and b̂_{t,k}. Then, replacing A, F and b in (10)-(12) with their corresponding estimates Â_{t,k}, F̂_{t,k} and b̂_{t,k} yields the bilinear state estimation algorithm in (13)-(19), where x̂_{t,k} := [x̂_{1,t,k}, x̂_{2,t,k}, ..., x̂_{n,t,k}]^T ∈ R^n is the state estimate at iteration k and time t, and P_{t,k} ∈ R^{n×n} and G_{t,k} ∈ R^n are the state estimation error covariance matrix and the gain vector, respectively. Remark 1: When setting the initial values of the above state estimation algorithm, if some specific prior knowledge of the model is available, for example, that the parameter values lie within a certain interval, the initial values can be selected within that interval. Without prior knowledge, a sufficiently small vector/matrix can be used as the initial value. Here, the initial state vector is defined as x̂_{1,0} = 1_n/p_0, and the initial covariance matrix is taken as P_{1,0} = I_n.
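The correction/prediction structure behind the estimator (10)-(12) can be sketched in a Kalman-filter style. This is only a sketch under stated assumptions: the paper's exact gain and covariance recursions are not reproduced, the process-noise term is omitted, and the function name and noise variance r are hypothetical.

```python
import numpy as np

def bilinear_state_step(xhat, P, u_t, y_t, A, F, b, c, r=1.0):
    """One correction/prediction step of a Kalman-filter-style state
    estimator for x_{t+1} = (A + F u_t) x_t + b u_t, y_t = c x_t + v_t.
    A sketch in the spirit of (10)-(12); process noise is omitted."""
    # correction: fuse the new output y_t
    s = c @ P @ c + r                 # innovation variance (scalar)
    G = P @ c / s                     # gain vector
    xhat = xhat + G * (y_t - c @ xhat)
    P = P - np.outer(G, c @ P)
    # prediction with the input-dependent transition matrix A + F u_t
    At = A + F * u_t
    xhat = At @ xhat + b * u_t
    P = At @ P @ At.T
    return xhat, P
```

Replacing A, F and b here with their iteration-k estimates Â_{t,k}, F̂_{t,k}, b̂_{t,k} gives the interleaved parameter/state scheme described above.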
According to the structure of ψ_t, we define the estimate ψ̂_{t,k} of the information vector ψ_t by replacing the unknown states with their estimates. Then, replacing the unknown stacked information matrix Ψ_{h,t} in (9) with its estimate Ψ̂_{h,t,k} and combining the bilinear state estimator in (17)-(19) give the estimation algorithm in (20)-(30), provided that the matrix to be inverted in (20) is nonsingular. The following is the procedure of the state estimator-based MDW-LSI algorithm.
The SE-MDW-LSI algorithm in (20)-(30) estimates the parameter vector θ and the state vector x_t in an iterative way. In order to make the algorithm more efficient, we derive a matrix decomposition-based MDW-LSI algorithm using the block matrix inversion.

The bilinear state estimator-based MD-MDW-LSI algorithm
For the purpose of improving the computational efficiency, this section presents a matrix decomposition-based MDW-LSI (MD-MDW-LSI) algorithm for the bilinear system by applying the inverse of the block matrix. In the following, the information vector ψ_t in (8) is decomposed into two new information vectors u_t ∈ R^n and χ_t ∈ R^{n²+n}. Then, the information vector ψ_t and the stacked information matrix Ψ_{h,t} can be written as

ψ_t = [u_t^T, χ_t^T]^T,   Ψ_{h,t} = [U_{h,t}, X_{h,t}],

where the stacked information matrices U_{h,t} and X_{h,t} are defined as

U_{h,t} := [u_t, u_{t-1}, ..., u_{t-h+1}]^T ∈ R^{h×n},
X_{h,t} := [χ_t, χ_{t-1}, ..., χ_{t-h+1}]^T ∈ R^{h×(n²+n)}.

Define the data product moment matrix

S_t := Ψ_{h,t}^T Ψ_{h,t} ∈ R^{(n²+2n)×(n²+2n)}.

Then, Equation (9) can be expressed as

θ̂_t = S_t^{-1} Ψ_{h,t}^T Y_{h,t}.

When computing the parameter estimation vector θ̂_t, it is necessary to compute the inverse of the data product moment matrix S_t. This results in a large amount of computation, and prompts us to study a new algorithm based on matrix decomposition so as to reduce the computational cost.

Lemma 1. Suppose that A_1 ∈ R^{m×m} and the Schur complement Q := A_2 - A_21 A_1^{-1} A_12 ∈ R^{n×n} are nonsingular matrices. Then the following block matrix inversion relation holds:

[A_1, A_12; A_21, A_2]^{-1} = [A_1^{-1} + A_1^{-1} A_12 Q^{-1} A_21 A_1^{-1}, -A_1^{-1} A_12 Q^{-1}; -Q^{-1} A_21 A_1^{-1}, Q^{-1}].
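Lemma 1 can be checked numerically. The blocks below are random illustrative matrices, scaled so that A_1 and the Schur complement Q stay well conditioned; the identity is the standard block matrix inversion formula the lemma states.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2

# Random, well-conditioned blocks (illustrative values only)
A1 = np.eye(m) + 0.1 * rng.standard_normal((m, m))
A12 = 0.1 * rng.standard_normal((m, n))
A21 = 0.1 * rng.standard_normal((n, m))
A2 = np.eye(n) + 0.1 * rng.standard_normal((n, n))

A1inv = np.linalg.inv(A1)
Q = A2 - A21 @ A1inv @ A12          # Schur complement of A1
Qinv = np.linalg.inv(Q)

# Block inverse assembled exactly as in Lemma 1
top = np.hstack([A1inv + A1inv @ A12 @ Qinv @ A21 @ A1inv,
                 -A1inv @ A12 @ Qinv])
bot = np.hstack([-Qinv @ A21 @ A1inv, Qinv])
block_inv = np.vstack([top, bot])

M = np.block([[A1, A12], [A21, A2]])
assert np.allclose(block_inv, np.linalg.inv(M))   # lemma holds numerically
```

The payoff is that only the m × m and n × n blocks need to be inverted, never the full (m + n) × (m + n) matrix.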
Applying Lemma 1 to Equations (35) and (36) yields the partitioned expression (38) for θ̂_t. From (38), it can be found that the parameter vector cannot be computed, since there are the unknown states x_{1,t-i} and x_{t-i} in X_{h,t} and Q_t. This identification difficulty can be resolved by replacing the unknown items with their corresponding estimates at iteration k - 1. According to the structure of χ_t, we define its estimate χ̂_{t,k} by substituting the state estimates. Similarly, we use the estimate χ̂_{t,k} to define the estimate of X_{h,t} as

X̂_{h,t,k} := [χ̂_{t,k}, χ̂_{t-1,k}, ..., χ̂_{t-h+1,k}]^T ∈ R^{h×(n²+n)}.
Thus, X_{h,t} in Q_t in (37) is replaced with its estimate X̂_{h,t,k}, and the estimate Q̂_{t,k} of Q_t is obtained. Replacing X_{h,t} and Q_t in (38) with their estimates X̂_{h,t,k} and Q̂_{t,k} and combining the above derivations, we can summarize the MD-MDW-LSI algorithm in (41)-(58). Similarly, replacing A, F and b with Â_{t,k}, F̂_{t,k} and b̂_{t,k} gives the bilinear state estimator for estimating the state x_t of the system (1) and (2). It can be found that the (1,1)th partitioned matrix Ω_t of S_t is invariant over the iterations (i.e., independent of k). Likewise, the matrix R_t and the vectors γ_t and β_t are independent of k; thus, they only need to be calculated at the first iteration.
To summarize, we list the steps of computing the state estimatex t,k and the parameter estimateθ t,k by the SE-MD-MDW-LSI algorithm in (41)-(58) as follows.

Remark 4:
When the dimension of the involved covariance matrix is large, obtaining its inverse is computationally expensive. For the MD-MDW-LSI algorithm, the information matrix is decomposed into two parts: one part is placed outside the iterative loop, so its inverse only needs to be calculated once, and the data product moment matrix involved in the other part has a smaller dimension. Thus, the MD-MDW-LSI algorithm improves the computational efficiency of inverting the covariance matrix.
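A back-of-the-envelope cubic cost model illustrates why moving the invariant n × n block inverse outside the loop helps. The cubic model is an assumption of this sketch and does not reproduce the paper's exact flop tables; it only compares inversion costs.

```python
# Rough cubic cost model for matrix inversion (illustrative assumption).
# Full scheme: invert the n0 x n0 moment matrix S_t at every iteration.
# Decomposed scheme: invert the invariant n x n block Omega_t once, then
# invert the n1 x n1 Schur-complement block at every iteration.
n = 6
n0, n1 = n * n + 2 * n, n * n + n   # 48 and 42 for n = 6
k_max = 10                           # number of iterations

cost_full = k_max * n0 ** 3
cost_decomposed = n ** 3 + k_max * n1 ** 3
print(cost_decomposed / cost_full)   # < 1: the decomposed scheme is cheaper
```

Even in this crude model the decomposed scheme wins, and the gap widens with k_max because the once-only n × n inverse is amortized over all iterations.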

The bilinear state estimator-based H-MDW-LSI algorithm
In the previous section, we use the matrix decomposition to improve the computational efficiency.
Here, we apply the model decomposition to reduce the computational cost based on the hierarchical principle. Specifically, the identification model in (7) is decomposed into two sub-models with smaller dimensions and fewer variables, and the sub-models contain terms associated with each other. A hierarchical MDW-LSI (H-MDW-LSI) algorithm is derived. The details are as follows.
Define the parameter vectors and information vectors as

γ := [f_1, f_2, ..., f_n, a_1, a_2, ..., a_n]^T ∈ R^{n²+n},   β := [b_1, b_2, ..., b_n]^T ∈ R^n,

with the corresponding information vectors u_t and χ_t from the decomposition in (31)-(32). Define two intermediate variables

y_{1,t} := y_t - χ_t^T γ,   y_{2,t} := y_t - u_t^T β.

According to the hierarchical identification principle, Equation (7) can be divided into the following two sub-models:

y_{1,t} = u_t^T β + v_t,  (59)
y_{2,t} = χ_t^T γ + v_t.  (60)

For the sub-models in (59) and (60), one contains the parameter vector β and the information vector u_t, and the other contains γ and χ_t. Consider the newest h data from t - h + 1 to t. Let γ̂_{t,k} := [f̂_{1,t,k}, f̂_{2,t,k}, ..., f̂_{n,t,k}, â_{1,t,k}, â_{2,t,k}, ..., â_{n,t,k}]^T ∈ R^{n²+n} and β̂_{t,k} := [b̂_{1,t,k}, b̂_{2,t,k}, ..., b̂_{n,t,k}]^T ∈ R^n be the estimates of γ and β at iteration k and time t. Define the two criterion functions

J_2(β) := ||Y_{1,h,t} - U_{h,t} β||²,   J_3(γ) := ||Y_{2,h,t} - X_{h,t} γ||²,

where Y_{1,h,t} and Y_{2,h,t} stack y_{1,t} and y_{2,t} over the window. According to the least squares principle, minimizing J_2(β) and J_3(γ) gives the least squares-based iterative relations (61) and (62). From (61) and (62), it is obvious that the information vector χ_t on the right-hand side contains the unknown states; moreover, Equations (61) and (62) contain the unknown parameters γ and β, respectively. Then, we replace the states with their corresponding estimates at iteration k - 1, replace χ_t with χ̂_{t,k} defined in (39), and replace β and γ with β̂_{t,k} and γ̂_{t,k}. Similar to (33) and (40), we construct the stacked input matrix U_{h,t} and the stacked information matrix estimate X̂_{h,t,k} using u_t and χ̂_{t,k} from t - h + 1 to t, respectively. Then we obtain Equations (63)-(75), which form the state estimator-based H-MDW-LSI (SE-H-MDW-LSI) algorithm for estimating the system parameters and states.
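The alternating structure of the two sub-model estimators can be sketched as follows. This is a simplified illustration under the assumption that the regressors are fully known (the state-estimation step of the full SE-H-MDW-LSI algorithm is omitted), and the function name is hypothetical.

```python
import numpy as np

def h_mdw_ls(U, X, Y, k_max=20):
    """Hierarchical least squares iteration sketch: the model
    Y = U beta + X gamma + v is split into two sub-models that are
    solved alternately, each using the other's latest estimate
    (in the spirit of the relations (61)-(62)).
    U, X: stacked regressor matrices over the data window; Y: outputs."""
    beta = np.zeros(U.shape[1])
    gamma = np.zeros(X.shape[1])
    for _ in range(k_max):
        # sub-model 1: Y - X gamma = U beta + v, solve for beta
        beta, *_ = np.linalg.lstsq(U, Y - X @ gamma, rcond=None)
        # sub-model 2: Y - U beta = X gamma + v, solve for gamma
        gamma, *_ = np.linalg.lstsq(X, Y - U @ beta, rcond=None)
    return beta, gamma
```

Each sub-problem only inverts a small moment matrix (n × n or (n²+n) × (n²+n)) instead of the full (n²+2n)-dimensional one, which is where the computational saving comes from.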

Remark 5:
For the H-MDW-LSI algorithm, the identification model is decomposed into two sub-models, and the dimensions of the covariance matrices in sub-models are reduced, which can improve the computational efficiency compared with the MDW-LSI algorithm. Theoretically, the computational cost will decrease as the number of sub-models increases.

The computational efficiency
Computational efficiency is an important consideration for identification algorithms. Counting floating point operations (flops) is a standard way to measure the computational cost of an identification algorithm: the flop count is the sum of the number of additions and the number of multiplications [36]. To simplify the accounting, a subtraction is counted as an addition, and a division as a multiplication. Here, the three iterative algorithms are compared by their flop counts at each iteration.
The following discusses the computational costs of the MDW-LSI algorithm, the MD-MDW-LSI algorithm and the H-MDW-LSI algorithm, and compares them in detail. The multiplications and additions of the MDW-LSI algorithm at each iteration are shown in Table 1. The computational cost of the constant matrices of the MD-MDW-LSI algorithm is shown in Table 2, and its total computational cost is shown in Table 3; Table 4 shows the computational cost of the H-MDW-LSI algorithm (n_0 := n² + 2n, n_1 := n_0 - n = n² + n, where n is the dimension of the system state vector and h is the moving data window length). From Table 2, we notice that Ω_t, R_t, γ_t and β_t are constant matrices or vectors independent of the iteration index k; thus, they only need to be calculated once (outside the iterative loop).
Let N_1, N_2 and N_3 denote the per-iteration flop counts of the MDW-LSI, MD-MDW-LSI and H-MDW-LSI algorithms, respectively. Since n ⩾ 1 and h ≫ 1, the differences N_1 - N_2 and N_1 - N_3 are positive. That is to say, the computational costs of the MD-MDW-LSI algorithm and the H-MDW-LSI algorithm are less than that of the MDW-LSI algorithm. Taking n = 6 and h = 20 gives an example of the flop comparison in Table 5.

Table 3 The computational cost of the MD-MDW-LSI algorithm

(Table 3 lists, for each expression of the algorithm, the numbers of multiplications and additions, e.g., λ_k = R_t X̂_{h,t,k} γ̂_k ∈ R^n with nn_1(h + 1) multiplications; the full table is omitted here.) From Tables 1-5, with n = 6 and h = 20, the flop ratio is 89856/315168 ≈ 28.5% (see Table 5). As the dimension n of the state vector increases, the computational efficiency of the MD-MDW-LSI algorithm and the H-MDW-LSI algorithm becomes even more prominent compared with that of the MDW-LSI algorithm.
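The quoted figure can be reproduced arithmetically. Assigning the two totals to the decomposition-based algorithm and the full MDW-LSI algorithm is an inference from the surrounding discussion of Table 5.

```python
# Reproduce the n = 6, h = 20 flop comparison: the ratio between the two
# per-iteration flop totals quoted in the text is about 28.5%.
n_reduced = 89_856    # flop total of the decomposition-based algorithm (assumed)
n_full = 315_168      # flop total of the MDW-LSI algorithm (assumed)
ratio = n_reduced / n_full
print(f"{ratio:.1%}")  # approximately 28.5%
```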

Example
In this section, an example demonstrates the effectiveness of the proposed algorithms. A bilinear state-space system of the form (1)-(2) is considered, and the parameter vector to be identified collects the entries of F, a and b. In the simulation, an independent persistent excitation signal sequence {u_t} with zero mean and unit variance is taken as the input, and {v_t} is an uncorrelated white noise sequence with zero mean and variance σ² = 0.50². Take the data length L = 500 and the moving window length h = 100, and let the maximum iteration number k_max = 10. The output is generated by simulating the model with the input signal. The simulated input-output data are recorded and shown in Figure 1 (samples 1 to 200). The parameter estimates and errors of the three algorithms are shown in Figure 2 and Table 6. From the parameter estimation error curves in Figure 2, we can see that the parameter estimation accuracies of the state estimator-based MDW-LSI algorithm and the state estimator-based MD-MDW-LSI algorithm are almost the same. This means that the improved algorithm based on the block matrix inversion only optimizes the calculation process and does not affect the estimation accuracy. Although the state estimator-based H-MDW-LSI algorithm changes the identification model, it is still effective. Figure 3 shows the states x_{1,t} and x_{2,t}, their estimates x̂_{1,t,k} and x̂_{2,t,k}, and the state estimation errors obtained by the SE-MD-MDW-LSI algorithm against t. Figure 4 shows the corresponding results obtained by the SE-H-MDW-LSI algorithm.
Since the accuracies of the SE-MDW-LSI algorithm and the SE-MD-MDW-LSI algorithm are almost the same, only the estimation results of the SE-MD-MDW-LSI algorithm are reported in what follows. The SE-MD-MDW-LSI and SE-H-MDW-LSI parameter estimates against t are shown in Figures 5 and 6 with σ² = 0.10² (solid line: the SE-MD-MDW-LSI algorithm, dotted line: the SE-H-MDW-LSI algorithm). It can be found that the parameter estimates fluctuate in the beginning, but approach their true values as t increases. To show the effectiveness of the model obtained by the MDW-LSI algorithm, we use a different data set (L_r := 100 samples from t = L + 1 to t = L + L_r) for model validation. The parameter estimates in the fifth row of Table 6 are used to construct the resulting model. The actual output y_t, the predicted output ŷ_t, and their errors ŷ_t - y_t are plotted in Figure 7. It can be seen that the predicted output tracks the actual output with high accuracy and small errors.
Monte Carlo simulations can represent the probabilistic characteristics of a method. Thus, using Monte Carlo simulations with m = 1000 sets of different noise realizations, the parameter estimates and the estimation deviations of the MDW-LSI algorithm are shown in Table 7, and the distributions of the parameter estimates over the 1000 runs are shown in Figures 8 and 9. It can be seen that the average values of the parameter estimates are close to the true parameters.

Conclusions
This paper presents an SE-MDW-LSI algorithm for a class of nonlinear systems to realize joint parameter and state estimation. To increase the computational efficiency, an SE-MD-MDW-LSI algorithm and an SE-H-MDW-LSI algorithm are developed based on the block matrix inversion and the hierarchical identification principle. The simulation results demonstrate the effectiveness of the proposed algorithms. There are still several interesting topics to discuss: some system data used in modern industries may be unmeasurable or missing, and if the observation noise and the process noise are both colored, the problem becomes more challenging and more interesting. Therefore, future work will address bilinear systems with missing outputs and colored noise.