Highly efficient maximum-likelihood identification methods for bilinear systems with colored noises

As a special class of nonlinear systems, bilinear systems can naturally describe many industrial production processes. This paper mainly discusses highly efficient iterative identification methods for bilinear systems with autoregressive moving average noise. Firstly, the input-output representation of the bilinear system is derived by eliminating the unknown state variables in the model. Then, based on the maximum-likelihood principle and the negative gradient search principle, a maximum-likelihood gradient-based iterative (ML-GI) algorithm is proposed to identify the parameters of bilinear systems with colored noises. To further improve the computational efficiency, the original identification model is divided into three sub-identification models with smaller dimensions and fewer parameters, and a hierarchical maximum-likelihood gradient-based iterative (H-ML-GI) algorithm is derived by using the hierarchical identification principle. A gradient-based iterative (GI) algorithm is given for comparison. Finally, the algorithms are verified by a simulation example. The simulation results show that the proposed algorithms are effective for identifying bilinear systems with colored noises, and that the H-ML-GI algorithm has a higher computational efficiency and a faster convergence rate than the ML-GI algorithm and the GI algorithm.


Introduction
Nonlinear systems can be found in many practical problems and have many characteristics that differ significantly from those of linear systems [1][2][3]. The modelling and parameter identification of nonlinear systems are more difficult because of their complexity [4,5]. Bilinear systems are the simplest nonlinear systems in form [6] and can naturally describe many industrial production processes, especially chemical processes [7][8][9]. They are jointly linear in the state and the input variables, and are a simple nonlinear extension of linear systems [10]. Therefore, bilinear systems are considered to be the nonlinear systems that are closest to linear systems. Because of the wide applications of bilinear systems, the study of their identification methods is meaningful.
The parameter identification of bilinear systems has been studied for decades and many identification methods have been proposed [11,12]. In the early days, many classical approaches were used to identify bilinear systems, such as Walsh functions, block-pulse functions, Chebyshev polynomials, Taylor polynomials and Galerkin methods. Favoreel et al. generalized linear subspace identification theory to the subspace identification of bilinear systems [13]. However, the size of the Hankel matrix grows exponentially with the dimension of the system states and the number of system inputs and outputs. In addition, some parameter identification methods for bilinear systems are based on converting the bilinear system into an equivalent linear model, such as the Kalman filtering-based identification algorithm [14], the subspace method [15] and the intersection subspace algorithm [16]. Besides, the least squares methods [17,18], the gradient methods [19,20], the maximum-likelihood methods [21][22][23] and the intelligent algorithms [24] have been developed to identify bilinear systems.
As a class of important approaches for dynamical system identification, maximum-likelihood estimation methods have good statistical properties and have been applied in image recognition, magnetic resonance, speech recognition, bioinformatics, system identification, etc. [25,26]. For example, Chen et al. presented a parameter separation least squares-based iterative algorithm and a maximum-likelihood least squares-based iterative method for output-error bilinear-parameter models with colored noises [27]; Alfonsi et al. studied the maximum-likelihood estimator for estimating the drift parameters of a Wishart process [28]; Aslam derived a recursive maximum-likelihood least squares-based secondary path identification algorithm for active noise control systems with autoregressive moving average noise [29]; Xia et al. derived a key term separation-based maximum-likelihood recursive extended stochastic gradient algorithm and a key term separation-based maximum-likelihood multi-innovation extended stochastic gradient algorithm for multi-variable input nonlinear systems with unmeasured disturbances [30]; Gibson et al. proposed a new expectation maximisation-based maximum-likelihood algorithm and a gradient-based algorithm for the parameter estimation of multi-variable bilinear systems [31].
As an optimization algorithm, the gradient method is useful in the field of identification [32,33]. Many gradient-based parameter identification methods have been developed using the auxiliary model approach, the multi-innovation identification theory, the hierarchical identification principle and the data filtering technique [34][35][36], etc. For example, Chen et al. developed an Aitken-based modified Kalman filtering stochastic gradient algorithm for dual-rate nonlinear models [37]; a multi-innovation stochastic gradient parameter estimation method and a recursive least squares algorithm were derived for parameter estimation of the sine combination signals and periodic signals in [38].
The maximum-likelihood estimation algorithm for bilinear systems in [21] is recursive, and that in [31] is based on the expectation maximisation method. Based on the previous work [39] and different from these methods, this paper focuses on the highly efficient parameter identification of bilinear systems with ARMA noise based on the maximum-likelihood principle [40] and the hierarchical identification principle. The main contributions of this paper are as follows.
- The model structure of bilinear state space systems involves products of the states and inputs, which increases the difficulty of system identification. This paper eliminates the state variables in the system, obtains the input-output representation of the bilinear state space system, and derives a maximum-likelihood gradient-based iterative algorithm by combining the maximum-likelihood principle with the negative gradient search principle.
- To further improve the computational efficiency, the original identification model is divided into three sub-identification models with smaller dimensions and fewer parameters, and a hierarchical maximum-likelihood gradient-based iterative algorithm is developed based on the hierarchical identification principle. Then a gradient-based iterative algorithm is presented for comparison.
The remainder of this paper is organized as follows. The identification model of bilinear systems with colored noises is derived in Section 2. Section 3 discusses the maximum-likelihood principle and proposes a maximum-likelihood gradient-based iterative algorithm. To further improve the computational efficiency, a hierarchical maximum-likelihood gradient-based iterative algorithm is proposed in Section 4. For comparison, a gradient-based iterative identification algorithm is presented in Section 5. The computational efficiency of the algorithms is discussed in Section 6. A simulation example is provided in Section 7 to verify the effectiveness of the proposed algorithms. Finally, we offer some concluding remarks in Section 8.

System description and identification model
Let us introduce some notation. "A =: X" or "X := A" stands for "A is defined as X". 1_n represents an n-dimensional column vector whose elements are all 1. The superscript T denotes the matrix or vector transpose. r^{-1} stands for the unit backward shift operator: r^{-1}y(s) = y(s-1).

Fig. 1 The bilinear state space system
Consider the bilinear state space system in Figure 1, and assume that the bilinear system has the following observability canonical form [17], where x(s) := [x_1(s), x_2(s), · · · , x_n(s)]^T is the n-dimensional state vector, u(s) ∈ R and y(s) ∈ R are the input and output of the system, respectively, and A ∈ R^{n×n}, B ∈ R^{n×n}, b ∈ R^{1×n}, g ∈ R^n and h ∈ R^{1×n} are constant matrices and vectors.

Referring to the method in [17,20], from (1)-(2) we can obtain the input-output representation in (3), where A(r^{-1}) := 1 + a_1 r^{-1} + a_2 r^{-2} + · · · + a_n r^{-n}, a_i ∈ R. The coefficients c_i and d_i and the parameters a_i, b_i and g_i have the following relations:

[c_n, · · · , c_2, c_1] := [g_n + a_{n-1} g_1 + a_{n-2} g_2 + · · · + a_1 g_{n-1}, · · · , g_2 + a_1 g_1, g_1] ∈ R^{1×n}.

Practical industrial processes are always affected by various factors, such as disturbances; these factors are generally called stochastic noise. Introducing a noise term w(s) ∈ R into (3) gives (4), where the noise term w(s) may be a white noise process, an autoregressive (AR) process, a moving average (MA) process or an autoregressive moving average (ARMA) process. Without loss of generality, this paper considers w(s) as an ARMA process, that is,

E(r^{-1})w(s) = F(r^{-1})v(s),

where v(s) ∈ R is a Gaussian white noise with zero mean and variance σ², and E(r^{-1}) and F(r^{-1}) are polynomials in r^{-1}:

E(r^{-1}) := 1 + e_1 r^{-1} + e_2 r^{-2} + · · · + e_{n_e} r^{-n_e}, e_i ∈ R,
F(r^{-1}) := 1 + f_1 r^{-1} + f_2 r^{-2} + · · · + f_{n_f} r^{-n_f}, f_i ∈ R.

The algorithms studied in this paper can also be applied to the special cases of MA noise and AR noise.

Assume that the system orders n, n_e and n_f are known and that u(s) = 0, y(s) = 0 and v(s) = 0 for s ≤ 0. The objective of this paper is to develop highly efficient iterative algorithms for estimating the parameters a_i, b_i, c_i, d_i, e_i and f_i from the measured input-output data {u(i), y(i): i = (s-1)l + 1, (s-1)l + 2, · · · , sl} (l denotes the data length).

Define the parameter vector ϑ and the information vector φ(s) accordingly. According to the above definitions, the noise model E(r^{-1})w(s) = F(r^{-1})v(s) can be written in the regression form (9). Thus, Equation (4) can be written as

y(s) = ϕ^T(s)θ + w(s),

which, combined with the noise model (9), yields the identification model (13) of the bilinear system in (4); the parameter vector ϑ contains the parameter vector θ of the system model and the parameter vector ρ of the noise model.
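To make the colored-noise model concrete, the following is a minimal Python sketch of how a disturbance w(s) satisfying E(r^{-1})w(s) = F(r^{-1})v(s) can be simulated from Gaussian white noise v(s). The polynomial orders and coefficient values in the usage example are hypothetical and only for illustration.

```python
import numpy as np

def simulate_arma_noise(e, f, length, sigma=1.0, seed=0):
    """Simulate w(s) with E(r^-1) w(s) = F(r^-1) v(s).

    e : coefficients [e_1, ..., e_ne] of E(r^-1) = 1 + e_1 r^-1 + ...
    f : coefficients [f_1, ..., f_nf] of F(r^-1) = 1 + f_1 r^-1 + ...
    v(s) is zero-mean Gaussian white noise with variance sigma^2.
    """
    rng = np.random.default_rng(seed)
    v = sigma * rng.standard_normal(length)
    w = np.zeros(length)
    for s in range(length):
        # w(s) = -e_1 w(s-1) - ... - e_ne w(s-ne)
        #        + v(s) + f_1 v(s-1) + ... + f_nf v(s-nf)
        w[s] = v[s]
        for i, ei in enumerate(e, start=1):
            if s - i >= 0:
                w[s] -= ei * w[s - i]
        for j, fj in enumerate(f, start=1):
            if s - j >= 0:
                w[s] += fj * v[s - j]
    return w, v

# Hypothetical first-order example: E(r^-1) = 1 - 0.3 r^-1, F(r^-1) = 1 + 0.2 r^-1.
w, v = simulate_arma_noise(e=[-0.3], f=[0.2], length=3000, sigma=1.0)
```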
Assume that the observed values {u(i), y(i), i < s} and the parameter vector ϑ are independent of v(s), and that the probability density function p(y_l | u_{l-1}, ϑ) is available. Referring to the derivation of the objective function in [39], the maximum-likelihood function is given by (14), where K is a constant.
Taking the natural logarithm of both sides of (14) and maximizing the logarithmic likelihood function give the following equivalent cost function J(ϑ), where v(s) is given by (16). Obviously, J(ϑ) and v(s) depend on the parameter vector ϑ, which contains the parameters a_i, b_i, c_i, d_i, e_i and f_i. In the following, we use iterative methods to obtain the minimum of J(ϑ).
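For reference, under the stated Gaussian white-noise assumption on v(s), maximizing the log-likelihood over one data batch reduces to minimizing a sum of squared residuals. A sketch of this standard reduction is given below; the exact normalization and the precise form of the paper's (15)-(16) may differ.

```latex
% Sketch of the equivalent cost under the Gaussian assumption on v(s);
% the scaling constant is conventional and may not match the paper's (15).
J(\vartheta) = \frac{1}{2}\sum_{i=(s-1)l+1}^{sl} v^{2}(i),
\qquad
v(i) = \frac{E(r^{-1})}{F(r^{-1})}\left[y(i) - \varphi^{\mathrm{T}}(i)\,\theta\right].
```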

The iterative identification algorithm
Let ϑ̂_k(s) be the estimate of ϑ at iteration k. Using the gradient search and minimizing the cost function J(ϑ), we can obtain the gradient-based iterative parameter estimate ϑ̂_k(s) of ϑ, where µ_k(s) is an iterative step-size or a convergence factor.
Define the information vector φ̂_{f,k}(s). Computing the partial derivatives of v(s) in (16) with respect to the parameters gives the gradient of the cost function, and thus the iterative estimation formulas. Define the estimate of φ(s) as φ̂_k(s); its unknown entries follow from (13) and (11). Using the above definitions and (35), and from (20), we obtain the iterative parameter estimate ϑ̂_k(s). In order to guarantee the convergence of ϑ̂_k(s), one conservative choice of µ_k(s) is to satisfy the corresponding eigenvalue-based step-size bound. Equations (7), (17)-(19) and (28)-(38) form the maximum-likelihood gradient-based iterative (ML-GI) identification algorithm for the bilinear systems. The identification steps of the ML-GI algorithm for computing ϑ̂_k(s) are listed as follows.
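The computational core of the ML-GI iteration is a negative-gradient update with an eigenvalue-bounded step size. The following self-contained Python sketch shows that generic pattern on a linear-regression-type model Y ≈ Φϑ; the variable names, the quadratic cost and the synthetic data are illustrative assumptions and do not reproduce the paper's equations (17)-(19) and (28)-(38).

```python
import numpy as np

def gradient_iterative_estimate(Phi, Y, k_max=200):
    """Generic negative-gradient iterative estimator for Y ≈ Phi @ theta.

    A conservative constant step size mu <= 2 / lambda_max(Phi^T Phi)
    keeps the quadratic cost ||Y - Phi theta||^2 from diverging.
    """
    n_params = Phi.shape[1]
    theta = np.zeros(n_params)              # initial estimate
    lam_max = np.max(np.linalg.eigvalsh(Phi.T @ Phi))
    mu = 1.0 / lam_max                      # conservative step size
    for _ in range(k_max):
        grad = -Phi.T @ (Y - Phi @ theta)   # gradient of 0.5*||Y - Phi theta||^2
        theta = theta - mu * grad           # negative-gradient update
    return theta

# Illustrative use with synthetic data (true parameters are hypothetical).
rng = np.random.default_rng(1)
Phi = rng.standard_normal((3000, 5))
theta_true = np.array([0.5, -0.3, 0.8, 0.1, -0.6])
Y = Phi @ theta_true + 0.1 * rng.standard_normal(3000)
theta_hat = gradient_iterative_estimate(Phi, Y)
```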
The flowchart for computing ϑ̂_k(s) in the ML-GI algorithm is shown in Figure 2.

The hierarchical maximum-likelihood gradient-based iterative algorithm
The computation of the step-sizes in the ML-GI algorithm involves computing the eigenvalue of a large-size matrix, which increases the computational burden of the algorithm. To further improve the computational efficiency, the original identification model is divided into three sub-identification models with smaller dimensions and fewer parameters, and a hierarchical maximum-likelihood gradient-based iterative algorithm is derived in this section by using the hierarchical identification principle.
set ϑ̂_0(s + 1) = ϑ̂_k(s), increase s by 1 and go to Step 2; otherwise, obtain the parameter estimate ϑ̂_k(s) and terminate the calculation process.
The flowchart for computing ϑ̂_k(s) in the H-ML-GI algorithm is shown in Figure 3.
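To illustrate the hierarchical idea behind the H-ML-GI algorithm, the sketch below splits a stacked regression into three parameter sub-vectors and updates each sub-vector with its own smaller gradient step while the other two are held at their latest values, so that only the eigenvalues of the smaller block matrices are needed. The three-block split, sizes and data are hypothetical; the paper's actual sub-identification models follow its own equations.

```python
import numpy as np

def hierarchical_gradient_iteration(blocks, Y, k_max=200):
    """Hierarchical gradient iteration for Y ≈ Phi1@t1 + Phi2@t2 + Phi3@t3.

    blocks : list of three information matrices [Phi1, Phi2, Phi3].
    Each sub-vector uses a step size bounded by the maximum eigenvalue of
    its own (smaller) matrix Phi_i^T Phi_i, which is cheaper than working
    with the full stacked matrix.
    """
    thetas = [np.zeros(Phi.shape[1]) for Phi in blocks]
    mus = [1.0 / np.max(np.linalg.eigvalsh(Phi.T @ Phi)) for Phi in blocks]
    for _ in range(k_max):
        for i, Phi in enumerate(blocks):
            # Residual with the other sub-models fixed at their latest estimates.
            residual = Y - sum(B @ t for B, t in zip(blocks, thetas))
            thetas[i] = thetas[i] + mus[i] * (Phi.T @ residual)
    return thetas

# Illustrative use with synthetic data and a hypothetical 3 + 3 + 2 split.
rng = np.random.default_rng(2)
Phi_full = rng.standard_normal((3000, 8))
theta_true = rng.standard_normal(8)
Y = Phi_full @ theta_true + 0.1 * rng.standard_normal(3000)
blocks = [Phi_full[:, :3], Phi_full[:, 3:6], Phi_full[:, 6:]]
t1, t2, t3 = hierarchical_gradient_iteration(blocks, Y)
```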

The gradient-based iterative algorithm
To show the advantages of the proposed algorithms, we give the gradient-based iterative algorithm for identifying the parameter vector ϑ of bilinear systems.
Consider the newest l data from (s-1)l + 1 to sl (l ≫ 4n + n_e + n_f - 1), and define the stacked output vector Y(l, s) and the stacked information matrix Φ(l, s) from the output data and the information vectors over this window.

Minimizing the cost function J_1(ϑ) and using the negative gradient search lead to the iterative relation (71) for computing the parameter estimate ϑ̂_k(s), where µ_{1,k}(s) is an iterative step-size or a convergence factor. Notice that the information vector φ(s) in Φ(l, s) contains the unmeasured noise terms w(s-j) (j = 1, 2, · · · , n_e) and v(s-j) (j = 1, 2, · · · , n_f), so Equation (71) cannot give the estimate ϑ̂_k(s) directly. The solution is to replace the unknown noise terms w(s-j) and v(s-j) in φ(s) with their corresponding estimates ŵ_{k-1}(s-j) and v̂_{k-1}(s-j) at iteration k-1 and to define the estimate of φ(s) as φ̂_k(s). According to the identification model in (13), replacing φ(s) and ϑ with φ̂_k(s) and ϑ̂_k(s) gives the estimate of v(s), and replacing θ with θ̂_k(s) gives the estimate of w(s). Replacing Φ(l, s) in (71) with its estimate Φ̂_k(l, s), we obtain the gradient-based iterative identification algorithm for estimating ϑ. In order to guarantee the convergence of ϑ̂_k(s), one conservative choice of µ_{1,k}(s) is to satisfy the eigenvalue-based step-size bound, where λ_max[X] denotes the maximum eigenvalue of the square matrix X. Equations (7), (17)-(19), (69) and (72)-(78) form the gradient-based iterative (GI) identification algorithm for the bilinear systems.
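A distinctive step of the GI algorithm is replacing the unmeasured noise terms inside the information vector with their previous-iteration estimates before each gradient update. The following Python sketch outlines this interaction; the build_phi interface, the partition into n_sys system regressors and n_noise noise regressors, and the residual-based noise estimates are illustrative assumptions rather than the paper's equations (72)-(78).

```python
import numpy as np

def gi_with_noise_estimates(build_phi, y, n_sys, n_noise, k_max=100):
    """Sketch of the GI idea for y(s) = phi(s)^T vartheta + v(s): the unmeasured
    w(s-j) and v(s-j) inside phi(s) are replaced by their estimates from the
    previous iteration, then the whole parameter vector is updated by a
    negative-gradient step on the stacked data.

    build_phi(s, w_hat, v_hat) -> information vector of length n_sys + n_noise
    (hypothetical user-supplied interface; the first n_sys entries are the
    system regressors, the rest are the noise-model regressors).
    """
    L = len(y)
    vartheta = np.zeros(n_sys + n_noise)
    w_hat = np.zeros(L)                      # colored-noise estimates
    v_hat = np.zeros(L)                      # white-noise estimates
    for _ in range(k_max):
        Phi = np.array([build_phi(s, w_hat, v_hat) for s in range(L)])
        mu = 1.0 / np.max(np.linalg.eigvalsh(Phi.T @ Phi))   # conservative step
        vartheta = vartheta + mu * Phi.T @ (y - Phi @ vartheta)
        # w(s): residual of the system part only; v(s): residual of the full model.
        w_hat = y - Phi[:, :n_sys] @ vartheta[:n_sys]
        v_hat = y - Phi @ vartheta
    return vartheta, w_hat, v_hat
```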
The identification steps of the GI algorithm for computing ϑ̂_k(s) are listed as follows.
The flowchart for computing ϑ̂_k(s) in the GI algorithm is shown in Figure 4.

Comparison of the computational efficiency
The computational efficiency of an algorithm can be evaluated by the number of multiplications (including divisions) and additions (including subtractions). One multiplication or one addition is called a flop (floating point operation), and the total number of flops measures the computational cost of the algorithm. The flop counts of the ML-GI algorithm, the H-ML-GI algorithm and the GI algorithm are shown in Tables 1-3, in which n_0 is defined as n_0 := 4n + n_e + n_f - 1.
In order to make this clear, an example is taken for illustration. Taking the system orders n = 5, n_e = 5, n_f = 5 and the data length l = 3000, we have N_1 = 6918870, N_2 = 2970310 and N_3 = 5682870. That is, the difference between the total flops of the ML-GI algorithm and the H-ML-GI algorithm at each iteration is N_1 - N_2 = 6918870 - 2970310 = 3948560, and the difference between the total flops of the GI algorithm and the H-ML-GI algorithm at each iteration is N_3 - N_2 = 5682870 - 2970310 = 2712560. An algorithm with higher computational efficiency is one that requires fewer flops. Obviously, the H-ML-GI algorithm has a higher computational efficiency than the ML-GI algorithm and the GI algorithm.

Table 1 The computational efficiency of the ML-GI algorithm

Example
Consider the following bilinear system. The parameter vector to be estimated is given by ϑ = [a_1, a_2, b_1, b_2, c_1, c_2, d_2, e_1, f_1]^T. In the simulation, the input {u(s)} is taken as a persistent excitation sequence with zero mean and unit variance, and is shown in Figure 5. {v(s)} is an uncorrelated noise sequence with zero mean and variance σ².

Case I: About the parameter estimation
Taking the noise variance σ² = 1.00² and σ² = 2.00², respectively, and taking the data length l = 3000, we apply the ML-GI algorithm, the H-ML-GI algorithm and the GI algorithm to estimate the parameters of the system. The parameter estimates and their errors under the different noise variances are shown in Tables 4-5, and the estimation errors δ versus the iteration number k are shown in Figures 9-10. From the results, we can draw the following conclusions.
- Under the same data length and noise variance, the ML-GI algorithm and the H-ML-GI algorithm have smaller estimation errors than the GI algorithm (see Tables 4 and 5). In other words, the parameter estimation accuracy of the ML-GI algorithm and the H-ML-GI algorithm is higher than that of the GI algorithm.
- Among these three algorithms, the convergence rate of the H-ML-GI algorithm is obviously faster than those of the ML-GI algorithm and the GI algorithm (see Figures 9-10).
- In this example, the system orders are n = 2, n_e = 1, n_f = 1, the data length is l = 3000, and the number of iterations is k = 500. According to Tables 1-3, the total flops of the ML-GI algorithm, the H-ML-GI algorithm and the GI algorithm are kN_1 = 500 × 1062090 = 531045000, kN_2 = 500 × 570038 = 285019000 and kN_3 = 500 × 690090 = 345045000, respectively. Obviously, the H-ML-GI algorithm has a higher computational efficiency than the ML-GI algorithm and the GI algorithm.

Fig. 9 The estimation errors δ versus k with different algorithms (σ² = 1.00²)
Table 4 The estimates and errors with different algorithms (σ² = 1.00²)
Table 5 The estimates and errors with different algorithms (σ² = 2.00²); columns: Algorithms, k, a_1, a_2, b_1, b_2, c_1, c_2, d_2, e_1, f_1, δ (%)
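The parameter estimation error δ reported in Tables 4-5 is assumed here to be the usual relative parameter error, since its defining equation is not reproduced above. A minimal Python sketch of this assumed definition:

```python
import numpy as np

def estimation_error(vartheta_hat, vartheta_true):
    """Relative parameter estimation error delta = ||theta_hat - theta|| / ||theta||,
    reported as a percentage (assumed definition)."""
    return 100.0 * np.linalg.norm(vartheta_hat - vartheta_true) / np.linalg.norm(vartheta_true)

# Hypothetical values for illustration only.
vartheta_true = np.array([0.50, -0.30, 0.80, 0.10, -0.60, 0.20, 0.40, -0.25, 0.15])
vartheta_hat  = np.array([0.48, -0.28, 0.79, 0.12, -0.58, 0.21, 0.38, -0.27, 0.16])
print(f"delta = {estimation_error(vartheta_hat, vartheta_true):.3f} %")
```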

Case II: About the model validation
For the model validation, we use the ML-GI estimates, the H-ML-GI estimates and the GI estimates in Table 4 with k = 500 to construct the estimated models, respectively. Then we use these estimated models and the remaining l_r = 100 data from s = l + 1 to s = l + l_r to compute the predicted outputs ŷ_i(s) of the system. The predicted outputs ŷ_i(s) and the true output y(s) are plotted in Figure 11 for the different algorithms, and the estimated outputs are used to compute the mean-square errors (MSEs). From Figure 11, we can see that the predicted outputs ŷ_1(s), ŷ_2(s) and ŷ_3(s) are very close to the true output y(s), and that the MSEs of the ML-GI algorithm, the H-ML-GI algorithm and the GI algorithm are very close to the noise variance σ² = 1.00². In other words, the estimated models can capture the dynamics of the system.

Fig. 11 The predicted outputs ŷ_i(s) and the true output y(s) versus s with σ² = 1.00²
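A small Python sketch of the validation computation, assuming the standard mean-square error between the predicted and true outputs over the validation window (the paper's exact MSE definition is not reproduced above):

```python
import numpy as np

def mean_square_error(y_true, y_pred):
    """MSE between the measured output y(s) and a model's predicted output
    y_hat(s) over the validation window (standard definition, assumed here)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical validation data for the window s = l+1, ..., l+l_r (l_r = 100).
rng = np.random.default_rng(3)
y_true = rng.standard_normal(100)
y_pred = y_true + rng.standard_normal(100)  # crude stand-in for a model's prediction
print(f"MSE = {mean_square_error(y_true, y_pred):.4f}")
```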

Conclusions
This work mainly considers the iterative identification problems of a special class of nonlinear systems with unmeasured disturbances. By combining the maximum-likelihood principle and the negative gradient search principle, a maximum-likelihood gradient-based iterative (ML-GI) algorithm is derived for the parameter estimation of bilinear systems with an ARMA noise process. Based on the hierarchical identification principle, a hierarchical maximum-likelihood gradient-based iterative (H-ML-GI) algorithm is developed to improve the computational efficiency. For comparison, a gradient-based iterative (GI) algorithm is also presented. The simulation results verify that the proposed maximum-likelihood algorithms generate more accurate estimates for bilinear systems than the GI algorithm, and that the H-ML-GI algorithm has a faster convergence rate and a higher computational efficiency than the ML-GI algorithm and the GI algorithm. The proposed algorithms can be extended to the parameter estimation problems of other nonlinear systems with colored noises or nonlinear systems with interval-varying measurements, and can be applied to other fields such as industrial processes.