New framework for identifying discrete-time switched linear systems

This paper addresses the offline identification of a particular class of hybrid dynamical systems: discrete-time switched linear state-space models. The identification is carried out from data previously sampled from the system. Unlike most existing methods, which identify the switching instants first, the proposed framework identifies the local models first and determines the switching instants afterwards. The methodology iteratively computes the a posteriori probability density functions of the discrete and continuous models using subspace identification, clustering, data classification, and hybrid stochastic filtering. This strategy groups the data most likely to have been generated by the same submodels, thus allowing these local models to be estimated. An essential feature of the algorithm is that the matrices of the different submodels are identified in the same state-space basis, allowing them to be evaluated and, if necessary, combined. The performance of the identification procedure is evaluated through numerical examples, and a comparison with a prior method from the literature is conducted.


Introduction
Dynamical systems that simultaneously include continuous and discrete dynamical subsystems are called Hybrid Systems [14,37]. The switched linear system is an important class of hybrid systems that uses switching between different linear models to approximate the dynamics of a complex nonlinear model [1,28].
In this context, the identification problem refers to estimating the subsystem parameters and their relationship to each other based on the switching signal. As classical identification methods cannot solve this problem, it has gained significant attention from researchers. This dynamical system can generally be represented with two different model structures: input-output models or state-space models. The proposed methods have mostly been developed around input-output models [6, 9, 11, 17, 19, 23-25, 27, 32, 36, 37].
While input-output models can depict dynamic systems, they are commonly found inadequate in the context of stability analysis, system realization, and observer design, which rely on state-space models [2]. Furthermore, state-space models offer a concise and convenient representation for handling multiple-input, multiple-output (MIMO) systems.
Nonetheless, identifying state-space models presents two challenges not encountered in input-output model estimation. First, local models are typically obtained using subspace identification techniques and are usually identified with respect to distinct bases of the state space, making direct combinations infeasible. Hence, transformation to a common state basis is necessary [34]. Second, partitioning the regression space becomes more complex due to the unknown state and lack of available regression vectors [3].
In recent years, identifying hybrid systems with models represented in state-space form has garnered considerable attention. Various authors have addressed this problem using online approaches [2,20,29], in which identification is performed in real time. The cited works commonly utilize a recursive subspace identification algorithm with a dynamic classification method to conduct the identification process.
In offline strategies, the main approaches typically require a sliding window, which implies that consecutive switching times are separated by a specific minimum time, referred to as the dwell time. In [34], the identification problem is undertaken assuming that the number of modes and the switching times are known. Thus, the focus is on obtaining the transformations between the linear local models such that the models are characterized in the same state basis. In [16], two methods to embed the inputs and outputs in a given subspace are presented, and the data are segmented using the deterministic generalized principal component analysis algorithm.
An interesting approach is proposed in [30] and [31]. In these works, the authors use a change detection algorithm and a subspace technique to estimate the switching times and the local models in state-space form. The main difference between the approaches is that the method in [31] addresses the delay in the detection problem. A similar approach was proposed in [8], but there the changes among the local models are detected because the rank of the projected output increases when the sliding window contains mixed data from two successive models. An approach that does not assume a minimum dwell time is proposed in [3]. The authors present an idea to eliminate the unknown continuous state from the model equations. The identification process is carried out by classifying the data and updating the model parameters. However, a priori knowledge of the number and order of the submodels is necessary for this approach to work. This paper proposes a new framework, inspired by ideas from the EM algorithm [10] and clustering-based identification [11], to identify a switched linear system. Our approach consists of iteratively calculating the a posteriori probability density function (pdf) of the discrete and continuous models. To achieve these goals, we combine subspace identification methods, clustering, data classification, and hybrid stochastic filtering to segment the data into the different submodels and estimate the local models.
In contrast to conventional approaches in the literature, the proposed framework adopts an alternative methodology. It commences by using a classical subspace identification technique to identify the local models. This identification is performed on smaller data batches, selected using a sliding window. The next step involves classifying the estimated models and determining the number of local models using a robust clustering algorithm. Subsequently, data belonging to the same submodels are grouped, and a new identification process is initiated with more data to improve the parameter estimates. Hybrid stochastic filtering is employed to further reduce misclassifications. The process is then repeated, with the sampled dataset divided into smaller subsets and the sliding window applied to the segmented data to mitigate outliers. This iterative process continues until the model estimates converge. An essential feature of the algorithm is that combining matrices from different systems is made possible by controlling the state basis in which the system matrices are computed.
The proposed identification technique builds upon earlier works presented in [21] and [22], and the present version includes significant new material to improve the accuracy and consistency of the identified models. Specifically, we have incorporated a new sliding window that moves from sample to sample, an algorithm to estimate the orders of the submodels, and an iterative identification procedure with a convergence check step that reduces misclassifications and improves the accuracy of the identified models. In addition, we compare this method with an existing method in the literature to demonstrate the features of the new methodology.
This paper is organized as follows. In Sect. 2, the switched linear state-space model identification problem is formulated. Section 3 describes the proposed framework and presents the theoretical foundations involved. Section 4 presents the algorithm to be compared with our approach. Section 5 contains numerical results confirming our method's potential, and the conclusions are presented in Sect. 6.

Problem statement
The discrete-time switched linear systems used in this work are described using the following state-space model:

x_{k+1} = A_{m_k} x_k + B_{m_k} u_k + w_k,
y_k = C_{m_k} x_k + D_{m_k} u_k + v_k,        (1)

where k ∈ ℕ is the k-th sample, the discrete state of the system is represented by m_k ∈ 𝕄 ≜ {1, 2, …, M}, and M stands for the number of submodels or modes. The vectors u_k ∈ ℝ^{n_u}, y_k ∈ ℝ^{n_y} and x_k ∈ ℝ^{n_{m_k}} are, respectively, the input, output and continuous state of the system. The value n_{m_k} denotes the order of the submodel indexed by m_k. The matrices A_{m_k}, B_{m_k}, C_{m_k}, and D_{m_k} have appropriate dimensions and describe each linear dynamic. Additionally, w_k ∈ ℝ^{n_{m_k}}, with w_k ∼ N(0, Q_k), and v_k ∈ ℝ^{n_y}, with v_k ∼ N(0, R_k), represent the process and measurement noises, respectively. These are treated as sequences of uncorrelated white or colored noise.
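The switched model above can be simulated directly once the active mode sequence is known. Below is a minimal sketch with two first-order SISO submodels, so that A, B, C, and D reduce to scalars; all numerical values are illustrative and are not taken from the paper's examples.

```python
import random

# Two illustrative first-order SISO submodels of model (1); the scalar
# entries stand in for the matrices A_m, B_m, C_m, D_m.
MODES = [
    {"A": 0.9, "B": 1.0, "C": 1.0, "D": 0.0},  # mode 1
    {"A": 0.5, "B": 0.5, "C": 2.0, "D": 0.0},  # mode 2
]

def simulate(u, mode_seq, q=0.0, r=0.0, seed=0):
    """Return y_k for input u_k and known discrete state m_k; q and r
    are the standard deviations of the process and measurement noise."""
    rng = random.Random(seed)
    x, y = 0.0, []
    for uk, mk in zip(u, mode_seq):
        p = MODES[mk]
        y.append(p["C"] * x + p["D"] * uk + rng.gauss(0.0, r))
        x = p["A"] * x + p["B"] * uk + rng.gauss(0.0, q)
    return y

# a dwell time of 5 samples: mode 1 first, then mode 2
u = [1.0] * 10
m = [0] * 5 + [1] * 5
y = simulate(u, m)
```

With q = r = 0 the simulation is deterministic, which is convenient when checking an identification routine against known submodels.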
Problem 1: Considering system (1), the objective is to estimate the number of submodels M, the orders n_{m_k} of each local mode, the system's switching times, and a realization {Â_i, B̂_i, Ĉ_i, D̂_i}_{i=1}^M of each submodel from a dataset D = {u_k, y_k}_{k=1}^N generated by this system.

Assumptions: Problem 1 will be solved considering the following assumptions:

1. Each mode of the system is maintained for a minimum dwell time (τ_dwell) before transitioning to the next mode.
2. At each discrete time, the system operates within a single active local model.
3. The input sequence u_k persistently excites each system mode.
4. All local models are assumed to be observable and stable.
5. The matrices A_{m_k} are non-derogatory.
To address Problem 1, the unknown parameters can be treated as random vectors and represented by their joint probability density function (pdf). This approach enables the identification problem to be formulated as determining the conditional joint pdf of the parameters and the segmentation. The pdf is defined as

p({Θ_i}_{i=1}^M, {S_j}_{j=1}^N | D),        (2)

where Θ_i are the continuous model parameters, defined as the set {A_i, B_i, C_i, D_i}, and S_j is the segmentation map that indicates the corresponding model for each data pair (u_j, y_j), i.e., S_j : D → {1, …, M}, where S_j = i if the pair (u_j, y_j) was generated by submodel i. The notation in (2) can be relaxed to lighten the presentation along the paper, which implies that the terms {·}_{i=1}^M and {·}_{k=1}^N can be omitted in some cases. From the system (1) and the pdf (2), it is possible to obtain complete statistical information about the parameters and the segmentation map. Thus, the point estimates of the parameters and segmentation can be obtained as

({Θ̂_i}, {Ŝ_j}) = arg max p({Θ_i}, {S_j} | D).        (3)

In principle, this maximization can be solved using Bayes' rule. However, this calculation is a combinatorial problem because all possible mode sequences must be examined to find the optimal segmentation map and identify the respective parameter values [19]. This requirement is computationally infeasible for larger datasets.
The authors in [19] solve this problem by relaxing the pdf (2) until a practically implementable method is obtained. However, in that research, Juloski (2005) incorporates a certain level of knowledge about the system and focuses on analyzing piecewise ARX systems with models described in input-output form.
In this work, we consider an alternative approach inspired by ideas from the EM method [10], which is an iterative approach to find a local maximum of a function such as (2). Let us consider the pdf in (2), which is factored as

p(Θ_i, S_k | D) = p_{S_k}(S_k | Θ_i, D) p_{Θ_i}(Θ_i | D).        (4)

Thus, it is possible to formulate the parameter estimation problem as

(Θ̂_i, Ŝ_k) = arg max p_{S_k}(S_k | Θ_i, D) p_{Θ_i}(Θ_i | D).        (5)

Hence, the key idea behind the approach that we introduce consists of maximizing (5) using iterations that alternate between the maximization of p_{Θ_i}(Θ_i | D), which provides a current estimate of the submodels, and the maximization of p_{S_k}(S_k | Θ_i, D), which computes a new segmentation map (S_k). Then, this new segmentation map is used to find a new maximum of p_{Θ_i}(Θ_i | D). This procedure is performed repeatedly until the changes in the local models are no longer significant. This idea is similar to that used in the EM algorithm.
The procedure proposed in this paper to perform these tasks consists of seven steps:

1. Segmentation Data: Small datasets are formed by dividing the original dataset using a sliding window in time.
2. Model Parameter Identification: A model is identified for each small dataset using a subspace identification method that yields a state-space representation of the model.
3. Clustering: Using a data-clustering technique, the identified models are classified into several clusters. It is expected that the number of identified clusters will be equal to the number M of submodels in the system.
4. Update Model Parameters: The data that belong to the original dataset D are classified into the M identified clusters, and a new identification process is performed.
5. Update Segmentation Data: Misclassifications are reduced using the hybrid filtering algorithm. This algorithm uses the model parameters identified in the previous step and reclassifies the sampled data into the different modes of the system.
6. Convergence Check: Evaluate the local models. If the changes in these models are not significant (converged), then proceed to step 7; otherwise, return to step 1.
7. Final Identification: Once the measured data are correctly classified, a final identification process of the local models can be performed.
Steps 1 through 4 maximize the term p_{Θ_i}(Θ_i | D) of (5), whereas step 5 maximizes the term p_{S_k}(S_k | Θ_i, D). This method is detailed in Sect. 3. A flow diagram that clarifies the links among the seven steps is shown in Fig. 1.
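The alternation between the two maximizations can be illustrated on a deliberately simple 1-D problem in which each "submodel" is just a constant signal level; the subspace, clustering, and filtering machinery of steps 1-5 is replaced here by window means, rounding, and nearest-centre assignment, so the sketch shows only the shape of the iteration, not the actual method.

```python
def identify_switched_signal(data, w=4, max_iter=7):
    """EM-like alternation from the text on a toy 1-D problem: each
    'submodel' is a constant level, a window mean plays the role of an
    identified local model, and rounding plays the role of clustering."""
    seg, prev = None, None
    for _ in range(max_iter):
        # step 1: local datasets (whole signal on the first pass,
        # per-cluster data afterwards)
        if seg is None:
            lds = [data[i:i + w] for i in range(len(data) - w + 1)]
        else:
            groups = {}
            for x, s in zip(data, seg):
                groups.setdefault(s, []).append(x)
            lds = list(groups.values())
        # step 2: one "local model" (a mean) per local dataset
        thetas = [sum(ld) / len(ld) for ld in lds]
        # step 3: cluster the local models (by rounding)
        centers = sorted(set(round(t) for t in thetas))
        # steps 4-5: reassign every sample to the nearest model
        seg = [min(range(len(centers)), key=lambda i: abs(x - centers[i]))
               for x in data]
        # step 6: convergence check on the cluster centres
        if centers == prev:
            break
        prev = centers
    # step 7 (final identification) would refit each model on its samples
    return centers, seg

levels = [0.0] * 8 + [5.0] * 8      # two modes with dwell time 8
centers, seg = identify_switched_signal(levels)
```

On the first pass the mixed windows around the switch produce spurious "models"; after the samples are regrouped, these disappear and the centres stabilize, which is exactly the behaviour the framework relies on.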

The new identification framework
This section presents the main contribution of this work. The proposed approach allows the determination of the number and orders of the local models and the estimation of their parameters. Additionally, the algorithm provides an estimate of the system's switching instants. The proposed procedure consists of seven steps, which are detailed below.

Step 1: Segmentation data
The methodology starts with the creation of small datasets to identify local models. These sets, called local datasets (LDs), are generated by dividing the original dataset into sequential smaller sets using a sliding time window.
There is no a priori knowledge regarding the segmentation map in the first iteration. Therefore, the sliding window is applied to the entire set of sampled data. To better describe this process, let us represent the time indices of the original dataset as

T = {1, 2, …, N},        (6)

where T is the time horizon.
The sliding window divides the set T into (N − W) sequential smaller sets {T_j}_{j=1}^{N−W} that contain W elements each, such that

T_j = {j, j + 1, …, j + W − 1}.        (7)

The time indices in each set T_j are used in the formation of the small datasets, i.e., LD_j = {u_k, y_k}_{k ∈ T_j}. LDs that contain data from a single submodel are called pure LDs. On the other hand, mixed LDs contain data generated by different local models. Many mixed LDs can hinder identification and result in less accurate models. The information about the segmentation map obtained at each iteration is used to form new smaller datasets to reduce the formation of mixed LDs. After the first iteration, there is an estimate of the segmentation map, and the data points are classified into clusters. This classification is performed in step 5, which is described below.
Thus, after the first iteration, the sliding window is separately applied to the different sets of clustered data. The building process of LDs continues as previously described. If there are errors in the data classification, mixed LDs are formed. However, these mixed sets are expected to diminish with each iteration performed.
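The window construction and the pure/mixed distinction can be sketched as follows; the index convention (W-element windows T_j advancing one sample at a time) follows the text, while `is_pure` assumes the true mode sequence is known, which is only for illustration.

```python
def local_datasets(n_samples, w):
    """Sliding-window time-index sets T_j = {j, ..., j+w-1}; the window
    advances one sample at a time, giving n_samples - w sets as in the
    text (0-based indices are used here)."""
    return [list(range(j, j + w)) for j in range(n_samples - w)]

def is_pure(t_j, mode_seq):
    """An LD is pure when all its samples come from a single submodel."""
    return len({mode_seq[k] for k in t_j}) == 1

windows = local_datasets(10, 4)
modes = [0] * 5 + [1] * 5          # one switch at k = 5
purity = [is_pure(t, modes) for t in windows]
```

Every window straddling the switch is mixed, which is why the number of mixed LDs grows with the number of mode transitions.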
Determining the sliding window size, denoted by W, is crucial to the identification process. The window size must be sufficiently large to enable accurate model identification while having a duration that is less than or equal to the dwell time of the system, i.e., W ≤ τ_dwell. However, determining the appropriate window size remains challenging for approaches that use subspace identification methods to identify hybrid systems. Although there are some studies on this topic in the literature, determining the necessary and sufficient bounds for the identification window size remains unresolved. Thus, this paper will not address this problem.

Step 2: Model parameter identification
This step involves identifying the local models using subspace identification techniques. Specifically, for each local dataset LD_j, a model is identified and represented in state-space form.
Fig. 1: The proposed framework presented as a flow diagram. Each procedure step is enclosed within a box, and labels on the edges describe the input and output of each step.

Each identified local model is represented by its Markov parameter vector θ_j, defined as

θ_j = [vec(D_j)^T  vec(C_j B_j)^T  vec(C_j A_j B_j)^T  ⋯  vec(C_j A_j^{p−2} B_j)^T]^T,        (8)

where vec(·) is the vectorization operator and p is the number of Markov parameters included.
These vectors contain all monitored parameters, their dimensionality can be adjusted based on the application's complexity [29], and they are used as observations for the clustering algorithm. One of the advantages of using these vectors is that the dimensionality of the vectors θ_j is maintained even for local models of different orders. As demonstrated in [15], while the number of Markov parameters can be chosen to match the complexity of the application, there exists a theoretical value of p = n_{m_k} + 1 that is adequate for accurately recovering the system matrices from the Markov parameters.
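The construction of θ_j from an identified realization can be sketched as below; plain-list matrix helpers keep the example dependency-free, and the second-order SISO values are illustrative.

```python
def matmul(X, Y):
    """Plain-list matrix product of X (m x k) and Y (k x n)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def vec(M):
    """Column-stacking vectorization vec(M)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def markov_vector(A, B, C, D, p):
    """theta = [vec(D)^T, vec(CB)^T, vec(CAB)^T, ...]^T with p Markov
    parameters, as used for the clustering observations."""
    params, CAk = [D], C
    for _ in range(p - 1):
        params.append(matmul(CAk, B))
        CAk = matmul(CAk, A)
    return [x for M in params for x in vec(M)]

# an illustrative second-order SISO submodel; with p = 3 the vector has
# three entries (D, CB, CAB), matching the paper's 3D visualization
A = [[0.5, 0.1], [0.0, 0.3]]
B = [[1.0], [1.0]]
C = [[1.0, 0.0]]
D = [[0.0]]
theta = markov_vector(A, B, C, D, p=3)
```

For a SISO model each Markov parameter is a scalar, so p = 3 yields a point in ℝ³, which is what makes the 3D cluster plots in the examples possible.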
To identify the local models, any identification technique that provides a state-space representation of the model can be used. This work uses a subspace identification method known as the PO-MOESP algorithm [35], a simple and efficient method that can address nonwhite process and measurement noises. Additionally, the PO-MOESP method was combined with the methodology proposed in [26], which allows the estimated matrices to be obtained in canonical form. In this way, all identified matrices are represented using the same state basis. A summary of these methods is presented in [21].
The subspace identification method was also combined with an efficient algorithm proposed in [13] to estimate the orders of the systems using subspace methods.A concise overview of the methodology is provided below.For a more detailed understanding, the reader is advised to refer to the original work.

Order estimation
The estimation of the system order in subspace identification methods is usually performed by inspecting the nonzero singular values of specific matrices. However, inspecting these singular values is not trivial when a high noise level corrupts the system, the sampled data are short, or the real system is nonlinear, because then every singular value differs from zero. In such cases, it is possible to estimate the order as the number of dominant singular values. However, these dominant singular values may not be evident.
Several works in the literature address the order estimation of linear systems represented in state space, such as [5,13] and references therein. This work used the approach developed in [13]. This method, called the NIDC criterion, uses the information contained in the estimated singular values of the system, where the analysis of the largest neglected singular value in the SVD step is carried out. This approach gives good results even for relatively small datasets. The algorithm is available in a Matlab toolbox for time series modeling called E4 [13], and the criterion is defined as

NIDC(n) = log(σ_{n+1}) + d(n) H(T, s),        (9)

where σ_{n+1} is the (n+1)-th singular value, d(n) = 2n·n_u denotes the number of parameters, n = 0, 1, …, s − 1 are the different orders evaluated, and H(T, s) is a penalty function defined as H(T, s) = exp(−2 T^{−0.9} s^{1.5}). The parameter s is the number of block rows in the block-Hankel matrices, and T is the number of data points used in the analysis. In (9), σ denotes the singular values of W_1 Y_f W_2, where Y_f is the block-Hankel matrix of future outputs, and W_1 and W_2 are two weighting matrices defined in [13]. The argument that minimizes (9) is the estimated order of the system.
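An order-selection rule in the spirit of the criterion above can be sketched as follows. The penalty uses the terms quoted in the text (d(n) = 2n·n_u and H(T, s) = exp(−2 T^{−0.9} s^{1.5})), but the exact NIDC expression should be taken from [13]; this is an illustrative reconstruction, not the toolbox implementation.

```python
import math

def select_order(singular_values, n_u, T, s):
    """Score each candidate order n by the largest neglected singular
    value sigma_{n+1} plus a complexity penalty d(n) * H(T, s), and
    return the minimizing order (NIDC-style criterion)."""
    H = math.exp(-2.0 * T ** -0.9 * s ** 1.5)
    best_n, best_score = 0, float("inf")
    for n in range(len(singular_values)):       # n = 0, 1, ..., s-1
        sigma_next = singular_values[n]         # sigma_{n+1}
        score = math.log(sigma_next) + 2 * n * n_u * H
        if score < best_score:
            best_n, best_score = n, score
    return best_n

# two dominant singular values followed by a noise floor
svals = [10.0, 4.0, 0.05, 0.04, 0.03]
n_hat = select_order(svals, n_u=1, T=200, s=5)
```

The log term rewards neglecting small singular values, while the penalty grows with the candidate order, so the minimum lands at the knee between the dominant values and the noise floor.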
However, in the framework proposed in this paper, the orders of the local models are only estimated in the final identification of the models (step 7), for two reasons. First, this prevents variations in order estimation from causing large dispersion in the estimates of the local models, which would reduce the performance of the clustering algorithm. Second, the hybrid filtering algorithm in step 5 requires that the local models have equal orders. Thus, step 2 applies a fixed order to all models. Because the orders are unknown a priori, we recommend that this number be equal to or larger than the expected maximum order.

Step 3: Clustering
Each vector of Markov parameters (θ_j) identified in the previous step represents an observation sent to the clustering algorithm for classification. In this way, these vectors are separated into different clusters. The number of identified clusters is expected to equal the number M of submodels of the system. At this stage, it is important to use a noise-robust classification algorithm with low initialization sensitivity. For this reason, this work used the Robust Competitive Clustering Algorithm (RCA) proposed in [12].
The RCA methodology consists of minimizing the cost function presented in (10). This process is performed using robust statistical concepts, where clusters compete for points and, when a cluster's cardinality falls below a threshold, that cluster is discarded. The cost function is

J = Σ_{i=1}^C Σ_{j=1}^N u_{i,j}² ρ_i(d_{i,j}²) − α Σ_{i=1}^C [Σ_{j=1}^N w(d_{i,j}²) u_{i,j}]²,        (10)

where B = (β_1, …, β_C) represents the prototypes that characterize each of the C clusters, and u_{i,j} indicates the probability that point x_j belongs to cluster i. Concatenating the different values u_{i,j} in a single matrix defines the membership matrix U. The set of N vectors in an n-dimensional feature space is denoted by X = {x_j | j = 1, …, N}. The distance of each point x_j from prototype β_i is represented by d_{i,j}. The optimal number of clusters C is found by appropriately balancing the two cost function terms, which is carried out through the design parameter α. The effects of outliers are reduced in the first term through the robust loss function ρ_i(·), and these outliers are discounted in the second term when the cardinalities are calculated through the weighting function w(·).
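The competition mechanism can be seen in a stripped-down version of this cost, taking the robust loss ρ_i as the identity and the outlier weight w ≡ 1 (the non-robust special case); α trades the fit term against the cardinality term that lets clusters compete.

```python
def ca_cost(U, D2, alpha):
    """Simplified competitive-agglomeration cost: U[i][j] is the
    membership of point j in cluster i, D2[i][j] the squared distance
    between point j and prototype i, and alpha the design parameter."""
    C, N = len(U), len(U[0])
    fit = sum(U[i][j] ** 2 * D2[i][j] for i in range(C) for j in range(N))
    cardinality = sum(sum(U[i]) ** 2 for i in range(C))
    return fit - alpha * cardinality

# two well-separated points with crisp memberships
U = [[1.0, 0.0], [0.0, 1.0]]
D2 = [[0.0, 4.0], [4.0, 0.0]]
J_good = ca_cost(U, D2, alpha=0.5)
J_bad = ca_cost([[0.0, 1.0], [1.0, 0.0]], D2, alpha=0.5)  # flipped labels
```

Assigning each point to its nearest prototype lowers the fit term, while the cardinality reward encourages a few well-populated clusters, which is how superfluous clusters shrink and get discarded.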

Step 4: Update model parameters
Identifying the number of submodels allows for a subsequent refinement of the model parameter estimates through a new identification process. It could be contended that this step is not entirely necessary, as the center of each cluster from the previous step already estimates the Markov parameter vector. However, these estimates may not be accurate, particularly in scenarios with numerous mode transitions resulting in many mixed LDs. Moreover, the data used in the segmentation process are obtained from identification within a small time window with limited data points, thereby making the identified partial models susceptible to significant effects of measurement and process noise, compromising their accuracy. Thus, identifying which points of the original dataset were generated by the same local model is important to increase the accuracy of the identification process. This is a classification process, where the available data points are classified into clusters {C_i}_{i=1}^M such that (u_k, y_k) ∈ C_i if and only if (u_k, y_k) is attributed to the i-th mode.
To accomplish this task, the membership matrix U provided by the RCA is used. This matrix indicates the probability that point x_j belongs to cluster i. The clusters identified by the RCA are represented by {C_i}_{i=1}^M. Each vector θ_j is assigned to the cluster to which it most likely belongs. Then, the set of points (u_k, y_k) that generated each submodel θ_j is assigned to the same cluster. Once the entire original dataset is classified, it is possible to identify the local models. An approach identical to the previously described subspace identification approach is used to obtain the new model estimates in state-space form.
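The window-to-sample label propagation can be sketched as below; the membership matrix U plays the role of the RCA output, and resolving samples covered by windows with conflicting labels via a majority vote is an assumption of this sketch (the text does not specify a tie rule).

```python
def classify_samples(U, windows, n_samples):
    """Assign each window model to its most probable cluster (argmax of
    the membership matrix column), then let every sample inherit the
    label of the windows covering it by majority vote."""
    votes = [dict() for _ in range(n_samples)]
    for j, t_j in enumerate(windows):
        label = max(range(len(U)), key=lambda i: U[i][j])
        for k in t_j:
            votes[k][label] = votes[k].get(label, 0) + 1
    return [max(v, key=v.get) for v in votes]

# three windows of two samples each, two clusters
windows = [[0, 1], [1, 2], [2, 3]]
U = [[0.9, 0.2, 0.1],     # memberships in cluster 0
     [0.1, 0.8, 0.9]]     # memberships in cluster 1
labels = classify_samples(U, windows, 4)
```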

Step 5: Update segmentation data
The classification described in the previous step has one main drawback: all data that belong to the same local dataset are assigned to the same cluster. However, there are errors in the classification of the points when the local datasets contain batches of data from two successively activated models. These errors may increase when there are many mode transitions.
To reduce the misclassification, the segmentation data must be updated, with each data pair (u_k, y_k) evaluated individually and assigned to the submodel that generated that pair. This task is treated as a hybrid state estimation problem, which involves estimating both the discrete and continuous states of a hybrid system at a given time. However, here the focus is solely on estimating the discrete states. To achieve this objective, the Interacting Multiple Model (IMM) algorithm [7] is employed, which utilizes a bank of Kalman filters (KFs). Notably, this algorithm seamlessly integrates measurement compatibility tests to automatically determine the most suitable set of measurement equations at each time step without necessitating additional verification.
Therefore, this algorithm uses the model estimates obtained in the previous step to estimate the discrete states. Thus, a new classification process can be performed, where the available data points are assigned to clusters {C_i^n}_{i=1}^M such that (u_k, y_k) ∈ C_i^n if and only if (u_k, y_k) is attributed to the i-th mode. The superscript n in the notation was added to differentiate this classification from the one performed in step 4. It is important to mention that the estimates of the switching times of the system are obtained through the estimation of the discrete states in the IMM algorithm. A summary of the IMM algorithm can be found in [22].
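The discrete-state part of the IMM recursion can be sketched as a mode-probability update: probabilities are mixed through a Markov transition matrix and reweighted by per-mode measurement likelihoods (which in the full algorithm come from the bank of Kalman filters, omitted here).

```python
def update_mode_probs(mu, Pi, likelihoods):
    """One IMM-style discrete-state update: mu is the prior mode
    probability vector, Pi[i][j] the transition probability from mode i
    to mode j, and likelihoods[j] the measurement likelihood under
    mode j at the current sample."""
    M = len(mu)
    mixed = [sum(Pi[i][j] * mu[i] for i in range(M)) for j in range(M)]
    post = [mixed[j] * likelihoods[j] for j in range(M)]
    s = sum(post)
    return [p / s for p in post]

mu = [0.5, 0.5]
Pi = [[0.9, 0.1], [0.1, 0.9]]       # sticky modes encode the dwell time
mu_new = update_mode_probs(mu, Pi, likelihoods=[0.8, 0.2])
```

Assigning each sample to the mode with the highest posterior probability yields the updated segmentation, and the switching times are read off wherever that argmax changes.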

Step 6: Convergence check
This step is responsible for checking the convergence of the algorithm. In our context, convergence may be simply stated as the condition that the model parameters do not suffer large variations between two successive iterations. This variation is measured using the Euclidean distance between the centers of the clusters obtained in step 3 in two successive iterations. The algorithm is considered to have converged when these distances are smaller than a certain threshold.
However, the proposed approach does not guarantee the convergence of the algorithm. Thus, if the chosen threshold is overly small or the noise level in the system is overly high, the algorithm may not converge, and the number of iterations can become notably large. To avoid excessive computation, an additional stopping criterion was used: the algorithm cannot exceed a predefined maximum number of iterations.
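The convergence test can be sketched as below; matching cluster centres by index across iterations is an assumption of this sketch (in general the centres may need to be paired first).

```python
import math

def converged(prev_centers, centers, tol):
    """True when every cluster centre moved less than tol (Euclidean
    distance) between two successive iterations."""
    if len(prev_centers) != len(centers):
        return False                  # the number of clusters changed
    return all(math.dist(p, c) < tol
               for p, c in zip(prev_centers, centers))

moved_little = converged([(0.0, 0.0), (5.0, 5.0)],
                         [(0.05, 0.0), (5.0, 4.96)], tol=0.1)
moved_far = converged([(0.0, 0.0)], [(1.0, 0.0)], tol=0.1)
```

In the full algorithm this check is combined with the iteration cap, so the loop stops at whichever condition is met first.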
After reaching convergence or the maximum number of iterations, the algorithm proceeds to step 7 utilizing the last performed classification, denoted as {C_i^n}_{i=1}^M. Otherwise, it returns to step 1.

Step 7: Final identification
The final identification of the local models is performed using the last obtained classification ({C_i^n}_{i=1}^M). To accomplish this task, the same subspace identification approach described previously is used. The proposed algorithm is summarized in Algorithm 1.

Pekpe's algorithm
Among the different offline approaches to hybrid system identification using models described in state-space form in the literature, only [8,16,30], and [31] perform the identification in the same context considered in this work, i.e., they identify the same parameters under identical assumptions. Of these approaches, only the method proposed in [31] addresses the delay in the detection problem. Therefore, this method was compared with the framework proposed in this paper. This comparison is performed in Sect. 5. The algorithm in [31] was developed by K. M. Pekpe, G. Mourot, K. Gasso, and J. Ragot; it is denoted as Pekpe's algorithm in the remaining text.
Pekpe's algorithm is based on a subspace identification technique. The principle is to use a time-sliding window to divide all sampled data into small datasets. Next, a subspace method is applied to each of these small datasets. However, the detection process suffers from a delay due to the statistical tests. Thus, the data immediately preceding the switching times are considered doubtful and are classified only after the Markov parameters have been estimated. A merging algorithm, which uses Markov parameter comparison, is applied to combine similar local models. Subsequently, the parameters of the merged models are estimated, and the process is repeated iteratively until the number of models converges. This methodology is briefly described here. For a more comprehensive elucidation, the reader is referred to the primary source publication.
The initial step in the procedure involves transforming the system (1) into a function dependent on the Markov parameters and the input block-Hankel matrix:

ȳ_{j,k,L} = M_{j,s}^d U_L + M_{j,s}^v V_{j,L} + w̄_{j,k,L},        (12)

where the subscript j indicates that the j-th local model is active. The sliding window size is denoted by the parameter L. The matrices U_L and V_{j,L} are block-Hankel matrices. The matrix ȳ_{j,k,L} is the matrix of outputs, defined as ȳ_{j,k,L} = [y_{k−L+1}  y_{k−L+2}  ⋯  y_k], and w̄_{j,k,L} is the matrix of measurement noises, which is similarly defined.
The matrices M_{j,s}^d ∈ ℝ^{n_y × n_u s} and M_{j,s}^v ∈ ℝ^{n × n_u s} represent the Markov parameters of the deterministic and stochastic components of the j-th local model, respectively. The deterministic Markov parameters are

M_{j,s}^d = [D_j  C_j B_j  C_j A_j B_j  ⋯  C_j A_j^{s−2} B_j],

and M_{j,s}^v is defined analogously with respect to the noise terms. To eliminate the influence of the input terms from (12), the row space of the matrix ȳ_{j,k,L} is projected onto the orthogonal complement of the row space of the input block-Hankel matrix.
The operator Π_{(U_L)^⊥} realizes this orthogonal projection and is calculated as

Π_{(U_L)^⊥} = I − U_L^T (U_L U_L^T)^{−1} U_L.

The resulting residual vector is given by

ε_{j,k} = ȳ_{j,k,L} Π_{(U_L)^⊥}.        (16)

The authors in [31] demonstrate that the relation in (16) provides a residual vector that contains only noise terms and is a zero-mean Gaussian vector in the absence of model switching. The mean value of this vector is not null if a change occurs. Therefore, changes in the average value of this vector can be detected through a χ² test. Let μ and R denote the mean and covariance matrix of the residual vector. Since the residual vector is Gaussian, the term γ_{j,k} = ε_{j,k}^T R^{−1} ε_{j,k} follows a χ² distribution with n_y degrees of freedom. In this way, no change is declared while γ_{j,k} is less than a given threshold.
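The deprojection idea can be illustrated in the simplest possible setting, a static SISO relation y_k = m·u_k with a one-row "Hankel matrix" (the real algorithm uses block-Hankel matrices and dynamic models): the residual is exactly zero while a single model is active and becomes nonzero when the window mixes data from two models.

```python
def residual(y_row, u_row):
    """Project the output row onto the orthogonal complement of the
    input row space (scalar-row case of Pi_{(U_L)^perp})."""
    uu = sum(u * u for u in u_row)
    proj = sum(y * u for y, u in zip(y_row, u_row)) / uu
    return [y - proj * u for y, u in zip(y_row, u_row)]

u = [1.0, -2.0, 3.0, -1.0]
y_same = [2.0 * ui for ui in u]                 # one model: gain m = 2
y_mixed = [2.0 * u[0], 2.0 * u[1],              # switch to gain m = 5
           5.0 * u[2], 5.0 * u[3]]

r_same = residual(y_same, u)
r_mixed = residual(y_mixed, u)
```

With noise added, r_same fluctuates around zero while r_mixed acquires a nonzero mean, which is exactly what the χ² statistic on the residual detects.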
The authors consider that a single submodel operates between a switching instant k₁ and the consecutive switching instant k₂. Thus, estimating the switching times allows classifying the collected data according to the system's operating mode. Only the points in the interval ]k₂ − t_max, k₂] are doubtful because of the detection delay (t_max). After the initial estimation of the Markov parameters of the submodels, these uncertain data are classified.
The identification of the Markov parameters of the local models is carried out in the next step of the algorithm. These parameters are obtained using weighted least squares, where the matrix of weights P_j ∈ ℝ^{L×L} of submodel j is a diagonal matrix with the weights {p_{j,k}}_{k=1}^L on its main diagonal, and p_{j,k} is the weight associated with y_{j,k}. The doubtful points are classified after the Markov parameters have been estimated: each dubious point is associated with the local model that presents the lowest output error.
Then, similar local models are merged. This is performed through a similarity test that calculates the distance between the Markov parameters representing each model. A χ² test is used to perform the similarity test: given a test threshold, two local models are considered similar if the distance between their Markov parameters is less than this threshold. The last step of the algorithm is the estimation of a realization (A_j, B_j, C_j, D_j) of each local model from its Markov parameters. The authors in [31] used the Eigensystem Realization Algorithm (ERA) to obtain a minimal and balanced realization [18].

Simulation results
In this section, the performance of the proposed framework is demonstrated using numerical examples. In the first example, a simple and didactic SISO switched system that allows a 3D representation was chosen to present tight and comprehensible results. In the second example, the algorithm proposed in this work is compared with Pekpe's algorithm.

Example 1
To demonstrate the procedure outlined in Sect. 3, the switched system employed in [2] and [22] was considered. This system was chosen because it is a simple and didactic example that allows interested readers to make meaningful comparisons with the two approaches previously proposed in the literature.
In this way, the switched system comprises four second-order Single Input Single Output (SISO) local models. The switching sequence was generated considering a dwell time of 120 samples and a random selection among the four possible operation modes of the system. An example of a switching signal is shown in Fig. 3. The system was persistently excited by a pseudorandom binary sequence (PRBS) with zero mean and unitary variance, considering N = 3200 samples. To evaluate the algorithm's efficiency, Monte Carlo simulations of size 100 were performed with different realizations of the input signal. Three simulation scenarios were created to evaluate noise robustness, with signal-to-noise ratios of 20, 30, and 40 dB achieved by adding zero-mean white noise. Subsequently, the identification algorithm was applied to each data sequence. In the identification process, the dataset was divided into two sets for each simulation: the first set, consisting of 2000 samples, was used for model identification, while the second set, with 1200 samples, was used to validate the estimated models.

The user-specified parameters of the algorithm are as follows. The sliding window W is 100, and a block size of 12 was used for the subspace identification algorithm. Because the system's order is unknown a priori, the order was set to 4 in step 2 of the algorithm (the real orders are estimated in step 7). The parameter p in (8) was set to 3 to allow a 3D visualization of the Markov parameters. The following values were used in the RCA: C = 6, = 0, = 10, 0 = 10^{-4}, k_0 = 5, c = 12, Δc = c_min = 1, and = 0.05·N_s. The convergence parameters of the algorithm were set as c = 0.1 and i = 7. The value of i was established empirically after noting that convergence was reached within seven iterations.
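For concreteness, the excitation setup described above (random mode changes with a fixed dwell time, plus a zero-mean, unit-variance binary excitation) could be generated along the following lines; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def make_switching_signal(n_samples, dwell, n_modes, rng):
    """Piecewise-constant mode sequence: a new mode is drawn uniformly
    at random every `dwell` samples (dwell time of 120 in Example 1)."""
    sigma = np.empty(n_samples, dtype=int)
    k = 0
    while k < n_samples:
        sigma[k:k + dwell] = rng.integers(n_modes)
        k += dwell
    return sigma

rng = np.random.default_rng(1)
sigma = make_switching_signal(3200, dwell=120, n_modes=4, rng=rng)

# PRBS-like excitation: a +/-1 sequence has zero mean and unit variance
u = rng.choice([-1.0, 1.0], size=3200)
```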
The criteria used to evaluate the algorithm were: detection of the switching instants, correct estimation of the number of modes and of the order of each local model, and the ability of the estimated models to provide a simulated output close to the real output of the system from the true input signal. The similarity between the real and simulated outputs on the validation data set was measured using the variance-accounted-for (VAF), defined as [34]

VAF = max{0, 1 − var(y − ŷ)/var(y)} × 100%,

where y and ŷ denote the real and simulated outputs, respectively, and var(·) represents the variance.
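The VAF criterion transcribes directly, assuming the standard definition from the subspace identification literature:

```python
import numpy as np

def vaf(y, y_hat):
    """Variance-accounted-for (%) between the real output y and the
    simulated output y_hat: max{0, 1 - var(y - y_hat)/var(y)} * 100."""
    return max(0.0, 1.0 - np.var(y - y_hat) / np.var(y)) * 100.0

# Toy usage: a perfect model explains all of the output variance
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * t)
```

A VAF of 100% indicates a perfect reconstruction; the clipping at zero prevents negative scores for models worse than predicting the mean.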
The results are presented in Table 1, which displays the VAF from 100 independent runs and the average data classification error, denoted as Error. The table also includes the number of instances in which the algorithm correctly determined the number of local models, denoted S_s, and the number of times the algorithm accurately estimated the order i of the system, denoted O_i, throughout the Monte Carlo simulations.
As seen in Table 1, the proposed algorithm is robust to noise and can correctly estimate the number of local models even when a high noise level affects the system. This good performance is mainly attributed to the knowledge of the segmentation map acquired along the iterations. Figure 2 illustrates this process by showing the result of the clustering algorithm in different iterations for SNR = 30 dB. It can be seen in this figure how the number of outliers is reduced in each iteration, which makes each cluster more evident. The classification of the data into the different submodels is realized with a relatively small error, as observed in Table 1 and Fig. 3. This small error allows accurate and consistent estimation of the final models. In fact, the algorithm yields an average VAF greater than 90% for the three noise levels considered, showcasing the good performance of the algorithm despite the substantial presence of noise. Figure 4 reinforces this by comparing the eigenvalues of the estimated models with the true values; the quality of the estimates is reasonable in all simulated cases. Based on the observations from Fig. 5, it can be inferred that the estimated models effectively reconstruct the true output signal. It is worth noting that the figure's observation window has been truncated for visualization purposes.
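Eigenvalues are a natural quantity for the comparison in Fig. 4 because they are invariant under the similarity transformation relating different state-space realizations. A minimal sketch of such a check (illustrative, not the authors' evaluation code):

```python
import numpy as np

def eigenvalue_error(A_true, A_est):
    """Largest gap between the sorted spectra of the true and estimated
    A matrices. Eigenvalues are invariant under a change of state-space
    basis, so the check is meaningful across different realizations."""
    ev_true = np.sort_complex(np.linalg.eigvals(A_true))
    ev_est = np.sort_complex(np.linalg.eigvals(A_est))
    return float(np.max(np.abs(ev_true - ev_est)))

# Toy usage: A_est is A_true expressed in a different basis via T
A_true = np.array([[0.5, 0.1], [0.0, 0.8]])
T = np.array([[1.0, 1.0], [0.0, 1.0]])
A_est = T @ A_true @ np.linalg.inv(T)
```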
The method chosen for the order estimation leads to reasonable results, as observed in Table 1. As expected, the results deteriorate with an increasing level of noise in the system. However, for the considered example, the incorrect order estimation did not significantly affect the performance of the estimated models.
In the performed simulations, the algorithm converged in all runs. However, this convergence is directly related to the choice of the parameters c and i.

Setting parameters and convergence issues
The proposed method has two limitations. The first is that it requires a large number of design parameters. However, the window size W and the number of block rows s are parameters common to all subspace identification methods; moreover, as previously discussed, defining a suitable value of W remains an unresolved issue in the literature. The largest number of parameters is related to the RCA algorithm. However, this algorithm is robust with respect to parameter initialization, as shown in [12]; in the many simulations we performed with different systems, all parameters were held constant, as described in Example 1. Finally, the problem of setting the IMM filter parameters is similar to that of adjusting a traditional KF. Therefore, the problem is relatively straightforward even though there are many parameters to adjust.
The second limitation is the lack of a convergence analysis. However, as highlighted by [1], this problem is one of many challenging issues that remain unresolved in the field of hybrid system identification. As described in [4], the main difficulty in guaranteeing the convergence of such algorithms is the strong coupling between mode identification and parameter updating.
In our method, although the convergence of the entire procedure is not theoretically proven, it is easily achieved in practice with a proper choice of the parameter c. In fact, in the presented results, the algorithm converged in all simulations. However, the parameter c plays an important role in convergence, convergence speed, and algorithm performance. Table 2 presents the algorithm's performance for different values of c. Decreasing the value of c improves the algorithm's performance but significantly increases the CPU time; in addition, the number of runs in which the algorithm converges is reduced. Therefore, the adjustment of c is a tradeoff between CPU time and performance. The estimated models are consistent even when the algorithm does not converge: in the simulations with c = 0.01, the identified models exhibited the best performance, as observed in Table 2. A proper choice of the parameter c allows one to empirically establish the value of i such that the largest number of convergences of the algorithm is obtained.
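The interplay between c and i described above amounts to a standard iterative stopping rule. The sketch below is a hypothetical abstraction, with `tol_c` and `max_iter` standing in for the roles of c and i; it captures only the stopping logic, not the paper's algorithm:

```python
def run_until_converged(step, tol_c, max_iter):
    """Generic stopping rule: call `step` (which returns the current
    data segmentation) until the fraction of points whose mode label
    changed between iterations drops below tol_c, or max_iter
    iterations are reached. Returns (labels, iterations, converged)."""
    prev = step()
    for it in range(1, max_iter):
        cur = step()
        changed = sum(a != b for a, b in zip(prev, cur)) / len(cur)
        if changed < tol_c:
            return cur, it + 1, True   # converged: segmentation is stable
        prev = cur
    return prev, max_iter, False       # hit the iteration cap

# Toy usage: a segmentation that stabilizes on the third call
it_seq = iter([[0, 0, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]])
labels, iters, ok = run_until_converged(lambda: next(it_seq),
                                        tol_c=0.1, max_iter=7)
```

A smaller `tol_c` demands a more stable segmentation before stopping, which mirrors the CPU-time versus performance tradeoff reported in Table 2.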

Example 2
In this example, our proposed algorithm was compared with Pekpe's algorithm, detailed in Sect. 4. The considered switched system is composed of three third-order MIMO submodels. An input signal with the same statistical properties as in the previous example was considered. The switching signal of the system is shown in Fig. 7. As in the previous example, one hundred independent runs were performed with different realizations of the input and output noise. Three noise levels were considered: SNR = 10, 20, and 30 dB. In each run, the first 1500 points are used to identify a model, and the last 900 points are used to validate the estimated model.
Because we aim to compare the procedures, the order of the submodels was considered known to avoid variations caused by the order estimation algorithm. Exhaustive simulations were executed to determine the optimal tuning parameters of both algorithms. For Pekpe's algorithm, the selected parameters were the test threshold 18.467 and t_max = 10; the other parameters were (i = 18, L_i = 100), (i = 9, L_i = 86), and (i = 8, L_i = 40) for the noise levels of 10, 20, and 30 dB, respectively. For our algorithm, the parameters were: The algorithms were evaluated and compared in various aspects, including accuracy, noise tolerance, estimated number of modes, data classification performance, and computation time. As an additional performance measure, the Mean Squared Error (MSE) was utilized, defined per output i as

MSE_i = (1/N) Σ_{k=1}^{N} (y_{i,k} − ŷ_{i,k})².

The results are summarized in Table 3. The elevated MSE values observed for both algorithms can be attributed to the substantial amplitude of the noise added to the output signal. As observed from Table 3, both algorithms display similar results for SNR = 30 dB; however, Pekpe's algorithm has a lower computational cost. When our algorithm is employed, the overall computation time increases by 4.44 s.
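The MSE criterion, computed per output channel so that per-output values such as MSE_2 can be reported, transcribes directly:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error per output channel:
    MSE_i = (1/N) * sum_k (y_{i,k} - yhat_{i,k})^2.
    Rows of y and y_hat are samples, columns are output channels."""
    return np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2, axis=0)

# Toy usage: N = 2 samples of a two-output system
y = np.array([[1.0, 2.0], [3.0, 4.0]])
y_hat = np.array([[1.0, 1.0], [3.0, 2.0]])
```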
However, our algorithm displays considerably better results when the noise level is increased. For SNR = 20 dB, our algorithm exhibits a 91% reduction in data misclassification. As a result, the estimated models are more accurate, with a 27.2% reduction in the MSE of output y_2 (MSE_2). When the noise level is increased to 10 dB, the MSE of output y_2 (MSE_2) is reduced by 30.1%, and the classification error is reduced by 84.7%. In addition, our algorithm produces smaller variances for the estimates, as observed in Fig. 6. The results in Table 3 demonstrate that our algorithm accurately recovers the true number of submodels across all noise levels. In contrast, Pekpe's algorithm only exhibits satisfactory results at an SNR of 30 dB. Figure 7 presents typical mode estimation results obtained using both algorithms.
The total computation time of our algorithm is increased by 656% in the worst case. Because the identification methods are offline, this increase in computation time is not considered a problem, and the increased performance justifies the use of our algorithm.

Conclusions
This paper presented and discussed a novel framework for identifying switched linear systems represented by state-space models from input-output data. The proposed approach employs an iterative method that involves data segmentation through subspace identification techniques, clustering, data classification, and hybrid stochastic filtering to partition the dataset among distinct submodels. This data segmentation is updated iteratively based on the posterior knowledge about the modes provided by the hybrid filtering algorithm.
The simulation results indicated the efficiency of the developed method. Compared to Pekpe's simple and efficient algorithm, our algorithm identified more accurate and consistent models that yielded a lower MSE when the system was corrupted with high noise levels. However, the price of the increase in accuracy was an increase in computational complexity. In the scenarios analyzed in this paper, our algorithm resulted in a maximum increase in computation time of 656%, compared to a best-case reduction of 91% in misclassification error and of 30.1% in modeling error, demonstrating a favorable tradeoff. In fact, this increase in computation time is not considered a restrictive problem because the identification methods are offline, and the increased performance justifies the use of our algorithm. Despite the promising results of the simulation examples, the lack of a formal convergence analysis may still be criticized. However, this problem is common to many algorithms in the field of hybrid system identification because the strong coupling between the model parameter estimation and the segmentation of the data into discrete states generally makes convergence analysis difficult. Future work will focus on investigating the mathematical analysis of the algorithm and finding both necessary and sufficient bounds on the identification window size.

Fig. 2 Clustering results in different iterations for SNR = 30 dB: a first iteration; b second iteration; c third iteration; d fourth iteration. The red crosses are the centers of each cluster. In this case, the algorithm converges in 4 iterations

Fig. 5 Validation data with: a SNR of 40 dB, where the VAF obtained in this realization was 98.96%; b SNR of 20 dB, where the VAF obtained in this realization was 90.17%

Fig. 6 Distribution of the VAF criterion over 100 independent runs of the identification algorithms. The distributions in blue refer to Pekpe's algorithm, and the plots in red refer to our algorithm

Table 1 Performance

Table 2 Convergence analysis under different values of the parameter c. The simulation is performed with SNR = 20 dB and i = 7