Second Order Blind Identification of Event Related Potentials Sources

Event-related potentials (ERPs) recorded on the surface of the head are a mixture of signals from many sources in the brain due to volume conduction. As a result, the spatial resolution of ERPs is quite low. Blind source separation can help to recover source signals from multichannel ERP records. In this study, we present a novel implementation of a method for decomposing multichannel ERPs into components, based on modeling the second-order statistics of ERPs. We also report a new implementation of the Bayesian Information Criterion (BIC), which is used to select the optimal number of hidden signals (components) in the original ERPs. We tested these methods using both synthetic datasets and real ERP data arrays. Testing has shown that the ERP decomposition method can reconstruct source signals from their mixture with acceptable accuracy even when these signals overlap significantly in time and in the presence of noise. The use of BIC allows us to determine the correct number of source signals at the signal-to-noise ratios commonly observed in ERP studies. The proposed approach was compared with conventionally used methods for the analysis of ERPs. It turned out that the new method makes it possible to observe phenomena that are hidden by other signals in the original ERPs. The proposed method for decomposing a multichannel ERP into components can be useful for studying cognitive processes in laboratory settings, as well as in clinical studies.


Introduction
Event-related potentials (ERPs) are widely used to study the cognitive functions of the human brain (Luck and Kappenman 2011). ERPs recorded on the surface of the head represent the electrical activity of the brain in response to stimuli or certain events, and reflect the course of various cognitive processes (operations).
The conventional approach to ERP research usually involves the following steps. A multichannel electroencephalogram (EEG) is recorded while a subject performs a specific task. This task consists of several hundred trials. In each trial, one or more stimuli are presented and, depending on these stimuli, the subject makes a specific response. Depending on the type of stimuli and the subject's response, the trials are divided into several categories (conditions). ERPs are calculated by averaging EEG epochs corresponding to trials of the same category. By comparing the ERPs under the different conditions determined by the task, it becomes possible to identify the modulations of brain signals associated with the cognitive processes (operations) of interest. Several subjects usually participate in such studies, and for each of them the individual ERPs are calculated separately. To reveal common features of brain signals, ERPs are averaged across all subjects and appropriate statistical analysis, such as cluster-based analysis (Maris and Oostenveld 2007), is performed.
ERPs provide excellent temporal resolution limited only by the EEG sampling rate. However, the conventional approach has a limitation. Because of volume conductance, the scalp-placed sensors record mixtures of signals from distant cortical sources. Therefore, the spatial resolution of ERPs is rather poor (Nunez and Srinivasan 2006).
Several approaches based on Blind Source Separation (BSS) techniques have been proposed to overcome this problem. Many of these methods focus on single-trial ERP analysis, in which hidden source signals are separated for each EEG record separately. Such studies often use general-purpose BSS algorithms. In particular, the popular information maximization (Infomax) algorithm was used to identify statistically independent EEG components (Jung et al. 2001; Makeig et al. 2002). As an alternative, the second-order blind identification (SOBI) algorithm was applied to separate low-correlated EEG components (Tang et al. 2006). A well-known tensor decomposition called parallel factor analysis (PARAFAC) has also been used to separate EEG components in the frequency domain (Miwakeichi et al. 2004; Mørup et al. 2006). Some other algorithms take into account the variability of ERP latency from trial to trial. These include differentially variable component analysis (dVCA) (Knuth et al. 2006), three-way shifted PARAFAC (ShiftCP) (Mørup et al. 2008) and Bayesian Estimation of ERP (BEEP) (Wu et al. 2014). While these approaches provide valuable information, they have a number of limitations. First, signals from hidden sources obtained for each subject individually cannot be quantitatively compared with each other, since they are estimated up to an arbitrary scale (Comon and Jutten 2010). To overcome this limitation, some authors propose to evaluate hidden signals for all EEG recordings simultaneously (Congedo et al. 2010; Ponomarev et al. 2014). Second, the original EEG serves as the input for these methods. Since ERPs are relatively small in comparison to the EEG, these methods primarily separate the hidden sources of EEG signals. However, the spatial localization of the sources of the initial EEG and the ERP is not necessarily the same. Therefore, signals from hidden sources of ERPs may not be separated.
On the other hand, it is well known that ERPs vary from person to person. Figure 1 illustrates this observation: the variability of the temporal patterns of the ERP can be described by variation in the amplitude, width, and peak latency of individual ERP waves. (In Fig. 1, dotted vertical lines mark the beginning and end of the first and second stimuli in the trial, with the beginning of the first stimulus at the origin of the x-axis; the lower panels show group-averaged ERPs.)
Several methods are proposed for separating signals from hidden sources of ERPs by modeling their inter-subject variability.
One such approach is to use PARAFAC (Field and Graupe 1991; Verleger et al. 2013; Ponomarev et al. 2019; Ponomarev and Kropotov 2021). However, this approach assumes that the hidden source signal waveforms are the same for all subjects, which is not true for all possible sets of ERP records (see, for example, Fig. 1).
An alternative approach, which does not assume the identity (up to magnitude) of signals from hidden sources, can be based on the analysis of second-order statistics. The basic ideas of this approach were proposed in our previous studies (Ponomarev and Kropotov 2013; Kropotov and Ponomarev 2015). However, the application of this approach to different task settings revealed a number of its limitations. In particular, the previously proposed implementation made it possible to acceptably separate signals from hidden sources of ERPs only in relatively simple cases, when these signals were observed in a short time interval (several hundred milliseconds) and the number of signals was relatively small. To overcome these limitations, in the present article we critically reconsider both the basic assumptions and the computer implementation of this method, building on recently proposed algebraic algorithms. We also test this new approach on synthetic data as well as real ERPs from a large group of healthy adults performing the cued GO/NOGO task, in which ERPs are observed over a relatively long period of time, i.e., for about two seconds.

Transient Signals Second Order Blind Identification (TSSOBI)
The BSS approach was used for the separation of latent (hidden) ERP components. Many of the well-known methods for extracting signals from multichannel EEG (such as Infomax, JADE, SOBI and many others; for a review, see the book edited by Comon and Jutten (2010)) are based on the assumption that these signals are stationary or piecewise stationary. This assumption is not suitable for ERPs, which by definition are transient responses. Following previously proposed ideas (for a review, see Yeredor 2010), we assumed that second-order statistics could be used to separate the hidden signals of transient responses. By analogy with the well-known SOBI method (Second Order Blind Identification), which is designed to analyze stationary signals (Belouchrani et al. 1997), we propose to call our method TSSOBI (Transient Signals Second Order Blind Identification).

TSSOBI Description
Let X_p ∈ R^{N×T} be the multichannel ERPs obtained in the p-th subject, p = 1, ⋯, P, where N is the number of electrodes and T is the number of time points. Group-averaged ERPs are calculated as follows:

X̄ = (1/P) Σ_{p=1}^{P} X_p. (1)

A covariance matrix of the multichannel ERPs for each pair of time indices t1 and t2 is given by the following formula:

C(t1, t2) = (1/(P − 1)) Σ_{p=1}^{P} (x_p(t1) − x̄(t1)) (x_p(t2) − x̄(t2))^T, (2)

where x_p(t) denotes the t-th column of X_p. Assume that the multichannel ERPs are described by the following model:

X_p = A S_p + E_p, (3)

where A ∈ R^{N×K} is the mixing matrix, S_p ∈ R^{K×T} are the hidden signals, E_p ∈ R^{N×T} is additive noise, K is the number of hidden signals, and N ≥ K. (Note that this model assumes that the mixing matrix is the same for all subjects, which is an approximation.)
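As an illustration (not the authors' code), the statistics of Eqs. (1)–(2) can be computed in a few lines of NumPy; the toy sizes and array names are ours. Note that the symmetry C(t1, t2)^T = C(t2, t1) follows directly from this definition:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T, P = 6, 2, 40, 30      # electrodes, hidden signals, time points, subjects

# Hypothetical ground truth for the model X_p = A S_p + E_p (noise omitted here)
A = rng.standard_normal((N, K))                 # mixing matrix
S = rng.standard_normal((P, K, T))              # per-subject hidden signals
X = np.einsum('nk,pkt->pnt', A, S)              # multichannel ERPs of all subjects

X_bar = X.mean(axis=0)                          # Eq. (1): group average

def cov(t1, t2):
    """Eq. (2): cross-time covariance of subject deviations from the group mean."""
    d1 = X[:, :, t1] - X_bar[:, t1]             # (P, N) deviations at t1
    d2 = X[:, :, t2] - X_bar[:, t2]
    return d1.T @ d2 / (P - 1)

C = cov(3, 7)                                   # one matrix of the set fed to AJD
```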
Next, assume that the deviations of the hidden signals from their group average are approximately mutually uncorrelated. This means that

(1/(P − 1)) Σ_{p=1}^{P} (s_p(t1) − s̄(t1)) (s_p(t2) − s̄(t2))^T = Λ(t1, t2), (4)

where the Λ(t1, t2) ∈ R^{K×K} are diagonal matrices. Substituting the model (3) into Eq. (2) and neglecting the noise term, we obtain

C(t1, t2) ≈ A Λ(t1, t2) A^T. (5)
Then, to obtain A, we need to solve the joint diagonalization (JD) problem for the set of matrices C(t1, t2), t1, t2 = 1 ⋯ T. However, an exact solution to this problem may not exist. In that case we need to estimate a diagonalizing matrix B ∈ R^{K×N} of rank K such that the matrices

B C(t1, t2) B^T, ∀ t1, t2 = 1 ⋯ T, (6)

are as diagonal as possible, i.e., we need to perform an approximate joint diagonalization (AJD) of the set of matrices. Then A can be obtained as the Moore-Penrose pseudo-inverse of B (A = B^+, B = A^+).
When A and B are obtained, the signals and noise are estimated as follows:

Ŝ_p = B X_p, (7)
Ê_p = X_p − A Ŝ_p. (8)

Usually ERPs are recorded under multiple conditions. In this case, we can assume that the mixing matrix does not depend on the conditions. Then the estimation of the mixing matrix is similar to that described above, but with some important differences related to the calculation of the set of covariance matrices. These calculations are performed for each condition separately, with the subsequent inclusion of the obtained covariance matrices in the joint set.
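In the noise-free case, the pseudo-inverse relation between A and B recovers the hidden signals exactly, which a short NumPy sketch (illustrative names) makes concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T = 6, 2, 50

A = rng.standard_normal((N, K))      # suppose the mixing matrix is already estimated
B = np.linalg.pinv(A)                # diagonalizing matrix as pseudo-inverse, B = A^+
S_true = rng.standard_normal((K, T))
X = A @ S_true                       # noise-free ERPs of one subject

S_hat = B @ X                        # recovered hidden signals
E_hat = X - A @ S_hat                # residual treated as additive noise
```

Since A has full column rank, B A equals the identity, so S_hat matches S_true and the residual vanishes; with noise present, E_hat carries the part of the data outside the signal subspace.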
Remark 1 Alternatively, ERPs obtained under multiple conditions can be concatenated into a single time series (Ponomarev and Kropotov 2013).Then covariance matrices are calculated not only for time points and within one condition, but also between conditions.This approach seems to be equivalent to the one described above, but the number of matrices for AJD in this case grows proportionally to the square of the number of conditions, which leads to a rather noticeable increase in computational complexity.

TSSOBI Limitations
1. This model assumes that the mixing matrix is the same for all subjects, which is an approximation.
2. This method does not work if there is no signal difference between the subjects. This follows from the fact that, in the absence of noise, the covariance matrices are zero if the individual signals do not differ from their group mean. In other words, the presence of inter-subject variability in ERPs is a necessary condition for separating hidden source signals.
3. Following Theorem 7.1 (Yeredor 2010), it is easy to show that A and S are identifiable if the variability of sources between subjects is such that no source has a covariance that is a scaled version of another's. For example, sources will have indistinguishable (up to scale) covariances if their signals have the same waveform and only the amplitude changes between subjects. Therefore, signals from sources located in the sensory areas of the left and right hemispheres, corresponding to the early peaks of the ERPs, seem to be indistinguishable. See "Comparison with the Previous Implementation" for a discussion.

AJD Procedure
The requirements for the AJD procedure are derived from the ERP properties.
1. The elements of the covariance matrices C(t1, t2) are real values.
2. In cases where t1 ≠ t2, these covariance matrices are non-symmetric.
3. Topographies of hidden ERP signals (columns of matrix A) usually overlap in space and are not orthogonal.
4. The number of signals (K) is usually less than the number of electrodes (N). This case is referred to as a low-rank AJD setting.

Therefore, for ERPs, it is preferable to perform low-rank non-orthogonal AJD of real non-symmetric matrices.
Remark 2 In the case K < N, most applications include a dimension reduction procedure based on the SVD (singular value decomposition). Usually, the SVD is performed on the initial data array as follows: X = U Σ V^T, where U ∈ R^{N×N} is a unitary matrix, Σ ∈ R^{N×TP} is a diagonal matrix with non-negative real numbers on the diagonal (the singular values), and V ∈ R^{TP×TP} is a unitary matrix. Further, by selecting the first K eigenvectors in U (corresponding to the largest singular values) as a new basis, the signal space is reduced to K dimensions. The noise space is spanned by the remaining N − K eigenvectors. In effect, it is assumed that the first K eigenvectors correspond to the signal space and the remaining eigenvectors relate to the noise (for example, see Rasmussen et al. 2008; Ponomarev and Kropotov 2013). However, this assumption is not necessarily true in the case of ERPs. In particular, the signal space and the noise space may overlap. As a result, the estimates of both the mixing matrix and the source signals may be biased. Moreover, low-power "ghost" components with noise-like signals are often observed in this case. Therefore, low-rank AJD is preferred over the SVD-based dimensionality reduction that was used in the previous implementation of this method.
Remark 3 In the case of exact JD, the problem (6) for non-symmetric matrices can be reduced to a symmetric one. Note that C^T(t1, t2) = C(t2, t1) in accordance with Eq. (5). Therefore, problem (6) reduces to the joint diagonalization of the symmetric matrices

C(t1, t2) + C(t2, t1), ∀ t1 ≤ t2,

and in this case we can use a low-rank non-orthogonal AJD algorithm for symmetric matrices, which has lower computational complexity (see Supplementary materials 1). In what follows, we will refer to this approach as the TSSOBI-symmetric algorithm. However, in the presence of noise, the last equality holds only approximately, which can lead to biased estimates of the matrix A.
A low-rank non-orthogonal AJD algorithm can be developed based on QR or LU matrix factorization. We took the existing QR- and LU-based Jacobi-like algorithms for non-orthogonal JD (Afsari 2006; Wang et al. 2012; Gong et al. 2015; Cheng et al. 2018) and modified them following the ideas proposed by Kuleshov et al. (2015). As a result, we obtained six Jacobi-like algorithms (three QR-based and three LU-based) for low-rank non-orthogonal AJD of real non-symmetric (as well as symmetric) matrices. We implemented these algorithms in C++, tested them using simulated data, and chose the one with the best performance, which we called PLRQRJD (Parallel Low-Rank QR-based Joint Diagonalization) and used further. The description of all six AJD algorithms and the testing results are given in Supplementary materials 1.
Jacobi-like algorithms for JD are iterative procedures that minimize a special cost function. In the presence of noise, this cost function can have several local minima, and the iterative procedure can converge to any of them, depending on the starting point. To avoid this ambiguity, we calculate the starting point as follows. First, we compute all covariance matrices C(t1, t2) for t1 = t2, ∀ t1, t2 = 1 ⋯ T, according to Eq. (2) and average them. For this averaged covariance matrix C̄, which is symmetric positive semi-definite, we calculate the eigendecomposition C̄ = U Λ U^T. Then, we transform the original data into the eigenvector space as X̃_p = U^T X_p, calculate the set of covariance matrices for X̃_p according to Eq. (2), and estimate B̂ by performing AJD with the starting point B̂ = I_N, where I_N is the identity matrix of size N. The final diagonalizing matrix is computed as B = B̂ U^T.
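The starting-point computation can be sketched in NumPy as follows (toy data, illustrative names; the AJD iterations themselves are elided):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, P = 5, 60, 40
X = rng.standard_normal((P, N, T))       # toy multichannel data, one record per subject
X_bar = X.mean(axis=0)
D = X - X_bar                            # deviations from the group average

# Average of the zero-lag covariance matrices C(t, t), t = 1..T
C_avg = np.einsum('pnt,pmt->nm', D, D) / ((P - 1) * T)

# Eigendecomposition of the symmetric positive semi-definite average
evals, U = np.linalg.eigh(C_avg)

# Transform the data into the eigenvector space before running AJD
X_tilde = np.einsum('nm,pmt->pnt', U.T, X)

# AJD would start from the identity matrix; the final diagonalizer
# combines the AJD result with the eigenvector basis: B = B_hat @ U.T
B_hat = np.eye(N)
B = B_hat @ U.T
```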

Dimension Reduction
A dense grid of electrodes is often used in modern ERP research, that is, N is large (one or several hundred), while the effective number of ERP components seems to be relatively small (K ∼ 10). In such cases, it seems reasonable to reduce the dimensionality of the original data. This can be done if we use the previously obtained eigendecomposition and select N′ (N′ > K) eigenvectors corresponding to the largest eigenvalues as the eigenvector space with a basis Û of size N × N′. The simplest way to estimate N′ (perhaps not optimal) is to select the eigenvectors describing the main part of the data variance. In other words, N′ is the minimum value for which the following inequality holds:

Σ_{n=1}^{N′} Λ_nn / Σ_{n=1}^{N} Λ_nn ≥ threshold,

where the Λ_nn are sorted in descending order and the value of the threshold is chosen close to 1.
Then the estimation of the mixing matrix is performed as follows. An appropriate threshold is chosen so that the following inequalities hold: N > N′ > K. Dimension reduction is performed as described above. The matrix B̂ ∈ R^{K×N′} is calculated using the low-rank AJD algorithm. The final diagonalizing matrix is computed as B = B̂ Û^T. It is clear that the use of dimensionality reduction results in a better SNR at the input of the AJD algorithm, faster computations, and less sensitivity to round-off errors (due to the smaller dimension of the problem). However, if N′ is too close to K, the accuracy of the mixing matrix estimate may deteriorate (see Remark 2). Therefore, the threshold should be chosen close enough to 1 (for example, threshold = 0.999).
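Selecting N′ from the eigenvalue spectrum can be sketched as follows (hypothetical helper name; the two-bump spectrum is a toy example):

```python
import numpy as np

def reduce_dim(evals, threshold=0.999):
    """Pick the smallest N' whose leading eigenvalues explain `threshold`
    of the total variance (eigenvalues are sorted in descending order)."""
    evals = np.sort(np.asarray(evals))[::-1]
    frac = np.cumsum(evals) / evals.sum()
    return int(np.searchsorted(frac, threshold) + 1)

# Toy spectrum: two strong "signal" eigenvalues followed by a weak "noise" tail
evals = np.array([10.0, 5.0, 0.004, 0.003, 0.002, 0.001])
n_prime = reduce_dim(evals, threshold=0.999)
```

Raising the threshold keeps more eigenvectors: with threshold = 0.999 the toy spectrum gives N′ = 2, while a stricter threshold pulls in part of the noise tail, illustrating the trade-off the text describes.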

Selection Time Interval for TSSOBI and Optimization of Computations
As we will show below, the accuracy of estimating the mixing matrix critically depends on the signal-to-noise ratio. Intuitively, the impact of noise on the matrix estimate is smaller when the ERP variance between subjects is larger. Therefore, if the noise is assumed to be stationary, it is quite reasonable to calculate the set of covariance matrices over a time interval in which the between-subject ERP variance is large.
On the other hand, if the noise spectrum is wider than the ERP spectrum, an effective way to increase the signal-to-noise ratio is bandpass filtering of the original data.
Finally, since formula (5) is valid for any pair of indices t1 and t2, any subset of covariance matrices can be used to estimate the mixing matrix, reducing the computational complexity. One effective way to select a subset of covariance matrices is simple sub-sampling of the ERPs.
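One simple sub-sampling scheme, averaging groups of neighbouring time samples, can be sketched as follows (hypothetical helper; the paper does not specify its exact implementation):

```python
import numpy as np

def subsample(x, factor=5):
    """Average `factor` neighbouring time samples along the last axis
    (any trailing remainder is dropped). This both improves the SNR and
    shrinks the set of covariance matrices passed to AJD."""
    n = (x.shape[-1] // factor) * factor
    return x[..., :n].reshape(*x.shape[:-1], -1, factor).mean(axis=-1)

x = np.arange(12.0)                  # toy single-channel signal
y = subsample(x, factor=4)           # averages of [0..3], [4..7], [8..11]
```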

Estimating the Number of Hidden Signals
The number of hidden signals K is unknown and we need to estimate it. Previously, the Bayesian Information Criterion (BIC) was proposed for the linear instantaneous mixing model of stationary signals (Kolenda et al. 2001; Rasmussen et al. 2008). Following the ideas of these authors, we have developed a BIC for non-stationary signals such as ERPs.

BIC for Non-stationary Signals
Let there be a set of models indexed by M. The probability of a specific model given the observed data X is denoted by P(M|X). The Bayes optimal decision rule leads to the optimal model

M_opt = argmax_M P(M|X). (9)

Using Bayes' theorem, P(M|X) can be written as

P(M|X) = P(X|M) P(M) / P(X). (10)

The prior probability P(M) reflects our prior beliefs in the specific model, and it can be assumed to be uniformly distributed over the set of models. The distribution P(X) does not depend on the model.
A model is typically defined in terms of a set of parameters θ. Then, for a particular choice of the model, we have

P(X|M) = ∫ P(X|θ, M) P(θ|M) dθ, (11)

where P(X|θ, M) and P(θ|M) are, respectively, the likelihood and the prior of the model parameters.
The integral in Eq. (11) is often too complicated to be evaluated analytically. Hansen, Larsen and Kolenda (Hansen et al. 2001; Kolenda et al. 2001) propose to use the Bayesian Information Criterion (BIC) approximation (MacKay 1992). With this approximation the integral becomes

P(X|M) ≈ P(X|θ*, M) N^{−d/2}, (12)

where d is the number of free parameters, N is the number of samples (training cases), which needs to be large, and θ* are the so-called maximum posterior parameters that maximize the integrand.
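In log form the BIC approximation is a one-liner; the sketch below (our names, not the authors' code) shows how the −(d/2) ln N penalty trades off against the likelihood when comparing models:

```python
import numpy as np

def bic_log_evidence(log_likelihood_at_mode, n_params, n_samples):
    """BIC approximation to the log model evidence:
    ln P(X|M) ≈ ln P(X|θ*, M) − (d/2) ln N."""
    return log_likelihood_at_mode - 0.5 * n_params * np.log(n_samples)

# A richer model fits slightly better but must beat the complexity penalty
simple = bic_log_evidence(-100.0, n_params=5, n_samples=1000)
complex_ = bic_log_evidence(-99.0, n_params=50, n_samples=1000)
```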
Suppose we have the model given by Eq. (3) and we need to estimate K_opt (the optimal number of hidden signals). In this case, we will compare models with different K (i.e., M corresponds to K).
Assuming independence of signal and noise, P(X|A, K) factorizes into a signal term P(S|K) and a noise term P(E|K) (Eq. 14). Evaluating the integral in (14) yields the signal term in closed form (Eq. 15), where tr denotes the trace of a matrix and the elements of the signal covariances Σ̂_{s_k} and means μ̂_{s_k} are calculated from the estimated signals (Eqs. 16-17). Next, we need to obtain an estimate for P(E|K). Denote the noise estimate as Ê = [Ê_1, ⋯, Ê_P] ∈ R^{N×TP}. It should be noted that rank(Ê) < N because the noise was estimated using Eq. (8). To evaluate the rank of Ê and obtain linearly independent noise components, we used the SVD. The calculations were carried out in the conventional way: performing the SVD Ê = U Σ V^T; estimating N_E = rank(Ê) as the number of eigenvectors for which Σ_jj > tol_Σ; keeping the N_E columns of U corresponding to the largest Σ_jj in Ũ; and computing the components of the noise as F = Ũ^T Ê. Assume that all noise components are isotropic, stationary, zero-mean Gaussian processes with equal variance, i.e., E[f f^T] = v Φ, where f ∈ R^T is a noise component, v and Φ ∈ R^{T×T} are the variance and the autocorrelation of the noise, respectively, Φ = Toeplitz(R(0), R(1), ⋯, R(T − 1)), and R(τ) is the autocorrelation function.
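The numerical-rank step for the noise estimate can be sketched in NumPy (illustrative names; the relative tolerance here is our assumption, standing in for the paper's tol_Σ):

```python
import numpy as np

rng = np.random.default_rng(3)
N, r, TP = 8, 3, 200
# A rank-deficient "noise" matrix, as produced by Eq. (8)
E_hat = rng.standard_normal((N, r)) @ rng.standard_normal((r, TP))

U, s, Vt = np.linalg.svd(E_hat, full_matrices=False)
tol = 1e-10 * s[0]                       # relative tolerance (assumed)
N_E = int(np.sum(s > tol))               # numerical rank of the noise estimate

# Keep the N_E columns of U with the largest singular values and project
# the noise onto them to get linearly independent noise components
F = U[:, :N_E].T @ E_hat                 # shape (N_E, TP)
```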
If we have the exact A and S_p, then P(X|A, K) becomes a delta function. This leads to the noise-parameter estimates given in Eq. (18).

Remark 4 If we assume that the noise has no autocorrelation, Eqs. (18) reduce to v̂ = ‖F‖² / (N_E T P) (see Kolenda et al. 2001).
Having P(X|A, K) and P(E|K), and assuming that we have no prior information about the parameters of the model (i.e., P(θ*|M) is constant), BIC(K) can be defined by Eq. (19). Then, the optimal number of signals can be estimated as K_opt ≈ argmin_K BIC(K).
If the ERPs are recorded under several conditions, and the matrix A is evaluated for these conditions simultaneously, then BIC(K) is the sum of the BIC(K) values calculated for each condition separately.
Remark 5 In the previous implementation, ERPs under different conditions were combined into one time series, which led to a large size of the matrices Σ_{s_k}, and, accordingly, to difficulties in calculating their determinants and inverses. The current implementation, where calculations are performed for each condition separately, partially alleviates this problem.

BIC Implementation
The signal covariances Σ_{s_k}, means μ_{s_k} and autocorrelation R(τ) are unknown and we need to estimate them.
In the case of ERPs, the noise is mainly the EEG in the eyes-open condition, averaged over samples. Then R(τ) can be approximately estimated as the autocorrelation function of the EEG in the resting eyes-open condition, averaged over all channels and subjects. For estimating Σ_{s_k} and μ_{s_k} we use the methods suggested by Rasmussen et al. (2008). Since the data set consists of a large number of ERPs obtained from different subjects, we have the opportunity to randomly split the data set into two subsets: train and test. Then, Σ_{s_k} and μ_{s_k} are estimated using the train subset and evaluated using the test subset. This splitting was carried out several times (100 in our case). The BIC(K) function was calculated for each train and test subset separately and averaged.
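The repeated train/test splitting over subjects can be sketched as follows (the model-fitting and BIC evaluation steps are elided; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
P = 100                                    # number of subjects
n_splits = 100

splits = []
for _ in range(n_splits):
    perm = rng.permutation(P)              # random disjoint split of subjects
    train, test = perm[:P // 2], perm[P // 2:]
    # ...estimate signal statistics on `train`, evaluate BIC(K) on both,
    #    then average the resulting BIC curves over all splits...
    splits.append({'train': train, 'test': test})

split0 = splits[0]
```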
To estimate BIC(K) we need to calculate Σ_{s_k}^{−1} and det Σ_{s_k}. However, Σ_{s_k} can be singular in the case of ERPs when the noise is close to zero. To overcome this problem we calculate the eigendecomposition of Σ_{s_k}. Since Σ_{s_k} is a real symmetric positive semi-definite matrix, Σ_{s_k} = V_{s_k} Λ_{s_k} V_{s_k}^T, where V_{s_k} is an orthogonal matrix whose columns are the eigenvectors of Σ_{s_k}, and Λ_{s_k} is a diagonal matrix whose entries are the eigenvalues of Σ_{s_k}, which are greater than or equal to 0 (Λ_{s_k,tt} ≥ 0). Making the corresponding substitutions in (15), the third term in (19) can be rewritten as a sum over the eigenvalues (Eq. 20). The exact sum in (20) cannot be calculated if any Λ_{s_k,tt} = 0. Therefore, we assume that a suitable approximation is obtained if the summation in Eq. (20) is performed only for Λ_{s_k,tt} > tol_Σ ⋅ max_k Tr(Σ̆_{s_k}), where Σ̆_{s_k} is estimated using the entire data set and tol_Σ is a small positive constant.
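The eigenvalue-thresholded log-determinant can be sketched as follows (a generic regularization in the same spirit, not necessarily the authors' exact formula; names are ours):

```python
import numpy as np

def safe_logdet(C, tol=1e-4):
    """Log-determinant restricted to eigenvalues above a relative tolerance,
    so that a singular covariance matrix does not produce -inf."""
    evals = np.linalg.eigh(C)[0]
    keep = evals > tol * evals.max()       # drop (near-)zero eigenvalues
    return float(np.sum(np.log(evals[keep])))

# Rank-deficient PSD matrix: one eigenvalue is exactly zero
C = np.diag([4.0, 1.0, 0.0])
val = safe_logdet(C)                       # ln 4 + ln 1, zero eigenvalue skipped
```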
Remark 6 In the previous implementation, to invert the near-singular matrix Σ_{s_k} or calculate its determinant, we used the traditional approach, in which a small value is added to the diagonal elements of this matrix (Chen et al. 2010). In this case, BIC-based estimates of the number of signals in the model were acceptable if the duration of the time interval chosen for analysis was relatively short. On the contrary, in the case of large time intervals, the estimates of the number of signals were clearly biased, because BIC(K) often did not have a minimum and was a monotonically increasing function.

In addition, when the signals are close to zero and P → ∞, Λ_{s_k,tt} ≈ 0 ∀t and BIC(K) cannot be calculated. To overcome this problem we have made the following assumptions. Firstly, very weak signals are not relevant to ERP research because it is usually difficult to estimate them with acceptable accuracy. Secondly, these weak signals are zero with probability 1, and they do not affect the BIC value. Thirdly, the k-th signal is considered very weak if Tr(Σ̆_{s_k}) < tol_s ⋅ max_j Tr(Σ̆_{s_j}), where Σ̆_{s_k} is estimated using the entire data set after normalization of the columns of A. Finally, if K < N the matrix A is not square. Moreover, the rank of A can be lower than K, for example, if K > K_opt and the noise is close to zero. To overcome these problems, we perform a singular value decomposition of A (A = U Σ V^T) and calculate the log-determinant term as the sum of ln Σ_jj over the singular values Σ_jj > tol_A, where tol_A is a small positive constant.
The accuracy of estimating the optimal number of hidden signals K using BIC(K) depends on both the signal-to-noise ratio (SNR) and the values of the constants tol_Σ, tol_s and tol_A. Intuitively, these estimates will be more accurate in the case of a large SNR if the constants are relatively small, and vice versa. However, it is difficult to predict the optimal constants in advance. Using simulated data, we have empirically determined that for the relatively low SNR commonly observed in ERP studies, the appropriate values are as follows: tol_Σ = 10^{−3}, tol_s = 10^{−4} and tol_A = 10^{−4} (see Results).

Numerical Experiments
We tested the methods described above using both a real ERP data array and synthetic datasets.

Real ERPs Data
Real ERPs were taken from a large dataset (HBI database, www.hbimed.com/en/hbi-database/). During the collection of this database, all studies were conducted in accordance with the principles for human experimentation defined in the Declaration of Helsinki and the International Conference on Harmonization Good Clinical Practice guidelines, and were approved by the Ethics Committee of the N. P. Bechtereva Institute of the Human Brain, Russian Academy of Sciences (IHB RAS), St. Petersburg, Russia, and the local ethics committee of the canton Graubunden, district of Grison (Switzerland). Informed consent was obtained from each study participant after they were informed of the potential risks and benefits, as well as the exploratory nature of the study. These ERPs have been described earlier many times (Kropotov and Ponomarev 2015; Kropotov et al. 2016). Briefly, the EEG was recorded in healthy subjects with a Mitsar-EEG electroencephalograph. The frequency range was 0.53-50 Hz, a notch filter was applied in the range 45-55 Hz, and the sampling rate was 250 Hz. Electrodes were placed according to the international 10-20 system with linked earlobe electrodes as the reference and Fpz as the ground. The electrode impedance did not exceed 5 kΩ. Eye-blinking artifacts were eliminated by decomposing the EEG into independent components and setting the signal components corresponding to eye blinks to zero (Vigário 1997). Other artifacts were eliminated using empirically selected threshold criteria (Ponomarev et al. 2014).
Subjects had to perform a visual cued Go/Nogo task, in which images of animals (A), plants (P), and humans (H) were presented. The duration of a stimulus was 100 ms. To maintain the level of attention, unexpected sounds accompanied the presentation of human images. Stimuli were presented in pairs (A-A, A-P, P-P, and P-H), each pair corresponding to one trial. The interval between the two stimuli in a pair was 1 s. The trial duration was 3 s. Pairs of stimuli were presented with equal probabilities in a pseudo-random order. The subject was instructed to press a button with the right hand upon presentation of an A-A pair after the second stimulus, and to act as accurately and quickly as possible. When the task was performed incorrectly, the respective trials were excluded from further analysis. ERPs were calculated by averaging, which was performed for each trial type (condition) separately. Individual ERPs calculated with an insufficient number of averaged trials (less than 80 for each condition) were excluded. Only ERPs under conditions A-A, A-P and P-P were taken for further analysis. ERPs averaged over subjects are shown in Fig. 2. Thus, the sizes of the real data array were as follows: N = 19, P = 484, T = 750, C = 3.

Synthetic Datasets
When generating synthetic data sets, the following ERP properties were taken into account:
1. Signal topographies overlap in space.
2. Consistent with a commonly used simplification, we assumed that ERP signals are bell-shaped curves with different amplitude, peak latency, width, and polarity, and these curves overlap with each other in time.
3. The amplitude, peak latency, and width, but not the polarity, of these signals vary from person to person. Moreover, signals can exist in some conditions and be absent in others. In other words, the amplitude of the signals under some conditions can be equal to zero.
4. We also assumed that the noise is stationary and that it is mainly an EEG in the eyes-open state, averaged over trials, ignoring the well-known phenomenon of so-called EEG desynchronization upon the perception of visual stimuli and movement.
We took the signal topographies (the columns of the mixing matrix A) from some seemingly suboptimal models of the real ERP data array described in "Real ERPs Data". (These models were obtained by the method discussed in "Transient Signals Second Order Blind Identification (TSSOBI)".) We assumed that the signals are Gaussian functions defined as follows:

s_{k,c,p}(t) = a_{k,c,p}(σ_a) exp(−(t − μ_{k,c,p}(σ_μ))² / (2 w_{k,c,p}(σ_w)²)),

where a_{k,c,p}(σ_a), μ_{k,c,p}(σ_μ) and w_{k,c,p}(σ_w) can be constants or random variables, depending on the values of σ_a, σ_μ, and σ_w. Namely, if σ_a = 0 (or σ_μ = 0, or σ_w = 0), then a_{k,c,p} (or μ_{k,c,p}, or w_{k,c,p}) is constant, and a_{k,c,p} = a_{k,c} (or μ_{k,c,p} = μ_{k,c}, w_{k,c,p} = w_{k,c}). Otherwise, these values are drawn from a truncated normal distribution N(mean, σ, x_min, x_max) using an appropriate random number generator. Thus, by choosing the appropriate values for σ_a, σ_μ, and σ_w, we can obtain synthetic datasets in which only the amplitude, or only the peak latency, or only the width, or all of these characteristics of the bell-shaped curve in any combination, change randomly from subject to subject.
We simulated noise using autoregressive (AR) models obtained for each electrode position separately. The parameters of these AR models were estimated using the autocovariance functions of the EEG in the eyes-open state, averaged over all subjects. Thus, we simulated stationary and anisotropic noise, which is closer to the properties of ERPs than isotropic noise.
Having the matrix A, we obtained mixtures of signals as follows:

X_{p,c} = A S_{p,c} + λ E_{p,c},

where λ is a real positive value specifying the signal-to-noise ratio. When we tested the methods, we used two different synthetic datasets.
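A minimal generator for such a dataset might look as follows. The specific bump parameters, the rejection-sampling truncated normal, and the white (rather than AR) noise are all simplifying assumptions of this sketch, and every name is illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, T, P = 6, 3, 100, 20
t = np.arange(T, dtype=float)

A = rng.standard_normal((N, K))      # stand-in signal topographies
lam = 0.3                            # noise scale (sets the SNR)

def truncnorm(mean, sd, lo, hi, size=1):
    """Crude truncated-normal draw by rejection sampling (fine for a sketch)."""
    x = rng.normal(mean, sd, size)
    while np.any((x < lo) | (x > hi)):
        bad = (x < lo) | (x > hi)
        x[bad] = rng.normal(mean, sd, bad.sum())
    return x

X = np.empty((P, N, T))
for p in range(P):
    S = np.empty((K, T))
    for k in range(K):
        a = truncnorm(1.0, 0.2, 0.1, 2.0)[0]   # amplitude varies per subject
        mu = 20.0 + 25.0 * k                   # latency grows with component index
        w = 8.0                                # width fixed in this example
        S[k] = a * np.exp(-(t - mu) ** 2 / (2 * w ** 2))
    X[p] = A @ S + lam * rng.standard_normal((N, T))   # X_p = A S_p + lam E_p
```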

Synthetic Dataset 1
Synthetic dataset 1 is a single-condition case, where C = 1, N = 19, P = 484 and T = 750. The parameters of the signals were as follows: Δμ = 10 ⋯ 300, w_{k,c} = 100, σ_a = 1, σ_μ = 0, σ_w = 0, and λ = 0 ⋯ 1. In this dataset, the signal amplitudes vary from case to case, and the peak latency of the bell-shaped curve increases linearly from component to component. The signals were generated in the time interval t = 0 ⋯ 3000 with step Δt = 4. Such a synthetic dataset allows us to investigate the dependence of the accuracy of the mixing matrix estimate both on the signal-to-noise ratio and on the degree of overlap of the signals.
All numerical experiments with TSSOBI were performed with a 5-fold sub-sampling of the input data, i.e., five neighboring samples were averaged to improve the signal-to-noise ratio.

Performance Index and Relative Residual Error
To evaluate the performance of the multichannel ERP decomposition method, we use the following quantitative measures:

1. The Amari Performance Index (PI), defined in our case as (Amari et al. 1996)

PI = (1 / (2K(K − 1))) [ Σ_i (Σ_j |g_ij| / max_j |g_ij| − 1) + Σ_j (Σ_i |g_ij| / max_i |g_ij| − 1) ],

where G = (g_ij) = A_1⁺ A_2. This index is always nonnegative and equals zero when G = DP, where D and P are a diagonal matrix and a permutation matrix, respectively. The PI is an estimate of the degree of closeness of two matrices up to an arbitrary scaling of their columns and their permutation. In other words, a smaller PI value corresponds to closer matrices A_1 and A_2.

2. The Relative Residual Error (RRE), which is calculated for the individual and the group-averaged signals, respectively, as RRE = ‖X − X̂‖ / ‖X‖, where X is the data, X̂ is its model approximation, and ‖⋅‖ is the Frobenius norm. The RRE (or approximation error) indicates the discrepancy between the data and an approximation to it. In our case, the RRE indicates how accurately the model describes the data: a lower RRE value corresponds to better accuracy. RRE_I is the average approximation error of the individual ERPs, and RRE_A is the approximation error of the group-averaged ERPs.
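Both measures can be computed in a few lines. The sketch below follows the standard form of the Amari index and a Frobenius-norm RRE; the normalization by 2K(K − 1) is one common convention and may differ from the exact constant used in the paper.

```python
import numpy as np

def amari_index(A1, A2):
    """Amari performance index (Amari et al. 1996): zero iff A1 and A2
    coincide up to column scaling and permutation; larger is worse."""
    G = np.abs(np.linalg.pinv(A1) @ A2)   # K x K when A1, A2 are N x K
    K = G.shape[0]
    rows = (G.sum(axis=1) / G.max(axis=1) - 1.0).sum()
    cols = (G.sum(axis=0) / G.max(axis=0) - 1.0).sum()
    return (rows + cols) / (2.0 * K * (K - 1))

def rre(X, X_model):
    """Relative residual error in the Frobenius norm."""
    return np.linalg.norm(X - X_model) / np.linalg.norm(X)
```

For example, a matrix compared against a column-permuted and rescaled copy of itself yields a PI of (numerically) zero, while an unrelated random matrix yields a strictly positive value.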

Results
In this section, we will first show that TSSOBI can reconstruct signals from their mixture. We will then demonstrate the ability of the BIC to estimate the number of components in a signal mixture. Finally, we will give an example of using the proposed methodology for analyzing real ERPs.
We repeated this numerical experiment many times, varying the values of the parameters Δμ and λ. We generated the synthetic dataset 100 times for each pair of values of these parameters, calculated the performance index separately for each case, and averaged the results. The results of this experiment are shown in Fig. 4. This figure indicates that the accuracy of the mixing matrix estimation decreases with decreasing SNR, as well as with increasing overlap of signals that are identical in shape.
Similarly to the previous section, we repeated this numerical experiment many times, varying the SNR as well as σ_a, σ_μ and σ_σ. The results are shown in the left panel of Fig. 6. This figure shows that the TSSOBI can separate signals that differ from case to case (from subject to subject) in amplitude, latency, or width, or in all of these features simultaneously.
From a practical point of view, it is interesting how the performance of TSSOBI depends on the number of subjects in the dataset. An example of such a dependence is shown in the right panel of Fig. 6 for the case when σ_a = 1, σ_μ = 0 and σ_σ = 0. The results were similar for the other cases. It can be seen that the accuracy of the estimation of the mixing matrix increases with the number of subjects. This is a consequence of the fact that, in the presence of noise, the accuracy of estimating the covariance matrices improves as the number of subjects increases.

BIC Tuning and Testing
We used synthetic dataset 2 to tune and test the BIC. The epoch used for these purposes was selected based on the analysis of the average variance of the signal mixture and is shown in Fig. 7a. When tuning the BIC, we empirically adjusted the values of the constants tol_λ, tol_s and tol_σ so that the minimum of the BIC(K) function corresponds as often as possible to the true number of signals K under medium-noise conditions (λ = 0.01 ⋯ 0.3). We found that the appropriate values for these constants are as follows: tol_λ = 10⁻³, tol_s = 10⁻⁴ and tol_σ = 10⁻⁴. An example of the BIC(K) function is shown in Fig. 7b for the case when σ_a = 1, σ_μ = 0 and σ_σ = 0.
We tested the BIC many times, varying the values of σ_a, σ_μ, σ_σ and λ. We generated the synthetic dataset 100 times for each combination of these parameters. Next, for each dataset separately, we calculated BIC(K), found its minimum and compared it with the true number of signals. The percentage of correct estimates of the number of signals for synthetic dataset 2 using the BIC is given in Table 2. These results show that the BIC provides a correct estimate of the number of components in almost 100% of cases at a noise level corresponding to λ in the range 0.01-0.3. Note that the SNR of commonly observed ERPs corresponds to λ being in this range.

Separation of Hidden ERP Components in the Real Dataset
The epoch used to separate the ERP components was selected based on an analysis of the average variance of the original ERPs. This epoch is marked with vertical lines in Fig. 8a. The BIC(K) function calculated for this epoch is shown in Fig. 8b. It has a minimum at K = 11, which indicates that the optimal number of components is 11.
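Model-order selection by the BIC minimum amounts to a one-line argmin over candidate K. The curve below is a purely hypothetical toy shape with its minimum placed at K = 11 to mirror the result for the real dataset; it is not the paper's actual BIC function.

```python
import numpy as np

def select_num_components(bic_values, k_values):
    """Model-order selection: the optimal K lies at the minimum of BIC(K)."""
    return int(k_values[int(np.argmin(bic_values))])

# purely hypothetical BIC curve with its minimum placed at K = 11
k = np.arange(1, 20)
bic = 47.0 + 0.05 * (k - 11) ** 2
k_opt = select_num_components(bic, k)
```

In practice, as the limitations section notes, models with K chosen near this minimum should also be inspected before settling on one.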
Figure 9 shows the topographies and signals of the hidden ERP sources. This model describes the original ERPs with acceptable accuracy: RRE_I = 2.95 ⋅ 10⁻² and RRE_A = 2.32 ⋅ 10⁻³. Figure 9 also shows that the components have different power.

Comparison with TSSOBI-Symmetric
We performed similar tests using the TSSOBI-symmetric algorithm (see Remark 3). A detailed description of these results is given in Supplementary Materials 2. Here we only briefly note that the performance of TSSOBI-symmetric was similar in most tests. However, the model obtained for the real ERPs described them with slightly lower accuracy: RRE_I = 2.97 ⋅ 10⁻² and RRE_A = 2.35 ⋅ 10⁻³. Also, the BIC at K_opt was smaller when we used TSSOBI compared to TSSOBI-symmetric: BIC = 47.2 (SD = 0.1) and BIC = 47.5 (SD = 0.1), respectively. Although these differences are relatively small, they allow us to assume that the ERP source signal model shown in Fig. 9 is the better one.

TSSOBI Advantages
TSSOBI, as a BSS method, does not rely on the assumption that the hidden sources are stationary and can therefore be applied to transient responses such as ERPs. The results of the present study have shown that the use of TSSOBI makes it possible to obtain a model that describes both individual ERPs and group-averaged ERPs with acceptable accuracy. The obtained ERP model supports the well-known fact that late ERP waves are a superposition of signals from many brain sources and that these waves may be functionally heterogeneous.
Indeed, TSSOBI, applied to the real ERP dataset, separates correlates of visual processing into distinct occipital and temporal components (components 8 and 3, respectively), while operations of cognitive control seem to be reflected in parietal, frontal, and central components with distinct topographies and distinct time dynamics in response to different categories of trials. In other words, the use of the ERP model makes it possible, at least in part, to separate the correlates of different cognitive operations. The use of TSSOBI apparently makes it possible to reveal phenomena that are hidden by other signals in the original ERPs. Indeed, looking at the signals of component 5, one can see a negative deflection of the signal with a peak latency of about 200 ms after the second stimulus in condition A-A, when the subject presses the button with the right hand. This negative signal deflection seems to be related to a well-known ERP phenomenon, the so-called readiness potential, which is observed in the central regions of the brain on the side contralateral to the limb that is about to move. However, this phenomenon is difficult to see in the original ERPs (Fig. 2). As another example, the waveforms of component 6 indicate a low-amplitude parietal positive potential deflection, which is observed both after the first and after the second stimulus in conditions where the subject cannot yet stop performing the trial task (in the A-A and A-P conditions after the first stimulus, and in the A-A condition after the second). In the original ERPs, a similar deflection can be observed only after the first stimulus, whereas it is masked after the second.
In the present ERP model, the mixing matrix is the same for all task conditions. Therefore, the signals for a single component under different task conditions are estimated on the same scale, which makes it possible to carry out quantitative comparisons of these signals, including statistical analysis performed for each component separately. Any conventional method can be used for these purposes; for example, cluster-based analyses (Maris and Oostenveld 2007) could be applied. The statistical analysis is simpler for the ERP model than for the original ERPs, since the number of components is usually smaller than the number of electrodes.
Under the ERP model, the potentials recorded on the scalp are a superposition of signals from several brain sources. Moreover, the TSSOBI is based on the assumption that the mixing matrix has full column rank, so that its pseudoinverse exists. This means that the columns of the mixing matrix (the topographies of the sources) are linearly independent and therefore distinct.
This is possible if these brain sources differ in spatial position, dipole orientation, or both. To localize these sources, well-known methods such as sLORETA (Pascual-Marqui 2002) can be used.
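The pseudoinverse assumption mentioned above can be illustrated with a toy example: when the mixing matrix has linearly independent columns, the sources are recovered exactly from a noiseless mixture. The dimensions below are illustrative, not taken from the real dataset.

```python
import numpy as np

# Toy illustration of unmixing via the Moore-Penrose pseudoinverse.
rng = np.random.default_rng(2)
A = rng.standard_normal((19, 4))   # 19 channels, 4 hypothetical sources
S = rng.standard_normal((4, 750))  # source signals
X = A @ S                          # observed noiseless mixture
S_hat = np.linalg.pinv(A) @ X      # estimated sources
```

A random Gaussian matrix has full column rank with probability one, so `S_hat` matches `S` up to numerical precision; with noise added, the pseudoinverse gives the least-squares estimate instead.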
The TSSOBI is based on the assumption that the deviations of the source signals from their group-averaged waveforms are mutually uncorrelated or weakly correlated over the set of subjects used for computing the group ERPs. At the same time, TSSOBI does not assume any specific type of signal variability from subject to subject. Given that the sources of these signals are possibly spatially separated, the results obtained by the TSSOBI suggest that the reactivity properties of different regions of the brain vary from subject to subject independently of each other. This hypothesis seems quite realistic, since during the development of the brain many random factors affect each of its regions in different ways. Moreover, TSSOBI helps to identify those areas of the brain whose reactivity properties differ from person to person under given experimental conditions. If this hypothesis is correct, then additional possibilities for using the TSSOBI results become available for further research. In particular, a mixing matrix, once obtained by TSSOBI under certain experimental conditions and for a specific group of subjects, can be used to evaluate signals from the same brain sources for a different group of participants or under similar but not identical experimental conditions. However, it must be stressed that such studies need to be planned with caution, since the TSSOBI has some limitations (see the next section).
The results showed that, when analyzing the real ERPs, TSSOBI apparently allows us to obtain a model that describes the data more accurately than the TSSOBI-symmetric algorithm. However, there is no formal evidence that the performance of TSSOBI will be better than that of TSSOBI-symmetric for any ERP dataset. A reasonable strategy would therefore be to compute two ERP models using both algorithms and choose the better one.

Comparison with the Previous Implementation
Let us consider the main differences between the new implementation of TSSOBI and the previous one (Ponomarev and Kropotov 2013).

1. The previous implementation of TSSOBI uses a full-rank non-orthogonal AJD algorithm for symmetric matrices (Souloumiac 2009), i.e., it was assumed that W is a square matrix. In this case, to obtain models with a different number of components, it was necessary to use a dimension reduction procedure (see Remark 2). The disadvantage of this approach is that the resulting models often contain so-called "ghost" components with low-power noise-like signals, even if the number of components K is chosen optimally (according to the BIC). In contrast, the new implementation of TSSOBI uses a low-rank non-orthogonal AJD algorithm for nonsymmetric matrices, while the dimension reduction procedure can be applied optionally, if necessary, to improve the performance of the algorithm (see "Dimension Reduction"). In this case, the model with the optimal K does not contain "ghost" components (see Fig. 9).

2. In the previous implementation of TSSOBI, the ERPs for several conditions were concatenated into one time series, and the covariance matrices were calculated both within and between conditions (see Remark 1). This approach has at least two drawbacks. First, the number of matrices for AJD grows proportionally to the square of the number of conditions, which can reduce the accuracy of the AJD algorithm due to rounding errors. Second, the accuracy of the estimation of the matrix A can also decrease for another reason. Suppose, for example, that there are three conditions and that under one of them the magnitude of the signals is small (similar to the experimental conditions described in "Real ERPs Data"). Then the ratio of the number of covariance matrices with low and high noise levels will be relatively small (approximately equal to 2/3), which results in a decrease in accuracy. Perhaps for these reasons, we were not able to obtain an easily interpretable model for three conditions using the previous implementation of TSSOBI. In contrast, in the new implementation of TSSOBI, the covariance matrices are calculated only within conditions. As a result, the number of covariance matrices depends linearly on the number of conditions. Moreover, for the example above, the ratio of the number of covariance matrices with low and high noise levels becomes larger (approximately equal to 2).
The BIC calculation has also been seriously revised.
In the previous implementation, the ERPs for several conditions were concatenated into one time series. In this case, the size of the matrices Σ_{s_k} was very large. In addition, the matrices Σ_{s_k} can be singular or close to singular (see "BIC Implementation"). However, in order to estimate the BIC, we need to compute the inverse and the log-determinant of Σ_{s_k}. To solve this problem, we used the traditional approach, in which a small value is added to the diagonal elements of the matrix. However, the BIC estimation error increases with the size of the matrices Σ_{s_k}. As a result, BIC(K) often did not have a minimum and was a monotonically increasing function. In this paper, we propose an approximate method for calculating the BIC, whose implementation is described in detail in "BIC Implementation". Testing has shown that this approximation produces acceptable BIC estimates for a suitable SNR range. However, these calculations are approximate, and the results of using the BIC should be treated with caution. Overall, based on the results of the theoretical analysis and the empirical observations mentioned above, we argue that the new implementation of TSSOBI performs better than the previous one. An additional argument is the fact that we were able to obtain an acceptable ERP model for a relatively long interval of 2 s and three conditions simultaneously.
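The diagonal-loading workaround used in the previous implementation can be sketched as follows; the ridge value eps is illustrative, since the constant actually used in the original implementation is not specified here.

```python
import numpy as np

def reg_inv_logdet(C, eps=1e-8):
    """Inverse and log-determinant of a (possibly near-singular)
    covariance matrix, stabilized by adding a small ridge eps to the
    diagonal (the 'traditional approach' mentioned in the text)."""
    C_reg = C + eps * np.eye(C.shape[0])
    _, logdet = np.linalg.slogdet(C_reg)   # sign is positive once C_reg is PD
    return np.linalg.inv(C_reg), logdet

# a rank-one (hence singular) covariance matrix is handled without error
C = np.ones((3, 3))
C_inv, logdet = reg_inv_logdet(C)
```

The drawback noted above follows directly from this construction: the perturbation introduced by eps enters every diagonal element, so the resulting bias in the log-determinant grows with the matrix size.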

TSSOBI Limitations
1. The TSSOBI is based on the assumption that the mixing matrix is the same for all subjects. This assumption can be taken as a first approximation if we consider that the brains of all healthy people are approximately anatomically similar and the electrodes are placed on the head under a unified protocol. This assumption is supported by the fact that local features of brain signals can be identified in group-averaged ERPs. Nevertheless, this assumption is an approximation that does not apply to patients with brain injuries or tumors. If suitable datasets are available, a more sophisticated model can be proposed that takes into account the fact that the mixing matrices can be different (Gong et al. 2015).

2. The TSSOBI does not work if there is no signal difference between the subjects. Moreover, the mixing matrices obtained for two groups of subjects may turn out to be different depending on the features of the variability of the ERPs. An appropriate strategy for selecting the group of subjects is one in which the variability of the ERPs is most noticeable. For example, it would be reasonable to choose subjects across a wide range of ages, in whom the amplitude and latency of ERP waves systematically vary with age.

3. The TSSOBI does not separate sources whose signals have the same auto-covariance. For example, the TSSOBI is most likely unable to separate early signal components from sources located in the primary visual or auditory sensory regions of the left and right hemispheres of the brain. As a result, the signals of such sources will apparently be described by one component (see Fig. 9, components 3 and 8).

4. The accuracy of the mixing matrix estimate critically depends on the SNR. This imposes additional requirements on the quality of the ERP recordings and the number of subjects. The TSSOBI test results shown in Fig. 6b may be useful in planning such studies.

5. The proposed Bayesian information criterion for evaluating the optimal number of components in an ERP model is an approximation. The results of this assessment should be verified by analyzing ERP models with different numbers of components, selected near the minimum of the BIC(K) function. It seems reasonable that an additional suitable criterion for choosing a model is the ability to easily interpret the features of the original signals. It is also possible that a more accurate criterion for estimating the optimal number of components could be developed using Bayesian probabilistic modeling (Chib 1995; Zhong and Girolami 2012).

6. The use of TSSOBI does not guarantee that the resulting components have different functional roles. Several components can reflect the dynamics of the same brain process, just as signals from one source can relate to several processes occurring at different time intervals. To identify these brain processes, it is necessary to manipulate the experimental conditions, similarly to studies using conventional approaches to ERP analysis.

Conclusions
In this paper, we proposed a novel implementation of a method for decomposing multichannel ERPs into components. This novel implementation is superior to the previous one. We also proposed a novel implementation of the Bayesian information criterion for evaluating the optimal number of components in an ERP model. The application of these methods to the analysis of ERPs provides additional opportunities for identifying the features of cognitive processes occurring in the human brain and can be useful both in laboratory settings and in clinical studies.

Fig. 1
Fig. 1 Individual and group-averaged ERPs in a visual cued Go-Nogo task obtained from 484 subjects sorted by age. The upper panels are ERP-image plots, where each individual ERP is encoded as a colored line; warm colors indicate positive activity and cool colors negative activity. The middle panels represent individual ERP waveforms.

Fig. 4
Fig. 4 Average performance index versus SNR for synthetic dataset 1 for different shifts of the bell-shaped curve peak latency

Fig. 6 Fig. 7
Fig. 6 Average performance index versus SNR (left) and versus the number of subjects (right) for synthetic dataset 2

Fig. 8
Fig. 8 Estimation of the optimal number of signals using the BIC for the real ERPs data array. a The average variance of the ERPs; vertical lines indicate the borders of the interval selected for calculating the BIC. b BIC(K). The whiskers indicate three standard deviations from the means

Fig. 9
Fig. 9 Topographies and signals of hidden ERP sources in the visual cued Go-Nogo task. Axis Y: amplitude (c.u.); axis X: time (ms). Signals are sorted by variance and shown in descending order. The par-

Table 2
Percentage of correct estimates of the number of signals for synthetic dataset 2