An improved blind Gaussian source separation approach based on generalized Jaccard similarity

Blind source separation (BSS) consists of recovering independent source signals from their linear mixtures when the mixing channel is unknown. Existing BSS approaches rely on a fundamental assumption, namely that the source signals are non-Gaussian, which seriously limits the use of BSS. To overcome this problem and the weakness of the cosine index in measuring the dynamic similarity of signals, this study proposes the fuzzy statistical behavior of local extremum (FSBLE) based on generalized Jaccard similarity as the measure of signal similarity used to separate the source signals. In particular, the imperialist competition algorithm is introduced to minimize a cost function that jointly considers the stationarity factor, which describes the dynamical similarity of each source signal with itself, and the independency factor, which describes the dynamical similarity between source signals. Simulation experiments on synthetic nonlinear chaotic Gaussian data and ECG signals verify the effectiveness of the improved BSS approach, and the relatively small cross-talking error and root mean square error (RMSE) indicate that the approach improves the accuracy of signal separation.


Introduction
Blind source separation (BSS) aims at estimating source signals from their linear mixtures without any prior information about the source signals or the transmission channel [1,2]. BSS was originally introduced in 1984 by two French researchers to understand the biological behavior of the neural system [3,4]. As an important technique for signal processing and analysis, BSS has been widely used in communication systems [5], image processing [6], speech recognition [7], mechanical fault diagnosis [8], biomedical engineering [9], and many other fields.
With the development and application of BSS technology, various approaches have been developed for estimating the source signals, such as independent component analysis (ICA) [10], nonlinear principal component analysis (NPCA) [11], and sparse component analysis (SCA) [12]. Among these, ICA has attracted extensive attention because it relies only on the statistical independence of the components. The main ICA algorithms include the fast fixed-point algorithm (FastICA) [13], the maximum likelihood estimation algorithm [14], the Infomax algorithm [15], joint approximate diagonalization of eigenmatrices (JADE) [16], and fourth-order blind identification (FOBI) [17]. The main assumption of ICA is that the source signals are non-Gaussian, which means that ICA is incapable of separating Gaussian source signals. The specific reason is that the invariance of the Gaussian subspace makes it impossible to obtain the axis orientation information needed to separate the source signals [18]. Nonetheless, in many cases the observed signals are mixtures of Gaussian signals. The separation of the source signals therefore needs to rely on other assumptions about the source signals and on a correspondingly adapted algorithm. Taking the ECG signal as an example, work on nonlinear dynamics has indicated that the ECG signal is chaotic and has extensive nonlinear dynamic characteristics [9]. On this basis, it is worthwhile to relax the non-Gaussianity assumption by considering the dynamic similarity between source signals in order to separate them.
Quantifying the dynamic similarity of signals is regarded as a challenging task in the BSS problem. Recently, Niknazar et al. [19] proposed the fuzzy statistical behavior of local extremum (FSBLE) method based on cosine similarity to measure the dynamic similarity of signals, which solved the problem of separating linear chaotic Gaussian source signals and was applied to epileptic seizure prediction. However, cosine similarity distinguishes poorly between similar vectors and suffers from partial information loss, which leads to an unsatisfactory separation effect [20]. To overcome this problem, this paper investigates a modified FSBLE based on generalized Jaccard similarity and introduces it into the BSS problem with Gaussian source signals to improve the separation performance. In particular, because the cost function in the proposed BSS algorithm is not differentiable, the imperialist competition algorithm, which converges quickly and requires little computation time, is selected to search for the optimal separation matrix.
The rest of this paper is organized as follows. In Section 2, the FSBLE based on generalized Jaccard similarity is put forward. In Section 3, the modified BSS approach with Gaussian source signals is introduced. In Section 4, simulation experiments are conducted. Finally, the conclusion is given in Section 5.

FSBLE with generalized Jaccard similarity
In this section, the cosine similarity and its problems in measuring the dynamic similarity of signals are introduced in subsection 2.1, the rationality and advantages of using the generalized Jaccard index to measure the similarity of signals are analyzed in subsection 2.2, and the FSBLE based on generalized Jaccard similarity is presented in subsection 2.3.

Cosine similarity
Previous studies have shown that the information in a signal can be represented by an eigenvector [21]. Using the information in the eigenvector to measure the dynamic similarity of signals is therefore the key step of a BSS method based on the dynamic characteristics of signals. Selecting an appropriate distance to measure the similarity is also important, because a good distance indicator reduces the time and processing cost. Among the available distance measures, it is more reasonable to employ a bounded distance as the measure of signal similarity. To date, the existing BSS approach has used the cosine distance to quantify the similarity of signals. The cosine similarity is defined via the dot product and lengths of vectors; that is, given two attribute vectors $V_1$ and $V_2$, it is

$$\rho_1(V_1, V_2) = \frac{\langle V_1, V_2 \rangle}{\|V_1\| \, \|V_2\|}, \quad (1)$$

where $V_1$ and $V_2$ are eigenvectors carrying the information of the signals, $\langle \cdot, \cdot \rangle$ is the dot product between two vectors, and $\| \cdot \|$ denotes the Euclidean norm.
According to Eq. (1), the value of the cosine similarity lies between 0 and 1 for vectors with non-negative components, as is the case here, and is not affected by the dimension of the vector. Furthermore, it avoids the poor robustness of absolute distances. However, the cosine index suffers from information loss when measuring similarity [20], because it distinguishes poorly between similar vectors when it is used to measure the dynamic similarity of signals. Consequently, the signal separation performance is not ideal. To overcome this problem, this paper proposes to use the generalized Jaccard similarity, which is introduced in the next subsection.

Generalized Jaccard similarity
The Jaccard index is a measure used to quantify the similarities and differences of finite sample sets. Let A and B be two sets; the Jaccard index is the ratio

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \quad (2)$$

where $A \cap B$ and $A \cup B$ denote the intersection and union of sets A and B, respectively, and $| \cdot |$ represents the cardinality of a set, that is, the number of elements in the set. It has been extended to the fuzzy Jaccard index: if $X_a$ and $X_b$ are two vectors with n components, each of which is fuzzified and takes a value between 0 and 1, then the fuzzy Jaccard index of $X_a$ and $X_b$ is given by Eq. (3). Based on this extended Jaccard index, a new measure of signal similarity, called the generalized Jaccard similarity, is defined as

$$\rho_2(V_1, V_2) = \frac{\langle V_1, V_2 \rangle}{\|V_1\|^2 + \|V_2\|^2 - \langle V_1, V_2 \rangle}, \quad (4)$$

where all symbols have the same meaning as their counterparts in Eq. (1).
The proposed generalized Jaccard similarity $\rho_2(V_1, V_2)$ satisfies the properties stated in Propositions 1, 2 and 3. According to Eq. (4), the first part of the denominator of the generalized Jaccard similarity can be regarded as the arithmetic average of the two squared vectors, and the second part as subtracting the part common to the two vectors from that arithmetic average. Because the arithmetic average retains the original state of each component better than the geometric average, it further highlights the differences between signals, makes similar signals easier to tell apart, and solves the problem of partial information loss that cosine similarity encounters. In addition, the rationality of using the generalized Jaccard similarity to quantify the dynamical similarity of signals is illustrated by Propositions 1, 2 and 3. For these reasons, the generalized Jaccard similarity quantifies the similarity between two eigenvectors containing signal information better. In what follows, the modified FSBLE based on generalized Jaccard similarity is described in detail.
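The contrast between the two measures can be illustrated with a minimal Python sketch. It assumes the generalized Jaccard similarity takes the Tanimoto form suggested by the description of its denominator (sum of squared norms minus the inner product); the function names and the example vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def cosine_similarity(v1, v2):
    """Inner product divided by the product of Euclidean norms."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def generalized_jaccard(v1, v2):
    """Assumed Tanimoto form: <v1,v2> / (|v1|^2 + |v2|^2 - <v1,v2>).

    Both inputs must not be all-zero, otherwise the denominator vanishes.
    """
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    dot = v1 @ v2
    return float(dot / (v1 @ v1 + v2 @ v2 - dot))

# Two feature vectors with the same direction but different magnitudes.
v = np.array([0.2, 0.5, 0.3])
w = 2.0 * v

# Cosine similarity ignores the magnitude difference (value close to 1),
# whereas the generalized Jaccard similarity is close to 2/3 here.
print(cosine_similarity(v, w))
print(generalized_jaccard(v, w))
```

In this toy case the cosine index treats the two vectors as identical, while the generalized Jaccard similarity still separates them, which is the kind of information loss discussed above.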

Modified FSBLE
The FSBLE is a dynamical similarity measure that employs the time and amplitude information of local extrema to describe the dynamic characteristics of signals [22,23]. The technique is carried out in three steps: finding the optimal amplitude and time-difference intervals from the local extrema of the signal, applying the subordinate (membership) function to fuzzify the signal data, and measuring similarity with the generalized Jaccard similarity on the extracted signal information.
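As a rough illustration of the raw quantities used in the first step, the sketch below extracts the local extrema of a sampled signal together with their amplitudes and the time differences between consecutive extrema. The sign-change rule for detecting extrema and the function names are assumptions made for this example; the interval optimization itself is not shown.

```python
import numpy as np

def local_extrema(x):
    """Indices of interior samples where the slope changes sign."""
    x = np.asarray(x, dtype=float)
    d = np.diff(x)
    # A strict sign change between consecutive slopes marks an extremum.
    return np.where(d[:-1] * d[1:] < 0)[0] + 1

def amplitude_and_time_differences(x, t=None):
    """Amplitude of each extremum and time difference to the next one.

    For n extrema this yields n - 1 (amplitude, time difference) pairs.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float) if t is None else np.asarray(t, dtype=float)
    idx = local_extrema(x)
    amplitudes = x[idx][:-1]
    time_diffs = np.diff(t[idx])
    return idx, amplitudes, time_diffs

signal = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0])
idx, amps, dts = amplitude_and_time_differences(signal)
print(idx, amps, dts)
```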

Find optimum amplitude and time distance segmentation
The local extrema of a signal carry information about its amplitude and global frequency and thus retain the important characteristics of the signal. The local extrema therefore play a vital role. To extract the information of signals better, the time differences of consecutive local extrema are also used. After obtaining the amplitude and time difference of consecutive local

Signal data fuzzy processing
As far as is known, the statistical behavior of local extrema (SBLE) [23], symbolic aggregate approximation (SAX) [24], and their extensions split the amplitude and time difference into intervals using crisp values of the local extrema. One problem with these methods is that they are highly sensitive to noise. FSBLE was therefore proposed; it splits the amplitude and time difference into intervals with fuzzy boundaries, which overcomes this problem and improves the stability of the method. The O + 1 amplitude intervals and P + 1 time-difference intervals obtained in the previous stage are used to define the subordinate (membership) functions (SF) of the fuzzifier, and the type of subordinate function can be chosen by considering the histograms of the amplitudes and time differences of consecutive local extrema. To maximize the extracted information, for the i-th local extremum $L_i$ the matrix $B_i$ is constructed from the subordinate function values of its amplitude and time difference, where $A(L_i)$ is the amplitude of the i-th local extremum, $T(L_i)$ is its time difference from the (i + 1)-th local extremum, and F denotes the fuzzy subordinate function, for which triangular subordinate functions are widely adopted [25]. Each signal is thus converted into a matrix sequence B to facilitate further processing. The detailed procedure is given in [22].
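The fuzzification step can be sketched as follows. This is only an illustration under stated assumptions: the triangular subordinate functions are taken to be centered on uniformly spaced interval centers, and the matrix for one local extremum is formed as the outer product of the amplitude and time-difference membership vectors. The exact construction of $B_i$ is given in [22] and may differ; all names below are hypothetical.

```python
import numpy as np

def triangular_memberships(value, centers):
    """Membership of `value` in triangular fuzzy sets centered at `centers`.

    Assumes uniformly spaced centers; each triangle reaches zero at the
    neighbouring centers, so memberships inside the range sum to 1.
    """
    centers = np.asarray(centers, dtype=float)
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(value - centers) / width, 0.0, 1.0)

def fuzzy_matrix(amplitude, time_diff, amp_centers, time_centers):
    """Assumed form of the per-extremum matrix: outer product of memberships."""
    mu_a = triangular_memberships(amplitude, amp_centers)
    mu_t = triangular_memberships(time_diff, time_centers)
    return np.outer(mu_a, mu_t)

amp_centers = [0.0, 1.0, 2.0]   # O + 1 = 3 amplitude intervals (illustrative)
time_centers = [0.0, 5.0]       # P + 1 = 2 time-difference intervals (illustrative)
B = fuzzy_matrix(0.25, 5.0, amp_centers, time_centers)
print(B)
```

Because a value near an interval boundary receives partial membership in both neighbouring intervals, a small amount of noise changes the matrix only slightly, rather than moving the extremum abruptly from one crisp interval to another.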

Measure similarity
As indicated above, each local extremum corresponds to a matrix, so the signal is converted to a sequence consisting of n − 1 matrices, where n is the number of local extrema. The statistical distribution of the defined patterns is then exploited to extract the dynamic characteristics of signals. For each pattern, the degree to which r sequential local extrema belong to the possible amplitude and time-difference intervals is quantified. Specifically, once the value of r is selected, the number of possible patterns of signal sequences corresponding to r is fixed. For each signal, the vector $V_R^{signal}$ is constructed by varying r from 1 to R, where R denotes the maximum number of consecutive local extrema considered.
The vector $V_R^{signal}$ captures the dynamic features of the local extrema of the signal and can be used to quantify similarity.
It is well known that using bounded distance metrics makes similarity values more comparable across different studies. At the same time, in line with the preceding analysis of the generalized Jaccard similarity and the cosine similarity, using the generalized Jaccard index to quantify the similarity between feature vectors is superior to using the cosine distance. Consequently, the generalized Jaccard index, which is bounded by 0 and 1, is chosen as the indicator of the dynamical similarity of signals:

$$\rho_2(V_R^1, V_R^2) = \frac{\langle V_R^1, V_R^2 \rangle}{\|V_R^1\|^2 + \|V_R^2\|^2 - \langle V_R^1, V_R^2 \rangle},$$

where the vectors $V_R^1$ and $V_R^2$ have been obtained by the fuzzy processing described above.

Improved BSS approach
The linear mixing model is $X = QS$, where Q is a full-rank mixing matrix, S denotes the source signals, and X denotes the mixed signals. For BSS, the ultimate goal is to recover the source signals by finding the separation matrix W. The separation is therefore implemented by $\hat{S} = WX$, where $\hat{S}$ is the estimate of the source signals S.
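The mixing and separation relations can be checked numerically with a short sketch. The random sources, the random mixing matrix, and the use of the exact inverse as separation matrix are assumptions made purely for illustration; in the blind setting Q is unknown and W has to be searched for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three illustrative Gaussian sources with 1000 samples each (rows = sources).
S = rng.standard_normal((3, 1000))

# A random square mixing matrix; it is full rank with probability one.
Q = rng.standard_normal((3, 3))
assert np.linalg.matrix_rank(Q) == 3

# Mixed observations: every row of X is a linear combination of the sources.
X = Q @ S

# With Q known, the ideal separation matrix is its inverse.
W = np.linalg.inv(Q)
S_hat = W @ X

# The composite system W Q is then (numerically) the identity matrix.
print(np.round(W @ Q, 6))
```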
However, when the source signals are chaotic or nonlinear and no information about non-Gaussianity is available, the commonly used BSS approaches based on the non-Gaussian assumption become invalid. It is therefore necessary to consider other hypotheses about the observed signals in order to obtain the separation matrix W. Here, the source signals are assumed to be chaotic with stable dynamics; the ECG signal is a representative example. The following requirements can be imposed to obtain an appropriate matrix W for this problem.
(1) The dynamic similarity between each source signal and itself is the highest, which is called signal dynamic stationarity; that is, the dynamic characteristics of each source signal remain relatively stable over time.
(2) The dynamic similarity between each source signal and the other signals is the lowest, which is called signal dynamic independence; that is, the maximum separability of the source signals is satisfied.
To solve the BSS problem under these hypotheses, the method searches for the separation matrix W that satisfies the two assumptions above. The premise of this search is that the dynamic similarity of signals can be quantified, so it is crucial to find an index that measures the similarity between signals. As described in subsection 2.3, the FSBLE based on generalized Jaccard similarity is taken as the measure of signal dynamic similarity in this study.
On this basis, the problem can be transformed into an optimization problem based on the known dynamic information between signals, which mainly includes the following two factors:
1. Dynamic stationarity factor: Hypothesis (1) is satisfied by maximizing the StaFac function.
2. Dynamic independence factor: Hypothesis (2) is satisfied by minimizing the IndFac function.
The aim of the dynamic stationarity factor and the dynamic independence factor is the maximization of the dynamic stability of each estimated source signal, where $\hat{S}_i^k$ is the k-th fragment of the i-th estimated source signal and $N_s$ is the number of source signals. Algorithm 1 describes the main process of the proposed BSS approach based on generalized Jaccard similarity in detail. Similar to the BSS approach based on cosine

The experimental model
In the following simulation experiments, we consider three classical models as objects in the field of chaotic systems: Lorenz [28], Rossler [29] and Mackey Glass [30], which are given by where σ , ρ and β denote Prandtl number, Rayleigh number and directional ratio respectively, ω signifies natural frequency, as the most typical example of time-delay chaotic system, in which r expresses time-delay number.In the following experi- measure the dynamic similarity between signals [22].Then, the Runge-Kutta method is adopted to select the signals X 1 , X 2 , X 3 of these systems as basic source signals S.
The specific simulation experiment will be studied in the next subsection.
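As an illustration of how such source signals can be generated, the following sketch integrates the Lorenz system with a classical fourth-order Runge-Kutta scheme. It assumes the standard Lorenz form with positive parameters σ = 16, ρ = 45.92 and β = 4; the sign convention of the equations used in the experiments, as well as the step size, initial point and function names, are assumptions of this example.

```python
import numpy as np

def lorenz(state, sigma=16.0, rho=45.92, beta=4.0):
    """Right-hand side of the standard Lorenz system (assumed form)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4(f, state0, dt, n_steps):
    """Classical fourth-order Runge-Kutta integration of ds/dt = f(s)."""
    traj = np.empty((n_steps + 1, len(state0)))
    traj[0] = state0
    for k in range(n_steps):
        s = traj[k]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        traj[k + 1] = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

# Hypothetical initial point and step size, chosen only for this sketch.
trajectory = rk4(lorenz, np.array([1.0, 1.0, 1.0]), dt=0.005, n_steps=4000)

# One coordinate of the trajectory can serve as a chaotic source signal.
source_signal = trajectory[:, 0]
print(trajectory.shape)
```

The Rossler and Mackey-Glass systems can be handled in the same way by replacing the right-hand-side function, with the delayed term of Mackey-Glass requiring a history buffer.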

Evaluation on ECG signal
Three ECG signals are selected in the experiment to verify the ability of the improved BSS approach to separate ECG signals.

Performance comparison
To compare the performance of the improved BSS approach based on generalized Jaccard similarity with that of the approach based on cosine similarity [19], the cross-talking error [32] and RMSE are adopted as the measurement indexes, where $C = WQ = (C_{pq})$ represents the transfer matrix of the mixing-separation composite system, and $X_{sep,i}$ and $X_{ori,i}$ denote the i-th separated signal and the corresponding original signal, respectively. If the source signals are separated well, the value of RMSE will be small and C will become a permutation matrix (although its elements may differ in sign). Only one element in each row and column of a permutation matrix is equal to 1, and all other elements are equal to 0. The larger the cross-talking error and RMSE, the worse the separation performance of the approach. The curve of the cross-talking error is depicted in Fig. 6, where the blue curve is the cross-talking error of the improved BSS approach based on generalized Jaccard similarity and the red curve is that of the approach based on cosine similarity. The RMSE of the two BSS algorithms is shown in Table 1. As can be seen from the simulated results in Fig. 6 and Table 1, the improved BSS approach based on generalized Jaccard similarity generally outperforms the approach based on cosine similarity under the criteria of cross-talking error and RMSE, which confirms the superiority of the improved BSS approach.

Fig. 6: Cross-talking error of the BSS approach (vertical axis: cross-talking error; legend: improved BSS algorithm based on generalized Jaccard similarity, BSS algorithm based on cosine similarity)
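The two measurement indexes can be sketched in Python as follows. The cross-talking error is written in a commonly used form (row-wise and column-wise sums of normalized absolute entries of C, each minus one); the exact normalization in [32] may differ, and the RMSE is taken over the samples of one signal. Function names are hypothetical.

```python
import numpy as np

def cross_talking_error(C):
    """Common form of the cross-talking error of a composite matrix C = WQ.

    It is zero exactly when each row and each column of C has a single
    non-zero entry, i.e. C is a scaled and signed permutation matrix.
    """
    A = np.abs(np.asarray(C, dtype=float))
    row_term = (A / A.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col_term = (A / A.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return float(row_term.sum() + col_term.sum())

def rmse(x_sep, x_ori):
    """Root mean square error between a separated and an original signal."""
    x_sep = np.asarray(x_sep, dtype=float)
    x_ori = np.asarray(x_ori, dtype=float)
    return float(np.sqrt(np.mean((x_sep - x_ori) ** 2)))

# Perfect separation up to order, scale and sign gives zero error.
print(cross_talking_error([[0.0, -2.0], [3.0, 0.0]]))
# A fully mixed composite system gives a large error.
print(cross_talking_error(np.ones((2, 2))))
```

Since separated signals are recovered only up to order, scale and sign, the RMSE is meaningful only after each separated signal has been matched to its source and rescaled.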

Conclusion
In this paper, an improved BSS approach based on generalized Jaccard similarity is studied. Firstly, the generalized Jaccard index is proposed to solve the problem

Declarations
• Conflict of interest: the authors declare that there is no conflict of interest.
• Data availability statement: all of the material is owned by the authors and no permissions are required.
• Authors' contribution: Fu Xudan, Ye Jimin, and E Jianwei carried out the blind source separation of Gaussian signals based on generalized Jaccard similarity. Fu Xudan put forward the idea of the research, participated in its design and coordination, and drafted the manuscript. Ye Jimin participated in the construction of the framework of the paper and helped to draft the manuscript. E Jianwei participated in the design of the experiment and helped to draft the manuscript. All authors reviewed the manuscript and approved the final draft.

Fig. 1: The specific process diagram of 2.3.1

BSS is one of the most commonly used methods in digital signal processing; it aims to separate the source signals when both the transmission channel and the source signals are unknown. The simplest model is a linear combination of the source signals and the mixing matrix in the determined case (where the number of source signals equals the number of observed signals). The specific mathematical expression is $X = QS$.

separately and dynamic independence between estimated source signals, where the maximization of independence can be understood as the minimization of dynamic similarity between estimated source signals. To quantify the dynamic stationarity and independence of the source signals, each estimated source signal is segmented into d fragments, where d ≥ 2 is required for the defining equations to hold. The function StaFac calculates the average similarity of all corresponding segments within each estimated source signal, and the function IndFac gives the dynamic similarity of all corresponding segments between two different estimated source signals. Different cost functions can be defined under specific assumptions. In this problem, the cost function must maximize StaFac and minimize IndFac simultaneously. According to existing knowledge, an exponential cost function is beneficial to the cost calculation of an additive model. Therefore, the exponential combination of the dynamic stationarity factor and the dynamic independence factor is selected as the cost function:

$$\mathrm{CostFcn} = e^{-\mathrm{StaFac}} \cdot e^{\mathrm{IndFac}}. \quad (14)$$

By minimizing CostFcn, the dynamic stability and independence of the source signals are satisfied, and thus relatively optimal source signals are estimated. Generally speaking, because prior knowledge about the parameters of the source signals and the characteristics of the transmission channel is unavailable, there is an inherent uncertainty in the scaling factor. Accordingly, the separated source signals may differ substantially from the sources in amplitude, phase, and order, while their waveforms remain consistent with the corresponding source signals. Meanwhile, CostFcn is not differentiable because of the FSBLE based on generalized Jaccard similarity. Hence, the current problem (minimizing CostFcn) cannot be solved with derivative-based iterative methods, which are the conventional solution in many other BSS methods (such as FastICA and Infomax). Consequently, the imperialist competition algorithm, a metaheuristic search algorithm, is used to minimize CostFcn by searching for the matrix W [26, 27].
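The structure of the cost function can be sketched as below. This is a sketch under several assumptions: StaFac is taken as the mean similarity over all pairs of fragments of the same estimated source, IndFac as the mean similarity over corresponding fragments of different estimated sources, and the similarity is passed in as a callable. The `toy_similarity` shown here (a generalized Jaccard similarity on amplitude histograms) is only a hypothetical stand-in for the modified FSBLE.

```python
import itertools
import numpy as np

def cost_function(S_hat, d, similarity):
    """CostFcn = exp(-StaFac) * exp(IndFac) for estimated sources S_hat.

    S_hat: array of shape (N_s, T) with N_s >= 2; d >= 2 fragments per source.
    similarity: callable mapping two 1-D fragments to a value in [0, 1].
    """
    n_sources = S_hat.shape[0]
    fragments = [np.array_split(s, d) for s in S_hat]
    # Assumed StaFac: average similarity between fragments of the same source.
    sta_fac = np.mean([similarity(f[k], f[l])
                       for f in fragments
                       for k, l in itertools.combinations(range(d), 2)])
    # Assumed IndFac: average similarity between corresponding fragments
    # of two different sources.
    ind_fac = np.mean([similarity(fragments[i][k], fragments[j][k])
                       for i, j in itertools.combinations(range(n_sources), 2)
                       for k in range(d)])
    return float(np.exp(-sta_fac) * np.exp(ind_fac))

def toy_similarity(a, b, bins=8):
    """Stand-in similarity: generalized Jaccard on amplitude histograms."""
    ha, _ = np.histogram(a, bins=bins, range=(-4.0, 4.0))
    hb, _ = np.histogram(b, bins=bins, range=(-4.0, 4.0))
    dot = float(ha @ hb)
    denom = float(ha @ ha + hb @ hb) - dot
    return dot / denom if denom > 0 else 0.0

rng = np.random.default_rng(1)
S_est = rng.standard_normal((3, 900))
print(cost_function(S_est, d=3, similarity=toy_similarity))
```

Because the similarity is built from counting and clipping operations, the resulting cost is not differentiable with respect to W, so a derivative-free search over candidate matrices W is required to minimize it.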
ments, these parameters are set to σ = −16, ρ = 45.92,β = 4, ω = 1, α = 0.2, η = 0.4, γ = 5.7, a = 0.2, b = 0.1 and c = 10.Once the parameters are selected, the influence of different initial points on the dynamic similarity of signals can be almost ignored in chaotic system which results from that the modified FSBLE is used to

Fig. 4(a) depicts the original ECG signals and Fig. 4(b) depicts the mixed ECG signals, which will be processed by the improved BSS approach.

Fig. 4: The diagrams of ECG signals

of information loss in cosine similarity, and the modified FSBLE, which extracts the dynamic characteristics of signals, is used to quantify the dynamic similarity of signals. Secondly, the improved algorithm based on the dynamic characteristics of signals is applied to the blind separation of Gaussian signals, and the imperialist competition algorithm is selected to find the separation matrix because the cost function is not differentiable. Finally, a series of simulation experiments on nonlinear chaotic models verified that the proposed algorithm can successfully separate Gaussian source signals. At the same time, the cross-talking error and root mean square error (RMSE) of the improved approach are relatively small, which indicates that the separation accuracy of the algorithm is improved. In addition, the approach can also be applied to biological signals other than ECG signals.

Acknowledgments. This work is supported by the National Natural Science Foundation of China under Grant 61573014.
The separation of ECG signals is considered as a real-world application of the proposed method. This study selects ECG signals in three different states as the source signals and multiplies them by a random matrix U to obtain the mixed ECG signals, where the signals AEECG, PEECG and SAECG are obtained from the Apnea-ECG Database, Post-Ictal Heart Rate Oscillations in Partial Epilepsy and UCD Sleep

Table 1: Comparison of RMSE between the two BSS algorithms.