Multi-fault diagnosis of rolling bearing using two-dimensional feature vector of WP-VMD and PSO-KELM algorithm

In order to achieve accurate fault diagnosis of rolling bearing under random noise, a new fault diagnosis method based on wavelet packet-variational mode decomposition (WP-VMD) and kernel extreme learning machine (KELM) optimized by particle swarm optimization (PSO) is proposed in this paper. Firstly, the time–frequency domain feature vectors of the original rolling bearing fault signals are effectively obtained by preprocessing of WMD and decomposition and reconstruction of VMD. Then, the extracted two-dimensional feature vector is input into the KELM neural network for fault identification, and combined with PSO, KELM parameters were optimized. The experimental results show that the proposed method can effectively diagnose the rolling bearing under random noise, with the features of fast speed, stable performance and high accuracy. By comparison, this paper obtains better accuracy and real-time performance with fewer features, which provides a simple and efficient solution for fault diagnosis of rolling bearings.


Introduction
Rolling bearing has long been one of the most critical and vulnerable parts of rotating equipment, whose function largely affects the operating state of the entire mechanical system (Huang et al. 2019). Failure of rolling bearings can lead to machinery malfunctions or even catastrophic accidents. Therefore, it is of great significance to achieve reliable health monitoring and fault diagnosis for rolling element bearings Cao et al. 2021).
Since the rolling bearing vibration signals have the characteristics of non-stationary, nonlinear, multi-modulation, and multi-component, accompanied by random noise interference, it has been a challenging issue that how to effectively acquire the fault feature of the rolling bearing vibration signal in the signal processing field. At present, common extraction methods include wavelet transform (WT) Mao et al. 2019), shorttime Fourier transform (STFT) (He et al. 2014), empirical mode decomposition (EMD) and so on (Liang et al. 2005). However, there still exists some imperfections. For example, the WT cannot subdivide well the high-frequency of the signal, inducing information loss and energy leakage (Meng et al. 2016), and the analysis window of STFT needs to be further optimized. These shortcomings make these classic methods not be fully self-adaptive in essential (Yao et al. 2018;Cheng et al. 2021). As a self-adaptive time-frequency analysis method, EMD is able to handle non-stationary and nonlinear signals better, but it has disadvantages such as over-envelope, under-envelope and end-effect, what's more, its modal aliasing property will lead to the distortion of the decomposed IMFs (Yao et al. 2016). In view of the defects mentioned above, Dragomiretskiy and Zosso (2014) put forward a completely non-recursive self-adaptive signal decomposition method, named variational mode decomposition (VMD) method in 2014, which has a solid mathematical foundation , because it iteratively searches for the optimal solution of the mathematical model on the variational problem. In this way, the precision of IMFs component is high without these problems like over-envelope and underenvelope, and furthermore, the phenomenon of mode aliasing is effectively suppressed (Xu et al. 2019). And the superior performance of VMD was proved in the literature (Dragomiretskiy and Zosso 2014;Wang et al. 2015a, b). Ting-ting et al. (2020) decomposed the vibration signal of rolling bearing by VMD, and Qing et al. (2020) applied VMD to bearing fault diagnosis.
In fact, rolling bearing fault diagnosis can be regarded as a model recognition problem. In general, support vector machines (SVM) (Liu et al. 2015) and artificial neural networks (such as BP) are commonly used to implement pattern recognition (Jia et al. 2015). Nevertheless, they belong to the shallow learning model, lacking of strong characterization efficiency and having troubles in expressing the complex relationship in fault diagnosis problems in an efficient manner (Shao et al. 2017;Wang et al. 2020). Based on the disadvantages of SVM and BP, Huang et al. (2016) stated an available solution, the extreme learning machine (ELM), equipped with the advantage of fast running speed, whereas there still remain deficiencies such as unstable results and confusions on determining the number of hidden neurons in practice. To improve it, the kernel function is introduced into the ELM, and KELM is proposed by Huang (2014). By integrating kernel functions, it maps the linear non-separable mode to the high-dimensional feature space, making it linearly separable and improving the accuracy of the algorithm. Compared with ELM, KELM only needs to select the kernel function and its related parameters to obtain the output weight, which has fewer adjustable parameters, better generalization performance, etc. (Qin et al. 2017;Su et al. 2021) applied it to the diagnosis of rolling bearings then gained better accuracy. On the other hand, kernel function makes the KELM be very sensitive to parameter settings. Consequently, it is the key to optimize the KELM parameters for improving the accuracy of state recognition.
Aiming at the above analysis, a new method for fault diagnosis of rolling bearings based on WP-VMD and PSO-KELM is voiced in this paper. The following contents of this paper are organized as follows: In Sect. 2, the fault feature extraction method based on WP-VMD model is introduced. In Sect. 3, the fault diagnosis method based on PSO-KELM model is introduced. In Sect. 4, the test platform is presented and the proposed method of WP-VMD and PSO-KELM was verified by experiments. In Sect. 5, the method proposed in this paper is analyzed and discussed. Finally, the conclusion is summarized in Sect. 6.
2 Introduction of WP-VMD model

Wavelet packet decomposition
Since the WT is incapable to subdivide the high-frequency signal well, easily causing information loss, the more detailed signal decomposition algorithm, WPD (Wang et al. 2015a, b), is proposed to solve this defect, and it hierarchically divides the analyzed signals and improves the resolution of the frequency domain analysis, according to various characteristics of the signal to self-adaptively match the spectrums (Learned, 1992;Yan et al. 2018).
Taking the three-layer wavelet packet analysis as an example, the WPD diagram is shown in Fig. 1. It can be seen that WPD does not neglect the further decomposition of the high-frequency of the signal, performing fine decomposition on each node of each layer as well, which improves the resolution of frequency domain analysis.
The decomposition equation of WPD algorithm is given as where h kÀ2l and g kÀ2l are the high-pass and low-pass filter coefficients, and d j;2n l and d j;2nþ1 l are wavelet packet coefficients. The reconstruction equation is given as

Variational mode decomposition
The VMD is mainly divided into the construction and solution of variational problems. The corresponding specific steps are as follows: (1) Construction of constrained variational problem The VMD constructs a variational problem with the sum of the K IMFs being equal to the input signal f as a constraint and the sum of the estimated bandwidths of the K IMFs as the minimum: 1. Each IMF u k t ð Þ is converted into a corresponding analytical signal by a Hilbert transform, and a unilateral spectrum of the signal is obtained: 2. The pre-estimated center frequency e Àjx k t of all modally resolved signals is combined and the spectrum of each IMF is modulated to the corresponding baseband: 3. The square root of the time gradient L 2 norm of the demodulated signal is calculated, and the bandwidth of each modal classification is estimated. The representation of the constrained variational model is as follows: min where d t ð Þ is the Dirichlet function, Ã is the convolution operation. u k f g ¼ u 1 ; . . .; u k f grepresents the set of K IMFs components after VMD decomposition, x k f g ¼ x 1 ; . . .; x k f g represents the set of K IMFs component center frequencies and f is the input signal.
(2) Solution of constrained variational problem VMD transforms the problem of constructed constrained variational solution into an unconstrained variational solution by means of importing the quadratic penalty factor a and the Lagrange operator k t ð Þ. In the Gaussian noise environment, a can ensure the reconstruction accuracy of the signal, meanwhile k t ð Þ can strengthen the rigor of the constraint, and extended Lagrange expressions are as follows: The saddle point of the extended Lagrange expression is found by the alternating direction multiplier method (ADMM). Specific steps are as follows: 3. Update x k : 4. Update k: where c is the noise tolerance parameter and is generally taken as 0. 5. Repeat steps (2)-(4) until the iteration constraint is met to stop the loop. Multi-fault diagnosis of rolling bearing using two-dimensional… 8177 Before applying the VMD to signal decomposition, the number of modes K should be set ahead, which will lead to disparate results with different values of the parameter. When K is too small, the key information will lose, while too large, the frequency aliasing will come up. Aiming at this situation, it is known that if there is a component similar to the center frequency during the decomposition process, it indicates that over-decomposition has occurred. This paper sets forth a method to determine the number of modes K by using this feature. Firstly, the number of initialization modes is K ¼ 2, and then, it is determined whether the center frequency of the IMFs overlap. If there is no overlap, K ¼ K þ 1 is repeated. Otherwise, we get the final mode number as K ¼ K À 1. Figure 2 shows the fault feature vector extraction process of rolling bearing based on WP-VMD method. Firstly, the original vibration signal is denoised by the WPD method, and the denoised signal is decomposed into three layers to obtain signals of eight groups of different frequency bands. The signal energy of each group is calculated, and the frequency bands with larger energy are selected to be reconstructed. Then, using the VMD method to decompose the reconstructed signal into multiple IMFs and taking advantage of the kurtosis-correlation criteria to determine the IMFs proportion as the reconstruction coefficient. Finally, feature extraction is exerted on the reconstructed signal, and the best feature vectors in the time-frequency domain are picked up to construct a two-dimensional fault feature vector.

Introduction of PSO-KELM model
Through detailed analysis of the neural network model in fault classification, choose the KELM that introduces the kernel function framework in SVM under the ELM model to fault classification. Since radial basis function (RBF) has strong generalization ability and smoothness (Liang et al. 2013), this paper takes RBF as the kernel function of KELM. In order to achieve the best search performance of KELM, it is essential to single out the best penalty coefficient C and kernel function parameter c; afterward, a fault diagnosis model based on PSO-KELM is proposed. The detailed modeling processes are as follows: 1. Initialize PSO parameters and determine the coding form as h ¼ ðC; cÞ. 2. The KELM training is performed on the training subset using the parameters, obtained by the initial population decoding, and the average error rate obtained by the 5-CV cross-validation method is recognized as the fitness function. The average error rate is calculated as: Among them, T i ð Þ represents the test accuracy of them ith training sample obtained by the 5-CV method, and m represents the number of cross-validation, where m ¼ 5 3. Increase the number of iterations. 4. Update the position and velocity of the particles, and the fitness value of each population. 5. The combination of parameters obtained by decoding the new particles of the previous step is trained on KELM, and the fitness value is calculated according to Eq. 12. 6. If it has not reached the maximum population, go back to the fourth step to execute, otherwise, the execution will continue. 7. Compare the current fitness value with the global optimal fitness value and determine the global optimal fitness value. 8. If the maximum number of iterations has not been reached, go back to the third step, otherwise, continue. 9. Output global optimal parameter combination ðC; cÞ. .

Experiment platform
The experimental data used in this paper were collected by the electro-discharge machining (EDM) experimental platform built by Case Western Reserve University. Figure 3 shows the fault simulation experimental platform. The rolling bearing model in the experiment is SKF6205, which is a deep groove ball bearing with an inner diameter of 25 mm, an outer diameter of 52 mm, and a thickness of 15 mm. The experimental platform employs EDM technology, to introduce a single point of failure to the test bearing, and records the vibration data corresponding to the bearing fault condition. The sampling frequency of the experimental platform is 12 k, and a total of about 10 s of data is collected. Figure 4 shows four states of the rolling bearing. Under the situation when the motor speed is 1772/min, the drive end data are collected. Considering the sampling frequency and the characteristic frequency, the 2400 sample length is enough to be on behalf of the features of the bearing. In the four states of normal, inner race fault, outer race fault and rolling fault, 50 sets of data are obtained; thus, a total of 200 sets of data are obtained. Figure 5 shows the waveforms of original time domain vibration signal in each state of the rolling bearing.

Fault feature extraction based on WP-VMD
Taking the data of the inner race fault as an example, the wavelet packet denoising is performed on the signal, and the denoised signal is decomposed by WPD into three layers to obtain signals of eight groups of different frequency bands, which are recorded as x 1 3 t ð Þ; x 2 3 t ð Þ; . . .; x 8 3 t ð Þ, and the energy of each group of signals is calculated and recorded as E 1 3 t ð Þ; E 2 3 t ð Þ; . . .; E 8 3 t ð Þ, as shown in Fig. 6. The main energy bands are selected for reconstruction, and the reconstructed signal is used as the VMD decomposition object, where the mode number is K ¼ 3. The original reconstructed signal and the decomposition results are shown in Fig. 7.
In order to extract the fault signal characteristic more precisely, the IMFs components are comprehensively analyzed in accordance with the kurtosis-correlation coefficient criterion. For the sake of paying more attention to the IMFs components with a larger amount of information, the normalized proportion of the two indicators of the k IMFs components are calculated to the range ½0; 1, recorded as q i and w i , respectively, the reconstruction coefficient r i of the corresponding IMF component is where i ¼ 1; 2; :::; k, and the calculation results are shown in Table 1. The appropriate feature quantity selection is the key to fault diagnosis. In the time domain, the multi-scale entropy (MSE) of the signal is deemed to the feature quantity. Owing to the MSE is sensitive to the scale setting, the selection of the best scale is constantly too subjective. On the basis of the idea of statistics, the maximum value of the scale was set as 30, and the peak, variance, and root-meansquare value of the MSE at 1-30 scale in the statistics are figured out. In the frequency domain, the frequency point of energy concentration and its corresponding energy can be installed as parameters to signify the energy distribution trend in the spectrum, which is a crucial feature reflecting the bearing condition. Then calculating the center of gravity frequency f c and the center of gravity amplitude a c of the signal to reveal these features. The kurtosis Kurt, the correlation coefficient Corrcoef, and the MSE are introduced as follows: Among them, N is the length of the signal sequence, x i ð Þ is the value of the ith signal sequence, x is the average value of the signal, r is the standard deviation of the signal.
Costa et al. brings in the idea of coarse granulation and elaborate the MSE based on sample entropy (Madalena et al. 2005(Madalena et al. , 2007, on the reason that the sample entropy where s is the scale factor, and when s ¼ 1, the sequence is not subjected to the coarse granulation process, and p i l ð Þ is the original sequence. (2) Give the dimension m and threshold r of the model, and obtain the sample entropy of each coarse-grained sequence (Richman and Moorman 2000). The accessible sample entropy SampEnðm; r; NÞ at each scale is described as MSE.
The five mathematical characteristics of peak p, variance s 2 , root-mean-square value RMS, center of gravity frequency f c , and center of gravity amplitude a c are where f i and P f i ð Þ are the frequency and power of the signal sequence at sampling point i, respectively.
The results of the five feature vectors of the reconstructed signal are shown in Fig. 8-12. Each picture implies the corresponding distinctions of the four states of the bearing vibration signal. The larger the difference of the feature vectors in the four states, the better the effect of fault classification. Figure 8 and Fig. 9 symbolize the peak and variance characteristics of the MSE, respectively. In Fig. 8, there are aliasing phenomena in the four state feature vector curves. In Fig. 9, the outer race and the rolling feature vector curves are aliased. It is overt that both of them fail in distinguishing well the state of rolling bearings. Compared with Fig. 8 and Fig. 9, the root-meansquare value feature vector curves of Fig. 10 has no aliasing phenomenon; the feature vector curves of the four states are distinctly different as well; in this way, the rootmean-square value of the MSE of the signal is selected in the time domain as feature vector. Similarly, from the comparison of Figs. 11 and 12, the feature vector curves aliasing phenomenon of Fig. 11 is serious. The feature vector curves of Fig. 12 have a large spacing, which can distinguish four states well. Therefore, the center of gravity amplitude of the signal is recognized as the feature vector in the frequency domain.
Consequently, taking the signal's root-mean-square value of the MSE as well as center of gravity amplitude as two-dimensional feature vector in the time-frequency domain for fault diagnosis. Compared with other feature extraction methods, not only does this method comprehensively take into account the time-frequency domain  Multi-fault diagnosis of rolling bearing using two-dimensional… 8181 characteristics of the signal, but also its fewer dimensions on construction of signal eigenvectors.

Fault diagnosis based on PSO-KELM
Avoiding the impact of various magnitude as well as dimension feature data on convergence speed and classification accuracy of the neural network model, all sample data are normalized to ½À1; 1. The formula is as follows: where x is the true value of a sample of a single data, x min , x max are, respectively, the minimum and maximum values of the sample data, and the y is a standardized output data. Among 200 groups of experimental data, 160 groups were taken as training samples and 40 groups as testing samples. The minimum average error rate obtained under the 5-CV method was taken as the fitness evaluation function, after 100 iterations adopting PSO algorithm, and simultaneously, the optimal parameter combination ðC; cÞ was reached. The experiment was repeated 50 times to verify the effectiveness of this method. The fault diagnosis results of the first 20 experiments are shown in Table 2. From the results in Table 2, the fault diagnosis model based on PSO-KELM classifies accurately the fault state, providing preferable diagnostic effect.

Experimental data analysis
Since the experimental data processed in this article are collected by an ordinary EDM experimental platform, the    Center of gravity frequency of the reconstructed signal data collected by the experimental platform cannot accurately reflect the state of the bearing itself. This is because a large amount of noise is easily introduced during the collection process and cannot describe the sample-and-hold behavior of the measurement output well. At present, documents (Qiu et al. 2020a, b) propose more effective methods for data acquisition of random nonlinear systems to describe the sample-and-hold behavior of the measurement output of the data and to filter the data. Therefore, the use of better methods to collect data will further improve our experimental results.

Performance comparison
The extracted two-dimensional feature vector is input into the BP and SVM for fault diagnosis. In the 50 experiments, the average accuracy of PSO-KELM proposed in this paper is 99.65%, and BP and SVM were 86.45% and 96.88% accuracy. The specific classification results of the three methods are shown in Table 3.

Comparisons of WP-VMD and VMD
To illustrate the efficiency of the WP-VMD method proposed in this paper, the feature extraction effects and realtime performance of WP-VMD and VMD are compared. Figure 13 shows the feature extraction effects of VMD and WP-VMD, and it is obvious that WP-VMD has better performance.
The real-time performance of the two is compared by calculating the total running time of 50 samples in each  Multi-fault diagnosis of rolling bearing using two-dimensional… 8183

Conclusions
In order to solve the problem of fault diagnosis of rolling bearing, this paper proposes a new method based on WP-VMD and PSO-KELM rolling bearing fault diagnosis. The validity of the proposed method is verified by the experimental data of Case Western Reserve University. Furthermore, the accuracy as well as real-time performance of the proposed method are analyzed and discussed.
1. The WP-VMD method is utilized to preprocess the signal and decompose it into multiple IMFs; then, the kurtosis-correlation criterium is employed to determine the IMFs proportion as the reconstruction coefficient. 2. In the time-frequency domain, the root-mean-square value of the MSE as well as the center of gravity amplitude of the reconstructed signal are, respectively, selected as two-dimensional feature vector, which is input into the PSO-KELM algorithm for training and diagnosis. Due to the comprehensive consideration of the time-frequency domain indicators and the use of fewer features, the accuracy of fault diagnosis and the speed of fault diagnosis are improved.
3. Considering that the KELM is sensitive to parameter settings, whose classification performance is unstable. The PSO-KELM is proposed, which single out the best penalty coefficient C and kernel function parameter c of KELM, with better and stable fault diagnosis performance. 4. The accuracy and real-time performance of the method in this paper are analyzed and discussed. On the one hand, compared with the traditional neural network BP and SVM, the PSO-KELM is equipped with higher fault accuracy. On the other hand, compared with the VMD, the proposed WP-VMD method costs less time and has preferable real-time performance.
Although the method proposed in this paper has a good performance in the multi-fault diagnosis of rolling bearing, it cannot identify faults of unknown fault types and cannot diagnose two or more mixed faults. Therefore, the main research problems in future will focus on identifying unknown fault types and diagnosing mixed faults of rolling bearing. On the other hand, the identification of unknown fault types and the diagnosis of two or mixed faults will put forward higher requirements on the performance of the PSO algorithm. The PSO algorithm can be optimized by documents (Abu et al. 2016(Abu et al. , 2017Arqub OA et al. 2017, which can effectively solve the fuzzy problem. Therefore, the combination of PSO algorithm and fuzzy algorithm will further improve the performance and stability of optimal search parameters.
Authors' contribution All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by TJ, YL and SL. The first draft of the manuscript was written by TJ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding The authors have not disclosed any funding.
Data availability All data generated or analyzed during this study are included in this published article.
Code availability The code that supports the findings of this study is available on request from the corresponding author. The code is not public due to privacy or ethical restrictions.

Declarations
Conflict of interest Author Tingyu Jiang declares that he has no conflict of interest. Author Yakun Li declares that he has no conflict of interest. Author Shen Li declares that he has no conflict of interest.

Supplementary information
The online version of this article (https://doi.org/10.1007/s00500-022-07704-6) contains supplementary material, which is available to authorized users. Multi-fault diagnosis of rolling bearing using two-dimensional… 8185