Early intelligent fault diagnosis of rotating machinery based on IWOA-VMD and DMKELM

Early fault signatures in the vibration signals of rotating machinery are weak and easily drowned out by intense noise, so early fault diagnosis remains a great challenge. An intelligent early fault diagnosis method for rotating machinery is proposed based on parameter-optimized variational mode decomposition (VMD) and a deep multi-kernel extreme learning machine (DMKELM). Firstly, an improved whale optimization algorithm (IWOA) is designed by introducing iterative chaotic mapping, a nonlinear convergence factor and an inertia weight, and is used to optimize the VMD parameters. Secondly, the optimized VMD (OVMD) is combined with sample entropy to reduce noise and reconstruct the signals. Finally, the radial basis function (RBF) kernel and the polynomial kernel (PK) are introduced to construct a mixed kernel function, which enhances the classification performance and generalization ability of the model. Two experiments on bearings and gears show that the fault diagnosis accuracy of DMKELM is 99 and 98.5%, respectively, which is at least 1% higher than that of the comparative methods and increases by 4% after noise reduction. The results show that the proposed method performs well in the early fault diagnosis of rotating machinery.


Introduction
Rotating machines are among the most common mechanical components in subways, computer numerical control (CNC) machine tools, hydroelectric power plants, steam turbines and other equipment [1][2][3]. Their health condition determines the safe and stable operation of the equipment, so it is of great importance to monitor the condition of rotating machinery and diagnose its faults. The life cycle of a piece of equipment generally includes three phases: normal condition, early failure phase and failure phase [4]. Serious failures do not occur immediately; they result from gradual deterioration during the early failure phase. Therefore, it is necessary to explore the early diagnosis of rotating machinery and eliminate hidden hazards in time, while the fault is not yet serious. However, the signal is nonlinear and nonstationary, and the fault information is too weak to be distinguished from strong noise, which makes it hard to obtain the fault features.
It is crucial for early fault diagnosis to reduce the interference from strong noise and extract the fault features effectively [5]. Empirical mode decomposition (EMD) [6,7], ensemble empirical mode decomposition (EEMD) [8], complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [9] and empirical wavelet transform (EWT) [10] are common signal processing methods that have been applied to rotating machines [11][12][13]. However, mode mixing often occurs in EMD, and the algorithm does not have the adaptive ability [14]. Due to the limitation of recursive decomposition, CEEMDAN cannot fundamentally solve the mode mixing problem. Residual noise remains in the fault feature signal after EWT filtering, which reduces the efficiency of feature acquisition. VMD [15] is an adaptive signal analysis method that performs better than EMD, EWT and EEMD in signal decomposition and feature extraction [16]. Li et al. [17] used VMD to further decompose the weak fault components and performed accurate fault diagnosis. Li et al. [18] applied VMD and the fractional Fourier transform to the weak fault diagnosis of bearings. Unfortunately, VMD has a serious disadvantage: its signal processing effect depends on the number of modal components (K) and the penalty factor (α), both of which must be preset. Traditionally, K and α are set by experience or by the central frequency method, which greatly limits the effect of signal processing. To solve this problem, Tian et al. [19] used an improved genetic algorithm to obtain the parameters of VMD. Zhao et al. [20] optimized the VMD parameters using particle swarm optimization (PSO) with the aim of minimizing the envelope entropy and applied the optimized VMD to early fault diagnosis. Gai et al. [21] took the minimum average envelope entropy as the objective function, used grey wolf optimization (GWO) to adaptively obtain the optimal parameters of VMD, and combined it with the Teager energy operator for the early fault diagnosis of bearings. The WOA is a new swarm intelligence optimization algorithm proposed by Mirjalili et al. [22]. Its advantages are easy parameter setting and strong optimization capability [23]. In [24], it was shown that WOA has better optimization ability than GWO, PSO and other algorithms. However, WOA searches well locally but has difficulty searching globally, and it easily falls into local optima. In view of these problems, iterative chaotic mapping, a nonlinear convergence factor and an adaptive inertia weight are introduced into the WOA algorithm. The parameters of VMD are optimized by IWOA, and the OVMD is then applied to the vibration signal processing.
The essence of intelligent fault diagnosis is pattern recognition. Traditional pattern recognition algorithms mainly include the support vector machine (SVM), backpropagation (BP) networks [25,26] and the enhanced network learning method (ENLM) [27]. However, SVM, BP and ENLM are shallow machine learning models, which express the nonlinear relationship between fault characteristics and fault patterns only weakly and have limited ability to process large amounts of data. Therefore, shallow machine learning models [28] cannot effectively solve the problem of pattern recognition against the background of big data. Unlike shallow models, a deep learning model applies multi-level nonlinear transformations to the input data, so that it can learn multi-level abstract features from big data and realize intelligent recognition [29]. Deep learning methods are therefore well suited to these problems: they can extract feature information directly from the original data without much prior knowledge. The convolutional neural network (CNN), deep belief network (DBN) and autoencoder (AE) are widely used deep learning models, and they have been applied to early fault detection in rotating machinery. Li et al. [30] used CNN to diagnose pitting in gears and achieved excellent early diagnosis performance. Gai et al. [21] used DBN for early fault diagnosis of bearings and obtained good diagnostic results. However, because of the large number of layers, the training time of these deep networks is long, and appropriate convergence cannot be guaranteed [31]. To solve this problem, Tang et al. [30] combined the extreme learning machine with the autoencoder and proposed the deep extreme learning machine (DELM), which effectively decreases the training time and achieves high classification accuracy. However, since the extreme learning machine (ELM) maps samples to the hidden layer by random mapping, the generalization of the model is affected.
In addition, DELM uses the sigmoid function as its activation function. The deficiency of the sigmoid function is that its gradient vanishes and its output is not zero-centered, which affects the gradient calculation of the weight update to some extent [32]. Therefore, it is meaningful to choose a suitable activation function to enhance the accuracy of the network. The Mish activation function is a non-monotonic, self-regularized activation function. It allows information to penetrate better into the neural network and thus achieves higher accuracy and generalization. A kernel function can map samples nonlinearly to a higher-dimensional space, which enhances the performance of the model. The combination of multiple kernel functions integrates the advantages of local and global kernels and compensates for the random mapping process of the extreme learning machine, so that the model is improved in terms of generalization ability and learning ability.
Combining models effectively [33,34] brings together their respective advantages, which plays an important role in improving the accuracy of fault diagnosis. Based on this, an intelligent early diagnosis method for rotating machinery based on the optimized VMD and DMKELM is presented. It targets the problem that the weak fault information in the early fault vibration signal of a rotating machine makes early diagnosis difficult. First, iterative chaotic mapping, a nonlinear convergence factor and an inertia weight are introduced into the WOA, and the resulting hybrid-strategy improved WOA is used to optimize VMD. The optimized VMD is used in combination with the sample entropy to reduce noise and reconstruct the signal. Then, the reconstructed signal is split and converted into a frequency domain signal. Finally, the DMKELM is proposed by introducing the mish function, RBF, PK and the mixed kernel mapping, which increases the classification and generalization capability of the model, and DMKELM is used for intelligent early fault diagnosis. The validation results on bearings and gears show that this method is superior to the compared methods in early diagnosis. The contributions are as follows: (1) An improved hybrid-strategy WOA algorithm is proposed to optimize VMD. (2) By introducing RBF and PK to construct a mixed kernel function instead of random mapping, the proposed DMKELM has strong classification and generalization ability.
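Encircling prey in WOA is modeled by formulas (1)-(2), written here in the standard form of [22]:

$$D = \left| C \cdot Y^{*}(i) - Y(i) \right| \tag{1}$$

$$Y(i+1) = Y^{*}(i) - A \cdot D \tag{2}$$

where $i$ is the current iteration number, $Y^{*}(i)$ is the best whale position found so far, $Y(i)$ is the current whale position, and $A$ and $C$ are coefficients obtained from formulas (3)-(5) [35]:

$$A = 2a \cdot r_1 - a \tag{3}$$

$$C = 2 r_2 \tag{4}$$

$$a = 2 - \frac{2i}{I_{\max}} \tag{5}$$

where $r_1$ and $r_2$ are random numbers in (0, 1), $I_{\max}$ is the maximum number of iterations, and $a$ is the convergence factor, which decreases linearly from 2 to 0.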
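The coefficient definitions above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard WOA form of [22]; the function names `convergence_factor`, `coefficients` and `encircle` are hypothetical and are not taken from the paper, and one scalar pair (A, C) is drawn per whale.

```python
import random


def convergence_factor(i, i_max):
    """Linear convergence factor a: 2 at i = 0, 0 at i = i_max."""
    return 2.0 - 2.0 * i / i_max


def coefficients(a, rng=random):
    """Draw A = 2*a*r1 - a and C = 2*r2 with r1, r2 uniform in [0, 1)."""
    r1, r2 = rng.random(), rng.random()
    return 2.0 * a * r1 - a, 2.0 * r2


def encircle(y, y_best, A, C):
    """Encircling update per dimension: Y(i+1) = Y* - A * |C * Y* - Y|."""
    return [yb - A * abs(C * yb - yi) for yi, yb in zip(y, y_best)]
```

Because A lies in [-a, a), the step toward the best position shrinks as a decays, which is what moves the population from exploration to exploitation.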

Hunting behavior
The mathematical model of hunting behavior is formula (6), written here in the standard form of [22]:

$$Y(i+1) = D' \cdot e^{bl} \cdot \cos(2\pi l) + Y^{*}(i) \tag{6}$$

where $D' = \left| Y^{*}(i) - Y(i) \right|$ is the distance between the whale and the prey, $b$ is a constant, and $l$ is a random number in (-1, 1). It is assumed that the whale updates its position with the shrinking encircling mechanism with probability $P_i$ and with the spiral model with probability $1 - P_i$. With $P_i = 0.5$, as used in the optimization steps of this paper, and $p$ a random number in (0, 1), the mathematical model is as follows:

$$Y(i+1) = \begin{cases} Y^{*}(i) - A \cdot D, & p < 0.5 \\ D' \cdot e^{bl} \cdot \cos(2\pi l) + Y^{*}(i), & p \ge 0.5 \end{cases} \tag{7}$$

Search for prey
When searching for prey, the position is updated by formulas (8)-(9):

$$D = \left| C \cdot Y_{rand} - Y(i) \right| \tag{8}$$

$$Y(i+1) = Y_{rand} - A \cdot D \tag{9}$$

where $Y_{rand}$ represents a randomly selected whale position.

The initialization of the population affects accuracy and speed, and a diversified population helps to improve algorithm performance. The standard WOA generates its initial population randomly, which can distribute the population unevenly over the search space and affects the search efficiency. Chaos has the characteristics of randomness and ergodicity, so the diversity of the initial population can be ensured by using the properties of chaos. Iterative chaotic mapping is a typical chaotic mapping that makes the distribution of the population in the search space more even. It is given by formula (10).

In formula (5), the convergence factor decreases linearly as the number of iterations increases, which easily leads to a slow convergence speed. A nonlinear adjustment strategy can not only guarantee the capability of global search and local exploration but also accelerate the convergence rate of the algorithm. The nonlinear convergence factor [36] is given by formula (11).

The inertia weight can balance the capability of local and global search. This paper introduces an adaptive inertia weight [37]. In the early stages of the algorithm, large weights are used to achieve strong global search performance and cover the search space. As the number of iterations increases and the optimal solution is approached, the weight value decreases exponentially, which greatly improves the capability of local search. The adaptive weight is given by formula (12), and the random walk formula of the whale by formula (13).
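To show how the three improvements fit into the WOA loop, the following Python sketch combines a chaotic initialization, a nonlinear convergence factor and a decaying inertia weight. The paper's formulas (10)-(13) are not reproduced in this text, so every functional form here is an assumption chosen for illustration: the iterative map x ← sin(bπ/x), the quadratic factor 2(1 − (i/I_max)²), the exponential weight decay from 0.9 to 0.4, and the placement of the weight on the reference position. All function names are hypothetical.

```python
import math
import random


def chaotic_init(n, dim, lb, ub, b=0.7, x0=0.37):
    """Initial population from the iterative chaotic map x <- sin(b*pi/x) (assumed form)."""
    pop, x = [], x0
    for _ in range(n):
        row = []
        for _ in range(dim):
            x = math.sin(b * math.pi / x) if x != 0.0 else x0
            row.append(lb + (ub - lb) * (x + 1.0) / 2.0)  # map [-1, 1] to [lb, ub]
        pop.append(row)
    return pop


def nonlinear_a(i, i_max):
    """Assumed nonlinear convergence factor: 2 at i = 0, 0 at i = i_max."""
    return 2.0 * (1.0 - (i / i_max) ** 2)


def inertia_weight(i, i_max, w_max=0.9, w_min=0.4):
    """Assumed exponentially decreasing weight from w_max to w_min."""
    return w_max * (w_min / w_max) ** (i / i_max)


def iwoa(f, dim, lb, ub, n=20, i_max=100, b_spiral=1.0, seed=0):
    """Minimize f; returns (best position, best fitness, best initial fitness)."""
    rng = random.Random(seed)
    pop = chaotic_init(n, dim, lb, ub)
    best = min(pop, key=f)[:]
    best_fit = first_fit = f(best)
    for i in range(i_max):
        a, w = nonlinear_a(i, i_max), inertia_weight(i, i_max)
        for k, y in enumerate(pop):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| >= 1: search around a random whale; |A| < 1: encircle the best whale
                ref = rng.choice(pop) if abs(A) >= 1.0 else best
                new = [w * r - A * abs(C * r - yj) for r, yj in zip(ref, y)]
            else:
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b_spiral * l) * math.cos(2.0 * math.pi * l)
                new = [abs(bj - yj) * spiral + w * bj for bj, yj in zip(best, y)]
            pop[k] = [min(max(v, lb), ub) for v in new]  # clamp to the search range
            fit = f(pop[k])
            if fit < best_fit:
                best, best_fit = pop[k][:], fit
    return best, best_fit, first_fit
```

For VMD optimization, `f` would decompose the signal with a candidate (K, α) pair and return the envelope-entropy fitness; here any callable on a list of floats can be used to try the loop.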

VMD optimization
Entropy is an indicator for evaluating the sparseness of signals: the smaller the entropy, the better the signal sparsity [12]. In this paper, the local envelope entropy in reference [38] is selected as the objective function to optimize VMD. The steps for optimizing the VMD parameters with the WOA improved by the hybrid strategy are as follows:
Step 1: Initialize the parameters: set the population size, the number of iterations and the range of the VMD parameters [K, α] (number of modes and penalty factor), and take the minimum envelope entropy as the fitness function.
Step 2: Randomly generate whale population, calculate the fitness function value, and record the optimal fitness value corresponding to the individual position.
Step 3: When p < 0.5 and |A| ≥ 1, update the position according to formula (13) to perform the foraging (search for prey) strategy.
Step 4: When p < 0.5 and |A| < 1, update the position according to formula (2) to perform the encircling strategy.
Step 5: When p ≥ 0.5, update the position according to formula (6) to perform the spiral hunting strategy.
Step 6: Check whether the stop condition is reached. If so, the optimization is complete and the optimal solution is output; otherwise, repeat steps (3)-(5) until the stop condition is met.
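The fitness evaluation used in these steps can be illustrated with a short Python sketch of the envelope entropy. It assumes the common definition in which the Hilbert envelope is normalized to unit sum and its Shannon entropy is taken; the local envelope entropy of [38] may differ in detail. The VMD decomposition itself is not implemented, so `fitness` takes already decomposed modes as input; all names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via a naive O(N^2) DFT; adequate for short records."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    # Keep DC (and Nyquist for even n), double positive frequencies, zero negative ones.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0
        elif k < (n + 1) // 2:
            gain = 2.0
        else:
            gain = 0.0
        spec[k] *= gain
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def envelope_entropy(x):
    """E = -sum(p_j * ln p_j), with p_j the envelope normalized to unit sum."""
    env = [abs(z) for z in analytic_signal(x)]
    total = sum(env)
    return -sum((e / total) * math.log(e / total) for e in env if e > 0.0)


def fitness(modes):
    """Minimum envelope entropy over the modes of one candidate decomposition."""
    return min(envelope_entropy(mode) for mode in modes)
```

A mode dominated by periodic impacts has a sparse envelope and therefore a low entropy, while a noise-like mode has a nearly flat envelope and an entropy close to ln N, which is why the minimum is used as the optimization target.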

Deep multi-kernel extreme learning machine
DELM is a deep network structure that combines the deep structure of an autoencoder with the learning efficiency of the ELM [32]. The DELM structure is divided into unsupervised feature learning and supervised feature classification.
Unsupervised feature learning uses the extreme learning machine autoencoder (ELM-AE) structure to obtain a compressed representation of the data layer by layer. Firstly, the data are mapped to the hidden layer H by $h = \mathrm{sigm}(Wx + b)$, where W represents the input weight and b is the bias. The hidden layer parameters $\beta$ are obtained by solving $\min_{\beta} \lVert H\beta - X \rVert$. The supervised feature classification uses ELM for classification. ELM can extract effective information from the reduced-dimensional samples and enhance the classification performance of the algorithm.
The traditional deep extreme learning machine adopts the sigmoid function as its activation function. The disadvantage of the sigmoid function is that its gradient vanishes for large inputs and its output is not zero-centered, which affects the gradient calculation of the weight update to some extent. The mish function, $f(x) = x \cdot \tanh(\ln(1 + e^{x}))$, shown in formula (14), retains small negative values and stabilizes the gradient flow of the network. It has better accuracy and generalization performance. Therefore, the mish function serves as the activation function in this paper.
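The difference between the two activation functions can be checked numerically with the following minimal Python sketch. It uses the standard definition of mish, x·tanh(softplus(x)); the numerically stable helper functions are an implementation choice of this sketch and not part of the paper.

```python
import math


def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(ln(1 + e^x))."""
    return x * math.tanh(softplus(x))


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

For negative inputs mish returns small negative values instead of saturating toward zero, and for large positive inputs it approaches the identity, whereas the sigmoid output stays inside (0, 1) and is never centered on zero.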
The kernel method has superior nonlinear mapping ability, and its most representative application is SVM. It can effectively map samples to a higher-dimensional space, which enhances the classification performance of the algorithm. For this reason, a kernel function is introduced to replace the random mapping process of the extreme learning machine. However, a single kernel function is difficult to adapt to samples with different data characteristics. To achieve stronger learning and generalization ability, the variable weighting method is used to combine two different kinds of kernel functions into a multi-kernel function, which combines the advantages of local and global kernel functions and improves the generalization and learning capabilities. The radial basis function is characterized by fast convergence and high learning ability. The polynomial kernel has a great influence on the samples and has excellent generalization properties [39]. On this basis, RBF and PK are introduced in this paper; their expressions are given in formulas (15) and (16), respectively. Using the kernel function $K(x, y)$, $HH^{T}$ is expressed as in formulas (17)-(18), from which the network output is obtained.
The mechanism of DMKELM is shown in Fig. 1. Firstly, ELM-AE is applied to process the input data layer by layer to obtain reliable features and improve the classification performance. The multi-kernel function then combines the advantages of the single kernel functions, realizes the feature mapping in high-dimensional space, and further improves the generalization ability and classification performance of the algorithm.
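A weighted combination of the two kernels can be sketched in Python as follows. The exact kernel expressions of the paper are not reproduced in this text, so the Gaussian form exp(−‖x − y‖²/(2σ²)), the polynomial form (x·y + c)^d, and the convex combination with a single weight are assumptions, and all function and parameter names are hypothetical. The default values only echo the settings reported for the bearing experiment (weight 0.5, RBF parameter 3, PK parameters 2 and 3); which PK parameter is the offset and which is the degree is a guess.

```python
import math


def rbf_kernel(x, y, sigma=3.0):
    """Assumed Gaussian form exp(-||x - y||^2 / (2 * sigma^2)); local kernel."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def poly_kernel(x, y, c=2.0, d=3):
    """Assumed polynomial form (x . y + c)^d; global kernel."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d


def mixed_kernel(x, y, weight=0.5, sigma=3.0, c=2.0, d=3):
    """Convex combination of the RBF and polynomial kernels."""
    return weight * rbf_kernel(x, y, sigma) + (1.0 - weight) * poly_kernel(x, y, c, d)


def gram_matrix(samples, kernel=mixed_kernel):
    """Kernel matrix with entries K(x_i, x_j), which stands in for H H^T."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]
```

Since a non-negative weighted sum of two valid kernels is again a valid kernel, the resulting Gram matrix can replace the random-feature product in the kernel ELM solution without further changes.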

Early fault diagnosis process
The process of early intelligent fault diagnosis is shown in Fig. 2.
Step 1: Tests are conducted on the fault simulation test bench to collect the vibration signals of bearings and gears in different states.
Step 2: The WOA is improved with the hybrid strategy, and the improved WOA is used to optimize the VMD parameters with the minimum local envelope entropy as the objective.
Step 3: The optimized VMD is used to decompose the signal into modal components, and the sample entropy of each modal component is calculated. The components with large sample entropy are discarded, and the remaining components are reconstructed to obtain the denoised signal.
Step 4: The RBF and PK kernel functions are introduced, and the Mish activation function is used, to construct the deep multi-kernel extreme learning machine.
Step 5: The time domain signal obtained in Step 3 is converted into a frequency domain signal, and the training set and test set are partitioned. The data are input to DMKELM for diagnosis.
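Step 3 of the process above can be sketched in Python as follows. The sketch assumes the usual definition of sample entropy, SampEn(m, r) = −ln(A/B) with embedding dimension m = 2 and tolerance r = 0.2 times the standard deviation; the paper's own settings and its threshold for "large" sample entropy are not given in this text, so `threshold` and the function names are hypothetical.

```python
import math


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with Chebyshev distance and tolerance r = r_factor * std(x)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the first n - m templates for both lengths, as in the usual definition.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined for this record; treated as maximally irregular
    return -math.log(a / b)


def reconstruct(modes, threshold):
    """Sum the modes whose sample entropy does not exceed the threshold."""
    kept = [mode for mode in modes if sample_entropy(mode) <= threshold]
    return [sum(vals) for vals in zip(*kept)] if kept else []
```

Regular, fault-related modes repeat their own patterns and score low, while noise-dominated modes score high, so discarding the high-entropy modes before summation acts as the denoising step.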

Simulation verification
To prove the validity of the IWOA, WOA, GWO and PSO are selected for comparative analysis. The maximum number of iterations of each algorithm is set to 100 and the population size is set to 30; the specific parameters are shown in Table 1. Ten standard test functions are selected for simulation analysis. The advantages of the WOA algorithm were verified with test functions, and the ranges of the corresponding test functions are given in [39]. The test functions given in [40], shown in Table 2, can verify the feasibility of improving the WOA algorithm with the hybrid strategy. F1-F7 are unimodal test functions, which are mainly used to test the exploitation (local search) ability of the algorithm. F8-F10 are multimodal functions, which are mainly used to test the search ability of the algorithm. All algorithms were programmed in MATLAB R2020b. Each test function in Table 2 is executed 10 times continuously and independently, and the best fitness value of each test function in the 10 runs is recorded to calculate the average fitness value. Table 3 summarizes the test results.
When solving F1-F10, IWOA achieves the best performance and optimization effect among the compared algorithms and obtains a stable global optimal solution. As shown in Fig. 3, IWOA has a higher optimization speed than WOA and reaches the convergence state earlier, which shows that IWOA converges quickly. However, it is not enough to compare the algorithms only on the basis of the average value and the optimal value; statistical tests are needed to prove that the new algorithm has obvious advantages over the other algorithms. Chakraborty [41] used the Friedman test and boxplot analysis to verify the reliability of an algorithm. In this paper, the Friedman test and boxplot analysis are used to prove the rationality of the proposed algorithm. The results are shown in Table 4 and Fig. 4.
The results of Friedman test show the superiority of the proposed algorithm. The box plot analysis confirms the stability of the algorithm in finding the results. All in all, the improved WOA algorithm shows significant advantages in both optimization speed and optimization accuracy, which shows that the introduction of iterative chaotic mapping, nonlinear convergence factors and inertia weights to improve the WOA algorithm is effective.

Data acquisition
The validation data come from the IMS bearing data repository of the University of Cincinnati. The sampling frequency is 20 kHz. Figure 5 shows the experimental bench. The bearing vibration signals recorded during 1000-1500 min and 4800-5300 min were selected as the samples of the normal condition and the weak fault, respectively [12]. The acquired vibration signals are shown in Fig. 6. When the condition of a rotating machine deteriorates, the motion stability is affected first, which leads to an increase in amplitude. However, because the disturbance is weak, it is difficult to determine the nature of the fault in the time and frequency domains. Therefore, advanced fault diagnosis methods must be used to determine the nature of the fault on a rotating machine.

Signal processing
Using a group of collected data as an example, the envelope entropy is used as the fitness value. The population size is 10, the number of iterations is 30, the range of the penalty factor α is (1500, 4500), and the range of K is (4, 8). The hybrid strategy is applied to enhance WOA.
To prove the feasibility of the modified WOA, the unimproved WOA, GWO and SFO were also chosen for the optimization. Fig. 7 shows how the fitness value varies with the number of iterations. When IWOA is used to optimize VMD, the fitness value reaches its minimum of 3.9264 at the 9th iteration and remains converged in the following iterations. This shows that IWOA has excellent performance in optimizing VMD; the optimal penalty factor α and number of modes K are 2124 and 7, respectively.
Compared with the comparative methods, the WOA improved by the hybrid strategy is far superior in terms of accuracy and speed, which proves the effectiveness of IWOA. OVMD is applied to process the signal, and the sample entropy of each component is then calculated to reconstruct it. Taking the sample from the previous section as an example, the obtained number of modal components and penalty factor are substituted into VMD for signal decomposition, and seven modal components are obtained. The sample entropy of the seven modal components is then calculated. The calculated fault characteristic frequency of the bearing outer ring is 236 Hz. The 1×, 1.5× and 2× multiples of the outer ring fault frequency fall near the three obvious impact frequencies in the envelope diagram, so it can be preliminarily determined that the bearing has an outer ring fault. The envelope analysis of the signal without noise reduction is shown in Fig. 9. The comparison of Figs. 8 and 9 shows that the noise reduction method based on the optimized VMD and sample entropy has great advantages.
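The outer ring fault frequency quoted above can be reproduced with the standard ball-pass frequency formula for a stationary outer ring, BPFO = (n/2)·f_r·(1 − (d/D)·cos φ). The bearing geometry and shaft speed are not stated in this text; the values in the Python sketch below are those commonly quoted for the IMS test rig and should be treated as assumptions.

```python
import math


def bpfo(n_rollers, shaft_hz, roller_d, pitch_d, contact_deg):
    """Ball-pass frequency of the outer race for a stationary outer ring."""
    ratio = roller_d / pitch_d * math.cos(math.radians(contact_deg))
    return 0.5 * n_rollers * shaft_hz * (1.0 - ratio)


# Assumed IMS test-rig values (not stated in the text): 16 rollers, 2000 rpm,
# roller diameter 0.331 in, pitch diameter 2.815 in, contact angle 15.17 degrees.
f_outer = bpfo(16, 2000 / 60, 0.331, 2.815, 15.17)

# Frequencies at which impact peaks are looked for in the envelope spectrum.
harmonics = [k * f_outer for k in (1, 1.5, 2)]
```

Under these assumptions the formula gives roughly 236.4 Hz, in line with the 236 Hz used in the envelope analysis.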
In order to further verify the effectiveness of VMD parameter optimization, EEMD is selected as the comparison algorithm and used for signal processing combined with sample entropy for reconstruction. The envelope of the reconstructed signal is shown in Fig. 10. It can be seen that the signal reconstructed with EEMD has a significant effect only at frequency doubling, and the noise reduction effect is not obvious compared with the optimized VMD, which again proves the feasibility of parameter optimization of VMD. Based on the above analysis, the optimized VMD for signal processing can decrease the impact of noise, make the fault feature more obvious and enhance the reliability of fault diagnosis results.

Bearing fault diagnosis
For better pattern recognition, the processed signal is converted into a frequency domain signal and split into training data and test data. 200 sets of data are collected for each condition, which makes up a total of 800 sets of samples. Of these, 200 sets are chosen as test samples, and the remaining 600 sets serve as training samples. The mish function is used as the activation function, the regularization coefficient is 10, the number of hidden layers is 3, the hidden layer nodes are [60 50 50], the kernel function weight is 0.5, the RBF kernel parameter is 3, and the PK parameters are 2 and 3. With these settings, the bearing fault diagnosis accuracy is 99%, and the confusion matrix is shown in Fig. 11. The labels 1, 2, 3 and 4 denote the inner ring fault, outer ring fault, rolling element fault and normal condition of the bearing, respectively. In general, the DMKELM has high accuracy and feasibility in early fault diagnosis.
To prove the feasibility of this method, CNN, DBN and DELM, which are commonly used in deep learning, are selected as comparison methods. The results are shown in Fig. 12. They show that DMKELM has certain advantages over CNN, DBN and DELM in early fault detection and is suitable for the early fault diagnosis of bearings. From Fig. 12, the diagnosis accuracy increases by 4% after the OVMD noise reduction, which indicates that the optimized VMD method achieves better noise reduction. When the input of the pattern recognition method is the denoised signal, the accuracy of the DMKELM algorithm is higher than that of the DELM, CNN, DBN, DCNN and IRCNN algorithms by 2.5%, 5%, 5.5%, 1% and 0.5%, respectively. Compared with the models with only one kernel function, the accuracy of the DMKELM algorithm is higher by 3 and 1%, indicating that the DMKELM model constructed with multiple kernel functions is very suitable for early fault diagnosis. To further verify the feasibility of the proposed method, EEMD is used as the feature extraction method and combined with DMKELM for bearing fault diagnosis. The results are shown in Fig. 12g, and the accuracy is 97%. The proposed method improves on this comparison method by 2%, which reconfirms its feasibility.

Gear fault diagnosis
The gear vibration data were collected on SpectraQuest's gearbox dynamics simulation test rig. As shown in Fig. 13, the numbers of gear teeth in the planetary gearbox are 36 and 28, and those in the parallel gearbox are 100, 90, 36 and 29. The sampling frequency is 12,800 Hz. The vibration signals of four gear conditions are acquired: normal condition, tooth surface wear, wear between teeth and tooth breakage. The degree of tooth surface wear and of wear between teeth is 0.15 mm, and the broken tooth has a depth of 1 mm and a length of 1.5 mm. Figure 14 shows the time-domain signals of the four gear states. The vibration signal of the faulty gear shows no obvious fault features, so it is impossible to judge the failure of the gear from the time domain signal.
To reduce the influence of noise on the gear signal, IVMD is applied to decompose and reconstruct the signal. To check the effect of noise reduction, an envelope analysis of the reconstructed signal and the original vibration signal is carried out. It can be seen that the original vibration signal contains more noise, which slightly affects the subsequent identification of the fault types. The noise is significantly reduced after noise reduction, which contributes to a higher fault diagnosis rate.
To improve pattern recognition, the processed signal is converted into a frequency domain signal and divided into a training set and a test set. The resulting gear fault diagnosis accuracy is 99.5%, and the result is shown in Fig. 17. The labels 1, 2, 3 and 4 represent the four states of the gear, namely wear between teeth, tooth breakage, tooth surface wear and normal state, respectively. The experimental gear data prove that this method is well suited to the early diagnosis of rotating machinery.

Conclusions and future work
The early fault impact component in the vibration signal of rotating machinery is weak and easily masked by intense noise, resulting in low fault diagnosis accuracy. An intelligent early fault diagnosis method based on IVMD and DMKELM is proposed and shown to be highly practical in early fault diagnosis. The conclusions are as follows: (1) Iterative chaotic mapping, a nonlinear convergence factor and an inertia weight are introduced into WOA, and the resulting IWOA is used to optimize VMD. Compared with the other methods, IWOA is superior in optimization accuracy and speed. (2) The optimized VMD is combined with the sample entropy to denoise the signal. Experiments show that this method has a good denoising effect. (3) The generalization ability and precision of the DELM are improved by adding a mish activation function and a mixed kernel function. A DMKELM method is proposed for early detection of fault patterns, and the experimental accuracy for bearings and gears is 99 and 99.5%, respectively.
Although the proposed DMKELM achieved good diagnosis results, the introduction of the kernel function means that the diagnosis result of the model is influenced not only by the number of nodes in the hidden layers and the regularization coefficient, but also by the kernel weight and the kernel function parameters. Future research will focus on optimizing the DMKELM hyperparameters to enhance the performance of the model. In addition, weak signals have an uncertain impact on the accuracy of fault diagnosis. How to deal with weak signals is of great value in improving the accuracy of fault diagnosis and should also be the subject of future research.