An Improved Rolling Bearing Fault Diagnosis Method using DenseNet-BLSTM

To address the problems of conventional rolling bearing fault diagnosis methods, which require a large amount of prior knowledge and are prone to artificially introduced errors, this paper presents a DenseNet-BLSTM method for rolling bearing fault diagnosis. The method combines the multi-scale deep feature extraction ability of the densely connected convolutional network (DenseNet) with the sequence modeling advantage of bidirectional long short-term memory (BLSTM). First, multi-scale abstract features are extracted from the vibration signal of a rolling bearing using one-dimensional convolution kernels. Then, the BLSTM is used to learn the time dependence of the features. Finally, the fully connected layers map the feature information to the corresponding fault modes. The experimental results show that the proposed method achieves an accuracy of 99.5% in multi-load scenarios and has good load adaptability and anti-interference ability.


Introduction
With the rapid development of modern industry, machinery and equipment are becoming increasingly complex and automated. This improves work efficiency but also places higher demands on the safe and stable operation of mechanical equipment [1,2]. As one of the core components of mechanical equipment, rolling bearings bear various alternating stresses during operation, so their probability of failure is relatively high [3]. A bearing defect may cause an accident and heavy losses. Therefore, it is important to diagnose rolling bearing faults in time.
Rolling bearing faults usually occur in the inner raceway, the outer raceway, and the rolling elements, and they produce periodic impacts as the mechanical equipment operates [4]. Therefore, bearing faults can be identified by analyzing the vibration signal. Conventional fault diagnosis techniques for rolling bearings usually extract fault features with signal processing algorithms, such as the fast Fourier transform (FFT) [5], wavelet transform (WT) [6], and empirical mode decomposition (EMD) [7]. Then, principal component analysis (PCA) [8] and independent component analysis (ICA) [9] are used to further filter the fault features. Finally, classifiers such as the support vector machine (SVM) [10], k-nearest neighbors (KNN) [11], and multi-layer perceptron (MLP) [12] are used to identify the bearing faults. These methods have achieved great success in fault diagnosis, but they still have disadvantages that cannot be ignored. For example, their feature extraction and selection steps rely on a large amount of prior knowledge, are prone to artificially introduced errors, and generalize poorly [13]. In addition, shallow models such as SVM have difficulty establishing recognition mechanisms for complex data. Therefore, conventional rolling bearing fault diagnosis methods have clear limitations.
With the continuous improvement of modern fault monitoring techniques, the amount of data available for analysis has increased significantly. Because deep learning can process large amounts of data and perform intelligent analysis, it has gradually become a research hotspot in fault diagnosis [14]. For example, the convolutional neural network (CNN) has been widely used because of its powerful feature extraction capability [15]. Janssens et al. [16] proposed a CNN-based fault detection method for rotating machinery, which extracts features through convolution kernels and achieves end-to-end fault diagnosis. Wang et al. [17] proposed a CNN-based model that adopts the multi-head attention mechanism to optimize the CNN structure, yielding a new convolutional network for intelligent bearing fault diagnosis. DenseNet [18] is a structure obtained by densely connecting convolutional layers. Compared with the standard CNN structure, it makes full use of features from different levels and makes deep networks easier to implement. Albahli et al. [19] proposed an enhanced DenseNet model for coronavirus disease detection, which performed better than an ordinary CNN. Although these methods can effectively extract abstract features, they have difficulty fully capturing the internal temporal correlation of bearing vibration signals.
Long short-term memory (LSTM) is an improved recurrent neural network (RNN) that can effectively process sequence data. Its gate structure keeps the gradient stable during training, allowing it to extract the temporal features of a signal. Liu et al. [20] proposed a bearing fault prediction model combining LSTM and statistical process analysis, which achieves higher prediction accuracy than an RNN. Combining LSTM with a multi-layer self-attention mechanism, Xia et al. [21] proposed a remaining useful life estimation method to improve the operation and maintenance of mechanical systems. However, the memory cell at each time step of an LSTM contains only the information of previous inputs, which is not sufficient to capture the time dependence of a periodic signal. To solve this problem, Graves et al. [22] proposed the bidirectional long short-term memory (BLSTM), which learns the time dependence of data in both the forward and backward directions and can thus fully capture the temporal features of the input signal. However, these sequence models still have difficulty extracting the abstract features of fault signals, so they need to be combined with other feature extraction methods to identify bearing faults effectively.
This paper devises an improved network that combines DenseNet and BLSTM (called DenseNet-BLSTM hereafter) to diagnose rolling bearing faults. DenseNet is employed to extract and fuse abstract features, and BLSTM is applied to further extract temporal features from the multi-level abstract features. The main contributions are as follows.
(i) We design an end-to-end bearing fault diagnosis network, DenseNet-BLSTM (DBLSTM). The network takes the original vibration signal as input and combines the advantages of DenseNet and BLSTM to fully extract the features of bearing vibration data.
(ii) The network uses DenseNet to extract and fuse multi-level abstract features, which enhances feature propagation and overcomes the disadvantages of the standard CNN structure.
(iii) The network uses BLSTM to capture the time dependence of the feature information, which overcomes the limitation that LSTM memory cells capture only the information of previous time steps.
(iv) The validity of the presented method is demonstrated by experiments in a variety of scenarios. The results show that the presented network can accurately locate bearing faults and identify their severity, and that it has strong load adaptability and good anti-interference ability. Comparisons are also given to verify the advantages of the presented method.
The rest of this paper is organized as follows. Section 2 reviews the theory of the related networks. Section 3 presents the DenseNet-BLSTM method in detail. Section 4 presents the experimental results in various scenarios. Finally, Section 5 presents the conclusion.

Preliminaries
This section reviews the CNN, DenseNet, LSTM, and BLSTM networks.

CNN
CNN is a deep feedforward neural network characterized by local connections and weight sharing [23]. It can adaptively extract the feature information hidden in data through convolution and pooling operations, and the extracted features become more abstract as the network deepens. A CNN is usually composed of convolution layers, pooling layers, and fully connected layers [24].
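The convolution layer employs multiple filters to obtain the feature maps of the input data through the convolution operation and an activation function. The output of the convolutional layer is
$$X_i^{(l)} = f\left(\sum_{j} X_j^{(l-1)} * W_{ji}^{(l)} + B_i^{(l)}\right) \tag{1}$$
where $X_i^{(l)}$ represents the $i$-th output feature map of layer $l$, $X_j^{(l-1)}$ is the $j$-th feature map of the previous layer, $W_{ji}^{(l)}$ represents the weight matrix (convolution kernel) connecting them, $B_i^{(l)}$ represents the bias, $*$ denotes convolution, and $f(\cdot)$ denotes the nonlinear function.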
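A minimal pure-Python sketch of the convolution-layer formula for one-dimensional signals follows. It assumes ReLU as $f(\cdot)$, stride 1, no padding, and the cross-correlation form of "convolution" used by most deep learning libraries; the function name `conv1d_layer`, the kernels, and the signal values are made up for illustration.

```python
def conv1d_layer(inputs, kernels, biases, f=lambda v: max(0.0, v)):
    """Compute X_i^(l) = f(sum_j X_j^(l-1) * W_ji^(l) + B_i^(l)) for 1-D signals.

    inputs:  list of J input feature maps, each a list of floats of equal length
    kernels: kernels[j][i] is the 1-D kernel W_ji linking input map j to output map i
    biases:  list of I scalar biases B_i (one per output map)
    f:       nonlinearity; ReLU is an assumption of this sketch
    Uses 'valid' padding, stride 1, and cross-correlation, as most DL libraries do.
    """
    n_in, n_out = len(inputs), len(biases)
    k = len(kernels[0][0])
    out_len = len(inputs[0]) - k + 1
    outputs = []
    for i in range(n_out):
        fmap = []
        for t in range(out_len):
            acc = biases[i]
            for j in range(n_in):  # sum over all input feature maps
                w = kernels[j][i]
                acc += sum(inputs[j][t + m] * w[m] for m in range(k))
            fmap.append(f(acc))
        outputs.append(fmap)
    return outputs

# One input channel (a toy vibration snippet), two output channels.
signal = [[0.0, 1.0, 0.0, -1.0, 0.0, 1.0]]
kernels = [[[1.0, -1.0], [0.5, 0.5]]]  # a difference kernel and a smoothing kernel
biases = [0.0, 0.0]
print(conv1d_layer(signal, kernels, biases))
# [[0.0, 1.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0, 0.5]]
```

Each output map mixes every input map through its own kernel, which is why the weight carries the two indices $j$ and $i$.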
The pooling layer reduces the dimension of the feature map by computing local maximum or average values, which filters out part of the redundant information [25]. After abstract feature information has been obtained through multiple convolution and pooling layers, the fully connected layers further fit the mapping between the feature information and the labels. For multi-class classification, the last fully connected layer uses Softmax to obtain the classification result:
$$\mathrm{Softmax}(a_i) = \frac{e^{a_i}}{\sum_{k=1}^{K} e^{a_k}} \tag{2}$$
where $K$ is the total number of categories, $a_i$ is the input corresponding to label $i$, and $\mathrm{Softmax}(a_i)$ denotes the probability that the input sample corresponds to label $i$. The label with the maximum probability is the prediction result of the network.
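The pooling and Softmax steps can be sketched as follows, assuming non-overlapping max pooling with a window of 2 (the pooling size is not stated here). Subtracting the maximum score before exponentiating leaves the Softmax result unchanged but avoids overflow; the helper names and numbers are illustrative.

```python
import math

def max_pool1d(fmap, size=2, stride=2):
    """Keep the local maximum of each window (non-overlapping when size == stride)."""
    return [max(fmap[t:t + size]) for t in range(0, len(fmap) - size + 1, stride)]

def softmax(a):
    """Softmax(a_i) = exp(a_i) / sum_k exp(a_k), computed stably by shifting by max(a)."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

pooled = max_pool1d([0.2, 0.9, 0.4, 0.1, 0.7, 0.3])          # [0.9, 0.4, 0.7]
probs = softmax([2.0, 1.0, 0.1])                               # scores -> probabilities
prediction = max(range(len(probs)), key=probs.__getitem__)     # label with highest probability
```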

DenseNet
The DenseNet model draws on the advantages of the Inception and ResNet models [26]. To ensure information flow through the network, DenseNet connects all layers directly, so that the input of each convolution layer includes the outputs of all previous layers. The structure of DenseNet is shown in Fig. 1. This structure enables the network to extract abstract features at different levels and enhances feature propagation. In addition, the dense shortcut connections provide short paths between layers close to the input and those close to the output, which alleviates the vanishing gradient problem.
In a standard CNN, the number of connections equals the number of convolutional layers. In DenseNet, by contrast, a network with $L$ convolution layers has $L(L+1)/2$ direct connections between layers. Each DenseNet layer combines batch normalization (BN), the rectified linear unit (ReLU) activation function, convolution, and shortcut operations, so that the input of each layer comes from the outputs of all previous layers. The output of layer $l$ is then computed as
$$x_l = H_l\left([x_0, x_1, \ldots, x_{l-1}]\right) \tag{3}$$
where $x_0$ is the input of the DenseNet, $x_{l-1}$ is the output of layer $l-1$, $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps produced by layers $0, \ldots, l-1$, and $H_l(\cdot)$ represents the non-linear transformation of layer $l$.
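The bookkeeping behind dense connectivity can be illustrated with a short sketch. It assumes that every $H_l$ outputs a fixed number of feature maps (the "growth rate" of the original DenseNet design); the values 16 and 12 and both helper names are hypothetical and not the settings of this paper.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Channels seen by each H_l when layer l receives concat([x_0, ..., x_{l-1}]).

    Assumes every H_l (BN-ReLU-Conv) emits `growth_rate` feature maps.
    """
    inputs_per_layer = []
    channels = in_channels
    for _ in range(num_layers):
        inputs_per_layer.append(channels)  # width of the concatenated input to H_l
        channels += growth_rate            # x_l is appended to the running concatenation
    return inputs_per_layer, channels      # final value = width of the block output

def dense_connections(num_layers):
    """Count direct links i -> j with 0 <= i < j <= L, which equals L(L+1)/2."""
    return sum(1 for j in range(1, num_layers + 1) for i in range(j))

per_layer, out_channels = dense_block_channels(in_channels=16, growth_rate=12, num_layers=4)
print(per_layer, out_channels)   # [16, 28, 40, 52] 64
print(dense_connections(4))      # 10
```

Layer $l$ receives one link from each of $x_0, \ldots, x_{l-1}$, so summing $1 + 2 + \cdots + L$ gives the $L(L+1)/2$ count quoted above.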

LSTM
To solve the vanishing and exploding gradient problems in RNN training, Hochreiter et al. [27] proposed the LSTM structure, an improved sequence model. Gate structures are introduced into each memory unit so that the LSTM can control the accumulation of internal information. The structure of LSTM is shown in Fig. 2.
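The LSTM cell contains three gates, namely the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$, which are expressed as
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \tag{4}$$
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \tag{5}$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \tag{6}$$
where $h_{t-1}$ is the output of the previous cell, $x_t$ is the current input, $W$ is the weight matrix, $b$ is the bias, and $\sigma$ is the Sigmoid activation function. After the values of the three gates are obtained, the memory cell updates its state and produces the output:
$$\tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \tag{7}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{8}$$
$$h_t = o_t \odot \tanh(c_t) \tag{9}$$
where $\odot$ denotes element-wise multiplication, $\tilde{c}_t$ is the candidate state, and the updated state $c_t$ is obtained by combining $\tilde{c}_t$ and the previous state $c_{t-1}$ through the gates. The output $h_t$ in (9) is obtained by passing $c_t$ through the tanh function and filtering it with the output gate $o_t$.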
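A single LSTM time step following the gate and state equations above can be sketched in plain Python. The hidden size, the zero weights in the usage lines, and the names `lstm_step` and `params` are illustrative assumptions; a practical implementation would use a tensor library.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, z):
    """Return W z + b for a weight matrix W (list of rows) and vectors b, z."""
    return [sum(w * zk for w, zk in zip(row, z)) + bk for row, bk in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params[g] = (W_g, b_g) acts on the concatenation [h_{t-1}, x_t]."""
    z = h_prev + x_t                                           # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*params["f"], z)]          # forget gate f_t
    i = [sigmoid(v) for v in affine(*params["i"], z)]          # input gate i_t
    o = [sigmoid(v) for v in affine(*params["o"], z)]          # output gate o_t
    c_tilde = [math.tanh(v) for v in affine(*params["c"], z)]  # candidate state
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]  # c_t
    h = [ot * math.tanh(ck) for ot, ck in zip(o, c)]                         # h_t
    return h, c

# Hidden size 1, input size 1, all weights and biases zero (illustrative only).
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "o": zero, "c": zero}
h, c = lstm_step([1.0], [0.0], [1.0], params)
# Every gate is sigmoid(0) = 0.5 and the candidate is tanh(0) = 0, so c_t = 0.5 * c_{t-1}.
```

The forget gate alone decides how much of $c_{t-1}$ survives, which is how the gate structure controls the accumulation of internal information.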

BLSTM
BLSTM extends LSTM by processing the sequence in both time directions [28]. Through two LSTM layers, the input data are modeled sequentially in the forward and backward directions. Supposing the length of the input sequence is $T$, the BLSTM structure is shown in Fig. 3.
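In the BLSTM model (Fig. 3), the forward layer iterates from time step 1 to $T$, while the backward layer iterates from $T$ to 1. The output is jointly determined by the state information of the two LSTM layers:
$$\overrightarrow{C}_t = H\left(W_{x\overrightarrow{C}}\, x_t + W_{\overrightarrow{C}\overrightarrow{C}}\, \overrightarrow{C}_{t-1} + b_{\overrightarrow{C}}\right) \tag{10}$$
$$\overleftarrow{C}_t = H\left(W_{x\overleftarrow{C}}\, x_t + W_{\overleftarrow{C}\overleftarrow{C}}\, \overleftarrow{C}_{t+1} + b_{\overleftarrow{C}}\right) \tag{11}$$
$$y_t = W_{\overrightarrow{C}y}\, \overrightarrow{C}_t + W_{\overleftarrow{C}y}\, \overleftarrow{C}_t + b_y \tag{12}$$
where $H$ is the activation function, $W$ and $b$ are the weights and biases, respectively, $\overrightarrow{C}_t$ and $\overleftarrow{C}_t$ are the states of the forward and backward layers at time step $t$, and $y_t$ is the output at time step $t$. Because the state at each time step contains information from both previous and subsequent inputs, BLSTM can learn the time dependence of the sequence more fully.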
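The bidirectional wiring can be sketched as below. To keep the structure visible, a hypothetical scalar tanh recurrence stands in for the LSTM cell $H$; the function names and all weights are assumptions for illustration only.

```python
import math

def rnn_step(x_t, s_prev, w_x, w_s, b):
    """Hypothetical scalar stand-in for H: tanh(w_x * x_t + w_s * s_prev + b)."""
    return math.tanh(w_x * x_t + w_s * s_prev + b)

def bidirectional(xs, fwd, bwd, out):
    """Run H forward (t = 1..T) and backward (t = T..1), then combine per time step."""
    T = len(xs)
    c_fwd, c_bwd = [0.0] * T, [0.0] * T
    s = 0.0
    for t in range(T):             # forward layer: uses the state at t - 1
        s = rnn_step(xs[t], s, *fwd)
        c_fwd[t] = s
    s = 0.0
    for t in reversed(range(T)):   # backward layer: uses the state at t + 1
        s = rnn_step(xs[t], s, *bwd)
        c_bwd[t] = s
    w_f, w_b, b_y = out
    # y_t = W_fy * C_fwd_t + W_by * C_bwd_t + b_y
    return [w_f * cf + w_b * cb + b_y for cf, cb in zip(c_fwd, c_bwd)]

ys = bidirectional([1.0, 0.0, -1.0], fwd=(1.0, 0.5, 0.0), bwd=(1.0, 0.5, 0.0), out=(1.0, 1.0, 0.0))
```

Because the backward pass starts at the last time step, the first output `ys[0]` already depends on the final input, which a forward-only recurrence cannot provide.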

Model
This section describes the structure of the presented DenseNet-BLSTM.

DenseNet-BLSTM
The DenseNet can adaptively extract and fuse the multi-level abstract features of the signal, reducing the dependence on prior knowledge of fault diagnosis and the errors caused by manual feature selection. The BLSTM can capture the time dependence of the feature sequences, which makes the feature information more comprehensive. Combining the advantages of DenseNet and BLSTM, this paper presents a DenseNet-BLSTM network for rolling bearing fault diagnosis.
Fig. 4 shows the structure of DenseNet-BLSTM. The original vibration signal is fed into the network, and the abstract features of the bearing fault are extracted by the DenseNet. Then, the BLSTM learns the time dependence of these features in both the forward and backward directions. Finally, the fully connected layers map the feature information to the fault modes, and the diagnosis result is obtained by the Softmax classifier.

Model training
During training, the parameters of DenseNet-BLSTM are updated according to the error so that the mapping between the input information and the true labels is fitted. The cross-entropy loss function is used as the cost function in this paper.
The loss is computed as
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_k^{(i)} \log \hat{y}_k^{(i)} \tag{13}$$
where $\hat{y}^{(i)}$ and $y^{(i)}$ are the predicted probability vector and the one-hot true label of sample $i$, respectively, $\hat{y}_k^{(i)}$ and $y_k^{(i)}$ are their $k$-th components, $K$ is the number of categories, and $N$ is the total number of samples.
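The cross-entropy cost can be checked numerically with a small sketch, assuming one-hot labels and Softmax outputs. The `eps` clipping is a numerical guard added here and is not part of the formula; the labels and probabilities are made up.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -(1/N) * sum_i sum_k y_k^(i) * log(yhat_k^(i)) for one-hot labels.

    y_true: list of one-hot label vectors; y_pred: list of Softmax probability vectors.
    """
    n = len(y_true)
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        total -= sum(yk * math.log(max(pk, eps)) for yk, pk in zip(y, y_hat))
    return total / n

labels = [[1, 0, 0], [0, 1, 0]]
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = cross_entropy(labels, probs)   # equals (-log 0.7 - log 0.8) / 2
```

With one-hot labels only the probability assigned to the true class contributes, so the loss falls toward zero as that probability approaches 1.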
To improve the training efficiency of the presented DenseNet-BLSTM and reduce the risk of the network parameters falling into a local optimum, a batch training method is used. Training is supervised, and the parameters are updated by the mini-batch gradient descent algorithm. The weight term $W$ and bias term $B$ of layer $l$ in DenseNet-BLSTM are updated as follows:
$$W^{(l)} \leftarrow W^{(l)} - \eta \frac{\partial L(X, Y)}{\partial W^{(l)}} \tag{14}$$
$$B^{(l)} \leftarrow B^{(l)} - \eta \frac{\partial L(X, Y)}{\partial B^{(l)}} \tag{15}$$
where $\eta$ represents the learning rate, $(X, Y)$ is the input training data, $X$ and $Y$ represent the sample sequences and the corresponding labels, respectively, and $L$ denotes the loss function.
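To show the shape of the mini-batch update rule, the sketch below trains a hypothetical one-feature logistic model instead of the full network. The model, data, learning rate, batch size, and the name `minibatch_gd` are assumptions chosen for illustration; each batch applies the weight and bias updates above to the mean binary cross-entropy of that batch.

```python
import math
import random

def minibatch_gd(X, Y, lr=0.5, batch_size=2, epochs=50, seed=0):
    """Fit p = sigmoid(w*x + b) by mini-batch gradient descent on cross-entropy."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)                           # reshuffle the samples every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            grad_w = grad_b = 0.0
            for k in batch:
                p = 1.0 / (1.0 + math.exp(-(w * X[k] + b)))
                grad_w += (p - Y[k]) * X[k]        # dL/dw for sigmoid + cross-entropy
                grad_b += p - Y[k]                 # dL/db
            w -= lr * grad_w / len(batch)          # W <- W - eta * dL/dW
            b -= lr * grad_b / len(batch)          # B <- B - eta * dL/dB
    return w, b

X = [-2.0, -1.0, 1.0, 2.0]
Y = [0, 0, 1, 1]
w, b = minibatch_gd(X, Y)
```

Averaging the gradient over each batch keeps the step size independent of the batch size, which is the usual convention for mini-batch training.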

General procedure of the proposed fault diagnosis framework
In this study, an improved fault diagnosis method based on DenseNet and BLSTM is proposed for bearing fault diagnosis under different conditions. The DenseNet-BLSTM network performs intelligent fault feature extraction and fusion, avoiding the incompleteness and subjectivity of manual feature extraction and selection. Fig. 5 shows the flowchart of the proposed method. The procedure is summarized below:
Step 1: The bearing vibration signals are sampled on the experimental platform, and data augmentation and standardization are applied for preprocessing.
Step 2: The DenseNet extracts and fuses the multi-level abstract features of the bearing vibration signal to obtain feature sequences.
Step 3: The BLSTM models the feature sequences in the forward and backward directions, fully capturing their time dependence.
Step 4: The fault diagnosis results are obtained by the fully connected layers and the Softmax classifier, and the accuracy is then calculated for experiments in a variety of scenarios.

Experiment and analysis
This section presents the experimental results, including a description of the data set, the data preprocessing, and the results of the experiments. The presented DenseNet-BLSTM is compared with four other algorithms.

Data set
The experimental data are provided by the Case Western Reserve University (CWRU) bearing data center [29]. The data set contains bearing vibration signals under different loads, fault locations, and damage levels. The CWRU bearing fault test platform is shown in Fig. 6. SKF 6205-2RS bearings are used in the experiment and are operated under four different loads: 0 HP, 1 HP, 2 HP, and 3 HP. Faults ranging from 0.007 inches to 0.021 inches in diameter were introduced separately at the inner raceway, rolling element (i.e., ball), and outer raceway. The vibration signal of the bearing is collected by an acceleration sensor at a sampling rate of 12 kHz. Fig. 7 shows the time-domain waveforms of the three fault signals with a 0.007-inch fault and of the normal signal at 0 HP load.

Data augmentation and standardization
Every 1024 consecutive data points of a vibration signal record are taken as one sample. To avoid over-fitting during training, an oversampling method is used to augment the data set: a sampling window of length 1024 is moved along the time axis in steps of 800 points, so adjacent samples overlap by 224 data points. The data processing procedure is shown in Fig. 8. The augmented data set contains 4000 samples, as shown in Tab. 1.
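The overlapping-window segmentation can be sketched as follows, using the window length of 1024 and step of 800 stated above. The record length of 10,000 points is hypothetical; how many windows were taken from each CWRU file to reach the samples in Tab. 1 is not reproduced here.

```python
def sliding_windows(signal, window=1024, step=800):
    """Split a 1-D signal into overlapping samples (overlap = window - step = 224 points)."""
    return [signal[s:s + window] for s in range(0, len(signal) - window + 1, step)]

def num_windows(length, window=1024, step=800):
    """Number of full windows: floor((length - window) / step) + 1, or 0 if too short."""
    return 0 if length < window else (length - window) // step + 1

record = list(range(10_000))          # stand-in for one vibration record
samples = sliding_windows(record)
print(len(samples), num_windows(len(record)))   # 12 12
```

Compared with non-overlapping 1024-point segments (9 from this record), the 800-point step yields more samples from the same data, which is the point of the augmentation.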
Additionally, to make the training process more stable, Z-Score standardization is used to map all features of the data to the same scale. The Z-Score standardization is expressed as
$$z = \frac{x - \bar{x}}{\sigma} \tag{16}$$
where $\bar{x}$ is the mean value of the data and $\sigma$ is the standard deviation.
After Z-Score standardization, the data have a mean of 0 and a standard deviation of 1.
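A sketch of Z-Score standardization follows, assuming it is applied to each sample with the population standard deviation; whether the statistics are computed per sample or over the whole data set is not stated, so this choice is an assumption. The toy values are illustrative.

```python
import statistics

def z_score(x):
    """z = (x - mean) / std, using the population standard deviation (an assumption)."""
    mu = statistics.mean(x)
    sigma = statistics.pstdev(x)
    if sigma == 0:
        raise ValueError("constant input cannot be standardized")
    return [(v - mu) / sigma for v in x]

z = z_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # mean 5, std 2
```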

Vibration signal sampling and preprocessing
Additionally, the training settings of the presented DenseNet-BLSTM are as follows: the number of epochs is 20; the data are randomly shuffled in each epoch and divided into mini-batches of 200 samples; the initial learning rate is 0.005 and is multiplied by 0.8 every 5 epochs; the Adam optimizer is used; and the execution environment is a single GPU.
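Interpreting the learning-rate rule as a step decay (the rate is multiplied by 0.8 after every 5 epochs, counting epochs from 0), the schedule over 20 epochs can be sketched as below. Deep learning frameworks usually provide an equivalent step-decay scheduler; the helper `learning_rate` here is a hypothetical stand-alone version.

```python
def learning_rate(epoch, initial=0.005, factor=0.8, every=5):
    """Step decay: the rate is multiplied by `factor` after every `every` epochs (0-based)."""
    return initial * factor ** (epoch // every)

schedule = [learning_rate(e) for e in range(20)]
# epochs 0-4: 0.005, 5-9: 0.004, 10-14: 0.0032, 15-19: 0.00256 (up to float rounding)
```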

Experiment in single load
To verify the effectiveness of the presented DenseNet-BLSTM in a single-load scenario, the data set is divided into four subsets under 0 HP, 1 HP, 2 HP, and 3 HP loads. Experiments are conducted with five methods: DBLSTM, CNN, BLSTM, LSTM, and CLSTM. To avoid accidental errors, each experiment is repeated 10 times, and the average accuracy is reported in Fig. 9 and Tab. 4.
The accuracies of LSTM and BLSTM are below 80%, much lower than those of DBLSTM, CLSTM, and CNN, which contain convolution kernels. The reason is that the LSTM and BLSTM structures have difficulty extracting abstract features and therefore express fault features insufficiently. DBLSTM, CLSTM, and CNN achieve accuracies above 97%, which reflects the advantage of convolution kernels in capturing abstract features. The presented DBLSTM extracts and fuses the multi-level abstract features of the signal and then models their temporal dependence. As a result, DBLSTM achieves the highest recognition accuracy, above 99%, which demonstrates the effectiveness and advantages of the presented method.

Experiment in multiple load
In actual applications, bearings usually operate under different load conditions. To verify the effectiveness of the presented DBLSTM under different loads, the data of the four load types are mixed for the experiments. The same five methods, DBLSTM, CNN, BLSTM, LSTM, and CLSTM, are then trained and tested on this mixed data set.
To verify the stability of the presented DBLSTM, the experiment was repeated 10 times, and the average accuracy and standard deviation are recorded in Tab. 5 and Fig. 10. The results show that the recognition accuracy of LSTM is 81.2%, and BLSTM performs better than LSTM with an accuracy of 83.8%. Meanwhile, the recognition accuracies of the networks with convolution kernel struc … has good stability. Compared with CNN and CLSTM, DBLSTM also shows the advantage of dense connections in feature transfer. Furthermore, compared with CNN, CLSTM shows the advantage of combining the convolution kernel structure with LSTM: feature extraction is more sufficient, and the training time is significantly reduced. Therefore, the presented method achieves the best fault diagnosis performance with relatively high learning efficiency.
The initial learning rate is an important parameter in training DBLSTM, because it strongly affects both training and fault recognition performance. Therefore, the influence of five initial learning rates (0.0005, 0.001, 0.005, 0.01, and 0.02) on the convergence speed of DBLSTM was explored. Fig. 11 shows the experimental results. DBLSTM performs best with a learning rate of 0.005, which is therefore used as the final experimental parameter. Fig. 12 shows the fault recognition confusion matrix of DBLSTM. The ordinate is the label predicted by the network, and the abscissa is the true label. The elements on the diagonal represent correct predictions, and the bottom row gives the per-class accuracy and its average. The presented DBLSTM achieves a fault recognition accuracy of over 99% in the multi-load scenario, which shows that it can accurately determine the location and severity of bearing faults and has good load adaptability.
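The layout of the confusion matrix described for Fig. 12, with predicted labels on the ordinate (rows) and true labels on the abscissa (columns), can be reproduced with a short sketch. The labels below are hypothetical and are not taken from the experiment.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[p][t] counts samples with predicted label p (row) and true label t (column)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[p][t] += 1
    return cm

def per_class_accuracy(cm):
    """Accuracy of each true class: diagonal entry divided by its column total."""
    n = len(cm)
    return [cm[k][k] / sum(cm[p][k] for p in range(n)) for k in range(n)]

y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
acc = per_class_accuracy(cm)                               # [1.0, 0.5, 1.0]
overall = sum(cm[k][k] for k in range(3)) / len(y_true)    # 7 / 8
```

Because true labels run along the columns in this layout, per-class accuracy is computed column-wise, which matches a bottom row of per-class values.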

Experiment with random interference
In real applications, irregular vibrations usually occur while bearings operate. To verify the effectiveness of the presented DBLSTM under such conditions, white Gaussian noise is added to the multi-load data set, and samples with different signal-to-noise ratios (SNRs) are obtained to simulate working conditions with different levels of interference. Fig. 13 shows the vibration signals after adding noise at 0 HP load. Compared with the original signal, the periodic impact component is obviously weakened by the noise, and as the SNR decreases, it becomes more difficult for the network to identify bearing faults.
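One common way to generate noisy samples at a target SNR is sketched below, assuming the standard power-ratio definition $\mathrm{SNR_{dB}} = 10\log_{10}(P_{signal}/P_{noise})$. The exact noise-generation procedure is not spelled out, and the 50 Hz test tone is only a stand-in for a CWRU record; `add_awgn` is a hypothetical helper.

```python
import math
import random

def add_awgn(signal, snr_db, seed=None):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) is about snr_db.

    P_signal is the mean squared amplitude; the noise std is sqrt(P_signal / 10**(snr_db/10)).
    """
    rng = random.Random(seed)
    p_signal = sum(v * v for v in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [v + rng.gauss(0.0, sigma) for v in signal]

clean = [math.sin(2 * math.pi * 50 * n / 12_000) for n in range(12_000)]  # 1 s at 12 kHz
noisy = add_awgn(clean, snr_db=10, seed=1)
noise = [a - b for a, b in zip(noisy, clean)]
measured_snr = 10 * math.log10(sum(v * v for v in clean) / sum(v * v for v in noise))
```

The measured SNR only approximates the target because the noise is random; with 12,000 points the deviation is typically a small fraction of a decibel.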
Fig. 14 shows the results of DBLSTM, CNN, BLSTM, LSTM, and CLSTM as the SNR increases from 10 dB to 30 dB. The presented DBLSTM still has the best performance under random interference. When the SNR exceeds 20 dB, its accuracy exceeds 99%, which is close to its recognition accuracy on the original signal. Therefore, the presented DBLSTM has good anti-interference ability.
As the SNR decreases, the recognition accuracy of all five methods also decreases. The models with convolution kernel structures (DBLSTM, CNN, and CLSTM) have better anti-interference ability than the two sequence models, BLSTM and LSTM, because the convolution and pooling structures extract more abstract features and thus reduce the impact of random interference. Overall, the presented DBLSTM shows the best fault diagnosis performance and good robustness under noise interference.

Conclusion
This paper presents an improved rolling bearing fault diagnosis method using DenseNet-BLSTM. It overcomes the incompleteness and subjectivity of manual feature extraction and selection and avoids the complicated signal processing of conventional bearing fault diagnosis methods. The main conclusions are as follows.
(i) Combining the advantages of DenseNet and BLSTM, an end-to-end bearing fault diagnosis method (DenseNet-BLSTM) is presented to fully extract the features of bearing vibration data.
(ii) The original vibration signal is taken as the input of DenseNet-BLSTM. The DenseNet, composed of densely connected convolutional layers, extracts and fuses the multi-level abstract features of the bearing vibration signal. Then, the BLSTM fully captures the time dependence of the features. Finally, the fully connected layers and Softmax classifier map the feature information to the corresponding fault modes.
(iii) Experiments in a variety of scenarios verify that the presented DenseNet-BLSTM can accurately diagnose the location and severity of bearing faults. The fault diagnosis accuracy reaches 99.5% in multi-load scenarios, and the presented method has good load adaptability and anti-interference ability.
(iv) Combining the presented method with other deep learning algorithms to reduce its time complexity and improve its performance on big data is an interesting topic. Developing a more robust and reliable network is another promising topic for future work.

Figure 5 The overall fault diagnosis framework based on DenseNet-BLSTM.

Figure 9 Results of the single-load experiment.

Figure 10 Accuracy distribution under multiple loads.

Figure 11

Figure 13 Bearing vibration signal diagram with noise.

Figure 14 Recognition accuracy on signals with different SNRs.

Table 1
Data set of experiment

Table 3
The parameters of each model

Table 4
Average accuracy in single load

Table 5
Experimental result at multiple load