A method for bearing fault diagnosis of mine hoist using convolutional attention autoencoder

In view of the complex operating environment and frequent faults of mine hoists, a fault diagnosis method based on a Convolutional Attention Autoencoder (CAAE) is proposed and supported by theoretical analysis and experimental verification, with the aim of improving the diagnostic stability of mine hoists under strong noise. First, a CAAE is constructed, which uses a combination of a convolutional neural network (CNN) and a channel attention module (CAM) to compress and encode the input signal; the input signal is then reconstructed by a decoder, so that training the CAAE teaches it to extract the fault features of the original signal. Then, a fault diagnosis classifier is constructed to classify different fault patterns. Finally, experimental validation is performed with the Case Western Reserve University bearing dataset. The results show that the method has a strong feature extraction capability and a high classification accuracy for bearing failure modes compared with existing methods. Experiments on the effect of the proposed method in noisy environments are also conducted.


Introduction
The mine hoist is a crucial piece of equipment connecting the surface and the underground, and its stable operation is directly related to the safe and efficient production of coal mines (He et al. 2016; Kou et al. 2020). Statistically, 70% of the failures of rotating machinery are caused by vibration (Jiang et al. 2013), 30% of which are caused by bearings (Gu et al. 2020). However, characteristics of vibration signals such as nonlinearity and non-stationarity pose great challenges for fault diagnosis. Therefore, it is important to investigate mine hoist bearing fault diagnosis methods suited to the special working environment of mines.
Since conventional machine learning methods are mostly shallow networks, they have difficulty handling signals with strong noise interference, which limits their ability to extract high-dimensional features (Qi et al. 2017). In recent years, fault diagnosis methods based on deep learning have received wide attention. The core of deep learning is feature learning, which aims to learn fault features from the input signal through high-level abstract modeling of the signal by deep networks. To effectively improve the signal-to-noise ratio (SNR) of vibration signals (Hamadache et al. 2017) and extract signal fault features, CNN (Eren et al. 2019) and Autoencoder (AE) (Chen et al. 2017; Ma et al. 2018) have been widely used for bearing fault diagnosis. Wu et al. proposed a fault diagnosis method that uses the powerful feature extraction ability of the one-dimensional convolutional neural network (1D-CNN) to extract nonlinear signal features. Liu et al. introduced a multi-task 1D convolutional neural network model that improves accuracy by using separate branches for different tasks to process the fault feature information at individual locations, and verified its performance by comparison with other networks (Liu et al. 2020). Wen et al. studied a fault diagnosis method with automatic fault feature extraction, which converts 1D signals into two-dimensional images and uses two-dimensional convolutional neural networks (2D-CNN) for feature extraction, eliminating the effects of incomplete manual feature extraction (Wen et al. 2016). Liu et al. proposed a CAE method for bearing fault diagnosis that uses a CNN to enhance the feature extraction capability of denoising autoencoders, and verified the effectiveness of the method with bearing and gearbox datasets. Che et al. extracted the fault features of the input signal using a denoising autoencoder and fed the features into a convolutional neural network model for fault mode diagnosis, which improved the diagnostic performance of the model in a noisy environment (Che et al. 2020). However, the above fault diagnosis methods cannot effectively extract the key features of a noisy input signal, which limits their application to equipment fault diagnosis in noisy environments.
Based on the above analysis, this paper presents a fault diagnosis method that combines a 1D convolutional autoencoder (CAE) with a channel attention mechanism. First, the attention mechanism is introduced into the CAE network, the key features of the signal are extracted by the attention module, and residual learning is introduced to retain the fault information of the input signal. Then, a deep learning classifier model is built to classify the extracted fault features and obtain the diagnostic results for the fault patterns. The results of the experimental comparison and analysis verify the good diagnostic performance and stability of the proposed method. The contributions of the paper are summarized as follows.
(1) A channel attention module based on the attention mechanism and residual theory is designed to enhance the ability of the overall model to extract the key features of the input signal.
(2) For the first time, a novel feature extraction method is built by combining the channel attention module with a convolutional autoencoder, which can extract the key fault features in the bearing vibration signal.
(3) The performance of the model in bearing fault diagnosis is further demonstrated by comparison with four other methods in a noisy environment.
The remainder of the paper is structured as follows. In Section II, the design of each module of the convolutional attention autoencoder network and the framework and process of the bearing fault diagnosis method are described in detail. Section III presents the data set processing and model evaluation criteria for the bearing fault diagnosis experiments. Section IV verifies the performance of the proposed method through experiments, with analysis and discussion. Finally, the work of this paper is summarized in Section V.

CAE Module
An autoencoder is an artificial neural network that learns an efficient representation of the input data by compressing the input into a hidden space and then reconstructing the input data using a decoder (Shao et al. 2017). Its basic structure is shown in Fig. 1.

Fig. 1 The structure of an autoencoder

Given the training samples $Q = [q_1, q_2, \cdots, q_m]$, where each $q_n = [q_1, q_2, \cdots, q_D]^T$ ($n \in [1, 2, \ldots, m]$), the encoder transforms the input $Q$ into the hidden space $H = [h_1, h_2, \cdots, h_m]$, where $h_n = [h_1, h_2, \cdots, h_D]^T$, by the following transformation:

$$h_n = f\left(W^{(1)} q_n + b^{(1)}\right)$$

where $f(\cdot)$ is the sigmoid activation function, and $W^{(1)}$ and $b^{(1)}$ are the weight matrix and bias, respectively. The decoder reconstructs $H$ into $V$, with the reconstructed output component

$$v_n = f\left(W^{(2)} h_n + b^{(2)}\right)$$

where $W^{(2)}$ and $b^{(2)}$ are the weight matrix and bias, respectively. The autoencoder is trained to obtain the optimized encoder weights and bias parameters. The mean squared error is usually used as the loss function of the autoencoder.
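The encode, decode and loss steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sizes `D`, `d_hidden` and `m`, the random weights, and the function names are hypothetical, and no training loop is shown.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation f(.) used by both encoder and decoder."""
    return 1.0 / (1.0 + np.exp(-x))

def autoencoder_forward(Q, W1, b1, W2, b2):
    """Encode the columns of Q into H, then decode H into the reconstruction V."""
    H = sigmoid(W1 @ Q + b1)  # encoder: hidden representation of each sample
    V = sigmoid(W2 @ H + b2)  # decoder: reconstruction of each sample
    return H, V

def mse_loss(Q, V):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((Q - V) ** 2))

# Hypothetical sizes: m samples of dimension D, hidden dimension d_hidden.
rng = np.random.default_rng(0)
D, d_hidden, m = 8, 3, 5
Q = rng.random((D, m))  # samples stored as columns, values in [0, 1)
W1 = rng.normal(scale=0.1, size=(d_hidden, D))
b1 = np.zeros((d_hidden, 1))
W2 = rng.normal(scale=0.1, size=(D, d_hidden))
b2 = np.zeros((D, 1))

H, V = autoencoder_forward(Q, W1, b1, W2, b2)
loss = mse_loss(Q, V)
```

Training would adjust `W1`, `b1`, `W2` and `b2` to reduce `loss`; after training, only the encoder output `H` is needed as the feature representation.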
A CAE combines a 1D-CNN with an AE, with both the encoder and the decoder built from convolutional layers. The basic framework of the convolutional autoencoder is shown in Fig. 2. The encoder uses convolution operations to compress the input signal into a specific spatial representation, which is then reconstructed into the input signal using the upsampling operation in the decoder.

Channel Attention Module
By mining the interdependencies among the channel mappings, the channel attention mechanism concentrates limited attention on the most informative content, thus saving resources and obtaining effective feature information (Fu et al. 2019).
During fault feature extraction from vibration signals, some features may be irrelevant to the fault information and may even lead to a wrong diagnosis. For this reason, this paper introduces a channel attention module (CAM) that selectively extracts useful channel information, suppresses useless channel information by mining the interdependencies between channel mappings, and adaptively learns fault-related features (Xu et al. 2021).

Fig. 3 Channel attention module structure
The structure of the CAM is shown in Fig. 3. $Y$ is the feature map output by the 1D convolution, $Y = [y_1, y_2, \cdots, y_C]$, $y_i \in R^{W \times 1}$, where $C$ is the number of channels, $i \in [1, 2, \cdots, C]$, and $W$ is the feature width. $Y$ is compressed by Global Average Pooling (GAP) to obtain the channel attention component $U = [u_1, u_2, \cdots, u_C]$, defined as

$$u_i = \frac{1}{W} \sum_{j=1}^{W} y_i(j)$$

The 1D excitation weights $Z$ ($Z = [z_1, z_2, \cdots, z_C]$) are learned by training with one-dimensional convolution and fully connected layers to activate each channel, and the formula is

$$Z = \delta\left(G\left(\sigma\left(F(U)\right)\right)\right)$$

where $F(\cdot)$ and $G(\cdot)$ are the one-dimensional convolution with channel number 1 and the fully connected operation with node number $C$, respectively; $\sigma$ and $\delta$ are the ReLU and Sigmoid functions, respectively. The attention to the critical channel domain is enhanced by multiplying the input $Y$ with the excitation weight $Z$:

$$\tilde{y}_i = z_i \cdot y_i$$

The idea of residuals is introduced to improve the discriminability of the features while preserving the original information, and the final channel features are obtained as

$$\hat{y}_i = y_i + \tilde{y}_i$$
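The channel attention computation described above (global average pooling, excitation weights, channel-wise scaling, residual connection) can be sketched in NumPy as follows. Several details are assumptions rather than the paper's configuration: the order of operations (convolution, ReLU, fully connected layer, sigmoid), the kernel size of 3, the random parameters, and all function and variable names. `np.convolve` also flips the kernel, unlike the cross-correlation used by most deep learning frameworks, which does not matter for an illustration with random weights.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(Y, conv_kernel, fc_weight, fc_bias):
    """Apply channel attention with a residual connection.

    Y is a (C, W) feature map: C channels, each of width W.
    Returns the (C, W) output features and the (C,) excitation weights Z.
    """
    U = Y.mean(axis=1)                                   # GAP: one value per channel
    conv_out = np.convolve(U, conv_kernel, mode="same")  # 1D convolution over the channel descriptor
    Z = sigmoid(fc_weight @ relu(conv_out) + fc_bias)    # fully connected layer with C nodes, weights in (0, 1)
    attended = Z[:, None] * Y                            # scale each channel by its weight
    return Y + attended, Z                               # residual: keep the original information

# Hypothetical sizes and randomly initialised (untrained) parameters.
rng = np.random.default_rng(0)
C, W = 4, 16
Y = rng.normal(size=(C, W))
conv_kernel = rng.normal(size=3)
fc_weight = rng.normal(size=(C, C))
fc_bias = np.zeros(C)

out, Z = channel_attention(Y, conv_kernel, fc_weight, fc_bias)
```

Because of the residual connection, each output channel equals the input channel multiplied by `1 + z_i`, so a channel is never suppressed below its original magnitude; the attention only changes the relative emphasis between channels.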

Convolutional Attention Autoencoder Module
In this paper, based on the feature extraction capability of the 1D-CNN on temporal signals and the ability of the channel attention mechanism to learn key features, the one-dimensional CAE is combined with the channel attention module to build a CAAE module that extracts the fault characteristics of the input signal. The structure of the CAAE module is shown in Fig. 4. The encoder consists of three one-dimensional convolutional layers and three attention modules; the input signal is compressed and encoded by these three convolutional attention stages to obtain the fault feature representation of the input signal. The decoder consists of three upsampling layers and three 1D convolutional layers, which reconstruct the encoded input signal and gradually recover the feature dimensionality of the input signal. The loss function measures the difference between the input signal and the reconstructed signal, and training minimizes this error.
The parameters of each layer of the CAAE network are configured as shown in Table 1. The input signal sample length is 2048; the encoder compresses the dimension to 256, and the number of channels in the hidden space is 64. The decoder then decodes and reconstructs the hidden representation into the input signal itself, thus achieving feature extraction of the input signal.
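The stated dimensions are consistent with each of the three encoder stages halving the signal length and each of the three upsampling layers doubling it. The sketch below only tracks shapes under that assumption; the factor of 2 per stage, the helper names, and the nearest-neighbour upsampling are illustrative choices, not the configuration of Table 1.

```python
import numpy as np

def stage_lengths(input_length, n_stages, factor=2):
    """Signal length after each downsampling stage (assumed factor per stage)."""
    lengths = [input_length]
    for _ in range(n_stages):
        if lengths[-1] % factor != 0:
            raise ValueError("length is not divisible by the downsampling factor")
        lengths.append(lengths[-1] // factor)
    return lengths

def upsample(x, factor=2):
    """Nearest-neighbour upsampling along the time axis of a (length, channels) array."""
    return np.repeat(x, factor, axis=0)

encoder = stage_lengths(2048, 3)       # [2048, 1024, 512, 256]
hidden = np.zeros((encoder[-1], 64))   # hidden space: length 256, 64 channels

# Three upsampling layers recover the original length. The channel count is
# left at 64 here; in a real decoder the convolutional layers would change it.
x = hidden
decoder = [x.shape[0]]
for _ in range(3):
    x = upsample(x)
    decoder.append(x.shape[0])         # 512, 1024, 2048
```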

Mine hoist fault diagnosis process
To achieve mine hoist fault diagnosis, a fault diagnosis method based on the convolutional attention autoencoder is proposed. The overall flow chart of the method is shown in Fig. 5, and the specific diagnosis steps are as follows.
Step1: Obtain the original vibration signal.
Step2: The original signal is segmented with a sliding window of length 2048 and step size 300 to obtain the data set, which is divided into a training set and a test set; the samples are then randomly shuffled.
Step3: The training and test sets are normalized by the formula

$$x^{*} = \frac{x - \mu}{\nu}$$

where $x$ is the input data, $\mu$ is the mean of the input data, and $\nu$ is the variance of the input data.
Step4: The training set is fed into the CAAE module to train the autoencoder, using an SGD optimizer with a binary cross-entropy loss function, a batch size of 64, and a total of 200 iterations.

Step5: The decoder part of the CAAE is removed, and the fault features obtained by the encoder are fed into a batch normalization layer.
Step6: The batch normalized features are fed into the GAP layer for global average pooling to reduce the number of training parameters and avoid overfitting.
Step7: The final classification is performed by a softmax classifier.
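The data preparation in Step2 and Step3 can be sketched as follows. The window length of 2048 and step of 300 come from the text; everything else is an assumption made for illustration: the synthetic stand-in signal, the function names, normalizing each sample separately, and dividing by the standard deviation (the usual z-score form) rather than the variance named in the text.

```python
import numpy as np

def sliding_windows(signal, window=2048, step=300):
    """Cut a 1D signal into overlapping windows; returns an (n, window) array."""
    signal = np.asarray(signal, dtype=float)
    n = (len(signal) - window) // step + 1
    if n <= 0:
        raise ValueError("signal is shorter than one window")
    # Row k holds the indices step*k, step*k + 1, ..., step*k + window - 1.
    idx = np.arange(window)[None, :] + step * np.arange(n)[:, None]
    return signal[idx]

def standardize(samples, eps=1e-12):
    """Zero-mean, unit-spread normalization of each sample (assumed per sample)."""
    mu = samples.mean(axis=1, keepdims=True)
    sd = samples.std(axis=1, keepdims=True)
    return (samples - mu) / (sd + eps)

rng = np.random.default_rng(0)
raw = rng.normal(size=12_000)       # stand-in for one second of data at 12 kHz
X = sliding_windows(raw)            # 34 windows of 2048 points each
X = X[rng.permutation(len(X))]      # random shuffling of the samples
Xn = standardize(X)
```

With a step of 300, consecutive windows overlap by 1748 points, which is how a limited recording yields several hundred samples per fault mode.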

Experimental Analysis
In this paper, the proposed CAAE mine hoist bearing fault diagnosis method is implemented with the TensorFlow framework in Python 3. When training the fault diagnosis model, the Adam optimizer is used with a learning rate of 0.001 and a batch size of 64.
The model performance was validated using publicly available rolling bearing data from Case Western Reserve University. The test platform is shown in Fig. 6 and consists of a motor, torque sensor, power test meter, and electronic controller, and involves three types of failures: inner race, outer race, and ball failures. The drive-end bearing, model SKF6205-RS JEM SKF, was used; faults were machined in the inner race, outer race and ball of the bearing by electric spark machining, with fault sizes of 0.1778 mm, 0.3556 mm and 0.5334 mm.

Fig. 6 Bearing test platform
The data used for the experiments cover 9 fault modes and the normal mode, for a total of 10 diagnosis modes, at a motor load of 1 HP, a sampling frequency of 12 kHz, and an approximate motor speed of 1772 r/min. The time-domain waveforms of the 10 modes are shown in Fig. 7. The training and test sets are sampled with a sliding window, where the window length is 2048 sample points and the step length is 300 sample points. Samples were collected for the 10 types of vibration data separately; 320 training samples were selected for each type, giving a total of 3200 training samples, and the test set contained 802 samples, as indicated in Table 2.
The diagnostic performance of the CAAE network is measured by three performance metrics: accuracy $Acc$, precision $Pre$ and recall $R$ (Zhang et al. 2020), which are defined as

$$Acc = \frac{TP + TN}{TP + FP + TN + FN}$$

$$Pre = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

where $TP$, $FP$, $TN$ and $FN$ are the number of true positive, false positive, true negative and false negative samples, respectively, and the accuracy, precision and recall range from 0 to 1. For a more comprehensive evaluation of the model performance, we also conducted stability experiments on the CAAE model. Gaussian white noise is added to the original signal, and the noise immunity of the CAAE model is verified by comparison experiments. The signal-to-noise ratio is an important index for evaluating signal noise, defined as

$$SNR = 10 \log_{10}\left(\frac{P_{signal}}{P_{noise}}\right)$$

where $P_{signal}$ and $P_{noise}$ are the power of the signal and the noise, respectively.
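For a multi-class problem such as the 10 diagnosis modes used here, these metrics are commonly computed from a confusion matrix. The sketch below is one way to do so, assuming macro-averaging of per-class precision and recall; the averaging scheme, the toy labels and the function names are assumptions, since the text does not specify them.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def macro_metrics(cm):
    """Overall accuracy plus macro-averaged precision and recall."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp   # belonging to the class but predicted as another
    accuracy = tp.sum() / cm.sum()
    precision = np.mean(tp / np.maximum(tp + fp, 1.0))  # guard against empty columns
    recall = np.mean(tp / np.maximum(tp + fn, 1.0))     # guard against empty rows
    return accuracy, precision, recall

# Toy example with three classes (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc, pre, rec = macro_metrics(cm)
```

In the toy example, four of six predictions are correct, so the accuracy is 4/6; the per-class precisions are 1/2, 2/3 and 1, and the per-class recalls are 1/2, 1 and 1/2.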

Experimental results
As shown in Fig. 8  From Fig. 9, it is observed that the error curve gradually decreases as the iterations increase and converges after a small number of iterations, indicating that the proposed method converges quickly and steadily.
The output features of the key layers in the CAAE fault diagnosis network are visualized by the t-SNE technique (Zheng et al. 2018), which displays the output features of the convolutional attention autoencoder module, the GAP layer and the classifier in 2D space. The distribution of the fault features in the 2D space can be clearly seen in Figs. 10 to 12. Fig. 10 shows that there are different degrees of overlap in the 2D visualization of the output features of the convolutional attention autoencoder module. In Fig. 11, the output features are better separated after the GAP layer. Finally, the visualization of the output features in Fig. 12 shows that the fault types are clearly separated. As the results indicate, the method proposed in this paper can effectively diagnose bearing failures.
To better understand the diagnostic accuracy for each failure mode, the confusion matrix was drawn, as shown in Fig. 13, where F1, F2, F3, F4

Experimental comparison
The encoder part of the CAAE network designed in this paper is implemented using convolutional channel attention modules (CCAM), each composed of stacked convolutional layers and a CAM. To verify the effect of the number of CCAMs on the performance of the model, four network structures are built: CCAM-1, CCAM-2, CCAM-3 and CCAM-4, where the trailing number is the number of CCAMs. Table 3 shows that the accuracy of the model increases with the number of CCAMs, which indicates that stacking modules can improve the performance of the fault diagnosis model. When the number of modules is increased to four, the diagnostic accuracy of the model remains basically unchanged, while the testing time of the model increases. Therefore, in this paper, the encoder part of the autoencoder is built using three CCAMs.
To verify the performance of the proposed method, it is compared with CAE, 1D-CNN, the convolutional attention mechanism (CA) and the 2D convolutional neural network (2D-CNN) on four metrics: accuracy, precision, recall and time.
As shown in Table 4, the accuracy, precision and recall of the proposed method are 99.63%, higher than those of the other four networks. Comparison with the CAE method shows that adding the channel attention mechanism helps extract the fault features of the input signal effectively, and comparison with the CA method shows that feature extraction through the autoencoder captures the features of the bearing fault signal more comprehensively. Comparison with the 1D-CNN model shows that a simple convolutional neural network can obtain high accuracy, but not as high as the proposed method. The accuracy of the 2D-CNN is 1.24% lower than that of the 1D-CNN, indicating that converting 1D signals to 2D signals for the 2D-CNN loses some useful fault-related information.
To verify the noise immunity of the CAAE, it was tested against CAE, 1D-CNN, CA and 2D-CNN at signal-to-noise ratios of -6 dB, 0 dB and 6 dB. The experimental results are shown in Table 5. The accuracy, precision and recall of the proposed method are better than those of the other methods at all three signal-to-noise ratios. An accuracy of 90.52% is still obtained when the signal-to-noise ratio is -6 dB, which indicates that the proposed method has stronger noise immunity than the other four methods. When the signal-to-noise ratio is 0 dB and 6 dB, the accuracy only decreases by 7.24%, which is the smallest decrease among the compared methods, further demonstrating the noise immunity of the proposed method. The accuracy, precision and recall of the presented method are better than 99% at a signal-to-noise ratio of 6 dB. In addition, the average test time for the three scenarios is 0.081 s, indicating the good real-time performance of the presented method.
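A noisy test signal at a prescribed signal-to-noise ratio can be generated by scaling Gaussian white noise to the required power. The sketch below shows one common way to do this for the three levels used above; the sinusoidal stand-in signal, the sample count and the function names are assumptions for illustration, not the paper's test data.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian white noise so that the result has the target SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)                 # average signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))  # noise power implied by the target SNR
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise, noise

def measure_snr_db(signal, noise):
    """Empirical SNR in dB from the signal and noise powers."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
t = np.arange(20480) / 12000.0                # time axis at a 12 kHz sampling rate
clean = np.sin(2.0 * np.pi * 100.0 * t)       # stand-in for a vibration signal

results = {}
for target in (-6.0, 0.0, 6.0):
    noisy, noise = add_white_noise(clean, target, rng)
    results[target] = measure_snr_db(clean, noise)  # close to the target, up to sampling error
```

At -6 dB the noise power is roughly four times the signal power, which is why this setting is the hardest of the three.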

Conclusion
In this paper, a mine hoist bearing fault diagnosis method based on a convolutional attention autoencoder is proposed, and a detailed theoretical analysis and comprehensive experimental verification are carried out. The main work and conclusions are as follows.
(1) A channel attention module is designed to extract the fault signal features, and the idea of residual learning is added to address the network degradation problem caused by deeper models, so that the network can learn the key features of the data more comprehensively.
(2) A convolutional autoencoder is built and combined with the channel attention module to form a convolutional attention autoencoder feature extraction module that extracts the fault features of the original input data, and a fault feature classifier is designed to classify the extracted features for diagnosis.
(3) The convolutional attention autoencoder fault diagnosis method has higher accuracy and noise immunity compared with other networks and can diagnose fault types in real time.
In future work, the network structure of the convolutional attention autoencoder will be optimized and its application to fault diagnosis will be investigated further.

Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.