Intelligent Fault Diagnosis Based on Dynamic Convolutional Depth Domain Adaptive Network

Abstract: Deep learning-based mechanical fault diagnosis methods have made great achievements. A high-performance neural network model requires sufficient labelled data for training to obtain accurate classification results. These results mainly depend on the assumption that the training and testing data are collected under the same environment and working conditions, so that the data share the same probability distribution. However, in practical scenarios, the training data and the testing data follow different distributions to some degree, and newly collected testing data are usually unlabeled. To solve these problems, a model based on transfer learning and domain adaptation is proposed to achieve efficient fault diagnosis under different data distributions. The proposed framework adapts the features extracted by multiple dynamic convolutional layers and uses correlation alignment (CORAL) to perform a non-linear transformation that aligns the second-order statistics of the two distributions, which greatly improves the accuracy of fault classification on the unlabeled target domain. Finally, experimental verification is carried out on two different datasets.


Hua-Feng Zhou, 1346677354@qq.com, Air Force Engineering University, Xi'an 710051, China

Introduction
With the application of deep learning in various research fields, data-driven methods based on end-to-end deep learning are widely used in rotating machinery health monitoring [1]. Machine learning models such as the artificial neural network (ANN), support vector machine (SVM), convolutional neural network (CNN), and autoencoder (AE) [2][3][4][5][6][7][8] have been applied successfully to machine fault diagnosis and have achieved remarkable results. Although intelligent fault diagnosis methods based on deep learning have been developed successfully, their first prerequisite is that the training data and test data are collected under the same environment and working conditions and that the data are labeled. In the actual operation of mechanical equipment, this assumption is almost impossible to satisfy. Rotating machines usually run under a variety of working conditions, and the environment and operating conditions, such as speed and load, change all the time. The data are therefore collected under different working conditions. Literature [9] confirms this point of view: the performance of intelligent machine fault diagnosis depends on how similar the distributions of the training data and testing data are. When the training data and testing data have different probability distributions, the diagnosis performance of a model trained on the training data degrades on the testing data [10]. Unsupervised transfer learning provides an effective way to solve these problems.
Transfer learning uses the knowledge learned in a labeled source domain to solve new but related tasks on unlabeled data (denoted as the target domain) when the source domain and the target domain have different probability distributions [11][12]. The model maintains good classification performance on the source domain while learning feature representations that are close to each other across domains, so that it also achieves the desired classification performance on the target domain [13]. Deep learning [14] is able to learn hierarchical representations of data, and the learned deep representations can serve as cross-domain invariant features. Transfer learning based on these cross-domain invariant features can effectively reduce the differences between the source domain and the target domain and align their distributions. Based on this idea, methods based on deep transfer learning [15][16][17][18][19][20] have been applied successfully to various tasks. Among them, domain adaptation, as a special form of transfer learning, has been widely used in fault diagnosis, and the learned diagnosis knowledge extends well to different mechanical conditions.
In the current literature, several intelligent fault diagnosis methods based on transfer learning have been proposed [21][22][23], and the desired diagnosis results have been obtained. Wen, L et al. [21] used a three-layer sparse autoencoder to extract features from the original data and used the maximum mean discrepancy to minimize the feature differences between the training data and the test data, so that the model achieved good diagnostic performance on the test data. Yang, B et al. [22] used a convolutional neural network (CNN) to extract transferable features from the original vibration data, and used domain adaptation and pseudo-label learning as regularization terms to constrain the CNN parameters, reducing the distribution differences and the distance between classes. Lu, WN et al. [23] proposed a domain adaptive algorithm that trains a classifier or regression model in the source domain and adapts it to different but related target domains.
It can be seen from the above literature that existing fault diagnosis methods mainly extract domain-invariant features by minimizing the maximum mean discrepancy (MMD) between the source domain and the target domain. MMD [24] measures the distance between the mean embeddings of two distributions in a reproducing kernel Hilbert space (RKHS). As a kernel method, MMD has some shortcomings, such as low global generalization ability, high sensitivity to kernel selection [25], and limited scalability to large-scale applications [26]. Moreover, the above papers only measure the distribution difference at a single layer of the network; they neither use the features extracted by the other layers of the convolutional neural network nor align their distributions. To give the network better diagnostic performance on both the labeled source domain and the unlabeled target domain, this paper aligns the probability distributions of the multi-layer fault features extracted by the convolutional neural network and proposes a novel framework based on a deep convolutional multi-layer domain adaptive network.
The domain adaptive network proposed in this paper is verified on datasets collected under different working conditions. The proposed method adapts well to transfer between different working conditions and performs fault classification effectively. The experimental results show that, compared with the deep model without domain adaptation, the proposed framework improves the recognition accuracy of the bearing working condition by more than 30%. Among the main innovations of this paper, dynamic convolution is applied to the corresponding network modules: an attention mechanism aggregates multiple convolution kernels, which improves the performance of the network structure at a small computational cost without increasing the depth or width of the network. The structure of this paper is as follows. Section 2 introduces the problem of domain adaptation in detail. Section 3 discusses the proposed method. Section 4 presents the experimental verification on different datasets under different working conditions. Finally, conclusions are drawn in Section 5.

Domain Adaptation
In order to better understand the problem of domain adaptation, a specific problem description is needed. The goal of transfer learning is to use the labelled source-domain data $D_s$ to learn a classifier that predicts the labels $y_t$ of the target domain $D_t$. Therefore, the core of transfer learning is to find the similarity and the domain-invariant features between the source domain and the target domain.
In this paper, the proposed model is trained using labeled data collected under one working condition of the mechanical equipment. The trained network should then recognize the health status of the equipment from unlabeled data collected under other working conditions, as shown in Figure 1; that is, this paper aims to perform transfer learning across different working conditions. Here $x_j$ represents a data sample of the target domain, which carries no label information. Besides, it is assumed that the source domain and the target domain have the same types of working conditions, which means the label categories remain the same. The main difference between the source domain and the target domain lies in the inconsistency of the probability distributions of the data under different working conditions. The network structure proposed in this paper measures the difference between the source domain and the target domain and reduces it, so that the deep learning framework learns domain-invariant features. As a result, it can deal effectively with unlabeled target domain data and achieve accurate fault classification. Learning domain-invariant features effectively is therefore the key to accomplishing domain adaptation tasks.

The Proposed Method
In this section, the model structure and the training process of the proposed method are introduced in detail. The deep convolutional multi-layer domain adaptive network proposed in this paper is mainly composed of two modules: a fault classification module and a domain adaptation module. The fault classification module automatically learns discriminative features of the potential faults and contributes to accurate prediction of the fault category. The main purpose of the domain adaptation module is to reduce the difference between the source domain data and the target domain data, so that the feature extractor extracts their common features. The domain adaptation module mainly contains a multi-layer difference-metric structure. It adopts correlation alignment (CORAL) [27] to perform a nonlinear transformation that aligns the second-order statistics of the two distributions and minimizes the differences between the source and target domains. Since only the covariance statistics are calculated, the network model is simpler and more efficient. The network structure is shown in Figure 2.

Fault classification module
The fault classification module is implemented by a one-dimensional convolutional neural network with a data input layer, four dynamic convolutional layers, four pooling layers, three fully connected layers, and a softmax output layer that serves as the health state classifier of the network.
The original vibration signal is converted into a frequency-domain signal by the FFT, and the converted signal, of length $L$, is used as the input of the network. Since the transformed frequency-domain signal is one-dimensional, a one-dimensional convolutional neural network is used. To extract the fault characteristics better, this article increases the representation capacity of the model and improves the diagnosis performance without increasing the depth or width of the network by using dynamic convolution, which is described in detail below.
Dynamic convolution was proposed by Chen et al. [28]. It has $K$ convolution kernels that share the same kernel size and input/output dimensions and are aggregated using the attention weights $\pi_k(x)$. Consistent with the classic convolutional neural network design, batch normalization and an activation function (such as ReLU) follow the aggregated convolution to form a dynamic convolutional layer. Its structure is shown in Figure 3, and the calculation formula is given in eq (1):

Figure 3 The structure of dynamic convolution
To compute the attention weights, an attention module is introduced into the dynamic convolution, as shown in Figure 3. First, the dimensionality of the data is reduced by an average pooling layer; then two fully connected layers (with a ReLU activation function between them) and a softmax generate the normalized attention weights $\pi_k(x)$ for the $K$ convolution kernels. Finally, the dynamic convolution is formed by kernel aggregation and replaces the classic convolution.
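The attention and aggregation steps can be illustrated with a small NumPy sketch. In the formulation of Chen et al. [28], the layer output is $y = g(\tilde{W}^T(x)\,x + \tilde{b}(x))$ with $\tilde{W}(x) = \sum_k \pi_k(x)\tilde{W}_k$, $\tilde{b}(x) = \sum_k \pi_k(x)\tilde{b}_k$, $0 \le \pi_k(x) \le 1$ and $\sum_k \pi_k(x) = 1$. The function name, the array shapes, the hidden width, and the random weights below are assumptions made for illustration only; batch normalization is omitted and this is not the network configuration used in this paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def dynamic_conv1d(x, kernels, biases, w1, w2):
    """Hypothetical 1-D dynamic convolution (illustrative shapes).

    x:       (C_in, L) input signal
    kernels: (K, C_out, C_in, ksize) the K parallel kernels
    biases:  (K, C_out)
    w1, w2:  weights of the two fully connected attention layers
    """
    # Attention branch: average pooling -> FC -> ReLU -> FC -> softmax.
    pooled = x.mean(axis=1)                       # (C_in,)
    hidden = np.maximum(w1 @ pooled, 0.0)         # (hidden,)
    pi = softmax(w2 @ hidden)                     # (K,), non-negative, sums to 1

    # Kernel aggregation: one input-dependent kernel and bias.
    w = np.tensordot(pi, kernels, axes=1)         # (C_out, C_in, ksize)
    b = pi @ biases                               # (C_out,)

    # Plain "valid" 1-D convolution with the aggregated kernel.
    c_out, _, ksize = w.shape
    out_len = x.shape[1] - ksize + 1
    y = np.empty((c_out, out_len))
    for t in range(out_len):
        window = x[:, t:t + ksize]                # (C_in, ksize)
        y[:, t] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return np.maximum(y, 0.0), pi                 # ReLU activation

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 512))                     # stand-in for one spectrum
kernels = rng.normal(size=(4, 8, 1, 5))           # K=4 kernels, 8 output channels
biases = rng.normal(size=(4, 8))
w1 = rng.normal(size=(4, 1))
w2 = rng.normal(size=(4, 4))
y, pi = dynamic_conv1d(x, kernels, biases, w1, w2)
```

Because the kernels are mixed before the convolution is applied, the layer performs a single convolution per input regardless of $K$, which is why the extra cost over a classic convolution stays small.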
Later experiments show that dynamic convolution greatly improves the network performance, including the recognition accuracy, without increasing the depth or width of the network.

Domain adaptation module
The domain adaptation module mainly includes a multi-layer difference-metric structure. It uses correlation alignment (CORAL) to perform a nonlinear transformation that aligns the second-order statistics of the two distributions, minimizing the difference between data collected under different working conditions. This enables the convolutional neural network to extract domain-invariant fault features. Given the training samples of the source domain and the unlabeled samples of the target domain, the domain adaptation model proposed in this paper measures the covariance at each adapted layer, computed from the features of the corresponding layer in the source domain and the target domain, respectively.
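As a concrete reference point, the CORAL loss in the Deep CORAL formulation [27] is $\ell_{CORAL} = \frac{1}{4d^2}\lVert C_S - C_T\rVert_F^2$, where $C_S$ and $C_T$ are the feature covariance matrices of a source batch and a target batch and $d$ is the feature dimension. The sketch below assumes that the per-layer loss in this paper follows that definition; the function names and the random features are illustrative.

```python
import numpy as np

def covariance(feats):
    """Unbiased covariance of a (n_samples, d) feature batch."""
    n = feats.shape[0]
    centered = feats - feats.mean(axis=0, keepdims=True)
    return centered.T @ centered / (n - 1)

def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1/(4 d^2)."""
    d = source.shape[1]
    diff = covariance(source) - covariance(target)
    return np.sum(diff ** 2) / (4.0 * d * d)

rng = np.random.default_rng(0)
src = rng.normal(size=(32, 6))          # stand-in features of one layer, source batch
tgt = 3.0 * rng.normal(size=(32, 6))    # same layer, target batch with a larger spread
loss_same = coral_loss(src, src)        # identical batches give zero loss
loss_shift = coral_loss(src, tgt)       # mismatched covariances give a positive loss
```

In a multi-layer setting, this function would be evaluated once per adapted layer on the features that layer produces for the two domains.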

Optimization objective
The depth domain adaptive network proposed in this paper has the following two optimization objectives:
1) Minimizing the classification loss of the fault category on the source domain data set.
In order for the deep domain adaptive network to recognize the machine health status category correctly and extract domain-invariant features, the classification loss of the source domain data must be calculated during training. For a data set with $K$ fault categories, the loss can be defined as the standard softmax classification loss, as shown in eq (4), where $m$ is the batch size of training samples, $K$ is the number of fault categories, and $I(\cdot)$ is the indicator function.
2) Minimizing the difference in the second-order statistics of the multi-layer source and target domains.
To enable the one-dimensional convolutional neural network to extract domain-invariant features under different working conditions and reduce the differences in data distribution between them, correlation alignment is adopted to reduce the difference between the second-order statistics (covariances) of the features of the three fully connected layers in the source domain and the target domain. The detailed calculation is given in equations (2) and (3) above. Using the chain rule, the gradient with respect to the input features can be calculated as in eq (5), which shows that the loss on the second-order statistics between the source domain and the target domain can be back-propagated.
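The back-propagation claim can be checked numerically. For the Deep CORAL loss [27], the gradient with respect to the source features $D_S$ is $\frac{1}{d^2(n_S-1)}\,H D_S (C_S - C_T)$, where $H$ centers the batch. The sketch below, which assumes that loss definition, compares this analytic gradient with central finite differences; all names and the random inputs are illustrative.

```python
import numpy as np

def cov(feats):
    centered = feats - feats.mean(axis=0, keepdims=True)
    return centered.T @ centered / (feats.shape[0] - 1)

def coral_loss(src, tgt):
    d = src.shape[1]
    return np.sum((cov(src) - cov(tgt)) ** 2) / (4.0 * d * d)

def coral_grad_source(src, tgt):
    """Analytic gradient of the CORAL loss with respect to the source features."""
    n, d = src.shape
    centered = src - src.mean(axis=0, keepdims=True)      # H @ D_S
    return centered @ (cov(src) - cov(tgt)) / (d * d * (n - 1))

def numeric_grad(src, tgt, eps=1e-6):
    """Central finite differences, one feature entry at a time."""
    grad = np.zeros_like(src)
    for idx in np.ndindex(*src.shape):
        plus, minus = src.copy(), src.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (coral_loss(plus, tgt) - coral_loss(minus, tgt)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
src = rng.normal(size=(6, 3))
tgt = rng.normal(size=(8, 3))
analytic = coral_grad_source(src, tgt)
numeric = numeric_grad(src, tgt)
```

Agreement between the two arrays indicates that the covariance-based loss is differentiable in the features and can therefore be trained jointly with the classification loss.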
Combining the two losses above, the classification loss and the CORAL loss are minimized jointly during training to update the network parameters, which both ensures the classification accuracy of the model and improves its generalization performance on the target domain. The total loss is defined mathematically in eq (6), where $t$ is the number of layers whose CORAL loss is adapted in the proposed framework, each adapted layer $i$ has its own loss weight, and the classification loss has a separate weight.
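A minimal sketch of the joint objective is given below, assuming the total loss is a weighted sum of the softmax classification loss and the per-layer CORAL losses as described above. The function names, the weight values, and the per-layer CORAL values are placeholders chosen for illustration, not quantities from the experiments.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax classification loss.

    logits: (m, K) network outputs for a batch of m source samples
    labels: (m,) integer fault categories in [0, K)
    """
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    m = logits.shape[0]
    # The indicator function keeps only the log-probability of the true class.
    return -log_probs[np.arange(m), labels].mean()

def total_loss(class_loss, coral_losses, class_weight, coral_weights):
    """Weighted classification loss plus weighted per-layer CORAL losses."""
    adapted = sum(w * c for w, c in zip(coral_weights, coral_losses))
    return class_weight * class_loss + adapted

logits = np.zeros((4, 10))                      # uniform prediction over K = 10 classes
labels = np.array([0, 3, 5, 9])
ce = softmax_cross_entropy(logits, labels)      # equals ln(10) for uniform logits
coral_per_layer = [0.5, 0.25, 0.25]             # placeholder values for three adapted layers
total = total_loss(ce, coral_per_layer, 1.0, [1.0, 1.0, 1.0])
```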
The training process needs to keep a balance between the classification loss and the CORAL loss in order to give full play to the network performance. During the experiments, it was found that the weight of the loss at each layer has a considerable impact on the performance of the network. Therefore, this paper dynamically adjusts the weight parameters of the losses according to the proportion of each layer's loss in the total loss. In this way, a weight is allocated to each layer so that the network achieves better diagnostic performance on the target domain. The hyperparameters are adjusted dynamically using eq (7)(8), where $CORAL_i$ is the calculated difference between the second-order statistics of the source domain and the target domain at layer $i$.
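One possible reading of this proportional weighting is sketched below. It is an assumption and not necessarily the exact rule of eq (7)(8): each adapted layer receives a weight equal to its share of the summed CORAL losses, so that layers with a larger measured discrepancy are emphasized. The function name and the input values are hypothetical.

```python
def proportional_weights(coral_losses, eps=1e-12):
    """Assumed rule: weight_i = CORAL_i / sum_j CORAL_j.

    Falls back to uniform weights when every layer loss is (near) zero.
    """
    total = sum(coral_losses)
    if total <= eps:
        n = len(coral_losses)
        return [1.0 / n] * n
    return [loss / total for loss in coral_losses]

# Placeholder per-layer CORAL values for three adapted layers.
weights = proportional_weights([0.2, 0.6, 0.2])
uniform = proportional_weights([0.0, 0.0, 0.0])
```

Recomputing the weights at every training step keeps them tied to the current discrepancy at each layer rather than to a fixed hand-tuned value.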
The objective loss function to be minimized is shown in eq (9), where the parameters with subscripts $f$ and $e$ are the network weight parameters to be updated for the fault classification module and the domain adaptation module, respectively.
Next, an optimization algorithm is needed to update the weight parameters of the network model. For the network structure proposed in this paper, the batch gradient descent algorithm is used.
In general, after the labelled source domain data and the unlabeled target domain data are fed into the network, the network parameters are updated according to the above optimization objectives. After training is completed, the network has learned the domain-invariant features of the source domain and the target domain, which gives it a strong domain adaptation ability and the desired fault recognition performance on the unlabeled target domain.

Experimental verification and results analysis
To verify the effectiveness of the proposed method, two different datasets are used: the dataset from Case Western Reserve University (CWRU) [29] and the dataset from Southeast University (SEU) [30]. Comparisons with other domain adaptive methods are also conducted to further illustrate the effectiveness of the proposed framework.
The same preprocessing is applied to all data in this paper. Data preprocessing and data segmentation are two important aspects that influence the performance of intelligent fault diagnosis [31]. The measured vibration signals are used as the direct input of the model. No data augmentation is used when segmenting the samples, and the time-domain signal of each sample has a length of 1024. The time-domain samples are transformed to the frequency domain through the FFT; because the spectral coefficients are symmetric, each sample then has a length of 512. Z-score standardization, which subtracts the mean and divides by the standard deviation, is applied to the FFT-transformed data to keep the input values within a certain range, as expressed in eq (10). To achieve the desired fault diagnosis performance and avoid test leakage, test samples must not be used during training. The dataset is therefore divided into a training set and a test set as shown in Figure 4: 80% of the data is used as the training set and 20% as the test set, and the number of samples of each class in each set is balanced.
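The preprocessing chain can be sketched as follows. Several details are assumptions, because the text does not fix them: the magnitude of the FFT is used, the z-score is computed per sample, segments do not overlap, and the split is drawn at random within each class. The function names and the random stand-in signal are illustrative.

```python
import numpy as np

def segment(signal, length=1024):
    """Cut a 1-D signal into non-overlapping samples (no augmentation)."""
    n = len(signal) // length
    return signal[: n * length].reshape(n, length)

def to_spectrum(samples):
    """Magnitude spectrum; keep the first half because of spectral symmetry."""
    spec = np.abs(np.fft.fft(samples, axis=1))
    return spec[:, : samples.shape[1] // 2]

def zscore(samples):
    """Per-sample standardization: (x - mean) / std."""
    mean = samples.mean(axis=1, keepdims=True)
    std = samples.std(axis=1, keepdims=True)
    return (samples - mean) / std

def balanced_split(labels, train_frac=0.8, seed=0):
    """Split indices class by class so both sets stay class-balanced."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)

rng = np.random.default_rng(0)
signal = rng.normal(size=10 * 1024)             # stand-in for a vibration record
x = zscore(to_spectrum(segment(signal)))        # shape (10, 512)
labels = np.repeat(np.arange(2), 5)             # two placeholder classes, 5 samples each
train_idx, test_idx = balanced_split(labels)
```

Splitting before any model fitting, and never touching `test_idx` during training, is what prevents the test leakage mentioned above.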

CWRU data set introduction
The CWRU bearing data set is collected from the bearing experiment platform provided by Case Western Reserve University (CWRU) [29]. This paper uses the drive-end fault data sampled at 12 kHz. There are ten bearing health conditions: one healthy state (NA) and three fault types, namely inner ring fault (IF), ball fault (BF), and outer ring fault (OF), each with three fault sizes, which gives nine fault states. Detailed information is shown in Table 1. In addition, the CWRU data set consists of four motor load groups corresponding to four different speeds. This paper treats these working conditions as the transfer tasks 0, 1, 2, and 3 shown in Table 3:

The experiment results of CWRU
The network model proposed in this paper is trained and tested on the data set divided as described above, and performs domain adaptation tasks between the working conditions 0HP, 1HP, 2HP, and 3HP. It is trained with labeled source domain data. Since the accuracy on the source domain test set reaches 100%, only the performance on the target domain test set is discussed here. The experimental results of the proposed domain adaptive method on the 12 different transfer tasks are shown in Figure 5.

Figure 5 Accuracy under different transfer tasks
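The count of 12 transfer tasks follows from taking every ordered pair of distinct working conditions as a (source, target) pair, which gives 4 × 3 = 12. A short sketch, with the task names written in the "source-target" form used in this paper:

```python
from itertools import permutations

conditions = [0, 1, 2, 3]   # 0HP, 1HP, 2HP, 3HP

# Every ordered (source, target) pair with source != target is one task.
tasks = [f"{src}-{tgt}" for src, tgt in permutations(conditions, 2)]
```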
In addition, we compared the method proposed in this paper with four existing domain adaptation methods: MK-MMD and joint maximum mean discrepancy (JMMD), which measure a distribution distance, and the adversarial methods DANN and CDAN [31]. MK-MMD uses multiple kernels to enhance MMD, which yields a principled way of selecting the optimal kernel and improves the transferability of the feature representation [32]. The JMMD criterion learns domain-invariant features by aligning the joint distribution of multiple domain-specific layers across domains [33]. The DANN structure includes a deep feature extractor and a label predictor, which together form a standard feedforward structure. A domain classifier is connected to the feature extractor through a gradient reversal layer, which multiplies the gradient by a negative constant during backpropagation, to realize unsupervised domain adaptation. By minimizing the label prediction loss (for the source domain) and the domain classification loss (for all examples), domain-invariant features can be obtained [34]. The conditional domain adversarial network (CDAN) embeds conditional adversarial training in the deep network to learn transferable features and realizes discriminative adversarial adaptation in multi-modal domains. The accuracy of these methods under the different domain adaptation tasks is shown in Figure 6. The experimental results confirm the effectiveness and applicability of the method proposed in this article: it performs well in the different domain adaptation tasks, and its accuracy also compares favorably with that of the other transfer learning methods. Therefore, the domain adaptive method proposed in this paper has certain advantages.

Experiments on SEU Datasets
Since the CWRU data set is relatively simple [31], in order to further illustrate the effectiveness of the proposed method, we choose to conduct experiments on the more complex SEU data set.

SEU data set introduction
The Southeast University (SEU) dataset is a gearbox dataset provided by Southeast University in China [30]. The data set consists of two sub-data sets, the bearing data set and the gear data set, both collected from the powertrain dynamic simulator. During the experiment, the data are collected through eight channels. In this article, we use the vibration data from channel 2. The data set has two working conditions defined by the rotation speed and load configuration, 20Hz-0V and 30Hz-2V, which are treated as two different tasks. There are 10 health categories in the data set under each working condition, as shown in Table 4, and the two working conditions are denoted 0 and 1.

The experiment results of SEU
Since the network is trained with labeled source domain data and the accuracy on the source domain test set reaches 100%, only the performance on the target domain test set is discussed here. The network is trained for 100 epochs and tested on the test set of the target domain; the results are shown in Figure 7 and Figure 8. The confusion matrices of the failure mode classification on the target domain test set are shown in Figure 9 and Figure 10. The experimental results show that the accuracy of the domain adaptation method proposed in this paper is significantly higher than that of the method without transfer, especially for the 0-1 transfer task, where the accuracy on the target domain test set is improved by more than 30%. This demonstrates the effectiveness of the method proposed in this paper. The confusion matrices show the desired classification performance for most categories. The limitations mainly lie in the classification of tooth root faults and gear surface faults, while the overall classification performance of the model remains superior.
To reduce the influence of randomness, 20 experiments were carried out on each of the domain adaptation tasks 0-1 and 1-0. The training process of each experiment was verified on the validation set, and the accuracy over the 20 experiments was then calculated. The method is compared with five other domain adaptive methods, namely single-layer CORAL, MK-MMD, JMMD, DANN and CDAN. The confidence intervals of the various domain adaptive methods at the 95% confidence level are shown in Table 5. For the 0-1 and 1-0 transfer tasks, the box plots comparing the different transfer methods are shown in Figure 11 and Figure 12. These comparative experiments show that the method proposed in this paper has certain advantages over the other domain adaptation methods. Its fault classification accuracy exceeds 90% on the different transfer tasks, which shows the desired applicability across transfer tasks.
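A 95% confidence interval over 20 repeated runs can be computed as sketched below. The sketch assumes a two-sided Student-t interval on the mean accuracy, which the text does not state explicitly, and the accuracy list is a synthetic placeholder rather than a result from this paper.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 20 - 1 = 19 degrees of freedom.
T_CRIT_95_DF19 = 2.093

def confidence_interval(accuracies, t_crit):
    """Mean +/- t_crit * standard error of the mean."""
    mean = statistics.mean(accuracies)
    sem = statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean - t_crit * sem, mean + t_crit * sem

# Synthetic placeholder accuracies for 20 runs (not measured values).
runs = [0.90 + 0.005 * (i % 5) for i in range(20)]
low, high = confidence_interval(runs, T_CRIT_95_DF19)
```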

Feature visualization analysis
To further confirm the superiority of the method proposed in this paper and to understand visually the impact of transfer learning on the features of the target domain, t-distributed stochastic neighbor embedding (t-SNE) is used to map the high-dimensional features to a two-dimensional space. The results of the different domain adaptive methods on the 0-1 transfer task of the Southeast University (SEU) dataset are visualized, and the distribution of the categories is shown in Figure 13. The feature visualization shows that, for the domain adaptive method proposed in this article, the features are more clearly separable in the target domain and the distance between categories is larger, which makes the network model less likely to misclassify fault categories. Its fault recognition accuracy in the target domain is therefore higher.

Transfer ratio analysis
On the other hand, some existing unsupervised domain adaptive methods are prone to negative transfer. In order to compare the transfer effects in each class, this paper uses the convolutional neural network without domain adaptive algorithm as a benchmark to calculate the transfer ratio of each class. The calculation method is shown in eq (11).
The transfer ratio is computed from three quantities: the number of samples of the class that the domain adaptive method predicts correctly, the number of samples of the class that the CNN predicts correctly, and the total number of samples of the class. The transfer ratio of each class under the different domain adaptive methods is shown in Figure 14. The per-class transfer ratios show clearly that the transfer method proposed in this paper causes no negative transfer in any class. Its transfer ratio is also higher than that of the other adaptive methods in most classes, which further illustrates the advantages of the method proposed in this paper.

Conclusion
This paper applies unsupervised transfer learning to the field of intelligent mechanical fault diagnosis and proposes a novel model for domain adaptive fault diagnosis under different working conditions. We verify the effectiveness of the proposed method on two different datasets, which contain a total of fourteen transfer tasks without labels in the target domain. According to the experimental results, on these domain adaptation tasks the method proposed in this paper achieves better results than the current domain adaptation methods, which demonstrates its effectiveness and superiority.