CvT Fault Diagnosis Method of Manifold Sensitive Modal Matrix Under Variable Speed

Rotating machinery (RM), which includes gearboxes, bearings, motors, generators, etc., is among the most common types of mechanical equipment in engineering applications and plays a broad and vital role. In industrial production, rotating machinery often runs at variable speed under complex working conditions, which leads to unstable vibration characteristics, and its fault diagnosis has therefore become a research hotspot. Aiming at the multi-classification problem of rotating machinery with variable speed and complex working conditions, this paper proposes a fault diagnosis method based on an improved sensitive mode matrix (ISMM), isometric mapping (ISOMAP) and the Convolution-Vision Transformer (CvT) network. After overlapping sampling of the variable speed signals, a high-dimensional ISMM is constructed and mapped into the manifold space through ISOMAP manifold learning. This step extracts the fault transient characteristics of the variable speed signal, and experiments show that it overcomes the inability of conventional methods to effectively extract features from variable speed data. Because CvT combines the advantages of the self-attention mechanism and of convolution in CNNs, the CvT network is used for feature extraction and for fault recognition and classification. It takes both global and local feature extraction into account, which greatly reduces the number of training iterations and the size of the network model. Two data sets (the laboratory HFXZ-I planetary gearbox variable speed data set and the public bearing variable speed data set of the University of Ottawa in Canada) are used to experimentally verify the proposed fault diagnosis model. Experimental results show that the proposed fault diagnosis model has good recognition accuracy and robustness.


Introduction
Rotating machinery is an important component widely used in mechanical transmission. Under its low speed, high load and other conditions, the probability of failure in key parts is very high, so its fault diagnosis plays an extremely important role in mechanical transmission. A large number of experts and scholars have carried out research on the fault diagnosis of rotating machinery. The research results can be roughly divided into the following three categories.
In the first category, some scholars use traditional signal processing methods for fault diagnosis, such as signal time-frequency characteristics, signal decomposition and noise reduction processing methods. R.Q.Yan [1] et al. presented an overview of wavelet-based fault diagnosis for rotating machinery, sorting out the Continuous Wavelet Transform (CWT), Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT) and second-generation wavelet transform (SGWT) methods in fault diagnosis. Y.G.Lei [2] et al. reviewed various improved algorithms for Empirical Mode Decomposition (EMD) in the field of fault diagnosis and discussed the shortcomings of EMD. K.Dragomiretskiy [3] et al. proposed Variational Mode Decomposition (VMD), which has since been applied in this field. P.Borghesani [4] et al. proposed the use of the squared envelope spectrum for fault diagnosis of variable speed signals. C.Mishra [5] et al. proposed the use of synchronous averaging of angular velocity and a wavelet denoising method for the fault diagnosis of variable speed signals. The above-mentioned methods require a large amount of signal processing experience and human judgment to extract effective features, and the accuracy of fault diagnosis depends heavily on feature extraction. They are suitable for fault diagnosis in specific situations and are not very robust.
In the second category, some scholars use feature extraction and traditional machine learning algorithms. Commonly used machine learning methods include the BP neural network, Support Vector Machine (SVM), Relevance Vector Machine (RVM), Hidden Markov Model (HMM), Random Forest (RF), dimensionality reduction algorithms, manifold learning, etc. H.X.Cui [6] et al. combined the extraction of signal information entropy features with an optimized SVM for fault diagnosis research. C.He [7] et al. combined the wavelet packet transform with an improved Fisher feature extraction method and then used multiple RVMs for the classification task. L.J.Wan [8] et al. proposed a method for rapid parallel construction of a decision tree similarity matrix based on Spark and an optimized sub-forest IRF algorithm to realize bearing fault diagnosis. Y.H.Cheng [9] et al. proposed fault diagnostics of rolling bearings using feature fusion based on BP, RBF and PNN neural networks. Y.K.Gu [10] et al. proposed a fault diagnosis method for rolling bearings using principal component analysis and a support vector machine. Y.Wu [11] et al. proposed fault diagnosis for industrial robots based on a combined approach of manifold learning, treelet transform and Naive Bayes. Z.Zhuang [12] et al. proposed fault detection of high-speed train wheelset bearings based on the impulse-envelope manifold. On the one hand, this kind of fault diagnosis model is too dependent on the results of feature extraction; on the other hand, the quality of the diagnosis is determined by the feature extraction results and the tuning of the machine learning parameters. As a result, such methods have poor generalization ability for data sets in different states.
In the third category, some scholars use automatic feature extraction methods such as deep learning (DL) [13]. With the rapid development of computers and sensors, fault diagnosis has also entered the era of big data. How to use the advantages of big data to make fault diagnosis more efficient, more accurate and compliant with engineering standards has become a hot topic of current research. Since its advent, deep learning has proven itself in the field of computer vision. Since the advent of one-dimensional convolutional neural networks (1D-CNN) [14], deep learning has been widely used in the field of fault diagnosis. In the current era of big data, deep learning models are trained on a large amount of data to simulate the structure of the human brain. The multi-layer network structure enables the deep learning model to adaptively extract high-dimensional features and improve the accuracy of fault diagnosis and recognition. Popular deep learning models include the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Generative Adversarial Network (GAN), Transfer Learning, Transformer Network, etc. C.Wu [15] et al. proposed intelligent fault diagnosis of rotating machinery based on a one-dimensional convolutional neural network. L.Wen [16,17]
Among the many fault diagnosis techniques mentioned above, most of the research objects of fault diagnosis [23][24][25][26] are data collected under experimental conditions with a constant speed, ignoring complex working conditions. In the real industrial production environment, the superposition of multiple working conditions and variable speed of rotating machinery is very common. When performing fault diagnosis on variable speed signals, the accuracy of most diagnosis methods cannot meet industrial requirements. Therefore, it is extremely important to study the fault diagnosis of rotating machinery under variable speed and complex working conditions.
This paper proposes a multi-class fault diagnosis method for rotating machinery with variable speed and complex working conditions. Firstly, Intrinsic Mode Functions (IMFs) with different noise amplitudes are obtained from the variable speed signal of the rotating machinery through EMD, and the time and squared envelope spectral kurtosis (TSESK) of each IMF is proposed as the screening criterion. The IMF with the largest TSESK is selected as the sensitive mode, and the improved sensitive mode matrix (ISMM) is constructed by superimposing the white noise signal and the sensitive mode. Secondly, the high-dimensional ISMM is mapped to the manifold space through isometric mapping (ISOMAP) to obtain the transient characteristics of mechanical faults, which are then transformed into 2D heat maps to form a data set. Finally, the data set is used to train the Convolution-Vision Transformer (CvT) network model, which yields the fault diagnosis and recognition results of the rotating machinery at variable speed and complex working conditions. The modes of the IMFs are slightly different due to random noise and modal aliasing effects.
However, the transient state caused by the mechanical failure is fixed in each IMF, and this fault transient is retained when ISOMAP is applied, which achieves the purpose of extracting the transient characteristics. Manifold learning is an effective method of nonlinear dimensionality reduction. C.Sun [27] et al. proved that the inherent dynamic features can be extracted from high-dimensional data, and that these inherent features can reveal the fault transients of different machines. The modal aliasing in the IMFs, the additive noise and the noise from the noise-containing residue are referred to as confounding factors. The confounding factors differ across the IMFs in the ISMM, so they are eliminated from the results. The process of the fault diagnosis model proposed in this paper is shown in Figure 1.
The rest of the paper is structured as follows. The second section introduces the theoretical background of the method used in this paper. The third section introduces the laboratory HFXZ-I planetary gearbox data set, the experimental process and results, a comparison with the results of seven other signal processing methods, and the ablation experiments that verify the method. The fourth section uses the University of Ottawa's public bearing variable speed data set [28] to verify the fault diagnosis model. The fifth section summarizes the methods proposed in this paper and discusses future research directions.

Improved Sensitive Mode Matrix (ISMM)
The Sensitive Mode Matrix (SMM) was proposed by Jun Wang [29] et al. (2020) in their EMD-manifold-based fault diagnosis method for rotating machinery. In this paper, SMM is improved for the study of rotating machinery under variable speed and complex working conditions. The improved matrix, ISMM, gives better results when processing variable speed data and further improves the accuracy of fault identification. The ISMM construction process is shown in Figure 2.

Figure 2 ISMM construction process
The fault information of rotating machinery mainly exists in the sensitive modes of the IMFs obtained by EMD. EMD inevitably produces modal aliasing, and multiple resonances may occur, so multiple sensitive modes may appear among the IMFs. We therefore need to extract the sensitive mode that contains the most fault information from the IMFs.
EMD is performed on the variable speed vibration signal $x(t)$ to obtain the IMFs. In recent years, scholars [30][31][32][33] have proposed some methods for selecting sensitive modes. Reference [26] proposed combining kurtosis in the time domain and in the envelope spectrum domain to select the most influential sensitive mode. J. Antoni [34] (2007) proposed the squared envelope spectrum (SES) in "Cyclic spectral analysis of rolling-element bearing signals: Facts and fictions". This paper uses the squared envelope spectrum for the kurtosis calculation in the envelope spectrum domain. Experiments have shown that kurtosis calculated from the squared envelope spectrum better extracts the characteristics of variable speed signals. The criterion for selecting the sensitive mode is referred to as the time and squared envelope spectral kurtosis (TSESK). TSESK is calculated according to formulas (1)-(3).
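The selection criterion above can be illustrated with a small standard-library sketch. It is not the paper's formulas (1)-(3), which are not reproduced here: the non-excess kurtosis definition, the removal of the DC bin, and the combination of the two kurtosis values as a product are all assumptions made for illustration, and `dft`, `kurtosis`, `squared_envelope_spectrum` and `tsesk` are hypothetical helper names.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short demo signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def kurtosis(values):
    """Non-excess kurtosis m4 / m2^2 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def squared_envelope_spectrum(x):
    """Magnitude spectrum of the squared envelope, bins 1..n/2 (DC dropped)."""
    n = len(x)
    spectrum = dft(x)
    # Analytic signal: keep DC (and Nyquist for even n), double the
    # positive-frequency bins, zero the negative-frequency bins.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    analytic = dft([s * g for s, g in zip(spectrum, gain)], inverse=True)
    env_sq = [abs(a) ** 2 for a in analytic]
    ses = [abs(v) for v in dft(env_sq)]
    return ses[1 : n // 2 + 1]


def tsesk(x):
    """Time-domain kurtosis times SES kurtosis (the product is an assumption)."""
    return kurtosis(x) * kurtosis(squared_envelope_spectrum(x))


# Demo: a carrier at bin 32 amplitude-modulated at bin 4 (hypothetical signal).
N = 128
demo = [
    (1 + 0.5 * math.cos(2 * math.pi * 4 * t / N)) * math.cos(2 * math.pi * 32 * t / N)
    for t in range(N)
]
demo_ses = squared_envelope_spectrum(demo)
peak_bin = demo_ses.index(max(demo_ses)) + 1  # +1 because the DC bin was dropped
```

In this demo the squared envelope spectrum peaks at the modulation frequency, which is the property that makes it useful for exposing periodic fault impulses. With such a score, the IMF with the largest value would be taken as the sensitive mode.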

Isometric Mapping (ISOMAP)
Isometric Mapping (ISOMAP) [35,36] is a manifold learning method developed from Multi-Dimensional Scaling (MDS). Based on the globally nonlinear and locally linear properties of the manifold, ISOMAP uses the geodesic distance in the manifold space instead of the Euclidean distance. The three-dimensional geodesic distance model is shown in Figure 3. First, the neighbors of each data point are determined, either as its k nearest neighbors or as the points within a radius $\epsilon$. Because the manifold is locally similar to Euclidean space, the adjacency graph is then established from these neighbors. Dijkstra's algorithm is used to calculate the shortest path between two points. Similar to MDS, the distance matrix is defined as $D = (d_{ij})$, where $d_{ij}$ is the distance between the i-th and the j-th data points. The vectors are then standardized, and the inner product matrix $B = (b_{ij})$ is defined, where $b_{ij}$ is an element of $B$; $B$ is derived from $D$ as in MDS.
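The steps described above (neighborhood graph, shortest paths, inner product matrix) can be sketched with the standard library as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the k-nearest-neighbor rule is chosen over the radius rule, the graph is made symmetric, and the inner product matrix uses the standard classical-MDS double centering $B = -\frac{1}{2} J D^{(2)} J$ with $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$. The eigendecomposition of $B$ that produces the final embedding is omitted, and all function names are hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def dijkstra(graph, source):
    """Shortest-path lengths from source to every node of the graph."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return [dist[i] for i in range(len(graph))]


def geodesic_matrix(points, k):
    """Approximate geodesic distances as shortest paths in the k-NN graph."""
    graph = knn_graph(points, k)
    return [dijkstra(graph, i) for i in range(len(points))]


def double_center(dist):
    """Inner product matrix B from squared distances (classical MDS step)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total) for j in range(n)]
        for i in range(n)
    ]


# Demo: seven points on a unit semicircle (hypothetical data).
arc = [(math.cos(i * math.pi / 6), math.sin(i * math.pi / 6)) for i in range(7)]
geo = geodesic_matrix(arc, k=2)
B = double_center(geo)
```

For the two end points of the semicircle, the Euclidean distance is 2 while the graph distance follows the arc and is close to $\pi$, which is the difference between the two distance notions that ISOMAP exploits.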

Convolution-Vision Transformer (CvT)
Transformers were proposed by Ashish Vaswani [37] et al. (2017) for machine translation. In recent years, the Transformer has made outstanding achievements in the field of natural language processing. However, it is still in the preliminary exploration stage in the field of fault diagnosis, and there is still a lot of room for development. Similar to the seq2seq model, the Transformer structure consists of two parts: Encoder and Decoder.
The Vision Transformer (ViT) proposed by A Dosovitskiy [38] (2021) caused an upsurge in the application of the Transformer in the image field. The emergence of the Vision Transformer has changed the position of the self-attention mechanism in deep learning. Previously, the self-attention mechanism was either used in conjunction with neural networks or used to replace certain components in neural networks. The Vision Transformer showed that image classification can be completed with high precision by relying on self-attention alone.
At the same time, ViT's shortcomings are also obvious. It needs to be pre-trained on a large amount of data to achieve good results on small and medium data sets. When trained on a small data set, it performs worse than traditional neural network models such as CNN and RNN.
Haiping Wu [39] et al. (2021) proposed the Convolution-Vision Transformer in "CvT: Introducing Convolutions to Vision Transformers". A traditional CNN only covers a receptive field the size of the convolution kernel at a time, but it can connect spatial information through local receptive fields. Self-attention can use the entire picture as its receptive field but cannot connect spatial information, so it needs more data for training. By combining the Transformer with CNN convolution, better results can be obtained by training on smaller data sets. Features are acquired from the input image of the multivariate time series through a sliding convolution kernel and are passed sequentially through multiple Convolutional Token Embedding and Convolutional Transformer Block stages. Finally, the classification result is obtained through the MLP module composed of two fully connected layers. The CvT structure is shown in Figure 4.

Convolutional Token Embedding
The function of the Convolutional Token Embedding module is to perform convolution operations on the two-dimensional image, expanding the number of channels to better extract high-dimensional features and reducing the size of the feature map to reduce the calculation parameters. As a result, each token contains more complex spatial features. The module is shown in Figure 5.
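How the module trades spatial size for channels can be seen from the usual strided-convolution output-size formula, $\lfloor (H + 2p - k)/s \rfloor + 1$. The sketch below is illustrative only: the stage settings (kernel, stride, padding, channels) are taken as an assumption from the CvT-13 configuration of the original CvT work and are not confirmed as the settings of this paper, and the 224-pixel input size is likewise hypothetical.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a strided convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed stage settings (CvT-13 style); not confirmed for this paper.
STAGES = [
    {"kernel": 7, "stride": 4, "padding": 2, "channels": 64},
    {"kernel": 3, "stride": 2, "padding": 1, "channels": 192},
    {"kernel": 3, "stride": 2, "padding": 1, "channels": 384},
]


def token_grid_sizes(image_size, stages=STAGES):
    """Return (height, width, channels) of the token map after each stage."""
    sizes = []
    size = image_size
    for stage in stages:
        size = conv_output_size(
            size, stage["kernel"], stage["stride"], stage["padding"]
        )
        sizes.append((size, size, stage["channels"]))
    return sizes


grids = token_grid_sizes(224)  # hypothetical 224 x 224 heat map input
```

Under these assumed settings the token map shrinks from 56 x 56 to 14 x 14 while the channel count grows from 64 to 384, which matches the qualitative description above: fewer, richer tokens at each stage.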

Convolutional Projection For Attention
For data projection in the multi-head self-attention mechanism, a variety of projection structures are given in the literature, such as linear projection and convolutional projection. According to the needs of this paper, the convolutional projection method is selected. The linear projection and convolutional projection methods are shown in Figure 6.
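One practical consequence of convolutional projection is that the key and value maps can be subsampled with a stride larger than 1, which shrinks the attention score matrix. The sketch below only does the bookkeeping; the kernel size 3, padding 1 and key/value stride 2 follow the original CvT work as an assumption, since this paper does not state which strides it uses, and `attention_shape` is a hypothetical helper.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a strided convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def attention_shape(grid, kernel=3, q_stride=1, kv_stride=2, padding=1):
    """Number of query tokens and key/value tokens for a square token grid."""
    q_side = conv_output_size(grid, kernel, q_stride, padding)
    kv_side = conv_output_size(grid, kernel, kv_stride, padding)
    return q_side * q_side, kv_side * kv_side


# A 14 x 14 token grid (hypothetical size for illustration).
n_query, n_key = attention_shape(14)
scores_strided = n_query * n_key          # entries in the attention score matrix
scores_full = n_query * attention_shape(14, kv_stride=1)[1]
```

With these assumed values the score matrix has 196 x 49 entries instead of 196 x 196, a fourfold reduction, while the queries keep full resolution.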

HFXZ-I Planetary Gearbox Fault Diagnosis Experiment Platform
In order to verify the performance of the proposed model, it is tested whether the model can accurately identify faults on non-ideal data sets. The HFXZ-I planetary gearbox data set collected in our laboratory is therefore selected first. The HFXZ-I planetary gearbox fault diagnosis experiment platform is composed of magnetic powder brakes, planetary gears, helical gears, motors and other components. It is also equipped with a variety of planetary gears, helical gears and sun gears under different working conditions. The diagram of the planetary gearbox experimental platform is shown in Figure 7. Some parameters of the HFXZ-I planetary gearbox are shown in Table 1.

HFXZ-I planetary gearbox data description
The acceleration sensor is installed on the surface of the HFXZ-I planetary gearbox to collect the vibration signal. The locations of the HFXZ-I planetary gearbox measuring points are shown in Figure 8.

Data breakdown
In the experiment, the motor load was 0.5 hp, and 9 experiments were carried out. The detailed information of the experiment data is listed in Table 2.
In each experiment, the data was collected for 60 seconds, and the length of the original data collected was 614,400 points. There are nine types of complex faults in the experiment, and nine channels of data are collected under each working condition, for a total of 81 raw data records. The data of the nine channels are used in nine separate experiments in order to test the validity of the data collected at each measuring point.
Each channel of each fault type is truncated into samples of 1024 data points, with an overlap coefficient of 50% between samples. After truncation, each channel yields 10,782 samples, 1,198 for each fault type. For each fault type, 80% of the samples are randomly selected as training data, and the remaining 20% are used as test data. The method of sample overlap expansion is shown in Figure 10.
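The overlapping truncation and the per-class random split described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the window start positions, the rounding of the 80% share (integer floor) and the random seed are all choices made here, and the function names are hypothetical.

```python
import random


def window_count(n_points, length, overlap):
    """Number of full windows of the given length at the given overlap ratio."""
    step = int(length * (1 - overlap))
    if n_points < length:
        return 0
    return (n_points - length) // step + 1


def sliding_windows(signal, length, overlap):
    """Yield consecutive windows that overlap by the given ratio."""
    step = int(length * (1 - overlap))
    for start in range(0, len(signal) - length + 1, step):
        yield signal[start : start + length]


def split_train_test(samples, train_percent=80, seed=0):
    """Randomly split one class's samples; floor rounding is an assumption."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]


# Toy signal: 10 points, windows of 4 points, 50% overlap.
toy_windows = list(sliding_windows(list(range(10)), 4, 0.5))

# Split sizes for the 1,198 samples per fault type reported above.
train, test = split_train_test(range(1198))
```

Applied to a 614,400-point record with 1024-point windows and 50% overlap, this particular counting formula yields 1,199 windows, one more than the 1,198 reported per fault type, so the exact truncation rule used in the experiment presumably differs slightly from this sketch. With floor rounding, 1,198 samples split into 958 training and 240 test samples.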

Generate data samples
An ISMM high-dimensional matrix is computed from each data sample (ISMM is introduced in Section 2.1). ISOMAP manifold learning (introduced in Section 2.2) is then used to reduce the high-dimensional ISMM to a two-dimensional matrix, which is represented as a heat map to form the data set. The process of generating data samples is shown in Figure 11. Some of the heat maps generated for the nine composite fault types are shown in Figure 12.
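The last step, turning a two-dimensional matrix into heat map intensities, can be illustrated with a simple min-max scaling. The paper does not state how values are mapped to colors, so the 256-level linear scaling below is an assumption, and `to_heatmap_levels` is a hypothetical helper; a color map would then be applied to these levels.

```python
def to_heatmap_levels(matrix, levels=256):
    """Linearly scale matrix values to integer intensity levels 0..levels-1."""
    flat = [value for row in matrix for value in row]
    low, high = min(flat), max(flat)
    span = high - low
    if span == 0:
        # A constant matrix carries no contrast; map everything to level 0.
        return [[0 for _ in row] for row in matrix]
    return [
        [round((value - low) / span * (levels - 1)) for value in row]
        for row in matrix
    ]


# Hypothetical 2 x 2 matrix standing in for an ISOMAP output.
levels = to_heatmap_levels([[0.0, 1.0], [2.0, 5.0]])
```

Scaling each sample by its own minimum and maximum keeps the full intensity range available for every heat map, at the cost of discarding absolute amplitude; whether that trade-off matches the experiment is not stated in the text.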

The result of the fault diagnosis experiment
The data set of 9 fault types from 1 channel is taken for fault diagnosis, and the t-SNE dimensionality reduction algorithm is used to reduce the data from high-dimensional features to three-dimensional features. 3D scatter charts are used to show how the sample distribution changes over the course of this fault diagnosis method. From the sample distribution after feature pre-extraction, it can be clearly seen that this method works well for the fault diagnosis of variable speed rotating machinery data. In order to assess the feature pre-extraction of this method, the features extracted from the original data by EMD and CEEMD are compared with the features extracted by the proposed method. The change process of the sample distribution is shown in Figure 13. Figure 13(a) is the sample distribution after the original data is reduced to three dimensions, Figure 13(b) shows the sample distribution after the EMD result is reduced to three dimensions, Figure 13(c) shows the sample distribution after the CEEMD result is reduced to three dimensions, Figure 13(d) shows the sample distribution after the ISOMAP manifold learning results are reduced to three dimensions, and Figure 13(e) shows the sample distribution after the CvT features are reduced to three dimensions.
Figures 13(b)-(d) show the results of applying EMD, CEEMD and ISMM+ISOMAP feature pre-extraction to the original data. In Figure 13(d), the distributions of the fault types are largely separated and the samples of each fault type have begun to cluster. In Figure 13(b) and Figure 13(c), however, the fault samples have not yet been separated. The method proposed in this paper is therefore better than EMD and CEEMD at extracting features from variable speed rotating machinery fault data.
The comparison shows that when the original data is mapped into three-dimensional space, Figure 13(a) exhibits a scattered sample distribution within each category and similar distributions between categories. A high-dimensional ISMM is then constructed from the original data and mapped to the manifold space. When the ISOMAP manifold learning results are mapped to the three-dimensional space, Figure 13(d) shows that the sample distribution within each category is similar, while the samples of different categories are not completely separated. The result is converted into a heat map and passed into CvT for training. When the CvT features are mapped to the three-dimensional space, Figure 13(e) shows that the samples within each category are concentrated and the samples of different categories are well separated.
Clearly, the method proposed in this paper is effective for the fault diagnosis of variable speed rotating machinery. Over 5 experiments, the average fault diagnosis accuracy on the 1-channel data set of 9 fault types is 99.49%. The resulting confusion matrix and predicted labels are shown in Figure 14. To verify the effectiveness of the method on different channels, the heat map data sets of all 9 channels of the planetary gearbox were each used to train a CvT network of the same structure 5 times. The accuracy of the 45 experiments on the test data set is shown in Figure 15. Figure 16, and the loss value change curve of 30 iterations during the training of the five models is shown in Figure 17. Obviously, among the four fault diagnosis models, CvT has a higher accuracy rate than the other fault diagnosis models.
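For readers who want to reproduce the kind of summary reported above, a confusion matrix and the overall accuracy can be computed from true and predicted labels as in the following sketch. The labels are a small hypothetical example with three classes, not data from the experiment, and the function names are illustrative.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for three classes (not experimental data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(cm)
```

Averaging `acc` over repeated training runs gives the kind of mean accuracy quoted in the text; off-diagonal entries of `cm` show which fault types are confused with each other.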

Verification of ablation experiments
In order to verify that each part of the proposed fault diagnosis method is indispensable, eight sets of ablation experiments were designed. The results are shown in Table 3. In the first group, the original signal is passed directly into the CvT network structure for training, and the accuracy is only 68.12%. This is the worst result among all groups of experiments, which shows that CvT cannot obtain effective features by extracting them directly from the original variable speed signal. The second group removes manifold learning and passes the high-dimensional ISMM into the CvT network structure for training. The third and fourth groups use EMD and CEEMD instead of ISMM, respectively. The fifth and sixth groups map the results of EMD and CEEMD, respectively, with ISOMAP. The problem with these five groups (the second to the sixth) lies in the feature pre-extraction, which cannot effectively extract fault features, so their classification accuracy is relatively low. The seventh group uses MDS instead of ISOMAP, that is, Euclidean distance instead of geodesic distance when mapping. Its accuracy is only 92.14%, which shows that Euclidean distance alone does not give the best results. The eighth group uses the unimproved SMM (proposed in Reference [29]) for feature extraction, and its classification accuracy is 93.78%. The results of the ablation experiments show that the feature pre-extraction method proposed in this study is more effective in processing variable speed data, with an accuracy of 99.49%.

Bearing Dataset of University of Ottawa, Canada
In order to further verify the effectiveness of the proposed fault diagnosis method and its robustness on different data sets, the experiment uses a bearing data set collected under variable speed conditions at the University of Ottawa, Canada. The experimental device consists of a motor, an AC drive, a healthy bearing, and an experimental bearing. Faulty bearings are provided to replace the experimental bearing. The experiment uses an ICP accelerometer and an EPC incremental encoder to measure vibration data and shaft speed, respectively. The structure of the SpectraQuest Mechanical Failure Simulator (MFS-PK5M) is shown in Figure 18. The basic parameters of the experimental bearing are shown in Table 4.
The bearing data set of the University of Ottawa contains vibration signals collected from bearings in different health conditions under variable speed conditions. There are five health conditions: (i) healthy, (ii) faulty with an inner race defect, (iii) faulty with an outer race defect, (iv) faulty with a ball defect, and (v) faulty with combined defects on the inner race, the outer race and a ball. The variable speed conditions include (i) increasing speed, (ii) decreasing speed, (iii) increasing then decreasing speed, and (iv) decreasing then increasing speed.

Data set details
This experiment uses five kinds of bearing failures under the condition of increasing speed to carry out the experiment. The detailed information of the data set used in the experiment is shown in Table 5.

Experimental process and results
In each experiment, the data is collected for 10 s, and the length of the original data collected is 2,000,000 points. Each fault type is truncated into samples of 1024 data points, with an overlap coefficient of 50% between samples (see Section 3.3). After truncation, 19,525 samples are obtained, 3,905 for each fault type. For each fault type, 80% of the samples are randomly selected as training data, and the remaining 20% are used as test data. The fault diagnosis model proposed in Section 3 is used for diagnosis. The changes in the experimental sample distribution are shown in Figure 19. The sample distribution maps show that the proposed fault diagnosis model also classifies the Ottawa public data set well. In order to further test the stability of the model, five experiments were performed using the Ottawa public data set. On the test data set, the average accuracy of the 5 models is 99.65%, and the standard deviation of the accuracy is 0.164823. This shows that the model performs stably on the Ottawa public data set, with an accuracy close to 100%. The change in loss value during training is shown in Figure 20. The accuracy of the model in the 5 experiments on the test data set is shown in Figure 21.

Conclusion
Rotating machinery fault diagnosis has become a major problem in industrial production due to unstable speed, scattered sample distribution within each fault category, and similar sample distribution among categories. In response to this problem, this paper proposes to increase the data set capacity by overlapping sampling, to preprocess the variable speed signal by constructing an ISMM and applying ISOMAP manifold learning, and to input the resulting heat maps into CvT for training.
This paper makes three major improvements: (1) ISMM greatly reduces the influence of EMD modal aliasing effects; (2) the ISMM+ISOMAP manifold learning module extracts transient fault features and solves the problem that conventional methods have difficulty extracting good features from variable speed data sets; (3) CvT is applied to the field of fault diagnosis, where, compared with other neural networks, it greatly reduces the number of training iterations and improves the accuracy of model training.
In order to verify the effectiveness of the proposed model, multiple experiments were performed using the laboratory's HFXZ-I planetary gearbox data set and the time-varying speed fault data set of the University of Ottawa in Canada. The accuracy of the method is 99.49% and 99.65%, respectively. In order to test the effect of each part of the model, multiple sets of ablation experiments were carried out for verification. The results show that the proposed fault diagnosis model not only has an accurate fault type recognition rate but also has good robustness to different time-varying speed data sets.
On the basis of this research, future work can study more complex variable speed data and develop ISMM and ISOMAP feature extraction modules that adapt to a variety of variable speed data. The CvT network model has a relatively large amount of calculation, so the CvT network structure can also be optimized to reduce the computational scale and time complexity.