Intelligent Fault Diagnosis of Wind Turbine Gearbox based on Refined Generalized Multi-scale State Joint Entropy and RSFS Feature Selection

Fault diagnosis of the gearboxes and bearings in wind turbines is crucial for extending service life and reducing maintenance costs. This paper proposes a novel fault diagnosis method for identifying the health conditions of gearboxes. The method is based on refined generalized composite multi-scale state joint entropy (RGCMSJE), the robust spectral learning framework for unsupervised feature selection (RSFS) and the extreme learning machine (ELM), and covers feature extraction, feature reduction and pattern recognition. First, multi-scale average Euclidean divergence (MAED) is adopted to select the parameters of RGCMSJE. Second, RGCMSJE is used to extract multi-scale features from gearbox vibration signals and construct a high-dimensional feature set. Third, RSFS is used to reduce the dimension of the RGCMSJE feature set. Finally, the resulting low-dimensional features are fed into the ELM classifier for fault pattern recognition. The effectiveness of the method is verified on two gearbox fault diagnosis experiments, and the results show that it can accurately identify different fault types of wind turbine gearboxes.

Data show that the total installed capacity of global wind power is 651 GW, a year-on-year increase of 10%, and it is expected to reach 2930 GW by 2040 [3].
Unfortunately, as the number of installed wind turbines (WTs) grows, WT failures occur frequently because of their complex and harsh operating environments.
This leads to high operation and maintenance (O&M) costs and large economic losses. Notably, the gearbox and bearings are among the most critical components of a wind turbine and are also where failures most often occur. Therefore, a reliable fault diagnosis methodology is needed to monitor the operating status of gearboxes and bearings accurately and in time, so as to reduce O&M costs [4].
As important parts of WTs, gearboxes and bearings strongly influence the service life of the whole wind power system [5]. Their analysis is usually based on vibration signals, which provide more useful information than photoelectric, acoustic emission, temperature and other types of signals [6]. However, because of unsteady wind speed and strong background noise, the vibration signals are usually nonlinear, nonstationary and noisy. Under varying working conditions in particular, weak fault characteristics in the vibration signal are hard to capture, which makes the fault type difficult to identify [7].
Fortunately, many researchers have proposed solutions based on data-driven and machine learning methods. These schemes mainly consist of two steps, feature extraction and pattern recognition, and the performance of the latter depends chiefly on the former [8]. Traditional time-frequency analysis methods require expert knowledge from the investigators and detailed information about the mechanical components, which greatly limits their application [9,10]. Recently, with the development of nonlinear analysis techniques, entropy-based feature extraction has become an important research topic in WT fault diagnosis [11,12,13].
Entropy, as an index measuring the dynamic characteristics of time series, has been widely applied to feature extraction for gearboxes and bearings [14], [15]; examples include sample entropy (SE) [16], permutation entropy (PE) [17], fuzzy entropy (FuzzyEn) [18] and dispersion entropy (DE) [19]. Among them, SE is computationally expensive, especially for long time series [20]. PE is much faster than SE, but it does not take the amplitude information of the time series into account [21]. The feature extraction ability of FuzzyEn is limited by its fuzzy membership function [22]. DE stands out: it is significantly more efficient than SE, PE and FuzzyEn, and its entropy estimates are stable and effective. This is attributed to its statistics-based symbol mapping and its probability distribution over embedding patterns [23]. However, these entropy algorithms ignore the dynamic characteristics of the time series, that is, the probability of the series changing from its current state to the next state. Therefore, building on the advantages of DE, we propose a new entropy estimator called state joint entropy (SJE), which considers the joint distribution of the current state and the next state. Tests on simulated and experimental signals show that SJE not only inherits the efficiency and stability of DE but also extracts more fault information. Moreover, many studies show that an entropy algorithm applied on a single time scale estimates only the irregularity and complexity at that scale, so important fault information at other scales may be discarded [24,25,26]. To extract fault features at multiple time scales, Azami et al. proposed refined composite multi-scale analysis [27]. It not only solves the problem of undefined entropy in composite multi-scale analysis but also improves the stability of the results for long time series.
However, refined composite multi-scale analysis relies mainly on the averaging process, so as the scale factor increases, the variance of the entropy values grows rapidly and the estimates lose statistical stability. This paper therefore proposes refined generalized composite multi-scale analysis, which compensates for this shortcoming by using the mean and variance of the whole time series. Finally, state joint entropy and refined generalized composite multi-scale analysis are combined into refined generalized composite multi-scale state joint entropy (RGCMSJE), which serves as the feature extraction method for gearboxes and bearings.
As is well known, after extracting high-dimensional features by multi-scale analysis, feature selection is usually needed to reduce the computational burden.
At present, Laplacian score (LS) [28], Fisher score (FS) [29] and max-relevance and min-redundancy (mRMR) [30] are widely used for feature selection in various fields. However, LS focuses only on the similarity of neighboring samples and ignores global separability. FS shows the opposite behavior: it focuses on the global separability of samples and does not consider the similarity of neighboring samples [31]. Therefore, the features selected by LS and FS cannot effectively represent the separability of multi-class samples. mRMR selects features that are maximally relevant to the class labels and minimally redundant with each other, but it consumes much CPU time, which hinders fast computation. To solve these problems, this paper introduces RSFS into the feature selection process. RSFS uses robust local learning to handle noise in the cluster labels, improving the retention of local information while taking global information into account [32]. The results of two experiments show that the multi-class features selected by RSFS are more separable while efficiency is maintained. Finally, the widely used extreme learning machine (ELM) is used to identify different fault types of gearboxes and bearings. Compared with softmax regression (SR) [33], support vector machine (SVM) [34], random forest (RF) [35] and k-nearest neighbor (KNN) [36], ELM offers better generalization, fast computation and little need for manual intervention [37][38].
In summary, the contributions of this paper are as follows:
(1) SJE is proposed to characterize both the current state and the next state of a time series; it extracts more fault features while inheriting the efficiency of DE.
(2) RGCMSJE is proposed to extract features on multiple time scales, which remedies the deficiency of refined composite multi-scale analysis. Multi-scale average Euclidean Divergence (MAED) is proposed to select the parameters of RGCMSJE automatically.
(3) RSFS is introduced for the first time to select sensitive, highly discriminative features as the input of the ELM intelligent classifier.
(4) The RGCMSJE-RSFS-ELM fault diagnosis method is proposed systematically, and its effectiveness is verified through MAED-based parameter selection and comparative experiments.
The rest of this paper is organized as follows. Section 2 introduces the theoretical derivation and performance comparison of SJE. Section 3 gives the detailed steps of RGCMSJE and MAED. Section 4 introduces RSFS for sensitive feature selection. Section 5 describes the ELM algorithm and the procedure of the proposed RGCMSJE-RSFS-ELM method. Section 6 presents two detailed experiments and discusses the results. Finally, conclusions are drawn in Section 7.
… [39], maximum entropy partition (MEP) [40], unified quantization (IAUQ) [41] and normal cumulative distribution function (NCDF) [42].
Although the LM algorithm is the fastest, when the maximum value of the time series is far greater than its mean, the symbol sequence $z_i$ degenerates into a nearly straight line.
MEP requires reference data to be specified, which makes symbolization more difficult. IAUQ does not take signal fluctuations into account. The NCDF mapping, which originates from dispersion entropy, first maps the original time series $x = \{x_1, x_2, \ldots, x_N\}$ to values $y_i$ between 0 and 1 and then linearly maps each $y_i$ to an integer from 1 to $c$ [42]:

$$y_i = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x_i} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt, \qquad z_i = \mathrm{round}\left(c \cdot y_i + 0.5\right)$$

where $\mu$ and $\sigma$ are the mean and standard deviation of $x$, and $c$ is the number of classes.
The state transition matrix describes how the symbolic time series moves from one state to another over time. The transition probabilities are computed from this matrix, and the normalized SJE is then calculated from them.
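To make the NCDF symbolization and the state-transition idea concrete, here is a minimal Python sketch. It assumes that a "state" is a dispersion pattern of m symbols (embedding dimension m, delay d) and that the "next state" is the pattern starting one sample later; SJE is then taken as the Shannon entropy of the joint (state, next state) distribution, i.e. of the cells of the state transition matrix. The function names and the normalization by ln(c^(2m)) are hypothetical choices, not the paper's exact equations; standard DE is included for comparison.

```python
from collections import Counter
from math import erf, log, sqrt

import numpy as np


def ncdf_map(x, c):
    """Map a series to integer symbols 1..c via the normal CDF (NCDF)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    y = np.array([0.5 * (1.0 + erf((v - mu) / (sigma * sqrt(2.0)))) for v in x])
    z = np.round(c * y + 0.5).astype(int)
    return np.clip(z, 1, c).tolist()  # clip guards the y -> 0 and y -> 1 edge cases


def patterns(z, m, d):
    """Embedding (dispersion) patterns of length m with delay d."""
    n = len(z) - (m - 1) * d
    return [tuple(z[i + j * d] for j in range(m)) for i in range(n)]


def shannon(counts):
    total = sum(counts.values())
    return -sum((v / total) * log(v / total) for v in counts.values())


def dispersion_entropy(x, m=2, c=4, d=1):
    """Normalized DE: entropy of the pattern distribution divided by ln(c^m)."""
    return shannon(Counter(patterns(ncdf_map(x, c), m, d))) / log(c ** m)


def state_joint_entropy(x, m=2, c=4, d=1):
    """Hypothetical SJE: entropy of the joint (current, next) pattern distribution."""
    pats = patterns(ncdf_map(x, c), m, d)
    transitions = Counter(zip(pats[:-1], pats[1:]))  # counts of non-zero transition cells
    return shannon(transitions) / log(c ** (2 * m))  # assumed normalization


rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
sine = np.sin(2 * np.pi * np.arange(2000) / 50)
print(dispersion_entropy(noise), dispersion_entropy(sine))
print(state_joint_entropy(noise), state_joint_entropy(sine))
```

With d = 1, consecutive patterns share m − 1 symbols, so at most c^(m+1) of the c^(2m) transition cells can ever be occupied; under the assumed normalization, white noise therefore scores about 0.75 rather than 1 for m = 2, c = 4.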

Symbolization performance comparison
To validate the advantages of the mNCDF algorithm for mapping time series, the rolling bearing fault signal model in Ref. [44] is employed to simulate an outer-race fault (Eq. (12)), where $A_i$ is the amplitude modulation signal with a frequency of 33 Hz, $A_0$ is the signal amplitude, and $f_n = 3000$ Hz is the natural frequency of the system. As shown in Fig. 3, the mNCDF algorithm reflects more details of the simulated bearing fault series than the MEP algorithm. MEP maps too many points of the simulated series to the largest symbol, 3, which introduces redundant amplitude information and destroys the periodicity of the original series. By contrast, the symbol sequence obtained by mNCDF both retains appropriate amplitude information and preserves the periodicity of the signal.
To further compare the information contained in the symbol sequences produced by the two mapping algorithms, their spectra and envelope spectra are displayed in Fig. 4 and … is severely weakened. This shows that the mNCDF mapping effectively retains the amplitude and fault information of the original signal and outperforms the MEP algorithm.

Comparison between SJE, DE, SE and PE
To study how well the proposed SJE describes the complexity of time series, four simulated signals are analyzed, and DE, PE and SE are used for comparison.
All simulated signals are 360 s long and sampled at 150 Hz. A 12 s sliding window with 75% overlap, i.e. a step of 3 s, is used to segment the data.
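As a worked check of the windowing arithmetic, a 360 s record at 150 Hz has 54000 samples, each 12 s window holds 1800 samples and the 3 s step is 450 samples, giving (54000 − 1800) / 450 + 1 = 117 windows, one entropy value per window. The sketch below assumes that only windows lying fully inside the record are kept; `sliding_windows` is a hypothetical helper.

```python
import numpy as np


def sliding_windows(x, fs, win_s, step_s):
    """Split x into overlapping windows of win_s seconds, advancing step_s seconds."""
    win, step = int(round(win_s * fs)), int(round(step_s * fs))
    starts = range(0, len(x) - win + 1, step)
    return np.stack([x[s:s + win] for s in starts])


fs = 150                      # sampling frequency (Hz)
x = np.zeros(360 * fs)        # placeholder for a 360 s simulated signal
windows = sliding_windows(x, fs, win_s=12, step_s=3)  # 75% overlap -> 3 s step
print(windows.shape)          # (117, 1800)
```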
To compare the entropy algorithms fairly, the parameters of the four methods are set first.
To study the sensitivity of SJE to signals with varying frequency and amplitude, the amplitude-modulated chirp signal depicted in Fig. 7(b) is applied. The frequency of this signal increases linearly from 0 …
To test the sensitivity of SJE to the noise level, a quasi-periodic signal with different levels of noise power is created, as depicted in Fig. 8. The signal is modulated by two sinusoidal signals with frequencies of 0.5 Hz and 1 Hz. There is no noise in the first 24 s of the sequence; after that, white Gaussian noise (WGN) is added to the signal every 12 s with gradually increasing power. The simulation results are shown in Fig. 8(a). The PE value remains constant from the 10th sliding window onward, which indicates that PE cannot detect changes in noise power. The SE curve increases with noise power but fluctuates sharply from the 40th sliding window onward, which indicates that SE is vulnerable to noise.
It is worth noting that the SJE and DE curves increase monotonically and steadily with noise power, which indicates that both methods can detect changes in noise power while maintaining stable performance.
To investigate the ability of SJE to recognize amplitude jumps, a signal consisting of impulses and WGN is created, as depicted in Fig. 9(b). Impulses with amplitudes of 20, 30 and 50 are added every 80 s. The comparison results are shown in Fig. 9(a).
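The amplitude-jump test signal can be sketched as follows. The noise variance (unit), the single-sample impulse shape and the impulse times of 80 s, 160 s and 240 s are assumptions chosen to match "every 80 s" within a 360 s record at 150 Hz; `impulse_test_signal` is a hypothetical helper.

```python
import numpy as np


def impulse_test_signal(fs=150, duration_s=360, seed=0):
    """WGN plus single-sample impulses of amplitude 20, 30 and 50, spaced 80 s apart."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(duration_s * fs)            # assumed unit-variance WGN
    for t_s, amp in zip((80, 160, 240), (20, 30, 50)):  # assumed impulse times (s)
        x[t_s * fs] += amp
    return x


x = impulse_test_signal()
print(x.shape, np.sort(np.argsort(x)[-3:]) / 150)  # positions of the three largest samples (s)
```

Each window of this signal can then be passed to the entropy functions to trace how each method responds to the impulse amplitudes.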
It can be observed that the PE curve is constant, which indicates that PE cannot detect changes in impulse amplitude. SE can detect the amplitude change, but its curve fluctuates violently, which indicates unstable performance and poor noise robustness.

SJE and DE can both effectively detect the change in amplitude, and SJE is more sensitive. These results show that SJE not only inherits the stability of DE but also has a stronger ability to sense amplitude changes.

3 Refined generalized composite multiscale state joint entropy algorithm

Refined generalized composite multiscale analysis
Refined composite multi-scale analysis (RCMSA) effectively solves the problem of undefined entropy in multi-scale analysis. However, in the coarse-graining process of RCMSA, the variance of the entropy values increases with the scale factor, which lowers the stability and distinguishability of the feature space. To overcome these shortcomings, a new refined generalized composite multiscale state joint entropy (RGCMSJE) method is proposed. RGCMSJE is computed as follows:
(1) For a given time series $\{x(i), i = 1, 2, \ldots, N\}$, the mean $\mu_x$ and standard deviation $\sigma_x$ of the whole series are calculated to obtain a stable feature space:

$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x(i), \qquad \sigma_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x(i) - \mu_x\right)^2}$$

The $k$th coarse-grained time series $y^{(k,s)}$ at scale factor $s$ is

$$y_j^{(k,s)} = \frac{1}{s}\sum_{i=(j-1)s+k}^{js+k-1} x(i), \qquad 1 \le k \le s, \quad 1 \le j \le \left\lfloor \frac{N-k+1}{s} \right\rfloor$$

(2) Each coarse-grained time series $y_j^{(k,s)}$ is mapped to the range 0 to 1 with a unified mapping that uses the whole-series statistics $\mu_x$ and $\sigma_x$:

$$u_j^{(k,s)} = \frac{1}{\sigma_x\sqrt{2\pi}}\int_{-\infty}^{y_j^{(k,s)}} e^{-\frac{(t-\mu_x)^2}{2\sigma_x^2}}\,dt$$

(3) For scale factor $s$, the RGCMSJE value is defined as the Shannon entropy of the second-order moment state joint model obtained after the time series is shifted.
It is noteworthy that in reference [45], the generalized multiscale process is extended to second-order statistics. This approach has been shown to suit sample entropy and permutation entropy, which are sensitive to the relationship between adjacent amplitudes.
However, we found that this approach is not suitable for SJE, because much of the amplitude information is distorted as the second moment changes, which leads to a partial loss of useful information [45]. RGCMSJE is introduced to solve this problem and improves on previous methods in two respects. First, the refined composite multi-scale analysis in step (1) stably extracts changes in the time series across multiple scales and avoids undefined entropy. Second, to reduce the variance fluctuation of the time series at large scale factors, step (2) uses a unified mapping based on the mean and standard deviation of the whole series instead of a step-by-step mapping, which gives RGCMSJE a stronger fault feature extraction ability.
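The three RGCMSJE steps can be sketched in Python as below, under stated assumptions: the coarse-graining follows the refined composite formula for $y_j^{(k,s)}$, the NCDF mapping uses the whole-series $\mu_x$ and $\sigma_x$ for every scale, the state joint entropy is the hypothetical (current pattern, next pattern) form with an assumed ln(c^(2m)) normalization, and the "refined composite" step averages the transition probabilities over the s shifted series before taking the entropy, as refined composite DE does. Function names are hypothetical.

```python
from collections import Counter
from math import erf, log, sqrt

import numpy as np


def coarse_grain(x, s, k):
    """k-th shifted coarse-grained series at scale s (k = 1..s)."""
    x = np.asarray(x, dtype=float)[k - 1:]
    n = len(x) // s
    return x[:n * s].reshape(n, s).mean(axis=1)


def unified_map(y, c, mu_x, sigma_x):
    """Step (2): NCDF mapping with the WHOLE-series mean and std, then symbols 1..c."""
    u = np.array([0.5 * (1.0 + erf((v - mu_x) / (sigma_x * sqrt(2.0)))) for v in y])
    return np.clip(np.round(c * u + 0.5).astype(int), 1, c).tolist()


def rgcmsje(x, s, m=2, c=4, d=1):
    x = np.asarray(x, dtype=float)
    mu_x, sigma_x = x.mean(), x.std()          # step (1): statistics of the whole series
    avg_p = Counter()
    for k in range(1, s + 1):
        z = unified_map(coarse_grain(x, s, k), c, mu_x, sigma_x)
        n = len(z) - (m - 1) * d
        pats = [tuple(z[i + j * d] for j in range(m)) for i in range(n)]
        pairs = Counter(zip(pats[:-1], pats[1:]))   # (current state, next state)
        total = sum(pairs.values())
        for key, v in pairs.items():
            avg_p[key] += v / (total * s)           # refined composite: average over k
    h = -sum(p * log(p) for p in avg_p.values())
    return h / log(c ** (2 * m))                    # assumed normalization


rng = np.random.default_rng(0)
signal = rng.standard_normal(2048)
features = [rgcmsje(signal, s) for s in range(1, 21)]   # 20-dimensional feature vector
print(np.round(features, 3))
```

Because the mapping keeps the whole-series $\sigma_x$, the shrinking spread of coarse-grained white noise at large s shows up as lower entropy instead of being re-normalized away.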

The parameter selection of refined generalized composite multiscale state joint entropy
Parameter selection is a very important problem in entropy algorithms.
Appropriate parameters allow the entropy to extract richer feature information. The traditional approach is to traverse the parameter interval and then determine the parameter values by inspecting the entropy curves. However, this approach is only qualitative, provides no quantitative guidance and relies on experience.
This not only reduces the efficiency of fault diagnosis but also limits the fault recognition rate. Recently, the average Euclidean distance (AED) was proposed in reference [46] as an index for entropy parameter selection, but it does not consider multiple scales or the stability of the feature space. When multiscale analysis is used, the scale factor is an important consideration in selecting entropy parameters. At the same time, we find that when the stability of the feature space is poor, even if the AED value is large, the recognition rate of gearbox …
(2) Calculate the average refined generalized composite multi-scale state joint entropy (ARX) and the multi-scale standard deviation (MSD) of the samples in the $i$th class at the $s$th scale, where $1 \le s \le 20$.
(3) The Euclidean distances in the entropy space and in the standard deviation space between the $i$th and $j$th classes are calculated.
(4) The MAED value is calculated.
(5) Update the parameters $c$ and $m$, and repeat steps (1) to (4) to calculate the corresponding MAED values.
The detailed steps of the RSFS algorithm can be summarized as follows:
(1) A local kernel regression model is constructed around each data point $x_i$.
(2) Calculate the degree matrix $B$ of $(S + S^{T})$ and the corresponding matrix, in which three weighting parameters are given as inputs.
(4) $F$ is updated iteratively; the update rule involves the matrix $A$, computed from $X$, $W$ and $Z$, and the input parameter $v$.
(2) The hidden layer output matrix $H$ is calculated.
where Q is the number of samples.
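(3) The output weight $\beta$ is calculated:

$$\beta = H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1}T$$

where $C$ is the penalty factor, $I$ is the identity matrix and $T$ is the expected output matrix. As $C \to \infty$, this tends to $\beta = H^{\dagger}T$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$.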

The proposed fault diagnosis method
Building on the strengths of RGCMSJE, RSFS and ELM, this paper presents a fault diagnosis method for wind turbine gearboxes. Figure 11 displays the flowchart of the proposed method, which can be summarized as follows:
Fig. 11 Flowchart of the proposed method.
(1) Collect and store the vibration data of wind turbine gearboxes in different health conditions.
(2) Apply RGCMSJE to extract fault features by quantifying the complexity of the gearbox vibration signals under different health conditions.
The parameter combination of RGCMSJE is determined by MAED. In this paper, we set the time delay d = 1 and the scale factor s = 20, and on this basis find the best combination of the embedding dimension m and the number of classes c.
(3) Use RSFS to select the most sensitive features to construct the state feature vectors.
(4) Input the selected feature vectors into the multi-class ELM classifier to train it and identify the different operating conditions. To eliminate the influence of data randomness on the recognition accuracy of ELM, 10-fold cross-validation is applied to the recognition process.
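The MAED-based parameter search in step (2) can be sketched as a grid search over (c, m). For each candidate pair, the sketch computes the class-mean curves (ARX) and class standard-deviation curves (MSD) over the 20 scales, then the average pairwise Euclidean distance between classes in each space. The exact formula that combines these two distances into MAED is not reproduced in this text, so the combination is left as a caller-supplied function `combine`; `extract` stands in for an RGCMSJE feature extractor. All names, and the toy extractor in the usage, are hypothetical.

```python
import itertools

import numpy as np


def class_curves(features, labels):
    """ARX (class-mean curve) and MSD (class std curve) over the scales."""
    classes = np.unique(labels)
    arx = np.array([features[labels == k].mean(axis=0) for k in classes])
    msd = np.array([features[labels == k].std(axis=0) for k in classes])
    return arx, msd


def mean_pairwise_distance(curves):
    """Average Euclidean distance between every pair of class curves."""
    pairs = itertools.combinations(range(len(curves)), 2)
    return float(np.mean([np.linalg.norm(curves[i] - curves[j]) for i, j in pairs]))


def maed_grid_search(signals, labels, extract, c_values, m_values, combine):
    """Return (score, c, m) for the best-scoring parameter pair."""
    best = None
    for c, m in itertools.product(c_values, m_values):
        feats = np.array([extract(x, c, m) for x in signals])  # (samples, scales)
        arx, msd = class_curves(feats, labels)
        score = combine(mean_pairwise_distance(arx), mean_pairwise_distance(msd))
        if best is None or score > best[0]:
            best = (score, c, m)
    return best


def toy_extract(x, c, m):
    """Stand-in for an RGCMSJE extractor: 20 identical 'scale' values."""
    return np.full(20, x.mean() * c / m)


signals = [np.zeros(64)] * 3 + [np.ones(64)] * 3
labels = np.array([0, 0, 0, 1, 1, 1])
best = maed_grid_search(signals, labels, toy_extract, (3, 4, 5), (2, 3),
                        combine=lambda d_entropy, d_std: d_entropy)  # placeholder
print(best)
```

Keeping `combine` explicit makes it easy to plug in the paper's MAED formula once it is available, without changing the search loop.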

Experimental verification
In this section, the proposed intelligent diagnosis method is applied to the analysis of a laboratory gearbox dataset and a wind turbine gearbox dataset to verify its effectiveness.

Experimental system description and input dataset introduction.
In this case, the gearbox vibration datasets were collected at the University of Connecticut [48][49]. The experimental platform is shown in Fig. 12. More details of the dataset can be found in Reference [49].
In this test, all nine gear conditions were applied to the test rig. Table 2 lists the complete experimental datasets and the input datasets.
Fig. 12 The gearbox experimental platform.
To comprehensively assess the advantages of the proposed feature extraction method, five other feature extraction methods are introduced for comparison. As shown in Table 3, GRCMSJE denotes the generalized refined composite multi-scale SJE based on the second-order moment. The parameters of all methods are the optimal values selected by MAED. Table 3 shows that RGCMSJE has the largest MAED value and the lowest CPU time, which indicates that it extracts more useful information from the time series and does so more efficiently. The multi-scale feature curves of all six methods are shown in Fig. 15. When the scale factor s is greater than 5, the RGCMSJE curves are the most widely separated and show no large fluctuations.
This shows that the multi-scale features extracted by RGCMSJE method are the most distinguishable and stable.

Dimension reduction and comparative analysis of feature space.
As can be seen from the feature curves in Fig. 15 (a) … where $R_s$ represents the feature selected by RSFS and s is the scale factor. To comprehensively analyze the performance of RSFS, Laplacian score (LS), Fisher score (FS) and max-relevance and min-redundancy (mRMR) are introduced for comparison. Following literature [50], the number of features retained after dimension reduction is set to n − 1, where n is the number of health states of the gearbox (n = 9 in this study). That is, the top eight features from each of the four dimension reduction methods are selected as sensitive features to compose a new feature set.
To describe the four feature selection methods more clearly, the feature space is explained here. In this experiment, the feature space extracted by RGCMSJE is a 900 × 20 matrix, with 900 data samples and 20 features. There are 9 gearbox health states, so the class labels are 1 to 9. In other words, each of the four feature selection algorithms selects eight features from the 900 × 20 feature space, giving a reduced feature space of 900 × 8. A detailed description of the feature space is given in Table 5. The performance comparison results of the four feature selection algorithms are shown in Fig. 16 and Table 4.
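The "score, rank, keep n − 1" step shared by the filter-style selectors can be illustrated with the Fisher score, one of the comparison methods; RSFS itself is not reproduced here. This sketch uses the standard Fisher score (between-class variance of the class means divided by the pooled within-class variance, per feature) on a synthetic stand-in for the 900 × 20 RGCMSJE matrix; the helper names and the synthetic data are assumptions.

```python
import numpy as np


def fisher_score(X, y):
    """Per-feature Fisher score: sum_k n_k (mu_k - mu)^2 / sum_k n_k var_k."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        num += len(Xk) * (Xk.mean(axis=0) - mu) ** 2
        den += len(Xk) * Xk.var(axis=0)
    return num / np.maximum(den, 1e-12)  # guard against constant features


def select_top_features(X, y, n_keep):
    """Keep the n_keep highest-scoring columns, best first."""
    idx = np.argsort(fisher_score(X, y))[::-1][:n_keep]
    return X[:, idx], idx


rng = np.random.default_rng(0)
y = np.repeat(np.arange(1, 10), 100)          # 9 health states x 100 samples
X = rng.standard_normal((900, 20))            # stand-in for the 900 x 20 RGCMSJE matrix
X[:, 5] += 2.0 * y                            # make one column strongly class-dependent
X_sel, idx = select_top_features(X, y, n_keep=len(np.unique(y)) - 1)
print(X_sel.shape, idx[0])                    # (900, 8) 5
```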
Fig. 16 visualizes the original signals and the distributions of all features. To analyze the feature space intuitively, the t-SNE algorithm is used to project the features into three-dimensional space. Fig. 16(a) shows that the original signals of the 9 health states overlap each other, which indicates that classifying the original signals directly is very difficult. Fig. 16 also shows that RGCMSJE clusters features of the same class together and separates features of different classes, which facilitates classification of the health conditions. Notably, the features selected by RSFS form tighter clusters. Although the other three feature selection algorithms separate the nine states well, their within-class clustering is not as tight. This shows that RSFS selects features carrying more health status information. To quantify the feature selection performance of the four methods, the between-class scatter, within-class scatter and CPU running time are compared, as shown in Table 4. The ratio of between-class scatter to within-class scatter is taken as the performance index of feature selection: the larger the index, the more concentrated the samples of the same class and the farther apart the samples of different classes. Table 4 shows that, compared with LS, FS and mRMR, RSFS has the largest performance index, while its CPU time is only twice that of LS and FS, which demonstrates the superiority of RSFS in feature selection.
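The performance index can be computed as shown below. The paper's exact scatter definitions are not given in this text, so this sketch assumes the common trace form: between-class scatter is the size-weighted squared distance of each class mean from the global mean, and within-class scatter is the summed squared distance of samples from their class mean. `scatter_ratio` is a hypothetical helper.

```python
import numpy as np


def scatter_ratio(X, y):
    """Between-class scatter / within-class scatter (trace form)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    s_between = s_within = 0.0
    for k in np.unique(y):
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        s_between += len(Xk) * np.sum((mk - mu) ** 2)
        s_within += np.sum((Xk - mk) ** 2)
    return s_between / s_within


X = np.array([[0.0], [0.1], [1.0], [1.1]])
print(scatter_ratio(X, [0, 0, 1, 1]))   # approximately 100.0
```

Here the class means are 1.0 apart while each sample lies 0.05 from its class mean, so the between-class scatter (1.0) is a hundred times the within-class scatter (0.01).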

Fault diagnosis results analysis.
Finally, once appropriate features are extracted, the ELM classifier is used to classify the health status of the wind turbine gearbox. Before ELM classification, the number of hidden nodes and an appropriate activation function must be chosen. The detailed results are shown in Figure 16, where the mean and standard deviation curves are obtained from 20 runs. … the stability of ELM classification gradually improves as the number of hidden nodes increases. As can be seen from Figure 16(b), when the number of hidden nodes is set to 70, the classification accuracy is the highest, the stability is the best and the CPU time is moderate. Therefore, the sine function is chosen as the activation function of ELM, and the number of hidden nodes is set to 70. To improve the reliability and accuracy of the classification results, 10-fold cross-validation is used in the training and testing of the intelligent classification algorithms in this section. The confusion matrix (CM) obtained by the ELM classifier is shown in Fig. 17; it is the result that occurred most often in the 20 experiments. It can be seen from … Table 6, where tp, tn, fp and fn denote true positives, true negatives, false positives and false negatives, respectively. … where k is the number of classes. Suppose the actual labels and the classified labels form two label sets; $n_{11}$ is the number of sample pairs placed in the same class in both sets, $n_{00}$ is the number of sample pairs placed in different classes in both sets, $C_n^2$ is the total number of sample pairs, $p(\cdot)$ is the probability function and $p(\cdot,\cdot)$ is the joint probability function of the two label sets. The larger the value of these evaluation indices, the stronger the overall classification ability of the classifier. Table 6 The performance evaluation metrics.

Metric    Equation    Range
ACC    $ACC = \frac{tp + tn}{tp + fp + tn + fn}$    [0, 1]

Table 7 lists the parameter settings of the five classifiers. The input feature sets of all five classifiers are obtained by RGCMSJE and RSFS. To reduce randomness, each method is tested with 10-fold cross-validation. The final results are shown in Fig. 19 and Table 8. The radar diagram in Fig. 19 shows that the curve of the proposed RGCMSJE-RSFS-ELM model is the farthest from the center. At the same time, Table 8 … Table 9. In this section, three working conditions of both the healthy gearbox and the gearbox with a broken tooth are selected as the research objects, with condition labels h30hz0, h30hz20, h30hz90, b30hz0, b30hz20 and b30hz90, respectively. As in Experiment 1, the 102400 points from sensor 1 and sensor 2 are divided into 50 data samples of 2048 points each. 10-fold cross-validation is again applied to the training and testing of the feature set.

Parameter selection and feature sets determination
As in Experiment 1, RGCMSJE is used to analyze the recorded experimental data to extract feature information efficiently and to verify the effectiveness and generalization ability of the method. First, MAED is used to select an appropriate parameter combination, and then RGCMSJE is applied with the optimal combination to extract a set of 20-dimensional multi-scale feature vectors. The constraints in this section are the same as in Experiment 1, and the MAED curve and the corresponding CPU time are shown in Fig. 21. The maximum of the MAED curve in Figure 21 is 14.6402, attained at the parameter combination (c, m) = (4, 2). This shows that for the WTGF dataset, RGCMSJE extracts richer fault information efficiently when c = 4 and m = 2.
Therefore, considering both CPU time and MAED, the number of classes is set to c = 4 and the embedding dimension to m = 2. RGCMSJE is then compared with the other five feature extraction algorithms, and the results are shown in Table 10 and Fig. 22. Table 10 shows that the results agree with Experiment 1: RGCMSJE has the largest MAED value and the smallest CPU time, which indicates that the method generalizes well. The 20-dimensional feature curves in Fig. 22 confirm this: the RGCMSJE curves have the greatest discrimination and the smallest standard deviation.

Dimension reduction and comparative analysis of feature space
Subsequently, following literature [50], the number of features retained after dimensionality reduction is set to n − 1, where n is the number of gearbox operating conditions (n = 6). That is, the first six features are selected as sensitive features to form a new 6-dimensional feature vector set. RSFS, the feature selection method adopted in this paper, is again compared with LS, FS and mRMR. Figure 23 shows the raw signals and the distributions of all features, projected into three-dimensional space with the t-SNE algorithm. Fig. 23 clearly shows that when 6 sensitive features are selected by RSFS, the distances between different classes are larger and the clusters of the same class are tighter. Comparing Fig. 23 (a), (b), (d), (e) and (f) shows that RGCMSJE extracts recognizable features from cluttered, inseparable signals. … Table 11. As Table 11 shows, compared with LS, FS and mRMR, RSFS has the largest feature selection performance index, and its CPU time is only 0.0905 s, which further demonstrates the superiority of RSFS in feature selection. Therefore, RSFS is chosen as the dimension reduction algorithm for the high-dimensional feature space.

Fault diagnosis results analysis
As in Experiment 1, the reduced features are fed into the ELM classifier for gearbox fault classification, with the same ELM settings: the sine activation function and 70 hidden nodes. As in Section 6.1, the overall performance of the five classifiers is compared to verify the efficiency of ELM. The parameter settings of the five classifiers are the same as in Table 7, and their input feature sets are obtained by RGCMSJE and RSFS. Each algorithm is run 20 times, and the detailed comparison results are given in Fig. 26 and Table 12. The radar chart in Figure 26 clearly shows that all four evaluation metrics of the proposed RGCMSJE-RSFS-ELM curve are the farthest from the center, which indicates that ELM has the best overall classification ability and that the proposed RGCMSJE-RSFS-ELM model has the highest fault recognition ability. In addition, Table 12 shows that the ELM classifier consumes the least CPU time for training and testing.

Further discussion
The above comparative analysis shows that the proposed method combines the advantages of RGCMSJE, MAED, RSFS and ELM, so that the fault states of gearbox components can be identified effectively.
The proposed method focuses on four aspects:
(1) Previous reports have shown that DE has advantages in computing time and reliability owing to its unique symbolization and linear mapping rules. However, like SE and PE, DE considers only the pattern probability of the current state and ignores the probability of transitions from one state to another, which carry fault information. SJE is therefore proposed: it retains the advantages of DE while considering both the pattern probability of the current state and the transition probability. As a result, SJE can extract fault information stably and efficiently.
(2) Generalized composite multi-scale analysis effectively solves the problem of missing abrupt-change behavior in composite multi-scale analysis, but it is not suitable for DE and SJE, because much useful information disappears under the second-moment transformation, which makes the entropy values unstable. We therefore propose refined generalized composite multiscale analysis, whose refined generalized analysis and composite operation avoid the information loss caused by the second moment and extract information on multiple time scales.
(3) Previous studies chose entropy parameters through traversal simulations on simulated signals, so the parameters could not be tailored to specific fault data, and the selection relied on the researchers' experience, which inevitably introduces errors.
In this paper, MAED is proposed to determine the parameters of RGCMSJE, which allows the best parameters to be chosen adaptively for the specific data being analyzed.
(4) In feature selection, LS lacks global separability, whereas FS lacks local information preservation. We therefore apply RSFS to reduce the dimension of the multi-dimensional feature space. RSFS uses robust local learning to handle noise in the cluster labels, improving the retention of local information while taking global information into account.

Conclusion
In this paper, a novel intelligent fault diagnosis method for wind turbine gearboxes