Three-Stage Wind Turbine Assessment Method: Condition Monitoring, Failure Prediction, And Health Assessment

To improve the early-warning capability for wind turbine shutdowns, we present a generalized model for wind turbine (WT) prognostics and health management (PHM) based on data collected from the SCADA system. First, a new condition monitoring method based on kernel entropy component analysis (KECA) was developed for nonlinear data. Then, an aggregate statistic, $D^2$, was designed to express the state change of the monitoring parameters. Because the features are obscured by the diversity and nonlinearity of SCADA data, an enhanced generalized regression neural network (GRNN) method for failure prediction, KECA-GRNN, was developed by adding KECA feature extraction to improve the predictive performance. Finally, the fusion residual was defined to analyze the health trend of the WT, and the health status of the WT was represented by two visualization methods, a bubble chart and a fuzzy comprehensive evaluation, which made the health assessment results more intuitive. The methods were evaluated using SCADA data collected from a wind farm. The results indicate that the approach can trend and assess turbine degradation before known downtime occurrences.


INTRODUCTION
With the intensification of the energy crisis, wind energy has become an important source of renewable energy and has gradually come to play a significant role in the global energy mix. In general, wind turbines (WTs) operate in harsh environments under dramatically fluctuating operating conditions and are often subjected to strong mechanical stress. Their operating state and load-bearing conditions are random and unstable. Therefore, methods to improve the availability of WTs, reduce operation and maintenance costs, and enhance the economic benefits of wind farms are of great significance. To date, extensive research has been carried out on condition monitoring and fault diagnosis of key components of the WT [1]. Condition monitoring systems (CMSs) are capable of processing high-frequency signals. However, because of data acquisition problems, incomplete data, and the inability to analyze all fault information, many scholars have recently adopted prognostics and health management (PHM) systems. PHM of WTs, including the development of assessment systems for monitoring and managing the health status of WTs across their entire life cycle, has become an important research focus. Various algorithms and intelligent models are used as part of PHM to monitor, predict, and manage the health of WT systems in order to achieve condition-based or predictive maintenance. Performance degradation prognostics and health assessment of WTs form the basis of PHM research.
In recent years, data-driven multivariate statistical monitoring methods have been widely used in industrial process condition monitoring [2]. The core idea of multivariate statistical monitoring is to transform the input space into a feature space and a residual space through dimension reduction. A set of low-dimensional variables containing the important features is constructed to summarize the information carried by the high-dimensional data; typical methods include principal component analysis (PCA), partial least squares (PLS), independent component analysis (ICA), and related improved algorithms [3]. In the monitoring calculation, the data are projected onto a lower-dimensional space, and the squared prediction error (SPE) and Hotelling's $T^2$ statistic are used to quantify the relationship between the test data and the modeled data; a statistic that exceeds its control limit indicates an abnormal condition. PCA is the most widely used algorithm and can effectively reduce the dimension of the data while retaining the maximum variance of the original data. However, PCA targets linear systems, whereas WT state monitoring data mainly come from SCADA systems and are multivariable, nonlinear, and highly coupled. The application of PCA to SCADA data is therefore not ideal. Accordingly, Reference 4 combined the kernel function with the PCA algorithm and proposed kernel principal component analysis (KPCA). With the kernel function, PCA is extended to a high-dimensional feature space to handle the nonlinearity of the process variables and achieve more effective process monitoring. In 2010, Reference 5 proposed kernel entropy component analysis (KECA).
KECA combines the kernel function with the concept of information entropy. It maps the data to a high-dimensional space through kernel mapping, which addresses the nonlinearity of the data, and reduces the dimension of the data in the high-dimensional feature space according to the size of the kernel entropy, so that the extracted features carry deeper information. Unlike PCA and KPCA, which use the size of the eigenvalue as the selection indicator, KECA reduces the dimension by disclosing the structure of the dataset through information entropy, thus revealing the data information more effectively.
In addition to condition monitoring, fault prediction and health assessment are key research topics in industrial equipment maintenance. Fault prediction methods for equipment include the probabilistic graphical model [6], the model-based method [7], and the artificial neural network (ANN) method [8]. Probabilistic graphical models include the Bayesian network and the Markov network. The Bayesian network is a directed graph representation that uses probabilistic inference to handle uncertainty, fuses multidimensional information, and has considerable advantages in diagnosing faults of complex equipment. ANN is one of the most popular methods in this field. It is a model that simulates biological neural networks [9], constructing an artificial network modeled on the neuron structure of the human brain. As training samples are presented, the weights of the network are adjusted continuously to approximate the relationship between the input and the output as closely as possible. When new data are fed to the trained network, it produces reasonable output data. The main idea of using the ANN method to evaluate the health of equipment is to compare the output of the ANN with the actual operation data and take the deviation as the health evaluation result. The method can handle multi-input, multi-output, quantitative, or qualitative complex systems and has good data fusion, adaptive, and parallel processing capabilities. Therefore, it is very suitable for WT fault detection and prediction. In the field of fault prediction, the generalized regression neural network (GRNN) is the most widely used ANN algorithm. GRNN can realize a variety of complex nonlinear mappings and has powerful pattern recognition and data fitting capabilities [9,10]. It can learn and store a large number of input-output mapping relationships without the mathematical equations of these mappings being known in advance [11,12]. GRNN [13] also learns quickly and converges to the optimal regression as the number of samples grows. Its prediction results are satisfactory even when the training sample is small, and the network can process unstable data. Because GRNN can capture the complex nonlinear mapping relationship between the interface energy and various factors, ANN and GRNN are considered to have unique advantages in interface energy quantization [14].

To better monitor the condition of WT components and predict faults, this study first describes the three-stage wind turbine assessment framework. We propose combining KECA with the GRNN prediction model, which not only extracts multiscale information better but also allows the residual error after prediction to be analyzed to obtain the fault prediction results. To complete the framework and make the experimental results objective and easy to understand, we further studied health visualization and conducted a visual analysis of the health assessment of the WT gearbox from the perspective of one and three dimensions by using a fuzzy comprehensive evaluation method and a bubble diagram method. The rest of this article is organized as follows. The cleaning of the experimental data is summarized in Section 2. The condition monitoring method based on KECA-$D^2$ is outlined in Section 3. The fault prediction framework using KECA-GRNN is introduced in Section 4. The visualization results of the health assessment of the WT gearbox are presented in Section 5. The conclusion is presented in Section 6.

FRAMEWORK OF THE GENERALIZED MODEL
Figure 1 shows the framework of the three-stage WT assessment method. The overall algorithm has the following three stages.
Stage 1: WT gearbox condition monitoring using KECA-$D^2$. According to the characteristics of SCADA data, a WT state monitoring method, KECA-$D^2$, is proposed. The KECA algorithm is used for dimension reduction and feature extraction on the cleaned experimental data, and the feature information is expressed in the form of calculated entropy to reveal the characteristics of the data structure and to avoid the influence of data nonlinearity on deep information extraction. Then, a new comprehensive statistic $D^2$ suitable for KECA is proposed to diagnose the monitored status. Finally, the proposed method is compared with four sets of algorithms, including PCA-SPE and KPCA-SPE, to verify its effectiveness.
Stage 2: Fault prediction of the WT gearbox based on the KECA-GRNN method. As KECA can extract deeper and more useful feature information from abundant SCADA data, the proposed method combines KECA and GRNN and uses the principal component information extracted by KECA as the input data of the GRNN to establish a prediction model, which reduces the runtime of the neural network and improves the generalization ability of the algorithm. The KECA-GRNN prediction model is used to predict the evaluation parameters, and the predictions are compared with the actual evaluation parameter data. Hidden defects of the gearbox are detected through the prediction residual, and the occurrence of faults is predicted.
Stage 3: Health visualization research. On the basis of the prediction residual obtained by the KECA-GRNN model, the health evaluation results of the gearbox are visualized by two methods, fuzzy comprehensive evaluation and a bubble diagram, making the results clear and easy to interpret.

FIGURE 1
Overall algorithm

In this section, the evaluation parameters and the experimental parameters are selected for the subsequent experiments [15,16]. An evaluation parameter is a parameter that represents the trend of a fault. Through its change, the time at which the fault occurs can be analyzed, and the health state can then be evaluated. The experimental parameters are the inputs of the subsequent KECA monitoring model, which monitors the operating state of the WT gearbox. The experimental parameters are selected by calculating the correlation between the candidate parameters and the evaluation parameters. The correlation reflects the degree to which the evaluation parameters change when a candidate parameter changes, and the parameters with high correlation are selected as the experimental parameters. Table 1 describes the common faults of WT gearboxes by category [4,17,18]. According to the troubleshooting methods summarized in Table 1, this article analyzes the gearbox oil temperature, bearing temperature, and lubricating oil inlet pressure; abnormal changes in these three parameters can effectively reveal the early symptoms of gearbox failure.

Parameter Selection
As not all the characteristics in SCADA data are related to the faults of the WT gearbox, we first need to determine the subset of characteristics collected by the SCADA system that reflects the running state of the WT gearbox. The traditional parameter selection method is mainly based on physical knowledge of each component of the WT or on the experience of the relevant staff, and it lacks theoretical support. On the basis of the evaluation parameters selected above to represent gearbox faults, this article proposes a SCADA parameter selection method based on Pearson's correlation coefficient. Pearson's correlation coefficient is quick to compute and easy to interpret, and it allows the characteristic parameters with the largest correlation with the expected output to be selected as the subsequent experimental parameters.
Through the correlation analysis, Pearson's correlation coefficient (Sedgwick 2012) between the other parameters of the SCADA system and the gearbox bearing temperature, gearbox oil temperature, and lubricating oil inlet pressure is obtained, as shown in Table 2. Variables with a large correlation coefficient in Table 2 are selected as the final set of correlated variables for state monitoring. Output power, engine speed, rotor speed, wind speed, and gearbox vibration are selected as the input parameters for state monitoring.
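The correlation-based selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pearson` and `select_parameters`, the candidate names, the toy values, and the selection threshold are all hypothetical, since the article reports only that variables with a large coefficient in Table 2 were kept.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_parameters(candidates, target, threshold):
    """Keep the candidates whose |r| with the target reaches the threshold.

    The threshold is a hypothetical tuning value, not taken from the article.
    Returns the kept names (strongest first) and all coefficients.
    """
    scores = {name: pearson(values, target) for name, values in candidates.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    kept.sort(key=lambda name: -abs(scores[name]))
    return kept, scores

# Toy data (illustrative only): one evaluation parameter and two candidates.
oil_temp = [50.0, 52.0, 55.0, 59.0, 64.0, 70.0]
candidates = {
    "output_power": [200.0, 260.0, 330.0, 420.0, 520.0, 640.0],
    "nacelle_humidity": [40.0, 42.0, 39.0, 41.0, 40.0, 41.0],
}
selected, scores = select_parameters(candidates, oil_temp, threshold=0.8)
print(selected, scores)
```

In practice the same ranking would be repeated for each evaluation parameter (oil temperature, bearing temperature, inlet pressure), and the union or intersection of the selected sets would be a further design choice.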

Data Preprocessing
Accurate and credible supervisory control and data acquisition (SCADA) data are the basis of power generation performance prediction, health status prediction, and wind power evaluation. However, because of the severe operating environment of WTs, many of the data collected on site are of poor quality, which considerably hinders information mining and further applications of SCADA data. The causes of abnormal data include communication failure, equipment abnormality, wind curtailment and power limiting, and the fluctuation of working conditions, among which the abnormal data caused by the latter two are particularly prominent. A high proportion of abnormal data strongly distorts the patterns extracted from SCADA data and the correlation between parameters; therefore, WT data cleaning is very important. Figure 2 shows the wind speed-output power scatter plot of the WT used in this experiment. The abnormal data fall into three categories:
i. The first category is the data with a high wind speed but zero power over a continuous period, located at the bottom of the curve.
ii. The second category is the data with a high wind speed and low power, or power below the rated power, located in the middle of the curve, namely, the wind curtailment data or fault data.
iii. The third category is the data with a low wind speed but high power.
In this study, the bin method was adopted: according to the location of the abnormal data, the data lying outside the power curve for each wind speed bin are treated as abnormal, which allows multiple types of abnormal data to be cleaned. Moreover, no data samples are needed for training, so the method is widely applicable.
According to the standard deviation $\sigma$ of the distribution of the WT operation data, the data located within $3\sigma$ are treated as normal data, and the rest are treated as abnormal data.
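One possible reading of the bin method with the 3-sigma rule is sketched below. It is an assumption-laden illustration rather than the authors' procedure: the article does not state the bin width or how the per-bin reference is computed, so the function name `clean_by_bins`, the 0.5 m/s bin width, and the use of the per-bin mean and population standard deviation are all hypothetical choices.

```python
from collections import defaultdict
from statistics import mean, pstdev

def clean_by_bins(wind_speed, power, bin_width=0.5, k=3.0):
    """Return a keep-mask: True where power lies within mean +/- k*sigma
    of the samples sharing the same wind-speed bin.

    bin_width (m/s) and k are assumed values, not taken from the article.
    """
    bins = defaultdict(list)
    for i, v in enumerate(wind_speed):
        bins[int(v // bin_width)].append(i)   # group sample indices by bin

    keep = [True] * len(wind_speed)
    for indices in bins.values():
        values = [power[i] for i in indices]
        mu, sigma = mean(values), pstdev(values)
        for i in indices:
            if sigma > 0 and abs(power[i] - mu) > k * sigma:
                keep[i] = False               # outside the 3-sigma band
    return keep

# Toy bin: 20 plausible samples near 500 kW and one zero-power sample
# at a high wind speed (the first abnormal category).
wind = [8.0 + 0.02 * i for i in range(20)] + [8.2]
power = [500.0 + (i % 5) for i in range(20)] + [0.0]
keep = clean_by_bins(wind, power)
print(sum(keep), "of", len(keep), "samples kept")
```

A single pass can miss outliers when a bin is heavily contaminated, because the outliers inflate sigma; repeating the pass until no sample is removed is a common remedy, though the article does not say whether this was done.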

FIGURE 2
Data preprocessing method

CONDITION MONITORING USING KECA-$D^2$
The KECA status monitoring model is established to calculate the detection statistic between the principal component model and the data to be tested and thereby determine whether there are any abnormalities. The detection statistic used in this study is the comprehensive statistic $D^2$ of the residual space, which is the synthesis of the squared prediction error (SPE) and the Cauchy-Schwarz (CS) statistic.

Kernel Entropy Component Analysis (KECA)
Kernel entropy component analysis (KECA) was proposed by Reference 5 in 2010. It reveals structure related to the Renyi entropy of the input-space dataset and does not necessarily use the top eigenvalues and eigenvectors of the kernel matrix. In KECA, the data are mapped from a low-dimensional space to a high-dimensional feature space through kernel mapping, and feature vectors are selected in the high-dimensional feature space according to their contribution to the kernel entropy. The distribution of the feature vectors has an angular structure about the origin, showing significant angle differences among different feature information.
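Assuming that the dataset $D = \{x_1, \ldots, x_N\}$ is generated from an underlying probability density function $p(x)$, the Renyi quadratic entropy can be defined as follows [20]:

$$H(p) = -\log \int p^2(x)\,dx \qquad (1)$$

Because the logarithm is monotonic, it suffices to consider $V(p) = \int p^2(x)\,dx$. To estimate $V(p)$, and hence $H(p)$, a Parzen window density estimator is invoked, as described in Reference 5:

$$\hat{p}(x) = \frac{1}{N} \sum_{x_t \in D} k_\sigma(x, x_t) \qquad (2)$$

where $k_\sigma(x, x_t)$ is a Mercer kernel function and $\sigma$ is the parameter of the kernel function, also called the Parzen window width. Using the sample mean approximation of the expectation operator, we then have the following [21]:

$$\hat{V}(p) = \frac{1}{N} \sum_{x_t \in D} \hat{p}(x_t) = \frac{1}{N^2}\, \mathbf{1}^T K \mathbf{1} \qquad (3)$$

where $K$ is the $N \times N$ kernel matrix with elements $k_\sigma(x_t, x_{t'})$ and $\mathbf{1}$ is an $N \times 1$ vector where each element is equal to one. To this end, the Renyi entropy estimator may be expressed in terms of the eigenvalues and the corresponding eigenvectors of the kernel matrix, which may be decomposed as $K = E D E^T$, with $D$ being a diagonal matrix storing the eigenvalues $\lambda_1, \ldots, \lambda_N$ and $E$ being a matrix with the eigenvectors $e_1, \ldots, e_N$ as the columns. Eq. (3) can also be written as follows:

$$\hat{V}(p) = \frac{1}{N^2} \sum_{i=1}^{N} \left( \sqrt{\lambda_i}\, e_i^T \mathbf{1} \right)^2 \qquad (4)$$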
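The entropy-based component selection can be illustrated with a short NumPy sketch. It assumes a Gaussian (RBF) kernel and ranks each eigenpair by its term in the eigen-expansion of the entropy estimate, keeping the smallest set whose cumulative contribution reaches a chosen ratio (the article later uses 95%). The function names `rbf_kernel`, `keca_fit`, and `keca_transform`, the kernel width, and the random data are hypothetical and are not the authors' code.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    d2 = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def keca_fit(X, sigma=1.0, cum_contribution=0.95):
    """Select kernel axes by their contribution to the Renyi entropy estimate."""
    N = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    eigvals, eigvecs = np.linalg.eigh(K)          # K = E D E^T
    eigvals = np.clip(eigvals, 0.0, None)         # guard against round-off
    # Entropy term of eigenpair i: (sqrt(lambda_i) * e_i^T 1)^2
    contrib = eigvals * eigvecs.sum(axis=0) ** 2
    order = np.argsort(contrib)[::-1]             # largest contribution first
    ratio = np.cumsum(contrib[order]) / contrib.sum()
    n_keep = int(np.searchsorted(ratio, cum_contribution) + 1)
    idx = order[:n_keep]
    return {"X": X, "sigma": sigma,
            "eigvals": eigvals[idx], "eigvecs": eigvecs[:, idx],
            "V_hat": contrib.sum() / N**2}        # equals 1^T K 1 / N^2

def keca_transform(model, X_new):
    """Project new samples onto the selected kernel entropy components."""
    K_new = rbf_kernel(X_new, model["X"], model["sigma"])
    return K_new @ model["eigvecs"] / np.sqrt(model["eigvals"])

# Toy data standing in for standardized SCADA samples (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 5))
model = keca_fit(X_train, sigma=2.0, cum_contribution=0.95)
T_train = keca_transform(model, X_train)
print(T_train.shape, model["V_hat"])
```

Unlike KPCA, the kept axes are not necessarily those with the largest eigenvalues, because an eigenvector that is nearly orthogonal to the all-ones vector contributes little to the entropy estimate even when its eigenvalue is large.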

$D^2$ Statistic
Following the PCA monitoring approach, we used the SPE statistic to monitor the occurrence of faults. Moreover, KECA has been shown to reveal a distinct angular structure in the data, with different clusters distributed more or less in different angular directions [5]. The Cauchy-Schwarz (CS) divergence between probability density functions corresponds to the cosine of the angle between the kernel feature space mean vectors and can therefore express this angular structure. Combining these two considerations, we fused the SPE and CS statistics through the D-S evidence theory to form a new statistic suitable for KECA monitoring, named $D^2$. Through the fusion of these two statistics, both the numerical changes and the structural changes of the data are taken into account. In this section, the support vector data description (SVDD) algorithm [22] is introduced to obtain the control limit of the $D^2$ statistic.

SPE Statistic
The SPE statistic reflects the degree of deviation between the model and the test values at a given time. In the calculation formula of the SPE statistic, $t_i$ is the $i$th kernel principal component of the input vector $x$ in the feature space, and $P_R$ is the feature vector extracted by KECA. When the confidence level is $\gamma$, the control limit of the SPE statistic can be calculated, where $C_\gamma$ is the critical value of the standard normal distribution at the corresponding test level.

CS Divergence
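The CS divergence between probability density functions corresponds to the cosine of the angle between the kernel feature space mean vectors, which can express the angular structure. It is a measure of the "distance" between two probability density functions, $p_1(x)$ and $p_2(x)$, given as follows [23]:

$$D_{CS}(p_1, p_2) = -\log \frac{\int p_1(x)\, p_2(x)\,dx}{\sqrt{\int p_1^2(x)\,dx \int p_2^2(x)\,dx}}$$

Estimating the densities via Parzen windowing, we obtain the following:

$$\hat{D}_{CS}(p_1, p_2) = -\log \frac{m_1^T m_2}{\lVert m_1 \rVert\, \lVert m_2 \rVert} = -\log \cos\angle(m_1, m_2)$$

where $m_1$ and $m_2$ are the kernel feature space mean vectors of the data drawn from $p_1(x)$ and $p_2(x)$, respectively. The CS divergence thus basically measures the cosine of the angle between these mean vectors. The CS statistic in this paper is defined with respect to $m$, the mean vector of the normal data.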
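The Parzen-window estimate of the CS divergence needs only kernel matrix averages, because the inner product of two feature-space mean vectors equals the mean of the corresponding cross-kernel matrix. The sketch below shows this for a Gaussian kernel. It illustrates the divergence between two sample sets only; it is not the paper's CS monitoring statistic, whose exact definition is not recoverable here, and the names `rbf_kernel` and `cs_divergence`, the kernel width, and the data are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    d2 = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def cs_divergence(X1, X2, sigma=1.0):
    """Parzen estimate of the Cauchy-Schwarz divergence between two samples.

    m1^T m2 is the mean of the cross-kernel matrix, and ||m_i||^2 is the
    mean of the kernel matrix of sample i with itself.
    """
    k11 = rbf_kernel(X1, X1, sigma).mean()
    k22 = rbf_kernel(X2, X2, sigma).mean()
    k12 = rbf_kernel(X1, X2, sigma).mean()
    cos_angle = k12 / np.sqrt(k11 * k22)
    return -np.log(cos_angle)

# Toy check: a sample compared with itself and with a shifted copy.
rng = np.random.default_rng(1)
normal = rng.normal(size=(50, 2))
shifted = normal + 3.0
d_same = cs_divergence(normal, normal)
d_shift = cs_divergence(normal, shifted)
print(d_same, d_shift)
```

The estimate is zero when the two mean vectors point in the same direction and grows as the angle between them widens, which is why it complements a magnitude-based statistic such as SPE.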

SVDD
Proposed by Reference 23, SVDD is a one-class learning method inspired by the support vector classifier. It obtains a spherically shaped boundary around a dataset and, analogously to the support vector classifier, can be made flexible by using other kernel functions. The main idea of SVDD is to find a hypersphere that is as small as possible while enclosing as much of the training data as possible. The hypersphere is determined by its center $a$ and radius $R$. Let the training data be $\{x_i,\ i = 1, \ldots, N\}$. To tolerate outliers in the training data, we introduce slack variables $\xi_i \ge 0$, and the minimization problem becomes the following:

$$\min_{R,\, a,\, \xi} \; R^2 + C \sum_{i} \xi_i \quad \text{s.t.} \quad \lVert x_i - a \rVert^2 \le R^2 + \xi_i, \quad \xi_i \ge 0$$

The parameter $C$ controls the trade-off between the volume and the errors. The dual problem of the above equation can be expressed as follows:

$$\max_{\alpha} \; \sum_{i} \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i} \alpha_i = 1$$

According to the Kuhn-Tucker condition, the center of the hypersphere can be expressed as $a = \sum_i \alpha_i x_i$. The radius can be represented by the center of the hypersphere $a$ and a corresponding support vector $x_k$ on the boundary of the sphere ($\alpha_k > 0$):

$$R^2 = (x_k \cdot x_k) - 2 \sum_{i} \alpha_i (x_i \cdot x_k) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j)$$

After the completion of the training phase, a test point $z$ is accepted if it meets the following decision condition; otherwise, it is rejected:

$$\lVert z - a \rVert^2 = (z \cdot z) - 2 \sum_{i} \alpha_i (z \cdot x_i) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) \le R^2 \qquad (14)$$

The control limit $D_r^2$ of $D^2$ was obtained using the SVDD method; if $D^2 < D_r^2$, the $D^2$ statistic of the input vector is normal.

Condition Monitoring Framework Using KECA-$D^2$
The proposed process monitoring method based on KECA-$D^2$ is illustrated in Figure 3 and can be summarized as follows.
Offline training:
Step 1: Preprocess and standardize the data of the normal working conditions as the sample data.
Step 2: Establish the kernel matrix for the sample data. The kernel matrix is decomposed, the corresponding eigenvalues and eigenvectors are obtained, the Renyi quadratic entropy is calculated, and the corresponding principal components are selected according to the contribution rate of the Renyi quadratic entropy. In this study, the principal components whose cumulative contribution rate exceeded 95% were selected.
Step 3: Establish the KECA monitoring model for the data under normal working conditions. The $D^2$ control limit $D_r^2$ is estimated using KDE.
Online monitoring:
Step 1: Preprocess and standardize the tested data in the same way as the training data.
Step 2: Calculate the corresponding kernel matrix and the principal component matrix.
Step 3: Calculate the $D^2$ statistic and compare it with the control limit $D_r^2$ to judge whether a failure has occurred.

To verify the effectiveness and practicability of the KECA-$D^2$ method proposed in this article, the method was applied to the actual operating data collected from the WT between February 21 and April 16, 2018; the faulty part was the gearbox. Focusing on the performance of the WT, we selected the output power (kW), temperature of the shaft end of the gearbox (°C), rotor speed (rpm), wind speed (m/s), and temperature of the gearbox cooling water (°C) to establish the KECA-$D^2$ model and monitor the abnormal conditions of the WT gearbox. We selected the data for these parameters from February 21 to March 1 as the experimental training data (ED), from March 2 to March 11 as the model validation data (VD), and from March 12 to April 16 as the test data (TD) for the subsequent experimental analysis. First, the model verification experiment was conducted by using the validation data. A monitoring model was established by extracting the principal components from the experimental training samples ED, and the model verification samples VD were taken as the test dataset in the monitoring stage. The statistics were calculated, and their relationship with the statistical limits was analyzed. According to the actual situation, the gearbox was in the normal working condition during this period; therefore, the statistics calculated by the model should be below the threshold. Figure 4 shows the status monitoring results of the WT gearbox with the abovementioned five monitoring models under normal working conditions.

KECA test results of the gearbox between March 12 and April 16

FAULT PREDICTION FRAMEWORK USING KECA-GRNN
Condition monitoring based on KECA can only judge whether the condition is abnormal or whether a fault occurs within a certain timeframe; it cannot predict the trend of fault occurrence or identify potential faults in advance. Therefore, in this study, we propose a prediction model based on KECA-GRNN. With GRNN, the prediction results can be satisfactory even if the training sample is small, and the network can handle unstable data. As the KECA algorithm can extract effective information from a large amount of SCADA data, we combine it with the GRNN algorithm and use the principal component information extracted by KECA as the input data of the GRNN to establish a trend prediction model.

KECA-GRNN Prediction Model
The training speed and the accuracy of the GRNN are affected by the number of samples and the correlation between them. In this section, the KECA-GRNN prediction model is proposed. The main idea is as follows: before GRNN training, the training data are processed with KECA to obtain their principal components, which eliminates the redundancy among the training data and extracts the feature information at a deeper level so that the GRNN can process the information better. The structure of the KECA-GRNN model is shown in Figure 6.
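For reference, the GRNN at the core of the model can be written as a kernel-weighted average of the stored training targets. The sketch below implements that standard form for a single output; it is not the authors' code. The class name `GRNN`, the smoothing parameter value, and the toy sine data are assumptions, and in the KECA-GRNN setting the inputs would be the KECA components rather than raw values.

```python
import numpy as np

class GRNN:
    """Generalized regression neural network for a single output.

    Prediction = sum_i y_i * w_i(x) / sum_i w_i(x), where
    w_i(x) = exp(-||x - x_i||^2 / (2 * sigma^2)).
    """

    def __init__(self, sigma=0.1):
        self.sigma = sigma              # smoothing parameter (assumed value)

    def fit(self, X, y):
        # "Training" only stores the samples (pattern layer).
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        return self

    def predict(self, X_new):
        X_new = np.asarray(X_new, dtype=float)
        diff = X_new[:, None, :] - self.X[None, :, :]
        d2 = np.sum(diff**2, axis=2)                 # squared distances
        w = np.exp(-d2 / (2.0 * self.sigma**2))      # pattern-layer outputs
        # Summation and output layers; assumes at least one nonzero weight.
        return (w @ self.y) / w.sum(axis=1)

# Toy regression (illustrative only): fit a sine curve and form residuals.
x_train = np.linspace(0.0, 1.0, 21)[:, None]
y_train = np.sin(2.0 * np.pi * x_train[:, 0])
grnn = GRNN(sigma=0.02).fit(x_train, y_train)
y_hat = grnn.predict(x_train)
residual = y_train - y_hat
print(np.max(np.abs(residual)))
```

Because every stored sample enters each prediction, the cost of a prediction grows with the training-set size and input dimension, which is one reason reducing the inputs to a few components beforehand can shorten the runtime.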

Residual Analysis by KECA-GRNN
As an important regression diagnostic quantity, the residual error implies important information assumed by the model. Through the residual analysis, the following problems can be solved to some extent: (1) rationality and feasibility of the KECA-GRNN regression prediction model, (2) rationality of the independent variable selection in the data preprocessing stage and model fitting, and (3) use of the calculated residual error as an evaluation object for a subsequent health evaluation.
The prediction residual error can reflect whether the working state of the gearbox is abnormal. When the gearbox works normally, the KECA-GRNN model has a good prediction effect on the gearbox state parameters, and the residual value is zero or very small. When the gearbox is abnormal, its dynamic characteristics change, and the relationship between the variables in the observed parameters changes abnormally, deviating from the normal working state. The predicted residual of the gearbox parameters increases, and the residual distribution will be significantly different from the residual distribution under the normal working state.
To detect hidden defects of the gearbox from the change in the statistical characteristics of the residual, we used kernel density estimation to calculate the threshold of the residual. When a residual of the prediction model exceeds its threshold, an early fault alarm is issued. However, the residual of a single parameter is not sufficient to represent the overall health state of the gearbox. A fusion analysis of the residuals of all the predicted parameters is more reliable. Therefore, in this study, we define the fusion residual as follows:

$$\varepsilon = \sum_{i=1}^{n} \omega_i \varepsilon_i \qquad (15)$$

where $\varepsilon_i$ is the residual of parameter $i$, and $\omega_i$ is the weight factor that expresses how strongly parameter $i$ affects the health condition of the gearbox. The weight factors differ from one WT to another, and their values can be deduced from the early operation state and operating experience of the WT.
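A weighted fusion of residuals and a KDE-based alarm threshold can be sketched with the standard library alone. The weights 0.4, 0.4, and 0.2 are the values reported in the results section; everything else is assumed for illustration, including the function names `fusion_residual` and `kde_upper_threshold`, the Gaussian kernel, the rule-of-thumb bandwidth, the 0.99 confidence level, and the synthetic residual series.

```python
import math
from statistics import pstdev

def fusion_residual(residuals, weights):
    """Weighted sum of per-parameter residual series, one value per time step."""
    n = len(residuals[0])
    return [sum(w * r[t] for w, r in zip(weights, residuals)) for t in range(n)]

def kde_upper_threshold(samples, confidence=0.99):
    """Upper alarm limit: the `confidence` quantile of a Gaussian KDE.

    Bandwidth follows a rule of thumb (1.06 * std * n^(-1/5)); this is an
    assumed choice, and the samples must not all be equal.
    """
    n = len(samples)
    h = 1.06 * pstdev(samples) * n ** (-0.2)

    def cdf(t):
        # The KDE's CDF is the average of Gaussian CDFs centred on the samples.
        return sum(0.5 * (1.0 + math.erf((t - s) / (h * math.sqrt(2.0))))
                   for s in samples) / n

    lo, hi = min(samples) - 10.0 * h, max(samples) + 10.0 * h
    for _ in range(100):                  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return hi

weights = [0.4, 0.4, 0.2]   # oil temperature, bearing temperature, inlet pressure

# Synthetic residuals for a healthy period (illustrative only).
normal = [
    [0.10 * math.sin(t) for t in range(200)],
    [0.10 * math.cos(t) for t in range(200)],
    [0.05 * math.sin(2 * t) for t in range(200)],
]
threshold = kde_upper_threshold(fusion_residual(normal, weights))

# Three test instants: two healthy, one with large residuals on all parameters.
test = [[0.01, 0.02, 2.0], [0.00, 0.01, 1.5], [0.00, 0.00, 0.5]]
fused_test = fusion_residual(test, weights)
alarms = [r > threshold for r in fused_test]
print(threshold, fused_test, alarms)
```

Estimating the threshold from a healthy period and then applying it to later data mirrors the train-then-monitor split used elsewhere in the article; the confidence level trades early detection against false alarms.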

Results and Discussion on KECA-GRNN Fault Prediction
In this study, the KECA-GRNN prediction model was used to predict the evaluation parameters selected in Section 2.1: the oil temperature of the gearbox, the bearing temperature of the gearbox, and the lubricating oil pressure at the inlet of the gearbox. The occurrence of faults was predicted by comparing the predictions with the actual evaluation parameters. Consistent with the data used in the condition monitoring stage, the data were divided into three parts. Table 4 shows the specific division of the data. First, the KECA-GRNN model was verified. The KECA-GRNN prediction model for multiple gearbox parameters was established using the modeling data EDKECA. The VDKECA data were then used as the model verification sample: the prediction output was calculated and compared with the actual gearbox parameters, and the accuracy of the model was calculated to verify the validity and reliability of the model proposed in this article. The mean absolute error (MAE) and the mean relative error (MRE) were used as the error indicators to evaluate the prediction model. The specific formulas are as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}(i) - y(i) \right| \qquad (16)$$

$$\mathrm{MRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| \hat{y}(i) - y(i) \right|}{y(i)} \qquad (17)$$

where $\hat{y}(i)$ is the predicted value of the parameter, $y(i)$ is the actual value of the parameter, and $n$ is the number of samples. Figures 7 and 8 show the prediction results of the KECA-GRNN model and the GRNN model, respectively. Table 5 shows the error comparison between the GRNN model and the KECA-GRNN model. By comparing the error indexes in Table 5, we found that the KECA-GRNN model had higher prediction and fitting accuracy. The reason was that the features extracted by KECA provided deeper data relations to the prediction model. This implied that the output of the KECA-GRNN model could be directly compared with the actual parameters to evaluate whether there was a fault. If the difference between the estimated value and the actual value kept increasing over consecutive instances, that is, if the deviation persisted for a certain period of time rather than appearing as a small fluctuation, this would signal a malfunction.

FIGURE 8
GRNN prediction model

After EDKECA was used to train the KECA-GRNN model, the TDKECA data were used to predict the parameters. A comparison of the estimated parameters with the actual parameters revealed that the corresponding faults of the gearbox could be detected. The prediction results of the oil temperature of the gearbox, the bearing temperature, and the inlet pressure of the lubricating oil are shown in Figure 9. Setting aside the influence of the model accuracy on the results, we found that the oil temperature and the bearing temperature of the WT gearbox were abnormally high at approximately March 27: the actual temperature was too high, and the difference between the actual temperature and the predicted temperature was too large. The abnormal state of the lubricating oil inlet pressure occurred after April 5; the pressure was thus slow to reveal the abnormal condition of the gearbox.
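The two error indicators used in this section, the mean absolute error and the mean relative error, are simple to compute. The sketch below assumes the relative error is taken with respect to the absolute actual value and is reported as a fraction rather than a percentage; the function names and sample values are hypothetical and are not results from the article.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mre(predicted, actual):
    """Mean relative error (as a fraction); actual values must be nonzero."""
    return sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)

# Illustrative values only, e.g. an oil temperature in degrees Celsius.
actual_temp = [60.0, 60.0, 64.0]
predicted_temp = [61.0, 59.0, 66.0]
error_mae = mae(predicted_temp, actual_temp)
error_mre = mre(predicted_temp, actual_temp)
print(error_mae, error_mre)
```

MAE keeps the unit of the predicted parameter, whereas MRE is dimensionless, so MRE is the more convenient indicator when comparing temperatures with a pressure.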
A further analysis of the predicted residual error determined the cause and time of the high-temperature anomaly. The result is shown in Figure 10. Figure 10 shows the residual plot of the oil temperature, bearing temperature, and lubricating oil inlet pressure predicted by the KECA-GRNN model. The upper threshold limit of the residual calculated by the kernel density estimation (KDE) was used as a high alarm line. When the residual value exceeded the threshold limit, the gearbox was in an abnormal condition. From this figure, we found that the bearing temperature exceeded the high-temperature alarm line on March 18 and decreased from March 25 to March 27. The reason for the decrease might be attributed to the decrease in the ambient temperature or the small shaft friction caused by the decrease in the wind speed. The oil temperature continued to exceed the high-temperature alarm line after 3.21 days and even exceeded the high-temperature alarm line after 3.30 days. From the two residual gearbox bearing temperatures and the oil temperatures in the trend chart, we inferred that the gearbox assembly of this WT generator might be attributed to the shaft friction caused by high-temperature failure, and the gearbox failure was attributed to the increasing temperature of the transmission oil through heat transfer. In addition, the residual trend of the lubricating oil inlet pressure was consistent with the residual trend of the oil temperature and the bearing temperature, but the abnormal situation appeared on April 05, when the time was relatively slow. Therefore, we could predict the occurrence of early faults by calculating and analyzing the residuals generated by the KECA-GRNN prediction model, and by analyzing the multiple parameters of the gearbox, we could consider their mutual restriction relationship to avoid one-sidedness and errors in the prediction results. 
If only a single parameter was monitored, the prediction results were likely to be erroneous. To address this problem, we adopted the fusion residual method described in Section 4.3. By fusing the residuals of the multiple target parameters of the gearbox, the relationship between them was taken into account, which avoided one-sided and erroneous prediction results. The fusion residual of the three target residuals produced by KECA-GRNN was calculated according to Eq. (15). The first two weight factors were set to 0.4 and the third to 0.2, selected by trial and error, and the results are shown in Figure 11. A comparison of Figures 10 and 11 revealed that the fused residual was more sensitive to abnormal conditions and produced fewer abnormal judgments during the early normal state.

FIGURE 10
Residual plot

FIGURE 11
Fusion residual
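As an illustration of the fusion step, the sketch below combines three residual series with the weights 0.4, 0.4, and 0.2. Eq. (15) is not reproduced in this section, so the plain weighted sum, the assignment of the weights to the oil temperature, bearing temperature, and inlet pressure residuals (in that order), and the function name are all assumptions. The residuals are taken to be already scaled to comparable ranges.

```python
def fuse_residuals(oil_temp, bearing_temp, inlet_pressure, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of three residual series (assumed form of the fusion).

    The order in which the weights map onto the three parameters is an
    assumption made for illustration only.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    if not (len(oil_temp) == len(bearing_temp) == len(inlet_pressure)):
        raise ValueError("residual series must have equal length")
    w1, w2, w3 = weights
    return [
        w1 * a + w2 * b + w3 * c
        for a, b, c in zip(oil_temp, bearing_temp, inlet_pressure)
    ]
```

Because the weights sum to 1, the fused residual stays on the same scale as the inputs, so a single alarm line can be applied to it.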

HEALTH VISUALIZATION RESEARCH
Health visualization processes and analyzes data and renders them as concrete graphs so that the analysis results are clear, informative, and easy to understand. The needs of users must be analyzed carefully when designing the graphs and then matched with a suitable visualization technique. Visualized data can be divided into the following categories: one-dimensional data visualization, multidimensional data visualization, temporal data visualization, hierarchical data visualization, and grid data visualization. In the previous section, three residuals and one fusion residual were used to predict the fault of this wind turbine. In this study, therefore, one-dimensional and multidimensional data visualizations were built from the residuals predicted by KECA-GRNN to show the health of the WT and to verify that the fusion residual was set up reasonably.

One-dimensional data visualization
Using the fusion residual obtained in Section 4.4, one-dimensional data visualization was carried out by means of a fuzzy comprehensive evaluation. Fuzzy comprehensive evaluation is a method based on fuzzy set theory that quantifies fuzzy information in order to judge a state. It is divided into single-stage and multistage fuzzy comprehensive evaluation; in this study, we mainly applied the single-stage form. For the health status of WTs, all the factors affecting the health status of the equipment form a factor set, and the possible evaluation results of the health status form an evaluation set; each evaluation result has to correspond to some factors in the factor set. In this study, triangle and half-trapezoid membership functions were used to define the membership function of each health status level, as shown in Figure 12. According to the membership function graph, the distinct assessment grades were defined [...]. According to the actual operation data of the early wind power unit, we selected [...] and conducted a fuzzy comprehensive evaluation to analyze the health status of the unit as accurately as possible. The fusion residual was substituted into Eqs. (18)-(22), the confidence level was 0.9, the membership degrees were allocated as the basic confidence, and the health status of the gearbox was evaluated. The results are shown in Figure 13.
Figure 13 shows that the health status of the gearbox of the WT went through four stages.
Stage 1 (March 10 to March 21): the gearbox was in a "healthy" or "good" state, and the WT was in normal operation.
Stage 2 (March 22 to March 26): the health status of the gearbox moved from "good" to "attention," indicating that monitoring and maintenance of the gearbox needed to be strengthened.
Stage 3 (March 25 to April 10): the health status degraded significantly and was mostly in the "deterioration" state, indicating that the gearbox might have been abnormal in this stage.
Stage 4 (April 10 to April 16): the gearbox deteriorated markedly and reached the "disease" state around April 15, indicating that the fault had occurred.
It took 15 days for the WT to move from the "attention" state to the "disease" state. The transition trend was consistent with the state trend diagram of KECA-SPE. Because the transition was very rapid, an emergency operation and maintenance plan has to be prepared so that faults can be checked as soon as abnormalities occur in important components (or subsystems).
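To make the membership-function step concrete, the sketch below builds half-trapezoid functions for the two extreme grades and triangle functions for the three intermediate ones, then picks the grade with the largest membership. The breakpoints, the assumption that the fusion residual has been normalized to [0, 1], and the maximum-membership decision are all illustrative; they stand in for Figure 12 and Eqs. (18)-(22), which are not reproduced here, and the confidence allocation at level 0.9 is not modeled.

```python
GRADES = ("healthy", "good", "attention", "deterioration", "disease")
# Hypothetical breakpoints on a fusion residual normalized to [0, 1].
CENTERS = (0.1, 0.3, 0.5, 0.7, 0.9)

def left_half_trapezoid(x, a, b):
    """Membership 1 up to a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def right_half_trapezoid(x, a, b):
    """Membership 0 up to a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def triangle(x, a, b, c):
    """Membership rising from a to a peak of 1 at b, then falling to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def membership_vector(x):
    """Membership degree of residual x in each of the five grades."""
    c = CENTERS
    return [
        left_half_trapezoid(x, c[0], c[1]),
        triangle(x, c[0], c[1], c[2]),
        triangle(x, c[1], c[2], c[3]),
        triangle(x, c[2], c[3], c[4]),
        right_half_trapezoid(x, c[3], c[4]),
    ]

def assess(x):
    """Grade with the largest membership (maximum-membership principle)."""
    mu = membership_vector(x)
    return GRADES[mu.index(max(mu))]
```

With these breakpoints the five memberships sum to 1 for any input, so they could also be read directly as a basic allocation over the grades.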

FIGURE 13
Health assessment results

Multidimensional data visualization
A bubble chart shows the relationship between three variables. It is similar to a scatter plot, with one variable on the horizontal axis and another on the vertical axis, except that a third variable is represented by the size of each bubble. In this study, a bubble chart was used to show the variation trend of the three predicted variables and the health status of the wind turbine gearbox over the 31 days from March 12 to April 16 (data for March 27 to March 30 were missing). The horizontal axis of the bubble chart denotes the oil temperature residual, the vertical axis denotes the bearing temperature residual, and the bubble diameter indicates the oil inlet pressure residual. The residuals recorded at the many moments of each day were screened as follows: a combined quadratic measure of the three coordinates x, y, and z was calculated, and the bubble corresponding to its maximum value was selected to represent the health status of that day. The resulting chart is shown in Figure 14.
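The daily screening rule can be sketched as follows. The screening expression did not survive in the text, so the score used here, the sum of squares of the three residuals, is an assumption, as are the record layout and the function names.

```python
def screening_score(x, y, z):
    """ASSUMED screening quantity: sum of squares of the three residuals.
    x: oil temperature, y: bearing temperature, z: oil inlet pressure."""
    return x * x + y * y + z * z

def daily_representatives(records):
    """Pick, for each day, the residual triple with the largest score.

    `records` is an iterable of (day, x, y, z) tuples (hypothetical layout).
    Returns a dict mapping each day to its selected (x, y, z) triple.
    """
    best = {}
    for day, x, y, z in records:
        score = screening_score(x, y, z)
        if day not in best or score > best[day][0]:
            best[day] = (score, (x, y, z))
    return {day: point for day, (_, point) in best.items()}
```

Each selected triple then supplies the horizontal position, vertical position, and bubble diameter of one bubble per day.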
According to the previous WT data, the healthy area (green), degraded area (blue), and fault area (red) were calculated. Consistent with the results of the one-dimensional visualization, the bubble chart showed that the gearbox of the WT was in the "healthy" state from March 10 to March 21. From March 21 to April 10, its operating state moved into the degraded area (the "notice" state), and from April 11, its operating state lay in the fault area.
In conclusion, the abovementioned two visualization methods could reasonably reflect the health status of WT components and were consistent with the actual situation. Maintenance personnel can choose different visualization methods to describe the health status of WTs according to the known degradation factors to clearly grasp the health status of WTs in different periods.

FIGURE 14
Bubble chart of health assessment

CONCLUSIONS
In this study, we presented a method for the condition monitoring and fault prediction of a multiparameter WT gearbox. The method could be used to monitor the condition of the gearbox and to predict multiple gearbox parameters. The KECA algorithm was applied to the condition assessment of WTs; it uses Renyi entropy and selects the principal components according to their entropy values. It therefore not only handled the nonlinearity of the SCADA data but also revealed the internal information of the data, so that as little information as possible was lost. Based on this characteristic, a new comprehensive statistic $D^2$ was designed, which combined the characteristics of the SPE statistic and the CS statistic, had an obvious monitoring effect on the condition, and gave a minimal false alarm rate. In addition, a KECA-GRNN model was established to predict the gearbox parameters; it could also be widely used to predict other parameters. A fusion analysis was then conducted on the prediction residuals generated by the KECA-GRNN model to analyze the time of fault generation from multiple angles and dimensions. Finally, a fuzzy comprehensive evaluation method and a bubble chart were used to visualize the health assessment results. The example showed that the prediction model and the evaluation method established in this study were accurate, simple, and intuitive and could be used to analyze the health status of the gearbox of a WT.

Table 1. Validate the KECA-$D^2$ model | Test data (TD) | 3.12-4.16 | Monitor wind turbine