Hybrid Intelligent Predictive Maintenance Model for Multiclass Fault Classification

Abstract


As modern industrial companies strive to meet their operational targets and maximise profit, they are pressured to make use of various integrated and complex engineering machinery. Generally working under extreme and challenging conditions, this industrial equipment is subjected to progressive deterioration, leading to a significant increase in the possibility of related component failure (Helwig et al. 2015a; Egusquiza et al. 2018). This undermines the availability and reliability needed to minimise operational downtime and maintenance-related cost (Sheng et al. 2011). For these reasons, monitoring the condition of these complicated systems as a requirement for predictive-based maintenance has gained increasing importance over the years, since it determines the required maintenance action based on the equipment's health status. This ensures the availability and reliability of industrial equipment and offers a significant improvement in its health condition, ultimately increasing asset utilisation and reducing maintenance cost.

However, with the rising demands and increasing complexity of industrial systems, the number of installed sensors and their sampling rates are constantly growing (Schneider et al. 2018). As a result, the processing of high-dimensional data (signals) from multiple sensors for predictive-based maintenance is at risk of suffering from poor scalability, degraded classification performance and the curse of dimensionality (

in developing intelligent predictive maintenance frameworks in various applications (Çınar et al. 2020a). This is due to the enormous potential of ML algorithms to process multivariate and high-dimensional datasets (generated in industries by industrial equipment and machinery) through the extraction of hidden patterns, classification, prediction or visual representation (Helwig et al. 2015a; Raptodimos and Lazakis 2018; Kaur and Kaur 2020).
However, a sizable number of these ML algorithms are highly task-specific and thus cannot readily be applied to other specialised tasks. Hence, their performance varies when implemented independently. These ML algorithms are also constrained in producing the desired results when exposed to nonlinear and nonstationary high-dimensional datasets characterised by high levels of uncertainty (Cho et al. 2018). For these reasons, researchers in the field of predictive maintenance have directed their focus towards building hybrid frameworks that leverage the strengths of multiple ML algorithms while offsetting their individual weaknesses, as improvements on existing techniques (Zhang et

The motivation for creating the hybrid is that ICEEMDAN has the ability to decompose nonlinear and nonstationary signals arising from complex systems into a series of Intrinsic Mode Functions (IMFs), where each resulting IMF represents the respective local transient features (Colominas et al. 2014). Besides, when compared to existing filtering techniques for extracting transient features from nonlinear and nonstationary signals such as the Wigner-Ville Distribution (WVD) (Wigner 1932), Short-Time Fourier Transform (STFT) (Peppin 1994; Newland 2005), Wavelet Transform (WT) (Daubechies 1989) and Empirical Mode Decomposition (EMD) (Huang et al. 1998

For a better comprehension of this study, the scope of the related works is further limited to prior works conducted using the same hydraulic system dataset (obtained from the UCI machine learning repository). This allows the tracking of progress that has been made in the usage of ML algorithms for monitoring the conditions of the considered hydraulic system since the dataset's publication in 2015.
Table 1 summarises the ML algorithms that have been utilised in developing predictive maintenance methods based on the UCI machine learning repository hydraulic system dataset under consideration. As seen, since the publication of the considered hydraulic dataset by Helwig et al. (2015), several works have attempted to propose predictive maintenance techniques that improve the efficiency and accuracy of classifying the conditions of the four major components (accumulator, cooler, pump and valve). Although improvements have been realised over the years, some aspects of the dataset are yet to be explored. Hence, this study introduces the ICEEMDAN denoising technique as the first phase of pre-processing for dealing with the high levels of uncertainty introduced into the dataset by the nonlinear and dynamic operational conditions of engineering systems. A comparison of the prior works found that none of the existing studies addressed the issue of uncertainty (noise) before classifying the degradation states of the monitored conditions. Therefore, performing data denoising as a pre-processing step is a contribution to knowledge in predictive maintenance.

3 Condition monitoring of hydraulic system dataset

Time series data recorded from monitoring the condition of a hydraulic system were obtained from the UCI ML repository via http://archive.ics.uci.edu/ml/datasets/Condition+monitoring+of+hydraulic+systems. The essential details of the dataset are briefly discussed in this section, with a detailed discussion available at the source (Helwig et al. 2015a

are selected to serve as input parameters for the LSSVM optimised by CSA-NMS for classifying fault types. The details of the various stages shown in Fig. 1

affects the frequency resolution, and vice versa (Hlawatsch and Boudreaux-Bartels 1992). As a remedy, the WT was proposed as an efficient alternative for dealing with fixed time-frequency resolution problems and transient signals in general (Galli et al. 1996). Nonetheless, a low resolution is attained at higher frequencies as the frequency from WT is logarithmically scaled (Barschdorff and Femmer 1995).

To overcome the deficiencies of these techniques, a nonparametric feature extraction method, the EMD, was proposed to decompose nonlinear and nonstationary signals arising from complex systems into a series of Intrinsic Mode Functions (IMFs), where each resulting IMF represents the respective local transient features (Huang et al. 1998

proposed. In this study, the recently enhanced formulation of CEEMDAN known as the Improved Complete Ensemble EMD with Adaptive Noise (ICEEMDAN), which can reduce the noise contamination in a signal (Colominas et al. 2014), is adopted. ICEEMDAN is proposed as a pre-processing technique to extract the required fault features while reducing the noise and high levels of uncertainty.

Improved Complete Ensemble EMD with Adaptive Noise (ICEEMDAN)
The ICEEMDAN is an adaptive method for decomposing nonlinear and nonstationary signals (in both the time and frequency domains) into a series of Intrinsic Mode Functions (IMFs), where each resulting IMF represents the respective local transient features (Huang et al. 1998

where $M(\cdot)$ is the operator for estimating the local mean. The first IMF, at $k = 1$, is estimated using Equation (3). The second residual $r_2$ is estimated as the average of the local means of the realisations, as shown in Equation (4). Then, the second IMF, at $k = 2$, is estimated using Equation (5). The $k$th residual and $k$th IMF are estimated using Equations (6) and (7), respectively. The process in Equations (6) and (7) is repeated for the next $k$.
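The recursion described above can be written compactly. The block below follows the formulation in Colominas et al. (2014) and is not a reproduction of this paper's numbered equations, so the notation is an assumption: $x$ is the signal, $w^{(i)}$ is the $i$th white-noise realisation, $E_k(\cdot)$ extracts the $k$th EMD mode, $\beta_k$ scales the added noise at stage $k$, $\langle\cdot\rangle$ averages over the realisations, and $\tilde{d}_k$ denotes the $k$th IMF.

```latex
\begin{align*}
r_1 &= \left\langle M\!\left(x + \beta_0 E_1(w^{(i)})\right) \right\rangle, & \tilde{d}_1 &= x - r_1, \\
r_2 &= \left\langle M\!\left(r_1 + \beta_1 E_2(w^{(i)})\right) \right\rangle, & \tilde{d}_2 &= r_1 - r_2, \\
r_k &= \left\langle M\!\left(r_{k-1} + \beta_{k-1} E_k(w^{(i)})\right) \right\rangle, & \tilde{d}_k &= r_{k-1} - r_k .
\end{align*}
```

Each IMF is thus the difference between two successive residuals, which is why the noise added at each stage largely cancels in the extracted modes.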

Signal reconstruction
In the application of the ICEEMDAN to the data, it is of paramount importance to select, from the series of IMFs generated by ICEEMDAN, the relevant IMFs that contain as much fault information as possible while neglecting the spurious IMFs. As such, an effective, quick and repeatable scientific framework is needed to discriminate between relevant and spurious IMFs for the signal reconstruction, to help improve the performance of the fault diagnosis system. In this paper, a stringent threshold for discriminating between

where i is estimated using Equation (9).

intervals. In this study, as a means of ensuring uniformity within all cycles, the various time intervals were allocated after averaging each feature per sensor. Also, similar to prior research works where the data for each cycle per sensor were all partitioned into various segments, the partitioning of the cycles in this study varied with the varying characteristic time intervals of the cycles. Fig. 3(a), for instance, shows 13 characteristic time intervals after averaging each feature in the PS1 sensor data. Similarly, PS5 was partitioned into 19 characteristic time intervals as shown in Fig. 3(b). The column labelled "Time Interval" in Table 2

1, 2, ..., which is based not only on its corresponding $x_i$ but also on the coupling term. In this paper, CSA is used to determine the starting values of the optimal bandwidth and regularisation parameters of the LSSVM approach, which are then passed on to the NMS technique for fine-tuning.

Nelder-Mead Simplex (NMS) Algorithm
The NMS algorithm is one of the popular direct search algorithms for optimising multidimensional unconstrained problems (Nelder and Mead 1965). Unlike other simplex methods, the NMS algorithm is known to be an improvement as it allows the simplex to vary not only in size but also in shape (Baudin 2009

The NMS algorithm is presented as follows:

Step 1: Build
Step 2: Use Equation (30)
Step 3: If x is accepted, else $x_r$ is accepted while $x_e$ is discarded and the iteration is terminated.
Step 4: x is accepted and the iteration is terminated, else skip to Step 5.
Step 5: $f$ is then evaluated at $n$ points

The standard values for the four major coefficient parameters:

The optimal bandwidth and regularisation parameters proposed and refined by the combination of the CSA and NMS algorithms are then passed on to the LSSVM to facilitate the classification.
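The two-stage idea (a global search proposes starting values, then the simplex search fine-tunes them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `coarse_search` is a plain random search standing in for CSA, `toy_objective` is a hypothetical surrogate for the LSSVM validation error over the two hyperparameters, and the coefficient defaults (reflection 1, expansion 2, contraction 0.5, shrink 0.5) are the commonly used textbook values, assumed here rather than taken from this paper. The simplex routine is a basic variant that uses inside contraction only.

```python
import random


def coarse_search(f, bounds, n_samples=50, seed=0):
    """Random search over a box; a stand-in for CSA, NOT the CSA itself."""
    rng = random.Random(seed)
    candidates = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(n_samples)]
    return min(candidates, key=f)


def nelder_mead(f, x0, step=0.5, alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5,
                tol=1e-8, max_iter=500):
    """Minimise f from x0 with a basic Nelder-Mead simplex search."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)

    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        f_spread = abs(f(worst) - f(best))
        x_spread = max(abs(v[j] - best[j]) for v in simplex for j in range(n))
        if f_spread < tol and x_spread < tol:
            break
        # Centroid of every vertex except the worst one.
        c = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        reflect = [c[j] + alpha * (c[j] - worst[j]) for j in range(n)]
        if f(reflect) < f(best):
            # Reflection beat the best vertex, so try going further.
            expand = [c[j] + gamma * (reflect[j] - c[j]) for j in range(n)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c[j] + rho * (worst[j] - c[j]) for j in range(n)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:
                # Shrink every other vertex towards the best one.
                simplex = [best] + [
                    [best[j] + sigma * (v[j] - best[j]) for j in range(n)]
                    for v in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0]


def toy_objective(p):
    """Hypothetical smooth surrogate with its minimum at (2.0, 0.5)."""
    return (p[0] - 2.0) ** 2 + 10.0 * (p[1] - 0.5) ** 2


start = coarse_search(toy_objective, bounds=[(0.1, 10.0), (0.1, 10.0)])
refined = nelder_mead(toy_objective, start)
```

Because the simplex search is local, the quality of the starting point matters, which is the role the global stage plays in the hybrid.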

Classification performance evaluation
From a statistical viewpoint, a comprehensive evaluation of multiclass classification models cannot easily or adequately be deduced from a single performance index. For this reason, eight evaluation metrics, namely accuracy, error rate, precision, recall (sensitivity), specificity, F score, Matthews correlation coefficient and geometric mean, were used in this study to obtain a more reliable assessment.

Results and discussion
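The eight metrics named in the performance-evaluation paragraph can all be derived from a multiclass confusion matrix. The sketch below is one reasonable way to do so and makes several assumptions that the text does not confirm: per-class values are macro-averaged, the F score is the F1 computed from the macro precision and recall, the geometric mean is the square root of macro recall times macro specificity, and the Matthews correlation coefficient uses the standard multiclass generalisation. The function name `multiclass_metrics` is hypothetical.

```python
import math


def multiclass_metrics(cm):
    """Macro-averaged metrics from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(cm[i]) for i in range(k)]
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    correct = sum(cm[i][i] for i in range(k))

    def ratio(num, den):
        # Guard against empty classes; 0.0 is a convention, not a standard.
        return num / den if den else 0.0

    precision, recall, specificity = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fp = pred_counts[c] - tp
        fn = true_counts[c] - tp
        tn = total - tp - fp - fn
        precision.append(ratio(tp, tp + fp))
        recall.append(ratio(tp, tp + fn))
        specificity.append(ratio(tn, tn + fp))

    p = sum(precision) / k
    r = sum(recall) / k
    s = sum(specificity) / k
    accuracy = ratio(correct, total)

    # Multiclass Matthews correlation coefficient.
    mcc_num = correct * total - sum(a * b for a, b in zip(pred_counts, true_counts))
    mcc_den = math.sqrt(
        (total ** 2 - sum(a * a for a in pred_counts))
        * (total ** 2 - sum(b * b for b in true_counts))
    )
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": p,
        "recall": r,
        "specificity": s,
        "f_score": ratio(2 * p * r, p + r),
        "mcc": ratio(mcc_num, mcc_den),
        "g_mean": math.sqrt(r * s),
    }


metrics = multiclass_metrics([[2, 1], [1, 2]])
```

Reporting several of these together matters for imbalanced fault classes, where accuracy alone can look high even when a rare degradation state is rarely detected.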

5.1.1 Denoising

The hydraulic system dataset was first subjected to the modified ICEEMDAN technique (integrated with the stringent threshold of Equation (8) as the criterion for discriminating between relevant and spurious IMFs) as discussed in Section 4.1.2. The procedure was automated so that the denoising (decomposition into IMFs with ICEEMDAN and reconstruction with Equation (8)) was repeated for all the sensors considered in this study, as manual involvement was highly impractical. The modified ICEEMDAN technique successfully eliminated most redundancies masked as noise in the sensor data.
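The reconstruction step, keeping only the relevant IMFs and summing them, can be illustrated with a short sketch. The relevance criterion used here, the absolute Pearson correlation between each IMF and the original signal compared against a fixed threshold, is a hypothetical stand-in chosen for illustration; it is not the threshold of Equation (8). The IMFs are assumed to have been produced already by some decomposition, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def reconstruct(signal, imfs, threshold):
    """Sum the IMFs whose |correlation| with the signal reaches the threshold."""
    kept = [imf for imf in imfs if abs(pearson(signal, imf)) >= threshold]
    if not kept:
        return [0.0] * len(signal)
    return [sum(values) for values in zip(*kept)]


# Toy data: a slow oscillation plus a small alternating disturbance.
slow = [math.sin(2 * math.pi * t / 50) for t in range(200)]
noise = [0.05 * (-1) ** t for t in range(200)]
signal = [s + e for s, e in zip(slow, noise)]

# Pretend a decomposition returned exactly these two components.
denoised = reconstruct(signal, [noise, slow], threshold=0.5)
```

Wrapping the selection in a function like this is what makes it practical to repeat the same rule over every sensor channel without manual inspection.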

Feature extraction and selection
Here, two different data scenarios were considered when applying the PCA. First, the PCA was applied to the pool of 1806 features that had been denoised using the ICEEMDAN technique. Secondly, the PCA was applied directly to the original extracted 1806 features without denoising. The motive was to ascertain the extent to which the denoised data could improve the classifiers' prediction capability. Fig. 4(a) shows the scree plot of the eigenvalues and the proportion of variance explained for the undenoised 1806 possible components, while Fig. 4(b) shows the PCA results for the denoised 1806 features. Using Kaiser's criterion of retaining PCs with eigenvalues greater than 1.0, 161 PCs were retained for the undenoised data, representing 91.62% of the variance in the 1806 features. For the data denoised using the ICEEMDAN, the number of selected PCs was reduced by roughly half (to 82) when compared to the undenoised data, representing 96.04% of the variance in the 1806 features as shown in Table 3. The results from both approaches (with and without the use of ICEEMDAN) indicate the relevance of the ICEEMDAN technique in removing substantial levels of noise from the original hydraulic sensor data before feature extraction and selection. This ensures the extraction of relevant features that contain the most characteristic fault information, projected onto a minimal number of uncorrelated PCs while maintaining a high proportion of variance (96.04%) during the dimensionality reduction phase.
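Kaiser's criterion itself is a simple rule once the eigenvalues are available. The sketch below assumes the eigenvalues come from a PCA on standardised features (so that they sum to the number of features) and uses made-up eigenvalues purely for illustration; the function name `kaiser_select` is hypothetical.

```python
def kaiser_select(eigenvalues, threshold=1.0):
    """Return indices of retained PCs and the share of variance they explain.

    Components are ranked by eigenvalue and kept when the eigenvalue is
    strictly greater than the threshold (1.0 for Kaiser's criterion).
    """
    total = sum(eigenvalues)
    order = sorted(range(len(eigenvalues)),
                   key=lambda i: eigenvalues[i], reverse=True)
    kept = [i for i in order if eigenvalues[i] > threshold]
    explained = sum(eigenvalues[i] for i in kept) / total
    return kept, explained


# Illustrative eigenvalues only; they are not taken from the study.
kept, explained = kaiser_select([3.2, 1.5, 0.8, 0.3, 0.2])
```

Here two of the five components are retained and together account for a little over 78% of the variance, which mirrors the trade-off reported above between the number of PCs kept and the variance preserved.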

5.2 Multiclass classification

Multiclass classification refers to assigning input features to one of a predefined set of classes based on an optimal subset of features selected during the feature selection stage. In this study, the multiclass classification results of the optimised LSSVM were compared with three benchmark machine learning techniques that have been used extensively in the literature and have all demonstrated promising multiclass classification capabilities. The methods include Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) and Artificial Neural Network (ANN), specifically the Generalised Regression Neural Network (GRNN). If the performance of the tested classification model is deemed satisfactory, the classification model can be used to classify the unknown future health status of the hydraulic component. Fig. 5 illustrates the procedure used in building the multiclass classification models. The classification experiment was run in MATLAB (2018a version) on an ASUS computer with a 2.3 GHz processor and 4 GB of memory.

In order to ascertain the reliability of the multiclass classification models, the features selected by PCA are randomly partitioned into two sets: 70% (training set) is used to train the multiclass classification models and the remaining 30% (testing set) is used to validate the optimum trained models. However, the random nature of the hybrid CSA-NMS optimisation algorithms in providing the optimal bandwidth and regularisation parameters of the LSSVM will yield slightly different output after each run. As a result, the LSSVM is implemented using the one-vs-one scheme with an RBF kernel, and is trained 10 times on data randomly partitioned into training and test sets. The final classification output is presented as the average, standard deviation (STD), best and worst of all the outputs from the 10 runs, as shown in Table 4.
Notably, the proposed hybrid technique achieved average test classification accuracies greater than 99.40% for all four monitored conditions. Table 4 also reports the training accuracy, its deviation (10-fold cross-validation) and the CPU time for training the LSSVM optimised by CSA-NMS for the four hydraulic components.
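The evaluation protocol described above (ten random 70/30 partitions, summarised by average, standard deviation, best and worst) can be sketched in a few lines. The study itself was run in MATLAB; this Python version is only an illustration. The classifier is deliberately left as a callback, and `nearest_mean_score` is a trivial hypothetical placeholder on toy one-dimensional data, not the optimised LSSVM.

```python
import random
import statistics


def repeated_holdout(samples, labels, train_and_score,
                     runs=10, train_frac=0.7, seed=0):
    """Score a model over several random train/test partitions."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    cut = int(round(train_frac * len(indices)))
    scores = []
    for _ in range(runs):
        rng.shuffle(indices)
        train, test = indices[:cut], indices[cut:]
        scores.append(train_and_score(
            [samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test],
        ))
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),
        "best": max(scores),
        "worst": min(scores),
    }


def nearest_mean_score(x_train, y_train, x_test, y_test):
    """Placeholder classifier: assign each test value to the closest class mean."""
    means = {}
    for c in set(y_train):
        values = [x for x, y in zip(x_train, y_train) if y == c]
        means[c] = sum(values) / len(values)
    hits = sum(
        1 for x, y in zip(x_test, y_test)
        if min(means, key=lambda c: abs(x - means[c])) == y
    )
    return hits / len(y_test)


# Toy, well-separated data: class 0 near 0, class 1 near 10.
samples = [i / 10 for i in range(10)] + [10 + i / 10 for i in range(10)]
labels = [0] * 10 + [1] * 10
summary = repeated_holdout(samples, labels, nearest_mean_score)
```

Reporting the spread across runs, not a single split, is what lets the run-to-run variation introduced by the stochastic optimiser be separated from genuine differences between classifiers.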

5.2.1 Classification rate based on optimised LSSVM compared to other investigated classifiers

Table 5 shows the average multiclass classification test results for the hydraulic accumulator condition based on pre-processed ICEEMDAN-PCA and PCA input features for the optimised LSSVM, LDA, SVM and ANN. Although all the classifiers showed satisfactory test performance, the proposed ICEEMDAN-PCA-LSSVM outperformed all the other methods. This is evident from the various evaluation metrics employed (Table 5). The

In the case of the cooler condition, average multiclass classification test results similar to those for the accumulator condition were obtained (Table 6). However, the classification results obtained by the proposed ICEEMDAN-PCA-LSSVM, ICEEMDAN-PCA-ANN and PCA-LSSVM models were quite similar. That is, these models achieved the same highest level of accuracy (99.83%) with a corresponding lowest misclassification rate (error) of 0.17%. Similar results were obtained for the remaining evaluation metrics considered. The interpretation here is that the proposed ICEEMDAN-PCA-LSSVM model can adequately be used in classifying cooler conditions.

Table 7 shows the average multiclass classification test results for the internal pump leakage condition. As observed in Table 7

With regard to the average multiclass classification of the valve conditions (Table 8), the proposed ICEEMDAN-PCA-LSSVM model showed better classification results than the alternative models. This is manifested in the 99.84% accuracy with a 0.16% misclassification (error) rate obtained. The same trend was noticed for the remaining evaluation metrics considered (Table 8)

(Table 6) while ICEEMDAN-PCA-SVM only classified the internal pump leakage correctly (see Table 7).
These results suggest that the proposed ICEEMDAN-PCA-LSSVM model is highly potent in classifying a wide range of monitored conditions irrespective of the dynamic operational condition of the machine component. This will help machine operators to spot significant changes in machine components, which are an indication of fault development. It thereby increases the availability and reliability of machines and reduces maintenance-related cost by supporting operators in making accurate and informed decisions on when to schedule maintenance.

Furthermore, this study has practically demonstrated the relevance of denoising the original signals with the ICEEMDAN technique to enhance the overall classification results. This is evident in the results presented in Tables 5-8, which cover two scenarios in the modelling process: denoising the originally recorded data using ICEEMDAN, and using the originally recorded data without denoising.

conditions. The compared classifiers achieved relatively similar classification performance; however, they could only do so in the case of the cooler and the internal pump leakage. These results suggest that the proposed hybrid is highly potent in classifying a wide range of monitored conditions irrespective of the dynamic operational conditions. Although the proposed hybrid technique generally improves the classification of diverse monitored conditions, it may be limited when applied to signals with mutations and similar dominant frequencies. Also, the feature extraction technique, like those utilised in prior research works, may be constrained when dealing with extremely large datasets due to the level of manual involvement by the user. Hence, future works should explore the following: the usage of filters capable of addressing signals with mutations and similar dominant frequencies, and deep learning approaches to feature extraction.
This will ultimately improve the optimality of the pre-processed signals for various fault classification tasks in the field of predictive maintenance.

Conflict of Interest:
The authors declare that they have no conflict of interest.