Hybridization of harmonic search algorithm in training radial basis function with dynamic decay adjustment for condition monitoring

In recent decades, hybridizing the superior attributes of a few algorithms has been proposed to cover more areas of complex applications and to improve performance. Condition monitoring is a major component of predictive maintenance: it monitors the condition of machinery and identifies significant changes in machinery parameters so that equipment defects that could cause unplanned downtime or unnecessary expenditure are detected early and prevented. An effective condition monitoring model helps to reduce the frequency of unexpected breakdowns and thus facilitates maintenance. Artificial neural networks (ANNs) have been shown to be effective in various condition monitoring and fault detection applications. ANNs are popular because they can identify complex nonlinear relationships among features in a large dataset and can therefore make accurate predictions. However, a drawback is that ANN performance is sensitive to the parameters of the architecture (i.e., the number of hidden neurons and the initial values of the connection weights), which are usually tuned on a trial-and-error basis. Hence, a wide range of studies have focused on determining the optimal weight values and the number of hidden neurons of ANN models. The motivation of this research work is to develop an autonomous learning model based on the hybridization of an adaptive ANN and a metaheuristic algorithm for optimizing the ANN parameters, so that the network can learn and adapt more flexibly and handle condition classification tasks in industries, such as power systems, more accurately. This paper presents an intelligent system that integrates a Radial Basis Function Network with Dynamic Decay Adjustment (RBFN-DDA) with Harmony Search (HS) to perform condition monitoring in industrial processes. RBFN-DDA performs incremental learning wherein its structure expands by adding new hidden units to include new information. As such, its training can reach stability in a shorter time than gradient-descent-based methods. To achieve optimal RBFN-DDA performance, HS is proposed to optimize the center and the width of each hidden unit in a trained RBFN. By integrating with the HS algorithm, the proposed metaheuristic neural network (RBFN-DDA-HS) can optimize the RBFN-DDA parameters and improve on the classification performance of the original RBFN-DDA by 2.2% up to 22.5% on two benchmark datasets, which are numerical records from a bearing and a steel plate system, and on a condition-monitoring system in a power plant (i.e., the circulating water (CW) system). The results also show that the classification performance of the proposed RBFN-DDA-HS is comparable to, if not better than, that of other state-of-the-art machine learning methods.


Introduction
Maintenance involves carrying out all technical and associated administrative activities to keep all components in an operational system to perform their function properly (Stephens 2010). If any equipment breakdown occurs in a well-maintained operational system, minor problems can be detected and corrected by conducting the short daily inspection, cleaning, and lubricating activities. An effective maintenance action requires company-wide participation and support by every personnel in order to make plant and equipment more reliable (Waeyenbergh and Pintelon 2002).
Basically, maintenance can be divided into two types, preventive maintenance and corrective maintenance (Blanchard 2004). Preventive maintenance includes all planned maintenance actions (e.g., periodic inspection and condition monitoring) that are implemented to prevent equipment from breaking down unexpectedly. Corrective maintenance includes all planned and unplanned maintenance actions that rectify failures and restore equipment to its operational condition. Condition monitoring is a major component of predictive maintenance; it monitors the condition of, and significant changes in, machinery parameters to detect and eliminate at an early stage equipment defects that could cause unplanned downtime or unnecessary expenditure (Mohanty 2014). It is essential to develop efficient condition monitoring techniques for (1) quickly identifying faulty components before breakdown; (2) reducing costly repairs caused by unexpected failure; and (3) optimizing the scheduling of preventive and predictive maintenance operations without routine inspections, which may require periodic shutdowns in a plant. Condition monitoring consists of two sequential processes: feature extraction and condition classification (Marwala and Vilakazi 2007). Feature extraction requires the use of signal processing and/or post-processing techniques; it is a mapping process from the signal space to the feature space. Condition classification is the process of classifying the obtained features into different categories. The traditional approach to condition monitoring relies on human expertise to relate the extracted features to the faults, which is usually time-consuming and unreliable, particularly when multiple features are used for fault diagnostics and when the data are affected by noise (Gertler 1998).
Fault diagnosis coupled with artificial intelligence (AI) can potentially overcome the shortcomings of traditional signal processing techniques by incorporating human-like thought abilities such as learning, reasoning, and self-correction (Kok et al. 2009). The widely used AI tools in condition monitoring include the artificial neural network (ANN), fuzzy logic system, support vector machine, extreme learning machine, etc. (Ali 2018). The literature shows that ANNs with learning capabilities are useful models for tackling condition-based maintenance problems (Tallam et al. 2003). They learn from data samples without requiring an exact mathematical model. However, the performance of ANNs is highly dependent on their parameter settings. To address this concern, numerous global optimization methods have been introduced and used to train ANNs to achieve better network performance. Many of these global optimization methods are metaheuristic algorithms, which initiate a search process that explores the search space to obtain near-optimal solutions. Metaheuristic algorithms are mechanisms that imitate strategies inspired by nature, social behavior, physical laws, etc. Some well-known metaheuristic algorithms include the Genetic Algorithm (GA) (Holland 1975), Particle Swarm Optimization (PSO) (Sheng et al. 2011), the Artificial Bee Colony Algorithm (ABC) (Karaboga 2005) and Harmony Search (HS) (Geem et al. 2001). Other new metaheuristic algorithms have also been introduced in the literature, for instance, the Aquila Optimizer (AO) (Abualigah 2021a) and the Arithmetic Optimization Algorithm (AOA) (Abualigah 2021b). Owing to their high efficiency and flexibility, metaheuristic algorithms have been applied in a wide range of application domains such as medicine (Ahmad et al. 2017), engineering (Bhoskar 2015), finance (Daliri 2020), and document clustering (Abualigah et al. 2018a, b).
The literature shows that many metaheuristic algorithms have been successfully used to optimize the design and the parameters of ANNs (Yao 1993). Many reviews on ANN-based condition monitoring have been presented (Singh 2003; Lei 2014). Notably, applications of metaheuristic ANNs to condition monitoring are still relatively few. An effective metaheuristic ANN-based condition monitoring model helps to reduce the frequency of unexpected breakdowns and thus facilitates maintenance. This motivates research on developing an autonomous learning model based on the hybridization of an adaptive ANN and a metaheuristic algorithm for handling condition classification tasks in industries, such as power systems. In this paper, a hybrid model of an incremental radial basis ANN and a metaheuristic algorithm is developed. A metaheuristic algorithm (i.e., the harmony search (HS) algorithm) is proposed to improve the learning parameters of the ANN. The research work is aimed at developing a metaheuristic ANN for monitoring the operating states of a system in a power plant more accurately.
The paper's organization is as follows: Sect. 2 presents a review of condition monitoring and fault diagnosis techniques. In addition, the state-of-the-art of ANN and metaheuristic ANN models used for performing condition monitoring and fault detection will be described. Section 3 describes two machine learning components [namely, Radial Basis Function Network with Dynamic Decay Adjustment (RBFN-DDA), Harmony Search (HS)] before describing the proposed RBFN-DDA-HS in details. The effectiveness of RBFN-DDA-HS in condition monitoring and fault detection is evaluated using two benchmark datasets (namely, bearing fault and steel plate fault datasets) and a real dataset that is collected from a circulating water system in a power plant. Section 4 presents and compares the results of RBFN-DDA-HS and other machine learning methods in the mentioned condition monitoring and fault detection tasks. Finally, Sect. 5 draws concluding remarks and suggestions for further work.

Related work
In this section, we first present an overview of the condition monitoring and fault detection approaches, which are the model-based and the data-driven approaches. The strengths and limitations of each approach are briefly explained. The method proposed in this paper is developed from a data-driven, artificial intelligence (AI) approach. In particular, the proposed method is an example of an artificial neural network (ANN) that is integrated with a metaheuristic algorithm. Therefore, the second part of the literature review focuses on the state of the art of ANNs and metaheuristic ANNs in condition monitoring and fault detection.

Condition monitoring and fault detection techniques
In general, condition monitoring and fault detection techniques can be broadly classified into the model-based approach (Isermann 1997) and the data-driven approach (Patton et al. 2013), which are shown in Fig. 1. Many review articles survey the application of techniques from the data-driven and model-based approaches in process monitoring and fault detection (Tidriri 2016; Miljković 2011; Ding 2011). The model-based approach involves a modeling process, which is based on a fundamental understanding of the physics of the process. The model can be of either a qualitative or a quantitative type. Qualitative modeling (Venkatasubramanian et al. 2003a) relies on qualitative functions centered around different units in a process to express the relationship between the input and output of the system, while quantitative modeling (Venkatasubramanian 2003b) is built on mathematical functions. The performance and reliability of the diagnostic system depend on the accuracy of the model. Thus, these approaches are usually applied only when an accurate analytical model is available. However, an accurate mathematical model of a complex mechanical system is usually difficult to derive, especially when the machines operate in uncertain and noisy environments (Tan et al. 2007). Besides, model-based approaches also rely on human expertise to relate the extracted features to the faults, which is usually time-consuming and unreliable, particularly when multiple features are used for fault diagnostics and when the data are affected by noise (Gertler 1998). The data-driven approach overcomes the above-mentioned limitations of the model-based approaches and is more viable for monitoring industrial processes that are inherently automated. The data-driven approach requires real data obtained from a data acquisition system to monitor or forecast the behavior of the system under monitoring.
The commonly used data-driven methods can be divided into statistical (Yin 2014) and AI approaches (Ali 2018). Methods under the statistical approach perform condition monitoring/fault detection by identifying the relationships among the data; these relationships are then used for classification and prediction. Commonly used statistical methods such as the control chart, principal component analysis and partial least squares are easy to build and can provide fast detection of abnormal situations (Venkatasubramanian 2003c). To further improve diagnosis performance, fault diagnosis methods coupled with AI, which offers human-like thought abilities such as learning, reasoning and self-correction (Kok et al. 2009), have been extensively developed. Comprehensive reviews on the use of AI techniques for fault detection and condition monitoring of mechanical components such as induction machines (Singh 2003) and planetary gearboxes (Lei 2014) have been presented. The reviews show that AI is useful for enhancing the precision and accuracy of condition monitoring/fault detection systems. AI condition monitoring/fault detection systems have been developed by using, for example, ANNs, fuzzy logic systems, support vector machines and extreme learning machines (Ali 2018). Among them, the ANN is popular because it can identify complex nonlinear relationships among features in a large dataset and can therefore make accurate predictions.

ANN and metaheuristic ANN methods for condition monitoring and fault detection
ANNs have been widely applied as condition classification tools in combination with signal processing techniques for feature extraction. For example, the bearing has been identified as one of the main fault components in rotating machinery. Extensive research work on bearing fault detection and diagnosis has been reported in the literature (Cerrada 2018). A single hidden layer feed-forward ANN was proposed to perform health diagnostics of ball bearings in direct-drive motors (Cocconcelli et al. 2011). The feed-forward ANN overcomes the problem of continuous change in the rotational speed of the motor and identifies the health of ball bearings in different speed applications. Singular spectrum analysis (SSA) was integrated with an ANN-based classifier to perform fault detection and diagnosis (Muruganatham 2013). Feature extraction using existing time series methods can be affected by noise and sample sizes, but the proposed method overcame this drawback and achieved a high classification rate. A feature extraction method based on empirical mode decomposition (EMD) was combined with an ANN-based classifier to categorize bearing defects (Ben Ali 2015). The authors showed that the ANN can perform degradation assessment of the bearing condition automatically without human intervention. Kankar et al. (2012) compared the effectiveness of an ANN and a support vector machine (SVM) in detecting faults in a rotor bearing system. They used statistical methods to extract features that were input to both the ANN and SVM classifiers. The results showed that the ANN classifier achieved higher accuracy than the SVM. An adaptive algorithm based on the wavelet transform was applied to perform feature extraction, and the extracted features were input to an ANN and a KNN classifier, where the ANN was more effective in classifying bearing faults (Gunerkar et al. 2019). An effective condition monitoring technique is also useful for enhancing a machining process.
Siddhpura and Paurobally (2013) presented a review of different classifiers (such as ANN, fuzzy logic, neuro-fuzzy, hidden Markov model and SVM) used for predicting tool wear in metal cutting. Among them, the ANN was the most frequently used classifier due to its noise tolerance and high fault adaptability. A hybrid machine-learning classifier combining a Gaussian classifier and an adaptive resonance theory (ART) neural network was proposed to implement online tool wear monitoring (Wang et al. 2013). This hybrid machine-learning classifier could memorize new knowledge during online training and achieved high classification accuracy. An ANN classifier was demonstrated to outperform SVM and KNN (Hesser and Markert 2019) in monitoring the tool wear condition of a retrofitted CNC milling machine. Although ANNs have been shown to be effective in various condition monitoring and fault detection applications, a drawback is that their performance relies heavily on the parameters of the architecture, such as the number of hidden nodes and the connection weights. A survey on different combinations of ANNs and evolutionary algorithms (EAs) was first made by Yao (1999), who explained that the connection weights, learning rules and architecture of ANNs could be evolved using EAs. A decade later, Azzini and Tettamanzi (2011) extended Yao's survey by adding the most recent related literature. Ding et al. (2013) reviewed and presented some of the problems associated with the integration of ANNs with EAs. In the machine learning community, many metaheuristic algorithms have been integrated with ANNs to enhance learning and achieve better generalization performance (Devikanniga et al. 2019).
On the other hand, although a number of comprehensive reviews on the application of ANNs in condition monitoring and fault detection are available in the literature, work on integrating ANNs with metaheuristic algorithms for condition monitoring and fault detection is relatively scarce. The genetic algorithm (GA) is one of the popular evolutionary algorithms applied to fault detection. A dual GA loop has been applied to optimize the structure and the connection weights of a three-layered feed-forward ANN for fault detection in power systems (Bi et al. 2000). Besides, GA has been used together with the Levenberg-Marquardt (LM) algorithm to train an ANN in an electrical machine fault detection application. The proposed hybrid of GA and LM overcame the weak local search ability of GA (Zaiping et al. 2008). In addition, the performance of various computing methods including ANN, fuzzy logic, GA and Ant Colony Optimization (ACO), and their hybrid models, was compared in predicting accidents caused by repair and maintenance in oil refineries (Zaranezhad et al. 2019). Among those computing models, ANN-GA outperformed the rest. Three methods, namely the multi-layer perceptron (MLP) neural network, the radial basis function (RBF) neural network, and KNN, were hybridized with GA for detecting gear faults (Lei 2010). The results showed that the hybrid method performed better than the single machine learning methods. Apart from evolutionary algorithms, swarm-based metaheuristic algorithms are also often applied to train ANNs in fault detection applications. For example, particle swarm optimization (PSO) was hybridized with an ANN for predicting drilling fluid density (Ahmadi 2018). The results showed that PSO-ANN could achieve better performance than a fuzzy inference system (FIS) and GA-FIS.
PSO was also proposed to optimize the weights and threshold parameters of a back propagation neural network (BPNN) that was applied to drilling fluid density prediction (Zhou et al. 2016). In fault detection for a multilevel inverter, GA and PSO were applied separately to train an ANN in order to optimize its connection weights. PSO tended to predict more accurately and faster than GA (Sivakumar and Parvathi 2014). On the other hand, four metaheuristic algorithms, namely PSO, GA, Tabu Search (TS) and Cuckoo Search (CS), have been applied to optimize the weight and bias values of an ANN in a multilevel inverter fault diagnosis application (Manjunath and Kusagur 2018). A modified evolutionary PSO with time-varying acceleration coefficients (MEPSO-TVAC) was proposed to optimize the regression coefficient of the ANN before performing transformer fault detection (Illias et al. 2016). Fault diagnosis performance can be enhanced by using a feature extraction method in addition to a decision-making tool. Kernel linear discriminant analysis (KLDA) was proposed to extract the optimal features from a fault dataset of analog circuits, and PSO was applied to tune the ANN parameters and structure (Xiao and Feng 2012). ACO was proposed to optimize the weights of an ANN in a rolling bearing fault detection application (Shi 2010). An improved Gravitational Search Algorithm (GSA) was used to optimize the weight and bias settings of an ANN trained by the back propagation algorithm in a machine vibration application (Do 2017). The performance of a metaheuristic algorithm can be affected by its exploitation and exploration characteristics. Ideally, a metaheuristic algorithm should achieve a good balance between exploration and exploitation. One way to tackle this issue is to hybridize two or more metaheuristic algorithms in order to balance the two modes of search.
A novel fault diagnosis model for sensor nodes was proposed by optimizing the weights of a feed-forward ANN using a hybrid metaheuristic algorithm of GSA and PSO (Khilar and Dash 2020). Table 1 summarizes the aforementioned state-of-the-art research on metaheuristic ANNs (Bi et al. 2000; Zaiping et al. 2008; Zaranezhad et al. 2019; Lei 2010; Ahmadi 2018; Zhou et al. 2016; Sivakumar and Parvathi 2014; Manjunath and Kusagur 2018; Illias et al. 2016; Xiao and Feng 2012; Shi 2010; Do 2017; Khilar and Dash 2020) in condition monitoring and fault detection. All of these metaheuristic ANNs optimize the ANN's parameters (i.e., centers/widths, connection weights, the number of hidden nodes or the structure) from randomly initialized settings. In contrast, in this paper, we introduce a hybrid model of a radial basis ANN and the HS algorithm (RBFN-DDA-HS) that combines the advantage of incremental learning in RBFN-DDA with search and adaptation in HS. RBFN-DDA performs incremental learning wherein its structure expands by adding new hidden units to include new information. As such, it learns information (a knowledge base) directly from a dataset, and its training can reach stability in a shorter time than gradient-descent-based methods. In this case, the knowledge base represents solutions to the problem at hand and is not initialized randomly (as is done in other metaheuristic ANNs). To achieve optimal RBFN-DDA performance, HS is proposed to further optimize the knowledge base, which is represented as the set of centers and widths of all hidden units in a trained RBFN.

RBFN-DDA
Learning in RBFNs is governed by many parameter settings, such as the number of nodes in the hidden layer, the type of activation function, and the center and width of each neuron. Normally, in a fixed architecture, the number of nodes in the hidden layer must be determined before the training process begins. It is important to determine the optimal number of neurons, since redundant hidden neurons can result in poor generalization and overlearning. On the other hand, an insufficient number of hidden neurons can result in inadequate learning of information from the data (Liu 2004). To enhance the performance of the network by determining the optimal number of hidden neurons, the Dynamic Decay Adjustment (DDA) algorithm is applied. Two unique characteristics of the DDA algorithm are the constructive nature of the probabilistic extension of the restricted Coulomb energy algorithm (P-RCE) (Hudak 1992) and the independent adaptation of each prototype by referring to a decay factor. An area of conflict is defined during training by using two thresholds, a positive threshold $\theta^{+}$ and a negative threshold $\theta^{-}$, as shown in Fig. 2 (Berthold and Diamond 1998). $\theta^{+}$ determines the lower bound on the activation value for training patterns of the correct class so that no new neuron is committed, while $\theta^{-}$ represents the upper bound on the activation value that a neuron may produce for patterns of a conflicting class. Thus, the area of conflict refers to those sections where neither matching nor conflicting training patterns are allowed to reside. The algorithm constructs the RBFN structure dynamically during the training process, based on the idea that new hidden neurons are inserted into the hidden layer only when necessary. With this growing structure, the RBFN can reach stability in a shorter time than ANNs trained with gradient-descent-based methods. Besides, the DDA algorithm computes the decay factor based on neighbors' information.
Table 1 Summary of metaheuristic ANNs in condition monitoring and fault detection (method | optimization approach | reference)
ANN-GA-LM | GA is proposed to optimize the number of hidden neurons, with the initial value set by experience. LM is then used to further optimize the ANN's connection weights from random initialization | Zaiping et al.
ANN-GA/ANN-ACO | GA/ACO is used to optimize the weights of the neurons in the hidden layer from random initialization | Zaranezhad et al.
Multiple classifier-GA | GA is applied to optimize the weights of multiple classifiers from random initialization | Lei (2010)
PSO-ANN/GA-FIS | GA/PSO is applied to optimize the relevant parameters of the FIS/ANN from random initialization | Ahmadi (2018)
PSO-BPNN | PSO is used to optimize the network weights and threshold parameters; the number of hidden neurons is set manually and the connection weights are randomly initialized before PSO optimization |

An RBFN trained with the DDA algorithm (RBFN-DDA) consists of three layers, namely, the input, hidden and output layers (Fig. 3). The input layer corresponds to the network input features, which represent the dimensionality of the input space, and this layer is fully connected to the hidden layer. The hidden layer of the RBFN-DDA consists of units with radial Gaussian activation functions, and these units are added during training only when necessary. Each hidden RBF unit is connected to only one output unit. The number of output units is determined by the number of possible classes. The output unit with the highest activation determines the class value. During DDA training, all RBF weights are first set to zero to avoid any duplication of the information about training samples. Then, all training samples are presented to the network. If a new training sample is classified correctly, the weight of the RBF unit of the correct class is increased by one. On the other hand, if the training sample is misclassified, a new RBF unit with an initial weight of one is introduced. The new hidden RBF unit is assigned a reference vector that is the same as the new training instance. The last step of the algorithm reduces the radii of all conflicting RBF units in a width shrinking process. The DDA algorithm defines an area of conflict by referring to two thresholds (the positive threshold $\theta^{+}$ and the negative threshold $\theta^{-}$). In this study, $\theta^{+}$ and $\theta^{-}$ are set to 0.4 and 0.2, respectively. To commit a new RBF unit, the activation of the existing RBFs of the correct class must not be above the positive threshold $\theta^{+}$, and during shrinking, the width of an RBF unit of a conflicting class is reduced so that its activation is not above the negative threshold $\theta^{-}$. Figure 4 shows the RBFN-DDA training in a single epoch. Usually, the training process of RBFN-DDA involves several epochs before completion. However, in this study, the center and radius of each neuron are subsequently optimized using HS, and thus the RBFN-DDA training is run for one epoch only.

Harmony search (HS)
Harmony search (HS), a music-inspired algorithm, was introduced to solve optimization problems (Geem et al. 2001). Musicians always search for an ideal state of harmony in their performances. Music improvisation is executed iteratively to obtain an optimal harmony by considering three rules (Geem et al. 2001): (1) memory consideration: a new harmony is improvised from the existing harmony memory (HM); (2) pitch adjustment: a new harmony is improvised by slightly adjusting the pitch; (3) randomization: a new harmony is improvised on a random basis. HS mimics a musician's search for an ideal state of harmony in order to find the best solution to an optimization problem. The algorithm is widely applied to optimization problems and to the training of ANNs. In Geem et al. (2002), HS was employed to solve several water resources problems, and it gave better near-optimal solutions with faster convergence in most problems. HS was also employed to optimize the parameters of a multilayer feed-forward ANN, such as the number of neurons in the hidden layer, the learning rate, and the momentum rate, in predicting the AC power from a photovoltaic system (Kassim et al. 2014). HS was applied to determine the optimal initial weights of an ANN for predicting the stability number of breakwater armor stones, and it was more efficient than the conventional ANN model (Lee et al. 2016). The HS algorithm was used to predict the best ANN structure in a financial fraud detection application, where it achieved high accuracy (Daliri 2020). A study showed that HS could achieve better overall recognition performance than BP or GA in a feed-forward ANN (Kattan and Abdullah 2013). The ability of HS to explore the search space is highly dependent on pitch adjustment and randomization. Pitch adjustment ensures that a new solution is improved from existing good solutions, whereas randomization provides a search for new solutions within the search space.
The harmony memory size (HMS), harmony memory considering rate (HMCR), pitch adjusting rate (PAR), and the termination criterion must be specified. The ability of HS to find the optimal solution in a search space relies heavily on the harmony memory acceptance rate. The following HS parameter ranges are recommended to produce an optimal solution (Lee 2005): 0.70-0.95 for HMCR, 0.20-0.50 for PAR, and 10-50 for HMS. The procedure of the HS algorithm is shown in Fig. 5.
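The three improvisation rules and the parameters above can be illustrated with a minimal Python sketch of a generic HS loop that maximizes an objective over box bounds. It is a sketch under stated assumptions rather than the procedure of Fig. 5: the pitch bandwidth `bw`, the uniform random initialization, the default values (chosen inside the recommended ranges) and the function name are hypothetical.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3,
                   bw=0.05, max_improvisations=2000, seed=0):
    """Maximize `objective` over `bounds`, a list of (low, high) pairs."""
    rng = random.Random(seed)
    # Harmony memory: HMS randomly initialized solution vectors and their fitness.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(max_improvisations):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = rng.choice(memory)[j]        # (1) memory consideration
                if rng.random() < par:               # (2) pitch adjustment
                    value += rng.uniform(-1.0, 1.0) * bw * (hi - lo)
            else:
                value = rng.uniform(lo, hi)          # (3) randomization
            new.append(min(hi, max(lo, value)))      # keep the value inside bounds
        new_score = objective(new)
        worst = min(range(hms), key=scores.__getitem__)
        if new_score > scores[worst]:                # replace the worst harmony
            memory[worst], scores[worst] = new, new_score
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]
```

For instance, calling `harmony_search(lambda v: -(v[0] - 0.3) ** 2, [(0.0, 1.0)])` searches for the value closest to 0.3 in the unit interval.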

The proposed hybrid RBFN-DDA-HS
In the literature, many training algorithms have been proposed to train RBFNs, including the gradient descent algorithm (GD) (Simon 2002). However, these methods exhibit poor convergence and are time-consuming before obtaining an optimal solution (Kurban and Beşdok 2009). Evolutionary algorithms such as GA have been applied to optimize RBFNs (Barreto et al. 2002). GA performs a robust search that avoids becoming stuck in local minima. However, its search is computationally expensive (Hamadneh 2012). On the other hand, various global optimization algorithms based on other metaheuristics have been introduced to train RBFNs for problems in different application domains. These global optimization algorithms include the particle swarm optimization (PSO) algorithm (Liu 2004), the artificial immune system (AIS) algorithm (Castro and Zuben 2001), the differential evolution (DE) algorithm (Yu and He 2006), and the firefly algorithm (FA) (Horng et al. 2012). The purpose of applying these algorithms is to find the best settings of the center and the width of the hidden units in an RBFN to achieve optimal network performance (Simon 2002). In this paper, HS is proposed to optimize the center and the width of each hidden unit in a trained RBFN to optimize its recognition performance. HS is adopted in this work because a conventional EA such as GA uses two parental vectors to generate new solution vectors, whereas HS involves all existing vectors. Therefore, HS has a higher ability to obtain better solutions than those conventional EAs (Mahdavi et al. 2007). The training procedure of the RBFN-DDA-HS model can be summarized as follows:
a. RBFN-DDA training. The proposed model begins with the training of the RBFN with the DDA algorithm as described in Sect. 3.1 by using a training data set. After completing the RBFN-DDA training process, the trained solution vector is formed as the set of centers and radii of the hidden units, $(z_{train}, r_{train})$.
b. Defining the objective function and setting the HS parameters. The fitness of a solution vector is evaluated in terms of accuracy, and hence the objective function is defined as $f(z_i, r_i) = C(z_i, r_i) / N$, where $C(z_i, r_i)$ is the number of training data correctly classified by $(z_i, r_i)$, $(z_i, r_i)$ is the trained center and radius of a hidden node, and $N$ is the number of training data. The HS parameter settings are listed in Table 2. Maximum improvisation refers to the largest number of iterations of the evolution in HS. The maximum improvisation of HS is commonly set to a large value, sometimes greater than 10,000, to achieve high accuracy rates. The reason is that the initial solution of HS is set randomly because prior knowledge about the problem is not known, and a high maximum improvisation setting allows HS to search long enough to reach good accuracy rates. However, in this study, the initial harmony memory is not set randomly; the search starts from the knowledge of the problem learned by the RBFN-DDA. RBFN-DDA, which performs incremental learning, can absorb information about the problem from the data and provide a high-quality initial solution. HS is employed to explore around this solution with the aim of achieving high accuracy rates. Notably, the number of iterations of the HS component in RBFN-DDA-HS is set to a small number to avoid overtraining. Another advantage of RBFN-DDA-HS is that it can shorten the computation time compared with performing the search using HS alone.
c. Forming an initial harmony memory from a trained RBFN. An initial population, called the harmony memory (HM), is generated from the set of parameters of a trained RBFN instead of being generated randomly. Following Sect. 3.2, the harmony memory size (HMS) is set to 10.
Thus, 10 solution vectors are stored in the HM matrix $(z, r)$, where $(z_t, r_t)$ are, respectively, the centers and radii of the trained RBFN and $m$ is the number of hidden units in the RBFN. The other solution vectors $(z_i, r_i)$ are developed from the trained solution vector: a Relative Multiple Factor (RMF) value $\in [0, 1]$ is applied to the trained solution vector to control the variation of the developed centers and widths. All solution vectors stored in the harmony memory are then evaluated using the objective function defined in step (b).

d. Evolving solution vectors. A new solution vector $(z_{new}, r_{new})$ is generated either by slightly adjusting an existing solution candidate in the HM or by creating a new solution candidate at random, as described in Sect. 3.2. The fitness of the new solution vector is computed as in step (b). If the new solution is better than the worst one in the HM, it is included in the HM and the worst one is removed.
e. The procedure either continues as in step (d) or terminates once the maximum number of improvisations has been reached. A flowchart of the RBFN-DDA-HS model is shown in Fig. 6.
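Steps (b) to (e) can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not taken from the paper: the classifier uses unit weights for all hidden units, the RMF is applied as a random multiplicative factor in [1 - RMF, 1 + RMF], and the values of `rmf`, `hmcr`, `par`, `bw` and `max_improvisations` are placeholders (only HMS = 10 comes from the text). All function names are hypothetical.

```python
import math
import random

def unpack(vec, dim):
    """Split a flat solution vector into per-unit centers and radii."""
    step = dim + 1
    units = [vec[i:i + step] for i in range(0, len(vec), step)]
    return [u[:dim] for u in units], [u[dim] for u in units]

def predict(x, vec, labels, dim):
    """Class with the largest summed Gaussian activation (unit weights assumed)."""
    centers, radii = unpack(vec, dim)
    scores = {}
    for z, r, c in zip(centers, radii, labels):
        d2 = sum((a - b) ** 2 for a, b in zip(x, z))
        scores[c] = scores.get(c, 0.0) + math.exp(-d2 / max(r * r, 1e-12))
    return max(scores, key=scores.get)

def fitness(vec, labels, X, y):
    """Step (b): training accuracy, i.e. correctly classified samples / N."""
    dim = len(X[0])
    hits = sum(predict(x, vec, labels, dim) == t for x, t in zip(X, y))
    return hits / len(X)

def init_memory(trained, hms, rmf, rng):
    """Step (c): trained vector plus hms - 1 copies whose entries are each
    scaled by a random factor in [1 - rmf, 1 + rmf] (assumed RMF scheme)."""
    memory = [list(trained)]
    for _ in range(hms - 1):
        memory.append([v * (1.0 + rng.uniform(-rmf, rmf)) for v in trained])
    return memory

def improvise(memory, bounds, hmcr, par, bw, rng):
    """Step (d): build one new vector, variable by variable."""
    new = []
    for j, (lo, hi) in enumerate(bounds):
        if rng.random() < hmcr:
            v = rng.choice(memory)[j]          # memory consideration
            if rng.random() < par:
                v += rng.uniform(-bw, bw)      # pitch adjustment
        else:
            v = rng.uniform(lo, hi)            # random selection
        new.append(min(hi, max(lo, v)))        # keep the value inside its bounds
    return new

def refine(trained, labels, X, y, bounds, hms=10, rmf=0.1, hmcr=0.9,
           par=0.3, bw=0.01, max_improvisations=100, seed=0):
    """Steps (c)-(e): return the best vector found and its training accuracy."""
    rng = random.Random(seed)
    memory = init_memory(trained, hms, rmf, rng)
    scores = [fitness(v, labels, X, y) for v in memory]
    for _ in range(max_improvisations):
        cand = improvise(memory, bounds, hmcr, par, bw, rng)
        score = fitness(cand, labels, X, y)
        worst = min(range(hms), key=scores.__getitem__)
        if score > scores[worst]:              # replace the worst harmony
            memory[worst], scores[worst] = cand, score
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

# Toy 2-D problem with two hidden units, one per class (hypothetical values).
X = [[0.1, 0.1], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
y = [0, 0, 1, 1]
trained = [0.3, 0.3, 0.2, 0.7, 0.7, 0.2]       # (z1, r1, z2, r2) flattened
labels = [0, 1]
bounds = [(0.0, 1.0), (0.0, 1.0), (0.01, 1.0)] * 2
best, acc = refine(trained, labels, X, y, bounds)
```

Because the trained vector is itself a member of the initial harmony memory and only the worst member is ever replaced, the best accuracy in memory cannot fall below that of the trained RBFN-DDA in this sketch.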

Experiments and results
Fault detection is essential to ensure the effectiveness and reliability of machinery. In rotating machinery, the bearing has been identified as one of the main sources of faults. Hence, bearing fault detection has attracted great attention from researchers. In addition, extensive research has focused on data classification approaches to fault detection in the manufacturing industry, such as tool wear monitoring and machining parameter prediction, as described in Sect. 2.2. Condition monitoring and fault detection techniques are applied in the power generation industry to avoid sudden breakdowns, which may result in costly repairs and machine unavailability. The effectiveness of the proposed model is evaluated on two benchmark fault detection problems: bearing fault classification and steel plate fault detection. This study also applies the proposed RBFN model to condition monitoring in a real case study, which is a circulating water (CW) system in a power generation plant. The dataset in each problem was randomly divided into a training set and a testing set. All attributes in the dataset were normalized between 0 and 1. To compare the classification performances of RBFN-DDA and RBFN-DDA-HS statistically in monitoring the operating condition of the CW system, a Wilcoxon signed rank test was employed at a significance level of $\alpha = 0.05$. The classification performance of the proposed RBFN-DDA-HS was also compared with other machine learning methods in all three problems; the results of these methods are taken from Tan et al. (2007), Kavathekar et al. (2016), Tan and Lim (2015) and Wong (2015), respectively.
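The data preparation described above (scaling every attribute to [0, 1] and randomly dividing the records) might look like the following minimal sketch. The function names, the handling of constant columns and the use of a fixed seed are assumptions made for illustration; the text does not state whether the scaling statistics were computed on the whole dataset or on the training part only.

```python
import random

def min_max_normalize(rows):
    """Rescale each attribute (column) to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - lo for c, lo in zip(cols, lows)]
    return [[(v - lo) / s if s else 0.0 for v, lo, s in zip(row, lows, spans)]
            for row in rows]

def random_split(rows, labels, train_fraction, seed):
    """Shuffle sample indices and cut them into training and testing parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(rows)))

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(idx[:cut]), pick(idx[cut:])

# Tiny made-up example with two attributes and four records.
data = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0], [8.0, 40.0]]
classes = [0, 1, 0, 1]
scaled = min_max_normalize(data)
(train_X, train_y), (test_X, test_y) = random_split(scaled, classes, 0.75, seed=0)
```

Repeating `random_split` with different seeds gives the different training and testing partitions needed for averaged results.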

Fault detection applications
Benchmark dataset 1: Bearing data set
The RBFN-DDA-HS is applied to classify bearing faults. The benchmark dataset is taken from Case Western Reserve University (CWRU) (Loparo 2003). The data were generated from a setup consisting of a 2 HP motor, a dynamometer and a bearing support (Fig. 7). Data were collected from an accelerometer at a sampling rate of 48,000 Hz, i.e., 48,000 samples per second. The load was varied from 0 to 3 HP, with fault dimensions varying from 0.1761 to 0.7044 mm. Ten features were extracted from the drive end signal: mean, standard deviation, variance, root mean square value, skewness, kurtosis, minimum value, peak value, crest factor and form factor. The bearing dataset contains 55 records that indicate four types of bearing condition: healthy bearing, inner race (IR) defect, ball defect and outer race (OR) defect. Details of this dataset are explained by Vakharia et al. (2016).
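The ten time-domain features named above can be computed from a vibration segment as in the sketch below. The paper does not give the formulas, so common textbook definitions are assumed here: population (divide-by-n) moments, peak as the maximum absolute amplitude, crest factor as peak divided by RMS, and form factor as RMS divided by the mean absolute value. The function name is hypothetical.

```python
import math

def time_domain_features(signal):
    """Return the ten statistical features of one signal segment as a dict."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((v - mean) ** 2 for v in signal) / n      # population variance
    std = math.sqrt(var)
    rms = math.sqrt(sum(v * v for v in signal) / n)
    m3 = sum((v - mean) ** 3 for v in signal) / n       # third central moment
    m4 = sum((v - mean) ** 4 for v in signal) / n       # fourth central moment
    peak = max(abs(v) for v in signal)                  # assumed: max |amplitude|
    mean_abs = sum(abs(v) for v in signal) / n
    return {
        "mean": mean,
        "std": std,
        "variance": var,
        "rms": rms,
        "skewness": m3 / std ** 3 if std else 0.0,
        "kurtosis": m4 / std ** 4 if std else 0.0,      # non-excess kurtosis
        "minimum": min(signal),
        "peak": peak,
        "crest_factor": peak / rms if rms else 0.0,
        "form_factor": rms / mean_abs if mean_abs else 0.0,
    }

# A square-wave-like toy segment, used only to exercise the function.
feats = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

Each record of the bearing dataset would then correspond to one such ten-element feature vector paired with its fault class.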

Benchmark dataset 2: Steel plate data set
The data set for the steel plate fault detection problem can be downloaded from the UCI web repository (Lichman et al. 2013). A total of 27 numerical attributes of a steel-plate image are used to classify each sample into one of seven types of steel plate fault, namely pastry, Z-scratch, K-scratch, stains, dirtiness, bumps and others. The experiment is repeated 30 times with different training data to obtain average results. The data set consists of 1941 records, which are divided into a training set of 1457 samples and a test set of 484 samples.

The proposed RBFN-DDA-HS is also applied to perform condition monitoring by learning and classifying a set of real data collected from a CW system in a power generation plant in Penang, Malaysia. The CW system provides cooling water continuously to the main turbine condenser to condense steam from the turbine exhaust (Berhad and System description and operating procedures. 1999). The overall efficiency of the water-steam cycle in the power plant relies heavily on the operating condition of the CW system. An intelligent system such as RBFN-DDA-HS could help monitor the operating conditions of the CW system and reduce the frequency of unexpected breakdowns. Figure 8 shows an overview of the CW system. The CW system includes all piping and equipment (such as condensers and the drum strainer) between the seawater intake and the outfall where water is returned to the sea. The method of seawater processing and how seawater is transferred into the CW system are described in Tan and Lim (2004). The database was established under a targeted power generation of 80 MW. Every 5 min, an input sample of 12 temperature and pressure measurements was collected at the inlet and outlet points of the condenser. The operating conditions of the CW system were categorized into four classes, which are listed in Table 3. The database had a total of 2500 input samples.

Performance comparison between RBFN-DDA and RBFN-DDA-HS
The experiment was conducted by repeatedly training the classifiers using different training and testing data before computing the average classification results. We followed the experimental setups of Kavathekar et al. (2016), Tan and Lim (2015) and Tan et al. (2007). Accordingly, for the steel plate problem (Tan and Lim 2015) and the CW system case study (Tan et al. 2007), the experiment was repeated thirty and ten times, respectively. For the bearing dataset (Kavathekar et al. 2016), however, the number of repetitions of training and testing was not mentioned. In our work, we repeat the experiment using RBFN-DDA and RBFN-DDA-HS thirty times. The average training and testing accuracies of RBFN-DDA and RBFN-DDA-HS are shown in Table 4. RBFN-DDA-HS achieves higher accuracy rates than RBFN-DDA on all three datasets. These results indicate that the performance of RBFN-DDA improves after its learning is integrated with the search and adaptation process of the HS algorithm.

Table 3 Operating conditions of the CW system
Class 1: Heat transfer in the condenser is efficient and there is no significant blockage in the piping system
Class 2: Heat transfer in the condenser is not efficient and there is no significant blockage in the piping system
Class 3: Heat transfer in the condenser is efficient and there is significant blockage in the piping system
Class 4: Heat transfer in the condenser is not efficient, and there is significant blockage in the piping system
A Wilcoxon signed rank test (Woolson 2007) is applied to compare statistically the classification performance of RBFN-DDA and RBFN-DDA-HS at a significance level of $\alpha = 0.05$ in all three problems. In this test, the null hypothesis is that the testing accuracy of RBFN-DDA-HS is the same as that of RBFN-DDA. The alternative hypothesis is that the testing accuracy of RBFN-DDA-HS is different from that of RBFN-DDA. As shown in Table 5, all p values are smaller than 0.05. This means that the classification performances of RBFN-DDA and RBFN-DDA-HS in terms of testing accuracy are statistically different. In this regard, RBFN-DDA-HS achieves better accuracy rates than RBFN-DDA in handling the three tasks related to condition monitoring and fault detection.
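For readers who wish to reproduce this kind of comparison, a two-sided Wilcoxon signed rank test on paired accuracies can be sketched as follows. This version uses the normal approximation with a tie correction and no continuity correction, which is an assumption; the paper does not say which variant was used, and for only ten repetitions an exact test may be preferable. The accuracy values in the example are made up for illustration and are not results from this study.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed rank test via the normal approximation.

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums and p is the approximate two-sided p value.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]     # discard zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                           # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(var)
    return w, math.erfc(abs(z) / math.sqrt(2))

# Illustrative paired accuracies from repeated runs (made-up numbers).
baseline = [0.70, 0.72, 0.68, 0.71, 0.69, 0.73, 0.70, 0.72, 0.71, 0.69]
hybrid = [0.75, 0.78, 0.70, 0.77, 0.76, 0.74, 0.79, 0.80, 0.72, 0.78]
w, p = wilcoxon_signed_rank(hybrid, baseline)
reject_null = p < 0.05
```

The null hypothesis of equal testing accuracy is rejected when the returned p value falls below the chosen significance level.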

Performance comparison with other machine learning methods
The classification performance of RBFN-DDA-HS is compared with other machine learning methods in all three problems, with the accuracy rates of those methods taken from Kavathekar et al. (2016), Tan and Lim (2015) and Tan et al. (2007). Table 6 shows the results. For the bearing dataset, the result of RBFN-DDA-HS is compared with machine learning classifiers equipped with a feature selection algorithm (i.e., Random Forest and Rotation Forest) and without a feature selection algorithm (i.e., ANN, SVM, Decision Tree, RBFN-DDA) (Kavathekar et al. 2016). The classification accuracy of RBFN-DDA (i.e., 64.35%) is higher than that of the other classifiers without feature selection, such as ANN, SVM and Decision Tree, but lower than that of the classifiers with feature selection (i.e., Random Forest and Rotation Forest). When the RBFN-DDA classifier is integrated with HS, the proposed RBFN-DDA-HS model outperforms all other classifiers by achieving the highest classification accuracy, i.e., 78.81%. Next, for the steel plate dataset, the classification performance of RBFN-DDA-HS is compared with a metaheuristic classifier (FAM-GSA) and other single machine learning classifiers (FAM, MLP and RBF) (Tan and Lim 2015). Based on the results in Table 6, the testing accuracy of RBFN-DDA is higher than that of the other single machine learning classifiers. Note that FAM-GSA contributes only a small improvement in testing accuracy, i.e., 0.9% over FAM. Both FAM-GSA and RBFN-DDA-HS are metaheuristic classifiers. In comparison, the proposed RBFN-DDA-HS is more effective than FAM-GSA in detecting steel plate faults: HS improves the testing accuracy of RBFN-DDA-HS by 2.2% over RBFN-DDA. The performance of RBFN-DDA-HS on the CW system is compared with fuzzy ARTMAP (FAM), rectangular basis function network (RecBFN), RBF-based Extreme Learning Machine and RBF-based Constrained Optimization Extreme Learning Machine (C-ELM).
According to the results in Table 6, the testing accuracy of RBFN-DDA is the lowest among the machine learning methods compared. However, when RBFN-DDA is integrated with the HS algorithm to form RBFN-DDA-HS, its testing accuracy improves greatly and it outperforms all other machine learning methods. This shows that the HS algorithm is effective in searching for the optimal parameter settings.

Summary
In this study, a hybrid classification model combining RBFN-DDA and the HS algorithm is proposed to improve the RBFN-DDA learning parameters. HS is adopted because of its simplicity, high flexibility and search efficiency. The effectiveness of the proposed algorithm is demonstrated on the condition monitoring of a circulating water (CW) system and on two fault detection benchmark datasets, namely bearing and steel plate. The proposed RBFN-DDA-HS achieves the highest classification accuracy compared with the other machine learning methods. The results show that RBFN-DDA-HS improves testing accuracy over RBFN-DDA by between 2.2 and 22.5%. In addition, the results of the Wilcoxon signed rank test show that the classification performances of RBFN-DDA and RBFN-DDA-HS are statistically different in all three condition-monitoring case studies, with the latter outperforming the former. The HS algorithm is a music-inspired metaheuristic algorithm. In the future, the classification performance of RBFN-DDA integrated with other types of metaheuristic algorithm (e.g., biologically, physically or chemically inspired) will be investigated. Future research may also focus on combining RBFN-DDA with two metaheuristic algorithms, since integrating two metaheuristic algorithms may explore the search space for optimal solutions more efficiently and effectively.