A hybrid remaining useful life prediction method for cutting tool considering the wear state

Accurate prediction of the remaining useful life (RUL) of cutting tools is important for guaranteeing the cutting quality and minimizing the production cost. Recently, physics-based and data-driven methods have been widely used in tool RUL prediction. The physics-based approaches may not accurately describe the time-varying wear process because of limited knowledge of the underlying physics and the simplifications involved in physical models, while the data-driven methods are easily affected by the quantity and quality of data. To overcome the drawbacks of these two approaches, a hybrid prognostics framework considering the tool wear state is developed to achieve an accurate prediction. Firstly, the mapping relationship between the sensor signal and the tool wear is established by support vector regression (SVR). Secondly, the tool wear stages are recognized by a support vector machine (SVM), and the results are put into a Bayesian framework as prior information. Thirdly, based on the constructed Bayesian framework, the tool wear model parameters are updated iteratively by the sliding time window and particle filter algorithm. Finally, the tool wear state space and RUL are predicted using the updated tool wear model. The validity of the proposed method is demonstrated by a high-speed machine tool experiment. The results show that the presented approach can effectively reduce the uncertainty of tool wear state estimation and improve the accuracy of RUL prediction.


Introduction
Cutting tools play a critical role in mechanical manufacturing. When the tool wear exceeds the failure threshold, the machining quality of the workpiece is significantly reduced, and the workpiece may even be scrapped if the cutting tool is not replaced in time [1]. The machine downtime caused by abnormal tools accounts for 10-40% of the total machine downtime [2]. Besides, replacing cutting tools that have not reached their service life is a waste of resources and time [3]. An accurate prediction of the tool RUL can therefore safeguard the machining quality, reduce the manufacturing cost, and improve production efficiency.
Generally, tool RUL prediction methods can be divided into physics-based methods, data-driven methods, and hybrid methods. Physics-based methods conduct the tool RUL prediction based on physical models, which describe the tool wear process using physical or first principles [4]. According to the wear mechanism, the tool wear is strongly affected by contact stresses, the relative sliding velocity of the interface, and cutting temperature [5]. The tool wear process is generally expressed as ordinary differential equations or partial differential equations [6,7]. Therefore, the RUL of the tool can be predicted according to the physics-based models. Malakizadi et al. [8] combined a thermodynamic model with finite element simulation and put forward a wear prediction method for cemented carbide tools. By studying the physical characteristics of the wear process, Pálmai [9] built a wear rate model that considered the technological parameters of cutting and the change of tool flank temperature. Other researchers used the finite element method to characterize the tool wear mechanism [5,10]. However, it is difficult to establish an accurate mechanism model because of uncertainties and nonlinearities in the machining processes, such as manufacturing errors and material properties. Therefore, prediction approaches based on physics-based models may not be suitable for different conditions without extensive tests.
In contrast to the physics-based methods, data-driven methods do not require physical knowledge to predict the RUL [11]. Historical data are used to train a data-driven model and characterize the regularity of the tool wear. Statistical modeling approaches describe the tool state space in the form of the probability distribution function, which can effectively deal with uncertainties in the machining process [12]. Therefore, some researchers used statistical models to characterize the tool wear process, for example, Wiener process [13], Gamma process [14], inverse Gaussian process [15]. It is often difficult to obtain sufficient offline measurement data in practical engineering, and the statistical modeling methods require many experimental data to identify the model parameters accurately. For another kind of data-driven approach, artificial intelligence methods such as machine learning can establish the relationship between the tool wear and sensor signal to achieve real-time RUL prediction [16]. Due to easy implementation, machine learning has become a hot topic in the tool wear and RUL prediction. Machine learning methods include artificial neural network (ANN) [17,18], support vector regression (SVR) [19,20], hidden Markov model (HMM) [21,22]. Additionally, deep learning models such as convolutional neural network (CNN) [23] and long short-term memory (LSTM) [24,25] are also widely used in the tool RUL prediction. However, the prediction accuracy of data-driven methods depends on the quantity and quality of historical data [26]. In practical engineering, it is often difficult to collect enough qualified data because of limited resources and time, leading to large RUL prediction errors.
The physics-based and data-driven approaches have their advantages and disadvantages, and there is no suitable method when historical data are limited [4]. By combining these methods, hybrid methods can intuitively take advantage of their strengths and improve the tool RUL prediction performance. Recently, many researchers have turned their attention to the tool RUL prediction on the hybrid methods. Sun et al. [27] combined the data-driven model with Wiener process for the hybrid tool RUL prediction. Wang et al. [28] proposed a cross physics-data fusion (CPDF) modeling strategy and established a physics-guided neural network model for the tool wear prediction. Particle filter, which is a sequential Monte Carlo method, can effectively combine physical information with online monitoring signal [29]. To improve the efficiency of particle filter method, Pang et al. [30] constructed a self-moving average model based on the transition and an enhanced particle filter method. Wang et al. [11] realized the prediction of the tool wear by recursively updating the physical tool wear rate through online measurement based on the particle filter. Although the particle filter method has achieved some success in the tool wear and RUL prediction, the traditional particle filter method still has some weaknesses. When adopting particle filter to predict tool RUL, prior probability distribution should be determined based on historical data. Thus, the accuracy of prior information may directly affect the accuracy of prediction results.
Some progress has thus been made in hybrid methods for tool RUL prediction. These methods achieve prognosis by combining different types of models. However, the tool wear process can be divided into three stages: initial wear, normal wear, and severe wear [31]. The data characteristics and wear trends of the different wear stages are quite different. Current methods (such as particle filtering based on the Bayesian framework) do not consider these wear stages and use a single probability distribution to quantify the uncertain model parameters, which leads to a large prediction error. Therefore, assigning different model parameters to different wear stages helps to improve the prediction accuracy. In this paper, a hybrid prediction method considering the tool wear state is developed to achieve an accurate prediction. SVR is first used to obtain the observed value of the tool wear. The wear stage recognition results based on SVM are then put into the Bayesian framework as prior information. Moreover, an algorithm that combines a sliding time window and a particle filter is proposed to update the wear model according to the observed wear states. Finally, the tool wear state space and RUL are predicted by the updated tool wear model.
The rest of the paper is organized as follows. Section 2 introduces the basic knowledge of the tool wear model and data-driven method, which provides the theoretical basis for the research. Section 3 describes the framework of the proposed hybrid approach. In Sect. 4, the proposed method is verified by experiments and compared with other methods. Finally, conclusions are illustrated in Sect. 5.

Tool wear model
In the area of metal cutting, the tool wear is generally divided into three stages: initial wear, normal wear, and severe wear. Because of the stress variation of the tool, the tool wear rate varies in its service stages. The change of the tool wear with the machining time is shown in Fig. 1.
The tool wear model usually describes the growth behavior of the tool wear with time. The wear width is a direct index of the severity of wear for the prediction of the tool wear. An empirical model, based on physical knowledge, expresses the relationship between the tool wear rate and the changes of the applied load [32]:

$$\frac{dx}{dt} = \frac{G}{H} N^{m} V_c \tag{1}$$

where $x$ represents the tool wear width, and the wear width growth rate $dx/dt$ is a power function of the load; $G$ represents the wear coefficient depending on the materials and temperatures; $H$ is the hardness of the softer material in the pair of tool and workpiece; $V_c$ is the velocity of rubbing, and $m$ is a constant depending on the nature of the removed layer; $N$ denotes the normal load on the surface [33].

As the tooling force increases with the wear width of the tool, the normal load $N$ is approximated as a linear function of the wear width $x$:

$$N = a x \tag{2}$$

in which $a$ is a coefficient, and Eq. (1) can be reformulated as follows:

$$\frac{dx}{dt} = \frac{G}{H} a^{m} V_c \, x^{m} \tag{3}$$

To simplify the model and improve computational efficiency, a new coefficient $c$ is used to replace the coefficients $(G/H)\,a^{m} V_c$, and Eq. (3) becomes:

$$\frac{dx}{dt} = c \, x^{m} \tag{4}$$

Since the measured wear values are discrete, the variables on both sides of Eq. (4) are separated and integrated, and Eq. (4) is rewritten into the system equation in the form of a state transition:

$$x_k = \left[ x_{k-1}^{\,1-m} + c\,(1-m)\,\Delta t \right]^{\frac{1}{1-m}} \tag{5}$$

where $x_k$ is the wear state at time $k$, $x_{k-1}$ is the wear state at time $k-1$, and $\Delta t$ is the time interval between the two. In order to represent the three stages of the tool wear process and reflect the stochastic nature of the wear process, distribution functions of the parameters $m$ and $c$ are established according to the historical data of the three wear stages.
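To make the state-transition form concrete, the following is a minimal Python sketch of one propagation step. It assumes the discretization obtained by separating variables in dx/dt = c x^m, namely x_k = [x_{k-1}^(1-m) + c(1-m)Δt]^(1/(1-m)); the function names `wear_step` and `simulate_wear` and the time step `dt` are illustrative choices, not part of the original method description.

```python
def wear_step(x_prev, c, m, dt=1.0):
    """One state-transition step of dx/dt = c * x**m.

    Assumes x_prev > 0 and m != 1 (the closed form is singular at m = 1).
    """
    if m == 1.0:
        raise ValueError("closed form is undefined for m = 1")
    base = x_prev ** (1.0 - m) + c * (1.0 - m) * dt
    if base <= 0.0:
        # Happens for m > 1 when the solution blows up within one step.
        raise ValueError("step leaves the domain of the closed-form solution")
    return base ** (1.0 / (1.0 - m))


def simulate_wear(x0, c, m, steps, dt=1.0):
    """Return the wear trajectory [x_0, x_1, ..., x_steps]."""
    trajectory = [x0]
    for _ in range(steps):
        trajectory.append(wear_step(trajectory[-1], c, m, dt))
    return trajectory
```

With m = 0 the step reduces to linear growth x_k = x_{k-1} + cΔt, which is a convenient sanity check. In the stochastic setting described in the text, c and m would be drawn from the stage-specific distributions rather than fixed.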

The data-driven methods
To have accurate predictions, two data-driven approaches are adopted in the proposed method. The data-driven classification method trained from historical data is adopted to recognize the tool wear state, and a data-driven regression method is used to establish an observation model for the mapping from the indirectly measured tool state to the observation. In this paper, SVM and SVR are applied due to the advantages of data processing, and the brief reviews about them are given below.

SVM
SVM is a pattern recognition technique based on statistical learning theory [34]. SVM adopts the principle of structural risk minimization to classify data by constructing an optimal hyperplane. It offers fast computation and strong generalization ability, and it can effectively deal with limited samples, nonlinearity, and high dimensionality [34].
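For the given training sample set $D = \{(\mathbf{x}_i, y_i)\}$, $i = 1, 2, \ldots, m$, with class labels $y_i \in \{-1, +1\}$, the following hyperplane is constructed in the sample space:

$$\boldsymbol{\omega}^{T}\mathbf{x} + b = 0 \tag{6}$$

where $\boldsymbol{\omega} = (\omega_1, \omega_2, \ldots, \omega_d)$ is the normal vector of the hyperplane, and $b$ is the offset. Based on the hyperplane, a classification function can be constructed as follows:

$$f(\mathbf{x}) = \operatorname{sign}\left(\boldsymbol{\omega}^{T}\mathbf{x} + b\right) \tag{7}$$

The problem of finding the optimal classification hyperplane can be transformed into a constrained quadratic optimization problem by jointly considering the structural risk minimization criterion, the regularization term, and the fitting error. A unique optimal solution can be obtained by solving the following problem:

$$\min_{\boldsymbol{\omega},\, b,\, \xi}\ \frac{1}{2}\|\boldsymbol{\omega}\|^{2} + C\sum_{i=1}^{m}\xi_i \quad \text{s.t.}\quad y_i\left(\boldsymbol{\omega}^{T}\mathbf{x}_i + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \tag{8}$$

where $C$ is the penalty factor, and $\xi_i$ represents the slack variable. Using the Lagrange multiplier method, the optimal classification function can be constructed as follows:

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i=1}^{m}\alpha_i y_i\, \mathbf{x}_i^{T}\mathbf{x} + b\right) \tag{9}$$

in which $\alpha_i$ is a Lagrange multiplier. For nonlinear classification, SVM replaces the inner product with a kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$, which maps the low-dimensional nonlinear data to a high-dimensional space. Since the radial basis function (RBF) intuitively reflects the distance between data points and gives a better classification effect than other kernel functions, RBF is used as the kernel function in this paper. The RBF expression is as follows:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^{2}}{2\sigma^{2}}\right) \tag{10}$$

where $\sigma > 0$ is the width of the kernel.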
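As an illustration of how the kernelized classification function is evaluated once the multipliers are known, the sketch below computes an RBF kernel and the sign of the resulting decision value. It assumes the common parameterization K(u, v) = exp(-||u - v||² / (2σ²)); the function names and the toy multipliers in the usage lines are hypothetical and are not obtained by solving the optimization problem.

```python
import math


def rbf_kernel(u, v, sigma=1.0):
    """RBF kernel exp(-||u - v||^2 / (2 * sigma^2)) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, labels, alphas, b, sigma=1.0):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b; returns +1 or -1."""
    score = sum(
        alpha * y * rbf_kernel(sv, x, sigma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score + b >= 0.0 else -1


# Toy usage with hand-picked (not optimized) multipliers.
support_vectors = [[0.0], [2.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]
print(svm_decision([1.9], support_vectors, labels, alphas, b=0.0))
```

In practice the multipliers and offset would come from a quadratic programming solver or an SVM library; only the evaluation step is sketched here. A three-stage wear classifier additionally needs a multi-class scheme such as one-vs-rest, which the source does not specify.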

SVR
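Considering the nonlinear relationship between the tool wear and the sensor signal, SVR is introduced. SVR tolerates a maximum error $\varepsilon$ between $f(\mathbf{x})$ and $y$. By constructing an $\varepsilon$-insensitive loss function, SVR can be formalized as the following optimization problem:

$$\min_{\boldsymbol{\omega},\, b}\ \frac{1}{2}\|\boldsymbol{\omega}\|^{2} + C\sum_{i=1}^{m}\ell_{\varepsilon}\big(f(\mathbf{x}_i) - y_i\big), \qquad \ell_{\varepsilon}(z) = \max\left(0, |z| - \varepsilon\right) \tag{11}$$

where $f(\mathbf{x}) = \boldsymbol{\omega}^{T}\varphi(\mathbf{x}) + b$; $C$ is the regularization constant; $\varphi(\cdot)$ represents a nonlinear transformation that maps the low-dimensional data to a higher-dimensional space.

By introducing the Lagrange multipliers $\alpha_i$ and $\hat{\alpha}_i$, the optimization problem shown in Eq. (11) can be transformed into the following dual optimization problem:

$$\max_{\alpha,\, \hat{\alpha}}\ \sum_{i=1}^{m} y_i(\hat{\alpha}_i - \alpha_i) - \varepsilon\sum_{i=1}^{m}(\hat{\alpha}_i + \alpha_i) - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(\hat{\alpha}_i - \alpha_i)(\hat{\alpha}_j - \alpha_j)K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{s.t.}\quad \sum_{i=1}^{m}(\hat{\alpha}_i - \alpha_i) = 0,\ \ 0 \le \alpha_i, \hat{\alpha}_i \le C \tag{12}$$

where $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)^{T}\varphi(\mathbf{x}_j)$. The SVR solution is obtained by solving the above problem:

$$f(\mathbf{x}) = \sum_{i=1}^{m}(\hat{\alpha}_i - \alpha_i)K(\mathbf{x}_i, \mathbf{x}) + b \tag{13}$$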
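The role of the maximum tolerated error can be illustrated with the ε-insensitive loss that standard SVR formulations use: deviations smaller than ε cost nothing, and larger deviations are penalized linearly. The sketch below assumes this standard loss; the function name and the numeric values are illustrative only.

```python
def eps_insensitive_loss(prediction, target, eps):
    """Epsilon-insensitive loss: zero inside the tube |prediction - target| <= eps."""
    return max(0.0, abs(prediction - target) - eps)


# A prediction inside the tube is not penalized; one outside is penalized
# by the amount it exceeds the tube.
inside = eps_insensitive_loss(1.05, 1.0, eps=0.1)
outside = eps_insensitive_loss(1.5, 1.0, eps=0.1)
```

This tolerance is what makes the fitted wear observation model depend only on the samples lying on or outside the tube, i.e. the support vectors.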

The proposed tool RUL integrated prognosis method
The basic idea of the proposed method is to update the physics-based model dynamically with online observations. The framework of the proposed tool RUL integrated prognosis method is shown in Fig. 2. After feature extraction, the sensor signals are fed into the SVM and SVR, and the classification results of the wear stage and the observed value of the tool wear are obtained. Then, the classification results of the tool wear stage are used to determine the prior distribution of the wear model, and the observed values of the tool wear are applied to update the model parameters. Finally, the tool RUL is predicted based on the modified model. The tool wear model and the data-driven approaches are combined in the framework. The tool wear model is used as the system model (Eq. 14) to describe the tool wear. The model based on SVR is used as the observation model (Eq. 15) to establish the connection between the sensor signal and the underlying system state. They form the basic framework of Bayesian inference.
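$$x_k = f(x_{k-1}, u_k) \tag{14}$$

$$Z_k = h(x_k, v_k) \tag{15}$$

where $x_k$ is the state of the tool at time $k$; $Z_k$ is the observation at time $k$; $f(\cdot)$ and $h(\cdot)$ denote the system function and the observation function; $u_k$ stands for the process noise and $v_k$ represents the observation noise.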

Signal feature extraction and selection
In the machining process, cutting force, vibration, and other signals can be obtained through different sensors. However, it is hard to directly establish the relationship between the sensor signal and the tool wear state because of the low signal-to-noise ratio of the sensor signal. Therefore, feature extraction techniques are adopted to extract features of the sensor signal from the time domain, frequency domain, and time-frequency domain. The extracted features are listed in Table 1.
However, not all signal features reflect the growth trend of the tool wear well. A good feature should be consistent with the growth trend of the tool wear. To improve the accuracy of the tool wear prediction, the signal features need to be screened. As an indicator of the correlation between two variables, the Pearson correlation coefficient is widely used in tool wear prediction [35]. In this paper, the Pearson correlation coefficient is employed to select the key features. It is a statistical measure of the linear dependence between two random variables, defined as follows:

$$\rho = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^{2}}} \tag{16}$$

where $x_i$ is the actual tool wear, $y_i$ is the extracted feature, $\bar{x}$ and $\bar{y}$ are their sample means, and $n$ is the number of samples.
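A short sketch of correlation-based feature screening follows. It computes the sample Pearson coefficient between the measured wear and each candidate feature and keeps the features whose coefficient exceeds a threshold. The helper names, the dictionary layout of the features, and the toy numbers are assumptions for illustration; the default threshold of 0.95 is the value reported later in the experimental section.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant sequence carries no trend information
    return cov / math.sqrt(var_x * var_y)


def select_features(features, wear, threshold=0.95):
    """Names of the features whose correlation with the wear exceeds the threshold."""
    return [name for name, values in features.items()
            if pearson(wear, values) > threshold]


# Toy usage: "rms" follows the wear trend, "noise" does not.
wear = [1.0, 2.0, 3.0, 4.0]
features = {"rms": [2.0, 4.0, 6.0, 8.0], "noise": [1.0, -1.0, 1.0, -1.0]}
selected = select_features(features, wear)
```

Features that decrease monotonically with wear would have a strongly negative coefficient; whether those should be kept by comparing the absolute value is a design choice the source does not state.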

Bayesian inference
The key idea of Bayesian inference is to construct the posterior probability distribution $p(x_{k+1}|Z_k)$ of the future state at time $k+1$ from the initial probability density function $p(x_0)$ and the available observations $Z_{1:k}$ [11]. As a recursive numerical method based on sequential Monte Carlo sampling, the particle filter can be used to estimate the posterior probability density function of nonlinear or non-Gaussian system states. The posterior probability distribution can be calculated in two stages: prediction and update.
The probability distribution $p(x_k|Z_{k-1})$ at time $k$ is predicted from the posterior probability distribution $p(x_{k-1}|Z_{k-1})$ at time $k-1$, and can be formulated as follows:

$$p(x_k|Z_{k-1}) = \int p(x_k|x_{k-1})\,p(x_{k-1}|Z_{k-1})\,dx_{k-1} \tag{17}$$

According to the new observation $Z_k$, the state update equation can be established as follows:

$$p(x_k|Z_k) = \frac{p(Z_k|x_k)\,p(x_k|Z_{k-1})}{p(Z_k|Z_{k-1})} \tag{18}$$

where $p(Z_k|Z_{k-1})$ can be calculated as follows:

$$p(Z_k|Z_{k-1}) = \int p(Z_k|x_k)\,p(x_k|Z_{k-1})\,dx_k \tag{19}$$

Fig. 2 Framework of the hybrid tool RUL prediction method

Table 1 (extracted features; surviving entries only): spectral kurtosis; time-frequency domain: wavelet energy

However, for nonlinear, non-Gaussian systems, the integral in Eq. (19) is intractable. To solve this problem, the particle filter method based on sequential Monte Carlo sampling is employed. With this method, a set of random samples (particles) $x_{k-1}^{i}$, $i = 1, 2, \ldots, N$, with corresponding weights $w_{k-1}^{i}$ is used to represent the posterior probability distribution $p(x_{k-1}|Z_{k-1})$ at time $k-1$ [36]. Therefore, the posterior distribution can be approximated by a weighted sum of these random samples:

$$p(x_{k-1}|Z_{k-1}) \approx \sum_{i=1}^{N} w_{k-1}^{i}\,\delta\left(x_{k-1} - x_{k-1}^{i}\right) \tag{20}$$

where $N$ represents the total number of particles and $\delta(\cdot)$ is the delta function. The larger $N$ is, the more accurately the probability distribution is described, but the lower the computational efficiency.

According to Bayesian estimation, the weight of each particle is updated by the likelihood of the observation $Z_k$ at time $k$:

$$w_k^{i} = \frac{w_{k-1}^{i}\,p\left(Z_k|x_k^{i}\right)}{\sum_{j=1}^{N} w_{k-1}^{j}\,p\left(Z_k|x_k^{j}\right)} \tag{21}$$

Similarly, the posterior probability distribution $p(x_{k+1}|Z_k)$ at time $k+1$ can be obtained as follows:

$$p(x_{k+1}|Z_k) \approx \sum_{i=1}^{N} w_k^{i}\,\delta\left(x_{k+1} - x_{k+1}^{i}\right) \tag{22}$$

To address the particle degeneracy problem of the particle filter algorithm, resampling is used at each step to obtain samples with equal weights [37].
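The prediction, weight update, and resampling stages described above can be sketched as a single filtering step. The sketch below is a generic bootstrap particle filter under stated assumptions, not the authors' implementation: the observation is assumed to be the wear itself plus zero-mean Gaussian noise with standard deviation `sigma_v`, resampling is multinomial, and all function and parameter names are illustrative.

```python
import math
import random


def gaussian_likelihood(z, x, sigma_v):
    """p(Z_k | x_k) under the assumed model Z_k = x_k + v_k, v_k ~ N(0, sigma_v^2)."""
    return math.exp(-0.5 * ((z - x) / sigma_v) ** 2) / (sigma_v * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, weights, z, transition, sigma_v, rng):
    """One predict / update / resample cycle; returns equally weighted particles."""
    n = len(particles)
    # Prediction: push every particle through the noisy system model.
    predicted = [transition(p, rng) for p in particles]
    # Update: scale each weight by the likelihood of the new observation.
    unnormalized = [w * gaussian_likelihood(z, p, sigma_v)
                    for w, p in zip(weights, predicted)]
    total = sum(unnormalized)
    if total > 0.0:
        normalized = [u / total for u in unnormalized]
    else:
        # Every likelihood underflowed to zero; fall back to uniform weights.
        normalized = [1.0 / n] * n
    # Resampling: multinomial draw with probability proportional to the weights.
    resampled = rng.choices(predicted, weights=normalized, k=n)
    return resampled, [1.0 / n] * n
```

A call such as `particle_filter_step(particles, weights, 5.0, lambda x, r: x + r.gauss(0.0, 0.1), 0.5, random.Random(0))` concentrates an initially diffuse particle cloud around the observation. In the hybrid framework each particle would also carry the model parameters c and m, so that the update adjusts the parameters as well as the wear state.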

RUL prediction via sliding time window and particle filter
As one artificial feature sampler, the sliding time window is widely used in time series sampling [38]. Because the transition of the tool from one wear stage to another appears in a specific time domain, a method based on sliding time window and particle filter is developed to effectively calculate the tool wear in the transition stage. The sliding time window can frame the time series according to the specified window length, and it can calculate the statistical indicators in the window. The principle of sliding time window is shown in Fig. 3. By using the sliding time window algorithm, new observations (the labels of the wear stage) constantly enter the sliding time window. An algorithm that combines sliding time window and particle filter is proposed. Based on this algorithm, the physical model parameters of the tool wear are continuously updated according to the current state of the tool. The major steps of the algorithm are given below.

Step 1 Set initial parameters
Initial particle number N and model parameters distribution [...]

[...] of the current stage, otherwise using the resampled particles.

Step 5 Estimate the tool wear
Estimate the tool wear X from the new particle distribution [...]

In order to consider the model error and the process noise, the initial parameters of the model are set as a probability distribution, and the distribution parameters are determined according to the historical data. The model parameters are updated based on Bayesian inference. Then, the updated model is used to predict the tool wear state, and the RUL is defined as the service time before the wear state reaches the threshold:

$$RUL = t_r - t_s$$

where $t_r$ is the time when the tool wear reaches the threshold value and $t_s$ is the present time.
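The RUL definition can be turned into a small computation: starting from the current wear estimate, the wear model is propagated forward until the threshold is crossed, and the elapsed time is t_r - t_s. The sketch below assumes the power-law model dx/dt = c x^m in its integrated step form, with x > 0, c > 0 and m < 1 so that wear grows monotonically; particles are assumed to be (x, c, m) tuples, and all names are illustrative.

```python
import statistics


def steps_to_threshold(x_now, c, m, threshold, dt=1.0, max_steps=100000):
    """Elapsed time t_r - t_s until the wear first reaches the threshold.

    Returns None if the threshold is not reached within max_steps.
    Assumes x_now > 0, c > 0 and m < 1.
    """
    x, steps = x_now, 0
    while x < threshold:
        if steps >= max_steps:
            return None
        x = (x ** (1.0 - m) + c * (1.0 - m) * dt) ** (1.0 / (1.0 - m))
        steps += 1
    return steps * dt


def rul_summary(particles, threshold, dt=1.0):
    """(median, min, max) of the RUL over particles given as (x, c, m) tuples."""
    ruls = sorted(
        r for r in (steps_to_threshold(x, c, m, threshold, dt) for x, c, m in particles)
        if r is not None
    )
    if not ruls:
        raise ValueError("no particle reaches the threshold")
    return statistics.median(ruls), ruls[0], ruls[-1]
```

Evaluating this over all particles gives an empirical RUL distribution, from which a median and an interval can be read off. The minimum and maximum are used here only for brevity; a percentile-based interval would be the natural refinement.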

Experimental setup
To verify the effectiveness of the proposed method, a set of experimental data of dry milling on high-speed CNC machine tools is used [39]. The schematic diagram of the experimental setup is shown in Fig. 4. The down milling test of the workpiece (material: HRC52, stainless steel) was carried out with a three-flute ball nose tungsten carbide milling cutter. The spindle speed was 10,400 r/min, while the feed speed was set as 1555 mm/min, and the axial milling depth was set as 0.2 mm. The radial milling width was set as 0.125 mm, and the feed rate of each cutter was 0.001 mm.
Cutting force signals were collected by a Kistler quartz three-component platform dynamometer mounted between the workpiece and the table. Three Kistler acceleration sensors mounted on the workpiece were used to collect acceleration signals in x, y, and z directions. The sensor sampling frequency was 50 kHz. After each machining, the wear of the tool was measured by a LEICA microscope. During the tool life test, 315 sets of machining data were collected.

Data preprocessing
Firstly, the original signal is normalized to improve the convergence rate of the model. The calculation formula of normalized data is as follows: Then, feature extractions are carried out in the time domain, frequency domain, and time-frequency domain. After feature extractions, the sensor signals of 6 channels are transformed into 54 signal features. The normalized features extracted from the force sensor signal in the x-direction are shown in Fig. 5.
Finally, the features of wear with high correlation are selected based on the Pearson correlation coefficient. The calculation results of correlation coefficients between 54 signal characteristics and the actual wear are shown in Table 2. Signal features with correlation coefficients greater than 0.95 are selected to form a training set for training the data-driven model.

Stage classification of the tool wear
According to the tool wear rate, the tool wear state is divided into the initial wear stage, normal wear stage, and severe wear stage. With the classification results of the tool wear stage, the signal characteristics corresponding to each stage are selected, and the sample dataset $D$ is divided into $D_1$, $D_2$, and $D_3$, which represent the feature sample datasets of the three wear stages. Labels 1, 2, and 3 are added to the feature data of the three wear stages to represent their categories. t-distributed stochastic neighbor embedding (t-SNE) [40] is used to reduce the high-dimensional feature data to a two-dimensional space. The resulting scatter diagram of the feature dataset is shown in Fig. 6. As can be seen from the figure, sample points belonging to the same wear stage are clustered together, and there is a distinct gap between the samples of the three wear stages, indicating that the wear stages of the tool are separable. SVM is used as the classifier to train the classification model of the tool in different wear stages. Fivefold cross-validation is used in training to obtain more reliable test results, so the ratio of training set to test set is 4:1 in each fold. Figure 7 shows the classification results of the tool wear stages and the actual tool wear. As shown in the figure, almost all data from the three wear stages are correctly identified.
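Because the stage labels produced by the classifier enter a sliding time window before they are used as prior information, an isolated misclassification need not switch the wear stage. The exact decision rule is not given in the text, so the sketch below assumes a simple majority vote over a fixed-length window; the class name, the window length, and the label sequence are hypothetical.

```python
from collections import Counter, deque


class StageWindow:
    """Fixed-length sliding window over wear-stage labels with a majority vote."""

    def __init__(self, length):
        # A deque with maxlen drops the oldest label automatically.
        self.labels = deque(maxlen=length)

    def push(self, label):
        """Add the newest label and return the most frequent label in the window."""
        self.labels.append(label)
        return Counter(self.labels).most_common(1)[0][0]


# Toy usage: the isolated label 2 at the fourth step is smoothed away.
window = StageWindow(length=3)
smoothed = [window.push(label) for label in [1, 1, 1, 2, 1, 2, 2, 2]]
print(smoothed)  # [1, 1, 1, 1, 1, 2, 2, 2]
```

A longer window suppresses more spurious labels but delays the detected stage transition, so the window length trades robustness against responsiveness.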

Result analysis
To facilitate the RUL prediction, the online measurement of time steps [1,215] are selected as the available measurement for model construction and parameter learning. With the proposed fusion wear prediction model considering the tool wear state, the classification results of the tool wear stages are taken as prior information. The initial parameters of the wear model are set as the probability distribution. In the absence of other prior information, the uniform distribution is chosen, and the probability parameters corresponding to the upper and lower bounds of the three wear stages are selected according to empirical knowledge. Based on the prior knowledge of parameter estimation by expectation maximization algorithm, the distribution parameters are determined as shown in Table 3.
Using the current measurement information, the model parameters are updated based on the sliding time window and particle filter algorithm, as shown in Fig. 8.
Based on the estimated model parameters, the tool wear state space model is determined, and the wear amount is predicted in the prediction stage. The probability distribution of the model parameters fluctuates continuously during the learning phase; during the prediction phase it remains unchanged, because no new measurement data are available for updating. Figure 9 shows the prediction results of the tool wear after 215 steps. The results show that the predicted median (represented by a blue line) can track the tool wear and closely follows the trend of the online measurement. The RUL of the machining tool can be calculated from the estimated tool wear state. As shown in Fig. 10, the uncertainty of the RUL is represented by a probability distribution, and the uncertainty of the predicted results is quantified by a 90% confidence interval. Next, the RUL prediction results for different numbers of steps ahead are evaluated and shown in Fig. 11. Since the length of the prediction horizon determines the amount of prior information, the shorter the prediction horizon, the narrower the distribution of the RUL prediction results.

Performance comparison
The data-driven method, the physics-based method, and the traditional particle filter method are used to predict the wear for comparison, and the results are shown in Fig. 12 and Table 4. As shown in Fig. 12, compared with the tool wear model, the proposed method can characterize the time-varying wear rate. Compared with SVR, the proposed method quantifies the uncertainty of the prediction results. In addition, compared with the traditional particle filter, the confidence interval of the prediction result of the proposed method is narrower. In Table 4, the root mean square error (RMSE) is used to evaluate the prediction error, and the standard deviation of the mean is used to evaluate the uncertainty of the prediction results. Among all the methods, the proposed method has the smallest RMSE. In addition, compared with the conventional particle filter method, the standard deviation of the prediction result of the proposed method is significantly reduced. Based on these evaluation criteria, the method has good predictive performance for the tool wear and performs well in practical application.
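For reference, the RMSE used as the error criterion can be computed as in the following sketch. The function name and the sample values are illustrative and are not taken from Table 4.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Illustrative values only.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Since the RMSE has the same unit as the wear measurement, its values can be compared directly across the methods.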
Since the tool wear state is not considered, the traditional hybrid prediction method uses only a single uniform distribution to characterize the uncertainty of the model parameters. However, because the tool wear rates differ significantly between wear stages, using a single distribution to characterize the model parameters leads to greater uncertainty in the prediction results than with the proposed method. As shown in Fig. 13, by selecting the maximum probability, the estimated RUL of the proposed method is around 98 cuts, while the estimated RUL of the traditional method is around 95 cuts. In addition, the uncertainty of the prediction result of the proposed method is clearly smaller than that of the traditional particle filter method. It can be concluded that the proposed method is more accurate than the traditional method because it considers different distributions of the tool wear model parameters.
From the above analysis, the proposed approach can combine the estimation of the model parameters and the prediction of the tool wear stages in one framework. Using the physical mechanism, the tool wear is inferred from the online measurement signal based on the Bayesian inference. Therefore, the tool RUL can be estimated by a threshold value, and the uncertainty of prognosis can be quantified in a probabilistic manner.

Conclusions
Accurate and reliable tool RUL prediction is an important research direction to ensure the machining quality, reduce the processing cost, and realize intelligent manufacturing. In this paper, a hybrid RUL prediction method considering the tool wear state is proposed. According to the obtained results, the following conclusions can be drawn.
1. The proposed approach takes advantage of the physics-based and data-driven methods in a single framework, and precise prediction and uncertainty quantification are achieved based on Bayesian inference.
2. The identification results of the tool wear stage obtained by SVM provide much more prior information for the prediction model, which reduces the confidence interval and provides more accurate prediction results.
3. The algorithm that combines the sliding time window and the particle filter can adaptively process the time series data and accurately describe the tool wear process.
The effectiveness of the method is verified by a high-speed CNC milling machine experiment. The results show that the proposed method can effectively reduce the prediction error and quantify the uncertainty of the results. However, this method requires high-order matrix operations when solving for the support vectors, which consumes a lot of memory and computation time when dealing with large-scale training samples. In addition, it is difficult for a model trained under a single working condition to obtain an accurate prediction in machining processes under various working conditions. In future research, LSTM, CNN, and other deep learning models can be used in this framework to process larger data. In addition, the proposed approach can be extended to the case in which the cutting tool operates under different working conditions.