Tool wear conditions monitoring based on Kernel Smoothing Density and deep learning

Life prediction and health assessment of cutting tools are challenging problems in industrial manufacturing and play an important role in prognostic activities. Tool degradation can cause significant economic losses and risks for machine users. However, because of the random nature of system degradation, the features characterizing cutter degradation are sensitive and may vary considerably under different operating conditions. This paper proposes a new data-driven approach for tool condition monitoring, based on kernel smoothing density to model the random phenomena of degradation progression. The health assessment is ensured using a one-dimensional convolutional neural network (1D-CNN) to extract useful information. Deep learning relates the extracted features to the progression of the health indicator and estimates the remaining useful life. The methodology is evaluated using regression metrics and compared with recent frameworks on an experimental dataset from the PHM Data Challenge 2010. The results show that the proposed approach can effectively predict tool wear and obtains higher prediction accuracy than the comparison models.


Introduction
LSTM is used because of its capacity to learn both long- and short-term dependencies. Khanh et al. [22] focused on defect classification using an LSTM-based machine learning technique; to make this task more practical and useful, a synchronization step is planned that provides a decision-support tool (keeping or removing damaged parts, and even stock management) for the maintenance policy (dynamic predictive maintenance). Jianjing et al. [23] proposed a BiLSTM-based network for health monitoring that estimates RUL from statistical indicators, and compared it with other architectures (SVR, MLP, BD-RNN, CNN and LSTM) to show its effectiveness. André et al. [24] proposed an approach combining two types of machine learning, with RBM as the unsupervised stage and LSTM as the supervised stage. To obtain an optimal architecture, a GA is used to design the network and select 14 hyperparameters. The architecture was then tested on the C-MAPSS dataset and compared with previous work in terms of RMSE.
In this paper, a new data-driven prognostic strategy is proposed to predict cutter RUL. First, the signal distribution obtained with kernel smoothing density is used to model the random degradation signals acquired from sensors. These signals are also used for health indicator construction. Second, a convolutional neural network extracts useful information from the kernel distributions, and a bidirectional long short-term memory network then tracks degradation over the cutter's life. Finally, the effectiveness and accuracy of the proposed approach are validated on an experimental dataset [25] of cutter degradation. The main contribution of this paper is a sensor-based, data-driven approach using kernel smoothing density and 1D-CNN/BiLSTM for RUL prediction. The remainder of this paper is organized as follows. Section 2 introduces the basic working principles of the proposed prognostic methodology, with a brief exposition of the necessary background and the most important steps. Section 3 describes the theoretical framework of the proposed RUL prediction method and compares its RUL prediction performance with recent frameworks on the datasets. Finally, conclusions and future scope are given in Section 4.
1. Prognosis refers to a process of prediction, forecasting and extrapolation that models the progression of faults, based on an assessment of the current state and of future operating conditions; 2. Health management refers to the capacity to make decisions and intelligently carry out maintenance and logistics activities based on diagnostic and prognostic information. It is described as the combination of 7 modules [28]: data acquisition, manipulation, condition assessment, diagnostics, prognostics, decision making and human-machine interface (HMI), as shown in Fig. 1. The 7 modules can be divided into three main phases: observe, analyze and act. In the analysis phase, prognosis is considered a key task with forward-looking capabilities, which must be performed effectively so that decision support can recommend maintenance actions [29].

Proposed prognostics methodology
The proposed methodology belongs to PHM activities, more precisely the monitoring of cutting tool health status for RUL estimation. It relies on a data-driven approach and a health indicator to provide a decision-support tool for industry. In this study, an original prognostic approach is proposed that follows the sequence of PHM steps, starting with signal acquisition. The acquired signals are exploited in two ways: to construct the HI, and to extract degradation-related information through signal processing with KSD. Finally, an expert system based on a 1D-CNN and BiLSTM network is designed to learn and automatically assess the tool health state, as shown in Fig. 2. The proposed model in Fig. 2 comprises four steps to be followed in the TCM process. The data acquisition step collects data related to the health of the system. Data preprocessing analyzes the acquired signals, including centering and filtering, in order to remove the offset in the measured signals. The proposed approach then uses the KS-density values over time to facilitate the prognosis of time to failure for the cutting tool. The details of the proposed approach are presented in the rest of the section.
The last step estimates the RUL with the BiLSTM, based on the useful variables of the KSD. PHM aims to assess the current state of the physical system and predict its RUL before failure. The objective is to maximize the operational safety and availability of the cutting tool and to improve health management. An illustration of the RUL is given in Fig. 3.
The predicted RUL is obtained by estimating the time between the current time $t_c$ and the time $t_f$ at which the wear threshold is reached. The RUL is therefore given by:

$$RUL(t_c) = t_f - t_c \quad (1)$$

After this step, the proposed network can be trained to estimate the HI, and the RUL can then be calculated with a simple temporal inversion as in Eq. 1. The results obtained this way contain several fluctuations that have no physical meaning; they can be reduced by simple smoothing with a moving-average window. Finally, to allow comparison with previous work, different metrics for assessing prediction quality are used (RMSE, MSE, MAE, etc.).
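The inversion from HI to RUL and the moving-average smoothing can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the window length, and the choice to measure time in cycle indices are all assumptions made for the example.

```python
import numpy as np


def moving_average(x, window=5):
    """Centred moving average (odd window assumed); the window shrinks at the edges."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window)
    sums = np.convolve(x, kernel, mode="same")
    # Number of samples actually covered by the window at each position.
    counts = np.convolve(np.ones_like(x), kernel, mode="same")
    return sums / counts


def rul_from_hi(hi, threshold):
    """RUL(t_c) = t_f - t_c in cycles, where t_f is the first cycle with HI >= threshold."""
    hi = np.asarray(hi, dtype=float)
    crossed = np.nonzero(hi >= threshold)[0]
    if crossed.size == 0:
        raise ValueError("HI never reaches the threshold")
    t_f = crossed[0]
    t_c = np.arange(hi.size)
    # After the threshold is reached the remaining life is reported as zero.
    return np.clip(t_f - t_c, 0, None)


# Hypothetical HI values for six cycles and a hypothetical wear threshold.
hi = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
rul = rul_from_hi(hi, threshold=4.0)
```

In practice the smoothing would be applied to the predicted HI (or to the RUL curve) before comparing with the measured values; which of the two is smoothed is not stated in the text.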

Kernel Smoothing Density estimation
In this study, the problem is to provide a good data-based procedure for selecting the smoothing parameter (which controls the degree of smoothing applied to the data) employed in statistical curve estimation techniques. The basic kernel density estimator in one dimension has a single smoothing parameter, usually referred to as the bandwidth.
Kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable [30]. It is a fundamental data smoothing problem in which inferences about the population are made from a finite data sample. Kernel density estimation on a finite interval poses an outstanding challenge because of the well-recognized bias at the boundaries of the interval [31]. Let $x_1, x_2, \ldots, x_n$ be a univariate independent and identically distributed sample drawn from some distribution with an unknown density $f$. We are interested in estimating the shape of this function $f$. Its kernel density estimator is:

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where $K$ is the kernel, a non-negative function, and $h > 0$ is a smoothing parameter called the bandwidth. A kernel with subscript $h$ is called the scaled kernel and is defined as $K_h(x) = \frac{1}{h}K(x/h)$. Intuitively one wants to choose $h$ as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. A degradation process $\{y(t), t \ge 0\}$ that follows a Gaussian process can be defined in terms of a wear variable $y(t)$ varying over time $t$ from an initial state at $t = 0$ according to the increment of wear. The proposed Gaussian process has three parameters, $\mu$, $\sigma$ and $w$, which can be estimated through MLE according to the following process [32].
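The estimator above can be illustrated with a short NumPy sketch using a Gaussian kernel. The bandwidth rule (a normal-reference rule of thumb), the grid padding of three bandwidths, and the function name are assumptions for the example; the text does not specify how $h$ is chosen.

```python
import numpy as np


def gaussian_kde_1d(samples, n_points=100, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate on an equally spaced grid."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed here, not taken from the paper).
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # Pad the grid so the tails of the outermost kernels are covered.
    grid = np.linspace(samples.min() - 3 * bandwidth,
                       samples.max() + 3 * bandwidth, n_points)
    # u[j, i] = (x_j - x_i) / h for every grid point x_j and sample x_i.
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    density = kernel.sum(axis=1) / (n * bandwidth)
    return grid, density


# Synthetic stand-in for one cycle of a sensor signal.
rng = np.random.default_rng(0)
samples = rng.normal(size=500)
grid, density = gaussian_kde_1d(samples)
```

A smaller bandwidth gives a spikier estimate that follows individual samples (low bias, high variance); a larger one gives a smoother but flatter curve, which is the trade-off described above.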

1D-CNN/BiLSTM for RUL estimation
The first step in this framework is data acquisition, using the necessary instruments such as data acquisition modules and sensors (microphone, accelerometer, etc.) and adjusting acquisition conditions such as the sampling frequency. The recovered data are then stored for further processing, which can be time-consuming [33]. In a data-driven approach, there are two methods for estimating RUL from historical data: direct and indirect. The indirect method relies on constructing an HI from which a model derives the RUL; by contrast, the direct method estimates the RUL directly from the data [34]. The second stage is Health Indicator (HI) construction. This stage seeks an indicator that reflects the cutter health state from the acquired signal. Several works have addressed this problem, always looking for an indicator that is as monotonic as possible and reflects the degradation of the component under study [35].
To make this information usable, it is necessary to apply a learning algorithm to build an expert system that minimizes human intervention and enables real-time monitoring. Time-series data require an algorithm able to learn dependencies between data points, and RNNs are among the most widely used networks in the time-series domain. Unfortunately, RNNs suffer from the vanishing gradient problem and are unable to learn long-term dependencies. To remedy these problems, a variant of RNNs called LSTM appeared [36]. It has proven its competence in time-series monitoring thanks to gates that the simple RNN lacks, namely the input, output and forget gates, which make it possible to solve the RNN problems mentioned above. The architecture of an LSTM node is shown in Fig. 4a, and the equations governing the flow of information within it are formulated in Eq. 3 to Eq. 8:

Input gate: $$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
Forget gate: $$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
Output gate: $$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
Candidate state of the memory unit: $$\tilde{c}_t = g(W_c x_t + U_c h_{t-1} + b_c)$$
Updated internal state: $$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
Final output of the LSTM unit: $$h_t = o_t \odot h(c_t)$$

where $W$, $U$ and $b$ represent the network parameters to be learned, $\odot$ is the Hadamard product, and two types of activation function are used: $\sigma(x)$ is the logistic sigmoid, and $g(x)$ and $h(x)$ are hyperbolic tangents.
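The gate mechanism described above can be sketched as a single time step in NumPy. The sketch assumes the standard LSTM formulation; storing the parameters in dictionaries keyed by gate ('i', 'f', 'o', 'c') and the function names are conventions chosen for the example, not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts keyed by 'i', 'f', 'o', 'c'."""
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                          # updated internal state
    h_t = o_t * np.tanh(c_t)                                    # unit output
    return h_t, c_t


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases, for illustration only."""
    W = {k: rng.normal(scale=0.1, size=(n_hidden, n_in)) for k in "ifoc"}
    U = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for k in "ifoc"}
    b = {k: np.zeros(n_hidden) for k in "ifoc"}
    return W, U, b


# Run a short random sequence through the cell.
rng = np.random.default_rng(0)
W, U, b = init_params(n_in=4, n_hidden=3, rng=rng)
h, c = np.zeros(3), np.zeros(3)
for x_t in rng.normal(size=(5, 4)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The additive update of the internal state is what lets gradients flow over many time steps: when the forget gate stays close to 1, the previous state passes through almost unchanged.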
In the same context of improving time-series monitoring, a variant of LSTM named BiLSTM was introduced. A unidirectional LSTM captures dependency from the past towards the future only, whereas a BiLSTM network captures dependency in both directions, from the past towards the future and the reverse, as in Fig. 4b. It keeps the same architecture as the unidirectional LSTM and differs only in the flow of information in the layer.
The equations describing this aspect are presented in the following [37]:
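In the usual formulation of a bidirectional layer, which is assumed here, one LSTM reads the sequence forwards, a second one reads it backwards, and the output combines both hidden states. The symbols $\overrightarrow{h}_t$, $\overleftarrow{h}_t$, $W_{\overrightarrow{h}}$, $W_{\overleftarrow{h}}$ and $b_y$ are notation introduced for this sketch and may differ from that of [37]:

```latex
\begin{align}
\overrightarrow{h}_t &= \mathrm{LSTM}\left(x_t, \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t  &= \mathrm{LSTM}\left(x_t, \overleftarrow{h}_{t+1}\right) \\
y_t &= W_{\overrightarrow{h}}\,\overrightarrow{h}_t + W_{\overleftarrow{h}}\,\overleftarrow{h}_t + b_y
\end{align}
```

The forward state at time $t$ summarizes $x_1, \ldots, x_t$ and the backward state summarizes $x_t, \ldots, x_T$, so each output $y_t$ depends on the whole sequence.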

Experimental demonstration
To evaluate the effectiveness of the proposed approach, the tool wear prediction task was conducted on a high-speed CNC machine tool (Fig. 5). The machining experiments were carried out on a Roder CNC machining centre. The workpiece is made of Inconel 718, a hard-to-cut material whose thermal and mechanical properties are of interest in the aeronautical field [25].

[Fig. 5 labels: milling cutter, machining table, accelerometer, force sensor]

RUL estimation
The framework of the proposed approach for RUL prediction based on the KSD process is illustrated in Fig. 6 and includes a model training phase and an RUL prediction phase. The parameters of the KSD model are determined in the model training phase using a training dataset of the tool wear degradation process, while the RUL value and the 95% confidence interval for the cutting tool are predicted in the RUL prediction phase.

[Figure; horizontal axis: Time (s)]

At this stage, it is necessary to use a signal processing technique that reveals useful information; this framework proposes the Kernel Smoothing Density technique. Finally, the MLE estimators of the three parameters can be obtained by solving the above equations [32]. Here, we employ the BiLSTM algorithm, and the upper limit of the estimation error is set to $10^{-5}$. Next comes the construction of the health indicator, using the RMS as a statistical indicator; it is often used in prognosis frameworks because it is sensitive to degradation. KSD is used for feature extraction with the signal processing toolbox from MATLAB: it returns a probability density estimate $f$ for a sample data vector, based on a normal kernel function, evaluated at 100 equally spaced points $x_i$.
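The RMS-based health indicator can be sketched in a few lines. The example assumes that the signal has already been split into one array per cutting cycle; the function name and the synthetic signals are hypothetical.

```python
import numpy as np


def rms_health_indicator(cycles):
    """Return one RMS value per cycle; `cycles` is an iterable of 1-D signals."""
    return np.array([
        np.sqrt(np.mean(np.square(np.asarray(c, dtype=float))))
        for c in cycles
    ])


# Synthetic stand-in: the signal amplitude grows as the tool wears.
rng = np.random.default_rng(0)
cycles = [amp * rng.normal(size=1000) for amp in (1.0, 1.5, 2.0, 3.0)]
hi = rms_health_indicator(cycles)
```

Because the RMS grows with signal energy, a cycle-by-cycle RMS curve tends to rise as wear increases, which is why it is a common choice of HI.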
To justify the choice of KSD for cutting tool monitoring, Fig. 7a and Fig. 7b clearly show the difference between the Gaussian bell obtained for the first cycle and the irregularly shaped bell obtained for the 300th cycle. The latter shows a dispersion of the density and a decrease in the maximum value compared with the first cycle, which confirms the degradation undergone by the cutting tool.
In prognosis, we often handle time series, which exhibit temporal dependency and call for deep neural networks with a memory effect. This justifies the use of a BiLSTM network, which benefits from the double temporal dependency. The proposed network, shown in Fig. 9, consists of an input layer followed by two BiLSTM layers with 200 and 150 nodes respectively, interleaved with two dropout layers with a rate of 0.2 to avoid overfitting, then two fully connected layers of 135 nodes each, and finally a regression layer for RUL estimation. At this step, the HI can be predicted by training the proposed network with specified training options. In this paper, the Adam optimizer is used with a learning rate of 0.00001 and a minibatch size of 12, with GPU execution, which resulted in good learning progress without overfitting.
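The layer stack described above can be sketched in PyTorch as follows. This is an illustrative reading of the description, not the authors' code: the input size of 100 (one value per KSD evaluation point), the placement of each dropout layer after a BiLSTM layer, the absence of activations between the fully connected layers, and the per-time-step output are all assumptions.

```python
import torch
from torch import nn


class BiLSTMRegressor(nn.Module):
    """Two BiLSTM layers (200 and 150 units), dropout 0.2, two FC layers of 135 units."""

    def __init__(self, n_features=100):
        super().__init__()
        self.lstm1 = nn.LSTM(n_features, 200, batch_first=True, bidirectional=True)
        self.drop1 = nn.Dropout(0.2)
        # A bidirectional layer outputs 2 * hidden_size features per time step.
        self.lstm2 = nn.LSTM(2 * 200, 150, batch_first=True, bidirectional=True)
        self.drop2 = nn.Dropout(0.2)
        self.fc1 = nn.Linear(2 * 150, 135)
        self.fc2 = nn.Linear(135, 135)
        self.out = nn.Linear(135, 1)  # regression output

    def forward(self, x):
        # x has shape (batch, time, n_features).
        x, _ = self.lstm1(x)
        x = self.drop1(x)
        x, _ = self.lstm2(x)
        x = self.drop2(x)
        x = self.fc2(self.fc1(x))
        return self.out(x).squeeze(-1)  # shape (batch, time)


model = BiLSTMRegressor(n_features=100)
# Training options stated in the text: Adam with a learning rate of 1e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
loss_fn = nn.MSELoss()

# One minibatch of 12 hypothetical sequences, 7 time steps each.
x = torch.randn(12, 7, 100)
y_hat = model(x)
```

A training loop would repeat the usual steps (forward pass, `loss_fn`, `backward`, `optimizer.step`) over minibatches of 12 sequences.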
After training the proposed model, the results obtained for HI prediction are shown in Fig. 11, Fig. 12 and Fig. 13. Notably, the two curves almost coincide, which shows the performance of the proposed methodology. The wear evolution of the three flutes of cutter C1 is given in Fig. 10. The first step is to use the data from three selected cutters (C1, C4 and C6) to build a reference model that can be used to predict tool wear for cutters C2, C3 and C5. The second step uses the model to predict tool life for these three other cutters (data not used in model development) and checks the model accuracy. The output wear values of the three flutes, shown in Fig. 11, Fig. 12 and Fig. 13, are given in $10^{-3}$ mm. The wear value was predicted using the optimal input parameters of separation, the decomposition level (L = 7) and the signal from three dimensions.

Comparison of prediction performance with other methods
In this study, three criteria are adopted to evaluate the performance of the proposed approach: the Root Mean Square Error (RMSE), the average accuracy, and the average length of the 95% confidence interval. The indicators used to quantify the prediction quality are shown in Table 1. For the RMSE, with $y_i$ the measured value, $\hat{y}_i$ the predicted value and $n$ the number of samples, the calculation is:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Here, the prediction accuracy increases as the RMSE decreases, as shown in Table 1.
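The error metrics named in this paper (RMSE, together with the MSE and MAE mentioned earlier) can be computed with the standard definitions, as in the sketch below. The function name is hypothetical, and the average accuracy and confidence-interval length are left out because their definitions are not given in the text.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAE between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
    }


# Hypothetical measured and predicted wear values.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The RMSE penalizes large errors more strongly than the MAE, so reporting both gives a sense of whether the error is spread evenly or dominated by a few cycles.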
Finally, these metrics were used for comparison with other frameworks. The results are presented in Table 1, which shows the performance of the proposed methodology compared with the recent frameworks from [38], [39], [40] and [21].

Conclusion
In this paper, a new data-driven methodology in the field of PHM has been proposed for estimating the Remaining Useful Life of cutting tools, based on KSD for signal processing and BiLSTM for learning the dependencies of the degradation process. The methodology follows the steps of PHM. It starts with signals from an experimental dataset of Intelligent Maintenance Systems, then constructs an HI expressing the degraded state of the cutter and, in parallel, processes the signals with KSD, before training an expert system using BiLSTM. Finally, to make this framework useful, different evaluation metrics are computed to assess the quality of prediction and to compare with other frameworks.
Based on the obtained model, the online phase estimates the current health state and predicts the RUL of the cutting tools. The results obtained indicate that the proposed approach gives higher RUL forecasting accuracy than the other existing approaches considered. The proposed approach is therefore promising for smart manufacturing operations and intelligent decision making. In future work, other prediction methods will be applied to the different wear degradation stages, and the proposed approach will be extended to other mechanical components.

Ethical Approval
I, Dr Tarak BENKEDJOUH, certify that we have no potential conflict of interest for the mentioned article and that we respected the ethical rules and good scientific practices.

Consent to Participate
I, Dr Tarak BENKEDJOUH, declare the consent of all the co-authors to participate in the aforementioned research paper.

Consent to Publish
I, Dr Tarak BENKEDJOUH, declare the consent of all the co-authors to publish the aforementioned research paper in the International Journal of Advanced Manufacturing Technology.