Image-based tool condition monitoring based on convolution neural network in turning process

Tool wear has a significant impact on machining quality, efficiency, and cost, so monitoring it is vitally important for manufacturing systems. Current work on Tool Condition Monitoring (TCM) mainly processes time series signals from multiple sensors using intelligent algorithms. However, these methods have two limitations: (1) image information is not integrated with the time series signals, and (2) traditional methods suffer from poor generalization and slow convergence. Thus, a novel integrated model based on multi-sensor feature fusion and a neural network is presented. The sensor data is first pre-processed using Piecewise Aggregate Approximation (PAA) and then encoded into images using the Gramian Angular Field (GAF). These images, together with infrared images of the tool, are input to a Convolutional Neural Network (CNN) model, which outputs the flank wear (VB) value range. Both time series signals and tool infrared images are used for classification, and the final classification accuracy on the test set is 91%. The results show the high computational efficiency and good generalization performance of the presented methodology.


Introduction
Cutting is currently the most commonly used machining method. During machining, the tool is prone to wear and failure due to complex temperature and stress changes on the contact surface between the tool and the workpiece [1]. Tool wear and failure prevent the workpiece from achieving the desired shape, size, and accuracy. In severe cases, they may also damage the equipment, resulting in machine failure and an inability to continue processing. Data show that 20% of machine downtime is due to tool damage [2]. To avoid harmful effects on the part and the machine, it is necessary to monitor the tool condition in real time so that the tool can be replaced before it wears out.
The Prognostics and Health Management (PHM) tries to detect the current state of the equipment and its failure in advance through prediction. It overcomes the drawbacks of the traditional reactive maintenance policy (fail and fix) and helps decision-making with the prediction information, thereby extending the lifetime of the equipment [3]. Tool Condition Monitoring (TCM) is an important part of PHM. The development of TCM technology plays an important role in promoting intelligent production. By collecting and analyzing data reflecting the tool condition during machining process, it can lower machining costs and improve machine utilization [4,5].
The commonly used indirect measurements of TCM are divided into three stages: (1) signal acquisition, (2) feature extraction and selection, and (3) classification [6]. In the feature extraction and selection stage, the main features characterizing the original data are extracted from a large amount of data, commonly using methods such as Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA). In the classification stage, algorithms such as K-Nearest Neighbor (KNN), Support Vector Machines (SVM), and fuzzy logic are used.
Elgargni et al. [7] proposed a method combining PCA and the Discrete Wavelet Transform (DWT) with a neural network and established an effective and reliable tool tracking and health identification software program. Barreiro et al. [8] combined acoustic emission and vibration signals to provide complementary signals in different spectral bands; the tool condition and the transitions between wear conditions were identified using frequency band analysis, and the method was applied to a TCM system for king-size multiple-cutting-edge tools used in milling super-long, thick steel plates. Painuli et al. [9] proposed a TCM algorithm based on K-star, which extracts a set of statistical features from vibration signals as the input of the algorithm; experimental results show that it achieves 78% classification accuracy.
In recent years, the rapid development of deep learning in various fields has helped solve the problems of difficult feature extraction, poor model generalization, and convergence to local optima, and has significantly improved the recognition ability for industrial big data. Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) are the most widely used deep learning networks.
Xu et al. [10] proposed a Gated Recurrent Unit (GRU)-based neural network for tool condition monitoring, which can analyze time series data on multiple time scales. Li et al. [11] established a mapping model between vibration and sound signals and the boring tool state by collecting vibration and sound signals from the boring process, and proposed a deep-hole boring tool state monitoring method based on a Long Short-Term Memory (LSTM) network. Zhou et al. [12] extracted wear features from machining signals and developed an LSTM-based model for predicting the remaining useful life of a tool under variable working conditions.
The research on condition monitoring and fault diagnosis based on CNN [13] is still at an exploratory stage. Ambadekar et al. [14] carried out dry cutting experiments on low-carbon steel parts with a carbide blade; images of the cutting tools and turned parts were taken at regular intervals with a microscope and used as inputs of a CNN model to classify tools into three wear grades: low, medium, and high. The experimental results show an accuracy of 87.26%. Janssens et al. [15] proposed a CNN-based feature learning model for condition monitoring, and the results show that the model performs significantly better than traditional feature-engineering-based methods.
However, the above methods still have drawbacks. Firstly, most of them do not take into account both the change in the vibration signal caused by the change of cutting force during machining and the wear caused by bonding and oxidation at high temperature. Secondly, most of them focus on coarse tool states and have not been refined to determine the VB value of tool wear. Thirdly, in the feature extraction and selection stage, the chosen feature extraction and selection method may not be suitable for other working conditions.
In addition, CNN is usually applied to extract image information, but rarely to extract features of time series signals; the processing of time series signals is mostly done by one-dimensional CNN (1D-CNN). For instance, some studies applied 1D-CNN to process vibration signals for structural health monitoring [16,17] and bearing fault diagnosis [18]. However, compared with the signal input of a 1D-CNN, a two-dimensional CNN can be fine-tuned on large databases, achieving higher accuracy and robustness [19]. Converting time series signals to images takes advantage of 2D CNNs for large-scale data processing and allows additional image information to be incorporated to achieve TCM.
In this paper, a CNN-based tool condition monitoring model built on the Software as a Service (SaaS) layer of a cloud platform is proposed. It combines vibration and current sensor data with infrared images of the tool to achieve TCM and realizes data transmission and interaction between the physical entity of the tool and its virtual model. Furthermore, the monitoring of the tool state using deep learning models is presented in the virtual digital system.

Architecture of tool condition monitoring model based on CNN
During machine tool processing, the collected signals are voluminous and complex, including both analog signals and image signals. GAF can transform a signal curve into image information, and CNN can learn features from massive data and generalize the learned features to other data of the same type, which makes it well suited to image processing. Therefore, in the mechanism model and algorithm module of the Platform as a Service (PaaS) layer of the cloud platform, the cloud tool condition monitoring model is constructed, and the mapping relationship between the tool condition edge side and the cloud side is established for the TCM service at the SaaS layer.
The architecture of the tool condition monitoring model is shown in Fig.1. The inputs of the model are the vibration and current signals collected by the sensors during machining and the infrared image of the tool collected by the infrared sensor. After Gramian Angular Difference Field (GADF) image processing, all the images are taken as the input of the CNN, and the output is the VB value range of the tool. Initially, the CNN weights are undetermined, so the original data is used for training to reduce the distance between the predicted value and the real value, and dropout is added to prevent overfitting of the model. The image data is processed by the convolution and pooling layers and classified by the Softmax layer, and finally the optimal tool condition monitoring model is obtained.

Time series data process using Gramian Angular Field (GAF)
Signal processing technology is widely used in many fields, and the quality of signal processing often has a decisive impact on experimental results. Machine tool time series data needs smoothing, noise reduction, and other operations before time-domain and frequency-domain features can be extracted to characterize the signal state. In this paper, multiple sensors are used to implement TCM. In order to combine the vibration and current time signals with the tool infrared images for classification, the time series signals need to be converted into images. Inspired by the great success of deep learning in computer vision, Wang et al. [20] proposed three temporal data processing algorithms: Gramian Angular Summation Field (GASF), Gramian Angular Difference Field (GADF), and Markov Transition Field (MTF). These algorithms encode time series data into images, realizing the visualization of time series data so that computer vision techniques can be applied for image classification and recognition.

Given the vibration data sequence X = {x_1, x_2, ..., x_n} of the machining process, the values are normalized into the interval [-1, 1] using Equation (1) or [0, 1] using Equation (2); in this paper, Equation (1) is used:

$$\tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)} \qquad (1)$$

$$\tilde{x}_i = \frac{x_i - \min(X)}{\max(X) - \min(X)} \qquad (2)$$

The normalized vibration data sequence is then transformed into the polar coordinate system, where the value of the time series and its corresponding time stamp are expressed by angle and radius, respectively:

$$\phi_i = \arccos(\tilde{x}_i), \quad -1 \le \tilde{x}_i \le 1 \qquad (3)$$

$$r_i = \frac{t_i}{N} \qquad (4)$$

where t_i is the time stamp and N is a constant factor regularizing the span of the polar coordinate system. This polar-coordinate representation analyzes the time series data from another point of view. The normalized vibration signal of the X-axis is shown in Fig.2. As time increases, the corresponding polar values twist between different angles on the spanning circle. An advantage of GAF is that the encoding is bijective: given a time series, there is exactly one result in the polar coordinate system.
At the same time, compared with the Cartesian coordinate system, GAF maintains absolute temporal relationships through the coordinates. After the time series data is transformed into the polar coordinate system, the angles can be used to identify the temporal correlation within different time intervals by considering the trigonometric relationships between the points. The temporal correlation is expressed as a Gram matrix, as shown in Equation (5):

$$G = \begin{bmatrix} \cos(\phi_1 + \phi_1) & \cdots & \cos(\phi_1 + \phi_n) \\ \vdots & \ddots & \vdots \\ \cos(\phi_n + \phi_1) & \cdots & \cos(\phi_n + \phi_n) \end{bmatrix} \qquad (5)$$

Then GASF and GADF are defined as follows (Eqs. (6) and (7)):

$$GASF = \tilde{X}' \cdot \tilde{X} - \sqrt{I - \tilde{X}^2}\,' \cdot \sqrt{I - \tilde{X}^2} \qquad (6)$$

$$GADF = \sqrt{I - \tilde{X}^2}\,' \cdot \tilde{X} - \tilde{X}' \cdot \sqrt{I - \tilde{X}^2} \qquad (7)$$

where I is the unit row vector [1, 1, ..., 1]. Fig.3 is the result of the GADF transformation of the time series data in Fig.2. GADF provides a time-dependent encoding: as the position in the image moves from the lower left corner to the upper right corner, time increases. G_{i,j | |i-j|=k} represents the superposition relationship of directions relative to the time interval k; when k = 0, the diagonal G_{i,i} contains the original value/angle information. Therefore, from the diagonal, the high-dimensional features of the time series data can be learned by the convolutional neural network.
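As a concrete illustration, the normalization of Eq. (1), the angular encoding of Eq. (3), and the GADF of Eq. (7) can be sketched in a few lines of NumPy. This is a minimal sketch: the helper name `gadf` and the toy series are ours, not the authors'.

```python
import numpy as np

def gadf(x):
    """Encode a 1-D series as a Gramian Angular Difference Field image."""
    x = np.asarray(x, dtype=float)
    # Eq. (1): rescale into [-1, 1]
    x_t = ((x - x.max()) + (x - x.min())) / (x.max() - x.min())
    x_t = np.clip(x_t, -1.0, 1.0)        # guard against rounding drift
    phi = np.arccos(x_t)                 # Eq. (3): angular encoding
    # Eq. (7) is equivalent to GADF[i, j] = sin(phi_i - phi_j)
    return np.sin(phi[:, None] - phi[None, :])

img = gadf([0.0, 0.5, 1.0, 0.5, 0.0])
print(img.shape)  # (5, 5)
```

Note that the GADF diagonal is zero (sin 0) and the matrix is antisymmetric, which is what encodes the direction of time in the image.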
In order to reduce the size of the GADF image, Piecewise Aggregate Approximation (PAA) [21] is used to smooth the curve while maintaining the trend of the original data. Suppose the sensor signal X = {x_1, x_2, ..., x_n} is a sequence of length n, and convert it into a sequence S = {s_1, s_2, ..., s_m} of length m, where n > m and k = n/m is an integer. Then any element s_i of the sequence S satisfies Equation (8):

$$s_i = \frac{m}{n} \sum_{j = \frac{n}{m}(i-1)+1}^{\frac{n}{m} i} x_j \qquad (8)$$

This method represents the characteristics of the original data by calculating mean values; when analyzing time series data, it emphasizes the integrity and overall trend of the data. Data processed by PAA still reflects the shape of the original data while reducing its size, effectively reducing the amount of calculation, improving computational efficiency, and laying a foundation for the visualization of time series data.
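Eq. (8) amounts to averaging consecutive windows of length k = n/m, which can be sketched directly (the helper name `paa` is ours):

```python
import numpy as np

def paa(x, m):
    """Piecewise Aggregate Approximation (Eq. 8): compress an n-point
    series to m points by averaging consecutive windows of k = n/m."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n % m != 0:
        raise ValueError("this sketch assumes k = n/m is an integer")
    return x.reshape(m, n // m).mean(axis=1)

s = paa([1, 1, 3, 3, 5, 5], 3)
print(s)  # [1. 3. 5.]
```

In the paper's pipeline, a PAA target length of m = 1500 precedes the GADF step.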

TCM feature identification based on CNN
CNN can learn features from massive data and generalize those features to other data of the same type, so it is often used in the field of computer vision with multi-pixel information. The default input format of a CNN is an image, and its neurons are arranged in three dimensions: width, height, and depth. The inputs of the CNN-based tool condition monitoring method are the GADF images and the infrared images. The CNN network includes six convolution layers, three pooling layers, and one fully connected layer, and the result is classified by Softmax, as shown in Fig.4.
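The six-convolution / three-pooling / one-fully-connected topology described above might be expressed in Keras roughly as follows. This is an illustrative sketch only: the filter counts, kernel sizes, and dropout rate are our assumptions, since the paper reports the layer counts but not these hyperparameters.

```python
from tensorflow.keras import layers, models

def build_tcm_cnn(input_shape=(350, 350, 3), n_classes=6):
    """Six conv layers, three pooling layers, one dense softmax head."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters in (32, 64, 128):                 # three conv-conv-pool stages
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.5))                # dropout against overfitting
    model.add(layers.Dense(n_classes, activation="softmax"))
    return model

model = build_tcm_cnn()
print(model.output_shape)  # (None, 6)
```

The 350 × 350 × 3 input matches the GADF image size described later, and the six output units match the six wear stages.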
The basic idea of a CNN is convolution. Each convolution layer is composed of several neuron filters (convolution kernels). The parameters of each kernel are optimized by the backpropagation algorithm, allowing image features to be extracted. These kernels are connected to the previous layer of neurons according to their receptive fields. In the first convolution layer, each kernel is not connected to every pixel of the whole picture but only to the pixels in its own region; the kernels in the second convolution layer are connected to the kernels in the first layer, and so on. Therefore, as the number of layers increases, subsequent convolution layers can further convolve the previous features into higher-level features, extracting increasingly complex features. Let the sensor signal image X after GADF be the network input and W be the convolution kernel; then the convolution of the two is (Eq. (9)):

$$s(i, j) = (X * W)(i, j) = \sum_{m} \sum_{n} x(i + m, j + n)\, w(m, n) \qquad (9)$$

where * is the convolution operator, s(i, j) is the matrix obtained by convolving the image with the kernel, m and n index the pixels covered by the kernel, and i and j index the output positions; s is called a convolution feature of the image.
The convolution operation on the image is equivalent to a convolution kernel sliding over the image matrix until the whole image has been scanned. Each slide produces a corresponding node; the specific process is shown in Fig.5.
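Eq. (9) and the sliding-window process of Fig.5 can be made concrete with a small NumPy routine (a sketch; as in most deep learning frameworks, the kernel is applied without flipping, i.e., as cross-correlation):

```python
import numpy as np

def conv2d_valid(x, w):
    """Eq. (9): slide kernel w over image x; each placement yields one node."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((2, 2)) / 4.0               # 2x2 averaging kernel
print(conv2d_valid(x, k).shape)         # (3, 3)
```

Each output node is the weighted sum of the pixels under the kernel at one placement, exactly the "one node per slide" picture above.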
In the convolution process, each image has different characteristics because the GADF images of different sensor signals differ. Each convolution unit has more than one set of weights and filters. During training, the CNN continuously searches for filters suited to the specific classification task; that is, the CNN can accurately find the features that represent the tool state regardless of their specific position in the image, and higher-level convolutions can classify the tool state considering all the features. A CNN obtains weights and features through training and has the ability to learn and cluster, but it cannot by itself classify different kinds of signals. Therefore, it is necessary to add a multi-classification structure to the CNN and solve the multi-classification problem with a Softmax regression model.
The Softmax regression model is a generalization of the logistic regression model to multi-classification. It transforms the score values of a linear classifier into probabilities, so that the sum of these probabilities is 1. In a multi-classification problem, the class label can take more than two values. Suppose there is a picture collection {x^(i), y^(i)}; for each x^(i), the model calculates the probability p(y^(i) = k | x^(i)), k = 1, 2, ..., K, as follows:

$$p\left(y^{(i)} = k \mid x^{(i)}; \theta\right) = \frac{e^{\theta_k^{\top} x^{(i)}}}{\sum_{j=1}^{K} e^{\theta_j^{\top} x^{(i)}}}$$

Therefore, Softmax regression can be used to calculate the probability of each tool condition and classify the tool conditions.
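The Softmax transformation of scores into probabilities can be sketched directly (the max-shift below is a standard numerical-stability trick, not something stated in the paper):

```python
import numpy as np

def softmax(scores):
    """Map linear class scores to probabilities that sum to 1."""
    e = np.exp(scores - np.max(scores))   # shift for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p.argmax())  # 0 (the highest score wins the highest probability)
```

The class with the largest score always receives the largest probability, so classification reduces to taking the argmax of the output vector.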

Experimental design
The experiments are performed on a CNC lathe (TX30). The vibration and current data are collected by a CT1010SLFP vibration sensor and a JLB50A current sensor and transmitted to the computer by NI USB-6009 data acquisition cards. At the same time, the infrared image of the tool during machining is captured by a FLIR A35 infrared camera. After each machining pass, the AMT3114 micro-measuring instrument is used to measure and record the tool wear until the tool is damaged. The process of the experiment is as follows: machine the workpiece 30 mm along the axial direction with the cutter; after machining, remove the cutter from the tool holder and measure the tool wear VB. Fig. 6 shows the installation of the sensors.
For each machining process, the corresponding data acquisition card signal channels are shown in Table 1. The whole tool wear process is divided into six stages, corresponding to different tool wear VB values, and each stage is given a label, as shown in Table 2; the infrared images are shown in Fig.7, and some tool wear VB values and pictures are shown in Fig.8. The experimental design adopts an orthogonal experiment, which balances sampling over the multi-factor, multi-level range, making each experimental combination more representative and meeting the requirements of the experiment. The cutting parameters selected on this basis are listed in Table 3.

Model training, testing, and evaluation
The model is built with the Keras open-source framework, and the corresponding training and test verification are completed. Three-quarters of the data set is selected for training, verifying the availability of the model while avoiding wasted computing resources. The training data set of the CNN is composed of the encoded vibration-signal and three-phase-current images together with the infrared images. Because the time series of one machining process is very long (about 200,000 data points per run) and contains irrelevant data caused by machine start-up and stop, a representative part of the complete sensor data series is intercepted, and the corresponding images of the vibration and current data are obtained using the GADF method. After data normalization, PAA is used to reduce the number of data points to 1500 without affecting the consistency of the time series; GADF then converts the PAA-processed time series into three-channel images of 350 × 350 pixels. The label of each image is consistent with the label of the tool in that case.
After data preprocessing, a total of seven images, including six 3-channel GADF images and one infrared image, were generated for each cutting process, yielding 1120 images over the 160 experiments. Seventy-five percent of these pictures are used as the training set and 25% as the test set; the test set does not participate in training at all, giving 840 training samples and 280 test samples. The following subsections elaborate the generation of the CNN model in terms of model training, testing, and evaluation.
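The 75/25 hold-out above can be sketched with the standard library. The seed and shuffling procedure are our illustration only; the paper does not specify how images were assigned to the two sets.

```python
import random

indices = list(range(1120))     # 7 images x 160 experiments
random.seed(0)                  # assumed seed, for reproducibility only
random.shuffle(indices)
train_idx, test_idx = indices[:840], indices[840:]

print(len(train_idx), len(test_idx))  # 840 280
```

Holding the 280 test indices out entirely, as here, is what makes the reported 91% test accuracy a fair estimate of generalization.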

Model training
In the training phase, the CNN uses Softmax for classification and the cross-entropy loss function to measure the distance between the actual output of the network and the label, updating the model parameters and weights through continuous training. The data set and labels are loaded into the CNN model, which is iteratively optimized during training, constantly updating the weights of the neurons to reduce the loss value. Finally, a neural network model with classification ability is obtained. Table 4 shows the training parameters of the CNN model.
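For a one-hot label, the cross-entropy "distance" above reduces to the negative log-probability the network assigns to the true class. A minimal numeric sketch (the probability vector is invented for illustration):

```python
import numpy as np

def cross_entropy(p_pred, y_true):
    """Distance between the network's output distribution and a one-hot label."""
    return -float(np.sum(y_true * np.log(p_pred + 1e-12)))

p = np.array([0.7, 0.1, 0.05, 0.05, 0.05, 0.05])  # softmax output, 6 states
y = np.array([1, 0, 0, 0, 0, 0])                   # one-hot label: state "0"
print(round(cross_entropy(p, y), 3))  # 0.357
```

The loss is zero only when the network puts all probability mass on the correct wear stage, so driving it down is equivalent to sharpening the classification.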
The curves of accuracy and loss of the convolutional neural network under the above training parameters are shown in Fig.9. The horizontal axis is the training epoch, and the vertical axes are accuracy and loss. After about 2000 epochs, the accuracy of the model gradually increased to about 80% and the loss gradually decreased below 0.25; the accuracy then quickly reached more than 95% and gradually converged. Although the model oscillates slightly, the accuracy and loss stabilize as training continues, which shows that the model can mine tool-state-related features in the signal images and achieves a good training effect.

Model testing and evaluation
When model training is completed, the trained model with its tool state classification function is applied to the test set to evaluate its performance. This section evaluates the model using the F1 score. The F1 score is the harmonic mean of precision and recall, i.e., it is high only when both precision and recall are high. The formulas are as follows:

$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where TP, FP, and FN represent the numbers of true positive, false positive, and false negative samples, respectively.
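The three formulas translate directly into code (the per-class counts below are hypothetical, chosen only to exercise the arithmetic):

```python
def f1_metrics(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = f1_metrics(tp=45, fp=5, fn=5)  # hypothetical per-class counts
print(p, r, round(f1, 2))  # 0.9 0.9 0.9
```

Averaging per-class F1 scores unweighted gives the macro-average reported in Table 5; weighting by class support gives the weighted average.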
Based on the 280 pictures in the test set, the evaluation indicators are shown in Table 5. The accuracy on the test set is 91%, the macro-average F1 score over the six tool states is 0.91, and the weighted-average F1 score is 0.92. The model thus shows a satisfactory classification effect on the test set, performing well in predicting the "1" initial wear state and the "5" damage state; that is, the model is more sensitive to the damaged state of the tool and can quickly locate a damaged tool in practical application.
In terms of training time, the average training time of each epoch is 850 ms, and the whole training process takes about 3 h, but the time consumed on the test set is much shorter. This is because only the model parameters need to be loaded to classify the tool state on the test set, so the monitoring model can achieve real-time monitoring. Because training the network consumes substantial computing resources, the twin data transmission of the sensors depends on bandwidth, and converting the data into images also takes a certain amount of time, the algorithm is deployed in the cloud, where computing power is strong, to achieve near-real-time monitoring. In addition, although training is time-consuming, model training and testing can be accelerated by using a more powerful CPU or GPU, optimizing code, and introducing parallel computing. Fig.10 shows the confusion matrix of the model. Combined with Table 5, it can be seen that the model performs better on classes "0," "1," and "2," which correspond to initial and stable wear, and worse on class "4," which is sharp wear; severe wear is difficult for the model to classify.
A possible reason is that at this stage the tool quickly enters the damage stage, so less data is available. At the same time, due to the instability of the tool at this stage, the form, temperature, wear location, and angle of the tool vary, so the vibration curves, current curves, and infrared images also vary, making it hard to summarize a general feature. Moreover, if the machine tool is shut down manually to protect the workpiece and the machine when the tool reaches this stage, the data set for this state is further reduced, so fewer features can be extracted during model training than for other tool states. In the tool breakage stage, the tool vibration and temperature rise sharply, which differs clearly from the other states, so this stage can be classified well.

Results comparison and analysis
The model classification accuracy and its possible improvements will be discussed first in this section. Then, in order to verify the effectiveness of the model, the monitoring performance of the model will be compared from two aspects of different model inputs and different monitoring algorithms.

Discussion of classification results
The model performs relatively poorly on class 4, i.e., "sharp wear." A possible reason is that the training samples for class 4 are too few: at that stage the tool is about to enter the damage stage, the state is unstable, and it is difficult to collect a sufficient number of samples. To improve the classification accuracy for class 4, one possible improvement is to adjust the structure of the CNN, such as changing the convolution kernel size and the number of convolution layers. Another possible direction is to introduce an inception layer [22] into the CNN; the inception module is shown in Fig.11. Its advantages are that features can be abstracted at different scales simultaneously and more nonlinearities are introduced, improving the generalization ability. Thus, it may be possible to improve the classification accuracy for class 4.

Comparison between single sensor image input and multi-sensor image input
With the development of sensing and communication technology, more and more data is collected by sensors of different scales and types, and interpreting data from different sensors simultaneously can reveal the state of the tool more completely. In this experiment, only the single vibration sensor data is used as the input of the CNN, rather than all the sensor images. After training, the model is applied to the test set: the confusion matrix is shown in Fig.12, and the evaluation indices are shown in Table 6. It can be seen from Table 6 that this model predicts well only for category "5," with an F1 score of 0.89, while the classification effectiveness for the other categories falls far short of expectations. Comparing Table 5 with Table 6 and Fig.10 with Fig.12, the classification of the multi-sensor model is clearly better than that of the single-sensor model. When all kinds of sensor information are used together, the ability to synthesize multi-source information can be trained and the classification ability of the model is enhanced; multiple types of input enable the neural network to extract more effective and sensitive features from different data.

Comparison with related algorithms
Xie et al. [23] proposed a tool condition monitoring method based on LSTM. The whole LSTM network consists of one input layer, two hidden layers, and one output layer: the input layer uses tool twin data as input, the hidden layers include one LSTM layer and one fully connected layer, and the output layer likewise classifies the signals by regression. After repeated iterative training on the labeled data set, an LSTM neural network with tool condition monitoring ability is obtained. The number of LSTM neurons is 64, the dropout value is 0.4, and the batch size is 128. The trained model is applied to the test set, and the accuracy on the test set is 89.7%. We also built a simple RNN model with a learning rate of 1e−5 and 64 RNN neurons; trained and tested under the above parameters, its accuracy is 84.6%.
Reference [24] proposed a tool wear condition monitoring method based on PCA and C-SVM. On the same data set, the algorithm extracts six features; the penalty factor C and the parameter g of the RBF kernel function are 0.82932 and 0.15604, respectively. The data of the three tool states are classified under the above parameters. A total of 10 tests were carried out, and the average accuracy is 73.3%.
The same data set is also applied to a tool condition monitoring model based on Support Vector Regression (SVR). After 10 tests, the average accuracy on the test set is 70.2%. We also built a simple CNN structure with two convolution layers; the base learning rate is 0.0001, and the accuracy on the test set is 80.7%. The specific comparison of the above algorithms is shown in Table 7.
As can be seen from Table 7, compared with the traditional monitoring algorithms, the neural network algorithms provide better accuracy on the same data set, with test set accuracy above 80%. Admittedly, the training time of a neural network is generally longer than that of a traditional monitoring algorithm, the number of epochs is larger, the training time varies from several hours to dozens of hours depending on the size of the training set, and more computing resources are consumed. However, when applied to tool condition monitoring, neural networks tend to be more accurate, and in terms of monitoring time, a neural network takes only seconds to test. At the same time, a convolutional neural network can extract more and higher-dimensional features from the original data and shows better generalization when processing image information. Even when applied to data sets under other working conditions, only retraining the network is needed to obtain a new model.
It can be seen that RNN, CNN, and LSTM with gating units perform better than the traditional "feature extraction + classification" monitoring methods. In traditional tool condition monitoring methods, much time is spent preprocessing the data in the feature extraction stage to capture sensitive features, for example using wavelet analysis or the Fourier transform. Although the extracted features can represent the tool state, they still contain redundant information, and more sensitive features may be omitted, which is inevitable with manual feature extraction. This approach is time-consuming and relies on empirical knowledge, and with the explosion of industrial data its disadvantages become increasingly significant. On the contrary, deep-learning-based methods can adaptively learn features from the original data and mine deep-seated features without relying on empirical knowledge or manpower, effectively avoiding the limitations of manual feature extraction. Because a dropout layer is added to the network, the model does not rely on local features, which improves its generalization.

Conclusion
In this paper, a tool condition monitoring system framework on a cloud platform is constructed, and a tool condition monitoring model based on a convolutional neural network is proposed to realize tool condition monitoring using sensor data. In the training phase of the convolutional neural network, the sensor data is preprocessed: the data curve is smoothed by the PAA algorithm, and the data is then imaged by GADF, converting the sensor data into graphics for the CNN to use. At the same time, the infrared sensor images are used in the neural network data set for training and testing, so both time series signals and image information are used for classification. In the test phase, the model is evaluated by the confusion matrix, accuracy, F1 score, and other indicators. The results show that the algorithm achieves satisfactory monitoring performance: 1) Compared with the single-sensor model, the multi-sensor model improves the classification accuracy from 82% to 91%, showing the necessity of multiple sensors for improving model accuracy. 2) Compared with related algorithms, the proposed algorithm performs best in classification accuracy: while the classification accuracy of traditional algorithms reaches only 73.3% on the test set, the proposed algorithm achieves 91%. Moreover, the results show that it has better goodness of fit and generalization performance than the other neural network algorithms.
Author contribution Kou Rui: experiments and writing-algorithm and original draft preparation. Lian Shi-wei: data processing and writing-reviewing. Xie Nan: scheme development and overall experimental design. Lu Bei-er: experimental platform construction. Liu Xue-mei: validation.
Funding This work was supported by National Key R&D Program of China (2018YFB1700902).

Availability of data and materials
The authors confirm that the data supporting the findings of this study are available within the article.

Declarations
Ethics approval Not applicable.
Consent to participate Consent to participate in this study was obtained from all the authors.

Conflict of interest
The authors declare no competing interests.