Chatter detection in milling process based on the combination of wavelet packet transform and PSO-SVM

Chatter is one of the most unfavorable factors in high-speed machining on machine tools, because it severely degrades the surface finish and geometric accuracy of the workpiece. To overcome this obstacle and improve product quality and efficiency, it is essential to detect chatter during machining. This paper therefore proposes a multi-feature recognition system for chatter detection based on the fusion of wavelet packet transform (WPT) and particle swarm optimization support vector machine (PSO-SVM). First, the original vibration signals collected by an acceleration sensor were processed by WPT, which markedly reduced noise and irrelevant information. The wavelet packets containing chatter-emerging information were then chosen and reconstructed. Fourteen time–frequency domain features of the reconstructed vibration signal were calculated and used as the multi-feature vectors for chatter detection. Finally, to obtain the optimal radial basis function parameter g and penalty parameter C of the SVM prediction model, k-fold cross-validation (k-CV), genetic algorithm (GA), and particle swarm optimization (PSO) were employed to optimize the SVM model parameters. The results indicated that PSO-SVM clearly improved the accuracy of chatter recognition compared with the other methods. In addition, the PSO-optimized SVM prediction model was applied to detect the chatter state in end milling, and the recognition results indicated that the model accurately predicted the slight chatter state in advance.


Introduction
Chatter, a self-excited vibration between the workpiece and the cutting tool, is one of the most unfavorable factors in achieving high-performance machining [1]. It can occur in many machining processes; it directly degrades the surface finish and geometric accuracy of the workpiece, damages the tool, and shortens the life of the machine tool. Timely chatter detection is therefore a prerequisite for improving production efficiency and reducing manufacturing costs. However, the cutting process in milling is non-stationary owing to spindle wear, changes in operating temperature, workpiece stiffness, and other nonlinear factors [2]. Consequently, because the cutting environment keeps changing, chatter detection and identification in machining have remained critical problems.
Over recent decades, chatter detection has been a research hotspot. To detect chatter, various sensors have been used to acquire chatter signals, such as accelerometers, acoustic emission sensors, current sensors, and microphones [3][4][5][6]. Whichever sensor is chosen, the extracted chatter indicators must be sensitive to chatter and closely related to it. Ye et al. [7] extracted the root mean square (RMS) sequence of real-time acceleration signals and used its coefficient of variation (the ratio of standard deviation to mean) as an indicator to distinguish machining states. Tangjitsitcharoen [8] calculated the cumulative power spectral density (PSD) of three measured dynamic cutting forces and used their ratios to detect in-process chatter during NC turning. Multi-sensor fusion has also been used to extract chatter features so that detection remains robust and reliable under varying cutting conditions. Kuljanic et al. [9] investigated the sensitivity of several sensors to chatter onset and found that combining three or four sensors was the most promising solution for reliable and robust chatter identification. Pan et al. [10] combined multi-sensor fusion with manifold learning for chatter detection in boring and found that the extracted multi-features improved the recognition rate. However, some sensors, such as force sensors, may not be suitable for practical machining. For example, to ensure reliable and accurate measurements, an acoustic emission sensor must be placed close to the cutting zone between the tool and the workpiece [11], while displacement and force sensors can be difficult to install and costly. Wan et al. [12] selected 8 time-frequency domain features and 8 features automatically extracted by a stacked denoising autoencoder as chatter indicators, and found that the accuracy and reliability of chatter detection in milling were greatly improved.
In actual cutting, the acquired signals contain considerable noise, so signal processing is crucial for extracting chatter-sensitive features. Methods suited to time-varying, non-stationary signals, including the Wigner-Ville distribution, Hilbert-Huang transform, short-time Fourier transform (STFT), wavelet transform, and wavelet packet transform, effectively suppress noise and thereby enhance the signal-to-noise ratio (SNR). Fu et al. [13] decomposed the collected acceleration signals into a series of intrinsic mode functions (IMFs) using empirical mode decomposition (EMD) to quantify spectral characteristics for an online detection system. Ji et al. [14] adopted ensemble empirical mode decomposition (EEMD) to process the acceleration signals and selected the IMFs carrying chatter information to detect milling chatter in time. Although EEMD alleviates mode mixing, the application of EMD and EEMD is still restricted by their lack of a theoretical foundation [15]. Wavelet packet transform (WPT) is an effective method for processing non-stationary signals. Compared with STFT, WPT overcomes the drawbacks in analyzing high-frequency signals and achieves high resolution at both low and high frequencies. Hence, WPT has been used to preprocess signals measured in milling and to extract the frequency bands rich in chatter information. Cao et al. [16] applied WPT as a preprocessor to remove noise from measured signals, which enhanced the performance of the Hilbert-Huang transform (HHT); the mean and standard deviation of the HHT output were then used to identify chatter in end milling. Yao et al. [17] combined wavelet analysis and wavelet packet analysis to process the measured acceleration signals, and extracted the standard deviation and the energy ratio as the corresponding chatter indicators.
Although features extracted from preprocessed signals reflect the stability of the machining condition, chatter is usually detected by comparing them with thresholds [18]. However, threshold approaches require individual experience and prior knowledge of the dynamics of the machine tool, which makes them unsuitable for industrial applications. Therefore, intelligent chatter monitoring systems are needed that learn to discriminate stable from unstable cutting states. Several recognition techniques, such as neural networks, hidden Markov models (HMM), and fuzzy logic, have been used to detect changing machining conditions. Teti et al. [19] reported that neural network and fuzzy logic approaches were widely applied to monitor the cutting state. Zhang et al. [20] proposed a hybrid approach combining an HMM and an artificial neural network and found that cutting chatter was detected in time. However, these recognition techniques need a large number of samples to ensure the accuracy of chatter recognition.
Alternatively, the support vector machine (SVM), a popular supervised machine learning approach, offers strong generalization ability and low classification error, and it can solve classification and regression problems with small samples. Therefore, SVM has been widely applied to identify chatter in milling [21][22][23]. However, SVM has a drawback that limits its industrial application: the choice of the kernel function parameter and the penalty parameter C strongly affects the recognition rate of the SVM classifier. Several optimization algorithms have therefore been employed to find the optimal kernel function parameter and penalty parameter. Peng et al. [24] used k-fold cross-validation to optimize the gamma parameter of the radial basis function and the penalty parameter C. However, this approach is a local search strategy, so the SVM classifier is prone to falling into a local minimum [25,26]. A genetic algorithm (GA) has been used to optimize SVM parameters and improve classification accuracy in wheel wear monitoring, with good performance [27]. Jia et al. [28] adopted a genetic algorithm-optimized SVM (GA-SVM) to monitor the tool wear trend in deep-hole drilling. Particle swarm optimization (PSO) is another algorithm for optimizing SVM parameters and has been successfully applied to bearing fault diagnosis [4,29]. Although both GA and PSO optimize SVM parameters well for classification, PSO is simpler to implement and requires significantly less computation time than GA [30,31]. Wang et al. [32] developed a chatter detection approach based on particle swarm optimization support vector machine (PSO-SVM) for end milling and found that it recognized the machining state more accurately than other algorithms.
In this paper, a multi-feature recognition approach for chatter detection in end milling is proposed, based on the fusion of wavelet packet transform (WPT) and particle swarm optimization support vector machine (PSO-SVM). To demonstrate the accuracy of the proposed approach, cutting experiments on aluminum 6061 were conducted on a VMC1165B three-axis machining center. An acceleration sensor was used to collect vibration signals during end milling. WPT was used to remove irrelevant information and redundant noise from the original vibration signal, and the wavelet packets in specific frequency bands were then chosen and reconstructed. From the reconstructed wavelet packets, 14 time-frequency features were computed as chatter indicators. To enhance the recognition performance of the SVM classifier, three optimization algorithms, namely genetic algorithm (GA), k-fold cross-validation (k-CV), and particle swarm optimization (PSO), were employed to optimize the radial basis function parameter g and penalty parameter C of the SVM prediction model. The comparison showed that PSO-SVM clearly improved the accuracy of chatter recognition compared with the others. Furthermore, the PSO-optimized SVM prediction model was applied to detect chatter in end milling, and the recognition results indicated that the model accurately predicted the slight chatter state in advance. Figure 1 shows the scheme of the proposed chatter detection approach. The collected vibration signals were decomposed and reconstructed by WPT, with the energy ratio as the criterion for selecting the characteristic wavelet packets containing the chatter-emerging frequency. Subsequently, fourteen time-frequency features were calculated from the reconstructed characteristic wavelet packets and used as chatter recognition parameters, forming the feature vectors. Finally, the parameters of the SVM prediction model were optimized by PSO to identify the test data in end milling and output the final identification results.

Feature extraction of chatter
The distribution and amplitude of the collected vibration signals change with the cutting state, so chatter can be detected by monitoring the amplitude and distribution of the signal. Suppose there is a vibration signal $x_i$ ($i = 1, 2, \dots, n$), where $n$ is the number of collected data points. Ten time-domain features were chosen to identify the chatter state [12,33], as listed in Table 1. In Table 1, $x_m$, $x_p$, and $x_{rms}$ represent the energy and amplitude of the vibration signal in the time domain, while $x_{std}$, $x_{ske}$, $x_{kur}$, CF, CLF, SF, and IF reflect its distribution in the time domain.
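Table 1 itself is not reproduced in this text. As a minimal sketch, the ten time-domain features can be computed with NumPy, assuming the usual condition-monitoring definitions: the population standard deviation, and CF, CLF, SF, and IF taken to be the crest, clearance, shape, and impulse factors. The exact normalizations in Table 1 may differ.

```python
import numpy as np


def time_domain_features(x):
    """Ten time-domain chatter features of a record x_1..x_n.

    Assumed definitions (Table 1 may normalize differently): x_std uses the
    population form (ddof=0); CF, CLF, SF and IF are taken to be the crest,
    clearance, shape and impulse factors.
    """
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    x_m = x.mean()                          # mean value
    x_p = abs_x.max()                       # peak value
    x_rms = np.sqrt(np.mean(x ** 2))        # root mean square
    x_std = x.std()                         # standard deviation
    centered = x - x_m
    x_ske = np.mean(centered ** 3) / x_std ** 3   # skewness
    x_kur = np.mean(centered ** 4) / x_std ** 4   # kurtosis
    mean_abs = abs_x.mean()
    return {
        "x_m": x_m, "x_p": x_p, "x_rms": x_rms, "x_std": x_std,
        "x_ske": x_ske, "x_kur": x_kur,
        "CF": x_p / x_rms,                           # crest factor
        "CLF": x_p / np.mean(np.sqrt(abs_x)) ** 2,   # clearance factor
        "SF": x_rms / mean_abs,                      # shape factor
        "IF": x_p / mean_abs,                        # impulse factor
    }


if __name__ == "__main__":
    t = np.arange(1000) / 1000.0
    feats = time_domain_features(np.sin(2 * np.pi * 5 * t))
    print(feats["CF"], feats["x_kur"])
```

For a pure sine the crest factor is close to the square root of 2 and the kurtosis close to 1.5; impulsive chatter signals tend to push these distribution features away from such values.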
Furthermore, because the time-domain features above cannot directly capture some latent chatter information, frequency-domain features are also introduced for chatter detection [12]: when chatter occurs, the amplitude and distribution of the frequency components of the vibration signal may change [13]. Accordingly, four frequency-domain parameters were selected as chatter indicators [33,34], as described below.

Mean square frequency: MSF
where $S(f_j)$ is the power spectrum amplitude obtained by FFT, and $f_j$ ($j = 1, 2, \dots, m$) is the $j$-th frequency of the vibration signal. MSF reflects the energy of the frequency-domain signal, ρ and FC describe the location of the dominant frequency band, and FV represents the energy distribution of the frequency-domain signal.
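The defining expressions for the four features did not survive extraction. The sketch below assumes the standard spectral-moment definitions: power-weighted averages of $f_j^2$ (MSF) and $f_j$ (FC), the power-weighted variance about FC (FV), and the one-step autocorrelation $\rho = \sum_j \cos(2\pi f_j \Delta t) S(f_j) / \sum_j S(f_j)$. These are assumptions, not a transcription of the paper's formulas.

```python
import numpy as np


def spectral_features(x, fs):
    """MSF, FC, FV and rho from the one-sided FFT power spectrum.

    Assumed definitions: S(f_j) = |X(f_j)|^2 weights each frequency f_j;
    FV is the spectral variance about FC; rho is the one-step
    autocorrelation sum(cos(2*pi*f_j*dt) * S) / sum(S), dt = 1 / fs.
    """
    x = np.asarray(x, dtype=float)
    s = np.abs(np.fft.rfft(x)) ** 2            # power spectrum S(f_j)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)    # frequencies f_j in Hz
    total = s.sum()
    fc = np.sum(f * s) / total                 # frequency center
    msf = np.sum(f ** 2 * s) / total           # mean square frequency
    fv = np.sum((f - fc) ** 2 * s) / total     # frequency variance
    rho = np.sum(np.cos(2 * np.pi * f / fs) * s) / total
    return {"MSF": msf, "FC": fc, "FV": fv, "rho": rho}


if __name__ == "__main__":
    fs = 1000
    t = np.arange(fs) / fs
    print(spectral_features(np.sin(2 * np.pi * 100 * t), fs))
```

For a single tone, FC equals the tone frequency, MSF its square, and FV is nearly zero; a spread-out stable-cutting spectrum gives a larger FV than a spectrum concentrated at one chatter frequency.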
However, computing these frequency-domain features requires a time-consuming fast Fourier transform (FFT). To avoid this, a fast calculation criterion was employed [35,36], with which the four frequency-domain features can be computed in the time domain instead, as shown in Table 2. Here $\dot{x}_i$ is the first-order difference, $\dot{x}_i = (x_i - x_{i-1})/\Delta t$, where $\Delta t$ is the sampling interval. Hence, fourteen time-frequency features were selected as chatter recognition parameters in this study, forming the feature vectors.
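Table 2's expressions are not reproduced here, but the idea behind the fast criterion can be checked for MSF. By Parseval's theorem, differentiation multiplies the spectrum by $j2\pi f$, so for a zero-mean signal $\mathrm{MSF} \approx \sum \dot{x}_i^2 / (4\pi^2 \sum x_i^2)$. The sketch compares this FFT-free estimate with the FFT-based value. It does not reconstruct the other entries of Table 2.

```python
import numpy as np


def msf_fft(x, fs):
    """Mean square frequency from the one-sided FFT power spectrum."""
    s = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.sum(f ** 2 * s) / np.sum(s)


def msf_time(x, fs):
    """FFT-free estimate: MSF ~ mean(xdot^2) / (4 pi^2 mean(x^2)).

    xdot_i = (x_i - x_{i-1}) / dt with dt = 1 / fs. Valid for a zero-mean
    signal; the finite difference slightly underestimates components that
    are not well below fs / 2.
    """
    x = np.asarray(x, dtype=float)
    xdot = np.diff(x) * fs          # first-order difference divided by dt
    return np.mean(xdot ** 2) / (4 * np.pi ** 2 * np.mean(x ** 2))


if __name__ == "__main__":
    fs = 12000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 300 * t)
    print(msf_fft(x, fs), msf_time(x, fs))   # both close to 300**2
```

The time-domain version needs only one pass over the samples, which is the motivation the text gives for replacing the FFT.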

Wavelet packet transform
However, noise in the collected vibration signal can seriously degrade the chatter recognition rate, so eliminating and suppressing noise is essential for extracting chatter indicators. Wavelet packet transform (WPT) is a good candidate for this task [16]. WPT is implemented with a basic two-channel filter bank that is iterated on both the low-pass and the high-pass branches. It therefore decomposes the low and high frequencies simultaneously, which improves the time-frequency resolution. Pretreating the collected vibration signals with WPT yields a series of narrow bands, over which the energy of broadband noise is spread roughly uniformly. Choosing only the bands that contain the chatter-emerging frequency therefore improves the signal-to-noise ratio to a certain extent.
The wavelet method shifts and scales a prototype function, called the mother wavelet, to decompose a time-domain signal onto the time-scale (time-frequency) plane. Suppose there is a mother wavelet $\Psi(t) \in L^2(\mathbb{R})$ with a family of shifted and scaled functions $\Psi_{s,u}(t)$ ($s, u \in \mathbb{R}$, $s > 0$). The whole family is produced by dilating or contracting the modulated window $\Psi(t)$ and translating it in time, as described by Eq. (1) [37,38]:
$$\Psi_{s,u}(t) = \frac{1}{\sqrt{s}}\,\Psi\!\left(\frac{t-u}{s}\right) \tag{1}$$
Here $s$ and $u$ are the scaling and position parameters, respectively. For a continuous-time signal $x(t)$, the continuous wavelet transform (CWT) is given by Eq. (2) [37,38]:
$$CWT_x(s,u) = \langle x, \Psi_{s,u} \rangle = \frac{1}{\sqrt{s}}\int_{-\infty}^{\infty} x(t)\,\Psi^{*}\!\left(\frac{t-u}{s}\right)dt \tag{2}$$
where $CWT_x(s,u)$ is the inner product of $x(t)$ with the family of shifted and scaled wavelets, and $^{*}$ denotes complex conjugation. CWT handles both stationary and non-stationary signals well and locates each frequency component in time. However, CWT is computationally expensive, so it is not suitable for online applications.
Table 2 Frequency-domain feature parameters [35,36]
Based on conjugate quadrature filters, Mallat developed a fast algorithm for the discrete wavelet transform (DWT) [39]. However, DWT only decomposes the low-frequency (approximation) part further at each level, so high-frequency information is resolved poorly. To overcome this drawback, WPT decomposes both the approximation and the detail signals with low-pass and high-pass filters. The basic wavelet packet functions are defined as follows [39]:
$$u_{2n}(t) = \sqrt{2}\sum_{k} h(k)\,u_n(2t-k), \qquad u_{2n+1}(t) = \sqrt{2}\sum_{k} g(k)\,u_n(2t-k)$$
Here, $h$ and $g$ are the low-pass and high-pass filter coefficients of the wavelet decomposition, respectively. They are related by the orthogonality (quadrature mirror) condition $g(k) = (-1)^k h(1-k)$, and $u_0(t)$ and $u_1(t)$ are the scaling function and the wavelet function, respectively.
The wavelet packet decomposition coefficients can be obtained iteratively:
$$x_{2n,j+1}(m) = \sum_{k} h(k-2m)\,x_{n,j}(k), \qquad x_{2n+1,j+1}(m) = \sum_{k} g(k-2m)\,x_{n,j}(k)$$
where $x_{n,j}$ denotes the wavelet packet coefficients of the $n$-th sub-band at decomposition level $j$ ($j = 1, 2, \dots$), and $m$ indexes the wavelet coefficients. For a discrete vibration signal $x(t)$, a three-level WPT decomposition is illustrated in Fig. 2, where $x_{i,j}(t)$ denotes the $j$-th frequency band signal at level $i$, and $j = 1, 2, \dots, J$ with $J = 2^i$ is the index of the decomposed frequency band.
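The recursion above can be sketched in a few lines. The paper uses the db10 wavelet; to keep the example dependency-free, this sketch swaps in the Haar filter pair ($h = [1, 1]/\sqrt{2}$, $g = [1, -1]/\sqrt{2}$), which follows the same iteration but has much poorer frequency selectivity. It also handles band ordering: high-pass filtering followed by downsampling mirrors the spectrum, so the natural-order packets must be reordered (Gray code) before packet $k$ can be read as the band $[k, k+1] \cdot f_s / 2^{L+1}$, counting $k$ from 0. With $f_s$ = 12,000 Hz and four levels, each band is 375 Hz wide.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_wpt(x, level):
    """Full wavelet packet decomposition with the Haar filter pair.

    Returns the 2**level terminal packets in frequency order: element k
    nominally covers [k, k + 1] * fs / 2**(level + 1), k counted from 0.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** level):
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [x]  # natural (Paley) order: first split is the most significant bit
    for _ in range(level):
        nxt = []
        for c in nodes:
            even, odd = c[0::2], c[1::2]
            nxt.append((even + odd) / SQRT2)  # low-pass branch, downsampled
            nxt.append((even - odd) / SQRT2)  # high-pass branch, downsampled
        nodes = nxt
    # The high-pass branch mirrors the spectrum, so frequency band k sits
    # at natural index gray(k) = k ^ (k >> 1).
    return [nodes[k ^ (k >> 1)] for k in range(len(nodes))]


def band_edges(k, level, fs):
    """Nominal frequency range (Hz) of frequency-ordered packet k."""
    width = fs / 2 ** (level + 1)
    return k * width, (k + 1) * width


if __name__ == "__main__":
    fs = 12000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 2824 * t)           # synthetic tone, not measured data
    energies = [np.sum(p ** 2) for p in haar_wpt(x, 4)]
    k = int(np.argmax(energies))
    print(k, band_edges(k, 4, fs))
```

A 2824 Hz tone lands mostly in packet 7 (2625–3000 Hz), which matches the band labelling used later in the paper. With PyWavelets installed, `pywt.WaveletPacket(x, 'db10', maxlevel=4).get_level(4, order='freq')` should provide the db10 equivalent.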
Before extracting chatter features, the frequency bands of the decomposed wavelet packets that carry abundant chatter information must be identified accurately. The energy ratio of the wavelet packets in end milling is therefore used to locate the chatter-emerging frequency band [40]. During stable milling, the energy of the wavelet packet nodes is distributed over the entire frequency range, whereas the energy ratio of certain wavelet packets increases dramatically when chatter occurs. The wavelet packets with relatively large energy ratios were therefore chosen, and chatter features were calculated from the reconstructed wavelet packets.
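Assuming the usual definition of the energy ratio, $E_k / \sum_k E_k$ with $E_k$ the sum of squared coefficients of packet $k$, the selection step can be sketched as follows. The packets are taken as a list in frequency order (any WPT implementation could supply them), and the default of two bands mirrors the paper's choice of two characteristic packets.

```python
import numpy as np


def energy_ratios(packets):
    """Energy ratio E_k / sum(E) of each packet, E_k = sum of squared coefficients."""
    energies = np.array([np.sum(np.square(p)) for p in packets], dtype=float)
    return energies / energies.sum()


def select_chatter_bands(packets, n_bands=2):
    """Indices (ascending) of the n_bands packets with the largest energy ratio."""
    ratios = energy_ratios(packets)
    chosen = sorted(np.argsort(ratios)[::-1][:n_bands].tolist())
    return chosen, ratios


if __name__ == "__main__":
    # Toy packets: 16 bands, two of them carrying most of the energy.
    amps = [1.0] * 16
    amps[7] = amps[8] = 3.0
    packets = [a * np.ones(4) for a in amps]
    bands, ratios = select_chatter_bands(packets)
    print(bands, ratios[bands].sum())
```

In the toy case, bands 7 and 8 hold 56.25% of the energy. The paper's stable, transition, and chatter values for its two selected packets play the same role.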

The theory of SVM
Support vector machine (SVM) is a supervised machine learning approach based on statistical learning theory. Following the structural risk minimization principle, it reduces both the empirical risk and the confidence interval, which improves the generalization ability of the learned model [26,41]. SVM seeks the optimal separating hyperplane in a feature space that classifies the training set, so the optimization problem becomes finding the hyperplane that maximizes the margin between the hyperplane and the nearest training data. Two classification cases of linear SVM are shown in Fig. 3.
Consider a set of samples from two classes, $\{(x_i, y_i)\}$, $i = 1, 2, \dots, n$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{+1, -1\}$, where $x_i$ is an input vector, $y_i$ is the class label of $x_i$, and $n$ is the total number of samples. Figure 3 shows data points formed by feature vectors of different classes; squares and circles denote class A and class B, respectively. The linear boundary H separates the two sets so that the margin between the boundary and the nearest data points of the two classes is maximized. These nearest points, which lie on the boundaries H1 and H2, are called support vectors. In general, a larger margin reduces the generalization error of the classifier. The linear boundary H is expressed as
$$\omega \cdot x + b = 0$$
where $b$ is the bias term and $\omega$ is the weight vector perpendicular to the optimal hyperplane; together, $\omega$ and $b$ determine the position of the separating hyperplane. The decision rules for classifying samples into the two classes are therefore
$$\omega \cdot x_i + b \ge +1 \ \text{ for } y_i = +1, \qquad \omega \cdot x_i + b \le -1 \ \text{ for } y_i = -1$$
To achieve an accurate classification, the margin $2/\|\omega\|$ between the two hyperplanes H1 and H2 should be maximized, since it determines the generalization ability of the hyperplane. The problem of finding the optimal hyperplane then becomes the constrained optimization problem [26]:
$$\min_{\omega, b, \xi}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.} \quad y_i\left(\omega \cdot \phi(x_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \tag{9}$$
where $C$ is the penalty parameter that trades off training error against generalization, $\xi_i$ are slack variables [25], and $\phi(x_i)$ maps $x_i$ from the input space to a higher-dimensional feature space, which allows linear classification in that space [42].
To solve it, this problem is usually converted, using Lagrange multipliers $\alpha_i$, into the following quadratic programming (QP) problem, so the optimization problem of Eq. (9) can be rewritten as
$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n}\alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C$$
where $K(x_i, x_j)$ is the kernel function, which must satisfy Mercer's theorem and equals the dot product in the higher-dimensional feature space:
$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$$
Once the QP problem is solved, the weight vector $\omega$ and the bias $b$ satisfy
$$\omega = \sum_{i=1}^{n}\alpha_i y_i \phi(x_i), \qquad b = y_j - \sum_{i=1}^{n}\alpha_i y_i K(x_i, x_j) \ \text{ for any } j \text{ with } 0 < \alpha_j < C$$
The final decision function of SVM is therefore
$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_i y_i K(x_i, x) + b\right)$$
Kernel functions extend SVM classification to nonlinear problems. Commonly used kernels include the linear, polynomial, radial basis function (RBF), and sigmoid kernels [22,43], listed in Table 3. Their overall performance is similar; however, the RBF kernel handles multidimensional data well and has fewer parameters than the polynomial and sigmoid kernels [44]. Hence, the RBF kernel is employed in the SVM in this study.
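The kernels of Table 3 and the final decision function can be sketched as follows. Table 3's formulas are not reproduced in this text, so the linear, polynomial, and sigmoid forms and their default parameters (`gamma`, `r`, `d`) are the common textbook ones, assumed here. The RBF form $\exp(-g\|x_i - x_j\|^2)$ is assumed to match the paper's parameter $g$. The support vectors, $\alpha_i y_i$, and $b$ would normally come from solving the dual problem; the values in the usage are made up.

```python
import numpy as np


def linear_kernel(a, b):
    """K(a, b) = a . b for every row pair."""
    return np.atleast_2d(a) @ np.atleast_2d(b).T


def polynomial_kernel(a, b, gamma=1.0, r=1.0, d=3):
    """K(a, b) = (gamma * a . b + r)^d (assumed parameterization)."""
    return (gamma * linear_kernel(a, b) + r) ** d


def sigmoid_kernel(a, b, gamma=1.0, r=0.0):
    """K(a, b) = tanh(gamma * a . b + r) (assumed parameterization)."""
    return np.tanh(gamma * linear_kernel(a, b) + r)


def rbf_kernel(a, b, g):
    """K(a, b) = exp(-g * ||a - b||^2), g being the RBF parameter."""
    a, b = np.atleast_2d(a), np.atleast_2d(b)
    sq = (np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :]
          - 2 * a @ b.T)
    return np.exp(-g * np.maximum(sq, 0.0))   # clamp tiny negative round-off


def svm_decision(x, support_vectors, alpha_y, b, g):
    """f(x) = sgn(sum_i alpha_i y_i K(x_i, x) + b) with the RBF kernel."""
    k = rbf_kernel(support_vectors, x, g)     # shape (n_sv, n_samples)
    return np.sign(np.asarray(alpha_y) @ k + b)


if __name__ == "__main__":
    sv = np.array([[0.0, 0.0], [2.0, 2.0]])   # made-up support vectors
    print(svm_decision(np.array([[1.9, 2.1], [0.1, -0.2]]), sv, [-1.0, 1.0], 0.0, 1.0))
```

Each test point takes the label of the nearby support vector, because the RBF kernel decays quickly with distance. A larger $g$ makes that decay sharper, which is why $g$ and $C$ must be tuned together.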

Parameters selection of SVM with PSO
In this study, choosing reasonable values of the RBF parameter g and the penalty parameter C of SVM is important, because these parameters strongly affect the recognition performance of the SVM classifier. Particle swarm optimization (PSO), proposed by Kennedy and Eberhart in 1995, is an intelligent bionic algorithm inspired by social behavior such as bird flocking and foraging. In theory, PSO is a global search algorithm that, unlike GA, has no crossover or mutation operators [29], so its parameters are easy to set. PSO starts by randomly initializing a swarm of particles, which then search the space for the optimal solution [45].
Assume a $d$-dimensional search space and a swarm of $m$ particles. The position and velocity of the $i$-th particle are $X_i = (x_{i,1}, x_{i,2}, \dots, x_{i,d})$ and $V_i = (v_{i,1}, v_{i,2}, \dots, v_{i,d})$, respectively, where $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, d$ indexes the dimensions. In PSO, each particle moves toward a combination of its own previous best position and the best position found by any particle in the swarm. By evaluating the fitness of each particle, the previous best position of the $i$-th particle (pbest) is recorded, and the global best position of the whole swarm (gbest) is found. At each iteration, the velocities of all particles are computed and their positions updated [46], until the global best value is obtained. Let $k$ denote the current iteration. The velocity and position of the $i$-th particle in the $j$-th dimension are updated as follows [29]:
$$v_{i,j}(k+1) = \omega\, v_{i,j}(k) + c_1 r_1\left(p_{i,j} - x_{i,j}(k)\right) + c_2 r_2\left(p_{g,j} - x_{i,j}(k)\right)$$
$$x_{i,j}(k+1) = x_{i,j}(k) + v_{i,j}(k+1)$$
where $c_1, c_2 > 0$ are acceleration constants; $r_1$ and $r_2$ are random numbers in $[0, 1]$; $p_{i,j}$ is the best position of the $i$-th particle in the $j$-th dimension and $p_{g,j}$ is the best position of the entire swarm; $v_{i,j} \in [-V_{max}, V_{max}]$ is the current velocity, with $V_{max}$ the maximum allowed velocity. $\omega$ is the inertia weight, which balances local exploitation and global exploration. A widely used choice is the linearly decreasing weight (LDW) [30], defined as
$$\omega(k) = \omega_{max} - \frac{\omega_{max} - \omega_{min}}{k_{max}}\,k$$
where $\omega_{max}$ and $\omega_{min}$ are the maximal and minimal inertia weights, $k$ is the current iteration, and $k_{max}$ is the maximal number of iterations of PSO.
Table 3 Four kernel functions and formulas [22,43]
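A sketch of one swarm-wide update following the velocity and position equations and the LDW schedule above. The bounds $\omega_{max} = 0.9$ and $\omega_{min} = 0.4$ are common choices assumed here, since the paper does not state them. $r_1$ and $r_2$ are drawn independently for every particle and dimension.

```python
import numpy as np


def ldw_inertia(k, k_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight omega(k); w_max, w_min are assumed values."""
    return w_max - (w_max - w_min) * k / k_max


def pso_update(x, v, pbest, gbest, w, c1, c2, v_max, rng):
    """One velocity and position update for all m particles.

    x, v, pbest have shape (m, d); gbest has shape (d,).
    """
    r1 = rng.random(x.shape)      # r1, r2 ~ U[0, 1]
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v_new = np.clip(v_new, -v_max, v_max)   # enforce v in [-V_max, V_max]
    return x + v_new, v_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 10.0, size=(5, 2))
    v = np.zeros_like(x)
    x, v = pso_update(x, v, x.copy(), x[0], ldw_inertia(1, 200), 1.5, 1.7, 1.0, rng)
    print(x.shape, np.abs(v).max() <= 1.0)
```

Early iterations use a large inertia weight for exploration; as $\omega$ falls toward $\omega_{min}$, the swarm settles around pbest and gbest.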
In this paper, the radial basis function (RBF) was chosen as the kernel function. The performance of the SVM classifier therefore depends on two user-determined parameters: the penalty parameter C and the RBF parameter g.
In the PSO system, each particle encodes the penalty parameter C and the RBF parameter g of SVM. The procedure for optimizing the SVM parameters by PSO is shown in Fig. 4 and proceeds as follows:
1. Input data. Prepare the training and testing sets.
2. Initialize the swarm. Set the acceleration constants $c_1$ and $c_2$, the maximum number of iterations $k_{max}$, and the current iteration $k = 1$. Randomly generate $m$ particles in the $d$-dimensional space, the position and velocity of the $i$-th particle being $X_i$ and $V_i$.
3. Evaluate the fitness of all particles. Compute the objective function value of each particle at its current position in the search space.
4. Update pbest. Compare each particle's current fitness with that of its historical best position (pbest). If the current fitness is better, pbest is replaced by the current position; otherwise, pbest remains unchanged.
5. Update gbest. Compare the fitness of the swarm's historical best position (gbest) with the current fitness of all particles. If a particle is better, gbest is replaced by that particle's position; otherwise, gbest remains unchanged.
6. Update velocities and positions. Update the velocities and positions of all particles according to Eqs. (11) and (12) to form the new swarm.
7. Check the stopping criterion. The search stops when the maximum number of iterations is reached or the fitness reaches the required precision. If the criterion is met, terminate; otherwise, set $k = k + 1$ and return to step 3.
8. Output the optimized SVM parameters C and g.

Experimental setup of end milling
As shown in Fig. 5, to demonstrate the accuracy of the proposed chatter recognition approach, cutting experiments on aluminum 6061 were conducted on a VMC1165B three-axis machining center. The cutting tool was a two-flute carbide end mill with an overhang of 44 mm and a diameter of 8 mm. An acceleration sensor was mounted on the spindle housing, and an NI USB-6341 data acquisition card was used to acquire the vibration signals during end milling. The sampling frequency was set to 12,000 Hz. The end milling experiments were carried out dry, without coolant.
Fig. 4 The procedure of optimizing the SVM parameters with PSO
It is well known that the occurrence of chatter is closely related to spindle speed, depth of cut, and feed rate [47]. In this study, at each spindle speed, the milling depth was increased in steps of 0.2 mm until chatter occurred. The specific milling parameters are listed in Table 4.
In addition, to obtain the natural frequencies of the tool-workpiece system, a hammer test was carried out before milling, with the hammer striking the tool tip. The transfer function of the tool-workpiece system was obtained by single-point impact and response measurement, giving first-, second-, and third-order natural frequencies of 1494, 2041, and 4160 Hz, respectively. Figure 6a, b shows the three typical processing states (i.e., stable, transition, and chatter) of the vibration signal. Figure 6c, f shows the partially enlarged vibration signal and the FFT of the stable cutting state in Fig. 6a, respectively, and Fig. 6d, h and Fig. 6e, i show the partially enlarged vibration signals and FFTs of the transition and chatter states in Fig. 6b, respectively. In stable cutting (Fig. 6c, f), the vibration amplitude is small and the frequency components are dispersed, with the main peaks at 1301, 2039, 2801, 4078, and 5145 Hz. These components correspond approximately to the first-order natural frequency, the second-order natural frequency, twice the first-order natural frequency, the third-order natural frequency, and four times the first-order natural frequency, respectively. When the axial depth of cut is increased to 0.8 mm and the spindle speed to 6000 r/min, the transition state of chatter appears. In this state, the signal amplitude increases only slightly, but the distribution of the frequency components changes drastically (Fig. 6d, h): other frequency components are suppressed and the energy concentrates around 2824 Hz. This characteristic frequency is almost twice the first-order natural frequency [48]. This is attributed to severe instability associated with the helix angle of the milling tool, whose repeated impacts on the workpiece drive chatter [49].
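The dominant spectral peak that separates the stable and transition states can be located programmatically. The sketch below uses a Hann-windowed FFT. The synthetic signal in the usage is invented for illustration and is not measured data: a strong 2824 Hz component, a weaker 1301 Hz one, and noise, sampled at 12,000 Hz for 1 s, which gives 1 Hz frequency resolution.

```python
import numpy as np


def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest peak in the Hann-windowed amplitude spectrum."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                               # remove DC so it cannot dominate
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]


if __name__ == "__main__":
    fs = 12000
    t = np.arange(fs) / fs
    noise = np.random.default_rng(0).normal(0.0, 0.1, fs)
    sig = np.sin(2 * np.pi * 2824 * t) + 0.3 * np.sin(2 * np.pi * 1301 * t) + noise
    print(dominant_frequency(sig, fs))
```

The window reduces spectral leakage from strong components into neighboring bins, which matters when several natural-frequency peaks sit close together.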
The onset of chatter also suppresses other frequencies. Subsequently, severe chatter develops as energy accumulates (Fig. 6b): the signal amplitude increases and the 2824 Hz chatter frequency is further enhanced. This paper aims to identify the incipient chatter state effectively, so as to avoid its undesirable effects on the surface quality of the material and on the milling cutter. Effective and accurate recognition of the transition stage of chatter is therefore important, and this paper focuses on extracting features from that stage. The db10 wavelet was chosen as the wavelet basis function because of its good orthogonality. WPT was used to decompose the measured vibration signals of the stable and chatter states into four levels, yielding sixteen wavelet packets. Figure 6 shows the four-level WPT of the reconstructed signal in the transition state of chatter, corresponding to Fig. 6d, together with the FFT of each frequency band. The acceleration amplitude in the frequency bands $x_{4,7}$ (2625–3000 Hz) and $x_{4,8}$ (3000–3375 Hz) was larger than in the other bands, so the chatter frequency is likely to appear in $x_{4,7}$ and $x_{4,8}$. The energy ratio of each frequency band was calculated, as shown in Table 5. The energy ratio of the wavelet packets $x_{4,7}$ and $x_{4,8}$ in the stable, transition, and chatter states was 0.1137, 0.53, and 0.8467, respectively (Table 5). Thus, when chatter occurred, the vibration energy increased sharply and concentrated in the wavelet packets $x_{4,7}$ and $x_{4,8}$. Because they contain rich chatter information, these two wavelet packets were chosen and reconstructed.
The reconstructed vibration signal of the characteristic wavelet packets and its FFT are shown in Fig. 7. The time-domain features of the reconstructed signal become more pronounced, and its spectrum retains the complete characteristic information of the transition state of chatter. This indicates that selecting the characteristic wavelet packets after WPT preprocessing effectively removes redundant noise and irrelevant information (Fig. 8).
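Reconstruction from the selected packets can be sketched by zeroing every other terminal packet and inverting the transform. The Haar filter pair stands in for the paper's db10 wavelet, an assumption made to keep the sketch dependency-free. Band indices count from 0 in frequency order, so `keep=[7, 8]` corresponds to 2625–3000 Hz and 3000–3375 Hz at a 12,000 Hz sampling rate. Because the Haar packet basis is orthonormal, the reconstructed signal's energy equals the energy of the kept packets.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_wpt_natural(x, level):
    """Haar wavelet packet decomposition; terminal packets in natural (Paley) order."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nodes = [child for c in nodes
                 for child in ((c[0::2] + c[1::2]) / SQRT2,    # low-pass half
                               (c[0::2] - c[1::2]) / SQRT2)]   # high-pass half
    return nodes


def haar_iwpt_natural(nodes):
    """Inverse of haar_wpt_natural: merge sibling pairs level by level."""
    nodes = list(nodes)
    while len(nodes) > 1:
        merged = []
        for a, d in zip(nodes[0::2], nodes[1::2]):
            parent = np.empty(2 * len(a))
            parent[0::2] = (a + d) / SQRT2
            parent[1::2] = (a - d) / SQRT2
            merged.append(parent)
        nodes = merged
    return nodes[0]


def reconstruct_bands(x, level, keep):
    """Rebuild x from the frequency-ordered bands in `keep`, zeroing all others."""
    kept_natural = {k ^ (k >> 1) for k in keep}   # frequency index -> natural index
    nodes = haar_wpt_natural(x, level)
    nodes = [p if n in kept_natural else np.zeros_like(p) for n, p in enumerate(nodes)]
    return haar_iwpt_natural(nodes)


if __name__ == "__main__":
    fs = 12000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 2824 * t) + np.sin(2 * np.pi * 500 * t)  # synthetic
    y = reconstruct_bands(x, 4, keep=[7, 8])
    print(np.sum(y ** 2) / np.sum(x ** 2))   # share of energy kept
```

Keeping all sixteen bands returns the original signal exactly, which is a quick sanity check of the transform pair.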

Extraction of chatter features
In this paper, we mainly focus on identifying incipient chatter (i.e., the transition state) from the vibration signal, as displayed in Fig. 6d. To extract chatter features, the vibration signal is first preprocessed with a four-level WPT, which yields sixteen wavelet packets in the time-frequency domain. The characteristic wavelet packets x4,7 and x4,8 in the stable and chatter transition states, corresponding to Fig. 6c, d, were then chosen and reconstructed according to the analysis in Sect. 4.1. Fourteen time-frequency features were then calculated from the reconstructed vibration signal. However, the magnitudes of the fourteen features differ greatly, which may prevent training from converging and increase the training time. It is therefore necessary to preprocess the feature parameters before training: to avoid overly large differences between them, the mean square frequency, frequency center, and standard frequency were divided by 10^7, 10^4, and 10^5, respectively. The results were described in …. In addition, the transition state during milling was treated as part of the chatter state in this study. Based on the experimental parameters in Table 5, we obtained 30 samples of the stable state and 30 samples of the chatter state. The training set consisted of 20 stable samples and 20 chatter samples selected at random, and the testing set consisted of the remaining samples.
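The feature rescaling and the per-class random split described above can be sketched as follows. The divisors are the ones given in the text; the dictionary keys are hypothetical feature names chosen for illustration, features without a listed divisor are assumed to pass through unchanged, and the fixed random seed is an assumption added only to make the split reproducible.

```python
import random

# Divisors reported in the text; the key names are hypothetical labels for those features.
FEATURE_DIVISORS = {
    "mean_square_frequency": 1e7,
    "frequency_center": 1e4,
    "standard_frequency": 1e5,
}


def rescale(features):
    """Divide the listed features by fixed powers of ten; other features pass through unchanged."""
    return {name: value / FEATURE_DIVISORS.get(name, 1.0) for name, value in features.items()}


def split_per_class(stable, chatter, n_train=20, seed=0):
    """Randomly draw n_train samples of each class for training; the rest form the test set.

    Labels: 0 = stable, 1 = chatter (transition samples are counted as chatter).
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in ((0, stable), (1, chatter)):
        idx = list(range(len(samples)))
        rng.shuffle(idx)
        train += [(samples[i], label) for i in idx[:n_train]]
        test += [(samples[i], label) for i in idx[n_train:]]
    return train, test
```

With 30 samples per class this gives 40 training and 20 testing samples, matching the sizes in the text. Splitting each class separately keeps the training set balanced at 20 + 20, which a single shuffle over all 60 samples would not guarantee.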

Chatter recognition based on time-frequency characteristics of WPT and PSO-SVM
As analyzed previously, an inappropriate penalty parameter C or RBF parameter g may cause over-fitting or under-fitting of the SVM classifier, which severely degrades its classification accuracy. In practice, however, the optimal values of C and g are difficult to determine, so they must be set before the SVM prediction model is applied. In this study, PSO was employed to optimize the penalty parameter C and RBF parameter g of the SVM classifier. The PSO-SVM parameters were initialized as follows: the swarm size was 20 particles, the maximum number of iterations was 200, and the acceleration constants c1 and c2 were 1.5 and 1.7, respectively. The search range of C was 0.1 to 100, and that of g was 0.01 to 1000. The fitness curves of PSO searching for the optimal parameters of the SVM classification model are shown in Fig. 9. As Fig. 9 shows, the best fitness reached 95%, and the optimal values of the penalty parameter C and RBF parameter g were C = 74.89 and g = 0.01, respectively. The chatter recognition accuracy on the testing set is 95%, as listed in Table 7.
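The PSO search described above can be sketched as a global-best particle swarm over the box C ∈ [0.1, 100], g ∈ [0.01, 1000], using the swarm size, iteration count, and acceleration constants given in the text. The inertia weight schedule (linearly decreasing from 0.9 to 0.4), the velocity clamp, and the random seed are assumptions, since the text does not report them. The `fitness` callable is deliberately generic: in PSO-SVM it would return the cross-validation accuracy of an RBF SVM trained with the candidate (C, g).

```python
import random


def pso_maximize(fitness, bounds, n_particles=20, n_iter=200, c1=1.5, c2=1.7,
                 w_start=0.9, w_end=0.4, seed=0):
    """Global-best PSO that maximizes `fitness` over a box.

    bounds: list of (low, high) per dimension, e.g. [(0.1, 100), (0.01, 1000)] for (C, g).
    Inertia schedule, velocity clamp and seed are assumptions not given in the text.
    Returns (best_position, best_fitness, history of best fitness per iteration).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    vmax = [0.2 * (hi - lo) for lo, hi in bounds]  # assumed velocity clamp

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best_i = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[best_i][:], pbest_fit[best_i]
    history = []

    for it in range(n_iter):
        w = w_start - (w_start - w_end) * it / max(n_iter - 1, 1)  # linearly decreasing inertia
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward personal best
                     + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                vel[i][d] = max(-vmax[d], min(vmax[d], v))
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))  # stay inside the box
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        history.append(gbest_fit)  # one point of the best-fitness curve per iteration
    return gbest, gbest_fit, history
```

For example, with scikit-learn installed, a fitness of the form `lambda p: cross_val_score(SVC(kernel="rbf", C=p[0], gamma=p[1]), X, y, cv=3).mean()` would give the structure of PSO-SVM; the fold count there is an assumption. The returned `history` is the kind of best-fitness curve plotted in Fig. 9.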
In addition, to demonstrate the benefits of the developed PSO-SVM for chatter detection, three other methods, namely a standard support vector machine (SSVM), a k-fold cross-validation support vector machine (k-CV-SVM), and a genetic-algorithm-optimized SVM (GA-SVM), were selected for comparison. The SSVM parameters were set to C = 2 and g = 1 as a reference. The k-CV-SVM used threefold cross-validation with 2^−8 ≤ C ≤ 2^8 and 2^−8 ≤ g ≤ 2^8. For GA-SVM, the maximum number of generations and the population size were set to 100 and 20, respectively, and the crossover and mutation probabilities were set to 0.4 and 0.01, respectively; the search ranges of C and g were the same as for PSO. The classification accuracy of the k-CV method is shown in Fig. 10, and the prediction accuracy is listed in Table 7. Table 7 shows that the chatter identification accuracy of k-CV-SVM on the training data was 95%. The identification performance of k-CV-SVM was equal to that of SSVM, whereas its prediction accuracy on the testing data was lower than that of SSVM. This is because k-CV-SVM is a local search method and is therefore vulnerable to local optima [44,50]. The fitness curves of GA over the generations are shown in Fig. 11: the best fitness reached 95%, and the chatter identification accuracy on the testing data was 85%. Therefore, compared with SSVM, k-CV-SVM, and GA-SVM, PSO-SVM improved the accuracy of chatter identification on both the training and testing data. With the method proposed in this study, chatter in the milling process can be identified and predicted.
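The k-CV-SVM baseline above can be sketched as an exhaustive grid over C = 2^a and g = 2^b with threefold cross-validation. The unit step in the exponent, the tie-breaking rule, and the seeded shuffle are assumptions (the text gives only the range 2^−8 to 2^8), and `fit_predict` is a hypothetical callback standing in for training an RBF SVM with one (C, g) pair and predicting the held-out fold.

```python
import itertools
import random


def kfold_indices(n, k=3, seed=0):
    """Shuffle sample indices 0..n-1 and deal them into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def make_cv_score(fit_predict, X, y, k=3, seed=0):
    """Wrap a hypothetical fit_predict(C, g, X_train, y_train, X_val) -> predicted labels
    into cv_score(C, g): the fraction of samples classified correctly across the k folds."""
    folds = kfold_indices(len(X), k, seed)

    def cv_score(C, g):
        correct = 0
        for f, val in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != f for i in fold]
            pred = fit_predict(C, g, [X[i] for i in train], [y[i] for i in train],
                               [X[i] for i in val])
            correct += sum(p == y[i] for p, i in zip(pred, val))
        return correct / len(X)

    return cv_score


def grid_search(cv_score, lo_exp=-8, hi_exp=8, step=1):
    """Exhaustive search over C = 2**a and g = 2**b for a, b in [lo_exp, hi_exp].

    Returns (best_C, best_g, best_score). Ties keep the first pair visited
    (smaller C, then smaller g), an assumed convention.
    """
    exps = range(lo_exp, hi_exp + 1, step)
    best = None
    for a, b in itertools.product(exps, exps):
        C, g = 2.0 ** a, 2.0 ** b
        score = cv_score(C, g)
        if best is None or score > best[2]:
            best = (C, g, score)
    return best
```

With a unit step the grid has 17 × 17 = 289 candidate pairs, each requiring three model fits, which is why the grid resolution is usually kept coarse.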

Validation of effectiveness of chatter recognition based on WPT and PSO-SVM
To verify the effectiveness of chatter recognition and prediction based on WPT and PSO-SVM, vibration signals measured at an axial depth of cut of 0.6 mm, a spindle speed of 6000 r/min, and a feed rate of 0.2 mm/z were used to identify the chatter state. Figure 12 shows the time-domain waveform of the collected vibration signals. To identify the cutting state, every 1024 data points were taken as one identification sample, which gave 122 identification samples. Each sample was processed with WPT, and fourteen time-frequency features were extracted as feature indicators for chatter recognition. The PSO-optimized SVM prediction model from Sect. 4.3 was then used to detect the milling state; the identification results are marked with red asterisks in Fig. 12. As Fig. 12 shows, the amplitude of the collected vibration signal did not change dramatically before 3.891 s, whereas the optimized model identified the onset of chatter at 3.667 s, when the amplitude showed only a slight fluctuation. The optimized SVM model therefore identified the slight chatter state 3.891 s − 3.667 s = 0.224 s in advance. If chatter suppression measures were taken within this interval, the adverse effects of severe chatter on the workpiece and milling cutter could be prevented. The proposed method thus accurately identified the incipient chatter state during milling.

Fig. 9
The adaptive evolutionary curves of PSO
Fig. 10
The classification accuracy of the k-CV method
Fig. 11
The adaptive evolutionary curves of GA
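The windowing and onset-detection step described above can be sketched as follows. Non-overlapping windows are an assumption (the text says only that every 1024 data points formed one sample), the 12 kHz default sampling rate is inferred from the 375 Hz band width rather than stated, and `classify` is a hypothetical placeholder for WPT feature extraction followed by the trained PSO-SVM model. The onset is reported as the start time of the first chatter window; the text does not state which instant of a window its 3.667 s refers to.

```python
def segment(signal, window=1024):
    """Cut a record into consecutive, non-overlapping windows; a trailing partial window is dropped."""
    return [signal[i * window:(i + 1) * window] for i in range(len(signal) // window)]


def first_chatter_time(labels, window, fs):
    """Start time in seconds of the first window labelled chatter (1), or None if none is."""
    for i, label in enumerate(labels):
        if label == 1:
            return i * window / fs
    return None


def detect_onset(signal, classify, window=1024, fs=12_000.0):
    """Label every window with `classify` and report when chatter is first flagged.

    `classify` is a hypothetical stand-in for WPT feature extraction plus the trained
    PSO-SVM model; fs defaults to an assumed 12 kHz sampling rate.
    """
    labels = [classify(w) for w in segment(signal, window)]
    return labels, first_chatter_time(labels, window, fs)
```

The lead time is then the difference between the moment the raw amplitude visibly changes and the returned onset time, as in the 3.891 s − 3.667 s comparison above.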

Conclusion
In this paper, a multi-feature chatter detection approach combining WPT and PSO-SVM was proposed to improve the accuracy of chatter identification in milling. Chatter detection is essentially a problem of signal processing, feature extraction, and chatter identification. For signal processing, this study focuses on using WPT to select the frequency bands of the vibration signals that contain chatter information. The chatter-emerging frequency bands x4,7 and x4,8 were chosen and reconstructed, which effectively removed redundant noise and useless information and increased the signal-to-noise ratio of the vibration signals. Fourteen time-frequency features were then calculated from the reconstructed vibration signals. As multi-feature indicators of chatter recognition, these features reflect the changes in amplitude and distribution in the time and frequency domains when chatter occurs. In addition, k-CV, GA, and PSO were used to optimize the penalty parameter C and RBF parameter g of the SVM model. Compared with k-CV and GA, PSO-SVM clearly improved the accuracy of chatter recognition, as shown in Table 7. Moreover, the PSO-optimized SVM prediction model was applied to detect the state of milling. The recognition results indicated that the model may accurately predict the slight chatter state in advance. Chatter suppression will be investigated in future work; chatter may be suppressed by adjusting the depth of cut, spindle speed, and other processing parameters.
Funding This study was supported by the National Key Research and Development Program of China (Grant Number: 2017YFB1104600).
Data availability Data will be available upon reasonable request.

Declarations
Ethical approval Not applicable.

Consent to participate Not applicable.
Consent for publication All co-authors consent to the publication of this work.

Competing interests
The authors declare no competing interests.

Fig. 12
Chatter detection during the milling process based on the optimized SVM model