An effective LSSVM-based approach for milling tool wear prediction

In order to realize real-time and precise monitoring of tool wear in the milling process, this paper presents a tool wear predictive model based on the stacked multilayer denoising autoencoders (SMDAE) technique, the particle swarm optimization with an adaptive learning strategy (PSO-ALS), and the least squares support vector machine (LSSVM). Cutting force and vibration information are adopted as the monitoring signals. Three steps make up the unique feature extraction and fusion method: multi-domain feature extraction, principal component analysis (PCA)-based dimension reduction, and SMDAE-based dimension increment. As a novel feature representation learning approach, the SMDAE technique is utilized to fuse the PCA-based fusion features and enrich the effective information by increasing the dimension, thus helping improve the predictive performance of the proposed model. PSO-ALS is used to obtain the optimal parameters for LSSVM, simplifying the problem and increasing the population diversity. Twelve sets of milling experiments are conducted to demonstrate the reliable performance of the proposed model. The experimental results show that the presented model is superior to models such as PSO-LSSVM in predictive performance, and that the SMDAE technique effectively improves the prediction accuracy of the established model. The findings of this paper offer theoretical guidelines for monitoring milling tool wear in real industrial situations.


Introduction
Because of the friction and extrusion between the workpiece and the tool edge, tool wear is unavoidable in the machining process. Tool wear status is one of the most crucial variables in ensuring the manufacturing system's dependability and stability. Excessive wear of cutting tools causes a dramatic increase in cutting forces, vibrations, and even machine tool noise. Furthermore, tool wear impacts the surface roughness of parts throughout the machining process, affecting the service life and reliability of the final product. According to prior research, tool failure causes up to 20% downtime, resulting in heavy productivity and profitability losses [1]. Therefore, it is vital to build efficient monitoring systems that track tool wear precisely and in real time during the machining process.
The various tool wear monitoring systems proposed by scholars can be grouped into two main types according to the different monitoring principles: offline and online methods. The offline method refers to the acquisition of tool wear information by direct measurement methods, which have high accuracy of results but also have high requirements in the machining environment, where light conditions, cutting fluid, and chips can directly affect the measurement results [2]. In addition, due to the specificity of the offline method, measurements can only be made when the machining equipment is down, which affects productivity and does not allow real-time monitoring. Compared with the offline method, the online method is less likely to affect the regular production schedule. Researchers usually collect signals like cutting force, vibration, acoustic emission, spindle motor current, and power and then construct a mathematical model between tool wear and the monitored signals to implement tool wear monitoring indirectly [3]. Among the various signals employed in the online method, cutting force and vibration signals can provide effective information in terms of the cutting tools' status and offer several obvious advantages over other signals, as demonstrated in previous work [3,4].
Relevant research shows that tool wear conditions can be efficiently depicted with features in the time-frequency domain, time domain, and frequency domain [5,6]. Wang et al. [7] integrated the above three domain features extracted from the chosen signals to sense virtual tool wear. Kong et al. [8] extracted different domain features from cutting force signals to monitor the cutting tool wear. The above work has provided certain references for selecting universal features to characterize tool wear conditions. Artificial intelligence algorithms like artificial neural networks (ANN), support vector machines (SVM), and hidden Markov models (HMM) are often used to monitor tool wear. These algorithms can set up a nonlinear mapping between tool wear and multi-domain features. Yu et al. [9] presented a weighted HMM model for monitoring tool wear in continuous conditions and proved its superiority through experiments. Through training a deep belief network (DBN) with the selected time-domain features, Chen et al. [10] achieved a lower error than an ANN model regarding tool wear prediction. However, ANN-based and HMM-based models require large numbers of training samples to achieve superior predictive performance, which demands considerable time and wastes resources in actual industrial production environments. Besides, the selection of core parameters is overly dependent on experience and lacks a theoretical basis [4]. Based on statistical learning theory and structural risk minimization, LSSVM has better generalization ability with a small sample size and is especially suited to nonlinear function regression. Zhang and Zhang [11] proved the LSSVM-based model's superiority to the ANN-based model in prediction accuracy with the presented model for ball-end milling. Note that the kernel parameters of the LSSVM algorithm have a significant impact on the model's predictive accuracy.
Particle swarm optimization (PSO), a population-based stochastic optimization technique, has been widely adopted for its quick convergence speed and high efficiency. However, the population diversity is lost as a result of the learning strategy's primary concentration on the global best particle, which also increases the risk of falling into the local optimum. To increase the population diversity, different PSO algorithms with various learning approaches have been created [12,13]. Recently, on the basis of the adaptive learning strategy, a unique PSO algorithm was proposed to greatly increase diversity and strengthen the global search capability [14]. It has been theoretically verified that PSO-ALS performs well in terms of convergence speed and solution accuracy. This study adopts the PSO-ALS algorithm to obtain the optimal parameters of the LSSVM-based model for a practical problem, i.e., milling tool wear monitoring.
Dimension fusion techniques, including dimension reduction techniques (like principal component analysis (PCA)) and dimension increment techniques (like kernel locality preserving projection (KLPP)), have been used a lot for feature fusion [2,15]. These techniques can efficiently suppress and get rid of noise in the extracted signal features, which is crucial to increase the prediction accuracy of the decision-making system. Shi and Gindy [16] presented a new tool wear predictive model by combination of LSSVM and PCA. Kong et al. [8] proposed a unique dimension-increment strategy that can preserve and enhance reliable information concerning tool wear.
Moreover, the stacked denoising autoencoder (SDAE) [17] and other deep learning-related methodologies have been created and utilized in life prediction and fault diagnosis in the last few years. Wang et al. [18] proposed a novel fault diagnostic method based on SDAE and achieved better fault diagnosis performance in the case of small samples. To remove noise from the original signal features, a novel strategy called stacked multilayer denoising autoencoders (SMDAE) is proposed in our previous work [19]. SMDAE, which is achieved by stacking denoising autoencoder layers [19], has been utilized to fuse the extracted features to improve the predictive performance of multi-kernel GPR. However, a small amount of noise and redundancy may still exist in the fused features after dimension reduction or dimension increment, affecting the tool wear predictive model's performance. In this work, a new two-stage feature fusion combining PCA and SMDAE is developed to improve the prediction accuracy further.
This research presents a novel tool wear predictive model based on the SMDAE technique and the proposed PSO-ALS LSSVM. First, a large number of features related to tool wear are extracted from the original cutting force and vibration signals to reflect the change of tool wear comprehensively. However, there is plenty of redundancy and noise in the original signal features, which influences the constructed model's predictive performance. An effective feature fusion method is therefore proposed to solve this problem: it entails first reducing dimension with PCA and then increasing dimension with SMDAE. The validity of SMDAE in enhancing the constructed model's predictive performance is investigated in depth in this research.
The rest of this paper is organized as follows: "Sect. 2" briefly introduces the background theory of PCA, SMDAE, PSO-ALS, and LSSVM. The framework of the constructed model is detailed in "Sect. 3." Experiment setup and the effectiveness of the proposed model are discussed in "Sect. 4." Finally, the conclusions are summarized in "Sect. 5."

PCA
The goal of dimension reduction techniques is to identify fewer key variables by non-linear or linear transformations of multiple variables, so as to effectively extract the critical elements from the excessively rich data [20]. It helps to simplify the structure of the complex data, eliminate the redundancy, and decrease the dimensionality.
By maximizing variance, the dimension reduction method PCA provides the best representation of the extracted signal features [21]. Suppose x_i ∈ ℝⁿ represents the feature vector to be preprocessed. The original features are projected by PCA onto the dimensions that carry the most effective projection information, aiming to minimize the loss of information as much as possible. The reduced dimension l (l < n) can be determined in two ways: (a) by utilizing K-fold cross-validation to ensure the predictive performance of the decision-making system, and (b) by utilizing the threshold method, which is described as

(∑_{i=1}^{l} λ_i) / (∑_{i=1}^{n} λ_i) ≥ η  (1)

where λ_i are the eigenvalues of the covariance matrix, ordered from highest to lowest so that λ_1 ≥ λ_2 ≥ ⋅⋅⋅ ≥ λ_n. The threshold η is commonly referred to as the cumulative contribution rate [15].
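Under the threshold method of Eq. (1), the reduced dimension l can be chosen programmatically. The following is a minimal numpy sketch; the function name and the example threshold are illustrative, not taken from the paper:

```python
import numpy as np

def pca_dim_by_threshold(X, eta=0.95):
    """Pick the smallest l whose leading covariance eigenvalues reach
    the cumulative contribution rate eta, as in Eq. (1)."""
    Xc = X - X.mean(axis=0)                                       # center features
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending order
    ratio = np.cumsum(eigvals) / eigvals.sum()                    # cumulative contribution rate
    return int(np.searchsorted(ratio, eta) + 1)                   # first l with ratio >= eta
```

For data with a low intrinsic dimensionality, the returned l is correspondingly small even when the ambient feature dimension is large.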

SMDAE
The principles of SDAE and SMDAE are introduced in detail in our previous work [19]. By applying the method proposed by Vincent et al. [17], SDAE helps to compress and eliminate the redundancy that exists in the extracted signal features. The stochastic gradient algorithm [22] is applied during the parameter optimization procedure of SDAE.
Specifically, the most significant difference between SMDAE and SDAE is that the SDAE proposed by Vincent et al. mainly stacks two denoising autoencoder layers, whereas the SMDAE used in this study investigates the effect of stacking multiple layers (layers = 2, 3, and 4) of denoising autoencoders on the predictive performance of the constructed model, ultimately determining the optimal number of stacked layers and the dimension of the final output feature vector.
Although SDAE has been increasingly applied in the intelligent manufacturing industry, few studies specifically address its application to tool condition monitoring, particularly for fusing multi-dimension features collected from multiple sensors. Our research uses SMDAE to solve this practical problem.
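As a rough illustration of the stacking idea (not the authors' exact SMDAE architecture or training setup from [19]), the sketch below greedily trains tied-weight denoising autoencoder layers with Gaussian input corruption and feeds each layer's clean encoding to the next; layer sizes, the corruption level, and all training hyperparameters are illustrative assumptions:

```python
import numpy as np

def train_dae(X, hidden_dim, noise_std=0.1, lr=0.01, epochs=200, seed=0):
    """One greedy layer: corrupt the input with Gaussian noise, learn to
    reconstruct the clean input, return (W, b1, clean encoding)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden_dim))
    b1, b2 = np.zeros(hidden_dim), np.zeros(d)
    for _ in range(epochs):
        X_noisy = X + rng.normal(scale=noise_std, size=X.shape)  # corruption
        H = np.tanh(X_noisy @ W + b1)          # encoder
        X_hat = H @ W.T + b2                   # tied-weight linear decoder
        E = X_hat - X                          # error against the CLEAN input
        dA = (E @ W) * (1 - H**2)              # grad w.r.t. encoder pre-activation
        W -= lr * (X_noisy.T @ dA + E.T @ H) / n
        b1 -= lr * dA.mean(axis=0)
        b2 -= lr * E.mean(axis=0)
    return W, b1, np.tanh(X @ W + b1)          # clean encoding fed to the next layer

def smdae_fuse(X, layer_dims):
    """Stack multiple denoising autoencoder layers; the last layer's output
    is the fused (here dimension-incremented) feature representation."""
    H = X
    for dim in layer_dims:
        _, _, H = train_dae(H, dim)
    return H
```

With increasing layer dimensions (e.g., 6 → 12 → 24), the stack realizes a dimension increment rather than a compression, mirroring the fusion strategy described above.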

PSO-ALS
The particle swarm optimization (PSO), originally introduced by Kennedy and Eberhart in 1995 [23, 24], originates from the research on bird swarm predation. It is a population-based stochastic optimization algorithm. The population of particles, or candidate solutions, evolves with each iteration of the algorithm, coming closer to the optimum solution for the problem. Due to PSO's easy implementation and effectiveness in finding global solutions, it has been widely applied to solve problems such as task assignment [25], classification [26], and stochastic optimization [27].
Each particle i (i = 1, 2, ⋅⋅⋅, N) in the standard PSO algorithm offers a possible solution to the optimization problem. The ith particle is defined by its velocity vector v_i = (v_i^1, v_i^2, ⋅⋅⋅, v_i^D) and position vector x_i = (x_i^1, x_i^2, ⋅⋅⋅, x_i^D). During the evolution process, the particles, initially distributed in the D-dimensional search space with stochastic velocities, update their velocities and positions by the following equations:

v_i^d = v_i^d + c_1 rand_1^d (pBest_i^d − x_i^d) + c_2 rand_2^d (gBest^d − x_i^d)  (2)

x_i^d = x_i^d + v_i^d  (3)

where d = 1, 2, ⋅⋅⋅, D; the positive constants c_1 and c_2 denote the acceleration factors; rand_1^d and rand_2^d denote two random numbers uniformly distributed in the interval [0, 1]; and pBest_i and gBest are the previous optimal position found by the ith particle and the global optimal position found by the particle swarm, respectively. The above learning approach aids in fast convergence behavior. Nevertheless, the population diversity is lost since only search information from gBest is utilized to determine the search direction, increasing the risk of falling into the local optimum. To increase the population diversity, different PSO algorithms with various learning approaches have been created [28,29].
Recently, the multiswarm technique has attracted considerable attention. Zhang et al. [14] proposed a novel PSO algorithm based on the adaptive learning strategy. The proposed strategy adaptively divides the particles into several subswarms through an efficient clustering technique [30]. In each subswarm, the particles are then further divided into ordinary particles and the local best particle. The above particles are updated by different learning approaches. Instead of using the global best particle, the local best particle in each subswarm is utilized to update the ordinary particles in the same subswarm. The ordinary particles' learning approach is defined as follows:

v_i^d = ω v_i^d + c_1 rand_1^d (pBest_i^d − x_i^d) + c_2 rand_2^d (cgBest_c^d − x_i^d)  (4)

where ω is the inertia weight and cgBest_c is the local best particle in subswarm c. By exchanging information, the local best particle in each subswarm uses the local best particles' average information in all subswarms to update its position, thereby promoting the population diversity. The proposed learning approach for local best particles is

v_i^d = ω v_i^d + c_1 rand_1^d (pBest_i^d − x_i^d) + c_2 rand_2^d ((1/C) ∑_{c=1}^{C} cgBest_c^d − x_i^d)  (5)

where C denotes the number of subswarms. In Eq. (5), the average information of cgBest_c over all subswarms is adopted as guidance to update the local best particles.
In conclusion, a divide-and-conquer strategy is utilized to maintain the population diversity. After the swarm has been divided into subswarms, the complex overall PSO algorithm is decomposed into many basic PSO processes. This makes the issue easier to solve while boosting population diversity and strengthening global search ability.
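A single PSO-ALS iteration under the learning approaches of Eqs. (4) and (5) might be sketched as follows. The adaptive clustering of [30] is replaced here by fixed subswarm labels for illustration, and the function name and parameter values are assumptions, not the authors' implementation:

```python
import numpy as np

def pso_als_step(X, V, P, pbest_fit, fitness, labels,
                 w=0.7, c1=2.0, c2=2.0, rng=None):
    """One PSO-ALS iteration: ordinary particles learn from their subswarm's
    local best (cgBest_c, Eq. 4); each local best learns from the average of
    all local bests (Eq. 5), preserving population diversity."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    # index of the local best particle (lowest pbest fitness) in each subswarm
    cg_idx = {c: np.flatnonzero(labels == c)[np.argmin(pbest_fit[labels == c])]
              for c in np.unique(labels)}
    cg_mean = np.mean([P[i] for i in cg_idx.values()], axis=0)  # Eq. (5) guide
    for i in range(n):
        r1, r2 = rng.random(d), rng.random(d)
        guide = cg_mean if i in cg_idx.values() else P[cg_idx[labels[i]]]
        V[i] = w * V[i] + c1 * r1 * (P[i] - X[i]) + c2 * r2 * (guide - X[i])
        X[i] = X[i] + V[i]
        f = fitness(X[i])
        if f < pbest_fit[i]:            # update the personal best on improvement
            pbest_fit[i], P[i] = f, X[i].copy()
    return X, V, P, pbest_fit
```

Running the step repeatedly on a simple sphere function drives the best-found fitness downward, which is all this sketch is meant to demonstrate.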

LSSVM
Based on statistical learning theory, SVM is a commonly used supervised machine learning algorithm. For small sample sets, it is particularly beneficial for classification and prediction. In contrast to SVM, which requires solving a complex quadratic programming problem, LSSVM adopts equality constraints and finds the solution by solving a set of linear equations. The main principles of LSSVM are as follows.
Given a set of training samples {(x_i, y_i)}, i = 1, 2, ⋅⋅⋅, n, the regression model generated by utilizing a non-linear mapping function φ(⋅) can be expressed as

y(x) = ω^T φ(x) + b  (6)

where ω is the weight vector and b is the bias term. The LSSVM algorithm's objective optimization function is defined as

min J(ω, ξ) = (1/2) ω^T ω + (γ/2) ∑_{i=1}^{n} ξ_i²,  s.t.  y_i = ω^T φ(x_i) + b + ξ_i,  i = 1, 2, ⋅⋅⋅, n  (7)

where γ denotes the penalty coefficient, which determines the punishment degree for samples exceeding the error, and ξ_i are the error variables [31].
To solve the optimization problem, Lagrange multipliers α_i are introduced. The LSSVM's Lagrange function is constructed as

L(ω, b, ξ, α) = J(ω, ξ) − ∑_{i=1}^{n} α_i (ω^T φ(x_i) + b + ξ_i − y_i)  (8)

According to the KKT conditions, the resulting equality constraints are as follows:

∂L/∂ω = 0 → ω = ∑_{i=1}^{n} α_i φ(x_i),  ∂L/∂b = 0 → ∑_{i=1}^{n} α_i = 0,
∂L/∂ξ_i = 0 → α_i = γ ξ_i,  ∂L/∂α_i = 0 → ω^T φ(x_i) + b + ξ_i − y_i = 0  (9)

After elimination of ω and ξ_i, the system can be expressed as Eq. (10):

[ 0   1ᵀ       ] [ b ]   [ 0 ]
[ 1   Ω + I/γ  ] [ α ] = [ y ]  (10)

where y = (y_1, y_2, ⋅⋅⋅, y_n)^T, α = (α_1, α_2, ⋅⋅⋅, α_n)^T, 1 = (1, 1, ⋅⋅⋅, 1)^T, and Ω_{ij} = K(x_i, x_j) = φ(x_i)^T φ(x_j) is the non-linear kernel function. The polynomial kernel function, radial basis kernel function, and sigmoid kernel function are the frequently employed ones. To reduce the computational complexity and obtain strong generalization ability, this study selects the radial basis function as the most appropriate kernel. The expression of K(x_i, x_j) is given as follows:

K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))  (11)

where σ denotes the kernel parameter. Finally, α and b are obtained by solving Eq. (10), and the regression function of the LSSVM model can be expressed as

y(x) = ∑_{i=1}^{n} α_i K(x, x_i) + b  (12)
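Because Eq. (10) is a linear system, fitting LSSVM reduces to a single solve. A minimal numpy sketch with the RBF kernel of Eq. (11) follows; the function names and default parameter values are illustrative:

```python
import numpy as np

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM dual linear system of Eq. (10) for alpha and b."""
    n = len(y)
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)  # pairwise sq. distances
    K = np.exp(-sq / (2 * sigma**2))                         # RBF kernel matrix, Eq. (11)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                           # first row: 1^T
    A[1:, 0] = 1.0                                           # first column: 1
    A[1:, 1:] = K + np.eye(n) / gamma                        # Omega + I/gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]                                   # alpha, b

def lssvm_predict(Xtr, alpha, b, Xte, sigma=1.0):
    """Regression function of Eq. (12): sum_i alpha_i K(x, x_i) + b."""
    sq = np.sum((Xte[:, None, :] - Xtr[None, :, :])**2, axis=2)
    return np.exp(-sq / (2 * sigma**2)) @ alpha + b
```

With a sufficiently large penalty coefficient, the model closely reproduces a smooth training target, reflecting the near-interpolating behavior of the ridge term I/γ.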

Tool wear predictive model based on SMDAE and PSO-ALS LSSVM
The key components of the LSSVM-based tool wear predictive model are feature extraction and fusion, model construction, and performance evaluation. Because of their low signal-to-noise ratio (SNR), original signals cannot be input to the model directly. Feature extraction and fusion is an effective signal processing method that reduces, as much as possible, the model prediction errors caused by low SNR. Furthermore, to reduce the impact of cutting conditions on the model's predictive performance, cutting parameters and fusion features are combined to form feature vectors, which are then utilized as the model's input. The detailed construction process of the presented model based on SMDAE and PSO-ALS LSSVM is shown in Fig. 1.

Feature extraction and fusion
Cutting force and vibration signals are reliable carriers of milling process information. In this study, the cutting force and vibration signals in each direction are sampled at 5000 and 25,000 points per second, respectively. Because of the large number of sampling points, it is hard to establish a direct relationship between tool wear and the raw signals with the LSSVM-based model. As a result, to portray the variation in tool wear naturally, signal features must be extracted from the original signals within a specific timeframe. For feature extraction, a 1-s sample of the raw signals is taken before the termination of each cutting process. To depict the tool wear process as accurately as feasible, multi-domain features are extracted from this 1-s sample, as described in Table 1. Note that each type of feature extracted from the original vibration or cutting force signals contains three components; for the mean, for example, the cutting force feature includes mean Fx, mean Fy, and mean Fz. Ten types of time-domain features are extracted from the original cutting force signals: root mean square (Rms), variance (Var), mean, skewness (Ske), standard deviation (Std), kurtosis (Kur), peak to peak (PP), maximum (Max), force ratio, and form factor (Fmf). Rms represents a signal's average energy over a specified time period. Var measures the dispersion of the data about its expected value. Mean reflects the change trend of a signal. Ske reflects a signal's asymmetry around its mean value. Std measures how much a signal varies from the mean value. Kur depicts the peakedness of the signal's probability density distribution around the mean. PP describes the magnitude of data change within a given time interval. Max denotes the signal's maximum instantaneous amplitude over a specific time period.
Force ratio represents the ratio of the mean values of the component forces in different directions. Fmf represents the ratio of the Rms value to the mean value. Spectral power, spectral kurtosis, and spectral skewness are extracted from the frequency domain. Their specific definitions have been described in our previous study [19]. In the wavelet domain, "db5" wavelet packets adopting Shannon entropy are utilized to obtain wavelet packet energy. Altogether 135 signal features are obtained from the original cutting force signals.
For vibration signals, nine types of statistical features are extracted in the time domain: mean, Rms, Max, PP, Std, Ske, Kur, Fmf, and vibration ratio. Altogether 27 (3 × 9) signal features are obtained from the original vibration signals.
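The time-domain statistics above can be sketched for one signal channel as follows. To keep the form factor well-defined for near-zero-mean vibration signals, this sketch divides Rms by the mean of the absolute signal, which is an assumption of the sketch rather than the paper's exact definition:

```python
import numpy as np

def time_domain_features(sig):
    """Time-domain statistics of one signal channel (cf. Table 1)."""
    mean = sig.mean()
    std = sig.std()
    rms = np.sqrt(np.mean(sig**2))
    return {
        "mean": mean,
        "rms": rms,
        "var": sig.var(),
        "std": std,
        "ske": np.mean(((sig - mean) / std)**3),  # skewness
        "kur": np.mean(((sig - mean) / std)**4),  # kurtosis
        "pp": sig.max() - sig.min(),              # peak to peak
        "max": sig.max(),
        "fmf": rms / np.mean(np.abs(sig)),        # form factor (|signal| mean assumed)
    }
```

Applying the function to a unit-amplitude sine wave recovers the familiar values (Rms ≈ 1/√2, PP ≈ 2), a quick sanity check on the definitions.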
Note that the elements of the feature set have different orders of magnitude. The above-mentioned features need to be standardized before feature fusion to eliminate the possibility that a large order of magnitude weakens the effect of a small one:

x* = (x − x̄) / σ_x  (13)

where x̄ is the mean value and σ_x is the standard deviation.
Considering that it is difficult to determine the validity of each signal feature, PCA is first used to fuse the extracted signal features to obtain the most effective principal components. This helps remove as much redundancy and noise as possible from the signal features. The prediction accuracy of the decision-making system (PSO-ALS LSSVM) is used to determine the dimension m of PCA's fused features. Then, the PCA-based fusion features are fused using SMDAE, i.e., the effective information is enriched through dimension increment. This improves the proposed model's predictive performance in two aspects: reducing the computational complexity and enhancing the prediction accuracy.
To eliminate the effect of cutting conditions on the LSSVM-based tool wear predictive model's predictive performance, this study combines the four cutting parameters listed in Table 4 with the PCA or SMDAE-based fused features to generate the new feature vectors, which are utilized as the input of the model.

Model construction
To monitor tool wear in the milling process precisely and in real time, this work proposes a novel feature fusion technique that entails first reducing dimension with PCA and then increasing dimension with SMDAE, and combines it with the PSO-ALS LSSVM algorithm. In this paper, time-domain, frequency-domain, and wavelet-domain features are extracted from the original signals. Then, PCA is used to fuse the above signal features into the most effective principal components. Next, to enrich the effective information of the PCA-based fusion features, SMDAE is adopted to increase the features' dimension. Finally, feature vectors, composed of the corresponding fused features (or the original signal features) and the cutting parameters, are input to the LSSVM-based model. In this study, the constructed model establishes the non-linear mapping relationship between the feature vectors and the flank wear width (i.e., the target value). On the training dataset, the PSO-ALS algorithm is used to acquire the optimum parameters for LSSVM. The optimized LSSVM-based model is then evaluated for predictive accuracy on the test dataset. Figure 2 shows the flowchart for optimizing LSSVM with the PSO-ALS algorithm.
The concrete steps can be defined as follows:
Step 1: Normalize the obtained data samples and divide them into non-overlapping training and test datasets. Then initialize the parameters of the PSO-ALS algorithm. The swarm is divided into several subswarms through an efficient clustering technique [30]. According to the characteristics of the particle distribution, the size of each subswarm is adjusted adaptively.
Step 2: The penalty coefficient γ and the kernel parameter σ of LSSVM are adopted as each particle's two-dimensional coordinates. The root mean squared error (RMSE) is taken as the fitness function of the PSO-ALS algorithm, and each particle's fitness value is calculated according to Eq. (14):

RMSE = √((1/N) ∑_{i=1}^{N} (y_i − f_i)²)  (14)

where y_i is the real measured wear value, f_i is the predicted value, and N is the total number of test points. Sevenfold cross-validation is used when calculating the fitness of particles to prevent the model from overfitting. Note that the calculation only involves the training dataset. The steps of K-fold cross-validation are illustrated in Fig. 4.
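The fitness calculation of step 2 can be sketched as follows; the model-fitting callback `train_eval` is a placeholder for the PSO-ALS LSSVM training routine, not part of the paper:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Eq. (14): root mean squared error used as the PSO-ALS fitness."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred)**2)))

def kfold_fitness(X, y, train_eval, K=7, seed=0):
    """Average RMSE over K folds of the training dataset only.
    `train_eval(Xtr, ytr, Xte)` fits a model and returns predictions on Xte."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    scores = []
    for k in range(K):
        te = folds[k]
        tr = np.concatenate([folds[j] for j in range(K) if j != k])
        scores.append(rmse(y[te], train_eval(X[tr], y[tr], X[te])))
    return float(np.mean(scores))
```

In the optimization loop, each particle's (γ, σ) pair is passed to `train_eval` via a closure and the averaged RMSE serves as that particle's fitness.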
Step 3: By comparing fitness values, get the ordinary particles' optimal positions and the local best particle's optimal position in each subswarm.
Step 4: Iteratively update the positions of the ordinary particles and the local best particles in all subswarms according to Eqs. (4) and (5), respectively, thereby producing a new generation of the population.
Step 5: Recalculate each particle's fitness value in the new population and update all the subswarms according to step 4; the global best particle's position is obtained by comparing all of the local best particles' fitness values.
Step 6: The position of the global best particle and its fitness value are saved when the iteration reaches its maximum number. If the termination criterion is not met, go back to step 2 and continue to iterate.
Step 7: The final optimized parameters, i.e., the saved positions in step 6, are substituted into the LSSVM algorithm to obtain the optimal tool wear predictive model. Finally, the model based on PSO-ALS LSSVM is established.

Performance evaluation
To reveal the advantages of the proposed model, the test dataset generated from "Sect. 4.1" is utilized to measure the prediction accuracy of the LSSVM-based model. As shown in Table 2, three evaluation indicators are investigated. In Table 2, y_i is the tool wear value actually measured by the microscope, and ȳ is its mean value. For each feature vector in the test dataset, f_i is the tool wear value predicted by the constructed model, and f̄ is its mean value. Note that a larger PCC value indicates better predictive performance, whereas for MAE and RMSE a smaller value does.

Experiment setup

Figure 3 depicts the experimental setup for milling tool wear. The experiments are performed on a three-axis vertical machining center (DAEWOO ACE-V500). The workpiece, made of titanium alloy (Ti-6Al-4V), measures 170 × 85 × 85 mm³. The indexable insert (Walter SPM-T1204AEN-WSP45) is adopted as the cutting tool, and the cutter head (Walter F2233.B.080.Z06.07) has three inserts mounted symmetrically on it. During the milling process, the cutting force signals in three directions are collected by the cutting force acquisition system, which is composed of a data acquisition card, a charge amplifier, and a dynamometer. Besides, one ceramic shear triaxial IEPE accelerometer (Kistler 8763b050bb) is used to acquire the fixture's orthogonal vibration signals in three directions. The multifunctional DAQ (NI USB-6211) transmits the vibration signals, and the data analysis platform (NI LabVIEW) finally saves the signals to a computer. Table 3 lists the experimental instruments utilized in this study and their detailed information. During the milling experiment, the sampling rates of the cutting force and vibration signals are set to 5 kHz and 25 kHz, respectively.

Fig. 3 Experimental setup for milling tool wear

To ensure the tool wear trend is normal, trial cuts are required to confirm the range of machining parameters. This also contributes to reducing vibration and verifying the feasibility of the selected parameters. In this work, the machining parameters of the experiments, as listed in Table 4, are chosen from the insert's recommended range and determined with trial cuts. The experiments are repeated three times with the same machining parameters to achieve the most reliable sample data. Altogether 320 data samples are gathered from the 12 milling experiment sets introduced in Table 4. The training and test datasets are divided from these samples without overlap. The training dataset comprises two complete milling tool wear experiments for each set of cutting parameters, totaling 213 samples of feature vectors. The test dataset contains the remaining experiments, totaling 107 samples of feature vectors.
In the multi-tooth milling process, each tooth is independent of the others. After each cutting procedure, the HD CCD video microscope (BD-100) is utilized to measure the flank wear width VB. The three inserts' average value VB is utilized to measure the degree of tool wear. In each set of cutting experiments, the new insert is adopted to work until VB > 0.35 mm is reached.

Feature extraction and selection
In "Sect. 3.1," 162-dimension original feature vectors are obtained to portray the variation in tool wear. However, the extracted signal features contain plenty of redundancy and noise, which affects the prediction of tool wear.
The extracted signal features are fused by PCA to reduce dimension in order to eliminate as much redundancy and noise as possible. The first m PCA-based fusion features and the four cutting parameters shown in Table 4 are combined to form the new (m + 4)-dimension feature vectors, which further compose the training dataset. The LSSVM-based tool wear predictive model, also known as the decision-making system, is constructed by the PSO-ALS LSSVM algorithm.
Furthermore, K-fold cross-validation and PSO-ALS LSSVM are utilized together to obtain an optimal dimension m of the PCA-based fusion features. Figure 4 illustrates the steps of the K-fold cross-validation; the only dataset involved in it is the training dataset. The dataset is first randomly divided into K equal-size subsets. Secondly, the ith (i = 1, 2, ⋅⋅⋅, K) subset is adopted as the "test set," while the other K − 1 subsets are adopted as the "training set." Thirdly, the LSSVM-based model trained with the "training set" is evaluated on the "test set" regarding its predictive performance; RMSE between the true values and the predicted results of the presented LSSVM-based model is chosen as the evaluation indicator. Finally, there are K rounds of calculation (model construction and evaluation), and the fitness value is calculated as the average of the K evaluation results.
From m = 1 to m = 162, the effectiveness of the features fused by PCA for the milling dataset is analyzed. Figure 5 illustrates the evaluation results of six-, seven-, and eightfold cross-validation under different values of m. The intersection of the lines in Fig. 5 indicates the minimum fitness value under six-, seven-, and eightfold cross-validation, respectively. It can be concluded that when m = 78 is chosen, the features fused by PCA contain the most useful information. Therefore, m = 78 is chosen as the optimal dimension for PCA's fused features.

Model establishment and assessment by using the original signal features
In this work, PSO-LSSVM, linear SVR [32], K-NN [33], and decision tree [34] are also adopted to achieve tool wear prediction. The feature vectors, formed of the four cutting parameters and the 162 extracted original features, are directly input to the five models (PSO-ALS LSSVM and the above four comparison models) to demonstrate the superior prediction accuracy of PSO-ALS LSSVM. According to the analysis in "Sect. 2," the two most important parameters that affect LSSVM's predictive accuracy are γ and σ. As described in "Sect. 3," the PSO-ALS algorithm is utilized to optimize γ and σ within LSSVM. The initial parameter settings for PSO-ALS are as follows: the value of ω declines linearly from 0.9 to 0.4, and the values of c1 and c2 are both 2.0. The population size is 20, the same as that in standard PSO, and the maximum number of iterations is 500. Besides, the range of values for the two-dimensional coordinates of each particle, i.e., γ and σ, is from 1e−9 to 2e3.
Based on the procedures for optimizing LSSVM with the PSO-ALS algorithm, as described in "Sect. 3.2," the optimal parameters γ and σ are 694.992351 and 6.82158996e−4, respectively. The initial parameter configuration of the PSO-LSSVM model is defined as follows: the inertia weight ω = 0.9, and other conditions are the same as in the PSO-ALS LSSVM model. Its final parameter optimization result, determined by utilizing sevenfold cross-validation, is γ = 2000 and σ = 1.37621515e−4.
Predicted results of the five mentioned models on the test dataset are shown in Fig. 6. The actual tool flank wear width is represented by the bold red line in each subfigure. The error is given by the absolute difference between the actual and predicted tool wear values. It can be found that PSO-ALS LSSVM, PSO-LSSVM, and decision tree can efficiently follow the trend of the actual tool wear, whereas K-NN and linear SVR perform less well. The five predictive models' performance under different evaluation indicators is compared in Fig. 7, where each evaluation indicator's value is annotated at the top of the corresponding bar.
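The three evaluation indicators used in the comparison, MAE, RMSE, and the Pearson correlation coefficient (PCC), can be computed as follows; the wear values in this example are illustrative only, not taken from the experiments.

```python
# MAE, RMSE, and PCC between actual and predicted flank wear values.
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def pcc(y, yhat):
    return np.corrcoef(y, yhat)[0, 1]

y_true = np.array([0.10, 0.15, 0.20, 0.30])   # illustrative actual VB (mm)
y_pred = np.array([0.11, 0.14, 0.22, 0.29])   # illustrative predictions
print(mae(y_true, y_pred), rmse(y_true, y_pred), pcc(y_true, y_pred))
```

Lower MAE/RMSE and higher PCC indicate better predictive performance, which is the convention followed in the figures and tables below.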

Model establishment and assessment based on the fused features of PCA
As pointed out above, the original signal features contain considerable redundancy and noise. As shown in Fig. 6, the presented model's predicted results are not entirely satisfactory because some test points deviate from the actual tool wear trend, for example, the points around VB = 0.15 and 0.3 mm.
In this section's discussion, the first 78 PCA-based fusion features and the four cutting parameters shown in Table 4 constitute the feature vectors, which form the training and test datasets.

Model establishment and assessment by utilizing the SMDAE-based fusion features
SMDAE is adopted to fuse the 78-dimensional PCA-based fusion features to improve the model's predictive performance. In this section, the new feature vectors, made up of the corresponding SMDAE-based fusion features and the machining parameters listed in Table 4, constitute the training and test datasets. The established LSSVM-based model's predictive performance is evaluated on the test dataset. SMDAE with different numbers of layers is trained locally to denoise the inputs while realizing feature fusion [19]. This paper analyzes the effectiveness of SMDAE with layers = 2, 3, and 4. The range for the dimension of each layer is set to [35, 170] to weaken the negative effect of random noise. The effect of SMDAE with different layers and the corresponding dimensions on the prediction accuracy of the constructed model is discussed in detail. The selectable region of the dimension of each layer in SMDAE is determined by the evaluation indicator values of the established model, which are acquired using the fused features of SMDAE.
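The greedy layer-wise training described above can be sketched as follows. This is a minimal NumPy version on random data, assuming Gaussian input corruption, a tanh encoder with a linear decoder, and plain gradient descent; the layer dimensions [60, 70, 150] follow the configuration selected later in this section, while everything else is illustrative.

```python
# Sketch of greedy layer-wise training for a stacked denoising autoencoder
# (SMDAE): each layer corrupts its input, learns to reconstruct the clean
# input, and passes its encoding to the next layer.
import numpy as np

rng = np.random.default_rng(0)

def train_dae(H, dim_out, noise=0.1, lr=0.01, epochs=50):
    """One denoising layer: corrupt -> encode (tanh) -> linear decode."""
    n, d = H.shape
    W = rng.normal(scale=0.1, size=(d, dim_out)); b = np.zeros(dim_out)
    V = rng.normal(scale=0.1, size=(dim_out, d)); c = np.zeros(d)
    for _ in range(epochs):
        Hn = H + noise * rng.normal(size=H.shape)   # corrupted input
        Z = np.tanh(Hn @ W + b)                     # encoding
        R = Z @ V + c                               # reconstruction of clean H
        E = R - H                                   # reconstruction error
        gV = Z.T @ E / n; gc = E.mean(0)            # decoder gradients
        gZ = (E @ V.T) * (1 - Z ** 2)               # backprop through tanh
        gW = Hn.T @ gZ / n; gb = gZ.mean(0)         # encoder gradients
        W -= lr * gW; b -= lr * gb; V -= lr * gV; c -= lr * gc
    return lambda Xq: np.tanh(Xq @ W + b)

X = rng.normal(size=(213, 78))        # stand-in for the 78 PCA fusion features
encoders, H = [], X
for dim in (60, 70, 150):             # layer dimensions selected in the paper
    enc = train_dae(H, dim)
    encoders.append(enc)
    H = enc(H)                        # output of one layer feeds the next
print(H.shape)                        # SMDAE-based fusion features
```

Note that, unlike a conventional bottleneck autoencoder, the final layer here increases the dimension (78 → 150), matching the paper's dimension-increment strategy.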
Note that the first denoising layer's dimension, which serves as the foundation for optimizing the successive denoising layers, is essential to the follow-up optimization process. First, the case of layers = 2 is discussed in detail. Figure 9 shows the performance assessment of the constructed model using the SMDAE-based fusion features with different first-layer dimensions. The range for the dimension of the first layer is set to [35, 170]. The second layer's dimension is set to 50, 70, 90, 110, 130, 150, and 170, respectively. In each subfigure, the threshold acquired from the constructed model using the PCA-based fusion features is represented by the blue dashed line. As the dimension of the first layer increases from 35 to 100, the value of MAE/RMSE, in terms of the overall trend, first decreases gradually and then increases, while PCC behaves oppositely. It can be found that when the dimension of the first layer varies from 50 to 80, all three evaluation indicators are improved. Besides, when the first layer's dimension is set to 60, the model approximately reaches its best prediction accuracy. Therefore, considering the stability of SMDAE, the dimension of the first layer is chosen to be 60, near the midpoint of the selectable region, as the foundation for further optimization. Table 5 shows the performance assessment of the model constructed with the fused features of SMDAE when layers = 2. The second layer's dimension is also set within the range [35, 170]. It can be seen that the performance of the model largely depends on the dimension of the second layer and that the selectable region is [50, 170]. At layers = 2 and dimension = [60, 70], MAE decreases by 7.78%, RMSE decreases by 9.24%, and PCC increases by 0.38%. Thus, it can be concluded that SMDAE is effective in improving the constructed model's prediction accuracy.
The performance assessment of the model constructed with the fused features of SMDAE when layers = 3 is illustrated in Fig. 10, where the random error of the constructed model caused by the feature fusion and denoising processes is represented by the red error bars (4%). Besides, it is obvious that the selectable region in Fig. 10 is slightly narrower than that in Table 5. Table 6 illustrates the variation (%) of the three indicator values relative to the corresponding thresholds as the dimension of the third layer increases when layers = 3. It can be found that at layers = 3 and dimension = [60, 70, 150], MAE decreases by 7.79%, RMSE decreases by 9.81%, and PCC increases by 0.40%. Compared with the case of layers = 2 and dimension = [60, 70], there is a slight improvement in the predictive performance of the model.
The performance assessment of the model constructed with the fused features of SMDAE when layers = 4 is shown in Table 7. The fourth layer's dimension is also set within the range [35, 170]. Compared with Table 5 and Fig. 10, it can be found that the optimization effect is unstable, the selectable region becomes very small, and the evaluation indicator values vary widely. Besides, the constructed model's prediction accuracy is slightly lower than in the case of layers = 3. Therefore, at layers = 3 and dimension = [60, 70, 150], the constructed model achieves the highest prediction accuracy with strong stability of the evaluation indicators.
The predicted results of the constructed model using the fused features of SMDAE (layers = 3, dimension = [60, 70, 150]) are shown in Fig. 11. As seen in Fig. 11, the test points near VB = 0.15 and 0.3 mm deviate very little from the actual wear trend compared with Fig. 8, which indicates that the denoising property of SMDAE contributes to improving the predictive accuracy of the constructed model. Furthermore, although some predicted values do not fully match the actual tool wear values, the gaps between them are quite small. The effectiveness of SMDAE in enhancing the established model's predictive performance is thus intuitively demonstrated.
In summary, it can be concluded that, with the support of the SMDAE technique and the presented PSO-ALS LSSVM algorithm, the tool wear in the milling process can be tracked more precisely.

Conclusions
Based on the SMDAE technique and the proposed PSO-ALS LSSVM, this study presents a novel tool wear predictive model. The main conclusions are as follows:
1. By comparison across various evaluation indicators, the presented model outperforms models such as PSO-LSSVM in predicting milling tool wear.
2. An effective feature fusion method for monitoring milling tool wear is found: first reducing dimensions with PCA and then increasing dimensions with SMDAE.
3. As a new denoising technique, SMDAE effectively enhances the established model's predictive performance.
4. At layers = 3 and dimension = [60, 70, 150], the constructed model achieves the highest prediction accuracy with strong stability of the evaluation indicators.
Twelve sets of milling experiments are carried out to validate the effectiveness of the proposed model. The findings of this paper offer theoretical guidance for monitoring milling tool wear in real industrial situations.