Tool wear predicting based on weighted multi-kernel relevance vector machine and probabilistic kernel principal component analysis

This paper proposes a novel tool wear prediction method based on a weighted multi-kernel relevance vector machine (WMKRVM) and integrated radial basis function-based probabilistic kernel principal component analysis (PKPCA_IRBF). The proposed WMKRVM model is constructed from optimized standard single-kernel RVMs and their weight parameters. As a new dimension-increment technique, PKPCA_IRBF can extract the noise information in the cutting force signal features and incorporate it into the model. Moreover, PKPCA_IRBF is proposed here for the first time to fuse cutting force signal features so as to improve the confidence interval provided by the WMKRVM model. Compared with the traditional PKPCA_RBF method, PKPCA_IRBF admits a broader range of kernel parameter values and yields higher model accuracy. Cutting experiments are carried out to validate the effectiveness of the proposed tool wear prediction technique. Experimental results show that the proposed technique accurately monitors the tool wear width with strong robustness under various cutting conditions, laying a foundation for industrial application.


Introduction
Workpiece quality and the economy of a manufacturing system are highly susceptible to tool wear. Thus, tool condition monitoring (TCM) and prediction are indispensable in an intelligent manufacturing system. To guarantee the stability of such a system, an efficient monitoring system that evaluates the tool wear status accurately and in real time is highly desirable.
Many remarkable achievements in tool condition monitoring and prediction have emerged with recent advances in sensor technology and machine learning. Several reviews [1][2][3][4] survey the methods used for TCM, including sensor technologies, signal processing, and decision-making systems for process monitoring. By extensively reviewing papers in different areas, these researchers summarized the development trends of TCM and gave suggestions for its future development, thus promoting the cross-fertilization of concepts and techniques within TCM research. Recently, artificial intelligence (AI) techniques such as hidden Markov models, long short-term memory (LSTM) networks, and artificial neural networks have been applied with considerable success. Liu et al. [5] propose a tool wear monitoring approach based on a novel switching hidden semi-Markov model that represents the equipment's degradation process. Han et al. [6] study the sticky hierarchical Dirichlet process hidden Markov model. Huang et al. [7] combine bidirectional LSTM networks with a particle filter to exploit the advantages of both and mitigate their limitations. Sun et al. [8] combine an LSTM network with a residual convolutional neural network to predict multiple flank wear values. Xu et al. [9] propose an intelligent model, an adaptive neuro-fuzzy inference system, to estimate tool wear. Shi et al. [10] present a deep learning data-driven modeling framework for TCM in an ultraprecision machine. Wang et al. [11] present a multiscale principal component analysis method for online tool wear monitoring in milling. Huang et al. [12] propose a tool wear prediction method based on multi-domain feature fusion with a deep convolutional neural network, in which time-domain, frequency-domain, and time-frequency-domain features are extracted from multisensory signals as health indicators of the tool wear condition.
Shen et al. [13] develop a prediction model using advanced machine learning methods with a multi-feature multi-model ensemble and a dynamic smoothing scheme. However, these methods focus on improving accuracy on the test dataset and provide no uncertainty estimate or probabilistic description of the prediction results.
Relevance vector machine (RVM) is a sparse probabilistic model based on kernel functions that was originally derived and tested on binary classification [14]. Researchers have successfully applied RVM in many engineering fields because it requires relatively few training samples and is robust to noisy samples [15,16]. However, a traditional single-kernel RVM has difficulty capturing the structure of nonhomogeneous data from multiple data sources or of different data types. Therefore, researchers combine multiple types of kernel functions into multi-kernel models to exploit the advantages of each. Chen et al. [17] presented an early fault diagnosis method for rolling bearings based on a multi-kernel RVM, and their experimental results show that it achieves higher prediction accuracy than the traditional single-kernel RVM.
Inspired by the multi-kernel model, this paper proposes a new tool wear prediction model based on the weighted multi-kernel relevance vector machine (WMKRVM). In this work, the weight parameters and the hyperparameters of each kernel function are optimized by the sparrow search algorithm (SSA) [18]. Moreover, this work proposes integrated radial basis function-based probabilistic kernel principal component analysis (PKPCA_IRBF) to improve the confidence interval (CI) of the WMKRVM model.
Dimension-fusion techniques, including kernel principal component analysis (KPCA) and isometric feature mapping, have been widely used for feature selection and fusion [19]. Feature fusion can effectively compress the data and remove noise and redundancy, which is vital for improving the precision and reliability of the decision-making system. In this research, the probabilistic kernel principal component analysis (PKPCA) technique [20] is used to fuse the extracted features to improve the WMKRVM model's performance. Kim et al. [21] proposed a multivariate process monitoring method based on probabilistic principal component analysis. Neil [22] proposes a dual probabilistic principal component analysis (DPPCA) model, whose advantage is that the linear mappings from the embedded space can easily be made nonlinear through Gaussian processes. The basic idea of PKPCA is to quantify the randomness of process variables contaminated by random noise. Typically, the kernel function determines the performance of PKPCA. Since it is hard to determine how much effective and noisy information the radial basis function (RBF) preserves, this research adopts the integrated radial basis function (IRBF) [23] as the kernel function of PKPCA for feature fusion. The proposed PKPCA_IRBF technique can enrich the noise information while preserving effective tool wear information. Experimental results show that PKPCA_IRBF helps improve the confidence interval of the WMKRVM model.
This paper is organized as follows. The first section introduces the experimental setup and data collection. The second section presents the theory of PKPCA_IRBF, WMKRVM, and SSA. The third section describes the proposed model and analyzes the experimental results. Finally, conclusions are given in the fourth section.

Experimental setup
Cutting experiments are conducted on a CNC lathe (PRE-CION CKD6150H) to validate the effectiveness of the proposed tool wear prediction model. The experimental setup, shown in Fig. 1, consists of the CNC lathe, a cutting force monitoring system, the workpiece material, an HD CCD video microscope, and the cutting tool. The cutting force monitoring system is composed of a dynamometer, a charge amplifier, a data acquisition card, and a computer running DynoWare.
DynoWare (Kistler 2825D), a general-purpose data acquisition and analysis software, collects the cutting force signal in real time during machining at a sampling rate of 20 kHz and displays it on the computer. The workpiece material is 50# normalized steel (HB 160 to 197) with a cutting diameter of 150 to 80 mm and a cutting length of 140 mm. A cemented carbide indexable insert (Sandvik CNMG120408-PM) is selected as the cutting tool and mounted on the dynamometer via a tool holder (Sandvik PCLNR 2525 M 12). The tool wear width VB at half of the cutting depth, measured and recorded with the HD CCD video microscope, quantifies the degree of tool wear. The service life of the cutting tool is the time from a new tool to tool breakage (VB > 0.5 mm). The laboratory instruments used in this research are listed in Table 1. The experiments are carried out under different machining conditions with varying feed rates and cutting depths, as shown in Table 2.
The three orthogonal cutting force components contain rich information about the change of tool wear width during machining. The time-domain, frequency-domain, and wavelet-domain features listed in Table 3 are extracted from 5 s segments of the original orthogonal cutting force signals as features reflecting tool wear. The root mean square (RMS) reflects the effective value of the signal in a given time interval. The variance measures the dispersion of the cutting force signal, that is, how far the signal deviates from its expected value. The maximum value (Max) is the maximum instantaneous amplitude of the signal within a specific time range. Skewness measures the direction and degree of skew of the probability density curve of the data, that is, the degree of asymmetry of the data about the mean. Peak to peak (PP) is the difference between the highest and lowest values of the data in a period and describes the range of data change. Kurtosis describes how sharply the probability density curve of the signal peaks around the mean.
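The time-domain statistics described above can be computed per 5 s segment as in the following minimal NumPy sketch. It assumes population (biased) moments, Pearson kurtosis (a normal distribution gives 3), and Max taken as the largest absolute amplitude; the function name and the synthetic test signal are illustrative only and do not reproduce Table 3 exactly.

```python
import numpy as np

def time_domain_features(x):
    """Time-domain statistics of one cutting force segment (1-D array)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    var = np.mean(centered ** 2)                      # population variance
    std = np.sqrt(var)
    return {
        "rms": np.sqrt(np.mean(x ** 2)),              # effective value
        "variance": var,
        "max": np.max(np.abs(x)),                     # largest instantaneous amplitude
        "skewness": np.mean(centered ** 3) / std ** 3,
        "peak_to_peak": x.max() - x.min(),
        "kurtosis": np.mean(centered ** 4) / var ** 2,  # Pearson kurtosis, normal = 3
    }

# Synthetic 5 s segment at the 20 kHz sampling rate given in the text
fs = 20_000
t = np.arange(5 * fs) / fs
rng = np.random.default_rng(0)
fx = 300.0 + 20.0 * np.sin(2 * np.pi * 50 * t) + rng.normal(0.0, 5.0, t.size)
feats = time_domain_features(fx)
```

The same function would be applied to each of the three force components, and the resulting dictionaries concatenated into one feature vector.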
The kurtosis visually reflects the peak sharpness of the curve. The higher the kurtosis, the heavier the tails of the curve, because the signal contains a number of values that are significantly larger or smaller than the average; more of the variance then comes from these infrequent extreme deviations.
Wavelet packet decomposition decomposes the original signal into multiple sub-bands. Unlike the ordinary wavelet transform, which further decomposes only the approximation sub-band, it also decomposes the detail sub-bands. Finally, the optimal decomposition path is determined by minimizing a cost function. The packet decomposition uses the Daubechies wavelet basis. As a sparse basis, the Daubechies wavelet introduces a smoothing error into the signal; it has good regularity, so the signal reconstruction is smooth, and its order N controls the localization ability in the frequency domain, which gives high flexibility.
This paper takes the tool wear width VB of the cutting tool as the prediction target. As shown in Table 2, data files are collected and recorded in twelve cutting tests, with one data file per cutting experiment. In addition, the features need to be fused to attenuate noise and reduce its negative impact. This work combines the extracted original signal features or the fused features with the corresponding processing parameters to form the feature vectors fed into the proposed model. To avoid errors caused by individual experiments, the feature vectors from all tests are pooled and randomly divided into disjoint training and test datasets at a ratio of 9:1.
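A fixed-depth wavelet packet decomposition can be sketched as below. For brevity it assumes the Daubechies wavelet of order 1 (db1, i.e., the Haar wavelet), splits every node down to a fixed level instead of searching for the best basis by cost minimization, and uses node energies as wavelet-domain features; these are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def haar_split(x):
    """One db1 (Haar) analysis step: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:                          # pad odd lengths by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def wavelet_packet_energies(x, level=3):
    """Energy of every terminal node of a full wavelet packet tree.

    Both the approximation and the detail sub-band are split again at each
    level, giving 2**level nodes (in natural tree order, not frequency order).
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        children = []
        for node in nodes:
            children.extend(haar_split(node))
        nodes = children
    return np.array([np.sum(node ** 2) for node in nodes])

rng = np.random.default_rng(1)
segment = rng.normal(size=1024)
energies = wavelet_packet_energies(segment, level=3)
relative = energies / energies.sum()        # normalized band energies as features
```

Because the Haar filters are orthonormal, the node energies sum to the signal energy whenever the length is divisible by 2**level, which makes the relative energies a convenient scale-free feature.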
The extracted signal features are normalized before feature fusion:
$$\tilde{x} = \frac{x - \mu_x}{\sigma_x}$$
where $\mu_x$ is the mean value and $\sigma_x$ is the standard deviation of the corresponding cutting force feature x.
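The standardization can be written as below. Computing the mean and standard deviation on the training data and reusing them for the test data is a common practice assumed here, not something stated in the text; the function names are illustrative.

```python
import numpy as np

def zscore_fit(X):
    """Column means and sample standard deviations (ddof=1) of the training features."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    return X.mean(axis=0), np.where(std == 0, 1.0, std)   # guard constant columns

def zscore_apply(X, mean, std):
    """x_tilde = (x - mean) / std, applied column-wise."""
    return (np.asarray(X, dtype=float) - mean) / std

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])
mean, std = zscore_fit(X_train)
Z_train, Z_test = zscore_apply(X_train, mean, std), zscore_apply(X_test, mean, std)
```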

Probabilistic kernel principal component analysis
Probabilistic principal component analysis (PPCA) was proposed by Tipping and Bishop within the maximum likelihood framework to analyze principal components in a probabilistic manner [23,24]. It assumes that the data in the feature space follow a special factor analysis model that relates a p-dimensional observation x to a q-dimensional latent variable z as
$$x = Wz + \mu + \varepsilon, \qquad z \sim \mathcal{N}(0, I_q), \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_p)$$
The model covariance is $\Sigma = WW^\top + \sigma^2 I$. For N observations with sample covariance matrix S, the log-likelihood of the observed data is
$$\mathcal{L} = -\frac{N}{2}\left\{p \ln(2\pi) + \ln|\Sigma| + \operatorname{tr}\left(\Sigma^{-1} S\right)\right\}$$
The maximum likelihood estimates of the weight matrix W and the noise variance $\sigma^2$ are
$$W_{ML} = U_q\left(\Lambda_q - \sigma^2 I\right)^{1/2} R, \qquad \sigma^2_{ML} = \frac{1}{p - q}\sum_{j=q+1}^{p} \lambda_j$$
where the columns of $U_q$ are the q leading eigenvectors of S, $\Lambda_q$ is the diagonal matrix of the corresponding eigenvalues, R is an arbitrary q × q orthogonal rotation matrix, and $\lambda_{q+1}, \ldots, \lambda_p$ are the smallest eigenvalues of S. Based on Bayes' theorem, the posterior distribution of the latent variable z given an observation x is
$$p(z \mid x) = \mathcal{N}\left(M^{-1} W^\top (x - \mu),\ \sigma^2 M^{-1}\right)$$
where $M = W^\top W + \sigma^2 I \in \mathbb{R}^{q \times q}$, so that $\sigma^2 M^{-1}$ is the posterior covariance. In the case of PKPCA, a nonlinear expectation maximization (EM) algorithm has been used for parameter learning. The EM algorithm alternates the expectation step (E-step) and the maximization step (M-step) until convergence; it never decreases the data likelihood and converges to a local maximum.
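The closed-form maximum likelihood solution above can be checked numerically with the following sketch of linear PPCA, which takes R = I and returns the posterior means of the latent variables. It only illustrates the formulas; the PKPCA used in the paper works in kernel space with an EM algorithm, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum likelihood PPCA with rotation R = I.

    X: (N, p) observations; q: latent dimension (q < p).
    Returns the mean, W (p x q) and the noise variance sigma2.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    S = (X - mu).T @ (X - mu) / X.shape[0]        # sample covariance
    lam, U = np.linalg.eigh(S)                   # ascending order
    lam, U = lam[::-1], U[:, ::-1]               # descending order
    sigma2 = lam[q:].mean()                      # mean of the p - q smallest eigenvalues
    W = U[:, :q] * np.sqrt(np.maximum(lam[:q] - sigma2, 0.0))   # U_q (Lambda_q - sigma2 I)^(1/2)
    return mu, W, sigma2

def ppca_posterior(X, mu, W, sigma2):
    """Posterior p(z | x): means M^-1 W^T (x - mu), one row per sample, and covariance sigma2 M^-1."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])    # q x q
    M_inv = np.linalg.inv(M)
    Z = (np.asarray(X, dtype=float) - mu) @ W @ M_inv   # M is symmetric, so M_inv.T == M_inv
    return Z, sigma2 * M_inv

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
mu, W, sigma2 = ppca_ml(X, q=2)
Z, post_cov = ppca_posterior(X, mu, W, sigma2)
```

With data generated from a 2-D latent space plus noise of variance 0.01, the recovered sigma2 should be close to 0.01, and the model covariance $WW^\top + \sigma^2 I$ keeps the total variance (trace) of S.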
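In the kernel setting, the kernel matrix K is centered in the feature space as $\tilde{K} = HKH$, where $H = I - \frac{1}{p}\mathbf{1}_p\mathbf{1}_p^\top$; here p is the number of samples (the size of K), and I and $\mathbf{1}_p$ denote the p × p identity matrix and the p × 1 vector of ones, respectively. It is important to note that PKPCA works with the kernel matrix K only.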
The selection of the kernel function directly affects the performance of the fused features obtained from PKPCA. In this paper, the integrated radial basis function (IRBF) [23] is adopted as the kernel function of PKPCA, which is referred to as the PKPCA_IRBF technique; $\sigma_0$ denotes its kernel parameter.
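The kernel version can be sketched by eigendecomposing the centered kernel matrix $HKH$. This uses the closed-form eigen solution rather than the EM algorithm mentioned above, and two points are assumptions: a Gaussian RBF kernel stands in for IRBF, whose formula is not reproduced here, and the noise variance is taken as the mean of the discarded nonzero eigenvalues of the feature-space covariance. Because the number of components is bounded by the number of samples rather than by the input dimension, d may exceed the number of original features, which is the dimension-increment property.

```python
import numpy as np

def rbf_gram(A, B, sigma0):
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma0^2)); a stand-in for IRBF."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma0 ** 2))

def pkpca_fuse(X, d, sigma0):
    """Fuse the rows of X (N x p) into d-dimensional posterior-mean features."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N         # centering matrix H = I - (1/N) 1 1^T
    Kc = H @ rbf_gram(X, X, sigma0) @ H         # kernel matrix centered in feature space
    ell, U = np.linalg.eigh(Kc)
    ell, U = ell[::-1], U[:, ::-1]              # descending eigenvalues of Kc
    lam = np.maximum(ell, 0.0) / N              # eigenvalues of the feature-space covariance
    r = int(np.sum(lam > 1e-12))                # numerical rank
    if not 0 < d < r:
        raise ValueError("d must satisfy 0 < d < numerical rank")
    sigma2 = lam[d:r].mean()                    # assumption: mean of discarded nonzero eigenvalues
    scores = U[:, :d] * np.sqrt(N * lam[:d])    # kernel PCA projections of the training points
    # Posterior mean M^-1 W^T phi with W = V_d (Lambda_d - sigma2 I)^(1/2), R = I, so M = Lambda_d
    return scores * np.sqrt(lam[:d] - sigma2) / lam[:d], sigma2

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))                    # 4 original features
F, sigma2 = pkpca_fuse(X, d=10, sigma0=1.0)     # 10 fused features, more than 4
```

Swapping `rbf_gram` for another kernel function (for example, an IRBF implementation) leaves the rest of the sketch unchanged.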

Weighted multi-kernel relevance vector machine
RVM is a sparse learning model proposed by Tipping based on Bayesian inference. Suppose that the dataset is $\{(x_i, y_i)\}_{i=1}^{m}$, where m is the number of cutting data samples, and that it follows the probabilistic formulation $p(y_i \mid x_i) = \mathcal{N}(y_i \mid f(x_i), \sigma^2)$, that is, $y_i = f(x_i) + \varepsilon_i$, where the additive noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$. The regression function of the RVM model has the same form as that implemented by SVM: the target value $y_i$ is a linear weighted sum of nonlinear basis functions,
$$f(x) = \sum_{i=1}^{m} w_i K(x, x_i) + w_0$$
where $K(x, x_i)$ is the kernel function of RVM, $w_i$ is a model weight, and $w_0$ is the bias. The corresponding likelihood function is
$$p(y \mid w, \sigma^2) = (2\pi\sigma^2)^{-m/2} \exp\left(-\frac{\lVert y - \Phi w \rVert^2}{2\sigma^2}\right)$$
where $y = [y_1, y_2, \ldots, y_m]^\top$ is the target vector of cutting data, $w = [w_0, w_1, \ldots, w_m]^\top$, and $\Phi$ is the m × (m + 1) design matrix whose ith row is $[1, K(x_i, x_1), \ldots, K(x_i, x_m)]$. The key feature of the RVM is the introduction of an individual hyperparameter $\alpha_i$ for each weight $w_i$. Direct maximum-likelihood estimation of w and $\sigma^2$ would lead to severe overfitting. To overcome this problem, a zero-mean Gaussian prior with a separate precision $\alpha_i$ is imposed on each $w_i$ to constrain its variation, so the prior distribution of w is
$$p(w \mid \alpha) = \prod_{i=0}^{m} \mathcal{N}(w_i \mid 0, \alpha_i^{-1})$$
where $\alpha = [\alpha_0, \alpha_1, \ldots, \alpha_m]^\top$ is a vector of m + 1 hyperparameters. Consequently, the posterior distribution over the weights conditioned on the cutting data is
$$p(w \mid y, \alpha, \sigma^2) = \frac{p(y \mid w, \sigma^2)\, p(w \mid \alpha)}{p(y \mid \alpha, \sigma^2)}$$
where $p(y \mid \alpha, \sigma^2) = \int p(y \mid w, \sigma^2)\, p(w \mid \alpha)\, dw$ is a convolution of Gaussians. The posterior over w is Gaussian, $\mathcal{N}(w \mid \mu, \Sigma)$, with covariance and mean
$$\Sigma = \left(\sigma^{-2}\Phi^\top\Phi + A\right)^{-1}, \qquad \mu = \sigma^{-2}\Sigma\Phi^\top y$$
where $A = \operatorname{diag}(\alpha_0, \alpha_1, \ldots, \alpha_m)$. Under the Bayesian framework, the hyperparameters of the weight posterior are determined by maximizing the marginal likelihood $p(y \mid \alpha, \sigma^2)$. The hyperparameter posterior is approximated by a delta function at its most probable values, $p(\alpha, \sigma^2 \mid y) \approx \delta(\alpha_{MP}, \sigma^2_{MP})$, so that $\alpha$ and $\sigma^2$ are estimated as $\alpha_{MP}$ and $\sigma^2_{MP}$.
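The logarithm of the RVM marginal likelihood is
$$\mathcal{L}(\alpha, \sigma^2) = -\frac{1}{2}\left[m \ln(2\pi) + \ln|C| + y^\top C^{-1} y\right], \qquad C = \sigma^2 I + \Phi A^{-1} \Phi^\top$$
where $\Phi$ is the m × (m + 1) design matrix of kernel evaluations (with a leading column of ones for the bias) and $A = \operatorname{diag}(\alpha_0, \alpha_1, \ldots, \alpha_m)$. The values of $(\alpha_{MP}, \sigma^2_{MP})$ can be obtained iteratively with the update formulae
$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \qquad (\sigma^2)^{new} = \frac{\lVert y - \Phi\mu \rVert^2}{m - \sum_i \gamma_i}, \qquad \gamma_i = 1 - \alpha_i \Sigma_{ii}$$
where $\mu_i$ is the ith element of the posterior weight mean $\mu$ and $\Sigma_{ii}$ is the ith diagonal element of the posterior weight covariance $\Sigma$. The predictive distribution is based on the posterior distribution of the weights, conditioned on the maximizing values $\alpha_{MP}$ and $\sigma^2_{MP}$. For new test data $x_*$, it is
$$p(y_* \mid y, \alpha_{MP}, \sigma^2_{MP}) = \mathcal{N}(y_* \mid f_*, \sigma_*^2)$$
with predictive mean and variance
$$f_* = \mu^\top \phi(x_*), \qquad \sigma_*^2 = \sigma^2_{MP} + \phi(x_*)^\top \Sigma\, \phi(x_*)$$
where $\phi(x_*) = [1, K(x_*, x_1), \ldots, K(x_*, x_m)]^\top$. The mean $f_*$ and the variance $\sigma_*^2$ denote the predicted value and the uncertainty of the prediction at $x_*$, respectively. The 95% CI of the predicted result is $[f_* - 1.96\sigma_*, f_* + 1.96\sigma_*]$.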
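The update formulae and predictive equations above can be put together in a compact NumPy sketch of RVM regression with a single Gaussian kernel. The initial values, the pruning cap on $\alpha_i$, and applying the stopping rule $|\max\{\alpha^{new}\} - \max\{\alpha^{old}\}| < \text{Thresh}$ (stated later for the iterative process) to the active weights are assumptions for illustration; weights whose $\gamma_i$ collapses to zero are treated as pruned. The synthetic wear-like data are hypothetical.

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2) between the rows of A and B."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def design(Xq, X, gamma):
    """Rows phi(x) = [1, K(x, x_1), ..., K(x, x_m)]."""
    return np.hstack([np.ones((len(Xq), 1)), rbf(Xq, X, gamma)])

def posterior(P, y, alpha, sigma2):
    """Sigma = (A + P^T P / sigma2)^-1 and mu = Sigma P^T y / sigma2."""
    Sigma = np.linalg.inv(np.diag(alpha) + P.T @ P / sigma2)
    return Sigma @ P.T @ y / sigma2, Sigma

def rvm_fit(X, y, gamma=1.0, max_iter=1000, thresh=1e-3, alpha_cap=1e9):
    Phi = design(X, X, gamma)
    m, n = Phi.shape
    alpha = np.ones(n)
    sigma2 = 0.1 * np.var(y) + 1e-6
    for _ in range(max_iter):
        keep = alpha < alpha_cap                       # weights not yet pruned
        P = Phi[:, keep]
        mu, Sigma = posterior(P, y, alpha[keep], sigma2)
        g = 1.0 - alpha[keep] * np.diag(Sigma)         # gamma_i = 1 - alpha_i Sigma_ii
        old_max = alpha[keep].max()
        # alpha_i = gamma_i / mu_i^2; gamma_i ~ 0 means the weight is irrelevant, so prune it
        new = np.where(g > 1e-10, g / (mu ** 2 + 1e-12), alpha_cap)
        alpha[keep] = np.minimum(new, alpha_cap)
        sigma2 = np.sum((y - P @ mu) ** 2) / max(m - np.clip(g, 0.0, 1.0).sum(), 1e-6)
        if abs(alpha[keep].max() - old_max) < thresh:  # |max(alpha_new) - max(alpha_old)| < Thresh
            break
    keep = alpha < alpha_cap
    mu, Sigma = posterior(Phi[:, keep], y, alpha[keep], sigma2)
    return {"X": X, "gamma": gamma, "keep": keep, "mu": mu, "Sigma": Sigma, "sigma2": sigma2}

def rvm_predict(model, Xq):
    """Predictive mean f_* and variance sigma_*^2 = sigma2 + phi^T Sigma phi."""
    Pq = design(Xq, model["X"], model["gamma"])[:, model["keep"]]
    f = Pq @ model["mu"]
    var = model["sigma2"] + np.einsum("ij,jk,ik->i", Pq, model["Sigma"], Pq)
    return f, var

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 40)[:, None]
y_true = 0.1 + 0.3 * X[:, 0] ** 2                      # smooth, wear-like synthetic trend
y = y_true + rng.normal(0.0, 0.01, 40)
model = rvm_fit(X, y, gamma=10.0)
f, var = rvm_predict(model, X)
lower, upper = f - 1.96 * np.sqrt(var), f + 1.96 * np.sqrt(var)   # 95% CI
```

A multi-kernel variant would only change how `design` builds the kernel columns; the re-estimation loop stays the same.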
In RVM modeling, choosing an appropriate kernel function with a suitable hyperparameter is complicated, and constructing a kernel function with better learning and generalization ability requires extensive experimentation and computing resources. A multi-kernel model that uses several kernel functions can exhibit higher performance than a single kernel function. The linear combination of kernel functions is the most common way to construct a multi-kernel RVM model: two different kernel functions $K_1$ and $K_2$ (e.g., the radial basis function and the polynomial kernel function) are combined through a weight parameter $\lambda$. The multi-kernel function integrates the strengths of different kernel functions and shows better learning and generalization ability when dealing with complex data features. This paper adopts a linear weighted combination of kernel functions in which the weight of each independent kernel function is adjusted adaptively to the input data, so as to combine the advantages of different basic kernel functions.
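$$K(x, x_i) = \sum_{s=1}^{M} \lambda_s K_s(x, x_i)$$
where M is the number of kernel functions and $\lambda_s$ is the weight of the sth kernel function $K_s$.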
The multi-kernel function integrates the advantages of different kernel functions. For example, the radial basis function kernel captures the local nonlinear trend in the tool wear process, while the linear kernel captures the global monotonic decreasing trend of tool wear. Since, unlike in SVM, the kernel function of RVM does not have to satisfy Mercer's condition, $K_s(x, x_i)$ can be any kernel function. The weight parameters $\lambda_s \ge 0$, with $\sum_{s=1}^{M} \lambda_s = 1$, must be optimized so that the combined kernel maps the data with high performance.
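Given kernels whose own parameters have already been tuned, the weighted combination can be formed as in this sketch; the choice of a Gaussian and a linear kernel, their parameter values, and the function names are illustrative assumptions. Setting one weight to 1 and the others to 0 recovers that single kernel, which is the fact used later to argue that WMKRVM is no worse than the best single-kernel RVM.

```python
import numpy as np

def gaussian_gram(A, B, gamma):
    """Local kernel: exp(-gamma * ||a - b||^2)."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def linear_gram(A, B):
    """Global kernel: a . b."""
    return A @ B.T

def weighted_kernel(A, B, kernels, weights):
    """K = sum_s lambda_s K_s(A, B) with lambda_s >= 0 and sum_s lambda_s = 1."""
    lam = np.asarray(weights, dtype=float)
    if np.any(lam < 0) or not np.isclose(lam.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(ls * k(A, B) for ls, k in zip(lam, kernels))

# Each K_s keeps the kernel parameter found in the first optimization stage (values assumed here)
kernels = [lambda A, B: gaussian_gram(A, B, gamma=5.0), linear_gram]
X = np.array([[0.0], [0.5], [1.0]])
K = weighted_kernel(X, X, kernels, [0.7, 0.3])
```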

Sparrow search algorithm
The sparrow search algorithm (SSA), proposed in 2020, is a swarm intelligence optimization algorithm inspired by the foraging and anti-predation behavior of sparrow populations. Sparrows are intelligent social creatures with excellent memory, and a sparrow population has the following biological characteristics in foraging:
1. The population is divided into producers and scroungers. Producers search a wider space to find food sources, while the scroungers search for food based on the producers.
2. Some sparrows are chosen as scouts during foraging; they watch for danger and actively evade predators.
3. If a better food source appears, producers and scroungers dynamically switch roles.
4. Scroungers always follow the producers that provide the best food sources, and some scroungers monitor the producers to obtain more food.
The flowchart of SSA is presented in Fig. 2.
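The behaviors listed above can be turned into a simplified SSA minimizer such as the following. It is a sketch rather than a reference implementation: the producer ratio, scout ratio, and safety threshold (pd = 0.2, sd = 0.1, st = 0.8) are assumed values, and the producer, scrounger, and scout updates follow the commonly cited SSA equations in simplified form. The sphere function is only a test objective.

```python
import numpy as np

def _evaluate(f, X, fit, idx, lb, ub):
    """Clip the selected sparrows to the bounds and refresh their fitness."""
    X[idx] = np.clip(X[idx], lb, ub)
    fit[idx] = [f(x) for x in X[idx]]

def ssa_minimize(f, lb, ub, n_pop=30, max_iter=100, pd=0.2, sd=0.1, st=0.8, seed=0):
    """Simplified sparrow search algorithm: minimize f over the box [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    X = lb + rng.random((n_pop, dim)) * (ub - lb)
    fit = np.array([f(x) for x in X])
    n_prod, n_scout = max(1, int(pd * n_pop)), max(1, int(sd * n_pop))
    best_x, best_f = X[fit.argmin()].copy(), fit.min()
    for _ in range(max_iter):
        order = np.argsort(fit)                         # best (lowest fitness) first
        X, fit = X[order], fit[order]
        worst_x = X[-1].copy()
        # Producers: local search while safe (R2 < ST), random move when alarmed
        r2 = rng.random()
        for i in range(n_prod):
            if r2 < st:
                X[i] = X[i] * np.exp(-(i + 1) / ((rng.random() + 1e-12) * max_iter))
            else:
                X[i] = X[i] + rng.normal()
        _evaluate(f, X, fit, np.arange(n_prod), lb, ub)
        xp = X[np.argmin(fit[:n_prod])].copy()          # best producer position
        # Scroungers: hungry ones fly elsewhere, the rest compete around the best producer
        for i in range(n_prod, n_pop):
            if i + 1 > n_pop / 2:
                X[i] = rng.normal() * np.exp((worst_x - X[i]) / (i + 1) ** 2)
            else:
                a = rng.choice([-1.0, 1.0], size=dim)
                X[i] = xp + np.abs(X[i] - xp) @ a / dim  # |X_i - X_P| A+ L term
        _evaluate(f, X, fit, np.arange(n_prod, n_pop), lb, ub)
        # Scouts: randomly chosen sparrows that sense danger
        gx, gf = X[fit.argmin()].copy(), fit.min()
        wx, wf = X[fit.argmax()].copy(), fit.max()
        scouts = rng.choice(n_pop, n_scout, replace=False)
        for i in scouts:
            if fit[i] > gf:
                X[i] = gx + rng.normal() * np.abs(X[i] - gx)
            else:
                X[i] = X[i] + rng.uniform(-1, 1) * np.abs(X[i] - wx) / (fit[i] - wf + 1e-50)
        _evaluate(f, X, fit, scouts, lb, ub)
        if fit.min() < best_f:
            best_x, best_f = X[fit.argmin()].copy(), fit.min()
    return best_x, best_f

sphere = lambda x: float(np.sum(x ** 2))
best_x, best_f = ssa_minimize(sphere, lb=[-5.0] * 3, ub=[5.0] * 3)
```

For kernel-weight optimization, the objective `f` would return the cross-validated error of the model built with the candidate weights.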

Construction of tool wear predicting model based on PKPCA_IRBF and WMKRVM
The selection of kernel weights determines the performance and sparsity of RVM. A commonly used parameter optimization method for MKRVM encodes all of its parameters in the solution vector of the optimization algorithm. This process is time-consuming and laborious and places high demands on the optimization algorithm. To obtain suitable hyperparameters for the WMKRVM model and prevent overfitting to the test dataset, SSA optimizes the hyperparameters of the WMKRVM model in stages. SSA is a novel swarm optimization approach mainly inspired by the group wisdom, foraging, and anti-predation behaviors of sparrows [18], and it offers high search accuracy and robustness compared with other algorithms. Experimental results on nineteen standard test functions [25][26][27][28], including unimodal, multimodal, and fixed-dimension test functions, verify the feasibility and effectiveness of SSA. Compared with the grey wolf optimizer [29], the gravitational search algorithm [25], and particle swarm optimization [30], SSA is superior in search precision, convergence speed, stability, and robustness. Therefore, SSA is adopted as the parameter optimization method for MKRVM in this research.
SSA first optimizes the kernel parameter of each standard kernel function, and the optimized kernel functions are then combined into the multi-kernel function of RVM. Finally, SSA optimizes the weight parameters of the WMKRVM model.
The weight parameters $\lambda_s$ of the kernel functions make up the search agent, where $x_{ij}^t$ is the jth dimension of the ith agent and t is the iteration number; that is, the ith search agent is $x_i^t = (\lambda_1, \lambda_2, \ldots, \lambda_M)$. The objective (fitness) function of SSA must also be defined in a proper form.
The mean square error (MSE) between the target value and the predictive mean of the RVM is adopted as the objective function of SSA, as shown in Table 5. The training dataset is used to train the WMKRVM model and optimize its parameters, and k-fold cross-validation is used to compute the objective function value of each SSA search agent, as shown in Fig. 3.
The training dataset is divided into k subsets of approximately equal size. Each subset is used exactly once to validate the WMKRVM-based tool wear prediction model trained on the remaining k − 1 subsets, so the training process is executed k times. The MSE between the actual and predicted values is used as the performance indicator of the model, and the sum of the k validation results is used as the fitness value of the search agent. The iterative process continues until the maximum number of iterations is reached. The optimized weight parameters corresponding to the minimum fitness are then combined with the training set to construct the tool wear prediction model.
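The fitness of one search agent can be computed as in the sketch below. To keep it short, kernel ridge regression stands in for the RVM, two kernels (Gaussian and linear) with fixed parameters are assumed, and agents that leave the feasible set are clipped and renormalized so that the weights are non-negative and sum to 1; all names, parameter values, and data are hypothetical.

```python
import numpy as np

def gaussian_gram(A, B, gamma):
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kfold_fitness(weights, X, y, k=5, gamma=5.0, ridge=1e-3, seed=0):
    """Fitness of one search agent (kernel weights): sum of the k validation MSEs.

    Kernel ridge regression stands in for the RVM to keep the sketch short.
    """
    lam = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    lam = lam / lam.sum() if lam.sum() > 0 else np.full(lam.size, 1.0 / lam.size)

    def kern(A, B):                                   # weighted Gaussian + linear kernel
        return lam[0] * gaussian_gram(A, B, gamma) + lam[1] * (A @ B.T)

    idx = np.random.default_rng(seed).permutation(len(y))
    total = 0.0
    for val in np.array_split(idx, k):                # each subset validates exactly once
        tr = np.setdiff1d(idx, val)                   # train on the remaining k - 1 subsets
        coef = np.linalg.solve(kern(X[tr], X[tr]) + ridge * np.eye(tr.size), y[tr])
        pred = kern(X[val], X[tr]) @ coef
        total += np.mean((y[val] - pred) ** 2)
    return total

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, size=(60, 2))
y = 0.2 + 0.3 * X[:, 0] + 0.05 * np.sin(6 * X[:, 1])
fitness = kfold_fitness([0.6, 0.4], X, y)
```

Fixing the fold permutation (the `seed` argument) makes every search agent be scored on the same folds, so fitness differences reflect the weights rather than the split.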

Fig. 2 The flowchart of Sparrow Search Algorithm
The training and test datasets obtained with PKPCA_IRBF are used to optimize the hyperparameters of the proposed model and evaluate its performance. The maximum number of iterations of the process shown in Fig. 4 is set to 1000, and the iterative estimation of $(\alpha, \sigma^2)$ stops earlier once the convergence condition $|\max\{\alpha^{new}\} - \max\{\alpha^{old}\}| < \text{Thresh}$ is satisfied. Once the iteration process is finished, the modeling parameters are determined.

Experimental results analysis
The tool wear prediction model based on the WMKRVM model and PKPCA_IRBF proposed in this paper aims at real-time, accurate monitoring of tool wear during machining. Cutting experiments are carried out with the setup shown in Fig. 1 to validate its effectiveness.
Data acquisition for cutting experiments

Performance evaluation of the RVM
Existing kernel functions are divided into local kernels and global kernels. This research uses commonly used local and global kernel functions for the RVM model, namely the Gaussian, multiquadric, polynomial, Laplace, and exponential kernels, as listed in Table 4. Single-kernel RVMs with these kernels are also used for tool wear prediction to show the advantages of the WMKRVM model. The input feature vector for tool wear prediction comprises the extracted cutting force features and the cutting parameters. The training dataset obtained in "Experimental setup" is used to estimate the modeling parameters of the WMKRVM model, including the most probable variance of the Gaussian noise $\sigma^2_{MP}$, the posterior covariance $\Sigma$, and the posterior mean of the weights w. The purpose of a multi-kernel is to integrate the advantages of different kernel functions to obtain better performance; when dealing with data with complex structural features, a multi-kernel performs better than a single kernel. Linear combinations of kernel functions are often used to construct multi-kernel functions, and the linear weighted combination adopted in this paper differs from the standard linear combination of one local and one global kernel function. That the WMKRVM model performs at least as well as every single-kernel RVM can be proved as follows. Suppose that the optimal objective values of the single-kernel RVMs with kernels $K_i$, $i = 1, 2, \ldots, N$, form the set $\{f_1, f_2, \ldots, f_N\}$, and let $f = \min_i f_i = f_k$. The proposed WMKRVM model g combines these single kernel functions with weights $a_i \ge 0$, $\sum_{i=1}^{N} a_i = 1$; denote its optimal objective value by $g_{min}$ and assume, to the contrary, that $g_{min} > f$. Counterproof: let $a_k = 1$ and $a_i = 0$ for $i = 1, 2, \ldots, k - 1, k + 1, \ldots, N$. The weighted kernel then reduces to $K_k$, so g attains the value $f_k = f < g_{min}$, which contradicts the optimality of $g_{min}$.
Hence the hypothesis does not hold, and the optimal objective value of WMKRVM satisfies $g_{min} \le f$.
There is still no unified theory or methodology for selecting kernels and their weight parameters. Commonly used methods for optimizing kernel parameters include empirical selection, experimental comparison, large-scale search, and cross-validation. This research adopts SSA to optimize the hyperparameters of the WMKRVM model. The convergence curves of the different single-kernel RVMs and of the WMKRVM model optimized by SSA are shown in Fig. 5. SSA performs well in optimizing the hyperparameters of the RVM, as the fitness values of the RVM models remain stable after 100 iterations. The parameters of SSA are set to (100, 50, 1, 0.5, $10^{-9}$). The optimized kernel parameters and weight parameters of the RVM models are listed in Table 6. The performance of the tool wear prediction model is assessed with the evaluation indicators listed in Table 5, in which $y_i$ is the tool flank wear width and $\bar{y}$ is the mean tool flank wear width measured by the HD CCD video microscope, while $f_i$ is the predicted tool flank wear width and $\bar{f}$ is the mean predicted tool flank wear width given by the tool wear prediction model at feature vector $x_i$ of the test dataset.
MAE is the mean absolute error, i.e., the average absolute deviation between the predicted and observed values; because absolute values are used, positive and negative errors do not cancel each other out, so MAE reflects the actual prediction error. RMSE is the root mean square error, the quadratic mean of the differences between predicted and observed values (residuals), which describes their dispersion. MAPE, the mean absolute percentage error, measures the prediction accuracy relative to the magnitude of the observed values. PCC is the Pearson correlation coefficient, which measures the linear correlation between the predicted and measured values.
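The four indicators can be computed as follows; MAPE is reported in percent and assumes non-zero measured wear values. The function name and the VB values are hypothetical.

```python
import numpy as np

def wear_metrics(y, f):
    """MAE, RMSE, MAPE (%) and Pearson correlation between measured y and predicted f."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    err = f - y
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": 100.0 * np.mean(np.abs(err / y)),   # requires non-zero measured values
        "PCC": np.corrcoef(y, f)[0, 1],
    }

vb_measured = [0.10, 0.20, 0.30, 0.40]    # hypothetical VB values in mm
vb_predicted = [0.11, 0.19, 0.33, 0.38]
scores = wear_metrics(vb_measured, vb_predicted)
```

For these values the errors are 0.01, -0.01, 0.03, and -0.02 mm, so MAE is 0.0175 mm and MAPE is 7.5%.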
The performance of the RVM-based tool wear prediction models with different kernels under the different evaluation indicators is shown in Figs. 6 and 7 for the training and test datasets, respectively. Compared with the Gaussian, Laplace, exponential, polynomial, and multiquadric kernels, the weighted multi-kernel achieves the minimum MSE (Table 7).
The test dataset is used to evaluate the effectiveness of the proposed WMKRVM model. The optimized hyperparameters in Table 7 are used to construct the RVM models. The predicted results of the RVM-based tool wear prediction models with different kernels (Gaussian, Laplace, multiquadric, polynomial, exponential, and multi-kernel) for the cutting process, using the original signal features, are shown in Fig. 8.
The brown points represent the test points, that is, the actual tool wear values measured by the HD CCD video microscope. The solid and dashed lines represent the predicted results and the 95% CI obtained from the WMKRVM model, respectively. The deviation is the absolute value of the difference between the predicted and actual tool wear values.
The number above each bar in the figure is the value of the evaluation indicator. As shown in Fig. 7, WMKRVM has the minimum MAE, RMSE, and MAPE and the maximum PCC among the RVM models with different kernels. Therefore, it can be concluded that WMKRVM outperforms the traditional single-kernel RVM in capturing the global structure of the nonlinear dataset.
However, the 95% CI obtained by WMKRVM does not capture the effect of process noise and redundancy on the model, which limits its practical application. Therefore, this paper adopts PKPCA_IRBF to fuse the original data features and better estimate the data noise of the WMKRVM model.

Model construction and evaluation based on PKPCA_IRBF fusion feature
The lower and upper bounds of the 95% CI are given by $f_i - 2\sqrt{\sigma_i^2}$ and $f_i + 2\sqrt{\sigma_i^2}$, respectively. However, the 95% CI of the results predicted by the WMKRVM model does not accurately reflect the process noise, which means the model does not adequately account for noise. Besides, the traditional PKPCA adopts the RBF as its kernel function, referred to as the PKPCA_RBF technique, with kernel parameter $\sigma_0$. Both PKPCA_IRBF and PKPCA_RBF are used to fuse the original force signal features to improve the 95% CI.
This research adopts PKPCA_IRBF and PKPCA_RBF to improve the model's ability to account for noise and to improve the 95% CI of the prediction results. The features fused by PKPCA_RBF and PKPCA_IRBF are compared with the original features in Figs. 9 and 10, respectively.
Fig. 9 a-f Original feature and the feature fused by PKPCA_RBF
The dense distribution and narrow dispersion of the original force feature data, shown in Fig. 9a, are not conducive to estimating the effects of noise or to distinguishing changes between distinct data points. As shown in Fig. 9b-f, the force features fused by PKPCA_RBF are uniformly distributed and well separated, indicating that PKPCA_RBF can effectively distinguish between distinct data points. However, such evenly distributed data are not conducive to model prediction: although the fused features distinguish differences across the overall data, uniformly distributed fusion features do not reflect drastic changes at specific data points, which lowers the prediction accuracy of the WMKRVM model. Compared with the original force features, the PKPCA_IRBF fusion features show greater dispersion and volatility, which is beneficial for estimating the confidence interval of the WMKRVM model, as shown in Fig. 10a-f.
In this research, the kernel parameters of PKPCA_IRBF are set to $\sigma_0 = 1.5 \times 10^4$ and d = 2 to 190, where d is the dimension of the fused features. The predicted results with the corresponding 95% CI of the WMKRVM model using PKPCA_IRBF fused features with dimensions d = 69, 85, 99, 146, and 197 are shown in Fig. 11. The performance of the WMKRVM-based tool wear prediction model under PKPCA_IRBF fused features of different dimensions is shown in Fig. 12, and the corresponding CI widths are shown in Fig. 13, where the cyan dashed line represents the CI width obtained by the WMKRVM model. Compared with Fig. 8, the 95% CI of the proposed WMKRVM model is improved under the PKPCA_IRBF fused features; that is, the WMKRVM model with PKPCA_IRBF fused features outperforms the WMKRVM model with the original features in terms of CI width. The admissible region for the kernel parameters of PKPCA_IRBF is $\sigma_0 = 1.5 \times 10^4$ and d = 10 to 155. It can be concluded that the dimension-increment property of PKPCA_IRBF can further improve the reliability and stability of the monitoring process of the WMKRVM-based tool wear prediction model.
To reveal the superiority of PKPCA_IRBF, PKPCA_RBF is also used to fuse the original features for constructing the WMKRVM-based tool wear prediction model. The best predictive performance of WMKRVM using PKPCA_RBF fusion features is listed in Table 8, which shows that PKPCA_IRBF outperforms PKPCA_RBF. Therefore, it can be concluded that PKPCA_IRBF extracts signal features that are more effective for tool wear prediction. The predicted tool wear values are now accompanied by a more reasonable 95% CI, and most test points fall within it, which intuitively reflects the effectiveness of PKPCA_IRBF in improving the CI of the WMKRVM model.

Conclusion
This paper develops a novel tool wear prediction method based on the proposed PKPCA_IRBF technique and the WMKRVM model to monitor the in-process tool wear width of cutting tools. The main work is summarized as follows:
1. The proposed WMKRVM model outperforms standard single-kernel RVMs with Gaussian, polynomial, multiquadric, exponential, and Laplace kernels in predictive accuracy, owing to the superior ability of the multi-kernel to capture the global structure of the nonlinear dataset.
2. As a new optimization algorithm, SSA can effectively optimize the kernel parameters of the WMKRVM model, yielding excellent predictive performance and robustness.
3. A new parameter optimization method that combines k-fold cross-validation with SSA is proposed to optimize the WMKRVM model's initial hyperparameters.
4. As a new nonlinear dimension-increment technique, the proposed PKPCA_IRBF helps improve the 95% CI of the WMKRVM model.
Cutting experiments are carried out to demonstrate the effectiveness of the tool wear prediction model based on the proposed WMKRVM model and the PKPCA_IRBF technique. This research provides theoretical guidance for monitoring tool wear in the machining process.