## 2.1 Vector Representation of Tool Edge Parameters and Processing Condition Parameters

Tool design involves numerous parameters, and calculating the precise impact of each parameter on tool life from equations is challenging. Moreover, these parameters are correlated with one another to some degree. In addition, tool life itself is inherently imprecise: it is obtained through testing and subject to considerable variability. Consequently, tool designers typically rely on their experience to determine initial values for these design parameters. To capture the relationship between tool life and design parameters while circumventing the need for exact modeling, artificial intelligence algorithms can be employed. These algorithms identify patterns between tool life and design parameters by establishing a regression function through statistical learning. In this process, it is crucial to define a vector representing the tool edge parameters, a vector representing the processing condition parameters, and an objective function for tool life.

(1) Tool edge parameter vector

$${{\mathbf{x}}_1}={\{ {a_1},{a_2},\ldots,{a_l}\} ^T} \tag{1}$$

The tool edge parameter vector comprises the parameters involved in tool design, such as tool diameter, radial rake angle, radial clearance angle, core diameter, and helix angle, denoted by \({a_i}\). These parameters have a significant influence on the machining performance of the tool. There exists a complex nonlinear correlation among them: each parameter's effect on the final tool life is interdependent with the others. Consequently, accurately describing and solving this multidimensional constraint relationship with traditional exact calculation models is challenging.

In practical tool design, engineers often prioritize optimizing the tool edge parameters that strongly impact performance based on the specific usage requirements of the tool. Conversely, parameters with lesser influence on performance are frequently assigned empirical values. This design strategy has proven to be practical and widely adopted within the industry.

(2) Tool machining parameter vector

$${{\mathbf{x}}_2}={\{ {b_1},{b_2},\ldots,{b_l}\} ^T} \tag{2}$$

The machining parameter vector encompasses spindle speed, feed rate, axial depth of cut, and radial depth of cut, denoted by \({b_i}\). These parameters are determined primarily by the performance of the machining equipment, the efficiency and quality requirements of the processed product, and the cutting edge parameters of the tool. They exert a significant influence on tool life: appropriate selection of processing condition parameters can prolong tool life, whereas improper selection may shorten it.

Determining the optimal range of processing condition parameters for a newly designed tool typically involves extensive testing and experimentation. Engineers continuously adjust and optimize the processing condition parameters during actual machining to assess the tool's performance under different parameter combinations. Through this iterative process, they aim to identify the optimal combination of processing condition parameters that achieves the desired machining efficiency and meets product quality requirements. This optimization process requires a comprehensive consideration and trade-off among key indicators such as tool life, cutting forces, surface quality, and machining efficiency.

(3) Objective Function for Tool Life

$${F_{\max }}={\text{Max}}\{ F({\mathbf{X}})\} \tag{3}$$

The tool life objective function is defined as \(F\left(\mathbf{X}\right)=F({\mathbf{x}}_{1},{\mathbf{x}}_{2})\), where **X** is the design vector comprising the tool edge parameter vector \({\mathbf{x}}_{1}\) and the processing condition parameter vector \({\mathbf{x}}_{2}\). The function \(F\left(\mathbf{X}\right)\) is the nonlinear regression function fitted on a set of cases. The objective is to determine the optimal value \({F_{\max }}\) that maximizes tool life: by optimizing \({\mathbf{x}}_{1}\) and \({\mathbf{x}}_{2}\) within the design vector \(\mathbf{X}\), the goal is to find the combination of tool edge parameters and processing condition parameters that yields the highest tool life.
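As a concrete illustration of how the design vector is assembled, the sketch below concatenates the two sub-vectors; the specific parameters and numeric values are hypothetical placeholders, not data from this work:

```python
import numpy as np

# Hypothetical tool edge parameters a_i (values illustrative, units omitted):
# diameter, radial rake angle, radial clearance angle, core diameter, helix angle
x1 = np.array([10.0, 8.0, 12.0, 5.5, 35.0])

# Hypothetical machining parameters b_i:
# spindle speed, feed rate, axial depth of cut, radial depth of cut
x2 = np.array([8000.0, 1200.0, 10.0, 0.5])

# Design vector X = (x1, x2), the input to the regression function F(X)
X = np.concatenate([x1, x2])
print(X.shape)  # (9,)
```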

### 2.2 Support vector regression algorithm

The Support Vector Regression (SVR) algorithm, initially derived from the principles of Support Vector Machine (SVM) [15], was further developed by Vapnik [16] for data regression analysis. Unlike traditional regression algorithms, SVR takes into consideration the upper and lower bounds of prediction errors, enabling it to effectively handle linear and nonlinear data. The SVR algorithm has witnessed significant advancements in both theoretical research and practical applications [17–19], positioning it as one of the leading methods for addressing multidimensional nonlinear regression problems. Notably, SVR offers distinct advantages and practical feasibility.

Compared to other intelligent algorithms, SVR stands out by providing a definitive regression analytic equation. Moreover, this equation exhibits stability even after repeated calculations, ensuring consistent and reliable regression outcomes. These qualities have contributed to the widespread adoption of SVR as a preferred method across various domains.

The fundamental concept of SVR involves mapping a variable *x* from a low-dimensional space to a high-dimensional feature space through a nonlinear transformation function. By employing this transformation, the nonlinear problem in the low-dimensional space can be effectively addressed by finding a linear regression hyperplane in the high-dimensional space. This mapping is accomplished using a kernel function, denoted as \(K\left(x,{x}^{{\prime }}\right)\), which operates on \({R}^{n}\times {R}^{n}\). A kernel function is valid if there exists a mapping \(\phi :{R}^{n}\to H,\; x\mapsto \phi \left(x\right)\), into a Hilbert space *H* such that \(K\left(x,{x}^{{\prime }}\right)=\phi \left(x\right)\cdot \phi \left({x}^{{\prime }}\right)\). Utilizing the kernel function, it becomes possible to construct a linear function based on the sample feature vectors within the high-dimensional feature space:

$$y={\mathbf{\omega }} \cdot \varphi ({{\mathbf{x}}_i})+b \tag{4}$$

In SVR, *y* represents the output quantity, **ω** the weight vector, \({\mathbf{x}}_i\) the input vector, and *b* the bias constant. By introducing the slack variables \(\xi_i\) and \(\xi_i^{*}\), which measure deviations beyond the *ε*-insensitive band, the support vector regression problem can be expressed as the constrained optimization:

$$\mathop {{\text{Min}}}\limits_{{\varvec{\omega},b,\xi ,{\xi ^*}}} \;\left\{ \frac{1}{2}||\varvec{\omega}|{|^2}+C\sum\limits_{{i=1}}^{l} {({\xi _i}+{\xi _i}^{*})} \right\} \tag{5}$$

$$\begin{gathered} {\text{s}}{\text{.}}\;{\text{t}}.\;\;({\mathbf{\omega }} \cdot \varphi ({{\mathbf{x}}_i})+b) - {y_i} \leqslant \varepsilon +{\xi _i},\quad i=1, \ldots ,l \hfill \\ {y_i} - ({\mathbf{\omega }} \cdot \varphi ({{\mathbf{x}}_i})+b) \leqslant \varepsilon +\xi _{i}^{*},\quad i=1, \ldots ,l \hfill \\ {\xi _i},\;\xi _{i}^{*} \geqslant 0 \hfill \\ \end{gathered} \tag{6}$$

By introducing the Lagrange function, the original problem is transformed into a quadratic optimization problem. The Lagrangian incorporates the constraints through nonnegative multipliers; minimizing it with respect to the primal variables **ω**, *b*, \(\xi_i\), and \(\xi_i^{*}\) yields the dual problem, which is then maximized with respect to the multipliers. This dual quadratic program can be solved efficiently with standard algorithms such as Sequential Minimal Optimization (SMO). Its solution provides the optimal values of the Lagrange multipliers, from which the solution to the original problem is recovered.

$$\mathop {{\text{Max}}}\limits_{{\mathbf{a}}} \left\{ - \frac{1}{2}\sum\limits_{{j={\text{1}}}}^{l} {\sum\limits_{{i=1}}^{l} {({a_i} - {a_i}^{*})} } ({a_j} - {a_j}^{*})K({{\mathbf{x}}_i},{{\mathbf{x}}_j}) - \varepsilon \sum\limits_{{i=1}}^{l} {({a_i}+{a_i}^{*})} +\sum\limits_{{i=1}}^{l} {{y_i}({a_i} - {a_i}^{*})} \right\} \tag{7}$$

$$\begin{gathered} {\text{s}}{\text{.}}\;{\text{t}}.\;\;\sum\limits_{{i=1}}^{l} {({a_i} - {a_i}^{*})} =0 \hfill \\ \quad \;\;{a_i},{a_i}^{*} \in [0,C] \hfill \\ \end{gathered} \tag{8}$$

Solving the dual problem yields the optimal Lagrange multipliers \(a_i\) and \(a_i^{*}\), from which the regression function is determined. Only the samples with nonzero coefficients, namely the support vectors, contribute to this function. The regression function represents the fitted relationship between the input vector and the output quantity, capturing the underlying patterns in the training data and providing predictions for new input data:

$$F({\mathbf{x}})=\sum\limits_{{i=1}}^{l} {({a_i} - {a_i}^{*})} K({{\mathbf{x}}_i},{\mathbf{x}})+b \tag{9}$$

$$b=\left\{ {\begin{array}{*{20}{c}} {{y_i} - \sum\limits_{{{{\mathbf{x}}_j} \in SV}} {({a_j} - {a_j}^{*})} K({{\mathbf{x}}_i},{{\mathbf{x}}_j}) - \varepsilon \;\;(0<{a_i}<C,\;{a_i}^{*}=0)} \\ {{y_i} - \sum\limits_{{{{\mathbf{x}}_j} \in SV}} {({a_j} - {a_j}^{*})} K({{\mathbf{x}}_i},{{\mathbf{x}}_j})+\varepsilon \;\;(0<{a_i}^{*}<C,\;{a_i}=0)} \end{array}} \right. \tag{10}$$

In practice, the *b*-values computed from Eq. (10) for different standard support vectors may differ slightly. A robust estimate of the bias constant is therefore obtained by calculating *b* for each standard support vector individually and averaging the results, so that the contributions of all standard support vectors are taken into account:

$$b=\frac{1}{{{N_{NSV}}}}\left\{ \sum\limits_{{0<{a_i}<C}} {\left[{y_i} - \sum\limits_{{{{\mathbf{x}}_j} \in SV}} {({a_j} - {a_j}^{*})} K({{\mathbf{x}}_i},{{\mathbf{x}}_j}) - \varepsilon \right]} +\sum\limits_{{0<{a_i}^{*}<C}} {\left[{y_i} - \sum\limits_{{{{\mathbf{x}}_j} \in SV}} {({a_j} - {a_j}^{*})} K({{\mathbf{x}}_i},{{\mathbf{x}}_j})+\varepsilon \right]} \right\} \tag{11}$$

Here *SV* denotes the set of standard support vectors and \(N_{NSV}\) their number. In support vector regression, the choice of kernel function plays a crucial role, the main options being the Gaussian radial basis kernel and the polynomial kernel, each with its own strengths and weaknesses. The Gaussian radial basis kernel, a local kernel function, exhibits excellent learning capability and convergence properties; however, its generalization may be limited. The polynomial kernel, a global kernel function, offers superior generalization; nevertheless, it can be harder to bring to convergence and requires more complex parameter adjustment.
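The bias averaging in Eq. (11) can be sketched as follows; this is a minimal illustration assuming the dual coefficients and the kernel matrix have already been computed by a QP solver, and all names are illustrative:

```python
import numpy as np

def average_bias(a, a_star, K, y, eps, C, tol=1e-8):
    """Average b over the standard support vectors, following Eq. (11).

    a, a_star : dual coefficients a_i, a_i* (length l)
    K         : kernel matrix, K[i, j] = K(x_i, x_j)
    y         : target values
    """
    beta = a - a_star                    # combined coefficients (a_i - a_i*)
    sv = np.abs(beta) > tol              # support vectors (nonzero coefficients)
    f_no_b = K[:, sv] @ beta[sv]         # sum_j (a_j - a_j*) K(x_i, x_j)

    # Standard support vectors: coefficient strictly inside (0, C)
    on_upper = (a > tol) & (a < C - tol)           # 0 < a_i  < C  (a_i* = 0)
    on_lower = (a_star > tol) & (a_star < C - tol) # 0 < a_i* < C  (a_i  = 0)

    b_vals = np.concatenate([
        y[on_upper] - f_no_b[on_upper] - eps,
        y[on_lower] - f_no_b[on_lower] + eps,
    ])
    return b_vals.mean() if len(b_vals) else 0.0
```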

In the SVR algorithm, the choice of appropriate parameters significantly affects the accuracy of fitting and the generalization performance. Specifically, the Gaussian radial basis kernel function is \(K({\mathbf{x}},{\mathbf{x}}')={e^{ - \frac{{||{\mathbf{x}} - {\mathbf{x}}'|{|^2}}}{{2{\sigma ^2}}}}}\), where \({\sigma ^2}\) denotes the kernel function parameter. Additionally, the insensitive loss coefficient *ε* and the error penalty factor *C* also contribute to the overall fitting accuracy and generalization performance of the SVR algorithm. It is imperative to select suitable parameter values to optimize the regression effect and achieve improved accuracy and generalization in SVR.
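As a sketch of how these three hyperparameters appear in an off-the-shelf implementation, scikit-learn's `SVR` exposes `C` and `epsilon` directly, while its `gamma` corresponds to \(1/(2\sigma^2)\) for the Gaussian kernel written above; the training data below are random placeholders, not tool life cases:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 9))                    # placeholder design vectors (x1, x2)
y = X.sum(axis=1) + 0.05 * rng.normal(size=40)   # placeholder "tool life" targets

sigma2 = 0.5                          # kernel parameter sigma^2
model = SVR(kernel="rbf",
            gamma=1.0 / (2.0 * sigma2),  # gamma = 1 / (2 sigma^2)
            C=10.0,                      # error penalty factor C
            epsilon=0.01)                # insensitive loss coefficient eps
model.fit(X, y)
pred = model.predict(X[:1])
```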

### 2.3 Particle swarm optimization algorithm

The fitted function \(F({\mathbf{X}})\) obtained through the SVR algorithm represents a complex multivariate function, making the search for the optimal solution quite challenging. To tackle such complex problems, intelligent search algorithms like neural networks, genetic algorithms, ant colony algorithms, and particle swarm algorithms are commonly employed. These algorithms bypass the need for complex equation derivation and solution processes and have shown promising results.

Among these algorithms, the PSO algorithm, proposed by Kennedy and Eberhart in 1995 [20], draws inspiration from the foraging behavior of bird flocks. It leverages the exchange of information between individuals within the swarm, and between each individual and the best-performing individual, to guide the entire swarm toward convergence to the optimal solution while preserving individual diversity. Through continuous updates, the algorithm gradually identifies the optimal solution. Over time, the PSO algorithm has undergone continuous improvement and found widespread application across various domains [21–22].

In the PSO algorithm, all particles adjust their velocity and position based on their individual optimal values and the global optimal values of the particle swarm. The velocity and position update equations are as follows:

$$v_{{id}}^{{k+1}}=\omega v_{{id}}^{k}+{c_1}{r_1}{\text{(}}p_{{id}}^{k}-x_{{id}}^{k}{\text{)}}+{c_2}{r_2}{\text{(}}g_{d}^{k}-x_{{id}}^{k}{\text{)}} \tag{12}$$

$$x_{{id}}^{{k+1}}=x_{{id}}^{k}+v_{{id}}^{{k+1}} \tag{13}$$

In the given equations, \({p}_{id}^{k}\) represents the individual optimal value of the *i*th particle in the *d*th dimension at the *k*th iteration, while \({g}_{d}^{k}\) denotes the global optimal value of all particles in the *d*th dimension at the *k*th iteration. The parameters *ω*, *c*1, and *c*2 refer to the inertia weight factor, individual learning factor, and social learning factor, respectively. Additionally, *r*1 and *r*2 are random numbers between 0 and 1, \({x}_{id}^{k}\) and \({v}_{id}^{k}\) are the position and velocity of the *i*th particle in the *d*th dimension at the *k*th iteration, and \({x}_{id}^{k+1}\) is its updated position at the \((k+1)\)th iteration.

The fitness function plays a crucial role in evaluating the quality of each particle's position in the algorithm. The appropriate selection of the fitness function significantly impacts the convergence speed and accuracy of the algorithm. In our method, Eq. (9) is utilized as the fitness function. This choice enables a more accurate evaluation of the advantages and disadvantages of each particle's position, leading to a significant acceleration in the convergence speed of the algorithm.

The selection of initial values in the PSO algorithm is another critical factor affecting its convergence. In our method, a strategy of using filtered cases as the initial particles is adopted, rather than randomly initialized particles. This deliberate choice aims to enhance the convergence of the algorithm and expedite the attainment of the optimal solution. By leveraging pre-filtered cases as the initial particles, the method has demonstrated a faster convergence rate compared to random initialization methods. The main flow of the particle swarm algorithm is shown in Fig. 3.
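The flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fitness function is a simple placeholder standing in for the SVR-fitted \(F(\mathbf{X})\), the initial particles are fixed "case" vectors rather than random points, as the text suggests, and all names and values are assumptions:

```python
import numpy as np

def pso_maximize(fitness, particles, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO maximizing `fitness`, started from given case particles."""
    rng = np.random.default_rng(seed)
    x = np.asarray(particles, dtype=float)       # positions, shape (n, d)
    v = np.zeros_like(x)                         # velocities
    pbest = x.copy()                             # individual best positions
    pbest_f = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_f.argmax()].copy()       # global best position

    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity update, Eq. (12); then position update
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        f = np.array([fitness(p) for p in x])
        improved = f > pbest_f                   # update individual bests
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmax()].copy()   # update global best
    return gbest, pbest_f.max()

# Placeholder fitness standing in for the SVR-fitted F(X): peak at X = (1, 1)
fitness = lambda p: -np.sum((p - 1.0) ** 2)
cases = [[0.0, 0.0], [2.0, 2.0], [0.5, 1.5]]     # filtered cases as initial particles
best_x, best_f = pso_maximize(fitness, cases)
```

Because the best individual value can only improve across iterations, the returned fitness is never worse than that of the best initial case, which is one motivation for seeding the swarm with known good cases.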