CNN-GRU network-based force prediction approach for variable working condition milling clamping points of deformable parts

Improper clamping is one of the major causes of part deformation. Improving the fixture arrangement through force analysis of clamping points is an effective means to suppress or improve machining deformation. However, the existing research focuses on the monitoring and off-line optimization of the clamping point force, which has a certain lag on the machining deformation control, and it is difficult to predict the clamping point force due to the time-varying coupling effect of multiple factors such as process parameters, cutting force, and clamping point force in the machining process. Inspired by the excellent performance of convolutional neural networks and gated recurrent networks in feature extraction and learning of temporal association laws, this paper proposes a CNN-GRU-based method for predicting the force state of clamping points under variable working conditions. Firstly, a force prediction model of clamping point during milling process with variable working conditions is established. Secondly, a convolutional neural network is designed to extract the features of dynamic coupled machining conditions. Then, a network of gated recurrent units is constructed to learn the temporal correlation law between the machining conditions and the forces on the clamping points to achieve force prediction of the clamping points during machining. Finally, it was verified by the milling process of the piston skirt. The results show that CNN-GRU can effectively predict the clamping force. In addition, CNN-GRU has higher computational efficiency and accuracy compared with CNN-LSTM, CNN-RNN and CNN-BP.


Introduction
Part deformation during processing is a common problem in CNC machining, for example in large integral aerospace parts, diesel engine piston skirts, and automobile impellers, and it seriously affects the machining accuracy and surface quality of the workpiece [1,2]. Part of this deformation is caused by external loads (clamping and cutting forces) that elastically deform the workpiece during processing, and it is recovered afterwards [3]. This means that the fixture becomes a key component in avoiding the geometrical error associated with deformation caused by external loads during machining. The fixture is a precision subsystem that provides accurate positioning of the workpiece in the work space and rigidly holds and supports the component to withstand the machining forces, affecting the static and dynamic behaviour of the workpiece [4]. The deformation of the workpiece is related not only to its residual stress and the cutting force but also to processing conditions such as cutting order, fixture arrangement, and clamping force [5]. The deformation caused by clamping is usually related to the clamping force and to the deformation already present in the original workpiece from previous manufacturing processes. Especially in the processing of thin-walled parts, a larger clamping force leads to excessive elastic deformation of the workpiece and therefore to larger dimensional errors, while a smaller clamping force cannot adequately restrain the workpiece during processing. However, in addition to the clamping force, the fixture also exerts a reaction force on the workpiece during machining, which results from the combined effect of residual stress release, machining deformation, and cutting forces. Therefore, the integrated force at the clamping point is the key to studying the overall force and machining deformation of the workpiece.
Moreover, in the development of special fixtures for thin-walled parts, fixtures with floating capability can separate the force at the clamping point to a certain extent. One example is the intelligent fixture designed in Reference [4], which separates the clamping force from the reaction force and reduces the deformation of the workpiece by controlling the reaction force at the clamping point. In other words, the force at the clamping point is an indispensable component for developing a fixture system that can suppress machining deformation. First, accurate prediction of clamping point forces can provide input for workpiece machining deformation prediction and control methods, such as predeformation-based deformation control methods or finite element method-based deformation prediction methods. Second, in machining scenarios where the fixture is adjustable, the fixture contact point or clamping force can be adjusted in advance, based on the predicted clamping point force, to balance the overall workpiece force or even release the machining deformation. The remaining workpiece deformation can then be eliminated through subsequent machining, so that machining efficiency is improved while workpiece machining quality is ensured.
Therefore, accurate prediction of clamping point force can help to correct the fixture arrangement and clamping force and then effectively restrain or reduce machining deformation. From this perspective, the prediction of clamping point force during the machining of thin-walled parts is a good entry point. However, the variation of clamping point force is influenced by a combination of several factors such as cutting parameters, cutting force, and vibration during machining, and these factors show complex coupled changes over time as the working conditions change, making it a challenge to accurately predict the clamping point force under variable working conditions. At present, the analysis or prediction methods for the clamping point force in the machining process mainly include modeling methods, such as the analytical method and the finite element method, that numerically simulate the clamping process and its deformation [6,7], and these works are the basis for the study of fixture layout optimization. However, these models rely on input parameters that are difficult to obtain or estimate, such as the clamping point forces during machining, and a large number of assumptions are required to ensure that the model can be solved or that approximate solutions can be obtained. Due to the uncertainties in the machining process, the simulation differs from the actual machining process, and the clamping point forces change constantly under the complex coupling of cutting parameters, cutting forces, vibrations, and other operating conditions. Off-line clamping optimization or a fixed clamping simulation model cannot adapt to such changes, so it is not possible to optimize the fixture arrangement and the clamping force magnitude, or to further control machining deformation, according to the changes in clamping point forces.
Nowadays, sensors and monitoring technologies have evolved significantly allowing the on-line control of the actual state of the manufacturing systems [8]. Some studies use sensors, including displacement sensors or strain gauges, in fixtures and clamping elements, to monitor clamping and reaction forces and design dynamically adjustable fixtures based on this, such as dynamic fixtures by monitoring clamping forces [4] or floating clamping methods based on real-time displacement monitoring data [9,10]. The core principle of the above research is to adjust the fixture state by monitoring the change of force or displacement at the clamping point of the machining process, so as to achieve the purpose of releasing the machining stress of the workpiece and controlling the machining deformation, which means that the correction of the clamping state by analyzing the change of force at the clamping point becomes one of the effective ways to suppress or control the machining deformation.
However, the current use of fixtures or workpiece-fixture systems to analyze machining deformation mostly converts machining deformation prediction into online monitoring and detection of workpiece deformation during machining, and less work has been done to directly predict the force state at the clamping point. Moreover, because of the uncertainty of the actual machining process, the working condition information (e.g., cutting force) is often partly missing or contains a lot of noisy data and redundant information, which makes it difficult to use directly as a basis for adjusting the clamping state. In addition, there is a time-varying coupled relationship between the working condition factors and the clamping force state. The accuracy of a clamping point force prediction method therefore relies on extracting the dynamic time-series characteristics related to the clamping force from the complex changing working conditions, and the existing research cannot fully explore the time-series correlation between the time-varying working conditions and the clamping point force.
The rapid development of computer science fields such as machine learning and deep learning provides new data-driven ideas for discovering patterns in machining processes from these data [11]. These methods show great advantages especially for complex, dynamic, and even chaotic manufacturing processes, with applications in tool condition monitoring [12], machining accuracy prediction [13], and machining deformation prediction [14]. Inspired by this, this paper proposes CNN-based feature extraction of the time-varying working conditions of the machining process, combined with GRU-based learning of the dynamic time-series relationship between variable working conditions and clamping point force, for clamping point force prediction. The main contributions of this study are summarized as follows:
1) We propose the use of hierarchical mean interpolation, the Donoho threshold [15] with the soft-hard threshold compromise method, and the Z-score standardization method to pre-process the time-varying working condition data according to its characteristic of continuous change, and we compare and analyze the effect of different data processing methods.
2) We design an adaptive feature extraction model for time-varying working conditions, based on a fully convolutional CNN network, to extract high-dimensional sensitive features that respond to changes in the force state at the clamping point.
3) A GRU network-based force state prediction model for clamping points is proposed, and several different prediction models are compared. Experimental results demonstrate the effectiveness of the CNN-GRU model that considers dynamic time-series correlation.
The remainder of this paper is organized as follows: Sect. 2 reviews the existing approaches used for force prediction at clamping points. In Sect. 3, the factors affecting the clamping point force are analyzed and some definitions and framework overview of CNN-GRU based clamping point force prediction methods are presented. Section 4 gives details of the time-varying operating condition data preprocessing, feature extraction, and clamping point force prediction model learning methods. In Sect. 5, piston skirt milling machining is used as an experimental object and compared with other models for analysis to verify the performance of the proposed model. Finally, Sect. 6 summarizes the whole paper and gives an outlook on future research directions.

Related work
In this section, we briefly review three research fields that are relevant to this paper. Specifically, the three aspects include force state prediction, time-varying data feature extraction, and recurrent neural network application.

Prediction of force state in machining process
The methods for force prediction during processing can be broadly classified into three categories: the mechanical analysis modeling method, the finite element simulation method, and the machine learning method. Among them, mechanical analysis modeling mainly refers to the theoretical analysis of the forces and related parameters in the machining process in order to establish the corresponding theoretical numerical model. Kurnadi et al. [6] proposed a systematic mathematical approach to optimize workpiece clamping parameters using an analytical model of ring deformation and a model to determine the minimum clamping force. Using the principle of minimum modulus, Zhang et al. [16] developed a kinematic model of the workpiece-fixture system, calculated the contact forces (including frictional forces) between the fixture components and the workpiece, and further proposed a model to optimize the clamping force to maintain the stability of the workpiece-fixture system. The finite element simulation method refers to offline machining simulation prediction using commercial finite element simulation software. Dong et al. [17] proposed a machining surface error prediction method that considers the dynamic effects of the clamping and workpiece systems during milling. The static deformation caused by the clamping force and the dynamic deformation caused by the milling force are calculated using static analysis and explicit dynamic analysis methods in the FEA environment. Liu et al. [18] proposed a milling error prediction method based on the finite element method, which considered the deflection of the workpiece/tool system and the springback deformation of the workpiece, in addition to the dynamic model of the tool. Machine learning-based force state prediction for machining processes mainly uses artificial neural networks to learn the correlation between the force state and the relevant machining parameters. Zhao et al. [14] proposed a deep learning model-based online prediction method for CNC machined part deformation, using deformation monitoring data and machining information related to intermediate part geometry to construct and train a deep learning model consisting of traditional neural networks and recurrent neural networks to achieve online prediction of part deformation. Teramoto [19] evaluated the deformation of workpieces under different clamping sequences, using locally measured strains and clamping simulations, and used the response surface method to estimate clamping forces and workpiece deformation. Hao et al. [20] designed a responsive fixture that adjusts to part deformation based on online monitoring data and optimizes the machining sequence through clustering and dynamic planning to reduce machining deformation due to unpredictable factors.
Although the above three methods have achieved certain results, theoretical mechanical analysis modeling and finite element simulation are limited in their ability to describe the complex manufacturing process. The analysis inevitably contains simplifications, and sometimes inappropriate assumptions, so it is difficult to adapt to the time-varying working condition factors of the actual machining process, and the analyzed dynamic coupling between these factors also differs from actual machining. Machine learning methods provide ideas for mining the correlation between the force state and the working conditions of the machining process, but traditional neural network structures are relatively shallow, and their ability to learn the complex nonlinear relationship between the time-varying working conditions and the force at the clamping point is relatively weak.

Time-varying processing data feature extraction
The existing time-varying data feature extraction methods can be broadly classified as traditional feature extraction methods and deep learning-based feature extraction methods. Traditional feature extraction methods can be divided into three categories based on the implementation principle: time domain, frequency domain, and time-frequency domain. Time-domain feature extraction obtains various time-domain parameters or indicators of the signal data through statistical methods. For example, Wu et al. [21] performed tool remaining life prediction by calculating and filtering parameters such as the standard deviation, mean value, root mean square value, root square amplitude, and maximum value of noise signals and current signals. Frequency-domain feature extraction usually uses the Fourier transform to analyze the frequency components of signal data. For example, Wang et al. [22] used the fast Fourier transform to obtain the frequency amplitudes of acoustic emission signals and vibration signals as part of the input to a tool wear prediction model. However, dynamic signals acquired during actual processing are often non-stationary, so analyzing only their time-domain or frequency-domain characteristics is not enough. It is also necessary to clarify how the signal spectrum changes with time, which requires time-frequency domain analysis using a joint time-frequency function of the signal data. Zhou et al. [23] used the Hilbert-Huang transform (HHT) for feature extraction of torque signals and implemented tool wear life prediction. Compared with the traditional feature extraction methods, the deep learning feature extraction method uses a machine learning model with multiple hidden layers to learn useful data features autonomously. For example, Tran et al. [24] proposed a 3D convolutional neural network-based spatio-temporal feature learning method.
Compared with traditional methods for feature extraction of time domain, frequency domain, and time-frequency domain data, deep learning can deeply explore the feature information hidden in the data and significantly eliminate the influence of subjective experience on the feature extraction process, and this advantage makes it more effective in dealing with feature extraction of complex working conditions. In this paper, we consider multiple time-varying operating conditions data as images composed of single-channel pixel matrices and then use CNN networks to adaptively extract time-varying operating conditions features at each moment (or time period).

Recurrent neural network
The recurrent neural network (RNN) was first proposed to process sequential data in the 1980s [25]. However, due to the problems of gradient disappearance and gradient explosion, the original RNN is very difficult to train, and its application is very limited. To tackle these problems, Hochreiter and Schmidhuber [26] proposed the long short-term memory (LSTM) network, which is a variant of the RNN, and on this basis, Chung et al. [27] proposed the gated recurrent unit (GRU) network. Although both of them are RNN networks, the former is a recurrent neural network with a special structure, while the latter is a variant of the former structure. Both address the problems of gradient explosion, gradient disappearance, and insufficient long-term memory capacity of traditional recurrent neural networks while introducing a temporal feedback mechanism, and thus they effectively utilize previous temporal information and improve model reliability. LSTM has been successfully used for anomaly detection in mechanical equipment [28], and the authors' team [12,23] also proposed the use of LSTM to predict the tool wear state under different working conditions. Compared with the standard LSTM network, the GRU network structure is more streamlined and computationally simpler. Zhao et al. [29] implemented the monitoring of three machine health states (tool wear prediction, gearbox fault diagnosis, and early bearing fault detection) by constructing a local feature-based gated recurrent unit (LFGRU) network. It can be seen that recurrent neural networks have significant advantages in dealing with complex time-series prediction problems during processing. In this paper, a GRU network-based force prediction method for clamping points is proposed, taking into account the complex variation of machining conditions and clamping point force states during machining and the temporal correlation between the two.

Basic concepts
Definition 1 Clamping point force (F). The clamping point force represents the force that the fixture is subjected to under the action of external and internal forces (workpiece processing deformation, cutting force, etc.) acting on the workpiece. The clamping point force can be denoted as $F = (F_x, F_y, F_z)$, where $F_x$, $F_y$, and $F_z$ denote the component forces in the X, Y, and Z directions, respectively.

Definition 2 Machining conditions (C). The machining conditions are a collection of factors that affect the force on the clamping point during machining. The machining conditions can be denoted as $C = \{C_{pr}, C_{cl}, C_{mt}, C_{tool}, C_{part}, C_{mon}\}$, where $C_{pr}$, $C_{cl}$, $C_{mt}$, $C_{tool}$, $C_{part}$, and $C_{mon}$ indicate the machining sub-conditions, clamping sub-conditions, machine sub-conditions, tool sub-conditions, workpiece sub-conditions, and monitoring signal sub-conditions, respectively.
The force prediction at the clamping point is to output the force state at the clamping point after t hours of continued machining, given the machining condition data for a period of time T, which are input to the prediction model after data pre-processing and feature fusion operations. That is, Input = C; after the data pre-processing operation and the feature extraction operation, the processing results are fed into the prediction function Y, and the output result is as follows Out-

Overview of the proposed approach
The procedures of the proposed method are illustrated in Fig. 1. It involves four parts: data pre-processing, feature extraction, network training and evaluation, and clamping point force prediction. Here, we give a brief description of each part, respectively: (1) Data pre-processing The pre-processing of different working condition data during processing is an essential step, because the raw data are usually subject to local extremes, missing data, data noise, and inconsistency in the data scale due to the actual processing environment and data acquisition conditions, which seriously affect the subsequent data analysis and model construction results. In this paper, the data pre-processing operations Φ include missing value processing, noise reduction, and normalization.
(2) Feature extraction Considering that machining involves a variety of time-varying factors, the data are often characterized by large quantities, dynamic changes, and strong coupling. In order to extract the feature information efficiently and reliably, we treat the time-varying data as a single-channel image consisting of a two-dimensional pixel matrix and then construct a CNN network without pooling operations for feature extraction to obtain training samples.

(3) Network training and evaluation
The GRU model from the bottom up mainly consists of three parts: prediction model input, prediction model forward calculation, and prediction model backward tuning. The processing conditions in each time window of the current time period are first fused using the Concat algorithm and then used as input to the prediction model. The mean square error (MSE) is used as the loss function in the reverse tuning process, and the gradient descent process is optimized using Adam's algorithm. After model training, we quantitatively evaluate the fit of the model predictions to the true values through a comparative analysis of several prediction models.

(4) Clamping point force prediction
In this step, the new machining conditions data are inputted into the prediction function Y and then output a vector of three-way clamping force values at the clamping point. Based on the proposed method, the three-way force at the clamping point is input to the CNN network for the first time for feature extraction, and the extracted features are fused with the features of cutting force and vibration as the input to the GRU network. The second time, the actual data of the three-way force at the clamping point is directly input into the GRU network as the label, and the GRU network is trained together with the above fused features to achieve the prediction of the force at the clamping point. In fact, we collected the force at two different clamping positions, either one of which can be input to the CNN network for feature extraction as the first time, and the other one can be input to the GRU network as the label value for prediction model training, and in turn the two can be verified with each other.
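The fusion, forward calculation, and loss evaluation described in parts (3) and (4) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the random weights, the linear read-out matrix `V`, and the feature values are all hypothetical, the update rule uses the convention h = (1 - z) * h_prev + z * candidate, and the backward pass with Adam is omitted.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Compute W x + U h + b with plain nested lists."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def gru_step(x, h_prev, p):
    """One GRU time step: update gate z, reset gate r, candidate state."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h_prev, p["bz"])]
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h_prev, p["br"])]
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], reset_h, p["bh"])]
    return [(1.0 - zi) * hp + zi * ci for zi, hp, ci in zip(z, h_prev, cand)]


def init_params(n_in, n_hidden, n_out=3, seed=0):
    """Small random weights; all sizes are illustrative assumptions."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {"V": mat(n_out, n_hidden)}  # assumed linear read-out to (Fx, Fy, Fz)
    for gate in "zrh":
        p["W" + gate] = mat(n_hidden, n_in)
        p["U" + gate] = mat(n_hidden, n_hidden)
        p["b" + gate] = [0.0] * n_hidden
    return p


def predict_force(window, p, n_hidden):
    """Run the GRU over a window of fused feature vectors, then read out 3 forces."""
    h = [0.0] * n_hidden
    for x_t in window:
        h = gru_step(x_t, h, p)
    return [sum(v * hi for v, hi in zip(v_row, h)) for v_row in p["V"]]


def mse(pred, target):
    """Mean square error, the loss named in the text."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


# Concat fusion: CNN features joined with other condition features (made-up values).
cnn_features = [[0.2, -0.1, 0.4], [0.1, 0.0, 0.3]]
other_features = [[0.05, 0.3], [0.07, 0.2]]
window = [c + o for c, o in zip(cnn_features, other_features)]

params = init_params(n_in=5, n_hidden=4)
force = predict_force(window, params, n_hidden=4)
loss = mse(force, [0.0, 0.0, 0.0])
print(len(force), loss >= 0.0)
```

In practice a deep learning framework would supply the GRU layer, automatic differentiation, and the Adam optimizer; the sketch only makes the data flow from fused features to the three-way force vector explicit.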
The following sections describe data pre-processing, feature extraction, network construction, and the training process in more detail.

Methodology
In this section, we first introduce the preprocessing of the time-varying working condition data of the machining process and then state the extraction of force-sensitive features at the clamping point. Finally, we elaborate the process of our proposed force prediction model for the clamping point.

Pre-processing of machining condition data
In order to improve the quality of the extracted data features, we need to pre-process the raw data, i.e., remove the redundant data, supplement the missing data and unify the inconsistent data through certain data processing methods, and then lay the foundation for the construction of the subsequent time-varying working condition feature extraction model and clamping force state prediction model.

(1) Missing value handling
Given the characteristics of time-varying machining conditions, we choose the layered mean interpolation method to supplement the missing values. The interpolation values under the hierarchical mean interpolation method are calculated as follows:
$$\bar{y}_k = \frac{\sum_{i=1}^{n} a_i y_i}{\sum_{i=1}^{n} a_i}$$
where $\bar{y}_k$ is the interpolation value of the k-th layer; $a_i$ indicates whether data are missing at the i-th cell in the k-th layer ($a_i = 1$ means no missing data, $a_i = 0$ means missing data); $y_i$ is the data value at the i-th cell; and $n$ is the number of data cells in the k-th layer.
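As a small illustration of layered mean interpolation, the sketch below fills each missing cell with the mean of the observed cells in its own layer. How the layers are formed is not specified in the text, so splitting the series into consecutive blocks of `layer_size` samples is an assumption, as are the function names and the sample values.

```python
from typing import List, Optional


def layered_mean_fill(layer: List[Optional[float]]) -> List[float]:
    """Fill missing cells (None) in one layer with the mean of its observed cells."""
    observed = [y for y in layer if y is not None]  # cells with a_i = 1
    if not observed:
        raise ValueError("layer has no observed values to average")
    layer_mean = sum(observed) / len(observed)      # sum(a_i * y_i) / sum(a_i)
    return [layer_mean if y is None else y for y in layer]


def fill_by_layers(series: List[Optional[float]], layer_size: int) -> List[float]:
    """Split a series into consecutive layers (an assumption) and fill each one."""
    filled: List[float] = []
    for start in range(0, len(series), layer_size):
        filled.extend(layered_mean_fill(series[start:start + layer_size]))
    return filled


signal = [1.0, None, 3.0, 4.0, 10.0, None, None, 14.0]
print(fill_by_layers(signal, layer_size=4))
```

Filling within a layer rather than with the global mean keeps the interpolated values close to the local level of a continuously changing signal.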
(2) Wavelet threshold denoising Due to the influence of the machining environment, machine tool system, or sensor performance, the collected time-varying working condition data usually contain noise, which can affect the model accuracy, so it is necessary to denoise and filter the raw data. Based on the principle of the wavelet transform and the good application results of the wavelet threshold denoising method [30,31], we choose the wavelet threshold denoising (WTD) method for noise reduction and filtering of the original time-varying working condition data.
There are two commonly used threshold calculation methods for wavelet threshold denoising. One is the generic threshold given by Donoho et al., which is calculated by the following equation:
$$\lambda = \sigma \sqrt{2 \ln N}$$
where $\sigma$ is the noise intensity and $N$ is the signal length.
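The generic (universal) threshold is straightforward to compute once the noise intensity is known. The sketch below is illustrative only: the text does not say how σ is obtained, so estimating it from the median absolute value of the detail coefficients divided by 0.6745 is an assumption (a common choice in wavelet denoising), and the function names are hypothetical.

```python
import math
import statistics


def estimate_sigma(detail_coeffs):
    """Robust noise estimate (assumed): median(|d|) / 0.6745 over detail coefficients."""
    return statistics.median(abs(d) for d in detail_coeffs) / 0.6745


def universal_threshold(sigma, n):
    """Generic Donoho threshold: sigma * sqrt(2 * ln(N)) for a signal of length N."""
    return sigma * math.sqrt(2.0 * math.log(n))


# Example with the 200-sample window length used in the comparison.
print(universal_threshold(sigma=1.0, n=200))
```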
The other is the improved threshold of Zhang et al. [32]. It is calculated by the following equation: where σ is the noise intensity; N is the signal length; and j is the decomposition scale.
In order to determine the most effective wavelet threshold noise reduction method, this paper compares the noise reduction performance of different combinations of thresholds and threshold functions, using the db3 wavelet base, the number of decomposition layers (j = 1), the signal length (N = 2), and the adjustment factor (a = 0.5), on 200 consecutive sampling points of the machine tool vibration signal (Y direction) during piston skirt milling.
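The threshold functions being compared can be written compactly. The sketch below uses the commonly cited form of the soft-hard compromise function, in which coefficients at or above the threshold are shrunk by a·λ; the text does not spell out its exact formula, so this form is an assumption. With a = 0 it reduces to hard thresholding and with a = 1 to soft thresholding.

```python
import math


def hard_threshold(w, lam):
    """Keep coefficients at or above the threshold unchanged, zero the rest."""
    return w if abs(w) >= lam else 0.0


def soft_threshold(w, lam):
    """Zero small coefficients and shrink the rest toward zero by lam."""
    if abs(w) < lam:
        return 0.0
    return math.copysign(abs(w) - lam, w)


def compromise_threshold(w, lam, a=0.5):
    """Soft-hard compromise (assumed form): shrink by a * lam, with 0 <= a <= 1."""
    if abs(w) < lam:
        return 0.0
    return math.copysign(abs(w) - a * lam, w)


coeffs = [3.0, -3.0, 1.0, -0.5]
print([compromise_threshold(w, lam=2.0, a=0.5) for w in coeffs])
```

The adjustment factor a trades the discontinuity of hard thresholding against the constant bias of soft thresholding, which is why an intermediate value such as 0.5 is a natural candidate in the comparison.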
For the same data or signal, the noise reduction achieved by filtering is measured by the ratio between the standard deviation and the mean of the filtered data or signal, and this indicator is defined as the speckle noise index [33]. We quantitatively measure the noise filtering performance of the above six methods by calculating the speckle noise index β of the filtered data; usually, the smaller the β value, the better the noise filtering performance. Suppose the original machine vibration signal data is $X = \{x_1, x_2, \ldots, x_n\}$ and the data obtained after noise reduction filtering is $X' = \{x'_1, x'_2, \ldots, x'_n\}$. The speckle noise index is then calculated by the following formula:
$$\beta = \frac{\sigma_{X'}}{\mu_{X'}}$$
where $\sigma_{X'}$ and $\mu_{X'}$ are the standard deviation and the mean of the filtered data $X'$.
Using this formula, the noise reduction and filtering performance for machine tool vibration signals under the above six combinations is analyzed quantitatively, and the specific results are shown in Table 1. As can be seen from the table, when the Donoho threshold is selected and the threshold function is constructed with the soft-hard threshold compromise method, the β value is the smallest, i.e., the machine tool vibration signal data is filtered best. Based on the above qualitative and quantitative analysis results, we choose the Donoho threshold and the soft-hard threshold compromise method for noise reduction and filtering of the time-varying working condition data.
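The speckle noise index follows directly from its definition as standard deviation over mean. In the sketch below, the use of the population standard deviation is an assumption (the text does not say whether the population or sample form is meant), and the function name and sample values are hypothetical.

```python
import statistics


def speckle_noise_index(filtered):
    """Ratio of standard deviation to mean of the filtered signal (smaller is better)."""
    mean = statistics.fmean(filtered)
    if mean == 0:
        raise ValueError("index is undefined for a zero-mean signal")
    return statistics.pstdev(filtered) / mean  # population std is an assumption


print(speckle_noise_index([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

Because the index divides by the mean, it is only meaningful for signals whose mean is clearly nonzero over the window being compared.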
(3) Data standardization The data of machining conditions often have different magnitudes or units, and it is necessary to standardize these data to eliminate the influence of scale differences between index parameters and make the data indicators comparable.
The main common standardization methods are Z-score, Min-max, and nonlinear normalization. Considering that the time-varying machining data (such as cutting force signal data, machine vibration data, etc.) processed in this paper take both positive and negative values, the Z-score method is chosen directly to process the data.
The Z-score is calculated from the mean and standard deviation of the data set, and the data after this standardization have a mean of 0 and a standard deviation of 1. For a data set $X = \{x_1, x_2, \ldots, x_n\}$ and any $x \in X$, the transformation is calculated as follows:
$$x' = \frac{x - \mu}{\sigma}$$
where $x'$ is the standardized z-score; $\mu$ is the mean of the samples in the dataset; and $\sigma$ is the standard deviation of the samples in the dataset.
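A minimal sketch of Z-score standardization for one signal channel is shown below. The use of the population standard deviation and the sample values are assumptions made for illustration.

```python
import statistics


def z_score(values):
    """Standardize a channel to zero mean and unit standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std is an assumption
    if sigma == 0:
        raise ValueError("cannot standardize a constant signal")
    return [(x - mu) / sigma for x in values]


vibration = [-2.0, 0.0, 2.0, 4.0]  # made-up values with both signs
print(z_score(vibration))
```

When the model is deployed, the mean and standard deviation computed on the training data would normally be reused for new data so that both are on the same scale.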

Machining conditions feature extraction
In this section, the time-varying machining condition data in time window Tw is treated as a single-channel pixel matrix image, and then the CNN network is used to extract the condition features adaptively. The main module of the CNN network is the convolutional layer, where a set of weights called filter banks is used to connect the previous layer and features and to extract features with highly relevant data. The subsequent pooling process is similar to data compression, and the main purpose is to reduce the dimensionality of the data features and the amount of parameter computation by performing down-sampling operations on the neurons of the convolutional layer.
However, considering that CNN networks can process small image blocks without pooling operations [34], in this paper, the time-varying work data matrix is considered as small pixel images (i.e., small image blocks), and the feature extraction model does not perform the corresponding pooling operations after the convolution operation.
This paper introduces the feature extraction process using the example of piston skirt milling process data containing three types of time-varying working conditions (X/Y/Z three-way machine vibration data, X/Y/Z three-way force data at the clamping point, and X/Y/Z three-way cutting force data). The network structure of the feature extraction model is shown in Fig. 2. The sample data of time-varying machining conditions ($C_{t\_v}$) form a 200 × 9 time-varying working condition data matrix after interval sampling Sv. Since this paper uses the CNN network only for convolution operations without pooling, 64 feature matrices of size 200 × 9 are obtained after 4 convolutions and are then flattened into a 200 × 576 matrix (64 × 9 = 576), so the dimensionality of the final extracted features is not significantly reduced. The specific process is shown in Table 2. The error backward transfer process consists of two stages: first the error is transferred backward through the GRU network, and then it continues backward through the CNN network. Since only convolutional layers exist in the network structure of the time-varying machining data feature extraction model in this paper, the parameter update calculation process in Step 7 is given as follows:

Forward calculation process (layer l-1 to layer l):

$$a^{(l)} = g\left(z^{(l)}\right) \tag{6}$$

where M is the number of convolution kernels; g is the ReLU activation function; and $a_m$ and $w_m$ are both network parameters.

Reverse transfer process (layer l + 1 to layer l):

where $\delta^{(l)}$ is the gradient error of the lth convolutional layer; $w^{(l)}_{new}$ is the updated weight of the lth convolutional layer; rot180 refers to rotating the matrix by 180°; and η is the learning rate, which is usually taken in the interval (0, 1).

The time-varying working condition data feature extraction model outputs a 200 × 576 data matrix, that is, 200 vectors of dimension 1 × 576, where each vector corresponds to multiple

Clamping point force prediction model construction
After the data pre-processing and feature extraction, the time-varying and non-time-varying features of the machining process are obtained. To improve the information perception and learning efficiency of the prediction model, we use the Concat algorithm to fuse the two types of working condition features within each sample $C_{t\_v}$ and the three-way clamping force values at the clamping points to construct the input vector of the force state prediction model. The GRU network is then used to learn the time-series correlation between the machining conditions and the force on the clamping point. The prediction model of the force state of the clamping point based on the GRU network is shown in Fig. 3.

Table 2

Input: After data pre-processing, the machining condition data sample $C_{t\_v}$ in time T

Output: Data feature vector $L_T$

Step 1: The original time-varying working condition data $C_{t\_v}$ are subjected to interval sampling, prediction sample division, time window division, data pre-processing, and other operations to obtain the data matrix $(C_{t\_v})_T$ in time T;

Step 2: Convolution process 1 is performed on this data matrix using 32 convolution kernels of size 3 × 3 × 1 to obtain 32 new 200 × 9-dimensional data feature matrices. The number of trainable parameters in the convolution kernels, the number of neurons in the C1 layer, and the total number of connections are shown in the figure above;

Step 3: Convolution process 2 is performed on the data matrix obtained from the first abstraction using 32 convolution kernels of size 3 × 3 × 32 to obtain 32 new 200 × 9-dimensional data feature matrices. This process is the second abstraction of the hidden features in the data matrix;

Step 4: Convolution process 3 is performed on the data matrix obtained from the second abstraction using 64 convolution kernels of size 3 × 3 × 32 to obtain 64 new 200 × 9-dimensional data feature matrices. This process is the third abstraction of the hidden features in the data matrix $(C_{t\_v})_T$;

Step 5: As above, convolution process 4 is performed on the data matrix obtained from the third abstraction using 64 convolution kernels of size 3 × 3 × 64 to obtain 64 new 200 × 9-dimensional data feature matrices. This process is the fourth abstraction of the hidden features in the data matrix;

Step 6: The 64 data feature matrices of size 200 × 9 extracted by the fourth abstraction are flattened, in order, into a 200 × 576-dimensional data matrix;

Step 7: The data matrix obtained from Step 6 is used as input, the force value data $f_{T+t}$ in the X/Y/Z directions at the clamping point at time T + t is used as the label, and the parameters in the model are updated and adjusted using the principle of reverse error transmission. The final 200 × 576-dimensional data matrix obtained is the time-varying machining condition data feature $L_T$ of the predicted sample at time T.

(1) Model Structure
As can be seen from Fig. 3, the model from the bottom up mainly consists of three parts: prediction model input, prediction model forward calculation, and prediction model backward tuning. The details of each part are as follows. The first part is the input of the prediction model. In each time window of the current time period, we use the Concat algorithm to fuse the non-time-varying working condition feature vector, the working condition feature vector extracted by the CNN, and the three-way force value at the clamping point (the data value corresponding to the last moment in the time window), and then use the result as the input of the prediction model.
The fusion performed by the Concat algorithm can be understood simply as the stitching of vectors: if, in a time window Tw, the non-time-varying condition feature vector is 1 × m dimensional, the time-varying condition feature vector is 1 × n dimensional, and the three-way force value vector at the clamping point is 1 × k dimensional, then the fused input vector is 1 × (m + n + k) dimensional. The second part is the forward calculation of the prediction model, which takes the fusion vectors in each time window of the current time period as input and, after forward calculation by the GRU network, outputs the predicted values of the force state of the clamping point at a future time. The third part is the backward tuning of the prediction model, in which the error back-propagation principle is used to update and adjust the weights and biases of the neurons in the network. In this paper, the mean square error (MSE) is used as the loss function in the backward tuning process, and the gradient descent process is optimized using the Adam algorithm.
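The vector stitching described above can be illustrated with a minimal sketch. The sizes m = 6, n = 576, and k = 3 follow the values used in this paper (six non-time-varying parameters, one 576-value row of CNN features, and the X/Y/Z clamping force); the function name `fuse_features` and the zero-filled placeholder vectors are hypothetical and stand in for real data.

```python
def fuse_features(non_time_varying, time_varying, clamp_force):
    """Concatenate the three per-window vectors into one GRU input vector."""
    return list(non_time_varying) + list(time_varying) + list(clamp_force)


m, n, k = 6, 576, 3
non_time_varying = [4, 3, 8, 220, 70, 0.8]  # (Z, w, D, n, vf, ap), first cutting-parameter set
time_varying = [0.0] * n                     # placeholder for one row of CNN features
clamp_force = [0.0] * k                      # placeholder X/Y/Z force at the clamping point

fused = fuse_features(non_time_varying, time_varying, clamp_force)
print(len(fused))  # 585
```

The fused length m + n + k = 585 is the per-window input dimension of the GRU network.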

(2) Model training
Although the cutting force and the force at the clamping point that are input to the GRU network are expressed in different coordinate systems, this has little effect when only the temporal relationship between them is mined. As the cutting process proceeds, the two types of force data are recorded simultaneously at discrete tool points, at different sampling frequencies, so their temporal correspondence is captured, and the time window described above aligns them more precisely. If needed, the two types of forces can also be converted into a common coordinate system by using the conversion relationship between tool coordinates and workpiece coordinates; a more detailed coordinate conversion process can be found in Reference [35]. Here, this paper takes the three-way components of these two types of forces as input and focuses on their relationship in temporal order.
The structure of the GRU network neurons is shown in Fig. 4. The GRU network is a "two-input, two-output" structure, which uses two "gates" (update gate and reset gate) to operate on the incoming hidden layer information $S_{t-1}$ at time t-1 and the input information $x_t$ at time t. Each "gate" structure is composed of a Sigmoid activation function and a point-by-point multiplication operation. The reset gate $r_t$ determines the weight of the hidden layer information $S_{t-1}$ at time t-1 in the candidate hidden layer information $S_t'$ at time t, and the update gate $z_t$ determines the weight of the candidate hidden layer information $S_t'$ at time t in the hidden layer information $S_t$ at time t. The candidate hidden layer information $S_t'$ at time t is obtained by passing the hidden layer information $S_{t-1}$ at time t-1 through the reset gate and then, together with the input $x_t$ at time t, through the tanh activation function.
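To make the gate mechanics concrete, the following is a minimal sketch of one step of a standard GRU cell in plain Python. It is not taken from the paper: the blending convention $S_t = (1 - z_t) \odot S_{t-1} + z_t \odot S_t'$ is one common choice and is assumed here, bias terms and the output layer $W_o$ are omitted, and the helper names (`sigmoid`, `matvec`, `gru_step`) are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(s_prev, x, W_r, W_z, W_h):
    """One GRU step; each weight matrix has shape (hidden, hidden + input)."""
    concat = s_prev + x                                    # [S_{t-1}, x_t]
    r = [sigmoid(v) for v in matvec(W_r, concat)]          # reset gate r_t
    z = [sigmoid(v) for v in matvec(W_z, concat)]          # update gate z_t
    gated = [ri * si for ri, si in zip(r, s_prev)] + x     # [r_t * S_{t-1}, x_t]
    s_cand = [math.tanh(v) for v in matvec(W_h, gated)]    # candidate state S_t'
    # Blend the previous state and the candidate state with the update gate.
    return [(1 - zi) * sp + zi * sc for zi, sp, sc in zip(z, s_prev, s_cand)]


# With all-zero weights both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous state.
zeros = [[0.0] * 3 for _ in range(2)]
s_next = gru_step([1.0, -2.0], [0.3], zeros, zeros, zeros)
print(s_next)
```

Unrolling the network along the time axis amounts to calling `gru_step` once per time window, feeding each returned state into the next call.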
The specific training procedure is as follows. The first step is the forward calculation of the GRU network, in which $W_r$ is the weight matrix of the reset gate, $W_z$ is the weight matrix of the update gate, $W_h$ is the weight matrix applied when the hidden layer information $S_{t-1}$ at time t-1 is used as input, $W_o$ is the weight matrix of the output layer, [ ] indicates that the two matrices are concatenated, and $y_t$ is the output value of the GRU network at time t.

Fig. 4 The structure of GRU network neuron

From the forward calculation, $W_r$, $W_z$, $W_h$, and $W_o$ are the weight parameters to be trained and learned by the GRU network, and the first three can be partitioned according to their composition.

Assume that the actual value of a sample at time t is $y_t'$ and the output value at time t obtained using the GRU network is $y_t$. The transmission loss of the network at time t is then computed with the mean square error (MSE) as the loss function, and the transmission loss over the whole time period T is obtained from the losses at the individual moments. The back propagation through time algorithm can then be used to find the intermediate error terms in the network at time t.

The derivative of the loss with respect to a weight matrix is the gradient of that weight matrix, and the sum of the gradients at each moment is the final gradient over the total time period. The gradient of each weight parameter matrix ($W_o$, $W_{zx}$, etc.) in time period T is calculated accordingly. Finally, a new round of updating of each parameter of the GRU network model can be carried out by using these gradients and selecting a suitable learning rate η.
In the Adam update, $g_t$ is the current gradient value of the parameter; $m_t$ and $n_t$ denote the first-order and second-order moment estimates of $g_t$, respectively; p and q are real numbers between 0 and 1; and c is a non-zero constant.
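For reference, a single Adam update for one scalar parameter can be sketched as follows, using the symbol names above (g, m, n, p, q, c, η). This is the standard bias-corrected Adam rule and is assumed, not confirmed, to match the formulas used in this paper; the default values p = 0.9, q = 0.999, and c = 1e-8 are common defaults chosen here for illustration, and the function name `adam_step` is hypothetical.

```python
import math


def adam_step(theta, g, m, n, t, eta=0.0015, p=0.9, q=0.999, c=1e-8):
    """One Adam update for a scalar parameter theta at step t (t starts at 1)."""
    m = p * m + (1 - p) * g            # first-order moment estimate of the gradient
    n = q * n + (1 - q) * g * g        # second-order moment estimate of the gradient
    m_hat = m / (1 - p ** t)           # bias correction for the zero initialisation
    n_hat = n / (1 - q ** t)
    theta = theta - eta * m_hat / (math.sqrt(n_hat) + c)
    return theta, m, n


# First update of a parameter starting at 1.0 with gradient 2.0.
theta, m, n = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
print(theta, m, n)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly η regardless of the gradient's magnitude.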

Data
The piston skirt is one of the key parts of a diesel engine. Due to the residual stress of the piston blank, the stress change caused by cutting, and the clamping force in machining, the piston skirt is prone to machining deformation. Considering the machining deformation problem in the actual milling process of a diesel engine piston skirt, this section collects the relevant data during the machining of this piston skirt and verifies the proposed method by predicting the force state of its clamping point. To select a suitable piston skirt machining process from which to obtain the relevant data, the key processes of the piston skirt are analyzed first.
The key machining processes of piston skirt are shown in Table 3.
Compared with other machining processes, the arc milling process of the piston skirt has a higher material removal volume and does not require any changes to the existing fixture and its clamping method, which can avoid bringing uncertainties that affect the actual machining. Moreover, the clamping point of the milling process is closer to the position of the arc to be machined, and the force change at the clamping point is more obvious, so it is easy to collect the data of the force at this point. Tool diameter, tool edge number, spindle speed, milling depth, and feed rate are non-time-varying data during machining, which are mainly collected by manual recording methods. For time-varying conditions such as machine vibration data, clamping point force data, and cutting force signals, various sensors are required to collect the corresponding data. Table 4 shows the types of sensors and their acquisition frequencies.
Machine tool vibration data is collected by Bosch XDK sensors. The sensor is fixed on the machining platform of the machine tool; pressing the data acquisition button at the start of machining and the stop button at the end records the vibration signal for that period and stores it on the SD card. The details are shown in Fig. 5.
The clamping force data of the clamping point is collected by the XR-D7 three-way force sensor and the corresponding eight-channel data acquisition card. Two three-way force sensors are integrated into the fixture platen so that the sensor force point is in constant contact with the piston skirt stop without relative slippage, and then the two sensors are connected to an eight-channel data acquisition card so that the clamping force can be collected using the corresponding data acquisition software. The specific settings are shown in Fig. 6.
The cutting force data is collected by means of a SIPKE toolholder. The data collected by the toolholder can be transmitted to the computer via a wireless receiver. Since the toolholder collects the bending moment data in X/Y direction and the cutting force data in Z direction, we use the bending moment value data and the tool overhang length to calculate the cutting force data in X/Y direction. The details are shown in Fig. 7.
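The conversion from bending moment to X/Y cutting force can be sketched as a simple lever relation, force = moment / overhang. The sketch assumes that the toolholder reports the bending moment in N·m, that the tool behaves as a rigid lever whose arm equals the overhang length, and that the mapping between moment axes and force directions is handled elsewhere; none of these details are given in the text. The function name and the example moment value are hypothetical.

```python
def force_from_bending_moment(moment_nm, overhang_mm):
    """Estimate a lateral cutting force (N) from a bending moment (N*m).

    Treats the tool as a rigid lever: moment = force * overhang.
    """
    if overhang_mm <= 0:
        raise ValueError("overhang must be positive")
    return moment_nm / (overhang_mm / 1000.0)  # convert mm to m before dividing


OVERHANG_MM = 35.0         # tool overhang reported for the experiments
example_moment_nm = 3.5    # hypothetical bending moment reading, not measured data
force_n = force_from_bending_moment(example_moment_nm, OVERHANG_MM)
print(round(force_n, 6))
```

In practice the effective lever arm is the distance between the cutting point and the plane in which the moment is sensed, so the overhang length is only an approximation of it.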
The milling cutter parameters are 4 edges, 8 mm diameter, and 35 mm overhang. Cutting experiments were performed at 9 different cutting parameters as shown in Table 5:

Training details and comparative experiments
Firstly, the sampling interval is set for the three time-varying conditions to ensure their synchronization in terms of data volume. Then, the time synchronization of the four time-varying data is ensured by taking the time-varying data with the least amount of data as the base. The specific settings are shown in Table 6.
Considering that the machine vibration data acquisition frequency is only 0.1 kHz, the time window Tw length and the sample length are set to 0.01 s and 2 s, respectively. Then, a total of 120,704 samples were obtained for the nine sets of processing parameters based on the idea of rolling prediction. The input of the clamping point force state prediction model is a sequence of 200 input vectors, one per time window (sample length divided by time window length), and each input vector includes three parts, i.e., non-time-varying working condition features, time-varying working condition features, and force at the clamping point. The first set of cutting parameters in the table is taken as an example to illustrate the process of obtaining the inputs to the model.
The tool used in the experiment has 4 edges and a diameter of 8 mm. Since this paper only uses the third tool path strategy of the whole project experiment, namely the zigzag tool path, its serial number 3 is simply taken as the description of the tool path strategy. Taking the first set of cutting parameters in Table 5 as an example, the non-time-varying working condition feature vector $C_{n\_t\_v}$ can be denoted as follows:

$$C_{n\_t\_v} = (Z, w, D, n, v_f, a_p) = (4, 3, 8, 220, 70, 0.8)$$

The time-varying conditions $C_{t\_v}$ include cutting force, machine vibration, and clamping force. The three time-varying conditions are pre-processed (missing value filling, wavelet threshold noise reduction, and data normalization) before feature extraction. Taking the cutting force under the first set of cutting parameters as an example, the wavelet basis is set to the db3 wavelet, the number of decomposition layers to 1, the signal length to 2, and the adjustment factor to 0.5; the cutting force data before and after noise reduction filtering are shown in Fig. 8. Then, the 200 × 9 × 1 dimensional data matrix composed of the three pre-processed time-varying conditions is input into the feature extraction model proposed in Sect

The sample length and the time window length in this paper are 2 s and 0.01 s, respectively, so the constructed GRU network contains a total of 201 GRU structural units after being unrolled along the time axis. Each of the first 200 units takes as input the 1 × 585 fusion vector of the corresponding time window, and the 201st unit outputs the predicted clamping force state.
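The rolling construction of prediction samples (0.01 s time windows, 2 s samples, hence 200 windows per sample) can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: the window slides by one time window at a time, the label lies `horizon` windows after the end of the input, and the clamping force occupies columns 3 to 5 of each 9-value row. The function name `rolling_samples` is hypothetical and the rows are placeholder values, not measurements.

```python
def rolling_samples(rows, windows_per_sample=200, horizon=1, force_cols=slice(3, 6)):
    """Build (input, label) pairs by sliding one time window at a time.

    rows: one 9-value row per 0.01 s time window.
    The label is the clamping-force part of the row that lies `horizon`
    windows after the end of the input sequence.
    """
    samples = []
    last_start = len(rows) - windows_per_sample - horizon
    for start in range(last_start + 1):
        x = rows[start:start + windows_per_sample]
        y = rows[start + windows_per_sample + horizon - 1][force_cols]
        samples.append((x, y))
    return samples


WINDOWS_PER_SAMPLE = round(2.0 / 0.01)         # 2 s sample / 0.01 s window = 200
rows = [[float(i)] * 9 for i in range(205)]    # placeholder data, not measurements
samples = rolling_samples(rows, WINDOWS_PER_SAMPLE)
print(len(samples), len(samples[0][0]), samples[0][1])
```

With 205 placeholder rows and a one-window horizon this yields five overlapping samples, each holding 200 consecutive windows and a three-component force label.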

Result and discussion
Based on the above prediction model, we used cross-validation to verify the accuracy of the clamping force state prediction model. That is, eight of the nine milling data sets in Table 7 are selected in turn as the training set and the remaining one as the test set. The cross-validation scheme is shown in the following table.
The prediction samples of the training set in each of the nine validation schemes are first randomly shuffled, and the corresponding prediction models are then trained. Training uses a learning rate of 0.0015, a batch size of 128, and 200 training iterations, and the gradient descent process is optimized using the Adam algorithm. The change curves of the loss function (MSE value) during training under the nine validation schemes are shown in Fig. 9.
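The leave-one-set-out scheme and the shuffling of training samples can be sketched as follows. The order in which the sets are held out, the fixed random seed, and the three placeholder samples per set are assumptions for illustration only; the function name `leave_one_set_out` is hypothetical.

```python
import random


def leave_one_set_out(set_ids):
    """Yield (training set IDs, test set ID) pairs, holding out each set once."""
    for held_out in set_ids:
        train_ids = [s for s in set_ids if s != held_out]
        yield train_ids, held_out


schemes = list(leave_one_set_out(list(range(1, 10))))  # nine data sets, nine schemes

rng = random.Random(0)                 # fixed seed: an assumption, for repeatability only
train_ids, test_id = schemes[0]
# Hypothetical sample index: three placeholder samples per training set.
train_samples = [(set_id, i) for set_id in train_ids for i in range(3)]
rng.shuffle(train_samples)             # shuffle training samples before batching

print(len(schemes), train_ids, test_id)
```

Holding out a whole data set, rather than random samples, means each test set comes from a cutting-parameter combination that the model has not seen during training.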
The fitting effect of the predicted (X/Y/Z three-way force) curves for the test sets of the nine validation schemes is shown in Fig. 10.
In order to visually illustrate the prediction model effect, one sample was taken from each of the nine sets of validation scheme test sets, and the results of the true and predicted values are shown in Table 8.
To demonstrate the advantages of the CNN-GRU prediction model, we select datasets 1-8 in Table 5 as the training sets, and train the CNN-LSTM model, CNN-RNN model, and CNN-BP model, respectively, while the performance of each model is evaluated in comparison using dataset 9 as the test set. Figure 11 shows the variation of the loss value (MSE value) for each model during the training process.
The test results of each prediction model are shown in Fig. 12 (only the results of the three-way force prediction for clamping points with 100 samples are shown).
It can be seen that the CNN-GRU and CNN-LSTM models give better prediction results than the CNN-RNN and CNN-BP models. Further, we chose root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (CC), and Nash-Sutcliffe efficiency coefficient (NSEC) as assessment metrics to quantitatively compare the predictive ability of the four models. The mean values of the evaluation indexes calculated for each model are shown in Table 9.
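The four assessment metrics can be computed as in the following sketch, which uses their usual textbook definitions. Taking CC to be the Pearson correlation coefficient is an assumption, since the text does not give the formula; the function names and the example values are hypothetical and are not results from this paper.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(_mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return _mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def correlation(y_true, y_pred):
    """Pearson correlation coefficient (assumed definition of CC)."""
    mt, mp = _mean(y_true), _mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def nash_sutcliffe(y_true, y_pred):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mt = _mean(y_true)
    residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    total = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - residual / total


y_true = [1.0, 2.0, 3.0, 4.0]   # illustrative values only
y_pred = [2.0, 3.0, 4.0, 5.0]   # a constant offset of +1
print(rmse(y_true, y_pred), mae(y_true, y_pred),
      correlation(y_true, y_pred), nash_sutcliffe(y_true, y_pred))
```

The example shows why several metrics are reported together: a constant offset leaves the correlation coefficient at its maximum while RMSE, MAE, and the Nash-Sutcliffe coefficient all register the error.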

Conclusion
Accurate prediction of the force state at the clamping point of thin-walled parts is an effective way to optimize fixture clamping and further suppress machining distortion. In this paper, we propose a CNN-GRU based force prediction model for clamping points. The proposed method takes full advantage of the dynamic time-series correlation between the complex working conditions of the machining process. The original data of the machining process is first pre-processed, and the CNN is used to perform adaptive feature extraction on the original training data to reduce its dimensionality while obtaining new feature data that better reflect the essence of machining. Then, in view of the time-series correlation between the time-varying conditions and the change of the clamping point force state itself, the GRU network is constructed and trained to obtain a model capable of retaining machining information for a long time and predicting the force on the clamping point under complex time-varying conditions. Finally, experimental validation was conducted using the piston skirt milling process, and the results demonstrate the effectiveness of the proposed CNN-GRU prediction method. However, this paper only predicts the force state of key clamping points during machining, while the deformation of the workpiece results from the overall force on the workpiece. Therefore, extending the proposed method from force prediction at clamping points to force prediction at any point on the workpiece, and to prediction of the overall force distribution of the workpiece, is left for future research.
Authors' contributions Enming Li conceived the research methods, performed the experiments, and wrote the manuscript. All authors contributed to the collection and analysis of the data and to the revision of the manuscript.
Funding This research was supported by the National Key R&D Program of China (No. 2019YFB1703800).

Availability of data and material
The raw/processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.