Compilation method of CNC lathe cutting force spectrum based on kernel density estimation of G-SCE

The cutting force spectrum of the CNC lathe is the basic data for the reliability design, reliability test, and reliability evaluation of the CNC lathe and its components. Due to the complex and changeable turning conditions and different cutting processes, the cutting load presents multi-peak characteristics. At the same time, grouping the counted load cycles when parameter modeling will produce certain errors. As a result, the parameter distribution model cannot meet the modeling requirements. Thus, a compilation method based on kernel density estimation (KDE) of goodness-smoothness comprehensive evaluation (G-SCE) is proposed. The KDE is used to establish the dynamic cutting force distribution of the CNC lathe in which grouping the counted load cycles is not needed. For the bandwidth-determining methods, the rule of thumb method (ROT) and the least-squares cross-validation method (LCV) do not take into account the influence of different bandwidths on the goodness estimation and the smoothness of the estimated curve, and the G-CSE for KDE is proposed to determining the optimal bandwidth. It includes the estimation accuracy test method based on multiple goodness-of-fit tests and the smoothness test method based on the envelope curve, and the entropy method is used to comprehensively weights the estimated goodness index and the smoothness index to determine the optimal bandwidth. The results of the case analysis indicate that the method proposed can solve the problem of too large estimation error of parameter distribution for multimodal distribution. At the same time, it can better comprehensively evaluate the KDE under different bandwidths. In short, a new method of optimal bandwidth selection is proposed in the original method.


Introduction
The load spectrum is the image, table, matrix, or other probability characteristic values obtained by counting the time history of the load of the product or its components. It shows the relationship between the load and the occurrence frequency or applied time [1,2]. Load spectrum can reflect the distribution law of random load, so it is widely used in reliability test of component structure [3,4], fatigue life prediction [5][6][7][8], reliability design [9,10], accelerated test [11], and other aspects. Compiling a scientific and reasonable load spectrum of CNC machine tools can provide basic data for the reliability design, reliability tests, and reliability evaluation of key components [12,13].
The main steps of compiling the load spectrum include obtaining the load, counting the load and extrapolating the load [1,14,15], among which the extrapolation for load is the most important step. For the obtained cutting force data, the rain flow counting method is used for counting, and the result is the frequency of the mean and amplitude [16]. In the conventional process of building spectrum, the counting results are classified and combined to groups according to the set number, and the frequency statistical results of the mean and the amplitude of the cutting force are obtained. Finally, the statistical results are fitted to obtain the distribution function of the mean and amplitude. In this process, parametric estimation and nonparametric estimation can be used [1,3,5,]. Due to the diversity and complexity of load conditions, the load exhibits multimodal characteristics [17], and the parametric extrapolation is also derived from the single distribution [1] to the mixed distribution [9,18]. However, there is often a large gap between the basic function assumption of the parametric model and the actual model [15,19]. Meanwhile, it is inaccurate to use the interval mean value to represent the load value in the interval [6] because the distribution of the load value in the interval is not necessarily uniform, which will produce larger errors. More importantly, the density distribution is greatly affected by the width of the subinterval (i.e., each histogram). If the same original data take different subinterval ranges, the displayed results may be completely different [20]. Therefore, in this case, using parametric regression to model the load distribution is not accurate, and kernel density estimation (KDE) can solve this problem well. Rosenblatt and Parzen [21] proposed the KDE method, one of the nonparametric estimation methods. Since the KDE does not use the prior knowledge of the data distribution [22,23] and does not attach any assumptions, it is a method to study the characteristics of the data distribution from the data sample itself [24]. So, the KDE is more comprehensively and accurately to reflect the distribution characteristics of the sample compared with the parametric estimation method. The establishment of the density distribution function is to set a kernel function at each observation value and take the bandwidth as the boundary. The kernel function within the boundary is the distribution probability of the observation value, and then the accumulation of the kernel function of the observations is the density distribution of the observations [25]. Compared with the parametric distribution method, the KDE does not need to group the original data, and it solves the influence of grouping on parametric estimation from the source [26]. Qin [19] used the KDE to estimate the probability distribution of wind speed in different regions and believed that the kernel density method was superior to other parametric estimation methods. Siddharth [27] used the KDE to predict the multi-peak data of the smart meter, and it could better meet the prediction requirements.
Compared with the kernel function, the bandwidth has a more obvious influence on the accuracy of KDE [24,26]. Duong [28] proposed a method to determine the bandwidth of KDE by using the rule of thumb (ROT). The principle of ROT is to solve the optimal bandwidth with the goal of minimum mean integrated square error (MISE). At the same time, he also proposed the least-squares cross-validation (LCV) method to determine the bandwidth, which is based on the minimum integrated square error (ISE) as the goal to solve the optimal bandwidth [29]. On this basis, Bayesian methods [13,22], adaptive methods [30], and diffusion methods [26] have been developed. Dias [30] modeled the river runoff with random data based on the KDE. Compared with the conventional method, it verified the effectiveness of the KDE. Bo HU [20] used the KDE, whose was bandwidth determined by the ROT based on MISE, to establish the probability density distribution of the wind speed in a specific area. He also evaluated the sensitivity of each kernel function and confirmed that the Gaussian kernel function was the optimal kernel function. Based on the above analysis, the ROT based on MISE and the LCV based on ISE are used as bandwidth determination methods, and the Gaussian kernel function is used as the kernel function for KDE in this paper.
For different forms of data distribution, ROT and LCV have corresponding application conditions. It is necessary to determine which method is suitable to corresponding data. Literature [19] proposed using Kolmogorov-Smirnov test (K-S test) and chi-square test to determine the optimal KDE, but two or more tests may have opposite test results due to different test values. It is necessary to make a comprehensive evaluation of multiple test quantities. An evaluation method for KDE based on goodness-smoothness comprehensive evaluation (G-SCE) is proposed in this paper. It uses the K-S test, chi-square test, Anderson-darling test (A-D test), and coefficient of determination to test the goodness of KDE. At the same time, considering that the smoothness of the KDE is also an important factor, the smoothness test of KDE based on the envelope curve was established as well. The entropy method was used to evaluate the weight of each goodness index and smoothness index and calculate the comprehensive evaluation value.
Based on the KDE of G-SCE, the method of establishing the dynamic cutting force spectrum of the CNC machine tool is studied. KDE for rainflow counting results avoids the inaccuracy of probability distribution caused by grouping, and does not need to assume the probability distribution function (PDF). In Section 2, the KDE is introduced, and how to determine the optimal kernel bandwidth is described. In Section 3, a comprehensive evaluation method for the goodness and smoothness of KDE is proposed. Section 4 introduces the acquisition of dynamic cutting force and results of KDE, and the results of the two bandwidth determination methods are compared by G-SCE values.
2 Theory and method 2.1 Kernel density estimation KDE uses observed random variables from an unknown distribution to estimate its density function, and it does not require prior knowledge of the data distribution. It is a predictive estimation using only the sample data itself [31]. Assuming that the density of the random variable X is f, its value can be calculated by the following formula: where h is the length of interval [x-h, x+h] and P(x-h <X <x+h) is the probability that the sample X in the interval. Therefore, the estimated density of f(x) is: , and when |u|>1, X i is not in the interval, so the weight function ω can be defined: Therefore, the estimated density of f(x) at x can be expressed as: Since the definition of the above formula is discontinuous at the boundary of each estimation interval, this will cause the loss of some probability distribution information. In order to achieve a continuous estimation of the probability density, Silverman [31] proposed kernel function K(x) to replace the weight function, and then the kernel density of f(x) at x is estimated as: where h is the bandwidth and N is the number of samples. In KDE, the kernel function K(x) is a weight function, and its shape controls the utilization degree of x when estimating the f(x). There are many types of kernel functions, and commonly used are uniform, triangle, Epanechnikov, Gaussian, quartic, and triweight [31].
Take the Gaussian kernel function as an example to introduce the principle of KDE. For samples with unknown distribution, there is a kernel density function at each sample, and the probability density value at data x can be determined by Eq. (5). Specific examples are shown below. There is a sample set [0.9, 0.6, 2.9, 2.7, 2.8, 2]. As shown by the red dotted line in Fig. 1, at each sample point, there is a Gaussian kernel function, and the result of KDE is the blue line in Fig. 1.
At present, there is no recognized optimal kernel function type. When the bandwidth calculation method is determined, the kernel function type has little effect on KDE, and the effect can be ignored compared with that of bandwidth [24].

Determination of bandwidth
Bandwidth is a key parameter that affects the accuracy of KDE. It reflects the overall smoothness of the kernel density curve and the proportion of observed data points in the formation of the kernel density curve. Different bandwidths have different estimation results as shown in Fig. 2. When the bandwidth is unreasonable, it will bring a poor effect to the KDE. The final density curve is too smooth or too steep. Therefore, it is very necessary to use a suitable method to select the bandwidth.
In order to evaluate the approximation effect of the KDE, evaluation features should be introduced to evaluate the degree of difference between the estimated PDF and the true PDF. Commonly used evaluation features are MISE and ISE.
(1) The expression of MISE is as follows: The (ROT) [28] is introduced to derive the bandwidth: Fig. 1 Gaussian function to estimate kernel density whereσ x is the standard deviation of the sample, X is the mean of sample, and n is the number of samples.
However, the method may lead to over-smooth for multimodal density functions. Therefore, Silverman [31] proposed an improved ROT method to improve the estimation accuracy of complex density functions: whereR is the sample interquartile range.
(2) The calculation of ISE is shown in the following formula: The LCV is introduced to calculate the bandwidth: There is currently no specific selection basis to determine which method can be applied to a certain condition, especially in the field of dynamic cutting force spectrum. The bandwidths obtained by different methods have different values, which have a greater impact on the effect of KDE. Therefore, it is necessary to use appropriate methods to evaluate the effects of KDE obtained by different methods.

Goodness-smoothness comprehensive evaluation (G-SCE)
Different bandwidths will affect the accuracy of KDE and the smoothness of the KDE curve. Literature [6,15] shows that the dynamic cutting force distribution is uneven, showing a multi-peak distribution, which leads to different estimation effects at different positions with different bandwidths. A goodness-smoothness comprehensive evaluation method based on the entropy method is proposed to evaluate the estimation results for cutting force data. As shown in Fig. 3, after bandwidths are calculated by ROT and LCV, smoothness inspection method based on envelope curve and goodness test based on four indexes are combined to evaluate the estimation results of different bandwidths comprehensively by entropy method. Finally, the optimal bandwidth suitable for dynamic cutting force is determined when the comprehensive index F is the smallest.

Goodness test for KDE
The KDE is obtained when the overall distribution is unknown, and the K-S test, A-D test, chi-square test, and coefficient of determination can evaluate the goodness of fit well [10].
The K-S test defines the statistic D based on the largest vertical difference between the observation cumulative PDF F(xi) and the empirical cumulative PDF G(xi): The A-D test measures the integral of the squared distance between the empirical cumulative PDF and the observation cumulative PDF. The A-D test quantity statistic A 2 is: The chi-square test is the degree of deviation between the actual observation value of the statistical sample and the theoretical value. The chi-square test is: where O i is the observed frequency for group i and E i is the expected frequency for group i. The coefficient of determination R 2 is the ratio of the regression sum of squares to the total deviation of squares.
where S re is the residual sum of squares, and S t is the total sum of squares, S t ¼ ∑ n i¼1 y i −y ð Þ 2 . y i is the sample observation value, f i is the model prediction value, and y is the average observation value.

Smoothness test for KDE
The envelopes of the peak and valley points of the estimated curve are drawn, respectively, and the difference between the coordinate values of the upper and lower envelopes is defined as the local smoothness of the curve. The smaller the value, the smoother the curve. The calculation formula of the local smoothness index P z is: where Pu i is the coordinate value of the upper envelope and Pl i is the coordinate value of the lower envelope.
Define the area of the part included in the upper and lower envelopes as the envelope area, P a , whose expression is: where f u (x) is the expression of the upper envelope and f l (x) is the expression of the lower envelope.  Fig. 3 Flow chart of G-SCE

Comprehensive evaluation method based on entropy method
The entropy method believes that different criteria vary with the goal when determining an optimal goal. The value of a criterion changes greatly when the target value changes, indicating that this criterion has a greater impact on the final decision-making and occupies a larger weight value. When the target value changes, the value of a criterion is basically no change, it means that this criterion has little influence in the final decision-making, and its influence can be less considered. Therefore, the entropy method can be used to evaluate the weight of each criterion in the final decision.
The value of the test corresponding to different bandwidths is also different. Suppose the test value of j-th test (total n) corresponding to i-th bandwidth (total m) is x ij and the evaluation matrix is X: The normalization method is used to obtain the standardized fitting test value, and the entropy value E j of each test can be calculated by Eq. (18): where k ¼ 1 lnm .
The entropy weight of each test can be calculated by Eq. (19).
After obtaining the weight vector w=(w 1 , w 2 ,...,w n ), the value of the comprehensive test can be calculated by Eq. (20).
where θ is the coefficient of the evaluation index, for the benefit index (the larger the index value, the better the effect), θ= -1, and for the cost index (the smaller the index value, the better the effect), θ=1. Only the coefficient of determination is the benefit index; others are cost indexes. The smaller the final value F of the comprehensive test index, the better the effect of KDE.

Cutting force data achievement
Stainless steel (1Cr18Ni9Ti) was chosen as the cutting material, and the coated carbide insert (CNMG120408) was selected for the test. The cutting parameters are shown in Table 1. A total of 36 sets of process parameters were carried out.
The dynamic turning force measurement system includes Kistler multi-component dynamometer (9129A), charge amplifier (5070A), data acquisition card (5697A), and computer, as shown in Fig. 4. The dynamometer is fixed on the tool holder of the CNC lathe (HTC 40100n), and the toolbar (MCLNL2525M12) was fixed on the dynamometer through the connector.

Data processing
The rainflow counting method is widely used in engineering, especially the fatigue life analysis of engineering parts [16]. It expresses the measured load history in the form of discrete load cycles. A total of 36 sets of measured dynamic cutting force data were recombined according to different components ( Fig. 5(a)) and counted by rainflow counting method. The counting result is the "from-to" cycle of the cutting force components in the three directions, and the mean and amplitude are used to describe the cycle. The two-dimensional rainflow matrix of the X direction is shown in Fig. 5(b). As shown in Fig. 5, the rainflow counting result is the frequency of amplitude value and mean value. In the previous studies, the statistical results are divided into predetermined load intervals. In the subsequent statistical analysis, there are two problems as mentioned in the Introduction, one is that the mean value of the interval representing the load leads the parameter regression analysis inaccurate because the distribution of the load in the interval is not necessarily uniform. The other is that the density function is greatly affected by the width of the subinterval (i.e., each histogram). If the same original data takes different subinterval ranges, the displayed results may be completely different, and the statistical results will vary with the changing interval.

Estimation of the probability density
When a single parametric distribution or a mixed parametric distribution is used for fitting, the two-dimensional rainflow counting matrix needs to be grouped, for example, into a 32 × 32 matrix or a 64 × 64 matrix. However, the KDE is different from parametric distribution estimation in that the results of rainflow counting do not need to be grouped, which reduces the impact caused by grouping. Instead, it uses the mean and amplitude counting results after rainflow counting directly. Based on the evaluation results of the G-SCE, the ROT and LCV methods for calculating the bandwidth of the kernel density function are judged to determine the optimal KDE of the cutting force distribution.
Based on the above method, the KDE is performed on the rainflow counting results of the dynamic cutting force in different directions, and the results obtained are shown in Fig. 6 to Fig. 9. The results of the goodness test and smoothness test of KDE for dynamic cutting force in the X direction are shown in Table 2.
Rainflow counting is performed on the dynamic cutting force in the X direction, and 427,589 mean-amplitude cycles are obtained. After dividing the cycles into specific intervals, the two-dimensional matrix of the average-amplitude frequency is shown in Fig. 5(b). It can be found that the distribution is not uniform and there are multiple peaks at different locations. The difference between the peak and the valley is quite large. It is difficult to use the parametric estimation to fit them.
The method proposed in this paper is used to establish the dynamic cutting force distribution of the CNC lathe. The LCV method and ROT are used to calculate the bandwidth, and the results of the KDE determined by the two bandwidths are tested by the goodness test and the smoothness test. The results in Table 2 show that the estimation accuracy of the kernel density estimation determined by the ROT is higher than that of the LCV method, regardless of the mean or the amplitude. The smoothness of the KDE curve determined by the LCV method is indistinguishable from the ROT. Finally, the weights of tests are calculated, and the result of comprehensive estimation is that ROT is more suitable for the distribution estimation of the mean and amplitude.
The curves of KDE are shown in Figs. 6 and 7. Figure 6 is the estimation result of the mean distribution, and Fig. 7 is the estimation result of the amplitude distribution. The green line is the KDE curve obtained using the LCV method, the red line is the KDE curve obtained using the ROT, and the histogram is the mean-frequency result of the cycles obtained from rainflow counting which is divided into 100 intervals. It can be seen from A, B, and C in Fig. 6 that when the mean value is small, the peak value of the probability distribution is large, and the difference between the peak value and the valley value is large, which has a bad influence on the estimation result, whether it is LCV method or the ROT. The estimation results in the first half are not completely consistent with reality, which also verifies that the same bandwidth will have different Similarly, for the amplitude, its distribution is a unimodal light-tailed distribution. The result of KDE is shown in Fig.  7, and the ROT is better than the LCV method. Similarly, data processing and rain flow counts are performed on the acquired dynamic cutting forces in the Y direction and Z direction. The rainflow counting result of dynamic cutting force in the Y direction has 469,330 cycles and 471551 cycles in the Z direction. It is similar with the performance in the X direction, and the distribution of the mean of the cutting force in the Y and Z directions is not uniform. There are multiple frequency peaks at different locations, and the difference between the peak and the valley is quite large. It is difficult to use the parametric estimation to fit the mean.
The bandwidths of KDE for the mean and amplitude are calculated by the LCV method and ROT, and the results are shown in Tables 3 and 4. The curves of KDE are shown in Figs. 8 and 9. It can be seen from Fig. 8(a) and Fig. 9(a) that the distributions of mean are multimodal and when the value is small, the peak and the difference between the peak and the valley are larger and the phenomenon has a bad influence on the estimation. The mean of the cutting force in the Z direction is more concentrated in the place where the value is small, and its change trend is more obvious. While the mean of the cutting force in the Y direction is more concentrated in the middle of the distribution, and for the amplitude, its distribution is a unimodal light-tailed distribution. Compared with the X direction, the amplitudes of cutting force in the Y and Z directions are more balanced at the front end of the image, and the trend D D Fig. 7 Probability density estimation of amplitude of dynamic cutting force in the X direction  Table 3 and Table 4. It can be seen that the LCV method is better to determine the bandwidth of the mean and amplitude, and the bandwidth is 23.022 and 3.834, respectively. Similarly, for the dynamic cutting force in the Z direction, the ROT is better than the LCV method for determining the bandwidth of mean, and its value is 6.563. But the LCV method is better for amplitude and the bandwidth is 3.598.

Conclusion
This paper studies the problem of using KDE to model the distribution of dynamic cutting forces. A modeling method based on KDE of G-SCE is proposed. The Gaussian function is selected as the kernel function, and the ROT and LCV methods are used as the bandwidth selection method. As the ROT method and the LCV method do not fully consider the impact of bandwidth on the accuracy and smoothness of KDE, the G-SCE method is proposed to solve this problem. In the G-SCE method, the K-S test, A-D test, chi-square test, and  coefficient of determination are chosen as the test indexes to evaluate the goodness, and the smoothness test method based on the envelope curve is established. The entropy method judged the goodness indexes and the smoothness index comprehensively at last. Rainflow counting is performed on the measured dynamic cutting force of 36 sets. The results show that the mean of dynamic cutting force presents a random multimodal distribution, and a single parameter distribution cannot meet the statistical requirements. The probability density distribution is estimated using the KDE, and the results show that the KDE based on the ROT or the LCV method can describe the multipeak distribution characteristics of the cutting force well, especially at locations with small peak changes.
Simultaneously, the evaluated results of goodness and smoothness by the G-SCE indicate that ROT and LCV methods under different sample data have a different performance. Case analysis shows that the G-SCE method used to estimate the goodness and smoothness can well solve the different performance of ROT and LCV methods under different sample data. The results show that in the X direction, the KDE determined by the ROT can estimates better for the mean and amplitude of the observed dynamic cutting force and its bandwidth is 2.624 and 1.860, respectively. For the dynamic cutting force in the Y direction, the LCV method better estimates its mean and amplitude, with bandwidths of 23.022 and 3.834, respectively. For the dynamic force in the Z direction, the ROT performs better in the mean with a bandwidth of 6.563, but the LCV method performs better in terms of amplitude with a bandwidth of 3.598.
Availability of data and material All the data presented and/or analyzed in this study are available upon request to the corresponding author. a (b) Fig. 9 KDE of mean (a) and amplitude (b) of dynamic cutting force in the Z direction