Correlation analysis of roughness surface height distribution parameters and maximum mises stress

Roughness surfaces contact analysis is cutting-edge research in interface design. The 3D rough surface amplitude distribution characterized by height distribution parameters( Sq (root mean square), Ssk (skewness), Sku (kurtosis)) has a great influence on the extreme value and distribution of the interface contact stress. However, the relationship between height distribution parameters and surface maximum mises stress ( σ max ) is still unclear and lacks of in-depth study. Through BP(Back Propagation) neural network, global sensitivity qualitative (Morris) and quantitative (Sobol) analysis methods, the relationship between Sq , Ssk , Sku and σ max under different loads is studied. Based on complete polynomial and permutation combination method, the optimal correlation model between height distribution parameters and σ max was established, and particle swarm algorithm was introduced to analyze σ max extreme values under different Sq . The results show that: (1) Under different loads, the order about height distribution parameters influence on surface contact stress is: Sq > Ssk > Sku , and as the load increases, the influence of Ssk and Sku gradually decreases. (2) In different roughness surfaces, the influence of Ssk and Sku on the contact performance is significantly full of discrepancy. The research results provide reference and technical support for active design of rough surface microstructure to improved contact performance.


Introduction
Surface morphology significantly affects the contact, wear and friction properties of the components, which restricts its service performance and working hours [1][2][3][4][5]. In order to describe surface topography more accurately, midline characterization method based on the profile section was proposed. The method mainly used surface height arithmetic square root Ra to evaluate the surface quality, and had been widely applied in many industries [6,7]. But with the development of science and technology, the researchers found that only a single roughness parameter characterizing topography can't contain all the surface information. Therefore, roughness parameters based on the characteristics of surface peaks and valleys and Abbott curve: Rp (maximum profile peak height), Rv (maximum profile valley depth), Rmr(relative material ratio) were used to distinguish surface extremum characteristic and bearing performance [8][9][10]. In view of the convenience about roughness parameters, researchers from all over the world defined a series of different roughness parameters according to the actual needs, but ignored the rigor of parameter definition, resulting in the continuous increase of parameters and 'parameter explosion' phenomenon [11].
The redundancy of parameters and the disparate definition of symbols increased the obstacles to surface topography characterization, so the International Standards Organization Technical Committee formulated roughness standard draft ISO 4287 in 1997 to unify parameter names and symbols, and further divide them into amplitude parameters, spacing parameters, hybrid parameters, and material ratio curves and related parameters [12,13] from parameter definitions. However, two-dimensional profile section method was powerless to avoid the loss of topography spatial information and difficult to meet the requirements of modern precision machining. In contrast, three-dimensional surface roughness parameters, widely used in later work, can reflect topography characteristics more comprehensively and truly. The principle of parameter definition is to collect height data in a designated area and store it as a height matrix, and obtain information such as surface height distribution and spatial distribution based on the midplane of the topography [6,14]. Although the three-dimensional parameter classification standard covers functions and related parameters, they are generally used to evaluate the oil storage and lubrication performance of the surface, which cannot be combined with contact performance [15].
It is well known that height distribution parameters are directly related to surface roughness, which significantly affects interface contact strength and fatigue performance of the workpiece. In addition, under the premise of a given autocorrelation function, height distribution parameters are considered to carry surface microstructure statistical characteristics, determined by the manufacturing principle and processing parameters, which can well characterize the surface topography [16,17]. For example, under different roughness conditions, Tayebi et al firstly studied the influence of skewness and kurtosis in Gaussian distribution surface on the static friction coefficient [18]. On the basis, Sedlaček et al set the same root mean square value to observe and analyze the correlation law between skewness, kurtosis and surface dynamic friction coefficient in the contact surface to clarify height distribution parameters influence on the surface friction performance [19,20]. However, Yan et al did a research on the effects of skewness and kurtosis on fatigue life under point contact of mixed elastohydrodynamic lubrication [21]. To reveal the mapping relationship between height statistical distribution parameters and friction and wear properties, Zhu et al explained height distribution parameters influence mechanism about the friction and wear by studying the correlation between height distribution parameters and surface bearing area ratio and establishing the formula between bearing area ratio and friction coefficient in dry friction and lubrication friction [22]. In terms of correlation analysis between contact performance and roughness, Yang et al analyzed the influence of roughness parameters on contact performance with the help of statistical correlation theory, neural network and sensitivity analysis [23,24]. However, the above researches were either based on experiments, whose samples were insufficient and hard to obtain, or did not take height distribution parameters as research objects. Therefore, the related work generally focused on qualitative analysis, leading to limitations in conclusions. And the correlation between height distribution parameters and contact properties of rough surfaces has been obscured by fog.
Since surface reconstruction algorithm can provide large sample in a short time and the default surface morphology characteristics, it is an excellent idea to apply the algorithm to study the relationship between surface properties and performance. And height distribution parameters were the key parameters in the surface reconstruction algorithm, which means that under the premise of a given autocorrelation length and texture direction, a surface with different morphological characteristics can be generated by setting the Sq, Ssk, Sku [25,26]. Secondly, surface maximum mises stress (σ max ) directly reflects the surface bearing capacity and can be well used to evaluate the surface contact performance. As a result, it is wise to clarify the influence of the surface topography on the contact performance by illustrating the relationship between height distribution parameters and σ max .
In the first place, based on the surface reconstruction algorithm proposed by Liao et al [25,26], 264 surfaces with different characteristics were generated in this paper, meanwhile surface maximum mises stress (σ max ) under different loads was obtained from Wen's et al three-dimensional rough surfaces contact analysis model [27], which provided data support for the correlation discussion.
Followed by the first step, considering that there is no direct theoretical analysis model between height distribution parameters and stress, BP neural network was introduced to build a mapping model between them, and the global sensitivity analysis method was used to evaluate the influence of height distribution parameters on contact stress.
Finally, optimal correlation model between height distribution statistical parameters and maximum mises stress was constructed through complete polynomials and combination ideas. The surface contact performance difference under different Sq was demonstrated by particle swarm algorithm. The research quantifies the influence of height distribution parameters on the contact stress under different loads, and provides a reference and technical support for the active design of rough surface microstructures to improve components service performance.

Data sources
On account of the truth that the research focuses on the use of data analysis to dig out the correlation law between height distribution parameters and rough surface contact stress, only a brief introduction to the data source is given. The detailed calculation process of surface reconstruction algorithm and contact model can be found in the literature [23] and [25].

Reconstruction surface
Surface reconstruction algorithm proposed in Literature [23] is used to randomly allocate height distribution parameter values by MATLAB random function under the premise of a given autocorrelation length of 32. The parameter settings are shown in the table 1.
The sampling interval is set as 1 μm, and the total area A=480×640=0.307 mm 2 , and finally 264 sets of non-Gaussian reconstruction rough surfaces are generated. Figure 1 shows two reconstruction surfaces with different roughness.

Surface contact stress
Having 264 reconstruction surfaces been obtained, it is no sweat to calculate σ max from Wen's elastic-plastic contact for three-dimensional rough surfaces considering interaction of asperities. Mechanical property parameters of the surface are set as shown in table 2.
In order to analyze the influence of the surface amplitude characteristics in different loads, dimensionless loads are set to 0.0002, 0.0005, 0.001, and the ratio is 1:2.5:5. Finally, the working conditions based on the three loads are regarded as light load, medium load and heavy load, respectively. The conversion formula between dimensionless load and actual load is as follows [28,29]: P t is the actual load on the surface, P n is the dimensionless load; E is the equivalent elastic modulus; A n is the nominal contact area; The surface is not set to be in full-area contact. because of the usual roll-slip condition in interface contact. A n is 200×640=0.128 mm 2 in the paper.   Although most of surfaces in mises2 exceeds the yield strength, the excess amplitude is low and part of them is still in the elastic deformation stage. In mises3, all the surfaces exceeds the yield strength and enter the plastic deformation stage. It is well known that when considering the surface micro-morphology, the surface plastic deformation is easy to occur in interface contact. The strength criterion cannot be used to simply judge the surface failure. Therefore, stress samples distribution characteristics in figure 2 can clarify the phenomenon that the failure risk of the surface increases as the load increases, which is in line with the preset light and heavy load conditions.

Global sensitivity analysis 2.3.1. Sensitivity analysis model
There is no perfect theoretical analysis model between height distribution parameters and surface contact stress as the research object at present. And the mapping relationship between them is bound to be necessarily nonlinear, so BP neural network is chosen as the sensitivity analysis model.
The structure of BP neural network is composed of input layer, hidden layer and output layer with connecting lines between different layers to ensure the transmission of data. Each connecting line will have a separate weight value to constitute the weight matrix in BP neural network. Based on the weight matrix, the connection strength between neural units in different layers can be quantitatively evaluated. Usually, in complex time series or computer vision data sets, BP neural network will choose multi-layer hiding layer to increase model reliability. However, multi-layer hidden layer is easy to lead to problems such as overfitting and computation speed reduction. Therefore, in general simple data sets, two or more hidden layers are rarely used [30]. Owing to the data characteristics from observation, a single hidden layer network model is constructed. Figure 3 shows BP model structure.
The input layer is Sq, Ssk and Sku, and the output layer is mises1, mises2 and mises3. After training, the  mapping model between height distribution parameters and σ max in multiple loads can be obtained.

Principle of sensitivity analysis
BP neural network provides a reliable model for global sensitivity analysis. Global sensitivity analysis method qualitatively and quantitatively evaluates the influence of different height distribution parameters on σ max by calculating sensitivity coefficient. The relevant principle is as follows:

Morris method: qualitative global sensitivity analysis
Morris sensitivity analysis is a qualitative global sensitivity analysis. The method advantage is that it is stable in sensitivity coefficient calculation, and new samples can be generated from origin samples to conduct research, but insufficient to capture sample features [31], so the method is generally used to preliminarily determine the relative sensitivity at the first step. The analysis process is in display in figure 4. The following are the definitions and explanations of parameters in Morris method.
The matrix size of B 1 is determined by the number of input variables k, usually set to (k+1)×k orderand there is only a single parameter difference between two adjacent rows in the matrix; J 1 also followings the input variables k, set to unit column vector of order k+1; J m is the (k+1)×k order matrix with all positions filled with 1;x * is a randomly generated parametric base sample (x 1 , x 2 , Kx i , K, x k ), belonging to the core in Morris method. Here is its source description. After setting level parameter p, x i becomes a discrete and normalized parameter set M(0, 1/(p-1), 2/(p-2), K, 1). And then x * will be born by means of randomly selecting the element from set M in turn; B is the (k+1)k order matrix, for any element in > D * is a diagonal square matrix of order k, and diagonal element is randomly set to ±1; Therefore, the single Morris sample [32] EE i is calculated by the formula in display.
When the number of generated Morris samples reaches the set value L, there will be L number of EE i values in the last. At this time, the modified mean value * i m is used as the Morris sensitivity coefficient under multiple groups of samples.
is the single sample coefficient calculated under the jth sample of the ith variable.

Sobol method: quantitative sensitivity analysis
Sobol method [33,34] is a quantitative sensitivity analysis based on variance decomposition, which can effectively quantize the influence of a single parameter and parameters interaction in a high-dimensional nonlinear model. So it can well explain the comprehensive influence of a single independent variable on the dependent variable in the case of multiple independent variables. The sample feature capture ability is better than that of Morris method, and thus the sensitive results in large samples are more reliable. The calculation process is as follows.
Through the following three formulas, the model is decomposed into multiple sub-models and the variance V of each order subfunction is obtained, and the ratio Sof the sub-function variance to the total variance is used as the sensitivity coefficient of different variables. S 1,2,K,k is the interaction sensitivity coefficient of all parameters; Through equation (6), it is without an effort to calculate different orders sensitivity coefficient including parameter i. The formula is as follows: is the different order item of containing parameter i. The total effect sensitivity coefficient S T i is used to evaluate the comprehensive influence of different height distribution parameters.

Complete polynomial
Although BP neural network can express the mapping relationship between variables in a fantastic way, the excessive connection lines and complex activation functions restrict its application in regression analysis. Therefore, to face the double in finding the explicit general expression and optimal correlation model between height distribution parameters and σ max , a complete polynomial nonlinear model is introduced to step through the pit. Here's how it works [35,36].
The target response function is expressed as a complete polynomial combination of n independent variables: When m=n, the regression model contains the different order interactions of all variables, and the model has the strongest ability to explain information, which is usually used to optimize the selection of the model. However, it also leads to the explosive growth of equation terms with the increase of m. And the equation, not suitable for the final regression model, is extremely prone to redundant terms. Therefore, it is necessary to prune the equation and select the final regression equation in combination with the engineering meaning and data characteristics.
Considering that it is difficult to directly identify the redundancy of the nonlinear regression equation, a method is proposed to remove redundancy by means of permutation and combination. The calculation process is as follows: (2) Having m been determined in (1), model items N will be fixed. In the next place, according to the combination idea, the adjustment coefficient R k 2 under different combination of equation terms is calculated when k is set, so the number of models is When the total number of equation items is the same, the accuracy of equation fitting will still be inconsistent due to the inconsistency of the selected items. Therefore, the regression model with the highest ( ) R max k 2 under the same number of equation terms is firstly selected.
.  should be fully considered in engineering. Therefore, the final term should be determined in combination with physical meaning and data characteristics.

Particle swarm algorithm
Particle swarm algorithm [37,38] is an excellent optimization algorithm proposed by Eberhart and Kennedy in 1995. Its basic concept originates from the study on the foraging behavior of birds, that is, to find the optimal solution through cooperation and information sharing between different individuals in the population.
First, each particle independently searches for the optimal solution in their own search space, and the optimal solution of each particle will be recorded as the current individual extreme value. In the next step, the optimal solution of the population is determined from different particles, which is also called the current global optimal solution. Finally, according to the current individual extreme value and global optimal solution, the particle search direction and step size will be adjusted continuously until the predetermined goal is reached. The calculation formula is as follows: L is the total number of particles in the population; v i is the particle velocity, x i is the current position; pbest i is the current individual extreme value, gbest i is the current global optimal value; c 1 and c 2 are learning factors respectively, generally a value of 2 [39]; However, this standard form of particle swarm algorithm is not stable. In order to improve the reliability, an improved inertial factor algorithm is introduced. The formula is as follows: Where w is the inertia factor, and its value is nonnegative. When w is large, global optimization ability is strong, but weak in local optimization. On the contrary, global optimization ability is weak with strong local optimization ability.  figure 5.

Results and discussion
As show in figure 5, BP model quickly converges to the target error, and the relative error of contact stress in the test set is generally within 0.075, indicating that the model, used to carry out sensitivity analysis, has good fitting effect and generalization performance and strong reliability.

Sensitivity coefficient
Referring to the research of Song et al [40], set the level parameter p to 10 and disturbance factor m to 0.11 to generate 1000 sets of Morris samples to ensure the reliability of the sensitivity analysis results. And then, due to 264 reconstructed surfaces belonging to large samples, Sobol sensitivity analysis is carried out by the way of utilizing the original samples to better grasp the characteristics of the samples. Sensitivity relative coefficients of Sq, Ssk and Sku are shown in figure 6.
As can be seen from figure 6, in the two qualitative and quantitative sensitivity analyses, the order of influence of Sq, Ssk and Sku under different loads on σ max is Sq Ssk Sku, > > interactively verifying the rationality of the order of parameters.
In addition, although height distribution parameters order of Morris and Sobol sensitivity analysis shows consistency in figure 6, the relative values of the two sensitivity coefficients are still quite different, so it is difficult to compare the influence of height distribution parameters in a same way. In order to observe the comprehensive influence of different height distribution parameters, the relative values of the two sensitivity coefficients are averaged, as shown in table 3.
As can be seen from table 3, when the surface is under light load, the influence of Sq reaches 80%, while the influence of Ssk and Sku is 20%. However, Ssk and Sku will change to about 14% on heavy load. The results show that with the increase of load, the influence of Ssk and Sku decreases gradually, but Sq expands continuously. That is, with the increase of the asperities in surface plastic deformation, the surface morphology has changed greatly. The peak areas of asperities are crushed and flattened when they are in contact, the kurtosis of the entire surface is reduced, and the peak characteristics are attenuated. Therefore, the influence of the two on contact stress is significantly reduced, but this does not mean that when surfaces in plastic deformation, the influence of Ssk and Sku can be ignored.
It is universally acknowledged that Ssk is usually used to describe the surface peak-valley shape, which indicates that when Ssk < 0, the surface is generally in a shape of steep valley and round peak, otherwise, it will be sharp peak and round valley. While Sku is used to evaluate the height and depth of peaks and valleys, that is, when Sku > 3, the surface peaks and valleys are higher and deeper. In sensitivity analysis, Ssk has a stronger influence on contact stress than Sku, demonstrating that the peak shape of asperities is more likely to affect contact performance than the volume.

Establishment and optimization of the optimal correlation model
Due to the existence of activation function and node in BP, it is generally not selected as the regression display expression. And considering that the calculation process of rough surface contact stress is complicated and low calculation efficiency, it is necessary to establish a regression model between the height distribution parameter and the maximum mises stress to directly observe the correlation between them. So according to operating conditions, and material performance parameters, nominal contact area and so on, based on the contact model proposed by Wen, the stress sample data is first obtained. With the regression analysis method, many related parameters can be simplified to solve the regression equation coefficients to avoid the lengthy and complicated calculation process. Finally, establish the optimal relationship model in figure 7 between the height distribution parameters and contact stress.
The analysis is carried out by means of the complete polynomial and optimization method mentioned above. Set the highest degree term to be 2. The regression model contains the following items, as shown in table 4.
Complete polynomial regression equations R 2 under three kinds of loads are calculated:0.9342, 0.9385 and 0.9642, respectively. Because of the existence of R 0.9, 2 > there is no need to solve the complete polynomial under the cubic term. The simplified model is optimized based on the second-order complete polynomial. Table 5 shows the maximum R 2 of the combination when the model takes different number of items. The 2 under different loads is directly selected as the general optimal correlation model at this time, it will be found that the parameter terms under three kinds of loads are different, which does not meet the requirements of the general model and engineering applications.
In order to solve the difference problem of parameter items, the following solutions are proposed: (1) Under a certain load, select all regression model sets with only thousandths difference from adjustment coefficient At this point, it can be considered that the models in the regression model sets and the models with ( ( )) R max max k 2 have basically the same explanatory ability to the dependent variables.
(2) Record the frequency of equation different terms in the regression model sets. The higher the frequency of the selected item is, the more likely it is to be the key term to explain the dependent variable, and it is further restricted in combination with the engineering meaning. In this way, the four selected parameter items with the highest frequency can be used as the final regression polynomials under the corresponding load. In this way, it is avoided that some items have a strong explanatory ability for other items, but they happen to catch a 'free ride', so that the regression model coefficients also present better fake phenomenon.
(3) According to the regression polynomials under different loads, the method in (2) can be used again to unify the parameter differences under multiple loads.  The frequency statistics of parameter items of the three loads are shown in figure 8.
The top four parameter items in mises1 are Sq, Sq 2 , SqSsk, Ssk; mises2 are Sq, SqSsk, SqSku, SskSku; mises3 are Sq, SqSsk, Sq 2 , Ssk. According to the practical meanning, the equation should contain as much variable information as possible, and there is no Sku item in mises1 and mises3. Therefore, the fourth item Skuin mises1 and mises3 should be revised, and the fifth item SskSku should be used as a replacement. The final optimal relationship model can be determined as follows:   The results show that the optimal correlation model fits well under the three loads. Equation adjustment coefficient is only slightly smaller than the maximum adjustment coefficient, and the reliability gets verified.
However, there is still a big gap between engineering surface and reconstruction surface, and the fitting effect of the optimal relationship model on engineering surface height distribution parameters and contact stress is not clear.
In order to clarify the application ability of the optimal relationship model on engineering surface, 264 measured ultrasonic grinding surfaces are selected. Wen's model was used to obtain σ max and verify its fitting effect. Figures 9 and 10 are the grinding surface diagram and stress sample distribution box figure, respectively.
The regression equation of ultrasonic grinding surface height distribution parameters and contact stress is shown as follows:  Under the measured surface, the equation adjustment coefficient R u 2 is reduced by about 0.2 compared with the reconstruction surface, which seems that the fitting effect of the regression model is not very satisfactory. But the model R 2 can not fully represent the fitting effect of the nonlinear equation [41,42], only just for a reference. Therefore, the relative error of ultrasonic grinding surfaces equations should be further observed, as shown in figure 11.
The model relative error in figure 11 is generally less than 0.05. It proves that the optimal relationship model, suitable for predicting contact stress by height distribution parameters under different morphological characteristics, also has excellent fitting ability on the measured surface, further verifying its universality.

Extreme stress value
Although the optimal relationship model builds the bridge between the height distribution parameters σ max , it can only ensure that the equation has sufficient fitting ability while the information of the model will be inevitable to produce partially lost. The optimization algorithm usually requires the model to have a complete information interpretation ability to consider the influence of different factors as much as possible.
Sq, Ssk and Sku are the target research objects. Therefore, cubic complete polynomial can fully consider the interaction of the parameters. Meanwhile, cubic polynomial includes the concavity and convexity of the function, meeting the requirements of the basic model of the optimization algorithm.  As can be seen from table 6: (1) In the case of the same Sq but different Ssk and Sku, the difference between the maximum and minimum of σ max is nearly 250 MPa, consistent with the engineering cognition. Otherwise, the influence of Ssk and Sku is basically less than 14%, keeping pace with the results of the sensitivity analysis when the surface is in plastic deformation. The accuracy of the model and sensitivity analysis are mutually verified.
(2) In condition that Sq is in [0.6,0.7], the minimum σ max is generally generated on the surface with negative Ssk but Sku is 10, indicating that surface with strong kurtosis and negative skewness has better contact performance. At this time, the peaks of the surface asperities are rounder and the overall volume is larger, so the bearing capacity gets improved. However, combination of Ssk and Sku under the maximum σ max presents great difference with the change of Sq but no obvious rule.
(3) Under the circumstances Sq is in [0.8,1.2], the minimum σ max is basically unified as Ssk=−1.5 and Sku=2.9, where surface contact stress Sku close to 3 and negative Ssk is smaller. At this time, negative Sku surfaces close to normal distribution have stronger contact performance. The maximum σ max is basically unified as Ssk=1 and Sku=10, which illustrates that kurtosis to 10 and positively skewed surface is in the worst contact performance.
(4) When Sq is in the range [1.3,1.4], the rule presented by Ssk and Sku combination disappears, and the surface with the maximum and minimum contact stress presents randomness. It indicates that when the surface is too rough, the influence of the surface bearing capacity is no longer the microscopic morphology characteristics but the macroscopic shape characteristics, and the microscopic morphology characteristics lose their role in regulating the contact performance.

Conclusion
(1) With the help of BP neural network and global (qualitative and quantitative) sensitivity analysis method, the influence of height distribution parameters on contact stress gets studied. Under different loads, the order of contact stress is consistent in Sq Ssk Sku. > > And with the increase of load, the influence of Ssk and Sku gradually decreases. When the surface is in elastic deformation, the influence of Sq is 80%, but Ssk and Sku is 20%.The influence of Ssk and Sku on surface plastic deformation caused by heavy load contact is about 14%.
(2) Based on the idea of complete polynomials and combination, and combined with data characteristics and engineering physics significance, a method of optimizing and reducing complete polynomials is proposed. An optimal relationship model between height distribution parameters and contact stress is established by means of the method: (3) According to the particle swarm optimization algorithm, the extreme value of surface contact stress under different Sq is solved. The results show that when Sq is in a lower value, surface contact performance with negative strong kurtosis is stronger. As the surface becomes rougher, the contact performance of the negatively skewed surface close to the normal distribution is better. However, when the surface is too rough (surface is more prone to plastic deformation under the same load), it indicates that the influence of microscopic morphology on the surface bearing capacity is low but mainly controlled by the macroscopic shape characteristics.