Evolutionary cost-tolerance optimization for complex assembly mechanisms via simulation and surrogate modeling approaches: application on micro gears

With the introduction of new technologies, the scope of miniaturization has broadened. The conditions under which complex products are designed, manufactured, and assembled ultimately determine how well they perform. The intricacy and critical functions of such products can often be fulfilled only with high-precision components such as micro gears. Gears are used in power transmission systems across a variety of industries. Micro gears, or gears with micro features, with tolerances of less than 5 μm push manufacturing processes to their technological limits. Monte-Carlo simulation methods enable an accurate forecast of non-conformity. However, the complexity of the micro gear design increases the computational cost and runtime of the simulation. An alternative to simulation is a surrogate model that predicts the same behavior. This paper proposes a statistical surrogate model to predict the conformity of a pair of micro gears. The surrogate model then enables an optimal tolerance allocation that takes both gear functionality and production cost into account.


Introduction
The trend toward miniaturization has led to an increase in complexity, and key functions of products can often only be realized with high-precision components such as micro gears. Micro gears, or gears with micro features, with tolerances of less than 5 μm push manufacturing processes to their technological limits. Inherent process deviations prevent the consistent manufacture of components that fulfill quality standards. Product design, manufacturing, and assembly therefore face a typical trade-off: on the one hand, saving production costs through high throughput at the price of larger deviations; on the other hand, avoiding a high number of scrap units through tighter tolerances and more capable processes that assure the gears' functionality [1].
In general, companies use two complementary approaches to cope with this challenge: (1) a priori allocation of tolerances [2] and (2) optimal selection of the production strategy [3]. Both approaches, however, rely on rather static decision-making processes that meet customer specifications by selecting appropriate design alternatives. Therefore, to minimize the effects of uncertainties and assure product functionality, manufacturers need an exhaustive engineering plan that contains the key functions and characteristics of the product as well as the allocated tolerances. Additional digital threads, such as simulation tools, bring a comprehensive view of the product into tolerancing and production optimization. As data is transferred up the digital thread from the existing product to the design phase, a truly informed analysis of tolerance sensitivities becomes possible. To avoid costly and time-consuming physical mock-ups, an effective surrogate model for geometric quality management can be used.
A surrogate model is a simplified mathematical or statistical model that approximates a more complex or computationally expensive model for faster analysis. It acts as a stand-in for the original model and is commonly used in engineering and finance. Surrogate models are trained on a subset of data from the original model and can provide useful approximations and insights [4]. Here, the surrogate model must be able to forecast the behavior and performance of the product and the manufacturing process and support decisions about both [5,6]. Its major benefit is the ability to rapidly perform a large number of additional function evaluations without resorting to more expensive numerical models. Surrogate-assisted optimization determines an optimum design while providing insight into the workings of the design. A surrogate model enables a low-cost analysis of tolerance sensitivities and helps revise the problem definition of a design task. It can also conveniently handle multiple design parameters of interest, and it offers quantitative assessments of trade-offs while facilitating global sensitivity evaluations of the design variables [7,8].
Following this motivation, this paper develops a surrogate-assisted tolerance optimization under the uncertainties of micro gear design, which enables a technical and economic evaluation of micro gears. The paper is structured as follows: Section 2 reviews the state of the art in tolerance analysis techniques and tolerance allocation optimization. Section 3 details the surrogate-assisted tolerance optimization, including the assessment of manufacturing costs. This section presents a gear geometrical model and numerical simulation tool and a consistent cost-tolerance model, proposes a surrogate model and its associated optimization approach, and analyzes the results. Section 4 studies the impact of uncertainty on this research. Finally, Section 5 concludes the paper and outlines prospects.

Literature
Industrial production is always subject to non-conforming batches, which can increase the production costs of mechanisms and cause customer dissatisfaction in terms of functional fulfillment. To reduce the rate of non-conformity and produce high-quality products, tolerancing has been studied for decades. Two main tasks are distinguished:
• Tolerance analysis assesses the assemblability and functionality of a design once tolerances have been specified on each component of a mechanical assembly [9][10][11].
• Tolerance allocation assigns adequate tolerance values [12,13].
These two categories are described in detail in this section.

Tolerance analysis
Tolerance analysis aims at verifying the functionality of a design once the tolerances are specified on each component of a mechanism. Three main issues in tolerance analysis can be distinguished: (1) The first issue concerns the models representing the geometrical deviations. Modeling geometrical deviations and gaps is the first requirement for modeling the behavior and analyzing the quality level of the designed mechanism. Several representations in the literature provide the mathematical basis for geometrical modeling [14][15][16]. (2) The second issue is the formulation of mathematical models that represent and assess the behavior of the mechanical system with deviations. Several studies have been dedicated to the geometrical behavior analysis of overconstrained systems [16][17][18][19]. (3) The last issue is the development of analysis methods.
Tolerance analysis techniques define a mathematical formulation involving all characteristics of the behavior model and compute the resulting quality level. The main techniques are the following. The worst-case (also called deterministic) technique considers the worst possible combination of deviations among all admissible combinations of workpiece deviations in the assembly [20][21][22]. Statistical tolerance analysis computes the rate at which given individual tolerances meet the requirements [10,11,23,24,25].
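To make the contrast between the two techniques concrete, the sketch below compares a worst-case bound, a root-sum-square (RSS) bound, and a Monte-Carlo conformity estimate on a simple one-dimensional linear stack. The stack, the tolerance values, and the assumption of a normal process with σ = t/3 are hypothetical and are not the gear model used later in this paper.

```python
import math
import random

def worst_case_stack(tolerances):
    """Worst-case bound: every contributor sits at its +/- limit at the same time."""
    return sum(tolerances)

def rss_stack(tolerances):
    """Root-sum-square bound, a classic statistical shortcut for independent contributors."""
    return math.sqrt(sum(t * t for t in tolerances))

def monte_carlo_conformity(tolerances, requirement, n_samples=100_000, seed=0):
    """Fraction of simulated assemblies whose stacked deviation stays within +/- requirement.

    Assumption: each deviation is normal with sigma = t / 3 (a hypothetical +/-3-sigma process).
    """
    rng = random.Random(seed)
    conforming = 0
    for _ in range(n_samples):
        stacked = sum(rng.gauss(0.0, t / 3.0) for t in tolerances)
        if abs(stacked) <= requirement:
            conforming += 1
    return conforming / n_samples

tols_um = [2.0, 3.0, 1.0]   # hypothetical tolerances in micrometres
requirement_um = 4.0        # hypothetical functional requirement
print(worst_case_stack(tols_um))  # 6.0 -> exceeds 4.0, so worst-case rejects the design
print(rss_stack(tols_um))         # about 3.74 -> within 4.0
print(monte_carlo_conformity(tols_um, requirement_um, n_samples=20_000))
```

The worst-case check rejects this design outright, whereas the statistical view quantifies how often the requirement is actually met, which is the kind of conformity rate the cost model later relies on.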
Studies of contemporary tolerance analysis techniques reflect the need of modern industries for a comprehensive interpretation of complex products. In particular, the literature investigates the growth of simulation tools and machine learning (ML) techniques as two crucial computer-aided tools for evaluating complex mechanical functionality. Simulation and ML help predict the behavior of complex mechanisms efficiently and accurately. The tolerance analysis issues are summarized in Fig. 1. The next section examines cost-tolerance models and the application of simulation/ML tools in cost-tolerance optimization.

Cost-tolerance optimization
Tolerance allocation has a substantial impact on both production cost and quality. To ensure product performance, designers prefer tight tolerances, whereas manufacturers prefer loose tolerances to decrease production costs. Tolerances are allocated so that geometrical product specifications are met at the lowest possible manufacturing cost. Three tolerance allocation strategies are currently in use: knowledge-based synthesis, rule-based synthesis, and optimization synthesis [26]. Parametric tolerance-cost models are widely used in the optimization strategy [27][28][29][30][31], with structures ranging from linear to non-linear [28]. Numerous forms of manufacturing cost models can be found, including the reciprocal power function (RP) [32], the cubic polynomial (cubic-P), and hybrid models derived from common cost models in the literature [33].
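The two parametric forms named above can be written down directly. The sketch below uses the common textbook forms of the RP model, C(t) = a0 + a1/t^k, and the cubic-P model, C(t) = a0 + a1·t + a2·t² + a3·t³. The coefficient values are invented for illustration (in practice they are fitted to shop-floor cost data), and the hybrid models of [33] are not reproduced.

```python
def reciprocal_power_cost(t, a0, a1, k):
    """Reciprocal power (RP) model: C(t) = a0 + a1 / t**k, with t > 0."""
    if t <= 0:
        raise ValueError("tolerance must be positive")
    return a0 + a1 / t ** k

def cubic_polynomial_cost(t, a0, a1, a2, a3):
    """Cubic polynomial (cubic-P) model: C(t) = a0 + a1*t + a2*t**2 + a3*t**3."""
    return a0 + a1 * t + a2 * t ** 2 + a3 * t ** 3

# Hypothetical coefficients, chosen so that cost falls as the tolerance loosens on [1, 5].
RP = dict(a0=2.0, a1=10.0, k=1.0)
CUBIC = dict(a0=20.0, a1=-6.0, a2=0.8, a3=-0.03)

for t in (1.0, 2.0, 5.0):  # tolerance in micrometres
    print(t, reciprocal_power_cost(t, **RP), round(cubic_polynomial_cost(t, **CUBIC), 2))
```

Both curves encode the same qualitative trade-off (tighter tolerance, higher cost). They differ in how steeply cost rises near tight tolerances, which is why the choice of model structure matters for the optimization result.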
Obtaining an appropriate cost-tolerance model requires an extensive individual study of tolerance-variation sensitivity and of the existing manufacturing resources [34][35][36][37]. Tsutsumi et al. [38] optimized product design, process planning, and production planning in multi-product assembly, assessing investment efficiency and lowering overall production costs. The authors presented a comprehensive review of parametric cost-tolerance functions and examined the models' inconsistencies due to parameter variability. Wang et al. [39] established a variation management framework for key control characteristics in multistage machining processes that considers quality-cost equilibrium. In more recent research, Han et al. [40] incorporated the Monte-Carlo method into the cost-tolerance model to address the impact of model uncertainty on the economy of quality design and on the reliability of optimization results.
In contrast to parametric cost-tolerance models, several activity-based cost models have been proposed. Etienne et al. [30] proposed an activity-based cost model that provides an accurate indicator of the relevance of designer-specified tolerance values. This model links the effects of tolerance allocation to all activities in the product lifecycle (manufacturing, inspection, scrap, etc.). Dantan et al. [41] introduced inspection planning into tolerance allocation, considering several factors, including the frequency of monitoring and inspection activities, the conforming product rate, the non-detection rate of non-conforming products, and the non-detection rate of conforming products. Moreover, Khezri et al. [42,43] extended the cost-tolerance model by embedding tolerance analysis techniques such as the worst-case method and Monte-Carlo simulation into it, and illustrated the application with a two-dimensional tolerancing case study. Khezri et al. [44] investigated the efficiency of the embedded model on a three-dimensional tolerancing case study with one additional dimensional feature. The authors replaced the simulation with a surrogate model that evaluates the geometrical behavior response accurately and efficiently. The embedded tolerance analysis techniques link the cost-tolerance model to the part and assembly conformity rates. Because such implementations require expensive computation and lose time efficiency as assembly complexity increases, this paper proposes a statistical surrogate-assisted model and applies it to a gear assembly with rotational and dimensional deviations.

Surrogate-assisted tolerance optimization of micro gears design
Gears are used in power transmission systems across a variety of industries. Their advantages include durability, a constant transmission ratio, compact size, excellent efficiency, and suitability for a wide range of powers. However, gears also have disadvantages, such as vibration of the gear meshing system, which causes unwanted noise. The kinematic transmission error (KTE), caused by gear misalignment, tooth profile inaccuracies, and tooth deflections, is the main source of such noise [45]. The geometric errors are shown in Fig. 2. KTE represents the variability of an angular displacement; the KTE value therefore reflects the quality level of the paired gears associated with the features' deviations. This section proposes a cost-tolerance optimization approach with an embedded surrogate model to allocate cost-efficient tolerances on the vital key characteristics of the gears. The development starts with gathering on-site data measured by expert operators with high-precision measurement tools. Since data gathering is time-consuming owing to the micro gear's geometric complexity, a mathematical and data-based simulation tool is proposed. The simulation tool provides helpful insight into the gear meshing behavior by calculating the KTE value and the number of defectives, but it considerably increases the computation time once embedded in the optimization. Therefore, a set of experiment runs in the simulation environment is combined with experiment runs in the field to collect the necessary data. A variety of classifiers are then trained to predict the behavior of the assembled system. The result is a fast surrogate model that predicts the number of defectives as a function of the tolerance variables. Finally, the surrogate model is integrated into an optimization approach that allocates cost-effective tolerances.

Gear numerical simulation
The gear tolerance analysis focuses on the impact of manufacturing imprecisions (assembly misalignment, runout, eccentricity, pitch error, and form defects) on the kinematic transmission error, which is the difference between the actual position of the output gear and the position it would have if the gears were perfectly conjugate. Several mathematical approaches have been developed to calculate the kinematic transmission error:
• KTE can be regarded as a minimized objective function [47][48][49].
• KTE can be modeled by a periodic function with a period of 2π/N, where N is the number of teeth of the gear drive.
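The definition of KTE and its tooth-periodic character can be illustrated numerically. In the sketch below, the output angle is synthetic: a tooth-mesh harmonic of period 2π/z1 plus an optional once-per-revolution term standing in for runout or eccentricity. The tooth counts, amplitudes, and harmonic shape are hypothetical. This is not the polynomial KTE function used in this paper.

```python
import math

def kinematic_transmission_error(theta1, theta2, z1, z2):
    """KTE: actual output angle minus the angle a perfectly conjugate pair would give."""
    return theta2 - (z1 / z2) * theta1

def synthetic_output_angle(theta1, z1, z2, a_mesh, a_once):
    """Hypothetical output angle: ideal ratio + tooth-mesh harmonic (period 2*pi/z1)
    + a once-per-revolution term standing in for runout/eccentricity."""
    ideal = (z1 / z2) * theta1
    return ideal + a_mesh * math.sin(z1 * theta1) + a_once * math.sin(theta1)

def kte_curve(z1, z2, a_mesh, a_once, n_points=3600):
    """Sample KTE over one revolution of the driving gear (z1 teeth)."""
    curve = []
    for i in range(n_points):
        theta1 = 2.0 * math.pi * i / n_points
        theta2 = synthetic_output_angle(theta1, z1, z2, a_mesh, a_once)
        curve.append(kinematic_transmission_error(theta1, theta2, z1, z2))
    return curve

curve = kte_curve(z1=12, z2=30, a_mesh=1e-4, a_once=0.0)  # radians; values are illustrative
print("peak-to-peak KTE [rad]:", max(curve) - min(curve))
```

With `a_once = 0`, the curve repeats every 2π/z1 of input rotation, matching the periodic view of KTE. A non-zero once-per-revolution term breaks that tooth periodicity, which is how eccentricity-type errors show up in a measured KTE signal.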
The case study is a pair of gears (Fig. 3). The studied functional characteristic is the kinematic transmission error.

Figure panels (geometric errors): Pitch error: teeth are not well angularized. Runout: pitch circle and hole axis are not well centered. Form defect. Misalignment: pinion and wheel are not well aligned.
In the presented framework, polynomial KTE functions are implemented to predict the tooth-to-tooth KTE and the global KTE. The system's behavior response f (Eq. 1) can be approximated by evaluating these polynomial functions [51]:

$$\mathrm{KTE} = f\left(\mathrm{Dev}_{\mathrm{Misalignment}},\ \mathrm{Dev}_{\mathrm{Runout}},\ \mathrm{Dev}_{\mathrm{Pitch\ error}},\ \mathrm{Dev}_{f\alpha}\right) \tag{1}$$

The model is a function of a set of geometrical deviations (Dev), i.e., translational and rotational localization errors on the gears' geometry, and evaluates the global KTE associated with these deviations. A Monte-Carlo-based simulation is then developed to evaluate the conformity of the gears and of the assembly. The simulation is outlined in Fig. 4.
The simulation receives a set of tolerances ($T_{\mathrm{Misalignment}}$, $T_{\mathrm{Runout}}$, $T_{\mathrm{Pitch\ error}}$, $f_{f\alpha}$), the admissible KTE value, the process deviation, and the number of iterations ($N_{\mathrm{MC}}$). Pseudo-random number generators produce a sample of geometrical deviations ($\mathrm{Dev}_{\mathrm{Misalignment}}$, $\mathrm{Dev}_{\mathrm{Runout}}$, $\mathrm{Dev}_{\mathrm{Pitch\ error}}$, $\mathrm{Dev}_{f\alpha}$) according to the process deviation. The KTE value and the deviation responses are then evaluated. Each iteration generates a new set of random geometrical deviations, whose responses are evaluated and stored. Finally, the number of conforming instances is counted, and the associated conformity rates are evaluated.
The number of simulations is a crucial parameter in this case study; therefore, a comprehensive analysis of the number of simulations was performed. Balancing the precision of the simulation against the limited calculation resources, $N_{\mathrm{MC}} = 10^{6}$ was chosen. For instance, Fig. 5 shows 200 of the $10^{6}$ iterations of a simulation run to illustrate how KTE fluctuates with the different geometrical deviations at each iteration.
Finally, the simulation is developed and tuned. The simulation tool estimates the conformity rates associated with the tolerance intervals. The tool provides precise predictions, but evaluating a single set of tolerance intervals allocated to the gears is computationally expensive. This is illustrated in Fig. 6b, where simulating and assessing a particular set of tolerances on the gear features for predicting the system's
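The Monte-Carlo loop described above can be sketched in a few lines. The KTE function below is a hypothetical placeholder for the polynomial model of Eq. (1). The normal process spread, the "check each gear pair against its tolerances, then assemble only conforming pairs" logic, and all numeric values are also assumptions made for illustration.

```python
import random

FEATURES = ("misalignment", "runout", "pitch_error", "profile_form")  # last one ~ f_falpha

def kte_model(dev):
    """Hypothetical stand-in for Eq. (1); the coefficients are invented for illustration."""
    return (1.0 * abs(dev["misalignment"]) + 1.5 * abs(dev["runout"])
            + 2.0 * abs(dev["pitch_error"]) + 0.5 * abs(dev["profile_form"])
            + 0.1 * dev["pitch_error"] ** 2)

def simulate_conformity(tolerances, kte_admissible, process_sigma, n_mc=100_000, seed=1):
    """Monte-Carlo loop under simplifying assumptions: deviations are normal with the given
    process sigma, a gear pair conforms when every deviation is within its tolerance, and
    only conforming pairs are assembled and checked against the admissible KTE.
    Returns (gamma, lam, dppm) = (gear conformity, assembled conformity, defective ppm)."""
    rng = random.Random(seed)
    gears_ok = assembly_ok = 0
    for _ in range(n_mc):
        dev = {name: rng.gauss(0.0, process_sigma[name]) for name in FEATURES}
        if all(abs(dev[name]) <= tolerances[name] for name in FEATURES):
            gears_ok += 1
            if kte_model(dev) <= kte_admissible:
                assembly_ok += 1
    gamma = gears_ok / n_mc
    lam = assembly_ok / gears_ok if gears_ok else 0.0
    dppm = round((1.0 - lam) * 1e6)
    return gamma, lam, dppm

tol = {name: 3.0 for name in FEATURES}      # hypothetical tolerances [um]
sigma = {name: 1.0 for name in FEATURES}    # hypothetical process spread [um]
print(simulate_conformity(tol, kte_admissible=8.0, process_sigma=sigma, n_mc=20_000))
```

Every call repeats the full sampling loop. Running it for each of thousands of candidate tolerance sets is the cost that motivates replacing it with a surrogate.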

Cost-tolerance optimization model
This section proposes a tolerancing problem that ensures the assemblability and functionality of the two gears while providing cost-efficient production. Unlike previous cost-tolerance models, the activities in this model are weighted by the efficiency of the relevant activities, which is correlated with the conformity rates of the assembly (λ) and of the components (γ). These conformity rates, in turn, depend on the allocated tolerances. Consequently, a tight tolerance may increase the manufacturing cost because more precise production is required, but it promises a higher conformity rate. Conversely, a loose tolerance may make manufacturing cheaper, but the conformity rate may decrease. To assess the assembly cost, Eq. (2) expresses an abstraction of the cost-tolerance model:
(2)
This equation represents a statistical cost-tolerance model in which the costs of the activities are constant, but the weights of the activities depend on the allocated tolerances.
The associated ratios and the structure of the cost model are detailed as follows:
• The conformity rate of a gear (γ) is the likelihood that a gear meets the design constraints ($C_d$) given its geometrical deviations (Dev). This value is predicted using the simulation tool developed in the previous section.
• The assembled conformity rate (λ) is the rate at which the geometrical deviations of the two assembled gears satisfy the functional constraints ($C_f$).
• Inspection uncertainties affect both the gear conformity rate and the assembled conformity rate; two common failures are therefore included, type I and type II. A type I failure (rate α) occurs when a conforming item is rejected by the inspection, and a type II failure (rate β) occurs when a non-conforming item is accepted by the inspection.
• Processing cost ($\mathrm{Cost}_{\mathrm{Proc}}$) is linked to the gears' conformity rates, which are functions of the tolerances allocated on the gears' characteristics.
• Assembly cost ($\mathrm{Cost}_{\mathrm{Assembly}}$) is the cost of assembling conforming gears.
• Inspection cost ($\mathrm{Cost}_{\mathrm{Inspection}}$) is the cost of inspecting the conforming gears before assembly as well as the conforming assemblies.
• Scrap cost ($\mathrm{Cost}_{\mathrm{Scrap}}$) is the cost associated with non-conforming gears and non-conforming assemblies.
Ultimately, the cost-tolerance model is detailed and its dependencies are explained. The model is established by associating the impacts of tolerancing with the manufacturing cost. The impact of the allocated tolerances is assessed by a surrogate model that evaluates the conformity rates, as discussed in the next section.
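Because Eq. (2) is stated as an abstraction, the sketch below shows one possible way to combine the listed activity costs with the conformity rates γ and λ and the inspection error rates α and β. This is an assumption made for illustration only. The cost structure, the treatment of a gear pair as one unit, and all numbers are hypothetical, and the paper's actual weighting may differ.

```python
def expected_cost_per_good_assembly(gamma, lam, alpha, beta, costs):
    """Hypothetical activity-based cost per delivered assembly (not the paper's Eq. 2).

    gamma: gear-pair conformity rate, lam: assembled conformity rate,
    alpha: type I rate (conforming item rejected), beta: type II rate (non-conforming accepted),
    costs: unit costs for 'proc', 'inspection', 'assembly', 'scrap'.
    """
    # Gear inspection: probability that a produced gear pair is accepted.
    p_gear_accept = gamma * (1 - alpha) + (1 - gamma) * beta
    # Assembly inspection: probability that an assembled pair is accepted.
    p_asm_accept = lam * (1 - alpha) + (1 - lam) * beta

    cost = costs["proc"] + costs["inspection"]                        # every pair is made and inspected
    cost += (1 - p_gear_accept) * costs["scrap"]                        # rejected gear pairs are scrapped
    cost += p_gear_accept * (costs["assembly"] + costs["inspection"])   # accepted pairs assembled, inspected
    cost += p_gear_accept * (1 - p_asm_accept) * costs["scrap"]         # rejected assemblies are scrapped

    good_fraction = p_gear_accept * p_asm_accept
    if good_fraction == 0:
        raise ValueError("no assembly is delivered")
    return cost / good_fraction

COSTS = {"proc": 10.0, "inspection": 1.0, "assembly": 2.0, "scrap": 3.0}  # hypothetical unit costs
print(expected_cost_per_good_assembly(1.0, 1.0, 0.0, 0.0, COSTS))   # ideal process: 14.0
print(expected_cost_per_good_assembly(0.9, 0.95, 0.02, 0.05, COSTS))
```

In a full model, `costs["proc"]` would itself grow as tolerances tighten (for example via an RP term), while γ and λ would come from the simulation or the surrogate. This is what makes the optimum a genuine trade-off.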

Surrogate model development
Analyzing the functional behavior of a complex mechanical assembly such as micro gears, together with the impact of design errors, is expensive and time-consuming, which calls for a novel, time-efficient approach. This need motivates the application of machine learning techniques in this section. The examination of micro gears with numerous geometric errors and dependencies led the research to employ artificial intelligence (AI) in tolerance analysis. Figure 7 outlines this section.

Initial sampling and experimental design
In the previous section, the simulation characterized the gear meshing behavior, evaluated the KTE value, and estimated the number of defective gears per million associated with the input tolerance variables. In this section, an experiment space is designed using several techniques to define adequate experiments in the domain of the tolerance variables. The experiments are designed using the Scikit-Optimize Python package [52]: random sampling, Latin hypercube sampling (LHS), Hammersley, and Halton sequences are implemented. The experimental space for 1000 experiments is illustrated in Fig. 8. In this small-scale trial, the Hammersley technique provided well-distributed and homogeneous experimental tolerance inputs. For a larger design of experiments, an internal computational resource with the following specification is used: 48 cores, 192 GB RAM, Intel Xeon Gold 5220R (2.2 GHz). Given internal policies and resource availability, the largest feasible design of experiments comprises 40,000 points (tolerance variables). Table 1 reports the calculation time and the detected defectives for a variety of simulation runs for a population of 40,000 points (tolerance variables).
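The Hammersley construction preferred above is simple enough to show directly. Scikit-Optimize ships its own sampler classes, and the stand-alone sketch below only illustrates the idea. The first coordinate is i/n and the others are radical inverses in successive prime bases. The tolerance ranges used to scale the points are hypothetical.

```python
PRIMES = (2, 3, 5, 7, 11, 13)

def radical_inverse(i, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of i about the radix point."""
    inv, scale = 0.0, 1.0 / base
    while i > 0:
        inv += scale * (i % base)
        i //= base
        scale /= base
    return inv

def hammersley(n, dims):
    """n Hammersley points in [0, 1)^dims: first coordinate i/n, the others radical inverses."""
    if not 1 <= dims <= len(PRIMES) + 1:
        raise ValueError("unsupported dimension")
    return [[i / n] + [radical_inverse(i, PRIMES[k]) for k in range(dims - 1)]
            for i in range(n)]

def scale_to_bounds(points, bounds):
    """Map unit-cube points onto per-variable tolerance intervals (lo, hi)."""
    return [[lo + u * (hi - lo) for u, (lo, hi) in zip(p, bounds)] for p in points]

# Hypothetical tolerance ranges [um] for misalignment, runout, pitch error and f_falpha.
BOUNDS = [(1.0, 10.0), (1.0, 10.0), (0.5, 5.0), (0.5, 5.0)]
doe = scale_to_bounds(hammersley(1000, 4), BOUNDS)
print(len(doe), doe[1])
```

Unlike random sampling, the points are deterministic and fill the domain evenly. This matches the well-distributed, homogeneous coverage observed for the Hammersley technique in the 1000-point trial.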
Moreover, Fig. 9 shows the occurrence frequency of the number of defective parts per million (dppm) for different simulations. Since the simulation model predicts dppm as a discrete output variable, the target values are treated as classes. As shown, the configuration of $10^{6}$ simulation runs and 40,000 experiments provided enough target values to cover the target range homogeneously. However, analyzing the results revealed a skewed dataset, an issue that is handled in the next section.

Table 2 Optimization test sets (Itr: number of iterations; Pop: population size)
Test      Itr    Pop
Test_1    100    20
Test_2    200    20
Test_3    500    20
Test_4    1000   20
Test_5    100    50
Test_6    200    50
Test_7    500    50
Test_8    1000   50
Test_9    100    100
Test_10   200    100
Test_11   500    100
Test_12   1000   100

Imbalanced data refining and surrogate modeling
The designed experiments (i.e., tolerance variables) in the previous section provided well-distributed tolerance points in the tolerance domains; however, when these points were used in the simulation, the target (dppm) fluctuated between no defectives (0 dppm) and all defectives ($10^{6}$ dppm) (Fig. 10a). The empirical data showed that the target fluctuated in the range of 0 to 500 dppm; therefore, the data needed to be refined and the off-grid points excluded. The appropriate target range and the associated input tolerances were then collected. An initial study of the refined target values reveals the tendency of the process toward having no defectives (Fig. 10b). This tendency strongly influences the surrogate model training step and leads to an inaccurate model (see Section 3.4.3). Sampling approaches are generally used to lessen the impact of imbalanced data. They are broadly divided into two categories: under-sampling and over-sampling. Under-sampling techniques provide a compact, balanced training set, whereas over-sampling methods duplicate the rare classes at a specific rate [53]. Since some targets (classes) appeared only once during the experiment, over-sampling was employed. The over-sampled, balanced dataset is shown in Fig. 10c. This new dataset is then used for surrogate modeling.
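Random over-sampling, as described above, can be sketched with the standard library. The example is generic: the data are hypothetical, and the exact over-sampling rate used in the paper is not stated. Here every minority class is simply topped up to the size of the largest class. Libraries such as imbalanced-learn offer the same idea as `RandomOverSampler`.

```python
import random
from collections import Counter, defaultdict

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each minority class until every class
    has as many samples as the largest one (plain random over-sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out

# Hypothetical tolerance vectors [um] with dppm classes skewed toward 0.
X = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.2], [1.0, 1.9], [4.5, 6.0], [5.0, 7.5]]
y = [0, 0, 0, 0, 0, 120, 480]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Over-sampling adds no new information; it only re-weights rare classes. This is why it suits classes seen once, where under-sampling the majority class would discard most of the data.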

Surrogate model
Once the target output is identified and refined (Fig. 10), an efficient surrogate model can be trained. The target represents the number of defective parts per million (dppm); the data points are therefore discrete values that can be treated as classes. The surrogate model predicts the number of defectives (dppm) as a function of the input tolerance variables (T). A variety of classifiers have been proposed in the literature. In this section, these classifiers are compared on the imbalanced and balanced datasets. The following classifiers were implemented using the Scikit-learn [54] and Keras [55] Python packages: support vector classifier (SVC) [56], random forest [57], multi-layer perceptron (MLP) [58], K-nearest neighbors [59], Gaussian Naive Bayes (GaussianNB) [60], decision tree [61], artificial neural networks (ANNs) [62], and AdaBoost [63]. The comparison of the implemented models on the imbalanced and balanced datasets is depicted in Fig. 11 using "accuracy_score", which by default returns the fraction of predictions that exactly match the actual values [64].
Accordingly, the artificial neural network yielded a high-accuracy surrogate; the model representation is illustrated in Fig. 12. Once established, the surrogate model can be deployed into the cost-tolerance optimization model (Section 3.2), which aims at allocating cost-efficient tolerances.
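The classifier comparison above relies on accuracy, which can be misleading on skewed data. The short sketch below uses hypothetical labels. A trivial majority-class predictor scores 0.9 on a dataset dominated by 0 dppm, but only 1/3 once the classes are balanced. This is one reason to balance the dataset before comparing classifiers. The sketch is illustrative only and does not reproduce the results of Fig. 11.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of exact matches, i.e. what sklearn.metrics.accuracy_score returns by default."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_class(y_train):
    """A trivial 'classifier' that always predicts the most frequent training class."""
    return Counter(y_train).most_common(1)[0][0]

# Hypothetical dppm labels: skewed toward 0 versus balanced by over-sampling.
y_skewed = [0] * 90 + [120] * 5 + [480] * 5
y_balanced = [0] * 30 + [120] * 30 + [480] * 30

for name, labels in (("skewed", y_skewed), ("balanced", y_balanced)):
    guess = majority_class(labels)
    print(name, accuracy(labels, [guess] * len(labels)))
```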

Surrogate-assisted optimization approach
The proposed surrogate model is integrated into the cost-optimization model developed in this section. As a result, the optimization and surrogate models together enable the identification of optimal tolerances for the vital gear characteristics. The implemented surrogate-assisted optimization is illustrated in Fig. 13.
The method starts by defining the geometric design requirements, the statistical relations, and the objective function. Tolerances are then introduced across the key characteristics and evaluated with the surrogate model, which estimates the system's behavior. The simplicity of surrogate-assisted optimization allows the approach to be combined with a variety of optimization algorithms. To obtain optimal tolerances, self-adaptive differential evolution (SADE) is implemented. SADE is a population-based stochastic search technique that has proven to be a robust evolutionary algorithm for global optimization in many real problems [65]. It is an improved version of differential evolution (DE) and proceeds by executing mutation, crossover, and selection operators. The original DE algorithm offers five different learning strategies and requires control parameters such as Itr, Pop, and $P_c$ to be specified. SADE, on the other hand, uses two of the five learning strategies and does not require pre-specified control parameters [66].
Several tests were designed and applied to ensure globally optimal solutions within an efficient time. The test sets and the evaluated results are given in Table 2 and Fig. 14, respectively. The percentage error of each test is evaluated relative to a pilot test with Itr = 1500 and Pop = 500, for which the minimum cost was obtained. Consequently, the optimization approach is tuned using Test_10, with Itr = 200 and Pop = 100 (Table 2).
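For readers unfamiliar with DE, the sketch below implements the classic DE/rand/1/bin scheme (mutation, binomial crossover, greedy selection) on which SADE builds. It is not SADE: the strategy pool and the self-adaptation of the control parameters are omitted, and F, CR, the bounds, and the objective are assumptions. The objective is a hypothetical smooth cost trade-off, a reciprocal processing-cost term plus a linear scrap penalty, standing in for the surrogate-based cost model. `pop_size` and `iterations` play the roles of Pop and Itr.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, iterations=200,
                           F=0.5, CR=0.9, seed=0):
    """Classic DE/rand/1/bin minimiser (not SADE). bounds: list of (lo, hi) per variable."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    for _ in range(iterations):
        for i in range(pop_size):
            # Mutation: pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:  # binomial crossover
                    lo, hi = bounds[j]
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, lo), hi))  # clip to the tolerance range
                else:
                    trial.append(pop[i][j])
            f_trial = objective(trial)
            if f_trial <= fit[i]:  # greedy selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def toy_cost(tols):
    """Hypothetical cost: processing cost 4/t falls and a scrap penalty t rises as t loosens."""
    return sum(4.0 / t + t for t in tols)

best_tols, best_cost = differential_evolution(toy_cost, [(0.5, 10.0)] * 2)
print(best_tols, best_cost)  # analytical optimum of this toy: t = 2 for each variable, cost 8
```

In the surrogate-assisted setting, `objective` would call the trained surrogate to obtain conformity rates and feed them to the cost model. Each evaluation is then cheap enough for the Pop × Itr evaluations that a DE run requires.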

Optimization approach analysis
This section analyzes the application of the optimization approach and its results. The functional behavior of the assembled gears can be characterized through the KTE value. Moreover, the responses of the design and functionality constraints associated with the allocated tolerances must fall within the predefined control level. Therefore, the sensitivity of the approach is studied for various KTE values, where a higher KTE value implies a lower quality level and vice versa. Figure 15 illustrates the correlation between the KTE value and the manufacturing cost. As the designer seeks to enhance the assembled functionality by reducing the KTE value, the tolerances become tighter (Fig. 15e). As a result, manufacturing must deliver tighter and more precise components, so higher quality results in higher manufacturing cost (Fig. 15a). Figure 15a shows the relative change in cost with respect to the minimum manufacturing cost, which is associated with a KTE value of 28 μm. Moreover, Section 3.2 discussed the correlation between the allocated tolerances and the number of defective parts per million; the associated results are shown in Fig. 15b-d. In this study, the crown wheel has a complex design with strict boundaries and smaller dimensions that must be controlled. The optimization approach therefore allocates tighter tolerances on the crown wheel features than on the spur gear to avoid an increase in the number of defective gears and, consequently, in the manufacturing cost. Fig. 15e also highlights the importance of the pitch error in both gears: the pitch error fluctuates more than the other characteristics, which explains its influence on the total cost as well as on the number of defectives.
In conclusion, a surrogate-assisted approach for micro gear design has been developed and explained. The application and the analysis of the results show how the manufacturing cost and the number of defectives evolve with the KTE value as the quality criterion. This approach integrates inexpensive and adaptive tolerance analysis into tolerance allocation optimization.

Fig. 16 The uncertainty present in the tolerancing field

Conclusion and future works
The performance of micro gears is determined by their design, manufacturing, and assembly under the prevailing internal and external conditions. The required behavior of micro gears can often only be realized through the use of high-precision parts. As a result, manufacturers face high quality requirements, cost pressure, and an increasing number of defective variants. Therefore, in this study, a surrogate model was developed for gears that estimates the effects of tolerances on functional behavior. The results show the cost and time efficiency of the proposed surrogate model compared with the simulation model. Next, the relationships between tolerances and manufacturing costs were identified, and an activity-based cost model was proposed to determine the minimum manufacturing cost while assuring the quality level. The cost-tolerance model associates the allocated tolerances with the conformity rates of the gears using the surrogate model and evaluates the corresponding manufacturing cost.
As a by-product of developing the approach, the presence of uncertainty in each of its steps was recognized. The uncertainties and their propagation through the model development establish an uncertainty area surrounding the tolerancing domain. This topic is a vital direction for future scientific study in the tolerancing domain. Figure 16 illustrates the importance of uncertainty mitigation. In addition, this study assesses the assembled gears' conformity under the assumption of random assembly, in which gears are paired randomly regardless of their pairing quality. The pairing quality comprises different criteria, e.g., the mean KTE value of the pairs, the number of pairs, and the KTE dispersion. Future work could integrate assembly strategies into the proposed model.