Online prediction of automotive tempered glass quality using machine learning

This study explores the application of machine learning algorithms for supporting complex product manufacturing quality, with a focus on quality assurance and control. We aim to take advantage of ML techniques to solve one of the complex manufacturing problems of the tempered glass industry, as a first attempt to automate product quality prediction and optimization in this industrial field and as an alternative to destructive testing methodologies. The choice of this application field was motivated by the lack of a robust engineering technique to assess production quality in real time; this raises the need for advanced smart manufacturing solutions such as AI to save the extremely high cost of destructive tests. As methodology, this paper investigates the performance of machine learning techniques, including Ridge Regression, Linear Regression, Light Gradient Boosting Machine, and Lasso Regression, for predicting the product thermal treatment quality within the selected type of industry. In the first part, we applied the selected machine learning models to a manually collected dataset made up of the most relevant process parameters of the heating and quenching processes. Evaluating the results of the applied models, based on several performance indicators such as mean absolute error, mean squared error, and r-squared, showed that Ridge Regression was the most accurate model, with a mean error of 14.33, which is acceptable from a business point of view and not reachable by human-level prediction experience. The second part consists of developing a digitalized device connected to the manufacturing process to provide predictions in real time. This device operates as an error-proofing system that sends a reverse signal to the machine when the prediction indicates a non-compliant quality of the currently processed product. This study can be expanded to predict the optimal process parameters to use when the predicted values do not meet the desired quality, and can advantageously replace the trial-and-error approach generally adopted for defining those parameters. The contribution of our work lies in the introduction of a clear methodology (from idea to industrialization) for the design and deployment of an industrial-grade predictive solution within a new field, namely glass transformation.

Tempered glass must meet the rigorous safety standards stipulated in several homologation programs. The most obvious way to define the safety level of tempered glass [30], according to international regulations, is by counting the number of fragments in the standardized fragmentation test. This destructive test, also called a punch test, consists of impacting a tempered glass piece with a pointed tool at the mid-point of its longest edge. Then, from the breakage pattern, the number of particles is counted within a minimum fragment count area of 50 × 50 mm² (Fig. 1).
From a manufacturing point of view, the glass tempering quality check is a costly process since it is based only on systematic destructive tests. To the best of our knowledge, no non-destructive test methods are practiced in this industry, which may be due to the difficulty of interpreting the fracture test with accuracy.
The goal of our work is to develop a data-driven decision model that helps manufacturing organizations predict the tempering quality in real time (for each processed part) with an acceptable accuracy and, thus, to reduce quality control costs by reducing the number of destructive tests. Such a prediction system offers further advantages, such as enabling early adjustment of the process parameters based on the evolution of the prediction curve.
In the age of the smart factory, or "Industry 4.0", transformation, where advanced manufacturing technologies are going through accelerated development [26], many industrial facilities still do not perceive how this large panoply of concepts should be deployed and connected to deliver the expected value [24]. Taking artificial intelligence as one of the key enablers of smart manufacturing [27], only very few companies have succeeded in establishing some limited use cases of machine learning and AI for decision-making and operation improvement [25]. Other companies fail in deploying AI either for lack of use case ideas, lack of knowledge regarding deployment on shop floors, or the difficulty of scaling a small use case to cover multiple processes in a full production facility [28].
Several studies have been done to demonstrate the feasibility of modeling a manufacturing system using its input in order to predict the outcome or optimize production settings.
However, most such research is performed offline or using disconnected datasets and stops at the discussion of the modeling or the predictive results, which keeps it far from real implementation and prevents its usage in real manufacturing life. Our current work goes one step further and demonstrates an end-to-end solution, from idea to autonomous exploitation in real factory life, passing through the different lifecycle stages of an industrial machine learning project. With such research, several ideas from different industries can benefit from the same study and similarly follow the described methodology to scale up their AI cases.
The rest of the paper is arranged as follows: Section 2 gives a general overview of the thermally tempering process and the fracture phenomena. Next, the industrial case study, the dataset collection, and the methodology adopted for the ML experiments are presented in Section 3. Subsequently, Section 4 presents the main results, followed by a brief interpretation and comparison of the proposed models. The deployment of the final data-driven decision model is then explained in Section 5. Finally, Section 6 provides some concluding remarks.

Thermal tempering of glass
Thermal tempering is a heat treatment process that consists of elevating the temperature of a material to a critical set point for a certain period of time and then allowing it to cool rapidly at a predetermined rate. As a result of this thermal treatment, the glass becomes stronger but, unlike tempered metals, not harder. In order to obtain good-quality tempered glass, precise control of heat transfer during the tempering process is required. This consists of controlling the heating, which takes place through simultaneous radiation and forced convection. Then, the quench rate must be carefully fixed so that the glass is rapidly cooled to below the glass transition temperature [16]. Tempering gives rise to an extreme temperature difference between the surface and the core (mid-plane) of the piece. Therefore, the glass surface and edges cool off before the core, and this causes permanent stresses in the glass [17]. The core is under tensile stress, while the zones close to the glass surfaces are under compressive stress.
Due to this residual stress state, thermally tempered glass shows a greater resistance to mechanical shocks and thermal stresses. Moreover, when breakage occurs, tempered glass breaks into small, blunt fragments, eliminating the risk of dangerous shards. Hence, thermally tempered glass is also known as tempered safety glass.
The residual stress state obtained by the tempering process is approximately parabolically distributed across the thickness of a tempered glass, with a maximum compression at the free surface and a maximum tension at the center, as shown in Fig. 2. It represents a balance between compression in the surface layers and tension in the core (mid-plane). The residual compressive stress at the surface must first be overcome by any externally imposed tensile stress before a net tensile stress, which can cause failure, can occur.
The residual stress distribution along the plate thickness direction (x-axis) is given by:

$$\sigma(x) = s\left(\frac{1}{2} - \frac{3}{2}\,\frac{x^{2}}{l^{2}}\right)$$

where s is the surface compressive stress, l is the half thickness of the glass plate, and x is measured from the mid-plane, so that σ(±l) = −s at the surfaces and σ(0) = s/2 at the core.
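As a numerical check of the profile above, the following sketch (using illustrative values for s and l, with tension taken as positive) confirms the surface compression of −s and the mid-plane tension of s/2:

```python
import numpy as np

def residual_stress(x, s, l):
    """Parabolic residual stress profile across the plate thickness.

    x: position measured from the mid-plane (-l <= x <= l)
    s: surface compressive stress magnitude (MPa)
    l: half thickness of the glass plate (mm)
    """
    return s * (0.5 - 1.5 * (x / l) ** 2)

# Illustrative values: a 4-mm plate (l = 2 mm), 100 MPa surface compression.
l, s = 2.0, 100.0
x = np.linspace(-l, l, 5)
print(residual_stress(x, s, l))
# [-100.  12.5  50.  12.5 -100.] -> compression at the surfaces,
# tension (s/2 = 50 MPa) at the core
```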

Fracture of tempered glass
Tempered glass breaks in use when an external load causes tensile stresses in the glass surface that exceed the surface compression due to tempering by more than the tensile strength of the flawed surface. Fracture may also be initiated by the existence of flaws (such as pores, cracks, and bubbles) in the interior, where tempered glass is always under tension [9]. In both cases, the propagation of the fracture is spontaneous. In addition, if the equilibrated residual stress state within the glass plate is disturbed sufficiently and if the elastic strain energy in the glass is large enough, complete fragmentation of the glass plate will occur, with the creation of many small fragments. Thus, the fragment size depends on the amount of strain energy stored inside the glass: a high stored strain energy causes small fragments, while a low one leads to larger fragments. It is important to mention that the average size of the fragments represents a rough measure of the quenching quality, or the quality of the temper (Fig. 3). More details about the fragmentation process can be found in the literature [1, 2, 8, 9, 14, 18, 19, 22].

Experimental methodology
This section describes the methodology used for developing our ML-based decision model. First, we present the way we constructed the dataset used to train our ML models. Next, we describe the data acquisition process for online prediction. Finally, we present the ML algorithms used for constructing our decision model and set out the evaluation strategy used in the experiments.

Case study
The automotive glass manufacturing industry produces safety glass for cars, buses, trucks, and other transportation systems. The output product is generally either a tempered glass composed of a geometrically curved layer of varying thickness and transparency, or a multi-layer annealed glass strengthened by highly adhesive resins (laminated glass). The production of such parts requires some major material transformation steps, such as the cutting of 2D flat glass shapes on precise CNC machines, the grinding of edges to allow safe handling, and the printing of traceability information on the glass surface. Finally, the pre-processed parts are transferred to the main process operations, which are the heating, the 3D forming using pressing technologies, and then the tempering (or quenching), which consists of applying an accelerated air flow to the hot glass surface to create the desired residual stress state.

The goal of our approach is to construct a sufficiently effective predictive model that learns the physical relationship between the manufacturing settings and the exact fracture (or fragmentation) test result. To reach this goal, we started by selecting the manufacturing variables that can influence the glass tempering quality. Then, we implemented a manual data collection procedure, which was executed over 3 months and covered more than 206 destructive tests; the numerical fracture results were recorded in a dataset together with the manufacturing settings that led to those results. Next, we proceeded to data preprocessing and cleaning, which helped us eliminate the ambiguous or missing values that inevitably came from the manual data filling procedure. After the described steps, we moved on to machine learning algorithm selection and testing, where several models were constructed, tuned, and compared. As a result, three models worth describing in the present paper are derived from the respective algorithms: Multiple Linear Regression, Artificial Neural Networks, and Random Forests. A validation phase was then conducted to decide on the most accurate model as the solution to our problem. Finally, after model development and validation, a model deployment in the manufacturing shop floor was designed to allow autonomous prediction triggering and reaction based on product flow.

Data construction (collection) for training
The applicability of artificial intelligence and, especially, machine learning in the industrial field requires some expertise in selecting the adequate problem to solve. One key consideration in problem selection is data availability, which allows us to develop robust and efficient ML models able to generalize. Unfortunately, in our case study, due to the lack of digitization, there was no data available. Our first major task consisted of constructing and collecting the appropriate data for training our ML models.
The data collection process consists, for every destructive test, of recording the process and product parameters and then noting the fragmentation value (Figs. 4 and 5). The data collection process took 3 months and enabled us to construct a dataset of 206 samples.

Exploratory data analysis (EDA)
The main task that we perform in this step is the correlation test. To find out if two categorical variables are related, we use the well-known chi-square test. In the test that interests us, the null hypothesis is simply "the two variables tested are independent." The test is accompanied by a test statistic which participates in the decision to reject or not the null hypothesis; by construction, this statistic follows a chi-square distribution with a certain number of degrees of freedom.
On the other hand, there is a test to determine if two continuous variables are independent: the Pearson correlation test. The null hypothesis to be tested is identical: "the two variables tested are independent." As for the chi-square test, this one is accompanied by a test statistic and a p-value which determine whether or not the null hypothesis is rejected.
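Both tests are readily available in SciPy; the following sketch illustrates them on synthetic stand-in data (the variable names and values are illustrative, not taken from our dataset):

```python
import numpy as np
from scipy.stats import chi2_contingency, pearsonr

# Chi-square independence test between two categorical variables,
# e.g. hypothetical "glass type" vs. "defect class" counts.
contingency = np.array([[30, 12],
                        [25, 18]])          # illustrative contingency table
chi2, p_cat, dof, _ = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_cat:.3f}")

# Pearson correlation test between two continuous variables,
# e.g. top-side air pressure vs. top-side air temperature.
rng = np.random.default_rng(0)
pressure = rng.normal(2.5, 0.3, 206)        # illustrative measurements
temperature = 20 + 4 * pressure + rng.normal(0, 1, 206)
r, p_cont = pearsonr(pressure, temperature)
print(f"r = {r:.2f}, p = {p_cont:.3g}")     # small p -> reject independence
```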
Since our categorical variables do not have a high impact on the industrial process result, we present hereafter the pair-plot (Fig. 6) and correlation matrix (Fig. 7) for our dataset.
As can be seen, some correlation exists between the values of air pressure and air temperature; this is normal behavior in the pneumatic field: the air temperature increases as pressure increases. We can also notice that the pressure and the air temperature applied on the top side of the glass are correlated, respectively, with the air pressure and air temperature applied on the bottom surface of the part. This comes from the fact that a balance between the top and bottom sides of the part in terms of thermal exchange must be maintained to avoid certain types of defects, such as shape or optical distortion.
We decided to keep the same variables as input and to test different modeling techniques using different machine learning algorithms; some of them, like the Ridge Regression algorithm, already deal with collinearity and multicollinearity.

Machine learning models and results
The application of machine learning (ML) techniques for predicting product quality and process performance in manufacturing companies is at the heart of the Industry 4.0 strategy. Successful ML models can be expected to make a significant positive impact on global production performance. This is only achievable by selecting appropriate ML algorithms, owning meaningful data, and addressing a high value-added application. We investigated several supervised learning algorithms before selecting the appropriate ones to apply in our study. In this section, we present the main results of our study and our findings from these results. We exhibit a comparison of the predictive performance obtained by the ML algorithms under their different configurations. The performance of each of the predictive models was evaluated using several performance indicators such as mean absolute error, mean squared error, root mean squared error, r-squared, mean absolute percentage error, and root mean squared log error. All the models used in this research are developed using Python 3.6 with the PyTorch, Scikit-Learn, NumPy, and Pandas packages.

Ridge Regression
Ridge Regression is a variation of linear regression usually used to handle the problem of multicollinearity in a multiple linear regression context. When the independent variables of a linear regression model are highly correlated, the least squares estimators are unbiased, but their variances are so large that the predictions may be far away from the target values. The Ridge Regression technique consists of applying a regularization penalty, also known as the L2 penalty, to the loss function during training. Thus, the loss function is altered by adding a penalty equivalent to the square of the magnitude of the coefficients multiplied by a penalty term λ, as described in the following equation:

$$L(b) = \sum_{i=1}^{n}\left(y_i - b_0 - \sum_{j=1}^{p} b_j x_{ij}\right)^{2} + \lambda \sum_{j=1}^{p} b_j^{2}$$

which is equivalent to minimizing the loss function in Eq. 3 under the condition described below:

$$\sum_{j=1}^{p} b_j^{2} \leq t$$

Therefore, Ridge Regression puts a constraint on the coefficients $b_j$, so that the optimization function is penalized when the parameters $b_j$ take large values. To sum up, by applying the L2 penalty, Ridge Regression minimizes the standard errors by shrinking the coefficients of the input variables in order to avoid the issue of multicollinearity and enhance the accuracy and reliability of the regression estimates.
Applying the Ridge Regression algorithm to our problem data, with a regularization strength of 1, the default solver, and a tolerance of 0.001, gives the results presented in Table 1. Figure 8 shows the learning curve (bottom right), the residual plot (top right), the prediction error (bottom left), and the predictions vs. real values (top left) of the Ridge Regression model.
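The configuration above can be reproduced with Scikit-Learn as sketched below; the arrays are random stand-ins for our dataset of 206 samples and 9 process parameters:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X = np.random.rand(206, 9)          # stand-in process parameters
y = np.random.rand(206) * 100       # stand-in fragmentation counts

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# alpha = 1.0 (regularization strength), tol = 0.001, default solver,
# matching the settings reported above.
model = Ridge(alpha=1.0, tol=0.001)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, pred))
print("MSE:", mean_squared_error(y_test, pred))
print("R2 :", r2_score(y_test, pred))
```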

Light Gradient Boosting Machine
Light Gradient Boosting Machine, or LightGBM for short, is an extension of the gradient boosting framework based on the decision tree algorithm, providing higher efficiency, faster training speed, and improved predictive performance. The algorithm was first introduced by Ke et al. [10]; it is based on two novel techniques, namely, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). GOSS is a new sampling method for filtering data instances that focuses on instances with larger gradients while performing random sampling on instances with small gradients to find the best split value. As for EFB, it is a near-lossless approach for reducing the number of effective features by merging sparse, mutually exclusive features and treating them as a single feature. Together, the two techniques provide an efficient and effective implementation of the gradient boosting algorithm, speeding up the learning procedure and enhancing the capability of handling large-scale data.
Unlike other boosting algorithms that split the tree level-wise, LightGBM splits the tree leaf-wise, which means growing the tree by splitting the data at the nodes with the highest loss change (Fig. 9). That is, the leaf-wise algorithm can reduce more loss than the level-wise algorithm and hence achieve much better accuracy compared to other boosting techniques. However, when dealing with smaller datasets, the leaf-wise algorithm may lead to overfitting and increase the model complexity; in that case, level-wise growth can be a good alternative.
The results of the LightGBM model are presented in Table 2. The model interpretation plot (Fig. 10) using SHAP (SHapley Additive exPlanations) values shows the high impact of the air pressure and the air temperature applied on the top surface of the part. The processing time of the part also has a high impact on the prediction result. Figure 11 shows the learning curve (bottom right), the residual plot (top right), the prediction error (bottom left), and the predictions vs. real values (top left) of the Light Gradient Boosting model.
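A minimal sketch of this model and its SHAP interpretation is given below, again on stand-in data; hyperparameters are left at the LightGBM defaults, as reported for the untuned models:

```python
import numpy as np
import lightgbm as lgb
import shap

X = np.random.rand(206, 9)          # stand-in process parameters
y = np.random.rand(206) * 100       # stand-in fragmentation counts

# Default LightGBM regressor; trees are grown leaf-wise by default.
model = lgb.LGBMRegressor(random_state=42)
model.fit(X, y)

# SHAP values rank the impact of each feature, as in Fig. 10.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```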

Lasso Regression
Just like Ridge Regression, Lasso Regression is another variation of linear regression that uses shrinkage to pull data values towards a central point, such as the mean. This particular type of regression uses the L1 regularization technique, adding a penalty equivalent to the absolute value of the magnitude of the coefficients. The Lasso technique is of high interest when dealing with multicollinearity issues and also for performing feature selection. It consists of identifying the variables, and their corresponding regression parameters, that provide more accurate predictions. This is achieved by applying a constraint on the model parameters that shrinks the regression coefficients towards zero. The loss function for Lasso Regression can be written as:

$$L(b) = \sum_{i=1}^{n}\left(y_i - b_0 - \sum_{j=1}^{p} b_j x_{ij}\right)^{2} + \lambda \sum_{j=1}^{p} |b_j|$$

This type of regularization imposes a penalty that forces the sum of the absolute values of the coefficient magnitudes to be less than a fixed value t, denoted as the amount of shrinkage. Unlike Ridge regularization, Lasso regularization can lead to zero coefficients that are eliminated from the model, which is suitable for producing simpler models with few coefficients.
It is equivalent to minimizing the loss function in Eq. 3 under the condition described below:

$$\sum_{j=1}^{p} |b_j| \leq t$$

Using the Lasso model with the parameters below gives the results displayed in Table 3. Figure 12 shows the learning curve (bottom right), the residual plot (top right), the prediction error (bottom left), and the predictions vs. real values (top left) of the Lasso Regression model.
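The feature selection effect of the L1 penalty can be observed directly: with a sufficiently large alpha, some coefficients are driven exactly to zero, as this stand-in sketch shows:

```python
import numpy as np
from sklearn.linear_model import Lasso

X = np.random.rand(206, 9)          # stand-in process parameters
y = np.random.rand(206) * 100       # stand-in fragmentation counts

# Larger alpha -> stronger shrinkage -> more coefficients exactly zero.
for alpha in (0.1, 1.0, 10.0):
    model = Lasso(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:5.1f}: {np.sum(model.coef_ != 0)} non-zero coefficients")
```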

Multiple Linear Regression (MLR)
The Multiple Linear Regression (MLR) model summarizes the relationship between a set of predictor variables and a response variable, also called a criterion. It involves the estimation of a multiple regression equation whose parameters enter linearly and are estimated by the least squares method [15]. The most general form of the regression equation can be expressed as follows:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n + \epsilon$$

where y represents the dependent variable, $x_i$ represents the i-th independent variable (or explanatory variable), $b_i$ is the i-th partial regression coefficient (or regression weight), $b_0$ is the regression intercept, and $\epsilon$ is the error term (or possible variation form).
As mentioned before, the MLR model is constructed using the least squares method as the objective function, with the goal of minimizing the sum of squared errors between the expected and predicted outputs, as described in the following equation:

$$\min_{b}\ \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2} \quad (3)$$

MLR is seen as one of the most classical black-box ML models used for prediction tasks [5]. It is intuitive and easy to handle, with strong generalization and self-learning abilities, which makes it popular for many real-time applications, especially industrial ones. Black-box methods generally do not require any knowledge of the physical phenomena, and thus these methods, MLR among them, provide good predictions without excessive computational cost.
Besides its self-learning abilities and high degree of generalization, MLR is used in our study as one of the models most suited to linear problems, since we do not know in advance whether our problem is linear or non-linear. This makes MLR a well-founded option for comparison purposes. Here, we used the MLR implementation proposed in the Scikit-Learn package.
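As an illustration of the least squares estimation, the following sketch computes the closed-form solution with NumPy and verifies that it matches the Scikit-Learn estimator (stand-in data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.rand(206, 9)          # stand-in process parameters
y = np.random.rand(206) * 100       # stand-in fragmentation counts

# Closed-form least squares with an intercept column prepended.
Xb = np.hstack([np.ones((len(X), 1)), X])
b, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# The same fit via Scikit-Learn, as used in the study.
mlr = LinearRegression().fit(X, y)
print(np.allclose(b[0], mlr.intercept_), np.allclose(b[1:], mlr.coef_))  # True True
```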

Random Forest Regression
Random Forest (RF) is a supervised machine learning algorithm used for both prediction and classification problems [6] that was proposed by Breiman in 2001 [4]. Known as a bagging ensemble learning technique, RF is made up of a number of decision trees (Fig. 14) such that each tree is generated from a randomly selected subset of the same training data, with replacement. The RF predictions are then produced by aggregating the results (outputs) of all the individual trees that form the forest. Robustness to noise, high interpretability, and insusceptibility to overfitting are key advantages of RF over other traditional ML models. Additionally, RF is an efficient ML model that requires very little data preprocessing and feature selection, owing to the way in which it is constructed. For real and, especially, industrial applications, RF is one of the most popular and most used ML algorithms in the literature because of its ability to deal with high-dimensional and complex datasets, which made it a natural candidate for our study. Our RF model is implemented using 120 trees; the number of trees was decided by trial and error using the mean absolute error (MAE) as a metric. The other hyperparameters were left at the Scikit-Learn defaults. For a better and unbiased model, we used a cross-validation strategy [3]. The K-fold cross-validation technique was adopted since it is simple and easy to use. It consists of randomly dividing the data into k folds of similar size. In our case, we divided our data into 10 folds: 9 folds are used to train the RF model and the remaining fold to test it. Table 5 presents the performance metrics of the best model. In addition, the model's SHAP values show a slightly different interpretation (Fig. 15) compared to the Light Gradient Boosting algorithm, since they put in evidence the pressure and temperature on both the top and bottom surfaces of the part and give less importance to the processing time of the part.
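The described setup, 120 trees with Scikit-Learn defaults evaluated by 10-fold cross-validation on the MAE, corresponds to the following sketch (stand-in data):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X = np.random.rand(206, 9)          # stand-in process parameters
y = np.random.rand(206) * 100       # stand-in fragmentation counts

# 120 trees (chosen by trial and error); other hyperparameters default.
rf = RandomForestRegressor(n_estimators=120, random_state=42)

# 10-fold cross-validation scored with the mean absolute error.
mae = -cross_val_score(rf, X, y, cv=10, scoring="neg_mean_absolute_error")
print(f"MAE over 10 folds: {mae.mean():.2f} +/- {mae.std():.2f}")
```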

Artificial Neural Networks (ANN)
Artificial Neural Networks (ANN) have been widely used to deal with both classification and prediction problems. They are known for their high accuracy, which makes them valuable for handling complex problems without requiring an exact mathematical description of the underlying phenomena, and for their strong ability to generalize to the unseen part of a population, especially when the sample data contain noisy information. ANNs are inspired by biological neural networks. In the present work, a feed-forward network trained with the back propagation algorithm is adopted as a particular ANN class [20]. It consists of three building blocks, also called layers, namely, the input layer, the hidden layers, and the output layer. Each layer is composed of multiple neurons (Fig. 17) that are connected with the neurons of the other layers. The training process of the feed-forward network consists of propagating information, with its associated weights, from the input layer, through all hidden layers, to the output layer. Then, a back propagation step adjusts the weights, using gradient descent, based on the residual error between the simulated values of the network and the target outputs [21]. The transfer function that connects the neurons of a layer to the neurons of the previous and subsequent layers is expressed as follows:

$$y = f\left(\sum_{i} w_i x_i + b\right)$$

where f denotes the transfer function, y is the output of the neuron, b is the bias value of the neuron, x is the input vector of the neuron, and w is the weight vector of the neuron.
A key consideration in constructing robust and powerful ANN model is to determine correctly the structure used in terms of number of nodes and hidden layers.
Our ANN model is implemented using an input layer of 9 neurons, which represent our input features, two hidden layers, and an output layer of one node, since we are predicting one final value (the fragmentation count). The number of hidden layers was chosen based on the literature recommendation that 2 hidden layers can approximate any complex relation between a set of input variables and output variables. For the number of neurons in each hidden layer, we tried different configurations, some using trial and error and others using heuristics and meta-heuristics [11, 13, 23]. Finally, we adopted the three configurations that gave the best results; one of them used Huang's network architecture for a two-hidden-layer feed-forward network (TLFN) [7, 12]. The ANN model was constructed using the PyTorch package, and its architecture under its three configurations is presented in Table 6.
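A minimal PyTorch sketch of such a network is shown below; the hidden layer sizes, learning rate, and number of epochs are illustrative placeholders, as Table 6 lists the actual configurations:

```python
import torch
import torch.nn as nn

# Feed-forward network: 9 input features, two hidden layers, one output
# node (the fragmentation count). Hidden sizes here are illustrative.
model = nn.Sequential(
    nn.Linear(9, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1),
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.rand(206, 9)              # stand-in process parameters
y = torch.rand(206, 1) * 100        # stand-in fragmentation counts

for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(X), y)   # forward pass
    loss.backward()                 # back propagation of the error
    optimizer.step()                # gradient descent weight update
```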
The performance indices of the best constructed ANN models are presented in Table 7. Figure 18 presents the loss curves for both the training and test sets for the best configuration. For each configuration, the two losses follow the same decreasing trend, which reflects the stability and the performance of the model.

Hyper parameter tuning and optimization for the machine learning models
In the previous part, we presented the selection of machine learning models that were tested on our dataset, and we compared several learning and prediction metrics in addition to a relevant set of visualizations. Those models were established using default parameters, which means that we had not yet performed the model optimization and hyperparameter tuning step, which is the main topic of this section. The target is to search for even better results from the previous models using an additional optimization step.
In this study, we perform a ten-fold cross-validation and then compare the tuning results over 100 iterations, which leads to 1000 fits for each model (each candidate setting being evaluated on the 10 folds), with selection of the best iteration result. The results are presented using state-of-the-art metrics (or KPIs: key performance indicators). In addition, the computation cost of the tuning process is noted and expressed in seconds. The optimization target function was defined as the MAE (mean absolute error). The optimized Ridge model shows a very slight improvement in terms of mean absolute error and standard deviation, which go respectively from 14.78 and 3.99 to 14.76 and 3.97 (Table 8).
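The paper does not name the search tool; one way to reproduce the 100 iterations × 10 folds = 1000 fits scheme is Scikit-Learn's RandomizedSearchCV, sketched here for the Ridge model on stand-in data (the search range for alpha is an assumption):

```python
import numpy as np
from scipy.stats import uniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

X = np.random.rand(206, 9)          # stand-in process parameters
y = np.random.rand(206) * 100       # stand-in fragmentation counts

# 100 candidate settings x 10 folds = 1000 fits, scored with the MAE.
search = RandomizedSearchCV(
    Ridge(),
    param_distributions={"alpha": uniform(0.01, 10)},  # assumed search range
    n_iter=100,
    cv=10,
    scoring="neg_mean_absolute_error",
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```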
The optimization of the Ridge Regression model using 1000 fits was executed in 1.2 s, which is relatively fast compared to the remaining models. The main change in the model parameters is the value of alpha, which becomes 4.03 in the optimized model compared to 1.0 in the default model.

The optimized LGBM model shows an observable improvement in terms of mean absolute error, going from 15.21 to 14.33, which is of great value from a business point of view (Table 9).
The optimization of the LGBM model using 1000 fits was executed in 3.5 s, which is relatively longer than for the Ridge model. This can be explained by the higher number of parameters to tune in this algorithm. In Table 10, we present the main changes in the tuned model compared to the default model. Table 11 shows the KPIs of the tuned Lasso model. The value of the MAE is noticeably improved (14.74) compared to the default parameter settings, which gave an MAE of 15.28. A slight improvement is also observed in the standard deviation over the cross-validation. The main change is in the alpha value, which was reduced from 1 to 0.3 in the optimized version. In terms of computation time, the Lasso model executed the iterations in 1.5 s.
The next table (Table 12) displays the tuning results of the Random Forest model, which also improved after the optimization process, with the MAE decreasing from 15.30 for the previous model to 14.47 for the tuned model. This result can be considered satisfying from a business point of view, since we are talking about the real prediction error of the model. It is true that this tuning process takes much longer than the other models' tuning, at 167 s, but this is due to the nature of the Random Forest algorithm and how it works.
In Table 13, we present the values of the Random Forest hyperparameters before and after tuning. The tuning of the remaining models, namely, Multiple Linear Regression and the Multi-layer Perceptron, is not presented in this study, since the previously selected settings still give the best results over the other sets of parameter values that were tested.
The ANN model optimization, as a special case, could be investigated even further since it represents a large field of research; we cannot state that our selected parameters give the best possible result, but we can consider them an acceptable balance between result and optimization cost.

Comparative analysis
The models introduced in the prior sections were all compared for the purpose of providing precise predictions of the thermal quality value in real time with a high degree of generalization. The quality value was more closely approximated using Ridge Regression with the cross-validation technique than with the remaining ML models. This conclusion is confirmed by comparing the mean absolute error (MAE), the mean squared error (MSE), the root mean squared error (RMSE), and the r-squared (R²) between predictions and targets of the discussed models. These metrics for the Ridge Regression model are considerably smaller; in the second and third positions, we observe an acceptable performance of Multiple Linear Regression and Light Gradient Boosting Machine. Artificial Neural Networks, despite being robust and more suitable for handling non-linear problems, present comparatively lower accuracy than the previous algorithms. This could be explained by the fact that ANN models generally require a large dataset, which is not satisfied in our case since our dataset does not exceed 206 samples.

Statistical analysis
Comparing machine learning methods and selecting the best model is a critical operation in applied machine learning.
Models are generally assessed using resampling approaches like K-fold cross-validation, from which mean KPI scores are calculated and compared directly. While simple, this method can be misleading, as it is hard to recognize whether the difference between mean KPI scores is real or the result of statistical chance.
Statistical significance tests are used to address this problem: they quantify the probability of observing the samples of skill scores under the assumption that they were drawn from the same distribution. If this assumption, or null hypothesis, is rejected, it suggests that the difference in skill scores is statistically significant.

Null hypothesis
There is no difference between the performance scores of the Ridge, LGBM, Lasso, LR, RF, and ANN models.

Alternative hypothesis
There is a difference between the performance scores of the Ridge, LGBM, Lasso, LR, RF, and ANN models.

Summary of ANOVA test
A one-factor analysis of variance with repeated measures showed that there was no significant difference between the models, F = 0.68, p = 0.642. Thus, the null hypothesis is retained at a significance level of 0.05. The models' performance variation over the 10 folds is shown in Fig. 20. However, the ANOVA test does not map out which groups differ from which others: as the hypotheses above show, if we can reject the null, we only know that not all of the means are equal. Sometimes, we need to know which groups are significantly different from the others. Table 14 and Table 15 (post-hoc tests) help us see the mean difference between each pair of models.
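For illustration, a simplified version of this comparison can be run with SciPy's one-way ANOVA on the fold-wise scores (the study's repeated-measures design would call for a paired procedure such as statsmodels' AnovaRM; the numbers below are illustrative, not the study's actual fold scores):

```python
from scipy.stats import f_oneway

# Fold-wise MAE scores over the 10 CV folds (three of the six models
# shown; values are illustrative).
ridge = [14.1, 15.0, 13.8, 14.9, 14.5, 15.2, 13.9, 14.7, 14.4, 15.1]
lgbm  = [14.6, 15.3, 14.0, 15.1, 14.2, 15.5, 14.1, 14.9, 14.6, 15.0]
rf    = [14.8, 15.6, 14.3, 15.0, 14.7, 15.4, 14.5, 15.2, 14.4, 15.3]

F, p = f_oneway(ridge, lgbm, rf)
print(f"F = {F:.2f}, p = {p:.3f}")  # p > 0.05 -> retain the null hypothesis
```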
Extended comparison
In this part, we present an extended comparison using additional algorithms. The goal is to explore more results and understand how a set of classical and modern algorithms perform on our dataset. Table 16 shows the cross-validated mean values of the mean absolute error, mean squared error, root mean squared error, r-squared, etc., obtained from models created with the listed algorithms. The second table (Table 17) shows the standard deviation of the same metrics and gives an idea about the models' stability and generalization capacity. As an overall conclusion, we can notice that the Ridge Regression-based model still shows the best result over the 14 tested algorithms.
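Such a comparison table can be produced by looping the same cross-validation over a list of estimators, as in this sketch (a subset of the 14 algorithms, on stand-in data):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import Lars, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

X = np.random.rand(206, 9)          # stand-in process parameters
y = np.random.rand(206) * 100       # stand-in fragmentation counts

models = {
    "Ridge": Ridge(), "Lasso": Lasso(), "LR": LinearRegression(),
    "Lars": Lars(), "RF": RandomForestRegressor(random_state=42),
    "Extra Trees": ExtraTreesRegressor(random_state=42),
}
for name, model in models.items():
    mae = -cross_val_score(model, X, y, cv=10,
                           scoring="neg_mean_absolute_error")
    print(f"{name:12s} MAE {mae.mean():6.2f} +/- {mae.std():.2f}")
```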
For the same comparison purpose, we explore the models' plots through the residual value graph (Fig. 21), the prediction error graph (Fig. 22), and the predicted vs. real values (Fig. 23). These plots give some additional insights, such as the severe overfitting shown by the Extra Trees Regression and the non-convergence of the Lars algorithm for our problem type.

Real-time data acquisition
The next step after ML model training is data acquisition for performing real-time predictions. It consists of building a pipeline to acquire the parameters automatically and use them for predicting the tempering quality of the current glass during part processing. The contribution of this section is to show a clear roadmap for the integration of a predictive model into a complex manufacturing process. The pipeline connects directly to the machine's programmable logic controller (PLC) in order to capture the measurements and then send them to the prediction software. Once the online prediction is performed, the software displays the product quality value. The software is also able to turn off the machine in the case where the prediction shows a non-compliant tempering quality.

Physical validation of the model
After the offline validation of the predictive model using the test and validation data, the launch of the model in production required an additional follow-up, in which the production-level predictions were cross-checked against destructive testing and physical quality checks. This step showed that the model error matches the actual average difference observed in practice, which is convincing and encouraging for going further in the deployment steps.

Industrial connectivity of the prediction system with the physical process
As shown in Fig. 24, the industrialization of our glass tempering quality prediction model requires a digitalized link with the tempering machine in order to be triggered by the arrival of a new part, pull the processing parameters, perform the prediction, and communicate the result. Our final system, in addition to the previous connectivity features, also includes a control function that allows the automatic stop of the machine in case the prediction results show a bad tempering quality. All the previous functions were implemented using the PLC (programmable logic controller) manufacturer's communication protocol, in our case Ethernet TCP/IP.

Online prediction of automotive tempered glass
Our computer-based system hosts the predictive model behind a graphical user interface. This GUI displays the parameter values at a 200 ms refresh rate and waits for a trigger to perform and visualize the prediction result. The trigger is enabled by a virtual tracking of the workpiece at the time the quenching cycle starts. The actual production rate is 300 pieces per hour, which means our predictive system is also computing and delivering 300 values per hour, plotted as a visual evolution curve. The predicted value is compared with a minimum threshold to judge whether the piece presents a good quality or not. The client PC (predictive system) communicates with the main PLC through the Ethernet/IP protocol, which is one of the robust industrial communication solutions. A router is installed to expose the PLC variables; our application then connects in real time through a Wi-Fi network, pulling the measurement data and feeding it to the model (Fig. 25).
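The acquisition-and-reaction loop can be sketched as follows, assuming the pycomm3 library (an EtherNet/IP client for Logix-family PLCs); the IP address, tag names, model file, and threshold are hypothetical placeholders, not our actual configuration:

```python
import time
import joblib
from pycomm3 import LogixDriver

model = joblib.load("tempering_model.pkl")   # hypothetical serialized model
TAGS = ["AirPressureTop", "AirTempTop",      # hypothetical PLC tag names
        "AirPressureBottom", "AirTempBottom", "ProcessingTime"]
MIN_THRESHOLD = 40                           # hypothetical minimum quality value

with LogixDriver("192.168.1.10") as plc:     # PLC exposed through the router
    while True:
        if plc.read("QuenchCycleStart").value:        # part-arrival trigger
            values = [plc.read(tag).value for tag in TAGS]
            prediction = model.predict([values])[0]
            print("predicted tempering quality:", prediction)
            if prediction < MIN_THRESHOLD:            # non-compliant quality
                plc.write(("MachineStopRequest", True))  # reverse stop signal
        time.sleep(0.2)                               # 200 ms refresh rate
```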
The Ethernet/IP protocol was created in 2001 and is increasingly adopted today. It is a proven industrial Ethernet network solution available for industrial automation.
Ethernet/IP is a member of a family of networks that implements the Common Industrial Protocol (CIP) at its upper layers. CIP encompasses a comprehensive suite of messages and services for industrial automation applications including control, security, timing, motion, configuration, and information.
In the same way as our predictive system, the main SCADA system also acquires data through the same protocol, using a physical cable. Supervisory control and data acquisition systems collect data from various sensors at a factory, plant, or other remote locations and then send these data to a central computer, which manages and controls them. Traditionally, SCADA was designed to run on a private network using line communication. As the scope becomes larger and line communication becomes impractical, wireless communication was integrated into SCADA.

Conclusion
In this paper, the performance of several machine learning techniques, including Ridge Regression, Light Gradient Boosting Machine, Random Forest, Lasso Regression, and Artificial Neural Networks, was evaluated to predict the glass thermal tempering quality. To this end, a data collection step was performed in order to produce a suitable dataset, followed by the development of the most relevant ML models able to deal with the complexity of our industrial case study. The ultimate goal of our work is to show an end-to-end practical methodology for applying manufacturing quality prediction in complex manufacturing processes, which is not clearly covered in the literature. The selected use case justifies the high added value of such techniques, which can replace traditional costly destructive testing. Results indicated that the applied models have varying performances for predicting thermal tempering quality; however, the Ridge Regression algorithm presented the best overall predictive performance on the test examples, which is acceptable for practical purposes. Taking this a step forward, we performed an additional follow-up of the new predictions, getting them cross-checked by destructive testing and physical quality evaluation. This step confirmed our findings about the model performance. Finally, we constructed a digitalized device that contains our best performing ML model for predicting product quality in real time. The device is connected to the machine's PLC in order to be triggered by the arrival of a new part, pull the processing parameters, perform the prediction, and then communicate the result. It is also designed as an error-proofing system performing a control function and sending a reverse signal to stop the machine in case of anomaly. This work can be extended by adding several optimization techniques to allow the proposal of ideal manufacturing settings that keep the process within the safe interval.

Fig. 22 Error plot for the tested algorithms (y-axis, predicted value; x-axis, real value)

Fig. 23 Prediction plot for the tested algorithms (y-axis, tempering quality; x-axis, observations; green, predicted; blue, real)
Author contribution The authors' contributions are as follows: Abdelmoula Khdoudi: Conceptualization and design of study, implementation, writing-original draft, results analysis, and approval of the version of the manuscript to be published. Noureddine Barka: Conceptualization and design of study, interpretation of results, writing-reviewing and editing, validation, and approval of the version of the manuscript to be published. Tawfik Masrour: Conceptualization and design of study, interpretation of results, writing-reviewing and editing, validation, and approval of the version of the manuscript to  be published. Ibtissam El-Hassani: Conceptualization and design of study, interpretation of results, writing-reviewing and editing, validation, and approval of the version of the manuscript to be published. Choumicha El Mazgualdi: Conceptualization and design of study, interpretation of results, writing-reviewing and editing, validation, and approval of the version of the manuscript to be published.
Data availability All data, material, and codes used in this paper are available.

Declarations
Ethical approval This article does not involve human or animal participation or data; therefore, ethics approval is not applicable.

Consent to participate
This article does not involve human or animal participation or data; therefore, consent to participate is not applicable.

Consent for publication
This article does not involve human or animal participation or data; therefore, consent to publication is not applicable.

Competing interests
The authors declare no competing interests.