In this section, we describe our data-analytics-based factory operation strategies for solving quality-related problems [P1]−[P4] in detail. In Sect. 3.1, the data that were utilized for establishing the proposed factory operation strategies are presented. In Sect. 3.2, the details of the strategy for addressing each problem are given. Finally, in Sect. 3.3, the proposed data-analytics-based factory operation strategies are summarized.
3.1. Data utilized for the proposed data-analytics-driven factory operation strategies
As this paper targets die-casting factories and presents data-analytics-driven operation management strategies for solving their quality-related problems, the die-casting factory itself should first be understood, and the rationale for selecting the data utilized in this study should be explained. Die-casting factories carry out the typical sequential production processes of casting, fitting, blasting, machining, and inspection. Raw materials are melted and forced into a mold under high pressure, forming the approximate shape of the product. In the fitting and blasting processes, the impurities and residue are then detached from the product, and its rough surface is made smooth. In the machining process, the delicate shape of the product is formed with a computer numerical control machine. In the final inspection process, the quality of the finished product is examined, and the product is released. At each stage of the production process, a large amount of data is generated and can be collected. As this study focused on the quality-related problems concerning the casting process in particular, we discuss herein the data generated when the casting process is conducted.
The casting-related data that can be utilized as independent variables of data analytics can be categorized into two groups: (1) casting parameter data; and (2) sensor data. When a die-casting machine casts a product, the values of the casting parameters (e.g., casting pressure, injection velocity, physical strength) generated during the product casting are measured and provided to the users through a built-in data interface module. Although the types of casting parameter data that can be provided by the data interface module differ depending on the type and age of the die-casting machine, the primary casting parameters are managed by the data interface modules of all die-casting machines. To collect the other data that cannot be provided by the machine, sensors can be attached to the die-casting machine and the peripheral machines (devices). Data like the temperature and pressure of the molds, the temperature of the heating furnace and coolant (which assist the die-casting machine), and the temperature and humidity of the factory can be collected by the sensors. The sensor data can be utilized as important independent variables alongside the casting parameter data. This study assumed that the die-casting factories that would apply the proposed data-analytics-driven factory operation strategies already have the infrastructure for collecting the casting parameter and sensor data.
The dependent variables utilized for the proposed data-analytics-driven factory operation strategies were selected considering the problems defined earlier. The problems that require dependent variables for constructing data analytics models that can diagnose and predict the statuses of the casting process and product are [P2] and [P3]. [P2] needs outcome data on whether a preheat shot was produced or not, and [P3] needs quality inspection data indicating whether a product is defective or not. This means that the outcome data of the preheat shots and the product quality have to be secured and matched with the values of the relevant independent variables to apply the data-analytics-driven factory operation strategies. This study also assumed that die-casting factories have already collected the values of the dependent variables and matched these with the values of the independent variables for a certain period before carrying out the proposed strategies.
3.2. Details of the proposed factory operation strategies for solving each quality-related problem
In this section, we present the details of the proposed data-analytics-driven factory operation strategies for solving each of the quality-related problems defined earlier. Crucial information on the following is needed to understand the proposed strategies and to apply them to die-casting factories: (1) the input and output data; (2) data preprocessing; (3) the data analytics method used; and (4) implementation of the strategy. For each quality-related problem, the proposed factory operation strategy that can address it will be described in terms of these four categories of information.
3.2.1. Proposed factory operation strategy for solving [P1]
The purpose of the proposed factory operation strategy for solving [P1] is to understand the relationship between the casting parameter and sensor data and to derive some insights on managing the gaps between the casting parameter input and output data. In the strategy, statistical analysis rather than advanced machine-learning-based technologies is useful and should be conducted. By understanding the factory data through statistical analysis, the proper casting parameter output values for the production of fair-quality products and the gap allowance can be derived, along with insights like when a gap occurs often and which data are relevant. The main tasks for the strategy are as follows: (1) implementation of exploratory data analysis (EDA) for investigating the trends and correlations between the data; (2) deduction of the optimal casting parameter output values for the production of fair-quality products; and (3) deduction of the upper and lower control limits for casting parameter input–output gap management.
3.2.1.1. Input and output data
In die-casting factories, the casting parameter input values are changed constantly, whereas the sensor data input values are usually fixed. This is because the ambient sensor data (e.g., the temperature and humidity of the factory) cannot be controlled in practice, and the sensor data of the peripheral machines (e.g., the temperature of the coolant and the heating furnace) are operated at relatively constant set values, so only their output values are utilized. The types of input data needed for the proposed data-analytics-driven strategy for solving [P1] should be decided considering this environment. The casting parameter input and output values and the sensor data output values are basically needed, and the quality classification of each shot is also needed as input data of the strategy. The input data should be managed with a time stamp for integration with the relevant data. An issue that needs to be addressed is that data on the temperature and humidity of the factory should be gathered at multiple spots in the factory. The factory temperature and humidity are usually revealed to be crucial data affecting the casting quality, so these data should be managed well as input data of the strategy. The temperature and humidity of the factory, however, are measured differently according to the locations of the sensors in the factory, such as a section near the factory exit and a section in the middle of the factory. The factory should therefore be properly divided into sections considering the arrangement of the exits, windows, and machines, and the data of each section have to be considered input data of the strategy.
Through the implementation of the proposed data-analytics-driven factory operation strategy for solving [P1] using the aforementioned input data, the following output data were found to be useful for enhancing the quality of die-casting factories' products: the statistical values of each casting parameter and sensor data, the correlation coefficients between the data, the optimal output value of each casting parameter, and the upper and lower control limits of each casting parameter. The statistical values and correlation coefficients can be used as quantitative data-analytics-driven factors for understanding the die-casting factory, and the optimal casting parameter output values and control limits can be utilized as factors for closing the gaps between the casting parameter input and output values.
3.2.1.2. Data preprocessing
What has to be done first in the data preprocessing for the proposed data-analytics-driven factory operation strategies is to integrate the input data and generate the datasets to be utilized for statistical analysis. The keys that can integrate the relevant casting parameter input and output data, sensor data, and quality classification are the time stamp and the shot ID. The casting parameter data and quality classification can be matched via the shot ID, and the casting parameter and sensor data can be matched via the time stamp. As the casting parameter and sensor data often have different sampling rates, data interpolation should be done when they are integrated via the time stamp. For example, let us assume that the casting parameter and sensor data are collected every 30 seconds and every 1 minute, respectively. In this case, every other casting parameter record has no paired sensor value. The sensor data utilized as input data of the strategy are continuous time series data, so an interpolated value (e.g., the average of the previous and next sensor values around the target casting parameter's time stamp) can be matched with the casting parameter value.
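To make the time-stamp-based integration concrete, the following is a minimal sketch in Python with pandas, using the sampling rates from the example above; all column names (timestamp, shot_id, casting_pressure, coolant_temp) are hypothetical. The linear time interpolation realizes the averaging of the previous and next sensor values described here.

```python
import pandas as pd

# Hypothetical example: casting parameters every 30 s, sensor data every 60 s.
casting = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01 08:00", periods=6, freq="30s"),
    "shot_id": range(1, 7),
    "casting_pressure": [310, 312, 309, 315, 311, 308],
})
sensor = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01 08:00", periods=3, freq="60s"),
    "coolant_temp": [24.0, 24.4, 24.8],
})

# Put the sensor series on the union of both time grids, interpolate linearly
# in time, and keep only the casting time stamps, so that every shot record
# gets a paired (possibly interpolated) sensor value.
s = sensor.set_index("timestamp")["coolant_temp"]
s = s.reindex(s.index.union(casting["timestamp"])).interpolate(method="time")
casting["coolant_temp"] = s.loc[casting["timestamp"]].to_numpy()
print(casting)
```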
As real field data contain considerable noise and many missing values, the integrated input data have to be filtered through data preprocessing. Recall that one of the primary purposes of the proposed strategy for solving [P1] is to understand the generalized environment of the factory. In other words, the data to be examined are the integrated input data of the strategy trimmed through outlier elimination and missing-value treatment, which make the data representative of the factory's generalized circumstances. When outlier elimination is conducted, outliers have to be detected within each subset of the strategy's input data classified by quality (e.g., fair quality and each defective type). Because the data distribution is observed to differ according to the quality of the product, ignoring this classification can result in the neglect of the data characteristics: some trends of the casting parameter or sensor data associated with a certain quality-related problem may be treated as outliers. For each classified casting parameter and sensor data, outliers can be determined with the interquartile range (IQR) of the data, calculated as IQR = (3rd quartile) − (1st quartile). Data lower than (1st quartile) − 1.5×IQR or higher than (3rd quartile) + 1.5×IQR can be regarded as outliers, and the outliers should be eliminated. Although there are numerous interpolation-based methods for treating missing values, interpolation-based missing-value treatment may distort the data trends that the analysis aims to comprehend. Thus, we recommend that the missing values and outliers simply be eliminated.
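A sketch of the recommended quality-classified outlier elimination, assuming a pandas DataFrame `data` with hypothetical quality and value columns:

```python
import pandas as pd

def remove_iqr_outliers(df: pd.DataFrame, value_cols, k: float = 1.5):
    """Drop rows whose value in any column falls outside Q1-k*IQR..Q3+k*IQR."""
    mask = pd.Series(True, index=df.index)
    for col in value_cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

value_cols = ["casting_pressure", "injection_velocity"]  # hypothetical names
cleaned = (
    data.dropna(subset=value_cols)            # eliminate missing values
        .groupby("quality", group_keys=False)  # detect outliers per quality class
        .apply(remove_iqr_outliers, value_cols=value_cols)
)
```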
3.2.1.3. Data analytics method
In this subsection, we present how data analytics is conducted in the proposed strategy for solving [P1] in relation to the three tasks defined above. An issue that has to be discussed before the details of the data analytics method is how to group the input data of the strategy for statistical analysis. As the distributions and trends of the casting parameter and sensor data are deeply correlated with time, statistical analysis should be conducted after grouping the data based on time-series-based criteria. Moreover, although the EDA results sometimes cannot show certain data trends when EDA is conducted with single-unit data values, EDA utilizing statistical values like the average or standard deviation of multiple data values may provide meaningful insights. Therefore, statistical analysis should be conducted on data grouped according to time-series-based criteria, such as hours, day and night, weekdays and weekends, months, and seasons, or on a certain number of successive data.
For the first defined task (i.e., the implementation of EDA for investigating the trends and correlations between the data), the most popular and the simplest way of determining the distribution and tendency of the data is to visualize the data. For the visualization methods, we can use a time series graph and a box plot. By observing the data with the time series graph, the statistical information of several cases can be determined. For example, a case in which a certain parameter shows a periodic increase or decrease according to the time can be identified, along with how a certain parameter increases or decreases when the machine status is changed. With the box plot, approximate information regarding the distribution of the data can be obtained.
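A minimal visualization sketch with matplotlib, under the same hypothetical column names, showing a time series graph of one casting parameter and a box plot of its distribution per quality classification:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Time series graph: periodic rises/falls and machine-status changes show up here.
ax1.plot(cleaned["timestamp"], cleaned["casting_pressure"])
ax1.set(title="Casting pressure over time", xlabel="time", ylabel="pressure")
# Box plot: approximate distribution of the parameter per quality class.
cleaned.boxplot(column="casting_pressure", by="quality", ax=ax2)
plt.tight_layout()
plt.show()
```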
As the visualization methods of EDA present only approximate information that can be detected with the naked eye, more specific statistical information should be obtained by calculating the statistical values of each variable, such as the average, maximum, minimum, quartiles, and standard deviation. When we analyze the data grouped in accordance with each quality classification, we can derive specific characteristics of the data distribution for fair quality and for each defective category, as well as the time-relevant characteristics of the data using the time-series-based grouped data. Some insights can be obtained from these statistical values: for example, a certain casting parameter may be observed to be higher when a certain defect occurs than when fair-quality products are produced, or a certain casting parameter may be observed to be higher during a specific time period than during the other time periods. Such insights can help factory administrators understand the factory with the use of quantitative data. The gaps between the casting parameter input and output values can also be analyzed in a similar way.
The correlations between the casting parameter and sensor data are also important statistical information that can be utilized for understanding a factory quantitatively. By calculating the correlation coefficients using all the data and the time-series-based grouped data, how the casting parameter and sensor data affect one another and how each variable is correlated with the others can be discovered. Factory administrators can thereby derive insights, for example, on which data should be managed together when they try to adjust some casting parameter values.
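The grouped statistics and correlation coefficients described above can be computed along the following lines (a sketch; the grouping keys and columns are hypothetical, and `cleaned` is the preprocessed DataFrame from Sect. 3.2.1.2):

```python
cols = ["casting_pressure", "injection_velocity", "coolant_temp"]

# Descriptive statistics per quality class.
stats_by_quality = cleaned.groupby("quality")[cols].describe()

# Time-series-based grouping, here by hour of the day.
cleaned["hour"] = cleaned["timestamp"].dt.hour
stats_by_hour = cleaned.groupby("hour")[cols].agg(["mean", "std", "min", "max"])

# Pearson correlation coefficients between the casting parameter and sensor data.
corr = cleaned[cols].corr()
print(corr)
```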
With regard to the second main task of the proposed strategy for solving [P1] (deduction of the optimal casting parameter output values for the production of fair-quality products), the optimal casting parameter output values can be derived directly from the EDA results. The calculated statistical values of the fair-quality products, such as the average or median value of each casting parameter, can be regarded as the optimal casting parameter output values. Whether seasonal factors affect the optimal casting parameter values should additionally be considered: we have to check whether the statistical values differ to a statistically significant extent across the groups of data divided by time period. When the differences are statistically significant, the optimal casting parameter output values should be derived separately for the different time periods, such as day and night, hours, months, and seasons.
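The time-period check can be done, for example, with a one-way ANOVA; a sketch assuming a `fair` DataFrame of fair-quality shots with a hypothetical season column:

```python
from scipy import stats

# One-way ANOVA: does the mean casting pressure differ across seasons?
groups = [g["casting_pressure"].dropna() for _, g in fair.groupby("season")]
f_stat, p_value = stats.f_oneway(*groups)
if p_value < 0.05:
    print("Derive the optimal output values separately per season.")
```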
Based on the derived optimal casting parameter output values, the upper and lower control limits of each casting parameter can be derived using the IQR. Let o_ij represent the optimal output value of casting parameter i of product j, and let V_ij indicate the set of observed values of casting parameter i of product j for the fair-quality products. The upper and lower control limits of casting parameter i of product j can then be derived as o_ij + c×IQR(V_ij) and o_ij − c×IQR(V_ij), respectively, where c is a constant. Note that if the optimal output values of a specific casting parameter or sensor data differ by time period, the lower and upper control limits should also be managed by time period.
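A sketch of the control-limit derivation, where the median of the fair-quality shots serves as the optimal output value o_ij and c is a tunable constant:

```python
c = 1.5  # tunable constant of the control-limit rule
limits = {}
for col in ["casting_pressure", "injection_velocity"]:  # hypothetical names
    optimal = fair[col].median()               # optimal output value o_ij
    q1, q3 = fair[col].quantile([0.25, 0.75])
    iqr = q3 - q1                              # IQR of the observed values V_ij
    limits[col] = (optimal - c * iqr, optimal + c * iqr)  # (lower, upper)
```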
3.2.1.4. Implementation
The results of the proposed strategy for solving [P1] contain quantitative information that can help factory administrators understand the factory data and obtain insights that can support their decision making on how to address [P1]. However, the effects of the proposed strategy with the purpose of managing gaps between the casting parameter input and output values can eventually be maximized when a monitoring system of casting parameter and sensor data is developed using quantitative information. The optimal output values of each casting parameter and sensor data can be set as the standard guidelines for real-time data management, and the gap between the real-time and guideline data can be calculated and monitored in real time. If the gap is monitored continuously and an alarm occurs when the real-time data deviate from the lower or upper control limits derived from the proposed strategy, [P1] can be effectively addressed. The insights obtained regarding the correlations between the data inferred from carrying out the proposed strategy can also be utilized more efficiently when a monitoring system is established.
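In such a monitoring system, the real-time check against the derived limits can be as simple as the following sketch (`limits` holds the control limits derived above; the record keys are hypothetical):

```python
def check_shot(record: dict, limits: dict) -> list[str]:
    """Return alarm messages for values outside their control limits."""
    alarms = []
    for name, (lower, upper) in limits.items():
        value = record.get(name)
        if value is not None and not (lower <= value <= upper):
            alarms.append(f"{name}={value} outside [{lower:.1f}, {upper:.1f}]")
    return alarms
```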
It should also be noted that when the statistical values of the data are revealed to be significantly different by time period (e.g., shift, season, month), we should manage the data and adopt all the proposed strategies for solving [P1]–[P4] separately according to the statistically significant time periods.
3.2.2. Proposed factory operation strategy for solving [P2]
The purpose of the proposed data-analytics-driven factory operation strategy for solving [P2] is to determine whether the die-casting machine is currently in a normal or preheat condition. Unlike the proposed strategy for solving [P1], the proposed strategy for solving [P2] needs a more advanced data analytics method to diagnose the status of the die-casting machine, such as machine learning methods that can comprehensively consider the variances of the casting parameters. The main task of the strategy is to develop a preheat shot diagnosis algorithm.
3.2.2.1. Input and output data
As the status of the die-casting machine is determined by the casting parameter data directly related to the machine, the values of the casting parameter data should be utilized as input data of the proposed strategy for solving [P2]. Sensor data that can affect the casting parameter data should also be considered input data. Additionally, the labels of the machine status, which indicate whether the condition of the machine is normal or preheat, have to be considered input data for utilization as the dependent variables of the developed algorithm. An important issue regarding the input data of the proposed strategy is the collection of these labels. To develop a preheat shot diagnosis algorithm, the exact status of the die-casting machine should be collected for a certain period of time. However, in die-casting factories, factory administrators usually do not collect exact machine status data indicating whether a shot is a preheat shot or not because doing so requires much effort. Even so, as the exact labels have to be collected to solve [P2], time and effort must be invested in collecting them. The outputs of the strategy are a machine-learning-based algorithm for diagnosing preheat shots and the classification of each cast shot as normal or preheat.
3.2.2.2. Data preprocessing
The first stages of data preprocessing (i.e., integration of the input data, outlier elimination, and missing-value treatment) can be conducted in a way similar to that in the proposed strategy for solving [P1]. The distinguishing data preprocessing step in this strategy concerns the generation of the training datasets utilized in the development of machine-learning-based algorithms. How the training datasets are organized affects the performance of such algorithms. With regard to the organization of the training datasets for the strategy, there are two main issues: (1) the independent variables of the algorithm to be developed (casting parameter data) can have subordinative relations with one another; and (2) the numbers of dependent variables (labels of the machine status) are often imbalanced. Let us examine these two issues and how we can handle them through data preprocessing.
The accuracy of a machine-learning-based data analytics model can vary according to the datasets utilized, and a large number of data columns does not always yield the best results. For example, when multiple independent variables with high correlations are simultaneously included in the training dataset of a machine-learning-based algorithm, an overfitting problem can occur due to the redundancy of the data features. There are numerous data that can be collected from a die-casting factory, and only the data that can well represent the features of the factory should be selected. A way of selecting the best-fitting data columns is to conduct principal component analysis (PCA), an eigenvector-based multivariate analysis method. PCA is an orthogonal linear transformation that maps the data into a new coordinate system whose axes are ordered by the variance of the data projected onto them. PCA also provides a weight for each component, and a high weight means that the component represents the domain well. The ratio of a component's weight to the total weight signifies the degree to which the corresponding component can represent the domain. When all the weights are arranged in descending order, the components whose accumulated proportion from the beginning exceeds 0.8 or 0.9 are selected as the representative dataset.
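A sketch of this selection with scikit-learn, keeping the components whose accumulated explained-variance proportion first exceeds 0.9 (`feature_cols` is a hypothetical list of casting parameter and sensor columns):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(data[feature_cols])  # scale before PCA
pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)          # accumulated proportion
n_components = int(np.searchsorted(cum, 0.9) + 1)       # first index past 0.9
X_reduced = PCA(n_components=n_components).fit_transform(X)
```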
In the die-casting industry, there is an extreme imbalance between preheat and normal shot occurrences. When a data-analytics-based algorithm for diagnosing preheat shots is developed utilizing imbalanced data, there is a high probability that the base rate fallacy (i.e., the tendency to ignore the data pertaining to a small number of cases and to focus on the base rate data) will occur. For example, in the die-casting industry, as the number of normal-shot data is generally much larger than the number of preheat-shot data, an algorithm that diagnoses all the shots as normal can have a higher classification accuracy than other algorithms. To avoid committing the base rate fallacy, the data imbalance should be resolved by conducting oversampling, such as with the synthetic minority oversampling technique (SMOTE).
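A minimal SMOTE sketch using the imbalanced-learn library, assuming a feature matrix X and machine-status labels y (0 = normal, 1 = preheat; the encoding is an assumption):

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

# Synthesize minority (preheat) samples until the classes are balanced.
X_balanced, y_balanced = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_balanced))
```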
3.2.2.3. Data analytics method
As the purpose of this study was to propose general data-analytics-driven factory operation strategies rather than a brand-new data analytics method, we focus here on how the previously developed data analytics methods can be utilized effectively. As there are several previously developed machine learning methods, such as decision tree, random forest, support vector machine (SVM), neural network, AdaBoost, and XGBoost, and as the performance of the data-analytics-based algorithm depends on the machine learning method used, selecting the proper machine learning method is an important issue. A simple and clear way of doing this is to compare the performances of all the candidate data-analytics-based algorithms with a common metric (e.g., the classification accuracy or F1 score for the preheat diagnosis task). For evaluating the performance of each data-analytics-based algorithm, k-fold cross-validation is recommended. k-fold cross-validation splits the data into k sets; each set is selected in turn as the test set while the other k−1 sets are combined into the corresponding training set. This means that the performance of each algorithm can be evaluated with k numerical experiments. The machine-learning-based algorithm that shows the best performance can be selected as the suitable data-analytics-based algorithm.
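A sketch of the method comparison with scikit-learn and 5-fold cross-validation, using the oversampled data from the previous step (XGBoost can be added analogously via its own package):

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

candidates = {
    "decision tree": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(),
}
for name, model in candidates.items():
    # k = 5 numerical experiments per candidate; F1 handles class imbalance.
    scores = cross_val_score(model, X_balanced, y_balanced, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```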
The next step we should carry out after we finalize and generate a data-analytics-based algorithm is to calculate the feature importance of each of the data utilized for data analytics, which indicates the relative impact of such data on the algorithm performance. When the developed algorithm is used in the field, the algorithm performance evaluated in the algorithm validation steps cannot be guaranteed. Unlike the preprocessed training data, real field data contain noise that can affect the performance of the data-analytics-based algorithm. Moreover, due to the great volume of field data, the computational time required for generating the algorithm can be too long. These problems can be mitigated by considering the concept of feature importance. The feature importance of each data column can be calculated as the relative extent to which the model's accuracy is reduced when that column is excluded from model generation. The steps for deriving the feature importance are as follows: (1) develop data analytics models utilizing the selected machine learning method, excluding one data column from each model; (2) calculate the gap in accuracy between the original data analytics model and each model from which one column was excluded; and (3) normalize the gaps. The normalized gap becomes the feature importance of the excluded column. When the data analytics model generated using only the data with high feature importance values is applied to the field, the algorithm's validation performance can decrease, but its field implementation performance may increase. The features to be finally adopted should be selected based on the field application test results.
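A sketch of the described drop-one-column feature importance, assuming a pandas DataFrame X of features and labels y; the gaps in cross-validated accuracy are normalized so that they sum to 1:

```python
from sklearn.model_selection import cross_val_score

def drop_column_importance(model, X, y, cv=5):
    """Accuracy drop when each feature is left out, normalized to sum to 1."""
    base = cross_val_score(model, X, y, cv=cv).mean()   # original model
    gaps = {}
    for col in X.columns:
        score = cross_val_score(model, X.drop(columns=col), y, cv=cv).mean()
        gaps[col] = max(base - score, 0.0)              # accuracy gap per column
    total = sum(gaps.values()) or 1.0
    return {col: gap / total for col, gap in gaps.items()}
```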
The last step is to set criteria for the update of the generated data-analytics-based algorithm. With the passage of time and as the factory environment changes, the field accuracy of the data-analytics-based algorithm can decrease. To maintain the accuracy of the data-analytics-based algorithm, the algorithm should be continuously updated with the most recent data. The criteria for an algorithm update can be set with two standards: time and field accuracy. When a certain period of time has passed or when the field accuracy of the data-analytics-based algorithm becomes lower than a certain level, we can update the algorithm. However, as the updated algorithm does not guarantee better performance than the original algorithm, we should compare the performances of the updated and original algorithms and adopt the one with the better performance.
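The two update standards can be encoded in a simple check such as the following sketch (the 90-day age limit and 0.9 accuracy floor are hypothetical settings):

```python
from datetime import datetime, timedelta

def should_update(last_trained: datetime, field_accuracy: float,
                  max_age: timedelta = timedelta(days=90),
                  min_accuracy: float = 0.9) -> bool:
    """Trigger retraining on model age or on degraded field accuracy."""
    return datetime.now() - last_trained > max_age or field_accuracy < min_accuracy

# After retraining, keep whichever of the old and new models scores better
# on a common validation set.
```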
3.2.2.4. Implementation
When a shot is cast, the real-time casting parameter data relevant to the shot are integrated into a record and entered into the developed preheat shot diagnosis algorithm. When the machine status at the time the shot is cast is diagnosed as preheat, the shot should be discarded from the lot. For the implementation of the proposed strategy, an infrastructure is needed that enables real-time diagnosis of the die-casting machine status and that enhances the quality of die casting by discarding the shots that are not of fair quality.
3.2.3. Proposed factory operation strategy for solving [P3]
The purpose of the proposed data-analytics-driven factory operation strategy for solving [P3] is to predict whether a cast product will be of fair quality or will be defective right after a shot is cast. Moreover, by diagnosing the causes of the defects when a shot is predicted to be defective, the strategy can assist the factory administrator in making quality-related decisions. There are two main tasks for the strategy: (1) development of a defect prediction algorithm; and (2) development of a defect cause diagnosis algorithm. As the stages of data preprocessing and of developing a defect prediction algorithm are similar to those of the proposed strategy for solving [P2], we skip them here and focus on the differentiated content.
3.2.3.1. Input and output data
For the input data of the strategy, the casting parameter data, sensor data, and quality classification of each shot are utilized when the target factory has already constructed an infrastructure for tracking each shot over all the processes of the die-casting factory. To develop a defect prediction algorithm that predicts the quality of each single shot, the quality information of each shot should be mapped to the corresponding casting parameter and sensor data. This means that an ID has to be assigned to each shot so that the data regarding the shot will be traceable when the shot is found to be defective.

However, few die-casting factories actually have an infrastructure enabling data tracking at the product unit. For factories that cannot develop a defect prediction algorithm at the shot unit because they lack such a data collection infrastructure, we propose that a defect prediction algorithm be developed at the lot unit. Lot information, including the production time, production quantity, and number of defects, is managed in most die-casting factories. By using the lot information and time stamps, we can match a lot with the set of casting parameter and sensor data collected during the production of the products belonging to the corresponding lot. We can then develop a defect prediction algorithm that forecasts the number of defects for each lot. The results of the defect prediction algorithm at the lot unit are also meaningful because the defects of cast products often occur consecutively, and as such, defective cast products are likely to belong to the same lot.

As a lot includes a certain number of shots, the single values of the casting parameter and sensor data of each shot cannot be utilized as independent variables. Instead, what can be utilized as independent variables are the calculated statistical values of each casting parameter and sensor data of the shots included in the same lot (e.g., average, minimum, maximum, skewness, standard deviation, increasing velocity, decreasing velocity). The shapes of the dependent variables (labels) can also differ somewhat from those in the product-unit algorithm, which considers the quality classification of each shot a dependent variable. There are several ways of setting the dependent variables of the defect prediction algorithm at the lot unit: the number of products in each quality classification, the ratio of each quality classification to the total product, or the level of defective shots contained in the lot can be regarded as the dependent variable. Among these candidates, we recommend that the level of defective shots contained in the lot be considered the dependent variable because a machine-learning-based algorithm can show better performance when it classifies data into discrete classes than when it predicts a numerical value. When the classification levels of lots according to the proportion of defective shots among all the shots are 0–10%, 11–20%, 21–30%, and over 30%, for example, the developed algorithm becomes a classification algorithm, and its accuracy may be better than that of the other cases. An example of constructing such a lot-unit dataset is sketched below.
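Assuming shot-level records `data` with a lot_id column and a `lots` table holding the lot information (all names hypothetical), the construction could look as follows; further statistics such as the increasing and decreasing velocities can be added analogously:

```python
import pandas as pd

# Independent variables: statistical values per lot of each casting parameter.
features = data.groupby("lot_id").agg(
    pressure_mean=("casting_pressure", "mean"),
    pressure_std=("casting_pressure", "std"),
    pressure_min=("casting_pressure", "min"),
    pressure_max=("casting_pressure", "max"),
    pressure_skew=("casting_pressure", "skew"),
)

# Dependent variable: defect-level class derived from the lot's defect ratio.
lots["defect_ratio"] = lots["num_defects"] / lots["production_qty"]
lots["defect_level"] = pd.cut(
    lots["defect_ratio"], bins=[0, 0.1, 0.2, 0.3, 1.0],
    labels=["0-10%", "11-20%", "21-30%", ">30%"], include_lowest=True,
)
dataset = features.join(lots.set_index("lot_id")[["defect_level"]])
```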
3.2.3.2. Data analytics method
As the data analytics method for the development of a defect prediction algorithm is similar to that of the strategy for solving [P2], as mentioned earlier, we present herein the details of the data analytics method for the development of a defect cause diagnosis algorithm. To enhance the quality of die casting, the defect causes should be determined because die-casting defects tend to occur continuously until the machine is adjusted after the first defect occurrence. We propose two ways of developing a defect cause diagnosis algorithm depending on the machine learning method used.
When tree-based data analytics models are utilized, we can infer the defect causes from the branch points of the trees. When decision-tree-based ensemble models utilizing boosting or bagging are developed, multiple trees that consider randomly or statistically selected data features are generated, and an integrated model is constructed. Whether a shot is defective or not is determined by majority rule over the prediction results of the individual trees. When a shot is predicted to be defective, we can derive the specific data conditions that made the model decide so from the set of majority trees: we extract the conditions of the branches along each tree's decision path and take the tightest conditions, i.e., the smallest of the upper-bound conditions and the largest of the lower-bound conditions for each variable, as the causes of the defect. For example, when the extracted branch conditions are x1 ≤ 20, x1 ≤ 15, x2 ≥ 20, and x2 ≥ 30, then x1 ≤ 15 and x2 ≥ 30 become the defect causes.
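A sketch of this condition extraction for a scikit-learn random forest: for a shot x predicted to be defective, the split conditions along the decision path of each tree voting "defect" are collected, and the tightest upper and lower bounds per feature are returned (the feature names and the encoding of the defect class as 1 are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # assumed ensemble model

def defect_causes(forest: RandomForestClassifier, x, feature_names, defect=1):
    """Tightest per-feature split conditions among the trees voting 'defect'."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    upper, lower = {}, {}
    for tree in forest.estimators_:
        if tree.predict(x)[0] != defect:
            continue                           # keep only the majority-set trees
        t = tree.tree_
        for node in tree.decision_path(x).indices:
            if t.children_left[node] == -1:
                continue                       # leaf node: no split condition
            name = feature_names[t.feature[node]]
            thr = float(t.threshold[node])
            if x[0, t.feature[node]] <= thr:   # went left: upper-bound condition
                upper[name] = min(upper.get(name, np.inf), thr)
            else:                              # went right: lower-bound condition
                lower[name] = max(lower.get(name, -np.inf), thr)
    return ([f"{n} <= {v:.2f}" for n, v in upper.items()]
            + [f"{n} > {v:.2f}" for n, v in lower.items()])
```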
When non-tree-based machine-learning-based algorithms are considered, we can utilize the upper and lower control limits of each casting parameter and sensor data derived in the proposed strategy for solving [P1] for deriving the defect causes. Whenever a defect is predicted, we can compare the corresponding values of the casting parameter and sensor data with the control limits and regard the deviated condition as a defect cause. The proposed algorithm is summarized in Table 1.
3.2.3.3. Implementation
Whenever a shot is cast, a record consisting of the real-time values of the casting parameter and sensor data utilized as the independent variables of the defect prediction algorithm should be generated, and whether the shot is defective or not should be predicted right after the shot is cast by implementing the defect prediction algorithm. When a shot is predicted to be defective or when a lot is predicted to include more than a certain percentage of defective shots, the shot or lot has to be discarded. When a defect is detected, its causes should be diagnosed and relayed to the factory administrators.
3.2.4. Proposed factory operation strategy for solving [P4]
The purpose of the proposed data-analytics-driven factory operation strategy for dealing with [P4] is to derive quantitative data-analytics-based guidelines for the casting parameter tuning process. As a casting parameter can be affected by the other casting parameter data and by the sensor data, the extent of the adjustment of a casting parameter input value needed to change the output value to the intended value can differ according to the factory's circumstances. Therefore, we should determine the extent of the casting parameter input value adjustment considering its correlations with the other data. The main task of the strategy is to develop a casting parameter tuning algorithm considering these characteristics of the problem.
3.2.4.1. Input and output data
To implement the proposed strategy for solving [P4], we need information regarding when we will tune the casting parameter input value and how we will do this. For the former information, we need the upper and lower control limits of each casting parameter data and the defect causes, which are the output data of the implemented strategies for solving [P1] and [P3], respectively, as the input data of the strategy. When the values of a specific casting parameter are observed to have deviated from the limits or when similar defect causes are continuously detected, the factory administrators can make a decision to tune the casting parameter input value. The input data of the strategy that can be utilized as independent variables are the input value of each target casting parameter data and the corresponding output values of the other casting parameter data and of the sensor data. For an output value of each casting parameter, the input value of the casting parameter and the output values of the other data should be integrated, and the integrated data should be preprocessed. As the preprocessing procedures are similar to those in the other proposed strategies in terms of outlier elimination and missing-value treatment, we will no longer discuss the data preprocessing part here. The output value of the target data is regarded as a dependent variable. As the output of the strategy, regression equations that can infer how the output value of a certain casting parameter data will be changed as the input value is changed are derived. Moreover, an expert system that determines when to tune the casting parameter input value and the extent of adjustment needed is derived.
3.2.4.2. Data analytics method
The data analytics method can be divided into two components: one for detecting the triggers of casting parameter tuning and one for deciding the extent of input adjustment needed for a target casting parameter. For the former, we should develop an expert system that can detect the criteria for activating the casting parameter tuning algorithm. The triggers of the casting parameter tuning process can be defined for three cases: (1) when the gap between the input and output values is continuously observed to be high; (2) when the real-time values of the casting parameter or sensor data continuously deviate from the control limits; and (3) when a certain defect cause is repeatedly diagnosed within a certain time period. For the first case, we have to set a specific time period over which the gap between the input and output values is observed, as well as a gap allowance. For example, when we set the time period to 10 minutes and the gap allowance to 5%, the average gap between the casting parameter input and output data for every 10-minute period should be calculated, and whenever the average gap exceeds 5%, the system has to detect the circumstance. The time period and gap allowance can be set differently for each casting parameter. For the second case, we can set a deviation allowance number and a standard time period. For example, when we set the deviation allowance number to 5 and the time period to 10 minutes, the system detects a circumstance in which the real-time values of a certain casting parameter deviated more than 5 times in the most recent 10 minutes as a trigger of casting parameter tuning. The criteria for the third case are similar to those for the second case: the allowance number of a certain defect cause's occurrences and the time period should be set.
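A sketch of the first two trigger rules for one casting parameter (`window` is a hypothetical DataFrame of the most recent 10 minutes of that parameter's input and output values; the third, defect-cause-based rule can be implemented analogously over the diagnosis log):

```python
def detect_triggers(window, limits, gap_allowance=0.05, deviation_allowance=5):
    """Return the activated tuning triggers for one casting parameter."""
    triggers = []
    # (1) sustained input-output gap above the allowance (here, relative 5%)
    avg_gap = (window["output"] - window["input"]).abs().mean()
    if avg_gap > gap_allowance * window["input"].abs().mean():
        triggers.append(("gap", avg_gap))
    # (2) repeated deviations from the control limits within the window
    lower, upper = limits
    n_dev = ((window["output"] < lower) | (window["output"] > upper)).sum()
    if n_dev > deviation_allowance:
        triggers.append(("control_limit", n_dev))
    return triggers
```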
When a trigger is detected, we should determine the intended change in the target casting parameter's output value. When the first trigger is activated, the average gap can be taken as the intended extent of adjustment; for the second case, the average extent of deviation can be taken as the intended change. For the third case, the average gap between the real-time data and the defect cause diagnosis criteria can be taken as the intended adjustment value.
The next step is to develop a data analytics model for each casting parameter that derives the input value of the target casting parameter needed to achieve the intended output value. Although there are many methods that can derive the input value for casting parameter tuning, we recommend the multivariate linear regression method, which can intuitively present the relationship between the input and output values of a certain casting parameter with a mathematical model reflecting the correlations between the target casting parameter and the other data. To modify the input value of the casting parameter so as to change the output value by the intended amount, we need clear guidelines, such as mathematical equations. Moreover, the fact that the key casting parameter and sensor data are those of velocity, pressure, and temperature, which are revealed to be roughly proportional to one another, makes the multivariate linear regression method suitable for the strategy. For a target casting parameter for which we want to derive a regression model, let us assume that there are n instances of m relevant independent variables xi (i = 1, ..., m) and that xm+1 indicates the input value of the target casting parameter. When H indicates the function deriving the output value of the target casting parameter, the purpose of developing a multivariate linear regression model is to derive an equation of the form H(x1, ..., xm+1) = w1x1 + w2x2 + ... + wm+1xm+1 + c, where wi is the coefficient weight and c is a constant. The cost function of the regression model is calculated as cost(W, c) = (1/n) Σj=1..n (H(x1j, ..., x(m+1)j) − yj)², where xij and yj, respectively, denote the value of independent variable i and the value of the dependent variable for instance j. The values of wi for i = 1, ..., m+1 and of c are derived by minimizing the cost function with the gradient descent algorithm. The input value of the target casting parameter can then be derived through the backward calculation of the derived equation.
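A sketch of this with scikit-learn (which fits the same least-squares objective in closed form rather than by gradient descent): the model is fitted on historical records, and the backward calculation solves the fitted equation for the new input value; all column names and the target value are hypothetical.

```python
from sklearn.linear_model import LinearRegression

# `history` is a hypothetical DataFrame of past records for one target casting
# parameter: the first columns are related output values x1..xm, and
# pressure_input is the casting parameter's own input value x_{m+1}.
X = history[["coolant_temp", "mold_temp", "pressure_input"]]
y = history["pressure_output"]               # output value of the target
reg = LinearRegression().fit(X, y)           # least-squares fit of H

# Backward calculation: solve
#   target = w1*x1 + w2*x2 + w3*x_input + c
# for x_input, substituting recent averages for the other variables.
recent = X[["coolant_temp", "mold_temp"]].tail(20).mean()
w, c0 = reg.coef_, reg.intercept_
target_output = 315.0                        # intended output value (example)
new_input = (target_output - (w[0] * recent["coolant_temp"]
                              + w[1] * recent["mold_temp"] + c0)) / w[2]
```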
Selecting the proper independent variables relevant to a target casting parameter is an important issue when a regression model is generated. We can consider all the casting parameter and sensor data, or only the parts of the dataset that have close correlations with the target casting parameter. As the accuracy of the regression model can vary according to the dataset utilized, we should compare the accuracy levels of several cases, such as considering all the data versus considering only the data whose correlation coefficients with the target exceed a certain threshold. The regression model with the highest accuracy is selected as the equation to be utilized for the casting parameter input value tuning algorithm.
Whenever a trigger of the casting parameter tuning process is detected and the intended extent of the output value change is derived, the input value to be adjusted is determined using the developed regression models. By substituting xi (i = 1, ..., m) with the average value of the corresponding data recently observed during a certain time period and H(x1, ..., xm+1) with the intended output value of the target casting parameter, we can calculate the input value of the target casting parameter. Note, however, that when the input value of a target casting parameter is adjusted, the other casting parameter data related to the target casting parameter are also affected. Hence, the input values of the related casting parameters included in the regression model of the target casting parameter should also be adjusted along with the input value of the target casting parameter, maintaining the same extent of adjustment of the output values. The expected output value of the target casting parameter should be reflected at this stage.
3.2.4.3. Implementation
To implement the results of the proposed factory operation strategy for solving [P4] in die-casting factories, the consecutive stages of the casting parameter tuning process, from the detection of process triggers to the adjustment of the input value of the target casting parameter data, should be integrated into a complete system. Table 2 shows the detailed stages of the system, including the function of trigger detection, the determination of the intended change of the target casting parameter data’s output value, and the extent of the target casting parameter data’s input value adjustment.
3.3. Summary of the proposed factory operation strategies
In this section, we summarize the four proposed data-analytics-driven factory operation strategies for solving [P1]–[P4] and present systematic viewpoints of implementing the strategies in an integrated way. For each strategy, we derived the detailed tasks for solving each quality-related problem, as follows: [T1] implementation of EDA for investigating the trends and correlations between data, [T2] deduction of the optimal casting parameter output values for the production of fair-quality products, [T3] deduction of the upper and lower control limits for casting parameter input–output gap management, [T4] development of a preheat shot diagnosis algorithm, [T5] development of a defect prediction algorithm, [T6] development of a defect cause diagnosis algorithm, and [T7] development of a casting parameter tuning algorithm. The algorithms that we derived are summarized in Fig. 2.
Recall that the tasks for solving each quality-related problem that we defined are related with one another, and thus, the precedence relationships of the tasks should be considered when the data-analytics-driven operation strategies are implemented. Figure 3 presents the sequence of implementation of the tasks of the proposed data-analytics-driven factory operation strategies. Before the tasks are implemented, the data that will be utilized for the said strategies should be collected and integrated. Then [T1] should be conducted, and the statistical information essential for data analytics, such as the correlations between the data, should be deduced. The results of [T1] are utilized in the following stages and for deriving useful information for the quality enhancement of die casting. The next stages consist of two parallel sequential steps: one consisting of [T2] and [T3] and dealing with [P1], and the other consisting of [T4], [T5], and [T6] and dealing with [P2] and [P3]. Of the two sequential steps, the former is recommended to be conducted first, as the results of [T3] can be used when [T6] is conducted. After [T2] is conducted, [T3] is carried out using the optimal casting parameter output values deduced from [T2]. In the latter sequential steps, [T4] should be conducted prior to [T5] because preheat shots are regarded as neither defective nor fair-quality shots and are managed separately. This means that preheat shots should be eliminated and not considered when [T5] is conducted. When a defect prediction algorithm has been developed, [T6], which infers the specific causes of a defect, is carried out using the developed defect prediction algorithm or the upper and lower control limits of the casting parameters. After the two parallel stages are completed, [T7] is carried out, triggered by the results of [T3] and [T6].