Monitoring of casting quality using principal component analysis and self-organizing map

The monitoring of casting quality is very important to ensure the safe operation of casting processes. In this paper, in order to improve the accurate detection of casting defects, a combined method based on principal component analysis (PCA) and self-organizing map (SOM) is presented. The proposed method reduces the dimensionality of the original data by projecting the data onto a smaller subspace through PCA. It uses Hotelling's T2 and Q statistics as essential features for characterizing the process behavior. The SOM is used to improve the separation between casting defects; it computes a metric-distance-based similarity measure, using the T2 and Q (T2Q) statistics as input. A comparative study between the conventional SOM, the SOM with reduced data, and the SOM with selected features is carried out. The proposed method is applied to identify the running conditions of the low pressure lost foam casting process. The monitoring results indicate that the SOM with T2Q feature vectors outperforms the conventional SOM and the SOM based on reduced data.


Introduction
Nowadays, the monitoring of casting quality plays a vital role in ensuring the stability of casting processes. Its main objective is to ensure the dependability of processes and increase their availability at lower cost. Modern developments have led to a broad array of specialized casting processes that create complex castings at high production rates and good quality. Even in these controlled processes, defects in the output products can occur.
Generally, casting defects fall into one or more of the seven established categories of casting defects, such as metallic projections, cavities, discontinuities, defective surfaces, incomplete castings, and incorrect dimensions or shape [1]. The formation of casting defects is linked to several factors such as pouring temperature, metal velocity, molding sand, and refractory coating [1,2]. Obviously, most casting rejections result from defect formation. It is therefore essential to classify the casting defects in order to improve the performance of the process. Traditionally, non-destructive testing techniques (eddy current, magnetic flux leakage, and so on) have been used and applied with human participation [3,4]. However, these manual techniques are usually exhausting, error-prone, and expensive. Today, many computer-based defect detection methods bring great convenience to quality control and provide operators with the process operating conditions. Process condition monitoring is not an easy task; it is essentially a problem of process knowledge. In current methods, after data collection, features are first extracted and generated, which often requires human expertise. Then, defect identification takes place. The most effective feature extraction and the most accurate identification are needed to obtain the correct operating condition of the process.
In recent years, hybrid approaches combining several monitoring methods have proven to be effective tools for improving monitoring accuracy. They require a critical processing step designed to find the most informative features. Various combined methods have been widely developed and applied to identify casting defects. For example, Dabade and Bhedasgaonkar [5] presented a combined casting defect analysis method based on the Taguchi method and computer-aided casting simulation to analyze sand-related and methoding-related defects in green sand casting. Zhang and Wang [6] optimized the low pressure die-casting process by combining a neural network with a genetic algorithm. Zhao et al. [7] extracted a robust randomly distributed triangle feature for the detection of casting defects such as cracks, blow holes, shrinkage porosities, and shrinkage cavities. Sata and Ravi [8] developed a casting defect analysis system, based on Bayesian inference with pre-processing of the input data, to analyze and reduce defects in investment castings. Galan et al. [9] proposed an inspection system based on computations of independent quantities on several areas within an image to identify defects on the surface of a metal part produced by a casting process. Lee et al. [10] developed a fault detection module based on an artificial neural network with pre-processing of time series temperature and pressure measurement data to reduce the defect rate and production cost in the die-casting industry. Niu et al. [11] embedded a deep learning algorithm in a digital camera to quickly and automatically detect casting defects. To effectively detect surface defects in the continuous casting process, a methodology for automatic feature extraction and classification using vision-based sensing technology has been presented by Yang et al. [12]. Similarly, Lin et al. [13] proposed a robust detection method for casting defects based on a vision attention mechanism and deep learning of feature maps.
Ai and Xu [14] used the contourlet transform and kernel locality preserving projections to extract sufficient and effective features from metal surface data. Yu et al. [15] developed a predictive control strategy using a heat transfer model and the quasi-Newton method to improve product quality in continuous casting. In another application, Bouhouche et al. [16,17] extracted a reduced data matrix by the principal component analysis (PCA) method, which is used as input to the self-organizing map (SOM) algorithm to evaluate the pickling process. Bendjama and Mahdi [18] suggested a combined simulation approach, coupling the Monte Carlo method with neural networks, to provide essential information on the state of materials. The work presented in this paper is motivated by practical applications in the foundry industry, where a hybrid PCA-SOM form is used for defect assessment.
The multivariate statistical method PCA has become widely used in recent years to improve fault detection accuracy. PCA is a data compression method; it produces a lower dimensional representation in a way that preserves the correlation structure among the process variables. The collected casting measurements are used as inputs to the PCA algorithm, which detects abnormal situations and provides information about the state of the process using Hotelling's T2 and Q statistics. These statistical parameters must not exceed their threshold values under normal conditions; however, sometimes the T2 and Q statistics alone cannot detect the faulty conditions, so additional algorithms are necessary.
The proposed hybrid form uses the advantages of the two methods: PCA and SOM. PCA is used for data pre-processing, data reduction, and feature extraction. The features are obtained from casting data analysis using the T2 and Q (T2Q) statistics, which form the input vectors to the SOM algorithm. The SOM computes a metric-distance-based similarity measure in order to identify the running conditions of the low pressure lost foam casting process. The main goal of the proposed method is to obtain more detailed information from the measured data than had previously been possible. The monitoring results using real measurement data demonstrate that casting defects can be clearly identified with the proposed method.

Experimental setup and data acquisition
The measurement data applied to condition monitoring require different types and levels of equipment and techniques, which depend on the investment and available expertise. The experimental measurements presented in this paper are entirely based on the casting data acquired from the low pressure lost foam casting process. As shown in Fig. 1, the casting process utilizes a resistance furnace capable of melting standard aluminum base alloys. Air pressure is applied to the chamber containing the crucible to push liquid metal up into a flask containing the foam pattern and unbonded sand. More details about the casting process were reported in [19,20].
The low pressure lost foam casting process is utilized to create complex castings. However, filling the mold with molten metal can produce an undesired casting. Generally, a casting defect is defined as any observable and unplanned variation. If a defect occurs, measures must be adopted to control and monitor the casting conditions. The data used in this study were obtained through experimental measurements under normal (NR) or healthy operating conditions and under abnormal functioning of the casting process, including three defects: cracks (SC), holes (SH), and metal penetration (MP).
To acquire data, five thermocouples were installed in the process for temperature measurement. The pressure and temperature inputs were wired to a National Instruments (NI) data acquisition board with a USB interface. The foam pattern was supported in the flask and the thermocouples were wired to the data acquisition unit. When the vessel is pressurized, liquid metal rises through a steel pipe into the flask. All test parts were cast using AlSi12 alloy at temperatures between 725 and 750 °C (Table 1). NI DASYLab software was used to collect and analyze the signals from the temperature sensors. The measured variables are listed in Table 2 and presented in Fig. 2.
Ten days of experimental measurements were collected from 9 casting tests, including normal data and faulty data. Each data set includes eight measurement variables and m observations or samples at different times, making it possible to construct an m × 8 input matrix. The matrices obtained from all measured data are then used to compute the PCA and SOM models to better control the quality of the castings.
Improvement of condition monitoring using the hybrid form PCA-SOM

Principal component analysis
PCA [21] is a multivariate statistical analysis technique that reduces the original data space to a smaller dimension while preserving the main information of the original data. Given a data matrix X ∈ ℜ m×n composed of m observations or samples and n variables, normalized to zero mean and unit variance, PCA is only interested in its variance and covariance. PCA relies on the eigenvalue/eigenvector decomposition of the covariance or correlation matrix C, given as follows:

C = (1/(m − 1)) X^T X = P D P^T    (1)

where D = diag(λ_1, ..., λ_n) is a diagonal matrix with diagonal elements in decreasing magnitude order and P contains the corresponding eigenvectors.
PCA determines an optimal linear transformation of the data matrix X in terms of capturing the variation in the data as follows:

T = X P,  X̂ = T P^T    (2)

where T ∈ ℜ m×k is the principal component (score) matrix, the matrix P ∈ ℜ n×k contains the principal vectors, i.e., the eigenvectors associated with the eigenvalues λ_i of the covariance matrix, and k denotes the number of principal components (PCs) of the PCA model. A key issue in developing a PCA model is to choose an adequate number of PCs. A number of well-known techniques have been proposed for selecting the number of PCs [22]. In this study, k is selected according to the cumulative percent variance (CPV) criterion [23], which measures the percent of the total variance, such as 85%, captured by the first k PCs: CPV(k) = 100 · (Σ_{i=1}^{k} λ_i)/(Σ_{i=1}^{n} λ_i).
The difference between X and its estimate X̂ is the residual matrix E. It can be calculated as follows:

E = X − X̂ = X (I − P P^T)    (3)

where I is the identity matrix.
To perform process fault detection, a PCA model of the normal operating conditions must first be built. When new observation data are subject to faults, they can be compared against the PCA model. The deviation of a new (normalized) observation vector x from the model is detected by Hotelling's T2 and Q, also called squared prediction error (SPE), statistics as follows:

T2 = x^T P D_k^{-1} P^T x    (4)

Q = x^T (I − P P^T) x    (5)

where D_k = diag(λ_1, ..., λ_k). The process is considered normal if the T2 and Q statistics do not exceed their threshold values. These statistics alone cannot always detect the faulty conditions; in this study, they are used as input to the SOM algorithm to further improve the separation between casting conditions. The computing steps of the PCA method are summarized as follows:
1. Build the matrix X,
2. Mean-center and scale X,
3. Calculate the covariance matrix,
4. Compute the eigenvalues and eigenvectors,
5. Select the optimal number of PCs,
6. Build the PCA model,
7. Compute the T2 and Q statistics.
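As an illustration, the steps above can be sketched in Python with numpy; the function names, the 85% CPV threshold, and the data layout are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def fit_pca_model(X, cpv_target=0.85):
    """Build a PCA monitoring model from normal-operation data X (m samples x n variables)."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                          # zero mean, unit variance
    C = np.cov(Xs, rowvar=False)                   # covariance matrix (n x n)
    eigvals, eigvecs = np.linalg.eigh(C)           # eigendecomposition
    order = np.argsort(eigvals)[::-1]              # sort eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cpv = np.cumsum(eigvals) / eigvals.sum()       # cumulative percent variance
    k = int(np.searchsorted(cpv, cpv_target)) + 1  # smallest k with CPV >= target
    return dict(mean=mean, std=std, P=eigvecs[:, :k], lam=eigvals[:k])

def t2_q_statistics(model, X):
    """Hotelling's T2 and Q (SPE) statistics for new observations."""
    Xs = (X - model["mean"]) / model["std"]
    T = Xs @ model["P"]                            # scores in the PC subspace
    t2 = np.sum(T**2 / model["lam"], axis=1)       # T2 = t D_k^-1 t'
    E = Xs - T @ model["P"].T                      # residuals outside the subspace
    q = np.sum(E**2, axis=1)                       # Q = e e'
    return t2, q
```

In operation, fit_pca_model would be applied to data from healthy castings only, and t2_q_statistics to new measurements; the resulting (T2, Q) pairs then form the T2Q feature vectors passed to the SOM.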

Self-organizing map
Artificial neural networks (ANNs) are mathematical or computational models inspired by biological nervous systems. ANNs are composed of an interconnected group of artificial neurons, and they have long been used for data-driven decision-making. Depending on their learning process (supervised or unsupervised), ANNs are implemented on computers to perform specific tasks such as optimization and pattern recognition. They can help draw useful conclusions and interpretations from observed data. ANNs using supervised learning, such as the multi-layer perceptron (MLP), probabilistic neural networks (PNN), and radial basis functions (RBF), have proved to be advantageous in obtaining a good model with accurate predictions. However, they require that the output vector be known for the training phase. Unlike supervised learning, with unsupervised learning the output vector is not required to be known, i.e., the network does not use training pairs consisting of an input vector and a desired output.
The SOM, invented by Kohonen [24], is a kind of ANN that uses unsupervised competitive learning to map a high dimensional input space (the data space) onto a low dimensional output space, usually of two dimensions. It is a technique to group data with similar characteristics. Each neuron or node comprises a vector of weights of the same dimension as the input data vectors. The SOM is trained by presenting the data repeatedly and updating the weights to learn the structure of the data.
The SOM consists of an input layer and a competitive or output layer, fully interconnected with each other. The output layer consists of m neurons. Each neuron i (i = 1, 2, ..., m) is represented by an n-dimensional weight vector w_i = [w_i1, ..., w_in], where n is the dimension of the input vector. In the output layer, the competition takes place and the connection weights are updated to select a winner neuron.
The key steps in the SOM learning process (training) are as follows. First, for each input vector, its best matching unit (BMU) is determined. The BMU is the node that is most similar to the input vector. Denoting by b the BMU of input vector x and by w_b the weight vector of this BMU, the identification is based on the minimum Euclidean distance, which is defined as follows:

‖x − w_b‖ = min_i ‖x − w_i‖    (6)

Then, the process proceeds to update the weight vectors of the BMU and those of its adjacent neurons (neighborhood) to match the input data using the following equation:

w_i(t + 1) = w_i(t) + α(t) h_bi(t) [x(t) − w_i(t)]    (7)

where t denotes time, 0 < α(t) < 1 is the learning rate, and h_bi(t) is the neighborhood function centered on the BMU b at time t. The neighborhood function is often taken as a Gaussian function. Both α(t) and the width of h_bi(t) decrease gradually with increasing step t.
The SOM algorithm is described as follows:
1. Set the initial random weight positions and the initial learning rate,
2. Provide an input data vector,
3. Compute the distances and find the BMU,
4. Adjust the weights,
5. Construct the output space.
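A minimal numpy sketch of this training loop (the grid size, decay schedules, and function names are illustrative choices, not taken from the paper):

```python
import numpy as np

def train_som(X, grid=(8, 8), epochs=20, alpha0=0.5, sigma0=3.0, seed=0):
    """Train a 2-D SOM on X (m samples x n features); returns the weight grid."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.normal(size=(rows * cols, X.shape[1]))           # random initial weights
    # grid coordinates of each neuron, used by the neighborhood function
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    n_steps = epochs * len(X)
    step = 0
    for epoch in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = step / n_steps
            alpha = alpha0 * (1 - frac)                      # decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3               # decaying neighborhood width
            b = np.argmin(np.linalg.norm(W - x, axis=1))     # best matching unit (BMU), Eq. (6)
            d2 = np.sum((coords - coords[b])**2, axis=1)     # grid distance to the BMU
            h = np.exp(-d2 / (2 * sigma**2))                 # Gaussian neighborhood
            W += alpha * h[:, None] * (x - W)                # update BMU and neighbors, Eq. (7)
            step += 1
    return W

def bmu_index(W, x):
    """Index of the winner neuron for input vector x."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))
```

Here both the learning rate and the Gaussian neighborhood width shrink linearly with the step counter, matching the requirement that α(t) and the neighborhood width decrease gradually during training.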

Similarity measure
After SOM training, the winner neuron is used to compute metric distances. Distances or similarity measures are essential to solve many pattern recognition problems such as classification and clustering. Various distance measures are applicable; in this work, we consider a similarity evaluation based on Euclidean distances [16,17]. Different formulas are used to measure the distances between two data sets characterizing two monitoring conditions. The following distances can be applied:

• Distance between the input and neuron data:

DIN = ‖x_j − w_bj‖    (8)

• Mean or gravity center of the input data:

G_x = (1/m) Σ_{j=1}^{m} x_j    (9)

• Gravity center of the neuron data:

G_w = (1/m) Σ_{j=1}^{m} w_bj    (10)

• Distance between the gravity centers (input and neuron data):

OGX = ‖G_x − G_w‖    (11)

where x_j is the j-th input vector and w_bj is its winner neuron.
The decision about the running conditions can be made using the computed metric distances DIN and OGX. Figure 3 shows the principle of condition monitoring using similarity; the casting defect is determined by the distance separating the current operating data from the reference process conditions. The differences between two running conditions i and j (ΔDIN = DIN_i − DIN_j and ΔOGX = OGX_i − OGX_j) are used as indexes to distinguish between defective and healthy casting conditions.
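One plausible numpy reading of these indexes (treating DIN as the mean input-to-winner distance over a data set; the helper names are hypothetical):

```python
import numpy as np

def din(X, W_winners):
    """DIN: mean Euclidean distance between each input vector and its winner neuron."""
    return float(np.mean(np.linalg.norm(X - W_winners, axis=1)))

def ogx(X, W_winners):
    """OGX: distance between the gravity centers of the input and winner-neuron data."""
    return float(np.linalg.norm(X.mean(axis=0) - W_winners.mean(axis=0)))

# Decision indexes between two running conditions i and j:
#   delta_din = din(X_i, W_i) - din(X_j, W_j)
#   delta_ogx = ogx(X_i, W_i) - ogx(X_j, W_j)
```

Here X holds the input vectors of one running condition and W_winners the weight vectors of their respective winner neurons, row-aligned with X.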

Description of the proposed method
In order to construct a successful casting defect identification system, a combination of PCA and SOM is described in this work (Fig. 4). PCA is trained with an input matrix that contains the process variables; the goal is to establish the normal statistical correlation among the measured data and to characterize the operating conditions of the casting process using Hotelling's T2 and Q statistics. Sometimes, these statistics cannot reliably detect the defective conditions. They are therefore proposed in this study as characteristic features of the measured casting data. The chosen features are used as input to the SOM algorithm, which performs the casting quality evaluation. The main objective is to compute the distances between the input vectors and the winning neuron to better identify casting defects. A comparative study between the conventional SOM, the SOM with the reduced data matrix (RD), and the SOM with the selected features (T2Q) is examined.

Results and discussion
The structure of the proposed fault identification method involves two parts: one is the development and training of the models; the other is testing for process faults based on the trained models. The measured data used in training represent a healthy casting condition.
In this study, the data matrices that contain the values of the measured variables are constructed from 930 sampling instances at intervals of 0.1 s. Ten days of experimental measurements were collected from 9 casting tests as follows: two castings on the first day, two castings on the third day, two castings on the sixth day, and three castings on the last day. The variance captured by the PCs is reported in Table 3: the first 2 PCs explain over 85% of the total variance of the data. The PCA model is established making use of them, and the monitoring then proceeds. As shown in Fig. 5, all process variables are correctly estimated with this PCA model, although certain variables are less well estimated than others.
According to the SOM algorithm and PCA-SOM with RD as input (PCA-SOM/RD), the casting conditions are evaluated by the metric distances DIN and OGX given by Eqs. (8) and (11), respectively. The obtained results are represented in Figs. 6 and 7. As can be seen, it is not easy to distinguish between normal and abnormal conditions, or between false and missed alarms.
The proposed algorithm, PCA-SOM with T2Q as input (PCA-SOM/T2Q), is tested using the same computed indexes. The isolation level of the normal and abnormal casting conditions is clearer than with the SOM and PCA-SOM/RD. The tested distances indicate that the casting defects are successfully identified (Fig. 8). The output results can be used to describe the capability of fault isolation.
To evaluate the performance of the SOM, PCA-SOM/RD, and PCA-SOM/T2Q techniques, a comparative study using the ΔDIN and ΔOGX distances is carried out. The differences between the computed indexes of the defective and healthy casting conditions are listed in Table 4; they show that PCA-SOM/T2Q provides the clearest separation between the casting conditions. On the other hand, the proposed method also proves its ability to ensure high defect detection performance compared with other applications that use PCA-SOM with RD as the feature [16,17].

Conclusion
In this paper, a new statistical method, based on a combination of PCA and SOM, is proposed to improve the monitoring of casting conditions. PCA is exploited to extract statistical features from the experimental data. Using Hotelling's T2 and Q statistics as input to the SOM algorithm, the casting defects are separated through a Euclidean-distance-based similarity evaluation. This combined method is accurate for casting quality monitoring; it can be seen from the analysis results that the normal and abnormal casting conditions can be clearly identified, and that the proposed method outperforms the other approaches and algorithms considered in this work.