Sandblasting Defect Inspection of Investment Castings Based on Deep Convolutional Neural Network


Investment castings often have surface impurities, and pieces of shell mold can remain on the surface after sandblasting. Identification of these defects involves time-consuming manual inspection in working environments with high noise and poor air quality. To reduce labor costs and improve the health and safety of employees, we applied automated optical inspection (AOI) combined with a deep learning framework based on convolutional neural networks (CNNs) to the detection of sandblasting defects. We applied four classic CNN models for training and predictive classification: AlexNet, VGG-16, GoogLeNet, and ResNet-34. In terms of predictive classification, AlexNet, VGG-16, and GoogLeNet v1 could accurately determine whether defects were present. Among the four models, AlexNet was the most accurate, with a prediction accuracy of 99.53% for qualified products and 100% for defective products. We demonstrate a direct detection technique based on the AOI and CNN framework with a fast and flexible computational interface.


Introduction
The casting industry in Taiwan is immense, and its working environment is often hazardous. Lost-wax casting involves high temperatures, high levels of noise and dust, and a significant amount of environmental pollution. In particular, at the sandblasting stage of this process, workers must manually inspect workpiece surfaces for impurities or remaining shell molds. This is time-consuming and can result in eye fatigue, which ultimately affects the quality of inspection. Prolonged exposure to noise and poor air quality during the inspection process also undermines the health of workers.
Automated optical inspection (AOI) technology has matured in recent years. It uses optical instruments and image processing to detect product defects. It achieves non-contact detection with greater stability, speed, and accuracy than manual detection, as well as reducing production costs. Many industries are using AOI to good effect in processes such as battery laser welding, flat steel manufacturing, automotive manufacturing, and semiconductor packaging [1,2]. Furthermore, recent advances in computer hardware technology have lowered storage costs and enhanced computing power. As a result, deep learning (DL) techniques have become popular. A number of network architectures have been developed, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and restricted Boltzmann machines (RBMs). Among these, CNNs offer good performance when applied to image recognition and defect detection, successfully solving problems that conventional image processing techniques cannot, such as quality detection in agriculture, bridge deck assessment, and railway track inspection [3,4].
Conventional defect detection is based on image processing, involving grayscale preprocessing, feature extraction, and then comparison. Researchers have employed grayscale histogram techniques as well as texture analysis and wavelet transformation for defect detection. For instance, Kuo et al. [5] applied a grayscale histogram technique to detecting defects in LED packaging. They obtained good recognition rates and helped to reduce the errors caused by manual inspection, thereby increasing product yield and quality. Li et al. [6] used X-rays and a wavelet transformation technique to detect and assess defects such as cracks, gas holes, and impurities in castings. Their approach successfully detects defects in most cases, but issues remain, such as the need to manually confirm the number of multiresolution levels for each image. The features of grayscale images of steel billets vary; Jeon et al. [7] therefore proposed a method based on wavelet transformation to detect defects on steel billets. Their approach could effectively detect defects such as cracks on steel billet surfaces. However, using conventional image processing techniques such as grayscale histograms or texture analysis alone can result in errors, as these detection processes are susceptible to interference, causing instability. Moreover, they often can only recognize a single defect type. These approaches also require substantial amounts of time and effort to characterize and define defects, making them inadequate for current needs.
To address these issues, machine learning and DL have been incorporated into defect detection. Past applications of machine learning include that developed by Bakir [8], in which logistic regression and a decision tree were used to identify the process variables of investment castings and the corresponding values that result in product defects. Bakir demonstrated that decision trees are superior to logistic regression, with accuracy rates of 92.15% and 60.3%, respectively. Gu et al. [9] solved a wood defect classification problem using support vector machines (SVMs) and derived a mean classification rate of 96.5% and a false alarm rate of 2.25% from 400 test datasets. Li et al. [10] proposed a fabric defect detection method in which the algorithm extracts the grayscale histogram and engages in defect training using SVMs. Their algorithm was shown to be superior to other methods such as the two-way visual attention mechanism (TVAM) and dictionary-based visual saliency (DVS). For continuous casting quality prediction, Ye et al. [11] developed an approach based on weighted random forests (WRFs) and compared it with a decision tree and an SVM. Their results indicated that the mean true positive rate of WRF prediction was 93%, which was higher than both the 68% of the decision tree and the 65% of the SVM. To detect casting surface defects such as pores, pinholes, and cracks, Riaz et al. [12] passed images through a Gaussian filter to smooth them and then detected and classified defects in the images using k-means.
Past applications of DL to defect detection include that by Alencastre-Miranda et al. [3], in which four classic types of CNNs (AlexNet, VGG-16, GoogLeNet, and ResNet-101) were used to assess the surfaces of sugarcane billets. Their results indicated that AlexNet had the best prediction performance; depending on the sugarcane variety, their approach increased yield per hectare from 33% to 80%. Dorafshan et al. [4] compared a one-dimensional DL model (i.e., biLSTM) with two-dimensional DL models (namely AlexNet, GoogLeNet, and ResNet-101) for bridge deck assessment. The results revealed that the one-dimensional model had the best average true positive rate at 70%, and the lowest was ResNet-101 at 53%.
On the whole, the one-dimensional model was more accurate than the two-dimensional models because the input of the former comprised signals rather than images. Iyer et al. [13] proposed a railway track inspection system and compared the performances of an artificial neural network (ANN), a CNN, a random forest, and an SVM. Their results indicated that the CNN performed better than the other algorithms. Li et al. [14] used a you-only-look-once (YOLO) algorithm to detect defects on steel strip surfaces. Their results showed that the mean average precision (mAP) for six types of defects was 97.55%, and at 83 frames per second (FPS) it achieved 99% detection accuracy. Raj et al. [15] developed a graphical user interface using YOLO and applied it to detect casting surface defects such as pores, pinholes, burrs, shrinkage defects, mold material defects, casting metal defects, and metallurgical defects. Their approach achieved an accuracy rate of 99% when applied to classifying investment castings in the test dataset. Shi et al. [16] proposed SSDT, a modified single shot multibox detector (SSD) algorithm, to detect tiny defects in printed circuit boards (PCBs). Their approach achieved good performance with an mAP of 81.3%, which was better than SSD (mAP = 79.5%). Du et al. [17] developed a defect detection system for aluminum castings based on X-ray oriented DL. They incorporated a multiscale feature pyramid network (FPN) and RoIAlign into Faster-RCNN to strengthen information from bottom structures. Their results showed that using FPN or RoIAlign to detect defects in X-ray images of aluminum castings achieved better performance than Faster-RCNN.
Existing studies have rarely focused on the detection of impurities or remnants of shell molds on the surface of investment castings after sandblasting; most studies investigated the detection of pores, pinholes, and cracks associated with the casting process [6,11,12,15]. To protect the eyesight of workers and reduce their exposure to environments with high levels of noise and poor air quality, in this study we employed AOI for defect detection. However, different castings and surface properties require changes to the underlying algorithm, thereby reducing detection compatibility. Thus, we paired the AOI technology with DL techniques to develop defect-detection software for lost-wax investment castings.

Research Procedures And Methods
Pre-processing for DL involves the classification and labelling of training data. The data collection process, compressed-file creation, and dataset labelling proceeded as follows:

Step 1. Camera setup and image capture
We used the camera function of a smartphone to capture images of investment castings provided by a manufacturer. To obtain images of equal size, we installed the smartphone on a tripod at a fixed height (h = 130 cm), as shown in Fig. 1(a). Furthermore, to reduce the number of images captured and to capture individual samples completely, we used white paper with a 6 × 4 grid of cells outlined in black as the background. The investment castings were placed in the grid cells, and a cropped image of each cell served as training data (or testing data), as shown in Fig. 1(b). Because sandblasting inspection areas mostly have fluorescent lighting, and to enhance the environmental compatibility of this study, the training images were taken under fluorescent lights in the factory. We took images of 16 types of investment castings and obtained around 100 images like those shown in Fig. 1.

Step 2. Cropping regions of interest (ROIs)
The original images included the floor background and 24 white grid cells. To effectively extract the ROI within each cell, we employed the OpenCV library [18] for grayscale processing, Gaussian blurring, edge detection for ROI identification, and the cropping and saving of individual sample images. We set the size of the ROIs at 416 × 416 pixels. Using edge detection, rectangles of a certain width and length were selected, and from the coordinates of the upper left corner we calculated the coordinates of the rectangle centers, as shown in Fig. 2(a). Next, using the center coordinates and the target width and length, we calculated the coordinates of the upper left corner of each crop, as shown in Fig. 2(b). Once the rectangle coordinates were revised, we could crop the target rectangle. In total, we obtained 1,591 cropped images, some of which are exhibited in Fig. 3.
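The following is a minimal sketch of the Step 2 cropping pipeline using the OpenCV library; the function name `crop_rois` and the contour-size thresholds are illustrative assumptions, not the exact implementation used in this study.

```python
import cv2

ROI_SIZE = 416  # target crop size in pixels, per Step 2

def crop_rois(image_path, min_side=200, max_side=600):
    """Find grid cells via edge detection and crop a fixed-size ROI
    centered on each detected rectangle (size thresholds are assumptions)."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # grayscale processing
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)       # Gaussian blurring
    edges = cv2.Canny(blurred, 50, 150)               # edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    rois = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        # keep only rectangles of plausible grid-cell size
        if not (min_side < w < max_side and min_side < h < max_side):
            continue
        cx, cy = x + w // 2, y + h // 2               # rectangle center
        # recompute the upper-left corner for the fixed 416 x 416 target
        x0, y0 = cx - ROI_SIZE // 2, cy - ROI_SIZE // 2
        if x0 < 0 or y0 < 0:
            continue                                  # crop would leave the image
        roi = img[y0:y0 + ROI_SIZE, x0:x0 + ROI_SIZE]
        if roi.shape[:2] == (ROI_SIZE, ROI_SIZE):
            rois.append(roi)
    return rois
```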
Step 3. Image labelling
Before training, the images must be labeled. This provides samples for training, from which the CNNs learn to classify and recognize unlabelled data. We divided the investment casting images into two groups based on the sandblasting outcome: (1) those with clearly visible remnants of shell molds or impurities on the surfaces of the investment castings were labelled "unqualified" (i.e., defective); (2) those in which the investment castings had undergone one to several rounds of acid pickling and sandblasting until no traces of shell molds or impurities remained on their surfaces were labelled "qualified". After preprocessing, one-hot encoding was used to label the images.

Step 4. Saving compressed files
We divided the data into a training dataset, a validation dataset, and a test dataset, accounting for 60%, 20%, and 20% of the total data, respectively. To avoid re-labelling the data before each training run, we saved the labelled and cropped datasets in an npz file that can be loaded every time training is conducted. To save time and prevent the training data from taking up too much storage, we compressed the original 416 × 416 pixel images to 224 × 224 pixels and saved the data in an npz file. To determine whether data size influences the predictive capabilities of the model, we also created an npz file with the data at 128 × 128 pixels. To match the size of the input data in the original GoogLeNet v4 paper [19], we also saved an npz file with the data at 299 × 299 pixels to compare the predictive capability of our approach with this model. As the amount of data collected was not large, we employed angle rotation, brightness adjustment, horizontal shift, vertical shift, scaling, and vertical flipping to augment the number of images and avoid overfitting.
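Below is a sketch of Steps 3 and 4, assuming a scikit-learn/Keras toolchain; the helper name `build_npz`, the random seed, and the augmentation parameter values are assumptions chosen only to mirror the split ratios and transformations described above.

```python
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def build_npz(images, labels, size, out_path):
    """Resize crops (e.g., 416 -> 224 or 128), one-hot encode the two classes,
    split 60/20/20, and save a compressed .npz for reuse between runs."""
    x = np.array([cv2.resize(im, (size, size)) for im in images])
    y = to_categorical(np.array(labels), num_classes=2)  # qualified / unqualified
    # 60% train, then split the remaining 40% evenly into validation and test
    x_tr, x_tmp, y_tr, y_tmp = train_test_split(x, y, test_size=0.4, random_state=42)
    x_val, x_te, y_val, y_te = train_test_split(x_tmp, y_tmp, test_size=0.5,
                                                random_state=42)
    np.savez_compressed(out_path, x_train=x_tr, y_train=y_tr,
                        x_val=x_val, y_val=y_val, x_test=x_te, y_test=y_te)

# Augmentation mirroring the transformations listed above (values are assumptions)
augmenter = ImageDataGenerator(rotation_range=20, brightness_range=(0.8, 1.2),
                               width_shift_range=0.1, height_shift_range=0.1,
                               zoom_range=0.1, vertical_flip=True)
```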
Albawi et al. [20] pointed out that CNNs are currently among the most popular neural network architectures. They are generally superior to ANNs for image tasks because the convolution and pooling layers in their architectures exploit the spatial relationships between neighboring data. CNNs have achieved relatively good results in various applications such as image recognition and voice recognition, in recent years even exceeding human performance, and are thus one of the main forces in DL progress at present. A typical CNN architecture includes a convolution layer, a pooling layer, and a fully-connected layer, as shown in Fig. 4. Techniques such as padding, strides, and dropout are often incorporated. The convolution layer is the core of a CNN; its operation multiplies and sums corresponding elements of the input data and the kernel. To lower the amount of computation and increase computational efficiency, a pooling layer is often added. Pooling reduces the dimensions of the input data along the width and length directions, reducing the amount of data while preserving important information; it also lowers the possibility of overfitting. A fully-connected layer is connected to all of the neurons in neighboring layers, and the last fully-connected layer performs the classification.
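To make these layer types concrete, the following minimal Keras model stacks convolution, max pooling, dropout, and fully-connected layers; the layer counts and sizes are illustrative and do not correspond to any of the four architectures evaluated here.

```python
from tensorflow.keras import layers, models

def tiny_cnn(input_shape=(224, 224, 3), num_classes=2):
    """Illustrative CNN with the layer types described above."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                      input_shape=input_shape),           # convolution layer
        layers.MaxPooling2D((2, 2), strides=2),           # pooling layer
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Flatten(),
        layers.Dropout(0.5),                              # dropping neurons
        layers.Dense(128, activation="relu"),             # fully-connected layer
        layers.Dense(num_classes, activation="softmax"),  # classification output
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```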
As done by Alencastre-Miranda et al. [3] and Dorafshan et al. [4], we compared the performances of the following four architectures: AlexNet, VGG-16, GoogLeNet, and ResNet. AlexNet is a CNN model proposed by Krizhevsky et al. [19] with an eight-layer architecture: five convolution layers and three fully-connected layers, combined with three maximum pooling layers. Its default input images are color images with 224 × 224 pixels. VGG is a CNN model proposed by Simonyan et al. [20]. Its variants include VGG-11, VGG-13, VGG-16, and VGG-19, among which VGG-16 and VGG-19 perform best. In this study, we employed VGG-16, which has fewer parameters. It contains 13 convolution layers and three fully-connected layers and is somewhat similar to AlexNet in structure. Its default input images are color images with 224 × 224 pixels. GoogLeNet is a CNN model proposed by Szegedy et al. [21]. Unlike AlexNet or VGG, it deepens the network by replacing plain convolution and pooling layers with Inception modules. It has fewer parameters than AlexNet despite its deeper network, and achieves greater accuracy. ResNet is a CNN model proposed by He et al. [22]. They pointed out that degradation is often encountered during the training of deep network models; to address this, they proposed residual learning architectures.
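As an illustration of the residual learning idea behind ResNet, the sketch below implements a basic identity-shortcut block in Keras; it assumes the input already has `filters` channels and is not the exact block used in ResNet-34.

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Basic residual block: the input is added back to the output of two
    convolutions (the 'shortcut'), so the layers learn a residual F(x)."""
    shortcut = x
    y = layers.Conv2D(filters, (3, 3), padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])   # identity shortcut: y = F(x) + x
    return layers.Activation("relu")(y)
```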
Prior to detecting sandblasting defects, we confirmed that an investment casting appeared in the image using YOLO v3. If an investment casting was detected, a certain range was extracted and CNN defect detection was performed. YOLO v3 is a CNN-based object detection algorithm proposed by Redmon et al. [23] that uses the Darknet-53 network architecture. It also draws on FPN methods, using multi-scale feature maps to recognize objects of different sizes and thereby enhance its capacity to recognize small objects.
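A trained YOLO v3 network in Darknet format can be run, for example, through OpenCV's DNN module; the configuration and weight file names below are assumptions, and the score indexing reflects a single-category ("sample") setup.

```python
import cv2
import numpy as np

# Load trained YOLO v3 weights (file names are assumptions)
net = cv2.dnn.readNetFromDarknet("yolov3_sample.cfg", "yolov3_sample.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_castings(frame, conf_threshold=0.5):
    """Return (confidence, center_x, center_y, w, h) for each detected casting."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    detections = []
    for output in net.forward(layer_names):
        for det in output:
            confidence = float(det[5:].max())   # single "sample" class score
            if confidence > conf_threshold:
                # YOLO outputs normalized (cx, cy, w, h); rescale to pixels
                cx, cy, bw, bh = det[0:4] * np.array([w, h, w, h])
                detections.append((confidence, int(cx), int(cy), int(bw), int(bh)))
    return detections
```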

Results And Discussions
The DL in this study was implemented under Anaconda Spyder with Python 3.6. Experiments were run on a computer with an Intel Core i7-9700 processor, 16 GB of RAM, and an NVIDIA GeForce RTX 2060 graphics card under Microsoft Windows 10.

Deep-learning training results and model evaluation
We trained the four CNN model architectures and compared the results of training with data in 128 × 128 and 224 × 224 pixel formats, and with batch sizes of 4, 8, 16, and 32. ResNet-34 performed poorly in the prediction of sandblasting defects. To identify the cause, we performed a comparison using GoogLeNet v4, which also employs residual learning, and additionally compared results at the default input size of GoogLeNet v4, 299 × 299 pixels. Figure 5 presents the historical accuracy rates of the AlexNet model. The differences among the results from the 12 sets of training data (denoted [image size, batch size] below) were small, and the accuracy rates all converged to 1. Figure 6 presents the historical accuracy rates of the VGG-16 model. Although the accuracy rates for [128, 4], [128, 8], and [128, 16] fluctuated somewhat, they still converged to 1. The VGG-16 model performed poorly only when trained with [224, 4]; its accuracy could not effectively converge to 1 late in the training period, remaining around 0.65.
Because VGG-16 is deeper and has more parameters, the RAM of our computer could not process the input data in 224 × 224 pixel format with batch size = 32, nor the input data in 299 × 299 pixel format. Training with these configurations could not be completed, so there are no results for them. Figure 7 presents the historical accuracy rates of the GoogLeNet v1 model. The differences among the results from the 12 sets of training data were small, and the accuracy rates all converged to 1. Figure 8 displays the historical accuracy rates of the GoogLeNet v4 model. The training accuracy rates from the 12 sets of training data all converged to 1; however, the validation accuracy rates fluctuated and could not effectively converge to 1. Figure 9 shows the historical accuracy rates of the ResNet-34 model. The training accuracy rates from the 12 sets of training data all converged to 1; however, the validation accuracy rates fluctuated sharply and could not converge to 1.
The objective of this study was to detect sandblasting defects in investment castings, so we attached more importance to the "unqualified" prediction results. An "unqualified" casting predicted as "unqualified" represents a true positive (TP), whereas an "unqualified" casting predicted as "qualified" represents a false negative (FN). Similarly, a "qualified" casting predicted as "qualified" represents a true negative (TN), whereas a "qualified" casting predicted as "unqualified" represents a false positive (FP).
We used recall, precision, and the F1 score to evaluate model training results; their formulas are given in Eqs. (1)–(3):

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

Table 1 lists the evaluation results. In terms of overall training time, AlexNet took the least time among the five models. GoogLeNet v1 was close to AlexNet in training time, followed by ResNet-34; GoogLeNet v4 took the longest overall. In terms of average time per image on the test dataset, AlexNet was the fastest, taking less than 3 milliseconds per image, followed by GoogLeNet v1, which took 1.9 to 3.2 milliseconds. GoogLeNet v4 was the slowest, taking 9.6 to 12.4 milliseconds. GoogLeNet v4 and ResNet-34 did not perform as expected in defect prediction; we therefore speculate that the investment castings used in this study were not suitable for training models with residual learning architectures for defect detection.
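A direct translation of Eqs. (1)–(3) into code, with "unqualified" (defective) treated as the positive class:

```python
def evaluate(tp, fn, tn, fp):
    """Compute recall, precision, and F1 score from confusion-matrix counts,
    treating 'unqualified' (defective) as the positive class."""
    recall = tp / (tp + fn)                              # Eq. (1)
    precision = tp / (tp + fp)                           # Eq. (2)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (3)
    return recall, precision, f1
```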
We employed YOLO v3 to detect whether investment castings appeared in the machine vision images, so we defined only one YOLO v3 category: "sample". Thus, when training YOLO v3, we did not investigate the impact of data size or batch size. Models with better object detection performance generally display the following behavior: "samples" take up a greater proportion of the recognized images, and even when other categories are identified, the model still identifies "samples" as much as possible. In other words, while increasing recall, a certain level of precision must also be maintained. The area under the precision-recall curve (i.e., average precision (AP)) is therefore generally used for evaluation. Calculations revealed that the AP of the "samples" in this study was 99.83%, indicating that YOLO v3 presented excellent performance.
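As a simplified illustration of AP as the area under the precision-recall curve, the snippet below uses scikit-learn on placeholder scores; note that full object-detection AP additionally requires IoU matching of predicted and ground-truth boxes, which is omitted here.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Placeholder ground truth (1 = a true "sample" box) and detector confidences;
# in practice these come from matching YOLO detections to labelled boxes.
y_true = np.array([1, 1, 0, 1, 0, 1])
y_score = np.array([0.98, 0.91, 0.45, 0.88, 0.12, 0.79])

# Area under the precision-recall curve, i.e., the AP used above
ap = average_precision_score(y_true, y_score)
print(f"AP for 'sample': {ap:.2%}")
```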

Design and practical application of graphical user interface
To make it convenient for users to perform sandblasting-defect detection, we employed the PyQt5 library to design a graphical user interface (GUI). After images are captured under sufficient lighting, YOLO v3 analyzes them to detect investment castings, and the images are then sent to the CNN for defect detection. Upon starting, the application automatically loads the trained YOLO v3 weights and CNN weights. Before the sandblasting class of an investment casting is predicted, a certain range of the camera frame is first extracted (the dimensions of this range depend on the image size designated during CNN training).
To prevent capture failure, we added a detection boundary, shown as the off-white frames in the images on the left of Fig. 10. When YOLO v3 detects an investment casting, the sandblasting class is not predicted unless the center point of the casting's bounding box falls within the detection boundary; instead, the frame displays warning text, as shown by the orange frame in the right image of Fig. 10(a). In contrast, if YOLO v3 detects an investment casting and the center of its bounding box falls within the detection boundary, the sandblasting class is predicted using the CNN. The results are as shown in Fig. 10(b): a green frame in the right image indicates a "qualified" casting, and a red frame indicates an "unqualified" casting.
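This boundary logic might be sketched as follows; the function name, the (cx, cy, w, h) box convention, and the assumption that class index 0 corresponds to "qualified" are all illustrative, not the exact GUI implementation.

```python
def classify_if_inside(frame, box, boundary, cnn_model, crop_size=224):
    """Run CNN defect classification only when the YOLO box center falls
    inside the detection boundary; otherwise signal a warning (orange).
    'box' is (cx, cy, w, h); 'boundary' is (x_min, y_min, x_max, y_max)."""
    cx, cy, _, _ = box
    x_min, y_min, x_max, y_max = boundary
    if not (x_min <= cx <= x_max and y_min <= cy <= y_max):
        return "warning"          # center outside boundary: no prediction
    half = crop_size // 2
    crop = frame[cy - half:cy + half, cx - half:cx + half]
    if crop.shape[:2] != (crop_size, crop_size):
        return "warning"          # crop extends past the frame edge
    probs = cnn_model.predict(crop[None, ...] / 255.0)  # normalized batch of 1
    # class-index mapping (0 = "qualified") is an assumption
    return "qualified" if probs[0].argmax() == 0 else "unqualified"
```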

Conclusions
In our assessment of the classic CNN models using different indices, the AlexNet model with 128 × 128 pixel data and batch size = 32 displayed the best training performance, followed by the AlexNet model with 224 × 224 pixel data and batch size = 16, the AlexNet model with 224 × 224 pixel data and batch size = 4, and the VGG-16 model with 224 × 224 pixel data and batch size = 8. We speculate that batch size influences the training results to some degree; however, limited by our hardware, we could not add more training data to prove this. We can, however, be sure that batch size affects training time. GoogLeNet v4 and ResNet-34 presented poor prediction capabilities, with no discrimination among their 24 training datasets. We therefore speculate that DL performed using CNN models with residual learning architectures is not suitable for the detection of sandblasting defects in investment castings.
Although the average precision of YOLO v3 in predicting "samples" was 99.83%, the training datasets were fairly uniform (the images all contained investment castings against a white background), so at some camera angles, non-background objects in more complex images may be mistaken for "samples" in practical applications. We suggest that future studies define investment casting categories based on shape, such as "sample a", "sample b", and "sample c", or add more images containing multiple investment castings, including some overlapping each other, to enhance image diversity.
During training, we re-adjusted the sizes of the input images. Thus, when the application needs to detect objects, it captures images of the same size as the input images, centered on the objects. We therefore added a detection boundary to the left image of the application (the actual YOLO v3 object detection image) and only predicted the sandblasting class of investment castings within the detection boundary. Furthermore, to implement image monitoring, a proper distance between the camera and the investment castings must be maintained so that complete images of the investment castings can be captured.

Figure 5. Historical accuracy of the AlexNet model
Figure 6. Historical accuracy of the VGG-16 model
Figure 7. Historical accuracy of the GoogLeNet v1 model
Figure 8. Historical accuracy of the GoogLeNet v4 model
Figure 9. Historical accuracy of the ResNet-34 model
Figure 10. Demonstrations of the GUI: (a) casting part (orange) exceeds the detection boundary; (b) "unqualified" (red) and "qualified" (green) parts