Multiscale feature extraction and its application in the weld seam quality prediction for plasma arc welding

As a complex thermo-physical process, plasma arc welding (PAW) is prone to instability caused by external interference. Weld quality monitoring is therefore important for intelligent robotic PAW. Because instability mechanisms differ, it is difficult to obtain high adaptivity and accuracy with features extracted in a single time window. In this paper, a novel feature extraction method based on sliding multiscale windows is proposed to improve model accuracy and calculation speed. A group of windows with different time widths is established to extract multiscale information. The windows slide throughout the welding process and are synchronized on the timeline for feature correlation. The welding current and arc voltage are processed to extract features inside the windows, including signal denoising by discrete wavelet transform (DWT) and dimension reduction by principal component analysis (PCA). Based on the feature vectors extracted from the multiscale windows, a support vector machine (SVM) with radial basis function (RBF) kernel is used as the prediction model. The best window width is determined automatically by model training. The proposed method is used to predict weld quality for PAW in the field of shipbuilding. The results show that multiscale feature extraction helps to improve prediction precision and recall.


Introduction
Plasma arc welding (PAW) can obtain welds with a high depth-to-width ratio by compressing the arc, which improves manufacturing efficiency significantly [1]. It has been widely used in the petroleum, shipping, and aviation industries, among others [2,3]. In recent years, with the development of liquefied natural gas (LNG) ships, the application of PAW to the joining of stainless-steel sheets has been growing rapidly. However, PAW is easily disturbed by external interference in automated mass production. It has been found that an unstable keyhole or plasma arc can cause porosity and undercut in weldments [4,5]. Van Anh et al. [6,7] demonstrated the close relationship between poor weld formation and both welding heat input and plasma flow. As a distinctive feature of PAW, the keyhole contributes significantly to the large depth-to-width ratio and high efficiency, but it also increases the complexity of the welding process. Liu et al. [8] concluded that the process stability of PAW is influenced by the combined effects of many factors. Wu et al. [9] performed numerical simulations to investigate the interaction among the plasma arc, molten pool, and keyhole in PAW. They demonstrated the close relationship between these factors, but the interpretive ability of the model is limited by the high complexity of the welding process. Accordingly, it is necessary to establish an adaptive monitoring system for automated production with PAW.
For traditional welding production, process monitoring mainly depends on manual observation and expert experience. Some welding process sensors and monitoring devices have been established based on process signals [10]. Zhang et al. [11] found through a large number of observations that the plasma cloud decreases rapidly upon establishment of the full-penetration keyhole. They designed a sensor based on this mechanism to detect weld formation. Their follow-up study [12] found that the duration of the peak current was equal to the establishment time of the keyhole. Such results normally rely on expert experience and usually require complex equipment as well. For example, Liu et al. [13] built a vision system to monitor the transient keyhole status from the backside of the workpiece in the PAW process. They found a strong relevance between the falling edges of the welding current and the dynamic behavior of the keyhole. These research results are of great significance for understanding the PAW process. However, they also show that it is very difficult to establish a monitoring system for such complex welding processes, which limits application in large-scale production.
Methods based on artificial intelligence (AI) have an advantage in handling such highly nonlinear problems. As reviewed by Zhang et al. [14], machine learning plays a vital role in the field of welding by minimizing human involvement and making adaptive decisions. Chi and Hsu [15] established a fuzzy radial basis function (RBF) neural network to predict quality characteristics of the PAW process. Wu et al. [16] proposed a monitoring approach for the penetration state in variable polarity keyhole PAW. Feature extraction is an important part of AI modeling, and in the field of welding it normally requires a lot of time and effort. Based on continuous research, Wu et al. [17][18][19] extracted time-domain and frequency-domain features from sensors to constitute a fused feature set that reflects the variation trend of PAW weld penetration status. Song et al. [20] conducted a series of studies on the relationship between the welding sound and penetration states in PAW. They obtained feature vectors and proposed a t-SNE method to recognize different penetration states. In recent years, some novel feature extraction methods have been studied and applied in the field of welding. Shevchik et al. [21] used wavelet packet transform (WPT) to extract features from the optical and acoustic emission signals in the welding process. The quality classification accuracy ranged between 71 and 99%. You et al. [22] used wavelet packet decomposition (WPD) to extract features and applied principal component analysis (PCA) to refine them. A model using these features provided a mean accuracy of 80.17% in predicting weld defects such as blowouts, humping, and undercut. Zhang et al. [23] calculated the statistical features of wavelet packet coefficients based on WPD. Their model detected blowout, undercut, and humping defects with an accuracy of more than 93%.
To extract features, signals are analyzed in a specific time span called "window". The width of windows would affect the quality of extracted features significantly. At present, there are many studies on extraction methods, but less attention is paid to the calculation window. The optimization of the window number and width mainly depends on expert experience.
Huang et al. [24] determined the peak period (0.4 s) as calculation window to evaluate porosity defects for pulsed tungsten inert gas (TIG) welding of aluminum alloy. They computed a ratio of two spectral line intensities and their statistical parameters in every peak period of spectral signal. Huang et al. [25] observed laser welding process and found that the computed window with period of 0.002 s is suitable for predicting weld formation defects. In gas metal arc welding (GMAW) process, Huang et al. [26] extracted features in the window with time length of 0.4096 s. The accuracy of their model for weld formation was 98.75%. There are significant differences of window width among prediction models. This is due to the differences of welding principles, influencing factors and detection targets. For instance, in laser welding, Wang et al. [27] studied the oscillation of plasma and found out that the fluctuation period was around 450-600 μs. Huang et al. [28] concluded in their study that the fluctuation frequency of plasma is between 1.5 and 3 kHz, for keyhole it is between 200 and 700 Hz, and noise can be at the order of 10 kHz. Improper calculation window will lead to information loss, which will reduce the accuracy of models. Although manual determination is helpful, it increases the dependence of evaluation methods on experts. To this end, it is necessary to design a method to determine the most appropriate mode for window settings.
This paper aims to present an adaptive monitoring system to evaluate the welding quality of the PAW process. The welding current and arc voltage are collected as process signals. After wavelet denoising, a multiscale signal feature extraction method is designed to obtain information within different time scales synchronously. A group of sliding windows with different widths is established. The multiscale windows are synchronized on the timeline for feature correlation. After window initialization, dimension reduction by PCA and an SVM with RBF kernel are used to establish the prediction model. Through labeling and training, the optimum window widths are determined. The proposed method was used in weld quality prediction for PAW in the field of shipbuilding. Its performance is compared with that of single-window models. The comparison shows that the method helps to improve the precision and recall of the model.

Method and procedure
In PAW, the keyhole is highly coupled with the weld pool and plasma arc, and keyhole behavior involves mass and heat transfer. The so-called process instability refers to abnormal keyhole behavior, whose duration varies and is related to its formation mechanism. Thermo-physical behavior in PAW is too complicated to evaluate accurately based on theory alone. In addition, there are differences between applications. Therefore, it is necessary to carry out data-driven optimization in addition to knowledge-driven modeling. The core of the proposed method is to extract information from the complex PAW process with multiscale windows. The purpose is to improve the accuracy of prediction. The settings of the windows are critical. To this end, training based on a group of labeled data should be carried out for window optimization. Accordingly, the method is divided into two parts: training and prediction, as shown in Fig. 1. In the training part, there are three steps: (1) signal preprocessing, (2) feature extraction with multiscale windows, and (3) model training to determine the optimum window width. When the settings of the windows are optimized, the prediction model is established. In the prediction part, the information extracted by multiscale windows is used as the input vector of the model to output prediction results. The multiscale feature extraction helps to obtain effective and adequate information, which in turn helps to improve model accuracy.
Fig. 1 The flow diagram of the proposed multiscale feature extraction method

Signal denoising
Although there is an analog filter in the signal conditioning module, it is difficult to avoid noise in signals collected during the welding process. The frequency of noise is usually high. When the signal-to-noise ratio is low, it is very unfavorable to information extraction, and the optimization of multiscale windows would be affected. To this end, signals are denoised by the wavelet denoising method. The method is based on the principle of the discrete wavelet transform (DWT). DWT, developed by Mallat [29], is a widely used wavelet transform method which decomposes the signal into a series of components with different frequency ranges. In the DWT formulation, $j$ and $k$ are the parameters controlling the scale and shift of the basis wavelet functions and scale functions, and $W_{j_0,k}$ and $W_{j,k}$ are called the approximation coefficient and detail coefficient, respectively. The two coefficients stand for the low-frequency and high-frequency components of the signal. In the wavelet denoising method, for each component from DWT, a threshold is set to filter the high-frequency values. After filtering, the components are reconstructed into the denoised signal through a reverse process.
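The decompose, threshold, and reconstruct procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: a Haar wavelet is substituted so that the transform fits in a few lines, soft thresholding is applied to the detail coefficients only, the threshold is assumed to be the universal threshold with a median-based noise estimate, and the function names (haar_dwt, haar_idwt, soft_threshold, wavelet_denoise) are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt(signal, levels):
    """Decompose a signal into one approximation and `levels` detail arrays."""
    approx = np.asarray(signal, dtype=float)
    details = []  # details[0] is the finest (highest-frequency) level
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / SQRT2)  # detail coefficients
        approx = (even + odd) / SQRT2         # approximation coefficients
    return approx, details

def haar_idwt(approx, details):
    """Reverse process: rebuild the signal from its coefficients."""
    out = np.asarray(approx, dtype=float)
    for d in reversed(details):  # start from the coarsest level
        rebuilt = np.empty(2 * len(d))
        rebuilt[0::2] = (out + d) / SQRT2
        rebuilt[1::2] = (out - d) / SQRT2
        out = rebuilt
    return out

def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t; magnitudes below t become zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

def wavelet_denoise(signal, levels=3):
    """Denoise by soft-thresholding the detail coefficients of a Haar DWT."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, details = haar_dwt(signal, levels)
    # Assumption: noise level estimated from the finest detail coefficients.
    sigma = np.median(np.abs(details[0])) / 0.6745
    # Assumption: universal threshold sigma * sqrt(2 * ln(N)).
    t = sigma * np.sqrt(2.0 * np.log(len(signal)))
    details = [soft_threshold(d, t) for d in details]
    return haar_idwt(approx, details)
```

With a library such as PyWavelets, the same structure would apply with a 'db4' wavelet in place of the Haar filters; only the transform pair changes, while the thresholding step stays the same.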

Multiscale windows
After denoising, signals of a complex process usually have different frequency components. Especially in the case of defects caused by process instability, the information usually comes from different frequency scales. For PAW, weld defects form due to the unstable keyhole, which is highly coupled with the plasma arc and molten pool. Instability can be reflected in abnormal fluctuation of process signals. The frequency of fluctuation is affected by physical factors including materials, joint types, welding methods, and environmental factors. The time duration of these factors differs. Even in a single welding process, the frequency of the fluctuation may vary. Figure 2a shows a filtered voltage signal collected in a welding process, where fluctuation can be seen in the starting period. The weld formation in the initial part was poor because there was a large assembly gap. As the assembly gap decreased, the welding process became stable in the second half of the welded seam. Two sections of unstable welding process, Sects. 1 and 2, are used to investigate the effects of window width. Windows with different widths are set for feature extraction from welding voltage signals, as shown in Fig. 2b.
Fig. 2 (b) Feature extraction ranges at section (1) and section (2) by using a smaller extraction window; (c) feature extraction ranges at section (1) and section (2) by using a larger extraction window
A group of windows with different widths is established, and feature extraction is carried out inside windows. Normally, short windows focus on details, while long windows are able to extract features that contain long-term information. The windows slide throughout the process. In addition, the multiscale windows should be synchronized on the timeline for feature correlation. Figure 3 presents the setting method of multiscale windows. The initialization of window width requires only a basic understanding of the analyzed object; expert knowledge is not necessary. Taking three windows as an example, all windows are aligned at their end time. The moving distance between neighboring windows is set equal to the shortest window width. Before each slide, feature extraction is carried out inside every window. The features are used for model prediction. In this mode, there is no overlap for the shortest window. For the other windows, some data points are shared between two consecutive calculations. The sliding step length and calculation frequency are determined by the shortest window. At every iteration, the number of windows and their widths are kept constant. Through iterations, the width of windows is optimized by model training.
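The window layout described above (all windows aligned at their end time, sliding step equal to the shortest window) can be sketched as index arithmetic. This is an illustrative sketch only: the function name multiscale_windows is hypothetical, widths are expressed in data points, and positions where the longest window would start before the first data point are skipped, which is an assumption because the treatment of the start of the record is not specified.

```python
def multiscale_windows(n_samples, widths):
    """Return, for each slide position, one (start, end) index pair per window.

    widths: window lengths in data points.
    All windows at one position share the same end index, and the step
    between positions equals the shortest width.
    """
    step = min(widths)
    longest = max(widths)
    positions = []
    # Assumption: start once the longest window fits entirely in the record.
    for end in range(longest, n_samples + 1, step):
        positions.append([(end - w, end) for w in widths])
    return positions


# Usage: 2 s of signal at 20 kHz with a 0.01 s and a 0.5 s window.
fs = 20000
widths = [round(0.01 * fs), round(0.5 * fs)]
positions = multiscale_windows(2 * fs, widths)
short_first, long_first = positions[0]
print(len(positions), short_first, long_first)
```

Consecutive short windows touch without overlapping, while consecutive long windows share all but one short-window length of data, which matches the overlap behavior described in the text.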

Modeling procedure
As raw data of PAW, welding current and arc voltage contain plentiful information on the process state. Because of their size, they are not suitable to be used directly as the input of the model, and dimensionality reduction is indispensable. Accordingly, the raw data are first segmented into windows, and then dimension reduction is executed for the data inside each window. Here, the principal component analysis (PCA) [30] method is used for data dimension reduction. PCA is a widely used method of feature extraction and data dimension reduction. It transforms the original data to a new coordinate system in which the directions of greatest variance of the projected data lie on the leading coordinates (called principal components). Here, the computation process mainly includes the following steps [22]. First, the mean vector $\bar{V}$ of the feature vectors in data matrix $V$ is removed from the data. Next, the covariance matrix $C$ of the data matrix is calculated. The eigenvector $u_i$ of the covariance matrix is then obtained by solving the equation $C u_i = \lambda_i u_i$, where $\lambda_i$ is the $i$th eigenvalue of $C$, and $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Finally, the transformed feature matrix $S$ is calculated by projecting the data onto the eigenvectors, where each row contains a principal component, and the principal components are sorted in the order of importance. By choosing the number of rows to be used, the feature matrix's dimension can be reduced. The number of principal components is determined by the information contained in them, where the information ratio is often represented by the variance ratio of the component. Normally, the number of components is set to contain at least 85% of the sample's variance.
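The PCA steps above (centering, covariance, eigendecomposition, projection, and selection by cumulative variance ratio) can be sketched with NumPy as follows. This is a minimal sketch rather than the implementation used here: the function name pca_reduce is hypothetical, samples are placed in rows and features in columns (the transpose of the row-per-component layout described in the text), at least two feature columns are assumed, and how the data inside one window are arranged into the data matrix is not specified in the text.

```python
import numpy as np

def pca_reduce(X, min_variance=0.85):
    """Project X onto the fewest principal components that keep min_variance.

    X: array of shape (n_samples, n_features), with n_features >= 2.
    Returns (scores, explained_ratio) for the retained components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                # remove the mean vector
    C = np.cov(Xc, rowvar=False)           # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()        # variance ratio of each component
    # Smallest k whose cumulative variance ratio reaches min_variance.
    k = int(np.searchsorted(np.cumsum(ratio), min_variance) + 1)
    return Xc @ eigvecs[:, :k], ratio[:k]
```

scikit-learn's PCA class accepts a fractional n_components (for example 0.85) to perform the same selection by explained variance, so in practice the library routine would likely replace this hand-written version.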
Subsequently, the feature vectors with the same ending time are concatenated end to end to form a dataset that contains multiscale information. The new feature dataset can be used in machine learning models. To determine the optimum window size, a prediction model is built and trained on datasets with different window widths. Each dataset is divided into a training set and a test set. Here, a support vector machine (SVM) with radial basis function (RBF) kernel is used as the machine learning model. SVM is a common machine learning model developed by Cortes and Vapnik [31], which has been applied to various fields. In the SVM formulation, $y_i$ and $x_i$ are the label and the feature vector of the $i$th sample, respectively, and $\xi_i$ is the distance between the inseparable points and the separating hyperplane. The hyperparameter $C$, called the regularization parameter, controls the strength of the penalty on the inseparable points. Kernel functions are used to map the input features to a higher-dimensional space to solve nonlinear problems. A frequently used kernel function is the RBF kernel, in which $x'$ is the center of the kernel and $\gamma$ is the shape parameter. The kernel function is used to replace the input feature vector $x$ in Eq. (10) to solve nonlinear problems. The hyperparameters $C$ and $\gamma$ should be optimized, which is usually done by cross-validation with a grid search.
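To make the role of the kernel shape parameter concrete, the following sketch evaluates an RBF kernel matrix with NumPy. It assumes the common parameterization K(x, x') = exp(-gamma * ||x - x'||^2), which is the form used by scikit-learn's SVC with kernel='rbf'; the exact expression used in this work is not reproduced in the text, and the function name rbf_kernel_matrix is hypothetical.

```python
import numpy as np

def rbf_kernel_matrix(X, Y, gamma):
    """Kernel matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dist = (
        (X ** 2).sum(axis=1)[:, None]
        + (Y ** 2).sum(axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))
```

A larger gamma makes the kernel decay faster with distance, so each training sample influences a smaller neighborhood; this is why gamma and C are normally tuned together.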

Application situation and objectives
The proposed method was used to predict the weld quality of PAW in the field of shipbuilding. As shown in Fig. 4, the robotic PAW system included an LHM-315 plasma welding machine, a TP-3 plasma torch, and a Yaskawa Shougang DX100 robot. A torch orifice with a diameter of 1.6 mm was used in the experiment. The electrode diameter was 1.5 mm. The electrode setback, the distance between the electrode tip and the nozzle orifice, was 3 mm. The electrode tip angle was 20°. Argon with a purity of 99% was chosen as plasma and shielding gas, and the gas flow rate was 12 L/min. The welding current was 40 A, and the welding speed was set to 20 cm/min. The welded material was 304L austenitic stainless steel. The size of the samples was 150 mm × 150 mm × 1.2 mm, and the joint type was lap joint. During the welding process, a Hall current sensor and a Hall voltage sensor were used to obtain the electrical signals of the welding circuit. A signal conditioning module was used for signal amplification and analog filtering. The signal acquisition module (USB-6211) was responsible for analog-to-digital conversion. The sampling rate of the acquisition process was 20 kHz. After the digital signals were transmitted to the computer, LabVIEW was used for digital signal processing and visualization. Data dimension reduction and evaluation modeling were performed on the Python platform, with the NumPy and Scikit-learn libraries used as tools. Four groups were designed to represent different welding stability conditions. Each group was repeated 5 times. As the baseline, the distance between torch and workpiece in group 1# was constant at 2 mm to keep the plasma stable. As shown in Fig. 5, unstable welding conditions were created by changing the torch height. In group 2#, the torch was lifted 1 mm at the middle of the welding process. On the contrary, the torch height in group 3# was reduced to 1 mm during the welding process. In group 4#, a spacer was used to increase the distance between torch and workpiece continuously. As a result, the fluctuation of torch height led to unqualified weld formation, as shown in Fig. 6. Here, the current and voltage of the plasma arc are collected as input signals of the model. The surface quality of the welded seam was the prediction target of the model. According to surface defects and seam width uniformity, the weld process was divided into two types. In Fig. 6, the zones with poor formation are marked in red; these should be predicted by the model during robot welding. The remaining zones, as well as their corresponding welding processes, are considered qualified. The target of the proposed model is to predict the weld quality based on electrical signals and to present evaluation results online.
Fig. 4 The PAW welding system and signal acquisition device

Modeling and windows optimization
During robot welding, electrical signals of the plasma arc are easy to collect synchronously, which provides conditions for the industrial application of the model. However, there is a lot of noise in the signals, which is disadvantageous to the subsequent feature extraction. To this end, the raw signals are denoised first. The wavelet denoising method is used to remove the high-frequency noise in the signal while keeping the fluctuation of the signal as much as possible. Here, the denoising threshold in this experiment was set as a function of length(x), the length of the signal. The threshold type used was soft threshold. The wavelet type was set to 'db4', and the decomposition level was set to no more than 11. Compared to the raw signals presented in Fig. 6, the denoised waveform presented in Fig. 7 has less high-frequency noise, but the fluctuation trend and local details are preserved.
The denoised electrical signals of PAW are used as information sources. With the 20-kHz sampling frequency, there would be a large number of input variables if the electrical signals were used directly as the input of the classifier. In addition, the number of input variables would change with the window width. To this end, the electrical signals in each window are processed by PCA. It was found that the cumulative variance contribution rate of the three principal components can exceed 85% regardless of the window width. Thus, the number of input parameters does not change with the window size; that is, each window forms three principal components as the input of the SVM. After dimension reduction based on PCA, an SVM model with RBF kernel was established. The hyperparameters $C$ and $\gamma$ of the SVM model were optimized using grid search with cross-validation. The candidate values of the hyperparameters are set to {0.1, 1, 10, 100, 1000}. Because the result of hyperparameter optimization may differ according to the choice of window widths, the specific optimization results are not discussed in this paper, and the final model always used the best hyperparameters to produce the outputs.
Here, the weld quality is the prediction target, and it is divided into poor formation and qualified formation. Two windows are set for feature extraction. The short window is designed for local and short-term disturbances, while the long one collects long-term fluctuation or trend in the welding process. According to Wang and Chen [32], for the PAW process, the fluctuation frequency could lie in a range of 0.5 to 1 kHz, and the frequency was found to be widely distributed within 250 Hz. Therefore, the range of segment lengths was set between 0.001 s and 2 s. The short window width was chosen from the list {0.001 s, 0.005 s, 0.01 s, 0.05 s, 0.1 s}, and the long window width from the list {0.1 s, 0.5 s, 1.0 s, 1.5 s, 2.0 s}. The window widths, as well as their pairing, are the optimization objective of the proposed model. Orthogonal experiments were performed for this purpose. According to the presented method, the total number of samples in the dataset was determined by the short window size. In this study, for datasets with short window widths of {0.001 s, 0.005 s, 0.01 s, 0.05 s, 0.1 s}, the numbers of samples were {42002, 8402, 4203, 842, 422}, respectively. 70% of the dataset was used as the training set, and the remaining 30% was used as the test set.
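As a quick arithmetic check on the figures above, the short script below converts each candidate window width into a number of raw data points at the 20 kHz sampling rate and computes the signal duration implied by each reported dataset size. It assumes one dataset sample per short-window step, which follows from the sliding scheme; the variable and function names are hypothetical.

```python
FS_HZ = 20_000  # sampling rate reported for the acquisition module

SHORT_WIDTHS_S = [0.001, 0.005, 0.01, 0.05, 0.1]
LONG_WIDTHS_S = [0.1, 0.5, 1.0, 1.5, 2.0]
SAMPLE_COUNTS = [42002, 8402, 4203, 842, 422]  # dataset sizes reported in the text

def points_per_window(width_s, fs_hz=FS_HZ):
    """Number of raw data points falling inside one window of one signal."""
    return round(width_s * fs_hz)

short_points = {w: points_per_window(w) for w in SHORT_WIDTHS_S}
long_points = {w: points_per_window(w) for w in LONG_WIDTHS_S}

# Signal duration implied by each dataset: one sample per short-window step.
implied_duration_s = [n * w for n, w in zip(SAMPLE_COUNTS, SHORT_WIDTHS_S)]

for width, n_points in {**short_points, **long_points}.items():
    print(f"{width} s window: {n_points} points per signal")
print([round(d, 2) for d in implied_duration_s])
```

The window sizes range from 20 points (0.001 s) to 40000 points (2.0 s) per signal, which illustrates why the data in each window are reduced to a few principal components before classification. The implied durations all fall between about 42.0 s and 42.2 s, consistent with the sample count being set by the short window width.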
During training, the five-fold cross-validation method was used to reduce overfitting. Accuracy on the test set and total computation time cost were used to evaluate the performance of the model. The accuracy was defined as:
$$A_t = \frac{n_c}{n} \tag{11}$$
where $n_c$ and $n$ represent the number of correctly classified samples and the total number of samples in the test set, respectively. Computation time was the total time cost of data segmentation, feature extraction, model training, and model testing.
According to the results presented in Fig. 8a, the model accuracy was significantly affected by window widths. The window settings located in the upper left corner have high accuracy. This indicates that the longer the long window and the shorter the short window, the higher the accuracy of the model. As the width of the short window decreases, the accuracy of the model increases rapidly, which shows that detail information contributes considerably to prediction. As the long window width increases, the performance of the model increases slightly. This reflects the relationship between the weld formation and the long-term fluctuation trend of the welding process. It could be attributed to the information loss on both local disturbances and long-term trend. In terms of model accuracy, the two-window model has a wide range of parameter selection, which indicates that modeling with multiscale windows helps to improve model accuracy. As shown in Fig. 8b, the computation time cost mainly depends on the width of the short window. As the short window width decreases, the number of windows grows, which results in a large amount of calculation.
During the optimization of windows, the model accuracy and calculation time should be balanced. In Fig. 8a, the accuracy of model has the best performance when the short window width is less than 0.01 s, and the long window width is larger than 0.5 s. In Fig. 8b, the computation cost increases significantly when short window width is less than 0.005 s. Therefore, the best choice in this case is 0.01 s for short window and 0.5 s for long window. The optimized matching between short window and long window is marked as a red point in Fig. 8.

Model performance
The model with optimized windows, namely a 0.01 s short window and a 0.5 s long window, is established to predict weld quality for PAW. The prediction result of the model is compared with those of models using a single window. The window widths of the single-window models are 0.005 s, 0.01 s, 0.05 s, 0.1 s, 0.5 s, 1 s, and 2 s. The precision and recall for both qualified formation and poor formation are calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
where TP represents the number of correctly classified positive samples, FP denotes the number of samples wrongly classified as positive, and FN denotes the number of samples wrongly classified as negative. In particular, if the number of samples classified as positive is 0, the precision is also set to 0. The result is presented in Table 1.
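The per-class precision and recall described above, including the convention that precision is 0 when no sample is classified as positive, can be sketched in plain Python. The function name precision_recall and the class labels in the usage example are hypothetical, and setting recall to 0 when the class has no true samples is an additional assumption that the text does not state.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Convention from the text: precision is 0 if nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Assumption: recall is likewise 0 if the class never occurs in y_true.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Usage with made-up labels; each formation type is scored in turn.
y_true = ["poor", "poor", "qualified", "qualified", "qualified"]
y_pred = ["poor", "qualified", "qualified", "qualified", "poor"]
for label in ("poor", "qualified"):
    print(label, precision_recall(y_true, y_pred, positive=label))
```

Scoring each class separately in this way matters for imbalanced welding data, where overall accuracy can stay high even when the less frequent class is detected poorly.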
The result shows that the model using multiscale windows performs better than models with single window, both in the precision and recall of poor formation. The performance of single-window models varies significantly with the window width. Generally, the smaller segment length results in a better performance for the classification of poor formation areas. But this improvement is at the cost of computing time. The accuracy of the multiscale-window model is better than all single-window models. It proves that the proposed modeling method is helpful to improve the prediction accuracy of weld quality for PAW. Figure 9 shows the prediction results in test set, where most samples are in good agreement with their actual seam states.
The model training was performed on a computer with a high-performance Intel Xeon E5-2470 CPU (2.30 GHz with 16 cores and 32 threads). For industrial application, the trained model ran on an Intel i5-8300H CPU (2.30 GHz with 4 cores and 8 threads) to output prediction results. Table 2 shows the average time cost for each step of onsite prediction. The total computation time for one weld is 3.612 ms, less than the time period (10 ms) of a welding process. In terms of accuracy and computation time, the proposed model has good engineering applicability. Currently, the most time-consuming step is signal preprocessing, mainly because of the wavelet denoising. In future applications, a physical filter circuit would be added to the acquisition equipment to reduce or remove the need for digital filtering, which would greatly reduce the model's computation time.
Fig. 9 Prediction results in test set by proposed model

Conclusions
A feature extraction method based on multiscale windows was proposed to improve the accuracy and response speed of the evaluation model. Its application in weld quality prediction for PAW demonstrates its applicability in the field of shipbuilding. The intelligence level of robot welding is improved. The conclusions are as follows:
1. Signals in complex processes contain information in a wide frequency range. It is difficult for a model with a single window to extract effective information rapidly. Modeling with multiscale windows helps to extract information on both momentary disturbances and long-term fluctuation trends. The optimization of window width is the dominant factor for model accuracy.
2. To establish the multiscale-window model, SVM with RBF kernel is used after signal denoising and dimension reduction by PCA. The windows with different widths are aligned by time stamp. For every window, the shift distance between neighboring positions is equal to the shortest window width. The optimization of window width is carried out aiming at both model accuracy and calculation time.
3. A model with two window scales is established to predict weld quality of PAW in the field of shipbuilding. The width of the short window is 0.01 s, while the width of the long window is 0.5 s. The prediction precision is 99.943% and the recall ratio is 99.986%. Its calculation time is 3.612 ms. The comprehensive performance of the prediction model is significantly improved by the multiscale feature extraction method.