Research On Arc Fault Detection Using Morlet Wavelet Features And ResNet

Series arc faults are the main cause of electrical fires in low-voltage distribution systems, and a fast, accurate detection system can effectively reduce the fire risk. In this paper, series arc experiments are carried out for different kinds of electrical loads. The time-domain current is analyzed with the Morlet wavelet, and the multiscale wavelet coefficients are arranged into a coefficient matrix. An HSV color index is used to map the coefficient matrix to a phase space image. Random gamma transform and random rotation are applied for data enhancement. Finally, a typical deep residual network (ResNet) is established for image recognition. Training results show that this method can detect faults in real time. ResNet50 reaches an accuracy of 96.53% on the data set built in this paper.


Introduction
An arc is an abnormal discharge phenomenon in an insulating medium. An arc can keep burning when the circuit voltage is higher than 20 V and the current is greater than 0.1 A, and it can easily injure people or start an electrical fire. In low-voltage power distribution systems, arc faults may be caused by irregular circuit connections and aging electronic equipment. When a series arc fault occurs, the residual current of the circuit is usually below the cut-off threshold of the low-voltage circuit breaker, so the circuit cannot be cut off in time. A real-time and accurate arc fault detection system can effectively reduce the risk of fire.
Since Cassie and Mayr established their arc models in the 1940s by studying the relationship between arc dissipated power and current, research on arc faults has mainly focused on the simulation of arc models [1]. As these two arc models are suitable for circuits with different voltages, some studies aim to improve and integrate them so that they apply to more situations [2] [3].
In recent years, research on arc fault detection has no longer been limited to mathematical arc models. Some circuit parameters change abruptly when an arc fault occurs, so faults can be detected by using these parameters as features. In practical applications, it is difficult to collect the optical and thermal characteristics of the arc in real time because arcs are generated randomly. As a result, the common feature extraction method is time-frequency domain analysis of voltage or current. The fast Fourier transform (FFT) is the main method of frequency domain analysis. In addition, literature [4] detects series arc faults of different loads through the change rate of the voltage spectral dispersion index (SDI) of the low-voltage power line. Literature [5] designed a high-frequency coupling sensor to detect the high-frequency characteristics of arc generation and established a fault identification model according to the different high-frequency characteristics of different loads. In literature [6], fault detection models under different loads are designed by fusing time-domain information with frequency-domain information obtained by wavelet analysis and feeding the fused features into a convolutional neural network. The extraction of time-domain features has also moved from the original phase analysis to empirical mode decomposition (EMD) or noise processing [7][8][9]. However, a single feature is highly uncertain when facing arc faults with many singularities. The fusion of fault features has therefore become a new challenge in the research of detection methods.
As a time-frequency analysis method, wavelet analysis offers a multi-resolution characteristic that has proved efficient in many fields. Fusing fault features in the time-frequency domain supports real-time detection [10][11].
In addition, some statistics of the circuit can also be used as fusion features, such as the information entropy, wavelet energy entropy and power spectrum entropy of the signal [12], or the fusion of the proportional coefficient of the arc zero rest time and the normalization coefficient of a low-pass filter [13]. These features detect the fault state of the circuit by setting weights and thresholds for the parameters. Because of limited experimental conditions, threshold setting suffers from poor reliability and low efficiency. To improve detection accuracy, the arc fault feature data can instead be fed into a machine learning algorithm.
Machine learning can be subdivided into unsupervised learning and supervised learning. Typical unsupervised learning methods include cluster analysis, principal component analysis (PCA) and singular value decomposition (SVD). Among them, PCA is widely used for data dimension reduction. In literature [14], the high-dimensional phase plane at the center of the moment, the radius vector offset, the correlation dimension and the K-entropy were used as fusion features, and PCA was then used to reduce the feature dimension and extract the main features for fault detection. The effect of unsupervised learning is affected to some extent by the sparsity of the data and by singular value points.
Given the good performance of supervised learning in various fields, neural networks and support vector machines (SVM) have also been applied to arc fault detection. For example, the authors of literature [15] propose feeding the fused features of variational mode decomposition (VMD) and multi-scale fuzzy entropy (IMFE) into an SVM for classification, and verify the classification accuracy through experiments.
Supervised learning is data-driven, so a small number of fault samples may reduce the accuracy of a neural network. Literature [16] addressed this problem with adversarial data enhancement and demonstrated its effectiveness through detection with a convolutional neural network.
In practical applications, feature data can also take the more intuitive form of images. In literature [17], recurrence quantification analysis (RQA) was performed on the sequential periodic phase space trajectory diagrams of load faults to extract the fault characteristics of different loads.
In view of the problems above, this paper addresses the following three aspects:
1) The selection of fault features should consider both diversity and real-time performance. In section 1 and section 2, we introduce the experimental design and the multi-scale wavelet feature fusion method.
2) To handle fault features in an intuitive manner, we convert the features into images and build a computer vision network for classification. This part is presented in section 3.
3) The randomness of arc faults results in a small number of experimental samples. Data enhancement is used to improve network performance, and the arc fault detection model is established through network pre-training. The data enhancement methods and comparative tests are shown in section 4.
The research structure of this paper is shown in

Experiment and data processing
To reproduce realistic arc fault data, we set up an arc fault simulation experiment platform according to the international standard UL1699. Six common kinds of loads in low-voltage circuits are each connected in series with the arc fault generator, and the sampling resistance method is used to measure the time-domain current signals of the six loads in the normal working state and in the fault state. The load types and sampling resistance values are shown in Table 1.
Table 1 Load parameters and sample resistance values
In a 220 V, 50 Hz power grid environment, each of the six loads is tested 4 times in normal operation and 4 times in the fault state. In the power grid, harmonics above the 20th order have little influence on the signal. In view of this, the sampling frequency of the experimental current is set to 25 kHz, which satisfies the Nyquist sampling theorem. The Nyquist theorem can be expressed by equation (1):
$$f_s \geq 2 f_m \tag{1}$$
where $f_s$ is the nondestructive sampling frequency and $f_m$ is the highest harmonic frequency of the time-domain signal.
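As a quick check of the sampling choice above, the following sketch evaluates the Nyquist criterion of equation (1) for the 20th harmonic of the 50 Hz grid. The function name `min_sampling_frequency` and the constant names are ours, introduced only for illustration.

```python
GRID_FREQUENCY_HZ = 50.0          # grid frequency stated in the text
HIGHEST_HARMONIC_ORDER = 20       # harmonics above this order are neglected
SAMPLING_FREQUENCY_HZ = 25_000.0  # sampling frequency used in the experiment


def min_sampling_frequency(f_max_hz: float) -> float:
    """Smallest sampling frequency allowed by the Nyquist criterion f_s >= 2 f_m."""
    return 2.0 * f_max_hz


f_m = GRID_FREQUENCY_HZ * HIGHEST_HARMONIC_ORDER               # highest harmonic of interest
f_s_min = min_sampling_frequency(f_m)                          # Nyquist lower bound
samples_per_cycle = SAMPLING_FREQUENCY_HZ / GRID_FREQUENCY_HZ  # samples in one grid period

print(f"highest harmonic: {f_m:.0f} Hz, minimum f_s: {f_s_min:.0f} Hz")
print(f"25 kHz satisfies Nyquist: {SAMPLING_FREQUENCY_HZ >= f_s_min}")
print(f"samples per 50 Hz cycle: {samples_per_cycle:.0f}")
```

The 20th harmonic lies at 1 kHz, so the criterion requires at least 2 kHz and the chosen 25 kHz leaves a wide margin. If the 2500 columns of the coefficient matrix used later correspond to consecutive samples, one record spans 0.1 s, that is, five grid cycles.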
The 48 groups of data obtained from the experiment are reproduced in Matlab. Taking an incandescent lamp and a computer as examples of a linear load and a nonlinear load respectively, the experimental results are shown in Fig. 2. The spectrum in the figure is obtained by applying the fast Fourier transform (FFT) to the time-domain signals. It can be seen from the figure that a "flat shoulder" phenomenon occurs near the zero crossing of nonlinear load when it fails, which is called "zero rest". In the frequency spectrum, the linear load shows stronger odd-order high harmonics during the fault. In addition, the total harmonic distortion (THD) reaches 54.94%, which is much higher than the 6% under normal operation. By contrast, the time-domain current of the nonlinear load presents high randomness, higher harmonic components and complete distortion of the signal. In normal operation, the nonlinear load also produces odd high-order harmonics, and their amplitude is higher than that of the linear load. It is therefore difficult to derive a general rule for nonlinear load arc faults from the time-domain current and frequency spectrum alone.
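To make the THD figure concrete, the sketch below computes THD from an FFT spectrum with NumPy. It assumes the common amplitude-ratio definition (root sum of squared harmonic amplitudes divided by the fundamental amplitude) and a cut-off at the 20th harmonic. The function name and the synthetic test current are ours, and the code does not reproduce the measured values reported above.

```python
import numpy as np


def total_harmonic_distortion(signal, fs, f0, max_order=20):
    """THD = sqrt(sum of squared harmonic amplitudes) / fundamental amplitude."""
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    def amplitude_at(freq):
        # pick the FFT bin closest to the requested frequency
        return spectrum[np.argmin(np.abs(freqs - freq))]

    fundamental = amplitude_at(f0)
    harmonics = [amplitude_at(k * f0) for k in range(2, max_order + 1)]
    return float(np.sqrt(np.sum(np.square(harmonics))) / fundamental)


fs, f0 = 25_000, 50
t = np.arange(2500) / fs  # 0.1 s, i.e. five full cycles, so no spectral leakage
# synthetic current: fundamental plus 30% third and 40% fifth harmonic
current = (np.sin(2 * np.pi * f0 * t)
           + 0.3 * np.sin(2 * np.pi * 3 * f0 * t)
           + 0.4 * np.sin(2 * np.pi * 5 * f0 * t))
thd = total_harmonic_distortion(current, fs, f0)
print(f"THD = {thd:.2%}")
```

For this synthetic signal the harmonic amplitudes 0.3 and 0.4 combine to sqrt(0.3² + 0.4²) = 0.5 of the fundamental, which gives an easy sanity check of the routine.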

Morlet continuous wavelet analysis
With FFT, the high-order harmonic components contained in the current signal can only be analyzed over a given time range. In order to analyze the frequency information at different moments, the Morlet wavelet is used to conduct one-dimensional continuous wavelet analysis on all the experimental currents [18]. The analytical form of the Morlet wavelet is given by equation (2), where C is the approximate coefficient. Like the Fourier transform, the continuous wavelet transform is an integral transform. It translates and scales the mother wavelet continuously, and the wavelet coefficients are then obtained. The wavelet coefficient is a function of two variables, translation and scale. The continuous wavelet transform is given by equation (3), and the Morlet continuous wavelet transform coefficient by equation (4).
An arc fault detection method based on ResNet50 is proposed to integrate multi-scale fault features and to classify faults in a more intuitive form. Unlike traditional machine learning methods, computer vision is more practical for those who lack prior knowledge. In view of this, we transform the abstract wavelet coefficients into images.
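For reference, a common textbook form of the quantities in equations (2) to (4) is sketched below. It assumes the real-valued Morlet wavelet as implemented by Matlab's `morl` wavelet, with C the constant mentioned above. The symbols a (scale), b (translation), x(t) (current signal) and W_x(a,b) (wavelet coefficient) are our notation and may differ from the authors'.

```latex
\begin{align*}
\psi(t) &= C\, e^{-t^{2}/2} \cos(5t) \\
W_x(a,b) &= \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\,
            \psi^{*}\!\left(\frac{t-b}{a}\right) dt, \qquad a > 0 \\
W_x(a,b) &= \frac{C}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\,
            e^{-\frac{(t-b)^{2}}{2a^{2}}} \cos\!\left(\frac{5\,(t-b)}{a}\right) dt
\end{align*}
```

The third line follows from substituting the first into the second, using the fact that a real-valued wavelet equals its own complex conjugate.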
The wavelet coefficients obtained from each experiment are arranged into a 64×2500 matrix according to scales 1-64. However, a single-channel coefficient matrix does not have the spatial information required by the convolutional neural network. Therefore, the phase space depth map of the continuous wavelet transform is produced by mapping the coefficient matrix into the phase space of the HSV color index, as shown in Fig. 5. When the phase space image is used as the feature image, each of its three channels (RGB) is convolved separately. A wider receptive field can be obtained as the convolution kernel traverses the image [19], thus achieving higher recognition accuracy. In Fig. 4, the color index at the bottom of the image represents the size of the wavelet coefficient from small to large, the horizontal axis is the sampling time axis, and the vertical axis is the scale axis. By adjusting the color values at the bottom of the image, the color domain of the phase space depth map can be changed to obtain different images. Fig. 6 shows the phase space map with the color domain changed. The above processing is applied to the data from the 48 groups of experiments, and the resulting 480 images are labeled manually according to load type and working condition. Together they form the data set for subsequent classification and detection.
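The conversion from a current record to a color image can be sketched in Python as follows. This is a minimal illustration, not the Matlab processing used in the paper: it assumes a real-valued Morlet wavelet sampled directly in the time domain, min-max normalization of the coefficients, and a hue-only HSV mapping. The function names `morlet_cwt` and `to_hsv_image`, the truncation at ±4 scale widths and the `hue_range` parameter are all hypothetical choices of ours.

```python
import colorsys

import numpy as np


def morlet(t):
    """Real-valued Morlet mother wavelet (normalization constant omitted)."""
    return np.exp(-t**2 / 2.0) * np.cos(5.0 * t)


def morlet_cwt(signal, scales):
    """Coefficient matrix of shape (len(scales), len(signal)) by direct convolution.

    Assumes the signal is longer than the widest sampled wavelet.
    """
    signal = np.asarray(signal, dtype=float)
    coeffs = np.empty((len(scales), len(signal)))
    for i, a in enumerate(scales):
        # sample the scaled wavelet on +/- 4a samples, where it has decayed to ~0
        half = int(np.ceil(4 * a))
        k = np.arange(-half, half + 1)
        wavelet = morlet(k / a) / np.sqrt(a)
        # the real Morlet wavelet is even, so convolution equals correlation
        coeffs[i] = np.convolve(signal, wavelet, mode="same")
    return coeffs


def to_hsv_image(coeffs, hue_range=(0.0, 1.0)):
    """Map coefficients to a uint8 RGB image through an HSV color index.

    Each coefficient is min-max normalized and used as the hue; saturation and
    value are fixed at 1. Changing `hue_range` shifts the color domain.
    """
    lo, hi = coeffs.min(), coeffs.max()
    norm = (coeffs - lo) / (hi - lo) if hi > lo else np.zeros_like(coeffs)
    hue = hue_range[0] + norm * (hue_range[1] - hue_range[0])
    rgb = np.array([colorsys.hsv_to_rgb(h % 1.0, 1.0, 1.0) for h in hue.ravel()])
    return (rgb.reshape(*coeffs.shape, 3) * 255).astype(np.uint8)


fs = 25_000
t = np.arange(2500) / fs
current = np.sin(2 * np.pi * 50 * t)  # placeholder for a measured current record
coeffs = morlet_cwt(current, scales=np.arange(1, 65))
image = to_hsv_image(coeffs)
print(coeffs.shape, image.shape)
```

Calling `to_hsv_image` with different `hue_range` values on the same coefficient matrix is one plausible way to obtain several color-domain variants of an image, similar in spirit to the color domain change described above.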

ResNet50 arc detection model
Deeper convolutional neural networks can learn deeper data features. The identity mapping between network layers is realized by updating network weights, and the learning process of a neural network is the process of updating these weights through the back propagation of gradients between layers. According to the chain rule, gradient disappearance (vanishing gradients) or gradient explosion can occur as the number of layers increases. Normalization of data can alleviate this phenomenon.

Batch-normalization
To avoid a purely linear relationship between the input and output of neurons, we use ReLU as the activation function to add nonlinearity. The resulting network can approximate any nonlinear function. The ReLU function can be expressed as equation (5):
$$f(x) = \max(0, x) \tag{5}$$
For deep neural networks, the distribution of neuron input values shifts during training, and the overall distribution generally approaches the extreme values of the nonlinear function. Batch normalization fixes the distribution of input values in each layer to a standard normal distribution with an expectation of 0 and a variance of 1 [20]. One-hot labels are set for the four data sets, and 8 images in each sub data set are used as a mini-batch for training to improve training efficiency. For the input value $x^{(k)}$ of the neurons in layer $k$, taken after the activation function, batch normalization can be expressed as equation (6):
$$\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}} \tag{6}$$
Accordingly, the variance and expectation in equation (6) should be the unbiased estimates of the corresponding statistics in each mini-batch of 8 images.
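The normalization step of equation (6) can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions: the small constant `eps` for numerical stability and the `ddof` switch are our additions, and the learnable scale and shift parameters that batch normalization layers usually carry are omitted.

```python
import numpy as np


def relu(x):
    """ReLU activation, f(x) = max(0, x)."""
    return np.maximum(0.0, x)


def batch_normalize(x, eps=1e-5, ddof=0):
    """Normalize each feature over the batch axis (axis 0).

    ddof=0 uses the biased batch variance; ddof=1 uses the unbiased estimate.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0, ddof=ddof)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(8, 4))  # mini-batch of 8 samples, 4 features
normalized = batch_normalize(relu(batch), ddof=1)

print(normalized.mean(axis=0))         # close to 0 for every feature
print(normalized.var(axis=0, ddof=1))  # close to 1 for every feature
```

With only 8 samples per mini-batch, the biased and unbiased variance estimates differ by a factor of 8/7, so the choice of `ddof` has a visible effect at this batch size.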

ResNet50 Network Structure
ResNet is a typical model in the field of computer vision. Its introduction enables convolutional neural networks to avoid network degradation even when the number of layers increases greatly [21]. As is well known, deep neural networks use multiple convolutional layers as parameter mappings. According to the chain rule, deep networks face the problem of gradient explosion or gradient disappearance when calculating gradients. The idea of ResNet is to let the deep network obtain the gradient of the shallow network through the mapping of residuals. When the input $x$ is mapped to $H(x)$, the residual can be expressed as:
$$F(x) = H(x) - x \tag{8}$$
ResNet takes the residual as the mapping to be learned. Even when gradient disappearance occurs, the input and output of the network layer still form an identity mapping, which prevents the performance degradation caused by gradient disappearance. In practical applications, the residual is usually not 0, so the network layer can learn new features from the input features to improve the accuracy of the network. The convolutional block structure of ResNet50 is shown in Fig. 7. ResNet can be expressed as the mathematical model in equation (9):
$$y = F(x) + x \tag{9}$$
where $x$ is the input vector, $y$ is the output vector, and $F$ is the residual mapping, which is learned during network training. The convolutional block in Fig. 7 skips two layers, and its residual mapping is given by equation (10). We apply convolution with stride 2 using different convolution kernels, together with max pooling, and then build ResNet from the convolution blocks shown in Fig. 6. A softmax layer is added at the output of the network to predict the feature category, and the class with the maximum value is the predicted category. The specific model of the network is shown in Table 2.
Table 2 ResNet50 detection network structure
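The identity-mapping property of a residual connection can be checked numerically. The sketch below is a deliberately simplified stand-in for a ResNet block: dense matrices replace the convolutions, batch normalization is left out, and the two-layer residual mapping F(x) = W2·relu(W1·x) is the generic form from the ResNet literature, assumed here for illustration rather than taken from the network in Table 2.

```python
import numpy as np


def relu(x):
    """ReLU activation, f(x) = max(0, x)."""
    return np.maximum(0.0, x)


def residual_block(x, w1, w2):
    """y = F(x) + x with a two-layer residual mapping F(x) = W2 . relu(W1 . x)."""
    residual = w2 @ relu(w1 @ x)
    return residual + x


rng = np.random.default_rng(0)
x = rng.normal(size=5)

# With zero weights the residual F(x) vanishes and the block is the identity mapping.
zeros = np.zeros((5, 5))
identity_output = residual_block(x, zeros, zeros)
assert np.allclose(identity_output, x)

# With non-zero weights the block adds learned features on top of the input.
w1, w2 = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
y = residual_block(x, w1, w2)
print(np.round(y - x, 3))  # the residual F(x) contributed by the two layers
```

Because the input is added back unchanged, a block whose weights contribute nothing still passes its input through, which is the mechanism behind the degradation argument above.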

Classification and detection results of arc faults
The image data set is divided into a training set and a test set at a ratio of 9:1, and the hyperparameters of the neural network are adjusted through repeated training so that the network obtains higher classification accuracy. Since 480 images form a relatively small sample, the Adam algorithm is used to adjust the learning rate dynamically, in order to prevent the loss from increasing because of the uneven distribution of image samples in each iteration. The final neural network hyperparameters are shown in Table 3.
Table 3 Neural network training hyperparameters
All the algorithms above are implemented with the Keras interface of TensorFlow. The neural network is trained on an Intel i7-9750H processor (8 GB RAM) with an NVIDIA RTX 2060 graphics card (6 GB).
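A 9:1 split of 480 samples can be sketched as follows. The helper name `split_indices`, the fixed seed and the plain random shuffle are assumptions made for illustration, since the text does not say how the split was drawn.

```python
import random


def split_indices(n_samples, train_ratio=0.9, seed=0):
    """Shuffle sample indices and split them into training and test subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_ratio)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(480)
print(len(train_idx), len(test_idx))  # 432 48
```

A plain shuffle like this does not guarantee that every class appears in the same proportion in both subsets. Splitting each class separately (a stratified split) is one common way to keep the proportions equal when the data set is small.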
To show how the accuracy of the training set and validation set changes during training, the average accuracy and loss of each epoch are plotted as curves. The training results are shown in Fig. 8. The accuracy in the figure is the average over each epoch, and the plot on the right is the curve of the cross-entropy loss function.
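The two quantities plotted per epoch, average accuracy and average cross-entropy loss, can be computed from the network outputs as in the following sketch. It uses only the standard library; the function names and the three-class toy data are ours and stand in for the outputs of the real network.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, one_hot):
    """Categorical cross-entropy -sum(y * log(p)) for a single sample."""
    return -sum(y * math.log(p) for p, y in zip(probs, one_hot) if y > 0)


def mean_accuracy_and_loss(batch_logits, batch_labels):
    """Average accuracy and loss over a set of samples, e.g. one epoch."""
    correct, loss = 0, 0.0
    for logits, one_hot in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        loss += cross_entropy(probs, one_hot)
        correct += int(probs.index(max(probs)) == one_hot.index(1))
    n = len(batch_logits)
    return correct / n, loss / n


# toy outputs for three samples and three classes; the third sample is misclassified
logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [1.0, 1.5, 0.3]]
labels = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]
accuracy, loss = mean_accuracy_and_loss(logits, labels)
print(f"accuracy = {accuracy:.3f}, loss = {loss:.3f}")
```

A useful reference point when reading a loss curve: a classifier that outputs a uniform distribution over n classes has a cross-entropy of ln(n).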
To compare the training behavior of the network more precisely, the training accuracy, validation accuracy and loss for epochs 18-20 are listed numerically in Table 4.
Table 4 ResNet50 training accuracy rate and loss change table
As can be seen from Fig. 8, the accuracy of the validation set falls below that of the training set from epoch 11 onward. At epoch 20, the accuracy of the training set still shows an upward trend, while the loss tends to converge. The accuracy of the validation set is 30% lower than that of the training set, which indicates that the network is over-fitting to some degree. Because the data set is small and the image data are obtained through color domain transformation, many images may be similar to one another. Insufficient sparsity of the data samples may lead to an uneven distribution of samples between the training set and the validation set after splitting, which in turn leads to over-fitting of the network.

Image preprocessing
To address the high similarity and small total number of samples in the data set, we enhanced the wavelet coefficient phase space images obtained from the original experimental time-domain signals through wavelet analysis [22]. To process the original data without adding image information, we processed the original images in two different ways:

1) Random gamma variation of the image
Gamma change, also known as the gray-level curve change of a color image, is often used in image processing to adjust contrast. It is a nonlinear change acting on the pixel gray value, which can be expressed mathematically as equation (11):
$$S = C\, r^{\gamma} \tag{11}$$
where $S$ is the gray value after the gamma change, $C$ is the gray scale coefficient, which is 1 in this paper, $r$ is the gray value of the input image with value range [0,1], and $\gamma$ is the gamma influence factor. We change the value of $\gamma$ randomly to augment the original images. The images after the gamma change are shown in Fig. 9.
2) Random rotation of the image
The image is rotated by a random angle without changing its size, and the area vacated by the rotation is filled with black. The images after random-angle rotation are shown in Fig. 10. The original image is changed by the above two methods, and 5 new images are obtained for each image. The new dataset contains 960 images, which is two times the size of the original dataset.
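The two enhancement operations can be sketched with NumPy as follows. The gamma step implements equation (11) directly. The rotation uses nearest-neighbor inverse mapping about the image center with black fill. The function names, the gamma range (0.5, 2.0) and the maximum angle of 30 degrees are hypothetical values chosen for illustration, since the text does not state the ranges that were used.

```python
import numpy as np


def gamma_transform(image, gamma, c=1.0):
    """Apply S = c * r**gamma to a uint8 image, with r scaled to [0, 1]."""
    r = image.astype(float) / 255.0
    s = np.clip(c * r**gamma, 0.0, 1.0)
    return np.round(s * 255.0).astype(np.uint8)


def rotate_keep_size(image, angle_deg):
    """Rotate about the image center, keeping the size and filling gaps with black."""
    h, w = image.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # inverse mapping: for every output pixel, find the source pixel it comes from
    dx, dy = xs - cx, ys - cy
    src_x = np.round(cx + dx * np.cos(theta) + dy * np.sin(theta)).astype(int)
    src_y = np.round(cy - dx * np.sin(theta) + dy * np.cos(theta)).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)  # black wherever no source pixel exists
    out[inside] = image[src_y[inside], src_x[inside]]
    return out


def augment(image, rng, gamma_range=(0.5, 2.0), max_angle=30.0):
    """Return one randomly gamma-changed copy and one randomly rotated copy."""
    gamma = rng.uniform(*gamma_range)
    angle = rng.uniform(-max_angle, max_angle)
    return gamma_transform(image, gamma), rotate_keep_size(image, angle)


rng = np.random.default_rng(0)
white = np.full((9, 9, 3), 255, dtype=np.uint8)  # stand-in for a feature image
gamma_copy, rotated_copy = augment(white, rng)
rotated_45 = rotate_keep_size(white, 45.0)
print(rotated_45[0, 0], rotated_45[4, 4])  # corner turns black, center stays white
```

Both operations keep the image size, so the augmented copies can be fed to the network without changing its input layer.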

Network training results after data enhancement
Keeping the epoch and mini-batch parameters unchanged, we use the enhanced data set to train ResNet50. The average accuracy and average loss of the training set and validation set in each epoch are shown in Fig. 11. For epochs 23-25, the classification training results are listed in Table 5:

Figure 11. ResNet50 training results after data enhancement
Table 5 ResNet50 classification training results after data enhancement (columns: epoch, accuracy and loss for the Train and Valid sets)
It can be seen that the over-fitting of the neural network is resolved after data enhancement is added to the data set. The network obtains its highest validation and training accuracy at epoch 24. Compared with Table 4, the loss of the neural network is generally lower, indicating that the network has learned more image features and that the preprocessing of the data set is effective [23].
To verify the performance of ResNet50, we built other typical vision models for comparison, including AlexNet, Inception V4 and VGG19. After parameter setting and early stopping, the network pre-training results are shown in Table 6.
Table 6 Other typical computer vision network pre-training results
It can be seen that the accuracy of the other typical networks stays at 20%-30%. The loss stays around 11, which suggests that these networks are not learning the deeper features of the images. Because of the limitations of the fault conditions, it is difficult to obtain large numbers of fault samples. Compared with ResNet, the vision models above are prone to gradient disappearance or explosion when dealing with small samples and similar data.
The arc fault detection method proposed in this paper consists of the following three steps:
1. The neural network is pre-trained on the arc fault detection data set we have established.
2. The current of the circuit is sampled in real time. The sampled signals are analyzed by Morlet wavelet to obtain feature images.
3. The feature images are input into the pre-trained neural network for classification, so as to diagnose the fault state and fault load.
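Steps 2 and 3 above can be sketched as a small streaming loop. Everything in this example is hypothetical scaffolding: the window of 2500 samples is taken from the size of the coefficient matrix, while `featurize` and `classify` are stand-ins for the wavelet image computation and the pre-trained network, and the labels "normal" and "fault" are placeholders for the load and state categories.

```python
from typing import Callable, Iterable, Iterator, List

WINDOW = 2500  # samples per feature image, matching the 64x2500 coefficient matrix


def sliding_windows(samples: Iterable[float], size: int = WINDOW) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping windows of `size` samples."""
    buffer: List[float] = []
    for value in samples:
        buffer.append(value)
        if len(buffer) == size:
            yield buffer
            buffer = []


def detect(samples: Iterable[float],
           featurize: Callable[[List[float]], object],
           classify: Callable[[object], str],
           size: int = WINDOW) -> List[str]:
    """Run steps 2 and 3 on a stream: window -> feature image -> predicted label."""
    return [classify(featurize(window)) for window in sliding_windows(samples, size)]


# Stand-ins for illustration only: a real system would compute the wavelet
# image in `featurize` and call the pre-trained network in `classify`.
def featurize(window):
    return max(abs(v) for v in window)


def classify(feature):
    return "fault" if feature > 1.0 else "normal"


# two full windows (the second contains a spike) plus 100 leftover samples
stream = [0.5] * 2500 + [0.5] * 1250 + [3.0] + [0.5] * 1249 + [0.5] * 100
print(detect(stream, featurize, classify))  # ['normal', 'fault']
```

Non-overlapping windows keep the sketch simple. Overlapping windows would shorten the delay between a fault and its detection at the cost of more inference calls.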

Conclusion
This paper presents a series arc fault detection method combining Morlet wavelet analysis and computer vision. The arc fault currents of common loads are sampled by the sampling resistance method. The Morlet wavelet with 64 scales is applied to deep fault feature fusion.
The matrix composed of wavelet coefficients is mapped to images by the HSV color index. The image data are used as features to establish the detection data set, and the data categories are annotated manually. ResNet50 is applied to feature image recognition. In addition, we propose a data enhancement method based on random gamma transform and random rotation of the images. Experimental results show that data enhancement effectively alleviates the over-fitting of the neural network and raises the detection accuracy to 96.53%.
It is worth mentioning that the image feature extraction in this paper provides a new method for feature selection. More research can be done on image features, such as dimensionality reduction of image-level data, image preprocessing that can change network performance, and image expression of typical network regularization.
Computer vision has brought a lot of convenience to our lives. As with face recognition, we look forward to applying computer vision to arc fault detection.