An unconstrained face recognition method based on Siamese networks

To address the low accuracy and low efficiency of face recognition under unconstrained conditions, this paper designs a Siamese neural network model, SN-LF (Siamese Network based on LBP and Frequency Feature perception), based on the Local Binary Pattern (LBP) and a frequency feature perception model. Built on Siamese neural networks, the network adopts the circular LBP algorithm and frequency feature perception to realize face recognition under unconstrained conditions. The LBP algorithm reduces the influence of illumination on the image and at the same time provides directional input to the network model. Frequency feature perception divides the image features into low-frequency features and high-frequency features. The low-frequency features are compressed in the Siamese neural network to increase the recognition efficiency of the network, and they exchange information with the high-frequency features, so that noise data can be eliminated while the feature data is retained. In this way, the recognition rate of the network is maintained and its computing speed is improved. Simulation experiments are carried out on the standard face datasets CASIA-WebFace and Yale-B, and the results are compared with other network models. The experimental results show that the proposed SN-LF network structure improves the recognition accuracy of the algorithm.

conditions. Tian Lei et al used prior knowledge of non-local self-similarity in images: they first used sparse representation to deblur the image, and then used the local image blocks obtained from the sparse representation as the input of the data dictionary. An unsupervised clustering algorithm generalizes the similarity of the image blocks in the image, and finally the principal component analysis algorithm is used to build the image data dictionary. Such traditional algorithms produce a lot of redundant data while processing noise data.
Too much redundant data interferes with the recognition performance of the algorithm and increases the computational cost. With the introduction of layer-by-layer training of multi-layer restricted Boltzmann machines (RBM), in which the output of the previous layer is used as the input of the next layer to perform unsupervised learning on the image, the speed and accuracy of model training improved and the problem of local optimal solutions was addressed. Since then, the development of deep neural networks has accelerated, and face recognition has developed rapidly with it. Liang Shu-fen et al (2014) proposed an improved deep belief network for image processing: the LBP algorithm preprocesses the texture features obtained from the image, and the result is used as the input of the deep belief network to assist its unsupervised training and improve the efficiency of image recognition. However, this method is not efficient at identifying targets with heavy noise interference, and detections are easily missed.

The algorithm flow chart of this paper is shown in Figure 1. The two branches of the Siamese neural network are divided into the search branch and the model branch. The LBP algorithm is added to the search branch to preprocess the image to be detected, and frequency features are then used to perform convolution operations on the image. This enlarges the spatial receptive field of the image target features, and the pixel-level face features are finally learned through the octave convolutional layer. At the same time, the model branch extracts the target image from the target library and performs the same operations on it. Finally, the Euclidean distance is used to judge whether the targets in the two images are the same.
Step 1. Use the LBP algorithm to reduce the image dimension to 32*32 and perform histogram equalization to preprocess the image for noise reduction.
Step 2. Use the training samples as the input of the Siamese neural network, and train the feature data layer by layer.
Step 3. Apply high- and low-frequency convolution to the feature image input to the Siamese neural network to obtain a frequency convolution image. The width and height of the frequency convolution image are 1/2 of the width and height of the original feature image, which enlarges the receptive range of the high-frequency features in the feature image and reduces the spatial resolution of the low-frequency features. This improves the recognition rate without reducing the recognition speed.
Step 4. Divide the convolutional layer into 4 groups, corresponding to the width and height of the feature image, so that the image feature data can be exchanged across frequencies. The 4 groups are: high-frequency width to low-frequency width, high-frequency width to low-frequency height, high-frequency height to low-frequency width, and high-frequency height to low-frequency height. Exchanging information between the high and low frequencies ensures that the data of the feature image is not lost.
Step 5. After the Siamese neural network training is completed, compare the two outputs for a test sample by computing the Euclidean distance between the feature vectors of the two pictures. If the distance is within the threshold, the two pictures show the same target; if it is greater than the threshold, they do not.
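The histogram equalization mentioned in Step 1 can be illustrated with a minimal sketch. It assumes 8-bit grayscale input and the classic CDF-based mapping; the paper does not state which variant it uses, and the function name `equalize_histogram` is only illustrative.

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Histogram-equalize an 8-bit grayscale image (dtype uint8, shape H x W)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]   # CDF value at the darkest occupied gray level
    total = gray.size
    if total == cdf_min:        # constant image: nothing to stretch
        return gray.copy()
    # Classic mapping (an assumption): stretch the CDF over the range 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]            # apply the lookup table to every pixel
```

After equalization the darkest occupied gray level maps to 0 and the brightest to 255, which spreads the contrast before texture features are extracted.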

2.1 Image preprocessing based on LBP algorithm
The basic idea of the LBP algorithm is as follows. Any pixel is taken as the center, its gray value is used as the threshold, and the gray value of each neighborhood pixel is compared with the gray value of the center pixel. If the gray value of the neighborhood pixel is less than that of the center pixel, it is marked as 0; otherwise it is marked as 1. The result is a binary code for the center pixel, as shown in Figure 2. Because the LBP algorithm uses the center pixel as the threshold, the binary code of the surrounding pixels remains unchanged when the illumination changes, so the LBP algorithm has gray-level invariance. Rotating the neighborhood around the center point yields multiple LBP values, and the smallest of them is selected as the feature value, so LBP is also rotation invariant. In addition, LBP extracts features without parameters, so no parameter tuning is needed. This reduces the difficulty and complexity of the computation while still extracting the main features of the image.
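The rotation-invariant step, selecting the smallest value among all rotations of the binary code, can be sketched as follows. This is a minimal illustration that assumes an 8-bit code; the function name `rotation_invariant_code` is hypothetical and not taken from the paper.

```python
def rotation_invariant_code(code: int, bits: int = 8) -> int:
    """Return the smallest value among all circular bit rotations of `code`."""
    mask = (1 << bits) - 1
    best = code & mask
    rotated = best
    for _ in range(bits - 1):
        # Circular right shift by one position within `bits` bits.
        rotated = ((rotated >> 1) | ((rotated & 1) << (bits - 1))) & mask
        best = min(best, rotated)
    return best
```

Two neighborhoods that differ only by a rotation, such as the codes `0b11100000` and `0b00000111`, map to the same feature value.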

Fig 2. LBP algorithm conversion
The algorithm formula is:

$$T = t(g_c, g_0, \ldots, g_{P-1}) \tag{1}$$

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{2}$$

$$LBP = \sum_{i=0}^{P-1} s(g_i - g_c)\, 2^i \tag{3}$$

Formula (1) gives the binary code of the LBP algorithm, in which $T$ is the LBP binary code, $t(\cdot)$ is the coding function, $g_c$ is the value of the center pixel, and $g_0, \ldots, g_{P-1}$ are the values of the 8 pixels surrounding the center point. In formula (2), $s(x)$ is the judgment function. Combining formula (1) and formula (2), each neighborhood pixel value is compared with the center pixel value separately: when the neighborhood pixel value is less than the center pixel value the result is 0, and when it is greater than or equal to the center pixel value the result is 1.
Formula (3) gives the decimal value of the LBP code, in which $P$ is the number of pixels in the area surrounding the center pixel.
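The thresholding of formula (2) and the decimal weighting of formula (3) can be sketched for a whole image as follows. This is a minimal illustration for the basic 3x3 neighborhood; the clockwise neighbor order starting from the top-left pixel is an assumption, since any fixed order gives a valid code.

```python
import numpy as np

# Offsets (row, col) of the 8 neighbors, visited clockwise from the top-left.
# The starting point and direction are an assumption; any fixed order works.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(gray: np.ndarray) -> np.ndarray:
    """Compute the 3x3 LBP code of every interior pixel of a 2-D gray image."""
    gray = gray.astype(np.int32)
    h, w = gray.shape
    center = gray[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for i, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        neighbor = gray[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Judgment function: 1 if neighbor >= center, else 0, weighted by 2^i.
        codes += (neighbor >= center).astype(np.int32) << i
    return codes.astype(np.uint8)
```

Adding a constant brightness offset to the image leaves every code unchanged, which is the gray-level invariance described in the text.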
Combining equations (2) and (3) yields the spatial histogram of the face image, in which $R_m$ is the LBP histogram of the $m$-th image block and $H$ is the complete spatial histogram of the face image, obtained by concatenating the LBP histograms of the blocks. Using the LBP operator to obtain the texture features of the image not only greatly reduces the impact of illumination, but also expresses the regional features of the face image and connects them to form the global feature of the face image. Figure 4 shows the original picture and the result computed by the LBP algorithm.

We control the high- and low-frequency feature segmentation ratio by setting the hyperparameter α. In the corresponding formulas, $X$ is the collection of high-frequency and low-frequency features, $h$ and $w$ are the height and width of the feature, $c$ is the number of channels, and $X^H$ is the high-frequency feature of the image, which captures precise details in the image through the threshold α, as shown in formula (8). $X^L$ is the low-frequency feature of the image. Its spatial dimension is reduced by reducing the width and height, which slows down the change of the low-frequency features in the spatial dimension, as shown in formula (9).

At present, the high-frequency and low-frequency features of the image have the same height and width, and the receptive field of the low-frequency feature is obviously higher than that of the high-frequency feature, as shown in Figure 5. After the frequency features are extracted, the width and height of the low-frequency feature are reduced proportionally, which can reduce the receptive field of the low-frequency feature. At the same time, information is exchanged between the low-frequency and high-frequency features, so that the frequency features of the reduced image retain most of the information of the original image.
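The split controlled by α and the exchange between the two frequency groups can be sketched as follows. This is a minimal illustration that assumes the scheme follows the octave convolution pattern: the low-frequency group takes a fraction α of the channels at half resolution, and 1x1 channel-mixing matrices stand in for learned convolution kernels. All function and parameter names are hypothetical, and the height and width are assumed to be even.

```python
import numpy as np

def avg_pool2(x):
    """Halve height and width of a (C, H, W) array by 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Double height and width of a (C, H, W) array by nearest-neighbor repeat."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def split_frequencies(x, alpha):
    """Split channels: the last alpha*C channels form the low-frequency group,
    stored at half the spatial resolution."""
    c = x.shape[0]
    c_low = int(round(alpha * c))
    x_high = x[:c - c_low]
    x_low = avg_pool2(x[c - c_low:])
    return x_high, x_low

def mix(weight, x):
    """1x1 convolution: mix the channels of (C_in, H, W) with a (C_out, C_in) matrix."""
    return np.einsum("oc,chw->ohw", weight, x)

def exchange(x_high, x_low, w_hh, w_hl, w_lh, w_ll):
    """One update step with four paths: high->high, low->high, low->low, high->low."""
    y_high = mix(w_hh, x_high) + upsample2(mix(w_lh, x_low))
    y_low = mix(w_ll, x_low) + mix(w_hl, avg_pool2(x_high))
    return y_high, y_low
```

With 8 channels of size 4x4 and α = 0.25, the high-frequency group keeps 6 channels at 4x4 and the low-frequency group holds 2 channels at 2x2, so the low-frequency part costs a quarter of the spatial positions per channel.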

Fig 5. Frequency characteristics of the current image
In the feature update operation, the high-frequency and low-frequency features are updated within their own frequency. The feature exchange operation updates the high-frequency and low-frequency feature information across frequencies. Therefore, the high-frequency features include not only their own processed information but also the mapping from low frequency to high frequency, and vice versa.

In the Siamese network, metric learning is used to compare the similarity of two pictures (SHI Guoqiang and Zhao Xia 2020). First, as shown in Figure 7, one picture is input to Network_1 and passes in turn through the convolutional layer, pooling layer, and fully connected layer of the Siamese neural network, which finally outputs feature vector 1 of the picture. Another picture is input to Network_2, which performs the same operations and outputs feature vector 2. To judge whether the two pictures show the same person, we define a threshold (a hyperparameter): if the distance between the encodings of the two pictures is less than the threshold, the two pictures show the same person; if the distance is greater than the threshold, they do not. In the algorithm formula, the cost function is the sum of the loss functions over all individuals.
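The threshold decision on the two feature vectors, and a cost formed as a sum of per-pair losses, can be sketched as follows. The per-pair loss used here is the standard contrastive loss with a margin, which is an assumption because the text does not specify the loss; all function names are illustrative.

```python
import numpy as np

def euclidean_distance(f1, f2):
    """Euclidean distance between two feature vectors."""
    diff = np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float)
    return float(np.linalg.norm(diff))

def same_person(f1, f2, threshold):
    """Decide identity: True when the encoding distance is below the threshold."""
    return euclidean_distance(f1, f2) < threshold

def contrastive_loss(f1, f2, is_same, margin=1.0):
    """Per-pair loss (assumed form): pull matching pairs together and push
    non-matching pairs at least `margin` apart."""
    d = euclidean_distance(f1, f2)
    if is_same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

def cost(pairs, margin=1.0):
    """Total cost: the sum of the per-pair losses over (f1, f2, is_same) triples."""
    return sum(contrastive_loss(a, b, y, margin) for a, b, y in pairs)
```

For example, the vectors `[0, 0]` and `[3, 4]` are at distance 5, so they count as the same person for a threshold of 6 but not for a threshold of 5.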

Experimental environment and parameter settings
The experimental hardware environment is an Intel Core i3 CPU, a GeForce 710M GPU, and 2 GB of memory; the operating system is

Analyzing Figure 9, we can see that as the learning rate increases linearly, the loss value first decreases. Once the learning rate increases beyond a certain level, the loss value increases. Therefore, the range of the periodic change of the learning rate is chosen as the interval in which the corresponding loss value drops rapidly. This article refers to the parameter settings in the literature (Yi D et al 2014).
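A periodically varying learning rate of this kind can be sketched with a triangular schedule. This is a minimal illustration, not the configuration used in the experiments: the triangular shape, the function name `triangular_lr`, and every numeric value in the usage note are assumptions.

```python
def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Learning rate that rises linearly from base_lr to max_lr over
    `step_size` iterations, then falls back, and repeats."""
    cycle_pos = iteration % (2 * step_size)
    # Distance from the peak of the cycle, scaled to the range [0, 1].
    x = abs(cycle_pos / step_size - 1.0)
    return base_lr + (max_lr - base_lr) * (1.0 - x)
```

Here `base_lr` and `max_lr` would be set to the two ends of the interval in which the loss drops rapidly. With hypothetical values `base_lr=0.001`, `max_lr=0.006` and `step_size=100`, the rate peaks at iteration 100 and returns to the base value at iteration 200.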

Comparative experiment of the algorithm in this paper on the CASIA-WebFace dataset
On the CASIA-WebFace dataset, the classic Siamese neural network and SN-LF are compared. This paper adopts the mean Average Precision (mAP) and the recognition rate to evaluate the performance of the algorithm. Compared with the classic Siamese neural network, the similarity obtained by the algorithm in this paper is about 15% higher, which shows that it recognizes target features better during the recognition process. The experimental results are shown in Figure 11.

Comparing the mAP curves of the classic Siamese neural network and SN-LF in Figure 12, the two curves differ little before 1000 iterations. As the number of iterations increases, the difference between them grows, reaching a maximum of 15% at 3000 iterations. This shows that SN-LF converges faster than the classical Siamese neural network as the number of iterations increases. When the number of iterations reaches 5000, the mAP curve reaches about 90%, indicating that the algorithm has strong robustness.

From Figure 13 combined with Table 2, the algorithm in this paper is ahead of the traditional algorithm in accuracy and mAP. From Table 1 and Table 2, the algorithm maintains a stable recognition rate on complex samples. This shows that SN-LF can accurately identify the target under interference, which further indicates that the algorithm is robust for face recognition under unconstrained conditions and has high resistance to interference.

From Table 3, the algorithm in this paper converges faster during iteration and is more robust than the traditional Siamese neural network, which shows that it also performs better under restrictive conditions. Table 4, combined with Table 3, supports the same conclusion.
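For reference, the mAP metric used in these comparisons can be sketched as follows. This minimal illustration assumes that each query yields a ranked list of matches flagged as relevant or not; the paper does not describe its exact evaluation protocol, and the function names are illustrative.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked result list of booleans (True = correct match)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this cut-off
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """Mean of the per-query average precision; 0.0 for an empty query set."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)
```

A ranking of `[True, False, True]` has precision 1 at the first hit and 2/3 at the second, giving an average precision of 5/6.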

Conclusion
Existing unconstrained face recognition methods mainly combine facial key points with denoising algorithms. They tend to converge too fast or lack robustness, and they are tied to a single application scene and cannot adapt to multiple scenes. This paper proposes an architecture that combines frequency feature perception, LBP and Siamese neural networks. By changing the parameters, the network convergence speed is accelerated and the number of iterations required for the same recognition rate is reduced. The algorithm first uses the LBP algorithm to preprocess the image and obtain the texture features, which effectively reduces the interference caused by noise data, and then inputs the texture features into the Siamese neural network. This improves the running speed as well as the recognition accuracy and robustness of the algorithm. Finally, through frequency feature perception, the features are divided into high-frequency and low-frequency features, which effectively expands the spatial receptive field and improves the recognition accuracy. Moreover, the proposed network has a simple structure and is suitable for face recognition on small-scale datasets under unconstrained conditions. It has been trained and tested on the CASIA-WebFace and Yale standard face databases, where it obtained good results and showed high robustness. The next problems to be solved are to reduce the running time of the algorithm, improve its efficiency, and identify multiple targets in an image under unconstrained conditions.

Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.

This work is supported by the open fund projects of the Artificial Intelligence Key Laboratory of Sichuan Province (2020RYJ04), the Liaoning Provincial Natural Science Foundation (20180551020), and the Liaoning Educational Committee Program (JDL2019011).