Design on Face Recognition System with Privacy Preservation Based on Homomorphic Encryption

Face recognition plays an increasingly important role in present-day society, but it suffers from privacy leakage when performed in plaintext. Therefore, a face recognition system based on homomorphic encryption that supports privacy preservation is designed and implemented in this paper. The system uses the CKKS algorithm in the SEAL library, a recent achievement in homomorphic encryption, to encrypt the normalized face feature vectors, and uses the FaceNet neural network to learn on the image’s ciphertext to achieve face classification. In this way, face recognition in ciphertext is accomplished. In our tests, the whole process of extracting feature vectors and encrypting a face image takes only about 1.712 s in the developed system. The average time to compare a group of images in ciphertext is about 2.06 s, and a group of images can be effectively recognized within 30 degrees of face bias, with a recognition accuracy of 96.71%. Compared to the ciphertext face recognition scheme based on the Advanced Encryption Standard (AES) algorithm proposed by Wang et al. in 2019, our scheme improves the recognition accuracy by 4.21%. Compared to the ciphertext image recognition scheme based on the elliptic curve encryption algorithm proposed by Kumar S et al. in 2018, the total time spent in our system is decreased by 76.2%. Therefore, our scheme offers better operational efficiency and practical value while ensuring the users’ personal privacy. Compared to the plaintext face recognition systems presented in recent years, our scheme reaches almost the same level of recognition accuracy and time efficiency.


Introduction
As an important part of artificial intelligence, face recognition is widely used in hotel check-in and mobile payment, and has become an indispensable means of identity authentication in modern life. At the same time, because of the hidden danger of face information leakage, there is an increasing demand for face recognition schemes and systems that support privacy preservation.
For a long time, ciphertext has been regarded as irregular and difficult to operate on, which limits the application of traditional encryption algorithms. To address this problem, the concept of homomorphic encryption was proposed by Rivest et al. [1] in 1978. This concept means that a certain operation in the ciphertext space, such as multiplication or addition, corresponds to the same operation in the plaintext space. A fully homomorphic encryption (FHE) scheme based on ideal lattices was proposed by Gentry [2] in 2009, which attracted great attention from cryptographers all over the world. An FHE scheme based on integers and polynomials was proposed by Smart et al. [3] in 2010, which shortened the key and ciphertext sizes. The DGHV scheme, an FHE scheme over the integers, was proposed by Gentry et al. [4] in 2010.
The DeepID2 model was proposed by Sun et al. [5] in 2014, which can be trained simultaneously on two tasks: face verification and face classification. In 2015, Schroff et al. [6] proposed the FaceNet neural network and designed a unified framework for face recognition in which recognition, verification, search, and other tasks can be carried out in the feature space, and which has achieved good performance.
In traditional face recognition systems [7][8][9][10][11], the recognition process is performed entirely in plaintext, which gives potential attackers the opportunity to steal and tamper with users' data, so these systems cannot preserve user privacy. Therefore, traditional face recognition systems are not suitable for today's rapidly developing cloud technology. To solve this problem, we combine homomorphic encryption with face recognition in this paper, which provides both data security and privacy preservation for users.
In this paper, a face recognition system in ciphertext based on the SEAL library is designed and implemented. This system has better application prospects than a system operating in plaintext. The main contributions of this paper are summarized as follows:
(1) A face recognition scheme in ciphertext based on the SEAL library is proposed for the first time. Firstly, the CKKS algorithm in the SEAL library, a recent homomorphic encryption result, is used to encrypt the normalized face feature vector. Secondly, the FaceNet neural network is used to learn on the ciphertext of the image to achieve face classification. Finally, face recognition in ciphertext is realized, and the privacy leakage of the user's face information can be effectively avoided.
(2) A face recognition system in ciphertext based on the SEAL library is implemented and tested. The whole process of extracting feature vectors and encrypting a face image takes only 1.712 s on average. The average time spent comparing a group of images in ciphertext is about 2.06 s, and a group of images can be effectively recognized within 30 degrees of face bias. The recognition accuracy reaches 96.71%. While ensuring the personal privacy and security of users, this scheme offers better operational efficiency and practical value than the face recognition systems [12][13][14][15][16][17][18][19][20][21] in terms of recognition accuracy and time efficiency.
The structure of this paper is as follows. In the first and second sections, we introduce the research background and the development status of homomorphic encryption and face recognition. In the third section, we briefly introduce the homomorphic encryption algorithm and the FaceNet architecture, together with the data set and development tools used in the test. In the fourth section, we test all aspects of the system, analyze the test results in detail, and compare them with the results of other systems. Finally, we summarize this paper in the last section.

Related Work
A neural network architecture for face recognition called DeepID3 was proposed by Sun et al. [7] in 2015. This architecture is based on two network architectures, VGGNet and GoogLeNet. In 2017, Wan et al. [22] converted the images in the gallery into a sketch style, which reduced the amount of computation compared with using the original images. In 2015, the FaceNet architecture was proposed by Schroff et al. [6] at Google, and the development of face recognition has since entered a new stage. FaceNet is a very versatile system, which can be used for face verification, classification, and clustering. FaceNet uses a 128-dimensional vector space and trains the neural network with a triplet-based loss function motivated by large margin nearest neighbor (LMNN) classification. Through a convolutional neural network, FaceNet learns to map the face information in images to Euclidean space, and uses the Euclidean distance between the resulting feature points to determine the similarity between images. The FaceNet algorithm uses two data training sets, CASIA-WebFace and VGGFace2; the corresponding trained models are identified as 20180408-102900 and 20180402-114759, respectively. FaceNet has been tested on widely used standard facial image data sets and achieves high recognition accuracy with excellent performance [23].
In 2015, the learning with errors (LWE) problem was reduced to a variant of the approximate greatest common divisor (AGCD) problem by Cheon et al. [24]. On this basis, a new AGCD-based FHE scheme was proposed, which achieved better performance than previous AGCD-based FHE schemes. In 2016, Jaschke and Armknecht [25] pointed out that a rational number can be approximately expressed as an integer by iteratively multiplying it by a power of 2. In 2017, a more efficient method of representing fixed-point decimals was proposed by Dowlin et al. [26]. In 2017, an approximate FHE scheme was constructed by Cheon et al. [27], which uses a rescaling technique to change the growth of the ciphertext modulus from exponential to linear while preserving calculation accuracy, and improves computational efficiency through batch processing. In 2012, the BFV homomorphic scheme was proposed by Fan et al. [28], based on which Microsoft released the Simple Encrypted Arithmetic Library (SEAL). In 2016, a residue number system (RNS) variant of the BFV scheme was proposed by Bajard et al. [29]. SEAL 2.3.0 was released by Microsoft in 2017. In 2018, a new bootstrapping-based technique for refreshing low-level ciphertexts was proposed by Cheon et al. [30]. Applications adopting homomorphic encryption algorithms have also begun to emerge in various fields [31][32]. In 2018, Melchor et al. [33] compared the performance of HElib, SEAL, and FV-NFLlib, and found that SEAL performs better than HElib and FV-NFLlib in all aspects, so the SEAL library is selected as the development tool for this solution.

Fully Homomorphic Encryption
Homomorphic encryption requires that ciphertext data can be computed on directly without decryption, such that the operation on ciphertexts corresponds to the same operation in the plaintext space. Fully homomorphic encryption is a homomorphic encryption algorithm that supports both addition and multiplication on ciphertexts. For any operation f and plaintext m, the following formula is satisfied when the algorithm is both additively and multiplicatively homomorphic.
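One standard way to write this property is sketched below. The symbols $\mathrm{Enc}$, $\mathrm{Dec}$, $pk$, $sk$, and the ciphertext-side operations $\oplus$ and $\otimes$ are assumed notation for illustration, not the paper's own.

```latex
\mathrm{Dec}_{sk}\bigl(\mathrm{Enc}_{pk}(m_1) \oplus \mathrm{Enc}_{pk}(m_2)\bigr) = m_1 + m_2,
\qquad
\mathrm{Dec}_{sk}\bigl(\mathrm{Enc}_{pk}(m_1) \otimes \mathrm{Enc}_{pk}(m_2)\bigr) = m_1 \cdot m_2 .
```

Since any arithmetic circuit $f$ can be built from additions and multiplications, these two identities extend to $\mathrm{Dec}_{sk}\bigl(\mathrm{Evaluate}(f, \mathrm{Enc}_{pk}(m_1), \ldots, \mathrm{Enc}_{pk}(m_k))\bigr) = f(m_1, \ldots, m_k)$. For an approximate scheme such as CKKS, the equalities hold only up to a small error.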
In the case of fully homomorphic encryption, operations such as addition, subtraction, multiplication, and polynomial evaluation can be performed on the ciphertext, while division, exponents, logarithms, and trigonometric functions are typically evaluated through polynomial approximations. A fully homomorphic encryption scheme usually includes four algorithms: KeyGen, Enc, Dec, and Evaluate.
SEAL 3.0, released by Microsoft in 2018, supports the CKKS scheme proposed by Cheon et al. [27] and can realize approximate computation on ciphertexts over the real numbers. Since the BFV scheme cannot directly encode, encrypt, and operate on floating-point numbers, these numbers must first be turned into integers by multiplying them by a magnification factor. However, the magnification factor causes the encoded values, and with them the ciphertext, to grow rapidly as operations accumulate. The Chinese Remainder Theorem (CRT) can be used to address this problem, but it greatly increases the complexity of the ciphertext computation. Compared to the BFV scheme, the CKKS scheme can directly encode, encrypt, and operate on double-precision floating-point real numbers and complex numbers. SEAL 3.2 also supports .NET development, which makes it easier for .NET developers to build homomorphic encryption applications. At the time of writing, the latest version of SEAL is 3.4.
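The effect of the magnification factor can be illustrated with plain integers. The following is a minimal sketch with no encryption involved; the factor `SCALE` and the helper names are hypothetical and chosen only for illustration. It shows that a product of two scaled integers carries the square of the scale, and that dividing once by the scale (the idea behind CKKS rescaling) brings it back.

```python
SCALE = 2 ** 10  # hypothetical magnification factor, not a value from the paper

def encode(x, scale=SCALE):
    """Turn a real number into an integer by multiplying with the scale."""
    return round(x * scale)

def decode(v, scale=SCALE):
    """Recover the approximate real number from a scaled integer."""
    return v / scale

def multiply_encoded(ea, eb, scale=SCALE, rescale=True):
    """Multiply two scaled integers and report the scale of the result."""
    prod = ea * eb  # the product carries scale**2
    if rescale:
        # Rescaling: divide once by the scale so the result is back at `scale`.
        return round(prod / scale), scale
    return prod, scale * scale

ea, eb = encode(0.5), encode(0.25)
raw, raw_scale = multiply_encoded(ea, eb, rescale=False)
res, res_scale = multiply_encoded(ea, eb, rescale=True)
print(raw, raw_scale, decode(raw, raw_scale))
print(res, res_scale, decode(res, res_scale))
```

Without rescaling, every further multiplication would square the scale again, which is the rapid growth described above.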
The CKKS scheme basically consists of the following algorithms: key generation, encryption, decryption, homomorphic addition and multiplication, and rescaling. Let $R = \mathbb{Z}[X]/(X^n + 1)$, and for a positive integer $q$, let $R_q := R/qR$ be the quotient ring of $R$ modulo $q$. Let $\chi_s$, $\chi_r$, and $\chi_e$ be distributions over $R$ which output polynomials with small coefficients. These distributions, the initial modulus $Q$, and the ring dimension $n$ are fixed before the key generation operation.
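Key generation algorithm: Sample a secret polynomial $s \leftarrow \chi_s$. Sample $a$ (resp. $a'$) uniformly at random from $R_Q$ (resp. $R_{Q^2}$), and sample $e \leftarrow \chi_e$. Output a secret key $sk \leftarrow (1, s)$, a public key $pk \leftarrow (b = -a \cdot s + e, a) \in R_Q^2$, and an evaluation key [...].
Decryption algorithm: For a given ciphertext $ct \in R_q^2$, output a message $m' \leftarrow \langle ct, sk \rangle \pmod{q}$.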
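The key generation and decryption steps can be made concrete with a toy example. The sketch below is not SEAL and is not secure: the parameters `N`, `Q`, and `DELTA` are tiny illustrative values, the encryption step follows the standard RLWE form (an assumption, since it is not spelled out above), and messages are placed directly in polynomial coefficients rather than through the CKKS encoding. It only shows that $\langle ct, sk \rangle$ returns the message plus a small error.

```python
import numpy as np

N = 8            # toy ring dimension (real parameters are in the thousands)
Q = 2 ** 20      # toy ciphertext modulus
DELTA = 2 ** 10  # scaling factor applied to the message
rng = np.random.default_rng(0)

def polymul(a, b):
    """Multiply two polynomials in Z_Q[X] / (X^N + 1)."""
    full = np.convolve(a, b)      # coefficients up to degree 2N - 2
    res = full[:N].copy()
    res[:N - 1] -= full[N:]       # X^N = -1: high terms wrap with a sign flip
    return res % Q

def small():
    """Sample a polynomial with coefficients in {-1, 0, 1}."""
    return rng.integers(-1, 2, N)

def keygen():
    s = small()                   # secret polynomial; sk = (1, s)
    a = rng.integers(0, Q, N)     # uniform element of R_Q
    e = small()                   # small error
    b = (-polymul(a, s) + e) % Q
    return (b, a), s              # pk = (b, a)

def encrypt(pk, m):
    b, a = pk
    v, e0, e1 = small(), small(), small()
    c0 = (polymul(v, b) + m + e0) % Q
    c1 = (polymul(v, a) + e1) % Q
    return c0, c1

def decrypt(s, ct):
    c0, c1 = ct
    m = (c0 + polymul(c1, s)) % Q           # <ct, sk> mod Q
    return np.where(m > Q // 2, m - Q, m)   # centered representative

values = np.array([0.5, -0.25, 0.125, 0.0, 1.0, -1.0, 0.75, 0.3])
pk, s = keygen()
ct = encrypt(pk, np.round(values * DELTA).astype(np.int64))
recovered = decrypt(s, ct) / DELTA
print(np.max(np.abs(recovered - values)))
```

Decryption yields $m + v \cdot e + e_0 + e_1 \cdot s$, so the recovered values differ from the originals only by a small error divided by `DELTA`, which is the approximate behaviour CKKS is designed around.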

FaceNet
In 2015, the FaceNet algorithm was proposed by Schroff et al. at Google. Through learning, the algorithm maps image feature data to points in Euclidean space, and the distance between points directly corresponds to the similarity of the images. Fig. 1 illustrates the network structure of FaceNet. The front part is an ordinary CNN, followed by a feature normalization module that enforces $\|f(x)\|_2 = 1$. FaceNet then introduces a new loss function, the Triplet Loss, which is the most distinctive feature of the algorithm. Through the Triplet Loss module, all image features are mapped onto a hypersphere, such that the feature distance between images of the same identity is as small as possible and the feature distance between images of different identities is as large as possible. An example of the Triplet Loss is as follows.
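The normalization step and the distance comparison described above can be sketched in a few lines. This is a minimal illustration using random vectors in place of real FaceNet outputs; the function names are hypothetical, and the use of the squared distance is an assumption.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale a feature vector so that its Euclidean norm is 1."""
    x = np.asarray(x, dtype=float)
    return x / max(np.linalg.norm(x), eps)

def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.dot(d, d))

rng = np.random.default_rng(0)
# Random stand-ins for two 128-dimensional embeddings (not real face features).
emb_a = l2_normalize(rng.normal(size=128))
emb_b = l2_normalize(rng.normal(size=128))
print(np.linalg.norm(emb_a), squared_distance(emb_a, emb_b))
```

Because both vectors lie on the unit hypersphere, the squared distance always falls between 0 and 4.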
The left part of Fig. 2 represents the distances between data features in the original space. Through learning, it can be ensured that the image of a specific person is closer to the image of the target object (positive) and farther away from the image of the non-target object (negative). The optimization constraint adopted during learning is expressed as follows.
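In the notation of the FaceNet paper [6], which is assumed here, $x_i^a$, $x_i^p$, and $x_i^n$ denote the anchor, positive, and negative images of the $i$-th triplet, $\alpha$ is the margin, and $\mathcal{T}$ is the set of training triplets. The constraint is then commonly written as:

```latex
\| f(x_i^a) - f(x_i^p) \|_2^2 + \alpha < \| f(x_i^a) - f(x_i^n) \|_2^2,
\qquad \forall \bigl( f(x_i^a), f(x_i^p), f(x_i^n) \bigr) \in \mathcal{T}.
```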
The loss function is expressed as follows.
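As given in the FaceNet paper [6], with $x_i^a$, $x_i^p$, and $x_i^n$ the anchor, positive, and negative images of the $i$-th of $N$ triplets, $\alpha$ the margin, and $[z]_+ = \max(z, 0)$ (notation assumed here), the triplet loss is commonly written as:

```latex
L = \sum_{i=1}^{N} \Bigl[ \| f(x_i^a) - f(x_i^p) \|_2^2 - \| f(x_i^a) - f(x_i^n) \|_2^2 + \alpha \Bigr]_+ .
```

A triplet contributes to the loss only while its negative is not yet farther from the anchor than the positive by at least the margin $\alpha$.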

Overall Design
This system is developed in Python 3.6 and completes the learning and training of the FaceNet algorithm by calling various Python libraries. The Python libraries and versions required for this system are listed in Table 1. The overall structure of this scheme is divided into the client and the server on the cloud. The client is responsible for extracting the face feature data, which is then encrypted and sent to the server on the cloud. After the server obtains the data, it computes the result in ciphertext and returns the encrypted result to the client. During the whole process, both the data and the results remain in ciphertext, so the scheme achieves the purpose of protecting the privacy and security of face features.
When the system is used, the FaceNet algorithm first learns from the data training set. The face information in the image is then extracted by the FaceNet algorithm and normalized. The plaintext information obtained after these operations is encrypted with the CKKS scheme of the SEAL library. The system uses the order-preserving feature of the homomorphic encryption to compare the distance between the ciphertexts of the images and obtain the recognition results. The flow chart of the recognition algorithm is illustrated in Fig. 3 (Tables 2, 3 and 4).
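As a sketch of why such a distance comparison fits a scheme that offers additions and multiplications on ciphertexts, assume the judgment value is the squared Euclidean distance between two normalized 128-dimensional feature vectors $u$ and $v$ (an assumption; the exact circuit is not specified here). The distance is then a degree-2 polynomial in the vector entries:

```latex
d(u, v) = \| u - v \|_2^2 = \sum_{i=1}^{128} (u_i - v_i)^2
        = \| u \|_2^2 + \| v \|_2^2 - 2 \langle u, v \rangle
        = 2 - 2 \langle u, v \rangle ,
\qquad \text{since } \| u \|_2 = \| v \|_2 = 1 .
```

Under this assumption, one subtraction, one multiplication, and a summation over the slots suffice, and the result lies in $[0, 4]$.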

FaceNet Data Training Set
There are two main types of FaceNet data training sets: 20180408-102900 and 20180402-114759. The basic information of the two data training sets is as follows.
There are 494,414 images of 10,575 persons in the CASIA-WebFace data set. VGGFace2 is another large-scale face recognition data set, which contains a total of 9,131 identities. According to the test, 20180402-114759, derived from VGGFace2, has higher recognition accuracy, so the system proposed in this paper is developed based on the 20180402-114759 data training set.

Selection of Training Data Set
The two major data training sets for the FaceNet algorithm are 20180402-114759 and 20180408-102900. We tested both of them in order to select the better one as the training data of this system.
The judgment value indicates the degree of deviation between the facial feature values of two images. After extensive repeated training and testing, we set 0.7 as the threshold for judging whether two images correspond to the same person. If the judgment value is greater than 0.7, we consider that the two images correspond to different persons; otherwise, they correspond to the same person. The smaller the judgment value for two images of the same person and the larger the judgment value for images of different persons, the better the recognition effect, and the better the data training set meets our requirements.
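The decision rule above amounts to a single comparison. A minimal sketch follows; the function name is hypothetical, and the sample inputs are the average judgment values reported in the multi-person test.

```python
THRESHOLD = 0.7  # threshold stated in the text

def same_person(judgment_value, threshold=THRESHOLD):
    """Return True when two images are judged to show the same person."""
    return judgment_value <= threshold

print(same_person(0.38513367))  # average value reported for the same face
print(same_person(1.56347265))  # average value reported for different faces
```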
According to the test results, the 20180402-114759 data training set can meet our requirements, and it is chosen as the training data set to be used in our system.

Testing and Analysis
Efficiency and reliability are two very important indicators of a face recognition system, and both are tested in this section. The test platform is as follows. The computer model is Lenovo 310S-14IKB, the CPU is an Intel i5 7200U 2.4G, and the memory is 4 GB. The operating system is Windows 10 Home, and PyCharm with Python 3.6 is used to run the system code.

Accurate Face Recognition Test
First, we selected the LFW data set, the test set most frequently used with the FaceNet algorithm. Images of 40 persons were randomly selected for face recognition and comparison, and we conducted a total of 820 tests, repeating the comparison under different thresholds. The test results are shown in Table 5. It can be seen from Table 5 that the recognition accuracy reaches its maximum when the threshold is 0.7, and the maximum recognition accuracy is 96.71% on the LFW test set. Therefore, we choose 0.7 as the judgment threshold. When the judgment value is not larger than 0.7, we judge that the images correspond to the same person; otherwise, we judge that they correspond to different persons.
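The threshold selection described above can be sketched as a simple sweep. The pairs below are made-up judgment values used only to exercise the code; they are not data from Table 5, and the function names are hypothetical.

```python
def accuracy(pairs, threshold):
    """Fraction of pairs whose same/different decision matches the label."""
    correct = sum((d <= threshold) == same for d, same in pairs)
    return correct / len(pairs)

def best_threshold(pairs, candidates):
    """Return the candidate threshold with the highest accuracy."""
    return max(candidates, key=lambda t: accuracy(pairs, t))

# Made-up (judgment value, same person?) pairs, for illustration only.
pairs = [(0.31, True), (0.45, True), (0.66, True), (0.92, True),
         (0.74, False), (0.95, False), (1.40, False), (1.71, False)]
candidates = [0.5, 0.6, 0.7, 0.8, 0.9]

best = best_threshold(pairs, candidates)
print(best, accuracy(pairs, best))
```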
Second, images collected in practice may be irregular or non-standard, so such images are also included in the test. Table 6 shows the test results. It can be seen from Table 6 that our system works effectively when the head deflection is less than 90 degrees and the occluded area of the face is small.

Multi-person's Image Recognition Test
Reliability testing must determine not only whether two images correspond to the same person, but also whether the feature values of different persons' images can be accurately distinguished. We used 20 images of 10 persons, with two different images of each person. For any two images, the threshold of the judgment value is 0.7. When the judgment value is less than or equal to 0.7, we consider that the two images correspond to the same person; otherwise, we consider that they correspond to different persons. The test results are shown in Tables 7 and 8.
As shown in these two tables, the average judgment value for the same face is 0.38513367, and the average judgment value for different faces is 1.56347265. The judgment value of images from the same person stays within 0.7, while the judgment value of images from different persons is greater than 0.7 and in some cases greater than 2. Thus, this scheme achieves high recognition accuracy in the multi-person situation.

Efficiency Test
In the efficiency test, we randomly selected two groups of images, each including 20 sets of images. Each set is comprised of two different images of the same person. We sorted and analyzed the time consumed by processing 120 images, and got the time probability distribution for each image, which is shown in Fig. 4.
The time t1 for processing one image is concentrated in 1.3s-1.8s, with an average value of 1.712s.
To test the time t2 consumed by the encryption operation, we selected 20 feature values for encryption, and the result is shown in Fig. 5.
It can be calculated from Fig. 5 that the average encryption time of each image feature value is about 0.086 s.
To evaluate the ciphertext comparison time t3, we selected 20 groups of ciphertexts for comparison, and the result is shown in Fig. 6. From the above tests, the total time t_all for ciphertext image identification is obtained from t1, t2, and t3, where t_all, t1, t2, and t3 are the total time, processing time, encryption time, and comparison time, respectively. The total time is about 2.06 s. The processing time for two images is 1.712 s, accounting for 83.1% of the total time. The processing time is closely related to computer performance. If a more powerful computer is used, the comparison time will be further reduced.
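The quoted share can be checked directly from the reported figures. Assuming the total is the sum of the three components (an assumption, since the formula is not reproduced here), the remainder is the time left for encryption and ciphertext comparison together:

```latex
\frac{t_1}{t_{all}} = \frac{1.712}{2.06} \approx 0.831 = 83.1\%,
\qquad
t_{all} - t_1 = 2.06 - 1.712 = 0.348\ \text{s}.
```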

Performance Comparison
To further illustrate the comprehensive performance of this recognition scheme, we compared it with the latest relevant research results on face recognition in ciphertext from the past 5 years, and obtained the results shown in Table 9.
As can be seen from Table 9, schemes [18] and [21] achieve slightly higher recognition accuracy than our scheme. For example, the recognition accuracy of our scheme is 0.22% lower than that of scheme [21]. However, scheme [21] does not provide privacy preservation. In [18], the authors propose two schemes for secure computation in face recognition. The first scheme places the face image database on a third-party server, which makes it easier to reveal users' personal private information, and the overall scheme is also more complex. The second scheme computes the results and returns them to the client through a special protocol, which adds extra steps and complicates the scheme as a whole. Scheme [18] uses the Paillier algorithm, a partially homomorphic encryption scheme that supports homomorphic addition only and cannot perform multiplication of real numbers on ciphertexts. Although [18] provides a privacy preservation solution, it therefore has inherent limitations in ciphertext operations when recognizing faces. The CKKS encryption scheme used in our solution supports computation on real numbers and provides homomorphic multiplication as well as addition. Our scheme does not require any specific third party or complex protocol, which makes it easier to deploy and extend. According to Table 9, the encryption time of this scheme is 15.7% higher than the encryption time in [18]. The total time of this scheme is 76.2% higher than the total time of [21], and the recognition accuracy of this scheme is 4.21%, 0.68%, and 1.67% higher than that of [16], [17], and [19], respectively.
To further show the advantages of our scheme, we also compared it with some recent achievements on face recognition in plaintext [12][13][14][15]. The maximum difference in recognition accuracy between those achievements and our scheme is only 2.3%. Considering the security and privacy preservation that our scheme provides for users' face data, this difference is acceptable.
Overall, our scheme offers better comprehensive performance.

Summary
In this paper, a ciphertext face recognition scheme and system based on the SEAL library and FaceNet is proposed by combining homomorphic encryption and face recognition. Firstly, the normalized facial feature vector is encrypted using the CKKS algorithm in the SEAL library, a recent homomorphic encryption achievement from Microsoft. Secondly, the FaceNet neural network is used to perform machine learning on the ciphertext of images to realize face classification. Finally, face recognition in ciphertext is realized. In our tests, the average time to compare a group of images in ciphertext is about 2.06 s. Our system works effectively when the head deflection is less than 90 degrees and the occluded area of the face is small. Compared with the plaintext face recognition systems proposed in the past 5 years, this scheme not only reaches almost the same level of recognition accuracy and time efficiency with better practical value, but also provides privacy preservation to users. In practice, the ciphertext information can be stored in a cloud database, and cloud computing can be used to overcome the problem that encryption and decryption cost too much time in ciphertext recognition. On the whole, this scheme and system have good reference value and application prospects in the fields of ciphertext face recognition, privacy preservation, and identity authentication.
Data Availability Not applicable.

Declarations
Funding This work was supported by the State Cryptography Development Fund of Thirteen Five-year (MMJJ20170110).

Conflict of interest
The authors declare that they have no conflict of interest.