Fast and efficient face recognition system based on kernel sparse representation and fast CoSaMP

Kernel sparse representation based classification (KSRC) in compressive sensing (CS) represents one of the most interesting research areas in pattern recognition, image processing and especially face recognition and identification. KSRC first applies a dimensionality reduction method to reduce the data dimensionality in kernel space and then employs $\ell_1$-norm minimization to reconstruct the sparse signal. Nevertheless, these classifiers suffer from some shortcomings: KSRC is time-consuming when computing an approximate solution of the sparse representation based on $\ell_1$-norm minimization. In this paper, a new method is mathematically developed and applied to face recognition, based on Gabor wavelets for feature extraction and KSRC for precise classification. The aim of the proposed algorithm is to improve the computational efficiency of KSRC by applying supervised kernel locality preserving projections (SKLPP) for dimension reduction. The $\ell_1$-norm minimization is performed using a fast compressive sampling matching pursuit method (FCoSaMP). Experimental results show that the proposed classification method is efficient, fast, and robust against variations of illumination, expression, and pose; in particular, its computation time is greatly reduced compared to the baseline methods.


Introduction and state-of-the-art overview
Pattern recognition in images or videos is an important process in object identification. In recent decades, face recognition has become an interesting research field with many applications, security systems and access control to name a few. Its principal goal is to identify a person's face from a database. Feature extraction and the classification of facial images represent the most important and decisive tasks in the recognition process.
In order to find the most appropriate representation and discriminative features of the face image data, several feature extraction methods have been proposed in the literature. The technique of local appearance-based features represents an important approach in this direction; its role is to extract local descriptors by using appropriate image filters. On the other hand, classification is the second important step in pattern recognition.
Consequently, classifier design is vital for efficient recognition tasks, and several approaches have been widely proposed. Based on compressed sensing (CS) [3], Wright et al.
proposed a sparse representation-based classification (SRC) [4] to classify face image data with corruption and occlusion. SRC computes a sparse representation of the data by using sparse signal reconstruction methods and classifies the data in terms of reconstruction errors.
A typical SRC method is performed in the original input space. Nevertheless, the standard SRC cannot capture the non-linear information within the data, mainly for high-dimensional data such as face image databases [5]. In order to solve this problem, Zhang et al. proposed the kernel sparse representation-based classifier (KSRC), which transforms the input data into a high-dimensional feature space. KSRC is based on a non-linear mapping associated with a reproducing kernel Hilbert space (RKHS). However, data in the feature space are often of high dimensionality, and $\ell_1$-norm minimization in this space is impractical. Adding a dimensionality reduction stage in the feature space before optimizing the corresponding cost function can therefore be considered one of the ideal solutions. The dimensionality of the feature space can be reduced by exploiting non-linear subspace learning and manifold learning methods. The classical kernel principal component analysis (KPCA) [6] and kernel Fisher discriminant analysis (KFDA) [7] methods consider only the global structure of the training data, whereas manifold learning methods such as Isomap [8], locally linear embedding (LLE) [9] and supervised kernel locality preserving projections (SKLPP) [10] effectively exploit the manifold structure or local neighborhood structure of the training data.
The optimal solution for reconstructing the sparse signal in KSRC is obtained through $\ell_1$-norm minimization. Two families of approaches are used for this purpose. The first consists of convex optimization methods, including the least absolute shrinkage and selection operator (LASSO) [11], the homotopy method [12] and gradient projection for sparse representation (GPSR) [13]. The second is based on greedy recovery methods, which aim to reduce the computational complexity of the optimal $\ell_1$-norm minimization while maintaining good reconstruction accuracy. This family includes the compressive sampling matching pursuit (CoSaMP) method [14], the orthogonal matching pursuit (OMP) method [15], the iterative hard thresholding (IHT) method [16] and the subspace pursuit (SP) method [17].
The majority of the existing greedy recovery methods perform an unregulated selection, choosing either too few or too many elements and thus increasing the reconstruction time and error. Moreover, the sparse signal is estimated through least squares minimization, which requires computing the pseudo-inverse of a matrix at each iteration. Such matrix inversion becomes more expensive as the size of the data increases, which brings a high computation cost to each iteration and makes such approaches unsuitable for real-time applications.
In order to develop an effective method for face recognition, we propose in this work to use Gabor wavelets to extract features from the face database, followed by the proposed modified KSRC for face classification.

Gabor Feature extraction
Variations in illumination, facial expression, and pose represent the major challenges in face recognition. In order to overcome such challenges, it has been found that local features in face images are more robust against such variations [18], so a spatial-frequency analysis is used to extract them. Wavelet analysis provides good space-frequency localization, and Gabor functions provide the optimal resolution in both the spatial and frequency domains [20,21]. Therefore, Gabor wavelets are considered an optimal basis for extracting local features for pattern recognition [18]. The 2D Gabor filter in the spatial domain is defined by the following expression [1, 22, 23]:

$$ \psi_{u,\nu}(z) = \frac{\|k_{u,\nu}\|^{2}}{\sigma^{2}}\, e^{-\frac{\|k_{u,\nu}\|^{2}\|z\|^{2}}{2\sigma^{2}}} \left[ e^{i\, k_{u,\nu}\cdot z} - e^{-\frac{\sigma^{2}}{2}} \right], $$

where $z = (x, y)$, $u$ and $\nu$ denote the orientation and scale of the Gabor kernels, $\|\cdot\|$ is the Euclidean norm, and the wave vector is $k_{u,\nu} = k_{\nu} e^{i\phi_{u}}$ with $k_{\nu} = k_{\max}/f^{\nu}$ and $\phi_{u} = \pi u / 8$.
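To make the filtering step concrete, the following Python sketch (NumPy/SciPy) builds a Gabor filter bank and concatenates the downsampled magnitude responses into a feature vector. The parameter choices (5 scales, 8 orientations, kernel size, $k_{\max}$, $f$, and the downsampling step) are common defaults assumed for illustration, not values taken from this paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(u, v, size=31, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Complex 2D Gabor kernel psi_{u,v}(z), as in the expression above."""
    k = (k_max / f ** v) * np.exp(1j * np.pi * u / 8)       # wave vector k_{u,v}
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    phase = (k * (x - 1j * y)).real                         # <k_{u,v}, z> = |k|(x cos phi + y sin phi)
    mag2 = (np.abs(k) ** 2) * (x ** 2 + y ** 2)
    return (np.abs(k) ** 2 / sigma ** 2) * np.exp(-mag2 / (2 * sigma ** 2)) \
           * (np.exp(1j * phase) - np.exp(-sigma ** 2 / 2))

def gabor_features(image, n_scales=5, n_orientations=8, step=4):
    """Concatenate downsampled magnitude responses of the whole filter bank."""
    feats = []
    for v in range(n_scales):
        for u in range(n_orientations):
            resp = fftconvolve(image, gabor_kernel(u, v), mode='same')
            feats.append(np.abs(resp)[::step, ::step].ravel())
    return np.concatenate(feats)

# toy usage on a random 64 x 64 "face"
x = gabor_features(np.random.rand(64, 64))
```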

Kernel Sparse Representation Classifier
In this section, we introduce the compressed sensing (CS) method, then we briefly review the kernel sparse representation-based classifier (KSRC). Finally, we apply supervised kernel locality preserving projections (SKLPP) for dimensionality reduction.

Compressed sensing
Let $x \in \mathbb{R}^{N}$ be an $s$-sparse signal (at most $s$ nonzero entries), and consider a measurement system that acquires $M$ linear measurements:

$$ y = \Phi x, $$

where $\Phi \in \mathbb{R}^{M \times N}$ is the sensing matrix and $y \in \mathbb{R}^{M}$ is the measurement vector. In KSRC, to make the training data separable, the data are mapped from the input feature space $\mathbb{R}^{M}$ into a high-dimensional kernel feature space $H$ by a nonlinear mapping function $\phi$. The dictionary $A$, composed of all $N$ training data after the nonlinear mapping $\phi$, can be formulated as:

$$ A = \left[ \phi(a_{1}), \phi(a_{2}), \dots, \phi(a_{N}) \right]. $$

In order to find the identity of $y$, KSRC seeks the sparse solution $x$ by solving the following $\ell_1$-norm minimization problem [26]:

$$ \min_{x} \|x\|_{1} \quad \text{subject to} \quad \phi(y) = A x, \qquad (10) $$

where $\|x\|_{1}$ is the $\ell_1$-norm of $x$. However, Eq. (10) is even harder to solve in $H$, so it is necessary to reduce the dimensionality of the feature space $H$ to a low-dimensional subspace.
In fact, the dimensionality of the feature space $H$ is far greater than that of the original feature space $\mathbb{R}^{M}$ and requires a huge amount of computation; the system may slow down terribly or run out of memory. Moreover, it has been observed that a large number of features may actually degrade the performance of classifiers when the number of training data is small relative to the number of features [26]. Therefore, in order to reduce the dimensionality of the kernel feature space $H$ to a low-dimensional subspace, a pseudo-transformation matrix $B \in \mathbb{R}^{N \times d}$ is introduced, so that the projection of the mapped data is carried out through the kernel: the projected dictionary is $B^{T}K$ and the projected test sample is $B^{T}k(\cdot, y)$, with $k(\cdot, y) = [k(a_{1}, y), \dots, k(a_{N}, y)]^{T}$. Every quantity needed by the classifier is then an inner product of data in the high-dimensional kernel feature space, which can be computed by a Mercer kernel function. A kernel function must satisfy Mercer's conditions, namely continuity, symmetry and positive definiteness: for any data $a$ and $a'$, $k(a, a') = \langle \phi(a), \phi(a') \rangle$. Some popular kernels that meet Mercer's condition are the linear kernel, the polynomial kernel, the sigmoid kernel and the Gaussian radial basis function (RBF). The RBF kernel transforms the data into a specific high-dimensional space and is given by:

$$ k(a, a') = \exp\!\left( -\gamma \, \|a - a'\|^{2} \right), $$

where $\gamma$ is a positive constant called the RBF kernel parameter. With this notation, $K = [k(a_{i}, a_{j})]_{i,j=1}^{N}$ is the kernel Gram matrix, which is symmetric and positive definite. After zero centralization, the centered kernel Gram matrix $\tilde{K}$ is defined by

$$ \tilde{K} = K - \mathbf{1}_{N} K - K \mathbf{1}_{N} + \mathbf{1}_{N} K \mathbf{1}_{N}, $$

where $\mathbf{1}_{N}$ is the $N \times N$ matrix with all entries equal to $1/N$. Following [26], the problem of Eq. (10) can be rewritten as

$$ \min_{x} \|x\|_{1} \quad \text{subject to} \quad B^{T}k(\cdot, y) = B^{T}K\, x. \qquad (17) $$

In the case of noisy data, the optimization problem of Eq. (17) becomes

$$ \min_{x} \|x\|_{1} \quad \text{subject to} \quad \left\| B^{T}k(\cdot, y) - B^{T}K\, x \right\|_{2} \le \varepsilon, \qquad (18) $$

where $\varepsilon$ is a small positive error. To compute the identity of a test sample $y$, KSRC uses the coding coefficients $x$. After solving Eq. (17) or Eq. (18), the representation residual of each class is computed as

$$ r_{i}(y) = \left\| B^{T}k(\cdot, y) - B^{T}K\, \delta_{i}(x) \right\|_{2}, $$

where $\delta_{i}$ is the characteristic function that selects the coefficients associated with the $i$th class: $\delta_{i}(x)$ keeps the entries of $x$ corresponding to class $i$ and sets all other entries to zero. Based on the representation residual, the class label of the test sample is

$$ \operatorname{identity}(y) = \arg\min_{i} \; r_{i}(y), $$

i.e. the test sample is classified into the class achieving the minimal representation residual. The complete classification procedure of KSRC is summarized in Algorithm 1. This algorithm introduces the subspace projection method SKLPP to perform dimensionality reduction and FCoSaMP to solve the $\ell_1$-norm minimization problem.
Algorithm 1: kernel sparse representation-based classifier (KSRC)
Input: a matrix of training data A and a test sample y
Step 1: Kernelize A and y.
Step 2: Get the corresponding pseudo-transformation matrix B using SKLPP.
Step 3: Center the kernel gram matrix K.
Step 4: Normalize the columns of $B^{T}K$ and the vector $B^{T}k(\cdot, y)$.
Step 5: Solve the $\ell_1$-norm minimization problem to get the coefficient vector x using FCoSaMP (Algorithm 3).
Step 6: Compute the residuals $r_{i}(y)$ for each class and assign y to the class with the minimal residual.
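A minimal Python sketch of the KSRC decision rule of Algorithm 1 is given below. The projection matrix B (obtained by SKLPP), the kernel parameter and the sparse solver are assumed to be given; `sparse_solver` stands for any $\ell_1$ or greedy solver such as FCoSaMP, and the centering of the kernel Gram matrix is omitted for brevity.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """K[i, j] = exp(-gamma * ||X[:, i] - Y[:, j]||^2); columns are samples."""
    d2 = (X ** 2).sum(0)[:, None] + (Y ** 2).sum(0)[None, :] - 2 * X.T @ Y
    return np.exp(-gamma * d2)

def ksrc_predict(A, labels, y, B, gamma, sparse_solver, s):
    """Residual-based KSRC decision rule (centering of K omitted for brevity)."""
    K = rbf_kernel(A, A, gamma)                    # kernel Gram matrix of the training data
    ky = rbf_kernel(A, y[:, None], gamma)          # kernel vector of the test sample
    D = B.T @ K                                    # projected dictionary B^T K
    t = B.T @ ky                                   # projected test sample B^T k(., y)
    D = D / np.linalg.norm(D, axis=0, keepdims=True)   # Step 4: normalize columns
    t = t / np.linalg.norm(t)
    x = sparse_solver(D, t.ravel(), s)             # Step 5: e.g. FCoSaMP
    classes = np.unique(labels)
    residuals = []
    for c in classes:                              # Step 6: per-class residuals
        xc = np.where(labels == c, x, 0.0)         # keep coefficients of class c only
        residuals.append(np.linalg.norm(t.ravel() - D @ xc))
    return classes[int(np.argmin(residuals))]
```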

Supervised Kernel locality-preserving projections
In KSRC, we map the training data from the input feature space $\mathbb{R}^{M}$ into a high-dimensional kernel feature space $H$ by a nonlinear mapping function $\phi$. The objective function of KLPP in the kernel feature space is

$$ \min_{B} \sum_{i,j} \left\| B^{T}k(\cdot, a_{i}) - B^{T}k(\cdot, a_{j}) \right\|^{2} W_{ij}, $$

where $W$ is the affinity matrix built from the local neighborhood of the training data (and, in the supervised version, restricted to pairs from the same class), and $K = [k(a_{i}, a_{j})]_{i,j=1}^{N}$ is the kernel Gram matrix, which is symmetric and positive definite. Writing $D$ for the diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$ and $L = D - W$ for the graph Laplacian, the transformation matrix $B$ is obtained column by column from the generalized eigenvalue problem

$$ K L K \, b = \lambda \, K D K \, b, $$

keeping the eigenvectors associated with the smallest eigenvalues.
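As an illustration, a possible Python implementation of this step is sketched below; the heat-kernel weights restricted to same-class pairs and the small regularization added to the right-hand-side matrix are assumptions of this sketch, not details prescribed by the paper.

```python
import numpy as np
from scipy.linalg import eigh

def sklpp(K, labels, dim, t=1.0):
    """Supervised KLPP: returns the N x dim matrix B such that the projected
    dictionary is B.T @ K (see the generalized eigenproblem above)."""
    n = K.shape[0]
    # squared distances in the kernel feature space, computed from the Gram matrix
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2 * K
    # supervised affinity: heat-kernel weights between same-class samples only
    W = np.exp(-d2 / t) * (labels[:, None] == labels[None, :])
    D = np.diag(W.sum(1))
    L = D - W
    # smallest generalized eigenvectors of K L K b = lambda K D K b
    # (small ridge keeps the right-hand-side matrix positive definite)
    _, vecs = eigh(K @ L @ K, K @ D @ K + 1e-6 * np.eye(n))
    return vecs[:, :dim]
```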

Fast Compressive Sampling Matching Pursuit
Motivated by the need for computationally inexpensive solutions to the $\ell_1$-norm minimization, we propose in this section a fast compressive sampling matching pursuit (FCoSaMP) for reconstructing the sparse signal. First, we review compressive sampling matching pursuit (CoSaMP), then we present FCoSaMP, and finally the Sherman-Morrison-Woodbury formula used to update the matrix inverse. In this section, the sensing matrix $\Phi$ is taken to be the projected kernel Gram matrix $B^{T}K$ defined above.

Compressive Sampling Matching Pursuit (CoSaMP)
Compressive sampling matching pursuit (CoSaMP) is one of the greedy recovery methods, proposed by Needell and Tropp [14] for reconstructing the sparse signal. In each iteration, CoSaMP selects 2s elements and the sparse signal is estimated over the identified support set T through least squares minimization. The signal is then pruned, keeping only its s largest values, and the residual is calculated from the pruned signal. These steps are repeated until a stopping condition is met. The CoSaMP method is summarized in Algorithm 2 and its steps are explained as follows.
1) Identification: The residual vector r is correlated with the columns of the sensing matrix $\Phi$, forming a signal proxy g.
2) Support Merger: the support set T is obtained by selecting the 2s largest-magnitude entries of the signal proxy g and merging their indices with the support of the current estimate [27].

Fast Compressive Sampling Matching Pursuit (FCoSaMP)
Instead of directly forming the pseudo-inverse required by the least squares minimization at each iteration, FCoSaMP updates the inverse matrix from the previous iteration by adding the new columns corresponding to the newly selected elements. FCoSaMP then prunes the signal estimate to keep only its s largest values and updates the inverse matrix by removing the columns corresponding to the pruned elements for the next iteration. Finally, a residual is calculated and the previous steps are repeated until a stopping condition is met. In the following discussion, $\Phi_{T}$ at the ith iteration is denoted by $\Phi_{i}$. The FCoSaMP method is summarized in Algorithm 3 and its steps are explained in what follows.

1) Identification:
The residual vector r is correlated with the columns of the sensing matrix $\Phi$, forming a signal proxy g. Let $g_{T}$ denote the entries of g indexed by the current support set T and let $\bar{g}_{T}$ be their average. Rather than always selecting a fixed number of elements, FCoSaMP selects entries of the proxy using a threshold shaped by both the sparsity level s and the mean $\bar{g}_{T}$, so that the selection takes into account the size of the support and its mean value at the same time. This dynamic threshold adapts itself to produce a suitable value at each iteration.

3) Estimation:
Based on the identified support set T, the signal estimate x is formed by least squares minimization. In the ith iteration, $\Phi_{i}$ denotes the submatrix of $\Phi$ whose columns are indexed by T; the least squares solution requires the inverse of $\Phi_{i}^{T}\Phi_{i}$, which FCoSaMP maintains incrementally as described below.

4) Pruning:
After the least squares problem is solved and $\Phi_{i}$ has been updated with the new columns, the support set T is updated by keeping the s largest-magnitude elements of the signal estimate and setting the rest to zero. In the ith iteration, let $\Phi_{p}$ denote the matrix obtained after removing the pruned columns, and let P be the matrix containing the p columns that are removed from $\Phi_{i}$ to obtain $\Phi_{p}$. This is necessary for the iterative structure of the method, since new columns will be added in the next iteration. The matrix $\Phi_{i}$ can thus be expressed as $\Phi_{i} = [\Phi_{p} \;\; P]$.

5) Sample Update:
The new residual is calculated from the pruned signal estimate and the measurement vector y (a compact sketch of the full greedy loop is given below).
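The following Python sketch shows the five steps as one greedy loop. For clarity it uses a plain least squares solve (`lstsq`) and the standard 2s selection; the adaptive identification threshold and the incremental inverse updates that distinguish FCoSaMP are described above and below and are not reproduced here.

```python
import numpy as np

def cosamp_like(Phi, y, s, max_iter=50, tol=1e-6):
    """Greedy recovery loop in the spirit of CoSaMP/FCoSaMP.  The least squares
    step uses lstsq for clarity; FCoSaMP instead maintains the inverse of
    Phi_T^T Phi_T incrementally (see the update formulas that follow)."""
    n = Phi.shape[1]
    x = np.zeros(n)
    r = y.copy()
    for _ in range(max_iter):
        g = Phi.T @ r                                          # 1) identification: signal proxy
        T = np.union1d(np.argsort(np.abs(g))[-2 * s:],         # 2) support merger
                       np.flatnonzero(x))
        b = np.zeros(n)
        b[T] = np.linalg.lstsq(Phi[:, T], y, rcond=None)[0]    # 3) estimation
        x = np.zeros(n)
        keep = np.argsort(np.abs(b))[-s:]                      # 4) pruning: keep s largest
        x[keep] = b[keep]
        r_new = y - Phi @ x                                    # 5) sample update
        if np.linalg.norm(r_new) < tol:
            break
        r = r_new
    return x
```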

Updating the inverse matrix
Let $\hat{\Phi} \in \mathbb{R}^{k \times k}$ be a symmetric positive definite matrix and let $B = \hat{\Phi}^{-1}$ denote its inverse. In this section, we are interested in updating the inverse $B$ when a set of columns U is added to $\Phi$ or a set of columns P is removed from $\Phi$. This kind of inverse update has been studied in various settings [28]. The Sherman-Morrison-Woodbury formula relates the inverse of a matrix after a small-rank perturbation to the inverse of the original matrix [29]. For a rank-$q$ modification, it can be expressed as

$$ \left( \hat{\Phi} + U C V \right)^{-1} = B - B U \left( C^{-1} + V B U \right)^{-1} V B, \qquad (27) $$

where $U \in \mathbb{R}^{k \times q}$, $C \in \mathbb{R}^{q \times q}$ and $V \in \mathbb{R}^{q \times k}$. Eq. (27) is useful in situations where $q$ is much smaller than $k$ and the structure of $B$ is "nice", so that the effort involved in evaluating the correction term is small relative to the effort involved in inverting a general $k \times k$ matrix [29]. The Sherman-Morrison-Woodbury formula can also be derived from block-partitioned matrix inversion [29].

Let the symmetric matrix R be partitioned as

$$ R = \begin{pmatrix} R_{11} & R_{12} \\ R_{12}^{T} & R_{22} \end{pmatrix}. $$

The inverse of the partitioned matrix R is given by the general result [29], [30, Appendix B]:

$$ R^{-1} = \begin{pmatrix} F_{11} & F_{12} \\ F_{12}^{T} & F_{22} \end{pmatrix}, \qquad
F_{11} = \left( R_{11} - R_{12} R_{22}^{-1} R_{12}^{T} \right)^{-1}, \quad
F_{22} = \left( R_{22} - R_{12}^{T} R_{11}^{-1} R_{12} \right)^{-1}, \quad
F_{12} = -R_{11}^{-1} R_{12} F_{22}. \qquad (29) $$

To expand the blocks $F_{11}$ and $F_{22}$, we use the Sherman-Morrison-Woodbury formula, e.g.

$$ F_{11} = R_{11}^{-1} + R_{11}^{-1} R_{12} \left( R_{22} - R_{12}^{T} R_{11}^{-1} R_{12} \right)^{-1} R_{12}^{T} R_{11}^{-1}. $$
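A quick numerical check of the rank-q formula of Eq. (27), with arbitrary test matrices, can be written in a few lines of Python (NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
k, q = 50, 3
A = rng.standard_normal((k, k))
A = A @ A.T + k * np.eye(k)          # symmetric positive definite, plays the role of Phi-hat
U = rng.standard_normal((k, q))
C = np.eye(q)
V = U.T

B = np.linalg.inv(A)
# Sherman-Morrison-Woodbury: (A + U C V)^{-1} = B - B U (C^{-1} + V B U)^{-1} V B
smw = B - B @ U @ np.linalg.inv(np.linalg.inv(C) + V @ B @ U) @ V @ B
direct = np.linalg.inv(A + U @ C @ V)
print(np.allclose(smw, direct))      # True: a rank-q update replaces a full k x k inversion
```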

Updating the inverse matrix by adding columns
Assume we receive new data at the ith iteration, and let U be the matrix containing the newly added columns, so that $\Phi_{i} = [\Phi_{i-1} \;\; U]$. The matrix $B_{i}^{-1} = \Phi_{i}^{T}\Phi_{i}$ is symmetric positive definite and, with respect to the previous matrix $\Phi_{i-1}$ and the newly selected columns U, it can be expressed in block form as

$$ B_{i}^{-1} = \Phi_{i}^{T}\Phi_{i} = \begin{pmatrix} \Phi_{i-1}^{T}\Phi_{i-1} & \Phi_{i-1}^{T}U \\ U^{T}\Phi_{i-1} & U^{T}U \end{pmatrix}. $$

To find the inverse of the partitioned matrix $B_{i}^{-1}$, we use the result provided in Eq. (29): the blocks of $B_{i}$ are obtained from the previous inverse $B_{i-1} = (\Phi_{i-1}^{T}\Phi_{i-1})^{-1}$ and the small Schur complement $U^{T}U - U^{T}\Phi_{i-1} B_{i-1} \Phi_{i-1}^{T}U$, so no full matrix inversion is required. The method of updating the inverse of a partitioned matrix by adding columns is summarized in Algorithm 4.
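A possible Python sketch of this update is given below; the function name and interface are hypothetical, but the block formulas follow Eq. (29), so only the small Schur complement is inverted.

```python
import numpy as np

def inverse_add_columns(Phi_prev, B_prev, U):
    """Update B = (Phi^T Phi)^{-1} after appending the columns U to Phi_prev,
    using the block (partitioned) inverse of Eq. (29)."""
    F12 = Phi_prev.T @ U
    F22 = U.T @ U
    S = F22 - F12.T @ B_prev @ F12          # Schur complement of the old block
    S_inv = np.linalg.inv(S)                # only a small q x q inversion
    top_left = B_prev + B_prev @ F12 @ S_inv @ F12.T @ B_prev
    top_right = -B_prev @ F12 @ S_inv
    B_new = np.block([[top_left, top_right],
                      [top_right.T, S_inv]])
    return np.hstack([Phi_prev, U]), B_new
```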

Updating the inverse matrix by removing columns
Let P be a matrix containing p columns; we consider the case where these p columns must be removed from the matrix $\Phi_{i}$ to obtain $\Phi_{p}$.
Permuting the columns so that the removed ones come last, $\Phi_{i} = [\Phi_{p} \;\; P]$, and the current inverse $B_{i}$ can be written in block form, conformal with this partition, as

$$ B_{i} = \left( \Phi_{i}^{T}\Phi_{i} \right)^{-1} = \begin{pmatrix} E & F \\ F^{T} & G \end{pmatrix}. $$

Applying the partitioned inverse result, the matrix $B_{p}$ for the next iteration is obtained directly from these blocks:

$$ B_{p} = \left( \Phi_{p}^{T}\Phi_{p} \right)^{-1} = E - F G^{-1} F^{T}, $$

which requires inverting only the small $p \times p$ block G. The method of updating the inverse of a partitioned matrix by removing columns is summarized in Algorithm 5.
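A corresponding sketch of the downdate is given below; the interface is hypothetical, and `keep`/`drop` are the index sets of the retained and pruned columns.

```python
import numpy as np

def inverse_remove_columns(B_i, keep, drop):
    """Downdate B = (Phi^T Phi)^{-1} after removing the columns indexed by `drop`:
    B_p = E - F G^{-1} F^T, where E, F, G are the blocks of B_i conformal with
    the kept and dropped columns.  Only the small |drop| x |drop| block is inverted."""
    E = B_i[np.ix_(keep, keep)]
    F = B_i[np.ix_(keep, drop)]
    G = B_i[np.ix_(drop, drop)]
    return E - F @ np.linalg.inv(G) @ F.T
```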

Results and Discussion
In this section, the performance of the proposed method is evaluated through a series of experiments on four face databases: ORL [31], Yale [32], Extended Yale B [33] and the GT database. For the ORL database, we randomly select half of the images of each person for training and use the other half for testing. Fig. 2 shows some images of one person from the Yale database; each image is first cropped and resized to a resolution of 64 × 64 pixels, and for each person we randomly select six images for training and use the rest for testing. Fig. 3 shows some images of one person from the Extended Yale B database; each image is first cropped and resized to 64 × 64 pixels, and for each person we randomly select half of the images for training and use the other half for testing. Fig. 4 shows some images of one person from the GT database; each image is first converted to grayscale, then cropped and resized to 64 × 64 pixels, and for each person we randomly select eight images for training and use the rest for testing. In every case the experiment is repeated ten times and the average accuracy and computation time are reported. Randomly choosing the training set ensures that the classification results are unbiased [4], and for each face database the same training and test selections are used by all the methods to guarantee a fair comparison. The dimension of the reduced subspace is chosen as a function of Nc, the number of training data per class. The kernel function adopted is the RBF kernel, and the kernel parameter $\gamma$ is set to the median value computed over the training data. In this experiment, we analyze and compare the performance of our method against other state-of-the-art methods. First, we test the dimensionality reduction using SKLPP against the related methods KPCA, KFDA and Isomap. Then, we test the $\ell_1$-norm minimization using FCoSaMP against the related methods CoSaMP, OMP and LASSO. All experimental results were obtained using Python on a PC with an Intel(R) Core(TM) i5 CPU at 2.7 GHz and 4 GB of memory.
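As an illustration of this protocol, the following Python sketch (with hypothetical function names) averages the accuracy over repeated random splits; `classify` stands for the full pipeline (Gabor features, SKLPP, KSRC with FCoSaMP).

```python
import numpy as np

def evaluate(features, labels, classify, n_train_per_class, n_runs=10, seed=0):
    """Average accuracy over n_runs random train/test splits, as in the protocol above."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        train_idx, test_idx = [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            train_idx.extend(idx[:n_train_per_class])
            test_idx.extend(idx[n_train_per_class:])
        pred = classify(features[train_idx], labels[train_idx], features[test_idx])
        accs.append(np.mean(pred == labels[test_idx]))
    return np.mean(accs), np.std(accs)
```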

Comparison of different dimensionality reduction methods
In order to further investigate the recognition performance, we compare the different dimensionality reduction methods, using KSRC with FCoSaMP for classification.

Acknowledgements
The author is grateful to the anonymous reviewers and the associate editor for their valuable comments, which have greatly helped to improve this work.