MVM-LBP: mean-variance-median based LBP for face recognition

This paper proposes a novel descriptor called the Mean-Variance-Median based Local Binary Pattern (MVM-LBP). The Median Binary Pattern (MBP) computes the difference between the median of all the pixels in a 3 × 3 pixel window and each of the eight neighbours. The main drawback of MBP is that two different pixel windows can have the same MBP code, which is undesirable in face recognition systems. To improve performance, the proposed descriptor (MVM-LBP) adds two more statistics, the mean and the variance, and uses the mean, variance, and median of all the pixels in a 3 × 3 pixel window to obtain a more robust feature vector. The proposed method is evaluated on two publicly available facial data sets, AT&T and faces94. The outcomes show that the proposed descriptor gives encouraging results.


Introduction
Content-based image retrieval (CBIR) is a system that searches a collection of images for those similar to a specific image (the query image). Owing to easy access to digital devices such as cellphones, digital cameras, and tablets, the number of digital images is rising by the minute. Manually searching a big pool of images for a query image takes a long time. Text-based image retrieval (TBIR) [1] relies on keywords or hand annotation, which is time-consuming, and annotations may vary from person to person; this is one of the difficulties of TBIR. An image's content can also be used to find other images. Color, texture, and shape are three different types of information present in every image. The CBIR system [2] allows images to be searched based on image content. An overview of a complete CBIR system is shown in Fig. 1.
An image database and a query image (the image we want to search for) are the two inputs to the CBIR system. Image pre-processing is the first step in every CBIR system. In this phase, all of the images are scaled to the same size and converted to gray-scale. The feature extraction step follows, in which features are extracted using a local pattern descriptor. The extracted features are converted into histograms in the third step, and the features are matched using similarity metrics in the fourth step. Finally, the best feature matches are used to generate the output. Detailed surveys on CBIR are presented in [3][4][5].
The performance of a CBIR system depends on the uniqueness of its feature vectors. The more unique the feature vector, the better the system. In MBP [6], the difference between the median of all the pixels in a 3 × 3 pixel window and each of the eight neighbours is used. The main drawback of MBP is that two different pixel windows can have the same MBP code, which is undesirable in face recognition systems. Fig. 2 shows two examples of different pixel arrangements that have the same MBP code. This paper's primary focus is to propose a more robust descriptor that generates more unique codes and hence a more unique feature vector. The proposed descriptor takes advantage of the variance to compute more unique codes.
G. Sucharitha and Subhash C. Sharma contributed equally to this work.

Contribution
This paper proposes a novel descriptor called MVM-LBP. The descriptor is based on the mean, variance, and median of all the pixels in a 3 × 3 window. The key contributions of the work are:
1. A novel descriptor, MVM-LBP, is proposed using the mean, variance, and median of all the pixels in a 3 × 3 pixel window.
2. The performance of the suggested descriptor is examined on two publicly available face image databases, the AT&T face images database [7] and the faces94 database [8].
3. The d1 distance-based metric is utilized to compute the similarity score of the suggested system. The performance of the suggested descriptor is compared with the state-of-the-art descriptors LBP [9], Neighborhood intensity based LBP (NI-LBP) [10], MBP [6], Center symmetric based LBP (CSLBP) [11], 6x6 multi-block based LBP (6x6 MB-LBP) [12], and Logically connected-LBP (LC-LBP) [13].

Organization
The paper is organized as follows. Related work is discussed in Section 2. Section 3 presents some of the existing local patterns. The proposed descriptor, the similarity measure, and the evaluation measures are described in Section 4. Section 5 details the databases used, the computational cost, and the outcomes. Section 6 concludes with closing remarks and recommendations for further study.

Related work
With easy access to digital cameras, the number of facial images is increasing rapidly, which makes face recognition a major challenge. Many researchers have worked on the face recognition problem and proposed many solutions. In this section we highlight some of the existing descriptors available in the literature.
The grey level co-occurrence matrix (GLCM) was introduced in [14] for image categorization. This matrix extracts characteristics based on pixel pair co-occurrence. The GLCM was created as a generalized co-occurrence matrix to extract some significant spatial features from the distribution of local maxima [15,16]. A related texture feature computation technique uses the Prewitt edge detector to compute edge images and extracts the co-occurrence matrix from those edge images instead of the original images. Statistical characteristics were used to extract features from the co-occurrence matrix. Researchers have also used the transform domain to extract features. Wavelet packets were utilized in [17] to extract features, which were then applied to image classification.
Methods such as the k-d tree and the co-occurrence matrix are computationally expensive. The local binary pattern (LBP) [9] was introduced to reduce this computational complexity. Ojala et al. [9] invented the LBP for texture analysis. LBP was later used in many applications, such as texture classification, facial recognition, object detection, and image retrieval. In [18], an orthogonal difference-local binary pattern (OD-LBP) was introduced. OD-LBP considers only the orthogonal positions in the 3 × 3 pixel window. In OD-LBP, a 24-bit binary pattern is generated first and then divided into three binary patterns of 8 bits each; PCA is used to reduce the size of the feature vector. The Color ZigZag Binary Pattern (CZZBP) and Color Median Block ZigZag Binary Pattern (CMBZZBP) color face descriptors were presented in [19]. Both descriptors use a ZigZag pattern to produce the binary pattern. In CZZBP, the features are extracted for the three colors R, G, and B and combined to produce a feature vector. In CMBZZBP, a 9 × 9 pixel window is utilized. This window is divided into nine blocks, each of size 3 × 3, and the median of each block is computed to produce a window of size 3 × 3. PCA is used to reduce the size of the feature vectors, and classification is done using SVM and NN. A neighbourhood and center difference-based LBP (NCDB-LBP) was presented in [20]. NCDB-LBP is based on the difference between neighbourhood and center pixel intensities and is computed in both the clockwise and anti-clockwise directions. PCA and FLDA are used to reduce the size of the feature vector, and SVM and NN are used for classification. In [21], a local directional order pattern (LDOP) was presented. LDOP uses a multi-scale neighbourhood to improve the robustness of the descriptor: pixels at multiple radii are used to find the relationship between the central pixel and neighbourhood pixels at different scales.
In [22], local tri-directional patterns (LTriDP) are proposed. The traditional LBP method uses the difference information between the central pixel and neighbourhood pixels, whereas LTriDP is based on the intensities of a pixel in three directions. The local neighbourhood difference pattern (LNDP) is presented in [23]. LNDP is based on the mutual relationship of neighbouring pixels and is combined with LBP to form a more robust feature descriptor. In [24], the mean distance local binary pattern (Mean distance LBP) is proposed. Mean distance LBP is based on the Euclidean distance of the neighbouring pixels from the central pixel. In [25], fuzzy theory was used to detect faces in color images; intuitionistic fuzzy set theory is used to describe local texture patterns in images.
In [26], the authors proposed a CBIR system for the retrieval of medical images (CBMIR) to enable the early detection and classification of lung diseases based on lung X-ray images. Three different approaches for image retrieval, Surrounding information retrieval (SIR), Minimum edge retrieval (MER), and Integrated feature retrieval (IFR), are proposed in [27]. SIR extracts the features related to the similarities of the neighborhood intensity values. MER extracts the features related to the similarities of the neighborhood intensity values. IFR combines the feature extraction properties of SIR and MER. A CBIR system using local and global features for large image datasets is proposed in [28]. In this system, the MapReduce paradigm with different modes is used to retrieve a queried image. A query based image management system (QBIMS) is proposed in [29]. In QBIMS, image features are based on image energy, image entropy, image contrast, horizontal edge, vertical edge, centre point, mean, and median.

Local patterns
This section briefly describes the existing local binary pattern and Median binary pattern.

Local binary pattern (LBP)
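Ojala et al. [9] proposed LBP for texture analysis. LBP was later utilized successfully in several applications. In LBP, the neighbours of a central pixel are compared with the central pixel. The LBP descriptor is defined mathematically in Eqs. (1) and (2).

$$LBP_{N,R} = \sum_{n=0}^{N-1} 2^{n} \times f(V_{R,n} - V_{c}) \tag{1}$$

$$f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

Where $N$ is the number of neighboring pixels at radius $R$, $V_{R,n}$ denotes the $n$-th neighbouring pixel, and $V_{c}$ denotes the central pixel.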
After obtaining the LBP features of an image of size P × Q, the histogram can be calculated using Eqs. (3) and (4).

$$H_{LBP}(l) = \sum_{p=1}^{P} \sum_{q=1}^{Q} f_{2}(LBP(p,q), l), \quad l \in [0, 255] \tag{3}$$

$$f_{2}(x, y) = \begin{cases} 1, & x = y \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

In Fig. 3a, a 3 × 3 pixel window is evaluated first, and all eight neighbors are then compared with the center pixel. After comparison, pixels with an intensity value higher than or equal to that of the central pixel are designated as 1, while those with an intensity value less than that of the central pixel are designated as 0. This generates an 8-bit binary pattern, which is translated to decimals by assigning weights to each binary value; the LBP code is then obtained by summing the decimal values. LBP generates histograms of size 256.
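The LBP computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the clockwise neighbour ordering starting at the top-left pixel, the function names `lbp_code` and `lbp_histogram`, and the choice to skip border pixels are all assumptions made for the example.

```python
# Neighbour positions in a 3x3 window, paired in order with weights 2^0..2^7.
# The clockwise order starting at the top-left pixel is an assumption.
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(window):
    """LBP code of a 3x3 window given as a list of three rows."""
    centre = window[1][1]
    # A neighbour contributes its weight when (neighbour - centre) >= 0.
    return sum(2 ** n for n, (r, c) in enumerate(NEIGHBOURS)
               if window[r][c] - centre >= 0)


def lbp_histogram(image):
    """256-bin histogram of LBP codes over the interior pixels of a gray image.

    Border pixels are skipped because they lack a full 3x3 neighbourhood;
    this border handling is an assumption, not taken from the paper.
    """
    hist = [0] * 256
    for p in range(1, len(image) - 1):
        for q in range(1, len(image[0]) - 1):
            window = [row[q - 1:q + 2] for row in image[p - 1:p + 2]]
            hist[lbp_code(window)] += 1
    return hist
```

A different neighbour ordering only permutes the bit weights, so it changes the individual codes but not the size of the histogram.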

Median binary pattern (MBP)
In [6], the MBP descriptor was suggested for texture analysis. In MBP, all eight neighbors of the central pixel in a 3 × 3 pixel window are compared with the median value of the whole window. If the difference between a neighbouring pixel and the median is greater than or equal to zero, the neighbouring pixel is designated as 1, otherwise as 0. This yields an 8-bit binary pattern, which is translated into decimals by assigning weights to each binary value. The entire computation for the MBP code is shown in Fig. 3b. The MBP descriptor is defined mathematically in Eqs. (5) and (6).

$$MBP_{N,R} = \sum_{n=0}^{N-1} 2^{n} \times f(V_{R,n} - V_{median}) \tag{5}$$

$$f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

Where $N$ is the number of neighboring pixels at radius $R$, $V_{R,n}$ denotes the $n$-th neighbouring pixel, and $V_{median}$ is the median of the pixels in the 3 × 3 pixel window.
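After obtaining the MBP features of an image of size P × Q, the histogram can be calculated using Eqs. (7) and (4).

$$H_{MBP}(l) = \sum_{p=1}^{P} \sum_{q=1}^{Q} f_{2}(MBP(p,q), l), \quad l \in [0, 255] \tag{7}$$

Here $f_{2}$ is the indicator function of Eq. (4), which equals 1 when its two arguments are equal and 0 otherwise.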
In Fig. 3b, a 3 × 3 pixel window is evaluated first, and all eight neighbors are then compared with the median of the nine pixels in the window. After comparison, pixels with an intensity value higher than or equal to the median are designated as 1, while those with an intensity value less than the median are designated as 0. This generates an 8-bit binary pattern, which may be translated to decimals by assigning weights to each binary value; the MBP code is then obtained by summing the decimal values.
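The drawback mentioned in the Introduction, that two different windows can share one MBP code, can be reproduced with a small sketch. The two windows below are hypothetical values chosen for illustration (they are not the windows of Fig. 2), and the neighbour ordering and the name `mbp_code` are assumptions.

```python
from statistics import median

# Clockwise neighbour order starting at the top-left pixel (an assumption).
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def mbp_code(window):
    """MBP code of a 3x3 window: neighbours are thresholded at the window median."""
    pixels = [value for row in window for value in row]
    threshold = median(pixels)  # median of all nine pixels
    return sum(2 ** n for n, (r, c) in enumerate(NEIGHBOURS)
               if window[r][c] - threshold >= 0)


# Two hypothetical windows with clearly different intensity distributions.
window_a = [[10, 20, 30], [45, 50, 60], [70, 80, 90]]
window_b = [[1, 2, 3], [4, 100, 101], [102, 103, 104]]

print(mbp_code(window_a), mbp_code(window_b))
```

Both windows have four neighbours at or above their respective medians, in the same positions, so under the assumed ordering they receive the same code even though their pixel values differ widely.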

Proposed descriptor and system framework
This section describes the proposed descriptor with an example, and also covers the step-by-step algorithm for calculating MVM-LBP codes, the metric used for distance similarity between feature vectors, and the evaluation measures (precision, recall and f-score).

Mean-variance-median based local binary pattern (MVM-LBP)
The proposed MVM-LBP descriptor is based on the mean, variance, and median of all the pixels in a 3 × 3 pixel window. In MBP, all eight neighbors are compared with the median of the entire 3 × 3 window. In MVM-LBP, by contrast, all eight neighbors are compared with the average of the mean, the square root of the variance, and the median of all the pixels in the 3 × 3 pixel window. Mathematically, MVM-LBP is defined in Eqs. (8) and (9).

$$MVM\text{-}LBP_{N,R} = \sum_{n=0}^{N-1} 2^{n} \times f(V_{R,n} - V_{MVM}) \tag{8}$$

$$f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

Where $N$ is the number of neighboring pixels at radius $R$, $V_{R,n}$ denotes the $n$-th neighbouring pixel, and $V_{MVM}$ is calculated using Eq. (10) for all the pixels in a 3 × 3 pixel window.

$$V_{MVM} = \frac{V_{mean} + V_{sd} + V_{median}}{3} \tag{10}$$

Where $V_{mean}$ and $V_{median}$ are the mean and median values of all the pixels in the 3 × 3 pixel window, and $V_{sd}$ is calculated using Eq. (11).

$$V_{sd} = \sqrt{V_{variance}} \tag{11}$$

Here $V_{variance}$ is the variance of all the pixels in the 3 × 3 pixel window. If the difference between a neighbouring pixel and $V_{MVM}$ is greater than or equal to zero, the neighbouring pixel is designated as 1, otherwise as 0. This yields an 8-bit binary pattern, which is translated into decimals by assigning weights to each binary value. The entire computation for the MVM-LBP code is shown in Fig. 4. After obtaining the MVM-LBP features of an image of size P × Q, the histogram can be calculated using Eqs. (12) and (4).

$$H_{MVM\text{-}LBP}(l) = \sum_{p=1}^{P} \sum_{q=1}^{Q} f_{2}(MVM\text{-}LBP(p,q), l), \quad l \in [0, 255] \tag{12}$$

The size of the histogram produced by MVM-LBP is 256. A block diagram of the proposed system is shown in Fig. 5.
In Fig. 4, a 3 × 3 pixel window is evaluated first, and all eight neighbors are then compared with the average of the mean, the square root of the variance, and the median of the nine pixels in the window. After comparison, pixels with an intensity value higher than or equal to this average are designated as 1, while those with an intensity value less than this average are designated as 0. The resulting 8-bit binary pattern is translated to decimals by assigning weights to each binary value, and the MVM-LBP code is obtained by summing the decimal values. Fig. 6 shows the images transformed using the proposed descriptor. The figure suggests that the proposed descriptor captures boundary lines and the main features of the image for feature extraction.
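The MVM-LBP threshold and code can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the population variance (`statistics.pvariance`) is assumed because the text does not say whether the variance is normalized by 9 or by 8, the neighbour ordering is assumed, and the two windows are hypothetical values rather than those of any figure in the paper.

```python
from math import sqrt
from statistics import mean, median, pvariance

# Clockwise neighbour order starting at the top-left pixel (an assumption).
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def mvm_threshold(window):
    """Average of the mean, standard deviation, and median of a 3x3 window."""
    pixels = [value for row in window for value in row]
    v_mean = mean(pixels)
    v_sd = sqrt(pvariance(pixels))  # population variance is an assumption
    v_median = median(pixels)
    return (v_mean + v_sd + v_median) / 3


def mvm_lbp_code(window):
    """MVM-LBP code: neighbours are thresholded at the MVM value of the window."""
    threshold = mvm_threshold(window)
    return sum(2 ** n for n, (r, c) in enumerate(NEIGHBOURS)
               if window[r][c] - threshold >= 0)


# Hypothetical windows that are both thresholded into the same pattern by the
# median alone (45 < 50 in window_a, 4 < 100 in window_b).
window_a = [[10, 20, 30], [45, 50, 60], [70, 80, 90]]
window_b = [[1, 2, 3], [4, 100, 101], [102, 103, 104]]

print(mvm_lbp_code(window_a), mvm_lbp_code(window_b))
```

For these two windows the MVM threshold of `window_a` is about 42.07, which is below the neighbour value 45, so that neighbour now contributes a 1 and the codes come out as 248 and 120. This shows one case where the extra statistics separate windows that the median alone does not; it does not show that collisions are impossible.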

Similarity measure
After feature extraction, feature matching is the next important step in CBIR systems. Many distance similarity metrics, such as the Euclidean distance, Manhattan distance, Canberra distance, d1-distance, and mean-squared distance [40], exist in the literature. In this paper, experiments are performed with all of the above-mentioned similarity metrics. Of these, the d1-distance similarity metric (Eq. 13) gave the best results. Results with all the distance metrics are presented in Table 5. Images are retrieved in order of smallest distance. In our experiments, each image in the database is treated as a query image once.

$$D(db, q_{j}) = \sum_{i=1}^{N} \left| \frac{F_{db}(i) - F_{q_{j}}(i)}{1 + F_{db}(i) + F_{q_{j}}(i)} \right| \tag{13}$$

In Eq. (13), as in the equations of the other distance metrics, the left-hand side is the distance between the feature vectors of the database image $db$ and the $j$-th query image $q_{j}$. $N$ represents the length of the feature vector. $F_{db}(i)$ and $F_{q_{j}}(i)$ represent the $i$-th elements of the feature vectors of the database image and the $j$-th query image.
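As a rough illustration of the d1-distance, the sketch below assumes the form commonly used in the CBIR literature, namely the sum over i of |F_db(i) - F_q(i)| / |1 + F_db(i) + F_q(i)|. The function name `d1_distance` is a placeholder introduced for this example.

```python
def d1_distance(f_db, f_q):
    """d1 distance between two feature vectors (histograms) of equal length.

    Assumes the form sum_i |(f_db[i] - f_q[i]) / (1 + f_db[i] + f_q[i])|,
    with non-negative entries such as histogram bin counts.
    """
    if len(f_db) != len(f_q):
        raise ValueError("feature vectors must have the same length")
    return sum(abs((a - b) / (1 + a + b)) for a, b in zip(f_db, f_q))
```

Because every term is normalized by the bin magnitudes, a difference in a sparsely populated bin weighs more than the same absolute difference in a heavily populated bin, which is one plausible reason the metric suits histogram features.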

Algorithm
The step-by-step working of the proposed method is presented in Algorithm 1. Image retrieval using the proposed descriptor works in three steps. Image pre-processing is the initial step: all the images are loaded and converted into gray-scale images of equal size. Feature vector generation is the second step: the feature vectors of the database images and the query image are computed using Eqs. (8)(9)(10)(11), and their histograms using Eqs. (12) and (4). Feature matching is the third step: the feature vector of the query image is compared with those of the database images using the similarity measure, and the images with the smallest distances are retrieved.
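The matching stage of such a pipeline can be sketched as a simple ranking of database feature vectors by distance to the query. This is not a reconstruction of Algorithm 1; the function names are hypothetical, pre-processing and feature extraction are assumed to have been done already, and the Manhattan distance is used only as a stand-in so that the example stays self-contained (any of the metrics listed in the Similarity measure subsection could be passed instead).

```python
def manhattan(f_a, f_b):
    """Stand-in distance for the example: sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(f_a, f_b))


def retrieve(query_features, database_features, distance, top_k):
    """Return the indices of the top_k database entries closest to the query."""
    ranked = sorted(range(len(database_features)),
                    key=lambda i: distance(query_features, database_features[i]))
    return ranked[:top_k]


# Toy feature vectors standing in for 256-bin histograms.
database = [[0, 0], [5, 5], [1, 1]]
query = [1, 0]
print(retrieve(query, database, manhattan, top_k=2))
```

Passing the distance as an argument keeps the ranking code unchanged when different similarity metrics are compared, as is done for Table 5.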

Evaluation measure
The performance of the proposed descriptor is compared with some of the existing state-of-the-art methods. Because the proposed descriptor uses the mean, variance, and median, it generates more unique codes. To evaluate the proposed descriptor, the results are compared with LBP [9], MBP [6], 6 × 6 MB-LBP [12], CSLBP [11], NI-LBP [10], and LC-LBP [13].
In each experiment, the images are retrieved by taking every image from the database as a query image. In this paper, the precision and recall values are evaluated using Eqs. (14) and (15).

$$P = \frac{A}{B} \tag{14}$$

$$R = \frac{A}{C} \tag{15}$$

Where $A$ is the total number of relevant images retrieved from the dataset, $B$ is the total number of images retrieved from the dataset, and $C$ is the total number of relevant images present in the dataset.
The F-score is then evaluated using the $P$ and $R$ values from Eqs. (14) and (15), as given in Eq. (16).

$$F\text{-}score = \frac{2 \times P \times R}{P + R} \tag{16}$$
Average recall rate (ARR) and Average precision rate (APR) are calculated using Eq. (17).
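The evaluation measures can be illustrated with a short sketch. It takes precision as relevant retrieved over total retrieved (the usual definition), recall as relevant retrieved over relevant images in the database, and the average rates as plain means over all queries; the function names and the subject labels are hypothetical.

```python
def precision_recall_fscore(retrieved_labels, query_label, relevant_in_db):
    """Precision, recall, and F-score for one query.

    retrieved_labels: subject labels of the retrieved images, in rank order.
    relevant_in_db:   number of images of the query's subject in the database.
    """
    relevant_retrieved = sum(1 for label in retrieved_labels if label == query_label)
    precision = relevant_retrieved / len(retrieved_labels)
    recall = relevant_retrieved / relevant_in_db
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def average_rate(values):
    """Mean of per-query precision (APR) or recall (ARR) values."""
    return sum(values) / len(values)


# Five images retrieved for a query of subject "s1", which has ten images in total.
p, r, f = precision_recall_fscore(["s1", "s1", "s2", "s1", "s3"], "s1", 10)
print(p, r, f)
```

When the number of retrieved images equals the number of images per subject, precision and recall coincide, which is why a single figure can serve as both the average precision and the average recall in that setting.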

Experiment results and analysis
Several experiments are performed to evaluate the proposed descriptor. This section describes the experiments and their results.

Experiment
The experiments use two publicly available databases of facial images, AT&T [7] and faces94 [8]. Both databases pose the same challenges of lighting, facial expressions, and facial features. Both databases are summarized in Table 2.

Experiment 1: AT&T database of facial images [7]
The first database for our experiments is the AT&T face database [7]. This database contains 400 face images of 40 subjects, with each subject represented by ten different images. For some subjects, the images were taken at different times, with varied lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial features (glasses/no glasses). Fig. 7a shows some sample images from this database.
The performance of the proposed descriptor when retrieving different numbers of images from the database is shown in Fig. 8. Initially, only one image is retrieved, and the number of retrieved images is then increased by one at a time. A maximum of ten images are retrieved from this database. The APR of the proposed descriptor exceeds that of the other descriptors when ten images are retrieved.
The largest improvement, 30%, is over LBP, and the smallest, 2.1%, is over MBP. The proposed descriptor also improves on NI-LBP by 21.14%, CS-LBP by 12%, 6 × 6 MB-LBP by 11.77%, and LC-LBP by 16%. The APR% and the ARR% are presented in Table 3.
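The improvement percentages appear to be relative gains over each baseline. The sketch below assumes the formula (proposed - baseline) / baseline × 100 and uses the AT&T average precision/recall values reported in the Conclusion; both the formula and the name `relative_improvement` are assumptions made for this check, not something the paper states explicitly.

```python
def relative_improvement(baseline, proposed):
    """Relative gain of the proposed score over a baseline score, in percent."""
    return (proposed - baseline) / baseline * 100.0


# AT&T average precision/recall values (%) as reported in the Conclusion.
att_scores = {
    "LBP": 35.90,
    "NI-LBP": 38.55,
    "MBP": 45.70,
    "CS-LBP": 41.35,
    "6x6 MB-LBP": 41.78,
    "LC-LBP": 40.25,
}
mvm_lbp_score = 46.70

for name, score in att_scores.items():
    print(f"{name}: {relative_improvement(score, mvm_lbp_score):.2f}%")
```

Under this assumption the gains over LBP and NI-LBP come out at about 30.08% and 21.14%, in line with the figures quoted above.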

Experiment 2: faces94 facial images database [8]
The faces94 database [8] is the second database used in this paper. It contains 2980 color images of 149 people. Each subject has 20 images with considerable expression changes and minor changes in face position with head turn, tilt, and slant. For this experiment, we randomly selected 50 individuals with 20 images each, giving 1000 images in total. Fig. 7b shows some sample images from this database.
The performance of the proposed descriptor when retrieving different numbers of images from the database is shown in Fig. 9. Initially, two images are retrieved, and the number of retrieved images is then increased by two at a time. A maximum of twenty images are retrieved from this database. The APR of the proposed descriptor exceeds that of the other descriptors when twenty images are retrieved. The largest increase, 51.22%, is over LBP, and the smallest, 4.2%, is over 6 × 6 MB-LBP. The proposed descriptor also improves on NI-LBP by 19%, MBP by 16.27%, CS-LBP by 5.6%, and LC-LBP by 19.9%. The APR% and the ARR% are shown in Table 4. Table 5 summarizes the ARR obtained using the different distance similarity metrics discussed in subsection 4.2.

Computational cost
The performance of a CBIR system also depends on the size of the feature vector (#FV). The feature vectors generated by the proposed descriptor and by the state-of-the-art descriptors, except CS-LBP, have the same size, yet the proposed descriptor performs better than the other techniques. The feature vector size and ARR% of all the techniques on the faces94 database [8] are presented in Table 6. The table shows that the proposed descriptor improves on the existing descriptors.

Conclusion
A novel descriptor, MVM-LBP, has been proposed in this paper. The descriptor performs well in terms of precision and recall on two facial databases, the AT&T database [7] and the faces94 database [8]. The proposed MVM-LBP descriptor takes advantage of extra statistics when calculating the MVM-LBP codes, while the size of its feature vector remains the same as that of the state-of-the-art techniques. Different similarity metrics were used in the experiments, and the d1-distance metric gave the best results. For the proposed descriptor, the concluding points are as follows:
1. The proposed descriptor takes advantage of additional statistical measures when calculating the patterns.
2. On the AT&T database [7], the average precision/average recall increases from 35.90%, 38.55%, 45.70%, 41.35%, 41.78%, and 40.25% for basic LBP, NI-LBP, MBP, CS-LBP, 6 × 6 MB-LBP, and LC-LBP, respectively, to 46.70%.
3. On the faces94 database [8], the average precision/average recall increases from 48.04%, 61.01%, 62.48%, 68.77%, 69.72%, and 60.56% for basic LBP, NI-LBP, MBP, CS-LBP, 6 × 6 MB-LBP, and LC-LBP, respectively, to 72.65%.
Future work will focus on developing more efficient descriptors. Techniques such as PCA and LDA can also be used to reduce the size of the feature vectors.