Improvement of face recognition performance using a new hybrid subspace classifier

Multiple classification systems play an important role in increasing recognition performance, especially when heterogeneous classifiers are combined. In this study, a new hybrid classifier was designed using the heterogeneous Fisherface and discriminative common vector approach (DCVA) subspace recognition methods, both of which give successful results in face recognition. While the classification process of DCVA is based on the common properties of the signals belonging to the classes, the classification process of Fisherface is based on the different properties of the signals. To create the hybrid classifier, called the Hybrid DCVA-Fisherface, the classifiers' decision rules were combined using the Minimum Proportional Score Algorithm and the Recognition Update Algorithm. In addition to the proposed subspace classifiers, convolutional neural networks, Transfer Learning-Alexnet, Alexnet + SVM, and Alexnet + KNN were used for classification. Studies were conducted using the ORL, YALE, Extended YALE B and Face Research Lab London Set (FRLL) databases. To better examine the efficiency of the algorithms, tests were also carried out on downsampled images. When the experimental results were analysed, the proposed hybrid classifier gave higher recognition rates than all the other classifiers for ORL, YALE, and Extended YALE B. However, deep learning methods generally achieved better recognition performance than subspace classifiers for the FRLL database, which has more classes than the other databases.


Introduction
Traditionally, classifiers are applied separately in pattern recognition, and the classifier that gives the best recognition rate is selected. Alternatively, classification can be performed with groups of classifiers [1]. Multiple classifier systems (MCS) are presented as high-performance methods in pattern recognition. An MCS is constructed by combining several different classifiers. Its classification performance depends on the selected classifiers being combined correctly and making different errors. Methods that design pattern classification systems as hybrids of diverse classifiers are now regarded as a basic need.
Serkan Keser, skeser@ahievran.edu.tr

[...] method that implements a generalised version of the bagging and boosting algorithms that combine the decisions of various classifiers. Nweke et al. [7] implemented different data fusion and multiple classifier systems for human activity recognition. Mi et al. [8] proposed a nearest-farthest subspace (NFS) classifier that takes advantage of different properties of the class-specific subspace using the nearest subspace (NS) and farthest subspace (FS) classifiers. Rodriguez et al. [9] used the Rotation Forest method to construct ensembles of classifiers based on feature extraction. In that study, the feature set is randomly divided into K subsets, and principal component analysis (PCA) is applied to each subset to generate the training data for the base classifier. A multi-feature, multi-classifier system for SER is proposed in [10], where four convolutional neural networks (CNNs) and a traditional support vector machine (SVM) classifier are used to exploit emotional information in multiple features.
Apart from these studies, there are recent face recognition studies that give successful results in computer vision. For example, Liao and Gu [11] proposed a face recognition approach based on subspace extended sparse representation and discriminative feature learning, called SESRC & LDF. The experimental results show that SESRC & LDF achieves the highest recognition rates, outperforming many algorithms. In another study, they proposed a new subspace clustering method based on alignment and graph embedding (SCAGE) [12]. In SCAGE, they unify the image alignment process and the clustering subspace learning process based on low-rank and sparse representation. Liao et al. [13] created a graph-based adaptive and discriminative subspace learning method (GADSL), which gives successful face image clustering results.

Motivation and contribution
A general face recognition system includes feature extraction [14,15], classifier selection [16,17], and the classification rule [18]. Principal component analysis (PCA) [19,20] is one of the most common methods used for feature extraction. PCA transforms the original images from a high-dimensional space to a low-dimensional feature space to extract features. Fisher's linear discriminant analysis (FLDA) is one of the most common classifiers used in face recognition [21,22]. FLDA finds optimum basis vectors that maximise the between-class scatter and minimise the within-class scatter [23]. In other words, the optimum basis vectors are found using the difference subspace of the matrix formed by the product of the inverse of the within-class scatter matrix and the between-class scatter matrix. The Fisherface method is derived from FLDA and makes the within-class scatter matrix ($S_W$) non-singular using PCA; in this way, the optimal basis vectors can be found [23].
Another classifier that gives good results in face recognition is the Discriminative Common Vector Approach (DCVA), derived from the CVA [24][25][26][27][28]. The CVA uses the indifference subspace of the within-class scatter matrix ($S_W$) to find a unique vector containing the common properties of each class. This vector is called the common vector, and its dimension is equal to the dimension of the samples.
DCVA, on the other hand, performs classification using the basis vectors that maximise the scatter of these common vectors. For each class, DCVA finds a discriminative common vector whose dimension is one less than the number of classes. As a result, Fisherface is a classifier based on the difference subspace, and DCVA is a classifier based on the indifference subspace. During classification, when one of these classifiers misclassifies a test image according to the different (or common) properties, the other classifier can classify it correctly according to the common (or different) properties. Also, even if the two classifiers assign a test image to the correct class, they may have different recognition performances. Based on these ideas, a hybrid classifier was developed using the heterogeneous Fisherface and DCVA classifiers, combining the decision rules of the two classifiers and reducing the overall error rate of the system.
Classical classifiers project the test and training images into a subspace using basis vectors and compare them based on the Euclidean distance. The proposed method uses performance score values instead of Euclidean distances. The algorithm that finds these score values is called the Minimum Proportional Score Algorithm (MPSA). This algorithm finds a score value for each classifier and each test image, and the base classifier is selected according to the performance scores of the classifiers. The MPSA starts from the recognition results that the classifiers obtain with Euclidean distances. The performance values of the base classifier and the other classifier are then used to correct the incorrectly classified test images. The algorithm that performs this correction is called the Recognition Update Algorithm (RUA). After the update, the recognition rate increases if some of the test images misclassified by the base classifier are assigned to the correct class.
Four separate databases, ORL, YALE, Extended YALE B, and FRLL, were used in the experimental studies. In the test phase, leave-one-out cross-validation was used for YALE and ORL, threefold cross-validation was performed for Extended YALE B, and tenfold cross-validation was carried out for the FRLL database. It has been observed that the Hybrid DCVA-Fisherface (HDF) method with the proposed decision rule gives recognition rates between 0.25% and 20% higher than Fisherface and DCVA. In addition, the face recognition performances of deep learning methods such as CNN, Alexnet + SVM, Alexnet + KNN, and Transfer Learning-Alexnet (TrAlexnet) were compared with those of the subspace classifiers.

The classifiers used in the study
DCVA, Fisherface, HDF, CNN, Alexnet + SVM, Alexnet + KNN, and TrAlexnet classifiers used in this study are briefly summarised below.

DCVA classifier
The DCVA method first finds the common vector ($x^{i}_{com}$) of the ith class [24]. The common scatter matrix ($S_{com}$) is then found using the common vectors as follows,

$$S_{com} = \sum_{i=1}^{C} \left(x^{i}_{com} - \mu_{com}\right)\left(x^{i}_{com} - \mu_{com}\right)^{T} \qquad (1)$$

where $\mu_{com}$ indicates the mean vector of the common vectors, i is the index of the classes, and C is the number of classes. The eigenvectors corresponding to the nonzero eigenvalues of the matrix $S_{com}$ give the optimal projection vectors ($W_{opt} = [w_1\; w_2\; \ldots\; w_{C-1}]$) for the DCVA. The feature vectors can be written as follows [24]:

$$\Omega_i = W_{opt}^{T}\, x^{i}_{com} \qquad (2)$$

These vectors ($\Omega_i$) are called discriminative common vectors, whose dimensions are at most C-1. In the test phase, to classify the test signal $x_{test}$, the feature vector of the test signal is found by

$$\Omega_{test} = W_{opt}^{T}\, x_{test} \qquad (3)$$

where $\Omega_{test} \in \mathbb{R}^{(C-1)\times 1}$. The operations described above apply to the insufficient data case (M < n). In the sufficient data case (M > n), the covariance matrix has n nonzero eigenvalues, so the difference and indifference subspaces can be determined by estimation [27].
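The text above does not say how the common vectors were computed. One route commonly described in the CVA literature for the insufficient data case is Gram-Schmidt orthogonalisation of the within-class difference vectors: the common vector is what remains of a sample after its projection onto the difference subspace is removed. The following pure-Python sketch illustrates that route only; the function names and the toy data are illustrative assumptions, not the author's implementation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def common_vector(samples, tol=1e-12):
    """Common vector of one class (insufficient data case, assumed route).

    samples: list of equal-length feature vectors belonging to one class.
    The difference vectors x_k - x_1 are orthonormalised with Gram-Schmidt,
    and the projection of x_1 onto their span is subtracted from x_1.
    """
    ref = samples[0]
    basis = []  # orthonormal basis of the difference subspace
    for x in samples[1:]:
        b = [xi - ri for xi, ri in zip(x, ref)]
        for z in basis:
            c = dot(b, z)
            b = [bi - c * zi for bi, zi in zip(b, z)]
        norm = dot(b, b) ** 0.5
        if norm > tol:  # skip linearly dependent difference vectors
            basis.append([bi / norm for bi in b])

    com = list(ref)
    for z in basis:
        c = dot(ref, z)
        com = [ci - c * zi for ci, zi in zip(com, z)]
    return com


# Toy class: three 4-dimensional samples that share the last component.
toy_class = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]
toy_common = common_vector(toy_class)
```

For this toy class the result is (1/3, 1/3, 1/3, 2), and the same vector is obtained whichever sample is taken as the reference, which is the property that makes the common vector a class-level feature.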

Fisherface classifier
In Fisherface, the within-class ($S_W$) and between-class ($S_B$) scatter matrices are first obtained from the images in the training set. The between-class scatter matrix $S_B$ is calculated by

$$S_B = \sum_{i=1}^{C} N \left(\mu_i - \mu\right)\left(\mu_i - \mu\right)^{T} \qquad (4)$$

where N is the number of samples in a class, $\mu_i$ is the mean of the ith class, $\mu$ represents the mean of all classes, and C is the number of classes. The optimal set of basis vectors ($W_{opt}$) is determined using these matrices [23]:

$$W_{opt} = \arg\max_{W} \frac{\left|W^{T} S_B W\right|}{\left|W^{T} S_W W\right|} \qquad (5)$$

The C-1 eigenvectors corresponding to the largest eigenvalues of the matrix $S_W^{-1} S_B$ give the optimal basis vectors ($W_{opt}$). In other words, these basis vectors are obtained from the difference subspace of $S_W^{-1} S_B$. However, $S_W$ becomes singular if the number of images in the database is smaller than the dimension of the images. To solve this problem, the dimension of all image signals in the database is reduced to N-c by applying PCA (in the standard Fisherface formulation, N in this expression is the total number of training images and c is the number of classes). The $S_W$ matrix of the PCA-reduced signals is then non-singular, and the standard FLD defined by (5) is applied to reduce the dimension to C-1. All the signals in the training set are projected onto the optimum space using these basis vectors.

The projected signals of the ith class are denoted $\Omega_i \in \mathbb{R}^{(C-1)\times K}$, where K is the number of samples in the ith class. For classification, the test signal is first projected using $W_{opt}$ to obtain $\Omega_{test}$. In the test phase, the projected test signal ($\Omega_{test}$) is then assigned to the most appropriate class using the Euclidean distance measure.
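As a small worked instance of the Fisher criterion described above, the sketch below treats the special case of two classes with two-dimensional features. With C = 2 there is a single discriminant direction, proportional to $S_W^{-1}(\mu_1 - \mu_2)$, and because $S_W$ is non-singular here, the PCA step of Fisherface is not needed. This is a simplified illustration under those assumptions, not the Fisherface pipeline used in the study; all function names and data are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def within_scatter(classes):
    """Within-class scatter matrix S_W summed over all classes."""
    dim = len(classes[0][0])
    sw = [[0.0] * dim for _ in range(dim)]
    for samples in classes:
        mu = mean(samples)
        for x in samples:
            d = [xi - mi for xi, mi in zip(x, mu)]
            for r in range(dim):
                for c in range(dim):
                    sw[r][c] += d[r] * d[c]
    return sw


def fisher_direction_2d(class_a, class_b):
    """Two-class Fisher direction w = S_W^{-1} (mu_a - mu_b) for 2-D data."""
    sw = within_scatter([class_a, class_b])
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("S_W is singular; reduce the dimension first")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    ma, mb = mean(class_a), mean(class_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def classify(x, w, class_means):
    """Assign x to the class whose projected mean is nearest to its projection."""
    px = sum(wi * xi for wi, xi in zip(w, x))
    dists = [abs(px - sum(wi * mi for wi, mi in zip(w, m))) for m in class_means]
    return dists.index(min(dists))


class_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
class_b = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0]]
w = fisher_direction_2d(class_a, class_b)
means = [mean(class_a), mean(class_b)]
```

In the one-dimensional projected space the Euclidean distance reduces to an absolute difference, which is what `classify` uses.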

Hybrid DCVA-Fisherface (HDF) classifier
In the study, a hybrid classifier was built by combining the decision rules of DCVA and Fisherface. In the test phase, the recognition matrices (euc_class_1, euc_class_2) are found using the Euclidean distance measure of all test signals for each classifier. These matrices are essential to determine whether the test signals are assigned to the correct class. Two different algorithms have been proposed to obtain the hybrid classifier. In the first algorithm, MPSA, performance score values based on the recognition performances of the classifiers are found instead of the Euclidean distance criterion used for the classification. To achieve this, the classification matrices (euc_class_1, euc_class_2) are converted into performance score matrices ($SC^{1}_{ij}$, $SC^{2}_{ij}$). The MPSA determines which classifier is taken as the base classifier. After the base classifier is selected, the test signals that it classifies incorrectly are updated using the RUA according to the performance scores of the other classifier. The recognition rate increases if some errors are assigned to the correct classes. However, if no errors are corrected, the recognition rate of the hybrid classifier is equal to the recognition rate of the selected base classifier. The proposed hybrid classifier system is shown in Fig. 1.

Classically, a test signal is assigned to the class that gives the smallest Euclidean distance, and the Euclidean distances of the other classes are ignored. However, the Euclidean distances of the other classes also give information about the classifier's performance. For example, the sum of the ratios between the smallest Euclidean distance and the Euclidean distances of the other classes can provide a performance score value for a test signal. Based on this idea, the performance score was used for classification with the proposed MPSA.
With the MPSA, the classification is made according to the obtained performance score values instead of the Euclidean distance criterion. The classifier that gives the higher performance of the two is determined as the base classifier. For the test signals that the base classifier classifies incorrectly according to the Euclidean distance criterion, an attempt is made to assign them to the correct class using the performance scores of the other classifier and the base classifier. In Fig. 2, a test signal is projected into the optimum subspaces separately for the two classifiers, and the Euclidean distances obtained for n classes are shown as vectors according to their magnitudes.
The distance indicated by the red arrow represents the assignment to the correct class. The other arrows indicate the longer Euclidean distances belonging to the other classes; in other words, they are the magnitudes of the distances to the incorrect classes. Although classifier 1 and classifier 2 assign the test signal to the same correct class, their recognition performances differ, as seen in Fig. 2: the distance difference between the correct and false classes is large for classifier 1, whereas it is much smaller for classifier 2. In other words, classifier 1 gives a better recognition performance than classifier 2. With the proposed method, the performance score matrix ($SC^{p}_{ij}$) is obtained from the ratios between the smallest Euclidean distance and the distances of the other classes, where $d^{p}_{x,\min}$ is the smallest Euclidean distance between classes for the test signal x, i is the index of the classes, n is the number of classes, p is the classifier index, j is the index of the test signals, and k is the number of samples. The performance values of the classifiers are obtained using the MPSA. The performance score values are squared to make the difference between the scores clearer. Over all test signals, each classifier has an overall performance score matrix ($SC^{1}_{ij}$, $SC^{2}_{ij}$). Then the differences between the performance score values for the jth test signal of the ith class are found ($diff_{i,j} = SC^{2}_{ij} - SC^{1}_{ij}$). If this difference is positive, classifier 1 has the smaller score and therefore the better performance, and the difference is assigned to the performance value ($perf_1$) of classifier 1. If the difference is negative, classifier 2 has the better performance. In this case, the performance value ($perf_2$) of classifier 2 is assigned the absolute value of this difference. At the end of the algorithm, the $perf_1$ and $perf_2$ performance values are summed and assigned to $ts_1$ and $ts_2$, respectively.
If the total score 1 (ts 1 ) is higher than the total score 2 (ts 2 ), it indicates that the first classifier performs better than the second classifier for all test signals. As a result, the first classifier is selected as the base classifier; otherwise, the second classifier is chosen as the base classifier. The MPS algorithm is given below.
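The MPSA listing itself is not reproduced in this text, so the following is only a sketch reconstructed from the description above. The exact score formula is an assumption: here the score of a test signal is taken as the squared sum of the ratios between the smallest distance and the distances to all other classes, so that a smaller score means a clearer separation. The function names and the toy distances are hypothetical.

```python
def proportional_score(distances):
    """Assumed performance score for one test signal.

    distances: Euclidean distances from the projected test signal to each
    class (all assumed to be strictly positive). The score is the squared
    sum of d_min / d_i over every class except the nearest one.
    """
    nearest = min(range(len(distances)), key=distances.__getitem__)
    d_min = distances[nearest]
    ratio_sum = sum(d_min / d for i, d in enumerate(distances) if i != nearest)
    return ratio_sum ** 2


def select_base_classifier(dist1, dist2):
    """Pick the base classifier from per-signal class distances.

    dist1, dist2: one list of class distances per test signal, for
    classifier 1 and classifier 2 respectively.
    Returns (base, sc1, sc2, ts1, ts2), where base is 1 or 2.
    """
    sc1 = [proportional_score(d) for d in dist1]
    sc2 = [proportional_score(d) for d in dist2]
    ts1 = ts2 = 0.0
    for s1, s2 in zip(sc1, sc2):
        diff = s2 - s1
        if diff > 0:      # classifier 1 has the smaller (better) score
            ts1 += diff
        elif diff < 0:    # classifier 2 has the smaller (better) score
            ts2 += -diff
    base = 1 if ts1 > ts2 else 2
    return base, sc1, sc2, ts1, ts2


# Two test signals, three classes; classifier 1 separates the classes better.
dist1 = [[1.0, 10.0, 10.0], [2.0, 8.0, 4.0]]
dist2 = [[1.0, 2.0, 2.0], [2.0, 4.0, 4.0]]
base, sc1, sc2, ts1, ts2 = select_base_classifier(dist1, dist2)
```

In the toy data both classifiers pick the same nearest class for each signal, but the larger gap between the nearest and the remaining distances gives classifier 1 the lower scores, so it accumulates the whole total score and is selected as the base classifier.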
The base classifier is selected using the MPSA, and then the RUA is applied. The euc_class_1 and euc_class_2 matrices mentioned above are converted to the class_1 and class_2 matrices, which contain the value cc for each correctly classified test signal and fc for each incorrectly classified one. These matrices are of size n × k for n classes and k test signals.

Recognition update algorithm (RUA)
The RUA is based on the base classifier's recognition matrix (euc_class) obtained according to the Euclidean values. Misclassified parts in this matrix are updated according to the performance score matrices of the base classifier and the other classifier ($SC^{1}_{ij}$, $SC^{2}_{ij}$). For a test signal misclassified by the base classifier, the updated recognition matrix (updated_rec) gets the value of "cc (correct classification)" if the other classifier classifies it correctly with a better performance score; otherwise, the updated recognition matrix keeps the value of "fc (false classification)". The updated_rec matrix is of size n × k for n classes and k test images. The RUA algorithm is given below. To illustrate the RUA more clearly, the recognition matrices of the two classifiers (class_1 and class_2) are given in Fig. 3. The first classifier (classifier 1) is assumed to be selected as the base classifier using the MPSA. In this case, only the recognition matrix (class_1) of classifier 1 is taken as the reference.
As can be seen in Fig. 3, there are three errors in the class_1 matrix. The blue boxes in Fig. 3a, b indicate that both classifiers misclassify the same test signal. Therefore, this error cannot be assigned to the correct class. However, when the other two errors in the class_1 matrix of the base classifier are compared with the corresponding recognition values (fc or cc) and scores in the class_2 matrix, it is seen that classifier 2 has better performance scores than classifier 1 ($s^{2}_{4,6} < s^{1}_{4,6}$, $s^{2}_{5,5} < s^{1}_{5,5}$). As a result, these misclassified parts are assigned to the correct classes using the RUA, and the updated recognition matrix shown in Fig. 4 is obtained.
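The update rule illustrated with Figs. 3 and 4 can be sketched as follows. The RUA listing is not reproduced in this text, so this is a reading of the worked example rather than the author's code: an entry that the base classifier gets wrong is switched to "cc" only when the other classifier gets it right and has the smaller (better) score, and every other entry keeps the base classifier's value. The function names and the toy matrices are hypothetical.

```python
def recognition_update(base_rec, other_rec, base_sc, other_sc):
    """Return the updated recognition matrix (assumed RUA rule).

    base_rec, other_rec: matrices of "cc"/"fc" labels for the base and the
    other classifier. base_sc, other_sc: matching performance score
    matrices, where a smaller score is taken to mean better performance.
    """
    updated = []
    for b_row, o_row, bs_row, os_row in zip(base_rec, other_rec, base_sc, other_sc):
        row = []
        for b, o, bs, os_ in zip(b_row, o_row, bs_row, os_row):
            if b == "fc" and o == "cc" and os_ < bs:
                row.append("cc")  # error corrected by the other classifier
            else:
                row.append(b)     # keep the base classifier's result
        updated.append(row)
    return updated


def recognition_rate(rec):
    """Fraction of entries marked as correctly classified."""
    flat = [v for row in rec for v in row]
    return sum(v == "cc" for v in flat) / len(flat)


base_rec = [["cc", "fc", "cc"], ["fc", "cc", "fc"]]
other_rec = [["cc", "cc", "cc"], ["fc", "cc", "cc"]]
base_sc = [[0.1, 0.9, 0.2], [0.8, 0.1, 0.3]]
other_sc = [[0.2, 0.4, 0.3], [0.9, 0.2, 0.5]]
updated_rec = recognition_update(base_rec, other_rec, base_sc, other_sc)
```

Of the three base-classifier errors in this toy case, one is corrected, one is shared by both classifiers and cannot be corrected, and one is left unchanged because the other classifier's score is worse. The hybrid rate can therefore never fall below the base classifier's rate, which matches the behaviour described for the HDF classifier.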

The proposed CNN classifier
The proposed CNN model consists of three convolution layers, three max-pooling layers, and four regularisation layers. A total of 50 epochs with 30 iterations were applied, and the learning rate was set to 0.01. The adaptive moment estimation (Adam) optimiser was used as the solver for training the network. The layers of the proposed CNN model are shown in Fig. 5.

Transfer learning Alexnet model (TrAlexnet)
While CNN can give good results in face recognition for large databases [30], the pre-trained Alexnet CNN model can provide better results than a classical CNN for small databases [31]. Therefore, the pre-trained Alexnet CNN model was also used in the study. All images are resized to 227 × 227, and all greyscale images are converted to RGB to be used in Alexnet. In the MATLAB environment, Alexnet consists of 25 layers, and the last three layers (fully connected, softmax, and classification layers) are used to classify the features obtained from the previous layers. In transfer learning, the layers are transferred to the new classification task by removing the last three layers and adding a new fully connected layer according to the number of classes in the database. In this way, a fine-tuned deep transfer learning model for the problem is created. For this model, a learning rate of 0.0001, the stochastic gradient descent with momentum (SGDM) optimiser, and 20 epochs were used.

The proposed Alexnet + SVM and Alexnet + KNN
Apart from the TrAlexnet model, another deep learning model has been proposed in the study. This model uses some layers of Alexnet for feature extraction, and the extracted features are classified with machine learning algorithms such as KNN and SVM [31,32]. During the training phase, the 20th layer of the Alexnet model, fully connected-7 (fc7), was used to extract the features. In this way, 4096-dimensional features were obtained for each image and used to train the SVM and KNN classifiers. In the test phase, the test images were classified using the SVM and KNN classifiers. A linear kernel is used for the SVM, and K = 5 nearest neighbours are used for the KNN. In addition, the hyper-parameters chosen for TrAlexnet are used.
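The KNN stage described above can be sketched as a plain majority vote over the K = 5 nearest training features. The Euclidean metric, the function name, and the two-dimensional toy features (standing in for the 4096-dimensional fc7 activations) are assumptions made for illustration; the study itself used MATLAB.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=5):
    """Label a query feature by majority vote of its k nearest neighbours.

    Distances are Euclidean (an assumption; the source does not name the
    metric used for the KNN classifier).
    """
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy features for two subjects, four training images each.
train_features = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
]
train_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
```

With four training images per subject and k = 5, at least one vote always comes from another subject, so the choice of k interacts with the number of training images per class.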

Experimental results
The ORL, Extended YALE B, YALE, and FRLL databases were used in the study. The ORL database is a face database consisting of 400 images, with ten different images of each of 40 people. The YALE database has a total of 165 images belonging to 15 people. In the experiments, 150 images were used because two images of each person were similar. The cropped version of the Extended YALE B database contains 2414 images with a size of 192 × 168 over 38 subjects, with approximately 64 images per subject [29]. For the Extended YALE B database, 44 images were used for training and 20 for testing. In this way, threefold cross-validation was performed in the study. The FRLL database contains 1020 images with a size of 1350 × 1350 over 102 subjects, with 10 images per subject [33]. Using MATLAB code, each image in the FRLL database was cropped to the 800 × 800 face region.
Images are downsampled to create sufficient data cases. Downsampling was performed by a factor of two, and one of the four resulting images was used in the studies. The selected image was then downsampled again in the same way to reduce its size further. With this downsampling process, images of size 64 × 64, 32 × 32, 16 × 16, and 8 × 8 were used for ORL, and images of size 80 × 80, 40 × 40, 20 × 20, and 10 × 10 were used for YALE. Similarly, images of size 192 × 168, 96 × 84, 48 × 42, and 24 × 21 were used for Extended YALE B, and images of size 160 × 160, 80 × 80, 40 × 40, and 20 × 20 were obtained for the FRLL database. For the deep learning methods using Alexnet, the images were first reduced to the dimensions mentioned above and then resized to 227 × 227 pixels through MATLAB's augmented image datastore function. Experimental results were obtained using code written in the MATLAB environment. The results for the ORL and YALE databases are given in Tables 1 and 2, respectively. The + and - symbols next to the image dimensions indicate the sufficient and insufficient data cases, respectively.

Tables 1 and 2 show that the HDF classifier has higher recognition rates than the Fisherface and DCVA classifiers. In Table 2, the highest difference between the recognition rates of the HDF and the others was 2%, which was found using 40 × 40 images for the YALE database; for the ORL database, the highest difference is 0.5% in Table 1. As seen in Table 1, CNN has a lower recognition rate than the other classifiers.
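The repeated factor-of-two downsampling used to produce the smaller image sizes can be sketched as below. Downsampling by two in each direction splits an image into four sub-images, one per row and column offset. Which of the four was kept, and whether any filtering was applied first, is not stated in the text, so the default offset (0, 0) and the absence of filtering are assumptions. Images are represented as nested lists and the function names are illustrative.

```python
def downsample_by_two(image, row_offset=0, col_offset=0):
    """Keep every second row and column, starting at the given offsets.

    The four (row_offset, col_offset) pairs in {0, 1} x {0, 1} give the
    four sub-images; (0, 0) is an assumed default.
    """
    return [row[col_offset::2] for row in image[row_offset::2]]


def downsampling_pyramid(image, levels):
    """Return the original image followed by `levels` downsampled versions."""
    pyramid = [image]
    for _ in range(levels):
        image = downsample_by_two(image)
        pyramid.append(image)
    return pyramid


# Toy 8 x 8 image whose pixel value encodes its position (row * 8 + col).
toy_image = [[r * 8 + c for c in range(8)] for r in range(8)]
pyramid = downsampling_pyramid(toy_image, levels=2)
sizes = [(len(img), len(img[0])) for img in pyramid]
```

Applied three times to a 64 × 64 image, the same scheme gives the 32 × 32, 16 × 16, and 8 × 8 sizes listed for ORL.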
These tables and the tables below show that DCVA has low recognition rates for sufficient data cases. The main reason is that the difference and indifference subspaces cannot be distinguished precisely. In these tables, TrAlexnet gave the best recognition results among deep learning methods, but HDF has higher recognition rates than all classifiers. The results obtained using the Extended YALE B database are given in Table 3.
In Table 3, the CNN gave higher recognition rates than DCVA and Fisherface at some image sizes; however, the HDF has the highest recognition rate for the Extended YALE B database. While CNN and TrAlexnet obtained high recognition performance for both sufficient and insufficient data cases, Alexnet + SVM and Alexnet + KNN gave lower recognition performance than CNN and TrAlexnet. The recognition results of the classifiers for the FRLL database are given in Table 4. One of the reasons for choosing the FRLL database is that its number of classes is higher than that of the other databases. As the number of classes increases, the projections of the images of each class into the optimum subspace interfere more with each other, which negatively affects the recognition rate. The FRLL database was therefore chosen to examine the effects of this situation on subspace classifiers.
As can be seen from Table 4, deep learning algorithms have higher recognition rates than subspace classifiers for insufficient data cases. For the insufficient data case, TrAlexnet, CNN, and Alexnet + KNN gave better recognition performance than the other classifiers, while CNN and TrAlexnet obtained the best recognition performances for the sufficient data case. Among the subspace classifiers, HDF gave the best recognition performance for both sufficient and insufficient data cases, with recognition rates approximately 8% higher than Fisherface and 20% higher than DCVA. Moreover, DCVA achieved the lowest recognition rates for both sufficient and insufficient data cases. In addition, HDF gave better recognition results than Alexnet + SVM and Alexnet + KNN for sufficient data cases.
In Table 5, the selected base classifiers using the MPSA algorithm are given for all dimensions of the images.
When the selected classifiers are examined according to the tables above, it can be seen that the classifiers with the highest recognition rate were selected as the base classifier. When all the experimental results are examined, subspace classifiers can exceed the classification performance of deep learning methods for databases with few classes. However, in the FRLL database, where the number of classes is higher than in other databases, it has been observed that subspace classifiers give lower recognition performance than deep learning algorithms.

Conclusions
Hybrid classifiers have an essential role in pattern recognition. Using heterogeneous classifiers in hybrid form can give significantly higher recognition rates. For example, the fact that Fisherface and DCVA use different subspaces for classification increases the performance of the hybrid classifier, because a test image classified incorrectly according to the difference subspace can be classified correctly according to the indifference subspace, and vice versa. In the study, the base classifier with the best performance was selected using the MPSA. Then the recognition matrix was updated using the RUA. The experimental studies showed that the MPSA correctly chose as the base classifier the classifier with the highest recognition rate for all test signals and image sizes. It was also observed that the proposed hybrid classifier gives higher recognition rates than DCVA and Fisherface for all image dimensions. In addition, the recognition performances of the Alexnet + SVM, Alexnet + KNN, and TrAlexnet deep learning algorithms were compared with those of the subspace classifiers. According to this comparison, subspace classifiers generally have better recognition performance for databases with a small number of classes, such as YALE, ORL, and Extended YALE B. In contrast, deep learning algorithms achieve better recognition performance than subspace classifiers in both sufficient and insufficient data cases for databases with more classes, such as the FRLL database. This result for the subspace classifiers is related to the greater interference between the class signals projected into the optimum subspace as the number of classes increases. The proposed HDF subspace classifier has better recognition performance than DCVA and Fisherface in all experimental studies. Finally, the results show that HDF is a classifier with high recognition performance, especially for datasets with a small number of classes.
Author contributions
All the work done to create this article was carried out only by Keser.

Funding
No funding was received to assist with the preparation of this manuscript.

Data availability
The datasets used or analysed during the current study are available from the corresponding author upon reasonable request.

Ethical statement
I declare that all the principles of ethical and professional conduct have been followed while preparing the manuscript for publication in Signal, Image and Video Processing, and comply with Springer's ethical policies.