Improved Gait Recognition Accuracy Based on DFT-GEI

Person identification is a challenging task in computer vision. Identifying a person across different cameras is difficult because appearance changes with cofactors such as a change of clothes, a suitcase, or a backpack. The gait biometric can identify a person despite such cofactors and varying backgrounds: a person's gait can be identified at a distance, from the walking pattern, without any physical contact. In this work, videos are recorded using infrared and visible cameras at different locations in urban and rural environments. Pre-processing includes converting the recorded videos into frames, person detection using deep learning techniques, background subtraction, artifact removal, silhouette extraction, gait-cycle estimation, and synthesis of the frequency-domain gait energy image by averaging the silhouettes. Moving features extracted from the frequency-domain gait energy image and the gait energy image are dimensionally reduced by principal component analysis, recognized using different classifiers, and the results are compared. Experiments are conducted on urban and rural datasets recorded using Long Wave Infrared and Visible cameras.


I. Introduction
The goal of video surveillance is to track people across a network of cameras and detect abnormal behavior. Recognizing a person by gait is popular because it works at a distance from low-resolution videos or images, without the cooperation of individuals and without physical contact with instruments, whereas biometrics such as faces and fingerprints require close-range capture or contact with instruments. Gait can be recognized from moving features at a distance even while other features such as faces, ears, and fingerprints are hidden. Gait features are typically difficult to caricature. The most challenging problems in video gait recognition arise from variations in background, changes in daylight illumination, different cofactors, body shapes, and the person's pose and appearance.
Gait identification plays a major role in video-based wide-area surveillance [1,2]: finding terrorists in airports, stations, car parks, banks, and crowded places; helping law enforcement identify criminals; detecting health disorders such as the early stage of Parkinson's disease; and providing optimal training strategies in sports.
Thermal imaging is a benefit to armed forces such as the army, navy, and air force. Border surveillance and law enforcement operate in all weather conditions, day and night, and use infrared cameras equipped with thermal detectors. Infrared cameras capture the radiation emitted from objects, which are above absolute zero temperature. Thermal imaging is mainly used to locate moving objects, recognize targets, and distinguish friendly from enemy forces. Owing to these advantages, thermal imaging has many applications in the military and defence [3].
Object appearance and shape are characterized by the distribution of local intensity gradients or edge directions. These are computed by dividing the image window into cells and, for each cell, accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. Contrast normalization can be performed by accumulating a measure of energy over blocks and using the result to normalize all the cells in each block. The normalized descriptor blocks are referred to as Histogram of Oriented Gradients (HOG) descriptors. Dalal et al. [4] show that tiling the detection window with a dense grid of HOG descriptors and using the combined feature vector in a conventional SVM-based window classifier gives a human detection chain. Dalal et al. [5] build a detector that combines gradient-based appearance descriptors with differential optical-flow-based motion descriptors in a linear Support Vector Machine (SVM) framework to detect humans in challenging environments. The instance segmentation framework of [11] is simple and flexible: it covers instance segmentation, bounding-box object detection, and person keypoint detection, detecting objects in an image and generating a high-quality segmentation mask for each instance.
A static camera observing a region of interest is a common case for monitoring in a surveillance system.
Detecting objects in the region of interest is an essential step in analyzing the scene. A statistical model captures the regular behavior a scene exhibits. In background subtraction, pedestrians are detected in the scene when the full body fits the model. A Gaussian mixture model (GMM) was proposed for background subtraction in [12], and efficient update equations are given in [13]. In [14], the GMM is extended with a hysteresis threshold. Compared with traditional methods [15], [16], the kernel-based GMM approach is simpler, requires less processing time, and segments better. The GMM gives a compact representation and a good model for simple static scenes.
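The per-pixel statistical idea behind these models can be illustrated with a simplified single-Gaussian running background model (a minimal NumPy sketch, not the full mixture of [12], [13]; the frame size, learning rate, and threshold are illustrative):

```python
import numpy as np

def update_background(mean, var, frame, alpha=0.05, k=2.5):
    """One step of a simplified per-pixel Gaussian background model.

    A pixel is foreground when it deviates from the running mean by more
    than k standard deviations; background statistics are updated with
    learning rate alpha (cf. the running-average updates used in GMMs).
    """
    diff = np.abs(frame - mean)
    foreground = diff > k * np.sqrt(var)
    bg = ~foreground
    # Selective update: only background pixels refresh the model.
    mean[bg] += alpha * (frame[bg] - mean[bg])
    var[bg] += alpha * (diff[bg] ** 2 - var[bg])
    return mean, var, foreground

# Synthetic example: static background with one moving bright blob.
rng = np.random.default_rng(0)
bg_level = 50.0
mean = np.full((40, 40), bg_level)
var = np.full((40, 40), 4.0)
for t in range(20):
    frame = bg_level + rng.normal(0, 2, (40, 40))
    frame[10:15, t:t + 5] = 200.0          # moving object
    mean, var, fg = update_background(mean, var, frame)
```

The selective update is what keeps slowly varying backgrounds adapted while the moving person stays foreground.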
Visual Background Extractor (ViBe) is another background subtraction method, proposed in [17].
This method requires minimal memory compared with other background subtraction techniques. It compares the current pixel value with stored neighborhood samples to determine whether the pixel belongs to the background and updates the model by substituting values from the background. Finally, part of the background pixel value is propagated to the neighboring background pixels.
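The sample-based classification and random update steps of ViBe can be sketched as follows (a simplified NumPy illustration of [17]; the neighbor-propagation step is omitted, and all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

N_SAMPLES, RADIUS, MIN_MATCHES, SUBSAMPLE = 20, 20, 2, 16

def vibe_classify(samples, frame):
    """A pixel is background if enough stored samples lie within RADIUS."""
    matches = (np.abs(samples - frame[None]) < RADIUS).sum(axis=0)
    return matches < MIN_MATCHES          # True = foreground

def vibe_update(samples, frame, foreground):
    """Randomly replace one stored sample for some background pixels."""
    h, w = frame.shape
    update = (~foreground) & (rng.integers(0, SUBSAMPLE, (h, w)) == 0)
    idx = rng.integers(0, N_SAMPLES, (h, w))
    ys, xs = np.nonzero(update)
    samples[idx[ys, xs], ys, xs] = frame[ys, xs]
    return samples

# Initialise the model from a background frame plus noise.
bg = np.full((30, 30), 80.0)
samples = bg[None] + rng.normal(0, 3, (N_SAMPLES, 30, 30))
frame = bg.copy()
frame[5:10, 5:10] = 220.0                 # intruding object
fg = vibe_classify(samples, frame)
samples = vibe_update(samples, frame, fg)
```

Because each pixel keeps only N_SAMPLES past values and updates stochastically, the memory footprint stays small, which matches ViBe's main selling point above.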

II. Methodology
The gait video data were collected from two different locations, in urban and rural environments. The data were collected from volunteers of different ethnicities and religions, with body shapes ranging from slim to heavy.
The participants included both men and women; volunteers wearing different clothing, coats, a case, and a backpack are considered in this analysis. Walks along straight lines perpendicular to the camera view axis in the urban and rural environments were recorded using Longwave Infrared (LWIR) and Visible cameras. The rural data consist of 24 subjects and the urban data of 31 subjects. Two walking sequences were captured per subject: Right to Left and Left to Right. In this work, we consider the Left to Right walking sequences.

Human detection
The algorithms used for human-based detection are HOG, YOLO and Mask-RCNN. The YOLO-based object detection outperforms other methods.
a. HOG
HOG is a feature descriptor for object detection. The following steps are required to calculate HOG for an object:
1. Image normalization to reduce the influence of illumination effects.
2. Computing the gradient image in x and y to add further resistance to illumination variations.
3. Computing gradient histograms to provide resistance to small changes in pose or appearance.
4. Normalizing across blocks to provide better invariance to illumination, shadowing, and edge contrast.
5. Flattening into a feature vector.
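The five steps above can be sketched as a minimal NumPy implementation (cell size, bin count, and the test image are illustrative; a production HOG would also interpolate votes between bins and cells):

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    """Minimal HOG sketch: gradients -> per-cell orientation histograms
    -> L2 block normalisation (2x2 cells) -> flattened feature vector."""
    img = img.astype(float) / 255.0                      # 1. normalise
    gy, gx = np.gradient(img)                            # 2. gradients
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180           # unsigned bins
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):                                  # 3. histograms
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = (a / (180 / bins)).astype(int) % bins
            for k in range(bins):
                hist[i, j, k] = m[b == k].sum()
    blocks = []
    for i in range(ch - 1):                              # 4. block norm
        for j in range(cw - 1):
            v = hist[i:i+2, j:j+2].ravel()
            blocks.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(blocks)                        # 5. flatten

demo = np.tile(np.arange(32), (32, 1)) * 8.0             # horizontal ramp
feat = hog_features(demo)
```

For a 32x32 image with 8-pixel cells, this yields 3x3 overlapping blocks of 36 values each, i.e. a 324-dimensional descriptor.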

b. You Only Look Once
A single convolutional neural network predicts bounding boxes, class labels, and probabilities directly from full images in one evaluation. The main advantage of YOLO is that it is extremely fast and makes predictions that are comparatively better than traditional object detection methods. YOLO makes fewer than half the background errors, false positives, and false negatives of comparable methods. In YOLO, the detected bounding box fits the object tightly, at approximately the same size as the object. The limitation of YOLO is that it imposes strong spatial constraints and struggles to generalize to objects in new aspect ratios or configurations.

c. Mask R-CNN
Mask R-CNN extends the bounding-box recognition of Faster R-CNN to instance segmentation.
Mask R-CNN detects objects and generates a segmentation mask for each instance. The results of HOG, YOLO, and Mask R-CNN are shown in Fig. 1.
The bounding box of HOG is larger than the object, and its false positives and false negatives are relatively higher than YOLO's. The instance segmentation of Mask R-CNN shows a rectangular effect. YOLO outperforms the other methods, with fewer false positives and false negatives and a compact bounding box around the object.

Background Subtraction
The quality of background subtraction was compared using the GMM and ViBe methods. The results of both methods are shown in Fig. 2. The figure shows that the ViBe results are comparatively better, with fewer artifacts and less clutter than GMM.

Silhouettes Extraction
The data for each subject are divided into four groups: normal, coat, bag, and suitcase. The silhouettes for the normal data consist of 12 sequences, six walking from Left to Right and six walking from Right to Left. The coat, bag, and suitcase data consist of four sequences each, two walking from left to right and two walking from right to left. In this work, Left to Right walking sequences are considered for gait analysis. The silhouette data are divided into training and testing sets.
The training data set consists of four sequences of normal silhouettes. The testing data sets consist of two sequences of normal, coat, bag, and suitcase. The silhouettes are shown in Fig. 3.

Gait Energy Image
The spatio-temporal silhouettes are averaged over the gait cycle to compute the Gait Energy Image (GEI).
The GEI are shown in Fig. 4.
The Gait Energy Image is defined as

G(x, y) = (1/N) * sum_{n=1}^{N} B(x, y, n),

where B(x, y, n) is the pre-processed binary gait silhouette, N is the number of frames in a gait cycle, n is the frame number, and x and y are the 2-D image coordinates.
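Concretely, the GEI is a per-pixel temporal average (a NumPy sketch with an illustrative synthetic silhouette sequence):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI: average the binary silhouettes over one gait cycle,
    G(x, y) = (1/N) * sum_n B(x, y, n)."""
    return np.mean(silhouettes, axis=0)

# Synthetic cycle: 10 binary frames of a slightly shifting rectangle.
frames = np.zeros((10, 64, 44), dtype=np.uint8)
for n in range(10):
    frames[n, 10:54, 15 + n % 3:30 + n % 3] = 1
gei = gait_energy_image(frames)
```

Pixels covered in every frame (the torso) approach 1, while pixels covered intermittently (swinging limbs) take intermediate values, which is what gives the GEI its motion-encoding gray levels.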

Discrete Fourier Transform
The amplitude spectra of Gait Silhouette Volume (GSV) are calculated by Discrete Fourier Transform (DFT) analysis based on the gait period.
A(x, y, k) = (1/N) * | sum_{n=0}^{N-1} g(x, y, n) * exp(-j * w0 * k * n) |,

where A(x, y, k) is the amplitude for the temporal axis, g(x, y, n) is the GSV, N is the number of frames in a gait cycle, w0 = 2*pi/N is the base angular frequency for a gait cycle, and k is the frequency component. The DFT analysis of the gait period is shown in Fig. 5.
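The temporal amplitude spectra can be computed with an FFT along the frame axis (a NumPy sketch; the synthetic "blinking" sequence is illustrative, and with the 1/N normalisation the k = 0 component reproduces the GEI):

```python
import numpy as np

def dft_gei(silhouettes, k=1):
    """Amplitude of the k-th temporal frequency component of the GSV:
    A(x, y, k) = (1/N) * |sum_n g(x, y, n) * exp(-j * w0 * k * n)|."""
    gsv = silhouettes.astype(float)
    spectrum = np.fft.fft(gsv, axis=0)    # DFT along the temporal axis
    n_frames = gsv.shape[0]
    return np.abs(spectrum[k]) / n_frames

frames = np.zeros((8, 16, 16))
frames[::2, 4:12, 4:12] = 1.0             # blinking square, period 2
amp0 = dft_gei(frames, k=0)               # k = 0 reproduces the GEI
amp4 = dft_gei(frames, k=4)               # Nyquist bin captures period-2 motion
```

Higher-order amplitude images emphasize periodic limb motion that the plain temporal average smooths away.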

Principal Component Analysis
Principal Component Analysis (PCA) reduces data by geometrically projecting them from a higher-dimensional space to lower-dimensional features. By projecting, PCA simplifies the complexity of high-dimensional data while retaining trends and patterns. The gait sequences are represented as GEI and DFT-GEI; gait recognition is then performed by matching each testing sample to the training sample with the minimal distance to the testing GEI or DFT-GEI. PCA projects the original features to a lower-dimensional subspace so that good data representation and class separability can be achieved simultaneously. The reduced-dimension features are used for gait recognition by the classifiers.
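A minimal PCA projection via the SVD might look as follows (a NumPy sketch; the sample count and feature dimension are illustrative):

```python
import numpy as np

def pca_project(X, n_components):
    """Project feature vectors (rows of X) onto the top principal
    components, computed from the SVD of the mean-centred data."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]        # principal axes (rows)
    return Xc @ components.T, components, mean

# Flattened GEIs as rows: 12 samples of a 64*44 = 2816-dim feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2816))
Z, comps, mu = pca_project(X, n_components=5)
```

Test samples are centred with the stored training mean `mu` and projected with `comps` before nearest-neighbour matching, so train and test live in the same subspace.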

III. Classifiers
In this work, the K-Nearest Neighbour, Random Forest, Naïve Bayes, Linear Discriminant Analysis, Support Vector Machine, and Linear Regression classifiers are analysed for recognition.

K-Nearest Neighbour
The K-Nearest Neighbour classifier assigns the class of the nearest neighbours, considering more than one neighbour. Classification is based directly on the training examples; as a memory-based classifier, the training examples must be held in memory at run-time.
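A minimal k-NN classifier over reduced gait features can be sketched as follows (NumPy; the toy 2-D data are illustrative):

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Classify a query by majority vote among its k nearest
    training samples (Euclidean distance in feature space)."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Two well-separated classes in a 2-D feature space.
train_X = np.array([[0.0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
train_y = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(train_X, train_y, np.array([0.5, 0.5]))
pred_far = knn_predict(train_X, train_y, np.array([10.5, 10.5]))
```

There is no training step beyond storing the examples, which is exactly the memory cost noted above.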

Random Forest
Random forest is a machine learning algorithm used in classification tasks. It is an ensemble of tree-structured classifiers: many decision trees are constructed in the training stage, and each gives a unit vote in the classification stage. It is robust, fast, and identifies non-linear patterns in the data. The voting strategy corrects an undesired property of individual decision trees; a major advantage of random forest is its resistance to overfitting the training data.

Naïve Bayes
The Naive Bayes (NB) classifier is a simple learning algorithm based on Bayes' theorem, in which each feature makes an independent contribution to the target class. Because the features are assumed not to interact, each contributes to the class independently, which simplifies and speeds up computation.
The NB classifier performs well on large, high-dimensional datasets.
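A Gaussian Naive Bayes classifier over continuous features can be sketched as follows (NumPy; the toy data are illustrative):

```python
import numpy as np

def gaussian_nb_fit(X, y):
    """Per-class feature means/variances plus priors; features are
    treated as conditionally independent given the class."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
    return stats

def gaussian_nb_predict(stats, x):
    """Pick the class with the highest log posterior: the sum of
    per-feature Gaussian log-likelihoods plus the log prior."""
    best, best_score = None, -np.inf
    for c, (mu, var, prior) in stats.items():
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        score = ll + np.log(prior)
        if score > best_score:
            best, best_score = c, score
    return best

X = np.array([[1.0, 1], [1.2, 0.9], [0.8, 1.1], [5, 5], [5.2, 4.9], [4.8, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
model = gaussian_nb_fit(X, y)
pred = gaussian_nb_predict(model, np.array([1.1, 1.0]))
pred_far = gaussian_nb_predict(model, np.array([5.0, 5.0]))
```

The per-feature sums in the log domain are why NB stays cheap even at high dimensionality.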

Linear Discriminant Analysis
LDA is a discriminant approach used to classify patterns between two classes. LDA projects the data onto a direction that maximizes the separation of the two categories: the projection maximizes the ratio of between-group variance to within-group variance. When this ratio is maximal, the samples within each group have the smallest possible scatter, so the groups are separated from one another the most.
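For two classes, the Fisher criterion gives a closed-form projection direction (a NumPy sketch with illustrative synthetic classes):

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher LDA: w proportional to Sw^{-1} (mu1 - mu0)
    maximises between-class over within-class scatter along w."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], 0.5, size=(50, 2))
X1 = rng.normal([3, 3], 0.5, size=(50, 2))
w = fisher_lda_direction(X0, X1)
# Projections of the two classes should barely overlap.
p0, p1 = X0 @ w, X1 @ w
```

A threshold midway between the projected class means then separates the groups, matching the ratio-maximisation argument above.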

Support Vector Machine
Support Vector Machines (SVMs) are supervised learning methods popular for classification. An SVM is a linear classifier that builds a hyperplane to separate data in a high-dimensional or infinite-dimensional space. Support vectors are the training samples closest to the classification margin. The goal of the SVM is to maximize the margin between the hyperplane and the support vectors.

Linear Regression
Linear regression (LR) is a fundamental regression algorithm that predicts one variable from the other when the two have a linear relationship. A straight line summarizes a linear relationship between two variables.
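The straight-line fit can be computed with ordinary least squares (a NumPy sketch; the data are illustrative):

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for y ~ a*x + b via np.linalg.lstsq."""
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

x = np.array([0.0, 1, 2, 3, 4])
y = 2.0 * x + 1.0          # exact linear relationship
a, b = fit_line(x, y)
```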

IV. Results and Discussions
The experimental results of DFT-GEI are shown in Table I. The recognition accuracy for normal data is higher than for the bag, coat, and briefcase video sequences. The experimental results of GEI are shown in Table II. In the normal vs. normal classification, 100% accuracy is achieved on the visible urban DFT-GEI dataset, and 95% on the visible rural GEI dataset. The results show that the visible dataset outperforms the LWIR dataset for DFT-GEI.

V. Conclusion
Video surveillance plays a major role and is part of everyday life for security reasons. In public places, identifying a person across different cameras is a challenge when the individual changes appearance. This paper shows that the gait biometric can identify a person at a distance, even with changes in appearance. In this work, person re-identification is analyzed using gait moving-feature extraction. The features extracted from GEI and DFT-GEI are dimensionality-reduced using PCA and recognized using different classifiers. The DFT-GEI recognition rate is higher than that of GEI. In the future, the work could be extended using soft biometric features together with traditional and part-based biometric features.

Declarations
Author's Contribution Lavanya wrote, revised, and edited the manuscript.

Author's information
Lavanya is a research fellow at the University of Southampton, United Kingdom. Her research areas of interest include biometrics, image processing, and medical image processing.

Funding
Not applicable.

Availability of data and materials
Materials used in the manuscript may be requested from the corresponding author.

Competing interests
The author declares that she has no competing interests and that there is no conflict of interest regarding the publication of this manuscript.
Author details