Deep Transfer Learning Based Human Activity Recognition By Transforming IMU Data To Image Domain Using Novel Activity Image Creation Method

Human Activity Recognition has been one of the most popular research areas in pervasive computing in recent years. Sensor data plays a vital role in identifying many human actions. Convolutional Neural Networks (CNNs) have become the dominant technique in computer vision, but it is still uncommon to use CNNs on sensor data, particularly in ubiquitous and wearable computing. In this paper, we propose transforming raw accelerometer and gyroscope sensor data to the visual domain using our novel activity image creation method (NAICM). A pre-trained CNN (AlexNet) is then applied to the converted image-domain information. The proposed method is evaluated on several publicly available human activity recognition datasets. The results show that the proposed novel activity image creation method (NAICM) successfully creates activity images, achieving a classification accuracy of 98.36% using the pre-trained CNN.


INTRODUCTION
Human activity recognition is a popular research area in ubiquitous computing, human-computer interaction and human behavior analysis. Activity recognition is very helpful in areas like Ambient Assisted Living and elderly care (Mohammed & Sangavi, 2019). Recent research on wearable and ubiquitous sensors shows that these sensors can be used to sense data for human actions, such as construction workers' activity recognition (Akhavian & Behzadan, 2016) and fall detection (Alvarez de la Concepción et al., 2017; Jansi & Amutha, 2020). Enormous amounts of data have been obtained from the sensors present in pervasive computing environments. Sensors like the accelerometer and gyroscope play a major role in human action recognition (Mohammed Hashim & Amutha, 2020). Machine learning and data mining methods have been used to extract vital information from raw sensor data (Chetty et al., 2015); this vital information consists of features, usually numerical statistical data. These features are reliable for classification, segmentation and recognition.
Deep learning is extremely successful in many domains. Convolutional neural networks (CNNs) are now used for practical tasks and have gained popularity after achieving exceptional accuracy on image classification tasks (He et al., 2016; Szegedy et al., 2017; Zhang et al., 2017). Deep CNNs have also been used to recognize human activities (Long et al., 2015). One limitation of these techniques is the requirement of large labeled datasets to train the deep neural network. The computer vision community has created large labeled datasets like ImageNet (Russakovsky et al., 2015) for object recognition and classification. For sensor data, only a few labeled datasets are available, since the scope is limited compared to general images. This limitation can be overcome by using transfer learning.
Transfer learning uses a pre-trained CNN: the last fully connected layer is eliminated and the activations of the last hidden layer are taken as feature descriptors for the input datasets; these feature descriptors are then used to train the classification model. Pre-trained convolutional neural networks like AlexNet (Krizhevsky et al., 2012) and VGGNet (Simonyan & Zisserman, 2014) have been transferred by fine-tuning for semantic segmentation tasks. Transfer learning removes the need to create a large dataset for training the CNNs, saving additional costs such as time and computational resources. Conventionally, transfer learning is used on visually interpretable domains. Images such as flowers, vegetables, animals and biological medical images like X-ray and MRI scans fall into the category of images that are easily visually interpretable. The application of deep transfer learning to these images is feasible mainly for two reasons: first, they are visually interpretable, and second, a very large number of datasets is available to carry out the tasks. However, sensor data is not visually interpretable, and it is unclear whether it can be made so. Such data is called not visually interpretable data (Singh et al., 2017). Only a few works have converted IMU sensor data to images for activity recognition, and the scope is wide open in this area.
In this paper, we introduce the concept of using transfer learning on a domain that is not visually interpretable.
We propose a conversion technique called the novel activity image creation method (NAICM), which converts data from a domain unsuitable for transfer learning into one on which CNNs can be trained. The raw sensor data is converted into a group of visual images using NAICM, which are then used as the input dataset for a pre-trained convolutional network model (AlexNet).
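The paper does not spell out the NAICM transformation in code, but the general idea of turning a fixed-length window of multi-channel IMU samples into a CNN-sized image can be sketched as follows. This is an illustrative sketch only, not the authors' exact method: the per-channel normalization, row tiling and nearest-neighbour resampling are our assumptions, and `signals_to_activity_image` is a hypothetical name.

```python
import numpy as np

def signals_to_activity_image(window, out_size=227):
    """Illustrative sketch: map a window of IMU samples (n_samples x n_channels,
    e.g. 3-axis accelerometer + 3-axis gyroscope = 6 channels) to an image.
    Each channel is min-max normalised to [0, 255], channels are stacked
    row-wise and tiled, then resampled to out_size x out_size x 3."""
    w = window.T.astype(float)                      # n_channels x n_samples
    mins = w.min(axis=1, keepdims=True)
    rng = np.ptp(w, axis=1, keepdims=True)
    rng[rng == 0] = 1.0                             # guard flat channels
    norm = (w - mins) / rng * 255.0                 # each row in [0, 255]
    # Tile rows until the image is at least out_size tall, then crop.
    reps = int(np.ceil(out_size / norm.shape[0]))
    img = np.tile(norm, (reps, 1))[:out_size, :]
    # Nearest-neighbour resample the time axis to out_size columns.
    cols = np.linspace(0, norm.shape[1] - 1, out_size).astype(int)
    img = img[:, cols]
    # Replicate across 3 channels so a pre-trained CNN accepts the input.
    return np.repeat(img[:, :, None], 3, axis=2).astype(np.uint8)

window = np.random.randn(128, 6)                    # 128 samples, 6 channels
img = signals_to_activity_image(window)
print(img.shape)                                    # (227, 227, 3)
```

The 227×227×3 output matches the AlexNet input size used later in the paper.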

ACTIVITY DATASETS
We have used 3 activity recognition datasets, namely SKODA, UCI HAR and USC-HAD, to assess the adaptability of our proposed algorithm. For comparison with other works, we have chosen the USC-HAD dataset, since most existing works have used it. For this reason, we discuss the USC-HAD dataset in more detail than the other two.

SKODA dataset
The Skoda Mini Checkpoint dataset contains the activities of assembly-line workers in a vehicle maintenance environment. The dataset was generated by having a worker wear 20 accelerometers on both arms. It contains 10 different activities, some of which are "write on notepad", "close hood" and "check trunk gaps". We limit our experiments to the 10 sensors placed on the right arm.

UCI HAR dataset
The UCI HAR dataset (Mohammed Hashim & Amutha, 2020) consists of labeled data from thirty human subjects. In total, 6 different activities are performed, including "Standing", "Lying down" and "Walking upstairs". The accelerometer and gyroscope sensor data were collected using a smartphone. The dataset also provides features extracted from the sensor data.
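Datasets of this kind are typically segmented into fixed-length, overlapping windows before feature extraction or image conversion. A minimal sketch of that preprocessing step, assuming a 128-sample window with 50% overlap (the values used by the UCI HAR collection protocol; `sliding_windows` is our illustrative name):

```python
import numpy as np

def sliding_windows(data, window=128, overlap=0.5):
    """Segment a continuous sensor stream (n_samples x n_channels) into
    fixed-length windows with the given fractional overlap."""
    step = int(window * (1 - overlap))
    return np.stack([data[i:i + window]
                     for i in range(0, len(data) - window + 1, step)])

stream = np.random.randn(1024, 6)      # e.g. 6-channel accelerometer + gyro
segs = sliding_windows(stream)
print(segs.shape)                      # (15, 128, 6)
```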

USC HAD dataset
The human activity dataset USC-HAD (Zhang & Sawchuk, 2012) is discussed here. To construct an efficient recognition system, it has to be trained on a large, diverse set of individuals. For human activities, the diversity of the enrolled subjects in the dataset is based on the following 4 factors: (1) Gender, (2) Age, (3) Weight and (4) Height.
Based on these guidelines, 14 human subjects (7 female, 7 male) were chosen to participate in the data collection process. Table 1 lists the statistics of height, age and weight.
The diversity in each of these 4 factors covers a wide range of the population.

PRE-TRAINED CNN
AlexNet, shown in Figure 1, is much larger than previous CNNs utilized for computer vision.
It consists of 60 million parameters and 650,000 neurons, and it took more than 5 days to train on two GTX 580 3GB GPUs. AlexNet consists of cascaded stages: convolution, pooling, ReLU and fully connected layers. In particular, AlexNet contains five convolutional layers, some of which are followed by pooling layers, with three fully connected layers after the fifth convolutional layer. The convolutional kernels are learned during the back-propagation optimization process by minimizing the overall cost function with the stochastic gradient descent algorithm. A convolutional layer slides its convolutional kernels over the input feature maps to produce convolved feature maps. The pooling layer then aggregates the information within a neighborhood window of the convolved feature maps, using max or average pooling. The image F with frequency components has been resized to 227×227×3 to make it suitable for AlexNet.
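The cascaded convolution → ReLU → pooling stages described above can be illustrated with a toy single-channel example. This is a didactic NumPy sketch of the operations, not AlexNet itself (which has many channels and learned kernels):

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution (cross-correlation, as in CNNs) of a
    single-channel map x with kernel k."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    """Element-wise rectified linear activation."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size neighborhoods."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

x = np.random.randn(8, 8)       # toy single-channel input feature map
k = np.random.randn(3, 3)       # toy 3x3 convolutional kernel
y = max_pool(relu(conv2d(x, k)))
print(y.shape)                  # (3, 3): 8x8 -> conv 6x6 -> pool 3x3
```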

RESULTS
We have used a 64-bit operating system, 32 GB RAM, an Intel Core i5 processor, a GPU and MATLAB 2019b for our simulation work. The accuracy obtained for all the datasets using our proposed method is reported in Table 4. The results show that the position of the sensor plays a vital role in obtaining better accuracy. From the results, it can be observed that sensors placed on the hands give better accuracy than sensors placed at other positions like the hip and waist.
We have used three different train-test ratios: 7:3, 8:2 and 9:1.
Table 4 Accuracy obtained from all three datasets using NAICM.
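A shuffled split at these three ratios can be sketched as follows (an illustrative helper, not the paper's exact protocol; `train_test_split` is our own minimal implementation):

```python
import numpy as np

def train_test_split(x, y, train_ratio, seed=0):
    """Shuffle the samples and split them into train/test sets
    at the given ratio."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    cut = int(len(x) * train_ratio)
    tr, te = idx[:cut], idx[cut:]
    return x[tr], y[tr], x[te], y[te]

x = np.arange(100).reshape(100, 1)          # 100 toy samples
y = np.arange(100) % 12                     # 12 activity classes, as in USC-HAD
for ratio in (0.7, 0.8, 0.9):               # the 7:3, 8:2 and 9:1 ratios
    xtr, ytr, xte, yte = train_test_split(x, y, ratio)
    print(ratio, len(xtr), len(xte))
```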

Comparison of our proposed NAICM with a similar image conversion algorithm
Table 5 shows the accuracy comparison between our proposed NAICM and a similar image conversion algorithm (Jiang & Yin, 2015). The results show that our proposed method outperforms the existing method by nearly 0.57% on the UCI dataset and 0.22% on the USC HAD dataset.

Comparison of results using USC HAD dataset with other works
In this section, we discuss the results obtained for our proposed method using the USC HAD dataset and compare them with other works using the same dataset. Several quantitative metrics have been used for evaluation: 1. Sensitivity, 2. Precision, 3. F-Score, 4. Accuracy, 5. Specificity.
Table 6 USC HAD dataset - Confusion matrix for classification of different activities.
The following parameters can be calculated from the confusion matrix. True Positives (TP):

the total number of positive instances that were correctly predicted as positive. True Negatives (TN): the total number of negative instances that were correctly predicted as negative. False Positives (FP): the total number of negative instances that were incorrectly predicted as positive.
False Negatives (FN): the total number of positive instances that were incorrectly predicted as negative. The most common standard metrics used to analyze classification performance are sensitivity, precision, F-score, accuracy and specificity (Jansi & Amutha, 2018).
These are obtained using the following equations. Sensitivity S measures the proportion of correctly classified positive instances:

S = TP / (TP + FN)

Specificity measures the proportion of correctly classified negative instances:

Specificity = TN / (TN + FP)

Precision P (the positive predictive value) is the proportion of predicted positive instances that are truly positive:

P = TP / (TP + FP)

The F-score F is determined by finding the harmonic mean of recall (sensitivity) and precision:

F = 2 × P × S / (P + S)

Table 6 shows the obtained confusion matrix for classification. From the confusion matrix, we can calculate all the performance metrics.
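The metrics above follow directly from the four confusion-matrix counts; a minimal sketch (with `metrics_from_counts` as an illustrative name and arbitrary toy counts):

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Standard classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall / true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    precision   = tp / (tp + fp)            # positive predictive value
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    f_score     = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, precision, accuracy, f_score

# Toy counts for illustration only (not taken from the paper's Table 6).
print(metrics_from_counts(tp=90, tn=85, fp=15, fn=10))
```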

CONCLUSION
In this work, we have proposed a novel activity image creation method to convert sensor data into images. We have used three activity recognition datasets available online. Raw accelerometer and gyroscope sensor signals are taken, and the novel activity image creation method is used to obtain a newly defined activity image suitable for a pre-trained CNN. We compared our proposed method with existing works. The proposed method achieves excellent performance in terms of recognition precision, specificity, sensitivity and F-score.

Conflict of Interest
Authors declare no conflict of interest.

Figure 2
Figure 2 Sample Activity Images.

Figure 3 Figure 4
Figure 3 USC HAD dataset - Relative F-scores using DRNNs (Murad & Pyun, 2017) and the proposed method. The accuracy A is the most widely used measure of overall classification performance across all classes. The proposed method outperforms existing methods in accuracy by margins of 7.66% (Vaka, Shen, Chandrashekar & Lee, 2015), 2.76% (Zheng, 2015), 1.36% (Jiang & Yin, 2015) and 0.56% (Murad & Pyun, 2017), respectively. Furthermore, the proposed method performed better than a similar image conversion algorithm (Jiang & Yin, 2015) by a narrow margin. In future work, we can apply our method to other IMU sensor data and study its performance.

Table 1
Table 1 USC HAD dataset - Statistics of the human subjects who participated. There are 12 activities in total, covering the most common human activities in daily life. All the activities and their descriptions are shown in Table 2.
Table 2 USC HAD dataset - Activities and their descriptions.

Table 3
Proposed Novel Activity Image Creation Method.
The sensitivity, specificity and precision values of the proposed method are compared with those of DRNNs (Murad & Pyun, 2017). The comparison shows that the performance metrics of the proposed method are better than those of the DRNNs (Murad & Pyun, 2017): the average specificity, precision and sensitivity of the proposed method are all higher.