An Energy-Efficient Human Activity Recognition System Based on Smartphones

Smartphone-based human activity recognition (HAR) has become a significant research field as a subdomain of pattern recognition and pervasive computing. With the increasing popularity of smartphones, HAR has prominent applications in a number of fields such as health care, education, and entertainment. Smart devices offer great convenience as the main data acquisition and processing equipment, but smartphone battery life and other resources are limited for long-duration tasks. In this paper, we propose a lightweight HAR system that performs recognition with a deep learning (DL) model. Beyond that, we introduce a clustering-center-based pre-classification strategy to reduce the call frequency of the DL model. Meanwhile, we add a sampling frequency control mechanism for the inertial sensor. The goal of the whole system is to achieve low power consumption and low time delay. According to the final experimental results, energy consumption is reduced by about 49% and time delay by about 55%, while overall recognition accuracy drops by only about 10%.


I. INTRODUCTION
Human activity recognition (HAR) is a vital research field in pervasive computing and underpins various applications. In recent years, with the increasing popularity of smart devices, the inertial sensors in phones have become a primary HAR data source. Using these sensors, researchers can collect daily-life data for applications in areas such as healthcare, entertainment, and education [1]-[3]. Most HAR studies focus on improving recognition accuracy for a specific application. However, smartphone-based HAR is a long-duration sensing task while the battery capacity of smartphones is limited, which makes energy efficiency necessary for activity recognition. In systems involving multiple types of sensors, it is usually necessary to turn off some high-energy sensors (such as GPS) to save power [4], while in some single-sensor activity recognition systems, energy saving is achieved by reducing the sampling frequency of the sensor. For example, Morillo et al. [5] collect data at a sampling frequency that varies between 32 Hz and 50 Hz to identify activities such as walking, jumping, and cycling. Such energy-saving methods inevitably reduce the accuracy of HAR. Therefore, the core of the problem lies in balancing low power consumption against recognition accuracy (or other performance indexes).
In our work, we propose a lightweight HAR system for daily activities. The core of the system is a deep learning (DL) model for activity recognition, and pre-classification strategies are introduced to save energy. The rest of the paper is organized as follows: Part II reviews related work on HAR, Part III describes the detailed architecture of our system, Part IV presents the experiments, and Part V gives the conclusion and future work.

II. RELATED WORK
A. Sensor-based Activity Recognition
Sensor-based human activity recognition has been studied for years. Some early studies built dedicated sensing devices for specific recognition targets. Pansiot et al. [6] develop the e-ar sensor, which is worn on the ear to collect body signal data for health care. Minnen et al. [7] attach multiple sensors to a military uniform to recognize tactical actions and provide battlefield information. Angelini et al. [8] design a smart bracelet for the elderly. With the popularity of smartphones, many activity recognition studies use the smartphone as the core device to collect, process, and identify data. Akhavian et al. [9] attach a mobile phone to the upper arm of construction workers to collect work data and identify worker activities. Tran et al. [10] deploy an SVM model on a smartphone to recognize daily human activities. Bisio et al. [11] use sensors integrated in smartphones for the medical monitoring of patients. The smartphone is a promising pervasive computing platform owing to its convenience and popularity. However, as a device for long-duration tasks such as HAR, its computing and energy resources are limited.

B. Deep Learning Algorithm
DL algorithms have attracted much attention in recent years and perform excellently in image processing, natural language processing (NLP), data mining, and other fields [12][13][14]. Classical network architectures include AlexNet [15] and ResNet [16]. The features extracted by DL models are believed to reveal deep characteristics of the data. DL models have recently also been applied to sensor-based activity recognition. Yang et al. [17] applied a CNN to recognize multi-channel human activity data sequences, and the results were superior to those of traditional classifiers. Hammerla et al. [18] compared the performance of various DL networks on HAR. In general, DL methods offer better and more robust recognition performance, but they consume more computing and memory resources than traditional machine learning algorithms, which limits their use on smartphones [19].

C. Energy-Efficient Strategy
Existing energy-efficient strategies for sensor-based HAR mainly achieve low power consumption by adjusting the sensor sampling frequency. For example, Reddy et al. [20] adjust the sampling frequency by detecting the user's location with GPS and achieve an overall recognition accuracy of 93.6% on data from 16 subjects, but GPS consumes considerable power and is not suitable for indoor tasks. Luis et al. [21] discretize the original data to reduce the computational load, and experiments on smartphones show that battery life is extended to 27 hours, but the minimum sampling frequency is still 32 Hz. Yan et al. [22] propose an adaptive method that establishes a relationship between sampling frequency and feature set, achieving a 20%-25% reduction in energy consumption. Qi et al. [23] design the AdaSense model and show that recognizing multiple activities consumes more resources than determining whether a single activity is in progress; their algorithm therefore increases the proportion of time spent on single-activity detection and reduces energy consumption by 39.4%. Moreover, Aderemi et al. [24] design an energy consumption forecasting model, and Adeyemi et al. [25] study the energy consumption of a residential building; both discuss energy-efficient strategies in a broader sense.
The above energy-efficient strategies reduce power consumption only by adjusting the sampling frequency and selecting feature subsets, without considering classifier optimization. In addition, they do not discuss the time delay of the proposed systems, which is crucial for wearable applications such as epilepsy detection and fall detection [26]. Our work therefore proposes a pre-classification method based on the clustering-center, which can be regarded as a simple core feature for HAR.

III. SYSTEM ARCHITECTURE
A. System Framework
We develop an Android app for HAR; its architecture is shown in Fig. 1. The system consists of two main parts. The first is the offline training module, which trains the DL model and the pre-classification model. The second is the online recognition module, in which the system uses the pre-classification model to obtain preliminary recognition results. If a preliminary result passes the acceptance check, it is taken as the final result; otherwise, the DL model is activated for recognition, and its result is fed back to the clustering-center calculation as a correction reference. Meanwhile, the activity fluctuation is calculated, and the sampling frequency is controlled in real time according to the result.
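The online loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the app's implementation: `extract_features` and `dl_model` are caller-supplied stand-ins, the acceptance check is assumed to be "nearest clustering-center lies within that activity's threshold", and the feedback step `update_center` with its `rate` parameter is a hypothetical correction rule, since the text does not specify how the DL result corrects the centers.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_center(centers: List[List[float]], label: int, f_t: Vector,
                  rate: float = 0.05) -> None:
    """Hypothetical correction rule: nudge the DL-predicted center toward f_t."""
    centers[label] = [(1 - rate) * c + rate * x for c, x in zip(centers[label], f_t)]


def recognize_window(
    window: Sequence[Vector],
    extract_features: Callable[[Sequence[Vector]], List[float]],
    centers: List[List[float]],
    thresholds: List[float],
    dl_model: Callable[[Sequence[Vector]], int],
) -> Tuple[int, bool]:
    """Return (activity index, whether the DL model was called) for one window."""
    f_t = extract_features(window)
    distances = [euclidean(c, f_t) for c in centers]
    k = min(range(len(distances)), key=distances.__getitem__)  # nearest center
    if distances[k] <= thresholds[k]:
        return k, False                      # pre-classification accepted
    label = dl_model(window)                 # fall back to the expensive DL model
    update_center(centers, label, f_t)       # feed the DL result back
    return label, True
```

A window close to a known center is resolved without calling the DL model; an ambiguous window triggers the DL model and moves the corresponding center slightly.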

B. Data Acquisition and Preprocessing
We build a HAR dataset to train the model, using the accelerometer of a MI phone placed on the right thigh of each subject. Eight subjects (6 male and 2 female, aged 22-27) are invited to perform 8 activities: sitting, standing, lying on the left side, going upstairs, going downstairs, walking, running, and quick walking. For each activity of each subject, we collect data at 50 Hz for 3 minutes. A third-order low-pass Butterworth filter is applied to remove noise, and the data are then segmented with a sliding-window algorithm. Based on previous experimental experience, we set the window size to 2 s, which is close to the activity duration [27]. We collect two recordings of the activity data for every participant, one for training and the other for testing.
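The segmentation step can be sketched as follows, using the 50 Hz rate and 2 s window from the text. The 50% overlap between consecutive windows is an assumption, since the overlap is not stated. The Butterworth filtering step is omitted to keep the sketch dependency-free; with SciPy, one option is `scipy.signal.butter(3, cutoff_hz, btype="low", fs=50)` followed by `scipy.signal.filtfilt`, where `cutoff_hz` is not given in the text.

```python
from typing import List, Sequence

SAMPLING_RATE_HZ = 50   # from the text
WINDOW_SECONDS = 2.0    # from the text
OVERLAP = 0.5           # assumption: overlap is not stated


def sliding_windows(samples: Sequence[Sequence[float]],
                    rate_hz: int = SAMPLING_RATE_HZ,
                    window_s: float = WINDOW_SECONDS,
                    overlap: float = OVERLAP) -> List[List[Sequence[float]]]:
    """Split a [time][axis] recording into fixed-size, possibly overlapping windows."""
    size = int(round(rate_hz * window_s))             # 100 samples at 50 Hz
    step = max(1, int(round(size * (1 - overlap))))   # 50 samples at 50% overlap
    return [list(samples[start:start + size])
            for start in range(0, len(samples) - size + 1, step)]
```

A 3-minute recording at 50 Hz (9000 samples) yields 179 windows of 100 samples each under these assumptions; trailing samples that do not fill a whole window are dropped.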

C. DL-HAR Model
AlexNet is a DL model designed by Hinton and his team in 2012, and it is one of the classical DL models. It includes 5 convolutional layers and 3 fully connected layers, and each layer contains thousands of neurons or more. Given the scale of our HAR task, we simplify AlexNet through iterative fine-tuning to form our DL-HAR model. The final network structure is depicted in Fig. 2.
The final structure retains 1 convolutional layer and 3 fully connected (Dense) layers. The segmented data blocks are fed into the DL model for recognition.
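To make the data flow of the simplified network concrete, the following pure-Python sketch runs one convolutional layer followed by three dense layers on a 2 s window (100 samples × 3 axes). The layer sizes (8 filters, kernel width 5, hidden widths 64 and 32), the use of a 1-D convolution along time, and the omission of biases are assumptions for illustration; the actual structure is the one in Fig. 2. The weights are random, so the output only demonstrates shapes, not trained predictions.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; the model is untrained and only shows the data flow."""
    return [[rng.gauss(0.0, 0.05) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(x: Matrix, kernels: List[Matrix]) -> Matrix:
    """'Valid' 1-D convolution along time (stride 1) followed by ReLU.

    x has shape [time][channels]; each kernel has shape [taps][channels].
    """
    taps, channels = len(kernels[0]), len(x[0])
    out = []
    for t in range(len(x) - taps + 1):
        out.append([
            max(0.0, sum(k[i][c] * x[t + i][c] for i in range(taps) for c in range(channels)))
            for k in kernels
        ])
    return out


def dense(v: Sequence[float], w: Matrix, relu: bool) -> List[float]:
    """Fully connected layer without bias; w has shape [out][in]."""
    out = [sum(wi * vi for wi, vi in zip(row, v)) for row in w]
    return [max(0.0, o) for o in out] if relu else out


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(window: Matrix, n_classes: int = 8, filters: int = 8, taps: int = 5,
            hidden: Sequence[int] = (64, 32), seed: int = 0) -> List[float]:
    """One conv layer + three dense layers, returning class probabilities."""
    rng = random.Random(seed)
    kernels = [random_matrix(taps, len(window[0]), rng) for _ in range(filters)]
    feature_map = conv1d_relu(window, kernels)            # [96][8] for a 100x3 input
    v = [val for row in feature_map for val in row]       # flatten to 768 values
    sizes = [len(v), *hidden, n_classes]                  # three dense layers
    for layer in range(len(sizes) - 1):
        w = random_matrix(sizes[layer + 1], sizes[layer], rng)
        v = dense(v, w, relu=layer < len(sizes) - 2)      # ReLU except on the output
    return softmax(v)
```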

D. Pre-Classification Strategy
In this section, we propose two pre-classification strategies to improve the overall performance of the model, and we define the concept of the clustering-center for pre-classification. DL models perform excellently on pattern recognition problems but consume considerable energy and other resources. The proposed pre-classification strategy saves energy by reducing the call frequency of the DL model. In the HAR field, a feature vector for recognition consists of time-domain and frequency-domain features extracted from activity data. Frequency-domain features perform well on periodic activities, but their computational complexity and delay are larger than those of time-domain features, which are more commonly used. Therefore, our system uses only time-domain features, and the recursive feature elimination (RFE) algorithm is applied to reduce the dimension of the feature vector to 36 (Features 1-11 are extracted from the 3-axis accelerometer data, and Feature 12 is the coefficient of the 3-axis data, as shown in Table I). In the offline training stage, the feature vectors of all data blocks of the same activity are calculated, and the clustering-center of an activity is defined as the cluster center of all its feature vectors. We denote the clustering-center set of all activities as $T = \{T_1, T_2, \ldots, T_c\}$, where $c$ is the number of activities to be classified. The feature center $T_i$ is calculated as:

$$T_i = \arg\min_{x} \sum_{j=1}^{n_i} \mathrm{dist}\left(x, f_i^j\right) \qquad (1)$$

where $f_i^j$ is the feature vector extracted from the $j$th training data block of the $i$th activity (denoted as $A_i$), $j = 1, 2, \ldots, n_i$, $n_i$ is the training size of $A_i$, and $\mathrm{dist}(\cdot,\cdot)$ is the distance between two vectors, given by Equation (2):

$$\mathrm{dist}(x, y) = \sqrt{\sum_{k=1}^{m} \left(x_k - y_k\right)^2} \qquad (2)$$

where $m = 36$ is the dimension of the feature vector. In the online recognition stage, for the current time window $t$, let $f_t$ be the time-domain feature vector of $t$. We calculate the distance between each feature center and $f_t$, denoted as $E = \{e_1, e_2, \ldots, e_c\}$ with $e_i = \mathrm{dist}(T_i, f_t)$; that is, $e_i$ is the Euclidean distance between the element $T_i$ of $T$ and the to-be-classified feature vector.
$e_i$ reflects the likelihood that the activity in the current time window is $A_i$: the greater $e_i$ is, the further $f_t$ deviates from the feature center of $A_i$, and the lower the likelihood that the user is performing $A_i$, and vice versa.
For the vector $E$ above, there may be cases in which the distances $e_i$ differ only slightly, which means the pre-classification strategy cannot reliably recognize $t$. Considering this, we define a constraint that determines whether to accept the pre-classification result. Let $k = \arg\min_i e_i$; the constraint is:

$$e_k \le \theta_k \qquad (3)$$

where $\theta_k$ is the pre-classification confidence of $A_k$, calculated for each activity as:

$$\theta_i = Q_\alpha\left(\left\{\mathrm{dist}\left(T_i, f_i^j\right)\right\}_{j=1}^{n_i}\right) \qquad (4)$$

In (4), the function $Q_\alpha(\cdot)$ returns the $\alpha$-quantile of a sequence. The experiment for determining $\alpha$ is described in Part IV. If (3) holds, the pre-classification result $A_k$ is taken as the final recognition result. For each $A_i$, $\theta_i$ is a constant once it has been calculated in the offline training stage.
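The offline and online stages of the pre-classification can be sketched as follows. The sketch assumes that the clustering-center of an activity is the point minimizing the summed Euclidean distance to its training feature vectors (the geometric median, computed here with Weiszfeld's algorithm), that the confidence threshold is the α-quantile of the training distances to that center using linear interpolation, and that a result is accepted when the nearest center lies within its threshold. The default `alpha=0.9` is a placeholder, not the value determined in the experiments.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dist(x: Vector, y: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def clustering_center(vectors: List[Vector], iters: int = 200, eps: float = 1e-9) -> List[float]:
    """Geometric median of the vectors via Weiszfeld's algorithm."""
    x = [sum(col) / len(vectors) for col in zip(*vectors)]  # start from the mean
    for _ in range(iters):
        weights = [1.0 / max(dist(x, v), eps) for v in vectors]  # eps avoids division by zero
        total = sum(weights)
        new_x = [sum(w * v[d] for w, v in zip(weights, vectors)) / total for d in range(len(x))]
        if dist(new_x, x) < eps:
            return new_x
        x = new_x
    return x


def quantile(values: Sequence[float], alpha: float) -> float:
    """alpha-quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = alpha * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def fit(train: List[List[Vector]], alpha: float = 0.9) -> Tuple[List[List[float]], List[float]]:
    """Offline stage: one center and one confidence threshold per activity."""
    centers = [clustering_center(vs) for vs in train]
    thresholds = [quantile([dist(c, f) for f in vs], alpha) for c, vs in zip(centers, train)]
    return centers, thresholds


def pre_classify(f_t: Vector, centers: List[List[float]],
                 thresholds: List[float]) -> Optional[int]:
    """Online stage: nearest activity if accepted, else None (call the DL model)."""
    e = [dist(c, f_t) for c in centers]
    k = min(range(len(e)), key=e.__getitem__)
    return k if e[k] <= thresholds[k] else None
```

The same functions apply unchanged to 36-dimensional feature vectors; a `None` result corresponds to falling back to the DL model.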

E. Adaptive Frequency Control Strategy
Activities differ in complexity. Static activities such as sitting and lying are considerably less complex than dynamic activities such as running and walking. In this study, we assume that recognizing low-complexity activities requires less data or fewer features than recognizing high-complexity activities, which is the main rationale for reducing the sampling frequency to save energy. Therefore, we propose an adaptive sampling frequency control strategy that adjusts the sampling frequency based on the recognition history to achieve energy-efficient recognition.
We introduce the concept of the Activity Changing Rate (ACR) to measure how likely the user is to remain in the current activity. The formula is as follows:

$$\sigma_t = \sqrt{\frac{1}{N}\sum_{k=t-N+1}^{t}\left(I_k - \bar{I}_t\right)^2}, \qquad \bar{I}_t = \frac{1}{N}\sum_{k=t-N+1}^{t} I_k \qquad (5)$$

where $t$ represents the current time window and $I_k$ is the activity intensity of the activity recognized in time window $k$. Activity intensity is a per-activity constant based on the energy consumed by the activity [34], as shown in Table II.

TABLE II. ACTIVITY

$\sigma_t$ is the standard deviation of the activity intensity over the past $N$ time windows, which is exactly the ACR we define. When $\sigma_t$ is smaller than a threshold $\lambda$, the activity state is considered stable, which means the sampling frequency can be reduced. The threshold $\lambda$ therefore has a decisive influence on the system sampling frequency. The parameter $N$ is the number of historical time windows referred to when calculating $\sigma_t$. A sufficiently long history makes the estimate more objective, but more data requires more storage and computation. Based on our previous experience, we finally set $N$ = .
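A minimal sketch of the ACR-based frequency control follows. The intensity values are illustrative placeholders, not the entries of Table II, and the history length `n_history`, the threshold, and the reduced rate `low_hz` are hypothetical choices; only the 50 Hz full rate comes from the text. The ACR is computed as the population standard deviation (divisor $N$) of the last $N$ intensities.

```python
import statistics
from collections import deque
from typing import Deque

# Illustrative placeholder intensities; they are NOT the values of Table II.
INTENSITY = {
    "lying": 1.0, "sitting": 1.3, "standing": 1.8, "walking": 3.5,
    "quick walk": 4.5, "downstairs": 3.5, "upstairs": 5.0, "running": 8.0,
}


class AdaptiveSampler:
    """Tracks the ACR over the last N windows and picks the next sampling rate."""

    def __init__(self, n_history: int = 10, threshold: float = 0.5,
                 high_hz: int = 50, low_hz: int = 25) -> None:
        self.history: Deque[float] = deque(maxlen=n_history)  # past N intensities
        self.threshold = threshold  # lambda in the text (value assumed)
        self.high_hz = high_hz      # full rate used for data collection in the text
        self.low_hz = low_hz        # reduced rate (value assumed)

    def acr(self) -> float:
        return statistics.pstdev(list(self.history))  # population std, divisor N

    def update(self, activity: str) -> int:
        """Record the latest recognized activity and return the next rate in Hz."""
        self.history.append(INTENSITY[activity])
        if len(self.history) < self.history.maxlen:
            return self.high_hz  # not enough history yet: stay at the full rate
        return self.low_hz if self.acr() < self.threshold else self.high_hz
```

Keeping the full rate until $N$ windows have been observed avoids lowering the frequency on too little history; any change of activity that raises the ACR above the threshold immediately restores the full rate.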

IV. EXPERIMENT AND RESULTS
We port the Pre-Classified Deep Learning (PC-DL) model to the Android platform. We recruit 2 new subjects, whose physical characteristics are similar to those of the original 8, for data collection and activity recognition. Each activity lasts 1 minute and is repeated 10 times. Each experiment consists of two parts: in one, the DL model recognizes the activities; in the other, the PC-DL model does. Energy consumption, time delay, and recognition accuracy are recorded during the experiment. The results are shown in Table III.
The results compare the power consumption, time delay, and recognition accuracy of DL and PC-DL. According to the records, the energy consumption of PC-DL decreases by 49% on average because it reduces both the computing load and the sampling frequency of the mobile device. In addition, PC-DL reduces the time delay by 55% on average. Meanwhile, the recognition accuracy decreases but remains at 87.78%. However, in terms of power consumption, the DL model still performs better on the activity "Running".
Moreover, to further verify the energy-efficiency advantage of PC-DL, we test the battery life of two Android phones in 3 scenarios:
- Recognition with DL (DL).
- Recognition with PC-DL (PC-DL).
- No recognition (NAN).
To ensure identical initial conditions, we charge each phone to 100% before starting the experiment, and users are asked to avoid other operations on the phone (such as watching videos or playing games) during the experiment. Fig. 3 shows the energy consumption of the two smartphones in the 3 scenarios. According to the figure, compared with the DL model, the PC-DL method reduces power consumption by 31%-39%.

V. CONCLUSION AND FUTURE WORK
In this work, we develop a lightweight HAR system for recognizing simple daily activities based on the PC-DL model. We use a convolutional network as the core classification algorithm, and we introduce pre-classification strategies that reduce energy consumption and time delay by lowering the algorithm complexity and the sampling frequency of the sensors in smartphones. By deploying the HAR app on the Android platform, this research verifies the good performance of the PC-DL algorithm in terms of energy efficiency and time delay: the overall accuracy of recognizing 8 activities remains at about 85%, while energy consumption is reduced by 49% and time delay by 55%. In the long-duration test, the PC-DL algorithm reduces battery consumption by 31%-39% compared with the DL algorithm.
In future work, we will expand the activity dataset and consider multiple kinds of sensors for complex activity recognition.