An Advanced Attention-Enhanced Hybrid Deep Learning Model for WBSNs’ Gait Pattern Recognition

Background: Deep learning techniques have attracted increasing attention in gait pattern recognition for wireless body sensor networks (WBSNs), which contributes greatly to monitoring gait change in clinical applications. However, existing studies face challenging issues such as low generalization performance and a lack of potential interpretation for gait variability. Advanced deep learning models are needed to resolve these issues. Method: The public WARD database, which includes acceleration and gyroscope data acquired from each subject wearing five sensors, was selected, and the gait recorded with a given combination of on-body sensors is considered as a WBSNs’ gait pattern. An advanced attention-enhanced hybrid deep learning model of DCNN and LSTM for WBSNs’ gait pattern recognition was proposed. In the proposed technique, the combined DCNN and LSTM model first discovers the spatial-temporal gait correlation features. The attention mechanism is then introduced to exploit the more valuable intrinsic nonlinear dynamic correlation gait characteristics associated with gait variability that are hidden in the resulting spatial-temporal gait space. This contributes to enhancing the generalization performance and to gaining insight into gait variability in a certain anatomical region. Results: Ten gait patterns were randomly selected from the WARD database to evaluate the feasibility of the proposed method. Our experiments demonstrated the superior generalization ability of our method compared with models such as CNN-LSTM and DCNN-LSTM. The proposed model could classify the ten gait patterns with the highest accuracy and F1-score of 91.48% and 91.46%, respectively. Moreover, we found that the classification performance for a certain gait pattern was almost equally good when combinations of three or of all five on-body sensors were employed, suggesting that our method can possibly give insight into gait variability in a certain anatomical region. Conclusion: The proposed technique could feasibly discover the more intrinsic nonlinear dynamic correlation gait characteristics associated with gait variability from on-body multi-sensor gait data, which contributed greatly to the best generalization performance and to potential clinical interpretation. The proposed technique could hopefully become a powerful tool for monitoring gait change in clinical applications.


Background
Over the past 20 years, wireless body sensor networks (WBSNs), an emerging wearable sensing technology for easily acquiring gait data during free walking in home or outdoor settings [1][2], have been widely applied in gait analysis [3][4][5]. Usually, a WBSNs-based gait pattern is constructed from different sensor nodes located on different anatomical regions of the body, such as the head, shoulder, elbow, arm, wrist, waist, hip, thigh, knee, ankle, heel and foot. Many studies have found that automatic recognition of WBSNs gait patterns contributes greatly to accurately evaluating changes in human gait function in daily life [6][7][8][9], which benefits clinical diagnosis, rehabilitation assessment and early prediction of fall risk in the elderly. In recent studies, automatic recognition of WBSNs gait patterns has been treated as a gait classification task, and the challenging issue is how to achieve the best generalization performance and potential interpretation for gait variability [8][9].
As we know, a key step in developing a high-quality gait classification model is to capture the more valuable intrinsic gait feature information associated with gait variability hidden in gait data. The same is true of constructing a high-quality WBSNs gait classification model [10][11]. Recent studies have shown that some deep learning algorithms with an excellent ability to learn representative features have been applied in WBSNs' gait pattern recognition. These algorithms mainly include the convolutional neural network (CNN), dilated convolutional neural network (DCNN), recurrent neural network (RNN) and long short-term memory (LSTM) [12][13][14][15]. Their basic idea is to take advantage of this superior learning ability to exploit the spatial or temporal correlation gait features hidden in multi-sensor data in order to improve generalization performance. In particular, DCNN and LSTM algorithms have attracted more and more attention in related studies because they have a strong capability to capture the intrinsic spatial or temporal dependency gait features. Although these algorithms can obtain better classification performance, they hardly capture the more valuable gait characteristic information from multi-sensor data. This is possibly because each algorithm, used alone, impedes the learning of the valuable intrinsic spatial-temporal correlation gait features in the WBSNs gait classification task.
Recently, different efforts in the quantitative analysis of multi-sensor gait data have been devoted to discovering the intrinsic spatial-temporal correlation gait feature information embedded in WBSNs' gait patterns, in order to produce the best generalization ability. Some recent studies have concentrated on applying data fusion-based feature extraction techniques to exploit the valuable implicit spatial-temporal correlation gait feature information from both inter-sensor and intra-sensor data in WBSNs [16][17][18]. A typical example is the distributed compressed sensing (DCS) technique, which simultaneously compresses multi-sensor gait data to capture the spatio-temporal correlation information hidden in WBSNs' gait patterns [16]. However, because it requires joint sparsity of the data, the DCS technique has difficulty discovering the valuable intrinsic gait features in non-sparse multi-sensor dynamic sequence data. Meanwhile, other recent studies have made many efforts to construct high-quality deep learning-based WBSNs' gait classification models [19][20][21][22]. Their basic idea is to take advantage of the superior representation learning ability of deep learning algorithms to discover the intrinsic spatial-temporal interaction information embedded in WBSNs' gait patterns in an end-to-end manner. Some existing studies have shown the feasibility of applying hybrid deep learning algorithms to capture the more intrinsic spatial-temporal correlation information hidden in on-body multi-sensor gait data, in order to enhance the generalization ability on dynamic sequence gait data [19][20][21]. For example, F. J. Ordóñez et al. proposed a hybrid deep learning model of CNN and LSTM to automatically learn the implicit spatial-temporal gait features from multi-sensor dynamic data for the best classification performance.
Unfortunately, there exist some limitations in recent studies. Firstly, existing studies have not considered whether the developed WBSNs gait classification model offers potential interpretation for gait variability; that is, they cannot accurately find the best locations of on-body sensors associated with gait change. Secondly, it is well known that human gait is generated by the interaction between the central nervous system, the peripheral nervous system and the musculoskeletal effector system. In other words, human gait depends on the close interaction between the central nervous system and various muscles, which allows the individual to keep a continual interchange between mobility and stability. Gait has intrinsic complex dynamic characteristics that derive from the locomotor sensory feedback from muscles, joints and other receptors related to locomotion change. The intrinsic "interesting" gait characteristic information on local nonlinear dynamics is possibly hidden in the interaction among the multiple sensors in a WBSNs' gait pattern. However, existing studies do not take into account the intrinsic valuable characteristics highly associated with gait variability, such as the local nonlinear dynamic characteristics embedded in the spatial-temporal correlation of human gait. How to discover these intrinsic valuable "interesting" gait characteristics from on-body multi-sensor data, for the best generalization ability and potential interpretation for gait variability, has been a challenging endeavor in recent studies [20][21][22]. Theoretically, it is possible that valuable characteristics of gait variability are locally and saliently distributed in the intrinsic non-linear dynamic correlation of human movement. It is therefore critical to exploit the local intrinsic non-linear dynamic correlation from on-body multi-sensor data, which contributes greatly to the best generalization performance of gait classification.
This motivates us to search for advanced hybrid deep learning models that can discover the valuable local intrinsic "interesting" features from multi-sensor gait data for accurately detecting changes in WBSNs' gait patterns.
To the best of our knowledge, the attention mechanism has a superior ability to focus on the salient parts of an object rather than the whole object. Recently, the attention mechanism has become popular in deep learning algorithms for exploiting valuable implicit features hidden in dynamic data. Representative examples include the attention-based RNN model and the attention-based LSTM model, which improve the modeling of intrinsic time correlation in dynamic time-series data [23][24]. Some recent studies have shown the feasibility of applying an attention-enhanced LSTM network to identify Parkinson's disease (PD) characteristics using multi-sensor GRF gait data from the foot [25]. However, no study has reported an attention-based hybrid deep learning model developed to discover the local intrinsic non-linear dynamic correlation embedded in the spatial-temporal correlation of gait features using on-body multi-sensor data. Therefore, inspired by the deep fusion of the excellent learning performance of the attention mechanism, DCNN and LSTM, we proposed a novel attention-based hybrid deep learning model of DCNN and LSTM to capture the local intrinsic non-linear dynamic correlation information associated with gait variability hidden in a WBSNs' gait pattern defined by the combination of different on-body sensors. That is, with on-body multi-sensor gait data, we take advantage of the attention mechanism to exploit the local implicit nonlinear dynamic correlation gait features, which contain more valuable gait variability information and are embedded in the spatial-temporal gait correlation space obtained by DCNN and LSTM. This contributes greatly to the best generalization performance and to potential interpretation for gait variability based on different anatomical regions of the body. In this study, the open wearable action recognition database (WARD) [26], which includes acceleration and gyroscope data from all five on-body sensors of each subject, was employed to evaluate the feasibility of our proposed model.
The experimental results demonstrated that our proposed model can achieve state-of-the-art generalization performance and find different anatomical regions for potential interpretation of gait variability. The main contributions of this study are summarized as follows:
1. We proposed a novel attention-based hybrid deep learning model of DCNN and LSTM for high-quality WBSNs' gait pattern recognition.
2. Our proposed model can discover the valuable local intrinsic "interesting" characteristics associated with gait variability, such as nonlinear dynamic correlation features, which would contribute to searching for specific details of pathological gait deviations.
3. Our proposed technique can find the best combination of locations of different on-body sensors associated with anatomical regions, which could provide potential gait variability interpretation for gaining deeper insight into the gait compensatory mechanism.

Results
The selection of proper fixed-length of sliding window
In this study, using the selected accelerometer and gyroscope data from multiple sensors, we objectively evaluated the effects of parameter selection on our proposed model. The parameters considered mainly include the sliding window size, dilated rate, batch size and epoch size. Firstly, we evaluated the impact of the fixed length of the sliding window on the classification performance. In view of the normal gait speed (1.1-1.5 m/s) and gait cycle (1-1.3s), we concentrated on selecting the proper fixed-length sliding window from the range of 1.5s to 5s. In this experiment, our proposed model consisted of seven dilated convolution layers, two long short-term memory layers, an attention strategy and a softmax layer. To decrease the training time, the number of epochs was set to 60. Accuracy and F1-score were used as evaluation criteria of the classification performance. The effect of sliding window size on classification performance is presented in Fig. 1. From Fig. 1, we observe that both F1-score and accuracy increase as the window length increases from 1.5s to 3.5s, whereas both decrease as the window length increases from 3.5s to 5s. In comparison, F1-score and accuracy achieve their maximum values of 89.54% and 89.44%, respectively, when the window length is fixed to 3.5s, and their minimum values of 88.1% and 87.8%, respectively, when the window length is fixed to 1.5s. These results demonstrated that data segments obtained by a sliding window of length 1.5s possibly contain less valuable information about gait change, which could deteriorate the generalization performance of our proposed classification model.
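The windowing step evaluated here can be illustrated with a minimal sketch. The sampling rate of 30 Hz is an assumption (it is not stated in this section, but it makes a 3.5s window equal to 105 samples), and the 50% overlap between windows as well as the function name `sliding_windows` are illustrative choices rather than the settings used in the experiments.

```python
def sliding_windows(samples, window_len, step):
    """Split a sequence of samples into fixed-length windows.

    samples    : list of per-time-step readings
    window_len : number of samples in each window
    step       : hop between window starts (step == window_len means no overlap)
    """
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")
    windows = []
    # Only full windows are kept; a trailing partial window is discarded.
    for start in range(0, len(samples) - window_len + 1, step):
        windows.append(samples[start:start + window_len])
    return windows


# Assumed sampling rate in Hz (hypothetical; not given in the text).
SAMPLING_RATE_HZ = 30
window_len = int(round(3.5 * SAMPLING_RATE_HZ))  # 3.5 s window

# Toy single-channel signal lasting 10 s; real data would hold one
# multi-sensor reading per time step instead of an integer.
signal = list(range(10 * SAMPLING_RATE_HZ))

# Assumed 50% overlap between consecutive windows.
segments = sliding_windows(signal, window_len, step=window_len // 2)
```

A shorter window yields more segments that each cover less of a gait cycle, while a longer window yields fewer segments that each span several cycles, which is the trade-off examined in Fig. 1.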
Similarly, data segments yielded by a sliding window of length 4.5s or 5s can retain more redundant information hidden in gait data, which could worsen the classification performance. By contrast, data segments produced by a sliding window of length 3.5s can contain more valuable discriminative gait information, which significantly improves the generalization ability.
The selection of best dilated rates
In this study, it is very important to select the best dilated rates for exploiting the valuable "interesting" gait features. As we know, dilated rates of 8 and 16 correspond to receptive fields of 31 and 63, respectively. Theoretically, the receptive field size with a dilated rate of 16 is close to the input data size of 105 in our proposed model, which could capture more useful information about gait change. In this experiment, for comparison, we designed two different schemes for selecting the dilated rates. In the first scheme, the dilated rates are set to 1,2,4,8,16,2,4,8,16, respectively; in the second scheme, the dilated rates are set to 1, 2, 4, 8, 2, 4, 8, respectively. The best parameters of the corresponding classification model are the same as those in the former experiment. The comparative results are shown in Table 1. From Table 1, we can see that the F1-score and accuracy achieve the higher values of 91.46% and 91.48%, respectively, when the second selection scheme of dilated rates is employed. In comparison, the second selection scheme of dilated rates is clearly superior to the first. These results suggested that, with our selected gait data, when the dilated rates are set to 1, 2, 4, 8, 2, 4, 8 in the different DC layers (i.e. seven DC layers), we can obtain the more valuable intrinsic spatial gait features for the best classification performance.
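The relation between dilated rates and receptive field can be checked with a short calculation. The sketch below assumes stride-1 layers with a kernel size of 3, which is not stated in the text but reproduces the receptive fields of 31 and 63 quoted for stacks ending at dilated rates 8 and 16; the function name `receptive_field` is hypothetical.

```python
def receptive_field(dilation_rates, kernel_size=3):
    """Receptive field (in time steps) of stacked stride-1 dilated convolutions.

    Each layer with dilation rate d widens the field by (kernel_size - 1) * d.
    kernel_size=3 is an assumption, not a value given in the text.
    """
    field = 1
    for rate in dilation_rates:
        field += (kernel_size - 1) * rate
    return field


# Stacks whose largest dilated rate is 8 and 16, respectively.
rf_up_to_8 = receptive_field([1, 2, 4, 8])
rf_up_to_16 = receptive_field([1, 2, 4, 8, 16])
```

Under these assumptions, the same function can be applied to any candidate list of dilated rates to compare the resulting receptive field with the window length in samples.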
Table 1 The comparative results of classification performance with different dilated rates

The proper batch size of our proposed model
Besides, we also evaluated the effect of the choice of batch size on the classification performance. In this experiment, given our selected sliding window of 3.5s length, we selected batch sizes of 16, 32, 64 and 128, respectively, to implement our gait classification task. The number of epochs was set to 60, and accuracy and F1-score were again employed as evaluation criteria of the generalization performance. The comparative results are given in Fig. 2. As shown in Fig. 2, both accuracy and F1-score increase as the batch size increases from 16 to 64, and they take almost the same high values for batch sizes of 32 and 64. In comparison, accuracy and F1-score reach their best values of 89.1% and 89.3%, respectively, when the batch size is set to 32, suggesting that a batch size of 32 possibly allows the model to capture more of the intrinsic valuable spatial-temporal correlation information hidden in gait data for the best gait classification performance. However, both accuracy and F1-score take their minimum values when the batch size is fixed to 128, suggesting that a batch size of 128 can retain more redundant information, which leads to poor classification performance. These results illustrated that a properly selected batch size for gait data such as acceleration and gyroscope data can greatly benefit the generalization ability of our proposed model.

The results of evaluation of our proposed model with high quality
In this section, we describe the results of the evaluation of our proposed model.
For comparison, some recent relevant deep learning models such as CNN, LSTM, CNN-LSTM, DCNN-LSTM, attention-based DCNN-GRU, attention-based DCNN-RNN and attention-based CNN-LSTM were employed. In this experiment, to objectively and accurately evaluate the classification performance, we adopted the same gait data, model architecture, and training and testing scheme. The fixed length of the sliding window, batch size and epoch size were set to 3.5s, 32, and 200, respectively. Both accuracy and F1-score were used as the evaluation criteria of the classification performance. With our selected multi-sensor data, the best architecture of our proposed model is presented in Fig. 4. As illustrated in Fig. 4, this architecture mainly includes seven dilated convolution layers, two long short-term memory layers, an attention strategy and a softmax layer. For the seven "DC" layers, we successively set the dilation rate to 1, 2, 4, 8, 2, 4, 8, respectively. The numbers before and after "@" refer to the dimension of a feature map and the number of feature maps in each DC layer. The first and fourth "DC" layers each have 32 feature maps. The remaining "DC" layers each have 64 feature maps. In addition, the first LSTM layer has 128 neurons, which was found to be the best number for effectively capturing the more valuable temporal dependency changes corresponding to the extracted local spatial gait features. The second LSTM layer has 10 neurons, equal to the ten selected gait patterns to be classified. The neurons of the models are dropped out with a probability of 0.25. The comparative results are given in Table 2. As shown in Table 2, it is obvious that our proposed model can [...] further validate the excellent generalization ability in classifying WBSNs' gait patterns. Here, in order to effectively assess the ability to accurately detect each selected gait pattern, both recall and precision were used as the evaluation metrics of classification performance. The evaluation results with the confusion matrix are presented in Table 3.
In this confusion matrix, the entry in the $i$-th row and $j$-th column denotes the total number of samples of gait pattern class $i$ that are classified as class $j$.
According to the confusion matrix in Table 3, the precision and recall averaged across all ten selected gait patterns reach 91.3% and 91.2%, respectively, suggesting that our proposed model can recognize all selected WBSNs-based gait patterns with high accuracy.
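To make the metrics concrete, the following sketch derives per-class precision, recall and F1-score, their macro averages, and accuracy from a confusion matrix that follows the row and column convention above (row = true class, column = predicted class). The 3-class matrix is toy data for illustration only and is not taken from Table 3; all function names are hypothetical.

```python
def per_class_precision_recall(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    precision, recall = [], []
    for c in range(n):
        true_pos = cm[c][c]
        predicted_as_c = sum(cm[i][c] for i in range(n))  # column sum
        actually_c = sum(cm[c])                           # row sum
        precision.append(true_pos / predicted_as_c if predicted_as_c else 0.0)
        recall.append(true_pos / actually_c if actually_c else 0.0)
    return precision, recall


def f1_scores(precision, recall):
    """Per-class harmonic mean of precision and recall."""
    return [2 * p * r / (p + r) if (p + r) else 0.0
            for p, r in zip(precision, recall)]


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Toy 3-class confusion matrix (illustrative values only).
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
precision, recall = per_class_precision_recall(toy_cm)
f1 = f1_scores(precision, recall)
macro_precision = macro_average(precision)
macro_recall = macro_average(recall)
```

Whether the averaged values reported in the text are macro averages, as computed here, or weighted by class size is not stated and is an assumption of this sketch.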
In comparison, from Table 3, we can find that the precision and recall of the three "static" gait patterns (stand, sit and lie down) are higher, while the precision and recall of two "active" gait patterns are lower. In particular, the stand pattern achieves the best recall value of 95.6%, and both the sit and lie down patterns obtain the maximum precision value of 94.5%.
In contrast, the downstairs pattern obtains the poor precision and recall values of 84.7% and 87.1%, respectively, which illustrates that some downstairs patterns could be misclassified as upstairs [...] compared to the anatomical regions consisting of the waist, left and right wrists, and left and right ankles. In conclusion, these results demonstrated that our proposed method could exploit the intrinsic valuable nonlinear gait correlation features associated with gait variability in certain anatomical regions, which could help not only to produce the best generalization performance but also to gain insight into gait change in certain anatomical regions for potential gait variability interpretation.

Discussion
The experimental results of the present study showed that our proposed attention-based hybrid deep learning model of DCNN and LSTM could feasibly exploit the more intrinsic valuable nonlinear dynamic gait correlation characteristics from on-body multi-sensor gait data, such as acceleration and gyroscope data recorded during walking, which greatly enhanced the generalization ability of WBSNs' gait classification. Currently, finding the best deep learning model for high-quality WBSNs' gait recognition is a challenging endeavor in gait analysis. A key issue to be addressed in recent related work is how to exploit exactly the most representative intrinsic characteristics highly associated with gait change from on-body multi-sensor gait data.
Theoretically, gait variability is highly dependent on the gait compensatory mechanism, which is characterized by the overall coupling of the anatomical structures of human locomotion. This mechanism is to a great extent reflected by the correlation between joints and muscles in certain anatomical regions during walking. The aim of this study is to find an advanced deep learning technique that can capture the intrinsic valuable nonlinear gait correlation features from certain anatomical regions, which contributes significantly to improving WBSNs' gait classification performance and to gaining insight into gait variability.
In this study, we aimed to take advantage of the attention mechanism to capture the implicit valuable [...] Table 1, Fig. 2 and Fig. 3, respectively. In order to obtain data segments containing more useful gait characteristic information, we selected the proper fixed length of the sliding window for the highest generalization performance. As shown in Fig. 1, the optimal fixed length of 3.5s was selected to reach the best classification performance, compared with the fixed length of approximately 5s usually selected in previous study [45]. This is because data segments with a fixed length of 3.5s can contain more valuable gait feature information associated with gait change, whereas data segments with a fixed length of 5s can contain more redundant information that reduces classification performance. In addition, in this study, it is very important to select the optimal number of DC and LSTM layers for the best classification performance. As illustrated in Table 1 and [...] The possible reason is that the dynamic nonlinearity and non-stationarity of the selected multi-sensor acceleration and gyroscope data are possibly sensitive to the receptive field size expanded by the proper selection of dilated rates in DCNN, which could benefit dynamically capturing the intrinsic spatial gait features at different scales. In this experiment, we also found that all optimal parameters must be carefully selected by trial and error for the best generalization performance, because the best value of each parameter could change with the values of the other parameters. Currently, there is no general agreement on standards for selecting the optimal parameters in deep learning algorithms. Similar observations have been reported in [46].
In the present study, we focused on investigating our proposed model for high-quality WBSNs' gait classification. As shown in Table 2, our proposed model performed best when compared with some deep learning models used in recent studies. In the experiment, we found that most hybrid deep learning models produce better generalization ability than an individual deep learning model such as CNN or LSTM. The main reason is that a hybrid deep learning model can capture the valuable spatial-temporal correlation information both within and between on-body sensors, whereas CNN or LSTM alone learns only the spatial or the temporal features in gait data. Similar results have been reported in [21,47]. In particular, our proposed technique can exploit significantly more intrinsic nonlinear dynamic correlation features associated with gait change from the spatial-temporal space while spending less computational time. This is because the attention-based DCNN-LSTM can concentrate on the most valuable spatial-temporal correlation features both within and between sensors, and thus it could suppress the redundant and possibly confusing information in multi-sensor gait data to reduce the overall training time [21]. Our proposed technique therefore has great potential for application in real-time monitoring of gait change. No similar results have been reported in recent studies.
In addition, we further evaluated the discriminative ability of our proposed model by classifying all ten selected gait patterns. Unlike some previous studies, this study randomly selected some similar gait patterns, including "static" gait patterns such as stand, sit and lie down, and "active" gait patterns such as walk forward, walk left-circle, walk right-circle, turn left, turn right, upstairs and downstairs, in order to accurately detect the smaller differences among WBSNs' gait patterns. As illustrated in Table 3, our proposed model can feasibly detect each gait pattern with high recall and precision. This further suggests that our proposed technique is capable of discovering the valuable intrinsic gait variability information between the selected "static" or "active" gait patterns. The main reason is that our proposed method can take advantage of the attention mechanism to exploit the more local distinctive nonlinear information associated with the dynamic spatial-temporal correlations of gait [22][23][24][28], which contributes significantly to recognizing gait change. Similar studies have been reported in [16].
Besides, in this study, we also investigated the effect of combining different on-body sensors on the detection of a certain gait pattern by our proposed technique. We aimed to find the impact on gait change of the intrinsic interaction between joints and muscles in certain anatomical regions, based on the gait compensatory mechanism. As shown in Table 4, our proposed technique can accurately detect each gait pattern when combinations of three or of all five on-body sensors are adopted. In particular, for a certain gait pattern such as stand, the identification ability of some combinations of three sensors, such as S123, is much the same as that of all five sensors. This further illustrated that our proposed model has an excellent learning ability to capture intrinsic local interesting information, such as specific details of gait deviations highly associated with the gait compensatory mechanism, which resides in certain anatomical regions. Recent similar studies have been reported in [17]. In comparison, our proposed model is superior. The main reason is that the local energy-based shape histogram method used in that recent study can hardly discover the more valuable intrinsic gait features from multi-sensor data, whereas our proposed technique has a superior learning capability to exploit more local implicit nonlinear characteristic information about gait variability from certain anatomical regions. This greatly benefits gaining deeper insight into the pathological and physiological changes of the gait compensatory mechanism based on certain anatomical regions.

Conclusion
This study proposed a novel attention-based hybrid deep learning model of DCNN and LSTM for high-quality WBSNs' gait pattern recognition. Our proposed model, with its superior learning ability, can feasibly exploit the valuable intrinsic nonlinear gait correlation characteristics associated with gait variability from on-body multi-sensor gait data such as acceleration and gyroscope data, which significantly improves the generalization performance of WBSNs' gait classification. With our proposed technique, the anatomical regions highly associated with gait pattern change can be identified through the combination of different sensors, which contributes greatly to gaining insight into the intrinsic correlation of gait deviations residing in the gait compensatory mechanism for potential clinical interpretation. Further work will evaluate the effectiveness and feasibility of our proposed model using different kinds of multi-sensor gait data from patient subjects such as PD patients. We hope that our proposed method will become an effective tool for monitoring gait change in clinical environments in the future.

Method
In this study, in view of the nonlinearity, non-stationarity and stochasticity of multi-sensor dynamic time-series gait data, we proposed an advanced attention-based hybrid deep learning model of DCNN and LSTM for WBSNs' gait classification. The objective of our study is to take advantage of the deep fusion of the excellent learning abilities of the attention mechanism, DCNN and LSTM for high-quality WBSNs' gait pattern recognition in clinical applications. The basic idea is as follows. With a WBSNs' gait pattern defined by the combination of different on-body sensors, we firstly took advantage of the DCNN technique to exploit the intrinsic spatial gait features hidden in on-body multi-sensor gait data. Secondly, LSTM was utilized to capture the intrinsic temporal dependency corresponding to the extracted intrinsic spatial gait features. That is, we constructed a spatial-temporal correlation space embedded in multi-sensor gait data using the hybrid deep learning model of DCNN and LSTM. Finally, the attention mechanism was introduced to discover the valuable local nonlinear dynamic correlation information highly associated with gait variability from the constructed spatial-temporal gait correlation space. This significantly improves the generalization performance and can identify certain anatomical regions for potential clinical interpretation.
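As an illustration of the final step, the sketch below shows one common form of attention pooling over a sequence of LSTM outputs: each time step receives a score, the scores are normalized with softmax, and the outputs are averaged with the resulting weights. The dot-product scoring against a learned vector `query` is an assumption made for illustration; this section does not specify the scoring function used in the proposed model, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight T hidden states by their relevance and sum them.

    hidden_states : list of T vectors (e.g. LSTM outputs, one per time step)
    query         : vector of the same length; stands in for learned
                    attention parameters (assumed dot-product scoring)
    Returns (weights, context), where context is the weighted sum.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Toy example: three time steps with two features each.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform_weights, uniform_context = attention_pool(states, [0.0, 0.0])
focused_weights, focused_context = attention_pool(states, [10.0, 10.0])
```

With a zero query every time step is weighted equally, whereas a query aligned with the third state concentrates nearly all weight on it, which is the sense in which attention focuses on the most salient parts of the sequence.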
Based on the above idea, the architecture of our developed model mainly consists of an input layer, $H$ blocks based on dilated convolution (DC) layers with different dilation rates, $K$ layers of long short-term memory with different numbers of neurons in each layer, an attention strategy and a softmax layer, as shown in Fig. 5. The input layer receives the multi-sensors data, namely acceleration data and gyroscope data. In the DC network, each block includes a DC layer and a batch normalization (BN) layer. In the process of gait feature learning, the dilated convolution layers are first used to extract the intrinsic local short-term spatial gait features embedded in the multi-sensors gait data. Then, the long short-term memory layers model the temporal dependency corresponding to the extracted local short-term spatial gait features, in order to capture the implicit spatial-temporal correlation gait characteristics hidden in the gait data. Next, the attention strategy is employed to discover the local intrinsic valuable nonlinear dynamic gait features embedded in the spatial-temporal correlation feature space. We can therefore obtain more valuable discriminative information among WBSNs gait patterns, which helps the softmax layer to recognize the gait pattern with maximum probability.
The detailed description of our proposed model is as follows. According to the defined $a_j \in \mathbb{R}^g$, the gait data of sensor $j$ over a time length $h$ can be defined as a vector $s_j$ (equation (1)). Next, according to equation (1), the gait data of all $L$ sensors over the time length $h$ can be defined as a vector $v$ (equation (2)). Then, according to equation (2), the time-series gait data of all $L$ sensors with gait pattern class $i$ over the time length $h$ can be defined as a vector $V_i$ (equation (3)), where $n_i$ denotes the number of samples of gait pattern class $i$, and $m = g \times L \times h$.
So, according to equation (3), we can define a WBSNs gait pattern using a combination of different on-body sensors. In addition, the time-series gait data consisting of all $L$ sensors with all $K$ gait pattern classes over the time length $h$ can be defined as a vector $D$ (equation (4)), where the total number of WBSNs' gait samples is $n = n_1 + n_2 + \cdots + n_K$. According to equation (4), we obtain the training sample set of WBSNs gait patterns to be classified.
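The construction of $s_j$, $v$, $V_i$ and $D$ described above can be illustrated with a minimal NumPy sketch. The function names, the time-major flattening order and the toy sizes are assumptions made for illustration only; $g = 5$ corresponds to three accelerometer axes plus two gyroscope axes per sensor, and $L = 5$ to the five on-body sensors.

```python
import numpy as np

def sensor_vector(readings):
    """Flatten h readings of one sensor, shape (h, g), into s_j of length g*h."""
    return np.asarray(readings, dtype=float).reshape(-1)

def sample_vector(window):
    """Concatenate the L sensor vectors of one window, shape (L, h, g), into v."""
    return np.concatenate([sensor_vector(s) for s in window])

def class_matrix(windows):
    """Stack the n_i windows of one gait class into V_i with shape (n_i, m)."""
    return np.stack([sample_vector(w) for w in windows])

def build_dataset(class_windows):
    """Stack all classes into D with shape (n, m) and return integer labels."""
    mats = [class_matrix(w) for w in class_windows]
    D = np.vstack(mats)
    labels = np.concatenate([np.full(len(mat), i) for i, mat in enumerate(mats)])
    return D, labels

# Toy sizes (assumed): g channels per sensor, L sensors, h time steps.
g, L, h = 5, 5, 4
rng = np.random.default_rng(0)
# Two gait classes with n_1 = 3 and n_2 = 2 synthetic windows.
class_windows = [rng.normal(size=(n_i, L, h, g)) for n_i in (3, 2)]
D, labels = build_dataset(class_windows)
print(D.shape)  # (n, m) with n = 3 + 2 and m = g * L * h
```

Selecting a subset of the $L$ sensor slots before concatenation would give the sample vectors for a different combination of on-body sensors, that is, a different WBSNs gait pattern.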

DCNN for local short-term spatial gait features
Next, based on the defined WBSNs' gait sample set $D$, the DCNN technique is employed to exploit the implicit local short-term spatial gait features hidden in the on-body multi-sensors time-series gait data. The advantage of DCNN is that its dilated convolution operator changes the receptive field size while the size of the convolutional kernel remains unchanged. That is, in each block, given a dilation factor, the corresponding dilated convolution operation for local intrinsic gait feature extraction is defined over a discrete function $F$. Therefore, it is important to choose proper dilation factors that effectively expand the receptive field, which helps to capture the intrinsic spatial gait features dynamically at different scales. With our proposed DCNN model, we can obtain more intrinsic local short-term spatial gait features from the on-body multi-sensors gait data.
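To illustrate how a dilation factor enlarges the receptive field without enlarging the kernel, the following is a minimal one-dimensional sketch in NumPy. It is not the implementation used in this study: the function names are hypothetical, the operation is written in the cross-correlation form common in deep learning libraries, and the kernel and dilation values are arbitrary.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation form)."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation + 1  # samples covered by one output
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the receptive field")
    out = np.empty(n_out)
    for p in range(n_out):
        taps = signal[p:p + span:dilation]  # every dilation-th sample
        out[p] = np.dot(taps, kernel)
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

x = np.arange(10.0)
k = np.array([1.0, 0.0, -1.0])
y1 = dilated_conv1d(x, k, dilation=1)  # each output spans 3 samples
y2 = dilated_conv1d(x, k, dilation=2)  # same 3 weights, span of 5 samples
print(len(y1), len(y2), receptive_field(3, [1, 2, 4]))
```

With the same three weights, doubling the dilation factor widens the span of each output from 3 to 5 samples, which is why the choice of dilation factors controls the scale of the extracted features.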

LSTM for spatial-temporal correlation gait features
Then, we use the LSTM technique to discover the intrinsic temporal dependencies corresponding to the local spatial gait features extracted by DCNN, in order to exploit the spatial-temporal correlation gait characteristics hidden in WBSNs gait. As we know, the basic unit of LSTM is the memory cell. In the forget gate function $f_t$, $\sigma$, $W_f$ and $b_f$ are the nonlinear function, the weight matrix and the bias vector to be learned during training, $t$ represents the current time step, and $h_{t-1}$ denotes the output at the previous time step $t-1$. Next, the updated information on temporal correlation at time $t$ is determined by the input gate function $i_t$ and the candidate memory cell $\tilde{C}_t$ (equations (7) and (8)), where $W_i$ and $W_c$ denote the weight matrices, and $b_i$ and $b_c$ denote the bias vectors to be learned during training. Then, according to equations (7) and (8), the updated memory cell state for the temporal correlation corresponding to the local spatial gait features at time $t$ is defined as $C_t$ (equation (9)). The output gate function $o_t$ at time $t$ is defined by equation (10), where $W_o$ and $b_o$ denote the weight matrix and bias vector to be learned during training, respectively.
So, according to equation (10), the output of the current memory state containing the temporal dependencies corresponding to the local spatial gait features at time $t$ is defined as $h_t$ (equation (11)). Therefore, according to equation (11), we can define the spatial-temporal correlation gait characteristics as a vector (equation (12)), where $T$ is the total number of time steps and $n$ denotes the number of dimensions of each time step.
The detailed procedures for the recurrence computation in LSTM are presented in the literature [38]. In addition, it is key to select a proper number of LSTM layers to achieve the best generalization performance.
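The gate computations described in this subsection can be sketched as a single recurrence step in NumPy. The sketch assumes the widely used LSTM formulation in which each gate acts on the concatenation of $h_{t-1}$ and the current input $x_t$; the exact form of the original equations is not recoverable from the text, and the parameter dictionary, sizes and random initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step in the standard formulation (assumed, see lead-in)."""
    z = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])     # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])     # input gate
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate memory cell
    c_t = f_t * c_prev + i_t * c_hat                     # updated memory cell
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])     # output gate
    h_t = o_t * np.tanh(c_t)                             # output at time t
    return h_t, c_t

def run_lstm(inputs, n_hidden, params):
    """Run the recurrence over T steps and stack the outputs into (T, n)."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    outputs = []
    for x_t in inputs:
        h, c = lstm_step(x_t, h, c, params)
        outputs.append(h)
    return np.stack(outputs)

# Illustrative sizes and random weights (assumptions, not trained values).
rng = np.random.default_rng(0)
n_in, n_hidden, T = 4, 3, 6
params = {}
for gate in "fico":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
    params[f"b_{gate}"] = np.zeros(n_hidden)
H = run_lstm(rng.normal(size=(T, n_in)), n_hidden, params)
print(H.shape)  # (T, n)
```

The stacked output `H` plays the role of the spatial-temporal correlation representation with $T$ time steps and $n$ dimensions per step.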

Attention-based technique for local intrinsic nonlinear dynamic correlation gait features
Next, the attention-based technique is employed to exploit the intrinsic specific nonlinear dynamic gait characteristics embedded in the spatial-temporal correlation space obtained by the hybrid deep learning model of DCNN and LSTM. The attention score $\alpha_t$ at time $t$ is defined as a probability vector by equation (14). According to equation (14), the attention score $\alpha_t$ at time $t$ can be considered as a weight that reflects the contribution of time step $t$ to the intrinsic specific nonlinear dynamic gait features in the whole spatial-temporal correlation space. So, the intrinsic valuable specific nonlinear dynamic correlation gait feature hidden in the spatial-temporal correlation space can be computed from these attention scores. The weight matrix and bias vector in these equations are learned during training.
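As an illustration of how attention scores can weight the time steps of the spatial-temporal representation, the sketch below uses a common additive attention pooling. This specific form (a tanh projection, a learned context vector `u_w`, and a softmax over time) is an assumption, since the original attention equations are not recoverable from the text; all names and sizes are hypothetical.

```python
import numpy as np

def attention_pool(H, W, b, u_w):
    """Additive attention pooling over T time steps (assumed form).

    H: (T, n) LSTM outputs; W: (d, n) weight matrix; b: (d,) bias vector;
    u_w: (d,) context vector. Returns the pooled feature and the scores.
    """
    U = np.tanh(H @ W.T + b)            # hidden representation per time step
    scores = U @ u_w                    # unnormalized relevance per time step
    scores = scores - scores.max()      # numerical stability for the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()  # probabilities over time
    context = alpha @ H                 # weighted sum of the time steps
    return context, alpha

rng = np.random.default_rng(0)
T, n, d = 6, 3, 4
H = rng.normal(size=(T, n))
context, alpha = attention_pool(H, rng.normal(size=(d, n)), np.zeros(d),
                                rng.normal(size=d))
print(alpha.round(3), context.shape)
```

Time steps with larger scores contribute more to the pooled feature, so inspecting `alpha` indicates which parts of the gait sequence the model relies on.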
In order to achieve the best learning performance, we use the common cross-entropy error function as the cost function.
In this cost function, $k$ represents the total number of target classes to be predicted, and $y \in \mathbb{R}^k$ denotes a one-hot vector that represents the label of the $d$-th data of the $m$-th sample. $p_i \in \mathbb{R}^k$ denotes the predicted probability distribution for gait class $i$, and $\theta$ denotes the parameters of the learning network.
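A minimal sketch of the cross-entropy cost for one-hot labels is given below. It assumes the standard definition averaged over samples; the small constant `eps` and the example logits are illustrative choices rather than values from this study.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax that turns scores into class probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot labels."""
    return float(-np.mean(np.sum(one_hot * np.log(probs + eps), axis=1)))

# Two samples, three classes (illustrative values).
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
one_hot = np.eye(3)[labels]
loss = cross_entropy(softmax(logits), one_hot)
print(round(loss, 4))
```

The loss approaches zero as the predicted probability of the true class approaches one, and equals $\ln k$ when the prediction is uniform over $k$ classes.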

Experiment
The selection of on-body multi-sensors gait data
In this study, the open wearable action recognition database (WARD) from the University of California, which includes on-body multi-sensors gait data, was selected to evaluate the effectiveness of our proposed technique. The selected WARD database contains a total of 20 participants, 7 female and 13 male, whose ages range from 19 to 75 years.
Each subject had no known injuries or abnormalities associated with the human walking pattern. The on-body multi-sensors gait data were acquired by five wireless motion sensors located on the right and left wrists, the waist, and the right and left ankles, respectively, and attached with elastic belts, as shown in Fig. 7. Each wireless motion sensor was equipped with a triaxial accelerometer with a range of ±2g and a biaxial gyroscope with a range of ±500 dps, and each axis is reported to the on-body sensor as a 12-bit value. The sampling frequency was set to 30 Hz for data recording, and each subject, wearing all five sensors, was asked to walk at a natural self-selected walking speed. Each subject was asked to perform a total of 13 different patterns: stand, sit, lie down, walk forward, walk left-circle, walk right-circle, turn left, turn right, go upstairs, go downstairs, jog, jump and push wheelchair. The detailed scheme of data collection can be found in reference [26].

Data denoise and normalization
In order to obtain high-quality gait data, denoising and normalization of the collected data were performed first. Here, a combined denoising scheme consisting of a fifth-order median filter and a finite impulse response (FIR) filter was proposed. Its basic idea is that the fifth-order median filter is first used to reduce noise and smooth the data, and then the FIR filter is employed to remove redundant information, yielding data that contain the more valuable information about gait.
Fig. 7 The illustration of the different on-body locations of the five wearable sensors. S1, S2, S3, S4 and S5 denote sensor 1, sensor 2, sensor 3, sensor 4 and sensor 5 on the body, respectively.
In the FIR filter, the filter window size, cut-off frequency and sampling frequency were set to 64, 0.3 Hz and 30 Hz, respectively. Then, the denoised gait data were normalized to the interval [-1, 1] using the linear min-max normalization algorithm. This avoids differences in dimension and order of magnitude between data while retaining the original data structure for further processing.
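The preprocessing chain described above (median filter, FIR filter, min-max normalization) can be sketched with NumPy as follows. The filter type is not specified in the text, so a Hamming-windowed low-pass design is assumed here; the function names, the edge padding and the synthetic test signal are likewise illustrative assumptions.

```python
import numpy as np

def median_filter(x, size=5):
    """Sliding median with an odd window; edges are padded by repetition."""
    x = np.asarray(x, dtype=float)
    pad = size // 2
    padded = np.pad(x, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, size)
    return np.median(windows, axis=1)

def lowpass_fir(num_taps=64, cutoff_hz=0.3, fs_hz=30.0):
    """Hamming-windowed sinc low-pass taps (assumed design), unit DC gain."""
    fc = cutoff_hz / fs_hz                       # cut-off in cycles per sample
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    taps = 2.0 * fc * np.sinc(2.0 * fc * n) * np.hamming(num_taps)
    return taps / taps.sum()

def min_max_scale(x, lo=-1.0, hi=1.0):
    """Linear min-max normalization to [lo, hi]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)                  # constant signal: mid-range
    return lo + (hi - lo) * (x - x.min()) / span

def preprocess(x):
    """Median filter, then FIR filter, then normalization to [-1, 1]."""
    smoothed = median_filter(x, size=5)
    filtered = np.convolve(smoothed, lowpass_fir(), mode="same")
    return min_max_scale(filtered)

# Synthetic single-axis signal at 30 Hz with one spike (illustrative only).
t = np.arange(300) / 30.0
raw = np.sin(2 * np.pi * 0.1 * t)
raw[50] += 5.0
clean = preprocess(raw)
print(clean.min(), clean.max())
```

In practice each accelerometer and gyroscope axis would be processed independently in this way before segmentation.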

Data segmentation
In this study, data segmentation was implemented by the common sliding window method [36] to support further quantitative analysis of the dynamic multi-sensors time-series gait data. Here, a fixed-length sliding window with a two-thirds overlap between adjacent windows was designed to perform data segmentation, in order to maintain the intrinsic correlation in the dynamic time-series gait data. Because the length of the sliding window (i.e. the sliding window size) has a significant impact on both the exploitation of valuable gait features and gait classification, the sliding window size was selected according to the best generalization performance of our proposed model.
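The fixed-length sliding window with two-thirds overlap can be sketched as follows. The window length of 105 samples and the overlap of 70 samples are assumptions chosen for illustration (70 is two thirds of 105); the function name and the synthetic recording are hypothetical.

```python
import numpy as np

def sliding_windows(data, window, overlap):
    """Split a (time, channels) array into fixed-length overlapping segments."""
    data = np.asarray(data)
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the window length")
    if len(data) < window:
        raise ValueError("recording is shorter than one window")
    starts = range(0, len(data) - window + 1, step)
    return np.stack([data[s:s + window] for s in starts])

# Synthetic 10 s recording at 30 Hz with 5 channels (illustrative only).
recording = np.arange(300 * 5, dtype=float).reshape(300, 5)
segments = sliding_windows(recording, window=105, overlap=70)
print(segments.shape)  # (number of windows, window length, channels)
```

Each new window starts one third of a window after the previous one, so consecutive segments share two thirds of their samples.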

Training and testing scheme of our proposed model
In order to effectively evaluate the prediction ability of our proposed model, we built a training and testing sample data set including all multi-sensors gait data of the 20 subjects in the WARD database, with 10 gait patterns randomly selected from all 13 gait patterns for each subject: stand, sit, lie down, walk forward, walk left-circle, walk right-circle, turn left, turn right, go upstairs and go downstairs. In this study, 105 data segmentations of each axis in the accelerometer and gyroscope were arbitrarily extracted from the gait data of each participant. For all selected data segmentations, the total number of samples is given by
$$\mathrm{Samples} = \frac{N_{DL} - N_{OV}}{L_{SL} - N_{OV}} \tag{19}$$
where $N_{DL}$ represents the data length (i.e. 20 subjects × 5 repetitions of each activity × the length of each activity sequence of more than 10 seconds = 2311048), $N_{OV}$ denotes the overlap size and $L_{SL}$ is the segmentation length [40]. According to equation (19), the total number of samples was 66027.
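As a worked check of equation (19), the short sketch below reproduces the reported sample count. The segmentation length $L_{SL} = 105$ and the overlap $N_{OV} = 70$ are assumptions inferred from the figure of 105 and the two-thirds overlap; they are not stated explicitly as such in the text, and the quotient is rounded down to a whole number of samples.

```python
def count_samples(data_length, segment_length, overlap):
    """Number of full segments given by equation (19), rounded down."""
    step = segment_length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the segment length")
    return (data_length - overlap) // step

N_DL = 2311048   # total data length reported in the text
L_SL = 105       # assumed segmentation length
N_OV = 70        # assumed overlap: two thirds of 105
samples = count_samples(N_DL, L_SL, N_OV)
print(samples)   # 66027
```

Under these assumptions the result agrees with the 66027 samples reported above.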
The detailed scheme of training and testing is as follows. Seventy percent of all sample data were randomly selected as the training data set, while the remaining 30 percent were used as the testing data set. Note that the testing data are not included in the training set, so that the generalization ability can be evaluated objectively and accurately. In the training process, the backpropagation learning algorithm with the Adam update rule for gradient descent was selected to search for the optimal parameters of our classification model by minimizing the loss function. In this experiment, the initial learning rate of the Adam optimizer was set to 0.001, and the default parameters $\beta_1$ and $\beta_2$ were set to 0.9 and 0.999, respectively. All parameters of the classification models were randomly initialized with the Glorot uniform initializer.
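The random 70/30 split and a single Adam update with the stated hyperparameters can be sketched in NumPy as below. This is an illustration rather than the training code of this study: the function names, the random seed, the value of `eps` and the toy quadratic objective are assumptions.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Randomly partition sample indices into disjoint train and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(n_samples * train_fraction)
    return order[:n_train], order[n_train:]

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected mean
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected variance
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

train_idx, test_idx = split_indices(66027)
print(len(train_idx), len(test_idx))

# Toy objective ||w||^2 with gradient 2w (illustrative only).
w = np.array([1.0, -2.0])
state = {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}
for _ in range(3):
    w = adam_step(w, 2 * w, state)
print(w)
```

Because the split is a partition of one random permutation, no test sample can appear in the training set.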
In the testing process, the final results were averaged over all subjects, and all optimal parameters of the classification model were determined by the best generalization performance.
In addition, all deep learning programs were developed in TensorFlow using the Python language, and were run on a computer with an Intel(R) Core(TM) i5-6500 3.2 GHz CPU, 8.00 GB RAM and the Windows 7 operating system.

Evaluation criteria for our proposed model
In order to objectively evaluate the generalization performance of our proposed classifier, we adopted common assessment criteria for the gait classification task: accuracy, recall, precision and F1-score. These criteria are described as follows.
Accuracy measures the generalization ability of our proposed model to recognize gait patterns in the testing process, and is defined as [34]
$$\mathrm{Accuracy} = \frac{TP}{NP} \times 100\% \tag{20}$$
where $TP$ and $NP$ denote the total number of gait patterns correctly detected and the total number of gait patterns to be identified, respectively.
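As an illustration, accuracy and the per-class criteria used in this subsection can be computed from a confusion matrix as sketched below. The function name and the small two-class matrix are hypothetical, and the F1-score is taken in its standard form as the harmonic mean of precision and recall.

```python
import numpy as np

def classification_report(confusion):
    """Accuracy and per-class recall, precision and F1-score in percent.

    confusion[i, j] is the number of class-i samples predicted as class j.
    Every class is assumed to have at least one true and one predicted sample.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)                 # correctly identified samples per class
    fn = cm.sum(axis=1) - tp         # class-i samples assigned elsewhere
    fp = cm.sum(axis=0) - tp         # other samples assigned to class i
    accuracy = 100.0 * tp.sum() / cm.sum()
    recall = 100.0 * tp / (tp + fn)
    precision = 100.0 * tp / (tp + fp)
    f1 = 2.0 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1

# Hypothetical confusion matrix for two gait classes.
cm = [[8, 2],
      [1, 9]]
accuracy, recall, precision, f1 = classification_report(cm)
print(accuracy, recall, precision.round(2), f1.round(2))
```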
Recall is considered as a measure of the classification ability to specify gait with class $j$ in the confusion gait between class $i$ and class $j$, and is defined as [34]
$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\% \tag{21}$$
Precision is employed to objectively evaluate the ability of the classification model to exactly discriminate gait patterns with class $i$ in the confusion gait between class $i$ and class $j$, and is defined as [34]
$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\% \tag{22}$$
where $TP$ and $FP$ denote the number of gait patterns accurately identified as class $i$ and class $j$, respectively. F1-score is utilized to objectively assess the ability to exactly predict the class of each gait pattern in the testing process, and is defined as [34].