The current investigation validates seven step-counting algorithms on a common dataset of walking trials at varying speeds and conditions. Four of these algorithms (Lee, Femiano auto, Femiano WPD, and ADEPT) were selected as state-of-the-art based on the literature on step estimation using accelerometry. The other three (CPIWv1, CPIWv2, and CSEM) represent ActiGraph’s existing and partner algorithms for step count estimation. The investigation builds on previous efforts to validate step-counting algorithms, specifically those based on wrist accelerometry. Wrist-based devices have been reported to detect steps less accurately than waist-based devices [20]. Wrist-worn devices, however, can achieve higher compliance than traditional hip- or waist-worn monitors [6] thanks to their convenience and comfort. Algorithms for processing wrist-derived raw accelerometer data are evolving rapidly. Validation efforts such as the current investigation are therefore critical for establishing the accuracy of step count algorithms specific to the wrist location, so that these algorithms can be integrated with confidence into clinical trials utilizing DHT.
The need for fit-for-purpose algorithms
The current investigation is the first to systematically evaluate a diverse group of step-counting algorithms. These algorithms represent smartphone-based step detection (Lee et al.), smartphone-accelerometry-based algorithms adapted for wrist-accelerometry-based step detection (Femiano et al.), and fit-for-purpose wrist-accelerometry-based step counting algorithms (ADEPT, CPIWv1, CPIWv2, and CSEM). In the current investigation, Lee et al. showed considerably lower accuracies than the other algorithms. Lee et al. was originally developed for smartphone-based step detection, with average accuracies of over 98.6% for any combination of step mode (walking, running, etc.) and device pose (texting, swinging, pocket, etc.). The results from the current investigation, however, suggest that this algorithm cannot be used “as is” for wrist-worn accelerometry. This finding says nothing about the performance of Lee et al. in its original use as a smartphone-based step detector. The Femiano algorithms represent a class of algorithms that were originally developed for smartphone-based step detection and were later tuned and validated for wrist accelerometry. We implemented Femiano auto and Femiano WPD as reported in [16], and no further tuning was performed on the current dataset. Even with the wrist-specific optimization described in [16], the Femiano algorithms achieved slightly lower accuracies than those previously reported (95.9% for Femiano WPD and 94.2% for Femiano auto [16]). These results highlight the need for developing and validating fit-for-purpose algorithms such as ADEPT, CPIW, and CSEM for wrist-accelerometry-based step count estimation.
Strengths and weaknesses of current algorithms
All algorithms developed for wrist-worn accelerometry estimated step counts at acceptable accuracies. Most algorithms were agnostic to the device type (GT9X vs CPIW); the exceptions were Femiano auto and ADEPT for the 1MWT at a comfortable speed (Table 4A). None of the algorithms showed significant differences (p > 0.05) in accuracy with respect to device placement (left wrist vs right wrist) (Table 4B). These results suggest that the algorithms are adaptable and robust in extracting step counts from wrist accelerometry. The ML-based algorithms (CPIWv1, CPIWv2) and CSEM’s algorithm outperformed the other algorithms (ADEPT, Lee, Femiano auto, and Femiano WPD). ML-based algorithms are data-driven and require training data for their development, whereas CSEM’s algorithm is driven by a movement frequency detector that does not require specific training. The other algorithms employ traditional step-by-step processing of the raw signals to extract step count. They do not require training data as such, but their parameters are highly specific to one dataset and may not be appropriate for other datasets, populations, or conditions. ADEPT, a template-based algorithm, requires a predefined pattern. We used a publicly available pattern [11], [12], which may not be appropriate for patient populations or different walking conditions.
For widespread adoption of these algorithms across clinical trials, the algorithm parameters might need to be tuned accordingly, particularly for algorithms such as Femiano et al. and ADEPT, where slight modifications to certain parameters can change accuracy substantially. Femiano et al. reported that empirical tuning of the parameters of the original version (by Gu et al. [18]) was needed to optimize accuracy. For ADEPT, a slight change in the ‘pattern_dur_grid’ parameter significantly affected the accuracies; the parameter was set to [0.72–1.7] (i.e., a template pattern duration between 0.72 and 1.7 seconds). These values were selected based on normal stride durations for healthy young and older adults [21]. In general, tuning these parameters can be subjective, ad hoc, data-specific, and time-consuming. This restricts the adoption of such algorithms on real-world data, where ground truth is not available. If these algorithms are to be used for different populations in the future, their parameters need to be tuned on a controlled dataset with ground truth before being applied to real-world data. Even then, accuracy remains uncertain without validation against ground truth data collected in the RW. Further, the performance discrepancies of ADEPT and Femiano et al. can also be attributed to the types of activities included in the current dataset. Our dataset includes overground walking trials at three categories of self-selected speed (comfortable, slow, and fast) and in two categories with distinct hand postures (hand-in-pocket and phone-in-hand). The Femiano et al. dataset, in contrast, had walking trials at a comfortable speed but under different conditions (walking, running, Nordic walking, arm movement with and without walking). The average length of a walking bout in the Femiano et al. validation data was 5.22 ± 0.84 minutes.
ADEPT was likewise developed on a dataset of long (bout range: 2.5-4 minutes, distance: 1500 feet), straight walking bouts without any turns [11]. In the current investigation, all walking trials were shorter (1 minute long) except for the 6MWT. Interestingly, both the Femiano et al. algorithms and ADEPT showed their highest accuracies for the 6MWT, the longest walking trial in the current dataset.
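To make the sensitivity to the template duration range concrete, the sketch below shows what a stride-duration grid of 0.72 to 1.7 seconds implies. Assuming two steps per stride, a stride duration of 0.72 s corresponds to roughly 167 steps per minute and 1.7 s to roughly 71 steps per minute, so strides outside this cadence range cannot be matched by any rescaled template. This is an illustration only, not the ADEPT implementation: the function names, the number of grid points, and the 100 Hz sampling rate used in the usage note are assumptions made for this example.

```python
import numpy as np


def stride_duration_grid(min_s=0.72, max_s=1.7, n=5):
    """Evenly spaced candidate stride durations in seconds (grid size is an assumption)."""
    return np.linspace(min_s, max_s, n)


def cadence_steps_per_min(stride_duration_s):
    """Convert stride duration to cadence, assuming two steps per stride."""
    return 120.0 / np.asarray(stride_duration_s, dtype=float)


def rescale_template(template, duration_s, fs_hz):
    """Linearly interpolate a one-stride template so it spans duration_s at fs_hz."""
    n_out = int(round(duration_s * fs_hz))
    x_old = np.linspace(0.0, 1.0, len(template))
    x_new = np.linspace(0.0, 1.0, n_out)
    return np.interp(x_new, x_old, template)
```

For example, at an assumed sampling rate of 100 Hz, `rescale_template(template, 0.72, 100)` returns a 72-sample template and `rescale_template(template, 1.7, 100)` returns a 170-sample one. Narrowing or widening the grid therefore directly changes which walking speeds a template-matching approach is able to capture.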
Current limitations and future recommendations
The current investigation uses raw acceleration data recorded during walking tests, which is a simpler task than step counting in the RW. For application in the RW, the algorithms need to specifically “detect” walking bouts before estimating step count. All of these algorithms (except for Lee et al.) include a walking bout detection stage before extracting step count information. Activity or walking bout classification applied as a top layer before extracting gait features such as step count can help to isolate walking segments, and can increase accuracy and computational efficiency by avoiding the processing of non-walking data. Lastly, the current investigation uses the algorithms with their default parameters (and, for ADEPT, the default template pattern); no individual optimization was performed, as it was outside the scope of this investigation. The performance of these algorithms would likely improve further if their parameters were tuned to the specific dataset or walking conditions.
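The two-stage structure described above (walking bout detection as a top layer, followed by step counting on walking segments only) can be sketched as follows. This is a minimal illustration under stated assumptions and does not reproduce any of the evaluated algorithms. The detector and the step estimator are hypothetical: a window is labelled as walking when most of its spectral power lies in an assumed 0.5 to 3.0 Hz band, and the dominant frequency in that band is treated as the step frequency. The window length and all thresholds are illustrative values, not tuned parameters.

```python
import numpy as np

BAND_HZ = (0.5, 3.0)  # assumed plausible step-frequency band


def _band_power(window, fs_hz):
    """Return frequencies, power spectrum, and in-band mask for a demeaned window."""
    x = window - np.mean(window)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_hz)
    mask = (freqs >= BAND_HZ[0]) & (freqs <= BAND_HZ[1])
    return freqs, power, mask


def window_is_walking(window, fs_hz, min_std=0.05, min_power_ratio=0.5):
    """Hypothetical detector: enough movement, and mostly in the step band."""
    if np.std(window) < min_std:
        return False  # too little movement to be walking
    _, power, mask = _band_power(window, fs_hz)
    total = power[1:].sum()  # exclude the DC bin
    return total > 0 and power[mask].sum() / total >= min_power_ratio


def estimate_steps(window, fs_hz):
    """Hypothetical estimator: dominant in-band frequency times window duration."""
    freqs, power, mask = _band_power(window, fs_hz)
    dominant_hz = freqs[mask][np.argmax(power[mask])]
    return int(round(dominant_hz * len(window) / fs_hz))


def count_steps_with_gating(mag, fs_hz, window_s=5.0):
    """Count steps only in non-overlapping windows classified as walking."""
    n = int(round(window_s * fs_hz))
    total = 0
    for start in range(0, len(mag) - n + 1, n):
        window = mag[start:start + n]
        if window_is_walking(window, fs_hz):
            total += estimate_steps(window, fs_hz)
    return total
```

Because non-walking windows are skipped before the estimator runs, spurious counts from rest or unrelated arm movement are avoided, and the cost of step estimation is paid only for the windows that need it.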