Automated Gait Detection in Older Adults during Daily-Living using Self-Supervised Learning of Wrist-Worn Accelerometer Data: Development and Validation of ElderNet

Progressive gait impairment is common in aging adults. Remote phenotyping of gait during daily living has the potential to quantify gait alterations and evaluate the effects of interventions that may prevent disability in the aging population. Here, we developed ElderNet, a self-supervised learning model for gait detection from wrist-worn accelerometer data. Validation involved two diverse cohorts, including over 1,000 participants without gait labels, as well as 83 participants with labeled data: older adults with Parkinson’s disease, proximal femoral fracture, chronic obstructive pulmonary disease, congestive heart failure, and healthy adults. ElderNet presented high accuracy (96.43 ± 2.27), specificity (98.87 ± 2.15), recall (82.32 ± 11.37), precision (86.69 ± 17.61), and F1 score (82.92 ± 13.39). The suggested method yielded superior performance compared to two state-of-the-art gait detection algorithms, with improved accuracy and F1 score (p < 0.05). In an initial evaluation of construct validity, ElderNet identified differences in estimated daily walking durations across cohorts with different clinical characteristics, such as mobility disability (p < 0.001) and parkinsonism (p < 0.001). The proposed self-supervised gait detection method has the potential to serve as a valuable tool for remote phenotyping of gait function during daily living in aging adults.


INTRODUCTION
Aging is associated with progressive loss of motor function. These deficits are heterogeneous and may manifest as reduced walking speed, poor balance, increased gait variability, increased fear of falling, and shorter stride length [1][2][3]. Objective measures of gait obtained during brief supervised gait testing in a lab or clinic predict survival, varied adverse health outcomes, and loss of independent living [4][5][6]. However, these brief assessments provide only a limited snapshot of an individual's gait abilities and may not reflect function and variability during the manifold demands of daily living [7,8]. Advances in unobtrusive sensor technology afford investigators the opportunity to obtain a more comprehensive assessment of mobility via remote multi-day recordings of daily living. However, the automated analytic tools employed for many commercially available devices focus nearly exclusively on healthy younger adults and do not account for the impairments observed in older adults during device development or validation [9,10]. Hence, there is an urgent need for the development and validation of automated tools to quantify daily living gait across the full health spectrum of older adults who reside in the community setting [11,12].
Previous studies investigating real-world gait employed accelerometers worn on the lower back, leveraging the inherent quasi-periodicity of lumbar movement during walking [13]. While these studies have demonstrated the potential of assessing daily living gait, sensor placement on the lower back may present limitations for long-term adherence due to potential discomfort [14]. A different approach is to ask participants to wear a wrist-worn accelerometer. Wrist-worn accelerometers have gained widespread use to measure daily living physical activity [15][16][17][18]. In this regard, the ubiquity and popularity of smartwatches make wrist-worn accelerometers a practical choice for ensuring high compliance in daily living studies. Wrist-worn accelerometers enable the extraction of a wide range of daily living behaviors, including sleep patterns [19], circadian metrics [20], and levels of physical activity [21]. While estimated physical activity levels can provide many insights [22,23], to date, most studies using a wrist-worn accelerometer lacked detailed and high-resolution information about other crucial facets of gait quality [21]. Therefore, recent efforts have focused on leveraging these accelerometers to assess walking and gait quality.
The first step in deriving gait metrics from an accelerometer is the detection of gait sequences from the raw accelerometer signals [24,25]. Gait detection from a wrist-worn accelerometer is more challenging than from other locations, such as the lower limbs or lower back, because of non-gait-related hand movements and because wrist movements often deviate from the expected periodic swinging during the gait cycle. This may occur, for instance, when an individual walks while simultaneously engaging in other activities, such as texting. This challenge is exacerbated for older adults and people with gait disturbances, such as those with Parkinson's disease, who manifest reduced arm swing during walking [26]. People with Parkinson's disease also exhibit symptoms of tremor and dyskinesia, which impact wrist movements and contribute to an overall less stable and consistent gait pattern, factors that complicate gait detection [24].
Classical gait detection algorithms employ signal processing techniques, such as peak detection and wavelet analysis, to extract features from both the time and frequency domains [25,27]. These features are then used to identify gait sequences based on the repeated periodic waveforms manifested during gait. However, the complex wrist movements render the differentiation between gait and non-gait movements very challenging. Alternative approaches are needed to detect gait from wrist-worn accelerometers.
Previous studies addressed this goal by employing supervised machine learning algorithms that were trained to identify patterns in the signal associated with gait [17,28,29]. Kluge et al. [25] conducted a comprehensive analysis of gait detection algorithms using accelerometer data from lower-back and wrist-worn accelerometers. The algorithms were trained on data from healthy young adults and subsequently tested on diverse subsets of adults from the Mobilise-D technical validation study [30], including older adults with and without varied diagnoses. They found, not surprisingly, that algorithms based on lower-back data outperformed wrist-based algorithms. Yet, the reduced performance of wrist-based algorithms may be attributed, in part, to being trained on data from healthy young adults, potentially leading to suboptimal performance among older adults. This highlights the need to optimize wrist-based algorithms for older adults, who more commonly show heterogeneous gait abnormalities that do not occur as frequently in younger adults.
The best-performing wrist-based algorithm identified in the study by Kluge et al. was initially developed and validated in Brand et al. [24]. In that study, we employed a supervised convolutional neural network with a U-Net architecture [31] for gait detection, focusing on older adults and people with Parkinson's disease (PD). The results were then compared to those of a control group comprising healthy young adults. Our findings indicated that biologically meaningful measures of gait quality (e.g., cadence and gait regularity) and quantity (e.g., daily walking duration) could be derived from a wrist-worn accelerometer. However, it is crucial to note that the model's performance was reduced when applied to older adults and individuals with PD, compared to the healthy young adult control group. An important impediment to training a supervised model that can be applied to older adults and varied clinical conditions derives from the scarcity of ground-truth labels indicating the temporal location of the gait sequences, especially for recordings of unsupervised movement during daily living.
Recently, there has been a growing interest in leveraging self-supervised learning (SSL) methods to overcome the gap imposed by the shortage of labeled data [32]. SSL generally comprises two main stages. The first stage involves learning feature representations of varied signals using a substantial amount of unlabeled data, which can be achieved through methods such as multi-task learning (MTL) [33] and contrastive learning [32,34]. An example of contrastive learning is the SimCLR method: "A Simple Framework for Contrastive Learning of Visual Representations" [32]. In these approaches, the model's objective is to predict characteristics of the signal that do not require any labels. This stage is commonly referred to as the 'pretext' stage. The second stage involves fine-tuning the SSL model with a smaller set of labeled data in a supervised manner for a downstream task (e.g., gait detection).
The SSL approach has demonstrated significant potential in varied human activity recognition tasks [35-37]. For example, Yuan et al. [38] utilized the UK Biobank dataset, which comprised daily living recordings from a wrist-worn accelerometer, to develop an SSL model for activity recognition that exhibited improved performance in several tasks and datasets. Small et al. [39] fine-tuned this SSL model for gait detection on a semi-living dataset, termed OxWalk, which included approximately one hour of recording in a home environment. However, the dataset used for fine-tuning included only healthy adults (N = 39, mean age = 38.5 years). Thus, their model may not be optimized for older adults or individuals with gait disturbances.
Here, we developed and evaluated a gait detection deep learning approach, termed ElderNet, that was oriented and optimized for older adults and, in particular, those who might have impaired gait.

Performance of the Gait Detection Algorithm
To develop ElderNet, an SSL model was trained using the MAP database, comprising 950 participants. Next, the labeled data from Mobilise-D was used for fine-tuning ElderNet and evaluating its performance (Fig. 1). Eighty-three participants were included in the Mobilise-D dataset. Table 1 summarizes the characteristics of the Mobilise-D dataset. The model predictions made by ElderNet significantly outperformed the two other state-of-the-art algorithms [24,39] both in terms of accuracy and F1 score. The median accuracy for ElderNet was 96.86%, surpassing the U-Net at 93.69%, and OxWalk at 92.83% (p < 0.001). In terms of F1 scores, ElderNet achieved a score of 86.52%, outperforming the U-Net and OxWalk models, which achieved scores of 67.29% (p = 0.046) and 73.51% (p < 0.01), respectively (Fig. 2). Figure 3 shows a representative example of a raw acceleration signal containing gait sequences, along with the predictions of the different models and the corresponding ground-truth labels.

Exploring Construct Validity
To examine the construct validity of the output of ElderNet, we first applied it to an unseen portion of the MAP dataset (N = 157) that was not utilized during the training of ElderNet. Table 2 summarizes the characteristics of this test dataset. A preliminary analysis based on the detected gait events revealed a few statistically significant differences across different subject populations and disease cohorts. The average daily walking duration varied among participants in different demographic and clinical groups, as demonstrated in Fig. 4. A significant difference in daily walking durations was observed between age groups, indicating a decline in walking activity with age, supporting its utility. To account for this, we performed partial correlation analyses, adjusting for age, sex, and BMI in subsequent comparisons, and found that the differences between groups remained statistically significant. Furthermore, Fig. 4 illustrates that participants with a mobility disability score of 0 (no mobility disability) walked significantly more minutes per day than those with scores of 2 (p < 0.01) and 3 (p < 0.01). Additionally, participants with a mobility disability score of 1 also showed a significant difference from those with a score of 3 (p < 0.048). Examining participants with different parkinsonism scores in terms of the number of parkinsonian signs, we observed that individuals without any parkinsonian signs walked significantly more than those with 1 sign (p < 0.001) or two or more signs (p < 0.001).

DISCUSSION
In this work, we developed and validated a gait detection algorithm (ElderNet), specifically designed for older adults with and without gait impairments. ElderNet demonstrated superior performance compared to the two state-of-the-art models. It achieved the highest accuracy, significantly surpassing the OxWalk model [39]. Moreover, its F1 score was higher than those of both the OxWalk and the U-Net [24] models. Additionally, ElderNet achieved at least comparable results in other metrics such as specificity, recall, and precision. The imbalance between gait and non-gait sequences in daily living is often expressed by a significant trade-off between precision and recall [24]. While the U-Net and the OxWalk models indeed exhibited such a trade-off, our model stood out with stable precision and recall, resulting in a high F1 score. This suggests that ElderNet is well-suited for daily living data, capable of identifying most existing gait sequences (i.e., high recall) with high confidence (i.e., high precision).
Gait detection algorithms often lack labeled data from daily living datasets, particularly for older adults and individuals with gait impairments. This scarcity of labeled data prevents the algorithms from being optimized for these populations, whose gait signals can be diverse and abnormal. Here, an SSL method was utilized to address this gap. First, a pre-trained model trained on the UK Biobank data was leveraged. The UK Biobank dataset consists of 100,000 participants who wore a wrist-worn accelerometer in their daily lives, making it the largest dataset of its kind. Due to its size, we anticipated benefits from incorporating this pre-trained model into our SSL phase. Indeed, utilizing this pre-trained model led to a higher F1 score (82.59) than training the SSL model from scratch (F1 score of 77.15, Supplementary Table S1).
Our objective was to develop a gait detection algorithm tailored for older adults, aiming to bridge the current accuracy gap observed in algorithms designed for this population [24]. While the UK Biobank dataset included a large number of older adults, its participants were recruited in the age range of 45-69, with a mean age of 62 for the visits that involved wearing the wrist accelerometer. To address this limitation, we leveraged the MAP dataset, which has a mean age of 83 years (range 62-103) and more than 1000 participants. We found that integrating the MAP data into our combined model enhanced its overall performance (Supplementary Table S2). This improvement may be attributed to the fact that the extensive MAP data used to train ElderNet better represented the characteristics of the target population, i.e., older adults, which were also reflected in the Mobilise-D test set.
Two different SSL approaches were explored, namely MTL and SimCLR. Overall, both methods yielded similar performance, with a slight advantage favoring the MTL results, but with no significant difference (Supplementary Table S3). These findings are consistent with a previously published paper that observed similar results for SimCLR and MTL in human activity recognition tasks using acceleration data from the wrist [35]. Finally, we compared ElderNet with its supervised counterpart (Supplementary Table S4). Remarkably, our model exhibited superior performance compared to its supervised counterpart, achieving an F1 score of 84.74 for ElderNet compared to 79.21 for the supervised model. This underscores the potential of leveraging large unlabeled data to learn feature representations of the data.
In this study, the Mobilise-D data was utilized for the fine-tuning phase, leveraging its unique characteristics. Firstly, the dataset incorporates a robust reference system, the INDIP system, whose accuracy has been previously validated against an optical motion capture system. The results showed excellent absolute agreement (ICC > 0.95) within a laboratory setting [43,44], establishing the INDIP system as a reliable method for obtaining reference data in real-world environments. Moreover, the Mobilise-D dataset contains daily living data from older adult populations, particularly those with specific medical conditions that affect mobility. Notably, this cohort includes older adults who utilize walking aids, exhibiting abnormal gait signals from the wrist accelerometer, thereby complicating gait detection [25]. However, ElderNet exhibited high performance on this subcohort, showcasing its generalizability to diverse gait patterns. While we acknowledge that the 2.5-hour assessment used in Mobilise-D data may not fully capture the complete variability of real-world walking, this dataset remains one of the largest available with comprehensive gait and non-gait reference information across various disease indications with labels.
The establishment of ElderNet sets the stage for subsequent studies aimed at extracting meaningful digital mobility outcomes related to gait quantity and quality from the identified gait sequences [17,45]. Gait measures have already been shown to serve as potential biomarkers for age-related health outcomes [5,46]. Notably, gait speed has been shown to be associated with survival rates in older adults [47]. A recent study has demonstrated that using a simple model based solely on mean acceleration data can facilitate the prodromal diagnosis of Parkinson's disease [48]. We hypothesize that incorporating higher-level gait measures into such models can augment their predictive capabilities, leading to better identification of multiple neurological conditions that manifest with gait impairments.
It is important to highlight that we standardized the sampling rate of all datasets to 30 Hz to align with the frequency used in the pre-trained UK Biobank model. This relatively low sampling rate allowed for the efficient use of long-duration recordings. Exploring the ramifications of using different sampling rates should be addressed in future work. While the MAP data utilized in the SSL phase and the participants from the Mobilise-D data shared similarities in their emphasis on older adults, there were notable differences between them. In particular, the average age of the MAP participants is higher (83 years) than that of the Mobilise-D participants (72 years). Additionally, the Mobilise-D dataset predominantly includes participants with specific medical conditions, unlike the MAP data, which is not exclusively focused on populations with diseases. We attempted to address this by standardizing both datasets (MAP and Mobilise-D) using a zero-mean unit-variance whitening [35]. However, we observed that standardizing the MAP data, but not the Mobilise-D data, resulted in improved outcomes (Supplementary Figure S1).
The data was segmented into non-overlapping 10-second windows, both in the SSL and fine-tuning steps, to align with the UK Biobank pre-trained model, which utilizes the same window size. Consequently, we defined windows containing 5 seconds or more of gait as gait windows in our labeled dataset, omitting gait sequences shorter than five seconds. However, this approach can lead to an underestimation of the number of gait sequences that occur in daily living. A potential consequence of this approach is that the estimated daily walking duration observed in the construct validity step (recall Fig. 4) was slightly lower than reported in the literature [49]. To address this issue, we explored the use of dense labeling, involving a shift to per-sample labels and outputs in the fine-tuning model. Despite this modification, the model's performance was found to be lower compared to using window-based labeling, and there was no meaningful change observed in the estimated daily walking time (Supplementary Table S5, Supplementary Figure S2). This suggests that the alternative dense labeling strategy does not provide a significant improvement in capturing daily walking patterns.

CONCLUSIONS
This study introduced ElderNet, a novel gait detection model developed and validated for older adults with and without known health conditions that can affect gait. The model demonstrated high performance in accurately identifying real-world gait sequences extracted from wrist recordings. When applied to unlabeled daily living data, ElderNet successfully revealed differences between different clinical groups, supporting further clinical testing of its efficacy. Given that many older adults experience gait impairments, a reliable system for gait quantification is crucial for obtaining a comprehensive characterization of gait function remotely during daily living. ElderNet addresses that need.

METHODS
This study was composed of four stages:
1. Self-supervised learning: training an SSL model on a large amount of unlabeled activity data to learn the feature representation of daily living acceleration data.
2. Fine-tuning: utilizing the model from the SSL step for training a supervised gait detection system (ElderNet) using labeled data.
3. Gait Detection Test Phase: comparing the results of the gait detection model with 2 state-of-the-art algorithms on an independent test set.
4. Exploring construct validity: applying ElderNet on another unseen dataset to examine the potential of gait-based analysis for identifying differences between cohorts of different clinical characteristics.

Preprocessing
To maintain uniformity in comparison with state-of-the-art algorithms, we standardized the acceleration data across the various cohorts by resampling to a 30 Hz resolution and dividing the signals into 10-second non-overlapping windows, following a methodology similar to the UK Biobank study [38,39]. We considered a window to be a gait window only when half or more of it was labeled as gait. Given that the typical gait frequency is less than 10 Hz, the 30 Hz sampling rate exceeds the corresponding Nyquist rate of 20 Hz, preventing any loss of essential signal information.
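The window-labeling rule above can be sketched as follows. This is a minimal illustration, not the study's code: the function name and the handling of trailing samples that do not fill a whole window are assumptions, and resampling to 30 Hz is taken as already done.

```python
FS_HZ = 30                      # sampling rate after resampling (from the text)
WINDOW_S = 10                   # window length in seconds (from the text)
WINDOW_LEN = FS_HZ * WINDOW_S   # 300 samples per window


def window_labels(sample_labels, window_len=WINDOW_LEN, min_gait_fraction=0.5):
    """Collapse per-sample gait labels (0/1) into one label per window.

    Windows do not overlap. A window is labeled gait (1) when at least
    `min_gait_fraction` of its samples are gait. Trailing samples that do
    not fill a whole window are dropped (an assumption of this sketch).
    """
    n_windows = len(sample_labels) // window_len
    labels = []
    for w in range(n_windows):
        chunk = sample_labels[w * window_len:(w + 1) * window_len]
        gait_fraction = sum(chunk) / window_len
        labels.append(1 if gait_fraction >= min_gait_fraction else 0)
    return labels


# Three consecutive windows containing 4 s, 6 s, and 10 s of gait.
demo = ([1] * 120 + [0] * 180) + ([1] * 180 + [0] * 120) + [1] * 300
print(window_labels(demo))  # [0, 1, 1]
```

Only the second and third windows reach the half-window threshold, which is why gait bouts shorter than five seconds do not produce a gait window on their own.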

Participants and Wearable Sensors
Participants were community-dwelling older adults enrolled in an ongoing cohort study of chronic conditions of aging, known as the Rush Memory and Aging Project (MAP) [40][41][42]. Participants wore the GENEActiv device (Activinsights Ltd.; Cambridgeshire, UK), a triaxial accelerometer, on their non-dominant wrist for 24 hours/day for up to ten consecutive days. Acceleration data were sampled at 40/60 Hz, with a range of ± 8 gravitational acceleration units (g). This dataset lacks labels indicating the presence or absence of gait. The free-living nature of this data enables us to expose the SSL model to a variety of human activities and gait sequences, supporting the extraction of meaningful signal features.

Self-Supervised Approaches
Typically, SSL models consist of a main trunk, usually a convolutional neural network, referred to as a feature extractor, which produces a vector containing feature representations. The feature vector is then adjusted to a different dimension to match the 'pretext' task associated with the chosen SSL approach. In this study, we investigated two SSL approaches, namely MTL and contrastive learning (SimCLR). We selected these approaches based on their demonstrated superior performance in downstream human activity recognition tasks, as identified through an extensive exploration of various SSL approaches using wearable sensors [35].
In the MTL approach, each acceleration window undergoes data augmentation, where the objective of the model is to predict the augmentation of the signal (pretext task). Following the methodology of Yuan et al. [38], 4 distinct augmentations were employed: 1. Reversing the signal. 2. Permutation of different segments of the window, with each segment comprising 10 samples. 3. Time warping, which alters arbitrary segments of the signal by stretching and compressing them. 4. Scaling each of the acceleration axes with a random factor. Each window has a random probability of undergoing each of the augmentations, and the model predicts whether the window underwent the augmentation, resulting in four binary outputs. The model's loss is calculated using the cross-entropy function for all four augmentations and then averaged to produce the final loss.
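To make the pretext task concrete, the sketch below generates one MTL training sample and computes the averaged loss. It is an illustration under stated assumptions rather than the implementation of Yuan et al.: the probability of 0.5 per augmentation, the scaling range, the order in which augmentations are applied, and the simplified per-sample time warp are all hypothetical choices. A window is represented as a list of (x, y, z) tuples.

```python
import math
import random

SEGMENT_LEN = 10  # samples per permuted segment (from the text)


def reverse(window):
    return window[::-1]


def permute_segments(window, rng, segment_len=SEGMENT_LEN):
    segments = [window[i:i + segment_len] for i in range(0, len(window), segment_len)]
    rng.shuffle(segments)
    return [sample for seg in segments for sample in seg]


def time_warp(window, rng, strength=0.2):
    """Simplified warp: jitter the time steps, then linearly interpolate."""
    n = len(window)
    steps = [1.0 + rng.uniform(-strength, strength) for _ in range(n - 1)]
    total = sum(steps)
    times = [0.0]
    for s in steps:
        times.append(times[-1] + s * (n - 1) / total)
    out = []
    for t in times:
        i = min(int(t), n - 2)
        frac = t - i
        out.append(tuple(a + frac * (b - a) for a, b in zip(window[i], window[i + 1])))
    return out


def scale_axes(window, rng, low=0.8, high=1.2):
    factors = [rng.uniform(low, high) for _ in range(3)]  # one factor per axis
    return [tuple(v * f for v, f in zip(sample, factors)) for sample in window]


def make_mtl_sample(window, rng, p=0.5):
    """Return the augmented window and four binary targets (1 = applied)."""
    augmentations = [
        reverse,
        lambda w: permute_segments(w, rng),
        lambda w: time_warp(w, rng),
        lambda w: scale_axes(w, rng),
    ]
    targets = []
    for augment in augmentations:
        applied = rng.random() < p
        if applied:
            window = augment(window)
        targets.append(int(applied))
    return window, targets


def mtl_loss(probs, targets, eps=1e-12):
    """Binary cross-entropy per augmentation, averaged over the four tasks."""
    losses = [
        -(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps)))
        for p, t in zip(probs, targets)
    ]
    return sum(losses) / len(losses)
```

The four targets correspond to the four binary outputs of the model's head; a model that outputs 0.5 for every task yields a loss of ln 2, which is the chance-level baseline for this objective.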
The SimCLR contrastive learning method also employs data augmentations. In SimCLR, each window undergoes two augmentations, resulting in two distinct views of the same window. Views originating from the same source window are considered "positive" pairs, while views stemming from different sources are considered "negative" pairs. For instance, if we initially have N windows of acceleration signal, the transformation yields 2N views of the windows. Thus, for every positive pair of windows, there are 2N-2 negatives. In this study, we utilized a 3D rotation transformation as the augmentation function. In this augmentation, a random axis in 3D and a random rotation angle are drawn from a uniform distribution, and the corresponding rotation is applied to the window. This can be considered as a way to simulate different sensor placements [34], making it especially effective for wrist accelerometers where the axis orientation frequently changes. We specifically chose this augmentation due to its demonstrated superior performance in downstream human activity recognition tasks associated with the SimCLR approach [34]. The different views of the windows pass through the model encoder (i.e., feature extractor), resulting in an output that reflects the different windows as feature vectors. Next, a contrastive loss function is employed to calculate the relationships between pairs of vectors using cosine similarity. The objective of the loss function is to maintain proximity in the feature space for vector representations of "positive" pairs while ensuring that "negative" pairs remain distant in this space. This loss is also known as the normalized temperature-scaled cross-entropy loss (NT-Xent) [32].
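The rotation augmentation and the NT-Xent loss can be illustrated with the following pure-Python sketch. It is not the study's implementation: the temperature of 0.1, the way the random axis is sampled (normalizing a Gaussian vector), and the function names are assumptions made for illustration. The rotation uses Rodrigues' formula, and the loss assumes that `views_a[i]` and `views_b[i]` are embeddings of two views of the same window.

```python
import math
import random


def random_rotation(window, rng):
    """Rotate every (x, y, z) sample about one random axis by one random angle."""
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]                 # unit rotation axis
    theta = rng.uniform(0.0, 2.0 * math.pi)      # rotation angle
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for v in window:
        kxv = (k[1] * v[2] - k[2] * v[1],
               k[2] * v[0] - k[0] * v[2],
               k[0] * v[1] - k[1] * v[0])        # cross product k x v
        kdv = sum(a * b for a, b in zip(k, v))   # dot product k . v
        rotated.append(tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1 - c)
                             for i in range(3)))
    return rotated


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(views_a, views_b, temperature=0.1):
    """NT-Xent over 2N embeddings; each view has 1 positive and 2N-2 negatives."""
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)                    # index of the positive view
        denom = sum(math.exp(cosine(z[i], z[k]) / temperature)
                    for k in range(2 * n) if k != i)
        total += -math.log(math.exp(cosine(z[i], z[j]) / temperature) / denom)
    return total / (2 * n)
```

Because a rotation preserves the magnitude of each acceleration sample, the augmented view changes only the apparent sensor orientation. The loss decreases as the two views of each window become more similar to each other than to views of other windows.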

Model Configurations
To enhance the model's performance, a pre-trained model was incorporated as the feature extractor of the SSL model. Specifically, we employed a model developed by Yuan et al. [38], which utilized the diverse UK Biobank dataset to train an SSL model using the MTL approach. The architecture of the pre-trained model was ResNet-V2 with 18 layers. The input acceleration data passed through the pre-trained model, resulting in an intermediate output, a vector with dimensions (1024, 1). Subsequently, we introduced additional layers on top of the pre-trained model, referred to as the model's head. The intermediate vector then passed through these additional layers to produce the final output, suitable for the pretext task. While the weights of the pre-trained model were frozen during the training of our model, meaning they were not updated during gradient calculations, the weights of the model's head were updated. This modification to the pre-trained model allowed us to tailor our model to older adults using the MAP data, considering that the pre-trained UK Biobank model did not exclusively focus on older adults. We termed our combined model ElderNet. Figure 1 illustrates the pipeline of our model.
We experimented with 3 different versions of the model's head, each with increasing complexity: 1. Using 3 fully-connected layers without non-linearity between them. 2. Using the same fully-connected layers, but with a ReLU non-linear activation function between the layers. 3. Utilizing the U-Net with an architecture similar to the model employed during the testing phase. Supplementary Table S6 provides more details on the models' hyperparameters and implementation.

Participants and Wearable Sensors
For optimizing and evaluating algorithms for gait detection, a dataset from the Mobilise-D technical validation study was used. This multi-center observational dataset, originally aimed at validating real-world digital mobility outcomes, included different patient and healthy populations. A comprehensive description of the study's experimental protocol, incorporating all inclusion and exclusion criteria, can be found in [30]. Briefly, 112 participants across five different disease cohorts and one cohort of healthy adults were studied. The patient groups included chronic obstructive pulmonary disease, Parkinson's disease, multiple sclerosis, proximal femoral fracture, and congestive heart failure patients. We excluded the multiple sclerosis group (N = 20, mean age = 48.7 years) as we aimed to customize the model to older adults and the multiple sclerosis cohort also comprises young adults. In addition, nine participants were excluded due to missing data, resulting in 83 participants overall used for this step. All participants gave written informed consent before participation. The participants were monitored during 2.5 hours of real-world living while undergoing their normal activities. The participants were equipped with an accelerometer worn on the wrist of the non-dominant hand and a validated multi-sensor system, the INDIP (INertial module with DIstance sensors and Pressure insoles), as reference [30,43].

Fine-Tuning Procedure
The fine-tuning step involved a supervised learning procedure. The model's input comprised the Mobilise-D dataset, which contains labels indicating the temporal location of the gait sequences. We divided the Mobilise-D data into 75%-25%, where 75% of the data was used for training and validation of the supervised model, as well as for assessing different model configurations, and the remaining 25% was reserved for testing the model. We selected this ratio to ensure comparable distributions between the training and test sets, ensuring that each cohort has at least 3 participants in the test set. The divisions were made subject-wise, ensuring that the data points belonging to a particular subject were entirely contained within one subdivision and were not shared across other subdivisions. We utilized the trained model from the SSL step to train a gait detection model. That is, the weights learned on the extensive unlabeled data served as a robust starting point for training a supervised gait detection model. To adapt the SSL model for gait detection, we modified its last layer to function as a linear layer producing a binary output (i.e., gait/non-gait). During the fine-tuning process, we allowed the model to update all of its weights. This decision was based on prior studies that demonstrated the preference for not freezing weights in the fine-tuning procedure [38,50].
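A subject-wise split with a per-cohort minimum in the test set could be drawn as in the sketch below. The text does not describe how the split was generated, so the per-cohort sampling, the seed, and the function name are assumptions; the participant IDs and cohort sizes in the demo are hypothetical.

```python
import random


def subject_wise_split(cohort_by_subject, test_fraction=0.25,
                       min_test_per_cohort=3, seed=0):
    """Assign whole participants to train or test, cohort by cohort.

    Every participant lands in exactly one subset, so no participant's
    windows are shared between training and testing.
    """
    rng = random.Random(seed)
    by_cohort = {}
    for subject, cohort in cohort_by_subject.items():
        by_cohort.setdefault(cohort, []).append(subject)

    train, test = [], []
    for cohort in sorted(by_cohort):
        subjects = sorted(by_cohort[cohort])
        rng.shuffle(subjects)
        n_test = max(min_test_per_cohort, round(test_fraction * len(subjects)))
        n_test = min(n_test, len(subjects))
        test.extend(subjects[:n_test])
        train.extend(subjects[n_test:])
    return train, test


# Hypothetical demo: 5 cohorts with 12 participants each.
demo = {f"{cohort}_{i:02d}": cohort
        for cohort in ("A", "B", "C", "D", "E") for i in range(12)}
train_ids, test_ids = subject_wise_split(demo)
print(len(train_ids), len(test_ids))  # 45 15
```

Splitting by participant rather than by window avoids leakage: windows from the same person are highly correlated, so mixing them across subsets would inflate test performance.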
In the fine-tuning process, we again split the training set, corresponding to 75% of the entire data, into an 80-20 ratio. Eighty percent of this subset was used for training and 20% for validation. We applied five-fold cross-validation on the training set, stratified by class label and grouped by participant. An early-stopping mechanism was implemented to halt training when the loss stopped decreasing for five consecutive epochs. The cross-validation process was repeated with three different seeds, representing three different divisions of the folds, to obtain more generalizable results independent of a specific order of the data. The results from the three iterations were averaged to derive final performance metrics. The fine-tuning and performance evaluation processes were implemented for all different SSL configurations (refer to the Model Configurations section), utilizing only the training set. For each unique configuration, its performance after fine-tuning on the Mobilise-D data was recorded. The configuration that yielded the best results was then selected as our model for comparison and further analysis. Supplementary Figure S3 illustrates the flow of this process.
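The early-stopping rule can be sketched as a small counter over per-epoch losses. This is an illustrative sketch only: it assumes the monitored quantity is the validation loss and that any epoch without a strict improvement over the best loss so far counts toward the patience of five; the class and function names are hypothetical.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's loss; return True when training should halt."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def stopping_epoch(val_losses, patience=5):
    """Return the 0-based epoch at which training halts, or None."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(loss):
            return epoch
    return None


# Best loss (0.60) is reached at epoch 2; the next five epochs do not beat it.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.64, 0.50]
print(stopping_epoch(losses))  # 7
```

In this example the later loss of 0.50 is never reached, which is the usual cost of early stopping: it trades a possible late improvement for protection against overfitting and wasted epochs.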

Ablation studies
To further explore the influence of different components of the SSL model on its downstream performance, several ablation studies were conducted. Initially, the impact of utilizing the pre-trained weights from the UK Biobank model was investigated [38]. For this purpose, the same architecture as the pre-trained SSL model (i.e., ResNet) was evaluated twice: once initialized with the pre-trained weights from the UK Biobank model, and once trained from random initialization on the MAP dataset.
To assess the contribution of the MAP dataset in tailoring the model to older adults, the combined network (the pre-trained model with the newly added layers) was utilized, and its performance was evaluated with and without utilizing the MAP data.This investigation allowed us to discern whether the performance difference stemmed solely from the expansion of the pre-trained model architecture (by adding the new layers) or if the use of a dataset focused on older adults, such as the MAP data, also played a role.

Stage 3: Testing
The F1 score from the fine-tuning step was used to select the best model configuration. The choice of the F1 score for model selection is based on the inherent imbalance of daily living data in terms of gait, where gait sequences are much less frequent than non-gait ones. In imbalanced datasets, the F1 score provides a more realistic and unbiased assessment of the model's performance 24. Model performance was tested at the window level (i.e., comparing the prediction and the label of each window).
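To make the argument for the F1 score concrete, the sketch below computes window-level metrics from binary labels (1 = gait, 0 = non-gait). The function name and the toy data are illustrative assumptions, not part of the ElderNet code base.

```python
def window_metrics(labels, preds):
    """Window-level metrics for binary gait detection (1 = gait, 0 = non-gait)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    def ratio(num, den):
        # Convention chosen here: an undefined ratio is reported as 0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(labels)),
        "specificity": ratio(tn, tn + fp),
        "recall": recall,
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy imbalanced recording: 5 gait windows out of 100.
labels = [1] * 5 + [0] * 95

# A degenerate model that never predicts gait still looks accurate.
never_gait = window_metrics(labels, [0] * 100)

# A model that finds 3 of the 5 gait windows and raises 1 false alarm.
some_gait = window_metrics(labels, [1, 1, 1, 0, 0] + [1] + [0] * 94)
```

The degenerate model reaches an accuracy of 0.95 but an F1 score of 0, whereas the second model is rewarded by the F1 score for the gait windows it actually detects.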

Model Comparison
We compared the resulting ElderNet model with two state-of-the-art gait detection algorithms. The first comparison algorithm employed a U-Net architecture, developed and validated in our recent publication 24,25. The U-Net model was originally trained on healthy young adults. The second model in our comparative analysis was an SSL algorithm pre-trained on the UK Biobank dataset and subsequently fine-tuned for gait detection using data from healthy adults, referred to as the OxWalk dataset 39. We tested these three models on 25% of the Mobilise-D data, which was not used in the fine-tuning step. The performance metrics were calculated for each of the 21 participants in the test set and then averaged to obtain the final performance.
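Averaging per-participant metrics gives every participant equal weight, regardless of recording length. This differs from pooling all windows. The following sketch illustrates the difference. The helper names and the toy data are hypothetical, and accuracy is used only because it keeps the example short.

```python
from statistics import mean, stdev


def accuracy(labels, preds):
    """Fraction of windows whose prediction matches the label."""
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


def summarize_by_participant(windows, metric):
    """Compute `metric` per participant, then the mean and SD across participants.

    `windows` is an iterable of (participant_id, label, prediction) tuples.
    """
    grouped = {}
    for pid, y, p in windows:
        ys, ps = grouped.setdefault(pid, ([], []))
        ys.append(y)
        ps.append(p)
    scores = {pid: metric(ys, ps) for pid, (ys, ps) in grouped.items()}
    values = list(scores.values())
    spread = stdev(values) if len(values) > 1 else 0.0
    return scores, mean(values), spread


# Toy data: participant A has four windows, participant B has two.
windows = [
    ("A", 1, 1), ("A", 0, 0), ("A", 0, 0), ("A", 1, 1),
    ("B", 1, 0), ("B", 0, 0),
]
scores, avg, sd = summarize_by_participant(windows, accuracy)

# Pooling all windows instead weights participants by recording length.
pooled = accuracy([y for _, y, _ in windows], [p for _, _, p in windows])
```

Here the participant-averaged accuracy is 0.75, while the pooled accuracy is 5/6, because participant A contributes more windows.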

Stage 4: Assessing construct validity
As a preliminary exploration of the clinical potential of the gait-detection information introduced by ElderNet, we applied the model to an unseen portion of the MAP dataset, ensuring that participants used in this step were distinct from those involved in the SSL phase. A total of 167 participants were assigned to this stage. To accurately analyze participant activity, we excluded time segments in which participants were not wearing the device. These non-wear periods were defined as consistently low movement (low standard deviation) across all acceleration axes for at least 30 minutes 51,52. For each participant, we extracted data from four full (24-hour-long) days, as a recent study has shown that this duration provides reliable gait quantity measures 53. Ten participants were excluded due to an insufficient amount of recorded data (less than 96 hours), resulting in a final number of 157 participants who were included in this stage. ElderNet was applied to the four days of data to identify the gait sequences. Subsequently, for each day, we summed the detected gait sequences and defined the median value across days as the daily walking time.
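The non-wear rule and the daily walking time described above can be sketched as follows. Several details are assumptions made for illustration only: the standard deviation is taken per minute, the threshold value is hypothetical (it is not stated here), and walking time is taken as the number of windows classified as gait multiplied by an assumed 10-second window length.

```python
from statistics import median


def nonwear_mask(minute_stds, threshold, min_minutes=30):
    """Flag minutes that belong to a non-wear period.

    `minute_stds` holds one (std_x, std_y, std_z) tuple per minute. A minute is
    "still" when all three axes fall below `threshold` (a hypothetical value);
    only runs of at least `min_minutes` still minutes count as non-wear.
    """
    still = [all(s < threshold for s in axes) for axes in minute_stds]
    mask = [False] * len(still)
    run_start = None
    for i, flag in enumerate(still + [False]):  # sentinel closes a trailing run
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_minutes:
                mask[run_start:i] = [True] * (i - run_start)
            run_start = None
    return mask


def daily_walking_minutes(gait_windows_per_day, window_seconds=10):
    """Median over days of the summed gait time, in minutes.

    Each day is a list of 0/1 window predictions (1 = gait).
    """
    per_day = [sum(day) * window_seconds / 60 for day in gait_windows_per_day]
    return median(per_day)


# Toy example: a 30-minute still period counts as non-wear, a 29-minute one does not.
moving, still = (0.5, 0.5, 0.5), (0.001, 0.001, 0.001)
mask = nonwear_mask([moving] * 10 + [still] * 30 + [moving] * 5 + [still] * 29,
                    threshold=0.01)

# Four days with 60, 120, 90, and 600 gait windows, respectively.
walking = daily_walking_minutes([[1] * 60, [1] * 120, [1] * 90, [1] * 600])
```

Using the median rather than the mean keeps a single unusually active day (the fourth day in this example) from dominating the estimate.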
To examine the construct validity of ElderNet, differences in daily walking time among participants belonging to different clinical cohorts were investigated. Specifically, we examined two motor-related clinical variables: the mobility disability score, assessed using the Rosow-Breslau scale 54, and the number of parkinsonism signs 55. A modified version of the motor portion of the Unified Parkinson's Disease Rating Scale (UPDRS III) was used to assess the presence of four Parkinsonian signs: bradykinesia, gait, rigidity, and tremor 56. Participants were categorized into three cohorts (no sign, 1 sign, 2+ signs). We hypothesized that daily walking time would differ between these cohorts, with individuals without mobility disability spending more time walking than those with mobility disabilities 23.
Additionally, we expected individuals without Parkinsonian signs to spend more time walking than those with 1 or more Parkinsonian signs 57 .

Statistical analysis
The Kruskal-Wallis test was performed to identify significant differences between ElderNet and state-of-the-art models across the test performance metrics. Dunn's post-hoc analysis was applied to reveal the sources of difference among the models. In the context of construct validity, the Kruskal-Wallis test assessed differences in daily walking durations across cohorts with distinct demographic and clinical statuses. The corresponding Dunn's post-hoc analysis was then used to pinpoint the sources of variation in walking durations. To address multiple comparisons in all instances, the Bonferroni correction was applied. The Kruskal-Wallis test and Dunn's post-hoc analysis were implemented using the 'kruskal' function from the scipy.stats library and the 'posthoc_dunn' function from the scikit_posthocs library, respectively. Partial correlation analyses were performed to adjust for age, sex, and BMI, using IBM SPSS Statistics software (Version 29.0.0.0).

DECLARATIONS

Figures
This figure illustrates the ElderNet pipeline. In the SSL phase, data were segmented into non-overlapping 10-second windows. These windows underwent signal augmentations and were used as input for the SSL model, composed of the UK Biobank pre-trained model and additional de-novo optimized layers, producing a feature vector. In the MTL task, the feature vector undergoes a linear transformation to generate binary predictions of the multiple possible augmentations. In the SimCLR task, the loss is calculated directly from the feature vector. The weights from the SSL phase were then fine-tuned to train a supervised gait detection model using labeled data from the Mobilise-D dataset.

Figure 3

Table 1
Characteristics of Older Adults in the Mobilise-D Technical Validation Study. HA: Healthy adults, CHF: Congestive heart failure, COPD: Chronic obstructive pulmonary disease, PD: Parkinson's disease, PFF: Proximal femoral fracture.

Table 2
Average Daily Walking Durations across Demographic and Clinical Factors in the MAP unseen Dataset.
*0 indicates the absence of mobility disability; 3 indicates a high level of disability.