Categorisation of Sit-To-Stand Motion Pattern. Human vs Automated Quantitative Assessments.

BACKGROUND: The Sit-to-Stand (STS) test is widely used in clinical practice as an indicator of lower-limb functional decline, especially in older adults. To date, owing to its high variability, there is no standard approach for categorising the STS motion pattern, and visual evaluation remains the most reliable method for assessing people's performance. This paper presents a comparative analysis between visual assessments and an automated software approach for the categorisation of the STS, relying on recordings from a force plate. METHODS: A group of 5 participants (30 ± 6 years) took part in 2 separate sessions of visual inspection of 200 STS movements randomly extracted from a dataset of 742 acquisitions under self-paced and controlled speed conditions. Assessors were asked to identify three specific STS events from the Ground Reaction Force, simultaneously with the software analysis: the start of the trunk movement (Initiation), the beginning of the stable upright stance (Standing) and the sitting movement (Sitting). The Test-Retest Reliability between the first and second visual evaluations was compared with the Inter-Rater Agreement between visual and software assessments, as indexes of human and software performance, respectively. RESULTS: No statistical differences between methods were found for the identification of the Initiation and the Sitting events at self-paced speed, and for the Sitting event only at controlled speed. The estimated significant values of the maximum discrepancy between visual and automated assessments were 0.200 s [0.039; 0.361] in unconstrained conditions and 0.340 s [0.014; 0.666] for standardised movements. CONCLUSIONS: The software assessments displayed an overall good agreement with visual evaluations of the Ground Reaction Force while relying on objective measures.
In this sense, the proposed approach can provide robust and consistent data in the field of Big Data analytics, augmenting the performance of artificial intelligence methods for Human Activity Recognition tasks.


Background
Performance-based tests are important clinical tools used to identify functional decline in older adults and in patients with either neurological or motor impairments [1]. Among them, the Sit-to-Stand test (STS) is strictly correlated with functional capacity and with independence in the activities of daily living [2][3][4][5][6][7][8][9]. Furthermore, its simplicity allows it to be performed both in clinical and in-home environments across a broad range of patients' functional conditions [1,5]. For these reasons, the STS movement is widely used in clinical research and practice, either as a single test [10][11][12] or as part of evaluation scales like the Short Physical Performance Battery [13]. Unlike gait, the STS motion pattern still lacks a standardised categorisation, owing to its intrinsic variability. Furthermore, the lack of consensus on definitions and measurement methodologies increases the uncertainty even in the definition of those events that are generally recognisable. For instance, the lifting of the thighs from the chair is defined across the literature as the peak instant of the horizontal [14] and vertical components [15] of the Ground Reaction Force (GRF), as the time of maximum anterior head movement [16], or through a sensor-equipped chair [17].
In an attempt to describe a general model of the movement, Nuzik [18] divided the STS into two phases through the analysis of camera recordings: the flexion phase, consisting of the forward flexion of the trunk and hip, and the extension phase, characterised by the lifting from the chair up to the extension of knees, hips, and ankles. In later years, a finer sub-categorisation was introduced by Schenkman and colleagues [19]. Using a motion capture system, they identified two further phases: a "momentum transfer phase" between the flexion phase and the extension phase, when the inertia is transferred from the trunk flexion to the upper body, and a "stabilisation phase", when the hip is completely extended and all the movements associated with stabilisation from rising are completed. In contrast, Etnyre and colleagues analysed the kinetics of the STS under different conditions and recognised eleven invariably occurring events in the GRF: six in the vertical direction, three in the fore-aft direction, and two in the medial-lateral direction [20].
In general, despite the large amount of data obtained from a wide variety of sensors and technologies, it is still not possible to identify a single approach for a standardised definition of the STS motion pattern. Kinematic parameters have proved to be strongly affected by the high within- and between-individual variability, which limits the performance of automated techniques to the recognition of dynamic transitions and static positions. Kinetic variables, by contrast, permit a finer discretisation of the movement, but they still depend too heavily on visual evaluation to yield reliable results [20,21].
Given the above, establishing an objective method to describe commonly occurring events in the STS is essential to ensure quantitative and repeatable measures that can be used as a reference in clinical practice and research.

OBJECTIVE
This study aims to evaluate the performance of a new algorithm able to automatically recognise specific events in the STS movement from GRF profiles and to compare its results with the reliability displayed by human visual assessments. The implementation of the routine is described in detail in an additional document (see Additional File 1).

DEFINITION OF THE DATASETS
We evaluated the performance of the proposed method on two Human Activity Recognition datasets (HAR1 and HAR2), collected at the REHELab (University Campus of Savona, Via Magliotto 2, 17100, Savona, Italy). Both HAR1 and HAR2 consist of a series of sequential movements collected from a convenience sample of healthy young adults (Table 1). The inclusion criteria for eligible subjects were good health, absence of musculoskeletal or neurological disorders, and the ability to rise easily from a chair. Each participant signed an informed consent form. Participants performed the STS repetitions on a force plate (Kistler, Winterthur, Switzerland). During the execution of the movement, a custom-made chair equipped with an electronic switch recorded the time instants of rising (Seat Off) and sitting (Seat On). In both datasets, participants had to execute two different tasks: (i) 10 repetitions of a single STS transition at self-paced speed (SP); (ii) 10 repetitions of a single STS transition at controlled speed (CT), with the duration marked by a repeating 4-second acoustic feedback composed of a succession of 3 tones and a pause.
The different motor control strategies involved in the two tasks produced two typical profiles of the Ground Reaction Force (GRF). According to the definition of STS events given by Etnyre and colleagues [20], in SP trials an initial deflection from the baseline was observed (Initiation). After reaching the lowest level in the force recording (Peak-counter), the GRF rose to a global maximum (Peak) and subsequently levelled off to a normal postural sway (Standing). By contrast, CT trials were characterised by a more gradual increase in the GRF, following the progressive inclination of the trunk and the rising movement from the chair. Examples of force profiles for both SP and CT trials are displayed in Fig. 1.
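To make the event definitions above concrete, the following minimal sketch locates the Initiation, Peak-counter and Peak events in a self-paced vertical GRF trace. It is an illustration only and not the routine described in Additional File 1: the function name `detect_sts_events`, the 1-second baseline window and the threshold of 3 baseline standard deviations are all assumptions introduced here.

```python
from statistics import mean, stdev


def detect_sts_events(grf, fs, baseline_s=1.0, k=3.0):
    """Return the times (s) of Initiation, Peak-counter and Peak.

    grf        : vertical GRF samples (N), starting with quiet sitting
    fs         : sampling frequency (Hz)
    baseline_s : assumed length of the quiet baseline window (s)
    k          : assumed threshold, in baseline standard deviations
    """
    n_base = int(baseline_s * fs)
    if n_base < 2 or n_base >= len(grf):
        raise ValueError("trace too short for the chosen baseline window")

    baseline = grf[:n_base]
    mu, sd = mean(baseline), stdev(baseline)

    # Peak: global maximum after the baseline window.
    peak = max(range(n_base, len(grf)), key=lambda i: grf[i])
    # Peak-counter: lowest force level reached before the Peak.
    counter = min(range(n_base, peak + 1), key=lambda i: grf[i])
    # Initiation: first sample departing from the baseline by more than
    # k standard deviations (falls back to the Peak-counter if none does).
    initiation = next(
        (i for i in range(n_base, counter + 1) if abs(grf[i] - mu) > k * sd),
        counter,
    )
    return {
        "initiation": initiation / fs,
        "peak_counter": counter / fs,
        "peak": peak / fs,
    }
```

A fixed threshold of this kind is likely to be less reliable on CT trials, where the GRF increases gradually and the initial deflection is small.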
Hence, the STS movement pattern was categorised into 4 sequential phases.
The two datasets are the result of two consecutive studies. Hence, some minor modifications to the execution protocol of the exercise must be disclosed and taken into account. These changes stem from the progressive methodological improvement of the general study, which relies on previous empirical evidence to reduce possible future bias. The protocol differences between the two datasets are summarised in Table 2.
Table 2. Protocol differences between the two datasets.
Earlier protocol: In SP trials, some participants tended to start the movement too early, preventing the registration of an appropriate RES phase. Revised protocol: In SP trials, participants had to wait 3 seconds from the start of the acquisition before starting the movement.
Earlier protocol: In SP trials, some participants tended to sit down without having reached a sufficiently stable STA phase. Revised protocol: In SP trials, participants had to wait 3 seconds in the STA phase to reach balance stability.
Earlier protocol: In CT trials, participants considered the Stand-to-Sit transition as a single returning phase, starting from the beginning of the Sitting event until reaching the RES phase. Revised protocol: In CT trials, participants were asked to control the descending movement, identifying two returning phases: a "Sitting" phase, from the beginning of the sitting movement until registration of the "seat-on" signal, and a "Trunk Raising" phase, in which subjects raise the trunk until reaching the RES phase. This was done in anticipation of future efforts to categorise the Stand-to-Sit transition.

EVALUATION OF THE SOFTWARE PERFORMANCE
In the absence of a proper gold standard, visual inspection remains the most reliable method to recognise STS events and phases on the basis of GRF values [20]. Hence, simultaneously with the software analysis, five assessors aged 30 ± 6 years visually identified the beginning of the trunk movement (Initiation event) and the limits of the stable stance (Standing and Sitting events) from the force profiles of 200 STS sequences. The ages of the assessors are reported in Table 3, together with their professional and academic background. As a measure of reliability, assessments were repeated in 2 distinct sessions, separated by a minimum interval of 1 hour. STS sequences were drawn randomly from a total pool of 742 acquisitions (HAR1 + HAR2), and the same sequences were used in both sessions. This information was not disclosed to the assessors, to avoid possible learning effects. Of the 200 pooled sequences, 100 were SP trials and 100 were CT trials. Before the first session, assessors were briefly trained to recognise the onset of each event on five force profiles, according to the definitions given by Etnyre and colleagues [20], and an explanatory summary was available as a MATLAB live script throughout all the measurements. All assessors were physiotherapists, bioengineers and PhD candidates with clinical experience in physiotherapy and expertise in movement analysis.
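The sampling scheme described above (100 SP and 100 CT sequences drawn at random from the pooled acquisitions and reused in both sessions) can be sketched as follows. This is an illustration under assumptions: the function name `draw_sequences`, the `(sequence_id, condition)` representation of the pool and the seed value are hypothetical, and the actual split of the 742 acquisitions between SP and CT is not stated in the text.

```python
import random


def draw_sequences(pool, per_condition=100, seed=2020):
    """Draw the same number of SP and CT sequences, without replacement.

    pool : iterable of (sequence_id, condition) pairs, condition "SP" or "CT"
    seed : fixed seed (hypothetical value), so that the identical set of
           sequences can be presented again in the second session
    """
    rng = random.Random(seed)
    selected = []
    for condition in ("SP", "CT"):
        # Sort first so the draw does not depend on the order of the pool.
        candidates = sorted(sid for sid, cond in pool if cond == condition)
        selected.extend(rng.sample(candidates, per_condition))
    return selected
```

Fixing the seed is one simple way to guarantee that the second session presents exactly the sequences shown in the first one.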

DATA ANALYSIS
Test-Retest reliability was considered as a quality index of the human measure, and it was compared with the performance of the software, evaluated as the Inter-Rater reliability against the assessors' evaluations in the first trial. For both analyses, the absolute agreement was measured using a non-parametric version of Bland-Altman statistics [22], because the data violated the normality assumption. Normality was tested using the Kolmogorov-Smirnov test and further investigated using the skewness and kurtosis indexes. To correct the analysis for possible outliers not directly connected to the evaluation of the assessors or the performance of the software, observations that fell more than 1.5 interquartile ranges above the upper quartile (75th percentile) or below the lower quartile (25th percentile) were visually inspected. In particular, wrong identifications resulting from an erroneous right mouse click, early movements or a malfunction of the electronic switch were eliminated. We calculated the systematic bias and the Limits of Agreement as the median of the absolute differences between assessments and the respective 2.5th and 97.4th percentile scores. The upper Limit of Agreement (ULoA) represented the maximum estimated error between the measures. The 95% Confidence Intervals (CI) of the above-specified parameters were also calculated, using a percentile bootstrap method based on 10,000 samples [23]. The percentile method was chosen for its conservative nature, as it tends to produce wider CIs that are less sensitive to the population value and the sample size [24]. To compare the maximum errors made between the Test-Retest trials and the Human-Software evaluations, we used a two-tailed two-sample t-test [25], exploring the significant differences between measures.
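The non-parametric agreement statistics described above can be sketched in a few lines. The example is not the modified BlandAltman.m function used in the study: all function names are hypothetical, the percentile uses linear interpolation (one of several common conventions), and the default percentile levels are the conventional 2.5 and 97.5, an assumption that can be changed through the parameters.

```python
import random
from statistics import median


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty sequence."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def tukey_fences(values):
    """Bounds beyond which observations are flagged for visual inspection."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def nonparametric_agreement(a, b, lower_q=2.5, upper_q=97.5):
    """Bias (median) and limits of agreement of the absolute differences."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return {
        "bias": median(diffs),
        "lloa": percentile(diffs, lower_q),
        "uloa": percentile(diffs, upper_q),
    }


def bootstrap_ci(a, b, stat="uloa", n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of one agreement statistic."""
    rng = random.Random(seed)
    pairs = list(zip(a, b))
    estimates = []
    for _ in range(n_boot):
        # Resample paired assessments with replacement.
        xs, ys = zip(*rng.choices(pairs, k=len(pairs)))
        estimates.append(nonparametric_agreement(xs, ys)[stat])
    return (
        percentile(estimates, 100 * alpha / 2),
        percentile(estimates, 100 * (1 - alpha / 2)),
    )
```

Here `a` and `b` would hold the event times (in seconds) assigned to the same sequences by the two assessments being compared, for example the first and second visual sessions, or the first visual session and the software.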
Each ULoA and its respective CI were approximated by a normal distribution characterised by a mean value and a standard deviation [26]. The choice of the lower or upper confidence limit was guided by the need to use the largest standard error, so as to obtain a more conservative approximation. The statistical analysis was stratified by considering the different STS events separately and by dividing the results obtained in SP and CT trials. Bland-Altman statistics were implemented in MATLAB as a modified version of the BlandAltman.m function developed by Ran Klein from the Department of Nuclear Medicine of the Ottawa Hospital [27]. The two-tailed two-sample t-test was executed with the online "Comparison of means calculator" tool from MedCalc Statistical Software [28].
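One plausible reading of this normal approximation is sketched below, as an assumption rather than the formula used in the study: the standard error is taken as the wider half of the 95% CI divided by the 0.975 normal quantile (about 1.96). The comparison step uses a z-approximation in place of the two-sample t-test run in MedCalc, because a t-test also needs the sample sizes. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def normal_approximation(estimate, ci_low, ci_high):
    """Return (mean, standard error), using the wider half of the 95% CI.

    Taking the larger of the two half-widths gives the largest standard
    error and therefore the more conservative approximation.
    """
    half_width = max(estimate - ci_low, ci_high - estimate)
    return estimate, half_width / Z95


def compare_estimates(first, second):
    """Compare two (estimate, ci_low, ci_high) triples.

    Returns the difference, its 95% CI and a two-sided p-value based on a
    z-approximation (an assumption, not the t-test used in the study).
    """
    m1, se1 = normal_approximation(*first)
    m2, se2 = normal_approximation(*second)
    diff = m2 - m1
    se = sqrt(se1 ** 2 + se2 ** 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, (diff - Z95 * se, diff + Z95 * se), p_value
```

In this reading, `first` would be the Test-Retest ULoA with its bootstrap CI and `second` the Human-Software ULoA with its CI, so that a positive difference indicates a larger maximum error for the software comparison.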

Results
The Bland-Altman plots relating to the identification of the Initiation, Standing and Sitting events are shown in Fig. 3, Fig. 4 and Fig. 5, respectively, and the descriptive statistics of the bias and the ULoAs (with CI) are summarised in Table 4. The results of the two-tailed two-sample t-test are summarised in Table 5.

Discussion
This work aimed to evaluate the performance of a new automated approach for the recognition of clinically relevant events in the STS and to compare its performance with human visual assessment. The results offer a twofold contribution to future research, since we not only quantified the discrepancy between the two methods but also compared it against the maximum error made in repeated visual measures. Despite the significantly lower systematic bias in repeated evaluations, the comparison between visual assessments and the proposed approach showed similar values of maximum absolute error.
More specifically, no statistical differences were found in the identification of the Initiation and the Sitting events in SP trials, and in the identification of the Sitting event only in CT trials. The worsening of the observed agreement during CT movements was generally in line with our expectations, as ULoA values could be affected by uncertainties due to the kinetic modifications resulting from the standardisation of the movement. For instance, the initial GRF deflection is highly dependent on each individual's movement strategy [20] and could be reduced by the lower momentum produced under constrained speed, complicating the identification of the Initiation event.
Another important consideration concerns the intrinsic subjectivity of human evaluations [29]. One could consider the slightest oscillation either as an extension of a contiguous static phase or as the limit of a movement transition. Moreover, visual assessments can vary across repeated evaluations and differ between individuals, depending on their professional experience [30,31]. [...] respectively for normal and standardised speed, the proposed method may not be suitable for evaluations of a single patient, where the expertise of health professionals plays a key role in the diagnostic process, often requiring a high level of abstraction [29]. It is also true that a great quantity of the data collected through sensor systems is inherently noisy and, for this reason, its analysis should consider and handle some degree of uncertainty [32]. In this context, the presented algorithm can be used as a solid basis for artificial intelligence methods to provide accurate, faster, and scalable results in the field of big data analytics, with interesting applications in Human Activity Recognition tasks [33,34].
This is the first study that aims to compare human and automated assessments in the identification of a complex motion pattern in the STS movement, relying on data collected through a force plate. Previous works [35][36][37][38][39] evaluated the performance of various algorithms developed for the identification of Sit-to-Stand and Stand-to-Sit postural transitions using data acquired from inertial sensors. In particular, a recent paper by Atrsaei and colleagues [40] validated the accuracy of a new routine based on a single device against visual assessments on camera recordings of STS movements, obtaining levels of agreement above 94% in terms of positive predictive value and sensitivity. In direct comparison with the present study, the use of inertial sensors is usually preferable, since they can also be applied in non-clinical environments [41]. However, their measurements are strongly influenced by the inter- and intra-individual variability of the movement [20,42], limiting the recognition of the STS motion pattern to the simple discrimination of static and dynamic phases.
Conversely, our choice to use a force plate has some undoubted limitations in terms of cost and portability, but the strong advantage of providing more easily interpretable results, on which it is possible to identify clinically

AVAILABILITY OF DATA AND MATERIALS
The software used to support the conclusions of this article is partially described within Additional file 1, and a complete implementation is available at https://zenodo.org/record/3956138#.XxgQgigzZPY [45]. The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

COMPETING INTERESTS
The authors declare that they have no competing interests.

Figure 3
Bland-Altman plots depicting the Test-Retest reliability (left) and the Inter-Rater reliability (right) of the identification of the Initiation event in SP trials (upper plots) and CT trials (lower plots).

Figure 4
Bland-Altman plots depicting the Test-Retest reliability (left) and the Inter-Rater reliability (right) of the identification of the Standing event in SP trials (upper plots) and CT trials (lower plots).