A multifactorial fall risk assessment system for older people utilizing a low-cost, markerless Microsoft Kinect

Abstract Falls among older people are a major health concern. This study aims to develop a multifactorial fall risk assessment system for older people using a low-cost, markerless Microsoft Kinect. A Kinect-based test battery was designed to comprehensively assess major fall risk factors. A follow-up experiment was conducted with 102 older participants to assess their fall risks. Participants were divided into high and low fall risk groups based on their prospective falls over a 6-month period. Results showed that the high fall risk group performed significantly worse on the Kinect-based test battery. The developed random forest classification model achieved an average classification accuracy of 84.7%. In addition, the individual’s performance was computed as the percentile value of a normative database to visualise deficiencies and targets for intervention. These findings indicate that the developed system can not only screen out ‘at risk’ older individuals with good accuracy, but also identify potential fall risk factors for effective fall intervention. Practitioner summary: Falls are the leading cause of injuries in older people. We newly developed a multifactorial fall risk assessment system for older people utilising a low-cost, markerless Kinect. Results showed that the developed system can screen out ‘at risk’ individuals and identify potential risk factors for effective fall intervention.


Introduction
Falls are a common health concern for older people. Around one-third of people aged 65 or over fall each year, and 35% of them fall recurrently (Hung et al. 2017). Falls are a leading cause of fatal injuries and emergency visits, and annual direct medical costs from fall-related injuries exceed $30 billion (Smith et al. 2015). Given these serious consequences, it is critical to assess fall risk and provide appropriate interventions for older individuals at risk of falling (Howcroft, Kofman, and Lemaire 2013).
Various clinical scales have been developed to assess specific fall risk factors or overall fall risk and to screen out 'at risk' individuals, such as the Activities-specific Balance Confidence (ABC) Scale, Berg Balance Scale (BBS), Timed Up and Go (TUG), and Tinetti Performance Oriented Mobility Assessment (POMA) (Kim and Xiong 2017; Strini, Schiavolin, and Prendin 2021). These scales consist only of self-reported questionnaires and simple performance tests, so they are widely used in clinical practice because they are simple and easy to administer. However, most of them are too subjective or oversimplified to assess fall risk in older people (Howcroft, Kofman, and Lemaire 2013; Qiu et al. 2018). In addition, many of these scales suffer from ceiling or floor effects in fall risk assessment (Kim and Xiong 2017). On the other hand, sophisticated equipment such as the Neurocom Balance Master and the Biodex Balance System can provide objective, quantitative, and accurate measures for fall risk assessment. Unfortunately, such equipment is expensive, cumbersome, and requires well-trained staff to operate. With recent developments in sensing technology, portable sensor-based systems for fall risk assessment are gaining popularity because of their advantages over expensive, cumbersome equipment and over subjective questionnaires or oversimplified clinical tests (Wu et al. 2022). Inertial sensors, especially accelerometers and gyroscopes, have been widely used for fall risk assessment because they are wearable and monitor human motion well. Qiu et al. (2018) developed a fall risk assessment system for older people using five inertial sensors and an inertial sensor-based test battery. Similä, Immonen, and Ermes (2017) developed models to predict balance deficits and identify fall risk via two accelerometers in balance and walking tests. Even though these studies are valuable and have achieved reasonable fall risk classification accuracies, attaching inertial sensors accurately at the required body locations is not easy for older people, and the attached sensors may cause discomfort or inconvenience because of their obtrusiveness (Sun and Sosnoff 2018). Furthermore, the accuracy of such a fall risk assessment system depends on the number of wearable inertial sensors and the locations at which they are attached (Howcroft, Kofman, and Lemaire 2013), which need to be further optimised.
Microsoft Kinect holds promise for overcoming the limitations of inertial sensors in a fall risk assessment system. Kinect v2 is a low-cost (USD 249 at its release in 2014), markerless sensing device that can track 3D full-body human motion in real time in the form of 25 skeletal joints, based on an RGB-D sensor (image and depth sensors) and embedded gesture recognition algorithms. It is therefore unobtrusive and convenient for older people because they do not need to wear any sensors. In addition, it enables older users to interact with the fall risk assessment system directly through natural gestures (natural user interface). Thus, a Kinect-based fall risk assessment system can facilitate healthcare applications (self-assessment, diagnosis, and rehabilitation) and maximise adoption by older people owing to its unobtrusiveness, low cost, natural user interface, and convenience (Dolatabadi, Taati, and Mihailidis 2016; Choi et al. 2017). The position accuracy of the skeletal joints tracked by Kinect has also been confirmed to be acceptable, even though it is not as good as that of optical motion capture systems (Ma et al. 2019).
A few studies have attempted to implement fall risk assessment tests using the Kinect for older people and patients at high fall risk (Ejupi, Brodie, et al. 2016; Ejupi, Gschwind, et al. 2016; Dubois, Bihl, and Bresciani 2017; Kähär et al. 2018; Sun et al. 2019). Researchers used a static balance test (Sun et al. 2019), a sit-to-stand test (Ejupi, Brodie, et al. 2016), the timed up and go test (Dubois, Bihl, and Bresciani 2017; Kähär et al. 2018), and a choice reaching reaction test (Ejupi, Gschwind, et al. 2016), and they reported that various outcome measures from those tests (such as sway range, completion time, step length, number of steps, and reaction time) differed significantly between the high and low fall risk groups. However, statistically significant outcome measures do not necessarily guarantee sufficient discriminative power or good accuracy for fall risk classification. Two studies further examined the discriminative power of their proposed Kinect-based fall risk assessments. Tripathy, Chakravarty, and Sinha (2018) classified fallers and non-fallers with 74.5% overall accuracy based on a single-leg standing test, and Kargar et al. (2014) discriminated between high and low fall risk groups with 67.4% accuracy based on the timed up and go test. These studies indicate that it is difficult to accurately screen out people at high fall risk with just one or a few simple tests. The test battery should be comprehensive and multifactorial, since falls typically result from the combination and interaction of different fall risk factors. In addition, the existing Kinect-based fall risk assessment systems can only assess overall fall risk and cannot further diagnose the potential risk factors that increase an individual's fall risk. They are therefore of limited clinical value in determining tailored fall intervention programs for effective treatment.
To overcome these limitations, this research aims to develop a multifactorial fall risk assessment system with diagnostic ability for older people, using a low-cost, markerless Microsoft Kinect v2 and a comprehensive yet practical test battery. We designed the system primarily around intrinsic and modifiable fall risk factors, which have a high potential to be improved by subsequent Kinect-based fall intervention programs such as exergames (Kim and Xiong 2022). These risk factors fall into three broad categories: physiological (subsystems for sensory input, central processing, and motor response), psychological (fear of falling, depression, etc.), and integrated functions (gait and mobility, postural adjustment, etc.) (Delbaere, Close, Heim, et al. 2010; Hamacher et al. 2011; Qiu et al. 2018). A list of representative fall risk factors corresponding to these three categories was generated from a comprehensive review of previous studies. A corresponding Kinect-based multifactorial test battery consisting of seven subtests was then designed to assess those risk factors. To make the test battery both scientific and practical, each subtest should be not only valid and reliable for assessing the corresponding risk factors, but also simple and quick for older people to undertake. We hypothesise that the developed multifactorial fall risk assessment system performs better than existing systems in assessing fall risk. Moreover, it can further diagnose an individual's potential fall risk factors so that tailored fall intervention programs can be designed.

Kinect-based system setup and multifactorial test battery for fall risk assessment
Considering the practical tracking range of Kinect (maximum range of 4.5 m for Kinect v2) and the space needed for the different tests, the Kinect-based system setup (Figure 1) was first optimised for easy and robust data acquisition through our internal tests. A Kinect v2 device was placed on a table, 3.2 m away from a chair and 0.7 m above the ground. A cone and a balance pad (Airex) were placed between the Kinect device and the chair for some subtests in the test battery.
Our test battery (Figure 2) was composed of seven subtests to assess intrinsic and modifiable fall risk factors for physiological, psychological, and integrated functions (Qiu et al. 2018): 1) Sensory Organisation Test (SOT) for sensory inputs and static balance (Nocera, Horvat, and Ray 2009; Chaudhry et al. 2011; Park et al. 2019), 2) Limit of Stability (LOS) for postural stability (Latt et al. 2009; Liu et al. 2015; Scena et al. 2016), 3) Sit to Stand 5 times (STS5) for postural adjustment and lower-limb strength (Doheny et al. 2011; Ejupi, Brodie, et al. 2016; Shukla et al. 2020), 4) Timed Up and Go (TUG) for mobility and dynamic balance (Kojima et al. 2015; Kang et al. 2017; Tan et al. 2019), 5) Range of Motion test (ROM) for joint movement and flexibility (Demura, Yamada, and Kasuga 2012; Naeemabadi et al. 2018; Jung et al. 2020; Lohrasbipeydeh et al. 2021), 6) Choice Stepping Reaction Test (CSRT) for sensorimotor and cognitive function (Lord and Fitzpatrick 2001; Delbaere et al. 2016; Ejupi, Gschwind, et al. 2016), and 7) Short Fall Efficacy Scale (FES) for the psychological risk factor, especially fear of falling (Kempen et al. 2008; Delbaere, Close, Mikolaizak, et al. 2010; Denkinger et al. 2015). The seven daily activities evaluated in FES are getting dressed or undressed, taking a bath or shower, getting in or out of a chair, going up or down stairs, reaching for something above your head or on the ground, walking up or down a slope, and going out to a social event (e.g. religious service, family gathering or club meeting). SOT, CSRT and ROM were designed to assess physiological risk factors; FES was used to assess psychological risk factors; and the remaining three (STS5, LOS and TUG) were designed to assess the integrated functions. All subtests were developed using Kinect SDK 2.0 (Microsoft Corporation, Redmond, Washington, United States) and Unity3D (Unity Technologies, San Francisco, California, United States), and they were adopted from the above-mentioned studies with modifications where necessary.
The walking distance of the standard TUG was shortened from 3 m to 2 m because of the limited tracking range of Kinect. The maximum tracking range of Kinect is about 0.8-4 m, and the actual range of stable full-body tracking is even smaller. We followed earlier studies and chose a 2 m walking distance for the Kinect-based TUG (Kähär et al. 2018; Tan et al. 2019). CSRT has two versions: CSRT-M using a rubber pad and CSRT-E using an electronic pad (Delbaere et al. 2016). Ejupi, Gschwind, et al. (2016) developed a Kinect-based CSRT with two stepping panels on the left and right. The present study, however, implemented a Kinect-based CSRT with four stepping panels (left, front-left, right, and front-right) to keep the same design as the original CSRT (Lord and Fitzpatrick 2001). One randomly chosen panel was illuminated in each trial, and each panel was illuminated five times in total, giving 20 trials across the four panels. A five-second break was given between trials. Figure 2 shows the implemented subtests to assess fall risk, and Table 1 describes the detailed protocol of each subtest and its representative outcome measures.
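The trial structure of the four-panel CSRT can be illustrated with a short Python sketch. It assumes a plain random shuffle of the 20 trials; whether the original system constrained the order in any other way is not stated, and the names `PANELS` and `make_csrt_schedule` are hypothetical.

```python
import random

# Panel names follow the text; the constant and function names are hypothetical.
PANELS = ["left", "front-left", "right", "front-right"]
REPEATS_PER_PANEL = 5


def make_csrt_schedule(seed=None):
    """Return a shuffled list of 20 panel names, each panel appearing 5 times."""
    rng = random.Random(seed)
    schedule = PANELS * REPEATS_PER_PANEL  # 4 panels x 5 repeats = 20 trials
    rng.shuffle(schedule)                  # assumption: unconstrained random order
    return schedule


schedule = make_csrt_schedule(seed=0)
```

Shuffling a fixed multiset, rather than drawing a panel independently on each trial, guarantees that every panel is presented exactly five times.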

Experimental participants
All participants were community-dwelling senior volunteers from three cities (Cheongju, Sejong, and Incheon) in South Korea. The eligibility criteria were as follows: age ≥ 65 years, female, and able to walk independently without the use of assistive devices. As for exclusion criteria, older people who had impaired visual acuity (<0.3) or could not communicate in Korean were not considered for the experiment. This study focussed on older women because they have been reported to have higher fall risks than older men (Chang and Do 2015; Kim, Choi, and Xiong 2020). In total, 106 community-dwelling older Korean women participated in this study. The sample was obtained by convenience sampling rather than random sampling because of the challenges posed by the COVID-19 pandemic. Each participant gave written informed consent prior to participation. All participants performed the seven subtests sequentially (in the order of SOT, LOS, STS5, TUG, ROM, CSRT, and FES) and completed the test battery within 25 min. A research assistant briefly explained how to perform each subtest before it started and provided assistance where necessary. Each participant's self-reported history of falls, chronic diseases, and regular physical activities in the past year were also collected. The study was ethically approved by the KAIST Institutional Review Board (IRB No: KH2020-015).

Investigation of prospective falls
Prospective falls were monitored for six months after the fall risk assessment (Aviles et al. 2019). A fall was defined as 'an unexpected loss of balance resulting in coming to rest on the floor, the ground, or an object below the knee level' (Lach et al. 1991). Participants were contacted biweekly by text message, or by telephone if there was no text reply. Four participants were lost to fall monitoring: three did not respond to the follow-up and one withdrew from the study. Therefore, a total of 102 (106 - 4) participants remained in this study and their data were further analysed.
Participants who experienced prospective falls during the 6-month follow-up period were classified as the 'high fall risk group' (Mirelman et al. 2016; Aviles et al. 2019; Martinez and De Leon 2020); otherwise, they were classified as the 'low fall risk group'. According to these criteria, 22 (21.6%) older participants belonged to the high fall risk group and the remaining 80 (78.4%) participants belonged to the low fall risk group. Table 2 shows the sample characteristics of the high and low fall risk groups. Compared with the low fall risk group, the high fall risk group was significantly older (p = 0.048) and had experienced more falls in the past year (p = 0.046), but there were no significant differences in height, weight, BMI, self-reported chronic diseases, and regular physical activities (p > 0.05).

Kinect data processing and feature extraction
Figure 3 shows the 25 skeletal joints tracked by Kinect v2 and its coordinate system. The joint position data collected by Kinect were first filtered by a first-order Butterworth low-pass filter with a cut-off frequency of 10 Hz (Gottlieb et al. 2020) to remove noise in the time-series data. Then various algorithms were developed for extracting meaningful fall risk measures from the skeletal data. During this process, different skeletal joints were used for different subtests, according to their relevance to each subtest and the specific movement patterns recorded during it (Qiu et al. 2018). For example, hand joints were used to compute reach distances in LOS; for ROM, hip, knee, and ankle joints were used; while for TUG, spine base, spine shoulder, elbow, and foot joints were used. Validation experiments have been performed to confirm the accuracy of our developed algorithms for extracting important features from the Kinect data for each subtest.
Table 1 (excerpt; rows for ROM, CSRT, and FES).
Range of Motion (ROM). Protocol: Sit on the chair, fully extend a knee and hold for 2 sec, then bend the knee and hold for 2 sec (repeat 3 times for each knee). Outcome measures: Knee flexion angle: the average of minimal knee angles while bending the knee; Knee extension angle: the average of maximal knee angles while extending the knee; Knee range of motion: the average of the difference between knee extension and flexion angles.
Choice Stepping Reaction Test (CSRT). Protocol: Step on an illuminated panel as fast as possible (4 random panels: left, left-front, right and right-front; each panel was illuminated 5 times). Outcome measures: Total completion time: the total elapsed time to complete 20 trials of the test; Reaction time: average time between when a panel is illuminated and when a foot starts to move; Movement time: average time between when a foot starts to move and when the illuminated panel is stepped on.
Short Fall Efficacy Scale (FES). Protocol: Choose the right level of concern about falling during 7 daily activities using hand gestures to control the cursor (natural gesture interface). Outcome measure: FES score: the total score of seven evaluation questions (range: 7-28).
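As an illustration of the noise-removal step, the following Python sketch applies a first-order Butterworth low-pass filter with a 10 Hz cut-off to one joint coordinate. It assumes a 30 Hz sampling rate (the nominal frame rate of Kinect v2) and a single forward pass; the sampling rate, the filtering direction, and the function name `butter1_lowpass` are assumptions, not details reported in this study.

```python
import math


def butter1_lowpass(samples, cutoff_hz=10.0, fs_hz=30.0):
    """First-order Butterworth low-pass (bilinear transform), one forward pass.

    cutoff_hz comes from the text; fs_hz = 30 is an assumed Kinect v2 frame rate.
    """
    k = math.tan(math.pi * cutoff_hz / fs_hz)  # pre-warped cut-off
    b = k / (1.0 + k)                          # b0 == b1 for this filter
    a1 = (k - 1.0) / (k + 1.0)
    out = []
    prev_x = samples[0]
    prev_y = samples[0]  # start at the first sample to avoid a start-up transient
    for x in samples:
        y = b * x + b * prev_x - a1 * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# Made-up coordinate: a constant 1.0 m position with frame-to-frame jitter of 0.1 m.
noisy = [1.0 + (0.1 if i % 2 == 0 else -0.1) for i in range(60)]
filtered = butter1_lowpass(noisy)
```

The filter has unit gain at 0 Hz, so slow postural movement passes through while the frame-to-frame jitter is suppressed. In practice a library routine such as SciPy's `scipy.signal.butter` could be used instead of the hand-written recursion.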
To simplify the explanation of how features were extracted from the Kinect data to compute the outcome measures of the subtests, TUG, one of the most complicated subtests for feature extraction, is illustrated as a representative example (Figure 4). The detailed feature extraction for the other subtests except FES is illustrated in Appendix A (FES is a Kinect-enabled questionnaire that does not involve sensor data processing). The entire TUG task was divided into four phases, sit-to-stand, walking, turning, and stand-to-sit, following Kargar et al. (2014). The turning phase was determined by the absolute difference between the x-coordinates of the left and right elbows, as shown in Figure 4B (top, left). When turning, the x-coordinates of the left and right elbows will theoretically overlap. Therefore, the position difference between the two elbows along the x-axis first reaches a local minimum and then returns to the original difference. This pattern occurs again in the transition between the walking phase and the stand-to-sit phase, because the participant has to turn around and sit on the chair. The entire TUG task (from start to end) was extracted by using the z-coordinate of the spine base joint, as shown in Figure 4B (top, right). Since the spine base joint is located close to the centre of mass, its position data tracked by Kinect were very stable. The moment when the z-coordinate of this joint starts to decrease is the start of the test, and the moment when the z-coordinate becomes the smallest is the turning moment. After turning, the z-coordinate increases until it stabilises at the initial position at the end of the test. After the starting point, the end of the sit-to-stand phase (i.e. just before the walking phase) and the start of the stand-to-sit phase (i.e. right after the walking phase) were determined by comparing the y-coordinate of the spine shoulder joint with its height in full standing, as shown in Figure 4B (bottom, left). Gait-related outcome measures were derived from the two foot joints. As shown in Figure 4B (bottom, right), the local peaks of the difference between the z-coordinates of the left and right feet were related to the gait cycle and its characteristics, and they were used to calculate the number of steps, step width, and step duration. Because of validity concerns (Vernon et al. 2015; Tan et al. 2019), all gait outcome measures were calculated using only data before the turning phase.
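Two of the rules described above, the turning moment and the step count, can be sketched in Python as follows. This is a simplified illustration rather than the algorithm used in the study: the peak threshold `min_peak_m`, the use of the absolute foot difference, the function names, and the synthetic trajectories are all assumptions.

```python
def turning_index(spine_base_z):
    """Frame index at which the spine base z-coordinate is smallest."""
    return min(range(len(spine_base_z)), key=spine_base_z.__getitem__)


def count_steps(left_foot_z, right_foot_z, end_index, min_peak_m=0.10):
    """Count local peaks of |z_left - z_right| in frames [0, end_index).

    min_peak_m is a hypothetical threshold that ignores very small peaks.
    """
    diff = [abs(l - r)
            for l, r in zip(left_foot_z[:end_index], right_foot_z[:end_index])]
    steps = 0
    for i in range(1, len(diff) - 1):
        is_peak = diff[i] > diff[i - 1] and diff[i] >= diff[i + 1]
        if is_peak and diff[i] >= min_peak_m:
            steps += 1
    return steps


# Synthetic z-coordinates in metres (made-up values, not study data).
spine_z = [3.0, 2.8, 2.6, 2.4, 2.2, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 1.2, 1.4, 1.6]
left_z = [3.0, 2.9, 2.6, 2.5, 2.4, 2.1, 1.8, 1.7, 1.6, 1.3, 1.0, 1.1, 1.4, 1.6]
right_z = [3.0, 2.8, 2.8, 2.5, 2.1, 2.0, 2.0, 1.7, 1.3, 1.2, 1.0, 1.3, 1.4, 1.6]

turn = turning_index(spine_z)
steps_before_turn = count_steps(left_z, right_z, end_index=turn)
```

Passing the turning index as `end_index` restricts the gait measures to the frames before the turning phase, as the text specifies.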

Statistical analysis and fall risk modelling
Figure 5 summarises the whole process of statistical data analysis and fall risk modelling. Two-sample t-tests were first conducted on all outcome measures from the seven subtests to identify those that differed significantly between the high and low fall risk groups. A Receiver Operating Characteristic (ROC) analysis was then performed on each significant outcome measure to examine its discriminative power for classifying the fall risk groups. Fall risk classification models were subsequently constructed using as predictors only the outcome measures that were significant in both the t-test and the ROC analysis. The same analysis process was applied to the sample characteristic variables in Table 2, which were collected through surveys and are easy to include in the fall risk classification model. For significance tests of categorical variables, such as the history of falls, the Chi-squared test and univariate logistic regression were used (Phase 1). Because the performance of a classification model can vary with how the data are split into training and test sets, three exclusive dataset splits were made to investigate the overall performance of the classification models. In Phases 2 and 3 in Figure 5, the training set was first oversampled, as suggested by previous studies (Chawla et al. 2002) (Phase 2). Afterwards, the classification model was constructed by applying the random forest algorithm to the oversampled training set (Phase 3). The hyperparameters of the random forest algorithm were tuned using the random search method with 3-fold cross-validation, and the final model was evaluated on the test set. Balanced accuracy, sensitivity and specificity were used to evaluate the classification performance, because the high and low fall risk groups were imbalanced (Bekkar, Djemaa, and Alitouche 2013).
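The oversampling step can be illustrated with a minimal Python sketch. The text does not name the method, so this sketch assumes the synthetic minority oversampling technique (SMOTE) introduced in the cited Chawla et al. (2002); the function name, the neighbour count `k`, and the toy data are hypothetical, and in practice a library implementation such as `imblearn.over_sampling.SMOTE` would normally be used.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority samples, sorted by squared Euclidean distance to base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy two-feature minority class (made-up values, not study data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_like_oversample(minority, n_new=6)
```

Only the training set should be oversampled, as in Phase 2; synthetic samples in the test set would bias the reported accuracy.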
Significance tests were conducted using IBM SPSS Statistics 20 (IBM Corporation, New York, United States) with a significance level of 0.05. Kinect data were processed in Visual Studio 2019 (Microsoft Corporation, Redmond, Washington, United States). Data augmentation and classification model construction were performed using the imbalanced-learn and scikit-learn packages in Python (Pedregosa et al. 2011).

Significant outcome measures and sample characteristics to distinguish high and low fall risk groups
Among all 66 outcome measures from the seven subtests of the developed Kinect-based test battery (see details in Appendix Table B1), 11 outcome measures were significantly different between the high and low fall risk groups in two-sample t-tests (Table 3). Among them, 10 outcome measures had significant discriminative power in the ROC analysis (Table 3). According to the results of the significance tests, the high fall risk group had the following characteristics: larger body sway and lower equilibrium score in SOT, shorter reach distance in LOS, longer time to complete STS5, shorter turning phase during TUG, longer time to complete CSRT, and higher total score in FES. There was no significant outcome measure from ROM.
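For readers unfamiliar with ROC analysis, the area under the ROC curve (AUC) of a single outcome measure equals the probability that a randomly chosen member of one group has a higher value than a randomly chosen member of the other group. The Python sketch below computes it directly from that definition; the function name and the example values are hypothetical and are not taken from Table 3.

```python
def roc_auc(scores_high, scores_low):
    """AUC as the fraction of (high, low) pairs in which the high-risk value
    is larger; ties count as half a win."""
    wins = 0.0
    for h in scores_high:
        for l in scores_low:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(scores_high) * len(scores_low))


# Made-up completion times in seconds (not study data).
high_risk_times = [14.2, 12.5, 16.0]
low_risk_times = [10.1, 12.5, 11.0, 9.8]
auc = roc_auc(high_risk_times, low_risk_times)
```

An AUC of 0.5 means the measure does not separate the groups at all, and 1.0 means perfect separation. For measures where the high fall risk group scores lower, such as reach distance, the two arguments would be swapped.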
As shown in Table 2, age and fall history were significant sample characteristics, and further univariate logistic regression analysis (Table 4) showed that fall history had significant discriminative power between the high and low fall risk groups (odds ratio = 2.636; p = 0.050). However, age did not show significant discriminative power between the high and low fall risk groups according to the ROC analysis (AUC = 0.634; p = 0.055). Therefore, only fall history was further included in the fall risk classification models.

Performance of fall risk classification models
Table 5 summarises the fall risk classification performance of the random forest classification models for the three different test sets. The classification model achieved an average balanced classification accuracy of 84.7%, sensitivity of 83.3%, and specificity of 86.1%.
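Balanced accuracy is the mean of sensitivity and specificity, which is why it suits imbalanced groups. The Python sketch below shows the computation from a confusion matrix and checks that the reported averages are mutually consistent; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, balanced accuracy) from counts."""
    sensitivity = tp / (tp + fn)  # share of high fall risk cases detected
    specificity = tn / (tn + fp)  # share of low fall risk cases detected
    balanced_accuracy = (sensitivity + specificity) / 2.0
    return sensitivity, specificity, balanced_accuracy


# Made-up confusion-matrix counts (not the study's test sets).
sens, spec, bal_acc = classification_metrics(tp=5, fn=1, tn=20, fp=4)

# Consistency check on the reported averages: (83.3% + 86.1%) / 2 = 84.7%.
reported_balanced_accuracy = (0.833 + 0.861) / 2.0
```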

Diagnosis of potential fall risk factors
Inspired by Lord et al.'s physiological profile approach for assessing fall risks (Lord, Menz, and Tiedemann 2003), our developed fall risk assessment system can further diagnose potential fall risk factors based on performance in the subtests. An individual's performance in relation to a normative database of 106 tested older participants was computed as percentile values so that deficiencies (in red) can be clearly visualised and targeted for intervention (Figure 6). Figure 6 shows typical examples of diagnostic reports for individuals with low (Figure 6, left) or high fall risks (Figure 6, right). The report not only provides an overall fall risk score equal to the fall probability estimated by the random forest classification model, but also provides detailed scores for six significant fall risk factors that correspond to the subtest results. For example, the older individual at high fall risk in Figure 6 (right) has a 69% probability of falling, with potential fall risk factors including deficiencies in cognitive function (central processing speed), postural stability, confidence in daily activities, and lower-limb function (postural adjustment and lower-limb strength).
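The percentile computation behind such a report can be sketched in Python as follows. The exact percentile definition used by the system is not given, so this sketch assumes a simple rank-based percentile; the function name, the `higher_is_better` flag, and the normative values are hypothetical.

```python
def percentile_rank(value, normative_values, higher_is_better=True):
    """Percentage of the normative sample that the individual performs
    at least as well as (assumed rank-based definition)."""
    if higher_is_better:
        matched = sum(1 for v in normative_values if v <= value)
    else:
        matched = sum(1 for v in normative_values if v >= value)
    return 100.0 * matched / len(normative_values)


# Made-up normative completion times in seconds (lower is better).
normative_times = [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
score = percentile_rank(16.0, normative_times, higher_is_better=False)
```

Expressing every subtest on the same 0 to 100 scale lets measures with different units, such as seconds, centimetres, and questionnaire points, be shown side by side in one report.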

Discussion
We developed a Kinect-based multifactorial fall risk assessment system for older people. It is low-cost, markerless, and convenient to use. A comprehensive test battery consisting of seven subtests was designed and implemented using Microsoft Kinect v2 and Unity3D. A follow-up experiment with 102 community-dwelling older women and an investigation of their prospective falls showed that our developed system can not only classify overall fall risk with good accuracy (84.7%), but also identify potential fall risk factors for effective interventions.

Kinect-based test battery and significant outcome measures to distinguish high and low fall risk groups
Ten outcome measures from the developed Kinect-based test battery and the 1-year fall history showed both statistically significant differences and good discriminative power to distinguish the high and low fall risk groups (Tables 3 and 4). Compared with the low fall risk group, the high fall risk group had more falls in the past year, larger body sway in SOT, and shorter reach distance in LOS. In addition, the high fall risk group showed a higher fear of falling (larger FES score) and took longer to complete STS5 and CSRT, but had a shorter turning phase during TUG. These results are largely consistent with previous studies that reported the major risk factors for fallers (Kerrigan et al. 2001; Lord et al. 2007; Kim and Xiong 2017; Qiu and Xiong 2017; Qiu et al. 2018).
Importantly, with the exception of ROM, at least one outcome measure from each of the seven subtests in the Kinect-based test battery showed a significant group difference and good discriminative power, indicating that the proposed subtests were effective for assessing fall risk in older people. ROM was the only subtest that did not have any significant outcome measure for classifying fall risk. Earlier studies have reported that the range of motion of the ankle (e.g. dorsiflexion or plantarflexion) plays an important role in assessing fall risk, whereas for knee-related measures strength is more important than range of motion (Jung and Yamasaki 2016; Menz, Auhl, and Spink 2018). However, because of the poor tracking quality of ankle joints by Kinect, we only measured the range of motion of the knee, not the ankle (Otte et al. 2016). ROC analysis showed that some Kinect-based outcome measures were more discriminative for fall risk than others (Table 3). Mean sit-to-stand time and total completion time of STS5 showed the highest discriminative power, followed by total completion time of CSRT and FES score. Overall, this study suggests that STS5, CSRT, and FES are important tests that should be included in a multifactorial fall risk assessment whenever possible.
It should be noted that even though the TUG is widely used in clinical practice to assess fall risk in patients at high risk of falling, the total completion time of TUG was not a significant outcome measure for predicting future falls of community-dwelling older people in this study. Although the association of TUG completion time with future falls is controversial, it is generally agreed that TUG completion time has limited ability to predict future falls, while it is effective in identifying a history of falls (Beauchet et al. 2011; Barry et al. 2014; Kojima et al. 2015). A potential reason is that a 3 m walk can be insufficient to assess fall risk, so a longer walking distance (5 to 6 m) is necessary to fully assess mobility and predict falls (Beauchet et al. 2011). In our study, because of the limited trackable range of Kinect, the TUG walking distance (2 m) was even shorter than the original 3 m, so sufficient predictive power could hardly be expected. As for the history of falls, because the TUG is performed after participants have already experienced falls, their behaviour may differ from that before the falls, becoming more conservative and safer (Howcroft, Kofman, and Lemaire 2013). Interestingly, the high fall risk group had a shorter turning phase during TUG than the low fall risk group. As shown in Appendix Table B1, the high fall risk group required slightly longer times in the sit-to-stand, walking, and stand-to-sit phases as well as in total completion time, but took a slightly shorter time in the turning phase, although these differences were not statistically significant. It seems that these small differences accumulated so that the proportion of the turning phase became significantly different between the groups. Furthermore, this may indicate that the high fall risk group takes longer in risky transitions such as sit-to-stand, stand-to-walk, and walk-to-sit (Howcroft, Kofman, and Lemaire 2013; Pozaic et al. 2016).

Accuracy of fall risk classification
Based on the proposed Kinect-based test battery, a follow-up experiment with 102 older participants showed that our fall risk classification model can achieve an average accuracy of 84.7%, with 83.3% sensitivity and 86.1% specificity. Several earlier studies attempted to develop wearable inertial sensor-based fall risk assessment systems. Caby et al. (2011) tested walking tasks for older people and classified high and low fall risk groups with 75-100% accuracies. However, these accuracies were based on a very limited sample of 20 participants and may therefore be difficult to generalise. Liu et al. (2011) and Gadelha et al. (2018) performed TUG, STS, step or muscle quality tests to evaluate the fall risk of 68 and 167 participants, respectively. However, their reported fall classification accuracies were 71% and 69.5%, which may not be sufficient for practical use. Yamada et al. (2011) explored the preliminary validity of using Nintendo Wii game programs to assess fall risk in older people. They tested 45 older women and correctly classified 88.6% of the cases (faller or non-faller) based on the basic step scores from the game program. Unfortunately, although it was a preliminary study, we did not find follow-up studies that verified its major findings and classification accuracy. Recently, Qiu et al. (2018) attached five inertial sensors and performed a multifactorial fall risk assessment test battery on 196 community-dwelling Korean older women. Their fall classification model achieved an overall accuracy of 89.4%, with 92.7% sensitivity and 84.9% specificity. Overall, the accuracy of our Kinect-based fall risk assessment system (84.7%) was comparable to those of previous inertial sensor-based fall risk assessment systems. Additionally, inertial sensors need to be worn accurately by older people and can induce movement interference and discomfort, whereas Microsoft Kinect is a markerless device that does not require the participant to wear bothersome sensors and can automatically detect major body landmarks during testing. Therefore, the Kinect-based fall risk assessment system should be more convenient and practical for older participants.
Even though many studies assessed fall risk using Kinect devices and reported statistically significant measures between low and high fall risk groups (Ejupi, Brodie, et al. 2016; Ejupi, Gschwind, et al. 2016; Dubois, Bihl, and Bresciani 2017; Kähär et al. 2018; Sun et al. 2019), only three comparable studies (Colagiorgio et al. 2014; Kargar et al. 2014; Tripathy, Chakravarty, and Sinha 2018) further attempted to classify individuals at high or low fall risk using Kinect-based test protocols and their outcome measures (Table 6). Table 6 shows that more comprehensive test protocols tend to produce higher classification accuracy, although it is difficult to directly compare these studies due to different samples, labelling criteria for the fall risk groups, and classification models. Kargar et al. (2014) and Tripathy, Chakravarty, and Sinha (2018) implemented a single test, such as TUG or single-leg standing, to assess fall risk, and the accuracies were 67.4% and 74.6%, respectively. In contrast, our study and Colagiorgio et al. (2014) achieved accuracies of around 85% with comprehensive and multifactorial test protocols. The higher classification accuracy from comprehensive test protocols could be explained by the fact that falling is a complex multifactorial phenomenon (Lord et al. 2007), so the test protocol should also be multifactorial for more accurate fall risk assessment. We found that Colagiorgio et al. (2014) reported slightly higher classification accuracy than ours (85.8% vs. 84.7%), even though our test protocol was more comprehensive. This may be due to their mixed sample of old and young adults and their different labelling criteria for the high fall risk group. Note that only our study labelled high and low fall risk groups based on prospective falls, which are difficult to obtain but should be the most valid criterion for predicting fall risk (Romli et al. 2021).
It is worth noting that we carefully designed the seven simple and quick subtests of the Kinect-based test battery to achieve a good balance between the accuracy of fall risk assessment, the efficiency of the test time, and the convenience of the test participant. All older participants in this study were able to complete the entire test battery within 25 minutes, further demonstrating the practicality and potential of our newly developed Kinect-based fall risk assessment system.

Practical applications of the developed Kinect-based fall risk assessment system
The developed Kinect-based fall risk assessment system is low-cost and markerless, and its test battery is simple and easy for seniors to complete. This makes it affordable and convenient for health professionals to objectively assess and regularly monitor the fall risks of seniors or of certain patients at high risk of falls. The system does not require expensive and bulky equipment, and Kinect is portable and easy to set up, so the system can be further applied to seniors living in the community and even at home. Another important advantage of our fall risk assessment system is its ability to diagnose potential fall risk factors for older individuals at high fall risk. The combination of a Kinect-based multifactorial test battery and a large sample of older people (N > 100) allowed us to quantitatively evaluate the performance of an older individual on each fall risk factor. By comparing individual performance data with a normative database of older participants, each individual's deficiencies and potential fall risk factors can be easily identified. This type of diagnostic report (Figure 6) can provide a good reference for doctors to conduct in-depth examinations and design tailored fall intervention programs that effectively reduce fall risks.
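The comparison against a normative database can be illustrated with a minimal sketch. The exact percentile definition used by the system is not stated in this section, so the mid-rank rule below is an assumption, and the function name `percentile_rank` and the normative TUG times are hypothetical values chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, normative_values, higher_is_better=True):
    """Percentile (0-100) of `value` within a normative sample.

    Ties are handled with the mid-rank rule (an assumption, not
    necessarily the rule used in the study).
    """
    ordered = sorted(normative_values)
    if not ordered:
        raise ValueError("normative database is empty")
    below = bisect_left(ordered, value)           # values strictly smaller
    ties = bisect_right(ordered, value) - below   # values equal to `value`
    pct = 100.0 * (below + 0.5 * ties) / len(ordered)
    # For measures where a smaller value means better performance
    # (e.g. completion times), flip the scale so that a low percentile
    # always indicates a deficiency.
    return pct if higher_is_better else 100.0 - pct


# Hypothetical normative TUG completion times in seconds (lower is better).
norm_tug = [7.9, 8.4, 8.8, 9.1, 9.5, 10.2, 10.9, 11.6, 12.4, 13.8]
score = percentile_rank(12.4, norm_tug, higher_is_better=False)
print(f"TUG percentile: {score:.1f}")
```

Flipping the scale for time-based measures keeps every axis of a report such as Figure 6 pointing the same way, so that low percentiles consistently flag candidate targets for intervention.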
It is worth noting that we evaluated the proposed fall risk assessment system only on older women. This is because gender is a well-accepted fall risk factor, and older females generally tend to have a higher fall risk than older males (Chang and Do 2015; Kim, Choi, and Xiong 2020). To make our fall risk assessment model more accurate and not confounded by gender, in this study we first focussed on older females, who have the higher fall risk. Therefore, applying the current model to both genders may lead to biased results. To address this issue, future research could adopt a similar approach to develop a tailored fall risk assessment model for older males.

Limitations
This study has some limitations. First, we used convenience sampling and focussed only on older females, who are usually at higher fall risk. Therefore, the main findings should be applied to the general older population with caution. Second, due to technical limitations, the Kinect-based multifactorial fall risk assessment focuses on major intrinsic risk factors rather than extrinsic factors (e.g. environmental hazards), although extrinsic factors are also important causes of falls. Third, for ease of adoption by older participants, we did not randomise the sequence of subtests, which may have caused learning or fatigue effects on certain subtests and introduced some bias into the results. Last but not least, how to link the Kinect-based fall risk assessment and risk factor diagnosis to Kinect-based tailored intervention programs should be further studied.

Conclusion
In this study, we developed a low-cost, markerless multifactorial fall risk assessment system for older people using a Microsoft Kinect v2. A Kinect-based test battery consisting of seven subtests (SOT, LOS, STS5, TUG, ROM, CSRT, and FES) was designed to comprehensively assess major fall risk factors on physiological, psychological, and integrated functions. A follow-up experiment was conducted with 102 community-dwelling Korean older women (22 in the high fall risk group and 80 in the low fall risk group, based on their prospective fall occurrences) to assess their fall risks. Experimental results showed that the high fall risk group performed significantly worse on the Kinect-based test battery, especially in the STS5, CSRT, and FES tests. Random forest classification models were then constructed to classify fall risk based on 10 significant outcome measures from the test battery together with 1-year fall history. The classification model achieved an average classification accuracy of 84.7%, with 83.3% sensitivity and 86.1% specificity. Furthermore, an individual's performance on the test battery was computed as the percentile value of a normative database so that deficiencies can be clearly visualised and targeted for intervention. The findings from this study indicate that the developed low-cost, markerless Kinect-based multifactorial fall risk assessment system can not only conveniently screen out 'at risk' older individuals with good accuracy, but also identify potential fall risk factors. Given that fall intervention programs tailored to each older individual's specific fall risk factors tend to be more effective, future research could extend the current fall risk assessment system with modular fall intervention programs in the form of tailored exergames for more effective and entertaining interventions. The developed system has good potential for effective fall intervention and prevention.

Limit of Stability
The reach distances were derived from hand joint data. Depending on the reach direction, the left or right hand joint was utilised. The right arm was used to perform the forward and rightward reaches, and the left arm was used for the leftward reach. Therefore, the z-value of the right hand was used to compute the forward reach distance, and the x-values of the left and right hands were used for the leftward and rightward reach distances, respectively. The absolute difference between the maximum and minimum hand position values was computed as the reach distance (Figure A3).
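The reach distance rule above can be sketched as follows. The function names and the per-frame hand positions (in metres) are hypothetical; only the axis choice per direction and the max-minus-min rule come from the text.

```python
def reach_distance(positions):
    """Absolute difference between the maximum and minimum hand position
    along a single axis, over all frames of one reach trial."""
    if not positions:
        raise ValueError("no hand positions recorded")
    return abs(max(positions) - min(positions))


def los_reach_distances(right_hand_z_forward, left_hand_x_leftward,
                        right_hand_x_rightward):
    """Reach distances for the three Limit of Stability directions.

    Forward uses the right-hand z-values; leftward and rightward use the
    left-hand and right-hand x-values, respectively.
    """
    return {
        "forward": reach_distance(right_hand_z_forward),
        "leftward": reach_distance(left_hand_x_leftward),
        "rightward": reach_distance(right_hand_x_rightward),
    }


# Hypothetical per-frame hand coordinates in metres (illustration only).
distances = los_reach_distances(
    right_hand_z_forward=[2.50, 2.48, 2.31, 2.22, 2.35, 2.49],
    left_hand_x_leftward=[-0.30, -0.41, -0.52, -0.33],
    right_hand_x_rightward=[0.28, 0.39, 0.55, 0.31],
)
print(distances)
```

Taking the absolute difference makes the result independent of the sign convention of the sensor axes, which is why the same helper serves all three directions.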

Sit to Stand 5 times
The y-coordinate of the spine shoulder joint data was used to extract the features (Figure A4). The total completion time was the time interval from the beginning of the task to the fifth full standing. The mean sit-to-stand time and stand-to-sit time were calculated by averaging the five sit-to-stand times and the four stand-to-sit times, respectively.
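As a minimal sketch of these timing features, the snippet below assumes that the phase boundaries have already been detected from the spine shoulder y-coordinate (the detection step itself is shown in Figure A4 and is not reproduced here). The function name `sts5_features` and all timestamps are hypothetical.

```python
from statistics import mean


def sts5_features(task_start, sit_to_stand, stand_to_sit):
    """Timing features for the Sit to Stand 5 times test.

    sit_to_stand: five (start, end) timestamps in seconds, in task order.
    stand_to_sit: four (start, end) timestamps in seconds, in task order.
    Phase boundaries are assumed to be detected beforehand.
    """
    if len(sit_to_stand) != 5 or len(stand_to_sit) != 4:
        raise ValueError("expected 5 sit-to-stand and 4 stand-to-sit phases")
    return {
        # From the beginning of the task to the fifth full standing.
        "total_completion_time": sit_to_stand[-1][1] - task_start,
        "mean_sit_to_stand_time": mean([e - s for s, e in sit_to_stand]),
        "mean_stand_to_sit_time": mean([e - s for s, e in stand_to_sit]),
    }


# Hypothetical phase boundaries in seconds (illustration only).
features = sts5_features(
    task_start=0.0,
    sit_to_stand=[(0.0, 1.0), (2.0, 3.0), (4.0, 5.0), (6.0, 7.0), (8.0, 9.5)],
    stand_to_sit=[(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)],
)
print(features)
```

Only four stand-to-sit phases are averaged because the task ends at the fifth full standing, so no fifth descent is recorded.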

Range of Motion test
Left and right knee range of motion angles were computed using position data of the hip, knee, and ankle joints (Equation A2). Whenever the participants straightened and bent their knees, the maximum and minimum points of each repetition were identified to calculate the full knee range of motion angle (Figure A5). Five knee range of motion angles could be obtained from three fully extended angles and three fully flexed angles, and the final knee range of motion outcome was calculated by averaging the three largest of these five angles.
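The computation can be sketched as below. Equation A2 is not reproduced in this section, so the sketch assumes the common definition of the knee angle as the angle at the knee between the knee-to-hip and knee-to-ankle vectors. It also assumes that the five range of motion values are the differences between consecutive extremes in the alternating extended/flexed sequence. All names and numbers are hypothetical.

```python
import math


def knee_angle(hip, knee, ankle):
    """Angle at the knee (degrees) between the thigh vector (knee -> hip)
    and the shank vector (knee -> ankle), from 3D joint positions."""
    thigh = [h - k for h, k in zip(hip, knee)]
    shank = [a - k for a, k in zip(ankle, knee)]
    norm = math.hypot(*thigh) * math.hypot(*shank)
    if norm == 0:
        raise ValueError("coincident joints: angle undefined")
    cos_angle = sum(t * s for t, s in zip(thigh, shank)) / norm
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def knee_rom(extreme_angles):
    """Final ROM outcome from six alternating extended/flexed knee angles:
    five consecutive differences, averaged over the three largest."""
    if len(extreme_angles) != 6:
        raise ValueError("expected 3 extended and 3 flexed angles")
    roms = [abs(b - a) for a, b in zip(extreme_angles, extreme_angles[1:])]
    return sum(sorted(roms, reverse=True)[:3]) / 3


# A straight leg gives 180 degrees; a right-angle bend gives 90 degrees.
straight = knee_angle((0, 1, 0), (0, 0, 0), (0, -1, 0))
bent = knee_angle((0, 1, 0), (0, 0, 0), (1, 0, 0))

# Hypothetical extremes in degrees: extended, flexed, extended, ...
rom = knee_rom([170, 60, 172, 58, 168, 62])
print(straight, bent, rom)
```

Averaging only the three largest values discards repetitions in which the knee was not fully extended or flexed, which would otherwise understate the range of motion.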

Choice Stepping Reaction Test
The elapsed time for each trial is the sum of the reaction time and the movement time. The reaction time is the time interval from when a panel is illuminated on the screen to when the participant starts to move his or her foot. The movement time is the time interval from when the foot starts to move to when the illuminated panel is stepped on. The total completion time is the sum of the elapsed times for the 20 trials. The mean reaction time and mean movement time were calculated by averaging the reaction times and movement times of the 20 trials, respectively.

Appendix B

Figure 3. 25 Skeletal joints tracked by Kinect and the coordinate system.
Figure 5 summarises the whole process of statistical data analysis and fall risk modelling. Two-sample t-tests were first conducted on all outcome measures from the seven subtests to identify those that differed significantly between the high and low fall risk groups. Then, a Receiver Operating Characteristic (ROC) analysis was performed on each significant outcome measure to examine its discriminative power in classifying the fall risk groups. Fall risk classification models were subsequently constructed using, as predictors, only the outcome measures that were significant in both the t-test and the ROC analysis. The same analysis process was performed for the sample characteristic variables in Table 2, which were collected through surveys and are easy to include in the fall risk classification model. For significance tests of categorical variables, such as the history of falls, the chi-squared test and univariate logistic regression were used (Phase 1). Because the performance of a classification model can vary depending on how the training and test datasets are split, three exclusive dataset splits were made to investigate the overall performance of the classification models. Phases 2 and 3 in Figure 5 represent an example of developing a fall risk classification model using one dataset split. Each dataset split consisted of a 70% training set and a 30% test set. Due to the concern of potentially biased classification results caused by the high imbalance between the high and low fall risk groups (22 vs 80), the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training set before constructing the classification model.
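The split-then-oversample step can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the stratified split, the neighbour count `k=5`, the random seeds, and the two synthetic features are all assumptions made for illustration, and a real analysis would more likely rely on an established library. The key point it demonstrates is that synthetic minority samples are generated from the training set only, so the test set contains real participants exclusively.

```python
import math
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately
    (stratification is an assumption, not stated in the text)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: each synthetic point is a random interpolation between
    a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical data: 22 high-risk (1) and 80 low-risk (0) participants,
# each with two made-up features.
data_rng = random.Random(42)
labels = [1] * 22 + [0] * 80
features = [[data_rng.gauss(1.0 if y else 0.0, 1.0), data_rng.gauss(0.0, 1.0)]
            for y in labels]

train_idx, test_idx = stratified_split(labels)
train_minority = [features[i] for i in train_idx if labels[i] == 1]
n_majority = sum(1 for i in train_idx if labels[i] == 0)

# Oversample the minority class in the training set until both classes match.
synthetic = smote(train_minority, n_majority - len(train_minority))
print(len(train_idx), len(test_idx), len(synthetic))
```

Applying SMOTE before splitting would let interpolated copies of test participants leak into training and inflate the reported accuracy, which is why the order described above matters.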

Figure 4. Flowchart for deriving outcome measures of TUG from Kinect sensor data (A), and automatic feature extraction for turning phase, entire TUG phase, sit-to-stand and stand-to-sit phases, and double-support moments in the gait cycle (B).

Figure 5. The whole process of statistical data analysis and fall risk modelling.

Figure 6. Typical examples of diagnostic reports for identifying potential fall risk factors: an individual with low fall risk (left) and an individual with high fall risk (right).

Figure A1. Body sway angle based on the full body COM position and ankle position.

Figure A2. Method to find features from Sensory Organisation Test: (A) COG trajectories in the AP direction, (B) body sway angles in the AP direction: ‹ and › indicate maximum and minimum sway angles during the task, respectively.

Figure A3. Method to find features of the forward reach task of Limit of Stability.

Figure A4. Method to find features of Sit to Stand 5 times.

Figure A5. Method to find features of Range of Motion test.

Table 1. A Kinect-based multifactorial test battery to assess fall risk: subtests and representative outcome measures.
Equilibrium Score (ES): The average centre of gravity (COG) sway for each condition in AP/ML directions; Composite Equilibrium Score (CES): A weighted average of equilibrium scores under the 4 conditions. It is derived from the individual equilibrium scores; Sensory analysis ratios: Ratios of the average ESs of C2
Total completion time: The average elapsed time from the beginning to last standing up; Average time of sit-stand (stand-sit): The average elapsed time from sitting to standing or standing to sitting; COG moving distances: The average COG moving distance in AP and ML directions while sit-stand and stand-sit
Timed Up and Go (TUG): Stand from a chair, walk 2 m, turn, come back, and sit on the chair with normal walking speed (the test was repeated twice). Total completion time: Average elapsed time to complete the TUG test; Elapsed time of each phase: Average elapsed times for sit-to-stand, walking, turning, and stand-to-sit phases; Duration of each phase (%): The percentage of the elapsed time of each phase to the total completion time;

Table 2. Sample characteristics of high fall risk group and low fall risk group. Continuous variables were analysed with two-sample t-tests and categorical variables were analysed with chi-squared test.
Characteristics | High fall risk group (N1 = 22) | Low fall risk group (N2 = 80) | Two-sample comparison (a), p-value

Table 3. Significant outcome measures of the Kinect-based test battery to distinguish high and low fall risk groups. Significant difference with p ≤ 0.05; # Significant outcome measures in both two-sample t-test and ROC analysis.

Table 4. Significant sample characteristics to distinguish high and low fall risk groups.

Table 5. Classification performance of fall risk classification models for the test set in each dataset split.

Table 6. Comparison of Kinect-based fall risk assessment studies and fall risk classification performances.

Table B1. Results of two-sample t-tests for all Kinect-based outcome measures in the test battery for future falls.

Significant difference with p ≤ 0.05. Outcome measures of significant group difference are in bold.