To run the two models and evaluate the results, we used the tools and techniques listed in Table 1.
Table 1
Tools and techniques used in the triage fall detection model.
Tool/Technique | Description |
Pose estimation | Estimation of human pose, used to track and detect human movements. |
TensorFlow | An ML framework used to build AI models and solve problems such as detecting human poses. |
Key points | Human joints (a.k.a. landmarks) such as wrists, knees, and shoulders. |
Keras | A high-level, Python-based neural network API that offers a user-friendly and modular approach to building deep learning models. |
MoveNet | An ultra-fast model that accurately detects 17 key points of a body. |
Feature Importance | An ML technique used to evaluate the significance and impact of individual features in predicting the target variable [37, 38]. |
CNN
The images were fed into MoveNet to extract the 17 keypoints. MoveNet generates features for these keypoints, 51 features in total.
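The total of 51 is consistent with MoveNet's single-pose output, which gives three values per keypoint (y, x, and a confidence score), so 17 × 3 = 51. The source does not state this breakdown, so it is an assumption here. A minimal sketch of the flattening step, using a hypothetical helper `flatten_keypoints` and placeholder values in place of real model output:

```python
# The 17 keypoint names in MoveNet's (COCO-style) output order.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def flatten_keypoints(keypoints):
    """Turn 17 (y, x, score) triples into one 51-entry feature dict.

    Hypothetical helper: the feature naming scheme is an assumption,
    not the authors' documented pipeline.
    """
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected 17 keypoints")
    features = {}
    for name, (y, x, score) in zip(KEYPOINT_NAMES, keypoints):
        features[f"{name}_y"] = y
        features[f"{name}_x"] = x
        features[f"{name}_score"] = score
    return features


# Placeholder values only; real input would come from the model.
dummy_keypoints = [(0.5, 0.5, 0.9)] * 17
features = flatten_keypoints(dummy_keypoints)
print(len(features))  # 51
```

Each image then contributes one row of 51 values to the dataset used by the classifiers.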
For both models, we performed a comparative analysis to evaluate performance. The confusion matrix was used to measure the performance of the DL model against the FRF model in terms of accuracy, specificity, and sensitivity. Figure (7) shows the confusion matrix classification measures.
The DL model achieved 82% accuracy. Figure (8) shows a visual representation of the training and validation sets.
The second model, FRF, achieved an accuracy of 94%. Figure (9) presents the visual representation of the training and validation sets.
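The three measures follow directly from the four cells of a binary confusion matrix. A minimal sketch, assuming "fall" is treated as the positive class (the source does not say which class is positive) and using made-up counts that are not taken from the study:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: positive ("fall") samples classified correctly/incorrectly
    tn/fp: negative ("not fall") samples classified correctly/incorrectly
    Assumes both classes are present, so no denominator is zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Illustrative counts only; not results from the paper.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity measures how many actual falls are caught, while specificity measures how many non-falls are correctly left alone, so a model can score well on one and poorly on the other.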
Table 2
The three performance metrics for the two experiments: DL model and FRF model.
Classification model | Accuracy (%) | Specificity (%) | Sensitivity (%) |
a) DL model | 82 | 65 | 82 |
b) FRF model | 94 | 91 | 94 |
Table 2 shows the performance results for a) the DL model and b) the FRF model. As seen in Figure (10), which shows the accuracy of both models, the results show the superiority of the FRF model and its ability to generalize better. We attribute this result to the use of the feature importance RF. The RF layer enhanced fall detection decision-making, which supports our hypothesis.
Statistical Analysis
To conduct the t-test, we collected 40 additional samples representing scenarios with “falls” and “not falls” (N = 40 images in each group), splitting them into two groups. The samples were used for a comparative analysis between two distinct groups: a control group and a test group.
a. DL model:
The control group is the nursing evaluation and the test group is the DL model. The t-test gives a p-value of 0.0089, which is < 0.05, and a 95% confidence interval from 0.08 to 0.51, indicating a statistically significant difference between the assessment of the nursing team (control group) and that of the DL model (test group).
The confidence interval (0.08 to 0.51) does not include zero, which indicates a significant effect in the data. The t value is 2.7320, with 46 degrees of freedom (df) and a standard error of the difference of 0.107.
As shown in Table 3, the mean of the control group is higher than the mean of the test group. Furthermore, the standard deviation (SD) is smaller in the control group, which indicates less variability and higher precision in the mean. The standard error of the mean (SEM) in the control group is also smaller than in the test group. These observations lead us to reject the null hypothesis for the DL model.
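The figures reported for the DL model comparison can be cross-checked against each other. The sketch below assumes a two-sided 95% interval of the form difference ± t_crit × SE, recovers the mean difference as t × SE, and uses a critical value of about 2.013 for df = 46, which is a standard t-table value and not a number given in the source. The helper name `ci_from_t` is hypothetical.

```python
import math


def ci_from_t(t_value, se_diff, t_crit):
    """Recover the mean difference and its confidence interval.

    t_value: reported t statistic
    se_diff: reported standard error of the difference
    t_crit:  two-sided critical t value for the chosen level and df
    """
    diff = t_value * se_diff  # because t = diff / se_diff
    margin = t_crit * se_diff
    return diff, diff - margin, diff + margin


# Reported t and SE for the DL model; t_crit ~ 2.013 is an assumed
# table value for df = 46 at the 95% level.
diff, low, high = ci_from_t(t_value=2.7320, se_diff=0.107, t_crit=2.013)
print(round(diff, 2), round(low, 2), round(high, 2))  # 0.29 0.08 0.51

# Unpooled SE of the difference from the two SEMs in Table 3
# (0.04 and 0.10); it comes out close to the reported 0.107.
se_from_sems = math.hypot(0.04, 0.10)
print(round(se_from_sems, 3))  # 0.108
```

Under these assumptions the interval comes out as 0.08 to 0.51, matching the range reported with the p-value, and it lies entirely above zero, which is why the null hypothesis is rejected for this model.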
b. FRF model:
As in experiment #1, the control group is the nursing evaluation and the test group is the FRF model. The p-value is 0.3110, which is > 0.05, and the 95% confidence interval ranges from −0.05 to 0.15, indicating no statistically significant difference between the assessment of the nursing team (control group) and that of the FRF model (test group).
The confidence interval (−0.05 to 0.15) includes zero, which shows there is no statistically significant effect in the data.
After analyzing the mean, SD, and SEM of the control and test groups, as shown in Table 3, we note the following:
The mean for the control group is 0.98, while the mean for the test group is 0.93. The control group therefore scored slightly higher than the test group on average.
The FRF model outperforms the DL model (test group mean of 0.93 versus 0.67).
The SD for the control group (0.16) is less than the SD for the test group (0.27). This shows that the results in the control group are more concentrated around the mean, while the results in the test group are more spread out.
The SEM for the control group (0.02) is less than the SEM for the test group (0.04), which suggests that the control group mean is estimated more precisely.
Based on the above, the FRF model's assessment does not differ significantly from that of the nursing team. Thus, the null hypothesis cannot be rejected for FRF.
Table 3
T-test results for the control and test groups for both models.
Model | Statistic | Control group | Test group | p-value |
DL model | Mean | 0.98 | 0.67 | 0.0089 |
 | SD | 0.20 | 0.48 | |
 | SEM | 0.04 | 0.10 | |
FRF model | Mean | 0.98 | 0.93 | 0.3110 |
 | SD | 0.16 | 0.27 | |
 | SEM | 0.02 | 0.04 | |
To evaluate the performance of our FRF algorithm, we compared it with the method currently used in hospital emergency departments, a triage system carried out by humans only.
As shown in Table 2, dimensionality reduction in FRF was important for refining triage fall detection accuracy and improving computational efficiency, giving an overall accuracy of 94%, which is 12 percentage points higher than the DL model. We also used balanced data for both classes. Hence, FRF shows that it can be as good as human classification, as the statistical analysis in Table 3 indicates.