Prior work evaluating the efficacy of the MFS has identified key limitations in its ability to accurately assess fall risk.[7–9] Although alternative fall-risk assessment tools have been proposed, these tools often do not account for changes in a patient's clinical presentation, nor do they leverage the robust data available within the EHR.
In this assessment, we utilized existing EHR data to develop a time series classification model with the goal of more accurately predicting the likelihood of falls within VHA acute care settings. Our finalized model obtained an AUROC of 85.1% and an AUPRC of 28.5%. Although an AUROC of 85.1% is considered good, an AUPRC of 28.5% is low.[39] However, given the severe class imbalance, in which only 2.5% of Veterans had a documented fall, an AUPRC of 28.5% is substantially better than identifying those at risk by chance alone. Based upon the optimal geometric mean, we calculated a specificity of 76.2% and a sensitivity of 77.3%. Although the MFS was originally reported to have a sensitivity of 78% and a specificity of 83%,[6] recent studies have shown sensitivities as low as 36.9% and specificities as low as 48.3%.[8–10] As such, our model outperformed recent evaluations of the MFS.
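The geometric-mean threshold selection described above can be sketched as follows. This is a minimal illustration with synthetic labels and scores (mimicking the ~2.5% fall prevalence), not the study's actual code or data:

```python
# Illustrative sketch: choosing the probability threshold that maximizes
# the geometric mean of sensitivity and specificity along the ROC curve.
# All data here are synthetic; values do not reproduce the study's results.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.025).astype(int)   # ~2.5% positives
y_score = np.clip(0.2 * y_true + rng.normal(0.3, 0.15, 10_000), 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
gmeans = np.sqrt(tpr * (1 - fpr))  # geometric mean of sensitivity, specificity
best = int(np.argmax(gmeans))
print(f"threshold={thresholds[best]:.3f} "
      f"sensitivity={tpr[best]:.3f} specificity={1 - fpr[best]:.3f}")
```

With severe class imbalance, this optimum typically falls at a low probability threshold, consistent with the behavior discussed in the limitations below.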
Compared to the performance of other tools, our model appears to be an improvement. For example, Nassar et al. evaluated the Hendrich II Fall Risk Model and obtained a sensitivity of 55.3% and a specificity of 89.3%.[9] In addition, in an evaluation of the Johns Hopkins Fall Risk Assessment Tool, Kim et al. calculated a sensitivity of 67.8% and a specificity of 80.2%.[8] The greater specificity at the cost of sensitivity in these alternative models is important when considering how to optimally balance care goals and available resources.
Understanding clinical trajectories and disease progression is known to be important for forecasting patient outcomes,[40, 41] a feature prior fall prediction work has not included. Accordingly, we incorporated temporal events to help the model learn the time-dependent latent structures and feature representations within the data. This enabled the model to evaluate the magnitude and sequence of events that may impact clinical outcomes.[42] For example, knowing that a medication was prescribed directly following a surgical procedure may provide greater information than separate documentation of a patient's past surgical history and prescription medications.
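The idea of preserving event order and timing can be sketched as follows. This is a hypothetical illustration of temporal event encoding, not the study's pipeline; the event names and codes are invented for the example:

```python
# Hypothetical sketch: representing EHR events as a time-ordered sequence
# so a model can see that, e.g., a medication was prescribed directly
# after a surgical procedure. Codes below are illustrative, not a real
# clinical vocabulary.
from dataclasses import dataclass

@dataclass
class Event:
    day: int    # days since admission
    kind: str   # e.g., "procedure", "medication"
    code: str

events = [
    Event(0, "procedure", "hip_replacement"),
    Event(1, "medication", "opioid_analgesic"),
    Event(3, "medication", "anticoagulant"),
]

# Sort by time and emit (time-offset, token) pairs; the ordering and the
# gaps between events carry information that a static summary of "past
# surgical history" plus "medication list" would lose.
events.sort(key=lambda e: e.day)
sequence = [(e.day - events[0].day, f"{e.kind}:{e.code}") for e in events]
print(sequence)
```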
Our work has revealed and quantified several important demographic risk factors for falls in our patient population. As expected, older patients were at increased risk for falls, as were men. We also noted race and ethnicity differences in fall risk, albeit with greater variability. These results emphasize the sometimes subtle and nuanced nature of fall risk. In addition, we observed notable differences in the model's performance across demographic groups. This is an important reminder that the optimal use of any model may require it to be adjusted or configured for an individual patient.
There are several notable strengths to our work. We utilized existing historical EHR data from a large national cohort, which allowed for prompt analysis representative of our specific VHA population. This is particularly important, as training a model on the population in which it will be used is expected to produce more appropriate and relevant results.[43] Furthermore, building a fall prediction model that utilizes existing and available data will allow for the creation of an automated and efficient clinical decision support tool that can empower staff to redirect the time saved to other tasks. In addition, by evaluating only falls that occurred within acute care settings, in which reporting falls is required,[5] we optimized the reliability of the falls data.[2, 44–46]
There are also several limitations. This assessment was performed for our specific Veteran population, which is statistically older and has a greater likelihood of comorbidities, as well as specific conditions that influence results and limit generalizability. In addition, because only 2.5% of patients had a documented fall during their inpatient stay, there was a class imbalance for the model. In general, severe class imbalance such as this limits the ability to predict patient outcomes accurately and precisely. This is further emphasized by the ill-fitting probability calibration curve and the low probability threshold defined by the geometric mean. While the prevalence of falls reflects that documented elsewhere,[47, 48] further work is needed to understand the nuanced differences between these two cohorts to enhance model performance. Exploring alternative data sources or extracting more granular data may assist in these efforts. In addition, the performance of this model was measured against historical data; although the results are promising for improved care, the performance may not reflect the results from clinical use. Therefore, a more complete understanding of the effectiveness of any risk assessment tool requires rigorous prospective assessment.
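The calibration assessment mentioned above can be sketched with a reliability curve. This is a minimal illustration on synthetic, deliberately miscalibrated scores, not the study's code or results:

```python
# Illustrative sketch: a reliability (calibration) curve compares predicted
# probabilities to observed event rates per bin; a well-calibrated model
# tracks the diagonal. Synthetic scores below are skewed to mimic an
# ill-fitting calibration curve under severe class imbalance.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
y_true = (rng.random(10_000) < 0.025).astype(int)   # ~2.5% positives
y_prob = np.clip(rng.beta(0.5, 8, 10_000) + 0.3 * y_true, 0, 1)

frac_pos, mean_pred = calibration_curve(
    y_true, y_prob, n_bins=10, strategy="quantile"
)
# Well-calibrated scores would give frac_pos close to mean_pred in each bin.
for fp, mp in zip(frac_pos, mean_pred):
    print(f"predicted={mp:.3f} observed={fp:.3f}")
```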
As noted, substantial and robust data remain available within unstructured clinical notes that were not utilized in this model. Incorporating these additional data may further improve fall risk identification.[19, 49, 50] Therefore, future work should examine whether leveraging emerging tools, such as large language models that can extract insights from unstructured text, will improve performance.
Furthermore, the methods employed within this assessment do not allow for model transparency. Although alternative classification methods (e.g., decision trees, logistic regression) would allow us to examine feature importance, these tools often lack the robustness that time series methods and transformer models provide. Future work is needed to enhance the interpretability of time series models and to identify feature importance.