This study presents an in-depth analysis of the false positive cases produced by a sepsis prediction algorithm and proposes a methodology to mitigate false alarms. To our knowledge, this is the first focused false positive analysis for an ML sepsis prediction model. By examining patients with underlying conditions that register high SOFA scores but with no identified sepsis or infection, we identified major contributors to false alarms. The proposed methodology yielded percent change increases of up to 6.9% in PPV and up to 3.52% in specificity, while sensitivity decreased by 4.17% yet remained above the 0.85 threshold. Importantly, we demonstrate that this methodology reduced false positives by up to 10.89% on the test sets, while the reduction in true positives was only 3.95%.
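These performance figures follow directly from confusion-matrix counts. The short sketch below shows how such percent changes are computed; the counts in it are purely illustrative placeholders, not the study's data.

```python
def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics reported in the study."""
    return {
        "ppv": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }

def percent_change(before, after):
    return 100.0 * (after - before) / before

# Purely illustrative counts (NOT the study's data): raising subgroup
# thresholds removes many false positives at the cost of a few true positives.
before = metrics(tp=480, fp=620, tn=8800, fn=55)
after = metrics(tp=461, fp=552, tn=8868, fn=74)
for name in before:
    print(f"{name}: {percent_change(before[name], after[name]):+.2f}%")
```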
Artificial intelligence-based sepsis prediction models have high false positive rates, ranging from 15% to 47%.(4, 12, 13) When machine learning is used, causes of false alerts may include site-to-site variation,(14) differences in hospital workflow and practices,(15) and variation in data missingness.(16) Brief analyses of false alarms have been conducted for other prediction algorithms,(9, 17) but they provided neither a methodology to mitigate false positives nor evidence of a successful reduction in false alerts. One important aspect recognized in prior studies is the effect of sepsis mimics, via high SOFA scores, on false positives.(18–20) To explore this in more detail, we examined several comorbidities that give rise to high SOFA scores but do not necessarily lead to sepsis. By focusing on these individual subpopulations and adjusting their thresholds in the ML model, we devised a method that reduces false positives for our sepsis prediction algorithm while holding sensitivity above 0.85. Our findings suggest this methodology could be applied more broadly to other machine learning prediction algorithms to reduce false alarms.
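A minimal sketch of this idea follows, under assumptions that are not from the paper (the function name, `risk_score` array, and subgroup mask are illustrative): for a given sepsis-mimic subgroup, raise that subgroup's alert threshold as far as possible while overall sensitivity stays at or above 0.85.

```python
import numpy as np

def tune_subgroup_threshold(y_true, risk_score, subgroup_mask,
                            base_threshold=0.5, min_sensitivity=0.85):
    """Raise the alert threshold for one comorbidity subgroup as far as
    possible while overall sensitivity remains at or above `min_sensitivity`.

    y_true        : binary sepsis labels for all encounters
    risk_score    : model risk scores in [0, 1]
    subgroup_mask : True for encounters in the sepsis-mimic subgroup
    """
    best = base_threshold
    for candidate in np.arange(base_threshold, 1.0, 0.01):
        # Subgroup uses the raised threshold; everyone else keeps the base one.
        thresholds = np.where(subgroup_mask, candidate, base_threshold)
        alerts = risk_score >= thresholds
        tp = np.sum(alerts & (y_true == 1))
        fn = np.sum(~alerts & (y_true == 1))
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        if sensitivity >= min_sensitivity:
            best = candidate      # still meets the sensitivity floor
        else:
            break                 # raising further would drop sensitivity below 0.85
    return best
```

In practice, such a search would be run separately for each identified comorbidity subgroup on a development set, and the resulting thresholds fixed before evaluation on the held-out test sets.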
The substantially larger reduction in false positives than in true positives suggests this methodology could prove useful in clinical settings. However, implementing these reductions in false positive cases in a prospective setting would require adjustments. One approach is to scale the weight of specific underlying conditions during model development.(21) Essentially, underlying conditions that contribute substantially to false positive cases could be weighted so as to reduce their effect on model predictions. This would reduce the number of false positive cases before model output in a streamlined manner. Another approach would be to attach a notification caveat to any alert generated for a patient identified as having a known underlying condition. Completely excluding patients with known underlying conditions that mimic sepsis could be dangerous, because missed detections can cause considerable harm and these patients may still be quite sick. A notification caveat would enable the health care provider to use their best judgment regarding patient status and potential treatment actions. Both methods of implementing reductions in false positives would need to be fully analyzed to ascertain their potential benefit in reducing clinician workload and improving patient outcomes.
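To illustrate the caveat approach, the sketch below flags, rather than suppresses, alerts for patients with a documented sepsis-mimicking condition. The condition list, field names, and threshold are assumptions made for illustration, not the study's implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative placeholder list; the actual mimic conditions would come from
# the false positive analysis, not from a hard-coded set.
SEPSIS_MIMIC_CONDITIONS = {"end_stage_renal_disease", "chronic_liver_disease"}

@dataclass
class SepsisAlert:
    patient_id: str
    risk_score: float
    caveat: Optional[str] = None  # shown to the clinician alongside the alert

def build_alert(patient_id: str, risk_score: float,
                conditions: set, threshold: float = 0.5) -> Optional[SepsisAlert]:
    """Generate an alert above threshold; add a caveat instead of suppressing
    the alert when the patient has a known sepsis-mimicking condition."""
    if risk_score < threshold:
        return None
    mimics = SEPSIS_MIMIC_CONDITIONS & set(conditions)
    caveat = None
    if mimics:
        caveat = ("Known sepsis-mimicking condition(s): "
                  + ", ".join(sorted(mimics))
                  + "; interpret with clinical judgment.")
    return SepsisAlert(patient_id, risk_score, caveat)
```

An alert for a patient whose problem list includes one of these conditions would still fire, but would carry the caveat so the clinician can weigh it against the patient's baseline organ dysfunction.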
Limitations
This study has several limitations. It is retrospective in nature, so the performance of the algorithm and the underlying conditions driving the false positive cases may differ in a prospective clinical setting. External validation in prospective clinical settings is necessary to determine the consistency of the model output and of the false positive analysis. It would also be needed to determine how health care providers respond to alerts for patients at risk of sepsis and to assess the potential benefit for sepsis patient outcomes.
Future perspectives
This detailed analysis of false positive cases highlights potential false alerts that could be reduced. Future sepsis prediction algorithms that incorporate false positive analysis, either during algorithm design and development or as post hoc analysis of algorithm output, may enable pre-alert notification stratification informed by end-user preferences.