Background Unlike linear models, complex machine learning models can capture non-linear interrelations and offer opportunities to identify novel risk factors. Explainable artificial intelligence (XAI) allows such models to retain their accuracy advantage while revealing novel insights into outcomes such as mortality. This paper comprehensively analyzes all-cause mortality by explaining complex machine learning models.
Methods We propose the IMPACT framework, which uses a principled XAI technique to explain a state-of-the-art tree ensemble mortality prediction model. We apply IMPACT to understand all-cause mortality for 1-, 3-, 5-, and 10-year follow-up times and for age groups of <40, 40-65, 65-80, and ≥80 years within the NHANES 1999-2014 dataset, which contains 47,261 samples and 151 features.
Results Here we show that IMPACT models achieve high accuracy in every mortality prediction task. Using IMPACT, we identify several previously overlooked risk factors and feature interactions. Furthermore, we identify relationships between laboratory features and mortality that may suggest adjusting established reference intervals. Finally, we develop highly accurate, efficient, and interpretable mortality risk scores. We ensure generalizability by performing temporal validation of the mortality risk scores and external validation of the key findings with the UK Biobank dataset. All our results and risk scores are available on a website where the relationships can be explored in detail to generate new research hypotheses.
Conclusions IMPACT’s unique strength is its explainable prediction, which provides insights into the complex, non-linear relationships between mortality and an individual's features. Our explainable risk scores could help individuals improve self-awareness of their health status and help clinicians identify high-risk patients. IMPACT takes a significant step towards bringing contemporary developments in XAI, which have already revolutionized fields like finance, to epidemiological research.