In this paper, we contend that the conventional approach to training and evaluating machine learning models frequently overlooks the real-world organizational or societal contexts in which those models are meant to operate. Shifting to this context-aware perspective changes how we assess and select machine learning models. Our focus is on integrating models into practical workflows that involve both machines and human experts, where humans intervene whenever the machine is not sufficiently confident in its prediction. We demonstrate that traditional metrics such as accuracy and F-score fall short of capturing the true value of machine learning models in such hybrid settings. To address this issue, we introduce a simple but theoretically sound strategy for adapting existing machine learning models so as to maximize value. An extensive experimental evaluation highlights the importance of the value-based perspective in evaluating models, as well as the impact of calibration and out-of-distribution settings on model value.
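
To make the value-based perspective concrete, the following minimal sketch contrasts plain accuracy with a simple value metric for a hybrid machine-human workflow in which low-confidence predictions are deferred to a human expert. The function names, gain/cost figures, confidence threshold, and the simulated confidence profile are illustrative assumptions for exposition, not the paper's actual formulation.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Standard accuracy: fraction of correct predictions."""
    return np.mean(y_true == y_pred)

def hybrid_value(y_true, y_pred, confidence, threshold,
                 gain_correct=1.0, cost_error=5.0, cost_deferral=0.5):
    """Average per-instance value of a machine+human workflow (illustrative).

    Predictions with confidence below `threshold` are deferred to a human
    expert (assumed to answer correctly, at a review cost); the rest are
    answered by the model, earning a gain when correct and paying a
    (typically larger) cost when wrong.  All gains/costs are made-up numbers.
    """
    deferred = confidence < threshold
    machine_correct = (y_true == y_pred) & ~deferred
    machine_wrong = (y_true != y_pred) & ~deferred
    total = (gain_correct * machine_correct.sum()
             - cost_error * machine_wrong.sum()
             + (gain_correct - cost_deferral) * deferred.sum())
    return total / len(y_true)

# Toy example: a model with fixed accuracy can still yield high value in the
# hybrid setting if its confidence separates correct from incorrect predictions.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_pred = np.where(rng.random(1000) < 0.85, y_true, 1 - y_true)   # ~85% accurate
confidence = np.where(y_true == y_pred,
                      rng.uniform(0.7, 1.0, 1000),   # confident when right
                      rng.uniform(0.3, 0.7, 1000))   # uncertain when wrong

print("accuracy:", accuracy(y_true, y_pred))
print("value (threshold=0.7):", hybrid_value(y_true, y_pred, confidence, 0.7))
```

Under these assumptions, two models with identical accuracy can differ substantially in value, since a well-calibrated model defers exactly the cases it would get wrong, which is the gap between accuracy and value that the abstract refers to.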