The study cohort included 49,282 patients (median [IQR] age, 64.0 [28.0] years); 51.7% self-identified as female and 29.5% self-identified as Black or African American. Overall, 52.2% were on Medicare and 28.7% were on Medicaid. Baseline characteristics are summarized in Table 1.
Discrimination and Calibration
Overall, the model had a c-index of 0.83 (95% CI: 0.82, 0.84), with a pre-recalibration calibration intercept of -1.25 (95% CI: -1.28, -1.22) and slope of 1.55 (95% CI: 1.51, 1.59) (Table 2). The c-indices for Black and White patients were 0.843 and 0.829, respectively. The sensitivity, specificity, PPV, and NPV for Black patients were 0.76, 0.75, 0.49, and 0.91, respectively. The sensitivity, specificity, PPV, and NPV for White patients were 0.75, 0.73, 0.47, and 0.90, respectively. Prior to recalibration, the calibration of the model was similar for Black and White patients (Tables 3–4, Supplementary Fig. 1A-B). Logistic recalibration improved calibration across several statistics: rescaled Brier score, calibration intercept and slope, Emax, and Eavg. Full calibration statistics for Black and White patients are shown in Table 3. Calibration curves for Black and White patients are displayed in Fig. 1A-C. Calibration statistics were also similar in Asian patients (Supplementary Table 1).
Table 2
Overall Calibration Statistics for MUST-Plus Model
| No Recalibration (95% CI) | Recalibration In the Large (95% CI) | Logistic Recalibration (95% CI) |
Rescaled Brier Score | 0.01 (0.01, 0.03) | 0.25 (0.24, 0.26) | 0.27 (0.26, 0.28) |
Calibration Intercept | -1.25 (-1.27, -1.22) | 0.49 (0.46, 0.52) | 0 |
Calibration Slope | 1.55 (1.51, 1.59) | 1.55 (1.51, 1.59) | 1 |
Emax | 0.27 (0.26, 0.28) | 0.26 (0.22, 0.28) | 0.01 (0.008, 0.03) |
Eavg | 0.19 (0.19, 0.20) | 0.05 (0.04, 0.05) | 0.005 (0.003, 0.007) |
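As an illustration of how statistics of the kind reported in Table 2 can be computed, the following is a minimal Python sketch, not the authors' code. It assumes that the calibration intercept and slope are the coefficients a and b from a logistic regression of the observed outcome on the logit of the predicted risk, and that logistic recalibration replaces each prediction p with expit(a + b * logit(p)). Under that assumption the refitted intercept and slope are 0 and 1 by construction, which matches the last column of Table 2. The data are synthetic and all function names are hypothetical.

```python
import numpy as np

def logit(p):
    # Clip to avoid infinities at predicted risks of exactly 0 or 1.
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.log(p / (1 - p))

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_calibration(y, p, n_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson; return (a, b)."""
    X = np.column_stack([np.ones_like(p), logit(p)])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = expit(X @ beta)
        grad = X.T @ (y - mu)                          # score vector
        hess = X.T @ (X * (mu * (1 - mu))[:, None])    # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta[0], beta[1]

def logistic_recalibrate(p, a, b):
    """Apply the fitted intercept and slope to the original predictions."""
    return expit(a + b * logit(p))

# Synthetic example (assumption): the true log-odds equal
# -1.25 + 1.55 * (model log-odds), i.e. the model overestimates risk.
rng = np.random.default_rng(0)
true_lp = rng.normal(-1.0, 1.5, size=20000)
y = rng.binomial(1, expit(true_lp)).astype(float)
p_model = expit((true_lp + 1.25) / 1.55)

a, b = fit_calibration(y, p_model)          # estimates near -1.25 and 1.55
p_recal = logistic_recalibrate(p_model, a, b)
a2, b2 = fit_calibration(y, p_recal)        # 0 and 1 up to numerical error
print(f"before: intercept={a:.2f}, slope={b:.2f}")
print(f"after:  intercept={a2:.2f}, slope={b2:.2f}")
```

In this sketch a negative intercept means observed risk is lower than predicted (overestimation), and a slope above 1 means the predictions are not spread widely enough.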
Table 3
Calibration Statistics for MUST-Plus Model by Race, Gender, and Year
| No Recalibration | Recalibration In the Large | Logistic Recalibration |
| Rescaled Brier Score (95% CI) | Calibration Intercept (95% CI) | Calibration Slope (95% CI) | Emax (95% CI) | Eavg (95% CI) | Rescaled Brier Score (95% CI) | Calibration Intercept (95% CI) | Calibration Slope (95% CI) | Emax (95% CI) | Eavg (95% CI) | Rescaled Brier Score (95% CI) | Calibration Intercept (95% CI) | Calibration Slope (95% CI) | Emax (95% CI) | Eavg (95% CI) |
Black | 0.07 (0.06, 0.09) | -1.16 (-1.20, -1.11) | 1.56 (1.50, 1.62) | 0.25 (0.24, 0.26) | 0.18 (0.17, 0.19) | 0.27 (0.25, 0.29) | 0.48 (0.43, 0.53) | 1.56 (1.50, 1.63) | 0.23 (0.18, 0.27) | 0.05 (0.04, 0.06) | 0.29 (0.28, 0.31) | 0 | 1 | 0.007 (0.005, 0.03) | 0.002 (0.002, 0.006) |
White | 0.01 (0, 0.03) | -1.24 (-1.28, -1.20) | 1.58 (1.51, 1.64) | 0.27 (0.26, 0.28) | 0.19 (0.19, 0.20) | 0.24 (0.23, 0.26) | 0.50 (0.44, 0.55) | 1.58 (1.51, 1.64) | 0.28 (0.24, 0.33) | 0.05 (0.05, 0.06) | 0.27 (0.25, 0.29) | 0 | 1 | 0.04 (0.01, 0.06) | 0.009 (0.005, 0.01) |
Male | 0.14 (0.13, 0.14) | -0.99 (-1.03, -0.96) | 1.60 (1.55, 1.65) | 0.23 (0.22, 0.24) | 0.16 (0.15, 0.16) | 0.28 (0.27, 0.29) | 0.46 (0.42, 0.50) | 1.60 (1.55, 1.65) | 0.24 (0.20, 0.27) | 0.06 (0.05, 0.06) | 0.31 (0.29, 0.32) | 0 | 1 | 0.008 (0.005, 0.02) | 0.003 (0.002, 0.005) |
Female | -0.13 (-0.15, -0.12) | -1.53 (-1.57, -1.48) | 1.56 (1.51, 1.62) | 0.32 (0.31, 0.33) | 0.23 (0.22, 0.23) | 0.22 (0.21, 0.24) | 0.57 (0.51, 0.62) | 1.56 (1.51, 1.62) | 0.28 (0.22, 0.32) | 0.05 (0.04, 0.05) | 0.25 (0.24, 0.27) | 0 | 1 | 0.02 (0.02, 0.04) | 0.01 (0.007, 0.01) |
2021 | 0.02 (0.02, 0.03) | -1.29 (-1.33, -1.26) | 1.62 (1.57, 1.67) | 0.28 (0.27, 0.28) | 0.19 (0.19, 0.20) | 0.26 (0.25, 0.27) | 0.55 (0.51, 0.59) | 1.62 (1.58, 1.67) | 0.27 (0.23, 0.30) | 0.06 (0.05, 0.06) | 0.29 (0.28, 0.31) | 0 | 1 | 0.01 (0.008, 0.03) | 0.005 (0.003, 0.008) |
2022 | 0 (-0.01, 0.01) | -1.21 (-1.25, -1.17) | 1.47 (1.42, 1.52) | 0.26 (0.25, 0.27) | 0.19 (0.19, 0.20) | 0.23 (0.22, 0.24) | 0.41 (0.37, 0.46) | 1.47 (1.42, 1.52) | 0.24 (0.19, 0.28) | 0.04 (0.04, 0.05) | 0.25 (0.23, 0.27) | 0 | 1 | 0.02 (0.006, 0.04) | 0.004 (0.002, 0.008) |
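The negative rescaled Brier score for female patients before recalibration can be read through the usual definition of the statistic. The sketch below assumes the common form, 1 minus the ratio of the Brier score to the Brier score of a reference model that predicts the observed prevalence for everyone; whether the authors used exactly this form is not stated in this section. Under that definition a value of 1 is perfect, 0 is no better than predicting the prevalence, and negative values are worse than the reference. The toy data and the function name are hypothetical.

```python
import numpy as np

def rescaled_brier(y, p):
    """1 - Brier / Brier_ref, where Brier_ref predicts the observed prevalence."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    brier = np.mean((p - y) ** 2)
    brier_ref = np.mean((y.mean() - y) ** 2)
    return 1.0 - brier / brier_ref

# Toy outcomes and predictions (assumption, for illustration only).
y = np.array([0, 0, 0, 1, 1, 0, 1, 0])
p_good = np.array([0.1, 0.2, 0.1, 0.8, 0.7, 0.3, 0.9, 0.2])
p_over = np.clip(p_good + 0.5, 0.0, 1.0)     # systematic overestimation
p_prev = np.full(len(y), y.mean())           # reference: prevalence for all

score_good = rescaled_brier(y, p_good)       # positive: better than reference
score_over = rescaled_brier(y, p_over)       # negative: worse than reference
score_prev = rescaled_brier(y, p_prev)       # zero: equal to the reference
print(score_good, score_over, score_prev)
```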
Table 4
Empirical Bootstrap Differences in Calibration Intercept and Slope before Recalibration
Comparison | Calibration Intercept Mean Difference (P-value) | Calibration Slope Difference (P-value) |
White - Black | -0.08 (0.006) | 0.02 (0.35) |
Female - Male | -0.53 (0)* | -0.04 (0.14) |
2022 - 2021 | 0.08 (0)* | -0.15 (0)* |
*Denotes statistical significance at Bonferroni adjusted threshold of 0.003
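A comparison like those in Table 4 can be sketched as follows. This is an illustrative Python example under stated assumptions, not the authors' procedure: each group is resampled with replacement, the calibration intercept is refitted in each resample, and a two-sided empirical P-value is taken as twice the smaller proportion of bootstrap differences on either side of zero. The data are synthetic, the function names are hypothetical, and only the 0.003 threshold comes from the table footnote.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.log(p / (1 - p))

def calibration_intercept(y, p, n_iter=25):
    """Intercept from a logistic regression of y on logit(p) (Newton-Raphson)."""
    X = np.column_stack([np.ones_like(p), logit(p)])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = expit(X @ beta)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        beta += np.linalg.solve(hess, X.T @ (y - mu))
    return beta[0]

def bootstrap_difference(y_a, p_a, y_b, p_b, n_boot=200, seed=0):
    """Mean bootstrap difference (A - B) in intercept and empirical P-value."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        ia = rng.integers(0, len(y_a), len(y_a))   # resample group A
        ib = rng.integers(0, len(y_b), len(y_b))   # resample group B
        diffs[i] = (calibration_intercept(y_a[ia], p_a[ia])
                    - calibration_intercept(y_b[ib], p_b[ib]))
    p_value = 2 * min(np.mean(diffs <= 0), np.mean(diffs >= 0))
    return diffs.mean(), min(p_value, 1.0)

def simulate(intercept, n, rng):
    # Synthetic group (assumption): true log-odds = intercept + 1.5 * model log-odds.
    model_lp = rng.normal(0.0, 1.0, n)
    y = rng.binomial(1, expit(intercept + 1.5 * model_lp)).astype(float)
    return y, expit(model_lp)

rng = np.random.default_rng(1)
y_a, p_a = simulate(-1.5, 3000, rng)   # hypothetical group A
y_b, p_b = simulate(-1.0, 3000, rng)   # hypothetical group B

BONFERRONI_THRESHOLD = 0.003           # threshold quoted in the Table 4 footnote
mean_diff, p_value = bootstrap_difference(y_a, p_a, y_b, p_b)
significant = p_value < BONFERRONI_THRESHOLD
print(mean_diff, p_value, significant)
```

Under this scheme an empirical P-value of exactly 0 can arise when no bootstrap replicate falls on the opposite side of zero, so its resolution is limited by the number of resamples.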
The c-indices for male and female patients were 0.84 and 0.83, respectively. The sensitivity, specificity, PPV, and NPV for female patients were 0.78, 0.71, 0.41, and 0.92, respectively. The sensitivity, specificity, PPV, and NPV for male patients were 0.73, 0.77, 0.53, and 0.89, respectively. The calibration for female patients differed significantly from that for male patients, with a more negative calibration intercept (P-value = 0), higher Emax, and higher Eavg (Tables 3–4; Supplementary Fig. 1C). Calibration curves for male and female patients are displayed in Fig. 1D-F.
Calibration by year was also examined to assess whether performance had drifted since the model’s inception in 2018. The year 2020 was excluded because of the COVID-19 pandemic. The c-indices in 2021 and 2022 were 0.84 and 0.82, respectively. The sensitivity, specificity, PPV, and NPV for the model in 2021 were 0.76, 0.75, 0.48, and 0.91, respectively. The sensitivity, specificity, PPV, and NPV for the model in 2022 were 0.74, 0.72, 0.46, and 0.90, respectively. The calibration of the model changed significantly from 2021 to 2022 (Tables 3–4, Supplementary Fig. 1E-F). The average calibration intercept for assessments made in 2022 was higher than that for assessments made in 2021, while the slope was lower, indicating greater underestimation of malnutrition risk. We also assessed calibration by payor type and hospital type as sensitivity analyses. In the payor type analysis, malnutrition risk was overestimated in patients with commercial insurance but underestimated in patients with Medicaid and Medicare (Supplementary Tables 2–3, Supplementary Fig. 2A-B, Supplementary Fig. 3). We did not observe substantial differences in calibration across hospital types (community, tertiary, quaternary) (Supplementary Tables 4–5, Supplementary Fig. 2C-D, Supplementary Fig. 4).