During VIDA (development dataset), 2,895 children aged < 5 years sought care for diarrhea in the sentinel health centers (SHCs), of whom 2,009 (69.4%) had MSD and 1,554 (77.4%) met the study case definition and were subsequently enrolled. Among those enrolled, 1,482 (95.4%) had their memory aids completed by the caretakers, of whom 478 (32.3%) had LDD. In EFGH (temporal validation dataset), 1,879 children aged < 5 years sought care for diarrhea in the SHCs, of whom 1,365 (72.6%) were eligible for screening and 706 (51.7%) met the study case definition and were subsequently enrolled. Among those enrolled, 685 (97.0%) had their diarrhea diaries completed by the caretakers, of whom 69 (10.1%) had LDD (Fig. 2). The prevalence of LDD differed significantly between the VIDA and EFGH studies (478 [32.3%] vs 69 [10.1%]; p < 0.001). Additionally, we observed significant differences in the baseline characteristics of participants in the two studies. Specifically, compared to EFGH participants, VIDA participants were older (median age in months [IQR]: 15.0 [9.0–25.0] vs 13.6 [8.9–20.4], p = 0.0361), had more severe diarrheal episodes (median Vesikari score [IQR]: 10 [8–13] vs 8 [6–10], p < 0.001), and had a higher respiratory rate (median [IQR]: 36.5 [31.5–41.0] vs 33.0 [28.0–39.0], p < 0.001). Moreover, VIDA participants were more likely to present with vomiting (843 [56.9%] vs 346 [50.5%], p = 0.006), decreased skin turgor (614 [41.4%] vs 137 [20.0%], p < 0.001) and severe dehydration (388 [26.2%] vs 22 [3.2%], p < 0.001) (Table S2).
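As an illustration of the LDD prevalence comparison above (478 of 1,482 in VIDA vs 69 of 685 in EFGH), the following minimal sketch runs a Pearson chi-square test on the corresponding 2×2 table. The choice of test is an assumption made for illustration; the text does not state which test produced the reported p-value. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the text: children with LDD among those with completed diaries.
vida_ldd, vida_total = 478, 1482
efgh_ldd, efgh_total = 69, 685

chi2, p_value = chi_square_2x2(
    vida_ldd, vida_total - vida_ldd,
    efgh_ldd, efgh_total - efgh_ldd,
)
print(f"chi-square = {chi2:.1f}; p < 0.001: {p_value < 0.001}")
```

Under this assumed test, the result agrees with the reported p < 0.001.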
The characteristics of VIDA participants stratified by LDD status are shown in Table 1. Children who had LDD were younger than those who did not (median age in months [IQR]: 13 [7–20] vs 16 [10–27], p < 0.001). Furthermore, compared with those who did not have LDD, those with LDD had a higher respiratory rate (median [IQR]: 37.5 [33–42.5] vs 36 [31–40], p < 0.001) and a higher Vesikari score (median [IQR]: 11 [9–13] vs 10 [8–12], p < 0.001). Additionally, caretaker education, breastfeeding, stool frequency in 24 hours, belly pain, rectal straining, cough, number of vomiting episodes, prior home oral rehydration solution (ORS) use, rotavirus vaccination, fast breathing and decreased skin turgor were significantly associated with LDD.
Table 1
Characteristics of children aged < 5 years seeking care for moderate-to-severe diarrhea in Kenya, stratified by diarrhea duration, 2015–2018.
Characteristics | LDD: Yes (n = 478), n (%) | LDD: No (n = 1,004), n (%) | p-value |
Demographic | | | |
Median age [IQR] | 13 [7–20] | 16 [10–27] | < 0.001 |
Age Category | | | |
0–11 months | 225 (47.1) | 332 (33.1) | < 0.001 |
12–23 months | 152 (31.8) | 356 (35.5) | |
24–59 months | 101 (21.1) | 316 (31.5) | |
Gender: Female | 213 (44.6) | 463 (46.1) | 0.574 |
Household Details | | | |
Caretaker education ( > = Secondary ) | 53 (11.1) | 153 (15.3) | 0.03 |
Natural Floor | 323 (67.7) | 644 (64.1) | 0.177 |
Refined/Electric Primary Fuel Sourceβ | 14 (3.0) | 44 (4.4) | 0.183 |
Clinical characteristics | | | |
By History | | | |
Breastfeeding before diarrhea onset | | | < 0.001 |
None | 147 (30.8) | 411 (40.9) | |
Partial | 42 (8.8) | 55 (5.5) | |
Exclusive | 289 (60.5) | 538 (53.6) | |
Median diarrhea days [Interquartile range (IQR)] | 4 [3–5] | 2 [2–3] | < 0.001 |
Stool Count | | | |
3 | 75 (15.7) | 192 (19.1) | 0.037 |
4–5 | 256 (53.6) | 562 (56.0) | |
≥ 6 | 147 (30.7) | 250 (24.9) | |
Belly Pain | 299 (65.0) | 568 (58.7) | 0.024 |
Rectal straining | 155 (32.6) | 235 (23.5) | < 0.001 |
Cough | 280 (58.6) | 526 (52.4) | 0.025 |
Vomiting | 255 (53.4) | 588 (58.6) | 0.058 |
No. of vomit | | | |
0 | 223 (46.7) | 416 (41.4) | 0.026 |
1 | 48 (10.0) | 119 (11.9) | |
2–4 | 177 (37.0) | 364 (36.2) | |
≥ 5 | 30 (6.3) | 105 (10.5) | |
Median vomit days [IQR] | 2 [1–3] | 2 [1–2] | < 0.001 |
Home ORS use | 58 (12.1) | 77 (7.7) | 0.005 |
Rotavirus vaccination | 414 (90.6) | 761 (81.7) | < 0.001 |
At enrolment | | | |
Very Thirsty | 350 (74.2) | 695 (70.0) | 0.1 |
Fast breathing | 63 (13.2) | 98 (9.8) | 0.048 |
Median Respiratory rate [IQR] | 37.5 [33–42.5] | 36 [31–40] | < 0.001 |
Dry mouth | | | |
Normal | 4 (0.8) | 23 (2.3) | 0.053 |
Somewhat dry | 441 (92.3) | 891 (88.7) | |
Very dry | 33 (6.9) | 90 (9.0) | |
Skin turgor (slow/very slow) | 222 (46.4) | 392 (39.0) | 0.007 |
Mental Status | | | 0.093 |
Normal | 199 (41.6) | 437 (43.5) | |
Restless/Irritable | 266 (55.7) | 555 (55.3) | |
Lethargic/Unconscious | 13 (2.7) | 12 (1.2) | |
Under Nutrition | 65 (13.6) | 109 (13.6) | 0.125 |
Vesikari Score | | | < 0.001 |
Mild | 15 (3.1) | 130 (13.0) | |
Moderate | 204 (42.7) | 426 (42.4) | |
Severe | 259 (54.2) | 448 (44.6) | |
Median Vesikari score [IQR] | 11 [9–13] | 10 [8–12] | < 0.001 |
Cipro_ceft | 20 (4.2) | 62 (6.2) | 0.117 |
β: Includes electricity, propane, butane, natural gas.
ORS-Oral rehydration solution
The following variables had a p-value ≥ 0.2 and are not included in the table: No. of children < 5 years in households; total assets; animal ownership; improved water; improved sanitation; shared facility; stool type; blood in stool; drinks poorly; unable to drink; fever; restless; lethargy; unconscious; rectal prolapse; difficulty breathing; convulsion; sunken eyes; home zinc use; capillary refill; chest indrawing; bipedal edema; abnormal hair; dehydration; ORS at facility; zinc at facility; IV rehydration; any antibiotic; malaria diagnosis; dysentery diagnosis; stunting; wasting.
From the feature selection analysis, the selected variables in order of importance were diarrhea days prior to presentation (55.1%), Vesikari score (18.2%), age group (10.7%), vomit days (8.8%), breastfeeding (8.4%), respiratory rate (6.5%), vomiting (6.4%), number of vomits in last 24 hours (6.2%), rotavirus vaccination (6.1%) and rectal straining (3.4%). Skin pinch (2.4%) and number of loose stools in last 24 hours (2.4%) were tentative features (Fig. 3).
We evaluated seven ML algorithms for the prediction of LDD. Among the developed models, sensitivity was highest in the RF model (80.7%), followed by LR (76.5%), ANN (75.6%), SVM (73.9%), KNN (73.1%) and GBM (72.3%), and was lowest in the NB model (69.7%). Specificity was highest in the GBM and SVM models (76.5%), followed by ANN (74.9%), LR (74.5%), and RF and NB (74.1%), and was lowest in the KNN model (72.1%). The PPV ranged from 55.4% to 59.9%, while the NPV ranged from 83.8% to 89.0%. The AUC of the models in decreasing order was 83.0%, 82.0%, 81.5%, 81.1%, 80.5%, 79.7% and 77.3% for RF, SVM, ANN, GBM, LR, KNN and NB, respectively (Table 2). The RF model emerged as the champion model, with 80.7%, 74.1%, 59.6%, 89.0%, 68.6%, 83.0% and 90.0% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively.
Table 2
Longer Duration Diarrhea (LDD) prediction models with Over-sampling technique used in the resampling procedure: Model Performance
Algorithm | Sensitivity % [95% CI] | Specificity % [95% CI] | PPV % [95% CI] | NPV % [95% CI] | F1-Score % [95% CI] | AUC % [95% CI] | PRAUC % [95% CI] |
RF | 80.7 [72.4–87.3] | 74.1 [68.2–79.4] | 59.6 [51.6–67.3] | 89.0 [83.9–92.9] | 68.6 [56.3–75.3] | 83.0 [78.6–87.5] | 90.0 [85.9–93.8] |
GBM | 72.3 [63.3–80.1] | 76.5 [70.8–81.1] | 59.3 [50.8–67.4] | 85.3 [80.0–89.7] | 65.2 [41.7–73.1] | 81.1 [76.3–86.0] | 87.8 [83.3–93.0] |
NB | 69.7 [60.7–77.8] | 74.1 [68.2–79.4] | 56.1 [47.7–64.2] | 83.8 [78.3–88.4] | 62.2 [36.4–68.9] | 77.3 [72.1–82.6] | 85.7 [82.2–89.0] |
LR | 76.5 [67.8–83.8] | 74.5 [68.6–79.8] | 58.7 [50.5–66.5] | 87.0 [81.7–91.2] | 66.4 [44.1–69.4] | 80.5 [75.6–85.4] | 88.7 [84.4–92.7] |
SVM | 73.9 [65.1–81.6] | 76.5 [70.8–81.6] | 59.9 [51.5–67.9] | 86.1 [80.9–90.4] | 66.2 [42.2–69.6] | 82.0 [77.3–86.7] | 89.3 [85.1–93.1] |
KNN | 73.1 [64.2–80.8] | 72.1 [66.1–77.6] | 55.4 [47.3–63.3] | 85.0 [79.5–89.5] | 63.0 [40.4–70.7] | 79.7 [74.9–84.5] | 87.0 [83.9–91.0] |
ANN | 75.6 [66.9–83.0] | 74.9 [69.1–80.1] | 58.8 [50.6–66.7] | 86.6 [81.4–90.9] | 66.1 [46.3–74.1] | 81.5 [76.8–86.2] | 89.1 [84.8–93.6] |
*RF-Random Forest; GBM-Gradient Boosting; NB- Naïve Bayes; LR-Logistic Regression; SVM- Support vector machine; KNN-K-nearest neighbors; ANN-Artificial Neural Networks;
95% CI- 95% Confidence Interval; PPV- Positive Predictive Value; NPV- Negative Predictive Value; AUC- Area under the Curve; PRAUC- Precision Recall Area under the Curve
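The threshold-based metrics in Table 2 all derive from a single confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical: they were chosen only because they reproduce the RF row's point estimates after rounding, and they are not reported in the study. The function name `classification_metrics` is likewise illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics, returned as proportions."""
    return {
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "ppv": tp / (tp + fp),              # positive predictive value (precision)
        "npv": tn / (tn + fn),              # negative predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


# Hypothetical counts (assumption): consistent with the RF row, not taken from the paper.
rf_like = classification_metrics(tp=96, fp=65, tn=186, fn=23)
for name, value in rf_like.items():
    print(f"{name}: {100 * value:.1f}%")
```

Because PPV and NPV depend on outcome prevalence in the evaluated sample, they are not expected to carry over unchanged to a dataset with a different LDD prevalence.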
The receiver operating characteristic (ROC) curves for LDD prediction models are shown in Figure S3. Furthermore, in the prediction of the duration of diarrhea post-enrolment (≥ 7 days), the model performance ranged between 42.3%–78.8%, 45.3%–72.3%, 16.8%–22.1%, 88.3%–92.9%, 26.2%–30.7%, 52.9%–64.4%, and 86.9%–92.0% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively (Table 3). The model performance in the prediction of LDD when no sub-sampling technique was employed is shown in Table S1.
Table 3
Post-enrolment duration (≥ 7 days) prediction models with Over-sampling technique used in the resampling procedure: Model Performance
Algorithm | Sensitivity % [95% CI] | Specificity % [95% CI] | PPV % [95% CI] | NPV % [95% CI] | F1-Score % [95% CI] | AUC % [95% CI] | PRAUC % [95% CI] |
RF | 48.1 [34.0–62.4] | 72.3 [67.1–77.2] | 22.1 [14.9–30.9] | 89.5 [85.1–93.0] | 30.3 [-19.4 to 56.0] | 63.3 [55.9–70.7] | 91.6 [88.1–94.4] |
GBM | 42.3 [28.7–56.8] | 71.1 [65.7–76.0] | 19.3 [12.5–27.7] | 88.3 [83.7–92.0] | 26.5 [-25.7 to 55.3] | 61.1 [54.1–68.1] | 91.7 [87.7–94.8] |
NB | 78.8 [65.3–88.9] | 45.3 [39.7–50.9] | 19.1 [14.0–25.0] | 92.9 [87.7–96.4] | 30.7 [0.5–45.2] | 62.3 [54.6–70.0] | 90.6 [87.6–92.9] |
LR | 61.5 [47.0–74.7] | 59.1 [53.5–64.6] | 19.8 [13.9–26.7] | 90.4 [85.5–94.0] | 29.9 [-11.7 to 39.7] | 64.2 [57.1–71.3] | 91.8 [88.5–95.2] |
SVM | 59.6 [45.1–73.0] | 62.6 [57.0–67.9] | 20.7 [14.5–28.0] | 90.5 [85.8–94.0] | 30.7 [-12.6 to 50.1] | 62.7 [55.7–69.7] | 91.2 [87.3–95.3] |
KNN | 59.6 [45.1–73.0] | 51.6 [45.9–57.2] | 16.8 [11.7–22.9] | 88.6 [83.2–92.8] | 26.2 [-12.9 to 43.5] | 52.9 [44.5–61.3] | 86.9 [83.7–89.9] |
ANN | 67.3 [52.9–79.7] | 53.5 [47.8–59.0] | 19.1 [13.7–25.6] | 90.9 [85.8–94.6] | 29.8 [-7.8 to 50.2] | 64.4 [57.2–71.6] | 92.0 [88.8–95.3] |
*RF-Random Forest; GBM-Gradient Boosting; NB- Naïve Bayes; LR-Logistic Regression; SVM- Support vector machine; KNN-K-nearest neighbors; ANN-Artificial Neural Networks;
95% CI- 95% Confidence Interval; PPV- Positive Predictive Value; NPV- Negative Predictive Value; AUC- Area under the Curve; PRAUC- Precision Recall Area under the Curve
Overall, the Brier scores were low, ranging from 0.17 to 0.21; however, Spiegelhalter's p-values showed that the NB and KNN models were not well calibrated in the automated algorithm (p < 0.05) (Table 4). From the explanatory model analysis (EMA) of the prediction of LDD, the degree of importance varied across models, with the likelihood of developing LDD increasing with pre-enrolment diarrhea days, severity based on the modified Vesikari score, no rotavirus vaccination, normal skin turgor and age. Conversely, the likelihood of progressing to LDD decreased with no vomiting (vomit = 0, number of vomits in last 24 hours = 0, and vomit days = 0) and number of loose stools in last 24 hours (≥ 6) (Fig. 4).
Table 4
Calibration results of Longer Duration Diarrhea (LDD) prediction models.
Algorithm | Brier Score | Spiegelhalter Z-score | Spiegelhalter p-value |
RF | 0.17 | -1.23 | 0.219 |
GBM | 0.18 | -0.95 | 0.341 |
NB | 0.20 | 4.82 | < 0.0001 |
LR | 0.18 | -0.58 | 0.565 |
SVM | 0.17 | -0.92 | 0.357 |
KNN | 0.19 | -3.86 | < 0.0001 |
ANN | 0.18 | -0.18 | 0.855 |
*RF-Random Forest; GBM-Gradient Boosting; NB- Naïve Bayes; LR-Logistic Regression; SVM- Support vector machine; KNN-K-nearest neighbors; ANN-Artificial Neural Networks;
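For readers unfamiliar with the calibration statistics in Table 4, the sketch below gives the usual definitions of the Brier score and Spiegelhalter's Z statistic, with a two-sided p-value from the standard normal distribution. The toy labels and probabilities are invented for illustration, and the function names are hypothetical; the study's own software and settings are not described here.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def spiegelhalter_z(y_true, y_prob):
    """Spiegelhalter's Z: approximately standard normal under perfect calibration."""
    numerator = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Toy data (assumption): not drawn from the study.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.1, 0.9]

print(f"Brier score: {brier_score(y_true, y_prob):.3f}")
print(f"Spiegelhalter Z: {spiegelhalter_z(y_true, y_prob):.2f}")
print(f"p-value for Z = -1.23: {two_sided_p(-1.23):.3f}")
```

Applying `two_sided_p` to the Z-scores in Table 4 gives values close to the tabulated p-values, with small differences expected because the Z-scores are rounded to two decimals.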
Furthermore, we observed similar patterns in the EMA results for the prediction of ≥ 7 days of diarrhea post-enrolment, with the only difference being in pre-enrolment diarrhea days, which decreased the likelihood of developing the outcome in this prediction (Fig. 5).
In the business value evaluation of our champion model (RF), the cumulative gains plot shows that the model captures 46% of the target class (LDD) when the top 20% of cases ranked by model probability are selected. Additionally, the cumulative lift plot shows that the champion model identifies 2.6 times more LDD cases than random selection within the top 20% of observations ranked by model probability. Lastly, the cumulative response plot shows that 72% of observations in the top 20% of cases ranked by model probability belong to the target class (Fig. 6).
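The three business-value quantities above can be computed from a ranked list of predictions. The following minimal sketch uses common definitions (gain as the share of all positives captured, response as the positive rate within the selected group, lift as response divided by the overall positive rate). The toy data and the function name `cumulative_metrics` are assumptions for illustration and do not reproduce Fig. 6.

```python
def cumulative_metrics(y_true, y_score, fraction):
    """Cumulative gain, lift and response for the top `fraction` of ranked cases."""
    n = len(y_true)
    # Rank observations from highest to lowest predicted probability.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)
    k = max(1, round(fraction * n))
    hits = sum(y_true[i] for i in order[:k])
    total_pos = sum(y_true)

    gain = hits / total_pos            # share of all positives captured
    response = hits / k                # positive rate within the selected group
    lift = response / (total_pos / n)  # response relative to the overall rate
    return gain, lift, response


# Toy data (assumption): 10 cases, 4 positives, scores already distinct.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y_score = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

gain, lift, response = cumulative_metrics(y_true, y_score, fraction=0.20)
print(f"gain = {gain:.0%}, lift = {lift:.1f}, response = {response:.0%}")
```

Under these definitions, lift also equals gain divided by the selected fraction, which offers a quick consistency check between the three plots.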
Temporal Validation in EFGH data
We observed a decline in model performance on the temporal validation dataset: the RF model achieved 37.7%, 86.0%, 23.2%, 92.5%, 27.9%, 68.4% and 94.4% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively. In the sensitivity analysis including only EFGH enrollees who met the VIDA inclusion criteria, we observed a marginal increase in model performance: the RF model achieved 47.5%, 80.5%, 25.7%, 91.5%, 33.3%, 71.0% and 93.8% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively (Fig. 7).