A total of 1,554 and 706 children were enrolled in the development and temporal validation cohorts, respectively. Among children aged 6–35 months enrolled, 1,106 (71.2%) and 655 (92.7%) had HAZ data that were plausible, respectively. Among those that had plausible HAZ data, 187 (16.9%) and 147 (22.4%) had LGF in the development and temporal validation cohorts, respectively (Fig. 1).
Panels: Development dataset (VIDA: 2015–2018); Temporal validation dataset (EFGH: 2022–2023)
VIDA: Vaccine Impact on Diarrhea in Africa study
EFGH: Enterics for Global Health Shigella Surveillance study
MSD: Moderate-to-Severe Diarrhea; MAD: Medically Attended Diarrhea
Figure 1. Flowchart of development and temporal validation studies conducted in Siaya County, Kenya
This difference in the prevalence of LGF between the development and temporal validation cohorts was statistically significant (p = 0.0042). The median [interquartile range] ΔHAZ between enrollment and follow-up was −0.21 [−0.42 to −0.01] and −0.24 [−0.48 to −0.02] in the development and temporal validation cohorts, respectively. In the sensitivity analysis using the cut-off of negative change in HAZ, the prevalence of LGF was 1,051 (28.7%). Additionally, the constructed synthetic dataset had 8,527 observations and closely replicated the propensity score distribution of the original development data (VIDA), as evidenced by the comprehensive descriptive analysis that compared each variable (Table S1).
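The text does not name the test behind the p-value for the prevalence comparison. As a minimal sketch, assuming an uncorrected Pearson chi-square test on the 2×2 table of cohort by LGF status (the function name `pearson_chi2_2x2` is hypothetical), the comparison can be reproduced from the counts reported above:

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from the text: LGF vs no LGF among children with plausible HAZ.
# Development: 187 of 1,106; temporal validation: 147 of 655.
chi2, p_value = pearson_chi2_2x2(187, 1106 - 187, 147, 655 - 147)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Under that assumption the p-value comes out close to the reported 0.0042; a continuity-corrected or exact test would give a slightly different figure.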
The characteristics of VIDA participants at enrolment stratified by LGF status are shown in Table 1. Children who had LGF were younger than those who did not (median age in months [IQR]: 11 [8–14] vs 17 [11–24], p < 0.001). Furthermore, compared with those who did not have LGF, those with LGF had a higher respiratory rate (median [IQR]: 38.5 [34.0–42.5] vs 36.0 [31.5–39.5], p < 0.001), a higher temperature (median [IQR]: 37.1 [36.6–37.8] vs 36.8 [36.4–37.5], p < 0.001) and more severe disease (median Vesikari score [IQR]: 11 [9–12] vs 10 [8–12], p < 0.001). Additionally, caretaker education, breastfeeding, vomiting, wrinkled skin, restlessness, admission, and intravenous rehydration were significantly associated with LGF (Table 1).
Table 1
Characteristics of children aged < 5 years seeking care for moderate-to-severe diarrhea in Kenya stratified by Linear Growth Faltering Status, 2015–2018.
Characteristics | Linear Growth Faltering: Yes (n = 187), n (%) | Linear Growth Faltering: No (n = 919), n (%) | p-value* |
---|
Demographic | | | |
Median age [IQR] | 11 [8–14] | 17 [11–24] | < 0.001 |
Age Category | | | |
0–11 months | 104 (55.6) | 259 (28.2) | < 0.001 |
12–23 months | 74 (39.6) | 428 (46.6) | |
24–59 months | 9 (4.8) | 232 (25.2) | |
Gender: Female | 83 (44.4) | 428 (46.6) | 0.584 |
Household Details | | | |
Caretaker education (≥ Secondary) | 78 (41.7) | 305 (33.2) | 0.026 |
≤ 2 children under 5 yrs | 167 (89.3) | 839 (91.2) | 0.387 |
≤ 4 people sleeping | 77 (41.2) | 400 (43.6) | 0.546 |
≤ 3 Total Assets | 158 (84.5) | 812 (88.4) | 0.142 |
Refined/Electric Primary Fuel Source | 5 (2.7) | 39 (4.3) | 0.313 |
Animal ownership | 176 (94.1) | 836 (91.0) | 0.159 |
Improved water | | | |
Safely managed | 83 (44.3) | 431 (46.9) | 0.15 |
Basic | 14 (7.5) | 112 (12.2) | |
Limited | 28 (15.0) | 125 (13.6) | |
Unimproved/Surface water | 62 (33.2) | 251 (27.3) | |
Improved Sanitation | | | |
Safely Managed and Basic | 20 (10.7) | 106 (11.5) | 0.392 |
Limited | 72 (38.5) | 306 (33.3) | |
Unimproved/Open Defecation | 95 (50.8) | 507 (55.2) | |
Clinical characteristics | | | |
Reported by caretaker | | | |
Breastfeeding before diarrhea onset | | | |
None | 25 (13.4) | 248 (37.9) | < 0.001 |
Exclusive | 3 (1.6) | 8 (0.9) | |
Partial | 159 (85.0) | 563 (61.2) | |
Median diarrhea days [IQR] | 3 [2–3] | 3 [2–4] | 0.7196 |
Stool Type | | | |
Simple watery | 113 (60.4) | 532 (57.9) | 0.41 |
Rice watery | 5 (2.7) | 12 (1.3) | |
Sticky/Mucoid | 65 (34.8) | 347 (37.8) | |
Bloody | 4 (2.1) | 28 (3.1) | |
Stool Count | | | |
3 | 27 (14.4) | 165 (18.0) | 0.489 |
4–5 | 101 (54.0) | 506 (55.0) | |
6–10 | 55 (29.4) | 228 (24.8) | |
> 10 | 4 (2.1) | 20 (2.2) | |
Blood in stool | 15 (8.0) | 108 (11.8) | 0.138 |
Vomiting | 127 (67.9) | 531 (57.8) | 0.01 |
Very Thirsty | 156 (83.9) | 752 (82.2) | 0.582 |
Drinks poorly | 47 (25.1) | 232 (25.3) | 0.962 |
Unable to drink | 2 (1.1) | 27 (2.9) | 0.145 |
Belly Pain | 109 (61.2) | 508 (57.9) | 0.404 |
Fever | 142 (75.9) | 709 (77.2) | 0.720 |
Restless | 151 (80.8) | 710 (77.3) | 0.295 |
Lethargy | 123 (65.8) | 600 (65.3) | 0.898 |
Unconscious | 7 (3.7) | 32 (3.5) | 0.864 |
Rectal straining | 55 (29.4) | 211 (23.1) | 0.066 |
Rectal prolapse | 2 (1.1) | 15 (1.6) | 0.565 |
Cough | 103 (55.1) | 482 (52.5) | 0.511 |
Difficulty breathing | 32 (17.1) | 124 (13.5) | 0.197 |
Convulsion | 3 (1.6) | 17 (1.9) | 0.818 |
Currently | | | |
Very Thirsty | 145 (78.4) | 653 (71.7) | 0.062 |
Drinks poorly | 40 (21.5) | 188 (20.5) | 0.747 |
Sunken Eyes | 171 (91.4) | 792 (86.3) | 0.054 |
Wrinkled skin | 56 (30.6) | 211 (23.0) | 0.029 |
Restless | 134 (71.7) | 557 (60.6) | 0.004 |
Lethargy/unconscious | 23 (12.3) | 151 (16.4) | 0.157 |
Dry mouth | 142 (75.9) | 658 (71.7) | 0.235 |
Fast breathing | 24 (12.8) | 100 (10.9) | 0.44 |
Home ORS use | 21 (11.2) | 86 (9.4) | 0.43 |
Home Zinc use | 8 (4.3) | 34 (3.7) | 0.706 |
Assessed by Clinician | | | |
Median temperature, °C [IQR] | 37.1 [36.6–37.9] | 36.8 [36.4–37.5] | < 0.001 |
Measured Fever (≥ 37.5 °C) | 99 (52.9) | 342 (37.2) | < 0.001 |
Median Respiratory rate [IQR] | 38.5 [34.0-42.5] | 36.0 [31.5–39.5] | < 0.001 |
Chest indrawing | 4 (2.1) | 9 (1.0) | 0.180 |
Sunken eyes | 177 (94.7) | 848 (92.3) | 0.255 |
Dry mouth | 183 (97.9) | 903 (98.3) | 0.71 |
Skin turgor (slow/very slow) | 78 (41.7) | 391 (42.6) | 0.833 |
Mental Status | | | |
Normal | 73 (39.0) | 380 (41.4) | 0.052 |
Restless/Irritable | 108 (57.8) | 530 (57.7) | |
Lethargic/Unconscious | 6 (3.2) | 9 (0.9) | |
Rectal prolapse | 0 (0) | 3 (0.3) | 0.434 |
Bipedal edema | 2 (1.1) | 5 (0.5) | 0.337 |
Abnormal hair | 9 (4.8) | 43 (4.7) | 0.937 |
Under Nutrition | 21 (11.2) | 109 (11.9) | 0.807 |
Flaky Skin | 2 (1.1) | 5 (0.5) | 0.409 |
Severe Acute Malnutrition (SAM) | 24 (12.8) | 78 (8.5) | 0.061 |
Wasting | 13 (7.0) | 39 (4.2) | 0.111 |
Admission | 27 (14.4) | 87 (9.5) | 0.043 |
Diarrhea Duration (≥ 7 days) | 70 (37.4) | 327 (35.6) | 0.631 |
Any antibiotic | 78 (41.7) | 402 (43.7) | 0.609 |
Rotavirus vaccination doses | | | |
0 | 2 (1.1) | 19 (2.4) | 0.385 |
1 | 8 (4.6) | 25 (3.1) | |
2 | 166 (94.3) | 764 (94.5) | |
ORS at facility | 186 (99.5) | 914 (99.9) | 0.311 |
Zinc at facility | 183 (97.9) | 887 (96.9) | 0.494 |
IV rehydration | 31 (16.6) | 92 (10.1) | 0.01 |
Dehydration | | | |
None | 8 (4.3) | 35 (3.8) | 0.747 |
Some | 126 (67.4) | 645 (70.2) | |
Severe | 53 (28.3) | 239 (26.0) | |
Vesikari Score | | | |
Mild | 13 (7.0) | 71 (7.7) | 0.088 |
Moderate | 75 (40.1) | 442 (48.1) | |
Severe | 99 (52.9) | 406 (44.2) | |
Median Vesikari score [IQR] | 11 [9–12] | 10 [8–12] | 0.0003 |
Diagnosis | | | |
Dysentery | 10 (5.4) | 58 (6.4) | 0.605 |
Malaria | 85 (45.5) | 361 (39.5) | 0.131 |
Pneumonia | 12 (6.4) | 37 (4.1) | 0.152 |
Bacterial Infection | 14 (7.5) | 93 (10.2) | 0.258 |
Malnutrition | 15 (8.0) | 68 (7.4) | 0.784 |
β: Refined/electric primary fuel source includes electricity, propane, butane, natural gas. SAM: Severe acute malnutrition, defined as WHZ < −3, MUAC < 115 millimeters, or the presence of bilateral pitting edema. ORS: Oral rehydration solution.
*P-values were computed using chi-square or Fisher's exact tests, as appropriate, for categorical variables and Wilcoxon rank-sum tests for continuous variables.
From the feature selection analysis, the confirmed variables in order of importance were age (16.6%), temperature (6.0%), respiratory rate (4.1%) and breastfeeding (3.3%). SAM (3.4%), rotavirus vaccination (3.3%), and skin turgor (2.1%) were tentative features (Fig. 2).
Green, yellow, red and blue boxplots represent the Z scores of confirmed, tentative, rejected and shadow features, respectively.
Confirmed and tentative features: Age; temperature; respiratory rate; severe acute malnutrition (SAM); rotavirus vaccination; breastfeeding; skin turgor
Figure 2. Feature selection for linear growth faltering among children aged < 5 years presenting with moderate to severe diarrhea in rural western Kenya, 2015–2018
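The confirmed, tentative, rejected and shadow labels follow the vocabulary of Boruta-style feature selection, in which each real feature competes against shuffled "shadow" copies of the features. The sketch below only illustrates that idea under simplifying assumptions: it swaps the random forest importance for an absolute Pearson correlation, uses made-up toy data, and all names (`shadow_hits`, `age_months`, `noise`) are hypothetical rather than taken from the study.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def shadow_hits(features, outcome, n_iter=20, seed=0):
    """Count how often each feature beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features: shuffled copies that keep each feature's
        # distribution but break any link with the outcome.
        best_shadow = 0.0
        for values in features.values():
            shadow = values[:]
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs(pearson_r(shadow, outcome)))
        # Stand-in importance (assumption): absolute correlation with outcome.
        for name, values in features.items():
            if abs(pearson_r(values, outcome)) > best_shadow:
                hits[name] += 1
    return hits


# Toy data (not study data): the outcome depends on age only.
rng = random.Random(1)
n = 300
age_months = [rng.uniform(6, 35) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
outcome = [1 if a < 15 else 0 for a in age_months]

hits = shadow_hits({"age_months": age_months, "noise": noise}, outcome)
print(hits)
```

In a full Boruta run, the hit counts are then compared against a binomial reference to label each feature confirmed, tentative or rejected.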
In the sensitivity analysis using a cut-off of negative change in HAZ, the following features were selected in addition to age, respiratory rate, temperature and breastfeeding: confirmed (stunting at baseline [5.2%], vomiting [4.0%], Vesikari score [3.7%] and sunken eyes [3.6%]) and tentative (bacterial infection diagnosis [2.5%]) (Figure S2).
Model Performance
We evaluated seven ML algorithms in the prediction of LGF. Among the developed models, sensitivity was highest in the RF model (80.7%), followed by the ANN (79.5%), SVM (77.3%), NB (76.5%), GBM (75.6%), LR (75.4%) and lowest in the KNN model (72.4%). The specificity ranged from 58.2% to 71.8%: it was highest in the GBM model (71.8%), followed by RF (70.1%), LR (61.9%), NB and SVM (61.6%), KNN (61.4%) and lowest in the ANN model (58.2%). The PPV ranged from 27.4% to 34.9%, while the NPV ranged from 92.3% to 94.8%. The AUC of the models ranged from 73.4% to 83.5%, with the GBM model having the highest AUC (83.5%, 95% Confidence Interval [95% CI]: 81.6–85.4) (Table 2).
Table 2
Model performance of linear growth faltering prediction β models using combined data (original and synthetic data)
Algorithm | Sensitivity % [95% CI] | Specificity % [95% CI] | PPV % [95% CI] | NPV % [95% CI] | F1-Score [95% CI] | AUC % [95% CI] | PRAUC % [95% CI] |
---|
RF | 80.7 [76.5–84.4] | 70.1 [68.1–72.1] | 34.9 [31.9–38.0] | 94.8 [93.6–95.9] | 48.7 [16.8–59.7] | 82.8 [80.8–84.8] | 96.0 [93.8–96.2] |
GBM | 75.6 [71.2–79.7] | 71.8 [69.8–73.7] | 34.7 [31.6–37.9] | 93.7 [92.4–94.8] | 47.6 [13.9–74.5] | 83.5 [81.6–85.4] | 96.2 [94.9–96.5] |
NB | 76.1 [71.7–80.1] | 61.6 [59.5–63.7] | 28.2 [25.6–31.0] | 92.8 [91.4–94.1] | 40.2 [12.0-42.4] | 75.6 [73.3–77.9] | 94.0 [92.1–95.0] |
LR | 75.4 [70.9–79.4] | 61.9 [59.7–64.0] | 28.2 [25.5–30.9] | 92.7 [91.2–94.0] | 38.2 [3.1–64.2] | 73.7 [71.3–76.1] | 93.0 [91.1–94.0] |
SVM | 77.3 [73.0-81.2] | 61.6 [59.5–63.7] | 28.6 [25.9–31.3] | 93.2 [91.7–94.5] | 41.7 [9.3–56.8] | 73.4 [71.0-75.8] | 93.0 [91.6–94.1] |
KNN | 72.4 [69.7–75.0] | 61.4 [59.3–63.5] | 27.6 [25.0-30.3] | 92.3 [90.8–93.6] | 40.2 [6.7–67.0] | 74.8 [72.3–77.2] | 93.0 [90.8–93.6] |
ANN | 79.5 [75.3–83.3] | 58.2 [56.1–60.4] | 27.4 [24.9–30.0] | 93.5 [92.0-94.7] | 40.8 [9.8–58.5] | 73.6 [71.3–76.0] | 93.0 [90.9–94.1] |
β: Linear growth faltering defined as a decline in HAZ of ≥ 0.5 between enrollment and follow-up (ΔHAZ ≤ −0.5)
*RF: Random Forest; GBM: Gradient Boosting; NB: Naïve Bayes; LR: Logistic Regression; SVM: Support Vector Machine; KNN: K-Nearest Neighbors; ANN: Artificial Neural Networks
95% CI: 95% Confidence Interval; PPV: Positive Predictive Value; NPV: Negative Predictive Value; AUC: Area under the Curve; PRAUC: Precision Recall Area under the Curve
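For readers who want to relate the columns of Table 2 to one another, the following sketch shows how the threshold-based metrics derive from a confusion matrix. The counts passed to `classification_metrics` are made up for illustration and are not the study's confusion matrix, which is not reported; the helper names are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from confusion-matrix counts (as proportions)."""
    sensitivity = tp / (tp + fn)  # recall among children with LGF
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def f1_from(ppv, sensitivity):
    """F1 is the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Hypothetical counts, chosen only to resemble the order of magnitude in Table 2.
example = classification_metrics(tp=75, fp=140, fn=25, tn=360)
print({name: round(100 * value, 1) for name, value in example.items()})

# Consistency check using the reported PPV and sensitivity (in percent).
gbm_f1 = round(f1_from(34.7, 75.6), 1)
rf_f1 = round(f1_from(34.9, 80.7), 1)
print(gbm_f1, rf_f1)
```

For the GBM and RF rows, the harmonic mean of the reported PPV and sensitivity agrees with the tabulated F1-scores (47.6 and 48.7).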
The GBM model was chosen as the champion model. The receiver operating characteristic (ROC) curves for LGF prediction models are shown in Figure S3. Moreover, in the sensitivity analysis using only the VIDA data in development, the model performance ranged between 63.0%-82.6%, 55.9%-78.6%, 27.3%-33.7%, 91.0%-94.2%, 40.3%-44.3%, 68.0%-75.5%, and 90.6%-94.4% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively (Table 3). All models showed a decline in predictive performance during sensitivity analysis except for the SVM model, which had a marginal increase.
Table 3
Model performance of linear growth faltering prediction β models using original training data only
Algorithm | Sensitivity % [95% CI] | Specificity % [95% CI] | PPV % [95% CI] | NPV % [95% CI] | F1-Score [95% CI] | AUC % [95% CI] | PRAUC % [95% CI] |
---|
RF | 52.2 [36.9–67.1] | 78.6 [72.7–83.7] | 32.9 [22.3–44.9] | 89.1 [84.0–93.0] | 40.3 [10.9–51.4] | 70.3 [61.8–78.7] | 90.6 [88.4–90.8] |
GBM | 80.4 [66.1–90.6] | 63.3 [56.7–69.6] | 30.6 [22.5–39.6] | 94.2 [89.2–97.3] | 44.3 [12.9–55.8] | 75.5 [68.2–82.8] | 93.6 [92.3–93.9] |
NB | 63.0 [47.5–76.8] | 75.1 [69.0-80.6] | 33.7 [23.9–44.7] | 91.0 [86.0-94.7] | 43.9 [5.6–60.3] | 73.6 [66.1–81.2] | 93.0 [91.1–94.0] |
LR | 73.9 [58.9–85.7] | 63.3 [56.7–69.6] | 28.8 [20.8–37.9] | 92.4 [87.0–96.0] | 41.5 [7.8–52.1] | 73.8 [67.0-80.5] | 93.9 [92.0-94.9] |
SVM | 71.7 [56.5–84.0] | 65.9 [59.4–72.1] | 29.7 [21.4–39.1] | 92.1 [86.8–95.7] | 42.0 [7.2–51.9] | 75.2 [68.9–81.5] | 94.4 [93.0-95.5] |
KNN | 82.6 [68.6–92.2] | 56.8 [50.1–63.3] | 27.7 [20.4–36.0] | 94.2 [88.9–97.5] | 41.5 [12.2–51.8] | 73.1 [66.3–79.9] | 93.6 [91.4–94.2] |
ANN | 82.6 [68.6–92.2] | 55.9 [49.2–62.4] | 27.3 [20.1–35.5] | 94.1 [88.7–97.4] | 41.1 [12.0-57.9] | 68.0 [60.5–75.6] | 91.4 [89.3–92.5] |
β: Linear growth faltering defined as a decline in HAZ of ≥ 0.5 between enrollment and follow-up (ΔHAZ ≤ −0.5)
*RF: Random Forest; GBM: Gradient Boosting; NB: Naïve Bayes; LR: Logistic Regression; SVM: Support Vector Machine; KNN: K-Nearest Neighbors; ANN: Artificial Neural Networks
95% CI: 95% Confidence Interval; PPV: Positive Predictive Value; NPV: Negative Predictive Value; AUC: Area under the Curve; PRAUC: Precision Recall Area under the Curve
In the sensitivity analysis using the second definition of LGF (negative change in HAZ), the model performance ranged between 45.8%-73.1%, 53.2%-76.6%, 79.0%-90.5%, 28.6%-48.5%, 58.3%-80.9%, 58.0%-82.4%, and 29.0%-62.6% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively (Table S2). In this scenario, all models exhibited a drop in predictive performance except for the SVM model, which had a marginal increase, and the RF model, which registered the same performance as in the primary analysis.
Overall, the Brier scores were relatively high, ranging from 0.19 to 2.50 (Table 4). Spiegelhalter's p-values showed that none of the models was properly calibrated (p < 0.05). The performance of the calibrated GBM model was largely similar to that of its uncalibrated form, with an AUC of 83.7%.
Table 4
Calibration results of linear growth faltering prediction models.
Algorithm | Brier Score | Spiegelhalter Z-score | Spiegelhalter p-value |
---|
RF | 0.19 | 16.83 | < 0.0001 |
GBM | 2.50 | 208.10 | < 0.0001 |
NB | 2.18 | 101.02 | < 0.0001 |
LR | 2.16 | 85.02 | < 0.0001 |
SVM | 2.16 | 85.88 | < 0.0001 |
KNN | 2.21 | 109.88 | < 0.0001 |
ANN | 2.17 | 84.07 | < 0.0001 |
*RF: Random Forest; GBM: Gradient Boosting; NB: Naïve Bayes; LR: Logistic Regression; SVM: Support Vector Machine; KNN: K-Nearest Neighbors; ANN: Artificial Neural Networks
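The calibration statistics in Table 4 can be illustrated with a short sketch. It assumes the usual definitions: the Brier score as the mean squared difference between predicted probability and observed outcome, and Spiegelhalter's Z as a standardized sum that is approximately standard normal under perfect calibration. The source does not state how its values were computed, and the toy probabilities below are invented.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def spiegelhalter_z(probs, outcomes):
    """Spiegelhalter's Z statistic and its two-sided normal p-value."""
    numerator = sum((y - p) * (1 - 2 * p) for p, y in zip(probs, outcomes))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in probs)
    # Note: variance is zero if every prediction is exactly 0, 0.5 or 1.
    z = numerator / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Toy predictions (not study data).
probs = [0.1, 0.2, 0.8, 0.7, 0.3, 0.9]
outcomes = [0, 0, 1, 1, 0, 1]

bs = brier_score(probs, outcomes)
z, p_value = spiegelhalter_z(probs, outcomes)
print(f"Brier = {bs:.3f}, Z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value leads to rejecting the hypothesis that the predicted probabilities are well calibrated, which is how the p < 0.05 results above are read.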
Explanatory Model Analysis
The EMA results for the top two models in the primary analysis were similar, though the degree of importance varied across models, with no SAM, no skin turgor, no rotavirus vaccine, age, elevated temperature and respiratory rate being predictive of LGF (Fig. 3). Similarly, in the sensitivity analysis using the second definition of LGF, the direction of association was similar between the two models, although the magnitude of importance varied. In addition to age, respiratory rate and temperature, the following factors were also identified as predictive of LGF: severity of disease, no vomiting, stunting at baseline, bacterial infection and lack of sunken eyes (Fig. 3).
Business Value Evaluation of Champion Model
From the business value evaluation of our champion model (GBM), the cumulative gains plot shows that the model captures ~60% of the target class (LGF) when the top 20% of cases ranked by model probability are selected. Additionally, the cumulative lift plot shows that, within that top 20%, the champion model identifies ~3 times as many children with LGF as random selection would. Lastly, from the cumulative response plot, 48% of observations in the top 20% of cases ranked by model probability belong to the target class (Fig. 4).
*Scenario 1: Predicting linear growth faltering using a cut-off of a ≥ 0.5 decline in HAZ (ΔHAZ ≤ −0.5)
*age = 9: 9 months; Rotavirus_vacc = 2: 2 doses of rotavirus vaccine; cur_wrinkledskin = 0: normal skin; SAM = 0: No severe acute malnutrition (SAM)
*Scenario 2: Predicting linear growth faltering using change in HAZ/month (negative change in linear growth is deemed growth faltering)
*age = 9: 9 months; vesikari_cat = 3: Severe disease based on Vesikari score; vomit = 1: Vomiting; Stunting_base = 0: No stunting at baseline; bacterial_infec = 0: No bacterial infection; sunken_eyes = 1: sunken eyes.
Figure 4. Business value plots for the Gradient Boosting (GBM) Model for linear growth faltering
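The three business value quantities are closely related, which a small sketch can make concrete. It assumes the common definitions (gain: share of all positives captured in the selected top fraction; response: share of positives within that fraction; lift: response divided by overall prevalence). The function name `business_value_at` and the toy data are hypothetical.

```python
def business_value_at(probs, outcomes, top_fraction=0.2):
    """Cumulative gain, lift and response for the top-ranked fraction of cases."""
    ranked = sorted(zip(probs, outcomes), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(top_fraction * len(ranked)))
    positives_in_top = sum(y for _, y in ranked[:k])
    total_positives = sum(outcomes)
    gain = positives_in_top / total_positives       # share of all positives captured
    response = positives_in_top / k                 # share of positives in the top k
    lift = response / (total_positives / len(ranked))  # relative to random selection
    return gain, lift, response


# Toy data (not study data): 10 children, 3 with the outcome.
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

gain, lift, response = business_value_at(probs, outcomes, top_fraction=0.2)
print(f"gain = {gain:.2f}, lift = {lift:.2f}, response = {response:.2f}")
```

Because lift equals gain divided by the selected fraction, a gain of ~60% in the top 20% corresponds to a lift of ~3, matching the two figures reported for the champion model.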
Temporal Validation
We observed a decline in model performance on the temporal validation dataset, with the AUC dropping by ~18 percentage points. All other metrics also dropped in temporal validation, with the GBM model achieving 53.7%, 67.7%, 32.5%, 83.5%, 40.5%, 65.6% and 86.4% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively (Fig. 5).
PPV- Positive Predictive Value; NPV- Negative Predictive Value; AUC- Area under the Curve; PRAUC- Precision Recall Area under the Curve
Figure 5. Performance of champion model in development (2015–2018) and temporal validation (2022–2023) datasets.
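As a brief illustration of the headline comparison, the sketch below computes the AUC as the probability that a randomly chosen child with LGF receives a higher predicted probability than one without (ties counted as half), and then expresses the reported change between cohorts in absolute and relative terms. The helper name `auc_pairwise` and the four toy predictions are hypothetical; only the two AUC values come from the text.

```python
def auc_pairwise(probs, outcomes):
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    positives = [p for p, y in zip(probs, outcomes) if y == 1]
    negatives = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example (not study data): three of four pairs are ranked correctly.
toy_auc = auc_pairwise([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])

# Reported GBM AUCs (percent) in development and temporal validation.
dev_auc, val_auc = 83.5, 65.6
absolute_drop = dev_auc - val_auc                # in percentage points
relative_drop = 100 * absolute_drop / dev_auc    # as a percentage of the original AUC
print(f"toy AUC = {toy_auc:.2f}")
print(f"drop = {absolute_drop:.1f} points ({relative_drop:.1f}% relative)")
```

The absolute difference is 17.9 percentage points, which corresponds to the ~18 figure quoted above; relative to the development AUC the decline is about 21%.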