Derivation and Validation of a Clinical Predictive Model for Longer Duration Diarrhea among Pediatric Patients in Kenya using Machine Learning Algorithms

doi:10.21203/rs.3.rs-4048898/v1

Research Article

This work is licensed under a CC BY 4.0 License

Background

Despite the adverse health outcomes associated with longer duration diarrhea (LDD), there are currently no clinical decision tools for timely identification and better management of children with increased risk. This study utilizes machine learning (ML) to derive and validate a predictive model for LDD among children presenting with diarrhea to health facilities.

Methods

LDD was defined as a diarrhea episode lasting ≥ 7 days. We used 7 ML algorithms to build prognostic models for the prediction of LDD among children < 5 years, using de-identified data from the Vaccine Impact on Diarrhea in Africa study (N = 1,482) for model development and data from the Enterics for Global Health Shigella study (N = 682) for temporal validation of the champion model. Features included demographic, medical history and clinical examination data collected at enrolment in both studies. We conducted split-sampling and employed K-fold cross-validation with an over-sampling technique in model development. Moreover, critical predictors of LDD and their impact on prediction were obtained using an explainable, model-agnostic approach. The champion model was determined based on the area under the curve (AUC) metric.

Results

There was a significant difference in prevalence of LDD between the development and temporal validation cohorts (478 [32.3%] vs 69 [10.1%]; p < 0.001). The following variables were associated with LDD in decreasing order: pre-enrolment diarrhea days (55.1%), modified Vesikari score (18.2%), age group (10.7%), vomit days (8.8%), respiratory rate (6.5%), vomiting (6.4%), vomit frequency (6.2%), rotavirus vaccination (6.1%), skin pinch (2.4%) and stool frequency (2.4%). While all models showed good prediction capability, the random forest model achieved the best performance (AUC [95% Confidence Interval]: 83.0 [78.6–87.5] and 71.0 [62.5–79.4]) on the development and temporal validation datasets, respectively.

Conclusions

Our study suggests ML derived algorithms could be used to rapidly identify children at increased risk of LDD. Integrating ML derived models into clinical decision-making may allow clinicians to target these children with closer observation and enhanced management.

Machine Learning

Longer duration diarrhea

Pediatric

Prediction

There are 1.7 billion episodes of diarrheal illness annually [1], and 1.5 million deaths [2] in children < 5 years are attributed to diarrhea every year, most of which occur in low- and middle-income countries (LMICs). Longer duration diarrhea (LDD), defined as a diarrhea episode lasting ≥ 7 days, encompasses both prolonged acute diarrhea (7–13 days) and persistent diarrhea (≥ 14 days) [3]. LDD has been shown to have a higher mortality rate among infants than acute diarrhea [4, 5], in addition to associations with decreased cognitive function, delayed growth and nutritional deficiencies [6, 7]. Whilst LDD represents a relatively small portion of childhood diarrheal episodes, it accounts for more than half of the days with diarrhea [4].

Current diarrhea management guidelines focus on acute diarrhea, but up to 20% of diarrhea cases in LMICs end up becoming LDD [8, 9]. Diagnostic capabilities are scarce in most LMICs, including Kenya [10], which often prevents clinicians from diagnosing and treating the enteropathogens associated with LDD. Predictive models based on clinical signs and symptoms may be used to rapidly identify patients at increased risk of LDD and provide an opportunity to administer better, timelier care, which could improve outcomes for this vulnerable group of children.

Machine learning (ML) has been adopted in public health and clinical practice to rapidly develop data-driven prediction models that complement clinician judgement and improve efficiency, reliability and accuracy [11, 12]. Several successes in the application of ML models in improving the implementation of public health interventions globally have been reported, specifically in the prediction of duration or length of stay for various diseases: mental disorders [13]; COVID-19 [14]; knee osteoarthritis [15]; ischemic stroke patients [16]; cardiac arrest [17]; lung cancer [18]; sepsis [19]. However, no such predictive models exist in literature for LDD. We seek to use ML to develop predictive models for children at increased risk of LDD in Kenya.

Study Design

This retrospective study leveraged two de-identified diarrheal datasets: the Vaccine Impact on Diarrhea in Africa (VIDA) study for model development and evaluation, and the Enterics for Global Health (EFGH) Shigella surveillance study for temporal validation. This analysis focuses on data collected from the Kenya site in both studies.

The study design for VIDA has been described elsewhere [20]; in summary, VIDA was designed to assess diarrheal etiologies, rotavirus vaccine effectiveness, and population-based impact of rotavirus vaccine introduction in children aged 0–59 months residing in censused populations in 3 African countries. Moderate-to-severe diarrhea (MSD) cases were defined as children aged 0–59 months presenting at a sentinel health center with diarrhea (defined as ≥ 3 looser-than-normal stools within 24 hours) that began within the past 7 days after ≥ 7 diarrhea-free days and who had ≥ 1 of the following: sunken eyes, poor skin turgor, dysentery, a requirement for intravenous rehydration, or hospitalization. Diarrhea-free controls matched by age, gender and geographical location were enrolled within 14 days of case enrolment. We utilized data from cases enrolled in VIDA over a 36-month period between May 2015 and July 2018.

The EFGH study employed cross-sectional and longitudinal study designs to establish the incidence and consequences of Shigella medically attended diarrhea (MAD) within 7 country sites in Africa, Asia, and Latin America. Eligible MAD cases were children aged 6–35 months presenting at a sentinel health center with diarrhea (defined as ≥ 3 looser-than-normal stools within 24 hours) that began within the past 7 days after ≥ 2 diarrhea-free days. Additional eligibility criteria included: residing within the pre-defined EFGH catchment area; planning to remain at the current residence for at least the next 4 months; the legal guardian consenting to the child’s participation in the study and being willing to be followed up for 3 months post-enrolment; the child not being referred to a non-EFGH facility at the time of screening; and the site enrollment cap not having been met. Our study utilized EFGH data collected between 01 August 2022 and 31 July 2023.

In both studies, data on demographic, household-level characteristics, illness history, anthropometric and clinical characteristics were collected at enrollment by study research staff.

Diarrheal duration: Data sources and calculation.

Diarrhea duration was determined from two data sources. The pre-enrolment duration was based on caregiver’s report. The days of diarrhea reported in this period were considered to be uninterrupted. The post-enrolment duration involved a 14-day follow-up period after enrollment and was extracted from the data reported by caregiver in a memory aid (Figure S1) and diarrhea diary (Figure S2) for VIDA and EFGH, respectively. Additionally, for the EFGH study, if the caretaker did not return the diarrhea diary, the post-enrolment duration was extracted from the week 4 or month 3 follow-up case report form interview if available.

Based on these two sources, we determined the pre-enrolment diarrhea duration covering 7 days before enrolment and the post-enrolment diarrhea duration covering 14 days after enrolment. Together these give a maximum possible duration of 20 days, because the day of enrolment was captured in both the pre-enrolment and post-enrolment durations. During the post-enrolment period, 2 diarrhea-free days were considered the end of an episode, consistent with previous studies [21–24]. We defined LDD as a diarrheal episode lasting ≥ 7 days [4, 21].
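The duration rule described above can be sketched in Python (the study itself was analysed in R). The function names, the input layout (a count of pre-enrolment days that includes the enrolment day, plus daily follow-up flags whose first entry is the enrolment day) and the choice to measure the episode up to the last diarrhea day before two consecutive diarrhea-free days are assumptions made for illustration; they are not taken from the study's code.

```python
def episode_duration(pre_days, post_flags):
    """Total episode length in days (illustrative helper, not study code).

    pre_days   -- caregiver-reported diarrhea days up to and including the
                  enrolment day (1-7), treated as uninterrupted.
    post_flags -- daily True/False diarrhea flags for follow-up, where
                  post_flags[0] is the enrolment day (up to 14 entries).
    """
    last_diarrhea_index = -1
    free_run = 0
    for i, has_diarrhea in enumerate(post_flags):
        if has_diarrhea:
            last_diarrhea_index = i
            free_run = 0
        else:
            free_run += 1
            if free_run == 2:  # two diarrhea-free days end the episode
                break
    post_days = last_diarrhea_index + 1
    # The enrolment day appears in both sources, so count it only once.
    overlap = 1 if post_days > 0 else 0
    return pre_days + post_days - overlap


def is_ldd(pre_days, post_flags):
    """LDD is an episode lasting at least 7 days."""
    return episode_duration(pre_days, post_flags) >= 7


# Longest possible episode: 7 pre-enrolment days plus 14 follow-up days.
print(episode_duration(7, [True] * 14))
```

With 7 pre-enrolment days and diarrhea on all 14 follow-up days, the shared enrolment day is counted once, giving the 20-day maximum stated in the text.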

Statistical analysis

We compared patient characteristics of LDD cases versus non-LDD cases. Proportions were reported for categorical variables, and either chi-square or Fisher's exact tests were performed as appropriate. Wilcoxon rank sum tests were used to compare continuous variables.
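As a minimal illustration of the categorical comparison, the following Python sketch computes a Pearson chi-square test for a 2×2 table using only the standard library. It assumes no continuity correction, which may differ from the settings the authors used in R, and the function name is hypothetical. The counts are the home ORS use row of Table 1.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Home ORS use (Table 1): 58 of 478 LDD cases vs 77 of 1,004 non-LDD cases.
stat, p = chi_square_2x2(58, 478 - 58, 77, 1004 - 77)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Under these assumptions the p-value rounds to the 0.005 reported for that row.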

Pre-Processing

We assessed all demographic, socio-economic and clinical characteristics as potential features. We evaluated the missing data patterns and the missing data points in the variables were imputed using the Multiple Imputation by Chained Equations (MICE) package [25].

Feature Selection

We conducted feature selection with the goal of optimizing model accuracy, minimizing computational cost and enhancing interpretability of the models. This was implemented using the Boruta package [26], an all-relevant feature selection wrapper around the random forest algorithm that selects relevant features by comparing the importance of the original attributes with the importance achievable at random using their permuted copies. Selected and tentative features were subsequently used in model development. Although rectal straining and breastfeeding were selected features, they were not included in model development because the EFGH study used for temporal validation did not capture them.

Model development and evaluation

The schematic diagram for model development and validation is shown in Figure 1. We modeled two scenarios representing two different use cases: (i) the probability of an acute diarrheal episode progressing to LDD based on the child’s signs and symptoms at presentation to hospital; and (ii) the probability of a diarrheal episode lasting an additional 7 or more days after presentation to hospital (i.e. excluding pre-hospital days of diarrhea).

The first scenario aims to aid healthcare providers in early identification and better management of children at increased risk of LDD. The second scenario hopes to address possible caretaker concerns on how long their child’s diarrheal episode would last from the day of medical treatment. To build the LDD prediction model, we applied 7 ML algorithms including: Random Forest (RF), Gradient Boosting (GBM), Naive Bayes (NB), Logistic regression (LR), Support vector machine (SVM), K-nearest neighbors (KNN) and Artificial Neural Networks (ANN). The algorithms were implemented in R version 4.2.2 using the Caret package [27].

We performed split-sampling by conducting a 75:25% data split to partition the development data (VIDA) into training and test sets [28]. The LDD predictive models were then developed in the training dataset using K-fold cross-validation [29, 30] to guard against under-fitting or over-fitting of the model. We employed an over-sampling technique [31] within the resampling procedure to handle the modest class imbalance in our target variable (LDD), since a disparity in the frequencies of the observed classes can have a significant negative impact on model fitting. The models from the training data were evaluated in the test dataset using the following performance metrics: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1-score. Receiver operating characteristic (ROC) curves were constructed, and the area under the curve (AUC) and the precision-recall area under the curve (PRAUC) for each model were computed using the precrec package [32].
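The idea of over-sampling inside the resampling procedure can be sketched in Python as follows. This is a simplified illustration, not the Caret implementation used in the study: the number of folds, the random seed, the helper names and the use of plain random over-sampling with replacement are all assumptions. The point it shows is that only the training folds are rebalanced, while each held-out fold keeps its original class mix.

```python
import random


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def oversample(indices, labels, rng):
    """Resample the minority class with replacement until classes match."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:  # nothing to resample from
        return list(indices)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def cv_splits(labels, k=5, seed=1):
    """Yield (balanced training indices, untouched validation indices)."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(labels), k, rng)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield oversample(train, labels, rng), folds[held_out]


# Toy labels with one-third positives, loosely mirroring the LDD prevalence.
labels = [1] * 30 + [0] * 60
splits = list(cv_splits(labels, k=5))
```

Rebalancing inside each fold, rather than once before splitting, keeps duplicated minority rows out of the validation fold and so avoids optimistic performance estimates.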

We assessed calibration in the built models using Brier scores (the mean squared error between the actual outcome and the estimated probabilities), Spiegelhalter’s z-test (a formal measurement that serves as a proxy for calibration calculated from the decomposition of Brier score) and its accompanying p-value [33]. The champion model was the best predictive model from the pool of developed models based on the AUC metric.
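For readers unfamiliar with these calibration measures, the following Python sketch computes the Brier score and Spiegelhalter's z statistic from observed outcomes and predicted probabilities, using the standard formulas. The function names are illustrative and the study's own computations were done in R.

```python
import math


def brier_score(y, p):
    """Mean squared difference between outcomes (0/1) and probabilities."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def spiegelhalter_z(y, p):
    """Spiegelhalter's z statistic; approximately N(0, 1) under calibration.

    Undefined when every predicted probability is exactly 0.5.
    """
    numerator = sum((yi - pi) * (1 - 2 * pi) for yi, pi in zip(y, p))
    variance = sum((1 - 2 * pi) ** 2 * pi * (1 - pi) for pi in p)
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Toy example: two patients, one with the outcome and one without.
y = [1, 0]
p = [0.8, 0.2]
print(brier_score(y, p), spiegelhalter_z(y, p), two_sided_p(-1.23))
```

A small p-value indicates that the predicted probabilities depart systematically from the observed outcomes, that is, poor calibration.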

We conducted explanatory model analysis (EMA) for the top four models using a model agnostic procedure to estimate SHapley Additive exPlanations (SHAPs) attributions. This was implemented using the DALEX package [34]. The SHAP values were plotted as bar plots in descending degree of importance with the red color signifying a negative association and green color showing a positive association. We further conducted temporal validation on the champion model to assess its transportability and generalizability [35]. Due to the difference in case definition between the two studies, we also conducted a sensitivity analysis of the temporal validation using a subset of EFGH participants who met the VIDA inclusion criteria.
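To make the attribution idea concrete, the sketch below computes exact Shapley values for a single prediction by averaging each feature's marginal contribution over all feature orderings. It is a deliberately simplified illustration: the risk score, the feature names and values, and the use of a single reference row as the baseline are hypothetical, whereas packages such as DALEX typically approximate these values using a background dataset and a sample of orderings.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions for one instance against one reference row."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in ordering:
            # Switch one feature from its reference value to the patient's
            # value and record how much the prediction moves.
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(row):
    """Hypothetical risk score with an interaction; not the study's model."""
    score = 0.05 * row["diarrhea_days"] + 0.02 * row["vesikari"]
    if row["vaccinated"] == 0 and row["vesikari"] >= 11:
        score += 0.10
    return score


patient = {"diarrhea_days": 5, "vesikari": 12, "vaccinated": 0}
reference = {"diarrhea_days": 2, "vesikari": 8, "vaccinated": 1}
attributions = shapley_values(toy_model, patient, reference)
print(attributions)
```

The attributions sum to the difference between the patient's prediction and the reference prediction, which is the property that lets them be read as additive contributions in a bar plot.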

To evaluate the business value of the predictive model, modelplotr package [36] was used to build valuable evaluation plots (cumulative gains, cumulative lift, response and cumulative response plots). The cumulative gains plot was used to visualize the percentage of the target class members that were selected if we decided to select up until percentile X while the cumulative lift plot was used to explain how much better selecting based on our model was compared to taking random selections. The response plot was used to plot the percentage of target class observations per percentile. Lastly, the cumulative response plot was used to show the expected percentage of the target class observations in the selection, when we apply the model and select up until percentile X. Descriptive analysis, predictive modelling for LDD and plotting were all performed in R version 4.2.2 [37].
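The quantities behind these plots can be illustrated with a short Python sketch that evaluates a single cut-off, here the top 20% of cases ranked by predicted probability. The function name and the toy data are assumptions; the study produced the plots with the modelplotr package in R.

```python
def gains_at_fraction(y_true, scores, fraction=0.2):
    """Cumulative gain, lift and response when selecting the top fraction."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_selected = max(1, round(len(ranked) * fraction))
    hits = sum(label for _, label in ranked[:n_selected])
    total_positives = sum(y_true)
    # Share of all target-class cases captured by the selection.
    cumulative_gain = hits / total_positives
    # Share of the selection that belongs to the target class.
    cumulative_response = hits / n_selected
    # How much better the selection is than picking cases at random.
    cumulative_lift = cumulative_response / (total_positives / len(y_true))
    return cumulative_gain, cumulative_lift, cumulative_response


# Toy data: 10 cases, 4 with the outcome, scored by a hypothetical model.
y_true = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
gain, lift, response = gains_at_fraction(y_true, scores, fraction=0.2)
print(gain, lift, response)
```

When the selection contains exactly the stated fraction of cases, cumulative lift equals cumulative gain divided by that fraction, so the three plots are different views of the same ranking.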

During VIDA (development dataset), 2,895 children aged < 5 years sought care for diarrhea in the sentinel health centers, of whom 2,009 (69.4%) had MSD and 1,554 (77.4%) met the study case definition and were subsequently enrolled. Among those enrolled, 1,482 (95.4%) had their memory aids completed by the caretakers, of whom 478 (32.3%) had LDD. In EFGH (temporal validation dataset), 1,879 children aged < 5 years sought care for diarrhea in the sentinel health centers, of whom 1,365 (72.6%) were eligible for screening and 706 (51.7%) met the study case definition and were subsequently enrolled. Among those enrolled, 685 (97.0%) had their diarrhea diaries completed by the caretakers, of whom 69 (10.1%) had LDD (Fig. 2). There was a statistically significant difference in prevalence of LDD between the VIDA and EFGH studies (478 [32.3%] vs 69 [10.1%]; p < 0.001). Additionally, we observed significant differences in the baseline characteristics of participants in the two studies. Specifically, compared to EFGH participants, VIDA participants were older (Median age in months [IQR]: 15.0 [9.0–25.0] vs 13.6 [8.9–20.4], p = 0.0361), had more severe diarrheal episodes (Median Vesikari score [IQR]: 10 [8–13] vs 8 [6–10], p < 0.001), and had a higher respiratory rate (Median [IQR]: 36.5 [31.5–41.0] vs 33.0 [28.0–39.0], p < 0.001). Moreover, VIDA participants were more likely to present with vomiting (843 [56.9%] vs 346 [50.5%], p = 0.006), decreased skin turgor (614 [41.4%] vs 137 [20.0%], p < 0.001) and severe dehydration (388 [26.2%] vs 22 [3.2%], p < 0.001) compared to EFGH participants (Table S2).

The characteristics of VIDA participants stratified by LDD status are shown in Table 1. Children who had LDD were younger than those who did not (Median age in months [IQR]: 13 [7–20] vs 16 [10–27], p < 0.001). Furthermore, compared with those who did not have LDD, those with LDD had a higher respiratory rate (Median [IQR]: 37.5 [33–42.5] vs 36 [31–40], p < 0.001), and a higher Vesikari score (Median [IQR]: 11 [9–13] vs 10 [8–12], p < 0.001). Additionally, caretaker education, breastfeeding, stool frequency in 24 hours, belly pain, rectal straining, cough, number of vomiting episodes, prior home oral rehydration salts use, rotavirus vaccination, fast breathing and decreased skin turgor were significantly associated with LDD.

Table 1

Characteristics of children aged < 5 years seeking care for moderate-to-severe diarrhea in Kenya stratified by Diarrhea duration, 2015–2018.
	Longer Duration Diarrhea (LDD)
Characteristics	Yes (n = 478)	No (n = 1,004)	p-value
	n (%)	n (%)
Demographic
Median age [IQR]	13 [7–20]	16 [10–27]	< 0.001
Age Category
0–11 months	225 (47.1)	332 (33.1)	< 0.001
12–23 months	152 (31.8)	356 (35.5)
24–59 months	101 (21.1)	316 (31.5)
Gender: Female	213 (44.6)	463 (46.1)	0.574
Household Details
Caretaker education ( > = Secondary )	53 (11.1)	153 (15.3)	0.03
Natural Floor	323 (67.7)	644 (64.1)	0.177
Refined/Electric Primary Fuel Source^β	14 (3.0)	44 (4.4)	0.183
Clinical characteristics
By History
Breastfeeding before diarrhea onset			< 0.001
None	147 (30.8)	411 (40.9)
Partial	42 (8.8)	55 (5.5)
Exclusive	289 (60.5)	538 (53.6)
Median diarrhea days [Interquartile range (IQR)]	4 [3–5]	2 [2–3]	< 0.001
Stool Count
3	75 (15.7)	192 (19.1)	0.037
4–5	256 (53.6)	562 (56.0)
≥ 6	147 (30.7)	250 (24.9)
Belly Pain	299 (65.0)	568 (58.7)	0.024
Rectal straining	155 (32.6)	235 (23.5)	< 0.001
Cough	280 (58.6)	526 (52.4)	0.025
Vomiting	255 (53.4)	588 (58.6)	0.058
No. of vomit
0	223 (46.7)	416 (41.4)	0.026
1	48 (10.0)	119 (11.9)
2–4	177 (37.0)	364 (36.2)
≥ 5	30 (6.3)	105 (10.5)
Median vomit days [IQR]	2 [1–3]	2 [1–2]	< 0.001
Home ORS use	58 (12.1)	77 (7.7)	0.005
Rotavirus vaccination	414 (90.6)	761 (81.7)	< 0.001
At enrolment
Very Thirsty	350 (74.2)	695 (70.0)	0.1
Fast breathing	63 (13.2)	98 (9.8)	0.048
Median Respiratory rate [IQR]	37.5 [33–42.5]	36 [31–40]	< 0.001
Dry mouth
Normal	4 (0.8)	23 (2.3)	0.053
Somewhat dry	441 (92.3)	891 (88.7)
Very dry	33 (6.9)	90 (9.0)
Skin turgor (slow/very slow)	222 (46.4)	392 (39.0)	0.007
Mental Status			0.093
Normal	199 (41.6)	437 (43.5)
Restless/Irritable	266 (55.7)	555 (55.3)
Lethargic/Unconscious	13 (2.7)	12 (1.2)
Under Nutrition	65 (13.6)	109 (10.9)	0.125
Vesikari Score			< 0.001
Mild	15 (3.1)	130 (13.0)
Moderate	204 (42.7)	426 (42.4)
Severe	259 (54.2)	448 (44.6)
Median vesikari score [IQR]	11 [9–13]	10 [8–12]	< 0.001
Cipro_ceft	20 (4.2)	62 (6.2)	0.117

^β− Includes electricity, propane, butane, natural gas.

ORS-Oral rehydration solution

The following variables had a p-value ≥ 0.2 and are not included in the table: No. of children < 5 years in households; Total assets; Animal ownership; improved water; improved sanitation; shared facility; stool type; Blood in stool; drinks poorly; unable to drink; fever; restless; lethargy; unconscious; rectal prolapse; difficulty breathing; convulsion; sunken eyes; home zinc use; capillary refill; chest indrawing; sunken eyes; Bipedal edema; Abnormal hair; Dehydration; ORS at facility; Zinc at facility; IV rehydration; any_antibiotic; Malaria diagnosis; Dysentery diagnosis; Stunting; Wasting

From the feature selection analysis, the selected variables in order of importance were diarrhea days prior to presentation (55.1%), Vesikari score (18.2%), age group (10.7%), vomit days (8.8%), breastfeeding (8.4%), respiratory rate (6.5%), vomiting (6.4%), number of vomits in last 24 hours (6.2%), rotavirus vaccination (6.1%) and rectal straining (3.4%). Skin pinch (2.4%) and number of loose stools in last 24 hours (2.4%) were tentative features (Fig. 3).

We evaluated seven ML algorithms in the prediction of LDD. Among the developed models, sensitivity was highest in the RF model (80.7%), followed by the LR (76.5%), ANN (75.6%), SVM (73.9%), KNN (73.1%) and GBM (72.3%) models, and lowest in the NB model (69.7%). Specificity was highest in the GBM and SVM models (76.5%), followed by ANN (74.9%), LR (74.5%), RF and NB (74.1%), and lowest in the KNN model (72.1%). The PPV ranged between 55.4% and 59.9%, while the NPV ranged between 83.8% and 89.0%. The AUC of the models in decreasing order was 83.0%, 82.0%, 81.5%, 81.1%, 80.5%, 79.7% and 77.3% for RF, SVM, ANN, GBM, LR, KNN and NB, respectively (Table 2). The RF model emerged as the champion model with 80.7%, 74.1%, 59.6%, 89.0%, 68.6%, 83.0% and 90.0% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively.

Table 2

Longer Duration Diarrhea (LDD) prediction models with Over-sampling technique used in the resampling procedure: Model Performance
Algorithm	LDD Prediction with over-sampling technique
Algorithm	Sensitivity % [95% CI]	Specificity % [95% CI]	PPV % [95% CI]	NPV % [95% CI]	F1-Score % [95% CI]	AUC % [95% CI]	PRAUC % [95% CI]
RF	80.7 [72.4–87.3]	74.1 [68.2–79.4]	59.6 [51.6–67.3]	89.0 [83.9–92.9]	68.6 [56.3–75.3]	83.0 [78.6–87.5]	90.0 [85.9–93.8]
GBM	72.3 [63.3–80.1]	76.5 [70.8–81.6]	59.3 [50.8–67.4]	85.3 [80.0–89.7]	65.2 [41.7–73.1]	81.1 [76.3–86.0]	87.8 [83.3–93.0]
NB	69.7 [60.7–77.8]	74.1 [68.2–79.4]	56.1 [47.7–64.2]	83.8 [78.3–88.4]	62.2 [36.4–68.9]	77.3 [72.1–82.6]	85.7 [82.2–89.0]
LR	76.5 [67.8–83.8]	74.5 [68.6–79.8]	58.7 [50.5–66.5]	87.0 [81.7–91.2]	66.4 [44.1–69.4]	80.5 [75.6–85.4]	88.7 [84.4–92.7]
SVM	73.9 [65.1–81.6]	76.5 [70.8–81.6]	59.9 [51.5–67.9]	86.1 [80.9–90.4]	66.2 [42.2–69.6]	82.0 [77.3–86.7]	89.3 [85.1–93.1]
KNN	73.1 [64.2–80.8]	72.1 [66.1–77.6]	55.4 [47.3–63.3]	85.0 [79.5–89.5]	63.0 [40.4–70.7]	79.7 [74.9–84.5]	87.0 [83.9–91.0]
ANN	75.6 [66.9–83.0]	74.9 [69.1–80.1]	58.8 [50.6–66.7]	86.6 [81.4–90.9]	66.1 [46.3–74.1]	81.5 [76.8–86.2]	89.1 [84.8–93.6]

*RF-Random Forest; GBM-Gradient Boosting; NB- Naïve Bayes; LR-Logistic Regression; SVM- Support vector machine; KNN-K-nearest neighbors; ANN-Artificial Neural Networks;

95% CI- 95% Confidence Interval; PPV- Positive Predictive Value; NPV- Negative Predictive Value; AUC- Area under the Curve; PRAUC- Precision Recall Area under the Curve
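The threshold-based metrics in Table 2 all derive from a confusion matrix, as the following Python sketch shows. The counts used here are an assumption: the paper does not report them, and they were back-calculated for illustration from the RF row, taking the test set to be roughly 25% of the 1,482 VIDA participants (119 LDD and 251 non-LDD cases).

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Assumed counts, back-calculated from the RF row of Table 2 (not reported).
rf = classification_metrics(tp=96, fp=65, tn=186, fn=23)
for name, value in rf.items():
    print(f"{name}: {value * 100:.1f}%")
```

With these assumed counts the five metrics round to the RF values in Table 2 (80.7%, 74.1%, 59.6%, 89.0% and 68.6%). Note that PPV and NPV depend on outcome prevalence, so they shift when a model is applied to a cohort with fewer LDD cases.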

The receiver operating characteristic (ROC) curves for LDD prediction models are shown in Figure S3. Furthermore, in the prediction of the duration of diarrhea post-enrolment (≥ 7 days), the model performance ranged between 42.3%–78.8%, 45.3%–72.3%, 16.8%–22.1%, 88.3%–92.9%, 26.2%–30.7%, 52.9%–64.4%, and 86.9%–92.0% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively (Table 3). The model performance in the prediction of LDD when no sub-sampling technique was employed is shown in Table S1.

Table 3

Post-enrolment duration (≥ 7 days) prediction models with Over-sampling technique used in the resampling procedure: Model Performance
Algorithm	Post-enrolment Duration Prediction (≥ 7 days)
Algorithm	Sensitivity % [95% CI]	Specificity % [95% CI]	PPV % [95% CI]	NPV % [95% CI]	F1-Score % [95% CI]	AUC % [95% CI]	PRAUC % [95% CI]
RF	48.1 [34.0–62.4]	72.3 [67.1–77.2]	22.1 [14.9–30.9]	89.5 [85.1–93.0]	30.3 [-19.4–56.0]	63.3 [55.9–70.7]	91.6 [88.1–94.4]
GBM	42.3 [28.7–56.8]	71.1 [65.7–76.0]	19.3 [12.5–27.7]	88.3 [83.7–92.0]	26.5 [-25.7–55.3]	61.1 [54.1–68.1]	91.7 [87.7–94.8]
NB	78.8 [65.3–88.9]	45.3 [39.7–50.9]	19.1 [14.0–25.0]	92.9 [87.7–96.4]	30.7 [0.5–45.2]	62.3 [54.6–70.0]	90.6 [87.6–92.9]
LR	61.5 [47.0–74.7]	59.1 [53.5–64.6]	19.8 [13.9–26.7]	90.4 [85.5–94.0]	29.9 [-11.7–39.7]	64.2 [57.1–71.3]	91.8 [88.5–95.2]
SVM	59.6 [45.1–73.0]	62.6 [57.0–67.9]	20.7 [14.5–28.0]	90.5 [85.8–94.0]	30.7 [-12.6–50.1]	62.7 [55.7–69.7]	91.2 [87.3–95.3]
KNN	59.6 [45.1–73.0]	51.6 [45.9–57.2]	16.8 [11.7–22.9]	88.6 [83.2–92.8]	26.2 [-12.9–43.5]	52.9 [44.5–61.3]	86.9 [83.7–89.9]
ANN	67.3 [52.9–79.7]	53.5 [47.8–59.0]	19.1 [13.7–25.6]	90.9 [85.8–94.6]	29.8 [-7.8–50.2]	64.4 [57.2–71.6]	92.0 [88.8–95.3]

*RF-Random Forest; GBM-Gradient Boosting; NB- Naïve Bayes; LR-Logistic Regression; SVM- Support vector machine; KNN-K-nearest neighbors; ANN-Artificial Neural Networks;

95% CI- 95% Confidence Interval; PPV- Positive Predictive Value; NPV- Negative Predictive Value; AUC- Area under the Curve; PRAUC- Precision Recall Area under the Curve

Overall, the Brier scores were low and ranged between 0.17 and 0.21; however, the Spiegelhalter’s p-value showed that the NB and KNN models did not calibrate well in the automated algorithm (p < 0.05) (Table 4). From the explanatory model analysis of the prediction of LDD, the degree of importance varied across models, with the likelihood of developing LDD increasing with pre-enrolment diarrhea days, severity based on the modified Vesikari score, no rotavirus vaccination, normal skin turgor and age. Conversely, the likelihood of progressing to LDD decreased with no vomiting (vomit = 0, number of vomits in last 24 hours = 0, and vomit days = 0) and number of loose stools in last 24 hours (≥ 6) (Fig. 4).

Table 4

Calibration results of Longer Duration Diarrhea (LDD) prediction models.
Algorithm	Brier Score	Spiegelhalter Z-score	Spiegelhalter p-value
RF	0.17	-1.23	0.219
GBM	0.18	-0.95	0.341
NB	0.20	4.82	< 0.0001
LR	0.18	-0.58	0.565
SVM	0.17	-0.92	0.357
KNN	0.19	-3.86	< 0.0001
ANN	0.18	-0.18	0.855

*RF-Random Forest; GBM-Gradient Boosting; NB- Naïve Bayes; LR-Logistic Regression; SVM- Support vector machine; KNN-K-nearest neighbors; ANN-Artificial Neural Networks;

Furthermore, we observed similar patterns in the EMA results of the prediction of ≥ 7 days of diarrhea post-enrolment with the only difference being in pre-enrolment diarrhea days, which decreased the likelihood of developing the outcome in this prediction (Fig. 5).

From the business value evaluation of our champion model (RF), the cumulative gains plot shows that the model is able to select 46% of the target class (LDD) if we select the top-20% cases based on our model. Additionally, from the cumulative lift plot, our champion model is able to identify 2.6 times more LDD cases compared to a random selection if we pick the top-20% observations based on model probability. Lastly, from the cumulative response plot, 72% of observations in the top-20% cases based on model probability belong to the target class (Fig. 6).

Temporal Validation in EFGH data

We observed a decline in model performance on the temporal validation dataset: the RF model achieved 37.7%, 86.0%, 23.2%, 92.5%, 27.9%, 68.4% and 94.4% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively. In the sensitivity analysis including only EFGH enrollees who met the VIDA inclusion criteria, we observed a marginal increase in model performance: the RF model achieved 47.5%, 80.5%, 25.7%, 91.5%, 33.3%, 71.0% and 93.8% for sensitivity, specificity, PPV, NPV, F1-score, AUC and PRAUC, respectively (Fig. 7).

As the field of machine learning advances, its potential to revolutionize healthcare practices for the benefit of both patients and healthcare systems globally is becoming increasingly evident. This study evaluated the feasibility of ML algorithms in the prediction of LDD among pediatric patients presenting with diarrhea. From our evaluation of 7 ML algorithms, the models achieved good performance, with the RF model emerging as the champion model in predicting LDD. However, on the temporal validation data the champion model (RF) did not perform optimally, registering a drop of 12.0% in the model AUC, largely driven by a decrease in sensitivity. Moreover, there was a decline in model performance when predicting the probability of having ≥ 7 diarrhea days post-enrolment, with the ANN model achieving the best performance. These declines in model performance are likely to be attributable to differences in the study populations [38, 39], with our temporal validation set having fewer LDD episodes and including children with less severe diarrhea. However, we cannot exclude the possibility that some degree of model over-fitting may have contributed to the decrease in performance in our validation dataset. Despite these decreases in performance, the models retained good negative predictive values, suggesting they may help clinicians identify which children are unlikely to experience LDD.

Based on our feature selection, the variables identified as predictors of LDD were diarrhea days, Vesikari score, age group, vomit days, breastfeeding, respiratory rate, vomiting, number of vomits in last 24 hours, rotavirus vaccination, rectal straining, skin pinch and number of loose stools in last 24 hours. These variables have been documented as risk factors for LDD in previous studies. Specifically, severity of diarrheal disease was an important predictor of LDD in our results, with the overall severity (modified Vesikari score) as well as individual elements of the severity score (diarrhea days, vomiting, vomit days, number of vomits in last 24 hours, number of loose stools in last 24 hours, skin pinch) being predictive of LDD. Our findings are consistent with those of Lima and Guerrant, who found in their review that episodes of longer duration were more severe at presentation [40, 41]. Severe diarrheal episodes may cause intestinal inflammation, which could lead to prolonged illness and recovery time [42]. Additionally, severe diarrhea, through vomiting and high passage of loose stools, may lead to significant loss of fluids and electrolytes, which may exacerbate the illness and extend its duration as the body needs more time to replenish lost fluids and restore its normal electrolyte balance. A stronger host immune response to etiologic agents of diarrhea may also lead to more severe symptoms and an extended illness duration.

We also observed that younger children were at increased risk of LDD. This finding is similar to findings from previous studies [4, 41, 43] and may be explained by infants and toddlers having had less prior exposure to enteric pathogens than older children; such exposure can induce specific immunity that may reduce diarrheal duration and frequency. Additionally, the immune system develops throughout early childhood, reducing vulnerability to infection by microbial agents as children age. We also observed lack of rotavirus vaccination to be a predictor of LDD. Despite the lower effectiveness of rotavirus vaccine reported in developing countries compared to developed countries [44], lack of rotavirus vaccination exposes children to severe dehydrating diarrhea that could lead to prolonged duration of illness.

Post-enrolment diarrhea duration (≥ 7 days) was much harder to predict than overall LDD, and we observed a decline in model performance with a difference of up to -26.8% reported in model AUCs. A number of potential reasons have been advanced in the literature for poor performance of machine learning algorithms: outliers in the development dataset, class imbalance, overfitting or underfitting, use of a less than ideal metric in assessing performance, and data that do not represent a predictable pattern [45]. Our model development strategy, which involved split sampling, K-fold cross-validation and an over-sampling technique, addresses most of these challenges, leaving data that do not represent a predictable pattern as a possible reason for the sub-optimal results observed. The EMA results in the prediction of post-enrolment duration were similar to those of LDD prediction: severity based on the modified Vesikari score, no rotavirus vaccination, normal skin turgor and age had a positive effect on the prediction of the outcomes, while no vomiting and ≥ 6 loose stools had a negative effect. The primary difference between the two outcomes was the inclusion of pre-enrolment diarrhea days, which would be known to the models and clinicians at presentation.

Approximately 32 and 10 in every 100 children with MSD and MAD, respectively, develop LDD. This burden, coupled with a lack of adequate diagnostic capacity [46] and overburdened healthcare workers [47], underscores the need for alternative strategies such as clinical predictive models for prioritizing resources for high-risk children and ensuring close monitoring and better management, while allowing low-risk children to return home earlier. Our results show the potential of ML algorithms in the rapid identification of at-risk children. These models could be deployed as web-based applications using platforms such as R-shiny or plumber [48, 49], or they could be integrated into electronic medical records systems [50], ensuring they are aligned with clinical workflows. Such simple and flexible deployment methodologies can allow rapid adoption of the model in clinical practice, helping to complement clinician judgement in the timely identification of at-risk patients. However, further work is needed to address the drop in sensitivity during temporal validation, which was probably caused by the baseline differences in severity of disease and a possible shift in study population over time. This also suggests that different criteria for predicting LDD could be used across settings with varying severity. Our findings highlight the need for monitoring and periodic retraining of the model in order to maintain its predictive performance. It may also be possible that strengthening laboratory capacity, allowing inclusion of biological data in the predictive framework, is an alternative pathway to improving the accuracy of clinical judgements.

Despite using a robust strategy in model derivation and validation (internal and temporal), our study has some limitations. Although certain pathogens have been shown to be associated with extended duration of diarrhea, we did not use laboratory results even though they were available in both studies. The rationale for this decision was that culture and molecular diagnostic testing are not routinely done in most health facilities; without study support, these results would be unavailable to users of the tool. Additionally, while this model may help to rapidly identify children at increased risk of LDD, no evidence-based treatment for LDD exists, leaving only empiric treatment such as general supportive care and nutritional rehabilitation as possible therapeutic options whenever zinc fails to reduce diarrheal duration. Future research should focus on assessing the acceptability of ML models and their potential impact on clinical practice and patient outcomes, as well as the cost-effectiveness of deploying such models.

Our study shows the practical utility of machine learning algorithms in the rapid identification of children at increased risk of LDD in our setting. Using our validated RF model in clinical settings to complement clinician judgement could help to prioritize resources for high-risk children and ensure close monitoring and better management, while allowing low-risk children to return home earlier. However, successful implementation and widespread adoption will require further research, collaboration, and ethical diligence. There is a need to explore the model's integration into clinical decision-making in order to translate its outputs into actionable insights and real-world impact.

List of abbreviations

LMICs: Low- and middle-income countries
LDD: Longer duration diarrhea
ML: Machine learning
VIDA: Vaccine Impact on Diarrhea in Africa study
EFGH: Enterics for Global Health
MSD: Moderate-to-severe diarrhea
MAD: Medically attended diarrhea
MICE: Multiple Imputation by Chained Equations
RF: Random forest
GBM: Gradient boosting
NB: Naive Bayes
LR: Logistic regression
SVM: Support vector machine
KNN: K-nearest neighbors
ANN: Artificial neural networks
PPV: Positive predictive value
NPV: Negative predictive value
ROC: Receiver operating characteristic curves
AUC: Area under the curve
PRAUC: Precision-recall area under the curve
EMA: Explanatory model analysis
SHAPs: SHapley Additive exPlanations
KEMRI SERU: Kenya Medical Research Institute Scientific and Ethical Review Unit

Ethics approval and consent to participate

The VIDA protocol was approved by the Institutional Review Board of the University of Maryland School of Medicine, Baltimore, MD, USA (UMB Protocol #: HM-HP-00062472) and the Kenya Medical Research Institute (KEMRI) Scientific and Ethical Review Unit (SERU) (SERU#2996). The EFGH protocol was approved by the KEMRI SERU (SERU#4362). Written informed consent was sought from caregivers in both studies before initiation of study procedures. Additionally, ethical approval for undertaking the current study was sought from the health research ethics committee of the University of South Africa, College of Agricultural Sciences (2023/CAES_HREC/2192).

Consent for publication

Not Applicable

Availability of data and materials

The data used for modelling in this study belong to KEMRI, and restrictions apply to their availability. Data cleaning, pre-processing and model development were done in R version 4.1.2. The R code is available upon request from the corresponding author, Billy Ogwel ([email protected]).

Competing interests

The authors declare no conflict of interest.

Funding

This work was supported by the Bill & Melinda Gates Foundation (grant INV-045988). The funders did not play any role in the study and interpretation of its outcome.

Author Contributions

BO and VM conceived the study. BO, VM, AOA, KDT, PBP and RO contributed to study design and implementation. BO, VM and KDT analyzed and interpreted the data. BO drafted the manuscript, and all authors critically reviewed the manuscript for intellectual content and approved the final manuscript.

Disclosure

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Kenya Medical Research Institute or partnering institutions.

Acknowledgements

We appreciate the contributions and efforts of KEMRI-CGHR staff involved in the data collection, data management, and laboratory testing of samples in the two studies. We also wish to thank the study participants and the ministry of health staff for supporting both studies. Moreover, we would like to acknowledge the use of artificial intelligence (AI) technology for grammar checking and proofreading of this manuscript.

References

1. World Health Organization. Diarrhoeal disease. 2017. Available at: https://www.who.int/news-room/fact-sheets/detail/diarrhoeal-disease. Accessed 19 February 2022.
2. CDC. Global Diarrhea Burden | Global Water, Sanitation and Hygiene | Healthy Water | CDC. 2018. Available at: https://www.cdc.gov/healthywater/global/diarrhea-burden.html. Accessed 25 November 2020.
3. Giannattasio A, Guarino A, Lo Vecchio A. Management of children with prolonged diarrhea. F1000Research. 2016;5. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765715/. Accessed 25 November 2020.
4. Strand TA, Sharma PR, Gjessing HK, et al. Risk Factors for Extended Duration of Acute Diarrhea in Young Children. PLoS ONE. 2012;7. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3348155/. Accessed 27 November 2020.
5. Victora CG, Huttly SR, Fuchs SC, Nobre LC, Barros FC. Deaths due to dysentery, acute and persistent diarrhoea among Brazilian infants. Acta Paediatr. 1992;81:7–11.
6. Bhutta ZA, Nelson EA, Lee WS, et al. Recent advances and evidence gaps in persistent diarrhea. J Pediatr Gastroenterol Nutr. 2008;47:260–5.
7. Moore SR, Lima NL, Soares AM, et al. Prolonged episodes of acute diarrhea reduce growth and increase risk of persistent diarrhea in children. Gastroenterology. 2010;139:1156–64.
8. Alam NH, Ashraf H. Treatment of Infectious Diarrhea in Children. Pediatr Drugs. 2003;5:151–65.
9. Keusch GT, Walker CF, Das JK, Horton S, Habte D. Diarrheal Diseases. In: Black RE, Laxminarayan R, Temmerman M, Walker N, eds. Reproductive, Maternal, Newborn, and Child Health: Disease Control Priorities, Third Edition (Volume 2). Washington (DC): The International Bank for Reconstruction and Development / The World Bank, 2016. Available at: http://www.ncbi.nlm.nih.gov/books/NBK361905/. Accessed 13 January 2023.
10. McNerney R. Diagnostics for Developing Countries. Diagnostics. 2015;5:200–9.
11. Chen JH, Asch SM. Machine Learning and Prediction in Medicine — Beyond the Peak of Inflated Expectations. N Engl J Med. 2017;376:2507–9.
12. Jayatilake SMDAC, Ganegoda GU. Involvement of Machine Learning Tools in Healthcare Decision Making. J Healthc Eng. 2021;2021:e6679512.
13. Nieuwenhuijsen K, Verbeek JHAM, de Boer AGEM, Blonk RWB, van Dijk FJH. Predicting the duration of sickness absence for patients with common mental disorders in occupational health care. Scand J Work Environ Health. 2006;32:67–74.
14. Ebinger J, Wells M, Ouyang D, et al. A Machine Learning Algorithm Predicts Duration of hospitalization in COVID-19 patients. Intell-Based Med. 2021;5:100035.
15. Holm J, Frumento P, Almondo G, et al. Predicting the duration of sickness absence due to knee osteoarthritis: a prognostic model developed in a population-based cohort in Sweden. BMC Musculoskelet Disord. 2021;22:603.
16. Chang K-C, Tseng M-C, Weng H-H, Lin Y-H, Liou C-W, Tan T-Y. Prediction of Length of Stay of First-Ever Ischemic Stroke. Stroke. 2002;33:2670–4.
17. Daghistani TA, Elshawi R, Sakr S, Ahmed AM, Al-Thwayee A, Al-Mallah MH. Predictors of in-hospital length of stay among cardiac patients: A machine learning approach. Int J Cardiol. 2019;288:140–7.
18. Alsinglawi B, Alshari O, Alorjani M, et al. An explainable machine learning framework for lung cancer hospital length of stay prediction. Sci Rep. 2022;12:607.
19. Ling Y, Chen Y, Chirikov V, et al. A Prediction Model for Length of Stay in the Icu Among Septic Patients: A Machine Learning Approach. Value Health. 2018;21:S5.
20. Powell H, Liang Y, Neuzil KM, et al. A Description of the Statistical Methods for the Vaccine Impact on Diarrhea in Africa (VIDA) Study. Clin Infect Dis. 2023;76:S5–11.
21. Schilling KA, Omore R, Derado G, et al. Factors Associated with the Duration of Moderate-to-Severe Diarrhea among Children in Rural Western Kenya Enrolled in the Global Enteric Multicenter Study, 2008–2012. Am J Trop Med Hyg. 2017;97:248–58.
22. Morris SS, Cousens SN, Lanata CF, Kirkwood BR. Diarrhoea—Defining the Episode. Int J Epidemiol. 1994;23:617–23.
23. Niehaus MD, Moore SR, Patrick PD, et al. Early childhood diarrhea is associated with diminished cognitive function 4 to 7 years later in children in a northeast Brazilian shantytown. Am J Trop Med Hyg. 2002. Available at: https://pubmed.ncbi.nlm.nih.gov/12201596/. Accessed 24 October 2020.
24. Platts-Mills JA, Liu J, Rogawski ET, et al. Use of quantitative molecular diagnostic methods to assess the aetiology, burden, and clinical characteristics of diarrhoea in children in low-resource settings: a reanalysis of the MAL-ED cohort study. Lancet Glob Health. 2018;6:e1309–18.
25. van Buuren S, Groothuis-Oudshoorn K, Vink G, et al. Package ‘mice’. 2021. Available at: https://cran.r-project.org/web/packages/mice/mice.pdf. Accessed 31 May 2021.
26. Kursa MB, Rudnicki WR. Package ‘Boruta’. 2020. Available at: https://cran.r-project.org/web/packages/Boruta/Boruta.pdf. Accessed 31 May 2021.
27. Kuhn M, Wing J, et al. caret: Classification and Regression Training. 2022. Available at: https://CRAN.R-project.org/package=caret. Accessed 10 February 2023.
28. Nguyen QH, Ly H-B, Ho LS, et al. Influence of Data Splitting on Performance of Machine Learning Models in Prediction of Shear Strength of Soil. Math Probl Eng. 2021;2021:e4832864.
29. Irizarry RA. Chapter 29 Cross validation | Introduction to Data Science. 2019. Available at: https://rafalab.github.io/dsbook/cross-validation.html. Accessed 10 February 2021.
30. Kassambara A. Cross-Validation Essentials in R - Articles - STHDA. 2018. Available at: http://www.sthda.com/english/articles/38-regression-model-validation/157-cross-validation-essentials-in-r/. Accessed 10 February 2021.
31. Kuhn M. 11 Subsampling For Class Imbalances | The caret Package. 2019. Available at: https://topepo.github.io/caret/subsampling-for-class-imbalances.html. Accessed 10 February 2021.
32. Saito T, Rehmsmeier M. precrec: Calculate Accurate Precision-Recall and ROC (Receiver Operator Characteristics) Curves. 2023. Available at: https://CRAN.R-project.org/package=precrec. Accessed 10 February 2023.
33. Huang Y, Li W, Macheret F, Gabriel RA, Ohno-Machado L. A tutorial on calibration measurements and calibration models for clinical prediction models. J Am Med Inf Assoc. 2020;27:621–33.
34. Biecek P, Maksymiuk S, Baniecki H. DALEX: moDel Agnostic Language for Exploration and eXplanation. 2023. Available at: https://CRAN.R-project.org/package=DALEX. Accessed 10 February 2023.
35. Cowley LE, Farewell DM, Maguire S, Kemp AM. Methodological standards for the development and evaluation of clinical prediction rules: a review of the literature. Diagn Progn Res. 2019;3:16.
36. Nagelkerke J. modelplotr: Plots to evaluate the business value of predictive models. 2020. Available at: https://cran.r-project.org/web/packages/modelplotr/vignettes/modelplotr.html. Accessed 19 November 2022.
37. R Core Team. R: The R Project for Statistical Computing. 2021. Available at: https://www.r-project.org/. Accessed 3 December 2021.
38. Rahmani K, Thapa R, Tsou P, et al. Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction. Int J Med Inf. 2023;173:104930.
39. Bayram F, Ahmed BS, Kassler A. From concept drift to model degradation: An overview on performance-aware drift detectors. Knowl-Based Syst. 2022;245:108632.
40. Lima AAM, Guerrant RL. Persistent Diarrhea in Children: Epidemiology, Risk Factors, Pathophysiology, Nutritional Impact, and Management. Epidemiol Rev. 1992;14:222–42.
41. Patel AB, Ovung R, Badhoniya NB, Dibley MJ. Risk Factors for Predicting Diarrheal Duration and Morbidity in Children with Acute Diarrhea. Indian J Pediatr. 2012;79:472–7.
42. Lo Vecchio A, Conelli ML, Guarino A. Infections and Chronic Diarrhea in Children. Pediatr Infect Dis J. 2021;40:e255.
43. Ochoa TJ, Salazar-Lindo E, Cleary TG. Management of children with infection-associated persistent diarrhea. Semin Pediatr Infect Dis. 2004;15:229–36.
44. Khagayi S, Omore R, Otieno GP, et al. Effectiveness of Monovalent Rotavirus Vaccine Against Hospitalization With Acute Rotavirus Gastroenteritis in Kenyan Children. Clin Infect Dis. 2020;70:2298–305.
45. Hundley D. Five Reasons Your Machine Learning Model is Performing Poorly. Medium. 2019. Available at: https://dkhundley.medium.com/five-reasons-your-machine-learning-model-is-performing-poorly-f60287a24023. Accessed 27 July 2023.
46. Bahati F, Mcknight J, Swaleh F, et al. Reporting of diagnostic and laboratory tests by general hospitals as an indication of access to diagnostic laboratory services in Kenya. PLoS ONE. 2022;17:e0266667.
47. Kokonya D. Burnout Syndrome among Medical Workers at Kenyatta National Hospital (KNH), Nairobi, Kenya. Afr J Psychiatry. 2014;17.
48. Eddington HS, Trickey AW, Shah V, Harris AHS. Tutorial: implementing and visualizing machine learning (ML) clinical prediction models into web-accessible calculators using Shiny R. Ann Transl Med. 2022;10:1414.
49. Murphree DH, Quest DJ, Allen RM, Ngufor C, Storlie CB. Deploying Predictive Models In A Healthcare Environment - An Open Source Approach. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2018:6112–6116. Available at: https://ieeexplore.ieee.org/document/8513689. Accessed 6 October 2023.
50. Khalilia M, Choi M, Henderson A, Iyengar S, Braunstein M, Sun J. Clinical Predictive Modeling Development and Deployment through FHIR Web Services. AMIA Annu Symp Proc. 2015;2015:717–726.

No competing interests reported.

SupplementalMaterials12Mar2024.pdf

Editorial decision: Revision requested
16 Jul, 2024
Reviews received at journal
13 Jun, 2024
Reviewers agreed at journal
04 Jun, 2024
Reviews received at journal
02 Jun, 2024
Reviewers agreed at journal
02 Jun, 2024
Reviewers invited by journal
30 May, 2024
Editor invited by journal
13 Mar, 2024
Submission checks completed at journal
13 Mar, 2024
Editor assigned by journal
13 Mar, 2024
First submitted to journal
08 Mar, 2024