Bayesian Network Models with Decision Tree Analysis for Management of Childhood Malaria in Malawi

doi:10.21203/rs.3.rs-60209/v1


Research article


This work is licensed under a CC BY 4.0 License

Journal Publication: published 17 May, 2021 in BMC Medical Informatics and Decision Making. This is an older preprint version.

Background: Malaria is a major cause of death in children under five years old in low- and middle-income countries such as Malawi. Accurate diagnosis and management of malaria can help reduce the global burden of childhood morbidity and mortality. Trained healthcare workers in rural health centers manage malaria with limited supplies of malarial diagnostic tests and drugs for treatment. A clinical decision support system that integrates predictive models to provide an accurate prediction of malaria based on clinical features could aid healthcare workers in the judicious use of testing and treatment. We developed Bayesian network (BN) models to predict the probability of malaria from clinical features and an illustrative decision tree to model the decision to use or not use a malaria rapid diagnostic test (mRDT).

Methods: We developed two BN models from data that were collected in a national survey of outpatient encounters of children in Malawi. The mRDT result was used as the target diagnosis. The first BN model was created manually with expert knowledge, and the second model was derived using an automated method followed by modifications guided by expert knowledge. The performance of the BN models was compared to that of other statistical models on a range of performance metrics. We developed a decision tree that integrates predictions from these predictive models with the costs of mRDT and a course of recommended treatment.

Results: Compared to the logistic regression and random forest models, the BN models had similar accuracy (63-64%) but higher sensitivity, at the cost of lower specificity, at the default threshold. Sensitivity analysis of the decision tree showed that at low (below 0.04) and high (above 0.4) probabilities of malaria in a child, the preferred decision that minimizes expected costs is not to perform mRDT.

Conclusion: In resource-constrained settings, judicious use of mRDT is important. Predictive models in combination with decision analysis can provide personalized guidance on when to use mRDT in the management of childhood malaria. BN models can be efficiently derived from data to support such clinical decision making.

Medical Informatics

Bayesian network model

decision tree

clinical decision support

childhood malaria

Malawi

Malaria is a mosquito-borne infectious disease that is a major cause of death in children under five years old in low- and middle-income countries (LMICs). Accurate diagnosis and management of malaria can help reduce the burden of childhood morbidity and mortality in LMICs. In Malawi, the overall prevalence of malaria in children under five is 24%, with prevalence as high as 48% in rural areas (1). Among the several malarial parasites, Plasmodium falciparum causes 98% of all malarial infections and all instances of severe illness and death in Malawi (1). Childhood malaria in Malawi is managed at health posts and health centers that serve as primary healthcare facilities, district hospitals that serve as secondary healthcare facilities, and central hospitals that serve as tertiary centers of care. Common childhood illnesses, such as malaria, are managed mainly at health posts by community-based healthcare workers known as Health Surveillance Assistants (HSAs), and at health centers that are staffed with HSAs and medical assistants. For the majority of the population, health posts and health centers in rural areas serve as the primary sites of care (1).

Historically, presumptive treatment of fever with anti-malarial drugs was common in LMICs. The current standard for the management of childhood malaria is defined in a set of clinical guidelines developed by the World Health Organization (WHO) (2). Under these guidelines, a child presenting with fever and suspected of having malaria should have the diagnosis confirmed from a drop of blood using either microscopic examination or a malaria rapid diagnostic test (mRDT) that rapidly detects antigens derived from malarial parasites. The mRDT is a useful and less expensive alternative to microscopy. In 2010, Malawi adopted the WHO guidelines as national policy and instituted the use of mRDT for suspected malaria as standard practice. The WHO-recommended treatment for malaria caused by P. falciparum is artemisinin-based combination therapy (ACT), which combines two active ingredients with different mechanisms of action. Malawi extensively uses ACT to treat childhood malaria. Malawi has made significant efforts to provide community-based care for childhood malaria by adopting mRDT and ACT and distributing them nationally, and these efforts have led to a decline in the disease burden (3). However, a number of challenges continue to hinder effective management of malaria in rural Malawi.

Health posts in rural Malawi are characterized by limited resources, a lack of diagnostic testing facilities, and a shortage of clinicians (4). In a study conducted in Malawi in 2017, Klootwijk et al. (5) reported a lack of microscopy facilities in the rural health centers they surveyed. mRDTs and HIV tests are typically the only diagnostic tests available at health posts and rural health centers (6). Even so, mRDTs and ACT drugs are in limited supply in rural areas, especially during the malaria season. The Malawi Service Provision Assessment (SPA) survey reported that mRDTs were available in only 85% of facilities, with the highest availability in hospitals, which are located in urban centers (95%), and the lowest in health posts, which are located in rural areas (19%) (6). Common reasons for stockouts include late and inaccurate reporting of supplies, drug pilferage, and overprescribing of anti-malarial and antibiotic drugs (4,7). Because HSAs are encouraged to adhere to the WHO guidelines, the unavailability of mRDTs leads to one of three common responses at health posts: the child is referred to a secondary health center or a tertiary hospital; the HSA treats the child presumptively with ACT drugs if the child is febrile and the drugs are in stock; or, in the worst case, the health post stays closed while mRDTs are out of stock. Often, the child's guardians cannot arrange transportation to the referral site, and the child goes untreated (5). When available, mRDTs and ACT drugs are provided free of charge to patients at all healthcare facilities in Malawi. However, data on drug affordability show that a single course of treatment is unaffordable for a large part of the population (8). This becomes a problem when the drugs are unavailable at healthcare facilities and guardians are advised to purchase ACT drugs on the market.

Given the high volume of patients and increasing non-adherence to traditional paper-based management guidelines (9), it is imperative to support healthcare workers in accurate diagnosis and treatment with sustainable resource use.

Technological advances can help tackle some of these challenges. The promise of artificial intelligence and statistical models for healthcare in LMICs has only recently begun to be realized (10). While clinical decision support systems that use statistical models are available in high-income countries, transferring these technologies to LMICs is impractical because of the unique challenges in resource-constrained countries. The distinct needs, diseases, demographics, and standards of care in LMICs call for a different approach to personalized and affordable medicine, using tools specifically designed for these settings (11). Prior attempts to develop clinical decision support in Malawi have focused on implementing electronic versions of existing guidelines rather than personalized, evidence-based algorithms (12,13). These applications provide little diagnostic support to healthcare workers.

A recent review of electronic clinical decision algorithms (eCDAs) in LMICs identifies the lack of effective, integrated diagnostic tools as a contributing factor to childhood morbidity and mortality (13). In addition to better diagnosis of diseases and support for rational drug use, the review identifies components of an eCDA that are crucial to closing gaps in primary care management systems in low-resource countries. These include algorithms for specific regions, openly available evidence-based content, automated data collection for monitoring and evaluation, and syndromic surveillance systems (13). One promising type of model for diagnosing diseases from country- or region-specific data is the Bayesian network (BN). A BN probabilistically models associations between variables, such as a disease and its clinical features, and can be used to predict the presence of the disease (14). BN models have been developed to aid diagnosis and risk assessment in many diseases (15–17), and a wide range of learning algorithms are available that automatically learn BN models from data.

Our long-term goal is to implement a clinical decision support system for childhood malaria in Malawi to aid in its management, especially where mRDTs are unavailable or in limited supply. In this study, we derived two BN models to predict childhood malaria from data obtained in Malawi and compared them to other commonly used statistical models. Further, we provide an illustrative decision analysis that integrates predictions from our BN models with the costs of the available management alternatives.

We first describe the Malawi Service Provision Assessment (SPA) (6) dataset, followed by the methods for the development and evaluation of BN models and the comparison of statistical models. Finally, we describe the details of the decision tree that we developed for decision analysis.

2.1 The SPA Dataset

The SPA survey was conducted between July 2013 and February 2014 by the Ministry of Health of Malawi, with support from the Demographic and Health Surveys (DHS) Program, to assess the status of health facilities and the quality of healthcare in Malawi. Data were collected from 1,060 facilities, comprising 97 hospitals, 489 health centers, 55 dispensaries, 369 clinics, and 28 health posts across three major regions of the country, and are representative at the national level by facility type and managing authority (6). These data have been used previously in studies to assess the quality of care and treatment for pneumonia in Malawi (18) and are publicly available from the DHS Program (19).

The survey dataset contains observations on 3,441 encounters with children aged 2 to 59 months presenting to an outpatient healthcare facility. For each encounter, the data contain demographic details (age, date of birth, and sex), clinical features (duration of illness, fever, diarrhea, anemia, etc.), the mRDT result (if available), and the provider's diagnosis.

2.2 Data Preprocessing

We assumed the result of the mRDT that is recorded in the dataset to be the gold standard malaria diagnosis. The mRDT has high sensitivity (0.997) and specificity (0.995) for the diagnosis of malaria (20) and is recommended for confirmation of disease by both the WHO and Malawi’s malaria management guidelines (21). Thus, if the mRDT result was positive, we considered malaria to be present, and if the test result was negative, we considered malaria to be absent. This variable is referred to as ‘malaria’ or ‘malaria diagnosis’ in the following sections.

Ideally, every encounter in the dataset would have an mRDT result, but this is not the case: of the 3,441 encounters, an mRDT result was recorded for only 1,139, and we restricted our analyses to these encounters. Table 1 shows the variables that we selected for modeling. These variables were chosen based on their inclusion in childhood illness management guidelines (2) and on expert domain knowledge. Two of the variables (age and duration of illness) are continuous, and the remaining variables are categorical. We discretized the continuous variables because the BN algorithms we used are designed for discrete variables. We discretized age into the categories < 2, 2-12, 13-24, 25-60, and > 60 months, reflecting the varying epidemiology of the disease in children of different ages, and duration of illness into the day ranges shown in Table 1. Missing values of the predictor variables were denoted with a special value, 'Unknown'. The target variable, malaria, is binary, taking the values 'Present' or 'Absent' to represent a positive or negative mRDT result.
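The discretization step can be sketched in pandas as below. This is a minimal illustration, assuming ages are recorded in whole months and durations in whole days; the function `discretize` and the constant names are hypothetical, not taken from the study.

```python
import numpy as np
import pandas as pd

# Age bins in months (Table 1): <2, 2-12, 13-24, 25-60, >60.
# right=False makes each interval closed on the left, e.g. [2, 13)
# holds 2..12 months when ages are whole months.
AGE_EDGES = [-np.inf, 2, 13, 25, 61, np.inf]
AGE_LABELS = ["Less than 2 months", "2-12 months", "13-24 months",
              "25-60 months", "Over 60 months"]

# Duration bins in days (Table 1): <=2, 3-15, 16-30, >30 (whole days assumed).
DUR_EDGES = [-np.inf, 3, 16, 31, np.inf]
DUR_LABELS = ["Less than or equal to 2 days", "3-15 days", "16-30 days",
              "Over 30 days"]


def discretize(values: pd.Series, edges, labels) -> pd.Series:
    """Bin a numeric series and map missing values to the 'Unknown' state."""
    binned = pd.cut(values, bins=edges, labels=labels, right=False)
    return binned.cat.add_categories("Unknown").fillna("Unknown")


ages = pd.Series([1, 2, 12, 13, 24, 25, 60, 61, np.nan])
print(discretize(ages, AGE_EDGES, AGE_LABELS).tolist())
```

Keeping 'Unknown' as an ordinary category means the BN treats missingness as an observable state rather than dropping the encounter.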

Table 1: Variables and values that were included in the models.

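Variable	Values
Target variable
Malaria	Present, Absent
Predictor variables
Age	Less than 2 months, 2-12 months, 13-24 months, 25-60 months, Over 60 months, Unknown
Duration of Illness	Less than or equal to 2 days, 3-15 days, 16-30 days, Over 30 days, Unknown
Conscious	Yes, No, Unknown
Anemia	Present, Absent, Unknown
Convulsions	Present, Absent, Unknown
Cough or Difficulty Breathing (CDB)	Present, Absent, Unknown
Diarrhea	Present, Absent, Unknown
History of Fever	Present, Absent, Unknown
Fever (temperature > 37.5 °C)	Present, Absent, Unknown
Lethargy	Present, Absent, Unknown
Malnutrition	Present, Absent, Unknown
Unable to Feed	Present, Absent, Unknown
Vomiting	Present, Absent, Unknown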

We randomly split the dataset into 80% training dataset and 20% test dataset, stratified on the target variable, which is the malaria diagnosis. The models were developed using the training dataset and were evaluated on the test dataset.
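A stratified 80/20 split of this kind can be reproduced with scikit-learn's `train_test_split`, whose `stratify` argument preserves the class proportions in both parts. The sketch below uses synthetic stand-in data that matches only the reported size (1,139 encounters) and number of positives (415); the random seed is an arbitrary assumption, so it will not recreate the study's actual split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: only the size and class balance mirror the
# SPA subset; the predictor values are random codes.
rng = np.random.default_rng(0)
n = 1139
y = np.array([1] * 415 + [0] * 724)    # 1 = mRDT positive (malaria present)
X = rng.integers(0, 3, size=(n, 13))   # 13 coded predictor variables

# 80/20 split, stratified on the target so both parts keep ~36.4% prevalence.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(len(y_train), len(y_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```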

2.3 Bayesian Network Models

A BN model is a probabilistic graphical model that is specified by a graphical structure and a set of numerical parameters. The graphical structure consists of nodes representing variables and arcs denoting associations between pairs of variables. In this paper, we use the terms node and variable interchangeably. Each node in the network has an accompanying conditional probability table that constitutes the parameters of the node. A BN model can be used as a classifier in which the model provides the posterior probability distribution of a target node (such as a disease diagnosis) given the values of all other nodes (such as clinical features) in the network (22). Several approaches are available to construct a BN model. In the first approach, both the structure and the parameters are specified manually using expert knowledge. In the second approach, the structure is specified manually and the parameters are estimated from data. In the third approach, both the structure and the parameters are automatically estimated from data; a variety of algorithms have been developed for this purpose. In this study, we used the second and third approaches to develop two BN models for prediction of malaria using the GeNIe Modeler tool (33). For the first model (manual model), we manually specified the structure based on domain knowledge, and for the second model (hybrid model), we automatically derived the structure and subsequently modified it using expert knowledge. Using GeNIe Modeler, we computed the parameters of each node from the dataset by estimating the conditional probability distribution of the node given the values of its parent nodes (14).
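Estimating a node's parameters from complete discrete data amounts to counting how often each node state occurs under each configuration of its parents and normalizing. The sketch below shows this maximum-likelihood estimate with optional pseudo-counts; `estimate_cpt` is a hypothetical helper, and GeNIe Modeler's own learning procedure may differ (for example, in its use of priors). Because missing values are coded as an explicit 'Unknown' state, no special handling of missing data is needed here.

```python
import pandas as pd


def estimate_cpt(df, node, parents, alpha=0.0):
    """Estimate P(node | parents) from discrete records by counting.

    alpha = 0 gives the maximum-likelihood (relative-frequency) estimate;
    alpha > 0 adds that many pseudo-counts to every cell. Rows are the
    parent configurations observed in df; columns are the node's states.
    """
    states = sorted(df[node].unique())
    if not parents:
        counts = df[node].value_counts().reindex(states, fill_value=0) + alpha
        return (counts / counts.sum()).to_frame().T
    counts = (
        df.groupby(parents)[node]
        .value_counts()
        .unstack(fill_value=0)
        .reindex(columns=states, fill_value=0)
        + alpha
    )
    return counts.div(counts.sum(axis=1), axis=0)


toy = pd.DataFrame({
    "malaria": ["Present", "Present", "Absent", "Absent", "Absent"],
    "fever": ["Present", "Absent", "Absent", "Absent", "Present"],
})
print(estimate_cpt(toy, "malaria", []))         # prior P(malaria)
print(estimate_cpt(toy, "fever", ["malaria"]))  # P(fever | malaria)
```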

Using GeNIe Modeler, we developed manual and hybrid BN models to predict malaria from the predictor variables listed in Table 1. For the manual model, based on domain knowledge from experts and the literature, we modeled the clinical features as conditionally independent of each other given malaria. Specifically, each clinical feature that was a symptom or sign was represented as a child of the malaria node, creating a Naïve Bayes structure in which each sign or symptom node has a single incoming arc from the disease node and there are no arcs among these nodes. Each feature that was not a sign or symptom was represented as a parent of the malaria node. For example, a sign such as convulsions was represented as a child of malaria, with the arc directed from malaria to convulsions; this encodes the clinical knowledge that malaria can cause convulsions. In contrast, age was represented as a parent of malaria, with the arc directed from age to malaria; this encodes the knowledge that younger children may be more vulnerable to contracting malaria than older children. We used GeNIe Modeler to estimate the parameters of the manual model as conditional probabilities from the training dataset.
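With this structure, the posterior probability of malaria factorizes into the malaria CPT (conditioned on age and duration of illness) times one likelihood term per observed sign or symptom. A minimal sketch of that calculation follows; `posterior_malaria` and all probabilities in the example are hypothetical, chosen only to show the arithmetic, and an 'Unknown' value is treated as just another state with its own likelihood.

```python
import math


def posterior_malaria(prior_given_parents, likelihoods):
    """P(malaria = Present | age, duration, observed features).

    prior_given_parents: P(Present | age, duration), read from the malaria CPT.
    likelihoods: one (P(value | Present), P(value | Absent)) pair per
        observed sign or symptom, read from that feature's CPT.
    Features are conditionally independent given malaria, so their
    likelihoods multiply; sums of logs avoid underflow with many features.
    """
    log_pos = math.log(prior_given_parents)
    log_neg = math.log(1.0 - prior_given_parents)
    for p_given_pos, p_given_neg in likelihoods:
        log_pos += math.log(p_given_pos)
        log_neg += math.log(p_given_neg)
    # Normalise the two unnormalised scores (subtract the max for stability).
    top = max(log_pos, log_neg)
    pos, neg = math.exp(log_pos - top), math.exp(log_neg - top)
    return pos / (pos + neg)


# Hypothetical CPT entries: P(Present | age, duration) = 0.30, fever
# observed (0.40 vs 0.20) and vomiting observed (0.35 vs 0.27).
p = posterior_malaria(0.30, [(0.40, 0.20), (0.35, 0.27)])
print(round(p, 3))
```

With these made-up numbers, the probability of malaria rises from 0.30 to about 0.53 (0.042 / 0.0798) after observing the two features.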

While the manual model is simple and interpretable, its conditional independence assumption is overly simplistic. Hence, we developed a hybrid model in which we first automatically derived an Augmented Naïve Bayes (ANB) model using GeNIe Modeler and then modified it based on expert knowledge. The ANB algorithm in GeNIe Modeler efficiently learns an ANB model (described below) from data. An expert then evaluated each arc in the ANB model for clinical plausibility. We also used GeNIe Modeler to estimate the parameters of the hybrid model as conditional probabilities from the training dataset.

The ANB model (14) extends the Naïve Bayes model by allowing arcs among the child nodes. For example, in a Naïve Bayes model, diarrhea and convulsions are linked only through their incoming arcs from the malaria node, whereas in an ANB model an additional arc may be included from diarrhea to convulsions, which implies that convulsions are associated with both malaria and diarrhea.
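Written out, the difference between the two structures is a single factor. Assuming, for illustration, the manual model's parents of malaria (age a and duration d) plus one added arc from diarrhea to convulsions, with m the malaria state and f_i the observed feature values:

```latex
\begin{aligned}
\text{Naive Bayes (manual model):}\quad
  & P(m \mid a, d, f_1,\dots,f_k) \propto P(m \mid a, d)\prod_{i=1}^{k} P(f_i \mid m) \\
\text{with the arc diarrhea} \to \text{convulsions:}\quad
  & P(m \mid a, d, f_1,\dots,f_k) \propto P(m \mid a, d)\,
    P(f_{\text{conv}} \mid m, f_{\text{diar}})\prod_{i \neq \text{conv}} P(f_i \mid m)
\end{aligned}
```

This is how the ANB relaxes the independence assumption: the convulsions factor is conditioned on diarrhea as well as malaria, so correlated evidence from the two features is not simply multiplied as if it were independent.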

2.4 Comparison Models

For comparison with the diagnostic predictions of the BN models, we derived two commonly used statistical models, logistic regression and random forest, to predict malaria. Instead of discretizing the continuous variables (age and duration of illness), we scaled them to unit variance, and we imputed missing values of each such variable with the mean of its non-missing values. We treated the categorical variables in the same way as for the BN models. We trained the models using the scikit-learn library (23) in Python.
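A sketch of this preprocessing and of the two comparison models in scikit-learn follows. It assumes one-hot encoding for the categorical variables (with 'Unknown' kept as a level) and default hyperparameters apart from those shown, since neither is specified above; the column names and the synthetic data are hypothetical. Note that `StandardScaler` centers the data as well as scaling it to unit variance.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the survey's own variable names are not used.
CONTINUOUS = ["age_months", "duration_days"]
CATEGORICAL = ["fever", "vomiting"]  # values: Present / Absent / Unknown


def make_model(classifier):
    """Build a fresh preprocessing + classifier pipeline."""
    preprocess = ColumnTransformer([
        # Mean-impute missing values, then standardise to unit variance.
        ("num", make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()),
         CONTINUOUS),
        # One-hot encode categories, keeping 'Unknown' as its own level.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("prep", preprocess), ("clf", classifier)])


# Small synthetic dataset purely to show the pipeline runs end to end.
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "age_months": rng.integers(2, 60, n).astype(float),
    "duration_days": rng.integers(0, 20, n).astype(float),
    "fever": rng.choice(["Present", "Absent", "Unknown"], n),
    "vomiting": rng.choice(["Present", "Absent", "Unknown"], n),
})
X.loc[X.index[::10], "age_months"] = np.nan  # some missing ages to impute
y = rng.integers(0, 2, n)

log_reg = make_model(LogisticRegression(max_iter=1000)).fit(X, y)
forest = make_model(RandomForestClassifier(n_estimators=100, random_state=0)).fit(X, y)
p_lr = log_reg.predict_proba(X)[:, 1]  # predicted P(malaria present)
p_rf = forest.predict_proba(X)[:, 1]
```

Building a fresh `ColumnTransformer` inside `make_model` avoids sharing one fitted preprocessing object between the two pipelines.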

2.5 Evaluation

We applied the BN, logistic regression, and random forest models to predict the probability of malaria in the test dataset. Using these predictions, we computed the area under the Receiver Operating Characteristic curve (AUC) and the Brier score for each model. The AUC measures the diagnostic discrimination of a model: an AUC of 1 indicates perfect discrimination, and an AUC of 0.5 indicates discrimination no better than chance. The Brier score is the mean squared difference between the predicted probability and the observed outcome; it reflects both calibration and discrimination and ranges from 0 to 1, with 0 indicating perfect predictive performance. We converted each probability into a binary prediction of malaria present or absent using a probability threshold of 0.5. From the binary predictions, we computed the accuracy, sensitivity, specificity, precision, and F1 score of each model.
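These metrics map directly onto scikit-learn functions; specificity is the recall of the negative class. The helper below is a hypothetical sketch, and it assumes that a probability exactly equal to 0.5 counts as a positive prediction, a tie convention the text does not specify.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, brier_score_loss, f1_score,
                             precision_score, recall_score, roc_auc_score)


def evaluate(y_true, p_pred, threshold=0.5):
    """Probability-based and threshold-based metrics for a binary classifier.

    y_true: 1 = malaria present (mRDT positive), 0 = absent.
    p_pred: predicted probability of malaria.
    """
    y_true = np.asarray(y_true)
    p_pred = np.asarray(p_pred, dtype=float)
    y_hat = (p_pred >= threshold).astype(int)
    return {
        "AUC": roc_auc_score(y_true, p_pred),
        "Brier": brier_score_loss(y_true, p_pred),
        "Accuracy": accuracy_score(y_true, y_hat),
        "Sensitivity": recall_score(y_true, y_hat),               # TP / (TP + FN)
        "Specificity": recall_score(y_true, y_hat, pos_label=0),  # TN / (TN + FP)
        "Precision": precision_score(y_true, y_hat, zero_division=0),
        "F1": f1_score(y_true, y_hat, zero_division=0),
    }


metrics = evaluate([1, 1, 1, 0, 0, 0, 0, 0],
                   [0.9, 0.6, 0.3, 0.55, 0.2, 0.1, 0.4, 0.35])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```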

2.6 Decision Tree

To conserve mRDTs in a resource-constrained setting such as a rural health post in Malawi, we developed a decision tree that compares the consequences of using and not using the mRDT. The decision tree integrates the probability of malaria with the costs of testing and treatment and identifies, for a specific patient, the optimal decision (to use the mRDT or not) relative to a given set of probabilities and utilities.

Figure 1: Illustrative decision tree that integrates predictions from BN models with example costs. Malaria+ and malaria- represent malaria present and absent, respectively; F refers to the clinical features of the patient, and C is the associated cost.

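The decision tree that we developed is shown in Figure 1 and uses a common approach to modeling sequential decisions (24). It contains two decision nodes, 'mRDT?' and 'Treat?', which represent the decisions to test and to treat; at each node, the preferred option is the one with the lower expected cost of testing and treatment. We calculated the expected cost of the [mRDT?=no] branch from the probability of malaria given by the BN model and the costs associated with each decision as

$$EC(\text{mRDT?}=\text{no}) = \min\big\{ P(\text{malaria+}\mid F)\,C_{\text{ACT}} + P(\text{malaria-}\mid F)\,C_{\text{ACT}},\; P(\text{malaria+}\mid F)\,C_{\text{untreated}} + P(\text{malaria-}\mid F)\cdot 0 \big\}$$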
In Figure 1, P(malaria+|F) is the probability from the BN model that malaria is present given the clinical features of the patient, and P(malaria-|F) is the probability that malaria is absent given those features. The costs (shown in the hexagons) are from the perspective of a payer of healthcare costs, such as the government of Malawi, and depend on the resources used, including mRDTs and ACT drugs. We used the following costs based on the literature: an mRDT costs US $0.60 (8), and a course of ACT for uncomplicated malaria costs US $1.00 (25). We estimated the cost of mistakenly not treating a child with malaria at US $16.60, based on the assumption that the cost may rise to 10 times the cost of mRDT and ACT drugs for uncomplicated malaria if the untreated disease becomes severe and results in hospital admission.

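We computed the expected cost of the [mRDT?=yes] branch as

$$EC(\text{mRDT?}=\text{yes}) = C_{\text{mRDT}} + P(\text{mRDT+}\mid F)\,EC(\text{mRDT+}) + P(\text{mRDT-}\mid F)\,EC(\text{mRDT-})$$

where

$$EC(\text{mRDT+}) = \min\big\{ C_{\text{ACT}},\; P(\text{malaria+}\mid \text{mRDT+}, F)\,C_{\text{untreated}} \big\}$$

and

$$EC(\text{mRDT-}) = \min\big\{ C_{\text{ACT}},\; [1 - P(\text{malaria-}\mid \text{mRDT-}, F)]\,C_{\text{untreated}} \big\}$$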
In the above equations, P(malaria+|mRDT+, F) is the probability that malaria is present given a positive mRDT result and the clinical features of the patient, and P(malaria-|mRDT-, F) is the probability that malaria is absent given a negative mRDT result and the clinical features. P(mRDT+|F) and P(mRDT-|F) are the probabilities that the mRDT is positive or negative, respectively, given the clinical features; they are obtained from the BN model and assumed to be equal to P(malaria+|F) and P(malaria-|F), respectively. This rests on our assumption that a positive mRDT result is equivalent to the child having malaria.
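How reasonable the assumption that a positive mRDT means malaria is can be checked with Bayes' theorem, using the mRDT sensitivity (0.997) and specificity (0.995) quoted in Section 2.2 and treating P(malaria+|F) as the pre-test probability. The function name below is illustrative.

```python
def post_test_probabilities(pretest, sensitivity=0.997, specificity=0.995):
    """Return (P(malaria+ | mRDT+), P(malaria- | mRDT-)) via Bayes' theorem."""
    p_test_pos = sensitivity * pretest + (1 - specificity) * (1 - pretest)
    ppv = sensitivity * pretest / p_test_pos
    npv = specificity * (1 - pretest) / (1 - p_test_pos)
    return ppv, npv


for pretest in (0.04, 0.364, 0.9):
    ppv, npv = post_test_probabilities(pretest)
    print(f"pre-test {pretest:.3f}: P(malaria+|mRDT+) = {ppv:.4f}, "
          f"P(malaria-|mRDT-) = {npv:.4f}")
```

At the dataset prevalence of 0.364, both post-test probabilities exceed 0.99; at a low pre-test probability such as 0.04, P(malaria+|mRDT+) falls to about 0.89, so the equivalence is only approximate near the lower end of the range explored in the sensitivity analysis.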

We also performed a sensitivity analysis to determine how the preferred strategy depends on the probability of malaria. We varied the probability of malaria, P(malaria+|F) (equivalently, P(mRDT+|F)), from 0 to 1 and calculated the expected costs of using and not using the mRDT to determine the probability ranges in which a treatment decision can be made from clinical features alone, without performing an mRDT.
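The sensitivity analysis can be sketched as a sweep over P(malaria+|F). The code below rolls back the decision tree under stated assumptions: the costs given above, a perfect mRDT (so P(malaria+|mRDT+, F) = 1 and P(malaria-|mRDT-, F) = 1), and a cost of zero for a correctly untreated child without malaria. It is an illustrative reconstruction, not the authors' code.

```python
import numpy as np

# Costs from the text (US $), used as illustrative inputs.
C_MRDT = 0.60        # one malaria rapid diagnostic test
C_ACT = 1.00         # one course of ACT for uncomplicated malaria
C_UNTREATED = 16.60  # missed malaria that becomes severe and needs admission


def cost_no_test(p):
    """Expected cost of [mRDT?=no] with the 'Treat?' decision rolled back."""
    treat = p * C_ACT + (1 - p) * C_ACT          # ACT is given either way
    no_treat = p * C_UNTREATED + (1 - p) * 0.0   # cost only if malaria present
    return np.minimum(treat, no_treat)


def cost_test(p):
    """Expected cost of [mRDT?=yes], assuming mRDT+ <=> malaria+ (perfect test)."""
    after_pos = min(C_ACT, 1.0 * C_UNTREATED)    # P(malaria+ | mRDT+, F) = 1
    after_neg = min(C_ACT, 0.0 * C_UNTREATED)    # P(malaria- | mRDT-, F) = 1
    return C_MRDT + p * after_pos + (1 - p) * after_neg


# Sweep P(malaria+|F) over [0, 1], as in the sensitivity analysis.
p = np.linspace(0.0, 1.0, 10001)
test_cheaper = cost_test(p) < cost_no_test(p)
lo, hi = p[test_cheaper].min(), p[test_cheaper].max()

# Closed-form break-even points implied by the same assumptions.
lo_exact = C_MRDT / (C_UNTREATED - C_ACT)   # 0.6 / 15.6
hi_exact = 1 - C_MRDT / C_ACT               # 0.4
print(f"mRDT preferred for about {lo:.3f} < p < {hi:.3f} "
      f"(closed form: {lo_exact:.4f}, {hi_exact:.4f})")
```

With these inputs, the break-even points come out at about 0.038 and 0.4, consistent with the rounded values of 0.04 and 0.4 reported in the results, and the no-test cost levels off at p = C_ACT / C_UNTREATED, about 0.060.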

In this section, we discuss the characteristics of the SPA dataset, followed by a description of the BN models. We compare the predictive performance of all the models developed. Finally, we present the sensitivity analysis based on the decision tree.

3.1 Characteristics of the Dataset

The dataset that we used for modeling contains 1,139 encounters, 13 predictor variables, and the target variable (see Table 2). Malaria was present in 36.4% (415) of the encounters. The most common age category was 24 to 60 months (35.7%). The duration of illness ranged from 0 days to over 30 days, with 0 to 2 days being the most common (51.7%). The most common clinical features were history of fever (69.3%) and CDB (62.8%), followed by vomiting (29.1%) and diarrhea (26.3%). The percentage of missing values ranged from 0% to 6.4%, with history of fever having the highest percentage missing and anemia and malnutrition the lowest.

Table 2: Summary of the dataset.

Variable	Value	Number of encounters (%)
Target Variable
Malaria	Present	415 (36.4)
	Absent	724 (63.6)
Predictor Variables
Age (in months)	2-12	379 (33.3)
	12-24	294 (25.8)
	24-60	407 (35.7)
	Other	21 (1.8)
	Unknown	38 (3.3)
Duration of Illness (in days)	<=2	589 (51.7)
	3 to 15	502 (44.1)
	15 to 30	11 (1.0)
	Unknown	37 (3.2)
Conscious	Yes	1076 (94.5)
	No	12 (1.1)
	Unknown	51 (4.5)
Anemia	Present	93 (8.2)
	Absent	1046 (91.8)
	Unknown	0 (0)
Convulsions	Present	55 (4.8)
	Absent	1047 (91.9)
	Unknown	37 (3.2)
Cough or Difficulty Breathing (CDB)	Present	715 (62.8)
	Absent	373 (32.7)
	Unknown	51 (4.5)
Diarrhea	Present	299 (26.3)
	Absent	803 (70.5)
	Unknown	37 (3.2)
Fever (temperature > 37.5 °C)	Present	307 (27.0)
	Absent	771 (67.7)
	Unknown	61 (5.4)
History of Fever	Present	789 (69.3)
	Absent	277 (24.3)
	Unknown	73 (6.4)
Lethargy	Present	228 (20.0)
	Absent	875 (76.8)
	Unknown	36 (3.2)
Malnutrition	Present	11 (1.0)
	Absent	1128 (99.0)
	Unknown	0 (0)
Unable to Feed	Present	137 (12.0)
	Absent	966 (84.8)
	Unknown	36 (3.2)
Vomiting	Present	331 (29.1)
	Absent	770 (67.6)
	Unknown	38 (3.3)

3.2 Description of Bayesian Network Models

The manual BN model is shown in Figure 2. The model contains 14 nodes with 13 arcs. The variables duration of illness and age are modeled as parents of the malaria node, while the clinical features are modeled as children of the malaria node.

The hybrid BN model was created by modifying the ANB model derived by GeNIe Modeler. The automatically derived ANB model contained 32 arcs. Based on expert knowledge, 4 arcs were deleted and 14 arcs were added, yielding a hybrid model with 42 arcs among 14 nodes (see Figure 3). Compared to the manual model, this model contains additional arcs among the predictor variables. The strength of influence of an arc measures how much the conditional probability distribution of the child node changes across the states of the parent node, computed here as the Euclidean distance between these distributions (26). The 8 red arcs in the model indicate associations with a high strength of influence. These include associations of (1) consciousness with lethargy, anemia, convulsions, diarrhea, and duration of illness, (2) fever with CDB, and (3) age with history of fever.
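As a rough illustration of the strength-of-influence measure, the sketch below computes the Euclidean distance between rows of a child node's conditional probability table, one row per parent state, and reports the average and maximum over all pairs of rows. This is an assumed formulation for a single parent; GeNIe Modeler's exact normalization and its handling of multiple parents may differ, and the CPT values in the example are hypothetical.

```python
import itertools

import numpy as np


def strength_of_influence(cpt):
    """Euclidean strength of influence of one parent on a child node.

    cpt: 2-D array with one row per parent state; row j holds
    P(child | parent = j). Returns the average and maximum Euclidean
    distance over all pairs of rows (assumed formulation).
    """
    rows = np.asarray(cpt, dtype=float)
    dists = [np.linalg.norm(a - b) for a, b in itertools.combinations(rows, 2)]
    return {"average": float(np.mean(dists)), "maximum": float(np.max(dists))}


# Hypothetical CPTs for a child with states (Present, Absent) and a parent
# with two states; rows that differ strongly give a large value.
strong = strength_of_influence([[0.15, 0.85], [0.80, 0.20]])
weak = strength_of_influence([[0.30, 0.70], [0.35, 0.65]])
print(strong, weak)
```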

Figure 2: The manual BN model.

Figure 3: The hybrid BN model that was obtained by modifying an automatically derived ANB. The arcs with a high strength of influence are colored red.

3.3 Performance of Models

Table 3 shows the performance of all models on the test dataset. While the AUC, accuracy, and precision values were similar across all models, the sensitivity and specificity values differed. The comparison models had poor sensitivity (0.09 and 0.04 for logistic regression and random forest, respectively), resulting in low F1 scores (0.16 and 0.07, respectively). Although sensitivity was much lower than specificity for all models at the default threshold of 0.5, the BN models had much higher sensitivity than the comparison models. The Brier scores were similar across all models.

Table 3: Performance metrics for all models on the test dataset.

Model	AUC	Accuracy	Sensitivity	Specificity	Precision	F1	Brier score
Manual BN	0.58	0.64	0.33	0.81	0.50	0.39	0.24
Hybrid BN	0.58	0.63	0.24	0.85	0.48	0.32	0.25
Logistic Regression	0.60	0.64	0.09	0.95	0.53	0.16	0.22
Random Forest	0.59	0.64	0.04	0.98	0.50	0.07	0.23

3.4 Sensitivity Analysis

Figure 4 presents the sensitivity analysis of the decision tree shown in Figure 1. We varied the probability of malaria given the patient findings (x-axis) and computed the expected cost (y-axis). The black and blue lines represent the expected costs of not obtaining and obtaining an mRDT, respectively (the branches labeled "no" and "yes" that originate from 'mRDT?' in the decision tree). For the decision not to obtain an mRDT, the expected cost increased with the probability of malaria and then remained constant at US $1.00 for probabilities of 0.0625 and above. For the decision to obtain an mRDT, the expected cost also increased with the probability of malaria and, at a probability of 0.4, surpassed the cost of not obtaining the test. Based on this analysis, when the probability of malaria is between 0.0 and 0.04 or between 0.4 and 1.0 (Figure 4), the preferred decision is to forgo the test. This is an illustrative analysis that demonstrates judicious use of mRDT in a resource-constrained setting where the availability of mRDT is limited.

Figure 4: Sensitivity analysis for the decision tree shown in Figure 1. The probability of malaria is plotted on the x-axis and the expected costs on the y-axis.

The current practice for management of malaria in children involves the use of mRDT and a course of ACT for a child presenting with fever. With limited availability of both mRDTs and ACT drugs in rural health centers in Malawi and other LMICs, a more sustainable strategy for judicious use of these resources is needed. We developed predictive models that compute the probability of malaria based on clinical findings and developed a simple decision tree to determine the optimal use of mRDT based on the probability of malaria.

To the best of our knowledge, this is the first study to develop BN models for prediction of childhood malaria in Malawi, which serves as an example of an LMIC. We derived two BN models: one that was specified manually, and a hybrid model in which an automatically derived ANB model was modified with expert knowledge. For both models, the parameters were estimated from data. We compared the performance of the BN models to that of logistic regression and random forest models. Performance as measured by AUC, accuracy, and precision was similar for all four models. While the logistic regression and random forest models achieved high specificity, they exhibited poor sensitivity, at 9.6% and 3.6%, respectively. The two BN models had more balanced sensitivity and specificity. Of the two, the manual model had higher sensitivity and precision, while the hybrid model achieved higher specificity. No model was clearly better than the others on all performance measures.

While the manual BN model is simple and interpretable with good performance, the hybrid BN model is likely more realistic in capturing the associations among the clinical features of malaria. The relations in the hybrid BN model in Figure 3 were judged to be clinically accurate for childhood malaria. Further, Figure 3 indicates that a small number of associations are stronger than the others; these provide a starting point for developing a simpler model with fewer associations. Simpler models are easier to interpret and may be preferred for clinical use if their performance is similar to, or differs only slightly from, that of more complex models (15).

The BN models offer several advantages over the current malaria management approach. Fever alone has been found to be a poor indicator of childhood malaria (27), whereas CDB, anemia, malnutrition, and diarrhea have been found to be associated with an increased likelihood of malaria (28,29). Because the BN models capture associations beyond the main ones, such as that between fever and malaria, they can provide more accurate estimates than those based on a single feature (30). Further, BN models can be iteratively refined by updating the model structure and parameters with new data and knowledge.

Integrating the probabilities from a predictive model, such as a BN model, with the costs of resources such as tests and drugs in a decision tree can provide a basis for optimal decision making at rural health posts. In the example decision tree (see Figure 1), we included illustrative costs of mRDTs and ACT drugs. A sensitivity analysis indicates that at low (below 0.04) and high (above 0.4) probabilities of malaria in a child, the mRDT can be omitted to minimize expected costs: below 0.04 the cost-minimizing decision is not to treat, and above 0.4 it is to treat, in both cases without performing an mRDT. Thus, accurate estimation of the probability of malaria from clinical features can lead to judicious use of mRDT, conserving the test for children whose probability of malaria is intermediate (between 0.04 and 0.4).

As Malawi has an emerging Electronic Medical Record System (31, 32), one possibility is to integrate the BN model into this system to provide healthcare workers such as HSAs with the probability of malaria at the point of care, enabling them to use mRDT more judiciously.

Limitations

Our study has several limitations. The dataset was derived from the SPA survey, so our choice of variables was constrained by the information the survey collected. For example, the survey did not record the children's immunization and HIV status, which are important for determining the risk of malaria. Additionally, the proportion of children with malnutrition in the data was much lower than the reported national prevalence, which suggests that this variable may have been underreported (33). Information about prior exposure to antimalarial drugs would also have been useful but was not collected in the survey. Thus, there may be latent associations among variables that our models do not capture.

We believe that using the mRDT result as the gold standard diagnosis was the best available choice, given the dataset and the WHO-reported sensitivity and specificity of the test. Because the type of mRDT and the testing procedure were not provided with the dataset, we could not verify the reported outcomes.

We removed encounters without an mRDT result, which substantially reduced the number of encounters. A smaller dataset limits the reliability of the parameter estimates in all models, including the BN models. Further, the retained encounters may not be representative of those that were removed, so the models' predictions may be biased for the remainder of the dataset. However, to our knowledge, this is the only dataset that provides both a gold standard diagnosis and clinical features of childhood malaria.

This study developed and validated the models on the same dataset. External validation, with feedback from healthcare providers in Malawi, would therefore be valuable to guide further refinement of the model for clinical use.
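Because the mRDT result serves as the reference label, some labels will be wrong even with a well-performing test. Bayes' rule gives the share of mRDT-positive and mRDT-negative labels expected to match true infection status. The prevalence, sensitivity, and specificity below are assumed placeholders, not the WHO round 8 product-testing figures or the survey's prevalence. `predictive_values` is a hypothetical helper name.

```python
def predictive_values(prevalence: float, sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PPV, NPV) of a diagnostic test using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)  # P(malaria | mRDT positive)
    npv = true_neg / (true_neg + false_neg)  # P(no malaria | mRDT negative)
    return ppv, npv


# Assumed placeholder inputs, not the WHO round 8 figures or the survey prevalence.
ppv, npv = predictive_values(prevalence=0.30, sensitivity=0.95, specificity=0.95)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With these assumed inputs, about 11% of mRDT-positive labels and about 2% of mRDT-negative labels would be misclassified. Lower prevalence or lower specificity raises the false-positive share among positive labels.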

The decision analysis considered only the costs of tests and ACT drugs. It also assumes that if the disease becomes severe, treatment is provided, albeit at a higher cost. Additional costs, and preferences reflecting local needs, can be incorporated into the decision tree to make the analysis more applicable to clinical use. The analysis can also be extended to include further outcomes, such as progression of the disease to severe complicated malaria and death (8).

Current clinical guidelines for the management of childhood malaria in LMICs such as Malawi are based on WHO guidelines. These require that a child receive a confirmatory diagnosis by microscopy or an mRDT before being put on a course of ACT. However, in resource-constrained settings, mRDTs and ACT drugs may not always be available. Thus, a clinical decision support system that provides personalized guidance on when to use an mRDT could help healthcare workers conserve mRDTs.

We used clinical features from a publicly available dataset to derive models that predict malaria in an LMIC setting. Integrating these predictions with the costs of resources, such as mRDTs and ACT drugs, in a decision tree provides a way to model the rational use of those resources. Applying such models at the point of care will require clinical decision support that can provide nuanced guidance for the personalized management of childhood malaria.

Abbreviations

ACT	Artemisinin Combination Therapy
ANB	Augmented Naïve Bayes
AUC	Area under the Receiver Operating Characteristic curve
BN	Bayesian network
CDB	Cough or difficulty breathing
DHS	Demographic and Health Surveys
eCDA	Electronic clinical decision algorithm
HSA	Health Surveillance Assistant
LMIC	Low- and middle-income country
mRDT	Malaria Rapid Diagnostic Test
NGO	Non-Governmental Organization
SPA	Service Provision Assessment
WHO	World Health Organization

Ethics approval and consent to participate

Not applicable, as all data are publicly available and the subjects are anonymized.

Consent for publication

Not applicable.

Availability of data and materials

The SPA dataset used for analysis in this study is publicly available through the DHS website, https://www.dhsprogram.com/. The subset of the data extracted by the authors is available from the corresponding author on reasonable request.

Competing interests

The authors declare that they have no competing interests.

Funding

The research reported in this publication was supported in part by the National Library of Medicine of the National Institutes of Health under award number R01 LM012095. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Additional funds were provided by the Center for Health Informatics for the Underserved, Department of Biomedical Informatics, University of Pittsburgh.

Authors' contributions

SBT obtained and analyzed the data, created the models, and developed the manuscript with input from all authors. SV supervised the project and contributed to study design, analysis, and manuscript writing. GPD devised the original idea for the study, with input from MJD. GPD and GFC helped plan the experiments and supervise the findings. GFC and SV guided the decision analysis. MGM was involved in model development and validation. MJD helped design the initial study and contributed to draft revisions. All authors read and approved the final manuscript.

Acknowledgments

The authors thank Mr. Rashid Deula for assistance during the field visits to the health centers and health posts in Malawi. The BN models described in this paper were created using the GeNIe Modeler that is available from BayesFusion, LLC (http://www.bayesfusion.com/) free of charge for academic research and teaching.

References

1. National Malaria Control Programme (NMCP), ICF. Malawi Malaria Indicator Survey. 2017;2.
2. Chart Booklet Integrated Management of Childhood Illness. 2014.
3. World Health Organization. World malaria report 2015.
4. Lufesi NN, Andrew M, Aursnes I. Deficient supplies of drugs for life threatening diseases in an African community. BMC Health Serv Res. 2007;7:1–7.
5. Klootwijk L, Chirwa AE, Kabaghe AN, Van Vugt M. Challenges affecting prompt access to adequate uncomplicated malaria case management in children in rural primary health facilities in Chikhwawa Malawi. BMC Health Serv Res. 2019 Oct 22;19(1):735.
6. MoH. Malawi Service Provision Assessment (SPA) 2013-14. 2014;1.
7. Kabaghe AN, Phiri MD, Phiri KS, Van Vugt M. Challenges in implementing uncomplicated malaria treatment in children: A health facility survey in rural Malawi. Malar J. 2017 Oct 18;16(1):419.
8. Shillcutt S, Morel C, Goodman C, Coleman P, Bell D, Whitty CJM, et al. Cost-effectiveness of malaria diagnostic methods in sub-Saharan Africa in an era of combination therapy. Bull World Health Organ. 2008 Feb;86(2):101–10.
9. Bjornstad E, Preidis GA, Lufesi N, Olson D, Kamthunzi P, Hosseinipour MC, et al. Determining the quality of IMCI pneumonia care in Malawian children. Paediatr Int Child Health. 2014 Feb 6;34(1):29–36.
10. Wahl B, Cossy-Gantner A, Germann S, Schwalbe NR. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob Health. 2018 Aug 1;3(4):e000798.
11. Hosny A, Aerts HJWL. Artificial intelligence for global health. Science. 2019;366:955–6.
12. O’Connor Y, O’Sullivan T, Gallagher J, Heavin C, O’Donoghue J. Developing eXtensible mHealth Solutions for Low Resource Settings. In: Prasath R, O’Reilly P, Kathirvalavakumar T, editors. Mining Intelligence and Knowledge Exploration. Cham: Springer International Publishing; 2014. p. 361–71.
13. Keitel K, D’Acremont V. Electronic clinical decision algorithms for the integrated primary care management of febrile children in low-resource settings: review of existing tools. Clin Microbiol Infect. 2018;24:845–55.
14. Friedman N, Geiger D, Goldszmidt M. Bayesian Network Classifiers. Mach Learn. 1997;29(2–3):131–63.
15. Onisko A, Druzdzel MJ, Wasyluk H. A probabilistic causal model for diagnosis of liver disorders. Proc Seventh Symp Intell Inf Syst. 1998;379–87.
16. Kraisangka J, Druzdzel MJ, Benza RL. A Risk Calculator for the Pulmonary Arterial Hypertension Based on a Bayesian Network. Work Notes 13th Annu Bayesian Model Appl Work. 2016;1–59.
17. Sesen MB, Nicholson AE, Banares-Alcantara R, Kadir T, Brady M. Bayesian networks for clinical decision support in lung cancer care. PLoS One. 2013;8(12):e82349.
18. Uwemedimo OT, Lewis TP, Essien EA, Chan GJ, Nsona H, Kruk ME, et al. Distribution and determinants of pneumonia diagnosis using Integrated Management of Childhood Illness guidelines: A nationally representative study in Malawi. BMJ Glob Health. 2018;3(2):1–12.
19. The DHS Program - Quality information to plan, monitor and improve population, health, and nutrition programs [Internet]. [cited 2020 Mar 14]. Available from: https://dhsprogram.com/
20. World Health Organization. Malaria rapid diagnostic test performance: results of WHO product testing of malaria RDTs: round 8 (2016–2018). 2018.
21. Ministry of Health Malawi. Guidelines for the treatment of malaria in Malawi. 4th ed. Malawi: Malawi Government; 2013.
22. Cheng J, Bell DA, Liu W. An Algorithm for Bayesian Belief Network Construction from Data. Proc Sixth Int Work Artif Intell Stat (AI STAT ’97). 1997;83–90.
23. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12(Oct):2825–30.
24. Kjaerulff UB, Madsen AL. Bayesian networks and influence diagrams. Springer Science & Business Media; 2008.
25. Khuluza F, Heide L. Availability and affordability of antimalarial and antibiotic medicines in Malawi. Postma M, editor. PLoS One. 2017 Apr 18;12(4):e0175399.
26. Koiter JR. Visualizing inference in Bayesian networks [Master's thesis]. Man-Machine Interaction Group; 2006.
27. Mfueni E, Devleesschauwer B, Rosas-Aguirre A, Van Malderen C, Brandt PT, Ogutu B, et al. True malaria prevalence in children under five: Bayesian estimation using data of malaria household surveys from three sub-Saharan countries. Malar J. 2018;17(1):1–7.
28. Reithinger R, Ngondi JM, Graves PM, Hwang J, Getachew A, Jima D, et al. Risk factors for anemia in children under 6 years of age in Ethiopia: analysis of the data from the cross-sectional Malaria Indicator Survey, 2007. Trans R Soc Trop Med Hyg. 2013 Dec 1;107(12):769–76.
29. Feasey NA, Everett D, Faragher EB, Roca-Feltrer A, Kang A, Denis B, et al. Modelling the Contributions of Malaria, HIV, Malnutrition and Rainfall to the Decline in Paediatric Invasive Non-typhoidal Salmonella Disease in Malawi. 2015;1–12.
30. Arora P, Boyne D, Slater JJ, Gupta A, Brenner DR, Druzdzel MJ. Bayesian Networks for Risk Prediction Using Real-World Data: A Tool for Precision Medicine. Value Health. 2019 Apr 1;22(4):439–45.
31. Douglas GP, Gadabu OJ, Joukes S, Mumba S, McKay MV, Ben-Smith A, et al. Using Touchscreen electronic medical record systems to support and monitor national scale-up of antiretroviral therapy in Malawi. PLoS Med. 2010 Aug;7(8).
32. Waters E, Rafter J, Douglas GP, Bwanali M, Jazayeri D, Fraser HSF. Experience implementing a point-of-care electronic medical record system for primary care in Malawi. In: Studies in Health Technology and Informatics. IOS Press; 2010. p. 96–100.
33. ICF. Malawi Demographic and Health Survey 2015-16. Zomba, Malawi: National Statistical Office and ICF; 2017.

Editorial decision: Major revision (22 Nov, 2020)
Review #2 received at journal (18 Oct, 2020)
Review #1 received at journal (25 Sep, 2020)
Reviewer #2 agreed at journal (22 Sep, 2020)
Reviewers invited by journal (14 Sep, 2020)
Reviewer #1 agreed at journal (14 Sep, 2020)
Editor assigned by journal (01 Sep, 2020)
Submission checks completed at journal (21 Aug, 2020)
Editor invited by journal (20 Aug, 2020)
First submitted to journal (14 Aug, 2020)
