The results are discussed below in the order of the research questions.
A) What is the underlying structure of anemia among pregnant women in Ethiopia?
To show the underlying structure of anemia among pregnant women in Ethiopia, descriptive statistics were used to examine the anemia level across age, place of residence, region, antenatal care visits, history of the place of delivery, history of terminating a pregnancy, and wealth index. Fig. 1 below shows that pregnant women who live in rural areas of Ethiopia are highly affected by anemia: in rural areas, 57.2%, 14.1%, 14.7%, and 2.5% of the women were non-anemic, mildly anemic, moderately anemic, and severely anemic, respectively. Fig. 1 also shows that every level of anemia was higher in rural areas than in urban areas of Ethiopia.
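Percentages such as those reported for Fig. 1 come from cross-tabulating the anemia level against a grouping variable such as place of residence. The following is a minimal plain-Python sketch of that cross-tabulation; the toy records and the function name `crosstab_percent` are hypothetical and are not the study data. The text does not state whether the percentages are normalized within each residence group or over the whole sample (the four rural figures sum to 88.5%, not 100%), so the sketch supports both options.

```python
from collections import Counter

LEVELS = ["non-anemic", "mild", "moderate", "severe"]

# Hypothetical toy records (place of residence, anemia level); NOT the study data.
RECORDS = [
    ("rural", "non-anemic"), ("rural", "mild"), ("rural", "moderate"),
    ("rural", "non-anemic"), ("rural", "severe"),
    ("urban", "non-anemic"), ("urban", "non-anemic"), ("urban", "mild"),
]


def crosstab_percent(records, normalize="row"):
    """Cross-tabulate (group, level) pairs as percentages.

    normalize="row": each cell is divided by its group's total (rows sum to 100).
    normalize="all": each cell is divided by the grand total (table sums to 100).
    """
    if normalize not in ("row", "all"):
        raise ValueError("normalize must be 'row' or 'all'")
    counts = Counter(records)  # missing (group, level) pairs count as 0
    grand_total = len(records)
    table = {}
    for group in sorted({g for g, _ in records}):
        group_total = sum(counts[(group, level)] for level in LEVELS)
        denominator = group_total if normalize == "row" else grand_total
        table[group] = {
            level: 100.0 * counts[(group, level)] / denominator for level in LEVELS
        }
    return table


for group, row in crosstab_percent(RECORDS).items():
    print(group, {level: round(pct, 1) for level, pct in row.items()})
```

Row normalization answers "how are rural women distributed across anemia levels?", while `normalize="all"` answers "what share of the whole sample falls in each residence/level cell?".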
Fig. 2 below shows that pregnant women with poor economic status were highly affected by anemia: at every level of anemia, pregnant women with a poor wealth index outnumbered those in the other wealth index categories.
Fig. 3 below shows that pregnant women who did not attend antenatal care during pregnancy, or attended only once, were highly affected by anemia, whereas pregnant women who attended antenatal care visits repeatedly had lower levels of anemia.
Fig. 4 above shows the distribution of anemia levels among pregnant women in different age groups; pregnant women aged 30–34 were the most severely affected by anemia.
As Fig. 5 shows, pregnant women in the Somali, Afar, Dire Dawa, and SNNPR regions of Ethiopia were highly affected by anemia.
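Figs. 2 to 5 compare groups (wealth index, antenatal care visits, age group, and region) by how strongly they are affected by anemia. One simple summary for such a comparison, assumed here rather than taken from the text, is the share of women in each group with any level of anemia (mild, moderate, or severe). The sketch below ranks groups by that share; the records, the region labels, and the function name `anemia_prevalence` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records (group, anemia level); NOT the study data.
RECORDS = [
    ("Region A", "moderate"), ("Region A", "severe"), ("Region A", "non-anemic"),
    ("Region B", "mild"), ("Region B", "non-anemic"),
    ("Region C", "non-anemic"), ("Region C", "non-anemic"), ("Region C", "mild"),
]


def anemia_prevalence(records):
    """Return (group, % with any anemia) pairs, sorted from highest to lowest.

    Every level other than "non-anemic" (mild, moderate, severe) counts as anemic.
    """
    totals = defaultdict(int)
    anemic = defaultdict(int)
    for group, level in records:
        totals[group] += 1
        if level != "non-anemic":
            anemic[group] += 1
    prevalence = {group: 100.0 * anemic[group] / totals[group] for group in totals}
    return sorted(prevalence.items(), key=lambda item: item[1], reverse=True)


for group, pct in anemia_prevalence(RECORDS):
    print(f"{group}: {pct:.1f}% anemic")
```

Ranking by share rather than by raw counts keeps large groups (for example, the rural majority) from dominating the comparison simply because they contain more women.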
B) Which homogeneous ensemble of machine learning algorithms is suitable for predicting the level of anemia among pregnant women in Ethiopia?
To answer this question, twelve experiments were conducted: three homogeneous ensemble machine learning algorithms (random forest, extreme gradient boosting, and CatBoost) and, to check whether the homogeneous ensembles perform better than other supervised machine learning algorithms, a decision tree, each trained without class decomposition and with one-vs-one and one-vs-rest class decomposition. The model developed with CatBoost and one-vs-rest class decomposition performed best at predicting the level of anemia among pregnant women in Ethiopia, with 97.6% accuracy, 97.6% precision, 97.57% recall, 97.58% F1-score, and 99.9% ROC AUC (see Table 3 below). Its tuning parameters (depth = 10, iterations = 300, l2_leaf_reg = 1, learning_rate = 0.15) were selected with grid search. The random forest model used the parameters (criterion='entropy', max_features='sqrt', min_samples_split = 3, n_estimators = 500, random_state = 0, max_depth = 20, max_leaf_nodes = 400, n_jobs=-1) and performed below CatBoost; the extreme gradient boosting models used the default parameters; and the decision tree used the parameters (criterion='entropy', max_features='sqrt', min_samples_split = 12, random_state = 0, max_depth = 30, max_leaf_nodes = 600) and performed worse than all the other algorithms.
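Class decomposition turns the four-level anemia problem into several binary problems: one-vs-rest trains one model per anemia level (that level against all the others) and picks the most confident one, while one-vs-one trains one model per pair of levels and combines them by majority vote. The sketch below illustrates both schemes in plain Python. To stay self-contained it uses a hypothetical nearest-centroid learner (`CentroidScorer`) as the binary base model instead of CatBoost, random forest, or XGBoost, and all class names and the toy data are assumptions. In practice, scikit-learn's `OneVsRestClassifier` and `OneVsOneClassifier` wrappers apply the same idea to any estimator.

```python
from itertools import combinations
from math import dist


class CentroidScorer:
    """Hypothetical stand-in binary learner (not CatBoost): a positive score
    means x is closer to the centroid of class 1 than to that of class 0."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.pos_centroid = [sum(col) / len(pos) for col in zip(*pos)]
        self.neg_centroid = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def score(self, x):
        return dist(x, self.neg_centroid) - dist(x, self.pos_centroid)


class OneVsRest:
    """One binary model per class: that class (1) against all other classes (0)."""

    def __init__(self, make_learner):
        self.make_learner = make_learner

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.models = {
            c: self.make_learner().fit(X, [1 if t == c else 0 for t in y])
            for c in self.classes
        }
        return self

    def predict(self, x):
        # The class whose binary model gives the highest score wins.
        return max(self.classes, key=lambda c: self.models[c].score(x))


class OneVsOne:
    """One binary model per pair of classes, trained only on those two classes."""

    def __init__(self, make_learner):
        self.make_learner = make_learner

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.models = {}
        for a, b in combinations(self.classes, 2):
            Xs = [x for x, t in zip(X, y) if t in (a, b)]
            ys = [1 if t == a else 0 for t in y if t in (a, b)]
            self.models[(a, b)] = self.make_learner().fit(Xs, ys)
        return self

    def predict(self, x):
        votes = dict.fromkeys(self.classes, 0)
        for (a, b), model in self.models.items():
            votes[a if model.score(x) > 0 else b] += 1
        # Majority vote; max() keeps the first class in sorted order on a tie.
        return max(self.classes, key=votes.get)


# Hypothetical 2-D toy data: four well-separated clusters, one per anemia level.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
     (0, 10), (1, 10), (0, 11), (10, 0), (11, 0), (10, 1)]
y = ["non-anemic"] * 3 + ["mild"] * 3 + ["moderate"] * 3 + ["severe"] * 3

for scheme in (OneVsRest, OneVsOne):
    model = scheme(CentroidScorer).fit(X, y)
    print(scheme.__name__, len(model.models), model.predict((0.5, 10.5)))
```

With four anemia levels, one-vs-rest fits 4 binary models, whereas one-vs-one fits 4 × 3 / 2 = 6, each on a smaller subset of the data.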
Table 3
| Model | Evaluation metric | Without class decomposition | With one-vs-one class decomposition | With one-vs-rest class decomposition |
|---|---|---|---|---|
| Decision tree | Accuracy | 79.38% | 89.88% | 89.09% |
| | Precision | 79.09% | 89.81% | 89.01% |
| | Recall | 79.21% | 89.77% | 88.98% |
| | F1-score | 79.03% | 89.71% | 88.96% |
| | Cross-validation | 68.48% | 84.27% | 83.17% |
| Random forest | Accuracy | 91.34% | 94.4% | 94.4% |
| | Precision | 91.32% | 94.36% | 94.37% |
| | Recall | 91.28% | 94.35% | 94.35% |
| | F1-score | 91.25% | 94.34% | 94.34% |
| | Cross-validation | 81.23% | 89.37% | 88.18% |
| | ROC | 99% | - | 99.43% |
| CatBoost | Accuracy | 97.08% | 97.44% | 97.595% |
| | Precision | 97.09% | 97.438% | 97.596% |
| | Recall | 97.05% | 97.418% | 97.574% |
| | F1-score | 97.06% | 97.422% | 97.58% |
| | Cross-validation | 95.94% | 96.478% | 96.482% |
| | ROC | 99.9% | - | 99.9% |
| Extreme gradient boosting (XGBoost) | Accuracy | 94.26% | 95.21% | 94.54% |
| | Precision | 94.27% | 95.20% | 94.53% |
| | Recall | 94.20% | 95.16% | 94.48% |
| | F1-score | 94.20% | 95.16% | 94.48% |
| | Cross-validation | 88.86% | 91.73% | 89.72% |
| | ROC | 99.53% | - | 99.54% |
C) What are the associated risk factors that influence the occurrence of anemia among pregnant women in the case of Ethiopia?
To answer this question, a feature importance analysis was performed on the model developed with the best-performing algorithm, CatBoost. Table 4 presents the most important risk factors that determine the level of anemia among pregnant women in Ethiopia, together with their feature importance values.
Table 4
Risk factors identified with the best-fit model (CatBoost) and their feature importance
| Feature | Feature importance |
|---|---|
| Duration of current pregnancy | 10.3953193 |
| Age in 5-year groups | 9.69394377 |
| Source of drinking water | 8.99369175 |
| History of contraceptive use | 6.61405164 |
| Respondent's occupation | 6.12946203 |
| Number of household members | 5.85914199 |
| Wealth index | 5.63211101 |
| Frequency of listening to the radio | 5.16045505 |
| Husband/partner's education level | 5.02943094 |
| Region | 4.3314029 |
| Husband/partner's occupation | 3.96855455 |
| Birth history | 3.87177534 |
| Current pregnancy wanted | 3.838873474 |
| Body mass index | 2.787116569 |
| Number of ANC visits | 2.600944933 |
| Highest educational level | 2.419310637 |
| History of terminating a pregnancy | 0.849814164 |
| Currently breastfeeding | 0.732357678 |
| Type of place of residence | 0.576997215 |
| Vitamin A in last 6 months | 0.356953114 |
| During pregnancy, given or bought iron tablets/syrup | 0.046775106 |
| History of place of delivery | 0.010932682 |
| During pregnancy, took SP/Fansidar for malaria | 0.00058328 |
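Ranking the Table 4 values makes the ordering of the risk factors explicit. The sketch below copies the values from Table 4 and reports each feature's share of the listed total; the 23 listed values sum to roughly 89.9 rather than 100, so these shares are relative to the listed features only. The function name `top_features` is hypothetical.

```python
# Feature importance values copied from Table 4 (CatBoost model).
IMPORTANCE = {
    "Duration of current pregnancy": 10.3953193,
    "Age in 5-year groups": 9.69394377,
    "Source of drinking water": 8.99369175,
    "History of contraceptive use": 6.61405164,
    "Respondent's occupation": 6.12946203,
    "Number of household members": 5.85914199,
    "Wealth index": 5.63211101,
    "Frequency of listening to the radio": 5.16045505,
    "Husband/partner's education level": 5.02943094,
    "Region": 4.3314029,
    "Husband/partner's occupation": 3.96855455,
    "Birth history": 3.87177534,
    "Current pregnancy wanted": 3.838873474,
    "Body mass index": 2.787116569,
    "Number of ANC visits": 2.600944933,
    "Highest educational level": 2.419310637,
    "History of terminating a pregnancy": 0.849814164,
    "Currently breastfeeding": 0.732357678,
    "Type of place of residence": 0.576997215,
    "Vitamin A in last 6 months": 0.356953114,
    "During pregnancy, given or bought iron tablets/syrup": 0.046775106,
    "History of place of delivery": 0.010932682,
    "During pregnancy, took SP/Fansidar for malaria": 0.00058328,
}


def top_features(importance, k):
    """Return the k highest-ranked features with their share (%) of the listed total."""
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [(name, 100.0 * value / total) for name, value in ranked[:k]]


for name, share in top_features(IMPORTANCE, 5):
    print(f"{name}: {share:.1f}% of the listed importance")
```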
D) What are the important rules that can be generated from the predictive model?
To answer this question, all the features used to develop the predictive model were used to generate the important rules for the level of anemia among pregnant women in Ethiopia, using the best-performing model (CatBoost with one-vs-rest class decomposition). The most important rules, which were also validated by domain experts, are presented below:
RULE1: IF given iron tablets or syrup during pregnancy == 'No' AND vitamin A in last 6 months == 'No' AND took SP/Fansidar for malaria during pregnancy == 'No' AND region == 'Somali' AND currently breastfeeding == 'No' AND place of residence == 'rural' AND duration of current pregnancy == 'seven-nine-week' AND current pregnancy wanted == 'Yes' AND respondent's occupation == 'did not work' AND history of place of delivery == 'Home' AND age == 'thirty - thirty four' AND educational level == 'no education' AND husband's educational level == 'no education' AND number of household members == 'six-ten' AND history of terminating pregnancy == 'No' AND body mass index == 'normal' AND husband's occupation == 'did not work' THEN anemia level == 'severe'.
RULE2: IF given iron tablets or syrup during pregnancy == 'No' AND vitamin A in last 6 months == 'No' AND took SP/Fansidar for malaria during pregnancy == 'No' AND region == 'Somali' AND currently breastfeeding == 'No' AND place of residence == 'rural' AND duration of current pregnancy == 'seven-nine-week' AND current pregnancy wanted == 'Yes' AND respondent's occupation == 'did not work' AND history of place of delivery == 'Home' AND age == 'thirty - thirty four' AND educational level == 'no education' AND husband's educational level == 'no education' AND number of household members == 'six-ten' AND history of terminating pregnancy == 'No' AND body mass index == 'normal' AND husband's occupation == 'agricultural - employee' AND source of drinking water == 'pure' AND history of contraceptive use == 'Yes' THEN anemia level == 'non-anemic'.
RULE3: IF given iron tablets or syrup during pregnancy == 'No' AND vitamin A in last 6 months == 'No' AND took SP/Fansidar for malaria during pregnancy == 'No' AND region == 'Somali' AND currently breastfeeding == 'No' AND place of residence == 'rural' AND duration of current pregnancy == 'seven-nine-week' AND current pregnancy wanted == 'Yes' AND respondent's occupation == 'did not work' AND history of place of delivery == 'Home' AND age == 'thirty - thirty four' AND educational level == 'no education' AND husband's educational level == 'no education' AND number of household members == 'six-ten' AND history of terminating pregnancy == 'No' AND body mass index == 'normal' AND husband's occupation == 'agricultural - employee' AND source of drinking water == 'not pure' AND history of contraceptive use == 'Yes' THEN anemia level == 'moderate'.
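The three rules share sixteen conditions: RULE1 differs from the other two in husband/partner's occupation, and RULE2 and RULE3 differ only in the source of drinking water ('pure' versus 'not pure'), yet predict non-anemic versus moderate anemia. Encoding the rules as data makes such comparisons, and applying the rules to a new record, mechanical. In the sketch below the field names and the functions `apply_rules` and `differing_conditions` are hypothetical; the values are transcribed from RULE1 to RULE3, with the outcome labels written as 'severe', 'non-anemic', and 'moderate'. It is a way of reading the extracted rules, not a description of how CatBoost itself makes predictions.

```python
# Hypothetical field names; values transcribed from RULE1-RULE3.
# Conditions common to all three rules:
SHARED = {
    "iron_tablets_or_syrup_given": "No",
    "vitamin_a_last_6_months": "No",
    "sp_fansidar_for_malaria": "No",
    "region": "Somali",
    "currently_breastfeeding": "No",
    "place_of_residence": "rural",
    "duration_of_current_pregnancy": "seven-nine-week",
    "current_pregnancy_wanted": "Yes",
    "respondent_occupation": "did not work",
    "history_of_place_of_delivery": "Home",
    "age": "thirty - thirty four",
    "educational_level": "no education",
    "husband_educational_level": "no education",
    "number_of_household_members": "six-ten",
    "history_of_terminating_pregnancy": "No",
    "body_mass_index": "normal",
}

RULES = [
    ("RULE1", {**SHARED, "husband_occupation": "did not work"}, "severe"),
    ("RULE2", {**SHARED, "husband_occupation": "agricultural - employee",
               "source_of_drinking_water": "pure",
               "history_of_contraceptive_use": "Yes"}, "non-anemic"),
    ("RULE3", {**SHARED, "husband_occupation": "agricultural - employee",
               "source_of_drinking_water": "not pure",
               "history_of_contraceptive_use": "Yes"}, "moderate"),
]


def apply_rules(record, rules=RULES):
    """Return (rule name, anemia level) for the first rule whose conditions all hold."""
    for name, conditions, level in rules:
        if all(record.get(field) == value for field, value in conditions.items()):
            return name, level
    return None  # no rule fires for this record


def differing_conditions(a, b):
    """Fields on which two rules' conditions disagree, as {field: (a value, b value)}."""
    return {f: (a.get(f), b.get(f)) for f in set(a) | set(b) if a.get(f) != b.get(f)}


print(differing_conditions(RULES[1][1], RULES[2][1]))
```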
Finally, the predictive model was deployed to the cloud for potential users. The application was built with Flask, a Python web framework, together with HTML, and deployed on Heroku. Potential users can access the predictive model at https://anemia-level-prediction-model.herokuapp.com/ to evaluate a pregnant woman's level of anemia.
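The deployment can be illustrated with a minimal Flask application that wraps a trained model behind a web endpoint. The route `/predict`, the JSON request format, and `PlaceholderModel` are hypothetical: the text does not describe the deployed application's routes or form fields, and the deployed artifact serves HTML pages, whereas this sketch returns JSON to stay short.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


class PlaceholderModel:
    """Hypothetical stand-in for the trained CatBoost model.

    A real deployment would load the fitted model from disk instead; this stub
    always answers "non-anemic" so the web layer can be exercised on its own.
    """

    def predict(self, rows):
        return ["non-anemic" for _ in rows]


MODEL = PlaceholderModel()


@app.route("/predict", methods=["POST"])
def predict():
    # Expect one pregnant woman's features as a flat, non-empty JSON object.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict) or not payload:
        return jsonify(error="expected a non-empty JSON object of features"), 400
    level = MODEL.predict([payload])[0]
    return jsonify(anemia_level=level)
```

On Heroku, a Flask `app` object is typically served by a WSGI server such as gunicorn (for example, a `Procfile` line `web: gunicorn app:app`, assuming the code lives in `app.py`); that setup is likewise an assumption, not a description of the authors' deployment.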