Descriptive statistical summary of Socio-demographic characteristics
The dataset was described and visualized using SPSS to examine its properties across all records. Simple statistical analysis was performed to verify the quality of the dataset, for example missing and erroneous values, and to obtain high-level information relevant to the data mining questions. The attributes selected for model building are therefore described statistically in detail, both to understand the dataset during experimentation and to help improve the accuracy of the model.
Only 3.8% of the respondents had never breastfed their children up to the time of the survey. 40.8% of the respondents were still breastfeeding at the time of the survey, and 54.8% had breastfed but were no longer doing so. The attribute Amenorrheic indicates the unusual absence of menstruation: 44.9% of the respondents were amenorrheic at the time of the interview, while 55.1% were not. The majority of the respondents (88.8%) were not pregnant during the study period. About 92.7% of the children born during the previous five years to the mothers included in the study were alive. Most of the mothers (85.2%) had had one or two births in the last five years, 14.8% had had three or four, and only five respondents had had more than four [Table 1].
About 83% of the respondents were rural residents and 17% were urban residents. 78.7% of the respondents had no history of diarrhea in the two weeks before the survey date. Of the total respondents, 49.2% were poor, 16.1% middle-income and 34.7% high-income mothers. 29.6% of the respondents watched television, while 70.3% did not. The majority (85.2%) of the mothers had given birth at home, while only 11.4% and 2% had received institutional delivery services at public and private health institutions, respectively. 74.7% of the respondents had no fever and 17.9% had fever at the time of the survey. Most of the respondents were illiterate (69.9%); 25.1% had attended primary school, 3.3% secondary school, and 1.7% higher education. About half of the children were of average weight (51.9%), 32.4% were below average, and 1.2% were above average [Table 1].
Table 1
Descriptive statistical summary of Socio-demographic characteristics
Attributes | Values | N | % | Attributes | Values | N | % |
Duration of breastfeeding | Ever breastfed | 6390 | 54.8 | Wealth index | Poor | 5739 | 49.2 |
| Never breastfed | 445 | 3.8 | | Middle | 1872 | 16.1 |
| Still breastfeeding | 4757 | 40.8 | | Rich | 4043 | 34.7 |
| Missing | 62 | 0.5 | Frequency of watching TV | No | 8195 | 70.3 |
Currently amenorrheic | No | 6422 | 55.1 | | Yes | 3447 | 29.6 |
| Yes | 5232 | 44.9 | | Missing | 12 | 0.1 |
Currently pregnant | No/don't know | 10351 | 88.8 | Place of delivery | Home | 9934 | 85.2 |
| Yes | 1303 | 11.2 | | Public | 1334 | 11.4 |
Region | Tigray | 1202 | 10.3 | | Private | 237 | 2.0 |
| Affar | 1130 | 9.7 | | Others | 129 | 1.1 |
| Amhara | 1294 | 11.1 | | Total | 11634 | 99.8 |
| Oromiya | 1761 | 15.1 | | Missing | 20 | 0.2 |
| Somali | 1027 | 8.8 | Had fever | No | 8710 | 74.7 |
| Benishangul-Gumuz | 1020 | 8.8 | | Yes | 2082 | 17.9 |
| SNNP | 1614 | 13.8 | | Total | 10792 | 92.6 |
| Gambela | 851 | 7.3 | | Missing | 862 | 7.4 |
| Harari | 659 | 5.7 | Educational attainment | Illiterate | 8142 | 69.9 |
| Addis Ababa | 400 | 3.4 | | Primary | 2930 | 25.1 |
| Dire Dawa | 696 | 6.0 | | Secondary | 386 | 3.3 |
Child is alive | No | 846 | 7.3 | | Higher | 196 | 1.7 |
| Yes | 10808 | 92.7 | | Total | 11654 | 100.0 |
Birth in the last five years | 1 or 2 births | 9926 | 85.2 | Child weight | Less than average | 3774 | 32.4 |
| 3 or 4 births | 1723 | 14.8 | | Average | 6050 | 51.9 |
| > 4 births | 5 | 0.0 | | Greater than average | 138 | 1.2 |
Type of place of residence | Urban | 1986 | 17.0 | | Others | 447 | 3.8 |
| Rural | 9668 | 83.0 | | Total | 10409 | 89.3 |
Had diarrhea | No | 9173 | 78.7 | | Missing | 1245 | 10.7 |
| Yes | 1620 | 13.9 | | | | |
| Missing | 861 | 7.4 | | | | |
J48 Decision Tree Prediction Model output
In this study, several experiments were conducted, varying the parameters of the J48 decision tree and PART rule induction algorithms, to build the best predictive model. The J48 decision tree algorithm builds decision trees from a predefined training dataset using the concept of information entropy and attribute ordering. It relies on the fact that each attribute of the data can be used to make a decision by splitting the data into smaller subsets.
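The role of information entropy in choosing a split attribute can be illustrated with a minimal sketch. It computes plain information gain on a handful of hypothetical records; the attribute names and values are invented for illustration and are not taken from the survey data. J48 is WEKA's implementation of C4.5, which ranks attributes by gain ratio rather than raw information gain, so this shows only the underlying idea.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy records, NOT taken from the survey dataset.
toy = [
    {"delivery": "home", "tv": "no", "practice": "POOR"},
    {"delivery": "home", "tv": "no", "practice": "POOR"},
    {"delivery": "home", "tv": "yes", "practice": "NORMAL"},
    {"delivery": "public", "tv": "yes", "practice": "NORMAL"},
]

# The attribute with the largest gain would be chosen for the first split.
gains = {a: information_gain(toy, a, "practice") for a in ("delivery", "tv")}
best = max(gains, key=gains.get)
```

On these toy records the `tv` attribute separates the two classes perfectly, so it has the larger gain and would be placed at the root of the tree.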
Table 2
Experimentation result of J48 Algorithms in scenarios one and two
Performance measurements | Experiments |
Scenario one | Scenario two |
#1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | #11 | #12 | #13 |
Accuracy (%) | 96.17 | 94.92 | 96.49 | 95.12 | 96.77 | 95.35 | 96.64 | 96.45 | 96.93 | 96.55 | 95.44 | 96.9 | 96.95 |
Mean absolute error | 0.05 | 0.07 | 0.05 | 0.07 | 0.04 | 0.06 | 0.04 | 0.05 | 0.04 | 0.04 | 0.06 | 0.03 | 0.04 |
Numbers of leaves | 480 | 280 | 428 | 293 | 454 | 343 | 454 | 408 | 501 | 484 | 408 | 501 | 501 |
Size of tree | 555 | 376 | 581 | 396 | 615 | 62 | 615 | 555 | 684 | 657 | 550 | 684 | 684 |
Time taken | 0.26 | 0.12 | 0.12 | 0.11 | 0.12 | 0.12 | 0.26 | 0.13 | 0.07 | 0.07 | 0.06 | 0.03 | 0.04 |
AV.TP rate | 0.96 | 0.95 | 0.96 | 0.95 | 0.97 | 0.96 | 0.96 | 0.96 | 0.97 | 0.97 | 0.96 | 0.97 | 0.97 |
AV.FP rate | 0.04 | 0.06 | 0.04 | 0.05 | 0.04 | 0.06 | 0.04 | 0.04 | 0.04 | 0.04 | 0.06 | 0.04 | 0.03 |
AV. Precision | 0.97 | 0.96 | 0.97 | 0.97 | 0.97 | 0.97 | 0.96 | 0.98 | 0.97 | 0.97 | 0.96 | 0.97 | 0.98 |
AV.Recall | 0.96 | 0.95 | 0.96 | 0.95 | 0.97 | 0.96 | 0.96 | 0.96 | 0.97 | 0.97 | 0.96 | 0.97 | 0.97 |
AV.ROC area | 0.98 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
CCI | 11209 | 11062 | 11246 | 11086 | 11277 | 11113 | 3829 | 3372 | 11297 | 11253 | 11123 | 3839 | 3390 |
ICCI | 445 | 592 | 408 | 568 | 377 | 541 | 133 | 124 | 357 | 401 | 531 | 123 | 106 |
Key: CCI: Correctly Classified Instances, ICCI: Incorrectly Classified Instances, Accuracy: registered performance of the model, AV: Average, TP: True Positive, FP: False Positive, ROC: Receiver Operating Characteristic.
As Table 2 shows, the unpruned experiments achieved higher accuracy than the pruned ones. Experiment #13 (an unpruned decision tree with a 70-30 percentage split) is the best, with an accuracy of 96.95%, followed by experiment #9 with an accuracy of 96.93%; both are unpruned experiments. The pruned experiment #5 also performed well: with an accuracy of 96.77% it comes close to these two experiments and is better than all the other pruned experiments. In general, the unpruned experiments performed better than the pruned ones.
J48 Decision Tree Prediction Model Evaluation
The experiments conducted above were analyzed and evaluated in terms of classifier performance: accuracy, confusion matrix values, TP and FP rates, number of leaves and size of the generated tree, ROC curves and execution time. The performance of the classifier on the testing set increased as the confidence factor increased up to about 0.5. Experiment #5 showed an accuracy of 96.77%. According to the confusion matrix, 11279 instances are correctly classified and 375 are incorrectly classified out of 11,654 instances [Table 3]. Of the thirteen trials, experiment #5 is the best pruned model in terms of accuracy and number of incorrectly classified instances. The confusion matrix of experiment #5 in Table 3 shows the number of instances of each class that are assigned to each possible class according to the classifier’s prediction. The columns represent the predictions, and the rows represent the actual class.
Table 3
Summary of confusion matrix for J48
| | Predicted breastfeeding practices |
| | Positive | Negative | Total |
Actual breastfeeding practices | Positive | 7568 | 217 | 7785 |
| Negative | 158 | 3711 | 3869 |
| Total | 7726 | 3928 | 11654 |
The confusion matrix in Table 3 shows that 7568 instances were correctly predicted as normal breastfeeding practice (true positives): the actual class of the test instance is normal breastfeeding practice and the classifier correctly predicts it as such. A further 3711 instances were correctly predicted as poor breastfeeding practice (true negatives): the actual class is poor breastfeeding practice and the classifier predicts poor breastfeeding practice. The correctly classified instances are therefore the sum of the diagonal values of the table, that is, 11279 of the 11,654 instances.
In contrast, 158 instances were predicted as normal breastfeeding practice although they were in fact poor breastfeeding practice (false positives). A false positive occurs when the actual class of the test instance is poor breastfeeding practice but the classifier incorrectly predicts normal breastfeeding practice. The classifier also predicted 217 instances as poor breastfeeding practice although they were in fact normal breastfeeding practice (false negatives). A false negative occurs when the actual class of the test instance is normal breastfeeding practice but the classifier incorrectly predicts poor breastfeeding practice.
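The four counts quoted in the text can be tallied with a short sketch. The dictionary layout below is an illustrative choice, not WEKA's output format; only the four counts come from the text.

```python
# Counts quoted in the text for J48 experiment #5.
# Keys are (actual class, predicted class); the layout is an assumption of this sketch.
confusion = {
    ("NORMAL", "NORMAL"): 7568,  # true positives
    ("NORMAL", "POOR"): 217,     # false negatives
    ("POOR", "NORMAL"): 158,     # false positives
    ("POOR", "POOR"): 3711,      # true negatives
}

# Correctly classified instances lie on the diagonal (actual == predicted).
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
incorrect = sum(n for (actual, predicted), n in confusion.items() if actual != predicted)
total = correct + incorrect
```

Summing the diagonal gives the 11279 correctly classified instances, and the two off-diagonal cells together account for the remaining instances of the 11,654.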
The results in Table 4 were extracted from the Experiment #5 model. The TP rate of a class is the proportion of instances of that class whose predicted class is identical to the actual class. The FP rate of a class is the proportion of instances of the other class that are incorrectly predicted as that class.
Table 4
Detailed accuracy by class
| TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | Class |
| 0.972 | 0.041 | 0.979 | 0.972 | 0.976 | 0.992 | NORMAL |
| 0.959 | 0.028 | 0.944 | 0.959 | 0.952 | 0.992 | POOR |
Weighted Av | 0.968 | 0.037 | 0.968 | 0.968 | 0.968 | 0.992 | |
Take first the class ‘breastfeeding practice = POOR’. The TP rate is the ratio of poor breastfeeding cases predicted correctly to the total number of poor breastfeeding cases: 3711 instances were correctly predicted as poor breastfeeding practice, out of 3869 instances that were actually poor breastfeeding practice. So the TP rate (true positive rate) of poor breastfeeding practice is 3711/3869 = 0.959. The FP rate is the ratio of normal breastfeeding practice cases incorrectly predicted as poor breastfeeding practice to the total number of normal breastfeeding practice cases: 217 normal breastfeeding practice instances were predicted as poor breastfeeding practice, out of 7785 normal breastfeeding practice instances in all. So the FP rate is 217/7785 = 0.028. The same method applies to ‘breastfeeding practice = NORMAL’; as the detailed accuracy by class shows, the TP rate and FP rate of the NORMAL class are 0.972 and 0.041, respectively. The model performs well because it has high true positive rates with low false positive rates [Table 4].
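The calculation above can be repeated for both classes with a small sketch. The helper `rates` and the nested-dictionary layout are hypothetical names introduced here for illustration; the counts are those quoted in the text for experiment #5.

```python
def rates(matrix, classes, positive):
    """TP rate and FP rate for one class, treating that class as the positive one."""
    tp = matrix[positive][positive]
    actual_positive = sum(matrix[positive].values())
    # Instances of the other classes that were predicted as `positive`.
    fp = sum(matrix[c][positive] for c in classes if c != positive)
    actual_negative = sum(sum(matrix[c].values()) for c in classes if c != positive)
    return tp / actual_positive, fp / actual_negative

# matrix[actual][predicted], using the counts quoted in the text.
matrix = {
    "NORMAL": {"NORMAL": 7568, "POOR": 217},
    "POOR": {"NORMAL": 158, "POOR": 3711},
}
classes = ["NORMAL", "POOR"]

poor_tp, poor_fp = rates(matrix, classes, "POOR")
normal_tp, normal_fp = rates(matrix, classes, "NORMAL")
```

Rounded to three decimals, the sketch gives 0.959 and 0.028 for the POOR class and 0.972 and 0.041 for the NORMAL class, in agreement with the TP and FP rates in the detailed accuracy by class.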
As can be seen from the detailed accuracy by class in Table 4, the ROC (Receiver Operating Characteristic) area of this model is high (0.992). A larger area under the ROC curve indicates a more accurate model. The ROC curve plots how the classifier performs over the entire range of possible cut-off values: each point on the curve gives the true positive rate (y-axis) against the false positive rate (x-axis) that result from a particular cut-off value, as shown in Fig. 1.
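How a ROC curve and its area arise from sweeping the cut-off value can be sketched as follows. The scores and labels are hypothetical and are not output of the models in this study; the area is computed with the trapezoidal rule, which is one common way to do it and not necessarily how WEKA computes its ROC area.

```python
def roc_points(scores, labels):
    """(FP rate, TP rate) pairs obtained by sweeping the cut-off over every distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        # Every instance scoring at or above the cut-off is predicted positive.
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical predicted probabilities of the positive class and true labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
area = auc(curve)
```

In this toy example one negative instance is ranked above one positive instance, so the area is 8/9 rather than 1; a classifier that ranks every positive above every negative would reach an area of 1.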
PART Rule Induction Prediction Model output
To build the rule induction model with the PART algorithm, the WEKA software package and the same dataset were used as input. The experiments were divided into two scenarios with two test options: 10-fold cross-validation and percentage split.
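The two test options differ only in how records are assigned to training and testing. The sketch below illustrates both on record indices; the function names and the seed are hypothetical, and WEKA's own shuffling, rounding and stratification of folds may differ from this simplified, unstratified version.

```python
import random

def percentage_split(n, train_fraction, seed=1):
    """Shuffle indices 0..n-1 and cut them into a training part and a testing part."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]

def k_fold(n, k, seed=1):
    """Yield (train, test) index lists so that every record is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# 11,654 records, as in the dataset described above.
train, test = percentage_split(11654, 0.70)
folds = list(k_fold(11654, 10))
```

With 11,654 records and a 70% training share, the split leaves 3496 records for testing, while 10-fold cross-validation tests every record once across ten folds of roughly 1165 records each.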
Table 5
Experimentation result of PART Algorithms in scenarios one and two
Performance measurements | Experiments |
Scenario one | Scenario two |
#1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | #11 | #12 | #13 |
Accuracy (%) | 96.74 | 95.45 | 96.78 | 95.33 | 96.86 | 95.46 | 96.25 | 96.86 | 96.94 | 96.71 | 95.93 | 96.87 | 97.10 |
Mean absolute error | 0.04 | 0.06 | 0.04 | 0.06 | 0.04 | 0.06 | 0.04 | 0.04 | 0.03 | 0.04 | 0.04 | 0.03 | 0.03 |
Number of rules | 180 | 150 | 180 | 156 | 191 | 152 | 191 | 191 | 282 | 277 | 262 | 282 | 282 |
Time taken | 0.85 | 0.38 | 0.43 | 0.37 | 0.49 | 0.33 | 0.97 | 0.94 | 1.65 | 1.42 | 1.40 | 1.68 | 1.66 |
AV.TP rate | 0.97 | 0.96 | 0.97 | 0.96 | 0.97 | 0.96 | 0.97 | 0.97 | 0.97 | 0.97 | 0.96 | 0.97 | 0.97 |
AV.FP rate | 0.04 | 0.05 | 0.04 | 0.06 | 0.04 | 0.06 | 0.05 | 0.04 | 0.04 | 0.04 | 0.05 | 0.04 | 0.04 |
AV. Precision | 0.98 | 0.97 | 0.97 | 0.96 | 0.97 | 0.96 | 0.97 | 0.98 | 0.97 | 0.97 | 0.97 | 0.97 | 0.98 |
AV.Recall | 0.97 | 0.96 | 0.97 | 0.96 | 0.97 | 0.96 | 0.97 | 0.97 | 0.97 | 0.97 | 0.96 | 0.97 | 0.97 |
AV.ROC area | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 |
CCI | 11274 | 11124 | 11279 | 11110 | 11282 | 11126 | 3814 | 3386 | 11298 | 11271 | 11180 | 3838 | 3387 |
ICCI | 380 | 530 | 375 | 544 | 372 | 528 | 148 | 110 | 356 | 383 | 474 | 124 | 109 |
As shown in Table 5, the unpruned rule induction learner performs better than the pruned one. Among the 13 experiments, experiment #13 (70-30 percentage split) registered the best performance, 97.10%: out of the testing set of 3496 records, 3387 (97.10%) are correctly classified, while 109 (2.9%) are misclassified. Among the experiments using the pruned parameter, experiment #5 registered the best performance, with an accuracy of 96.86%.
PART Rule Induction Prediction Model Evaluation
The confusion matrix in Table 6 shows that, out of the total 2382 normal breastfeeding practice instances, 2316 (96.94%) are correctly classified in their class, while 66 (3.06%) are incorrectly classified as poor breastfeeding practice. On the other hand, out of the total 1114 poor breastfeeding practice instances, 1071 (96.04%) are correctly classified as poor breastfeeding practice and 43 (3.96%) are misclassified.
Table 6
Confusion matrix of PART algorithm with 70 − 30 percentage-split
| | Predicted breastfeeding practices |
| | Positive | Negative | Total |
Actual breastfeeding practices | Positive | 2316 | 66 | 2382 |
| Negative | 43 | 1071 | 1114 |
| Total | 2359 | 1137 | 3496 |
J48 and PART Models Accuracy Comparison
Table 7 compares the two selected classification models, J48 and PART, with their respective accuracy, precision, and numbers of instances correctly classified and misclassified.
Table 7
Performance comparison of selected best models
Types of algorithms | Accuracy (%) | Time taken (sec/) | Correctly classified | Misclassified |
J48 | 96.77 | 0.97 | 11277 | 377 |
PART | 96.86 | 0.98 | 3380 | 110 |
As shown in Table 7, the PART rule induction classifier outperforms the J48 classifier, with an accuracy of 96.86%, and it was selected as the better classifier for predicting breastfeeding practice.
Evaluation of Discovered Knowledge
About 191 rules/patterns were generated by the PART algorithm in experiment #5. To evaluate the importance of the discovered rules, that is, whether they are acceptable and whether they agree with what is already known in real-world practice, domain experts from Mekelle University Ayder Referral Hospital were consulted. Finally, 39 of the rules generated by the PART algorithm were selected as the best rules. Rule 1 to Rule 7, listed below, were selected as the most interesting rules or discovered knowledge.
Rule 1
If Amenorrheic = “no” AND Birth within 5 years interval = “one or two” AND Region = “Tigray” AND Watching Television = “no” AND Delivery Place = “Home” AND Alive = “Yes” AND Mother Educational Status = “illiterate” AND Weight of child = “Average”, then the child will have poor breastfeeding practice (87.0/3.0).
Rule 2
If Amenorrheic = “no” AND Birth within 5 years interval = “one or two” AND Pregnant = “Yes” AND Delivery Place = “Home” AND Fever = “no” AND Diarrhea = “no”, then the child will have poor breastfeeding practice (63.0/6.0).
Rule 3
If Amenorrheic = “no” AND Birth within 5 years interval = “one or two” AND Delivery Place = “Home” AND Mother Educational Status = “illiterate” AND the child lives in Amhara, Somali, Tigray, Oromiya, Affar, Gambela or Benishangul-Gumuz, then the child will have poor breastfeeding practice.
Rule 4
If Delivery Place = “Home” AND Watching Television = “yes” AND Diarrhea = “no” AND Alive = “yes”, then the child will have normal breastfeeding practice (120.0).
Rule 5
If Amenorrheic = “no” AND Birth within 5 years interval = “one or two” AND Diarrhea = “no” AND Weight of child at birth = “larger than average”, then the child will have poor breastfeeding practice (98.0).
Rule 6
If Amenorrheic = “no” AND Birth within 5 years interval = “one or two” AND Delivery Place = “private sector”, then the child will have poor breastfeeding practice (113.0/7.0).
Rule 7
If Amenorrheic = “no” AND Birth within 5 years interval = “one or two” AND Region = “Addis Ababa” AND Fever = “no”, then the child will have poor breastfeeding practice (110.0).
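Rules of this form can be applied mechanically as a decision list, in which the first rule whose conditions all hold determines the prediction. The sketch below transcribes three of the rules listed above; the attribute keys, the rule order and the example record are hypothetical choices made for illustration and do not reproduce the order of the list generated by PART.

```python
# Three of the rules above as (conditions, predicted class) pairs.
# Attribute keys and rule order are assumptions of this sketch.
RULES = [
    ({"amenorrheic": "no", "births_5y": "one or two", "pregnant": "yes",
      "delivery_place": "home", "fever": "no", "diarrhea": "no"}, "POOR"),
    ({"amenorrheic": "no", "births_5y": "one or two",
      "delivery_place": "private sector"}, "POOR"),
    ({"delivery_place": "home", "television": "yes",
      "diarrhea": "no", "alive": "yes"}, "NORMAL"),
]

def classify(record, rules=RULES, default=None):
    """Return the class of the first rule whose conditions all match the record."""
    for conditions, label in rules:
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return label
    return default  # no rule fired

# Hypothetical respondent record.
mother = {"amenorrheic": "yes", "births_5y": "one or two", "pregnant": "no",
          "delivery_place": "home", "television": "yes", "fever": "no",
          "diarrhea": "no", "alive": "yes"}
prediction = classify(mother)
```

For this record the first two rules do not fire because the mother is amenorrheic, and the home-delivery and television rule then predicts NORMAL. A full decision list would end with a default rule so that every record receives a class.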
In general, the above rules indicate that the attributes delivery place, educational status of the mother, pregnancy, watching television and the weight of the child at birth are the most important determinants of child breastfeeding practice. In contrast, the model treats attributes such as region, duration of breastfeeding, amenorrhea, place of residence, number of births within a five-year interval, child alive, diarrhea, family wealth status and fever as less important determinants of breastfeeding practice. Finally, we agreed with the general rules that the model produced and with the findings of the current research.
Use of the Discovered Knowledge
To show domain experts how to use the discovered knowledge, a user interface was designed in the Java programming language as an interaction point between the user and the system. WEKA is written in Java and contains a Graphical User Interface (GUI) for interacting with data files and producing visual results. It also has a general Application Programming Interface (API), so WEKA can be embedded in applications like any other library. Hence, a Java application was built on the selected predictive model as a decision support system for breastfeeding practice. The outputs of the prediction model are classified as NORMAL or POOR breastfeeding practice based on the attribute values entered. A model output predicting breastfeeding practice as NORMAL is shown in Fig. 2, and a model output predicting breastfeeding practice as POOR is shown in Fig. 3.