Using Machine Learning Methods to Classify Body Mass Index for Clinical Decision-Making

Background: Body mass index (BMI) is a widely used measure of overweight and obesity. The aim of this study was to develop a machine learning method for classifying BMI for clinical application. Methods: We used a dataset of 1316 people selected randomly from all areas of Ardabil city. The dataset included demographic and anthropometric data. Seven classification algorithms, Random Forest (RF), Gaussian Naïve Bayes (GNB), Decision Tree (DT), Support-Vector Machine (SVM), Multi-layer Perceptron (MLP), K-nearest neighbors (KNN) and Logistic Regression (LR), were used to classify people based on BMI data. The performance of the algorithms was evaluated with Precision, Recall, Mean Squared Error (MSE) and Accuracy. All programming was done in Python 3.7 in Jupyter Notebook. Results: According to BMI, 603 (45.8%) of all samples were normal and 713 (54.2%) were at-risk. The precision of RF, GNB, DT, SVM, MLP, KNN and LR for people at risk was 0.93, 0.86, 0.99, 0.82, 1.00, 0.82 and 0.99, respectively. The accuracy of RF, GNB, DT, SVM, MLP, KNN and LR was 95%, 83%, 100%, 82%, 100%, 82% and 100%, respectively. Conclusion: Comparison of the classification algorithms showed that LR, MLP and DT had higher accuracy than the other algorithms in detecting people at risk.


Background
Obesity and overweight are a complex, multifactorial, and major public health problem worldwide that can affect people in all age groups and increase the risk of many diseases [1][2].
Body mass index (BMI) is used to measure obesity and overweight and to detect people at risk of them [3]. BMI is defined as a person's weight in kilograms divided by the square of their height in meters (kg/m²). Following WHO reports, a BMI of less than 25 was considered normal and a BMI of more than 25 was considered at-risk of overweight and obesity. By 2030, approximately 38% of the world's elderly population is expected to be obese [2][3][4].
Obesity and overweight have recently been increasing rapidly in both developed and developing countries, and it is estimated that by 2030, owing to many factors, up to 57.8% of the world's elderly people will be overweight or obese [5][6].
The aim of this study was to investigate the classification of BMI using several machine learning algorithms.
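The BMI definition and the two-class labelling can be illustrated with a minimal sketch. The function names `bmi` and `bmi_class` are hypothetical and are not taken from the study's code; treating a BMI of exactly 25 as at-risk is also an assumption, since the text only specifies "less than 25" and "more than 25".

```python
def bmi(weight_kg, height_m):
    """Return body mass index in kg/m^2: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_class(weight_kg, height_m, threshold=25.0):
    """Label a person 'normal' (BMI below threshold) or 'at-risk' (otherwise).

    Assumption: a BMI exactly equal to the threshold is labelled 'at-risk'.
    """
    return "normal" if bmi(weight_kg, height_m) < threshold else "at-risk"


# Illustrative values only, not records from the study dataset.
print(round(bmi(70, 1.75), 2))   # 70 / 1.75**2
print(bmi_class(70, 1.75))
print(bmi_class(90, 1.75))
```

Because the label is a deterministic function of weight and height, any classifier given both features is learning a fixed boundary rather than a noisy relationship.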

Data collection method and dataset
The dataset used in this study comes from an obesity and overweight research project previously approved by Ardabil University of Medical Sciences; part of the data was published in a paper by Amani et al. [7].
The dataset contained BMI information for 1316 people from Ardabil city in 2019.
Details of this dataset are given in Table 1.

Logistic Regression (LR)
Logistic regression is a machine learning technique for classification problems that assigns observations to a discrete set of classes.

Gaussian Naive Bayes (GNB)
The Gaussian Naive Bayes classifier belongs to a group of simple probabilistic classifiers based on Bayes' theorem and on the assumption that the random variables (features) are independent.
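To make the independence assumption concrete, the following is a minimal pure-Python sketch of a Gaussian Naive Bayes classifier. It is not the implementation used in the study: the function names, the small variance-smoothing constant, and the toy (height in cm, weight in kg) rows are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # Population variance plus a tiny constant to avoid division by zero.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one row."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        # Independence assumption: per-feature log-likelihoods simply add up.
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))


# Toy data (height_cm, weight_kg); hypothetical, not from the study dataset.
toy_rows = [(160, 50), (170, 60), (180, 70), (160, 80), (170, 90), (180, 100)]
toy_labels = ["normal", "normal", "normal", "at-risk", "at-risk", "at-risk"]
toy_model = fit_gaussian_nb(toy_rows, toy_labels)
print(predict_gaussian_nb(toy_model, (172, 62)))
print(predict_gaussian_nb(toy_model, (168, 88)))
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied together.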

Decision Tree (DT)
A decision tree is a map of the possible outcomes of a series of related choices, which allows an individual or organization to weigh possible actions in terms of costs, opportunities, and benefits. In classification, each internal node tests a feature and each leaf assigns a class.

Support-Vector Machine (SVM)
The support-vector machine is a pattern recognition algorithm. It can be used wherever there is a need to identify patterns or to assign objects to specific classes.

Multi-layer Perceptron (MLP)
An artificial neural network has a structure loosely modeled on the biological neural networks of the human brain, which enables it to learn, generalize, and make decisions.

Random Forest (RF)
Random forest is an ensemble learning method for regression and classification. It builds a large number of decision trees at training time and outputs the class chosen by most trees (classification) or the average of the individual trees' predictions (regression).

K-nearest neighbors (KNN)
In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric classification method first developed by Evelyn Fix and Joseph Hodges in 1951 and later expanded by Thomas Cover. It is used for classification and regression.

Table 3 shows the structure of the confusion matrix. We compared the classification performance of all ML algorithms using Accuracy, Precision, Recall (Sensitivity), F1-score and MSE.
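As a rough illustration of how these metrics follow from the four cells of a binary confusion matrix, consider the sketch below. The function name `classification_metrics` and the example counts are hypothetical and are not results from this study; the sketch also assumes that "at-risk" is the positive class and that labels are coded 0/1 when computing MSE.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives, false negatives,
    true negatives, with "at-risk" assumed to be the positive class.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # With labels coded 0/1, every error contributes (1 - 0)^2 = 1,
    # so MSE equals the misclassification rate, i.e. 1 - accuracy.
    mse = (fp + fn) / total
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mse": mse,
    }


# Hypothetical counts for illustration only.
example = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under this 0/1 coding, MSE carries the same information as accuracy, so the two should be read together rather than as independent evidence.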
All participants were from the urban population of Ardabil city (Table 4).

Performance of machine learning algorithms
We assessed performance on the BMI dataset by Accuracy, Precision (Positive Predictive Value), Recall (Sensitivity) and F1-score. Figure 1 shows the performance of the predictive models built with the different algorithms. As shown in Fig. 1, LR and MLP, with 100%, and RF, with 97%, had the highest sensitivity. DT, LR and MLP, each with 100%, had the highest accuracy in classifying people based on BMI data.

Discussion
The main goal of this study was to demonstrate the efficacy of ML algorithms and techniques on BMI data. We used various ML algorithms to improve the classification of at-risk people based on BMI data, which could provide significant insights compared with traditional statistical models.
Among all ML models, DT, LR and MLP performed better than the others. In a comparable study of fatty liver disease using machine learning algorithms, Wu et al. found that the random forest model performed better than the other classification models, which differs somewhat from our results [8].
To our knowledge, this is the first population-based study to classify at-risk people based on BMI data using various machine learning algorithms. Many kinds of machine learning algorithms have been developed alongside the popular Bayesian algorithms and logistic regression, and it is hard to choose a proper algorithm for clinical decision-making and clinical practice [9]. The performance of the different algorithms is therefore the most important consideration, along with ease of use and interpretability of the models. Our model could effectively detect at-risk people based on BMI data without using advanced methods. In addition, the model could provide an easy, fast, low-cost, and noninvasive method for accurately detecting people with normal and abnormal BMI [10]. Given the increasing health issues related to obesity and overweight that appear in daily reports, machine learning allows massive amounts of data to be analyzed rapidly [11]. There is therefore an opportunity to apply machine learning algorithms to the classification of individual patients in medical practice, and to the treatment and control of future health and lifestyle problems. By using various machine learning prediction models, physicians and health staff could extract the minimum data necessary to make a prediction about people with normal and non-normal BMI [12].
Lee et al. reported that the accuracy of ML methods ranged from 60.4% to 73.8%, which was lower than in our study, where the accuracy of the ML algorithms ranged from 82% to 100% [13].
Uddin et al., in a study entitled "Comparing different supervised machine learning algorithms for disease prediction", showed that RF had higher accuracy than the other ML algorithms. This was not in line with our results, in which the best accuracy was achieved by DT, LR and MLP, each with 100% [14].
Bastin Takhti et al., in a study entitled "A model for diagnosis of kidney disease using machine learning techniques", showed that machine learning techniques could be effective in the diagnosis of kidney disease, similar to our findings on BMI data. Among their algorithms, the highest accuracy was achieved by SVM with 0.97, the highest recall by DT with 0.96, and the highest precision by MLP with 0.99. In our study, the highest accuracy, recall and precision of BMI classification were achieved by DT, LR and MLP, whereas the accuracy of SVM was 0.82, lower than the rate reported by Bastin Takhti et al. [15].

Conclusion
In this study, seven machine learning techniques were used to distinguish healthy people from at-risk people based on BMI data. All the algorithms worked with reasonable accuracy and speed. However, the DT, LR and MLP algorithms showed the maximum precision and minimum error among all algorithms and performed better than the other ML classification techniques. This prediction outcome has the potential to help clinicians and health system staff make more precise and meaningful decisions about people at risk of overweight and obesity, and to provide programs for decreasing their risk of disease and changing unhealthy lifestyles.

Declarations
Ethics approval and consent to participate: The dataset used here comes from another study about BMI published by the first author in JBE. The original study was approved by the Ethics Committee of Ardabil University of Medical Sciences.

Consent for publication: Yes
Availability of data and materials: Yes (BMI.useddataset.xlsx)