Application of a Machine Learning Algorithm to Predict Metabolic Syndrome in Adults

The prevalence of metabolic syndrome (MetS) is increasing worldwide. Early detection of MetS with valid and readily available indicators can help prevent, control and reduce its complications. This study aimed to identify the most important anthropometric, biochemical and nutritional indices for predicting MetS.

Methods
This study was conducted on 9,602 participants from the baseline data of the Ravansar Non-Communicable Disease (RaNCD) cohort study, which enrolled adults aged 35–65 years. The reference model for MetS was defined according to the International Diabetes Federation (IDF) criteria. We used a wrapper algorithm to select the most important predictors of MetS and the area under the ROC curve (AUC) to assess them.


Conclusion
This study demonstrated that, in addition to invasive models, models built from non-invasive components (anthropometric indices, blood pressure and energy intake) can also serve as good and convenient screening tools for predicting MetS. Beyond clinical diagnosis, these models can be widely used in research on large populations.

Background
Metabolic syndrome (MetS) is a cluster of cardio-metabolic disorders, including excess abdominal adiposity, high blood pressure, high fasting plasma glucose, low high-density lipoprotein cholesterol (HDL-C) and elevated triglycerides (TG) [1,2]. MetS is a risk factor for cardiovascular diseases (CVDs), non-alcoholic fatty liver disease (NAFLD) and type 2 diabetes [3,4]. Patients with MetS have a higher risk of developing CVDs than those without it [5]. In addition, MetS can double the mortality rate from cardiac arrest and stroke [6]. The prevalence of MetS in low- and middle-income countries ranges from 10% to 47% [7]. In a 2017 systematic review, the prevalence of MetS was 25% according to the Adult Treatment Panel III (ATP III) criteria and 30% according to the International Diabetes Federation (IDF) criteria, and it was higher in women than in men [8].
Several criteria exist for diagnosing MetS, including those of the IDF, ATP III, the World Health Organization (WHO) and the European Group for the Study of Insulin Resistance (EGIR), and each yields different results.
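Because the reference model in this study follows the IDF definition, a minimal sketch of how the IDF rule classifies a single participant may help: central obesity is required, plus at least two of raised TG, reduced HDL-C, raised blood pressure and raised fasting glucose (or treatment/diagnosis for each). The sketch assumes the IDF Europid waist cut-offs (94 cm for men, 80 cm for women) and conventional units (mg/dL); the waist cut-offs actually applied in RaNCD are not stated in this section, and the `Participant` record and `idf_mets` function are hypothetical names.

```python
from dataclasses import dataclass

# Assumption: IDF Europid waist cut-offs; RaNCD may have used
# population-specific values instead.
WAIST_CUTOFF_CM = {"male": 94.0, "female": 80.0}
HDL_CUTOFF_MG_DL = {"male": 40.0, "female": 50.0}


@dataclass
class Participant:  # hypothetical record, conventional units
    sex: str            # "male" or "female"
    waist_cm: float
    tg_mg_dl: float
    hdl_mg_dl: float
    sbp_mmhg: float
    dbp_mmhg: float
    fpg_mg_dl: float
    tg_treatment: bool = False
    hdl_treatment: bool = False
    bp_treatment: bool = False
    diagnosed_t2d: bool = False


def idf_mets(p: Participant, waist_cutoff=WAIST_CUTOFF_CM) -> bool:
    """IDF rule: central obesity plus at least two of four other criteria."""
    central_obesity = p.waist_cm >= waist_cutoff[p.sex]
    other_criteria = [
        p.tg_mg_dl >= 150 or p.tg_treatment,
        p.hdl_mg_dl < HDL_CUTOFF_MG_DL[p.sex] or p.hdl_treatment,
        p.sbp_mmhg >= 130 or p.dbp_mmhg >= 85 or p.bp_treatment,
        p.fpg_mg_dl >= 100 or p.diagnosed_t2d,
    ]
    return central_obesity and sum(other_criteria) >= 2
```

For example, `idf_mets(Participant("male", 100, 180, 35, 120, 80, 90))` returns `True` (central obesity, raised TG and reduced HDL-C), while a woman with a 75 cm waist is never classified as MetS under this rule, whatever her other values. Passing a different `waist_cutoff` dictionary is how population-specific thresholds would be plugged in.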
Epidemiological studies have suggested that convenient, low-cost anthropometric indices can be used to predict MetS. These include body mass index (BMI) and waist circumference (WC), which have been used in clinics for decades [9], as well as newer measures such as the body roundness index (BRI), a body shape index (ABSI) and the visceral adiposity index (VAI) [10,11]. In addition, diet is one of the most important factors in the development of chronic and metabolic diseases, and improving the diet by reducing inflammatory foods can decrease MetS [12]. A multi-center study in Taiwan found that energy intake could be a good predictor of MetS, with an optimal cut-off point of 26.2 kcal/kg/day [13].
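The newer indices named above are closed-form functions of routine measurements. The sketch below uses their commonly published definitions: ABSI = WC / (BMI^(2/3) × height^(1/2)) with lengths in metres, BRI = 364.2 − 365.5 × √(1 − (WC/2π)² / (0.5 × height)²), and the sex-specific VAI with TG and HDL-C in mmol/L. These formulas come from the wider literature rather than from this study, and the function names are illustrative.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def absi(waist_m: float, weight_kg: float, height_m: float) -> float:
    """A Body Shape Index: WC / (BMI^(2/3) * height^(1/2)), lengths in metres."""
    return waist_m / (bmi(weight_kg, height_m) ** (2 / 3) * height_m ** 0.5)


def bri(waist_m: float, height_m: float) -> float:
    """Body Roundness Index; waist and height must use the same unit."""
    eccentricity_sq = 1 - (waist_m / (2 * math.pi)) ** 2 / (0.5 * height_m) ** 2
    return 364.2 - 365.5 * math.sqrt(eccentricity_sq)


def vai(sex: str, waist_cm: float, bmi_value: float,
        tg_mmol_l: float, hdl_mmol_l: float) -> float:
    """Visceral Adiposity Index (sex-specific); TG and HDL-C in mmol/L."""
    if sex == "male":
        return ((waist_cm / (39.68 + 1.88 * bmi_value))
                * (tg_mmol_l / 1.03) * (1.31 / hdl_mmol_l))
    return ((waist_cm / (36.58 + 1.89 * bmi_value))
            * (tg_mmol_l / 0.81) * (1.52 / hdl_mmol_l))
```

VAI is scaled so that a value of about 1 corresponds to the reference (healthy) profile of the population used to derive it; whether those constants transfer to a Kurdish cohort is a separate question the sketch does not address.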
As mentioned above, previous studies have reported associations between MetS and anthropometric, biochemical and nutritional indices [3,14,15]. Therefore, combining these indicators in a single formula, rather than using each separately, may be more useful and effective for predicting MetS. This study aimed to identify the most important anthropometric, biochemical and nutritional indicators for predicting MetS in adults using the Boruta machine learning algorithm (a wrapper algorithm around random forest).
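Boruta judges a feature relevant if its importance is consistently higher than that of "shadow" features, which are randomly shuffled copies of the real ones. The pure-Python sketch below shows only that decision logic, under stated simplifications: the random-forest importance used by the Boruta R package is replaced by a hypothetical univariate stand-in (|AUC − 0.5| of each column), the number of iterations is fixed and rejected features are not dropped between rounds, and each feature is classified at the end with one-sided binomial tests at a Bonferroni-adjusted level. It is not the package's implementation, and all names are illustrative.

```python
import math
import random


def univariate_auc_importance(X, y):
    """Hypothetical stand-in for random-forest importance:
    |AUC - 0.5| of each column used on its own as a score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
                   for p in pos for q in neg)
        scores.append(abs(wins / (len(pos) * len(neg)) - 0.5))
    return scores


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def boruta_sketch(X, y, names, importance=univariate_auc_importance,
                  n_iter=20, alpha=0.05, seed=0):
    """Simplified Boruta: count how often each real feature beats the
    best shadow feature, then classify it with binomial tests."""
    rng = random.Random(seed)
    n_feat = len(names)
    hits = [0] * n_feat
    for _ in range(n_iter):
        # Shadow features: each real column shuffled independently,
        # which destroys any association with y.
        shadows = [[row[j] for row in X] for j in range(n_feat)]
        for col in shadows:
            rng.shuffle(col)
        X_ext = [row + [col[i] for col in shadows] for i, row in enumerate(X)]
        imp = importance(X_ext, y)
        best_shadow = max(imp[n_feat:])
        for j in range(n_feat):
            if imp[j] > best_shadow:
                hits[j] += 1
    level = alpha / n_feat  # Bonferroni adjustment across features
    decision = {}
    for j, name in enumerate(names):
        if binom_upper_tail(hits[j], n_iter) < level:
            decision[name] = "confirmed"   # beats shadows more often than chance
        elif binom_upper_tail(n_iter - hits[j], n_iter) < level:
            decision[name] = "rejected"    # beats shadows less often than chance
        else:
            decision[name] = "tentative"
    return decision
```

Because the stand-in importance looks at one column at a time, it cannot reward predictors that matter only through interactions; that is precisely what a random-forest importance adds, and why Boruta wraps a forest rather than a univariate score.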

Study design and population
For this cross-sectional study, we used the baseline data of the Ravansar Non-Communicable Disease (RaNCD) cohort study, one of the sub-studies of the national Prospective Epidemiological Research Studies in IrAN (PERSIAN) [16]. Ravansar is a city in the western part of Kermanshah Province, with a population of about 50,000. The RaNCD study enrolled 10,000 participants aged 35–65 years, approximately 75% of the eligible residents of the area. The baseline phase was completed between 2014 and 2017, and the RaNCD study protocol has been published in detail [17].

Inclusion and Exclusion Criteria
All participants in the recruitment phase of RaNCD entered this study. Given the aims of this study, participants with cancer, renal failure or kidney stones, pregnant women and cases with incomplete information were excluded.

Procedures
Demographic information (age, sex, education, alcohol use and smoking) was collected face-to-face by trained experts at the RaNCD cohort center.
Blood pressure (BP) was measured on the arm with a manometer cuff and stethoscope after 10 minutes of rest in a seated position. For biochemical markers, including TG, HDL-C, low-density lipoprotein cholesterol (LDL-C), total cholesterol (T-C) and fasting blood sugar (FBS), blood samples were collected after a 12-hour fast.
Weight (0.5 kg precision) and height (0.1 cm precision) were measured with a bioelectrical impedance analyzer (BIA; InBody 770, Biospace, Korea) and a BSM 370 (Biospace Co, Seoul, Korea), respectively. Other anthropometric measures, including BMI, body fat mass (BFM), percent body fat (BF%), fat-free mass (FFM), skeletal muscle mass (SMM), visceral fat area (VFA) and waist-to-hip ratio (WHR), were also obtained with the BIA. For WC, participants were asked to stand erect and relaxed without holding in their stomach, and a mid-waist circumference was taken at the level of the upper border of the right ilium. Wrist circumference (WrC) was measured in centimeters. Energy intake was calculated from dietary information collected with a Food Frequency Questionnaire (FFQ).
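FFQ-based energy intake is commonly computed by converting each reported frequency to a daily rate and multiplying by portion size and the energy density of the food. The sketch below assumes that scheme, a 30-day month, and a hypothetical food-composition input; the actual RaNCD FFQ items, portion sizes and composition table are not described in this section. Dividing by body weight gives kcal/kg/day, the unit in which the cut-off from the Taiwanese study [13] is expressed.

```python
from dataclasses import dataclass

# Assumed conversion of reported frequencies to times per day.
TIMES_PER_DAY = {"day": 1.0, "week": 1 / 7, "month": 1 / 30}


@dataclass
class FfqItem:  # hypothetical FFQ line
    food: str
    times: float               # number of servings reported ...
    period: str                # ... per "day", "week" or "month"
    grams_per_serving: float
    kcal_per_100g: float       # from a food-composition table


def daily_energy_kcal(items) -> float:
    """Sum over items of servings/day x grams/serving x kcal/gram."""
    return sum(
        it.times * TIMES_PER_DAY[it.period]
        * it.grams_per_serving * it.kcal_per_100g / 100
        for it in items
    )


def energy_per_kg(kcal_per_day: float, weight_kg: float) -> float:
    """Energy intake normalized by body weight (kcal/kg/day)."""
    return kcal_per_day / weight_kg
```

For instance, three 50 g servings a day of a food with 250 kcal/100 g contribute 375 kcal/day; normalizing by weight makes intakes comparable across participants of different body size.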

Statistical Methods
Continuous variables are presented as mean ± standard deviation and categorical variables as N (%). The chi-square test and the independent t-test were used to assess the associations of categorical and continuous predictor variables, respectively, with MetS status. We used a wrapper algorithm, implemented in the "Boruta" R package, to select the most important predictors of MetS. Logistic regression models were fitted with the "glm" R function to calculate the area under the ROC curve (AUC), and the "pROC" R package was used to compare the reference model statistically with the proposed, non-invasive and other models. The reference model for MetS was defined according to the IDF criteria. All statistical analyses were performed in R version 4.0.3, and the significance level was set at 0.05.

Results
In total, 9,602 participants with a mean age of 47.31 ± 8.25 years were studied. Table 1 shows the baseline characteristics of the participants according to the presence or absence of MetS. Mean BMI was significantly higher in subjects with MetS than in those without (P < 0.001). The mean indices of central obesity (WHR, WC and HP) and WrC were significantly higher in subjects with MetS than in those without (P < 0.001), as were the mean lipid profile values (TG, LDL and T-C; P < 0.001). As shown in Table 2, FBS, TG, HDL, SBP, energy intake and WC had the highest importance values (IV) for predicting MetS in women (79%, 57%, 48%, 39%, 33% and 28%, respectively). In men, TG, HDL, WC, FBS and SBP had the highest IVs (76%, 67%, 61%, 59%, 34%, 28%, respectively). Figure 1 shows the importance values for all participants.

The predictive formula of each logistic prediction model for all participants is presented in Table 5. All models had AUC > 0.6, and the models built from non-invasive components can predict the syndrome in both men and women.
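The AUC reported for each logistic model equals the probability that a randomly chosen participant with MetS receives a higher predicted probability than a randomly chosen participant without MetS, with ties counted as one half; this is the normalized Mann–Whitney U statistic. The dependency-free sketch below assumes the predicted probabilities (for example, from a fitted `glm`) are already available and that labels are coded 0/1; the function name is illustrative and this is not pROC's code.

```python
def roc_auc(scores, labels):
    """AUC as the normalized Mann-Whitney U statistic (ties count 1/2).

    labels must be 0 (non-case) or 1 (case).
    """
    n = len(scores)
    n1 = sum(1 for lab in labels if lab == 1)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one case and one non-case")
    # Assign average 1-based ranks, giving tied scores the same rank.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_cases = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    return u / (n1 * n0)
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` gives 0.75: three of the four case/non-case pairs are ordered correctly. The rank-based form runs in O(n log n), which matters for a cohort of 9,602 participants compared with checking every pair.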

Discussion
According to the findings of this study, all models built from the predictors selected with the Boruta machine learning algorithm (a wrapper algorithm around random forest) had good predictive power for MetS in both women and men, based on ROC curve analysis (AUC > 0.6). The models that included biochemical indices (FBS, TG and HDL-C) had higher predictive power, almost equal to that of the reference model. Nevertheless, models consisting of non-invasive components (anthropometric indices, blood pressure and energy intake), although they had lower AUCs than the invasive models, can also serve as good and convenient screening tools for predicting MetS. In addition, the IV of each model component was confirmed with the Boruta algorithm before the models were built.
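Saying that a model is almost equal to the reference model amounts to comparing two AUCs measured on the same participants, which calls for a paired test. pROC's `roc.test` uses DeLong's method by default for paired ROC curves, but this section does not state which method was used, so the following is only a sketch of DeLong's test with illustrative names. It compares every case with every non-case, which is fine for illustration but slow at cohort scale, and it needs at least two cases and two non-cases.

```python
import math


def _psi(x: float, y: float) -> float:
    """Kernel: 1 if the case outranks the non-case, 0.5 for a tie, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(scores, labels):
    """DeLong structural components V10 (per case) and V01 (per non-case)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs on the same subjects.

    Returns (auc_a, auc_b, z, p_value).
    """
    v10a, v01a = _placements(scores_a, labels)
    v10b, v01b = _placements(scores_b, labels)
    auc_a = sum(v10a) / len(v10a)
    auc_b = sum(v10b) / len(v10b)
    n1, n0 = len(v10a), len(v01a)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / n1
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n0)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p_value
```

The covariance terms are what distinguish this from comparing two independent AUCs: because both models score the same people, their errors are correlated, and ignoring that correlation would usually overstate the variance of the difference.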
In the present study, four models consisting of non-invasive indicators were identified for women and men that, with high predictive power (AUC > 0.6), can serve as good tools for predicting MetS. These models combine anthropometric indicators, blood pressure, age and calorie intake. Model 1 in Table 5, which includes the five components age, WC, BMI, SBP and DBP, is the best predictive model for MetS in both sexes (AUC: 0.756), with IVs of 55% and 49% for WC and SBP, respectively.
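Each model in Table 5 is a logistic regression formula, so a participant's predicted probability of MetS is obtained by passing the linear predictor through the logistic function, p = 1 / (1 + exp(−(β0 + Σ βk·xk))). The coefficients are given in Table 5 and are not reproduced here; the sketch below therefore takes them as input. The variable names (matching the Model 1 components) and the example values are placeholders, not the study's estimates.

```python
import math


def predicted_probability(intercept: float, coefs: dict, values: dict) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    eta = intercept + sum(b * values[name] for name, b in coefs.items())
    # Numerically stable logistic function for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Placeholder coefficients for a Model 1-style formula (NOT from Table 5).
model1_coefs = {"age": 0.0, "wc": 0.0, "bmi": 0.0, "sbp": 0.0, "dbp": 0.0}
participant = {"age": 50, "wc": 95, "bmi": 27, "sbp": 125, "dbp": 80}
```

With the all-zero placeholders, `predicted_probability(0.0, model1_coefs, participant)` returns 0.5; substituting the published coefficients would give the model's actual risk estimate, which can then be thresholded for screening.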
Another model confirmed to be predictive in both sexes includes the five components age, BMI, SBP, DBP and energy intake, with an AUC of 0.747 (Table 5, model 4). A noteworthy point in this model is the importance of daily energy intake in the occurrence of MetS. A limitation of this formula, however, is that calculating a person's energy intake is time-consuming, so models that include energy intake are better suited to research purposes. Further studies are needed to confirm these models definitively. In general, owing to their convenience, applicability and availability in different settings, the models identified in this study may compete with the reference model (IDF) in clinical practice and research.
Our findings showed that FBS, TG, HDL, SBP and WC had the highest IVs for predicting MetS; these are components of the reference MetS definition (IDF), and their validity has already been established. We also found that indices of general and central obesity (BMI, WC and WHR) had high IVs. A study of 9,746 participants found that BMI, WC and WHR were valid predictors of MetS risk in adults, with BMI having the highest predictive power (AUC: 0.78; 95% CI: 0.77, 0.80) [18]. Meta-analyses have also reported these indicators as reliable tools for predicting CVDs and MetS [19,20].
In this study, the indices used to assess body fat, including BFM, VFA, FFM and PBF, had relatively high IVs.
Previous research has reported associations between some of the indicators examined in this study and MetS risk, and a few studies have identified these indicators as reliable predictors of MetS. For example, one study introduced three anthropometric indices of body fat, namely fat mass index (FMI), PBF and VFA, as strong predictors of MetS risk (AUC > 0.6) [15]. A cross-sectional analysis of data from the RaNCD prospective study showed that the visceral adiposity index (VAI), a relatively new indicator composed of biochemical and anthropometric indices (WC, BMI, TG and HDL-C), can be used as a valid tool for the early detection of MetS [10,21]. A study by Pekgor S et al. demonstrated that VAI is a good tool for predicting insulin resistance (AUC: 0.7) and MetS risk (AUC: 0.8) [22]. The BRI, which is based on height and WC, is a good indicator of body fat and a valid tool for the early detection of cardio-metabolic risk [10,11].
In addition, associations have been observed between high intakes of energy, carbohydrate and fat and MetS [13,23,24]. In a 90-day intervention, Trevino et al. showed that feeding rats a high-calorie diet led to MetS [25]. Similarly, in the present study energy intake had an acceptable IV for predicting MetS in both men and women, with a higher score in women. Overall, these studies support the use of these indicators in the models of this study. Since each of these indicators alone has previously been reported to predict MetS, we combined several of them in a single formula and found that their predictive power increased.
Although the results of this study are of interest, the study has some limitations. It was conducted on a Kurdish population; therefore, further studies in populations of different ethnicities are needed to confirm our models. We did not find a similar study that provided a new criterion for predicting MetS, so to further confirm these results we suggest more studies using a variety of computational approaches. The large sample size of the RaNCD cohort study and the use of an advanced machine learning algorithm are the strengths of this study.

Conclusion
According to the findings of this study, all models built from the predictors selected with the wrapper algorithm had good predictive power for MetS in women and men, as assessed by the area under the ROC curve (AUC). The models that included biochemical indices (FBS, TG and HDL-C) had higher predictive power, almost equal to that of the reference model. Nevertheless, models consisting of non-invasive components (anthropometric indices, blood pressure and energy intake), although they had lower AUCs than the invasive models, can also serve as good and convenient screening tools for predicting MetS. Moreover, the IV of each model component was confirmed before the models were built.
Given the increasing prevalence of MetS, access to various tools for predicting it can play an important role in early diagnosis and in controlling its complications. Beyond clinical diagnosis, these models can be widely used in research on large populations.

Declarations
Ethics approval and consent to participate
The study was approved by the ethics committee of Kermanshah University of Medical Sciences (IR.KUMS.REC.1399.307). Oral and written informed consent was obtained from all participants.

Consent for publication
Not applicable.