Developing cardiometabolic risk classifiers for youth using handgrip strength, anthropometrics, and demographics: a machine learning approach leveraging National Health and Nutrition Examination Survey Data


 Background Handgrip strength is associated with cardiometabolic (CMB) risk. Health-related physical fitness (HRPF) testing in schools does not explicitly predict CMB disease risk and commonly engenders anxiety, teasing, and taunting by peers. Further, school-based screening for hyperinsulinemia using acanthosis nigricans likely misses nascent CMB risk factors like dyslipidemia. This study examined the feasibility of leveraging anthropometric, demographic, and handgrip strength data to build optimal CMB risk classifiers. Methods The 2011-2014 National Health and Nutrition Examination Survey data from participants aged 12-18 years (n = 402; 205 males) (15.4 ± 1.8 years; 167.4 ± 9.1 cm; 73.2 ± 21.5 kg) who performed bilateral handgrip strength and CMB testing were leveraged. CMB risk was delineated as a clustering of three or more risk factors across weight status, mean systolic blood pressure, mean diastolic blood pressure, HDL-cholesterol, LDL-cholesterol, total cholesterol, triglycerides, fasting glucose, and HOMA-IR. Eighty percent of the balanced dataset was used to train several models (e.g., Decision Tree and K-Nearest Neighbors (KNN)), while 20% was retained for further validation. There were 18 initial features, including age, sex, race, BMI, and combined handgrip strength. SelectKBest, Recursive Feature Elimination, and Random Forest were deployed to identify the most salient features. Results Resulting models were evaluated using performance metrics such as Area Under the Curve (AUC), recall, and precision. The most salient model was a Quadratic Discriminant model involving five features, namely number of people in household, annual household income, number of children 5 years or younger, combined handgrip strength, and waist circumference. When deployed, the model accurately classified 83% and 93% of the positive and negative classes within the test data, respectively (accuracy = 81.7%; AUC = 0.87; Recall = 0.91; Precision = 0.81; F-Measure = 0.92). 
Conclusions Findings demonstrate that demographics, anthropometrics, and handgrip strength can be leveraged (using machine learning techniques) to accurately predict and optimally identify nascent CMB risk in youth while mitigating peer shaming and optimizing student participation in HRPF surveillance protocols in schools. Additional studies are needed to externally validate the resulting models and to investigate related effects on participation in HRPF testing and CMB risk detection among children and youth.

Background
Muscular fitness (i.e., strength, endurance, and power) is the capacity to perform work against one's body weight or other external resistance for a relatively sustained duration [1]. Muscular fitness is commonly evaluated using surrogates, including handgrip strength, push-up, and curl-up tests. Skeletal muscle is integral to protein synthesis throughout the body; as such, altered muscle metabolism often underlies many common pathologies and chronic diseases like diabetes [2]. Muscular fitness is linked with cardiorespiratory fitness, weight status, and cardiovascular disease risk [3][4][5]. Particularly, normalized handgrip strength (a surrogate for muscular fitness) is inversely associated with clustered cardiometabolic (CMB) risk, including central adiposity and hypertension, in youth [6][7][8]. However, despite the important role that monitoring CMB risk clustering could play in identifying, preventing, and controlling disease onset, current youth health-related physical fitness (HRPF) assessments like the so-called FITNESSGRAM have yet to leverage features such as handgrip strength, or machine learning techniques, to streamline HRPF testing and consolidate chronic disease risk screenings in schools. Consequently, these assessments do not explicitly predict CMB risk and remain highly time-consuming and burdensome for physical education (PE) teachers.
Optimal muscular fitness and muscle metabolic functioning are critical to preventing chronic diseases.
Texas passed Senate Bill (SB) 530 in 2007, requiring annual HRPF assessments of public school students in third through twelfth grades (using FITNESSGRAM® protocols) [9]. However, there is evidence that some parents and children have reservations about the test [10]. Specifically, 26% of surveyed PE teachers reported negative impressions of physical fitness testing in Texas schools, noting that performing test items in front of peers engendered anxiety, teasing, and taunting (both for poor performance and for trying hard). Some students missed school to avoid testing, and parents called and sent notes requesting that their children be excluded from testing because of teasing. Some children cried over what they perceived as poor performance on HRPF tests (e.g., inability to complete a push-up). Therefore, compounded by persistent issues such as low recognition of PE as a course among school personnel and teachers withholding students from PE as punishment or for additional instruction time, it is unclear whether programs like FITNESSGRAM® testing actually inform policies to promote physical activity and school health services [11,12].
Hispanic/Latino children are less active at home and during recess at school than their White peers [13][14][15][16]. In fact, they averaged only 35 minutes of moderate-to-vigorous physical activity (MVPA), considerably lower than the recommended 60 or more daily minutes of MVPA for children and youth [17,18]. Hispanic/Latino children are disproportionately affected by obesity and chronic diseases such as diabetes [19][20][21]. Further, there was pervasive co-prevalence of muscular fitness deficits (42.3%) and overweight and obesity (40%) among Hispanic/Latino youth in Corpus Christi, Texas [22].
Hispanic/Latino youth are more likely to have high abdominal adiposity, elevated triglycerides, and increased chronic disease risk [19][20][21][23]. Ironically, Hispanic/Latino youth were provided fewer school-based health services, including identification of chronic health conditions, student tracking over time, and referral to community-based care [24]. Because cardiovascular events and related morbidity are relatively rare in children, CMB indicators are commonly used to assess risk [25]. Studies have used clustering of a range of CMB markers, including systolic blood pressure, waist circumference, total cholesterol, insulin resistance, and triglycerides to assess CMB risk in children and youth [25,26].
Like many states, Texas screens for type 2 diabetes in schools by evaluating students for acanthosis nigricans, a dermatologic hyperpigmentation that sometimes results from hyperinsulinemia and insulin resistance [27]. However, this screening likely misses nascent (i.e., without physical manifestations) metabolic disease risk factors like dyslipidemia and hypertension. Given the issues around current HRPF testing, it is imperative to examine additional surrogates that are efficient and tractable in schools, and to develop classifiers that explicitly predict CMB risk, especially in medically underserved communities where many children may be uninsured or underinsured and not routinely seen by a pediatrician. Considering the need for early prediabetes detection [28], it is also critical to examine any potential contributions of race and other demographics to predicting and classifying CMB risk.
Machine learning approaches have been commonly applied to classification problems involving disease risk prediction owing to their capacity to leverage several different methods and to identify multivariate interactions and patterns that are optimally predictive of specified endpoints [29]. Supervised learning has been used to classify fundamental locomotor skills (e.g., hopping, running) [30] and activity type (e.g., walking, standing stationary), to predict physical activity patterns in older adults [31], and to classify obesity among youth [32]. Although studies have approached prediction problems using predetermined classification methods, inherent peculiarities around shared contexts (e.g., ecological and sociocultural) suggest it may be important to explore several models and identify the best-performing ones for the specific problem and dataset. Further, there are no theoretical methods to determine the sample size required to effectively train machine learning models [33]. These dataset attributes underlie variations in performance between different classification algorithms and methods. For example, prior research found that Decision Tree and Support Vector Machine (SVM) outperformed Bayesian and Neural Networks at classifying childhood obesity using features that included push-up test, partial curl-up, and step-up in 12-year-old Malaysian children [32]. Similarly, Decision Tree outperformed Bayesian methods at classifying obesity in children after age two years [34]. Relatedly, it is advantageous to leverage different feature selection techniques [35] (e.g., filter, wrapper, and embedded methods), because doing so broadens the range of important feature combinations considered and decreases model complexity [36].
The problem of developing classifiers from datasets with imbalanced classes has gained some attention in the literature. Broadly, a dataset is described as imbalanced if the discrete categories to be classified are not roughly equally represented. Significant underrepresentation of the minority class can skew model learning and result in poor accuracy for predicting the minority class [37]. While a few different methods have been recommended to address class imbalance, under-sampling invariably implies loss of data, which is not optimal, especially when the dataset is low-dimensional and has relatively few data points to begin with. The Synthetic Minority Oversampling Technique (SMOTE) has been applied to data imbalance problems [29,38]. Depending on their inherent rules, some classification algorithms (e.g., Naïve Bayes) may tolerate data imbalance, whereas SMOTE-augmented data, which decrease variability between observations, may decrease minority class prediction performance metrics [37,39].
It is widely recognized that it is critical to promote muscular fitness in children through moderate-to-vigorous physical activity [40] and deliberate integrative neuromuscular training, which has been shown to improve muscular fitness by increasing neuromuscular capacity and physical/motor competence in youth [41]. However, there are issues around HRPF testing, including the apparent emphasis on performance, which can foment peer shaming and dissuade student participation.
The purpose of this study was to examine the feasibility of developing highly accurate models to predict and classify CMB risk in youth using features across demographic, anthropometric, and handgrip strength (i.e., muscular strength) data in a nationally representative youth sample.

Participants
This study leveraged cross-sectional data from the 2011-2014 National Health and Nutrition Examination Survey (NHANES). Data were collected across the United States through electronic surveys and at mobile examination centers. National Center for Health Statistics personnel collected data in periodic cycles (May 1 through October 31 and November 1 through April 30) from 2011 to 2014. A total of 19,346 original participant records were screened and delimited by age. Of this sample, 8,322 were aged 0-19 years. However, blood was only drawn from participants aged 12 years and older who were tested during a morning session. Therefore, only 402 records of participants aged 12-18 years had associated demographic data, CMB markers, and muscular strength data. The Texas A&M University-Corpus Christi Institutional Review Board approved this study (TAMU-CC-IRB-2020-02-026).

Anthropometrics
Standing height and body weight were measured to the nearest 0.1 cm and 0.1 kg, respectively. Standardized BMI z-scores were then calculated to determine the respective percentiles for age and sex according to the Centers for Disease Control and Prevention (CDC) BMI-for-age growth charts [42]. Underweight, healthy weight, overweight, and obesity were defined as BMI < 5th percentile, 5th ≤ BMI < 85th percentile, 85th ≤ BMI < 95th percentile, and BMI ≥ 95th percentile, respectively [42,43]. Waist circumference was measured as the distance around the waist (using a pre-marked reference point that coincides with the iliac crest) to the nearest 0.1 cm at the end of normal expiration while standing, using a retractable steel measuring tape. Sagittal abdominal diameter was measured as the anteroposterior (front-to-back) depth of the abdomen at the pre-marked reference point at the level of the iliac crest, to the nearest 0.1 cm at the end of normal expiration, while participants lay supine, using a Holtain-Kahn caliper. Additional details of procedures for anthropometric data collection are provided in the NHANES Anthropometry Procedures Manual [44].
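The weight-status rule above can be expressed as a short sketch. It assumes that the CDC LMS reference parameters (L, M, S) for the participant's age in months and sex have already been looked up from the CDC growth chart tables, which are not reproduced here; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index (kg/m^2) from body weight (kg) and standing height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def lms_z(value: float, L: float, M: float, S: float) -> float:
    """CDC LMS z-score; L, M, S are the age- and sex-specific reference parameters."""
    if L == 0:
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)


def percentile_from_z(z: float) -> float:
    """Convert a z-score to a percentile of the standard normal distribution."""
    return 100.0 * NormalDist().cdf(z)


def weight_status(bmi_percentile: float) -> str:
    """Map a BMI-for-age percentile to the CDC weight-status categories."""
    if bmi_percentile < 5:
        return "underweight"
    if bmi_percentile < 85:
        return "healthy weight"
    if bmi_percentile < 95:
        return "overweight"
    return "obesity"
```

For example, the sample's mean height and weight (167.4 cm, 73.2 kg) give a BMI of about 26.1 kg/m²; the weight category then depends on that participant's age- and sex-specific percentile, not on the raw BMI.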

Cardiometabolic Measures
Blood was collected by a trained phlebotomist after a minimum 9-hour fast. Blood specimens were initially processed, stored frozen (−30 °C), and subsequently sent to the University of Minnesota, Minneapolis, MN, for analysis. Details of laboratory quality assurance and monitoring are previously outlined [45]. Blood lipids, fasting blood glucose, and insulin were measured. Additional details of procedures for CMB measures are provided in the NHANES Anthropometry Procedures Manual [44].
The homeostatic model assessment of insulin resistance (HOMA-IR), an index of insulin resistance, was computed using the HOMA2 Calculator (Oxford, England) [46]. CMB risk was delineated as a cluster of three or more risk factors across the following measures: mean systolic blood pressure, mean diastolic blood pressure, HDL-cholesterol (mg/dL), LDL-cholesterol (mg/dL), total cholesterol (mg/dL), insulin (pmol/L), triglycerides (mg/dL), and fasting glucose (mg/dL). Systolic blood pressure less than 120 mmHg is normal, 120 to 139 mmHg is prehypertension, and greater than 139 mmHg is hypertension [47]. Similarly, diastolic blood pressure less than 80 mmHg is normal, 80 to 89 mmHg is prehypertension, and greater than 89 mmHg is hypertension [47]. Total cholesterol less than 200 mg/dL is normal, 200 to 239 mg/dL is borderline high, and 240 mg/dL or greater is high [47]. HDL greater than 45 mg/dL is normal, 40 to 45 mg/dL is borderline low, and less than 40 mg/dL is low [48,49]. LDL less than 110 mg/dL is normal, 110 to 130 mg/dL is borderline high, and greater than 130 mg/dL is high. Triglycerides less than 90 mg/dL are normal, 90 to 129 mg/dL are borderline high, and 130 mg/dL or greater are high [49]. Glucose 3.0 to 25.0 mmol/L and insulin 20 to 400 pmol/L are considered normal. Because HOMA-IR does not have a universally agreed-upon cutoff, especially among youth, a score equal to or greater than the 90th percentile of the current sample (i.e., 27) was considered high. Because objective scans of body fat content were not available in the original NHANES dataset, obesity (determined using CDC growth charts) was deemed an additional CMB risk factor [28], such that observations with two individual risk factors across the lipid, blood pressure, glucose, and insulin profiles were deemed to have CMB risk if they were obese. This increased the percentage of the sample with CMB risk from the initial 12% to 28%.
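The clustering rule described above can be sketched as a small labeling function. This is a minimal illustration, not the authors' code: it assumes that only the highest category of each marker (hypertension, high total cholesterol, low HDL, high LDL, high triglycerides) counts as a risk factor, it omits the glucose and insulin criteria for brevity, and the `Profile` fields and function names are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Profile:
    sbp: float            # mean systolic blood pressure, mmHg
    dbp: float            # mean diastolic blood pressure, mmHg
    total_chol: float     # mg/dL
    hdl: float            # mg/dL
    ldl: float            # mg/dL
    triglycerides: float  # mg/dL
    homa_ir: float        # HOMA2-IR
    obese: bool           # BMI >= 95th percentile for age and sex


def homa_cutoff(homa_values) -> float:
    """Sample-specific HOMA-IR cutoff: the 90th percentile of the current sample."""
    return float(np.percentile(homa_values, 90))


def count_risk_factors(p: Profile, homa_cut: float) -> int:
    # Assumption: borderline/prehypertension categories do not count.
    flags = [
        p.sbp > 139,             # hypertension (systolic)
        p.dbp > 89,              # hypertension (diastolic)
        p.total_chol >= 240,     # high total cholesterol
        p.hdl < 40,              # low HDL
        p.ldl > 130,             # high LDL
        p.triglycerides >= 130,  # high triglycerides
        p.homa_ir >= homa_cut,   # HOMA-IR at or above the sample 90th percentile
    ]
    return sum(flags)


def cmb_risk_label(p: Profile, homa_cut: float) -> int:
    """1 = "At Risk": three or more factors, or two factors plus obesity."""
    n = count_risk_factors(p, homa_cut)
    return int(n >= 3 or (p.obese and n >= 2))
```

Computing the HOMA-IR cutoff from the sample itself mirrors the text's choice of a sample-specific 90th percentile; it also means the label for one participant depends on the rest of the sample.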

Handgrip Strength
Muscle strength was examined using the NHANES handgrip test, developed in collaboration with the National Cancer Institute to provide nationally representative data on muscle strength so that associations between muscle strength and risk factors such as obesity and CMB risk can be studied. The isometric grip strength test was administered using a Takei Digital Grip Strength Dynamometer (T.K.K.5401 Grip-D; Takei, Niigata, Japan). After the handgrip dynamometer was calibrated and adjusted for grip size, participants were asked to squeeze the dynamometer as hard as possible with each hand in a standing or seated position. For the handgrip test, participants were instructed to grasp the dynamometer between the fingers and palm at the base of the thumb, stand upright with the feet shoulder-width apart, and maintain a neutral wrist with the device pointing downwards (at the level of the thigh) without touching the body. Participants were instructed to look straight ahead, inhale prior to squeezing, squeeze with the palm facing the thigh, and exhale while squeezing. To ensure maximal effort, participants were instructed to squeeze as hard as they could until they could not squeeze any harder. Each hand was tested three times, alternating hands, which provided 1 minute of rest between trials on the same hand. Efforts were judged to be maximal if squeezing was observably accompanied by slight shaking. Although all participants aged 6 years and older were tested, only participants aged 12-18 years without prior hand or wrist surgery who stood unassisted for the duration of the test were included in this study. Further, participants were excluded if they indicated any hand pain or sat during muscle strength testing. Participants were also excluded if they were unable to flex the second interphalangeal joint of the index finger (on the hand being tested) to 90°.
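The section does not restate how "combined handgrip strength" is derived from the three trials per hand. NHANES reports a combined grip strength variable defined as the sum of the largest reading from each hand; the sketch below assumes that definition and also encodes the inclusion criteria listed above. Function and parameter names are hypothetical.

```python
def combined_grip(right_kg, left_kg):
    """Combined grip strength (kg): the best trial on each hand, summed.

    Assumes the NHANES definition (largest reading per hand); trials that
    were not completed should be filtered out before calling this function.
    """
    if not right_kg or not left_kg:
        raise ValueError("need at least one valid trial for each hand")
    return max(right_kg) + max(left_kg)


def eligible(age, prior_hand_surgery, stood_unassisted, hand_pain, index_flexes_to_90):
    """Inclusion criteria described in the text for this study's grip sample."""
    return (
        12 <= age <= 18
        and not prior_hand_surgery
        and stood_unassisted
        and not hand_pain
        and index_flexes_to_90
    )
```

Taking the best trial per hand, rather than the mean, matches the instruction to squeeze maximally and makes the measure less sensitive to a single submaximal attempt.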

Data Analysis
The 2011-2014 NHANES transport files were accessed in February 2020 using the SAS Universal Viewer (SAS, Cary, NC), and the associated data were saved as CSV files. Further data reduction and processing were done in Excel (Microsoft Corporation, Redmond, WA) and MATLAB R2019b (MathWorks, Natick, MA). There were 16 initial features, namely gender, age (years), race, number of people in the household, number of people in the family, number of children 5 years or younger, number of children 6-17 years, annual household income, annual family income, ratio of family income to poverty, body weight (kg), height (cm), BMI (kg/m²), waist circumference, average sagittal abdominal diameter, and combined handgrip strength. Previously, absolute handgrip strength was not associated with metabolic syndrome in male and female adults, whereas handgrip strength normalized to body weight and to BMI both were [50]. Therefore, combined handgrip strength was normalized to body weight and to BMI in this study, resulting in 18 total features. Missing data points were imputed using the median value of the respective weight class, age, and gender group. Categorical variables, namely gender and race, were kept as discretized in the original dataset (e.g., male = 1; female = 2). There were 402 eligible records (298 negative and 104 positive cases). Twenty percent of the dataset (i.e., 40 positive and 40 negative records) was held out as the test set for further internal validation. All 18 predictors (i.e., features) were systematically combined, and their capacity to separate the classes was visually examined using scatter plots.
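A sketch of the preprocessing steps described above, using pandas and NumPy: group-median imputation by weight class, age, and gender; normalization of combined grip strength to body weight and to BMI; and a hold-out of 40 records per class. The column names (`weight_class`, `age`, `sex`, `combined_grip`, `weight_kg`, `bmi`) are hypothetical, and random selection of the hold-out records is an assumption because the section does not say how they were chosen.

```python
import numpy as np
import pandas as pd


def impute_group_median(df, cols, by=("weight_class", "age", "sex")):
    """Fill missing values with the median of rows sharing weight class, age and sex."""
    out = df.copy()
    for col in cols:
        group_median = out.groupby(list(by))[col].transform("median")
        out[col] = out[col].fillna(group_median)
    return out


def add_normalized_grip(df):
    """Add combined grip strength normalized to body weight and to BMI."""
    out = df.copy()
    out["grip_per_kg"] = out["combined_grip"] / out["weight_kg"]
    out["grip_per_bmi"] = out["combined_grip"] / out["bmi"]
    return out


def holdout_per_class(y, n_per_class=40, seed=0):
    """Boolean mask selecting n_per_class random rows of each class for the test set."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    mask = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        mask[rng.choice(idx, size=n_per_class, replace=False)] = True
    return mask
```

Imputing before normalizing keeps the two derived grip features consistent with the imputed raw value; a group with no observed values would still leave a missing entry, which would need a fallback such as a coarser grouping.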
In this study, "0" represented the negative class (i.e., "Not At Risk" for CMB disease) and "1" represented the positive class (i.e., "At Risk" for CMB disease). Approximately 72% of the total original observations did not have CMB risk. Such imbalance in the distribution of target classes can adversely impact the performance of classification models [38]. Also, because the cost of misclassifying observations with CMB risk as "Not At Risk" far exceeds that of the reverse error, it was important to oversample the minority class to mitigate any potential effects of data imbalance on model training. Therefore, the Synthetic Minority Over-Sampling Technique (SMOTE) was implemented [29,38,39]. SMOTE generates each new data point by multiplying the difference vector between a reference minority data point and one of its nearest minority-class neighbors by a random number between 0 and 1 and adding the resulting vector to the original (i.e., non-synthetic) reference point [39]. Considering the class distribution ratio of approximately 4:1 (i.e., 258 negative to 64 positive class records) in the training set, the SMOTE package was implemented in Python 3.7 (Python Software Foundation, Wilmington, DE) to resolve the imbalance. As such, the 64 positive cases (minority class) were oversampled by 400%. This resulted in 257 minority class observations and a total of 514 balanced records.
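The interpolation step described above can be written in a few lines of NumPy. This is a minimal from-scratch sketch of SMOTE-style oversampling, not the Python SMOTE package used in the study; it balances the classes exactly one-to-one, which is an assumption, and `k=5` neighbors is a common default rather than a value reported here. Function names are hypothetical.

```python
import numpy as np


def smote_sample(X_min, n_new, k=5, seed=None):
    """Generate n_new synthetic minority rows by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)  # cannot use more neighbours than other minority rows
    # Pairwise Euclidean distances among minority rows only
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                    # reference rows
    pick = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                             # uniform in [0, 1)
    # New point = reference + gap * (neighbour - reference)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


def balance_with_smote(X, y, minority=1, k=5, seed=None):
    """Append synthetic minority rows until both classes have equal counts."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_new = int((y != minority).sum() - (y == minority).sum())
    if n_new <= 0:
        return X, y
    X_syn = smote_sample(X[y == minority], n_new, k=k, seed=seed)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return X_bal, y_bal
```

Because each synthetic point lies on the segment between two existing minority points, the generated data never leave the region spanned by the original minority records; this is why SMOTE can reduce variability between observations, and why it can blur class boundaries when a minority point's nearest neighbors sit near majority-class points.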
Features were narrowed down to the five most salient using three different feature selection methods, i.e., filter (SelectKBest), wrapper (Recursive Feature Elimination), and embedded (Random Forest) [36] (Table 2). The respective feature selection packages were implemented in Python. Subsequently, domain knowledge around correlates of obesity (a strong risk factor for CMB diseases) and the practicalities of school health-related fitness testing was leveraged to select the features most appropriate for the classification problem at hand. Classifiers were then developed in the MATLAB Classification Learner app, first using the balanced dataset. Several models were fit using a variety of algorithms, including Decision Tree, Support Vector Machine (SVM), Naïve Bayes, and Ensemble. Five-fold cross-validation was employed during training to guard against overfitting.
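The three feature selection families and the cross-validation step can be approximated with scikit-learn. This sketch is an analog built on assumptions rather than the study's pipeline: the ANOVA F-score for SelectKBest, logistic regression as the RFE estimator, impurity-based Random Forest importances, and scikit-learn's `QuadraticDiscriminantAnalysis` standing in for MATLAB's Quadratic Discriminant model are all illustrative choices, and the function names are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_top_k(X, y, k=5, seed=0):
    """Return column indices chosen by a filter, a wrapper and an embedded method."""
    # Filter: ANOVA F-test scores each feature on its own (ignores interactions)
    kbest = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    # Wrapper: RFE refits the estimator and drops the weakest feature each round
    Xs = StandardScaler().fit_transform(X)  # put coefficients on a common scale
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k, step=1).fit(Xs, y)
    # Embedded: impurity-based importances learned while fitting a random forest
    rf = RandomForestClassifier(n_estimators=300, random_state=seed).fit(X, y)
    return {
        "filter": np.flatnonzero(kbest.get_support()),
        "wrapper": np.flatnonzero(rfe.support_),
        "embedded": np.sort(np.argsort(rf.feature_importances_)[::-1][:k]),
    }


def cv_auc(X, y, cols, seed=0):
    """Mean 5-fold cross-validated ROC AUC of a QDA classifier on the chosen columns."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    model = make_pipeline(StandardScaler(), QuadraticDiscriminantAnalysis())
    return cross_val_score(model, X[:, cols], y, cv=cv, scoring="roc_auc").mean()
```

Comparing the three index sets on the same data makes the contrast discussed in the paper concrete: the filter scores features one at a time, so two strongly related features (such as absolute and weight-normalized grip strength) can both be retained, whereas RFE re-evaluates features jointly after each elimination step.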
Resulting models were evaluated using Receiver Operating Characteristic (ROC) curve analyses. Accuracy, the associated Area Under the Curve (AUC) (where AUC ≥ 0.8 indicates good discrimination), the True Positive Rate (TPR) (i.e., sensitivity or recall), and the False Positive Rate (FPR) (i.e., 1 − specificity) indicated model performance. Overall, model salience was judged considering the magnitudes of recall, precision, and F-Measure, and performance when deployed to classify the test data. Precision refers to the capacity to identify only the relevant cases, while recall is the capacity to identify all cases of interest within a dataset. Maximizing precision decreases the incidence of false positives, while maximizing recall reduces the instances of false negatives. The F-Measure (the harmonic mean of precision and recall) was also adopted because it penalizes extreme values of precision and recall.
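The metrics above follow directly from the confusion matrix. A dependency-free sketch (function names are hypothetical) is shown below; the AUC uses the rank (Mann-Whitney) formulation, i.e., the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = "At Risk"."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0      # TPR / sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0         # 1 - specificity
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)     # harmonic mean
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "fpr": fpr, "f_measure": f_measure}


def roc_auc(y_true, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The harmonic mean is what produces the penalty on extreme values: precision 0.8 with recall 0.8 gives an F-Measure of 0.8, whereas precision 1.0 with recall 0.2 gives only about 0.33.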

Statistical Analysis
Spearman and Pearson bivariate correlations were calculated and examined (Table 1 in the supplement). Two or more related features with a correlation coefficient of 0.9 or greater (i.e., above a maximum threshold of 0.899) were considered collinear. Correlation coefficients were considered significant at the 0.05 level (2-tailed), i.e., P < .05.
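A greedy collinearity filter using the 0.9 cutoff might look like the following. The section does not specify which member of a collinear pair was dropped, so keeping the first-listed feature is an assumption, as is the function name.

```python
import pandas as pd


def drop_collinear(df: pd.DataFrame, threshold: float = 0.9, method: str = "spearman") -> list:
    """Greedy filter: keep a feature unless |r| >= threshold with an already-kept one."""
    corr = df.corr(method=method).abs()
    kept = []
    for col in df.columns:
        if all(corr.loc[col, other] < threshold for other in kept):
            kept.append(col)
    return kept
```

Because the filter is order-dependent, domain knowledge can override its choice afterwards, as the Results describe for waist circumference and body weight.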

Results
Prior to splitting the imbalanced dataset into training and test sets and oversampling the minority class, the records (n = 402; 205 males) were from youth aged 12-18 years (15.4 ± 1.8 years; 167.4 ± 9.1 cm; 73.2 ± 21.5 kg) (Table 1). The sample was 20% Mexican American, 9% Other Hispanic, 25% non-Hispanic White, 29% non-Hispanic Black, 13% non-Hispanic Asian, and 3% Other Race, including Multi-Racial. Table 1 shows similar dataset attributes following the train-test split and subsequent oversampling. Of the total observations, 71.75% belonged to the majority class ("Not At Risk") and 28.25% belonged to the minority class ("At Risk"). Figure 1 shows the 18 features initially considered (Fig. 1A) and the remaining features (Fig. 1B) after some features were excluded owing to their collinearity with others. Notably, the algorithm excluded number of people in family, BMI, waist circumference, and average sagittal abdominal diameter. However, because of its stronger correlation with CMB risk (compared to body weight) (Table 1 in the supplement) and ease of measurement in school or home settings (compared to sagittal abdominal diameter), waist circumference was later substituted for body weight on the list of non-collinear features.
SelectKBest and associated models
The SelectKBest feature selection method yielded ratio of family income to poverty, combined handgrip strength, combined handgrip strength normalized to body weight, height, and waist circumference as the top five predictive features (Table 2). The corresponding top salient models were Coarse Tree, Quadratic Discriminant, Kernel Naïve Bayes, and Weighted KNN (Table 3A). When deployed to classify the test data, the Coarse Tree model accurately classified 85% and 78% of the positive and negative classes, respectively (Table 3A). Comparatively, the Quadratic Discriminant model accurately classified 80% and 80% of the positive and negative classes, respectively, as did the Weighted KNN model (Table 3A). Other salient models with the same features and their respective performance metrics are listed in Table 3A.

Recursive Feature Elimination and associated models
The Recursive Feature Elimination method produced number of people in household, number of children five years or younger, annual household income, combined handgrip strength, height, and waist circumference as the top five predictive features (Table 2). The corresponding most salient models were Quadratic Discriminant, Logistic Regression, and Linear SVM (Table 3B). When deployed to classify the test data, the Quadratic Discriminant model accurately classified 83% and 93% of the positive and negative classes, respectively (Table 3B). In comparison, the Logistic Regression model accurately classified 80% and 88% of the positive and negative classes, respectively, and the Linear SVM model accurately classified 80% and 85% of the positive and negative classes, respectively (Table 3B). Other salient models with similar features and their respective performance metrics are listed in Table 3B.
Decision Tree and associated models
Feature selection using the tree-based (Random Forest) method yielded annual household income, ratio of family income to poverty, combined handgrip strength, height, and waist circumference as the top five predictive features (Table 2). The corresponding top salient models were Medium Tree, Quadratic SVM, and Fine KNN (Table 3C). When deployed to classify the test data, the Medium Tree model accurately classified 83% and 80% of the positive and negative classes, respectively (Table 3C). Comparatively, the Quadratic SVM model accurately classified 83% and 80% of the positive and negative classes, respectively, and the Fine KNN model accurately classified 80% and 83% of the positive and negative classes, respectively (Table 3C). Other salient models with the same features and their respective performance metrics are listed in Table 3C.

Discussion
This study examined the feasibility of developing highly accurate models to classify CMB risk in youth using features across demographic, anthropometric, and handgrip strength (i.e., muscular strength) data in a nationally representative youth sample. The top five features selected using Recursive Feature Elimination, a wrapper method, produced the best-performing CMB risk prediction models. These features, namely number of people in household, number of children five years or younger, annual household income, combined handgrip strength, height, and waist circumference, were leveraged (Table 2). The most salient corresponding models were those fit using Discriminant, Logistic, and SVM algorithms (Table 3B). When deployed, the Quadratic Discriminant model accurately classified 83% and 93% of the positive and negative classes within the test data, respectively, while the Logistic Regression model accurately classified 80% and 88% of the positive and negative classes, respectively (Table 3B). The Linear SVM model accurately classified 80% and 85% of the positive and negative classes, respectively (Table 3B). Other salient models with similar features and their respective performance metrics are listed in Table 3B.
Consistent with previous reports of varying performance across different classification algorithms, specific algorithms were more salient in this study owing to their superior performance with specific clusters of features. For example, previous work found that Decision Tree and Support Vector Machine (SVM) outperformed Bayesian and Neural Networks at classifying childhood obesity using features that included push-up test, partial curl-up, and step-up in 12-year-old Malaysian children [32]. Similarly, Decision Tree outperformed Bayesian methods at classifying obesity in children after age two years [34]. In the current study, model performance varied across predictive algorithms, feature selection methods, and the resulting cluster of salient features. Specifically, the SelectKBest method selected ratio of family income to poverty, combined handgrip strength, combined handgrip strength normalized to body weight, height, and waist circumference as the top five predictive features. With these features, the related Decision Tree, Discriminant, and KNN models outperformed Logistic Regression, SVM, and Naïve Bayes models when evaluated using the test data. A limitation of filter methods is that they fail to account for dependencies between features and may consequently not select the most important features [35]. Although the correlation between combined handgrip strength normalized and un-normalized to body weight did not meet the threshold for exclusion, it is interesting that only SelectKBest, a filter method, selected both as salient features. This appears consistent with its reported tendency to ignore interdependencies when selecting features [35]. In contrast, wrapper methods such as Recursive Feature Elimination account for dependencies between features and outperform filter methods at selecting the most important features [35].
Although multiple performance metrics were evaluated (Tables 3A, 3B, and 3C), because the practical cost of misclassifying observations as "At Risk" (i.e., a false positive) is much less consequential than that of misclassifying observations as "Not At Risk" (i.e., a false negative), the balance between model sensitivity and precision (i.e., the highest F-Measure score) was ultimately interpreted as having the greatest contextual significance in this study. Specifically, parents of a child with CMB risk who is predicted as not having CMB risk may not see the need to modify or adopt lifestyle factors such as increased physical activity, decreased sedentary time, and reduced sugar-sweetened beverage consumption, which will likely help control the risk. Even when families contemplate some of these changes, presumably after encounters with other health promotion campaigns or exposures, knowing their child may be at risk for CMB disease will likely infuse a level of urgency that may not otherwise inform their decisions around lifestyle factors and seeking professional help. Therefore, the consequences of a false negative classification could be grave, especially in medically underserved communities, where children may not be routinely seen by a pediatrician. On the upside, these models are examined as potential tools to predict CMB risk, not to diagnose CMB disease. Therefore, they could be highly valuable for identifying children who are at risk and could serve as a basis to alert and connect parents or primary caregivers with resources such as affordable and free community-based medical and lifestyle services.
From the perspective of holding health as a shared value (i.e., where parents and school administrators jointly value and prioritize student health), having scalable, validated predictive models for CMB risk may be more efficient at ensuring that all children are regularly screened and that those most at risk for CMB disease are identified and provided support to help control risk. The current practice in Texas is that school health services personnel evaluate students for acanthosis nigricans (a manifestation of insulin resistance) as the sole mechanism to screen for type 2 diabetes. Deploying accurate predictive models could potentially identify risk while physical manifestations that precede CMB disease, such as acanthosis nigricans, are nascent or altogether absent. A screening protocol that includes handgrip strength and waist circumference could consolidate muscular fitness and health screenings and increase collaboration between PE teachers and school nurses, thereby

Strengths and Weaknesses
This study has several strengths. The implementation of SMOTE allowed several models to be trained on a balanced dataset. This increased confidence in examining models other than Naïve Bayes, which was previously shown to be more tolerant of data imbalance than other classification algorithms [39]. As such, concerns about potential bias skewing model performance toward the majority class owing to an imbalanced training set were mitigated. The use of a balanced dataset yielded relatively accurate models that are potentially scalable in settings like schools, where the first line of screening for metabolic disease involves evaluating students for an observable manifestation of hyperinsulinemia (i.e., acanthosis nigricans). This is the first study to demonstrate the feasibility of developing models that could considerably improve surveillance by alerting school health services personnel to children who might be at risk for CMB disease precipitators, even in their nascent stages. The salience of handgrip strength as an important feature opens up the prospect of consolidating surveillance so that the same models are deployed to predict CMB risk, evaluate muscular fitness, and inform primary care referrals and preventive services. Lastly, this study leveraged existing NHANES data related to chronic disease surveillance, thereby precluding the need for new data collection and associated resources.
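SMOTE (Synthetic Minority Over-sampling Technique) balances a dataset by creating new minority-class samples, each placed at a random point on the line segment between an existing minority sample and one of its k nearest minority-class neighbors. The manuscript does not state which implementation was used, so the NumPy sketch below is illustrative only: the function name `smote`, the default k = 5, and the use of Euclidean distance are assumptions. In practice, a library implementation such as the `SMOTE` class in imbalanced-learn is the usual choice.

```python
import numpy as np


def smote(X_min: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Minimal SMOTE sketch (hypothetical helper, not the study's code).

    Each synthetic row is X[a] + gap * (X[b] - X[a]), where a is a random
    minority sample, b is one of a's k nearest minority neighbors, and
    gap ~ Uniform[0, 1). Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)  # cannot have more neighbors than other minority points
    # Pairwise Euclidean distances among minority samples only; note that
    # majority-class points are never consulted (the source of class overlap).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        a = rng.integers(n)                   # random base sample
        b = neighbors[a, rng.integers(k)]     # random one of its k neighbors
        gap = rng.random()                    # interpolation weight in [0, 1)
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic
```

To balance two classes, `n_synthetic` would be set to the majority count minus the minority count. A common practice is to oversample the training partition only, so that no synthetic points derived from held-out observations appear in training.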
This study has several limitations, including a relatively small dataset. Further, although the data was from a nationally representative sample, only 28% of the original dataset had three or more individual risk factors (e.g., hypertension, high glucose, high triglycerides) and was therefore categorized as having CMB risk. While the dataset was synthetically balanced, a documented disadvantage of SMOTE is that, while generating synthetic data points, it does not consider that neighboring data points can belong to other classes. This can increase overlap between classes, thereby introducing additional noise into the dataset. However, the imbalanced dataset used was preprocessed and grouped such that neighboring data points belonged to the same class. Such noise does not appear to have adversely affected the models developed following SMOTE implementation in this study, as evidenced by their superior performance metrics relative to the models trained on the imbalanced dataset. Also, the data is cross-sectional in nature; as such, the current models do not establish any causal or longitudinal relationships between salient features and CMB risk. Notably, even the best-performing models produced false negatives and/or false positives on the order of low double-digit percentages. Additionally, these models have yet to be externally validated using data from an unrelated sample. Lastly, body weight (kg) rather than body mass (kg) was used in this study in order to maintain consistency with the language of the widely known NHANES dataset. The current models are not intended to diagnose chronic disease; rather, they predict cross-sectional risk of chronic disease related to CMB risk clustering.
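The class-overlap concern raised above can be checked empirically. The sketch below is a hypothetical diagnostic, not part of the manuscript's analysis: for each synthetic minority point it finds the nearest original observation and reports the fraction whose nearest neighbor belongs to the majority class. A high fraction would suggest that oversampling placed points in majority-class territory. The function name `overlap_fraction` and the use of Euclidean distance are assumptions.

```python
import numpy as np


def overlap_fraction(synthetic: np.ndarray, X: np.ndarray, y: np.ndarray,
                     minority_label: int = 1) -> float:
    """Fraction of synthetic minority points whose nearest ORIGINAL sample
    is not from the minority class (a rough proxy for SMOTE-induced overlap).
    Hypothetical diagnostic for illustration only."""
    # Distances from every synthetic point to every original observation.
    d = np.linalg.norm(synthetic[:, None, :] - X[None, :, :], axis=2)
    nearest = np.argmin(d, axis=1)  # index of closest original sample
    return float(np.mean(y[nearest] != minority_label))
```

A value near zero is consistent with the statement that neighboring data points belonged to the same class; it does not by itself rule out overlap in regions far from the original samples.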

Conclusions
Recursive Feature Elimination (a wrapper method) appears optimal at identifying the most salient predictive features for classifying CMB risk.
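Recursive Feature Elimination is called a wrapper method because it repeatedly refits a model and uses that model's own weights to decide which feature to discard next. The sketch below illustrates the idea with an ordinary least-squares ranker on standardized features; this ranker, the function name `rfe_linear`, and the one-feature-per-step schedule are assumptions for illustration, and the estimator used in this study is not specified here. scikit-learn's `sklearn.feature_selection.RFE` is the usual library implementation.

```python
import numpy as np


def rfe_linear(X: np.ndarray, y: np.ndarray, n_keep: int) -> list[int]:
    """Recursive feature elimination with a least-squares ranker (sketch).

    Refit on the surviving features, drop the one with the smallest
    |coefficient|, and repeat until n_keep features remain. Features are
    standardized so coefficient magnitudes are comparable (assumes no
    zero-variance columns).
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        A = np.column_stack([np.ones(len(y)), Xs[:, remaining]])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        weakest = int(np.argmin(np.abs(coef[1:])))  # ignore the intercept
        remaining.pop(weakest)
    return remaining
```

Usage: with synthetic data in which only columns 1 and 4 drive the outcome, `rfe_linear(X, y, 2)` should retain those two columns.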

Data Availability Statement
The imbalanced and oversampled datasets supporting the conclusions of this manuscript are attached as supplements. The Python code supporting the analyses will be made available by the author upon request, without undue reservation, to any qualified researcher.

Competing interests
The author discloses that there are no competing interests related to this work.

Funding
This work was not funded.
Author's contributions
TA conceived the study, designed the study, performed statistical analyses and interpretation, and drafted the manuscript. TA takes responsibility for the integrity of this work as a whole, from inception to the finished article.