Participants
This study leveraged cross-sectional data from the 2011-2014 National Health and Nutrition Examination Survey (NHANES). Data were collected across the United States through electronic surveys and at mobile examination centers. National Center for Health Statistics personnel collected data in periodic cycles (May 1 through October 31 and November 1 through April 30) from 2011 to 2014. A total of 19,346 original participant records were screened and delimited by age. Of this sample, 8,322 were between ages 0-19 years. However, blood was drawn only from participants aged 12 years and older who were examined during a morning session. Therefore, only 402 records of participants aged 12-18 years had associated demographics data, CMB markers, and muscular strength data. The Texas A&M University-Corpus Christi Institutional Review Board approved this study (TAMU-CC-IRB-2020-02-026).
Procedures
Anthropometrics
Standing height and body weight were measured to the nearest 0.1 cm and 0.1 kg, respectively. Standardized BMI z-scores were then calculated to determine respective percentiles for age and sex according to the Centers for Disease Control and Prevention (CDC) BMI-for-age growth charts [42]. Underweight, healthy weight, overweight, and obesity were defined as BMI < 5th percentile, 5th ≤ BMI < 85th percentile, 85th ≤ BMI < 95th percentile, and BMI ≥ 95th percentile, respectively [42, 43]. Waist circumference was measured as the distance around the waist (using a pre-marked reference point that coincides with the iliac crest) to the nearest 0.1 cm at the end of normal expiration, with participants standing, using a retractable steel measuring tape. Sagittal abdominal diameter was measured as the anteroposterior distance from the small of the back to the front of the abdomen (using a pre-marked reference point that coincides with the iliac crest) to the nearest 0.1 cm at the end of normal expiration, with participants lying supine, using a Holtain-Kahn abdominal caliper. Additional details of procedures for anthropometrics data collection are provided in the NHANES Anthropometry Procedures Manual [44].
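The percentile cut points above can be expressed as a small lookup. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the z-score to percentile conversion assumes the z-score is standard normal (the CDC growth charts supply the age- and sex-specific z-score itself, which is not reproduced here).

```python
from statistics import NormalDist


def percentile_from_z(z: float) -> float:
    """Convert a BMI-for-age z-score to a percentile (0-100).

    Assumption: the z-score follows a standard normal distribution.
    """
    return 100.0 * NormalDist().cdf(z)


def bmi_category(percentile: float) -> str:
    """Map a BMI-for-age percentile to the weight-status category used in the text."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 5.0:
        return "underweight"
    if percentile < 85.0:
        return "healthy weight"
    if percentile < 95.0:
        return "overweight"
    return "obesity"


print(bmi_category(percentile_from_z(0.0)))  # healthy weight
print(bmi_category(95.0))                    # obesity
```

The lower bound of each band is inclusive and the upper bound exclusive, matching the inequalities in the definition.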
Cardiometabolic Measures
Blood was collected by a trained phlebotomist after a minimum 9-hour fast. Blood specimens were initially processed, stored frozen (-30 °C), and subsequently sent to the University of Minnesota, Minneapolis, MN for analysis. Details of laboratory quality assurance and monitoring have been outlined previously [45]. Blood lipids, fasting blood glucose, and insulin were measured. Additional details of procedures for CMB measures are provided in the NHANES Anthropometry Procedures Manual [44]. The homeostatic model assessment of insulin resistance (HOMA-IR), an index of insulin resistance, was computed using the HOMA2 Calculator (Oxford, England) [46].
CMB risk was defined as having a cluster of three risk factors among the following: mean systolic blood pressure, mean diastolic blood pressure, HDL-cholesterol (mg/dL), LDL-cholesterol (mg/dL), total cholesterol (mg/dL), insulin (pmol/L), triglycerides (mg/dL), and fasting glucose (mg/dL). Systolic blood pressure less than 120 mmHg was considered normal, 120 to 139 mmHg prehypertension, and greater than 139 mmHg hypertension [47]. Similarly, diastolic blood pressure less than 80 mmHg was considered normal, 80 to 89 mmHg prehypertension, and greater than 89 mmHg hypertension [47]. Total cholesterol less than 200 mg/dL was considered normal, 200 to 239 mg/dL borderline high, and greater than or equal to 240 mg/dL high [47]. HDL greater than 45 mg/dL was considered normal, 40 to 45 mg/dL borderline low, and less than 40 mg/dL low [48, 49]. LDL less than 110 mg/dL was considered normal, 110 to 130 mg/dL borderline high, and greater than 130 mg/dL high. Triglycerides less than 90 mg/dL were considered normal, 90 to 129 mg/dL borderline high, and greater than or equal to 130 mg/dL high [49]. Glucose of 3.0 to 25.0 mmol/L and insulin of 20 to 400 pmol/L were considered normal. Because HOMA-IR does not have a universally agreed-upon cutoff, especially among youth, a score equal to or greater than the 90th percentile (i.e., 27) of the current sample was considered high. Because objective scans of body fat content were not available in the original NHANES dataset, obesity (determined using CDC Growth Charts) was deemed an additional CMB risk factor [28], such that observations with two individual risk factors across the lipid, blood pressure, glucose, and insulin profiles were deemed to have CMB risk if they were obese. This increased the percentage of the sample with CMB risk from the initial 12% to 28%.
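The clustering rule can be summarized in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function and key names are hypothetical, "a cluster of three" is read as "at least three", and the caller decides which category (for example, "high" versus "borderline high") flags an individual marker, since the text does not specify this.

```python
def has_cmb_risk(risk_flags: dict, is_obese: bool) -> bool:
    """Apply the clustering rule to one participant.

    risk_flags maps each marker name to True when that marker is flagged.
    Assumption: three or more flagged markers indicate CMB risk; with
    obesity counted as an additional risk factor, two or more suffice.
    """
    n_flagged = sum(bool(flag) for flag in risk_flags.values())
    if n_flagged >= 3:
        return True
    return is_obese and n_flagged >= 2


# Hypothetical participant with two flagged markers.
flags = {
    "systolic_bp": False,
    "diastolic_bp": False,
    "hdl": True,
    "ldl": False,
    "total_cholesterol": False,
    "insulin": False,
    "triglycerides": True,
    "fasting_glucose": False,
}
print(has_cmb_risk(flags, is_obese=False))  # False
print(has_cmb_risk(flags, is_obese=True))   # True
```

The second call illustrates how treating obesity as a risk factor reclassifies participants who would otherwise fall one marker short.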
Handgrip Strength
Muscle strength was examined using the NHANES handgrip test, which was developed in collaboration with the National Cancer Institute to provide nationally representative data on muscle strength so that associations between muscle strength and risk factors such as obesity and CMB risk can be studied. The isometric grip strength test was administered using a Takei digital grip strength dynamometer (T.K.K. 5401 Grip-D; Takei, Niigata, Japan). After the dynamometer was calibrated and adjusted for grip size, participants were asked to squeeze it as hard as possible with each hand in a standing or seated position. For the handgrip test, participants were instructed to grasp the dynamometer between the fingers and palm at the base of the thumb, stand upright with the feet shoulder width apart, and maintain a neutral wrist with the device pointing downwards (at the level of the thigh) without touching the body. Participants were instructed to look straight ahead, inhale prior to squeezing, squeeze with the palm facing the thigh, and exhale while squeezing. To ensure maximal effort, participants were instructed to squeeze as hard as they could until they could not squeeze any harder. Each hand was tested three times, and the hands were alternated, thereby allowing 1 minute of rest for each hand between trials. Efforts were adjudged to be maximal if squeezing was observably accompanied by slight shaking. Although all participants aged 6 years and older were tested, only participants aged 12-18 years without prior hand or wrist surgery who stood unassisted for the duration of the test were included in this study. Further, participants were excluded if they indicated any hand pain or sat during the muscle strength testing. Participants were also excluded if they were unable to flex the second interphalangeal joint of the index finger (on the hand being tested) to 90°.
Data Analysis
The 2011-2014 NHANES transport files were accessed in February 2020, opened with the SAS Universal Viewer (SAS, Cary, NC), and saved as CSV files. Further data reduction and processing were done in Excel (Microsoft Corporation, Redmond, WA) and MATLAB R2019b (MathWorks, Natick, MA). There were 16 initial features, namely gender, age (in years), race, number of people in the household, number of people in the family, number of children 5 years or younger, number of children 6-17 years, annual household income, annual family income, ratio of family income to poverty, body weight (kg), height (cm), BMI (kg/m²), waist circumference, average sagittal abdominal diameter, and combined handgrip strength. In previous work, absolute handgrip strength was not associated with metabolic syndrome in male and female adults, whereas handgrip strength normalized by body weight and by BMI both were [50]. Therefore, combined handgrip strength was normalized to body weight and to BMI in this study, resulting in 18 total features. Missing data points were imputed using the median score of the respective weight class, age, and gender. Categorical variables, namely gender and race, were maintained as coded in the original dataset (e.g., male = 1; female = 2). There were 402 eligible records (298 negative and 104 positive cases). Twenty percent of the dataset (i.e., 40 positive and 40 negative records) was set aside as the test set (i.e., for further internal validation). All 18 predictors (i.e., features) were recursively combined, and their capacity to separate the classes was visually examined using scatter plots.
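The two derived features and the group-wise median imputation can be sketched as follows. This is an illustrative sketch only: the original processing was done in Excel and MATLAB, and the field names (`grip_kg`, `weight_kg`, `bmi`, `weight_class`, `waist_cm`) and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def add_normalized_grip(record):
    """Return a copy of the record with the two normalized grip features added."""
    out = dict(record)
    out["grip_per_weight"] = record["grip_kg"] / record["weight_kg"]
    out["grip_per_bmi"] = record["grip_kg"] / record["bmi"]
    return out


def impute_group_median(records, feature, group_keys=("weight_class", "age", "gender")):
    """Fill missing (None) values of `feature` with the median of the matching group."""
    observed = defaultdict(list)
    for rec in records:
        if rec[feature] is not None:
            observed[tuple(rec[k] for k in group_keys)].append(rec[feature])

    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left unchanged
        key = tuple(rec[k] for k in group_keys)
        if rec[feature] is None and key in observed:
            rec[feature] = median(observed[key])
        filled.append(rec)
    return filled


# Hypothetical records; values are illustrative only.
records = [
    {"weight_class": "healthy", "age": 14, "gender": 1, "waist_cm": 70.0},
    {"weight_class": "healthy", "age": 14, "gender": 1, "waist_cm": 80.0},
    {"weight_class": "healthy", "age": 14, "gender": 1, "waist_cm": None},
]
print(impute_group_median(records, "waist_cm")[2]["waist_cm"])  # 75.0
print(add_normalized_grip({"grip_kg": 60.0, "weight_kg": 50.0, "bmi": 20.0}))
```

In this sketch a missing value is left as `None` when its group has no observed values; how such cases were handled in the study is not stated.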
In this study, “0” represented the negative class (i.e., “Not At Risk” for CMB disease) and “1” represented the positive class (i.e., “At Risk” for CMB disease). Approximately 72% of the total original observations did not have CMB risk. Such imbalance in the distribution of target classes can adversely impact the performance of classification models [38]. Also, considering that the cost of misclassifying observations with CMB risk as “Not At Risk” far exceeds that of the reverse error, it was important to oversample the minority class to mitigate any potential effects of data imbalance on model training with the original dataset. Therefore, the Synthetic Minority Over-Sampling Technique (SMOTE) was implemented [29, 38, 39]. SMOTE generates a new data point by taking the difference vector between a reference minority-class data point and one of its nearest neighbors in feature space, multiplying it by a random number between 0 and 1, and adding the result to the reference (i.e., non-synthetic) data point [39]. Considering the class distribution ratio of approximately 4:1 (i.e., 258 negative to 64 positive class records) in the training set, a SMOTE package was used in Python 3.7 (Python Software Foundation, Wilmington, DE) to resolve the imbalance. As such, the 64 positive cases (minority class) were oversampled by 400%. This resulted in 257 minority class observations and a total of 514 balanced records.
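The interpolation step at the core of SMOTE can be illustrated with the standard library alone. This is a minimal sketch of the technique, not the package used in the study: the function name, the default `k=5`, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x by Euclidean distance (x excluded)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random number in [0, 1)
        # new point = x + gap * (neighbour - x), i.e. a point on the joining segment
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


# Toy minority class with two features; values are illustrative only.
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
for point in smote(minority, n_new=3):
    print(point)
```

Because every synthetic point lies on a segment between two existing minority observations, the new records stay inside the region already occupied by the minority class rather than duplicating records exactly.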
Features were narrowed down to the five most salient using three different feature selection methods, i.e., filter (SelectKBest), wrapper (Recursive Feature Elimination), and embedded (Random Forest) [36] (Table 2). The respective feature selection packages were implemented in Python. Subsequently, domain knowledge around correlates of obesity (a strong risk factor for CMB diseases) and the practicalities of school health-related fitness testing was leveraged to select the features best suited to the classification problem at hand. Classifiers were then developed in the MATLAB Classification Learner application using the balanced dataset. Several models were fit with a variety of algorithms, including Decision Tree, Support Vector Machine (SVM), Naïve Bayes, and Ensemble. Five-fold cross-validation was employed to limit overfitting in the training phase.
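For readers unfamiliar with k-fold cross-validation, the partitioning it relies on can be sketched as below. This is an illustration only and makes no claim about how the MATLAB application partitions data internally; the function name and the seed are hypothetical, and the sample size of 514 is taken from the balanced dataset described above.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


for fold_number, (train, validation) in enumerate(k_fold_indices(514, k=5), start=1):
    print(fold_number, len(train), len(validation))
```

Each record serves as validation data exactly once and as training data in the other four folds, so every performance estimate comes from records the model did not see during fitting.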
Resulting models were evaluated using receiver operating characteristic (ROC) curve analyses. Accuracy, the associated area under the curve (AUC) (where AUC ≥ 0.8 indicates good discrimination), the true positive rate (TPR) (i.e., sensitivity or recall), and the false positive rate (FPR) (i.e., 1 - specificity) indicated model performance. Overall, model saliency was adjudged considering the recall, precision, and F-Measure magnitudes, as well as performance when deployed to classify the test data. Precision refers to the capacity to identify only the relevant cases, while recall is the capacity to identify all cases of interest within a dataset. Maximizing precision decreases the incidence of false positives, while maximizing recall reduces the instances of false negatives. F-Measure (the harmonic mean of precision and recall) was also adopted because it penalizes extreme values of precision and recall.
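The threshold-based metrics named above follow directly from the four cells of a confusion matrix. The sketch below is illustrative: the function name is hypothetical, and the example counts are invented to match the size of the 40/40 test set rather than taken from the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall (TPR), FPR, precision, and F-Measure from counts."""

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    recall = safe_div(tp, tp + fn)      # true positive rate (sensitivity)
    fpr = safe_div(fp, fp + tn)         # 1 - specificity
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f_measure = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "fpr": fpr,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts for a test set of 40 positive and 40 negative records.
print(classification_metrics(tp=30, fp=10, tn=30, fn=10))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model with very high recall but poor precision (or the reverse) receives a low F-Measure.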
Statistical Analysis
Spearman and Pearson bivariate correlations were calculated and examined (Table 1 in the supplement). A maximum allowable correlation of 0.899 was set to screen for collinearity, such that two or more related features with a correlation equal to or greater than 0.9 were considered collinear. Correlation coefficients were considered significant at the 0.05 level (2-tailed), i.e., P<.05.
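The collinearity screen can be sketched with a plain Pearson coefficient. This is a minimal illustration, not the authors' statistical workflow: the function names and the example values are hypothetical, and comparing the absolute value of the coefficient against 0.9 is an assumption (the text does not say how strong negative correlations were treated).

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def collinear_pairs(features, threshold=0.9):
    """Return feature-name pairs whose |r| meets or exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(features, 2)
        if abs(pearson_r(features[a], features[b])) >= threshold
    ]


# Hypothetical feature columns; values are illustrative only.
features = {
    "weight": [40.0, 50.0, 60.0, 70.0],
    "bmi": [18.0, 21.0, 23.0, 27.0],
    "age": [12.0, 18.0, 13.0, 17.0],
}
print(collinear_pairs(features))  # [('weight', 'bmi')]
```

A Spearman coefficient can be obtained from the same function by replacing each column with its ranks before calling it.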