Study design
This cross-sectional survey was conducted between September and December 2020 in Beijing and Tangshan (Hebei province), China. The survey was approved by the Ethics Committee of China-Japan Friendship Hospital and complied with the principles of the Declaration of Helsinki. Signed informed consent was obtained from the parents of all children who participated in the study.
Study participants
A stratified cluster random sampling strategy was used to recruit preschool-aged children attending kindergartens in Beijing and Tangshan. Specifically, four of the 16 districts in Beijing and two of the seven districts in Tangshan were selected. Within each district, five kindergartens were randomly selected, giving a total of 30 kindergartens. All children from these 30 kindergartens were eligible, except those diagnosed with major illnesses, including but not limited to chronic kidney disease, hypothyroidism, and congenital heart disease.
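The two-stage selection described above (districts drawn within each city, then five kindergartens drawn within each selected district) can be sketched as follows. This is a minimal illustration only: the sampling frames, identifiers such as "Beijing-D3-K7", the number of kindergartens per district, the helper names, and the fixed seed are all hypothetical, and the actual randomization procedure is not described in the text.

```python
import random

def build_frame(city, n_districts, kg_per_district=20):
    """Hypothetical sampling frame: district ID -> list of kindergarten IDs."""
    return {
        f"{city}-D{d}": [f"{city}-D{d}-K{k}" for k in range(kg_per_district)]
        for d in range(n_districts)
    }

# Strata are the two cities; district counts follow the text (16 and 7).
FRAMES = {
    "Beijing": build_frame("Beijing", 16),
    "Tangshan": build_frame("Tangshan", 7),
}
DISTRICTS_PER_CITY = {"Beijing": 4, "Tangshan": 2}
KINDERGARTENS_PER_DISTRICT = 5

def sample_kindergartens(frames, districts_per_city, kg_per_district, seed=2020):
    rng = random.Random(seed)
    selected = []
    for city, districts in frames.items():
        # Stage 1: draw districts at random within each city (stratum).
        chosen = rng.sample(sorted(districts), districts_per_city[city])
        for district in chosen:
            # Stage 2: draw kindergartens (the clusters) at random within the district.
            selected.extend(rng.sample(districts[district], kg_per_district))
    return selected

kindergartens = sample_kindergartens(FRAMES, DISTRICTS_PER_CITY, KINDERGARTENS_PER_DISTRICT)
print(len(kindergartens))  # 30
```

Every eligible child in the selected kindergartens would then be enrolled, which is what makes the kindergarten the sampling cluster.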
Data collection and quality control
Data were collected using self-designed questionnaires whose reliability coefficient (Cronbach's alpha) had previously been found to exceed 0.85. The questionnaires were distributed to the parents or guardians of 10,441 children from the selected kindergartens and were completed online via WenJuanXing (https://www.wenjuan.com/). In total, 10,230 questionnaires were returned, a response rate of 98%. Data from the completed questionnaires were downloaded from the website as Excel files.
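For reference, coefficient alpha for a k-item scale is commonly computed as α = k/(k − 1) · (1 − Σσᵢ²/σ_T²), where σᵢ² is the variance of item i and σ_T² is the variance of respondents' total scores. The sketch below applies that formula to hypothetical item scores and reproduces the response-rate arithmetic; the function name is illustrative, and the item-level data and exact reliability procedure behind the reported value are not given in the text.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of k item-score lists, each holding one score per respondent."""
    k = len(items)
    n = len(items[0])
    # Total score of each respondent summed across the k items.
    totals = [sum(item[j] for item in items) for j in range(n)]
    item_var_sum = sum(variance(item) for item in items)  # sample variances
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Two identical (perfectly consistent) hypothetical items give alpha = 1.
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # 1.0

# Response rate from the counts in the text: 10,230 returned of 10,441 distributed.
response_rate = 10230 / 10441
print(f"{response_rate:.1%}")  # 98.0%
```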
The questionnaires collected information on both children and their parents. For children, the recorded variables were sex, region, date of birth, time spent on outdoor activities on workdays and weekends, weekly intake frequency of fast food and night meals, picky eating, birth weight, birth height, gestational age, delivery mode, twin birth, birth order, breastfeeding duration, and age at introduction of solid food. In addition, weight (to the nearest 0.1 kg) and height (to the nearest 0.1 cm) were measured by trained physicians.
For parents, self-reported data included age, height, gestational diabetes mellitus, education, family income, perinatal history (delivery mode, birth weight, and birth height), duration of breastfeeding, and self-rated patience with children.
Kindergarten teachers were responsible for sending the electronic questionnaires to the parents or guardians of all participating children. Data exported from the electronic questionnaires to a Microsoft Office Excel™ spreadsheet were carefully checked by trained staff. When records were missing or uncertain, parents or guardians were contacted by phone to verify the information.
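The checking step described above can be pictured as a per-record screen that flags missing or implausible entries for telephone follow-up. This is a minimal sketch: the required fields, plausibility ranges, and function name are hypothetical placeholders, since the actual checking rules used by the staff are not described in the text.

```python
# Hypothetical checking rules; not the rules actually used in the survey.
PLAUSIBLE_RANGES = {
    "weight_kg": (8.0, 60.0),
    "height_cm": (70.0, 160.0),
}
REQUIRED = ["child_id", "sex", "date_of_birth", "weight_kg", "height_cm"]

def flag_for_follow_up(record):
    """Return a list of problems (missing or out-of-range fields) for one exported record."""
    problems = [f"missing {f}" for f in REQUIRED if record.get(f) in (None, "")]
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        # Only range-check values that are present; missing ones are already flagged.
        if value not in (None, "") and not (low <= value <= high):
            problems.append(f"implausible {field}={value}")
    return problems

record = {"child_id": "K01-001", "sex": "F", "date_of_birth": "2016-05-01",
          "weight_kg": 180.0, "height_cm": None}
print(flag_for_follow_up(record))  # ['missing height_cm', 'implausible weight_kg=180.0']
```

Any record returning a non-empty list would be queued for a phone call to the parent or guardian.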
Overweight and obesity definition
Several official definitions of childhood overweight and obesity are available, including the International Obesity Task Force (IOTF) criteria, the World Health Organization (WHO) criteria, and Chinese criteria. In this study, we adopted the WHO criteria because of their wide application. Under the WHO criteria, overweight and obesity are defined by body mass index (BMI) z-scores, with different cutoffs for children aged 5 years or younger and those older than 5 years [22–24]. In children 5 years of age or younger, overweight and obesity are defined as a BMI z-score between 2 and 3 and a BMI z-score > 3, respectively. In children older than 5 years, overweight and obesity are defined as a BMI z-score between 1 and 2 and a BMI z-score > 2, respectively.
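The age-specific cutoffs above can be expressed as a small classification function. WHO BMI-for-age z-scores are derived with the LMS method, z = ((BMI/M)^L − 1)/(L·S), where L, M, and S are age- and sex-specific reference parameters. This sketch rests on several assumptions: the L, M, S values are placeholders rather than WHO table values, WHO's additional correction for z-scores beyond ±3 is omitted, and treating "between 2 and 3" as 2 < z ≤ 3 (and age exactly 5 years as "5 years or younger") is an interpretation of the text.

```python
import math

def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

def lms_z(value, L, M, S):
    """LMS z-score; the WHO's extra adjustment for |z| > 3 is NOT applied here."""
    if L == 0:
        return math.log(value / M) / S
    return ((value / M) ** L - 1) / (L * S)

def weight_status(z, age_years):
    """Classify a BMI z-score using the age-specific WHO cutoffs described in the text."""
    # Age <= 5 y: overweight 2 < z <= 3, obesity z > 3.
    # Age > 5 y:  overweight 1 < z <= 2, obesity z > 2.
    overweight_cut, obesity_cut = (2, 3) if age_years <= 5 else (1, 2)
    if z > obesity_cut:
        return "obesity"
    if z > overweight_cut:
        return "overweight"
    return "non-overweight"

# Placeholder L, M, S values for illustration only (NOT real WHO reference values).
z = lms_z(bmi(20.0, 110.0), L=-1.0, M=15.5, S=0.08)
print(round(z, 2), weight_status(z, age_years=4.5))  # 0.78 non-overweight
```

In practice the L, M, S parameters would be looked up from the WHO reference tables for the child's sex and age.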
Definitions of baseline characteristics
Daily time spent on outdoor activities was calculated as (time on a workday × 5 + time on a weekend day × 2) / 7. Fast food referred to foods high in energy and low in nutrients (e.g., hamburgers and French fries). A night meal was defined as eating within 2 hours before bedtime. The weekly intake frequency of fast food and of night meals was classified in the same way: every day, often (3–5 times), occasionally (1–2 times), or rarely or never. Picky eating was recorded as yes or no. Gestational age, birth weight, and birth height were recorded. Delivery mode was classified as vaginal delivery or caesarean section.
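The daily outdoor-activity time is a 5:2 weighted mean of a typical workday and a typical weekend day. A minimal sketch follows; the function name is hypothetical, and minutes are used only for illustration since the text does not state the unit.

```python
def daily_outdoor_minutes(workday_minutes, weekend_minutes):
    """Average daily outdoor time over a 7-day week (5 workdays + 2 weekend days)."""
    return (workday_minutes * 5 + weekend_minutes * 2) / 7

# 60 min on each workday and 120 min on each weekend day -> 540 / 7 min per day.
print(round(daily_outdoor_minutes(60, 120), 1))  # 77.1
```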
Parental height was self-reported. Paternal and maternal ages at delivery were calculated as the difference between the child's date of birth and the parent's date of birth. Maternal gestational diabetes mellitus was recorded if it had been diagnosed by doctors at secondary (Grade II) or higher-level hospitals. Education was categorized as doctoral degree or above, master's degree, bachelor's degree, and high school or below. Relatives in this study referred to the parents and the paternal and maternal grandparents of the children. Family income (RMB per year) was categorized as ≥1,000,000, 600,000–1,000,000, 300,000–600,000, 100,000–300,000, and <100,000.
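The text defines age at delivery as the difference between the two dates of birth but does not say whether completed or fractional years were used. The sketch below assumes completed years; the function name and example dates are hypothetical.

```python
from datetime import date

def age_at_delivery(parent_birth, child_birth):
    """Parent's completed age in years on the child's date of birth (assumed definition)."""
    years = child_birth.year - parent_birth.year
    # Subtract one year if the parent's birthday had not yet occurred in the child's birth year.
    if (child_birth.month, child_birth.day) < (parent_birth.month, parent_birth.day):
        years -= 1
    return years

print(age_at_delivery(date(1988, 9, 15), date(2016, 3, 2)))  # 27
```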
Statistical analyses
Because few children had obesity, overweight and obesity were combined into a single group and compared with the non-overweight group (reference group). Factors with more than 30% missing data were excluded from the analysis. The remaining missing data were imputed by multiple imputation using the mice package in R (version 4.1.1). Categorical data are expressed as count (percentage). P values for comparisons between the non-overweight and overweight/obesity groups were derived using the t test for normally distributed continuous data, the rank-sum test for skewed continuous data, and the χ2 test for categorical data.
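The 30% exclusion rule and the χ2 test for categorical comparisons can be illustrated with the standard library as follows. This sketch uses hypothetical data and helper names. It does not reproduce the multiple imputation step, which was done with mice in R. The χ2 statistic is the plain Pearson statistic without a continuity correction, so it may differ from a given package's defaults for 2 × 2 tables.

```python
import math

def drop_sparse_factors(records, threshold=0.30):
    """Drop factors whose share of missing (None) values exceeds the threshold."""
    n = len(records)
    keep = [f for f in records[0]
            if sum(r[f] is None for r in records) / n <= threshold]
    return [{f: r[f] for f in keep} for r in records]

def chi_square_test(table):
    """Pearson chi-square statistic, degrees of freedom and, for 1 df, the P value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    # For 1 df, the chi-square survival function equals erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
    return stat, dof, p

# Hypothetical 2 x 2 table: rows = picky eating yes / no,
# columns = overweight/obesity vs non-overweight.
stat, dof, p = chi_square_test([[30, 70], [20, 80]])
print(round(stat, 3), dof, p)
```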
To examine the associations of child and parental factors with childhood overweight/obesity, nine machine learning algorithms were employed: logistic regression, decision tree, support vector machine (SVM), random forest, K-nearest neighbor (KNN), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), light gradient boosting machine (LGBM), and naïve Bayes. Hard and soft voting classifiers built on these nine algorithms were also constructed as ensemble methods. Model performance was assessed using accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUROC). Accuracy is the proportion of all observations that are correctly classified. Precision is the proportion of positive predictions that are actually positive. Recall is the proportion of actual positive observations that are correctly predicted. The F1 score is the harmonic mean of precision and recall. The importance of each factor was calculated using the χ2 test and ranked in ascending order.
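The evaluation metrics and the two voting schemes can be sketched in plain Python as follows. Hard voting takes the majority of the models' 0/1 predictions. Soft voting averages their predicted probabilities before thresholding. Libraries such as scikit-learn offer this through `VotingClassifier(voting="hard")` or `voting="soft"`, but the text does not state which implementation was used. The three "models", their probabilities, the 0.5 threshold, and the function names below are all hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = overweight/obesity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def hard_vote(model_probs):
    """Each model casts a 0/1 vote (probability > 0.5); the majority label wins."""
    votes = [[int(p > 0.5) for p in probs] for probs in model_probs]
    n_models = len(model_probs)
    return [int(2 * sum(v) > n_models) for v in zip(*votes)]

def soft_vote(model_probs):
    """Average predicted probabilities across models, then threshold at 0.5."""
    n_models = len(model_probs)
    avg = [sum(ps) / n_models for ps in zip(*model_probs)]
    return [int(p > 0.5) for p in avg], avg

# Hypothetical probabilities of overweight/obesity from three models for four children.
model_probs = [
    [0.90, 0.45, 0.20, 0.60],
    [0.80, 0.45, 0.60, 0.40],
    [0.40, 0.95, 0.10, 0.55],
]
y_true = [1, 1, 0, 0]
hard = hard_vote(model_probs)                # [1, 0, 0, 1]
soft, soft_scores = soft_vote(model_probs)   # [1, 1, 0, 1]
print(classification_metrics(y_true, soft), auroc(y_true, soft_scores))
```

The second child shows the difference between the schemes: two models lean slightly negative, so the hard vote says 0, but the third model's high confidence pulls the averaged probability above 0.5.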
Additionally, a deep learning sequential model was used to test this association with three different optimization algorithms: adaptive moment estimation (Adam), root mean square propagation (RMSprop), and stochastic gradient descent (SGD). Model loss and accuracy were used to evaluate predictive performance.
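To show how the three optimizers differ, their update rules can be written out on a one-dimensional toy loss f(w) = (w − 3)². SGD steps along the raw gradient. RMSprop divides the step by a running root-mean-square of recent gradients. Adam adds a bias-corrected momentum (first-moment) term on top of that scaling. This is an illustrative sketch only: the learning rates and other hyperparameters are assumptions chosen for the toy problem, and it does not reproduce the authors' sequential network, whose architecture and training settings are not described here.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2 * (w - 3)

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)                          # plain gradient step
    return w

def rmsprop(w, lr=0.01, rho=0.9, eps=1e-7, steps=1000):
    v = 0.0
    for _ in range(steps):
        g = grad(w)
        v = rho * v + (1 - rho) * g * g            # running mean of squared gradients
        w -= lr * g / (math.sqrt(v) + eps)         # step scaled by RMS gradient
    return w

def adam(w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g            # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * g * g        # second-moment estimate
        m_hat = m / (1 - beta1 ** t)               # bias corrections for zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, opt in [("SGD", sgd), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(name, round(opt(0.0), 4))                # each should approach the minimum at w = 3
```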
All machine learning and deep learning models were trained on 60% of the study children (training set) and tested on the remaining 40% (testing set) as internal validation of the prediction models.
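A 60/40 split of children into training and testing sets can be sketched as below. The text does not say whether the split was purely random or stratified by outcome, so this sketch assumes a simple random split with a fixed, hypothetical seed. The function name and the sample size of 1,000 are also illustrative.

```python
import random

def train_test_split_60_40(ids, seed=42):
    """Shuffle child IDs and split them 60% / 40% (simple random split, no stratification)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split_60_40(range(1000))
print(len(train), len(test))  # 600 400
```

Because the outcome group is relatively small, a stratified split (keeping the overweight/obesity proportion equal in both sets) is a common alternative.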
All analyses were performed in Python (version 3.7.6; Python Software Foundation) using PyCharm Community Edition (2018.1, x64) under Windows 10.