A. Study Design
A cluster-randomized controlled trial was conducted in 40 elementary schools (20 intervention schools; 20 control schools) in a large suburban school district in Georgia, US. Students were prospectively followed from Grade 4 to Grade 5 across four time points: Grade 4 Fall (Fall 2018; “T1”), Grade 4 Spring (Spring 2019; “T2”), Grade 5 Fall (Fall 2019; “T3”), and Grade 5 Spring (Spring 2020; “T4”). Study activities ended midway through T4, in March 2020, because of the COVID-19 pandemic. School selection and randomization are described in a previous manuscript.14 The school district administration, district IRB, and Emory University IRB (CR001-IRB00095600) approved this study. This study was registered in the National Institutes of Health (NIH) ClinicalTrials.gov system (ID NCT03765047).
The intervention employed components from the evidence-based Health Empowers You! program, which was designed using the Comprehensive School PA Program approach promoted by the Centers for Disease Control and Prevention (CDC).15 The multi-level intervention aims to shift school PA practices and culture and help students reach at least 45 minutes of PA during the school day. Prior evaluations of Health Empowers You! document improvements in average daily steps, moderate-to-vigorous PA (MVPA) levels in physical education (PE) classes, and student fitness and BMI.16,17 The intervention was implemented with the goal of sustainably elevating student school-day MVPA. Intervention status was ultimately not included in this analysis because differences in MVPA between intervention and control students were small; intervention students had approximately 3 more daily minutes of MVPA in Grade 4 Fall, 4.5 minutes more in Grade 4 Spring, and 5 minutes more in Grade 5 Fall. Details about the intervention are provided in a previous manuscript.14
Before study implementation, consent/assent forms were distributed through district and school protocol with a brief informational video to obtain guardian consent and student assent to measure PA via accelerometry, and authorization for the school district to share de-identified demographic, standardized test score, course grade, Fitnessgram, and attendance data with the research team.
B. Study Population
Participating elementary schools served racially and ethnically diverse students and represented a mix of higher and lower SES. The school selection procedure ensured the schools were representative of the school district.14 Of 6 525 fourth graders in the 40 study schools, 4 966 (76%) returned consent forms. Special education teachers participated in training and received resources for implementation of the intervention at their discretion in the intervention schools, but students in special education classrooms were not included in data collection because these classes include multiple grade levels and require complex additional supports. After removing students in special education classrooms from the analytic sample, 4 936 students were eligible for analysis.
C. Data Sources
Two data sources were used for this study: 1) routinely collected school district data to obtain information about demographics, attendance, FitnessGram, course grades, and standardized test scores; and 2) accelerometer data to objectively measure physical activity. Each of these sources is described in more detail below.
School District Data
Demographic data included parent/guardian-reported student sex and race/ethnicity, as well as school-reported students with disabilities (SWD) status, English language learner (ELL) status, and participation in free/reduced-price lunch (FRL) during the Grade 4 school year.
Attendance data included the number of days students were absent, tardy, and enrolled during the Grade 4 school year.
FitnessGram data documented students’ performance on the FitnessGram, an assessment developed by The Cooper Institute.18 The district’s PE instructors are routinely trained on FitnessGram data collection, and the PASs delivered a refresher training on FitnessGram to PE instructors in both years of the study. Students complete the FitnessGram in September/October and May/June each year. PE instructors measured student height and weight to calculate student BMI. Results from the FitnessGram PACER, a 20-meter shuttle run, were used to estimate CRF. Full FitnessGram data were collected in Grade 4 Fall and Spring and Grade 5 Fall. FitnessGram data were not collected in Grade 5 Spring due to COVID-19. The PACER test was also not completed in Grade 3 because it has not been validated among third grade students, but BMI data were collected in the Grade 3 Fall FitnessGram.
Semesterly course grades data included mathematics, reading, spelling, and writing grades from Grade 3 Fall to Grade 5 Fall.
Georgia Milestones Test data included student scores for Grade 3 Spring and Grade 4 Spring for English language arts (ELA), mathematics, and Lexile reading level.19 The Milestones test is designed to assess whether students’ knowledge and skills meet state-adopted content standards for each academic subject.20 Standardized tests were not administered in Grade 5 due to COVID-19.
Accelerometers
School-day accelerometer data were collected in Grade 4 Fall and Spring and Grade 5 Fall. During one week in each semester, consented students wore ActiGraph wGT3X-BT accelerometers (ActiGraph LLC, Pensacola, FL) to objectively track school-day PA. Teachers were trained on proper accelerometer wear. Students wore the accelerometer belt on the waist for the entire school day.
D. Study Measures
Exposure
The exposure for this analysis is longitudinal weight status based on BMI. CDC age- and sex-specific growth charts21 were used to categorize participants as obese, overweight, healthy weight, or underweight. Children with a BMI at or above the 95th percentile for their age and sex were categorized as obese, those at or above the 85th percentile but below the 95th percentile as overweight, those at or above the 5th percentile but below the 85th percentile as healthy weight, and those below the 5th percentile as underweight.22
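The percentile thresholds above can be sketched as a small classification function. This is an illustrative sketch only, not the study's code: the function name is hypothetical, and it assumes the BMI-for-age percentile has already been derived from the CDC growth charts.

```python
def weight_status(bmi_percentile: float) -> str:
    """Map a BMI-for-age percentile (0-100) to a weight category.

    Assumes the percentile was already computed from the CDC age- and
    sex-specific growth charts; that step is not shown here.
    """
    if not 0 <= bmi_percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if bmi_percentile >= 95:
        return "obese"
    if bmi_percentile >= 85:
        return "overweight"
    if bmi_percentile >= 5:
        return "healthy weight"
    return "underweight"
```

Checking thresholds from highest to lowest keeps each boundary value in exactly one category.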
Longitudinal weight status was based on obesity status at two time points and had four categories. Students who were obese at both baseline and follow-up were classified as “persistently obese,” those who were not obese at baseline but were obese at follow-up as “became obese,” those who were obese at baseline but not at follow-up as “formerly obese,” and those who were not obese at either time point as “persistently non-obese.” For analyses examining Grade 4 standardized test scores as outcomes, baseline BMI was Grade 3 Fall and follow-up was Grade 4 Spring. For analyses examining Grade 5 Fall course grades as outcomes, baseline BMI was Grade 3 Fall and follow-up was Grade 5 Fall.
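The four-category exposure can be expressed as a mapping from two obesity indicators. The sketch below is illustrative; the function and argument names are hypothetical and not taken from the study's analysis code.

```python
def longitudinal_weight_status(obese_baseline: bool, obese_followup: bool) -> str:
    """Combine obesity status at baseline and follow-up into one category."""
    if obese_baseline and obese_followup:
        return "persistently obese"
    if not obese_baseline and obese_followup:
        return "became obese"
    if obese_baseline and not obese_followup:
        return "formerly obese"
    return "persistently non-obese"
```

For the Grade 4 test-score analyses, the two inputs would correspond to Grade 3 Fall and Grade 4 Spring; for the Grade 5 Fall course-grade analyses, to Grade 3 Fall and Grade 5 Fall.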
Outcomes
Two different types of academic achievement measures were assessed. The first was Grade 4 Spring ELA, math, and Lexile Georgia Milestones standardized test results. Participant math scale scores ranged from 394 to 715, ELA scale scores ranged from 357 to 775, and Lexile scores ranged from 190 to 1300. Analyses were conducted with Milestones scores as continuous variables.
The second type of academic achievement measure was teacher-assigned course grades for reading, writing, spelling, and math. Course grades for Grade 3 Fall to Grade 5 Fall were collected and ranged from 0 to 100, with 100 indicating highest achievement.
Covariates
Variables examined as modifiers and/or confounders included sex (male or female), race/ethnicity (Asian, Black, Latino, White, or Other), FRL, SWD, ELL, prior achievement, CRF, MVPA, and sedentary time. FRL status was dichotomized as “receiving” or “not receiving” and was used as a proxy for poverty status since only students whose families earn less than 185% of the federal poverty level are eligible. SWD included those with physical or learning disabilities and was dichotomized as “yes” or “no.” Current ELL was also dichotomized as “yes” or “no.” Student prior achievement was defined as the previous year’s course grade or standardized test score, in accordance with the outcome assessed in analyses. For example, the analysis using Grade 4 Georgia Milestones math standardized test scores controlled for each student’s Grade 3 Georgia Milestones math standardized test score. PACER laps were converted to an estimated CRF using the Cooper Institute’s standard formula.23 The median CRF across Grade 4 Fall, Grade 4 Spring, and Grade 5 Fall was assigned to each student. The “healthy fitness zone” cutoff for CRF in this age group is 40.2.24 A dichotomous CRF variable using this cutoff categorized students’ median CRF as “fit” or “unfit.”
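The CRF dichotomization can be sketched as follows. Two details are assumptions made for illustration rather than statements from the text: values exactly at the cutoff are treated as “fit,” and semesters with no CRF value are skipped (in the study itself, missing data were handled by multiple imputation). The function name is hypothetical.

```python
from statistics import median
from typing import Optional, Sequence

HFZ_CUTOFF = 40.2  # healthy fitness zone cutoff stated in the text


def crf_category(crf_values: Sequence[Optional[float]],
                 cutoff: float = HFZ_CUTOFF) -> Optional[str]:
    """Classify a student's median CRF across semesters as fit or unfit.

    Assumption: a median at or above the cutoff counts as "fit".
    Assumption: None marks a semester with no CRF estimate and is skipped.
    """
    observed = [v for v in crf_values if v is not None]
    if not observed:
        return None  # no CRF estimate available for this student
    return "fit" if median(observed) >= cutoff else "unfit"
```

For example, a student with estimated CRF of 38.0, 41.0, and 42.5 across the three semesters has a median of 41.0 and would be categorized as “fit.”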
ActiLife software was used to download and score accelerometer data, and the data were filtered so that only school-day minutes were scored. Non-wear time was defined as 60 consecutive minutes of zero counts, allowing for up to 2 minutes of counts between 0 and 100.25 Data were collected in 15-second epochs and scored using Evenson cut points for activity thresholds.26 A valid day required students to have worn the accelerometer for at least 80% of the school day. Students needed at least three valid days of wear time during the 5-day measurement period each semester to be included in analyses for that semester. A single measure of mean MVPA minutes and mean sedentary minutes was calculated in each semester for students who met the 3-day criterion. Though at least four days of wear time is recommended for reliable PA estimates in children,27,28 school-day PA is less variable than full-day PA.29
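The valid-day and valid-semester rules can be sketched as below. This is a simplified illustration, not the ActiLife workflow: it assumes daily wear, MVPA, and sedentary minutes have already been scored, and the `DayRecord` structure and function names are hypothetical.

```python
from statistics import mean
from typing import NamedTuple, Optional, Sequence, Tuple

MIN_WEAR_FRACTION = 0.80  # at least 80% of the school day
MIN_VALID_DAYS = 3        # at least three valid days per semester


class DayRecord(NamedTuple):
    wear_minutes: float        # scored wear time during the school day
    school_day_minutes: float  # length of the school day
    mvpa_minutes: float
    sedentary_minutes: float


def is_valid_day(day: DayRecord) -> bool:
    """A day is valid if wear time covers at least 80% of the school day."""
    return day.wear_minutes >= MIN_WEAR_FRACTION * day.school_day_minutes


def semester_means(days: Sequence[DayRecord]) -> Optional[Tuple[float, float]]:
    """Return (mean MVPA, mean sedentary) minutes over valid days.

    Returns None when fewer than three valid days are available, meaning
    the student is excluded from that semester's analyses.
    """
    valid = [d for d in days if is_valid_day(d)]
    if len(valid) < MIN_VALID_DAYS:
        return None
    return (mean(d.mvpa_minutes for d in valid),
            mean(d.sedentary_minutes for d in valid))
```

Days that fail the wear-time rule are dropped before averaging, so a student with four measured days but only two valid ones contributes no value for that semester.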
E. Analysis
Variables had missing data either because students were not enrolled in the participating schools for the entirety of the study or because their observations did not meet inclusion criteria. Missing data were addressed with multiple imputation. Twenty imputed datasets were created using the multilevel multiple imputation program Blimp.30 Implausible imputed values were set to the variable’s upper or lower bound, depending on the direction in which the imputed value was implausible.
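The handling of implausible imputed values amounts to clamping each value to its variable's plausible range. A minimal sketch follows, using the 0 to 100 course-grade range from the Outcomes section as the example; the function name is hypothetical, and the bounds used for other variables are not specified here.

```python
def clamp_to_bounds(value: float, lower: float, upper: float) -> float:
    """Set an out-of-range imputed value to the nearest plausible bound."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return max(lower, min(upper, value))


# Example: an imputed course grade above 100 is set to 100,
# and one below 0 is set to 0; in-range values are unchanged.
grades = [clamp_to_bounds(g, 0, 100) for g in (104.3, -2.5, 87.0)]
```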
Descriptive statistics were computed on the non-imputed data. Two-level multilevel models were then fit with students nested within schools, and results were synthesized across the 20 imputed datasets. The teacher level was not included in multilevel analyses because students with departmentalized teachers rotated across teachers for core subjects. All models were run with longitudinal obesity status as the exposure. First, models assessed crude associations between longitudinal weight status and academic outcomes (Model A). Then the same associations were assessed with adjustment for prior achievement, FRL, sex, race/ethnicity, SWD, and ELL (Model B). For analyses with Grade 4 standardized test outcomes, Grade 3 standardized test scores were used for prior achievement. For analyses with Grade 5 Fall course grade outcomes, average Grade 3 course grade was used for prior achievement. Model C further adjusted for dichotomized CRF. Fixed and random effects were aggregated across imputations using Rubin’s rules.31
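For a single coefficient, pooling with Rubin's rules can be sketched as follows. This illustrates the standard formulas only: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and inputs are hypothetical, and the software used in the study may differ in details such as degrees-of-freedom adjustments.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors, m >= 2")
    pooled = mean(estimates)
    within = mean(se ** 2 for se in std_errors)  # mean within-imputation variance
    between = variance(estimates)                # sample variance, m - 1 denominator
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

With 20 imputed datasets, the inputs would be the 20 estimates of a given model coefficient and their 20 standard errors.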