Data were derived from the longitudinal “Monitoring Activities of Teenagers to Comprehend their Habits” (MATCH) study. At study inception, English- and French-speaking Grade 5 and Grade 6 students (9-12 years) were recruited from 17 schools in New Brunswick, Canada. Schools were situated in a mix of rural and urban areas representing diverse socio-economic backgrounds. Participants completed self-report questionnaires three times per year (i.e., fall, winter, and spring) at four-month intervals until graduating from Grade 12 (16-18 years). These time points were chosen to control for the potential confounding effects of seasonality.
In the first year of data collection, there were 806 participants (55% girls; 65% French-speaking) who were, on average, 10.3 ± 0.6 years of age. In addition, 131 students from participating schools joined after the first year of the MATCH study, such that a total of 937 participants (56% girls; 66% French-speaking) took part in at least one survey cycle. Data on PA motives and PA were collected during cycles 1 to 15, which spanned the first 5 years of the MATCH study; analyses were restricted to data from those 15 cycles.
Participants’ PA motives were assessed using the MPAM-R scale. [23, 27] Participants were asked to report the extent to which 30 items representing PA motives were true for them using a 7-point Likert-type scale, with 1 representing “not at all true for me” and 7 representing “very true for me.” In theory, the MPAM-R includes 7 items assessing enjoyment motives (e.g., “because it makes me happy”), 7 items assessing competence motives (e.g., “because I want to improve existing skills”), 5 items assessing social motives (e.g., “because I want to be with friends”), 5 items assessing fitness motives (e.g., “because I want to be physically fit”), and 6 items assessing appearance motives (e.g., “because I want to lose or maintain weight to look better”). The MPAM-R has demonstrated score reliability and validity in other studies. [32, 34]
MVPA was assessed with a 2-item measure developed specifically for adolescents. Participants were asked to read the following statement: “Physical activity is an activity that increases your heart rate and makes you get out of breath some of the time. Physical activity can be done in sports, playing with friends, or walking to school. Some examples of physical activity are running, brisk walking, rollerblading, biking, dancing, skateboarding, swimming, soccer, basketball, football, and surfing” and then asked: “Over the course of the week (past 7 days), how many days were you physically active for a total of at least 60 minutes per day?” and “Over the course of a typical or usual week, how many days are you physically active for a total of at least 60 minutes per day?” Response options ranged from 0 to 7 days. The two items were averaged to estimate a weekly average MVPA score. In previous work, MVPA scores created with this method were significantly correlated with accelerometer data (r = 0.40, p<0.001), had an intraclass correlation of 0.77, and were supported for use among children and adolescents. [45–47]
Descriptive statistics. Data were summarized to describe the participants at study inception and their evolution at every cycle. Means and ranges were computed to describe the age of participants. The median was calculated to describe the typical number of days reported for participating in at least 60 minutes of MVPA. Average scores and standard deviations for each PA motive were also calculated using the average of all subscale items.
Factor structure of the MPAM-R. The hypothesized correlated five-factor structure of the MPAM-R was verified with confirmatory factor analysis (CFA) using the 30 items at every survey cycle (i.e., 15 independent CFAs were computed). The computation was performed twice: once with full information maximum likelihood (FIML) estimation, which adjusts the likelihood function so that each case contributes information on the variables that are observed, and once with robust standard errors (i.e., the Huber/White/sandwich estimator) in addition to FIML. The items were set to load onto their corresponding factor as identified by Ryan et al. As the chi-square statistic tends to reject the null hypothesis when the sample is large, model fit was evaluated with approximate fit indices. Specifically, model fit was considered acceptable when a Root Mean Square Error of Approximation (RMSEA) < 0.08, a Comparative Fit Index (CFI) ≥ 0.90, and a Tucker-Lewis Index (TLI) ≥ 0.90 were obtained at every cycle. If acceptable model fit was not achieved at one or more cycles, standardized loadings, modification indices, standardized residuals, squared multiple correlations, and covariances between items were scrutinized to determine whether the lack of acceptable fit was due to a problematic item or was a function of the hypothesized factor structure. Items that cross-loaded on factors not identified by the original authors of the MPAM-R were dropped from subsequent re-specified models until acceptable model fit was achieved at all cycles. In every re-specified model, only one item at a time was dropped in the computation of the 15 CFAs corresponding to each cycle (e.g., if item 8 was identified as cross-loading after the first estimation of CFAs, item 8 was dropped from the next iteration of CFAs). Once an item was dropped, it was not reintroduced in subsequent computations of CFAs.
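The fit rule described above (RMSEA < 0.08, CFI ≥ 0.90, and TLI ≥ 0.90 at every cycle) can be sketched as a simple check. This is only an illustration of the decision rule, not the code used in the study; the `FitIndices` container, the function names, and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    """Approximate fit indices from one CFA (one survey cycle)."""
    cycle: int
    rmsea: float
    cfi: float
    tli: float


def acceptable_fit(fit: FitIndices) -> bool:
    # Thresholds as stated in the text: RMSEA < 0.08, CFI >= 0.90, TLI >= 0.90.
    return fit.rmsea < 0.08 and fit.cfi >= 0.90 and fit.tli >= 0.90


def cycles_failing(fits: list[FitIndices]) -> list[int]:
    # Cycles at which the model does not meet all three criteria.
    return [f.cycle for f in fits if not acceptable_fit(f)]


# Hypothetical values for three cycles (not results from the study).
example = [
    FitIndices(cycle=1, rmsea=0.071, cfi=0.912, tli=0.903),
    FitIndices(cycle=2, rmsea=0.085, cfi=0.893, tli=0.881),
    FitIndices(cycle=3, rmsea=0.079, cfi=0.900, tli=0.900),
]

failing = cycles_failing(example)
# The model is retained only if no cycle fails; otherwise one item would be
# dropped and all CFAs re-estimated.
model_ok = not failing
```

In this sketch a single failing cycle is enough to trigger re-specification, which mirrors the requirement that acceptable fit be obtained at every cycle.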
Score reliability. To create the variables used for the main analysis in this study, five PA motive scores were calculated at each survey cycle using the mean of the items retained for each motive (i.e., enjoyment, competence, social, fitness, and appearance) in the final, reduced CFA model. Cronbach’s alpha was calculated once using all items (i.e., all original items from the MPAM-R) and a second time with the items retained in the reduced CFA. This was done to assess score reliability for the originally hypothesized variables as well as for the variables informed by the reduced CFAs, respectively. Additionally, composite reliability was calculated using the “relicoef” command in Stata, which is based on Raykov’s computation of reliability coefficients.
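As an illustration of the two quantities described above, the following minimal sketch computes a subscale score as the mean of its items and Cronbach's alpha from the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of the total score). It assumes complete responses (no missing values); the function names and the data are hypothetical and do not reproduce the Stata routines used in the study.

```python
from statistics import mean, variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for complete data.

    rows: one list per participant, one value per item of the subscale.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item (columns) and of the summed score.
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def motive_score(row: list[float]) -> float:
    # Subscale score = mean of the items for that motive.
    return mean(row)


# Hypothetical responses of three participants to a two-item subscale.
responses = [[1, 2], [2, 4], [3, 6]]

alpha = cronbach_alpha(responses)
scores = [motive_score(r) for r in responses]
```

Running the same function once on all original items of a subscale and once on the retained items would give the two alpha values described in the text.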
Longitudinal invariance. To ensure that the results reflected true change over time and not change in the psychometric structure of MPAM-R scores, the factor structure and measurement invariance over time were examined. Four levels of invariance were estimated with the truncated MPAM-R in Mplus, using maximum likelihood parameter estimates with standard errors and a chi-square test statistic that are robust to non-normality: (1) configural invariance (i.e., no equality constraints), (2) weak invariance (i.e., factor loadings constrained to be equal), (3) strong invariance (i.e., factor loadings and intercepts constrained to be equal), and (4) strict invariance (i.e., factor loadings, intercepts, and errors constrained to be equal). Following Chen’s and Cheung’s recommendations, a change of ≤ 0.010 in CFI supplemented by a change of ≤ 0.015 in RMSEA was used as an indicator of invariance at each level.
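The decision rule for the nested invariance models can be sketched as follows. The sketch assumes that each level is compared with the immediately preceding, less restrictive level, and that the relevant changes are a decrease in CFI and an increase in RMSEA; both points are assumptions, as are the function name and the fit values, which are hypothetical.

```python
CFI_MAX_DROP = 0.010    # largest tolerated decrease in CFI
RMSEA_MAX_RISE = 0.015  # largest tolerated increase in RMSEA


def invariance_holds(less_restricted: dict, more_restricted: dict) -> bool:
    """Compare a constrained model with the preceding, less constrained one."""
    # Rounding to 3 decimals avoids floating-point noise at the cut-offs.
    cfi_drop = round(less_restricted["cfi"] - more_restricted["cfi"], 3)
    rmsea_rise = round(more_restricted["rmsea"] - less_restricted["rmsea"], 3)
    return cfi_drop <= CFI_MAX_DROP and rmsea_rise <= RMSEA_MAX_RISE


# Hypothetical fit indices for the four levels (not results from the study).
models = [
    ("configural", {"cfi": 0.952, "rmsea": 0.041}),
    ("weak", {"cfi": 0.949, "rmsea": 0.042}),
    ("strong", {"cfi": 0.943, "rmsea": 0.045}),
    ("strict", {"cfi": 0.925, "rmsea": 0.052}),
]

# Each level is judged against the level immediately before it.
results = {}
for (_, previous), (name, current) in zip(models, models[1:]):
    results[name] = invariance_holds(previous, current)
```

With these illustrative values, weak and strong invariance would be supported, whereas the drop in CFI from the strong to the strict model exceeds 0.010.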
Trajectories of PA motives. The slopes and intercepts for each PA motive were estimated with mixed effects regression models using data from all 15 cycles. Both linear and quadratic fixed and random effects for cycle were tested in order to establish the functional form of change in PA motives. The quadratic term was retained for further analyses when it was statistically significant at p<0.05. Interactions of sex with time were tested for each PA motive and, if they were significant for at least one of the motives, further analyses involving PA motive trajectories were stratified by sex. After deciding on the final models, variables representing the slopes and intercepts for each PA motive were generated so that they could be used in the primary analysis.
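One way to write the growth model described above is sketched below. The notation is an assumption introduced for illustration: y_{ti} is the score on a given PA motive for participant i at measurement t, c_{ti} is the survey cycle, the β terms are fixed effects, and the b terms are participant-specific random effects. Applying the unstructured covariance matrix Σ to the random effects is likewise an assumption about how the models were specified.

```latex
\begin{aligned}
y_{ti} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,c_{ti} + (\beta_2 + b_{2i})\,c_{ti}^{2} + \varepsilon_{ti}, \\
(b_{0i},\, b_{1i},\, b_{2i})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \boldsymbol{\Sigma}), \qquad
\varepsilon_{ti} \sim \mathcal{N}(0,\, \sigma^{2}).
\end{aligned}
```

Under this formulation, the participant-specific intercept is β_0 + b_{0i} and the participant-specific linear slope is β_1 + b_{1i}; the quadratic terms would be dropped when not statistically significant.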
Trajectories of MVPA. Similar to PA motives, trajectories for MVPA were estimated with mixed effects regression models using all 15 cycles. Linear and quadratic fixed and random effects for cycle were also examined, and quadratic terms were retained if they were statistically significant at p<0.05. Sex interactions with time were tested and if significant, further analyses involving MVPA trajectories were stratified by sex.
Trajectories of PA motives and MVPA. To address the main objective of this study, mixed effects regression models were used to examine longitudinal associations between slopes and intercepts of each PA motive and MVPA using all 15 cycles. Variables representing the slopes and intercepts for each PA motive were used as independent variables in the same model (i.e. the effect of one PA motive was adjusted for all other PA motives). Sex interactions were tested and if statistically significant, further analyses of the relationship between PA motives and MVPA were stratified by sex.
Longitudinal invariance testing was conducted using Mplus 7.4, whereas all other analyses (i.e., CFAs, mixed effects regression models) were conducted using Stata MP 15.1. In all mixed effects regression models, an unstructured covariance matrix was specified.