Study design and sample
A 24-week retrospective pre-post matched-pairs design was used to examine the effect of adding STCs to the Carrot Rewards ‘Steps’ walking program on mean daily step count. Participants were drawn from the existing Carrot Rewards user base, which included Canadians 13 years of age or older living in the three provinces where the app had been launched (i.e. British Columbia (BC), Newfoundland and Labrador (NL) and Ontario (ON)). All participants had to have opted into the ‘Steps’ walking program to be included in the study. The experimental group included participants using the STC feature for the first time between March 19 and April 16, 2018 (the first month STCs were available). Control participants were drawn from the cohort of current Carrot Rewards users who had enabled the ‘Steps’ walking program but had not engaged in an STC during the study period. Control participants were matched with experimental participants on age (±1 yr), gender, province and baseline step count (±500 steps/d, so that individuals with similar PA levels would be compared). Each experimental user was matched to a single control user who met all four criteria; one control user could therefore be matched with multiple experimental users who shared the same age, gender, province and baseline daily step count. Notably, the 10% of the study population with the highest matching ratios (more than 1:18 and up to 1:250) were excluded to minimize the experimental-control imbalance (for more details see Additional file 1). Sensitivity analyses restricted to experimental-control participants matched 1:1 were also conducted to check whether the imbalance influenced results.
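The four matching criteria can be illustrated with a minimal Python sketch. This is not the procedure used to build the study data set (see Additional file 1 for that); the `User` record, the function names and the rule of taking the first eligible control are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical record layout; field names are illustrative only.
    user_id: str
    age: int
    gender: str
    province: str
    baseline_steps: float  # baseline mean daily step count


def is_match(exp, ctl, age_tol=1, step_tol=500.0):
    """True if a control meets all four matching criteria for an experimental user."""
    return (
        abs(exp.age - ctl.age) <= age_tol
        and exp.gender == ctl.gender
        and exp.province == ctl.province
        and abs(exp.baseline_steps - ctl.baseline_steps) <= step_tol
    )


def match_controls(experimental, controls):
    """Map each experimental user_id to one eligible control user_id (or None).

    Assumption: the first eligible control in list order is taken.
    """
    return {
        exp.user_id: next((c.user_id for c in controls if is_match(exp, c)), None)
        for exp in experimental
    }


def matching_ratios(matches):
    """Count how many experimental users share each control (the k in 1:k)."""
    ratios = {}
    for ctl_id in matches.values():
        if ctl_id is not None:
            ratios[ctl_id] = ratios.get(ctl_id, 0) + 1
    return ratios


experimental = [
    User("e1", 34, "F", "ON", 5400),
    User("e2", 35, "F", "ON", 5700),
    User("e3", 52, "M", "BC", 8000),
]
controls = [
    User("c1", 35, "F", "ON", 5600),
    User("c2", 51, "M", "NL", 8100),
]

matches = match_controls(experimental, controls)
ratios = matching_ratios(matches)
```

In this toy example, control `c1` serves two experimental users (a 1:2 ratio), while `e3` has no eligible control because the only candidate lives in a different province.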
The pre-intervention period was defined as the 12 weeks preceding experimental users’ first STC (Study Weeks 1-12). The intervention period included the 12 weeks following the initiation of the first STC (Study Weeks 13-24). Participants were required to have valid pre-intervention and intervention study periods, consisting of a minimum of four weeks of daily step count data in each period—a valid week was operationally defined as a minimum of four days with step counts between 1,000 and 40,000 inclusive, as previously done.38 A study flow chart is provided (Additional file 2). Ethical approval for this study was provided by Western University’s Human Research Ethics Board (#111252).
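The data validity rules above can be expressed as a short Python sketch. The function names and the use of `None` for a missing day are assumptions for illustration; only the thresholds (1,000 to 40,000 steps inclusive, four valid days per week, four valid weeks per period) come from the text.

```python
MIN_STEPS, MAX_STEPS = 1_000, 40_000  # inclusive bounds for a valid day
MIN_VALID_DAYS = 4                    # valid days needed for a valid week
MIN_VALID_WEEKS = 4                   # valid weeks needed in each 12-week period


def is_valid_day(steps):
    """A day counts if a step total exists and lies within the inclusive bounds."""
    return steps is not None and MIN_STEPS <= steps <= MAX_STEPS


def is_valid_week(daily_steps):
    """A week is valid with at least four valid days."""
    return sum(is_valid_day(s) for s in daily_steps) >= MIN_VALID_DAYS


def is_valid_period(weeks):
    """A study period is valid with at least four valid weeks."""
    return sum(is_valid_week(w) for w in weeks) >= MIN_VALID_WEEKS


def is_eligible(pre_weeks, intervention_weeks):
    """Both the pre-intervention and the intervention period must be valid."""
    return is_valid_period(pre_weeks) and is_valid_period(intervention_weeks)


# Four valid days (the bounds themselves are included), three invalid ones.
good_week = [5000, 6000, 1000, 40000, None, 500, 45000]
# Only three valid days.
bad_week = [5000, 6000, 7000, 500, None, None, 41000]

pre = [good_week] * 4 + [bad_week] * 8
intervention = [good_week] * 3 + [bad_week] * 9
```

Here `pre` satisfies the four-week minimum but `intervention` does not, so `is_eligible(pre, intervention)` is `False`.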
Individual and team incentives
Upon downloading the free, commercial Carrot Rewards app, and following a two-week baseline period, Carrot Rewards users earned individual-level incentives in the form of loyalty points (redeemable for consumer goods like movies or groceries) each day they reached a personalized daily step goal (worth $0.04 CAD/day). Given finite reward budgets and a large user base, and to maximize program scalability and sustainability, the smallest possible loyalty point increment was selected (i.e. the app could not offer less than 1 point/d = $0.04 CAD/d). Previous research has suggested that, as part of a multicomponent intervention, this incentive magnitude could stimulate PA.12 In addition, several RCTs have demonstrated positive effects with PA incentives worth $0.09 to $0.75 USD per day.23, 39-41 Goals were initially set using the two-week baseline median (e.g., if a user’s baseline daily step count median was 5,441 steps, their first goal would be rounded to 5,400). See Mitchell et al. (2018) for a fuller description of the goal-setting approach, including how goals were progressed.42 While small, incentives were tied to objectively measured PA and were delivered nearly instantaneously with a push notification using smartphone technology (e.g., linking data from native smartphone accelerometers with loyalty program application programming interfaces (APIs)). To ensure incentives were earned for meaningful PA efforts, manual entry of daily step counts (e.g., from a pedometer) was not possible, nor could participants set their own step goals. To boost app engagement and PA, the ‘Steps’ walking program evolved through refinement of older features as well as the introduction of new ones. For example, the algorithm used to calculate each user’s daily step goal was updated to be more personalized and adaptive.42
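As a worked illustration of the initial goal and the daily reward, consider the following Python sketch. The text gives only one rounding example (5,441 to 5,400), so rounding to the nearest 100 steps is an assumption, as are the function names; the app's actual algorithm is described in Mitchell et al. (2018).42

```python
from statistics import median

REWARD_CENTS_PER_GOAL_DAY = 4  # 1 loyalty point = $0.04 CAD


def initial_step_goal(baseline_daily_steps, increment=100):
    """First daily goal: the baseline median rounded to the nearest `increment`.

    Assumption: nearest-100 rounding (half rounds up); the source shows only
    the single example 5,441 -> 5,400.
    """
    base = median(baseline_daily_steps)
    return int((base + increment / 2) // increment) * increment


def individual_reward_cents(goal_days_met):
    """Individual incentive earned, in cents, for a number of goal days."""
    return goal_days_met * REWARD_CENTS_PER_GOAL_DAY


# Fourteen hypothetical baseline days whose median is 5,441 steps.
baseline = [3200, 3900, 4100, 4800, 5000, 5200, 5441,
            5441, 5600, 6100, 6400, 7000, 7300, 8800]

goal = initial_step_goal(baseline)
week_reward = individual_reward_cents(5)  # goal met on five days of a week
```

With these hypothetical values the first goal is 5,400 steps, and meeting it on five days earns 20 cents.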
STCs were introduced in March 2018 to allow users to collaboratively pursue team-based goals with a peer of their choosing for additional rewards (i.e. a pre-existing friend they had already connected with on the app). Users participating in an STC could each earn a bonus incentive worth $0.40 CAD for together reaching 10 individual daily step goals in a seven-day period (e.g., Partner A completes four goals and Partner B completes six goals in a week; Fig. 1). Users could only participate in one STC at a time. The app allowed users to see their partner’s daily step progress in real time, as well as their own, though users could not communicate about their shared progress in-app (this needed to be done through other means, e.g., text messages or in person). Over the course of 12 weeks, STC participants could earn a maximum of $9.76 CAD in points. In addition to promoting social support, the STC feature also integrated other behaviour change techniques, including goal setting/review, self-monitoring and demonstration. For more app design detail, completed Mobile App Rating Scale (MARS self-score 4.23/5; for understanding app quality, aesthetics and functional appeal)43 and App Behavior Change Scale (ABACUS self-score 4.5/5; for measuring potential to change behaviour)44 assessments are provided (Additional files 3 and 4).
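The team bonus rule can be sketched in Python as follows. The function name and the representation of each partner's week as seven booleans are assumptions for illustration; the threshold of 10 combined goals and the $0.40 CAD bonus per partner come from the text.

```python
GOALS_NEEDED = 10     # combined daily goals the pair must reach
BONUS_CENTS = 40      # bonus paid to each partner ($0.40 CAD)
CHALLENGE_DAYS = 7    # length of one STC


def stc_bonus_cents(partner_a_goal_days, partner_b_goal_days):
    """Bonus (in cents) each partner earns for one seven-day challenge.

    Each argument holds seven booleans, one per day, marking whether that
    partner reached their own daily step goal.
    """
    if (len(partner_a_goal_days) != CHALLENGE_DAYS
            or len(partner_b_goal_days) != CHALLENGE_DAYS):
        raise ValueError("each partner needs exactly seven daily entries")
    combined = sum(partner_a_goal_days) + sum(partner_b_goal_days)
    return BONUS_CENTS if combined >= GOALS_NEEDED else 0


# The example from the text: Partner A meets four goals, Partner B meets six.
partner_a = [True, True, False, True, False, True, False]
partner_b = [True, True, True, True, True, True, False]
bonus = stc_bonus_cents(partner_a, partner_b)

# One goal short: Partner B meets only five goals.
partner_b_short = [True, True, True, True, True, False, False]
no_bonus = stc_bonus_cents(partner_a, partner_b_short)
```

The pair in the first case reaches exactly 10 combined goals, so each partner earns the bonus; in the second case the pair reaches nine and neither does.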
The primary outcome was mean daily step count as measured by built-in smartphone accelerometers. In recent validation studies, the step-counting features of the iPhone and of Android smartphones were accurate in laboratory and field conditions.45-48 Duncan et al. (2018) did determine, however, that the iPhone step-counting feature under-estimated steps in their free-living condition by approximately 20%, or 1,340 steps/day. According to the study authors, this likely reflects participants not carrying the iPhone continually throughout the day rather than inaccuracy in the step-counting feature; if adherence can be optimized, they suggest, smartphones may be suitable for PA evaluations. Self-reported demographics (i.e. age, gender, province) and the number of STCs completed were also collected. The number of STCs completed was defined as the number of STCs that were started and finished within the seven-day window, irrespective of whether the challenge was completed successfully. To finish a challenge, a user simply needed to open the app so that the app could synchronize with the smartphone’s step data.
Chi-square and independent t-tests were conducted to examine group equivalency on demographic measures. Controlling for pre-intervention mean daily step count, ANCOVA was performed to examine group differences in intervention period mean daily step count. Data were expressed as estimated marginal means (95% CI). To complement the ANCOVA and increase internal validity (i.e. the extent to which causality can be established) in this quasi-experimental study, a number of analysis phase strategies recommended by Handley et al. (2018) were deployed.49 First, a pairwise t-test examined the change in mean daily step count over time (pre-intervention vs. intervention) for each group. Second, ANCOVA and pairwise t-test sensitivity analyses were performed with (a) users with complete data sets only (highly engaged users with valid step count data for all 24 study weeks), and (b) participants with a 1:1 control to experimental matching ratio only (vs. the overall sample, in which controls were matched with up to 18 experimental users). Finally, linear regression was performed to determine whether a relationship existed between the number of STCs completed and intervention period mean daily step count. Statistical tests were two-sided, with significance set at 0.05.50 Reported effect sizes followed Cohen’s (1988, 1992) criteria: for Cohen’s d, small = 0.20, medium = 0.50, large = 0.80; for Cramer’s V for chi-square, small = 0.10, medium = 0.30, large = 0.50; and for partial eta squared, small = 0.01, medium = 0.06, large = 0.14.51, 52 Statistical analyses were performed using IBM SPSS Statistics Version 25.
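To make the idea of estimated marginal means concrete, the Python sketch below computes covariate-adjusted group means for a one-covariate ANCOVA using the classical pooled within-group slope. It is an illustration only: the analyses were run in SPSS, the function names and toy data are hypothetical, and the sketch returns point estimates without the confidence intervals or F-test that a full ANCOVA provides.

```python
from statistics import mean


def pooled_within_slope(groups):
    """Common regression slope of outcome on covariate, pooled across groups.

    `groups` maps a group name to a list of (pre, post) pairs, where `pre`
    is the covariate and `post` is the outcome.
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx


def adjusted_means(groups):
    """Group outcome means adjusted to the grand mean of the covariate."""
    slope = pooled_within_slope(groups)
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    return {
        name: mean(y for _, y in pairs)
        - slope * (mean(x for x, _ in pairs) - grand_x)
        for name, pairs in groups.items()
    }


# Hypothetical (pre-intervention, intervention) mean daily step counts.
toy = {
    "control": [(4000, 4000), (5000, 5000), (6000, 6000)],
    "experimental": [(5000, 5500), (6000, 6500), (7000, 7500)],
}

adj = adjusted_means(toy)
adjusted_difference = adj["experimental"] - adj["control"]
raw_difference = (mean(y for _, y in toy["experimental"])
                  - mean(y for _, y in toy["control"]))
```

In this toy data set the raw difference in intervention-period means is 1,500 steps/d, but 1,000 of that reflects the experimental group's higher pre-intervention counts; after adjustment the difference is 500 steps/d.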