The study involved seven weekly free play sessions with a within-subjects group design. During the first four sessions (baseline phase), the SAR was powered off. In the last three sessions (intervention phase), the SAR was teleoperated to move in the play area and offered rewards of lights, sounds, and bubbles to children.
Participants
Six children between the ages of one and seven years (range = 1.6–6.7 years; M = 3.6; SD = 1.9; five female; all Caucasian) who attended two or more play sessions during both the baseline and intervention phases were included in the analyses.
Procedure
IRB and Informed Consent
Approval for all study procedures was obtained from the Oregon State University Institutional Review Board. Written informed consent from parents was obtained prior to the start of the study.
Play Area
The play area (approximately 440 sq. ft or 41 m²) was lined with alternating blue and green foam squares (each 2 × 2 ft, or 0.6 × 0.6 m), and children were instructed to remain in the play area for the entire session. At the start of each play session, the same set of developmentally appropriate toys was set up in the same locations, as shown in Fig. 1. Toys for the age range included physical activity and recreational toys, sensory toys, learning toys, pretend play toys, and the SAR.
Play Session Description
There was a total of seven weekly sessions: four baseline sessions (weeks one to four) and three intervention sessions (weeks five to seven). A fourth planned intervention session was cancelled due to the COVID-19 pandemic. Each weekly session was approximately 30 minutes long, during which children engaged in free play. In this study, free play is defined as play behavior that is controlled by the child, with minimal involvement of adults [40]. Parents and research team members intervened minimally during play time.
The SAR used in the study was an infant-sized mobile robot capable of providing configurable rewards of lights, sounds, and bubbles. During the baseline phase, the SAR was powered off; during the intervention phase, a research team member used a teleoperation interface to maneuver the SAR toward each child in the play area at varying intervals and to activate the rewards of lights, sounds, or bubbles. The order in which children were approached in each session was randomized using a random number generator. Every child received all three rewards during every play session in the intervention phase.
Data Collection and Video Coding
Overhead GoPro cameras were used to record the 30-minute play sessions, and these videos were used for data analyses.
Measurement
As summarized in Table 2, physical activity, play behavior, and toy-use behavior variables were annotated based on a predefined codebook, and the child and robot positions were tracked using computer vision.
Physical Activity
Physical activity behaviors were adapted from a direct observation system, the Observational System for Recording Physical Activity in Children – Elementary School (OSRAC-E) [41], with additional behaviors (e.g., catching/throwing, riding, and walking on knees) included based on behaviors observed during playgroup sessions (Table 2). The OSRAC system is commonly used to record children's physical activity behaviors [18, 42].
Play Behavior
Play behaviors were adapted from Parten's Stages of Play [24] and the Peer Play Scale [23]; adaptations from both scales were made to include behaviors of interest for the current study. Similar coding systems have been used to assess the play behavior of children at various stages of development [18, 43, 44]. Play behaviors were categorized as unoccupied play, solitary play, parallel play, peer interaction play, and adult interaction play (Table 2).
Toy-use Behavior
Toy-use behaviors were annotated based on the type of toy children were interacting with. Developmentally appropriate toys for the age range included physical activity and recreational toys, sensory toys, learning toys, pretend play toys, and the SAR (Table 2).
Child and SAR Positioning
Positional data for the child and SAR were extracted using the OpenCV multi-object tracking function [45] in a custom Python script. This region-of-interest tracker is commonly used in several different contexts, such as traffic surveillance, surgery, and medical imaging, since it allows position monitoring for entities of interest [46]. To use the script, a research assistant selected bounding boxes for the SAR and each child in the play area. If a child or the SAR left the play area, the research assistant re-selected that target of interest when the child or robot re-entered the frame. At a rate of 25 frames per second, position data were automatically recorded at each time step based on the center of each bounding box.
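As a rough sketch of how per-frame bounding boxes translate into time-stamped positional data (the tracking itself was done with OpenCV; the helper below, its name, and its input format are our illustration, not the study's actual script):

```python
def track_positions(bboxes_per_frame, fps=25):
    """Convert per-frame bounding boxes into time-stamped center positions.

    bboxes_per_frame: list (one entry per video frame) of dicts mapping an
    entity label (e.g., "SAR", "child_1") to an OpenCV-style bounding box
    (x, y, w, h). Returns a list of (time_in_seconds, {label: (cx, cy)}).
    """
    positions = []
    for frame_idx, bboxes in enumerate(bboxes_per_frame):
        t = frame_idx / fps  # seconds since the start of the session
        centers = {label: (x + w / 2.0, y + h / 2.0)
                   for label, (x, y, w, h) in bboxes.items()}
        positions.append((t, centers))
    return positions
```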
Table 2 Behavior Assessments with Categories
Data Analysis
The videos were annotated using a momentary time sampling observation system [18, 47]. This technique involves breaking the 30 minutes of video into consecutive 10-second intervals, observing the child's behavior during the first two seconds of each interval, and recording the observed behaviors during the remaining eight seconds of each interval. The protocol used in this study was adapted from previous studies in which the first five seconds of 15-second intervals [18] or 25-second intervals [47] were annotated for child behaviors. Shorter epochs of 10 seconds were used for recording behaviors in the present work based on accelerometer-based cut-point estimations for moderate to vigorous physical activity of toddlers [48]. Six observation intervals were annotated for each minute, resulting in 180 observations per child for every session. This yielded a total of 5,400 observation intervals across the study.
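The interval scheme above (10-second epochs: observe for the first two seconds, record during the remaining eight) can be sketched as follows; the function name, defaults, and return format are illustrative, not part of the study protocol:

```python
def observation_intervals(session_s=1800, epoch_s=10, observe_s=2):
    """Generate momentary time sampling intervals for one play session.

    Returns a list of (start, observe_end, record_end) tuples in seconds:
    the coder observes from `start` to `observe_end` and records the
    observed behaviors from `observe_end` to `record_end`.
    """
    return [(t, t + observe_s, t + epoch_s)
            for t in range(0, session_s, epoch_s)]
```

For a 30-minute session this yields the 180 intervals per child per session reported above.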
Two trained coders annotated all the video recordings for behaviors. One coder annotated the physical activity behaviors, and the other coder annotated play behavior and toy-use behavior. Inter-rater reliability of at least 85% agreement was established between an additional expert coder and the two trained coders on 10% of the video recordings. Agreement of 85% or higher is considered acceptable in observational studies of children [18]. Percent agreement was calculated as follows:
$$\text{Percent Agreement}= \left(\frac{\#\text{ of agreements between coders}}{\#\text{ of agreements}+ \#\text{ of disagreements}}\right)\times 100$$
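The agreement formula is straightforward to compute; a minimal helper (our illustration, with integer-first arithmetic to avoid floating-point drift):

```python
def percent_agreement(agreements, disagreements):
    """Percent agreement between two coders over a set of intervals."""
    return agreements * 100.0 / (agreements + disagreements)
```

For example, 17 agreements and 3 disagreements over 20 intervals gives exactly the 85% acceptability threshold.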
For physical activity, play behavior, and toy-use behaviors, the percentage of total intervals in which the child was in the field of view is reported. For child and robot positioning, the percentage of total frames in which the child was in the field of view and within three feet of the robot is reported. For each child, the mean percentage of time spent in each behavior in each individual phase is calculated as follows:
$$\text{Mean \% of Time in Behavior}= \left(\frac{\#\text{ of observed intervals for the behavior}}{\text{Total }\#\text{ of intervals}}\right)\times 100$$
For the computer vision-generated data, the distance between the SAR and each child was calculated at every time step. Then, the percentage of frames in which the child was within three feet of the SAR was calculated to determine the time spent by the child in parallel or more complex play behaviors within close proximity of the robot [43]. For each child, the mean percentage of time that the child spent within three feet of the SAR is calculated as follows:
$$\text{Mean \% of Time within 3 ft of SAR}= \left(\frac{\#\text{ of frames within 3 ft of the SAR}}{\text{Total }\#\text{ of frames}}\right)\times 100$$
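The proximity computation can be sketched as below. This is a minimal illustration, not the study's script; it assumes the pixel coordinates have already been converted to feet (the 2-ft floor tiles provide a natural scale), and the function name is ours:

```python
import math

def pct_within_threshold(child_positions, sar_positions, threshold_ft=3.0):
    """Percentage of frames in which the child's center was within
    `threshold_ft` (Euclidean distance) of the SAR's center.

    Both arguments are equal-length lists of per-frame (x, y) positions,
    assumed to be expressed in feet.
    """
    n_close = sum(
        1 for (cx, cy), (sx, sy) in zip(child_positions, sar_positions)
        if math.hypot(cx - sx, cy - sy) <= threshold_ft
    )
    return n_close * 100.0 / len(child_positions)
```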
Statistical Analyses
A within-subjects group design was used to analyze the data. Because the data were non-parametric, paired Wilcoxon signed-rank tests were conducted using SPSS statistical software (version 25).
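Although the analyses were run in SPSS, the Wilcoxon signed-rank test statistic W can be illustrated with a short sketch (a minimal pure-Python implementation for exposition only, not the software used in the study; it drops zero differences and averages ranks for ties, per the standard procedure):

```python
def wilcoxon_w(baseline, intervention):
    """Wilcoxon signed-rank statistic W for paired samples.

    W is the smaller of the rank sums of positive and negative
    paired differences; zero differences are discarded.
    """
    diffs = [b - a for a, b in zip(baseline, intervention) if b - a != 0]
    # Rank the absolute differences, averaging 1-based ranks for ties.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_pos, w_neg)
```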