Learning from instructional videos is a prevalent learning activity in both formal and informal settings. When instructors explain difficult learning material, they often use depictive gestures accompanied by speech to convey semantic information. For instance, when an instructor talks about “the separation of chromosomes during mitosis”, the process may be depicted by moving the two hands apart. Depictive gestures reflect the mental modeling of movement and perception, and they can provide additional visual input and semantic content (Mayer & Fiorella, 2022).
However, previous studies have not reached a consistent conclusion about the effects of depictive gestures, which may be attributable to differences in the learning strategies students used (Willems & Hagoort, 2007; Son et al., 2018; Ianì et al., 2018; Pi et al., 2019). Three learning strategies are commonly used when learning from videos: observation, imagination, and imitation. Observation is a passive learning strategy, whereas imagination and imitation are two active, gesture-related learning strategies.
As a passive learning strategy, observing gestures does not always guarantee effective learning (Pi et al., 2019; Brucker et al., 2022). Moreover, studies have shown that, compared with observation, the two active strategies of imagination and imitation make better use of the instructor’s depictive gestures (Baills et al., 2018; García-Gámez et al., 2021; Leopold et al., 2019). Imagination is an effective generative learning strategy that helps students construct mental imagery but may increase cognitive load (Leahy & Sweller, 2004; Fiorella & Mayer, 2015; Leopold et al., 2019). Unlike imagination, imitation is not only an effective generative learning strategy but also externalizes mental representations and frees up cognitive resources, thus reducing cognitive load (James & Swain, 2011; Baills et al., 2018; García-Gámez et al., 2021). To our knowledge, no empirical study has yet compared the effects of these three strategies on learning from instructional videos that include an instructor’s depictive gestures.
In addition, past studies have shown that spatial ability moderates the effectiveness of both the design of learning materials and the application of learning strategies (Lee, 2007; Brucker et al., 2015, 2022). The ability-as-compensator hypothesis holds that learning materials and instruction that benefit low-spatial-ability students may not help high-spatial-ability students (Höffler, 2010). However, to our knowledge, little is known about the moderating role of spatial ability in the effects of observing, imagining, and imitating an instructor’s depictive gestures in instructional videos.
1.1. The effects of depictive gestures on learning from instructional videos
Gestures are defined as spontaneous, meaningful hand movements that usually accompany speech and convey information to the audience; instructors use them widely during instruction (McNeill, 1992; Goldin-Meadow, 2014). Depictive gestures are a typical type of gesture that describes various concrete or abstract aspects of semantic content through the shape or movement trajectory of the hands. Depictive gestures express the spatial properties of objects or related events and evoke mental imagery of the depicted shape in students’ minds, either literally or metaphorically (e.g., crossing the hands left and right to express the concept of “air convection”; Alibali et al., 2013; Goldin-Meadow, 2014; Pi et al., 2019, 2021). In addition, depictive gestures often rely on the accompanying speech, and together the two convey complementary information in the instructor’s explanation, which can lead to a deep understanding of the concept (Willems & Hagoort, 2007).
The potential benefits of instructors’ depictive gestures in instructional videos are supported by the dual channels principle of multimedia learning (Congdon et al., 2017; Pi et al., 2019; Mayer & Fiorella, 2022). This principle holds that students process information through visual and auditory channels, which work independently and each have limited capacity. When students learn new material, their limited working memory capacity constrains their acquisition of new knowledge (Mayer & Fiorella, 2022). Therefore, presenting information through both the visual and auditory channels can expand students’ effective working memory capacity, thereby reducing cognitive load and facilitating the acquisition of new knowledge (Paas & Van Merriënboer, 1994; Mayer & Fiorella, 2022).
Numerous empirical studies have confirmed the benefits of instructors’ depictive gestures for learning from instructional videos (Son et al., 2018; Ianì et al., 2018; Pi et al., 2019). For example, Son et al. (2018) used the mathematical concepts of “variance and mean” to explore the effects of depictive gestures on undergraduate students’ conceptual understanding. They found that watching instructional videos with the instructor’s depictive gestures helped students understand complex concepts, probably because the gestures helped students attend to and remember key knowledge points and thus understand the learning materials better. A similar result was found in an eye-movement study by Pi et al. (2019). Specifically, when learning “the adjustment of the curve in Photoshop”, students who viewed the instructional video with the instructor’s depictive gestures achieved better transfer performance and paid more attention to the gestures than those who viewed the video without gestures. This evidence supports the embodied cognition theory and the dual channels principle of multimedia learning (Wilson, 2002; Mayer & Fiorella, 2022).
However, instructors’ depictive gestures do not always facilitate students’ learning from videos (Pi et al., 2019). For instance, although Pi et al. (2019) observed positive effects of depictive gestures on students’ transfer performance, they found no benefits of the instructor’s depictive gestures on retention performance or attention allocation. A study by Brucker et al. (2022) on “fish movement patterns” likewise found no significant difference in classification performance between students who watched videos with and without the instructor’s depictive gestures.
Taken together, instructors’ depictive gestures can influence students’ learning performance, cognitive load, and attention allocation, but the findings are inconsistent. These inconsistencies might be attributable to the learning strategies that students use.
1.2. Comparisons of observation, imagination, and imitation strategies
Besides the design of instructional videos, the effectiveness of learning strategies in video learning has also attracted much attention (Fiorella & Mayer, 2016). Three learning strategies related to depictive gestures have been examined: observing, imagining, and imitating gestures (Brucker et al., 2015; Pi et al., 2019; García-Gámez et al., 2021).
Observational learning is defined as learning by viewing the actions of others (Bandura, 1986) and has been shown to improve learning performance and learning experience (Marcus et al., 2013; Castro-Alonso et al., 2014; Fiorella & Mayer, 2016). The instructor’s depictive gestures can provide social cues that help students understand the learning material by linking the gestures to the relevant cognitive processing. The positive effects of observing gestures are usually attributed to the activation of the mirror neuron system: the brain regions activated when students observe the instructor’s gestures in a video are the same as those activated when they perform the same gestures themselves (Marcus et al., 2013; Holle et al., 2008; Brucker et al., 2015). However, as a passive learning strategy, observation does not always work; for example, Pi et al. (2019) found no beneficial effect of observing an instructor’s depictive gestures on knowledge retention.
The imagination strategy requires students to process knowledge by imagining it in their minds, and it is one of the eight generative learning strategies (Fiorella & Mayer, 2015; Leopold et al., 2019). According to generative learning theory, learning by imagining activates the processes of selection (selecting relevant components of the learning material to include in the mental image), organization (arranging these components spatially in the mental image), and integration (connecting the learning content with prior knowledge). Learning by imagining encourages students to mentally visualize concepts or processes in working memory, forming meaningful constructs that are transferred to and stored in long-term memory, which improves learning performance and learning efficiency (Lin et al., 2016; Cooper et al., 2001; Yang et al., 2019; García-Gámez et al., 2021). For instance, García-Gámez et al. (2021) tested the impact of depictive gestures on foreign-language vocabulary learning and found that, compared with students who viewed videos without gestures, those who observed and imagined the instructor’s depictive gestures achieved a higher recall rate. However, the imagination strategy may harm learning when integrating the information exceeds the limits of cognitive capacity and results in higher cognitive load (Leahy & Sweller, 2004).
Imitation involves observing and executing others’ actions (Tellier, 2008; Mainieri et al., 2013; García-Gámez et al., 2021), which can not only provide semantic information but also activate students’ motor and perceptual systems, thus affecting cognitive processing (James & Swain, 2011; García-Gámez et al., 2021). Embodied cognition theory provides an explanation for the positive effects of imitating instructors’ depictive gestures (Wilson, 2002). According to this theory, an individual’s thinking and cognition derive from the body’s sensory and motor experience, and there is a strong connection between physical experience and mental states. From this perspective, when observing and executing gestures, learners may share the instructor’s mental imagery through their own sensory and motor systems (Alibali & Nathan, 2012; Ping et al., 2014). The beneficial effects have also been associated with the mirror neuron system (Rumiati & Bekkering, 2003; Mainieri et al., 2013). Specifically, when students are asked to imitate the instructor’s gestures, they may come to share a mental state with the instructor. To some extent, imitation also involves observation and imagination, because what is observed can be externalized only after it has been generated in the mind.
Moreover, studies have shown that externalizing knowledge can offload information held in limited cognitive capacity, avoid cognitive overload, and improve learning performance (Baills et al., 2018; Yang et al., 2021b; García-Gámez et al., 2021). For example, Baills et al. (2018) explored the influence of depictive gestures on students’ learning of Chinese tones. They found that, compared with students who passively observed gestures, students who used the imitation strategy while watching the videos identified Chinese words and tones better. García-Gámez et al. (2021) also showed that imitating the instructor’s depictive gestures was more effective than imagining them when learning foreign-language vocabulary.
To sum up, existing studies have shown that observing, imagining, and imitating gestures influence learning differently. Specifically, the observation strategy works by activating the mirror neuron system; the imagination strategy helps students build mental images but may increase cognitive load; and the imitation strategy can externalize mental representations and free up cognitive resources. However, studies on the effectiveness of observing, imagining, and imitating gestures have mainly examined the learning performance and cognitive load associated with one or two of these strategies, and little is known about differences in attention allocation among the three.
1.3. The moderation role of spatial ability
Spatial ability is regarded as an individual ability relevant to processing dynamic visualizations as well as to observing and making gestures (Brucker et al., 2015, 2022). Specifically, spatial ability refers to an individual’s ability to perceive the attributes (e.g., location and shape) of objects, form mental representations of these attributes, and manipulate the representations mentally (Höffler, 2010; Brucker et al., 2022). Previous studies have shown that spatial ability moderates the effectiveness of material design (Lee, 2007; Brucker et al., 2015). For example, Brucker et al. (2015) found that, when learning from videos to classify fish movement patterns, low-spatial-ability students performed better in the well-designed group (i.e., viewing corresponding depictive gestures) than in the poorly designed group (i.e., viewing non-corresponding depictive gestures), whereas no difference was found for high-spatial-ability students. These results are consistent with the ability-as-compensator hypothesis, which holds that low-spatial-ability students may need well-designed visualizations to achieve good learning performance, whereas high-spatial-ability students do not (Höffler, 2010). A study by Brucker et al. (2022) further showed that students’ spatial ability may influence the effectiveness of learning strategies in a fish movement pattern classification task. The results showed that low-spatial-ability students who learned without gestures performed better than those who observed and made gestures, but for low-spatial-ability students, making gestures and learning without gestures did not differ. Contrary to the authors’ hypothesis, making gestures did not improve learning performance, possibly because the self-gesturing instructions were ambiguous.
In sum, the effectiveness of learning strategies differs for students with different levels of spatial ability. Therefore, we hypothesize that spatial ability will moderate the effects of observing, imagining, and imitating the instructor’s depictive gestures on learning from instructional videos.
1.4. The present study
Taken together, studies have shown that an instructor’s depictive gestures in video lectures influence students’ learning performance, cognitive load, and attention allocation (Son et al., 2018; Ianì et al., 2018). However, depictive gestures do not always facilitate learning from videos (Pi et al., 2019). Notably, the effects of the instructor’s depictive gestures vary with students’ learning strategies of observation, imagination, and imitation (Marcus et al., 2013; Fiorella & Mayer, 2016; Cooper et al., 2001; Yang et al., 2019; García-Gámez et al., 2021). To our knowledge, no study has directly compared the learning effects of observing, imagining, and imitating the instructor’s depictive gestures. In addition, the moderating role of spatial ability in the three strategies has not been comprehensively examined.
This study aimed to compare the effects of observing, imagining, and imitating the instructor’s depictive gestures on learning performance, cognitive load, learning efficiency, learning satisfaction, and attention allocation. In addition, we aimed to examine the moderating role of spatial ability. We proposed the following hypotheses.
H1: Students will show the best learning performance (retention and transfer) when they use the imitation strategy to learn from a video (Imitating gestures condition), followed by the imagination strategy (Imagining gestures condition), then the observation strategy (Observing gestures condition), and finally passive viewing of a video without gestures (No gestures condition).
H2: Students in the Imitating gestures condition will report the lowest cognitive load, followed by the Observing gestures condition, then the No gestures condition, and finally the Imagining gestures condition.
H3: Students in the Imitating gestures condition will show the highest learning efficiency, followed by the Imagining gestures condition, then the Observing gestures condition, and finally the No gestures condition.
H4: Students in the Imitating gestures condition will report the highest learning satisfaction, followed by the Imagining gestures condition, then the Observing gestures condition, and finally the No gestures condition.
H5: Students in the Imitating gestures condition will show the lowest percentage of fixation duration on the slides and the highest on the instructor, followed by the Imagining gestures condition, then the Observing gestures condition, and finally the No gestures condition.
H6: Spatial ability will moderate the effect of learning strategy on learning performance (retention and transfer), cognitive load, learning efficiency, learning satisfaction, and attention allocation; that is, there will be an interaction between spatial ability and learning strategy.