Living in social groups has been claimed to be beneficial for many animal species1. Increased survival rates, accelerated brain development, and enhanced cognitive abilities are just some of the many advantages of coexisting with other individuals2. Throughout human history, being surrounded by conspecifics has not only granted us higher survival rates but has also allowed us to optimize everyday tasks. From cultivating fields to industrial assembly lines, collaboration with others has been crucial for boosting productivity and reducing individual workload.
When it comes to motor tasks, such as coordination or athletic performance, the mere presence of “others” seems to increase an individual’s physiological (and psychological) arousal. If the task is familiar, this boost in arousal tends to result in better performance by increasing the frequency of dominant responses (i.e. responses with the greatest habit strength)3. This effect was initially studied within the framework of the drive theory of social facilitation4–5, which has given rise to several models and hypotheses6–7. Although the theory originated from observations of facilitation in motor behavior, the presence of others also seems to affect individuals’ performance at a cognitive level.
In an fMRI study, Chib, Adachi, and O’Doherty (2018) investigated the neural correlates of social facilitation using a monetarily incentivized reward paradigm8. The authors found increased connectivity between participants’ ventral striatum (vSTR) and dorsomedial prefrontal cortex (dmPFC) in trials in which social facilitation occurred (i.e. when an audience was watching them). They argue that the increased activity in the vSTR reflects motivational encoding that occurs when other individuals are watching participants perform the task. This suggests that when we are “watched” while performing a task, we may be more motivated to perform well or to focus on the task than when we perform the same task alone.
Providing a systematic review of social facilitation theories is beyond the scope of the current paper, but it is important to highlight that, according to most social facilitation accounts, the (social) presence of an audience can improve individuals’ performance9. Still, there are times in daily life when the presence of another individual is not necessarily beneficial. Most of us have experienced the (sometimes unpleasant) feeling of being observed while performing a mundane task, such as cooking or writing an essay. Indeed, even during daily activities, people tend to behave differently when they think they are being watched, a phenomenon known as the Hawthorne effect10. Whether one is simply walking down the street or giving an athletic performance, having an audience can be either facilitating or detrimental, depending on task demands, context, and personality traits11–12. The Hawthorne effect has been studied and debated within several theoretical frameworks, highlighting the complex interplay between social presence, contextual factors, and individual differences13–14.
Several studies have examined the potentially detrimental effects of social presence on individuals’ performance15. Elevated arousal can improve performance, but only up to a certain point, especially if the task (or the environment) is unfamiliar to the individual16. Therefore, when arousal becomes excessive, performance might deteriorate dramatically. This inverted-U relationship is described by the Yerkes-Dodson law, which postulates that an optimal level of arousal helps individuals focus on a task, whereas an excessively high level of arousal can impair their ability to concentrate. Under certain circumstances, increased arousal can be associated with feelings of anxiety and stress17. In particular, the phenomenon of “choking” (i.e. performing worse than one’s actual skills would allow) under monitoring pressure presents a fascinating counterpart to social facilitation theory18–19.
In an fMRI study, Yoshie and colleagues (2016) found that the presence of an evaluative audience can worsen participants’ fine motor performance15. While engaged in a motor task (i.e. a feedback-occluded isometric grip task), participants reported higher subjective anxiety in the “observed” condition than in the “unobserved” one. Interestingly, the authors also reported increased activation of the posterior superior temporal sulcus (pSTS) in the “observed” compared with the “unobserved” condition. As the pSTS is considered a key neural substrate for social perception based on visual information20, the authors argue that participants needed to allocate additional attentional resources to monitoring the external observers. Such costs (i.e. the reallocation of attentional resources) might conflict with the execution of the task. In another study, Belletier and colleagues (2015) investigated the same phenomenon, which they termed “monitoring pressure”18. The authors observed that being watched by an evaluative audience led individuals to choke on executive control tasks. This is in line with distraction-conflict theory21, which states that, when an individual is performing a task, the mere presence of others generates an attentional conflict between attending to the observers and attending to the task.
The impact of social presence on performance has been widely studied in the context of human-human interaction22. However, despite a growing body of literature on the impact of artificial agents’ presence on performance in collaborative tasks23–24, it remains unclear whether collaborative artificial agents facilitate or impair individuals' performance. This question is increasingly important, as social robots are becoming more and more present in our lives. From industrial to clinical applications, robots are becoming part of our everyday activities.
Past research has demonstrated that the presence of a robot gazing at individuals while they perform a task modulates attentional orienting25–26, social decision-making27, and engagement28. For example, in two studies, Spatola et al.24, 29 found that individuals’ performance in a Stroop task30 improved when participants were observed by a social robot, compared with when they were observed by a non-social robot or not observed at all24. These results speak in favor of a social facilitation effect in human-robot interaction. However, in a recent work, Koban, Haggadone, and Banks (2021) did not find the same results using a similar Stroop paradigm31. The authors found no substantial difference in performance between the conditions in which participants played alone or were observed by a robot or a human, suggesting that additional contextual factors might influence social facilitation processes in human-robot interaction (e.g., the individual’s familiarity with the task or with the observer, or task difficulty). Furthermore, Belkaid et al.27 showed that social signals exhibited by a humanoid robot impaired participants’ performance in a social decision-making task.
One potential reason for such conflicting findings might be that the paradigms usually adopted to study social facilitation in human-robot interaction rely on classical attentional tasks that are not interactive (for example, the Stroop task30). In recent work, Irfan and colleagues (2018) highlighted the importance of adapting classical experimental paradigms to more natural human-robot interaction environments in order to increase their ecological validity32. Furthermore, performance measures (e.g., accuracy, response times) are often the only indicators used to assess participants’ cognitive (and attentional) engagement in human-robot interaction experiments. Recent literature has demonstrated the potential of other measures of attention, such as eye-tracking metrics, for exploring the social cognition mechanisms that underpin human-robot interaction33–34.
To explore the effect of social presence in a human-robot interaction scenario, we designed a visual-search game in which participants were asked to find a letter hidden within naturalistic photographs. To monitor subtle changes in the allocation of attentional resources, we asked participants to wear a mobile eye-tracker during the task.
The game was designed to be played both alone and with another player. Importantly, the design gave participants the time and freedom to explore the environment (including the social environment) during the experiment. This was intended to mimic more natural settings in which one performs an attentional task (e.g., monitoring bag scans at security checkpoints) while embedded in a social (and noisy) environment that can also be distracting.
We decided to focus on two eye-tracking metrics previously used in research on visual attention35–36, namely fixation duration and time to first fixation. The value of eye-tracking metrics for better understanding human cognition has been recognized in both human-computer interaction37 and human-robot interaction38 studies. We therefore included these metrics, along with participants’ response times, to examine the effects of the robot’s presence on performance and attention.
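To make the two metrics concrete, the following Python sketch shows one common way of deriving them from a list of detected fixations. It is an illustration under stated assumptions, not the analysis pipeline used in this study. The `Fixation` and `AOI` structures are hypothetical. So are the rectangular area of interest (for example, the region containing the hidden letter) and the millisecond timestamps measured from picture onset. Fixation duration is summarized here as the mean duration per fixation. Total dwell time is an equally common alternative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    # Hypothetical record: onset is assumed to be in ms relative to picture onset.
    onset_ms: float
    duration_ms: float
    x: float  # gaze position in (assumed) picture coordinates
    y: float


@dataclass
class AOI:
    # Hypothetical rectangular area of interest, e.g. around the hidden letter.
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix: Fixation) -> bool:
        return self.left <= fix.x <= self.right and self.top <= fix.y <= self.bottom


def time_to_first_fixation(fixations: List[Fixation], aoi: AOI) -> Optional[float]:
    """Onset of the earliest fixation landing in the AOI, or None if never fixated."""
    hits = [f.onset_ms for f in fixations if aoi.contains(f)]
    return min(hits) if hits else None


def mean_fixation_duration(fixations: List[Fixation], aoi: Optional[AOI] = None) -> Optional[float]:
    """Mean fixation duration, optionally restricted to fixations inside an AOI."""
    durations = [f.duration_ms for f in fixations if aoi is None or aoi.contains(f)]
    return sum(durations) / len(durations) if durations else None
```

For example, with fixations at onsets 120, 390, and 600 ms, of which only the last two fall inside the target region, time to first fixation on the target would be 390 ms. The target-restricted fixation duration would be the mean of those two fixations' durations.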
The first group of participants was asked to play the game alone. Two weeks later, the same participants were asked to play the game again, this time with the humanoid iCub robot39 observing them. The second group of participants was asked to play a turn-taking version of the game, in which the robot both observed and collaborated with them. The combination of these three versions of the game (i.e. “Solo”, “Observation”, and “Interaction”) allowed us to investigate further contextual aspects that might play a role in social facilitation/distraction, such as the difference between a “passive” robot observer and an “active” cooperative one. This difference might be highly relevant, as in everyday interactions observers are not always “passive”, especially in scenarios where robots can be deployed as human assistants.