The traditional paradigm for evaluating a new method, application or scenario in virtual reality (VR) is to carry out an experimental study in which the response variables are answers to Likert-scale questionnaires, possibly supplemented by behavioural or physiological measures. The aim is to understand how the responses vary between the different levels of the experimental factors. For example, one of the earliest VR studies examined transfer of training in a physical manipulation task from VR to real-world performance (Kozak, Hancock, Arthur, & Chrysler, 1993) and found that VR training offered no advantage compared to a waiting group (with the particular system used at that time). More generally, a long-standing theme in the evaluation of VR experiences has been the concept of presence (the feeling of ‘being there’) in the place depicted by the VR (Sheridan, 1992). Since this is a unique affordance of VR, achieving high presence has been thought to be a fundamental goal of VR experiences. A stream of studies beginning in the early 1990s analysed different factors that may contribute to presence. For example, Slater, Usoh, and Steed (1995) examined how walking-in-place, compared with point-and-click methods of moving through an environment, influenced presence, and Barfield and Hendrix (1995) examined the impact of display update rate.
Presence is usually evaluated through questionnaires (Lessiter, Freeman, Keogh, & Davidoff, 2001; Usoh, Catena, Arman, & Slater, 2000; Witmer & Singer, 1998), physiological responses (Meehan, Insko, Whitton, & Brooks Jr, 2002), breaks in presence (Slater & Steed, 2000), or psychophysical approaches in which factors are varied in real time to find their optimal balance (Llobera et al., 2021; Slater, Spanlang, & Corominas, 2010).
Research on presence is reviewed in (Sanchez-Vives & Slater, 2005; Skarbez, Brooks Jr, & Whitton, 2018), and a meta-analysis of the factors found to influence presence is given in (Cummings & Bailenson, 2016). However, the concept of presence has evolved and has been deconstructed into two orthogonal components (Slater, 2009). Place Illusion (PI) refers to the illusion that participants have of being in the place depicted by the VR displays, even though they know that they are not there. PI depends on the extent to which the VR system affords natural sensorimotor contingencies for perception (O'Regan & Noë, 2001a, 2001b), that is, using the whole body to perceive (e.g., turning the head, looking around and underneath objects, turning the whole body, moving the eyes) with the same resulting changes in sensory input as in reality. For example, a stereo, wide field-of-view head-mounted display with six-degrees-of-freedom head tracking meets many of the requirements for natural sensorimotor contingencies for vision and, in the case of spatialised sound, for audition too. The second component of presence is referred to as Plausibility (Psi). This is the illusion that the events perceived to be happening in the VR are really happening, even though the participant knows that they are not. Psi depends on (i) events in the VR responding to the participant's actions (for example, a virtual character looks back when looked at), (ii) events that spontaneously refer to the participant (e.g., a virtual character contingently looks at the participant and smiles), and (iii) participants' expectations being met where the VR depicts events or situations with which they are familiar in reality. Requirement (iii) is often difficult to satisfy, since it is complex in itself and requires detailed domain knowledge on the part of the application designers.
For example, participants might well accept a VR with strange creatures, or one where normal physical laws are not obeyed, such as 3D chess in VR in which the pieces fly through the air of their own accord (Slater, Linakis, Usoh, & Kooper, 1996), yet reject a situation in which some detail fails to meet their expectations. In our work on violence between soccer fans in a bar, for instance, participants rejected our first rendition of the bar on the grounds that a bar decorated in that way would never be visited by soccer fans (Rovira, Swapp, Spanlang, & Slater, 2009). Similarly, medical doctors experienced less Psi in an interview with virtual patients because they were unable to look up patient details on a virtual computer display on their virtual desk (Pan et al., 2016). Plausibility is probably the more difficult (and more interesting) illusion to generate, and it has been increasingly studied. Recently, for example, Hofer, Hartmann, Eden, Ratan, and Hahn (2020) examined the relationship between PI and Psi, with results suggesting that the two are independent; Galvan Debarba, Chague, and Charbonnier (2020) studied the impact of different levels of body tracking on Psi using the psychophysics methodology of Slater et al. (2010); and the impact of virtual human character behaviour and other factors on Psi was considered in (Bergström, Azevedo, Papiotis, Saldanha, & Slater, 2017; Skarbez, Neyret, Brooks, Slater, & Whitton, 2017).
The standard experimental paradigm and methods of measurement are appropriate when there are specific hypotheses in mind, or when we know which relationships we wish to investigate. For example, one might ask whether spatialised sound is likely to result in higher scores on a presence questionnaire (Poeschl, Wall, & Doering, 2013), or how display latency influences presence (Meehan, Razzaque, Whitton, & Brooks Jr, 2003). However, in the case of a novel application where there is little or no prior knowledge about how participants may respond, or about which factors may be important, this paradigm may be uninformative or even misleading. A questionnaire score can mask critical information.
We have embarked on a new line of research concerned with the recreation of historical rock concerts in VR. On the technical side, the idea was to employ computer vision techniques (Beacco, Gallego, & Slater, 2020) to extract the appearance and movements of the band members in 3D, to use agent-based models and crowd rendering to reconstruct a virtual audience, and to place the players and audience in a model of the theatre in which the concert took place. The audio was taken from the original video on which the reconstruction was based. Our scientific interest has been to explore how people would respond to the virtual concert. Would they reject it because of the inevitable lack of realism? Would they join in dancing along with the audience? To what extent would they feel as if they were at an actual concert? Which factors might contribute to or detract from these responses? The particular performance on which we have focussed is Dire Straits playing ‘Sultans of Swing’ at the Hammersmith Odeon in London, from the 1983 Alchemy concert. In our pilot study (Beacco, Oliva, Cabreira, Gallego, & Slater, 2021), 25 participants were recruited online and from a university class overseas. The scenario involved a recreation of the Hammersmith Odeon, the players on stage modelled partially from a video of the live performance, and a virtual audience that surrounded the participant. The audience included a number of realistic male avatars standing in the immediate vicinity of the participant; these were created from photographs of men using our computer vision techniques, so that they looked like actual people. Further from the participant, audience members were standard graphics-based avatars, and further away still they were impostors. The audience moved with the music through dancing animations taken from online animation repositories.
Since our questions were very open, exploring a quite new area of application, we used sentiment analysis instead of questionnaires and behavioural measures. Sentiment analysis (Bakshi, Kaur, Kaur, & Kaur, 2016; Liu, 2012) relies on prior classifications of millions of words in dictionaries, in which words have been assigned positive or negative valence; we discuss this further below. A piece of text then obtains a score, for example, the average score over all the relevant words in the text. The major response variable was derived from a sentiment analysis of short essays that participants were asked to write immediately after their experience. This analysis led to the quite unexpected result that the virtual audience had more impact than the actual performance by the band. In particular, some participants felt vulnerable and alone amongst the audience, and had a feeling of being stared at by audience members (even though this was not programmed to occur), and women in particular felt that they would be on the receiving end of unwelcome approaches from the surrounding men. Such feelings were classified as ‘disturbing’. However, this also signified a high degree of Plausibility, since a prerequisite of feeling disturbed is that the events in question are experienced as really happening, which is an automatic response rather than a belief. High disturbance was associated with low sentiment scores. A second contributor to lower sentiment scores was a failure of expectations, for example, the band not interacting with the audience, or the drummer not visibly beating in time with the sound of the drums. On the positive side, higher sentiment scores were associated with a feeling of immersion in the concert, joining in with the dancing of those nearby, spatial audio from the band, and the movement of the surrounding crowd.
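The averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the studies: the lexicon, its words and their valence values are hypothetical placeholders (real sentiment dictionaries are far larger), and tokenization is a simple lowercase regular expression.

```python
import re
from typing import Dict, Optional

# Hypothetical toy lexicon mapping words to valence in [-1, 1].
# Entries and values are illustrative only, not taken from any real dictionary.
TOY_LEXICON: Dict[str, float] = {
    "enjoyed": 0.8,
    "amazing": 0.9,
    "immersive": 0.6,
    "alone": -0.5,
    "stared": -0.4,
    "uncomfortable": -0.7,
}


def sentiment_score(text: str, lexicon: Dict[str, float]) -> Optional[float]:
    """Average valence over the words of `text` that appear in `lexicon`.

    Words not in the lexicon are ignored. Returns None if no word matches,
    since an average over zero words is undefined.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)


if __name__ == "__main__":
    essay = "I enjoyed the amazing music but felt alone"
    # Matches 'enjoyed', 'amazing' and 'alone', so the score is about 0.4.
    print(sentiment_score(essay, TOY_LEXICON))
```

Averaging only over matched words keeps scores comparable across essays of different lengths, but it also means that a single strongly negative word (such as "alone" above) can pull down an otherwise positive text, which is one way disturbing experiences could translate into lower scores.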
We refer to this as Study 1. An important overall conclusion from Study 1 was that a standard VR experimental design, with a Likert-scale questionnaire administered after the experience, would never have revealed these deeper findings about participants' responses. For example, Plausibility may have scored highly, but the underlying disturbance associated with it might never have been discovered.
Here we report the results of a second experiment (Study 2), in which participants were exposed to the VR concert with a number of changes:
- To further examine the impact of the audience, all the virtual audience members were depicted as female. This deliberately went to the other extreme compared with the first study. Would this reduce the likelihood of disturbance, especially that reported earlier amongst women participants?
- All audience members visible to the participant were generated with the software Character Creator 3 and were more pleasant and realistic than those used in Study 1.
- The movements of the audience members, depicted as dancing along with the music, were based on motion capture of a few individuals actually dancing in rhythm to the same music.
- There were various small improvements to the portrayal and movement of the band members.
Methods for the two studies are described in the next section, including a description of the sentiment analysis used. We then present results for Study 2, and then combine and analyse the data from both studies together. Conclusions about sentiment analysis, the concert scenario and the way forward are presented in the Discussion section.