This work takes preliminary steps toward creating a conversational agent that can give advice convincingly and with empathic verisimilitude. The study explores whether a voice assistant should express emotional support and empathy about a personal problem or provide informational, advice-only support, as suggested by prior studies with different artificial entities (de Gennaro et al., 2020a; B. Liu & Sundar, 2018b; Morris et al., 2018). Although the expression of empathy and emotion is supportive in human-to-human communication, will the same hold when we converse with an artificial entity, or will we reject it due to its artificiality, uncanniness, age-related emotional or social needs, or biases? Vocal cues such as pitch and speech rate, meanwhile, are salient in judging the personality of voices, triggering existing stereotypes, and identifying with and developing relationship bonds (Chang et al., 2018; H. Liu et al., 2010). We designed two age stimuli by changing speech rate and pitch on the Voiser platform2: a "mature voice" (speed: 0.9, pitch: -4) resembling a roughly 60-year-old male and a "young voice" (speed: 1.25, pitch: 6) resembling a roughly 20-year-old male, to understand how such vocal cues affect older adults' relationship building with voice assistants. We used only male voices to keep the scope of the work focused and because voice assistants are typically given female voices by default. After producing the voices on the platform, we asked 60 participants to guess the voice assistants' ages as a manipulation check; all participants agreed on the intended age ranges.
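The two voice conditions can be summarized as a small configuration sketch (Python). The speed and pitch values are the Voiser settings reported above; the data structure and helper function are ours for illustration and are not part of any platform API:

```python
# Illustrative definition of the two age-related voice stimuli.
# "speed" and "pitch" are the Voiser settings reported in the text;
# the dictionary itself is an assumption, not the platform's API.
VOICE_CONDITIONS = {
    "mature": {"speed": 0.9, "pitch": -4, "intended_age": "~60-year-old male"},
    "young": {"speed": 1.25, "pitch": 6, "intended_age": "~20-year-old male"},
}

def describe(condition: str) -> str:
    """Return a human-readable summary of one voice condition."""
    c = VOICE_CONDITIONS[condition]
    return f"{condition}: speed={c['speed']}, pitch={c['pitch']} ({c['intended_age']})"
```

Keeping the stimulus parameters in one place like this makes the manipulation easy to report and reproduce.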
This study expects the evidence to support similarity attraction theory when the participants' age and identity traits match the CA's perceived vocal cues. A mixed-methods study is planned to explore which voice characteristics of conversational agents Turkish older adult users prefer and which personality traits they associate with voice pitch. One purpose of this research is to understand how a specific in-group versus out-group social cue (an 'older' versus 'younger' voice) influences older people's perceptions of conversational agents. Based on similarity attraction and social identification theory, it can be postulated that older adults may identify and bond more strongly with a conversational agent that has a mature voice. The other purpose is to explore how older adults' perceptions vary depending on the VA's level of empathic expression.
Using the CASA paradigm, SAT, and UVM as a foundation, our goal is thus to provide a preliminary understanding of: 1) Do older adults categorize voice assistants as "tool-like" or "human-like"? 2) What are older adults' perceptions of having social interactions, depending on their cultural biases, age-related emotional or social needs, or stereotypes? 3) How do older adults' reactions to conversational agents change when the agent's voice characteristics and level of empathic expression vary?
We set three hypotheses for the quantitative part of the study:
H1: Participants will trust more and feel more supported by a VA with a mature voice.
H2: Participants will perceive stronger support and trust with a VA with more empathic expression.
H3: Participants’ perceived self-efficacy toward new technologies will increase after the conversation.
3.1 Research Procedure and the Experimental Set-up with Wizard of Oz
Our research process was based on a multi-method approach comprising three successive stages: pre-test, momentary test, and post-test. In the pre-test interviews, we first focused on understanding participants' initial impressions of new technologies, synthetic voices, and robots, and their perceived self-efficacy toward new technologies. We also aimed to understand how they handle stressful situations and their age-related patterns in how they conceive of empathy. Participants were also asked to rate their self-efficacy in using new technologies on a 5-point scale based on the Tsai and Tsai (2003) internet self-efficacy construct.
This research used a voice-based CA prototype tested with the Wizard of Oz technique within a multi-method approach. In a Wizard of Oz setting, participants are led to believe they are interacting with an autonomous system while its actions are actually operated by a remote experimenter, or "wizard" (Dahlbäck et al., 1993; Large et al., 2019; Medhi Thies et al., 2017). Participants were told they were interacting with a conversational agent that responded to their answers automatically; in fact, the wizard communicated with them through a pre-planned script.
To test our voice assistant prototype, we pre-recorded audio clips to be played in response to a wide range of user utterances. We built a soundboard in PowerPoint, hyperlinking each audio clip to a button for its response. The wizard played the appropriate button as the voice assistant's reply to the user's response or request. To convince participants that the audio was being played by Google Home (Assistant), we connected our laptop to the device via Bluetooth. The Google Home remained visible but muted, letting us operate the prototype from the computer during the tests.
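The wizard's soundboard is essentially a lookup from the participant's utterance type to a pre-recorded clip. A minimal sketch (Python; the intent labels and file names are hypothetical — in the study this mapping lived in PowerPoint hyperlinks, not code):

```python
# Hypothetical soundboard: maps the wizard's classification of the
# participant's utterance to a pre-recorded audio clip to play.
SOUNDBOARD = {
    "greeting": "audio/greeting.wav",
    "small_talk_mood": "audio/ask_mood.wav",
    "suggestion_meditation": "audio/suggest_meditation.wav",
    "sensitive_question": "audio/sensitive_question.wav",
    "wrap_up": "audio/goodbye.wav",
}

def select_clip(intent: str) -> str:
    """Return the clip for the wizard-chosen intent, with a
    fallback prompt for unrecognized utterances."""
    return SOUNDBOARD.get(intent, "audio/could_you_repeat.wav")
```

The fallback clip mirrors a common Wizard-of-Oz practice of keeping a generic re-prompt ready for off-script utterances.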
During testing, we observed participants' gestures and how they positioned themselves while conversing, to see whether they used human-like conversational norms such as nodding, uh-hums, or facing the device.
In post-test interviews, we focused on exploring the user experience in depth, including usage patterns, needs and challenges, participants' ontological perception of voice assistants, their tendency to use conversational norms when interacting with a voice assistant, their mood change, and their change in perceived self-efficacy after the conversation.
3.2 Momentary Test Design and the Dialog Flow
We aimed to present older adult users with an empathic, advice-giving voice assistant on the topic of the Covid-19 quarantine. First, to determine the dialogue flow, we surveyed 60 older adult users about their Covid-19 experiences and emotions. Second, to measure the voice assistant's degree of empathic expression, another 60 older adult users rated our voice assistant's empathy based on the pre-prepared dialogue flow script. Before rating, all 60 raters were given definitions of empathy and example statements expressing high and low empathy, consistent with our theoretical framework. Ratings were made on a five-point scale (1 = low empathy; 5 = high empathy). In the high empathic expression condition, statements expressing empathy were added, such as "I feel very sorry that you felt so anxious during the Covid pandemic lockdown. Most people experienced the same issues, but your age group suffered the worst harm" and "I am truly impressed by your strong character. Many people could not handle this stressful quarantine time so resiliently." In the low empathic condition, statements were mostly formal and focused only on advice-giving, such as "Feelings of anxiety, being trapped, and numbness have also been reported before due to long isolation. Try to stay calm" and "You can try a meditation application on your own if you feel stressed and anxious due to lockdown."
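The empathy rating round amounts to a manipulation check: the high-empathy script should score clearly higher on the five-point scale than the low-empathy one. A minimal sketch (Python; the rating lists and the one-point criterion are made-up placeholders, not study data):

```python
from statistics import mean

def manipulation_check(high_ratings, low_ratings, min_gap=1.0):
    """Compare mean 5-point empathy ratings of the two scripts.
    Returns both means and whether the high-empathy script was rated
    at least `min_gap` points higher (an illustrative criterion)."""
    hi, lo = mean(high_ratings), mean(low_ratings)
    return hi, lo, (hi - lo) >= min_gap

# Placeholder ratings for illustration only:
hi, lo, ok = manipulation_check([5, 4, 5, 4], [2, 1, 2, 2])
```

In practice, such a check would be run on the full set of 60 raters' scores, ideally with a paired significance test rather than a fixed gap.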
We designed and conducted a study with a 2×2 within-subjects factorial design; the factors were voice characteristics (mature vs. young) and level of empathic expression (high vs. low), and we measured their effects on social (perceived support and trust) and functional (perceived self-efficacy toward the voice assistant) outcomes for older adult users, using validated scales.
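The 2×2 within-subjects design yields four conditions that every participant experiences; a sketch of enumerating them (Python; the condition labels are ours):

```python
from itertools import product

VOICES = ("mature", "young")       # voice characteristic factor
EMPATHY = ("high", "low")          # empathic expression factor

# Each participant experiences all four voice x empathy combinations.
CONDITIONS = [f"{v}-voice/{e}-empathy" for v, e in product(VOICES, EMPATHY)]
```

In a real session, the order of the four conditions would typically be counterbalanced across participants (e.g., via a Latin square) to control for order effects.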
Table 1
Sources for Construct Items

| Construct | Adapted from |
| --- | --- |
| Perceived support | van der Zwaan et al. (2012) |
| Trust | Klein (2007) |
| Perceived self-efficacy | Tsai and Tsai (2003) |
Depending on their group, participants interacted with the voice assistants through a four-part conversation: an initial greeting, small talk, suggestions, and sensitive questions. First, we asked participants to summarize their moods during the COVID-19 pandemic. A suggestion session followed the greeting and small-talk sessions, and the conversation then gradually moved to sensitive questions, after which the CA wrapped up. The two dialogue types covered the same conversational topics and suggestions but differed in the CA's level of empathy; the high-empathy flow gave the CA the semblance of personalized, empathic expression. We chose the experience of the Covid pandemic quarantine as the topic because it was the most sensitive and timely situation for older adults in Turkey to discuss. Adapting previous related work, we created our four-step dialogue flow (de Gennaro et al., 2020b; Y. C. Lee, Yamashita, & Huang, 2020; Y. C. Lee, Yamashita, Huang, et al., 2020; Lucas et al., 2018).
3.3 Participants
Participants ranged in age from 65 to 75 and were required to have no prior experience with voice assistants (e.g., Alexa, Siri, Cortana). All participants used computers, smartphones, or tablets at least once daily. The Galatasaray University ethics board approved the study, and participants signed written informed consent for the use of the collected data before taking part.
2 A platform that converts text to speech with humanoid synthetic voices. The platform allowed us to control the gender, pitch, and speech rate of the agent. https://voiser.net/