Although the verbal content of speech is the main component of human communication for hearing individuals, there is much more to successful communication than the verbal message alone. To understand a word in its context and gain full insight into the speaker’s communicative intent, nonverbal auditory information must also be comprehended (Liebenthal et al., 2016; Morningstar et al., 2022; Wilson & Sperber, 2002).
Prosody is a complex nonverbal attribute of speech associated with the acoustic properties of individual speech sounds: pitch, intonation, stress, duration, and intensity. It conveys multidimensional information and serves diverse functions, including disambiguating sentence structure, highlighting or emphasising elements in a sentence, and signalling emotion (Zatorre & Baum, 2012). Broadly, the functions of prosody can be considered linguistic (e.g. distinguishing whether a statement is declarative or interrogative) or emotional (decoding a speaker’s emotional state) (Rohrer, Sauter, et al., 2010). Modulations in vocal pitch (fundamental frequency, F0), syllable length, duration, intensity, and voice quality that listeners perceive as conveying emotional states are collectively known as ‘emotional prosody’ (Everhardt et al., 2020; Horley et al., 2010). Different emotions tend to create different prosodic ‘profiles’ in speech. For instance, joy is typically characterised by a faster speech rate, higher intensity, and increased F0 mean and variability, resulting in more melodic and energetic speech; sadness, by contrast, is typically characterised by slower, lower-intensity speech with decreased F0 mean and variability (Banse & Scherer, 1996; Schirmer & Kotz, 2006).
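To make the F0-based part of these ‘profiles’ concrete, the sketch below estimates frame-wise F0 with a simple autocorrelation method and summarises its mean and variability, the two statistics contrasted above for joy and sadness. This is purely illustrative (the function names, frame length, and pitch range are assumptions, and published work typically uses dedicated pitch trackers rather than this minimal estimator):

```python
import numpy as np

def f0_autocorr(frame, fs, fmin=75.0, fmax=400.0):
    """Crude single-frame F0 estimate via the autocorrelation peak
    within a plausible voice pitch range (fmin-fmax Hz)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])          # lag of strongest periodicity
    return fs / lag                          # convert lag to frequency

def prosody_profile(signal, fs, frame_ms=40):
    """Return (F0 mean, F0 standard deviation) over fixed-length frames,
    as a toy stand-in for the F0 statistics discussed in the text."""
    hop = int(fs * frame_ms / 1000)
    f0s = [f0_autocorr(signal[i:i + hop], fs)
           for i in range(0, len(signal) - hop, hop)]
    return float(np.mean(f0s)), float(np.std(f0s))

# Toy check on a synthetic 200 Hz tone (real speech would need
# voicing detection before computing such statistics).
fs = 16000
t = np.arange(fs) / fs
mean_f0, sd_f0 = prosody_profile(np.sin(2 * np.pi * 200 * t), fs)
```

On this synthetic tone the estimated mean F0 sits near 200 Hz with near-zero variability; an ‘energetic’ joy-like utterance would show a higher mean and larger standard deviation than a flat, sad-sounding one.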
It is important to consider how prosodic processing is affected in neurodegenerative diseases: syndromes across the Alzheimer’s disease (AD) and frontotemporal lobar degeneration spectrum are characterised by impairments in speech processing (Crutch et al., 2013; Gorno-Tempini et al., 2011; Hardy et al., 2016; Rohrer, Ridgway, et al., 2010; Weiner et al., 2008) and/or social signal processing (Bertoux et al., 2016; Downey et al., 2015; Keane et al., 2002; Marshall et al., 2019; Sivasathiaseelan et al., 2021). Impaired prosodic processing has been shown to have implications for social interactions and interpersonal relationships (Everhardt et al., 2020). Previous research has identified impairments in nonverbal auditory perception in AD, non-fluent/agrammatic variant primary progressive aphasia (nfvPPA) and semantic variant primary progressive aphasia (svPPA) (Bozeat et al., 2000; Golden et al., 2015; Goll et al., 2010; Grube et al., 2016; Hsieh et al., 2011; Omar et al., 2010). Impaired perception of emotional prosody has been documented in AD (Amlerova et al., 2022; Arroyo-Anlló et al., 2019; Horley et al., 2010; Taler et al., 2008; Testa et al., 2001), svPPA, nfvPPA and logopenic variant primary progressive aphasia (lvPPA) (Macoir et al., 2019; Rankin et al., 2009; Rohrer, Sauter, et al., 2010; Shany-Ur & Rankin, 2011).
However, all of these studies assessed emotional prosodic processing using ‘clear’ speech stimuli under ideal laboratory conditions, which are unlikely to reflect the reality of communication in the real world, where speech is often distorted (e.g. by a poor telephone or videoconferencing connection) or masked by competing signals. One widely used technique for altering speech signals experimentally is noise-vocoding, in which the speech signal is divided digitally into discrete frequency bands (‘channels’), each of which is replaced by white noise modulated by the amplitude envelope of the original signal in that band (Shannon et al., 1995). Among the various alternative methods (Jiang et al., 2021; Mattys et al., 2012), noise-vocoding is an attractive paradigm for studying the effects of neurodegenerative diseases on degraded speech processing, for three reasons. Firstly, it allows the amount of acoustic degradation of a speech signal to be parameterised (via the number of channels). Secondly, many emotional prosodic cues depend on spectral detail, so noise-vocoding (a technique that degrades spectral detail) is a logical paradigm for assessing the impact of acoustic degradation on the comprehension of prosodic signals. Thirdly, noise-vocoding has previously been used successfully to extract verbal message comprehension thresholds in the patient groups targeted in the present study (Jiang et al., 2023).
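The channel-vocoding procedure described above can be sketched as follows. This is a minimal illustration of the general technique in Shannon et al. (1995), not the stimulus-generation code used in the present study; the function name, filter order, channel spacing, and frequency limits are all assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(signal, fs, n_channels, f_lo=100.0, f_hi=7000.0):
    """Noise-vocode a 1-D speech signal: split it into log-spaced
    frequency channels, extract each channel's amplitude envelope,
    and use that envelope to modulate band-limited white noise."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # channel boundaries
    noise = np.random.default_rng(0).standard_normal(len(signal))
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, signal)
        env = np.abs(hilbert(band))        # amplitude envelope of this band
        carrier = sosfilt(sos, noise)      # noise restricted to the same band
        out += env * carrier
    # Match overall level to the input
    out *= np.sqrt(np.mean(signal ** 2) / (np.mean(out ** 2) + 1e-12))
    return out
```

Fewer channels discard more spectral detail while preserving the temporal envelope, which is what makes the number of channels a natural parameter for titrating degradation, as noted above.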
Here, we explored perception of emotional prosody in major primary progressive aphasia syndromes and Alzheimer’s disease, in both ‘clear’ and ‘degraded’ speech. Firstly, in line with previous research (Amlerova et al., 2022; Arroyo-Anlló et al., 2019; Horley et al., 2010; Macoir et al., 2019; Omar et al., 2011; Rankin et al., 2009; Rohrer, Sauter, et al., 2010; Shany-Ur & Rankin, 2011; Taler et al., 2008; Testa et al., 2001), we hypothesised that people with AD and primary progressive aphasia (PPA) (particularly nfvPPA) would perform worse than healthy individuals at identifying ‘clear’ emotional prosody. Secondly, we hypothesised that AD and PPA patients would incur an additional cost from degradation of emotional prosodic cues compared with healthy individuals, particularly in nfvPPA and lvPPA. Thirdly, we predicted that emotional prosody identification performance would correlate with measures of daily life socio-emotional functioning.