Participants. Twenty-four actors (14 males, 10 females; MAge = 42.5 ± 14 years) were recruited for the experiment through local theatre companies and academic theatre programs. All actors were legal adults who spoke English either as their native language or fluently as a second language (n = 1). Actors were required to have a minimum of three years of acting experience (MExp = 27.5 ± 14.3 years). Fourteen held degrees in acting, and two were pursuing degrees in acting at the time of the experiment. Seventeen of the 24 participants self-identified as professional actors. All participants gave written informed consent and received monetary compensation for their participation. In addition, written informed consent was provided by the model shown in Supplementary Figure 1 for publication of identifying information/images in an online open-access publication. The study was approved by the McMaster University Research Ethics Board, and all experiments were conducted in accordance with relevant guidelines and regulations.
Characters and emotions. The methods and procedures are similar to those reported in Berry and Brown (2019, 2021) 23,34. The actors performed the 9 characters from the 3 × 3 (assertiveness × cooperativeness) classification scheme validated by Berry and Brown (2017) 30 (see Figure 1). In addition to portraying the characters, the actors performed 8 basic emotions (happy, sad, angry, surprised, proud, calm, fearful, and disgusted) plus neutral, based on previous emotion studies using actors 18,21,36,37. The selected emotions were grouped according to an approximate dimensional analysis (e.g., Russell, 1980) 38, rather than being examined individually (see also Castellano et al., 2007) 16. A 2 × 2 scheme was used according to the valence and arousal of each emotion, as follows: positive valence + high arousal (happy, proud, surprised), negative valence + high arousal (angry, fearful, disgusted), positive valence + low arousal (calm), and negative valence + low arousal (sad). The order of presentation of the 9 characters and 9 emotions was randomized across the 18 trials for each participant. The actors performed a semantically neutral monologue script for each of the 18 trials. The script was created for the study and consisted of 7 neutral sentences (M = 6 ± 1.4 words/sentence) derived from a set of 10 validated linguistically neutral sentences from Ben-David et al. (2011) 39. Each trial lasted approximately two minutes, and the full set of trials lasted no more than 45 minutes. At the end of the session, the actor was debriefed and compensated.
Motion capture. The experiment took place in a black-box performance laboratory. Actors performed each of the 18 trials on stage, facing an empty audience section. The performances were video- and audio-recorded using a Sony XDCAM camera (model PXW-X70). A Qualisys three-dimensional (3D) passive motion-capture system was used to record body gestures and facial expressions for each actor. Sixteen Qualisys Oqus 7 infrared cameras captured marker movement in three dimensions at a sampling rate of 120 Hz 40,41. Participants were equipped with 61 passive markers placed on key body and facial landmarks, providing bilateral full-body coverage. Of these, 37 markers were placed on the torso and limbs, 4 on the head via a cap, and 20 on the face. The markers used in the present analysis were those placed bilaterally on the thumbs, elbows, and hips, as well as two single midline markers placed on the sternum and the bridge of the nose.
Data processing and cleaning. Marker movements for the body were recorded in 2D and reconstructed in 3D for analysis. The 2D-tracked motion data were processed using the Qualisys reconstruction algorithm, creating an analyzable 3D model within the user interface (UI; Qualisys, 2006) 41. Following this, each trial was cleaned manually using the 3D model via the UI (i.e., each marker and trajectory was identified manually, labeled, and extracted). Extraneous trajectories (e.g., noise, errors, reflective artifacts, unassigned or outlying markers) were excluded. No interpolation was done (i.e., no gaps in the 3D motion trajectory were filled). Instead, the data from a particular marker were omitted for the duration of any gap. This was done to prevent the system from incorrectly interpolating and/or skewing the motion data and thereby artificially changing the mean. The cleaned X coordinates (anterior-posterior movement), Y coordinates (right-left movement), and Z coordinates (superior-inferior movement) were extracted into data tables for further analysis.
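To make this gap-handling policy concrete, the following minimal R sketch (not the authors' Qualisys pipeline; the data frame and its column names are hypothetical) shows how frames with untracked markers can simply be dropped from downstream averaging rather than interpolated.

```r
# Minimal sketch of the gap-handling policy, assuming a hypothetical data frame
# 'marker_xyz' with one row per 120 Hz sample and columns x, y, z (in mm).
# Frames in which the marker was not tracked are stored as NA.
marker_xyz <- data.frame(
  x = c(102.1, 101.8, NA, 103.0),
  y = c( 55.4,  55.9, NA,  56.2),
  z = c(880.2, 879.7, NA, 881.5)
)

# Keep only fully tracked frames; no gap filling (interpolation) is performed,
# so untracked frames simply do not contribute to any subsequent means.
tracked <- marker_xyz[complete.cases(marker_xyz), ]
colMeans(tracked)
```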
Transformation of variable parameters. The variables of interest in this study are those related to the expansion and contraction of body segments. From the 61 available markers, we selected a subset of 8 for the current analysis: markers located on the nose bridge, sternum, left and right elbow, left and right thumb, and left and right hip. Pairs of markers were combined into 6 body segments whose expansion and contraction were measured in three-dimensional space, as shown in Figure 2. We examined two vertical postural segments: 1) “head”, extending from the sternum to the bridge of the nose, to indicate sagittal rotation of the head; and 2) “torso”, extending from the left/right hip to the sternum, to indicate sagittal rotation of the torso at the waist. Next, we examined four segments related to the horizontal and vertical expansion/contraction of the upper limbs: 3) “horizontal arm”, extending between the left and right elbows; 4) “vertical arm”, extending from the left/right hip to the left/right elbow; 5) “horizontal hand”, extending between the left and right thumbs; and 6) “vertical hand”, extending from the left/right hip to the left/right thumb. The term “left/right” indicates that the mean was taken for the two sides of the body for that segment. Each segment’s length was calculated from the raw exported X, Y, and Z coordinates for the pair of contributing markers using the following formula for Euclidean distance:
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$
where d is the Euclidean distance (i.e., the absolute geometric distance) between two points in 3D space, and x, y, and z are the 3D coordinates of the two contributing markers (subscripts 1 and 2) at a given time sample. A time series of the Euclidean distance for each body segment was created for each approximately 2-minute trial. The mean segment length across this time series was calculated for each of the 6 body segments using the following formula:
$$Md_{ij} = \frac{\sum d_{ij}}{sr \cdot t_{ij}}$$
where Md is the mean Euclidean distance in mm between marker pairs over the length of the entire trial (i.e., the mean segment length), d is the segment length, sr is the motion capture sample rate (i.e., 120 Hz), and t is the time in seconds of the entire trial. This resulted in a total of 6 parameters for the analysis (i.e., 6 body-segment means). Each body-segment parameter mean was extracted for each participant (i) for each character or emotion condition (j).
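The computation described by these two formulas can be sketched in R as follows; the function names and the input data frames (marker_a and marker_b, each holding per-sample x, y, z coordinates in mm for one marker of a pair) are hypothetical and are not the authors' actual code.

```r
# Hypothetical inputs: data frames 'marker_a' and 'marker_b', one row per 120 Hz
# sample, with columns x, y, z (in mm) for the two markers defining a segment.

# Per-sample Euclidean distance between the paired markers (segment length)
segment_length <- function(marker_a, marker_b) {
  sqrt((marker_b$x - marker_a$x)^2 +
       (marker_b$y - marker_a$y)^2 +
       (marker_b$z - marker_a$z)^2)
}

# Mean segment length over the trial: the summed per-sample distances divided by
# the number of samples (sample rate x trial duration in seconds). With no
# dropped frames this is simply the mean of the distance time series; frames
# omitted during cleaning (NA) do not contribute.
mean_segment_length <- function(d) {
  mean(d, na.rm = TRUE)
}
```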
[ Insert Figure 2 about here ]
Correcting for body-size differences. A “percent change” transformation was applied to the 6 segmental parameter means in order to eliminate any bias caused by subject-related differences in body size. This was carried out by subtracting the mean segmental lengths for the neutral emotion condition (i.e., performing the script devoid of any character or emotion) from the means for each character and emotion trial, as per the following formula:
$$\%~\text{change} = 100 \times \frac{Md_{\text{performance}} - Md_{\text{neutral}}}{Md_{\text{neutral}}}$$
where the percent change is the difference between the mean Euclidean distance for a participant’s given performance condition (character or emotion) and the participant’s neutral emotion condition, scaled to the neutral condition, and then multiplied by 100. As a result, all data for the characters and emotions are reported as a percent change relative to the neutral emotion condition. Following this transformation, each parameter was visually screened for extreme outliers, of which none were found. Finally, to reduce handedness effects, the bilateral average was taken for the vertical arm, vertical hand, and torso.
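A minimal R sketch of this transformation is shown below; the function and argument names are hypothetical, and the usage values are illustrative only.

```r
# Percent change of a given performance (character or emotion) trial relative to
# the same actor's neutral trial, removing between-actor differences in body size.
# 'md_performance' and 'md_neutral' are hypothetical mean segment lengths in mm.
percent_change <- function(md_performance, md_neutral) {
  100 * (md_performance - md_neutral) / md_neutral
}

# Usage with hypothetical values: a vertical-hand segment averaging 455 mm during
# a character trial vs. 405 mm during the same actor's neutral trial
percent_change(md_performance = 455, md_neutral = 405)  # ~12.3% expansion
```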
Analysis of variance. Statistics were conducted in R 4.0.2 (R Core Team, 2018) 42. Each of the 6 transformed parameters was analyzed using a two-way repeated-measures analysis of variance (RM ANOVA), which fits a linear model (lm) using the stats package (v3.6.2, R Core Team, 2013) 43. For the character trials, the two orthogonal dimensions of assertiveness and cooperativeness were treated as fixed effects (i.e., within-subject factors), while subject was treated as the random effect (i.e., error). For the emotion trials, the two approximated dimensions of valence and arousal from the circumplex model of emotion 38 were treated as fixed effects, while subject was again treated as the random effect. The neutral emotion condition – which was used as the baseline condition for data normalization – was not included in either of these analyses. The final sample for the repeated-measures ANOVAs was therefore n = 216 for the characters (9 characters × 24 participants) and n = 192 for the emotions (8 emotions × 24 participants). Statistical significance was set at α = .05, and adjustments for repeated testing across the group of 6 segmental parameters were made using Bonferroni corrections (i.e., α/6 for each segment, resulting in a corrected threshold of α = .008) 23,44. The significance of the statistical analyses and the estimates of effect size using general eta-squared (η²) and partial eta-squared (ηp²) were calculated using the rstatix package (v0.7.0, Kassambara, 2019) 45.
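As a hedged illustration of this analysis (not the authors' exact code), the sketch below fits the character-trial model for one segmental parameter; the data frame char_data and its column names are hypothetical, and rstatix::anova_test is shown as one way of obtaining the eta-squared effect sizes.

```r
library(rstatix)  # v0.7.0, used here for ANOVA effect sizes

# Hypothetical long-format data: one row per participant x character trial, with
# columns subject, assertiveness (3 levels), cooperativeness (3 levels), and
# pct_change (the percent-change value for one body segment).
res <- anova_test(
  data        = char_data,
  dv          = pct_change,
  wid         = subject,
  within      = c(assertiveness, cooperativeness),
  effect.size = "ges"                 # eta-squared (generalized)
)
get_anova_table(res)

# Equivalent base-R (stats package) specification of the repeated-measures
# error structure, with subject as the error (random-effect) term:
summary(aov(pct_change ~ assertiveness * cooperativeness +
              Error(subject / (assertiveness * cooperativeness)),
            data = char_data))

# Bonferroni-corrected threshold across the 6 segmental parameters
alpha_corrected <- 0.05 / 6   # = .008 (rounded)
```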
Cross-modal correlation analysis. Combining the character and emotion trials, we used the stats package (v3.6.2, R Core Team, 2013) 43 to calculate Pearson product-moment correlations between the 6 body segmental parameters, the 4 facial segmental parameters for the brow, eyebrows, lips, and jaw (reported in Berry & Brown, 2021) 34, and the 2 vocal parameters of pitch (in cents) and loudness (in decibels) (reported in Berry & Brown, 2019) 23. All parameters were z-score transformed within-subject in order to avoid any scale-related artifacts due to intermodal variability when comparing parameters across the body, face, and voice. The neutral emotion condition – which was used as the baseline condition for data normalization – was not included in this analysis. The final sample for the cross-modal correlations was therefore n = 408 (9 characters × 24 participants + 8 emotions × 24 participants). Statistical significance was set at α = .05, and adjustment for repeated testing of the 53 analyzed intermodal correlations was made using Bonferroni corrections (i.e., α/53 for each correlation, resulting in a corrected threshold of α = .0009). An additional 13 intramodal correlations are presented in Table 3 (e.g., the correlation between the lips and brow within the face), but these were not correlations of interest; only the 53 intermodal correlations were.
Table 3
| Modality | Measure | Pitch | Loudness | Brow | Eyebrow | Lips | Jaw | Head | Torso | Arm (h) | Arm (v) | Hand (h) | Hand (v) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VOICE | Pitch | NA | < 0.001 | 0.009 | 0.803 | < 0.001 | < 0.001 | < 0.001 | 0.360 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| VOICE | Loudness | 0.79*** | NA | 0.010 | 0.866 | 0.010 | < 0.001 | < 0.001 | 0.236 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| FACE | Brow | 0.13 | 0.13 | NA | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.781 | 0.144 | 0.737 | < 0.001 |
| FACE | Eyebrow | -0.01 | 0.01 | 0.62*** | NA | < 0.001 | 0.002 | < 0.001 | < 0.001 | 0.482 | 0.872 | 0.745 | < 0.001 |
| FACE | Lips | 0.19** | 0.13 | 0.37*** | 0.42*** | NA | 0.933 | < 0.001 | < 0.001 | 0.309 | 0.015 | 0.256 | 0.031 |
| FACE | Jaw | 0.69*** | 0.59*** | 0.24*** | 0.16 | 0.00 | NA | < 0.001 | 0.048 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| BODY | Head | 0.30*** | 0.37*** | 0.20** | 0.25*** | 0.25*** | 0.38*** | NA | 0.019 | 0.002 | < 0.001 | < 0.001 | < 0.001 |
| BODY | Torso | -0.05 | 0.06 | 0.22*** | 0.20** | 0.27*** | -0.10 | 0.12 | NA | 0.033 | < 0.001 | 0.004 | < 0.001 |
| BODY | Arm (h) | 0.32*** | 0.39*** | 0.01 | -0.03 | 0.05 | 0.32*** | 0.15 | 0.11 | NA | < 0.001 | < 0.001 | < 0.001 |
| BODY | Arm (v) | 0.38*** | 0.41*** | 0.07 | 0.01 | 0.12 | 0.30*** | 0.17* | 0.18* | 0.78*** | NA | < 0.001 | < 0.001 |
| BODY | Hand (h) | 0.31*** | 0.49*** | 0.02 | -0.02 | 0.06 | 0.20** | 0.18* | 0.14 | 0.26*** | 0.37*** | NA | 0.970 |
| BODY | Hand (v) | 0.42*** | 0.30*** | 0.20** | 0.17* | 0.11 | 0.50*** | 0.18* | -0.19** | 0.24*** | 0.30*** | 0.00 | NA |
Note: Summary of the Pearson product-moment correlations and their significance for each modality: voice (pitch and loudness), face (brow, eyebrow, lips, and jaw), and body (head, torso, horizontal arm, vertical arm, horizontal hand, and vertical hand). The lower triangle contains Pearson r values, whereas the upper triangle contains uncorrected p values. For the upper triangle, p values that failed to reach significance after Bonferroni correction are in italics. P values that retained significance after Bonferroni corrections are in bold. For the lower triangle, *pCORR < .05, **pCORR < .01, ***pCORR < .001, where “CORR” reflects the adjusted alpha value after Bonferroni correction for the 53 analyzed intermodal correlations (indicated by the region of grey shading in the table), which are a subset of the 66 correlations shown in the table. |
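To illustrate the correlation procedure described above (a sketch only, with hypothetical object and column names), the R code below z-scores two parameters within subject, computes the Pearson correlation for one intermodal pair, and applies the Bonferroni-adjusted threshold.

```r
# Hypothetical long-format data frame 'trials': one row per participant x condition
# (9 characters + 8 emotions = 17 rows per participant), with one column per
# parameter (e.g., hand_h for the horizontal hand, loudness for vocal loudness).

# Within-subject z-scoring, to avoid scale-related artifacts across modalities
z_within <- function(x) as.numeric(scale(x))
trials_z <- transform(trials,
  hand_h   = ave(hand_h,   subject, FUN = z_within),
  loudness = ave(loudness, subject, FUN = z_within)
)

# Pearson product-moment correlation for one intermodal (body-voice) pair
ct <- cor.test(trials_z$hand_h, trials_z$loudness, method = "pearson")

# Bonferroni-corrected threshold for the 53 analyzed intermodal correlations
alpha_corrected <- 0.05 / 53   # ~.0009
ct$p.value < alpha_corrected
```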