In this study, we examined the impact of different interaction settings on decision-making and creativity. A total of 56 participants were randomly assigned to one of three conditions: FTF, VC, and VR. Each group engaged in a 20-minute brainstorming session to generate ideas on an assigned problem, followed by a 20-minute phase to collaboratively rank the top 5 ideas. Creativity was assessed using three dimensions: uncommonness, remoteness, and cleverness. Decision-making efficacy was calculated by dividing the time spent on ranking by the number of ideas generated. Participants also completed 4 questionnaires: the Zoom Exhaustion & Fatigue Scale, the Flow Short Scale, the Perceived Performance Scale, and the Single-Item Social Identification.
Figures 1 and 2 are radar plots that summarize the outcomes for all investigated variables, each labeled at its own corner of the plot. Because the constructs were assessed on different measurement scales, the data were normalized to facilitate cross-variable comparisons. For variables lacking a predefined scale, such as the number of ideas generated, normalization was based on the maximum observed value for that variable.
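The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that values on a predefined scale are divided by the upper bound of that scale, which the text does not state, and that other values are divided by the maximum observed value. The function name `normalize` and the 7-point scale maximum are hypothetical.

```python
def normalize(values, scale_max=None):
    """Rescale values to the 0-1 range.

    If scale_max is given, divide by it (predefined scale, an assumption);
    otherwise divide by the largest observed value (no predefined scale).
    """
    divisor = scale_max if scale_max is not None else max(values)
    if divisor == 0:
        raise ValueError("cannot normalize by zero")
    return [v / divisor for v in values]


# Mean number of ideas per condition (VC, VR, FTF) from Table 1; no predefined scale.
ideas = normalize([15.6, 16.8, 16.2])

# Mean flow experience per condition, assuming a hypothetical 7-point scale.
flow = normalize([5.21, 5.14, 5.55], scale_max=7)
```

With this choice, the condition with the largest observed value sits at 1.0 on its axis, and every other value is expressed as a fraction of it.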
Figure 1 covers the questionnaire outcomes: Flow experience, perceived performance, social identification, and the Zoom Exhaustion & Fatigue Scale. Figure 2 covers creativity, decision-making, and the four dimensions examined with the Bales Interaction Process Analysis grid [30].
Fig. 1: Multidimensional Analysis of Psychological and Cognitive Metrics Across Interaction Settings
This figure shows the mean scores for each of the three interaction conditions, face-to-face (FTF, green line), video conference via Teams (VC, red line), and virtual reality (VR, blue line), across the psychological and cognitive constructs. The constructs included are Flow experience, Perceived Performance, Social Identification, and the Zoom Exhaustion & Fatigue Scale. Data normalization was applied to account for the different measurement scales across the constructs.
Fig. 2: Comparative Analysis of Creativity, Decision-Making, and Interaction Dynamic
This figure shows the mean scores for each interaction condition, face-to-face (FTF, green line), video conference via Teams (VC, red line), and virtual reality (VR, blue line), in relation to creativity, decision-making, and the four quadrants assessed by the Bales Interaction Process Analysis grid [30]. Data normalization was performed to ensure comparability across variables that were measured on different scales.
Table 1 presents the means and standard deviations for each dimension across the interaction settings. Tables 2 and 3 report the outcomes of the statistical analyses. Table 2 contains the results for normally distributed data, for which Analysis of Variance (ANOVA) and t-tests were employed. Table 3 contains the results for data that were not normally distributed, analyzed using Friedman and Wilcoxon tests. In both tables, the significance level of the p-value is denoted by asterisks.
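The text does not define the thresholds behind the asterisks. The sketch below assumes a common convention (`***` below 0.001, `**` below 0.01, `*` below 0.05, and `.` below 0.1); both the thresholds and the function name `significance_marker` are assumptions for illustration.

```python
def significance_marker(p):
    """Return the marker for a p-value under the assumed thresholds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."  # trend that does not reach the 0.05 level
    return ""
```

For example, under these assumed thresholds a p-value of 0.0008 maps to `***` and a p-value of 0.464 maps to an empty marker.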
| Variables | Video Conference: M | Video Conference: SD | Virtual Reality: M | Virtual Reality: SD | Face-to-face: M | Face-to-face: SD |
|---|---|---|---|---|---|---|
| FSS | | | | | | |
| Flow experience | 5.21 | 0.86 | 5.14 | 1.07 | 5.55 | 0.70 |
| Perceived outcome importance | 2.38 | 0.88 | 3.12 | 1.14 | 2.85 | 1.14 |
| Fluency of performance | 5.24 | 1.05 | 4.83 | 1.34 | 5.4 | 0.94 |
| Absorption by activity | 4.55 | 0.88 | 5.05 | 0.95 | 5.12 | 0.68 |
| ZEFS | | | | | | |
| General | 1.5 | 0.74 | 2.39 | 1.11 | 1.76 | 0.73 |
| Social | 1.97 | 0.96 | 2 | 1.05 | 2.09 | 1.09 |
| Motivational | 1.83 | 0.89 | 2.06 | 1.01 | 2.04 | 1.04 |
| Emotional | 1.35 | 0.76 | 1.59 | 0.75 | 1.64 | 0.82 |
| Visual | 1.49 | 0.65 | 2.33 | 1.02 | | |
| PPS | | | | | | |
| Perceived performance | 4.91 | 1.05 | 4.67 | 1.3 | 4.88 | 1.25 |
| Different group | 1.97 | 1.08 | 2.27 | 1.57 | 2.29 | 1.45 |
| SISI | | | | | | |
| Single-item social identification | 6.26 | 1.07 | 6 | 1.21 | 6.18 | 0.94 |
| Number of ideas | 15.6 | 5.1 | 16.8 | 7.77 | 16.2 | 5.03 |
| Evaluation of ideas | | | | | | |
| Uncommon | 3.44 | 0.21 | 3.35 | 0.3 | 3.44 | 0.28 |
| Remote | 3.05 | 0.36 | 2.99 | 0.26 | 3.06 | 0.31 |
| Clever | 3.66 | 0.25 | 3.55 | 0.14 | 3.57 | 0.13 |
| Decision making | 40.94 | 16.64 | 38.44 | 15.62 | 32.85 | 16.29 |
| IPA | | | | | | |
| Social emotional Area: Positive Reactions | 76.9 | 16.18 | 97.4 | 22.55 | 75 | 9.71 |
| Task Area: Attempted Answers | 82.1 | 12.82 | 101.3 | 15.55 | 113.6 | 14.78 |
| Task Area: Questions | 15 | 8.58 | 24 | 10.02 | 13.4 | 2.95 |
| Social emotional Area: Negative Reactions | 10.9 | 5.93 | 8 | 4.29 | 14.1 | 5.92 |
Table 1: Descriptive Statistics for Interaction Conditions
This table presents the means and standard deviations for all measured dimensions, segmented by interaction condition: face-to-face, video conference, and virtual reality. M stands for mean and SD for standard deviation.
| Variables | VC x VR x FTF: F | VC x VR x FTF: P | VC x VR: M diff. | VC x VR: T | VC x VR: P | VR x FTF: M diff. | VR x FTF: T | VR x FTF: P | VC x FTF: M diff. | VC x FTF: T | VC x FTF: P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FSS | | | | | | | | | | | |
| Perceived outcome importance | 7.781 | 0.0008 *** | -0.71 | -4.343 | 0.0001 *** | 0.36 | 1.288 | 0.2055 | -0.35 | -2.458 | 0.0186 * |
| Fluency of performance | 3.804 | 0.0266 * | 0.48 | 1.551 | 0.1291 | -0.60 | -2.574 | 0.0141 * | -0.12 | -1.213 | 0.2326 |
| Absorption by activity | 7.147 | 0.0014 ** | -0.49 | -2.480 | 0.0177 * | 0.05 | -0.432 | 0.6682 | -0.44 | -4.837 | 2.207e-05 *** |
| PPS | | | | | | | | | | | |
| Perceived performance | 0.776 | 0.464 | 0.24 | 1.062 | 0.2951 | -0.21 | -1.108 | 0.2747 | 0.02 | 0.118 | 0.9070 |
| Creativity | | | | | | | | | | | |
| Number of ideas | 0.123 | 0.885 | -1.20 | -0.513 | 0.6200 | 0.60 | 0.199 | 0.8463 | -0.60 | -0.347 | 0.7362 |
| Decision making | 1.038 | 0.375 | 2.51 | 0.344 | 0.7389 | 5.59 | 1.063 | 0.3156 | 8.10 | 1.882 | 0.0926 . |
| IPA | | | | | | | | | | | |
| Social emotional Area: Positive Reactions | 4.757 | 0.0219 * | -20.50 | -2.148 | 0.0603 . | 22.40 | 2.672 | 0.0255 * | 1.90 | 0.330 | 0.7491 |
| Task Area: Attempted Answers | 18.380 | 4.49e-05 *** | -19.20 | -3.191 | 0.0110 * | -12.30 | -2.317 | 0.0457 * | -31.50 | -7.442 | 3.926e-05 *** |
| Social emotional Area: Negative Reactions | 3.116 | 0.0689 . | 2.90 | 1.174 | 0.2704 | -6.10 | -2.690 | 0.0248 * | -3.20 | -1.238 | 0.2471 |
Table 2: Parametric Statistical Analysis Results
This table displays the outcomes of parametric statistical analyses, including ANOVA and t-tests, conducted on data samples with a normal distribution. Significance levels are indicated by asterisks next to the p-values. False Discovery Rate (FDR) correction was applied for multiple comparisons. VC stands for Video Conference, VR for Virtual Reality, and FTF for Face To Face interactions. M diff stands for Mean Difference.
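The caption mentions an FDR correction without naming the procedure. As a minimal sketch, the Benjamini-Hochberg procedure, one widely used FDR method, can be written as below. That the authors used this particular procedure is an assumption, and the function name and the raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        candidate = p_values[idx] * m / rank
        # Enforce monotonicity: an adjusted value never exceeds that of a larger p.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = benjamini_hochberg(raw)
```

Each adjusted value is then compared against the chosen significance level in place of the raw p-value.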
| Variables | VC x VR x FTF: Chi-squared | VC x VR x FTF: P | VC x VR: M diff. | VC x VR: V | VC x VR: P | VR x FTF: M diff. | VR x FTF: V | VR x FTF: P | VC x FTF: M diff. | VC x FTF: V | VC x FTF: P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FSS | | | | | | | | | | | |
| Flow experience | 2.430 | 0.2968 | 0.13 | 391.5 | 0.7662 | -0.38 | 203.5 | 0.0260 * | -0.25 | 168.5 | 0.0099 ** |
| ZEFS | | | | | | | | | | | |
| General | 26.922 | 1.426e-06 *** | -0.89 | 73.5 | 0.0002 *** | 0.63 | 444.0 | 0.0035 ** | -0.26 | 70.5 | 0.0076 ** |
| Social | 2.346 | 0.3095 | -0.03 | 149.0 | 0.9885 | -0.09 | 165.0 | 0.5715 | -0.12 | 152.0 | 0.5570 |
| Motivational | 1.863 | 0.3939 | -0.23 | 160.0 | 0.2156 | 0.02 | 203.5 | 1.0000 | -0.21 | 124.5 | 0.1201 |
| Emotional | 6.615 | 0.0366 * | -0.24 | 60.0 | 0.0955 | -0.05 | 124.5 | 0.6906 | -0.29 | 50.0 | 0.0132 * |
| Visual | | | -0.84 | 66.5 | 0.0002 *** | | | | | | |
| PPS | | | | | | | | | | | |
| Different group | 0.640 | 0.7264 | -0.30 | 141.5 | 0.5792 | -0.02 | 123.0 | 0.9218 | -0.32 | 99.5 | 0.2449 |
| SISI | | | | | | | | | | | |
| Single-item social identification | 2.355 | 0.3081 | 0.26 | 98.0 | 0.1163 | -0.18 | 46.5 | 0.4445 | 0.08 | 48.0 | 0.4644 |
| Evaluation of ideas | | | | | | | | | | | |
| Uncommon | 1.800 | 0.4066 | 0.09 | 20.0 | 0.3525 | -0.09 | 8.0 | 0.3525 | 0 | 16.0 | 0.8336 |
| Remote | 0.200 | 0.9048 | 0.06 | 16.0 | 0.7998 | -0.07 | 10.0 | 0.5541 | -0.01 | 17.0 | 0.9442 |
| Clever | 2.600 | 0.2725 | 0.11 | 21.0 | 0.2719 | -0.02 | 13.0 | 0.9326 | 0.09 | 28.0 | 0.1834 |
| IPA | | | | | | | | | | | |
| Task Area: Questions | 15.436 | 0.0004 *** | -9.00 | 0 | 0.0058 ** | 10.60 | 55.0 | 0.0059 ** | 1.60 | 24.0 | 0.9055 |
Table 3: Non-Parametric Statistical Analysis Results
This table shows the results of non-parametric statistical analyses, specifically Friedman and Wilcoxon tests, conducted on data samples with non-normal distributions. Significance levels are denoted by asterisks adjacent to the p-values. False Discovery Rate (FDR) correction was applied for multiple comparisons. VC stands for Video Conference, VR for Virtual Reality, and FTF for Face To Face interactions. M diff stands for Mean Difference.
Flow Short Scale
The analysis yielded several noteworthy findings concerning the perception of the flow state across different interaction conditions.
Flow Experience
Statistically significant differences were observed in the perception of flow between VR and FTF conditions (p=0.026*; Mean Difference= -0.38), as well as between VC and FTF conditions (p=0.0099**; Mean Difference= -0.25). The FTF condition registered the highest mean score, while the VR condition recorded the lowest.
Perceived Outcome Importance
In terms of perceived outcome importance, the VR condition yielded the highest mean, whereas the VC condition had the lowest. A one-way ANOVA revealed a highly significant p-value of 0.0008***. Subsequent pairwise t-tests between VC and FTF (p=0.0186*; Mean Difference= -0.35) and VC and VR (p=0.0001***; Mean Difference= -0.71) were also significant.
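The mean difference and t statistic reported for such pairwise comparisons can be illustrated with a short sketch. It assumes the paired form of the t-test, in which the same participants are measured in both conditions; the text does not state which form was used. The function name and the ratings are hypothetical.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, t statistic) for paired samples."""
    if len(first) != len(second):
        raise ValueError("samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: sample SD divided by sqrt(n).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_err


# Hypothetical ratings from five participants in two conditions.
vc = [2.0, 3.0, 2.5, 2.0, 3.0]
vr = [3.0, 3.5, 3.5, 2.5, 4.0]
mean_diff, t = paired_t(vc, vr)
```

A negative mean difference here means that the first condition scored lower than the second, which matches the sign convention of the VC vs. VR comparison above.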
Fluency of Performance
For the dimension of fluency of performance, a significant ANOVA p-value of 0.0266* was obtained. The lowest mean was associated with the VR condition, while the highest was observed in the FTF condition. A significant t-test result was found specifically for this pair (p=0.0141*; Mean Difference= -0.60).
Absorption by Activity
Regarding absorption by activity, the VC condition exhibited the lowest mean, while the FTF condition had the highest. The ANOVA yielded a significant p-value of 0.0014**. Pairwise t-tests revealed significant p-values for VC vs. VR (p=0.0177*; Mean Difference= -0.49) and VC vs. FTF (p<0.0001***; Mean Difference= -0.44).
Zoom Exhaustion & Fatigue Scale (ZEFS)
Non-parametric statistical methods were employed for all dimensions of the ZEFS questionnaire.
General Fatigue
The Friedman test yielded a highly significant result for the dimension of General Fatigue (p<0.0001***). Subsequent pairwise Wilcoxon tests were significant for all three comparisons, indicating that the three conditions differ from one another. In ascending order of mean fatigue, the conditions rank as follows: VC (M=1.50), FTF (M=1.76), and VR (M=2.39). The pairwise results are: VC vs. VR (p=0.0002***; Mean Difference= -0.89), VR vs. FTF (p=0.0035**; Mean Difference= 0.63), and VC vs. FTF (p=0.0076**; Mean Difference= -0.26).
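To illustrate the V statistic reported for the Wilcoxon tests, the sketch below computes it for paired samples. It assumes that V is the sum of the ranks of the positive differences, as in common statistical software; the text does not define V. The function name and the ratings are hypothetical, and the p-value computation is omitted.

```python
def wilcoxon_v(first, second):
    """Return V, the sum of ranks of the positive paired differences."""
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    return sum(rank for diff, rank in zip(ordered, ranks) if diff > 0)


# Hypothetical fatigue ratings from six participants in two conditions.
vc = [1, 2, 1, 3, 2, 1]
vr = [3, 2, 2, 2, 4, 4]
v = wilcoxon_v(vc, vr)
```

Under this definition, a small V means that few pairs had a higher value in the first condition, and a V of 0 means that none did.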
Emotional Fatigue
Another dimension that yielded a significant Friedman test p-value was Emotional Fatigue (p=0.0366*). In this dimension, the only pairwise comparison that reached statistical significance in the Wilcoxon test was between VC and FTF (p=0.0132*; Mean Difference= -0.29). The FTF condition registered the highest mean (M=1.64), whereas the VC condition had the lowest (M=1.35).
Visual Fatigue
The dimension of Visual Fatigue was assessed solely in the VC (M=1.49) and VR (M=2.33) conditions. A Wilcoxon test revealed a highly significant p-value for this pairwise comparison (p=0.0002***; Mean Difference= -0.84).
Perceived Performance Scale (PPS)
Perceived Performance
The initial dimension explored by the PPS was Perceived Performance. The ANOVA analysis yielded a non-significant p-value (p=0.464), suggesting that participants, on average, did not perceive any performance disparities across the three interaction conditions. Subsequent t-tests corroborated this finding, as they also produced non-significant p-values.
Different Group
The second dimension, termed Different Group, aimed to assess whether participants believed they would generate either a greater number or higher quality of ideas with alternative group members. For this dimension, a Friedman test was conducted, resulting in a non-significant p-value (p=0.7264). Pairwise comparisons using the Wilcoxon test further substantiated this outcome, as all returned non-significant p-values.
Single-Item Social Identification (SISI)
This questionnaire consists of a single item and investigates the perception of belonging to the group. The Friedman test conducted on the data of the three conditions obtained a non-significant p-value (p=0.3081). Further comparisons between pairs using the Wilcoxon test also all reported non-significant p-values.
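As a sketch of how a Friedman chi-squared statistic is obtained for three repeated measurements, the function below ranks each participant's values across conditions and applies the standard formula without a tie correction. It rejects tied values within a row, which real questionnaire data would often contain, so it is an illustration and not a replacement for a statistics package. The function name and the ratings are hypothetical.

```python
def friedman_chi_squared(rows):
    """Return the Friedman statistic for rows of k repeated measurements.

    Assumes no tied values within a row, so no tie correction is applied.
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0] * k
    for row in rows:
        if len(row) != k or len(set(row)) != k:
            raise ValueError("rows must have k distinct values in this sketch")
        # Rank the k values of this row from 1 (smallest) to k (largest).
        by_value = sorted(range(k), key=lambda i: row[i])
        for rank, idx in enumerate(by_value, start=1):
            rank_sums[idx] += rank
    total = sum(r * r for r in rank_sums)
    return 12 / (n * k * (k + 1)) * total - 3 * n * (k + 1)


# Hypothetical ratings from four participants in three conditions (VC, VR, FTF).
ratings = [
    [6, 5, 7],
    [5, 4, 6],
    [7, 5, 6],
    [4, 3, 5],
]
chi_squared = friedman_chi_squared(ratings)
```

The statistic grows as the rank sums of the conditions move apart; when all conditions receive similar ranks, it stays close to zero.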
Interaction Process Analysis (IPA)
This analytical tool was employed to scrutinize all interactions across the different conditions. Each interaction was categorized into distinct areas of interaction, and several significant findings were observed. These are elaborated upon below.
Social-emotional Area: Positive Reactions
ANOVA analysis for this area yielded a significant p-value (p=0.0219*). The condition with the highest mean frequency of positive interactions was VR (M=97.4), while the FTF condition registered the lowest mean (M=75). Pairwise t-tests revealed a trend towards significance for the VC x VR comparison (p=0.0603; M diff= -20.5) and a significant p-value for the VR x FTF comparison (p=0.0255*; M diff=22.40).
Task Area: Attempted Answers
This area encompasses all participant attempts at providing answers and explanations. The ANOVA analysis yielded a highly significant p-value (p=4.49e-05***). Subsequent t-tests for all pairwise comparisons were also significant: VC x VR (p=0.0110*; M diff= -19.20), VR x FTF (p=0.0457*; M diff= -12.30), and VC x FTF (p=3.926e-05***; M diff= -31.50). Given the significant differences across all pairs, the conditions can be ranked by their respective means: VC (M=82.1), VR (M=101.3), and FTF (M=113.6).
Task Area: Questions
The data for this area did not follow a normal distribution and were thus analyzed using Friedman and Wilcoxon tests. The overall comparison yielded a highly significant p-value (p=0.0004***). Pairwise comparisons revealed significant p-values for VC x VR (p=0.0058**; M diff= -9.00) and VR x FTF (p=0.0059**; M diff=10.60). Given these results and the observed means (VC M=15; VR M=24; FTF M=13.4), questions and requests for clarification were significantly more frequent in VR than in the other two conditions.
Social-emotional Area: Negative Reactions
This area includes interactions that generate tension, attempts to dominate, and rejection of others' ideas. The ANOVA analysis produced a p-value approaching significance (p=0.0689). Subsequent t-tests revealed a significant p-value only for the VR x FTF pair (p=0.0248*; M diff= -6.10). VR registered the lowest mean frequency of negative interactions (M=8), while FTF had the highest (M=14.1).
Creativity
Preliminary Analysis: Topic Variability
Prior to assessing the influence of communication settings on creativity, an initial analysis evaluated whether the number of ideas generated varied across the three thematic areas (see Methods): Tourism (M=13.6), Restaurant (M=19.3), and Pollution (M=15.7). Shapiro's test confirmed the normality of the distributions for these samples. A repeated-measures ANOVA yielded a significant p-value (p=0.0344*). Pairwise t-tests revealed a significant difference between the Tourism and Restaurant topics (p=0.0301*; M diff= -5.7). However, this variability was deemed inconsequential for the broader study, as each topic was employed in a balanced manner across all conditions.
Number of Ideas Generated
The mean numbers of ideas generated in each experimental condition were as follows: VC (M=15.6), VR (M=16.8), and FTF (M=16.2). Given the normal distribution of these data, an ANOVA was employed for comparative analysis, resulting in a non-significant difference (p=0.885).
Qualitative Analysis: OSF Tool
Subsequently, two independent human raters evaluated each generated idea using the OSF tool, which employs three distinct criteria: Uncommon, Remote, and Clever. The mean ratings for each criterion across the conditions were: Uncommon (VC M=3.44; VR M=3.35; FTF M=3.44), Remote (VC M=3.05; VR M=2.99; FTF M=3.06), and Clever (VC M=3.66; VR M=3.55; FTF M=3.57).
Because these samples were not normally distributed, Friedman tests were conducted for comparative analysis. The results indicated non-significant p-values across all three dimensions: Uncommon (p=0.4066), Remote (p=0.9048), and Clever (p=0.2725).
Decision making
To assess the impact of the experimental settings on group decision-making efficacy, we computed a ratio representing the time required to reach a consensus on idea ranking relative to the number of ideas generated in each session. The mean ratios for the three experimental conditions were as follows: VC (M=40.94), VR (M=38.44), and FTF (M=32.85). An Analysis of Variance (ANOVA) was conducted to compare these means, yielding a non-significant p-value (p=0.375).
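The ratio described above amounts to a single division per session. The sketch below shows it with hypothetical numbers; the text does not state the unit of time, so the unit, the function name, and the example values are all assumptions.

```python
def decision_ratio(ranking_time, ideas_generated):
    """Return the ranking time per generated idea for one session."""
    if ideas_generated <= 0:
        raise ValueError("at least one idea is required")
    return ranking_time / ideas_generated


# Hypothetical session: 600 time units spent ranking, 15 ideas generated.
ratio = decision_ratio(600, 15)
```

A lower ratio means that the group needed less time per generated idea to agree on a ranking.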
Correlation and linear regression
Table 4 presents the outcomes of correlation analyses, along with associated p-values, examining the relationship between participant age and responses across all questionnaires within the various experimental conditions. With respect to the questionnaire probing the state of flow, the correlation analyses between its multiple constructs and participant age yielded no statistically significant results.
In the analysis of responses to the Zoom Exhaustion & Fatigue Scale (ZEFS), several noteworthy correlations with age emerged. In the VC condition, three dimensions—General Fatigue (r= -0.2725; p=0.0933), Motivational Fatigue (r= -0.2812; p=0.0829), and Visual Fatigue (r= -0.2835; p=0.0891)—displayed p-values approaching significance, all indicating a negative correlation with age.
For the VR condition, significant p-values were observed in the dimensions of Motivational Fatigue (r= -0.3406; p=0.0339*) and Emotional Fatigue (r= -0.3525; p=0.0278*). Both dimensions exhibited a negative correlation, suggesting that as participant age increased, reported fatigue levels decreased.
In the FTF condition, a significant negative correlation was found for the dimension of General Fatigue (r= -0.4263; p=0.0068**).
Lastly, within the context of the Perceived Performance Scale (PPS), a significant p-value emerged in the linear regression analysis between age and the "Different Group" dimension, but only in the VR condition (r= -0.3766; p=0.0181*).
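In a simple linear regression with one predictor, the F statistic follows directly from the correlation coefficient: F = r²(n - 2) / (1 - r²). The sketch below applies this identity, assuming the reported analyses are one-predictor regressions of each questionnaire score on age. The sample size n = 39 is a hypothetical value, chosen because it brings the result close to the F reported for General Fatigue in the FTF condition; the text does not state how many participants entered this analysis.

```python
def f_from_r(r, n):
    """Return the F statistic of a simple linear regression from Pearson's r."""
    if n < 3:
        raise ValueError("at least three observations are required")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    r_squared = r * r
    # F has 1 and n - 2 degrees of freedom in a one-predictor regression.
    return r_squared * (n - 2) / (1 - r_squared)


# r for General Fatigue in the FTF condition; n = 39 is a hypothetical sample size.
f_value = f_from_r(-0.4263, 39)
```

Because F depends only on r² and n, the sign of the correlation does not affect it; the direction of the relationship has to be read from r itself.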
| Variables | VC: r | VC: F | VC: P | VR: r | VR: F | VR: P | FTF: r | FTF: F | FTF: P |
|---|---|---|---|---|---|---|---|---|---|
| ZEFS | | | | | | | | | |
| General | -0.2725 | 2.9682 | 0.0933 . | -0.1871 | 1.3428 | 0.2540 | -0.4263 | 8.2191 | 0.0068 ** |
| Social | -0.2087 | 1.6850 | 0.2023 | -0.1761 | 1.1844 | 0.2835 | -0.0822 | 0.2517 | 0.6189 |
| Motivational | -0.2812 | 3.1771 | 0.0829 . | -0.3406 | 4.8542 | 0.0339 * | 0.0163 | 0.0099 | 0.9215 |
| Emotional | -0.1447 | 0.7918 | 0.3793 | -0.3525 | 5.2483 | 0.0278 * | -0.1672 | 1.0645 | 0.3089 |
| Visual | -0.2835 | 3.0588 | 0.0891 . | -0.1301 | 0.6022 | 0.4430 | | | |
| PPS | | | | | | | | | |
| Perceived performance | -0.1300 | 0.6356 | 0.4304 | 0.0398 | 0.0588 | 0.8098 | -0.2174 | 1.8347 | 0.1838 |
| Different group | -0.0773 | 0.2223 | 0.6401 | -0.3766 | 6.1143 | 0.0181 * | -0.2819 | 3.1948 | 0.0821 . |
Table 4: Outcomes of the correlation analyses between participant age and responses to the questionnaires administered across the experimental conditions. For each condition, the table reports the correlation coefficient (r), the F statistic, and the associated p-value, providing a statistical measure of the strength and direction of the relationship between age and questionnaire responses. False Discovery Rate (FDR) correction was applied for multiple comparisons. VC stands for Video Conference, VR for Virtual Reality, and FTF for Face To Face interactions; ZEFS stands for Zoom Exhaustion & Fatigue Scale, and PPS stands for Perceived Performance Scale.