The sampling procedure for this study involved selecting participants from different education levels in private universities in the Anand district, Gujarat, India, with the goal of representing individuals from various educational backgrounds. First, the population was divided into strata by education level, namely graduate, postgraduate, and doctorate. Within each stratum, participants were then selected at random using a random number generator, so that the sample reflected the diversity of education levels in the population. Selected individuals were approached and invited to take part in the research. They were informed of the research objectives and procedures, and their voluntary, informed consent was obtained. Data were collected with a questionnaire measuring user expectations and user comfort in AI-based systems. Responses were recorded on a Likert scale ranging from strongly disagree to strongly agree. The user expectation items covered accuracy, explanation, and overall satisfaction with the system, while the user comfort items covered system interface, system feedback, system responsiveness, and user engagement. The items were developed from an extensive literature review and consultation with experts in the field. The questionnaire was administered either online or on paper, depending on each participant's convenience and preference, and clear instructions at the beginning explained how to respond to and complete the survey.
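The stratified selection described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the sampling frame, the per-stratum quotas, the seed, and the helper name `stratified_random_sample` are hypothetical, because the study does not report how many participants were drawn from each stratum or whether allocation was proportional.

```python
import random


def stratified_random_sample(population, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum[s] units from each stratum s."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for stratum, members in strata.items():
        k = n_per_stratum.get(stratum, 0)
        if k > len(members):
            raise ValueError(f"stratum {stratum!r} has only {len(members)} members")
        sample.extend(rng.sample(members, k))  # without replacement within the stratum
    return sample


# Hypothetical sampling frame of (student_id, education_level) pairs
LEVELS = ["Graduate", "Postgraduate", "Doctorate"]
frame = [(i, LEVELS[i % 3]) for i in range(300)]
quota = {"Graduate": 5, "Postgraduate": 5, "Doctorate": 3}  # hypothetical quotas
sample = stratified_random_sample(frame, lambda unit: unit[1], quota, seed=42)
```

Sampling within each stratum, rather than from the pooled frame, guarantees that every education level appears in the sample in the intended number.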
The questionnaire measured user comfort through the following dimensions:
- Ease of Use: This dimension captures the level of ease and convenience experienced by the user while interacting with the AI-based language processing system.
- Perceived Comfort: This dimension assesses the user's subjective perception of comfort during their interactions with the system.
- Satisfaction: This dimension measures the extent to which the user feels satisfied with the overall user experience of the AI-based language processing system.
The questionnaire measured user expectations through the following dimensions:
- Performance Expectations: This dimension evaluates the user's expectations regarding the system's performance and accuracy in understanding and responding to their inputs.
- Personalization: This dimension assesses the user's expectations for personalized and tailored interactions with the AI-based system.
Together, these dimensions cover ease of use, subjective comfort, satisfaction, performance expectations, and personalization, providing a broad assessment of user comfort and user expectations in AI-based systems.
Factorial design
To examine the effects of selected factors on user comfort and user expectations in AI-based systems, the study employed a factorial design. A factorial design allows several independent variables and their interactions to be investigated simultaneously, showing how each factor, alone and in combination, affects the dependent variables.
The factorial design for the first objective involved three independent variables expected to affect user comfort: system interface, system feedback, and system responsiveness. System interface had two levels (text-based and voice-based), system feedback had two levels (no feedback and minimal feedback), and system responsiveness had three levels (slow, moderate, and quick). This yielded a 2 × 2 × 3 factorial design with 12 unique conditions. Participants were randomly allocated to one of the 12 conditions. The marginal totals were balanced (126 participants per interface type and per feedback type, and 84 per responsiveness level), although the sizes of individual cells varied from 11 to 46 (Table 1). Random assignment helped control for potential confounding variables and contributes to the internal validity of the study.
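The 12 conditions are the full crossing of the three factors. The sketch below enumerates them and shows one common way to randomize participants so that every condition receives the same number of people: permuted-block randomization, in which each block of 12 participants receives each condition exactly once. The study does not describe its allocation mechanism, so the helper `permuted_block_assignment`, the seed, and the block scheme are illustrative assumptions; Table 1 shows that the cell sizes actually obtained were unequal.

```python
import itertools
import random

# Factor levels of the Objective 1 design (from the text)
INTERFACE = ["Text-based", "Voice-based"]
FEEDBACK = ["No feedback", "Minimal feedback"]
RESPONSIVENESS = ["Slow response", "Moderate response", "Quick response"]

# Full crossing of the factors: 2 x 2 x 3 = 12 conditions
CONDITIONS = list(itertools.product(INTERFACE, FEEDBACK, RESPONSIVENESS))


def permuted_block_assignment(participant_ids, seed=None):
    """Hypothetical helper: assign participants in shuffled blocks of 12.

    Every complete block contains each condition exactly once, so cell sizes
    differ by at most one participant.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # randomize the order in which participants enter blocks
    assignment = {}
    block_size = len(CONDITIONS)
    for start in range(0, len(ids), block_size):
        block = CONDITIONS[:]  # one copy of every condition
        rng.shuffle(block)
        for pid, condition in zip(ids[start:start + block_size], block):
            assignment[pid] = condition
    return assignment


# Example: 252 hypothetical participant IDs, matching the Objective 1 sample size
assignment = permuted_block_assignment(range(252), seed=1)
```

With 252 participants, every condition receives exactly 21 people under this scheme.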
For the second objective, which examined the effects of prior experience and education level on user expectations in AI-based systems, the factorial design involved two independent variables: prior experience and education level. Prior experience had two levels (with and without prior experience), and education level had three levels (graduate, postgraduate, and doctorate), giving a 2 × 3 factorial design with 6 unique combinations. Unlike the system characteristics in the first design, these variables are attributes of the participants and cannot be manipulated or randomly assigned; participants were therefore placed in one of the 6 cells according to their own prior experience and education level. The resulting cell sizes, reported in Table 6, were unequal.
Findings
The findings are presented separately for each objective.
Objective 1: Assess User Comfort about AI-based Systems
To address the first objective, the study examined the main and interaction effects of system interface, system feedback, and system responsiveness on user comfort in AI-based systems. A three-way ANOVA was used to determine whether system interface, system feedback, system responsiveness, or their interactions significantly influence user comfort.
Table 1: Descriptive statistics (User comfort)

| System interface | System feedback | System responsiveness | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Text-based | No feedback | Slow response | 2.27 | 1.069 | 33 |
| | | Moderate response | 3.00 | 1.107 | 32 |
| | | Quick response | 2.57 | 1.089 | 14 |
| | | Total | 2.62 | 1.124 | 79 |
| | Minimal feedback | Slow response | 2.63 | .824 | 24 |
| | | Moderate response | 2.33 | .985 | 12 |
| | | Quick response | 3.00 | .894 | 11 |
| | | Total | 2.64 | .895 | 47 |
| | Total | Slow response | 2.42 | .981 | 57 |
| | | Moderate response | 2.82 | 1.105 | 44 |
| | | Quick response | 2.76 | 1.012 | 25 |
| | | Total | 2.63 | 1.041 | 126 |
| Voice-based | No feedback | Slow response | 2.75 | .775 | 16 |
| | | Moderate response | 3.61 | .979 | 18 |
| | | Quick response | 3.69 | .855 | 13 |
| | | Total | 3.34 | .962 | 47 |
| | Minimal feedback | Slow response | 2.55 | 1.214 | 11 |
| | | Moderate response | 3.55 | .858 | 22 |
| | | Quick response | 4.15 | .631 | 46 |
| | | Total | 3.76 | .964 | 79 |
| | Total | Slow response | 2.67 | .961 | 27 |
| | | Moderate response | 3.58 | .903 | 40 |
| | | Quick response | 4.05 | .705 | 59 |
| | | Total | 3.60 | .980 | 126 |
| Total | No feedback | Slow response | 2.43 | 1.000 | 49 |
| | | Moderate response | 3.22 | 1.093 | 50 |
| | | Quick response | 3.11 | 1.121 | 27 |
| | | Total | 2.89 | 1.119 | 126 |
| | Minimal feedback | Slow response | 2.60 | .946 | 35 |
| | | Moderate response | 3.12 | 1.066 | 34 |
| | | Quick response | 3.93 | .821 | 57 |
| | | Total | 3.34 | 1.082 | 126 |
| | Total | Slow response | 2.50 | .976 | 84 |
| | | Moderate response | 3.18 | 1.077 | 84 |
| | | Quick response | 3.67 | .998 | 84 |
| | | Total | 3.12 | 1.121 | 252 |

Table 2: Tests of Between-Subjects Effects (User comfort)

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. | Partial Eta Squared |
|---|---|---|---|---|---|---|
| Corrected Model | 109.233ᵃ | 11 | 9.930 | 11.545 | .000 | .346 |
| Intercept | 1856.395 | 1 | 1856.395 | 2158.292 | .000 | .900 |
| System interface | 28.771 | 1 | 28.771 | 33.449 | .000 | .122 |
| System feedback | .132 | 1 | .132 | .153 | .696 | .001 |
| System responsiveness | 23.271 | 2 | 11.636 | 13.528 | .000 | .101 |
| System interface * System feedback | .008 | 1 | .008 | .009 | .923 | .000 |
| System interface * System responsiveness | 8.151 | 2 | 4.075 | 4.738 | .010 | .038 |
| System feedback * System responsiveness | 5.575 | 2 | 2.787 | 3.241 | .041 | .026 |
| System interface * System feedback * System responsiveness | 3.040 | 2 | 1.520 | 1.767 | .173 | .015 |
| Error | 206.429 | 240 | .860 | | | |
| Total | 2761.000 | 252 | | | | |
| Corrected Total | 315.663 | 251 | | | | |

a. R Squared = .346 (Adjusted R Squared = .316)
As Table 1 shows, the analysis for Objective 1 was based on 252 participants, after redundant cases that would have distorted the results were removed. Table 1 also reports the mean scores used to interpret the significant interactions.
Inferences about the main and interaction effects were drawn from Table 2. Each F test uses the error mean square (0.860) with 240 degrees of freedom. The main effect of system interface on user comfort was statistically significant at α = 0.05, F(1, 240) = 33.449, p < 0.001. The partial eta squared of 0.122 indicates that system interface accounts for approximately 12.2% of the variance in user comfort once variance explained by the other effects in the model is excluded.
The main effect of system feedback on user comfort was not statistically significant at α = 0.05, F(1, 240) = 0.153, p = 0.696. Its partial eta squared of 0.001 indicates that system feedback accounts for only about 0.1% of the variance in user comfort.
The main effect of system responsiveness on user comfort was statistically significant at α = 0.05, F(2, 240) = 13.528, p < 0.001. Its partial eta squared of 0.101 indicates that system responsiveness accounts for approximately 10.1% of the variance in user comfort.
The interaction between system interface and system feedback was not statistically significant at α = 0.05, F(1, 240) = 0.009, p = 0.923. Its partial eta squared of 0.000 indicates that this interaction explains less than 0.1% of the variance in user comfort.
The interaction between system interface and system responsiveness was statistically significant at α = 0.05, F(2, 240) = 4.738, p = 0.010, with a partial eta squared of 0.038 (approximately 3.8% of the variance). The effect of responsiveness therefore depended on the interface: the means in Table 1 suggest that faster responses raised comfort more with the voice-based interface (from 2.67 for slow to 4.05 for quick responses) than with the text-based interface (from 2.42 to 2.76).
The interaction between system feedback and system responsiveness was statistically significant at α = 0.05, F(2, 240) = 3.241, p = 0.041, with a partial eta squared of 0.026 (approximately 2.6% of the variance). The means in Table 1 suggest that quick responses raised comfort mainly when minimal feedback was provided (3.93, compared with 3.11 without feedback).
The three-way interaction of system interface, system feedback, and system responsiveness was not statistically significant at α = 0.05, F(2, 240) = 1.767, p = 0.173. Its partial eta squared of 0.015 indicates that this interaction explains approximately 1.5% of the variance in user comfort.
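The F ratios and effect sizes in Table 2 follow directly from its sums of squares. The sketch below re-derives them as a check: F = MS_effect / MS_error, with the error term on 240 degrees of freedom, and partial eta squared = SS_effect / (SS_effect + SS_error). For contrast it also computes classical eta squared (SS_effect / corrected total SS), the share of all variance in user comfort; the two differ because partial eta squared leaves out variance explained by the other effects. The dictionary labels are shortened names chosen for this illustration.

```python
SS_ERROR, DF_ERROR = 206.429, 240   # Error row of Table 2
SS_CORRECTED_TOTAL = 315.663         # Corrected Total row of Table 2

# Effect rows of Table 2: name -> (Type III sum of squares, effect df)
EFFECTS = {
    "System interface": (28.771, 1),
    "System feedback": (0.132, 1),
    "System responsiveness": (23.271, 2),
    "Interface * Feedback": (0.008, 1),
    "Interface * Responsiveness": (8.151, 2),
    "Feedback * Responsiveness": (5.575, 2),
    "Interface * Feedback * Responsiveness": (3.040, 2),
}


def effect_statistics(ss_effect, df_effect):
    ms_error = SS_ERROR / DF_ERROR                        # about .860
    f_value = (ss_effect / df_effect) / ms_error
    partial_eta_sq = ss_effect / (ss_effect + SS_ERROR)  # effect vs. effect + error
    eta_sq = ss_effect / SS_CORRECTED_TOTAL               # effect vs. all variance
    return f_value, partial_eta_sq, eta_sq


results = {name: effect_statistics(ss, df) for name, (ss, df) in EFFECTS.items()}
for name, (f_value, p_eta, eta) in results.items():
    print(f"{name}: F({EFFECTS[name][1]}, {DF_ERROR}) = {f_value:.3f}, "
          f"partial eta^2 = {p_eta:.3f}, eta^2 = {eta:.3f}")
```

For system interface, for example, partial eta squared is about .122 while classical eta squared is about .091, so "12.2% of the variance" is only accurate in the partial sense.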
Table 3: Pairwise comparisons of system interface

| (I) System interface | (J) System interface | Mean Difference (I-J) | Std. Error | Sig.ᵇ | 95% CI for Differenceᵇ, Lower Bound | 95% CI for Differenceᵇ, Upper Bound |
|---|---|---|---|---|---|---|
| Text-based | Voice-based | -.749* | .130 | .000 | -1.004 | -.494 |
| Voice-based | Text-based | .749* | .130 | .000 | .494 | 1.004 |

Based on estimated marginal means.
*. The mean difference is significant at the .05 level.
Table 3 presents the pairwise comparison of the two system interfaces. Users reported significantly lower comfort with the text-based interface than with the voice-based interface (mean difference = -0.749, 95% CI [-1.004, -0.494], p < 0.001); equivalently, comfort with the voice-based interface was 0.749 scale points higher. User comfort therefore differs significantly with the type of system interface.
Table 4: Pairwise comparisons of system responsiveness

| (I) System responsiveness | (J) System responsiveness | Mean Difference (I-J) | Std. Error | Sig.ᵇ | 95% CI for Differenceᵇ, Lower Bound | 95% CI for Differenceᵇ, Upper Bound |
|---|---|---|---|---|---|---|
| Slow response | Moderate response | -.574* | .154 | .000 | -.877 | -.271 |
| | Quick response | -.806* | .162 | .000 | -1.124 | -.487 |
| Moderate response | Slow response | .574* | .154 | .000 | .271 | .877 |
| | Quick response | -.232 | .160 | .149 | -.547 | .084 |
| Quick response | Slow response | .806* | .162 | .000 | .487 | 1.124 |
| | Moderate response | .232 | .160 | .149 | -.084 | .547 |

Based on estimated marginal means.
*. The mean difference is significant at the .05 level.
Table 4 presents the pairwise comparisons of system responsiveness. Comfort in the slow response condition was significantly lower than in the moderate response condition (mean difference = -0.574, p < 0.001) and in the quick response condition (mean difference = -0.806, p < 0.001). The difference between the moderate and quick response conditions was not statistically significant (mean difference = -0.232, p = 0.149), so these comparisons provide no evidence that comfort differed between moderate and quick responses.
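The confidence intervals in Tables 3 and 4 can be reproduced as mean difference ± t × SE. The sketch below hard-codes an approximate two-sided 95% critical value t ≈ 1.970 for the 240 error degrees of freedom; this constant is an assumption here, and `scipy.stats.t.ppf(0.975, 240)` would give it exactly. Because this unadjusted critical value reproduces the reported bounds to within rounding, the intervals appear not to include a multiple-comparison adjustment. An interval that excludes zero corresponds to a difference that is significant at the 0.05 level.

```python
# Approximate two-sided 95% critical t value for df = 240 (error df in Table 2).
# Hard-coded as an assumption; scipy.stats.t.ppf(0.975, 240) gives the exact value.
T_CRIT = 1.970

# (comparison, mean difference, std. error) as reported in Tables 3 and 4
ROWS = [
    ("Text-based vs Voice-based", -0.749, 0.130),
    ("Slow vs Moderate", -0.574, 0.154),
    ("Slow vs Quick", -0.806, 0.162),
    ("Moderate vs Quick", -0.232, 0.160),
]


def t_interval(mean_diff, std_error, t_crit=T_CRIT):
    """Unadjusted 95% confidence interval: difference +/- t_crit * SE."""
    half_width = t_crit * std_error
    return mean_diff - half_width, mean_diff + half_width


intervals = {}
for label, diff, se in ROWS:
    lower, upper = t_interval(diff, se)
    intervals[label] = (lower, upper)
    verdict = "excludes 0" if (upper < 0 or lower > 0) else "includes 0"
    print(f"{label}: [{lower:.3f}, {upper:.3f}] ({verdict})")
```

Only the moderate versus quick interval contains zero, matching the single non-significant comparison in Table 4.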
Table 5: Post hoc test (User comfort)
Tukey HSD

| (I) System responsiveness | (J) System responsiveness | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|
| Slow response | Moderate response | -.68* | .143 | .000 | -1.02 | -.34 |
| | Quick response | -1.17* | .143 | .000 | -1.50 | -.83 |
| Moderate response | Slow response | .68* | .143 | .000 | .34 | 1.02 |
| | Quick response | -.49* | .143 | .002 | -.83 | -.15 |
| Quick response | Slow response | 1.17* | .143 | .000 | .83 | 1.50 |
| | Moderate response | .49* | .143 | .002 | .15 | .83 |

Based on observed means.
The error term is Mean Square (Error) = .860.
*. The mean difference is significant at the .05 level.
As Table 5 shows, Tukey's Honestly Significant Difference (HSD) post hoc test was conducted to examine pairwise differences between the slow, moderate, and quick response conditions. The comparisons revealed the following:
- The mean difference between the slow and moderate response conditions was -0.68 units, indicating that users rated the slow response condition significantly lower than the moderate response condition (p < 0.001).
- The mean difference between the slow and quick response conditions was -1.17 units, indicating that users rated the slow response condition significantly lower than the quick response condition (p < 0.001).
- The mean difference between the moderate and quick response conditions was -0.49 units, indicating that users rated the moderate response condition significantly lower than the quick response condition (p = 0.002).
All three differences were statistically significant at the 0.05 level.
The post hoc test was conducted to identify which specific pairs of response conditions differ significantly in user comfort, thereby locating the sources of the significant main effect found in the ANOVA. Tukey's HSD controls the familywise Type I error rate across the three comparisons, which by itself would make it more conservative than unadjusted comparisons. The fact that it nevertheless detected a difference between moderate and quick responses that Table 4 did not shows that the discrepancy arises mainly because the two analyses compare different means. Table 4 is based on estimated marginal means, which weight every cell equally, whereas Table 5 is based on observed means, which weight cells by their sample sizes. Because cell sizes were unequal (Table 1), the observed means give more weight to the large voice-based, minimal-feedback, quick-response cell (N = 46, M = 4.15), which widens the observed gap between moderate and quick responses.
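The difference between the means compared in Tables 4 and 5 can be checked from the cell statistics in Table 1. The sketch below recomputes, for each responsiveness level, the observed mean (weighted by cell size; the note under Table 5 states that the Tukey test is based on observed means) and the estimated marginal mean (the unweighted average of the four cell means, which is how SPSS forms estimated marginal means for a full factorial model without covariates). The level labels are shortened for this illustration, and small gaps from the published values reflect the two-decimal rounding of Table 1.

```python
# (cell mean, N) for each interface x feedback cell at every responsiveness level,
# from Table 1, ordered: text/no feedback, text/minimal, voice/no feedback, voice/minimal
CELLS = {
    "Slow": [(2.27, 33), (2.63, 24), (2.75, 16), (2.55, 11)],
    "Moderate": [(3.00, 32), (2.33, 12), (3.61, 18), (3.55, 22)],
    "Quick": [(2.57, 14), (3.00, 11), (3.69, 13), (4.15, 46)],
}


def observed_mean(cells):
    """Sample-size-weighted mean: larger cells count more (basis of Table 5)."""
    total_n = sum(n for _, n in cells)
    return sum(mean * n for mean, n in cells) / total_n


def estimated_marginal_mean(cells):
    """Unweighted mean of the cell means: every cell counts equally (basis of Table 4)."""
    return sum(mean for mean, _ in cells) / len(cells)


observed = {level: observed_mean(c) for level, c in CELLS.items()}
emm = {level: estimated_marginal_mean(c) for level, c in CELLS.items()}
for level in CELLS:
    print(f"{level}: observed = {observed[level]:.3f}, EMM = {emm[level]:.3f}")
print(f"Moderate - Quick: observed {observed['Moderate'] - observed['Quick']:.3f}, "
      f"EMM {emm['Moderate'] - emm['Quick']:.3f}")
```

The observed moderate-minus-quick difference comes out near -0.49 (Table 5), while the unweighted difference is near -0.23 (Table 4).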
Objective 2: Assess User Expectations about AI-based Systems
To address the second objective, the study examined the main and interaction effects of prior experience and education level on user expectations in AI-based systems. A two-way ANOVA was used to determine whether prior experience, education level, or their interaction significantly influences user expectations.
Table 6: Descriptive Statistics (User Expectations)

| Prior Experience | Education level | Mean | Std. Deviation | N |
|---|---|---|---|---|
| With prior experience | Graduate | 3.34 | .815 | 47 |
| | Postgraduate | 3.63 | .755 | 49 |
| | Doctorate | 3.57 | .788 | 23 |
| | Total | 3.50 | .791 | 119 |
| Without prior experience | Graduate | 2.83 | 1.264 | 52 |
| | Postgraduate | 2.70 | 1.282 | 50 |
| | Doctorate | 2.24 | 1.091 | 17 |
| | Total | 2.69 | 1.254 | 119 |
| Total | Graduate | 3.07 | 1.100 | 99 |
| | Postgraduate | 3.16 | 1.149 | 99 |
| | Doctorate | 3.00 | 1.132 | 40 |
| | Total | 3.10 | 1.123 | 238 |

Table 7: Tests of Between-Subjects Effects (User Expectations)

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. | Partial Eta Squared |
|---|---|---|---|---|---|---|
| Corrected Model | 46.183ᵃ | 5 | 9.237 | 8.484 | .000 | .155 |
| Intercept | 1827.932 | 1 | 1827.932 | 1678.899 | .000 | .879 |
| Prior Experience | 42.063 | 1 | 42.063 | 38.633 | .000 | .143 |
| Education level | 1.987 | 2 | .994 | .913 | .403 | .008 |
| Prior Experience * Education level | 5.174 | 2 | 2.587 | 2.376 | .095 | .020 |
| Error | 252.594 | 232 | 1.089 | | | |
| Total | 2581.000 | 238 | | | | |
| Corrected Total | 298.777 | 237 | | | | |

a. R Squared = .155 (Adjusted R Squared = .136)
As Table 6 shows, the analysis for Objective 2 was based on 238 participants. Redundant cases that would have distorted the results were removed, and additional cases were removed at random using SPSS to reduce the large differences in sample size across strata. The cell sizes nonetheless remain unequal (from 17 to 52), and both the removal of cases and this remaining imbalance are limitations of this study. Table 6 also reports the mean scores used to interpret any significant interactions.
Inferences about the main and interaction effects were drawn from Table 7; each F test uses the error mean square (1.089) with 232 degrees of freedom. The main effect of prior experience on user expectations was statistically significant, F(1, 232) = 38.633, p < 0.001. Its partial eta squared of 0.143, conventionally regarded as a large effect, indicates that prior experience accounts for approximately 14.3% of the variance in user expectations once variance explained by the other effects in the model is excluded. This suggests that prior experience has a substantial influence on user expectations in the context of the study.
The main effect of education level on user expectations was not statistically significant, F(2, 232) = 0.913, p = 0.403. Its partial eta squared of 0.008 indicates a small effect, accounting for 0.8% of the variance, which suggests that education level did not meaningfully influence user expectations in this sample.
The interaction between prior experience and education level was not statistically significant, F(2, 232) = 2.376, p = 0.095. Its partial eta squared of 0.020 (2% of the variance) indicates a small effect, suggesting that the effect of prior experience on user expectations did not differ significantly across education levels.
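When reporting an F test, the second degrees-of-freedom value is the error df from the ANOVA table (232 for Table 7), not the corrected-model df. The Intercept row is not an effect of interest: it tests whether the grand mean differs from zero, so its F and partial eta squared say nothing about prior experience. The sketch below formats the effect rows of Table 7 in APA style; the helper name `apa_f_report` and the rule of printing SPSS's ".000" as "p < .001" are conventions assumed for this illustration, not part of the original analysis.

```python
# Rows copied from Table 7: source -> (effect df, F, Sig., partial eta squared)
TABLE_7 = {
    "Intercept": (1, 1678.899, 0.000, 0.879),
    "Prior Experience": (1, 38.633, 0.000, 0.143),
    "Education level": (2, 0.913, 0.403, 0.008),
    "Prior Experience * Education level": (2, 2.376, 0.095, 0.020),
}
DF_ERROR = 232  # Error row of Table 7: the denominator df of every F test


def apa_f_report(df_effect, f_value, p_value, partial_eta_sq, df_error=DF_ERROR):
    # SPSS prints p-values below .0005 as .000; APA style reports these as p < .001
    p_text = "p < .001" if p_value < 0.001 else f"p = {p_value:.3f}".replace("0.", ".", 1)
    eta_text = f"{partial_eta_sq:.3f}".replace("0.", ".", 1)  # drop the leading zero
    return f"F({df_effect}, {df_error}) = {f_value:.2f}, {p_text}, partial eta^2 = {eta_text}"


# Skip the Intercept row: it is not a substantive effect
reports = {src: apa_f_report(*row) for src, row in TABLE_7.items() if src != "Intercept"}
for src, text in reports.items():
    print(f"{src}: {text}")
```

Building the string from the Error row's df, rather than typing it by hand, keeps the denominator df consistent across all reported tests.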
Table 8: Pairwise comparisons (User Expectations)

| (I) Prior Experience | (J) Prior Experience | Mean Difference (I-J) | Std. Error | Sig.ᵇ | 95% CI for Differenceᵇ, Lower Bound | 95% CI for Differenceᵇ, Upper Bound |
|---|---|---|---|---|---|---|
| With prior experience | Without prior experience | .925* | .149 | .000 | .632 | 1.219 |
| Without prior experience | With prior experience | -.925* | .149 | .000 | -1.219 | -.632 |

Based on estimated marginal means.
*. The mean difference is significant at the .05 level.
Table 8 presents the pairwise comparison of the prior experience groups. User expectations were significantly higher among participants with prior experience than among those without it (mean difference = 0.925, 95% CI [0.632, 1.219], p < 0.001).
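The standard error in Table 8 can be reproduced from the cell sizes in Table 6 and the error mean square in Table 7, assuming (as is standard for a full factorial ANOVA) that each estimated marginal mean is the unweighted average of its three cell means. Each cell mean has variance MS_error / n, so the average of k cell means has variance MS_error / k² × Σ(1/n); the two prior-experience margins share no cells, so their variances add. The small gap between the computed mean difference and the reported .925 reflects the rounding of the cell means in Table 6.

```python
import math

MS_ERROR = 1.089  # Mean Square (Error) from Table 7

# Cell means and sizes from Table 6, ordered Graduate, Postgraduate, Doctorate
MEANS_WITH, N_WITH = [3.34, 3.63, 3.57], [47, 49, 23]
MEANS_WITHOUT, N_WITHOUT = [2.83, 2.70, 2.24], [52, 50, 17]


def unweighted_marginal_mean(cell_means):
    """Estimated marginal mean: plain average of the cell means."""
    return sum(cell_means) / len(cell_means)


def se_of_difference(ns_a, ns_b, ms_error=MS_ERROR):
    """SE of the difference between two unweighted marginal means of k cells each.

    Assumes both margins have the same number of cells (k = 3 here).
    """
    k = len(ns_a)
    variance = ms_error / k ** 2 * (sum(1 / n for n in ns_a) + sum(1 / n for n in ns_b))
    return math.sqrt(variance)


diff = unweighted_marginal_mean(MEANS_WITH) - unweighted_marginal_mean(MEANS_WITHOUT)
se = se_of_difference(N_WITH, N_WITHOUT)
print(f"Mean difference = {diff:.3f}, SE = {se:.3f}")
```

The small Doctorate cells (23 and 17) contribute most to the standard error, which is why it exceeds the value a balanced design of the same total size would give.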
Post hoc tests are typically conducted when a categorical independent variable has three or more levels. Because prior experience had only two levels, the direct comparison of the two groups in Table 8 is the only pairwise comparison possible. Education level had three levels, but because its main effect was not significant, no post hoc comparisons of education levels were required.