Descriptive Data of the Interview Quality Indicators in the Combined Dataset
Because participants in both conditions (Feedback and No Feedback) received no feedback before the first interview, Table 3 presents the descriptive statistics excluding the first-round interviews. As shown, compared with the No Feedback condition, recommended question use and the numbers of relevant and neutral details elicited were higher in the Feedback condition, whereas non-recommended question use and the number of wrong details elicited were lower. As for the proportion of correct conclusions, the correct rate in the Feedback condition was almost double that in the No Feedback condition.
Correlations between Questions, Details, and Conclusion Correctness
The number of recommended questions was positively correlated with the number of relevant details elicited (r = .79, p < .001) and the number of neutral details elicited (r = .79, p < .001), and negatively correlated with the number of wrong details elicited (r = −.25, p < .001). The number of recommended questions was also positively correlated with conclusion correctness (r = .36, p < .001). The number of non-recommended questions was negatively correlated with the number of relevant details elicited (r = −.69, p < .001) and the number of neutral details elicited (r = −.20, p < .001), and positively correlated with the number of wrong details elicited (r = .57, p < .001). The number of non-recommended questions was negatively correlated with conclusion correctness (r = −.13, p < .001). This correlational structure provided robust evidence that the algorithms used in this series of studies functioned as expected (for the correlation matrix, see Table S2 in the supplementary materials).
The Effect of Simulation Training with Feedback on Interview Quality
The complete results of the linear mixed models for question use, details elicited, and conclusion correctness are available in the supplementary materials (Tables S3, S4, and S5, respectively). There were considerable individual differences and within-study variation in interview quality, as indicated by the random effects and intraclass correlations (ICCs) of the models. Simulation training with feedback had robust effects: it increased recommended question use, decreased non-recommended question use, improved detail elicitation from the avatar, and increased the likelihood of reaching a correct conclusion regarding the suspected abuse. For the models predicting question use, our main interest, the interaction term between Feedback condition and Round significantly predicted increased interview quality (recommended questions: B = 2.03, SE = 0.16, 95% CI [1.71, 2.34]; non-recommended questions: B = −2.37, SE = 0.17, 95% CI [−2.68, −2.05]; percentage of recommended questions: B = 5.34, SE = 0.32, 95% CI [4.74, 5.95]). Similar patterns emerged for the details elicited during interviews, with progressively improving interview quality in the Feedback condition (relevant details: B = 0.40, SE = 0.05, 95% CI [0.30, 0.50]; neutral details: B = 0.30, SE = 0.05, 95% CI [0.20, 0.40]; wrong details: B = −0.40, SE = 0.05, 95% CI [−0.50, −0.30]). In the generalized linear mixed model predicting conclusion correctness, the significant interaction between Feedback and Round showed an increasing correct rate in the Feedback condition as the training progressed (interaction term: odds ratio = 1.39, SE = 0.11, 95% CI [1.20, 1.62]). The trends of interview quality improvement in the Feedback condition can be seen in Figures 1, 2, and 3. Note that these plots depict the observed data rather than estimates from the mixed models.
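For reference, the fixed-effects structure shared by these models can be written schematically as follows. This is an illustrative specification only (the exact random-effects structures are reported in Tables S3–S5), assuming random intercepts for participants and studies:

\[
Y_{ijk} = \beta_0 + \beta_1\,\mathrm{Feedback}_{jk} + \beta_2\,\mathrm{Round}_{ijk} + \beta_3\,(\mathrm{Feedback}_{jk} \times \mathrm{Round}_{ijk}) + u_{jk} + w_k + \varepsilon_{ijk},
\]

where \(Y_{ijk}\) is a quality indicator for interview \(i\) of participant \(j\) in study \(k\), \(u_{jk}\) and \(w_k\) are participant- and study-level random intercepts, \(\varepsilon_{ijk}\) is the residual, and \(\beta_3\) is the Feedback × Round interaction coefficient reported above. Under this specification, the participant-level ICC is \(\sigma_u^2 / (\sigma_u^2 + \sigma_w^2 + \sigma_\varepsilon^2)\). For conclusion correctness, the same linear predictor enters a logistic link, so the reported odds ratio corresponds to \(\exp(\beta_3)\).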
As for the effects of the other experimental conditions on interview quality, neither Outcome Feedback nor Reflection appeared to have a robust influence on question use, details elicited, or conclusion correctness (see Tables S3, S4, and S5 in the supplementary materials). All eight 95% CIs for Outcome Feedback indicated no significant effects. Reflection had a positive effect only on neutral detail elicitation. Process Feedback had positive effects on recommended question use (B = 6.97, SE = 2.58, 95% CI [1.88, 11.92]), relevant detail elicitation (B = 1.61, SE = 0.65, 95% CI [0.35, 2.85]), the percentage of recommended questions (B = 15.56, SE = 5.26, 95% CI [5.05, 25.99]), and the percentage of relevant details (B = 26.98, SE = 7.40, 95% CI [12.73, 41.40]), but no significant effect was detected on conclusion correctness (odds ratio = 1.44, SE = 0.78, 95% CI [0.49, 4.19]). More importantly, Modeling had significant effects on all interview quality indicators except the number of wrong details elicited. Modeling increased recommended question use (B = 14.70, SE = 2.65, 95% CI [9.47, 20.04]) while decreasing non-recommended question use (B = −6.46, SE = 2.89, 95% CI [−12.22, −0.82]), leading to a higher percentage of recommended questions (B = 27.06, SE = 5.35, 95% CI [16.34, 37.45]). Modeling also significantly increased the numbers of relevant (B = 2.72, SE = 0.63, 95% CI [1.49, 3.99]) and neutral details (B = 2.86, SE = 0.61, 95% CI [1.64, 4.04]) without increasing the number of wrong details, resulting in a higher percentage of relevant details (B = 22.15, SE = 7.17, 95% CI [8.07, 36.38]). As for the conclusions, Modeling significantly increased conclusion correctness beyond the provision of feedback alone (odds ratio = 5.05, SE = 2.81, 95% CI [1.69, 15.05]).
We also conducted round-wise comparisons (i.e., comparing each round's performance with that of the previous round) on the interviews that received either combined feedback or process/outcome feedback to examine the trajectory of training. These data comprised 247 participants and 1,307 interviews in total, and the results are presented in Tables 4–6. Overall, round-wise increases in recommended questions and decreases in non-recommended questions were significant for several comparisons, suggesting continued improvement. The training effect did not appear to continue improving in terms of details elicited, especially for wrong details. The round-wise difference in conclusion correctness was significant only in the first comparison (Round 1 vs. Round 2). Notably, none of the comparisons between the seventh and eighth rounds were significant. Whether this indicates reaching a plateau or results from insufficient statistical power demands further investigation.
Individual Differences in Interview Training: Reliable Change
The RCI results showed that only a minority of participants in the Feedback group exhibited reliable change at the end of the training. For recommended questions, 41.7% (93/223) of participants had an RCI greater than 1.96, but only 18.8% (42/223) had an RCI smaller than −1.96 for non-recommended questions. Similar patterns emerged when the details elicited were used to examine reliable change: 26.0% (58/223) of participants achieved reliable change in relevant detail elicitation and 30.5% (68/223) in neutral detail elicitation. As for wrong detail elicitation, only 8.5% (19/223) of participants had an RCI smaller than −1.96.
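For context, and assuming the RCI follows the standard Jacobson–Truax formulation (the ±1.96 cutoffs suggest this), the index for each participant is

\[
\mathrm{RCI} = \frac{x_{\text{last}} - x_{\text{first}}}{S_{\text{diff}}}, \qquad
S_{\text{diff}} = \sqrt{2\,SE_M^{2}}, \qquad
SE_M = s_{\text{first}}\sqrt{1 - r_{xx}},
\]

where \(x_{\text{first}}\) and \(x_{\text{last}}\) are the participant's first- and last-round scores, \(s_{\text{first}}\) is the standard deviation of the first-round scores, and \(r_{xx}\) is the reliability of the measure. An \(|\mathrm{RCI}|\) exceeding 1.96 indicates change larger than would be expected from measurement error alone at the .05 level, which is the basis of the cutoffs used here.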
A closer examination of training design and RCI showed that the number of interviews had an impact on the percentage of participants achieving reliable change. As shown in Table 7, participants who completed a greater number of interviews were more likely to achieve reliable change. These results corresponded to the round-wise comparison analyses, showing continuous improvement at both the individual and group levels. Note that the higher percentage in the five-round design from Haginoya and colleagues (2021) could be a result of the small sample size and the added modeling feature rather than an indication of a non-linear improvement trend. Combined, these results suggest that there are substantial individual differences in training effects, but that with more practice it is possible to improve interview quality even among those who learn at a relatively slow pace.
Professional and Parenting Experience Moderate Improvements over Interviews
To examine whether professional experience and parenting experience moderated the improvement, additional mixed models with Professional Experience, Parenting Experience, and their interaction terms with Round were run for all quality indicators. In terms of professional experience (for detailed results, see Tables S6, S7, and S8 in the supplementary materials), having professional experience positively predicted relevant details (B = 0.82, SE = 0.35, 95% CI [0.12, 1.53]) and neutral details elicited (B = 0.72, SE = 0.34, 95% CI [0.05, 1.40]); that is, all else being equal, individuals with professional experience were better at eliciting information from the avatars in the first round of interviews. More importantly, the interaction term between professional experience and Round was a significant predictor of the number of recommended questions (B = 0.96, SE = 0.25, 95% CI [0.43, 1.45]), the number of non-recommended questions (B = −0.68, SE = 0.27, 95% CI [−1.21, −0.17]), the percentage of recommended questions (B = 2.47, SE = 0.52, 95% CI [1.40, 3.49]), relevant details elicited (B = 0.18, SE = 0.08, 95% CI [0.02, 0.33]), and neutral details elicited (B = 0.26, SE = 0.08, 95% CI [0.11, 0.42]). Professional experience also predicted a higher correct rate (odds ratio = 2.59, SE = 0.99, 95% CI [1.22, 5.49]), but the interaction with Round was not significant (odds ratio = 0.88, SE = 0.10, 95% CI [0.70, 1.09]). After controlling for study-level, condition-level, and individual-level variances, individuals with professional experience improved more over rounds of practice than those with no experience in interviewing children, as suggested by the significant interaction terms between professional experience and practice round.
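Schematically, this moderation analysis (and the parenting-experience analysis below) can be written in the following illustrative form, where Experience stands for professional or parenting experience and \(\mathbf{C}_{jk}\) collects the experimental-condition covariates:

\[
Y_{ijk} = \beta_0 + \beta_1\,\mathrm{Experience}_{jk} + \beta_2\,\mathrm{Round}_{ijk} + \beta_3\,(\mathrm{Experience}_{jk} \times \mathrm{Round}_{ijk}) + \boldsymbol{\gamma}^{\top}\mathbf{C}_{jk} + u_{jk} + w_k + \varepsilon_{ijk}.
\]

Assuming Round is coded with the first interview as the reference point, \(\beta_1\) captures the baseline (first-round) difference and \(\beta_3\) the differential improvement per round; the simple slope of Round for experienced participants is \(\beta_2 + \beta_3\).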
Individuals with parenting experience asked more non-recommended questions (B = 3.13, SE = 1.28, 95% CI [0.63, 5.68]) and obtained more wrong details at the beginning of the training (B = 0.97, SE = 0.30, 95% CI [0.38, 1.55]). Parenting experience also interacted with practice round to predict interview quality (for detailed results, see Tables S9, S10, and S11 in the supplementary materials). The 95% CI of the interaction term between Parenting Experience and Round did not include zero for the number of non-recommended questions (B = −0.73, SE = 0.29, 95% CI [−1.30, −0.15]), relevant details elicited (B = 0.20, SE = 0.09, 95% CI [0.02, 0.36]), neutral details elicited (B = 0.24, SE = 0.08, 95% CI [0.07, 0.40]), wrong details elicited (B = −0.30, SE = 0.08, 95% CI [−0.46, −0.14]), and the percentage of relevant details elicited (B = 3.16, SE = 1.09, 95% CI [1.02, 5.28]). All significant effects were in the expected direction. Parenting experience had no impact on conclusion correctness, regardless of training round. With all experimental conditions controlled for, individuals with parenting experience showed greater improvement in interview quality over rounds of practice than those without, asking fewer non-recommended questions and eliciting more relevant and neutral but fewer wrong details during interviews. Importantly, although professional and parenting experience interacted with interview round to positively predict interview quality, the effect sizes were smaller than those of the experimental manipulations, as suggested by the smaller fixed-effect estimates in the moderation models compared with the models including all experimental manipulations as fixed effects.