Descriptive Data of the Interview Quality Indicators in the Combined Dataset
Because participants in both conditions (Feedback and No Feedback) received no feedback before the first interview, Table 3 presents the descriptive statistics excluding the first-round interviews. As shown, compared with the No Feedback condition, recommended question use and the numbers of relevant and neutral details elicited were higher in the Feedback condition, whereas non-recommended question use and the number of wrong details elicited were lower. As for the proportion of correct conclusions, the correct rate in the Feedback condition was almost double that in the No Feedback condition.
Correlations between Questions, Details, and Conclusion Correctness
The number of recommended questions was positively correlated with the number of relevant details elicited (r = .79, p < .001) and the number of neutral details elicited (r = .79, p < .001), and negatively correlated with the number of wrong details elicited (r = −.25, p < .001). The number of recommended questions was also positively correlated with conclusion correctness (r = .36, p < .001). The number of non-recommended questions was negatively correlated with the number of relevant details elicited (r = −.69, p < .001) and the number of neutral details elicited (r = −.20, p < .001), and positively correlated with the number of wrong details elicited (r = .57, p < .001). The number of non-recommended questions was negatively correlated with conclusion correctness (r = −.13, p < .001). This correlational structure provided robust evidence that the algorithms used in this series of studies functioned as expected (for the correlation matrix, see Table S2 in the supplementary materials).
The Effect of Simulation Training with Feedback on Interview Quality
The complete results of the linear mixed models for question use, details elicited, and conclusion correctness are available in the supplementary materials (Tables S3, S4, and S5, respectively). There were considerable individual differences and within-study variation in interview quality, as indicated by the random effects and intraclass correlations (ICCs) of the models. Simulation training with feedback had robust effects: it increased recommended question use, decreased non-recommended question use, improved detail elicitation from the avatar, and increased the likelihood of reaching a correct conclusion regarding the suspected abuse. For the models predicting question use, our main interest, the interaction term between Feedback condition and Round significantly predicted increased interview quality (recommended questions: B = 2.03, SE = 0.16, 95% CI [1.71, 2.34]; non-recommended questions: B = −2.37, SE = 0.17, 95% CI [−2.68, −2.05]; percentage of recommended questions: B = 5.34, SE = 0.32, 95% CI [4.74, 5.95]). Similar patterns emerged for the details elicited during interviews, with progressively improving interview quality in the Feedback condition (relevant details: B = 0.40, SE = 0.05, 95% CI [0.30, 0.50]; neutral details: B = 0.30, SE = 0.05, 95% CI [0.20, 0.40]; wrong details: B = −0.40, SE = 0.05, 95% CI [−0.50, −0.30]). In the generalized linear mixed model predicting conclusion correctness, the significant interaction between Feedback and Round showed an increasing correct rate in the Feedback condition as the training progressed (interaction term: odds ratio = 1.39, SE = 0.11, 95% CI [1.20, 1.62]). The trends of interview quality improvement in the Feedback condition can be seen in Figures 1, 2, and 3. Note that these plots depict the observed data rather than estimates from the mixed models.
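For reference, the fixed-effects structure shared by these models can be written schematically as follows. This is an illustrative specification only (the exact random-effects structures are reported in Tables S3–S5), assuming random intercepts for participants and studies:

\[
Y_{ijk} = \beta_0 + \beta_1\,\mathrm{Feedback}_{jk} + \beta_2\,\mathrm{Round}_{ijk} + \beta_3\,(\mathrm{Feedback}_{jk} \times \mathrm{Round}_{ijk}) + u_{jk} + w_k + \varepsilon_{ijk},
\]

where \(Y_{ijk}\) is a quality indicator for interview \(i\) of participant \(j\) in study \(k\), \(u_{jk}\) and \(w_k\) are participant- and study-level random intercepts, \(\varepsilon_{ijk}\) is the residual, and \(\beta_3\) is the Feedback × Round interaction coefficient reported above. Under this specification, the participant-level ICC is \(\sigma_u^2 / (\sigma_u^2 + \sigma_w^2 + \sigma_\varepsilon^2)\). For conclusion correctness, the same linear predictor enters a logistic link, so the reported odds ratio corresponds to \(\exp(\beta_3)\).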
As for the effects of the other experimental conditions on interview quality, neither Outcome Feedback nor Reflection appeared to have a robust influence on question use, details elicited, or conclusion correctness (see Tables S3, S4, and S5 in the supplementary materials). All eight 95% CIs for Outcome Feedback indicated no significant effects. Reflection had a positive effect only on neutral detail elicitation. Process Feedback had positive effects on recommended question use (B = 6.97, SE = 2.58, 95% CI [1.88, 11.92]), relevant detail elicitation (B = 1.61, SE = 0.65, 95% CI [0.35, 2.85]), the percentage of recommended questions (B = 15.56, SE = 5.26, 95% CI [5.05, 25.99]), and the percentage of relevant details (B = 26.98, SE = 7.40, 95% CI [12.73, 41.40]), but no significant effect was detected on conclusion correctness (odds ratio = 1.44, SE = 0.78, 95% CI [0.49, 4.19]). More importantly, Modeling had significant effects on all interview quality indicators except the number of wrong details elicited. Modeling increased recommended question use (B = 14.70, SE = 2.65, 95% CI [9.47, 20.04]) while decreasing non-recommended question use (B = −6.46, SE = 2.89, 95% CI [−12.22, −0.82]), leading to a higher percentage of recommended questions (B = 27.06, SE = 5.35, 95% CI [16.34, 37.45]). Modeling also significantly increased the numbers of relevant (B = 2.72, SE = 0.63, 95% CI [1.49, 3.99]) and neutral details (B = 2.86, SE = 0.61, 95% CI [1.64, 4.04]) without increasing the number of wrong details, resulting in a higher percentage of relevant details (B = 22.15, SE = 7.17, 95% CI [8.07, 36.38]). As for the conclusions, Modeling significantly increased conclusion correctness beyond the provision of feedback alone (odds ratio = 5.05, SE = 2.81, 95% CI [1.69, 15.05]).
We also conducted round-wise comparisons (i.e., comparing each round's performance with that of the previous round) on the interviews that received either combined feedback or process/outcome feedback to examine the trajectory of training. These data comprised 247 participants and 1,307 interviews in total, and the results are presented in Tables 4–6. Overall, round-wise increases in recommended questions and decreases in non-recommended questions were significant for several comparisons, suggesting continued improvement. The training effect did not appear to continue improving in terms of details elicited, especially for wrong details. The round-wise difference in conclusion correctness was significant only in the first comparison (Round 1 vs. Round 2). Notably, none of the comparisons between the seventh and eighth rounds were significant. Whether this indicates reaching a plateau or results from insufficient statistical power demands further investigation.
Individual Differences in Interview Training: Reliable Change
The RCI results showed that only a minority of participants in the Feedback group exhibited reliable change at the end of the training. For recommended questions, 41.7% (93/223) of participants had an RCI greater than 1.96, but only 18.8% (42/223) had an RCI smaller than −1.96 for non-recommended questions. Similar patterns emerged when the details elicited were used to examine reliable change: 26.0% (58/223) of participants achieved reliable change in relevant detail elicitation and 30.5% (68/223) in neutral detail elicitation. As for wrong detail elicitation, only 8.5% (19/223) of participants had an RCI smaller than −1.96.
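For context, and assuming the RCI follows the standard Jacobson–Truax formulation (the ±1.96 cutoffs suggest this), the index for each participant is

\[
\mathrm{RCI} = \frac{x_{\text{last}} - x_{\text{first}}}{S_{\text{diff}}}, \qquad
S_{\text{diff}} = \sqrt{2\,SE_M^{2}}, \qquad
SE_M = s_{\text{first}}\sqrt{1 - r_{xx}},
\]

where \(x_{\text{first}}\) and \(x_{\text{last}}\) are the participant's first- and last-round scores, \(s_{\text{first}}\) is the standard deviation of the first-round scores, and \(r_{xx}\) is the reliability of the measure. An \(|\mathrm{RCI}|\) exceeding 1.96 indicates change larger than would be expected from measurement error alone at the .05 level, which is the basis of the cutoffs used here.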
A closer examination of training design and RCI showed that the number of interviews had an impact on the percentage of participants achieving reliable change. As shown in Table 7, participants who completed a greater number of interviews were more likely to achieve reliable change. These results corresponded to the round-wise comparison analyses, showing continuous improvement at both the individual and group levels. Note that the higher percentage in the five-round design from Haginoya and colleagues (2021) could be a result of the small sample size and the added modeling feature rather than an indication of a non-linear improvement trend. Combined, these results suggest that there are substantial individual differences in training effects, but that with more practice it is possible to improve interview quality even among those who learn at a relatively slow pace.
Professional and Parenting Experience Moderate Improvements over Interviews
To examine whether professional experience and parenting experience moderated the improvement, additional mixed models with Professional Experience, Parenting Experience, and their interaction terms with Round were run for all quality indicators. In terms of professional experience (for detailed results, see Tables S6, S7, and S8 in the supplementary materials), having professional experience positively predicted relevant details (B = 0.82, SE = 0.35, 95% CI [0.12, 1.53]) and neutral details elicited (B = 0.72, SE = 0.34, 95% CI [0.05, 1.40]); that is, all else being equal, individuals with professional experience were better at eliciting information from the avatars in the first round of interviews. More importantly, the interaction term between professional experience and Round was a significant predictor of the number of recommended questions (B = 0.96, SE = 0.25, 95% CI [0.43, 1.45]), the number of non-recommended questions (B = −0.68, SE = 0.27, 95% CI [−1.21, −0.17]), the percentage of recommended questions (B = 2.47, SE = 0.52, 95% CI [1.40, 3.49]), relevant details elicited (B = 0.18, SE = 0.08, 95% CI [0.02, 0.33]), and neutral details elicited (B = 0.26, SE = 0.08, 95% CI [0.11, 0.42]). Professional experience also predicted a higher correct rate (odds ratio = 2.59, SE = 0.99, 95% CI [1.22, 5.49]), but the interaction with Round was not significant (odds ratio = 0.88, SE = 0.10, 95% CI [0.70, 1.09]). After controlling for study-level, condition-level, and individual-level variances, individuals with professional experience improved more over rounds of practice than those with no experience in interviewing children, as suggested by the significant interaction terms between professional experience and practice round.
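Schematically, this moderation analysis (and the parenting-experience analysis below) can be written in the following illustrative form, where Experience stands for professional or parenting experience and \(\mathbf{C}_{jk}\) collects the experimental-condition covariates:

\[
Y_{ijk} = \beta_0 + \beta_1\,\mathrm{Experience}_{jk} + \beta_2\,\mathrm{Round}_{ijk} + \beta_3\,(\mathrm{Experience}_{jk} \times \mathrm{Round}_{ijk}) + \boldsymbol{\gamma}^{\top}\mathbf{C}_{jk} + u_{jk} + w_k + \varepsilon_{ijk}.
\]

Assuming Round is coded with the first interview as the reference point, \(\beta_1\) captures the baseline (first-round) difference and \(\beta_3\) the differential improvement per round; the simple slope of Round for experienced participants is \(\beta_2 + \beta_3\).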
Individuals with parenting experience asked more non-recommended questions (B = 3.13, SE = 1.28, 95% CI [0.63, 5.68]) and obtained more wrong details at the beginning of the training (B = 0.97, SE = 0.30, 95% CI [0.38, 1.55]). Parenting experience also interacted with practice round to predict interview quality (for detailed results, see Tables S9, S10, and S11 in the supplementary materials). The 95% CI of the interaction term between Parenting Experience and Round did not include zero for the number of non-recommended questions (B = −0.73, SE = 0.29, 95% CI [−1.30, −0.15]), relevant details elicited (B = 0.20, SE = 0.09, 95% CI [0.02, 0.36]), neutral details elicited (B = 0.24, SE = 0.08, 95% CI [0.07, 0.40]), wrong details elicited (B = −0.30, SE = 0.08, 95% CI [−0.46, −0.14]), and the percentage of relevant details elicited (B = 3.16, SE = 1.09, 95% CI [1.02, 5.28]). All significant effects were in the expected direction. Parenting experience had no impact on conclusion correctness, regardless of training round. With all experimental conditions controlled for, individuals with parenting experience showed greater improvement in interview quality over rounds of practice than those without, asking fewer non-recommended questions and eliciting more relevant and neutral but fewer wrong details during interviews. Importantly, although professional and parenting experience interacted with interview round to positively predict interview quality, the effect sizes were smaller than those of the experimental manipulations, as suggested by the smaller fixed-effect estimates in the moderation models compared with the models including all experimental manipulations as fixed effects.