Behavior analytic based virtual reality interventions to teach adaptive and functional skills for individuals diagnosed with autism: A systematic review

Amarie Carnett (amarie.carnett@vuw.ac.nz), Victoria University of Wellington, https://orcid.org/0000-0003-0664-1346
Leslie Neely, University of Texas at San Antonio, https://orcid.org/0000-0002-2171-3304
Siobhan Gardiner, Victoria University of Wellington
Marie Kirpatrick, University of Texas at San Antonio, https://orcid.org/0000-0002-6253-0504
John Quarles, University of Texas at San Antonio, https://orcid.org/0000-0002-4790-167X
Kameron Christopher, National Institute of Water and Atmosphere


Introduction
Recent numbers from the Centers for Disease Control and Prevention (CDC) indicate that 1 in 44 children have autism spectrum disorder (ASD; Maenner et al., 2021). ASD is characterized by a range of strengths (e.g., attention to detail) and needs (e.g., social communication skills). The needs of those with ASD can best be supported through interventions, particularly those considered to be evidence-based practices. Evidence-based practices are interventions and teaching methods that are substantially supported through research to produce positive outcomes (Hume et al., 2021). One evidence-based practice aimed at improving socially valid behavioral concerns is applied behavior analysis (ABA). ABA is a branch of the science of behavior analysis which relies on [...] Four reviews exist on virtual reality (VR) for people with ASD. Three of these four reviews focus on social communication skills (Irish, 2013; Parsons & Mitchell, 2002; Vasquez et al., 2015) and one review focuses on the use of naturalistic interventions within the context of VR (Dechsling et al., 2021). A focus on these areas is not surprising given that the diagnostic criteria for ASD relate to deficits in communication; however, individuals with ASD often have difficulty acquiring other skills as well. Additionally, none of these reviews assessed the methodological rigor of the studies; therefore, it is unclear how reliably the outcomes of the studies included in those reviews can be interpreted. While there is a need to address the communication deficits of those with ASD, it is also imperative to teach adaptive and functional skills in order to promote independence.
There is significant potential for VR to give people with ASD meaningful opportunities to learn skills and generalize them to their everyday life. The current review aims to (1) evaluate the existing body of literature utilizing behavior analytic interventions delivered using VR to target adaptive and functional skills for individuals with ASD, and (2) assess the methodological rigor of the literature to inform future research and practice for the use of VR when targeting adaptive and functional skills.

Method

Search Procedures
The researchers completed a systematic search of the following databases: PsycINFO, Medline, Psychology and Behavioral Sciences Collection, and ERIC. These searches were conducted by combining a term to describe ASD (i.e., "Autis*," "Developmental disab*," "Asperger," "ASD") with a term to describe VR ("virtual reality") and a term to describe intervention ("intervention," "treatment"). The original search was conducted in September 2021 and yielded 191 articles after the removal of duplicates (see Figure 1 for a graphic summary of the results). Following the database searches, the third author did an initial screening of each article by its title and abstract and excluded articles that did not include the use of VR and ASD (n = 129). An additional search using Google Scholar was conducted to identify any additional articles.
After excluding duplicates, a total of 34 articles were added to the total article list for application of the inclusion criteria. In total, 67 articles were screened by their full text using the inclusion and exclusion criteria described below; 50 were excluded, leaving a total of 17 articles that met the inclusion criteria.
Next, an ancestral and forward search of the included articles was conducted. For the ancestral search, the references of the included articles were reviewed and extracted if their titles contained any of the key search terms (provided above). For the forward search, Google Scholar was used to search the record of each included article by selecting the "cited by" button. Relevant articles citing the included article were reviewed for possible inclusion. All relevant articles from the ancestral and forward searches were extracted into a Microsoft Excel spreadsheet.
These articles then underwent the full inclusion review. The researchers conducted this iterative process and retrieved a total of 7 additional articles from the ancestral and forward search. The final number of articles included totaled 24 studies.
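The full-text screening counts reported in this section can be checked with a short tally. The following is a minimal sketch in Python; the variable names are illustrative assumptions and not part of the review protocol, and only the counts stated in the text are used.

```python
# Counts taken from the search procedure text; variable names are illustrative.
full_text_screened = 67       # articles screened by full text
excluded_full_text = 50       # articles excluded at full-text review
ancestral_forward_added = 7   # articles added from the ancestral and forward searches

# Articles meeting inclusion criteria before the ancestral and forward searches.
included_after_full_text = full_text_screened - excluded_full_text

# Final number of included articles.
final_included = included_after_full_text + ancestral_forward_added

print(included_after_full_text)  # 17
print(final_included)            # 24
```

Both values match the totals reported in the text (17 articles after full-text screening and 24 in the final set).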

Inclusion and Exclusion Criteria
To be included in this review, articles had to meet the following criteria: (a) be peer-reviewed and published in English, (b) include at least one participant with ASD, (c) implement a therapeutic intervention designed to establish or increase appropriate behavior or decrease interfering behaviors, (d) utilize an experimental design to evaluate the effects of the intervention on the target behaviors, (e) use a form of VR to facilitate the therapeutic intervention, and (f) provide quantitative data pertaining to the participant's acquisition of adaptive and functional target behaviors (e.g., teaching air travel behavior, treatment of phobia, learning pedestrian safety skills). Studies that did not collect data on an adaptive or functional target behavior (e.g., studies targeting only social skills or communication skills) or did not include a therapeutic intervention as the independent variable were excluded. For example, Fitzgerald et al. (2018) conducted a study that evaluated the use of VR and video modeling to teach paper folding tasks (e.g., making a paper boat). Although folding paper might have some functional and adaptive contexts, such as folding paper menus in the context of job skills, this paper was excluded because the task was presented in the context of what could be viewed as an arts and crafts activity, without a direct functional or adaptive context during the intervention. Additionally, the researchers included any study utilizing a quasi-experimental, group comparison experimental, or single-case experimental design. Studies evaluating the validity of VR interventions or participants' perspectives on using the technology were excluded. For example, McCleery et al. (2020) evaluated the usability and feasibility of an immersive VR program to teach police interaction skills for participants with autism but did not measure targeted skills gained from the intervention program, and thus the study was excluded from this review. Finally, any study that discussed the development of technologies or the architecture of the technologies but did not provide quantitative data on the effects of the intervention on the target dependent variables was excluded. For example, Trepagnier et al. (2005) discussed multiple computer-based and virtual environment technologies that are in development but did not utilize those technologies in an experiment. After application of the inclusion criteria, a total of 24 articles were included in this review.

Descriptive Synthesis
The raters coded each article by the following variables: (a) number of participants; (b) participant characteristics (age, gender, diagnosis); (c) dependent variable; (d) independent variable; (e) technology utilized (description of the VR technology); (f) experimental design; and (g) study outcomes. Raters coded the total number of participants, including participants both with and without ASD. Raters provided a narrative description of the dependent variables, independent variables, and technology used. Raters coded the experimental design as either group experimental, quasi-experimental, or single-case experimental design. Finally, raters coded the study outcomes according to how the author(s) reported the outcomes for the target dependent variable(s).

Quality Evaluation Method
Articles were grouped based on the experimental design (i.e., single-case research versus group experimental/quasi-experimental) to facilitate the quality evaluation. The two lead authors then evaluated each study according to the corresponding rubric from the evaluative method developed by Reichow et al. (2008) for single-case or group experimental research designs. Reichow's evaluative method was chosen over other quality evaluation methods (e.g., the Council for Exceptional Children standards) since it includes procedures to evaluate both single-case and group experimental research, evaluates internal and external validity, and was specifically developed for research specific to individuals with autism (Cook et al., 2015; Reichow, 2008). Additionally, Reichow's evaluative method has been well established in the literature to aid in the identification of practices that meet the standards to be classified as an evidence-based practice (EBP; Lynch et al., 2018; Reichow, 2011).

Interrater Reliability
Search and Inclusion Criteria. During the review for inclusion, two raters coded 100% of the articles (n = 74). To evaluate the reliability of the application of the inclusion and exclusion criteria, interrater reliability (IRR) was calculated as percent agreement by dividing the total number of agreements by the total number of agreements plus disagreements and then multiplying by 100. Agreement on inclusion was obtained on 89.19% of the studies (n = 66). Disagreements were reviewed and discussed by the raters until agreement was established, for a final agreement of 100%.
Data Extraction. Two raters independently coded 50% of the 24 included articles (n = 12). Each article was coded across three categories, for a total of 36 items for which reliability was evaluated (i.e., 12 articles with three categories each). Agreement was established on 33 of the items. IRR was calculated as percent agreement by dividing the number of items with agreement by the total number of items and then multiplying by 100. The initial IRR was 91.67%. Disagreements were reviewed and discussed by the raters, for a final IRR of 100%. The final table was then checked for accuracy by the remaining authors.
Quality Evaluation. Twelve of the 24 articles (50%) were independently reviewed by the two lead authors to establish IRR. The 12 articles included seven group experimental/quasi-experimental design studies and four single-case research studies. There were 12 indicators per article for a total of 24 items for which reliability was evaluated. Agreement was established on 21 of the 24 total items (88%). Disagreements were discussed by the authors until consensus was reached, for a final IRR of 100%.
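The percent agreement calculation described in this section is simple enough to sketch directly. The following Python example is a hedged illustration only: the function name `percent_agreement` and the dictionary layout are assumptions made for this sketch, and the agreement counts are the ones reported in the text.

```python
def percent_agreement(agreements: int, total_items: int) -> float:
    """Percent agreement: agreements / (agreements + disagreements) * 100."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if not 0 <= agreements <= total_items:
        raise ValueError("agreements must be between 0 and total_items")
    return 100.0 * agreements / total_items


# (agreements, total items) as reported for each reliability check.
reported = {
    "search and inclusion": (66, 74),
    "data extraction": (33, 36),
    "quality evaluation": (21, 24),
}

irr = {
    stage: round(percent_agreement(agreed, total), 2)
    for stage, (agreed, total) in reported.items()
}

for stage, value in irr.items():
    print(f"{stage}: {value}%")
```

Under these counts the sketch gives 89.19% for search and inclusion and 91.67% for data extraction, matching the reported values; the quality evaluation figure comes out at 87.5%, which the text reports rounded to 88%.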

Results
The 24 articles included in this review were summarized by dependent variable, intervention components, behavioral components, and technology used. Table 1 provides the data summary of each study.
Participants. Across the 24 included studies there were a total of 882 participants (excluding the staff participants included in Smith et al., 2021a) with an approximate mean age of 19.77 years (range = 4 to 29.4 years). The majority of the included participants had high to moderate functioning levels.
Behavior analytic components embedded within VR. A combination of behavior analytic components, such as antecedent interventions, prompting, reinforcement, or corrective feedback, was utilized by all the included studies. For nine of the studies (37.5%), the VR system primarily provided the learning stimuli, prompts, and consequence variables (e.g., reinforcement or feedback), although in some cases a researcher or therapist provided pre-training on the use of the VR system. For five of the studies (20.83%), a combination of the VR system and therapist implementation was used. For example, for most of the studies utilizing VR within the context of job interview training, the VR system was primarily used for practice interviews and additional instruction was provided by a therapist on related interview skills (e.g., Smith et al., 2021a; Strickland et al., 2013).
Lastly, for eight studies (33.33%), the VR system was utilized primarily to present the learning stimuli needed for teaching the targeted skill, with a therapist delivering instruction, prompting, and reinforcement. For example, Dixon et al. (2019) used the VR system within the context of pedestrian safety (visual and auditory stimuli) with a therapist delivering questions related to the safety of the situation (e.g., "Is there a moving car?") and providing reinforcement for the participants' responses.
VR Technology. All papers used software to create the virtual environments, but some used additional hardware displays and interfaces to increase the level of immersion. Non-immersive VR was the most commonly utilized configuration, used by 41.67% of the included studies (n = 10). This type of VR configuration is the least immersive and generally relied on a standard desktop sized computer monitor (i.e., size range) with basic inputs from the user (e.g., desktop keyboard or controller; Bamodu & Ye, 2013). The second most utilized VR platform was semi-immersive VR, which was used by 33.33% of the studies (n = 8). This configuration relied on external equipment, such as sensors for interaction (e.g., Xbox Kinect, Leap Motion) and projectors or large screens to display the VR simulation (e.g., Blue Room advanced VRE), to create a sense of deeper immersion and interactivity within a VR simulation (Bamodu & Ye, 2013). Lastly, fully immersive VR was used by 25% (n = 6) of the included studies. This setup combined advanced VR technology (e.g., motion tracking, head-mounted display, Oculus Touch controllers) with software (e.g., Unity game engine) to create more advanced 3D VR environments (Bamodu & Ye, 2013).
Quality Ratings and evaluation of evidence. There were 18 group experimental design studies and six single-case experimental design studies. Overall, the raters identified three (12.5%) of the studies as meeting criteria to be classified as "strong" and nine (37.5%) of the studies as meeting criteria to be classified as "adequate". The remaining studies (12; 50%) did not meet criteria and cannot offer evidence towards the research question. Of the 18 group experimental design studies, the raters classified two (8.33% of the 24 included studies) as "strong", six (25%) as "adequate", and ten (41.67%) as "weak". Of the six single-case experimental design studies, the raters classified one (4.17%) as "strong", three (12.5%) as "adequate", and two (8.33%) as "weak".
Taken as a whole, the three studies identified as "strong" quality studies (Cox et al., 2017; Hu & Han, 2019; Wade et al., 2013) were conducted by three different research teams, at three different locations, with 74 different participants, and together meet the qualifications to be considered a promising practice.
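The quality rating counts can be expressed as shares of the 24 included studies with a short tally. This is a minimal sketch in Python under the assumption that every percentage in this section uses all 24 studies as the denominator; the names `ratings`, `share`, and `overall` are illustrative and not part of the evaluative method.

```python
TOTAL_STUDIES = 24  # all included studies; assumed denominator for every share

# Rating counts by design type, as reported in the text.
ratings = {
    "group": {"strong": 2, "adequate": 6, "weak": 10},
    "single-case": {"strong": 1, "adequate": 3, "weak": 2},
}


def share(count: int, total: int = TOTAL_STUDIES) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Percentage of all included studies for each design and rating.
shares = {
    design: {rating: share(count) for rating, count in counts.items()}
    for design, counts in ratings.items()
}

# Overall counts per rating across both design types.
overall = {
    rating: sum(ratings[design][rating] for design in ratings)
    for rating in ("strong", "adequate", "weak")
}

print(shares)
print(overall)
```

With these counts the overall tallies are three "strong", nine "adequate", and twelve "weak" studies, which sum to the 24 included studies.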

Discussion
The primary aim of this review was to synthesize the literature on the use of VR interventions for adaptive and functional behavior for individuals with autism. The secondary aim of this review was to evaluate the quality of the studies to help guide future research and practice applications. A total of 24 studies met the criteria of inclusion for this review. Of those studies, ten targeted vocational related skills, seven targeted functional behaviors (e.g., problem behavior treatment, hypersensitivity, phobias), four targeted safety skills (e.g., driving, airplane travel, pedestrian safety), two targeted general functional skills, and one targeted exercise engagement. In terms of quality ratings, only three of the studies met the quality criteria for a classification of "strong". This indicates a need for replication using both single-case and group experimental designs, as well as an increase in methodological rigor. Further, since these studies did not incorporate fully immersive VR, this also highlights an area of need for quality research design and replication.
This review highlights several benefits of VR-based interventions for individuals with autism. First, VR offers the ability to create environments conducive to safe practice for the development of safety skills, such as driving safety and pedestrian safety skills. In particular, VR environments can reduce the risks associated with acquiring skills that might not be feasible to practice in real-world environments. For example, when practicing safely walking across the street in a VR environment, there is no real risk if the user does not wait for the crosswalk signal, whereas in the real environment an individual could be hit by a car.
Another potential benefit of VR-based interventions is the ability to customize the user's intervention based on their progress in skill acquisition, such as embedding prompts to help highlight the salient cues in the environment that should evoke a specific behavioral response from the user. For example, Cox et al. (2017) included extra stimulus cues within the VR driving simulation based on user eye gaze to highlight driving hazards that should evoke driver attention and defensive driving maneuvers. This type of component can help ensure that the VR interaction is individualized to the user, thus providing a more tailored intervention and user experience.
VR interventions can also allow for extra practice and a variety of exemplars to better promote generalization of skills (i.e., multiple exemplar training). Further, VR can readily support generalization to the natural environment since it allows for programming of the relevant stimuli that would occur within the natural environment (Stokes & Baer, 1977). For example, Miller et al. (2020) included programming for generalization within the sessions of the study. Specifically, this study conducted the last session at the airport to provide a real-world rehearsal of the air travel skills targeted during the VR-based intervention. This study highlights the utility and efficacy of VR-based interventions as well as the need to evaluate the transfer of skills to the "real" environment. However, given the lack of assessment of generalization to real environments in the studies included in this review, more analysis is needed to evaluate the generalization of VR-trained skills.
Lastly, some of the studies included in this review indicated the effectiveness of using lower cost VR systems, which may increase the feasibility of VR-based interventions within clinical applications. For example, Miller et al. (2020) used an iPhone X with a Google Cardboard device to create a virtual air travel experience. Several studies used a commercially available internet software program (i.e., Molly Porter by SIMmersion Immersive Simulations) to provide mock interviews for developing interview skills (i.e., Genova et al., 2021; Humm et al., 2014; Smith et al., 2014; Smith et al., 2021a, 2021b; Ward & Esposito, 2018). Although low-tech solutions may be readily available, research is still needed to help evaluate the costs and benefits of the various VR technologies as they relate to the skills being taught, the needs of the individual, and the programming of relevant environmental variables to best promote generalization of skills to real-world environments.
While the research evaluated in this review indicates that VR is a platform conducive to the integration of behavior analytic strategies to develop effective interventions, there are a few considerations worthy of discussion. First, only three studies met the criteria for a classification of "strong" quality standards. This indicates a need for further replication of VR-based interventions that focus on teaching functional and adaptive skills.
Second, there is a need for decision-making frameworks to help inform practitioners and service providers which equipment options allow for individualization or what technology options best align with the various characteristics and needs of the individuals we serve. For example, Simões et al. (2018) provided differentiation across the technology used. Specifically, four of the participants in the study did not use the VR head-mounted display due to vision impairments; however, the desktop configuration was still conducive for those users to participate in the VR intervention. This highlights the need for a clear decision-making framework for the technology selected in VR-based interventions.
Third, there is a need for cross-field collaboration to ensure that VR interventions have the programming capacity for individualization, systematic teaching procedures, and reinforcement contingencies that are transferable to real environments. In many of the studies included in this review, therapists or researchers were still providing prompts and reinforcement rather than these elements being seamlessly incorporated into the VR system. This may indicate that there was a lack of collaboration between technology developers and behavior analysts. As such, future research should consider the benefits of cross-field collaboration to improve the quality and efficacy of VR-based interventions.
Finally, there is a need to evaluate other skills that fall within the domain of adaptive and functional behaviors, where VR could provide a better context for developing effective interventions. Given the few areas of safety skills addressed in the current literature, this seems like an obvious area that could benefit individuals who are working to develop these functional skills. For example, abduction prevention could be an area where VR-based interventions might provide more effective training than role playing or social stories-based interventions, since the virtual environment could include relevant signals with multiple exemplars and provide practice opportunities (e.g., Ledbetter-Cho et al., 2016).

Implications for Practice and Conclusion
For practitioners, it is important to highlight the use of EBPs when developing interventions for individuals with autism. Given the range of technology options for VR-based intervention, practitioners should consider the prerequisite skills for both the use of the technology and the skill that is targeted within the intervention. Thus, assessment should be utilized to help guide the intervention plans. For example, if using VR goggles, it would be important to do some direct assessment to ensure the user has the necessary skills and that the VR experience is enjoyable and does not cause issues, such as motion sickness. Practitioners would also want to be sure that generalization of the skill is accounted for within the intervention and that the skill transfers easily to the real world. This may also require incorporating other stakeholders within the intervention phases to ensure the technology used is feasible for everyone involved. As VR technology continues to advance, research is needed to help provide a clear framework for collaboration and decision making to help progress and extend VR-based interventions.
Table 1. Summary of the included studies (surviving entries only)
The match-to-sample software program was developed for the purpose of this study and not commercially available.
[...] provided feedback based on the accuracy of the response (e.g., corrective or praise).
[...] respond correctly with the visual cue, the system would display a crying face with the auditory statement of 'Try again'.
Humm et al. (2014): (1) Role play skills scores, (2) Self-confidence measures.
All software programs were programmed using Flash 8 with the programming language ActionScript 2.0. The programs were run using Macromedia Flash Player.
Ward & Esposito: (1) Self-efficacy measured via the General Self Efficacy Scale, (2) Self-confidence measured via the Interview Self Confidence Survey.