Systematic searching and assessment of the retrieved papers resulted in the inclusion of five papers in which AR was compared with another form of anatomical learning, as shown in Figure 1. 11,12,19-21 The assessment of the risk of bias and the level of change of knowledge according to the model of Kirkpatrick are summarized in Table 1. See Table 2 for more information on the participants in the included studies. All papers were of moderate quality with minimal risk of bias.
Study Characteristics
The initial search yielded 430 results across different databases, of which 23 were duplicates and were removed. After evaluation of title and abstract, 43 records were chosen to be screened. Of these, 12 papers were eligible for the qualitative synthesis. After full-text evaluation, 12 papers were found to match our inclusion criteria, of which 7 proved to be irrelevant to our aim. The 5 remaining papers met the inclusion criteria. However, some of the required outcomes, such as student motivation, were not reported in all of the papers. The PRISMA flowchart shows the details, and the search strategy can be found in the supplementary files. The assessment of the risk of bias was done according to the model of Kirkpatrick and is summarized in Table 1. The studies were synthesized by identifying similar key themes and statements in these papers. Through independent review and subsequent consensus building, these similarities were then reclassified and conclusions were drawn from them following the PICO framework.
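The screening counts reported above can be tallied as a quick arithmetic check. This is only a sketch: the variable names are illustrative, and only the subtractions stated explicitly in the text are reproduced.

```python
# Counts taken directly from the text; names are illustrative only.
records_identified = 430
duplicates_removed = 23
full_text_papers = 12
excluded_as_irrelevant = 7

# Records left after removing duplicates.
records_after_dedup = records_identified - duplicates_removed

# Papers remaining after the full-text exclusions.
studies_included = full_text_papers - excluded_as_irrelevant

print(records_after_dedup, studies_included)
```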
Participant variation
The total number of participants was N = 569, of whom 306 were female. Participants originated from several countries, namely Australia, the United States, Germany and the Netherlands. Undergraduates studying anatomy were sought out. The five studies had similar age groups, with the clear outlier of one paper's third group 21. The mean ages ranged from 18.5 to 22.5 years. Three studies reported the ratio of included biomedical students to medical students 12,19,20, which can be seen in Table 2. The groups were similar in age, future academic aims and MRT scores. The effect of MRT scores was examined in three papers 11,19,20. MRT scores had a significant impact on the pre- and posttest scores. Bork et al. showed that participants with low MRT scores using AR had higher scores compared to control, which was in accordance with the findings of Bogomolova et al., 2020.
Intervention heterogeneity
The AR interventions differed in their approach to AR. Henssen et al., 2019 and Moro et al., 2017 used a practical tablet-based 3D model, while two studies opted for a virtual mirror with AR capabilities, called REFLECT 11,12. This mirror can virtually project musculature onto a subject. A headset-based AR application was used in one study 19. All these interventions conform to the definition of AR. However, the differences in the form of AR and their implications should be noted, such as the adverse events reported by Moro et al., 2017. These showed that AR users experienced more general discomfort during use compared to tablet users 21. Henssen et al., 2019 reported that students needed to get used to the device, causing some discomfort. Magic Mirror was reported to be tiring to use after long learning sessions by three participants in Bork et al., 2019, while no such feedback was given in Barmaki et al., 2019. Moreover, no adverse effects were reported by Bogomolova et al., 2020.
Controls
Three studies used traditional teaching methods, such as cross-sections and anatomical atlases, as controls 11,19,20. Two of these studies used a virtual dissection table and a non-AR 3D desktop model, respectively, while the latter had cross-sections as control. In the study of Barmaki et al., 2019, the virtual mirror without superimposed AR features functioned as control. Moro et al., 2017 compared AR to a VR headset and a conventional tablet-based 3D model.
The effects on learning
The primary outcome measure was the effectiveness on learning, measured as the difference between pre- and posttest scores. The tests consisted of multiple-choice questions in all of the studies, and some studies supplemented the tests with open-ended questions about the chosen anatomical structures. Little to no significant difference was found in the effectiveness on learning anatomy when looking at test scores. Nevertheless, Bork et al. reported that the AR group scored significantly higher than the virtual dissection table (Anatomage) group. However, no difference between the conventional atlas group and the AR group was found 11. In contrast, Barmaki and colleagues found that REFLECT users scored significantly higher than their virtual mirror controls 12. MRT scores proved to be of importance, as several studies found that students with lower MRT scores learned more with the 3D AR models than with conventional materials.
Secondary outcomes
In the study of Moro et al., 2017, adverse effects were reported for the VR study tool, which caused students to experience nausea, headaches and dizziness. No such symptoms or problems affected the use of their AR tool. Discomfort was also experienced by students using GreyMapp, as they reported trouble getting used to operating the application. When taking notes during the lesson at the same time, some students assumed uncomfortable positions in order to multitask. This problem was easily solved by creating a bigger tablet interface. In the REFLECT study, it was reported that time on task increased significantly. In addition, student engagement was significantly higher in the AR group, causing the longer time on task.
Henssen et al. did not find an increase in motivation when comparing the AR group to the conventional group. However, focus group interviews showed that students did find the concept novel and interesting. Additionally, some students expressed their disappointment at not being able to work with the program 20. Engagement was gauged differently in the study of Barmaki et al., 2019, which measured time on task, a measure that has been suggested as an important marker for knowledge retention and student engagement. The time on task was significantly higher in the AR group compared to controls (P=0.01). Finally, Bogomolova et al. found a significant difference in enjoyment during learning between 2D anatomical models and the AR intervention (P=0.003) 19. Table 2 summarizes the outcomes.
Meta-analysis
Meta-analysis showed substantial heterogeneity among the included papers (Tau²=21.301; Q=15.493; df=7; I²=54.82%; P=0.030), which complicated further analysis. Based on the mean differences in anatomic test scores (%) between the AR groups and the control groups, a difference of -0.765% was estimated (P=0.732). This indicated that there was no significant advantage or disadvantage when learning anatomy with AR (Table 2; Figure 2). A subanalysis was carried out on studies using 2D anatomy teaching methods as a comparison to AR-based learning 11,19,20. This subanalysis showed significantly lower mean anatomic test scores for the AR groups (P=0.024) in studies with low interstudy heterogeneity (Tau²=1.927; Q=2.224; df=2; I²=10.05%; P=0.329), as seen in Figure 3. To examine whether the outcomes of the different groups (AR vs. control groups) were affected by the spatial abilities of the participants, a meta-regression analysis was performed for the studies that 1) compared AR features with 2D anatomy teaching methods and 2) used an MRT to assess spatial ability 11,19,20. Meta-regression showed no significant relation between the mean difference in anatomic test results (%) and the mean difference in MRT scores (%) between the AR and control groups (Omnibus P=0.229), as shown in Figure 4.
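As a minimal sketch of how the reported heterogeneity statistics relate to each other, I² can be recomputed from Cochran's Q and its degrees of freedom using the standard definition I² = max(0, (Q - df) / Q) × 100%. The function name `i_squared` is illustrative and is not taken from the software used in the review. For the subanalysis (df = 2), the chi-square P value also has the simple closed form exp(-Q/2); this shortcut holds only for df = 2 and is shown purely as a consistency check.

```python
import math


def i_squared(q: float, df: int) -> float:
    """I^2 (%) from Cochran's Q and degrees of freedom (illustrative helper)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values as reported in the text.
overall_i2 = i_squared(15.493, 7)   # all comparisons
subgroup_i2 = i_squared(2.224, 2)   # AR vs. 2D teaching methods

# For df = 2 only, the chi-square upper tail probability is exp(-Q / 2).
subgroup_p = math.exp(-2.224 / 2)

print(f"overall I2  = {overall_i2:.2f}%")
print(f"subgroup I2 = {subgroup_i2:.2f}%")
print(f"subgroup P  = {subgroup_p:.3f}")
```

Recomputing from the rounded Q values reproduces the overall I² of 54.82% and the subgroup P of 0.329. The subgroup I² comes out marginally above the reported 10.05%, a gap that is consistent with rounding of the published Q.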