Study selection
The kappa statistic for inter-reviewer agreement on the searching and selection of studies was 0.86. Following inspection of the titles and abstracts, a total of 1257 articles were initially obtained for assessment: 1219 from the five electronic databases (PubMed, Wiley, Cochrane Central Register of Controlled Trials, ScienceDirect, and SCOPUS), 31 from the manual hand search of the journals (Journal of Dental Education, European Journal of Dental Education, International Journal of Technology Assessment in Healthcare, and Medical Teacher), and 7 from the reference lists. After removal of duplicates, 1107 articles remained; of these, a further 1013 were removed because they were not directly relevant to the research question of this systematic review. This left 94 articles for potential inclusion. After the full texts of these 94 articles were read, 75 were excluded for the following reasons: 28 were not randomized clinical trials, 19 were reviews, 15 did not exclusively include dental students, 10 did not include a conventional group for comparison in the randomized clinical trial, 2 were unavailable in English, and 1 was an incomplete registered clinical trial. Thus, a total of 19 studies were included in the final analysis of this review (Figure 1).
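The study flow described above can be checked with simple arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative and not part of the review, and the number of duplicates removed is derived from the reported totals rather than stated in the text.

```python
# Counts as reported in the study selection paragraph.
identified = {
    "electronic databases": 1219,
    "manual hand search": 31,
    "reference lists": 7,
}
after_duplicates = 1107        # articles remaining after duplicate removal
excluded_on_relevance = 1013   # not relevant to the research question
full_text_exclusions = {
    "not randomized clinical trials": 28,
    "reviews": 19,
    "not exclusively dental students": 15,
    "no conventional comparison group": 10,
    "unavailable in English": 2,
    "incomplete registered clinical trial": 1,
}


def study_flow():
    """Recompute each stage of the selection flow from the reported counts."""
    total_identified = sum(identified.values())
    # Derived value (assumption: not stated explicitly in the text).
    duplicates_removed = total_identified - after_duplicates
    full_text_assessed = after_duplicates - excluded_on_relevance
    full_text_excluded = sum(full_text_exclusions.values())
    included = full_text_assessed - full_text_excluded
    return {
        "total_identified": total_identified,
        "duplicates_removed": duplicates_removed,
        "full_text_assessed": full_text_assessed,
        "full_text_excluded": full_text_excluded,
        "included": included,
    }


flow = study_flow()
for stage, count in flow.items():
    print(f"{stage}: {count}")
```

Under these figures, the exclusion reasons sum to the 75 full-text exclusions and the flow ends at the 19 included studies.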
Study characteristics
Nine studies (47.4%) took place in Europe [18-26], five studies (26.3%) took place in North America [7, 27-30], three studies (15.8%) were conducted in Asia [31-33], one study (5.3%) was conducted in Oceania [34], and one study (5.3%) did not mention the country setting [35]. One study (5.3%) had a sample size of 9 [7], three studies (15.8%) had a sample size between 11 and 30 [22, 28, 34], 11 studies (57.9%) had a sample size between 31 and 50 [18-21, 23-26, 31, 33, 35], two studies (10.5%) had a sample size between 51 and 70 [29, 32], and two studies (10.5%) had a sample size between 71 and 90 students [27, 30]. Seven studies (36.8%) assessed second-year undergraduate dental students [7, 21, 22, 27, 29, 30, 33], four (21.1%) assessed first-year students [23-25, 28], four (21.1%) assessed fourth-year students [20, 32, 34, 35], two (10.5%) assessed third-year students [18, 26], one (5.3%) assessed fifth-year students [31], and one (5.3%) assessed a mixture of fourth- and fifth-year students [19]. Most of these studies (57.9%) assessed the students’ ability in operative dentistry [18, 20-26, 29, 30, 33]. Five (26.3%) were about prosthodontics [7, 27, 31, 32, 34], one (5.3%) was a mixture of both operative dentistry and prosthodontics [28], one (5.3%) was about endodontics [35], and one (5.3%) was about oral surgery [19]. In total, four main types of technology were used for assessment across the 19 studies. Ten studies (52.6%) used virtual reality [21-25, 28-31, 33], six studies (31.6%) used a digital scanner [7, 20, 26, 27, 32, 34], two (10.5%) used augmented reality [18, 19], and one (5.3%) used haptic technology [35].
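The percentages in this section are simple proportions of the 19 included studies. As a minimal sketch (the function name and dictionary layout are illustrative assumptions, not part of the review), they can be recomputed from the study counts and rounded to one decimal place; with this rounding, 1 of 19 gives 5.3%, 4 of 19 gives 21.1%, and 6 of 19 gives 31.6%.

```python
TOTAL_STUDIES = 19


def percentages(counts, total=TOTAL_STUDIES):
    """Return each count as a percentage of the total, rounded to one decimal."""
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the total")
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Study counts taken from the study characteristics paragraph.
region = {"Europe": 9, "North America": 5, "Asia": 3, "Oceania": 1, "not reported": 1}
year_of_study = {"first": 4, "second": 7, "third": 2, "fourth": 4, "fifth": 1, "fourth and fifth": 1}
technology = {"virtual reality": 10, "digital scanner": 6, "augmented reality": 2, "haptic technology": 1}

for name, counts in (("region", region), ("year", year_of_study), ("technology", technology)):
    print(name, percentages(counts))
```

The sum check inside `percentages` also confirms that each breakdown accounts for all 19 studies.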
Risk of bias within studies
One study had a low risk of bias [35], five had an unclear risk of bias [24, 25, 32-34], and 13 had a high risk of bias [7, 18-23, 26-31] (Figure 2). A summary of the percentage of risk of bias grades allocated in each domain can be seen in Figure 3.
Of the 19 studies, 17 had allocation concealment bias [18-34], and 15 of these also had another form of selection bias, namely in random sequence generation [18-20, 23-34]. Six studies had performance bias relating to the blinding of either the personnel or the participants involved [7, 20, 27-30]. Six studies raised concerns regarding incomplete outcome data [7, 23, 28-31]. Five studies had other forms of bias, such as a lack of blinding and calibration of the examiners grading the preparations, a small sample size, and a lack of monitoring of additional hours of practice by the students outside the allocated training time [7, 19, 27, 29, 30]. Two studies had a form of detection bias [27, 30], and two studies had reporting bias [21, 22]. Only one study had no form of bias [35].
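The tallies above can be cross-checked against the bracketed citation ranges. The sketch below is illustrative only: the helper `expand` and the variable names are assumptions introduced here, and the citation strings are copied from the text.

```python
def expand(citations):
    """Expand a citation string such as "7, 18-23" into a set of reference numbers."""
    refs = set()
    for part in citations.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            refs.update(range(int(low), int(high) + 1))
        else:
            refs.add(int(part))
    return refs


# Overall risk of bias grades, as cited in the text.
overall = {
    "low": expand("35"),
    "unclear": expand("24, 25, 32-34"),
    "high": expand("7, 18-23, 26-31"),
}
overall_counts = {grade: len(refs) for grade, refs in overall.items()}
all_studies = set().union(*overall.values())

# Selection bias domains, as cited in the text.
allocation_concealment = expand("18-34")
sequence_generation = expand("18-20, 23-34")

print(overall_counts)
print(len(all_studies), "studies graded in total")
print(len(allocation_concealment), "with allocation concealment bias")
print(len(sequence_generation), "of these also with sequence generation bias")
```

Because the three overall grades are disjoint and together cover references 7 and 18 to 35, every included study receives exactly one overall grade.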
Description of study findings
Five studies gave their intervention group a questionnaire on the students’ experience of using a technology-enhanced assessment system [18, 19, 21, 22, 32]. In two of these studies, most participants believed they could improve their self-learning, self-assessment, and/or assessment abilities using the technology-enhanced method rather than the conventional method [18, 32]. Two of these studies revealed that most students in the intervention group did not feel that the conventional method would be replaced [21, 22]. In one study, the intervention group reported being more confident in their ability to administer an inferior alveolar dental block than their peers in the control group [19].
Operative Dentistry
Eleven studies compared technology-enhanced assessment to conventional assessment in operative dentistry. Two studies, by Nagy et al. [20] and Wolgin et al. [26], used a digital scanner to assess cavity preparations in comparison to a control group. Nagy et al. [20] reported that the intervention group had significantly smaller deviations of the mean occlusal width, approximate depth, and shoulder width in their second preparations. In comparison, the control group did not show any significant difference in mean measurements between the first and second preparations. Wolgin et al. [26] reported that using the digital scanner was just as effective as the conventional form of supervision, as there was no significant difference between the intervention and comparison groups with regard to the cavity dimensions.
Five studies specifically used the DentSim Virtual Reality System [23-25, 29, 30]. All five reported an overall increase in performance for the groups involved; however, the technical scores of the intervention and control groups varied between studies. Wierinck et al. [23] reported that the intervention group that used the DentSim without feedback significantly outperformed the control group in performance scores in the retention tests. However, no significant differences were found between the groups in the transfer tests. Urbankova [30] found that the intervention group performed significantly better than the control group in the first two examinations, but not in the last examination. LeBlanc et al. [29] reported no significant differences in overall performance scores between the groups, but the intervention group improved significantly more than the control group. The last two studies (Wierinck et al. [24]; Wierinck et al. [25]) reported that the intervention groups using DentSim performed significantly better than the control group in both the immediate and delayed retention tests. The intervention groups, however, had a significantly longer preparation time than the control group. In the delayed retention and delayed transfer tests of Wierinck et al. [25], only one intervention group differed significantly from the control group, whereas in Wierinck et al. [24] both intervention groups performed significantly better than the control group in these tests.
Quinn and co-workers used an unspecified virtual reality machine in two studies: Quinn et al. [21] involved two intervention groups and one control group, while Quinn et al. [22] had one intervention group and one control group. Both studies reported that there were generally no statistically significant differences between the intervention and control groups. In Quinn et al. [21], the intervention group with real-time and conventional feedback scored significantly higher than the control group in one criterion, the outline form. The remaining scores did not differ significantly between the groups. Quinn et al. [22] reported mixed results for the intervention and conventional training groups: some criteria showed no significant differences between the two groups, while the remaining criteria showed a significant difference, with the virtual reality group having worse qualitative scores.
The study by Llena et al. [18] used augmented reality software and a mobile application. It reported a significantly higher average score in the intervention group for class I cavity preparations, but no significant differences between the two groups in the class II occlusal box cavity preparation exercise. Another study, by Murbay et al. [33], used the Moog Simodont and reported a significant improvement as a result of exposure to the dental trainer.
Prosthodontics
In prosthodontics, two studies reported the use of the E4D Compare software [7, 27]. E4D Compare scanning software is used as a virtual assessment tool for matching and comparing a standard ideal tooth preparation with the operator’s dental work. Sadid-Zadeh et al. [7] reported that the intervention group that interacted only with the software consistently had a higher percentage of acceptable crown preparations than the control group and the faculty-assisted intervention group. All groups showed improvement over time, with undercuts being the most common error, along with unsupported enamel, finish of the preparation, finish line width, amount of occlusal reduction, and contour of the preparation. This study found that using the E4D Compare software was just as effective as conventional training. In the study by Gratton et al. [27], there was no statistically significant difference between the intervention group and the control group with regard to technical scores and self-evaluation scores during fixed prosthodontics preparation. However, there was a significant difference with regard to grading, as faculty consistently gave higher average scores than the average E4D Compare grade.
Two studies used another digital scanner as their method of intervention [32, 34]. Tiu et al. [34] reported that the conventional group with no tutor assistance had inconsistent results compared to the intervention group, which was able to achieve the acceptable range for preparation finish-line dimensions. By the fourth session, 70% of the intervention group were able to achieve acceptable total occlusal convergence (TOC) angles and finish-line dimensions in their crown preparations, outperforming the other groups in overall acceptable preparations. Liu et al. [32] found a significant difference between the intervention and control groups in practical scores, with the intervention group scoring higher than the control group in the overall preparation score.
Kikuchi et al. [31] used the DentSim virtual reality system. This study reported that the intervention groups had significantly higher average scores than the control group. Total scores increased with experience in the intervention groups between experiments, but there was no significant difference in total scores in the control group between experiments. Preparation time was significantly shorter in the control group than in the intervention groups. The scores for wall incline were higher in the intervention groups than in the control group in all experiments. Undercuts decreased with experience, but damage to adjacent teeth was not significantly different among the groups. Scores for margin location were significantly higher in the intervention groups than in the control group, but scores for chamfer width, wall smoothness, finish line continuity, interproximal clearance resistance, and retention were not.
The DentSim virtual reality system was also tested in a study by Jasinevicius et al. [28], which reported no significant difference in the number and quality of preparations between the intervention and control groups.
Endodontics
Only one study, by Suebnukarn et al. [35], was related to endodontics. A haptic virtual reality simulator intervention was used to evaluate procedural error and treatment time during access cavity preparation. Before training, there were no significant differences between the groups with regard to their average error scores, tooth mass removal, and task completion time. After training, the reduction in error score was not significantly different between the virtual reality simulator and conventional training groups. The intervention group had a significant reduction in tooth mass removal after training when compared to the control group. There was no significant difference in task completion time between the groups after training.
Oral Surgery
Only one study, by Mladenovic et al. [19], was related to oral surgery. It reported that, after using the augmented reality device, the intervention group had a higher average score and a more limited range of responses on the questionnaire than the control group. The average time for performing anesthesia was significantly lower in the experimental group than in the control group. The intervention group had a higher success rate than the control group, but this difference was not statistically significant. Heart rate increased significantly in both groups when performing anesthesia, but there was no significant difference in heart rate between the two groups.
Assessment of the Quality of Evidence
The overall quality of evidence according to GRADE was rated as low. Although one RCT was rated as high because it had an adequate sample size, good control of confounding factors, and few limitations, the remaining RCTs were rated as low due to a high risk of bias, small sample sizes, conflicting findings, and other confounding variables, such as not accounting for additional hours of training and the lack of calibration of the examiners grading the preparations.