The Effect of Simulation-Based Neonatal Emergency Team Training on Clinical Performance and Patient Outcome: A Systematic Review

Abstract
Background
A number of neonatal simulation-training programmes have been deployed during the last decade, and a growing number of studies have investigated effects of simulation-based team training. However, the body of evidence remains to be compiled. Therefore, we performed a systematic review on the effects of simulation-based team training on clinical performance and patient outcome.

Methods
The review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA). We included studies on team training in emergency neonatal settings with reported outcome on clinical performance and patient outcome. Two reviewers independently selected articles and assessed risk-of-bias using the Cochrane risk-of-bias tool 2.0 and the Newcastle-Ottawa quality assessment scale. Kirkpatrick's model for evaluation of training programs provided the framework for a narrative synthesis.

Results
We screened 1,434 titles and abstracts, evaluated 173 full-texts for eligibility, and included 24 studies. We identified only two studies with a neonatal mortality outcome; both had significant methodological limitations, and no conclusion could be reached regarding effects of simulation training in developed countries. Considering clinical performance, randomized studies showed improved team performance in simulated re-evaluations 3 and 6 months after the intervention.

Conclusions
Simulation-based team training in neonatal resuscitation improves team performance and technical performance in simulation-based evaluations 3 to 6 months later. The current evidence was insufficient to conclude on neonatal mortality after simulation-based team training, since no studies were available from developed countries. Future research should include patient outcomes or clinical proxies of treatment quality whenever possible.

Background
It is estimated that less than 1% of all newborns will need extensive neonatal resuscitation in the delivery room (1). The individual health care professional will therefore rarely experience this, and a specific team of professionals will experience it together even more rarely. In 2004, the Joint Commission for the Accreditation of Healthcare Organizations published a sentinel event alert indicating that ineffective communication within the neonatal resuscitation team played a role in almost three-quarters of perinatal deaths or permanent disabilities (2).
Before 2010, the Neonatal Resuscitation Program (NRP) focused on the acquisition of knowledge and technical skills pertinent to neonatal resuscitation (3,4). The sixth edition (2010) of NRP transitioned from instructor-driven didactics and skills stations to interactive and simulation-based learning (4). The seventh edition (2016) introduced more focus on communication and team behaviours to the curriculum (4). Ten desired behavioural skills were adapted from crisis resource management principles (4). Optimal behaviour can be challenging in a high-stakes, time-sensitive critical situation. Thus, simulation training needs to expand beyond technical skill acquisition, and to use simulated environments to study human and system performance (5).
We were unable to identify a review on simulation-based team training in neonatal resuscitation and emergency situations to answer our questions: Does simulation-based team training improve the performance of the team? Does it improve patient outcome and safety? We therefore aimed to perform a comprehensive and high-quality systematic review in order to describe the current state of evidence, and to point out areas where more research is needed to pave the way for future improvements of neonatal emergency team training and patient safety.

Methods
We conducted and report this review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) (6). We registered a study protocol at the International Prospective Register of Systematic Reviews (PROSPERO) repository (CRD42019128213); it was submitted June 24, 2019 and published September 6, 2019 (7).

Study eligibility criteria
We included studies of all designs if they met the criteria stated below.

Population
We included studies of active health care providers, e.g. nurses, doctors, midwives, and respiratory therapists, with clinical responsibilities in the delivery room, the neonatal intensive care unit (NICU), or in other hospital settings with emergency care for newborns. Studies focusing on pregraduate simulation training were excluded.

Intervention
The intervention was simulation-based team training of neonatal clinical emergencies in situ or in a training facility. We defined team training as two or more health care providers in a critical situation requiring a coordinated effort for a successful outcome.

Comparators
We included studies comparing health care providers' performance before (comparator) and after simulation training (intervention) in pre-post designs with no control group. Studies of team skills retention with several evaluations over time (two or more) were also included. In randomized and non-randomized cohort studies the control group could be training or teaching as usual or no training at all. We also accepted comparators with different types or intensities of the simulation training.

Outcomes
We evaluated simulation training outcomes according to the framework of Kirkpatrick's four levels of training evaluation and transfer of learning to behaviour (8). Given our aim, we did not include reaction outcomes (level I), corresponding to providers' evaluation of the simulation training. Learning outcomes (level II) were included as self-reported changes in knowledge, attitude, confidence, preparedness, self-efficacy, and technical and non-technical skills. Technical skills were defined as: "Adequacy of the actions taken from a medical and technical perspective"; non-technical skills were defined as: "Decision-making and team interaction processes used during the team's management of a scenario or a clinical situation" (9). Behaviour outcomes or clinical performance (level III) were included as observed technical and non-technical skills in the simulation setting or in the clinical setting. Patient outcomes (level IV) were included as a change in clinical parameters (e.g. time to critical task or action) or a change in patient outcomes (e.g. survival) of neonatal emergencies.

Search strategy
We conducted a combined search aiming to identify all papers on simulation-based team training in neonatology and emergency paediatrics, as simulation-training programs may include both neonatal cases and paediatric (non-neonatal) cases. An experienced medical librarian (HSL) and a subject specialist (MSL) devised a search strategy for Medline, Embase, CINAHL and Cochrane Library. Studies were limited to English language; however, there was no limit on year of publication. Details of the search strategy are available in Additional file 1. The final search was run on March 6, 2019. Reference lists of the included studies were scrutinized to identify additional studies; identified studies were also subjected to the selection process as detailed below. After data extraction the papers were divided into a neonatal review and a paediatric (non-neonatal) review; two studies provided relevant data on both neonatal and non-neonatal simulations and were included in both reviews (Figure 1) (10,11).

Study selection
Two authors (ST and MSL) independently screened titles and abstracts of all identified studies. Studies meeting the inclusion criteria or potentially meeting the inclusion criteria, as well as studies with insufficient information, were included for full-text review. Any disagreements were resolved by discussion, and abstracts with non-consensus were also included for full-text review. Two authors (ST and MSL) independently performed full-text review of all included manuscripts. Any disagreements were resolved by discussion to consensus, or by consulting a third author in case of non-consensus (TBH). The study selection process was managed using the Covidence review platform (12), which facilitates co-reviewer-blinded abstract and full-text screening, study inclusion, and resolution of conflicts.

Data extraction
Two authors (AWS and MSL) extracted data using a predefined template developed for the purpose. We did not contact authors of studies with missing or inadequate information. The following information was extracted from the included studies: author, year, country, simulation setting, study design, intervention, comparator, number of participants, professions of participants, participant/instructor ratio, simulation and debriefing duration, simulation frequency (if applicable), level of fidelity, in situ or simulation centre, re-test timing (if applicable), self-reported outcomes, observed simulation outcomes, observed clinical behaviour outcomes, change in clinical parameter outcomes, change in patient outcomes.

Risk of bias in individual studies
Two authors (AWS and MSL) evaluated randomized studies according to the Cochrane risk-of-bias tool for randomized trials (ROB 2) (13). Separate score tools were applied for individually randomized parallel group trials and for cluster randomized trials. Each study received an overall risk-of-bias judgement of "low", "some concern", or "high". We evaluated non-randomized studies according to the Newcastle-Ottawa quality assessment scale (NOS) (14). We used the NOS with the adaptations and operational definitions for evaluating medical education research as suggested by Cook and Reed (15). An overall score of 0 (low quality) to 6 (high quality) was assigned to each study. The risk of bias evaluation was presented for all studies and used for the synthesis and discussion of the results. Any disagreements were resolved by discussion to consensus, or by consulting a third author in case of non-consensus (TBH). The bias evaluation had no impact on study inclusion or exclusion.

Synthesis of results
We conducted a narrative synthesis of the included studies due to heterogeneous interventions and outcomes. No single summary measure was applicable across studies. We used the guidance by Popay et al. (16), and structured the main synthesis by the hierarchical Kirkpatrick levels: results (effects on patient outcomes); behaviour (effects on clinical performance); and learning (effects on knowledge). We presented all available studies for each level to provide full transparency. We then emphasized high quality studies in a narrative synthesis, but also integrated the remaining studies at hand.
We used a funnel plot to assess potential publication bias. We used the estimate from the main outcome of each included study. Ratio estimates (ORs, RRs, IRRs) were used directly, whereas a ratio was calculated for studies using continuous outcomes, e.g. intervention mean divided by comparator mean. A ratio above 1.0 favoured the intervention group or the post-intervention estimate.
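The ratio calculation for continuous outcomes can be sketched as follows. This is a minimal illustration, not the review's actual analysis code: the function names are hypothetical, and the log transformation is an assumption commonly made when plotting ratios, not something stated in the methods. The example values are the teamwork behaviours per minute reported for Thomas et al. (20) later in this review.

```python
import math


def ratio_of_means(intervention_mean: float, comparator_mean: float) -> float:
    """Ratio for a continuous outcome; a value above 1.0 favours the intervention."""
    if intervention_mean <= 0 or comparator_mean <= 0:
        raise ValueError("both means must be positive for a ratio to be meaningful")
    return intervention_mean / comparator_mean


def log_ratio(ratio: float) -> float:
    """Natural log of a ratio (assumed plotting scale), so 'no effect' sits at 0."""
    return math.log(ratio)


# Teamwork behaviours per minute at 6 months: intervention 11.8, control 10.0 (20)
thomas_ratio = ratio_of_means(11.8, 10.0)
print(round(thomas_ratio, 2))
```

Ratio estimates reported directly by a study (ORs, RRs, IRRs) would bypass `ratio_of_means` and go straight to the plotting step.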
We assessed selective reporting by inspecting pre-registered study or analysis protocols when available; for randomized studies this was part of the ROB 2 score. Further, we compared the specified analysis from the methods section with the reported outcomes in each included study.

Study selection
A total of 2,315 studies were identified across databases. After duplicate removal, we screened titles and abstracts of 1,434 studies. A total of 173 full-text articles were assessed for eligibility. Of these, 88 met the inclusion criteria common to neonatal and paediatric populations, and 24 provided relevant data for this neonatal review (Figure 1). A list of excluded full-text manuscripts is provided in Additional file 2.
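As a quick arithmetic check of the selection flow, the counts above can be tabulated and the number of records leaving at each stage derived. This is only a sketch using the figures stated in this paragraph; the stage labels and the helper name are illustrative, and the first difference is read as duplicates only on the assumption that nothing else was removed before screening.

```python
# Stage counts as reported in the study selection paragraph
flow = [
    ("records identified across databases", 2315),
    ("titles and abstracts screened after duplicate removal", 1434),
    ("full-text articles assessed for eligibility", 173),
    ("met inclusion criteria (neonatal and paediatric)", 88),
    ("provided data for this neonatal review", 24),
]


def dropped_between_stages(stages):
    """Return (from_stage, to_stage, records_not_carried_forward) for each step."""
    return [
        (stages[i][0], stages[i + 1][0], stages[i][1] - stages[i + 1][1])
        for i in range(len(stages) - 1)
    ]


for start, end, dropped in dropped_between_stages(flow):
    print(f"{start} -> {end}: {dropped} not carried forward")
```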

Study characteristics
Included studies were published between 2008 and 2018. Most were conducted in developed countries; however, one study from Kenya, one from Guatemala, one from Lebanon and two from Mexico were included (Table 1). Nine studies had a control group; six of these used random allocation. The remaining 15 studies had a single group pre-post intervention design. Two studies provided patient outcome (Kirkpatrick level IV), 14 provided clinical performance outcome (level III), and 15 provided learning outcome (level II). The number of participating health care providers ranged from 16 to 305 (Table 1).

Risk of bias within studies
The risk-of-bias judgements of the 6 randomized studies are presented in Table 2. Five studies received an overall judgement of "some concern" (17-21). One study received "high-risk" due to three domains with some concern, including the randomization domain (22). All but one study had low risk of bias in the missing outcome domain, reflecting limited dropout from the short-term educational interventions. Two studies had published their trial protocols (20,22), which was necessary to obtain "low risk" in the selected results domain.
The risk-of-bias scores for 18 non-randomized studies are presented in Table 3. As in the randomized studies the dropout was low in all studies.
Two of three cohort studies received an overall risk-of-bias score of 4 (out of 6) (23,24). The third cohort study scored 2 due to suboptimal description of the intervention and control groups, and to non-blinded outcome assessment (25).

Results and synthesis of patient outcome (Kirkpatrick level IV)
Two studies included patient outcomes (Table 1). Walker et al. conducted a cluster randomized study of 12 intervention hospitals matched (number of births, caesarean rate, mortality, complications, and number of operating rooms) with 12 control hospitals in Mexico (22). Hospitals with higher than average maternal mortality were selected from a list of government-run facilities with 500-3,000 annual births. Intervention hospitals received 2+1 full days of training including interactive team and communication exercises, skills sessions, and in situ simulation of obstetric and neonatal emergencies. Control hospitals received no training during the study period. In total 305 (9%) of 3,228 eligible health care professionals (nurses and doctors) received both training modules during 2010-2012. Statistical analysis adjusted for matching and for presence of a NICU because of imbalance at baseline (83% of control hospitals and 42% of intervention hospitals). Incidence of hospital-based neonatal mortality tended to be lower at the intervention hospitals; IRR (4 months) 0.73 (95% CI 0.45-1.17), IRR (8 months) 0.59 (0.37-0.94), IRR (12 months) 0.83 (0.50-1.37). However, we were concerned about the validity because the study received an overall high risk-of-bias judgement (Table 2). The intended primary outcome of perinatal mortality was changed to in-hospital neonatal mortality due to poor reporting of stillbirths (not further specified). The intervention covered both obstetric and neonatal emergencies; thus any change in mortality reflects the combined perinatal emergency care, not neonatal resuscitation alone.
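To make the reading of these estimates explicit, the short sketch below checks which of the reported 95% confidence intervals exclude the null value of 1.0, and recomputes the participation percentage. The numbers are those quoted above for Walker et al. (22); the variable and function names are illustrative only, and the check is a reading aid rather than a re-analysis.

```python
# IRR point estimate and 95% CI bounds as quoted for Walker et al. (22)
irr_estimates = {
    "4 months": (0.73, 0.45, 1.17),
    "8 months": (0.59, 0.37, 0.94),
    "12 months": (0.83, 0.50, 1.37),
}


def ci_excludes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """True when the whole confidence interval lies on one side of the null value."""
    return upper < null or lower > null


for label, (irr, lower, upper) in irr_estimates.items():
    verdict = "excludes 1.0" if ci_excludes_null(lower, upper) else "includes 1.0"
    print(f"{label}: IRR {irr} ({lower}-{upper}) {verdict}")

# Share of eligible professionals who received both training modules
participation_pct = 100 * 305 / 3228
print(f"participation: {participation_pct:.1f}%")
```

Only the 8-month interval lies entirely below 1.0, which is consistent with the cautious wording "tended to be lower".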
Charafeddine et al. conducted a single group pre-post intervention study at 22 hospitals in Lebanon (26). The hospitals were part of a larger National Collaborative Perinatal-Neonatal Network (NCPNN) covering 32 hospitals. The intervention was an 8-hour session comprising 40 minutes of teaching covering the NRP algorithm, hands-on simulation on low-fidelity manikins, and finally "megacode" simulations including all steps of neonatal resuscitation. Some 256 professionals (doctors, nurses, midwives) were trained during 2009-2011; the selection process and participation rate were not described. Patient outcomes were retrieved from surveillance data on mortality at hospital discharge and neonatal morbidity from the NCPNN network. The first intervention year (2009) was chosen as reference; mortality odds ratio (OR) decreased steadily from 1.53 (95% CI 1.18-1.98) in 2006 to 0.72 (0.54-0.96) in 2013. Furthermore, the years 2011-2013 had fewer infants requiring oxygen at birth, bag and mask ventilation, intubation, and chest compressions, compared with 2009. The study obtained a low NOS score of 1 out of maximum 3 for a study with no control group and non-blinded outcome ascertainment, indicating risk of bias. There was no presentation of neonatal mortality rates for the 10 non-participating network hospitals likely also contributing surveillance data. Thus, we are concerned that factors other than the simulation training may explain the observed change in mortality. We acknowledge that the authors also do not emphasize this finding, but rather the participants' change in knowledge (included below).
In summary of patient outcomes, we identified one randomized study and one single group pre-post study that reported a measure of neonatal mortality (22,26). Both studies were from developing countries and indicated lower hospital-based neonatal mortality after simulation-based training. Both studies had a high risk of bias, and the randomized study by Walker et al. also included obstetric emergency training, thus hampering interpretation of effects of neonatal team training.

Results and synthesis of clinical performance (Kirkpatrick level III)
We included 14 studies with clinical performance outcomes (Table 4). Eight studies had a control group, five with random allocation. Six studies were of single group pre-post design. One study by LeFlore et al. simulated neonatal transport cases (24); the rest simulated neonatal (delivery room) resuscitation.
Two randomized studies evaluated the effects of simulation-based team training after approximately 3 months (18,19). Rubio-Gurung et al. conducted a large cluster randomized study in 12 hospitals in France (19). They compared 4-hour high-fidelity in situ multidisciplinary team trainings of 6 professionals with no simulation training. They trained 80% of the delivery room staff within 1 month, amounting to 202 professionals in 6 intervention hospitals. Simulation-based evaluations were conducted for a random sample of professionals before (n = 116) and after (n = 114) intervention. No differences in baseline evaluations were observed. Significant improvements were demonstrated 3 months later for technical skills, team performance and global performance (Table 4). The study received a low risk-of-bias judgement in 5 of 6 domains (Table 2). In the study by Lee et al. (18), participants were randomized to a 4-hour high-fidelity simulation-based session (45 min. didactics) on neonatal resuscitation, or the standard emergency medicine curriculum, including monthly paediatric (occasional neonatal) simulations. Baseline data were similar in both groups. Simulation-based evaluation after 16 weeks demonstrated no change in the neonatal resuscitation score for the control group, but a 12-percentage-point improvement in the intervention group (Table 4). The intervention group also significantly reduced the time to warm, dry, stimulate, and put a hat on the infant compared to controls (p = 0.017). The study received an overall risk-of-bias judgement of some concern (Table 2). The randomized studies by Rubio-Gurung et al. and Lee et al. support improved team performance and technical skills 3 months after simulation-based team training (18,19).
Two randomized studies extended re-testing to 6 months (20,21). Sawyer et al. conducted a study of 30 residents randomized to either standard oral debriefing or video-assisted debriefing at 3 high-fidelity simulation sessions approximately 2 months apart (21). Baseline data were similar in both groups. No significant differences in neonatal resuscitation performance score and time to perform critical actions were observed at the 6-month comparison of the standard oral and video-assisted debriefing groups (Table 4). The study received a low risk-of-bias judgement in 4 of 5 domains (Table 2). Thomas et al. randomized 98 interns to either standard NRP training (comparator) or NRP plus a 2-hour session on communication and teamwork and low-fidelity (intervention group 1) or high-fidelity (intervention group 2) simulation (20). At 6 months follow-up the intervention groups (analysed together to increase power) exhibited more teamwork behaviours per minute (11.8) than controls (10.0) (p = 0.03). However, no differences were observed for NRP performance score, duration of resuscitation, vigilance, or workload management. The study received low risk-of-bias in 3 of 5 domains (Table 2).
Bender et al. investigated whether an NRP booster at 9 months improved performance at 15 months evaluation (17); 50 residents were randomized to either a half-day NRP booster with high-fidelity simulations (intervention) or to routine clinical duties (comparator). At the 15-month evaluation the intervention group scored higher on both technical score and team performance score (Table 4). The study was of some risk-of-bias concern (Table 2).
Two cohort studies explored minor interventions related to simulation-based team training. Rovamo et al. studied 99 doctors, nurses and midwives on a one-day high-fidelity neonatal resuscitation course (23). Both intervention and control groups had the same simulation training, but in addition the intervention group received a 1-hour interactive lecture on crisis resource management (CRM) and anaesthesia non-technical skills principles. There was no difference in team performance score in the two groups after the lecture (Table 4). The study received a NOS bias score of 4 of 6, indicating low to moderate risk-of-bias (Table 3). LeFlore et al. studied a neonatal transport team over 2 years (24); the first year they trained with high-fidelity simulation and self-paced modular learning (control), the second year they used high-fidelity simulation and expert modelled learning (intervention). Some, but not all, team members participated both years. There was no significant change in team performance score (Table 4). The study received a NOS bias score of 4 of 6, indicating low to moderate risk-of-bias (Table 3).
Barry et al. studied a group of 28 first-year residents (intervention) and compared them to a group of 24 senior residents (control) (25). The intervention was a half-day equipment workshop and in situ simulation-based team training. The control group was senior residents with NRP course and routine clinical duties. Re-testing was done after 1 month, and after 1-2 years. The intervention group's global performance score increased from a lower level before training to the same level as the senior residents after training (Table 4). The study received a NOS score of 2 of 6, indicating moderate to high concern of bias (Table 3).
Dadiz et al. studied 228 perinatal health care professionals over a 3-year period (27); 90-minute multidisciplinary high-fidelity trainings were conducted in a simulated delivery room. Over the years, increasing communication checklist scores were observed (Table 4). The study received a NOS risk-of-bias score of 2 (Table 3). The remaining single group studies reported improved team performance scores after simulation-based team training (Table 4) (28-32). NOS risk-of-bias scores ranged from 1-3 of 6 (Table 3).
In summary of clinical performance, randomized studies showed effects of team training in simulated re-evaluations after 3 and 6 months. Booster simulation sessions 9 months after NRP improved performance at 15 months evaluation. One randomized study showed no differences in team performance comparing standard oral debriefing and video-assisted debriefing. One single-group study showed steadily improving communication performance during a 3-year intervention with yearly simulation training. Six smaller single group studies showed that simulation-based team training improved team performance scores 0-6 months later.

Results and synthesis of learning (Kirkpatrick level II)
Two small studies with random allocation to the intervention and control group presented self-reported outcome on knowledge and confidence. Bender et al. observed no significant difference in knowledge 15 months after a simulation-based NRP boost at nine months (17). Lee et al. observed significant improvements in confidence in neonatal resuscitation after 16 weeks in both the intervention and control group, but no statistically significant difference between groups (18). Both studies had limited power to detect a difference.
A total of 13 studies with single group design presented self-reported learning outcome; we briefly summarize the findings of 7 studies with more than 50 participants (Table 1). They all evaluated outcome immediately after the simulated neonatal resuscitation team training intervention or within a 2-3 month period. Self-assessed improvements were reported for neonatal resuscitation knowledge (26,28,33-35), self-efficacy (28,34), communication (33,36), and leadership, confidence and technical skills (36). Dadiz et al. specifically trained and studied delivery room communication, and interestingly the health care professionals reported significant improvements in team communication in real clinical situations over a 3-year study period (27).
In summary of learning outcomes, the single-group design studies all reported significant improvements in self-reported outcomes, but 2 small randomized studies found no difference in improvements between the intervention and control groups.

Risk of bias across studies
Within each group of studies (according to Table 1) the funnel plots were quite symmetric, and no major concern about publication bias was raised (Figure 2). We found no indication of selective reporting bias, as methods section and reported results were coherent in all studies. Pre-published study protocols were available for only 2 of 6 randomized studies, which was reflected in the risk-of-bias score.

Summary of evidence
We encountered an evolving research field with the earliest included study published in 2008 and the majority after 2012. One main finding of this review was the lack of evidence to support effects of team training on patient outcome (Kirkpatrick level IV). We identified only one randomized study and one single group pre-post study that reported a measure of neonatal mortality (22,26). Both studies indicated lower hospital-based neonatal mortality after simulation-based training. However, both studies had a high risk of bias. The randomized study by Walker et al. also included obstetric emergency training, which complicates interpretation of isolated effects of neonatal resuscitation team training. Thus, the evidence from the two studies was inconclusive, and they were conducted in developing countries.
Another main finding based on five randomized studies was improved team performance (Kirkpatrick level III) in simulated re-evaluations 3 to 6 months after the intervention simulation training (18-20). Booster simulation sessions 9 months after NRP improved performance at 15 months evaluation (17). The study by Rubio-Gurung et al. stands out by being sufficiently powered (n = 114), well designed, and by evaluating technical and team performance in a simulation-based setting before and after intervention (19). They were thus able to compare changes in neonatal resuscitation team performance between intervention and control hospitals using randomly selected providers available on the day of evaluation. Three of the randomized studies were small (n ≤ 50) and had limited power (17,18,21).

Strengths
This systematic review applied a comprehensive search strategy in four medical databases by an experienced medical librarian. We followed a pre-specified study protocol registered in the PROSPERO repository. Two reviewers independently screened and selected studies, and performed data extraction and risk-of-bias scoring. We presented the data according to the PRISMA guidelines. The funnel plot was not indicative of publication bias.

Limitations
Meta-analysis was impossible due to heterogeneous interventions and outcomes. Instead, we performed a narrative synthesis structured by outcome Kirkpatrick level and with emphasis on studies of high quality. For transparency, every included study was presented and cited (Table 1). Although this process is less standardized than a meta-analysis, we do consider it reproducible and open to scrutiny.
The majority of included studies had significant methodological limitations. Fifteen of 24 studies had no control group, which is concerning when using a self-reported endpoint or a simulation-based endpoint; familiarization with the simulation setting may improve performance during subsequent simulations. Most non-randomized studies had inadequate adjustment for potential confounding factors such as years of clinical experience, profession, team composition, age and gender. Small sample size (n < 50 in 12 studies) may have resulted in inadequate power and inability to perform adjusted statistical analyses.

Recommendations for future research
Measuring effects of simulation-based training on clinical outcomes of neonates should be preferred whenever possible. Newborn morbidity and mortality are of primary interest, but so is clinical information that may serve as a proxy measure of treatment quality, for example time to critical tasks during neonatal resuscitation. We acknowledge that such measures are difficult and time consuming to obtain given the paucity and volatile nature of real-life critical events. We advocate for the use of control groups when designing new studies, and the use of random allocation to intervention and control whenever possible. The study protocol should be published, to avoid issues of selective reporting. When applying simulation-based evaluation of the impact of training, use video recordings and blinded scoring by accepted and validated protocols. Compared to planned simulation-based re-testing, the use of unannounced simulated mock codes may mimic real clinical encounters more closely, because clinical professionals often prepare for planned testing, which may bias results. Describe the theoretical framework and process of debriefing, as this is an important part of simulation-based training that impacts the learning outcome.

Conclusion
This systematic review compiles the first decade of research on simulation-based team training in neonatal medicine emergencies. We were unable to reveal effects of team training on neonatal morbidity and mortality, as we identified only two studies, both conducted in developing countries and both with significant methodological limitations. However, five randomized studies showed improved team performance in simulation-based re-evaluations 3 to 6 months after the intervention simulation training. Future research should include patient outcomes or clinical proxy measures of treatment quality whenever possible.

Abbreviations
In order of appearance:

Table 2 (partial; column headings and earlier rows not recovered). Risk-of-bias judgements for randomized studies.
Thomas, EJ 2010; Randomized, 3 arms; Low risk; N/A; Some concern; Some concern; Low risk; Low risk; Some concern
Bender, J 2014; Randomized, 2 arms; Low risk; N/A; Low risk; Low risk; Some concern; Some concern; Some concern
Sawyer, T 2012; Randomized, 2 arms; Low risk; N/A; Low risk; Low risk; Low risk; Some concern; Some concern
Lee, MO 2012; Randomized, 2 arms; Low risk; N/A; Some concern; Low risk; Low risk; Some concern; Some concern

Table 3. Risk-of-bias for non-randomized studies using the Newcastle-Ottawa quality assessment Scale (NOS) adapted to educational research.

Figure 1. PRISMA flowchart of the study selection process.