Characteristics of included studies
The search yielded 11,084 citations excluding duplicates (Fig. 1). Of these, 639 met the inclusion criteria based on title and abstract, and 257 studies met the inclusion criteria based on full text (references in Supplemental Digital Appendix 2).
General characteristics
Summary characteristics of the 257 included studies are given in Table 1. More studies were published in 2011–2021 (n = 194) than in 2000–2010 (n = 63). In total, 22 countries were represented, though most studies were conducted in the United States (78.2%), followed by Canada (5.4%), Australia (2.3%), the United Kingdom (1.9%), and Japan (1.6%).
Table 1
Summary characteristics of the 257 included studies
Characteristic | No. (%) |
Country | |
United States | 201 (78.2%) |
Canada | 14 (5.4%) |
Australia | 6 (2.3%) |
United Kingdom | 5 (1.9%) |
Japan | 4 (1.6%) |
Other | 27 (10.5%) |
Specialty | |
Family medicine | 34 (13.2%) |
Internal medicine | 29 (11.3%) |
Orthopedic surgery | 23 (9.0%) |
General pediatrics | 21 (8.2%) |
General surgery | 18 (7.0%) |
Surgery (any type) | 17 (6.6%) |
Psychiatry | 14 (5.5%) |
All/any specialty | 14 (5.5%) |
Anaesthetics | 8 (3.1%) |
Emergency medicine | 8 (3.1%) |
Hospitalist | 7 (2.7%) |
Radiology | 6 (2.3%) |
Other | 58 (22.6%) |
Target population of interventiona | |
Resident | 162 (63.0%) |
All/any level | 47 (18.3%) |
Specialist | 36 (14.0%) |
Subspecialty fellow | 8 (3.1%) |
Other (Junior doctor, multiple) | 4 (1.6%) |
Intervention typeb | |
Resident Research Program (RRP) | 76 |
Protected time | 72 |
Mentorship | 52 |
Education program | 41 |
Research support staff | 30 |
Intramural funding | 23 |
Resident research requirement | 21 |
Department-wide research program | 18 |
Research leadership position | 15 |
Intramural fellowship | 14 |
Otherc | N/A |
a. Not necessarily the level of the research participants; for example, specialists could be interviewed about an intervention they experienced during residency |
b. Some studies investigated multiple interventions, so the total will exceed 257 |
c. Included short resident research rotations (< 1 month), monetary incentives for research outputs, research days/events, equipment, laboratory/office space, works-in-progress meetings, a pre-residency research program, team approaches to research, a resident scholarly activity points system, internal grant review panels, database infrastructure, journal clubs and general department resources |
Target populations for the interventions included 39 individual specialties, as well as studies which targeted any/all specialties (5.5%). The most common individual target specialties were family medicine (13.2%), internal medicine (11.3%), orthopedic surgery (9.0%), general pediatrics (8.2%), and general surgery (7.0%). Most interventions targeted the resident level (63.0%) (meaning doctors participating in a training program to gain specialty status/licensure, also known as trainees or registrars). Interventions were also commonly applied to all/any level of doctor (18.3%) or specialists (14.0%).
Study designs
Study designs have been separated into two types, single intervention (n = 209) and multi-intervention (n = 48), to capture whether the intervention was being investigated in isolation or alongside others. Study design often had to be inferred from available information, as many studies did not explicitly name their design. The most common study design used for single intervention studies was a pre-post cohort design (38.8%), followed by post-only cohort (28.7%), cross-sectional (17.2%), cohort studies with a contemporaneous control (either matched, unmatched, or waitlist control) (11.5%) and qualitative designs (3.3%). Multi-intervention studies were either cross-sectional (79.2%) or qualitative (20.8%) designs. Further information on study designs, including sample sizes, is given in Table 2. In some studies the sample size was not directly reported but was calculated during data extraction from other information given in the article (e.g. the number of residents the program admits per year). Notably, sample size could not be identified for 46 studies, most commonly those using pre-post designs (38/81).
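Where an article did not report a sample size directly, the figure was derived from other reported information; a minimal sketch of that kind of arithmetic is shown below, using invented numbers and a hypothetical helper function rather than anything taken from the included studies.

```python
# Minimal sketch (hypothetical): inferring a sample size from other figures
# reported in an article, e.g. annual resident intake multiplied by the
# number of years the program was evaluated. All values are invented.

def infer_sample_size(residents_per_year: int, years_evaluated: int) -> int:
    """Estimate how many residents were exposed to the intervention."""
    return residents_per_year * years_evaluated

# Example: a program admitting 6 residents per year, evaluated over 5 years.
print(infer_sample_size(residents_per_year=6, years_evaluated=5))  # 30
```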
Table 2
Detail on study designs and outcome measures of included studies
Study Designs | | |
Type of design | No. (%) | Median sample size (range) | Explanation and further information |
Single intervention (n = 209) | | |
Pre-post | 81 (38.8%) | 68 (4–327) | Sometimes also known as a before-and-after study. The majority used an audit or bibliometric approach for data collection, though some used surveys or other prospective data collection. For 38 pre-post studies, it was unclear how many participants were included. It was also often unclear whether the pre and post groups overlapped (i.e. included some of the same individuals) or were a “historical cohort” design with separate cohorts. |
Post-only | 60 (28.7%) | 31 (2–232) | Post-only studies simply reported the outcomes following an intervention (e.g. the department published 12 articles). Most studies collected data using audits or surveys, though some used both or other methods. |
Cross-sectional | 36 (17.2%) | 142 (32–101,031) | Cross-sectional studies with larger sample sizes usually used retrospective audit/bibliometric data, while smaller samples often used prospectively collected survey data. |
Cohort with contemporaneous control | 24 (11.5%) | 106 (21–754) | Included 18 with an unmatched control, 5 with a matched control and 1 with a waitlist control. These studies usually used audit/bibliometric data, although surveys were also sometimes utilised. |
Qualitative | 7 (3.3%) | 17 (5–72) | Usually as part of an evaluation of an intervention. These studies usually utilised interviews to collect data. |
Interrupted time series | 1 (0.5%) | N/A | |
Multi-intervention (n = 48) | | |
Cross-sectional | 38 (79.2%) | | Almost exclusively multi-site surveys, either surveying individuals (25 studies) or program directors (13 studies) about the presence of interventions/modifiable factors and outcomes. Surveys of individuals had a median sample size of 136 (range 13–1,351), while surveys of program directors had a median sample size of 96 programs sampled (range 24–351). It should be noted that some of these studies only reported statistically significant associations, so complete data were not always available to be extracted for associations that were not significant. |
Qualitative | 10 (20.8%) | 28.5 (10–144) | Mostly used interviews and sometimes surveys to ask participants to reflect on what factors helped them engage in research. |
Outcome types measureda | No. (%) | Explanation of outcome type |
Publication-related | 199 (77.4%) | Included measures such as total number of publications, total number of staff who published, percent of staff who published, mean or median publications per staff member, and publications per FTE. Some studies counted all publications, while others only counted specific publications (e.g. publications during a specific time period only), or publications where the staff member was a first or last author. Proxies for quality of publications were also often used, for example type of research published (e.g. retrospective studies or case studies were considered less valuable than prospective research), journal Impact Factor, H index, citations, and whether the journal was indexed or peer-reviewed. |
Presentation-related | 126 (49.0%) | Similar to publications, this was measured in many different ways (e.g. total, per staff member, per FTE). Sometimes only presentations at a specific event (e.g. an annual meeting or a resident research day) were counted. It was common for the nature of the conference (regional, national or international) to be used as a proxy for quality. |
Grant-related | 63 (24.5%) | Included total number of grants, total amount of funding, mean number of grants per staff member, percent of staff members who had received funding, and number of years funded per staff member. |
Career outcome | 59 (23.0%) | Current self-reported engagement in research, research FTE, position, and type of practice (i.e. academic vs private). |
Project-related | 56 (21.8%) | Number of projects begun or completed, and number of protocols submitted or accepted through the Institutional Review Board. This could be a department total or numbers per staff member. |
Awards | 17 (6.6%) | Total number of awards or awards per staff member. |
Other | 53 (20.6%) | Examples include subsequent research degrees, whether the research was attributed to the intervention, implementation of research findings, number who fulfilled their research requirements, selection of the site for clinical trials, collaborations (e.g. percent of papers that included residents or university partners), how many students/others a staff member mentored, and participation in reviewing activities. |
None (qualitative) | 17 (6.6%) | N/A |
Number of outcome types measured | | |
None (qualitative) | 17 (6.6%) | | |
1 | 59 (23.0%) | | |
2 | 81 (31.5%) | | |
3 | 64 (24.9%) | | |
4 | 26 (10.1%) | | |
5 | 4 (1.6%) | | |
6 | 6 (2.3%) | | |
a. Most studies used more than one outcome type, so the total will exceed 100%
Outcome measures
Most studies (70.4%) used more than one type of quantitative outcome measure to determine the success of an intervention. Publication-related outcomes were most commonly used (77.4%), followed by presentation-related (49.0%), grant-related (24.5%), career-related (23.0%), project-related (21.8%), awards (6.6%), and other outcomes (20.6%). Each of these broad categories of outcome was measured in a variety of ways, as outlined in Table 2. One hundred and twenty-five studies completed formal statistical hypothesis testing, of which the majority (111/125) found a significant result for the primary outcome.
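As an illustration of how the same underlying data can yield several of the publication-related measures listed in Table 2, the sketch below computes a few common variants (total publications, percent of staff who published, mean publications per staff member, and publications per FTE) from invented department data; none of these figures come from the included studies.

```python
# Invented example data: publication counts and research FTE for a
# hypothetical department of five staff members.
staff = [
    {"name": "A", "publications": 4, "fte": 1.0},
    {"name": "B", "publications": 0, "fte": 0.5},
    {"name": "C", "publications": 2, "fte": 1.0},
    {"name": "D", "publications": 7, "fte": 0.8},
    {"name": "E", "publications": 0, "fte": 1.0},
]

total_publications = sum(s["publications"] for s in staff)
percent_who_published = 100 * sum(s["publications"] > 0 for s in staff) / len(staff)
mean_per_staff = total_publications / len(staff)
publications_per_fte = total_publications / sum(s["fte"] for s in staff)

print(f"Total publications:    {total_publications}")
print(f"Percent who published: {percent_who_published:.0f}%")
print(f"Mean per staff member: {mean_per_staff:.1f}")
print(f"Publications per FTE:  {publications_per_fte:.1f}")
```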
Interventions and findings
Each intervention type is described below, together with a brief summary of the outcomes of the relevant studies, in order of frequency as given in Table 1. Figure 2 provides a visualisation of the broad outcomes of each study that used formal statistical testing (full data in Supplemental Digital Appendix 3). Findings of individual studies should be interpreted cautiously as no quality assessment was completed. Detailed results can be found in Supplemental Digital Appendices 4 and 5.
Resident research programs
The most common type of intervention studied was the Resident Research Program (RRP), investigated in 76 studies. RRPs were multi-faceted research engagement programs which incorporated individual interventions such as protected time, education programs, a project requirement, mentorship, research support personnel, intramural funding, journal clubs and resident research day events. These programs were integrated into standard residency training, usually across the length of residency or within its last few years. RRPs were usually mandatory for all residents of the specialty training program at that site, though some were available to any interested trainees who satisfied a small set of prerequisite conditions. Programs available to only a selective subset of trainees through a competitive process (sometimes called research tracks) usually included substantial periods of protected time and are covered in the “protected time” category below.
RRPs had largely positive impacts on a range of outcomes, especially publication- and presentation-related measures (Fig. 2). Four studies using statistical testing included a contemporaneous control group. Three used a ranked-to-match control group, meaning the control group consisted of residents at other institutions who had received a ranking that would have allowed them to match into the program had they ranked it highly enough (Calhoun et al., 2020; Sakai et al., 2014; West, Halvorsen & McDonald, 2011). This was intended to mitigate self-selection bias, that is, the possibility that results reflect higher-performing residents being more likely to choose programs that offer an RRP. The remaining study (Koontz, Kamer & Heitkamp, 2020) compared residents at the same institution who chose to join the RRP with those who did not. All of these studies reported significant differences in publication-related outcomes in favour of the RRP group, although each measured publications differently, so direct comparison was not possible.
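To make the ranked-to-match idea concrete, the sketch below selects a hypothetical control group of applicants who were ranked highly enough that they could have matched into the program but ultimately trained elsewhere; the data structure, field names and rank cut-off are assumptions for illustration only, not the procedure used in the cited studies.

```python
# Hypothetical sketch of selecting a "ranked-to-match" control group:
# applicants ranked within the program's realistically matchable range
# who trained elsewhere. Field names and the cut-off are invented.

applicants = [
    {"id": 1, "program_rank": 3,  "matched_here": True},
    {"id": 2, "program_rank": 5,  "matched_here": False},
    {"id": 3, "program_rank": 12, "matched_here": False},
    {"id": 4, "program_rank": 40, "matched_here": False},
]

# Assume (for illustration) the program realistically fills from its top 15.
MATCHABLE_RANK_CUTOFF = 15

intervention_group = [a for a in applicants if a["matched_here"]]
control_group = [
    a for a in applicants
    if not a["matched_here"] and a["program_rank"] <= MATCHABLE_RANK_CUTOFF
]

print([a["id"] for a in intervention_group])  # [1]
print([a["id"] for a in control_group])       # [2, 3]
```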
Pre-post studies were the most commonly utilised design for evaluating RRPs. Of the 25 pre-post studies using significance testing, about half of those examining publications reported a significant difference (13/23), and most of those examining presentations did so (11/14).
Protected time
Protected time was investigated in 72 studies, inclusive of any study that examined dedicated research time as an intervention without describing it as part of a multifaceted program such as an RRP or post-residency research fellowship. Among single intervention studies, the vast majority examined protected blocks of time of over 6 months (usually 1–2 years) during surgical residency. Multi-intervention studies looked at a variety of types of protected time (e.g. percentage of protected time in a role) across a wider range of specialties, so they are discussed separately below.
Of the single intervention studies investigating blocked time, four utilised a contemporaneous control group and statistical hypothesis testing (Brandt et al., 2018; Joshua Smith et al., 2014; Krueger et al., 2017; Osborn et al., 2018). All four were in surgical specialties, were retrospective and used unmatched control groups from other institutions. Half of these studies found a statistically significant effect on publication-related outcomes (2/4), and the two that looked at career outcomes both found a significant effect (2/2). Cross-sectional studies universally found a positive impact on publication-related outcomes (7/7), with mixed results for grant-related (3/4) and career-related (3/7) outcomes. Two pre-post studies also found positive results for publication-related outcomes (2/2). Some cross-sectional studies investigating protected blocks of time for residents also included comparisons of different lengths of time, usually finding that longer periods had positive effects on a range of outcomes (Bhattacharya et al., 2011; Hsieh et al., 2014; Lee et al., 2020; Robertson, Klingensmith & Coopersmith, 2009; Yang et al., 2011). One cross-sectional study also found that protected time produced more publications when provided in a longitudinal format rather than a blocked format (Williams, Agel & Van Heest, 2017).
Multi-intervention studies used cross-sectional designs to determine the impact of various forms of protected time. These studies mostly found statistically significant positive effects on publication (11/17) and grant-related outcomes (3/3), but mixed effects were seen for presentation (2/4) and project-related outcomes (2/4). Participants in qualitative studies also commonly identified protected time as one of the factors that had contributed to their research success (6/10).
Mentorship
Fifty-two studies investigated mentorship as an intervention, which encompassed both formal mentoring programs and general presence of research mentors. Mentoring was often used synonymously with research supervision, rather than in the sense of external career mentoring. Studies also often investigated the characteristics of mentors, for example gender, geographical co-location, mentor research productivity, and the value of having single versus multiple mentors.
No studies which investigated the impact of mentoring used a contemporaneous control group. All studies which used statistical hypothesis testing were cross-sectional, and most were multi-intervention studies. Mentoring was mostly positively correlated with publication-related outcomes (8/12) and had varied effects on other outcomes. Mentoring was identified as an important factor in all qualitative studies that asked participants to reflect on factors contributing to research success (10/10).
Education programs
Forty-one studies focused on educational interventions in a wide variety of formats. Some were short intensive workshops of 1–2 days (Ostbye et al., 2004; Rhondali et al., 2015), while others were more extensive, ongoing education over the course of weeks or months, designed to sit alongside completion of a small project or proposal, sometimes with a mentorship component (Demirdjian et al., 2017; Wojtecki, Wade & Pato, 2007).
All of the single intervention studies using statistical hypothesis testing investigated completely different types of education programs, from a 3-day workshop (Ried et al., 2008) to a 33-session longitudinal program alongside a project requirement and mentorship (Lowe et al., 2008), so direct comparisons may be inadvisable. Similarly, many multi-intervention studies used surveys asking about the general availability of research education, which could be interpreted differently by each participant and thus represent many different types of education programs. Accordingly, there were variable associations for most outcomes, though single intervention studies more commonly found positive associations.
Research support staff
The presence of research support staff was investigated as a strategy for increasing research engagement in 30 studies. Research support staff were varied and included biostatisticians, nonclinical PhDs, lab technicians, research coordinators, research coaches, and support units including multiple staff. Most studies which used hypothesis testing were cross-sectional and found varied effects of the presence of support staff on publications (6/11 statistically significant) and most other outcomes.
Intramural funding
Intramural funding, meaning research funding from the recipients’ employing institution, was investigated in 23 studies. Most of these studies did not disclose a funding amount, but where specified this was usually under USD $10,000. A single study used a contemporaneous control (Winn et al., 2019), comparing residents from the same institution who received an intramural grant with those who did not. This study found no difference in publication-related outcomes but a significant difference for presentation-related outcomes. All other studies using statistical hypothesis testing were cross-sectional studies, which found that the presence of intramural funding had mixed associations with a range of outcomes.
Resident research requirement
Twenty-one studies looked solely at a mandatory departmental requirement for residents to engage in research or produce a research outcome (e.g. protocol, publication or presentation). The only single-intervention study to use statistical hypothesis testing was a matched control design (Ozuah, 2009), which compared a primary pediatric residency program (which had a research requirement) with subjects from other pediatric residency programs in the same institution. This study found a significant difference in both total and first-authored publications during and after residency.
Multi-interventional cross-sectional studies found positive associations with presentation-related outcomes (2/2), and no association with publication (0/5), grant (0/2) or career-related (0/1) outcomes. All studies using resident participation in research activity as an “other” outcome had significant results (4/4).
Department-wide research programs
Eighteen studies investigated department-wide research programs. Like RRPs, these were multi-strategy interventions, but they focused on increasing research engagement of an entire department, rather than residents specifically. These programs contained many of the same strategies as RRPs, including protected time, mentorship, training sessions, research activity requirements, journal clubs, research leadership positions, research support staff and intramural funding.
All six studies which investigated department-wide programs using statistical hypothesis testing were single intervention studies: five used pre-post designs and one an interrupted time series design. All of these studies found statistically significant positive effects on publication-related (6/6) and presentation-related (2/2) outcomes.
Research leadership positions
Presence of a research leadership position, usually a department research director or residency research director, was investigated in 15 studies. Studies were included in this category if they focused on presence of the position itself, but it should be noted that these positions would usually be responsible for initiating and supporting other strategies (e.g. overseeing a RRP, administering an education program). The presence of research directors was associated with exclusively positive findings regarding publication, presentation and grant-related outcomes in all single intervention studies (which were cross-sectional and pre-post designs), whilst all multi-intervention studies found no significant association with these outcomes.
Intramural post-residency research fellowship
Intramural research fellowships after residency were the focus of fourteen studies. These fellowships were competitive placements within an institution, often analogous to subspecialty fellowships in length (1–2 years) and structure. While protected time was a key feature of these fellowships, they were usually formalised and/or accredited placements that incorporated multiple elements. Many such fellowships are administered by national organisations and were therefore excluded; only fellowships administered and funded intramurally were included in this review.
Three studies investigated the outcomes of intramural fellowships using a contemporaneous control. One study found no significant difference in publication outputs between the fellowship group and a control cohort from the same institution matched for specialty and career stage (Brand, Patrick & Grayson, 2008). Two unmatched studies compared different types of programs at the same institution (Barreto et al., 2021; Dyrbye et al., 2008) and found significant differences for publication-related outcomes (2/2). Another four single intervention studies using cross-sectional and pre-post designs found statistically significant results for publications (4/4), but not presentation-related outcomes (0/2).
Other interventions
A variety of other research engagement strategies, listed under “Other” in Table 1, were each the focus of a smaller number of studies.