Design and guiding frameworks
This review will follow systematic review procedures and be reported in accordance with the updated Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) guidelines and their extension for searches (PRISMA-S) (33, 34). The PRISMA extension for protocols (PRISMA-P) checklist for this review is provided in Additional file 1.
Included measurements will be assessed against recommended best practice for psychometric properties and pre-defined criteria in accordance with the Psychometric And Pragmatic Evidence Rating Scale (PAPERS) (35, 36) and measurement equivalence (invariance) (37). The criteria are summarized in Table 3. Further, the included measurements will be mapped against the domains and constructs of the Consolidated Framework for Implementation Research (CFIR) (17). The CFIR is one of the most widely used and acknowledged meta-frameworks within dissemination and implementation (D&I) research (14, 15), and it captures factors influencing implementation success. The framework, with its domains and constructs, is summarized in Table 2.
This systematic review has been submitted for pre-registration with PROSPERO (ID: 284741); the submission was made on 12 October 2021.
Eligibility criteria
Included publications will be peer-reviewed journal articles reporting original research, written in English or a Nordic language (Swedish, Norwegian, Danish, or Icelandic), and published from the year 2000 onward. The date restriction was chosen mainly because published work within D&I has been on the rise during the last two decades. Additionally, included publications have to report research from school settings, including primary and secondary school but excluding preschool, tertiary, and vocational education; the included settings are sufficiently similar in terms of organization and contextual environment. Research populations can involve school stakeholders such as students, teachers, school leaders and management, school nurses, psychologists, assistants, and educators or similar school staff. Only quantitative measures will be included; qualitative methods will be excluded. The eligibility criteria are summarized in Table 1.
Editorials, commentaries, conference abstracts, dissertations, and other grey literature, as well as publications that report on measures developed using exclusively qualitative methods, will be judged ineligible.
Search strategy
A detailed search strategy will be developed by two researchers (SH, ÅN) together with search specialist staff at the Karolinska Institute University Library (KIB). KIB will also be enlisted to perform the searches.
To identify relevant measurements, we will systematically search the following six electronic databases: (i) Medline (Ovid); (ii) ERIC (ProQuest); (iii) PsycInfo (Ovid); (iv) Cinahl (EBSCO); (v) Embase (embase.com); and (vi) Web of Science Core Collection (Clarivate). Consistent with our aim to identify and assess implementation-related measurements in the educational, behavioral, and health fields, the search string will be built on three core levels: (a) terms for implementation, (b) terms for measurement, and (c) terms for school settings. A test strategy for the search terms will be conducted in Web of Science Core Collection (Clarivate) and then reviewed jointly by the research team and KIB. The final strategy will be adapted to fit the five remaining databases.
While this review is mainly situated within public health and implementation science, we will also address articles from educational science because the chosen setting is schools. Some of the commonly used search categories from the PICO framework (38) for systematic reviews are less useful in reviews outside medicine, such as in education, where the outcomes included may vary widely, control groups are not always present, and studies without an intervention are important to include (39). This issue will be addressed by searching both generic and subject-specific bibliographic databases (e.g., ERIC), hand-searching, contacting experts, and checking citations (40). These additional manual searches will be performed throughout the search period, and reference lists of earlier reviews will be screened for additional eligible articles (SH).
Identification of eligible publications
Duplicate records will initially be removed by KIB (41). All remaining records will be screened in two stages. First, titles and abstracts will be screened independently by two researchers (SH, BH) using the Rayyan software (42) according to the inclusion and exclusion criteria, and obviously irrelevant studies will be removed. Second, full-text versions of the included abstracts will be obtained and screened in further detail by the same two researchers (SH, BH) independently. Additionally, 10% of abstracts and full texts in the two screening stages will be cross-checked by a third researcher (ÅN). If a decision cannot be made regarding a full-text article's eligibility, all three researchers will discuss the issue until consensus is reached. The screening process will be described in detail in text and presented in a PRISMA flow chart of study selection.
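As an illustration of the 10% cross-check step, a reproducible random sample of screened records could be drawn as follows (a minimal sketch; the function name, seed, and record identifiers are hypothetical and not part of the protocol):

```python
import random

def draw_cross_check_sample(record_ids, fraction=0.10, seed=2021):
    """Draw a reproducible random subset of screened records
    for verification by a third reviewer."""
    rng = random.Random(seed)  # fixed seed keeps the sample auditable
    n = max(1, round(len(record_ids) * fraction))
    return sorted(rng.sample(record_ids, n))

# Example: 200 screened records -> 20 sampled for cross-checking
sample = draw_cross_check_sample(list(range(1, 201)))
print(len(sample))  # 20
```

Fixing the seed means the same subset can be re-drawn later, which keeps the cross-check auditable.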
Once measurements are mapped to CFIR's domains, the PAPERS criteria, and measurement equivalence (invariance), we will analyze each measurement. The final set of included publications will be carefully evaluated for the presence of CFIR-related constructs and items and for their psychometric and pragmatic properties.
Extraction of data from eligible publications
Data will be extracted and compiled by three researchers (SH, BH, ÅN) through a systematic process in accordance with a project codebook developed to cover study characteristics, the PAPERS criteria, measurement equivalence (invariance), and the CFIR domains. To establish a shared pre-understanding of the topic, the team will read articles on related measure evaluation systems (e.g., COSMIN (43)) and psychometric properties (24, 25, 35–37), implementation science reviews (20, 22, 27), and the original work on the CFIR framework (17). As a first stage, the team members will independently extract data according to the project codebook from two included articles. Similarities and differences in coding will then be discussed until consensus is reached, to make the data extraction and coding process as consistent as possible from the outset.
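The calibration step above can be illustrated with a simple pairwise percent-agreement calculation between two coders on the pilot articles (a hypothetical sketch; the coder names and field values are invented for illustration and are not prescribed by the protocol):

```python
def percent_agreement(codes_a, codes_b):
    """Share of codebook fields on which two coders agree."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Coders must rate the same set of fields")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

# Two coders rating six codebook fields for one pilot article
coder_1 = ["inner setting", "process", "adequate", "good", "yes", "CFIR"]
coder_2 = ["inner setting", "outer setting", "adequate", "good", "yes", "CFIR"]
print(round(percent_agreement(coder_1, coder_2), 2))  # 0.83
```

Fields where the coders disagree (here, the second field) would then be flagged for the consensus discussion.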
Study characteristics
For each study, characteristics such as country, setting, sample and study population, characteristics of the innovation being assessed, and guiding theoretical frameworks will be extracted, synthesized, and reported.
CFIR coding
Factors influencing implementation that are assessed in each measurement will be coded according to CFIR's five domains and 39 constructs, using a project codebook developed from the original work of Damschroder and colleagues (17) and the data analysis tools available online (44). The coding will assess the items, constructs, and domains of each instrument and evaluate how they align with the CFIR domains and constructs. We have chosen this three-level coding approach because items, constructs, and domains are operationalized with considerable heterogeneity across disciplines and researchers in measurement development and adaptation (29). The CFIR's five domains and 39 constructs are presented in Table 2.
Psychometric and pragmatic coding
We will apply the PAPERS scale (36) and measurement equivalence (invariance) as defined by Putnick and Bornstein (37), summarized in Table 3. The rating scale includes five pragmatic measurement characteristics that reflect the ease or difficulty of use, and nine psychometric measurement characteristics that assess reliability and validity. All properties of the PAPERS scale are rated on six levels with predefined values: poor (-1), none (0), minimal/emerging (1), adequate (2), good (3), or excellent (4). Additionally, we have chosen to include a tenth psychometric property, invariance, reflecting measurement equivalence, because invariance is a prerequisite for comparing group means; it is most commonly tested through structural equation modelling using confirmatory factor analysis (37). Invariance will be assessed descriptively and not rated against any scale.
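As an illustration of how the predefined PAPERS values could be aggregated per measure (a hypothetical sketch: the property names and ratings below are invented, and summing across properties is our own illustration rather than a requirement of the scale):

```python
def total_papers_score(property_ratings):
    """Sum the predefined PAPERS values (poor = -1 up through excellent)
    assigned to each rated property of a measure."""
    return sum(property_ratings.values())

# Hypothetical ratings for one measure's psychometric properties,
# already mapped to their predefined PAPERS values
ratings = {
    "internal consistency": 3,   # good
    "convergent validity": 2,    # adequate
    "structural validity": 1,    # minimal/emerging
    "responsiveness": 0,         # none
}
print(total_papers_score(ratings))  # 6
```

Because "poor" carries a negative value, weak evidence lowers a measure's total rather than merely failing to raise it.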
Analysis and synthesis
The results will be reported through both narrative descriptions and descriptive statistics, using proportions and frequencies of the psychometric and pragmatic properties and of the CFIR domains and constructs.
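The descriptive part of the synthesis can be sketched as a simple frequency-and-proportion tabulation of coded CFIR domains across the included measures (a minimal illustration; the domain labels and counts below are hypothetical):

```python
from collections import Counter

def frequencies_and_proportions(coded_domains):
    """Frequency and proportion of each coded CFIR domain."""
    counts = Counter(coded_domains)
    total = sum(counts.values())
    return {domain: (n, n / total) for domain, n in counts.items()}

# Hypothetical domain codes across ten measure items/constructs
coded = ["inner setting"] * 5 + ["process"] * 3 + ["outer setting"] * 2
for domain, (n, p) in frequencies_and_proportions(coded).items():
    print(f"{domain}: n={n}, {p:.0%}")
```

The same tabulation would apply to the psychometric and pragmatic property ratings.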