Eligibility criteria for this review align with the PICO-T framework, which defines populations, interventions, comparators, outcomes of interest, and types of studies eligible for review (17).
Types of studies
To allow for examination of behavior before and after intervention exposure, we will restrict our review to ex-ante randomized and nonrandomized controlled trials, controlled before-and-after studies, interrupted time series studies, uncontrolled before-and-after studies, and case series (uncontrolled longitudinal studies). We will harvest data from literature written in English and published in peer-reviewed journals prior to January 1, 2020.
Types of populations
Due to the nature and scope of this review, we will not limit study inclusion based on any specific population characteristics. To determine the magnitude and direction of the effect of descriptive information regarding others’ behaviors on an individual’s own behavior, we will perform subgroup analyses. These analyses will examine intervention effects based on individuals’ behaviors prior to intervention exposure.
Types of interventions
Interventions of interest for this systematic review include health and environmental sustainability interventions that employ descriptive norms messaging, where exposed groups receive descriptive information regarding others’ behaviors. We will not limit inclusion based on the duration, frequency, or method of intervention exposure.
Types of comparators
We will not limit inclusion to studies that make comparisons between specific interventions and counterfactuals (e.g., alternative interventions or no interventions). However, for literature in which program implementers or researchers employ multiple study arms, we will make note of whether the intervention of interest was compared to an alternative intervention or a control group that was not exposed to any intervention.
Types of outcome measures
The main outcomes of interest for this review are the desirable health and environmental sustainability behaviors promoted by eligible descriptive norms messaging interventions. Such measures may include quantitative or qualitative prevalence of the desirable behavior, or the degree or frequency of performance of the desirable behavior. We are interested in exploring the effects of descriptive norms messaging interventions on behavior change and potential boomerang effects resulting from descriptive norms messaging interventions. As such, we will restrict studies to those that employed descriptive norms messaging and captured pre-intervention and post-intervention measures of the behaviors targeted by the intervention. We will not restrict study inclusion based on the date of intervention implementation or follow-up duration (i.e., duration of time from pre-intervention to post-intervention measures).
We will exclude studies if the description of study methods is so incomplete or ambiguous that reviewers cannot determine whether the study meets all inclusion criteria. If authors fail to report sufficient statistics or data for estimating changes in behavior before and after intervention exposure, we will include the related studies in the review and thematic synthesis, but not in the meta-analyses (19). Exclusion criteria are subject to change during screening and selection. Any changes will be reported in the final review paper and applied retroactively to all previously screened studies.
Information sources and search strategy
We will use search terms and Boolean operators to identify articles that may be eligible for review. We will systematically search the following electronic databases: Cochrane Library, Campbell Library, EMBASE, ProQuest, PsycINFO, PubMed, Social Science Research Network (SSRN), Scopus, and Web of Science. For each database, we will first employ keyword searches (e.g., descriptive norms, health behaviors), then apply MeSH terms (if applicable), text words, and phrases associated with the keywords. We will then generate preliminary search results by combining all keyword search sets with appropriate Boolean operators. Literature generated from the searches will be exported into reference management software (EndNote), and duplicates will be removed prior to screening. Our detailed search strategy for MEDLINE is described in Additional file 2. As electronic databases have their own respective search platforms that operationalize search strategies, we will modify our search procedures as needed for each database. We will report the full database search process, including all iterations used to search all targeted databases, when our searches are complete.
Study screening and selection
We will perform screening and selection in a stepwise manner. First, two reviewers will independently screen the titles and abstracts of papers identified during the database search, using our pre-established inclusion and exclusion criteria to identify eligible studies. The reviewers will exclude studies whose titles and abstracts a) meet any exclusion criterion, or b) clearly fail to meet all inclusion criteria. If the reviewers cannot make a definitive decision based on the title and abstract, they will independently review the full text of the paper to determine eligibility. The two reviewers will then independently review the full text of all papers deemed eligible upon title and abstract review, so that studies included in the review meet all inclusion criteria and no exclusion criteria.
If disagreements arise between reviewers during the study screening and selection process, they will be resolved by discussion with a third reviewer. If disagreements arise due to a lack of information, we will contact the primary study authors for clarification. We will record and report disagreements and their resolutions. We will use Cohen’s Kappa to assess inter-rater reliability between reviewers (18).
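The inter-rater reliability computation can be made concrete with a short sketch. This is not part of the review workflow itself; the function below simply shows, in Python and with hypothetical include/exclude labels, how Cohen's kappa is derived from two reviewers' screening decisions (observed agreement corrected for chance agreement).

```python
# Illustrative computation of Cohen's kappa for two reviewers'
# screening decisions. Labels ("in"/"out") are hypothetical.
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters over the same set of items."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)

reviewer_1 = ["in", "in", "out", "out", "in", "out"]
reviewer_2 = ["in", "out", "out", "out", "in", "out"]
kappa = cohens_kappa(reviewer_1, reviewer_2)  # ≈ 0.667
```

By convention, kappa values above roughly 0.6 are often read as substantial agreement, though the review's own threshold would be stated when reporting.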
Data extraction and coding
Two reviewers will independently extract relevant data from all studies included in the review using a standardized data extraction form (see Additional file 3). We will pilot test the data extraction form using two randomly selected articles and refine it as needed. If relevant data are unclear or unreported, we will contact authors for clarification. We will harvest data about study populations (including age and gender distributions) and settings (including country and region), details related to study design and interventions (including intervention details, such as descriptive norms message design and any non-descriptive norms co-interventions, and control details, if any), and statistical methods. We will extract data on behavioral measures both pre- and post-intervention exposure. For dichotomous outcomes, we will extract the number of participants who practice desirable and undesirable behaviors and ratio measures with standard errors, if available. For count outcomes, we will extract the number of episodes in which the desirable or undesirable behaviors were practiced (e.g., episodes of excessive alcohol consumption per week in each study arm), along with the total person-time in each study arm, the rate ratio, and standard errors, if available.
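When a study reports raw dichotomous counts but not a ratio measure or its standard error, these can be derived with the standard large-sample formula. The Python sketch below is illustrative only (it is not the review's extraction tooling) and shows the log risk ratio with its approximate standard error.

```python
# Illustrative derivation of a log risk ratio and its standard
# error from 2x2 counts (standard large-sample approximation).
import math

def log_risk_ratio(events_t, total_t, events_c, total_c):
    """Return (log risk ratio, SE) for treatment vs. control arms."""
    risk_ratio = (events_t / total_t) / (events_c / total_c)
    # SE of log(RR) under the usual delta-method approximation.
    se = math.sqrt(1/events_t - 1/total_t + 1/events_c - 1/total_c)
    return math.log(risk_ratio), se

# Hypothetical example: 30/100 desirable behaviors vs. 15/100.
lrr, se = log_risk_ratio(30, 100, 15, 100)  # lrr = log(2) ≈ 0.693
```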
We will extract statistical information from included studies based on the type of data collected, the type of statistical significance tests performed, the effect size measures used, and the types of statistical information reported. If a study specifically reports data related to individuals who out-perform the descriptive norms messaging information in the pre-treatment measure, we will extract the pre- and post-treatment sample sizes. If a study reports the effect size d, it will be entered directly; if a study does not report effect size data for the groups of interest but the authors report sufficient information, we will calculate the estimated effect size d ourselves. If a study meets inclusion criteria but does not report sufficient data to calculate an effect size, we will contact the corresponding author to request the information or dataset and perform the effect size calculation ourselves. We will code effect sizes as positive if the performance of the targeted behavior changed in the desired direction after intervention exposure, and negative if it changed counter to the desired direction. If the authors indicate only that the effect was "non-significant", we will enter a zero effect size and a p-value of 0.50 (20,21).
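When means and standard deviations are reported but d is not, the standardized mean difference can be computed and signed as described above. A minimal sketch, assuming a pooled-SD formulation (the review may prefer a different estimator, e.g., one with Hedges' small-sample correction):

```python
# Illustrative standardized mean difference (Cohen's d) between
# pre- and post-intervention measures, signed so that positive
# values indicate change in the desired direction.
import math

def cohen_d(mean_pre, sd_pre, n_pre, mean_post, sd_post, n_post,
            increase_is_desirable=True):
    pooled_sd = math.sqrt(((n_pre - 1) * sd_pre**2 +
                           (n_post - 1) * sd_post**2) /
                          (n_pre + n_post - 2))
    d = (mean_post - mean_pre) / pooled_sd
    # Flip the sign when the desired change is a decrease
    # (e.g., reduced alcohol consumption).
    return d if increase_is_desirable else -d

# Hypothetical example: vegetable servings rise from 10 to 12.
d = cohen_d(10, 2, 50, 12, 2, 50)  # d = 1.0
```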
We are aware that the level of intervention and analysis will likely vary across studies. Some studies likely focused on individual behavioral change or maintenance while others likely addressed household level behavioral change or maintenance. As a result, we will calculate and present effect estimates based on the level at which the descriptive norms interventions were administered (e.g., individual, household, community).
Risk of bias assessments
Two reviewers will independently assess the risk of bias for each included study using Cochrane's Risk of Bias assessment tool (22). This tool considers six quality domains within each study: selection bias (random sequence generation, allocation concealment), performance bias (blinding of participants/personnel), detection bias (blinding of outcome assessment), attrition bias (incomplete outcome data), reporting bias (selective reporting), and other bias (other pre-specified, unique sources). To reduce the possibility of bias in assessing quality, author names and affiliations may be removed from reports before they are evaluated.
Quality of evidence appraisal
We will use the Liverpool Quality Assessment Tool (LQAT) to critically assess the quality of evidence included in the review (23). We will use the following criteria to assess the quality of evidence presented in the review: a) selection procedures, b) baseline assessment, c) outcome assessment, d) analysis/confounding, and e) contribution of evidence towards the review questions. Each criterion will be rated as strong, moderate, or weak.
Data synthesis and analysis
We will conduct our analyses in two phases, which we summarize here and detail further in subsequent sub-sections. First, we will conduct qualitative analyses to codify intervention themes, examine descriptive norms messaging structure and content, and produce a narrative synthesis of studies included in the review. Then, we will examine overall trends in behaviors before and after intervention exposure, and the heterogeneity of results presented in included studies. We will carry out meta-analyses if it is appropriate to do so based on the heterogeneity of studies, targeted behaviors, levels of intervention, and behavioral measures. We will also assess publication bias and perform moderator analyses. We will perform our meta-analyses in RStudio (Version 1.1.463) using the "metafor" package (24). Our pre-defined type I error threshold is alpha = 0.05 on two-tailed tests.
Given the scope of this systematic review and the descriptive nature of the intervention messages (i.e., providing statistical information of others’ behaviors), we anticipate heterogeneity amongst the included studies. Regardless of whether we are able to perform meta-analyses, we will tabulate the characteristics of the included studies and conduct a thematic analysis following the procedure suggested by Thomas & Harden (25). During our qualitative analyses phase, we will code the intervention messages, organize them into descriptive themes, use those themes to investigate the types of intervention messages that are effective at changing behavior in the desirable direction, and identify factors that may moderate the results (e.g., targeted populations, behaviors of interest). We will report results by intervention domain (e.g., energy conservation, vegetable consumption, alcohol consumption) and separately for subgroups based on whether individuals practiced desirable behaviors prior to intervention exposure.
Given the scope of our research questions and the anticipated heterogeneity of studies emerging from various domains, we assume studies included in the review will not estimate the same underlying population parameters. Therefore, we will use random effects models, and employ the restricted maximum likelihood estimation method (26).
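The random-effects logic can be made concrete with a sketch. The review itself plans restricted maximum likelihood estimation (available, e.g., through R's metafor package); the simpler method-of-moments (DerSimonian-Laird) estimator below is shown in Python only to illustrate how the between-study variance component tau-squared is obtained.

```python
# Illustrative DerSimonian-Laird (method-of-moments) estimate of
# the between-study variance tau^2. The review plans REML instead;
# this estimator is shown only to make the random-effects idea
# concrete.
def dersimonian_laird_tau2(effects, variances):
    w = [1 / v for v in variances]          # fixed-effect weights
    k = len(effects)
    mean_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q around the fixed-effect mean.
    q = sum(wi * (yi - mean_fe)**2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    return max(0.0, (q - (k - 1)) / c)      # truncate at zero

# Hypothetical effect sizes and sampling variances.
tau2 = dersimonian_laird_tau2([0.0, 1.0], [0.1, 0.1])  # 0.4
```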
Effect size weighting
To compute the weighted mean of the effect sizes, we will weight each study by the inverse of its associated fixed-effects and/or random-effects variance (27). This weighting has been shown to outperform weighting by sample size in random-effects analyses (28).
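Inverse-variance weighting can be sketched as follows; with a between-study variance of zero this reduces to fixed-effect weighting, while a positive tau-squared yields random-effects weights. The values below are hypothetical.

```python
# Illustrative inverse-variance weighted mean effect size.
def inverse_variance_mean(effects, variances, tau2=0.0):
    """Weighted mean and its standard error. tau2 = 0 gives
    fixed-effect weights; tau2 > 0 gives random-effects weights."""
    w = [1 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se = (1 / sum(w)) ** 0.5
    return mean, se

# Hypothetical example: two effect sizes with equal variances.
mean, se = inverse_variance_mean([0.2, 0.6], [0.04, 0.04])  # mean = 0.4
```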
Test for heterogeneity of effect sizes
To test for heterogeneity of the effect sizes, we will use the homogeneity statistic (Q), which follows a chi-squared distribution with degrees of freedom equal to the total number of effect sizes minus one (k − 1) (27,28). We will also use the I² statistic as a second measure of heterogeneity; I² is more useful for comparisons across meta-analyses and is less dependent on the number of synthesized effects (27,29).
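The Q and I² computations can be sketched as below (illustrative Python with hypothetical values; the review's analyses would run through the R tooling described earlier).

```python
# Illustrative Cochran's Q and I^2 for a set of effect sizes.
def heterogeneity(effects, variances):
    """Return (Q, I^2), where I^2 is the percentage of total
    variation attributable to between-study heterogeneity."""
    w = [1 / v for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mean)**2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, truncated at zero, expressed as a percent.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

q, i2 = heterogeneity([0.0, 1.0], [0.1, 0.1])  # Q = 5, I^2 = 80%
```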
Outlying effect sizes treatment
Outlying effect sizes are defined as effect sizes more than three standard deviations above or below the mean. Following the winsorization method suggested by Dixon and Tukey, we will minimize the influence of any outlying effect sizes by replacing them with the next most extreme (non-outlying) values (30,31).
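The winsorization rule above can be sketched as follows. This is a simple single-pass illustration: it flags values beyond three standard deviations of the mean and clamps them to the most extreme non-outlying value, without re-estimating the mean and SD after replacement.

```python
# Illustrative winsorization of outlying effect sizes (> n_sd
# standard deviations from the mean), replaced with the next
# most extreme non-outlying value.
def winsorize_outliers(effects, n_sd=3.0):
    n = len(effects)
    mean = sum(effects) / n
    sd = (sum((e - mean)**2 for e in effects) / (n - 1)) ** 0.5
    lo, hi = mean - n_sd * sd, mean + n_sd * sd
    inside = [e for e in effects if lo <= e <= hi]
    # Clamp every value into the range of non-outlying values.
    return [min(max(e, min(inside)), max(inside)) for e in effects]
```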
Publication bias
We will address publication bias using the following three strategies. First, we will address the "file drawer problem" by computing the fail-safe number for individual effect sizes with Rosenberg's calculator (33). Second, we will present funnel plots, with the y-axes representing study precision (i.e., the error of the intervention effect estimates on a reversed scale) and the x-axes representing intervention effect size estimates (i.e., standardized mean differences), as suggested by Sterne and Egger (34). To statistically test and adjust for publication bias with funnel plots, we will use the trim-and-fill method (35). Third, we will examine the Hedges' d values of individual effect sizes in a normal-quantile plot. If the effect sizes come from a normal distribution, the data points (effect sizes of individual studies) will rest near the diagonal (X = Y); deviation will suggest publication bias (36).
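The "file drawer" logic can be illustrated with the classic (Rosenthal) fail-safe N. Note that the review will use Rosenberg's weighted variant via his calculator; the unweighted Python version below is only an illustration of the underlying idea, namely how many unpublished null-result studies would be needed to render the combined result non-significant.

```python
# Illustrative classic (Rosenthal) fail-safe N. The review itself
# will use Rosenberg's weighted method; this unweighted version
# only demonstrates the "file drawer" reasoning.
import math

def rosenthal_failsafe_n(z_values, z_alpha=1.645):
    """Number of zero-effect studies needed to push the Stouffer
    combined z below z_alpha (1.645 = one-tailed alpha of .05)."""
    k = len(z_values)
    z_sum = sum(z_values)
    # Solve z_sum / sqrt(k + N) = z_alpha for N, truncating at 0.
    return max(0, math.floor((z_sum / z_alpha)**2 - k))

# Hypothetical example: three studies, each with z = 2.0.
n_fs = rosenthal_failsafe_n([2.0, 2.0, 2.0])  # 10 null studies
```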
Moderator analyses
We will analyze intervention effects on subgroups of individuals who exhibited different patterns of desirable behaviors prior to intervention exposure. We will perform moderator analyses to examine whether the variance of study-specific treatment effects is systematically associated with population characteristics. Moderators will be informed and refined based on findings from our thematic syntheses. We will perform post-hoc power determination prior to the moderator analyses to aid in the interpretation of results (27).