A multi-country meta-analysis on the role of behavioural change in reducing energy consumption and CO2 emissions in residential buildings

Despite the importance of evaluating all mitigation options to inform policy decisions addressing climate change, a comprehensive analysis of household-scale interventions and their emissions reduction potential is missing. Here, we address this gap for interventions aimed at changing individual households’ use of existing equipment, such as monetary incentives or feedback. We have performed a machine learning-assisted systematic review and meta-analysis to comparatively assess the effectiveness of these interventions in reducing energy demand in residential buildings. We extracted 360 individual effect sizes from 122 studies representing trials in 25 countries. Our meta-regression confirms that both monetary and non-monetary interventions reduce the energy consumption of households, but monetary incentives, of the sizes reported in the literature, tend to show on average a more pronounced effect. Deploying the right combinations of interventions increases the overall effectiveness. We have estimated a global carbon emissions reduction potential of 0.35 GtCO2 yr−1, although deploying the most effective packages of interventions could result in greater reduction. While modest, this potential should be viewed in conjunction with the need for de-risking mitigation pathways with energy-demand reductions. Behavioural interventions can reduce energy consumption and hence carbon emissions among households. Khanna et al. compare the effectiveness of different types of monetary and non-monetary household interventions using a machine learning-assisted meta-analysis, and examine the situations where each is most useful.

Finding low-energy demand pathways is necessary to hedge against the risks involved in decarbonizing energy supply and is key to finding socially acceptable ways to meet the Paris climate goals [1][2][3][4]. Energy demand from buildings was responsible for 28% of global energy-related CO2 emissions in 2019, when indirect emissions from upstream power generation are considered. After plateauing between 2013 and 2016, buildings-related CO2 emissions increased to an all-time high of 10 GtCO2 in 2019 (ref. 5), with residential buildings accounting for almost 60% of these emissions. This recent growth in emissions was driven by an increased demand for building energy services that has outpaced energy efficiency and decarbonization efforts 6. Besides technologies and architecture, behaviour, lifestyle and culture have a major effect on the energy demand from buildings, with three- to fivefold difference in energy use for the provision of similar building-related energy service levels 7. Hence, addressing current growth trends in energy demand from residential buildings is one important component for guiding developments towards responsible, low-energy pathways in climate change mitigation. But a lack of systematic efforts to quantify demand-side solutions in general, and interventions in household energy demand specifically, in a way amenable to climate change assessments such as those by the Intergovernmental Panel on Climate Change (IPCC), has led to a bias towards riskier supply-side solutions 1,8,9. There is a rich and diverse literature available on demand-side solutions 1,10. Since the oil price shock in the 1970s, interventions to reduce energy use in buildings and appliance use have been researched extensively 11. Experiments that use monetary incentives to reduce consumption have been trialled widely, even more so since the introduction of smart metering at scale over the last decade 12.
Evidence has accumulated on the use of other behavioural interventions, which encompass a range of initiatives that may, either by themselves or in conjunction with the more typical policy tools (for example, infrastructure and incentives), achieve greater energy consumption reductions than have been achieved by the typical tools alone 13 . Despite this vast pool of evidence that can be employed for policy-making, little is known about the global carbon emissions reduction potential of such interventions.
In this study we have addressed this gap through an interdisciplinary meta-analysis of interventions that foster behavioural change for reducing energy consumption by households, excluding incentives for equipment adoption and structural changes 14. Previous meta-analyses tended to be disciplinary or focused on subsets of interventions: Faruqui et al. 15 on pricing interventions, Karlin et al. 16 on feedback and Abrahamse et al. 17 and Andor and Fels 18 on social comparison, commitment devices, goal setting and labelling. Nisa et al. 19 considered evidence from a wider range of household behaviours relevant to climate change mitigation, but did not review interventions in energy consumption exhaustively. The meta-analysis by Delmas et al. 20 broke new ground but was based on a narrower literature search and did not include studies published after 2012, which constitute about half of our sample. In this paper we provide a comprehensive, up-to-date meta-analysis that critically assesses the energy savings potential of interventions aimed at household energy consumption, as well as their implications for carbon emissions reduction, to inform upcoming climate change assessments.
Our findings extend previous meta-analytic literature in important ways. Our sample of 122 relevant primary studies is twice as large as those of previous analyses and is geographically diverse. Based on these studies, we have found a statistically robust, medium-sized average effect of interventions targeted at behavioural change in energy consumption in residential buildings. We also evaluated the effectiveness of subsets and combinations of interventions. Our findings support the idea that behavioural interventions aimed at household energy consumption should not be looked at only individually, but rather as packages to increase effectiveness. Smart packaging can ensure that the overall effect of a portfolio of well-integrated interventions is larger than the separate effects when interventions are applied in isolation. We also translated the evidence on interventions in household energy consumption into estimates of CO2 reduction potential. We found that the interventions studied deliver on average a modest reduction of 0.35 GtCO2 yr−1 in global carbon emissions from residential buildings, although deploying the most effective packages of interventions and including other interventions that target maintenance and the replacement or upgrade of heating and non-heating equipment could result in greater reduction.

Interventions in household energy consumption are diverse
We performed a systematic review and meta-analysis of the literature (see Methods) on interventions in residential energy demand to reduce energy use by existing equipment. These interventions included monetary incentives that offer households a tangible financial reward for reducing energy consumption, and other behavioural interventions, such as nudging, appealing to norms, providing easily interpretable and credible information at the point of decision making and improving the skills required to perform or forego behaviours 13 . Following previous studies 16,18,20 , we classified the studied interventions into five categories, namely monetary incentives, information, feedback, social norms and motivation interventions (Fig. 1). We note that our focus in this study was on demand reductions through behavioural change that is mainly associated with the use and, to an extent, maintenance of existing equipment, and we did not specifically look at interventions that target household actions relating to the replacement or upgrade of heating and non-heating equipment, or structural changes in buildings 14 .
We identified and coded a total of 122 relevant studies across disciplines and geographies. This is twice the number of studies included in previous meta-analyses (Supplementary Table 3). We extracted 360 effect sizes from these studies, or an average of about three effects per study. Evidence is scattered unevenly across interventions. For example, more than half the studies in our sample involved forms of feedback, and only about one-fifth involved a motivation experiment (Table 1). Moreover, several of the studies evaluated not only the effects of individual interventions but also combinations of two or more (see Supplementary Table 5 for descriptive statistics). This has allowed us to assess, across a large body of evidence, to what extent these packages of interventions are synergistic as proposed in the literature 16,21,22.
Overall, our final sample represents research on a total of 1.1 million households across 25 countries. About half of the effect sizes in the sample come from studies in economics or business, [...] Table 4).
Intervention categories and descriptions:
Monetary incentives: Time-of-use pricing aligns the prices faced by households with the underlying cost of supply, which is higher during peak demand periods 43. Other interventions reward consumers for reducing peak-period consumption. Households are expected to reduce consumption as long as the financial savings from reduced consumption outweigh the costs of shifting or reducing consumption 21.
Information: These policies focus on promoting energy-saving behaviour by reducing the information deficit faced by households with activities and actions that can help reduce energy consumption 17. The information provided may be general advice like energy-saving tips and practices through workshops 44 and mass media campaigns 45 or tailored advice in the form of home audits 46.
Social norms: [...] 20,48. Norm-based communication has been widely adopted by utilities in the form of home energy reports 49, which seem to be effective in some cases even years after households received their initial reports 50.
Motivation (commitment devices, goal setting, gamification): Social pressure has also been employed in the form of public pledges or commitments by households to practice energy-conserving behaviours 17. Goal-setting interventions in which households commit to reducing energy consumption by a certain percentage over the course of the experiment are other commitment devices 18. Some recent experiments have used web-based gamified platforms or mobile apps to induce behavioural change.
The studies in our sample reported effects in terms of the relative change in energy consumption, but the exact dependent variable and statistical technique employed varied across studies. To estimate the aggregate effect size, we first standardized the effects by converting the estimates of energy reduction reported for each study to Fisher's Z (ref. 23 ), and then used meta-analysis models to calculate the aggregate effect across studies (see Methods).

What interventions work best
Our analysis reveals a medium average effect size across all interventions. The estimated average effect (in terms of Fisher's Z) varies between 0.10 and 0.15, depending on the estimation model used, and is both statistically significant and substantive across model specifications (Table 2). The average effect size is 0.10 in a random effects model with the DerSimonian-Laird (DL) estimator and 0.15 in a random effects model with the restricted maximum likelihood (REML) estimator. The REML estimator is recommended when heterogeneity is large, as is the case in our sample 24. We also estimated a multilevel model to account for the dependence between effect sizes coming from the same study. This also gave an average effect size of 0.15, although with a slightly larger confidence interval. An average effect size of 0.10 can be considered small at the level of a single household, although relevant if scaled up, whereas an average effect size of 0.15 indicates a medium effect and is considered consequential both at the level of a single household and cumulatively over many households 25,26. Removing studies that used pre-post analysis and including only those with control-treatment or difference-in-difference (DID) study designs reduced the average effect size from 0.15 to 0.13. The average effect size of only the studies that employed randomization is 0.14 (Table 2). Our main results are robust to influential study analysis and variance matrix specification (see Methods). Note, however, that these results may be subject to publication bias, which has been reported earlier in this literature 19,20. Both the visual inspection of the funnel plot and Egger's test hint at the potential presence of small-study bias in our dataset as well (Supplementary Note 1).
The analysis shows that many interventions are complementary in that the effect of a combination of the interventions is higher than the effect of individual interventions (Fig. 2b). For studies that combined motivation, feedback and monetary incentives, the effect (0.48; 95% CI = (0.17, 0.79)) is additive, and is higher than the sum of the individual effect sizes (0.43). For other combinations, the interventions are complementary, but are not strictly additive, and the combined effect is lower than the sum of individual effects. For studies that combined feedback, social comparison and monetary interventions, the effect size (0.35; 95% CI = (0.12, 0.58)) is higher than the effect size for feedback, social comparison and monetary interventions studied individually, although it is lower than the sum of the individual effect sizes (0.43). Similarly, studies that combined information and feedback (0.17; 95% CI = (0.09, 0.25)) and studies that combined information and social comparison (0.18; 95% CI = (0.08, 0.28)) find an effect that is higher than the individual interventions.
In other cases, the effect size of the combination is about the same as individual effects, indicating little gain from combining them. Overall, these results support the idea that interventions in household energy consumption should not be looked at only individually, but rather as packages to increase effectiveness 13, although there might be trade-offs in combining them.

Explaining heterogeneity in effect sizes
The meta-analysis models used to estimate the aggregate treatment effects indicate a high degree of heterogeneity in effect sizes across studies (I² is 94.12 for the DL model and 99.74 for the REML model). To understand what drives effect size heterogeneity, we performed a meta-regression controlling for a range of study characteristics, including region and time of study, study design and study level controls (Table 3).
The impact of household interventions varies across regions and countries. Our results show that compared with the studies from the United States, studies from Asia involving monetary incentives (Table 3) report a higher effect size that is also statistically significant. While the underlying cause of this heterogeneity is not clear, it could be on account of differences in, for example, the size of the monetary incentive, energy costs and the use of electricity for heating and cooling, and this needs further research. Studies from the United Kingdom report lower effect size for motivation, monetary incentives and feedback studies compared with the United States. The average effect size reported by studies from continental Europe as compared with the United States is higher for monetary incentives but smaller for social comparison (as has also been reported in some primary studies 27 ), although the coefficients are not statistically significant in either case.
The design of studies also impacts the effect sizes reported by them. The primary studies in our dataset either compared the electricity consumption of the households before and after an intervention (pre-post), or across control-treatment groups, or both before and after intervention and across treatment groups (DID). The control-treatment and DID studies report lower average effect size. The coefficients of the moderator variables are statistically significant and negative when all interventions are considered together, and for all subsets of interventions except motivation interventions. We also found differences in the reported outcomes according to the statistical methods employed in the underlying studies. Studies that employed difference of means instead of regression designs (ordinary least squares (OLS) or household/time fixed effects panel) show higher effect sizes on average. Because electricity demand and prices are correlated with weather conditions, it could be an important confounding factor, especially in time-of-use pricing interventions. Indeed, the coefficient of the weather variable is negative and statistically significant for monetary interventions, indicating the need to control for weather in primary studies. We did not find a statistically significant impact of other control variables such as household size and demographics.
Household selection also impacts study outcomes. With monetary incentives, which have the largest effect size, the average effect is lower in studies in which households were not required to opt into the experiment. The coefficient of the opted-in moderator variable is positive and statistically significant. There are no statistically significant differences in the results between studies that employed randomization and studies that did not, except in case of feedback interventions.
Our results show that studies with longer treatment duration tend to find smaller effects on average for feedback and monetary incentives, although the coefficient is quite small. However, long-term studies are scarce: the mean (median) treatment duration in our sample was only 21.5 (12) weeks, indicating the need for long-term trials. Our meta-regression indicates that the average effect reported by newer studies is lower, although the trend is statistically significant only for feedback studies. Even though we controlled for quality and study design, there could be further differences in the study design over time that we are unable to capture.
The five-category classification of interventions shown in [...] household behaviour as pricing, but the effect is not statistically significant. For the category of interventions tagged as feedback, we introduced a moderator variable for studies that deployed in-home displays. A negative coefficient, but one that is small and not statistically significant, indicates that such devices may not necessarily be a better means of providing feedback. It should, however, be noted that in-home displays come with different functionalities and their impact might vary, something that we are not able to capture here. For motivation interventions, we introduced moderator variables for studies that employed gamification or commitment devices in conjunction with or instead of simple goal setting. We found that these subcategories of interventions report higher effect sizes on average. While the number of studies using such experiments is small (11 and 4 effect sizes for gamification and commitment devices, respectively), it points to a promising new area of research.

Emissions reductions from behavioural interventions
To address the lack of synthetic evidence on demand-side solutions 1 , we extended our meta-analysis to provide an initial estimate of the carbon emissions mitigation potential of the studied interventions for climate change assessments. We did this by using the percentage reduction in electricity consumption as the dependent variable in our meta-analytical models along with the aggregate emissions of residential buildings (see Methods). Interventions aimed at changing household behaviour on average deliver a reduction of 0.35 GtCO 2 yr −1 (95% CI = (0.17, 0.43)) in global carbon emissions from residential buildings. While the estimated emissions reductions are relevant in size, they are very modest compared with the approximately 5.6 GtCO 2 of emissions from residential buildings in 2018. Cumulatively, the emissions reductions will add up to 1.05-1.75 GtCO 2 if these effects persist over 3-5 years. We do not consider reductions over a longer period because of doubts over the persistence of these effects. The reductions in emissions could be higher if interactions between the various interventions highlighted in the previous sections, between injunctive and descriptive norms 28 , and between social norms, behavioural interventions and infrastructure provisions 29 are considered while devising policies. The cost effectiveness of a basket of interventions should also be assessed by considering the costs of different interventions (monetary incentives, for example, could entail higher infrastructure and regulatory costs). Further, our estimate is based on the current average emissions intensity of electricity grids, but would increase if the reductions in energy demand were to lead to a reduction of generation from coal power plants at the margin, as has been the case in the current COVID-induced demand reductions 30 . 
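The cumulative range quoted above follows from simple arithmetic on the annual estimate. The sketch below reproduces it using only figures stated in the text; the function and variable names are illustrative assumptions, and the share computed at the end is just the ratio of the two quoted figures, not the modelled percentage reduction used in the study.

```python
# Figures quoted in the text (GtCO2). All names here are illustrative assumptions.
ANNUAL_REDUCTION = 0.35            # average annual reduction, GtCO2 per year
RESIDENTIAL_EMISSIONS_2018 = 5.6   # approximate emissions from residential buildings


def cumulative_reduction(annual, years):
    """Total reduction if the annual effect persists unchanged for `years`."""
    return annual * years


def share_of_total(annual, total):
    """Annual reduction expressed as a percentage of total emissions."""
    return 100.0 * annual / total


low = cumulative_reduction(ANNUAL_REDUCTION, 3)
high = cumulative_reduction(ANNUAL_REDUCTION, 5)
share = share_of_total(ANNUAL_REDUCTION, RESIDENTIAL_EMISSIONS_2018)
print(f"{low:.2f}-{high:.2f} GtCO2 over 3-5 years; {share:.2f}% of the 2018 figure")
```

The calculation assumes a constant annual effect, which is why the text restricts it to 3-5 years given the doubts about persistence.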
Our estimate only considers the reduction in energy consumption from household interventions, but not the shift in consumption from peak to non-peak hours, which could reduce electricity consumption during peak carbon emissions hours by up to 10% (ref. 31). Finally, our moderator variable analysis does not find substantial differences in the effectiveness of interventions across regions, and it is reasonable to expect that interventions in energy demand can temper the rapid growth of energy demand in developing countries in Asia and sub-Saharan Africa, leading to higher savings in emissions.
Table notes: The dependent variable is Fisher's Z calculated for each effect size. Z > 0 implies a reduction in energy consumption. The results from the multilevel model are broadly consistent and are given in Supplementary Table 6. ***P < 0.001; **P < 0.01; *P < 0.05.

Discussion
We have performed an interdisciplinary meta-analysis of the effectiveness of interventions in household energy consumption comprising 122 primary studies resulting in 360 effect sizes representing 1.1 million households in 25 countries. Our results reveal on average a medium-sized impact of interventions targeted towards fostering behavioural change in residential energy use among households. The effect is robust across the meta-analytical models and subsets of interventions. For the studies included in this analysis, monetary incentives have the highest average effect size, while motivation and social comparison have a lower average effect size. However, we note that this depends on the size of the monetary incentives. For example, a lower or higher peak to off-peak price ratio can decrease or increase the effectiveness of dynamic pricing 32 . Similarly, increasing or decreasing the time and frequency of feedback can improve or impair the outcomes of feedback interventions 16,17 . The design parameters of the primary studies and location can also have a bearing on the outcomes of interventions 14 . Because the relative effects of interventions may vary with context, the comparison between average effect sizes of different interventions should be interpreted with caution. Our findings support the idea that such interventions should not be looked at only individually but rather as synergistic packages to increase effectiveness. Smart packaging can ensure that the overall effect of a portfolio of well-integrated interventions is larger than the effect of individual interventions. Although some mechanisms relating to how various interventions interact have been suggested in the literature 13,22 , further research is needed to test these hypotheses, to harvest synergies and avoid possible trade-offs while combining interventions.
Beyond the interactions between interventions, we also used meta-regressions to investigate the heterogeneity of effect sizes. The moderator variable analysis points towards possibly lower effects for interventions implemented at scale due to self-selection bias, a concern that has also been noted in primary studies 33 . Our analysis also highlights the need for more long-term trials, using rigorous methodology and controls for contiguous factors. We were unable to assess the persistence of effects after the treatment period, which is critical, especially for non-monetary interventions, but also for monetary incentives to an extent. This is because the studies did not always include follow-up periods and even when they were included, they were not consistent in terms of the energy consumption metric, and the comparator used (follow-up period consumption to treatment period consumption or baseline consumption).
We drew on our meta-analysis to estimate a global emissions reduction potential of 0.35 GtCO2 yr−1 through interventions aimed at behavioural change in the use of existing equipment in residential buildings. This estimate is a first attempt in trying to quantitatively evaluate possible emissions reductions from behavioural interventions based on meta-analytic evidence and needs to be refined further, for example, with regional- and country-specific estimates as the data become available.
Our initial estimate highlights the limited effect of behavioural interventions compared with the emission reductions required on the way to net-zero emissions 8 . However, the interventions considered in this paper do not cover all the interventions targeting household behaviours that are relevant to climate change mitigation. Most importantly, there are a series of interventions that specifically target the replacement or upgrade of heating and non-heating equipment, or structural changes in buildings that could be associated with more substantial emission reductions, but these were not the focus of this study. Previous analysis for the United States suggests stronger impacts of these interventions on emissions and their careful design can also have higher behavioural plasticity 14 .
Comprehensive meta-analysis of the full spectrum of interventions available to building residents would fill an important research gap. The IPCC has started giving more consideration to demand-side solutions and it is important that the scientific community supports this attempt by providing rigorous, gold-standard assessments of the available evidence.

Literature search and data extraction. Our data collection strategy involved [...] (1) We developed a search string that followed the PICOS (population, intervention, comparator, outcome and study design) logic recommended by the Campbell Collaboration (Supplementary Table 1). The search string was developed iteratively by checking the results of the search against a set of studies of known relevance. We searched for articles that dealt with household energy (or electricity) consumption along with one or more of the interventions of interest (see Supplementary Table 2 for the inclusion/exclusion criteria). We only tagged as relevant studies that dealt with the energy consumption of households or dormitories and contained a quantitative estimate of the energy saved through a relevant intervention. We did not include studies that focused on price effects but only referenced load effects (changes in kW and not kWh) or those that only reported the effect on peak consumption and not total consumption. Studies that only provided an effect size but not the associated variance were not included in the final synthesis. In addition, studies in which no obvious comparator group was available (untreated control group or pre-intervention data) or in which the sample size was too small to extract meaningful estimates were excluded from the analysis. Studies that only measured the energy consumption of certain appliances or the direct load control of appliances by utilities were not considered.
We reviewed quantitative studies aimed at behaviours for reducing energy demand mainly by leveraging the use and maintenance of existing equipment in residences and not by the adoption of long-lasting structural changes that reduce carbon emissions, such as energy-efficient appliances, building insulation and switching to green sources of energy, such as installing solar panels or switching to a greener supplier, although three of the included studies involved the provision of information/tips to households for reducing energy consumption through both home improvements and behavioural changes (Supplementary Table 7). Excluding these studies from the analysis did not materially change the results.
Because we did not make any exclusions based on the date, methodology or the field of publication, the searches returned a large number of studies (64,931) after removing duplicates. To enable the screening of relevant papers, we applied a machine learning algorithm using support vector machines 34 to rank the studies in the order of relevance of their abstracts (Supplementary Note 2). A team of four reviewers then manually screened the abstracts of the top 6,023 studies. Full text screening was performed on a selection of 939 studies deemed relevant from this initial screening. The final sample included 122 studies after critical appraisal. The ROSES (RepOrting standards for Systematic Evidence Syntheses) 35 flowchart for screening and coding is available in Supplementary Fig. 1, and the complete list of studies included in the analysis is provided in Supplementary Table 7. Four reviewers extracted the relevant data from these studies using the rules laid out in a codebook. To ensure consistency, a sample of 50 studies was screened at the abstract level (Kappa = 0.77). The reviewers next carried out full text screening and coded the relevant papers from this sample, then discussed the coded fields to see whether any disagreements occurred and finally made suitable adjustments to the codebook. A single reviewer double-checked the final data collected from all the included studies. We used the NACSOS software 36 to manage the search results, remove duplicates, screen records and extract data.
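The agreement statistic reported for abstract screening can be illustrated with a short calculation. The sketch below computes Cohen's kappa for two raters; this is an assumption for illustration, since the text does not state which kappa variant was used for the four reviewers, and the screening decisions shown are hypothetical rather than taken from the study.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    Kappa compares observed agreement with the agreement expected by
    chance given each rater's own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)

    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement from the raters' marginal label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )

    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical include (1) / exclude (0) decisions for ten abstracts.
reviewer_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
reviewer_2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

With more than two raters, a multi-rater statistic such as Fleiss' kappa would be the usual alternative.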
Standardizing effect sizes. Although the dependent variable in the studies in our sample was uniform, the relative change in energy consumption, the exact functional form and the precision of the estimates varied across studies. Because most of the original studies employed regression analysis, following convention 23 , we standardized the effects by first converting the regression coefficients extracted from the studies into correlation coefficients r using the total sample size, which we then converted to Fisher's Z . For studies that employed the difference-of-means design, we first calculated the standardized mean differences or Cohen's d and then converted them to Fisher's Z. The conversions were performed using the standard formulae prescribed by Ringquist 23 .
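The standardization steps described above can be sketched as follows. The formulae are the commonly used conversions (t statistic to r, Cohen's d to r, and r to Fisher's Z); they are given here as an assumption about the general approach and are not copied from ref. 23, and all function names are illustrative. The sign convention (Z > 0 for a reduction in consumption) is left to the caller.

```python
import math


def r_from_t(t_stat, df):
    """Correlation implied by a regression t statistic with df degrees of freedom."""
    return t_stat / math.sqrt(t_stat ** 2 + df)


def r_from_d(d, n1, n2):
    """Correlation implied by Cohen's d for groups of size n1 and n2."""
    a = (n1 + n2) ** 2 / (n1 * n2)  # equals 4 when the groups are the same size
    return d / math.sqrt(d ** 2 + a)


def fisher_z(r):
    """Fisher's Z transform, 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def fisher_z_variance(n):
    """Approximate sampling variance of Fisher's Z for total sample size n."""
    return 1.0 / (n - 3)


# Hypothetical regression result: t = 2.0 with 96 degrees of freedom.
r = r_from_t(2.0, 96)
z = fisher_z(r)
print(round(r, 3), round(z, 3))
```

Because the variance of Fisher's Z depends only on the sample size, larger trials receive more weight in the subsequent synthesis.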

Synthesis. To estimate the aggregate effect size, we first standardized the effects by converting the estimates reported by each study to Fisher's Z (ref. 23). We used a random effects model to aggregate the standardized Fisher's Z from the original studies. A random effects model is appropriate when effect sizes in primary studies do not consistently converge to a central population mean 23,37, which is certainly the case for studies relating to energy consumption in households with heterogeneous treatment effects 20. We used the metafor package in R (ref. 38) to implement the random effects model with the DL and REML estimators. Although the DL method is relatively simple and popular, it can lead to severe underestimation of the variance when either the number of studies is limited or the heterogeneity is large. Instead, REML is often recommended, especially when the heterogeneity is relatively high 24, so estimating using a REML estimator was preferred here. To check that no single study exerted undue influence on the aggregate effect sizes measured, we followed best practices by calculating three metrics for the estimation of influence, that is, Cook's distance, the cov ratio and tau2, using the influence function in the metafor package in R. This function calculates the value of these metrics for each effect size included in the analysis. The values of these metrics were distinctly different for eight effects belonging to four studies, which could therefore be suspected of being influential observations in the analysis. Dropping the influential observations reduced the estimated average effect size to 0.08-0.12, but the results remained statistically significant and the estimate of tau2 decreased, leading to a smaller prediction interval.
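As an illustration of the simpler of the two estimators, the sketch below implements a DerSimonian-Laird random effects model in plain Python. It is a minimal stand-in, not the metafor implementation used in the study; the function name and the input values are hypothetical. REML is not shown because it requires iterative optimization.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random effects model.

    Returns (pooled_effect, standard_error, tau2, i2_percent).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures dispersion around the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Method-of-moments estimate of between-study variance, floored at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random effects weights add tau2 to each sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)

    # I2: percentage of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, se, tau2, i2


# Hypothetical Fisher's Z values and sampling variances for two effects.
pooled, se, tau2, i2 = dersimonian_laird([0.0, 0.2], [0.01, 0.01])
print(round(pooled, 3), round(se, 3), round(tau2, 3), round(i2, 1))
```

The between-study variance enters every weight, so a large tau2 pulls the weights towards equality across studies regardless of their size.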
The ordinary random effects model is inappropriate when the effect sizes included are not statistically independent 23 . Effect sizes are likely to be dependent in our sample as we extracted multiple effect sizes from each study. In addition, several of the studies in our set employed multiple treatments, and some used data from the same underlying experiments. We employed a multilevel meta-analysis model to account for such dependence. The multilevel model explicitly assumes that several of the effect sizes come from the same study. For the multilevel analysis we used the default variance-covariance structure in the metafor package 38 . To test the robustness of our findings we also used cluster-robust inference methods, using the clubSandwich package in R to estimate the variance-covariance matrix (cluster-robust variance estimation). The results presented in the main paper were robust to the use of these methods.
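The idea behind cluster-robust variance estimation can be sketched in a few lines of Python for the simplest case of a pooled mean. This is an assumption-laden illustration: it uses the basic sandwich form with no small-sample correction, whereas packages such as clubSandwich offer corrected variants, and it does not reproduce the multilevel model used in the analysis. The function name and study labels are hypothetical.

```python
import math
from collections import defaultdict

def cluster_robust_mean(effects, variances, study_ids, tau2=0.0):
    """Weighted mean with a sandwich standard error clustered by study."""
    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Sum the weighted residuals within each study before squaring, so that
    # correlated effect sizes from one study are not treated as independent.
    score = defaultdict(float)
    for wi, yi, sid in zip(w, effects, study_ids):
        score[sid] += wi * (yi - mu)
    robust_var = sum(s ** 2 for s in score.values()) / sum_w ** 2
    return mu, math.sqrt(robust_var)

# Four effect sizes drawn from two hypothetical studies.
mu, se = cluster_robust_mean([0.1, 0.3, 0.2, 0.4], [0.01] * 4, ["a", "a", "b", "b"])
```

With few clusters this uncorrected estimator tends to understate the uncertainty, which is why corrected versions are usually preferred in practice.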
Interaction effects between the various interventions were estimated by including treatment type (monetary incentives, information, feedback, social comparison and motivation) as interacted dummy variables in the multilevel model. The resulting output gives the estimated effect when a single intervention is applied alone and estimates for all possible combinations of effects seen in the dataset.
Moderator variables for effect heterogeneity. The meta-regression models used to investigate the causes of heterogeneity in effect sizes were estimated using the random effects and multilevel models and by introducing moderator variables into the estimation equation. Mathematically, the interpretation of a parameter estimate on a moderator variable in a meta-regression is the same as for a parameter estimate from a traditional regression, that is, it represents the average change in the effect size associated with a one-unit change in the moderator. Moderator variables could represent factors that genuinely affect the magnitude of the relationship between the focal predictor and the outcome of interest, or could represent the design elements of the original studies that may affect the effect size from coded studies 23 . In this study we included both types of moderator variables. The design elements of the original studies were captured as dummy variables for the following variables: weather controls (if a study controls for any aspect of weather, it is assigned the value 1), demographic controls (if a study controls for demographic variables, such as age, income and composition of the family, it is assigned the value 1), residence controls (if a study controls for the characteristics of the house, such as size, it is assigned the value 1) and randomization (assigned the value 1 if households are assigned randomly between interventions). We also included as moderator variables the study design (difference-in-difference, control-treatment or pre-post) and statistical method (panel regression, OLS regression or difference-of-means tests) employed in the studies. Other moderator variables captured factors that were likely to affect the relationship between energy use and treatment, for example, the duration of experiment or region in which the experiment was performed.
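The interpretation of a moderator coefficient can be illustrated with a weighted least-squares meta-regression on a single dummy variable. The Python sketch below assumes a known between-study variance tau2 and a single moderator, so it is much simpler than the random effects and multilevel meta-regressions described above; the function name and the example values are hypothetical.

```python
def meta_regression_one_moderator(effects, variances, moderator, tau2=0.0):
    """Weighted least squares for effect = b0 + b1 * moderator.

    Weights are inverse variances (plus tau2). For a 0/1 dummy, b0 is the
    weighted mean effect when the dummy is 0 and b1 is the average change
    in the effect size when the dummy switches to 1.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, moderator))
    swy = sum(wi * yi for wi, yi in zip(w, effects))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, moderator))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, moderator, effects))
    denom = sw * swxx - swx ** 2
    if denom == 0:
        raise ValueError("moderator has no variation across effect sizes")
    b1 = (sw * swxy - swx * swy) / denom
    b0 = (swy - b1 * swx) / sw
    return b0, b1

# Hypothetical coding: randomized = 1 if households were randomly assigned.
effects = [0.10, 0.12, 0.05, 0.06]
variances = [0.004, 0.004, 0.002, 0.002]
randomized = [0, 0, 1, 1]
b0, b1 = meta_regression_one_moderator(effects, variances, randomized)
```

In this toy example a negative b1 would indicate that randomized designs report smaller effects on average than non-randomized ones.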

Emissions reductions.
To calculate the mitigation wedge, we used the data on direct and indirect CO2 emissions of buildings from the International Energy Agency (IEA) 39 . The reduction in emissions was calculated by multiplying the estimated CO2 emissions of residential buildings in 2018 (5.6 Gt) by the average percentage reduction in the energy consumption of households due to interventions computed in the meta-analysis models. The meta-analysis models for this part were run using the percentage change in energy consumption reported in primary studies as the dependent variable. The corresponding variance was approximated using the square root of the sample size 20 . The weighted percentage reduction in energy consumption corresponding to the weights of the meta-analysis models was estimated to be 6.3% (95% CI = (4.9%, 7.7%)).
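The arithmetic behind the mitigation wedge can be checked with a short Python sketch. The baseline and percentage figures are those quoted above; applying the same multiplication to the confidence-interval bounds is our illustration and not a range reported in the text.

```python
# Figures quoted in the paragraph above.
residential_emissions_gt = 5.6   # GtCO2 from residential buildings, 2018
mean_reduction = 0.063           # 6.3% average reduction in energy consumption
ci_low, ci_high = 0.049, 0.077   # 95% confidence interval bounds

def emissions_reduction(baseline_gt, fraction):
    """Reduction potential, assuming emissions fall in proportion to energy use."""
    return baseline_gt * fraction

central = emissions_reduction(residential_emissions_gt, mean_reduction)
lower = emissions_reduction(residential_emissions_gt, ci_low)
upper = emissions_reduction(residential_emissions_gt, ci_high)

print(f"{central:.2f} GtCO2 per year")  # prints "0.35 GtCO2 per year"
```

The calculation implicitly assumes that a given percentage cut in household energy use translates into the same percentage cut in emissions, which ignores differences in the carbon intensity of the energy saved.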
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The authors declare that the data supporting the findings of this study are available within the paper and its Supplementary Information and on GitHub (https://github.com/tarun-hertie/Household-Interventions). All the information collected in this project is publicly available in line with the systematic reviews reporting protocol 40,41 , providing the transparency and reproducibility required to conform with Open Synthesis principles 42 (see the ROSES checklist 35 ). Source data are provided with this paper.

Code availability
We used the NACSOS software 36 to manage search results, remove duplicates, screen records and extract data, and the metafor package in R (ref. 38

Software and code
Data collection
To enable screening of relevant papers, we applied a novel machine learning algorithm using support vector machines to rank the studies in the order of relevance of their abstracts. Data was collected manually from the underlying studies.

Data analysis
Analysis was done using R (version 4.0.3, 2020-10-10) and the metafor package (development version, Oct 2020). Code is available as supplementary information on GitHub.

Data
All the data collected from underlying studies is available on the GitHub page. There are no restrictions on the availability of this data. The data regarding buildings emissions was taken from the IEA (2020) Tracking Buildings report.

nature research | reporting summary
April 2020

Behavioural & social sciences study design

Study description
Systematic review and quantitative meta-analysis

Research sample
Data was extracted from 122 studies on behavioural and monetary interventions in household energy consumption. The data consists of the effect sizes found by these studies and the study characteristics.

Sampling strategy
Our data collection strategy involved (1) a search for relevant existing literature reviews and the studies referenced by them; (2) string-based searches of bibliographic databases; and (3) searches for grey literature on Google. In accordance with guidance for rigorous evidence syntheses, we searched a broad set of bibliographic databases (Web of Science Core Collections Citation Indexes, Scopus, JSTOR, MEDLINE) and the web-based academic search engine Google Scholar, using a comprehensive search string that followed the PICOS (population, intervention, comparator, outcome and study design) logic recommended by the Campbell Collaboration 46 . We developed the search string iteratively by checking the results of the search against a set of studies of known relevance. We searched for articles that dealt with household energy (or electricity) consumption along with one or more of the interventions of interest.

Data collection
To enable screening of relevant papers, we applied a novel machine learning algorithm using support vector machines to rank the studies in the order of relevance of their abstracts. A team of four reviewers then manually screened the abstracts of the top 6,023 studies. The final sample included 122 studies after critical appraisal. Four reviewers extracted the relevant data from these studies using the rules laid out in a codebook. To ensure consistency, a sample of 50 studies was screened at the abstract level (Kappa = 0.77). The reviewers next did a full-text screening and coded the relevant papers, followed by a discussion of the coded fields to identify disagreements and make suitable adjustments to the codebook. A single reviewer double-checked the final data collected for all the included studies. We used the NACSOS software (https://zenodo.org/record/4121526#.X9Nmx2hKjIU) for evidence synthesis, developed by MCC Berlin, for managing search results, removing duplicates, screening records and extracting data.

Timing
The query for collection of data was run in Feb 2019 and then re-run in Jun 2020.

Data exclusions
We tagged as relevant only studies that dealt with energy consumption by households or dormitories and contained a quantitative estimate for the energy saved through a relevant intervention. We did not include studies that focused on price effects but only referenced load effects (changes in kW and not kWh) or those that only reported the effect on peak consumption and not total consumption. Studies that only provided an effect size but not the associated variance were not included in the final synthesis. In addition, studies where no obvious comparator group was available (untreated control group or pre-intervention data) or where the sample size was too small to extract meaningful estimates were excluded from the analysis. The inclusion and exclusion criteria, the ROSES flowchart for screening and coding and the complete list of studies included in the analysis are available in the Supplementary Information.