This review provided a comprehensive overview of methods for PA policy monitoring. It also showed that there are several approaches to PA policy monitoring that differ with regard to the tools they utilize, the policy level and setting they analyse, and the level of government involvement. Studies were grouped into four categories: Report Cards on Physical Activity for Children and Youth (n = 47), HEPA Monitoring Framework (n = 5), HEPA PAT (n = 3), and other (n = 57).
Policy monitoring approaches differed in being (purely or mainly) research-driven, (purely or mainly) government-driven, or based on co-production. Each of these approaches seems to have its own strengths and weaknesses and is related to a different model of research-policy relations (18). In addition, the three most frequently applied tools are associated with different (implicit) theories of change with regard to generating impact on policy-making.
Research-driven approaches have the advantage of collecting data in a highly standardized and systematic way. In addition, the absence or low level of government involvement allows for an exclusively academic interpretation of the data, independent of political interests (which potentially allows for highly critical verdicts on national policy). However, purely research-driven approaches that rely exclusively on a (systematic) literature search might be strongly biased if only English-language publications are included. Furthermore, accounting for political context is difficult in international studies when no government officials are involved. Mainly research-driven approaches that involve government officials in selected stages of the research process can overcome these drawbacks to some extent. The Report Card on Physical Activity for Children and Youth is an example of a research-driven approach. Even though the tool does not focus exclusively on policy monitoring, it includes an indicator on government. The implicit assumption seems to be that policy impact is generated by assigning grades to governments and communicating them to the policy community (“knowledge shapes politics”). Evidence from studies analysing the impact of report cards indicates that the Canadian government used the tool as a source of information and as a ‘barometer’ for monitoring PA in children and youth (63). Other countries reported that the tool facilitated discussions between researchers and the government or even led to the involvement of researchers in policy-making (64). However, it has also been reported that government officials were concerned about the legitimacy of the grade on the government indicator (63). Therefore, it remains unclear whether a research-driven grading process is also appropriate for the Report Card’s government indicator (i.e., policy monitoring) or whether this approach encounters more resistance from decision-makers.
Government-driven approaches benefit from government officials’ direct access to data on policies and from their intimate knowledge of the policy environment. Consequently, highly contextualized information is available. However, data might not be as detailed as in a research-driven approach, as government officials might not engage in in-depth desk research. It is also likely either that studies remain more descriptive, since government officials might hesitate to be too critical of their governments’ policies, or that officials attempt to portray their government’s policies as more favourable than they are in order to stave off potential criticism. The HEPA Monitoring Framework is an example of a government-driven approach, as the framework was developed by the European Commission – based on a recommendation of the Council of the European Union – and data are provided by national governments (“politics shape knowledge”). The implicit assumption might be that the regular monitoring of PA policies is an incentive for governments to develop additional policies in order to fulfil more and more indicators of the HEPA Monitoring Framework. A comparative analysis of the data collected in 2015 and 2018 showed that 17 out of 27 countries increased the number of accomplished indicators, while five maintained a constant number (19). However, it remains unclear whether this progress in policy-making was influenced by the activities related to the HEPA Monitoring Framework. It also remains unclear whether governments take a “gaming the system” approach and start rather small-scale initiatives serving the sole purpose of fulfilling several of the often rather crude indicators.
Co-production approaches seem to combine the benefits of government-driven and research-driven approaches, i.e., they collect in-depth data via desk research while still relying on the knowledge of government officials. They are based on a clear theory of change, as co-production can “produce research findings that are more likely (…) relevant to and used by the end users” (65) and might – as a consequence – have a higher policy impact than research-driven approaches. However, co-production approaches are resource-intensive and require strong commitment from both government officials and researchers. The HEPA PAT is an example of a tool that was designed to initiate co-production. Bull et al. explicitly stated that the HEPA PAT is not only an instrument to facilitate the collection of data but also one “to stimulate critical debate, greater awareness, a broader dialogue among relevant actors and a higher sense of ownership within countries at the national and local level” (12). As a result, the HEPA PAT might be “a catalyst for improved collaboration on future policy development and implementation” (12). By applying a co-production approach, the HEPA PAT builds on a well-established knowledge translation strategy at the nexus of public health policy, practice and research (65, 66).
In this context, it seems particularly important to reflect on the quality of policy monitoring: in most cases, neither researchers (limited insider knowledge) nor government officials (limited capacity) can, on their own, gather and analyse the best available information on policies for PA promotion. It also has to be noted that assessing policies based on expert consensus is highly subjective, and more systematic approaches are needed for an objective policy assessment. Such approaches have been developed, e.g. a scoring rubric for the government indicator of the Report Card on Physical Activity for Children and Youth that is based on the HEPA PAT and generates a total percentage score according to a defined process (67). Furthermore, a Physical Activity Environment Policy Index (PA-EPI) has recently been developed that assesses the extent of implementation of government policies and actions against examples of international best practice. It is conceptualized as a two-component ‘policy’ and ‘infrastructure support’ framework and comprises forty-five ‘good practice statements’ across eight policy domains (such as education, healthcare and sport-for-all) and seven infrastructure support domains (such as leadership, governance, and health-in-all-policies) (16).
Approaches to PA policy monitoring also differ with regard to the level of capacity building they may yield for governmental institutions. A monitoring approach that directly involves governments may have the drawbacks of being resource-intensive and, when policies are assessed, of lacking objectivity. However, it has the advantage of raising awareness of PA promotion within governments and fostering understanding of issues such as definitions, recommendations and measurement.
When researchers and policy-makers are in a position to suggest or choose a specific approach to policy monitoring, they should consider the strengths and weaknesses of each approach with regard to data quality and capacity building. In addition, the stage that policy-making has reached in a particular country needs to be considered (at an early stage, the need for a co-production approach might be highest). Furthermore, the cultural appropriateness of a particular approach needs to be taken into account: while some governments might be very open to approaches that assess policy-making, governments in other countries might be rather reluctant to support policy monitoring approaches that rate or grade their own work.
There are some limitations to this study. First, the overarching results for the total of 112 studies are strongly influenced by the 47 studies that were based on the Report Cards on Physical Activity for Children and Youth. If these studies were not taken into account, the share of studies that were part of regular approaches to policy monitoring would be substantially lower (16.9% instead of 51.8%), as would the share of studies that assessed policies (20.0% instead of 53.8%). Second, the level of detail of information on methodological aspects of the included studies differed. For instance, Report Card studies usually provided only a very general description of their methodology and did not include specific information on the grading process for the government indicator (an exception is (67)). Third, we might not have identified all studies on PA policy monitoring, e.g. due to limitations of the search term or because they were published only recently (e.g., 68, 69). This limitation also extends to PA policy monitoring tools that were published in other languages – e.g. the Japanese Local Area Policy Audit Tool (L-PAT) (70) – as well as to tools that have not yet been described in a scientific publication, e.g. because they are part of government-driven policy monitoring, such as the Finnish TEAviisari tool (71). Finally, the identification of tools for PA policy monitoring was partly influenced by whether researchers framed their methodology as a new tool or not.