
Early exposure to new policy ideas and their implementation is one of the most interesting aspects of conducting research for PRU HSSC, but thinking about the policy making process and its evaluation is not without its challenges.
For example, one challenge is that the boundaries between policy design and policy implementation become blurred: as soon as policy ideas are tried out in practice, even on a limited scale, they have already crossed over into implementation.
Policy piloting and experimentation have been popular approaches to doing exactly this for several decades, both within the UK and across the globe. Though it can be approached in various ways, the conventional idea of a policy experiment is to trial ideas on a small scale in real-world settings, gathering evidence on successes and shortfalls to inform further development of the policy.
For an experiment to be useful, it needs to be realistic.
This is where design tips over into implementation – things have to actually happen on the ground in the ‘real world’. However, a good experiment also needs to be measurable: it needs a clear start and end point, along with specific details about what has changed in between.
The two experimental ideals of realism and measurability can quickly become challenging in policy making, as it takes place within a complex and dynamic environment. Even seemingly small or simple changes might require significant system shifts: staff might be required to do things differently, possibly in ways that contradict current protocols; organisations need to create the capacity to do these ‘new things’ alongside all their ‘routine things’; and there might be differing understandings of what the new thing is, and how it departs from the routine. Doing these things may also have unforeseen impacts elsewhere in the system. All the while, things keep moving – there are no static points against which we can measure change.
As such, making things more realistic could make them harder to measure.
The inverse possibility is that people work overtime trying to generate measurable change, and, in doing so, produce unrealistic experiments. When the experiment ends, those involved might be unable to ‘unlearn’ what they have done: the system in which they operate has changed and cannot easily be reversed.
All experiments, in this sense, are a form of early implementation.
As evaluators of early policy implementation, we must let go of the idea that design and implementation are inherently distinct, while still acknowledging the important differences between them. Once we accept that these phases can overlap, we should consider whether certain elements should belong solely to either design or implementation. We might think of this as separating the ‘what’ from the ‘how’ of policy. For example, if a new policy’s purpose and aims are clearly defined, it is less likely to be misinterpreted by local implementers. In turn, this clarity could make the impact of the policy easier to evaluate.
However, at times, policymakers want to create space for local determination.
This recognises that no two policy recipients begin from exactly the same place; rather, they operate in distinct contexts and should be empowered to shape their own approaches to bridge the gap between (national) design and (local) context. Nevertheless, if too much flexibility is given to shape the ‘what’ at the local level, then the overall system risks losing coherence. This returns us to the question of which elements should remain within the design phase and which can be shaped during implementation. As evaluators, our task is to interpret how this fit works out in practice. Yet, as with the system itself, the greater the local variation, the harder it is to draw generalisable insights.
Policy purpose can serve as an anchor for policy makers, implementers, and evaluators. If ‘what’ questions are left too open prior to implementation, ‘how’ questions tend to dominate. Evaluating such policies can reveal what is possible to achieve in practice, but there is a risk of overlooking whether the policy outcome is something we actually want.