I had been with the Accelerator Lab in Malaysia (AccLab) as Head of Experimentation for less than two months when COVID-19 hit. In that time, we developed several experiments—some in conjunction with the Country Office (CO)—but we have had to postpone, and will perhaps even have to cancel, much of what we had planned. Rather than letting that effort go to waste, we think there are still important lessons we can draw from the experiment design process—especially around identifying and planning for risk.
The AccLabs—and the Experimenters in particular—have been tasked with developing experiments that test portfolios of solutions and with building portfolios of experiments that shed light on persistent and tangled real-world problems. We want to test multiple hypotheses simultaneously rather than one at a time to speed up learning. Designing these experiments is a fun, creative challenge. The real-world nature of our experiments, however, raises the stakes. Complicated ethical challenges emerge that are not seen in the controlled settings typical of traditional medical and academic research. And, when interventions are complex, interconnected, and contextually dependent, are simple experiments even possible?
At the heart of this problem is the very real tension between keeping experiments “safe to fail” and getting meaningful answers to the really important questions. Can we really go up to partners and ask for their time, effort, and resources for an experiment that could potentially make things worse? When we build in the safeguards to protect communities from risks and commit to fixing things that might go wrong, short simple experiments quickly grow in cost and scale and become hard to justify. Rather than discussing these trade-offs in the abstract—and there are many good discussions on this topic—I would like to explore this in the design of four concrete experiments.
Benefits and harms
Most policies are never tested before they are implemented at the state or national level. That statement may sound surprising. Yet, if we think about the implications of testing certain types of policies, we run into practical and ethical questions. An experiment in reducing sales tax in a single locale will result in buyers and sellers from outside that location coming in to take advantage of the lower rate, skewing experimental results. Most people would consider it unethical to choose one city or community to test out a new tax policy—especially if the new policy raises taxes—while the rest of the country operates on the status quo.
These were precisely the questions we ran into when the AccLab and the Economists’ Unit were asked to develop a randomised controlled trial (RCT) to help move the discussion on carbon taxes in the Malaysian context. Although there is growing evidence that RCTs often sacrifice external validity (i.e. that the conditions of an RCT are too different from real-world conditions to tell us how the intervention being tested would actually work in real life), the drive for evidence-based policy has given RCTs tremendous persuasive power, as they are (rightly or wrongly) considered the gold standard for evidence. Additionally, the ability of an RCT to demonstrate a cause-and-effect relationship without having to appeal to scientific theory is attractive when addressing politically charged issues such as a carbon tax.
Our task was to develop an experiment that would generate evidence in the form of consumer behaviour showing willingness (or not!) to pay for a carbon tax. We revisited and ruled out alternatives to RCTs. A survey alone would be unconvincing, given the difference between stated and actual behaviour. Natural experiments (e.g. looking at fuel consumption at different petrol price levels) were regarded as too abstract. We eventually settled on an experiment exploring a range of incentives to shift individuals’ transportation modes from private motorised vehicles to public transit. One intervention would provide participants with a financial incentive for taking part in the experiment but deduct from that incentive for each kilometre travelled by private motorised vehicle. Psychologically, this ought to simulate a penalty or tax on such behaviour. We also planned to check whether positive financial incentives for reducing private motorised travel have a symmetrical effect.
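To make the deduction mechanism concrete, here is a minimal sketch of how a participant’s payout could be computed. The base incentive and per-kilometre deduction rate are purely illustrative assumptions, not figures from the actual experimental design:

```python
# Hypothetical sketch of the deduction-based incentive described above.
# The amounts (base incentive, per-km rate) are invented for illustration.

def payout(base_incentive: float, deduction_per_km: float, private_km: float) -> float:
    """Participant receives a base incentive, reduced for each kilometre
    travelled by private motorised vehicle, floored at zero."""
    return max(0.0, base_incentive - deduction_per_km * private_km)

# A participant who drives 40 km over the trial period:
print(payout(100.0, 0.50, 40.0))  # 80.0
```

Framing the deduction as a loss from an amount already “owned” is what is meant to simulate the psychological experience of a tax, in contrast to a symmetric design that pays a bonus for avoided kilometres.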
This experimental design represents a compromise between the answers we want and the questions we can actually ask. We think it could still allow us to draw meaningful conclusions about behavioural effects but realise that the need to extrapolate behavioural changes weakens the reliability and force of the conclusions we could draw from the data.
Managing expectations
As UNDP increasingly focuses on localisation and implementation of the Sustainable Development Goals (SDGs), there is a corresponding need for us to engage local communities. This comes with its own set of challenges. We do not want to be extractive—merely taking information from communities without giving a commensurate benefit in return. We want to be consultative and empowering, finding out what communities really want and, to the extent possible, letting them make the decisions about the things that affect their lives. What happens, however, when expectations are created through the process of consultation and in our efforts to ensure commensurate benefit for the time and energy we ask of our stakeholders? When supporting local actors in SDG implementation, might we give the impression that UNDP is going to be more involved than we actually are? How should we manage these risks?
In stakeholder mapping for the Malaysia AccLab’s first frontier challenge on greenspace activation in support of the Seberang Perai City Council (MBSP), it was immediately apparent that the two People’s Housing Projects (PPRs) adjacent to the park are critical to the sustainability of the park. We wanted user-led design of the park, using events, dialogue, and prototyping to get different groups of park users to imagine what a park that met their needs would look like. We wanted to understand the gaps in PPR facilities and whether the park could be part of a solution to address them. We wanted to know what governance structure PPR residents would design with the local council to ensure the park remains well-managed. Then, we realised that as valuable as this information would be, the process of collecting it would likely create expectations among the PPR community beyond our ability to deliver—especially since we are not the final decision-makers on the park.
Taking a step back, we felt that the in-depth user-design engagement we wanted to create would be best carried out by an actor who has a long-term commitment to the community, not one who would parachute in and then leave. While MBSP might not have all the tools that we do, they (1) have a better understanding of the PPR communities and (2) know what is and is not feasible and how to communicate this. Moreover, the PPR communities understand what MBSP can and cannot do based on their past experiences. Finally, having MBSP lead the dialogue with the PPR communities reduces the risk of engagement fatigue, as the city council can work that into their ongoing dialogue and strategic timelines.
For these reasons, we decided to provide capacity-building to MBSP by conducting training on creative techniques for exploring solution spaces with communities and supporting them in the design of the engagement programme while they take the lead. This meant delaying our timelines for data collection but will hopefully lead to better outcomes in this project and in the long-term relationship between MBSP and the PPR communities. Meanwhile, we carried out user-led design engagement with schools and schoolchildren—a group that is not directly impacted by the park rejuvenation and thus has less at stake, that would naturally perceive our engagement with them as a one-off event, and that MBSP does not regularly engage. This lowers the risks involved in this experiment but extends the duration of our engagement process with MBSP, with all the associated benefits and costs.
Consent and navigating boundaries
Informed consent has been a hallmark of medical and social research. It is not sufficient to get a signature on a piece of paper—each participant must understand the risks and benefits of participating in the experiment and must be free to opt in or opt out. In many types of real-world experiments that affect groups of people instead of individuals, obtaining informed individual consent may be impractical or outright impossible. For example, if we were to upgrade water supply systems in several different villages to understand how this impacts their health, must we obtain individual consent from every member of the village to proceed? Could one person withholding consent prevent the experiment from taking place? And, even if we allowed for that, how would we manage the social pressure on dissenting individuals to provide consent if the majority of the community wants the water supply upgrade?
The Malaysia AccLab had identified waste management as our second frontier challenge. One idea that surfaced quickly was an experiment on minimising waste generated at public bazaars, a (then!) very pertinent challenge in the local context. This was particularly timely given the Ramadan fasting month, as crowds throng to Ramadan bazaars in the evenings to break fast together. In a public space in which people are constantly entering and exiting, obtaining individual consent would be very difficult and intrusive. Furthermore, many of the approaches we would want to test consist of behavioural nudges—that is, small changes to the environment that shift behaviour at a subconscious level. Informing individuals that they are part of an experiment would change their behaviour in ways that invalidate the experiment. Yet, experimenting on people who are unaware of the experiment is an ethically dangerous area.
With the COVID-19 outbreak, the Ramadan bazaar experiment is not going to be feasible. Still, many experiments to shift behaviour—including behavioural nudges to reduce the spread of COVID-19—face similar ethical questions. We can take certain steps to minimise ethical risks, including seeking gatekeeper consent (e.g. the organiser of the bazaar) and choosing to track less “personal” data (e.g. measuring total waste generated under different intervention conditions instead of directly observing people’s behaviour in response to a nudge). There are many examples available from academia and medicine on how to manage such ethical questions. These trade-offs can be difficult to navigate, however, and different disciplines of research have come to different conclusions on where the boundaries are drawn. As UNDP moves towards creating portfolios of experiments for learning, there is a need for appropriate guidelines and processes to help us along.
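The idea of tracking less “personal” data can be sketched simply: instead of observing individuals, one compares aggregate waste collected under each intervention condition. The figures below are invented for illustration only:

```python
# Hypothetical sketch: comparing aggregate waste (kg collected per evening)
# under a control condition vs. a nudge condition, without observing any
# individual's behaviour. All numbers are invented for illustration.

control = [120.0, 135.0, 128.0, 140.0]  # kg of waste on control evenings
nudge = [110.0, 118.0, 105.0, 122.0]    # kg of waste on nudge evenings

def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

effect = mean(control) - mean(nudge)
print(f"Average reduction under nudge: {effect:.1f} kg per evening")  # 17.0 kg
```

A real analysis would of course need enough evenings to rule out day-to-day variation (crowd size, weather) and a proper significance test, but the measurement itself never identifies any individual.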
Portfolios or complex experiments?
Ideally, we want to test a multitude of approaches to solving a problem or achieving a goal in our experiments—this is what a portfolio of hypotheses is about. This approach is good for simple problems with simple solutions but runs into complications when we are attempting complex solutions for systems change. A complex intervention is more than the sum of its parts. And executing a portfolio of complex experiments is resource intensive.
We had an opportunity—now postponed indefinitely—to run a communications experiment with an arts and theatre group. They were going to put on an exhibition using different forms of art to communicate the value of water and water bodies and to enable participants to imagine and experience the impact we have on the aquatic environment. The use of multiple art forms would create a natural experiment—we could try to find out what forms of art attract and impact different segments of Malaysian society. There was one question that we could not find a satisfactory way to nail down, however: would each form of art have the same impact on its own as it does within the exhibition as a whole?
The “scientific” way to answer that question would be to run the exhibition as a whole and display individual pieces and performances separately at a different time. To be thorough, one might even run the exhibition in different configurations, removing one or more pieces, or changing the order in which the participants experience them. This, of course, would be impractical. So, we had to accept that whatever findings we got would be partial and tentative. If a simple art exhibition gets this complex, we should not be surprised if we find it near-impossible to disentangle real-world problems.
Managing learning and risk
With all these challenges in the way, can we ever carry out the portfolio of experiments that UNDP needs? No single AccLab can. Even using the experiments of the AccLabs globally as a portfolio will be difficult, given the widely varying contexts in which we work.
Experiments are inherently risky. I had a recent conversation with an NGO leader who remarked that government officials do not attempt new things because failure carries tremendous political and career consequences—but was glad that UNDP is different. But how different are we, really? We may have a somewhat greater tolerance for risk, but we are still answerable to funders, to governments, and to the communities we serve. When the welfare of individuals and communities is at stake, how safe can it truly be to fail? And, if limited risk yields limited answers, how do we balance the two?
First, we need to recognise that not every experiment will yield an actionable lesson. So, we need to diversify. We need more experiments and more experimenters. This is why the Malaysia AccLab is attempting two experiments in experimentation (and would run more if we could!). We are fortunate that our Resident Representative and our colleagues see the need to learn more from our CO projects. So, we will be launching a “Design Lab” in which we provide colleagues with new tools, reframe problems, and find space in projects for safe-to-fail experiments. We hope that we will not just design experiments but also develop experimenters. Another approach we are taking is exploring partnerships with local academics to multiply our experimentation and learning capacity.
Second, we should make joint decisions about risk with the communities and stakeholders we partner with. They may well have a higher tolerance for risk than we do—or the reverse may be true. As the CO embarks on Area-Based Programming, co-creation with local partners is one of the key principles guiding this process. We aim to go beyond getting consent; we want to achieve a deep level of ownership among all those involved. This necessarily means slowing down to listen, explore, and negotiate, and perhaps not getting to do some of the things we would like to do. These are risks and costs to us, but ones we hope will help communities choose the level of risk and cost they themselves want to undertake.
We are still in the early stages of these efforts, so results are pending. But we are hopeful, and we look forward to sharing what we learn from these endeavours.