As published in the October 2021 issue of The Water Report.
PJM Economics’ Paul Metcalfe summarises new research that sheds light on why willingness to pay varies so much across water companies. These findings have important implications for the design of the proposed national study.
Willingness to pay (WTP) research has shaped business plans for many years, through its impact on the setting of both performance commitment levels and outcome delivery incentive (ODI) rates. Throughout this time concerns have been voiced by many, including Ofwat, that the methodology is flawed and that the results are unreliable. This comes despite the fact that WTP studies have generally been subject to high levels of scrutiny, including from academic peer reviewers, and have usually offered substantial within-study evidence of their validity and reliability.
The key problem, as has often been pointed out, is that valuations have varied massively across companies. For example, total household WTP per avoided internal sewer flooding incident ranged from £1,722 to £123,477 at PR19 and from £22,530 to £367,291 at PR14, with similar variation for other service measures. This raises the question of whether the WTP results genuinely reflect customer preferences or whether they are driven by features of the survey designs.
In putting forward the idea of a collaborative national study, Ofwat hopes that applying a common methodology will help ensure comparability of results, and thereby identify genuine differences. However, the national study could find itself maintaining an unacceptably large variation in WTP results across companies and struggling to explain why. The UKWIR 2011 guidelines on carrying out WTP research were themselves expected to reduce the variation in WTP estimates for PR14 by ensuring a common methodology was adopted across companies – but this clearly did not work.
A solid understanding of what drives differences in WTP is vital if the national WTP research is going to be fit for purpose. Differences in average income per capita across companies, for example, would be a valid reason for variations in WTP. On the other hand, excessive sensitivity to the scope of service change offered in a survey would suggest that values are unreliable with respect to alternative reasonable survey designs.
To explore this question, in a study funded by Scottish Water, we assembled a dataset of PR14/SRC15 valuations from 18 water companies, including all 10 water and sewerage companies in England and Wales, plus Scottish Water. These offer a good data source for analysis because the studies adhered to the UKWIR 2011 guidelines, with very similarly defined service measures and few methodological differences. PR19 methodologies, by contrast, tended to depart from one another in different ways, muddying their comparability, although most companies continued to use the same choice experiment approach at their core.
Econometric modelling was then used to measure the impact of potential drivers on WTP values for five common service issues: Discoloured water, Unplanned supply interruption (3-6 hours), Unplanned supply interruption (6-12 hours), External sewer flooding, and Internal sewer flooding. These were pooled together into a single model so that the impact of differences in the service issues themselves could be isolated as a driver of variation.
Further potential drivers explored included: base levels of service for each service measure; the changes in service levels offered in the surveys; the number of service measures valued in the surveys; the number of households supplied by the company; the way in which risks were framed in the surveys; the size of the average water bill in the company area; whether values were scaled to be consistent with a full package of improvements or were reported unscaled; and GDP per capita in the company supply area.
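As a rough illustration of what a pooled model of this kind looks like, the sketch below fits a log-log regression with a dummy variable for each service issue. It is not the study's model or data: the observations are randomly generated, the slope of -1 and the premium for internal sewer flooding are assumptions built into the simulation, and all variable names are hypothetical. Only one of the listed drivers (scope of service change) is included, to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Entirely synthetic, illustrative data: NOT the study's dataset.
n_companies = 18
n_measures = 5  # index 4 stands in for internal sewer flooding
measure_idx = np.tile(np.arange(n_measures), n_companies)
n_obs = measure_idx.size

# Log of the size of service change (risk reduction) offered in each survey.
log_scope = rng.normal(0.0, 1.0, size=n_obs)

# Assumed data-generating process: unit WTP falls one-for-one with scope
# (slope -1 in logs), with a premium for the last service issue.
log_wtp = (
    5.0
    - 1.0 * log_scope
    + 2.0 * (measure_idx == 4)
    + rng.normal(0.0, 0.3, size=n_obs)
)

# Design matrix: intercept, log scope, and dummies for measures 1..4
# (measure 0 is the baseline category).
dummies = [(measure_idx == k).astype(float) for k in range(1, n_measures)]
X = np.column_stack([np.ones(n_obs), log_scope] + dummies)

# Ordinary least squares on the pooled observations.
beta, *_ = np.linalg.lstsq(X, log_wtp, rcond=None)
fitted = X @ beta

scope_elasticity = beta[1]
r_squared = 1.0 - np.sum((log_wtp - fitted) ** 2) / np.sum(
    (log_wtp - log_wtp.mean()) ** 2
)

print(f"Estimated scope elasticity: {scope_elasticity:.2f}")
print(f"Share of (log) WTP variation explained: {r_squared:.2f}")
```

Pooling the service issues into one regression is what allows the service-issue dummies and the scope variable to be estimated side by side, so that the contribution of each can be separated.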
Drivers of variation in WTP estimates
The results from the modelling are striking in a number of ways. Firstly, the model was found to explain 95% of the variation in (log) WTP values, with ‘residual’ variation being only 5%. We thus obtained a lot of explanatory power from only a few variables.
The most significant driver of WTP was, by a long way, the scope of service change valued. As shown in Figure 1, differences in this variable accounted for a full 59% of the variation in (log) WTP estimates.
Specifically, the results show that doubling the risk reduction valued in the survey approximately halved the unit WTP value, so that the total amount customers were willing to pay for the reduction stayed roughly the same. Put another way, when comparing across companies, total WTP for a reduction in the risk of a service issue seems to be completely insensitive to the size of the risk reduction shown – a serious problem for the methodology.
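The arithmetic behind this point can be written out briefly. The notation is introduced here for illustration and is not taken from the study: $\Delta$ is the size of the risk reduction offered, $u$ is the unit WTP value, $W = u\,\Delta$ is the total WTP for the reduction, and $\beta$ is the slope of log unit WTP on log scope.

```latex
\ln u = \alpha + \beta \ln \Delta
\quad\Longrightarrow\quad
u(\Delta) = e^{\alpha}\,\Delta^{\beta},
\qquad
W(\Delta) = u(\Delta)\,\Delta = e^{\alpha}\,\Delta^{1+\beta}.

\frac{u(2\Delta)}{u(\Delta)} = 2^{\beta},
\qquad
\frac{W(2\Delta)}{W(\Delta)} = 2^{1+\beta}.

\text{With } \beta = -1:\quad
\frac{u(2\Delta)}{u(\Delta)} = \tfrac{1}{2},
\qquad
\frac{W(2\Delta)}{W(\Delta)} = 1.
```

A slope of about $-1$ therefore corresponds to unit values that halve when scope doubles, and to a total WTP that does not change with the size of the risk reduction. If total WTP were proportional to scope, $\beta$ would be $0$ and unit values would not depend on the size of change offered.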
Another striking finding was that, in most cases, values did not vary significantly across service measures once other factors were controlled for. Only internal sewer flooding differed significantly, with underlying values for avoiding internal sewer flooding higher than values for avoiding all other types of service issue. Even so, the difference between internal sewer flooding and other service issues was sufficiently important to explain 25% of the variation in (log) WTP estimates.
The remaining statistically significant influences included: the number of households supplied by the company, which had a positive impact on WTP; the way in which risks were framed, as measured by the size of the denominator in the risk measures shown; the number of attributes in the survey, which had an effect consistent with the scope effect such that the more attributes shown, the lower the unit WTP value; and the GDP per capita in the company area which had the expected positive impact on WTP, but was only able to explain 1% of (log) WTP variation.
Variables that had no statistically significant impact on (log) WTP included: base levels of service for each service measure; the size of the average water bill in the company area; and whether values were scaled to be consistent with a full package of improvements.
By far the key result from this research is that variation in WTP estimates is driven by the scope of service change offered. This itself is driven by the fact that customer WTP for small risk reductions appears to have been almost completely insensitive to the sizes of those risk reductions, when compared across studies. Accordingly, the traditional methodology does not appear to be reliable, in the sense that alternative reasonable designs with respect to the scope of service change offered could be expected to result in very different valuations.
How can this be? A rigorous process was undertaken to develop the UKWIR 2011 guidelines, studies were often academically peer reviewed as well as subject to challenge from internal and external stakeholders, and studies often provided plenty of within-study evidence of their validity and reliability, consistent with best practice guidelines.
The issue is that within-study validity tests do not, in general, have the same power as external comparisons – whether these be across companies, or across different survey versions for the same company. To be assured of validity there needs to be evidence from such comparisons that results vary with scope to an acceptable degree. This recommendation is not new – best practice guidelines from 1993 included the same recommendation – but it tends not to be adopted in valuation studies because it reduces the sample size achievable for a preferred version within a given budget. Recent evidence, however, including that derived from the present study, re-affirms the importance of such tests.
Similar findings concerning sensitivity to scope have been observed in related fields, most notably studies valuing small mortality risk reductions to determine a value per statistical life. In that literature, emphasis has been placed on the need to spend time and effort within the survey, including appropriate visual aids and verbal analogies, to help participants understand and evaluate risk changes objectively, and to implement an external sensitivity to scope test to demonstrate validity.
Whilst there may be some benefit in re-examining the use of visual aids and risk analogies, this approach seems unlikely to generate any substantial improvement. A recent major meta-analysis of the value of a statistical life found that estimates were still highly sensitive to the scope of risk change offered even when limited to studies that had included visual aids, as recommended. Furthermore, for the UKWIR 2011 guidelines, a set of depth interviews was undertaken with water customers which included testing of a number of visual aids to represent risks. The results suggested that the use of visual aids might actually hinder some respondents’ understanding.
Instead, I believe a different approach to measurement is needed, one which focusses on measuring the relative impact of different types of service failure and using this to determine relative unit values, with the overall scale calibrated to an estimate of customers’ overall WTP for a broad package of service improvements. This methodology is set out in another recent research paper, along with a case study demonstrating its implementation at PR19. Although this new approach itself has some limitations, as described therein, it offers much potential to resolve the problems caused by scope insensitivity and, as such, offers some hope that valid and reliable WTP estimates for water and wastewater services are achievable for PR24, particularly if combined with other studies as part of a broader, triangulated, programme of research.
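The calibration idea can be sketched in a few lines. This is only one possible reading of the approach described above, not the method set out in the cited paper: the function name, the weights, the numbers of avoided incidents and the package WTP figure are all hypothetical. The sketch assumes that unit values are proportional to relative impact weights, and that a single scale factor is chosen so that the value of the whole package equals customers' overall WTP for it.

```python
def calibrate_unit_values(relative_weights, service_changes, package_wtp):
    """Scale relative impact weights into unit values.

    The scale factor is chosen so that the sum of
    (unit value x change in number of incidents) across all service
    measures equals the overall WTP for the package.
    """
    weighted_total = sum(
        relative_weights[name] * service_changes[name]
        for name in relative_weights
    )
    if weighted_total <= 0:
        raise ValueError("Package must contain a positive weighted change")
    scale = package_wtp / weighted_total
    return {name: scale * weight for name, weight in relative_weights.items()}


# Hypothetical inputs, for illustration only.
weights = {  # relative impact of one incident of each type
    "internal_sewer_flooding": 10.0,
    "external_sewer_flooding": 2.0,
    "discoloured_water": 1.0,
}
changes = {  # incidents avoided under the package of improvements
    "internal_sewer_flooding": 100,
    "external_sewer_flooding": 500,
    "discoloured_water": 2000,
}
package_wtp = 4_000_000.0  # overall WTP for the whole package, in pounds

unit_values = calibrate_unit_values(weights, changes, package_wtp)
for name, value in unit_values.items():
    print(f"{name}: {value:,.0f} per incident avoided")
```

Under these assumptions the ratios between unit values are fixed by the relative weights alone, and the overall scale is pinned to the package-level WTP estimate. The individual unit values therefore do not rest on respondents' valuations of each small risk change taken separately.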
 Metcalfe, P.J. and A. Sen (2021) Sensitivity to scope of water and wastewater service valuations: A meta-analysis of findings from water price reviews in Great Britain. Journal of Environmental Economics and Policy (Forthcoming).
 Accent-PJM Economics (2018) Comparative Review of PR19 WTP Results. Report prepared for a club of UK water companies.
 Accent-PJM Economics (2014) Comparative Review of Willingness to Pay Results. Report prepared for a club of UK water companies.
 Ofwat (2020) PR24 and Beyond: Reflecting Customer Preferences in Future Price Reviews – A Discussion Paper. December 2020.
 NERA-Accent (2011) Carrying out Willingness to Pay Surveys. Report for UKWIR. Ref. 11/RG/07/22.
 Arrow, K., R. Solow, P. Portney, E. Leamer, R. Radner and H. Schuman (1993) Report of the NOAA Panel on Contingent Valuation.
 Lindhjem, H., S. Navrud, N. A. Braathen, and V. Biausque (2011) Valuing Mortality Risk Reductions from Environmental, Transport, and Health Policies: A Global Meta-Analysis of Stated Preference Studies. Risk Analysis, 31: 1381-407.
 Chalak, A. and P. J. Metcalfe (2021) Valuing water and wastewater service improvements via impact-weighted numbers of service failures. Paper submitted to the Journal of Environmental Economics and Policy.