A review of Bad Data by Georgina Sturge, 288 pages, The Bridge Street Press (November 2022)
H.G. Wells once predicted that “statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” It was a slight exaggeration, but in an age of big data in which governments pride themselves on being “evidence-based” and “guided by the science,” an understanding of where facts and figures come from is important if you want to think clearly.
Georgina Sturge works in the House of Commons Library where she furnishes UK MPs with statistics. If Bad Data is any guide, she also provides them with caveats and other words of caution, which are ignored. This informative, reasoned, and apolitical book offers a string of examples to show that statistics are not always what they seem. Some statistics are rigged for political reasons. Others are inherently flawed. Some are close to guesswork. Even crucial variables such as Gross National Income and life expectancy are shrouded in more uncertainty than you might think. We don’t really know how many people live in Britain legally, let alone illegally. The number of people who are living in poverty varies enormously depending on how you measure it.
Crime and unemployment are hugely important to voters and therefore susceptible to manipulation by the authorities. England and Wales have data on recorded crime stretching back to 1857, but most crimes are not reported to the police and even when they are reported they are not necessarily recorded by police officers. Setting the police targets to reduce crime creates incentives for the police to allow possible crimes to go unrecorded. This has happened so much in Britain since the 1990s that the UK Statistics Authority stripped the recorded crime figures of their “national statistics” status in 2014.
Unemployment figures are notoriously vulnerable to political manipulation. Under the Conservative governments of Thatcher and Major, there were 31 changes to the way unemployment is measured. Some of these adjustments were trivial but many of them were, Sturge says, “not just tweaks but quite major changes to who was included and, more often, excluded from the count.” This not only made it extremely difficult to compare unemployment statistics over time; in some cases, it prevented people from claiming out-of-work benefits.
These are perhaps the worst cases of official statistics being twisted. Bad data is more often the result of human frailty and flawed methods. A lot of statistics are based on surveys, but people do not always tell the truth. They greatly under-report how much alcohol they drink, for example, due to a combination of shame and forgetfulness (we know they under-report thanks to alcohol sales receipts and the miracle of doubly labelled water).
Sampling bias can also be a problem. You may know, for example, that the government’s estimate of how many Poles would move to the UK after Poland joined the EU was, as Sturge puts it, “absolutely miles off.” Between 5,000 and 13,000 Poles were expected. More than 500,000 arrived. But did you know that the UK’s migration estimates come from the International Passenger Survey which asks people arriving in Britain, more or less at random, what they plan to do while they are in the UK? The survey was designed for tourists but happened to identify some migrants along the way and so it became the main source of migration estimates. It is a big survey involving around 800,000 people but, crucially, is confined to England’s biggest airports. When Wizz Air and EasyJet started flying people from Eastern Europe to Luton, Stansted, and Leeds in the early 2000s, a large proportion of migrants went uncounted. The experts underestimated future migration because they were underestimating current migration.
Statisticians always want more information. Life would be easier for people like Sturge if we had a more European system of identity cards and centralised data. She does not call for this, to be fair, but her day job is clearly made more difficult by the way the British state has grown organically, and not always logically, over many centuries. Oliver Cromwell described England as “an ungodly jumble” and so it remains. As a result, England has dozens of police forces while Scotland makes do with just one, and we do not even know who owns 15 percent of the English countryside. “Our country is a patchwork,” writes Sturge, “and some patches are just holes.”
And yet it would be easy to exaggerate the problems with official statistics in the UK. Flawed statistics make evidence-based policy more difficult and there can be serious repercussions, as with the Windrush scandal and the hopeless algorithm behind the unfair A-level results of 2020. But for the most part, official statistics are imperfect but good enough. Recorded crime figures are flawed but we have the more reliable Crime Survey for England and Wales. Unemployment figures have been fiddled by politicians but statisticians have carefully created a dataset that allows you to retrospectively compare like with like. All this is available for free online. The Office for National Statistics is one of the few parts of the British state that still works well and there are some things we don’t really need to know. As Sturge says, “Who cares how many Austrians there are in Wolverhampton?”
The bigger problem, which Sturge’s book seeks to address, is the misinterpretation of statistics by people who should know better (and often do). In this regard, it is surprising that she does not write more about the pandemic, when an impressive amount of raw data became available to anyone with an internet connection but was widely misrepresented by bad-faith actors and horribly misunderstood by the statistically naive. The base-rate fallacy led to armchair virologists underestimating the efficacy of the vaccines. One science journalist made a fool of herself by not understanding the (admittedly unobvious) distinction between excess winter deaths and excess deaths in winter. Reporting lags in the mortality data encouraged “lockdown sceptics” to believe that there was no second wave at a time when COVID-19 was killing a thousand people a day.
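The base-rate fallacy is easier to see with numbers. What follows is a minimal sketch, not anything taken from Sturge's book. The 90 percent vaccination coverage and 80 percent efficacy are invented for illustration, and the function name `vaccinated_share_of_cases` is hypothetical. It also assumes that vaccinated and unvaccinated people are equally exposed, which real populations rarely satisfy.

```python
def vaccinated_share_of_cases(coverage: float, efficacy: float) -> float:
    """Return the fraction of all cases that occur among vaccinated people.

    coverage: fraction of the population that is vaccinated (0 to 1).
    efficacy: relative reduction in risk for a vaccinated person (0 to 1).
    Both inputs are illustrative assumptions, not real-world estimates.
    """
    # Relative number of cases in each group, taking the risk of an
    # unvaccinated person as the baseline of 1.
    vaccinated_cases = coverage * (1.0 - efficacy)
    unvaccinated_cases = 1.0 - coverage
    total_cases = vaccinated_cases + unvaccinated_cases
    if total_cases == 0:
        # Everyone is vaccinated and the vaccine is perfect: no cases at all.
        return 0.0
    return vaccinated_cases / total_cases


if __name__ == "__main__":
    share = vaccinated_share_of_cases(coverage=0.9, efficacy=0.8)
    print(f"{share:.0%} of cases are among the vaccinated")
```

Under these made-up inputs, roughly two-thirds of cases occur among the vaccinated even though the vaccine cuts individual risk by 80 percent. The vaccinated simply outnumber everyone else, and anyone who ignores that base rate will read the raw case counts as evidence that the vaccine does not work.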
One of the statistical issues COVID-19 illustrated was the difficulty of comparing the UK to other countries when other countries measure things differently. A COVID death in the UK would not necessarily be registered as a COVID death in Belgium, and vice versa. As Sturge explains, similar difficulties arise when comparing the number of rapes in different countries and when comparing Gross National Income. The latter measure is far less definitive than many people assume. Subject to revisions and tinkering, it cannot always be fairly measured against previous years, let alone neighbouring countries. Did you know, for example, that the UK saw a bigger hit to GDP in 2020 than most countries because it measures healthcare productivity on the basis of outputs, such as face-to-face GP appointments, rather than on how much is spent?
I would have liked to see Sturge say more about some of the dodgy statistics bandied around by pressure groups who have every incentive for exaggerating the scale of the problem they are campaigning against. She mentions an estimate that air pollution kills 25,000 people a year and hints that she is sceptical about it but does not say why. Lack of sleep supposedly costs the UK £40 billion a year and mental illness is said to cost £105 billion a year. Does it really? I would have liked her to dig into these figures. And if you think the methods used to estimate the crime rate are sketchy, wait until you see how the government estimates the number of alcohol-related hospital admissions or how they calculate illegal tobacco sales.
If you take one point away from Bad Data it should be that the vast majority of statistics are estimates, some of them are very rough estimates, and statisticians are constrained by limited resources and bounded knowledge. It is not a crisis. Outright fraud is rare, but when confronted with an impressive statistic, especially when it seems surprising, it is worth asking, “How do they know this?” Very often the answer will be that they don’t really know it at all.