
The China Syndrome Part IV: Did China Fudge its Data?

Note: This is the concluding part of a four-part series of essays looking in detail at China’s role in the COVID-19 pandemic. Part One looked at the circumstances surrounding the initial outbreak; Part Two looked at the discovery of human-to-human transmission and the immediate response; Part Three investigated allegations that the pandemic began in a “wet market” or that the virus escaped from a lab in Wuhan; this part examines charges that China falsified its pandemic data. 

Allegations that China was falsifying its COVID-19 figures began to appear when its death and case rates were overtaken by even more dismal figures in parts of Europe and America. How could a repressive society like China possibly be getting this right while the West’s democracies were getting it wrong? As Western numbers climbed, commentators and politicians declared with growing certainty that China’s claim to have successfully suppressed its epidemic was simply the propagandistic lie of a mendacious totalitarian regime intended to fool its own citizenry and the rest of the world.

Back in April, Bloomberg reported that, according to anonymous US officials, China had intentionally released incomplete data on both the number of people infected by SARS-CoV-2 and the number who died as a result. Given the unreliability of anonymous US intelligence leaks (some of which I’ve examined in this series of essays), we should be sceptical of this report, and since Bloomberg’s sources apparently refused to disclose any details about how they arrived at these conclusions, it was irresponsible to have published them. That doesn’t mean, however, that China hasn’t manipulated its data. And since those data informed the policy responses of other countries in the early days of the pandemic when no other information was available, it’s important to know whether or not they were accurate.

Countless arguments have been made in support of the claim that China has lied about its COVID-19 figures and I can’t reasonably hope to address them all. But in what follows, I will discuss some of the most common. (The code I have used to perform the data analysis below can be reviewed on GitHub, so anyone can reproduce my results and scrutinise it for mistakes.)

COVID-19 cases, deaths, and noisy data

A number of commentators have argued that it simply is not plausible that the case fatality rate is a lot higher in France, Italy, and several other countries that are much richer than China. But while it is true that many countries have a much higher case fatality rate than China, many other countries do not:


Countries with a lower case fatality rate than China include developed countries such as South Korea, Australia, and Germany. There is no legitimate reason to cherry-pick those countries most badly affected, while ignoring those where the case fatality rate is much lower. But, as we shall see, cherry-picking is a recurring problem in this debate.

More importantly, between-country comparisons of case fatality rates are largely meaningless. This is because the number of cases can vary enormously between countries for reasons that have nothing to do with the actual progress of the pandemic. Rather, the variation arises because different countries have different testing policies, different ways of defining cases, different levels of testing availability, and so on. In fact, because those variables often change within each country, even within-country case fatality rate comparisons over time are very difficult. For reasons like this, the quality of national case number data is almost universally poor, and certainly provides no reliable basis for the claim that China has deliberately and artificially deflated the number of cases on its territory.

A common argument is that, since China is home to almost 1.4 billion people, it’s highly unlikely to have had only about 84,000 cases and 4,600 deaths. The theory that China’s official death toll is underestimated is not just widespread across the political spectrum, it’s almost treated as received wisdom. The case fatality rate is just the number of deaths attributed to COVID-19 divided by the number of cases, so if China has underreported cases and deaths in roughly the same proportion, the case fatality rate could easily be within the range observed in the rest of the world even though China’s figures are completely bogus.
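
The point about proportional underreporting can be checked with a few lines of arithmetic. The sketch below uses the rounded official figures quoted above; the undercount factors are purely hypothetical and are not estimates of anything.

```python
def case_fatality_rate(deaths: float, cases: float) -> float:
    """Deaths attributed to the disease divided by recorded cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases

# Rounded official figures quoted in the text.
reported_cases = 84_000
reported_deaths = 4_600

# Hypothetical scenario: the true counts are k times the reported ones,
# with the same factor k applied to both cases and deaths.
for k in (1, 5, 20):
    cfr = case_fatality_rate(reported_deaths * k, reported_cases * k)
    print(f"undercount factor {k:>2}: case fatality rate = {cfr:.1%}")
```

Because the factor cancels in the ratio, the case fatality rate is identical in all three scenarios, which is why it cannot tell us whether the underlying counts are complete.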

Another way to approach this problem is to separate the number of deaths from the number of cases and compare both as a proportion of each country’s population. Let’s start with the number of cases:


As you can see, China’s official number of cases per 100,000 is definitely on the low end, but there are plenty of countries with a similar or lower number of cases per 100,000 (it’s lower in 14 countries out of 208 in the dataset), so there is nothing particularly suspicious about it. This becomes even clearer if, instead of comparing China to every other country, we only look at countries in East Asia. This makes sense because the number of cases per capita is correlated within region, while there are stark differences between regions. Compared to its neighbours, China’s number of cases per 100,000 is unremarkable since almost half of the countries in that region have a lower number of cases:


As I’ve already noted, the number of cases in a country is a very noisy indicator, because it’s affected by all sorts of things that have nothing to do with the extent of the epidemic in that country. Nevertheless, if the Chinese government deliberately underestimated the number of cases in China, it’s hardly obvious by looking at the numbers.

The number of deaths ought to be a better indicator, because there’s less room for differences in definitions and policies to bias the comparison. Moreover, data about deaths are harder to manipulate, because dead people tend to get noticed. However, that is not to say that data about deaths are perfect, and there is plenty of noise even in the absence of deliberate manipulation. Not all countries have the resources to systematically test for SARS-CoV-2 in the recently deceased, which can make a comparison misleading even if no one is trying to fudge the figures. And even if every recently deceased person were tested everywhere, some countries will attribute any death to COVID-19 if the deceased tested positive, while others will try to determine the cause of death and only attribute a fatality to COVID-19 if they decide the person would not have died had they not been infected. These caveats notwithstanding, data on the number of deaths are probably not as noisy as data on the number of cases, so let’s see how China’s number of deaths per million compares to that of other countries:


As with the number of cases per capita, it’s on the low end, but many countries have a similar or even lower number of deaths per million (it’s lower in 40 countries out of 208 in the dataset), so China’s official death toll is hardly the red flag people think it is. And if we just compare China to its neighbours instead of the entire world, it becomes even less exceptional, since more than half of the countries in East Asia have fewer deaths per million than China:


Across East Asia, the number of deaths attributed to COVID-19 is remarkably low compared to Europe and the US. Of course, if you compare China to Italy or Belgium, it’s going to look suspicious, but this is only because you’re comparing it to outliers.

Besides, looking at the number of cases/deaths per capita is no panacea and can even be misleading. If one country has twice as many inhabitants as another, it doesn’t necessarily follow that it will have twice as many cases and suffer twice as many fatalities. This could only be expected if the virus appeared uniformly throughout a country, but it doesn’t. It first appears in one or more places—either because that’s where it originated (as in Wuhan) or because it was introduced from elsewhere—and then it spreads unless something is done to prevent it.

Compared to most other countries, China has a vast landmass and a huge population. But the virus has mostly been contained in Hubei and, even more specifically, in the Wuhan metropolitan area. This area is only home to between 0.8 and 4.1 percent of the Chinese population, depending on whether you include Wuhan itself, the surrounding metropolitan area or the province of Hubei. However, when looking at the number of cases/deaths per capita, we are dividing the number of cases/deaths by the entire population of China. This is bound to make China look better than most countries because most countries aren’t as large, so once the virus began circulating in one region, a much larger proportion of the population was at risk. For instance, having established itself in the region of Paris, the virus effectively threatened 18 percent of the French population. But when people compare the number of cases/deaths per capita in China with the number of cases/deaths per capita in France, people will routinely divide the number of cases/deaths in both France and China by their entire respective populations, as I have done here. Thus, even if France’s response to the pandemic had been more competent than China’s (which it assuredly was not), this alone would have resulted in a higher number of cases/deaths per million.
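
To see how much the choice of denominator matters, here is a rough sketch using only numbers quoted in this essay. It makes one deliberately crude assumption, flagged in the code: that every recorded death occurred inside the affected area. The function name is mine and the output is an illustration, not an estimate.

```python
def per_million(count: float, population: float) -> float:
    """Convert a raw count into a rate per million inhabitants."""
    return count / population * 1_000_000

china_population = 1_400_000_000  # "almost 1.4 billion" in the text
china_deaths = 4_600              # rounded official figure quoted in the text

# Share of the national population living in the affected area, from the text:
# 0.8 percent for Wuhan itself, 4.1 percent for the province of Hubei.
scenarios = (
    ("whole country", 1.0),
    ("Hubei only (4.1% of population)", 0.041),
    ("Wuhan only (0.8% of population)", 0.008),
)

for label, share in scenarios:
    # Crude assumption: all recorded deaths fall inside the chosen area.
    rate = per_million(china_deaths, china_population * share)
    print(f"{label:<34} {rate:8.1f} deaths per million")
```

The same death count yields a rate more than twenty times higher once it is divided by the population of the area where the virus actually circulated rather than by the population of the whole country.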

In almost every country, different regions are diversely affected by the pandemic, because the virus is not introduced everywhere simultaneously. Once things first start to go bad in the region or regions where the virus is circulating on a large scale, the government and people start taking steps to slow the spread by restricting movement, practicing social distancing and so on. The result is that regions where the virus was first introduced tend to be hit quite hard, while others are barely affected. This will bias a comparison that looks at the number of cases/deaths per capita against large countries with a large population distributed across many different areas. Now, in spite of what the official figures indicate, a lot of people insist that the Chinese epidemic was very serious outside Hubei. But that’s simply not true. Even if we allow for government manipulation of the numbers and the noisiness of COVID-19 data, it still ought to be obvious that nowhere else in China was hit nearly as hard as Hubei.

Had the authorities been forced to build a hospital in another province with the urgency that they constructed one in Wuhan, it’s hard to believe nobody would have noticed. Indeed, when there was a resurgence of the epidemic in Beijing recently, everyone immediately knew about it. Using phylogenetic evidence, a recent study of the epidemic in Guangdong, China’s most populous province, showed that most infections were imported from elsewhere and that local circulation was extremely limited. This is compelling evidence that, as the official figures suggest, China was able to limit the spread of the virus outside Hubei. In almost every country, it’s also the case that some regions were far more badly affected than others, but most countries are not as large and populous as China, so as I explained above, dividing the number of cases/deaths by the whole population is going to make the comparison with them misleading.

If we compare the number of cases per 100,000 in the rest of the world with the number of cases per 100,000 in Hubei, this is what it looks like:


As you can see, while the number of cases per 100,000 in China as a whole is on the low end compared to other countries, the number of cases per 100,000 in Hubei is just slightly below the median of the rest of the world, since 95 countries out of 208 in the dataset have fewer cases per 100,000.

Similarly, if we compare the number of deaths per million in Hubei with the rest of the world, we see that it’s well above the median since 148 countries out of 208 in the dataset have fewer deaths per million:


Now, this comparison is not fair either, because Hubei was the worst affected region in China. If we were to compare the number of cases/deaths per capita in Hubei with the number of cases/deaths per capita in the worst affected regions of other countries, China would look a lot better. Nevertheless, comparing Hubei to other countries illustrates just how misleading it can be to compare countries that differ vastly in size and population density, even if you try to make the comparison fairer by looking at the number of cases/deaths per capita.

If we were to ask a cave-dwelling hermit to look at the official figures for every country and guess which one had been cooking its books, he would almost certainly not pick China, but more likely a country like Vietnam or Taiwan, which are clear outliers. Nevertheless, it is claimed that China obviously faked its epidemic data because it’s not possible to contain the virus to the extent the official figures suggest. But this argument is simply question-begging, because the extent to which China was able to contain the outbreak within its borders is precisely what is at issue. If the effort to contain the virus wasn’t as successful as the official figures suggest then, obviously, the figures in question are not accurate. A demonstration that China manipulated the data about the outbreak requires independent evidence for that claim, and it won’t do to just repeat the claim in a different way. But that’s all people seem to offer on this point.

Given the extraordinary lengths to which the Chinese government went to suppress the virus once the occurrence of sustained human-to-human transmission was acknowledged, it’s not particularly surprising that the outbreak was quickly brought under control in Hubei and that circulation of the virus was very limited in the rest of the country. I don’t think most people in the West realise just how extreme China’s containment measures were, but this Twitter thread by Nicholas Christakis offers an illuminating overview—certainly, they were far more restrictive than anything imposed anywhere in the West, and much more strictly enforced. This should not surprise anyone given the nature of the Chinese regime.

Furthermore, although the severity of lockdown policies varied from region to region, they were by no means limited to Hubei. According to a New York Times analysis published in April based on announcements made by provinces and major cities, residential lockdowns of varying strictness then covered a staggering 760 million people, more than half the country’s population. Christakis and his team estimated that more than 930 million people, two-thirds of the population, were subjected to movement restrictions of some kind. And while testing capacity was insufficient in Wuhan early on, in the rest of the country, where the authorities were not caught by surprise to the same extent, testing seems to have taken place on a massive scale. According to the paper on Guangdong, 1.6 million tests were conducted in the province between January 30th and March 19th, only 1,388 of which came back positive. So the rate of positive tests during this period was less than 0.1 percent, which is far lower than almost anywhere else and indicates that testing was very extensive.

Ben Hunt’s theory

This thread on Reddit has been widely cited as evidence that China’s data have been falsified, and this blog post by a political scientist named Ben Hunt, which summarises the same argument, has also received a lot of attention. The author of the original Reddit post showed that, if we train a quadratic model on China’s official data about the number of cases between January 20th and February 4th, the fit is almost perfect. The prediction of the model remains pretty good for a few days after February 4th, before diverging radically as the number of new cases rapidly drops:
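
The exercise described above is easy to reproduce in outline. The sketch below is not the code from the Reddit post or from my own analysis; it fits a quadratic to the first 16 days of a synthetic cumulative curve (a logistic curve with made-up parameters, standing in for an outbreak that is brought under control) and then extrapolates.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination of a fit."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - np.mean(observed)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic cumulative counts (NOT real data): rapid growth that later flattens.
days = np.arange(0, 46)
cumulative = 80_000 / (1 + np.exp(-(days - 22) / 4.0))

# Training window of 16 days, analogous to January 20th to February 4th.
train = days <= 15
coeffs = np.polyfit(days[train], cumulative[train], deg=2)

fit_quality = r_squared(cumulative[train], np.polyval(coeffs, days[train]))
print(f"R^2 on the training window: {fit_quality:.3f}")
print(f"day 45, quadratic model:    {np.polyval(coeffs, 45):,.0f}")
print(f"day 45, synthetic series:   {cumulative[45]:,.0f}")
```

The quadratic tracks the early part of the synthetic curve closely and then overshoots badly once growth slows, even though nothing in the synthetic series was "adjusted" by anyone.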


The same thing happens if, instead of looking at the number of cases, we look at the number of deaths using data between January 20th and February 4th as our training sample. The prediction is pretty good for a few days, then starts underestimating the number of deaths by a few hundred and eventually starts wildly overestimating the number of deaths as the growth in the death toll begins to slow rapidly around February 24th:


The model’s predictions don’t remain accurate for very long and are not really accurate at all unless we arbitrarily focus on just one part of the curve. When this is pointed out, people like Hunt reply that it’s “what you’d expect from a politically adjusted epidemic model over time… at some point you have to show a rate-of-change improvement from your epidemic control measures.” Be that as it may, it’s also exactly what you’d expect to see if China had suddenly taken radical steps to bring the epidemic under control, which is precisely what it did. This is not evidence of fraud, it’s just sloppy analysis presented as evidence of fraud, which is not the same thing at all. Hunt has not, as far as I know, accused South Korea of fraud, but the case rate predicted by the model fits much the same pattern:

When looking at the number of South Korean deaths, the model remains relatively good for a longer period of time, but if we looked at it beyond April 1st, the number of deaths would also collapse:


Besides the divergence of the model’s predictions and the official figures, Hunt claims that it “should be impossible” for the quadratic model to fit the case numbers in the early period of the epidemic because, until they are brought under control, epidemics are exponential not quadratic. The problem with this claim is that 1) it’s not true and 2) even if it were, Hunt’s conclusion still wouldn’t follow. First, the early growth dynamics of epidemics are only exponential in naive epidemiological models. More sophisticated models (that don’t, for instance, assume homogeneous population mixing) predict sub-exponential growth patterns even in the absence of factors that mitigate the transmission rate over time. This is often what data about the early phase of real epidemics show. So the fact that a quadratic model fits the Chinese data during the early stages of the outbreak is not particularly surprising and certainly isn’t evidence that China manipulated the data.
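
A simple way to see that sub-exponential growth can look exactly quadratic is the phenomenological growth equation dC/dt = r * C^p, where p = 1 gives exponential growth and p below 1 gives slower, polynomial growth. This is a stand-in chosen for illustration, not one of the mechanistic models referred to above, and the parameter values below are hypothetical.

```python
import numpy as np

def generalized_growth(t, r, p, c0):
    """Closed-form solution of dC/dt = r * C**p with C(0) = c0, for 0 <= p < 1."""
    m = 1.0 / (1.0 - p)
    return (r * t / m + c0 ** (1.0 / m)) ** m

def max_quadratic_fit_error(t, series):
    """Largest absolute gap between a series and its least-squares quadratic."""
    coeffs = np.polyfit(t, series, deg=2)
    return np.max(np.abs(np.polyval(coeffs, t) - series))

t = np.arange(0, 31, dtype=float)
r, c0 = 4.0, 10.0  # hypothetical growth parameter and initial case count

sub_exponential = generalized_growth(t, r, p=0.5, c0=c0)  # p = 1/2
exponential = c0 * np.exp(0.2 * t)                         # for contrast

print("sub-exponential (p = 0.5):", max_quadratic_fit_error(t, sub_exponential))
print("exponential:              ", max_quadratic_fit_error(t, exponential))
```

With p = 1/2 the solution is (r*t/2 + sqrt(c0))^2, a quadratic in t, so the fit error is zero up to rounding, whereas the truly exponential series cannot be matched by any quadratic over the same window.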

Even if it were true that, during the early phase, epidemics always grow exponentially, the fact that China’s official data (and South Korea’s) fit a quadratic model during this period doesn’t mean the data have been fudged. Indeed, even if the actual number of infections and deaths grew exponentially in the early phase, this wouldn’t necessarily be reflected in the number of recorded infections and deaths. As I have already pointed out, even in the absence of deliberate manipulations, data about the number of cases and even deaths are very noisy. Case data, in particular, may say more about the state of a country’s testing capacity than about the actual number of infections, especially in the case of a novel pathogen for which there is no existing stock of reagents needed for testing by PCR. So, while Hunt’s argument may look convincing prima facie, it doesn’t actually prove anything.

Benford’s law

A few months ago, some people analysed the official Chinese data to see if they obeyed Benford’s law and found that they didn’t (here is one example). This strengthened the already widespread belief that China was lying about how bad its outbreak was.

Benford’s law states that, in many naturally occurring numerical datasets, the leading digit—that is, the first digit in a number that is not zero (e.g., 3 in 345)—occurs with predictable frequency. This is counterintuitive—one would expect leading digits in datasets to be distributed randomly, but Benford’s law predicts that 1 occurs most frequently, with 2 through 9 declining in frequency in a specific way. There is no theoretical reason to expect data to obey this law—it’s just that, as a matter of empirical fact, they often do. Consequently, Benford’s law has been successfully applied to detect fraud in elections, finance, accounting, etc.
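
For reference, the "specific way" in which the frequencies decline is given by the standard statement of Benford's law for the leading digit d:

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \dots, 9
```

This gives approximately 30.1 percent for 1, 17.6 percent for 2, 12.5 percent for 3, 9.7 percent for 4, 7.9 percent for 5, 6.7 percent for 6, 5.8 percent for 7, 5.1 percent for 8, and 4.6 percent for 9.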

If we knew that, in the absence of fraud, the number of cases/deaths attributed to COVID-19 should grow exponentially, there would actually be a reason to expect the data to obey Benford’s law. However, as I’ve just noted, there is no reason to expect the number of cases/deaths to grow exponentially, even in the early phase of the epidemic. Insofar as the failure of a country’s official data to obey Benford’s law is suspicious, it’s only because the law often seems to hold even though we generally don’t know why. This means that, even if a country’s data about the COVID-19 epidemic don’t obey Benford’s law, it would hardly be conclusive evidence that it engaged in fraud. On the other hand, my sense is that it wouldn’t be easy to fabricate epidemic data that obey Benford’s law, so the fact that a country’s data obey Benford’s law, while hardly conclusive, offers some evidence that it didn’t engage in fraud.
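
The link between exponential growth and Benford's law can be illustrated with a toy series. The sketch below uses a hypothetical sequence that doubles at every step (it is not epidemic data) and compares the frequency of each leading digit with the Benford prediction.

```python
import math
from collections import Counter

def leading_digit(n: int) -> int:
    """First non-zero digit of a positive integer."""
    return int(str(n)[0])

# Hypothetical exponentially growing series: 1, 2, 4, 8, ...
series = [2 ** k for k in range(1000)]
counts = Counter(leading_digit(x) for x in series)

for d in range(1, 10):
    observed = counts[d] / len(series)
    expected = math.log10(1 + 1 / d)
    print(f"digit {d}: observed {observed:.3f}, Benford {expected:.3f}")
```

The observed frequencies line up closely with the Benford prediction, which is the sense in which exponential growth would give a reason to expect the law to hold.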

Several papers have since reanalysed China’s data (here, here, and here) and concluded that they did obey Benford’s law after all. For good measure, I conducted my own analysis using the daily number of new cases/deaths according to the ECDC. As you can see below, where I show the result for the daily number of new cases in a selection of countries, it confirms that not only did China’s data obey Benford’s law, but they did so better than the data from any other country (this is probably because China has more days of data, since the pandemic started there):

A chi-square test leads to the rejection of the hypothesis that the data-generating process obeys Benford’s law in Brazil, Canada, Denmark, France, Germany, Italy, Japan, Spain, Sweden, and the United Kingdom. The p-value is also suspiciously low in New Zealand and Thailand.
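
For readers who want to see what such a test involves, here is a minimal sketch of a chi-square goodness-of-fit test against Benford's law. It is not the code used for the analysis in this essay: the two input series are made up, and instead of computing a p-value it compares the statistic with 15.507, the standard 5 percent critical value of the chi-square distribution with 8 degrees of freedom.

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
CRITICAL_VALUE_5PCT_DF8 = 15.507  # chi-square, 8 degrees of freedom

def benford_chi_square(values):
    """Chi-square statistic of the leading digits of `values` against Benford."""
    # Zero counts have no leading digit, so they are skipped.
    digits = [int(str(v)[0]) for v in values if v > 0]
    n = len(digits)
    observed = Counter(digits)
    return sum((observed[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())

# Made-up series, NOT real data.
roughly_exponential = [round(1.07 ** k) for k in range(20, 220)]
uniform_leading_digits = list(range(100, 1000))

for name, series in (("roughly exponential", roughly_exponential),
                     ("uniform leading digits", uniform_leading_digits)):
    stat = benford_chi_square(series)
    verdict = "rejected" if stat > CRITICAL_VALUE_5PCT_DF8 else "not rejected"
    print(f"{name:<24} chi-square = {stat:8.2f}  Benford {verdict} at 5%")
```

With only nine categories and a few dozen to a few hundred daily observations per country, the test is sensitive to quirks of reporting, which is worth bearing in mind when reading the list of countries above.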

Here is another chart that shows the result of the analysis for the daily number of new deaths in the same countries:

Once again, China’s data obey Benford’s law, but a chi-square test suggests a violation of that law in Australia, Brazil, Canada, Denmark, Italy, Japan, New Zealand, and South Korea. The p-value is also suspiciously low in Thailand.

Since we have no theoretical reasons to expect the data to obey Benford’s law in any country, these violations don’t show that those countries engaged in fraud, and I don’t believe they did. In the full dataset, Benford’s law is violated in about 42 percent of the countries for the number of cases and 56 percent of them for the number of deaths at the conventional level of significance, which suggests that it’s not a good test of fraud in the case of the COVID-19 pandemic. Nobody is about to accuse Denmark or New Zealand of fraud on the basis of this analysis, but that’s exactly what everybody would have done had China’s data violated Benford’s law. There is no good reason for this double standard.

Evolving definitions and methodologies

At the beginning of April, the Chinese health authorities announced they would start including asymptomatic cases in the official number of cases they release every day. This announcement renewed speculation that China had been deliberately manipulating its numbers to make the epidemic seem contained when it wasn’t really under control. However, China’s exclusion of asymptomatic cases from its figures had been known about since February and was the subject of a fair bit of media discussion at the time. The New York Times published a story about it on February 12th, and Nature published a piece about it on February 20th, in which some experts were critical while others argued that the practice made sense.

Personally, I don’t think the policy of excluding asymptomatic cases from official counts made any sense at all, but nor do I think the Chinese health authorities adopted this policy to conceal the true extent of the epidemic. This was still early days, so nobody knew much about the virus yet and the role played by asymptomatic carriers in transmission was still unclear. (In fact, even five months later, the role of asymptomatic carriers in transmission is still being debated.) The Chinese health authorities were clearly struggling to determine how to count infections—between January 15th and February 20th, they revised the definition of a case seven times.

The official number of cases would certainly have been significantly higher had China used the most inclusive definition throughout the epidemic, but not all revisions of the definition resulted in a lower number of cases. In particular, as the New York Times observed, on February 13th, the health authorities decided to start counting people in Hubei province who had symptoms consistent with COVID-19, even if their diagnosis hadn’t been laboratory-confirmed. This change predictably produced a large spike of cases that day. They did this because there was a shortage of PCR tests in Hubei at the time, so requiring such a test before someone could be counted would artificially suppress case figures. This is not the kind of decision made by people trying to hide the true extent of an epidemic.

As a matter of policy, several countries, including France, are not even testing asymptomatic people and nobody is accusing them of fudging their numbers. (In fact, at the peak of the epidemic, it was French policy to only test people with severe symptoms.) Back in March, those who attacked China’s decision not to count asymptomatic cases also speculated that the epidemic was still raging in several parts of the country, but we now know this wasn’t the case. There have been several localised resurgences of the epidemic in China since the end of March, such as the flare-up in Beijing in June (which forced the authorities to cancel flights and shut down schools again), but so far the authorities have been able to contain all new outbreaks pretty quickly and haven’t tried to hide them as far as we can tell.

The Wuhan urns

Towards the end of March, funeral homes in Wuhan reopened after two months of lockdown and photographs appeared in Caixin showing a long line of residents waiting to collect the ashes of their dead, as well as a truck apparently delivering 2,500 empty urns to a funeral parlour. Since the city’s official death toll was only 3,869, this fed speculation that China had been systematically undercounting its COVID-19 dead. As various foreign media outlets such as Radio Free Asia reported, Chinese social media users estimated that between 42,000 and 46,800 people had died in Wuhan during the lockdown. According to China’s National Bureau of Statistics, the death rate in Hubei was seven per 1,000 in 2018 (the most recent year for which data are available), so approximately 6,400 people would have been expected to die in Wuhan, a city of 11 million, during an ordinary two-month period. (Since mortality is higher in winter and can vary quite a lot year-to-year, this figure could be a bit higher, but the number of deaths by month is not available, so it’s not possible to remove the seasonal effect.) It means that, even under conservative assumptions, the estimates circulating on social media imply that excess mortality due to COVID-19 was somewhere between 35,600 and 40,400. The official number of deaths attributed to COVID-19 underestimates excess mortality in many countries, but not by a factor of 10, so if those estimates were reliable it would almost certainly mean that China’s authorities lied about how many people died of COVID-19.

The problem is that we have no reason to take those estimates seriously and every reason to dismiss them as baseless. First, they are extrapolations from figures that are very thinly sourced and informed by a number of assumptions that are unclear and dubious. For instance, according to Radio Free Asia, “social media posts have estimated that all seven funeral homes in Wuhan are handing out 3,500 urns every day in total.” Since “funeral homes have informed families that they will try to complete cremations before the traditional grave-tending festival of Qing Ming on April 5th, which would indicate a 12-day process beginning on March 23rd,” social media users just multiplied 3,500 by 12 and arrived at an estimate of 42,000. We have no idea how the estimate of 3,500 urns a day was obtained, so we have no reason to think it’s accurate and, even if it is, this figure was estimated in the first days after funeral homes in Wuhan reopened (Radio Free Asia’s story was published on March 27th) and we have no reason to think that funeral homes continued to hand out urns at the same rate after that. Were people not already convinced that China is lying about the outbreak, nobody would take this kind of conjecture seriously. In fact, if Chinese state media were publishing this kind of speculation about the epidemic in the US, it would rightly be dismissed as propaganda.

If excess mortality due to COVID-19 were really as high as those estimates suggest, it would make the Infection Fatality Rate (IFR) in Wuhan astronomical. According to a serological survey conducted on 714 healthcare workers in Wuhan between March 30th and April 10th, the results of which were recently published in Nature, seroprevalence in that group was 3.8 percent. This is not a random sample but, if anything, healthcare workers were probably among the people most likely to be infected. If we accept that excess mortality due to COVID-19 was between 35,600 and 40,400 in Wuhan, and that roughly 3.8 percent of the city’s 11 million residents had been infected, it means the IFR must have been between 8.5 percent and 9.7 percent. This is much higher than any estimate of the IFR anywhere in the world; a recent meta-analysis puts it at around 0.66 percent. So the social media estimates are simply implausible on their face. The IFR indicated by the official figures, on the other hand, is approximately 0.9 percent, which is within the range of what has been observed elsewhere.
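
The arithmetic behind these IFR figures is straightforward, and the sketch below reproduces it from the numbers in the text. The key assumption, as above, is that the 3.8 percent seroprevalence measured among healthcare workers is applied to Wuhan’s whole population of 11 million; the function name is mine.

```python
def implied_ifr(deaths: float, population: float, seroprevalence: float) -> float:
    """Deaths divided by the number of people estimated to have been infected."""
    infections = population * seroprevalence
    return deaths / infections

wuhan_population = 11_000_000
seroprevalence = 0.038  # healthcare-worker survey, applied to the whole city

scenarios = (
    ("social media estimate, low end", 35_600),
    ("social media estimate, high end", 40_400),
    ("official death toll", 3_869),
)

for label, deaths in scenarios:
    ifr = implied_ifr(deaths, wuhan_population, seroprevalence)
    print(f"{label:<32} implied IFR = {ifr:.1%}")
```

If true seroprevalence in the general population was lower than among healthcare workers, all three implied rates would be higher, which makes the social media estimates even harder to credit.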

Of course, since it comes from China, many will simply reject the seroprevalence estimate out of hand. But those findings are broadly consistent with the results of another study published by a team of researchers from Hong Kong, which found that, of 452 Hong Kong residents evacuated from Hubei (80 percent of whom came from Wuhan) on March 4th–5th, four percent were seropositive. The social media estimates also imply that there were between 3,236 and 3,672 deaths per million in Wuhan. This is significantly more than in New York City, where there were approximately 2,901 deaths per million between March 11th and May 2nd. Are we really supposed to believe that more people died in Wuhan, where the lockdown was incredibly strict and the sick were quarantined away from even their family in dedicated centres, than in a city where only a very lax shelter-in-place order was issued, and in a state where the governor ordered recovering COVID-19 patients to be sent to nursing homes?

A late revision of China’s death count

On April 17th, the Chinese health authorities revised the number of deaths in Wuhan and added 1,290 deaths to the official count. There are those who think that this too provides evidence that China had been hiding these deaths and belatedly decided to add them to the official tally in order to make it more realistic. But this is, frankly, a ridiculous argument. Since China already stood accused of manipulating its data, the Chinese authorities must have known that any revision would only fuel rather than dispel existing suspicion, which is exactly what happened. That the revision increased the number of deaths by exactly 50 percent is also thought to indicate manipulation, but nobody, as far as I know, has explained why. Presumably, it’s a Bayesian calculation that an increase of exactly 50 percent is more likely to have been manufactured than produced by chance. But another Bayesian might equally counter that a regime smart enough to conceal thousands of unrecorded deaths would hardly then seek to revise its fraudulent total with a nice round number.

A more likely explanation is simply that, in the middle of the outbreak, it was very difficult for the local health authorities to accurately count the number of people who died of COVID-19 (not least because they lacked tests). Then, after things calmed down, they tried to identify people who had died of COVID-19 outside hospitals but had not been counted. So, they looked through their records to find COVID-19 deaths they’d missed and revised the number of fatalities as best they could. Health authorities in many Western countries have also made large revisions to their numbers of dead, including France, Italy, Spain, the UK, and New York, but have not been accused of manipulating the data.

No doubt further revisions will be required as health authorities everywhere take the time to look at their data more closely. It’s likely that even China’s revised death toll still underestimates how many people died of COVID-19, but as excess mortality analyses show, this is true almost everywhere. It wouldn’t be surprising if China’s data on the number of deaths caused by COVID-19 were somewhat worse than similar data in Western democracies, not because China falsified its numbers, but because China’s vital statistics are still very poor compared to what is available in the West. China has a huge population and is still relatively poor, so it didn’t even have a mortality surveillance system that provided representative data on both the total number of deaths and the distribution by cause-of-death until 2013. The system established after that remains sample-based and only covers 24 percent of the population.

China’s data no doubt paint a very imperfect picture of its epidemic. But that’s true everywhere, and I haven’t seen any evidence that the problem is worse in China. There may well have been instances of manipulation here and there—that wouldn’t be at all surprising in such a vast authoritarian country with a largely decentralised structure beset by corruption. But the claims of centrally directed wholesale manipulation of data remain entirely unsubstantiated.


In this series of essays, I have examined in detail the accusations made against China in connection with the COVID-19 pandemic. I have concluded that there is a grain of truth to some of them—mistakes were certainly made in the early days of the crisis and the Chinese authorities have not always been forthcoming with information about the epidemic. Nevertheless, a careful review of the evidence suggests that most of the allegations are either exaggerated, unsubstantiated, or nonsensical, and sometimes they are all three. In particular, the claim that China is somehow responsible for the botched response to the pandemic in most Western countries doesn’t withstand even cursory scrutiny. Yet this claim continues to be made—not only by government officials eager to scapegoat China for their own lamentable failures, but also by journalists and citizens who ought to be more concerned about how badly their own countries have been misgoverned during this public health emergency.

I have highlighted several instances in which Western officials and journalists have misrepresented or distorted evidence. This may be a consequence of confirmation bias, fear of being accused of helping China, or a tacit assumption that, since the Chinese regime is evil and hated, there’s nothing particularly wrong with dissembling to make it look bad. But, whatever the reason, this disregard for accuracy is dangerous, particularly on the part of journalists, who ought to at least strive to pursue truth irrespective of their personal ideological leanings. And it has contributed to a feedback loop I have observed over the past few months—people blame China for the pandemic because they adopt low evidentiary standards when it comes to accusations against China, which makes them hysterical about China, which in turn leads them to further lower their evidentiary standards, which makes them believe even more nonsensical accusations against China, etc. If people would only pause to consider whether or not the accusations against China make sense, they might realise that many of them do not.

As I wrote in the introduction to this series, there are many reasons to dislike and distrust the Chinese regime. But when dislike and distrust disable the ability to parse evidence and think clearly, they disfigure our understanding of reality. Hatred of the Chinese regime has become so strong and pervasive in the West—especially in the US, where China is seen as the country’s main geopolitical foe—that it creates incentives that allow unsubstantiated allegations to spread largely unchecked. Indeed, not only does this prejudice mean that people adopt a lower evidentiary standard to examine such allegations, but anyone who points out they are unsubstantiated risks being accused of being China’s dupe. As the rivalry between the US and China grows, we should expect disinformation about China to become increasingly common. This is especially true since, as we have seen repeatedly in these essays, China hawks in the US administration are clearly trying to influence public opinion about China by leaking misleading information. China’s regime is appalling in many ways, and it’s understandable that people feel no sympathy toward it, but this fact should not make us accept dubious claims just because they fit our preconceptions. On the contrary, knowing that we feel that way and that it will unconsciously make us less cautious when evaluating claims that cast China in a dark light, we should be extra careful before we accept such claims. The stakes couldn’t be higher.


Philippe Lemoine is a PhD candidate in philosophy at Cornell University. He maintains a blog where he writes about politics, philosophy, and social science. You can also follow him on Twitter @phl43.

Feature image by Kobu Agency on Unsplash.

CORRECTION: The initial version of this article incorrectly stated that Hubei’s number of cases per capita is higher than the median of the rest of the world, when in fact both the chart above that sentence and the passage immediately after it show that it’s slightly below the median. Overall, Hubei still looks significantly worse than the rest of the world. Quillette apologises for the error.


  1. The China Syndrome Part IV: Did China intentionally mislead the world, hide information, downplay early information, withhold samples of the virus, obfuscate, allow infected nationals to fly from Wuhan to points all over the world, restrict access to the Bio Lab, etc. etc?

    There, I fixed it for you.

    The author uses open source intel (Reddit, Twitter, etc) to argue about death rate reporting, and obfuscates the CCP’s culpability in evolving an epidemic into a pandemic in the process.

  2. Since China already stood accused of manipulating their data, the Chinese authorities must have known that any revision would only fuel rather than dispel existing suspicion, which is exactly what happened.

    The author gives the chicoms way too much credit. They are not playing three dimensional chess.

    They lied, and the only question is, by how much.

  3. Thanks for this. Even as someone skeptical of the “blame China” narrative, I still believed that cases and deaths in China were probably significantly undercounted, but these international comparisons show that they are not outliers and probably something approximating the truth.

    I remember back in March a lot of the hysteria that led to the severe lockdowns was everyone arguing that China’s numbers were fudged and we should trust the worst numbers out of Italy instead. It wasn’t just the 40,000 urns, people were seriously arguing that millions of people had died in China based on cell phone records. Now thankfully it looks like Italy and the other worst-hit European countries were the outliers (perhaps older demographics, and getting a worse strain). Ironically we probably would’ve done less damage to the economy in March and instead gone with more moderate social distancing measures like we have today if we’d believed the Chinese numbers.

  4. Thank you Mr. Lemoine. This article was informative on a complex topic that, as you illustrate, is tossed about with opinions and conjecture regularly. You addressed the data well.

    As with the article on New Zealand published on Quillette a short while ago, your analysis points to the responses and activities that are effective against a pandemic. As much as we in the west hate to admit, an authoritarian state can be effective against a pandemic by simply instituting a complete lockdown and enforcing it. Considerations for individual freedoms curtailed, potential long-term economic damage set aside and the transmission of the virus halted. A fast, hard response again may be worth much more than the perfect response. A hard response for weeks rather than a soft response for months. If another deadly virus hits in seven years (it will +/- five years), what will the responses and results be in China, US, New Zealand, Sweden, Brazil, Australia, …?

  5. An interesting set of essays. Thank you Philippe.
    Your warning in the final paragraph is important but I fear will fall on deaf ears for the most part in the US (especially in an election year). Hatred of the Chinese regime is not as prevalent here in the UK, but it’s palpable and has already prevented politicians and health professionals from acting in our best interests.
    I hadn’t heard of Benford’s Law. Fascinating. However I have read a number of articles that say the virus has behaved pretty much as you would expect a virus to behave in just about all countries where figures are available, including China.
    It’s good that you point out just how large the population is in China and that overall it is still a relatively poor country and it isn’t that straightforward to collect data. People forget this when they allow their dislike of the regime to get in the way.
    FWIW we in the UK still don’t think our covid death numbers are accurate. I think we’ll only have a good picture next March when we can look back at excess deaths and make a more educated guess … because a guess it will be.

  6. I’ve been tracking COVID case and death statistics for a number of countries since April 17 using the data found on the “Our World In Data” website. Since April 17, China has reported only 96 additional deaths. Since those deaths are not broken down on this site by city or region, we don’t know where these deaths occurred. Regardless, for a population the size of China’s and for the virulence of the initial outbreak, and assuming that most of these 96 deaths occurred in Wuhan, this level of infection and subsequent deaths seems too good to be true, even for as repressive a regime as China’s. My skepticism regarding China’s acting in good faith throughout this pandemic is unabated.

  7. A small imprecision: Benford’s law has good theoretical grounds. For example, in an exponential growth, the expected distribution of digits can be readily calculated, and will work on real cases. A simple explanation is on the Wikipedia page.
    Of course in most situations we don’t know the underlying theoretical model, so this law will be more delicate to use.

    Apart from this imprecise statement, the following calculations in the article seem sound (and make use of the theoretical predictions of Benford law). Please fix.

  8. Simple answer to the simple question in the title:

    “These aren’t the droids you are looking for. You can move along.”

  9. Why do you compare China’s infection numbers with small poor countries like Laos, Cambodia, Thailand and Mongolia, which are likely undercounting infection cases themselves? If China’s infection rate is the same as, say, Hong Kong’s or South Korea’s, then its total infected number may be higher than 300000.
    Just like Syria claims to have 3000 cases, but both Iran and Iraq have over 300000 cases.
    As for death counts being hard to fudge, 10 years after the Great Leap Forward, nobody knew how many had died.
