If The Findings Detract, You Must Retract

In 2020, five psychologists asked the editors of PNAS to retract their study of racial bias in police shootings. PNAS, which stands for Proceedings of the National Academy of Sciences, is one of the most prestigious multidisciplinary journals in the world. Retraction is an outcome no scholar wishes to experience because it signals a serious research error and, as such, entails considerable reputational damage.

No study is perfect. Falsification, i.e., disproving prior research, is a normal part of scientific progress. However, journal articles do not get retracted simply because they were wrong or imperfect. Typical reasons include data fabrication or a fundamental mistake in the analysis. So why did Johnson, Tress, Burkel, Taylor, and Cesario ask for their study to be retracted? Did they fudge the data? Was there a major mistake in the analysis? No and no.

Was it political?

Some observers have suggested that the retraction was politically motivated. The study, which showed no evidence of racial bias in police shootings, had been used in political debates in ways that challenged calls for radical police reform, calls that had grown louder in the aftermath of the murder of George Floyd. Heather Mac Donald, a research fellow at the conservative Manhattan Institute, claimed the article was retracted because she had cited it in a congressional hearing and in essays published in the Wall Street Journal and other right-leaning media outlets. Others denied this claim. Most importantly, Dr. Joseph Cesario, the senior author of the retracted article, wrote in response to Ms. Mac Donald: “We retracted the paper because we overstepped with the inferences we made from our data.” In other words, he maintained that the retraction was motivated not by politics but by purely scientific considerations.

Normally, one would be inclined to take the authors’ word for it. But the process by which this retraction took place was anything but normal. It involved several rounds of debate during which the authors acknowledged a minor inaccuracy but consistently defended the basic integrity of their research. At one point, the PNAS editorial board commissioned an independent panel of experts to evaluate the charges leveled against the article. This post-publication review concluded that the study “does not contain fabricated data or serious statistical errors warranting a retraction.” Despite this expert judgement, the authors themselves asked for their article to be retracted.

Outside the editorial process, there is clear evidence of political passions at play. As part of their campaign to have the results of the study nullified, the critics of the PNAS article organized a petition signed by 871 academics and research scholars, many of whom are elite members of the social scientific community. The list includes Ethan Bueno de Mesquita, the Dean of the University of Chicago’s Harris School of Public Policy; Gary King, the Director of the Institute for Quantitative Social Science at Harvard University; and, oddly enough, Douglas Massey, who was the PNAS editor in charge of managing this controversy. The language of the petition states that “[m]isleading statistics have been used to justify racial injustice in the past. They should not be used to do so now. This faulty research should not be relied upon in the current debate over policing reform.”

The editorial statement from PNAS was, if anything, even more explicit about the political motives behind the retraction: “The problem that exists now, however, is outside the realm of science. It has to do with the misinterpretation and partisan political use of a scientific article after its publication” (italics added).

Which of these accounts is correct? Was the retraction politically motivated or scientifically justified? To resolve the matter, it’s helpful to start with the central point of agreement: the nature of the problem with the PNAS article.

The error of their ways

The point of the PNAS study was to examine racial bias in lethal police shootings. The results showed that, given the racial distribution of the civilian victims of police shootings, there was no evidence that white officers were any more likely than black officers to target black victims. In other words, the allocation of victims to officers was essentially random with respect to race.

For every published article, PNAS asks authors to write a “significance statement” that serves as a short summary of the study intended for the lay reader. The mistake at the heart of the controversy has to do with one inaccurate sentence in the significance statement: “[W]hite officers are not more likely to shoot minority civilians than nonwhite officers.” For someone unfamiliar with the study, this sentence gives the false impression that the study estimated the overall probability of, for example, black people being shot by white police officers.

As noted by the critics, it is not possible to conclude anything about the “likelihood” of a police officer shooting a civilian without data on all police-civilian encounters, including encounters that did not result in a shooting. The likelihood, i.e., the probability, of an event is calculated as the number of times the event occurs divided by the number of times it could have occurred. Since the study was limited to data on fatal shootings and did not include the necessary denominator, it cannot speak to the overall civilian risk of being killed by the police.
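A small worked example may make the denominator problem concrete. The Python sketch below uses entirely hypothetical counts, invented for illustration and not drawn from either study, of encounters and fatal shootings broken down by officer group and civilian group. It computes two different quantities: the minority share of victims among fatal shootings by each officer group (roughly the kind of quantity that shooting-only data can support) and the rate of fatal shootings per encounter (the “likelihood,” which requires the encounter denominator). The names `COUNTS`, `minority_share_of_victims`, and `shooting_rate` are illustrative only.

```python
from fractions import Fraction

# HYPOTHETICAL counts, invented purely for illustration; not data from either study.
# (officer_group, civilian_group) -> (encounters, fatal_shootings)
COUNTS = {
    ("white", "minority"): (1000, 10),
    ("white", "white"): (4000, 20),
    ("nonwhite", "minority"): (2000, 5),
    ("nonwhite", "white"): (1000, 10),
}


def minority_share_of_victims(officer: str) -> Fraction:
    """Uses shooting counts only: share of this officer group's
    fatal-shooting victims who are minorities."""
    minority_shot = COUNTS[(officer, "minority")][1]
    white_shot = COUNTS[(officer, "white")][1]
    return Fraction(minority_shot, minority_shot + white_shot)


def shooting_rate(officer: str, civilian: str) -> Fraction:
    """Needs the denominator: fatal shootings divided by all encounters
    between this officer group and this civilian group."""
    encounters, shootings = COUNTS[(officer, civilian)]
    return Fraction(shootings, encounters)


for officer in ("white", "nonwhite"):
    print(
        officer,
        "minority share of victims:", minority_share_of_victims(officer),
        "rate per minority encounter:", shooting_rate(officer, "minority"),
    )
```

With these made-up numbers, one third of the people fatally shot by each officer group are minorities, yet white officers shoot minority civilians at four times the per-encounter rate of nonwhite officers. Other invented denominators could reverse that gap or erase it; the point is only that victim counts alone cannot tell these scenarios apart.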

In the article itself, the authors are exceedingly clear about this issue: “[I]t is important to be clear at the outset that our analyses speak to racial disparities in the subset of shootings that result in fatalities, and not officers’ decisions to use lethal force more generally.” In fact, the defining purpose of their research was to provide an alternative way to examine the “racist cop” hypothesis: “[W]e approached racial disparity from a different angle and asked: What factors predict the race of a person fatally shot by police?” They concluded the article by noting: “One limitation of our results is that they only focus on officers who fired at a civilian that was fatally wounded.”

Given their keen awareness of the issue, it was easy for Johnson, Tress, Burkel, Taylor, and Cesario to admit to the inaccurate wording in their article’s significance statement: “Although we are clear about the quantity we estimated … our language in the significance statement of our report should be more careful.” Following this acknowledgement, PNAS issued a correction to the article, alerting readers to the misleading sentence. The authors’ correction note includes this clarification: “[T]his sentence should read: ‘As the proportion of White officers in a fatal officer-involved shooting increased, a person fatally shot was not more likely to be of a racial minority.’ … To be clear, this issue does not invalidate the findings … discussed in the report.”

In short, there was a point when the authors recognized their mistake but thought issuing a correction was the appropriate remedy. As noted, this was also the judgement rendered by the PNAS expert review. Nevertheless, a short while later, the authors asked for their article to be retracted. Given the facts, it does look like they changed their mind under duress after having been labeled as enemies of racial justice by 871 peers of considerable distinction.

Method of difference

How do we resolve this dilemma? Suppose the retraction was politically motivated and scientifically unjustified. How would one prove it? After all, reasonable minds can disagree. It is entirely possible to take the position that the error in the significance statement is sufficient grounds for retracting an article. Personally, I think this is a rather harsh approach given that, except for that one sentence outside the body of the article, the study made a valuable contribution. The results, as reported in the article proper, were entirely correct.

If only there were another peer-reviewed article that was more or less identical to the PNAS article in all but one respect: it was not used in political debates by Heather Mac Donald or any other conservative policy advocate. Ideally, this hypothetical article would meet the following three criteria: (1) it investigated the racial composition of victims and officers involved in lethal police shootings; (2) it found no evidence of anti-black bias; and (3) it gave the same misleading summary of the findings as the PNAS article did.

In 2019, only a few months prior to the publication of the PNAS article, Public Administration Review published a study by Menifield, Shin, and Strother. Like the authors of the PNAS study, they were interested in racial disparities in the police use of deadly force. Like the PNAS article, they examined the racial composition of the victims killed by white vs. nonwhite officers. And like the PNAS study, they found no evidence of racial bias: “[W]hite police officers actually kill black and other minority suspects at lower rates than we would expect if killings were randomly distributed among officers of all races.”

But how did Menifield, Shin, and Strother summarize their results? Consider this sentence from the abstract, the scientific summary of the article:

[W]hite officers appear to be no more likely to use lethal force against minorities than nonwhite officers.

Compare that statement to the misleading sentence from the significance statement of the retracted PNAS article:

White officers are not more likely to shoot minority civilians than nonwhite officers.

Quite clearly, both statements are guilty of the same mistake. Both articles summarize the results in terms of the probability of police shooting civilians. Whatever position one takes on the mistake in the PNAS article should be applied to its predecessor as well. Yet, there has been no effort to retract this other article. Why? Perhaps it is too obscure. As far as journals go, PNAS is far more prestigious than Public Administration Review.

This conjecture is not credible for two reasons. First, before the article was even officially published, Menifield, Shin, and Strother had described their research in an op-ed in the Washington Post, one of the most prestigious and widely read newspapers in the country. Second, as one would expect, the authors of the PNAS article cite the Menifield article and explain how their research builds on it. In short, anyone who follows this literature could hardly have been unaware of these two articles on the same topic, published within months of each other.

The only conspicuous difference between these otherwise similar articles is that only the PNAS article was used in political debates as evidence against the narrative that Americans are policed by racist cops. Why did Ms. Mac Donald ignore one of these studies but highlight the other? I suspect it is because the rhetoric used by Menifield, Shin, and Strother was not in line with her agenda. Having found no evidence of “microlevel” racism, these authors suggested “systemic” racism as the likely reason for the overrepresentation of black people as victims of police use of lethal force. By contrast, the PNAS article downplayed the need to diversify the police force. Clearly, this take was more appealing to the conservative “pro-cop” agenda.

Ultimately, it does not matter why conservative participants in this policy debate used the PNAS study. What matters is that this is the one characteristic that distinguishes it from the study that was not retracted despite making the exact same error. Thus, using a classic tool of causal inference, Mill’s Method of Difference, we can logically conclude that the PNAS article was indeed retracted for purely political reasons:

If an instance in which the phenomenon under investigation occurs, and an instance in which it does not occur, have every circumstance save one in common, that one occurring only in the former; the circumstance in which alone the two instances differ, is the effect, or cause, or an indispensable part of the cause, of the phenomenon.

A shameful science

The retraction of the PNAS study is a clear case of academic censorship. It is a perfect example of the all-too-common tendency among social scientists to apply different standards of rigor depending on the implications of the findings. Lee Jussim, professor of psychology at Rutgers University, has coined the amusing term Rigorus Mortus Selectivus to describe this phenomenon. If the research supports the “liberal progressive narrative,” studies using weak designs get published. Even studies with major errors remain in the literature and continue to get cited as long as the conclusions are politically correct. However, if the results challenge the worldview of the politically homogeneous elite, the academic community sets out to identify a “flaw” of any magnitude and mounts an attack against scholarship that offends the party line.

Another recent victim of Rigorus Mortus Selectivus is known as the “mentoring study.” In an article published in Nature Communications, three data scientists reported evidence suggesting that junior scientists may benefit more from having male as opposed to female mentors. Unsurprisingly, this result was met with social media outrage because it challenged the gender ideological orthodoxy. In the end, the authors were pressured to retract the article even though no actual error had been found in the data or its analysis.

In 2022, Dr. Klaus Fiedler, the editor of the journal Perspectives on Psychological Science, was forced to resign after he attempted to publish articles debating popular claims about diversity and racism within the field of psychology. After Professor Steven Roberts, one of the scholars involved in the process, shared a one-sided account of his grievances, more than 1,000 psychologists signed a petition demanding Dr. Fiedler’s immediate resignation. Without due process, the board of the Association for Psychological Science asked him to resign. Astonishingly, the petition in support of Dr. Roberts also asked the APS to “grant him any additional reparative action he might deem necessary.” (It is unclear if this could include a Caribbean cruise or perhaps a brand new Tesla.)

Academics who like the status quo dismiss these examples as mere anecdotes. However, there is a point at which a steady accumulation of anecdotes amounts to systematic evidence. That point is reached when all the examples point in the same direction. In the social sciences, we reached that point a long time ago. There are no examples of retractions of articles that show evidence of, say, systemic racism unless there is clear and convincing proof of serious research misconduct. Even in those kinds of cases, the research community has shown an unwillingness to do the right thing.

Left-wing political advocacy and hostility towards viewpoint diversity are integral features of contemporary social science. In his book The Sacred Project of American Sociology, professor Christian Smith provides a detailed account of these realities within his own discipline. To the outside world, sociology portrays itself as an objective science of social facts. In reality, Dr. Smith argues, the field is “packed full of like-minded scholars” interested in promoting “social-change activism” with little regard for research integrity.

Smith’s damning analysis may not be the whole truth about American social science. Sociology is far more political than, say, economics or even psychology. Still, having participated in interdisciplinary social inquiry for more than 30 years—a period that includes five years at the prestigious Institute for Social Research—I tend to concur with Dr. Smith’s assessment of sociology, criminology, and most other fields of social science as ideological projects masked as science.