Spurious Research as Tendentious Propaganda

A liberal society depends on its procedures as well as its institutions. Foremost among these is the error-correction mechanism of scientific method and critical inquiry. In his important book The Constitution of Knowledge, Jonathan Rauch identifies current threats to the pursuit of reliable knowledge. He argues that “politicising an academic discipline like sociology or literary criticism, or spreading propaganda to discredit and drown out fact-based journalism” is as damaging as, in the hard sciences, banning the teaching of evolution or denying the efficacy of vaccines.

This article recounts a case of bogus and politicised scholarship, which amounts to propaganda, in the social sciences. A research article claiming to provide computational linguistic proof of bias in America’s premier newspaper of record is vitiated by ignorance of syntax, yet the editors of the academic journal that published it decline to withdraw it.

Media, War & Conflict, published by Sage, describes itself as “a major international, peer-reviewed journal that maps the shifting arena of war, conflict and terrorism in an intensively and extensively mediated age.” There’s an inadvertent irony in its membership of a body called the Committee on Publication Ethics (“promoting integrity in scholarly research and its publication”).

Last year, the journal published a paper by Holly M. Jackson, a doctoral student in computer science at Berkeley, with the self-explanatory title “The New York Times Distorts the Palestinian Struggle: A Case Study of Anti-Palestinian Bias in US News Coverage of the First and Second Palestinian Intifadas.” The article claims to “identify bias against Palestine in a newspaper of international importance—the New York Times (NYT)—as a case study in the scope of a larger problem of anti-Palestinian bias in US news coverage.”

A charge of systematic bias towards one party in a historic and tragic conflict is about facts. Substantiating it requires hard evidence about the way language is used. The reality is mundane and depressing. What Jackson touts as “a methodologically novel, large-scale proof” of her thesis is founded on a misconception. Yet none of the journal’s editors noticed it; nor did its anonymous peer reviewers. Upon having the disqualifying flaw of Jackson’s research methodology pointed out by linguists, the editors preferred to close ranks rather than admit error and withdraw (or even relevantly correct) the article.

In support of her thesis of anti-Palestinian bias, Jackson claims to have analysed more than 33,000 NYT articles covering the first and second Palestinian intifadas. With the aid of “state-of-the-art natural language processing toolkits as well as a regression model of 90 per cent accuracy on a carefully validated word bank,” she assesses the newspaper’s “use of active/passive voice and … the objectivity, tone and violent sentiment of the language used.”

The ostensibly scientific terminology is hollow. Though the grammatical category of “voice” is integral to her case, Jackson is unable to tell a passive from an active clause. And that’s the only falsifiable aspect of her case—the rest of it has no testable content at all. Whereas grammatical analysis is empirical, identifying bias by assessing the “objectivity” and “tone” of language used is subjective. And partisans on a political issue are not best-placed to assess the objectivity of that issue’s media coverage. (There is an approach in linguistics called critical discourse analysis which purports to uncover the implicit meanings of texts, but it’s vulnerable to this criticism too, as its practitioners rarely interrogate their own tacit assumptions.)

Jackson writes:

We can consider an excerpt from a May 2021 NYT article to see how voice plays a critical role in introducing bias (Kingsley, 2021). In the article, the journalist uses the passive voice to describe the murder of 67 Palestinians. “More than 67 Palestinians, including 16 children, have died since the start of the conflict on Monday,” he explains. Because the author uses the passive voice, he never identifies the perpetrators. A reader knows Palestinians have died, but is left clueless as to who killed them.

Patrick Kingsley, the NYT journalist quoted, does not use the passive voice: the sentence Jackson quotes is entirely active. So, why did Jackson suppose otherwise? Her criticism of Kingsley suggests that she believes “passive” is equivalent to not identifying an agent. But that isn’t what linguists mean by the term.

Across the internet, pedagogical sites offer definitions of passives that are widely repeated yet fall apart under scrutiny. Here, for example, is Dictionary.com: “Passive voice is when the subject of a sentence receives the action of the verb rather than performing the action.” That definition works in a limited way in some cases. For example, in the passive clause “the cake was eaten by the children,” there is an agent (the children), a patient (the cake), and an action (was eaten). But the definition won’t work in the case of many passives. It presupposes that passive is a semantic concept (about what words mean), whereas it’s actually a syntactic one (about how words are put together).

An English passive is prototypically a construction comprising the auxiliary verb “be” (or, more informally, the lexical verb “get”) plus a past participle. It’s a way of packaging information. It isn’t a synonym for evasive writing, or for obscuring agency. A passive clause may include a prepositional by-phrase explicitly specifying an agent (“a seditious riot was incited by the speaker”). A passive lacking a by-phrase (a so-called short passive) is not necessarily evasive either. In the passive clause “the fire was put out,” it’s usually unnecessary to specify that the agent was the fire brigade. Conversely, an active clause can omit agency (“a seditious riot broke out”).
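The syntactic definition can be made concrete with a minimal sketch. It assumes each word has already been tagged by hand with its lemma and a Penn Treebank part-of-speech tag (VBN marks a past participle); the names `is_passive` and `EXAMPLES` are illustrative and come neither from Jackson's paper nor from any library. The test is deliberately crude: it would misjudge adjectival uses such as "was interested" and it ignores clause boundaries. What it shows is that the test inspects form, not who did what to whom.

```python
# Each token is (word, lemma, Penn Treebank tag); all tags are hand-assigned.
Token = tuple[str, str, str]

PASSIVE_AUXILIARIES = {"be", "get"}  # "was eaten", "got arrested"
SKIPPABLE_TAGS = {"RB"}              # adverbs may intervene: "was quickly eaten"


def is_passive(tokens: list[Token]) -> bool:
    """Return True if a form of be/get is followed by a past participle."""
    for i, (_, lemma, _) in enumerate(tokens):
        if lemma not in PASSIVE_AUXILIARIES:
            continue
        j = i + 1
        # Step over any adverbs between the auxiliary and the participle.
        while j < len(tokens) and tokens[j][2] in SKIPPABLE_TAGS:
            j += 1
        if j < len(tokens) and tokens[j][2] == "VBN":
            return True
    return False


# Hand-tagged cores of the clauses discussed in the text (illustrative only).
EXAMPLES: dict[str, list[Token]] = {
    "cake": [("the", "the", "DT"), ("cake", "cake", "NN"),
             ("was", "be", "VBD"), ("eaten", "eat", "VBN"),
             ("by", "by", "IN"), ("the", "the", "DT"),
             ("children", "child", "NNS")],
    "fire": [("the", "the", "DT"), ("fire", "fire", "NN"),
             ("was", "be", "VBD"), ("put", "put", "VBN"),
             ("out", "out", "RP")],
    # "have" + participle is the perfect, not the passive.
    "kingsley": [("Palestinians", "Palestinian", "NNPS"),
                 ("have", "have", "VBP"), ("died", "die", "VBN")],
    "riot": [("a", "a", "DT"), ("seditious", "seditious", "JJ"),
             ("riot", "riot", "NN"), ("broke", "break", "VBD"),
             ("out", "out", "RP")],
}

for name, tokens in EXAMPLES.items():
    print(name, "passive" if is_passive(tokens) else "active")
```

On these hand-tagged examples the function marks "the cake was eaten by the children" and "the fire was put out" as passive, and "have died" and "a seditious riot broke out" as active, whether or not an agent is mentioned.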

I read Jackson’s paper after I saw it approvingly cited in an article in The Conversation (a publication that says it stands for “academic rigour [and] journalistic flair”). I then sent it to three colleagues: Geoffrey Pullum, emeritus professor of general linguistics at Edinburgh University; Brett Reynolds, an adjunct professor in the department of linguistics at Toronto University; and Lane Greene, language columnist and senior editor at The Economist.

Pullum is co-author, with Rodney Huddleston, of the Cambridge Grammar of the English Language (2002), a comprehensive synchronic account of the grammatical structure of present-day English. Reynolds is co-author, with Huddleston and Pullum, of A Student’s Introduction to English Grammar, 2nd edition. Pullum wrote to Jackson, pointing out that:

Your citations of ‘passives’ which are actually active intransitives and ‘actives’ which are actually passives with agent by-phrases goes beyond anything I have ever seen before in the sheer depth of its erroneous grasp of English grammar. And I have seen some amazingly uninformed material on actives and passives. Your paper should be retracted.

I wrote to one of the editors of Media, War & Conflict—Katy Parry, professor of media and politics at Leeds University—to say, in effect, the same thing. Parry’s field, according to her academic profile, is visual communication and photojournalism. She has no competence in English grammar or any other area of linguistics; the same is true of her co-editors. These comprise, for the record: Piotr Cieplak and Sarah Maltby, both of Sussex University; Saumava Mitra of Dublin City University; Ben O’Loughlin of Royal Holloway in the University of London; and Richard Stupart of Liverpool University. One is an academic in international relations; the others are specialists in journalism, media studies, and communication. Cieplak is a filmmaker.

Parry acknowledged my letter promptly and undertook to discuss the issue with her colleagues, the publisher, and the author. She wrote again a fortnight later. I requested that she frame any response in terms that she wouldn’t mind me quoting for publication. This is Parry’s email in full:

We’ve taken a look at this as editors and in discussion with Holly and the publishers, and we are agreed that a correction notice is all that is required for the sentence you’ve quoted. We have agreed a corrigendum text with the publisher and they are processing this. We do not believe that the integrity of the article is in question. The example you refer to was not part of the dataset, and was picked by the author from a different time period, and she agrees that this is not the passive voice. However, we think the methodology is transparent and robust, combining computational quantitative analysis, conducted using natural language processing software spaCy, with qualitative analysis.

Holly has offered this correction:

“On page 120, the example offered from Kingsley (2021)—‘More than 67 Palestinians, including 16 children, have died since the start of the conflict on Monday’—is not in the grammatical passive voice. This example was hand-selected by the author to illustrate media bias, and it is not in the study dataset (which only encompasses the First and Second Palestinian Intifadas). This example does not represent the types of constructions that were identified in this study. All examples of passive and active voice in this study’s dataset are identified using the open-source, state-of-the-art natural language processing software spaCy (https://spacy.io/usage/facts-figures).”

In summary, Parry and her colleagues maintain that, while Jackson is indeed mistaken in identifying Kingsley as using passive voice, her error has no bearing on the rest of the paper. But this evasive response fails to acknowledge that the integrity of Jackson’s paper is already shot.

It’s a subsidiary point but, even in its own terms, Jackson’s proposed correction (which has yet to appear) is inadequate, as it doesn’t state what is being corrected and Jackson doesn’t even admit to having got anything wrong. And the wording is slippery. Jackson refers to “the grammatical passive voice” even though passive voice is by definition a grammatical category. This leaves open the possibility that there is some wider, ethereal notion of passives consistent with her misunderstanding of the term. There isn’t.

More fundamental is the disingenuous defence used by the editors—that Jackson’s inability to distinguish an active from a passive doesn’t matter because she employed “state-of-the-art natural language processing software” to do the job for her. Software is fallible, and we know Jackson isn’t competent to review the output of a program that, as she puts it, “classifies the voice (active or passive) of all sentences with Israeli or Palestinian subjects.” (It’s not Jackson’s most significant mistake, but “subject” is a grammatical function label whereas she means subject referents. An author writing a purportedly scholarly article about language does need to get a grip on the terminology, and to confuse a word with a thing—an extra-linguistic entity in the real world—is as basic an error as it’s possible to make in linguistic science.)

For the sake of argument, however, let us suppose that Jackson’s software has 100 percent reliability in distinguishing active from passive clauses. That still wouldn’t resolve the intrinsic problem of the article’s treatment of voice. In the “Content Analysis” section of her paper, where she explains her methodology, Jackson writes:

First, I identify whether actions by Palestinian and Israeli groups are being described in the active or passive voice. For every verb, I identified the perpetrator and recipient of the action (i.e. whether they were a Palestinian or Israeli group or individual).

That’s a hopeless definition. I’ve explained that the distinction between active and passive voice is syntactic rather than semantic. Yet even on the semantics, Jackson is confused: in an active or a passive clause, there may be no action at all, and no “perpetrator” or “recipient” of it.

Consider this example (which I owe to Pullum, from a paper he’s written about English passives, and which he sent to Jackson): “Not much is known by biologists about the coelacanth.” This is passive, but there is nothing that could be described as an action, let alone one with a “perpetrator” and a “recipient.” In short, Jackson’s research methodology rests on a misunderstanding: it is not possible to identify systematic exclusion of agency by counting the number of passives in a corpus, even supposing Jackson’s state-of-the-art software managed to correctly identify these constructions as passive.
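The gap between voice and agency shows up as soon as clauses are tallied on both dimensions. The sketch below uses a handful of clauses labelled by hand and cross-tabulates voice against whether a responsible party is mentioned. The labels, the `Clause` type, and the variable names are illustrative assumptions, not data or code from Jackson's study, and "The children ate the cake." is an invented active counterpart added for contrast.

```python
from collections import Counter
from typing import NamedTuple


class Clause(NamedTuple):
    text: str
    voice: str         # "active" or "passive": a syntactic label, assigned by hand
    agent_named: bool  # does the clause mention who was responsible?


CLAUSES = [
    Clause("The cake was eaten by the children.", "passive", True),
    Clause("A seditious riot was incited by the speaker.", "passive", True),
    Clause("The fire was put out.", "passive", False),
    Clause("A seditious riot broke out.", "active", False),
    Clause("More than 67 Palestinians, including 16 children, have died "
           "since the start of the conflict on Monday.", "active", False),
    Clause("The children ate the cake.", "active", True),
]

# Cross-tabulate the two independent properties.
table = Counter((c.voice, c.agent_named) for c in CLAUSES)

for voice in ("passive", "active"):
    for agent_named in (True, False):
        label = "agent named" if agent_named else "no agent named"
        print(f"{voice:8} {label:15} {table[(voice, agent_named)]}")
```

In this toy sample, two of the three passives name their agent and two of the three agentless clauses are active, so a raw count of passives would both overstate and miss omissions of agency. The sample is far too small to say anything about real newspaper prose; it only illustrates why the two tallies cannot be treated as interchangeable.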

That’s sufficient to demonstrate that Jackson’s entire paper is spurious. The problem is not just that she doesn’t know what she’s talking about, or even that she doesn’t know that she doesn’t know what she’s talking about. The problem is that she doesn’t even know what it is that she doesn’t understand.

My colleagues have examined the output of her analysis, and the results are not pretty. Jackson appears to have overstated the novelty of her software and the parsing is extremely superficial. The likely effect is to undercount the number of active clauses. Jackson seems to have run a Python script and trusted it to assign agency correctly to, respectively, Palestinian and Israeli referents.

I pointed out to Parry that the proposed correction of Jackson’s paper betrayed a misunderstanding of the subject matter. Parry replied, copying her co-editors:

We, as co-editors, have discussed this matter and we consider it closed. The article text will be directly corrected for that sentence, in addition to publishing the corrigendum. The computational analysis, using a respected natural language processing library, is supported with qualitative analysis and illustrative examples.

Scholarship advances through new research and if you or others wish to revisit the question of voice and framing of actors in the Israel-Palestine conflict, or specifically the NYT’s coverage during the periods of the Intifadas, we would welcome work that develops this field of enquiry. Any submissions that meet journal requirements, following our author guidelines, would be dealt with through blind review in the same manner as other submissions we receive.

I will not be responding to further emails about Holly Jackson’s article.

The defensiveness of this response tells its own story. Real scholarship stands on its merits; spurious research unmasked as tendentious propaganda needs a rescue operation.

A long time ago, reviewing a book by political theorist John Gray on global capitalism, Paul Krugman wrote: “John Gray surely expects economists in general, and perhaps me in particular, to denounce him as an ignoramus; I will not disappoint him.” Gray, he noted, has an “odd tendency to try to buttress his argument, not by producing evidence, but by quoting supposed authority figures, often puffing them up with obsequious honorifics.” Well, that’s what’s going on here, too. Embarrassed by the ignorance of the writer she published, Parry tries to shore her up with the authority of a “respected natural language processing library.” But this misses the point—the objection to Jackson’s paper is not to the tools but to the author’s misuse of them.

I don’t believe Jackson is a charlatan—I’m sure she imagined she’d made a contribution to knowledge. But she is not a genuine researcher. Her paper cites Noam Chomsky (not in linguistics but in geopolitics, about which his methods are notoriously unscholarly), and she dedicates it to “the people of Palestine and their struggle for liberation.” In short, she’s a political activist who cut corners on a project for which she was manifestly unfit.

It’s less clear why the editors of a supposedly scholarly journal swallowed this stuff, but I can hazard a guess. Practitioners of disciplines like sociology (in which I have a postgraduate degree) or media studies typically don’t get the respect accorded to scholars in hard sciences. Conceivably, the editors were gulled by the notion that big data and artificial intelligence could be deployed in the service of their disciplines and radical politics, and failed to do their due diligence. Language is a technical subject, and writing about it requires knowledge and expertise. The author and editors in this saga have neither, and it seems they would prefer to have their misconceptions preserved rather than challenged by stubborn facts and difficult concepts.

I’ve deliberately made no comment in this essay on the substantive issues in the Israel-Palestine conflict. The publication of Jackson’s paper by Media, War & Conflict is a fiasco regardless of all other considerations. But I should probably disclose my own stance on this issue.

I’m not an expert on the region and have travelled to Israel and the Palestinian Authority only around half a dozen times. I’ve never been to Gaza, only as far as the Israeli border city of Sderot, a mile away. I’ve long supported a two-state solution between a secure Israel and a sovereign Palestine with East Jerusalem as its capital. Whenever the question has come up, in articles or panel discussions or speeches, I’ve invariably said this. Israel-Palestine is not a conflict like Russia’s aggression against Ukraine, where one side is wrong and needs to be defeated; it’s a clash of competing and legitimate nationalisms, both of which need to be accommodated in an eventual resolution.

As a working journalist, I do not believe or trust indignant accusations that my trade is engaged in systematic bias on either side of the Israel-Palestine conflict. If anything, the problem is the opposite: we strive for objectivity so far that we sometimes fail to make discriminating judgments about what we report.

This was the gravamen of an influential critique made a generation ago by Martin Bell of the BBC, the best-known British war correspondent of his day, when he publicly criticised his employer for not showing the full facts about the atrocities committed by Serb forces in a genocidal war against Bosnian Muslims from 1992 to 1995. Bell (who I should disclose is my uncle) called for a “journalism of attachment,” and I supported him in this while expressing the matter somewhat differently. The first duty of a journalist is not to be diplomatic but to tell the truth about what they see, regardless of sensitivities, while understanding that they bring presumptions and biases and that their knowledge is inevitably partial.

Broadly speaking, mainstream news outlets do this; when they fail (and in the UK there is an independent press regulator that handles public complaints), they acknowledge error. It’s evident from the episode I’ve described that the world of academic publishing operates under far less stringent criteria. The chances of suffering any real-world consequences from egregious falsehoods, foul-ups, incompetence, ignorance, and obfuscation are very low indeed, and the temptation to rationalise bogus scholarship in order to avoid institutional embarrassment is beguiling. That culture must change.

UPDATE: Professors Pullum and Reynolds have submitted a detailed analysis of Holly Jackson’s paper to the Media, War & Conflict journal. A preprint is available for download.