Science / Tech, Social Science

Why Citing a Scientific Study Does Not Finish an Argument

“Actually, Studies Show…”

Chances are you’ve found yourself in a heated conversation among a group of friends, family, or colleagues when someone throws down the gauntlet: “Actually, studies show…” Some nod in silent agreement, others check their text messages, and finally someone changes the subject.

It’s hard to know what to say when people cite scientific studies to prove their point. Sometimes we know the study and its relative merits. But most of the time we just don’t know enough to confirm or refute the statement that the study is supposed to support. We are floating in a sea of information, and all we can do is flounder around for the nearest buoy to support a view that’s vaguely related to the conversation.

All of us lack the time to understand more than a small fraction of scientific research. For the most part, this works out well: scientists conduct research and publish papers, each new study adds another piece to the puzzle, and bit by bit we steadily increase the total stock of knowledge. Eventually, we hope, journalists and teachers will bring scientific knowledge together and distill it for the general public.

Of course, that’s not always how science works, or how knowledge spreads. A single study is rarely anything more than suggestive, and it often takes many replications under a variety of circumstances to provide strong justification for a conclusion. And yet poorly supported studies often make their way into newspapers and conversations as if they were ironclad truths.

According to a spate of recent articles, many scientific results are difficult to replicate. The problem has been studied most closely in social psychology, but it appears to be far more pervasive than initially thought. Some, most famously John Ioannidis, have argued that across the sciences most published research findings are false.

Correlations are Cheap, Patterns are Ubiquitous

Science typically involves gathering data, finding interesting correlations, and proposing hypotheses to explain the correlations. For example, suppose we find a set of sick people for whom antibiotics work, and a set for whom they don’t work. We might infer that those for whom antibiotics didn’t work had an antibiotic-resistant strain of a bacterial infection. Or we might think that the patients who didn’t recover had a different disease than those who did recover, perhaps a viral infection for which antibiotics don’t work.

Correlations are everywhere, and given enough data from enough studies, we will find correlations that are surprising and interesting. But as the sick patient example suggests, causation is difficult to infer, and some correlations are flukes that don’t admit of a common cause, or that can’t be consistently replicated.

We are pattern-seeking creatures, and correlations are patterns that cry out for explanation. But sometimes our political views infect our prior beliefs, and these beliefs lead us to keep looking for patterns until we find them. Given enough tests and time, we will.
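To see why enough tests will eventually turn up a pattern, consider a minimal Python sketch. It rests on simplifying assumptions of our own, not on any particular study: each test is independent, no real effect exists, and a result counts as "significant" at the conventional 5 percent threshold. The function names are hypothetical. Under those assumptions the chance of at least one spurious finding in k tests is 1 − 0.95^k, and the simulation checks this by drawing p-values uniformly at random, which is how p-values behave when the null hypothesis is true.

```python
import random


def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Exact probability that at least one of num_tests independent tests
    comes out 'significant' when no real effect exists."""
    return 1 - (1 - alpha) ** num_tests


def simulate_false_positives(num_tests: int, alpha: float = 0.05,
                             trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under a true null hypothesis a p-value is uniform on [0, 1],
        # so each test is "significant" with probability alpha.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20, 60):
        print(f"{k:>2} tests: exact {chance_of_false_positive(k):.2f}, "
              f"simulated {simulate_false_positives(k):.2f}")
```

With 20 independent tests the exact probability is about 0.64, so a researcher who checks 20 different outcomes will more often than not find at least one "significant" result even when nothing is going on.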

Consider the case of “stereotype threat.” The idea behind stereotype threat is that when a member of a group (defined, say, by race, sex, or religion) is asked to perform a task, but is first primed with information about how most people in their group perform that task, they will tend to perform in accordance with the group average rather than according to their own ability.

What the initial studies seemed to suggest was that stereotype threat is not just statistically significant, but large. It appeared that when black test takers were told that the test they were taking was an indicator of intellectual ability, they scored worse than white test takers. But when told that it was just a problem-solving exercise that did not indicate ability, they scored about the same as white test takers.

Think about why people would be happy to find this result: if all we need to do to improve the outcomes of people in poorly performing groups is to prime them with certain kinds of information (or shield them from other kinds of information, such as negative stereotypes), we could dramatically improve test scores at school and productivity at work.

As you can imagine, the results were too good to be true, and stereotype threat has not stood up especially well to scientific scrutiny. It probably exists in some cases (some people gain or lose confidence when primed with certain kinds of information), but the magnitudes are usually small, and the social implications are unclear. Yet this hasn’t stopped universities and businesses from implementing training programs to combat the alleged evils of stereotype threat in the classroom and in the boardroom.

Publication Bias and Perverse Incentives

Researchers who discuss the “replication crisis” in science often emphasize publication bias and professional incentives as the primary culprits. Publication bias arises because studies with statistically significant results are far more likely to be published than studies without them, so journal editors and ordinary readers place too much weight on a significant study without considering the many other attempts to find similar results that likely failed. In other words, scientists often run tests and find data that don’t yield interesting results. But a “null result” is rarely published, for the obvious reason that it’s not very surprising or interesting.

Even when a scientific study does find dramatic results, and the results can be replicated, subsequent results are generally less dramatic than the initial study. According to Brian Nosek, a social psychologist at the University of Virginia, we should predict that “replication effect sizes would be smaller than original studies on a routine basis—not because of differences in implementation but because the original study effect sizes are affected by publication and the replications are not.”
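Nosek's prediction follows from the publication filter itself, and a short simulation shows why. The sketch below is a minimal Python illustration under assumptions of our own choosing, not Nosek's analysis: every study estimates the same small true effect (0.2, in arbitrary units) with the same standard error (0.15), only estimates that clear the usual two-sided 5 percent significance cutoff get "published," and each published study is then replicated once, with the replication reported no matter what it finds. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_publication_filter(true_effect: float = 0.2,
                                std_error: float = 0.15,
                                n_studies: int = 20_000,
                                seed: int = 1) -> tuple[float, float]:
    """Return (mean published effect, mean replication effect)."""
    rng = random.Random(seed)
    # Each original study's estimate is the true effect plus sampling noise.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Publication filter: keep only estimates with |z| > 1.96 (p < 0.05, two-sided).
    published = [e for e in estimates if abs(e / std_error) > 1.96]
    # One replication per published study, reported unconditionally.
    replications = [rng.gauss(true_effect, std_error) for _ in published]
    return statistics.mean(published), statistics.mean(replications)


if __name__ == "__main__":
    published_mean, replication_mean = simulate_publication_filter()
    print("true effect:           0.20")
    print(f"mean published effect: {published_mean:.2f}")
    print(f"mean replication:      {replication_mean:.2f}")
```

Under these assumptions only about a quarter of the original studies clear the cutoff, and those that do overstate the effect by nearly a factor of two (roughly 0.38 against a true 0.20), while the unfiltered replications average close to the true value. The replications are carried out exactly like the originals; the shrinkage comes entirely from which original results were allowed through.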

Researchers want to find interesting results, and are professionally rewarded for doing so. The rewards come in the form of career advancement, reputational enhancement, and a higher likelihood that journal editors will publish their results. These rewards usually help push science forward. But they can also slow it down.

Consider the original study in The Lancet that linked the MMR vaccine to autism. The result was juicy, especially to journalists. The study suggested that the MMR vaccine significantly elevated the risk of autism in children, and that forgoing the vaccine, or splitting it into its three component vaccines, would lower that risk. If we could tackle autism this easily, the world would be a much better place. The study fed into a widespread desire for an easy answer to a hard problem. But the study was wrong, and it took over a decade before the record was corrected.

In this case, the initial study itself turned out to be poorly designed, and The Lancet eventually retracted it. The publication of the autism study, and its promotion by journalists, probably cost lives, as some parents declined to vaccinate their kids and protested vaccine mandates.

But quite apart from the quality of the autism study, many studies that are reasonably well-designed are hard to replicate, and are probably either false or overblown.


The proliferation of scientific studies and the norms that make scientific journals more likely to publish surprising results than failed replication attempts are unnerving for a couple of reasons. First, politicians pass laws based on studies their advisors cite. Sometimes these laws are silly and betray an absurd ignorance of science. For example, in 1998 the Governor of Georgia signed a law providing free classical music CDs to expectant mothers in order to boost their children’s IQ. Of course, the policy rested on weak evidence at best (the so-called “Mozart effect”), and evidence that weak has no business informing any kind of policy.

But sometimes these laws are far-reaching, like the macroeconomic policies that governments pursue in a financial crisis. We can easily find studies suggesting that providing a “stimulus” to the economy by increasing government spending in the short run can jump-start the economy during a downturn. But we can also find plenty of studies suggesting that such spending does little good or even makes things worse, and these arguments go back to the early days of economics.

The right answer, almost certainly, is that we don’t know. Both sides gather immense amounts of data and weave theory and data into an intricate tapestry, translated into the universal language of math. But the mathematical sophistication of modern economics often gives us the illusion that we know more than we do.

The second reason to worry about the replication problem in science is that it becomes all too easy for teachers, friends, and colleagues (quite apart from politicians) to fool us into accepting a poorly supported conclusion that is intuitively satisfying but ultimately wrong. Malcolm Gladwell is a master of this. He has made a career out of telling stories that make people feel good about themselves by cherry-picking scientific studies that produce surprise and hope rather than fear and anxiety. Yet we should expect science to do both, since the truth doesn’t care about our emotional reactions.

We’re not advising you to commit social suicide by interrupting every conversation with a demand for more evidence. But we do think the phrase “studies show…” should be met with cautious skepticism, especially when the study supports the politically motivated preconceptions of the person who’s talking.




Jonathan Anomaly is a core faculty member of the Freedom Center, and Assistant Professor in the PPEL Program, at the University of Arizona. Brian Boutwell is an Associate Professor of Criminology and Criminal Justice at Saint Louis University. Follow him on Twitter @fsnole1