Bad Data Analysis and Psychology's Replication Crisis

In 2014, a study published in JAMA Pediatrics linked playing aggressive video games to real-life aggression in a large sample of Singaporean youth. The study attracted considerable news media attention. For instance, a sympathetic article in Time magazine breathlessly reported its findings and suggested that brain imaging research found aggressive games change kids’ brains. But was the evidence from the Singapore study reliable?

In recent years, concerns about the Singapore dataset have grown. UK scholars Andrew Przybylski and Netta Weinstein recently wrote that the way the primary authors had used the dataset was problematic. The analyses of the same dataset kept changing across published articles in suspicious ways. Such shifting analyses can be a red flag for data massaging that produces statistically significant outcomes while hiding outcomes that didn’t work out. These practices may be unintentional or unconscious (scholars are only human, after all). But they do suggest that the results deserve further scrutiny.

When the dataset became available to my colleague John Wang and me, we re-analyzed the data using more rigorous methods. We publicly pre-registered our analyses, which meant we couldn’t subsequently alter them to fit our hypotheses. Our results were strikingly different from the 2014 paper: in fact, there was no evidence in the dataset that aggressive game play was related to later aggression at all. So, what happened? How did a dataset come to show links that don’t exist between aggressive video games and youth aggression?

This isn’t the first time this has happened in video game research. Recently, another study appeared to link violent media to irresponsible gun behavior among youth. However, an independent researcher re-analyzed that dataset and found that the questionable elimination of a few inconvenient participants had transformed non-significant results into significant ones. Furthermore, recent brain imaging studies have not supported the claims made in Time’s 2014 article.

The problem is larger than video game research, though. For almost a decade now, psychological science has been undergoing a significant replication crisis in which many previously held truisms are proving to be false. Put simply, many psychological findings that make it into headlines, or even into policy statements issued by professional guilds like the American Psychological Association, are false and rest on flimsy science. This happens for two reasons. The first is publication bias, in which studies that find novel, exciting things are preferred to those that find nothing at all. The second is death by press release: the tendency of psychology to market trivial and unreliable outcomes as having more impact than they actually do.

Publication Bias

The tendency to publish only or mainly statistically significant results, and not null findings, is due to a perverse incentive structure within academia. Most of our academic jobs depend more on producing research and getting grants than on teaching or committee work. This creates the infamous publish-or-perish structure, in which we either publish lots of scientific articles or we don’t get tenure, promotions, raises, prestige, etc. Scientific studies typically require an investment of months or even years. As academics, if we invest years in a study, we must get it published or we will lose our jobs or funding.

If journals publish only statistically significant findings, then the outcome of that study must be statistically significant. Typically, scholars can choose from dozens or even hundreds of ways to analyze their data. Thus, if we need statistically significant results, we can simply cycle through these analytical options until we get the outcomes we need to publish. Variations on this include p-hacking (running analyses in multiple ways but reporting only those that produce statistically significant findings) and HARKing (Hypothesizing After the Results are Known). HARKing occurs when scholars run multiple analyses among numerous variables without any particular theory. When they publish their findings, they pretend the statistically significant ones were those they predicted all along. In fact, those results are usually the product of random chance and therefore unreliable. This perverse incentive structure is thought to be the source of considerable scientific misconduct.
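To see why cycling through analyses manufactures findings, here is a minimal simulation sketch. It assumes a world with no true effects at all: one predictor is correlated with up to twenty unrelated outcome measures, and the hypothetical researcher stops at the first p < .05. The sample size, number of outcomes, and function names are illustrative assumptions, and the p-value uses a normal approximation to the t distribution rather than an exact test.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value_for_r(r, n):
    """Two-sided p-value for 'no correlation', via t = r*sqrt((n-2)/(1-r^2)).

    Assumption: a normal approximation to the t distribution, which is
    close enough for the sample sizes used in this sketch.
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def hacked_study(n, n_outcomes, rng, alpha=0.05):
    """Simulate one study in which NOTHING is truly related.

    The predictor is tested against each outcome in turn, stopping at the
    first p < alpha. Returns True if the study yields a 'publishable' result.
    """
    x = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_outcomes):
        y = [rng.gauss(0, 1) for _ in range(n)]  # independent of x by construction
        if p_value_for_r(pearson_r(x, y), n) < alpha:
            return True
    return False


def false_positive_rate(n=100, n_outcomes=20, n_studies=300, seed=1):
    """Share of simulated null studies that end up reporting a 'finding'."""
    rng = random.Random(seed)
    hits = sum(hacked_study(n, n_outcomes, rng) for _ in range(n_studies))
    return hits / n_studies


def analytic_rate(n_outcomes, alpha=0.05):
    """Chance of at least one p < alpha among independent null tests."""
    return 1 - (1 - alpha) ** n_outcomes


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k:>2} outcomes tried: simulated {false_positive_rate(n_outcomes=k):.2f}, "
              f"analytic {analytic_rate(k):.2f}")
```

With a single pre-specified test, roughly 5 percent of these null studies come out "significant," as intended. Allowing twenty tries pushes that to roughly 1 − 0.95²⁰, or about 64 percent, even though no real effect exists anywhere in the simulated data. Pre-registration, as used in the Singapore re-analysis, is meant to block exactly this.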

So why do journals tend to publish only statistically significant findings? Part of this has to do with the nature of traditional statistical analyses, which made it easy for scholars to dismiss non-significant results as lacking any meaning. Effectively, what is called Null-Hypothesis Significance Testing actually makes it difficult to prove a theory false, which is a way of turning science on its head. But also, statistically significant findings are more exciting, increase readership (this matters to academic journals too), and attract media interest. Dutch scholar Daniël Lakens recently said:

I agree.

Death by Press Release

Death by Press Release occurs when scholars fail to disclose the trivial or unreliable nature of some findings, often through a university press release. The recent imbroglio over whether social media leads to mental health problems in teens is an example. A recent study by Jean Twenge and colleagues claimed that social media use is associated with decreased mental health among youth. Pretty alarming! However, a re-analysis of the data by Oxford researchers found that, although statistically significant, the effect was no greater than the (also significant but obviously nonsensical) correlations between eating bananas and mental health, or between wearing eyeglasses and mental health, neither of which has produced anxious think-pieces.

In large samples (the studies mentioned above had hundreds of thousands of participants), very small correlations can become “statistically significant” even though the magnitude of the effect is tiny. This magnitude, or “effect size,” is usually expressed as the proportion of variance in one variable explained by another. In other words, if the only thing you knew about people was variable X, how much better than chance could you predict variable Y? So, zero percent variance explained would be literally no better than a random coin toss, whereas 100 percent would be perfect predictive accuracy. The effects of screens on mental health suggest that screens account for far less than one percent of the variance in mental health, little better than a coin toss. Dr. Twenge has published defenses of her results, arguing that many important medical effects also explain only a small percentage of variance. However, these claims are quite simply based on miscalculations of the medical effects, which are actually much stronger in terms of variance explained. Although these miscalculations were revealed over a decade ago, this scientific urban legend is still sometimes repeated by psychologists because it makes psychological research sound much stronger than it actually is.
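The gap between "significant" and "meaningful" can be made concrete with a short calculation. This is a minimal sketch, assuming the effect is measured as a Pearson correlation r, that variance explained is r², and that the p-value comes from a normal approximation to the t test. The value r = 0.05 and the sample sizes are hypothetical illustrations, not figures from the Twenge or Oxford analyses.

```python
import math


def t_stat_for_r(r, n):
    """t statistic for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def two_sided_p(r, n):
    """Two-sided p-value, using a normal approximation to the t distribution."""
    return math.erfc(abs(t_stat_for_r(r, n)) / math.sqrt(2))


def variance_explained(r):
    """Share of the variance in Y accounted for by X: r squared."""
    return r * r


if __name__ == "__main__":
    r = 0.05  # hypothetical small correlation, held fixed across sample sizes
    print(f"r = {r}: variance explained = {variance_explained(r):.2%}")
    for n in (100, 1_000, 10_000, 300_000):
        p = two_sided_p(r, n)
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"n = {n:>7,}: p = {p:.3g} ({verdict})")
```

The effect never changes: it explains a quarter of one percent of the variance at every sample size. Only the p-value moves, from clearly non-significant at n = 100 and n = 1,000 to overwhelmingly "significant" at n = 10,000 and beyond. That is why asking for the effect size, rather than the p-value, is the more informative question for very large surveys.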

So, “statistical significance” doesn’t really mean very much, and the first thing people should ask when they hear about a scientific result is “What is the effect size?” Effects that are very tiny (explaining 1–4 percent of variance or less) are often not real at all, but rather the noise of survey research, participant hypothesis-guessing, unreliable respondents, and so on. Such effects are not just really small; they often don’t exist in the real world at all. In the spirit of Dr. Lakens’s comment about publication bias, psychology’s inability to let go of tiny effect sizes, or to communicate them honestly to the general public, is another (arguably unethical) element of psychological science that damages our credibility and grossly misinforms the public.

This phenomenon is abetted by two groups. The first consists of professional guilds, such as the American Psychological Association or the American Academy of Pediatrics, which many in the general public mistake for science organizations. They’re not. They’re dues-paying organizations that function to market a profession, not necessarily to tell objective truths (disclosure: I am a fellow of the American Psychological Association). The accuracy of their policy statements on science has often turned out to be poor.

The second group consists of news reporters who often uncritically publish university press releases without fact-checking their claims. Even when the studies they cover are later discredited, they don’t always report the new information. For instance, Time magazine covered the early Singapore study, which purported to show a link between aggressive games and youth aggression. However, it has not covered the subsequent re-analysis, even after I contacted the original reporter to inform her of the scientific correction. The truth is that flashy, exciting, and novel findings tend to get news coverage. But when those findings later turn out to be unreliable, the news media often drop the ball. Perhaps corrections aren’t that thrilling, or perhaps reporters simply don’t want to acknowledge that they didn’t do any due diligence on university press releases in the first place.

In 2005, statistician John Ioannidis published an article in which he argued that most published research findings are false. Almost 15 years later, not much has changed: psychological studies continue to be found unreliable and unstable. Addressing publication bias through more transparent science has resulted in some clear improvements. However, until psychology acknowledges that most effect sizes, even in high-quality studies, are trivial and unlikely to have an impact in the real world, we will continue to deceive more than we illuminate. Our challenge is to be more honest about how much we still don’t know about the workings of the human mind. Hopefully, the next 15 years will see more progress toward an open, honest science that acknowledges it finds nothing at all far more often than it finds clear effects.
