Retracting a Controversial Paper Won’t Help Female Scientists

Tania Reynolds

Imagine yourself as a newly hired female assistant professor and the delight you feel when you learn that your article, examining over 222 million academic papers, has just been accepted at one of the top science journals. Now imagine your response when you discover that a fellow female academic is formally demanding your paper’s retraction,1 galvanized by a mob of outraged scientists on Twitter. This was the recent experience of Bedoor AlShebli, who published her large-scale research in Nature Communications.2

Open letter to the Editor-in-Chief of @NatureComms about the AlShebli paper, which claims that training with #WomenInSTEM damages the careers of young scientists

— Leslie Vosshall PhD (@pollyp1) November 19, 2020

In an analysis of over three million junior and senior co-author teams, AlShebli and her colleagues found that junior scholars with more female senior co-authors received fewer citations (up to 35 percent fewer) on their academic publications. Moreover, senior female academics who published with female junior scholars received 18 percent fewer citations than those who published with male junior scholars. No such citation penalty was found for male senior scholars who published with female junior scholars. Thus, it appears papers authored by both female junior and senior scholars were cited at lower rates than other gender authorship pairings.

These results are consistent with a broader literature on citation patterns. For example, in the fields of ecology, evolution, and neuroscience, male-authored papers received more citations than their female-authored counterparts.3, 4 Some evidence suggests these disparities are most strongly perpetuated by male academics, who exhibit male-favoring biases in their cited work.3 However, other analyses point to a more innocuous explanation: authors more often cite same-sex scientists because topics generally differ in the ratio of male to female scientists investigating them.5

Critiques of the paper

So why the outrage about a paper that reports patterns consistent with existing research? Vosshall’s letter calling for the paper’s retraction describes it as “deeply methodologically flawed” and accuses its authors of ignoring serious concerns raised by reviewers. The reviews are publicly accessible6 and include four primary critiques surrounding the gender analyses (most critiques centered on the analyses of whether senior authors’ relative prominence enhanced protégés’ citation outcomes).

The first concern was over the Microsoft Academic Graph (MAG) dataset on which the analyses relied. AlShebli and her team addressed this critique by downloading the most recent version of the dataset, which corrected many of the earlier name-disambiguation errors. They re-ran all of their analyses (an arduous task) and updated their results.

Second, reviewers expressed concern that AlShebli’s team treated co-authorship as synonymous with mentorship. To establish that senior co-authors had indeed provided mentorship, AlShebli and her team sent a survey to a random sample of 2,000 of the scientists. Of these, 167 responded. This is a low response rate, so the responses may not generalize to the broader sample. Notwithstanding this limitation, 72–85 percent of respondents agreed they received guidance on each of the skills listed in the survey. Ninety-five percent agreed they had received guidance on at least one of the skills. These findings lend some (albeit limited) confidence that senior co-authors did indeed provide mentorship to their junior co-authors. Moreover, AlShebli’s team made sure to state explicitly in the article’s title that they investigated “informal mentorship in academic collaborations,” as well as in the text: “we study mentorship in its broader sense, which may involve multiple senior collaborators who may or may not hold a formal supervisory role.”
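To make the survey numbers concrete, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the margin of error uses a simple normal approximation and assumes the 167 respondents behave like a random draw, which self-selected survey replies often do not. The function names are hypothetical and do not come from the paper.

```python
import math


def response_rate(responded: int, invited: int) -> float:
    """Fraction of invited scientists who answered the survey."""
    return responded / invited


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a proportion p based on n replies.

    z = 1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Figures reported above: 2,000 scientists surveyed, 167 responded.
rate = response_rate(167, 2000)

# 95 percent of respondents reported guidance on at least one skill.
moe = margin_of_error(0.95, 167)

print(f"response rate: {rate:.1%}")
print(f"margin of error on the 95 percent figure: +/- {moe:.1%}")
```

Under these assumptions the response rate is a little over 8 percent, and the sampling uncertainty around the 95 percent figure is a few percentage points. The larger worry is not sampling noise but whether the scientists who replied differ systematically from those who did not, which this calculation cannot address.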

Third, reviewers expressed concern over the paper’s lack of attention to the societal factors potentially contributing to the gendered results, such as men’s greater historical career advantages and resource access. AlShebli’s team thus included an explicit acknowledgement of these forces in their general discussion on page six:

…it should be noted that there are societal aspects that are not captured by our observational data, and the specific mechanisms behind these findings are yet to be uncovered. One potential explanation could be that, historically, male scientists had enjoyed more privileges and access to resources than their female counterparts, and thus were able to provide more support to their protégés.

Fourth, the reviewers requested that the team tone down their causal inferences, given that correlational data do not allow for such interpretations. Although AlShebli and colleagues reported altering their language accordingly, portions of the general discussion imply causal forces and speak beyond the data. Consider the following from the final paragraph:

Our gender-related findings suggest that current diversity policies promoting female–female mentorships, as well-intended as they may be, could hinder the careers of women who remain in academia in unexpected ways. Female scientists, in fact, may benefit from opposite-gender mentorships in terms of their publication potential and impact throughout their post-mentorship careers. Policy makers should thus revisit first and second order consequences of diversity policies while focusing not only on retaining women in science, but also on maximizing their long-term scientific impact.

Although it is common for scientists to run with their findings, it is important to be circumspect when discussing topics with such consequential implications as this one. To the team’s credit, they used terms such as “suggest,” “could,” and “may” when discussing the implications of their findings, which indicates a degree of caution. However, the authors were not careful enough. Policy implications certainly do not follow from their analyses. Many interpretations can make sense of these patterns, including citation biases (see above) as well as others, which I detail below. I suspect this error was their most egregious, though I do not view it as fatal. Authors often publish corrections or post-publication updates to amend their articles, which may be appropriate in this case.

Beyond these four primary critiques, others focused on the analytic techniques, which were addressed in the article’s supplemental materials. Although I am not in a position to assess whether these controls were sufficient, four independent scholars with relevant expertise reviewed the paper. Moreover, following recommended best practices, the authors provided open access to their data, which allows fellow academics to conduct their own analyses. AlShebli and her team note, “We believe that free inquiry and debate are engines of science, and welcome the review launched by the Editor in Chief of Nature Communications.”

The relevance of gender

Paper retractions are generally reserved for instances of data falsification and coding errors that render results invalid. Neither applies in this case. So why demand a retraction? Vosshall’s retraction letter notes: “I find it deeply discouraging that this message—avoid a female mentor or your career will suffer—is being amplified by your journal.” Social media responses echoed similar sentiments, as exemplified by Samantha Joel’s tweet,7 “Today feels like a great day to do some of that suboptimal lady mentoring.”

As a female scientist, I understand these worries. Numerous female academics work tireless hours providing mentorship to female protégés, often motivated by a dedication to justice and hope for promoting a more gender-equal scientific landscape. These are praiseworthy endeavors, from which I have personally benefitted. Although my primary advisors were male, I’ve co-authored papers with senior female academics, who undoubtedly enhanced my knowledge, skillset, and career trajectory. I am immensely grateful for them and for the pioneering female academics who came before, paving the path for us all. However, I am also grateful for my male advisors, who likewise invested numerous hours and resources in supporting my career. I believe I have benefitted from the totality of this mentorship. Indeed, one possible takeaway from AlShebli et al.’s results is that female scientists gain more citations when they collaborate with both female and male colleagues.

Why should this possible takeaway evoke outrage? A recent study8 found that people evaluate scientific studies reporting a male-favoring advantage less favorably, compared to those reporting a female-favoring one. Papers demonstrating a male (versus female) advantage were viewed as less credible, less valuable, more offensive, and more harmful. These findings suggest merely highlighting a disparity disfavoring women is likely to evoke public outcry.

Science, however, relies on the dispassionate accumulation and scrutiny of data. It is quite improbable that the entirety of findings will cohere perfectly with our cherished worldviews. With larger and larger datasets available for analysis, we will inevitably uncover unpalatable findings, some with important implications. However, only by documenting reality (or our closest approximation to it) can we strive to improve society. Interventions based on faulty data or incorrect assumptions will fail, wasting millions of taxpayer dollars and innumerable hours. When we bury findings that make us uncomfortable, the empirical landscape presents an inaccurate, lopsided view of reality. If policies and interventions hinge on that empirical landscape, we will only set ourselves up for failure. Moreover, by presenting an incomplete picture of phenomena, we undermine trust in science. Millions of taxpayers are counting on scientists to forward the most compelling evidence, to best inform subsequent interventions and policies.

If female scientists benefit by collaborating with both male and female scholars, we deserve to know that. Female scientists deserve to know that. Rather than erasing AlShebli and colleagues’ laborious work from the record, their findings should stimulate future investigations to examine whether the patterns replicate, and if so, examine the underlying causal forces.

Interpretation of findings: Research topic

One potential cause of the gendered disparity in citation rates is the chosen topic of investigation. A wide body of evidence reveals relatively large and consistent sex differences in interests in people versus things.9 That is, women show greater interest on average in topics related to people and their experiences, whereas men show greater interest in objects and mechanisms. These disparities are mirrored by sex ratios across academic disciplines and publications. A greater proportion of men are found in thing-related academic disciplines such as computer science and physics, whereas a greater proportion of women are found in people-related disciplines, such as nursing and midwifery.10 Correspondingly, men more often author papers in thing-related fields, such as engineering and chemistry, whereas women more often author academic papers in people-related ones, such as psychology and the humanities.11

Although AlShebli and her colleagues controlled for academic discipline (for example, biology, economics, psychology), it is still possible that differences in citations arise from the particular topic under investigation. Take psychology as one example. A qualitative investigation of one niche group’s subjective experiences may garner a different number of citations than an experimental investigation of cognitive processing. If male and female academics differ on average in their selection of topics, citation disparities may follow from the topic,12 rather than from the scientists’ sex, per se. Indeed, AlShebli’s team straightforwardly highlighted this possibility: “The explicit drivers underlying this empirical fact could be multifold, such as… women taking on less recognized topics that their protégés emulate.”

It is certainly possible that scientists’ choice of study is influenced by societal factors including gender stereotypes, but it is also possible these choices reflect individuals’ own particular interests. The former suggests the value of intervention, but the latter may not. Regardless, understanding the mechanisms underlying these patterns better illuminates responses most likely to promote maximum flourishing.

Interpretation of findings: Same-sex cooperation across status differences

Because AlShebli et al.’s paper uncovered a citation penalty among female junior and senior collaborations in particular, it is also possible there is something undermining female-female collaborations across status disparities. Although readers may find this possibility unpalatable, it is consistent with some prior evidence, as well as the historical challenges faced by women.

Across human history, many human groups, especially food-producing ones, practiced patrilocality.13, 14, 15, 16 Under patrilocality, women left their families to reside with their husbands’ families. These social arrangements meant that many of our female ancestors were surrounded by unrelated individuals. How did ancestral women foster cooperative alliances with the nearby, unrelated women? David Geary has theorized that without shared genetic interests in one another, unrelated female ancestors upheld their cooperative relationships using reciprocal altruism, whereby favors and goods were exchanged in a tit-for-tat manner.17 If, throughout human history, women relied on these relationship styles to garner support and resources, relics of these dynamics may persist into modern contexts.

Reciprocity is much easier to track in dyadic (one-on-one) relationships than in large, complex groups. Consistent with this explanation, girls and women today show a stronger preference than boys and men do for dyadic relationships over large groups.18, 19, 20, 21, 22 It is possible such a preference is a relic of our female ancestors’ cooperative dynamics.

Reciprocal exchanges are also easier to sustain when both parties hold similar levels of power and resources. Consider, as a caricatured example, a celebrity trying to befriend a homeless person. Mutual benefit would be unlikely due to the large disparities in power and wealth. The relationship would likely devolve into a unilateral resource transfer from the more affluent celebrity to the indigent person, or worse, into an exploitation of the homeless individual. Although extreme, this example demonstrates how mutual benefit is unlikely in partnerships characterized by large disparities in status, resources, or power, a pattern supported by game-theoretic models.23

If ancestral women upheld their relationships through mutual exchange, disparities in status may have threatened the possibility of mutual benefit. Indeed, a large body of evidence finds that compared to men, women more strongly prefer equal distributions of resources and power over unequal (i.e., hierarchical) or equitable (i.e., merit-based) ones.24, 25, 26, 27, 28, 29 This well-documented sex difference may suggest that modern women prefer equal power distributions because those preferences aided female ancestors in sustaining mutually beneficial alliances.

Another explanation for why men may, on average, prefer hierarchical dynamics more strongly than women is that role specialization was critical for large-group dynamics, such as warfare. Because our male ancestors were more often involved in warfare,30, 31, 32, 33 promoting group coordination through hierarchy may have aided male success on the battlefield. These aggressive intergroup conflicts were incredibly consequential, as loss very often meant death. Thus, modern men are the genetic descendants of those who formed successful groups in warfare,34, 35 which likely involved role specialization and hierarchies.

This relative female preference for symmetry and equality could potentially compromise female-female cooperation in contexts where status demarcations are overt. Consider modern organizations, which are often hierarchically organized. Research finds that disparities in organizational power may corrode female-female cooperative relationships. For example, among over 60,000 working individuals, female employees judged their female managers as less competent (d = .08) and reported less close relationships with them (d = .15), compared to female employees reporting to male managers.36 In another study of over 11,600 employees, female workers were less satisfied with their jobs when they reported to a female rather than a male boss, whereas male employees showed no difference in satisfaction as a function of their boss’s gender.37 These negative outcomes were observed primarily among female junior/senior relationships, a pattern mirrored in AlShebli’s findings.

However, this slight animosity appears to cut both ways, as female superiors occasionally distance themselves from, fail to help, or actively thwart their female subordinates, a phenomenon known as the “Queen Bee Syndrome.” For example, in a survey of 1,700 employees, minority women reported more support from their male than female supervisors, as well as more optimism about their potential for promotion.38 In another study, low-performing female employees who switched from a male boss to a high-performing female boss earned 30 percent less than similarly low-performing male employees who made an identical switch.39 Such a pattern suggests low-performing female (but not male) subordinates were penalized financially by their accomplished female superiors.

This “Queen Bee Syndrome” has also been documented in academic contexts. Two studies found that senior female academics evaluated junior female academics as less committed to their careers than junior male academics, whereas no such bias was found among senior male academics.40, 41 In another study among 50 universities, female senior faculty were less likely to co-author papers with same-sex junior faculty than were male senior faculty.42

To be sure, these studies used methods and measures quite different from those in AlShebli et al.’s analysis. Nonetheless, taken as a whole, this body of work suggests female-female cooperation across clear status demarcations may face unique obstacles. Although some may recoil at this possibility, such reactions do not help the women potentially implicated by these patterns. If there is some truth to these findings, women are done no service by hiding them. We cannot determine how best to support female-female cooperation if we do not acknowledge any of the challenges facing it.

Moving forward

When findings make us uncomfortable, we should dive into them rather than suppress them. By examining the underlying mechanisms, we can better understand how to improve the relevant outcomes. Our well-intended interventions will be sure to fail if we exclude certain possibilities from empirical investigation.

Much of the outrage expressed in response to AlShebli’s paper is well-intended. People want female academics to succeed. Whether her team’s findings reflect a fluke, citation bias, gender differences in topic choice, or challenges in female cooperation across status disparities, our only path forward is through further examination. Only data can tell us how best to promote female scientists.

If you are upset by AlShebli’s findings, channel your efforts towards understanding these patterns better. Do not bury them. This examination, although only correlational and observational, drew on over 222 million papers and more than three million junior and senior co-author teams. Unlike many other studies, it was well-powered to detect effects. If the effects documented are real, let us examine why. We would be doing female scientists a disservice by sticking our heads in the sand and pretending these patterns are not there. If you care about promoting female scientists, collect new data, run additional analyses, or dive further into the empirical literature.

No scientific study is perfect and AlShebli et al.’s is certainly no exception. It may require an addendum to qualify overly confident interpretations. It is also possible that when we revisit her team’s data, we will learn something new. Let us publish our critiques and new findings. Science is a cumulative, self-correcting process. Its success and credibility hinge on collecting, analyzing, and disseminating evidence even-handedly. Without an accurate depiction of the human experience, we will be ceaselessly led astray in our efforts to improve it.


1 Vosshall’s letter:
2 AlShebli, B., Makovi, K., & Rahwan, T. (2020). The association between early career informal mentorship in academic collaborations and junior author performance. Nature Communications.
3 Dworkin, J. D., Linn, K. A., Teich, E. G., Zurn, P., Shinohara, R. T., & Bassett, D. S. (2020). The extent and drivers of gender imbalance in neuroscience reference lists. Nature Neuroscience, 23, 918-926.
4 Fox, C. W., & Paine, C. T. (2019). Gender differences in peer review outcomes and manuscript impact at six journals of ecology and evolution. Ecology and Evolution, 9(6), 3599-3619.
5 Potthoff, M., & Zimmermann, F. (2017). Is there a gender-based fragmentation of communication science? An investigation of the reasons for the apparent gender homophily in citations. Scientometrics, 112, 1047-1063.
6 Reviews of AlShebli paper:
7 Samantha Joel’s tweet:
8 Stewart-Williams, S., Chang, C. Y. M., Wong, X. L., Blackburn, J. D., & Thomas, A. G. (2020). Reactions to male-favouring vs. female-favouring sex differences: A preregistered experiment and Southeast Asian replication. British Journal of Psychology.
9 Su, R., Rounds, J., & Armstrong, P. I. (2009). Men and things, women and people: a meta-analysis of sex differences in interests. Psychological Bulletin, 135, 859.
10 Holman, L., Stuart-Fox, D., & Hauser, C. E. (2018). The gender gap in science: How long until women are equally represented? PLoS Biology, 16, e2004956.
11 Luoto, S. (2020). Sex differences in people and things orientation are reflected in sex differences in academic publishing. Journal of Informetrics, 14, 101021.
12 Madison, G., & Söderlund, T. (2018). Comparisons of content and scientific quality indicators across peer-reviewed journal articles with more or less gender perspective: gender studies can do better. Scientometrics, 115(3), 1161-1183.
13 Burton, M. L., Moore, C. C., Whiting, J. W., & Romney, A. K. (1996). Regions based on social structure. Current Anthropology, 37, 87-123.
14 Copeland, S. R., Sponheimer, M., de Ruiter, D. J., Lee-Thorp, J. A., Codron, D., le Roux, P. J., … & Richards, M. P. (2011). Strontium isotope evidence for landscape use by early hominins. Nature, 474(7349), 76-78.
15 Szécsényi-Nagy, A., Brandt, G., Haak, W., Keerl, V., Jakucs, J., Möller-Rieker, S., … & Osztás, A. (2015). Tracing the genetic origin of Europe’s first farmers reveals insights into their social organization. Proceedings of the Royal Society B: Biological Sciences, 282(1805), 20150339.
16 Wilkins, J. F. (2006). Unraveling male and female histories from human genetic data. Current Opinion in Genetics & Development, 16, 611–617.
17 Geary, D. C. (2002). Sexual selection and sex differences in social cognition. In A. V. McGillicuddy-De Lisi & R. De Lisi (Eds.), Biology, society, and behavior: The development of sex differences in cognition (pp. 23–53). Greenwich: Ablex/Greenwood.
18 Benenson, J. F. (1990). Gender differences in social networks. The Journal of Early Adolescence, 10, 472–495.
19 Benenson, J. F., Apostoleris, N. H., & Parnass, J. (1997). Age and sex differences in dyadic and group interaction. Developmental Psychology, 33, 538–543.
20 David-Barrett, T., Rotkirch, A., Carney, J., Izquierdo, I. B., Krems, J. A., Townley, D., … Dunbar, R. I. (2015). Women favour dyadic relationships, but men prefer clubs: Cross-cultural evidence from social networking. PLoS ONE, 10, e0118329.
21 Fabes, R. A., Martin, C. L., & Hanish, L. D. (2003). Young children’s play qualities in same-, other-, and mixed-sex peer groups. Child Development, 74, 921–932.
22 Vigil, J. M. (2007). Asymmetries in the friendship preferences and social styles of men and women. Human Nature, 18, 143–161.
23 Johnstone, R. A., & Bshary, R. (2002). From parasitism to mutualism: Partner control in asymmetric interactions. Ecology Letters, 5, 634–639.
24 Almås, I., Cappelen, A. W., Sørensen, E. Ø., & Tungodden, B. (2010). Fairness and the development of inequality acceptance. Science, 328(5982), 1176-1178.
25 Berdahl, J. L., & Anderson, C. (2005). Men, women, and leadership centralization in groups over time. Group Dynamics: Theory, Research, and Practice, 9, 45-57.
26 Carlsson, F., Daruvala, D., & Johansson‐Stenman, O. (2005). Are people inequality‐averse, or just risk‐averse? Economica, 72(287), 375-396.
27 Dufwenberg, M., & Muren, A. (2006). Gender composition in teams. Journal of Economic Behavior & Organization, 61, 50–54.
28 Saad, G., & Gill, T. (2001). Gender differences when choosing between salary allocation options. Applied Economics Letters, 8, 531–533.
29 Scott, J., Matland, R., Michelbach, P., & Bornstein, B. (2001). Just deserts: An experimental study of distributive justice norms. American Journal of Political Science, 45, 749–767.
30 Baumeister, R. F. (2010). Is there anything good about men? How cultures flourish by exploiting men. New York, NY: Oxford University Press.
31 Chagnon, N. A. (1988). Life histories, blood revenge, and warfare in a tribal population. Science, 239, 985–992.
32 Geary, D. C. (2010). Male/female: The evolution of human sex differences (2nd ed.). Washington, DC: American Psychological Association.
33 Keeley, L. (1996). War before civilization. New York: Oxford University Press.
34 Carvajal-Carmona, L. G., Soto, I. D., Pineda, N., Ortíz-Barrientos, D., Duque, C., Ospina-Duque, J., … & Ruiz-Linares, A. (2000). Strong Amerind/white sex bias and a possible Sephardic contribution among the founders of a population in northwest Colombia. The American Journal of Human Genetics, 67(5), 1287-1295.
35 Underhill, P. A., Passarino, G., Lin, A. A., Shen, P., Mirazón Lahr, M., Foley, R. A., & Cavalli-Sforza, L. L. (2001). The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Annals of Human Genetics, 65, 43–62.
36 Elsesser, K. M., & Lever, J. (2011). Does gender bias against female leaders persist? Quantitative and qualitative data from a large-scale survey. Human Relations, 64, 1555–1578.
37 Artz, B., & Taengnoi, S. (2016). Do women prefer female bosses? Labour Economics, 42, 194-202.
38 Maume, D. J. (2011). Meet the new boss… same as the old boss? Female supervisors and subordinate career prospects. Social Science Research, 40, 287–298.
39 Srivastava, S. B., & Sherman, E. L. (2015). Agents of change or cogs in the machine? Reexamining the influence of female managers on the gender wage gap. American Journal of Sociology, 120(6), 1778-1808.
40 Ellemers, N., Van den Heuvel, H., De Gilder, D., Maass, A., & Bonvini, A. (2004). The underrepresentation of women in science: Differential commitment or the queen bee syndrome? British Journal of Social Psychology, 43(3), 315-338.
41 Faniko, K., Ellemers, N., & Derks, B. (2020). The Queen Bee phenomenon in Academia 15 years after: Does it still exist, and if so, why? British Journal of Social Psychology, e12408.
42 Benenson, J. F., Markovits, H., & Wrangham, R. (2014). Rank influences human sex differences in dyadic cooperation. Current Biology, 24(5), R190-R191.


