

An Astronomer Cancels His Own Research—Because the Results Weren’t Popular

The 30-meter telescope on Pico del Veleta in Spain’s Sierra Nevada. Photo from Wikimedia Commons.

Astronomy appears to be in trouble, increasingly populated by researchers who seem more concerned with terrestrial politics than with celestial objects, and who at times view the search for truths about nature as threatening. This became evident in recent years, when the proposed Thirty Meter Telescope (TMT) project in Hawaii was blocked by Indigenous protestors who view the mountain on which it is to be built as sacred. With a resolution 12 times finer than that of the Hubble Space Telescope, the TMT could open abundant new observational opportunities in astronomy and astrophysics. But a protest by advocates within the astronomy community, in support of the Indigenous groups, means it is now an open question whether the TMT will ever be built.

Last week brought another ominous sign of the times: the eminent astronomer John Kormendy retracted from a preprint website an article intended for publication in the Proceedings of the National Academy of Sciences. His article focused on statistical results relating to the evaluation of the “future impact” of astronomers’ research as a means to “inform decisions on resource allocation such as job hires and tenure decisions.” Online critics attacked Kormendy’s use of quantitative metrics, which may be seen as casting doubt on the application of diversity criteria in personnel decisions, after which Kormendy felt compelled to issue an abject apology (more on this below).

Astronomer John Kormendy, photographed in 2006. 

Of course, statistical analyses of real-world human data are always subject to the possibility that systematic biases can skew the claimed results. And I would never suggest that Kormendy’s work is beyond criticism. But the traditional scientific way of engaging in such criticism is for other scientists to present alternative proposals, and explore other data sets, in search of possible flaws in the original analysis. That is how science should be done. Those who claim in advance, without new analysis or data and without challenging its accuracy, that someone else’s research is “harmful” or threatening should consider another profession.

I have been a professor of astronomy (as well as physics) for over 35 years, at a variety of research institutions on three continents. But I wouldn’t classify myself as an astronomer. My educational background is in another area—theoretical particle physics—and my professional forays into astrophysics and cosmology have stemmed from my longstanding interest in observing scientific phenomena from a wide variety of disciplinary perspectives, astronomy included, as a means of testing fundamental notions about nature.

Nevertheless, I have worked with many astronomers over the course of my career, and consulted and learned from a far larger number. So I know enough about the social and professional dynamics of the profession to be concerned.

One of the astronomers whose work I have known for decades (work that bears on my interest in dark matter and the formation of the universe’s structure) is John Kormendy himself. Indeed, I briefly met him while visiting the Dominion Astrophysical Observatory in Victoria, Canada.

That was several decades ago. But when I recently checked with a colleague about how Kormendy’s reputation had fared in the interim, I was told he is one of the “world’s premier researchers on the formation and structure of galaxies.” He is a member of the National Academy of Sciences and the winner of numerous awards in his field, and his research has been cited more than 33,000 times by other astronomers.

Kormendy has long been interested in metrics that scientists can use to make their assessments of potential hires and promotions less subjective. As with all areas in which decisions depend on human perceptions, no methodology is guaranteed to work universally. Though I personally wouldn’t spend my own research time exploring this area, I appreciate that some are willing to investigate it systematically, in spite of the many obvious obstacles.

Following five years of accumulating data, and consulting colleagues across the globe, Kormendy produced a book on the subject, published in August by the Astronomical Society of the Pacific, entitled Metrics of Research Impact in Astronomy, as well as the related (and now retracted) paper submitted to the Proceedings of the National Academy of Sciences (PNAS) on November 1st under the title ‘Metrics of research impact in astronomy: Predicting later impact from metrics measured 10-15 years after the PhD.’

Research Impact versus Total Citations for the astronomers studied, adapted from Figure 1 in Metrics of research impact in astronomy.

Kormendy began his paper cautiously, recognizing that his emphasis on applying quantitative metrics to human-resource evaluation would be viewed with skepticism by those who claim that such metrics embed systemic biases, and that their use presents obstacles to inclusion. Notwithstanding such anticipated concerns, he argued that

we have to judge the impact that [a] candidate’s research has had or may yet have on the history of his or her subject. Then metrics such as counts of papers published and citations of those papers are often used. But we are uncertain enough about what these metrics measure so that arguments about their interpretation are common. Confidence is low. This can persuade institutions to abandon reliance on metrics. [But] we would never dare to do scientific research with the lack of rigor that is common in career-related decisions. As scientists, we should aim to do better.

He makes it clear up front that quantitative metrics cannot tell us everything we need to know about a candidate. In the “Significance Statement” provided on the first page, he states

This paper develops machinery to make quantitative predictions of future scientific impact from metrics measured immediately after the ramp-up period that follows the PhD. The aim is to resolve some of the uncertainty in using metrics for one aspect only of career decisions—judging scientific impact. Of course, those decisions should be made more holistically, taking into account additional factors that this paper does not measure (my emphasis).

The bulk of the paper focuses on three out of 10 metrics—citations of refereed papers, citations normalized by numbers of coauthors, and first-author citations—which Kormendy attempts to develop into a prediction machine, correlating the metrics evaluated over the early part of a researcher’s career with their later “impact.”

This latter index was constructed by asking 22 scientists who are well known in their respective subfields to evaluate the impact of 512 astronomers from 17 major research universities around the world, whose early-career metrics could then be correlated with those evaluations. Specifically, Kormendy sought to determine whether these assessments of individuals’ “impact” correlated significantly with the metrics measured 10 to 15 years after they had received their PhDs, and with those from the earlier period following the PhD (in other words, whether the metrics could predict the evaluations rendered by the advisory panel). The paper claimed to show, not surprisingly, that averaging the three metrics typically yields a better predictive estimate than any one of them does separately.

One can question many aspects of this model, including the significance of its conclusions. That early-career citation counts correlate with later impact may seem almost tautological. (Why would one not expect a large number of citations early in one’s career to correlate with a reputation as a high-impact scholar later on?) Likewise, the proposition that averaging several metrics produces a better predictive fit than any individual metric does in isolation would be noteworthy only if it turned out to be false.

Finally, one can always question the subjective assessments of the 22 designated sages tasked with measuring “impact,” especially since their assessments (and Kormendy’s own decisions about who would perform this task) may reflect the same kind of subjectivity that his whole project is designed to avoid.

I am not sure Kormendy understood the can of worms he was opening. But the response from the astronomy Twittersphere was swift. Anyone familiar with the standard concerns of those who tend to view any quantitative metric applied to assessment (including standardized test results) as inherently suspect at best, and sexist and racist at worst, could have anticipated the arguments. Kormendy further tempted fate by drawing his subjects only from well-known institutions, and by recruiting mostly well-known senior male scientists for his expert “impact” panel.

As it happened, those who rained criticism on Kormendy didn’t limit themselves to these generalities. Critics also claimed specifically that junior researchers who read Kormendy’s paper might feel threatened, or that their careers might be harmed by selection committees whose members would now be further encouraged to be systematically biased against them.

Nevertheless, even imperfect quantitative metrics can improve on qualitative assessments made without them. And it is quite true that Kormendy’s analysis, if applied as a tool for recruitment or promotion, would expose, for better or worse, those whose metrics are low. There may be many reasons for low scores, including bias. But low scores can also mean that the researchers being evaluated are simply not productive or impactful. Either way, such scores expose potential problems (with either the candidate or his or her academic environment) that could be addressed. Moreover, however much one might dislike quantitative (or “objective”) merit-based metrics, the alternatives have historically usually been worse, including nepotism and cronyism.