After Quillette published an essay by evolutionary biologist Carole Hooven titled "Why Do Men Dominate Chess?", a number of friends asked me if her essay explained a similar phenomenon in my own field of mathematics. After all, aren't many of the traits that Hooven identified as crucial to excellence in chess (innate spatial ability, competitiveness, obsessiveness) similarly crucial for excellence in mathematics? And don't men dominate mathematics for some of the same reasons?
Hooven states that there does not appear to be much dispute about the size or persistence of sex gaps in chess, and she wonders if biology might play a significant role in men's domination of the game. Her study was prompted by a ruling from FIDE, the International Chess Federation in Switzerland, stipulating that transgender women may not compete in official FIDE women-only chess tournaments, which often carry substantial financial rewards and recognition. That ruling suggests that men may have some biological advantage in chess. Is this also the case in mathematics?
To quantify what she means by dominance, Hooven reports that "the fair sex accounts for only about two percent of the world's chess Grandmasters." But mathematics does not have Grandmasters, and unlike the closely related field of physics, it has no Nobel Prize. Common criteria for distinction in mathematics, such as the number of PhDs or professorships an institution awards, are decided by committees, and these decisions are often quite political. On the other hand, there are some clear objective measures in mathematics analogous to chess's world championships.
The statistics for the very highest levels of mathematics competitions, in age groups ranging from middle school MathCounts to the university-level Putnam Mathematical Competition and the annual International Mathematical Olympiad, suggest that sex statistics in mathematics closely resemble the chess statistics Hooven quoted in her essay. For example, about 95 percent (not all sexes were revealed) of the top-scoring 100 participants in the 2023 International Mathematical Olympiad were male.
Unlike chess, these three competitions are all age-restricted. But mathematics has two additional levels of distinction above world championships and Olympiads, and both are open to all comers. The first is the set of seven Millennium Prize Problems announced in 2000, the solution to each of which carries a prize of a million US dollars. A quarter-century later, only one of these, the Poincaré Conjecture, has been solved, by unemployed Russian mathematician Grigori Perelman, who turned down both the million-dollar prize and a Fields Medal.
The very highest level of mathematics challenges and accomplishments, which carry no monetary award at all, is solving one of the famous historical mathematics questions that date back to the ancient Greeks. My personal favourite still-unsolved problem is Goldbach's 1742 gem: Is every even integer greater than two the sum of two prime numbers? Yes or no?
Solving one of these problems brings the highest recognition among mathematicians, and sometimes even the general public. When Princeton mathematician Andrew Wiles was able to solve Fermat's 1637 conjecture in 1995 after working on it for nearly eight years, mostly in isolation and in secret, his discovery instantly made international news. The sex statistics for solving these conjectures are as persistent and one-sided as those for chess. For instance, of the roughly two dozen famous conjectures that have finally been cracked by lone mathematicians in the past fifty years, every one has been solved by a man.
Men's domination at the highest levels of mathematics is much more contentious than their dominance in championship chess for several reasons. For starters, almost every human being on the planet, whether good at mathematics or not, is forced to learn and use at least some basic arithmetic and algebraic skills in daily life.
Another important difference between chess and mathematics is that mathematics is often both fascinating and useful to the general population. For example, an article by applied mathematician Herman Chernoff titled "How to Beat the Massachusetts Numbers Game" made instant news, and Benford's Law is employed internationally to detect financial and voting fraud as well as earthquakes. There are huge numbers of excellent careers at all levels in mathematics: teachers from grade school to graduate school, financial analysts, data scientists, sports statisticians, you name it. But only a handful of Grandmasters make a full-time living playing chess, and no one but dedicated chess aficionados cares about new opening or end-game chess strategies.
Unlike Olympic sports, but like chess, the International Mathematical Olympiad has both an open category that anyone may enter and separate competitions open only to girls or women. The annual open International Mathematical Olympiad is augmented by both a European Girls Mathematical Olympiad and a Chinese Girls Mathematical Olympiad. Although American girls' teams have won medals, for reasons unexplained there are neither American nor International Girls Mathematical Olympiads at the present time. The renowned Putnam Mathematical Competition is also open to everyone, but instead of an additional women-only division, it has a special award for the highest-scoring female.
Since some of the women-only mathematics prizes also come with high perks (the Birman, Michler, and Street women-only mathematics awards, for example, each award US$50,000), the same question may soon arise in mathematics. Should transgender women be barred from female-only mathematics competitions and awards? Does biology play an important role, not only in tournament-level chess as Hooven suggests, but also at the highest levels of mathematical achievement?
The Greater Male Variability Hypothesis and Its Critics
In making the case for the role biology may play in championship chess, Hooven's Quillette article did contain one significant error: she dismissed greater male variability (GMV) as a factor. This error does not affect her other arguments, and GMV may in fact provide strong support for why men appear to dominate the upper tiers of both chess and mathematics. Her error, as I will explain below, is a version of a fundamental logical error that continues to plague GMV-related research today.
The basic tenet of GMV, which dates back to Charles Darwin's observation in his 1871 book The Descent of Man, is that throughout the animal kingdom, males generally tend to be more variable than females of the same species. In the highly controversial field of human cognition, for example, GMV has often been interpreted to mean that there are more idiots and more geniuses among men than among women.
Darwin's questions about sex differences in variability continue to challenge science: over 350 scientific articles on the subject have been published since 2020, and GMV is generally accepted in many contexts. For example, using three different measures for differences in variability (Levene's test, the variance ratio test, and the distances between cumulative distribution functions), Lehre et al. concluded:
The data presented here show that human greater male intrasex variability is not limited to intelligence test scores, and suggest that generally greater intrasex variability among males is a fundamental aspect of the differences between sexes.
Most people do not care about sex differences in blood pressure or birth weight, but the subject of sex differences in human cognitive abilities has been an incendiary topic almost since Darwin's time. And much of the controversy, as Hooven hints, has related to the role of biology.
In 1992, Stanford education professor Nel Noddings called GMV a "pernicious hypothesis." It is, she explained, "objectionable because of its biological connotations" and this accounts for "the revulsion with which many feminists react to it." In their 2018 textbook on the psychology of women and gender, professors of women's and gender studies Nicole Else-Quest and Janet Hyde take Noddings's observation a step further. "Many feminists," they write, "are wary of biological explanations of anything, in large part because biology always seems to end up being a convenient justification for perpetuating the status quo."
But in a curious turn of events (and an even more curious turn of logic), Darwin's biological conception of GMV is now being used to argue that the persistent overrepresentation of men in fields like mathematics is mainly due to cultural effects, including what Hooven calls "sexism and harassment." Many prevailing GMV arguments, however, suffer from fundamental logical errors. The current scientific definition of GMV is that there is GMV for a given trait if, and only if, the standard deviation for the male values of that trait is larger than the standard deviation for the female values of that same trait. This definition says absolutely nothing about comparisons between the averages or high or low values of the male and female distributions.
Informally, GMV is usually understood to mean that the male values are more spread out than the female values for that trait, but there are many different statistical measures of how âspread outâ a distribution or data set is. The range of the data set, for example, is simply the difference between the maximum and minimum values of the data. Other statistics such as mean (or average value) and median are generally well understood by the general public and used more or less correctly in everyday language. But standard deviation is a different matter.
At the end of the semester in my upper-division calculus-based probability courses at Georgia Tech, I was happy if my students could correctly define "standard deviation." Care to give it a try? The standard deviation of a data set is "the non-negative square root of the average of the square of the distance from the data's mean value." In practice, of course, researchers simply plug their data into a statistical package that spits out the correct numerical value of the standard deviation, which in plain English measures the spread of a data set relative to its mean.
Thus, the current formal working scientific definition of whether or not there is greater male or greater female variability for a given trait, such as height or blood pressure or mathematics ability, depends solely on whether the male or female standard deviation for those trait values is larger.
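The verbal definition of standard deviation given above can be checked directly. What follows is a minimal Python sketch, not a prescribed method: the function name `population_sd` and the two small samples are invented purely for illustration and are not data from any study.

```python
import math
import statistics


def population_sd(values):
    """Non-negative square root of the mean squared distance from the mean."""
    mean = sum(values) / len(values)
    mean_sq_dist = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(mean_sq_dist)


# Hypothetical trait values, made up purely to illustrate the definition.
male_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
female_values = [4.0, 4.5, 5.0, 5.0, 5.0, 5.0, 5.5, 6.0]

sd_m = population_sd(male_values)
sd_f = population_sd(female_values)

# The hand-rolled function agrees with the standard library's population SD.
assert math.isclose(sd_m, statistics.pstdev(male_values))

# Under the formal definition, "GMV" means nothing more than sd_m > sd_f.
has_gmv = sd_m > sd_f
print(f"male SD = {sd_m:.3f}, female SD = {sd_f:.3f}, GMV: {has_gmv}")
```

Note that both samples here have the same mean, so the comparison of standard deviations is the only thing that distinguishes them. Many statistical packages report the sample version of the standard deviation, which divides by n − 1 rather than n; for a comparison between two large groups the difference is negligible.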
Three Common GMV Errors
In understanding the GMV hypothesis, we first have to clear up some common misunderstandings:
The Uniformity Error: Just because there's GMV in a whole population doesn't mean it exists in every subgroup.
The Causation and Correlation Error: GMV doesn't tell us anything about why the variability exists. It could be due to biological factors, social factors, or a mix of both.
The Tail Ratio Error: GMV doesn't automatically mean there will be more males at the extreme high (or low) end of a trait.
The Uniformity Error
The uniformity error is the mistaken assumption that if the distribution of a certain trait exhibits GMV for a population as a whole, then the individuals in various subpopulations must also exhibit GMV for that trait. This is simply not true. Consider a data set with equal numbers of males and females, where the male values are all tightly clustered in two equal-sized groups of high and low values, and the female values are uniformly distributed between those same high and low values. Clearly there is GMV for this trait, but among the above-average values alone, there is greater female variability.
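The two-cluster example just described can be reproduced numerically. The sketch below uses invented values (a low value of 0, a high value of 10, and 100 individuals of each sex); only the shape of the data follows the description in the text.

```python
import statistics

# Invented data matching the description: males sit in two tight clusters
# (low and high); females are spread evenly between the same two values.
LOW, HIGH, N = 0.0, 10.0, 100
males = [LOW] * (N // 2) + [HIGH] * (N // 2)
females = [LOW + (HIGH - LOW) * i / (N - 1) for i in range(N)]

overall_mean = statistics.mean(males + females)


def sd(values):
    """Population standard deviation."""
    return statistics.pstdev(values)


# Whole population: males are more variable.
sd_m_all, sd_f_all = sd(males), sd(females)

# Subgroup of above-average values only: the ordering reverses.
upper_m = [v for v in males if v > overall_mean]
upper_f = [v for v in females if v > overall_mean]
sd_m_upper, sd_f_upper = sd(upper_m), sd(upper_f)

print(f"all values:    male SD {sd_m_all:.2f} vs female SD {sd_f_all:.2f}")
print(f"above average: male SD {sd_m_upper:.2f} vs female SD {sd_f_upper:.2f}")
```

In the full data set the male standard deviation is the larger one, while among the above-average values every male sits at the same high value, so the male standard deviation there is zero and the female one is not.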
Hooven's inquiry into why men dominate the highest levels of chess commits the uniformity error by dismissing GMV as a possible factor, because even though there appears to be GMV in the overall population of world-class chess players, some subgroups actually exhibit greater female variability (GFV). But GMV in chess abilities cannot be ruled out simply because of this non-uniformity. In fact, GMV could be a very significant explanation of why nearly all Grandmasters are men.
The Causation and Correlation Error
A second basic error in GMV arguments is a variant of the common causation and correlation error that has plagued the study of statistics since its inception. Whether or not a data set of a certain trait among males and females exhibits GMV or GFV implies nothing at all about the causes of the sex differences in the variability. A moment's reflection tells us that this is true even for a single data set. If we learn that the standard deviation of the height of men in Canada is 2.7 cm, that is all we know; it tells us nothing about the underlying reasons for this magnitude of standard deviation. Similarly, since GMV and GFV simply record comparisons of the two respective standard deviations, they reveal absolutely nothing about the underlying causes of any differences between the male and female standard deviations for that trait.
The Tail Ratio Error
The third common logical fallacy in GMV arguments is what I will call the tail ratio error. Without additional knowledge about the actual distribution of the trait values in question, neither GMV nor GFV implies anything about sex comparisons in the extreme values of that trait. Suppose, for example, that the female values for a given trait are all high and tightly clustered, and those for men are all lower but widely dispersed. In this case, there is clearly GMV for that trait, but females are all at the top, and the ratio of females to males in that upper tail region is high. On the other hand, if the male values are again more widely dispersed but are uniformly higher than the female values, there is again GMV but the ratio of men to women among the highest values is high. Thus GMV or GFV alone implies absolutely nothing about the tail ratios for that trait, including the dominance of either sex in chess or mathematics or any other trait.
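The two scenarios in this paragraph can be made concrete with a handful of numbers. The following sketch uses made-up values and a hypothetical helper, `top_counts`; in both scenarios the male values are the more dispersed, yet the sex at the top differs.

```python
import statistics

# Made-up values. In both scenarios the male values are the more dispersed.
females_high = [90, 91, 92, 93, 94, 95]   # tight cluster, high
males_low = [10, 25, 40, 55, 70, 85]      # wide spread, all lower

females_low = [10, 11, 12, 13, 14, 15]    # tight cluster, low
males_high = [20, 35, 50, 65, 80, 95]     # wide spread, all higher


def top_counts(males, females, k=3):
    """Count (males, females) among the k highest values overall."""
    labelled = [(v, "M") for v in males] + [(v, "F") for v in females]
    top = sorted(labelled, reverse=True)[:k]
    n_m = sum(1 for _, sex in top if sex == "M")
    return n_m, k - n_m


for males, females in [(males_low, females_high), (males_high, females_low)]:
    gmv = statistics.pstdev(males) > statistics.pstdev(females)
    print(f"GMV: {gmv}, top-3 (M, F): {top_counts(males, females)}")
```

Both scenarios satisfy the formal definition of GMV, but the top three values are all female in the first and all male in the second.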
A simple example with real data may help clarify these three errors. This bar graph data is from mathematical performance tests of middle school children internationally. First, the non-uniformity of the male/female variability in these countries, which exhibit both GMV and GFV, does not rule out GMV globally, and suggesting otherwise is committing the uniformity error. In fact, this study found that there is indeed GMV in the worldwide test scores. Second, inferring causes for the GMV in Norway and the GFV in Tunisia solely from these differences in variability is committing the causation and correlation error. The fact that the male or female standard deviation is greater in a country gives absolutely no clue as to the underlying reasons for these differences. Third, neither GMV nor GFV alone, without additional information about the distributions, implies anything about extreme values. For example, with these same standard deviations, 95 percent of the top 1 percent of Norwegians' scores could be males (e.g., if the average score for males is 100 and the average for females is 50), or 95 percent of the top 1 percent could be females (if the female average score is 100 and that of males is 50). Thus drawing any conclusions about the over- or under-representation of either sex in the extreme values, based solely on these differences in variability, is committing the tail ratio error.
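The point about swapping the averages while holding the standard deviations fixed can be sketched under a normal model. The means of 100 and 50 come from the example in the text; the standard deviations (11 for males, 10 for females), the equal group sizes, and the helper `male_share_of_top` are assumptions of this sketch, since the actual Norwegian figures are not given here.

```python
from statistics import NormalDist


def male_share_of_top(male, female, top_fraction=0.01):
    """Share of males above the cutoff that marks the top `top_fraction`
    of a combined population with equal numbers of each sex."""
    def frac_above(c):
        return 0.5 * ((1 - male.cdf(c)) + (1 - female.cdf(c)))

    lo, hi = -1000.0, 1000.0      # bracket for the cutoff
    for _ in range(200):          # bisection: frac_above is decreasing in c
        mid = (lo + hi) / 2
        if frac_above(mid) > top_fraction:
            lo = mid
        else:
            hi = mid
    cutoff = (lo + hi) / 2
    above_m = 1 - male.cdf(cutoff)
    above_f = 1 - female.cdf(cutoff)
    return above_m / (above_m + above_f)


SD_M, SD_F = 11.0, 10.0   # hypothetical; males more variable in both cases

share_a = male_share_of_top(NormalDist(100, SD_M), NormalDist(50, SD_F))
share_b = male_share_of_top(NormalDist(50, SD_M), NormalDist(100, SD_F))
print(f"male mean 100, female mean 50: male share of top 1% = {share_a:.3f}")
print(f"male mean 50, female mean 100: male share of top 1% = {share_b:.3f}")
```

The standard deviations are identical in the two runs, so the difference in variability is the same; only the averages are swapped, and the composition of the top 1 percent flips with them.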
Unfortunately, a widely cited and influential article published over a decade ago in Notices of the American Mathematical Society ("the world's most widely read journal aimed at professional mathematicians") contained all three of these errors. That article, by a husband and wife team of professors at the University of Wisconsin, committed the tail ratio error by claiming that if GMV were true for mathematics ability, it would explain why all Fields Medalists (one of the highest prizes in mathematics) had been male (at the time of the article's publication). It then committed the uniformity error by claiming that if GMV were true internationally, then it would hold for each individual country. Since not every individual country exhibited GMV in mathematics ability, they concluded that this variability in relative standard deviations debunks the greater male variability hypothesis altogether. This supposed debunking of GMV, they then argued (committing the causation and correlation error), implied that those sex differences were therefore an artefact of "a complex variety of sociocultural factors rather than intrinsic differences," i.e., cultural factors as opposed to "innate, biologically determined differences between the sexes."
These same basic GMV errors are being widely propagated in both research and popular-science journals today, including national news outlets. Science, the peer-reviewed academic journal of the American Association for the Advancement of Science and one of the world's top academic journals, reported on this research in an article titled "It Doesn't Add Up." It, too, claimed that this "cross-cultural analysis" of GMV in mathematics ability in various nations "seems to rule out several causal candidates, including ... innate variability among boys" and that this points to "local social factors as the likely primary culprit. Gender gaps vary from place to place, showing that cultural factors swamp biological ones."
Scientific American referred to the same study and concluded, "Now that the greater male variability hypothesis has fallen short, nature is not looking as important as scientists once thought." Biology, the author added, "may play only a minor role in the math gender gap." As NBC News reported it, this GMV research showed that "the 'Gender math gap' is cultural, not biological."
Similarly, a research article in the Proceedings of the National Academy of Sciences reported that, with respect to mathematics, GMV "is largely an artifact of changeable sociocultural factors, not immutable, innate biological differences between the sexes." The science website PhysOrg, covering similar published GMV research conducted at the Australian National University, stated that "Deeply entrenched scientific beliefs that for more than a century have explained why more men than women are high achievers because of biology are not backed up by evidence." The headline of that article? "Sexist 'Sexplanation' for Men's Brilliance Debunked."
It is exactly these misconceptions about GMV that have continued to make it inflammatory. For suggesting in 2005 that GMV might be a contributing factor to the under-representation of women in physics and mathematics departments at top universities, Harvard president Larry Summers was forced to resign the following year, even though he had simply, and correctly, observed that the standard deviation of men in several traits, including mathematics ability, is larger than the standard deviation of women. A similar viewpoint about GMV in high-tech fields led to the firing of Google engineer James Damore in 2017.
In a 2017 essay for Quillette, I detailed the harassment my colleagues and I experienced after we suggested a possible mathematical theory for GMV. In response, an official blog of the American Mathematical Society, of which I have been a member for fifty years, called my evolutionary theory for GMV "sexist," and fraught with "bad math" and "full-blown gloves-off misogyny."
To help set the scientific record straight and put a stop to the propagation of these errors, cognitive psychologist Rosalind Arden and I recently wrote to the Notices and offered to help identify and correct the basic errors about GMV that were being repeated. But after the Notices' editor-in-chief consulted her editorial board several times, she decided not to publish any conceivable version of the paper (shorter, gentler, whatever), even as an editor's note alerting readers to the errors. She declined to provide any reasons or explanation for the blanket rejection, and suggested that we publish it elsewhere, which we did just last year.
Bell Curves and Extreme Tails
As we have seen, whether or not a data set exhibits GMV or GFV implies absolutely nothing about the tail ratio of men to women above (or below) a given cutoff. Tail ratios are among the most common and important statistics used to compare extreme values between two data sets. In Hooven's analysis, for example, she reports that about two percent of chess Grandmasters are women, so the male-to-female tail ratio among chess Grandmasters is roughly 50:1.
There is much less interest in how the average chess players or mathematicians of each sex compare; the emphasis is usually on comparisons between top performers. However, comparisons of only the standard deviations of the two data sets, i.e., GMV or GFV, imply absolutely nothing about the extreme values and, in particular, the tail ratios. In fact, GMV alone could easily explain the persistent sex gap in chess and mathematics if those abilities are normally distributed.
In common explanations of GMV and its implications for disproportionate levels of representation in various realms, authors often include graphs of two bell-shaped curves with different degrees of spread. These graphs are invariably of what is called the normal (or Gaussian) distribution, after mathematician Carl Friedrich Gauss, who discovered it in 1809 during his experiments with celestial observations.
The normal distribution is by far the most prevalent distribution in biometric studies, and like Darwin's GMV hypothesis, Gauss's normal distribution bell curve has also been the epicentre of heated controversies, especially with regard to biological applications. When political scientist Charles Murray, one of the authors of The Bell Curve, was invited to speak at Middlebury College in 2017, student protests not only disrupted his speech but also injured a faculty member, drawing national media attention.
One of the reasons for this prevalence of the normal bell curve in real-life data is that it is the consequence of one of the most powerful and beautiful theorems in statistics, the Central Limit Theorem. Roughly stated, if you repeat an experiment independently many times, and the outcomes have finite variance, then the distribution of the suitably rescaled sample average converges to the normal distribution bell curve. Not to some other shape, nor even to one of the other bell curve distributions.
An ingenious simple experiment devised by British polymath Francis Galton in the late nineteenth century, and still popular in science museums and classrooms today, yields a fascinating visual demonstration of this convergence to the normal distribution bell curve. Many human traits have been found to be essentially normally distributed, including blood pressure, birth weight, and Scholastic Aptitude Test (SAT) scores in mathematics.
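Galton's device, in which balls fall through rows of pegs and bounce left or right at each one, is easy to imitate in code. The following is a rough simulation sketch: the number of rows, the number of balls, the random seed, and the text histogram are all arbitrary choices made for illustration.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

ROWS = 12        # rows of pegs; an arbitrary choice for this sketch
BALLS = 20_000   # number of balls dropped

# Each ball bounces left (0) or right (1) at every row; its final bin
# is the number of rightward bounces.
bins = Counter(
    sum(random.randint(0, 1) for _ in range(ROWS))
    for _ in range(BALLS)
)

# Crude text histogram: one '#' per 100 balls in each bin.
for k in range(ROWS + 1):
    bar = "#" * (bins[k] // 100)
    print(f"{k:2d} {bar}")
```

Each ball's final position is a sum of many independent left-or-right bounces, so the pile that builds up is tallest in the middle bins and thins out symmetrically toward the edges, which is the bell shape the Central Limit Theorem predicts.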
Normal distributions have a special property concerning tail ratios. Georgia Tech physicist Ron Fox and I recently discovered an extreme tails property for normal distributions that is apparently not well-known. In every collection of different normal distributions, there will always be exactly one of those distributions that is not only over-represented in the right tail but also completely overwhelms all of the other distributions in the rightmost tails. Additionally, exactly one distribution (possibly the same one) will overwhelm all the others in the lowest values. Of course, at what point the sex gap first appears is dependent on the parameters of the corresponding bell curves.
If the normal distribution of traits in one sex has a higher standard deviation than that in the other, that sex will completely overwhelm both the upper and lower tails. For example, since SAT scores in mathematics are distributed normally, and males have a larger standard deviation, they dominate both the high and low ends of the curve. Whether or not chess ability is also normally distributed is not known, since there is only data for the highest level of players. This may soon change, since Armenia recently required every pupil in the country between the ages of six and eight to learn chess in school, and it is quite possible that boys are also over-represented among the worst players, just as they are in mathematics.
If both sex distributions are normal and have exactly the same standard deviation, on the other hand, then the one with the higher mean will completely overwhelm the upper tail, and the one with the lower mean will completely overwhelm the lower tail. This extreme tails property, although not unique to normal distributions, is not shared by other common distributions, even by other common bell-shaped distributions.
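The contrast drawn here can be illustrated numerically. In the sketch below, the Cauchy distribution is my own choice of a bell-shaped comparison (the text does not name one), and all parameters are hypothetical. For two normal curves with equal spread, the tail ratio keeps growing as the cutoff rises; for two Cauchy curves with different spreads, it levels off near the ratio of their scale parameters.

```python
import math


def normal_tail(x, mu, sigma):
    """P(X > x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))


def cauchy_tail(x, loc, scale):
    """P(X > x) for a Cauchy distribution (bell-shaped, but heavy-tailed)."""
    return 0.5 - math.atan((x - loc) / scale) / math.pi


# Hypothetical parameters: same spread, group A has the slightly higher mean.
for x in (1, 2, 4, 8):
    r = normal_tail(x, 0.1, 1.0) / normal_tail(x, 0.0, 1.0)
    print(f"normal, cutoff {x}: A/B tail ratio = {r:.2f}")

# Cauchy with the same centre, but group A twice as spread out as group B.
for x in (10, 100, 1000, 10000):
    r = cauchy_tail(x, 0.0, 2.0) / cauchy_tail(x, 0.0, 1.0)
    print(f"Cauchy, cutoff {x}: A/B tail ratio = {r:.3f}")
```

In the normal case even a small difference in means produces a ratio that rises without bound, whereas in the Cauchy case the more variable group never does better than about two to one, however far out the cutoff is placed.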
A concrete example may help to clarify this extreme tail property for normal distributions. It has been reported that the blood pressures of humans are essentially normally distributed, and that males have an average value of 124.7 and a standard deviation of 14.5, while females have an average of 125.7 and a standard deviation of 10.4. A standard statistical calculation then yields that there are a few more males than females with blood pressure of 130 or higher, i.e. the M/F tail ratio at 130 is slightly greater than 1:1. The M/F tail ratio above 140 is about 2:1, at 150 about 5:1, and 160 about 20:1. So, even though men have, on average, lower blood pressures than women, men overwhelm women at the high values, and the tail ratio gets worse the higher one goes. An example of this in the mathematical context can be seen by looking at the SAT scores in mathematics where the M/F tail ratios are increasing, both as the scores get higher and as they get lower.
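The "standard statistical calculation" referred to above can be sketched as follows. The means and standard deviations are the ones quoted in the text; the normal model, the assumption of equal numbers of men and women, and the helper names `normal_tail` and `mf_tail_ratio` are assumptions of this sketch.

```python
import math

# Figures quoted in the text: (mean, standard deviation) for each sex.
MALE = (124.7, 14.5)
FEMALE = (125.7, 10.4)


def normal_tail(x, mean, sd):
    """P(X >= x) under a normal model."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


def mf_tail_ratio(cutoff):
    """Male-to-female ratio at or above `cutoff`, assuming equal numbers
    of men and women (an assumption of this sketch)."""
    return normal_tail(cutoff, *MALE) / normal_tail(cutoff, *FEMALE)


for cutoff in (130, 140, 150, 160):
    print(f"{cutoff}: M/F tail ratio = {mf_tail_ratio(cutoff):.1f}")
```

Under these assumptions the ratio starts just above 1 at a cutoff of 130 and climbs steeply with each higher cutoff. The exact values depend on the rounding of the quoted means and standard deviations, so they need not match the rounded ratios given in the text digit for digit.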
Summarising the key GMV and bell-curve points here, there are three important takeaways:
Without any additional information about the underlying distributions of the trait and species under study, whether or not there is GMV or GFV says absolutely nothing about the causes of any sex differences in that trait. Nothing about biology or culture or anything else.
Without additional information about the underlying distributions, neither GMV nor GFV says anything about the comparative extreme values. There could be greater male variability while females completely dominate the top end, and vice versa.
If two different sex distributions are normally distributed, then one of those two sexes will completely overwhelm the other at the high end of values, and another, possibly the same, will overwhelm at the low values.
Harvard president Summers and Google engineer Damore might therefore both have avoided defenestration had they simply announced that mathematics and engineering abilities essentially follow a normal distribution, and left the identification of the dominant sexes in each case to the bureaucratic bean-counters.
Charles Darwin might have been intrigued had he foreseen that, a century and a half later, his observation about generally greater male variability throughout the animal kingdom, including humans, would still be extensively studied and cited. But he might also have been bewildered to learn that his own biological hypothesis would be incorrectly invoked to argue that the over-representation of males in many fields like chess and mathematics is primarily not due to biological causes.