There’s currently a lot of anxious chat about ChatGPT-4 here in the Academy. Some professors worry that it’s about to take their jobs (a development that might lead to more interesting lectures). Others are breathlessly predicting the annihilation of humanity when AI spontaneously morphs into something malevolent and uncontrollable. Mostly, however, professors are worried that students will get Chat to do their homework, and many of them are genuinely unsure what to do about it.
But these concerns tend to misunderstand how Chat works and take the “Intelligence” part of “Artificial Intelligence” too literally. The reassuring truth is that Chat isn’t really that smart. To clear things up, I asked ChatGPT-4 to give me a high-level explanation of how it works. It did a pretty good job, but it left some important stuff out, which I’ll fill in.
ChatGPT-4 is a Generative Pre-trained Transformer model built on a type of neural-network architecture designed specifically for language processing. Initially, it went through a “self-supervised learning” phase, during which it was fed text data from a lot of sources (the web, books, news articles, social media posts, code snippets, and more). As it trained, it constructed complex probability models that enabled it to calculate the likelihood of specific words or patterns of words appearing together. From these models, Chat derived algorithms (mathematical heuristics) that select a word or sequence of words given the other words already present in a string. The process is analogous to two lovers who can finish each other’s sentences by predicting what the other will say based on what they’ve already said (but without the love part).
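To make the sentence-finishing idea concrete, here’s a minimal sketch (in Python, at toy scale, and entirely my own illustration): count which words follow which in a tiny made-up corpus, then predict the most likely next word. GPT-4 does something far more elaborate, with subword tokens, a transformer, and on the order of a trillion parameters rather than a word-count table, but the underlying game is the same.

```python
from collections import Counter, defaultdict

# A tiny "training corpus" (purely illustrative).
corpus = (
    "the lobster grinds its food . "
    "the lobster has a gastric mill . "
    "the professor grades the homework ."
).split()

# Count how often each word follows each other word (a bigram model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the statistically most likely next word, given one word of context."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))      # -> "lobster" (it followed "the" twice; "professor" only once)
print(predict_next("lobster"))  # -> "grinds" or "has" (a tie; Counter returns the first seen)
```

That, in spirit, is all the “prediction” amounts to: pick the continuation that the training text makes most probable.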
During its initial training, GPT-4 acquired a massive amount of information: raw data from material already on the Internet or from information fed to it by its developers. That’s what makes it seem so smart. It reportedly has access to about one petabyte (1,024 terabytes) of data (about 22 times more than GPT-3) and uses an estimated 1.8 trillion computational parameters (10 times more than GPT-3). To draw a rough comparison, a petabyte of plain text printed out would run to roughly 500 billion pages.
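The page comparison is just back-of-the-envelope arithmetic. If you assume roughly 2,000 characters of plain text per printed page (my assumption, not OpenAI’s), the numbers work out like this:

```python
# Back-of-the-envelope arithmetic for the petabyte comparison.
bytes_per_page = 2_000                  # assumed: ~2,000 characters of plain text per page
petabyte_in_bytes = 1_024 ** 5          # 1 PB = 1,024 TB = 1,024^5 bytes

pages = petabyte_in_bytes / bytes_per_page
print(f"{pages:,.0f} pages")            # about 563 billion, i.e., on the order of 500 billion pages
```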
But here’s the important part that many people don’t know. After its initial training, Chat was “fine-tuned” with what’s called “supervised” training. This means that developers and programmers (that is, real people), along with some other AI programs, refined Chat’s responses so that they’d meet “human alignment and policy compliance” standards. Developers continue to monitor Chat’s behavior (a bit like helicopter parents) and reprimand it when it gets out of line (so to speak) to ensure that it doesn’t violate company standards by using “disallowed” speech or making stuff up. Apparently, all of this parenting has paid off (from the developers’ point of view). GPT-4 has been much better behaved than its predecessor, GPT-3. Its “trigger rate” for disallowed speech is only about 20 percent of GPT-3’s, and it makes fewer mistakes.
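To take some of the mystery out of “fine-tuning,” here’s a minimal sketch of the underlying idea, assuming a toy model that merely scores three canned replies to a single prompt. The replies, the numbers, and the learning rate are all invented, and the real process involves human-written demonstrations, reward models, and vastly more machinery; the point is only that labeled human feedback nudges the model’s probabilities.

```python
import numpy as np

# Toy "supervised fine-tuning": for one prompt, a pretrained model assigns
# scores (logits) to three candidate replies, and a human-labeled example
# nudges those scores toward the preferred reply. All numbers are invented.
replies = ["helpful answer", "made-up citation", "disallowed content"]
logits = np.array([1.0, 1.5, 0.8])      # the untuned model slightly prefers the made-up citation

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

preferred = 0                           # the human reviewer's label: the helpful answer
for _ in range(50):                     # a few gradient-descent steps on cross-entropy loss
    probs = softmax(logits)
    grad = probs.copy()
    grad[preferred] -= 1.0              # gradient of -log(p[preferred]) with respect to the logits
    logits -= 0.2 * grad                # update: raise the preferred score, lower the others

print(softmax(logits).round(3))         # most of the probability now sits on the preferred reply
```

Scale that idea up to enormous numbers of labeled examples and you have, roughly, what all the helicopter parenting amounts to.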
So, from the outset, Chat and other AI systems are shaped by the peculiarities of a select group of people and their idiosyncratic, subjective points of view, assumptions, biases, and prejudices. Consequently, and contrary to what many people think, AI systems like Chat are not “objective” thinking machines any more than you or I are. They’re not even really thinking. They’re manipulating, as directed, bits of information that people have chosen for them (either explicitly or implicitly, through the structure of the computer programs themselves).
When you ask Chat a question, it reaches back into its database of information and arranges a set of symbols in a way that is statistically congruent with the text on which it’s been trained and that aligns with the parent company’s rules for good behavior. This constrains what Chat can do. Interestingly, Chat “knows” these limitations. In its own words:
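Putting the two constraints together, a single response looks, in caricature, something like this: sample the statistically likely next word, and let a policy check veto anything objectionable. The vocabulary, the probabilities, and the blocked list below are all invented for illustration; the real policy layer is far more sophisticated (and partly learned), but the division of labor is broadly similar.

```python
import random

# A toy generation loop: pick each next word according to learned probabilities,
# but let a policy check veto anything on a blocked list. The vocabulary,
# the probabilities, and the blocked list are all invented.
next_word_probs = {
    "the":     {"lobster": 0.6, "secret": 0.4},
    "lobster": {"grinds": 1.0},
    "secret":  {"recipe": 1.0},
}
disallowed = {"secret"}                 # stand-in for a content-policy rule

def generate(start, max_words=5):
    words = [start]
    while len(words) < max_words and words[-1] in next_word_probs:
        options = next_word_probs[words[-1]]
        choice = random.choices(list(options), weights=list(options.values()))[0]
        if choice in disallowed:        # the policy layer overrides the statistics
            break
        words.append(choice)
    return " ".join(words)

print(generate("the"))                  # e.g., "the lobster grinds"; never "the secret recipe"
```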
It’s important to note that, while ChatGPT can generate impressive human-like text, it may sometimes produce incorrect or nonsensical answers. This is because its knowledge is based on the data it was trained on and it lacks the ability to reason like a human. Additionally, it may also inherit biases present in the training data.
That is why AI isn’t really “intelligent,” and why it isn’t ever going to develop a superhuman intelligence—“Artificial General Intelligence”—that allows it to take over the world and destroy humankind.
Math and lobsters
As scientist Jobst Landgrebe and philosopher Barry Smith have argued in their recent book Why Machines Will Never Rule the World, AI systems like ChatGPT won’t ever develop human-like intelligence due to the limitations of their foundational mathematical modeling. Although we can accurately model small (often highly abstracted) real-world phenomena, we simply don’t have the computational ability to model large natural systems, like intelligence, with current mathematics.
Simply put, if you can’t adequately model a phenomenon mathematically, you can’t duplicate it with AI. Full Stop. And the reason we can’t adequately model human intelligence is that the underlying neural networks are unpredictably complex. Much of the current panic about an imminent AI apocalypse results from a failure to appreciate this fact.
A very simple example can be used to illustrate this point. Crustaceans like lobsters have a specialized muscular stomach called a gastric mill (or “gizzard”) containing a set of bony plates. Rhythmic contractions of the gizzard’s muscles rub the plates together and grind up the lobster’s food, and these rhythms are controlled by just 11 neurons (nerve cells). Scientists who study these “pattern-generating” neural networks know everything about the anatomy of each neuron, the neurotransmitters they use, and exactly how they’re all hooked up.
But in spite of all this knowledge, scientists still don’t know exactly how the lobster’s system works. One scientist describes it as “a fascinating subject of research inspiring generations of neurophysiologists as well as computational neuroscientists” that is “still not sufficiently understood.” The coordinated activity of just 11 neurons poses such a complex computational problem that generations of scientists still haven’t completely figured it out. Now think about the computational impossibility of modeling the neural network interactions among the 86 billion neurons in your brain when you’re doing something intelligent.
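To get a feel for the scale of the problem, here’s some deliberately crude arithmetic. The simplification of treating each neuron as simply “on” or “off” is mine; real neurons have continuously varying voltages, chemistry, and timing.

```python
import math

# Deliberately crude arithmetic: treat each neuron as merely "on" or "off"
# at a given instant, ignoring voltages, chemistry, and timing entirely.
def log10_joint_states(n_neurons):
    return n_neurons * math.log10(2)       # log10 of 2**n possible on/off patterns

print(2 ** 11)                             # 2,048 joint on/off states for the lobster's 11 neurons
print(log10_joint_states(86_000_000_000))  # ~2.6e10: a count with roughly 26 billion digits
```

And that’s just a static head count of on/off patterns. The hard part, as the lobster work shows, is the dynamics: how the system moves through those states over time.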
Further, it’s not just the sheer number of brain cells, or the fact that each cell can have as many as 10,000 connections to other cells, that creates the difficulty. There are several other roadblocks to modeling brain activity. The first is that neural networks change over time, and their activity patterns are influenced by physiological context (i.e., the moment-to-moment stuff that’s going on in the outside world and inside your body).
Second, historically, we’ve used neural activity to measure what brains do. However, recent research indicates that large numbers of neurons are inactive during normal brain activity, some don’t become active until they are needed, and some are “hidden” and only activate irregularly. Further, neurons can “change their connections and vary their signaling properties according to a variety of rules.” We simply don’t have the ability or the information necessary to model these kinds of hypercomplex systems. The problem is analogous to the difficulties we have with long-term weather forecasting. And if you can’t mathematically model what the brain is doing, you can’t build an artificial system that duplicates its activity.
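Here’s a toy illustration of that weather analogy, using the Lorenz system, a famously simplified model of atmospheric convection (my choice of example): run it twice from starting points that differ by one part in a billion, and after a while the two “forecasts” have nothing in common.

```python
# The Lorenz system, a famously simplified model of atmospheric convection,
# run twice from nearly identical starting points using simple Euler steps.
def lorenz_state(x, y, z, steps=10_000, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    for _ in range(steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
    return x, y, z

print(lorenz_state(1.0, 1.0, 1.0))          # forecast from one starting point
print(lorenz_state(1.0 + 1e-9, 1.0, 1.0))   # same model, starting point off by one part in a billion
# After 50 simulated time units the two "forecasts" bear no resemblance to each other.
```

A brain is incomparably more complicated than three coupled equations, so its forecasting problem is correspondingly worse.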
The third roadblock to modeling human intelligence is related to a point I have made elsewhere. For example, let’s say I want to compare the neural network activity responsible for different self-perceptions between two people—a conservative and a liberal, for instance. How would I go about doing that?
Well, first I’d have to pick a time when my first volunteer was unequivocally identifying as a member of one of the two groups (and, of course, I would have to take their word for it). Then I would have to remove their brain and drop it into liquid nitrogen to freeze the position of all of the molecules. Next, I’d have to map the location and configuration of those molecules, a step that would require technologies many orders of magnitude more precise and sophisticated than anything we currently have. But now I’m stuck. Cognitive processes like self-identity are not represented by the momentary state of a neural system. They are, by definition, dynamic. So, I would need to repeat the process over time to characterize the network’s dynamic properties. Unfortunately, my volunteer would be dead.
Of course, I could try to do this with a PET or CT scan or an fMRI. But the temporal and spatial resolutions of those techniques are far too crude to map complex cognitive processes. And, even if they were sufficiently precise, correlation is not causation. I couldn’t know for sure if I was mapping the cause or the effect of the volunteer’s self-identity. Maybe, theoretically, I could stick electrodes into some of the volunteer’s brain cells, record their activity (as is done with animal models), stain the cells, slice up the brain, and identify the cells microscopically. But I’d need to do this hundreds of times with different volunteers to characterize the network as a whole. I don’t think anyone would volunteer for that particular experiment. And, given the difficulty of figuring out the lobster’s gastric mill, I’m not sure I’d ever be successful.
So if it’s both conceptually and technically impossible to capture the dynamic brain states that represent something as apparently simple as your current self-identity, hopefully you can see why it’s impossible to capture and model something as complicated as human intelligence. And as I said, if you can’t model it mathematically, you can’t build a machine to duplicate it.
What’s that “intelligence” thing that Chat doesn’t have?
One of the reasons people have difficulty defining “intelligence” is that it’s neither a unitary thing (like a toaster or a chair) nor a constellation of discrete intelligence modules (e.g., spatial intelligence, linguistic intelligence, musical intelligence, etc.). Intelligence refers to the distributed central nervous system processes that allow people (and some other animals) to spontaneously come up with unique solutions to unexpected problems.
These processes may be quite constrained and only work in very limited circumstances, like the learning ability of insects. Or they may be broad and generalizable over a wide range of circumstances and domains, like our problem-solving ability. In all cases, however, intelligence implies more than re-combining bits of memorized information or regurgitating predetermined outputs in response to specific prompts, irrespective of how complex those outputs may be.
Perhaps more importantly, human (and animal) intelligence only develops in relationship to the environment in which the organism is embedded. In other words, it has a developmental and epistemological history. We know that from decades of animal research and from the tragic cases of children who have grown up in severely isolated or feral environments. Conversely, well-developed cognitive processes deteriorate if people are placed in a chamber that deprives them of environmental (sensory) inputs.
Consequently, human-like intelligence cannot be created de novo (from scratch). It requires a history. Further, every person’s intelligence is a product of their unique brain developing over their lifespan in their unique set of circumstances. In other words, each of us carries a unique constellation of experiential baggage that shapes how we think. In addition, we carry a lot of evolutionary baggage that has shaped the neural structures of our individually unique brains.
An everyday example of how human intelligence is different from that of AI is driving a car. Every moment-to-moment situation that you face while driving is unique for scores of reasons: the condition of the road; the distribution and movement of pedestrians, bicyclists, and automobiles; the degree to which you are distracted, attentive, hungry, or tired; the weather; and so on. Yet you manage. In the vast majority of cases, you are able to make it home in one piece. Completely autonomous, self-driving cars, on the other hand, don’t do well in these complex environments, even when faced with unexpected situations that would be trivial for a human to navigate.
About 10 years ago, people in the industry predicted that we’d all be in autonomous, self-driving cars by now. Well, now some AI experts believe that will never happen unless we create a completely closed road system and ban all human drivers, pedestrians, and bicyclists because they create unpredictable movement patterns that simply can’t be adequately modeled. Consequently, we’ll never be able to develop an AI system that safely operates autonomous cars in the chaotic environments in which you and I drive every day. Again, this is evidence of the limits of AI and a testament to the unique information processing and problem-solving abilities of the human brain.
Let me give you one more example of the difference between human and machine problem-solving abilities. Think of those captchas you have to solve when you log in to a website. You know, those squiggly letters and numbers that you have to identify, or that set of nine pictures from which you have to pick the few that have a motorcycle in them. A first-grader has no problem solving these. A computer bot can’t. That simple ability is enough to distinguish you from a machine.
So, as in the driving example, what ChatGPT-4 (and other AI programs) can’t do is move beyond their training data in any meaningful way. GPT-4’s ability to recombine its stored data bits is remarkable, to be sure. But it’s only remarkable within its narrow, closed environment where every output is derived by combining some subset of inputs. That’s why AI is good at playing chess or Go, but if you change one of the game’s rules, the program becomes useless.
And as I’ve said, even within its closed world, GPT-4 is neither omniscient nor infallible. When I asked it some straightforward scientific questions, it got a lot of them wrong. For instance, when I asked, “Who were the first scientists to identify and synthesize a praying mantis pheromone?” it couldn’t tell me. (It was me and a couple of my colleagues. Our paper was published in 2004.) When I asked it who wrote the article “First Identification of a Putative Sex Pheromone in a Praying Mantid,” it got the name of the journal correct (Chemical Ecology), but said it was published in 2018 and made up some fictitious authors.
Oh well. At least it was able to provide me with the best recipe for an Old Fashioned cocktail—after which it admonished, “Enjoy your Old Fashioned responsibly!” Thanks, Chat.
CORRECTION: An earlier version of this article incorrectly stated that ChatGPT-4 was trained using Masked Language Modeling (MLM). Apologies for the error.