Thinking Slow: Artificial Reasoning

The divide between human and machine intelligence is narrowing faster than we can invent stories to assert our supremacy. Each time we draw a line in the sand, machines leap over it. Remember when we thought creativity was uniquely human? 2020 was a simpler time.

The current line in the sand is reasoning. Skeptics assert that models like GPT-4 do not “understand” what they are saying. The skeptics are mostly correct. Sebastien Bubeck illustrates this point near the end of his “Sparks of AGI: early experiments with GPT-4” talk. The model struggles to provide reliable explanations for its responses, at one point claiming an incorrect calculation was “a typo.”

Let’s set aside the obvious problem of using arithmetic as a proxy for reasoning. GPT-4 was trained on text. That text had numerical patterns, but we should not be surprised when language models struggle with arithmetic. It would be like sending a student to nothing but history classes and then wondering why they suck at math.

The more important question is whether human-level reasoning is within reach of machines. I believe the answer is “yes,” and the implications are slightly terrifying.

One Step at a Time: Human Reasoning

Human reasoning has been an active area of research for hundreds of years. It began with philosophers exploring its origins and meaning. It continued with psychologists documenting the various systems and mechanisms. Today, neuroscientists study circuitry and chemistry to unravel the mysteries of the mind.

One of the best-known books on human reasoning is “Thinking, Fast and Slow” by Daniel Kahneman. The book is based on his Nobel Prize-winning research with Amos Tversky and describes two modes of thought:

  • System 1: fast, automatic, frequent, emotional, stereotypic, and unconscious

  • System 2: slow, effortful, infrequent, logical, calculating, and conscious

System 1 and System 2 are helpful for understanding and predicting human behavior. However, I have a problem with any model that distills complex systems into discrete categories. The same simplicity that makes Kahneman and Tversky’s model practical also obscures essential nuances.

System 2 implies that the mechanisms through which I multiply 37 by 42 and write this article are similar. The problem is that we know those are two different pathways in the brain. How do we reconcile these findings without creating a complicated model that is no longer useful?

The brain is a powerful pattern-matching engine. We receive data through our senses and identify patterns that help us survive. Some patterns are simple. If I show you a picture of a circle, you can quickly name the shape. Other patterns are complex. If I ask you to calculate the circle's circumference, you will need a minute.

We encounter curved shapes so frequently that a model is wired into our visual cortex. Naming the shape is simply a matter of connecting the sensory input to a few other parts of your brain, like Broca’s area, which is involved in producing the word “circle.”

The second task, calculating the circumference, also requires connections between multiple areas of the brain, but there are more steps:

  1. Identify the shape as a circle (same as the first task)

  2. Connect the concept of a “circle” to the concept of “circumference”

  3. Connect “circumference” to symbols representing the mathematical formula for calculating a circumference

  4. Reconstruct from memory the first few digits of “pi”

  5. Estimate the circle’s radius based on its relationship to a known object or a reference object (e.g., a ruler)

  6. Calculate the circumference based on known patterns for multiplying numbers (this is multiple steps, but I am collapsing it into a single step for brevity)

Calculating the circumference requires several steps, but each step is a simple pattern-matching exercise. If I asked you to do each step in isolation, you could probably do most of them instinctively.
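
To make the chain concrete, here is a minimal sketch of those steps as code. The function names, the ruler-based radius estimate, and the numbers are mine, chosen purely for illustration; the point is that each function is a small, isolated pattern match, and the chain only works because intermediate results are held somewhere between steps.

    import math

    # A sketch of the stepwise chain above. Each function stands in for one
    # simple pattern-matching step; the names and numbers are illustrative.

    def identify_shape(image):
        # Step 1: match the sensory input to a known shape.
        return "circle"

    def recall_formula(shape):
        # Steps 2-4: connect "circle" to "circumference", then to the formula,
        # with pi pulled from memory.
        assert shape == "circle"
        return lambda radius: 2 * math.pi * radius

    def estimate_radius(image, reference_length_cm=30.0):
        # Step 5: estimate the radius against a known reference (e.g., a ruler).
        return reference_length_cm / 2

    def circumference(image):
        # Step 6: run the chain, holding each intermediate result in "memory".
        shape = identify_shape(image)
        formula = recall_formula(shape)
        radius = estimate_radius(image)
        return formula(radius)

    print(round(circumference(image=None), 1))  # about 94.2 cm for a 15 cm radius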

I see little evidence of a difference between System 1 and System 2 thinking. Identifying multiple patterns in succession requires short-term memory to store intermediate results, but that should be true for all but the most basic tasks (e.g., reflexes that do not engage the cortex). Perhaps what makes System 2 thinking “conscious” is its heavy reliance on short-term memory that is easily accessed.

Collapsing Systems 1 and 2 into a single, continuous system maintains the model's simplicity while allowing for nuance. For example, what if I ask you to estimate rather than calculate the circumference? In that case, the chain becomes shorter. Steps 1–3 remain the same. However, steps 4–6 are replaced by a single step similar to step 5 (visually estimate the circumference rather than the radius).

The model I’m describing is consistent with Kahneman and Tversky’s observations. It allows for biases and heuristics but does not ascribe them to one mode of thinking or another. For example, anchoring bias could be the conflation of priming data stored in short-term memory with intermediate results stored between processing steps.

I refer to this as the “stepwise” model for reasoning. If you believe the stepwise model accurately represents how we reason, the emergent behavior of AI begins to make sense.

All Steps at Once: AI Reasoning

When you type a question into ChatGPT or another large language model, the AI attempts to predict the sequence of tokens (roughly, word fragments) most likely to satisfy your requirements for a “correct” answer. You provide the input, and the machine runs billions of calculations to predict the output.
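
As a rough sketch of what that prediction looks like, consider the toy loop below. The lookup table is obviously an invention for illustration; in a real model it is replaced by billions of learned parameters. The shape of the loop is the point: one pass per new token, appended to the context, with no explicit chain of intermediate reasoning steps.

    # Toy sketch of autoregressive prediction. A real model replaces the lookup
    # table with billions of learned parameters, but the loop is the same:
    # one pass per new token, and no explicit chain of intermediate steps.

    toy_model = {"circle?": "It", "It": "is", "is": "2*pi*r."}

    def predict_next_token(tokens):
        # Stand-in for the forward pass. A real model attends to the entire
        # context; this toy only looks at the last token.
        return toy_model.get(tokens[-1])

    def generate(prompt, max_tokens=10):
        tokens = prompt.split()
        for _ in range(max_tokens):
            next_token = predict_next_token(tokens)
            if next_token is None:
                break
            tokens.append(next_token)
        return " ".join(tokens)

    print(generate("What is the circumference of a circle?"))
    # -> What is the circumference of a circle? It is 2*pi*r.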

AI models do not break complex questions into a series of more straightforward pattern-matching tasks. They attempt to generate responses in a single step. We know that reducing steps produces less accurate results in humans. Estimating the circumference of a circle is different from calculating it.

Attempting to solve complex problems in a single step is incredibly difficult. To work, pathways through the neural network must connect all concepts required for the prediction. Even changing “circumference” to “area” should invoke different pathways.

That said, just because an AI doesn’t use a stepwise model doesn’t mean it can’t. If you force an AI to break a problem into steps, it will tackle each one independently. This is the idea behind AutoGPT. There is nothing novel about AutoGPT. It is a platform that forces GPT-4 (the same model powering ChatGPT) to break complex problems into steps.

This might be the point where you say, “Yes, but humans must specify the steps required.” And my response is that specifying steps is simply another pattern-matching task. If you want evidence, ask ChatGPT to generate a series of steps for a complex problem (e.g., how to buy a car).

There is nothing magic about AutoGPT. All it does is force OpenAI’s models to engage in stepwise reasoning. It would be like asking you to calculate the circumference of a circle but adding, “Don’t estimate.”
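
For the curious, the loop these tools run looks roughly like the sketch below. The llm() helper and the prompts are hypothetical placeholders for whatever model and wording you use; this is the shape of the idea, not AutoGPT’s actual code.

    # Sketch of forced stepwise reasoning in the spirit of AutoGPT. `llm` is a
    # hypothetical helper that sends a prompt to a language model and returns
    # its text response; swap in a real client before running.

    def llm(prompt: str) -> str:
        raise NotImplementedError("connect this to the model of your choice")

    def solve_stepwise(problem: str) -> str:
        # 1. Ask the model to plan. Specifying steps is itself pattern matching.
        plan = llm(f"List the steps needed to solve: {problem}. One step per line.")
        steps = [line.strip() for line in plan.splitlines() if line.strip()]

        # 2. Tackle each step independently, carrying intermediate results forward
        #    the way short-term memory carries them between human reasoning steps.
        notes = ""
        for step in steps:
            notes += llm(
                f"Problem: {problem}\nProgress so far:\n{notes}\nDo this step: {step}"
            ) + "\n"

        # 3. Assemble a final answer from the intermediate results.
        return llm(
            f"Problem: {problem}\nUsing these results:\n{notes}\nGive the final answer."
        )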

Thinking Slow: The Context Penalty

The thing I found most surprising about GPT-4 the first time I used it was not the accuracy. I expected the model to be more capable than GPT-3. What surprised me was the speed, or lack thereof. I have not run my own tests, but I estimate the new model is at least two times slower than the old one.

This may be an infrastructure issue, as some have speculated. However, there is another explanation that seems more logical. OpenAI showed that GPT-4 is multimodal. That means it processes both images and text. While there are certainly overlaps between the two capabilities (e.g., letters are nothing more than small images), I imagine the pathways in GPT-4 are far more complex than in GPT-3.

OpenAI has not announced the number of parameters in GPT-4. However, even if the number of parameters is the same as GPT-3 (175 billion), the pathways would be inherently more complicated. The word “circle” is no longer associated only with words like “ball” and “round.” It is also associated with images. That means the pathway to associate “circle” with “circumference” should be more complex than in a text-only model.

Quick disclaimer: I am aware that neural networks (biological and artificial) do not store information like a database. There is no neuron or node assigned to the word “circle.” That said, the impacts on complexity that I describe should be consistent across levels of abstraction.

My guess is that the speed decline from GPT-3 to GPT-4 has less to do with infrastructure and more to do with complexity. We see similar behavior in humans. Try describing a picture. Finding patterns of words that match patterns of visual information is challenging, partly because it requires you to activate circuits across multiple parts of the brain.

There is a computational penalty for context. The narrower the task, the better the performance. This is true for both humans and machines.

To be clear, I am not asserting that multimodal task performance is always slow. We know that deliberate practice can strengthen neural circuits across different parts of the human brain. My point is that these types of gains are not generalizable. When you build expertise, you are rewiring your brain for specific tasks.

Specialization can also improve machine performance. That is why writing a few lines of code for a simple task is more efficient than employing a large language model. It is also why machines reliably outperform humans on narrow tasks. Our biological neural networks are too complex to complete simple tasks quickly and efficiently.

The slow thinking described by Kahneman as “System 2” is not unique to humans. AI models exhibit similar behavior. The difference is that the “all at once” nature of AI reasoning means every task is run through the same complicated neural network. Humans have evolved to reinforce pathways used most frequently and weaken ones used rarely. ChatGPT can recite the first 30 digits of pi with ease. Can you?

Why I’m Slightly Terrified

This is a good time to reflect on how impressive it is that AI models can tackle complex problems without stepwise reasoning. Skeptics cite AI’s inability to explain its logic as a shortcoming. However, if you believe humans reason using a stepwise model, there is no logic for the AI to explain. The prediction is made in a single step.

Asking ChatGPT to explain its actions is like asking a human to explain the logic behind something instinctual (e.g., solve “2+2=”). The person may offer an explanation, but it does not reveal the process that produced the answer. They are simply constructing a plausible account after the fact, which is itself a separate reasoning exercise.

What if stepwise reasoning is an artifact of biological evolution? Humans have about 86 billion neurons. We are forced to break complex problems into steps and specialize where valuable. We cannot grow extra neurons every year or two.

Machines do not face the same limitations. Imagine you could write 100 lines of working code instinctively. What would be the practical reason for doing it in steps? The only answer I can think of is so you could explain it to a less intelligent being. We do this with human children. Are we becoming the children of AI?

I am increasingly convinced there is nothing unique about our ability to reason. Where will we draw the next line in the sand? Social and emotional reasoning? Sentience and consciousness? We experienced two AI winters. Hoping for a third hardly seems like a sound plan. Now is an excellent time to think slow…and fast.
