Artificial Intelligence (AI) is advancing at a dizzying pace, with new models seemingly appearing every week. From OpenAI’s latest releases to innovations from startups like DeepSeek in China, the race to build smarter and more capable AI is relentless. Among the most talked-about developments is the claim that modern AI models are not just parroting information but actually reasoning, thinking through problems the way humans do.
But is that true?
This question isn’t just academic. How we answer it shapes how we trust and use AI, from casual users relying on ChatGPT for homework help to governments deciding whether to depend on AI in critical decision-making. The answers vary widely, often split between hopeful believers and skeptical critics. Yet the best response lies somewhere in the middle.
Before debating whether AI truly reasons, we need to understand what “reasoning” actually is. The AI industry often describes reasoning as the ability to break a complex problem into smaller parts and solve them sequentially, arriving at a conclusion by chaining logical steps together, an approach known as chain-of-thought reasoning. For example, to solve a tricky math problem, a model might first identify the variables, then apply formulas step by step, just as a person might.
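To make that concrete, here is a rough sketch of how a chain-of-thought prompt differs from a direct one. The `ask_model` function is a hypothetical placeholder for whatever model API you might call, not a real library; only the shape of the two prompts matters.

```python
# A minimal sketch of direct vs. chain-of-thought prompting.
# `ask_model` is a hypothetical stand-in for a real model API call.

def ask_model(prompt: str) -> str:
    """Placeholder for a language-model call; returns a canned reply here."""
    return f"[model response to: {prompt[:40]}...]"

question = "A train leaves at 2:15 pm and arrives at 5:40 pm. How long is the trip?"

# Direct prompt: ask for the answer in one shot.
direct_prompt = f"{question}\nAnswer with just the duration."

# Chain-of-thought prompt: ask the model to spell out intermediate steps
# (write down the times, compute hours and minutes separately, then combine).
cot_prompt = (
    f"{question}\n"
    "Think step by step: first write down the departure and arrival times, "
    "then compute the difference in hours and minutes, then state the total."
)

print(ask_model(direct_prompt))
print(ask_model(cot_prompt))
```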
But reasoning is much broader than that. Human reasoning includes many forms: deductive and inductive inference, analogical and causal reasoning, moral judgment, and everyday common sense.
We don’t fully understand how reasoning works in human brains, and replicating all these types in AI remains a monumental challenge. Yet many claim today’s “reasoning” AI models are closer to humans than ever before because they can solve problems that require multiple steps of thought and even write error-free code.
Many AI researchers are deeply skeptical that today’s models truly reason in any meaningful way.
Shannon Vallor, a philosopher of technology at the University of Edinburgh, describes what she calls “meta-mimicry.” Instead of genuinely thinking, these AI systems imitate the process of reasoning as it appears in the vast troves of text they’ve been trained on. Put simply, they don’t think; they pretend to think by mimicking human language patterns associated with reasoning.
This view is reinforced by the observation that even the newest reasoning models, like OpenAI’s o1 and its successor o3, continue to fail embarrassingly on simple problems. For example, AI models have historically flubbed variations of the classic “wolf, goat, and cabbage” river-crossing puzzle in ways that no reasonable human would. If these models were truly reasoning, why would they get such basic problems wrong?
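For contrast, the puzzle itself yields to a few lines of explicit search. The sketch below is a standard textbook breadth-first search, written in Python purely for illustration and not drawn from any of these models: it enumerates river-bank states, discards the unsafe ones, and chains legal crossings until everyone is across, which is roughly what genuinely working through the steps looks like.

```python
from collections import deque

# Breadth-first search over states of the classic wolf-goat-cabbage puzzle.
# A state records which bank (0 = start, 1 = far side) each traveler is on,
# in the order: farmer, wolf, goat, cabbage.
START = (0, 0, 0, 0)
GOAL = (1, 1, 1, 1)

def is_safe(state):
    farmer, wolf, goat, cabbage = state
    # Unsafe if the wolf is left with the goat, or the goat with the cabbage,
    # on a bank the farmer is not on.
    if goat == wolf and farmer != goat:
        return False
    if goat == cabbage and farmer != goat:
        return False
    return True

def neighbors(state):
    farmer = state[0]
    # The farmer crosses alone (passenger None) or with one item on his bank.
    for passenger in (None, 1, 2, 3):
        if passenger is not None and state[passenger] != farmer:
            continue
        new = list(state)
        new[0] = 1 - farmer
        if passenger is not None:
            new[passenger] = 1 - state[passenger]
        new = tuple(new)
        if is_safe(new):
            yield new

def solve():
    queue = deque([(START, [START])])
    seen = {START}
    while queue:
        state, path = queue.popleft()
        if state == GOAL:
            return path
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))

for step in solve():
    print(step)  # prints each state along the shortest solution (seven crossings)
```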
Additionally, the architectural foundations of these models have not changed dramatically from older, more “memorization-heavy” models like ChatGPT. If that underlying architecture couldn’t reason well before, it’s questionable whether incremental improvements on top of it suddenly produce genuine reasoning.
Melanie Mitchell, a professor at the Santa Fe Institute who studies AI’s reasoning capabilities, highlights the mystery behind what AI models actually do during their extensive computations. She points out that humans solve complex problems quickly, often verbalizing just a few logical steps before arriving at a solution. Meanwhile, AI models engage in lengthy internal processes, sometimes generating meaningless intermediate “filler” tokens that help the model allocate more computational resources without necessarily improving true understanding.
An intriguing experiment called “Let’s Think Dot by Dot” showed that simply instructing a model to generate meaningless dots improved problem-solving performance. This suggests that what looks like chain-of-thought reasoning may be, in part, an artifact of how models allocate resources rather than genuine stepwise thinking.
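Roughly, the comparison looks like the sketch below; it only illustrates the shape of the three prompt conditions (direct answer, meaningless filler, explicit steps), and the prompts and tasks in the actual paper differ.

```python
# Illustrative prompt conditions compared in filler-token experiments:
# no intermediate tokens, meaningless filler ("dots"), and chain-of-thought.
# The real studies train and evaluate models on synthetic tasks; this sketch
# only shows what the inputs roughly look like.

question = "Is 7 * 8 + 3 greater than 58? Answer yes or no."

prompts = {
    "direct": question,
    "filler_dots": question + "\n" + ". " * 30 + "\nAnswer:",
    "chain_of_thought": (
        question
        + "\nStep 1: compute 7 * 8. Step 2: add 3. Step 3: compare with 58."
        + "\nAnswer:"
    ),
}

for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```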
Furthermore, many AI “reasoning” successes may be the result of heuristics, mental shortcuts learned from training data. For instance, an AI trained to detect skin cancer might actually rely on spurious correlations, like the presence of a ruler in photos indicating malignancy, rather than truly understanding the medical features.
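That failure mode is easy to reproduce in miniature. The toy example below uses synthetic data and a plain logistic regression from scikit-learn, with made-up feature names and no connection to any real diagnostic system: because the fabricated “ruler present” flag tracks the label far more cleanly than the genuinely relevant feature, the model ends up leaning on the shortcut.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Synthetic labels: 1 = malignant, 0 = benign (illustration only).
y = rng.integers(0, 2, size=n)

# A genuinely relevant but noisy feature (say, a lesion-irregularity score).
relevant = y + rng.normal(0, 2.0, size=n)

# A spurious feature: suppose malignant lesions were photographed with a
# ruler 90% of the time, so "ruler present" tracks the label almost perfectly.
ruler = np.where(rng.random(n) < 0.9, y, 1 - y)

X = np.column_stack([relevant, ruler])
model = LogisticRegression().fit(X, y)

print("weight on relevant feature:", round(float(model.coef_[0][0]), 2))
print("weight on spurious 'ruler' feature:", round(float(model.coef_[0][1]), 2))
# The spurious feature gets by far the larger weight: the model has learned
# a shortcut, not the underlying signal.
```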
All of this has led skeptics to argue that AI is just a very sophisticated form of pattern recognition and memorization, cloaked in the illusion of reasoning.
On the other side of the debate, some experts argue that AI systems are doing genuine reasoning, even if it’s not perfect or human-like.
Ryan Greenblatt, chief scientist at Redwood Research, explains that while AI reasoning isn’t as flexible or generalizable as human reasoning, the models do exhibit some capacity to generalize beyond mere memorization. These models solve problems they have never directly encountered by piecing together learned concepts.
Greenblatt also contextualizes AI failures in human terms. Consider the river-crossing puzzle again. The model often confuses the details because it has seen many versions of similar puzzles with different elements, sometimes a wolf, sometimes a cabbage, and it tries to match the prompt to its closest learned heuristic rather than working the problem out from scratch. This is akin to a student trained on multiple variations of a problem who sometimes mismatches the details.
Ajeya Cotra, a senior AI risk analyst at Open Philanthropy, offers an insightful analogy. Imagine teaching a college physics class with two kinds of students: a brilliant one who grasps the underlying principles and can derive a solution from scratch, and a diligent one who memorizes hundreds of formulas and learns to recognize which one each problem calls for.
Cotra argues that AI models resemble the diligent but not genius student. They memorize far more information than a human student would but still need to figure out which “equations” (or learned patterns) to apply. This combination of memorization and limited reasoning gets them surprisingly far even if it doesn’t approach human-like insight.
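One way to picture that diligent student is as a lookup over memorized templates. The toy sketch below is purely illustrative and nothing like how a real model is implemented: it matches a question against a small library of memorized formulas by keyword and plugs in the numbers, with no grasp of why the formula applies.

```python
# A toy "diligent student": a library of memorized formulas plus a crude
# keyword matcher that decides which one to apply. Purely illustrative.
FORMULAS = {
    "distance": (("speed", "time"), lambda speed, time: speed * time),
    "area": (("length", "width"), lambda length, width: length * width),
}

def match_and_apply(question, **given):
    # Pick the memorized formula whose keyword appears in the question.
    for keyword, (params, formula) in FORMULAS.items():
        if keyword in question.lower():
            return keyword, formula(*(given[p] for p in params))
    return None, None

name, answer = match_and_apply("What distance does the car cover?", speed=60, time=2)
print(name, "=", answer)  # distance = 120
```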
In other words, AI reasoning is neither all mimicry nor all true reasoning. It exists along a spectrum, mixing learned heuristics with some capacity to synthesize new solutions.
Computer scientist Andrej Karpathy coined the term “jagged intelligence” to describe this uneven mix of AI capabilities. AI systems today can solve complex problems, like writing flawless code or acing math tests, but simultaneously stumble on what seem like trivial challenges.
This “jaggedness” reflects the fact that AI’s problem-solving skills are patchy: sometimes brilliant, sometimes bafflingly bad. That is very different from human intelligence, which is far more consistent across everyday tasks.
Jagged intelligence forces us to recalibrate expectations. AI won’t think like a human tomorrow, but it can solve certain complex problems today, even without understanding them the way people do.
The question of whether AI truly reasons isn’t just philosophical; it has practical consequences. Public perception is one of them: overhyping AI’s abilities risks disappointment and mistrust, while excessive skepticism might delay the adoption of valuable tools.
AI researchers are actively working to close the gap. New techniques aim to improve AI’s ability to generalize from limited examples, incorporate common sense knowledge, and reason causally. Models are becoming more transparent and interpretable, helping researchers understand how AI arrives at answers.
Yet full human-like reasoning with creativity, intuition, and deep understanding remains a long way off. For now, AI exhibits “jagged intelligence,” a mix of memorization, heuristics, and emerging reasoning abilities.
This “in-between” state is both promising and cautionary. AI is powerful and improving fast, but it’s not a flawless thinker. Understanding its strengths and limits will help society harness AI’s potential responsibly.
In summary, as AI continues to advance, keeping a clear-eyed perspective on what it can and cannot do will be essential for everyone, from technologists and policymakers to everyday users.