“Language models are like parrots: they can produce convincing speech without understanding what they’re saying.” — Critics
“Parrots don’t write poetry that makes you cry.” — Defenders
The Battle Lines
In 2021, researchers Emily Bender, Timnit Gebru, and colleagues published a provocatively titled paper: “On the Dangers of Stochastic Parrots.” The paper ignited a debate that continues to shape how we think about large language models. The central question: Do these systems genuinely understand language, or are they sophisticated mimics—parrots that have learned to produce convincing-sounding speech without any comprehension underneath?
This matters beyond academic philosophy. How we answer affects whether we trust LLM outputs. It shapes how we regulate AI systems. It influences what moral status, if any, AI might deserve. And perhaps most surprisingly, it forces us to examine our own assumptions about what understanding even means.
The Stochastic Parrot View
The “stochastic parrot” metaphor captures a specific critique. LLMs are trained on massive amounts of human text. They learn statistical patterns—which words tend to follow which other words, which phrases occur together, what kinds of responses typically follow what kinds of prompts. When they generate text, they’re essentially doing sophisticated pattern matching: producing sequences of words that sound meaningful based on those learned statistics, but without any genuine understanding of what the words mean.
The critics point to several pieces of evidence. First, the training objective is shallow. LLMs are trained to predict the next token—essentially, to guess the most likely next word. This is a surface-level statistical task. It doesn’t obviously require deep understanding of meaning, any more than autocomplete on your phone requires your keyboard to understand what you’re trying to say.
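To make the critique concrete, here is a deliberately tiny Python sketch: a hypothetical bigram counter, nothing like a real transformer, that predicts each next word purely from counted statistics of its training text. The training sentence and sampling rule are invented for illustration; the point is that the output can look locally fluent while nothing in the program represents meaning.

```python
# A hypothetical, deliberately tiny "stochastic parrot": a bigram counter.
# Real LLMs are vastly more sophisticated, but the training signal is analogous:
# learn which tokens tend to follow which, then sample accordingly.
import random
from collections import Counter, defaultdict

training_text = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which word follows which in the training text.
following = defaultdict(Counter)
for current_word, next_word in zip(training_text, training_text[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Sample the next word in proportion to how often it followed `word` in training."""
    counts = following[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate text: locally plausible word sequences, with no representation of meaning.
word = "the"
sequence = [word]
for _ in range(8):
    word = predict_next(word)
    sequence.append(word)
print(" ".join(sequence))
```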
Second, there’s no grounding. LLMs process text and only text. They’ve never seen a sunset, felt rain, or experienced loss. All their “knowledge” comes from descriptions of experiences, never the experiences themselves. How can a system understand “pain” if it has never felt anything?
Third, coherence isn’t the same as comprehension. A well-trained statistical model can produce remarkably coherent text without understanding any of it. The Chinese Room thought experiment makes this vivid: a person following rules to manipulate Chinese symbols might produce perfectly grammatical Chinese responses without understanding a word. LLMs might be doing something similar at massive scale.
Finally, hallucination provides evidence. LLMs confidently generate false information all the time. They claim books exist that don’t, invent historical events, fabricate citations. This suggests they don’t genuinely “know” what they’re saying; they’re producing plausible-sounding sequences without any mechanism for tracking truth.
The Understanding View
Defenders of LLM understanding point to counter-evidence that’s hard to explain away.
Start with functional competence. LLMs do things that seem to require understanding. They answer novel questions correctly—questions that have probably never appeared in their training data in exactly that form. They reason through multi-step problems, maintaining coherence across long chains of inference. They generate creative content that moves human readers. They explain their reasoning when asked. They correct errors when pointed out.
If understanding means nothing more than the capacity to perform understanding-like tasks, LLMs seem to have it.
Then there are emergent capabilities. As LLMs scale up, they develop abilities that weren’t explicitly trained. In-context learning lets them adapt to new tasks from just a few examples. Chain-of-thought reasoning lets them work through complex problems step by step. They transfer knowledge across domains, applying concepts learned in one context to novel situations.
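To make “adapting to new tasks from just a few examples” concrete, here is a minimal sketch of few-shot prompting. The task, the formatting, and the three examples are invented for illustration; the resulting string would be sent to whatever model you use, with no change to the model’s weights.

```python
# A sketch of in-context (few-shot) learning. The model's weights never change;
# everything it "learns" about the task is packed into the prompt string itself.
# The task and format below are invented for illustration.
few_shot_examples = [
    ("hot", "cold"),
    ("early", "late"),
    ("full", "empty"),
]

prompt_lines = ["Rewrite each word as its opposite."]
for word, opposite in few_shot_examples:
    prompt_lines.append(f"Input: {word} -> Output: {opposite}")
prompt_lines.append("Input: rough -> Output:")

few_shot_prompt = "\n".join(prompt_lines)
print(few_shot_prompt)
# Sent to a capable LLM, this prompt is typically completed with "smooth",
# a task the model adapts to from three examples and no weight updates.
```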
Where do these capabilities come from if not some form of understanding? A pure pattern-matcher should be limited to patterns it has seen. But LLMs generalize in ways that suggest something more is happening.
Research on internal representations adds another piece of evidence. Studies show that LLMs develop internal representations that correspond to real-world structure. They encode geographic relationships—knowing that Paris is closer to London than to Tokyo. They capture temporal relationships—knowing that World War II came after World War I. They represent conceptual hierarchies—knowing that dogs are animals and that animals are living things.
This isn’t what you’d expect from a system doing pure surface-level pattern matching. It suggests the models are building something like a model of the world, organized in ways that reflect reality’s actual structure.
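As an illustration of how such probing studies work mechanically, here is a minimal sketch assuming the Hugging Face transformers and scikit-learn libraries, with GPT-2 standing in for a larger model and a handful of approximate city coordinates as labels. It shows only the shape of the methodology (extract hidden states, fit a linear probe, compare predictions to real-world structure), not a reproduction of any particular study.

```python
# Sketch of a linear probe on a language model's internal representations.
# Assumes `pip install transformers torch scikit-learn`. GPT-2 is used here only
# as a small, easy-to-download stand-in; probing papers study much larger models.
import torch
from sklearn.linear_model import LinearRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

# Approximate (latitude, longitude) labels for a handful of cities.
cities = {
    "Paris": (48.9, 2.4),      "London": (51.5, -0.1),  "Tokyo": (35.7, 139.7),
    "New York": (40.7, -74.0), "Cairo": (30.0, 31.2),   "Sydney": (-33.9, 151.2),
    "Moscow": (55.8, 37.6),    "Madrid": (40.4, -3.7),
}

def hidden_state(text: str) -> torch.Tensor:
    """Mean of the model's final-layer hidden states for the given text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

X = torch.stack([hidden_state(name) for name in cities]).numpy()
y = list(cities.values())

# If geography is linearly recoverable from the hidden states, the probe's
# predictions track real coordinates. With 8 points in 768 dimensions this
# toy version simply overfits; real studies evaluate on held-out cities.
probe = LinearRegression().fit(X, y)
print(dict(zip(cities, probe.predict(X).round(1).tolist())))
```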
Finally, the grounding objection may prove too much. If “no sensory grounding means no understanding,” we have to explain how blind people understand “red,” how deaf people understand music, how any of us understand quantum mechanics or the interior of stars—phenomena utterly beyond direct human experience. If humans can understand things we’ve never perceived, why couldn’t AI?
Finding Middle Ground
Perhaps both camps are partially right, and the disagreement comes from treating understanding as all-or-nothing when it might be a matter of degree.
Understanding could be graded. At one end, you have no understanding at all—a lookup table that returns memorized responses. At the other end, you have the rich, embodied, experiential understanding of a human expert. In between lies a spectrum. LLMs might have real but limited understanding—more than a parrot, less than a human.
Understanding could also come in different kinds. Structural understanding involves grasping how concepts relate to each other. Operational understanding means knowing how to use concepts correctly. Phenomenal understanding means knowing what experiences feel like. And grounded understanding connects concepts to direct experience of the world.
LLMs might have strong structural and operational understanding while lacking phenomenal and grounded understanding. They know how concepts relate and how to use them, but they don’t know what concepts feel like or have any direct connection to the reality concepts refer to.
This would explain both why LLMs seem so capable and why something still seems missing. They’re not empty parrots, but they’re not human-like understanders either. They occupy a novel position that our existing concepts of understanding struggle to capture.
What Would Settle the Debate?
We don’t have conclusive evidence either way. But several lines of inquiry might help.
Mechanistic interpretability aims to understand what’s actually happening inside LLMs—not just input-output behavior, but the computations that produce that behavior. If we could fully map out how LLMs process language, we might be able to determine whether something that deserves to be called “understanding” is happening.
Novel competencies would provide evidence. If LLMs develop capabilities that seem genuinely impossible through pattern matching alone—proving new mathematical theorems, perhaps, or making scientific discoveries—that would suggest more than statistics is at work.
Grounding experiments could test whether grounding matters. If we built LLMs that interact with physical environments—robots with bodies, not just text processors—and they improved dramatically, that would suggest grounding is essential to genuine understanding.
Until such evidence arrives, the debate will continue. And that’s not necessarily a bad thing.
Why the Debate Matters
The stochastic parrot debate touches questions we’ve barely begun to answer. What is understanding? How would we know if a system understood? Is understanding about internal states or external behavior?
Engaging with these questions about AI forces us to examine our assumptions about human cognition. We typically assume we understand language because… well, it feels like we understand. But what exactly is that feeling? What processes produce it? Would we recognize understanding in a system organized differently than a brain?
The debate about LLMs is also a debate about ourselves. The concepts we developed to describe human cognition—“knowledge,” “understanding,” “meaning”—were built for humans. AI systems force us to examine whether those concepts are fundamental features of mind or just descriptions of how human minds happen to work.
Perhaps the most important insight is how little we know. We don’t understand what understanding is well enough to say confidently whether LLMs have it. The debate is genuinely open—and that openness should inform how we build, deploy, and trust these systems.
This is part of the AI Truth and Justice series—16 modules exploring the philosophical foundations of artificial intelligence.
From Theory to Practice
How do we build systems that work despite uncertainty about understanding?
- Context Optimization — What agents “know” at any given moment
- Memory Systems — Persistent state across sessions
- Git Worktrees — Enabling parallel agent execution