- The paper establishes a computational complexity threshold, demonstrating that the O(N²d) per-token cost inherent to LLMs leads to inevitable hallucinations on tasks whose complexity exceeds that bound.
- It rigorously applies classical complexity theory, including the time-hierarchy theorem, to explain failures in tasks like matrix multiplication and agentic optimization.
- The study warns against relying on LLMs for critical verification tasks and advocates for hybrid systems that integrate external computational modules.
The paper "Hallucination Stations: On Some Basic Limitations of Transformer-Based LLMs" (2507.07505) presents a rigorous computational complexity perspective on the phenomenon of hallucinations in transformer-based LLMs. The authors argue that the core architectural and computational constraints of LLMs fundamentally limit their ability to perform or verify tasks whose inherent complexity exceeds that of the model's inference process. This analysis is grounded in classical complexity theory and is supported by concrete examples and a formal theorem.
Core Argument and Theoretical Foundation
The central thesis is that transformer-based LLMs, by virtue of their self-attention mechanism, have a per-token computational complexity of O(N²d), where N is the input sequence length and d is the model dimensionality. This bound is not merely a practical limitation but a theoretical ceiling: any task whose minimal computational complexity exceeds O(N²d) cannot be reliably solved or verified by such models. The argument is formalized using the time-hierarchy theorem, which guarantees the existence of problems solvable in time O(t₂(n)) but not in time O(t₁(n)) whenever t₂ grows sufficiently faster than t₁.
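As a rough illustration of how this per-token budget behaves, the sketch below counts only the dominant N²d term of a single self-attention pass for a few context lengths; the constant factor and the `attention_flops` helper are assumptions of this summary, not the paper's accounting.

```python
# Back-of-envelope scaling of self-attention cost with context length.
# The constant factor is an illustrative assumption; only the N^2 * d
# growth mirrors the bound discussed in the paper.

def attention_flops(n_tokens: int, d_model: int) -> int:
    """Dominant cost of one self-attention pass: the QK^T score matrix plus
    the weighted sum over values, each roughly n_tokens^2 * d_model MACs."""
    return 2 * n_tokens**2 * d_model

if __name__ == "__main__":
    d = 3072  # example model dimension (assumed, not taken from the paper)
    for n in (128, 1024, 8192):
        print(f"N={n:>5}  ~{attention_flops(n, d):.2e} ops")
```

Doubling N quadruples the cost while the budget per generated token stays within this envelope, which is the gap the paper's argument turns on.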
Illustrative Examples
The paper provides several instructive examples:
- Token Composition: Enumerating all possible strings of length k from a set of n tokens requires O(nᵏ) time, which quickly outpaces the O(N²d) budget for even moderate n and k.
- Matrix Multiplication: The naive algorithm for multiplying two n×n matrices is O(n³), again exceeding the LLM's computational envelope for sufficiently large n (the sketch after this list puts concrete numbers on both examples).
- Agentic AI Tasks: In agentic settings, where LLMs act as autonomous agents, the complexity of real-world tasks (e.g., combinatorial optimization, scheduling, formal verification) often surpasses O(N²d). The paper highlights that LLMs can neither solve such tasks nor verify solutions produced by other agents, since verification is often at least as hard as the original problem.
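To make the mismatch concrete, the following sketch compares the cost of the two example tasks against a fixed N²d-style per-token budget; all concrete values of n, k, N, and d are illustrative choices made here, not figures from the paper.

```python
# Illustrative comparison of task cost vs. a fixed attention-style budget.
# Every concrete number below is an assumption for the sake of the example.

N, d = 1024, 3072            # prompt length and model dimension (assumed)
budget = N**2 * d            # per-token compute envelope, ~3.2e9 operations

tasks = {
    "enumerate strings, n=50, k=10": 50**10,     # O(n^k) token composition
    "matrix multiply, n=10,000":     10_000**3,  # O(n^3) naive algorithm
}

for name, cost in tasks.items():
    print(f"{name}: ~{cost:.1e} ops, {cost / budget:,.0f}x the per-token budget")
```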
Theorem and Corollary
The authors state and prove the following:
- Theorem: For any prompt of length N encoding a task of complexity O(n³) or higher (with n < N), an LLM or LLM-based agent will necessarily hallucinate in its response.
- Corollary: There exist tasks for which LLM-based agents cannot verify the correctness of another agent's solution, as verification complexity exceeds the model's computational capacity.
These claims are not limited to pathological or contrived tasks but encompass a wide range of practical problems in combinatorics, optimization, and formal verification.
Empirical and Numerical Evidence
The paper provides concrete measurements, such as the Llama-3.2-3B-Instruct model requiring approximately 1.09×10¹¹ floating-point operations for a 17-token input, regardless of its semantic content. This invariance underscores the disconnect between the computational demands of certain tasks and the fixed computational budget of LLM inference.
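The reported figure is roughly consistent with the common rule of thumb of about 2 FLOPs per parameter per processed token for dense transformer inference; the sketch below applies that heuristic, which is an assumption of this summary rather than the paper's stated counting method.

```python
# Rule-of-thumb FLOP estimate for dense transformer inference:
# roughly 2 floating-point operations per parameter per processed token.
# The 3.2e9 parameter count approximates Llama-3.2-3B-Instruct.

params = 3.2e9
tokens = 17
flops = 2 * params * tokens
print(f"~{flops:.2e} FLOPs")  # ~1.09e+11, the same order of magnitude as reported
```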
Implications
Practical Implications
- Deployment Caution: The results caution against deploying LLMs (or LLM-based agents) in domains where correctness on high-complexity tasks is critical, such as scientific computing, logistics optimization, or formal software verification.
- Verification Limitations: LLMs cannot be relied upon to verify the correctness of solutions to complex tasks, undermining their utility as autonomous validators in agentic workflows.
- Composite and Hybrid Systems: The findings motivate the development of composite systems that combine LLMs with external symbolic, algorithmic, or search-based modules to handle tasks beyond the LLM's complexity class; a minimal dispatch sketch follows this list.
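As a minimal sketch of what such a composite system might look like, the dispatcher below keeps the LLM for language-level work and hands a provably expensive subtask to an exact external module; the routing rule, the `handle_request` interface, and the `call_llm` stub are hypothetical, not taken from the paper.

```python
# Minimal sketch of a hybrid dispatcher: language-level requests go to the
# LLM, while tasks with known superquadratic cost are routed to an exact
# external module. All names and the routing rule are hypothetical.
import numpy as np

def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for an actual LLM client")

def handle_request(request: dict) -> object:
    if request["kind"] == "matrix_multiply":
        # Exact computation delegated to an external numerical library.
        a, b = np.asarray(request["a"]), np.asarray(request["b"])
        return a @ b
    # Everything else is treated as a language task for the LLM.
    return call_llm(request["prompt"])
```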
Theoretical Implications
- Bounded Reasoning: The analysis extends to "reasoning" LLMs, which generate additional tokens in intermediate "think" steps. The authors argue that the fundamental per-token complexity remains unchanged, and the token budget for reasoning is insufficient to bridge the gap for high-complexity tasks (a rough token count appears after this list).
- Hallucination as Inevitable: Hallucinations are not merely a byproduct of imperfect training or data, but a necessary consequence of computational mismatch between the model and the task.
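To see why extra "think" tokens do not close the gap, the rough count below estimates how many reasoning tokens would be needed merely for total inference compute (tokens generated times a per-token cost of roughly N²d) to match the raw operation count of an n³-cost task; all numbers are illustrative assumptions of this summary.

```python
# How many reasoning tokens would be needed for total inference compute
# (tokens generated x a per-token cost of ~N^2 * d) to reach the raw
# operation count of an n^3 task? All concrete values are assumptions,
# and matching raw op counts says nothing about directing them usefully.

N, d = 1024, 3072
per_token_cost = N**2 * d              # ~3.2e9 operations per generated token

for n in (100_000, 1_000_000):
    tokens_needed = n**3 / per_token_cost
    print(f"n={n:>9,}: ~{tokens_needed:,.0f} reasoning tokens to match the raw op count")
```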
Future Directions
- Augmentation with External Tools: Integrating LLMs with external computation engines, symbolic solvers, or domain-specific algorithms is a promising direction to overcome these limitations.
- Complexity-Aware Prompting: Developing methods to detect when a prompt encodes a task beyond the model's computational reach could help mitigate hallucinations in deployment.
- Formal Verification of LLM Outputs: For safety-critical applications, outputs from LLMs should be subject to independent verification by systems with sufficient computational power; a minimal verification sketch follows this list.
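A minimal sketch of such independent verification, under the hypothetical assumption that the LLM has returned a claimed matrix product: the check recomputes the result with a trusted numerical library rather than asking the model to confirm itself.

```python
# Independent check of an LLM-claimed matrix product using a trusted
# numerical library. The claimed_product input is hypothetical; in practice
# it would be parsed from the model's response.
import numpy as np

def verify_matrix_product(a, b, claimed_product, tol: float = 1e-6) -> bool:
    """Recompute a @ b exactly and compare against the claimed result."""
    expected = np.asarray(a) @ np.asarray(b)
    return np.allclose(expected, np.asarray(claimed_product), atol=tol)
```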
Conclusion
This work provides a formal and practical framework for understanding the inherent computational limitations of transformer-based LLMs. By situating hallucinations within the context of computational complexity, the paper offers a principled explanation for observed failures and sets clear boundaries for the reliable application of LLMs. The implications are significant for both the design of future AI systems and the responsible deployment of current models in real-world settings.