
Lost in Transmission: When and Why LLMs Fail to Reason Globally (2505.08140v2)

Published 13 May 2025 in cs.AI, cs.FL, and cs.LG

Abstract: Despite their many successes, transformer-based LLMs continue to struggle with tasks that require complex reasoning over large parts of their input. We argue that these failures arise due to capacity limits on the accurate flow of information within LLMs. To formalize this issue, we introduce the bounded attention prefix oracle (BAPO) model, a new computational framework that models bandwidth constraints on attention heads, the mechanism for internal communication in LLMs. We show that several important reasoning problems like graph reachability require high communication bandwidth for BAPOs to solve; we call these problems BAPO-hard. Our experiments corroborate our theoretical predictions: GPT-4o, Claude, and Gemini succeed on BAPO-easy tasks and fail even on relatively small BAPO-hard tasks. BAPOs also reveal another benefit of chain of thought (CoT): we prove that breaking down a task using CoT can turn any BAPO-hard problem into a BAPO-easy one. Our results offer principled explanations for key LLM failures and suggest directions for architectures and inference methods that mitigate bandwidth limits.

Summary

  • The paper introduces the BAPO model to quantify LLM communication constraints that lead to failures in global reasoning.
  • It proves that tasks like reachability, majority, and higher-order matching demand super-constant bandwidth, highlighting inherent limitations.
  • The study demonstrates that iterative Chain of Thought strategies can mitigate, though not fully overcome, these bandwidth constraints.

This paper, "Lost in Transmission: When and Why LLMs Fail to Reason Globally" (2505.08140), investigates the persistent failures of transformer-based LLMs on tasks requiring complex reasoning over large inputs, which the authors term "global problems." The core hypothesis is that these failures stem from limitations on the accurate flow of information within the LLM, specifically between different parts of the input sequence due to capacity constraints in the attention mechanism and the unidirectional nature of causal attention.

To formalize this hypothesis, the authors introduce a new computational model called the Bounded Attention Prefix Oracle (BAPO). A BAPO models the effective bandwidth of an LLM's internal communication. It breaks down the computation required to predict the next token (or a single output token for a decision problem) given an input split into a prefix and a suffix. The BAPO receives the full suffix, plus limited information from the prefix:

  1. An $a$-bit output from a prefix oracle $f$, which has access only to the prefix. This models the intermediate processing in the prefix residual streams.
  2. A set of $b$ prefix tokens, selected individually by an attention function $g$ based on the suffix. This models the attention mechanism's ability to select specific tokens.
  3. The positional information for all tokens.

The final prediction is made by a suffix oracle $h$ using the suffix, the $a$-bit output from $f$, and the $b$ attended prefix tokens. The key constraints are the prefix bandwidth $a$ and the attention bandwidth $b$. A problem is considered BAPO-easy if it can be solved by a BAPO whose bandwidths are constant with respect to the input size, and BAPO-hard otherwise. The authors conjecture that practical LLMs have small constant effective bandwidths, leading to failures on BAPO-hard tasks.
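The decomposition into prefix oracle, attention function, and suffix oracle can be sketched in a few lines of Python. This is our own illustration, not the paper's formal definition; the function names and the toy $(0, 1)$-BAPO for the Index problem are assumptions made for concreteness.

```python
def bapo_predict(prefix, suffix, f, g, h):
    """One BAPO prediction on an input split into prefix + suffix.

    f: prefix oracle, sees only the prefix, returns an a-bit message
    g: attention function, sees the suffix (and positions), picks at most b prefix positions
    h: suffix oracle, combines the suffix, f's message, and the attended tokens
    """
    a_bits = f(prefix)                              # limited to a bits
    positions = g(suffix, len(prefix))              # at most b positions
    attended = [(i, prefix[i]) for i in positions]  # tokens with their positions
    return h(suffix, a_bits, attended)

# Index (prefix is a bitstring, suffix names a position) is solvable by a
# (0, 1)-BAPO: no prefix message, one attended token.
f = lambda prefix: b""                  # a = 0 bits
g = lambda suffix, n: [suffix]          # attend to the queried index (b = 1)
h = lambda suffix, a_bits, attended: attended[0][1]

print(bapo_predict([1, 0, 1, 1, 0], 3, f, g, h))  # → 1
```

Note that $g$ may depend on the suffix but not on the prefix contents, which is what makes the selected tokens "attention-like" rather than free communication.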

The paper presents theoretical results on the BAPO complexity of various problems:

  • BAPO-Easy Problems: Problems like Index, Equality, and Disjointness are shown to be BAPO-easy, requiring only constant prefix and attention bandwidths (e.g., $(0, 1)$- or $(1, 1)$-BAPOs). This contrasts with standard one-way communication complexity, where these problems are hard, suggesting attention provides a significant advantage for specific communication tasks.
  • BAPO-Hard Problems: Several important global reasoning problems are proven BAPO-hard, requiring super-constant bandwidths:
    • Reachability: Determining whether a path exists between two nodes in a graph. Not solvable by any BAPO with bandwidth $(o(m^{1/c} \log m), o(m^{1-2/c}))$, where $m$ is the number of edges and $c \ge 3$ is an integer constant. This formalizes the intuition that tracking dependencies across a graph requires significant information flow.
    • Majority: Determining whether a bitstring has more ones than zeros. Not solvable with bandwidth $(o(\log n), o(n^{1-\epsilon}))$ for inputs of length $n$. While theoretically solvable by simple circuits, the BAPO model suggests practical LLMs struggle due to communication limits.
    • Match3$_n$: Given $x \in \mathbb{Z}_m^n$, check whether $x_n + x_i + x_j \equiv 0 \pmod{m}$ for some $i, j$. Not solvable with bandwidth $(o(n/b(n)), b(n))$ for any $b(n) = o(n)$. This is harder than Match2$_n$, which needs only a $(0, 1)$-BAPO, aligning with prior work showing transformers struggle with higher-order relationships.
    • Unique / SetDiff: Finding a unique element, or an element in one set but not another (both represented as token sequences). These are BAPO-$\Sigma$-hard, meaning their complexity scales with the vocabulary size $|\Sigma|$: they are not solvable with bandwidth $(o(|\Sigma|/b(|\Sigma|)), b(|\Sigma|))$.
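To pin down exactly what two of the BAPO-hard tasks above ask for, here are plain-Python reference definitions (task semantics only, no BAPO machinery; the function names are ours):

```python
def majority(bits):
    """True iff the bitstring contains strictly more ones than zeros."""
    return 2 * sum(bits) > len(bits)

def match3(x, m):
    """Match3: given x in Z_m^n, is there a pair i, j with
    x[n-1] + x[i] + x[j] ≡ 0 (mod m)?  (x[n-1] is the last entry.)"""
    n = len(x)
    return any((x[-1] + x[i] + x[j]) % m == 0
               for i in range(n) for j in range(n))

print(majority([1, 0, 1]))    # → True
print(match3([1, 2, 3], 6))   # → True  (3 + 1 + 2 ≡ 0 mod 6)
print(match3([1, 1, 1], 5))   # → False
```

Both are trivial sequentially; the hardness lies in answering them in one shot when the relevant elements may be split arbitrarily across prefix and suffix.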

The hardness proofs generally follow a communication complexity strategy: constructing adversarial sets of prefixes and suffixes where a limited-bandwidth BAPO cannot distinguish between instances with different correct outputs. The key is leveraging the bandwidth constraints to ensure that either the prefix oracle's output collides for different prefixes, or the attention function can only select tokens that are identical across the adversarial prefixes.
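The pigeonhole core of this strategy can be illustrated on Majority. The sketch below assumes a prefix oracle limited to $a = 2$ bits and ignores the attended tokens (the actual proofs handle the attention function too); the specific oracle and suffix are our own toy choices.

```python
n, a = 8, 2
oracle = lambda prefix: sum(prefix) % (2 ** a)   # some 2-bit summary of the prefix

# n + 1 possible one-counts, but only 2^a = 4 distinct messages:
# by pigeonhole, two prefixes with different counts must collide.
prefixes = [(1,) * k + (0,) * (n - k) for k in range(n + 1)]
by_msg = {}
for pre in prefixes:
    msg = oracle(pre)
    if msg in by_msg:
        p, q = by_msg[msg], pre        # colliding pair, e.g. 0 ones vs 4 ones
        break
    by_msg[msg] = pre

# An adversarial suffix on which the colliding prefixes need different answers:
suffix = (1,)
majority = lambda bits: 2 * sum(bits) > len(bits)
print(oracle(p) == oracle(q))                       # → True (h sees the same message)
print(majority(p + suffix), majority(q + suffix))   # → False True (answers differ)
```

Since the suffix oracle receives identical information for both instances, it must answer at least one of them incorrectly.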

A significant finding concerns the impact of Chain of Thought (CoT). The paper introduces BAPO-CoT, where a fixed BAPO is applied iteratively, with the output of one step concatenated to the input for the next, simulating autoregressive generation. The authors prove that a constant-bandwidth BAPO-CoT (specifically, a $(2, 3)$-BAPO-CoT) can simulate any Turing machine, demonstrating that CoT can theoretically break down any decidable problem into steps requiring only low bandwidth. This suggests that CoT can alleviate the communication burden.
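The intuition for why iteration helps can be seen on Majority: if each generated token is appended to the input, every step only needs the previous running count (now readable in the suffix) plus one fresh input bit. A minimal sketch, with a hypothetical trace format of our own:

```python
def majority_cot(bits):
    """Answer Majority via a chain of low-bandwidth steps: each step reads
    the previous running count and a single input bit."""
    trace = []                       # the emitted chain-of-thought tokens
    count = 0
    for i, b in enumerate(bits):
        count += b                   # one step of constant information flow
        trace.append(f"count after bit {i}: {count}")
    answer = count > len(bits) - count
    trace.append(f"answer: {answer}")
    return answer, trace

ans, trace = majority_cot([1, 0, 1, 1])
print(ans)        # → True
print(trace[-1])  # → answer: True
```

Each intermediate token makes previously "prefix-locked" state directly attendable, which is exactly how BAPO-CoT sidesteps the bandwidth bound.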

Empirical experiments using GPT-4o/mini, Claude 3.5 Sonnet/Haiku, and Gemini 1.5 Pro/Flash test the BAPO model's predictive power. Across synthetic tasks (Index, Equality, Match2, Reachability, Majority, Match3) and real-world examples (review aggregation, variable tracking), LLMs consistently show degraded performance on BAPO-hard problems as input size increases, while maintaining higher accuracy on BAPO-easy tasks. This supports the hypothesis that their effective bandwidth is limited to a small constant in practice.

Using CoT prompts improves performance on some BAPO-hard tasks (like Match3 and Reachability) but significant performance drops still occur at larger input sizes. This suggests that while CoT is powerful in theory, LLMs may not always generate CoT steps that optimally reduce communication bandwidth requirements, or the number of steps required might be impractically large for longer inputs.

Practical Implications and Implementation Considerations:

  • Anticipating Failures: Developers can analyze tasks for BAPO-hard components (like Reachability, Majority, or higher-order matching) to predict where LLMs might struggle, especially with increasing input size.
  • Mitigation Strategies: For BAPO-hard tasks, consider:
    • Inference-time scaling: Explicitly prompting for CoT steps or using techniques like retrieval or tool-calling (which can be seen as offloading BAPO-hard subtasks to external tools).
    • Preprocessing: Simplifying the input or extracting critical information before feeding it to the LLM to reduce the information flow burden.
    • Hybrid Architectures: Integrating symbolic reasoners or specialized modules that are efficient at BAPO-hard tasks with LLMs.
  • Training and Fine-tuning: The BAPO model suggests that training or fine-tuning LLMs on reasoning chains that explicitly break down problems into low-bandwidth steps could be beneficial. Optimizing for low bandwidth during training might be a novel objective.
  • Architectural Design: Future LLM architectures could potentially be designed to increase effective bandwidth, although the paper notes that simply scaling existing components might not achieve this, as the effective bandwidth appears constant despite model size.
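The preprocessing/offloading idea above can be sketched concretely: solve a BAPO-hard subtask (here, reachability) with ordinary graph search, then pass only the verdict, a constant amount of information, to the model. The prompt template and names are illustrative assumptions, not the paper's method.

```python
from collections import deque

def reachable(edges, s, t):
    """Directed reachability via BFS — the BAPO-hard subtask, done externally."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return t in seen

edges = [("login", "dashboard"), ("dashboard", "settings")]
verdict = reachable(edges, "login", "settings")

# Only the constant-size verdict enters the prompt, not the whole graph problem.
prompt = (f"Static analysis reports that 'settings' is "
          f"{'reachable' if verdict else 'not reachable'} from 'login'. "
          f"Explain what this means for the user flow.")
print(verdict)  # → True
```

In BAPO terms, the external tool absorbs the super-constant information flow, leaving the LLM a BAPO-easy job.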

The paper contributes a valuable theoretical framework for understanding a key limitation of LLMs and provides empirical evidence that this limitation, captured by the BAPO model, predicts real-world LLM failures. It highlights the gap between theoretical expressivity and practical performance, attributing it to effective communication bandwidth constraints.
