Autoregressive Large Language Models are Computationally Universal (2410.03170v1)

Published 4 Oct 2024 in cs.CL

Abstract: We show that autoregressive decoding of a transformer-based LLM can realize universal computation, without external intervention or modification of the model's weights. Establishing this result requires understanding how a LLM can process arbitrarily long inputs using a bounded context. For this purpose, we consider a generalization of autoregressive decoding where, given a long input, emitted tokens are appended to the end of the sequence as the context window advances. We first show that the resulting system corresponds to a classical model of computation, a Lag system, that has long been known to be computationally universal. By leveraging a new proof, we show that a universal Turing machine can be simulated by a Lag system with 2027 production rules. We then investigate whether an existing LLM can simulate the behaviour of such a universal Lag system. We give an affirmative answer by showing that a single system-prompt can be developed for gemini-1.5-pro-001 that drives the model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules. We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.

Summary

  • The paper shows that extended autoregressive decoding in transformer-based LLMs enables universal computation by simulating Lag systems equivalent to Turing machines.
  • The paper demonstrates bidirectional memory control in Lag systems and a reduction of a universal Turing machine to a (2,2)-Lag system, establishing that LLMs under this decoding scheme can mimic any Turing machine.
  • The paper validates its theoretical claims with empirical evidence using the gemini-1.5-pro-001 model, highlighting LLMs' potential for versatile computational tasks.

Autoregressive LLMs as Universal Computers

The paper "Autoregressive LLMs are Computationally Universal" presents a formal investigation into the computational capabilities of LLMs, specifically evaluating whether these models can operate as universal computers. Authored by Dale Schuurmans, Hanjun Dai, and Francesco Zanini, the paper establishes that the autoregressive decoding of a transformer-based LLM can realize universal computation without modifying the model's weights. The primary assertion is that this results in the LLM simulating a universal Turing machine, reinforcing the notion that LLMs can function as general-purpose computers.

Key Contributions

The authors begin by reviewing autoregressive decoding, the process by which an LLM predicts successive tokens conditioned on a bounded context. They then introduce a generalized form of this decoding for handling long inputs: emitted tokens are appended to the end of the sequence while the bounded context window advances over it, rather than the window itself growing. The paper then shows that this extended autoregressive decoding corresponds to a Lag system, a classical model of computation long known to be Turing-universal.
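
To make the correspondence concrete, the following is a minimal sketch, not taken from the paper, of extended autoregressive decoding: the token sequence may exceed the bounded context window, each emitted token is appended to the end of the sequence, and the window advances by one position per step. The `model_step` callable is a hypothetical stand-in for one step of greedy decoding with a fixed model.

```python
# A minimal sketch (not the authors' code) of extended autoregressive
# decoding: the sequence may be longer than the bounded context window,
# each emitted token is appended to the END of the sequence, and the
# window advances by one position per step.  `model_step` is a
# hypothetical stand-in for one step of greedy decoding.

from typing import Callable, List

def extended_decode(sequence: List[str],
                    window: int,
                    model_step: Callable[[List[str]], str],
                    max_steps: int = 1000) -> List[str]:
    """Slide a fixed-size context window over a growing token sequence."""
    pos = 0                                   # left edge of the context window
    for _ in range(max_steps):
        context = sequence[pos:pos + window]  # what the model actually conditions on
        if not context:                       # nothing left to read: halt
            break
        new_token = model_step(context)       # deterministic (greedy) emission
        sequence.append(new_token)            # appended at the end of the sequence
        pos += 1                              # the window advances by one position
    return sequence

# Toy deterministic "model" that copies the first token of its context.
if __name__ == "__main__":
    out = extended_decode(list("abc"), window=2,
                          model_step=lambda ctx: ctx[0], max_steps=5)
    print("".join(out))   # abcabcab
```

The gap between where the window reads (its left edge) and where new tokens are written (the end of the sequence) is exactly the "lag" that makes the correspondence with Lag systems possible.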

The authors prove this universality by achieving several key results:

  1. Simulation of Lag Systems: They establish that extended autoregressive decoding under deterministic (greedy) sampling corresponds to a Lag system, implying that the computational power of a prompted LLM matches that of this classical universal model (see the sketches after this list).
  2. Bidirectional Memory Control: They show how a Lag system can simulate a Turing machine by treating its memory as a circular tape and controlling access in both the clockwise and counterclockwise directions.
  3. Reduction of Turing Machines to Lag Systems: The paper shows that any Turing machine can be simulated by a (2,2)-Lag system, and in particular constructs a Lag system with 2027 production rules that simulates a universal Turing machine, indicating that LLMs under extended autoregressive decoding can mimic Turing machine operations.
  4. Practical Implementation with an LLM: They use the gemini-1.5-pro-001 model to simulate this universal Lag system, validating the theoretical findings empirically. A single system prompt drives the model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules.
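
The classical model underlying results 1 to 3 can be made concrete with a small simulator. The sketch below is illustrative and not taken from the paper: it runs a deterministic Lag system with lag 2, where each step looks up a production keyed on the first two symbols, appends that production to the end of the string, and deletes the single leftmost symbol. The toy rule table is hypothetical; the paper's universal construction uses 2027 such rules.

```python
# A minimal sketch of a deterministic (2,2)-Lag system: productions are
# keyed on the first TWO symbols, each production appends two symbols at
# the back, and exactly one symbol is deleted at the front per step.
# The rule table below is a made-up toy example, not one of the paper's
# 2027 production rules.

from collections import deque

def run_lag_system(rules: dict, tape: str, max_steps: int = 100) -> str:
    """Run the Lag system until no production applies or max_steps is hit."""
    q = deque(tape)
    for _ in range(max_steps):
        if len(q) < 2:
            break                      # too few symbols left to fill the lag window
        key = (q[0], q[1])             # the 2-symbol window at the front
        if key not in rules:
            break                      # halt: no production matches
        q.extend(rules[key])           # append the production at the back ...
        q.popleft()                    # ... and delete exactly one symbol at the front
    return "".join(q)

toy_rules = {
    ("a", "b"): "ba",
    ("b", "a"): "ab",
    ("b", "b"): "ab",
}
print(run_lag_system(toy_rules, "abab", max_steps=6))   # aabbaabab
```

Note the structural match with extended autoregressive decoding above: consuming a symbol at the front corresponds to the context window advancing past a token, and appending a production corresponds to emitting new tokens at the end of the sequence.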
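
For result 4, the explicit rule table is replaced by the model itself: a fixed system prompt encodes the production rules and greedy decoding applies them. The harness below is a hedged sketch of that outer loop, not the authors' code; `llm_greedy_complete` is a hypothetical function standing in for a temperature-0 call to gemini-1.5-pro-001, and the paper's actual system prompt and output format are not reproduced here.

```python
# Hedged sketch of driving an LLM as the rule applier of a Lag system.
# `llm_greedy_complete(system_prompt, state)` is a HYPOTHETICAL stand-in
# for a deterministic (temperature-0) call to gemini-1.5-pro-001; the
# actual system prompt and output format used in the paper are not
# reproduced here.

from typing import Callable

def simulate_with_llm(llm_greedy_complete: Callable[[str, str], str],
                      system_prompt: str,
                      state: str,
                      max_steps: int = 100) -> str:
    for _ in range(max_steps):
        if len(state) < 2:
            break
        # Ask the model to apply the production rule keyed on the first
        # two symbols of the current state and return the appended symbols.
        production = llm_greedy_complete(system_prompt, state)
        if production == "HALT":        # assumed halting convention
            break
        state = state[1:] + production  # delete the front symbol, append the production
    return state

# Demo with a fake stand-in "model" that just looks up a toy rule table;
# it reproduces the (2,2)-Lag system run from the previous sketch.
fake_llm = lambda _sys, s: {"ab": "ba", "ba": "ab", "bb": "ab"}.get(s[:2], "HALT")
print(simulate_with_llm(fake_llm, "toy system prompt", "abab", max_steps=6))  # aabbaabab
```

Because decoding is greedy, each state has a unique successor, which is what allows the correct application of every one of the 2027 production rules to be checked individually.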

Implications

The implications of these findings are significant: they suggest that LLMs have the inherent capability to perform any computation expressible by a Turing machine, subject in practice to limits on how long the input and the emitted output sequence can grow. This universality implies that LLMs can, in principle, simulate any algorithm or computational task, given a suitable prompt and context setup.

Moreover, this realization could transform the role of LLMs from mere predictive models to general-purpose computing entities capable of handling diverse tasks across domains without the need for traditional programming. Such capabilities open avenues for leveraging LLMs in complex problem-solving environments where human-like natural language understanding and general computational abilities are advantageous.

Future Directions

Future research could aim at refining the method for prompting LLMs to achieve computational universality more efficiently and with lower resource requirements. Additionally, exploring the practical applications of this theoretical capability, including its limitations and performance across a spectrum of tasks, could provide deeper insights into the deployability of LLMs in real-world computational scenarios.

While the paper is theoretical in nature, its implications for artificial intelligence and computational theory could spur advancements in how LLMs are trained, tested, and utilized across various industries. Thus, understanding and harnessing the potential universality of LLMs could represent a pivotal shift in both AI research and application.
