ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models (2405.09220v3)
Abstract: Planning is a crucial element of both human intelligence and contemporary LLMs. In this paper, we initiate a theoretical investigation into the emergence of planning capabilities in Transformer-based LLMs via their next-word prediction mechanisms. We model planning as a network path-finding task, where the objective is to generate a valid path from a specified source node to a designated target node. Our mathematical characterization shows that Transformer architectures can execute path-finding by embedding the adjacency and reachability matrices within their weights. Furthermore, our theoretical analysis of gradient-based learning dynamics reveals that LLMs can learn the adjacency matrix and a limited form of the reachability matrix. These theoretical insights are then validated through experiments, which demonstrate that Transformer architectures indeed learn the adjacency matrix and an incomplete reachability matrix, consistent with our theoretical predictions. When we apply our methodology to the real-world planning benchmark Blocksworld, our observations remain consistent. Additionally, our analyses uncover a fundamental limitation of current Transformer architectures in path-finding: these architectures cannot identify reachability relationships through transitivity, which leads to failures in generating paths when path concatenation is required. These findings provide new insights into how the internal mechanisms of autoregressive learning facilitate intelligent planning and deepen our understanding of how future LLMs might achieve more advanced and general planning-and-reasoning capabilities across diverse applications.
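To make the abstract's setup concrete, the sketch below (not the paper's code; the graph, node names, and variable names are illustrative assumptions) contrasts the true reachability matrix of a small graph with the "observed" reachability that can be read off individual training paths. It illustrates the stated limitation: a source-target pair that is reachable only by transitivity, i.e. by concatenating segments seen in different training paths, never appears in the observed matrix.

```python
import numpy as np

# Illustrative graph A -> B -> C -> D (assumed for this sketch).
nodes = ["A", "B", "C", "D"]
idx = {v: i for i, v in enumerate(nodes)}
edges = [("A", "B"), ("B", "C"), ("C", "D")]

# Adjacency matrix: adj[i, j] = 1 iff there is an edge i -> j.
adj = np.zeros((len(nodes), len(nodes)), dtype=int)
for u, v in edges:
    adj[idx[u], idx[v]] = 1

# Hypothetical training paths; note that no single path runs from A to D.
train_paths = [["A", "B", "C"], ["B", "C", "D"]]

# "Observed" reachability: obs[t, n] = 1 iff node n precedes target t within
# some training path ending at t -- a limited form of the reachability matrix.
obs = np.zeros_like(adj)
for path in train_paths:
    target = path[-1]
    for node in path[:-1]:
        obs[idx[target], idx[node]] = 1

# True reachability via Warshall's transitive closure.
reach = adj.astype(bool)
for k in range(len(nodes)):
    reach = reach | (reach[:, [k]] & reach[[k], :])

# A can truly reach D, but only by concatenating A->B->C with C->D, which
# never co-occurs in one training path, so the observed matrix misses it.
print("true reachability A -> D:", bool(reach[idx["A"], idx["D"]]))      # True
print("observed in training (target D, node A):", bool(obs[idx["D"], idx["A"]]))  # False
```

Under these assumptions, the gap between `reach` and `obs` is exactly the set of source-target pairs on which the paper reports path-generation failures.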