Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models (2306.06891v1)
Abstract: Generating intermediate steps, or Chain of Thought (CoT), is an effective way to significantly improve language models' (LMs) multi-step reasoning capability. However, CoT lengths can grow rapidly with problem complexity, easily exceeding the maximum context size. Instead of increasing the context limit, which has already been heavily investigated, we explore an orthogonal direction: making LMs divide a problem into multiple contexts. We propose a new inference framework, called Recursion of Thought (RoT), which introduces several special tokens that the models can output to trigger context-related operations. Extensive experiments with multiple architectures including GPT-3 show that RoT dramatically improves LMs' inference capability, enabling them to solve problems whose solutions consist of hundreds of thousands of tokens.
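The abstract describes the mechanism only at a high level: the model emits special tokens, and the inference framework reacts by opening and closing contexts. Below is a minimal illustrative sketch of how such a divide-and-conquer inference loop could work. The marker names (`GO`, `THINK`, `STOP`), the `solve` helper, and the `generate_one_token` callback are assumptions made for illustration, not the paper's exact interface.

```python
# Minimal sketch of a Recursion-of-Thought style inference loop (assumed
# interface, not the paper's exact implementation). Assumed markers:
#   GO    -- opens a (sub-)problem
#   THINK -- asks the framework to solve the most recent sub-problem
#            in a brand-new context
#   STOP  -- ends the current context
# `generate_one_token(context)` stands in for any LM call that returns the
# next token given the current context; it is hypothetical.

from typing import Callable, List

Token = str
GenerateFn = Callable[[List[Token]], Token]

def solve(problem: List[Token], generate_one_token: GenerateFn,
          max_tokens: int = 10_000) -> List[Token]:
    """Solve `problem` in its own context, recursing whenever THINK appears."""
    context: List[Token] = list(problem)
    for _ in range(max_tokens):
        token = generate_one_token(context)
        if token == "STOP":
            # Everything generated after the original problem is the answer.
            return context[len(problem):]
        if token == "THINK":
            # Treat the tokens after the most recent GO as a sub-problem
            # (we assume the model always writes GO before THINK), solve it
            # in a fresh context, and splice the answer back in its place.
            start = len(context) - 1 - context[::-1].index("GO")
            sub_problem = context[start + 1:]
            sub_answer = solve(sub_problem, generate_one_token, max_tokens)
            context = context[:start] + sub_answer
        else:
            context.append(token)
    raise RuntimeError("token budget exceeded")
```

The point of this structure is that each sub-problem is solved in its own freshly created context, so no single context ever has to hold the full reasoning trace, even when the complete solution spans hundreds of thousands of tokens.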
Authors: Soochan Lee, Gunhee Kim