
Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2305.10601v2)

Published 17 May 2023 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for LLM inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting LLMs, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances LLMs' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-LLM.

Enhancing Generative AI Problem-Solving with Tree of Thoughts (ToT)

Introduction

LLMs have advanced well beyond simple text generation and now tackle problem solving across many domains. Their generative process, however, is rooted in token-level, left-to-right decision making, which limits performance on tasks that demand strategic reasoning, exploration, or lookahead. To address these limitations, the paper introduces the "Tree of Thoughts" (ToT) framework, which generalizes the "Chain of Thought" (CoT) prompting approach and enables more deliberate decision making by exploring and evaluating multiple reasoning paths.

Background on LLM Problem Solving

Existing LLM problem-solving methods primarily rely on Input-Output (IO) prompting, CoT prompting, and Self-Consistency with CoT (CoT-SC). While effective for many tasks, these methods are constrained by their linear, single-path nature, which limits their ability to handle problems requiring complex reasoning or search. The ToT framework expands this toolkit by enabling a more nuanced exploration of potential solutions through a structured search process.
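Of the baselines above, CoT-SC is the most involved: it samples several independent chains of thought and majority-votes their final answers. A minimal sketch, assuming a caller-supplied `sample_chain` function standing in for an LLM call with nonzero temperature (the name is illustrative, not from the paper's codebase):

```python
from collections import Counter
from typing import Callable

def cot_sc(sample_chain: Callable[[str], str], prompt: str, n_samples: int = 10) -> str:
    """Self-Consistency with CoT: sample n independent chains of thought
    and return the final answer that appears most often."""
    answers = [sample_chain(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

Note that even with voting, each sampled chain is still generated left to right with no lookahead or backtracking, which is the gap ToT targets.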

The Tree of Thoughts (ToT) Framework

The ToT framework represents a novel approach to LLM inference by structuring the reasoning process as a search over a tree of possible solutions, where each node—a "thought"—represents a coherent language sequence leading towards problem resolution. This structure allows the LLM to evaluate and choose from multiple paths, akin to human problem-solving processes that involve exploratory search and strategic planning. Key components of ToT include:

  • Thought Decomposition: Breaking down the problem-solving process into discrete steps that facilitate generation, evaluation, and selection.
  • Thought Generation and Evaluation: Mechanisms for proposing and assessing the viability of different thoughts or paths, leveraging the LLM's generative capabilities.
  • Search Algorithms: The application of search algorithms like BFS and DFS within the ToT framework, allowing systematic exploration and evaluation of the thought tree.
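The three components above compose into a single search loop. A minimal ToT-BFS sketch, assuming caller-supplied `propose` (thought generation) and `evaluate` (thought scoring) functions that would be backed by LLM calls in practice; the function signature and state encoding are illustrative assumptions, not the paper's exact API:

```python
from typing import Callable, List

def tot_bfs(
    root: str,
    propose: Callable[[str], List[str]],   # generate candidate next thoughts for a state
    evaluate: Callable[[str], float],      # score a partial solution (higher is better)
    steps: int,                            # depth of the thought tree
    breadth: int,                          # number of states kept per level
) -> str:
    """Breadth-first search over a tree of thoughts.

    At each step, expand every frontier state with `propose`, score all
    candidates with `evaluate`, and keep only the `breadth` best — pruning
    unpromising branches the way a single CoT rollout cannot.
    """
    frontier = [root]
    for _ in range(steps):
        candidates = [s + "\n" + t for s in frontier for t in propose(s)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:breadth]
    return max(frontier, key=evaluate)
```

DFS follows the same pattern with a stack instead of a frontier, backtracking when a state's evaluation falls below a threshold.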

Empirical Exploration

The authors validate the ToT framework on three novel tasks designed to stress current LLM problem-solving abilities: Game of 24, Creative Writing, and Mini Crosswords. Across all three, ToT significantly outperforms IO prompting and CoT; in Game of 24, for example, GPT-4 with CoT prompting solved only 4% of instances while ToT reached a 74% success rate, showcasing its potential on tasks that require non-trivial planning and search.
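Game of 24 asks for an arithmetic expression that combines four given numbers, each used exactly once, to reach 24 — which makes candidate outputs mechanically checkable. A hedged sketch of such a checker (the function names and the regex-based number extraction are illustrative assumptions, not the paper's evaluation code):

```python
import ast
import operator
import re
from typing import List

# Only the four basic arithmetic operators are allowed in a candidate.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node: ast.AST) -> float:
    """Safely evaluate an arithmetic AST restricted to +, -, *, / and numbers."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("disallowed expression")

def check_game24(expr: str, numbers: List[int]) -> bool:
    """A candidate solves Game of 24 iff it uses each given number exactly
    once and evaluates to 24 (within floating-point tolerance)."""
    if sorted(int(n) for n in re.findall(r"\d+", expr)) != sorted(numbers):
        return False
    try:
        return abs(_eval(ast.parse(expr, mode="eval")) - 24) < 1e-6
    except (ValueError, ZeroDivisionError, SyntaxError):
        return False
```

For example, `check_game24("(10 - 4) * (13 - 9)", [4, 9, 10, 13])` accepts, while an expression that reuses or omits a number is rejected before evaluation.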

Implications and Future Directions

The introduction of ToT opens new avenues for LLM research, emphasizing the importance of structured reasoning and strategic search in problem-solving. It highlights a path towards integrating traditional AI search methods with the generative capabilities of LLMs, offering a richer toolkit for tackling complex problems. Future work could extend the ToT framework in several directions, including optimizing search algorithms for efficiency, exploring dynamic thought generation strategies, and applying ToT in domains requiring external knowledge or real-time interaction.

Conclusion

ToT represents a significant step forward in the application of LLMs for problem-solving, offering a structured and systematic approach to explore multiple reasoning paths. By enabling deliberate decision-making and strategic planning, ToT broadens the scope of tasks that LLMs can effectively address, paving the way for more sophisticated AI-assisted problem-solving capabilities.

Authors (7)
  1. Shunyu Yao
  2. Dian Yu
  3. Jeffrey Zhao
  4. Izhak Shafran
  5. Thomas L. Griffiths
  6. Yuan Cao
  7. Karthik Narasimhan
Citations (1,229)