
Autonomous Tree-search Ability of Large Language Models

Published 14 Oct 2023 in cs.CL and cs.AI (arXiv:2310.10686v1)

Abstract: LLMs have demonstrated remarkable reasoning capabilities with advanced prompting techniques, but they fall short on tasks that require exploration, strategic foresight, and sequential decision-making. Recent works propose utilizing external programs to define search logic, so that LLMs can perform passive tree search to solve more challenging reasoning tasks. Though impressive results have been achieved, these approaches have several fundamental limitations. First, passive tree searches are inefficient, as they usually require multiple rounds of LLM API calls to solve a single problem. Moreover, passive search methods are inflexible, since they need task-specific program designs. A natural question then arises: can we maintain the tree-search capability of LLMs without the aid of external programs, while still generating responses that clearly demonstrate the process of a tree-structured search? To this end, we propose a new concept called the autonomous tree-search ability of LLMs, by which a model automatically generates a response containing the search trajectories leading to the correct answer. Concretely, we elicit search trajectories from capable LLM APIs via a fixed system prompt, allowing them to perform autonomous tree-search (ATS) right out of the box. Experiments on four puzzle games demonstrate that our method achieves large improvements. The ATS-BFS method outperforms the Chain-of-Thought approach by an average accuracy improvement of 33%. Compared to Tree of Thoughts, it requires 65.6% or 47.7% less GPT-API cost to attain a comparable level of accuracy. Moreover, we collected data using the ATS prompt method and fine-tuned LLaMA models. This approach yields a greater improvement than fine-tuning on CoT data: it outperforms CoT-tuned LLaMAs by an average of 40.6% and 38.5% for LLaMA2-7B and LLaMA2-13B, respectively.

References (28)
  1. Graph of thoughts: Solving elaborate problems with large language models. arXiv preprint arXiv:2308.09687, 2023.
  2. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  3. Frugalgpt: How to use large language models while reducing cost and improving performance. arXiv preprint arXiv:2305.05176, 2023.
  4. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.
  5. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.
  6. Payal Dhar. The carbon impact of artificial intelligence. Nat. Mach. Intell., 2(8):423–425, 2020.
  7. A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level. Proceedings of the National Academy of Sciences, 119(32):e2123433119, 2022.
  8. Measuring mathematical problem solving with the math dataset. arXiv preprint arXiv:2103.03874, 2021.
  9. Gpt is becoming a turing machine: Here are some ways to program it. arXiv preprint arXiv:2303.14310, 2023.
  10. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
  11. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35:22199–22213, 2022.
  12. Evaluating the logical reasoning ability of chatgpt and gpt-4. arXiv preprint arXiv:2304.03439, 2023.
  13. Jieyi Long. Large language model guided tree-of-thought. arXiv preprint arXiv:2305.08291, 2023.
  14. Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114, 2021.
  15. OpenAI. Gpt-4 technical report, 2023.
  16. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. https://lmsys.org/blog/2023-03-30-vicuna/, 2023.
  17. Explain yourself! leveraging language models for commonsense reasoning. arXiv preprint arXiv:1906.02361, 2019.
  18. Algorithm of thoughts: Enhancing exploration of ideas in large language models. arXiv preprint arXiv:2308.10379, 2023.
  19. Commonsenseqa: A question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:1811.00937, 2018.
  20. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023.
  21. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
  22. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022.
  23. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
  24. Sustainable ai: Environmental implications, challenges and opportunities. Proceedings of Machine Learning and Systems, 4:795–813, 2022.
  25. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601, 2023.
  26. Large language model as autonomous decision maker. arXiv preprint arXiv:2308.12519, 2023.
  27. Cumulative reasoning with large language models. arXiv preprint arXiv:2308.04371, 2023.
  28. Teaching algorithmic reasoning via in-context learning. arXiv preprint arXiv:2211.09066, 2022.

Summary

  • The paper introduces ATS, enhancing LLM reasoning by enabling autonomous tree-based exploration without external programs.
  • ATS-BFS outperforms Chain-of-Thought methods by achieving higher accuracy and cost efficiency in solving challenging puzzles.
  • LLMs fine-tuned with ATS data outperform CoT-tuned models, demonstrating the approach's potential for advanced decision-making.

Autonomous Tree-search Ability of LLMs

The paper "Autonomous Tree-search Ability of LLMs" introduces the concept of autonomous tree-search (ATS) in LLMs as a method to enhance their reasoning capabilities without the use of external programs. This approach aims to address the limitations of existing passive search techniques, providing a flexible and efficient means for solving complex tasks requiring exploration, strategic foresight, and sequential decision-making.

Introduction

LLMs have been widely recognized for their advanced reasoning abilities, particularly with techniques such as Chain-of-Thought (CoT) prompting [23]. However, CoT is limited when it comes to handling more complex reasoning tasks that require exploration and strategic decision-making. Passive search methods have instead relied on external program-driven tree searches, which suffer from inefficiency and a lack of flexibility due to task-specific program design. This paper explores whether LLMs can autonomously generate tree-structured search responses without external aid.

Methodology

The paper introduces autonomous tree-search (ATS) as a means to provide LLMs with the capability to perform tree-based exploration independently.

Framework

Autonomous tree-search enhances LLMs by allowing them to generate tree-search responses autonomously. ATS is implemented in two variations: ATS-BFS (breadth-first search style) and ATS-DFS (depth-first search style), as illustrated in Figure 1.

Figure 1: Chain of Thought vs. Tree of Thoughts vs. Autonomous Tree-search.

Figure 1 contrasts these LLM reasoning styles, highlighting how ATS allows for simultaneous exploration of multiple solution paths, unlike CoT.
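The exploration that ATS emulates inside a single response corresponds to an ordinary breadth-first search over candidate reasoning states. The sketch below is a generic illustration of that search pattern, not the paper's implementation; the toy `expand` and `is_goal` functions are placeholders for LLM-generated candidate thoughts and a success check:

```python
from collections import deque

def bfs_search(start, expand, is_goal):
    """Breadth-first search over reasoning states.

    expand(state) -> iterable of successor states;
    is_goal(state) -> True when the state solves the task.
    Returns the path from start to the first goal found, or None.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        state = path[-1]
        if is_goal(state):
            return path
        for nxt in expand(state):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy example: reach 10 from 1, where each step may add 1 or double.
path = bfs_search(1,
                  lambda s: [s + 1, s * 2] if s < 10 else [],
                  lambda s: s == 10)
print(path)  # → [1, 2, 4, 5, 10]
```

An ATS-BFS response plays the roles of `expand` (proposing candidate next steps) and `is_goal` (judging success) itself, writing out the resulting tree in one pass instead of waiting for an external controller to issue one API call per node.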

Large Models

For models such as GPT-4, ATS can be initiated using fixed system prompts that guide the model to generate search trajectories. These system messages introduce the concepts of ATS, guiding the model to role-play an explorer capable of tree-search reasoning. In few-shot tasks, task-specific examples demonstrate ATS in action, facilitating more complex exploration.
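A request of this shape can be sketched as follows. The system prompt wording here is an illustrative paraphrase of the ATS idea, not the paper's exact prompt, and `build_ats_request` is a hypothetical helper that only assembles the chat-style message list:

```python
# Illustrative paraphrase of an ATS-style system prompt (not the paper's
# exact wording): it asks the model to role-play a tree-searching explorer.
ATS_SYSTEM_PROMPT = (
    "You are an explorer solving a puzzle by tree search. "
    "At each step, list several candidate moves, briefly evaluate each, "
    "expand the promising ones, and backtrack from dead ends. "
    "Show the full search process in your answer, then state the solution."
)

def build_ats_request(task_description, few_shot_examples=()):
    """Assemble a chat-style message list for an ATS query.

    few_shot_examples: optional (question, ats_response) pairs that
    demonstrate the tree-search format on related tasks.
    """
    messages = [{"role": "system", "content": ATS_SYSTEM_PROMPT}]
    for question, response in few_shot_examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": response})
    messages.append({"role": "user", "content": task_description})
    return messages

request = build_ats_request("Measure exactly 4 liters using a 3L and a 5L jug.")
```

Because the system prompt is fixed across tasks, the same request builder serves zero-shot use directly, and few-shot use simply prepends worked examples in the ATS format.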

Small Models

Smaller models like LLaMA 2 are equipped with ATS capabilities via data collected from ATS-enhanced GPT-4 outputs. This data is used for fine-tuning, either preserving the tree structure (ATS-tuned) or converting it to chain data (CoT-tuned), as shown in Figure 2.

Figure 2: Overview of fine-tuning. The left part shows the ATS-tuned LLaMA pipeline, the right part the CoT-tuned LLaMA pipeline; the difference is whether the tree structure is pruned.
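The pruning step that separates the two pipelines can be sketched as collapsing a search tree to its single successful branch. The node format below is an assumed illustration (the paper's actual data format is not reproduced here):

```python
def prune_to_chain(node):
    """Collapse a search tree to the successful path (CoT-style data).

    Assumed node format (illustrative): {"thought": str, "solved": bool,
    "children": [...]}. Returns the thoughts along a path whose leaf has
    solved=True, or None if no branch succeeds.
    """
    if node["solved"] and not node["children"]:
        return [node["thought"]]
    for child in node["children"]:
        tail = prune_to_chain(child)
        if tail is not None:
            return [node["thought"]] + tail
    return None

tree = {
    "thought": "start", "solved": False, "children": [
        {"thought": "try A", "solved": False, "children": []},  # dead end
        {"thought": "try B", "solved": False, "children": [
            {"thought": "B then C: solved", "solved": True, "children": []},
        ]},
    ],
}
print(prune_to_chain(tree))  # → ['start', 'try B', 'B then C: solved']
```

ATS-tuning keeps the whole tree (including the "try A" dead end) as training text, while CoT-tuning trains only on the pruned chain, discarding the exploration signal.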

Evaluation and Results

The paper evaluates ATS using several challenging puzzles: Drop Water Puzzle, Number Path Puzzle, Arithmetic Puzzle, and Minimal Grass Puzzle. These puzzles are selected to test the exploratory and decision-making capabilities of LLMs under ATS.
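To make the kind of state space involved concrete, here is a solver sketch for the classic water-jug problem, which the Drop Water Puzzle resembles. The exact puzzle specification used in the paper is not reproduced here; this is the standard two-jug formulation:

```python
from collections import deque

def water_jug(cap_a, cap_b, target):
    """BFS over (a, b) jug states; moves: fill, empty, pour.

    Returns the shortest move list reaching `target` liters in either
    jug, or None if the target is unreachable.
    """
    start = (0, 0)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (a, b), moves = queue.popleft()
        if target in (a, b):
            return moves
        successors = {
            "fill A": (cap_a, b), "fill B": (a, cap_b),
            "empty A": (0, b), "empty B": (a, 0),
            "pour A->B": (a - min(a, cap_b - b), b + min(a, cap_b - b)),
            "pour B->A": (a + min(b, cap_a - a), b - min(b, cap_a - a)),
        }
        for move, state in successors.items():
            if state not in seen:
                seen.add(state)
                queue.append((state, moves + [move]))
    return None

moves = water_jug(3, 5, 4)
print(moves)
```

An LLM with autonomous tree-search ability must traverse exactly this sort of branching state space in its response text, including recognizing and abandoning dead-end states.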

Results with GPT-4

  • ATS-BFS consistently demonstrated higher accuracy compared to CoT and was more cost-effective than ToT in both zero-shot and few-shot settings.
  • ATS-DFS showed potential in specific cases but performed inconsistently across tasks, likely due to complex logic requirements.

Figure 3: Drop Water Puzzle.

Results with LLaMA 2

  • LLaMA models fine-tuned with ATS data outperformed CoT-tuned counterparts. ATS-BFS-sourced models exhibited remarkable performance improvements.
  • ATS-DFS data enhanced model accuracy in certain tasks but was less effective in others, particularly optimization problems.

Discussion

ATS provides LLMs with enhanced flexibility and capability without dependency on external search logic. Compared to traditional methods, ATS offers a substantial cost-performance advantage by efficiently executing tree-like reasoning within a single model query.
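A back-of-the-envelope call count makes the cost gap concrete. The model below is an illustrative assumption (one generation call plus one evaluation call per expanded node for a passive external search), not the paper's accounting:

```python
def passive_search_calls(branching, depth):
    """Approximate API calls for an external-program tree search that
    queries the LLM once per node expansion and once per node evaluation
    (illustrative cost model, not the paper's exact accounting)."""
    nodes = sum(branching ** level for level in range(1, depth + 1))
    return 2 * nodes  # one generate call + one evaluate call per node

ATS_CALLS = 1  # the whole search trajectory arrives in a single response

print(passive_search_calls(3, 3), "vs", ATS_CALLS)  # → 78 vs 1
```

Even for a shallow tree (branching 3, depth 3), a passive search issues dozens of calls where ATS issues one, which is consistent with the reported 65.6% and 47.7% cost reductions relative to Tree of Thoughts.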

Conclusion

The exploration of autonomous tree-search ability in LLMs shows significant improvements in handling tasks requiring extensive reasoning and decision-making. This method not only advances the state-of-the-art in LLM capabilities but also establishes a foundation for developing more autonomous AI systems with minimal external dependencies. Future work should investigate the impact of ATS on the broader capabilities of LLMs, especially larger models beyond 70B parameters, and explore any potential trade-offs in implementing ATS at scale.
