ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search (2310.13227v1)

Published 20 Oct 2023 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs have demonstrated powerful decision-making and planning capabilities in solving complicated real-world problems. LLM-based autonomous agents can interact with diverse tools (e.g., functional APIs) and generate solution plans that execute a series of API function calls in a step-by-step manner. The multitude of candidate API function calls significantly expands the action space, amplifying the critical need for efficient action space navigation. However, existing methods either struggle with unidirectional exploration in expansive action spaces, trapped into a locally optimal solution, or suffer from exhaustively traversing all potential actions, causing inefficient navigation. To address these issues, we propose ToolChain*, an efficient tree search-based planning algorithm for LLM-based agents. It formulates the entire action space as a decision tree, where each node represents a possible API function call involved in a solution plan. By incorporating the A* search algorithm with task-specific cost function design, it efficiently prunes high-cost branches that may involve incorrect actions, identifying the most low-cost valid path as the solution. Extensive experiments on multiple tool-use and reasoning tasks demonstrate that ToolChain* efficiently balances exploration and exploitation within an expansive action space. It outperforms state-of-the-art baselines on planning and reasoning tasks by 3.1% and 3.5% on average while requiring 7.35x and 2.31x less time, respectively.

Efficient Action Space Navigation in LLMs: An Examination of ToolChain*

This essay discusses the paper "ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search," which applies an A* tree search-based planning algorithm to improve the efficiency of LLM-based agents. LLMs such as GPT and PaLM have extended their capabilities beyond text generation to decision-making and problem-solving across diverse domains, but navigating the extensive action spaces these tasks entail poses significant challenges. ToolChain* addresses these challenges with a solution grounded in the principles of A* search.

Key Concepts and Contributions

The primary focus of ToolChain* is to optimize action space navigation when LLMs operate within autonomous agent frameworks, particularly when these agents are tasked with using tools such as API functions. Traditional approaches in this domain often fall into one of two traps: becoming stuck in locally optimal solutions due to unidirectional exploration strategies, or incurring inefficiency through exhaustive exploration of all possible actions.

ToolChain* addresses these issues by representing the entire action space as a decision tree and using A* search principles to explore it. Each node in the tree represents a potential API call step, and the A* algorithm prunes branches unlikely to lead to a solution, focusing computational resources on the most promising paths. Crucially, ToolChain* uses a task-specific cost function to guide this exploration, penalizing high-cost actions that represent suboptimal or incorrect paths.
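
To make the search loop concrete, below is a minimal, self-contained sketch of best-first expansion in the A* style over a tree of candidate API calls. It is an illustration under simplifying assumptions rather than the paper's implementation: the expand, heuristic, and is_goal functions are hypothetical stand-ins for the LLM-proposed candidate actions, the task-specific future-cost estimate, and the success check that ToolChain* would supply.

```python
import heapq
import itertools

# Toy stand-ins for the LLM-driven components described above. In ToolChain*,
# candidate next API calls, their costs, and the future-cost estimate would
# come from the LLM and the paper's task-specific cost functions; the names
# and values here are purely illustrative.

def expand(plan):
    """Propose candidate next API calls for a partial plan, with step costs."""
    if len(plan) >= 3:
        return []
    return [(f"api_call_{len(plan)}_{i}", 1.0 + 0.5 * i) for i in range(3)]

def heuristic(plan):
    """Estimate the future cost of completing a partial plan (h in A*)."""
    return float(3 - len(plan))

def is_goal(plan):
    """Check whether the partial plan is a complete, valid solution."""
    return len(plan) == 3

def a_star_plan():
    tie = itertools.count()  # breaks ties between nodes with equal priority
    # Frontier entries: (f = g + h, tie-breaker, g = cumulative cost, plan).
    frontier = [(heuristic(()), next(tie), 0.0, ())]
    while frontier:
        _, _, g, plan = heapq.heappop(frontier)  # cheapest estimated node first
        if is_goal(plan):
            return plan
        for action, step_cost in expand(plan):
            child = plan + (action,)
            child_g = g + step_cost
            heapq.heappush(frontier,
                           (child_g + heuristic(child), next(tie), child_g, child))
    return None

print(a_star_plan())  # e.g. ('api_call_0_0', 'api_call_1_0', 'api_call_2_0')
```

Because the frontier is ordered by the combined cumulative and estimated future cost, branches that accumulate high cost are simply never expanded, which is the pruning behavior described above.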

Numerical Results and Performance

ToolChain* underwent extensive testing on a suite of tool-use environments within ToolBench, as well as on mathematical reasoning challenges from the GSM8K dataset. Empirically, ToolChain* demonstrated improved planning and reasoning success rates compared to current state-of-the-art methods, with average performance gains of 3.1% on planning tasks and 3.5% on reasoning tasks, while using fewer computational resources: 7.35x and 2.31x less time, respectively.

The significant reduction in computational overhead is attributed to the search and cost-evaluation strategies at the core of ToolChain*. By estimating future costs efficiently and narrowing down candidate action steps without exhaustively unfolding all possibilities (as in traditional Monte Carlo Tree Search approaches), ToolChain* provides a balanced framework for action selection in expansive action spaces.
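
In the standard A* formulation that ToolChain* builds on, each frontier node n is scored by the sum of the cost accumulated along the partial plan and an estimate of the cost remaining; the paper's contribution lies in the task-specific design of these two terms rather than in the scoring rule itself, which takes the familiar form:

```latex
f(n) = g(n) + h(n)
```

Here g(n) is the cumulative cost of the partial plan from the root to node n, h(n) is the estimated future cost of completing it, and nodes with the smallest f(n) are expanded first.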

Practical and Theoretical Implications

The paper's contributions significantly enhance the practicality of deploying LLMs as autonomous agents in real-world applications that require interaction with external tools. The ability to navigate vast action spaces efficiently and make decisions without succumbing to local optima is crucial for scaling such technologies. Theoretically, the integration of A* search with LLMs shows how AI research can draw on ideas across disciplines, merging insights from heuristic search algorithms with modern LLMs to address complex tasks effectively.

Future Directions in AI

The development of ToolChain* suggests several avenues for future research. The algorithm's reliance on task-specific heuristics invites the exploration of automated or adaptive heuristic generation methods, potentially increasing the algorithm's flexibility and applicability. Additionally, extending these strategies to multi-agent systems can offer insights into collaborative and competitive settings where multiple LLMs are engaged in concurrent or opposing objectives.

Lastly, further work could aim to generalize this framework beyond tool-interaction tasks. The core principle of efficiently navigating large decision spaces has potential applicability across several domains, from robotics and autonomous systems to strategic games.

In conclusion, ToolChain* presents a sophisticated approach to enhancing the utility and efficiency of LLMs in complex action-driven environments. By combining tree search algorithms with modern LLMs, the research provides both empirical benefits and theoretical advancements indicative of future trends in AI research and application.

Authors (8)
  1. Yuchen Zhuang
  2. Xiang Chen
  3. Tong Yu
  4. Saayan Mitra
  5. Victor Bursztyn
  6. Ryan A. Rossi
  7. Somdeb Sarkhel
  8. Chao Zhang