SwissNYF: Tool Grounded LLM Agents for Black Box Setting (2402.10051v1)

Published 15 Feb 2024 in cs.AI and cs.CL

Abstract: While LLMs have demonstrated enhanced function-calling capabilities, these advances largely depend on observing the functions' responses. This methodology is practical for simple APIs but scales poorly to irreversible APIs whose calls significantly impact the system, such as a database deletion API. Processes that require extensive time per API call, or that demand forward planning (as in automated action pipelines), pose similar challenges. Moreover, scenarios often arise where a generalized approach is needed because the planner lacks access to the functions' implementations, or to the credentials required to invoke them. Traditional tool-planning methods are inadequate in these cases, compelling operation in black-box environments. Although direct tool manipulation is hampered in this setting, LLMs excel at black-box tasks such as program synthesis. We therefore harness the program synthesis capabilities of LLMs to plan tool usage in black-box settings, verifying solutions before they are executed. We introduce TOPGUN, an approach that leverages program synthesis for black-box tool planning, together with SwissNYF, a comprehensive suite that integrates black-box algorithms for planning and verification, addressing these challenges and enhancing the versatility and effectiveness of LLMs in complex API interactions. The public code for SwissNYF is available at https://github.com/iclr-dummy-user/SwissNYF.
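
The core idea, synthesizing a program over tool signatures alone and verifying it before anything irreversible runs, can be illustrated with a minimal sketch. This is not the paper's implementation: `ToolSig`, `TOOLS`, and `verify_plan` are hypothetical names, and the check shown (known tool names plus argument count, via Python's `ast` module) is just one plausible verification pass over a candidate plan produced by an LLM.

```python
import ast
from dataclasses import dataclass

@dataclass
class ToolSig:
    """Signature-only view of a black-box tool; no implementation is available."""
    name: str
    arity: int

# Hypothetical registry: tools are known only by name and argument count.
TOOLS = {t.name: t for t in [ToolSig("search_flights", 2), ToolSig("book_flight", 1)]}

def verify_plan(plan_src: str) -> list[str]:
    """Statically check a synthesized plan before any real call is made.

    Returns a list of violations; an empty list means the plan only invokes
    known tools with the expected number of arguments.
    """
    errors = []
    for node in ast.walk(ast.parse(plan_src)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            sig = TOOLS.get(node.func.id)
            if sig is None:
                errors.append(f"unknown tool: {node.func.id}")
            elif len(node.args) + len(node.keywords) != sig.arity:
                errors.append(f"wrong argument count for {node.func.id}")
    return errors

# A candidate plan, e.g. synthesized by an LLM from the signatures alone.
plan = "itinerary = search_flights('SFO', 'JFK')\nbook_flight(itinerary)"
print(verify_plan(plan) or "plan verified; safe to hand off for execution")
```

Because the tools are never called during verification, this kind of check is safe even for irreversible or slow APIs; only a plan that passes is handed to the executor.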

Authors (4)
  1. Somnath Sendhil Kumar (2 papers)
  2. Dhruv Jain (10 papers)
  3. Eshaan Agarwal (2 papers)
  4. Raunak Pandey (1 paper)