Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models (2402.16696v3)
Abstract: Tool-augmented LLMs have attracted widespread attention for their ability to access up-to-date knowledge and alleviate hallucination. Advanced closed-source LLMs (e.g., ChatGPT) have demonstrated surprising tool-usage capabilities through prompting and in-context learning. To equip open-source LLMs (e.g., LLaMA) with tool-manipulation abilities, current efforts focus on either template-driven or token-triggered tool usage. However, the former constrains tool interactions and thus hampers the flexibility of LLMs in addressing diverse user queries, while the latter limits generalizability to new tools, since tool-usage learning relies on task- and tool-specific datasets. To alleviate these concerns, in this paper we propose DEER, a decision-aware and generalizable tool-usage framework. Specifically, we first construct tool-usage samples with multiple decision branches via an automatic generation pipeline, thereby fostering the decision-making awareness of LLMs across diverse scenarios. Meanwhile, we propose a novel tool-sampling strategy to enhance the generalizability of LLMs to unseen tools. Extensive experiments demonstrate that DEER is effective and significantly outperforms baselines across various datasets.
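The decision-making awareness described above amounts to distinguishing several branches before acting: answer directly from parametric knowledge, invoke a matching tool, or acknowledge that no suitable tool exists. A minimal sketch of these branches is shown below; the function names, keyword heuristic, and `Tool` structure are illustrative assumptions, not DEER's actual method.

```python
# Hypothetical illustration of the decision branches a tool-augmented LLM
# must distinguish. The keyword matching stands in for the model's learned
# decision; it is NOT the paper's pipeline.
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str
    keywords: tuple  # illustrative trigger terms

def decide_branch(query: str, tools: list) -> tuple:
    """Return (branch, tool_name), where branch is one of:
    'direct' - answer from parametric knowledge,
    'tool'   - a registered tool matches the query,
    'refuse' - external information is needed but no tool fits."""
    q = query.lower()
    # Crude stand-in for "does this query need up-to-date information?"
    needs_tool = any(t in q for t in ("current", "today", "latest"))
    if not needs_tool:
        return ("direct", None)
    for tool in tools:
        if any(k in q for k in tool.keywords):
            return ("tool", tool.name)
    return ("refuse", None)

tools = [Tool("weather_api", "Look up current weather", ("weather", "temperature"))]
print(decide_branch("What is the capital of France?", tools))         # ('direct', None)
print(decide_branch("What is the current weather in Paris?", tools))  # ('tool', 'weather_api')
print(decide_branch("What is the latest price of ACME stock?", tools))  # ('refuse', None)
```

Training samples that cover all three branches, rather than only successful tool calls, are what the paper's automatic generation pipeline is designed to produce.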
- Anchun Gui
- Jian Li
- Yong Dai
- Nan Du
- Han Xiao