ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis (2410.18447v2)
Abstract: Supervised fine-tuning (SFT) is a common method to enhance the tool-calling capabilities of LLMs, with the training data often being synthesized. The current data synthesis process generally involves sampling a set of tools, formulating a requirement based on these tools, and generating the corresponding call statements. However, randomly sampled tools often lack relevance to one another, making them difficult to combine and thus reducing the diversity of the data. Additionally, current work overlooks the coherence between dialogue turns, leading to a gap between the synthesized data and real-world scenarios. To address these issues, we propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues. We integrate these two strategies and enable multiple agents to synthesize the dialogue data interactively, resulting in ToolFlow, our tool-calling data synthesis pipeline. Data quality assessments demonstrate improvements in the naturalness and coherence of our synthesized dialogues. Finally, we apply SFT on LLaMA-3.1-8B using 8,000 synthetic dialogues generated with ToolFlow. Results show that the model achieves tool-calling performance comparable to or even surpassing that of GPT-4, while maintaining strong general capabilities.
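To make the Graph-based Sampling idea concrete, the sketch below shows one plausible way to sample relevant tool combinations: embed tool descriptions, connect tools whose embeddings are similar, and random-walk over the resulting graph instead of sampling tools uniformly at random. This is an illustrative assumption, not the paper's exact implementation; the embedding model (`all-MiniLM-L6-v2`), similarity threshold, and walk procedure are all placeholders.

```python
from itertools import combinations
import random

from sentence_transformers import SentenceTransformer


def build_tool_graph(tool_descriptions, threshold=0.5):
    """Connect tools whose description embeddings have cosine similarity >= threshold."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(tool_descriptions, normalize_embeddings=True)
    sim = emb @ emb.T  # cosine similarity, since embeddings are L2-normalized
    graph = {i: set() for i in range(len(tool_descriptions))}
    for i, j in combinations(range(len(tool_descriptions)), 2):
        if sim[i, j] >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def sample_tool_combination(graph, k=3, seed=None):
    """Random-walk over the tool graph to pick k mutually relevant tools."""
    rng = random.Random(seed)
    current = rng.choice(list(graph))
    sampled = {current}
    while len(sampled) < k:
        neighbors = [n for n in graph[current] if n not in sampled]
        if not neighbors:
            # Dead end: try to continue from any neighbor of the sampled set.
            candidates = [n for s in sampled for n in graph[s] if n not in sampled]
            if not candidates:
                break  # connected component exhausted
            current = rng.choice(candidates)
        else:
            current = rng.choice(neighbors)
        sampled.add(current)
    return sorted(sampled)
```

Sampled tool subsets produced this way are more likely to be combinable in a single user requirement than uniformly random subsets, which is the property the abstract attributes to Graph-based Sampling.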