APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets (2406.18518v1)

Published 26 Jun 2024 in cs.CL, cs.AI, cs.LG, and cs.SE

Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each data in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agent domains. The dataset is available on Huggingface: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage: https://apigen-pipeline.github.io/


Summary

  • The paper introduces the APIGen pipeline that automates the creation of high-quality, diverse function-calling datasets.
  • The methodology employs a multi-stage verification process—format, execution, and semantic checks—to ensure data reliability.
  • Benchmarked models trained with APIGen data demonstrate superior performance, with even small models outperforming established baselines.

An Overview of "APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets"

The paper "APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets" addresses a key challenge in the development and fine-tuning of function-calling agent models: the lack of high-quality, diverse datasets. The authors present APIGen, a comprehensive and automated data generation pipeline designed to create verifiable datasets for function-calling applications. This pipeline ensures that each generated data point goes through a rigorous multi-stage verification process, including format checking, actual function executions, and semantic verification. This process ensures the reliability and correctness of the generated data.

Key Contributions

The paper makes several notable contributions to the field of AI and function-calling models:

  1. APIGen Pipeline: The introduction of an automated pipeline that ensures the generation of high-quality, diverse datasets. APIGen is designed to be scalable and adaptable to various models and APIs.
  2. Verification Process: A multi-stage data verification system that rigorously checks the generated data for format correctness, successful execution, and semantic alignment with the query.
  3. Benchmarked Performance: Demonstrating that models trained with datasets generated by APIGen, even those as small as 7B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark (BFCL). In addition, their 1B model surpasses notable models such as GPT-3.5-Turbo and Claude-3 Haiku.
  4. Release of Dataset: The authors release a dataset of 60,000 high-quality entries, available on Huggingface, aiming to facilitate further research in the function-calling domain.

APIGen Framework

The APIGen framework operates by utilizing sampled APIs and seed query-answer pairs to generate diverse datasets. The process is structured as follows:

  • Data Generation: APIs and QA pairs are sampled, and through various prompt templates, the generator LLM produces relevant function-calling data.
  • Verification Stages (a minimal sketch follows this list):
    1. Format Checking: Ensures that the generated data adheres to a specified JSON format. This step filters out poorly formatted or incomplete data points.
    2. Execution Checking: Validates that the function calls can be executed correctly, filtering out cases with errors or infeasible arguments.
    3. Semantic Checking: Uses another LLM to check whether the function calls and their results align with the query's objective, ensuring that the data points are meaningful and relevant.
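
To make the hierarchical verification concrete, here is a minimal sketch of how the three stages could be chained. The function names, the expected JSON fields, and the generic LLM-judge call are assumptions for illustration, not the authors' implementation.

```python
import json

def format_check(raw_output: str):
    """Stage 1: parse the generator LLM's output and confirm the expected fields exist."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None  # malformed JSON is filtered out
    if "query" not in data or "answers" not in data:
        return None  # incomplete data points are filtered out
    return data

def execution_check(data, api_registry):
    """Stage 2: actually execute each proposed call against the collected APIs."""
    results = []
    for call in data["answers"]:
        fn = api_registry.get(call["name"])
        if fn is None:
            return None  # function name not among the collected APIs
        try:
            results.append(fn(**call["arguments"]))
        except Exception:
            return None  # infeasible arguments or runtime error
    return results

def semantic_check(data, results, judge_llm):
    """Stage 3: ask a checker LLM whether the calls and results satisfy the query."""
    prompt = (
        f"Query: {data['query']}\n"
        f"Function calls: {json.dumps(data['answers'])}\n"
        f"Execution results: {results}\n"
        "Do the calls and their results fulfil the query? Answer 'pass' or 'fail'."
    )
    return judge_llm(prompt).strip().lower().startswith("pass")

def verify(raw_output: str, api_registry, judge_llm):
    """Run the three hierarchical stages; keep the data point only if all pass."""
    data = format_check(raw_output)
    if data is None:
        return None
    results = execution_check(data, api_registry)
    if results is None:
        return None
    return data if semantic_check(data, results, judge_llm) else None
```

Only data points surviving all three filters enter the released dataset, which is what allows unreliable generator outputs to be turned into a trustworthy training corpus.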

Dataset and Experimentation

The dataset preparation process involves collecting executable APIs and ensuring they are well-documented and accessible. The final dataset includes 3,673 APIs across 21 categories. This dataset is used to train function-calling models, and the results are benchmarked against other leading models.

In the experiments, two models are trained using the datasets generated by APIGen: a 1.3B parameter model (xLAM-1B) and a 7B parameter model (xLAM-7B). The models are evaluated using the Berkeley Function-Calling Benchmark (BFCL):

  • xLAM-7B achieves 6th place on the leaderboard, outperforming models like GPT-4o and Gemini-1.5-Pro.
  • xLAM-1B also performs impressively, ranking 24th and surpassing models such as GPT-3.5-Turbo and Claude-3 Haiku.

An ablation study further reinforces the importance of APIGen's verification process: models trained with unverified data show a significant drop in performance, highlighting the need for strict data quality control.

Implications and Future Directions

The APIGen pipeline underscores the importance of dataset quality in training function-calling models. By ensuring rigorous verification, APIGen allows smaller models to compete with much larger ones in terms of performance. This has practical implications for the deployment of function-calling agents in real-world applications, where high efficacy and reliability are paramount.

The APIGen framework not only advances the theoretical understanding of function-calling datasets but also provides a practical tool for researchers. Releasing the dataset fosters further research and development, potentially extending the utility of function-calling models in various domains such as social media, financial services, and more.

Future work could expand APIGen’s capabilities to encompass more diverse types of APIs and support for multi-turn interactions. This would make the framework even more robust and versatile, enabling it to handle a wider array of real-world scenarios.

In conclusion, the paper presents a detailed and well-structured approach to solving a significant challenge in the field of AI. APIGen’s ability to generate high-quality, verifiable datasets marks a significant step forward in the development of efficient and reliable function-calling agent models.
