APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay (2504.03601v3)

Published 4 Apr 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source 5K synthetic data trajectories and the trained xLAM-2-fc-r models to advance research in AI agents. Models at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4; Dataset at https://huggingface.co/datasets/Salesforce/APIGen-MT-5k and Website at https://apigen-mt.github.io

Summary

  • The paper introduces APIGen-MT, a two-phase agentic pipeline that generates diverse, verifiable multi-turn interaction data by simulating human-agent interplay using LLMs and validated task blueprints.
  • Models trained with APIGen-MT data significantly outperform other models on benchmarks like BFCL v3 and $\tau$-bench in multi-turn interactions, showing enhanced accuracy and consistency.
  • APIGen-MT's open-source approach provides datasets and models to enable further research and practical development of more capable AI agents for complex, domain-specific tasks.

Overview of APIGen-MT for Multi-Turn Interaction Data Generation

APIGen-MT is a two-phase framework that addresses the scarcity of high-quality data needed to train AI agents for effective multi-turn interactions. Collecting such data manually, especially data that captures realistic human-agent dynamics, is logistically difficult and expensive. The framework instead generates diverse, verifiable multi-turn data through simulated agent-human interplay, targeting both the diversity of the data and the correctness of the interactions it contains.

Framework Description

APIGen-MT comprises two main phases: task configuration generation and interaction trajectory collection. In the first phase, the framework generates detailed task blueprints, each specifying a user intent, a sequence of ground-truth actions, and the expected final outputs. Blueprints are produced through LLM-based generation, multi-stage validation, and iterative refinement driven by feedback from a committee of LLM reviewers. Incorporating environmental constraints, domain-specific data, user personas, and validated execution pathways ensures realistic task modeling.
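
To make the phase-1 loop concrete, here is a minimal Python sketch under our own naming (TaskBlueprint, propose_blueprint, and committee_votes are illustrative stand-ins, not the paper's API): an LLM drafts a blueprint, a committee of LLM reviewers votes on it, and rejected drafts are revised using the committee's feedback.

```python
# Hypothetical sketch of the phase-1 blueprint loop (our naming, not the
# paper's): propose -> committee review -> revise with feedback, until a
# blueprint passes majority approval or the attempt budget is exhausted.
from dataclasses import dataclass, field

@dataclass
class TaskBlueprint:
    user_intent: str
    ground_truth_actions: list[str]   # ordered API calls the agent must execute
    expected_outputs: list[str]       # final state / answers used for verification
    feedback: list[str] = field(default_factory=list)

def propose_blueprint(domain: str, feedback: list[str]) -> TaskBlueprint:
    """Stand-in for an LLM call that drafts a blueprint, conditioned on
    domain policies, sampled personas, and any prior reviewer feedback."""
    return TaskBlueprint(
        user_intent=f"Return an item and check refund status in {domain}",
        ground_truth_actions=["get_order(order_id)", "return_item(order_id, item_id)"],
        expected_outputs=["refund_initiated"],
        feedback=list(feedback),
    )

def committee_votes(bp: TaskBlueprint, n_reviewers: int = 3) -> list[tuple[bool, str]]:
    """Stand-in for the LLM reviewer committee; each reviewer checks the
    blueprint for executability, policy compliance, and coherence."""
    ok = bool(bp.ground_truth_actions) and bool(bp.expected_outputs)
    return [(ok, "" if ok else "actions/outputs missing")] * n_reviewers

def generate_validated_blueprint(domain: str, max_rounds: int = 5) -> TaskBlueprint | None:
    feedback: list[str] = []
    for _ in range(max_rounds):
        bp = propose_blueprint(domain, feedback)
        votes = committee_votes(bp)
        if sum(ok for ok, _ in votes) > len(votes) // 2:  # majority approval
            return bp
        feedback = [msg for ok, msg in votes if not ok]   # revise with critiques
    return None  # discard tasks that never pass review

if __name__ == "__main__":
    print(generate_validated_blueprint("retail"))
```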

The second phase collects interaction trajectories by simulating conversations between the AI agent and an LLM-simulated human user. These trajectories capture dialogue turns, agent actions, and environment responses, all guided by the pre-validated task blueprint. A trajectory is retained only if it achieves the blueprint's predefined goals, yielding a database of high-fidelity interaction data.
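
The interplay itself can be pictured as a turn loop. The sketch below is a hypothetical rendering (simulate_trajectory, env.execute, and env.goal_reached are our placeholders, not the paper's interfaces): the simulated human and the agent alternate turns against an executable environment, and a trajectory is kept only when its executed actions match the blueprint's ground truth.

```python
# Hypothetical sketch of phase-2 trajectory collection (our naming): a
# simulated human LLM and the agent exchange turns, the environment grounds
# each tool call, and only blueprint-consistent trajectories are retained.
def simulate_trajectory(blueprint, agent_llm, human_llm, env, max_turns=20):
    trajectory, executed_actions = [], []
    # The simulated human opens with (part of) the blueprint's intent.
    user_msg = human_llm(blueprint.user_intent, history=[])
    for _ in range(max_turns):
        trajectory.append(("user", user_msg))
        reply, tool_calls = agent_llm(trajectory)      # agent answers or calls tools
        for call in tool_calls:
            result = env.execute(call)                 # environment grounds each action
            executed_actions.append(call)
            trajectory.append(("tool", result))
        trajectory.append(("agent", reply))
        if env.goal_reached(blueprint.expected_outputs):
            break
        user_msg = human_llm(blueprint.user_intent, history=trajectory)
    # Keep only trajectories whose actions match the pre-validated blueprint.
    if executed_actions == blueprint.ground_truth_actions:
        return trajectory
    return None
```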

Experimental Results

The framework has proven effective at generating training data that enhances agent capabilities. Evaluated on the BFCL v3 and $\tau$-bench benchmarks, models trained on APIGen-MT data, particularly the xLAM-2-fc-r series, substantially outperform a range of open-source and proprietary models in multi-turn settings, a scenario where models often struggle to stay consistent across turns. For instance, the xLAM-2-70b-fc-r model leads BFCL v3 with an overall accuracy of 78.19%, and on $\tau$-bench it is competitive with leading proprietary models such as the Claude and GPT series.

Furthermore, the trained models exhibit enhanced consistency and reliability in multi-turn interactions, indicated by stable pass^k curves across repeated trials. Such reliability is crucial for deploying AI models in real-world applications where the precision of interactions is paramount.
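
Here pass^k denotes the probability that an agent solves a task in all k independent trials, so flat curves as k grows indicate consistency rather than occasional luck. Given n trials with c successes, it is commonly estimated, as in $\tau$-bench, with the unbiased estimator C(c, k) / C(n, k). A minimal sketch:

```python
# Estimate pass^k (success in ALL of k i.i.d. trials) from n trials with c
# successes, via the unbiased estimator C(c, k) / C(n, k).
from math import comb

def pass_hat_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass^k from n trials with c successes (k <= n)."""
    if c < k:
        return 0.0
    return comb(c, k) / comb(n, k)

# e.g. 7 successes over 8 trials: pass^1 = 0.875 but pass^4 = 0.5, so a
# model whose pass^k curve stays high across k is the more reliable one.
print(pass_hat_k(8, 7, 1), pass_hat_k(8, 7, 4))
```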

Implications and Future Directions

APIGen-MT's open-source approach aims to stimulate further research by providing both the synthetic datasets and the trained models. The implications are varied: at a practical level, the generated data facilitates the development of more capable AI agents that can serve complex, domain-specific tasks efficiently. Theoretically, APIGen-MT presents a framework that other researchers can build upon, potentially enhancing the agentic pipeline and extending its application beyond customer service to fields such as healthcare and finance.

Despite its notable effectiveness, the framework offers avenues for future exploration, particularly in refining user simulations and expanding the framework's adaptability to broader domains and tasks. Continuous refinement of validation processes and exploration of reinforcement learning techniques could further develop this pipeline into a comprehensive tool for more dynamic AI model training.

In conclusion, APIGen-MT delivers a structured, methodical approach to addressing data scarcity in agent training, enhancing both the robustness and capabilities of AI agents in multi-turn interactions, and providing a solid foundation for continued research and improvement.