
ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach (2503.17460v2)

Published 21 Mar 2025 in cs.CL

Abstract: In this paper, we present ConvoGen: an innovative framework for generating synthetic conversational data using multi-agent systems. Our method leverages few-shot learning and introduces iterative sampling from a dynamically updated few-shot hub to create diverse and realistic conversational scenarios. The generated data has numerous applications, including training and evaluating conversational AI models, and augmenting existing datasets for tasks like conversational intent classification or conversation summarization. Our experiments demonstrate the effectiveness of this method in producing high-quality diverse synthetic conversational data, highlighting its potential to enhance the development and evaluation of conversational AI systems.

The paper "ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach" (Gody et al., 21 Mar 2025 ) introduces ConvoGen, a multi-agent framework for generating synthetic conversational data, particularly focusing on multi-party scenarios. The core problem addressed is the challenge of obtaining diverse, realistic, and scalable conversational datasets for training and evaluating conversational AI models. Traditional methods like web crawling and crowdsourcing face issues with noise, toxicity, cost, time, and scalability, especially for multi-party interactions.

ConvoGen leverages a multi-agent system built on the AutoGen framework to simulate conversations between multiple AI agents. The process involves two main steps:

  1. Experience Generation: An "experience generator," powered by an LLM such as GPT-4o, creates a detailed scenario for the conversation. This includes defining a group of related personas (each with a name, qualities, lifestyle, speech style, and memory), their relationships, a situation that brings them together, a specific topic to discuss, and an initial conversation starter. The generator uses few-shot learning, guided by example experiences. An innovative aspect is "iterative sampling," in which the LLM's few-shot examples are dynamically updated with experiences generated in previous turns, aiming to increase diversity relative to a fixed set of examples (see the first sketch after this list). The paper explores two methods for experience generation: one where the LLM generates personas from scratch, and another where it samples predefined personas from a separate hub.
  2. Group Chat Instantiation: The generated experience is then used to set up a multi-agent conversation in AutoGen (see the second sketch after this list). Each persona from the experience configures a corresponding agent, whose setup includes a system message defining the persona, a name, a description, and additional behavioral guidelines (e.g., limiting response length to avoid overly chatty turns). A "user proxy" initiates the conversation by sending the situation, relationships, topic, and conversation starter to the AutoGen group chat manager. The manager orchestrates the conversation, selecting the next speaker (via either round-robin or an LLM-based mechanism) and broadcasting messages until a maximum number of turns is reached.
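
To make step 1 concrete, below is a minimal Python sketch of the iterative-sampling loop. The prompt wording, the JSON experience schema, and the hub-sampling policy (uniform sampling of k examples) are illustrative assumptions rather than the paper's exact prompts; it uses the `openai` client and assumes an API key in the environment.

```python
import json
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative instruction; the paper's actual prompts are not published here.
SYSTEM_PROMPT = (
    "You write conversation scenarios. Given example experiences, produce ONE new "
    "JSON experience with keys: personas (each with name, qualities, lifestyle, "
    "speech_style, memory), relationships, situation, topic, conversation_starter."
)

# A single handcrafted seed; ConvoGen starts from a small set of such examples.
seed_experience = {
    "personas": [
        {"name": "Maya", "qualities": "curious, witty", "lifestyle": "urban cyclist",
         "speech_style": "playful", "memory": "recently adopted a cat"},
        {"name": "Omar", "qualities": "laid-back, kind", "lifestyle": "home cook",
         "speech_style": "dry humor", "memory": "just moved back to town"},
    ],
    "relationships": "Maya and Omar are old college friends.",
    "situation": "They bump into each other at a farmers market.",
    "topic": "weekend plans",
    "conversation_starter": "Maya: No way, Omar? It's been ages!",
}

def generate_experience(few_shot_hub, k=3, model="gpt-4o"):
    """Sample k examples from the hub and ask the LLM for a new experience."""
    shots = random.sample(few_shot_hub, min(k, len(few_shot_hub)))
    user_prompt = ("Examples:\n" + "\n".join(json.dumps(s) for s in shots)
                   + "\nGenerate one new, different experience as JSON.")
    resp = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user", "content": user_prompt}],
    )
    return json.loads(resp.choices[0].message.content)

# Iterative sampling: each generated experience is added back to the hub, so
# later generations are conditioned on earlier outputs rather than a fixed set.
hub = [seed_experience]
for _ in range(10):
    hub.append(generate_experience(hub))
```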
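A companion sketch of step 2, reusing an `experience` dict of the shape generated above. It assumes the `pyautogen` package; the persona-to-system-message template and the brevity guideline are illustrative stand-ins for the paper's prompts.

```python
import autogen

# Assumes pyautogen; the OpenAI key is read from the environment by default.
llm_config = {"config_list": [{"model": "gpt-4o"}]}

def make_agent(persona):
    """Turn one generated persona into an agent (template is illustrative)."""
    system_message = (
        f"You are {persona['name']}. Qualities: {persona['qualities']}. "
        f"Lifestyle: {persona['lifestyle']}. Speech style: {persona['speech_style']}. "
        f"Memory: {persona['memory']}. Keep replies short and conversational."
    )
    return autogen.ConversableAgent(
        name=persona["name"],
        system_message=system_message,
        llm_config=llm_config,
        human_input_mode="NEVER",
    )

agents = [make_agent(p) for p in experience["personas"]]
user_proxy = autogen.UserProxyAgent(
    name="user_proxy", human_input_mode="NEVER", code_execution_config=False
)

group_chat = autogen.GroupChat(
    agents=agents + [user_proxy],
    messages=[],
    max_round=12,                            # cap on conversation turns
    speaker_selection_method="round_robin",  # or "auto" for LLM-based selection
)
manager = autogen.GroupChatManager(groupchat=group_chat, llm_config=llm_config)

# The user proxy seeds the chat with the generated scenario.
user_proxy.initiate_chat(
    manager,
    message=(
        f"Situation: {experience['situation']}\n"
        f"Relationships: {experience['relationships']}\n"
        f"Topic: {experience['topic']}\n"
        f"{experience['conversation_starter']}"
    ),
)
```

Round-robin keeps turn order deterministic; switching `speaker_selection_method` to `"auto"` delegates speaker choice to an LLM, matching the paper's second orchestration option.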

The paper highlights several key contributions:

  • A multi-agent framework providing an intuitive and scalable solution for generating multi-party conversational datasets.
  • The introduction of iterative sampling to enhance the diversity of LLM-generated experiences and subsequent conversations.
  • Extensive evaluation of the generated data's lexical diversity using the Measure of Textual Lexical Diversity (MTLD) and of its groundedness using an LLM-as-a-judge approach (an MTLD sketch follows this list).
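
MTLD is simple enough to compute with a short, self-contained function. Below is a from-scratch sketch following McCarthy and Jarvis's definition: sequential type-token ratio (TTR) with the conventional 0.72 factor threshold, a proportional partial factor at the end, averaged over forward and backward passes. The lowercase whitespace tokenizer is a simplifying assumption.

```python
def mtld_one_pass(tokens, threshold=0.72):
    """One directional MTLD pass: count TTR 'factors' over the token stream."""
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1.0            # a full factor completes; reset the window
            types, count = set(), 0
    if count > 0:                     # partial factor left over at the end
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    # Degenerate case: a short, fully unique text yields zero factors.
    return len(tokens) / factors if factors > 0 else float(len(tokens))

def mtld(text, threshold=0.72):
    """MTLD averaged over forward and backward passes, per McCarthy & Jarvis."""
    tokens = text.lower().split()     # simplifying assumption: whitespace tokens
    return (mtld_one_pass(tokens, threshold)
            + mtld_one_pass(tokens[::-1], threshold)) / 2
```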

Experiments were conducted across various configurations, including generating personas vs. sampling them, using fixed few-shot examples vs. iterative sampling, and controlling agent chattiness. The generated datasets were compared against human conversational datasets like DailyDialog, EmpatheticDialogues, and PERSONA-CHAT.

Analysis revealed that:

  • Generated conversations, especially in initial experiments, tended to be significantly longer and more verbose than casual human conversations. Prompt tuning was necessary to make agents less chatty and produce more natural-length turns and conversations.
  • Generated conversations consistently showed higher lexical diversity (MTLD scores) compared to casual human datasets. While high diversity is beneficial, excessive diversity might be less typical for casual chat.
  • Iterative sampling yielded slightly higher average MTLD scores than a fixed set of few-shot examples, suggesting it helps increase content diversity.
  • LLM-as-a-judge evaluations indicated that the generated conversations were well grounded in the input experiences, including the topic, situation, and specific persona attributes (qualities, speech style, lifestyle, memories, relationships); a judge-prompt sketch follows this list.
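
The summary does not reproduce the paper's judge prompt, so the rubric, 1-5 scale, and wording below are assumptions; the sketch only illustrates the general LLM-as-a-judge pattern using the `openai` client.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical rubric; the paper's actual judge prompt is not published here.
JUDGE_PROMPT = """You are grading a synthetic conversation.

Experience (ground-truth scenario):
{experience}

Conversation:
{conversation}

On a 1-5 scale, rate how well the conversation is grounded in the
experience's topic, situation, and persona attributes (qualities,
speech style, lifestyle, memories, relationships). Reply with the
number only."""

def judge_groundedness(experience_json, conversation_text, model="gpt-4o"):
    """Ask an LLM judge for a 1-5 groundedness score (assumes a numeric reply)."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            experience=experience_json, conversation=conversation_text)}],
    )
    return int(resp.choices[0].message.content.strip())
```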

In conclusion, ConvoGen demonstrates a practical method for generating diverse and grounded synthetic conversational data using multi-agent systems. The framework is shown to be capable of creating scenarios that guide agent behavior effectively. The generated data can be used to augment existing datasets for training conversational AI models or create tailored data for specific needs. However, the paper notes important implementation considerations, such as the potential for LLMs to generate harmful content (necessitating safety filters) and the sometimes unpredictable behavior of multi-agent frameworks like AutoGen (requiring prompt tuning and filtering of generated data).

Authors (3)
  1. Reem Gody
  2. Mahmoud Goudy
  3. Ahmed Y. Tawfik