
PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models (2404.00930v1)

Published 1 Apr 2024 in cs.CL

Abstract: We present a novel end-to-end personality-based synthetic dialogue data generation pipeline, specifically designed to elicit responses from LLMs via prompting. We design the prompts to generate more human-like dialogues considering real-world scenarios when users engage with chatbots. We introduce PSYDIAL, the first Korean dialogue dataset focused on personality-based dialogues, curated using our proposed pipeline. Notably, we focus on the Extraversion dimension of the Big Five personality model in our research. Experimental results indicate that while pre-trained models and those fine-tuned with a chit-chat dataset struggle to generate responses reflecting personality, models trained with PSYDIAL show significant improvements. The versatility of our pipeline extends beyond dialogue tasks, offering potential for other non-dialogue related applications. This research opens doors for more nuanced, personality-driven conversational AI in Korean and potentially other languages. Our code is publicly available at https://github.com/jiSilverH/psydial.

Authors (5)
  1. Ji-Eun Han (6 papers)
  2. Jun-Seok Koh (3 papers)
  3. Hyeon-Tae Seo (2 papers)
  4. Du-Seong Chang (17 papers)
  5. Kyung-Ah Sohn (13 papers)
Citations (4)