PersonaGym: Evaluating Persona Agents and LLMs (2407.18416v4)
Abstract: Persona agents, which are LLM agents conditioned to act according to an assigned persona, enable contextually rich and user-aligned interactions across domains like education and healthcare. However, evaluating how faithfully these agents adhere to their personas remains a significant challenge, particularly in free-form settings that demand consistency across diverse, persona-relevant environments. We introduce PersonaGym, the first dynamic evaluation framework for persona agents, and PersonaScore, a human-aligned automatic metric grounded in decision theory that enables comprehensive large-scale evaluation. Our evaluation of 10 leading LLMs across 200 personas and 10,000 questions reveals significant opportunities for advancement. For example, GPT-4.1 achieved the exact same PersonaScore as LLaMA-3-8b, despite being a more recent and advanced closed-source model. Importantly, increased model size and complexity do not necessarily enhance persona agent capabilities, underscoring the need for algorithmic and architectural innovation toward faithful, performant persona agents.
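To make the metric concrete, below is a minimal sketch of how a PersonaScore-like aggregate could be computed from per-task judge ratings. The five task names and the 1–5 rating scale follow the PersonaGym paper, but the ensemble-averaging scheme and every function name here are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean

# The five PersonaGym evaluation tasks (per the paper); the aggregation
# below is an assumed simple mean, not the authors' exact procedure.
TASKS = [
    "expected_action",
    "toxicity_control",
    "linguistic_habits",
    "persona_consistency",
    "action_justification",
]

def task_score(judge_scores: list[float]) -> float:
    """Average an ensemble of LLM-judge ratings (1-5 scale) for one task."""
    return mean(judge_scores)

def persona_score(scores_by_task: dict[str, list[float]]) -> float:
    """PersonaScore-like aggregate: mean of the per-task ensemble averages."""
    return mean(task_score(scores_by_task[t]) for t in TASKS)

# Hypothetical example: one persona agent, a three-judge ensemble per task.
example = {
    "expected_action":      [4, 5, 4],
    "toxicity_control":     [5, 5, 5],
    "linguistic_habits":    [3, 4, 3],
    "persona_consistency":  [4, 4, 5],
    "action_justification": [3, 3, 4],
}
print(f"PersonaScore ~ {persona_score(example):.2f}")
```

Averaging over tasks rather than raw questions keeps each evaluation dimension equally weighted, so a model cannot inflate its score by excelling at only one task; this weighting choice is likewise an assumption of the sketch.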