
Speaker Verification in Agent-Generated Conversations (2405.10150v2)

Published 16 May 2024 in cs.CL

Abstract: The recent success of LLMs has attracted widespread interest in developing role-playing conversational agents personalized to the characteristics and styles of different speakers, with the goal of enhancing their ability to perform both general and special-purpose dialogue tasks. However, the ability to personalize generated utterances to speakers, whether the utterances are produced by humans or by LLMs, has not been well studied. To bridge this gap, our study introduces a novel evaluation challenge: speaker verification in agent-generated conversations, which aims to verify whether two sets of utterances originate from the same speaker. To this end, we assemble a large dataset collection encompassing thousands of speakers and their utterances. We also develop and evaluate speaker verification models under different experimental setups. We further use the speaker verification models to evaluate the personalization abilities of LLM-based role-playing models. Comprehensive experiments suggest that current role-playing models fail to accurately mimic speakers, primarily due to their inherent linguistic characteristics.
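To make the task definition concrete, the following is a minimal sketch of speaker verification as a pairwise decision: two sets of utterances are each reduced to a style vector, and a similarity score is compared against a threshold. This is not the paper's model. The character trigram features, the cosine scoring, the `threshold` value of 0.5, and the function names `style_vector`, `cosine`, and `same_speaker` are all assumptions chosen purely for illustration.

```python
from collections import Counter
from math import sqrt


def style_vector(utterances, n=3):
    """Count character n-grams over all utterances in one set (assumed feature)."""
    counts = Counter()
    for text in utterances:
        text = text.lower()
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty set carries no style evidence
    return dot / (norm_a * norm_b)


def same_speaker(set_a, set_b, threshold=0.5):
    """Return (decision, score); threshold is a hypothetical tuning value."""
    score = cosine(style_vector(set_a), style_vector(set_b))
    return score >= threshold, score
```

In practice the threshold would be tuned on held-out pairs of utterance sets, and a learned encoder would likely replace the hand-built n-gram counts; the sketch only shows the shape of the decision.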

Authors (5)
  1. Yizhe Yang
  2. Heyan Huang
  3. Palakorn Achananuparp
  4. Jing Jiang
  5. Ee-Peng Lim