Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement (2402.10618v2)
Abstract: The advent of LLMs has propelled dialogue generation into new realms, particularly in the field of role-playing systems (RPSs). Although enhanced with ordinary role-relevant training dialogues, existing LLM-based RPSs still struggle to align with their roles when handling intricate, trap-laden queries in boundary scenarios. In this paper, we design the Modular ORchestrated Trap-setting Interaction SystEm (MORTISE) to benchmark and improve role-playing LLMs' performance. MORTISE produces highly role-relevant aggressive queries through the collaborative effort of multiple LLM-based modules, and formulates the corresponding responses with a consistent response generator to create an adversarial training dataset. We select 190 Chinese and English roles and construct aggressive queries for them to benchmark existing role-playing LLMs. Comprehensive evaluation shows that existing models exhibit a general deficiency in role-alignment capability. We then use 180 of the roles to collect an adversarial training dataset (named RoleAD), retaining the other 10 roles for testing. Experiments on models fine-tuned with RoleAD show that the adversarial dataset ameliorates this deficiency, and the improvements generalize to a degree in ordinary scenarios.
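The abstract describes a two-stage pipeline: orchestrated LLM-based modules craft trap-setting queries, and a consistent response generator supplies in-character answers, together yielding the RoleAD training pairs. Below is a minimal Python sketch of that flow; the module structure, prompts, and the `llm` helper are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of a MORTISE-style data-generation loop, assuming a generic
# chat-completion backend. All names and prompts here are hypothetical; the
# paper only describes the modules at a high level.
from dataclasses import dataclass


@dataclass
class RolePlayTurn:
    role: str      # the character being played, e.g. "Harry Potter"
    query: str     # an aggressive, boundary-testing user query
    response: str  # a role-consistent reply used as the training target


def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (API or local model)."""
    raise NotImplementedError


def build_adversarial_example(role: str, profile: str) -> RolePlayTurn:
    # Stage 1 (hypothetical module): draft a trap-setting query that probes
    # the role's knowledge, time, or identity boundaries.
    query = llm(
        "You are stress-testing a role-playing system. Given the character "
        f"profile:\n{profile}\nWrite one aggressive query that tempts the "
        "model to break character."
    )
    # Stage 2 (hypothetical module): the consistent response generator
    # produces an in-character answer that stays within the role's boundaries.
    response = llm(
        f"Stay strictly in character as {role} (profile: {profile}). "
        f"Answer without breaking character:\n{query}"
    )
    return RolePlayTurn(role=role, query=query, response=response)
```

Collecting such turns over the 180 training roles would yield a RoleAD-like adversarial fine-tuning set, with the held-out 10 roles reserved for evaluation.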
Authors: Yihong Tang, Jiao Ou, Che Liu, Fuzheng Zhang, Di Zhang, Kun Gai