Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions (2408.15787v1)
Abstract: Virtual counselors powered by LLMs aim to create interactive support systems that effectively assist clients struggling with mental health challenges. To replicate counselor-client conversations, researchers have built an online mental health platform that allows professional counselors to provide clients with text-based counseling services for about an hour per session. Notwithstanding its effectiveness, challenges exist as human annotation is time-consuming, cost-intensive, privacy-protected, and not scalable. To address this issue and investigate the applicability of LLMs in psychological counseling conversation simulation, we propose a framework that employs two LLMs via role-playing for simulating counselor-client interactions. Our framework involves two LLMs, one acting as a client equipped with a specific and real-life user profile and the other playing the role of an experienced counselor, generating professional responses using integrative therapy techniques. We implement both the counselor and the client by zero-shot prompting the GPT-4 model. In order to assess the effectiveness of LLMs in simulating counselor-client interactions and understand the disparities between LLM- and human-generated conversations, we evaluate the synthetic data from various perspectives. We begin by assessing the client's performance through automatic evaluations. Next, we analyze and compare the disparities between dialogues generated by the LLM and those generated by professional counselors. Furthermore, we conduct extensive experiments to thoroughly examine the performance of our LLM-based counselor trained with synthetic interactive dialogues by benchmarking against state-of-the-art models for mental health.
- Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (Merida, Mexico) (WSDM ’24). Association for Computing Machinery, New York, NY, USA, 8–17. https://doi.org/10.1145/3616855.3635856
- UserSimCRS: a user simulation toolkit for evaluating conversational recommender systems. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. 1160–1163.
- Qwen Technical Report. arXiv:2309.16609 [cs.CL] https://arxiv.org/abs/2309.16609
- Krisztian Balog and ChengXiang Zhai. 2024. Tutorial on User Simulation for Evaluating Information Access Systems on the Web. In Companion Proceedings of the ACM on Web Conference 2024. 1254–1257.
- What can speech and language tell us about the working alliance in psychotherapy. arXiv preprint arXiv:2206.08835 (2022).
- Nolwenn Bernard and Krisztian Balog. 2024a. Identifying Breakdowns in Conversational Recommender Systems using User Simulation. In Proceedings of the 6th ACM Conference on Conversational User Interfaces. 1–10.
- Nolwenn Bernard and Krisztian Balog. 2024b. Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access. In Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval. 185–193.
- Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL] https://arxiv.org/abs/2005.14165
- Soulchat: Improving llms’ empathy, listening, and comfort abilities through fine-tuning with multi-turn empathy conversations. In Findings of the Association for Computational Linguistics: EMNLP 2023. 1170–1183.
- Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory. arXiv:2406.14373 [cs.AI] https://arxiv.org/abs/2406.14373
- DeepSeek-AI. 2024a. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. arXiv:2401.02954 [cs.CL] https://arxiv.org/abs/2401.02954
- DeepSeek-AI. 2024b. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. arXiv:2405.04434 [cs.CL] https://arxiv.org/abs/2405.04434
- Department of Psychology Ohio University December 11, 2000. (2000).
- ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools. arXiv:2406.12793
- AI and the transformation of social science research. Science 380, 6650 (2023), 1108–1109.
- Clara E Hill. 2020. Helping skills: Facilitating exploration, insight, and action. American Psychological Association.
- The efficacy of cognitive behavioral therapy: A review of meta-analyses. Cognitive therapy and research 36 (2012), 427–440.
- Teaching Plan Generation and Evaluation With GPT-4: Unleashing the Potential of LLM in Instructional Design. IEEE Transactions on Learning Technologies 17 (2024), 1471–1485. https://doi.org/10.1109/TLT.2024.3384765
- Concept–An Evaluation Protocol on Conversation Recommender Systems with System-and User-centric Factors. arXiv preprint arXiv:2404.03304 (2024).
- Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task. arXiv:2406.02893 [cs.CL] https://arxiv.org/abs/2406.02893
- Generative Agent for Teacher Training: Designing Educational Problem-Solving Simulations with Large Language Model-based Agents for Pre-Service Teachers. In NeurIPS’23 Workshop on Generative AI for Education (GAIED). NeurIPS. https://api.semanticscholar.org/CorpusID:266874743
- Understanding Client Reactions in Online Mental Health Counseling. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto, Canada, 10358–10376. https://doi.org/10.18653/v1/2023.acl-long.577
- Explainable Few-shot Knowledge Tracing. arXiv:2405.14391 [cs.AI] https://arxiv.org/abs/2405.14391
- Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. NPJ Digital Medicine 6, 1 (2023), 236.
- Agent hospital: A simulacrum of hospital with evolvable medical agents. arXiv preprint arXiv:2405.02957 (2024).
- Chatcounselor: A large language models for mental health support. arXiv preprint arXiv:2309.15461 (2023).
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv:2408.06292 [cs.AI] https://arxiv.org/abs/2408.06292
- Gpteach: Interactive ta training with gpt-based students. In Proceedings of the tenth acm conference on learning@ scale. 226–236.
- OpenAI. 2024. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL] https://arxiv.org/abs/2303.08774
- Exploiting simulated user feedback for conversational search: Ranking, rewriting, and beyond. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 632–642.
- Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442 [cs.HC] https://arxiv.org/abs/2304.03442
- Social Simulacra: Creating Populated Prototypes for Social Computing Systems. arXiv:2208.04024 [cs.HC] https://arxiv.org/abs/2208.04024
- William E Piper. 2008. Underutilization of short-term group therapy: Enigmatic or understandable? Psychotherapy Research 18, 2 (2008), 127–138.
- Smile: Single-turn to multi-turn inclusive language expansion via chatgpt for mental health support. arXiv preprint arXiv:2305.00450 (2023).
- PsyChat: A Client-Centric Dialogue System for Mental Health Support. In 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD). 2979–2984. https://doi.org/10.1109/CSCWD61410.2024.10580641
- A benchmark for understanding dialogue safety in mental health support. In CCF International Conference on Natural Language Processing and Chinese Computing. Springer, 1–13.
- Carl R Rogers. 1946. Significant aspects of client-centered therapy. American psychologist 1, 10 (1946), 415–422.
- AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments. arXiv preprint arXiv:2405.07960 (2024).
- CLASS: A Design Framework for Building Intelligent Tutoring Systems Based on Learning Science principles. In Findings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 1941–1961. https://doi.org/10.18653/v1/2023.findings-emnlp.130
- Psyqa: A chinese dataset for generating long counseling text for mental health support. arXiv preprint arXiv:2106.01702 (2021).
- LittleMu: Deploying an Online Virtual Teaching Assistant via Heterogeneous Sources Integration and Chain of Teach Prompts. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 4843–4849.
- Bruce E Wampold. 2013. The great psychotherapy debate: Models, methods, and findings. Routledge.
- Towards a Client-Centered Assessment of LLM Therapists by Client Simulation. arXiv preprint arXiv:2406.12266 (2024).
- PATIENT-ΨΨ\Psiroman_Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals. arXiv:2405.19660 [cs.CL] https://arxiv.org/abs/2405.19660
- C Seth Warren. 1998. Models of brief psychodynamic therapy: A comparative approach. Psychology (1998).
- Joseph Weizenbaum. 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM 9, 1 (jan 1966), 36–45. https://doi.org/10.1145/365153.365168
- Can Large Language Model Agents Simulate Human Trust Behaviors? arXiv:2402.04559 [cs.AI] https://arxiv.org/abs/2402.04559
- ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World. arXiv preprint arXiv:2406.13890 (2024).
- Qwen2 Technical Report. arXiv:2407.10671 [cs.CL] https://arxiv.org/abs/2407.10671
- MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education. arXiv:2404.06711 [cs.CL] https://arxiv.org/abs/2404.06711
- CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling. arXiv:2405.16433 [cs.CL] https://arxiv.org/abs/2405.16433
- GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Long Beach, CA, USA) (KDD ’23). Association for Computing Machinery, New York, NY, USA, 5564–5575. https://doi.org/10.1145/3580305.3599832
- Simulating Classroom Education with LLM-Empowered Agents. arXiv preprint arXiv:2406.19226 (2024).
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=uccHPGDlao
- Automatic Lesson Plan Generation via Large Language Models with Self-critique Prompting. In International Conference on Artificial Intelligence in Education. Springer, 163–178.
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. arXiv:2403.13372 [cs.CL] https://arxiv.org/abs/2403.13372
- Huachuan Qiu (12 papers)
- Zhenzhong Lan (56 papers)