SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents (2403.08715v3)
Abstract: Humans learn social skills through both imitation and social interaction, yet this social learning process is largely understudied in existing research on building language agents. Motivated by this gap, we propose SOTOPIA-$\pi$, an interactive learning method that improves the social intelligence of language agents. The method combines behavior cloning and self-reinforcement training on social interaction data filtered according to LLM ratings. We show that this training allows a 7B LLM to reach the social goal completion ability of an expert model (a GPT-4-based agent), while improving the safety of language agents and maintaining general QA ability on the MMLU benchmark. We also find that this training paradigm uncovers difficulties in LLM-based evaluation of social intelligence: LLM-based evaluators overestimate the abilities of language agents trained specifically for social interaction.
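The abstract describes a two-part recipe: generate social interaction episodes (from a GPT-4-based expert agent and from the trained agent itself), have an LLM evaluator rate how well each episode achieves its social goal, keep only the highly rated episodes, and fine-tune the 7B agent on them (behavior cloning for expert data, self-reinforcement for the agent's own data). The minimal Python sketch below illustrates that loop under those assumptions; `Episode`, `rate_episode`, `finetune_on`, and `RATING_THRESHOLD` are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
# Schematic sketch of rating-filtered interactive learning (assumed names throughout).
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Episode:
    """One multi-turn social interaction collected for a given social goal."""
    turns: List[str]
    source: str  # "expert" (e.g. a GPT-4-based agent) or "self" (the trained agent)


RATING_THRESHOLD = 7.0  # hypothetical cutoff on a 0-10 goal-completion scale


def rate_episode(episode: Episode) -> float:
    """Placeholder for an LLM evaluator scoring goal completion (0-10)."""
    return 8.0  # stub: a real implementation would prompt an evaluator model


def finetune_on(episodes: List[Episode]) -> None:
    """Placeholder for supervised fine-tuning of the 7B policy on kept episodes."""
    print(f"fine-tuning on {len(episodes)} filtered episodes")


def training_round(episodes: List[Episode],
                   rater: Callable[[Episode], float] = rate_episode) -> None:
    # Filter by LLM rating, then imitate: expert episodes give behavior cloning,
    # the agent's own episodes give self-reinforcement; both are fine-tuned on here.
    kept = [e for e in episodes if rater(e) >= RATING_THRESHOLD]
    finetune_on(kept)


if __name__ == "__main__":
    demo = [
        Episode(turns=["A: Hi", "B: Hello"], source="expert"),
        Episode(turns=["A: Can we negotiate the price?", "B: Sure"], source="self"),
    ]
    training_round(demo)
```

In this sketch, behavior cloning and self-reinforcement differ only in where the episodes come from; once the evaluator has filtered them by rating, both reduce to the same supervised fine-tuning step.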
- Ruiyi Wang (11 papers)
- Haofei Yu (17 papers)
- Wenxin Zhang (27 papers)
- Zhengyang Qi (6 papers)
- Maarten Sap (86 papers)
- Graham Neubig (342 papers)
- Yonatan Bisk (91 papers)
- Hao Zhu (212 papers)