
LLM Theory of Mind and Alignment: Opportunities and Risks (2405.08154v1)

Published 13 May 2024 in cs.HC and cs.AI

Abstract: LLMs are transforming human-computer interaction and conceptions of AI with their impressive capacities for conversing and reasoning in natural language. There is growing interest in whether LLMs have theory of mind (ToM): the ability to reason about the mental and emotional states of others that is core to human social intelligence. As LLMs are integrated into the fabric of our personal, professional and social lives and given greater agency to make decisions with real-world consequences, there is a critical need to understand how they can be aligned with human values. ToM seems to be a promising direction of inquiry in this regard. Following the literature on the role and impacts of human ToM, this paper identifies key areas in which LLM ToM will show up in human:LLM interactions at individual and group levels, and what opportunities and risks for alignment are raised in each. On the individual level, the paper considers how LLM ToM might manifest in goal specification, conversational adaptation, empathy and anthropomorphism. On the group level, it considers how LLM ToM might facilitate collective alignment, cooperation or competition, and moral judgement-making. The paper lays out a broad spectrum of potential implications and suggests the most pressing areas for future research.

Authors (1)
  1. Winnie Street

