
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding (2404.13627v3)

Published 21 Apr 2024 in cs.CL and cs.AI

Abstract: LLMs have sparked substantial interest and debate concerning the potential emergence of Theory of Mind (ToM) ability. Current ToM evaluations focus on testing models using machine-generated data or game settings that are prone to shortcuts and spurious correlations, and they lack evaluation of machine ToM ability in real-world human interaction scenarios. This creates a pressing demand for new benchmarks based on real-world scenarios. We introduce NegotiationToM, a new benchmark designed to stress-test machine ToM in real-world negotiation settings covering multi-dimensional mental states (i.e., desires, beliefs, and intentions). Our benchmark builds upon the Belief-Desire-Intention (BDI) agent modeling theory, and we conduct empirical experiments to evaluate LLMs on it. Our findings demonstrate that NegotiationToM is challenging for state-of-the-art LLMs, as they consistently perform significantly worse than humans, even when employing the chain-of-thought (CoT) method.

Authors (10)
  1. Chunkit Chan (19 papers)
  2. Cheng Jiayang (11 papers)
  3. Yauwai Yim (8 papers)
  4. Zheye Deng (12 papers)
  5. Wei Fan (160 papers)
  6. Haoran Li (166 papers)
  7. Xin Liu (820 papers)
  8. Hongming Zhang (111 papers)
  9. Weiqi Wang (58 papers)
  10. Yangqiu Song (196 papers)
Citations (10)