NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding (2404.13627v3)
Abstract: LLMs have sparked substantial interest and debate concerning the potential emergence of Theory of Mind (ToM) ability. Current ToM evaluations focus on testing models with machine-generated data or game settings that are prone to shortcuts and spurious correlations, and they do not evaluate machine ToM ability in real-world human interaction scenarios. This creates a pressing demand for new benchmarks built on real-world scenarios. We introduce NegotiationToM, a new benchmark designed to stress-test machine ToM in real-world negotiation scenarios covering multi-dimensional mental states (i.e., desires, beliefs, and intentions). Our benchmark builds on the Belief-Desire-Intention (BDI) agent modeling theory, and we conduct empirical experiments to evaluate LLMs on it. Our findings demonstrate that NegotiationToM is challenging for state-of-the-art LLMs, which consistently perform significantly worse than humans, even when employing the chain-of-thought (CoT) method.
- Chunkit Chan (19 papers)
- Cheng Jiayang (11 papers)
- Yauwai Yim (8 papers)
- Zheye Deng (12 papers)
- Wei Fan (160 papers)
- Haoran Li (166 papers)
- Xin Liu (820 papers)
- Hongming Zhang (111 papers)
- Weiqi Wang (58 papers)
- Yangqiu Song (196 papers)