Prompting is Not All You Need! Evaluating LLM Agent Simulation Methodologies with Real-World Online Customer Behavior Data (2503.20749v6)
Abstract: Recent research shows that LLMs can simulate "believable" human behaviors to power LLM agents via prompt-only methods. In this work, we focus on evaluating LLMs' objective "accuracy" rather than their subjective "believability" in simulating human behavior, leveraging a large-scale, real-world dataset collected from customers' online shopping actions. We present the first comprehensive evaluation of state-of-the-art LLMs (e.g., DeepSeek-R1, Llama, and Claude) on the task of web shopping action generation. Our results show that out-of-the-box LLM-generated actions are often misaligned with actual human behavior, whereas fine-tuning LLMs on real-world behavioral data substantially improves their ability to generate accurate actions compared to prompt-only methods. Furthermore, incorporating synthesized reasoning into model training leads to additional performance gains, demonstrating the value of explicit rationale in behavior modeling. This work evaluates state-of-the-art LLMs in behavior simulation and provides actionable insights into how real-world action data can enhance the fidelity of LLM agents.
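A minimal sketch of the kind of training setup the abstract describes: turning a shopping session into a supervised fine-tuning record whose target optionally includes a synthesized rationale before the next action. The field names, prompt wording, and record schema here are illustrative assumptions, not the paper's actual data format.

```python
import json

def build_sft_record(context_actions, next_action, rationale=None):
    """Format one shopping session into a prompt/completion pair for SFT.

    `context_actions`: the shopper's actions so far in the session.
    `next_action`: the ground-truth action to predict.
    `rationale`: an optional synthesized reasoning string; the paper reports
    that training on such rationales yields additional gains, so here it is
    simply prepended to the completion.
    """
    prompt = (
        "You are simulating an online shopper.\n"
        "Session so far:\n"
        + "\n".join(f"- {a}" for a in context_actions)
        + "\nPredict the next action."
    )
    if rationale:
        completion = f"Rationale: {rationale}\nAction: {next_action}"
    else:
        completion = f"Action: {next_action}"
    return {"prompt": prompt, "completion": completion}

# Hypothetical example session (item IDs and action labels are made up).
record = build_sft_record(
    context_actions=["search: wireless earbuds", "click: item B0XYZ", "view: reviews"],
    next_action="add_to_cart: item B0XYZ",
    rationale="The shopper read the reviews and seems satisfied, so they add the item to the cart.",
)
print(json.dumps(record, indent=2))
```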
- Yuxuan Lu (26 papers)
- Jing Huang (140 papers)
- Yan Han (43 papers)
- Yaochen Xie (20 papers)
- Dakuo Wang (87 papers)
- Qi He (52 papers)
- Bingsheng Yao (49 papers)
- Sisong Bei (1 paper)
- Jiri Gesi (8 papers)
- Zheshen Wang (1 paper)