Prompting is Not All You Need! Evaluating LLM Agent Simulation Methodologies with Real-World Online Customer Behavior Data (2503.20749v6)

Published 26 Mar 2025 in cs.CL

Abstract: Recent research shows that LLMs can simulate "believable" human behaviors to power LLM agents via prompt-only methods. In this work, we focus on evaluating LLMs' objective "accuracy" rather than the subjective "believability" in simulating human behavior, leveraging a large-scale, real-world dataset collected from customers' online shopping actions. We present the first comprehensive evaluation of state-of-the-art LLMs (e.g., DeepSeek-R1, Llama, and Claude) on the task of web shopping action generation. Our results show that out-of-the-box LLM-generated actions are often misaligned with actual human behavior, whereas fine-tuning LLMs on real-world behavioral data substantially improves their ability to generate accurate actions compared to prompt-only methods. Furthermore, incorporating synthesized reasonings into model training leads to additional performance gains, demonstrating the value of explicit rationale in behavior modeling. This work evaluates state-of-the-art LLMs in behavior simulation and provides actionable insights into how real-world action data can enhance the fidelity of LLM agents.
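
To make the fine-tuning idea concrete, below is a minimal sketch (not the paper's actual pipeline) of turning one observed shopping session into a supervised training example for next-action generation, optionally prepending a synthesized rationale to the target, mirroring the abstract's finding that explicit rationales add gains. All names here (build_example, session, rationale, the JSON action schema) are illustrative assumptions.

    # Minimal sketch, assuming a generic prompt/completion fine-tuning format.
    import json

    def build_example(session, next_action, rationale=None):
        """Format a shopping session as a prompt/target pair.

        session:     list of prior actions the customer took.
        next_action: the ground-truth action to be generated.
        rationale:   optional synthesized reasoning prepended to the target.
        """
        prompt = (
            "You are simulating an online shopper.\n"
            "Session so far:\n"
            + "\n".join(json.dumps(a) for a in session)
            + "\nPredict the next action as JSON."
        )
        target = json.dumps(next_action)
        if rationale is not None:
            target = f"Reasoning: {rationale}\nAction: {target}"
        return {"prompt": prompt, "completion": target}

    if __name__ == "__main__":
        example = build_example(
            session=[{"type": "search", "query": "running shoes"},
                     {"type": "click", "item_id": "B0XYZ"}],
            next_action={"type": "add_to_cart", "item_id": "B0XYZ"},
            rationale="The customer inspected one item closely, suggesting purchase intent.",
        )
        print(json.dumps(example, indent=2))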

Authors (10)
  1. Yuxuan Lu (26 papers)
  2. Jing Huang (140 papers)
  3. Yan Han (43 papers)
  4. Yaochen Xie (20 papers)
  5. Dakuo Wang (87 papers)
  6. Qi He (52 papers)
  7. Bingsheng Yao (49 papers)
  8. Sisong Bei (1 paper)
  9. Jiri Gesi (8 papers)
  10. Zheshen Wang (1 paper)