Prompting is Not All You Need! Evaluating LLM Agent Simulation Methodologies with Real-World Online Customer Behavior Data (2503.20749v6)
Abstract: Recent research shows that LLMs can simulate "believable" human behaviors to power LLM agents via prompt-only methods. In this work, we focus on evaluating LLMs' objective "accuracy" rather than their subjective "believability" in simulating human behavior, leveraging a large-scale, real-world dataset collected from customers' online shopping actions. We present the first comprehensive evaluation of state-of-the-art LLMs (e.g., DeepSeek-R1, Llama, and Claude) on the task of web shopping action generation. Our results show that out-of-the-box LLM-generated actions are often misaligned with actual human behavior, whereas fine-tuning LLMs on real-world behavioral data substantially improves their ability to generate accurate actions compared to prompt-only methods. Furthermore, incorporating synthesized reasoning into model training leads to additional performance gains, demonstrating the value of explicit rationale in behavior modeling. This work evaluates state-of-the-art LLMs in behavior simulation and provides actionable insights into how real-world action data can enhance the fidelity of LLM agents.
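A minimal sketch of the kind of training setup the abstract describes: turning a shopping session into a supervised fine-tuning record whose target optionally includes a synthesized rationale before the next action. The field names, prompt wording, and record schema here are illustrative assumptions, not the paper's actual data format.

```python
import json

def build_sft_record(context_actions, next_action, rationale=None):
    """Format one shopping session into a prompt/completion pair for SFT.

    `context_actions`: the shopper's actions so far in the session.
    `next_action`: the ground-truth action to predict.
    `rationale`: an optional synthesized reasoning string; the paper reports
    that training on such rationales yields additional gains, so here it is
    simply prepended to the completion.
    """
    prompt = (
        "You are simulating an online shopper.\n"
        "Session so far:\n"
        + "\n".join(f"- {a}" for a in context_actions)
        + "\nPredict the next action."
    )
    if rationale:
        completion = f"Rationale: {rationale}\nAction: {next_action}"
    else:
        completion = f"Action: {next_action}"
    return {"prompt": prompt, "completion": completion}

# Hypothetical example session (item IDs and action labels are made up).
record = build_sft_record(
    context_actions=["search: wireless earbuds", "click: item B0XYZ", "view: reviews"],
    next_action="add_to_cart: item B0XYZ",
    rationale="The shopper read the reviews and seems satisfied, so they add the item to the cart.",
)
print(json.dumps(record, indent=2))
```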
- Yuxuan Lu (26 papers)
- Jing Huang (140 papers)
- Yan Han (43 papers)
- Yaochen Xie (20 papers)
- Dakuo Wang (87 papers)
- Qi He (52 papers)
- Bingsheng Yao (49 papers)
- Sisong Bei (1 paper)
- Jiri Gesi (8 papers)
- Zheshen Wang (1 paper)