- The paper presents RetailSim, a unified simulation framework modeling full seller–buyer retail pipelines with explicit cross-stage causal tracing.
- It employs persona-conditioned LLM agents to simulate multi-turn interactions and evaluates fidelity through human annotation, with inter-annotator agreement (Krippendorff’s α) exceeding 0.67.
- Results indicate that simulation outputs align with economic principles such as negative price–demand elasticity and differentiated conversion rates driven by seller–buyer personas.
RetailSim: A Unified End-to-End Simulation Framework for Seller–Buyer Retail Dynamics with LLM Agents
Motivation and Limitations of Existing Approaches
Effective retail strategy design requires a causal understanding of how upstream seller actions affect downstream buyer outcomes across multiple interaction stages. Traditional retail simulators isolate specific pipeline segments such as customer service, negotiation, or recommendation, but they fail to capture cross-stage dependencies, which prevents controlled assessment of strategy effectiveness and of economic regularities. Existing benchmarks generally lack unified modeling of seller–buyer dynamics, which impedes empirical study of intervention effects and verification of alignment with economic principles.
RetailSim: Multi-Stage, Persona-Driven Retail Simulation Framework
RetailSim is introduced as a causally structured, persona-driven multi-agent simulation pipeline that models the entire retail journey: sales pitch generation, pre-purchase inquiry, purchase decision, post-purchase interaction, and review generation. The framework instantiates persona-conditioned seller and buyer LLMs whose behavior is governed by heterogeneous, compositional attributes: assertiveness, friendliness, and rationality for sellers; pickiness, price consciousness, and rationality for buyers. Product diversity comes from stratified subsampling of the Amazon Reviews dataset across 34 categories, grouped into four macro-classes.
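A minimal sketch of how compositional personas and category-stratified product sampling could be set up is shown below. The two-level trait scale (`LEVELS`), the prompt wording in `persona_prompt`, and all helper names are hypothetical: the source names the traits but not their value ranges or prompt templates, and this sketch stratifies only by category, ignoring the four macro-classes.

```python
import itertools
import random
from collections import defaultdict
from dataclasses import asdict, dataclass, fields

# Hypothetical two-level scale; the source does not specify trait value ranges.
LEVELS = ("low", "high")


@dataclass(frozen=True)
class SellerPersona:
    assertiveness: str
    friendliness: str
    rationality: str


@dataclass(frozen=True)
class BuyerPersona:
    pickiness: str
    price_consciousness: str
    rationality: str


def all_personas(cls):
    """Enumerate every combination of trait levels for a persona dataclass."""
    n_traits = len(fields(cls))
    return [cls(*combo) for combo in itertools.product(LEVELS, repeat=n_traits)]


def persona_prompt(persona, role):
    """Render a persona as a system-prompt fragment (hypothetical wording)."""
    traits = ", ".join(
        f"{name.replace('_', ' ')}: {level}" for name, level in asdict(persona).items()
    )
    return f"You are a {role} with the following traits: {traits}."


def stratified_subsample(products, per_category, seed=0):
    """Draw up to `per_category` products from each category so none dominates."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for product in products:
        by_category[product["category"]].append(product)
    sample = []
    for category in sorted(by_category):
        items = by_category[category]
        sample.extend(rng.sample(items, min(per_category, len(items))))
    return sample
```

With two levels per trait, `all_personas(SellerPersona)` yields 2³ = 8 seller personas, and `persona_prompt(BuyerPersona("high", "low", "high"), "buyer")` returns `"You are a buyer with the following traits: pickiness: high, price consciousness: low, rationality: high."`. Enumerating the full product of trait levels is what makes the attributes compositional: each trait can be varied while the others are held fixed.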
Figure 1: Schematic of RetailSim, a unified multi-stage pipeline that simulates the full propagation from seller strategies to buyer outcomes, supporting controlled cross-stage interventions and causal tracing of decision effects.
Key modules include:
- Explicit sales strategy formulation separating persuasion dimensions (target expansion, value proposition, urgency, and objection handling), grounded in real-world industry scripts.
- Multi-turn buyer–seller dialog in both pre- and post-purchase phases, with stochastic scenario assignment reflecting realistic customer service/complaint structures.
- Full-trace state propagation, passing outputs as context to downstream stages (e.g., buyer’s product review conditioned on every interaction).
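The full-trace propagation in the last bullet can be sketched as a loop that hands each stage the complete history of earlier outputs. The agent callables, the scenario list, and the rule that post-purchase stages are skipped when no purchase occurs are assumptions for illustration; in RetailSim each stage would be a persona-conditioned LLM call, and the dialog stages would be multi-turn.

```python
import random
from typing import Callable, Dict, List

# The five stages described above, in causal order.
STAGES = [
    "sales_pitch",
    "pre_purchase_inquiry",
    "purchase_decision",
    "post_purchase_interaction",
    "review",
]

# Hypothetical post-purchase scenarios; the source does not enumerate them.
POST_PURCHASE_SCENARIOS = ["usage_question", "complaint", "refund_request"]

Trace = List[Dict[str, str]]
# An agent maps (stage name, full trace so far) to that stage's output text.
Agent = Callable[[str, Trace], str]


def run_episode(agents: Dict[str, Agent], seed: int = 0) -> Trace:
    """Run the stages in order, passing every earlier output to each later stage."""
    rng = random.Random(seed)
    trace: Trace = []
    for stage in STAGES:
        if stage == "post_purchase_interaction":
            # Stochastic scenario assignment, recorded so the review stage sees it too.
            trace.append({"stage": "scenario", "output": rng.choice(POST_PURCHASE_SCENARIOS)})
        output = agents[stage](stage, list(trace))  # copy: agents cannot mutate history
        trace.append({"stage": stage, "output": output})
        if stage == "purchase_decision" and output != "buy":
            break  # assumption: post-purchase stages run only after a purchase
    return trace
```

Because the review agent receives the pitch, inquiry, decision, scenario, and post-purchase dialog, an intervention at any upstream stage can be traced to its effect on the final review.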
This design enables high-fidelity simulation of retail interaction trajectories, supporting precise manipulation of participant, product, interaction, and strategy-related variables.
Dual-Fidelity Evaluation: Behavioral and Economic Consistency
RetailSim’s fidelity is assessed at both the stage and system levels. Stage-wise human annotation on a 1–5 Likert scale evaluates adherence to simulation guidelines and how well personas are reflected, covering both task performance (e.g., realism and utility of sales scripts or buyer inquiries) and trait separability (via pairwise A/B comparisons of outputs from differently configured personas). Inter-annotator agreement (Krippendorff’s α) consistently exceeds 0.67.
Figure 2: Annotation template (excerpt): sales script quality evaluation by human annotators on 1–5 scale.
Figure 3: Pairwise seller persona evaluation template, targeting discriminability of assertiveness, friendliness, and rationality in seller outputs.
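Krippendorff’s α compares the disagreement observed between annotators with the disagreement expected by chance, `alpha = 1 - D_o / D_e`. The summary does not say which distance metric was applied to the Likert ratings, so the sketch below assumes the interval metric (squared difference); an ordinal metric would be an equally plausible choice. In Krippendorff’s guidelines, 0.667 is the lowest value considered acceptable for tentative conclusions, which matches the 0.67 threshold reported above.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric delta(a, b) = (a - b) ** 2.

    `units` is a list of items; each item is the list of ratings it received.
    Missing ratings are simply left out, and items with fewer than two ratings
    are ignored because they contain no pairable values.
    """
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")
    # Observed disagreement: ordered within-item pairs, weighted by 1 / (m_u - 1).
    d_obs = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n
    # Expected disagreement: ordered pairs drawn from all pairable values.
    d_exp = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_exp == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - d_obs / d_exp
```

For example, `krippendorff_alpha_interval([[1, 2], [3, 3]])` returns 8/11 (about 0.727). The pairwise sums are quadratic in the number of ratings, which is fine for annotation studies of this size.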
At the system level, meta-evaluations verify the emergence of empirically validated economic patterns:
- Gender-differentiated purchasing aligned with product orientation (men-targeted vs. women-targeted vs. unisex products).
- Monotonic price–demand relationships with statistically significant elasticity that varies with the buyer’s price-consciousness trait (larger elasticity magnitude for price-sensitive buyers, p<0.001).
- Consistent persona-driven differences in conversion rate, refund rate, and post-purchase satisfaction; for example, easygoing buyers purchase more frequently and assertive sellers achieve higher conversion.
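A common way to quantify the price–demand regularity is a constant-elasticity fit: within each buyer group, regress log demand on log price, so the slope is the elasticity (negative when demand falls as price rises). The sketch below assumes demand is aggregated as a positive purchase count per price point and does not reproduce the paper’s significance test; the source does not state which estimator was used.

```python
import math
from collections import defaultdict


def loglog_slope(prices, quantities):
    """OLS slope of log(quantity) on log(price), i.e. a constant-elasticity estimate."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]  # quantities must be > 0
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / s_xx


def elasticity_by_group(records):
    """Estimate elasticity separately per group from (group, price, quantity) records."""
    prices, quantities = defaultdict(list), defaultdict(list)
    for group, price, quantity in records:
        prices[group].append(price)
        quantities[group].append(quantity)
    return {g: loglog_slope(prices[g], quantities[g]) for g in prices}
```

Under this setup, a price-sensitive group should show a more negative slope than a price-insensitive one. Price points with zero purchases must be dropped or smoothed before taking logs.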
The framework reproduces these regularities across eight LLM architectures, indicating that they are robust to the choice of backbone model and are not artifacts of trivial prompt bias.
Practical Applications and Analytical Utility
RetailSim operationalizes simulation as a scientific instrument for controlled analysis and benchmarking of retail decision-making:
Latent Persona Estimation: By training trait classifiers on labeled interaction traces, latent seller and buyer personas can be inferred from downstream behavior, enabling model introspection and potentially supporting analogous human-on-the-loop inference.
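Because the simulator assigns every persona, interaction traces come with ground-truth trait labels at no extra annotation cost. The sketch below trains one such trait classifier as a multinomial naive Bayes over bag-of-words features. This is a swapped-in, deliberately simple technique (the summary does not specify the classifier), and the class name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


class NaiveBayesTraitClassifier:
    """Multinomial naive Bayes over bag-of-words; a stand-in, not the paper's classifier."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        """Return the trait label with the highest posterior log-probability."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)
            denom = self.total_words[label] + self.alpha * vocab_size
            for tok in tokenize(text):
                if tok in self.vocab:  # unseen words carry no evidence
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

Trained on invented seller utterances labeled "high" or "low" assertiveness, such a classifier maps an unseen transcript to the more likely trait level; one classifier per trait gives the kind of dominant-attribute profile summarized in Figure 4.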
Figure 4: Distribution of inferred dominant persona attributes for five LLMs acting as sellers (top) and buyers (bottom).
Interplay of Seller–Buyer Pairings: Quantitative analysis reveals marked interaction effects. For instance, although Qwen3-235B achieves the highest overall revenue as a seller, it does not outperform other sellers in every seller–buyer pairing, which points to context-dependent optimal matches. Self-play (the same model acting as seller and buyer) does not consistently yield higher revenue, and LLMs that tend toward emotional persuasion incur higher refund rates.
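The pairing analysis amounts to reading a seller-by-buyer revenue matrix two ways: averaged over buyers (the global ranking) and buyer by buyer (the best seller for each buyer). The sketch below assumes a complete matrix keyed by hypothetical model names; the example values are placeholders, not results from the paper.

```python
from statistics import mean


def pairing_summary(revenue):
    """Summarize a complete seller x buyer revenue matrix.

    `revenue` maps (seller, buyer) -> revenue. Returns the sellers ranked by mean
    revenue across buyers, the best seller for each buyer, and the buyers for
    which the globally top-ranked seller is not the best match.
    """
    sellers = sorted({s for s, _ in revenue})
    buyers = sorted({b for _, b in revenue})
    global_mean = {s: mean(revenue[(s, b)] for b in buyers) for s in sellers}
    ranking = sorted(sellers, key=global_mean.get, reverse=True)
    best_per_buyer = {b: max(sellers, key=lambda s: revenue[(s, b)]) for b in buyers}
    exceptions = [b for b in buyers if best_per_buyer[b] != ranking[0]]
    return ranking, best_per_buyer, exceptions
```

With placeholder values `{("model_a", "buyer_x"): 10.0, ("model_a", "buyer_y"): 2.0, ("model_b", "buyer_x"): 4.0, ("model_b", "buyer_y"): 5.0}`, `model_a` tops the global ranking (mean 6.0 vs. 4.5), yet `model_b` is the better match for `buyer_y`: the same qualitative pattern described above.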
Sales Script Strategy Evaluation: Incrementally replacing naive LLM-generated scripts with real, retail-optimized strategies yields monotonic revenue improvements. Moreover, with full strategy guidance, differences due to model capacity diminish, and efficacy becomes strategy-dominated.
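The incremental-substitution experiment can be framed as an ablation ladder over the four persuasion dimensions listed earlier: at step k, the first k dimensions come from the real retail strategy and the rest from the naive LLM-generated script. The substitution order and the helper names are assumptions; the revenue at each step would come from running the simulator on that configuration.

```python
from typing import Dict, List, Sequence

# The four persuasion dimensions named in the sales-strategy module.
COMPONENTS = ("target_expansion", "value_proposition", "urgency", "objection_handling")


def incremental_configs(order: Sequence[str] = COMPONENTS) -> List[Dict[str, str]]:
    """Step k uses the real strategy for the first k components and naive ones after."""
    return [
        {c: ("real" if i < k else "naive") for i, c in enumerate(order)}
        for k in range(len(order) + 1)
    ]


def is_monotonic_nondecreasing(values: Sequence[float], tol: float = 0.0) -> bool:
    """True if each value is at least the previous one (within `tol`)."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))
```

The ladder has five rungs (zero through four real components). Checking the resulting revenue sequence with `is_monotonic_nondecreasing` tests the monotonic-improvement claim, and repeating the ladder per model shows whether the spread between models shrinks at the fully guided end.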
Theoretical Implications
RetailSim validates that contemporary LLM-based agents, when architected in causally linked multi-stage pipelines, can manifest aggregate behaviors aligned not only with human-like behavioral regularities but also with classical economic models (e.g., negative price–demand elasticity, demographic segmentation effects). This cross-stage compositionality addresses a gap in prior work, where local authenticity did not guarantee system-level validity. The findings underscore the need for meta-evaluative protocols in simulation research and establish a paradigm for aligning agent-based economic simulations with empirical economics.
Outlook and Future Directions
RetailSim’s unified, compositional approach opens scalable pathways for pre-deployment rehearsal, model benchmarking under realistic behavioral constraints, and even automated testing of intervention and nudging policies. Potential extensions include explicit modeling of competitive seller dynamics, integration of memory and sequential learning at the agent level, and adaptation for other multi-agent behavioral economics settings beyond retail.
Conclusion
RetailSim provides the field with a rigorous, modular testbed for the end-to-end study of seller–buyer retail pipelines via LLM agents. It achieves high simulation fidelity at both the interaction and system levels, supports in-depth analysis of persona and strategy effects, and reproduces key economic regularities, establishing it as a valuable resource for the scientific study and optimization of complex retail interaction dynamics.
Reference: "What Makes a Sale? Rethinking End-to-End Seller–Buyer Retail Dynamics with LLM Agents" (2604.04468)