SimUSER: Simulating User Behavior with Large Language Models for Recommender System Evaluation (2504.12722v1)

Published 17 Apr 2025 in cs.IR and cs.AI

Abstract: Recommender systems play a central role in numerous real-life applications, yet evaluating their performance remains a significant challenge due to the gap between offline metrics and online behaviors. Given the scarcity and limits (e.g., privacy issues) of real user data, we introduce SimUSER, an agent framework that serves as believable and cost-effective human proxies. SimUSER first identifies self-consistent personas from historical data, enriching user profiles with unique backgrounds and personalities. Then, central to this evaluation are users equipped with persona, memory, perception, and brain modules, engaging in interactions with the recommender system. SimUSER exhibits closer alignment with genuine humans than prior work, both at micro and macro levels. Additionally, we conduct insightful experiments to explore the effects of thumbnails on click rates, the exposure effect, and the impact of reviews on user engagement. Finally, we refine recommender system parameters based on offline A/B test results, resulting in improved user engagement in the real world.

Summary

The paper introduces SimUSER, a novel agent-based framework that employs LLMs to simulate user behavior in evaluating recommender systems.
It details a multi-module cognitive architecture combining persona matching, visual perception, memory, and decision-making to replicate user interactions.
Experiments on datasets like MovieLens show SimUSER outperforms baselines in mirroring real user engagement, offering a cost-effective alternative to traditional A/B tests.

SimUSER: Simulating User Behavior with LLMs for Recommender System Evaluation

Introduction

The paper "SimUSER: Simulating User Behavior with LLMs for Recommender System Evaluation" (2504.12722) addresses a central challenge in evaluating Recommender Systems (RS): the gap between offline metrics and online user behavior. Because offline evaluation is non-interactive, it often fails to capture key business values such as user engagement and satisfaction, while online A/B testing is costly and labor-intensive. To bridge this gap, the paper introduces SimUSER, an agent-based framework that simulates user interactions with recommender systems, using LLMs as believable and cost-effective human proxies.

Methodology

Persona Matching

SimUSER begins by identifying self-consistent user personas from historical data. This step extracts each user's preferences and profile characteristics such as age, personality, and occupation. An LLM infers several candidate personas from the interaction history, and a self-consistency score is then used to select the candidate that aligns most closely with the user's actual behavior.
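The selection step can be illustrated with a minimal sketch. It assumes one plausible reading of self-consistency: a good persona should fit the target user's own history better than it fits other users' histories. The `Persona` class, the `affinity` function (a keyword-overlap stand-in for an LLM judgment), and all other names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

# Hypothetical types and names; none of them come from the paper.
History = Sequence[frozenset[str]]  # one tag set per item a user interacted with


@dataclass(frozen=True)
class Persona:
    label: str
    interests: frozenset[str]


def affinity(persona: Persona, history: History) -> float:
    """Toy stand-in for an LLM judgment: mean share of each item's tags
    that fall inside the persona's interests (between 0.0 and 1.0)."""
    shares = [len(persona.interests & tags) / len(tags) for tags in history if tags]
    return mean(shares) if shares else 0.0


def self_consistency(persona: Persona, own: History, others: Sequence[History]) -> float:
    """Assumed scoring rule: fit to the user's own history minus the
    average fit to other users' histories."""
    own_fit = affinity(persona, own)
    other_fit = mean(affinity(persona, h) for h in others) if others else 0.0
    return own_fit - other_fit


def match_persona(candidates: Sequence[Persona], own: History,
                  others: Sequence[History]) -> Persona:
    """Return the candidate persona with the highest self-consistency score."""
    return max(candidates, key=lambda p: self_consistency(p, own, others))
```

In a real pipeline the `affinity` call would be replaced by a prompt to an LLM; the subtraction of the cross-user term is what keeps generic personas, which fit everyone equally well, from winning.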

Interaction Simulation

Each persona then drives an LLM-based agent whose cognitive architecture comprises perception, memory, and decision-making modules. The perception module enriches each item with a caption extracted from its thumbnail, so that emotional and content-based visual cues can influence the agent's reasoning much as they influence a human's. The memory module combines episodic memory and knowledge-graph memory, which together represent user-item interactions and external social influences.
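As a rough illustration of the episodic part of the memory module, the sketch below stores past interactions and retrieves the ones most similar to a new item. It assumes that items are described by tag sets and that relevance is measured by Jaccard similarity; both are simplifications chosen for this example, and the class and function names are hypothetical. The knowledge-graph memory is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Episode:
    item: str
    tags: frozenset[str]
    rating: int  # 1-5 scale, assumed for this example


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class EpisodicMemory:
    episodes: list[Episode] = field(default_factory=list)

    def add(self, item: str, tags: set[str], rating: int) -> None:
        self.episodes.append(Episode(item, frozenset(tags), rating))

    def retrieve(self, query_tags: set[str], k: int = 2) -> list[Episode]:
        """Return the k stored episodes most similar to the query item."""
        query = frozenset(query_tags)
        ranked = sorted(self.episodes, key=lambda e: jaccard(e.tags, query), reverse=True)
        return ranked[:k]
```

The retrieved episodes would then be passed to the decision-making step as evidence about how the simulated user reacted to similar items in the past.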

The brain module processes interactions and adapts the agent's actions based on evidence retrieved from memory and on visual reasoning. Decision-making includes multi-round preference elicitation, which lets the agent refine a decision by weighing contradicting and supporting evidence.
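The idea of refining a decision over several rounds can be sketched as follows. This is a deliberately simple numeric stand-in, not the paper's prompting procedure: each round examines one retrieved rating, labels it as supporting or contradicting the agent's current leaning, and folds it into a running mean. The function name, the 1-5 scale, the neutral prior, and the threshold are all assumptions made for this example.

```python
def elicit_preference(evidence_ratings, prior=3.0, like_threshold=3.5):
    """Refine a like/dislike decision one piece of evidence per round.

    evidence_ratings: past ratings of similar items, most relevant first.
    Returns the final decision and a per-round trace of
    (evidence label, updated estimate).
    """
    estimate = prior
    trace = []
    for n, rating in enumerate(evidence_ratings, start=1):
        leaning_like = estimate >= like_threshold
        # Evidence supports the current leaning when both fall on the same
        # side of the threshold; otherwise it contradicts it.
        supports = (rating >= like_threshold) == leaning_like
        # Running mean over the prior and all evidence seen so far.
        estimate += (rating - estimate) / (n + 1)
        trace.append(("supports" if supports else "contradicts", estimate))
    decision = "like" if estimate >= like_threshold else "dislike"
    return decision, trace
```

The trace makes the refinement visible: a round labeled "contradicts" is one where the new evidence disagrees with the agent's leaning at that point, which is where a real agent would be prompted to reconsider.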

Experiments

SimUSER agents are tested on datasets such as MovieLens and AmazonBook, performing tasks typical of RS usage, including item rating, classification, and interaction with recommendations. The results indicate that SimUSER aligns agent behavior with real user data more closely than prior simulators such as RecAgent and Agent4Rec, both for micro-level actions (e.g., individual ratings) and macro-level preferences (e.g., overall satisfaction).
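Micro-level alignment on a rating task is commonly quantified by the error between simulated and real ratings. The sketch below computes RMSE and MAE; the text above does not state which metrics the paper reports, so these are assumed as typical choices, and the function names are illustrative.

```python
import math
from typing import Sequence


def _check(simulated: Sequence[float], real: Sequence[float]) -> None:
    if len(simulated) != len(real) or not simulated:
        raise ValueError("rating lists must be non-empty and of equal length")


def rmse(simulated: Sequence[float], real: Sequence[float]) -> float:
    """Root mean squared error between simulated and real ratings."""
    _check(simulated, real)
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(simulated, real)) / len(real))


def mae(simulated: Sequence[float], real: Sequence[float]) -> float:
    """Mean absolute error between simulated and real ratings."""
    _check(simulated, real)
    return sum(abs(s - r) for s, r in zip(simulated, real)) / len(real)
```

Lower values mean the simulated users rate items more like the real users they stand in for; RMSE penalizes large individual misses more heavily than MAE does.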

User Simulators and Recommender System Evaluation

SimUSER agents are also used to evaluate several recommender models, including Matrix Factorization and MultVAE. The simulated users help identify how visual cues and reviews affect engagement metrics. The evaluations indicate that SimUSER generates interactions that align more closely with human behavior than those of existing simulators, making it a scalable proxy for real-world user evaluation.

Practical Relevance

SimUSER shows potential for replacing conventional A/B testing with a more scalable and privacy-preserving alternative that captures the subtle effects of UX design decisions, like thumbnails and user reviews. This methodology could significantly reduce RS evaluation costs while maintaining high fidelity in user behavioral replication.

Conclusion

By leveraging LLMs, SimUSER provides realistic user proxies for recommender systems, offering a new direction for RS evaluation. The work suggests that simulated users can narrow the gap between offline metrics and online behavior while enabling nuanced, interactive evaluation. Further development is suggested in areas such as cold-start scenarios and more dynamic agent interactions. The framework is a promising option for RS designers seeking automated, ethical, and effective evaluation protocols.