Traditional recommender systems are essential for online platforms but face limitations in understanding complex user preferences, engaging in dynamic interactions, and providing transparent explanations. The emergence of LLM-powered agents offers a promising approach to address these challenges by enabling natural language interaction, sophisticated reasoning, and interpretable suggestions. This survey provides a systematic review of how LLM-powered agents are being applied in recommender systems.
The survey categorizes current research into three key paradigms based on their primary objective:
- Recommender-oriented approaches: These methods leverage intelligent agents to enhance the core recommendation mechanisms. LLMs utilize user history and capabilities like planning, reasoning, memory, and tool use to generate direct item recommendations. Examples include RecMind [wang2024recmind] and MACRec [wang2024macrec].
- Interaction-oriented approaches: These methods focus on facilitating dynamic user engagement through natural language dialogue and providing interpretable recommendations. They use LLMs to conduct human-like conversations or generate explanations for suggestions. Examples include AutoConcierge [zeng2024automated] and RAH [shu2024rah].
- Simulation-oriented approaches: These methods employ single or multi-agent frameworks to model complex user-item interactions and system dynamics, often for evaluation purposes. They use LLMs to generate realistic user behaviors or item characteristics in response to recommendations. Examples include Agent4Rec [zhang2024generative] and AgentCF [zhang2024agentcf].
The survey also proposes a unified architectural framework for LLM-powered recommendation agents, consisting of four essential modules:
- Profile Module: This module constructs and maintains dynamic representations of users and items by analyzing historical interactions and external knowledge. It captures temporal and contextual patterns to build personalized profiles. Examples include Agent4Rec's dual components (social traits and preferences) [zhang2024generative] and MACRec's user and item analysts [wang2024macrec].
- Memory Module: Serving as a contextual brain, this module manages historical interactions, emotional responses, and conversational context. It maintains a structured repository to enable more informed and context-aware recommendations based on past experiences. Examples include RecAgent's hierarchical memory (sensory, short-term, long-term) [wang2023user] and Agent4Rec's factual and emotional memory [zhang2024generative].
- Planning Module: This module outputs intelligent recommendation strategies by designing multi-step action plans. It formulates recommendation trajectories through strategic generation and task sequencing to balance immediate satisfaction with long-term engagement goals. Examples include BiLLP's hierarchical planning (macro/micro learning) [shi2024large] and MACRS's multi-agent planning system [fang2024multi].
- Action Module: This module is the execution engine, transforming decisions from the Planning Module into concrete recommendations by interacting with system components (e.g., database queries, generating text). Examples include RecAgent's six action modalities (search, browse, click, etc.) [wang2023user] and InteRecAgent's integrated tools (querying, retrieval, ranking) [huang2023recommender].
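The way the four modules hand off to one another can be illustrated with a minimal sketch. It is not taken from the survey or from any cited system: the toy `CATALOG`, the class and function names, and the rule-based `plan` function (standing in for what would be an LLM call in a real agent) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy catalog mapping item -> genre. Purely illustrative data.
CATALOG = {
    "Alien": "sci-fi",
    "Arrival": "sci-fi",
    "Heat": "crime",
    "Up": "animation",
}


@dataclass
class Profile:
    """Profile Module: a running summary of one user's genre preferences."""
    genre_counts: dict[str, int] = field(default_factory=dict)

    def update(self, item: str) -> None:
        genre = CATALOG[item]
        self.genre_counts[genre] = self.genre_counts.get(genre, 0) + 1

    def top_genre(self) -> str | None:
        if not self.genre_counts:
            return None
        return max(self.genre_counts, key=lambda g: self.genre_counts[g])


@dataclass
class Memory:
    """Memory Module: stores past interactions as (item, feedback) pairs."""
    events: list[tuple[str, str]] = field(default_factory=list)

    def write(self, item: str, feedback: str) -> None:
        self.events.append((item, feedback))

    def seen_items(self) -> set[str]:
        return {item for item, _ in self.events}


def plan(profile: Profile, memory: Memory) -> list[str]:
    """Planning Module stand-in: a fixed rule where a real agent would prompt an LLM."""
    if profile.top_genre() is None:
        return ["retrieve_all", "rank"]  # cold start: no preference signal yet
    return ["retrieve_by_genre", "filter_seen", "rank"]


def act(steps: list[str], profile: Profile, memory: Memory, k: int = 2) -> list[str]:
    """Action Module: executes each planned step against the catalog 'tools'."""
    candidates: list[str] = []
    for step in steps:
        if step == "retrieve_all":
            candidates = list(CATALOG)
        elif step == "retrieve_by_genre":
            genre = profile.top_genre()
            candidates = [item for item, g in CATALOG.items() if g == genre]
        elif step == "filter_seen":
            seen = memory.seen_items()
            candidates = [item for item in candidates if item not in seen]
        elif step == "rank":
            # Alphabetical order is a placeholder for a real ranking model.
            candidates = sorted(candidates)[:k]
    return candidates


def recommend(profile: Profile, memory: Memory, k: int = 2) -> list[str]:
    return act(plan(profile, memory), profile, memory, k)


profile, memory = Profile(), Memory()
first = recommend(profile, memory)   # cold-start recommendations
memory.write("Alien", "liked")       # record the interaction
profile.update("Alien")              # refresh the profile
second = recommend(profile, memory)  # genre-aware, excludes seen items
```

In this sketch the planner only chooses which steps to run and the action function only executes them, which mirrors the separation between the Planning and Action Modules described above. A real system would replace the rule in `plan` and the alphabetical ranking with model calls.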
The survey provides a comprehensive analysis of benchmark datasets and evaluation frameworks used in this domain. Datasets are broadly categorized into traditional recommendation datasets (e.g., Amazon review categories such as "Books" and "Video Games" [mcauley2015image]; MovieLens variants [harper2015movielens]; Steam [kang2018self]; Yelp [https://www.yelp.com/dataset]) and conversational recommendation datasets (e.g., ReDial [li2018towards], Reddit [he2023large], OpenDialKG [moon2019opendialkg]). To reduce computational cost, some methods evaluate on subsets sampled from the larger datasets.
Evaluation metrics employed by LLM-powered agent recommenders are diverse, including:
- Standard Recommendation Metrics: Common metrics like NDCG@K, Recall@K, HR@K, MRR, Acc, F1-Score, MAP, RMSE, MAE, MSE are used to assess basic recommendation quality.
- Language Generation Quality: Metrics such as BLEU and ROUGE evaluate the quality of generated text like explanations or summaries.
- Reinforcement Learning Metrics: Metrics like trajectory length, average single-round reward, and cumulative trajectory reward are used for methods framed as long-term interaction or planning problems.
- Conversational Efficiency Metrics: Metrics like Average Turn (AT) and Success Rate (SR) measure the efficiency of the dialogue process in conversational recommendation.
- Custom Indicators: Novel metrics are proposed in some works to evaluate specific aspects like proactivity, economy, explainability, correctness, consistency, efficiency [zeng2024automated], or the believability of simulated behaviors and memory [wang2023user].
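Several of the metrics listed above can be computed in a few lines. The sketch below uses the common binary-relevance definitions of HR@K, Recall@K, NDCG@K, and reciprocal rank (which is averaged over users to give MRR), plus Success Rate and Average Turn for dialogues. The function names and input shapes are assumptions made for illustration, and individual papers may define these metrics slightly differently; for example, some count failed dialogues at a maximum turn limit when computing AT.

```python
import math


def hit_rate_at_k(ranked: list, relevant: set, k: int) -> float:
    """HR@K: 1 if any relevant item appears in the top K, else 0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Recall@K: fraction of the relevant items found in the top K."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: list, relevant: set, k: int) -> float:
    """NDCG@K with binary relevance and a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank 1 gets log2(2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def reciprocal_rank(ranked: list, relevant: set) -> float:
    """Reciprocal rank of the first relevant item; average over users for MRR."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def success_rate(dialogues: list) -> float:
    """SR: fraction of dialogues that ended in an accepted recommendation."""
    return sum(1 for _, success in dialogues if success) / len(dialogues)


def average_turn(dialogues: list) -> float:
    """AT: mean number of turns, taken here over all dialogues."""
    return sum(turns for turns, _ in dialogues) / len(dialogues)


# Hypothetical example: one ranked list and three (turns, success) dialogues.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
dialogues = [(3, True), (5, False), (4, True)]

hr = hit_rate_at_k(ranked, relevant, k=3)
rec = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
rr = reciprocal_rank(ranked, relevant)
sr = success_rate(dialogues)
at = average_turn(dialogues)
```

In practice each ranking metric is computed per user and then averaged over the test set.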
The survey also discusses two related research fields. General LLM-powered recommender systems often focus on ranking or prediction but may lack agentic capabilities such as planning and memory. Conversational recommender systems share goals with interaction-oriented agents, but traditional methods in that field are constrained by rigid dialogue flows or limited knowledge.
The survey concludes by highlighting key challenges and promising future directions:
- Optimization of System Architecture: Improving the integration of traditional RS components with LLMs, enhancing multi-agent collaboration, and improving system interpretability.
- Refinement of Evaluation Framework: Establishing unified and comprehensive evaluation standards, developing novel metrics for dialogue and recommendation effectiveness, and considering privacy and security.
- Secure Recommender Systems: Addressing vulnerabilities to adversarial attacks [ning2024cheatagent] by developing detection methods, designing multi-agent defensive architectures, and integrating domain-specific security knowledge.
Overall, the survey provides a structured view of the emerging field of LLM-powered agents in recommender systems, outlining the different research objectives, common architectural components, evaluation practices, and future research avenues.