
A Survey on LLM-powered Agents for Recommender Systems (2502.10050v1)

Published 14 Feb 2025 in cs.IR and cs.AI

Abstract: Recommender systems are essential components of many online platforms, yet traditional approaches still struggle with understanding complex user preferences and providing explainable recommendations. The emergence of LLM-powered agents offers a promising approach by enabling natural language interactions and interpretable reasoning, potentially transforming research in recommender systems. This survey provides a systematic review of the emerging applications of LLM-powered agents in recommender systems. We identify and analyze three key paradigms in current research: (1) Recommender-oriented approaches, which leverage intelligent agents to enhance the fundamental recommendation mechanisms; (2) Interaction-oriented approaches, which facilitate dynamic user engagement through natural dialogue and interpretable suggestions; and (3) Simulation-oriented approaches, which employ multi-agent frameworks to model complex user-item interactions and system dynamics. Beyond paradigm categorization, we analyze the architectural foundations of LLM-powered recommendation agents, examining their essential components: profile construction, memory management, strategic planning, and action execution. Our investigation extends to a comprehensive analysis of benchmark datasets and evaluation frameworks in this domain. This systematic examination not only illuminates the current state of LLM-powered agent recommender systems but also charts critical challenges and promising research directions in this transformative field.

Traditional recommender systems are essential for online platforms but face limitations in understanding complex user preferences, engaging in dynamic interactions, and providing transparent explanations. The emergence of LLM-powered agents offers a promising approach to address these challenges by enabling natural language interaction, sophisticated reasoning, and interpretable suggestions. This survey provides a systematic review of how LLM-powered agents are being applied in recommender systems.

The survey categorizes current research into three key paradigms based on their primary objective:

  1. Recommender-oriented approaches: These methods leverage intelligent agents to enhance the core recommendation mechanisms. The agents apply capabilities like planning, reasoning, memory, and tool use over user history to generate item recommendations directly. Examples include RecMind [wang2024recmind] and MACRec [wang2024macrec].
  2. Interaction-oriented approaches: These methods focus on facilitating dynamic user engagement through natural language dialogue and providing interpretable recommendations. They use LLMs to conduct human-like conversations or generate explanations for suggestions. Examples include AutoConcierge [zeng2024automated] and RAH [shu2024rah].
  3. Simulation-oriented approaches: These methods employ single or multi-agent frameworks to model complex user-item interactions and system dynamics, often for evaluation purposes. They use LLMs to generate realistic user behaviors or item characteristics in response to recommendations. Examples include Agent4Rec [zhang2024generative] and AgentCF [zhang2024agentcf].

The survey also proposes a unified architectural framework for LLM-powered recommendation agents, consisting of four essential modules:

  • Profile Module: This module constructs and maintains dynamic representations of users and items by analyzing historical interactions and external knowledge. It captures temporal and contextual patterns to build personalized profiles. Examples include Agent4Rec's dual components (social traits and preferences) [zhang2024generative] and MACRec's user and item analysts [wang2024macrec].
  • Memory Module: Serving as a contextual brain, this module manages historical interactions, emotional responses, and conversational context. It maintains a structured repository to enable more informed and context-aware recommendations based on past experiences. Examples include RecAgent's hierarchical memory (sensory, short-term, long-term) [wang2023user] and Agent4Rec's factual and emotional memory [zhang2024generative].
  • Planning Module: This module outputs intelligent recommendation strategies by designing multi-step action plans. It formulates recommendation trajectories through strategic generation and task sequencing to balance immediate satisfaction with long-term engagement goals. Examples include BiLLP's hierarchical planning (macro/micro learning) [shi2024large] and MACRS's multi-agent planning system [fang2024multi].
  • Action Module: This module is the execution engine, transforming decisions from the Planning Module into concrete recommendations by interacting with system components (e.g., database queries, generating text). Examples include RecAgent's six action modalities (search, browse, click, etc.) [wang2023user] and InteRecAgent's integrated tools (querying, retrieval, ranking) [huang2023recommender].
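The four modules above can be wired together in a minimal Python sketch. All class names, fields, and the preference-update rule here are illustrative assumptions, not drawn from any surveyed system; the point is only to show how profile, memory, planning, and action compose into one recommendation turn.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Dynamic representation of a user built from interaction history."""
    user_id: str
    preferences: dict = field(default_factory=dict)

    def update(self, item_id: str, rating: float) -> None:
        # Blend the new signal into a running preference estimate
        # (exponential smoothing is an arbitrary illustrative choice).
        prev = self.preferences.get(item_id, 0.0)
        self.preferences[item_id] = 0.8 * prev + 0.2 * rating


class Memory:
    """Contextual store of past interactions. A flat list here;
    surveyed systems use hierarchical or emotional memory."""
    def __init__(self):
        self.events = []

    def record(self, event: dict) -> None:
        self.events.append(event)

    def recent(self, k: int = 5) -> list:
        return self.events[-k:]


class Planner:
    """Turns profile and memory into an ordered multi-step action plan."""
    def plan(self, profile: Profile, memory: Memory) -> list:
        return ["retrieve_candidates", "rank", "explain"]


class Actor:
    """Execution engine: carries out each planned step against the
    recommender backend (database query, text generation, etc.)."""
    def execute(self, step: str) -> str:
        return f"executed:{step}"


# One recommendation turn through all four modules.
profile = Profile(user_id="u1")
profile.update("item42", rating=5.0)
memory = Memory()
memory.record({"item": "item42", "feedback": "click"})
actions = [Actor().execute(s) for s in Planner().plan(profile, memory)]
```

In a real agent the Planner and Actor would call an LLM and external tools; here they are stubs so the module boundaries stay visible.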

The survey provides a comprehensive analysis of benchmark datasets and evaluation frameworks used in this domain. Datasets are broadly categorized into traditional recommendation datasets (e.g., Amazon reviews like "Books", "Video Games" [mcauley2015image]; MovieLens variants [harper2015movielens]; Steam [kang2018self]; Yelp [https://www.yelp.com/dataset]) and conversational recommendation datasets (e.g., ReDial [li2018towards], Reddit [he2023large], OpenDialKG [moon2019opendialkg]). For resource efficiency, some methods sample data from large datasets.
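The sampling strategy mentioned above is often user-level: draw a subset of users and keep each sampled user's full interaction history, so that per-user sequences stay intact. A small sketch, with a hypothetical interaction log and sample size:

```python
import random

# Hypothetical flat log of (user, item) interaction pairs:
# 100 users ("u0".."u99"), 100 interactions each.
interactions = [(f"u{i % 100}", f"item{i}") for i in range(10_000)]

# Sample a fixed number of users, then keep only their interactions,
# preserving each sampled user's complete history.
random.seed(0)
users = sorted({u for u, _ in interactions})
sampled_users = set(random.sample(users, k=10))
subset = [(u, it) for u, it in interactions if u in sampled_users]
```

Sampling users rather than individual interactions avoids truncating the sequential histories that agent-based recommenders rely on.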

Evaluation metrics employed by LLM-powered agent recommenders are diverse, including:

  • Standard Recommendation Metrics: Common metrics such as NDCG@K, Recall@K, HR@K, MRR, Accuracy, F1-Score, MAP, RMSE, MAE, and MSE are used to assess basic recommendation quality.
  • Language Generation Quality: Metrics such as BLEU and ROUGE evaluate the quality of generated text like explanations or summaries.
  • Reinforcement Learning Metrics: Metrics like trajectory length, average single-round reward, and cumulative trajectory reward are used for methods framed as long-term interaction or planning problems.
  • Conversational Efficiency Metrics: Metrics like Average Turn (AT) and Success Rate (SR) measure the efficiency of the dialogue process in conversational recommendation.
  • Custom Indicators: Novel metrics are proposed in some works to evaluate specific aspects like proactivity, economy, explainability, correctness, consistency, efficiency [zeng2024automated], or the believability of simulated behaviors and memory [wang2023user].

Related research fields discussed include general LLM-powered Recommender Systems, which often focus on ranking or prediction but may lack agentic capabilities like planning and memory, and Conversational Recommender Systems, which share goals with interaction-oriented agents but whose traditional methods are constrained by rigid dialogue patterns and limited background knowledge.

The survey concludes by highlighting key challenges and promising future directions:

  1. Optimization of System Architecture: Improving the integration of traditional RS components with LLMs, enhancing multi-agent collaboration, and improving system interpretability.
  2. Refinement of Evaluation Framework: Establishing unified and comprehensive evaluation standards, developing novel metrics for dialogue and recommendation effectiveness, and considering privacy and security.
  3. Secure Recommender Systems: Addressing vulnerabilities to adversarial attacks [ning2024cheatagent] by developing detection methods, multi-agent defensive architectures, and integrating domain-specific security knowledge.

Overall, the survey provides a structured view of the emerging field of LLM-powered agents in recommender systems, outlining the different research objectives, common architectural components, evaluation practices, and future research avenues.

Authors (5)
  1. Qiyao Peng
  2. Hongtao Liu
  3. Hua Huang
  4. Qing Yang
  5. Minglai Shao