- The paper introduces RecBot, an interactive recommendation agent that leverages active natural language commands to model nuanced user preferences and adjust policies in real time.
- RecBot employs a dual-agent architecture with parser and planner agents, integrating dynamic memory consolidation and modular tool chains for precise recommendation updates.
- Empirical evaluations show RecBot outperforms traditional approaches on multiple datasets, boosting user satisfaction and business metrics.
Interactive Recommendation Agent with Active User Commands: Technical Summary and Implications
Motivation and Paradigm Shift
Traditional recommender systems are fundamentally limited by their reliance on passive, coarse-grained feedback mechanisms (e.g., clicks, likes/dislikes), which fail to capture the nuanced motivations and intentions underlying user behavior. This results in ambiguous preference modeling, indiscriminate attribution of item characteristics, and persistent misalignment between user intent and system interpretation. The paper introduces the Interactive Recommendation Feed (IRF) paradigm, which enables users to issue free-form natural language commands directly within mainstream recommendation feeds, thereby transforming the interaction from passive consumption to active, user-driven control.
Figure 1: Comparison between traditional and novel interactive recommendation feeds. IRF enables direct, free-form natural language commands for real-time policy adjustment.
RecBot Framework: Architecture and Components
The RecBot framework operationalizes IRF via a dual-agent architecture:
- A parser agent that interprets free-form natural language commands and, through dynamic memory consolidation, maintains an explicit record of both positive and negative user preferences.
- A planner agent that orchestrates modular tool chains to translate these consolidated preferences into precise, real-time recommendation policy updates.
The toolset design is extensible, supporting seamless integration of additional modules (e.g., searcher tools for trending topics) via standardized interfaces, consistent with Model Context Protocol (MCP) principles.
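To make the standardized-interface idea concrete, the following is a minimal sketch of a tool registry that a planner agent could invoke; the `Tool` protocol, `ToolRegistry`, and `TrendingSearcher` names are illustrative assumptions, not the paper's actual interface.

```python
from typing import Protocol, Dict, Any, List


class Tool(Protocol):
    """Standardized tool interface (hypothetical), in the spirit of MCP-style integration."""
    name: str

    def run(self, preferences: Dict[str, Any]) -> List[str]:
        """Return candidate item IDs given consolidated user preferences."""
        ...


class ToolRegistry:
    """Registry that lets a planner agent discover and invoke tools by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def invoke(self, name: str, preferences: Dict[str, Any]) -> List[str]:
        # The planner selects which tool to call based on the parsed command.
        return self._tools[name].run(preferences)


class TrendingSearcher:
    """Illustrative searcher tool for trending topics (one of the extensible modules)."""
    name = "trending_searcher"

    def run(self, preferences: Dict[str, Any]) -> List[str]:
        # Placeholder: a real tool would query a search or retrieval backend.
        return [f"item_for_{topic}" for topic in preferences.get("topics", [])]


registry = ToolRegistry()
registry.register(TrendingSearcher())
print(registry.invoke("trending_searcher", {"topics": ["running shoes"]}))
```

New tools can then be added by registering any object that satisfies the same interface, which is what makes the tool chain extensible without changing the planner logic.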
Multi-Agent Optimization and Deployment
RecBot employs simulation-augmented knowledge distillation to transfer reasoning capabilities from closed-source teacher LLMs (e.g., GPT-4.1) to cost-effective open-source student models (e.g., Qwen3-14B). Synthetic user-system interaction trajectories are generated via role-playing, enabling diverse, realistic training data for both parser and planner agents. The unified optimization objective minimizes negative log-likelihood over mixed agent-specific datasets, supporting efficient online inference and scalable deployment.
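A minimal sketch of the unified objective, assuming next-token negative log-likelihood over a mixture of parser- and planner-specific trajectories; the toy datasets, toy student model, and mixing strategy below are illustrative assumptions, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Hypothetical token-ID tensors standing in for distilled parser- and planner-specific
# trajectories generated by the teacher via role-playing simulation.
vocab_size, seq_len = 1000, 32
parser_data = TensorDataset(torch.randint(0, vocab_size, (64, seq_len)))
planner_data = TensorDataset(torch.randint(0, vocab_size, (64, seq_len)))

# Mix the agent-specific datasets into a single training stream.
mixed = ConcatDataset([parser_data, planner_data])
loader = DataLoader(mixed, batch_size=8, shuffle=True)

# Toy student LM (embedding + linear head) used purely to show the objective;
# in practice the student would be an open-source LLM such as Qwen3-14B.
embed = torch.nn.Embedding(vocab_size, 64)
head = torch.nn.Linear(64, vocab_size)
optimizer = torch.optim.AdamW(list(embed.parameters()) + list(head.parameters()), lr=1e-4)

for (tokens,) in loader:
    logits = head(embed(tokens[:, :-1]))   # predict the next token at each position
    targets = tokens[:, 1:]                # shifted targets
    # Unified objective: negative log-likelihood over the mixed agent-specific data.
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```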
Empirical Evaluation
Offline Experiments
RecBot is evaluated on Amazon, MovieLens, and Taobao datasets across single-round, multi-round, and multi-round with interest drift scenarios. Metrics include Recall@N, NDCG@N, Condition Satisfaction Rate (CSR@N), Pass Rate (PR), and Average Rounds (AR).
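For reference, here is a minimal sketch of how the standard ranking metrics Recall@N and NDCG@N are computed for a single ranked list with binary relevance; CSR@N, PR, and AR depend on paper-specific definitions of condition satisfaction and dialogue rounds, so they are omitted here.

```python
import math
from typing import List, Set


def recall_at_n(ranked_items: List[str], relevant: Set[str], n: int) -> float:
    """Fraction of relevant items that appear in the top-N recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:n] if item in relevant)
    return hits / len(relevant)


def ndcg_at_n(ranked_items: List[str], relevant: Set[str], n: int) -> float:
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:n])
        if item in relevant
    )
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d", "e"]
ground_truth = {"b", "e"}
print(recall_at_n(ranked, ground_truth, 3), ndcg_at_n(ranked, ground_truth, 3))
```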
Online Experiments
A three-month A/B test on a large-scale e-commerce platform demonstrates:
- User Experience: 0.71% reduction in Negative Feedback Frequency (NFF), 0.88% increase in Exposed Item Category Diversity (EICD), and 1.44% increase in Clicked Item Category Diversity (CICD).
- Business Impact: 1.28% increase in Add-to-Cart (ATC) and 1.40% increase in Gross Merchandise Volume (GMV).
- User Group Analysis: Consistent NFF reductions across user segments, with the largest gains for users with moderate historical negative feedback.
Figure 5: Online performance curves during three-month A/B testing, showing RecBot's improvements over baseline.
Figure 6: Online performance improvements across user groups split by historical negative feedback frequency.
- Command Fulfillment: 88.9% success rate in satisfying user commands (human evaluation), 87.5% (LLM-Judge), with 96.5% consistency between methods.
Figure 7: Case study of RecBot on the production platform, demonstrating multi-round command fulfillment and adaptive policy adjustment.
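As a minimal sketch of how such paired evaluations are typically aggregated, the snippet below computes success rates and a simple human/LLM-Judge agreement rate from binary judgments; the variable names and toy labels are assumptions, since the paper's exact evaluation protocol is not reproduced in this summary.

```python
from typing import List


def success_rate(judgments: List[bool]) -> float:
    """Fraction of commands judged as fulfilled."""
    return sum(judgments) / len(judgments) if judgments else 0.0


def consistency(human: List[bool], llm_judge: List[bool]) -> float:
    """Agreement rate between human and LLM-Judge labels on the same cases."""
    assert len(human) == len(llm_judge)
    matches = sum(1 for h, j in zip(human, llm_judge) if h == j)
    return matches / len(human) if human else 0.0


# Toy example with 8 paired judgments (illustrative only).
human_labels = [True, True, False, True, True, True, False, True]
llm_labels = [True, True, False, True, False, True, False, True]
print(success_rate(human_labels), success_rate(llm_labels), consistency(human_labels, llm_labels))
```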
Theoretical and Practical Implications
The IRF paradigm and RecBot framework represent a significant advancement in interactive recommendation, enabling direct, fine-grained user control over recommendation policies via natural language. The explicit modeling of both positive and negative preferences, dynamic memory consolidation, and modular tool orchestration collectively address the limitations of passive feedback and ambiguous preference inference. The simulation-augmented distillation approach provides a scalable pathway for deploying high-performing, cost-effective agents in production environments.
From a theoretical perspective, the results challenge the assumption that larger teacher models are always superior, highlighting the potential for student models to achieve or exceed teacher performance via targeted knowledge transfer. The modular agentic architecture aligns with emerging trends in agentic AI and tool-based reasoning, suggesting future directions in self-evolving, lifelong interactive agents.
Future Directions
Key avenues for further research include:
- Online learning mechanisms for continuous agent evolution via real-time user feedback.
- Enhanced personalized reasoning and explanatory capabilities.
- Integration of proactive anticipation modules for next-generation interactive recommender systems.
- Exploration of agentic reinforced evolution and multi-agent collaboration for improved adaptability and robustness.
Conclusion
The Interactive Recommendation Feed paradigm and RecBot framework address fundamental limitations in recommender systems by enabling user-controllable, command-aware recommendation experiences. Extensive empirical validation demonstrates substantial improvements in user satisfaction, content diversity, and business outcomes. The modular, multi-agent architecture and simulation-augmented optimization provide a robust foundation for scalable, production-ready interactive recommendation systems, with broad implications for the future of agentic AI in personalized content delivery.