RecBot: Interactive Recommendation Agent

Updated 28 September 2025

RecBot is an interactive recommendation agent that moves beyond traditional passive recommendation by letting users issue explicit, real-time commands in natural language.
Its dual-agent system, consisting of a Parser Agent and a Planner Agent, translates these commands into structured preferences and adaptive recommendation pipelines.
RecBot employs simulation-augmented knowledge distillation for scalable deployment and reports measurable improvements in both offline and online evaluations.

RecBot is an interactive recommendation agent designed to fundamentally shift the paradigm of recommender systems from passive, implicit behavioral modeling to real-time, explicit, user-driven policy adjustment via natural language commands. The system operationalizes this paradigm through a dual-agent architecture (Parser Agent and Planner Agent), the Interactive Recommendation Feed (IRF), and a simulation-augmented knowledge distillation pipeline for scalable and efficient deployment. Below is a detailed examination of RecBot’s methodology, innovations, technical workflow, empirical evaluations, and future directions.

1. Interactive Recommendation Paradigm

RecBot is architected to accept natural language commands embedded directly within recommendation feeds, empowering users to actively express nuanced intentions and constraints. Unlike traditional systems, which rely on binary or coarse-grained implicit feedback (such as likes/dislikes or clicks), RecBot enables users to specify preferences and dislikes in free-form language—for example, “show me long skirts for autumn, but exclude floral patterns.” This modality directly addresses the persistent gap between user intentions and system interpretations found in classical approaches, which are limited in discerning which specific attributes underpin satisfaction or aversion (Tang et al., 25 Sep 2025).

The system’s communication protocol is iterative and conversational: new commands can be issued at any time, and RecBot responds by refreshing recommendations in real time, supporting multi-turn dialogue that preserves and adapts to historical context.

2. Dual-Agent System: Parser and Planner

RecBot’s technical workflow is underpinned by two coordinated agents:

Parser Agent: Processes natural language commands, extracting semantic preferences and translating them into structured representations that distinguish positive (desiderata) and negative (constraints) intent. The parser manages memory consolidation across dialogue turns, ensuring that both historical feedback and new commands are properly integrated into the preference structure. Formally, the parser outputs:

$\mathcal{P}_{t+1} = \{\mathcal{P}_{t+1}^+, \mathcal{P}_{t+1}^-\}$

where $\mathcal{P}_{t+1}^+$ collects positive preferences, covering both hard constraints (e.g., explicit price limits) and soft preferences (e.g., topic or genre), while $\mathcal{P}_{t+1}^-$ collects negative preferences, i.e., what the user wants to avoid.
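As a rough illustration of the structured output and of memory consolidation across turns, the sketch below models the preference state as two polarity buckets, each with hard and soft parts. The class names, the `consolidate` function, and the merge policy (newer hard constraints overwrite older ones, soft intents accumulate, and an intent that flips polarity leaves its old bucket) are assumptions made for illustration; the source does not specify how the Parser Agent resolves conflicts.

```python
from dataclasses import dataclass, field


@dataclass
class PreferenceSet:
    """One polarity of the parsed preferences (positive or negative)."""
    hard: dict = field(default_factory=dict)  # attribute -> constraint value
    soft: set = field(default_factory=set)    # free-text intents used for scoring


@dataclass
class Preferences:
    """Hypothetical container for P_t = {P_t^+, P_t^-}."""
    positive: PreferenceSet = field(default_factory=PreferenceSet)
    negative: PreferenceSet = field(default_factory=PreferenceSet)


def consolidate(previous: Preferences, update: Preferences) -> Preferences:
    """Merge a newly parsed command into the running preference state.

    Assumed policy (not taken from the source): newer hard constraints
    overwrite older ones on the same attribute, soft intents accumulate,
    and an intent that flips polarity is dropped from its old bucket.
    """
    return Preferences(
        positive=PreferenceSet(
            hard={**previous.positive.hard, **update.positive.hard},
            soft=(previous.positive.soft - update.negative.soft)
            | update.positive.soft,
        ),
        negative=PreferenceSet(
            hard={**previous.negative.hard, **update.negative.hard},
            soft=(previous.negative.soft - update.positive.soft)
            | update.negative.soft,
        ),
    )


# Turn t: the user previously asked for floral long skirts under a price cap.
p_t = Preferences(
    positive=PreferenceSet(hard={"max_price": 80}, soft={"long skirt", "floral"})
)
# Turn t+1: "show me long skirts for autumn, but exclude floral patterns".
parsed = Preferences(
    positive=PreferenceSet(soft={"long skirt", "autumn"}),
    negative=PreferenceSet(soft={"floral"}),
)
p_next = consolidate(p_t, parsed)
```

After the second turn, "floral" has moved from the positive to the negative bucket, while the earlier price cap is retained.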

Planner Agent: Orchestrates an adaptive chain of specialized “tools” (subsystems), each responsible for a critical aspect of recommendation curation:
- Filter Tool: Applies hard constraints/functions $C^+, C^-$ to prune infeasible candidates:

$\mathcal{I}' = \left\{ i \in \mathcal{I} : C^+(i, \mathcal{C}_{t+1}^{+, \text{hard}}) = 1 \wedge C^-(i, \mathcal{C}_{t+1}^{-, \text{hard}}) = 0 \right\}$

- Matcher Tool: Computes composite scores based on semantic ($s_{\text{sem}}$) and (possibly) collaborative filtering ($s_{\text{aia}}$) metrics:

$s_{\text{match}}(i) = \alpha \cdot s_{\text{sem}}(i, \mathcal{P}_{t+1}^+) + (1-\alpha) \cdot s_{\text{aia}}(i, \mathcal{P}_{t+1}^+, \mathcal{H}_t)$

- Attenuator Tool: Enforces negative feedback, typically by reducing item scores for matches to negative attributes:

$s_{\text{atten}}(i) = -\beta \cdot \mathrm{sim}(\mathrm{item}(i), \mathrm{intent}(\mathcal{P}_{t+1}^-))$

The final score used to rank candidates for the next feed $R_{t+1}$ is:

$s_{\text{final}}(i) = s_{\text{match}}(i) + s_{\text{atten}}(i)$
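The three tools and the final score can be sketched end to end as follows. This is a minimal sketch under stated assumptions rather than RecBot's implementation: items and intents are represented by small hand-written vectors, `sim` is taken to be cosine similarity, $s_{\text{aia}}$ is stubbed as a precomputed per-item lookup (`history_score`), and the hard constraints are a price cap and a banned category. The function names, item fields, and the values of `alpha` and `beta` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_tool(items, max_price=None, banned_categories=()):
    """Keep items that satisfy the positive hard constraint (price cap)
    and do not trigger the negative hard constraint (banned category)."""
    kept = []
    for item in items:
        satisfies_pos = max_price is None or item["price"] <= max_price
        triggers_neg = item["category"] in banned_categories
        if satisfies_pos and not triggers_neg:
            kept.append(item)
    return kept


def final_scores(items, pos_vec, neg_vec, history_score, alpha=0.7, beta=0.5):
    """Compute s_final = s_match + s_atten for every item."""
    scores = {}
    for item in items:
        s_sem = cosine(item["vec"], pos_vec)
        # Assumption: the history-aware score is available as a lookup table.
        s_aia = history_score.get(item["id"], 0.0)
        s_match = alpha * s_sem + (1 - alpha) * s_aia
        # Attenuation is non-positive when the similarity is non-negative.
        s_atten = -beta * cosine(item["vec"], neg_vec)
        scores[item["id"]] = s_match + s_atten
    return scores


catalog = [
    {"id": "A", "price": 40, "category": "skirt", "vec": [1.0, 0.0]},
    {"id": "B", "price": 60, "category": "skirt", "vec": [0.6, 0.8]},
    {"id": "C", "price": 120, "category": "skirt", "vec": [1.0, 0.0]},
    {"id": "D", "price": 30, "category": "shorts", "vec": [1.0, 0.0]},
]

feasible = filter_tool(catalog, max_price=80, banned_categories={"shorts"})
scores = final_scores(
    feasible,
    pos_vec=[1.0, 0.0],   # stands in for the positive intent embedding
    neg_vec=[0.0, 1.0],   # stands in for the negative intent embedding
    history_score={"A": 0.2, "B": 0.9},
)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy run, items C and D are removed by the Filter Tool. Item B has the stronger history-based score, but its overlap with the negative intent pulls it below item A in the final ranking.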

3. Interactive Recommendation Feed (IRF)

The IRF is a user interface embedded within the mainstream recommendation feed, allowing users to issue commands without interrupting their browsing. It supports rapid, natural multi-turn interaction, so the recommendation policy can be updated as soon as a user states an intent. This design decouples RecBot from the rigid feedback cycles of conventional models, enabling real-time, attribute-aware content adjustment and improving explainability and user trust.

Key system behaviors include:

- Dynamic constraint satisfaction (both hard rules and ranked soft preferences).
- Iterative feed updates as commands arrive, maintaining session context.
- Negative feedback attenuation, which is critical for suppressing undesired content on the fly.

4. Efficient Model Deployment: Simulation-Augmented Knowledge Distillation

For real-world scalability, RecBot employs a simulation-augmented knowledge distillation mechanism. LLMs (e.g., GPT-4.1) serve as teacher agents generating realistic, multi-turn interaction data via a simulated user environment. These trajectories then supervise the training of a smaller, more efficient “student” agent capable of deployment at scale.

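The training objective, applied to the combined parser and planner data $\mathcal{D}_{\mathrm{Mixed}}$, is the standard next-token prediction loss:

$\mathcal{L}(\theta) = \sum_{(x, y) \in \mathcal{D}_{\mathrm{Mixed}}} \sum_j -\log P_\theta(y_j \mid x, y_{<j})$

where $x$ is the input context, $y$ is the teacher-generated target sequence, and $y_{<j}$ denotes the target tokens preceding position $j$.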

This training consolidates reasoning and parsing into a deployable model, preserving high-level reasoning accuracy while permitting efficient inference.
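To make the objective concrete, the sketch below evaluates the summed negative log-likelihood over a tiny mixed dataset. It is illustrative only: `prob_fn` is a hypothetical stand-in for the student model's conditional distribution $P_\theta$, here a uniform distribution over a four-token vocabulary, and the example token sequences are invented.

```python
import math


def next_token_loss(dataset, prob_fn):
    """Sum of -log P(y_j | x, y_<j) over every target token in the dataset."""
    total = 0.0
    for prompt, target in dataset:
        for j, token in enumerate(target):
            # Probability of the j-th target token given the prompt and prefix.
            p = prob_fn(prompt, target[:j], token)
            total += -math.log(p)
    return total


def uniform_model(prompt, prefix, token, vocab_size=4):
    """Hypothetical stand-in for P_theta: ignores context, uniform over the vocab."""
    return 1.0 / vocab_size


# Mixed data: one parser-style example and one planner-style example.
mixed = [
    (["show", "skirts"], ["pos", "skirt"]),
    (["plan"], ["filter", "match", "attenuate"]),
]

loss = next_token_loss(mixed, uniform_model)
print(loss)
```

With five target tokens and a uniform distribution over four tokens, the loss equals $5 \log 4$. A trained student would assign higher probability to the teacher's tokens and so reach a lower value.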

5. Empirical Evaluation

Offline Results

Comprehensive benchmarks on datasets such as Amazon, MovieLens, and Taobao evaluate RecBot in single-round, multi-round, and drift scenarios. Metrics include Recall@K, NDCG@K, Condition Satisfaction Rate, Pass Rate, and Average Rounds. RecBot consistently outperforms both standard sequential recommenders (e.g., SASRec, BERT4Rec) and contemporary command-aware architectures. Ablation tests confirm that optimal performance is achieved only with full pipeline integration—dynamic parsing, planning, tool chaining, and adaptive memory consolidation.
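For reference, two of the ranking metrics named above can be computed as in the sketch below. It uses the common binary-relevance definitions of Recall@K and NDCG@K; the exact variants used in the RecBot evaluation are not given here, so treat this as an assumption. The function names and sample data are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


ranked_items = ["a", "b", "c", "d"]
relevant_items = {"b", "d"}

recall = recall_at_k(ranked_items, relevant_items, k=3)
ndcg = ndcg_at_k(ranked_items, relevant_items, k=3)
print(recall, ndcg)
```

Here only one of the two relevant items falls in the top three, so Recall@3 is 0.5, and NDCG@3 is lower still because that hit sits at rank two rather than rank one.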

Online Results

A large-scale A/B test on a major e-commerce homepage over three months demonstrates the business and user engagement benefits:

- Negative feedback frequency is reduced by 0.71%.
- Category diversity among clicked items increases by 1.44%.
- There are measurable gains in page views, add-to-cart ratios, and Gross Merchandise Volume.

These results indicate not only improved satisfaction but also the systemic effect of increased exposure diversity and user empowerment.

6. Limitations, Extensions, and Future Directions

The RecBot paradigm foregrounds several research directions:

- Continuous, Online Learning: Future developments may integrate incremental learning components for real-time preference adaptation as new feedback arrives.
- Multi-modal User Command Integration: Adding support for image/voice commands to further naturalize interaction.
- Recommendation Explainability: Endowing RecBot with capabilities to clarify why items were filtered or prioritized, further bridging the interpretability gap.
- Proactive Intent Inference: While RecBot currently relies on explicit commands, extensions could combine proactive intent inference with explicit user steering for fully hybrid, controllable recommendation.
- Scalability and Transferability: Knowledge distillation and modular tool orchestration are foundational for deploying RecBot across domains (e.g., video, news, commerce) while controlling inference cost.

7. Summary Table: RecBot Innovations and Contributions

Component	Functionality	Significance
Parser Agent	Language-to-preference mapping	Extracts structured, attribute-level intent
Planner Agent	Policy orchestration (tool chain)	Manages candidate scoring, filtering, negative feedback
Interactive Feed	Embedded natural language input	Facilitates iterative interaction within main feed
Knowledge Distillation	Efficient reasoning deployment	Scales LLM capabilities to real-time systems

RecBot exemplifies the progression from passive, hard-to-interpret recommenders toward user-controlled, linguistically expressive, transparent, and adaptive systems. Its dual-agent framework, IRF, and simulation-augmented training produce significant improvements in both technical performance and user/business outcomes, as verified by comprehensive offline and online evaluations (Tang et al., 25 Sep 2025).


References (1)

Tang et al. (2025). Interactive Recommendation Agent with Active User Commands.
