RecBot Framework: Dual-Agent Recommender
- RecBot Framework is a dual-agent architecture that translates natural language commands into structured user preferences for real-time recommendation control.
- It integrates LLM-enhanced intent parsing, dynamic tool-chain orchestration, and simulation-augmented knowledge distillation to optimize both user experience and business metrics.
- Empirical results show significant improvements in recall, conversion, and user satisfaction by enabling explicit user influence over recommendation feeds.
The RecBot Framework is a dual-agent conversational recommendation architecture specifically designed to enable explicit, natural language user control over real-time recommendation policies within mainstream recommendation feeds. Distinct from traditional recommender systems, which rely primarily on passive, implicit feedback signals, RecBot allows users to influence item selection instantly by expressing constraints, preferences, and exclusions through unconstrained linguistic commands. The architecture integrates LLM-enhanced intent parsing, tool-chain orchestration for policy adaptation, and simulation-augmented knowledge distillation for efficient deployment, demonstrating significant improvements in user satisfaction and business metrics over conventional approaches (Tang et al., 25 Sep 2025).
1. Dual-Agent Architecture and Workflow
RecBot operates through the coordinated action of two agents: a Parser Agent and a Planner Agent. The Parser Agent ingests the current recommendation feed, the raw natural language command, and the previous preference memory, and uses LLM reasoning and dynamic memory consolidation to convert free-form user feedback into structured, domain-specific preference representations.
The parsing process decomposes preferences into:
- Positive preferences: Encompassing both hard, rule-based constraints (e.g., "below \$200") and soft, semantic preferences (e.g., "prefer romantic movies").
- Negative preferences: Distinguishing hard exclusions (e.g., "not floral pattern") from soft disinclinations.
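Together, these four components (hard and soft positives, hard and soft negatives) make up the structured preference passed downstream.
The Planner Agent receives this structured preference alongside the user's implicit behavioral history and orchestrates an adaptive tool chain: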
- Filter Tool: Prunes the candidate set to satisfy positive hard constraints and to exclude items matching negative hard constraints.
- Matcher Tool: Computes relevance via two parallel mechanisms:
  - Semantic similarity: Cosine similarity between item descriptions and positive intent, using contextual embedding models (e.g., Sentence-BERT, BGE).
  - Active-Intent-Aware Collaborative Filtering: Captures personalized sequential patterns using multi-head self-attention (MHSA) and multi-head cross-attention (MHCA) between multimodal user history and parsed intent.
- Attenuator Tool: Applies semantic penalties to items resembling the user's negative preferences.
- Aggregator: Combines the Matcher and Attenuator scores into a final ranking score, with weighting parameters that balance the two matching signals and set the strength of attenuation.
This agent-driven pipeline allows RecBot to continuously update the top-ranked feed in direct response to explicit user commands.
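The Filter, Matcher, Attenuator, and Aggregator steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the bag-of-words `embed` function is a toy stand-in for Sentence-BERT or BGE, the collaborative-filtering scores are passed in as a precomputed dictionary rather than produced by an attention model, and the linear combination with hypothetical weights `alpha` and `beta` is one plausible aggregation rule.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    description: str
    price: float


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stands in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(items, max_price, positive_text, negative_text, cf_scores,
              alpha=0.5, beta=1.0, top_n=2):
    # Filter step: enforce the hard constraint (here, a price ceiling).
    candidates = [it for it in items if it.price <= max_price]
    pos, neg = embed(positive_text), embed(negative_text)
    scored = []
    for it in candidates:
        desc = embed(it.description)
        # Matcher step: blend semantic similarity with a collaborative score.
        match = alpha * cosine(desc, pos) + (1 - alpha) * cf_scores.get(it.item_id, 0.0)
        # Attenuator step: penalize similarity to the negative preference.
        penalty = beta * cosine(desc, neg)
        # Aggregator step: final score is match minus penalty (assumed form).
        scored.append((match - penalty, it.item_id))
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_n]]


catalog = [
    Item("a", "blue cotton dress", 80.0),
    Item("b", "blue sleeveless dress", 60.0),
    Item("c", "blue silk dress", 250.0),
    Item("d", "red wool coat", 90.0),
]
feed = recommend(catalog, max_price=100.0, positive_text="blue dress",
                 negative_text="sleeveless",
                 cf_scores={"a": 0.2, "b": 0.2, "d": 0.9})
print(feed)
```

In this toy run, item `c` is removed by the price filter and item `b` is pushed down by the penalty for resembling the excluded style, so hard constraints and soft penalties act at different stages.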
2. Active User Commands and Semantic Parsing
The framework facilitates direct, granular user control. Users may issue commands such as, "Show me blue dresses under \$100, exclude sleeveless styles," which the Parser Agent interprets into hard inclusion, soft inclusion, hard exclusion, and soft exclusion signals. The parsing process combines LLM semantic inference with domain-specific vocabulary mapping to extract and separate complex multi-attribute constraints from unconstrained natural language. Redundant or noisy feedback is handled robustly, and multi-constraint queries are disambiguated and split into atomic elements for downstream enforcement and matching.
This shift from implicit, behavior-driven feedback to explicit command-based control improves model interpretability and user alignment, and avoids misattributing the motivation behind clicks or dislikes.
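To make the parsed output concrete, here is a minimal sketch of a structured preference for the example command about blue dresses. The class names, the string encoding of constraints, and the hard/soft assignment of each phrase are all illustrative assumptions; in RecBot the parse is produced by an LLM, which this sketch replaces with hand-written values. The `consolidate` helper is a hypothetical stand-in for memory consolidation, implemented as a simple order-preserving union.

```python
from dataclasses import dataclass, field


@dataclass
class PreferenceSet:
    hard: list[str] = field(default_factory=list)  # rule-based constraints
    soft: list[str] = field(default_factory=list)  # semantic preferences


@dataclass
class StructuredPreference:
    positive: PreferenceSet = field(default_factory=PreferenceSet)
    negative: PreferenceSet = field(default_factory=PreferenceSet)


def _union(old: list[str], new: list[str]) -> list[str]:
    """Append entries from `new` that are not already in `old`, keeping order."""
    return old + [x for x in new if x not in old]


def consolidate(memory: StructuredPreference,
                update: StructuredPreference) -> StructuredPreference:
    """Merge a newly parsed preference into the stored memory (assumed rule)."""
    return StructuredPreference(
        positive=PreferenceSet(_union(memory.positive.hard, update.positive.hard),
                               _union(memory.positive.soft, update.positive.soft)),
        negative=PreferenceSet(_union(memory.negative.hard, update.negative.hard),
                               _union(memory.negative.soft, update.negative.soft)),
    )


# Preference memory from an earlier turn (hypothetical).
memory = StructuredPreference(positive=PreferenceSet(soft=["blue dresses"]))

# Hand-written parse of the example command (illustrative assignment).
parsed = StructuredPreference(
    positive=PreferenceSet(hard=["price < 100"], soft=["blue dresses"]),
    negative=PreferenceSet(hard=["sleeveless"]),
)

merged = consolidate(memory, parsed)
print(merged)
```

Keeping hard and soft entries in separate fields lets a downstream filter consume only the hard lists while a semantic matcher consumes the soft ones.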
3. Adaptive Tool-Chain Orchestration and Scoring
Upon receiving the structured intent, the Planner Agent invokes a dynamic tool chain:
- The Filter Tool enforces strict attribute adherence (e.g., price, date, category exclusion).
- The Matcher Tool computes semantic relevance by embedding both item descriptions and structured intents, using pre-trained transformers, and assesses collaborative signals through attention mechanisms over multimodal historical records.
- The Attenuator Tool penalizes candidates with high semantic similarity to expressed exclusions.
- The Aggregator synthesizes outputs from all tools to yield the ranked feed.
Adjustable weighting parameters for the match modes and for attenuation provide fine-grained control over the balance between explicit command satisfaction and collaborative historical personalization.
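One plausible way to write the weighted aggregation is shown below. The symbols are notation assumed here for illustration, not recovered from the paper: $s_{\text{sem}}$ is the semantic similarity score, $s_{\text{cf}}$ the collaborative filtering score, $s_{\text{neg}}$ the similarity to negative preferences, and $\alpha \in [0, 1]$ and $\beta \ge 0$ the hypothetical weights.

```latex
s_{\text{match}}(i) = \alpha \, s_{\text{sem}}(i) + (1 - \alpha) \, s_{\text{cf}}(i),
\qquad
s_{\text{final}}(i) = s_{\text{match}}(i) - \beta \, s_{\text{neg}}(i)
```

As a worked example under these assumptions, an item with $s_{\text{sem}} = 0.8$, $s_{\text{cf}} = 0.2$, and $s_{\text{neg}} = 0.6$ scores $s_{\text{match}} = 0.5$ at $\alpha = 0.5$, and $s_{\text{final}} = 0.5 - 0.6 = -0.1$ at $\beta = 1$. Raising $\alpha$ favors the explicit command, while raising $\beta$ suppresses items close to the stated exclusions.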
This agent composition enables the RecBot framework to adapt recommendation policies in real time in response to the full spectrum of user linguistic commands, mitigating delayed model updates and information cocooning.
4. Simulation-Augmented Knowledge Distillation
Practical RecBot deployment faces challenges from the inference cost and latency of large, closed-source models. To address this, the framework implements simulation-augmented knowledge distillation:
- User Simulation Agent: Generates multi-turn interaction trajectories, simulating realistic feedback, target setting, and interest drift scenarios.
- Teacher-Student Training: A powerful teacher agent (e.g., GPT-4.1-powered RecBot) interacts with simulated users, generating training trajectories. Lightweight, open-source student agents are then trained to reproduce both parsing and planning behaviors through a next-token prediction objective, a cross-entropy loss computed over a training set that aggregates samples from both the Parser and Planner pipelines.
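The next-token objective can be written in its standard form as follows. The notation is assumed here: $\mathcal{D}$ is the combined set of Parser and Planner samples, $x$ an input context, $y$ the teacher's output token sequence, and $\theta$ the student's parameters. The averaging over $|\mathcal{D}|$ is a common convention and may differ from the paper's normalization.

```latex
\mathcal{L}(\theta) = -\frac{1}{|\mathcal{D}|}
\sum_{(x, y) \in \mathcal{D}} \sum_{t=1}^{|y|}
\log p_\theta\left(y_t \mid x, y_{<t}\right)
```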
This process retains high-fidelity command understanding while curtailing deployment resource requirements, enabling scale-up to industrial settings.
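The trajectory-generation loop described in this section can be sketched as below. Everything here is hypothetical scaffolding: `ToyUser` and `ToyTeacher` are trivial stand-ins for the LLM-based user simulator and teacher agent, and the stopping rule (stop once the target item appears in the feed, or after a round limit) is an assumed protocol.

```python
class ToyUser:
    """Stand-in simulated user with a single target item (assumption)."""

    def __init__(self, target):
        self.target = target

    def is_satisfied(self, feed):
        return self.target in feed

    def next_command(self, feed):
        return f"show me something closer to {self.target}"


class ToyTeacher:
    """Stand-in teacher that pages through a catalog (assumption)."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.cursor = 0

    def respond(self, feed, command):
        self.cursor += len(feed)
        return self.catalog[self.cursor:self.cursor + len(feed)]


def collect_trajectory(user, teacher, initial_feed, max_rounds=5):
    """Record (feed, command, response) samples until success or the limit."""
    samples = []
    feed = initial_feed
    for _ in range(max_rounds):
        if user.is_satisfied(feed):
            return samples, True
        command = user.next_command(feed)
        new_feed = teacher.respond(feed, command)
        # Each sample pairs the teacher's input with its output for distillation.
        samples.append({"feed": feed, "command": command, "response": new_feed})
        feed = new_feed
    return samples, user.is_satisfied(feed)


samples, passed = collect_trajectory(
    ToyUser("e"), ToyTeacher(["a", "b", "c", "d", "e", "f"]), ["a", "b"])
print(len(samples), passed)
```

Each recorded sample would become one supervised example for the student, and the number of rounds and the success flag are the raw quantities from which round-count and pass-rate style metrics could be derived.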
5. Empirical Performance and Evaluation
RecBot demonstrates robust improvements across both offline and online evaluations:
- Offline benchmarks: On datasets such as Amazon, MovieLens, and Taobao, RecBot-GPT substantially surpasses baselines (e.g., BM25, SASRec, BERT4Rec) in Recall@N, NDCG@N, Condition Satisfaction Rate (CSR@N), Pass Rate (PR), and Average Rounds (AR). Multi-round tasks exhibit lower AR and higher convergence rates (e.g., on the MR task on Taobao, RecBot-GPT achieves a 41.14% pass rate at about 4.28 rounds).
- Online A/B tests: Over three months in a commercial e-commerce environment, users exposed to RecBot exhibited a 0.71% reduction in Negative Feedback Frequency (NFF), a 1.44% increase in Clicked Item Category Diversity (CICD), and gains in Page Views (PV), Add-to-Cart (ATC), and Gross Merchandise Volume (GMV). These improvements indicate the business impact of adopting explicit command-based control within mainstream recommendation feeds.
Empirical findings are visualized in metric trajectories (cf. Fig. 7 in the paper) that confirm the sustained user engagement and conversion benefits of RecBot-enabled interaction.
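For readers unfamiliar with the offline ranking metrics named above, the following sketch computes Recall@N and NDCG@N using their standard binary-relevance definitions. The paper's exact evaluation protocol (for example, how ground-truth items are chosen per round) is not specified here and may differ.

```python
import math


def recall_at_n(ranked, relevant, n):
    """Fraction of relevant items that appear in the top-n of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant)


def ndcg_at_n(ranked, relevant, n):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked[:n]) if item in relevant)
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
print(recall_at_n(ranked, relevant, 3))
print(ndcg_at_n(ranked, relevant, 3))
```

Recall@N only counts whether relevant items reach the top N, whereas NDCG@N also rewards placing them nearer the top through the logarithmic discount.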
6. Practical Implications and Future Directions
RecBot’s ability to integrate into production feeds (e.g., e-commerce, media streaming) affords users context-preserving, conversational control over recommendations. The explicit separation and enforcement of positive and negative constraints counteract filter bubble formation and information cocooning, yielding both greater personalization accuracy and item/domain diversity.
This suggests potential for broader adoption of dual-agent, explicit-intent frameworks in domains where fine-grained user preference specification is critical. RecBot’s multi-agent, simulation-distilled structure sets a precedent for further research into proactive conversational recommendation, continuous online learning, and self-evolving agent architectures. A plausible implication is the future emergence of systems wherein user, agent, and environment co-adapt interactively, further minimizing the gap between stated intent and system interpretation.
7. Limitations and Availability
The RecBot Framework, as documented in the cited paper, provides a production-ready architecture validated on both simulated and real-world platforms. Limitations may include dependency on large pre-trained models for initial training, domain-specific adaptation of vocabulary and attribute mappings, and the requirement for tool-chain orchestration engineering. The framework’s simulation-based distillation strategy mitigates but does not eliminate reliance on high-resource teacher agents during initial optimization.
No public codebase is mentioned for RecBot itself in the source material; industrial deployments and open-source reference implementations would further extend reproducibility and domain transferability.
RecBot represents a rigorous advancement in conversational recommendation systems, introducing a dual-agent architecture for real-time policy adaptation through explicit user commands, simulation-driven optimization for scalable deployment, and empirically validated improvements in both user satisfaction and business key performance indicators (Tang et al., 25 Sep 2025).