RecBot Framework: Dual-Agent Recommender

Updated 29 September 2025
  • RecBot Framework is a dual-agent architecture that translates natural language commands into structured user preferences for real-time recommendation control.
  • It integrates LLM-enhanced intent parsing, dynamic tool-chain orchestration, and simulation-augmented knowledge distillation to optimize both user experience and business metrics.
  • Empirical results show significant improvements in recall, conversion, and user satisfaction by enabling explicit user influence over recommendation feeds.

The RecBot Framework is a dual-agent conversational recommendation architecture specifically designed to enable explicit, natural language user control over real-time recommendation policies within mainstream recommendation feeds. Distinct from traditional recommender systems, which rely primarily on passive, implicit feedback signals, RecBot allows users to influence item selection instantly by expressing constraints, preferences, and exclusions through unconstrained linguistic commands. The architecture integrates LLM-enhanced intent parsing, tool-chain orchestration for policy adaptation, and simulation-augmented knowledge distillation for efficient deployment, demonstrating significant improvements in user satisfaction and business metrics over conventional approaches (Tang et al., 25 Sep 2025).

1. Dual-Agent Architecture and Workflow

RecBot operates through the coordinated action of two agents: a Parser Agent and a Planner Agent. The Parser Agent ingests the current recommendation feed $R_t$, the raw natural language command $c_t$, and the previous preference memory $P_t$, and uses LLM reasoning with dynamic memory consolidation to translate free-form user feedback into a structured, domain-specific preference representation $P_{t+1}$.

The parsing process decomposes preferences into:

  • Positive preferences ($P_{t+1}^{+}$): Encompassing both hard, rule-based constraints (e.g., "below \$200") and soft, semantic preferences (e.g., "prefer romantic movies").
  • Negative preferences ($P_{t+1}^{-}$): Distinguishing hard exclusions (e.g., "not floral pattern") from soft disinclinations.

Formally,

$$P_{t+1} = \mathcal{P}(R_t, c_t, P_t) = \{ P_{t+1}^{+},\; P_{t+1}^{-} \}$$
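As a rough illustration of what such a structured representation might look like in code, the following Python sketch defines a container with a positive and a negative side, each split into hard and soft entries. The class names, field names, and the `merge` helper are hypothetical and do not come from the paper. In particular, `merge` is a naive order-preserving union standing in for the LLM-driven memory consolidation, whose details the source does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class PreferenceSet:
    """One polarity of parsed preferences (illustrative container)."""
    hard: list[str] = field(default_factory=list)  # rule-like constraints
    soft: list[str] = field(default_factory=list)  # semantic leanings


@dataclass
class Preferences:
    """Structured preference memory with a positive and a negative side."""
    positive: PreferenceSet = field(default_factory=PreferenceSet)
    negative: PreferenceSet = field(default_factory=PreferenceSet)


def _union(old: list[str], new: list[str]) -> list[str]:
    # Order-preserving de-duplication: earlier entries keep their position.
    return list(dict.fromkeys(old + new))


def merge(previous: Preferences, parsed: Preferences) -> Preferences:
    """Assumed stand-in for memory consolidation: union old and new entries."""
    return Preferences(
        positive=PreferenceSet(
            hard=_union(previous.positive.hard, parsed.positive.hard),
            soft=_union(previous.positive.soft, parsed.positive.soft),
        ),
        negative=PreferenceSet(
            hard=_union(previous.negative.hard, parsed.negative.hard),
            soft=_union(previous.negative.soft, parsed.negative.soft),
        ),
    )


# Previous memory plus the preferences parsed from a new command.
memory = Preferences(positive=PreferenceSet(hard=["price < 200"]))
parsed = Preferences(
    positive=PreferenceSet(soft=["romantic movies"]),
    negative=PreferenceSet(hard=["floral pattern"]),
)
updated = merge(memory, parsed)
```

A real consolidation step would also need to resolve contradictions (for example, a new command that retracts an earlier exclusion), which a plain union cannot do.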

The Planner Agent receives $P_{t+1}$ alongside the implicit behavioral history $H_t$ and orchestrates an adaptive tool-chain:

  • Filter Tool: Prunes the candidate set $I$ so that the remaining items satisfy positive hard constraints and violate no negative hard constraints.
  • Matcher Tool: Computes relevance via two parallel mechanisms:
    • Semantic similarity ($s_{\text{sem}}$): Cosine similarity between item descriptions and positive intent, using contextual embedding models (e.g., Sentence-BERT, BGE).
    • Active-Intent-Aware Collaborative Filtering ($S_{\text{aia}}$): Captures personalized sequential patterns using multi-head self-attention (MHSA) and multi-head cross-attention (MHCA) between multimodal user history and parsed intent.
  • Attenuator Tool: Applies semantic penalties to items resembling the user's negative instructions.
  • Aggregator: Combines Matcher and Attenuator scores into the final ranking:

$$s_{\text{final}}(i) = s_{\text{match}}(i) + s_{\text{atten}}(i)$$

with

$$s_{\text{match}}(i) = \alpha \cdot s_{\text{sem}}(i, P_{t+1}^{+}) + (1-\alpha) \cdot S_{\text{aia}}(i, P_{t+1}^{+}, H_t)$$

$$s_{\text{atten}}(i) = -\beta \cdot \mathrm{sim}(e_{\text{item}}(i), e(P_{t+1}^{-}))$$

This agent-driven pipeline allows RecBot to continuously update the top-$K$ feed $R_{t+1}$ in direct response to explicit user commands, as formalized by:

$$R_{t+1} = \mathrm{TopK}(\{ s_{\text{final}}(i) \mid i \in I \})$$
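The scoring and top-K selection above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: embeddings are plain lists of floats, the attenuation similarity is assumed to be cosine similarity, the collaborative score is passed in as a precomputed number (in RecBot it comes from a learned attention model), and the function names and default weights are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def final_score(item_vec, pos_vec, neg_vec, s_aia, alpha=0.5, beta=0.5):
    """Weighted match score plus a negative attenuation term."""
    s_sem = cosine(item_vec, pos_vec)
    s_match = alpha * s_sem + (1 - alpha) * s_aia
    s_atten = -beta * cosine(item_vec, neg_vec)  # assumption: sim is cosine
    return s_match + s_atten


def top_k(scores, k):
    """Return the k item ids with the highest final scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy 2-D embeddings: axis 0 ~ positive intent, axis 1 ~ negative intent.
pos_vec, neg_vec = [1.0, 0.0], [0.0, 1.0]
items = {
    "a": ([1.0, 0.0], 0.2),  # (item embedding, precomputed collaborative score)
    "b": ([0.0, 1.0], 0.9),
    "c": ([1.0, 1.0], 0.5),
}
scores = {
    item_id: final_score(vec, pos_vec, neg_vec, s_aia)
    for item_id, (vec, s_aia) in items.items()
}
feed = top_k(scores, k=2)
```

In this toy setup, item "b" has the strongest collaborative score but coincides with the negative intent, so the attenuation term pushes it below the other two items.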

2. Active User Commands and Semantic Parsing

The framework facilitates direct, granular user control. Users may issue commands such as "Show me blue dresses under \$100, exclude sleeveless styles," which the Parser Agent maps onto hard inclusion, soft inclusion, hard exclusion, and soft exclusion signals. Combining LLM semantic inference with domain-specific vocabulary mapping, the parser extracts and separates complex multi-attribute constraints from unconstrained natural language. Redundant or noisy feedback is handled robustly, and multi-constraint queries are disambiguated and split into atomic elements for downstream enforcement and matching.

This shift from implicit, behavior-driven feedback to explicit command-based control improves model interpretability and user alignment, and it avoids misattributing the motivation behind clicks or dislikes.

3. Adaptive Tool-Chain Orchestration and Scoring

Upon receiving the structured intent, the Planner Agent invokes a dynamic tool chain:

  • The Filter Tool enforces strict attribute adherence (e.g., price, date, category exclusion).
  • The Matcher Tool computes semantic relevance by embedding both item descriptions and structured intents, using pre-trained transformers, and assesses collaborative signals through attention mechanisms over multimodal historical records.
  • The Attenuator Tool penalizes candidates with high semantic similarity to expressed exclusions.
  • The Aggregator synthesizes outputs from all tools to yield the ranked feed.

Adjustable weighting parameters ($\alpha$ for match modes, $\beta$ for attenuation) provide fine-grained control over the balance between explicit command satisfaction and collaborative historical personalization.

This agent composition enables the RecBot framework to adapt recommendation policies in real time in response to the full spectrum of user linguistic commands, mitigating delayed model updates and information cocooning.
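To make the Filter Tool's role concrete, here is a small Python sketch of hard-constraint filtering over a toy catalog, using the "blue dresses under \$100, exclude sleeveless styles" command as the running example. The item schema, predicate functions, and `filter_candidates` helper are all hypothetical. Treating the price cap and the sleeveless exclusion as hard constraints while leaving color to the Matcher Tool as a soft preference is an assumption made for illustration.

```python
def filter_candidates(items, must_satisfy, must_not_satisfy):
    """Keep items that pass every positive hard predicate and no negative one."""
    return [
        item
        for item in items
        if all(pred(item) for pred in must_satisfy)
        and not any(pred(item) for pred in must_not_satisfy)
    ]


def under_100(item):
    # Positive hard constraint: price strictly below 100.
    return item["price"] < 100


def is_sleeveless(item):
    # Negative hard constraint: sleeveless items are excluded.
    return item["sleeveless"]


catalog = [
    {"id": "d1", "color": "blue", "price": 80, "sleeveless": False},
    {"id": "d2", "color": "blue", "price": 120, "sleeveless": False},
    {"id": "d3", "color": "blue", "price": 60, "sleeveless": True},
    {"id": "d4", "color": "red", "price": 50, "sleeveless": False},
]

kept = filter_candidates(catalog, [under_100], [is_sleeveless])
kept_ids = [item["id"] for item in kept]
```

Only the items that survive this step would be passed on for matching and attenuation. The red dress remains a candidate here because color is handled as a soft signal in this sketch.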

4. Simulation-Augmented Knowledge Distillation

Practical RecBot deployment faces challenges from the inference cost and latency of large, closed-source models. To address this, the framework implements simulation-augmented knowledge distillation:

  • User Simulation Agent ($\mathcal{U}_{\text{sim}}$): Generates multi-turn interaction trajectories, simulating realistic feedback, target setting, and interest drift scenarios.
  • Teacher-Student Training: A powerful teacher agent (e.g., GPT-4.1-powered RecBot) interacts with simulated users, generating training trajectories. Lightweight, open-source student agents distill both parsing and planning behaviors through a next-token prediction objective, with the cross-entropy loss formulated as:

$$\mathcal{L}(\theta) = \sum_{(x, y) \in \mathcal{D}_{\text{Mixed}}} \sum_{j} -\log P_\theta(y_j \mid x, y_{<j})$$

where $\mathcal{D}_{\text{Mixed}}$ aggregates samples from both Parser and Planner pipelines.
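The loss is a sum of per-token negative log-likelihoods over all teacher trajectories. The Python sketch below evaluates it on toy numbers. It assumes, purely for illustration, that the student's probability for each reference token is already available as a float. In practice these values would come from a model's softmax output inside a training framework, and the function names here are hypothetical.

```python
import math


def sequence_nll(token_probs):
    """Negative log-likelihood of one target sequence.

    token_probs[j] is the student's probability of the j-th reference token
    given the input and all earlier reference tokens.
    """
    return -sum(math.log(p) for p in token_probs)


def distillation_loss(mixed_dataset):
    """Sum the sequence-level losses over the mixed Parser/Planner samples."""
    return sum(sequence_nll(probs) for probs in mixed_dataset)


# Two toy samples: one with two target tokens, one with a single token.
mixed = [
    [0.5, 0.5],  # e.g. a Parser trajectory
    [0.25],      # e.g. a Planner trajectory
]
loss = distillation_loss(mixed)  # equals 4 * ln(2)
```

Because the objective is a plain sum, longer trajectories contribute more to the total. Whether the paper normalizes by token count is not stated in the source.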

This process retains high-fidelity command understanding while curtailing deployment resource requirements, enabling scale-up to industrial settings.

5. Empirical Performance and Evaluation

RecBot demonstrates robust improvements across both offline and online evaluations:

  • Offline benchmarks: On datasets such as Amazon, MovieLens, and Taobao, RecBot-GPT substantially surpasses baselines (e.g., BM25, SASRec, BERT4Rec) in Recall@N, NDCG@N, Condition Satisfaction Rate (CSR@N), Pass Rate (PR), and Average Rounds (AR). Multi-round tasks exhibit lower AR and higher convergence rates (e.g., on the Taobao MR task, RecBot-GPT achieves a 41.14% pass rate at ~4.28 rounds).
  • Online A/B tests: Over three months in a commercial e-commerce environment, users exposed to RecBot exhibited a 0.71% reduction in Negative Feedback Frequency (NFF), a 1.44% increase in Clicked Item Category Diversity (CICD), and gains in Page Views (PV), Add-to-Cart (ATC), and Gross Merchandise Volume (GMV). These improvements indicate the business impact of adopting explicit command-based control within mainstream recommendation feeds.
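For readers unfamiliar with the ranking metrics named above, the following Python sketch computes Recall@N and NDCG@N under their common binary-relevance definitions. It is an assumption that the paper uses these exact variants, since the source does not spell out its evaluation protocol. The task-specific metrics (CSR, PR, AR) are not covered because their definitions are not given here.

```python
import math


def recall_at_n(ranked, relevant, n):
    """Fraction of relevant items that appear in the top-n of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant)


def ndcg_at_n(ranked, relevant, n):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, item in enumerate(ranked[:n])
        if item in relevant
    )
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
recall = recall_at_n(ranked, relevant, n=3)  # only "b" is in the top 3
ndcg = ndcg_at_n(ranked, relevant, n=3)
```

Recall@N ignores where in the top-N a hit occurs, whereas NDCG@N rewards hits placed nearer the top, which is why the two are usually reported together.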

Empirical findings are visualized in metric trajectories (cf. Fig. 7 in the paper) that confirm the sustained user engagement and conversion benefits of RecBot-enabled interaction.

6. Practical Implications and Future Directions

RecBot’s ability to integrate into production feeds (e.g., e-commerce, media streaming) affords users context-preserving, conversational control over recommendations. The explicit separation and enforcement of positive and negative constraints counteract filter bubble formation and information cocooning, yielding both greater personalization accuracy and item/domain diversity.

This suggests potential for broader adoption of dual-agent, explicit-intent frameworks in domains where fine-grained user preference specification is critical. RecBot’s multi-agent, simulation-distilled structure sets a precedent for further research into proactive conversational recommendation, continuous online learning, and self-evolving agent architectures. A plausible implication is the future emergence of systems wherein user, agent, and environment co-adapt interactively, further minimizing the gap between stated intent and system interpretation.

7. Limitations and Availability

The RecBot Framework, as documented in the cited paper, provides a production-ready architecture validated on both simulated and real-world platforms. Limitations may include dependency on large pre-trained models for initial training, domain-specific adaptation of vocabulary and attribute mappings, and the requirement for tool-chain orchestration engineering. The framework’s simulation-based distillation strategy mitigates but does not eliminate reliance on high-resource teacher agents during initial optimization.

No public codebase is mentioned for RecBot itself in the source material; industrial deployments and open-source reference implementations would further extend reproducibility and domain transferability.


RecBot represents a rigorous advancement in conversational recommendation systems, introducing a dual-agent architecture for real-time policy adaptation through explicit user commands, simulation-driven optimization for scalable deployment, and empirically validated improvements in both user satisfaction and business key performance indicators (Tang et al., 25 Sep 2025).
