
Conversational Recommender System (1806.03277v1)

Published 8 Jun 2018 in cs.IR

Abstract: A personalized conversational sales agent could have much commercial potential. E-commerce companies such as Amazon, eBay, JD, Alibaba etc. are piloting such kind of agents with their users. However, the research on this topic is very limited and existing solutions are either based on single round adhoc search engine or traditional multi round dialog system. They usually only utilize user inputs in the current session, ignoring users' long term preferences. On the other hand, it is well known that sales conversion rate can be greatly improved based on recommender systems, which learn user preferences based on past purchasing behavior and optimize business oriented metrics such as conversion rate or expected revenue. In this work, we propose to integrate research in dialog systems and recommender systems into a novel and unified deep reinforcement learning framework to build a personalized conversational recommendation agent that optimizes a per session based utility function.

The paper addresses the integration of search and recommendation methodologies within conversational systems. It introduces a chat agent designed to assist users in interactively locating items. The agent leverages deep learning techniques and incorporates insights from both search and recommendation to approach the task from a novel perspective. The system architecture comprises three primary modules: Natural Language Understanding (NLU), Dialogue Management (DM), and Natural Language Generation (NLG).

Here's a breakdown of the key components and concepts:

  • NLU Module: Focuses on analyzing user utterances to extract item-specific metadata. A deep belief tracker is trained to analyze user utterances in context and extract facet values of the targeted item. The output updates the current user intention, represented as a query consisting of facet-value pairs.
  • DM Module: Determines the appropriate action to take based on the current dialogue state. This module is integrated with an external recommender system and operates within a defined action space. A deep policy network is trained to decide the optimal action at each turn, considering the user query and long-term user preferences. Actions include requesting more information about a specific facet or recommending a list of products (see the sketch after this list).
  • Recommender System: Integrated to provide personalized recommendations based on user history and context.
  • Faceted Search Integration: The conversational agent helps users find items interactively, similar to faceted search in e-commerce. The system selects a set of facets or facet-value pairs for the user to choose from based on context.
  • Deep Reinforcement Learning: Used for decision-making in the DM module. The system learns to select actions that maximize the expected reward in the entire conversation session.
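
As an illustration of the state and action interface described in the list above, the following sketch (hypothetical names; the facet set and data structures are assumptions, not the paper's code) shows the dialogue state as a query of facet-value pairs and the DM's action space of requesting a facet or recommending a list:

```python
# Hypothetical illustration: dialogue state as facet-value pairs, plus the
# DM action space of "request a facet" versus "recommend a list".
# Facet names are assumed restaurant facets, not the paper's exact schema.
from dataclasses import dataclass, field

FACETS = ["cuisine", "price", "location"]

@dataclass
class DialogueState:
    query: dict = field(default_factory=dict)  # facet -> value pairs tracked so far
    turn: int = 0

# One "request" action per facet, plus a single "recommend" action.
ACTIONS = [("request", f) for f in FACETS] + [("recommend", None)]

def update_state(state: DialogueState, tracked_facets: dict) -> DialogueState:
    """Merge facet values extracted by the NLU belief tracker into the query."""
    state.query.update(tracked_facets)
    state.turn += 1
    return state

# Example: after "cheap Italian food nearby", the tracker might emit
# {"cuisine": "italian", "price": "cheap"}; the policy network then scores
# each action in ACTIONS given this state and long-term user preferences.
```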

The paper details related work in dialogue systems, recommender systems, faceted search, and deep reinforcement learning. It positions the work relative to existing research, highlighting the focus on commercial success metrics, such as conversion rate, and the modeling of user preferences. The paper contrasts its approach with prior works that focus primarily on NLP challenges without fully integrating user preferences.

A key contribution of the paper is the deep policy network, which decides when and how to gather information from users and make recommendations based on past purchasing history and context.

The paper describes the implementation of the conversational recommender system using a Factorization Machine (FM) and a deep policy network. The FM-based recommender is trained on the dialogue state, user information, and item information. The deep policy network is trained with a policy gradient method to maximize the episodic expected reward.
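
The exact feature design belongs to the paper; as a rough sketch, the standard second-order FM scoring equation over a concatenated feature vector (assumed here to contain user, item, and dialogue-state features) can be written as follows:

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order Factorization Machine score.

    x  : (d,)   feature vector (e.g., user, item, and dialogue-state features concatenated)
    w0 : scalar global bias
    w  : (d,)   first-order weights
    V  : (d, k) latent factor matrix for pairwise interactions
    """
    linear = w0 + w @ x
    # Pairwise interaction term computed in O(d * k) via the usual FM identity.
    pairwise = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + pairwise

# Candidate items are ranked by fm_score, and the top of the ranking is shown
# to the user when the policy network chooses the "recommend" action.
```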

The paper describes experiments conducted to evaluate the proposed system, including:

  • Offline experiments with simulated users to pre-train the model.
  • Online experiments with real users to evaluate the effectiveness of the learned agents.

The paper uses the Yelp challenge dataset for restaurants and food data. The dataset is adapted to create dialogue scripts. Simulated users are created with a simple agenda to interact with the agent, answering questions, finding items, and leaving the dialogue based on predefined rules.
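
A minimal sketch of such an agenda-based simulated user follows; the termination rule and the max_turns cutoff here are assumptions, not the paper's exact settings:

```python
class SimulatedUser:
    """Agenda-based user simulator: answers facet questions using a target item,
    accepts a recommendation that contains the target, and quits after too many turns."""

    def __init__(self, target_item, max_turns=10):
        self.target = target_item      # e.g., {"id": 42, "cuisine": "italian", "price": "cheap"}
        self.max_turns = max_turns
        self.turns = 0

    def respond(self, action):
        self.turns += 1
        if self.turns > self.max_turns:
            return {"type": "quit"}    # user gives up (assumed predefined rule)
        kind, payload = action
        if kind == "request":
            # Answer with the target item's value for the requested facet.
            return {"type": "inform", "facet": payload, "value": self.target.get(payload)}
        # kind == "recommend": accept only if the target item appears in the shown list.
        if any(item.get("id") == self.target.get("id") for item in payload):
            return {"type": "accept"}
        return {"type": "reject"}
```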

The paper models the recommendation reward in different ways, including Linear, Normalized Discounted Cumulative Gain (NDCG), and Cascade models, each reflecting different assumptions about user behavior when reviewing recommendations. The paper describes a method for collecting user utterances using Amazon Mechanical Turk, where workers write natural language responses based on dialogue schemas.
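
The paper defines these reward models precisely; as one plausible reading, the sketch below assigns reward as a function of the target item's rank in the recommended list (the constants and exact discount shapes are assumptions):

```python
import math

def linear_reward(rank, list_len, max_reward=1.0):
    """Reward decays linearly with the (1-based) rank of the target item."""
    return max_reward * max(0.0, 1.0 - (rank - 1) / list_len)

def ndcg_reward(rank, max_reward=1.0):
    """Reward follows the NDCG-style position discount 1 / log2(rank + 1)."""
    return max_reward / math.log2(rank + 1)

def cascade_reward(rank, max_reward=1.0, p_continue=0.8):
    """Cascade model: the user scans top-down, continuing with probability p_continue."""
    return max_reward * (p_continue ** (rank - 1))
```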

The paper compares its Conversational Recommender Model (CRM) against a Maximum Entropy rule-based method ("MaxEnt Full") and its variations ("MaxEnt@K"). Key metrics used for evaluation include Average Reward, Success Rate, Average Number of Turns, Wrong Quit Rate, and Low Rank Rate. The results indicate that the CRM outperforms the baselines, achieving higher average reward and success rate in fewer turns. The paper analyzes the impact of belief tracker accuracy on the performance of the proposed framework, demonstrating the robustness of the reinforcement learning model. The paper examines the effects of different simulated environments by varying the Maximum Success Reward and Recommendation List Stop Threshold.

Online user studies are conducted to evaluate the trained model with real users, comparing the CRM against the MaxEnt Full method. The results show that the CRM achieves a higher success rate and shorter average turn count compared to the baseline.

In conclusion, the paper presents a framework for building conversational recommender systems, integrating techniques from dialogue systems and recommender systems. The system uses a deep policy network to manage the conversation and make personalized recommendations. Experimental results demonstrate the effectiveness of the proposed approach in both simulated and real-user settings.

The model maximizes the episodic expected reward from the starting state:

\eta(\theta) = E_\pi\left[\sum_{t=0}^{T} \gamma^t r_t\right]

  • \eta(\theta) is the episodic expected reward.
  • E_\pi is the expected value under policy \pi.
  • T is the final time step.
  • \gamma is a discount parameter.
  • r_t is the reward at time step t.

The gradient of this learning objective, which drives the policy update sketched after the symbol definitions below, is:

\nabla \eta(\theta) = E_\pi\left[\gamma^t G_t \nabla_\theta \log \pi(a_t \mid s_t, \theta)\right]

  • \nabla \eta(\theta) is the gradient of the learning objective.
  • G_t is the sum of rewards from time step t to T.
  • \nabla_\theta \log \pi(a_t \mid s_t, \theta) is the gradient of the logarithm of the policy with respect to the policy parameters \theta.
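
A minimal REINFORCE-style sketch of this update (PyTorch is an assumed framework here; G_t is computed as the discounted return, and the paper's actual network architecture and hyperparameters are not reproduced):

```python
import torch

def policy_gradient_step(optimizer, log_probs, rewards, gamma=0.99):
    """One episodic update: loss = -sum_t gamma^t * G_t * log pi(a_t | s_t, theta).

    log_probs : list of log pi(a_t | s_t) tensors recorded while acting
    rewards   : list of scalar rewards r_t for the same episode
    """
    # G_t: return from step t to the end of the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)

    loss = torch.zeros(())
    for t, (logp, G_t) in enumerate(zip(log_probs, returns)):
        loss = loss - (gamma ** t) * G_t * logp

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```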

Several limitations and future research directions are identified, including joint learning of dialogue policy and recommendation model, improvements to the facet search components, and exploration of different reward functions.

Authors (2)
  1. Yueming Sun (3 papers)
  2. Yi Zhang (994 papers)
Citations (421)