Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning (1802.06501v3)

Published 19 Feb 2018 in cs.IR, cs.LG, and stat.ML

Abstract: Recommender systems play a crucial role in mitigating information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems treat recommendation as a static process and follow a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategy during its interactions with users. We model the sequential interactions between users and the recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategy by recommending items on a trial-and-error basis and receiving reinforcement from users' feedback on those items. Users' feedback can be positive or negative, and both types have great potential to boost recommendations. However, negative feedback is far more abundant than positive feedback, so incorporating both simultaneously is challenging: the positive signal can be buried by the negative one. In this paper, we develop a novel approach to incorporate both into the proposed deep recommender system (DEERS) framework. Experimental results on real-world e-commerce data demonstrate the effectiveness of the proposed framework, and further experiments examine the importance of both positive and negative feedback in recommendations.

Authors (6)
  1. Xiangyu Zhao (193 papers)
  2. Liang Zhang (359 papers)
  3. Zhuoye Ding (16 papers)
  4. Long Xia (25 papers)
  5. Jiliang Tang (204 papers)
  6. Dawei Yin (165 papers)
Citations (313)

Summary

  • The paper introduces a novel framework, DEERS, that integrates negative feedback into deep reinforcement learning to optimize recommendations.
  • DEERS employs a pairwise DQN architecture with GRU to distinguish between positive and negative user signals.
  • Empirical results on e-commerce data show DEERS outperforms traditional models with improved MAP and NDCG@40 metrics.

Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning

The paper, "Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning," introduces an innovative approach to recommender systems by integrating negative feedback into the reinforcement learning framework. Traditional recommender systems often treat user interaction as a static process and mainly focus on minimizing short-term losses by basing recommendations primarily on positive feedback (e.g., clicks, purchases). The proposed framework, termed DEERS (Deep Reinforcement Learning Based System), transcends these limitations by modeling the recommendation process as a Markov Decision Process (MDP) and incorporating both positive and negative feedback to potentially enhance the quality and relevance of recommendations.

Core Contributions and Methodology

1. Integrating Negative Feedback:

The authors identify a rich yet underutilized signal in user interactions, negative feedback (e.g., skips, non-clicks), as a significant indicator of user preference. Such feedback is usually far more abundant than positive feedback, which makes it challenging to incorporate without letting it overwhelm the positive signal. DEERS integrates both types of feedback to recalibrate its recommendation strategy continuously.

2. Pairwise Deep Reinforcement Learning Framework:

DEERS uses Reinforcement Learning, specifically a Deep Q-Network (DQN), to optimize recommendations. It estimates action values with a function approximator rather than an explicit Q-value table or transition probabilities, which lets it scale to a vast catalog of items.
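As a rough illustration of this idea, the Q-function can be a small network that scores a candidate item (the action) against the current user state, so no Q-table over all state-item pairs is ever materialized. All dimensions below are assumptions for the sketch, not the paper's configuration:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q(s, a) approximator: scores a candidate item against the
    user state, replacing an explicit Q-table. Sizes are illustrative."""
    def __init__(self, state_dim=128, item_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim + item_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # scalar Q-value
        )

    def forward(self, state, item):
        return self.mlp(torch.cat([state, item], dim=-1)).squeeze(-1)

# Scoring a batch of 100 candidate items for one (repeated) user state:
q_net = QNetwork()
state = torch.randn(100, 128)
items = torch.randn(100, 32)
q_values = q_net(state, items)   # recommend the argmax item
```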

3. Novel DQN Architecture:

The architecture feeds positive signals (clicked/ordered items) and negative signals (skipped items) into separate input layers of the DQN, using Gated Recurrent Units (GRUs) to capture users' sequential preferences. This separation lets the model learn the distinct effect each feedback type has on user satisfaction, instead of letting abundant skips drown out sparse clicks.
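A minimal sketch of such a two-stream state encoder follows, assuming the inputs are already-embedded item sequences; the embedding and hidden sizes are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class DeersStateEncoder(nn.Module):
    """Encodes positive (clicked/ordered) and negative (skipped) item
    sequences with separate GRUs, so each feedback type contributes
    its own part of the state. Sizes are illustrative assumptions."""
    def __init__(self, item_dim=32, hidden=64):
        super().__init__()
        self.pos_gru = nn.GRU(item_dim, hidden, batch_first=True)
        self.neg_gru = nn.GRU(item_dim, hidden, batch_first=True)

    def forward(self, pos_seq, neg_seq):
        # Use each GRU's final hidden state as the sequence summary.
        _, h_pos = self.pos_gru(pos_seq)   # (1, B, hidden)
        _, h_neg = self.neg_gru(neg_seq)
        # Concatenate the two summaries into the full user state.
        return torch.cat([h_pos[-1], h_neg[-1]], dim=-1)

encoder = DeersStateEncoder()
pos = torch.randn(8, 10, 32)   # batch of clicked-item embeddings
neg = torch.randn(8, 25, 32)   # skips are typically far more numerous
state = encoder(pos, neg)      # (8, 128) state fed to the Q-network
```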

4. Pairwise Regularization Term:

To widen the gap between the Q-values of target items and those of their competitors, the authors propose a pairwise regularization term. It pushes the model to distinguish user preferences even among similar items within the same category, contributing to more nuanced and precise recommendations.
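One plausible form of such an objective combines the usual temporal-difference error with a term that rewards a Q-value gap between the target item and a same-category competitor. The weight alpha and the exact shape of the pairwise term below are assumptions for illustration, not the paper's precise formulation:

```python
import torch

def pairwise_regularized_loss(q_target_item, q_competitor, td_target, alpha=0.1):
    """Schematic pairwise-regularized DQN loss (alpha and the form of
    the pairwise term are illustrative assumptions).

    q_target_item : Q(s, a) for the item actually shown
    q_competitor  : Q(s, a') for a competitor item in the same category
    td_target     : r + gamma * max_a' Q_target(s', a'), precomputed
    """
    td_loss = (td_target - q_target_item).pow(2).mean()
    # Encourage the target item's Q-value to exceed the competitor's.
    pairwise = -(q_target_item - q_competitor).mean()
    return td_loss + alpha * pairwise
```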

Empirical Evaluation

The experiments used real-world e-commerce data, substantiating DEERS' efficacy. The framework outperformed traditional baselines such as Collaborative Filtering (CF) and Factorization Machines (FM), as well as stronger GRU-based and plain DQN variants, showing improved MAP and NDCG@40 and an enhanced capacity to deliver relevant, personalized recommendations.
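For reference, NDCG@k rewards placing relevant items near the top of the ranked list; a minimal implementation with binary relevance labels looks like this:

```python
import math

def ndcg_at_k(ranked_relevance, k=40):
    """NDCG@k for one ranked list of binary relevance labels
    (1 = clicked/ordered, 0 = skipped)."""
    rels = ranked_relevance[:k]
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))
    ideal = sorted(ranked_relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# A list that ranks the relevant items early scores higher:
print(ndcg_at_k([1, 1, 0, 0, 1]))  # ~0.95
print(ndcg_at_k([0, 0, 1, 1, 1]))  # ~0.62
```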

In online simulations, DEERS maintained its performance lead, particularly in extended recommendation sessions, reaffirming the framework's capacity to balance short-term engagement with long-term user satisfaction.

Implications and Future Directions

This research advances recommender systems by exploiting feedback diversity, which can inform future work on AI-driven personalization. The implications extend beyond the technical: by fostering more intuitive and dynamically adaptive interactions, the approach also tracks evolving user behaviors and preferences.

Future research avenues could explore the integration of additional user interaction metrics, such as dwell time, to discern the strength of negative feedback. Another compelling direction is expanding the feedback modalities, accommodating complex user interaction patterns beyond binary clicks and skips, thereby enhancing the granularity of feedback interpretation.

Overall, the DEERS framework is a valuable step toward more adept and user-responsive recommender systems, leveraging reinforcement learning to make fuller use of user feedback and broadening the horizon for intelligent e-commerce solutions.