Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems
This paper presents FeedRec, a reinforcement learning (RL) framework designed to optimize long-term user engagement in feed-streaming recommender systems. Its primary focus is user stickiness, which extends beyond traditional instant metrics such as click-through rate (CTR) to delayed feedback signals such as dwell time and revisit frequency.
Key Contributions and Approach
FeedRec adopts a hierarchical architecture with two main components: a Q-Network that uses hierarchical LSTMs to capture the diversity of user behaviors, and an S-Network that simulates the environment to support the Q-Network's training. This design mitigates the instability and convergence problems that arise in off-policy learning when bootstrapping, function approximation, and offline training are combined.
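To make the two-network idea concrete, the sketch below shows a minimal PyTorch version of a Q-Network that encodes user behavior with an LSTM and an S-Network that predicts simulated user feedback. All layer sizes, names, and the single shared LSTM (the paper uses a hierarchy of LSTMs over different behavior types) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the Q-Network / S-Network split:
# the Q-Network scores candidate actions from LSTM-encoded user history, and
# the S-Network acts as a learned environment model that supplies simulated
# user feedback so the policy can be trained offline.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, item_dim: int, hidden: int = 64, n_actions: int = 100):
        super().__init__()
        # Encodes the user's interaction history into a state vector.
        self.behavior_lstm = nn.LSTM(item_dim, hidden, batch_first=True)
        self.q_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq_len, item_dim) embeddings of consumed items
        _, (h, _) = self.behavior_lstm(history)
        return self.q_head(h[-1])          # (batch, n_actions) Q-values

class SNetwork(nn.Module):
    """Learned environment model: predicts user feedback (e.g. click
    probability, dwell time, leave probability) for a candidate item,
    given the same encoded interaction history."""
    def __init__(self, item_dim: int, hidden: int = 64):
        super().__init__()
        self.behavior_lstm = nn.LSTM(item_dim, hidden, batch_first=True)
        self.feedback_head = nn.Linear(hidden + item_dim, 3)

    def forward(self, history: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.behavior_lstm(history)
        x = torch.cat([h[-1], candidate], dim=-1)
        return self.feedback_head(x)       # simulated feedback signals

# Usage: the Q-Network picks an action; the S-Network supplies simulated
# feedback from which a reward and next state can be derived, avoiding
# further interaction with live users during policy learning.
q_net, s_net = QNetwork(item_dim=32), SNetwork(item_dim=32)
history = torch.randn(8, 20, 32)           # 8 users, 20 past items each
best_items = q_net(history).argmax(dim=-1)
simulated_feedback = s_net(history, torch.randn(8, 32))
```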
To model delayed user feedback, the authors design the state space to capture both instantaneous and latent user engagement signals. Each user's interaction history is modeled dynamically with a series of LSTMs, yielding rich, contextually relevant state representations.
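The sketch below illustrates one plausible way such instant and delayed signals could be folded into a single per-step reward; the weights and the particular functional forms are assumptions for illustration only, not the paper's reward definition.

```python
# Illustrative only: combining instant feedback (clicks) with delayed
# engagement signals (dwell time, time until the user returns) into a
# scalar reward. The weights w_* and the log / inverse-return-time terms
# are assumed for this sketch.
import math

def engagement_reward(clicked: bool, dwell_seconds: float,
                      hours_to_return: float,
                      w_click: float = 1.0,
                      w_dwell: float = 0.2,
                      w_return: float = 0.5) -> float:
    instant = w_click * float(clicked)
    # Longer dwell time is rewarded with diminishing returns.
    delayed_dwell = w_dwell * math.log1p(dwell_seconds)
    # Faster revisits (smaller gap) yield a larger bonus.
    delayed_return = w_return / (1.0 + hours_to_return)
    return instant + delayed_dwell + delayed_return

# Example: a clicked item read for 90 s by a user who returned 6 hours later.
print(engagement_reward(True, 90.0, 6.0))
```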
Numerical Results and Performance
Through comprehensive experiments on both a synthetic dataset and a large-scale real-world e-commerce dataset, FeedRec outperforms existing baselines, including traditional FM and NCF models as well as RL-based approaches such as DEERs and DDPG-KNN. Notably, FeedRec improves long-term metrics such as browsing depth and return time while also boosting instant metrics such as clicks, supporting its utility in practical applications.
Implications and Future Directions
This work has significant implications for the development of recommender systems: it shows that reinforcement learning frameworks can tackle complex user engagement metrics in dynamic environments. By directly optimizing long-term goals, such frameworks can enhance user satisfaction and retention, fostering more robust and adaptive recommender agents.
Theoretical implications extend to the integration of hierarchical structures and simulation-based methods within RL, offering potential avenues for further research. Future work could explore more sophisticated state representations or alternative reward structures to encompass additional factors influencing user decisions in dynamic environments.
Conclusion
The paper provides a compelling framework for optimizing long-term user engagement by leveraging a combination of advanced RL techniques and hierarchical user behavior modeling. FeedRec's ability to address the inherent instability in offline policy learning and its demonstrated effectiveness across diverse datasets make it a notable contribution that could guide future architectures in the recommender systems landscape.