On Causally Disentangled State Representation Learning for Reinforcement Learning based Recommender Systems
Abstract: In Reinforcement Learning-based Recommender Systems (RLRS), the complexity and dynamism of user interactions often result in high-dimensional, noisy state spaces, making it difficult to discern which aspects of the state truly drive the decision-making process. The problem is exacerbated by the evolving nature of user preferences and behaviors, which requires the recommender system to focus adaptively on the information most relevant to decision-making while preserving generalizability. To tackle this problem, we introduce a causal approach for decomposing the state and extracting \textbf{C}ausal-\textbf{I}n\textbf{D}ispensable \textbf{S}tate Representations (CIDS) in RLRS. Our method concentrates on identifying the \textbf{D}irectly \textbf{A}ction-\textbf{I}nfluenced \textbf{S}tate Variables (DAIS) and \textbf{A}ction-\textbf{I}nfluence \textbf{A}ncestors (AIA), which are essential for making effective recommendations. By leveraging conditional mutual information, we develop a framework that not only discerns the causal relationships within the generative process but also isolates critical state variables from the typically dense and high-dimensional state representations. We provide theoretical evidence for the identifiability of these variables. Then, using the identified causal relationships, we construct causal-indispensable state representations, which enable policies to be trained over a more advantageous subset of the agent's state space. We demonstrate the efficacy of our approach through extensive experiments, which show that our method outperforms state-of-the-art methods.
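To make the role of conditional mutual information concrete, the following is a minimal sketch and not the paper's estimator. It assumes discrete variables, a plug-in (count-based) estimate, and a toy two-variable state in which only the first variable is affected by the action. The function name `conditional_mutual_information`, the toy dynamics, and the choice to condition on the full previous state are all illustrative assumptions; the abstract does not specify the estimator or the conditioning set.

```python
import math
import random
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from paired discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log( p(z) p(x,y,z) / (p(x,z) p(y,z)) );
        # the 1/n normalisers cancel inside the logarithm.
        ratio = c_z[z] * n_xyz / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * math.log(ratio)
    return cmi


# Hypothetical toy dynamics (an assumption for illustration only):
# s0' = s0 XOR a        -> directly influenced by the action
# s1' = s1 XOR noise    -> independent of the action
rng = random.Random(0)
states, actions, next_s0, next_s1 = [], [], [], []
for _ in range(5000):
    s = (rng.randint(0, 1), rng.randint(0, 1))
    a = rng.randint(0, 1)
    states.append(s)
    actions.append(a)
    next_s0.append(s[0] ^ a)
    next_s1.append(s[1] ^ int(rng.random() < 0.1))

# Dependence between each next-state variable and the action,
# conditioned on the full previous state.
cmi_influenced = conditional_mutual_information(next_s0, actions, states)
cmi_unaffected = conditional_mutual_information(next_s1, actions, states)
print(f"I(s0'; a | s) = {cmi_influenced:.4f}")
print(f"I(s1'; a | s) = {cmi_unaffected:.4f}")
```

In this toy setting the first estimate is clearly positive while the second stays near zero, so a small threshold on the estimate would flag only the first variable as action-influenced. Choosing that threshold, and handling continuous or high-dimensional states, would need a different estimator than the count-based one assumed here.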