Causal State Distillation for Explainable Reinforcement Learning (2401.00104v2)

Published 30 Dec 2023 in cs.LG, cs.AI, and stat.ME

Abstract: Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD). RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner. RD works by exposing various facets of the rewards that contribute to the agent's objectives during training. However, RD alone has limitations as it primarily offers insights based on sub-rewards and does not delve into the intricate cause-and-effect relationships that occur within an RL agent's neural model. In this paper, we present an extension of RD that goes beyond sub-rewards to provide more informative explanations. Our approach is centred on a causal learning framework that leverages information-theoretic measures for explanation objectives that encourage three crucial properties of causal factors: causal sufficiency, sparseness, and orthogonality. These properties help us distill the cause-and-effect relationships between the agent's states and actions or rewards, allowing for a deeper understanding of its decision-making processes. Our framework is designed to generate local explanations and can be applied to a wide range of RL tasks with multiple reward channels. Through a series of experiments, we demonstrate that our approach offers more meaningful and insightful explanations for the agent's action selections.
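
The abstract only sketches the approach at a high level. As a rough, non-authoritative illustration of the kind of architecture it implies, the snippet below (a minimal PyTorch sketch, not the authors' code) pairs one sub-Q head per reward channel with a per-channel state mask playing the role of a causal factor, and scores the masks with simple surrogate penalties for the three named properties. The class and function names, the mask design, and the specific penalties (a TD regression term for causal sufficiency, an L1 term for sparseness, pairwise cosine similarity for orthogonality) are illustrative assumptions; the paper itself uses information-theoretic measures that this sketch does not implement.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecomposedQNet(nn.Module):
    """Q-network with one sub-Q head per reward channel (hypothetical design)."""

    def __init__(self, state_dim: int, n_actions: int, n_channels: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # One Q-head and one state-feature mask per reward channel.
        self.q_heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_channels)])
        self.mask_heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, state_dim), nn.Sigmoid())
             for _ in range(n_channels)])

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        # Each channel selects its own "causal factor" of the state.
        masks = [m(h) for m in self.mask_heads]
        sub_qs = [head(self.encoder(state * mk))
                  for head, mk in zip(self.q_heads, masks)]
        q_total = torch.stack(sub_qs, dim=0).sum(dim=0)  # Q(s,a) = sum_c Q_c(s,a)
        return q_total, sub_qs, masks


def explanation_loss(sub_qs, masks, sub_td_targets,
                     w_sparse: float = 0.01, w_ortho: float = 0.01):
    """Surrogate penalties standing in for the three properties in the abstract."""
    # Causal sufficiency (surrogate): each masked view must still account for
    # its own sub-reward channel, here via a per-channel TD regression loss.
    sufficiency = sum(F.mse_loss(q.max(dim=1).values, target)
                      for q, target in zip(sub_qs, sub_td_targets))
    # Sparseness (surrogate): each factor should use few state features.
    sparsity = sum(mk.abs().mean() for mk in masks)
    # Orthogonality (surrogate): different channels should not share features,
    # approximated by penalising pairwise cosine similarity between masks.
    ortho = sum(F.cosine_similarity(masks[i], masks[j], dim=1).mean()
                for i in range(len(masks)) for j in range(i + 1, len(masks)))
    return sufficiency + w_sparse * sparsity + w_ortho * ortho
```

A training loop would call the network on a batch of states, build one TD target per reward channel from the corresponding sub-reward, and add explanation_loss to the usual TD loss; this wiring is likewise an assumption about the setup, not a description of the paper's actual algorithm.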

Authors (7)
  1. Wenhao Lu (17 papers)
  2. Xufeng Zhao (14 papers)
  3. Thilo Fryen (2 papers)
  4. Jae Hee Lee (24 papers)
  5. Mengdi Li (19 papers)
  6. Sven Magg (23 papers)
  7. Stefan Wermter (157 papers)
Citations (2)
