RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning (2303.04475v2)
Abstract: While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve the desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly probable desired outcomes. We then use a heuristic tree search of the agent's execution trajectories to find the most suitable counterfactuals based on the defined properties. We evaluate RACCER on two tasks and conduct a user study to show that RL-specific counterfactuals help users understand agents' behavior better than current state-of-the-art approaches.
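To make the idea of "reachable and certain" counterfactuals concrete, here is a minimal Python sketch. It is not the RACCER implementation: the abstract describes a heuristic tree search, whereas this sketch simply enumerates every short action sequence, and the toy chain environment, the `policy` function, the `SLIP` probability, and the cost weighting are all assumptions made for illustration. It searches for an action sequence that starts from the factual state, is short (reachable), and ends with high estimated probability (certain) in a state where the agent picks the desired action.

```python
import itertools
import random

ACTIONS = ("left", "right")
SLIP = 0.2  # assumed probability that an action has no effect


def step(state, action, rng):
    """Toy stochastic chain on positions 0..5: an action slips with probability SLIP."""
    if rng.random() < SLIP:
        return state
    delta = 1 if action == "right" else -1
    return min(5, max(0, state + delta))


def policy(state):
    """Hypothetical black-box agent: 'stay' at or past position 3, else 'right'."""
    return "stay" if state >= 3 else "right"


def evaluate(state, plan, desired, rng, n_rollouts=200):
    """Estimate how often following `plan` ends where the agent picks `desired`."""
    hits = 0
    for _ in range(n_rollouts):
        s = state
        for a in plan:
            s = step(s, a, rng)
        hits += policy(s) == desired
    return hits / n_rollouts


def find_counterfactual(state, desired, max_depth=4, seed=0):
    """Enumerate action sequences up to max_depth; return (cost, plan, certainty)."""
    rng = random.Random(seed)
    best = None
    for depth in range(1, max_depth + 1):
        for plan in itertools.product(ACTIONS, repeat=depth):
            certainty = evaluate(state, plan, desired, rng)
            # Cost trades off path length (reachability) against outcome uncertainty.
            cost = depth / max_depth + (1.0 - certainty)
            if best is None or cost < best[0]:
                best = (cost, plan, certainty)
    return best


if __name__ == "__main__":
    cost, plan, certainty = find_counterfactual(state=1, desired="stay")
    print(f"plan={plan} certainty={certainty:.2f} cost={cost:.2f}")
```

In this toy setting, a single move from position 1 can never reach a state where the agent chooses "stay", so any counterfactual based only on feature distance to a nearby state would ignore how many stochastic steps are actually needed. Exhaustive enumeration is only feasible here because the action set and depth are tiny; a guided search would be needed for realistic tasks.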
Authors: Jasmina Gajcin, Ivana Dusparic