RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning (2303.04475v2)

Published 8 Mar 2023 in cs.AI and cs.LG

Abstract: While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve the desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly probable desired outcomes. We use a heuristic tree search of the agent's execution trajectories to find the most suitable counterfactuals based on the defined properties. We evaluate RACCER on two tasks and conduct a user study, showing that RL-specific counterfactuals help users better understand agents' behavior compared to the current state-of-the-art approaches.
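The abstract describes searching over the agent's possible execution trajectories for a nearby state in which the desired action becomes the agent's choice. The following is only an illustrative sketch of that general idea, not RACCER's actual algorithm: it uses a toy deterministic environment, a hypothetical black-box policy, and plain best-first search over action sequences, where sequence length stands in for the paper's "reachability" property and the policy agreeing with the desired action stands in for "certainty".

```python
import heapq

# Toy deterministic environment on a 1-D line: states are integers and
# actions shift the state by -1 or +1. RACCER's real search operates on
# stochastic RL tasks; this is a minimal stand-in for illustration.
ACTIONS = (-1, +1)

def step(state, action):
    return state + action

def agent_policy(state):
    # Hypothetical black-box policy: the agent picks +1 once state >= 3,
    # and -1 otherwise. Any opaque policy could be substituted here.
    return +1 if state >= 3 else -1

def counterfactual_search(start, desired_action, max_depth=5):
    """Best-first search for the shortest action sequence leading to a
    state where the policy outputs `desired_action`. Sequence length is
    a crude proxy for how easily the counterfactual can be reached."""
    frontier = [(0, start, [])]  # (cost = number of actions, state, actions)
    visited = {start}
    while frontier:
        cost, state, seq = heapq.heappop(frontier)
        if agent_policy(state) == desired_action:
            return state, seq  # counterfactual state and how to reach it
        if cost >= max_depth:
            continue
        for a in ACTIONS:
            nxt = step(state, a)
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(frontier, (cost + 1, nxt, seq + [a]))
    return None, None  # no reachable counterfactual within max_depth

state, seq = counterfactual_search(start=0, desired_action=+1)
# From state 0 the shortest route to a state where the policy chooses +1
# is three +1 actions, ending in state 3.
```

In a stochastic task the cost would instead weigh the probability of actually reaching the candidate state and of the desired outcome holding there, which is where the paper's heuristic tree search over trajectories comes in.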

Authors (2)
  1. Jasmina Gajcin
  2. Ivana Dusparic
