XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques (2402.12685v1)
Abstract: Reinforcement Learning (RL) has demonstrated substantial potential across diverse fields, yet understanding its decision-making process, especially in real-world scenarios where rationality and safety are paramount, is an ongoing challenge. This paper delves into Explainable RL (XRL), a subfield of Explainable AI (XAI) aimed at unravelling the complexities of RL models. Our focus rests on state-explaining techniques, a crucial subset of XRL methods, as they reveal the underlying factors influencing an agent's actions at any given time. Despite their significant role, the lack of a unified evaluation framework hinders the assessment of their accuracy and effectiveness. To address this, we introduce XRL-Bench, a unified, standardized benchmark tailored for the evaluation and comparison of XRL methods, encompassing three main modules: standard RL environments, explainers based on state importance, and standard evaluators. XRL-Bench supports both tabular and image data for state explanation. We also propose TabularSHAP, an innovative and competitive XRL method. We demonstrate the practical utility of TabularSHAP in real-world online gaming services and offer an open-source benchmark platform for the straightforward implementation and evaluation of XRL methods. Our contributions facilitate the continued progression of XRL technology.
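The abstract does not spell out TabularSHAP's mechanics, but its name and the tabular-state setting suggest a tree-surrogate-plus-SHAP pipeline in the spirit of the cited TreeSHAP work. Below is a minimal, hypothetical sketch of such a pipeline: roll out a policy to collect (state, action) pairs, distill the policy into a LightGBM ensemble, and attribute each action to state features with TreeSHAP. Everything here (the CartPole stand-in environment, the random policy, the hyperparameters) is an illustrative assumption, not the paper's actual implementation.

```python
# Hypothetical TabularSHAP-style pipeline (sketch, not the paper's code):
# (1) roll out a policy to gather (state, action) pairs,
# (2) distill the policy into a tree ensemble,
# (3) explain per-state action choices with TreeSHAP.
import gymnasium as gym
import numpy as np
import lightgbm as lgb
import shap

env = gym.make("CartPole-v1")

def policy(state):
    # Stand-in for a trained RL agent; a random policy keeps the sketch runnable.
    return env.action_space.sample()

# 1) Collect a dataset of states and the actions the policy takes in them.
states, actions = [], []
obs, _ = env.reset(seed=0)
for _ in range(2000):
    action = policy(obs)
    states.append(obs)
    actions.append(action)
    obs, _, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
X, y = np.array(states), np.array(actions)

# 2) Distill the policy into a gradient-boosted tree surrogate.
surrogate = lgb.LGBMClassifier(n_estimators=100).fit(X, y)

# 3) Attribute each action to state features; one SHAP value per feature.
explainer = shap.TreeExplainer(surrogate)
print(explainer.shap_values(X[:5]))
```

A tree surrogate is a natural choice for tabular states because TreeSHAP yields exact Shapley values in polynomial time, avoiding the sampling cost of attributing through the raw policy network.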
Authors: Yu Xiong, Zhipeng Hu, Ye Huang, Runze Wu, Kai Guan, Xingchen Fang, Ji Jiang, Tianze Zhou, Yujing Hu, Haoyu Liu, Tangjie Lyu, Changjie Fan