Explainable Reinforcement Learning: A Survey - An Expert Overview
The paper "Explainable Reinforcement Learning: A Survey" by Erika Puiutta and Eric MSP Veith provides a comprehensive examination of Explainable Artificial Intelligence (XAI) in the context of Reinforcement Learning (RL), an area that is gaining increasing interest as AI systems permeate various domains. The paper seeks to consolidate current knowledge in Explainable Reinforcement Learning (XRL) by presenting a classification and assessment of existing methods, addressing a significant gap in the XAI literature which predominantly focuses on supervised learning contexts.
Key Insights and Contributions
The fundamental issue identified in the paper pertains to a performance-transparency trade-off that is characteristic of AI models, especially those based on RL. As AI models become more complex and gain higher performance, they also become less interpretable, posing a challenge to understanding the rationale behind their decisions. This lack of transparency is particularly alarming in RL settings where decisions and learning processes occur autonomously without direct human oversight.
The survey highlights two predominant findings: firstly, that most current XRL efforts focus on simplifying complex models a posteriori rather than designing them to be interpretable from the outset; secondly, that these efforts often overlook human-centric considerations such as psychological and philosophical aspects that influence model usability and trustworthiness.
Methodological Framework
The paper offers a taxonomy of XAI methods, categorizing them by the timing of information extraction (intrinsic vs. post-hoc) and the scope of explanation (global vs. local). This framework is used to classify existing XRL methods, revealing a prevalence of post-hoc approaches, possibly because the complexity of typical RL models makes interpretability by design difficult to achieve.
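To make the two axes of this framework concrete, the following minimal Python sketch encodes the taxonomy as a small data structure and tags two of the methods discussed below according to how this overview describes them; the scope assignments and labels are illustrative and do not reproduce the survey's own classification table.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Timing(Enum):
    INTRINSIC = auto()   # interpretable by design, at training time
    POST_HOC = auto()    # explained after training, e.g. via a surrogate model


class Scope(Enum):
    GLOBAL = auto()      # explains the whole policy/model
    LOCAL = auto()       # explains a single decision


@dataclass
class XRLMethod:
    name: str
    timing: Timing
    scope: Scope


# Illustrative placements based on the descriptions in this overview,
# not the survey's full classification table.
examples = [
    XRLMethod("Hierarchical multi-task policies", Timing.INTRINSIC, Scope.GLOBAL),
    XRLMethod("Linear Model U-Trees (mimic learning)", Timing.POST_HOC, Scope.GLOBAL),
]

for m in examples:
    print(f"{m.name}: {m.timing.name} / {m.scope.name}")
```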
Intrinsic models aim for interpretability by design, but their simplicity often comes at the cost of model accuracy. Post-hoc models, on the other hand, strive to elucidate complex models after training but may introduce discrepancies between their explanations and the underlying model dynamics.
Selected Methods and Illustrations
The paper explores selected methods from the taxonomy to illustrate the diversity and capabilities of XRL approaches (brief, illustrative code sketches of each follow the list below). Notably:
- Programmatically Interpretable Reinforcement Learning (PIRL): This approach employs a framework called Neurally Directed Program Search to develop interpretable policies represented in high-level programming languages, balancing interpretability with satisfactory performance.
- Hierarchical Policies in Multi-task RL: This method utilizes hierarchical policies structured around human-provided instructions to execute complex tasks, with the notable advantage of inherent interpretability.
- Linear Model U-Trees (LMUTs): Proposed as a mimic learning framework, LMUTs approximate complex Q-functions and allow for feature importance analysis, rule extraction, and pixel influence visualization, offering more transparent interpretations of DRL policies.
- Structural Causal Models (SCMs): This approach incorporates causal reasoning to provide both actual and counterfactual explanations, addressing a critical need for explanations that align with human cognitive preferences.
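To give a flavour of what PIRL's programmatic policies look like, the snippet below is a hand-written, hypothetical policy for a cart-pole-style balancing task, expressed as a short, readable rule over named state variables. It is not output of Neurally Directed Program Search and its weights are invented; it only illustrates why policies in this form are easy to inspect and reason about.

```python
# A hand-written, hypothetical programmatic policy for a cart-pole-style task.
# PIRL searches for policies of roughly this shape automatically; the weights
# and structure here are invented purely for illustration.

def programmatic_policy(cart_position, cart_velocity, pole_angle, pole_velocity):
    """Return 1 (push right) or 0 (push left) from a human-readable rule."""
    # Weighted vote of the pole's angle and angular velocity: if the pole is
    # falling to the right, push right to move the pivot back under it.
    score = (3.0 * pole_angle + 1.0 * pole_velocity
             + 0.1 * cart_velocity + 0.05 * cart_position)
    return 1 if score > 0.0 else 0


# Every decision can be traced by reading the rule and plugging in the state.
action = programmatic_policy(0.02, -0.1, 0.05, 0.3)
print("action:", action)  # 1 -> push right, because the pole leans right
```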
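The hierarchical multi-task approach can likewise be pictured as a two-level controller: a top-level plan built from human-readable instructions dispatches to low-level skills, so the executed instruction trace itself doubles as an explanation. The instruction and skill names in this schematic sketch are invented for illustration.

```python
# Schematic two-level policy: the executed instruction sequence is itself a
# human-readable account of what the agent did. Skill and instruction names
# are invented; the state argument is unused in this toy version.

def skill_navigate_to(target, state):
    return f"moved toward {target}"

def skill_pick_up(item, state):
    return f"picked up {item}"

SKILLS = {"go to": skill_navigate_to, "get": skill_pick_up}

def run_plan(instructions, state):
    trace = []
    for verb, argument in instructions:
        outcome = SKILLS[verb](argument, state)   # low-level skill execution
        trace.append(f"{verb} {argument}: {outcome}")
    return trace

plan = [("go to", "the kitchen"), ("get", "the blue block")]
for step in run_plan(plan, state={}):
    print(step)
```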
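In the spirit of the mimic-learning idea behind LMUTs, the sketch below distills a stand-in black-box Q-function into an ordinary scikit-learn regression tree and reads off feature importances and extracted rules. A plain DecisionTreeRegressor is used here instead of the paper's linear-model U-tree, so this shows the general surrogate workflow rather than the authors' algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Stand-in for a trained DRL agent's Q-function over a 3-dimensional state
# (in practice the targets would come from querying the black-box network).
def blackbox_q(states):
    return 2.0 * states[:, 0] - 0.5 * states[:, 1] ** 2 + 0.1 * states[:, 2]

# 1) Collect (state, Q-value) pairs by querying the black box.
states = rng.uniform(-1.0, 1.0, size=(5000, 3))
q_values = blackbox_q(states)

# 2) Fit an interpretable surrogate (here a shallow regression tree).
mimic = DecisionTreeRegressor(max_depth=4).fit(states, q_values)

# 3) Inspect the surrogate: feature importances and extracted rules.
print("feature importances:", mimic.feature_importances_)
print(export_text(mimic, feature_names=["pos", "vel", "angle"]))
```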
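Finally, the causal flavour of the SCM-based explanations can be illustrated with a toy structural model: hold the observed situation fixed, replay the structural equations under the action actually taken and under an alternative, and compare outcomes. The variables and equations below are invented for illustration and are not the action-influence model used in the surveyed work.

```python
# Toy structural causal model for one decision of a hypothetical agent.
# The exogenous situation (enemy distance) is held fixed; only the action is
# intervened on, which is what a counterfactual explanation compares.

def structural_model(action, enemy_distance):
    """Structural equations: action and situation -> damage taken -> reward."""
    damage_taken = max(0.0, 10.0 - enemy_distance) * (0.2 if action == "retreat" else 1.0)
    reward = 5.0 - damage_taken + (2.0 if action == "attack" else 0.0)
    return {"damage_taken": damage_taken, "reward": reward}

situation = {"enemy_distance": 9.0}        # the observed exogenous context
factual = structural_model("attack", **situation)
counterfactual = structural_model("retreat", **situation)

print("factual (attack):        ", factual)
print("counterfactual (retreat):", counterfactual)
# Explanation sketch: "the agent attacked because, in this situation, attacking
# yields reward 6.0, whereas retreating would have yielded only 4.8."
```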
Practical and Theoretical Implications
The survey posits that XRL is essential not only for ethical and trust reasons but also for regulatory compliance, as exemplified by requirements such as the EU GDPR. Providing understandable and justifiable AI decisions is paramount in high-stakes and critical decision-making tasks.
Moreover, the paper underscores a significant gap in the consideration of the human element in XAI methodologies. While technical improvements are crucial, future developments in XRL must prioritize interdisciplinary collaboration, incorporating insights from psychology, human-computer interaction, and cognitive sciences to enhance the generation and presentation of explanations suited to human users.
Conclusion
"Explainable Reinforcement Learning: A Survey" serves as a pivotal resource for advancing the discourse on transparency within RL models. By synthesizing current methodologies and identifying the deficiencies in prevailing approaches, Puiutta and Veith set the stage for future explorations into user-centric, interpretable AI systems. This work inherently invites further scrutiny of how XRL can more effectively accommodate the needs and expectations of diverse user bases, ultimately paving the way for more trustworthy and ethically sound AI implementations.