A Multidisciplinary Framework for Explainable AI: Design and Evaluation
The paper "A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems" seeks to address the increasing demand for explainable AI (XAI) systems by proposing a comprehensive framework that integrates insights from multiple disciplines. This framework is crucial as AI systems are increasingly being implemented in sensitive and impactful applications where transparency and accountability are paramount.
Key Contributions and Findings
This work methodically categorizes various XAI design goals and evaluation measures, contextualized within three primary user groups: AI novices, data experts, and AI experts. Each category presents distinct design requirements and objectives aligned with the user’s needs and expertise level. The overarching aim is to foster cross-disciplinary efforts, facilitating shared knowledge and methodology to support diverse XAI design goals and evaluation methods.
The paper categorizes eight design goals for XAI systems:
- Algorithmic Transparency: Key for AI novices, this goal focuses on providing comprehensible insights into how intelligent systems function, ensuring users understand the model's decision-making process.
- User Trust and Reliance: Explanations help build appropriate user trust in and reliance on AI, which is pivotal in recommendation systems and autonomous applications.
- Bias Mitigation: A critical objective is using XAI systems to detect and mitigate discrimination and bias in algorithmic decision-making, which is particularly relevant in domains like criminal justice and financial services.
- Privacy Awareness: Ensures that AI systems transparently communicate to end-users how their data is used in decision-making, thus safeguarding user privacy.
- Model Visualization and Inspection: Targeted at data experts, this goal facilitates the visualization and inspection of machine learning models, aiding in understanding and refining models.
- Model Tuning and Selection: Enables data experts to interact visually with model parameters, supporting tuning and model selection for domain-specific datasets.
- Model Interpretability: Designed for AI experts to derive insights into model behavior, using inherently interpretable models or post-hoc explanation algorithms; a minimal sketch follows this list.
- Model Debugging: Empowers AI experts to utilize interpretability techniques to identify and rectify flaws in model architecture or dataset bias.
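To ground the post-hoc explanation goal, the sketch below shows permutation feature importance, one widely used model-agnostic technique. This is our illustration, not a method prescribed by the survey; `clf` in the usage comment stands for a hypothetical fitted classifier with a `.predict` method.

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=5, seed=0):
    """Model-agnostic, post-hoc explanation: a feature's importance is the
    drop in the metric when that feature's column is randomly shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its relationship with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric(y, model.predict(X_perm)))
        importances[j] = float(np.mean(drops))
    return importances

# Example usage with a hypothetical fitted classifier `clf`:
# accuracy = lambda y_true, y_pred: np.mean(y_true == y_pred)
# scores = permutation_importance(clf, X_test, y_test, accuracy)
```

Large score drops flag the features the model leans on most, which serves both interpretability (understanding behavior) and debugging (spotting reliance on spurious or biased features).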
The proposed framework is a nested model for XAI system design and evaluation, consisting of three layers: XAI System Goals, User Interface Design, and Interpretable Algorithms, each with tailored evaluation methods. It prescribes an iterative cycle of design and evaluation, supporting continuous refinement of systems toward the identified XAI goals.
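One way to make the nested structure concrete is to encode the layers, outermost first, together with representative per-layer evaluation methods. The pairings below are our reading of the framework, not an exhaustive mapping from the paper:

```python
# The three nested layers, outermost first. The per-layer evaluation
# methods are representative pairings, not the paper's full mapping.
NESTED_MODEL = {
    "XAI System Goals": ["mental model studies",
                         "user trust and reliance measures"],
    "User Interface Design": ["usefulness and satisfaction ratings",
                              "human-AI task performance"],
    "Interpretable Algorithms": ["computational fidelity and reliability measures"],
}

def design_evaluation_cycle(n_iterations: int = 3) -> None:
    """Iterate design and evaluation over the layers, outer to inner,
    feeding evaluation findings back into the next design pass."""
    for i in range(n_iterations):
        for layer, evaluations in NESTED_MODEL.items():
            print(f"iteration {i + 1}: refine '{layer}' "
                  f"using {', '.join(evaluations)}")

design_evaluation_cycle(1)
```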
Evaluation Measures
The paper takes a nuanced approach to evaluation, organized around five main measures:
- Mental Model: Assesses users' understanding of how the AI works, for example by asking users to predict model outputs or through interviews.
- Explanation Usefulness and Satisfaction: Gauges the utility of explanations through subjective user ratings or through performance on tasks that depend on the explanations.
- User Trust and Reliance: Encompasses both subjective assessments (e.g., trust rating scales) and objective assessments (e.g., how often users follow the AI's recommendations) of users' trust in and reliance on AI-enabled systems.
- Human-AI Task Performance: Evaluates how well the user and AI collaborate on a task, measuring whether the availability of explanations improves user performance.
- Computational Measures: Evaluate explanations without human subjects, covering the fidelity of explanations to the original black-box model and the reliability of model predictions; a fidelity sketch follows this list.
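As a concrete instance of a computational measure, the fidelity of a global surrogate explanation can be estimated as the rate at which an interpretable surrogate reproduces the black box's predictions on held-out data. The sketch below uses scikit-learn and synthetic data; it illustrates the general idea, not the paper's own evaluation protocol.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Black-box model we want to explain.
black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Interpretable global surrogate trained to mimic the black box's
# predictions rather than the ground-truth labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: fraction of held-out inputs on which surrogate and
# black box agree.
fidelity = np.mean(surrogate.predict(X_test) == black_box.predict(X_test))
print(f"Surrogate fidelity to black box: {fidelity:.2%}")
```

Because the surrogate is fit to the black box's outputs rather than the true labels, the agreement rate measures how faithfully the interpretable model mimics the black box, independent of either model's accuracy on the task.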
Implications and Future Work
This framework marks a significant step for interdisciplinary scholarship in XAI, aligning diverse theoretical and methodological perspectives. By systematically aligning design goals, user types, and evaluation strategies, it paves the way for more targeted and effective XAI system design. Future research could extend the framework to domain-specific applications, adapt it to evolving AI technologies, and incorporate more granular measures of user satisfaction for comprehensive evaluation. The collaborative lens proposed in the paper can serve as a foundational guide for researchers tackling the challenges posed by AI opacity, ultimately supporting Responsible AI development.