
Connecting Algorithmic Research and Usage Contexts: A Perspective of Contextualized Evaluation for Explainable AI (2206.10847v3)

Published 22 Jun 2022 in cs.AI and cs.HC

Abstract: Recent years have seen a surge of interest in the field of explainable AI (XAI), with a plethora of algorithms proposed in the literature. However, a lack of consensus on how to evaluate XAI hinders the advancement of the field. We highlight that XAI is not a monolithic set of technologies -- researchers and practitioners have begun to leverage XAI algorithms to build XAI systems that serve different usage contexts, such as model debugging and decision-support. Algorithmic research of XAI, however, often does not account for these diverse downstream usage contexts, resulting in limited effectiveness or even unintended consequences for actual users, as well as difficulties for practitioners to make technical choices. We argue that one way to close the gap is to develop evaluation methods that account for different user requirements in these usage contexts. Towards this goal, we introduce a perspective of contextualized XAI evaluation by considering the relative importance of XAI evaluation criteria for prototypical usage contexts of XAI. To explore the context dependency of XAI evaluation criteria, we conduct two survey studies, one with XAI topical experts and another with crowd workers. Our results urge for responsible AI research with usage-informed evaluation practices, and provide a nuanced understanding of user requirements for XAI in different usage contexts.

Contextualized Evaluation for Explainable AI

The paper "Connecting Algorithmic Research and Usage Contexts: A Perspective of Contextualized Evaluation for Explainable AI" addresses the challenge of evaluating explainable AI (XAI) in a way that accounts for diverse usage contexts. The lack of a unified evaluation framework has been a major bottleneck for effective application of XAI across domains. The paper stresses that XAI is not monolithic: user requirements depend on the deployment context. The authors therefore argue for a contextualized evaluation perspective that prioritizes different criteria depending on the application context, such as model debugging or decision support.

Key Concepts and Methodology

The authors categorize XAI evaluation criteria and usage contexts by synthesizing the existing literature, emphasizing the importance of aligning evaluation techniques with the intended user objectives and contexts. The paper distinguishes between model-intrinsic criteria, such as faithfulness and stability, and human-centered properties, such as comprehensibility and actionability. These distinctions underscore the need to evaluate explanations against users' perceptions and task-specific requirements, not merely against computational metrics.
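
To make the model-intrinsic side concrete, the sketch below shows one common way such a criterion can be quantified: a deletion-based faithfulness check that compares the prediction drop from removing the top-attributed features against removing random ones. This is an illustrative example, not a method from the paper; `model_fn`, the zero-value deletion baseline, and all parameters are assumptions.

```python
# Illustrative sketch (not the paper's method): a deletion-based
# faithfulness check for a feature-attribution explanation.
# `model_fn` is a hypothetical scalar-output prediction function.
import numpy as np

def deletion_faithfulness(model_fn, x, attributions, k=5, n_random=20, rng=None):
    """Compare the prediction drop from deleting the top-k attributed
    features against the drop from deleting k random features.
    A faithful explanation should yield a larger drop for the top-k set."""
    rng = rng or np.random.default_rng(0)
    base = model_fn(x)

    def drop_after_deleting(idx):
        x_pert = x.copy()
        x_pert[idx] = 0.0          # simple baseline: zero out deleted features
        return base - model_fn(x_pert)

    top_k = np.argsort(attributions)[::-1][:k]
    top_drop = drop_after_deleting(top_k)

    random_drops = [
        drop_after_deleting(rng.choice(len(x), size=k, replace=False))
        for _ in range(n_random)
    ]
    # Positive score: top-attributed features matter more than random ones.
    return top_drop - float(np.mean(random_drops))
```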

To substantiate this approach, the authors conducted two survey studies: one with XAI experts and another with crowd workers acting as end-users of a hypothetical AI investment application. This dual-perspective design delineates priorities across contexts. The results show that the perceived importance of criteria such as faithfulness, translucence, and uncertainty communication differs significantly between contexts such as capability assessment and decision support.
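
As a toy illustration of how such ratings can be summarized, the snippet below computes the mean importance rating per criterion within each context. The data, column names, and Likert scale are invented for illustration, not taken from the paper's studies.

```python
# Hypothetical sketch: summarizing survey ratings of criterion importance
# per usage context, in the spirit of the paper's two survey studies.
import pandas as pd

ratings = pd.DataFrame([
    # respondent, usage context, criterion, importance rating (1-5 Likert)
    ("p1", "model_debugging", "faithfulness", 5),
    ("p1", "model_debugging", "comprehensibility", 3),
    ("p2", "decision_support", "faithfulness", 4),
    ("p2", "decision_support", "comprehensibility", 5),
], columns=["respondent", "context", "criterion", "rating"])

# Mean importance of each criterion within each context.
summary = ratings.groupby(["context", "criterion"])["rating"].mean().unstack()
print(summary)
```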

Strong Findings and Implications

The paper provides empirical evidence indicating that the importance of different XAI evaluation criteria varies significantly across contexts:

  • Faithfulness was consistently rated high across contexts, underscoring the necessity for explanations to accurately reflect model behavior.
  • Translucence and uncertainty communication were highlighted as critical yet underrepresented in existing XAI practice, pointing to areas for future research.
  • Comprehensibility emerges as essential in contexts where efficiency and reduced cognitive load are crucial, such as everyday decision-support tasks.

The research suggests that algorithms should be carefully matched with their application domains, and evaluation should explicitly reflect user-specific goals. Furthermore, the nuanced differences between experts' and end-users' perceptions call for a reflective examination of evaluative approaches, ensuring that evaluation aligns with real-world user needs.
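
One way to operationalize this matching is to weight per-criterion scores by their context-specific importance. The sketch below is a minimal illustration under invented weights and scores; the paper reports relative importance judgments from its surveys, not these numbers.

```python
# Hypothetical sketch of contextualized evaluation: score an XAI method
# by weighting per-criterion scores with context-specific importance
# weights. Criteria names follow the paper; all numbers are invented.
CONTEXT_WEIGHTS = {
    "model_debugging": {"faithfulness": 0.40, "translucence": 0.30,
                        "comprehensibility": 0.15, "uncertainty": 0.15},
    "decision_support": {"faithfulness": 0.30, "translucence": 0.15,
                         "comprehensibility": 0.35, "uncertainty": 0.20},
}

def contextual_score(criterion_scores: dict[str, float], context: str) -> float:
    """Weighted sum of criterion scores (each in [0, 1]) for a usage context."""
    weights = CONTEXT_WEIGHTS[context]
    return sum(weights[c] * criterion_scores.get(c, 0.0) for c in weights)

# Example: the same method scores differently under different contexts.
method_scores = {"faithfulness": 0.9, "translucence": 0.4,
                 "comprehensibility": 0.5, "uncertainty": 0.3}
print(contextual_score(method_scores, "model_debugging"))   # 0.60
print(contextual_score(method_scores, "decision_support"))  # 0.565
```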

Potential for Further Research

The paper opens pathways for future XAI development by encouraging researchers to clearly articulate the use contexts of their algorithms, which in turn should inform the creation of evaluation methods tailored to user requirements. This may involve the development of novel metrics that factor in context-specific evaluation criteria, enhancing the robustness of evaluations but requiring significant interdisciplinary input from human-computer interaction and AI research communities.

Furthermore, adopting a contextualized evaluation framework necessitates broader experimental studies that can validate these preliminary findings on a larger scale, ensuring that XAI techniques not only meet theoretical expectations but also have practical efficacy in diverse operational landscapes.

In conclusion, the paper offers a critical examination of the evaluation landscape in XAI research, pushing for methodologies that rigorously consider the context in which these tools are deployed. Its insights support the responsible development and use of XAI systems, aligning technical advances with societal and user-centric needs.

Authors (5)
  1. Q. Vera Liao
  2. Yunfeng Zhang
  3. Ronny Luss
  4. Finale Doshi-Velez
  5. Amit Dhurandhar
Citations (68)