Relatable Explainable AI: Insights from the Perceptual Process
The paper "Towards Relatable Explainable AI with the Perceptual Process" by Wencan Zhang and Brian Y. Lim addresses a prevalent challenge in machine learning: the need for more intuitive and relatable explanations of AI predictions, particularly in applications like vocal emotion recognition. The authors propose a framework inspired by cognitive psychology's perceptual process, which emphasizes understanding AI decisions in ways that align with human reasoning processes.
Framework and Model Proposal
The paper introduces the XAI Perceptual Processing Framework and the associated RexNet model, designed to enhance explainability through three complementary explanation types: Contrastive Saliency, Counterfactual Synthetic, and Contrastive Cues. The framework is grounded in the human perceptual process, in which stimuli are selected, organized, and interpreted to form a coherent percept. Mirroring these stages lets an AI system produce explanations that are not only technically faithful but also semantically meaningful and relatable.
The RexNet model targets vocal emotion recognition, a task the authors identify as lacking adequate explainability, by combining deep learning techniques with the proposed framework. Its modular architecture delivers multiple layers of explanation for each prediction, offering insight not only into the final label but also into the intermediate reasoning that supports it, as sketched below.
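To make the modular idea concrete, the sketch below shows how a model in this spirit might return a prediction together with the three explanation types. It is a minimal illustration under assumed inputs (a mel-spectrogram) and invented names (RexNetSketch, cue_head, the gradient-based saliency, and the embedding-level counterfactual); it is not the authors' implementation, and the published RexNet architecture is considerably more elaborate.

```python
# Minimal sketch of a modular explainable model in the spirit of RexNet.
# All module names, shapes, and the toy cue features are assumptions for
# illustration; they do not reproduce the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RexNetSketch(nn.Module):
    """Predicts an emotion and exposes three relatable explanation outputs."""

    def __init__(self, n_mels: int = 64, n_emotions: int = 4, hidden: int = 128):
        super().__init__()
        # Shared encoder over a (batch, 1, n_mels, time) spectrogram (assumed input).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
            nn.Linear(16 * 8 * 8, hidden), nn.ReLU(),
        )
        self.classifier = nn.Linear(hidden, n_emotions)               # prediction head
        self.counterfactual = nn.Linear(hidden + n_emotions, hidden)  # synthesizes a contrast-class embedding
        self.cue_head = nn.Linear(hidden, 2)                          # interpretable cues (assumed: pitch, loudness proxies)

    def forward(self, spec: torch.Tensor):
        z = self.encoder(spec)
        logits = self.classifier(z)
        return z, logits

    def explain(self, spec: torch.Tensor, contrast_class: int):
        """Return the three explanation types for the top prediction vs. a contrast class."""
        spec = spec.clone().requires_grad_(True)
        z, logits = self.forward(spec)
        pred = logits.argmax(dim=-1)

        # 1) Contrastive Saliency: gradient of (predicted - contrast) logit w.r.t. the input,
        #    highlighting which time-frequency regions favour the prediction over the contrast.
        margin = logits[0, pred[0]] - logits[0, contrast_class]
        saliency = torch.autograd.grad(margin, spec, retain_graph=True)[0].abs()

        # 2) Counterfactual Synthetic: an embedding nudged toward the contrast class,
        #    standing in for "what the voice would sound like if it expressed the contrast emotion".
        onehot = F.one_hot(torch.tensor([contrast_class]), logits.shape[-1]).float()
        z_cf = self.counterfactual(torch.cat([z, onehot], dim=-1))

        # 3) Contrastive Cues: differences in interpretable features between the
        #    input and its counterfactual.
        cues = self.cue_head(z) - self.cue_head(z_cf)

        return {"prediction": pred, "saliency": saliency, "counterfactual": z_cf, "cues": cues}


if __name__ == "__main__":
    model = RexNetSketch()
    fake_spec = torch.randn(1, 1, 64, 100)   # one mel-spectrogram with 100 frames
    outputs = model.explain(fake_spec, contrast_class=2)
    print({k: tuple(v.shape) for k, v in outputs.items()})
```

Running the script prints the output shapes; the point is the interface, a single prediction accompanied by contrastive, counterfactual, and cue-level evidence, rather than any particular architecture.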
Empirical Evaluations and Findings
The authors evaluated the framework and RexNet model through modeling studies and user experiments. In the modeling study, RexNet outperformed a baseline CNN at predicting emotions, demonstrating the practical viability of the proposed approach. The individual modules within RexNet, particularly the contrastive explanations, were shown to enhance interpretability without significantly compromising predictive performance.
Further, the user studies revealed nuanced insights into the usability and effectiveness of the different explanation types. The Counterfactual Synthetic module, especially when paired with semantic cues, was particularly appreciated by users and substantially helped them make sense of audio predictions. Conversely, plain saliency explanations proved less useful, pointing to a disconnect between low-level technical explanations and human interpretability.
Implications and Future Directions
This research highlights the importance of developing explainable AI systems that resonate with human reasoning, enhancing both transparency and user trust. The implications are profound for applications involving human-AI interaction, where understanding the 'why' behind decisions is crucial.
The framework's grounding in human cognitive processes suggests natural extensions. Applying similar strategies to other domains, such as computer vision or structured data analysis, could improve explainability across a wider range of AI tasks. Another avenue worth exploring is refining counterfactual synthetic explanations, potentially with more advanced generative models, to improve their coherence and realism.
Moreover, future research could integrate this framework with other human factors to mitigate cognitive biases and further improve relatability. The goal is to design AI systems not merely to reason 'like humans', but to explain themselves in ways that human users find intuitive and comprehensible.
In conclusion, the work by Zhang and Lim presents a structured pathway towards making AI more comprehensible by aligning machine explanations with human cognitive structures. The framework and findings, while grounded in vocal emotion recognition, lay the groundwork for broader applications, advocating for a more human-centric approach to AI explainability.