- The paper introduces two novel architectures, EMD-Corr and CHM-Corr, that integrate visual correspondence explanations directly into the prediction process.
- It demonstrates improved out-of-distribution performance on datasets like DAmageNet and Adversarial Patch compared to traditional models.
- The proposed approach enhances human-AI teamwork by providing more transparent and effective explanations that help reject incorrect predictions.
Analysis of Visual Correspondence-Based Explanations for Improved AI Robustness
This paper introduces a method for enhancing the robustness of image classifiers using visual correspondence-based explanations. The core contribution is two novel architectures, EMD-Corr and CHM-Corr, which aim to outperform conventional baselines such as ResNet-50 and k-Nearest Neighbors (kNN) by leveraging patch-level correspondences between a query image and exemplars from the training set.
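To make the core idea concrete, here is a minimal sketch (not the authors' code) of extracting patch-level correspondences between a query image and one training exemplar from ResNet-50 conv features. Plain cosine similarity stands in for the paper's EMD/CHM matching, and the helper names and the choice of `k` are illustrative; images are assumed to be already preprocessed to 224x224 tensors.

```python
# Sketch: patch-level correspondences via cosine similarity over
# ResNet-50 conv features (a stand-in for the paper's EMD/CHM matching).
import torch
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# Drop the avgpool/fc head to keep the 7x7 spatial feature map.
encoder = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

@torch.no_grad()
def patch_features(image):                     # image: (3, 224, 224)
    fmap = encoder(image.unsqueeze(0))         # (1, 2048, 7, 7)
    patches = fmap.flatten(2).squeeze(0).T     # (49, 2048): one row per patch
    return torch.nn.functional.normalize(patches, dim=1)

def top_correspondences(query_img, exemplar_img, k=5):
    q, e = patch_features(query_img), patch_features(exemplar_img)
    sim = q @ e.T                              # (49, 49) cosine similarities
    best_sim, best_idx = sim.max(dim=1)        # best exemplar patch per query patch
    top_q = best_sim.topk(k).indices           # k most confident query patches
    return [(int(i), int(best_idx[i]), float(best_sim[i])) for i in top_q]
```

The returned (query patch, exemplar patch, similarity) triples are exactly the kind of correspondences that double as both explanation and prediction evidence in the proposed classifiers.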
Methodological Advances
- Classifier Design: Unlike traditional post-hoc explanation methods, the proposed classifiers incorporate the explanation into the prediction process itself: EMD-Corr and CHM-Corr first compute visual correspondences to form an explanation and then use that explanation to inform the prediction.
- Two-Stage Approach: Both models employ a two-stage process. First, global feature similarity is used to shortlist candidate training images via kNN. Then a re-ranking step based on patch-level similarity is performed, using the Earth Mover's Distance in EMD-Corr and Convolutional Hough Matching in CHM-Corr (see the pipeline sketch after this list).
- Evaluation Datasets: The methods are evaluated on ImageNet and out-of-distribution variants such as ImageNet Sketch and DAmageNet, demonstrating robustness on both in-distribution and out-of-distribution data.
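The two-stage pipeline can be sketched as below, assuming precomputed embeddings: `global_feats` (N, D) image-level features and `patch_feats` (N, P, D) L2-normalized patch features for N training images. All names and parameters here are hypothetical. With uniform mass on equal-sized patch sets, EMD reduces to an optimal-assignment problem, so SciPy's Hungarian solver stands in for the paper's EMD/CHM re-ranking.

```python
# Sketch of the two-stage classify-by-correspondence pipeline.
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_uniform(q_patches, e_patches):
    """EMD between two equal-sized, uniformly weighted patch sets."""
    cost = 1.0 - q_patches @ e_patches.T        # (P, P) cosine distances
    rows, cols = linear_sum_assignment(cost)    # optimal patch matching
    return cost[rows, cols].mean()

def two_stage_predict(q_global, q_patches, global_feats, patch_feats,
                      labels, shortlist=50, k=20):
    # Stage 1: shortlist candidates by global cosine similarity (kNN).
    sims = global_feats @ q_global
    candidates = np.argsort(-sims)[:shortlist]
    # Stage 2: re-rank the shortlist by patch-level EMD.
    dists = [emd_uniform(q_patches, patch_feats[i]) for i in candidates]
    reranked = candidates[np.argsort(dists)][:k]
    # Predict by majority vote over the k re-ranked nearest neighbors;
    # the matched patches of these neighbors form the explanation.
    votes = np.bincount(labels[reranked])
    return votes.argmax(), reranked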
Results and Findings
- Out-of-Distribution Performance: The paper presents compelling evidence that EMD-Corr and CHM-Corr yield better out-of-distribution performance than their baseline counterparts. On the DAmageNet and Adversarial Patch datasets, for example, both methods achieve notable accuracy gains over ResNet-50 and kNN.
- Human Study Outcomes: Human-AI team performance was evaluated in user studies on ImageNet and CUB classification tasks. The correspondence-based explanations helped users achieve higher accuracy, especially in rejecting incorrect AI predictions on the fine-grained CUB task. This result is important because it validates the effectiveness of visual correspondence-based explanations in practice.
- Prototype Utility: An ablation study found that human-defined keypoints were less effective than the patch selections inferred by the Corr models, suggesting that automatically determined visual patches capture more pertinent features than predefined prototypes.
Implications and Future Directions
This research extends the prototype-based explanation paradigm by integrating it directly into the prediction mechanism. The implications extend beyond classification to applications where detailed, patch-level analysis can drive more accurate and efficient decision-making. Future research could explore scalability and real-time applicability in dynamic settings such as video analysis and other perceptual AI tasks.
Moreover, these findings open avenues for improving human interpretability on fine-grained and less structured datasets. Addressing computational complexity and assessing real-time applicability in domains such as autonomous driving and medical diagnostics will be critical next steps.
In conclusion, the paper demonstrates that embedding explanation within the prediction loop enhances both AI robustness under adversarial conditions and human trust in AI systems, paving the way for future models that are not only accurate but also transparent and interpretable to users.