
Faithful Attention Explainer: Verbalizing Decisions Based on Discriminative Features (2405.13032v2)

Published 16 May 2024 in cs.CL, cs.AI, and cs.CV

Abstract: In recent years, model explanation methods have been designed to interpret model decisions faithfully and intuitively so that users can easily understand them. In this paper, we propose a framework, Faithful Attention Explainer (FAE), capable of generating faithful textual explanations regarding the attended-to features. Towards this goal, we deploy an attention module that takes the visual feature maps from the classifier for sentence generation. Furthermore, our method successfully learns the association between features and words, which allows a novel attention enforcement module for attention explanation. Our model achieves promising performance in caption quality metrics and a faithful decision-relevance metric on two datasets (CUB and ACT-X). In addition, we show that FAE can interpret gaze-based human attention, as human gaze indicates the discriminative features that humans use for decision-making, demonstrating the potential of deploying human gaze for advanced human-AI interaction.
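The core mechanism the abstract describes, an attention module that weights the classifier's visual feature maps when generating each word, can be illustrated with a minimal sketch. This is not the paper's implementation; the dot-product scoring, the `attend` function, and the toy two-region features below are all illustrative assumptions in the style of soft visual attention for captioning.

```python
import math

def softmax(scores):
    # numerically stable softmax over region scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(feature_maps, query):
    """Soft attention sketch (illustrative, not the paper's exact module).

    feature_maps: list of per-region feature vectors from the classifier
    query: decoder state used to score regions (dot-product scoring assumed)
    Returns attention weights over regions and the attended context vector.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in feature_maps]
    alphas = softmax(scores)  # attention weights, sum to 1 over regions
    dim = len(feature_maps[0])
    # context vector: attention-weighted sum of region features
    context = [sum(a * feat[d] for a, feat in zip(alphas, feature_maps))
               for d in range(dim)]
    return alphas, context
```

Because the weights `alphas` form a distribution over spatial regions, they can be read directly as "which features the model attended to" for a given word, which is what makes verbalizing them as an explanation plausible.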


