Transformer Interpretability Beyond Attention Visualization (2012.09838v2)

Published 17 Dec 2020 in cs.CV

Abstract: Self-attention techniques, and specifically Transformers, are dominating the field of text processing and are becoming increasingly popular in computer vision classification tasks. In order to visualize the parts of the image that led to a certain classification, existing methods either rely on the obtained attention maps or employ heuristic propagation along the attention graph. In this work, we propose a novel way to compute relevancy for Transformer networks. The method assigns local relevance based on the Deep Taylor Decomposition principle and then propagates these relevancy scores through the layers. This propagation involves attention layers and skip connections, which challenge existing methods. Our solution is based on a specific formulation that is shown to maintain the total relevancy across layers. We benchmark our method on very recent visual Transformer networks, as well as on a text classification problem, and demonstrate a clear advantage over the existing explainability methods.

Authors (3)
  1. Hila Chefer (14 papers)
  2. Shir Gur (13 papers)
  3. Lior Wolf (217 papers)
Citations (570)

Summary

The research paper "Transformer Interpretability Beyond Attention Visualization" advances interpretability techniques for Transformer networks, with an emphasis on overcoming the limitations of existing attention-based explanations. The authors propose a novel approach that uses the Deep Taylor Decomposition (DTD) principle to assign local relevance and then propagates these relevancy scores through the network's layers, maintaining consistent total relevancy across layers.
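
The "consistent total relevancy" mentioned above is the conservation property familiar from LRP-style attribution methods. Stated generically (illustrative notation, not the paper's exact formulation), the relevance redistributed from one layer to the next is preserved:

```latex
% Generic LRP-style conservation across adjacent layers (notation illustrative):
% the relevance assigned to the units of layer l sums to that of layer l+1.
\sum_{i} R_i^{(l)} = \sum_{j} R_j^{(l+1)}
```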

Key Contributions

The authors highlight several challenges intrinsic to Transformer architectures:

  • Attention Visualization Limitations: Traditional methods rely primarily on attention maps to explain model decisions, which yields incomplete insights because raw attention captures only one component of the self-attention computation and ignores the rest of the network.
  • Non-Linearities and Skip Connections: Transformers combine non-linear transformations with skip connections, both of which complicate relevance propagation and require dedicated handling to maintain interpretability.

To address these challenges, the paper introduces a method that applies Deep Taylor Decomposition principles, ensuring that relevance is propagated comprehensively across layers. This method integrates attention layers and skip connections within the relevancy propagation process, overcoming numerical instability issues and ensuring relevance conservation through normalization.
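
To make the propagation concrete, the snippet below gives a minimal, illustrative sketch of gradient-weighted relevance accumulation across attention layers, with an identity term standing in for the skip connections. The function name relevance_rollout, the inputs attn_maps/attn_grads, and the exact update and normalization rules are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): accumulate relevance through the
# attention maps of a ViT-style model, weighting each map by its gradient with
# respect to the target class score and adding an identity term for the skip path.
import torch

def relevance_rollout(attn_maps, attn_grads):
    """attn_maps, attn_grads: per-layer tensors of shape (heads, tokens, tokens),
    collected during a forward pass and a backward pass from the target logit."""
    num_tokens = attn_maps[0].shape[-1]
    R = torch.eye(num_tokens)                        # identity term models the skip connection
    for A, dA in zip(attn_maps, attn_grads):
        cam = (dA * A).clamp(min=0).mean(dim=0)      # keep positive evidence, average over heads
        cam = cam / cam.sum(dim=-1, keepdim=True).clamp(min=1e-6)  # row-normalize for conservation
        R = R + cam @ R                              # propagate relevance through this layer
    return R[0, 1:]                                  # patch relevance seen from the [CLS] token
```

In practice, the returned per-patch relevance would be reshaped to the patch grid and upsampled to image resolution before visualization.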

Experimental Insights

The authors conduct comprehensive benchmarks evaluating their method against various existing explainability approaches on visual and textual datasets:

  • Visual Benchmarks: Tests on recent visual Transformer models include image segmentation on the ImageNet-Segmentation subset and perturbation tests on the ImageNet validation set. Their method demonstrated superior performance on segmentation metrics such as pixel accuracy, mAP, and mIoU (a sketch of this kind of evaluation follows the list).
  • Textual Benchmarks: The method was tested on linguistic datasets using BERT, displaying superior performance in identifying human-marked rationales in sentiment analysis tasks.
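
As a rough illustration of how the segmentation metrics above can be computed from an explanation map, the sketch below binarizes a relevance map and scores it against a ground-truth mask. The mean-value threshold and the function segmentation_scores are illustrative assumptions, not the benchmark's exact protocol.

```python
# Illustrative sketch: score a thresholded relevance map against a binary
# ground-truth segmentation mask using pixel accuracy and IoU.
import numpy as np

def segmentation_scores(relevance, gt_mask):
    """relevance: (H, W) float array; gt_mask: (H, W) boolean array."""
    pred = relevance > relevance.mean()             # binarize the explanation
    pixel_acc = (pred == gt_mask).mean()            # fraction of correctly labeled pixels
    inter = np.logical_and(pred, gt_mask).sum()
    union = np.logical_or(pred, gt_mask).sum()
    iou = inter / union if union > 0 else 1.0       # foreground intersection-over-union
    return pixel_acc, iou
```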

Implications and Future Directions

The paper has notable implications for the field of explainable AI, particularly in improving the robustness and reliability of interpretability methods for Transformer networks:

  • Practical Applications: Enhanced interpretability aids in debugging models, ensuring fairness, and improving bias detection, proving invaluable in sensitive applications like healthcare and finance.
  • Theoretical Implications: The introduction of more accurate and comprehensive interpretability techniques contributes to a deeper understanding of Transformer-based architectures, potentially influencing future model design.

Looking forward, the research can stimulate further developments in interpretability mechanisms that could be generalized across other complex architectures beyond Transformers. Additionally, adapting this work for real-time applications and lightweight models could facilitate broader applicability.

Conclusion

The authors have presented a thoughtful advancement in interpretability techniques for Transformers, demonstrating significant improvements over previous methods. Their research underscores the importance of a thorough understanding of model mechanics beyond mere attention visualization, paving the way for more transparent AI systems. This paper stands as a vital contribution to the continued evolution of explainable AI, with meaningful impacts on both practical and theoretical fronts.
