Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs (2502.12352v2)

Published 17 Feb 2025 in cs.LG and cs.AI

Abstract: We introduce Attention Graphs, a new tool for mechanistic interpretability of Graph Neural Networks (GNNs) and Graph Transformers based on the mathematical equivalence between message passing in GNNs and the self-attention mechanism in Transformers. Attention Graphs aggregate attention matrices across Transformer layers and heads to describe how information flows among input nodes. Through experiments on homophilous and heterophilous node classification tasks, we analyze Attention Graphs from a network science perspective and find that: (1) When Graph Transformers are allowed to learn the optimal graph structure using all-to-all attention among input nodes, the Attention Graphs learned by the model do not tend to correlate with the input/original graph structure; and (2) For heterophilous graphs, different Graph Transformer variants can achieve similar performance while utilising distinct information flow patterns. Open source code: https://github.com/batu-el/understanding-inductive-biases-of-gnns

Summary

  • The paper proposes the Attention Graphs framework to interpret Graph Transformers by aggregating attention patterns across multiple layers and heads.
  • Analysis with Attention Graphs shows Graph Transformers use distinct information flow strategies that do not always follow the input graph structure.
  • Findings suggest that adapting Graph Transformers to the graph type (e.g., local attention for homophilous graphs) can improve task performance and inform model design.

Mechanistic Interpretability of Graph Transformers through Attention Graphs

The transformation of deep learning models from opaque "black boxes" to interpretable systems is a crucial step for advancing their application in scientific research. The paper "Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs" proposes a novel method for understanding Graph Neural Networks (GNNs) and Graph Transformers (GTs) through the lens of network science. The authors introduce "Attention Graphs" as a framework to aggregate attention patterns from multiple layers and heads of Transformer models, providing a detailed perspective on how these models process information.

Key Contributions

The paper makes several notable contributions:

  1. Attention Graphs Framework: By creating Attention Graphs, the research introduces a new approach to mechanistically interpreting how graph-based deep learning architectures manage information flow. Attention Graphs aggregate attention matrices across the heads and layers of GTs to map how information propagates among input nodes (see the sketch after this list).
  2. Analysis Across Graph Types: The framework is applied to node classification tasks on homophilous and heterophilous datasets, showcasing differences in information flow patterns. The results indicate that GTs do not necessarily follow the structure of the original input graph, a counterintuitive finding given the common assumption that graph-based architectures leverage the provided graph structure.
  3. Diverse Computational Strategies: The paper identifies that although different variants of GTs, such as Dense and Sparse models, achieve similar performance levels, they employ distinct computational strategies. This is evidenced by varying attention distribution patterns, particularly concerning the spread of attention between neighboring and non-neighboring nodes.
  4. Theoretical and Practical Implications: The research suggests that GTs and attentional models can be tuned to prioritize certain types of information flow, which could lead to improved model designs for specific applications. The framework also emphasizes the potential for future studies to apply network science methodologies for comprehensive evaluations of model behavior.
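
A minimal sketch of the aggregation idea, written in Python/NumPy under illustrative assumptions: attention is averaged over heads within each layer and the per-layer matrices are composed via matrix products (in the spirit of attention rollout), with the residual connection folded in. The function name and this particular composition rule are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def attention_graph(attn, add_residual=True):
    """Aggregate per-layer, per-head attention into a single Attention Graph.

    attn: array of shape (num_layers, num_heads, num_nodes, num_nodes),
          where attn[l, h, i, j] is the attention node i pays to node j.
    Returns an (num_nodes, num_nodes) quasi-adjacency matrix describing
    end-to-end information flow among the input nodes.
    """
    num_layers, _, n, _ = attn.shape
    flow = np.eye(n)
    for layer in range(num_layers):
        # Average attention over heads within the layer.
        a = attn[layer].mean(axis=0)
        if add_residual:
            # Account for the residual connection, then re-normalise rows.
            a = a + np.eye(n)
            a = a / a.sum(axis=1, keepdims=True)
        # Compose with the flow accumulated from earlier layers.
        flow = a @ flow
    return flow

# Example: 2 layers, 4 heads, 5 nodes of random (row-normalised) attention.
rng = np.random.default_rng(0)
raw = rng.random((2, 4, 5, 5))
attn = raw / raw.sum(axis=-1, keepdims=True)
G = attention_graph(attn)
print(G.shape, G.sum(axis=1))  # (5, 5); rows sum to ~1
```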

Experimental Setup and Findings

The experiments apply the Attention Graphs framework to seven node classification datasets with varying levels of homophily. The authors evaluate different GT configurations, including Sparse and Dense learned models, to show how changes to the attention mechanism affect information propagation.
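
To make the sparse/dense distinction concrete, the sketch below contrasts the two attention regimes on a toy score matrix: a Sparse model masks attention to the edges of the input graph before the softmax, while a Dense model attends all-to-all. The masking scheme and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy attention scores for 4 nodes and the input graph's adjacency matrix.
scores = np.array([[1.0, 0.2, 0.5, 0.1],
                   [0.3, 1.0, 0.4, 0.2],
                   [0.6, 0.1, 1.0, 0.7],
                   [0.2, 0.9, 0.3, 1.0]])
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])

# Dense (all-to-all) attention: every node may attend to every other node.
dense_attn = softmax(scores)

# Sparse attention: scores on non-edges are masked out before the softmax,
# so information can only flow along the input graph structure.
masked = np.where(adj == 1, scores, -np.inf)
sparse_attn = softmax(masked)

print(dense_attn.round(2))
print(sparse_attn.round(2))
```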

Significant findings included:

  • Performance on Homophilous vs. Heterophilous Graphs: While local attention (Sparse models) was optimal for homophilous graphs, dense attention patterns (Dense models) suited heterophilous graphs, underscoring the necessity for adaptable attention mechanisms in GTs.
  • Self-attention and Reference Nodes: Dense learned models tend to develop reference nodes, visible as a vertical pattern in their quasi-adjacency matrices, suggesting a reliance on comparing nodes against a few key reference nodes (a simple diagnostic for this pattern is sketched below).
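
Given an Attention Graph and the input adjacency matrix, both observations can be quantified with simple diagnostics: measure how much of each node's outgoing attention lands on its graph neighbours, and look for columns that receive disproportionate incoming attention (the vertical pattern associated with reference nodes). The sketch below is an illustrative diagnostic, not the paper's exact analysis.

```python
import numpy as np

def neighbour_attention_mass(flow, adj):
    """Fraction of each node's outgoing attention that lands on its
    input-graph neighbours (self-loops included)."""
    neigh = np.maximum(adj, np.eye(adj.shape[0]))
    return (flow * neigh).sum(axis=1) / flow.sum(axis=1)

def reference_node_scores(flow):
    """Total incoming attention per node; a few dominant entries would
    correspond to the vertical columns of reference nodes."""
    return flow.sum(axis=0)

# Toy Attention Graph (rows ~ where each node sends attention) and input graph.
flow = np.array([[0.1, 0.7, 0.1, 0.1],
                 [0.1, 0.7, 0.1, 0.1],
                 [0.1, 0.6, 0.2, 0.1],
                 [0.1, 0.6, 0.1, 0.2]])
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])

print(neighbour_attention_mass(flow, adj))  # low values => non-local flow
print(reference_node_scores(flow))          # node 1 dominates => reference node
```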

Future Directions

The implications of this paper open several avenues for AI research. Future work might leverage the interpretability framework presented here to investigate how these networks can be optimized for domains such as biology or physics, where understanding a model's reasoning is critical. Additionally, increasing the granularity of Attention Graphs to capture non-linear interactions between layers might reveal deeper insights into network dynamics.

Moreover, this paper raises intriguing theoretical questions about the nature of attention in GTs and potential parallels with human cognitive processes, especially in systems focused on prediction and classification. As AI continues to integrate more closely with complex systems and scientific applications, frameworks like Attention Graphs will be invaluable in aligning models' internal mechanisms with human-interpretable patterns.

In conclusion, this paper provides a stepping stone for developing a more nuanced understanding of graph-based neural architectures, with wide-ranging implications for both future research and practical applications in AI-driven scientific discovery.
