
GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs (1803.07294v1)

Published 20 Mar 2018 in cs.LG and cs.SI

Abstract: We propose a new network architecture, Gated Attention Networks (GaAN), for learning on graphs. Unlike the traditional multi-head attention mechanism, which equally consumes all attention heads, GaAN uses a convolutional sub-network to control each attention head's importance. We demonstrate the effectiveness of GaAN on the inductive node classification problem. Moreover, with GaAN as a building block, we construct the Graph Gated Recurrent Unit (GGRU) to address the traffic speed forecasting problem. Extensive experiments on three real-world datasets show that our GaAN framework achieves state-of-the-art results on both tasks.

Authors (6)
  1. Jiani Zhang (21 papers)
  2. Xingjian Shi (35 papers)
  3. Junyuan Xie (16 papers)
  4. Hao Ma (116 papers)
  5. Irwin King (170 papers)
  6. Dit-Yan Yeung (78 papers)
Citations (550)

Summary

The paper "GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs" introduces an architecture for learning efficiently on graph-structured data. The proposed framework, GaAN, departs from traditional multi-head attention mechanisms by using a convolutional sub-network to dynamically weight the importance of each attention head.

Technical Contributions

The paper makes several key contributions:

  1. Gated Attention Networks (GaAN): GaAN introduces a multi-head attention-based aggregator in which a lightweight convolutional sub-network computes a scalar gate for each attention head. By down-weighting uninformative heads, this gating mechanism improves model expressiveness while incurring minimal computational overhead.
  2. Graph Gated Recurrent Unit (GGRU): The research extends GaAN by constructing GGRUs, recurrent units whose internal transformations are replaced with graph aggregators, enabling spatiotemporal forecasting tasks such as traffic speed prediction and demonstrating the flexibility of GaAN across tasks.
  3. Efficiency on Large Graphs: GaAN employs neighbor-sampling strategies that reduce memory usage and improve computational efficiency, making the model applicable to large real-world graphs.
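The gated aggregation described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the per-head query/key/value projections are standard dot-product attention, and the linear gate over the center node's features concatenated with max- and mean-pooled neighbor features is a simplified stand-in for the paper's convolutional gate sub-network; all weight shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention_aggregate(x_i, X_nbr, Wq, Wk, Wv, Wg):
    """Aggregate neighbor features X_nbr (n, d) into center node x_i (d,)
    using K attention heads, each scaled by a learned gate in (0, 1).

    Wq, Wk, Wv: (K, d, dh) per-head projection weights.
    Wg: (3*d, K) gate weights -- a cheap linear stand-in for the paper's
        convolutional gate sub-network (an assumption of this sketch).
    """
    K = Wq.shape[0]
    heads = []
    for k in range(K):
        q = x_i @ Wq[k]              # (dh,) query from the center node
        keys = X_nbr @ Wk[k]         # (n, dh) keys from neighbors
        vals = X_nbr @ Wv[k]         # (n, dh) values from neighbors
        attn = softmax(keys @ q)     # (n,) attention over neighbors
        heads.append(attn @ vals)    # (dh,) head output
    # One gate per head, computed from the center node's features plus
    # max- and mean-pooled neighbor features.
    pooled = np.concatenate([x_i, X_nbr.max(axis=0), X_nbr.mean(axis=0)])
    gates = 1.0 / (1.0 + np.exp(-(pooled @ Wg)))       # (K,) in (0, 1)
    # Scale each head by its gate before concatenating the heads.
    return np.concatenate([g * h for g, h in zip(gates, heads)]), gates
```

A plain multi-head aggregator would concatenate the heads unscaled; the gates let the model effectively switch off heads that are unhelpful for a given node.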

Experimental Results

GaAN's efficacy is validated through extensive experimentation on three real-world datasets: PPI, Reddit, and METR-LA. Key results include:

  • Node Classification: On the PPI and Reddit datasets, GaAN achieves state-of-the-art results for inductive node classification, outperforming existing models such as GraphSAGE and GAT thanks to its gated attention mechanism.
  • Traffic Forecasting: On the METR-LA spatiotemporal forecasting task, GaAN-based GGRUs also surpass competitive benchmarks, including DCRNN, indicating that gated graph attention captures the spatial dependencies needed for accurate forecasting.
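The GGRU used for these forecasting experiments can be sketched as a GRU whose dense transforms are replaced by graph aggregators. In the hedged sketch below, a simple normalized-adjacency mean over neighbors stands in for the GaAN aggregator, and the weight shapes and parameter names (`Wu`, `Wr`, `Wc`) are illustrative assumptions, not the paper's parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ggru_step(H, X, A, params):
    """One GGRU step: H (N, d_h) hidden states, X (N, d_in) inputs,
    A (N, N) adjacency. The GRU's linear maps are replaced by a graph
    aggregator; here a normalized-adjacency mean stands in for the
    GaAN attention aggregator (a simplification of this sketch)."""
    def agg(Z, W):
        # Average each node's neighbor features, then project.
        D = A.sum(axis=1, keepdims=True) + 1e-8
        return (A @ Z) / D @ W

    Wu, Wr, Wc = params
    XH = np.concatenate([X, H], axis=1)
    U = sigmoid(agg(XH, Wu))                              # update gate
    R = sigmoid(agg(XH, Wr))                              # reset gate
    C = np.tanh(agg(np.concatenate([X, R * H], axis=1), Wc))  # candidate
    return U * H + (1 - U) * C
```

Unrolling this step over a sequence of traffic-speed snapshots yields a recurrent model whose state updates respect the road-network graph, which is how the GGRU is applied to forecasting.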

Implications and Future Directions

The GaAN model marks a significant step toward efficient learning on complex graph structures. Its attention-gating mechanism may catalyze further research into attention-based methods on graphs, and its adaptability suggests broader applications in domains requiring dynamic interaction modeling, such as natural language processing.

Future research directions include integrating edge features into GaAN and extending its scalability to even larger graph datasets. Applying GaAN to natural language processing tasks such as machine translation is another promising avenue that blends graph computation with NLP.

In conclusion, GaAN represents an important advancement in graph learning, providing a robust and flexible framework suitable for various complex tasks involving large and spatiotemporal graph data. Its development underscores the ongoing evolution of attention mechanisms in deep learning, paving the way for more intricate and application-specific models.