Invariant Graph Transformer (2312.07859v2)

Published 13 Dec 2023 in cs.LG and cs.SI

Abstract: Rationale discovery is defined as finding a subset of the input data that maximally supports the prediction of downstream tasks. In the graph machine learning context, the graph rationale is defined as the critical subgraph of the given graph topology that fundamentally determines the prediction result. In contrast to the rationale subgraph, the remaining subgraph is called the environment subgraph. Graph rationalization can enhance model performance because the mapping between the graph rationale and the prediction label is, by assumption, invariant. To ensure the discriminative power of the extracted rationale subgraphs, a key technique named "intervention" is applied. The core idea of intervention is that, given any changing environment subgraph, the semantics of the rationale subgraph remain invariant, which guarantees the correct prediction result. However, most, if not all, existing rationalization works on graph data develop their intervention strategies at the graph level, which is coarse-grained. In this paper, we propose well-tailored intervention strategies for graph data. Our idea is driven by the development of Transformer models, whose self-attention module provides rich interactions between input nodes. Based on the self-attention module, our proposed Invariant Graph Transformer (IGT) achieves fine-grained intervention, specifically at the node level and virtual node level. Our comprehensive experiments involve 7 real-world datasets, and the proposed IGT shows significant performance advantages over 13 baseline methods.
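
To make the intervention idea concrete, one common way to formalize the invariance assumption (a generic formulation, not necessarily the exact objective used in the paper) is a task loss plus a penalty on how much the prediction drifts when the environment subgraph is swapped:

$$
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{task}}\big(f(R \cup E),\, y\big) \;+\; \lambda\, \mathbb{E}_{E' \sim \mathcal{E}}\Big[\, d\big(f(R \cup E'),\, f(R \cup E)\big) \Big]
$$

where $R$ is the rationale subgraph, $E$ its original environment subgraph, $E'$ an environment borrowed from another graph, $f$ the predictor, $d$ a divergence between predictions, and $\lambda$ a trade-off weight. Driving the second term toward zero is what "invariance under intervention" means operationally.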

Summary

  • The paper introduces IGT, a novel approach for fine-grained graph rationalization using node- and virtual node-level interventions through Transformer self-attention.
  • It employs an encoder, augmenter, intervener, and predictor in a synergistic architecture to identify informative subgraphs while ensuring robustness across varying environments.
  • Experiments on 7 real-world datasets show that IGT consistently outperforms or matches 13 baseline methods, demonstrating enhanced predictive accuracy and interpretability.

Introduction to Invariant Graph Transformer

Graphs are an immensely useful data structure, widely used to model relationships and interactions in fields such as chemistry, social networks, and biology. A critical task in graph machine learning is to identify the substructures within a graph, termed "graph rationales", that are most informative for a given prediction task. Graph rationales can enhance model performance and improve explainability by capturing the most relevant features within a complex network.

Methodology

The paper introduces the Invariant Graph Transformer (IGT), a novel architecture aimed at fine-grained graph rationalization. Unlike existing methods that intervene at the graph level, IGT operates at the more precise node level or virtual node level, leveraging the self-attention mechanism of Transformer models. IGT consists of four modules, an encoder, an augmenter, an intervener, and a predictor, that work together to discover and exploit the pivotal rationale subgraph while keeping its predictions robust under varying environment subgraphs; a minimal sketch of how such a pipeline could be wired together follows.
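
The sketch below shows one way the four modules could fit together in PyTorch. The class and variable names, the soft gating used by the augmenter, the mean readout, and all tensor shapes are illustrative assumptions made for this example; it is not the authors' implementation.

```python
# Minimal PyTorch sketch of an encoder/augmenter/intervener/predictor pipeline
# in the spirit of IGT. All design details here are illustrative assumptions.
import torch
import torch.nn as nn

class IGTSketch(nn.Module):
    def __init__(self, in_dim, hid_dim, num_classes, num_heads=4):
        super().__init__()
        # Encoder: a linear layer stands in for the GNN/Transformer node encoder.
        self.encoder = nn.Linear(in_dim, hid_dim)
        # Augmenter: scores each node, softly splitting the graph into
        # rationale and environment parts.
        self.augmenter = nn.Sequential(nn.Linear(hid_dim, 1), nn.Sigmoid())
        # Intervener: self-attention over rationale plus (possibly swapped-in)
        # environment nodes, i.e. fine-grained, node-level intervention.
        self.intervener = nn.MultiheadAttention(hid_dim, num_heads, batch_first=True)
        # Predictor: graph-level readout followed by a classifier.
        self.predictor = nn.Linear(hid_dim, num_classes)

    def forward(self, x, env_x=None):
        # x: (batch, num_nodes, in_dim) node features of the input graphs
        h = self.encoder(x)                       # (B, N, H)
        score = self.augmenter(h)                 # (B, N, 1) rationale scores
        rationale = score * h                     # soft rationale embeddings
        environment = (1.0 - score) * h           # soft environment embeddings
        if env_x is not None:
            # Intervention: borrow the environment from another graph.
            env_h = self.encoder(env_x)
            environment = (1.0 - self.augmenter(env_h)) * env_h
        tokens = torch.cat([rationale, environment], dim=1)   # (B, 2N, H)
        mixed, _ = self.intervener(tokens, tokens, tokens)    # node-level interaction
        graph_repr = mixed.mean(dim=1)                        # simple mean readout
        return self.predictor(graph_repr)

# Usage: predictions with the original environment vs. a swapped environment
# should stay close if the rationale truly carries the label information.
model = IGTSketch(in_dim=16, hid_dim=32, num_classes=2)
g = torch.randn(8, 20, 16)                 # a batch of 8 graphs, 20 nodes each
g_other = g[torch.randperm(8)]             # environments borrowed from other graphs
y_orig, y_intervened = model(g), model(g, env_x=g_other)
```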

Experimental Results

IGT was rigorously evaluated on 7 real-world datasets against 13 baseline methods. The experiments demonstrate that both the node-level (IGT-N) and virtual node-level (IGT-VN) variants consistently outperform or match the competing methods, indicating that fine-grained intervention, combined with invariant learning, is highly effective for graph rationalization; the sketch below illustrates what distinguishes the two variants.
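
As a rough illustration of the distinction between the two variants, the snippet below contrasts node-level intervention tokens with an environment summarized into a handful of virtual nodes. The soft-assignment pooling and all shapes are hypothetical choices for this example and may differ from the paper's exact construction.

```python
# IGT-N style: every environment node participates in the intervention.
# IGT-VN style: the environment is first compressed into a few virtual nodes.
import torch
import torch.nn as nn

def node_level_tokens(rationale, environment):
    # Node-level variant: concatenate all rationale and environment nodes.
    return torch.cat([rationale, environment], dim=1)           # (B, N_r + N_e, H)

class VirtualNodePool(nn.Module):
    def __init__(self, hid_dim, num_virtual=4):
        super().__init__()
        self.assign = nn.Linear(hid_dim, num_virtual)            # soft assignment

    def forward(self, environment):
        # Summarize environment nodes into num_virtual virtual nodes.
        weights = self.assign(environment).softmax(dim=1)        # (B, N_e, V)
        return weights.transpose(1, 2) @ environment             # (B, V, H)

rationale = torch.randn(8, 12, 32)
environment = torch.randn(8, 20, 32)
tokens_n = node_level_tokens(rationale, environment)                          # (8, 32, 32)
tokens_vn = torch.cat([rationale, VirtualNodePool(32)(environment)], dim=1)   # (8, 16, 32)
```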

Conclusion

The research offers a new perspective on the graph rationale discovery problem, proposing a Transformer-inspired model that intervenes at a granular level. IGT not only identifies crucial subgraphs more effectively than coarse-grained approaches but also preserves their utility under varying conditions, resulting in strong performance. The paper's findings lay the groundwork for future research on optimizing graph learning models for both predictive accuracy and interpretability.
