
Todyformer: Towards Holistic Dynamic Graph Transformers with Structure-Aware Tokenization (2402.05944v1)

Published 2 Feb 2024 in cs.LG

Abstract: Temporal Graph Neural Networks have garnered substantial attention for their capacity to model evolving structural and temporal patterns while exhibiting impressive performance. However, it is known that these architectures are encumbered by issues that constrain their performance, such as over-squashing and over-smoothing. Meanwhile, Transformers have demonstrated exceptional computational capacity to effectively address challenges related to long-range dependencies. Consequently, we introduce Todyformer, a novel Transformer-based neural network tailored for dynamic graphs. It unifies the local encoding capacity of Message-Passing Neural Networks (MPNNs) with the global encoding of Transformers through i) a novel patchifying paradigm for dynamic graphs to improve over-squashing, ii) a structure-aware parametric tokenization strategy leveraging MPNNs, iii) a Transformer with temporal positional encoding to capture long-range dependencies, and iv) an encoding architecture that alternates between local and global contextualization, mitigating over-smoothing in MPNNs. Experimental evaluations on public benchmark datasets demonstrate that Todyformer consistently outperforms state-of-the-art methods on downstream tasks. Furthermore, we illustrate how the proposed model effectively captures extensive temporal dependencies in dynamic graphs.
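The abstract describes an encoder that alternates local contextualization (an MPNN run inside each temporal patch) with global contextualization (a Transformer over node tokens carrying a temporal positional encoding). The sketch below is a minimal, hypothetical illustration of that alternating pattern, not the paper's released implementation; the class names, the mean-aggregation MPNN, and the linear "time projection" stand-in for the temporal encoding are all assumptions made for clarity.

```python
# Hypothetical sketch of alternating local (MPNN-per-patch) and global
# (Transformer over node tokens) encoding, loosely following the abstract.
import torch
import torch.nn as nn


class SimpleMPNN(nn.Module):
    """One message-passing layer: mean-aggregate neighbor features, then an MLP."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, edge_index):
        src, dst = edge_index                        # edges point src -> dst
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, x[src])               # sum incoming messages
        deg = torch.zeros(x.size(0), device=x.device)
        deg.index_add_(0, dst, torch.ones(dst.size(0), device=x.device))
        agg = agg / deg.clamp(min=1).unsqueeze(-1)   # mean aggregation
        return self.mlp(torch.cat([x, agg], dim=-1))


class AlternatingEncoder(nn.Module):
    """Alternate local MPNN encoding within each patch with a global
    Transformer pass over all node tokens plus a temporal encoding."""
    def __init__(self, dim, num_blocks=2, num_heads=4):
        super().__init__()
        self.local_layers = nn.ModuleList([SimpleMPNN(dim) for _ in range(num_blocks)])
        self.global_layers = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, num_heads, dim_feedforward=2 * dim,
                                       batch_first=True)
            for _ in range(num_blocks)
        ])
        self.time_proj = nn.Linear(1, dim)           # stand-in temporal positional encoding

    def forward(self, x, patch_edge_indices, node_times):
        # x: [num_nodes, dim]; node_times: [num_nodes] event timestamps
        t = self.time_proj(node_times.unsqueeze(-1))
        for local, glob in zip(self.local_layers, self.global_layers):
            # 1) local contextualization: run the MPNN inside each patch
            for edge_index in patch_edge_indices:
                x = local(x, edge_index)
            # 2) global contextualization: attend across all node tokens
            x = glob((x + t).unsqueeze(0)).squeeze(0)
        return x


if __name__ == "__main__":
    num_nodes, dim = 8, 32
    x = torch.randn(num_nodes, dim)
    times = torch.arange(num_nodes, dtype=torch.float)
    # two toy "patches", each given as a (src, dst) edge list
    patches = [torch.tensor([[0, 1, 2], [1, 2, 3]]),
               torch.tensor([[4, 5, 6], [5, 6, 7]])]
    out = AlternatingEncoder(dim)(x, patches, times)
    print(out.shape)  # torch.Size([8, 32])
```

Interleaving the two stages, rather than stacking many MPNN layers, is what the abstract credits with limiting over-smoothing while still letting the Transformer capture long-range temporal dependencies.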

Authors (4)
  1. Mahdi Biparva (16 papers)
  2. Raika Karimi (6 papers)
  3. Faezeh Faez (6 papers)
  4. Yingxue Zhang (72 papers)
Citations (2)