A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer (2402.02464v3)
Abstract: Can we model Non-Euclidean graphs as pure language, or even as Euclidean vectors, while retaining their inherent information? The Non-Euclidean property has posed a long-standing challenge in graph modeling. Despite recent efforts by graph neural networks and graph transformers to encode graphs as Euclidean vectors, recovering the original graph from those vectors remains a challenge. In this paper, we introduce GraphsGPT, featuring a Graph2Seq encoder that transforms Non-Euclidean graphs into learnable Graph Words in the Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from Graph Words to ensure information equivalence. We pretrain GraphsGPT on $100$M molecules and report several interesting findings: (1) The pretrained Graph2Seq excels at graph representation learning, achieving state-of-the-art results on $8/9$ graph classification and regression tasks. (2) The pretrained GraphGPT serves as a strong graph generator, demonstrated by its ability to perform both few-shot and conditional graph generation. (3) Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known Non-Euclidean challenges. (4) The edge-centric pretraining framework GraphsGPT demonstrates its efficacy on graph-domain tasks, excelling in both representation and generation. Code is available at \href{https://github.com/A4Bio/GraphsGPT}{GitHub}.
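Finding (3) lends itself to a short illustration. Below is a minimal sketch of how graph mixup could look in the Euclidean Graph Word space, assuming hypothetical `graph2seq` and `graphgpt` callables as stand-ins for the pretrained encoder and decoder; the actual interfaces live in the linked repository.

```python
import torch

def graph_mixup(graph_a, graph_b, graph2seq, graphgpt, lam=0.5):
    """Mix two graphs by interpolating their Euclidean Graph Words.

    `graph2seq` and `graphgpt` are hypothetical stand-ins for the pretrained
    GraphsGPT encoder/decoder; see the repository for the real interfaces.
    """
    words_a = graph2seq(graph_a)  # [K, d]: K learnable Euclidean vectors
    words_b = graph2seq(graph_b)  # [K, d]
    # Standard mixup: a convex combination, now well-defined because each
    # graph lives in a flat Euclidean space rather than a discrete one.
    mixed_words = lam * words_a + (1.0 - lam) * words_b
    return graphgpt(mixed_words)  # decode the mixed words back into a graph

# Toy usage with identity stand-ins, purely to show the data flow:
if __name__ == "__main__":
    enc = lambda g: g  # pretend inputs are already [K, d] Graph Word tensors
    dec = lambda w: w
    g_a, g_b = torch.randn(8, 64), torch.randn(8, 64)
    print(graph_mixup(g_a, g_b, enc, dec, lam=0.3).shape)  # torch.Size([8, 64])
```

The point of the sketch is the interpolation line: once a graph is encoded as a fixed set of $K$ vectors, convex combination, and by extension other vector-space augmentations, becomes trivially well-defined.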