Bypassing Skip-Gram Negative Sampling: Dimension Regularization as a More Efficient Alternative for Graph Embeddings (2405.00172v2)

Published 30 Apr 2024 in cs.LG, cs.SI, and stat.ML

Abstract: A wide range of graph embedding objectives decompose into two components: one that enforces similarity, attracting the embeddings of nodes that are perceived as similar, and another that enforces dissimilarity, repelling the embeddings of nodes that are perceived as dissimilar. Without repulsion, the embeddings would collapse into trivial solutions. Skip-Gram Negative Sampling (SGNS) is a popular and efficient repulsion approach that prevents collapse by repelling each node from a sample of dissimilar nodes. In this work, we show that when repulsion is most needed and the embeddings approach collapse, SGNS node-wise repulsion is, in the aggregate, an approximate re-centering of the node embedding dimensions. Such dimension operations are more scalable than node operations and produce a simpler geometric interpretation of the repulsion. Our theoretical result establishes dimension regularization as an effective and more efficient alternative to skip-gram node contrast for enforcing dissimilarity among node embeddings. We use this result to propose a flexible algorithm-augmentation framework that improves the scalability of any existing algorithm that uses SGNS. The framework prioritizes node attraction and replaces SGNS with dimension regularization. We instantiate this generic framework for LINE and node2vec and show that the augmented algorithms preserve downstream link-prediction performance while reducing GPU memory usage by up to 33.3% and training time by 23.4%. Moreover, we show that completely removing repulsion (a special case of our augmentation framework) in LINE reduces training time by 70.9% on average while increasing link-prediction performance, especially for graphs that are globally sparse but locally dense. In general, however, repulsion is needed, and dimension regularization provides an efficient alternative to SGNS.
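
The abstract's central idea can be sketched in a few lines: if aggregate SGNS repulsion near collapse acts like a re-centering of the embedding dimensions, then per-node negative sampling can be swapped for a penalty on the per-dimension means of the embedding matrix. The sketch below is illustrative only, not the authors' exact objective or implementation; the dot-product attraction term, the squared-mean penalty, the weight lam, and the toy random graph are all assumptions made for the example.

```python
# Minimal sketch (assumed form, not the paper's exact loss): replace SGNS
# negative sampling with a dimension-regularization term that pushes the
# column (dimension) means of the embedding matrix toward zero.
import torch

def attraction_loss(emb: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    # Pull connected nodes together via a dot-product similarity with a
    # log-sigmoid link likelihood (LINE/node2vec-style attraction).
    src, dst = edges[:, 0], edges[:, 1]
    scores = (emb[src] * emb[dst]).sum(dim=-1)
    return -torch.nn.functional.logsigmoid(scores).mean()

def dimension_regularizer(emb: torch.Tensor) -> torch.Tensor:
    # Penalize non-zero per-dimension means: a single O(n * d) reduction
    # instead of drawing negative samples for every node.
    return emb.mean(dim=0).pow(2).sum()

# Toy usage on a random graph; sizes and hyperparameters are placeholders.
n_nodes, dim, lam = 1000, 64, 1.0
emb = torch.nn.Parameter(0.01 * torch.randn(n_nodes, dim))
edges = torch.randint(0, n_nodes, (5000, 2))
opt = torch.optim.Adam([emb], lr=0.01)

for step in range(100):
    opt.zero_grad()
    loss = attraction_loss(emb, edges) + lam * dimension_regularizer(emb)
    loss.backward()
    opt.step()
```

Setting lam to zero recovers the attraction-only special case mentioned in the abstract, in which repulsion is removed entirely.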

References (32)
  1. Distributed large-scale natural graph factorization. In WWW’13. ACM, New York, NY, USA, 37–48.
  2. VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. In ICLR’22.
  3. Neural Graph Learning: Training Neural Networks Using Graphs. In WSDM’18. ACM, New York, NY, USA, 64–71.
  4. Attraction-Repulsion Spectrum in Neighbor Embeddings. JMLR 23, 95 (2022), 1–32.
  5. Machine Learning on Graphs: A Model and Comprehensive Taxonomy. JMLR 23, 89 (2022), 1–64.
  6. Daniel L. Sussman, Minh Tang, Donniell E. Fishkind, and Carey E. Priebe. 2012. A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs. J. Amer. Statist. Assoc. 107, 499 (2012), 1119–1128.
  7. Andrew Davison and Morgane Austern. 2023. Asymptotics of Network Embeddings Learned via Subsampling. JMLR 24, 138 (2023), 1–120.
  8. On the duality between contrastive and non-contrastive self-supervised learning. In ICLR’23.
  9. Aditya Grover and Jure Leskovec. 2016. Node2vec: Scalable Feature Learning for Networks. In KDD'16. ACM, New York, NY, USA, 855–864.
  10. Implicit regularization in matrix factorization. In NIPS'17. Curran Associates Inc., Red Hook, NY, USA, 6152–6160.
  11. Similarity Preserving Adversarial Graph Contrastive Learning. In KDD’23. ACM, New York, NY, USA, 867–878.
  12. Understanding Dimensional Collapse in Contrastive Self-supervised Learning. In ICLR’23.
  13. Energy-Based Models. In Predicting Structured Data. The MIT Press.
  14. Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
  15. HomoGCL: Rethinking Homophily in Graph Contrastive Learning. In KDD’23. ACM, New York, NY, USA, 1341–1352.
  16. Distributed Representations of Words and Phrases and their Compositionality. In NIPS’13. Curran Associates, Inc.
  17. David Mimno and Laure Thompson. 2017. The strange geometry of skip-gram with negative sampling. In EMNLP’17. ACL, Copenhagen, Denmark, 2873–2878.
  18. M. E. J. Newman. 2018. Network structure from rich but noisy data. Nature Physics 14, 6 (June 2018), 542–545.
  19. DeepWalk: online learning of social representations. In KDD’14. ACM.
  20. Contrastive Learning with Hard Negative Samples. In ICLR’21.
  21. Exponential Family Embeddings. In NIPS’16. Curran Associates, Inc.
  22. Link Prediction with Non-Contrastive Learning. In ICLR’23.
  23. CARL-G: Clustering-Accelerated Representation Learning on Graphs. In KDD’23. ACM, New York, NY, USA, 2036–2048.
  24. LINE: Large-Scale Information Network Embedding. In WWW'15. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1067–1077.
  25. Tongzhou Wang and Phillip Isola. 2020. Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. In ICML'20.
  26. Jiacheng Xu and Greg Durrett. 2018. Spherical Latent Spaces for Stable Variational Autoencoders. In EMNLP’18. ACL, Brussels, Belgium, 4503–4513.
  27. Revisiting semi-supervised learning with graph embeddings. In ICML’16. JMLR.org, 40–48.
  28. Does Negative Sampling Matter? A Review with Insights into its Theory and Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024), 1–20.
  29. Understanding Negative Sampling in Graph Representation Learning. In KDD’20. ACM, New York, NY, USA, 1666–1676.
  30. BatchSampler: Sampling Mini-Batches for Contrastive Learning in Vision, Language, and Graphs. In KDD'23. ACM, New York, NY, USA, 3057–3069.
  31. Bayesian inference of network structure from unreliable data. Journal of Complex Networks 8, 6 (2021).
  32. Contrastive Cross-scale Graph Knowledge Synergy. In KDD’23. ACM, New York, NY, USA, 3422–3433.
