High-Frequency-aware Hierarchical Contrastive Selective Coding for Representation Learning on Text-attributed Graphs (2402.16240v2)

Published 26 Feb 2024 in cs.IR and cs.SI

Abstract: We investigate node representation learning on text-attributed graphs (TAGs), where nodes are associated with text information. Although recent studies on graph neural networks (GNNs) and pretrained language models (PLMs) have exhibited their power in encoding network and text signals, respectively, less attention has been paid to carefully coupling these two types of models on TAGs. Specifically, existing GNNs rarely model the text in each node in a contextualized way, and existing PLMs can hardly be applied to characterize graph structures due to their sequential architecture. To address these challenges, we propose HASH-CODE, a High-frequency Aware Spectral Hierarchical Contrastive Selective Coding method that integrates GNNs and PLMs into a unified model. Unlike previous "cascaded architectures" that directly stack GNN layers on top of a PLM, HASH-CODE relies on five self-supervised optimization objectives to facilitate thorough mutual enhancement between network and text signals at diverse granularities. Moreover, we show that the existing contrastive objective learns the low-frequency component of the augmentation graph, and we propose a high-frequency component (HFC)-aware contrastive learning objective that makes the learned embeddings more distinctive. Extensive experiments on six real-world benchmarks substantiate the efficacy of the proposed approach. In addition, theoretical analysis and item embedding visualization provide insights into our model's interpretability.
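The abstract does not give the loss formulas, but the contrastive objectives it discusses are typically variants of InfoNCE applied to paired views of the same node (here, a GNN view and a PLM view). As a rough, illustrative sketch only — all function and variable names below are hypothetical and not from the paper — a minimal InfoNCE loss over aligned graph/text embeddings looks like this:

```python
import numpy as np

def info_nce(z_graph, z_text, temperature=0.1):
    """Minimal InfoNCE contrastive loss between two views of the same nodes.

    z_graph, z_text: (n, d) arrays of GNN and PLM embeddings for n nodes.
    Row i of each matrix is a positive pair; all other rows act as negatives.
    """
    # L2-normalize so the dot products below are cosine similarities.
    zg = z_graph / np.linalg.norm(z_graph, axis=1, keepdims=True)
    zt = z_text / np.linalg.norm(z_text, axis=1, keepdims=True)
    logits = zg @ zt.T / temperature  # (n, n) similarity matrix
    # Cross-entropy with the diagonal (matching pairs) as targets.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

The paper's HFC-aware objective modifies this kind of loss so that the learned embeddings also capture the high-frequency component of the augmentation graph, rather than only its low-frequency (smooth) component; the exact formulation is in the paper itself.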

