Local Graph Clustering with Noisy Labels (2310.08031v2)

Published 12 Oct 2023 in cs.LG, cs.SI, and stat.ML

Abstract: The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet little effort has gone into developing fast local methods (i.e., methods that do not access the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of a graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.
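The abstract describes the pipeline only at a high level (noisy binary labels, a label-weighted graph, local diffusion from one seed). The sketch below illustrates that pipeline under explicit assumptions and is not the authors' method. Labels are flipped independently with probability p. The reweighting rule (weight 1 when both endpoints carry label 1, a small `eps` otherwise) is hypothetical, because the abstract does not give the actual weighting. For the diffusion, it swaps in a standard local method: approximate lazy personalized PageRank computed by the push procedure, followed by a conductance sweep cut. The paper's own diffusion may differ. All function names (`flip_labels`, `label_weighted_graph`, `appr_push`, `sweep_cut`) are illustrative.

```python
import random
from collections import defaultdict


def flip_labels(labels, p, seed=0):
    """Flip each binary label independently with probability p (assumed noise model)."""
    rng = random.Random(seed)
    return {u: 1 - y if rng.random() < p else y for u, y in labels.items()}


def label_weighted_graph(edges, labels, eps=0.05):
    """HYPOTHETICAL reweighting: weight 1 if both endpoints are labeled 1, else eps."""
    adj = defaultdict(dict)
    for u, v in edges:
        w = 1.0 if labels[u] == 1 and labels[v] == 1 else eps
        adj[u][v] = w
        adj[v][u] = w
    return adj


def appr_push(adj, seed, alpha=0.15, tol=1e-6):
    """Approximate lazy personalized PageRank via push (stand-in diffusion).

    Only nodes near the seed are touched. The seed must have at least one edge.
    """
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    p, r = defaultdict(float), defaultdict(float)
    r[seed] = 1.0
    stack = [seed]
    while stack:
        u = stack.pop()
        if r[u] < tol * deg[u]:
            continue  # stale entry: residual already below the push threshold
        ru = r[u]
        p[u] += alpha * ru
        r[u] = (1 - alpha) * ru / 2  # lazy half of the remaining mass stays at u
        for v, w in adj[u].items():
            old = r[v]
            r[v] += (1 - alpha) * ru * w / (2 * deg[u])
            if old < tol * deg[v] <= r[v]:
                stack.append(v)  # v just crossed the push threshold
        if r[u] >= tol * deg[u]:
            stack.append(u)
    return dict(p), deg


def sweep_cut(adj, p, deg):
    """Return the prefix set (ordered by p/deg) with the lowest weighted conductance."""
    total_vol = sum(deg.values())
    order = sorted(p, key=lambda u: p[u] / deg[u], reverse=True)
    in_set, vol, cut = set(), 0.0, 0.0
    best, best_phi = set(), float("inf")
    for u in order:
        w_in = sum(w for v, w in adj[u].items() if v in in_set)
        in_set.add(u)
        vol += deg[u]
        cut += deg[u] - 2 * w_in  # edges to the set become internal
        if len(in_set) == len(deg):
            break  # the full vertex set is not a valid cut
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best, best_phi = set(in_set), phi
    return best, best_phi


# Toy graph: two 5-cliques joined by one bridge edge (4, 5); target cluster = {0..4}.
A, B = range(5), range(5, 10)
edges = [(u, v) for grp in (A, B) for u in grp for v in grp if u < v] + [(4, 5)]
labels = {u: int(u < 5) for u in range(10)}
adj = label_weighted_graph(edges, flip_labels(labels, p=0.0), eps=0.05)
p, deg = appr_push(adj, seed=0)
cluster, phi = sweep_cut(adj, p, deg)
print(sorted(cluster))  # [0, 1, 2, 3, 4]
```

Pushing only where the residual exceeds `tol * deg(u)` keeps the work near the seed, which is the "without accessing the entire graph" property the abstract emphasizes. With `p > 0`, some within-cluster edges are downweighted by flipped labels. That noisy regime is the one the paper's theory addresses.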
