Unsupervised Learning via Network-Aware Embeddings (2309.10408v1)

Published 19 Sep 2023 in cs.LG, cs.AI, cs.SI, and physics.data-an

Abstract: Data clustering, the task of grouping observations according to their similarity, is a key component of unsupervised learning, with real-world applications in diverse fields such as biology, medicine, and social science. Often in these fields the data comes with complex interdependencies between the dimensions of analysis; for instance, the various characteristics and opinions people can have live on a complex social network. Current clustering methods are ill-suited to tackle this complexity: deep learning can approximate these dependencies, but cannot take their explicit map as the input of the analysis. In this paper, we aim to fix this blind spot in the unsupervised learning literature. We create network-aware embeddings by estimating the network distance between numeric node attributes via the generalized Euclidean distance. Unlike all methods in the literature that we know of, we do not cluster the nodes of the network, but rather its node attributes. In our experiments we show that having these network embeddings is always beneficial for the learning task; that our method scales to large networks; and that we can provide actionable insights in applications in a variety of fields such as marketing, economics, and political science. Our method is fully open source, and data and code are available to reproduce all results in the paper.
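
To make the central idea of the abstract concrete, the following is a minimal sketch of a generalized Euclidean distance between node attributes, not the authors' released code. It assumes the commonly used form sqrt((a - b)^T L^+ (a - b)), where L^+ is the pseudoinverse of the graph Laplacian; the function names, the toy path graph, and the three indicator attributes are illustrative assumptions. The dense pseudoinverse is only practical for small graphs and says nothing about how the paper achieves scalability.

```python
import numpy as np


def laplacian_pinv(adj):
    """Pseudoinverse of the combinatorial Laplacian L = D - A (dense, small graphs only)."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.pinv(lap)


def ge_distance_matrix(adj, attrs):
    """Pairwise generalized Euclidean distances between the COLUMNS of attrs.

    attrs has shape (n_nodes, n_attributes): each column is one numeric
    node attribute, i.e. one value per node of the network.
    """
    lp = laplacian_pinv(adj)
    attrs = np.asarray(attrs, dtype=float)
    k = attrs.shape[1]
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            diff = attrs[:, i] - attrs[:, j]
            # Quadratic form with L^+; clip tiny negative values from round-off.
            d2 = diff @ lp @ diff
            dist[i, j] = dist[j, i] = np.sqrt(max(d2, 0.0))
    return dist


# Hypothetical toy data: a path graph 0 - 1 - 2 - 3.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Three attributes (columns), each concentrated on a single node:
# attribute 0 on node 0, attribute 1 on node 1, attribute 2 on node 3.
attrs = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
])

dist = ge_distance_matrix(adj, attrs)
print(np.round(dist, 3))
```

In this toy case the attributes sitting on adjacent nodes come out closer than the attributes sitting at opposite ends of the path, which is the network awareness the abstract refers to. The resulting attribute-by-attribute distance matrix could then be passed to any clustering routine that accepts precomputed distances, so that the attributes, rather than the nodes, are what gets grouped.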

