ClusterNet: A Perception-Based Clustering Model for Scattered Data (2304.14185v3)

Published 27 Apr 2023 in cs.LG and cs.HC

Abstract: Visualizations of scattered data help users understand attributes of their data through tasks such as correlation estimation, outlier detection, and cluster separation. In this paper, we focus on the latter task and develop a technique aligned with human perception, which can be used to understand how humans perceive clusterings in scattered data and, possibly, to optimize for better understanding. Cluster separation in scatterplots is typically tackled with widely used clustering techniques such as k-means or DBSCAN. However, because these algorithms rely on non-perceptual metrics, our experiments show that their output does not reflect human cluster perception. We therefore propose a learning strategy that operates directly on scattered data. To learn perceptual cluster separation on such data, we crowdsourced a large-scale dataset of 7,320 point-wise cluster affiliations for bivariate data, labeled by 384 human crowd workers. Using this data, we trained ClusterNet, a point-based deep learning model that reflects human perception of cluster separability. To train ClusterNet on human-annotated data, we use a PointNet++ architecture, which performs inference directly on point clouds. We describe how we collected the dataset, report statistics of the resulting annotations, and investigate perceptual agreement on cluster separation for real-world data. We further report the training and evaluation protocol of ClusterNet and introduce a novel metric that measures the accuracy between a clustering technique and a group of human annotators. Finally, we compare our approach against existing state-of-the-art clustering techniques and show that ClusterNet generalizes to unseen and out-of-scope data.
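ClusterNet builds on PointNet++ so that it can consume a scatterplot's points directly rather than a rendered image. As a rough illustration of how one PointNet++ set-abstraction level processes a 2D point set, the NumPy sketch below implements farthest point sampling, ball-query grouping, and a shared per-point layer followed by max pooling. It is not ClusterNet: the function names, point count, radius, neighbor cap, feature width, and the random untrained weights are assumptions for illustration. Full implementations typically pad each group to a fixed size and stack several levels of multi-layer perceptrons; this sketch keeps variable-length groups and a single linear layer.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, n_samples: int, start: int = 0) -> np.ndarray:
    """Pick n_samples well-spread centroid indices by iterative farthest point sampling."""
    selected = np.empty(n_samples, dtype=int)
    selected[0] = start
    # Distance from every point to its nearest centroid chosen so far.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, n_samples):
        # The next centroid is the point farthest from all current centroids.
        selected[i] = int(np.argmax(min_dist))
        new_dist = np.linalg.norm(points - points[selected[i]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected


def ball_query(points: np.ndarray, centroid_idx: np.ndarray, radius: float, max_neighbors: int) -> list:
    """For each centroid, collect the indices of up to max_neighbors points within radius."""
    groups = []
    for c in centroid_idx:
        dist = np.linalg.norm(points - points[c], axis=1)
        groups.append(np.flatnonzero(dist <= radius)[:max_neighbors])
    return groups


def set_abstraction(points, n_centroids, radius, max_neighbors, weights):
    """One simplified set-abstraction level: sample, group, shared layer, max pooling."""
    idx = farthest_point_sampling(points, n_centroids)
    groups = ball_query(points, idx, radius, max_neighbors)
    features = []
    for c, g in zip(idx, groups):
        local = points[g] - points[c]               # neighbor coordinates relative to the centroid
        hidden = np.maximum(local @ weights, 0.0)   # same linear layer + ReLU applied to every point
        features.append(hidden.max(axis=0))         # order-invariant max pooling over the group
    return points[idx], np.stack(features)


rng = np.random.default_rng(0)
scatter = rng.random((200, 2))        # synthetic bivariate scatterplot in [0, 1]^2
weights = rng.normal(size=(2, 8))     # untrained weights, for illustration only
centroids, feats = set_abstraction(scatter, n_centroids=16, radius=0.15, max_neighbors=32, weights=weights)
print(centroids.shape, feats.shape)   # (16, 2) (16, 8)
```

Max pooling makes each centroid's feature independent of the order in which its neighbors are listed, which is what lets this family of models take raw, unordered point sets as input.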

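The abstract introduces a metric for the accuracy between a clustering technique and a group of human annotators but does not define it. Purely as an assumed stand-in, the sketch below scores a candidate labeling by its mean Rand index against each annotator (the Rand index ignores arbitrary cluster IDs) and compares that score with a leave-one-out baseline of how well each annotator agrees with the others. The function names and toy labels are hypothetical, and this does not reproduce the paper's metric.

```python
from itertools import combinations
from statistics import mean


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same cluster vs. different)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("labelings must cover the same two or more points")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def agreement_with_group(candidate, annotations):
    """Mean Rand index between one labeling and every annotator's labeling."""
    return mean(rand_index(candidate, h) for h in annotations)


def human_baseline(annotations):
    """Leave-one-out agreement: each annotator scored against all remaining annotators."""
    return mean(
        agreement_with_group(h, annotations[:k] + annotations[k + 1:])
        for k, h in enumerate(annotations)
    )


# Toy example: six points, three annotators; labels are arbitrary cluster IDs.
annotations = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],   # same grouping as the first annotator, IDs swapped
    [0, 0, 1, 1, 1, 1],   # third annotator assigns point 2 to the second cluster
]
candidate = [5, 5, 5, 7, 7, 7]
print(round(agreement_with_group(candidate, annotations), 3))  # 0.889
print(round(human_baseline(annotations), 3))                   # 0.778
```

In this toy case the candidate matches two of the three annotators exactly, so it scores above the leave-one-out human baseline; comparing a method's score with such a baseline is one possible way to read "agreement with a group of annotators".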