ClusterNet: A Perception-Based Clustering Model for Scattered Data (2304.14185v3)
Abstract: Visualizations of scattered data help users understand attributes of their data through tasks such as correlation estimation, outlier detection, and cluster separation. In this paper, we focus on the last of these tasks and develop a technique aligned with human perception that can be used to understand how human subjects perceive clusterings in scattered data, and possibly to optimize for better understanding. Cluster separation in scatterplots is typically tackled with widely used clustering techniques such as k-means or DBSCAN. However, because these algorithms are based on non-perceptual metrics, our experiments show that their output does not reflect human cluster perception. We propose a learning strategy that operates directly on scattered data. To learn perceptual cluster separation on this data, we crowdsourced a large-scale dataset consisting of 7,320 point-wise cluster affiliations for bivariate data, labeled by 384 human crowd workers. Based on this data, we trained ClusterNet, a point-based deep learning model that reflects human perception of cluster separability. To train ClusterNet on human-annotated data, we use a PointNet++ architecture, which enables inference directly on point clouds. In this work, we describe how we collected our dataset, report statistics of the resulting annotations, and investigate perceptual agreement of cluster separation for real-world data. We further report the training and evaluation protocol of ClusterNet and introduce a novel metric that measures the accuracy between a clustering technique and a group of human annotators. Finally, we compare our approach against existing state-of-the-art clustering techniques and show that ClusterNet is able to generalize to unseen and out-of-scope data.
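The abstract does not define the proposed metric, so the following is only a minimal sketch of the general idea of scoring one clustering against several human annotators. It uses the plain Rand index, averaged over annotators, as a stand-in; this is an assumption for illustration and not the metric introduced in the paper. The function names and the toy labels are hypothetical.

```python
from itertools import combinations
from statistics import mean


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree.

    A pair counts as agreement when both labelings put the two points in the
    same cluster, or both put them in different clusters. Label ids themselves
    do not need to match.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_agreement_with_annotators(algorithm_labels, annotator_labelings):
    """Average Rand index between one algorithm output and each human labeling.

    Stand-in for an algorithm-vs-group agreement score (assumption, not the
    paper's metric).
    """
    return mean(rand_index(algorithm_labels, human) for human in annotator_labelings)


# Hypothetical toy data: six points, three annotators, one algorithm output.
annotators = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],  # this annotator assigns the third point differently
]
algorithm = [5, 5, 5, 7, 7, 7]  # different label ids, same grouping as annotators 1 and 2
score = mean_agreement_with_annotators(algorithm, annotators)
```

Because the score is computed over point pairs, it is invariant to how cluster ids are numbered, which matters when comparing an algorithm's output with free-form human labels.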