
ENS-t-SNE: Embedding Neighborhoods Simultaneously t-SNE (2205.11720v3)

Published 24 May 2022 in cs.LG, cs.DS, and cs.HC

Abstract: When visualizing a high-dimensional dataset, dimension reduction techniques are commonly employed which provide a single 2-dimensional view of the data. We describe ENS-t-SNE: an algorithm for Embedding Neighborhoods Simultaneously that generalizes the t-Stochastic Neighborhood Embedding approach. By using different viewpoints in ENS-t-SNE's 3D embedding, one can visualize different types of clusters within the same high-dimensional dataset. This enables the viewer to see and keep track of the different types of clusters, which is harder to do when providing multiple 2D embeddings, where corresponding points cannot be easily identified. We illustrate the utility of ENS-t-SNE with real-world applications and provide an extensive quantitative evaluation with datasets of different types and sizes.
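The viewpoint idea in the abstract can be illustrated with a small sketch. It does not implement the ENS-t-SNE optimization, which the abstract does not specify. It only assumes that a 3D embedding already exists and shows how two viewing directions yield two different 2D pictures of the same points. The point coordinates, the labels `label_a` and `label_b`, and the helper `project` are all hypothetical and chosen by hand for illustration.

```python
import math

def project(points, direction):
    """Orthogonally project 3D points onto the plane perpendicular to
    `direction`; return 2D coordinates in an orthonormal basis (u, v)."""
    norm = math.sqrt(sum(c * c for c in direction))
    d = tuple(c / norm for c in direction)
    # Pick a helper axis that is not (nearly) parallel to the view direction.
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    # u: the helper axis with its component along d removed, then normalized.
    dot = sum(h * c for h, c in zip(helper, d))
    u = tuple(h - dot * c for h, c in zip(helper, d))
    u_norm = math.sqrt(sum(c * c for c in u))
    u = tuple(c / u_norm for c in u)
    # v: cross product d x u, which completes the basis of the view plane.
    v = (d[1] * u[2] - d[2] * u[1],
         d[2] * u[0] - d[0] * u[2],
         d[0] * u[1] - d[1] * u[0])
    return [(sum(p * a for p, a in zip(pt, u)),
             sum(p * b for p, b in zip(pt, v))) for pt in points]

# Hand-made 3D "embedding" (an assumption, not output of ENS-t-SNE):
# label_a is encoded along the x-axis, label_b along the z-axis.
points = [(-1.0, 0.1, -1.0), (-1.0, -0.1, 1.0),
          (1.0, 0.1, -1.0), (1.0, -0.1, 1.0)]
label_a = [0, 0, 1, 1]
label_b = [0, 1, 0, 1]

view_along_z = project(points, (0.0, 0.0, 1.0))  # shows the label_a grouping
view_along_x = project(points, (1.0, 0.0, 0.0))  # shows the label_b grouping

for name, view in (("along z", view_along_z), ("along x", view_along_x)):
    print(name, view)
```

In the first view the points separate by `label_a` along the horizontal axis, while in the second view the same points separate by `label_b` along the vertical axis. Because both views come from one set of 3D points, each point keeps its identity when the viewpoint changes, which is the property the abstract contrasts with multiple independent 2D embeddings.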


