WizMap: Scalable Interactive Visualization for Exploring Large Machine Learning Embeddings (2306.09328v1)
Abstract: Machine learning models often learn latent embedding representations that capture the domain semantics of their training data. These embedding representations are valuable for interpreting trained models, building new models, and analyzing new datasets. However, interpreting and using embeddings can be challenging due to their opaqueness, high dimensionality, and the large size of modern datasets. To tackle these challenges, we present WizMap, an interactive visualization tool to help researchers and practitioners easily explore large embeddings. With a novel multi-resolution embedding summarization method and a familiar map-like interaction design, WizMap enables users to navigate and interpret embedding spaces with ease. Leveraging modern web technologies such as WebGL and Web Workers, WizMap scales to millions of embedding points directly in users' web browsers and computational notebooks without the need for dedicated backend servers. WizMap is open-source and available at the following public demo link: https://poloclub.github.io/wizmap.