Lens functions for exploring UMAP Projections with Domain Knowledge (2405.09204v1)
Abstract: Dimensionality reduction algorithms are often used to visualise high-dimensional data. Previous studies have used prior information to enhance or suppress expected patterns in projections. In this paper, we adapt such techniques for interactive exploration guided by domain knowledge. Inspired by Mapper and STAD, we present three types of lens functions for UMAP, a state-of-the-art dimensionality reduction algorithm. Lens functions enable analysts to adapt projections to their questions, revealing otherwise hidden patterns. They filter the modelled connectivity to explore the interaction between manually selected features and the data's structure, creating configurable perspectives, each potentially revealing new insights. The effectiveness of the lens functions is demonstrated in two use cases, and their computational cost is analysed in a synthetic benchmark. Our implementation is available in an open-source Python package: https://github.com/vda-lab/lensed_umap.
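To make the idea of "filtering the modelled connectivity" concrete, the sketch below shows one plausible, Mapper-style way a lens could prune a connectivity graph: the lens values are cut into equal-width segments, and an edge is kept only if its endpoints fall in the same or in adjacent segments. This is an illustrative assumption, not the paper's definition and not the lensed_umap API. The names `bin_index`, `filter_edges_by_lens` and `n_bins`, and the edge-list representation, are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (source index, target index, connectivity weight).
Edge = Tuple[int, int, float]


def bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a lens value onto one of n_bins equal-width segments over [lo, hi]."""
    if hi == lo:
        return 0  # constant lens: every point shares one segment
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value belongs to the last segment


def filter_edges_by_lens(
    edges: List[Edge], lens: List[float], n_bins: int = 4
) -> List[Edge]:
    """Keep edges whose endpoints lie in the same or in adjacent lens segments.

    Assumption: this same-or-adjacent rule is only one way to let a selected
    feature (the lens) restrict which modelled connections survive.
    """
    lo, hi = min(lens), max(lens)
    bins = [bin_index(v, lo, hi, n_bins) for v in lens]
    return [(i, j, w) for i, j, w in edges if abs(bins[i] - bins[j]) <= 1]


# Toy data: five points with a one-dimensional lens value each.
lens_values = [0.0, 0.1, 0.5, 0.9, 1.0]
graph_edges = [
    (0, 1, 1.0),
    (0, 2, 0.8),
    (1, 3, 0.5),
    (2, 3, 0.7),
    (3, 4, 0.9),
    (0, 4, 0.3),
]

kept = filter_edges_by_lens(graph_edges, lens_values, n_bins=4)
print(kept)
```

In this toy graph, edges that join points with very different lens values are removed, while connections between points with similar lens values survive. A layout computed from the filtered graph would therefore pull apart regions that the unfiltered connectivity held together.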