Enhancing Dimension-Reduced Scatter Plots with Class and Feature Centroids (2403.20246v1)
Abstract: Dimension reduction is increasingly applied to high-dimensional biomedical data to improve its interpretability. When datasets are reduced to two dimensions, each observation is assigned x and y coordinates and is represented as a point on a scatter plot. A significant challenge lies in interpreting the meaning of the x and y axes due to the complexities inherent in dimension reduction. This study addresses this challenge by using the x and y coordinates derived from dimension reduction to calculate class and feature centroids, which can be overlaid onto the scatter plots. This method connects the low-dimensional space to the original high-dimensional space. We illustrate the utility of this approach with data derived from the phenotypes of three neurogenetic diseases and demonstrate how the addition of class and feature centroids increases the interpretability of scatter plots.
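The centroid calculation described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the 2D coordinates have already been produced by some dimension reduction method, that each observation carries one class label, and that features are given as a binary (or non-negative weighted) matrix. The function names `class_centroids` and `feature_centroids` are hypothetical.

```python
import numpy as np


def class_centroids(xy, labels):
    """Mean 2D position of the observations in each class (hypothetical helper)."""
    xy = np.asarray(xy, dtype=float)          # shape (n_obs, 2)
    labels = np.asarray(labels)               # shape (n_obs,)
    return {c: xy[labels == c].mean(axis=0) for c in np.unique(labels).tolist()}


def feature_centroids(xy, features):
    """Feature-weighted mean 2D position, one row per feature (hypothetical helper).

    Assumption: `features` is binary or non-negative. For a binary matrix
    this is the mean position of the observations that have the feature.
    Features that no observation has get NaN coordinates.
    """
    xy = np.asarray(xy, dtype=float)          # shape (n_obs, 2)
    w = np.asarray(features, dtype=float)     # shape (n_obs, n_features)
    totals = w.sum(axis=0)                    # total weight per feature
    sums = w.T @ xy                           # weighted coordinate sums, (n_features, 2)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / totals[:, None]


# Toy data (made up for illustration): four observations, two classes, two features.
xy = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]]
labels = ["A", "A", "B", "B"]
features = [[1, 0], [1, 1], [0, 1], [0, 0]]

cc = class_centroids(xy, labels)
fc = feature_centroids(xy, features)
```

Each centroid is itself an (x, y) pair, so it can be drawn on the same scatter plot as the observations, for example as a labeled marker. A feature centroid that lies close to a class centroid suggests that the feature is concentrated in that class.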