Supervised Manifold Learning via Random Forest Geometry-Preserving Proximities (2307.01077v1)

Published 3 Jul 2023 in stat.ML and cs.LG

Abstract: Manifold learning approaches seek the intrinsic, low-dimensional data structure within a high-dimensional space. Mainstream manifold learning algorithms, such as Isomap, UMAP, $t$-SNE, Diffusion Map, and Laplacian Eigenmaps do not use data labels and are thus considered unsupervised. Existing supervised extensions of these methods are limited to classification problems and fall short of uncovering meaningful embeddings due to their construction using order non-preserving, class-conditional distances. In this paper, we show the weaknesses of class-conditional manifold learning quantitatively and visually and propose an alternate choice of kernel for supervised dimensionality reduction using a data-geometry-preserving variant of random forest proximities as an initialization for manifold learning methods. We show that local structure preservation using these proximities is near universal across manifold learning approaches and global structure is properly maintained using diffusion-based algorithms.
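The core idea in the abstract — replace class-conditional distances with random-forest proximities and hand those to a manifold learning method — can be sketched in a few lines. The sketch below is illustrative only: it uses Breiman's classic leaf-co-occurrence proximity as a stand-in for the geometry-preserving RF-GAP proximities the paper actually proposes, and standard t-SNE with a precomputed metric rather than the paper's diffusion-based pipeline.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE

# Fit a supervised random forest; the labels enter only through the forest.
X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# leaves[i, t] = index of the leaf that sample i falls into in tree t
leaves = rf.apply(X)

# Classic proximity: fraction of trees in which two samples share a leaf.
# (The paper uses the RF-GAP variant instead, which corrects known biases.)
prox = np.mean(leaves[:, None, :] == leaves[None, :, :], axis=2)

# Turn similarities into distances and embed with a precomputed metric.
dist = 1.0 - prox
emb = TSNE(n_components=2, metric="precomputed", init="random",
           random_state=0).fit_transform(dist)
print(emb.shape)  # (150, 2)
```

Because the supervision is encoded in the forest's partitioning rather than in class-conditional distance shrinkage, the same proximity matrix can be passed to any manifold learning method that accepts a precomputed kernel or distance.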
