Gram-Schmidt Methods for Unsupervised Feature Extraction and Selection (2311.09386v3)

Published 15 Nov 2023 in cs.LG, cs.IT, and math.IT

Abstract: Feature extraction and selection in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a Gram-Schmidt (GS) type orthogonalization process over function spaces to detect and map out such dependencies. Specifically, by applying the GS process over some family of functions, we construct a series of covariance matrices that can either be used to identify new large-variance directions, or to remove those dependencies from known directions. In the former case, we provide information-theoretic guarantees in terms of entropy reduction. In the latter, we provide precise conditions by which the chosen function family eliminates existing redundancy in the data. Each approach provides both a feature extraction and a feature selection algorithm. Our feature extraction methods are linear and can be seen as a natural generalization of principal component analysis (PCA). We provide experimental results for synthetic and real-world benchmark datasets which show superior performance over state-of-the-art (linear) feature extraction and selection algorithms. Surprisingly, our linear feature extraction algorithms are comparable to, and often outperform, several important nonlinear feature extraction methods such as autoencoders, kernel PCA, and UMAP. Furthermore, one of our feature selection algorithms strictly generalizes a recent Fourier-based feature selection mechanism (Heidari et al., IEEE Transactions on Information Theory, 2022), yet at significantly reduced complexity.
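
The abstract describes the extraction loop only at a high level. As a concrete illustration, here is a minimal NumPy sketch of one plausible reading: pick the top-variance direction of the current residual covariance, then use a Gram-Schmidt-style step over a chosen function family to remove everything the family can predict before searching for the next direction. The function name, the polynomial family, and the least-squares residualization are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def gs_feature_extraction(X, n_features=2, degree=3):
    """Hypothetical GS-style linear feature extractor (illustrative only).

    After each new direction is found, polynomial functions of the features
    extracted so far are regressed out of the data, so later directions
    capture variance not explained, even nonlinearly, by earlier ones.
    """
    X = X - X.mean(axis=0)      # center the data
    R = X.copy()                # residual data matrix
    W = []                      # extracted directions
    for _ in range(n_features):
        # PCA-like step: top-variance direction of the residual covariance.
        cov = R.T @ R / len(R)
        _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        W.append(eigvecs[:, -1])
        # Function family applied to all features found so far:
        # monomials z, z^2, ..., z^degree of each extracted feature.
        Z = X @ np.column_stack(W)
        F = np.column_stack([Z**d for d in range(1, degree + 1)])
        # GS step over the function space: subtract the least-squares
        # projection of the data onto the span of the family.
        coef, *_ = np.linalg.lstsq(F, X, rcond=None)
        R = X - F @ coef
    return np.column_stack(W)   # columns are linear feature directions

# Toy usage: x2 depends quadratically on x1, so after the first direction
# is found, the GS step should suppress the redundant x2 axis.
rng = np.random.default_rng(0)
t = rng.normal(size=(1000, 1))
X = np.hstack([t, t**2 + 0.1 * rng.normal(size=(1000, 1)),
               rng.normal(size=(1000, 1))])
print(gs_feature_extraction(X, n_features=2))
```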

References (25)
  1. S. Khalid, T. Khalil, and S. Nasreen, “A survey of feature selection and feature extraction techniques in machine learning,” in 2014 Science and Information Conference. IEEE, 2014, pp. 372–378.
  2. K. Pearson, “LIII. On lines and planes of closest fit to systems of points in space,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 2, no. 11, pp. 559–572, 1901.
  3. R. Vidal, Y. Ma, and S. Sastry, “Generalized principal component analysis (GPCA),” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1945–1959, 2005.
  4. E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM (JACM), vol. 58, no. 3, pp. 1–37, 2011.
  5. B. Schölkopf, A. Smola, and K.-R. Müller, “Kernel principal component analysis,” in Artificial Neural Networks—ICANN’97: 7th International Conference, Lausanne, Switzerland, October 8–10, 1997, Proceedings. Springer, 1997, pp. 583–588.
  6. J. V. Stone, Independent Component Analysis: A Tutorial Introduction. MIT Press, 2004.
  7. A. Tharwat, T. Gaber, A. Ibrahim, and A. E. Hassanien, “Linear discriminant analysis: A detailed tutorial,” AI Communications, vol. 30, no. 2, pp. 169–190, 2017.
  8. S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
  9. M. Heidari, J. K. Sreedharan, G. Shamir, and W. Szpankowski, “Sufficiently informative and relevant features: An information-theoretic and Fourier-based characterization,” IEEE Transactions on Information Theory, 2022.
  10. W. S. Torgerson, “Multidimensional scaling: I. Theory and method,” Psychometrika, vol. 17, no. 4, pp. 401–419, 1952.
  11. R. Larsen, “Decomposition using maximum autocorrelation factors,” Journal of Chemometrics, vol. 16, no. 8–10, pp. 427–435, 2002.
  12. L. Wiskott and T. J. Sejnowski, “Slow feature analysis: Unsupervised learning of invariances,” Neural Computation, vol. 14, no. 4, pp. 715–770, 2002.
  13. X. He and P. Niyogi, “Locality preserving projections,” Advances in Neural Information Processing Systems, vol. 16, 2003.
  14. A. Hyvärinen and E. Oja, “Independent component analysis: Algorithms and applications,” Neural Networks, vol. 13, no. 4–5, pp. 411–430, 2000.
  15. ——, “A fast fixed-point algorithm for independent component analysis,” Neural Computation, vol. 9, no. 7, pp. 1483–1492, 1997.
  16. K. D. Bollacker and J. Ghosh, “Linear feature extractors based on mutual information,” in Proceedings of the 13th International Conference on Pattern Recognition, vol. 2. IEEE, 1996, pp. 720–724.
  17. J. M. Leiva-Murillo and A. Artes-Rodriguez, “Maximization of mutual information for supervised linear feature extraction,” IEEE Transactions on Neural Networks, vol. 18, no. 5, pp. 1433–1441, 2007.
  18. N. Kwak, “Feature extraction based on direct calculation of mutual information,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 21, no. 7, pp. 1213–1231, 2007.
  19. A. Marinoni and P. Gamba, “Unsupervised data driven feature extraction by means of mutual information maximization,” IEEE Transactions on Computational Imaging, vol. 3, no. 2, pp. 243–253, 2017.
  20. K. Torkkola, “Feature extraction by non-parametric mutual information maximization,” Journal of Machine Learning Research, vol. 3, pp. 1415–1438, 2003.
  21. D. Dua and C. Graff, “UCI Machine Learning Repository,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml
  22. J. Li, K. Cheng, S. Wang, F. Morstatter, R. P. Trevino, J. Tang, and H. Liu, “Feature selection: A data perspective,” ACM Computing Surveys (CSUR), vol. 50, no. 6, pp. 1–45, 2017.
  23. M. Heidari, J. Sreedharan, G. Shamir, and W. Szpankowski, “Information sufficiency via Fourier expansion,” in 2021 IEEE International Symposium on Information Theory (ISIT). IEEE, 2021, pp. 2774–2779.
  24. M. E. Ismail and R. Zhang, “A review of multivariate orthogonal polynomials,” Journal of the Egyptian Mathematical Society, vol. 25, no. 2, pp. 91–110, 2017.
  25. M. Heidari, J. Sreedharan, G. I. Shamir, and W. Szpankowski, “Finding relevant information via a discrete Fourier expansion,” in International Conference on Machine Learning. PMLR, 2021, pp. 4181–4191.
