Scalable variable selection for two-view learning tasks with projection operators (2307.01558v1)

Published 4 Jul 2023 in cs.LG and cs.AI

Abstract: In this paper we propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework can handle extremely large-scale selection tasks, where the number of data samples may run into the millions. In a nutshell, the method performs variable selection by iteratively choosing variables that are highly correlated with the output variables but uncorrelated with the previously selected ones. To measure correlation, the method uses projection operators and their algebra. Through projection operators, the correlation between sets of input and output variables can also be expressed via kernel functions, so nonlinear correlation models can be exploited as well. We experimentally validate the approach, demonstrating its scalability and the relevance of the selected features on both synthetic and real data.

Keywords: supervised variable selection, vector-valued learning, projection-valued measure, reproducing kernel Hilbert space
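For intuition only, below is a minimal sketch of a linear, finite-dimensional analogue of the selection rule described in the abstract: greedily pick the input column whose component orthogonal to the span of the already-selected columns is most aligned with the output view. The function name `greedy_projection_selection` and all implementation details are assumptions made for illustration; the paper's actual method operates with projection operators (and kernels) in a reproducing kernel Hilbert space and is not reproduced here.

```python
import numpy as np

def greedy_projection_selection(X, Y, k):
    """Toy sketch: greedy two-view variable selection via orthogonal projections.

    At each step, select the input column whose residual (the part orthogonal
    to the already-selected columns) is most aligned with the output view Y.
    Assumes roughly centered/standardized columns so the score behaves like a
    correlation. This is a linear illustration only, not the paper's method.
    """
    n, d = X.shape
    selected = []
    Q = np.zeros((n, 0))  # orthonormal basis for the span of selected columns
    for _ in range(k):
        best_j, best_score = None, -np.inf
        for j in range(d):
            if j in selected:
                continue
            # Component of column j orthogonal to the selected variables
            r = X[:, j] - Q @ (Q.T @ X[:, j])
            norm = np.linalg.norm(r)
            if norm < 1e-10:  # column already lies in the selected span
                continue
            r = r / norm
            # Alignment of the residual direction with the output view
            score = np.linalg.norm(Y.T @ r)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break
        selected.append(best_j)
        # Extend the orthonormal basis with the new residual direction
        r = X[:, best_j] - Q @ (Q.T @ X[:, best_j])
        Q = np.hstack([Q, (r / np.linalg.norm(r)).reshape(-1, 1)])
    return selected

# Illustrative usage on synthetic data: outputs depend on columns 3 and 17
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
Y = X[:, [3, 17]] @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((200, 4))
print(greedy_projection_selection(X, Y, 5))
```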

