Feature Selection via Maximizing Distances between Class Conditional Distributions (2401.07488v1)

Published 15 Jan 2024 in cs.LG

Abstract: For many data-intensive tasks, feature selection is an important preprocessing step. However, most existing methods do not directly and intuitively explore the intrinsic discriminative information of features. We propose a novel feature selection framework based on the distance between class conditional distributions, measured by integral probability metrics (IPMs). Our framework directly explores the discriminative information of features, in the sense of distributions, for supervised classification. We analyze the theoretical and practical aspects of IPMs for feature selection and construct selection criteria based on them. We propose several variants of our framework based on the 1-Wasserstein distance and evaluate them on real datasets from different domains. Experimental results show that our framework can outperform state-of-the-art methods in terms of classification accuracy and robustness to perturbations.
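
As a rough illustration of the idea (a minimal sketch, not the paper's exact criterion), one can score each feature by the empirical 1-Wasserstein distance between its class-conditional distributions and keep the highest-scoring features. The sketch below assumes SciPy's `wasserstein_distance` and NumPy; all function and variable names are illustrative.

```python
# Minimal sketch: rank features by the average pairwise 1-Wasserstein distance
# between their class-conditional empirical distributions. This is an
# illustrative approximation of the framework, not the authors' implementation.
from itertools import combinations

import numpy as np
from scipy.stats import wasserstein_distance


def wasserstein_feature_scores(X, y):
    """Score each column of X by the mean pairwise 1-Wasserstein distance
    between its class-conditional empirical distributions."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        dists = [
            wasserstein_distance(X[y == a, j], X[y == b, j])
            for a, b in combinations(classes, 2)
        ]
        scores[j] = np.mean(dists)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = wasserstein_feature_scores(X, y)
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: only the first of five features separates the two classes.
    y = rng.integers(0, 2, size=200)
    X = rng.normal(size=(200, 5))
    X[:, 0] += 2.0 * y
    print(select_top_k(X, y, k=2))  # feature 0 should rank first
```

In one dimension the Wasserstein distance reduces to a closed-form comparison of sorted samples, which keeps the per-feature cost low and makes this kind of distribution-level scoring practical as a filter-style selection step.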
