Optimizing Feature Selection for Binary Classification with Noisy Labels: A Genetic Algorithm Approach (2401.06546v1)

Published 12 Jan 2024 in cs.LG, cs.CV, and cs.NE

Abstract: Feature selection in noisy label scenarios remains an understudied topic. We propose a novel genetic algorithm-based approach, the Noise-Aware Multi-Objective Feature Selection Genetic Algorithm (NMFS-GA), for selecting optimal feature subsets in binary classification with noisy labels. NMFS-GA offers a unified framework for selecting feature subsets that are both accurate and interpretable. We evaluate NMFS-GA on synthetic datasets with label noise, a Breast Cancer dataset enriched with noisy features, and a real-world ADNI dataset for dementia conversion prediction. Our results indicate that NMFS-GA can effectively select feature subsets that improve the accuracy and interpretability of binary classifiers in scenarios with noisy labels.
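The abstract describes a genetic-algorithm search over feature subsets whose objectives balance classification accuracy and interpretability under label noise. As a rough illustration of the general idea only, the sketch below shows a plain single-population genetic algorithm for binary feature-mask selection with a cross-validated fitness and a small sparsity penalty. It is not the authors' NMFS-GA (which is multi-objective and noise-aware); all function names, operators, and parameters here are illustrative assumptions.

```python
# Minimal GA feature-selection sketch (assumed, not the paper's NMFS-GA).
# Assumes a numeric feature matrix X and binary (possibly noisy) labels y.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated balanced accuracy of a candidate feature subset,
    lightly penalized by subset size to favour interpretable models."""
    if mask.sum() == 0:
        return -np.inf
    clf = LogisticRegression(max_iter=1000)
    acc = cross_val_score(clf, X[:, mask.astype(bool)], y,
                          scoring="balanced_accuracy", cv=5).mean()
    return acc - 0.01 * mask.sum() / mask.size

def ga_feature_selection(X, y, pop_size=30, generations=40, p_mut=0.05):
    """Evolve binary feature masks; return the best mask found."""
    n_features = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        # Binary tournament selection of parents.
        parents = pop[[max(rng.choice(pop_size, 2), key=lambda i: scores[i])
                       for _ in range(pop_size)]]
        children = parents.copy()
        # One-point crossover on consecutive pairs.
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_features)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation.
        flips = rng.random(children.shape) < p_mut
        children = np.where(flips, 1 - children, children)
        # Elitism: carry the previous best individual over unchanged.
        children[0] = pop[scores.argmax()]
        pop = children
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()].astype(bool)
```

In use, one would call `mask = ga_feature_selection(X, y)` and then train a classifier on `X[:, mask]`. The paper's method differs in that it optimizes multiple objectives with Pareto-based selection and incorporates noise-aware criteria, rather than the single scalar fitness shown here.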

