Genetic Programming for Explainable Manifold Learning (2403.14139v1)
Abstract: Manifold learning techniques play a pivotal role in machine learning by revealing lower-dimensional embeddings within high-dimensional data, improving both the efficiency and interpretability of data analysis. However, a notable challenge with current manifold learning methods is their lack of explicit functional mappings, which are crucial for explainability in many real-world applications. Genetic programming (GP), known for its interpretable tree-based functional models, has emerged as a promising approach to address this challenge. Previous research leveraged multi-objective GP to balance manifold quality against embedding dimensionality, producing functional mappings across a range of embedding sizes. However, these mapping trees often became complex, hindering explainability. In response, this paper introduces Genetic Programming for Explainable Manifold Learning (GP-EMaL), a novel approach that directly penalises tree complexity. The new method maintains high manifold quality while significantly enhancing explainability, and it allows customisation of complexity measures, such as symmetry balancing, scaling, and node complexity, to suit diverse application needs. Experimental analysis demonstrates that GP-EMaL matches the performance of the existing approach in most cases while using simpler, smaller, and more interpretable tree structures. This advancement marks a significant step towards achieving interpretable manifold learning.
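To make the idea of a customisable tree-complexity penalty concrete, the sketch below scores a GP expression tree using depth-scaled node costs plus a penalty for unbalanced sibling subtrees. The node costs, the asymmetry term, and the scaling scheme here are illustrative assumptions, not the paper's exact definitions of GP-EMaL's complexity measure.

```python
# Hypothetical sketch of a tree-complexity penalty in the spirit of GP-EMaL.
# NODE_COST values and the balance penalty are assumed for illustration.
NODE_COST = {"+": 1, "-": 1, "*": 2, "/": 3, "feature": 1}

class Node:
    def __init__(self, op, children=()):
        self.op = op
        self.children = list(children)

def complexity(node, depth=1):
    """Recursively score a tree: each node's cost is scaled by its depth
    (so deeper structures cost more), and binary nodes pay an extra
    penalty when their two subtrees have very different scores
    (a simple stand-in for symmetry balancing)."""
    own = NODE_COST.get(node.op, 1) * depth
    child_scores = [complexity(c, depth + 1) for c in node.children]
    balance_penalty = 0
    if len(child_scores) == 2:
        balance_penalty = abs(child_scores[0] - child_scores[1])
    return own + sum(child_scores) + balance_penalty

# Example: a flat mapping (x0 + x1) * x2 versus a deeper, unbalanced one.
flat = Node("*", [Node("+", [Node("feature"), Node("feature")]),
                  Node("feature")])
deep = Node("*", [Node("+", [Node("*", [Node("feature"), Node("feature")]),
                             Node("feature")]),
                  Node("feature")])
print(complexity(flat), complexity(deep))  # the flat tree scores lower
```

Used as an objective alongside manifold quality, a penalty of this shape steers the search towards small, balanced, cheap-to-read mapping trees, which is the explainability goal the abstract describes.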