Sharpness-Aware Minimization for Evolutionary Feature Construction in Regression (2405.06869v1)

Published 11 May 2024 in cs.LG and cs.NE

Abstract: In recent years, genetic programming (GP)-based evolutionary feature construction has achieved significant success. However, a primary challenge with evolutionary feature construction is its tendency to overfit the training data, resulting in poor generalization on unseen data. In this research, we draw inspiration from PAC-Bayesian theory and propose using sharpness-aware minimization in function space to discover symbolic features that exhibit robust performance within a smooth loss landscape in the semantic space. By optimizing sharpness in conjunction with cross-validation loss, as well as designing a sharpness reduction layer, the proposed method effectively mitigates the overfitting problem of GP, especially when dealing with a limited number of instances or in the presence of label noise. Experimental results on 58 real-world regression datasets show that our approach outperforms standard GP as well as six state-of-the-art complexity measurement methods for GP in controlling overfitting. Furthermore, the ensemble version of GP with sharpness-aware minimization demonstrates superior performance compared to nine fine-tuned machine learning and symbolic regression algorithms, including XGBoost and LightGBM.
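To make the high-level idea concrete, the sketch below illustrates one way the described mechanism could look in code. It is a minimal, hypothetical illustration, not the authors' implementation: it assumes the GP-constructed features have already been evaluated into a semantic matrix `Phi`, approximates worst-case sharpness in semantic space with random Gaussian perturbations of scale `rho`, and pairs that estimate with a cross-validation loss as two objectives to be minimized jointly. The names `Phi`, `rho`, `semantic_sharpness`, and `two_objective_fitness` are illustrative assumptions, and the paper's exact perturbation scheme and its sharpness reduction layer are not reproduced here.

```python
# Illustrative sketch only (not the paper's code): estimate sharpness of a
# constructed-feature set in semantic space and combine it with CV loss.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def semantic_sharpness(Phi, y, rho=0.1, n_perturbations=10, seed=0):
    """Approximate sharpness: the worst observed increase in training MSE when
    the feature semantics Phi (n_samples x n_features) are perturbed by noise
    scaled to rho times each feature's standard deviation."""
    rng = np.random.default_rng(seed)
    model = LinearRegression().fit(Phi, y)
    base_loss = np.mean((model.predict(Phi) - y) ** 2)
    worst_increase = 0.0
    for _ in range(n_perturbations):
        noise = rng.normal(scale=rho * (Phi.std(axis=0) + 1e-12), size=Phi.shape)
        perturbed_loss = np.mean((model.predict(Phi + noise) - y) ** 2)
        worst_increase = max(worst_increase, perturbed_loss - base_loss)
    return worst_increase

def two_objective_fitness(Phi, y):
    """Two objectives to minimize jointly (e.g. by a multi-objective GP):
    cross-validation loss of the constructed features and estimated sharpness."""
    cv_loss = -cross_val_score(LinearRegression(), Phi, y,
                               scoring="neg_mean_squared_error", cv=5).mean()
    return cv_loss, semantic_sharpness(Phi, y)

# Toy usage: a random "semantic matrix" standing in for GP-constructed features.
if __name__ == "__main__":
    rng = np.random.default_rng(42)
    Phi = rng.normal(size=(100, 5))
    y = Phi @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
    print(two_objective_fitness(Phi, y))
```

Using a linear model over the constructed features and a multi-objective selection scheme matches the abstract's framing (cross-validation loss plus sharpness), but the choice of base learner, perturbation distribution, and aggregation are assumptions made for the sake of a runnable example.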
