Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition (2304.01117v3)

Published 3 Apr 2023 in cs.LG and cs.AI

Abstract: Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main attraction of this approach is that it returns an interpretable model that can be insightful to users. Historically, the majority of algorithms for symbolic regression have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead utilize approaches such as enumeration algorithms, mixed-integer linear programming, neural networks, and Bayesian optimization. To assess how well these new approaches behave on a set of common challenges often faced in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference consisting of different synthetic and real-world datasets that were blind to entrants. For the real-world track, we assessed interpretability in a realistic way by using a domain expert to judge the trustworthiness of candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms, and highlight possible improvements for future competitions.
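To make the workflow concrete, below is a minimal sketch of how symbolic regression is typically used in practice, here with the open-source PySR library and its scikit-learn-style interface. The synthetic target function, operator set, and iteration budget are illustrative assumptions for this sketch and are not taken from the competition itself.

```python
# A minimal symbolic regression sketch using PySR's scikit-learn-style API.
# The ground-truth expression below is an assumption chosen for illustration.
import numpy as np
from pysr import PySRRegressor

# Synthetic data with a known analytic ground truth: y = 2.5*cos(x0) + x1^2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.5 * np.cos(X[:, 0]) + X[:, 1] ** 2

model = PySRRegressor(
    niterations=40,                    # evolutionary search budget
    binary_operators=["+", "-", "*"],  # allowed binary primitives
    unary_operators=["cos"],           # allowed unary primitives
)
model.fit(X, y)

# PySR keeps a Pareto front trading accuracy against expression complexity;
# the selected expression can be inspected as a SymPy object.
print(model.sympy())
```

The search returns not a single black-box predictor but a set of candidate expressions at different complexity levels, which is exactly what makes interpretability assessments like the competition's expert-judged real-world track possible.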
