
Inexact Simplification of Symbolic Regression Expressions with Locality-sensitive Hashing (2404.05898v1)

Published 8 Apr 2024 in cs.NE and cs.LG

Abstract: Symbolic regression (SR) searches for parametric models that accurately fit a dataset, prioritizing simplicity and interpretability. Despite this secondary objective, studies point out that the models are often overly complex due to redundant operations, introns, and bloat that arise during the iterative process, and that this complexity can hinder the search through repeated exploration of bloated segments. Applying a fast heuristic algebraic simplification may not fully simplify the expression, and exact methods can be infeasible depending on the size or complexity of the expressions. We propose a novel agnostic simplification and bloat control for SR employing efficient memoization with locality-sensitive hashing (LSH). The idea is that expressions and their sub-expressions traversed during the iterative simplification process are stored in a dictionary using LSH, enabling efficient retrieval of similar structures. We iterate through the expression, replacing subtrees with others of the same hash whenever the replacement yields a smaller expression. Empirical results show that applying this simplification during evolution performs as well as or better than no simplification in minimizing error, while significantly reducing the number of nonlinear functions. This technique can learn simplification rules that work in general or for a specific problem, and it improves convergence while reducing model complexity.
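The memoization idea in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the bucket key below is a rounded evaluation fingerprint on a few fixed sample points, a crude stand-in for the paper's locality-sensitive hash, and every name (`Node`, `bucket`, `simplify`, `SAMPLES`) is hypothetical. Subtrees whose fingerprints collide share a dictionary bucket, and each subtree is replaced by the smallest representative seen so far in its bucket.

```python
# Toy sketch of LSH-style memoized simplification for expression trees.
# Assumption: a subtree's "hash" is its rounded output on a few fixed
# inputs, so behaviourally similar subtrees share a bucket.
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: object              # 'x', a float constant, '+', '*', or 'sin'
    children: tuple = ()

def evaluate(t: Node, x: float) -> float:
    if t.op == 'x':
        return x
    if isinstance(t.op, float):
        return t.op
    a = [evaluate(c, x) for c in t.children]
    if t.op == '+':
        return a[0] + a[1]
    if t.op == '*':
        return a[0] * a[1]
    if t.op == 'sin':
        return math.sin(a[0])
    raise ValueError(f"unknown op {t.op!r}")

def size(t: Node) -> int:
    return 1 + sum(size(c) for c in t.children)

SAMPLES = (-1.3, 0.0, 0.7, 2.1)   # arbitrary fixed probe points

def bucket(t: Node, decimals: int = 6) -> tuple:
    """Rounded evaluation fingerprint used as the dictionary key."""
    return tuple(round(evaluate(t, x), decimals) for x in SAMPLES)

def simplify(t: Node, memo: dict) -> Node:
    # Simplify bottom-up, then look the subtree up in its bucket and
    # keep only the smallest representative seen so far.
    t = Node(t.op, tuple(simplify(c, memo) for c in t.children))
    key = bucket(t)
    best = memo.get(key)
    if best is None or size(t) < size(best):
        memo[key] = t
        return t
    return best
```

For example, `x + (0 * x)` evaluates identically to `x` on the probe points, so once the subtree `x` has populated its bucket, the whole bloated expression collapses to the single node `x`. With an exact semantic fingerprint this is lossless; the paper's inexact LSH variant deliberately tolerates near-misses to merge more buckets.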
