A Transformer Model for Symbolic Regression towards Scientific Discovery (2312.04070v2)
Abstract: Symbolic Regression (SR) searches for mathematical expressions that best describe numerical datasets. This makes it possible to circumvent the interpretation issues inherent to artificial neural networks, but SR algorithms are often computationally expensive. This work proposes a new Transformer model for Symbolic Regression, with a particular focus on its application to Scientific Discovery. We propose three encoder architectures of increasing flexibility, obtained at the cost of violating column-permutation equivariance. Training results indicate that the most flexible architecture is required to prevent overfitting. Once trained, we apply our best model to the SRSD datasets (Symbolic Regression for Scientific Discovery datasets), where it yields state-of-the-art results under the normalized tree-based edit distance, at no extra computational cost.
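The normalized tree-based edit distance mentioned above compares the predicted and ground-truth expressions as trees rather than as strings. As a rough illustration (not the paper's implementation), the sketch below computes an ordered-tree edit distance with unit insert/delete/relabel costs via the classic rightmost-root recursion, then normalizes by the size of the ground-truth tree and caps the result at 1; the tuple encoding of expression trees and the exact normalization convention are assumptions here.

```python
from functools import lru_cache

# An expression tree is encoded as a nested tuple: (label, child, child, ...).
# Example: x + sin(y)  ->  ('+', ('x',), ('sin', ('y',)))

def size(t):
    """Number of nodes in a tree."""
    return 1 + sum(size(c) for c in t[1:])

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Edit distance between two ordered forests (tuples of trees),
    with unit cost for insertion, deletion, and relabeling."""
    if not f1:
        return sum(size(t) for t in f2)   # insert everything remaining in f2
    if not f2:
        return sum(size(t) for t in f1)   # delete everything remaining in f1
    t1, t2 = f1[-1], f2[-1]
    return min(
        forest_dist(f1[:-1] + t1[1:], f2) + 1,   # delete the root of t1
        forest_dist(f1, f2[:-1] + t2[1:]) + 1,   # insert the root of t2
        forest_dist(f1[:-1], f2[:-1])            # match t1 with t2:
            + forest_dist(t1[1:], t2[1:])        #   align their children
            + (0 if t1[0] == t2[0] else 1),      #   plus relabel cost
    )

def normalized_edit_distance(pred, true):
    """Edit distance normalized by the ground-truth tree size, capped at 1
    (the normalization convention assumed for the SRSD metric)."""
    return min(1.0, forest_dist((pred,), (true,)) / size(true))
```

For example, `x + sin(y)` versus `x + cos(y)` differs by one relabel out of four ground-truth nodes, so the normalized distance is 0.25, while a perfect recovery scores 0. The memoized naive recursion is exponential in the worst case but fine for the small trees typical of SR benchmarks; a production implementation would use the Zhang-Shasha algorithm.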