Vertical Symbolic Regression via Deep Policy Gradient (2402.00254v1)
Abstract: Vertical Symbolic Regression (VSR) was recently proposed to expedite the discovery of symbolic equations with many independent variables from experimental data. VSR reduces the search space by following a vertical discovery path: it starts from reduced-form equations involving a subset of the independent variables and extends them to full-fledged ones. Because deep neural networks have proved successful in many symbolic regressors, they are expected to further scale up VSR. Nevertheless, directly combining VSR with deep neural networks makes it difficult to pass gradients and raises other engineering issues. We propose Vertical Symbolic Regression using Deep Policy Gradient (VSR-DPG) and demonstrate that VSR-DPG can recover ground-truth equations involving multiple input variables, significantly beyond the reach of both deep reinforcement learning-based approaches and previous VSR variants. VSR-DPG models symbolic regression as a sequential decision-making process in which equations are built by repeatedly applying grammar rules. The integrated deep model is trained to maximize a policy gradient objective. Experimental results demonstrate that VSR-DPG significantly outperforms popular baselines in identifying both algebraic equations and ordinary differential equations on a series of benchmarks.
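To make the two ideas in the abstract concrete (building an equation by repeated application of grammar rules, and training with a policy gradient), here is a minimal sketch. It is not the authors' implementation. The toy grammar `RULES`, the helper names `apply_rules`, `softmax`, and `reinforce_step`, and the state-independent softmax policy (standing in for the deep model the abstract mentions) are all assumptions made for illustration. The update is the plain REINFORCE rule.

```python
import math

# Hypothetical toy grammar (an assumption, not the paper's grammar):
# every rule rewrites the single non-terminal "A".
RULES = [
    ("A", "(A + A)"),
    ("A", "(A * A)"),
    ("A", "x1"),
    ("A", "x2"),
    ("A", "c"),
]

def apply_rules(rule_indices, start="A"):
    """Build an expression by rewriting the leftmost non-terminal, one rule per step."""
    expr = start
    for i in rule_indices:
        lhs, rhs = RULES[i]
        if lhs not in expr:  # no non-terminal left: the expression is complete
            break
        expr = expr.replace(lhs, rhs, 1)  # rewrite only the leftmost occurrence
    return expr

def softmax(logits):
    """Turn rule logits into a probability distribution over rules."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, actions, reward, baseline=0.0, lr=0.1):
    """One REINFORCE update for a state-independent softmax policy (a simplification).

    The gradient of sum_t log p(a_t) w.r.t. logit k is sum_t (1[k == a_t] - p_k).
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a in actions:
        for k in range(len(logits)):
            grad[k] += (1.0 if k == a else 0.0) - probs[k]
    advantage = reward - baseline
    return [w + lr * advantage * g for w, g in zip(logits, grad)]

# One sampled sequence of rule choices yields one candidate equation.
expression = apply_rules([1, 2, 0, 3, 4])  # "(x1 * (x2 + c))"

# A well-fitting equation (high reward) makes its rules more likely next time.
updated_logits = reinforce_step([0.0] * len(RULES), [1, 2, 0, 3, 4], reward=1.0)
```

In a fuller version, the reward would come from fitting the constant `c` and scoring the expression against data, and the policy would condition on the rules chosen so far rather than using a single logit vector.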