Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl (2305.01582v3)

Published 2 May 2023 in astro-ph.IM, cs.LG, cs.NE, cs.SC, and physics.data-an

Abstract: PySR is an open-source library for practical symbolic regression, a type of machine learning which aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.

Summary

The paper presents PySR, an open-source symbolic regression library for scientific discovery that pairs a Python front end with a high-performance Julia backend, SymbolicRegression.jl.
Its search uses an evolve-simplify-optimize loop and adaptive parsimony to balance expression simplicity against accuracy in equation discovery.
EmpiricalBench is introduced as a benchmark of historical empirical equations, on which PySR is compared against other symbolic regression methods.

The paper "Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl" presents PySR, an open-source library designed to democratize symbolic regression (SR) in scientific applications. PySR aims to uncover human-interpretable symbolic models from data. It runs on a highly optimized Julia backend, SymbolicRegression.jl, and is exposed to Python through a familiar scikit-learn style API. The paper covers both the software implementation and the search algorithm behind it, and introduces a new benchmark, EmpiricalBench, to evaluate SR methods in scientific contexts.

Symbolic Regression

Symbolic regression (SR) seeks to identify governing equations from data by exploring the space of analytic expressions. Unlike traditional methods that fit parameters within a predefined model, SR searches over the form of the model itself, looking for interpretable expressions that balance accuracy against simplicity. Historically, this kind of equation discovery was performed manually by scientists, relying on intuition and heuristic strategies. PySR automates the search, using modern computational resources to evaluate far more candidate expressions than a manual approach could.
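To make the idea of searching over model forms concrete, here is a minimal sketch in plain Python. It is not PySR's algorithm: it enumerates a tiny, hand-picked expression space exhaustively and scores each candidate by error plus a small per-node penalty. The operator set, the complexity values, and the `parsimony` weight are all assumptions chosen for illustration.

```python
import itertools
import math

# Illustrative (hypothetical) building blocks; a real SR tool lets the user choose these.
UNARY = {"sin": math.sin, "cos": math.cos, "square": lambda v: v * v}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def candidates():
    """Yield (text, function, complexity) for a tiny space of expressions in x."""
    yield "x", (lambda x: x), 1
    for name, f in UNARY.items():
        yield f"{name}(x)", (lambda x, f=f: f(x)), 2
    for (n1, f1), (bn, bf), (n2, f2) in itertools.product(
        UNARY.items(), BINARY.items(), UNARY.items()
    ):
        text = f"({n1}(x) {bn} {n2}(x))"
        yield text, (lambda x, f1=f1, bf=bf, f2=f2: bf(f1(x), f2(x))), 5

def mse(f, xs, ys):
    """Mean squared error of candidate f on the data."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def best_expression(xs, ys, parsimony=1e-3):
    """Pick the candidate minimizing error plus a small penalty per node."""
    return min(candidates(), key=lambda c: mse(c[1], xs, ys) + parsimony * c[2])

# Data generated from a known expression, so the search has something to find.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + math.cos(x) for x in xs]
text, _, complexity = best_expression(xs, ys)
print(text, complexity)
```

Exhaustive enumeration only works for a space this small. The number of expressions grows combinatorially with depth and operator count, which is why practical tools rely on heuristic search such as evolutionary algorithms.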

Key Features of PySR

PySR incorporates several novel elements:

Evolve-Simplify-Optimize Loop: PySR extends the traditional evolutionary algorithm with two further stages. After expressions are evolved, they are simplified, and their scalar constants are then refined with a local gradient-based optimizer. This cycle lets PySR efficiently discover equations that contain unknown scalar constants.
Adaptive Parsimony: The complexity penalty adapts during the search instead of staying fixed, so the population keeps exploring both simple and complex expressions and avoids premature convergence.
Integration and Customization: PySR supports custom operators, user-defined loss functions, and various constraints, enabling tailored applicability across diverse scientific domains.
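The evolve-simplify-optimize cycle described above can be sketched in a few dozen lines. This is a toy under stated assumptions, not PySR's implementation: expressions are nested tuples over `+` and `*` only, simplification is limited to constant folding, the optimizer is a derivative-free coordinate search standing in for the gradient-based optimizer, and all three stages run on every child. Every function name and default value here is hypothetical.

```python
import math
import random

# Expression trees as nested tuples: ("x",), ("c", value), ("+", a, b), ("*", a, b).
def evaluate(t, x):
    if t[0] == "x":
        return x
    if t[0] == "c":
        return t[1]
    a, b = evaluate(t[1], x), evaluate(t[2], x)
    return a + b if t[0] == "+" else a * b

def loss(t, data):
    """Mean squared error; non-finite totals count as infinitely bad."""
    total = 0.0
    for x, y in data:
        d = evaluate(t, x) - y
        total += d * d
    return total / len(data) if math.isfinite(total) else math.inf

def size(t):
    return 1 if t[0] in ("x", "c") else 1 + size(t[1]) + size(t[2])

def random_leaf(rng):
    return ("x",) if rng.random() < 0.5 else ("c", rng.uniform(-2, 2))

def mutate(t, rng):
    """Evolve: replace a random subtree with a leaf, or wrap it in a new operator."""
    if t[0] in ("x", "c") or rng.random() < 0.3:
        if rng.random() < 0.5:
            return random_leaf(rng)
        return (rng.choice("+*"), t, random_leaf(rng))
    if rng.random() < 0.5:
        return (t[0], mutate(t[1], rng), t[2])
    return (t[0], t[1], mutate(t[2], rng))

def simplify(t):
    """Simplify: fold any operator whose two children are both constants."""
    if t[0] in ("x", "c"):
        return t
    a, b = simplify(t[1]), simplify(t[2])
    if a[0] == "c" and b[0] == "c":
        return ("c", evaluate((t[0], a, b), 0.0))
    return (t[0], a, b)

def get_constants(t):
    if t[0] == "c":
        return [t[1]]
    if t[0] == "x":
        return []
    return get_constants(t[1]) + get_constants(t[2])

def set_constants(t, values):
    """Rebuild the tree, taking new constants from an iterator in left-to-right order."""
    if t[0] == "c":
        return ("c", next(values))
    if t[0] == "x":
        return t
    left = set_constants(t[1], values)
    return (t[0], left, set_constants(t[2], values))

def optimize(t, data, steps=40):
    """Optimize: coordinate search over the constants (stand-in for a real optimizer)."""
    consts, best, step = get_constants(t), loss(t, data), 0.5
    for _ in range(steps):
        improved = False
        for i in range(len(consts)):
            for delta in (step, -step):
                trial = consts[:i] + [consts[i] + delta] + consts[i + 1:]
                trial_loss = loss(set_constants(t, iter(trial)), data)
                if trial_loss < best:
                    consts, best, improved = trial, trial_loss, True
        if not improved:
            step /= 2  # shrink the step once no single move helps
    return set_constants(t, iter(consts))

def search(data, generations=15, pop_size=12, max_size=9, seed=0):
    rng = random.Random(seed)
    population = [random_leaf(rng) for _ in range(pop_size)]
    best = min(population, key=lambda t: loss(t, data))
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Tournament selection: the fitter of two random members is the parent.
            parent = min(rng.sample(population, 2), key=lambda t: loss(t, data))
            child = simplify(mutate(parent, rng))
            fits = size(child) <= max_size
            children.append(optimize(child, data, steps=8) if fits else parent)
        population = children
        best = min(population + [best], key=lambda t: loss(t, data))
    return best

data = [(x / 2, 2.5 * (x / 2) + 1.0) for x in range(-4, 5)]
best = search(data)
print(best, loss(best, data))
```

The point of the optimize stage is visible in `mutate`: mutation alone can only propose constants at random, so without a dedicated tuning step a structurally correct expression such as `c0 * x + c1` would usually be discarded for having poorly chosen constants.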

Evaluation with EmpiricalBench

The paper introduces EmpiricalBench, a benchmark built around historical empirical discoveries. It measures whether an algorithm can recover known empirical equations from original and synthetic datasets, including noisy data, which reflects the practical demands of scientific applications. Unlike benchmarks that test recovery of arbitrary expressions, EmpiricalBench targets equations that scientists actually discovered from data.
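A recovery benchmark needs two ingredients: data generated from (or recorded for) a known law, and a criterion for deciding that a candidate expression counts as a rediscovery. The sketch below illustrates both with synthetic data. Kepler's third law is used purely as a familiar historical example, the log-log power-law fit is a stand-in for an SR algorithm, and the tolerance-based `recovers` check is an assumption for illustration, not the paper's scoring procedure.

```python
import math
import random

def kepler_period(a):
    """Reference law in years and astronomical units: P = a**1.5."""
    return a ** 1.5

def make_dataset(n=50, noise=0.01, seed=0):
    """Synthetic (a, P) pairs with multiplicative Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.uniform(0.3, 30.0)
        p = kepler_period(a) * (1.0 + rng.gauss(0.0, noise))
        data.append((a, p))
    return data

def fit_power_law(data):
    """Least-squares fit of log P = log k + n * log a; returns (k, n)."""
    logs = [(math.log(a), math.log(p)) for a, p in data]
    mx = sum(u for u, _ in logs) / len(logs)
    my = sum(v for _, v in logs) / len(logs)
    sxx = sum((u - mx) ** 2 for u, _ in logs)
    sxy = sum((u - mx) * (v - my) for u, v in logs)
    n = sxy / sxx
    return math.exp(my - n * mx), n

def recovers(candidate, reference, points, rel_tol=0.05):
    """True if the candidate matches the reference within rel_tol at every point."""
    return all(
        abs(candidate(a) - reference(a)) <= rel_tol * abs(reference(a))
        for a in points
    )

k, n = fit_power_law(make_dataset())
candidate = lambda a: k * a ** n
test_points = [0.5, 1.0, 5.0, 20.0]
print(f"fitted exponent: {n:.3f}")
print("recovered:", recovers(candidate, kepler_period, test_points))
```

A numerical check like this is easy to automate but lenient: an expression can agree with the reference over the sampled range without having the same symbolic form, so a stricter benchmark would also compare the structure of the expressions.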

Results and Implications

In the reported tests, PySR is competitive with, and often outperforms, other SR methods, particularly in empirical discovery scenarios. It performs robustly across domains, including settings where noise and high-dimensional data complicate equation discovery.

The paper contrasts PySR with several other SR tools, such as Operon and DSR, and finds that their relative strengths differ between real and synthetic scenarios. Notably, while deep learning-based approaches like SR-Transformer have theoretical appeal, they struggle with the intricacies of real-world data, where traditional, heuristic-driven strategies still excel.

Future Developments

The implications of PySR extend beyond individual scientific fields; its capability to automatically derive symbolic models opens new avenues for interdisciplinary research. Future work could involve improving deep learning integration, thus combining the predictive prowess of neural networks with the interpretability of symbolic regression.

Conclusion

This paper illustrates the efficacy of PySR in advancing SR applications for scientific discovery. By balancing performance, interpretability, and user customization, PySR stands as a versatile tool in the scientific modeling toolkit. The introduction of EmpiricalBench sets a new standard for evaluating SR methods against realistic scientific challenges, highlighting the ongoing need for innovation in interpretable machine learning.

This contribution underscores the potential of automated symbolic models in science, inviting future enhancements and novel applications across an expanding array of disciplines.

GitHub

GitHub - MilesCranmer/PySR: High-Performance Symbolic Regression in Python and Julia (2,022 stars)
