- The paper introduces Φ-SO, a novel deep symbolic regression framework that integrates unit constraints to ensure physical plausibility.
- It employs an RNN with deep reinforcement learning to generate mathematically valid expressions while efficiently narrowing the search space.
- Benchmarking on physics datasets shows state-of-the-art performance and robustness to noise, highlighting its potential for accelerating scientific discovery.
Deep Symbolic Regression for Physics Guided by Units Constraints
The field of symbolic regression (SR) aims to automate the discovery of mathematical expressions that capture the underlying relationships in a given dataset. While recent advancements in SR, primarily driven by deep learning techniques, have shown promise across various domains, there remains a distinct challenge in applying these methods to physics. The crux of this challenge lies in maintaining physical plausibility, especially in ensuring that resulting equations are dimensionally consistent. The paper presents a novel framework, termed Φ-SO (Physical Symbolic Optimization), which addresses this by incorporating unit constraints directly into the symbolic regression process.
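To make the notion of dimensional consistency concrete, the sketch below (illustrative, not drawn from the paper) represents physical units as exponent vectors over SI base dimensions: multiplying quantities adds their exponent vectors, while addition is only meaningful when the vectors match.

```python
import numpy as np

# Illustrative only: units encoded as exponent vectors over [length, mass, time].
UNITS = {
    "v": np.array([1, 0, -1]),   # velocity: m / s
    "t": np.array([0, 0, 1]),    # time: s
    "x": np.array([1, 0, 0]),    # distance: m
    "c": np.array([0, 0, 0]),    # dimensionless constant
}

def units_of_product(a, b):
    # Multiplication adds exponent vectors: (m / s) * s -> m.
    return UNITS[a] + UNITS[b]

def addition_is_valid(a, b):
    # Addition or subtraction requires identical units on both operands.
    return np.array_equal(UNITS[a], UNITS[b])

print(units_of_product("v", "t"))   # [1 0 0], i.e. meters, so x = v * t is consistent
print(addition_is_valid("x", "t"))  # False: cannot add meters to seconds
```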
Overview and Methods
The paper introduces Φ-SO, a comprehensive framework that leverages deep reinforcement learning to identify symbolic expressions. A unique aspect of this work is its emphasis on dimensional analysis, where the consistency of physical units is enforced by construction, mitigating the risk of producing non-physical equations. This is achieved through the integration of a Physical Units Prior, which serves to constrain the search space of the symbolic expressions.
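One plausible way to picture such a prior (an assumption of this summary, not the paper's actual interface) is as a 0/1 mask over candidate tokens, reusing the exponent-vector representation of units from the earlier sketch: tokens whose units would violate dimensional consistency at the current position receive zero probability before the network's distribution is renormalized.

```python
import numpy as np

# Hypothetical units prior: the token names and the `required_units` bookkeeping
# are illustrative, not the paper's API.
TOKEN_UNITS = {
    "v": np.array([1, 0, -1]),   # m / s
    "t": np.array([0, 0, 1]),    # s
    "x": np.array([1, 0, 0]),    # m
    "c": np.array([0, 0, 0]),    # dimensionless constant
}

def units_prior(required_units, tokens=TOKEN_UNITS):
    """Return a 0/1 mask: 1 where a token's units match the units required here."""
    return np.array([float(np.array_equal(u, required_units)) for u in tokens.values()])

def apply_prior(logits, prior_mask):
    """Mask the network's token probabilities with the prior, then renormalize."""
    probs = np.exp(logits - logits.max())
    probs = probs * prior_mask
    return probs / probs.sum()

# If the current slot must carry units of length, only "x" survives the prior.
mask = units_prior(np.array([1, 0, 0]))
print(apply_prior(np.array([0.2, 1.3, 0.5, -0.1]), mask))
```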
The methodology involves the use of a recurrent neural network (RNN) to generate sequences of mathematical symbols, akin to generating linguistic sequences in natural language processing. The RNN is made to respect the constraints imposed by the units of the variables, effectively pruning dimensionally infeasible expressions from the search space. By incorporating units into the learning process, the RNN naturally learns to prioritize relationships that conform to physical laws.
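As an illustration of this autoregressive generation loop, the hypothetical PyTorch sketch below samples one symbol token at a time from a GRU cell and applies a units mask to the logits before each draw; the accumulated log-probabilities are what a reinforcement-learning (policy-gradient) update would act on, with the reward given by how well the resulting expression fits the data. The class name, token vocabulary, and mask hook are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

TOKENS = ["add", "mul", "v", "t", "x", "const"]  # toy vocabulary of symbols

class ExpressionSampler(nn.Module):
    """Toy autoregressive sampler: an RNN emits a distribution over symbols per step."""
    def __init__(self, n_tokens=len(TOKENS), hidden=32):
        super().__init__()
        self.embed = nn.Embedding(n_tokens, hidden)
        self.cell = nn.GRUCell(hidden, hidden)
        self.head = nn.Linear(hidden, n_tokens)

    def sample(self, max_len, units_mask_fn):
        h = torch.zeros(1, self.cell.hidden_size)
        prev = torch.zeros(1, dtype=torch.long)      # start from token index 0
        tokens, log_probs = [], []
        for _ in range(max_len):
            h = self.cell(self.embed(prev), h)
            logits = self.head(h)
            mask = units_mask_fn(tokens)             # 1 = allowed, 0 = forbidden
            logits = logits.masked_fill(mask == 0, float("-inf"))
            dist = Categorical(logits=logits)
            prev = dist.sample()
            tokens.append(TOKENS[prev.item()])
            log_probs.append(dist.log_prob(prev))
        return tokens, torch.stack(log_probs).sum()

# Placeholder mask that allows everything; a real units prior would inspect the
# partial expression and forbid dimensionally inconsistent tokens.
allow_all = lambda partial: torch.ones(1, len(TOKENS))

sampler = ExpressionSampler()
expr, logp = sampler.sample(max_len=5, units_mask_fn=allow_all)
print(expr, logp.item())
```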
Results and Evaluation
The efficacy of the Φ-SO approach is benchmarked using equations from the Feynman Lectures on Physics and other standardized physics datasets. The results demonstrate state-of-the-art performance, even when the data are corrupted with noise at levels of up to 10%. This robustness to noise is a significant advantage over existing SR approaches, which often struggle to maintain accuracy under noisy conditions.
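For context, a noise test of this kind can be mimicked by perturbing the target values at a chosen fractional level and scoring a candidate expression against them; the snippet below is an illustrative sketch, not the paper's benchmark harness.

```python
import numpy as np

def add_noise(y, level, rng=np.random.default_rng(0)):
    """Add Gaussian noise whose standard deviation is `level` times the RMS of y."""
    return y + rng.normal(scale=level * np.sqrt(np.mean(y**2)), size=y.shape)

def r2_score(y_true, y_pred):
    """Coefficient of determination between observed targets and a prediction."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy check: targets generated by y = v * t, corrupted with 10% noise, then scored
# against the exact expression.
rng = np.random.default_rng(1)
v, t = rng.uniform(0, 10, 200), rng.uniform(0, 5, 200)
y_noisy = add_noise(v * t, level=0.10)
print(r2_score(y_noisy, v * t))
```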
An ablation study highlights the critical contribution of each component of the framework, emphasizing that both the units prior and the units-informed neural network are indispensable for achieving the observed performance gains.
Implications and Future Directions
The implications of this research are twofold. Practically, Φ-SO provides a powerful tool for the discovery of interpretable physical laws from empirical data, which could accelerate innovations in fields like astrophysics and cosmology. Theoretically, it represents a step forward in incorporating domain-specific knowledge into machine learning models, a trend increasingly seen as crucial for the responsible application of AI technologies.
Looking forward, the authors suggest that the integration of differential operators and extending the current system to manage more intricate forms of mathematical expressions could further enhance its applicability. This could potentially open new avenues for solving partial differential equations or identifying novel relationships in large, complex scientific datasets.
In conclusion, by systematically embedding the constraints of dimensional consistency into the symbolic regression process, this work not only demonstrates a method for refining the search for physical relationships but also underscores the value of domain-specific constraints in broader AI contexts.