
Symbolic Regression via Neural-Guided Genetic Programming Population Seeding (2111.00053v2)

Published 29 Oct 2021 in cs.NE, cs.AI, and cs.LG

Abstract: Symbolic regression is the process of identifying mathematical expressions that fit observed output from a black-box process. It is a discrete optimization problem generally believed to be NP-hard. Prior approaches to solving the problem include neural-guided search (e.g. using reinforcement learning) and genetic programming. In this work, we introduce a hybrid neural-guided/genetic programming approach to symbolic regression and other combinatorial optimization problems. We propose a neural-guided component used to seed the starting population of a random restart genetic programming component, gradually learning better starting populations. On a number of common benchmark tasks to recover underlying expressions from a dataset, our method recovers 65% more expressions than a recently published top-performing model using the same experimental setup. We demonstrate that running many genetic programming generations without interdependence on the neural-guided component performs better for symbolic regression than alternative formulations where the two are more strongly coupled. Finally, we introduce a new set of 22 symbolic regression benchmark problems with increased difficulty over existing benchmarks. Source code is provided at www.github.com/brendenpetersen/deep-symbolic-optimization.

Citations (71)

Summary

  • The paper introduces a hybrid method that merges neural-guided seeding with genetic programming for symbolic regression.
  • It recovers 65% more expressions than a recently published top-performing model under the same experimental setup.
  • The authors also introduce a new set of 22 harder benchmark problems and argue the approach applies to other AI-driven combinatorial optimization tasks.

Symbolic Regression via Neural-Guided Genetic Programming Population Seeding

Introduction

The paper presents an innovative approach to symbolic regression by merging neural-guided search with genetic programming (GP) to address the complex task of identifying mathematical expressions that approximate observed outputs. Symbolic regression, a widely recognized NP-hard problem, involves exploring the space of possible mathematical formulations. The authors propose a hybrid mechanism that integrates a neural-guided component for seeding the genetic programming population, gradually refining the starting points to improve the discovery of expressions.
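To make the search space concrete, symbolic regression is typically framed as a search over token sequences (e.g., in prefix notation) that are evaluated against observed data. The sketch below is illustrative only and not the authors' implementation; the token set and error metric are simplified assumptions.

```python
import math

# Hypothetical minimal setup: a candidate expression is a list of tokens in
# prefix (Polish) notation, e.g. ["add", "mul", "x", "x", "sin", "x"]
# represents x*x + sin(x).
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
UNARY = {"sin": math.sin, "cos": math.cos}

def evaluate(tokens, x):
    """Recursively evaluate a prefix-notation token list at input x."""
    def helper(i):
        tok = tokens[i]
        if tok in BINARY:
            left, i = helper(i + 1)
            right, i = helper(i)
            return BINARY[tok](left, right), i
        if tok in UNARY:
            arg, i = helper(i + 1)
            return UNARY[tok](arg), i
        return x, i + 1  # terminal symbol: the input variable x
    value, _ = helper(0)
    return value

def nmse(tokens, xs, ys):
    """Normalized mean squared error of a candidate against observed data."""
    errs = [(evaluate(tokens, x) - y) ** 2 for x, y in zip(xs, ys)]
    mean_y = sum(ys) / len(ys)
    var = sum((y - mean_y) ** 2 for y in ys) / len(ys)
    return (sum(errs) / len(errs)) / var

# Example: data generated by f(x) = x^2 + sin(x); the true expression
# evaluates with zero error.
xs = [i / 10 for i in range(1, 21)]
ys = [x * x + math.sin(x) for x in xs]
print(nmse(["add", "mul", "x", "x", "sin", "x"], xs, ys))  # → 0.0
```

The discrete search is then over such token sequences, which is what makes the problem combinatorial and, in general, NP-hard.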

Methodology Overview

The core contribution of this work is a dual-component system combining a sequence generator (neural network) and a genetic programming module.

  1. Sequence Generator: The neural network produces a batch of expressions that initialize the genetic programming module. This component gradually learns to produce better starting populations over time, improving performance on symbolic regression tasks.
  2. Genetic Programming: Utilizes a set of evolutionary operations (mutation, crossover, selection) to evolve expressions over several generations. Novel constraints ensure logical validity and prevent nonsensical expressions.
  3. Integration Mechanism: At each iteration, expressions generated by the neural network seed the initial population for GP. The GP iteratively refines these expressions, which are then used to improve neural network training, forming a cycle of mutual enhancement.

Results and Analysis

The paper reports substantial improvements in symbolic regression performance over existing methods, as demonstrated on benchmark datasets such as the Nguyen and R rational benchmark suites. Specifically, the approach recovers 65% more expressions than a previously leading algorithm under the same experimental setup.

Key empirical metrics include:

  • Recovery Rate: The hybrid method attains state-of-the-art recovery rates, solving a majority of benchmark problems.
  • Benchmark Performance: The new technique displays robust performance across a novel set of 22 symbolic regression problems with varying difficulty levels, outperforming other competitive approaches.
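Recovery rate in this literature typically means the fraction of independent runs that recover the ground-truth expression exactly (symbolic equivalence, not just low error). A minimal sketch of the metric, with string equality standing in for a real symbolic-equivalence check:

```python
def recovery_rate(run_results, is_exact_match):
    """Fraction of independent runs that recovered the target expression.

    run_results: best expression found by each independent run
    is_exact_match(expr) -> True if expr is symbolically equivalent
        to the ground-truth expression
    """
    hits = sum(1 for expr in run_results if is_exact_match(expr))
    return hits / len(run_results)

# Illustrative check: 3 of 4 runs recover the target expression.
runs = ["x**2 + sin(x)", "x**2 + sin(x)", "x**3", "x**2 + sin(x)"]
print(recovery_rate(runs, lambda e: e == "x**2 + sin(x)"))  # → 0.75
```

In practice a symbolic math library would be used for the equivalence test, since syntactically different expressions can be mathematically identical.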

Implications and Future Directions

The merging of neural-guided search with genetic programming introduces a potent methodology for tackling NP-hard optimization problems, opening avenues for significant advancements in AI-driven discovery tasks across scientific domains. The results suggest a promising direction for future research to explore deeper integrations of neural network insights into combinatorial search frameworks.

The paper's finding that GP performs best when run for many generations without tight coupling to the neural-guided component highlights potential applications in optimization problems beyond symbolic regression. Future research could explore extending this framework to incorporate alternative policy-gradient methods, potentially addressing off-policy issues and enhancing the robustness of the neural-guided component.

Moreover, this strategy could inform developments in other combinatorial optimization problems, including but not limited to, hyperparameter tuning and automated theorem proving, suggesting a broad applicability of the proposed hybrid method in artificial intelligence.
