LLM-ACES: Closed-Loop Discovery of Dynamical Systems with LLM-Guided Adaptive Search

Published 23 Jun 2026 in cs.LG, cs.AI, cs.CL, and math.DS | (2606.25039v1)

Abstract: Recovering governing Ordinary Differential Equations (ODEs) from data is a central challenge in modeling dynamical systems across scientific domains. Existing approaches cast discovery as a static inference problem over fixed datasets, assuming that the observed trajectories are sufficiently informative. However, dynamical systems evolve over large state spaces, and limited data can make multiple equations observationally indistinguishable, leading to identifiability gaps and the recovery of incorrect governing equations. To address this, we introduce LLM-ACES, or LLM-guided Active Closed-loop Equation Search, a closed-loop framework that jointly optimizes symbolic hypothesis construction and adaptive data acquisition. In LLM-ACES, an LLM proposes operator priors that partition the large search space into distinct regions, within which candidate equations are fit to the observed data. The disagreement among these candidates guides the acquisition of informative trajectories, creating a feedback loop that iteratively refines both the hypothesis space and the discovered dynamics. On 122 ODE systems spanning ODEBench and ODEBase, LLM-ACES achieves the lowest median NMSE, outperforming state-of-the-art baselines by several orders of magnitude while achieving a high symbolic accuracy of 46.2% and 52.4%, respectively. Our analysis further shows that LLM-ACES is sample-efficient, achieving better performance with one-tenth the data. Furthermore, LLM-ACES's feedback-driven data acquisition makes it robust to noise and recovers the correct symbolic structure, while baselines introduce spurious terms that fit the data locally but obscure the true governing relationships.


Authors (6)

Summary

The paper presents a novel closed-loop framework integrating LLM-guided operator priors, constrained symbolic regression, and predictive divergence for precise ODE discovery.
It achieves orders-of-magnitude lower NMSE and enhanced symbolic accuracy through iterative feedback, active data acquisition, and memory-based model refinement.
The approach significantly reduces data requirements and addresses identifiability gaps compared to static, one-shot regression methods.

Closed-Loop Dynamical System Discovery with LLM-ACES

Problem Setting: Data-Efficient and Identifiable Equation Discovery

Symbolic recovery of ODEs from empirical trajectory data underpins scientific progress across applied fields by yielding interpretable, mechanistic models. Existing approaches (e.g., SINDy, PySR, LLM-ODE) treat ODE discovery as a one-shot regression or static symbolic search, relying on fixed, passively acquired datasets and manually curated operator vocabularies. This paradigm introduces identifiability gaps: in many dynamical regimes, different candidate equations can fit observed trajectories equally well, resulting in incorrect or incomplete system identification. Such issues are exacerbated in the presence of nonlinearities, noise, or limited data coverage across state space.
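
To make the identifiability gap concrete, the sketch below compares two hypothetical one-dimensional models, dx/dt = x and dx/dt = x(1 - x). Neither model is taken from the paper, and the forward-Euler integrator, step size, and initial conditions are illustrative assumptions. Started near zero, the two rollouts are almost indistinguishable; started further from zero, they separate clearly, which is the kind of state-space coverage a passively collected dataset may lack.

```python
# Illustrative only: two hypothetical candidate models, not from the paper.
def euler_rollout(f, x0, dt=0.001, steps=1000):
    """Integrate dx/dt = f(x) with forward Euler and return the trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * f(xs[-1]))
    return xs


def linear(x):
    return x


def logistic(x):
    return x * (1.0 - x)


def max_gap(x0):
    """Largest pointwise difference between the two candidate rollouts."""
    a = euler_rollout(linear, x0)
    b = euler_rollout(logistic, x0)
    return max(abs(p - q) for p, q in zip(a, b))


gap_small = max_gap(1e-4)  # data confined to a region where both models agree
gap_large = max_gap(0.5)   # data from a region where the models separate
print(f"gap near zero: {gap_small:.2e}, gap from x0=0.5: {gap_large:.2e}")
```

With measurement noise larger than the first gap, data collected only near zero could not tell the two candidates apart, regardless of how many samples were taken there.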

LLM-ACES Framework: Iterative Hypothesis-Driven Adaptive Experimentation

The LLM-ACES framework addresses the above limitations via an integrated, closed-loop pipeline: symbolic hypothesis construction and active data acquisition are interleaved, allowing data and symbolic models to co-evolve. The core mechanism is as follows:

LLM-Guided Operator Priors: Instead of exhaustively searching the symbolic hypothesis space, LLM-ACES leverages LLMs (e.g., GPT-4o-mini, Qwen3-32B) to induce sets of operator-level priors conditioned on task metadata, past performance, and explicit memory buffers. Each prior restricts candidate equations to subspaces with curated unary and binary operators, constraining the search and focusing exploration on plausible dynamical regimes.
Constrained Symbolic Regression: Given the LLM-generated operator priors, symbolic regression (via PySR) fits candidate equations restricted to each prior. This yields a candidate hypothesis population spanning structurally distinct, but domain-informed, equation families.
Predictive Divergence-Driven Trajectory Acquisition: Active learning is driven by selecting new initial conditions that maximize the expected disagreement among candidate rollouts, measured as NMSE, so that queries target state-space regions where structural uncertainty is still unresolved. The oracle (a simulator or an experiment) is queried at these conditions, and the resulting trajectories are appended to both the training and validation sets.
Iterative Feedback and Memory: After each acquisition, candidates are re-evaluated, and the best- and worst-performing equations are written to a persistent memory buffer. This buffer provides positive and negative context to the LLM in the next round, driving both exploitation (reuse of successful operator combinations) and diversification (avoidance of spurious structures).
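
The acquisition step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the three candidate models, the pool of initial conditions, the forward-Euler rollout, and the pairwise normalized squared difference used as the disagreement score are all hypothetical stand-ins, since this summary does not give the exact scoring rule.

```python
from itertools import combinations


def rollout(f, x0, dt=0.01, steps=200):
    """Forward-Euler rollout of dx/dt = f(x); a stand-in for a real ODE solver."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * f(xs[-1]))
    return xs


def pairwise_divergence(trajs, eps=1e-12):
    """Mean over candidate pairs of squared difference, normalized by the
    average squared magnitude of the pair (an assumed disagreement score)."""
    scores = []
    for a, b in combinations(trajs, 2):
        num = sum((p - q) ** 2 for p, q in zip(a, b))
        den = 0.5 * sum(p * p + q * q for p, q in zip(a, b)) + eps
        scores.append(num / den)
    return sum(scores) / len(scores)


def score_initial_condition(candidates, x0):
    """Disagreement among all candidate rollouts started from x0."""
    return pairwise_divergence([rollout(f, x0) for f in candidates])


def select_initial_condition(candidates, pool):
    """Return the initial condition in the pool with the highest disagreement."""
    return max(pool, key=lambda x0: score_initial_condition(candidates, x0))


# Hypothetical candidates that agree near zero but differ elsewhere.
def cand_linear(x):
    return x


def cand_logistic(x):
    return x * (1.0 - x)


def cand_cubic(x):
    return x - x ** 3


candidates = [cand_linear, cand_logistic, cand_cubic]
pool = [0.01, 0.1, 0.5, 0.9]  # hypothetical pool of queryable initial conditions
best_x0 = select_initial_condition(candidates, pool)
print("query the oracle at x0 =", best_x0)
```

In a full loop, the trajectory returned by the oracle at `best_x0` would be added to the data, the candidates refit, and the scores recomputed on the next round.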

Experimental Results

Datasets/Benchmarks: LLM-ACES was evaluated on ODEBench (63 dynamical systems, primarily physical) and ODEBase (59 biologically motivated ODEs), with tasks covering 1D to 4D systems and a broad spectrum of nonlinearities.

Metrics: Three metrics were used: data fidelity (NMSE in the reconstruction, generalization, and out-of-distribution settings), symbolic accuracy (structural equivalence to the ground truth), and expression complexity (symbolic tree size).
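
For reference, one common definition of NMSE divides the sum of squared errors by the total variation of the target around its mean. The sketch below uses that definition as an assumption; the paper may normalize differently, and the function name `nmse` is hypothetical.

```python
def nmse(y_true, y_pred):
    """Normalized mean squared error: sum of squared errors divided by the
    sum of squared deviations of the target from its mean (assumed form)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    if den == 0.0:
        raise ValueError("target has zero variance; NMSE is undefined")
    return num / den


perfect = nmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # exact prediction
baseline = nmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicting the mean
```

Under this definition a perfect prediction scores 0 and a constant prediction at the target mean scores 1, so values many orders of magnitude below 1 indicate a near-exact fit.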

Key Outcomes:

On ODEBench, LLM-ACES achieves a median NMSE on the order of $10^{-17}$ (reconstruction), outperforming all baselines by several orders of magnitude. Symbolic accuracy is 46.2% (GPT-4o-mini) and 45.6% (Qwen3-32B), exceeding existing LLM and active regression methods.
On ODEBase, LLM-ACES achieves up to 52.4% symbolic accuracy (Qwen3-32B), with median NMSE $<10^{-14}$ in all evaluation regimes.
Robustness: LLM-ACES exhibits high stability against noise and irregular trajectory sampling, outperforming sparse/evolutionary regression and transformer-based discovery methods without increased expression complexity.
Sample efficiency: Performance saturates at one-tenth the data volume required by comparator baselines. Even when given 5–10x more samples, passive and active static-search methods do not achieve comparable results.

Ablation and Qualitative Analysis

Ablation studies confirm the critical role of both LLM-induced operator priors and predictive divergence in acquisition. Removing either leads to an increase in NMSE by several orders of magnitude and higher variance.
Diversity enforcement in operator priors is essential; otherwise, the LLM collapses to locally optimal, non-complementary sets, degrading performance.
Qualitative trajectory overlays demonstrate that, unlike passive methods that return oversimplified or locally overfit equations, LLM-ACES recovers structurally correct models even under limited or ambiguous initial data.
Memorization controls (variable anonymization) show that the method's performance persists even when memorization from LLM pretraining is eliminated, validating genuine discovery as opposed to recall.
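
As an illustration of the variable-anonymization control, the sketch below replaces domain-specific state names with generic symbols before an equation or task description would be shown to the LLM. The helper `anonymize`, the `x0, x1, ...` naming scheme, and the example expression are assumptions made for illustration; the summary does not describe how the paper performs the renaming.

```python
import re


def anonymize(expressions, state_names):
    """Replace each state name with a generic symbol x0, x1, ... using
    whole-word matching. Returns the rewritten expressions and the mapping."""
    mapping = {name: f"x{i}" for i, name in enumerate(state_names)}
    # Longest names first so the alternation never prefers a shorter prefix.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    rewritten = [pattern.sub(lambda m: mapping[m.group(1)], e) for e in expressions]
    return rewritten, mapping


# Hypothetical predator-prey style right-hand side.
exprs, mapping = anonymize(["prey * (1 - predator)"], ["prey", "predator"])
print(exprs[0])
```

Renaming removes surface cues, such as recognizable variable names, that could let a pretrained model recall a textbook system instead of inferring it from data.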

Theoretical and Practical Implications

LLM-ACES reframes ODE discovery as an interactive inference process in which the symbolic hypothesis space and data acquisition drive each other's refinement. This directly addresses the identifiability limits of classical symbolic regression and eases data-efficiency bottlenecks by linking uncertainty among candidate models to targeted experimentation. In practice, the framework suits domains with scarce or costly observations (e.g., systems biology, physics-driven engineering, climate modeling), provided that the system can be simulated or instrumented so that new trajectories can be queried.

On the theoretical front, LLM-ACES advances the formalism for integrating foundation models with active search for structured scientific discovery. The separation of hypothesis-space design (LLMs) from parameter optimization (regression) enables domain-agnostic generalization and extensibility.

Limitations and Future Directions

The current instantiation is specific to autonomous ODEs and presumes access to a query oracle. Extending it to PDEs, stochastic systems, or highly noisy systems would require adapting the data acquisition strategy (e.g., non-autonomous controllers, regularized solvers) and more robust failure detection for LLM-induced priors. The method's dependence on LLM prompting and on the performance of the regression backend should be studied systematically, including failure modes caused by biased operator priors.

Anticipated developments include:

Extension to hybrid discrete-continuous or PDE systems
Incorporation of experimental cost/constraint models into acquisition
End-to-end differentiable operator-pool search, possibly unifying LLM and regression loops

Conclusion

LLM-ACES establishes that active, model-informed experimental design, powered by LLM-guided symbolic hypothesis exploration, outperforms passive and static symbolic regression in both accuracy and efficiency for governing equation discovery from data. This paradigm supports the conception of AI-for-science systems that treat models not merely as retrospective explanations but as actionable agents for experimental planning and scientific reasoning, setting a strong foundation for further research in adaptive, interpretable scientific machine learning.

Reference: "LLM-ACES: Closed-Loop Discovery of Dynamical Systems with LLM-Guided Adaptive Search" (2606.25039)
