- The paper presents a novel closed-loop framework integrating LLM-guided operator priors, constrained symbolic regression, and predictive divergence for precise ODE discovery.
- It achieves orders-of-magnitude lower NMSE and enhanced symbolic accuracy through iterative feedback, active data acquisition, and memory-based model refinement.
- The approach significantly reduces data requirements and addresses identifiability gaps compared to static, one-shot regression methods.
Closed-Loop Dynamical System Discovery with LLM-ACES
Problem Setting: Data-Efficient and Identifiable Equation Discovery
Symbolic recovery of ODEs from empirical trajectory data underpins scientific progress across applied fields by yielding interpretable, mechanistic models. Existing approaches (e.g., SINDy, PySR, LLM-ODE) treat ODE discovery as a one-shot regression or static symbolic search, relying on fixed, passively acquired datasets and manually curated operator vocabularies. This paradigm introduces identifiability gaps: in many dynamical regimes, different candidate equations can fit observed trajectories equally well, resulting in incorrect or incomplete system identification. Such issues are exacerbated in the presence of nonlinearities, noise, or limited data coverage across state space.
LLM-ACES Framework: Iterative Hypothesis-Driven Adaptive Experimentation
The LLM-ACES framework addresses the above limitations via an integrated, closed-loop pipeline: symbolic hypothesis construction and active data acquisition are interleaved, allowing data and symbolic models to co-evolve. The core mechanism is as follows:
- LLM-Guided Operator Priors: Instead of exhaustively searching the symbolic hypothesis space, LLM-ACES leverages LLMs (e.g., GPT-4o-mini, Qwen3-32B) to induce sets of operator-level priors conditioned on task metadata, past performance, and explicit memory buffers. Each prior restricts candidate equations to subspaces with curated unary and binary operators, constraining the search and focusing exploration on plausible dynamical regimes.
- Constrained Symbolic Regression: Given the LLM-generated operator priors, symbolic regression (via PySR) fits candidate equations restricted to each prior. This yields a candidate hypothesis population spanning structurally distinct, but domain-informed, equation families.
- Predictive Divergence-Driven Trajectory Acquisition: The main engine for active learning is the selection of new initial conditions that maximize the expected disagreement (NMSE) among candidate rollouts, thereby targeting state-space regions with unresolved structural uncertainty. The oracle (simulator or experiment) is queried at these conditions; resulting trajectories are appended to both training and validation sets.
- Iterative Feedback and Memory: After each acquisition, candidates are re-evaluated, and the best- and worst-performing equations are written to a persistent memory buffer. This buffer provides positive and negative context to the LLM in the next round, driving both exploitation (reuse of successful operator combinations) and diversification (avoidance of spurious structures).
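The divergence-driven acquisition step can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`rollout`, `pairwise_divergence`, `select_initial_condition`), the fixed-step RK4 integrator, the symmetric normalization, and the two toy candidate equations are all assumptions made for illustration. The sketch rolls out each candidate from every initial condition in a pool and picks the one where the candidates disagree most.

```python
import numpy as np

def rollout(f, x0, dt=0.01, steps=200):
    """Integrate dx/dt = f(x) from x0 with fixed-step RK4 (assumed integrator)."""
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x)
    return np.stack(traj)

def pairwise_divergence(trajs, eps=1e-12):
    """Mean normalized squared error over all pairs of candidate rollouts.

    The symmetric normalization is an assumption; the source only states
    that disagreement is measured with NMSE.
    """
    scores = []
    for i in range(len(trajs)):
        for j in range(i + 1, len(trajs)):
            num = np.mean((trajs[i] - trajs[j]) ** 2)
            den = 0.5 * (np.mean(trajs[i] ** 2) + np.mean(trajs[j] ** 2)) + eps
            scores.append(num / den)
    return float(np.mean(scores))

def select_initial_condition(candidates, pool):
    """Return the pool entry that maximizes candidate disagreement."""
    scores = [
        pairwise_divergence([rollout(f, x0) for f in candidates])
        for x0 in pool
    ]
    return pool[int(np.argmax(scores))], scores

# Two hypothetical candidates that are nearly indistinguishable near x = 0.
candidates = [
    lambda x: -x,           # linear decay
    lambda x: -x - x ** 3,  # linear decay plus a cubic term
]
pool = [np.array([0.1]), np.array([1.0]), np.array([2.0])]
x0_next, scores = select_initial_condition(candidates, pool)
```

In this toy case the two candidates fit small-amplitude data almost equally well, so the selected initial condition is the one farthest from the origin, where the cubic term matters. This is the kind of identifiability gap that targeted acquisition is meant to close.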
Experimental Results
Datasets/Benchmarks: LLM-ACES was evaluated on ODEBench (63 dynamical systems, primarily physical) and ODEBase (59 biologically motivated ODEs), with tasks covering 1D to 4D systems and a broad spectrum of nonlinearities.
Metrics: Data fidelity (NMSE on reconstruction, generalization, and out-of-distribution), symbolic accuracy (structural equivalence to ground truth), and expression complexity (symbolic tree size) were used.
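As a reference point for the fidelity metric, the sketch below computes one common form of NMSE: mean squared error divided by the variance of the ground-truth signal. The exact normalization used in the paper is not stated here, so this definition and the function name `nmse` are assumptions.

```python
import numpy as np

def nmse(y_true, y_pred, eps=1e-12):
    """Mean squared error normalized by the variance of the ground truth.

    Assumed normalization: a value of 0 means a perfect fit, and a value
    near 1 means the prediction is no better than the ground-truth mean.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2) / (np.var(y_true) + eps))

y = np.array([0.0, 1.0, 2.0, 3.0])
perfect = nmse(y, y)                              # exact reconstruction
baseline = nmse(y, np.full_like(y, y.mean()))     # constant-mean predictor
```

Under this definition, the constant-mean predictor scores approximately 1, which gives a scale for reading the reported values.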
Key Outcomes:
- On ODEBench, LLM-ACES achieves a median reconstruction NMSE on the order of 10^-17, outperforming all baselines by several orders of magnitude. Symbolic accuracy is 46.2% (GPT-4o-mini) and 45.6% (Qwen3-32B), exceeding existing LLM and active regression methods.
- On ODEBase, LLM-ACES achieves up to 52.4% symbolic accuracy (Qwen3-32B), with median NMSE below 10^-14 in all evaluation regimes.
- Robustness: LLM-ACES exhibits high stability against noise and irregular trajectory sampling, outperforming sparse/evolutionary regression and transformer-based discovery methods without increased expression complexity.
- Sample efficiency: Performance saturates at one-tenth the data volume required by comparator baselines. Even when given 5–10x more samples, passive and active static-search methods do not achieve comparable results.
Ablation and Qualitative Analysis
- Ablation studies confirm that both the LLM-induced operator priors and the predictive-divergence acquisition criterion are critical. Removing either raises NMSE by several orders of magnitude and increases variance.
- Diversity enforcement in operator priors is essential; otherwise, the LLM collapses to locally optimal, non-complementary sets, degrading performance.
- Qualitative trajectory overlays demonstrate that, unlike passive methods that return oversimplified or locally overfit equations, LLM-ACES recovers structurally correct models even under limited or ambiguous initial data.
- Memorization controls (variable anonymization) show that performance persists when the LLM cannot rely on recalling named systems from pretraining, supporting genuine discovery rather than recall.
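A variable-anonymization control of the kind mentioned above could look like the following sketch. The helper `anonymize`, the placeholder scheme `x_0, x_1, ...`, and the example task string are hypothetical; the paper's actual procedure may differ.

```python
import re

def anonymize(text, variables):
    """Replace domain-specific variable names with generic placeholders.

    Longer names are matched first so that a name which is a prefix of
    another cannot shadow it; word boundaries prevent partial matches.
    """
    mapping = {name: f"x_{i}" for i, name in enumerate(variables)}
    ordered = sorted(variables, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(v) for v in ordered) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text), mapping

# Hypothetical task description with recognizable variable names.
task = "d(prey)/dt = a*prey - b*prey*predator"
anon_task, mapping = anonymize(task, ["prey", "predator"])
```

Stripping recognizable names removes cues such as "prey" and "predator" that could let a model retrieve a textbook equation instead of inferring the structure from data.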
Theoretical and Practical Implications
LLM-ACES reframes ODE discovery as an interactive inference process in which symbolic hypothesis spaces and data acquisition reciprocally drive model refinement. This approach directly addresses the identifiability limits of classical symbolic regression and eases data-efficiency bottlenecks by linking uncertainty among candidate models to targeted experimentation. Practically, the framework suits domains with scarce or costly observations (e.g., systems biology, physics-driven engineering, climate modeling), provided that a simulatable or instrumented system is available to query.
On the theoretical front, LLM-ACES advances the formalism for integrating foundation models with active search for structured scientific discovery. The separation of hypothesis-space design (LLMs) from parameter optimization (regression) enables domain-agnostic generalization and extensibility.
Limitations and Future Directions
The current instantiation is specific to autonomous ODEs and presumes access to a query oracle. Generalizing to PDEs or to stochastic or highly noisy systems requires adapting the data acquisition strategy (e.g., non-autonomous controllers, regularized solvers) and more robust failure detection for LLM-induced priors. The method's dependence on LLM prompting and on the performance of the regression backend should be studied systematically, including failure modes caused by bias in the operator priors.
Anticipated developments include:
- Extension to hybrid discrete-continuous or PDE systems
- Incorporation of experimental cost/constraint models into acquisition
- End-to-end differentiable operator-pool search, possibly unifying LLM and regression loops
Conclusion
LLM-ACES shows that active, model-informed experimental design, driven by LLM-guided exploration of symbolic hypotheses, outperforms passive and static symbolic regression in both accuracy and data efficiency when discovering governing equations from data. This paradigm supports AI-for-science systems that treat models not merely as retrospective explanations but as tools for experimental planning and scientific reasoning, and it provides a foundation for further research in adaptive, interpretable scientific machine learning.
Reference: "LLM-ACES: Closed-Loop Discovery of Dynamical Systems with LLM-Guided Adaptive Search" (2606.25039)