Interpretable Relational Inference with LLM-Guided Symbolic Dynamics Modeling

Published 14 Apr 2026 in cs.LG | (2604.12806v1)

Abstract: Inferring latent interaction structures from observed dynamics is a fundamental inverse problem in many-body interacting systems. Most neural approaches rely on black-box surrogates over trainable graphs, achieving accuracy at the expense of mechanistic interpretability. Symbolic regression offers explicit dynamical equations and stronger inductive biases, but typically assumes known topology and a fixed function library. We propose COSINE (Co-Optimization of Symbolic Interactions and Network Edges), a differentiable framework that jointly discovers interaction graphs and sparse symbolic dynamics. To overcome the limitations of fixed symbolic libraries, COSINE further incorporates an outer-loop LLM that adaptively prunes and expands the hypothesis space using feedback from the inner optimization loop. Experiments on synthetic systems and large-scale real-world epidemic data demonstrate robust structural recovery and compact, mechanism-aligned dynamical expressions. Code: https://anonymous.4open.science/r/COSINE-6D43.


Authors (4)

Summary

The paper introduces COSINE, a framework that jointly optimizes latent graph structure and symbolic dynamics through a differentiable inner-outer loop with LLM-guided symbolic library evolution.
COSINE employs sparsity-driven symbolic message passing and Gumbel-Softmax relaxation to achieve high AUC scores (≥0.99) and precise mechanism recovery on synthetic benchmarks.
The framework outperforms black-box neural models in data efficiency and scalability, and delivers mechanistically interpretable insights on both simulated systems and real-world epidemic data.

Interpretable Relational Inference via LLM-Guided Symbolic Dynamics: The COSINE Framework

Introduction

Latent interaction inference in many-body dynamical systems is a central machine learning problem with broad implications across physics, systems biology, neuroscience, and computational epidemiology. Existing neural approaches, such as Neural Relational Inference (NRI), typically parameterize the latent interaction graph and dynamics with black-box neural networks. While empirically effective, these models lack mechanistic interpretability and are prone to overfitting spurious associations in overparameterized regimes. Conversely, symbolic regression yields explicit, concise governing equations with mechanistic insight, but it generally presumes a known interaction topology and a fixed function library, which severely limits its applicability to real-world systems with latent structure and unknown interactions.

The paper "Interpretable Relational Inference with LLM-Guided Symbolic Dynamics Modeling" (2604.12806) addresses this gap by introducing COSINE (Co-Optimization of Symbolic Interactions and Network Edges), a framework that jointly discovers interaction graphs and mechanistic symbolic dynamics. COSINE's core contributions are: (1) differentiable, sparsity-driven symbolic message passing for joint structure and dynamics discovery; (2) closed-loop, LLM-guided evolution of the basis library, which decouples symbolic hypothesis generation from optimization-based selection; and (3) state-of-the-art results on synthetic and real-world systems, with strong mechanistic alignment.

Prior work in relational inference (NRI, GDP, RIVA) uses neural architectures that parameterize graph generation and system trajectories via message-passing GNNs. Although these models capture statistical dependencies, they treat the underlying mechanism as a black box, which limits interpretability, particularly in high-dimensional and heterogeneous settings. Symbolic regression methods (SINDy, ND2, LaSR) provide explicit equation discovery but typically rely on fixed, domain-specific operator libraries and known graphs, which hampers their generality and robustness. Recent work on LLM-guided symbolic regression (LaSR) uses LLMs as generative priors but does not provide an end-to-end differentiable framework for joint structure-dynamics learning under unknown graphs. COSINE integrates LLM-driven symbolic hypothesis evolution directly into an interpretable, differentiable co-optimization loop that scales efficiently without custom domain knowledge.

COSINE Framework

COSINE adopts an alternating minimization architecture with a two-tiered optimization loop:

Inner Loop: Co-optimizes the latent graph adjacency and the weights of basis functions for both the message and node update modules. The structure is parameterized using temperature-controlled Gumbel-Softmax relaxations for differentiable edge sampling and soft adjacency, permitting efficient gradient-based optimization over edge probabilities and sparsity-constrained regression weights.
Sparse Regression Message Passing: System dynamics are decomposed as explicit, interpretable sparse combinations of message-basis and update-basis functions. Aggregated messages encode neighbor effects, while update terms model local node dynamics and message integration, both governed by dynamically evolving basis libraries.
LLM-Guided Library Evolution: The outer loop uses an LLM as a symbolic editor. The LLM iteratively prunes or augments the basis function library, guided by training feedback (loss, sparsity, residuals) from the inner loop. This lets COSINE move beyond closed-world regression and fixed operator sets, because symbolic hypothesis proposal (by the LLM) and numerical selection (by optimization) are cleanly separated. Exploration in symbolic space continues only while validation loss improves, which naturally regularizes library growth.
Losses and Regularization: The training objective is a sum of Gaussian NLL (fit to trajectory data), KL-divergence regularization of the graph, and L1 sparsity regularization of the symbolic weights. The combination encourages both accurate prediction and robust, parsimonious mechanism identification.
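The inner-loop components above can be made concrete with a minimal NumPy sketch of a single forward pass. It assumes a two-class (no edge / edge) Gumbel-Softmax relaxation per ordered node pair, a hypothetical three-term message library and four-term update library, a forward-Euler step, a fixed-variance Gaussian NLL, and a Bernoulli edge prior for the KL term. None of the names (`gumbel_softmax_edges`, `message_basis`, `update_basis`, `predict_next`, `inner_loss`, `prior_edge`, `lam`) come from the paper. The sketch only evaluates the objective; actual training would need an autodiff framework (PyTorch, for example, provides `torch.nn.functional.gumbel_softmax`).

```python
import numpy as np

def gumbel_softmax_edges(logits, tau, rng):
    """Relaxed edge sample for every ordered pair (i, j).

    logits: (N, N, 2) unnormalized log-probabilities for [no edge, edge].
    Returns a soft adjacency A with A[i, j] in (0, 1) (j sends to i).
    """
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                       # Gumbel(0, 1) noise
    z = (logits + g) / tau                        # lower tau -> closer to one-hot
    z = np.exp(z - z.max(axis=-1, keepdims=True))
    y = z / z.sum(axis=-1, keepdims=True)         # softmax over {no edge, edge}
    A = y[..., 1].copy()
    np.fill_diagonal(A, 0.0)                      # assumption: no self-loops
    return A

def message_basis(x):
    """Hypothetical message library phi(x_i, x_j), shape (N, N, 3)."""
    xi, xj = x[:, None], x[None, :]               # receiver i, sender j
    return np.stack([xj - xi, xi * xj, np.sin(xj - xi)], axis=-1)

def update_basis(x, h):
    """Hypothetical update library psi(x_i, h_i), shape (N, 4)."""
    return np.stack([x, h, x * h, x * (1.0 - x)], axis=-1)

def predict_next(x, A, w_msg, w_upd, dt=0.1):
    m = message_basis(x) @ w_msg                  # (N, N) message on each edge j -> i
    h = (A * m).sum(axis=1)                       # aggregate incoming messages at i
    return x + dt * (update_basis(x, h) @ w_upd)  # forward-Euler step

def inner_loss(x_pred, x_true, q_edge, w_msg, w_upd,
               sigma=0.1, prior_edge=0.1, lam=1e-3):
    nll = 0.5 * np.sum((x_pred - x_true) ** 2) / sigma**2  # Gaussian NLL, constant dropped
    q = np.clip(q_edge, 1e-8, 1 - 1e-8)
    kl = np.sum(q * np.log(q / prior_edge)
                + (1 - q) * np.log((1 - q) / (1 - prior_edge)))  # KL(q || Bernoulli prior)
    l1 = lam * (np.abs(w_msg).sum() + np.abs(w_upd).sum())   # sparsity on symbolic weights
    return nll + kl + l1

rng = np.random.default_rng(0)
N = 5
logits = rng.normal(size=(N, N, 2))
A = gumbel_softmax_edges(logits, tau=0.5, rng=rng)
q_edge = np.exp(logits[..., 1]) / np.exp(logits).sum(axis=-1)  # edge probabilities
x_t = rng.uniform(0.0, 1.0, size=N)
w_msg = np.array([0.5, 0.0, 0.0])                 # e.g. pure diffusive coupling
w_upd = np.array([0.0, 1.0, 0.0, 0.0])
A_true = np.roll(np.eye(N), 1, axis=1)            # hypothetical ring graph for targets
x_true = predict_next(x_t, A_true, w_msg, w_upd)
x_next = predict_next(x_t, A, w_msg, w_upd)
loss = inner_loss(x_next, x_true, q_edge, w_msg, w_upd)
```

Because the dynamics are linear in `w_msg` and `w_upd` once the basis values are computed, the symbolic weights stay directly readable as equation coefficients, while the temperature `tau` controls how close the relaxed adjacency is to a hard 0/1 sample.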

Experimental Results

Synthetic Systems

COSINE is benchmarked across a spectrum of canonical networked dynamical processes (Michaelis-Menten, Diffusion, Springs, Kuramoto, Friedkin-Johnsen, Coupled Map Networks) instantiated on Erdős–Rényi, Barabási–Albert, and Watts–Strogatz graphs. Quantitative results indicate:

Relational Inference Accuracy: COSINE consistently achieves AUC scores $\geq$ 0.99 on synthetic systems, outperforming classical statistical baselines (GC: Granger causality, MI: mutual information, TE: transfer entropy) and contemporary neural baselines (NRI, GDP, RIVA), particularly under nonlinear and highly heterogeneous dynamics.
Mechanism Recovery: The top terms (by coefficient magnitude) in the discovered symbolic expressions consistently align with the known underlying dynamics. Notable examples are physically meaningful primitives such as sinusoidal coupling in Kuramoto and degree-normalized aggregation in Michaelis-Menten (MM); near-perfect primitive coverage scores indicate strong recovery of equation structure.
Efficiency and Scalability: COSINE requires less training time and GPU memory than black-box neural baselines. The sparse linear formulation and the message/update libraries decouple symbolic search from heavy gradient-based training, yielding favorable scaling up to $N=200$ nodes.
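The two evaluation ideas above, edge-recovery AUC and ranking discovered terms by coefficient magnitude, can be sketched as follows. This is an illustrative implementation under stated assumptions: AUC is computed with the Mann-Whitney formulation over off-diagonal entries, and `primitive_coverage` is a hypothetical definition (fraction of ground-truth primitives among the top-ranked terms), not necessarily the paper's metric. The toy adjacency, scores, and coefficients are invented.

```python
import numpy as np

def edge_auc(prob, A_true):
    """ROC AUC of edge scores vs. a binary ground-truth adjacency.

    Uses the Mann-Whitney formulation (ties count 1/2) on off-diagonal entries.
    """
    mask = ~np.eye(A_true.shape[0], dtype=bool)
    scores, labels = prob[mask], A_true[mask].astype(bool)
    pos, neg = scores[labels], scores[~labels]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def top_terms(names, coefs, k):
    """Names of the k terms with the largest |coefficient|."""
    order = np.argsort(-np.abs(coefs), kind="stable")
    return [names[i] for i in order[:k]]

def primitive_coverage(found, true_terms):
    """Hypothetical coverage score: fraction of true primitives among `found`."""
    return len(set(found) & set(true_terms)) / len(true_terms)

# Toy 3-node directed ring and made-up edge probabilities.
A_true = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
prob = np.array([[0.0, 0.9, 0.2], [0.1, 0.0, 0.8], [0.7, 0.3, 0.0]])
auc = edge_auc(prob, A_true)

# Made-up Kuramoto-style message coefficients.
names = ["sin(x_j - x_i)", "x_j - x_i", "x_i * x_j", "x_i"]
coefs = np.array([0.98, 0.03, -0.01, 0.0])
found = top_terms(names, coefs, k=1)
coverage = primitive_coverage(found, ["sin(x_j - x_i)"])
```

Here every true edge outscores every non-edge, so `auc` is 1.0, and the dominant term is the sinusoidal coupling. In practice `sklearn.metrics.roc_auc_score` gives the same AUC on the flattened off-diagonal entries.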

Real-World COVID-19 Epidemic Data

Applied to county-level COVID-19 incidence time series, COSINE infers regionally dependent interaction graphs that reflect actual mobility and population structure, and recovers mechanistically credible symbolic forms. For instance, multiplicative terms ($x_i x_j$) in the message modules robustly capture the mass-action assumption of epidemic processes, while region-specific update terms distinguish exogenous infection pressure (e.g., $x \cdot h$) from local nonlinearities (e.g., logistic growth, negative feedback). The inferred mechanisms provide actionable scientific insight into cross-region epidemic propagation regimes.
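Reading a mechanism off a sparse model amounts to printing its surviving terms. The hypothetical helper below renders sparse coefficients as an equation string. It assumes $h_i$ denotes the aggregated incoming message $\sum_j A_{ij} m_{ij}$, and the coefficients are invented to mirror the term types described above: a mass-action message $x_i x_j$, logistic growth written as $r x_i - (r/K) x_i^2$, and exogenous pressure $x_i h_i$. They are not fitted values from the paper.

```python
def format_expression(lhs, names, coefs, tol=1e-3, digits=2):
    """Render a sparse linear combination of basis terms as a readable equation.

    Terms with |coefficient| < tol are treated as pruned and omitted.
    """
    parts = []
    for name, c in zip(names, coefs):
        if abs(c) < tol:
            continue
        term = f"{abs(c):.{digits}f}*{name}"
        if not parts:
            parts.append(("-" if c < 0 else "") + term)
        else:
            parts.append(("- " if c < 0 else "+ ") + term)
    return f"{lhs} = " + (" ".join(parts) if parts else "0")

# Hypothetical libraries and coefficients, chosen only to mirror the term types above.
msg_names = ["x_j - x_i", "x_i*x_j", "sin(x_j - x_i)"]
upd_names = ["x_i", "x_i^2", "h_i", "x_i*h_i"]
message = format_expression("m_ij", msg_names, [0.0, 0.42, 0.0])
update = format_expression("dx_i/dt", upd_names, [0.25, -0.10, 0.0, 0.80])
print(message)  # m_ij = 0.42*x_i*x_j
print(update)   # dx_i/dt = 0.25*x_i - 0.10*x_i^2 + 0.80*x_i*h_i
```

Comparing such strings across regions is one simple way to see which regions are dominated by the $x_i h_i$ pressure term and which by local growth or feedback terms.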

LLM-Guided Evolution and Model Robustness

Ablation studies demonstrate that active LLM-guided symbolic library refinement is essential; fixed or threshold-pruned library strategies markedly degrade both structural recovery and mechanism discovery. Larger LLMs yield marginal gains only for highly nonlinear or compositionally rich systems, while for simple dynamics, even 8B-14B models suffice.
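The contrast between a fixed library and an evolving one can be illustrated with a toy, one-variable sketch of the outer loop. A deterministic stub (`stub_llm_propose`) stands in for the LLM by proposing every single-term prune or addition from a hypothetical primitive pool. The inner loop is plain least squares, and an edit is accepted only if it lowers validation MSE plus a small per-term penalty; otherwise the loop stops. The function names, primitive pool, penalty, and target function are all assumptions for illustration. A real LLM editor would read the loss, sparsity, and residual feedback and propose a few targeted edits rather than enumerating all of them.

```python
import numpy as np

# Hypothetical primitive pool the "LLM" may draw from.
PRIMITIVES = {
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "x^3": lambda x: x**3,
    "sin(x)": np.sin,
    "cos(x)": np.cos,
    "exp(x)": np.exp,
}

def design(library, x):
    return np.column_stack([PRIMITIVES[name](x) for name in library])

def inner_fit(library, x_tr, y_tr, x_va, y_va, penalty):
    """Least-squares fit on train; score = validation MSE + per-term penalty."""
    coefs, *_ = np.linalg.lstsq(design(library, x_tr), y_tr, rcond=None)
    val_mse = np.mean((design(library, x_va) @ coefs - y_va) ** 2)
    return val_mse + penalty * len(library), coefs

def stub_llm_propose(library):
    """Stand-in for the LLM editor: every single-term prune or addition."""
    edits = []
    if len(library) > 1:
        edits += [tuple(t for t in library if t != drop) for drop in library]
    edits += [library + (new,) for new in PRIMITIVES if new not in library]
    return edits

def evolve_library(library, data, penalty=1e-6, tol=1e-12, max_rounds=10):
    score, _ = inner_fit(library, *data, penalty)
    for _ in range(max_rounds):
        scored = [(inner_fit(lib, *data, penalty)[0], lib)
                  for lib in stub_llm_propose(library)]
        best_score, best_lib = min(scored, key=lambda s: s[0])
        if best_score >= score - tol:   # stop once the validation score stops improving
            break
        score, library = best_score, best_lib
    return library, score

x = np.linspace(-2.0, 2.0, 200)
y = 1.5 * np.sin(x) + 0.5 * x**2            # hypothetical ground truth
data = (x[::2], y[::2], x[1::2], y[1::2])   # train / validation split
lib, score = evolve_library(("x", "x^3"), data)
print(sorted(lib))  # ['sin(x)', 'x^2']
```

Starting from a library that lacks both true primitives, the loop first adds `x^2`, then `sin(x)`, and finally prunes the now-redundant `x` and `x^3`; keeping the starting library fixed would leave the even `x^2` component unexplained.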

Under extreme low-data conditions, COSINE maintains higher AUC and mechanism coverage than both classical and neural baselines. This enhanced data-efficiency is directly attributable to the introduced symbolic inductive biases and sparsity regularization.

Hyperparameter sweeps reveal a performance plateau over wide ranges of learning rate and sparsity weight, with sharp failures outside these ranges. Unlike neural surrogates, COSINE remains robust under strong regularization, reliably pruning redundant terms without sacrificing inference accuracy.
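Why strong L1 regularization prunes terms instead of merely shrinking them can be seen in a generic lasso example. The sketch below swaps in ISTA (proximal gradient with soft-thresholding), a standard L1 solver, and does not claim it is COSINE's optimizer. On a hypothetical library `[x, x^2, x^3]` where only `x` is needed, the irrelevant term is set exactly to zero, and once `lam` reaches `max|Phi^T y| / n` every coefficient is pruned.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, lam, n_iter=2000):
    """Proximal gradient (ISTA) for (1/2n)||Phi w - y||^2 + lam * ||w||_1."""
    n = Phi.shape[0]
    step = n / np.linalg.norm(Phi, 2) ** 2      # 1 / Lipschitz constant of the smooth term
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

x = np.linspace(-1.0, 1.0, 101)
Phi = np.column_stack([x, x**2, x**3])          # hypothetical library: x, x^2, x^3
y = 2.0 * x                                     # hypothetical ground truth uses only x
for lam in (1e-2, 1e-1, 1.0):
    w = ista(Phi, y, lam)
    print(lam, np.round(w, 3), np.count_nonzero(w))
```

Sweeping `lam` this way mirrors the hyperparameter sweeps described above: small values keep the true term with mild shrinkage, while values past the threshold remove everything, which is the kind of sharp failure mode outside a working range.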

Limitations

COSINE's expressivity is ultimately bounded by the capacity of sparse symbolic regression. In highly complex systems, LLM-guided library composition may require richer rational or compositional primitives, and the method remains vulnerable to non-identifiability of the governing equations under severe noise or data scarcity. The decoupled inner-outer optimization also means that convergence to the exact mechanism can be slow, especially when the necessary primitives are missing from the initial library.

Implications and Future Directions

COSINE's approach—fusing differentiable structure learning with LLM-guided symbolic hypothesis evolution—provides a scalable, interpretable pathway for mechanism-aware graph and dynamics modeling. Practically, this enables data-driven discovery of interpretable laws in complex systems even under severe uncertainty of graph structure and limited domain knowledge. In scientific applications, it presents a viable alternative to opaque graph neural models, opening possibilities for mechanistic hypothesis generation in biology, epidemiology, physics-informed machine learning, and beyond.

Future work could explore (i) richer hypothesis spaces via multi-objective or multi-agent LLM prompting, (ii) integration of domain knowledge priors, (iii) generalization to higher-order or temporal graphs, and (iv) formal uncertainty quantification in the discovered symbolic and graph components.

Conclusion

COSINE represents a substantive advance in interpretable relational inference and symbolic dynamics modeling (2604.12806). By jointly optimizing structure and mechanism in a differentiable outer-inner loop, with closed-loop LLM supervision for hypothesis evolution, COSINE demonstrates state-of-the-art performance on theory-driven and real-world benchmarks, sets a new paradigm for mechanistic discovery in latent interaction systems, and provides a scalable alternative to black-box neural modeling for scientific understanding.
