Differentiable Dynamical Core
- Differentiable dynamical cores are computational structures that implement system evolution using fully differentiable operations, enabling gradient-based optimization and parameter estimation.
- They integrate modules such as ODE/PDE solvers, attention mechanisms, and physics kernels to seamlessly combine algorithmic processes with neural and physics-based methods.
- Applications span atmospheric, ocean, and materials modeling while addressing practical challenges like computational cost, solver stability, and gradient flow management.
A differentiable dynamical core is a computational structure that implements the time evolution of a dynamical system—potentially including physical laws, algorithmic processes, or neural modules—using only operations amenable to automatic differentiation. This architectural paradigm enables gradients to propagate through forward simulation steps, numerical solvers, discrete branching, memory operations, and embedded submodels, thus facilitating end-to-end optimization, parameter estimation, and hybrid learning in complex dynamical environments (Hernández et al., 2019, Zhou, 2024, Meunier et al., 21 Nov 2025, Whittaker et al., 19 Dec 2025, Wu et al., 12 Dec 2025).
1. Formal Definition and Theoretical Basis
At its most abstract, a differentiable dynamical core (DDC) represents the evolution operator for a parametric state-space dynamical system,
$$x_{t+1} = f_\theta(x_t, u_t), \qquad y_t = g_\theta(x_t, u_t),$$
where $x_t$ is the dynamical state, $u_t$ is a known input or control, $y_t$ is the observed output, and $\theta$ collects all differentiable parameters. Crucially, $f_\theta$ and $g_\theta$ are constructed from primitives (arithmetic, function application, numerical integration, logic) that are all differentiable and hence included as nodes in an acyclic computational graph (Hernández et al., 2019).
This enables automatic differentiation (AD) frameworks (e.g., PyTorch, TensorFlow, JAX) to compute the gradients of any scalar loss—potentially a complex function of the terminal output, intermediate states, or hidden variables—with respect to all upstream parameters, model components, input variables, and even solver hyperparameters. The DDC is not limited to neural networks; any composable mix of algorithmic and physics-based modules can be included, provided their operations are differentiable (Hernández et al., 2019, Zhou, 2024).
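As a concrete illustration, the following sketch rolls a toy discrete-time DDC forward in JAX and differentiates a terminal loss with respect to $\theta$. The dynamics, shapes, and names (`step`, `rollout`, `loss`) are illustrative assumptions, not taken from the cited systems:

```python
import jax
import jax.numpy as jnp

def step(theta, x, u):
    # One evolution step x_{t+1} = f_theta(x_t, u_t): a linear map plus a
    # saturating nonlinearity, built only from AD-friendly primitives.
    A, b = theta
    return jnp.tanh(A @ x + b * u)

def rollout(theta, x0, us):
    # Compose T steps into one differentiable mapping with lax.scan.
    def body(x, u):
        x_next = step(theta, x, u)
        return x_next, x_next
    _, xs = jax.lax.scan(body, x0, us)
    return xs

def loss(theta, x0, us, target):
    # Scalar objective on the terminal state.
    xs = rollout(theta, x0, us)
    return jnp.sum((xs[-1] - target) ** 2)

theta = (0.1 * jnp.eye(3), jnp.ones(3))             # parameters theta = (A, b)
x0, us = jnp.zeros(3), jnp.linspace(0.0, 1.0, 20)
grads = jax.grad(loss)(theta, x0, us, jnp.ones(3))  # d loss / d (A, b)
```

Because `rollout` is itself a pure function of `theta`, the same pattern extends to losses on intermediate states or hidden variables.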
In theoretical stochastic settings, a differentiable dynamical core can also refer to a core (in the semigroup-theoretic sense) for the infinitesimal generator of a Markov process, typically the space of smooth, compactly supported functions, used to carry out all analytical arguments regarding invariance, ergodicity, and martingale problems (Holderrieth, 2019).
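For reference, in that usage a subspace $D \subseteq \operatorname{dom}(\mathcal{A})$ of the generator's domain is a core if the closure of the restriction of $\mathcal{A}$ to $D$ recovers the full generator (this is the standard operator-theoretic definition, stated here for orientation rather than quoted from the cited work):
$$\overline{\mathcal{A}|_{D}} = \mathcal{A}, \qquad \text{e.g. } D = C_c^{\infty}(\mathbb{R}^d),$$
so that invariance and martingale identities need only be verified on smooth, compactly supported test functions.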
2. Architectures and Module Composition
A differentiable dynamical core is built as a directed acyclic graph (DAG) in which every node either forwards state, applies a transformation, or routes signals. Standard modules include:
- ODE/PDE solvers: Forward Euler, higher-order Runge–Kutta, or adjoint-based neural ODE modules, all differentiable with respect to their arguments and parameters (Hernández et al., 2019, Meunier et al., 21 Nov 2025, Zhou, 2024).
- Attention mechanisms: Key-query-value attention, softmax-weighted memory aggregation, value gating (Hernández et al., 2019, Wu et al., 12 Dec 2025).
- Memory modules: End-to-end read/write to external memory matrices; differentiable interpolations via softmax-based address selection (Hernández et al., 2019).
- Algorithmic operators: Conditionals, loops, or discrete logic, expressed in a smoothly parameterized or otherwise differentiable form.
- Physics-based kernels: Finite-difference, finite-volume, or spectral operators (e.g., for diffusion, advection, Coriolis), implemented as convolutional or matrix kernels to permit gradient flow (Zhou, 2024, Wu et al., 12 Dec 2025).
- Learnable coordinate transformations: Parameterized (e.g., neural-network-based) coordinate systems with gradients obtained via AD, as in metric computations for terrain-following atmospheric models (Whittaker et al., 19 Dec 2025).
In all settings, forward execution assembles the computation graph, and the backward pass executes chain-rule differentiation across all submodules, propagating sensitivities through time-steps, solver interiors, attention/memory modules, and subgrid corrections.
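To make the composition concrete, here is a minimal hybrid sketch in JAX: a finite-difference diffusion kernel (physics module) and a small MLP correction (learned module) summed into one differentiable step. The 1-D periodic stencil, layer sizes, and names are assumptions for illustration, not details of the cited models:

```python
import jax
import jax.numpy as jnp

def diffusion_kernel(x, nu, dx):
    # Physics module: second-order central difference for nu * d2x/dx2
    # with periodic boundaries, built from differentiable primitives.
    lap = (jnp.roll(x, -1) - 2.0 * x + jnp.roll(x, 1)) / dx**2
    return nu * lap

def neural_correction(params, x):
    # Learned module: a tiny two-layer MLP acting on the whole state vector.
    W1, b1, W2, b2 = params
    h = jnp.tanh(x @ W1 + b1)
    return h @ W2 + b2

def step(params, nu, x, dt, dx):
    # One DAG node: route the state through both modules, sum tendencies.
    return x + dt * (diffusion_kernel(x, nu, dx) + neural_correction(params, x))

key = jax.random.PRNGKey(0)
params = (0.01 * jax.random.normal(key, (64, 32)), jnp.zeros(32),
          0.01 * jax.random.normal(key, (32, 64)), jnp.zeros(64))
x0 = jnp.sin(jnp.linspace(0.0, 2.0 * jnp.pi, 64, endpoint=False))
# One backward pass yields sensitivities to the physical coefficient nu
# and to every network weight alike.
g_nu = jax.grad(lambda nu: jnp.sum(step(params, nu, x0, 1e-3, 0.1) ** 2))(0.1)
```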
3. Differentiable Solvers: Numerical and Implementation Considerations
The integration of ODEs and PDEs is a key aspect of DDC design:
- Explicit solvers: E.g., single-step Runge–Kutta 4th order (RK4), implemented as a well-defined sequence of differentiable operations with respect to state and parameters (Hernández et al., 2019); a sketch follows this list.
- Implicit solvers: E.g., backward Euler with internal Newton–Raphson iteration, supporting full differentiation via either loop-unrolling or custom VJP definitions (Meunier et al., 21 Nov 2025).
- Adjoint methods: Enable memory-efficient backpropagation through long integration intervals by solving companion linearized systems backwards in time (Jeong et al., 2024).
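As referenced above, a minimal sketch of the explicit case: classical RK4 written as four differentiable stage evaluations, so sensitivities of the updated state follow by the chain rule. The damped-oscillator vector field and its parameter are illustrative assumptions:

```python
import jax
import jax.numpy as jnp

def rk4_step(f, theta, x, t, dt):
    # Classical RK4: four differentiable stage evaluations combined
    # linearly, so d(x_next)/d(theta) follows by the chain rule.
    k1 = f(theta, x, t)
    k2 = f(theta, x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(theta, x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(theta, x + dt * k3, t + dt)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def f(theta, x, t):
    # Damped oscillator; theta is a learnable damping coefficient.
    return jnp.array([x[1], -x[0] - theta * x[1]])

x0 = jnp.array([1.0, 0.0])
# Sensitivity of the updated state to the damping parameter:
dx_dtheta = jax.jacobian(lambda th: rk4_step(f, th, x0, 0.0, 0.01))(0.1)
```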
Automatic differentiation through these integrators requires:
- Graph purity: No in-place state updates; all variables passed through stateless, functionally pure interfaces (Meunier et al., 21 Nov 2025).
- Custom gradients: For singular points (e.g., square roots in turbulence closures), backward passes are regularized to avoid infinite or undefined derivatives (Meunier et al., 21 Nov 2025); a sketch follows this list.
- Boundary conditions and solvers: Boundary and linear algebra routines (Poisson/tridiagonal) are wrapped in differentiable primitives, with custom gradients if necessary (Meunier et al., 21 Nov 2025, Zhou, 2024).
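The custom-gradient point can be illustrated with JAX's `jax.custom_vjp`; the clamping constant below is an arbitrary assumption, not the regularization used in the cited closure:

```python
import jax
import jax.numpy as jnp

@jax.custom_vjp
def safe_sqrt(x):
    # Forward pass: ordinary square root.
    return jnp.sqrt(x)

def safe_sqrt_fwd(x):
    y = jnp.sqrt(x)
    return y, y  # save y as the residual for the backward pass

def safe_sqrt_bwd(y, g):
    # Exact derivative is g / (2 * sqrt(x)), which blows up at x = 0;
    # clamp the denominator so the backward pass stays finite.
    eps = 1e-6
    return (g / (2.0 * jnp.maximum(y, eps)),)

safe_sqrt.defvjp(safe_sqrt_fwd, safe_sqrt_bwd)

grad_at_zero = jax.grad(safe_sqrt)(0.0)  # finite, instead of inf
```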
Time integration can be realized via explicit or implicit schemes, strong-stability-preserving Runge–Kutta, or symplectic methods. All forward steps are designed such that the entire simulation, from initial condition to final state, is one large differentiable mapping.
4. Application Domains and Case Studies
Differentiable dynamical cores have been articulated and implemented across several domains:
Scientific and Geophysical Modeling
- Ocean modeling: End-to-end differentiable OGCMs (e.g., NeuralOGCM) with parameterized physics and neural subgrid correctors, enabling the learning of diffusion coefficients and robust gradient-based data assimilation (Wu et al., 12 Dec 2025, Meunier et al., 21 Nov 2025).
- Atmospheric modeling: Solvers with learnable, fully differentiable terrain-following coordinates (e.g., NEUVE), ensuring exact propagation of coordinate derivatives and reducing simulation error over topography (Whittaker et al., 19 Dec 2025).
- Shallow-water dynamics: High-order, non-oscillatory, mass-conserving dynamical cores (e.g., HOPE), with tensor-product polynomial and WENO spatial reconstructions, implemented end-to-end in autodiff environments (Zhou, 2024).
Statistical Mechanics and Molecular Modeling
- Generalized Langevin equation (GLE): Coarse-grained MD with non-Markovian memory parameterized via differentiable convolution filters, trained by matching velocity-autocorrelation functions through backpropagation (Jeong et al., 2024).
Materials Science
- Crystal dislocation dynamics: Time-dependent, fully differentiable core models for edge and screw dislocations, yielding integro-differential equations whose analytic dependence on core width and center is fully differentiable (Pellegrini, 2010).
Stochastic Processes and MCMC
- Piecewise-deterministic Markov processes (PDMPs): Differentiable dynamical cores as function space cores for the Markov generator, facilitating mathematical proofs of uniqueness, invariance, and martingale properties (Holderrieth, 2019).
5. Limitations, Practical Issues, and Extensions
Differentiable dynamical cores, while universally expressive, carry several practical challenges:
- Memory and computational cost: AD over time steps, solver iterations, and memory modules can incur large memory footprints; checkpointing, truncated backpropagation, and adjoint-state methods are used to mitigate these issues (Meunier et al., 21 Nov 2025, Jeong et al., 2024); a checkpointing sketch follows this list.
- Solver stability: Explicit solvers may face severe CFL constraints; stiff problems require differentiable implicit solvers, which can be costly (Hernández et al., 2019, Zhou, 2024).
- Vanishing/exploding gradients: Long differentiable chains can suffer degradation of signal; internal gating, skip connections, or architectural regularization are essential for deep time integrations (Hernández et al., 2019).
- Graph complexity and selective differentiation: For large models (e.g., global ocean or weather solvers), only critical variables and parameters may be differentiated to contain graph size and improve runtime (Meunier et al., 21 Nov 2025).
- Extension to hybrid ML–physics schemes: DDCs are compatible with hybrid models, where ML modules supply subgrid process corrections or parameterizations alongside the differentiated physics core (Wu et al., 12 Dec 2025, Zhou, 2024).
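As referenced above, a minimal sketch of the checkpointing remedy, using JAX's `jax.checkpoint` (a.k.a. `jax.remat`) inside a scanned rollout; the toy step function is an assumption:

```python
import jax
import jax.numpy as jnp

def step(theta, x):
    # Toy dynamical step; stands in for one full solver time step.
    return jnp.tanh(theta * x)

def rollout(theta, x0, n_steps=1000):
    # jax.checkpoint drops intermediate activations and recomputes them
    # during the backward pass, trading extra FLOPs for a memory footprint
    # that no longer grows with every stored intermediate state.
    ckpt_step = jax.checkpoint(step)
    def body(x, _):
        return ckpt_step(theta, x), None
    x_final, _ = jax.lax.scan(body, x0, None, length=n_steps)
    return x_final

grad_theta = jax.grad(lambda th: jnp.sum(rollout(th, jnp.ones(64))))(0.5)
```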
Extensions include model cores for non-Euclidean geometries, time-inhomogeneous dynamics, manifold-valued state spaces, and novel solver types (e.g., IMEX, adaptive mesh, or neural evolutionary operators) (Holderrieth, 2019, Zhou, 2024).
6. Impact and Research Directions
The differentiable dynamical core framework unifies explicit representation of dynamical evolution, end-to-end differentiable algorithmic reasoning, and integration of neural and physics-based modules within a common optimization and inference workflow. This supports new regimes of:
- Gradient-based data assimilation: Training initial states or parameters by minimizing forecast error directly through the simulation chain (Meunier et al., 21 Nov 2025, Wu et al., 12 Dec 2025); a minimal sketch follows this list.
- Parameter and structure learning: Enabling physical parameters, coordinate systems, or even discretization templates to be tuned via gradient descent (Wu et al., 12 Dec 2025, Whittaker et al., 19 Dec 2025).
- Scientific hybrid modeling: Allowing for seamless integration of ML subcomponents (attention, memory, closure models) tailored to specific unresolved physics within the fully differentiable pipeline (Wu et al., 12 Dec 2025, Zhou, 2024).
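A schematic of the data-assimilation pattern from the first item above, with stand-in dynamics and synthetic observations (the model, names, and step sizes are all illustrative assumptions):

```python
import jax
import jax.numpy as jnp

def forecast(x0, n_steps=50):
    # Roll a toy differentiable model forward from the initial state.
    def body(x, _):
        x_next = x + 0.01 * jnp.sin(x)   # stand-in dynamics
        return x_next, x_next
    _, traj = jax.lax.scan(body, x0, None, length=n_steps)
    return traj

def forecast_error(x0, observations):
    # 4D-Var-style objective: misfit of the whole trajectory to observations.
    return jnp.mean((forecast(x0) - observations) ** 2)

observations = forecast(jnp.full(8, 0.3))  # synthetic "truth"
x0 = jnp.zeros(8)
for _ in range(100):                       # plain gradient descent on x0
    x0 = x0 - 0.5 * jax.grad(forecast_error)(x0, observations)
```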
Recent works empirically demonstrate that systems equipped with differentiable dynamical cores exhibit improved stability, interpretability, and data efficiency, markedly outperforming purely data-driven baselines on long-range scientific forecasting and opening new research frontiers in hybrid physical–ML methodology (Wu et al., 12 Dec 2025, Meunier et al., 21 Nov 2025).