Neural Conservation Laws in Scientific ML

Updated 4 July 2026

Neural conservation laws are methods that embed conserved quantities into network architectures by leveraging symmetry, divergence-free parameterizations, and projection techniques.
They enable precise modeling of PDEs, hyperbolic systems, and gradient dynamics by enforcing conservation constraints directly in the learning process.
These techniques enhance model performance and interpretability while reducing data requirements and ensuring consistency with physical laws.

Neural conservation laws are methods that impose, exploit, or discover conserved quantities within neural models of physical systems and, in a distinct theoretical line, within the gradient-flow dynamics of neural-network training. In scientific machine learning, they target relations such as the continuity equation

$\partial_t \rho + \nabla \cdot \mu = 0$

or integral invariants such as

$Q(u(t))=\int_\Omega u(\mathbf x,t)\,d\mathbf x=\text{const},$

and they do so through divergence-free parameterizations, conservative flux formulations, projection operators, weak space–time constraints, or constrained output layers (Richter-Powell et al., 2022, Baez et al., 2024, Liu et al., 2023). In dynamical-system learning, they are closely tied to Noether-style symmetry arguments and exact invariant-preserving neural integrators (Müller, 2022). In data-driven discovery, they appear as learned scalar functions that remain constant along trajectories, often with explicit criteria for functional independence and symbolic recoverability (Ha et al., 2021, Zhu et al., 2023, Ray, 20 Mar 2026).

1. Definitions and conceptual scope

A conserved quantity in the PDE setting is typically an integral of a density or state field. The literature represented here uses several equivalent formulations. One is the continuity law for a density $\rho$ and flux $\mu$, written as $\partial_t\rho+\nabla\cdot\mu=0$, which guarantees conservation of $\int_V \rho\,dx$ on any control volume $V$ with zero net boundary flux (Liu et al., 2023). Another is the conservation of a scalar field’s spatial integral, as in

$c(t)=\int_X u(x,t)\,dx,$

which is taken to be constant in projection-based PINN constructions (Baez et al., 2024). For ODEs and Hamiltonian systems, a scalar observable $I(x)$ is conserved when its Lie derivative along the dynamics $\dot x=f(x)$ vanishes, equivalently $\nabla I(x)\cdot f(x)=0$, or, in Hamiltonian form, when its Poisson bracket with the Hamiltonian $H$ vanishes, $\{I,H\}=0$ (Zhu et al., 2023, Chen et al., 2024).

A separate usage arises in the analysis of learning dynamics. There, an integral of motion is a differentiable function $h(\theta)$ of the parameters that remains constant along the gradient flow $\dot\theta=-\nabla L(\theta)$ of the loss $L$,

$\frac{d}{dt}\,h(\theta(t))=0,$

which by the chain rule is equivalent to $\langle\nabla h(\theta),\nabla L(\theta)\rangle=0$ for all $\theta$ (Galley et al., 9 Jun 2026). This connects conservation laws to symmetries of the architecture or, more exceptionally, to symmetries induced by data augmentation under special losses (Kunin et al., 2020, Galley et al., 9 Jun 2026).

Taken together, the term “neural conservation laws” now denotes at least three technically distinct research programs: hard enforcement of physical constraints in neural surrogates; neural discovery of unknown invariants from data; and theoretical characterization of invariants in neural-network optimization. The shared premise is that conservation is not merely a regularizer, but a structural property that can be encoded, inferred, or characterized exactly.

2. Hard enforcement in neural PDE and operator models

A central strand of the literature replaces soft penalties by constructions that guarantee conservation by design. In the divergence-free perspective of "Neural Conservation Laws" (Richter-Powell et al., 2022), the continuity equation is recast by introducing the augmented space–time field

$u=(\rho,\mu),$

so that $\partial_t\rho+\nabla\cdot\mu=0$ is equivalent to the space–time divergence condition $\nabla_{(t,\mathbf x)}\cdot u=0$. The paper gives two practical parameterizations of divergence-free fields via differential forms: a matrix-field construction using an antisymmetric matrix field $A=-A^\top$, with $u_i=\sum_j\partial_j A_{ij}$, and a vector-field construction using the antisymmetric part of a Jacobian, $A=J_b-J_b^\top$ for a vector field $b$. Both yield $\nabla_{(t,\mathbf x)}\cdot u=0$ exactly, and the paper proves universality up to a constant vector on the torus (Richter-Powell et al., 2022).
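The matrix-field construction can be checked numerically in the simplest case. The following is a minimal sketch, not the paper's implementation: it assumes one space dimension, a grid that is periodic in both $t$ and $x$ (an artificial choice made only so that the finite-difference stencils commute), and a hand-picked potential `a` in place of a neural network. In 1+1 dimensions an antisymmetric matrix field has a single free entry, so $\rho=\partial_x a$ and $\mu=-\partial_t a$.

```python
import numpy as np

def ddx_periodic(f, h, axis):
    """Central difference on a periodic grid along `axis`."""
    return (np.roll(f, -1, axis=axis) - np.roll(f, 1, axis=axis)) / (2.0 * h)

def continuity_pair_from_potential(a, dt, dx):
    """Build (rho, mu) from a scalar potential a(t, x).

    With A = [[0, a], [-a, 0]], the row-wise divergence gives
    rho = d a / dx and mu = -d a / dt.
    """
    rho = ddx_periodic(a, dx, axis=1)
    mu = -ddx_periodic(a, dt, axis=0)
    return rho, mu

def continuity_residual(rho, mu, dt, dx):
    """Discrete d rho / dt + d mu / dx, using the same stencils."""
    return ddx_periodic(rho, dt, axis=0) + ddx_periodic(mu, dx, axis=1)

nt, nx = 64, 96
t = np.linspace(0.0, 2.0 * np.pi, nt, endpoint=False)
x = np.linspace(0.0, 2.0 * np.pi, nx, endpoint=False)
T, X = np.meshgrid(t, x, indexing="ij")

# Arbitrary smooth potential standing in for a network output (assumption).
a = np.sin(X - 2.0 * T) + 0.3 * np.cos(3.0 * X + T)

rho, mu = continuity_pair_from_potential(a, t[1] - t[0], x[1] - x[0])
residual = continuity_residual(rho, mu, t[1] - t[0], x[1] - x[0])
print("max |continuity residual|:", np.max(np.abs(residual)))
```

The residual vanishes to rounding error for any choice of `a`, because the two mixed differences cancel identically; no loss term is involved.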

The same architectural principle appears in neural operators. "Harnessing the Power of Neural Operators with Automatically Encoded Conservation Laws" introduces conservation-law-encoded neural operators, or clawNOs, by appending a final layer that maps a skew-symmetric potential $\psi$, with $\psi_{ij}=-\psi_{ji}$, to a divergence-free field,

$u_i=\sum_j \partial_{x_j}\psi_{ij},$

thereby enforcing $\nabla\cdot u=0$ without additional PDE losses or collocation points (Liu et al., 2023). The construction is implemented spectrally in FNO-type models and by high-order finite-difference or moving-least-squares stencils in GNO-type models. The reported demonstrations include incompressible 2D Navier–Stokes, radial dam-break, global atmospheric gravity waves, and elastic constitutive modeling. In the small-data regime, the paper compares clawGFNO with GFNO, and clawFNO with FNO on radial dam-break, at fixed training-set sizes, and reports that clawFNO reduces error on atmospheric gravity waves (Liu et al., 2023).
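A spectral version of such a final layer can be sketched in a few lines. This is an illustration under stated assumptions, not the clawNO code: the domain is a 2D periodic square, the skew-symmetric potential reduces to one scalar `psi` (the off-diagonal entry), and `psi` is a hand-written smooth field rather than the output of a neural operator. The function names are hypothetical.

```python
import numpy as np

def wavenumbers(n, length):
    """Angular wavenumber grids (kx, ky) for an n-by-n periodic grid."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    return np.meshgrid(k, k, indexing="ij")

def velocity_from_potential(psi, length=2.0 * np.pi):
    """Map the scalar potential to u = (d psi / dy, -d psi / dx)."""
    kx, ky = wavenumbers(psi.shape[0], length)
    psi_hat = np.fft.fft2(psi)
    ux = np.real(np.fft.ifft2(1j * ky * psi_hat))
    uy = np.real(np.fft.ifft2(-1j * kx * psi_hat))
    return ux, uy

def spectral_divergence(ux, uy, length=2.0 * np.pi):
    """Divergence of (ux, uy) computed with the same spectral derivative."""
    kx, ky = wavenumbers(ux.shape[0], length)
    div_hat = 1j * kx * np.fft.fft2(ux) + 1j * ky * np.fft.fft2(uy)
    return np.real(np.fft.ifft2(div_hat))

n = 64
grid = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(grid, grid, indexing="ij")

# Smooth, band-limited stand-in for a learned potential (assumption).
psi = np.sin(X) * np.cos(2.0 * Y) + 0.5 * np.cos(3.0 * X + Y)

ux, uy = velocity_from_potential(psi)
div = spectral_divergence(ux, uy)
print("max |div u|:", np.max(np.abs(div)))
```

The sketch uses a band-limited `psi` so that the Nyquist mode carries no energy; a production layer would need to treat that mode explicitly.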

Projection methods provide a more modular route. "Guaranteeing Conservation Laws with Projection in Physics-Informed Neural Networks" constructs a projection operator $\mathcal P$ that maps any candidate field $u$ into the affine subspace $\{u:\int_X u(x,t)\,dx=c\}$ of fields carrying the prescribed conserved value (Baez et al., 2024). PINN-Proj replaces all appearances of the network output $u_\theta$ by $\mathcal P u_\theta$ in both the data-fit and PDE-residual terms, and the projection is linear and idempotent. The paper proves exact conservation up to numerical integration error and reports that, on 1D advection, viscous Burgers’, and KdV benchmarks with zero-flux boundaries, PINN-Proj reduces momentum error by three to four orders of magnitude relative to vanilla PINN and PINN-SC while slightly improving state-prediction error (Baez et al., 2024).
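One concrete operator with these properties is the $L^2$-orthogonal projection onto the fixed-integral subspace, which subtracts the integral mismatch spread uniformly over the domain. The sketch below assumes that form, a uniform grid, and rectangle-rule quadrature; whether it matches the paper's operator term by term is not confirmed here, and `project_to_integral` is a hypothetical name.

```python
import numpy as np

def project_to_integral(u, dx, c):
    """Shift each time slice so its rectangle-rule integral equals c.

    u has shape (..., n_x); the last axis is space on a uniform grid.
    """
    length = dx * u.shape[-1]
    integral = u.sum(axis=-1, keepdims=True) * dx
    return u - (integral - c) / length

n_x = 100
x = np.linspace(0.0, 1.0, n_x, endpoint=False)
dx = 1.0 / n_x

# Two time slices of an arbitrary candidate field (stand-in for a PINN output).
u = np.stack([np.sin(2.0 * np.pi * x) + 0.3, x**2])

c = 0.5
projected = project_to_integral(u, dx, c)
print("integrals after projection:", projected.sum(axis=-1) * dx)
```

Applying the function twice returns the same field, which is the idempotence property; the conserved value is matched up to the quadrature rule used inside the projection.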

A related wrapper strategy appears in the Exterior-Embedded Conservation Framework (ECF), which surrounds any neural operator with a conserved-quantity encoder and decoder (Dong et al., 20 Nov 2025). In the implementation described there, the encoder extracts the Fourier zero mode, and the decoder replaces the predicted zero mode by the input zero mode before the inverse transform, guaranteeing exact preservation of the mean. The theoretical argument uses orthogonality of Fourier modes to show that replacing the zero mode by the true conserved value cannot increase the $L^2$ error. Reported benchmarks include Allen–Cahn variants, adiabatic heat, shallow water, diffusion, and convection–diffusion, with reductions in relative conservation error and in RMSE, for example for UNO on AC–DW and for FNO on Diff (Dong et al., 20 Nov 2025).
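The zero-mode replacement and the error argument can be illustrated on synthetic 1D data. This is a minimal sketch: the "prediction" is a reference field plus noise and a bias rather than the output of a trained operator, and `conserve_mean` is a hypothetical name.

```python
import numpy as np

def conserve_mean(prediction, operator_input):
    """Overwrite the Fourier zero mode of the prediction with the input's."""
    pred_hat = np.fft.fft(prediction)
    pred_hat[0] = np.fft.fft(operator_input)[0]
    return np.real(np.fft.ifft(pred_hat))

rng = np.random.default_rng(0)
n = 128
x = np.linspace(0.0, 1.0, n, endpoint=False)

operator_input = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)
# Reference solution with the same mean as the input (mean is conserved).
truth = 1.0 + 0.2 * np.sin(2.0 * np.pi * x) + 0.1 * np.cos(4.0 * np.pi * x)
# Synthetic surrogate output: noisy and biased, so its mean has drifted.
prediction = truth + 0.05 * rng.normal(size=n) + 0.03

corrected = conserve_mean(prediction, operator_input)
err_raw = np.linalg.norm(prediction - truth)
err_corrected = np.linalg.norm(corrected - truth)
print("mean drift before:", prediction.mean() - operator_input.mean())
print("mean drift after: ", corrected.mean() - operator_input.mean())
print("L2 error raw / corrected:", err_raw, err_corrected)
```

Because the zero mode is orthogonal to all other modes, removing the mean error can only shrink the squared $L^2$ error, which is what the two printed errors show.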

Linear conservation laws can also be enforced directly in the output layer. In the climate-emulation setting, one paper considers linear constraints coupling the network inputs $x$ and outputs $y$, of the form

$C\begin{bmatrix}x\\ y\end{bmatrix}=0,$

for moist enthalpy, mass, longwave radiation, and shortwave radiation, and compares loss-based and architecture-based enforcement (Beucler et al., 2019). The architecture-constrained network predicts a subset of the outputs and solves for the remaining ones, so that the full output satisfies the linear system exactly to numerical precision. In the SP-CAM cloud-process emulator, the mean penalty of this architecture-constrained model is reported on both the +0 K and +4 K validation climates, whereas soft penalties allow a tunable trade-off between conservation and MSE (Beucler et al., 2019).
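The "solve for a subset of outputs" step amounts to a small linear solve appended to the network. The sketch below assumes the constraint has the block form $C_x x + C_{\text{free}}\,y_{\text{free}} + C_{\text{dep}}\,y_{\text{dep}} = 0$ with an invertible square block $C_{\text{dep}}$; the matrix is random here, and `complete_outputs` is a hypothetical name rather than anything from the cited study.

```python
import numpy as np

def complete_outputs(C, x, y_free):
    """Return y = [y_free, y_dep] such that C @ [x, y] = 0.

    The last columns of C (one per constraint row) multiply the dependent
    outputs, which are obtained by a linear solve.
    """
    n_x, n_free = x.size, y_free.size
    C_x = C[:, :n_x]
    C_free = C[:, n_x:n_x + n_free]
    C_dep = C[:, n_x + n_free:]
    y_dep = np.linalg.solve(C_dep, -(C_x @ x + C_free @ y_free))
    return np.concatenate([y_free, y_dep])

rng = np.random.default_rng(0)
n_x, n_free, n_dep = 5, 4, 2

# Random constraint matrix; the shift keeps the dependent block well conditioned.
C = rng.normal(size=(n_dep, n_x + n_free + n_dep))
C[:, -n_dep:] += 3.0 * np.eye(n_dep)

x = rng.normal(size=n_x)
y_free = rng.normal(size=n_free)  # stand-in for unconstrained network outputs
y = complete_outputs(C, x, y_free)
residual = C @ np.concatenate([x, y])
print("constraint residual:", residual)
```

The free output dimension drops from `n_free + n_dep` to `n_free`, which is the expressivity cost that architecture-based enforcement trades for exactness.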

Projection can also be applied in parameter space rather than output space. The relaxation–projection extension of the Time-Evolving Natural Gradient (TENG) method first relaxes the target, then projects the updated parameters back to the invariant manifold on which the conserved quantity keeps its prescribed value (Shi et al., 21 Mar 2026). For inviscid Burgers, KdV, and the acoustic wave system, the reported invariant drift drops to machine-precision levels (Shi et al., 21 Mar 2026).

3. Conservative and entropy-consistent learning for hyperbolic conservation laws

For hyperbolic systems, conservation is often embedded at the discretization level rather than only at the continuous level. "Designing Neural Networks for Hyperbolic Conservation Laws" proposes the Conservative Form Network (CFN), which learns a numerical flux on a local stencil and advances cell averages by the finite-volume update

$\bar u_j^{\,n+1}=\bar u_j^{\,n}-\frac{\Delta t}{\Delta x}\left(\hat F_{j+1/2}-\hat F_{j-1/2}\right).$

Because conservation is built into the update, with interior fluxes cancelling in pairs when the update is summed over cells, total mass is exactly tracked modulo boundary fluxes (Chen et al., 2022). The paper argues that exact discrete conservation implies correct shock speeds by the Lax–Wendroff and Rankine–Hugoniot theory, and reports that CFN consistently captures the correct shock propagation speed without non-physical oscillations. On Burgers’ equation, the paper compares CFN with a non-conservative baseline in both solution error and shock-position error (Chen et al., 2022).
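The telescoping argument does not depend on what the flux function is, which is why a learned flux inherits conservation automatically. The sketch below makes that point with an untrained random MLP as the flux (an assumption for illustration; it is not a trained CFN and is not expected to produce a physically meaningful solution) on a periodic grid.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 8))
W2 = rng.normal(size=8)

def toy_flux(u_left, u_right):
    """Stand-in for a learned numerical flux: a tiny untrained MLP."""
    features = np.stack([u_left, u_right], axis=-1)
    return np.tanh(features @ W1) @ W2

def fv_step(u, dt, dx, flux=toy_flux):
    """One conservative finite-volume update on a periodic grid."""
    f_right = flux(u, np.roll(u, -1))   # flux at interface j+1/2
    f_left = np.roll(f_right, 1)        # flux at interface j-1/2
    return u - dt / dx * (f_right - f_left)

n = 100
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
u0 = 1.0 + 0.5 * np.sin(x)

u = u0.copy()
for _ in range(50):
    u = fv_step(u, dt=0.01 * dx, dx=dx)

mass_drift = abs(u.sum() - u0.sum()) * dx
print("total mass drift:", mass_drift)
```

Each interface flux is added to one cell and subtracted from its neighbor, so the drift stays at rounding level even though the flux weights are random.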

Entropy stability is addressed explicitly in the neural entropy-stable conservative flux-form neural network (NESCFN) (Liu et al., 2 Jul 2025). NESCFN learns an entropy-conservative flux, a wave-speed model, and a convex entropy $\eta$ represented by an input-convex neural network. The learned numerical flux has the Tadmor form

$\hat f_{j+1/2}=f^{\mathrm{EC}}_{j+1/2}-\tfrac12\,D_{j+1/2}\,(v_{j+1}-v_j),$

with entropy variables $v=\nabla\eta(u)$ and $D_{j+1/2}\succeq 0$, yielding a provably conservative and entropy-stable semidiscrete scheme for the learned law (Liu et al., 2 Jul 2025). The reported experiments cover Burgers, shallow water, Euler shock-tube and Shu–Osher problems, and 2D Burgers. Long-time conservation error and a uniform bound on the discrete entropy remainder are reported across tests (Liu et al., 2 Jul 2025).
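For a law where every ingredient is known in closed form, the Tadmor structure can be written down directly. The sketch below uses Burgers' equation with the quadratic entropy $\eta=u^2/2$ (so $v=u$), the classical entropy-conservative flux $(u_L^2+u_Lu_R+u_R^2)/6$, and a local wave-speed bound as the dissipation coefficient. Nothing is learned here; the fixed time step and forward-Euler integrator are assumptions of the sketch.

```python
import numpy as np

def entropy_stable_flux(u_left, u_right):
    """Entropy-conservative Burgers flux plus nonnegative dissipation."""
    f_ec = (u_left**2 + u_left * u_right + u_right**2) / 6.0
    d = np.maximum(np.abs(u_left), np.abs(u_right))  # D >= 0
    return f_ec - 0.5 * d * (u_right - u_left)

def evolve(u, dx, dt, n_steps):
    """Forward-Euler flux-form updates on a periodic grid."""
    for _ in range(n_steps):
        f_right = entropy_stable_flux(u, np.roll(u, -1))
        u = u - dt / dx * (f_right - np.roll(f_right, 1))
    return u

n = 200
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
u0 = 0.5 + np.sin(x)          # steepens into a shock around t = 1

dt = 0.1 * dx
u1 = evolve(u0, dx, dt, n_steps=int(2.0 / dt))

mass_drift = abs(u1.sum() - u0.sum()) * dx
entropy_change = 0.5 * ((u1**2).sum() - (u0**2).sum()) * dx
print("mass drift:", mass_drift)
print("change in total entropy:", entropy_change)
```

Mass is conserved to rounding error while total entropy decreases, most visibly once the shock has formed.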

A different response to discontinuities is to abandon strong-form residuals. Weak and Entropy PINNs (WE-PINNs) replace pointwise PDE residual minimization by flux balance over randomly sampled space–time control volumes,

$\oint_{\partial V}\left(u\,n_t+f(u)\cdot n_{\mathbf x}\right)dS=0,$

where $(n_t,n_{\mathbf x})$ is the outward space–time normal on the boundary of the control volume $V$, and combine this with entropy inequalities in integral form (Oubarka et al., 25 Mar 2026). The paper emphasizes that strong-form PINNs become structurally inconsistent near shocks because strong-form residuals diverge there, whereas the weak control-volume formulation remains well defined. It establishes an explicit convergence rate, described as the first for a mesh-free control-volume PINN formulation, via the Bouchut–Perthame framework (Oubarka et al., 25 Mar 2026).
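The difference between a weak flux balance and a pointwise residual is easy to see on a single shock. The sketch below evaluates the balance over one rectangular space–time control volume for Burgers' equation, using midpoint quadrature, for a step solution moving at the Rankine–Hugoniot speed 1/2 and for the same step moving at a wrong speed. The rectangle, the quadrature rule, and the function names are assumptions of this illustration; it is not the WE-PINN loss.

```python
import numpy as np

def burgers_flux(u):
    return 0.5 * u**2

def control_volume_residual(u, x0, x1, t0, t1, n=2000):
    """Mass change plus net outflow over [x0, x1] x [t0, t1] (midpoint rule)."""
    xm = x0 + (np.arange(n) + 0.5) * (x1 - x0) / n
    tm = t0 + (np.arange(n) + 0.5) * (t1 - t0) / n
    mass_change = (u(xm, t1).sum() - u(xm, t0).sum()) * (x1 - x0) / n
    outflow = burgers_flux(u(x1, tm)).sum() * (t1 - t0) / n
    inflow = burgers_flux(u(x0, tm)).sum() * (t1 - t0) / n
    return mass_change + outflow - inflow

def step_solution(speed):
    """u = 1 behind a discontinuity at x = speed * t, u = 0 ahead of it."""
    return lambda x, t: np.where(x < speed * t, 1.0, 0.0)

res_correct = control_volume_residual(step_solution(0.5), -1.0, 1.0, 0.0, 1.0)
res_wrong = control_volume_residual(step_solution(1.0), -1.0, 1.0, 0.0, 1.0)
print("balance residual, shock speed 0.5:", res_correct)
print("balance residual, shock speed 1.0:", res_wrong)
```

The integral balance is finite and informative for both candidates and singles out the correct shock speed, whereas a pointwise residual is undefined on the discontinuity itself.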

The literature also contains local neural solvers that target conservation-law PDEs without necessarily imposing a global hard constraint. Neural Networks with Local Converging Inputs (NNLCI) use two low-cost numerical approximations, on coarse and fine meshes, drawn from the local domain of dependence of a queried space–time point, and feed them to a local MLP for 2D Euler systems (Huang et al., 2022). On classical 2D Riemann problems, relative errors are reported, and the method is said to predict shocks, contacts, and smooth regions accurately despite smeared local input data (Huang et al., 2022).

These approaches differ in what they preserve: CFN preserves the discrete conservative form, NESCFN preserves conservation together with entropy dissipation, and WE-PINN enforces weak conservation and entropy admissibility at the level of sampled space–time volumes. This suggests that “conservation” in neural solvers is now understood not as a single constraint, but as a hierarchy ranging from integral balance to admissible weak-solution structure.

4. Symmetry, Noether theory, and conservation in neural dynamics

A separate but closely related line uses Noether theory to build neural models whose learned dynamics inherit exact invariants from enforced symmetries. "Exact conservation laws for neural network integrators of dynamical systems" modifies the Lagrangian neural network framework by introducing a symmetry-enforcing feature map before the MLP representation of the Lagrangian (Müller, 2022). If the inputs to the network are invariant under a continuous symmetry group, the learned Lagrangian is invariant by construction, and the associated Noether charge is exactly conserved in the learned dynamics. The paper demonstrates this for angular momentum in Newtonian and Schwarzschild orbits and for linear and angular momentum in a two-particle system in four dimensions, reporting that enforcing both translation and rotation invariance keeps both momenta close to their initial values at the final time, whereas the unconstrained model drifts (Müller, 2022).
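The "invariant by construction" step can be illustrated with a single particle and rotational symmetry. The sketch below is not the paper's feature map: it assumes the three scalar products of position and velocity as invariant features, and uses a `tanh` of a random linear combination as a stand-in for the MLP Lagrangian. All names are hypothetical.

```python
import numpy as np

def invariant_features(q, qdot):
    """Rotation-invariant scalars built from position and velocity."""
    return np.array([q @ q, q @ qdot, qdot @ qdot])

def toy_lagrangian(q, qdot, w):
    """Stand-in for an MLP that only ever sees the invariant features."""
    return np.tanh(invariant_features(q, qdot) @ w)

rng = np.random.default_rng(0)
w = rng.normal(size=3)
q = rng.normal(size=3)
qdot = rng.normal(size=3)

# Random orthogonal transform obtained from a QR factorization.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))

L_original = toy_lagrangian(q, qdot, w)
L_rotated = toy_lagrangian(R @ q, R @ qdot, w)
print("Lagrangian before / after rotation:", L_original, L_rotated)
```

Whatever function follows the feature map, the composite is invariant under the group, and Noether's theorem then ties that invariance to conservation of angular momentum in the Euler–Lagrange dynamics.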

Conservation laws also arise in the parameter dynamics of gradient descent itself. "Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics" studies architectural symmetries $\psi(\theta,\alpha)$ satisfying $L(\psi(\theta,\alpha))=L(\theta)$ for the loss $L$ and derives the orthogonality condition

$\left\langle\nabla L(\theta),\,\partial_\alpha\psi(\theta,\alpha)\big|_{\alpha=I}\right\rangle=0,$

together with Noether-style conserved quantities under gradient flow (Kunin et al., 2020). Translation symmetry yields conserved parameter sums, scale symmetry yields conserved squared norms on parameter subsets, and rescale symmetry yields conserved norm differences. The same paper shows that a finite learning rate breaks these laws, which it analyzes via a modified gradient flow, and validates the resulting integral formulas on VGG-16 trained on Tiny ImageNet (Kunin et al., 2020).
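The scale-symmetry case can be reproduced on a toy loss. The sketch below assumes the scale-invariant loss $L(\theta)=-a\cdot\theta/\lVert\theta\rVert$ (chosen for illustration, not taken from the paper) and plain gradient descent. Orthogonality $\langle\nabla L,\theta\rangle=0$ holds exactly, so each discrete step changes the squared norm by exactly $\eta^2\lVert\nabla L\rVert^2$: negligible for small $\eta$, visible for large $\eta$.

```python
import numpy as np

def grad_loss(theta, a):
    """Gradient of L(theta) = -(a . theta) / ||theta||."""
    r = np.linalg.norm(theta)
    return -(a / r - (a @ theta) * theta / r**3)

def run_gd(theta0, a, lr, n_steps):
    theta = theta0.copy()
    for _ in range(n_steps):
        theta = theta - lr * grad_loss(theta, a)
    return theta

a = np.array([1.0, 2.0, -1.0])
theta0 = np.array([1.0, 0.0, 0.0])

# Orthogonality implied by the scale symmetry L(c * theta) = L(theta).
orthogonality = theta0 @ grad_loss(theta0, a)

# One explicit step: the squared norm grows by exactly lr^2 * ||grad||^2.
lr = 0.5
g0 = grad_loss(theta0, a)
one_step_growth = np.sum((theta0 - lr * g0) ** 2) - np.sum(theta0**2)

drift_small_lr = np.sum(run_gd(theta0, a, 1e-3, 100) ** 2) - np.sum(theta0**2)
drift_large_lr = np.sum(run_gd(theta0, a, 0.5, 100) ** 2) - np.sum(theta0**2)
print("theta . grad L:", orthogonality)
print("norm^2 drift, lr=1e-3:", drift_small_lr)
print("norm^2 drift, lr=0.5: ", drift_large_lr)
```

The squared norm is an exact invariant only in the continuous-time limit; the discrete drift is monotone and second order in the learning rate.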

This framework has been extended to contemporary modules. "Conservation Laws for Modern Neural Architectures" gives a data-independent characterization of the conservation laws of several modern models under gradient flow (Tran et al., 16 Jun 2026). For one-hidden-layer feedforward networks with GELU or SiLU activations, every such conservation law is constant. For SwiGLU, the paper identifies the only independent conserved quantities explicitly. For attention without positional encoding, it gives the full set of invariants per head, while RoPE reduces the query-key invariants to independent blocks (Tran et al., 16 Jun 2026). For MoE models with dense softmax gating, each column sum of the gating matrix is preserved under gradient flow, and the same invariants persist for Top-$k$ and normalized sigmoid gating under the paper’s assumptions (Tran et al., 16 Jun 2026).

A complementary result concerns data symmetry. "Conservation Laws from Data Symmetry in Neural Networks" shows that for analytic non-polynomial margin-type losses, group-averaging generically introduces no genuinely new integrals of motion, because the span of per-data-point gradients is unchanged (Galley et al., 9 Jun 2026). In contrast, under MSE loss and tensorizable networks, data augmentation can enlarge symmetry sufficiently to yield new conserved subspaces, associated with an induced group action on a parameter block (Galley et al., 9 Jun 2026).

These works shift conservation laws from the physical output space to the training trajectory in parameter space. Their common claim is not that deep learning is Hamiltonian in any literal sense, but that architectural symmetry imposes exact orthogonality relations on gradients and therefore induces invariant combinations of parameters in the continuous-time limit.

5. Neural discovery of unknown conservation laws from data

Another major research direction uses neural networks to infer conservation laws when the invariants are unknown. ConservNet learns a scalar function of the state from grouped data in which all samples within a group share a hidden invariant (Ha et al., 2021). The network is trained with a noise-variance loss, which simultaneously drives outputs to be constant within groups and prevents collapse to a trivial constant through a spreading term on perturbed inputs (Ha et al., 2021). The paper reports that ConservNet recovers invariants on several synthetic systems, Lotka–Volterra, Kepler, and a real double-pendulum trajectory, and evaluates how well the correlation is maintained under additive Gaussian noise (Ha et al., 2021).
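The two roles of such a loss can be made concrete. The sketch below assumes the form "within-group variance plus $|Q-\text{variance on noise-perturbed inputs}|$", with a spreading constant `q` and noise scale `noise_scale`; this form, the parameter names, and the circle data are assumptions for illustration and may differ from the paper's exact definition. No network is trained; fixed candidate functions are scored instead.

```python
import numpy as np

def noise_variance_terms(f, groups, noise_scale=0.5, seed=0):
    """Mean within-group variance and mean variance on perturbed inputs."""
    rng = np.random.default_rng(seed)
    within, spread = [], []
    for x in groups:  # x has shape (n_samples, dim); one hidden invariant each
        within.append(np.var(f(x)))
        noisy = x + noise_scale * rng.normal(size=x.shape)
        spread.append(np.var(f(noisy)))
    return float(np.mean(within)), float(np.mean(spread))

def noise_variance_loss(f, groups, q=1.0, **kwargs):
    within, spread = noise_variance_terms(f, groups, **kwargs)
    return within + abs(q - spread)

# Grouped data: points on circles, so the hidden invariant is the radius.
rng = np.random.default_rng(1)
groups = []
for radius in (1.0, 2.0, 3.0):
    angle = rng.uniform(0.0, 2.0 * np.pi, size=200)
    groups.append(radius * np.stack([np.cos(angle), np.sin(angle)], axis=1))

true_invariant = lambda x: (x**2).sum(axis=1)
constant = lambda x: np.zeros(len(x))
first_coordinate = lambda x: x[:, 0]

within_true, _ = noise_variance_terms(true_invariant, groups)
within_coord, _ = noise_variance_terms(first_coordinate, groups)
print("within-group variance, true invariant:", within_true)
print("within-group variance, x-coordinate:  ", within_coord)
print("loss of the constant function:", noise_variance_loss(constant, groups))
```

The constant function has zero within-group variance but pays the full spreading penalty `q`, which is the mechanism that rules out the trivial solution.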

When multiple invariants are sought, functional independence becomes central. "AI Poincaré 2.0" represents each candidate invariant $I_i(x)$ by a separate neural network and minimizes a conservation loss penalizing $\nabla I_i(x)\cdot f(x)$ together with a pairwise orthogonality regularizer on $\nabla I_i$ and $\nabla I_j$ (Liu et al., 2022). The method then estimates differential rank by SVD of the Jacobian matrix whose rows are the gradients $\nabla I_i$, and can pass the learned quantities to symbolic recovery. The reported examples include the three-body problem, KdV, and the nonlinear Schrödinger equation, with numerical identification of the number of invariants for the three-body problem and for KdV under the chosen inductive bias (Liu et al., 2022).
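The rank test can be sketched on a system whose invariants are known. The example assumes an isotropic 2D harmonic oscillator with state $(x,y,p_x,p_y)$, closed-form candidate functions in place of trained networks, finite-difference gradients in place of automatic differentiation, and a hypothetical relative tolerance on the singular values.

```python
import numpy as np

def numerical_gradient(f, z, h=1e-5):
    """Central-difference gradient of a scalar function at state z."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = h
        grad[i] = (f(z + step) - f(z - step)) / (2.0 * h)
    return grad

def count_independent(candidates, z, rel_tol=1e-6):
    """Differential rank of the candidates at z via SVD of stacked gradients."""
    jac = np.stack([numerical_gradient(f, z) for f in candidates])
    s = np.linalg.svd(jac, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# State z = (x, y, px, py).
energy_x = lambda z: 0.5 * (z[0] ** 2 + z[2] ** 2)
energy_y = lambda z: 0.5 * (z[1] ** 2 + z[3] ** 2)
total_energy = lambda z: energy_x(z) + energy_y(z)      # dependent on the two above
angular_momentum = lambda z: z[0] * z[3] - z[1] * z[2]

z = np.array([0.7, -0.4, 0.3, 1.1])
rank_three_candidates = count_independent([energy_x, energy_y, total_energy], z)
rank_four_candidates = count_independent(
    [energy_x, energy_y, total_energy, angular_momentum], z
)
print("independent among 3 candidates:", rank_three_candidates)
print("independent among 4 candidates:", rank_four_candidates)
```

A redundant candidate such as the total energy adds a row to the Jacobian but no new singular direction, so the count reflects functional independence rather than the number of networks trained.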

Neural deflation sharpens the independence criterion by explicitly penalizing dependence on previously discovered gradients. In the Hamiltonian setting with known equations, the method minimizes a loss combining conservation, involution with the previously found invariants,

$\{I_k,I_j\}=0\quad\text{for } j<k,$

and an independence regularizer based on the norm of the projection of $\nabla I_k$ onto the orthogonal complement of the span of earlier gradients (Zhu et al., 2023). For the Toda chain, the paper reports the number of invariants found, identified by the index at which the validation loss jumps by orders of magnitude as further invariants are requested, and contrasts this with the FPUT and discrete sine–Gordon lattices, where the jump occurs at a different index (Zhu et al., 2023).

The trajectory-only extension replaces explicit equations by a learned Hamiltonian neural network. "Data-Driven Discovery of Conservation Laws from Trajectories via Neural Deflation" first trains an HNN from finite-difference estimates of trajectory derivatives, freezes the surrogate vector field, and then runs the deflation procedure on invariant networks (Chen et al., 2024). The paper reports correct invariant counts for 1D and 2D harmonic oscillators, Calogero–Moser, and Toda, but misses the momentum invariant in FPUT, recovering only one invariant instead of two (Chen et al., 2024). That limitation is documented explicitly and linked to finite-difference inaccuracies and architectural issues.

The most stringent anti-false-positive pipeline among the methods surveyed here is NGCG, a neural-symbolic method that decouples dynamics learning from invariant discovery (Ray, 20 Mar 2026). After a separate dynamics model is frozen, a multi-restart variance minimizer learns a near-constant scalar on trajectories; symbolic extraction is then performed by polynomial Lasso, log-basis Lasso, explicit PDE candidates, or PySR, and candidates are filtered by a strict constancy gate and diversity threshold (Ray, 20 Mar 2026). On nine benchmark systems, the paper reports DR, FDR, and F1 scores on all four systems with true conservation laws, while correctly outputting no law on all five systems without invariants. It is stated to be the only method that succeeds on Lotka–Volterra (Ray, 20 Mar 2026).
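A constancy gate of this kind can be sketched as a threshold on the relative spread of a candidate expression along a trajectory. The example below assumes a Lotka–Volterra system with hand-picked coefficients, an RK4 integrator in place of a learned dynamics model, the classical log-type invariant as the symbolic candidate, and a hypothetical threshold of `1e-4`; none of these values come from the paper.

```python
import numpy as np

ALPHA, BETA, DELTA, GAMMA = 1.0, 0.5, 0.2, 0.8  # illustrative coefficients

def lotka_volterra(z):
    x, y = z
    return np.array([ALPHA * x - BETA * x * y, DELTA * x * y - GAMMA * y])

def rk4_trajectory(f, z0, dt, n_steps):
    traj = np.empty((n_steps + 1, z0.size))
    traj[0] = z0
    for i in range(n_steps):
        z = traj[i]
        k1 = f(z)
        k2 = f(z + 0.5 * dt * k1)
        k3 = f(z + 0.5 * dt * k2)
        k4 = f(z + dt * k3)
        traj[i + 1] = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

def passes_constancy_gate(values, threshold=1e-4):
    """Accept a candidate only if its relative spread along the run is tiny."""
    return bool(np.std(values) / (abs(np.mean(values)) + 1e-12) < threshold)

traj = rk4_trajectory(lotka_volterra, np.array([2.0, 1.0]), dt=0.01, n_steps=2000)
x, y = traj[:, 0], traj[:, 1]

log_candidate = DELTA * x - GAMMA * np.log(x) + BETA * y - ALPHA * np.log(y)
sum_candidate = x + y  # not conserved

print("log-type candidate accepted:", passes_constancy_gate(log_candidate))
print("x + y accepted:             ", passes_constancy_gate(sum_candidate))
```

The log-type candidate also shows why a log-basis extraction stage matters: this invariant is not polynomial in the state, so a purely polynomial library cannot represent it.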

An earlier symmetry-extraction approach works indirectly through a trained autoencoder. "Interpretable Conservation Law Estimation by Deriving the Symmetries of Dynamics from Trained Deep Neural Networks" trains an autoencoder to flatten the manifold of time-series quadruples, samples linear transformations that preserve that manifold using Replica-Exchange Monte Carlo, and then invokes Noether’s theorem to recover explicit generators such as momentum or angular momentum (Mototake, 2019). An explicit generator is recovered in the constant-velocity example; in the Reynolds boid torus example, the recovered conserved generator is the angular momentum about the centroid (Mototake, 2019).

6. Limitations, trade-offs, and open directions

The literature repeatedly emphasizes that exact conservation is not free. In projection-based PINNs, every epoch incurs additional overhead for evaluating and backpropagating through projection integrals, and quadrature error controls how close the conserved quantity remains to machine precision (Baez et al., 2024). The method also assumes no net inflow or outflow; extending it to general boundary-flux constraints requires a different projection operator (Baez et al., 2024). Similar design dependence appears in ECF, where adapting the wrapper to momentum or energy requires a new encoder–decoder pair rather than a universal formula (Dong et al., 20 Nov 2025).

Architectural hard constraints can also restrict expressivity. In the climate-emulator study, architecture-based enforcement produces exact linear conservation but can incur a small MSE penalty relative to lightly penalized soft-constraint training, because the network’s free output dimension is reduced (Beucler et al., 2019). In clawNOs, the present implementation focuses on mass or volume conservation, while combining multiple simultaneous laws such as mass, momentum, and energy, or treating fully compressible flows with nontrivial density evolution, is identified as an open challenge (Liu et al., 2023).

For discontinuous PDEs, the main limitation of strong-form PINNs is structural: pointwise residual minimization is inconsistent near shocks. WE-PINNs are proposed specifically to address that limitation by moving to weak flux balances and entropy inequalities (Oubarka et al., 25 Mar 2026). This suggests that enforcing a conservation law in differential form is insufficient when the physically relevant solution concept is an entropy solution rather than a classical one.

Discovery methods face a different failure mode: spurious invariants or incomplete invariant sets. The data-driven neural-deflation paper explicitly documents a missed momentum law in FPUT and difficulty near Calogero–Moser collisions (Chen et al., 2024). NGCG addresses false positives through a strict constancy gate and diversity filter, but at the cost of a multi-stage pipeline rather than a single end-to-end network (Ray, 20 Mar 2026). A plausible implication is that identifiability and verification are now as central as expressive power in conservation-law discovery.

Finally, the training-dynamics literature shows that exact invariants of gradient flow need not survive practical optimization. Finite learning rates break symmetry-induced conservation laws through modified gradient flow (Kunin et al., 2020), although more recent work reports that, for several modern architectures, normalized conservation errors grow at a characterized rate under constant-step SGD and remain uniformly bounded under decaying step sizes (Tran et al., 16 Jun 2026). This continuous-to-discrete gap remains a recurring theme across the field: most hard-conservation constructions are exact only up to quadrature, discretization, floating-point, or optimizer effects.

Across these lines of work, the common direction is clear. Conservation laws are being treated less as auxiliary physics priors and more as objects that can determine architecture, define admissible hypothesis classes, structure optimization, and serve as the criterion for symbolic scientific discovery.