
Port-Hamiltonian Neural Networks

Updated 30 January 2026
  • Port-Hamiltonian Neural Networks are physics-informed models that embed energy functions, skew-symmetric interconnections, and dissipative structures to preserve passivity and stability.
  • They use constrained neural architectures (e.g., anti-symmetrized outputs and Cholesky factorizations) to guarantee physical properties like energy balance and positive semi-definiteness.
  • Empirical studies show pHNNs achieve superior long-term predictive accuracy and data efficiency over black-box models in nonlinear, high-dimensional dynamical systems.

Port-Hamiltonian Neural Networks (pHNNs) are a class of machine learning models that embed the geometric and energetic structure of port-Hamiltonian systems (PHS) into neural architectures. By construction, these networks preserve physical system properties such as passivity, stability, and energy balance—essential for reliable modeling, identification, and control of nonlinear, possibly high-dimensional, dynamical systems with inputs, dissipation, and interconnection. Leveraging the universal approximation capacity of neural networks within a structure-preserving framework, pHNNs achieve superior long-term predictive accuracy and physical consistency compared to black-box neural models, and they enable principled extensions to distributed, stochastic, and constrained (DAE) settings (Cherifi et al., 10 Jan 2025, Persio et al., 8 Sep 2025, Roth et al., 4 Feb 2025).

1. Mathematical Structure of Port-Hamiltonian Neural Networks

Port-Hamiltonian systems provide a general framework for modeling open, energy-based dynamical systems by specifying a state-dependent Hamiltonian (energy function), a skew-symmetric interconnection, and dissipative terms, together with port (input-output) maps:

$$\dot x = [J(x) - R(x)]\,\nabla_x H(x) + B(x)\,u, \qquad y = B(x)^\top \nabla_x H(x),$$

where

  • $x \in \mathbb{R}^n$: state,
  • $u \in \mathbb{R}^m$: input,
  • $y \in \mathbb{R}^m$: port output,
  • $H(x)$: Hamiltonian, continuously differentiable, bounded below,
  • $J(x) = -J(x)^\top$: skew-symmetric interconnection matrix,
  • $R(x) = R(x)^\top \succeq 0$: positive semi-definite dissipation matrix,
  • $B(x)$: input map.

A port-Hamiltonian neural network parameterizes the unknown functions (the Hamiltonian, $J$, $R$, and $B$) with neural networks whose architectures guarantee, by construction, the required skew-symmetry and positive semi-definiteness properties (Cherifi et al., 10 Jan 2025, Roth et al., 4 Feb 2025, Persio et al., 8 Sep 2025).

The parameterization is typically:

  • $J_\theta(x) = \mathrm{vtf}_{n,n}(\theta_J(x))^\top - \mathrm{vtf}_{n,n}(\theta_J(x))$, with $\theta_J$ a neural network,
  • $R_\theta(x) = \frac{1}{\sqrt n}\,\mathrm{vtf}_{n,n}(\theta_R(x))\,\mathrm{vtf}_{n,n}(\theta_R(x))^\top$,
  • $B_\theta(x) = \mathrm{vtf}_{n,m}(\theta_B(x))$,
  • $H_\theta(x) = \theta_H(x)$ (Cherifi et al., 10 Jan 2025),

where $\mathrm{vtf}_{p,q}$ denotes reshaping a network output vector into a $p \times q$ matrix.
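These parameterizations can be sketched in a few lines of NumPy. This is a minimal illustration, not the cited implementations: `vtf` is treated as a plain reshape, and random vectors stand in for the network outputs $\theta_J(x)$, $\theta_R(x)$.

```python
import numpy as np

def vtf(v, rows, cols):
    """vec-to-matrix: reshape a flat output vector into a rows x cols matrix."""
    return np.asarray(v).reshape(rows, cols)

def J_theta(theta_J_out, n):
    """Skew-symmetric interconnection: transpose minus the matrix itself."""
    M = vtf(theta_J_out, n, n)
    return M.T - M

def R_theta(theta_R_out, n):
    """Positive semi-definite dissipation: scaled matrix square A A^T."""
    A = vtf(theta_R_out, n, n)
    return (A @ A.T) / np.sqrt(n)

n = 3
rng = np.random.default_rng(0)
J = J_theta(rng.normal(size=n * n), n)   # stand-in for theta_J(x)
R = R_theta(rng.normal(size=n * n), n)   # stand-in for theta_R(x)

# Both structural constraints hold for ANY parameter values:
assert np.allclose(J, -J.T)
assert np.min(np.linalg.eigvalsh(R)) >= -1e-10
```

The point of the construction is visible in the final assertions: no training-time penalty is needed, because skew-symmetry and positive semi-definiteness hold identically over the whole parameter space.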

2. Structure-Preserving Neural Parameterizations

pHNNs enforce exact physical constraints in their architecture:

  • Skew-symmetry: $J(x)$ is formed as the difference between a neural-network output matrix and its transpose, so $J(x) = -J(x)^\top$.
  • Positive semi-definiteness: $R(x)$ uses a Cholesky-like or matrix-square parameterization, so $R(x) = A(x)A(x)^\top \succeq 0$ (Roth et al., 4 Feb 2025).
  • Port maps and Hamiltonians: parameterized as MLPs, KANs, or basis/ansatz expansions where prior knowledge is available.
  • Hamiltonians: can be constrained to be convex (e.g., via input-convex neural networks) or quadratic when dictated by prior physics (Roth et al., 4 Feb 2025, Cherifi et al., 10 Jan 2025).

Variants utilize Kolmogorov–Arnold Networks (KANs), which represent multivariate functions through sums and compositions of learnable univariate functions. While KANs offer a compact parameterization, they train more slowly, and in experiments MLP architectures outperform them in both accuracy and efficiency for structured pHNNs (Cherifi et al., 10 Jan 2025).

The table below summarizes typical architectural patterns:

| Term | Parameterization | Constraint guaranteed |
|------|------------------|-----------------------|
| $J(x)$ | anti-symmetrized NN output | skew-symmetry |
| $R(x)$ | $A(x)A(x)^\top$ (Cholesky/MLP) | positive semi-definiteness |
| $B(x)$ | MLP output reshaped to matrix | none (can be unconstrained) |
| $H(x)$ | MLP or convex NN | lower bound or convexity |

Incorporating priors through ansatz functions enables hybrid models with known linear/quadratic or constant blocks, increasing data efficiency and interpretability (Cherifi et al., 10 Jan 2025).

3. Training Methodologies and Loss Functions

The pHNN framework is generally trained by minimizing a loss that measures prediction error on state-derivatives and/or outputs over sampled trajectories:

$$\mathcal{L} = \frac{1}{|\mathcal{B}|} \sum_{(k,j)\in\mathcal{B}} \left[ \sum_i \frac{\bigl(\dot x_{k,i}(t_j) - \widehat{\dot x}_{k,i}(t_j)\bigr)^2}{\sigma_{\dot x_i}^2} + \gamma \sum_\ell \frac{\bigl(y_{k,\ell}(t_j) - \widehat{y}_{k,\ell}(t_j)\bigr)^2}{\sigma_{y_\ell}^2} \right]$$

with all gradients (including $\nabla_x H$) computed via automatic-differentiation frameworks (Cherifi et al., 10 Jan 2025). No additional penalty terms are needed to preserve structure; it is guaranteed by the architecture.
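The loss can be sketched in NumPy as follows. This is a hedged illustration, not the cited code: empirical per-channel variances stand in for the $\sigma^2$ normalizers, and the trajectory/time indices are flattened into one batch axis.

```python
import numpy as np

def phnn_loss(xdot_true, xdot_pred, y_true, y_pred, gamma=1.0):
    """Normalized squared error on state derivatives plus port outputs,
    averaged over the mini-batch. Shapes: (batch, n) and (batch, m)."""
    sig_x = np.var(xdot_true, axis=0) + 1e-8   # per-channel sigma^2 for x-dot
    sig_y = np.var(y_true, axis=0) + 1e-8      # per-channel sigma^2 for y
    state_term = np.mean(np.sum((xdot_true - xdot_pred) ** 2 / sig_x, axis=1))
    output_term = np.mean(np.sum((y_true - y_pred) ** 2 / sig_y, axis=1))
    return state_term + gamma * output_term

rng = np.random.default_rng(1)
xd, y = rng.normal(size=(64, 3)), rng.normal(size=(64, 1))
perfect = phnn_loss(xd, xd, y, y)       # exact predictions give zero loss
noisy = phnn_loss(xd, xd + 0.1, y, y)   # any mismatch increases the loss
```

In a real pHNN the predictions `xdot_pred` and `y_pred` would come from evaluating $(J_\theta - R_\theta)\nabla_x H_\theta + B_\theta u$ and $B_\theta^\top \nabla_x H_\theta$, with $\nabla_x H_\theta$ obtained by autograd.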

Mini-batch stochastic gradient descent with Adam or AdamW, learning-rate schedules, and automatic differentiation are standard; batch sizes of 128–256 and learning-rate annealing are typical.

Long-term rollouts and trajectory extrapolation are assessed for stability and physical consistency.

Output-error loss structures (e.g., SUBNET) provide robustness against measurement noise, using sub-sequences to efficiently approximate the simulation loss and jointly train state encoders (Moradi et al., 20 Feb 2025). Consistency analyses confirm that, under standard assumptions and for sufficiently expressive neural classes, the minimizer converges to the true system in probability as data volume increases.

4. Physical Structure, Stability, and Theoretical Guarantees

By construction, pHNNs embed the dissipation inequality for any parameter setting:

$$\frac{d}{dt}H(x) = -(\nabla H(x))^\top R(x)\,\nabla H(x) + y^\top u \leq y^\top u,$$

where equality holds only in the absence of dissipation. This ensures passivity—the system cannot create energy beyond that supplied through ports—guaranteeing physically plausible behavior during (and after) learning (Cherifi et al., 10 Jan 2025, Persio et al., 8 Sep 2025, Roth et al., 4 Feb 2025).
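The dissipation inequality can be checked numerically on a textbook example: a damped mass-spring system written in port-Hamiltonian form (the parameter values below are illustrative).

```python
import numpy as np

# Damped mass-spring in pH form: H = p^2/(2m) + k q^2 / 2, state x = (q, p)
m, k, c = 1.0, 2.0, 0.3
J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric interconnection
R = np.diag([0.0, c])                     # PSD dissipation (damper on p)
B = np.array([[0.0], [1.0]])              # force input enters on momentum

def grad_H(x):
    q, p = x
    return np.array([k * q, p / m])

def f(x, u):
    return (J - R) @ grad_H(x) + B @ u    # pH state equation x-dot

rng = np.random.default_rng(0)
violations = 0
for _ in range(100):
    x, u = rng.normal(size=2), rng.normal(size=(1,))
    dHdt = grad_H(x) @ f(x, u)            # chain rule: dH/dt = grad H . x-dot
    y = B.T @ grad_H(x)                   # port output
    if dHdt > float(y @ u) + 1e-10:
        violations += 1
print(violations)  # prints 0
```

Because $\nabla H^\top J \nabla H = 0$ for any skew-symmetric $J$ and $\nabla H^\top R \nabla H \geq 0$ for any PSD $R$, the supplied power $y^\top u$ bounds the energy rate at every sampled point, which is exactly what the loop confirms.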

Key mathematical properties include:

  • Lyapunov stability: the Hamiltonian acts as a Lyapunov function. If $R(x)$ is strictly positive definite except at equilibrium, global asymptotic stability follows (Roth et al., 4 Feb 2025).
  • Passivity and energy dissipation: enforced strictly via network architecture; the model cannot violate conservation or produce artifacts present in unconstrained black-box models.
  • Distributed stability: compositional architectures (block-diagonal $J$, $R$, $B$) preserve system passivity and facilitate scalable modeling of networked multi-physics systems. Passivity and finite $\mathcal{L}_2$-gain can be ensured at both the subsystem and network levels (Furieri et al., 2021, Zakwan et al., 2024).
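The compositional claim is easy to verify directly: block-diagonal stacking preserves both structural properties. A minimal NumPy check, with random subsystem matrices standing in for learned ones:

```python
import numpy as np

def block_diag(A, B):
    """Compose two subsystem matrices into one block-diagonal network matrix."""
    out = np.zeros((A.shape[0] + B.shape[0], A.shape[1] + B.shape[1]))
    out[:A.shape[0], :A.shape[1]] = A
    out[A.shape[0]:, A.shape[1]:] = B
    return out

rng = np.random.default_rng(0)

def skew(n):
    M = rng.normal(size=(n, n))
    return M - M.T                 # skew-symmetric subsystem J

def psd(n):
    A = rng.normal(size=(n, n))
    return A @ A.T                 # PSD subsystem R

J = block_diag(skew(2), skew(3))   # network-level interconnection
R = block_diag(psd(2), psd(3))     # network-level dissipation

assert np.allclose(J, -J.T)                     # still skew-symmetric
assert np.min(np.linalg.eigvalsh(R)) >= -1e-10  # still PSD
```

Coupling terms between subsystems would occupy the off-diagonal blocks of $J$; as long as the full matrix stays skew-symmetric, the composed system remains a valid port-Hamiltonian model.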

In the stochastic regime, enforcing the so-called coisotropy condition ($\nabla H(x)^\top \Sigma(x) = 0$) for learned diffusion blocks ensures that injected noise is energy-conserving in expectation, and the system remains passive on average (Persio et al., 8 Sep 2025).
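One simple way to satisfy the coisotropy condition (a sketch, not necessarily the construction used in the cited papers) is to project the columns of a raw diffusion matrix onto the orthogonal complement of $\nabla H$:

```python
import numpy as np

def coisotropic_projection(Sigma_raw, grad_H):
    """Project diffusion columns onto the tangent space of the H level set,
    so that grad_H^T Sigma = 0 and noise injects no energy in expectation."""
    g = grad_H / np.linalg.norm(grad_H)
    P = np.eye(len(g)) - np.outer(g, g)   # orthogonal projector onto g-perp
    return P @ Sigma_raw

rng = np.random.default_rng(0)
g = rng.normal(size=4)                          # stand-in for grad H(x)
Sigma = coisotropic_projection(rng.normal(size=(4, 2)), g)

assert np.allclose(g @ Sigma, 0.0)  # coisotropy condition holds
```

Geometrically, the projector removes the component of each noise direction along $\nabla H$, so the stochastic flow moves along the energy level set rather than across it.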

5. Applications, Empirical Results, and Advantages

pHNNs have demonstrated substantial empirical advantages:

  • Superior long-term prediction and drift-resistance: In mass-spring, levitated ball, and permanent-magnet synchronous motor systems, physics-informed pHNNs yield normalized MAE an order of magnitude below that of black-box MLPs for moderate or large datasets. Long-term rollouts remain stable; black-box models rapidly diverge (Cherifi et al., 10 Jan 2025).
  • Parameter recovery and interpretability: scatter plots of true vs. learned quantities (e.g., $\nabla H$, $R$) show near-linear correlation, especially when prior structure is enforced.
  • Data efficiency: partial priors (constant $J$, $B$; quadratic $H$) allow pHNNs to reach black-box accuracy with an order of magnitude fewer trajectories.
  • Robustness to noise: Accurate long-term extrapolation persists under significant measurement noise, with error proportional to noise amplitude (Moradi et al., 20 Feb 2025).
  • Extension to DAEs and constrained systems: Each component (including algebraic constraints) can be parameterized by neural networks, with training leveraging index reduction and differentiable solvers (Neary et al., 2024, Hagelaars et al., 23 Jan 2026).
  • Scalability and compositionality: Modular training of submodels—each a pHNN—enables re-use via block-diagonal composition to build large networks with preservation of physical properties (Neary et al., 2022, Rettberg et al., 2024, Otterdijk et al., 2024).
  • Numerical studies confirm robustness across a range of systems: nonlinear oscillators, power networks, robotic arms, stochastic systems with environmental noise, and high-dimensional disc-brake thermoelastic models (Persio et al., 8 Sep 2025, Rettberg et al., 2024).

6. Extensions: Distributed, Stochastic, and Initialization Strategies

Distributed pHNNs

  • Distributed control policies for networked systems can be built by endowing each agent with a local pHNN controller, interconnected via a sparse communication graph. Passivity and stability extend to the entire network irrespective of the individual controller parameterizations (Zakwan et al., 2024).

Stochastic pHNNs

  • Stochastic extensions add state-dependent diffusion through $\Sigma(x)\,dW_t$ while preserving the Dirac structure and passivity "in expectation". Coisotropy constraints force the noise to be tangent to Hamiltonian level sets, preventing unphysical energy injection. Energy-dissipation properties, Lyapunov stability, and expectation inequalities (via infinitesimal generators) support model robustness under stochasticity (Persio et al., 8 Sep 2025, Persio et al., 2024).

Initialization and Training Stability

  • Improved training reliability is achieved by initializing neural pH components with a linear port-Hamiltonian model estimated from data (using subspace or KYP methods). Neural network blocks are initialized to zero so the pHNN exactly reproduces the linear model at onset, then learns nonlinear corrections. This strategy reduces variance, accelerates convergence, and avoids poor local minima (Otterdijk et al., 27 Jan 2026).
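A minimal sketch of this zero-initialization idea (the class name and layer sizes are illustrative, and `A_lin` stands in for a linear pH model identified from data):

```python
import numpy as np

class ResidualPH:
    """Linear pH model plus a nonlinear correction initialized to zero.

    A_lin is assumed to come from a prior linear identification step.
    W2, the last layer of a small tanh network, starts at zero, so the
    model exactly reproduces the linear dynamics before training begins.
    """
    def __init__(self, A_lin, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        n = A_lin.shape[0]
        self.A = A_lin
        self.W1 = rng.normal(size=(hidden, n)) * 0.1
        self.W2 = np.zeros((n, hidden))        # zero init: no correction yet

    def __call__(self, x):
        return self.A @ x + self.W2 @ np.tanh(self.W1 @ x)

A_lin = np.array([[0.0, 1.0], [-2.0, -0.3]])   # illustrative linear dynamics
model = ResidualPH(A_lin)
x = np.array([0.5, -1.0])

assert np.allclose(model(x), A_lin @ x)  # matches linear model at init
```

As training updates `W2` away from zero, the network learns only the nonlinear residual on top of a physically sensible starting point, which is what reduces variance across runs and avoids poor local minima.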

7. Limitations and Open Problems

Despite their strong empirical and theoretical advantages, pHNNs have several limitations:

  • Training overhead: Enforcing structure, especially in high-dimensional systems or stochastic settings with coisotropy projection, increases computational burden.
  • Model flexibility: Restricting noise to be energy-conserving (coisotropy) may exclude some physical stochastic effects.
  • Identifiability and scalability: Learning in large-scale networks with many subsystems or partial observability remains challenging.
  • High-index DAE extension: Most neural DAE approaches assume index-1; higher-index constraints (e.g., in electrical grids) require further advances in index reduction and solver design (Neary et al., 2024).
  • Data requirements: pHNNs outpace black-box models with moderate data but do not outperform in the extreme small-data regime unless strong priors are available (Cherifi et al., 10 Jan 2025).

Further research is focused on Bayesian uncertainty quantification, extension to field-theoretic (PDE) port-Hamiltonian systems, incorporation of graph/topology learning, and combining pHNNs with reinforcement learning for autonomous control under uncertainty (Neary et al., 2024, Persio et al., 8 Sep 2025).
