Energy-Based Message Passing
- Energy-Based Message Passing is a framework that defines explicit energy functionals over graphs to guide message updates using principles from physics and convex optimization.
- It incorporates methodologies like Bethe free energy, Dirichlet energies, and reaction–diffusion dynamics to achieve robust inference and enhanced neural network performance.
- Practical implementations include neural MPNNs, diffusion-inspired Transformers, and convex free-energy approaches, demonstrating state-of-the-art results in molecular prediction, particle tagging, and graph clustering.
Energy-based message passing encompasses a family of algorithms and neural architectures in which the propagation of information across nodes or edges of a graph is governed by principles of energy minimization, energy functionals, or energy-weighted aggregation. This paradigm directly connects probabilistic inference, physical diffusion, and neural architectures by grounding message passing dynamics in formulations derived from energy minimization, convex analysis, or physical principles such as reaction-diffusion, Dirichlet energy, and potential energy surfaces.
1. Mathematical Foundations: Energy Functionals for Message Passing
Energy-based message passing algorithms define explicit energy functionals over graphs that govern the structure of messages and update rules. The foundational mathematical constructs include:
- Bethe Free Energy: For probabilistic graphical models, the Bethe free energy takes the form
$$F_{\mathrm{Bethe}} = \sum_a \sum_{x_a} b_a(x_a)\,\ln\frac{b_a(x_a)}{f_a(x_a)} \;-\; \sum_i (d_i - 1) \sum_{x_i} b_i(x_i)\,\ln b_i(x_i),$$
where the factor beliefs $b_a$ and node marginals $b_i$ are optimized subject to marginal consistency constraints, and $d_i$ denotes the number of factors incident to variable $i$ (Zhang et al., 2017). A numerical sketch appears at the end of this section.
- Dirichlet-type Energies: Used in graph neural networks (GNNs) and diffusion models, e.g.,
$$E(Z) = \tfrac{1}{2} \sum_{i,j} \hat{A}_{ij}\,\lVert z_i - z_j \rVert^2,$$
where $\hat{A}$ is typically a normalized adjacency or affinity matrix and $z_i$ is the feature vector of node $i$ (Wang et al., 2022, Wu et al., 13 Sep 2024). Nonlinear or higher-order extensions add cluster-separation terms, double-well potentials, or other regularizers. A numerical sketch appears at the end of this section.
- Potential Energy Surface Models: For molecular systems, nonlocal equivariant energy-based methods describe node and edge descriptors using atomic orbital bases and iterative updates ensuring invariance/equivariance under physical symmetries (Wu et al., 30 Sep 2024).
Physical analogs, such as electrical network minimum energy flows and Ewald summation, serve as domain-grounded energy formulations guiding message weighting and nonlocal aggregation (Cai et al., 2017, Kosmala et al., 2023).
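To make the Bethe functional concrete, the sketch below evaluates it for a tiny pairwise chain whose beliefs are set to the exact marginals; on a tree this recovers $-\ln Z$. The factor tables and variable layout are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def bethe_free_energy(factor_beliefs, factors, node_beliefs, degrees):
    """F_Bethe = sum_a sum_{x_a} b_a ln(b_a / f_a)
               - sum_i (d_i - 1) sum_{x_i} b_i ln b_i."""
    eps = 1e-12
    factor_term = sum(np.sum(b * np.log((b + eps) / (f + eps)))
                      for b, f in zip(factor_beliefs, factors))
    entropy_corr = sum((d - 1) * np.sum(b * np.log(b + eps))
                       for d, b in zip(degrees, node_beliefs))
    return factor_term - entropy_corr

# Toy pairwise chain x1 -- x2 -- x3 over binary variables (illustrative).
f12 = np.array([[2.0, 1.0], [1.0, 3.0]])   # factor on (x1, x2)
f23 = np.array([[1.0, 2.0], [2.0, 1.0]])   # factor on (x2, x3)

# Exact joint p(x1, x2, x3) proportional to f12(x1, x2) * f23(x2, x3).
joint = np.einsum('ab,bc->abc', f12, f23)
Z = joint.sum()
p = joint / Z

# Beliefs = exact marginals; on a tree the Bethe approximation is exact.
b12, b23 = p.sum(axis=2), p.sum(axis=0)
b1, b2, b3 = p.sum((1, 2)), p.sum((0, 2)), p.sum((0, 1))

F = bethe_free_energy([b12, b23], [f12, f23], [b1, b2, b3], degrees=[1, 2, 1])
print(F, -np.log(Z))   # coincide on a tree: F_Bethe = -ln Z
```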
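Similarly, the Dirichlet-type energy can be evaluated and decreased with a few lines of numpy; the graph, node features, and step size below are illustrative assumptions. The descent step is the same propagation-as-energy-descent view used by the architectures in Section 2.

```python
import numpy as np

def dirichlet_energy(A_hat, Z):
    """E(Z) = 0.5 * sum_ij A_hat[i, j] * ||z_i - z_j||^2."""
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return 0.5 * np.sum(A_hat * sq_dists)

def energy_descent_step(A_hat, Z, tau=0.1):
    """One explicit Euler step of gradient descent on E(Z).
    For symmetric A_hat, grad_{z_i} E = 2 * sum_j A_hat[i, j] * (z_i - z_j),
    i.e. the update smooths each node's features toward its neighbors."""
    grad = 2.0 * (Z * A_hat.sum(axis=1, keepdims=True) - A_hat @ Z)
    return Z - tau * grad

# Small symmetric toy affinity matrix and random node features (illustrative).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A / A.sum()
Z = np.random.default_rng(0).normal(size=(4, 3))

for _ in range(5):
    print(round(dirichlet_energy(A_hat, Z), 4))   # decreases monotonically
    Z = energy_descent_step(A_hat, Z)
```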
2. Algorithmic Realizations: Message Passing Architectures
Energy-based message passing is realized in diverse algorithmic settings:
- Neural MPNNs with Edge Updates: The model in "Neural Message Passing with Edge Updates for Predicting Properties of Molecules and Materials" uses edge states updated as
$$e_{vw}^{t+1} = g\!\left(W_e\,\big[\,h_v^{t} \,\Vert\, h_w^{t} \,\Vert\, e_{vw}^{t}\,\big]\right),$$
with $g$ the shifted soft-plus function, modulating information flow based on energy-like filter dynamics (Jørgensen et al., 2018); a minimal layer sketch follows this list.
- Energy-weighted Message Passing for IRC Safety: In jet tagging, energy-weighted messages take the form
$$h_i^{t+1} = \sum_{j \in \mathcal{N}(i)} \omega_j\, \phi\!\left(h_i^{t}, h_j^{t}\right), \qquad \omega_j = \frac{p_T^{j}}{\sum_{k \in \mathcal{N}(i)} p_T^{k}},$$
with normalized transverse-momentum weights $\omega_j$, achieving provable invariance under soft/collinear splitting and outperforming Energy Flow Networks (EFN) (Konar et al., 2021); a weighted-aggregation sketch follows this list.
- Allen–Cahn Message Passing (ACMP): Implements reaction–diffusion dynamics based on the Allen–Cahn equation; each graph layer corresponds to an Euler step of the ODE
$$\dot{x}_i(t) = \alpha \sum_{j \in \mathcal{N}(i)} a(x_i, x_j)\,(x_j - x_i) + \beta\, x_i \odot (1 - x_i \odot x_i),$$
which ensures a strictly positive lower bound on the Dirichlet energy, allowing deep, non-oversmoothing representation learning (Wang et al., 2022); a single Euler step is sketched after this list.
- Diffusion-inspired Transformers (DIFFormer): Propagation layers are explicit Euler steps of gradient descent on a global energy, e.g.,
$$z_i^{(k+1)} = (1 - \tau)\, z_i^{(k)} + \tau \sum_{j} \hat{S}^{(k)}_{ij}\, z_j^{(k)},$$
where the attention coupling $\hat{S}^{(k)}$ is derived from derivatives of the energy function (Wu et al., 13 Sep 2024). This framework unifies attention MPNNs, classical GNNs, and Transformers as energy descent steps; a minimal descent step is sketched after this list.
- Minimum-energy Weighted Message Passing for SBM Recovery: Optimal label reconstruction in stochastic block models is achieved by initializing leaf node messages with minimum energy flow values and linear weight propagation guided by eigenvectors of the transition kernel, with error bounded exponentially in the minimum energy (Cai et al., 2017).
- Convex Free-energy Message Passing for General Inference: BP-like algorithms minimize strictly convex energy functions and are guaranteed to converge globally for arbitrary graphs, with messages updated in sequential or parallel block coordinate descent (Hazan et al., 2012).
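A minimal sketch of a message-passing layer with explicit edge-state updates and the shifted soft-plus nonlinearity, corresponding to the first item above; the weight shapes, concatenation scheme, and sum aggregation are illustrative assumptions rather than the exact architecture of Jørgensen et al. (2018).

```python
import numpy as np

def ssp(x):
    """Shifted soft-plus nonlinearity: ln(0.5*exp(x) + 0.5), so ssp(0) = 0."""
    return np.log(0.5 * np.exp(x) + 0.5)

def edge_update_layer(h, e, edges, W_edge, W_msg):
    """One message-passing step with explicit edge-state updates (illustrative).

    h      : (num_nodes, d_h) node states
    e      : (num_edges, d_e) edge states
    edges  : list of (v, w) index pairs (message flows w -> v)
    W_edge : (2*d_h + d_e, d_e) edge-update weights
    W_msg  : (d_h + d_e, d_h)   message weights
    """
    # 1) Update each edge state from its two endpoints and its previous state.
    new_e = np.stack([ssp(np.concatenate([h[v], h[w], e[k]]) @ W_edge)
                      for k, (v, w) in enumerate(edges)])
    # 2) Aggregate edge-conditioned messages into the receiving node states.
    new_h = h.copy()
    for k, (v, w) in enumerate(edges):
        new_h[v] = new_h[v] + ssp(np.concatenate([h[w], new_e[k]]) @ W_msg)
    return new_h, new_e

# Toy usage with random weights (illustrative shapes only).
rng = np.random.default_rng(1)
h, e = rng.normal(size=(3, 4)), rng.normal(size=(2, 2))
edges = [(0, 1), (1, 2)]
W_edge, W_msg = 0.1 * rng.normal(size=(10, 2)), 0.1 * rng.normal(size=(6, 4))
h, e = edge_update_layer(h, e, edges, W_edge, W_msg)
```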
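For the energy-weighted IRC-safe messages above, the following sketch aggregates neighbor messages with normalized transverse-momentum weights and numerically checks invariance under a collinear split; the message function, angular features, and toy jet are illustrative assumptions.

```python
import numpy as np

def pt_weighted_aggregate(pt, feats, neighbors, msg_fn):
    """h_i = sum_{j in N(i)} w_j * msg_fn(feats_i, feats_j),
    with w_j = pt_j / sum_{k in N(i)} pt_k  (normalized p_T / energy weights)."""
    out = []
    for i, nbrs in enumerate(neighbors):
        w = pt[nbrs] / pt[nbrs].sum()
        msgs = np.stack([msg_fn(feats[i], feats[j]) for j in nbrs])
        out.append((w[:, None] * msgs).sum(axis=0))
    return np.stack(out)

# Illustrative message function acting only on angular (eta, phi) coordinates.
msg = lambda xi, xj: np.tanh(xj - xi)

# Three-particle toy jet: rows are (eta, phi); transverse momenta kept separate.
coords = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, -0.1]])
pt = np.array([50.0, 30.0, 20.0])
nbrs = [np.array([0, 1, 2])] * 3                 # fully connected toy graph
h = pt_weighted_aggregate(pt, coords, nbrs, msg)

# Collinear split: particle 2 becomes two collinear pieces with identical
# angular coordinates and p_T summing to the original value.
coords2 = np.vstack([coords, coords[2:3]])
pt2 = np.array([50.0, 30.0, 12.0, 8.0])
nbrs2 = [np.array([0, 1, 2, 3])] * 4
h2 = pt_weighted_aggregate(pt2, coords2, nbrs2, msg)
print(np.allclose(h, h2[:3]))                    # True: aggregation is IRC-safe
```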
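The Allen–Cahn dynamics can be stepped explicitly as below; the couplings here are fixed unit adjacency entries rather than ACMP's learned channel-wise attention $a(x_i, x_j)$, and the coefficients, graph, and step count are illustrative choices made so that the separating behavior is visible.

```python
import numpy as np

def acmp_euler_step(X, A, alpha=0.1, beta=1.0, dt=0.1):
    """x_i <- x_i + dt * [ alpha * sum_j A_ij (x_j - x_i)      (diffusion)
                           + beta * x_i * (1 - x_i**2) ]       (reaction)
    The double-well reaction term pushes node features toward +/-1,
    counteracting diffusion-driven smoothing in deep stacks."""
    diffusion = A @ X - A.sum(axis=1, keepdims=True) * X
    reaction = X * (1.0 - X ** 2)
    return X + dt * (alpha * diffusion + beta * reaction)

# Toy triangle graph with unit couplings and random initial features.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(3, 2))
for _ in range(200):
    X = acmp_euler_step(X, A)
print(np.round(X, 2))   # features separate in sign: phase separation, not oversmoothing
```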
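For the diffusion-inspired propagation above, the sketch performs Euler steps of the form $z^{(k+1)}_i = (1-\tau)\,z^{(k)}_i + \tau \sum_j S_{ij}\, z^{(k)}_j$ with an all-pair coupling computed from the current states; the scaled dot-product kernel used here stands in for DIFFormer's energy-derived coupling and is an illustrative assumption.

```python
import numpy as np

def row_softmax(M):
    M = M - M.max(axis=1, keepdims=True)
    E = np.exp(M)
    return E / E.sum(axis=1, keepdims=True)

def energy_descent_attention_step(Z, tau=0.5):
    """z_i <- (1 - tau) * z_i + tau * sum_j S_ij z_j, one explicit Euler step.
    In DIFFormer the coupling S is derived from derivatives of a global energy;
    here a scaled dot-product softmax over all node pairs stands in for it."""
    S = row_softmax(Z @ Z.T / np.sqrt(Z.shape[1]))   # dense all-pair coupling
    return (1.0 - tau) * Z + tau * (S @ Z)

# Random node states; repeated steps smooth states via global attention.
Z = np.random.default_rng(0).normal(size=(5, 8))
for _ in range(3):
    Z = energy_descent_attention_step(Z)
print(Z.shape)
```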
3. Graph Construction and Energy-based Invariance
Energy-based message passing often demands domain-specific graph construction tailored for energy functional properties:
- Crystallography and Materials: K-nearest-neighbor graphs (constant in-degree), distance-cutoff graphs, and Voronoi tessellations are systematically compared; K-NN graphs provide the best accuracy for energy prediction due to their stable neighbor counts (Jørgensen et al., 2018). A minimal construction sketch follows this list.
- Infrared/Collinear Safe Graphs in Particle Physics: Graph construction rules are required to be invariant under the QCD IR limit (soft emissions, $p_T \to 0$) and collinear limit (vanishing angular separation between particles), leading to message passing that is structurally invariant under these physical symmetries. Fixed-radius graphs in the rapidity–azimuth $(\eta, \phi)$ plane are a simple IRC-safe solution (Konar et al., 2021).
- Long-range Nonlocality: In molecular graphs, Ewald message passing achieves nonlocal aggregation via reciprocal-space Fourier bases, supplementing real-space short-range messages (Kosmala et al., 2023). Equivariant models use atom–orbital descriptors expanded in spherical/cylindrical harmonics for accurate nonlocal potential energy surface modeling (Wu et al., 30 Sep 2024).
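A minimal sketch of the two simplest constructions referenced above: K-nearest-neighbor graphs (constant in-degree, favored for materials) and fixed-radius graphs (an IRC-safe choice when built in the rapidity–azimuth plane). Coordinates, cutoffs, and the omission of periodic boundary handling (relevant for crystals and for the azimuthal angle) are illustrative simplifications.

```python
import numpy as np

def pairwise_dist(pos):
    diff = pos[:, None, :] - pos[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def knn_edges(pos, k):
    """Each node receives edges from its k nearest neighbors (constant in-degree)."""
    d = pairwise_dist(pos)
    np.fill_diagonal(d, np.inf)              # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]
    return [(int(j), i) for i in range(len(pos)) for j in nbrs[i]]

def radius_edges(pos, r):
    """Fixed-radius graph: connect all pairs within distance r
    (e.g. a Delta-R cutoff in the (eta, phi) plane for IRC-safe jet graphs)."""
    d = pairwise_dist(pos)
    np.fill_diagonal(d, np.inf)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(d < r))]

# Illustrative usage: 8 random 3D positions standing in for atomic coordinates.
points = np.random.default_rng(0).uniform(0.0, 5.0, size=(8, 3))
print(len(knn_edges(points, k=4)), len(radius_edges(points, r=2.5)))
```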
4. Training Objectives and Empirical Evaluation
Training in energy-based message passing leverages loss functions directly linked to energy targets, with empirical performance established on benchmarks:
- Formation Energy Regression:
- MAE on QM9 formation energy: 10.5 meV (edge-update model); SchNet baseline: 13.6 meV.
- OQMD: 14.9 meV/atom (K-NN), outperforming SchNet and Voronoi+RF (Jørgensen et al., 2018).
- Classification and Tagging:
- Energy-weighted message passing: AUC 0.8919–0.9782 across LHC jet-tagging benchmarks, exceeding leading EFNs while maintaining strict IRC safety (Konar et al., 2021).
- Phase Separation and Deep GNNs:
- ACMP avoids oversmoothing in deep GNNs (up to 128 layers), sustaining 80–85% accuracy where GCN/GAT architectures collapse (Wang et al., 2022).
- Active Inference, EFE Minimization:
- Agents using message passing with epistemic priors outperform KL-control agents in gridworld and minigrid tasks, achieving higher success rates and more robust, risk-sensitive, and information-seeking behavior (Nuijten et al., 4 Aug 2025, Kouw et al., 29 Sep 2025).
- Convex Energy-based Inference:
- Convex-L2 free-energy message passing achieves competitive marginal accuracy and reliable global convergence across Ising grids and random graphs; BP often fails on loopy or mixed-interaction graphs, while convex energy messaging remains robust (Hazan et al., 2012).
5. Connections to Physical, Probabilistic, and Neural Principles
Energy-based message passing forms a bridge between theoretical physics, probabilistic inference, and neural network computation:
- Gradient descent on energy functionals formalizes many-layer neural propagation as finite-difference integration of physical diffusion, reaction–diffusion, or potential energy dynamics (Wang et al., 2022, Wu et al., 13 Sep 2024).
- The Bethe and convex free energy frameworks unify message-passing algorithms (BP, EP, VMP) under constrained optimization, enabling principled derivation of hybrid or tractable algorithms for complex inference tasks (Zhang et al., 2017, Hazan et al., 2012).
- Optimal message weighting for recovery of community labels or physical quantities exploits minimum energy flows and eigenvector alignment, as shown in community detection and jet classification (Cai et al., 2017, Konar et al., 2021).
- Epistemic priors, as explicit energy-modulating factors, endow agents with risk-sensitivity and information-seeking strategies within planning and exploration, grounded in expected free energy minimization and tractable factor-graph message passing (Nuijten et al., 4 Aug 2025, Kouw et al., 29 Sep 2025).
6. Computational Complexity and Scalability
Energy-based message passing techniques are designed for scalability, with per-iteration costs matching their classical analogs and only modest overhead for nonlocal or equivariant augmentations:
- Neural MPNNs and Ewald MP: Per-layer complexity scales as $O(N\,k\,d)$ for $N$ nodes, $k$ neighbors per node, and $d$ feature channels; nonlocal Fourier messages introduce additional cost proportional to the number of reciprocal-space frequencies but remain tractable for typical molecular and periodic systems (Kosmala et al., 2023, Wu et al., 30 Sep 2024).
- Convex energy minimization: Sequential or parallel message-update schedules keep per-iteration cost on the order of standard BP (linear in the number of edges and factor table sizes), with empirical runtime comparable to BP and far faster than LP-based solvers (Hazan et al., 2012).
- Active inference and EFE minimization: Message passing converts exponential policy enumeration into cost that scales linearly with the planning horizon, enabling real-world deployment of agents (Nuijten et al., 4 Aug 2025, Kouw et al., 29 Sep 2025).
7. Significance, Open Directions, and Domain-specific Impact
Energy-based message passing advances inference and learning by unifying disparate principles—from variational graphical model inference and physical diffusion to invariant neural architectures and optimal decision-making. The framework has yielded empirical state-of-the-art results across molecular property prediction, particle tagging, graph classification, robust planning, and hybrid inference.
Key open areas include:
- Integration of dynamic, learned energy functionals for adaptive attention and context-sensitive aggregation (Wu et al., 13 Sep 2024).
- Expansion of equivariant, nonlocal descriptors for quantum chemistry and materials (Wu et al., 30 Sep 2024).
- Further theoretical analysis of convergence, optimality, and robustness in energy-constrained deep architectures (Hazan et al., 2012).
- Application to uncertain environments via active inference, with epistemic energy as a foundational planning and control principle (Nuijten et al., 4 Aug 2025, Kouw et al., 29 Sep 2025).
The paradigm is increasingly central to bridging methodological advances in probabilistic modeling, neural message passing, and physics-based machine learning.