Energy-Based Models (EBM)
- Energy-Based Models (EBMs) are probabilistic models that define data density via an energy functional, playing a key role in inverse imaging tasks like denoising, tomography, and inpainting.
- They ensure well-posed and stable Bayesian inference by satisfying integrability and continuity conditions, which are essential for reliable image reconstruction.
- EBMs combine advanced training strategies—such as score matching and bilevel optimization—with efficient sampling algorithms like Langevin MCMC to achieve robust estimations and uncertainty quantification.
An energy-based model (EBM) is a probabilistic model that defines the density of data via a Gibbs or Boltzmann distribution, $p_\theta(x) = Z_\theta^{-1}\exp(-E_\theta(x))$ with $Z_\theta = \int \exp(-E_\theta(x))\,\mathrm{d}x$, where $E_\theta$ is an energy functional. In the context of inverse imaging problems, EBMs are used within a rigorous Bayesian framework to model the unknown image given forward measurements and a typically ill-posed measurement process. The use of EBMs as priors enables highly expressive modeling of natural image statistics, and—if properly constructed—yields well-posed, stable Bayesian inference for a broad range of inverse problems, including denoising, tomographic inversion, and inpainting. This modeling paradigm unites theory, learning strategies, and practical computational algorithms under a common formalism and supports both point estimation and principled uncertainty quantification.
1. Bayesian Formulation and Theoretical Well-Posedness
In the Bayesian framework for inverse imaging, the measurement process is modeled as $y = A x + \eta$, where $A$ is the forward operator and $\eta$ represents noise or corruption. Both $x$ (the unknown image) and $y$ (the observed data) are random variables, and the posterior is defined via Bayes' rule as
$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} \propto p(y \mid x)\, p(x),$$
where $p(x)$ is the prior and $p(y \mid x)$ is the data likelihood.
For EBMs, the prior is assigned as a Gibbs distribution, $p(x) \propto \exp(-E_\theta(x))$, where $E_\theta$ is an energy functional, often with learnable parameters $\theta$. Validity and stability of the posterior in both finite- and infinite-dimensional spaces (e.g., when $x$ is a function or image) require sufficient coercivity and integrability conditions on $E_\theta$, i.e., that $\int \exp(-E_\theta(x))\,\mathrm{d}x < \infty$. Under these conditions, one can prove existence, uniqueness, and stability of the posterior. Continuity and stability are often measured using metrics such as the Hellinger distance
$$d_H(\mu_1, \mu_2)^2 = \frac{1}{2}\int \left(\sqrt{\frac{\mathrm{d}\mu_1}{\mathrm{d}\nu}} - \sqrt{\frac{\mathrm{d}\mu_2}{\mathrm{d}\nu}}\right)^2 \mathrm{d}\nu,$$
with $\nu$ a common dominating measure, and with the property that if the likelihood $x \mapsto p(y \mid x)$ is continuous and suitably dominated, the posterior map $y \mapsto p(\cdot \mid y)$ is stable in the Hellinger and total variation sense (Habring et al., 16 Jul 2025).
In infinite-dimensional settings (common for function-space or PDE-constrained imaging), Gaussian reference measures are employed, and additional care is taken to guarantee absolute continuity and stability of the posterior.
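To make the formulation above concrete, the following minimal sketch assembles the unnormalized negative log-posterior for a linear forward model with Gaussian noise and a generic EBM prior. It is an illustration only: the operator `A`, noise level `sigma`, and the quadratic `toy_energy` stand-in for a learned $E_\theta$ are assumptions, not the construction of the referenced work.

```python
import numpy as np

def neg_log_posterior(x, y, A, sigma, energy):
    """Unnormalized negative log-posterior, -log p(x|y) up to constants.

    Gaussian likelihood: -log p(y|x) = ||A x - y||^2 / (2 sigma^2) + const
    EBM prior:           -log p(x)   = E_theta(x) + log Z_theta (constant in x)
    """
    data_fidelity = 0.5 * np.sum((A @ x - y) ** 2) / sigma**2
    return data_fidelity + energy(x)

# Hypothetical usage: denoising, i.e., A is the identity.
n = 16
A = np.eye(n)
rng = np.random.default_rng(0)
x_true = rng.standard_normal(n)
y = A @ x_true + 0.1 * rng.standard_normal(n)      # noisy measurement
toy_energy = lambda x: 0.5 * np.sum(x ** 2)        # stand-in for a learned E_theta
print(neg_log_posterior(x_true, y, A, sigma=0.1, energy=toy_energy))
```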
2. Learning Energy-Based Priors
The energy functional $E_\theta$ must be expressive enough to capture the high-order structure of natural image statistics. Several training strategies are discussed:
- Divergence Minimization/Maximum Likelihood: For sample-based (empirical) distributions $p_{\mathrm{data}}$, maximum likelihood minimizes the negative log-likelihood
$$\mathcal{L}(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[E_\theta(x)] + \log Z_\theta,$$
but since the normalization constant (partition function) $Z_\theta = \int \exp(-E_\theta(x))\,\mathrm{d}x$ is intractable, direct estimation is often circumvented.
- Score Matching and Denoising Score Matching: Instead of matching densities, one matches gradients, i.e., the score $\nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)$. The denoising score-matching objective
$$\mathcal{L}_{\mathrm{DSM}}(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}},\; \tilde{x} \sim \mathcal{N}(x, \sigma^2 I)}\left[\left\| \nabla_{\tilde{x}} E_\theta(\tilde{x}) - \frac{\tilde{x} - x}{\sigma^2} \right\|^2\right]$$
directly relates to the optimal denoiser via Tweedie's formula, allowing stochastic gradient-based training without the need to compute $Z_\theta$ (a minimal training sketch in this spirit follows the list).
- Bilevel Optimization for MAP Quality: Instead of only matching densities, a bilevel approach optimizes for end-task performance by embedding $E_\theta$ in the lower-level variational reconstruction problem
$$\hat{x}(\theta, y) \in \arg\min_{x}\; \frac{1}{2}\|A x - y\|^2 + E_\theta(x),$$
and using the upper-level loss (e.g., squared error of $\hat{x}(\theta, y)$ to the ground truth) to refine $\theta$, requiring (implicit) differentiation through the lower-level optimization process (an unrolled sketch of this idea also follows the list).
- Architectural Choices: The classical Fields-of-Experts (FoE) model is employed, with
$$E_\theta(x) = \sum_{i=1}^{N} \sum_{p} \phi_i\big((k_i * x)_p\big),$$
where the convolution filters $k_i$ and nonlinear potentials $\phi_i$ (e.g., negative-log Gaussian mixtures) are learnable. This guarantees both interpretability and control over the model's mathematical properties.
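As referenced in the score-matching bullet above, the sketch below combines a Fields-of-Experts-style energy with a denoising score-matching loss in PyTorch. It is a rough illustration under assumed choices (filter count, the $\log(1+t^2)$ potential, patch size, noise level), not the authors' implementation.

```python
import torch
import torch.nn as nn

class FoEEnergy(nn.Module):
    """Fields-of-Experts-style energy: sum of learned potentials of filter responses.

    The potential phi(t) = alpha * log(1 + t^2) is a smooth stand-in for the
    negative-log Gaussian-mixture experts mentioned in the text.
    """
    def __init__(self, n_filters=8, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(1, n_filters, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.alpha = nn.Parameter(torch.ones(n_filters))

    def forward(self, x):                        # x: (B, 1, H, W)
        r = self.conv(x)                         # filter responses (B, N, H, W)
        phi = self.alpha.view(1, -1, 1, 1) * torch.log1p(r ** 2)
        return phi.sum(dim=(1, 2, 3))            # one scalar energy per image

def dsm_loss(energy, x, sigma=0.1):
    """Denoising score matching: grad E(x_noisy) should match (x_noisy - x) / sigma^2."""
    noise = sigma * torch.randn_like(x)
    x_noisy = (x + noise).requires_grad_(True)
    grad_e = torch.autograd.grad(energy(x_noisy).sum(), x_noisy, create_graph=True)[0]
    return ((grad_e - noise / sigma**2) ** 2).mean()

# Hypothetical usage on random stand-in image batches.
energy = FoEEnergy()
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)
for _ in range(5):
    loss = dsm_loss(energy, torch.rand(4, 1, 32, 32))
    opt.zero_grad()
    loss.backward()
    opt.step()
```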
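For the bilevel strategy, one common realization (assumed here for illustration; implicit differentiation through the optimality conditions is an alternative) unrolls a few gradient steps of the lower-level variational problem and backpropagates the upper-level reconstruction error into the prior parameters. The toy energy, problem size, and step sizes below are placeholders.

```python
import torch
import torch.nn as nn

class ToyEnergy(nn.Module):
    """Stand-in learnable energy E_theta(x) = sum_i log(1 + (w_i^T x)^2)."""
    def __init__(self, dim, n_experts=8):
        super().__init__()
        self.W = nn.Parameter(0.1 * torch.randn(n_experts, dim))

    def forward(self, x):
        return torch.log1p((self.W @ x) ** 2).sum()

def unrolled_map(energy, y, A, sigma, n_steps=30, tau=5e-3):
    """Lower level: unrolled gradient descent on ||A x - y||^2 / (2 sigma^2) + E_theta(x),
    kept differentiable with respect to theta via create_graph=True."""
    x = (A.t() @ y).requires_grad_(True)         # initialize with the adjoint
    for _ in range(n_steps):
        obj = 0.5 * ((A @ x - y) ** 2).sum() / sigma**2 + energy(x)
        x = x - tau * torch.autograd.grad(obj, x, create_graph=True)[0]
    return x

# Upper level: refine theta by the squared error to a known ground truth.
dim = 32
A = torch.eye(dim)                               # denoising-type toy problem
x_true = torch.randn(dim)
y = A @ x_true + 0.1 * torch.randn(dim)
energy = ToyEnergy(dim)
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)
for _ in range(5):
    upper_loss = ((unrolled_map(energy, y, A, sigma=0.1) - x_true) ** 2).mean()
    opt.zero_grad()
    upper_loss.backward()
    opt.step()
```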
3. Sampling Algorithms for Posterior Inference
EBMs define distributions only up to an intractable normalization constant; consequently, inference algorithms rely on Markov chain Monte Carlo (MCMC) approaches:
- Metropolis–Hastings (MH): Proposes samples $x' \sim q(\cdot \mid x_k)$ and accepts transitions with probability
$$\alpha(x_k, x') = \min\left\{1,\; \frac{p(x' \mid y)\, q(x_k \mid x')}{p(x_k \mid y)\, q(x' \mid x_k)}\right\}.$$
For EBM priors, the intractable normalization constant cancels in this ratio, so samples are generated according to the posterior.
- Gibbs Sampling: Alternates sampling over model blocks or augmented latent variables when the conditional distributions are tractable.
- Langevin Monte Carlo and Hamiltonian Monte Carlo: For high-dimensional problems, the discretized overdamped Langevin diffusion is utilized (a minimal sketch is given at the end of this section):
$$x_{k+1} = x_k - \tau \nabla_x U(x_k) + \sqrt{2\tau}\,\xi_k, \qquad \xi_k \sim \mathcal{N}(0, I), \qquad U(x) = -\log p(x \mid y).$$
More advanced samplers use underdamped (kinetic) Langevin or HMC, introducing momentum variables and symplectic integration (e.g., leapfrog), with accept/reject steps to ensure detailed balance. Under suitable smoothness and coercivity assumptions, convergence rates and ergodicity can be established, ensuring that empirical and theoretical distributions coincide asymptotically.
These sampling strategies enable MMSE or MAP estimates, and—through sample-based statistics—uncertainty quantification.
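The sketch below, referenced in the Langevin bullet above, runs the unadjusted Langevin algorithm (ULA) on a toy posterior and forms sample-based MMSE and uncertainty estimates; the quadratic stand-in prior, step size, and chain length are illustrative assumptions, and adding the Metropolis accept/reject step described above would yield MALA.

```python
import numpy as np

def ula_sample(grad_U, x0, tau=1e-3, n_iter=5000, burn_in=1000, seed=0):
    """Unadjusted Langevin algorithm: x_{k+1} = x_k - tau * grad U(x_k) + sqrt(2 tau) * xi_k."""
    rng = np.random.default_rng(seed)
    x, samples = x0.copy(), []
    for k in range(n_iter):
        x = x - tau * grad_U(x) + np.sqrt(2 * tau) * rng.standard_normal(x.shape)
        if k >= burn_in:
            samples.append(x.copy())
    return np.stack(samples)

# Toy denoising posterior: Gaussian likelihood (A = I) plus a quadratic stand-in prior,
# so U(x) = ||x - y||^2 / (2 sigma^2) + lam * ||x||^2 / 2.
sigma, lam = 0.1, 1.0
y = np.array([0.2, -0.5, 1.0])
grad_U = lambda x: (x - y) / sigma**2 + lam * x
samples = ula_sample(grad_U, x0=y.copy())
x_mmse = samples.mean(axis=0)                    # MMSE estimate (posterior mean)
x_std = samples.std(axis=0)                      # pixelwise uncertainty
print(x_mmse, x_std)
```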
4. Numerical Experiments in Inverse Imaging
Empirical results are provided on standard inverse imaging tasks:
- Denoising, where $A$ is the identity and the corruption $\eta$ is additive Gaussian noise.
- Reconstruction from Fourier Samples, simulating MRI or compressed sensing, with $A$ a partial (subsampled) Fourier transform (a masked-FFT sketch of such an operator follows below).
- Reconstruction from Radon Samples, for computed tomography (CT), using the Radon transform as $A$.
Both MAP and MMSE estimates are computed, with the latter requiring efficient sampling. Learned EBM priors (trained via score-matching or bilevel approaches) consistently outperform classical regularizers (such as total variation), as shown by PSNR improvements and qualitative visual results that display fewer artifacts and better naturalistic details. Uncertainty maps (pixelwise standard deviations) provide additional information not available with deterministic approaches.
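For orientation, a masked FFT is one simple way to realize the partial-Fourier forward operator mentioned above; the random sampling mask, image, and noise level below are illustrative assumptions rather than the experimental setup of the paper.

```python
import numpy as np

def partial_fourier(x, mask):
    """Forward operator A: keep only the frequencies selected by mask (1 = sampled)."""
    return mask * np.fft.fft2(x, norm="ortho")

def partial_fourier_adjoint(k, mask):
    """Adjoint of A (here equal to A^H), e.g., for a zero-filled initialization."""
    return np.fft.ifft2(mask * k, norm="ortho")

rng = np.random.default_rng(0)
x = rng.random((64, 64))                                   # stand-in image
mask = (rng.random((64, 64)) < 0.3).astype(float)          # random 30% sampling mask
noise = 0.01 * (rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64)))
y = partial_fourier(x, mask) + mask * noise                # noisy partial Fourier data
x0 = partial_fourier_adjoint(y, mask).real                 # zero-filled reconstruction
```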
5. Verification and Theoretical Properties
Explicit attention is devoted to verifying that learned energy functions yield proper, stable Bayesian posteriors:
- Integrability and Coercivity: $E_\theta$ is constructed so that $\exp(-E_\theta)$ is a proper (normalizable) density, i.e., the normalization constant $Z_\theta = \int \exp(-E_\theta(x))\,\mathrm{d}x$ is finite, with coercivity or added nullspace penalization where necessary (see the estimate after this list).
- Continuity and Stability: Under regularity and boundedness conditions, the mapping $y \mapsto p(\cdot \mid y)$ is continuous (e.g., in Hellinger distance), ensuring that reconstructions are stable to data perturbations.
- Ergodicity of Samplers: Markov properties and detailed balance are established for each class of MCMC methods, guaranteeing that empirical averages converge to posterior expectations.
- Structural Properties: The architecture (e.g., Fields-of-Experts with full-rank convolutional filters) is chosen to avoid degeneracies (e.g., singularities in the induced precision matrix). Visualizations of learned filters and associated potentials further validate proper regularization behavior.
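As a concrete instance of the integrability argument referenced in the first bullet (a standard estimate, stated here in finite dimensions): if $E_\theta$ is coercive in the sense that $E_\theta(x) \ge c_1\|x\| - c_2$ for some $c_1 > 0$ and $c_2 \in \mathbb{R}$, then
$$Z_\theta = \int_{\mathbb{R}^n} e^{-E_\theta(x)}\,\mathrm{d}x \le e^{c_2}\int_{\mathbb{R}^n} e^{-c_1\|x\|}\,\mathrm{d}x < \infty,$$
so $\exp(-E_\theta)/Z_\theta$ is a proper probability density; when the FoE filters have a nontrivial common nullspace (e.g., constant images), the added nullspace penalization mentioned above restores this coercivity.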
6. Significance and Context
Energy-based models provide a unifying regularization framework in Bayesian inverse imaging, enabling the integration of learned, data-adaptive priors with explicit measure-theoretic guarantees. Their flexibility supports the modeling of highly complex image distributions, while properly constructed training and sampling algorithms provide principled uncertainty quantification—an essential requirement in high-stakes imaging tasks such as medical reconstruction, astronomical imaging, and geophysical inversion.
Advances in learning (e.g., denoising score matching, bilevel optimization), sampling (e.g., ULA/HMC), and theory (well-posedness in infinite dimensions) ensure that EBMs can be robustly applied to inverse imaging. Verification steps are critical to ensure that all assumptions needed for statistical validity are satisfied. Consistent empirical improvements over classical priors and compatibility with uncertainty-driven inference make energy-based modeling for inverse problems a central paradigm in contemporary computational imaging (Habring et al., 16 Jul 2025).