Equivariant Regularization by Denoising (ERED)
- The paper introduces a unified framework that integrates group-averaged denoising to enforce symmetry-aware regularization in inverse problems.
- It employs a variational formulation paired with stochastic optimization to ensure convergence to symmetry-compliant solutions while mitigating artifacts.
- Empirical results in imaging, diffusion-based generative models, and atomistic force fields demonstrate quantitative gains and enhanced physical consistency.
Equivariant Regularization by Denoising (ERED) is a unified framework for incorporating symmetry priors into optimization and inference tasks via learned denoisers, systematically enforcing equivariance with respect to group actions such as rotations, flips, or more general symmetries. ERED generalizes earlier Regularization by Denoising (RED) paradigms by introducing explicit group-averaging in both regularization and algorithmic steps, ensuring that the inferred solutions inherit the prescribed invariances of the target distribution. The method has been concretely developed for imaging inverse problems, diffusion-based generative modeling, and 3D atomistic structure prediction, establishing precise variational objectives, optimization algorithms, and convergence theory as well as demonstrating empirical gains in multiple real-world domains (Renaud et al., 6 Dec 2024, Renaud et al., 13 Nov 2025, 2505.22973, Liao et al., 14 Mar 2024).
1. Mathematical Foundations and Variational Formulation
ERED is built upon a variational principle that combines data fidelity with symmetry-aware regularization. Let $y = A x^\star + n$ be the observed (potentially degraded) data, with forward operator $A$, Gaussian noise $n \sim \mathcal{N}(0, \sigma_y^2 I)$, and unknown clean signal $x^\star$. For imaging, the standard negative log-likelihood is $f(x) = \frac{1}{2\sigma_y^2}\lVert Ax - y\rVert^2$.
ERED augments this objective with an equivariant regularizer derived from the prior under a group $G$ of symmetries sampled via a measure $\mu$:

$$\hat{x} \in \arg\min_x\; f(x) + \lambda\, \mathcal{R}_{\mathrm{eq}}(x), \qquad \mathcal{R}_{\mathrm{eq}}(x) = \mathbb{E}_{g\sim\mu}\big[-\log p_\sigma(T_g x)\big],$$

with $p_\sigma$ the Gaussian-smoothed prior (the density of $x + \sigma\varepsilon$, $\varepsilon \sim \mathcal{N}(0, I)$, for $x$ drawn from the prior $p$). The gradient (equivariant score) becomes

$$\nabla \mathcal{R}_{\mathrm{eq}}(x) = \mathbb{E}_{g\sim\mu}\big[J_{T_g}(x)^\top\, \nabla(-\log p_\sigma)(T_g x)\big],$$

where $J_{T_g}$ is the Jacobian of the group action $T_g$. For classic image processing, $G$ may be the dihedral group of $90^\circ$ rotations and flips; for translational symmetry, $g$ can be sampled as Gaussian translations. This formulation directly models the fundamental invariances of $p$, and by construction, solutions are equivariant under $G$ (Renaud et al., 6 Dec 2024, Renaud et al., 13 Nov 2025).
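The link between this score and a practical denoiser is Tweedie's identity, standard in the RED literature; the two lines below make the step explicit for the MMSE denoiser $D_\sigma(u) = \mathbb{E}[x \mid u]$ with $u = x + \sigma\varepsilon$:

$$D_\sigma(u) = u + \sigma^2 \nabla \log p_\sigma(u) \;\Longrightarrow\; \nabla(-\log p_\sigma)(u) = \frac{1}{\sigma^2}\big(u - D_\sigma(u)\big),$$

so the equivariant score can be evaluated entirely through denoiser calls:

$$\nabla \mathcal{R}_{\mathrm{eq}}(x) = \mathbb{E}_{g\sim\mu}\Big[\tfrac{1}{\sigma^2}\, J_{T_g}(x)^\top \big(T_g x - D_\sigma(T_g x)\big)\Big].$$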
2. Construction of Equivariant Denoisers
The ERED mechanism employs a denoiser $D_\sigma$—typically a convolutional neural network trained for Gaussian denoising at noise level $\sigma$—to define and evaluate the regularizer via the Tweedie formula. The equivariant denoiser, given any $x$, is constructed by group-averaging:

$$D_\sigma^{\mathrm{eq}}(x) = \mathbb{E}_{g\sim\mu}\big[T_g^{-1} D_\sigma(T_g x)\big].$$

In practical terms, for finite groups of linear isometries, $T_g^{-1} = T_g^\top$, and thus

$$D_\sigma^{\mathrm{eq}}(x) = \frac{1}{|G|}\sum_{g\in G} T_g^\top\, D_\sigma(T_g x).$$
No re-training of $D_\sigma$ is necessary; any pre-trained denoiser may be wrapped in this symmetrization, making the method architecture-agnostic. This approach subsumes previous schemes such as noising–denoising regularizers (SNORE) or stochastic RED, depending on the group choice (Renaud et al., 13 Nov 2025, Renaud et al., 6 Dec 2024).
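As a concrete illustration, here is a minimal sketch of this symmetrization for the eight-element dihedral group of $90^\circ$ rotations and horizontal flips, wrapping an arbitrary pretrained denoiser. The `denoiser` callable and the `[B, C, H, W]` tensor layout are assumptions for the sketch, not an API from the cited papers:

```python
import torch

def dihedral_transforms():
    """Yield (T, T_inv) pairs for the 8-element dihedral group of
    90-degree rotations and horizontal flips on [B, C, H, W] tensors."""
    for k in range(4):                      # rotation by k * 90 degrees
        for flip in (False, True):          # optional horizontal flip
            def T(x, k=k, flip=flip):
                x = torch.rot90(x, k, dims=(2, 3))
                return torch.flip(x, dims=(3,)) if flip else x
            def T_inv(x, k=k, flip=flip):   # undo flip first, then rotation
                x = torch.flip(x, dims=(3,)) if flip else x
                return torch.rot90(x, -k, dims=(2, 3))
            yield T, T_inv

def equivariant_denoiser(denoiser, x):
    """Group-average a pretrained denoiser: mean of T_g^{-1} D(T_g x) over G."""
    outs = [T_inv(denoiser(T(x))) for T, T_inv in dihedral_transforms()]
    return torch.stack(outs).mean(dim=0)

# Example with an identity "denoiser" standing in for a pretrained network:
x = torch.randn(1, 3, 32, 32)
out = equivariant_denoiser(lambda u: u, x)
```

Each $T_g$ here is a pixel permutation (a linear isometry), so $T_g^{-1} = T_g^\top$, matching the finite-group formula above.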
3. Stochastic Optimization Algorithm
ERED leverages stochastic gradient-based optimization, where each iteration samples a group element and computes a step using the group-averaged score. The update rule, as formalized in (Renaud et al., 6 Dec 2024) and (Renaud et al., 13 Nov 2025), is:

$$x_{k+1} = x_k - \delta_k \left( \nabla f(x_k) + \frac{\lambda}{\sigma^2}\, J_{T_{g_k}}(x_k)^\top \big( T_{g_k} x_k - D_\sigma(T_{g_k} x_k) \big) \right), \qquad g_k \sim \mu,$$

with step size $\delta_k > 0$. The explicit pseudocode is:
```python
for k in range(N):
    G = sample_group_element()              # g_k ~ mu
    u = G.apply(x)                          # T_g x
    d = denoiser(u)                         # D_sigma(T_g x)
    s = (1 / sigma**2) * G.jacobian_transpose(x) @ (u - d)  # sampled equivariant score
    g = grad_f(x)                           # data-fidelity gradient
    x = x - delta * (g + lambda_ * s)       # delta may follow a decreasing schedule
```
In the RED framework, this reduces to classic stochastic gradient descent, with the denoiser score replaced by sampled group-averaged scores. The approach is unbiased if $D_\sigma$ is MMSE-optimal, with bias controlled otherwise (Renaud et al., 13 Nov 2025, Renaud et al., 6 Dec 2024).
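For concreteness, under the Gaussian likelihood of Section 1 the data-fidelity gradient assumed by `grad_f` above has a simple closed form. A sketch, with a hypothetical dense matrix `A` standing in for a blur or subsampling operator:

```python
import numpy as np

def make_grad_f(A, y, sigma_y):
    """Gradient of f(x) = ||A x - y||^2 / (2 sigma_y^2)."""
    def grad_f(x):
        return A.T @ (A @ x - y) / sigma_y**2
    return grad_f

# Hypothetical toy setup: random operator and measurement.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
y = rng.standard_normal(64)
grad_f = make_grad_f(A, y, sigma_y=0.1)
```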
4. Convergence Theory and Critical Points
The convergence properties of ERED are established via stochastic optimization theory. Under conditions of smoothness (of $f$ and $\mathcal{R}_{\mathrm{eq}}$), boundedness of moments (of the stochastic gradients and group transformations), and appropriate step size schedules ($\sum_k \delta_k = \infty$, $\sum_k \delta_k^2 < \infty$), ERED iterates converge almost surely to a critical point of $F = f + \lambda\, \mathcal{R}_{\mathrm{eq}}$, provided the sampled gradient is an unbiased estimator:
- $\mathrm{dist}(x_k, \mathcal{X}^\star) \to 0$ almost surely, where $\mathcal{X}^\star = \{x : \nabla F(x) = 0\}$ is the critical set
- The objective sequence $F(x_k)$ converges

For biased denoisers, the gradients approach an $O(\varepsilon)$ neighborhood of the critical set, where $\varepsilon$ quantifies the mean denoiser bias. As $\sigma \to 0$ and under further structural regularity (such as subanalyticity and $G$-equivariance), the critical points of $F$ recover those of the original MAP estimation problem (Renaud et al., 13 Nov 2025, Renaud et al., 6 Dec 2024).
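A step-size schedule meeting both summability conditions is straightforward to construct; a minimal sketch (the exponent $0.75$ is an illustrative choice, any $\alpha \in (1/2, 1]$ satisfies the conditions):

```python
def step_size(k, delta0=1e-2, alpha=0.75):
    """Robbins-Monro schedule: sum(delta_k) diverges and sum(delta_k**2)
    converges for any alpha in (1/2, 1]."""
    return delta0 / (1 + k) ** alpha
```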
5. Applications and Empirical Results
ERED has been validated in three primary domains: image restoration, diffusion-based inverse problems, and atomistic force field predictions.
Image Restoration
- For deblurring (CBSD68), ERED with rotations/flips achieves 32.51–32.53 dB PSNR, a +0.26–0.28 dB gain over classical RED.
- In super-resolution and SAR despeckling, ERED yields consistent gains of 0.1–0.3 dB when the test-time symmetry matches the data augmentation used in denoiser training; ERED also reduces symmetry-violating artifacts, especially with flip equivariance (Renaud et al., 6 Dec 2024, Renaud et al., 13 Nov 2025).
Diffusion-based Inverse Problems
- In EquiReg (the Equi-DPS variant), a distribution-dependent equivariance error is used as a penalty during sampling, constraining the trajectory to the data manifold by enforcing low equivariance error under the symmetry group; see the sketch after this list.
- Representative metrics: Gaussian deblur on FFHQ (Equi-DPS vs DPS): LPIPS 0.114 vs 0.145, FID 48.8 vs 104.8; super-resolution (Equi-PSLD vs PSLD): PSNR 26.14 dB vs 24.51 dB (2505.22973).
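A minimal sketch of an equivariance-error penalty of this kind; the exact functional form, network, and group used by EquiReg are specified in (2505.22973), so the choices below (`f`, the rotation group, the norm) are illustrative assumptions:

```python
import torch

def equivariance_error(f, x, transforms):
    """Mean || f(T(x)) - T(f(x)) || over a set of group actions T.
    Small values indicate f behaves equivariantly at x (an on-manifold proxy)."""
    errs = [torch.linalg.vector_norm(f(T(x)) - T(f(x))) for T in transforms]
    return torch.stack(errs).mean()

# Illustrative group actions: 90-degree rotations on [B, C, H, W] images.
transforms = [lambda x, k=k: torch.rot90(x, k, dims=(2, 3)) for k in (1, 2, 3)]
# During sampling, lam * equivariance_error(f, x0_hat, transforms) is added to
# the guidance objective to penalize off-manifold iterates.
```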
Atomistic Force Fields
- In equivariant GNNs (Equiformer V2), ERED serves as an auxiliary task: atom positions are corrupted, and the model is trained to denoise the structure conditioned on its true forces, preserving E(3) equivariance (a schematic of this auxiliary loss follows the list below).
- OC20: force MAE reduced from 19.32 to 18.49 meV/Å; OC22: force MAE from 30.70 to 27.82 meV/Å (−9.4%), energy MAE reduced up to 16.7% on test splits; MD17: force MAE reduced by 9–15% with minimal increase in training time (Liao et al., 14 Mar 2024).
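A schematic of the force-conditioned denoising auxiliary loss; all names and the `model(pos, forces)` signature are placeholders rather than the actual Equiformer V2 interface (Liao et al., 14 Mar 2024):

```python
import torch

def denoising_aux_loss(model, pos, forces, sigma=0.1):
    """Corrupt atomic positions with Gaussian noise and train `model` to
    predict that noise, conditioned on the clean structure's true forces."""
    noise = sigma * torch.randn_like(pos)      # position corruption
    pred = model(pos + noise, forces)          # equivariant GNN noise head
    return torch.mean((pred - noise) ** 2)

# Toy check with a placeholder model:
pos, forces = torch.randn(10, 3), torch.randn(10, 3)
loss = denoising_aux_loss(lambda p, f: torch.zeros_like(p), pos, forces)
# Total objective: energy/force loss + aux_weight * denoising_aux_loss(...)
```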
6. Implementation Architecture and Practical Considerations
The backbone denoiser is typically a DRUNet-style U-Net with residual structure, trained via supervised regression on noisy image patches. No special architectural modifications are required for equivariance: group actions are applied externally at inference, and symmetrization is achieved by group-averaging denoiser outputs. For diffusion models and atomistic GNNs, ERED can be plugged into existing samplers as an extra penalty term, or used as an auxiliary pretext loss. Exact group-averaging increases the computational load by a factor of $|G|$, which is tractable for moderate-size groups; the cost can be reduced to a single denoiser call per iteration through stochastic sampling of group elements.
Group selection is problem-specific: for natural images, the dihedral group of $90^\circ$ rotations and flips is standard; atomistic systems require E(3) equivariance (rotations and translations); and physical PDE problems may require commutativity with flips or rotations, depending on the invariances of the underlying mappings (Renaud et al., 6 Dec 2024, Renaud et al., 13 Nov 2025, 2505.22973, Liao et al., 14 Mar 2024).
7. Impact, Limitations, and Extensions
ERED provides a general methodology to enforce global geometric priors in plug-and-play regularization, producing solutions less prone to symmetry-violating artifacts and more robust to off-manifold perturbations. The gain in quantitative performance is often modest (0.1–0.3 dB PSNR for images, up to 16.7% MAE reduction for atomistic GNNs), but qualitative improvements (e.g., reduced hallucinations, more physical consistency) are evident. The main limitations are the need to identify appropriate symmetry groups and the potential increase in computational cost with large $|G|$. Extensions include annealing the noise level $\sigma$, composing equivariant and stochastic regularization (as in SNORE or Equi-PnP), and further architectural development of group-equivariant networks and regularizers. Future directions anticipate automated symmetry discovery and adaptive weighting of symmetry-induced penalties.
| Domain | Benchmark/Task | Gain from ERED |
|---|---|---|
| Image restoration | Deblur/CBSD68 | $+0.26$–$0.28$ dB PSNR, improved artifact suppression |
| Diffusion inverse | FFHQ Gaussian deblur | LPIPS $0.114$ vs $0.145$ (Equi-DPS vs DPS) |
| Atomistic GNN force | OC20/OC22 | $5$–$16.7$% reduction in force/energy MAE |
ERED thus forms a principled paradigm for symmetry-aware regularization by denoising, resolving equivariance both at the model output and during iterative optimization. Its adaptability and theoretical basis enable integration into a broad range of inverse, generative, and scientific learning problems with inherent symmetries.