
Denoising Entropy: Methods & Applications

Updated 4 July 2026
  • Denoising entropy is a family of entropy-based measures used to characterize, control, and evaluate various denoising and reverse-diffusion processes.
  • Different formulations apply entropy at distinct stages, including internal control signals, reverse transition analysis, representation priors, and diagnostic evaluation maps.
  • Utilizing entropy metrics enables optimized denoising policies, enhanced sampling efficiency, and improved assessment of restoration quality in complex models.

Denoising entropy denotes a family of entropy-based quantities used to characterize, control, or evaluate denoising and reverse-diffusion processes. In current research, it does not name a single canonical observable. Instead, it refers to several distinct constructions: attention entropy inside a denoising trajectory, predictive entropy of a noise-aware classifier, conditional entropy of reverse transitions, wavelet-domain entropy of detail coefficients, entropy-coded latent complexity in compression-based restoration, and entropy-derived diagnostics such as directional anisotropy or entropy maps (Li et al., 6 Feb 2026, Li et al., 2022, Li et al., 30 Sep 2025, Rhee et al., 18 Jun 2026, Nguyen et al., 12 Feb 2026, Gabarda et al., 2011). This plurality is not merely terminological. Different formulations place entropy at different loci of the denoising pipeline: as an internal state variable, an optimization target, a prior on representational complexity, or a no-reference quality indicator.

1. Conceptual scope and principal formulations

The literature uses denoising entropy in at least four technically distinct senses. Some works measure entropy on model-internal distributions during iterative denoising, such as cross-attention over prompt tokens or classifier posteriors over classes. Others treat denoising as conditional-entropy reduction between adjacent reverse-time states. A third line places entropy in the representation itself, typically through wavelet-domain statistics or entropy-coded latent variables. A fourth uses entropy-derived maps or directional entropy to assess denoising quality after reconstruction (Li et al., 6 Feb 2026, Li et al., 30 Sep 2025, Rhee et al., 18 Jun 2026, Gabarda et al., 2011).

| Entropy object | Operational role | Representative papers |
|---|---|---|
| Attention or predictive entropy during denoising | Online control of guidance, rollout allocation, or keyframe selection | (Li et al., 6 Feb 2026, Li et al., 2022, Chen et al., 29 Jun 2026) |
| Conditional entropy or KL between adjacent states | Reverse-process analysis and sampler design | (Li et al., 30 Sep 2025, Kim et al., 2024, Zhang et al., 1 Jan 2026) |
| Wavelet or latent-code entropy | Regularization and low-complexity priors | (Rhee et al., 18 Jun 2026, Nguyen et al., 12 Feb 2026) |
| Entropy-derived evaluation maps | No-reference assessment of residual noise and blur | (Gabarda et al., 2011, Boriskov et al., 15 Nov 2025) |

This heterogeneity implies that “entropy” in denoising is best treated as a structural descriptor of uncertainty or complexity rather than as a fixed formula. In some papers it is an explicit Shannon- or KL-type quantity; in others it is a proxy derived from eigenspectra, code lengths, or local irregularity. This suggests that the unifying theme is not the specific entropy functional, but the use of entropy-like quantities to separate structured signal from nuisance randomness.

2. Entropy as an internal control signal in iterative denoising

A prominent recent use of denoising entropy is to monitor the internal state of a denoiser during sampling or reinforcement-learning training. In "AEGPO: Adaptive Entropy-Guided Policy Optimization for Diffusion Models" (Li et al., 6 Feb 2026), attention entropy is computed from cross-attention maps between image features and text tokens. For timestep $t$, the paper defines the local signal

$$\mathrm{Entropy}(t) = \frac{1}{N}\sum_{i=1}^{N}\mathrm{Entropy}_t[q_i],$$

where each $\mathrm{Entropy}_t[q_i]$ is the Shannon entropy of the normalized attention distribution over text tokens for a fixed image feature. The paper then defines a policy-relative quantity

$$\Delta\mathrm{Entropy} = \frac{1}{T}\sum_{t=1}^{T}\left|\mathrm{Entropy}_\theta(t)-\mathrm{Entropy}_{\mathrm{base}}(t)\right|,$$

used as a sample-level proxy for learning value. The two signals are operationally separated: absolute $\mathrm{Entropy}(t)$ identifies critical denoising moments, while $\Delta\mathrm{Entropy}$ measures deviation from the base policy and allocates more rollout budget to prompts with larger policy-attention shifts. The reported peak distribution is U-shaped or bimodal, with one cluster at very early steps and another at late steps, so valuable branching moments are not uniformly distributed across the trajectory. In all experiments, local exploration uses the top-$K$ entropy peaks with $K=4$, while global allocation uses $r_{\text{low}}=8$ and $r_{\text{high}}=16$, after a 20-step warmup. On text-to-image alignment, the paper reports that entropy-guided branching improves Reward Std, LPIPS MPD, and TCE over fixed schedules, raises the HPS-v2.1 score of BranchGRPO on FLUX.1-dev, and yields 2× faster convergence on DanceGRPO and 5× faster convergence on DiffusionNFT despite an 11.1% per-step time increase and about 1 GB additional VRAM (Li et al., 6 Feb 2026).
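The two attention-entropy signals can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function names, the toy attention rows, and the use of the largest values as "peaks" (rather than true local maxima) are assumptions made for the example.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (nats) of one normalized distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def step_entropy(attn_rows):
    """Mean entropy over the N image-feature queries at one timestep.

    attn_rows: list of N attention distributions over text tokens.
    """
    return sum(shannon_entropy(r) for r in attn_rows) / len(attn_rows)

def delta_entropy(policy_curve, base_curve):
    """Mean absolute gap between two per-timestep entropy curves."""
    assert len(policy_curve) == len(base_curve)
    gaps = [abs(a - b) for a, b in zip(policy_curve, base_curve)]
    return sum(gaps) / len(gaps)

def top_k_peaks(curve, k):
    """Simplification: indices of the k largest values, in time order."""
    order = sorted(range(len(curve)), key=lambda t: curve[t], reverse=True)
    return sorted(order[:k])

# Hypothetical toy data: 2 queries x 4 tokens at each of 3 timesteps.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [1.0, 0.0, 0.0, 0.0]
policy_curve = [
    step_entropy([uniform, uniform]),
    step_entropy([uniform, peaked]),
    step_entropy([peaked, peaked]),
]
base_curve = [step_entropy([uniform, uniform])] * 3
```

In this toy case the policy curve falls from $\log 4$ to zero while the base curve stays at $\log 4$, so `delta_entropy(policy_curve, base_curve)` is large; a prompt whose policy attention matched the base model would score zero and receive less rollout budget under the scheme described above.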

In classifier-guided diffusion, entropy appears as predictive uncertainty rather than attention dispersion. "Entropy-driven Sampling and Training Scheme for Conditional Diffusion Generation" (Li et al., 2022) defines

$$H\big(p_\phi(y \mid x_t)\big) = -\sum_{y} p_\phi(y \mid x_t)\,\log p_\phi(y \mid x_t),$$

the entropy of the noise-aware classifier’s class distribution at denoising step $t$. The paper argues that classifier guidance often vanishes early because the classifier becomes overconfident before the image is fully denoised. Entropy is therefore used as a denoising-time indicator of semantic uncertainty. Sampling replaces the fixed guidance scale with an entropy-dependent scale, so that low predictive entropy increases the classifier-guidance magnitude. Training adds an entropy regularization term to the loss. On ImageNet1000, the paper reports FID improvements for both CADM-G and UADM-G under DDPM 250-step sampling (Li et al., 2022).
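A small sketch may help make the role of predictive entropy concrete. The entropy computation is standard; the inverse-entropy scaling rule, the name `entropy_scaled_guidance`, and the `eps` stabilizer are illustrative assumptions and should not be read as the paper's formula.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predictive_entropy(logits):
    """Shannon entropy (nats) of the classifier's class distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_scaled_guidance(base_scale, logits, eps=1e-8):
    """Hypothetical rule: the scale grows as predictive entropy shrinks.

    The factor is the maximum possible entropy divided by the current one,
    so a uniform prediction leaves the base scale essentially unchanged.
    """
    max_entropy = math.log(len(logits))
    return base_scale * max_entropy / (predictive_entropy(logits) + eps)

uncertain_logits = [0.0, 0.0, 0.0, 0.0]   # uniform prediction
confident_logits = [8.0, 0.0, 0.0, 0.0]   # near one-hot prediction
```

With these toy logits, the confident prediction receives a larger scale than the uncertain one, which is the qualitative behaviour the text describes for counteracting vanishing guidance.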

A related compute-allocation use appears in "EcoVideo: Entropy-Orchestrated Video Generation Paradigm in Cloud-Edge Dynamics" (Chen et al., 29 Jun 2026). There, early self-attention entropy is computed over an initial fraction of the denoising steps, aggregated to frame-level scores by mean pooling over tokens, and stabilized by EMA. High-entropy frames are treated as information-dense keyframes that deserve cloud-side denoising, while low-entropy frames are reconstructed on the edge by interpolation. The keyframe selection rule thereby turns entropy into a frame-wise denoising budget allocator. The paper reports that removing entropy-based keyframe selection lowers the VBench score, while the full method achieves 1.84× end-to-end speedup on Wan2.1 and up to 2.9× in low-bandwidth, compute-limited settings (Chen et al., 29 Jun 2026).
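The pooling, smoothing, and selection stages can be sketched as follows. This is only an illustration under stated assumptions: the fixed-budget top-k rule, the `decay` value, and all function names are hypothetical, since the paper's exact selection rule is not reproduced here.

```python
def mean_pool(token_entropies):
    """Frame-level score: mean of per-token attention entropies."""
    return sum(token_entropies) / len(token_entropies)

def ema_over_steps(per_step_frame_scores, decay):
    """Exponential moving average of frame scores across early steps."""
    smoothed = list(per_step_frame_scores[0])
    for scores in per_step_frame_scores[1:]:
        smoothed = [decay * s + (1.0 - decay) * x
                    for s, x in zip(smoothed, scores)]
    return smoothed

def select_keyframes(frame_scores, budget):
    """Hypothetical rule: keep the `budget` highest-entropy frames,
    returned in temporal order."""
    ranked = sorted(range(len(frame_scores)),
                    key=lambda f: frame_scores[f], reverse=True)
    return sorted(ranked[:budget])

# Toy input: 2 early steps x 4 frames x 2 tokens of attention entropy.
steps = [
    [[1.0, 1.0], [0.2, 0.2], [0.9, 0.7], [0.1, 0.1]],
    [[0.8, 1.0], [0.3, 0.3], [0.7, 0.7], [0.2, 0.2]],
]
frame_scores = ema_over_steps(
    [[mean_pool(tokens) for tokens in step] for step in steps], decay=0.5)
keyframes = select_keyframes(frame_scores, budget=2)
```

Frames in `keyframes` would be routed to the expensive denoiser, and the remaining frames would be filled in by interpolation.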

3. Entropy of reverse transitions and denoising difficulty

Another major formulation treats denoising as the progressive reduction of uncertainty in reverse transitions. "EVODiff: Entropy-aware Variance Optimized Diffusion Inference" (Li et al., 30 Sep 2025) formalizes this through the conditional entropy between adjacent reverse-time states. Under the paper’s Gaussian approximation, minimizing conditional variance directly reduces conditional entropy. This gives an information-theoretic interpretation of diffusion inference: successful reverse denoising should shrink the conditional spread of plausible predecessor states. The paper further states that data prediction parameterization reduces reconstruction errors more effectively than noise prediction and, under independence assumptions, also reduces conditional entropy. EVODiff then optimizes stepwise variance-balancing coefficients in a multistep data-prediction sampler. On CIFAR-10, it improves FID at 10 NFE relative to DPM-Solver++; on ImageNet-256, it reaches comparable high-quality sampling at 15 NFE where DPM-Solver++ needs 20 NFE (Li et al., 30 Sep 2025).
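The link between conditional variance and conditional entropy under a Gaussian approximation follows from the standard formula for Gaussian differential entropy. The notation below ($d$ for the dimension, $\sigma^2$ for an isotropic conditional variance, $x_{t-1}$ and $x_t$ for adjacent states) is an assumption chosen for illustration and is not taken from the paper.

```latex
% Assumption: the reverse transition is an isotropic Gaussian in d dimensions.
p(x_{t-1} \mid x_t) = \mathcal{N}\!\left(\mu(x_t),\, \sigma^2 I_d\right)
\quad\Longrightarrow\quad
h(x_{t-1} \mid x_t) = \frac{d}{2}\,\log\!\left(2\pi e\,\sigma^2\right).

% The entropy is strictly increasing in the variance:
\frac{\partial h}{\partial \sigma^2} = \frac{d}{2\sigma^2} > 0 .
```

Because the entropy is monotone in $\sigma^2$, any sampler choice that lowers the conditional variance also lowers the conditional entropy in this simplified setting.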

A complementary analysis appears in "Denoising Task Difficulty-based Curriculum for Training Diffusion Models" (Kim et al., 2024). That paper studies the KL divergence between consecutive forward-process marginals as a distribution-level measure of denoising difficulty. The empirical finding is that this relative entropy decreases as the timestep $t$ increases, so under the paper’s timestep convention smaller $t$ corresponds to harder denoising tasks. This aligns with slower convergence at small $t$ and motivates an easy-to-hard curriculum over timestep clusters. The reported gains include FID improvements on two FFHQ settings and on ImageNet (Kim et al., 2024).

In reinforcement learning for flow models, "E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models" (Zhang et al., 1 Jan 2026) defines denoising entropy as the differential entropy of a Gaussian reverse SDE transition. The paper argues that high-entropy steps enable more efficient and effective exploration, while low-entropy steps produce undistinguished roll-outs. Consecutive low-entropy steps are therefore merged into one larger stochastic step, and group-normalized advantages are computed only within samples sharing the same consolidated SDE step. The strongest ablation compares training on the first 8 denoising steps with training on the second 8 steps and on all 16 steps; the reported HPS comparison supports the claim that effective learning is concentrated in high-entropy denoising stages (Zhang et al., 1 Jan 2026).
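The idea of merging consecutive low-entropy steps can be sketched with the closed-form entropy of an isotropic Gaussian. Everything specific here is an assumption for illustration: the fixed threshold, the greedy grouping, and the rule that merged variances add (which holds for independent Gaussian increments) are not taken from the paper.

```python
import math

def gaussian_entropy(variance, dim):
    """Differential entropy (nats) of N(mu, variance * I_dim)."""
    return 0.5 * dim * math.log(2.0 * math.pi * math.e * variance)

def merge_low_entropy_steps(variances, dim, threshold):
    """Greedily merge runs of consecutive steps whose entropy is below
    `threshold`. Returns (start, end_exclusive, variance) groups.

    Assumption: noise increments are independent, so variances add.
    """
    groups = []
    run_start, run_var = None, 0.0
    for t, var in enumerate(variances):
        if gaussian_entropy(var, dim) < threshold:
            if run_start is None:
                run_start, run_var = t, 0.0
            run_var += var
        else:
            if run_start is not None:
                groups.append((run_start, t, run_var))
                run_start = None
            groups.append((t, t + 1, var))
    if run_start is not None:
        groups.append((run_start, len(variances), run_var))
    return groups

# Hypothetical per-step transition variances for a 4-step sampler.
step_variances = [1.0, 0.01, 0.01, 0.5]
groups = merge_low_entropy_steps(step_variances, dim=1, threshold=0.0)
```

Here the two middle steps have negative differential entropy and are consolidated into one group, while the high-entropy first and last steps remain separate stochastic steps.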

Taken together, these works indicate a common interpretation: entropy is a measure of how much uncertainty or exploration remains in a reverse step, and high-entropy stages are disproportionately important for both solver design and policy optimization.

4. Entropy as a denoising prior in representation space

A different tradition places entropy not on model-internal trajectories but on the representation being denoised. "TIDY: Thermal Infrared Image Denoising via Wavelet Domain Entropy and Directional Stripe Index" (Rhee et al., 18 Jun 2026) moves denoising into the wavelet domain and defines a Wavelet Entropy (WE) on normalized detail-coefficient magnitudes. Entropy is therefore computed over the distribution of wavelet-magnitude mass across the three directional detail subbands at each scale, not over pixel intensities. The paper’s motivation is that pixel-domain entropy conflates noise randomness with natural image intensity variations, whereas wavelet detail coefficients attenuate structural content and make entropy more selective for stochastic thermal noise. Because stripe-like fixed-pattern noise is not adequately captured by entropy, TIDY adds a separate Wavelet Directional Stripe Index (WDSI) and trains with a combined loss. The paper reports PSNR/SSIM gains on IRE from the added loss terms, while the full DWT + FiLM + WE + WDSI model gives the best SCaN-TIR result. The final model runs at about 34 Hz (Rhee et al., 18 Jun 2026).
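An entropy over the distribution of detail-coefficient magnitude across three directional subbands can be sketched with a one-level Haar transform in plain Python. This follows the prose description only: the Haar filter, the single scale, the normalization, and the function names are assumptions, and the paper's exact definition may differ.

```python
import math

def haar_detail_subbands(img):
    """One-level 2D Haar transform of an image given as a list of rows
    with even height and width. Returns the three detail subbands."""
    row_diff, col_diff, diag_diff = [], [], []
    for r in range(0, len(img), 2):
        out_r, out_c, out_d = [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            out_r.append((a + b - d - e) / 2.0)  # change across rows
            out_c.append((a - b + d - e) / 2.0)  # change across columns
            out_d.append((a - b - d + e) / 2.0)  # diagonal change
        row_diff.append(out_r)
        col_diff.append(out_c)
        diag_diff.append(out_d)
    return row_diff, col_diff, diag_diff

def wavelet_entropy(img):
    """Shannon entropy (nats) of magnitude mass across the 3 subbands."""
    masses = [sum(abs(v) for row in band for v in row)
              for band in haar_detail_subbands(img)]
    total = sum(masses)
    if total == 0.0:
        return 0.0  # flat image: no detail energy at all
    probs = [m / total for m in masses]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Vertical stripes put all detail mass in one subband.
stripes = [[1.0, 0.0, 1.0, 0.0] for _ in range(4)]
# A single bright pixel spreads mass equally over all three subbands.
impulse = [[1.0, 0.0], [0.0, 0.0]]
```

The stripe image has zero entropy under this measure because its detail energy is fully directional, which is consistent with the remark above that stripe-like fixed-pattern noise needs a separate directional index.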

"Perception-based Image Denoising via Generative Compression" (Nguyen et al., 12 Feb 2026) makes entropy the core denoising prior through entropy-coded latent representations. A lossy code ΔEntropy\Delta\mathrm{Entropy}8 induces a codebook ΔEntropy\Delta\mathrm{Entropy}9, and under additive Gaussian noise the compression-based ML denoiser becomes

KK0

The paper interprets this as denoising by projection onto a compressible signal class. In the conditional WGAN-based instantiation, the latent code cost is

KK1

and training minimizes

KK2

In the diffusion-based instantiation, the objective is

KK3

Here low code length is the denoising prior: structured image content is compressible, while nuisance noise is high-complexity and costly to represent. The paper also establishes a non-asymptotic AWGN bound stating that with probability at least KK4,

KK5

This makes the denoising error depend explicitly on compression distortion KK6, coding rate KK7, and noise strength KK8 (Nguyen et al., 12 Feb 2026).
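Denoising by projection onto a codebook reduces, under additive Gaussian noise, to a nearest-codeword search. The sketch below shows that search on a tiny hand-written codebook; the codebook, the observation, and the function name are hypothetical, and a practical system would search a learned latent space rather than enumerate codewords.

```python
def nearest_codeword(noisy, codebook):
    """Return the codeword with the smallest squared Euclidean distance
    to the noisy observation (the ML estimate under Gaussian noise)."""
    def squared_distance(codeword):
        return sum((y - c) ** 2 for y, c in zip(noisy, codeword))
    return min(codebook, key=squared_distance)

# Hypothetical 2-D codebook standing in for a compressible signal class.
codebook = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
noisy_observation = (0.9, 1.2)
denoised = nearest_codeword(noisy_observation, codebook)
```

A smaller codebook corresponds to a lower coding rate and a stronger prior, at the cost of higher compression distortion, which is the trade-off the bound above makes explicit.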

A cross-modal extension appears in "TextEconomizer: Enhancing Lossy Text Compression with Denoising Transformers and Entropy Coding" (Sobhani et al., 6 Jun 2026). There, denoising-trained encoder outputs are filtered to retain salient context vectors and then compressed with LZMA; at 20% Kizuki retention the reported compression ratio is 5.39×, while denoising training sharply improves BLEU, BERTScore, and perplexity relative to the no-denoising variant. This suggests that low-entropy bottlenecks can function as denoising priors beyond image restoration, although the paper does not formalize a general rate–distortion theorem for text (Sobhani et al., 6 Jun 2026).

5. Entropy for denoising assessment and diagnostics

Entropy is also used after denoising, as a diagnostic of whether structure has been preserved. "Image denoising assessment using anisotropic stack filtering" (Gabarda et al., 2011) defines a local directional Rényi entropy on thresholded binary stack levels and then measures anisotropy as the variation of directional entropy across orientations. The central claim is that meaningful image structure is anisotropic, while random noise is more isotropic. Therefore more noise implies less anisotropy, and better denoising implies larger anisotropy. On a real SAR image, the paper reports anisotropy values for the noisy input and for the Kuan filter, with Frost, SRAD, and relaxed median in between. The metric is thus proposed as a no-reference indicator of denoising quality (Gabarda et al., 2011).

"Recursive Threshold Median Filter and Autoencoder for Salt-and-Pepper Denoising: SSIM analysis of Images and Entropy Maps" (Boriskov et al., 15 Nov 2025) introduces an entropy-domain complement to image-domain SSIM. Entropy maps are computed with 2D Sample Entropy in sliding windows, and denoising quality is assessed by SSIM between restored and clean entropy maps, denoted SSIMMap. The paper’s key claim is that SSIMMap is more sensitive to blur and local intensity transitions than SSIMImg. The paper illustrates this with a low-resolution mushroom-edge example under salt-and-pepper noise, comparing how SSIMImg and SSIMMap change when moving from a single recursive median filter to the 2MF scheme. On the Lena image, the MFs-AE scheme gives the best reported SSIMImg and SSIMMap values. In this line of work, entropy is not the denoiser; it is the measurement domain in which blur, edge loss, and over-smoothing become easier to quantify (Boriskov et al., 15 Nov 2025).

These assessment-oriented papers make a different but important point. They imply that image-domain fidelity and entropy-domain fidelity are not equivalent. A restoration can look acceptable under grayscale SSIM while failing to preserve the local irregularity patterns that encode edges, textures, or anisotropic structure.

6. Terminological divergences, misconceptions, and theoretical extensions

One recurrent misconception is that every entropy-themed denoising paper directly optimizes Shannon entropy. That is not the case. "Noise Reversal by Entropy Quantum Computing" (Huang et al., 12 Feb 2025) uses “entropy quantum computing” to denote an open quantum/photonic optimization paradigm, but the paper explicitly does not derive a Shannon entropy, von Neumann entropy, KL divergence, maximum-entropy estimator, or explicit entropy functional for denoising. Its denoising formulation is instead a constrained combinatorial optimization over noise allocations, with a spatial-correlation cost on the residual (Huang et al., 12 Feb 2025).

A second ambiguity concerns cross-entropy losses. "On denoising autoencoders trained to minimise binary cross-entropy" (Creswell et al., 2017) is not a paper about Shannon entropy of the data distribution, but about BCE as a reconstruction objective. Its main theorem shows that under additive Gaussian corruption the optimal BCE-trained denoising autoencoder satisfies the same small-noise asymptotic relation as the MSE-trained case,

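$$r^*(x) - x \approx \sigma^2\,\nabla_x \log p(x),$$

where $r^*$ denotes the optimal reconstruction function and $\sigma^2$ the corruption variance, so reconstruction minus input still points toward higher-density regions of data space (Creswell et al., 2017). Relatedly, "AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation" (Lim et al., 2020) uses denoising to estimate the score $\nabla_x \log p(x)$, which is then inserted into the pathwise entropy-gradient identity

$$\nabla_\theta H(p_\theta) = -\,\mathbb{E}_{z}\Big[\big[\nabla_x \log p_\theta(x)\big]_{x = g_\theta(z)}^{\top}\,\nabla_\theta g_\theta(z)\Big],$$

where $x = g_\theta(z)$ is a reparameterized sample.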

There, denoising is a route to entropy-gradient estimation, not a direct entropy-minimization denoiser (Lim et al., 2020).
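The small-noise relation between the denoising residual and the score can be checked in a toy case where everything is available in closed form. The sketch below assumes one-dimensional Gaussian data and the MSE-optimal denoiser (the posterior mean), not a BCE-trained network; the function names and numerical values are illustrative.

```python
def gaussian_score(x, data_var):
    """Score d/dx log p(x) for p = N(0, data_var)."""
    return -x / data_var

def optimal_denoiser(x_noisy, data_var, noise_var):
    """Posterior mean E[x | x_noisy] for x ~ N(0, data_var) and
    x_noisy = x + N(0, noise_var); this is the MSE-optimal reconstruction."""
    return x_noisy * data_var / (data_var + noise_var)

x, data_var, noise_var = 2.0, 1.0, 1e-4
residual = optimal_denoiser(x, data_var, noise_var) - x
score_step = noise_var * gaussian_score(x, data_var)
```

For small `noise_var`, `residual` and `score_step` nearly coincide and both are negative at `x = 2.0`, meaning the reconstruction moves the point toward the mode at zero, where the density is higher.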

At the most theoretical end, "A Free Probabilistic Framework for Denoising Diffusion Models: Entropy, Transport, and Reverse Processes" (Das, 26 Oct 2025) lifts the denoising-entropy relation into free probability. The forward free Ornstein–Uhlenbeck process increases Voiculescu free entropy according to the free de Bruijn identity, while the reverse-time free SDE is driven by the conjugate variable.

This replaces Gaussian noising by semicircular noising, Shannon entropy by free entropy, and the classical score by the conjugate variable (Das, 26 Oct 2025).

The breadth of these formulations suggests that denoising entropy is best understood as a family of entropy-mediated priors, control signals, and diagnostics rather than as a single invariant quantity. What unifies the field is the repeated use of entropy-like measures to discriminate structured, recoverable signal from randomness, ambiguity, or redundant computation. What remains variable is where that distinction is imposed: on trajectories, transitions, latents, wavelet subbands, or evaluation maps.
