
WaveDM: Wavelet Diffusion Models

Updated 9 February 2026
  • WaveDM is a family of models that use wavelet-domain representations combined with diffusion processes to enable efficient image restoration and signal processing.
  • The approach decomposes images into low- and high-frequency bands via a multi-level 2D wavelet transform, which reduces computational load and allows for frequency-specific processing.
  • It integrates a conditional diffusion process for low-frequency recovery with a high-frequency refinement module, achieving state-of-the-art performance and significant inference speed improvements.

WaveDM refers to a family of models and methodologies that leverage wavelet-domain representations in combination with diffusion processes or related signal processing architectures for efficient, high-fidelity data modeling, restoration, and communication. Across applications including image restoration, signal multiplexing, physical modeling, and dynamic system identification, WaveDM exploits the multiresolution structure inherent to the wavelet transform to accelerate computation, specialize processing by frequency band, or enhance statistical efficiency. Here, the primary focus is the wavelet-based diffusion model for image restoration as exemplified in "WaveDM: Wavelet-Based Diffusion Models for Image Restoration" (Huang et al., 2023).

1. Core Principles and Model Formulation

WaveDM frames image restoration as conditional generation of clean images in the wavelet domain, given the degraded image’s wavelet spectrum. The method applies a multi-level 2D discrete wavelet packet transform (FWPT) to the input, decomposing images into low- and high-frequency spectral components:

  • Low-frequency bands ($x_0^l$): Carry coarse structural and color information.
  • High-frequency bands ($x_0^h$): Encode texture, edges, and fine details.

The model trains:

  • A conditional diffusion process over the low-frequency coefficients, effectively modeling $p(x_0^l \mid y_w)$, where $y_w$ is the degraded wavelet spectrum, trained as a denoising diffusion probabilistic model (DDPM).
  • A high-frequency refinement module (HFRM) to estimate high-frequency components in a single forward pass, leveraging the redundancy and sparsity of these bands for computational efficiency.

The full image is reconstructed by concatenating the generated low-frequency and refined high-frequency bands and applying the inverse FWPT.

2. Wavelet Domain Representation and Processing

Adopting a multi-level 2D wavelet transform, WaveDM converts an RGB image $X \in \mathbb{R}^{H \times W \times 3}$ at level $k$ into $K = 4^k$ subbands of size $(H/2^k) \times (W/2^k) \times 3$, typically using the Haar basis:

  • $x_d = \text{FWPT}_{2D}(X_d)$, $x_0 = \text{FWPT}_{2D}(X_0)$.

The spectrum is partitioned:

  • $x_0 = [\,x_0^l \mid x_0^h\,]$, $x_d = [\,x_d^l \mid x_d^h\,]$, with $x_0^l \in \mathbb{R}^{H/4 \times W/4 \times 3}$ and $x_0^h \in \mathbb{R}^{H/4 \times W/4 \times 45}$ for $k = 2$.

This decomposition confers several advantages:

  • Reduced computational cost: the diffusion model operates on $1/16$ of the spatial area (for $k = 2$), translating to $\sim 16\times$ inference acceleration per step.
  • Separation of structure and detail: Enables tailored neural architectures for low-vs-high frequency processing.
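The decomposition described above can be sketched with a hand-rolled orthonormal Haar packet transform. This is a minimal stand-in for the paper's FWPT, not its implementation; PyWavelets' `WaveletPacket2D` provides an equivalent off-the-shelf routine.

```python
import numpy as np

def haar2d(x):
    """One level of the orthonormal 2D Haar transform: returns (a, h, v, d) subbands."""
    x00, x01 = x[0::2, 0::2], x[0::2, 1::2]
    x10, x11 = x[1::2, 0::2], x[1::2, 1::2]
    a = (x00 + x01 + x10 + x11) / 2   # low-frequency (approximation)
    h = (x00 - x01 + x10 - x11) / 2   # horizontal detail
    v = (x00 + x01 - x10 - x11) / 2   # vertical detail
    d = (x00 - x01 - x10 + x11) / 2   # diagonal detail
    return a, h, v, d

def fwpt2d(x, levels=2):
    """Wavelet packet transform: every subband is split again at every level."""
    bands = [x]
    for _ in range(levels):
        bands = [b for band in bands for b in haar2d(band)]
    return bands

H = W = 64
img = np.random.rand(H, W)           # one channel; RGB is transformed channel-wise
bands = fwpt2d(img, levels=2)        # 4^2 = 16 subbands of size (H/4, W/4)
print(len(bands), bands[0].shape)    # 16 (16, 16)
# bands[0] is the low-frequency band x_0^l; the remaining 15 form x_0^h.
```

Because the transform is orthonormal, the subband energies sum to the image energy, and inversion simply applies the transposed butterfly per $2{\times}2$ block.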

3. Conditional Diffusion and Frequency-Specific Modules

3.1. Diffusion Process in the Wavelet Domain

Noising is applied only to the low-frequency (structural) bands:

$$q(x_t^l \mid x_{t-1}^l) = \mathcal{N}\big(x_t^l;\ \sqrt{1-\beta_t}\,x_{t-1}^l,\ \beta_t I\big)$$

for $t = 1, \dots, T$, where typically $T = 1000$, with the cumulative noise schedule $\bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)$.

Reverse denoising, conditioned on the degraded spectrum $x_d$ and the refined high-frequency estimate $\tilde{x}_0^h$, is performed by a U-Net noise estimator:

$$\epsilon_\theta = \epsilon_\theta\big(x_t^l,\ x_d,\ \tilde{x}_0^h,\ t\big)$$
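A minimal numpy sketch of the forward noising and the conditioning stack, assuming the standard DDPM closed form $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$. Shapes follow the $k=2$ setting (and the 96-channel input listed in Section 5); all arrays here are random stand-ins, not real data.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # beta schedule over [1e-4, 0.02]
alphas_bar = np.cumprod(1.0 - betas)      # cumulative \bar{alpha}_t

# Stand-ins for wavelet-domain tensors at k=2 (16x16 spatial):
x0_l = np.random.rand(16, 16, 3)          # clean low-frequency bands
x_d = np.random.rand(16, 16, 48)          # degraded full spectrum (16 bands x 3 ch)
x0h_tilde = np.random.rand(16, 16, 45)    # HFRM's high-frequency estimate

# Forward noising of the low-frequency bands only (closed form):
t = 500
eps = np.random.randn(*x0_l.shape)
x_t = np.sqrt(alphas_bar[t]) * x0_l + np.sqrt(1 - alphas_bar[t]) * eps

# The noise-estimation U-Net sees the concatenation of all three conditions:
cond = np.concatenate([x_t, x_d, x0h_tilde], axis=-1)
print(cond.shape)  # (16, 16, 96)
```

The 96 input channels decompose as $3$ (noised $x_t^l$) $+ 48$ (degraded spectrum $x_d$) $+ 45$ (refined $\tilde{x}_0^h$).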

3.2. High-Frequency Refinement

The HFRM (High-Frequency Refinement Module) is a compact U-Net that maps the degraded spectrum $x_d$ directly to a refined estimate of the high-frequency bands $x_0^h$ in a single pass:

$$\tilde{x}_0^h = \text{HFRM}(x_d)$$

3.3. Training Objective

Joint optimization of diffusion and refinement modules:

$$L_{\text{simple}} = \mathbb{E}_{x_0^l, \epsilon, t}\left[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0^l + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ x_d,\ \tilde{x}_0^h,\ t\big)\big\|^2\right]$$

$$L_1 = \|\tilde{x}_0^h - x_0^h\|_1$$

$$L_{\text{total}} = L_{\text{simple}} + \lambda L_1 \qquad (\lambda = 1)$$

Training uses $2 \times 10^6$ iterations, the Adam optimizer (learning rate $4 \times 10^{-4}$), and a $\beta$ schedule in $[10^{-4}, 0.02]$.
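The joint objective can be sketched as follows, with random arrays standing in for the network outputs. The mean reduction used for both norms here is an assumption for illustration; the paper's exact reduction (sum vs. mean) may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

x0_l = rng.standard_normal((16, 16, 3))                    # clean low-frequency bands
x0_h = rng.standard_normal((16, 16, 45))                   # clean high-frequency bands
x0h_tilde = x0_h + 0.1 * rng.standard_normal(x0_h.shape)   # stand-in for HFRM(x_d)

# Sample a timestep and noise the low-frequency bands (closed form):
t = int(rng.integers(T))
eps = rng.standard_normal(x0_l.shape)
x_t = np.sqrt(alphas_bar[t]) * x0_l + np.sqrt(1 - alphas_bar[t]) * eps

eps_theta = eps + 0.05 * rng.standard_normal(eps.shape)    # stand-in for the U-Net output
L_simple = np.mean((eps - eps_theta) ** 2)                 # diffusion loss
L1 = np.mean(np.abs(x0h_tilde - x0_h))                     # HFRM refinement loss
L_total = L_simple + 1.0 * L1                              # lambda = 1
```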

4. Efficient Conditional Sampling (ECS) and Inference Acceleration

ECS is an inference strategy that leverages the rapid convergence of low-frequency restoration in the wavelet domain:

  • After a moderate number of DDIM (Denoising Diffusion Implicit Models) steps (typically $M = 600$ out of $T = 1000$), the denoised low-frequency bands are sufficiently accurate when conditioned on both the degraded wavelet spectrum and the HFRM output.
  • The algorithm then skips to direct calculation of the final clean low-frequency coefficients using a closed-form denoising formula:

$$\hat{x}_0^l = \big(x_M^l - \sqrt{1-\bar{\alpha}_M}\,\epsilon_M\big)\big/\sqrt{\bar{\alpha}_M}$$

where $\epsilon_M = \epsilon_\theta(x_M^l, x_d, \tilde{x}_0^h, M)$.

  • Output is reconstructed by the inverse FWPT: $X_0 = \text{IFWPT}_{2D}(\hat{x}_0^l,\ \tilde{x}_0^h)$.
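When the noise estimate is exact, the ECS jump is an algebraic inversion of the forward process. This sketch verifies that identity numerically, with a random array standing in for the low-frequency bands and the true noise substituted for $\epsilon_M$:

```python
import numpy as np

T, M = 1000, 600
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

x0_l = np.random.rand(16, 16, 3)             # clean low-frequency bands (stand-in)
eps = np.random.randn(*x0_l.shape)
# Forward process to step M (closed form):
x_M = np.sqrt(alphas_bar[M]) * x0_l + np.sqrt(1 - alphas_bar[M]) * eps

# ECS jump: with a perfect noise estimate eps_M = eps, the closed-form
# denoising recovers x_0^l exactly in a single step.
x0_hat = (x_M - np.sqrt(1 - alphas_bar[M]) * eps) / np.sqrt(alphas_bar[M])
print(np.allclose(x0_hat, x0_l))  # True
```

In practice $\epsilon_M$ comes from the trained network, so the jump is approximate; the paper's observation is that after $M$ conditioned steps the approximation error is already small.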

Empirically, ECS achieves comparable or superior results to full 25-step DDIM chains, with as few as 4–8 steps in high-resolution restoration.

5. Architecture and Complexity Analysis

| Module | Base Channels | Channel Multipliers | Key Features |
| --- | --- | --- | --- |
| HFRM | 32 | 1, 2, 4, 8, 16 | 1 residual block per scale; 48 input, 45 output channels |
| NEN | 128 | 1, 1, 2, 2, 4, 4 | 2 residual blocks per scale, attention, 512-D $t$-embedding; inputs $x_t^l$, $x_d$, $\tilde{x}_0^h$ (96 channels) |
  • Operating in the wavelet domain reduces U-Net FLOPs per step by $\sim 16\times$ compared to pixel-wise DDPM on the full image.
  • Empirical timings on a 720$\times$480 image: PatchDM (25 steps, patch-wise) $\approx 61$ s, one-pass SOTA CNN $\approx 0.4$ s, WaveDM (8 ECS steps) $\approx 0.3$ s.

6. Quantitative and Qualitative Results

WaveDM attains state-of-the-art performance on multiple restoration benchmarks across tasks such as raindrop removal, rain-streak removal, dehazing, defocus deblurring, demoiréing, and denoising. Representative PSNR/SSIM/time metrics (selected from Table 1 of Huang et al., 2023):

| Task | One-pass best (PSNR/SSIM/Time) | PatchDM (25 steps) | WaveDM (ECS) |
| --- | --- | --- | --- |
| Raindrop | 31.87 / 0.931 / 0.39 s | 32.31 / 0.946 / 301 s | 32.25 / 0.948 / 0.30 s |
| Dehazing | 34.95 / 0.984 / 0.14 s | 35.52 / 0.989 / 19.3 s | 37.00 / 0.994 / 0.15 s |
| Real denoise | 40.02 / 0.960 / 0.114 s | 39.86 / 0.959 / 9.33 s | 40.38 / 0.962 / 0.062 s |

For Gaussian denoising at $\sigma = 50$ (e.g., on the Set12 dataset), WaveDM achieves 28.44 dB, outperforming MWCNN, SwinIR, and Restormer.

WaveDM also yields clear qualitative improvements: sharper textures and more accurate edge recovery across all tasks. Figures in the source paper show preservation of detail and texture not observed in competing methods.

7. Limitations, Open Challenges, and Future Directions

  • Training cost: Full training requires several days on multi-GPU clusters for $\sim 2$ million iterations.
  • Potential improvements:
    • Knowledge distillation or preconditioned architectures to accelerate convergence.
    • Exploring alternative wavelet bases (e.g., Daubechies, biorthogonal) for domain-specific tailoring.
    • Extending the framework to video or more complex inverse problems.
  • Wavelet selection and resolution: The trade-off between compression (higher $k$) and preservation of spatial detail remains a hyperparameter to optimize.
  • Current scope: The architecture is primarily optimized for image restoration; direct extensions to segmentation or generation are prospective.

WaveDM demonstrates that shifting conditional denoising diffusion to the wavelet domain and employing a dual-module approach with efficient conditional sampling can achieve high restoration fidelity and inference speed, with more than $100\times$ improvement over vanilla, patch-based diffusion models (Huang et al., 2023).
