WaveDM: Wavelet Diffusion Models

Updated 9 February 2026

WaveDM is a family of models that use wavelet-domain representations combined with diffusion processes to enable efficient image restoration and signal processing.
The approach decomposes images into low- and high-frequency bands via a multi-level 2D wavelet transform, which reduces computational load and allows for frequency-specific processing.
It integrates a conditional diffusion process for low-frequency recovery with a high-frequency refinement module, achieving state-of-the-art performance and significant inference speed improvements.

WaveDM refers to a family of models and methodologies that leverage wavelet-domain representations in combination with diffusion processes or related signal processing architectures for efficient, high-fidelity data modeling, restoration, and communication. Across applications including image restoration, signal multiplexing, physical modeling, and dynamic system identification, WaveDM exploits the multiresolution structure inherent to the wavelet transform to accelerate computation, specialize processing by frequency band, or enhance statistical efficiency. Here, the primary focus is the wavelet-based diffusion model for image restoration as exemplified in "WaveDM: Wavelet-Based Diffusion Models for Image Restoration" (Huang et al., 2023).

1. Core Principles and Model Formulation

WaveDM frames image restoration as conditional generation of clean images in the wavelet domain, given the degraded image’s wavelet spectrum. The method applies a multi-level 2D fast wavelet packet transform (FWPT) to the input, decomposing it into low- and high-frequency bands:

Low-frequency bands ( $x_0^l$ ): Carry coarse structural and color information.
High-frequency bands ( $x_0^h$ ): Encode texture, edges, and fine details.

The model trains:

A conditional diffusion process over the low-frequency coefficients, modeling $p(x_0^l \mid x_d)$ where $x_d$ is the degraded image's wavelet spectrum, using a denoising diffusion probabilistic model (DDPM).
A high-frequency refinement module (HFRM) to estimate high-frequency components in a single forward pass, leveraging the redundancy and sparsity of these bands for computational efficiency.

The full image is reconstructed by concatenating the generated low-frequency bands with the refined high-frequency bands and applying the inverse FWPT.

2. Wavelet Domain Representation and Processing

Adopting a multi-level 2D wavelet transform, WaveDM converts an RGB image $X \in \mathbb{R}^{H \times W \times 3}$ at level $k$ into $K=4^k$ subbands of size $(H/2^k) \times (W/2^k) \times 3$ , typically using the Haar basis:

For a degraded image $X_d$ and its clean counterpart $X_0$, the spectra are $x_d = \text{FWPT}_{2D}(X_d)$ and $x_0 = \text{FWPT}_{2D}(X_0)$ .

The spectrum is partitioned:

$x_0 = [\,x_0^l\ |\ x_0^h\,]$ , $x_d = [\,x_d^l\ |\ x_d^h\,]$ , with $x_0^l \in \mathbb{R}^{H/4 \times W/4 \times 3}$ and $x_0^h \in \mathbb{R}^{H/4 \times W/4 \times 45}$ for $k=2$ .

This decomposition confers several advantages:

Reduced computational cost: The diffusion model operates on $1/16$ of the original spatial area (for $k=2$ ), translating to $\sim16\times$ inference acceleration per step.
Separation of structure and detail: Enables tailored neural architectures for low-vs-high frequency processing.
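
The decomposition and the low/high split described above can be illustrated with a minimal NumPy sketch. It assumes an orthonormal Haar filter and a particular subband ordering (the fully low-pass band first); the function names `haar_step`, `fwpt2d`, and `ifwpt2d` are hypothetical, and the ordering used in the source paper may differ.

```python
import numpy as np

def haar_step(x):
    """One 2D Haar analysis step: (H, W, C) -> four (H/2, W/2, C) bands."""
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass in both directions
    lh = (a - b + c - d) / 2.0  # difference across columns
    hl = (a + b - c - d) / 2.0  # difference across rows
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return [ll, lh, hl, hh]

def haar_step_inverse(ll, lh, hl, hh):
    """Exact inverse of haar_step."""
    h, w, ch = ll.shape
    x = np.empty((2 * h, 2 * w, ch), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def fwpt2d(x, k):
    """Level-k wavelet packet transform: every band is split again at each level."""
    bands = [x]
    for _ in range(k):
        bands = [sub for band in bands for sub in haar_step(band)]
    return np.concatenate(bands, axis=-1)  # (H/2^k, W/2^k, C * 4^k)

def ifwpt2d(spec, k, channels=3):
    """Inverse of fwpt2d for an image with `channels` color channels."""
    bands = np.split(spec, spec.shape[-1] // channels, axis=-1)
    for _ in range(k):
        bands = [haar_step_inverse(*bands[i:i + 4]) for i in range(0, len(bands), 4)]
    return bands[0]

rng = np.random.default_rng(0)
X = rng.random((8, 12, 3))                     # toy RGB image, H=8, W=12
spec = fwpt2d(X, k=2)                          # 16 subbands -> 48 channels
x_low, x_high = spec[..., :3], spec[..., 3:]   # 3 low-frequency, 45 high-frequency channels
X_rec = ifwpt2d(spec, k=2)
```

For $k=2$ the spectrum has $16 \times 3 = 48$ channels at a quarter of the original height and width, which matches the 3-channel and 45-channel shapes given above.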

3. Conditional Diffusion and Frequency-Specific Modules

3.1. Diffusion Process in the Wavelet Domain

Noising is applied only to the low-frequency (structural) bands:

$q(x_t^l \mid x_{t-1}^l) = \mathcal{N}( x_t^l; \sqrt{1-\beta_t}x_{t-1}^l,\, \beta_t I )$

for $t = 1,\dots,T$ (typically $T=1000$ ), where the cumulative noise schedule is $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$ .
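
As in standard DDPM, the step-wise kernel above implies the closed form $x_t^l = \sqrt{\bar{\alpha}_t}\,x_0^l + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with $\bar{\alpha}_t = \prod_{s \le t}(1-\beta_s)$, so a noisy sample at any timestep can be drawn in one shot. The sketch below assumes a linear $\beta$ schedule between $10^{-4}$ and $0.02$ (the schedule shape is an assumption; only its range is stated here), and `q_sample` is a hypothetical helper name.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed linear schedule over the stated range
alpha_bar = np.cumprod(1.0 - betas)   # alpha_bar[t - 1] holds \bar{alpha}_t

def q_sample(x0_low, t, rng):
    """Draw x_t^l from q(x_t^l | x_0^l) in closed form; also return the noise used."""
    eps = rng.standard_normal(x0_low.shape)
    ab = alpha_bar[t - 1]
    x_t = np.sqrt(ab) * x0_low + np.sqrt(1.0 - ab) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0_low = rng.random((4, 4, 3))        # toy low-frequency band
x_t, eps = q_sample(x0_low, t=500, rng=rng)
```

Only the 3-channel low-frequency band is noised; the high-frequency channels never enter the diffusion chain.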

Reverse denoising, conditional on degraded input $x_d$ and refined high-frequency estimate $\tilde{x}_0^h$ , is performed by a U-Net:

$\epsilon_\theta = \epsilon_\theta( x_t^l,\ x_d,\ \tilde{x}_0^h,\ t )$

3.2. High-Frequency Refinement

The HFRM (High-Frequency Refinement Module) is a compact U-Net that maps the degraded spectrum $x_d$ directly to a refined estimate of the high-frequency bands $x_0^h$ in a single pass:

$\tilde{x}_0^h = \text{HFRM}( x_d )$

3.3. Training Objective

Joint optimization of diffusion and refinement modules:

$L_{\text{simple}} = \mathbb{E}_{x_0^l,\epsilon,t} \left[ \|\epsilon - \epsilon_\theta( \sqrt{\bar{\alpha}_t}x_0^l + \sqrt{1-\bar{\alpha}_t}\epsilon,\ x_d,\ \tilde{x}_0^h,\ t )\|^2 \right]$

$L_1 = \|\tilde{x}_0^h - x_0^h\|_1$

$L_{\text{total}} = L_{\text{simple}} + \lambda L_1 \qquad (\lambda=1)$

Training runs for $2\times10^6$ iterations with the Adam optimizer (learning rate $4\times10^{-4}$ ) and a $\beta_t$ schedule ranging from $10^{-4}$ to $0.02$ .
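
The combined objective can be sketched for a single training example as follows. This is an illustration under stated assumptions, not the authors' implementation: `eps_model` and `hfrm` are hypothetical callables standing in for the two U-Nets, the losses are averaged over elements (the normalization is an assumption), and the linear $\beta$ schedule is assumed.

```python
import numpy as np

def wavedm_loss(eps_model, hfrm, x0_low, x0_high, x_d, t, alpha_bar, rng, lam=1.0):
    """L_total = L_simple + lam * L_1 for one example at timestep t."""
    x0_high_hat = hfrm(x_d)                           # single-pass high-frequency estimate
    eps = rng.standard_normal(x0_low.shape)           # target noise
    ab = alpha_bar[t - 1]
    x_t = np.sqrt(ab) * x0_low + np.sqrt(1.0 - ab) * eps
    eps_hat = eps_model(x_t, x_d, x0_high_hat, t)     # predicted noise
    l_simple = np.mean((eps - eps_hat) ** 2)          # noise-prediction term
    l1 = np.mean(np.abs(x0_high_hat - x0_high))       # high-frequency L1 term
    return l_simple + lam * l1

T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # assumed linear schedule

rng = np.random.default_rng(0)
x_d = rng.random((4, 4, 48))                          # toy degraded spectrum
x0 = rng.random((4, 4, 48))                           # toy clean spectrum

# Stand-in networks, used only to exercise the function.
zero_eps = lambda x_t, x_d, x_high, t: np.zeros_like(x_t)
copy_high = lambda x_d: x_d[..., 3:]

loss = wavedm_loss(zero_eps, copy_high, x0[..., :3], x0[..., 3:], x_d,
                   t=500, alpha_bar=alpha_bar, rng=rng)
```

In practice the timestep would be sampled per example and the loss backpropagated through both modules; those details are omitted here.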

4. Efficient Conditional Sampling (ECS) and Inference Acceleration

ECS is an inference strategy that leverages the rapid convergence of low-frequency restoration in the wavelet domain:

DDIM (Denoising Diffusion Implicit Models) sampling runs from $t=T$ down to an intermediate timestep $M$ (typically $M=600$ with $T=1000$ ), by which point the denoised low-frequency bands are sufficiently accurate because they are conditioned on both the degraded wavelet spectrum and the HFRM output.
The algorithm then skips the remaining timesteps and computes the final clean low-frequency coefficients directly with a closed-form denoising formula:

$\hat{x}_0^l = ( x_M^l - \sqrt{1-\bar{\alpha}_M}\ \epsilon_M )/\sqrt{\bar{\alpha}_M}$

where $\epsilon_M = \epsilon_\theta(x_M^l, x_d, \tilde{x}_0^h, M)$ .

Output is reconstructed by inverse FWPT: $X_0 = \text{IFWPT}_{2D}(\hat{x}_0^l,\, \tilde{x}_0^h)$ .

Empirically, ECS achieves comparable or superior results to full 25-step DDIM chains, with as few as 4–8 steps in high-resolution restoration.
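
The sampling procedure in this section can be sketched as deterministic DDIM updates (assuming $\eta = 0$) followed by the closed-form jump at timestep $M$. The timestep spacing, the linear $\beta$ schedule, and the names `ecs_sample` and `oracle` are assumptions made for illustration; `eps_model` is a hypothetical stand-in for the trained noise predictor.

```python
import numpy as np

def ecs_sample(eps_model, x_d, x0_high_hat, alpha_bar, timesteps, shape, rng):
    """timesteps: decreasing timestep list, e.g. [1000, 900, 800, 700, 600]; the last entry is M."""
    x = rng.standard_normal(shape)                        # x_T^l ~ N(0, I)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        ab, ab_next = alpha_bar[t - 1], alpha_bar[t_next - 1]
        eps = eps_model(x, x_d, x0_high_hat, t)
        x0_pred = (x - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
        # Deterministic DDIM update (eta = 0) from timestep t to t_next.
        x = np.sqrt(ab_next) * x0_pred + np.sqrt(1.0 - ab_next) * eps
    m = timesteps[-1]
    ab_m = alpha_bar[m - 1]
    eps_m = eps_model(x, x_d, x0_high_hat, m)
    # Closed-form jump from x_M^l to the clean low-frequency estimate.
    return (x - np.sqrt(1.0 - ab_m) * eps_m) / np.sqrt(ab_m)

T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # assumed linear schedule

# Toy check: an oracle noise predictor that already knows the clean target.
target = np.random.default_rng(1).random((4, 4, 3))

def oracle(x_t, x_d, x_high, t):
    ab = alpha_bar[t - 1]
    return (x_t - np.sqrt(ab) * target) / np.sqrt(1.0 - ab)

x0_low_hat = ecs_sample(oracle, None, None, alpha_bar,
                        [1000, 900, 800, 700, 600], target.shape,
                        np.random.default_rng(0))
```

With five timesteps the sketch makes five network evaluations: four DDIM updates and one final prediction at $M$. The oracle only demonstrates that the update and the closed-form jump are mutually consistent; it says nothing about the accuracy of a trained model.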

5. Architecture and Complexity Analysis

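The two U-Nets are the HFRM and the noise-estimation network (NEN), i.e. $\epsilon_\theta$ :

Module	Base Channels	Channel Multipliers	Key Features
HFRM	32	1,2,4,8,16	1 residual block/scale; 48 input channels, 45 output channels
NEN	128	1,1,2,2,4,4	2 residual blocks/scale; attention; 512-D timestep embedding; inputs $x_t^l$ , $x_d$ , $\tilde{x}_0^h$ (96 channels in total)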

Operating in the wavelet domain reduces U-Net FLOPs per step by $\sim 16\times$ compared to pixel-wise DDPM on the full image.
Empirical timings on a 720 $\times$ 480 image: PatchDM (25 steps, patch-wise) $\approx 61$ s, one-pass SOTA CNN $\approx 0.4$ s, WaveDM (8 ECS steps) $\approx 0.3$ s.
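
The channel counts and the spatial reduction quoted in this section follow from simple bookkeeping, shown here as a short sketch. The helper name `subband_layout` is hypothetical, and the 720 $\times$ 480 image is taken as width $\times$ height.

```python
def subband_layout(height, width, k, image_channels=3):
    """Shapes implied by a level-k 2D wavelet packet transform of an RGB image."""
    n_sub = 4 ** k                                   # number of subbands
    return {
        "spatial": (height // 2 ** k, width // 2 ** k),
        "low_channels": image_channels,              # one low-frequency subband
        "high_channels": image_channels * (n_sub - 1),
        "total_channels": image_channels * n_sub,
        "area_reduction": n_sub,                     # (2^k)^2 fewer spatial positions
    }

layout = subband_layout(480, 720, k=2)

# HFRM: full degraded spectrum in, high-frequency bands out.
hfrm_in, hfrm_out = layout["total_channels"], layout["high_channels"]

# NEN input: noisy low band + degraded spectrum + refined high bands.
nen_in = layout["low_channels"] + layout["total_channels"] + layout["high_channels"]
```

For $k=2$ this gives 48 input and 45 output channels for the HFRM, 96 input channels for the NEN, and a $16\times$ reduction in spatial positions, in agreement with the table and the FLOP estimate above.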

6. Quantitative and Qualitative Results

WaveDM attains state-of-the-art or competitive performance on multiple restoration benchmarks across tasks such as raindrop removal, rain-streak removal, dehazing, defocus deblurring, demoiréing, and denoising. Representative results, selected from Table 1 of Huang et al. (2023), are listed below; each cell reports PSNR (dB) / SSIM / inference time:

Task	One-pass best (PSNR/SSIM/time)	PatchDM, 25 steps (PSNR/SSIM/time)	WaveDM, ECS (PSNR/SSIM/time)
Raindrop	31.87/0.931/0.39 s	32.31/0.946/301 s	32.25/0.948/0.30 s
Dehazing	34.95/0.984/0.14 s	35.52/0.989/19.3 s	37.00/0.994/0.15 s
Real denoise	40.02/0.960/0.114 s	39.86/0.959/9.33 s	40.38/0.962/0.062 s
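
The speedup over PatchDM implied by the timings in this table can be computed directly. The snippet below uses only the numbers listed above; the variable names are illustrative.

```python
# (PatchDM time, WaveDM time) in seconds, copied from the table above.
timings = {
    "Raindrop": (301.0, 0.30),
    "Dehazing": (19.3, 0.15),
    "Real denoise": (9.33, 0.062),
}

# Ratio of PatchDM to WaveDM inference time per task.
speedup = {task: patchdm / wavedm for task, (patchdm, wavedm) in timings.items()}

for task, ratio in speedup.items():
    print(f"{task}: {ratio:.0f}x faster than PatchDM")
```

On these three rows the ratio ranges from roughly 130 to roughly 1000.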

For Gaussian denoising at $\sigma=50$ on the Set12 dataset, for example, WaveDM achieves 28.44 dB PSNR, outperforming MWCNN, SwinIR, and Restormer.

Qualitatively, WaveDM produces sharper textures and more accurate edges across tasks. Figures in the source paper show details and textures that competing methods fail to preserve.

7. Limitations, Open Challenges, and Future Directions

Training cost: Full training requires several days on multi-GPU clusters for $\sim$ 2 million iterations.
Potential improvements:
- Knowledge distillation or preconditioned architectures to accelerate convergence.
- Exploring alternative wavelet bases (e.g., Daubechies, biorthogonal) for domain-specific tailoring.
- Extending the framework to video or more complex inverse problems.
Wavelet selection and resolution: The trade-off between compression (higher $k$ ) and preservation of spatial detail remains a hyperparameter to optimize.
Current scope: The architecture is primarily optimized for image restoration; direct extensions to segmentation or generation are prospective.

WaveDM demonstrates that shifting conditional denoising diffusion to the wavelet domain and employing a dual-module approach with efficient conditional sampling can achieve high restoration fidelity together with inference more than $100\times$ faster than vanilla, patch-based diffusion models (Huang et al., 2023).

References

Huang et al. (2023). WaveDM: Wavelet-Based Diffusion Models for Image Restoration.
