MedDX-FT: Frequency Decoupled Diffusion Model

Updated 9 December 2025

The paper shows that FDDM, by integrating a shared-latent VAE with a frequency-decoupled dual-path diffusion process, significantly improves anatomical fidelity and realism in MR-to-CT translation.
It utilizes a two-stage pipeline with Sobel-based edge detection and Laplacian pyramid fusion to effectively separate and recombine frequency details.
Quantitative results demonstrate FDDM’s superiority over competing models in FID, SSIM, and PSNR, validating its practical benefits for unsupervised image translation.

MedDX-FT (Frequency Decoupled Diffusion Model, FDDM) is an unsupervised framework for MR-to-CT medical image translation, distinguished by its frequency-decoupled dual-path diffusion process and structural guidance from a VAE-based module. FDDM addresses limitations in prior diffusion-based and generative adversarial models, focusing explicitly on anatomical faithfulness and realism when translating between unpaired MR and CT datasets (Li et al., 2023).

1. Architectural Overview

FDDM employs a two-stage pipeline integrating variational autoencoding and frequency-based conditional denoising. The first stage uses a shared-latent-space VAE (in the style of UNIT) that receives an MR image $X$ and its Sobel-derived edge map $S(X)$, and produces a coarse CT prediction $\hat{Y}$ and corresponding edges $S(\hat{Y})$. Key losses include VAE reconstruction, Kullback-Leibler divergence, adversarial (GAN) discrimination, cycle-consistency, and rotation-consistency.
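The edge-augmented input $[X, S(X)]$ can be illustrated with a minimal NumPy sketch. The helper names `sobel_edges` and `stack_with_edges` are hypothetical, and the edge padding and gradient-magnitude form are assumptions; the source does not specify how the paper normalizes or thresholds the Sobel response.

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Return the Sobel gradient magnitude of a 2-D image (same shape as input)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    # Replicate border pixels so the output keeps the input size (assumption).
    padded = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)

def stack_with_edges(img: np.ndarray) -> np.ndarray:
    """Build the two-channel array [X, S(X)] with channels on axis 0."""
    return np.stack([img.astype(float), sobel_edges(img)], axis=0)

# Toy stand-in for an MR slice: a vertical intensity step.
x = np.zeros((8, 8))
x[:, 4:] = 1.0
a = stack_with_edges(x)  # a[0] is the image, a[1] is its edge map
```

In this toy case the edge channel is nonzero only in the two columns adjacent to the step, which is the kind of structural cue the first stage receives alongside the image.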

The second stage introduces a frequency-decoupled diffusion model. The forward diffusion process uses “blue-noise” perturbation, which enforces a low-pass filter effect: high-frequency components in $\hat{Y}$ are corrupted more heavily, yielding $y_{T_s}$. Reverse diffusion proceeds via two parallel branches:

Explicit (high-frequency) path: stochastic denoising with random noise
Implicit (low-frequency) path: deterministic denoising

At each timestep, predictions from both paths are fused using a Laplacian pyramid, which separates frequency bands so they can be recombined faithfully. The paper details the process in stepwise pseudocode covering initialization, edge thresholding, Laplacian pyramid generation, and the dual-path updates.
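The Laplacian-pyramid fusion described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it uses 2x2 average pooling and nearest-neighbour upsampling instead of Gaussian filtering, and it assumes the fused image takes its detail layers from the high-frequency prediction and its coarse base from the low-frequency prediction. All function names are hypothetical.

```python
import numpy as np

def down(img: np.ndarray) -> np.ndarray:
    """Halve resolution by 2x2 average pooling (a crude low-pass filter)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img: np.ndarray) -> np.ndarray:
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img: np.ndarray, levels: int):
    """Return (detail layers ordered fine to coarse, low-pass residual)."""
    details, current = [], img.astype(float)
    for _ in range(levels):
        low = down(current)
        details.append(current - up(low))  # band-pass detail at this scale
        current = low
    return details, current

def reconstruct(details, residual: np.ndarray) -> np.ndarray:
    """Invert laplacian_pyramid: upsample the base and add details back."""
    current = residual
    for detail in reversed(details):
        current = up(current) + detail
    return current

def fuse(h0: np.ndarray, l0: np.ndarray, levels: int = 2) -> np.ndarray:
    """Combine high-frequency detail from h0 with the low-frequency base of l0."""
    details_h, _ = laplacian_pyramid(h0, levels)
    _, base_l = laplacian_pyramid(l0, levels)
    return reconstruct(details_h, base_l)

rng = np.random.default_rng(0)
h0 = rng.standard_normal((16, 16))  # stand-in for the explicit-path estimate
l0 = rng.standard_normal((16, 16))  # stand-in for the implicit-path estimate
fused = fuse(h0, l0)
```

Image sides must be divisible by `2 ** levels` for this simplified pooling. With these operators the decomposition is exactly invertible, so fusing an image with itself returns it unchanged.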

2. Initial Conversion Module

The initial conversion module is a shared-latent-space VAE variant with the following configuration:

Inputs: $A = [X, S(X)]$, $B = [Y, S(Y)]$
Encoders/Decoders: $L_a = E_A(A)$, $L_b = E_B(B)$, $D_A(L)$, $D_B(L)$
Variational Posteriors: $q_A(L|A) = \mathcal{N}(L; \mu_A(A), I)$, $q_B(L|B) = \mathcal{N}(L; \mu_B(B), I)$
VAE Losses: Regularize each domain's latent distribution via KL divergence and enforce reconstruction via an $L_1$ penalty
Adversarial Losses: Encourage realistic outputs in both domains
Cycle-consistency Losses: Encourage mappings between the domains that are approximately invertible, which discourages mode collapse
Rotation-consistency Losses: Stabilize spatial structure by rotating inputs and outputs and minimizing the discrepancies

The combined Stage 1 loss, $L_{stage1}$, sums all terms. After training, inference proceeds via $[\hat{Y}, S(\hat{Y})] = D_B(E_A([X, S(X)]))$, with $S(\hat{Y}) = \text{Sobel}(\hat{Y})$.
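Of the Stage 1 terms, the rotation-consistency loss is the least standard, so a small sketch may help. The source does not give its exact form; the version below assumes one common formulation, which penalizes the difference between translating a rotated input and rotating the translated input. The generators here are hypothetical stand-ins, not the VAE itself.

```python
import numpy as np

def rotation_consistency_loss(generator, x: np.ndarray, k: int = 1) -> float:
    """Mean absolute difference between G(rot(x)) and rot(G(x)).

    k is the number of 90-degree rotations; x is assumed square so that
    the two results have matching shapes.
    """
    rotate_then_translate = generator(np.rot90(x, k))
    translate_then_rotate = np.rot90(generator(x), k)
    return float(np.mean(np.abs(rotate_then_translate - translate_then_rotate)))

def pointwise_generator(x: np.ndarray) -> np.ndarray:
    """Hypothetical translator that acts per pixel, so it commutes with rotation."""
    return 2.0 * x + 1.0

def biased_generator(x: np.ndarray) -> np.ndarray:
    """Hypothetical translator that adds a left-to-right ramp, breaking the symmetry."""
    return x + np.linspace(0.0, 1.0, x.shape[1])[None, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
loss_ok = rotation_consistency_loss(pointwise_generator, x)
loss_bad = rotation_consistency_loss(biased_generator, x)
```

A translator whose output depends on absolute image position, like `biased_generator`, incurs a positive penalty, while a rotation-equivariant one incurs none.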

3. Frequency-Decoupled Diffusion Process

Forward Diffusion

The blue-noise forward step replaces standard Gaussian diffusion:

$q^{Blue}(y_t|\hat{Y}) = \mathcal{N}_{Blue}(\sqrt{\alpha_t} \hat{Y}, (1-\alpha_t)I)$
Blue noise $z_b$ is defined such that $\text{PSD}(f) \propto f$, so noise power grows with spatial frequency and high-frequency components are corrupted most
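One way to realize noise with $\text{PSD}(f) \propto f$ is to shape white Gaussian noise in the Fourier domain. The sketch below assumes this spectral-shaping construction and unit-variance normalization; the source does not state how the paper samples blue noise, and `blue_noise` and `forward_blue` are hypothetical names. Here `alpha_t` plays the same role as $\alpha_t$ in the forward equation above.

```python
import numpy as np

def blue_noise(shape, rng: np.random.Generator) -> np.ndarray:
    """Sample 2-D noise whose power spectral density grows linearly with |f|."""
    white = rng.standard_normal(shape)
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # PSD proportional to f means amplitude proportional to sqrt(f).
    spectrum = np.fft.fft2(white) * np.sqrt(radius)
    noise = np.real(np.fft.ifft2(spectrum))
    return noise / noise.std()  # normalize to unit variance (assumption)

def forward_blue(y_hat: np.ndarray, alpha_t: float, rng: np.random.Generator) -> np.ndarray:
    """Draw y_t = sqrt(alpha_t) * y_hat + sqrt(1 - alpha_t) * z_b."""
    z_b = blue_noise(y_hat.shape, rng)
    return np.sqrt(alpha_t) * y_hat + np.sqrt(1.0 - alpha_t) * z_b

rng = np.random.default_rng(0)
z = blue_noise((64, 64), rng)
y_hat = rng.standard_normal((64, 64))  # stand-in for the coarse CT prediction
y_t = forward_blue(y_hat, alpha_t=0.5, rng=rng)
```

Because the spectral weight is zero at $f = 0$, the sampled noise has zero mean, and the low-frequency content of $\hat{Y}$ is perturbed far less than its fine detail.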

Dual-path Reverse Diffusion

At each timestep $t$:

Prediction: Clean estimates $h_0^{(t)}, l_0^{(t)}$ via each path
Fusion via Laplacian pyramid: Extracts and recombines frequency bands from both predictions
Branch updates: High-frequency (explicit, noisy) and low-frequency (implicit, noiseless)

Diffusion loss is the standard MSE:

$L_{diff} = \mathbb{E}_{t, y_0, \epsilon} [\|\epsilon - \epsilon_\theta(y_t, S_t)\|_2^2]$

This decoupling ensures anatomical structure is preserved through the implicit path, while the explicit path refines realistic detail.
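The contrast between a stochastic and a deterministic branch can be illustrated with a DDIM-style update, where a parameter `eta` controls how much fresh noise is injected. Mapping the explicit path to `eta > 0` and the implicit path to `eta = 0` is an assumption made for illustration; the source does not give the paper's update equations. The sketch also omits the per-step fusion between the two paths, and it uses `alpha_t` for the cumulative signal level, matching the forward equation.

```python
import numpy as np

def predict_x0(y_t: np.ndarray, eps: np.ndarray, alpha_t: float) -> np.ndarray:
    """Clean-image estimate implied by a noise prediction eps."""
    return (y_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)

def reverse_step(y_t, eps, alpha_t, alpha_prev, eta, rng):
    """One DDIM-style step from level alpha_t to alpha_prev (alpha_prev > alpha_t).

    eta = 0 gives a deterministic update; eta = 1 injects fresh Gaussian noise.
    """
    x0 = predict_x0(y_t, eps, alpha_t)
    sigma = (eta * np.sqrt((1.0 - alpha_prev) / (1.0 - alpha_t))
             * np.sqrt(1.0 - alpha_t / alpha_prev))
    direction = np.sqrt(1.0 - alpha_prev - sigma ** 2) * eps
    noise = sigma * rng.standard_normal(y_t.shape)
    return np.sqrt(alpha_prev) * x0 + direction + noise

rng = np.random.default_rng(0)
y0 = rng.standard_normal((8, 8))
eps = rng.standard_normal((8, 8))   # stands in for the network output eps_theta
alpha_t, alpha_prev = 0.5, 0.8
y_t = np.sqrt(alpha_t) * y0 + np.sqrt(1.0 - alpha_t) * eps

# Implicit-style (deterministic) and explicit-style (stochastic) updates.
low_next = reverse_step(y_t, eps, alpha_t, alpha_prev, eta=0.0, rng=rng)
high_next = reverse_step(y_t, eps, alpha_t, alpha_prev, eta=1.0, rng=rng)
```

With `eta = 0` the step is a pure function of its inputs, which is why a deterministic branch of this kind tends to keep coarse structure stable across runs.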

4. Unpaired Data Training Protocol

FDDM’s architecture obviates the need for paired datasets. Stage 1 uses only unpaired MR and CT slices, learning a bidirectional shared representation. Stage 2 conditions its diffusion solely on generated coarse CT and its edge guidance, with the model trained on CT domain images independently. No paired supervision between MR and CT is applied at any phase.

5. Quantitative Comparisons and Metrics

Performance is evaluated using three core metrics:

Fréchet Inception Distance (FID) for realism:

$FID = \| \mu_g - \mu_r \|^2 + \mathrm{Tr}(\Sigma_g + \Sigma_r - 2(\Sigma_g \Sigma_r)^{1/2})$

where $(\mu, \Sigma)$ are the mean and covariance of Inception features.

Peak Signal-to-Noise Ratio (PSNR)
Structural Similarity Index (SSIM)
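The FID expression above, together with the standard PSNR definition $10 \log_{10}(\text{MAX}^2 / \text{MSE})$, can be computed as in the sketch below. It assumes the Inception feature statistics are already available and evaluates the trace of the matrix square root through the eigenvalues of $\Sigma_g \Sigma_r$; the function names are hypothetical, and SSIM is omitted because it needs a windowed implementation.

```python
import numpy as np

def fid_from_stats(mu_g, sigma_g, mu_r, sigma_r) -> float:
    """FID between two Gaussians given their means and covariances."""
    diff = mu_g - mu_r
    # Tr((Sg Sr)^(1/2)) equals the sum of square roots of the eigenvalues
    # of Sg @ Sr, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(sigma_g @ sigma_r)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_g) + np.trace(sigma_r)
                 - 2.0 * trace_sqrt)

def psnr(reference: np.ndarray, estimate: np.ndarray, data_range: float) -> float:
    """Peak signal-to-noise ratio in dB; undefined when the images are identical."""
    mse = np.mean((reference - estimate) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Toy statistics: identity covariances and means offset by (3, 4).
mu_r, mu_g = np.zeros(2), np.array([3.0, 4.0])
cov = np.eye(2)
fid_value = fid_from_stats(mu_g, cov, mu_r, cov)

# Toy images: a constant error of 0.1 on a unit intensity range.
reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)
psnr_value = psnr(reference, estimate, data_range=1.0)
```

With equal covariances the trace terms cancel, so the FID in the toy case reduces to the squared distance between the means.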

Results on benchmark datasets (brain MR→CT; pelvis MR→CT) show that FDDM achieves the lowest FID (25.86 for brain, 29.20 for pelvis), surpassing CycleGAN, GcGAN, RegGAN, UNIT, MUNIT, SDEdit, and SynDiff. On the brain dataset, FDDM also attains the highest SSIM (0.9144) and PSNR (38.08):

Method	FID (brain) ↓	SSIM (brain) ↑	PSNR (brain) ↑
CycleGAN	62.41	0.8788	36.50
GcGAN	60.03	0.8841	36.96
RegGAN	73.76	0.8187	35.73
UNIT	49.71	0.8960	36.87
MUNIT	72.85	0.8449	35.79
SDEdit	75.99	0.7385	35.37
SynDiff	70.54	0.8632	37.13
FDDM	25.86	0.9144	38.08

Comparable improvements are reported on the pelvis dataset.

6. Ablation Studies and Module Impacts

Extensive ablation supports the functional contributions of FDDM’s components:

Rotation-consistency loss in Stage 1 reduces FID from 48.69 (no RC) to 39.82 (with RC) and increases SSIM from 0.9014 to 0.9123.
Dual-path reverse diffusion: omitting the implicit path worsens SSIM, while removing the explicit path degrades FID. The full dual-path model yields the best balance: FID=25.86, SSIM=0.9144, PSNR=38.08.
Forward diffusion steps $T_s$: Performance peaks at $T_s=300$; a smaller $T_s$ under-corrects VAE errors, while a larger $T_s$ erodes anatomical integrity.

Qualitative comparisons confirm that FDDM preserves anatomical details, such as brain sulci and pelvic bone edges, better than single-path or GAN/VAE methods.

7. Significance and Implications

FDDM combines three components: shared-latent VAE structural extraction, frequency-adaptive blue-noise diffusion, and dual-path frequency-specific denoising. On unpaired MR→CT data it achieves the lowest FID among the compared methods, with the highest SSIM and PSNR on the brain benchmark, and the ablations indicate that both frequency decoupling and structural guidance contribute to these results. This suggests broader relevance for other unsupervised domain translation problems requiring precise structural preservation (Li et al., 2023).


References (1)

FDDM: Unsupervised Medical Image Translation with a Frequency-Decoupled Diffusion Model (2023)
