
MedDX-FT: Frequency Decoupled Diffusion Model

Updated 9 December 2025
  • The paper shows that FDDM, by integrating a shared-latent VAE with a frequency-decoupled dual-path diffusion process, significantly improves anatomical fidelity and realism in MR-to-CT translation.
  • It utilizes a two-stage pipeline with Sobel-based edge detection and Laplacian pyramid fusion to effectively separate and recombine frequency details.
  • Quantitative results demonstrate FDDM’s superiority over competing models in FID, SSIM, and PSNR, validating its practical benefits for unsupervised image translation.

MedDX-FT (Frequency Decoupled Diffusion Model, FDDM) is an unsupervised framework for MR-to-CT medical image translation, distinguished by its frequency-decoupled dual-path diffusion process and structural guidance from a VAE-based module. FDDM addresses limitations in prior diffusion-based and generative adversarial models, focusing explicitly on anatomical faithfulness and realism when translating between unpaired MR and CT datasets (Li et al., 2023).

1. Architectural Overview

FDDM employs a two-stage pipeline that combines variational autoencoding with frequency-based conditional denoising. The first stage uses a shared-latent-space VAE (in the style of UNIT) that receives an MR image $X$ and its Sobel-derived edge map $S(X)$ and produces a coarse CT prediction $\hat{Y}$ with corresponding edges $S(\hat{Y})$. Key losses include VAE reconstruction, Kullback-Leibler divergence, adversarial (GAN) discrimination, cycle-consistency, and rotation-consistency.
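As a rough illustration of the edge input, the sketch below computes a Sobel gradient-magnitude map and stacks it with the image to form $[X, S(X)]$. The function name `sobel_edges`, the edge-replicating padding, and the toy step image are assumptions made for this example; the source does not specify how the Sobel map is normalized or thresholded.

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edge map from 3x3 Sobel kernels (hypothetical helper)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    # Replicate border pixels so the output keeps the input shape (an assumption).
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 correlation one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)

# Toy "MR slice": a vertical step edge between columns 3 and 4.
x = np.zeros((8, 8))
x[:, 4:] = 1.0
s = sobel_edges(x)

# Two-channel Stage 1 input A = [X, S(X)].
a = np.stack([x, s])
```

Only the two columns adjacent to the step respond, which is the kind of structural cue the first stage receives alongside the intensity image.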

The second stage introduces a frequency-decoupled diffusion model. The forward diffusion process uses “blue-noise” perturbation, which acts like a low-pass filter: high-frequency components in $\hat{Y}$ are corrupted more heavily than low-frequency ones, yielding $y_{T_s}$ after $T_s$ forward steps. Reverse diffusion proceeds via two parallel branches:

  • Explicit (high-frequency) path: stochastic denoising with random noise
  • Implicit (low-frequency) path: deterministic denoising

At each timestep, the predictions from both paths are fused using a Laplacian pyramid, which separates frequency bands so that they can be recombined faithfully. The procedure is specified as stepwise pseudocode covering initialization, edge thresholding, Laplacian pyramid generation, and the dual-path updates.

2. Initial Conversion Module

The initial conversion module is a VAE variant with a shared latent space, configured as follows:

  • Inputs: $A = [X, S(X)]$, $B = [Y, S(Y)]$
  • Encoders/Decoders: $L_a = E_A(A)$, $L_b = E_B(B)$, $D_A(L)$, $D_B(L)$
  • Variational Posteriors: $q_A(L|A) = \mathcal{N}(L; \mu_A(A), I)$, $q_B(L|B) = \mathcal{N}(L; \mu_B(B), I)$
  • VAE Losses: Each domain regularizes its latent distribution via KL divergence and enforces reconstruction via an $L_1$ penalty
  • Adversarial Losses: Encourage realistic outputs in both domains
  • Cycle-consistency Losses: Encourage an approximately invertible mapping between domains and discourage mode collapse
  • Rotation-consistency Losses: Stabilize spatial structure by rotating inputs and outputs and penalizing the discrepancies
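One way to read the rotation-consistency term is as a penalty on the gap between "rotate, then translate" and "translate, then rotate". The sketch below is a minimal version under that reading; the mean-absolute-difference form, the 90-degree rotation, and the two toy translators are assumptions for illustration, not the exact loss used in the paper.

```python
import numpy as np

def rotation_consistency_loss(translate, x, k=1):
    """Mean absolute gap between translate(rot(x)) and rot(translate(x))."""
    rotate_then_translate = translate(np.rot90(x, k))
    translate_then_rotate = np.rot90(translate(x), k)
    return float(np.mean(np.abs(rotate_then_translate - translate_then_rotate)))

rng = np.random.default_rng(0)
x = rng.random((4, 4))
ramp = np.tile(np.arange(4.0), (4, 1))  # intensity bias that grows left to right

def pointwise(img):
    # A per-pixel intensity map commutes with rotation.
    return 2.0 * img + 1.0

def position_dependent(img):
    # Adding a fixed spatial ramp does not commute with rotation.
    return img + ramp

loss_equivariant = rotation_consistency_loss(pointwise, x)
loss_biased = rotation_consistency_loss(position_dependent, x)
```

The pointwise translator incurs no penalty, while the position-dependent one does, which is the behavior such a term is meant to discourage.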

The combined Stage 1 loss, $L_{\text{stage1}}$, sums all terms. After training, inference proceeds via $[\hat{Y}, S(\hat{Y})] = D_B(E_A([X, S(X)]))$, with $S(\hat{Y}) = \text{Sobel}(\hat{Y})$.
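For the KL term among the VAE losses above, the identity-covariance posterior gives a simple closed form. The expression below assumes a standard normal prior $\mathcal{N}(0, I)$ on the latent, which is a common choice but is not stated in the source; the same form holds for domain $B$.

```latex
\mathrm{KL}\big(\mathcal{N}(\mu_A(A), I)\,\|\,\mathcal{N}(0, I)\big)
  = \tfrac{1}{2}\Big(\operatorname{tr}(I) + \lVert \mu_A(A) \rVert_2^2 - d - \log\det I\Big)
  = \tfrac{1}{2}\,\lVert \mu_A(A) \rVert_2^2
```

Here $d$ is the latent dimension. Under this assumption the KL regularizer reduces to a squared-norm penalty on the encoder mean.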

3. Frequency-Decoupled Diffusion Process

Forward Diffusion

The blue-noise forward step replaces standard Gaussian diffusion:

  • $q^{Blue}(y_t|\hat{Y}) = \mathcal{N}_{Blue}(\sqrt{\alpha_t} \hat{Y}, (1-\alpha_t)I)$
  • Blue noise $z_b$ is defined such that $\text{PSD}(f) \propto f$, imparting greater noise to high-frequency components
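A minimal sketch of the forward step, assuming blue noise is obtained by shaping white Gaussian noise in the Fourier domain so that its power grows linearly with radial frequency. The helper names `blue_noise` and `forward_blue`, the unit-variance rescaling, and the Fourier-shaping construction itself are assumptions; the paper may generate $z_b$ differently.

```python
import numpy as np

def blue_noise(shape, rng):
    """White noise reshaped so that its power spectral density is proportional to f."""
    white = rng.standard_normal(shape)
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Amplitude scales with sqrt(f), so power scales with f; the DC bin is zeroed.
    shaped = np.fft.ifft2(np.fft.fft2(white) * np.sqrt(radius)).real
    return shaped / shaped.std()  # rescale to unit variance (an assumption)

def forward_blue(y_hat, alpha_t, rng):
    """Sample y_t = sqrt(alpha_t) * y_hat + sqrt(1 - alpha_t) * z_b."""
    z_b = blue_noise(y_hat.shape, rng)
    return np.sqrt(alpha_t) * y_hat + np.sqrt(1.0 - alpha_t) * z_b

rng = np.random.default_rng(0)
z_b = blue_noise((64, 64), rng)
y_t = forward_blue(np.ones((64, 64)), alpha_t=0.5, rng=rng)
```

Because most of the noise energy sits at high frequencies, the low-frequency content of the coarse CT is left comparatively intact after the forward step.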

Dual-path Reverse Diffusion

For timestep $t$:

  • Prediction: Clean estimates $h_0^{(t)}, l_0^{(t)}$ via each path
  • Fusion via Laplacian pyramid: Extracts and recombines frequency bands from both predictions
  • Branch updates: High-frequency (explicit, noisy) and low-frequency (implicit, noiseless)
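The fusion step can be sketched with a small Laplacian pyramid. This example assumes one plausible fusion rule: detail bands are taken from the explicit (high-frequency) estimate and the coarsest band from the implicit (low-frequency) estimate. The 2x2 average-pool and nearest-neighbour resampling, the function names, and the number of levels are illustrative choices rather than details from the paper.

```python
import numpy as np

def down(img):
    """Halve resolution by 2x2 average pooling (height and width must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Return (detail bands from fine to coarse, coarsest low-pass residual)."""
    bands, current = [], img
    for _ in range(levels):
        smaller = down(current)
        bands.append(current - up(smaller))
        current = smaller
    return bands, current

def reconstruct(bands, base):
    """Invert laplacian_pyramid by adding detail bands back, coarse to fine."""
    current = base
    for band in reversed(bands):
        current = up(current) + band
    return current

def fuse(h0, l0, levels=2):
    """Assumed rule: details from the explicit path, base from the implicit path."""
    high_bands, _ = laplacian_pyramid(h0, levels)
    _, low_base = laplacian_pyramid(l0, levels)
    return reconstruct(high_bands, low_base)

rng = np.random.default_rng(0)
h0 = rng.random((8, 8))  # stand-in for the explicit-path clean estimate
l0 = rng.random((8, 8))  # stand-in for the implicit-path clean estimate
fused = fuse(h0, l0, levels=2)
```

With these resampling operators the pyramid is exactly invertible, so the fused image keeps the coarse structure of `l0` while carrying the fine detail of `h0`.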

Diffusion loss is the standard MSE:

$$L_{diff} = \mathbb{E}_{t, y_0, \epsilon} \left[ \|\epsilon - \epsilon_\theta(y_t, S_t)\|_2^2 \right]$$
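A one-sample Monte Carlo estimate of this objective might look as follows. The sketch uses Gaussian noise and a per-pixel mean (which differs from the squared norm only by a constant factor); `eps_model`, `zero_model`, the toy schedule, and the use of a single fixed edge map as the conditioning are placeholders, not the paper's network or training setup.

```python
import numpy as np

def diffusion_loss(eps_model, y0, edges, alpha_bar, rng):
    """Single-sample estimate of the noise-prediction MSE."""
    t = rng.integers(len(alpha_bar))        # random timestep index
    eps = rng.standard_normal(y0.shape)     # target noise
    y_t = np.sqrt(alpha_bar[t]) * y0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    pred = eps_model(y_t, edges, t)         # network's noise estimate
    return float(np.mean((eps - pred) ** 2))

def zero_model(y_t, edges, t):
    # Untrained stand-in that always predicts zero noise.
    return np.zeros_like(y_t)

rng = np.random.default_rng(0)
y0 = rng.random((8, 8))                     # stand-in CT slice
edges = np.zeros((8, 8))                    # stand-in edge conditioning
alpha_bar = np.linspace(0.99, 0.1, 10)      # toy cumulative schedule (assumed)
loss = diffusion_loss(zero_model, y0, edges, alpha_bar, rng)
```

In training, this quantity would be averaged over a batch and minimized with respect to the parameters of the noise predictor.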

This decoupling ensures anatomical structure is preserved through the implicit path, while the explicit path refines realistic detail.
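The difference between the two branches can be illustrated with a DDIM-style update, where a noise scale `sigma` of zero gives a deterministic step and a positive `sigma` injects fresh randomness. Using this particular update rule is an assumption made for illustration, and the shared noise estimate below stands in for what would be separate network predictions in each path.

```python
import numpy as np

def reverse_step(y_t, eps_hat, a_t, a_prev, sigma, rng):
    """DDIM-style step; returns (next sample, clean estimate)."""
    # Clean-image estimate implied by the predicted noise.
    y0_hat = (y_t - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)
    # Deterministic part pointing back toward y_t, plus optional fresh noise.
    direction = np.sqrt(np.maximum(1.0 - a_prev - sigma ** 2, 0.0)) * eps_hat
    noise = sigma * rng.standard_normal(y_t.shape)
    return np.sqrt(a_prev) * y0_hat + direction + noise, y0_hat

rng = np.random.default_rng(0)
y0 = rng.random((4, 4))
eps = rng.standard_normal((4, 4))
a_t, a_prev = 0.5, 0.7                      # toy cumulative alphas (assumed)
y_t = np.sqrt(a_t) * y0 + np.sqrt(1.0 - a_t) * eps

# Implicit (low-frequency) path: sigma = 0, so the step is deterministic.
low_next, l0_hat = reverse_step(y_t, eps, a_t, a_prev, sigma=0.0, rng=rng)
# Explicit (high-frequency) path: sigma > 0, so the step is stochastic.
high_next, h0_hat = reverse_step(y_t, eps, a_t, a_prev, sigma=0.3, rng=rng)
```

Repeating the implicit step always gives the same result, which is what lets that branch carry structure forward unchanged, while the explicit step varies from draw to draw.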

4. Unpaired Data Training Protocol

FDDM’s architecture obviates the need for paired datasets. Stage 1 uses only unpaired MR and CT slices, learning a bidirectional shared representation. Stage 2 conditions its diffusion solely on generated coarse CT and its edge guidance, with the model trained on CT domain images independently. No paired supervision between MR and CT is applied at any phase.

5. Quantitative Comparisons and Metrics

Performance is evaluated using three core metrics: FID, SSIM, and PSNR. FID is defined as

$$\mathrm{FID} = \| \mu_g - \mu_r \|^2 + \mathrm{Tr}\left(\Sigma_g + \Sigma_r - 2(\Sigma_g \Sigma_r)^{1/2}\right)$$

where $(\mu_g, \Sigma_g)$ and $(\mu_r, \Sigma_r)$ are the means and covariances of Inception features for generated and real images, respectively.
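The FID formula can be evaluated directly from two feature matrices. The sketch below computes the trace of the matrix square root as the sum of square roots of the eigenvalues of $\Sigma_g \Sigma_r$, which avoids an explicit matrix square root. Random vectors stand in for Inception features here, so the numbers are only a sanity check and are not comparable with the reported scores.

```python
import numpy as np

def fid(feats_g, feats_r):
    """Frechet distance between Gaussians fitted to two (n, d) feature sets."""
    mu_g, mu_r = feats_g.mean(axis=0), feats_r.mean(axis=0)
    cov_g = np.cov(feats_g, rowvar=False)
    cov_r = np.cov(feats_r, rowvar=False)
    # Tr((cov_g cov_r)^(1/2)) equals the sum of sqrt-eigenvalues of the product.
    eig = np.linalg.eigvals(cov_g @ cov_r)
    trace_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    mean_term = np.sum((mu_g - mu_r) ** 2)
    return float(mean_term + np.trace(cov_g) + np.trace(cov_r) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.standard_normal((200, 4))   # stand-in for Inception features
same = fid(real, real)                 # identical sets: distance near zero
shifted = fid(real + 3.0, real)        # pure mean shift of 3 in each of 4 dims
```

For the shifted set the covariance terms cancel and only the squared mean difference remains, which makes the role of each term in the formula easy to check.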

Results on benchmark datasets (brain MR→CT; pelvis MR→CT) show that FDDM achieves the lowest FID (25.86 for brain, 29.20 for pelvis), surpassing CycleGAN, GcGAN, RegGAN, UNIT, MUNIT, SDEdit, and SynDiff. SSIM and PSNR are likewise matched or improved.

Method     FID (brain) ↓   SSIM (brain) ↑   PSNR (brain) ↑
CycleGAN   62.41           0.8788           36.50
GcGAN      60.03           0.8841           36.96
RegGAN     73.76           0.8187           35.73
UNIT       49.71           0.8960           36.87
MUNIT      72.85           0.8449           35.79
SDEdit     75.99           0.7385           35.37
SynDiff    70.54           0.8632           37.13
FDDM       25.86           0.9144           38.08

Comparable improvements are shown in the pelvis dataset.
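For reference, PSNR as reported in the table is a log-scaled mean squared error. The sketch below assumes intensities normalized to a `data_range` of 1.0; the intensity range used in the paper's evaluation is not given in the source, so the absolute values here are not comparable with the table.

```python
import numpy as np

def psnr(reference, generated, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is an assumed intensity span."""
    diff = reference.astype(float) - generated.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

ref = np.full((16, 16), 0.5)   # stand-in ground-truth CT slice
gen = ref + 0.1                # uniform error of 0.1, so MSE = 0.01
value = psnr(ref, gen)         # 10 * log10(1 / 0.01) = 20 dB
```

Because the scale is logarithmic, a gain of about 1 dB corresponds to roughly a 20% reduction in mean squared error.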

6. Ablation Studies and Module Impacts

Extensive ablation supports the functional contributions of FDDM’s components:

  • Rotation-consistency loss in Stage 1 reduces FID from 48.69 (no RC) to 39.82 (with RC) and increases SSIM from 0.9014 to 0.9123.
  • Dual-path reverse diffusion shows that omitting the implicit path worsens SSIM, while removing the explicit path degrades FID. Full dual-path yields optimal balance: FID=25.86, SSIM=0.9144, PSNR=38.08.
  • Forward diffusion steps $T_s$: Performance peaks at $T_s = 300$; smaller $T_s$ under-corrects VAE errors, while larger $T_s$ erodes anatomical integrity.
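The trade-off in the last ablation can be made concrete by looking at how much of the coarse CT survives the truncated forward process. The sketch below assumes a linear beta schedule over 1000 steps, a common DDPM default that the source does not confirm, and reports the signal weight $\sqrt{\bar{\alpha}_{T_s}}$ for a few values of $T_s$.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative signal retention

def signal_fraction(t_s):
    """Weight sqrt(alpha_bar) left on the coarse CT after t_s forward steps."""
    return float(np.sqrt(alpha_bar[t_s - 1]))

fractions = {t_s: signal_fraction(t_s) for t_s in (100, 300, 500)}
```

The weight falls monotonically with $T_s$: a small $T_s$ leaves the VAE output largely untouched, including its errors, while a large $T_s$ leaves little of the original structure for the reverse process to build on.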

Qualitative comparisons confirm that FDDM preserves anatomical details, such as brain sulci and pelvic bone edges, better than single-path or GAN/VAE methods.

7. Significance and Implications

FDDM combines three components: shared-latent VAE structural extraction, frequency-adaptive blue-noise diffusion, and dual-path frequency-specific denoising. On unpaired MR→CT data it reports the lowest FID among the compared methods together with competitive SSIM/PSNR, and the ablations indicate that both frequency decoupling and structural guidance contribute to these results. This suggests broader relevance for other unsupervised domain translation problems requiring precise structural preservation (Li et al., 2023).
