
JDMD: Joint Distribution Matching Distillation

Updated 24 April 2026
  • JDMD is a family of diffusion model distillation methods that convert multi-step teacher models into efficient few-step or one-step student models by matching joint distributions.
  • It combines local score matching with global distribution alignment using reverse KL and policy-gradient strategies to achieve high-quality, controllable generation.
  • JDMD integrates extensions like classifier-free guidance and human feedback learning, leading to improved fidelity, reduced inference cost, and state-of-the-art performance in visual and visuomotor tasks.

Joint Distribution Matching Distillation (JDMD) is a family of diffusion model distillation methods that produce efficient few-step or one-step generative models by explicitly matching joint distributions between a pre-trained multi-step diffusion "teacher" model and a distilled "student." JDMD methods unify local (score/gradient) alignment with global (distributional) consistency, leveraging objectives based on reverse KL or policy-gradient reward formulations. This enables high-fidelity generation and strong controllability, with extensions accommodating classifier-free guidance, human feedback learning, and joint reward integration. JDMD subsumes precursor approaches such as Consistency Distillation and generalizes to both visual and visuomotor domains (Jia et al., 2024, Luo et al., 9 Mar 2025, Fan et al., 30 Mar 2026).

1. Formal Problem Definition and Objectives

JDMD aims to convert a trained, multi-step diffusion generative model (the teacher, typically requiring 10–50 sampling steps) into a single-step or few-step generator (the student) without sacrificing output fidelity or conditional controllability. The core task is to minimize the statistical discrepancy between the teacher’s sampling process and the student’s, such that samples from the student are indistinguishable—both locally and globally—from those of the teacher, for all relevant conditions or contexts.

In the generic JDMD setup:

  • Teacher’s joint distribution: $p_{\phi}(x_t, c) = p_{\phi}(x_t) \cdot p(c \mid x_t)$, where $x_t$ is the noisy sample at diffusion step $t$ and $c$ is a control or context variable.
  • Student’s joint distribution: $q_{\theta}(x_t, c) = p_{\theta}(x_t \mid c) \cdot p(c)$.

The distillation objective is typically to minimize the time-weighted reverse KL divergence between $q$ and $p$:

$$\mathbb{E}_t \, \lambda_t \,\mathrm{KL}\big(q(x_t, c) \,\|\, p(x_t, c)\big)$$

which can be decomposed into fidelity and condition components (Luo et al., 9 Mar 2025):

$$- \log p(c \mid x_t) - \log p_{\phi}(x_t) + \log p_{\theta}(x_t)$$
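The decomposition can be made explicit by expanding the reverse KL with the two factorizations listed above. The following is a sketch, assuming the expectation is taken over the student's joint $q_{\theta}$ and writing the student term with its conditioning on $c$ spelled out; the "fidelity" and "condition" labels follow the wording of the text rather than a formula quoted from the cited paper.

```latex
\begin{aligned}
\mathrm{KL}\big(q_{\theta}(x_t, c) \,\|\, p_{\phi}(x_t, c)\big)
&= \mathbb{E}_{q_{\theta}}\big[\log q_{\theta}(x_t, c) - \log p_{\phi}(x_t, c)\big] \\
&= \mathbb{E}_{q_{\theta}}\big[
   \underbrace{\log p_{\theta}(x_t \mid c) - \log p_{\phi}(x_t)}_{\text{fidelity}}
   \;\underbrace{-\,\log p(c \mid x_t)}_{\text{condition}}
   \;+\; \log p(c)\big]
\end{aligned}
```

Because $c$ is drawn from $p(c)$ under $q_{\theta}$, the $\log p(c)$ term does not depend on $\theta$ and drops out of the student's gradient, leaving the fidelity and condition terms.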

2. Two-Stage and Asymmetric Optimization Schemes

JDMD relies on a composite loss function, separating local score-matching from global distribution matching:

  • Score Matching: Aligns the gradient fields (“score functions”) of teacher and student at the same (possibly noised) sample points, typically enforcing

$$L_{\mathrm{score}} = \mathbb{E}_{z,t} \, \| s_{\phi}(x_t, t) - s_{\theta}(x_t, t) \|^2, \quad\text{where}\quad s_{\phi}(x_t, t) = \nabla_{x_t} \log p_{\phi}(x_t)$$

Here, $s_{\theta}$ can be implemented as a “fake” score network initialized from the teacher and adapted to the student.

  • Distribution Matching: Minimizes the reverse KL divergence between conditional densities, most often as a direct term in the loss:

$$L_{\mathrm{dist}} = \mathrm{KL}\big(p_{\theta}(x_t \mid c) \,\|\, p_{\phi}(x_t \mid c)\big)$$

In some frameworks, a two-teacher (frozen/adversarial) variant is applied: a fixed “frozen” teacher stabilizes the target, while a trainable “adversarial” teacher tracks the student to sharpen the KL target (Jia et al., 2024).

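  • Combined Loss: The joint loss for the student is parameterized as

$$L = \lambda_{\mathrm{score}} \, L_{\mathrm{score}} + \lambda_{\mathrm{dist}} \, L_{\mathrm{dist}}$$

Hyperparameters $\lambda_{\mathrm{score}}$ and $\lambda_{\mathrm{dist}}$ weight the score-matching and distribution-matching terms, trading off local versus global alignment. Ablation shows that omitting either term degrades basin success rates and sample quality by 5–8% (Jia et al., 2024).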
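As a toy illustration of how a local score-matching term and a global reverse-KL term can be summed into one objective, the sketch below uses 1-D Gaussians, where both the score and the KL divergence have closed forms. It is not taken from the cited papers: the names `w_score` and `w_dist` are hypothetical stand-ins for the two trade-off hyperparameters, and real JDMD models use neural score networks rather than analytic Gaussians.

```python
import numpy as np

def gaussian_score(x, mean, std):
    """Score of a 1-D Gaussian: d/dx log N(x; mean, std^2)."""
    return -(x - mean) / std**2

def score_matching_loss(x, teacher, student):
    """Monte Carlo estimate of E[(s_teacher(x) - s_student(x))^2]."""
    diff = gaussian_score(x, *teacher) - gaussian_score(x, *student)
    return float(np.mean(diff**2))

def reverse_kl_gaussian(student, teacher):
    """Closed-form KL(student || teacher) for 1-D Gaussians (mean, std)."""
    (mq, sq), (mp, sp) = student, teacher
    return float(np.log(sp / sq) + (sq**2 + (mq - mp)**2) / (2 * sp**2) - 0.5)

def combined_loss(x, teacher, student, w_score=1.0, w_dist=1.0):
    """Weighted sum of the local and global terms (weights are assumptions)."""
    return (w_score * score_matching_loss(x, teacher, student)
            + w_dist * reverse_kl_gaussian(student, teacher))

rng = np.random.default_rng(0)
teacher = (0.0, 1.0)   # (mean, std) of the toy teacher
student = (0.5, 1.0)   # (mean, std) of the toy student
x = rng.normal(student[0], student[1], size=10_000)  # points sampled from the student

loss_matched = combined_loss(x, teacher, teacher)   # identical distributions
loss_shifted = combined_loss(x, teacher, student)   # student mean offset by 0.5
print(loss_matched, loss_shifted)
```

Both terms vanish when the student equals the teacher, and each grows with the mean offset, so either one alone would pull the toy student toward the teacher; the weights only change how strongly each signal contributes.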

3. Reward-Based and Policy-Gradient Formulations

JDMD admits a natural mapping to the policy-gradient framework of reinforcement learning, which provides algorithmic and conceptual unification with recent reinforcement learning-based distillation schemes (Fan et al., 30 Mar 2026).

  • Distribution-Matching Reward: The instantaneous reward for the decision taken at each diffusion step is

xtx_t5

This “score-difference” reward aligns the student’s denoising kernel xtx_t6 with the teacher’s kernel.

  • Policy as Diffusion Sampler: Interpreting xtx_t7 as the “action” of a policy acting in state xtx_t8, the DMD gradient becomes:

xtx_t9

Maximizing expected reward is equivalent to minimizing distributional divergence—enabling joint optimization with RL-based or human-preference rewards.
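The equivalence between following a score-difference signal and descending the reverse KL can be checked on a toy problem. The sketch below is an assumption-laden illustration, not the cited method: the student is a 1-D Gaussian with a learnable mean, samples are reparameterized as `x = mu + std * eps`, and the gradient of KL(student || teacher) with respect to `mu` is estimated as the average of (student score minus teacher score). All function names are hypothetical.

```python
import numpy as np

def gaussian_score(x, mean, std):
    """Score of a 1-D Gaussian: d/dx log N(x; mean, std^2)."""
    return -(x - mean) / std**2

def reverse_kl_grad(mu, student_std, teacher_mean, teacher_std, rng, n=4096):
    """Monte Carlo estimate of d/dmu KL(student || teacher)."""
    eps = rng.standard_normal(n)
    x = mu + student_std * eps  # reparameterized student sample, so dx/dmu = 1
    student_score = gaussian_score(x, mu, student_std)  # role of the "fake" score
    teacher_score = gaussian_score(x, teacher_mean, teacher_std)
    return float(np.mean(student_score - teacher_score))

def distill_mean(mu_init, teacher_mean=2.0, teacher_std=1.5,
                 student_std=1.0, lr=0.1, steps=300, seed=0):
    """Gradient descent on the student mean using the score-difference estimate."""
    rng = np.random.default_rng(seed)
    mu = mu_init
    for _ in range(steps):
        mu -= lr * reverse_kl_grad(mu, student_std, teacher_mean, teacher_std, rng)
    return mu

mu_final = distill_mean(mu_init=-3.0)
print(mu_final)  # should end close to the teacher mean of 2.0
```

Negating the estimated gradient gives the direction a reward-maximizing update would take, which is why a distribution-matching term can be added to other scalar rewards inside a single policy-gradient step.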

4. Extensions: Guidance, Feedback, and Multi-Reward Objectives

JDMD naturally accommodates complex conditional guidance and reward integration due to its explicit joint distribution formulation.

  • Classifier-Free Guidance (CFG): JDMD allows the gradient $\nabla_{x_t} \log p(c \mid x_t)$ to be computed via teacher-side classifier-free guidance, yielding efficient, high-fidelity control across arbitrary conditions, including text- and edge-based constraints (Luo et al., 9 Mar 2025).
  • Human Feedback Learning (HFL): When $\log p(c \mid x_t)$ is replaced by a reward model $r$ (e.g., a learned human preference score), JDMD backpropagates the surrogate reward gradient through the student generator.
  • Multi-Reward Integration (GNDMR): GNDMR (“Group-Normalized Distribution Matching with Rewards”) combines distribution-matching reward with auxiliary scalar reward models (e.g., CLIP, HPS, aesthetic scores), balancing aesthetic and fidelity metrics using group-normalized advantages and adaptive per-sample weighting (Fan et al., 30 Mar 2026).
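The classifier-free guidance item rests on a standard identity: by Bayes' rule, the gradient of the log-likelihood of the condition equals the conditional score minus the unconditional score, so no separate classifier is needed. The sketch below checks this on a toy Gaussian model (prior x ~ N(0, 1), condition c = x + noise with standard deviation `tau`). It is an illustration under those assumptions; `cfg_score` and `implied_condition_grad` are hypothetical helper names, and a real teacher would supply the two scores from a network evaluated on noisy samples.

```python
import numpy as np

def cfg_score(score_cond, score_uncond, guidance_scale):
    """Standard classifier-free guidance combination of two scores."""
    return score_uncond + guidance_scale * (score_cond - score_uncond)

def implied_condition_grad(score_cond, score_uncond):
    """Bayes' rule: grad_x log p(c | x) = grad_x log p(x | c) - grad_x log p(x)."""
    return score_cond - score_uncond

# Toy model: x ~ N(0, 1), c | x ~ N(x, tau^2), so the posterior p(x | c) is Gaussian.
tau = 0.5
c = 1.0
x = np.linspace(-2.0, 2.0, 5)

score_uncond = -x                                   # score of the N(0, 1) prior
post_mean = c / (1.0 + tau**2)
post_var = tau**2 / (1.0 + tau**2)
score_cond = -(x - post_mean) / post_var            # score of the posterior p(x | c)

grad_from_scores = implied_condition_grad(score_cond, score_uncond)
grad_exact = (c - x) / tau**2                       # analytic grad_x log N(c; x, tau^2)
print(np.allclose(grad_from_scores, grad_exact))

guided = cfg_score(score_cond, score_uncond, guidance_scale=3.0)
```

With a guidance scale of 1 the combination reduces to the conditional score, and with 0 it reduces to the unconditional one; larger scales amplify the condition gradient.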

5. Algorithmic Procedures and Implementation

JDMD algorithms share a common training loop, with key steps differentiated by the domain (visuomotor, image, conditional) and particular reward structure.

Step Description Implementation Notes
1 Sample context tt3 and initialize tt4 tt5
2 Compute guidance gradient tt6
3 Compute fidelity gradient tt7
4 Aggregate gradients tt8
5 Backpropagate/update tt9
6 Train fake-score model cc0 updated via standard denoising loss

The specific pipeline may involve warm-up phases, joint UNet-ControlNet training, parameter freezing, and multi-update importance sampling. For reward-based variants, group normalization and clipped-ratio policy-gradient steps are applied per batch.
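For the reward-based variants, the two ingredients named above can be sketched generically. The code below is a PPO/GRPO-style illustration under stated assumptions, not the formulation from the cited paper: rewards are arranged as one row per group of samples sharing a context, advantages are standardized within each row, and the surrogate uses a clipped likelihood ratio with a hypothetical clip range of 0.2.

```python
import numpy as np

def group_normalized_advantages(rewards, eps=1e-8):
    """Standardize rewards within each group (row): (r - mean) / (std + eps)."""
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

def clipped_surrogate(log_prob_new, log_prob_old, advantages, clip=0.2):
    """Clipped-ratio objective (to be maximized), averaged over all samples."""
    ratio = np.exp(np.asarray(log_prob_new) - np.asarray(log_prob_old))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))

# Two groups of three samples each; the reward scales differ widely across groups.
rewards = np.array([[1.0, 2.0, 3.0],
                    [10.0, 10.0, 40.0]])
adv = group_normalized_advantages(rewards)

# With an unchanged policy the ratio is 1, so the surrogate is the mean advantage.
log_p = np.zeros_like(rewards)
objective_same = clipped_surrogate(log_p, log_p, adv)

# A ratio of 2 with a positive advantage is clipped to 1 + clip.
objective_clipped = clipped_surrogate(np.log([2.0]), np.log([1.0]), np.array([1.0]))
print(adv, objective_same, objective_clipped)
```

Normalizing per group puts rewards with different scales on a comparable footing before they are mixed, and the clip bounds how far repeated inner updates on the same batch can move the student away from the sampling policy.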

6. Empirical Results and Quantitative Benchmarks

JDMD achieves state-of-the-art sample quality and control at substantially reduced inference cost:

  • Controllable generation (images/edges/depth): On an SD-1.5 domain, JDMD (one step) achieves FID = 14.58, Consistency = 0.071, outperforming both 50-step ControlNet (FID = 15.21, Consistency = 0.093) and one-step baselines (e.g., FID = 22.21 for DI+ControlNet) (Luo et al., 9 Mar 2025).
  • Text-to-image (SD 1.5, one-step): With human feedback or CFG, JDM achieves HPS = 29.75, Aesthetic = 5.90, CLIP Score = 33.97, exceeding Diff-Instruct, DMD2, and other baselines (Luo et al., 9 Mar 2025).
  • Visuomotor (robotic imitation learning): SDM Policy achieves a roughly 6× speedup (60 Hz vs. 10 Hz) and reduces action error by 15% relative to ManiCM; its success rate is approximately 74.8%, versus 69% for ManiCM and 76% for the teacher (Jia et al., 2024).
  • Faster/smarter training: Importance sampling allows up to 5× inner updates per batch without additional forward sampling expense, reducing wall-clock time and GPU cost for the same sample quality (Fan et al., 30 Mar 2026).
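To put the reported figures on a common scale, the relative changes can be computed directly from the numbers quoted above. The snippet is plain arithmetic on those values; the helper name `relative_change` is hypothetical, and lower is better for both FID and the consistency metric.

```python
def relative_change(new, old):
    """Signed fractional change of `new` relative to `old`."""
    return (new - old) / old

# One-step result versus the 50-step ControlNet baseline.
fid_vs_controlnet = relative_change(14.58, 15.21)
consistency_vs_controlnet = relative_change(0.071, 0.093)

# One-step result versus the one-step DI+ControlNet baseline.
fid_vs_di_controlnet = relative_change(14.58, 22.21)

# Control-frequency ratio for the visuomotor policy.
speedup = 60 / 10

for name, value in [
    ("FID vs. 50-step ControlNet", fid_vs_controlnet),
    ("Consistency vs. 50-step ControlNet", consistency_vs_controlnet),
    ("FID vs. DI+ControlNet", fid_vs_di_controlnet),
]:
    print(f"{name}: {value:+.1%}")
print(f"Speedup: {speedup:.0f}x")
```

On these numbers, the gap to the 50-step baseline is a few percent in FID, while the gap to the one-step baseline is roughly a third.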

7. Ablation Studies, Stability, and Limitations

Ablations confirm that both local score alignment and global distribution matching are essential for optimal fidelity, diversity, and control transfer:

  • Removing score-matching degrades action and sample quality by 5–8%.
  • Omitting noisy-posterior modeling or forgoing warm-up/freeze phases leads to worse FID, lower consistency, and training instability.
  • Moderate loss weighting (cc3) yields the best trade-off; highly imbalanced weights reduce diversity or fail to match target distributions (Jia et al., 2024, Luo et al., 9 Mar 2025).

Stability is further improved with group normalization of advantages, two-teacher stabilization, and careful reward/adaptive weighting in multi-reward scenarios (Fan et al., 30 Mar 2026).

8. Significance and Applications

JDMD generalizes distillation beyond simple student–teacher imitation, enabling few-step diffusion models to inherit teacher fidelity and controllability while scaling to new forms of guidance, reward structures, or unseen control conditions. This framework underpins state-of-the-art image, text-to-image, and visuomotor models, providing efficient, real-time, and highly controllable synthesis pathways (Jia et al., 2024, Luo et al., 9 Mar 2025, Fan et al., 30 Mar 2026).
