
Diffusion Tree Sampling (DTS)

Updated 1 July 2025
  • Diffusion Tree Sampling (DTS) is a scalable inference-time alignment technique for generative diffusion models, framing sample selection and reward maximization as a tree search over the denoising chain.
  • Inspired by MCTS, DTS improves sample quality and reliability by efficiently reusing information through phases of selection, expansion, rollout, and backup, enabling consistent global value estimation.
  • This approach provides an anytime mechanism to improve sample quality with more compute, supporting efficient posterior-aligned sampling and global optimization across diverse generative tasks.

Diffusion Tree Sampling (DTS) is a scalable, inference-time alignment technique for generative diffusion models that frames sample selection and reward maximization as a tree search and aggregation task over the denoising chain. Drawing inspiration from Monte Carlo Tree Search (MCTS), DTS enables generative models to adapt to new objectives at inference time by constructing and exploring a tree of sample trajectories, formally reusing information from previous generations to improve sample efficiency and reliability. This approach provides a principled, anytime mechanism for turning extra computation into better sample quality and allows for both posterior-aligned sampling and global optimization, without the deficiencies of prior local or population-based guidance algorithms (2506.20701).

1. Principles and Motivation

Generative diffusion models are widely used for data synthesis, image generation, and multi-modal tasks, but aligning samples to new or externally-defined reward functions at inference time is challenging. Traditional steering techniques (e.g., gradient-based guidance, population resampling, local best-of-N search) suffer from value estimation inaccuracies at high noise levels and inefficient compute reuse, leading to biased samples, poor coverage, or high computational cost. DTS addresses these issues by recasting alignment as a search through the denoising process, enabling global credit assignment (backpropagation of final rewards) and incremental improvement through information reuse.

The DTS framework is characterized by four phases—selection, expansion, rollout, and backup—mirroring MCTS. At each phase, the tree is expanded or traversed according to estimated soft values derived from terminal rewards, and internal nodes are continually updated as new terminal results are observed. This paradigm allows diffusion models to efficiently explore the denoising trajectory space and assign importance to promising regions, thereby improving both sample diversity and reward alignment.
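To make the tree structure concrete, the following is a minimal sketch of how a node might be represented in code; the class name, fields, and initial values are illustrative assumptions rather than the paper's implementation, and Python is used throughout.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class Node:
    """Hypothetical DTS tree node holding one partially denoised state x_t."""
    x: Any                                  # latent state x_t (e.g., an image tensor)
    t: int                                  # diffusion timestep: T = pure noise, 0 = data
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    soft_value: float = float("-inf")       # running estimate of the soft value V_t(x_t)
    visits: int = 0                         # number of backups that touched this node
```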

2. Mathematical Formulation

DTS formalizes sampling from the reward-aligned posterior (the "target distribution") as

$$\pi^*(x) = \frac{1}{Z}\, p_\theta(x)\, \exp(\lambda\, r(x)),$$

where $p_\theta(x)$ is the base diffusion model, $r(x)$ is a scalar reward function (the objective), $\lambda$ is an inverse temperature, and $Z$ is the partition function. The denoising sequence $\{x_T, x_{T-1}, \ldots, x_0\}$ defines a path from noise to data.
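As a toy illustration of what this tilted target does (not part of the paper), the snippet below reweights a small, made-up discrete base distribution by $\exp(\lambda r(x))$ and renormalizes; larger $\lambda$ concentrates probability mass on high-reward outcomes.

```python
import numpy as np

# Hypothetical base-model probabilities and rewards over three discrete outcomes.
p_theta = np.array([0.5, 0.3, 0.2])
reward = np.array([0.0, 1.0, 3.0])

for lam in (0.0, 1.0, 5.0):
    tilted = p_theta * np.exp(lam * reward)     # p_theta(x) * exp(lambda * r(x))
    pi_star = tilted / tilted.sum()             # divide by the partition function Z
    print(f"lambda={lam}: {np.round(pi_star, 3)}")
```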

Value assignment along the tree is managed using the soft Bellman equation

$$V_t(x_t) = \frac{1}{\lambda} \log \mathbb{E}_{p_\theta(x_{0:t-1}\mid x_t)}\left[\exp\left(\lambda\, r(x_0)\right)\right],$$

with the recursive update

$$V_t(x_t) = \frac{1}{\lambda} \log \mathbb{E}_{p_\theta(x_{t-1}\mid x_t)}\left[\exp\left(\lambda\, V_{t-1}(x_{t-1})\right)\right].$$

At each branching, the optimal sampling policy is

$$\pi_t^*(x_{t-1}\mid x_t) = \frac{p_\theta(x_{t-1}\mid x_t)\,\exp\left(\lambda\, V_{t-1}(x_{t-1})\right)}{\int p_\theta(x_{t-1}\mid x_t)\,\exp\left(\lambda\, V_{t-1}(x_{t-1})\right) dx_{t-1}}.$$

The tree stores soft value estimates at every intermediate node. New terminal reward observations are propagated up the tree using

$$\hat{v}(x_{t+1}) \leftarrow \frac{1}{\lambda}\log\sum_{x_t \in \mathcal{C}(x_{t+1})}\exp\left(\lambda\, \hat{v}(x_t)\right),$$

where $\mathcal{C}(x_{t+1})$ denotes the children of $x_{t+1}$. This global backup provides statistically consistent credit assignment irrespective of rollout length or branching pattern.
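A minimal sketch of this backup rule, reusing the hypothetical `Node` class above; the leaf-initialization convention and the log-sum-exp max-shift are implementation choices assumed here for numerical stability, not details taken from the paper.

```python
import math

def soft_backup(terminal_value: float, node: Node, lam: float) -> None:
    """Propagate a newly observed terminal reward up the tree.

    The node that produced the rollout is assigned the terminal value directly;
    every ancestor is then recomputed as
        v_hat(x_{t+1}) = (1/lam) * log sum_{children} exp(lam * v_hat(x_t)).
    """
    node.soft_value = terminal_value
    node.visits += 1
    parent = node.parent
    while parent is not None:
        vals = [lam * c.soft_value for c in parent.children
                if c.soft_value != float("-inf")]
        m = max(vals)  # max-shift keeps the exponentials from overflowing
        parent.soft_value = (m + math.log(sum(math.exp(v - m) for v in vals))) / lam
        parent.visits += 1
        parent = parent.parent
```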

3. DTS Algorithmic Workflow

The DTS routine comprises:

  1. Selection: From the root (highest noise), recursively descend to child nodes, each time selecting according to the Boltzmann distribution over current soft value estimates, until an unvisited or leaf node is reached.
  2. Expansion: Perform a new denoising step at the current node, expanding the tree by adding a child node corresponding to a fresh sample $x_{t-1} \sim p_\theta(\cdot \mid x_t)$.
  3. Rollout: Run a full, standard denoising trajectory from the new child to $x_0$, obtaining a terminal sample.
  4. Backup: Assign the observed terminal reward $r(x_0)$ to all ancestor nodes via the soft Bellman backup rule, thereby refining all intermediate soft values along the path (see the sketch following this list).
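Putting the four phases together, the following is one possible sketch of a single DTS iteration; `denoise_step`, `rollout_to_x0`, and `reward_fn` are hypothetical stand-ins for the base diffusion sampler and the reward model, and the `Node` and `soft_backup` definitions are the ones sketched earlier.

```python
import math
import random

def dts_iteration(root: Node, lam: float, denoise_step, rollout_to_x0, reward_fn):
    """One selection / expansion / rollout / backup cycle (illustrative only)."""
    # 1. Selection: descend by Boltzmann sampling over the children's soft values.
    node = root
    while node.children and node.t > 0:
        shift = max(lam * c.soft_value for c in node.children)
        weights = [math.exp(lam * c.soft_value - shift) for c in node.children]
        node = random.choices(node.children, weights=weights, k=1)[0]

    # 2. Expansion: add a fresh child x_{t-1} ~ p_theta(. | x_t).
    if node.t > 0:
        child = Node(x=denoise_step(node.x, node.t), t=node.t - 1, parent=node)
        node.children.append(child)
        node = child

    # 3. Rollout: run the base denoiser from the new child all the way to x_0.
    x0 = rollout_to_x0(node.x, node.t)

    # 4. Backup: propagate r(x_0) to every ancestor via the soft Bellman rule.
    soft_backup(reward_fn(x0), node, lam)
    return x0
```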

Sample generation (after sufficient rollouts) is achieved by traversing the tree from root to a leaf, at each step selecting children with probability proportional to $\exp(\lambda \hat{v})$. The scheme generalizes to the greedy optimization case ("Diffusion Tree Search" or DTS$^\star$), where deterministic best-value selection is used to maximize reward.

Sampling vs. Optimization

  • For sampling (posterior alignment), tree traversal samples child nodes according to the soft value distribution.
  • For optimization, the path with the highest cumulative soft value is greedily selected, searching for the highest-reward sample (both modes are sketched below).
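The two regimes differ only in how children are chosen during the final root-to-leaf traversal; a sketch under the same assumptions as above:

```python
import math
import random

def generate_sample(root: Node, lam: float, greedy: bool = False):
    """Traverse the built tree from root to a leaf and return its state.

    greedy=False: sample children with probability ∝ exp(lam * soft_value)  (DTS, posterior-aligned).
    greedy=True:  always follow the child with the highest soft value       (DTS*, reward maximization).
    """
    node = root
    while node.children:
        if greedy:
            node = max(node.children, key=lambda c: c.soft_value)
        else:
            shift = max(lam * c.soft_value for c in node.children)
            weights = [math.exp(lam * c.soft_value - shift) for c in node.children]
            node = random.choices(node.children, weights=weights, k=1)[0]
    return node.x
```

If the traversal ends at a leaf that is not yet fully denoised, the remaining steps can presumably be completed with the base denoiser, as in the rollout phase.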

4. Empirical Performance and Scalability

DTS has been evaluated on MNIST and CIFAR-10 class-conditional generation, text-to-image (e.g., Stable Diffusion with human or LLM reward functions), and language completion tasks. Across domains, DTS matches or exceeds the sample quality (FID, reward alignment) of SMC-based or gradient guidance baselines with up to 10× less compute, and continues to produce steadily better samples as compute increases (anytime property).

Notable empirical findings:

  • Bias and variance of value estimates at each node are dramatically lower for DTS compared to prior approximations; this is most pronounced at early denoising steps (high noise).
  • Sample diversity is preserved, avoiding mode collapse common in SMC-based or greedy local search methods.
  • Efficiency: For top-1 search (DTS$^\star$), high-reward outputs on text-to-image and language benchmarks are found with 2–5× fewer forward passes compared to best-of-N or resampling schemes.
  • Robust scaling: DTS’s sample quality reliably improves as more rollouts are performed because value estimates for all existing nodes become more accurate; other approaches simply add more independent samples but do not leverage incremental information reuse.

The tree structure incurs higher memory cost, as all explored nodes and their soft values must be retained; however, it allows rapid selection of new samples from the explored region and can be efficiently parallelized in practice.

5. Theoretical Guarantees

DTS provides strong consistency: as the number of rollouts grows, the empirical distribution of terminal samples converges to the correct reward-aligned posterior $\pi^*$ [(2506.20701), proof appendix]. This holds for arbitrary reward functions, denoising architectures, and rollout schedules, provided the tree is expanded sufficiently. The greedy search variant (DTS$^\star$) converges to the optimal sample as the tree grows.

The algorithm’s backup rules ensure global credit assignment, in contrast to SMC or guidance methods whose local assignment schemes are vulnerable to estimation bias. Consequently, DTS mitigates the risk of early misguidance and unbalanced exploration.

6. Applications and Impact

DTS is applicable wherever inference-time alignment of a diffusion model to an extrinsic reward or auxiliary objective is required, including:

  • Conditional image generation (e.g., maximizing visually-discriminable class or aesthetic properties)
  • Preference-aligned text-to-image and text generation (e.g., controlling for semantic alignment, grammatical correctness, or human preference signals)
  • Best-of-N and optimization tasks (DTS$^\star$ for search)
  • Multimodal or adaptive tasks, where sample faithfulness and diversity must be maintained under changing objectives

As an anytime algorithm, DTS is valuable in settings constrained by variable compute budgets, enabling quality to scale with available resources and providing a natural stopping criterion. By reusing and aggregating information across rollouts, it remains robust in both sampling and search regimes.

7. Relation to Other Frameworks and Distinctions

DTS fundamentally differs from:

  • Sequential Monte Carlo (SMC) and particle methods: These spread compute across independent trajectories, with resampling based on approximate or local value. DTS aggregates all experience into a growing, structured tree, ensuring efficient use of resources and globally consistent value estimation.
  • Gradient-based guidance: While these methods steer samples using reward gradients at each step, they typically fail at high noise and do not perform global credit assignment.
  • Local best-of-N search: While best-of-N increases the number of candidates at a timestep, it does not aggregate rewards back through the diffusion process, thus missing global structure and sample reuse.
  • Population annealing/ensemble schemes: These maintain pools of independent candidates rather than a shared search structure. In contrast, DTS provides a unifying, general, and theoretically principled approach for scalable sampling and search, turning extra compute into systematically better or more optimal outputs.

Summary Table

| Method | Value Estimation | Compute Reuse | Scalability | Sample Quality | Anytime Improvement |
| --- | --- | --- | --- | --- | --- |
| Gradient Guidance | Local, biased | No | Particle | Instability, mode collapse | No |
| Particle/SMC/Best-of-N | Approximate, local | No | Particle | Possible collapse | Plateaus |
| Diffusion Tree Sampling (DTS) | Global, consistent (Bellman) | Yes | Rollout/tree | High, scales with compute | Yes |

References

  • "Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models" (2506.20701)
  • Additional algorithms and experimental comparisons in Sec. 3–5 and Appendix A.4 of the paper

DTS thus provides a foundation for robust, efficient, and principled inference-time alignment in diffusion models, with strong theoretical guarantees and demonstrated effectiveness across diverse generative tasks.
