Diffusion Beam Search

Updated 19 November 2025
  • Diffusion Beam Search is a technique that frames the reverse diffusion process as a beam search over latent states to maximize complex, non-differentiable reward metrics.
  • It employs methods like dynamic beam expansion, multi-step lookahead, calibrated reward functions, and bidirectional search to improve sample quality across various domains.
  • Empirical studies show that these strategies outperform greedy and best-of-N sampling by effectively balancing reward alignment, diversity, and computational constraints.

Diffusion Beam Search encompasses a family of inference-time optimization and decoding techniques for diffusion models that employ beam-search-style exploration of the model's denoising (or, more generally, latent) space. By structuring sample generation as a search problem—often in the form of trees or beams of candidate latent states—these approaches address the challenge of aligning generative outputs (images, videos, or sequences) with complex, often non-differentiable reward functions such as perceptual coherence, textual alignment, or task-specific metrics. Key innovations include dynamic beam expansion, calibrated reward surrogates, multi-step lookahead value estimation, and bidirectional ("cyclic") denoising. Diffusion Beam Search has demonstrated empirically superior sample quality, reward alignment, and computational efficiency across image, video, sequence, and planning domains (Oshima et al., 31 Jan 2025, Fernandes et al., 26 Mar 2025, Li et al., 3 Mar 2025, Lee et al., 20 May 2025).

1. Formulation: Search in Diffusion Latent Space

Diffusion Beam Search models the process of generating samples with diffusion models as a sequential search over latent variables corresponding to the intermediate states of the reverse diffusion process. At each denoising step $t$, a population (beam) of candidate latents $\{z_t^j\}_{j=1}^B$ or partial samples is propagated forward (toward $t=0$) using a search strategy, with each branch evaluated via a task-specific reward or alignment metric.

Consider the inference-time alignment goal:

$\max_{x_0} R(x_0, c)$

where $x_0$ is the final generated sample, $c$ is the conditioning context (e.g., a prompt), and $R$ is a (possibly non-differentiable) reward. Standard diffusion sampling is reframed as a controlled stochastic process or Markov tree; the search then proceeds by generating, scoring, and pruning candidate trajectories in latent space, with beam update rules chosen to maximize reward alignment at the leaf nodes (Oshima et al., 31 Jan 2025, Li et al., 3 Mar 2025).
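
Concretely, one beam update can be written as follows (a generic formalization with notation assumed here, not taken verbatim from the cited papers): from the current beam $\{z_t^j\}_{j=1}^B$, draw $K$ posterior candidates per element and retain the top $B$ under an intermediate reward estimate $\hat{R}$:

$\mathcal{Z}_{t-1} = \bigcup_{j=1}^{B} \left\{ z_{t-1}^{j,k} \sim p_\theta\left(z_{t-1} \mid z_t^j, c\right) \right\}_{k=1}^{K}, \qquad \{z_{t-1}^{j}\}_{j=1}^{B} = \operatorname*{top-}B_{\; z \in \mathcal{Z}_{t-1}} \, \hat{R}(z, c)$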

2. Algorithms: Core Variants and Implementation

2.1 Diffusion Latent Beam Search (DLBS) with Lookahead

DLBS organizes the reverse-diffusion process as a discrete-time beam search:

  • At each time step $t$:
    • For each of $B$ beams, generate $K$ candidate next latents by sampling from the DDIM or DDPM posterior.
    • For each candidate, optionally run $L$ deterministic DDIM steps ("lookahead") to obtain a lower-variance reward estimate.
    • Evaluate the calibrated reward function $R_\text{calibrated}$ on each candidate.
    • Select the top $B$ candidates by reward, discarding the rest.
  • Iterate to $t=0$; return the $z_0^*$ with maximal $R_\text{calibrated}$.

Key pseudocode appears below:

# beams holds B latents initialized at z_T; sample_posterior, ddim_lookahead,
# posterior_mean, decode, and R_calibrated are model- and task-specific helpers.
beams = init_latents(B)
for t in range(T, 0, -1):
    candidates, rewards = [], []
    for z in beams:
        for _ in range(K):
            # Candidate latent generation: sample z_{t-1} from the posterior
            z_cand = sample_posterior(z, t)
            # (Optional) lookahead: L deterministic DDIM steps give a
            # lower-variance reward estimate
            z_est = ddim_lookahead(z_cand, t, L) if L > 0 else posterior_mean(z_cand, t)
            candidates.append(z_cand)
            rewards.append(R_calibrated(decode(z_est), c))
    # Beam prune: keep the top-B candidates by reward
    order = sorted(range(len(candidates)), key=rewards.__getitem__, reverse=True)
    beams = [candidates[i] for i in order[:B]]
best_z0 = beams[0]  # highest-reward final latent
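
The ddim_lookahead helper above can be sketched as repeated deterministic (eta = 0) DDIM updates; ddim_step below is an assumed sampler primitive, and this is an illustration rather than the authors' implementation:

def ddim_lookahead(z, t, L):
    """Run up to L deterministic (eta = 0) DDIM steps from timestep t,
    returning a low-variance estimate of the clean latent."""
    for step in range(min(L, t)):
        z = ddim_step(z, t - step)  # assumed deterministic DDIM update
    return z
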
DLBS with lookahead (DLBS-LA) efficiently mitigates error accumulation and noisy reward estimates, and is shown to dominate greedy and best-of-$N$ sampling in both alignment and diversity metrics (Oshima et al., 31 Jan 2025).

2.2 Dynamic Search for Diffusion (DSearch)

DSearch approaches inference-time alignment via a tree-search that maintains a dynamically scheduled beam. Key mechanisms:

  • Variable beam width $b(t)$ and tree width $w(t)$ under fixed per-step compute.
  • Heuristic value estimates (using $K$-step lookahead with multiple "particles" per node) guide expansion and pruning.
  • Optional “beam-resample” replaces low-score beams with high-scoring ones.
  • Flexible search scheduling performs expansion only at selected time steps.

This confers the ability to prioritize exploration during early denoising and to prune aggressively as $t \to 0$, optimizing reward for a fixed inference-time computational budget (Li et al., 3 Mar 2025).
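
A minimal sketch of one such schedule, assuming a fixed per-step candidate budget C split between beam width and tree width; the linear decay and default values are illustrative, not taken from the paper:

def dsearch_schedule(t, T, C=64, b_max=16):
    """Illustrative schedule: wide beams early (exploration), narrow
    beams with more candidates per node late (exploitation), keeping
    b(t) * w(t) roughly equal to C at every denoising step."""
    frac = t / T                      # 1.0 at the start, -> 0 near t = 0
    b = max(1, round(b_max * frac))   # beam width shrinks as t -> 0
    w = max(1, C // b)                # tree width fills the remaining budget
    return b, w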

2.3 Adaptive Bi-directional Cyclic Diffusion (ABCD)

ABCD generalizes classical beam search with bi-directional cycles:

  • Each iteration comprises fast DDIM denoising from multiple anchors ("beam elements"), selection of the top-$K$, replication, re-noising to multiple earlier timesteps ("explore/exploit" via a temperature pool), and subsequent re-denoising and scoring.
  • Automatic Exploration–Exploitation Balancing (AEEB) allocates compute adaptively over different go-back levels.
  • Adaptive Thinking Time (ATT) detects convergence and stops early for easy instances but continues for hard ones.

ABCD achieves efficient search in high-dimensional spaces by unifying cyclic exploration, global pruning, and adaptive compute allocation (Lee et al., 20 May 2025).
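
The control flow of one such cycle can be sketched as follows; denoise, score, and renoise are assumed model-specific helpers, and this illustrates the loop structure rather than the authors' implementation:

def abcd_cycle(anchors, t_pool, K, cond):
    """One ABCD cycle: fast DDIM denoising from each anchor, selection
    of the top-K by reward (global pruning), replication, and re-noising
    to several earlier timesteps drawn from an explore/exploit pool."""
    scored = [(score(denoise(z, cond), cond), z) for z in anchors]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    survivors = [z for _, z in scored[:K]]
    # Replicate each survivor and jump back to multiple go-back levels
    return [renoise(z, t_back) for z in survivors for t_back in t_pool]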

3. Reward Function Design and Calibration

Reward calibration is central to effective beam search in diffusion models with complex alignment objectives. In DLBS (Oshima et al., 31 Jan 2025), the final reward takes the form

$R_\text{calibrated}(x_0, c) = \sum_{i=1}^M w_i M_i(x_0, c)$

where $M_i$ are base perceptual/video metrics (subject consistency, motion smoothness, dynamics, aesthetics, image quality, text-video consistency), each normalized to $[0, 1]$, and weights $w_i$ are grid- or least-squares-optimized for correlation with human or VLM feedback.
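
A minimal sketch of the least-squares variant of this calibration, assuming a matrix M of normalized metric scores (one row per sample, one column per metric) and target human/VLM scores y; this illustrates the fitting step, not the exact procedure in the paper:

import numpy as np

def calibrate_weights(M, y):
    """Fit weights w so that M @ w approximates human/VLM scores y in
    the least-squares sense; M is (n_samples, n_metrics), with each
    metric already normalized to [0, 1]."""
    w, *_ = np.linalg.lstsq(M, y, rcond=None)
    return w

def r_calibrated(metric_scores, w):
    """Weighted sum of per-metric scores for a single sample."""
    return float(np.dot(metric_scores, w))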

Other approaches, e.g., BeamDiffusion (Fernandes et al., 26 Mar 2025), employ trainable contrastive classifiers $\varphi$ for sequence alignment, while DSearch uses domain-specific rewards (compressibility, activity, docking score, etc.) depending on the application.

Reward calibration is necessary to align inference-time optimization with human or AI evaluators, as off-the-shelf metrics often correlate imperfectly with perceptual or semantic quality (Oshima et al., 31 Jan 2025).

4. Architectural and Algorithmic Extensions

Several architectural and search refinements have enabled diffusion beam search to excel in various domains:

  • Multi-modal Conditioning: Integration of cross-attention and language encoders to provide context-sensitive denoising, enabling video, image sequence, and text-to-image alignment (Fernandes et al., 26 Mar 2025).
  • Dynamic Priors for Sequences: Reuse of latents from previous steps (BeamDiffusion) facilitates visual and semantic consistency in generated image sequences.
  • Beam Scheduling: Exponential or custom beam-width decay, and search scheduling to concentrate compute where the reward landscape is most uncertain (Li et al., 3 Mar 2025).
  • Beam Resampling and Diversity Promotion: Systematic replacement of low-reward beams with sampled high-reward ones prevents mode collapse and improves diversity.
  • Cyclic and Bidirectional Search: Bi-directional “cyclic” exploration in ABCD enables effective escape from local minima in planning and hard-search tasks (Lee et al., 20 May 2025).

5. Empirical Performance and Comparative Analyses

Diffusion Beam Search variants consistently outperform classical baselines such as greedy decoding, best-of-$N$ sampling, differentiable classifier guidance, and SMC-based sampling methods across visual, sequence, and planning benchmarks.

Representative empirical results:

| Method | Alignment Reward ↑ | Human/VLM Score ↑ | Diversity ↑ | Runtime (min/video) |
|---|---|---|---|---|
| Best-of-$N$ | Saturates at $KB \approx 16$ | Plateaus | Lower | Fastest |
| Greedy search | Suboptimal basins | Lower | Lower | Moderate |
| DLBS | Monotonic in $KB$ | Increases | Higher | 90 ($T=50$, $KB=32$) |
| DLBS-LA ($L=6$, $KB=8$) | Matches/exceeds DLBS | High | Highest | 30 |

On image sequences, BeamDiffusion decisively improves semantic and visual coherence over nucleus sampling and retrieval-based pipelines, confirmed by both human and GPT-4o/Gemini evaluations (Fernandes et al., 26 Mar 2025). DSearch and DSearch-R achieve state-of-the-art non-differentiable reward maximization in molecular, biological, and image domains (Li et al., 3 Mar 2025). ABCD achieves optimal task metrics with reduced inference cost and adaptive early stopping (Lee et al., 20 May 2025).

6. Trade-offs, Scalability, and Implementation Guidelines

Trade-offs in diffusion beam search center on allocating the compute budget among beam width ($B$), candidates per beam ($K$), lookahead steps ($L$), and total denoising steps ($T$); a budget sketch follows the list below:

  • Increasing $B$ widens hypothesis diversity but multiplies decoding and reward evaluations.
  • Increasing $K$ explores a larger action space per beam, improving exploration.
  • Larger $L$ (lookahead) yields more accurate intermediate value estimates, often more effective than increasing $B \cdot K$ under fixed compute.
  • Larger $T$ grants higher-fidelity denoising, at a linear cost in computation.
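
As a back-of-the-envelope budget model (an illustrative assumption, not a cost formula from the cited papers), counting denoiser and reward calls makes the trade-off concrete:

def dlbs_budget(B, K, L, T):
    """Each denoising step runs B*K candidate updates, L lookahead
    steps per candidate, and one decode + reward call per candidate."""
    denoiser_calls = T * B * K * (1 + L)  # network function evaluations
    reward_calls = T * B * K              # decode + reward evaluations
    return denoiser_calls, reward_calls

# Example: lookahead (B=8, K=4, L=6) vs. wide search (B=16, K=8, L=0)
print(dlbs_budget(8, 4, 6, 50))    # (11200, 1600)
print(dlbs_budget(16, 8, 0, 50))   # (6400, 6400)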

Heuristics established in the literature recommend prioritizing lookahead ($L$) for reward-estimation accuracy before expanding the search budget, and adopting a moderate to high $T$ for fidelity (Oshima et al., 31 Jan 2025, Li et al., 3 Mar 2025). Beam search parameters should be optimized for per-domain computational constraints, with ABCD offering instance-specific adaptive scaling (Lee et al., 20 May 2025).

Batch parallelism, memory management (balancing $B$ and $K$), and hardware-efficient reward computation are critical for large-scale or high-dimensional outputs (e.g., video, molecular graphs).
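
For example, reward evaluation over the $B \cdot K$ candidates at each step can be chunked to bound peak memory; decode_batch and R_calibrated below are assumed task-specific helpers, and the chunk size is illustrative:

def batched_rewards(latents, cond, chunk=16):
    """Score B*K candidate latents in fixed-size chunks so decoding
    memory stays bounded regardless of beam configuration."""
    rewards = []
    for i in range(0, len(latents), chunk):
        batch = latents[i:i + chunk]
        rewards.extend(R_calibrated(x, cond) for x in decode_batch(batch))
    return rewards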

7. Applications and Limitations

Diffusion Beam Search has been applied to:

  • Text-to-video and image generation with perceptual and alignment rewards (Oshima et al., 31 Jan 2025).
  • Coherent image sequence generation with BeamDiffusion (Fernandes et al., 26 Mar 2025).
  • Molecular and biological design with non-differentiable rewards such as docking scores (Li et al., 3 Mar 2025).
  • Planning and hard-search tasks via bi-directional cyclic diffusion (Lee et al., 20 May 2025).

Limitations include higher per-step computational costs (especially with large $K$, $B$, or lookahead $L$), and scalability challenges for extremely high-dimensional data if reward evaluation or decoding is expensive. ABCD and bi-directional search enable more flexible trade-offs but may require greater engineering investment for unstructured tasks.

