Adaptive Bi-directional Cyclic Diffusion (ABCD)

Updated 26 June 2026

The paper introduces ABCD as an adaptive inference framework that iteratively refines a population of candidate samples through bi-directional diffusion cycles.
It uses a temperature pool to dynamically balance exploration and exploitation, allocates compute according to instance difficulty, and applies an early stopping criterion.
Empirical evaluations demonstrate ABCD's effectiveness on tasks such as Sudoku and molecule generation, with high success rates and improved time-accuracy trade-offs.

Adaptive Bi-directional Cyclic Diffusion (ABCD) is an adaptive, search-based inference framework for diffusion models that dynamically scales computational effort during sampling, allowing instance-specific allocation of compute and principled early stopping. In contrast to conventional uni-directional, fixed-schedule denoising, ABCD iteratively refines a population of candidate samples through bi-directional diffusion cycles, automatically tuning both how deeply to explore and when to stop in order to maximize a task-specific reward or verifier. ABCD has three main components (Cyclic Diffusion Search, Automatic Exploration-Exploitation Balancing, and Adaptive Thinking Time) and has proven effective across generative and reasoning tasks by concentrating compute where it is needed without sacrificing efficiency (Lee et al., 20 May 2025).

1. Formal Definition and Architecture

Let $p_\theta(x_{0:T})$ denote a pretrained diffusion model characterized by a forward noising kernel $q(x_{t+1}|x_t) = \mathcal{N}(x_{t+1}; \sqrt{\alpha_{t+1}} x_t, (1-\alpha_{t+1})I)$ and a reverse denoiser $p_\theta(x_{t-1}|x_t)$. Given a task-specific reward function $r(x_0)$ and a population of $N$ candidate samples ("particles"), standard sampling runs a single uni-directional reverse chain $x_T \sim \mathcal{N}(0, I) \rightarrow \dots \rightarrow x_0$ per particle and never revisits the result. ABCD reframes this as a search process that:

Iteratively seeks to maximize $r(x_0)$ by refining particle populations.
Allocates increased computation adaptively to more difficult instances.
Dynamically terminates inference when further improvement is unlikely.

The core algorithm introduces three elements: (i) Bi-directional cycling through the diffusion timeline (combining both forward "go-back" noising and reverse denoising steps). (ii) Multi-level allocation of particles at different stages of the diffusion process ("temperature pool"). (iii) Adaptive termination dictated by the dynamic progress of top candidates.

2. Cyclic Diffusion Search: Mechanics and Mathematical Formulation

Cyclic Diffusion Search (CDS) is the operational foundation of ABCD. It proceeds as follows:

Define the "temperature pool" $\mathcal{T} = \{t_1, \dots, t_M\}$ ($0 = t_1 < \dots < t_M = T$), which specifies the set of possible "go-back" levels for each cycle.
Maintain a population of $N$ particles $\{x_0^{(i)}\}_{i=1}^{N}$ at each cycle $k$. Each particle is associated with a verifier score $r(x_0^{(i)})$.
Each cycle consists of:
- Fast Denoising: Initialize by running a fast (few-step) DDIM reverse chain from $x_T^{(i)} \sim \mathcal{N}(0, I)$ for all $i = 1, \dots, N$.
- Selection-and-Copy: Select the indices $\mathcal{S}_k$ of the top-$K$ particles by $r(x_0^{(i)})$, then replicate each selected particle $R$ times ($KR$ total).
- Noising (Forward): For each replica and each $t_j \in \mathcal{T}$, sample $x_{t_j} \sim q(x_{t_j} \mid x_0)$, where $x_0$ is the replica.
- Denoising (Reverse): Run the fast DDIM reverse chain from $t_j$ down to $0$ for all $KRM$ noised particles to produce new candidates $x_0$.
- Population Update: Merge the $KRM$ outputs, evaluate $r(x_0)$ for each, and proceed to the next cycle.
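One cycle can be sketched in NumPy on a toy 1-D problem. This is a minimal illustration under stated assumptions, not the authors' implementation: `cds_cycle` and `toy_denoise` are hypothetical names, the linear beta schedule, temperature pool, and reward are illustrative, and the "denoiser" is a single DDIM jump for a standard-normal data prior (whose optimal noise prediction is $\sqrt{1-\bar\alpha_t}\,x_t$), standing in for a trained model. $K$ is the number of survivors, $R$ the replicas per survivor, and $M$ the pool size; because the pool contains level $0$, the level-0 replicas are exact copies of the survivors, so the best reward cannot decrease from one cycle to the next.

```python
import numpy as np

def toy_denoise(xt, ab):
    """Hypothetical stand-in for the fast reverse chain: one DDIM jump from level t to 0.

    For a standard-normal data prior the optimal noise prediction is sqrt(1 - ab) * x_t,
    which gives x0_hat = sqrt(ab) * x_t.
    """
    return np.sqrt(ab) * xt

def cds_cycle(particles, levels, reward, alpha_bar, K, R, rng):
    """One hypothetical Cyclic Diffusion Search cycle.

    particles : (N, d) array of current x_0 candidates
    levels    : temperature pool t_1 < ... < t_M (timestep indices into alpha_bar)
    alpha_bar : cumulative products with alpha_bar[0] == 1, so level 0 adds no noise
    Returns the K * R * M new particles and the go-back level each one came from.
    """
    scores = np.array([reward(x) for x in particles])
    survivors = particles[np.argsort(scores)[-K:]]             # selection: top-K by reward
    new_x, origin = [], []
    for x0 in survivors:                                       # each survivor is an anchor
        for _ in range(R):                                     # copy: R replicas per anchor
            for t in levels:                                   # each replica is noised to every level
                ab = alpha_bar[t]
                noise = rng.standard_normal(x0.shape)
                xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise   # forward jump q(x_t | x_0)
                new_x.append(toy_denoise(xt, ab))                     # reverse back to x_0
                origin.append(t)
    return np.stack(new_x), np.array(origin)                   # merged population

def reward(x):
    # Illustrative non-learned reward that peaks at x = 2.
    return -abs(x[0] - 2.0)

rng = np.random.default_rng(0)
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))])
levels = [0, 100, 400, 1000]

particles = rng.standard_normal((16, 1))                       # stand-in for the fast-denoised init
best = []
for _ in range(10):
    particles, origin = cds_cycle(particles, levels, reward, alpha_bar, K=4, R=1, rng=rng)
    best.append(max(reward(x) for x in particles))
print(particles.shape)  # (16, 1): K * R * M = 4 * 1 * 4
```

Keeping the level-0 copies is what makes selection monotonic in this sketch; the deeper levels supply the exploratory proposals.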

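Forward (noising) and reverse updates use DDIM-like transitions. With $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$, the forward jump from a clean particle to go-back level $t$ is

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t) I\big),$$

and each deterministic reverse step from $t$ to an earlier level $t' < t$, driven by the model's noise prediction $\epsilon_\theta(x_t, t)$, is

$$x_{t'} = \sqrt{\bar\alpha_{t'}}\, \hat{x}_0 + \sqrt{1-\bar\alpha_{t'}}\, \epsilon_\theta(x_t, t), \qquad \hat{x}_0 = \frac{x_t - \sqrt{1-\bar\alpha_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}}.$$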

The process continues until the adaptive stopping criterion is triggered.
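The DDIM-style forward jump and reverse update can be checked numerically. The sketch below assumes the standard DDIM form, $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0, (1-\bar\alpha_t)I)$ and deterministic ($\eta = 0$) reverse steps driven by a noise predictor $\epsilon_\theta$; `q_sample`, `ddim_step`, the linear schedule, and the coarse timestep list are illustrative assumptions. Feeding the true noise in place of $\epsilon_\theta$ (an oracle) should return the original $x_0$, which confirms that the two transitions are mutually consistent.

```python
import numpy as np

def q_sample(x0, ab_t, eps):
    """Forward jump x_0 -> x_t: a draw from N(sqrt(ab_t) x_0, (1 - ab_t) I) given noise eps."""
    return np.sqrt(ab_t) * x0 + np.sqrt(1.0 - ab_t) * eps

def ddim_step(xt, ab_t, ab_prev, eps_pred):
    """Deterministic DDIM update (eta = 0) from level t to an earlier level t'."""
    x0_hat = (xt - np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(ab_t)   # predicted clean sample
    return np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_pred

# Illustrative linear beta schedule; alpha_bar[0] = 1 corresponds to clean data.
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))])

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
eps = rng.standard_normal(4)

schedule = [600, 450, 300, 150, 0]                 # coarse "fast" reverse schedule
x = q_sample(x0, alpha_bar[schedule[0]], eps)      # go back to level 600
for t, t_prev in zip(schedule[:-1], schedule[1:]):
    # Oracle: the true noise stands in for eps_theta(x_t, t) in this consistency check.
    x = ddim_step(x, alpha_bar[t], alpha_bar[t_prev], eps)

print(np.allclose(x, x0))  # True
```

With a trained $\epsilon_\theta$ the recovered sample differs from the original, and that difference is exactly what a go-back cycle exploits to propose new candidates.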

3. Automatic Exploration–Exploitation Balancing

ABCD adjusts exploration depth through the temperature pool, a mechanism called Automatic Exploration–Exploitation Balancing (AEEB). Instead of a single fixed go-back level $t$, each cycle distributes every selected survivor across all $M$ temperature levels in $\mathcal{T}$. For survivors $x_0^{(i)}$, $i \in \mathcal{S}_k$:

$$x_{t_j}^{(i,r,j)} \sim q\big(x_{t_j} \mid x_0^{(i)}\big), \qquad r = 1, \dots, R, \quad j = 1, \dots, M.$$

No explicit regularizer is applied; underperforming go-back levels are naturally filtered out by selection in subsequent cycles. The hyperparameters controlling this trade-off are $M$ (pool size), $R$ (replicas per anchor), and $K$ (survivors per cycle). This structure supports both global exploration (large $t_j$) and fine local refinement (small $t_j$), adapted to each instance.
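The bookkeeping behind AEEB can be sketched in plain Python. The helpers `replica_plan` and `cycle_cost` are hypothetical; they assume $K$ survivors, $R$ replicas per anchor, and $M$ pool levels, with every replica noised to every level, so one cycle launches $K \cdot R \cdot M$ reverse chains. If each chain uses $S$ DDIM steps, the cycle costs $KRMS$ network calls, and choosing $R = N/(KM)$ keeps that at $NS$.

```python
from collections import Counter
from itertools import product

def replica_plan(survivor_ids, R, pool):
    """Hypothetical AEEB assignment: (survivor, replica, go-back level) for every noised particle."""
    return [(i, r, t) for i, r, t in product(survivor_ids, range(R), pool)]

def cycle_cost(K, R, M, S):
    """Network evaluations per cycle if every reverse chain takes S DDIM steps.

    Level 0 needs no denoising in practice, so counting it as S steps makes this an upper bound.
    """
    return K * R * M * S

pool = [0, 50, 200, 500, 1000]          # illustrative temperature pool, M = 5
N, K, S = 40, 4, 10
R = N // (K * len(pool))                # R = N / (K M) = 2
plan = replica_plan(survivor_ids=[3, 7, 11, 19], R=R, pool=pool)

print(len(plan))                                    # 40 == K * R * M == N
print(Counter(t for _, _, t in plan))               # each level receives K * R = 8 particles
print(cycle_cost(K, R, len(pool), S))               # 400 == N * S
```

Every level receives the same share up front; any adaptivity comes from which levels' outputs survive the next selection step.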

4. Adaptive Thinking Time: Cycle Termination and Instance Difficulty

Adaptive Thinking Time (ATT) governs termination by monitoring which go-back levels produce the current top-$K$ solutions across cycles. Let $\mathcal{G}_k$ be the multiset of go-back levels that produced the present top-$K$ at cycle $k$:

Define $\tau_k = \max \mathcal{G}_k$, the deepest go-back level still contributing to the top-$K$.
Inference terminates at cycle $k$ if $\tau_j = t_1$ for all $j \in \{k-\kappa+1, \dots, k\}$, for a given persistence $\kappa$; that is, only the smallest go-back level (pure exploitation) has contributed for $\kappa$ consecutive cycles.

This lets cycles persist longer on more challenging instances, as reflected in the observed history of $\tau_k$, and terminate rapidly on easier ones.

The hyperparameters are $\kappa$ (how many consecutive exploitation-only cycles trigger termination) and $C_{\max}$ (an upper limit on the number of cycles).
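The stopping rule can be written as a small predicate. This is a sketch under assumptions: `should_stop` is a hypothetical name, `history[k]` is assumed to hold the go-back levels $\mathcal{G}_k$ that produced the top-$K$ at cycle $k$, and the rule is taken to fire once $\tau_k = \max \mathcal{G}_k$ has equalled the smallest pool level $t_1$ for $\kappa$ consecutive cycles, or once a cycle cap is reached.

```python
from typing import Sequence

def should_stop(history: Sequence[Sequence[int]], t1: int, kappa: int, max_cycles: int) -> bool:
    """Hypothetical ATT check.

    history[k] is the multiset G_k of go-back levels that produced the top-K at cycle k.
    Stops when tau_k = max(G_k) equals t1 for the last kappa cycles, or at the cycle cap.
    """
    if len(history) >= max_cycles:            # hard upper limit on cycles
        return True
    if len(history) < kappa:                  # not enough history to judge persistence
        return False
    taus = [max(levels) for levels in history[-kappa:]]
    return all(tau == t1 for tau in taus)     # only pure exploitation for kappa cycles

history = [
    [1000, 200, 0, 0],   # cycle 1: deep go-backs still win
    [200, 0, 0, 0],      # cycle 2: shallower refinements win
    [0, 0, 0, 0],        # cycle 3: only level-t1 particles remain on top
    [0, 0, 0, 0],        # cycle 4
]
print(should_stop(history[:3], t1=0, kappa=2, max_cycles=50))  # False
print(should_stop(history, t1=0, kappa=2, max_cycles=50))      # True
```

On an easy instance the top-$K$ collapses onto level $t_1$ early and the predicate fires after $\kappa$ cycles; on a hard one, deep go-backs keep winning and the loop runs longer, up to the cap.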

5. Computational Complexity and Convergence Guarantees

The paper formalizes ABCD in pseudocode; its per-cycle computational cost is

$$C_{\text{cycle}} = O(K \cdot R \cdot M \cdot S),$$

where $S$ is the number of DDIM steps, and choosing $R = N/(KM)$ attains an overall per-cycle cost of $O(NS)$, the same budget as running $N$ independent $S$-step samplers. The algorithm is guaranteed to terminate in finitely many cycles with probability 1, provided the reward is bounded above by $r^*$, the denoising step has full support, and selection is monotonic:

Theorem 1 (Finite-time termination): Using ATT, termination occurs at a finite cycle $k$ almost surely, with the best reward in the final top-$K$ converging to $r^*$.
Proof outline: Each cycle has a constant nonzero probability $p > 0$ of hitting the global maximum reward $r^*$, so the number of cycles until $r^*$ first appears is geometrically bounded and finite almost surely. Once $r^*$ appears in the top-$K$, monotonic selection keeps it there, and subsequent cycles quickly enter persistent exploitation, triggering ATT's stopping criterion.
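The first step of the proof outline is a standard geometric waiting-time argument. Let $H$ be the first cycle whose population contains a sample of reward $r^*$, and assume, as the outline does, that conditional on the past each cycle reaches $r^*$ with probability at least $p > 0$. Then

$$\Pr[H > k] \le (1-p)^k \xrightarrow[k \to \infty]{} 0, \qquad \mathbb{E}[H] = \sum_{k \ge 0} \Pr[H > k] \le \sum_{k \ge 0} (1-p)^k = \frac{1}{p}.$$

For instance, $p = 0.05$ gives $\mathbb{E}[H] \le 20$ cycles. The remaining step, that ATT fires soon after $r^*$ enters the top-$K$, relies on monotonic selection and is not captured by this bound.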

6. Empirical Evaluation and Ablation Studies

ABCD has been benchmarked across six diverse tasks using multiple baselines: Base Diffusion, Best-of-N (BoN), Diffusion Beam Search (BS), Sequential Monte Carlo (SMC), and Search-over-Paths (SoP).

| Task | Key Metric(s) | Core Results |
|---|---|---|
| Mixture of Gaussians | Success rate | Only ABCD attains 100% success, using adaptive go-backs, within 25 cycles |
| Sudoku | Accuracy, time | ABCD achieves 100% accuracy while SoP caps at ~95.5%; superior time-accuracy trade-off |
| Pixel Maze | Success rate | Near-100% success for maze sizes up to 15, in far less time than baselines |
| Molecule Generation | Stability, time | Peaks at ~0.99 stability in 20 s; SoP reaches only ~0.94 and takes much longer |
| OGBench PointMaze | Success rate | Only method to reach 100% on the giant maze; faster in all environments |
| Text-to-Image (Stable Diffusion) | Compressibility, aesthetic, human preference | Reaches target levels in about 1/4 of the baselines' time; fewer seconds needed for a matched human preference score (e.g., 81 s vs. ~300 s for SoP) |

Ablation analysis reveals:

Temperature pool yields higher performance than any fixed go-back.
Adaptive cycle endpoint (ATT) improves trade-off over fixed cycle counts.
Per-instance cycle count/time correlates with instance difficulty.
Sample diversity is preserved even as convergence to high-reward regions occurs (assessed via CLIP cosine similarity).

7. Relation to Prior Methods and Applicability

Uni-directional approaches (Base, BoN, SMC, BS) lack adaptability, wasting computation on easy cases or under-exploring difficult ones. SoP introduces backward steps but is restricted to a fixed schedule. ABCD generalizes these approaches by:

Allowing bi-directional cycling and flexible exploration at every cycle.
Distributing computation both globally and locally via the temperature pool.
Providing a principled, instance-wise stopping criterion (ATT) with convergence guarantees.

ABCD is especially well-suited for tasks with heterogeneous instance difficulty—including reasoning/planning (e.g., maze navigation, Sudoku), generative tasks that require focused computation (e.g., stable molecule synthesis, high-fidelity image generation), or scenarios with nondifferentiable rewards. As it relies only on a black-box verifier, ABCD supports a broad range of problems beyond differentiable or likelihood-based domains.
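To illustrate what a black-box verifier can look like, here is a hypothetical Sudoku reward: the fraction of the 27 units (rows, columns, and 3×3 boxes) that contain each digit from 1 to 9 exactly once. It is piecewise constant in the grid, so it offers no useful gradient, yet it can rank particles directly. The paper's actual Sudoku reward may differ; `sudoku_reward` is an illustrative name.

```python
import numpy as np

def sudoku_reward(grid: np.ndarray) -> float:
    """Hypothetical verifier: fraction of rows, columns, and 3x3 boxes that are valid."""
    digits = set(range(1, 10))
    units = [grid[r, :] for r in range(9)] + [grid[:, c] for c in range(9)]
    units += [grid[r:r + 3, c:c + 3].ravel() for r in range(0, 9, 3) for c in range(0, 9, 3)]
    valid = sum(set(u.tolist()) == digits for u in units)   # a unit is valid iff it holds 1..9
    return valid / len(units)

# A valid solved grid from a standard shifting pattern.
solved = np.array([[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)])
broken = solved.copy()
broken[0, 0] = broken[0, 1]      # duplicate digit: breaks row 0, column 0, and box 0

print(sudoku_reward(solved))     # 1.0
print(sudoku_reward(broken))     # 0.888... (24 of 27 units valid)
```

Because ABCD only compares scores across particles, any such scoring function can be plugged in without differentiating through it.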

In summary, ABCD advances inference-time scaling in diffusion-based models by treating sampling as an adaptive search process, offering flexible resource allocation, and furnishing practical guarantees across a variety of domains (Lee et al., 20 May 2025).


References

Lee et al. (2025). Adaptive Inference-Time Scaling via Cyclic Diffusion Search.
