Adaptive Bi-directional Cyclic Diffusion (ABCD)

Updated 26 June 2026
  • The paper introduces ABCD as an adaptive inference framework that iteratively refines a population of candidate samples using bi-directional cyclic diffusion cycles.
  • It employs a temperature pool to dynamically balance exploration and exploitation, allocating compute based on instance difficulty and applying an early stopping criterion.
  • Empirical evaluations demonstrate ABCD's effectiveness in tasks like Sudoku and molecule generation, achieving high success rates and improved time-accuracy trade-offs.

Adaptive Bi-directional Cyclic Diffusion (ABCD) is an adaptive, search-based inference framework for diffusion models that dynamically scales computational effort during sampling, allowing for instance-specific allocation of compute and principled early stopping. In contrast to conventional uni-directional, fixed-schedule denoising, ABCD iteratively refines a population of candidate samples through bi-directional diffusion cycles, automatically tuning the depth of exploration and termination criterion to maximize a task-specific reward or verifier. ABCD consists of three main components: Cyclic Diffusion Search, Automatic Exploration-Exploitation Balancing, and Adaptive Thinking Time, and has demonstrated empirical effectiveness across generative and reasoning tasks by concentrating compute where needed without sacrificing efficiency (Lee et al., 20 May 2025).

1. Formal Definition and Architecture

Let $p_\theta(x_{0:T})$ denote a pretrained diffusion model characterized by a forward noising kernel $q(x_{t+1} \mid x_t) = \mathcal{N}(x_{t+1}; \sqrt{\alpha_{t+1}}\, x_t, (1-\alpha_{t+1}) I)$ and a reverse denoiser $p_\theta(x_{t-1} \mid x_t)$. Given a task-specific reward function $r(x_0)$ and a population of $N$ candidate samples ("particles"), standard sampling runs a uni-directional reverse chain $x_T \sim \mathcal{N}(0, I) \rightarrow \dots \rightarrow x_0$ to produce one output. ABCD reframes this as a search process that:

  • Iteratively seeks to maximize $r(x_0)$ by refining particle populations.
  • Allocates increased computation adaptively to more difficult instances.
  • Dynamically terminates inference when further improvement is unlikely.

The core algorithm introduces three elements: (i) Bi-directional cycling through the diffusion timeline (combining both forward "go-back" noising and reverse denoising steps). (ii) Multi-level allocation of particles at different stages of the diffusion process ("temperature pool"). (iii) Adaptive termination dictated by the dynamic progress of top candidates.

2. Cyclic Diffusion Search: Mechanics and Mathematical Formulation

Cyclic Diffusion Search (CDS) is the operational foundation of ABCD. Each run proceeds as follows:

  • Define the "temperature pool" $\mathcal{T} = \{t_1, \dots, t_M\}$ ($0 = t_1 < \dots < t_M = T$), which specifies the set of possible "go-back" levels for each cycle.
  • Maintain a population of $N$ particles $\{x_0^{(i)}\}_{i=1}^{N}$ at each cycle $c$. Each particle is associated with a verifier score $r(x_0^{(i)})$.
  • Each cycle consists of:
    • Fast Denoising: Initialize by applying the reverse denoiser $p_\theta$ from $x_T^{(i)} \sim \mathcal{N}(0, I)$ down to $x_0^{(i)}$ for all $i \in \{1, \dots, N\}$.
    • Selection-and-Copy: Select the indices of the top-$K$ particles by $r(x_0^{(i)})$, then replicate each selected particle $J$ times ($K \cdot J$ total).
    • Noising (Forward): For each replica and each $t_j \in \mathcal{T}$, sample $x_{t_j} \sim q(x_{t_j} \mid x_0)$.
    • Denoising (Reverse): Run $p_\theta$ from level $t_j$ back to $0$ for all noised particles to produce new $x_0$.
    • Population Update: Merge the $K \cdot J \cdot M$ outputs, evaluate $r(x_0)$, and proceed to the next cycle.

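Forward (noising) and reverse updates use DDIM-like transitions. With $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, the go-back noising step is

$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$

and the deterministic reverse step is

$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \hat{x}_0(x_t) + \sqrt{1 - \bar{\alpha}_{t-1}}\, \epsilon_\theta(x_t, t), \quad \hat{x}_0(x_t) = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}$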
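As a minimal sketch of these two transitions, the snippet below jumps a clean sample to noise level t in one shot and then walks back with deterministic DDIM steps (eta = 0). The names `alpha_bar`, `go_back`, `ddim_step`, and the oracle noise predictor are illustrative assumptions and do not come from the paper; a real system would call the pretrained noise-prediction network where the oracle is used here.

```python
import math
import random

def alpha_bar(alphas, t):
    """Cumulative product alpha_1 * ... * alpha_t (equals 1.0 for t = 0)."""
    prod = 1.0
    for a in alphas[:t]:
        prod *= a
    return prod

def go_back(x0, t, alphas, rng):
    """Forward jump: sample x_t ~ q(x_t | x_0) directly from the clean sample."""
    ab = alpha_bar(alphas, t)
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x0]

def ddim_step(xt, t, alphas, eps_model):
    """One deterministic DDIM step (eta = 0) from level t to level t - 1."""
    ab_t, ab_prev = alpha_bar(alphas, t), alpha_bar(alphas, t - 1)
    eps = eps_model(xt, t)
    x0_hat = [(v - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t) for v, e in zip(xt, eps)]
    return [math.sqrt(ab_prev) * h + math.sqrt(1.0 - ab_prev) * e for h, e in zip(x0_hat, eps)]

# Toy setup (assumption): constant schedule and an oracle that knows the true x0.
alphas = [0.9] * 10
x0 = [1.0, -2.0]

def oracle_eps(xt, t):
    """Stand-in for the learned noise predictor; exact because it sees x0."""
    ab = alpha_bar(alphas, t)
    return [(v - math.sqrt(ab) * c) / math.sqrt(1.0 - ab) for v, c in zip(xt, x0)]

rng = random.Random(0)
xt = go_back(x0, 10, alphas, rng)
for t in range(10, 0, -1):
    xt = ddim_step(xt, t, alphas, oracle_eps)
recovered = xt
```

With a perfect noise predictor the reverse chain returns to the original sample, which is why the go-back level controls how far a learned denoiser is allowed to move a candidate.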

The process continues until the adaptive stopping criterion is triggered.
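The cycle structure can be illustrated on a toy one-dimensional problem. This is a sketch under strong assumptions, not the paper's implementation: `toy_denoise` (rounding to the nearest integer) stands in for the reverse diffusion chain, the pool holds noise scales rather than timesteps, and keeping the survivors in the merged population is an added choice that makes the best reward non-decreasing.

```python
import random

def reward(x):
    """Black-box verifier: higher is better, maximum 0 at x == 7."""
    return -abs(x - 7)

def toy_denoise(x_noisy):
    """Hypothetical stand-in for the reverse chain: snap onto the integers."""
    return round(x_noisy)

def cds_cycle(population, pool, K, J, rng):
    """One cycle: select top-K, copy J times, noise to every level, denoise, merge."""
    survivors = sorted(population, key=reward, reverse=True)[:K]
    children = []
    for x0 in survivors:
        for _ in range(J):
            for sigma in pool:  # every replica visits every go-back level
                x_t = x0 + sigma * rng.gauss(0.0, 1.0)
                children.append(toy_denoise(x_t))
    # Assumption: survivors are kept so the best candidate is never lost.
    return survivors + children

rng = random.Random(0)
pool = [0.5, 2.0, 6.0]  # small = local refinement, large = global exploration
population = [toy_denoise(rng.gauss(0.0, 3.0)) for _ in range(12)]
best_per_cycle = []
for _ in range(10):
    population = cds_cycle(population, pool, K=2, J=2, rng=rng)
    best_per_cycle.append(max(reward(x) for x in population))
```

Each cycle produces K * J * len(pool) new candidates, so the population size stays fixed after the first cycle while the search keeps mixing local and global moves.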

3. Automatic Exploration–Exploitation Balancing

ABCD provides a mechanism for adjusting exploration depth via the temperature pool, denoted as Automatic Exploration–Exploitation Balancing (AEEB). Instead of a single fixed go-back level, the process distributes each selected survivor across all $M$ temperature levels in $\mathcal{T}$ during each cycle. For survivors $\{x_0^{(k)}\}_{k=1}^{K}$:

$x_{t_j}^{(k)} \sim q(x_{t_j} \mid x_0^{(k)}) \quad \text{for every } t_j \in \mathcal{T}$

No explicit regularizer is applied; underperforming go-back levels are naturally filtered out by selection in subsequent cycles. The hyperparameters controlling this trade-off are $M$ (pool size), $J$ (replicas per anchor), and $K$ (survivors per cycle). This structure inherently supports both global exploration (large $t_j$) and fine local refinement (small $t_j$), adaptively tailored per instance.
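The bookkeeping behind this distribution step can be sketched as a job list with one entry per noised particle. The function name `expand_across_pool` and the concrete pool values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

def expand_across_pool(num_survivors, pool, replicas):
    """Return one (survivor_index, replica_index, go_back_level) job per noised particle."""
    return list(product(range(num_survivors), range(replicas), pool))

pool = [10, 50, 200, 1000]  # hypothetical go-back levels (timesteps)
jobs = expand_across_pool(num_survivors=3, pool=pool, replicas=2)

# Every level receives the same number of candidates before selection acts.
per_level = Counter(level for _, _, level in jobs)
```

Because every level starts each cycle with an equal share, any imbalance seen among the selected candidates afterwards is produced by the verifier alone, which matches the statement that no explicit regularizer is needed.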

4. Adaptive Thinking Time: Cycle Termination and Instance Difficulty

Adaptive Thinking Time (ATT) governs termination by monitoring which go-back levels produce the current top-$K$ solutions across cycles. For the multiset of go-back indices that produced the present top-$K$ at cycle $c$:

  • Define a per-cycle statistic of this multiset that indicates whether the top-$K$ solutions come from exploitative (small go-back) levels.
  • Inference terminates at cycle $c$ if this statistic signals exploitation for all of the most recent cycles covered by a given persistence $\kappa$.

This allows cycles to persist longer on more challenging instances, as reflected by the observed history of this statistic, and terminate rapidly for easier ones.

The hyperparameters are the persistence $\kappa$ (controls duration before termination upon repeated exploitation) and a maximum cycle count (enforces an upper limit on the number of cycles).
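One way such a stopping rule could look is sketched below. The exact statistic is an assumption: here a cycle counts as "exploiting" only when every top candidate came from the lowest go-back level, and the names `should_stop`, `persistence`, and `max_cycles` are illustrative rather than taken from the paper.

```python
def should_stop(level_history, lowest_level, persistence, max_cycles):
    """Decide termination from the go-back levels behind each cycle's top candidates.

    level_history[c] lists the go-back level that produced each top candidate
    at cycle c (assumed bookkeeping, recorded by the search loop).
    """
    if len(level_history) >= max_cycles:
        return True  # hard cap on the number of cycles
    if len(level_history) < persistence:
        return False  # not enough history to judge persistence
    recent = level_history[-persistence:]
    # Stop only if every recent cycle was pure exploitation.
    return all(all(lv == lowest_level for lv in levels) for levels in recent)

easy = [[10, 10], [10, 10], [10, 10]]   # top candidates always from the lowest level
hard = [[200, 10], [10, 10], [10, 10]]  # an exploratory level still contributed recently

stop_easy = should_stop(easy, lowest_level=10, persistence=3, max_cycles=25)
stop_hard = should_stop(hard, lowest_level=10, persistence=3, max_cycles=25)
stop_hard_short = should_stop(hard, lowest_level=10, persistence=2, max_cycles=25)
stop_capped = should_stop(hard, lowest_level=10, persistence=3, max_cycles=3)
```

A larger persistence value delays termination and spends more compute on instances where exploratory levels keep producing winners.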

5. Computational Complexity and Convergence Guarantees

Pseudocode formalizes the ABCD algorithm with per-cycle computational cost:

$O(K \cdot J \cdot M \cdot S)$

where $S$ is the number of DDIM steps, and choosing $K \cdot J \cdot M = N$ attains overall per-cycle cost $O(N \cdot S)$. The algorithm is guaranteed to terminate in finite steps with probability 1, provided the reward is bounded above by $r^*$, the denoising step has full support, and selection is monotonic:

  • Theorem 1 (Finite-time termination): Using ATT, termination occurs at a finite cycle $c$ almost surely, with the best reward in the final top-$K$ converging to $r^*$.
  • Proof outline: Each cycle has constant nonzero probability $p > 0$ of hitting the global maximum reward $r^*$. Once $r^*$ appears in the top-$K$, subsequent cycles quickly enter persistent exploitation, triggering ATT's stopping criterion.
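The first step of the proof outline can be made concrete with a standard geometric bound. Writing $p$ for a lower bound on the per-cycle probability of reaching the maximum reward $r^*$, and assuming this bound holds regardless of what happened in earlier cycles, the chance of never reaching $r^*$ decays exponentially in the number of cycles:

```latex
\Pr\bigl[\text{no cycle among the first } c \text{ reaches } r^*\bigr]
  \;\le\; (1 - p)^{c} \;\xrightarrow[c \to \infty]{}\; 0,
\qquad
\mathbb{E}\bigl[\text{index of the first cycle reaching } r^*\bigr]
  \;\le\; \sum_{c=0}^{\infty} (1 - p)^{c} \;=\; \frac{1}{p}.
```

The bound only covers the time until the maximum is first reached; the stopping rule then adds the cycles needed to satisfy the persistence condition.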

6. Empirical Evaluation and Ablation Studies

ABCD has been benchmarked on six diverse tasks against multiple baselines: Base Diffusion, Best-of-N (BoN), Diffusion Beam Search (BS), Sequential Monte Carlo (SMC), and Search-over-Paths (SoP).

| Task | Key Metric(s) | Core Results |
|---|---|---|
| Mixture of Gaussians | success rate | Only ABCD attains 100% success with adaptive go-backs within 25 cycles |
| Sudoku | accuracy, time | ABCD achieves 100% accuracy; SoP caps at ~95.5%; superior time-accuracy trade-off |
| Pixel Maze | success rate | Near-100% success for sizes up to 15 with far less time than baselines |
| Molecule Generation | stability, time | Peaks at ~0.99 stability in 20s; SoP only reaches ~0.94 in much longer time |
| OGBench PointMaze | success rate | Only ABCD achieves 100% on giant; greater speed for all environments |
| Text-to-Image (Stable Diffusion) | compressibility, aesthetic, pref. | Target levels of the baselines achieved in ~1/4 of the time; fewer seconds needed for a matched human preference score (e.g. 81s vs. ~300s for SoP) |

Ablation analysis reveals:

  • Temperature pool yields higher performance than any fixed go-back.
  • Adaptive cycle endpoint (ATT) improves trade-off over fixed cycle counts.
  • Per-instance cycle count/time correlates with instance difficulty.
  • Sample diversity is preserved even as convergence to high-reward regions occurs (assessed via CLIP cosine similarity).
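A diversity check of the kind mentioned in the last bullet can be sketched as the mean pairwise cosine similarity over sample embeddings, where lower values indicate a more diverse set. The short vectors below are hypothetical stand-ins for CLIP image features, and `mean_pairwise_cosine` is an illustrative helper, not the paper's evaluation code.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered pairs (needs >= 2 vectors)."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical embeddings: one collapsed set and one fully spread-out set.
collapsed = [[1.0, 2.0], [2.0, 4.0], [0.5, 1.0]]
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

sim_collapsed = mean_pairwise_cosine(collapsed)
sim_diverse = mean_pairwise_cosine(diverse)
```

Comparing this number before and after search gives a rough signal of whether reward maximization has collapsed the population onto near-duplicates.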

7. Relation to Prior Methods and Applicability

Uni-directional approaches—Base, BoN, SMC, BS—lack adaptability, leading to wasted computation on easy cases or under-exploration on difficult ones. SoP introduces backward steps but is restricted to a fixed schedule. ABCD generalizes these approaches by:

  • Allowing bi-directional cycling and flexible exploration at every cycle.
  • Distributing computation both globally and locally via the temperature pool.
  • Providing a principled, instance-wise stopping criterion (ATT) with convergence guarantees.

ABCD is especially well-suited for tasks with heterogeneous instance difficulty—including reasoning/planning (e.g., maze navigation, Sudoku), generative tasks that require focused computation (e.g., stable molecule synthesis, high-fidelity image generation), or scenarios with nondifferentiable rewards. As it relies only on a black-box verifier, ABCD supports a broad range of problems beyond differentiable or likelihood-based domains.
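To illustrate what a black-box, nondifferentiable verifier can look like, the sketch below scores a Sudoku grid by the fraction of rows, columns, and boxes that contain each digit exactly once. The 4x4 grid and the function name `sudoku_reward` are assumptions made for brevity; the scoring used in the paper's experiments may differ.

```python
def sudoku_reward(grid, box=2):
    """Fraction of satisfied row, column, and box constraints (1.0 means solved)."""
    n = box * box
    target = set(range(1, n + 1))
    groups = [list(row) for row in grid]                                 # rows
    groups += [[grid[r][c] for r in range(n)] for c in range(n)]         # columns
    for br in range(0, n, box):                                          # boxes
        for bc in range(0, n, box):
            groups.append([grid[r][c]
                           for r in range(br, br + box)
                           for c in range(bc, bc + box)])
    return sum(set(g) == target for g in groups) / len(groups)

solved = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
# Corrupt one cell: this breaks one row, one column, and one box.
broken = [row[:] for row in solved]
broken[0][0] = 2

score_solved = sudoku_reward(solved)
score_broken = sudoku_reward(broken)
```

A search procedure only needs to call such a function on finished candidates, so no gradient or likelihood of the reward is required.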

In summary, ABCD advances inference-time scaling in diffusion-based models by treating sampling as an adaptive search process, offering flexible resource allocation, and furnishing practical guarantees across a variety of domains (Lee et al., 20 May 2025).
