Adaptive Bi-directional Cyclic Diffusion (ABCD)
- The paper introduces ABCD as an adaptive inference framework that iteratively refines a population of candidate samples through bi-directional diffusion cycles.
- It employs a temperature pool to dynamically balance exploration and exploitation, allocating compute based on instance difficulty and applying an early stopping criterion.
- Empirical evaluations demonstrate ABCD's effectiveness in tasks like Sudoku and molecule generation, achieving high success rates and improved time-accuracy trade-offs.
Adaptive Bi-directional Cyclic Diffusion (ABCD) is an adaptive, search-based inference framework for diffusion models. It dynamically scales computational effort during sampling, which allows instance-specific allocation of compute and principled early stopping. Conventional denoising is uni-directional and follows a fixed schedule. ABCD instead iteratively refines a population of candidate samples through bi-directional diffusion cycles, and it automatically tunes both the depth of exploration and the point of termination to maximize a task-specific reward or verifier. ABCD has three main components: Cyclic Diffusion Search, Automatic Exploration-Exploitation Balancing, and Adaptive Thinking Time. It has proven effective across generative and reasoning tasks by concentrating compute where it is needed, which improves time-accuracy trade-offs (Lee et al., 20 May 2025).
1. Formal Definition and Architecture
Let $p_\theta$ denote a pretrained diffusion model characterized by a forward noising kernel $q(x_t \mid x_0)$ and a reverse denoiser $p_\theta(x_{t-1} \mid x_t)$. Given a task-specific reward function $r(x_0)$ and a population of $N$ candidate samples ("particles"), standard sampling runs a uni-directional reverse chain $x_T \to x_{T-1} \to \dots \to x_0$ to produce one output. ABCD reframes this as a search process that:
- Iteratively seeks to maximize $r$ by refining particle populations.
- Allocates increased computation adaptively to more difficult instances.
- Dynamically terminates inference when further improvement is unlikely.
The core algorithm introduces three elements: (i) Bi-directional cycling through the diffusion timeline (combining both forward "go-back" noising and reverse denoising steps). (ii) Multi-level allocation of particles at different stages of the diffusion process ("temperature pool"). (iii) Adaptive termination dictated by the dynamic progress of top candidates.
2. Cyclic Diffusion Search: Mechanics and Mathematical Formulation
Cyclic Diffusion Search (CDS) is the operational foundation of ABCD. It proceeds as follows:
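- Define the "temperature pool" $\mathcal{T} \subseteq \{1, \dots, T\}$, which specifies the set of possible "go-back" levels for each cycle.
- Maintain a population of particles $\{x_0^{(i)}\}_{i=1}^{N}$ at each cycle $k$. Each particle is associated with a verifier score $r(x_0^{(i)})$.
- Each cycle consists of:
  - Fast Denoising: Initialize by applying $x_0^{(i)} = \mathrm{DDIM}_{T \to 0}(x_T^{(i)})$, with $x_T^{(i)} \sim \mathcal{N}(0, I)$, for all $i \in \{1, \dots, N\}$.
  - Selection-and-Copy: Select the indices $\mathcal{S}$ of the top-$K$ particles by $r$, then replicate each selected particle $M$ times ($K \cdot M$ replicas in total).
  - Noising (Forward): For each replica and each $t \in \mathcal{T}$, sample $x_t \sim q(x_t \mid x_0)$.
  - Denoising (Reverse): Run $\mathrm{DDIM}_{t \to 0}$ on all noised particles to produce new samples $x_0$.
  - Population Update: Merge all $K \cdot M \cdot |\mathcal{T}|$ outputs, evaluate $r$, and proceed to the next cycle.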
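Forward (noising) and reverse updates use DDIM-like transitions. Here $\bar\alpha_t$ is the cumulative noise schedule and $\epsilon_\theta$ is the model's noise prediction:
$$x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$
$$x_{t'} = \sqrt{\bar\alpha_{t'}}\,\hat{x}_0(x_t) + \sqrt{1-\bar\alpha_{t'}}\,\epsilon_\theta(x_t, t), \qquad \hat{x}_0(x_t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}}, \qquad t' < t$$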
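The two transitions can be sketched in NumPy as below. This is a minimal illustration and not the paper's implementation. `forward_noise` and `ddim_step` are hypothetical helper names, and `abar_t` denotes the cumulative noise-schedule value at level t. The noise prediction is taken from a toy data distribution (a single point `mu`) whose optimal predictor has a closed form, standing in for a trained network.

```python
import numpy as np

def forward_noise(x0, abar_t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form: sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(np.shape(x0))
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

def ddim_step(x_t, abar_t, abar_next, eps_pred):
    """Deterministic DDIM update (eta = 0) to a less noisy level (abar_next > abar_t)."""
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)  # predicted clean sample
    return np.sqrt(abar_next) * x0_hat + np.sqrt(1.0 - abar_next) * eps_pred

# Toy check: if the data is a single point mu, the exact noise prediction is known in closed form.
rng = np.random.default_rng(0)
mu = np.array([1.5, -2.0])
abar_t = 0.3                                      # arbitrary go-back noise level for illustration
x_t = forward_noise(mu, abar_t, rng)
eps_pred = (x_t - np.sqrt(abar_t) * mu) / np.sqrt(1.0 - abar_t)  # stands in for eps_theta(x_t, t)
x0 = ddim_step(x_t, abar_t, 1.0, eps_pred)        # abar = 1 corresponds to t = 0 (no noise)
```

Because the toy predictor is exact, one deterministic jump from the go-back level to t = 0 recovers `mu`. With a learned predictor, the forward noise is random, so the re-denoised sample generally differs from the one that was noised.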
The process continues until the adaptive stopping criterion is triggered.
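The cycle described above can be sketched as a single Python function. This toy sketch rests on stated assumptions. `cds_cycle` and its arguments are hypothetical names, and the "samples" are scalars. The noise function just adds Gaussian noise scaled by the go-back level, and the "denoiser" is the identity. As a result, the example exercises only the selection, copy, go-back, and merge logic, not a real diffusion model. The function also returns the go-back level behind each new candidate, which a stopping rule can use later.

```python
import numpy as np

def cds_cycle(population, reward_fn, noise_fn, denoise_fn, pool, K, M, rng):
    """One Cyclic Diffusion Search cycle: keep the top-K, copy each M times,
    go back to every level in the pool, and denoise each copy back to t = 0."""
    scores = np.array([reward_fn(x) for x in population])
    survivors = np.argsort(scores)[::-1][:K]               # indices of the K best particles
    new_population, levels = [], []
    for i in survivors:
        for _ in range(M):                                  # M replicas per survivor
            for t in pool:                                  # every go-back level in the pool
                x_t = noise_fn(population[i], t, rng)       # forward "go-back" noising
                new_population.append(denoise_fn(x_t, t))   # reverse pass back to t = 0
                levels.append(t)
    return new_population, levels

# Hypothetical toy stand-ins: 1-D samples, noise scale equals the level, identity denoiser.
def reward(x):
    return -abs(x - 3.0)

def noise(x, t, rng):
    return x + t * rng.standard_normal()

def denoise(x_t, t):
    return x_t

rng = np.random.default_rng(0)
population = list(rng.standard_normal(12))  # stands in for the fast-denoised initial population
for _ in range(30):
    population, levels = cds_cycle(population, reward, noise, denoise,
                                   pool=[0.05, 0.3, 1.0], K=2, M=2, rng=rng)
best = max(population, key=reward)
```

With K = 2, M = 2, and three levels, every cycle yields 2 x 2 x 3 = 12 candidates. The population size therefore stays constant across cycles.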
3. Automatic Exploration–Exploitation Balancing
ABCD adjusts exploration depth through the temperature pool, a mechanism called Automatic Exploration–Exploitation Balancing (AEEB). A conventional approach would fix a single go-back level $t$. In each cycle, AEEB instead distributes every selected survivor across all $|\mathcal{T}|$ temperature levels in $\mathcal{T}$. For survivors $i \in \mathcal{S}$:
$$x_t^{(i,m)} \sim q(x_t \mid x_0^{(i)}), \qquad \forall\, t \in \mathcal{T},\; m = 1, \dots, M$$
No explicit regularizer is applied. Selection in subsequent cycles naturally filters out go-back levels that underperform. The hyperparameters controlling this trade-off are $|\mathcal{T}|$ (pool size), $M$ (replicas per anchor), and $K$ (survivors per cycle). This structure inherently supports both global exploration (large $t$) and fine local refinement (small $t$), adaptively tailored per instance.
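Below is a minimal sketch of the allocation and the implicit filtering. It assumes scalar rewards and uses hypothetical helper names (`allocate_go_backs`, `surviving_levels`). Every survivor is assigned to every level in the pool. The multiset of levels behind the next top-K then shows which go-back depths selection actually kept. No level is penalized explicitly. In the example, the deep level (1.0) simply fails to place any candidate in the top 3.

```python
from collections import Counter

def allocate_go_backs(survivor_ids, temperature_pool, M):
    """Spread every survivor over all go-back levels, M replicas per (survivor, level) pair."""
    return [(i, t) for i in survivor_ids for t in temperature_pool for _ in range(M)]

def surviving_levels(scores, levels, K):
    """Multiset of go-back levels that produced the K highest-scoring candidates."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return Counter(levels[j] for j in order[:K])

slots = allocate_go_backs([7, 3], [0.05, 0.3, 1.0], M=2)
print(len(slots))  # 12  (2 survivors x 3 levels x 2 replicas)

# Rewards of the re-denoised candidates and the go-back level each one came from.
scores = [-0.10, -2.0, -0.05, -2.8, -0.5, -0.05]
levels = [0.05, 1.0, 0.05, 1.0, 0.3, 0.05]
print(surviving_levels(scores, levels, K=3))  # Counter({0.05: 3})
```

On a harder instance, the deep levels would be the ones that win places in the top-K. That is how the same pool can shift between exploration and exploitation without any extra tuning.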
4. Adaptive Thinking Time: Cycle Termination and Instance Difficulty
Adaptive Thinking Time (ATT) governs termination by tracking which go-back levels produce the current top-$K$ solutions across cycles. Let $\mathcal{G}_k$ be the multiset of go-back levels that produced the top-$K$ particles at cycle $k$:
- Define the exploitation indicator $e_k = \mathbb{1}\big[\,g = t_{\min}\ \ \forall g \in \mathcal{G}_k\,\big]$, where $t_{\min} = \min \mathcal{T}$ is the smallest go-back level.
- Inference terminates at cycle $k$ if $e_j = 1$ for all $j \in \{k-\kappa+1, \dots, k\}$, for a given persistence $\kappa$.
This lets cycles run longer on more challenging instances, as reflected in the history of $\mathcal{G}_k$, and terminate quickly on easier ones.
The hyperparameters are $\kappa$ (how many consecutive exploitation cycles trigger termination) and $C_{\max}$ (an upper limit on the number of cycles).
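The stopping rule can be sketched as follows. The sketch rests on an assumption: "repeated exploitation" is read as every top-K particle coming from the smallest go-back level in the pool. Each entry in `level_history` is one multiset of go-back levels per completed cycle. `att_should_stop`, `kappa` (persistence), and `c_max` (cycle cap) are illustrative names, not an official API.

```python
from collections import Counter

def att_should_stop(level_history, pool, kappa, c_max):
    """Assumed ATT rule: stop once the top-K came only from the smallest go-back
    level for kappa consecutive cycles, or once c_max cycles have run."""
    if len(level_history) >= c_max:        # hard cap on the number of cycles
        return True
    if len(level_history) < kappa:         # not enough history yet
        return False
    t_min = min(pool)
    # Exploitation signal: every top-K particle in each of the last kappa cycles came from t_min.
    return all(set(g) == {t_min} for g in level_history[-kappa:])

pool = [0.05, 0.3, 1.0]
history = [Counter({1.0: 2, 0.05: 1}),     # cycle 1: deep go-backs still winning (exploration)
           Counter({0.05: 3}),             # cycle 2: only the shallowest level survives
           Counter({0.05: 3})]             # cycle 3: same again
print(att_should_stop(history, pool, kappa=2, c_max=50))  # True
print(att_should_stop(history, pool, kappa=3, c_max=50))  # False
```

A larger persistence makes the rule more conservative. It only stops after the search has stayed in exploitation longer, at the cost of extra cycles on easy instances.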
5. Computational Complexity and Convergence Guarantees
Pseudocode formalizes the ABCD algorithm with per-cycle computational cost:
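$$\text{Cost}_{\text{cycle}} = \mathcal{O}\big(K \cdot M \cdot |\mathcal{T}| \cdot S\big)$$
Here $S$ is the number of DDIM steps, and choosing $K \cdot M \cdot |\mathcal{T}| = N$ gives an overall per-cycle cost of $\mathcal{O}(N S)$. The algorithm is guaranteed to terminate in finitely many cycles with probability 1, provided that the reward is bounded above by $r^*$, the denoising step has full support, and selection is monotonic: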
- Theorem 1 (Finite-time termination): Under ATT, termination occurs at a finite cycle $k$ almost surely, with the best reward in the final top-$K$ converging to $r^*$.
- Proof outline: Each cycle has a constant nonzero probability $p$ of reaching the global maximum reward $r^*$. Once $r^*$ appears in the top-$K$, subsequent cycles quickly enter persistent exploitation, which triggers ATT's stopping criterion.
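The proof outline is essentially a geometric waiting-time argument, and a small simulation makes it concrete. This toy model simplifies the theorem rather than restating it. It assumes each cycle independently reaches the maximum reward with a fixed probability `p_hit`, and that ATT then stops after exactly `kappa` further cycles. The function name is hypothetical. Under these assumptions, the expected number of cycles is 1/p_hit + kappa.

```python
import numpy as np

def cycles_until_stop(p_hit, kappa, rng, c_max=10_000):
    """Toy model: each cycle finds the maximum reward with probability p_hit; once found,
    it stays in the top-K and the stopping rule fires after kappa more cycles."""
    for k in range(1, c_max + 1):
        if rng.random() < p_hit:           # first cycle that reaches r*
            return k + kappa               # then kappa cycles of persistent exploitation
    return c_max                           # cap reached (vanishingly unlikely here)

rng = np.random.default_rng(0)
runs = [cycles_until_stop(0.2, 3, rng) for _ in range(20_000)]
print(np.mean(runs))  # should be near 1 / 0.2 + 3 = 8
```

Every run terminates because the waiting time to the first success is geometric. It is finite with probability 1 whenever `p_hit` > 0, which is the role of the full-support assumption in the theorem.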
6. Empirical Evaluation and Ablation Studies
ABCD has been benchmarked across six diverse tasks using multiple baselines: Base Diffusion, Best-of-N (BoN), Diffusion Beam Search (BS), Sequential Monte Carlo (SMC), and Search-over-Paths (SoP).
| Task | Key Metric(s) | Core Results |
|---|---|---|
| Mixture of Gaussians | success rate | Only ABCD attains 100% success with adaptive go-backs within 25 cycles |
| Sudoku | accuracy, time | ABCD achieves 100% accuracy; SoP caps at ~95.5%; superior time-accuracy trade-off |
| Pixel Maze | success rate | Near-100% success for maze sizes up to 15, in far less time than baselines |
| Molecule Generation | stability, time | Peaks at ~0.99 stability in 20 s; SoP reaches only ~0.94 and takes much longer |
| OGBench PointMaze | success rate | Only method to reach 100% on the giant maze; faster in all environments |
| Text-to-Image (Stable Diffusion) | compressibility, aesthetic, human preference | Reaches baseline target levels in ≤1/4 of the time; needs far fewer seconds to match human preference score (e.g., 81 s vs. 3300 s for SoP) |
Ablation analysis reveals:
- Temperature pool yields higher performance than any fixed go-back.
- Adaptive cycle endpoint (ATT) improves trade-off over fixed cycle counts.
- Per-instance cycle count/time correlates with instance difficulty.
- Sample diversity is preserved even as samples converge to high-reward regions (assessed via CLIP cosine similarity).
7. Relation to Prior Methods and Applicability
Uni-directional approaches (Base, BoN, SMC, BS) lack adaptability, so they waste computation on easy cases or under-explore difficult ones. SoP introduces backward steps but is restricted to a fixed schedule. ABCD generalizes these approaches by:
- Allowing bi-directional cycling and flexible exploration at every cycle.
- Distributing computation both globally and locally via the temperature pool.
- Providing a principled, instance-wise stopping criterion (ATT) with convergence guarantees.
ABCD is especially well-suited for tasks with heterogeneous instance difficulty—including reasoning/planning (e.g., maze navigation, Sudoku), generative tasks that require focused computation (e.g., stable molecule synthesis, high-fidelity image generation), or scenarios with nondifferentiable rewards. As it relies only on a black-box verifier, ABCD supports a broad range of problems beyond differentiable or likelihood-based domains.
In summary, ABCD advances inference-time scaling in diffusion-based models by treating sampling as an adaptive search process, offering flexible resource allocation, and furnishing practical guarantees across a variety of domains (Lee et al., 20 May 2025).