Adjacency-Adaptive Dynamical Draft Trees
- Adjacency-Adaptive Dynamical Draft Trees (ADT-Tree) are adaptive parallel decoding protocols that adjust tree depth and width based on local spatial token difficulty in visual autoregressive models.
- The method leverages adjacent token statistics and dynamic adaptation to optimize inference, achieving speedups of up to 3.13× on benchmarks like MS-COCO and PartiPrompts.
- Empirical evaluations demonstrate that ADT-Tree maintains high image quality while reducing computational steps, making it a promising approach for efficient visual model decoding.
Adjacency-Adaptive Dynamical Draft Trees (ADT-Tree) are an adaptive parallel decoding protocol for visual autoregressive (AR) models, designed to mitigate sequential inference bottlenecks rooted in spatially heterogeneous token difficulty. ADT-Tree dynamically modifies the depth and width parameters of draft trees in response to the empirical acceptance rate of previously decoded, spatially adjacent tokens. The parallelization and adaptation strategies employed enable substantial acceleration—for instance, achieving 3.13× speedup on MS-COCO 2017 and 3.05× on PartiPrompts—with no quantifiable loss in image quality (Lei et al., 26 Dec 2025).
1. Motivation and Context
AR image models (e.g., EMU3, Anole, Lumina-mGPT) deliver competitive image quality but are impaired by tokenwise sequential generation, incurring ≈2,000 decoding steps per image. Text-domain speculative decoding protocols, such as "draft-then-verify," attain 2–4× acceleration in LLMs due to high acceptance rates (≈70%). However, applying static draft-tree approaches (e.g., EAGLE-2) to visual AR models leads to inconsistent and low acceptance rates (often under 50%), attributed to spatially variable prediction difficulty across image regions. This phenomenon manifests as dramatic heterogeneity in acceptance length ($\tau$), impeding acceleration when tree parameters are fixed. ADT-Tree resolves these issues by leveraging adjacent token states and past acceptance statistics to dynamically adapt tree structure during inference.
2. Algorithmic Architecture
At each pixel index $t$, ADT-Tree executes a five-step workflow:
- Adjacency-based Initialization: Initializes depth and width by horizontally repeating the parameters used for the previous token in the same row: $d_t \leftarrow d_{t-1}$, $w_t \leftarrow w_{t-1}$.
- Draft Tree Construction: Builds a draft tree of depth $d_t$ and width $w_t$ under a draft model $q$.
- Acceptance Evaluation: Computes the acceptance rate $\alpha_t$ by verifying the draft-tree predictions via the heavy target model $p$.
- Bisectional Dynamic Adaptation: For the next inference position, applies a clipped update based on $\alpha_t$. If $\alpha_t \ge \theta$, increment depth and decrement width; otherwise, decrement depth and increment width. Specifically, $d_{t+1} = \mathrm{clip}(d_t + \operatorname{sign}(\alpha_t - \theta),\, d_{\min},\, d_{\max})$ and $w_{t+1} = \mathrm{clip}(w_t - \operatorname{sign}(\alpha_t - \theta),\, w_{\min},\, w_{\max})$.
- Token Emission: Emits the accepted tokens from the verified tree, updates state, and advances to the next position.
The full pseudocode is specified in the source (Lei et al., 26 Dec 2025).
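For concreteness, a minimal Python sketch of this loop follows. It is not the authors' reference implementation: `build_draft_tree` and `verify` are hypothetical helpers standing in for draft-tree construction under $q$ and tree verification under $p$, and the threshold $\theta = 0.5$ and clip bounds are illustrative assumptions rather than reported settings.

```python
# Minimal sketch of the ADT-Tree decoding loop (hypothetical helper API).
# verify() is assumed to always return at least one token (the target
# model's correction), as in standard speculative decoding.
def adt_tree_decode(prompt_tokens, target_model, draft_model,
                    build_draft_tree, verify,
                    n_tokens, d0=4, w0=4, theta=0.5,
                    d_min=1, d_max=8, w_min=1, w_max=8):
    tokens = list(prompt_tokens)
    depth, width = d0, w0  # step 1: adjacency init -- (depth, width)
                           # carry over from the previous token in the row
    while len(tokens) < n_tokens:
        # Step 2: build a draft tree of the current depth/width under q.
        tree = build_draft_tree(draft_model, tokens, depth, width)
        # Step 3: verify drafted branches with the heavy target model p,
        # obtaining the accepted prefix and the empirical acceptance rate.
        accepted, alpha = verify(target_model, tokens, tree)
        # Step 4: bisectional adaptation with clipping -- high acceptance
        # deepens and narrows the tree; low acceptance does the opposite.
        if alpha >= theta:
            depth, width = min(depth + 1, d_max), max(width - 1, w_min)
        else:
            depth, width = max(depth - 1, d_min), min(width + 1, w_max)
        # Step 5: emit accepted tokens and advance.
        tokens.extend(accepted)
    return tokens[:n_tokens]
```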
3. Mathematical Formulation
For a region $R$ (typically an individual token position), let the empirical acceptance rate over $N$ verification attempts be

$$\alpha_R = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[x_i^{\mathrm{tgt}} = x_i^{\mathrm{drf}}\right],$$

where $x_i^{\mathrm{tgt}}$ is the target token and $x_i^{\mathrm{drf}}$ is the draft token. The depth and width parameters update by

$$d_{t+1} = \mathrm{clip}\!\left(d_t + \Delta_d(\alpha_t),\, d_{\min},\, d_{\max}\right), \qquad w_{t+1} = \mathrm{clip}\!\left(w_t + \Delta_w(\alpha_t),\, w_{\min},\, w_{\max}\right),$$

with $\Delta_d$ and $\Delta_w$ piecewise constant functions determined by threshold comparison ($\alpha_t \ge \theta$ or $\alpha_t < \theta$).
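A direct transcription of this update in Python, with unit step sizes and an assumed threshold for illustration:

```python
# Sketch of the clipped piecewise-constant update; the unit steps and the
# theta=0.5 default are assumptions, not reported values.
def adapt(depth, width, alpha, theta=0.5,
          d_min=1, d_max=8, w_min=1, w_max=8):
    step = 1 if alpha >= theta else -1            # sign(alpha - theta)
    depth = min(max(depth + step, d_min), d_max)  # clip to [d_min, d_max]
    width = min(max(width - step, w_min), w_max)  # clip to [w_min, w_max]
    return depth, width

# High acceptance deepens and narrows: adapt(4, 4, alpha=0.8) -> (5, 3)
# Low acceptance shallows and widens:  adapt(4, 4, alpha=0.2) -> (3, 5)
```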
4. Implementation Details
4.1 Tree Representation
- Draft trees are encoded as lists of depth-indexed layers, each holding up to $w$ nodes.
- Each node maintains a partial path confidence $c(v)$, computed under the draft model.
- Children for all current-layer nodes are constructed in parallel, followed by top-$w$ selection via branchwise confidence sorting (see the sketch below).
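The following sketch illustrates this layer-wise construction under an assumed `draft_topk` interface (a hypothetical draft-model call returning `(token, probability)` candidates for a token path); the batched parallel expansion is replaced by a sequential loop for clarity.

```python
import heapq

# Sketch of layer-wise draft-tree expansion with top-w pruning by partial
# path confidence. draft_topk(path, k) is a hypothetical helper.
def build_draft_tree(draft_topk, prefix, depth, width, branch=4):
    layer = [(1.0, [])]      # root: confidence 1.0, empty drafted path
    layers = [layer]         # depth-indexed list of layers
    for _ in range(depth):
        candidates = []
        for conf, path in layer:
            # Children of all current-layer nodes are proposed in one
            # batched forward pass in the real system; this loop is a
            # sequential stand-in.
            for tok, p in draft_topk(prefix + path, k=branch):
                # c(child) = c(parent) * q(tok): partial path confidence
                candidates.append((conf * p, path + [tok]))
        # Top-w selection via branchwise confidence sorting.
        layer = heapq.nlargest(width, candidates, key=lambda c: c[0])
        layers.append(layer)
    return layers
```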
4.2 Integration with Relaxed Sampling
ADT-Tree is agnostic to the verification criterion. With LANTERN (relaxed speculative decoding), the protocol replaces the strict ratio test with a slackened threshold. Practically,
$$r_{t+j} = \min\!\left(1,\ \frac{p(\hat{s} \mid \cdot)}{q(\hat{s} \mid \cdot)}\right), \qquad \text{accept if } r_{t+j} \ge \delta, \ \text{else stop.}$$
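In code, the relaxed test along a single drafted branch might look as follows; this is a sketch in which the probability containers and the `delta` default are assumptions.

```python
# Sketch of LANTERN-style relaxed verification along one drafted branch.
# p_probs/q_probs hold per-token probabilities of the drafted tokens under
# the target and draft models; delta is the slack threshold (assumed value).
def relaxed_accept(p_probs, q_probs, delta=0.3):
    accepted = 0
    for p, q in zip(p_probs, q_probs):
        r = min(1.0, p / q)   # speculative ratio r_{t+j}
        if r >= delta:        # deterministic slack test replaces the usual
            accepted += 1     # comparison against a uniform random sample
        else:
            break             # stop at the first rejected position
    return accepted
```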
5. Empirical Evaluation and Comparative Performance
Experiments were conducted on MS-COCO 2017 and PartiPrompts datasets, leveraging Anole-7B and LlamaGen variants, using draft models trained on LAION-COCO. Metrics included:
- Speed-up Ratio (SR) vs. vanilla AR decoding
- Mean acceptance length ($\tau$)
- Mean draft-tree depth ($\bar{d}$)
- Downstream alignment/quality (CLIP-Score, HPSv2, FID, Inception Score, Aesthetic)
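For concreteness, the two throughput metrics can be computed from decoding logs as sketched below; the paper does not spell out exact implementations, so this is only an assumed reading of SR and $\tau$.

```python
# Assumed metric definitions, not the authors' measurement code.
def speedup_ratio(baseline_seconds, method_seconds):
    # SR: wall-clock time of vanilla AR decoding over the accelerated run.
    return baseline_seconds / method_seconds

def mean_acceptance_length(accepted_per_verification):
    # tau: average number of tokens committed per target-model call.
    return sum(accepted_per_verification) / len(accepted_per_verification)
```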
MS-COCO 2017
| Method | SR | $\tau$ | $\bar{d}$ |
|---|---|---|---|
| Anole baseline | 1.00× | 1.00 | 1.00 |
| EAGLE-2 | 1.62× | 2.91 | 5.00 |
| LANTERN | 3.03× | 4.25 | 5.00 |
| ADT-Tree | 2.21× | 3.40 | 3.86 |
| ADT-Tree+LANTERN | 3.13× | 4.86 | 5.15 |
PartiPrompts
| Method | SR | $\tau$ | $\bar{d}$ |
|---|---|---|---|
| ADT-Tree | 2.24× | 2.79 | 3.43 |
| ADT-Tree+LANTERN | 3.05× | 3.97 | 4.31 |
All approaches maintained image quality as measured by CLIP, HPSv2, FID, Inception Score, and Aesthetic, within baseline tolerance.
Ablation and Qualitative Observations
- The “Horizontal Repeat” initialization strategy for tree parameters consistently outperforms “Vertical Repeat” or randomized alternatives.
- Fixed tree parameters degrade speed-up, confirming the necessity of simultaneous depth and width adaptation.
- ADT-Tree autonomously allocates deeper, narrower trees in locally smooth (low complexity) regions and wider, shallower trees in complex (object boundary) areas, reflecting local prediction difficulty.
6. Discussion, Limitations, and Prospects
By dynamically matching draft-tree structure to spatial token difficulty, ADT-Tree reduces wasted computation in simple regions and boosts search capacity in difficult regions, optimizing the mean acceptance length $\tau$ per verification step. The spatial coherence of token difficulty underpins the efficacy of horizontal adjacency-based initialization. Limitations arise if prediction difficulty is spatially uniform; under such circumstances, the gains over static trees diminish.
This suggests future exploration could incorporate local patch variance or learned difficulty predictors to refine adaptation, and extensions to non-AR vision or video generation present further research directions.
ADT-Tree thus constitutes a lightweight module for the adaptive parallelization of visual AR decoding, exploiting spatial dependencies to maximize inference speed without degrading output fidelity (Lei et al., 26 Dec 2025).