RAAG: Ratio-Aware Scheduling in Generative Models

Updated 5 February 2026
  • RAAG is a class of adaptive guidance mechanisms that modulate guidance scales using a ratio (RATIO) metric to stabilize conditional flow-based sampling.
  • It employs an exponential decay rule to dampen excessive early guidance, thereby reducing error amplification and preventing trajectory collapse.
  • Empirical studies demonstrate that RAAG achieves 2×–4× speedups with maintained or enhanced image and video quality metrics.

Ratio-Aware Schedules (RAAG) are a class of adaptive guidance mechanisms designed to optimize sampling in flow-based generative models. Their central feature is stepwise adaptation of the guidance scale, informed by analytically defined sensitivity metrics such as the RATIO, to stabilize conditional generation, especially in the fast, low-step sampling regime. The RAAG scheme, introduced and analyzed by Zhu et al. (2025), both diagnoses and remedies a universal instability arising in conditional flow-based sampling with classifier-free guidance, and yields substantial improvements in sample efficiency and controllability across image and video generation tasks (Zhu et al., 5 Aug 2025).

1. Definition and Derivation of the RATIO Metric

Let $x_t$ denote the latent at time $t$ in a flow-based ODE sampler. The classifier-free guidance (CFG) framework computes two key velocities:

  • Unconditional velocity: $v_u(x_t) = \mathbb{E}[x_1 - x_0 \mid x_t, \varnothing]$,
  • Conditional velocity: $v_c(x_t, c) = \mathbb{E}[x_1 - x_0 \mid x_t, c]$.

Their difference $\delta(x_t, c) = v_c(x_t, c) - v_u(x_t)$ quantifies the conditional signal. The RATIO at step $t$ is:

$$\mathrm{RATIO}_t = \frac{\|\,v_c(x_t, c) - v_u(x_t)\,\|_2^2}{\|\,v_u(x_t)\,\|_2^2}$$

This metric provides a normalized measure of the relative strength of the conditional cue compared to the unconditional noise, and is intrinsic to the data distribution, unaffected by model architecture.
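The definition can be made concrete with a minimal sketch. It assumes the two velocities are already available as flat lists of floats; the helper names `squared_norm` and `ratio`, the `eps` safeguard, and the example vectors are illustrative assumptions, not part of the source.

```python
import math

def squared_norm(v):
    """Squared Euclidean norm of a flat sequence of floats."""
    return math.fsum(x * x for x in v)

def ratio(v_cond, v_uncond, eps=1e-12):
    """RATIO = ||v_c - v_u||^2 / ||v_u||^2 for two velocity vectors.

    eps is an assumed safeguard against a zero unconditional velocity;
    it is not part of the definition in the text.
    """
    delta = [c - u for c, u in zip(v_cond, v_uncond)]
    return squared_norm(delta) / (squared_norm(v_uncond) + eps)

# Hypothetical 3-dimensional velocities, chosen only for illustration.
v_u = [1.0, 2.0, 2.0]   # ||v_u||^2 = 9
v_c = [1.0, 2.0, 5.0]   # delta = (0, 0, 3), so ||delta||^2 = 9
example_ratio = ratio(v_c, v_u)   # conditional signal as strong as v_u itself
```

In this example the conditional difference has the same squared norm as the unconditional velocity, so the RATIO is approximately 1 (up to the `eps` term).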

In the limit $t \to 1$ (the initial reverse step), $v_u(x_1) = x_1 - \mu_u$ and $v_c(x_1, c) = x_1 - \mu_c$, with $\mu_u = \mathbb{E}[x_0]$ and $\mu_c = \mathbb{E}[x_0 \mid c]$. Thus,

$$\mathrm{RATIO}_{t=1}(c) = \frac{\|\mu_c - \mu_u\|_2^2}{\|x_1 - \mu_u\|_2^2}$$

showing the RATIO at the initial step reflects the normalized squared class-conditional shift of data means.
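The intermediate step can be written out explicitly. The only ingredients are the two limiting velocities stated above; the $x_1$ terms cancel in the difference, which is why the numerator depends only on the two means.

```latex
\begin{aligned}
\delta(x_1, c) &= v_c(x_1, c) - v_u(x_1)
               = (x_1 - \mu_c) - (x_1 - \mu_u)
               = \mu_u - \mu_c, \\
\mathrm{RATIO}_{t=1}(c)
  &= \frac{\|\delta(x_1, c)\|_2^2}{\|v_u(x_1)\|_2^2}
   = \frac{\|\mu_c - \mu_u\|_2^2}{\|x_1 - \mu_u\|_2^2}.
\end{aligned}
```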

2. Instability in Early Steps: RATIO Spikes and Exponential Error Amplification

Empirical evaluation reveals that $\mathrm{RATIO}_t$ attains its maximum in the earliest reverse steps and declines rapidly thereafter. The underlying causes are:

  • Dataset-level inevitability: For common datasets, the denominator $\|x_1 - \mu_u\|_2$ and numerator $\|\mu_c - \mu_u\|_2$ yield initial-step RATIO values typically in $[0.5, 1.0]$, regardless of architecture.
  • Exponential error amplification: Using a large, fixed guidance scale ($w > 1$) when RATIO is large makes sampling highly sensitive, so that small perturbations in the initial step are amplified exponentially. If $A(t) = \|x(t) - y(t)\|_2$ denotes the separation of two nearby reverse trajectories, a Grönwall inequality yields $A(t) \gtrsim (A(0) - B/A)\,\exp(A t)$, where the rate constant $A$ (not to be confused with the separation $A(t)$) satisfies $A \propto w \cdot p_{\max}$, with $p_{\max}$ the maximal RATIO. Therefore, excessive guidance at high-RATIO steps causes catastrophic “trajectory collapse” and semantic artifacts (Zhu et al., 5 Aug 2025).
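To get a feel for the exponential factor in the bound above, the following sketch evaluates $\exp(A t)$ for two guidance scales. The text only states that the rate is proportional to $w \cdot p_{\max}$, so the proportionality constant `k = 1` and the horizon `t = 1` are assumptions made purely for illustration; the numbers are not measurements from the paper.

```python
import math

def growth_factor(w, p_max, t=1.0, k=1.0):
    """exp(A * t) with A = k * w * p_max.

    k is a hypothetical proportionality constant (the text only states
    that A is proportional to w * p_max), so results are illustrative.
    """
    return math.exp(k * w * p_max * t)

unguided = growth_factor(w=1.0, p_max=1.0)   # e^1
guided = growth_factor(w=7.0, p_max=1.0)     # e^7
relative = guided / unguided                 # e^6, roughly 403
```

Under these assumptions, raising the guidance scale from 1 to 7 at a RATIO of 1 multiplies the amplification factor by about 400, which is the kind of sensitivity the adaptive schedule is meant to avoid.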

3. Closed-Form Adaptive Scheduling: The RAAG Exponential Decay Rule

To stabilize sampling, RAAG proposes to modulate the guidance scale $w_t$ as an explicit function of the observed RATIO at each reverse step:

$$w(p) = 1 + (w_{\max} - 1)\,\exp(-\alpha p)$$

where:

  • $w_{\max}$ is the user-selected maximal guidance scale (typically in $[7, 10]$),
  • $\alpha$ is the exponential decay parameter (typically in $[5, 15]$).

At each step, compute $p = \mathrm{RATIO}_t$ and set $g_t = 1 + (g_0 - 1)\,\exp(-\alpha\,\mathrm{RATIO}_t)$ with $g_0 = w_{\max}$, so that $g_t$ is the guidance scale $w_t$ used at step $t$. This exponentially dampens $w_t$ in early steps when RATIO is high, then recovers maximal guidance as RATIO decays, preserving both stability and conditional fidelity.
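The schedule is a one-line function. The sketch below assumes default values `w_max = 7.5` and `alpha = 10.0`, picked from inside the typical ranges quoted above; the function name `raag_scale` is illustrative.

```python
import math

def raag_scale(ratio, w_max=7.5, alpha=10.0):
    """Adaptive guidance scale w(p) = 1 + (w_max - 1) * exp(-alpha * p).

    The defaults are assumed values inside the typical ranges given in
    the text, not settings recommended by the source.
    """
    return 1.0 + (w_max - 1.0) * math.exp(-alpha * ratio)

# Print the scale for a few RATIO values, from an early high-RATIO step
# down to a late step where the RATIO has decayed to zero.
for p in (1.0, 0.5, 0.1, 0.0):
    print(f"RATIO={p:.1f} -> w={raag_scale(p):.3f}")
```

With these assumed defaults, a RATIO of 0.5 gives a scale only slightly above 1, while a RATIO of 0 returns the full `w_max`.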

4. Integration with Flow-Based Generative Sampling

RAAG requires minimal changes to standard flow-based ODE solvers. At each reverse integration step:

  1. Compute $v_u$, $v_c$, and $\delta$ as outlined above,
  2. Calculate RATIO,
  3. Set the adaptive guidance scale via the closed-form schedule,
  4. Aggregate the guidance-modulated velocity $v_\text{cfg} = v_u + w_t \cdot \delta$,
  5. Apply any high-order ODE solver (e.g., Dopri5, UniPC) for the next latent state.

The computational overhead is negligible: one additional forward pass relative to unguided sampling, which standard CFG already incurs, plus a pair of norm evaluations per step. No re-training or architectural modification is required (Zhu et al., 5 Aug 2025).
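The five steps can be sketched as a short sampling loop. This is a minimal illustration under stated assumptions, not the authors' implementation: a first-order Euler update stands in for the high-order solvers named above, the two model forward passes are replaced by hypothetical toy callables (`toy_v_uncond`, `toy_v_cond`) whose targets are single points `MU_U` and `MU_C`, and all names and default values are illustrative. The toy velocities only exercise the control flow and are not meant to reproduce the early RATIO spike.

```python
import math

def raag_scale(ratio, w_max, alpha):
    """w(p) = 1 + (w_max - 1) * exp(-alpha * p)."""
    return 1.0 + (w_max - 1.0) * math.exp(-alpha * ratio)

def sq_norm(v):
    """Squared Euclidean norm of a flat sequence of floats."""
    return math.fsum(x * x for x in v)

def sample(x1, v_uncond, v_cond, num_steps=10, w_max=7.5, alpha=10.0, eps=1e-12):
    """Integrate from t=1 (noise) to t=0 (data) with ratio-aware guidance.

    v_uncond(x, t) and v_cond(x, t) are caller-supplied callables that
    return lists of floats; they stand in for the model forward passes.
    Euler stepping is an assumed simplification of a high-order solver.
    """
    x = list(x1)
    dt = 1.0 / num_steps
    scales = []
    for i in range(num_steps):
        t = 1.0 - i * dt
        v_u = v_uncond(x, t)                                # step 1
        v_c = v_cond(x, t)
        delta = [c - u for c, u in zip(v_c, v_u)]
        ratio = sq_norm(delta) / (sq_norm(v_u) + eps)       # step 2
        w_t = raag_scale(ratio, w_max, alpha)               # step 3
        v_cfg = [u + w_t * d for u, d in zip(v_u, delta)]   # step 4
        x = [xi - dt * vi for xi, vi in zip(x, v_cfg)]      # step 5 (Euler)
        scales.append(w_t)
    return x, scales

# Toy stand-ins: if the data were a single point mu, then
# x_t = (1 - t) * mu + t * x_1, so x_1 - x_0 = (x_t - mu) / t.
MU_U = [0.0, 0.0]   # hypothetical unconditional mean
MU_C = [1.0, 0.0]   # hypothetical conditional mean

def toy_v_uncond(x, t):
    return [(xi - m) / t for xi, m in zip(x, MU_U)]

def toy_v_cond(x, t):
    return [(xi - m) / t for xi, m in zip(x, MU_C)]

x_final, scales = sample([1.5, -2.0], toy_v_uncond, toy_v_cond)
```

Apart from the two velocity calls, each iteration only adds two squared-norm evaluations and one exponential, which matches the small overhead described above. Setting `w_max=1.0` reduces the loop to plain conditional sampling, since the schedule then returns 1 at every step.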

5. Empirical Findings Across Image and Video Generation

RAAG was systematically benchmarked on both image (Stable Diffusion v3.5, Lumina-Next) and video (WAN2.1-1.4B, WAN2.1-14B) generative frameworks. Principal results include:

  • Sampling acceleration: RAAG allows 10-step sampling to match (SD3.5) or surpass (Lumina-Next) the quality of conventional 30- to 40-step CFG, effecting $2\times$–$4\times$ speedups.
  • Image metrics: At 10 steps, ImageReward and CLIPScore metrics match or exceed their CFG 30-step counterparts.
  • Video metrics: On vBench, imaging and aesthetic quality metrics are substantially higher for RAAG versus standard CFG at matched step count.
  • GenEval benchmarks: In SD3.5, RAAG increased single-object accuracy by 2.5 percentage points (96.25% to 98.75%) and overall GenEval score by 1.25% absolute.

All improvements meet statistical significance thresholds ($p < 0.001$), with confidence intervals reported (Zhu et al., 5 Aug 2025).

6. Robustness and Ablation Analysis

Extensive ablation studies confirm:

  • Generalization: The exponential decay schedule outperforms linear, sigmoid, and inverse-proportional alternatives.
  • Hyperparameter insensitivity: Performance varies by less than 1 percentage point across $w_{\max} \in [5, 15]$ and $\alpha \in [5, 20]$.
  • Scheduler and architecture independence: Comparable gains appear with different ODE solvers and across diverse backbone architectures (transformer and CNN flow models).
  • Universality: The RAAG framework is agnostic to specific model and dataset choices, requiring minimal tuning (Zhu et al., 5 Aug 2025).

7. Limitations and Future Directions

RAAG is presently adapted for rectified-flow ODE-based samplers. Preliminary investigations on stochastic diffusion (e.g., Stable Diffusion v2) reveal only marginal gains, suggesting architectural specificity. Furthermore, in high-step-count scenarios ($>40$ steps), the impact of the initial stepwise adaptation diminishes. Proposed avenues for extension include adapting RATIO-aware schedules to stochastic differential equation frameworks, learning data-driven $w(p)$ mappings, and transposing the approach to multimodal or autoregressive flows.

RAAG identifies and addresses the critical, often overlooked instability induced by high initial guidance in flow-based conditional generative modeling. Its closed-form, ratio-adaptive schedule achieves significant acceleration and stability with negligible computational overhead and no alteration to model architectures (Zhu et al., 5 Aug 2025).
