RAAG: Ratio-Aware Scheduling in Generative Models
- RAAG is a class of adaptive guidance mechanisms that modulate guidance scales using a ratio (RATIO) metric to stabilize conditional flow-based sampling.
- It employs an exponential decay rule to dampen excessive early guidance, thereby reducing error amplification and preventing trajectory collapse.
- Empirical studies demonstrate that RAAG achieves 2×–4× speedups with maintained or enhanced image and video quality metrics.
Ratio-Aware Schedules (RAAG) constitute a class of adaptive guidance mechanisms designed to optimize sampling in flow-based generative models. Their central feature is stepwise adaptation of the guidance scale, informed by analytically defined sensitivity metrics like the RATIO, to stabilize conditional generation—especially in the fast, low-step sampling regime. The RAAG scheme, as introduced and analyzed by Zhu et al. (2025), both diagnoses and remedies a universal instability arising in conditional flow-based sampling with classifier-free guidance, and yields substantial improvements in sample efficiency and controllability across image and video generation tasks (Zhu et al., 5 Aug 2025).
1. Definition and Derivation of the RATIO Metric
Let $x_t$ denote the latent at time $t$ in a flow-based ODE sampler. The classifier-free guidance (CFG) framework computes two key velocities:
- Unconditional velocity: $v_t^{\varnothing} = v_\theta(x_t, t, \varnothing)$,
- Conditional velocity: $v_t^{c} = v_\theta(x_t, t, c)$.
Their difference $\Delta v_t = v_t^{c} - v_t^{\varnothing}$ quantifies the conditional signal. The RATIO at step $t$ is

$$\mathrm{RATIO}_t = \frac{\|\Delta v_t\|^2}{\|v_t^{\varnothing}\|^2}.$$

This metric provides a normalized measure of the relative strength of the conditional cue compared to the unconditional noise, and is intrinsic to the data distribution, unaffected by model architecture.
At the initial reverse step, the latent is pure Gaussian noise $x \sim \mathcal{N}(0, I)$, and the optimal velocities reduce to $v^{\varnothing} = \mu - x$ and $v^{c} = \mu_c - x$, with $\mu$ the unconditional data mean and $\mu_c$ the class-conditional mean. Thus,

$$\mathrm{RATIO}_{\mathrm{init}} = \frac{\|\mu_c - \mu\|^2}{\|\mu - x\|^2},$$

showing the RATIO at the initial step reflects the normalized squared class-conditional shift of data means.
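The RATIO computation reduces to a few lines. The following is a minimal sketch; the function name `compute_ratio` and the toy setup are illustrative, not from the paper:

```python
import numpy as np

def compute_ratio(v_uncond: np.ndarray, v_cond: np.ndarray) -> float:
    """Squared-norm ratio of the conditional signal to the unconditional velocity.

    v_uncond, v_cond: velocity predictions at the current step (any shape).
    """
    delta = v_cond - v_uncond
    return float(np.sum(delta**2) / (np.sum(v_uncond**2) + 1e-12))

# Toy check (illustrative): at the initial step, with v_uncond = mu - x and
# v_cond = mu_c - x, the ratio reduces to |mu_c - mu|^2 / |mu - x|^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)   # stand-in for the initial Gaussian latent
mu, mu_c = 0.0, 0.5             # hypothetical unconditional / conditional means
ratio = compute_ratio(mu - x, mu_c - x)
```

Because the means differ by a constant shift, `ratio` here is simply the squared mean gap normalized by the squared norm of the noise-centered velocity.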
2. Instability in Early Steps: RATIO Spikes and Exponential Error Amplification
Empirical evaluation reveals that $\mathrm{RATIO}_t$ attains its maximum in the earliest reverse steps and declines rapidly thereafter. The underlying causes are:
- Dataset-level inevitability: For common datasets, the numerator and denominator of the initial-step RATIO combine to yield consistently large values, regardless of architecture.
- Exponential error amplification: Using a large, fixed guidance scale $w \gg 1$ when the RATIO is large makes the reverse dynamics highly sensitive: small perturbations at the initial step are amplified exponentially. If $\delta_t$ denotes the separation of two nearby reverse trajectories, a Grönwall inequality yields $\|\delta_t\| \leq \|\delta_0\|\, e^{C t}$, with a growth constant $C$ proportional to $w$ times the maximal RATIO. Therefore, excessive guidance at high-RATIO steps causes catastrophic “trajectory collapse” and semantic artifacts (Zhu et al., 5 Aug 2025).
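The amplification effect can be illustrated with a toy linear ODE whose drift has a Lipschitz constant proportional to the guidance scale. This is an illustrative construction, not an experiment from the paper:

```python
def integrate(x0: float, w: float, steps: int = 50, dt: float = 0.02) -> float:
    """Euler-integrate dx/dt = w * x, a stand-in for a guided drift whose
    Lipschitz constant grows with the guidance scale w."""
    x = x0
    for _ in range(steps):
        x += dt * w * x
    return x

eps = 1e-3  # initial perturbation between two nearby trajectories

# Separation of trajectories after integration, small vs. large guidance scale.
gap_small_w = abs(integrate(1.0 + eps, w=1.0) - integrate(1.0, w=1.0))
gap_large_w = abs(integrate(1.0 + eps, w=8.0) - integrate(1.0, w=8.0))
```

The separation grows roughly like `eps * exp(w * T)`, so the large-`w` gap dwarfs the small-`w` gap by orders of magnitude, mirroring the Grönwall bound above.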
3. Closed-Form Adaptive Scheduling: The RAAG Exponential Decay Rule
To stabilize sampling, RAAG modulates the guidance scale as an explicit function of the observed RATIO at each reverse step:

$$w_t = w_{\max}\, \exp\!\left(-\beta \,\mathrm{RATIO}_t\right),$$

where:
- $w_{\max}$ is the user-selected maximal guidance scale,
- $\beta > 0$ is the exponential decay parameter.
At each step, the sampler computes $\mathrm{RATIO}_t$ and sets $w_t$ by this rule. This exponentially dampens guidance in early steps when the RATIO is high, then recovers maximal guidance as the RATIO decays, preserving both stability and conditional fidelity.
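The decay rule itself is a one-line function. A minimal sketch follows; the `w_max` and `beta` values used below are placeholders, not the paper's settings:

```python
import math

def raag_guidance_scale(ratio: float, w_max: float, beta: float) -> float:
    """Exponential-decay schedule: damp guidance when RATIO is high,
    recover w_max as RATIO -> 0."""
    return w_max * math.exp(-beta * ratio)

# High RATIO at an early step -> strongly damped guidance;
# near-zero RATIO at a late step -> full guidance strength recovered.
early = raag_guidance_scale(ratio=2.0, w_max=7.0, beta=1.0)
late = raag_guidance_scale(ratio=0.01, w_max=7.0, beta=1.0)
```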
4. Integration with Flow-Based Generative Sampling
RAAG requires minimal changes to standard flow-based ODE solvers. At each reverse integration step:
- Compute $v_t^{\varnothing}$, $v_t^{c}$, and $\Delta v_t$ as outlined above,
- Calculate RATIO,
- Set the adaptive guidance scale via the closed-form schedule,
- Aggregate the guidance-modulated velocity $\hat{v}_t = v_t^{\varnothing} + w_t\,\Delta v_t$,
- Apply any high-order ODE solver (e.g., Dopri5, UniPC) for the next latent state.
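Putting the steps together, a RAAG-guided reverse pass might look like the following sketch. The `velocity_model` interface is hypothetical, and plain Euler integration stands in for the higher-order solvers named above:

```python
import numpy as np

def raag_sample(velocity_model, x, cond, n_steps=10, w_max=7.0, beta=1.0):
    """One reverse pass of a flow ODE with RAAG-adaptive classifier-free guidance.

    velocity_model(x, t, cond) -> velocity; cond=None selects the unconditional branch.
    (Interface and default hyperparameters are illustrative assumptions.)
    """
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v_uncond = velocity_model(x, t, None)      # unconditional velocity
        v_cond = velocity_model(x, t, cond)        # conditional velocity
        delta = v_cond - v_uncond                  # conditional signal
        ratio = np.sum(delta**2) / (np.sum(v_uncond**2) + 1e-12)
        w = w_max * np.exp(-beta * ratio)          # adaptive guidance scale
        v = v_uncond + w * delta                   # guidance-modulated velocity
        x = x + dt * v                             # Euler step (stand-in for Dopri5/UniPC)
    return x

# Usage with a toy linear "model" pulling the latent toward a class mean:
mu_c = np.full(4, 2.0)
toy_model = lambda x, t, c: (mu_c - x) if c is not None else (0.0 - x)
x_final = raag_sample(toy_model, np.ones(4), cond="class", n_steps=100)
```

The only additions over a plain CFG loop are the two norm evaluations and the scalar schedule, consistent with the negligible-overhead claim below.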
The computational overhead is negligible: one additional forward pass and pair of norm evaluations per step. No re-training or architectural modification is required (Zhu et al., 5 Aug 2025).
5. Empirical Findings Across Image and Video Generation
RAAG was systematically benchmarked on both image (Stable Diffusion v3.5, Lumina-Next) and video (WAN2.1-1.4B, WAN2.1-14B) generative frameworks. Principal results include:
- Sampling acceleration: RAAG allows 10-step sampling to match (SD3.5) or surpass (Lumina-Next) the quality of conventional 30- to 40-step CFG, yielding 2×–4× speedups.
- Image metrics: At 10 steps, ImageReward and CLIPScore metrics match or exceed their CFG 30-step counterparts.
- Video metrics: On VBench, imaging and aesthetic quality metrics are substantially higher for RAAG than for standard CFG at matched step counts.
- GenEval benchmarks: In SD3.5, RAAG increased single-object accuracy by 2.5 percentage points (96.25% to 98.75%) and the overall GenEval score by 1.25 percentage points.
All reported improvements meet the paper's statistical significance thresholds, with confidence intervals reported (Zhu et al., 5 Aug 2025).
6. Robustness and Ablation Analysis
Extensive ablation studies confirm:
- Generalization: The exponential decay schedule outperforms linear, sigmoid, and inverse-proportional alternatives.
- Hyperparameter insensitivity: Performance varies by less than 1 percentage point across the tested ranges of $w_{\max}$ and $\beta$.
- Scheduler and architecture independence: Comparable gains appear with different ODE solvers and across diverse backbone architectures (transformer and CNN flow models).
- Universality: The RAAG framework is agnostic to specific model and dataset choices, requiring minimal tuning (Zhu et al., 5 Aug 2025).
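For reference, the alternative schedules compared in the ablations can be sketched as simple functions of the RATIO. These functional forms are plausible instantiations of "linear, sigmoid, and inverse-proportional," not the paper's exact parameterizations, and the extra shape parameters (`r_max`, `r_mid`) are assumptions:

```python
import math

def exponential(ratio, w_max=7.0, beta=1.0):
    return w_max * math.exp(-beta * ratio)

def linear(ratio, w_max=7.0, beta=1.0, r_max=4.0):
    # Linear ramp-down, clipped at zero; r_max is a hypothetical cutoff.
    return w_max * max(0.0, 1.0 - beta * ratio / r_max)

def sigmoid(ratio, w_max=7.0, beta=1.0, r_mid=1.0):
    # Smooth step centered at a hypothetical midpoint r_mid.
    return w_max / (1.0 + math.exp(beta * (ratio - r_mid)))

def inverse_proportional(ratio, w_max=7.0, beta=1.0):
    return w_max / (1.0 + beta * ratio)

# All four damp guidance at high RATIO and approach (up to a constant)
# full strength as RATIO -> 0; they differ in how sharply they cut early guidance.
schedules = (exponential, linear, sigmoid, inverse_proportional)
scales_high = [f(4.0) for f in schedules]
scales_low = [f(0.0) for f in schedules]
```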
7. Limitations and Future Directions
RAAG is presently adapted for rectified-flow ODE-based samplers. Preliminary investigations on stochastic diffusion samplers (e.g., Stable Diffusion v2) reveal only marginal gains, suggesting the method is specific to the flow-ODE setting. Furthermore, in many-step sampling regimes, the impact of the initial stepwise adaptation diminishes. Proposed avenues for extension include adapting RATIO-aware schedules to stochastic differential equation frameworks, learning data-driven RATIO-to-scale mappings, and transposing the approach to multimodal or autoregressive flows.
RAAG identifies and addresses the critical, often overlooked instability induced by high initial guidance in flow-based conditional generative modeling. Its closed-form, ratio-adaptive schedule achieves significant acceleration and stability with negligible computational overhead and no alteration to model architectures (Zhu et al., 5 Aug 2025).