
CFG Resolution Weighting (CFG-RW)

Updated 18 December 2025
  • CFG-RW is a method that rectifies the expectation shift in conventional classifier-free guidance by modifying the coefficient constraints.
  • It relaxes the traditional linear sum-to-one restriction, enforcing a zero-mean property to maintain diffusion process consistency.
  • Empirical results show that CFG-RW improves FID and conditional alignment across various diffusion samplers with minimal computational overhead.

CFG Resolution Weighting (CFG-RW), more rigorously characterized as Rectified Classifier-Free Guidance (ReCFG), refers to a post-hoc modification of the coefficient selection used for classifier-free guidance in diffusion model sampling. Conventional classifier-free guidance (CFG) employs a linear combination of conditional and unconditional score estimates, governed by coefficients that sum to unity. However, this approach introduces a systematic bias—an "expectation shift"—which theoretically disrupts the reciprocity of the reverse diffusion process. CFG Resolution Weighting corrects this bias by relaxing the “sum-to-one” constraint, instead solving for guidance coefficients that enforce a zero-mean property of the combined score, thereby restoring theoretical consistency with the forward–reverse SDE/ODE framework and improving sampling fidelity in conditional generative modeling (Xia et al., 24 Oct 2024).

1. Theoretical Basis and Expectation Shift

Standard CFG replaces the true conditional score $\nabla_{x_t}\log q_t(x_t\mid c)$ with the weighted mixture $\nabla_{x_t}\log q_{t,\gamma}(x_t\mid c) = \gamma\,\nabla_{x_t}\log q_t(x_t\mid c) + (1-\gamma)\,\nabla_{x_t}\log q_t(x_t)$ for $\gamma > 1$. In the $\epsilon$-prediction formulation this corresponds to

$$\epsilon_{\gamma}(x_t,c,t) = \gamma\,\epsilon_\theta(x_t,c,t) + (1-\gamma)\,\epsilon_\theta(x_t,t)$$

While this sharpens sampling toward the tilted distribution $q_t(x_t\mid c)^{\gamma}\,q_t(x_t)^{1-\gamma}$, it violates the zero-mean property that is critical for diffusion-theoretic reversibility. Specifically,

$$\mathbb{E}_{x_t\sim q_t(\cdot\mid c)}\!\left[\epsilon_\theta(x_t,t)\right] \neq 0$$

and thus,

$$\mathbb{E}_{x_t}\!\left[\nabla_{x_t}\log q_{t,\gamma}(x_t\mid c)\right] = (1-\gamma)\,\mathbb{E}_{x_t}\!\left[\nabla_{x_t}\log q_t(x_t)\right] \neq 0$$

This expectation shift prevents the reverse process from precisely inverting the forward diffusion, resulting in a systematic bias away from $\mathbb{E}[x_0\mid c]$ (Xia et al., 24 Oct 2024).
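
These two expectations can be checked numerically. The sketch below is a minimal one-dimensional illustration under assumed parameters (two Gaussian classes, a single VP-schedule timestep), using exact analytic scores in place of a learned $\epsilon_\theta$; it is not the paper's toy experiment, only a demonstration that the conditional branch is zero-mean under $q_t(\cdot\mid c)$ while the unconditional branch is not.

import numpy as np

rng = np.random.default_rng(0)

# Assumed 1D setup: two equiprobable classes with x0 | c ~ N(m[c], s^2).
m = np.array([-2.0, 2.0])
s = 0.5

# Single VP-style timestep with alpha_t^2 + sigma_t^2 = 1 (values assumed).
alpha_t, sigma_t = 0.6, 0.8
v = alpha_t**2 * s**2 + sigma_t**2          # Var[x_t | c]

def eps_cond(x_t, c):
    # Exact conditional eps-prediction: eps = -sigma_t * d/dx log q_t(x_t | c).
    return sigma_t * (x_t - alpha_t * m[c]) / v

def eps_uncond(x_t):
    # Exact unconditional eps-prediction from the two-component mixture q_t(x_t).
    d = x_t[:, None] - alpha_t * m[None, :]         # residual to each class mean
    resp = np.exp(-0.5 * d**2 / v)
    resp /= resp.sum(axis=1, keepdims=True)         # posterior responsibilities
    return sigma_t * (resp * d).sum(axis=1) / v

# Sample x_t ~ q_t(. | c = 1) and compare the two expectations from the text.
c = 1
x0 = rng.normal(m[c], s, size=200_000)
x_t = alpha_t * x0 + sigma_t * rng.standard_normal(x0.shape)

print("E[eps_cond   | c] ~", eps_cond(x_t, c).mean())   # close to 0: zero-mean property holds
print("E[eps_uncond | c] ~", eps_uncond(x_t).mean())    # clearly nonzero: the expectation shift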

2. Derivation of Rectified Guidance Weights

CFG Resolution Weighting introduces two free coefficients, $\alpha_c$ and $\alpha_u$, for the conditional and unconditional branches:

$$\nabla_{x_t}\log q_{t,\alpha_c,\alpha_u}(x_t\mid c) = \alpha_c\,\nabla_{x_t}\log q_t(x_t\mid c) + \alpha_u\,\nabla_{x_t}\log q_t(x_t)$$

with the $\epsilon$-space form

$$\epsilon_{\alpha_c,\alpha_u}(x_t,c,t) = \alpha_c\,\epsilon_\theta(x_t,c,t) + \alpha_u\,\epsilon_\theta(x_t,t).$$

A zero-expectation (annihilation) constraint is enforced:

$$\mathbb{E}_{x_t\sim q_t(\cdot\mid c)}\!\left[\alpha_c\,\epsilon_\theta(x_t,c,t) + \alpha_u\,\epsilon_\theta(x_t,t)\right] = 0.$$

Estimating $A(t,c) = \mathbb{E}_{x_t}[\epsilon_\theta(x_t,c,t)]$ and $U(t) = \mathbb{E}_{x_t}[\epsilon_\theta(x_t,t)]$ via Monte Carlo, the optimal coefficient follows in closed form:

$$\alpha_u(t,c) = -\,\alpha_c\,\frac{A(t,c)}{U(t)}.$$

Practically, $\alpha_c$ is set to the guidance strength $w>1$, so

$$\alpha_u(t,c) = -\,w\,\frac{\mathbb{E}_{x_t}[\epsilon_\theta(x_t,c,t)]}{\mathbb{E}_{x_t}[\epsilon_\theta(x_t,t)]}$$

with the practical constraints $\alpha_u\leq 0$ and $\alpha_c+\alpha_u\geq 1$ typically satisfied for $w>1$.

The relationship to the original CFG coefficients is summarized below:

Approach | Coefficient form | Constraint
CFG | $(\gamma,\ 1-\gamma)$ | $\gamma + (1-\gamma) = 1$ (sum to one)
ReCFG | $(w,\ \alpha_u(t,c))$ | no sum constraint (zero expectation enforced instead)
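
The closed-form solution is straightforward to wrap in code. The helper below is a minimal sketch, assuming A and U are already-computed Monte Carlo estimates (scalars or arrays); the function name and the example values are illustrative, not taken from the paper.

import numpy as np

def recfg_coefficients(A, U, w):
    # Closed-form ReCFG coefficients from Monte Carlo estimates
    # A ~ E[eps_theta(x_t, c, t)] and U ~ E[eps_theta(x_t, t)] (scalars or arrays),
    # with guidance strength w > 1.  Annihilation: alpha_c * A + alpha_u * U = 0.
    alpha_c = w
    alpha_u = -w * np.asarray(A) / np.asarray(U)
    return alpha_c, alpha_u

# Illustrative values only (the ratio A / U is reported to sit near 1 in practice):
alpha_c, alpha_u = recfg_coefficients(A=0.101, U=0.100, w=3.0)
print(alpha_c, alpha_u)   # 3.0 and roughly -3.03; the coefficients no longer sum to one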

3. Computation of Resolution Weights

CFG-RW requires precomputing the ratio

$$\rho(t,c) = \frac{\mathbb{E}_{x_t}[\epsilon_\theta(x_t,c,t)]}{\mathbb{E}_{x_t}[\epsilon_\theta(x_t,t)]}$$

for each condition $c$ and timestep $t$. This is achieved through a single-pass Monte Carlo estimate across the dataset:

  1. Initialize accumulators $S_c(t)\leftarrow 0$, $S_u(t)\leftarrow 0$, $N(t)\leftarrow 0$ for each condition–timestep pair $(c,t)$.
  2. For each data sample $(x_0,c)$ and each timestep $t$:
    • Draw $\epsilon\sim\mathcal{N}(0,I)$ and set $x_t = \alpha_t x_0 + \sigma_t \epsilon$.
    • Compute $\epsilon_c = \epsilon_\theta(x_t,c,t)$ and $\epsilon_u = \epsilon_\theta(x_t,t)$.
    • Accumulate $S_c(t) \mathrel{+}= \epsilon_c$, $S_u(t) \mathrel{+}= \epsilon_u$, $N(t) \mathrel{+}= 1$.
  3. After the traversal, set $A(t,c) = S_c(t)/N(t)$, $U(t) = S_u(t)/N(t)$, and $\rho(t,c) = A(t,c)/U(t)$.

This lookup table keeps runtime cost low: at sampling time the rectified coefficients are simply retrieved, with minimal computational overhead (Xia et al., 24 Oct 2024).
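
A PyTorch-style sketch of this single pass is given below. The model call signature (model(x_t, c, t) for the conditional branch, model(x_t, t) for the unconditional one), the schedule arrays alpha and sigma, and the reduction of each prediction to one scalar per sample are assumptions made for illustration rather than the paper's reference implementation.

import torch

@torch.no_grad()
def build_rho_table(model, dataloader, timesteps, alpha, sigma, num_classes):
    # Single-pass Monte Carlo estimate of rho(t, c) = E[eps(x_t, c, t)] / E[eps(x_t, t)],
    # stored as one scalar per (t, c) as suggested by the ablations.
    T = len(timesteps)
    S_c = torch.zeros(T, num_classes)    # running sums of conditional eps per (t, c)
    N_c = torch.zeros(T, num_classes)    # sample counts per (t, c)
    S_u = torch.zeros(T)                 # running sums of unconditional eps per t
    N_u = torch.zeros(T)

    for x0, c in dataloader:                                  # data batch and integer labels
        for i, t in enumerate(timesteps):
            eps = torch.randn_like(x0)
            x_t = alpha[t] * x0 + sigma[t] * eps              # forward diffusion sample
            e_c = model(x_t, c, t).flatten(1).mean(dim=1)     # one scalar per sample
            e_u = model(x_t, t).flatten(1).mean(dim=1)
            S_c[i].index_add_(0, c, e_c)
            N_c[i].index_add_(0, c, torch.ones_like(e_c))
            S_u[i] += e_u.sum()
            N_u[i] += e_u.numel()

    A = S_c / N_c.clamp(min=1)           # E[eps_theta(x_t, c, t)]; unseen classes stay 0
    U = S_u / N_u.clamp(min=1)           # E[eps_theta(x_t, t)]
    return A / U[:, None]                # rho[t_index, c]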

4. Integration with Diffusion Model Samplers

Most state-of-the-art diffusion samplers (e.g., DDIM, Euler–Maruyama, EDM2, SD3) use the following procedure in each denoising step:

for t in timesteps:
    eps_c = model(x_t, c, t)                  # conditional eps-prediction
    eps_u = model(x_t, t)                     # unconditional eps-prediction
    eps = beta_c * eps_c + beta_u * eps_u     # standard CFG: beta_c = gamma, beta_u = 1 - gamma
    x_t = sampler_step(x_t, eps, t)           # one reverse-diffusion update
ReCFG replaces $\beta_c,\beta_u$ with $\alpha_c = w$ and $\alpha_u(t,c) = -w\,\rho(t,c)$:
for t in timesteps:
    eps_c = model(x_t, c, t)
    eps_u = model(x_t, t)
    eps = w * eps_c + (-w * rho(t, c)) * eps_u   # ReCFG: alpha_c = w, alpha_u = -w * rho(t, c)
    x_t = sampler_step(x_t, eps, t)
No retraining or modification of the network is required, and the substitution is compatible with both class-conditioned (e.g., EDM2 on ImageNet) and text-conditioned (e.g., SD3 on CC12M) models (Xia et al., 24 Oct 2024).
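
The replacement can also be packaged as a small helper that any sampler loop calls once per step; the table indexing and broadcasting below are illustrative assumptions (a scalar $\rho$ per $(t,c)$ and integer class labels), not a fixed API.

import torch

def recfg_eps(model, x_t, c, t, t_index, w, rho_table):
    # Rectified guidance combination eps = alpha_c * eps_c + alpha_u * eps_u,
    # with alpha_c = w and alpha_u = -w * rho(t, c) read from a precomputed (T x C) table.
    eps_c = model(x_t, c, t)                                  # conditional eps-prediction
    eps_u = model(x_t, t)                                     # unconditional eps-prediction
    rho = rho_table[t_index, c]                               # one scalar per batch element
    alpha_u = (-w * rho).view(-1, *([1] * (x_t.ndim - 1)))    # broadcast over non-batch dims
    return w * eps_c + alpha_u * eps_u

The returned eps then feeds the usual sampler update (the sampler_step call above) unchanged.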

5. Empirical Performance and Ablation Highlights

Empirical studies show quantifiable gains in both fidelity and conditional faithfulness:

Model / Dataset | Standard CFG | ReCFG (CFG-RW) | Metric change
LDM, ImageNet 256×256, 20 steps | FID ≈ 18.9 | FID ≈ 16.9 | FID ↓ 2.0
EDM2-S, ImageNet 512×512, 63 steps | FID ≈ 5.9 | FID ≈ 4.8 | FID ↓ 1.1
SD3, CC12M 512×512, 25 steps | CLIP ≈ 0.268, FID ≈ 72.2 | CLIP ≈ 0.270, FID ≈ 71.8 | CLIP ↑ 0.002, FID ↓ 0.4

Ablation reveals:

  • Lookup table estimates saturate in performance after ≈300 traversals per condition.
  • The mean ratio $\rho(t,c)$ varies minimally across $c$, justifying use of a global or average $\bar\rho(t)$ for open-vocabulary text models with negligible loss (CLIP-Score drop ≤ 0.001); see the sketch after this list.
  • Storing pixel-wise $\rho(\cdot, t)$ yields marginal additional benefit over a scalar per $(t,c)$.
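
Per the second point above, the per-condition table can be collapsed to a single per-timestep ratio for open-vocabulary conditioning; the sketch below assumes the $(T\times C)$ rho_table layout from the earlier sketch.

import torch

def global_rho(rho_table: torch.Tensor) -> torch.Tensor:
    # Collapse rho(t, c) of shape (T, C) to a condition-independent rho_bar(t) of shape (T,),
    # for settings where per-condition statistics are impractical (open-vocabulary text).
    return rho_table.mean(dim=1)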

A one-dimensional Gaussian toy example demonstrates that standard CFG systematically shifts the mean, while ReCFG retrieves the exact mean and reduces variance (Xia et al., 24 Oct 2024).

6. Practical Recommendations

  • Guidance strength $w\in[1.5,10]$ mediates the trade-off between fidelity and diversity, with optimal performance at $w\approx 2$–$5$ for class-conditional models and higher $w$ for open-vocabulary prompting.
  • In practice, $\rho(t,c)\approx 1.0\pm 0.02$, so $\alpha_u \approx -w$ with minor corrections; ReCFG thus closely approximates boosting the conditional branch by $w$ while enforcing the zero-expectation (annihilation) constraint.
  • For high-resolution synthesis, as $t\to 0$, $\rho(t,c)\to 1$, allowing $\alpha_u \to -(w-1)$ for stability in late denoising steps.
  • ReCFG can be rapidly implemented post-hoc for any pretrained conditional diffusion model with negligible computational overhead and consistent performance gains.

7. Significance and Implications

CFG Resolution Weighting enforces the theoretical zero-mean property absent in standard CFG by removing the linear coefficient constraint, aligning the sampling process with the requirements of diffusion SDE/ODE theory. The post-hoc nature and closed-form solution for the rectified coefficients permit integration without retraining or architecture changes. Empirical results indicate systematic improvements in FID and conditional alignment for both class-labeled and open-vocabulary generative tasks. The minimal variation in $\rho(t,c)$ across conditions suggests potential for further optimization in lookup-table storage and runtime efficiency. A plausible implication is that this approach may generalize beyond the diffusion samplers currently demonstrated, offering a template for theoretical corrections to guidance heuristics in other generative domains (Xia et al., 24 Oct 2024).
