CFG Resolution Weighting (CFG-RW)
- CFG-RW is a method that rectifies the expectation shift in conventional classifier-free guidance by modifying the coefficient constraints.
- It relaxes the traditional linear sum-to-one restriction, enforcing a zero-mean property to maintain diffusion process consistency.
- Empirical results show that CFG-RW improves FID and conditional alignment across various diffusion samplers with minimal computational overhead.
CFG Resolution Weighting (CFG-RW), more rigorously characterized as Rectified Classifier-Free Guidance (ReCFG), refers to a post-hoc modification of the coefficient selection used for classifier-free guidance in diffusion model sampling. Conventional classifier-free guidance (CFG) employs a linear combination of conditional and unconditional score estimates, governed by coefficients that sum to unity. However, this approach introduces a systematic bias—an "expectation shift"—which theoretically disrupts the reciprocity of the reverse diffusion process. CFG Resolution Weighting corrects this bias by relaxing the “sum-to-one” constraint, instead solving for guidance coefficients that enforce a zero-mean property of the combined score, thereby restoring theoretical consistency with the forward–reverse SDE/ODE framework and improving sampling fidelity in conditional generative modeling (Xia et al., 24 Oct 2024).
1. Theoretical Basis and Expectation Shift
Standard CFG replaces the true conditional score with a weighted mixture:

$$\nabla_{x_t}\log \tilde p_w(x_t \mid c) = w\,\nabla_{x_t}\log p(x_t \mid c) + (1 - w)\,\nabla_{x_t}\log p(x_t), \qquad w > 1.$$

In the $\epsilon$-prediction formulation this corresponds to

$$\tilde\epsilon(x_t, c) = w\,\epsilon_\theta(x_t, c) + (1 - w)\,\epsilon_\theta(x_t).$$

While this sharpens the conditional distribution $p(x_t \mid c)$, it violates the zero-mean property $\mathbb{E}_{q(x_t \mid c)}[\tilde\epsilon(x_t, c)] = 0$ that is critical for diffusion-theoretic reversibility. Specifically,

$$\mathbb{E}_{q(x_t \mid c)}[\epsilon_\theta(x_t, c)] \neq \mathbb{E}_{q(x_t \mid c)}[\epsilon_\theta(x_t)]$$

and thus,

$$\mathbb{E}_{q(x_t \mid c)}[\tilde\epsilon(x_t, c)] = w\,\mathbb{E}[\epsilon_\theta(x_t, c)] + (1 - w)\,\mathbb{E}[\epsilon_\theta(x_t)] \neq 0.$$

This expectation shift prevents the reverse process from precisely inverting the forward diffusion, resulting in a systematic bias away from $q(x_0 \mid c)$ (Xia et al., 24 Oct 2024).
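The shift can be made concrete with a small numerical sketch. The setup below is purely illustrative (not from the paper): a two-class 1-D Gaussian mixture where the optimal conditional and unconditional noise predictors have closed forms. The conditional prediction is zero-mean under the conditional forward marginal, while the CFG combination is not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical numbers): two equiprobable classes with means +/-2
# and unit variance; VP-style forward process x_t = a*x0 + b*eps.
mu = np.array([2.0, -2.0])
a, b = 0.8, 0.6                       # a^2 + b^2 = 1
var_t = a**2 * 1.0 + b**2             # Var(x_t | c) = 1.0 here

def eps_cond(x_t, c):
    """Optimal conditional noise predictor E[eps | x_t, c] (closed form)."""
    return b * (x_t - a * mu[c]) / var_t

def eps_uncond(x_t):
    """Optimal unconditional predictor: posterior-weighted over classes."""
    ll = np.stack([-(x_t - a * m) ** 2 / (2 * var_t) for m in mu])
    post = np.exp(ll - ll.max(axis=0))
    post /= post.sum(axis=0)
    mu_bar = (post * mu[:, None]).sum(axis=0)   # posterior mean of mu_c
    return b * (x_t - a * mu_bar) / var_t

# Sample x_t from the forward process conditioned on class 0.
n, w = 200_000, 3.0
x0 = rng.normal(mu[0], 1.0, n)
x_t = a * x0 + b * rng.normal(0.0, 1.0, n)

e_c = eps_cond(x_t, 0)
e_u = eps_uncond(x_t)
cfg = w * e_c + (1 - w) * e_u         # standard sum-to-one CFG combination

print(f"E[eps_c] = {e_c.mean():+.4f}  (zero-mean, as theory requires)")
print(f"E[eps_u] = {e_u.mean():+.4f}  (nonzero under q(x_t | c))")
print(f"E[cfg]   = {cfg.mean():+.4f}  (expectation shift, w = {w})")
```

The conditional branch averages to zero, but the unconditional branch does not under the conditional marginal, so the weighted combination inherits a nonzero mean.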
2. Derivation of Rectified Guidance Weights
CFG Resolution Weighting introduces two free coefficients, $\gamma_1$ and $\gamma_0$, corresponding to the conditional and unconditional branches:

$$\tilde s(x_t, c) = \gamma_1\,\nabla_{x_t}\log p(x_t \mid c) + \gamma_0\,\nabla_{x_t}\log p(x_t),$$

with the $\epsilon$-space form:

$$\tilde\epsilon(x_t, c) = \gamma_1\,\epsilon_\theta(x_t, c) + \gamma_0\,\epsilon_\theta(x_t).$$

A zero-expectation (annihilation) constraint is enforced:

$$\gamma_1\,\mathbb{E}_{q(x_t \mid c)}[\epsilon_\theta(x_t, c)] + \gamma_0\,\mathbb{E}_{q(x_t \mid c)}[\epsilon_\theta(x_t)] = 0.$$

Estimating $\mathbb{E}[\epsilon_\theta(x_t, c)]$ and $\mathbb{E}[\epsilon_\theta(x_t)]$ via Monte Carlo, the optimal coefficient is found in closed form:

$$\gamma_0 = -\gamma_1\,\rho(t, c), \qquad \rho(t, c) = \frac{\mathbb{E}[\epsilon_\theta(x_t, c)]}{\mathbb{E}[\epsilon_\theta(x_t)]}.$$

Practically, $\gamma_1$ is set to the guidance strength $w$, so

$$\gamma_0 = -w\,\rho(t, c),$$

with the practical constraints $\gamma_1 > 0$ and $\gamma_0 < 0$ typically satisfied for $w > 1$.
The relationship to original CFG is outlined as follows:
| Approach | Coefficient Form | Constraint |
|---|---|---|
| CFG | $w\,\epsilon_\theta(x_t, c) + (1 - w)\,\epsilon_\theta(x_t)$ | Coefficients sum to one |
| ReCFG | $\gamma_1\,\epsilon_\theta(x_t, c) + \gamma_0\,\epsilon_\theta(x_t)$, $\gamma_0 = -\gamma_1\,\rho(t, c)$ | Zero expectation; no sum constraint |
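The closed form reduces to a one-line computation once the expectations are estimated. A minimal sketch (the function name and the example expectation values are illustrative, not from the paper):

```python
def recfg_coefficients(mean_eps_c, mean_eps_u, w):
    """Closed-form rectified weights: gamma1 = w, gamma0 = -w * rho.

    mean_eps_c / mean_eps_u are Monte Carlo estimates of the per-(t, c)
    expectations of the conditional / unconditional noise predictions.
    """
    rho = mean_eps_c / mean_eps_u
    return w, -w * rho

# Hypothetical expectation estimates for one (t, c) pair.
m_c, m_u = 0.031, 0.029
g1, g0 = recfg_coefficients(m_c, m_u, w=3.0)

# The annihilation constraint holds by construction (up to float error).
residual = g1 * m_c + g0 * m_u
print(g1, g0, residual)
```

Note that no sum-to-one constraint ties $\gamma_1$ and $\gamma_0$ together; the zero-expectation condition alone fixes $\gamma_0$ once $\gamma_1 = w$ is chosen.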
3. Computation of Resolution Weights
CFG-RW requires precomputing the ratio

$$\rho(t, c) = \frac{\mathbb{E}_{q(x_t \mid c)}[\epsilon_\theta(x_t, c)]}{\mathbb{E}_{q(x_t \mid c)}[\epsilon_\theta(x_t)]}$$

for each condition $c$ and timestep $t$. This is achieved through a single-pass Monte Carlo estimate across the dataset:
- Initialize accumulators $S_c = 0$, $S_u = 0$, $N = 0$ for each pair $(t, c)$.
- For each data sample $x_0$ with condition $c$ and each time $t$:
  - Draw $\epsilon \sim \mathcal{N}(0, I)$, set $x_t = \alpha_t x_0 + \sigma_t \epsilon$.
  - Compute $\epsilon_c = \epsilon_\theta(x_t, c)$ and $\epsilon_u = \epsilon_\theta(x_t)$.
  - Accumulate $S_c \leftarrow S_c + \epsilon_c$, $S_u \leftarrow S_u + \epsilon_u$, $N \leftarrow N + 1$.
- After traversal, set $\bar\epsilon_c = S_c / N$, $\bar\epsilon_u = S_u / N$, and $\rho(t, c) = \bar\epsilon_c / \bar\epsilon_u$.
This lookup table keeps runtime overhead minimal, as the coefficients are retrieved rather than recomputed during sampling (Xia et al., 24 Oct 2024).
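The single-pass procedure above can be sketched as follows. This is a simplified sketch under assumed interfaces (a `model(x_t, c, t)` callable where `c=None` means unconditional, per-sample averaging into a scalar per $(t, c)$; the toy model in the demo is purely illustrative):

```python
import numpy as np

def build_rho_table(model, data, timesteps, alphas, sigmas, rng):
    """One-pass Monte Carlo estimate of rho(t, c) = E[eps_c] / E[eps_u].

    `data` yields (x0, c) pairs; `alphas`/`sigmas` give the forward
    schedule at each timestep. Interfaces here are assumptions.
    """
    sums = {}                                    # (t, c) -> [S_c, S_u, N]
    for x0, c in data:
        for i, t in enumerate(timesteps):
            eps = rng.standard_normal(np.shape(x0))
            x_t = alphas[i] * x0 + sigmas[i] * eps     # forward diffusion
            e_c = model(x_t, c, t)                      # conditional
            e_u = model(x_t, None, t)                   # unconditional
            acc = sums.setdefault((t, c), [0.0, 0.0, 0])
            acc[0] += float(np.mean(e_c))
            acc[1] += float(np.mean(e_u))
            acc[2] += 1
    return {key: (s_c / n) / (s_u / n) for key, (s_c, s_u, n) in sums.items()}

# Demo with a toy linear "model" (a real model is a neural network).
rng = np.random.default_rng(0)
toy_model = lambda x_t, c, t: x_t - (0.0 if c is None else 0.1)
data = [(np.full(4, 5.0), 1), (np.full(4, 4.0), 1)]
table = build_rho_table(toy_model, data,
                        timesteps=[0.5], alphas=[0.9], sigmas=[0.4], rng=rng)
print(table)
```

At sampling time, a dictionary lookup of `table[(t, c)]` replaces any extra network evaluation, which is why the runtime cost is negligible.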
4. Integration with Diffusion Model Samplers
Most state-of-the-art diffusion samplers (e.g., DDIM, Euler–Maruyama, EDM2, SD3) use the following procedure in each denoising step:
```python
for t in timesteps:
    eps_c = model(x_t, c, t)                 # conditional prediction
    eps_u = model(x_t, t)                    # unconditional prediction
    eps = beta_c * eps_c + beta_u * eps_u    # CFG: beta_c + beta_u = 1
    x_t = sampler_step(x_t, eps, t)
```

Integrating ReCFG requires changing only the combination line, replacing the fixed unconditional weight with the precomputed ratio $\rho(t, c)$:

```python
for t in timesteps:
    eps_c = model(x_t, c, t)
    eps_u = model(x_t, t)
    eps = w * eps_c + (-w * rho(t, c)) * eps_u   # zero-expectation weights
    x_t = sampler_step(x_t, eps, t)
```
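To make the integration concrete, here is a self-contained toy sampler using a deterministic DDIM-style update. The noise predictor, schedule values, and constant `rho` are all stand-ins (a real deployment uses a trained network and the precomputed lookup table):

```python
import numpy as np

T = 8
alpha_bar = np.linspace(0.95, 0.05, T)    # toy cumulative-alpha schedule

def dummy_model(x_t, c, t):
    """Stand-in noise predictor; c=None means unconditional."""
    shift = 0.0 if c is None else 0.05
    return 0.1 * x_t + shift

def rho(t, c):
    """Stand-in for the precomputed lookup table; ~1 in practice."""
    return 0.98

def sample_recfg(x_T, c, w=3.0):
    x_t = x_T
    for i in reversed(range(1, T)):
        a_t, a_prev = alpha_bar[i], alpha_bar[i - 1]
        eps_c = dummy_model(x_t, c, i)
        eps_u = dummy_model(x_t, None, i)
        eps = w * eps_c + (-w * rho(i, c)) * eps_u   # ReCFG combination
        # Deterministic DDIM step: predict x0, then re-noise to step i-1.
        x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x_t = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x_t

out = sample_recfg(np.zeros(4), c=1)
print(out)
```

Because only the guidance-combination line differs from standard CFG, the same change drops into any sampler that consumes a single combined noise estimate per step.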
5. Empirical Performance and Ablation Highlights
Empirical studies show quantifiable gains in both fidelity and conditional faithfulness:
| Model/Dataset | Standard CFG | ReCFG (CFG-RW) | Metric | Change |
|---|---|---|---|---|
| LDM, ImageNet 256×256, 20 steps | FID ≈ 18.9 | FID ≈ 16.9 | FID | ↓ 2.0 |
| EDM2-S, ImageNet 512×512, 63 steps | FID ≈ 5.9 | FID ≈ 4.8 | FID | ↓ 1.1 |
| SD3, CC12M 512×512, 25 steps | CLIP ≈ 0.268, FID ≈ 72.2 | CLIP ≈ 0.270, FID ≈ 71.8 | CLIP, FID | ↑0.002, ↓0.4 |
Ablation reveals:
- Lookup table estimates saturate in performance after ≈300 traversals per condition.
- The mean ratio $\rho(t, c)$ varies minimally across conditions $c$, justifying use of a global or average $\rho(t)$ for open-vocabulary text models with negligible loss (CLIP-Score loss ≤ 0.001).
- Storing pixel-wise $\rho(t, c)$ yields marginal additional benefit over a scalar per $(t, c)$.
A one-dimensional Gaussian toy example demonstrates that standard CFG systematically shifts the mean, while ReCFG recovers the exact mean and reduces variance (Xia et al., 24 Oct 2024).
6. Practical Recommendations
- Guidance strength $w$ mediates the trade-off between fidelity and diversity, with optimal performance at moderate values (up to $\approx 5$) for class-conditional models and higher values for open-vocabulary prompting.
- In practice, $\rho(t, c) \approx 1$, so $\gamma_0 \approx -w$ with minor corrections; thus, ReCFG closely approximates boosting the conditional branch by $w$ while enforcing zero expectation in the combined prediction.
- For high-resolution synthesis, the estimated ratio $\rho(t, c)$ stabilizes as denoising progresses, preserving stability in late denoising steps.
- ReCFG can be rapidly implemented post-hoc for any pretrained conditional diffusion model with negligible computational overhead and consistent performance gains.
7. Significance and Implications
CFG Resolution Weighting enforces the theoretical zero-mean property absent in standard CFG by removing the linear coefficient constraint, aligning the sampling process with the requirements of diffusion SDE/ODE theory. The post-hoc nature and closed-form solution for the rectified coefficients permit integration without retraining or architecture changes. Empirical results indicate systematic improvements in FID and conditional alignment for both class-labeled and open-vocabulary generative tasks. The minimal variation in $\rho(t, c)$ across conditions suggests potential for further optimization in lookup-table storage and runtime efficiency. A plausible implication is that this approach may generalize beyond the diffusion samplers currently demonstrated, offering a template for theoretical corrections to guidance heuristics in other generative domains (Xia et al., 24 Oct 2024).