ZeResFDG: Enhanced Diffusion Guidance
- ZeResFDG is a modular sampler-level guidance framework that decouples the guidance signal into low- and high-frequency components to selectively enhance global coherence and micro-details.
- It integrates energy rescaling and zero-projection to mitigate guidance artifacts and maintain natural contrast at moderate guidance scales.
- By dynamically switching modes using spectral EMA and incorporating the QSilk Micrograin Stabilizer, ZeResFDG offers improved prompt adherence and refined micro-texture control without retraining.
ZeResFDG is a sampler-level guidance framework designed for Stable Diffusion (SD) and SDXL latent diffusion models. It introduces a modular stack that unifies frequency-decoupled guidance, energy rescaling, and zero-projection, augmented by spectral energy tracking and a training-free inference-time stabilizer for controlling micro-texture. ZeResFDG operates without model retraining and interacts with guidance signals at the sampler stage, improving sharpness, prompt adherence, and artifact mitigation at moderate guidance scales.
1. Frequency-Decoupled Guidance
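The central operational component of ZeResFDG is its frequency-decoupled guidance formulation. Given the raw guidance signal $\Delta = \epsilon_c - \epsilon_u$ (the difference between the conditional and unconditional predictions), ZeResFDG splits this signal into low-frequency and high-frequency bands:
- Low-frequency extraction uses a Gaussian low-pass filter $G_\sigma$: $\Delta_{\text{low}} = G_\sigma * \Delta$
- The high-frequency component is the residual: $\Delta_{\text{high}} = \Delta - \Delta_{\text{low}}$
Each frequency component is independently reweighted; the recombined guidance is
$$\Delta_{\text{FDG}} = \lambda_{\text{low}}\,\Delta_{\text{low}} + \lambda_{\text{high}}\,\Delta_{\text{high}},$$
where $\lambda_{\text{low}}$ is typically chosen for tonal preservation and $\lambda_{\text{high}}$ for micro-detail accentuation.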
This selective reweighting maintains global image coherence while promoting sharp detail. It avoids the common pitfall in classifier-free guidance (CFG) of amplifying coarse structure and fine detail uniformly, which often leads to oversaturation or loss of fidelity.
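The split-and-reweight step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes a circular Gaussian blur applied in the Fourier domain as the low-pass filter, and the function names, the blur width `sigma`, and the default weights (`lam_low=1.0`, `lam_high=1.5`) are hypothetical placeholders.

```python
import numpy as np

def gaussian_lowpass(x, sigma=1.0):
    """Circular Gaussian blur over the last two axes, applied in the Fourier domain."""
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, cycles per sample
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, cycles per sample
    # Fourier transform of a Gaussian kernel with standard deviation `sigma`.
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    spectrum = np.fft.fft2(x, axes=(-2, -1))
    return np.real(np.fft.ifft2(spectrum * transfer, axes=(-2, -1)))

def fdg_delta(eps_cond, eps_uncond, lam_low=1.0, lam_high=1.5, sigma=1.0):
    """Split the guidance signal into two bands and reweight each band."""
    delta = eps_cond - eps_uncond          # raw guidance signal
    low = gaussian_lowpass(delta, sigma)   # low-frequency band
    high = delta - low                     # high-frequency residual
    return lam_low * low + lam_high * high
```

With both weights set to 1 the function returns the raw guidance signal unchanged, which makes it easy to check that the decomposition itself is lossless.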
2. Energy Rescaling of the Guided Prediction
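ZeResFDG addresses artifacts linked to strong guidance scales by incorporating energy rescaling. After frequency reweighting and CFG application,
$$\hat{\epsilon} = \epsilon_u + s\,\Delta_{\text{FDG}},$$
where $\epsilon_c$ and $\epsilon_u$ are the conditional and unconditional predictions, $\Delta_{\text{FDG}}$ is the frequency-reweighted guidance signal, and $s$ is the guidance scale, the guided output is rescaled to match the conditional branch’s per-sample standard deviation:
$$\hat{\epsilon}_{\text{rs}} = \hat{\epsilon}\,\frac{\operatorname{std}(\epsilon_c)}{\operatorname{std}(\hat{\epsilon})}.$$
This output is then blended with the unrescaled guidance:
$$\epsilon_{\text{out}} = \phi\,\hat{\epsilon}_{\text{rs}} + (1-\phi)\,\hat{\epsilon},$$
with $\phi$ typically tuned near 0.5. Energy matching constrains global intensity and helps recover natural textural contrast lost at high CFG scales.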
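The rescale-and-blend step described above can be sketched as below. The sketch assumes batched arrays with the sample index on axis 0; the function name, the argument names, and the small constant `eps` guarding the division are illustrative assumptions rather than details from the source.

```python
import numpy as np

def rescale_guided(eps_cond, eps_uncond, delta, scale=5.0, phi=0.5, eps=1e-8):
    """Apply CFG, match the conditional branch's per-sample std, then blend."""
    guided = eps_uncond + scale * delta            # plain CFG-style combination
    axes = tuple(range(1, guided.ndim))            # every axis except the batch axis
    std_cond = eps_cond.std(axis=axes, keepdims=True)
    std_guided = guided.std(axis=axes, keepdims=True)
    rescaled = guided * (std_cond / (std_guided + eps))
    # phi = 1 keeps only the rescaled output, phi = 0 keeps only the raw guided output.
    return phi * rescaled + (1.0 - phi) * guided
```

Setting `phi=0.5` corresponds to the blend weight the text describes as typical; the default `scale` is only a placeholder.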
3. Zero-Projection to Suppress Unconditional Leakage
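Zero-projection removes guided content parallel to the unconditional direction, mitigating drift and tone artifacts. With $\epsilon_c$ and $\epsilon_u$ denoting the conditional and unconditional predictions, the procedure computes the projection coefficient
$$\alpha = \frac{\langle \epsilon_c, \epsilon_u \rangle}{\lVert \epsilon_u \rVert^2}$$
and forms the orthogonalized residual:
$$\Delta_\perp = \epsilon_c - \alpha\,\epsilon_u.$$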
This residual, optionally passed through the frequency-decoupling pipeline, is substituted as the effective guidance signal. The approach suppresses artifacts arising from guidance leaking into dominant unconditional eigendirections, a common failure in early denoising steps.
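A minimal sketch of the projection step, assuming the inner product is taken per sample over all non-batch dimensions. The function name and the stabilizing constant `eps` in the denominator are assumptions added for illustration.

```python
import numpy as np

def zero_project(eps_cond, eps_uncond, eps=1e-8):
    """Remove the component of eps_cond that is parallel to eps_uncond, per sample."""
    batch = eps_cond.shape[0]
    c = eps_cond.reshape(batch, -1)
    u = eps_uncond.reshape(batch, -1)
    # Projection coefficient: <c, u> / ||u||^2, one scalar per sample.
    alpha = (c * u).sum(axis=1) / ((u * u).sum(axis=1) + eps)
    alpha = alpha.reshape((batch,) + (1,) * (eps_cond.ndim - 1))
    residual = eps_cond - alpha * eps_uncond       # orthogonal to eps_uncond
    return residual, alpha
```

The returned residual has (numerically) zero inner product with the unconditional prediction, so whatever guidance is built from it carries no component along that direction.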
4. Spectral EMA and Mode Switching with Hysteresis
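ZeResFDG adapts operational modes according to the spectral energy profile of the guidance signal. It tracks the high-frequency ratio:
$$r_{\text{HF}} = \frac{\lVert \Delta_{\text{high}} \rVert^2}{\lVert \Delta_{\text{low}} \rVert^2 + \lVert \Delta_{\text{high}} \rVert^2},$$
where $\Delta_{\text{low}}$ and $\Delta_{\text{high}}$ are the low- and high-frequency bands of the guidance signal.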
This ratio is updated as an exponential moving average (EMA), denoted $\bar{r}_{\text{HF}}$. The system switches between two modes governed by hysteresis thresholds $\tau_{\text{lo}} < \tau_{\text{hi}}$:
- Conservative mode (“CFGZeroFD”): Activated when $\bar{r}_{\text{HF}} < \tau_{\text{lo}}$, emphasizing zero-projection and suppressing over-dominant unconditional artifacts in early structure formation.
- Detail-seeking mode (“RescaleFDG”): Activated when $\bar{r}_{\text{HF}} > \tau_{\text{hi}}$, switching to energy-rescaled, frequency-enhanced guidance as micro-detail emerges.
Hysteresis keeps mode transitions stable: while the EMA sits between the two thresholds the current mode is retained, which avoids rapid oscillation when spectral activity fluctuates. This adaptive switching improves prompt adherence and prevents unwanted color or structure drift.
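The EMA-plus-hysteresis logic can be sketched as a small state machine. The class name, the threshold values (`tau_lo=0.45`, `tau_hi=0.60`), the EMA decay `beta`, and the choice of starting in conservative mode are all hypothetical; only the two mode names come from the text.

```python
class ModeSwitcher:
    """Track an EMA of the high-frequency ratio and switch modes with hysteresis."""

    def __init__(self, tau_lo=0.45, tau_hi=0.60, beta=0.8, mode="CFGZeroFD"):
        assert tau_lo < tau_hi, "hysteresis needs a gap between the thresholds"
        self.tau_lo, self.tau_hi, self.beta = tau_lo, tau_hi, beta
        self.mode = mode    # assumed starting mode: conservative
        self.ema = None     # no observation yet

    def update(self, ratio):
        """Fold one new ratio into the EMA and return the (possibly new) mode."""
        if self.ema is None:
            self.ema = ratio
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * ratio
        if self.mode == "CFGZeroFD" and self.ema > self.tau_hi:
            self.mode = "RescaleFDG"    # enough high-frequency energy: seek detail
        elif self.mode == "RescaleFDG" and self.ema < self.tau_lo:
            self.mode = "CFGZeroFD"     # energy fell back: return to conservative
        return self.mode
```

A sampler loop would call `update` once per denoising step with that step's ratio and dispatch on the returned mode; values between the two thresholds never trigger a switch, which is what suppresses flicker.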
5. QSilk Micrograin Stabilizer
To further enhance output fidelity and robustness, ZeResFDG integrates the QSilk Micrograin Stabilizer at inference time. The stabilizer has two parts:
- Per-Step Quantile Clamp (QClamp): At every denoising iteration, the tensor is clamped within per-sample quantile bounds. This mitigates activation “spikes” that can induce NaN/Inf propagation, improving numerical stability.
- Late-Tail Micro-Detail Injection: In late diffusion steps (low noise level), a residual $r$ is injected into the output, gated by edge and depth maps:
$$x \leftarrow x + \alpha_t\,\bigl(g_{\text{edge}} \odot m_{\text{depth}} \odot r\bigr),$$
where $\alpha_t$ is a late-phase ramp-up factor, $g_{\text{edge}}$ is a Sobel-inverse edge gate, and $m_{\text{depth}}$ is a near-foreground depth mask. This selectively boosts micro-texture in perceptually salient regions, yielding natural pore, fuzz, and surface texture at high resolutions.
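Both stabilizer parts can be sketched in a few lines of NumPy. The quantile levels (`q_lo=0.01`, `q_hi=0.99`), the function names, and the assumption that the edge gate and depth mask arrive as precomputed arrays in [0, 1] are illustrative choices, not details from the source; how the gates and the residual are actually computed is left out.

```python
import numpy as np

def qclamp(x, q_lo=0.01, q_hi=0.99):
    """Clamp each sample of a batch to its own [q_lo, q_hi] quantile range."""
    batch = x.shape[0]
    flat = x.reshape(batch, -1)
    shape = (batch,) + (1,) * (x.ndim - 1)
    lo = np.quantile(flat, q_lo, axis=1).reshape(shape)
    hi = np.quantile(flat, q_hi, axis=1).reshape(shape)
    return np.clip(x, lo, hi)

def inject_detail(x, residual, ramp, edge_gate, depth_mask):
    """Add a micro-detail residual, scaled by a ramp factor and two spatial gates."""
    return x + ramp * edge_gate * depth_mask * residual
```

Because the bounds are computed per sample, one outlier image in a batch does not tighten or loosen the clamp applied to the others.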
6. Compatibility with Alternative Parameterizations
ZeResFDG operates primarily in SD/SDXL’s $\epsilon$-parameterization but is architecturally agnostic. The appendix notes direct transferability to the “velocity” parameterization by substituting $\epsilon$ with $v$ throughout all operations, e.g.,
$$\Delta = v_c - v_u.$$
All frequency splitting, rescaling, zero-projection, and spectral EMA/hysteresis procedures are identical. This parameterization-agnostic design allows integration into a diverse set of latent diffusion pipelines without retraining or significant adaptation.
Summary
ZeResFDG is a modular sampler-level guidance framework that introduces frequency-decoupled guidance, energy rescaling, and zero-projection to SD/SDXL latent diffusion models. By tracking dynamic spectral energy ratios and switching modes adaptively, it achieves improved sharpness and adherence to conditioning prompts while suppressing CFG-induced artifacts. The complementary QSilk Micrograin Stabilizer further ensures robustness and natural micro-texture at inference-time. All methods are inference-only and extend naturally to alternative parameterizations. The integration of ZeResFDG into diffusion pipelines presents an effective means of enhancing visual fidelity and prompt control without altering model weights or retraining (Rychkovskiy et al., 14 Oct 2025).