
HFS-SDEdit Texture Refinement

Updated 3 July 2026
  • The paper introduces HFS-SDEdit, a texture enhancement method that injects high-frequency details during reverse diffusion to decouple fidelity from quality.
  • It leverages score-based diffusion models with iterative denoising and high-frequency swapping, preserving crucial edge and structural details in 3D textures.
  • Empirical results on GSO and LSDIR benchmarks show that HFS-SDEdit achieves state-of-the-art perceptual scores (e.g., MUSIQ of 66.53) and robust texture–geometry alignment.

HFS-SDEdit (High-Frequency-Swapping SDEdit) is a texture enhancement method designed to refine low-quality 3D asset textures in modern score-based diffusion frameworks. Developed as the core of the Elevate3D pipeline, HFS-SDEdit directly addresses the fidelity–quality trade-off inherent in classical diffusion-based editing, providing state-of-the-art refinement capabilities for both 2D images and 3D model textures while preserving crucial high-frequency structural detail (Ryu et al., 15 Jul 2025).

1. Theoretical Foundation: Score-Based Diffusion Models

HFS-SDEdit operates within the score-based diffusion model framework established by Song & Ermon. The generative process is modeled as a continuous-time stochastic differential equation (SDE) transforming a clean image $\mathbf{x}_0$ into Gaussian noise via

$$d\mathbf{x}_t = f(\mathbf{x}_t, t)\,dt + g(t)\,d\mathbf{w}_t,$$

where the drift is $f(\mathbf{x}, t) = -\tfrac12 \beta(t)\mathbf{x}$ and the diffusion coefficient is $g(t) = \sqrt{\beta(t)}$, with $\beta(t)$ the variance schedule and $\mathbf{w}_t$ standard Brownian motion. Sampling is achieved via the reverse-time SDE,

$$d\mathbf{x}_t = \left[f(\mathbf{x}_t, t) - g(t)^2\nabla_\mathbf{x} \log p_t(\mathbf{x}_t)\right] dt + g(t)\, d\bar{\mathbf{w}}_t,$$

with the score $\nabla_\mathbf{x} \log p_t(\mathbf{x})$ approximated by a network $\mathbf{s}_\theta$. In discrete terms (as in DDPM), the denoising update is

$$\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\mathbf{s}_\theta(\mathbf{x}_t, t)\right) + \sigma_t\mathbf{z}.$$

In this discrete form, $\mathbf{s}_\theta$ denotes the network in its noise-prediction parameterization, which equals the score scaled by $-\sqrt{1-\bar\alpha_t}$. Here $\alpha_t = 1 - \beta_t$, $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$ are cumulative-product schedule parameters and $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ (Ryu et al., 15 Jul 2025).
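As a minimal sketch of the discrete update, the toy below applies the forward noising $\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}$ and one reverse step in the standard DDPM form, with a $1/\sqrt{\alpha_t}$ prefactor and a noise prediction in place of the network. The linear $\beta$ schedule, the helper names, and the use of the true noise as an oracle predictor are illustrative assumptions, not details from the paper.

```python
import math
import random

def make_schedule(num_steps, beta_min=1e-4, beta_max=0.02):
    """Linear beta schedule (an assumed choice) with alpha_t and cumulative alpha_bar_t."""
    betas = [beta_min + (beta_max - beta_min) * t / (num_steps - 1) for t in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t given x_0 and also return the Gaussian noise that was used."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * e
          for x, e in zip(x0, eps)]
    return xt, eps

def reverse_step(xt, eps_pred, beta_t, alpha_t, alpha_bar_t, sigma_t, rng):
    """x_{t-1} = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t) + sigma_t * z."""
    coef = beta_t / math.sqrt(1.0 - alpha_bar_t)
    return [(x - coef * e) / math.sqrt(alpha_t) + sigma_t * rng.gauss(0.0, 1.0)
            for x, e in zip(xt, eps_pred)]

rng = random.Random(0)
betas, alphas, alpha_bars = make_schedule(30)
x0 = [0.5, -1.0, 2.0]
# Noise by a single step (index 0), where alpha_bar equals alpha. Feeding the true
# noise as the prediction and setting sigma_t = 0 makes the reverse step invert it.
x1, eps = forward_noise(x0, alpha_bars[0], rng)
x0_rec = reverse_step(x1, eps, betas[0], alphas[0], alpha_bars[0], 0.0, rng)
```

With an exact noise prediction and no added noise, the single reverse step recovers `x0` up to floating-point error, which is a quick sanity check on the coefficients.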

2. Reference-Guided SDEdit and Its Limitations

SDEdit provides a training-free anchoring of the diffusion process to a reference image $\mathbf{x}^{\mathrm{ref}}$. The user chooses a noise level $t_0$, controlling the trade-off between output quality and fidelity to $\mathbf{x}^{\mathrm{ref}}$. The process involves:

  • Forming the noised latent $\mathbf{x}_{t_0} = \sqrt{\bar\alpha_{t_0}}\,\mathbf{x}^{\mathrm{ref}} + \sqrt{1-\bar\alpha_{t_0}}\,\boldsymbol{\epsilon}$, with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
  • Running reverse diffusion from $t = t_0$ down to $t = 0$.

Lower $t_0$ preserves fidelity to $\mathbf{x}^{\mathrm{ref}}$ but does not substantially improve perceptual quality, while higher $t_0$ removes low-frequency domain artifacts but diminishes alignment with the input. This coupling between fidelity and quality is a primary limitation of classical SDEdit (Ryu et al., 15 Jul 2025).
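The coupling can be illustrated with a small SDEdit sketch on a 1-D signal: the reference is noised to a chosen level and then denoised back, and the average deviation from the reference grows with the starting noise level. Everything here is a toy under stated assumptions: the linear schedule, the 0-based step indexing, the function names, and `toy_eps_model` (the exact noise predictor for a standard-normal data prior, standing in for a pretrained network) are illustrative and not taken from the paper.

```python
import math
import random

def linear_schedule(num_steps, beta_min=1e-4, beta_max=0.02):
    """Linear betas and their cumulative products alpha_bar_t (assumed schedule)."""
    betas = [beta_min + (beta_max - beta_min) * t / (num_steps - 1) for t in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars

def toy_eps_model(xt, alpha_bar_t):
    """Exact noise predictor when the data prior is N(0, 1): sqrt(1 - alpha_bar_t) * x_t."""
    return [math.sqrt(1.0 - alpha_bar_t) * x for x in xt]

def sdedit(x_ref, t0, betas, alpha_bars, eps_model, rng):
    """Noise x_ref with t0 steps' worth of noise, then run t0 reverse DDPM steps."""
    if t0 == 0:
        return list(x_ref)  # no noise injected, so the output equals the reference
    abar = alpha_bars[t0 - 1]
    x = [math.sqrt(abar) * r + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0) for r in x_ref]
    for t in range(t0 - 1, -1, -1):
        eps = eps_model(x, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the final step
        x = [(xi - coef * e) / math.sqrt(1.0 - betas[t]) + sigma * rng.gauss(0.0, 1.0)
             for xi, e in zip(x, eps)]
    return x

def mean_sq_dev(x_ref, t0, trials=200, seed=0):
    """Average squared deviation from the reference over repeated SDEdit runs."""
    rng = random.Random(seed)
    betas, alpha_bars = linear_schedule(30)
    total = 0.0
    for _ in range(trials):
        out = sdedit(x_ref, t0, betas, alpha_bars, toy_eps_model, rng)
        total += sum((o - r) ** 2 for o, r in zip(out, x_ref)) / len(x_ref)
    return total / trials

x_ref = [0.5, -1.0, 2.0]
low_noise_dev = mean_sq_dev(x_ref, t0=5)    # stays close to the reference
high_noise_dev = mean_sq_dev(x_ref, t0=25)  # drifts further toward the model prior
```

In this toy, fidelity is measured as mean squared deviation from the reference; it says nothing about perceptual quality, which needs a real pretrained model to assess.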

3. High-Frequency-Swapping Mechanism

HFS-SDEdit introduces high-frequency injection during reverse diffusion to decouple fidelity from perceptual quality enhancement. The core insight is that low-frequency bands encode domain appearance, whereas high-frequency components represent perceptual edge and fine structure details.

The procedure is as follows:

  • Initialization: At the chosen starting noise level $t_0$, generate $\mathbf{x}_{t_0} = \sqrt{\bar\alpha_{t_0}}\,\mathbf{x}^{\mathrm{ref}} + \sqrt{1-\bar\alpha_{t_0}}\,\boldsymbol{\epsilon}$ from the reference image $\mathbf{x}^{\mathrm{ref}}$, with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
  • Iterative Denoising and Swapping (for $t_0 \ge t > t_{\mathrm{stop}}$):

    1. Denoise: $\tilde{\mathbf{x}}_{t-1} = \mathrm{Denoise}(\mathbf{x}_t, t)$ (reverse-DDPM).
    2. Freshly re-noise the reference: $\mathbf{x}^{\mathrm{ref}}_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\mathbf{x}^{\mathrm{ref}} + \sqrt{1-\bar\alpha_{t-1}}\,\boldsymbol{\epsilon}'$, with $\boldsymbol{\epsilon}' \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
    3. High-Frequency Swap: Update latent via

    $$\hat{\mathbf{x}}_{t-1} = h_\sigma * \tilde{\mathbf{x}}_{t-1} + \left(\mathbf{x}^{\mathrm{ref}}_{t-1} - h_\sigma * \mathbf{x}^{\mathrm{ref}}_{t-1}\right),$$

    where $h_\sigma$ is a Gaussian low-pass filter and $*$ denotes convolution; high frequencies from $\mathbf{x}^{\mathrm{ref}}_{t-1}$ (the reference) replace those of $\tilde{\mathbf{x}}_{t-1}$ (the diffused latent).
    4. Mask Blending (optional): For partial refinement, blend via

    $$\mathbf{x}_{t-1} = \mathbf{M} \odot \hat{\mathbf{x}}_{t-1} + (1 - \mathbf{M}) \odot \mathbf{x}^{\mathrm{ref}}_{t-1},$$

    with $\mathbf{M}$ as the refinement mask.
    5. Use $\mathbf{x}_{t-1}$ (equal to $\hat{\mathbf{x}}_{t-1}$ when no mask is applied) as the latent for the next step.

  • Final Unaltered Decoding: For $t \le t_{\mathrm{stop}}$, continue with vanilla reverse diffusion.

This mechanism ensures that the final sample $\mathbf{x}_0$ conforms to the diffusion model distribution in the low frequencies while its high-frequency details (such as edges) stay aligned with the reference input (Ryu et al., 15 Jul 2025).
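The procedure above can be sketched on a 1-D signal as follows. This is a toy under stated assumptions, not the Elevate3D implementation: the low-pass filter is a clamped 1-D Gaussian blur, `toy_eps_model` is the exact noise predictor for a standard-normal prior rather than a pretrained network, the schedule is linear, and the swap is applied for step indices above `t_stop`. The optional mask keeps the re-noised reference where the mask is 0, which is one plausible reading of the blending step.

```python
import math
import random

def linear_schedule(num_steps, beta_min=1e-4, beta_max=0.02):
    """Linear betas and cumulative products alpha_bar_t (assumed schedule)."""
    betas = [beta_min + (beta_max - beta_min) * t / (num_steps - 1) for t in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars

def toy_eps_model(xt, alpha_bar_t):
    """Stand-in for a pretrained network: exact noise predictor for an N(0, 1) prior."""
    return [math.sqrt(1.0 - alpha_bar_t) * x for x in xt]

def gaussian_lowpass(signal, sigma):
    """1-D Gaussian blur with edge clamping; kernel weights are normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    offsets = list(range(-radius, radius + 1))
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    norm = sum(kernel)
    n = len(signal)
    return [sum(w * signal[min(max(i + k, 0), n - 1)] for w, k in zip(kernel, offsets)) / norm
            for i in range(n)]

def hf_swap(denoised, ref_noised, sigma):
    """Low frequencies from the denoised latent, high frequencies from the reference."""
    low_den = gaussian_lowpass(denoised, sigma)
    low_ref = gaussian_lowpass(ref_noised, sigma)
    return [ld + (r - lr) for ld, r, lr in zip(low_den, ref_noised, low_ref)]

def noise_to_level(x, alpha_bar, rng):
    """Forward-noise a clean signal to the level given by alpha_bar."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for v in x]

def hfs_sdedit(x_ref, t0, t_stop, sigma, betas, alpha_bars, eps_model, rng, mask=None):
    """Reverse diffusion from index t0 with high-frequency swapping while t > t_stop."""
    x = noise_to_level(x_ref, alpha_bars[t0], rng)
    for t in range(t0, -1, -1):
        eps = eps_model(x, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        s = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the final step
        x = [(xi - coef * e) / math.sqrt(1.0 - betas[t]) + s * rng.gauss(0.0, 1.0)
             for xi, e in zip(x, eps)]
        if t > max(t_stop, 0):  # swapping phase; the latent is now at level t - 1
            ref_t = noise_to_level(x_ref, alpha_bars[t - 1], rng)  # fresh noise each step
            swapped = hf_swap(x, ref_t, sigma)
            if mask is None:
                x = swapped
            else:
                x = [m * sw + (1.0 - m) * r for m, sw, r in zip(mask, swapped, ref_t)]
    return x

rng = random.Random(0)
betas, alpha_bars = linear_schedule(30)
# Smooth wave plus one sharp edge, as a stand-in for a row of texture pixels.
x_ref = [math.sin(0.4 * i) + (1.0 if i >= 16 else 0.0) for i in range(32)]
refined = hfs_sdedit(x_ref, t0=29, t_stop=18, sigma=4.0, betas=betas,
                     alpha_bars=alpha_bars, eps_model=toy_eps_model, rng=rng)
```

Because the blur is linear, `hf_swap(x, x, sigma)` returns `x` unchanged, which makes the swap easy to unit-test in isolation before plugging in a real denoiser.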

4. Implementation and Hyperparameter Choices

HFS-SDEdit requires no new learning objectives or additional losses; its efficacy is driven by the unmodified, pretrained diffusion model and the parameterization of the swapping mechanism. Mask blending uses hard mixing based on a down-sampled binary mask, not a learned penalty. The main trade-off hyperparameters are $t_0$ (starting noise index), $t_{\mathrm{stop}}$ (swap stop index), and $\sigma$ (Gaussian filter std), permitting flexible adjustment for fidelity/quality optimization.

Key implementation details in Elevate3D (Ryu et al., 15 Jul 2025):

| Parameter | Value | Description |
|---|---|---|
| Backbone | FLUX (UNet w/ self-attention) | Diffusion-model architecture |
| Reverse steps $T$ | 30 | Total denoising steps |
| $t_0$ | 29 | Starting noise index |
| $t_{\mathrm{stop}}$ | 18 | High-frequency swap stop |
| $\sigma$ | 4 | Std for Gaussian filter |
| Mask threshold | 0.5 | For new-pixel detection |
| Geometry camera | Orthographic | For view synthesis and refinement |
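For illustration, the numeric settings listed above (30 reverse steps, starting index 29, swap stop 18, filter std 4, mask threshold 0.5) could be grouped into a small validated configuration object. The class name, field names, and the convention that swapping runs for indices strictly above the stop index are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HFSConfig:
    num_steps: int = 30          # total reverse (denoising) steps
    t_start: int = 29            # starting noise index
    t_stop: int = 18             # high-frequency swapping stops at this index
    blur_sigma: float = 4.0      # std of the Gaussian low-pass filter
    mask_threshold: float = 0.5  # threshold used for new-pixel detection

    def validate(self):
        """Reject index orderings or filter widths that make no sense."""
        if not (0 <= self.t_stop < self.t_start < self.num_steps):
            raise ValueError("expected 0 <= t_stop < t_start < num_steps")
        if self.blur_sigma <= 0:
            raise ValueError("blur_sigma must be positive")

    def swap_steps(self):
        """Step indices with swapping, assuming it runs for t_stop < t <= t_start."""
        return list(range(self.t_start, self.t_stop, -1))

cfg = HFSConfig()
cfg.validate()
num_swap_steps = len(cfg.swap_steps())
```

Under this convention the default values give 11 swapping steps (indices 29 down to 19) followed by vanilla reverse diffusion for the remaining indices.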

5. Texture–Geometry Refinement and Regularization

Following texture refinement, Elevate3D leverages images enhanced by HFS-SDEdit to improve geometry. A normal predictor estimates per-pixel surface orientation, from which a depth map is computed via minimization of a regularized normal-integration energy:

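$$E(z) = \sum_{\mathbf{p}} \left[ \left(n_z\,\partial_u z + n_x\right)^2 + \left(n_z\,\partial_v z + n_y\right)^2 \right] + \lambda \sum_{\mathbf{p}} \left( z(\mathbf{p}) - z^{\mathrm{prior}}(\mathbf{p}) \right)^2,$$

where $(\partial_u z, \partial_v z)$ are surface gradients, $\mathbf{n} = (n_x, n_y, n_z)$ are predicted normals, and $z^{\mathrm{prior}}$ is the prior mesh depth. The regularizer is the second term, weighted by $\lambda$. The geometry refinement aims to match predicted per-view normals while remaining close to the original geometry in $\ell_2$ distance, following a semi-smooth integration approach with bilateral weights (Ryu et al., 15 Jul 2025).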
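A minimal sketch of evaluating an energy of this kind on a small depth grid is shown below. It assumes an orthographic camera, forward differences, a data term $(n_z\,\partial_u z + n_x)^2 + (n_z\,\partial_v z + n_y)^2$, and a squared penalty toward the prior depth weighted by a hypothetical `lam`. Bilateral weights are omitted for brevity, and the function names and sign convention for the normals are illustrative assumptions.

```python
import math

def normal_integration_energy(z, normals, z_prior, lam):
    """Normal-consistency data term plus a squared pull toward the prior depth."""
    h, w = len(z), len(z[0])
    energy = 0.0
    for v in range(h):
        for u in range(w):
            nx, ny, nz = normals[v][u]
            if u + 1 < w:  # forward difference along u
                dz_du = z[v][u + 1] - z[v][u]
                energy += (nz * dz_du + nx) ** 2
            if v + 1 < h:  # forward difference along v
                dz_dv = z[v + 1][u] - z[v][u]
                energy += (nz * dz_dv + ny) ** 2
            energy += lam * (z[v][u] - z_prior[v][u]) ** 2
    return energy

def plane_normal(a, b):
    """Unit normal of the depth plane z = a*u + b*v, oriented so that n_z > 0."""
    norm = math.sqrt(a * a + b * b + 1.0)
    return (-a / norm, -b / norm, 1.0 / norm)

a, b = 0.3, -0.2
plane = [[a * u + b * v for u in range(5)] for v in range(4)]
normals = [[plane_normal(a, b)] * 5 for _ in range(4)]

# A plane with matching normals and an identical prior has (numerically) zero energy.
e_plane = normal_integration_energy(plane, normals, plane, lam=0.1)

# Displacing one depth sample violates both the normal term and the prior term.
bumped = [row[:] for row in plane]
bumped[2][2] += 0.5
e_bumped = normal_integration_energy(bumped, normals, plane, lam=0.1)
```

In practice such an energy is minimized rather than just evaluated; the check above only confirms that a depth map consistent with its normals and its prior sits at the minimum.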

6. Evaluation Metrics and Empirical Performance

On the GSO real-scan benchmark (59 objects with downsampled geometry and heavy texture blur), Elevate3D with HFS-SDEdit establishes new state-of-the-art results in 3D model refinement:

| Metric | Elevate3D | DreamGaussian | MagicBoost | DiSR-NeRF |
|---|---|---|---|---|
| MUSIQ | 66.53 | 61.67 | 51.65 | 48.94 |
| LIQE | 2.77 | 2.12 | 2.11 | 1.29 |
| TOPIQ | 0.53 | 0.47 | 0.39 | 0.39 |
| Q-Align | 3.22 | 2.74 | 2.50 | 2.68 |

In 2D enhancement (LSDIR validation), HFS-SDEdit achieves the highest no-reference scores (MUSIQ 39.52 vs. 29.19 for the best SDEdit result) and a lower, i.e. better, LPIPS (0.598 vs. 0.746), at the cost of some reduction in PSNR/SSIM, reflecting higher perceptual fidelity (Ryu et al., 15 Jul 2025).

Qualitatively, HFS-SDEdit produces images with sharp, accurate edges, preserves crisp structural details, and eliminates artifacts such as blur and scanlines. 3D meshes reconstructed following HFS-SDEdit-refined textures exhibit accurate silhouettes, fine bump details, and tight texture–geometry alignment.

7. Significance and Implications

HFS-SDEdit extends SDEdit’s pure noising and denoising at high noise levels with targeted high-frequency injections from the reference. This modification decouples fidelity from quality, enabling aggressive removal of low-frequency scan noise while anchoring critical high-frequency features (edges, corners, and lettering) to the input.

Embedded within Elevate3D’s alternating texture and geometry refinement pipeline, HFS-SDEdit underpins the state-of-the-art texture quality and texture–geometry consistency reported for 3D asset enhancement (Ryu et al., 15 Jul 2025). A plausible implication is that the high-frequency swapping paradigm can generalize to other structured-editing tasks within diffusion-based restoration or translation, where content fidelity at select scales is paramount.
