HFS-SDEdit Texture Refinement
- The paper introduces HFS-SDEdit, a texture enhancement method that injects high-frequency details during reverse diffusion to decouple fidelity from quality.
- It leverages score-based diffusion models with iterative denoising and high-frequency swapping, preserving crucial edge and structural details in 3D textures.
- Empirical results on GSO and LSDIR benchmarks show that HFS-SDEdit achieves state-of-the-art perceptual scores (e.g., MUSIQ of 66.53) and robust texture–geometry alignment.
HFS-SDEdit (High-Frequency-Swapping SDEdit) is a texture enhancement method designed to refine low-quality 3D asset textures in modern score-based diffusion frameworks. Developed as the core of the Elevate3D pipeline, HFS-SDEdit directly addresses the fidelity–quality trade-off inherent in classical diffusion-based editing, providing state-of-the-art refinement capabilities for both 2D images and 3D model textures while preserving crucial high-frequency structural detail (Ryu et al., 15 Jul 2025).
1. Theoretical Foundation: Score-Based Diffusion Models
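HFS-SDEdit operates within the score-based diffusion model framework established by Song & Ermon. The generative process is modeled as a continuous-time stochastic differential equation (SDE) transforming a clean image $x_0$ into Gaussian noise via

$$dx = f(x, t)\,dt + g(t)\,dw,$$

where the drift is $f(x, t) = -\tfrac{1}{2}\beta(t)\,x$ and the diffusion coefficient is $g(t) = \sqrt{\beta(t)}$, with $\beta(t)$ as the variance schedule and $w$ standard Brownian motion. Sampling is achieved via the reverse-time SDE,

$$dx = \left[f(x, t) - g(t)^2\,\nabla_x \log p_t(x)\right]dt + g(t)\,d\bar{w},$$

with the score $\nabla_x \log p_t(x)$ approximated by a network $s_\theta(x, t)$. In discrete terms (as in DDPM), the denoising update is

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z,$$

where $\epsilon_\theta$ is the noise-prediction parameterization of the score network, $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$ are cumulative-product schedule parameters, and $z \sim \mathcal{N}(0, I)$ (Ryu et al., 15 Jul 2025).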
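The discrete formulation can be illustrated with a minimal NumPy sketch of forward noising and a single ancestral DDPM step. The linear schedule, the choice $\sigma_t^2 = \beta_t$, the 0-based step index, and the function names are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice) and its cumulative products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def ddpm_reverse_step(x_t, t, eps_pred, betas, alphas, alpha_bars, rng):
    """One ancestral step from index t to t-1, given a noise prediction eps_pred."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added on the final step
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z  # sigma_t^2 = beta_t (assumed variance choice)
```

In practice `eps_pred` would come from a pretrained network evaluated at `(x_t, t)`; here it is left as an input so the sketch stays model-agnostic.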
2. Reference-Guided SDEdit and Its Limitations
SDEdit provides a training-free anchoring of the diffusion process to a reference image $x_{\mathrm{ref}}$. The user chooses a noise level $t_0$, controlling the trade-off between output quality and fidelity to $x_{\mathrm{ref}}$. The process involves:
- Forming the noised latent $x_{t_0} = \sqrt{\bar{\alpha}_{t_0}}\,x_{\mathrm{ref}} + \sqrt{1 - \bar{\alpha}_{t_0}}\,\epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$.
- Running reverse diffusion from $t = t_0$ down to $t = 0$.
Lower $t_0$ preserves fidelity to $x_{\mathrm{ref}}$ but does not substantially improve perceptual quality, while higher $t_0$ removes low-frequency domain artifacts but diminishes alignment with the input. This coupling between fidelity and quality is a primary limitation of classical SDEdit (Ryu et al., 15 Jul 2025).
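To make the coupling concrete, the short sketch below prints how much of the reference survives in the noised latent for a few starting indices. It assumes a linear 1000-step schedule purely for illustration; the helper name `reference_weight` is hypothetical.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule, not from the paper
alpha_bars = np.cumprod(1.0 - betas)

def reference_weight(t0):
    """Coefficient multiplying the reference in the noised latent at index t0."""
    return float(np.sqrt(alpha_bars[t0]))

for t0 in (100, 500, 900):
    w = reference_weight(t0)
    noise_std = np.sqrt(1.0 - w * w)
    print(f"t0={t0}: reference weight={w:.3f}, noise std={noise_std:.3f}")
```

The reference weight shrinks monotonically as the starting index grows, which is why a large starting noise level gives the model freedom to fix artifacts but leaves little of the input to stay faithful to.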
3. High-Frequency-Swapping Mechanism
HFS-SDEdit introduces high-frequency injection during reverse diffusion to decouple fidelity from perceptual quality enhancement. The core insight is that low-frequency bands encode domain appearance, whereas high-frequency components represent perceptual edge and fine structure details.
The procedure is as follows:
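- Initialization: At the chosen $t_0$, generate $x_{t_0} = \sqrt{\bar{\alpha}_{t_0}}\,x_{\mathrm{ref}} + \sqrt{1 - \bar{\alpha}_{t_0}}\,\epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$.
- Iterative Denoising and Swapping (for $t = t_0, \dots, t_{\mathrm{stop}}$):
  1. Denoise: $\hat{x}_{t-1} = \mathrm{Denoise}(x_t, t)$ (reverse-DDPM).
  2. Freshly re-noise the reference: $x^{\mathrm{ref}}_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,x_{\mathrm{ref}} + \sqrt{1 - \bar{\alpha}_{t-1}}\,\epsilon'$, with $\epsilon' \sim \mathcal{N}(0, I)$.
  3. High-Frequency Swap: Update latent via
     $$\tilde{x}_{t-1} = G_\sigma * \hat{x}_{t-1} + \left(x^{\mathrm{ref}}_{t-1} - G_\sigma * x^{\mathrm{ref}}_{t-1}\right),$$
     where $G_\sigma$ is a Gaussian low-pass filter and $*$ denotes convolution; high frequencies from $x^{\mathrm{ref}}_{t-1}$ (the reference) replace those of $\hat{x}_{t-1}$ (the diffused latent).
  4. Mask Blending (optional): For partial refinement, blend via
     $$x_{t-1} = M \odot \tilde{x}_{t-1} + (1 - M) \odot x^{\mathrm{ref}}_{t-1},$$
     with $M$ as the refinement mask.
  5. Use $x_{t-1}$ as the latent for the next step (without a mask, $x_{t-1} = \tilde{x}_{t-1}$).
- Final Unaltered Decoding: For $t < t_{\mathrm{stop}}$, continue with vanilla reverse diffusion.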
This mechanism ensures that the final sample $x_0$ follows the diffusion model's distribution in the low frequencies, while its high-frequency details (such as edges) stay closely aligned with the reference input (Ryu et al., 15 Jul 2025).
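The swapping loop can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: `denoise_step` is a hypothetical callable wrapping one reverse step of a pretrained model, `alpha_bars[0]` is taken to be 1 (no noise), the blur uses edge padding, and the direction of the mask blend (refined content inside the mask, re-noised reference outside) is assumed.

```python
import numpy as np

def gaussian_kernel_1d(sigma, truncate=3.0):
    """Normalized 1D Gaussian kernel with radius of about truncate * sigma."""
    radius = max(1, int(truncate * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def low_pass(img, sigma):
    """Separable Gaussian blur over the last two axes, with edge padding."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    out = np.asarray(img, dtype=float)
    for axis in (-2, -1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), axis, padded)
    return out

def high_frequency_swap(x_denoised, x_ref_noised, sigma):
    """Low frequencies from the denoised latent, high frequencies from the reference."""
    return low_pass(x_denoised, sigma) + (x_ref_noised - low_pass(x_ref_noised, sigma))

def hfs_sdedit(x_ref, denoise_step, alpha_bars, t0, t_stop, sigma, rng, mask=None):
    """Run the swap loop; denoise_step(x_t, t) -> x_{t-1} is a hypothetical model wrapper."""
    def noised_ref(t):
        eps = rng.standard_normal(x_ref.shape)  # fresh noise on every call
        return np.sqrt(alpha_bars[t]) * x_ref + np.sqrt(1.0 - alpha_bars[t]) * eps

    x = noised_ref(t0)
    for t in range(t0, 0, -1):
        x = denoise_step(x, t)
        if t >= t_stop:  # swapping phase; later steps are plain reverse diffusion
            ref = noised_ref(t - 1)
            swapped = high_frequency_swap(x, ref, sigma)
            # Assumed blend: refined content inside the mask, noised reference outside.
            x = swapped if mask is None else mask * swapped + (1.0 - mask) * ref
    return x
```

Two properties of `high_frequency_swap` follow directly from linearity of the filter: swapping a latent with itself returns it unchanged, and a constant reference contributes no high frequencies, so the result reduces to the blurred latent.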
4. Implementation and Hyperparameter Choices
HFS-SDEdit requires no new learning objectives or additional losses; its efficacy is driven by the unmodified, pretrained diffusion model and the parameterization of the swapping mechanism. Mask blending uses hard mixing based on a down-sampled binary mask, not a learned penalty. The main trade-off hyperparameters are $t_0$ (starting noise index), $t_{\mathrm{stop}}$ (swap stop index), and $\sigma$ (Gaussian filter std), permitting flexible adjustment for fidelity/quality optimization.
Key implementation details in Elevate3D (Ryu et al., 15 Jul 2025):
| Parameter | Value | Description |
|---|---|---|
| Backbone | FLUX (pretrained rectified-flow transformer) | Diffusion-model architecture |
| Reverse steps $T$ | 30 | Total denoising steps |
| $t_0$ | 29 | Starting noise index |
| $t_{\mathrm{stop}}$ | 18 | High-frequency swap stop |
| $\sigma$ | 4 | Std for Gaussian filter |
| Mask threshold | 0.5 | For new-pixel detection |
| Geometry camera | Orthographic | For view synthesis and refinement |
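For reference, the tabulated values can be gathered into a small configuration object. The class and field names below are hypothetical, and treating the swap range as inclusive of both the start and stop indices is an assumption; only the numeric defaults come from the table.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class HFSConfig:
    """Hypothetical container for the hyperparameters listed in the table."""
    num_steps: int = 30          # total reverse steps
    t0: int = 29                 # starting noise index
    t_stop: int = 18             # last index at which high frequencies are swapped
    sigma: float = 4.0           # Gaussian filter std
    mask_threshold: float = 0.5  # threshold for new-pixel detection

    def __post_init__(self):
        if not 0 <= self.t_stop <= self.t0 < self.num_steps:
            raise ValueError("expected 0 <= t_stop <= t0 < num_steps")
        if self.sigma <= 0:
            raise ValueError("sigma must be positive")

    def swap_steps(self):
        """Indices at which the swap is applied (assumed inclusive range)."""
        return list(range(self.t0, self.t_stop - 1, -1))

def binarize_mask(soft_mask, cfg):
    """Hard binary mask from a soft or down-sampled mask, using the threshold."""
    return (np.asarray(soft_mask) > cfg.mask_threshold).astype(float)
```

Validating the ordering of the indices at construction time catches the most likely misconfiguration, a stop index above the starting index, before any model call is made.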
5. Texture–Geometry Refinement and Regularization
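Following texture refinement, Elevate3D leverages images enhanced by HFS-SDEdit to improve geometry. A normal predictor estimates per-pixel surface orientation, from which a depth map $z$ is computed via minimization of a regularized normal-integration energy:

$$E(z) = \sum_{p} \left[\left(n_z\,\partial_u z + n_x\right)^2 + \left(n_z\,\partial_v z + n_y\right)^2\right] + \lambda \sum_{p} \left(z - z_{\mathrm{prior}}\right)^2,$$

where $(\partial_u z, \partial_v z)$ are surface gradients, $n = (n_x, n_y, n_z)$ are predicted normals, and $z_{\mathrm{prior}}$ is the prior mesh depth. The regularizer weight $\lambda$ controls how strongly the solution is tied to the prior depth. The geometry refinement aims to match predicted per-view normals while remaining close to the original geometry in $\ell_2$ distance, following a semi-smooth integration approach with bilateral weights (Ryu et al., 15 Jul 2025).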
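A regularized normal integration of this kind can be sketched as a linear least-squares problem. The sketch below is a simplification under stated assumptions: it uses plain forward differences without the bilateral weights, assumes an orthographic camera with normals proportional to $(-\partial_u z, -\partial_v z, 1)$, and builds dense matrices that are only practical for small grids. The function name and the default weight `lam` are hypothetical.

```python
import numpy as np

def integrate_normals(normals, z_prior, lam=1e-2):
    """Depth (H, W) from normals (H, W, 3), softly tied to a prior depth map."""
    H, W = z_prior.shape
    n_pix = H * W
    idx = np.arange(n_pix).reshape(H, W)
    rows, rhs = [], []

    def add_gradient_rows(a, b, n_z, n_t):
        # Residual per neighboring pixel pair: n_z * (z[b] - z[a]) + n_t.
        A = np.zeros((a.size, n_pix))
        r = np.arange(a.size)
        A[r, b.ravel()] = n_z.ravel()
        A[r, a.ravel()] = -n_z.ravel()
        rows.append(A)
        rhs.append(-n_t.ravel())

    # Horizontal (u) differences pair with n_x; vertical (v) differences with n_y.
    add_gradient_rows(idx[:, :-1], idx[:, 1:], normals[:, :-1, 2], normals[:, :-1, 0])
    add_gradient_rows(idx[:-1, :], idx[1:, :], normals[:-1, :, 2], normals[:-1, :, 1])

    # Prior term: sqrt(lam) * (z - z_prior), which also fixes the free depth offset.
    rows.append(np.sqrt(lam) * np.eye(n_pix))
    rhs.append(np.sqrt(lam) * z_prior.ravel())

    A = np.vstack(rows)
    b = np.concatenate(rhs)
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z.reshape(H, W)
```

Normals alone determine depth only up to a constant offset, so the prior term does double duty: it keeps the result near the original mesh and makes the system uniquely solvable. For realistic resolutions a sparse solver would replace the dense `lstsq` call.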
6. Evaluation Metrics and Empirical Performance
On the GSO real-scan benchmark (59 objects; 0 geometry downsampling, heavy texture blur), Elevate3D with HFS-SDEdit establishes new state-of-the-art results in 3D model refinement:
| Metric | Elevate3D | DreamGaussian | MagicBoost | DiSR-NeRF |
|---|---|---|---|---|
| MUSIQ | 66.53 | 61.67 | 51.65 | 48.94 |
| LIQE | 2.77 | 2.12 | 2.11 | 1.29 |
| TOPIQ | 0.53 | 0.47 | 0.39 | 0.39 |
| Q-Align | 3.22 | 2.74 | 2.50 | 2.68 |
In 2D enhancement (LSDIR validation), HFS-SDEdit achieves the highest no-reference scores (MUSIQ 39.52 vs. 29.19 for the best SDEdit setting) and a lower, i.e. better, LPIPS (0.598 vs. 0.746), at the cost of some reduction in PSNR/SSIM; the method favors perceptual quality over pixel-wise reconstruction metrics (Ryu et al., 15 Jul 2025).
Qualitatively, HFS-SDEdit produces images with sharp, accurate edges, preserves crisp structural details, and eliminates artifacts such as blur and scanlines. 3D meshes reconstructed following HFS-SDEdit-refined textures exhibit accurate silhouettes, fine bump details, and tight texture–geometry alignment.
7. Significance and Implications
HFS-SDEdit augments SDEdit's noising and denoising at high noise levels with targeted high-frequency injections from the reference. This modification largely decouples fidelity from quality, enabling aggressive removal of low-frequency scan noise while anchoring critical high-frequency features (edges, corners, and lettering) to the input.
Embedded within Elevate3D's alternating texture and geometry refinement pipeline, HFS-SDEdit underpins the state-of-the-art texture quality and texture–geometry consistency reported for 3D asset enhancement on GSO (Ryu et al., 15 Jul 2025). A plausible implication is that the high-frequency swapping paradigm can generalize to other structured-editing tasks within diffusion-based restoration or translation, where content fidelity at select scales is paramount.