- The paper introduces FreeEnhance, a novel tuning-free framework leveraging latent diffusion models for image enhancement through a content-consistent noising and denoising process.
- FreeEnhance employs a two-stage method involving frequency-aware noising and constrained denoising guided by objectives for image acutance, noise distribution, and adversarial regularization.
- Experiments show FreeEnhance outperforms state-of-the-art models and commercial solutions like Magnific AI in quantitative metrics and human preference on datasets like HPDv2, demonstrating its effectiveness and generalization.
The paper introduces FreeEnhance, a framework for content-consistent image enhancement built on pre-trained image diffusion models. It addresses the challenge of enriching image details while preserving the key content of the original image, using a two-stage noising-and-denoising process in a Latent Diffusion Model (LDM).
The noising stage adds lighter noise to high-frequency regions, preserving patterns such as edges and corners, and heavier noise to low-frequency regions so that new details can be introduced there. Concretely, DDIM inversion is used to apply light, content-preserving noise to high-frequency regions, while random noise of higher intensity is injected into low-frequency regions.
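A minimal PyTorch sketch of this frequency-aware noising is given below. The Laplacian high-pass filter, the `hf_threshold` cutoff, and the `ddim_invert` helper are illustrative assumptions, not the paper's exact implementation; the paper may measure frequency and assign noise intensities differently.

```python
import torch
import torch.nn.functional as F


def frequency_aware_noising(z0, ddim_invert, t, alpha_bar_t, hf_threshold=0.1):
    """Noise a clean latent z0 (B, C, H, W) up to timestep t: light noise for
    high-frequency regions (via DDIM inversion), heavy random noise elsewhere."""
    # 1. Estimate per-pixel frequency with a Laplacian high-pass filter (assumed measure).
    c = z0.shape[1]
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]], device=z0.device)
    lap = lap.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    high_freq = F.conv2d(z0, lap, padding=1, groups=c).abs().mean(dim=1, keepdim=True)
    hf_mask = (high_freq > hf_threshold).float()          # 1 = edges/corners, 0 = flat regions

    # 2. Light, content-preserving noise for high-frequency regions via DDIM inversion.
    z_t_light = ddim_invert(z0, t)

    # 3. Heavier random noise for low-frequency regions via the standard forward process.
    eps = torch.randn_like(z0)
    z_t_heavy = (alpha_bar_t ** 0.5) * z0 + ((1.0 - alpha_bar_t) ** 0.5) * eps

    # 4. Blend the two noised latents according to the frequency mask.
    return hf_mask * z_t_light + (1.0 - hf_mask) * z_t_heavy
```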
In the denoising stage, three target properties act as constraints that regularize the predicted noise, steering the result toward high acutance and high visual quality. These constraints (sketched in code after the list) are:
- Image Acutance: Enhances edge contrast, making images appear sharper. The objective function is:
$$\mathcal{L}_{acu} = -\frac{1}{HW}\sum_{i=0,\,j=0}^{H,\,W} \mathbb{V}\big(F_{acu}(\hat{x}_{t\to 0})_{(i,j)}\big)\, F_{acu}(\hat{x}_{t\to 0})_{(i,j)}$$
where:
  - $\mathcal{L}_{acu}$ is the acutance loss.
  - $H, W$ denote the spatial size of the noisy image.
  - $(i, j)$ are the indices of the spatial element.
  - $F_{acu}$ is the Sobel operator.
  - $\hat{x}_{t\to 0}$ is the intermediate reconstruction of $x_0$ at timestep $t$.
  - $\mathbb{V}(\cdot)$ is a binary indicator function.
- Noise Distribution: Addresses the generalization error whereby the predicted noise may deviate from a standard Gaussian distribution; the objective pushes the variance of the predicted noise toward 1:
$$\mathcal{L}_{dist} = \big\|\, 1 - F_{var}\big(\epsilon_\theta(x_t; t, y)\big) \,\big\|_2$$
where:
  - $\mathcal{L}_{dist}$ is the distribution loss.
  - $\epsilon_\theta(x_t; t, y)$ is the noise predicted by the diffusion model.
  - $F_{var}$ computes the variance of the predicted noise.
- Adversarial Regularization: Prevents blurred images by incorporating a Gaussian blur function:
$$\mathcal{L}_{adv} = \big\|\, \hat{x}_{t\to 0} - F_{blur}(\hat{x}_{t\to 0}) \,\big\|_2$$
where:
  - $\mathcal{L}_{adv}$ is the adversarial loss.
  - $\hat{x}_{t\to 0}$ is the intermediate reconstruction of $x_0$ at timestep $t$.
  - $F_{blur}$ is a Gaussian blur function.
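The three constraints can be expressed as differentiable losses. The PyTorch sketch below is a minimal illustration, assuming the binary indicator $\mathbb{V}(\cdot)$ is a fixed threshold on the Sobel response and using arbitrary blur parameters; the paper's exact choices of threshold, kernel size, and $\sigma$ are not specified here.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF


def sobel_magnitude(x):
    """Per-pixel gradient magnitude via the Sobel operator (plays the role of F_acu)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    ky = kx.t()
    c = x.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(x, kx, padding=1, groups=c)
    gy = F.conv2d(x, ky, padding=1, groups=c)
    return (gx ** 2 + gy ** 2).sqrt().mean(dim=1)          # (B, H, W)


def acutance_loss(x0_hat, edge_threshold=0.1):
    """L_acu: negative mean Sobel response over pixels selected by the indicator V,
    assumed here to be a simple threshold on the edge response."""
    g = sobel_magnitude(x0_hat)
    v = (g > edge_threshold).float()                       # V(.): 1 on edge-like pixels
    return -(v * g).sum(dim=(-2, -1)).mean() / (g.shape[-2] * g.shape[-1])


def distribution_loss(eps_pred):
    """L_dist: penalize deviation of the predicted-noise variance from 1."""
    return (1.0 - eps_pred.var()).abs()


def adversarial_loss(x0_hat, kernel_size=9, sigma=2.0):
    """L_adv: distance between the reconstruction and its Gaussian-blurred copy
    (sign convention follows the formula above)."""
    blurred = TF.gaussian_blur(x0_hat, kernel_size=kernel_size, sigma=sigma)
    return torch.linalg.vector_norm(x0_hat - blurred)
```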
The sampling result $x_{t-1}$ of each denoising step is then replaced by $x_{t-1}^{*}$:
$$x_{t-1}^{*} = x_{t-1} - \rho_{acu}\,\nabla_{x_t}\mathcal{L}_{acu} - \rho_{dist}\,\nabla_{x_t}\mathcal{L}_{dist} - \rho_{adv}\,\nabla_{x_t}\mathcal{L}_{adv}$$
where $\rho_{acu} = 4$, $\rho_{dist} = 20$, and $\rho_{adv} = 0.3$ are tradeoff parameters determined through experimental studies.
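The sketch below illustrates one such constrained denoising step, reusing the loss functions from the previous sketch. `scheduler`, `unet`, and `decode` are placeholders for a diffusers-style DDIM scheduler, the noise-prediction network, and the latent-to-image decoder; because gradients are linear, the three weighted gradients are folded into a single `autograd.grad` call, which is equivalent to computing them separately.

```python
import torch

RHO_ACU, RHO_DIST, RHO_ADV = 4.0, 20.0, 0.3               # tradeoff parameters from the paper


def guided_denoising_step(scheduler, unet, decode, x_t, t, cond):
    """One constrained denoising step (sketch, not the paper's exact code)."""
    x_t = x_t.detach().requires_grad_(True)

    eps = unet(x_t, t, cond)                               # eps_theta(x_t; t, y)
    out = scheduler.step(eps, t, x_t)
    x_prev = out.prev_sample                               # plain DDIM sample x_{t-1}
    x0_hat = decode(out.pred_original_sample)              # intermediate reconstruction x_{t->0}

    # Weighted sum of the three constraints; its gradient w.r.t. x_t equals
    # rho_acu * grad(L_acu) + rho_dist * grad(L_dist) + rho_adv * grad(L_adv).
    loss = (RHO_ACU * acutance_loss(x0_hat)
            + RHO_DIST * distribution_loss(eps)
            + RHO_ADV * adversarial_loss(x0_hat))
    grad = torch.autograd.grad(loss, x_t)[0]

    # x*_{t-1} = x_{t-1} - rho_acu * grad L_acu - rho_dist * grad L_dist - rho_adv * grad L_adv
    return (x_prev - grad).detach()
```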
Experiments on the HPDv2 dataset demonstrate that FreeEnhance outperforms state-of-the-art image enhancement models in quantitative metrics and human preference, even surpassing the commercial solution Magnific AI. On the Human Preference Score v2 (HPSv2) metric, FreeEnhance achieves a score of 29.32 without any parameter tuning.
Ablation studies validate the contribution of each component of FreeEnhance, including the noising stage and the three denoising constraints. The framework also generalizes to text-to-image generation and natural image enhancement: when tested in the text-to-image generation scenario with Stable Diffusion 1.5, FreeEnhance achieved the highest HPSv2 score of 25.26.