- The paper introduces FreeEnhance, a novel tuning-free framework leveraging latent diffusion models for image enhancement through a content-consistent noising and denoising process.
- FreeEnhance employs a two-stage method involving frequency-aware noising and constrained denoising guided by objectives for image acutance, noise distribution, and adversarial regularization.
- Experiments show FreeEnhance outperforms state-of-the-art models and commercial solutions like Magnific AI in quantitative metrics and human preference on datasets like HPDv2, demonstrating its effectiveness and generalization.
The paper introduces FreeEnhance, a framework for content-consistent image enhancement built on pre-trained image diffusion models. It addresses the challenge of enriching image details while preserving the key content of the original image, using a two-stage noising-and-denoising process in a Latent Diffusion Model (LDM).
The noising stage adds lighter noise to high-frequency regions, preserving patterns such as edges and corners, and heavier noise to low-frequency regions so that new details can be introduced there. Concretely, DDIM inversion is used to apply light, content-preserving noise to high-frequency regions, while random noise of higher intensity is injected into low-frequency regions.
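A minimal PyTorch sketch of this frequency-aware noising is given below. The Laplacian high-pass filter, the `hf_threshold` cutoff, and the `ddim_invert` helper are illustrative assumptions, not the paper's exact implementation; the paper may measure frequency and assign noise intensities differently.

```python
import torch
import torch.nn.functional as F


def frequency_aware_noising(z0, ddim_invert, t, alpha_bar_t, hf_threshold=0.1):
    """Noise a clean latent z0 (B, C, H, W) up to timestep t: light noise for
    high-frequency regions (via DDIM inversion), heavy random noise elsewhere."""
    # 1. Estimate per-pixel frequency with a Laplacian high-pass filter (assumed measure).
    c = z0.shape[1]
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]], device=z0.device)
    lap = lap.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    high_freq = F.conv2d(z0, lap, padding=1, groups=c).abs().mean(dim=1, keepdim=True)
    hf_mask = (high_freq > hf_threshold).float()          # 1 = edges/corners, 0 = flat regions

    # 2. Light, content-preserving noise for high-frequency regions via DDIM inversion.
    z_t_light = ddim_invert(z0, t)

    # 3. Heavier random noise for low-frequency regions via the standard forward process.
    eps = torch.randn_like(z0)
    z_t_heavy = (alpha_bar_t ** 0.5) * z0 + ((1.0 - alpha_bar_t) ** 0.5) * eps

    # 4. Blend the two noised latents according to the frequency mask.
    return hf_mask * z_t_light + (1.0 - hf_mask) * z_t_heavy
```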
In the denoising stage, three target properties act as constraints that regularize the predicted noise, steering the result toward high acutance and high visual quality. These constraints (sketched in code after the list) are:
- Image Acutance: Enhances edge contrast, making images appear sharper. The objective function is:
$$\mathcal{L}_{acu} = -\frac{1}{HW}\sum_{i=0,\,j=0}^{H,\,W} \mathbb{V}\big(F_{acu}(\hat{x}_{t\to 0})_{(i,j)}\big)\, F_{acu}(\hat{x}_{t\to 0})_{(i,j)}$$
where:
  - $\mathcal{L}_{acu}$ is the acutance loss.
  - $H, W$ denote the spatial size of the noisy image.
  - $(i, j)$ are the indices of the spatial element.
  - $F_{acu}$ is the Sobel operator.
  - $\hat{x}_{t\to 0}$ is the intermediate reconstruction of $x_0$ at timestep $t$.
  - $\mathbb{V}(\cdot)$ is a binary indicator function.
- Noise Distribution: Addresses the generalization error whereby the predicted noise may deviate from a standard Gaussian distribution; the objective pushes the variance of the predicted noise toward 1:
$$\mathcal{L}_{dist} = \big\|\, 1 - F_{var}\big(\epsilon_\theta(x_t; t, y)\big) \,\big\|_2$$
where:
  - $\mathcal{L}_{dist}$ is the distribution loss.
  - $\epsilon_\theta(x_t; t, y)$ is the noise predicted by the diffusion model.
  - $F_{var}$ computes the variance of the predicted noise.
- Adversarial Regularization: Prevents blurred images by incorporating a Gaussian blur function:
$$\mathcal{L}_{adv} = \big\|\, \hat{x}_{t\to 0} - F_{blur}(\hat{x}_{t\to 0}) \,\big\|_2$$
where:
  - $\mathcal{L}_{adv}$ is the adversarial loss.
  - $\hat{x}_{t\to 0}$ is the intermediate reconstruction of $x_0$ at timestep $t$.
  - $F_{blur}$ is a Gaussian blur function.
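The three constraints can be expressed as differentiable losses. The PyTorch sketch below is a minimal illustration, assuming the binary indicator $\mathbb{V}(\cdot)$ is a fixed threshold on the Sobel response and using arbitrary blur parameters; the paper's exact choices of threshold, kernel size, and $\sigma$ are not specified here.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF


def sobel_magnitude(x):
    """Per-pixel gradient magnitude via the Sobel operator (plays the role of F_acu)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    ky = kx.t()
    c = x.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(x, kx, padding=1, groups=c)
    gy = F.conv2d(x, ky, padding=1, groups=c)
    return (gx ** 2 + gy ** 2).sqrt().mean(dim=1)          # (B, H, W)


def acutance_loss(x0_hat, edge_threshold=0.1):
    """L_acu: negative mean Sobel response over pixels selected by the indicator V,
    assumed here to be a simple threshold on the edge response."""
    g = sobel_magnitude(x0_hat)
    v = (g > edge_threshold).float()                       # V(.): 1 on edge-like pixels
    return -(v * g).sum(dim=(-2, -1)).mean() / (g.shape[-2] * g.shape[-1])


def distribution_loss(eps_pred):
    """L_dist: penalize deviation of the predicted-noise variance from 1."""
    return (1.0 - eps_pred.var()).abs()


def adversarial_loss(x0_hat, kernel_size=9, sigma=2.0):
    """L_adv: distance between the reconstruction and its Gaussian-blurred copy
    (sign convention follows the formula above)."""
    blurred = TF.gaussian_blur(x0_hat, kernel_size=kernel_size, sigma=sigma)
    return torch.linalg.vector_norm(x0_hat - blurred)
```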
The sampling result $x_{t-1}$ of each denoising step is then replaced by $x_{t-1}^{*}$:
$$x_{t-1}^{*} = x_{t-1} - \rho_{acu}\,\nabla_{x_t}\mathcal{L}_{acu} - \rho_{dist}\,\nabla_{x_t}\mathcal{L}_{dist} - \rho_{adv}\,\nabla_{x_t}\mathcal{L}_{adv}$$
where $\rho_{acu} = 4$, $\rho_{dist} = 20$, and $\rho_{adv} = 0.3$ are tradeoff parameters determined through experimental studies.
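The sketch below illustrates one such constrained denoising step, reusing the loss functions from the previous sketch. `scheduler`, `unet`, and `decode` are placeholders for a diffusers-style DDIM scheduler, the noise-prediction network, and the latent-to-image decoder; because gradients are linear, the three weighted gradients are folded into a single `autograd.grad` call, which is equivalent to computing them separately.

```python
import torch

RHO_ACU, RHO_DIST, RHO_ADV = 4.0, 20.0, 0.3               # tradeoff parameters from the paper


def guided_denoising_step(scheduler, unet, decode, x_t, t, cond):
    """One constrained denoising step (sketch, not the paper's exact code)."""
    x_t = x_t.detach().requires_grad_(True)

    eps = unet(x_t, t, cond)                               # eps_theta(x_t; t, y)
    out = scheduler.step(eps, t, x_t)
    x_prev = out.prev_sample                               # plain DDIM sample x_{t-1}
    x0_hat = decode(out.pred_original_sample)              # intermediate reconstruction x_{t->0}

    # Weighted sum of the three constraints; its gradient w.r.t. x_t equals
    # rho_acu * grad(L_acu) + rho_dist * grad(L_dist) + rho_adv * grad(L_adv).
    loss = (RHO_ACU * acutance_loss(x0_hat)
            + RHO_DIST * distribution_loss(eps)
            + RHO_ADV * adversarial_loss(x0_hat))
    grad = torch.autograd.grad(loss, x_t)[0]

    # x*_{t-1} = x_{t-1} - rho_acu * grad L_acu - rho_dist * grad L_dist - rho_adv * grad L_adv
    return (x_prev - grad).detach()
```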
Experiments on the HPDv2 dataset demonstrate that FreeEnhance outperforms state-of-the-art image enhancement models in quantitative metrics and human preference, even surpassing the commercial solution Magnific AI. On the Human Preference Score v2 (HPSv2) metric, FreeEnhance achieves a score of 29.32 without any parameter tuning.
Ablation studies validate the contribution of each component of FreeEnhance, including the noising stage and the three denoising constraints. The framework also generalizes to text-to-image generation and natural image enhancement: when tested in the text-to-image generation scenario with Stable Diffusion 1.5, FreeEnhance achieved the highest HPSv2 score of 25.26.