Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion (2401.03788v2)

Published 8 Jan 2024 in cs.CV

Abstract: Low-light image enhancement techniques have significantly progressed, but unstable image quality recovery and unsatisfactory visual perception are still significant challenges. To solve these problems, we propose a novel and robust low-light image enhancement method via CLIP-Fourier Guided Wavelet Diffusion, abbreviated as CFWD. Specifically, CFWD leverages multimodal visual-language information in the frequency domain space created by multiple wavelet transforms to guide the enhancement process. Multi-scale supervision across different modalities facilitates the alignment of image features with semantic features during the wavelet diffusion process, effectively bridging the gap between degraded and normal domains. Moreover, to further promote the effective recovery of the image details, we combine the Fourier transform based on the wavelet transform and construct a Hybrid High Frequency Perception Module (HFPM) with a significant perception of the detailed features. This module avoids the diversity confusion of the wavelet diffusion process by guiding the fine-grained structure recovery of the enhancement results to achieve favourable metric and perceptually oriented enhancement. Extensive quantitative and qualitative experiments on publicly available real-world benchmarks show that our approach outperforms existing state-of-the-art methods, achieving significant progress in image quality and noise suppression. The project code is available at https://github.com/hejh8/CFWD.

Authors (4)
  1. Minglong Xue (9 papers)
  2. Jinhong He (10 papers)
  3. Wenhai Wang (123 papers)
  4. Mingliang Zhou (17 papers)
Citations (9)

Summary

Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion

The paper "Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion" presents a novel approach to prevalent challenges in low-light image enhancement. Despite significant advances in the field, unstable image quality recovery and subpar visual perception remain common problems. The authors propose a method, abbreviated CFWD, that leverages multimodal visual-language information to guide the enhancement process in the frequency domain, exploiting an interplay between wavelet and Fourier transforms.

The CFWD method fuses classical signal processing techniques with modern neural network paradigms. It employs a wavelet diffusion model, which significantly reduces computational overhead by shifting the diffusion process into the wavelet low-frequency domain. This transformation effectively downsamples the image data, enabling resource-efficient processing. The method further incorporates a hybrid of wavelet and Fourier transforms within the Hybrid High Frequency Perception Module (HFPM) to capture fine-grained details and ensure coherent image content across different lighting conditions.
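The downsampling effect of the wavelet transform can be sketched with a single-level 2-D Haar decomposition (a minimal numpy illustration, not the paper's implementation): the LL subband is a half-resolution approximation of the image, so running diffusion there touches a quarter of the pixels per decomposition level.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform.

    Splits an (H, W) image into four (H/2, W/2) subbands:
    LL (low-frequency approximation) plus LH/HL/HH detail bands.
    """
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # vertical detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # average of averages
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

img = np.random.rand(256, 256)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)  # (128, 128): diffusion on LL processes 4x fewer pixels
```

The transform is invertible, so enhancement applied in the subbands can be mapped back to full resolution without information loss.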

One of the paper's key contributions is the integration of the Contrastive Language-Image Pre-training (CLIP) model, enabling robust multimodal semantic alignment. The authors develop a multiscale visual-language guidance network that iteratively enhances image quality. By integrating visual-language prompts within the diffusion process, this network aligns low-light image features with semantic meanings, significantly improving both metric-oriented and perceptual outcomes.
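The flavor of CLIP-based prompt guidance can be illustrated with a toy binary prompt-classification loss (a sketch with random placeholder embeddings; the paper's actual multiscale guidance is more elaborate): the image embedding is compared against a positive "normal-light" prompt embedding and a negative "low-light" one, and minimizing the cross-entropy pushes the enhanced image toward normal-light semantics.

```python
import numpy as np

def clip_prompt_loss(image_emb, pos_emb, neg_emb, temperature=0.07):
    """CLIP-style binary prompt loss (sketch, not the paper's loss).

    Cosine similarities between the image embedding and a positive
    ("normal-light") / negative ("low-light") text embedding are
    softmax-normalized; the loss is the cross-entropy of assigning
    the image to the positive prompt.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    logits = np.array([cos(image_emb, pos_emb),
                       cos(image_emb, neg_emb)]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # low when image ~ positive prompt

rng = np.random.default_rng(0)
pos = rng.standard_normal(512)                   # placeholder, not real CLIP output
neg = rng.standard_normal(512)
image = pos + 0.5 * rng.standard_normal(512)     # image roughly aligned with positive
loss = clip_prompt_loss(image, pos, neg)
print(round(loss, 4))
```

In practice the embeddings would come from a frozen CLIP image/text encoder, and the loss gradient would steer the diffusion sampler.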

Extensive experiments performed on standard benchmark datasets demonstrate that CFWD outperforms state-of-the-art methods across various metrics, such as PSNR, SSIM, LPIPS, and FID. Particularly notable is the visual quality of the results, highlighted in comparisons against existing techniques. The quantitative assessments indicate substantial progress in both brightness and detail preservation, showcasing CFWD's capability to deliver images with realistic visual appeal.
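Of the metrics above, PSNR is simple enough to sketch in a few lines of numpy (SSIM, LPIPS, and FID require substantially more machinery); higher values indicate closer agreement with the ground-truth image.

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

gt = np.clip(np.random.rand(64, 64), 0.0, 1.0)            # toy ground truth
noisy = np.clip(gt + np.random.normal(0, 0.05, gt.shape), 0.0, 1.0)
print(round(psnr(gt, noisy), 2))  # higher is better
```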

The implications of CFWD extend beyond just theoretical contributions. Practically, this approach provides a framework that could be adapted for real-world applications such as surveillance, autonomous vehicles, and digital photography, where low-light conditions frequently pose challenges. Theoretically, the amalgamation of wavelet transformation, Fourier analysis, and diffusion models opens new avenues for research on multi-resolution and multi-modal image processing techniques.

The paper points to future directions, including optimizing the computational efficiency of the proposed method and simplifying the visual-language guidance mechanism. Exploring adaptive techniques to fine-tune the hybrid frequency-domain perception module also remains an intriguing prospect. This research contributes to the broader field of image processing, offering insights into enhancing image quality under challenging lighting conditions. As advances in AI and machine learning continue, methods like CFWD may serve as foundational frameworks upon which more sophisticated solutions are built.
