
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving (2404.04804v1)

Published 7 Apr 2024 in cs.CV

Abstract: Vision-centric perception systems for autonomous driving have gained considerable attention recently due to their cost-effectiveness and scalability, especially compared to LiDAR-based systems. However, these systems often struggle in low-light conditions, potentially compromising their performance and safety. To address this, our paper introduces LightDiff, a domain-tailored framework designed to enhance low-light image quality for autonomous driving applications. Specifically, we employ a multi-condition controlled diffusion model. LightDiff works without any human-collected paired data, leveraging a dynamic data degradation process instead. It incorporates a novel multi-condition adapter that adaptively controls the input weights from different modalities, including depth maps, RGB images, and text captions, to effectively illuminate dark scenes while maintaining context consistency. Furthermore, to align the enhanced images with the detection model's knowledge, LightDiff employs perception-specific scores as rewards to guide the diffusion training process through reinforcement learning. Extensive experiments on the nuScenes dataset demonstrate that LightDiff can significantly improve the performance of several state-of-the-art 3D detectors in night-time conditions while achieving high visual quality scores, highlighting its potential to safeguard autonomous driving.
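The abstract describes a multi-condition adapter that adaptively weights conditioning signals from depth maps, RGB images, and text captions before they steer the diffusion model. The paper does not give implementation details here, so the following is only a minimal NumPy sketch of one plausible form of such gated fusion; the function names, feature shapes, and softmax gating are all assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of gate logits."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def multi_condition_fuse(cond_feats, gate_logits):
    """Fuse per-modality condition features (e.g. depth, RGB, text)
    into a single conditioning signal via learned scalar weights.

    cond_feats:  list of equally-shaped feature arrays, one per modality
    gate_logits: 1-D array of raw weights, one per modality (in a real
                 model these would be produced by a small learned gate)
    """
    w = softmax(gate_logits)  # normalized weight per modality
    fused = sum(wi * f for wi, f in zip(w, cond_feats))
    return fused, w

# Toy usage: three modalities, each encoded as a 4x8 feature map.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, 8)) for _ in range(3)]  # depth, rgb, text
fused, w = multi_condition_fuse(feats, np.array([0.5, 1.0, -0.2]))
```

In this sketch the gate logits are fixed; in an adaptive adapter they would be predicted from the inputs themselves, letting the model lean on depth or text cues when the RGB signal is too dark to be reliable.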

Authors (8)
  1. Jinlong Li
  2. Baolu Li
  3. Zhengzhong Tu
  4. Xinyu Liu
  5. Qing Guo
  6. Felix Juefei-Xu
  7. Runsheng Xu
  8. Hongkai Yu
Citations (4)
