
BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

Published 3 Jul 2024 in cs.CV (arXiv:2407.03535v2)

Abstract: Low-light videos often exhibit spatiotemporal incoherent noise, compromising visibility and performance in computer vision applications. One significant challenge in enhancing such content using deep learning is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions, incorporating genuine noise and temporal artifacts. We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels. We provide benchmarks based on four different technologies: convolutional neural networks, transformers, diffusion models, and state space models (Mamba). Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE), and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets. Our dataset and links to benchmarks are publicly available at https://doi.org/10.21227/mzny-8c77.


Summary

  • The paper introduces a fully registered low-light video dataset with 31,800 paired frames captured under varied conditions to improve enhancement models.
  • It employs a programmable motorized dolly and sophisticated post-processing to achieve precise pixel alignment for temporal consistency.
  • Benchmark evaluations spanning CNNs, transformers, diffusion models, and state space models show notable PSNR and SSIM gains for models trained on BVI-RLV.

Overview of BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

Low-light video enhancement (LLVE) remains a challenging domain within computer vision due to photon noise, color shifts, white balance inconsistencies, and temporal artifacts inherent to such footage. The paper "BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement" addresses these challenges by introducing a meticulously curated dataset, BVI-RLV, that includes fully registered low-light video sequences and their corresponding normal-light ground truths.

Dataset Introduction

The BVI-RLV dataset offers 40 dynamic scenes, each captured under two distinct low-light conditions alongside a normal-light reference, covering a broad array of motion types and content variations relevant to real-world applications. The scenes are diversified by capturing both moving objects against static backgrounds and fully dynamic scenes with multiple types of camera and object motion.

To ensure pixel-wise alignment between low-light and normal-light video frames, the authors employ a programmable motorized dolly system coupled with sophisticated post-processing alignment techniques. This alignment is crucial for the temporal consistency required in machine learning models designed for LLVE. The dataset includes over 31,800 ground-truth paired frames, making it one of the most comprehensive datasets available for this task.
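
The paper's exact refinement procedure is not reproduced here, but the idea of image-based pixel-wise registration can be illustrated with a standard tool. The snippet below is a minimal sketch assuming OpenCV's ECC algorithm as a stand-in for the authors' method; ECC is a reasonable choice for this illustration because its correlation criterion tolerates the large brightness gap between light levels.

```python
# Minimal sketch of image-based, pixel-wise registration between a
# normal-light reference frame and its low-light counterpart. ECC is an
# assumption standing in for the paper's refinement step.
import cv2
import numpy as np

def align_to_reference(reference: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Estimate an affine warp with ECC and apply it to `target` (BGR frames)."""
    ref_gray = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    tgt_gray = cv2.cvtColor(target, cv2.COLOR_BGR2GRAY)
    warp = np.eye(2, 3, dtype=np.float32)  # identity affine initialization
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    _, warp = cv2.findTransformECC(ref_gray, tgt_gray, warp,
                                   cv2.MOTION_AFFINE, criteria)
    h, w = reference.shape[:2]
    # WARP_INVERSE_MAP maps `target` onto the reference pixel grid.
    return cv2.warpAffine(target, warp, (w, h),
                          flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```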

Benchmarks and Models

The paper evaluates the performance of multiple deep learning architectures using the BVI-RLV dataset. These models span a range of contemporary technologies:

  1. Convolutional Neural Networks (CNNs): PCDUNet
  2. Transformers: STA-SUNet
  3. Diffusion Models: BVI-CDM
  4. State Space Models: BVI-Mamba

These models were selected to require manageable computational resources, fostering broader accessibility for the research community. As the comprehensive comparative analysis in the paper shows, models trained on BVI-RLV consistently outperform those trained on existing datasets (DRV, SDSD, DID).
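
To make the role of fully registered pairs in supervised training concrete, here is a hypothetical PyTorch-style loader. The flat directory layout and PNG naming are assumptions for illustration, not the dataset's actual on-disk structure.

```python
# Hypothetical loader for fully registered low-light/normal-light frame pairs.
from pathlib import Path

import cv2
import torch
from torch.utils.data import Dataset

class RegisteredPairs(Dataset):
    def __init__(self, low_dir: str, normal_dir: str):
        self.low = sorted(Path(low_dir).glob("*.png"))        # assumed layout
        self.normal = sorted(Path(normal_dir).glob("*.png"))  # assumed layout
        assert len(self.low) == len(self.normal), "pairs must match one-to-one"

    def __len__(self) -> int:
        return len(self.low)

    @staticmethod
    def _load(path: Path) -> torch.Tensor:
        rgb = cv2.cvtColor(cv2.imread(str(path)), cv2.COLOR_BGR2RGB)
        return torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0  # CHW in [0, 1]

    def __getitem__(self, idx: int):
        # Registration at capture time means no spatial alignment is needed here:
        # input and target correspond pixel for pixel.
        return self._load(self.low[idx]), self._load(self.normal[idx])
```

Because the pairs are pre-aligned, a plain pixel-wise loss can supervise training directly, with no warping or alignment module inside the training loop.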

Comparative Results

The experimental results presented in the paper highlight the advantage of training on the BVI-RLV dataset. Notably, models like STA-SUNet and the novel BVI-CDM demonstrate significant improvements in PSNR and SSIM scores, illustrating the efficacy of fully registered pairs for LLVE. The paper also compares benchmark models across datasets, establishing BVI-RLV as the training set that generalizes best to unseen data.
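
For readers who want to reproduce this style of comparison, the metrics involved are standard. The sketch below computes per-frame PSNR and SSIM with scikit-image and averages over a clip; the array shapes and uint8 range are assumptions about how frames are loaded, not specifics from the paper.

```python
# Per-frame PSNR/SSIM averaged over a clip, using scikit-image
# (>= 0.19 for the `channel_axis` argument).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_clip(enhanced: np.ndarray, reference: np.ndarray):
    """Both inputs: (T, H, W, 3) uint8 arrays of registered frames."""
    psnrs, ssims = [], []
    for out, gt in zip(enhanced, reference):
        psnrs.append(peak_signal_noise_ratio(gt, out, data_range=255))
        ssims.append(structural_similarity(gt, out, channel_axis=-1, data_range=255))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```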

Implications and Future Directions

The introduction of the BVI-RLV dataset has significant practical implications. The availability of a well-curated, comprehensive dataset stands to improve the development of more robust and generalizable LLVE models, benefiting applications in surveillance, robotics, and media production that necessitate high-quality video under low-light conditions.

The theoretical implications are equally profound. The dataset's use of fully registered video pairs across a variety of motions and content types provides a new standard for dataset quality in LLVE research. This methodology could inspire further research into alignment techniques and the integration of motion dynamics in other video enhancement domains.

Speculation on Future Developments

Looking forward, the creation of lighter, more efficient models that can handle large-scale video data without the need for extensive computational resources will be crucial. Additionally, expanding the dataset to include more diverse lighting conditions and environmental settings could further enhance the robustness of LLVE models. There is also potential for applying self-supervised or unpaired learning strategies that can leverage the BVI-RLV dataset to develop models capable of learning from less strictly controlled environments.

Finally, addressing the balance between enhancing video quality and the ethical considerations surrounding potential misuse of this technology will be an essential aspect of future research and application.

Conclusion

The BVI-RLV dataset and the comprehensive benchmarks provided by the authors represent a significant step forward in the field of low-light video enhancement. By combining rigorous data collection with state-of-the-art model evaluations, this work lays a robust foundation for future advancements and sets a new standard for quality in LLVE research.
