LLVIP: A Visible-infrared Paired Dataset for Low-light Vision (2108.10831v4)

Published 24 Aug 2021 in cs.CV and cs.AI

Abstract: Visual tasks such as image fusion, pedestrian detection and image-to-image translation are very challenging in low-light conditions due to the loss of effective target areas. In such cases, infrared and visible images can be used together to provide both rich detail information and effective target areas. In this paper, we present LLVIP, a visible-infrared paired dataset for low-light vision. The dataset contains 30,976 images (15,488 pairs), most of which were taken in very dark scenes, and all of the images are strictly aligned in time and space. Pedestrians in the dataset are labeled. We compare the dataset with other visible-infrared datasets and evaluate the performance of popular visual algorithms for image fusion, pedestrian detection and image-to-image translation on it. The experimental results demonstrate the complementary effect of fusion on image information and reveal the deficiencies of existing algorithms on all three tasks in very low-light conditions. We believe the LLVIP dataset will contribute to the computer vision community by promoting image fusion, pedestrian detection and image-to-image translation in very low-light applications. The dataset is released at https://bupt-ai-cz.github.io/LLVIP. Raw data is also provided for further research such as image registration.

References (23)
  1. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.
  2. Background-subtraction using contour-based fusion of thermal and visible imagery. Computer Vision and Image Understanding, 106(2-3):162–182, 2007.
  3. OTCBVS benchmark dataset collection. http://vcipl-okstate.org/pbvs/bench/, 2007.
  4. Pedestrian detection at day/night time with visible and FIR cameras: A comparison. Sensors, 16(6):820, 2016.
  5. Generative adversarial nets. Advances in Neural Information Processing Systems, 27:2672–2680, 2014.
  6. A new image fusion performance metric based on visual information fidelity. Information Fusion, 14(2):127–135, 2013.
  7. Multispectral pedestrian detection: Benchmark dataset and baselines. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  8. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1125–1134, 2017.
  9. ultralytics/yolov5: v3.0, Aug. 2020.
  10. DenseFuse: A fusion approach to infrared and visible images. IEEE Transactions on Image Processing, 28(5):2614–2623, 2018.
  11. Infrared and visible image fusion via gradient transfer and total variation minimization. Information Fusion, 31:100–109, 2016.
  12. FusionGAN: A generative adversarial network for infrared and visible image fusion. Information Fusion, 48:11–26, 2019.
  13. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  14. A new quality metric for image fusion. In Proceedings of the 2003 International Conference on Image Processing (ICIP), volume 3, pages III–173. IEEE, 2003.
  15. Sparse GANs for thermal infrared image generation from optical image. IEEE Access, 8:180124–180132, 2020.
  16. Information measure for performance of image fusion. Electronics Letters, 38(7):313–315, 2002.
  17. Fusion performance measures and a lifting wavelet transform based algorithm for image fusion. In Proceedings of the Fifth International Conference on Information Fusion (FUSION), volume 1, pages 317–320. IEEE, 2002.
  18. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016.
  19. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7263–7271, 2017.
  20. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
  21. Alexander Toet et al. TNO image fusion dataset. https://doi.org/10.6084/m9.figshare.1008029.v1, 2014.
  22. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  23. IFCNN: A general image fusion framework based on convolutional neural network. Information Fusion, 54:99–118, 2020.
Citations (258)

Summary

  • The paper introduces LLVIP, a synchronized dataset of 15,488 visible-infrared image pairs aimed at advancing low-light vision research.
  • It evaluates representative methods, including DenseFuse, IFCNN, YOLOv5, and Pix2Pix, revealing their strengths and limitations in image fusion, pedestrian detection, and image-to-image translation.
  • The findings highlight the need for better multispectral algorithms to improve detail preservation and detection accuracy under extreme low-light conditions.

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision

The paper presents LLVIP, a visible-infrared paired dataset crafted specifically for low-light vision. The dataset comprises 15,488 rigorously aligned image pairs with pedestrian annotations, and aims to advance research on challenging low-light visual tasks such as image fusion, pedestrian detection, and image-to-image translation.

Dataset Characteristics and Justification

The LLVIP dataset fills a crucial gap by providing a synchronized, high-resolution visible-infrared image dataset captured under extreme low-light conditions. The value of combining infrared and visible imagery lies in the complementary nature of the two modalities: visible images capture rich textures, while infrared images keep targets detectable even when visibility in the visible spectrum is poor.
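
As a concrete starting point, the minimal sketch below shows how one might load a paired sample and its pedestrian boxes. The directory layout, file naming, and VOC-style XML annotation format are assumptions about the release, not a documented API; check the project page for the actual structure.

```python
import os
import xml.etree.ElementTree as ET

import cv2  # pip install opencv-python

# Hypothetical dataset root and layout; adjust to the actual release.
ROOT = "LLVIP"

def load_pair(split: str, name: str):
    """Load a spatially aligned visible/infrared pair by file name."""
    vis = cv2.imread(os.path.join(ROOT, "visible", split, name))
    ir = cv2.imread(os.path.join(ROOT, "infrared", split, name))
    return vis, ir

def load_boxes(name: str):
    """Parse pedestrian bounding boxes from a Pascal VOC-style XML file."""
    stem = os.path.splitext(name)[0]
    tree = ET.parse(os.path.join(ROOT, "Annotations", stem + ".xml"))
    boxes = []
    for obj in tree.getroot().iter("object"):
        bb = obj.find("bndbox")
        boxes.append(tuple(int(bb.find(k).text)
                           for k in ("xmin", "ymin", "xmax", "ymax")))
    return boxes

# Example usage with an illustrative file name.
vis, ir = load_pair("train", "010001.jpg")
print(vis.shape, ir.shape, load_boxes("010001.jpg")[:3])
```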

Evaluation of Methods on LLVIP

The paper evaluates a range of state-of-the-art visual algorithms on the dataset: image fusion algorithms such as DenseFuse and IFCNN, pedestrian detection with YOLOv5 and YOLOv3, and image-to-image translation with Pix2Pix. The results highlight both the complementary benefits and the deficiencies of current methods:

  1. Image Fusion: In both subjective and objective assessments, DenseFuse (with the l1 fusion strategy) and IFCNN prove relatively proficient at preserving detail in poor light by combining cues from both modalities (a sketch of the l1 weighting rule follows this list). Nonetheless, detail loss in dark regions indicates substantial room for improvement in fusion strategies.
  2. Pedestrian Detection: Experiments show that detectors relying on visible-light images struggle significantly in low-light scenes, scoring markedly lower AP than their infrared-based counterparts. This underscores the need to leverage multispectral data for reliable detection in suboptimal lighting.
  3. Image-to-Image Translation: Visible-to-infrared translation yields unsatisfactory results under metrics such as SSIM and PSNR (a minimal evaluation sketch follows below), indicating that models such as Pix2Pix struggle to capture the complex correlation between visible appearance and thermal radiation across the diverse conditions in LLVIP.
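
To make the fusion rule in point 1 concrete, here is a minimal NumPy sketch of l1-norm weighted fusion in the spirit of DenseFuse's l1 strategy; the encoder/decoder network and the block averaging of the activity maps are omitted for brevity, and the function names are illustrative.

```python
import numpy as np

def l1_fusion(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Fuse two feature maps of shape (C, H, W) by l1-norm activity weighting.

    Each spatial position gets a weight proportional to the l1-norm of its
    feature vector across channels, so the modality with the stronger
    response dominates the fused feature at that position.
    """
    act_a = np.abs(feat_a).sum(axis=0)        # (H, W) activity map
    act_b = np.abs(feat_b).sum(axis=0)
    w_a = act_a / (act_a + act_b + 1e-8)      # per-pixel soft weights
    w_b = 1.0 - w_a
    return w_a[None] * feat_a + w_b[None] * feat_b

# Toy check: random features standing in for encoder outputs of a pair.
rng = np.random.default_rng(0)
fused = l1_fusion(rng.normal(size=(64, 32, 32)), rng.normal(size=(64, 32, 32)))
print(fused.shape)  # (64, 32, 32)
```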

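For point 3, the SSIM/PSNR evaluation can be reproduced with standard library calls. The sketch below assumes uint8 grayscale inputs and uses scikit-image's implementations; the synthetic arrays merely stand in for a model output and a real capture.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_translation(pred_ir: np.ndarray, real_ir: np.ndarray):
    """Score a translated infrared image against the real capture.

    Expects uint8 grayscale arrays of identical shape. PSNR is in dB
    (higher is better); SSIM lies in [-1, 1] (1 = identical structure).
    """
    psnr = peak_signal_noise_ratio(real_ir, pred_ir, data_range=255)
    ssim = structural_similarity(real_ir, pred_ir, data_range=255)
    return psnr, ssim

# Toy check with a noisy copy standing in for a translated image.
rng = np.random.default_rng(0)
real = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
pred = np.clip(real.astype(int) + rng.integers(-10, 10, size=real.shape),
               0, 255).astype(np.uint8)
print(score_translation(pred, real))
```
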
Implications and Future Directions

The LLVIP dataset establishes a benchmark that not only challenges existing algorithms but also motivates the development of more effective multispectral algorithms capable of closing these performance gaps. Its strict temporal and spatial alignment, together with carefully annotated pedestrians, positions LLVIP as a critical resource for:

  • Developing robust low-light pedestrian detection systems by integrating multispectral inputs.
  • Advancing image fusion techniques tailored to nighttime visibility and surveillance requirements.
  • Pioneering novel image translation models that effectively extrapolate thermal features from visible imagery.

In the long term, LLVIP offers a path toward refining algorithmic approaches and deepening the theoretical understanding of multispectral data synergy under adverse lighting. Future work could explore improved feature representations, newer unsupervised and semi-supervised architectures, and novel sensor fusion approaches toward truly adaptive, real-world capable vision systems.
