Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera (2309.09297v2)

Published 17 Sep 2023 in cs.CV and cs.RO

Abstract: The ability to detect objects in all lighting conditions (i.e., normal-, over-, and under-exposed) is crucial for real-world applications, such as self-driving. Traditional RGB-based detectors often fail under such varying lighting conditions. Therefore, recent works utilize novel event cameras to supplement or guide the RGB modality; however, these methods typically adopt asymmetric network structures that rely predominantly on the RGB modality, resulting in limited robustness for all-day detection. In this paper, we propose EOLO, a novel object detection framework that achieves robust and efficient all-day detection by fusing the RGB and event modalities. Our EOLO framework is built upon a lightweight spiking neural network (SNN) to efficiently leverage the asynchronous property of events. Building on this backbone, we first introduce an Event Temporal Attention (ETA) module to learn high-temporal-resolution information from events while preserving crucial edge information. Secondly, as different modalities exhibit varying levels of importance under diverse lighting conditions, we propose a novel Symmetric RGB-Event Fusion (SREF) module to effectively fuse RGB-Event features without relying on a specific modality, thus ensuring balanced and adaptive fusion for all-day detection. In addition, to compensate for the lack of paired RGB-Event datasets for all-day training and evaluation, we propose an event synthesis approach based on randomized optical flow that directly generates an event frame from a single exposure image. We further build two new datasets, E-MSCOCO and E-VOC, based on the popular benchmarks MSCOCO and PASCAL VOC. Extensive experiments demonstrate that our EOLO outperforms state-of-the-art detectors, e.g., RENet, by a substantial margin (+3.74% mAP50) in all lighting conditions. Our code and datasets will be available at https://vlislab22.github.io/EOLO/
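
To make the event synthesis approach concrete, the following is a minimal, hypothetical sketch (in Python/NumPy) of generating an event frame from a single exposure image using a randomized displacement and a simple contrast-threshold event model. The function name synthesize_event_frame and the parameters max_shift and threshold are illustrative assumptions, not the paper's actual formulation; EOLO's randomized optical-flow synthesis may differ in detail.

import numpy as np

def synthesize_event_frame(image, max_shift=3, threshold=0.1, rng=None):
    """Sketch: synthesize an ON/OFF event frame from one exposure image
    by applying a random global displacement (a stand-in for randomized
    optical flow) and thresholding the resulting log-intensity change."""
    rng = np.random.default_rng() if rng is None else rng
    gray = image.mean(axis=-1) if image.ndim == 3 else image.astype(np.float64)
    log_i = np.log1p(gray)  # event cameras respond to log-intensity changes

    # Randomized "flow": a small random translation of the whole frame.
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(np.roll(log_i, int(dy), axis=0), int(dx), axis=1)

    # Fire +1 (ON) / -1 (OFF) events where the change exceeds the contrast threshold.
    diff = shifted - log_i
    event_frame = np.zeros_like(diff, dtype=np.int8)
    event_frame[diff > threshold] = 1
    event_frame[diff < -threshold] = -1
    return event_frame

In this spirit, synthesized event frames can be paired with the original exposure images, which is how the abstract describes building E-MSCOCO and E-VOC from MSCOCO and PASCAL VOC.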
