Detecting Every Object from Events (2404.05285v1)
Abstract: Object detection is critical in autonomous driving, and it is more practical yet challenging to localize objects of unknown categories: an endeavour known as Class-Agnostic Object Detection (CAOD). Existing studies on CAOD predominantly rely on ordinary cameras, but these frame-based sensors usually have high latency and limited dynamic range, leading to safety risks in real-world scenarios. In this study, we turn to a new modality enabled by the so-called event camera, featuring sub-millisecond latency and high dynamic range, for robust CAOD. We propose Detecting Every Object in Events (DEOE), an approach tailored for high-speed, class-agnostic, open-world object detection in event-based vision. Built upon a fast event-based backbone, the recurrent vision transformer, we jointly consider spatial and temporal consistencies to identify potential objects. The discovered potential objects are assimilated as soft positive samples to avoid being suppressed as background. Moreover, we introduce a disentangled objectness head to separate the foreground-background classification and novel object discovery tasks, enhancing the model's generalization in localizing novel objects while maintaining a strong ability to filter out the background. Extensive experiments confirm the superiority of DEOE over three strong baselines that integrate a state-of-the-art event-based object detector with advancements in RGB-based CAOD. Our code is available at https://github.com/Hatins/DEOE.
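The temporal-consistency cue described above can be sketched as follows: a prediction that overlaps well with a prediction from the previous event slice is treated as a potential object (a soft positive) rather than being suppressed as background. This is a minimal illustrative sketch, not the paper's actual implementation; the function names and the threshold `tau` are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def soft_positive_weights(curr_boxes, prev_boxes, tau=0.5):
    """Score each current prediction by its best IoU against the previous
    event slice's predictions; scores at or above tau mark the box as a
    soft positive (weight in (0, 1]), otherwise it stays background (0)."""
    weights = []
    for box in curr_boxes:
        score = max((iou(box, p) for p in prev_boxes), default=0.0)
        weights.append(score if score >= tau else 0.0)
    return weights
```

In this sketch, the resulting weight could down-scale the background-suppression loss for temporally consistent boxes, so that unannotated but persistent objects are not penalized as negatives.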