State Space Models for Event Cameras (2402.15584v3)

Published 23 Feb 2024 in cs.CV and cs.LG

Abstract: Today, state-of-the-art deep neural networks that process event-camera data first convert a temporal window of events into dense, grid-like input representations. As a result, they generalize poorly when deployed at higher inference frequencies (i.e., smaller temporal windows) than the ones they were trained on. We address this challenge by introducing state-space models (SSMs) with learnable timescale parameters to event-based vision. This design adapts to varying frequencies without the need to retrain the network at each frequency. Additionally, we investigate two strategies to counteract aliasing effects when deploying the model at higher frequencies. We comprehensively evaluate our approach against existing methods based on RNN and Transformer architectures across various benchmarks, including the Gen1 and 1 Mpx event camera datasets. Our results demonstrate that SSM-based models train 33% faster and exhibit minimal performance degradation when tested at frequencies higher than those used in training. Traditional RNN and Transformer models show performance drops of more than 20 mAP, whereas SSMs drop by only 3.76 mAP, highlighting the effectiveness of SSMs in event-based vision tasks.
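The abstract's key idea is that an SSM is defined in continuous time and only discretized with a timestep Δ, so a learned timescale can be rescaled when the input rate changes. The sketch below illustrates this under stated assumptions: it uses a real, diagonal state matrix with zero-order-hold (ZOH) discretization in the style of S4D/S5, and it assumes deployment at k times the training frequency is handled by dividing Δ by k. The parameter values and the helper names `zoh_discretize` and `run_ssm` are hypothetical; the paper's exact parameterization (e.g., complex eigenvalues, per-channel timescales) and its anti-aliasing strategies are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def zoh_discretize(a: Sequence[float], b: Sequence[float],
                   delta: float) -> Tuple[List[float], List[float]]:
    """ZOH discretization of a diagonal continuous-time SSM.

    Each state dimension i follows x_i' = a_i * x_i + b_i * u(t).
    Holding u constant over a step of length delta gives
        a_bar_i = exp(delta * a_i)
        b_bar_i = (a_bar_i - 1) / a_i * b_i
    Assumes every a_i < 0 (stable dynamics, no division by zero).
    """
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(abi - 1.0) / ai * bi for abi, ai, bi in zip(a_bar, a, b)]
    return a_bar, b_bar


def run_ssm(a: Sequence[float], b: Sequence[float], c: Sequence[float],
            delta: float, inputs: Sequence[float]) -> List[float]:
    """Run x_k = a_bar * x_{k-1} + b_bar * u_k and y_k = c . x_k from x_0 = 0."""
    a_bar, b_bar = zoh_discretize(a, b, delta)
    x = [0.0] * len(a)
    outputs = []
    for u in inputs:
        x = [abi * xi + bbi * u for abi, xi, bbi in zip(a_bar, x, b_bar)]
        outputs.append(sum(ci * xi for ci, xi in zip(c, x)))
    return outputs


if __name__ == "__main__":
    # Hypothetical parameters; in the paper the timescale is learned.
    a, b, c = [-0.5, -2.0], [1.0, 0.5], [1.0, -1.0]
    delta_train = 1.0

    # Training rate: one step per temporal window.
    slow = run_ssm(a, b, c, delta_train, [1.0] * 3)
    # Deployment at 2x the rate: halve delta and take twice as many steps.
    fast = run_ssm(a, b, c, delta_train / 2, [1.0] * 6)

    # For a piecewise-constant input, ZOH is exact, so every second
    # fast output coincides in time with a slow output.
    for s, f in zip(slow, fast[1::2]):
        print(f"{s:.6f} {f:.6f}")
```

The match is exact here only because the input is constant within each training window. Real event streams vary inside a window, so a finer sampling rate exposes higher-frequency content the model never saw during training; this is the aliasing issue the abstract refers to, which the paper addresses separately.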
