Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes (2403.04562v1)

Published 7 Mar 2024 in cs.CV

Abstract: Rapid and reliable identification of dynamic scene parts, also known as motion segmentation, is a key challenge for mobile sensors. Contemporary RGB camera-based methods rely on modeling camera and scene properties; however, they are often under-constrained and fall short in unknown categories. Event cameras have the potential to overcome these limitations, but corresponding methods have only been demonstrated in smaller-scale indoor environments with simplified dynamic objects. This work presents an event-based method for class-agnostic motion segmentation that can also be successfully deployed across complex large-scale outdoor environments. To this end, we introduce a novel divide-and-conquer pipeline that combines: (a) ego-motion compensated events, computed via a scene understanding module that predicts monocular depth and camera pose as auxiliary tasks, and (b) optical flow from a dedicated optical flow module. These intermediate representations are then fed into a segmentation module that predicts motion segmentation masks. A novel transformer-based temporal attention module in the segmentation module builds correlations across adjacent 'frames' to obtain temporally consistent segmentation masks. Our method sets the new state of the art on the classic EV-IMO benchmark (indoors), where we achieve improvements of 2.19 moving object IoU (2.22 mIoU) and 4.52 point IoU respectively, as well as on a newly generated motion segmentation and tracking benchmark (outdoors) based on the DSEC event dataset, termed DSEC-MOTS, where we show an improvement of 12.91 moving object IoU.
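The abstract outlines a divide-and-conquer pipeline: a scene-understanding module (monocular depth plus camera pose) for ego-motion compensation, a dedicated optical-flow module, and a segmentation module whose transformer-based temporal attention correlates features across adjacent 'frames'. The sketch below illustrates that data flow in PyTorch; all module internals, channel sizes, and names (`MotionSegPipeline`, `TemporalAttention`, `TinySceneNet`, `warp_by_egomotion`) are hypothetical stand-ins, not the authors' implementation, and the ego-motion warp is stubbed out so the example runs end to end.

```python
# Minimal sketch of the described divide-and-conquer pipeline, assuming a
# PyTorch implementation. Every module body and channel size here is a
# hypothetical placeholder, not the authors' code.
import torch
import torch.nn as nn


def warp_by_egomotion(events, depth, pose):
    # Placeholder: the real method warps events into a common frame using
    # the predicted depth and 6-DoF camera pose; the identity mapping
    # keeps this sketch runnable end to end.
    return events


class TemporalAttention(nn.Module):
    """Correlates per-pixel features across adjacent 'frames' with
    multi-head self-attention over the time axis (hypothetical layout)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):
        # feats: (B, T, C, H, W) -> attend over T independently per pixel.
        b, t, c, h, w = feats.shape
        x = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        out, _ = self.attn(x, x, x)
        x = self.norm(x + out)
        return x.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)


class TinySceneNet(nn.Module):
    """Stand-in for the scene-understanding module (depth + pose)."""

    def __init__(self):
        super().__init__()
        self.depth = nn.Conv2d(2, 1, 3, padding=1)
        self.pose = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2, 6))

    def forward(self, ev):
        return self.depth(ev).sigmoid(), self.pose(ev)  # depth map, 6-DoF pose


class MotionSegPipeline(nn.Module):
    def __init__(self, scene_net, flow_net, seg_head, dim: int = 64):
        super().__init__()
        self.scene_net = scene_net  # (a) depth + pose for ego-motion compensation
        self.flow_net = flow_net    # (b) dedicated optical-flow module
        self.fuse = nn.Conv2d(2 + 2 + 1, dim, 3, padding=1)
        self.temporal = TemporalAttention(dim)
        self.seg_head = seg_head    # predicts per-pixel motion masks

    def forward(self, event_frames):
        # event_frames: (B, T, 2, H, W), e.g. per-polarity event counts.
        feats = []
        for i in range(event_frames.shape[1]):
            ev = event_frames[:, i]
            depth, pose = self.scene_net(ev)
            comp = warp_by_egomotion(ev, depth, pose)    # ego-motion compensated events
            flow = self.flow_net(ev)                     # (B, 2, H, W) optical flow
            feats.append(self.fuse(torch.cat([comp, flow, depth], dim=1)))
        feats = self.temporal(torch.stack(feats, dim=1))  # temporally consistent
        return self.seg_head(feats.flatten(0, 1))         # (B*T, 1, H, W) mask logits


model = MotionSegPipeline(TinySceneNet(),
                          nn.Conv2d(2, 2, 3, padding=1),  # toy flow net
                          nn.Conv2d(64, 1, 1))            # toy mask head
masks = model(torch.randn(1, 4, 2, 64, 64))               # -> (4, 1, 64, 64)
```

The per-pixel attention layout above is one plausible reading of "builds correlations across adjacent 'frames'"; the published module may instead operate on patch tokens or at a different feature granularity.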
