Event-based Simultaneous Localization and Mapping: A Comprehensive Survey (2304.09793v2)

Published 19 Apr 2023 in cs.CV and cs.RO

Abstract: In recent decades, visual simultaneous localization and mapping (vSLAM) has gained significant interest in both academia and industry. It estimates camera motion and reconstructs the environment concurrently using visual sensors on a moving robot. However, conventional cameras are limited by hardware, including motion blur and low dynamic range, which can negatively impact performance in challenging scenarios like high-speed motion and high dynamic range illumination. Recent studies have demonstrated that event cameras, a new type of bio-inspired visual sensor, offer advantages such as high temporal resolution, high dynamic range, low power consumption, and low latency. This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks. The review covers the working principle of event cameras and various event representations for preprocessing event data. It also categorizes event-based vSLAM methods into four main categories: feature-based, direct, motion-compensation, and deep learning methods, with detailed discussions and practical guidance for each approach. Furthermore, the paper evaluates the state-of-the-art methods on various benchmarks, highlighting current challenges and future opportunities in this emerging research area. A public repository will be maintained to keep track of the rapid developments in this field at https://github.com/kun150kun/ESLAM-survey.


Summary

  • The survey shows how event-based vSLAM leverages asynchronous event cameras to overcome frame-based limitations, improving performance under fast motion and high-dynamic-range illumination.
  • It categorizes methods into feature-based, direct, motion-compensation, and deep learning, detailing unique algorithmic adaptations and sensor fusion strategies.
  • Evaluations on benchmark datasets compare tracking accuracy and robustness across methods in challenging environments, and the survey highlights promising avenues for future research.

Event-based Simultaneous Localization and Mapping: A Comprehensive Survey

This comprehensive survey, authored by Kunping Huang, Sen Zhang, Jing Zhang, and Dacheng Tao, explores the growing research area of event-based visual simultaneous localization and mapping (vSLAM), which leverages the unique properties of event cameras. Unlike conventional frame-based cameras, event cameras operate asynchronously, reporting pixel-level brightness changes as they occur. This enables superior performance in high-speed motion and high dynamic range scenarios, addressing the motion blur and limited dynamic range of traditional cameras.

The survey is systematically organized to offer an exhaustive overview of event-based vSLAM, grouping methods into four methodological categories: feature-based, direct, motion-compensation, and deep learning approaches. For each category, it discusses how event data are processed and applied to localization and mapping tasks.

Feature-based Methods focus on extracting and tracking features such as corners or line segments from event data. The survey highlights that, although classical corner detectors can be adapted, the appearance of event-based features depends strongly on the camera motion, which means novel detector designs or sensor fusion are often necessary for robust performance. Incorporating additional inputs such as IMU measurements, or aggregating events over longer temporal windows, can significantly enhance tracking accuracy in challenging conditions.
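
To make the feature-based pipeline concrete, the following sketch builds an exponentially decaying time surface from a short event batch and scores it with a plain Harris response, in the spirit of the event-based Harris adaptations discussed in the survey. The event tuple layout (x, y, t, polarity), the decay constant tau, and the box-window size are illustrative assumptions, not details taken from any specific method.

```python
import numpy as np

def time_surface(events, height, width, tau=0.03):
    """Exponentially decayed map of the latest event timestamp at each pixel."""
    last_t = np.full((height, width), -np.inf)
    for x, y, t, _p in events:
        last_t[y, x] = t
    t_ref = max(t for _x, _y, t, _p in events)   # decay relative to the newest event
    return np.exp((last_t - t_ref) / tau)        # in (0, 1]; exp(-inf) = 0 where no event fired

def harris_score(img, k=0.04, win=2):
    """Plain Harris corner response using numpy gradients and a box window."""
    gy, gx = np.gradient(img)
    Ixx, Iyy, Ixy = gx * gx, gy * gy, gx * gy

    def box(a):  # centered box filter of radius `win` via an integral image
        pad = np.pad(a, win, mode="edge")
        c = np.pad(pad, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
        s = 2 * win + 1
        return c[s:, s:] - c[:-s, s:] - c[s:, :-s] + c[:-s, :-s]

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

# Usage (hypothetical data): events = [(10, 12, 0.001, 1), (11, 12, 0.002, -1), ...]
# corners = harris_score(time_surface(events, height=180, width=240)) > 1e-4
```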

Direct Methods bypass explicit feature detection by aligning event data directly with brightness intensities or edge structures in pre-processed event representations. These approaches exploit the high temporal resolution of events and typically run decoupled tracking and mapping modules that operate on dense, image-like reconstructions of the event stream. Direct methods tend to be computationally efficient and robust, particularly in low-texture environments, though their accuracy may degrade under extreme motion.
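
As an illustration of the direct-alignment idea only (not the survey's algorithm), the sketch below scores a candidate image-plane translation by how strongly warped events land on the edges of a reference map, refining it with a generic optimizer. Real direct methods use full 6-DoF warps and photometric event-generation models; the 2D translation model, bilinear sampling, and Nelder-Mead optimizer here are simplifying assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def bilinear_sample(img, xs, ys):
    """Sample `img` at float coordinates with bilinear interpolation (clamped to the border)."""
    h, w = img.shape
    xs = np.clip(xs, 0.0, w - 1.001)
    ys = np.clip(ys, 0.0, h - 1.001)
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    dx, dy = xs - x0, ys - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def alignment_cost(shift, ev_xy, edge_map):
    """Negative total edge strength at the warped event locations (lower is better)."""
    return -bilinear_sample(edge_map, ev_xy[:, 0] + shift[0], ev_xy[:, 1] + shift[1]).sum()

def align_events_to_edges(ev_xy, edge_map, init=(0.0, 0.0)):
    """Find the 2D shift under which the events best coincide with strong edges."""
    res = minimize(alignment_cost, np.asarray(init), args=(ev_xy, edge_map),
                   method="Nelder-Mead")
    return res.x

# Usage (hypothetical data): ev_xy is an Nx2 float array of event coordinates and
# edge_map a gradient-magnitude image of a reference reconstruction:
# shift = align_events_to_edges(ev_xy, np.hypot(*np.gradient(ref_image)))
```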

Motion-compensation Methods employ strategies such as contrast maximization and probabilistic models to account for camera motion over the event stream, essentially warping events to a common reference time to reduce distortion or blur. By compensating for motion in this way, these methods align events spatially and temporally, yielding sharper event images and improved localization accuracy. However, the phenomenon known as event collapse remains a technical challenge that limits their effectiveness under certain motion configurations.
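
The core loop of contrast maximization can be summarized in a few lines: warp events to a reference time under a candidate motion model, accumulate them into an image of warped events (IWE), and score the candidate by the IWE's variance. The pure-translation flow model and brute-force candidate search in this sketch are simplifying assumptions; practical methods optimize richer motion models with gradient-based solvers.

```python
import numpy as np

def image_of_warped_events(events, flow, height, width, t_ref=0.0):
    """Accumulate events warped to time `t_ref` under a constant optical-flow model."""
    iwe = np.zeros((height, width))
    for x, y, t, _p in events:
        xw = int(round(x - flow[0] * (t - t_ref)))
        yw = int(round(y - flow[1] * (t - t_ref)))
        if 0 <= xw < width and 0 <= yw < height:
            iwe[yw, xw] += 1.0
    return iwe

def contrast(iwe):
    """Variance of the IWE: a well-compensated (sharper) event image scores higher."""
    return float(np.var(iwe))

def maximize_contrast(events, height, width, candidates):
    """Brute-force search over candidate flows; practical methods use gradient ascent."""
    return max(candidates,
               key=lambda f: contrast(image_of_warped_events(events, f, height, width)))

# Usage (hypothetical data): search an 11x11 grid of flow candidates in pixels/s.
# grid = [(vx, vy) for vx in np.linspace(-50, 50, 11) for vy in np.linspace(-50, 50, 11)]
# best_flow = maximize_contrast(events, height=180, width=240, candidates=grid)
```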

Deep Learning Techniques leverage neural networks trained on synthetic or richly annotated datasets, offering a promising route toward semantic understanding and higher-level tasks. Self-supervised schemes are particularly attractive because they adapt models using temporal dynamics and photometric cues, without requiring extensive ground-truth data.
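
A common first step in such learning-based pipelines is converting the asynchronous stream into a dense tensor a network can consume. The sketch below builds the widely used voxel-grid representation, splitting each event's polarity linearly between adjacent temporal bins; the bin count and the (x, y, t, polarity) event layout are assumptions for illustration.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Return a (num_bins, H, W) tensor; each event's polarity is split linearly
    between the two temporal bins it falls between."""
    grid = np.zeros((num_bins, height, width))
    ts = np.array([t for _x, _y, t, _p in events], dtype=float)
    t0, t1 = ts.min(), ts.max()
    scale = (num_bins - 1) / max(t1 - t0, 1e-9)       # map timestamps onto [0, num_bins - 1]
    for (x, y, t, p), tn in zip(events, (ts - t0) * scale):
        b = int(np.floor(tn))
        w_hi = tn - b                                  # fraction assigned to the next bin
        pol = 1.0 if p > 0 else -1.0
        grid[b, y, x] += pol * (1.0 - w_hi)
        if b + 1 < num_bins:
            grid[b + 1, y, x] += pol * w_hi
    return grid

# Usage (hypothetical data): a 5-bin grid for a 240x180 sensor, later consumed by a
# pose-regression or depth network as a 5-channel image.
# voxels = events_to_voxel_grid(events, num_bins=5, height=180, width=240)
```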

Numerical evaluations on benchmark datasets such as MVSEC and the RPG event-camera datasets reveal the varying strengths and limitations of these methodologies, substantiating the claim that event-based vSLAM offers significant advantages in high-speed and otherwise adverse environments. The survey further calls for continued research into sensor noise modeling, event data sparsity, global optimization, and multi-sensor integration, such as combining events with inertial or frame-based data.
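
For context on how such evaluations are typically scored, the sketch below computes the absolute trajectory error (ATE) RMSE after a closed-form rigid alignment of estimated to ground-truth positions. It assumes the two trajectories have already been time-associated into matching Nx3 arrays; alignment conventions (SE(3) vs. Sim(3)) vary across benchmarks.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """RMSE of position errors after a closed-form rigid alignment (Kabsch/Umeyama)."""
    est_mean, gt_mean = est_xyz.mean(axis=0), gt_xyz.mean(axis=0)
    est_c, gt_c = est_xyz - est_mean, gt_xyz - gt_mean
    # Rotation from the SVD of the cross-covariance matrix, with a reflection guard.
    U, _S, Vt = np.linalg.svd(est_c.T @ gt_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = Vt.T @ D @ U.T                     # maps estimated positions onto ground truth
    t = gt_mean - R @ est_mean
    aligned = est_xyz @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1))))

# Usage (hypothetical data): est_positions and gt_positions are time-associated Nx3 arrays.
# print(f"ATE RMSE: {ate_rmse(est_positions, gt_positions):.3f} m")
```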

In reflecting on the future of event-based vSLAM, the authors advocate for the development of more robust event representations, scalable theoretical frameworks, and more reliable systems in dynamic or textureless environments. The synthesis of foundation models with multi-modal data promises to further unlock the potential of event cameras in applications demanding resilience to perceptual challenges.

In summary, this survey offers a thorough consolidation of the conceptual and practical landscape of event-based vSLAM, fostering a nuanced understanding of how these emerging methodologies can transform perception and navigation systems under dynamic, real-world conditions.
