A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization (2403.19412v1)

Published 28 Mar 2024 in cs.CV

Abstract: Event cameras exhibit remarkable attributes such as high dynamic range, asynchronicity, and low latency, making them highly suitable for vision tasks that involve high-speed motion in challenging lighting conditions. These cameras implicitly capture movement and depth information in events, making them appealing sensors for Camera Pose Relocalization (CPR) tasks. Nevertheless, existing CPR networks based on events neglect the pivotal fine-grained temporal information in events, resulting in unsatisfactory performance. Moreover, the energy-efficient features are further compromised by the use of excessively complex models, hindering efficient deployment on edge devices. In this paper, we introduce PEPNet, a simple and effective point-based network designed to regress six degrees of freedom (6-DOFs) event camera poses. We rethink the relationship between the event camera and CPR tasks, leveraging the raw Point Cloud directly as network input to harness the high-temporal resolution and inherent sparsity of events. PEPNet is adept at abstracting the spatial and implicit temporal features through hierarchical structure and explicit temporal features by Attentive Bi-directional Long Short-Term Memory (A-Bi-LSTM). By employing a carefully crafted lightweight design, PEPNet delivers state-of-the-art (SOTA) performance on both indoor and outdoor datasets with meager computational resources. Specifically, PEPNet attains a significant 38% and 33% performance improvement on the random split IJRR and M3ED datasets, respectively. Moreover, the lightweight design version PEPNet$_{tiny}$ accomplishes results comparable to the SOTA while employing a mere 0.5% of the parameters.
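As an illustration of what feeding events to a network "directly as a point cloud" can look like, the sketch below turns a window of raw events into a fixed-size set of normalized (x, y, t) points. It is a minimal sketch under stated assumptions, not PEPNet's actual preprocessing: the function name `events_to_point_cloud`, the (x, y, t, polarity) column layout, the random sampling strategy, the per-axis min-max normalization, and the choice to drop polarity are all hypothetical and do not come from the abstract.

```python
import numpy as np


def events_to_point_cloud(events, num_points, rng=None):
    """Convert a window of events into a fixed-size (x, y, t) point cloud.

    events: (N, 4) array with columns x, y, t, polarity (assumed layout),
            assumed to be sorted by timestamp t.
    Returns a (num_points, 3) float32 array with each axis scaled to [0, 1].
    """
    events = np.asarray(events, dtype=np.float64)
    if events.ndim != 2 or events.shape[1] != 4 or len(events) == 0:
        raise ValueError("expected a non-empty (N, 4) array of x, y, t, p")
    rng = np.random.default_rng() if rng is None else rng

    n = len(events)
    # Sample without replacement when enough events exist, otherwise
    # repeat events so the output size stays fixed.
    idx = rng.choice(n, size=num_points, replace=n < num_points)
    idx.sort()  # preserve the temporal order of the input window

    # Keep x, y, t and drop polarity (an assumption made for simplicity).
    xyt = events[idx, :3].copy()

    # Min-max normalize each axis so space and time share a common scale.
    lo = xyt.min(axis=0)
    span = xyt.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on constant axes
    return ((xyt - lo) / span).astype(np.float32)
```

Unlike accumulating events into frames, this representation keeps every sampled event's individual timestamp as a coordinate, which is the fine-grained temporal information the abstract argues frame-based CPR networks discard.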
