
Multimotion Visual Odometry (MVO) (2110.15169v4)

Published 28 Oct 2021 in cs.RO

Abstract: Visual motion estimation is a well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation in highly dynamic environments. These environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Estimating third-party motions simultaneously with the sensor egomotion is difficult because an object's observed motion consists of both its true motion and the sensor motion. Most previous works in multimotion estimation simplify this problem by relying on appearance-based object detection or application-specific motion constraints. These approaches are effective in specific applications and environments but do not generalize well to the full multimotion estimation problem (MEP). This paper presents Multimotion Visual Odometry (MVO), a multimotion estimation pipeline that estimates the full SE(3) trajectory of every motion in the scene, including the sensor egomotion, without relying on appearance-based information. MVO extends the traditional visual odometry (VO) pipeline with multimotion segmentation and tracking techniques. It uses physically founded motion priors to extrapolate motions through temporary occlusions and identify the reappearance of motions through motion closure. Evaluations on real-world data from the Oxford Multimotion Dataset (OMD) and the KITTI Vision Benchmark Suite demonstrate that MVO achieves good estimation accuracy compared to similar approaches and is applicable to a variety of multimotion estimation challenges.


Summary

  • The paper introduces the MVO pipeline that extends traditional visual odometry to estimate SE(3) trajectories of multiple independently moving objects.
  • It combines multimotion segmentation with physically grounded motion priors, such as white-noise-on-acceleration (WNOA) and white-noise-on-jerk (WNOJ) models, to track motions through temporary occlusions.
  • Evaluations on the Oxford Multimotion Dataset and the KITTI benchmark demonstrate MVO's robustness and highlight its potential in autonomous driving and robotics.

Multimotion Visual Odometry (MVO) in Highly Dynamic Environments

The paper presents a comprehensive approach to the multimotion estimation problem (MEP) by introducing the Multimotion Visual Odometry (MVO) pipeline. Unlike traditional visual odometry (VO), which estimates only the sensor egomotion and assumes a largely static environment, MVO extends the VO pipeline to estimate the full SE(3) trajectory of every independent motion in the scene, including the egomotion.

Technical Overview

MVO integrates multimotion segmentation and tracking techniques into the visual odometry pipeline, estimating multiple motions simultaneously. Rather than relying on appearance-based object detection, it segments and estimates the scene directly by motion. Physically founded motion priors, such as the WNOA and WNOJ models, allow MVO to extrapolate object trajectories through temporary occlusions and to recognize motions when they reappear, a process the authors call motion closure.
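
To make the role of these priors concrete, the sketch below shows one way a constant-velocity (WNOA-style) prior could extrapolate an occluded object's SE(3) pose by holding the last estimated body-frame velocity fixed and integrating it forward. The helper names, the simple per-frame integration, and the example velocities are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed, not MVO's code): extrapolate an occluded object's
# SE(3) pose under a constant-velocity (WNOA-style) motion prior.
import numpy as np
from scipy.linalg import expm

def se3_hat(xi):
    """Map a 6-vector twist [v, w] to its 4x4 se(3) matrix."""
    v, w = xi[:3], xi[3:]
    W = np.array([[0, -w[2], w[1]],
                  [w[2], 0, -w[0]],
                  [-w[1], w[0], 0]])
    X = np.zeros((4, 4))
    X[:3, :3] = W
    X[:3, 3] = v
    return X

def extrapolate_pose(T_k, varpi, dt, steps):
    """Propagate pose T_k forward with a constant body-frame velocity varpi."""
    T = T_k.copy()
    for _ in range(steps):
        T = T @ expm(se3_hat(varpi) * dt)   # right-multiply the body-frame twist
    return T

# Example: an object last seen at the identity pose, moving 1 m/s forward while
# yawing at 0.1 rad/s, extrapolated for 10 frames at 30 Hz (values are made up).
T_last = np.eye(4)
varpi = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.1])
T_pred = extrapolate_pose(T_last, varpi, dt=1.0 / 30.0, steps=10)
```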

Segmentation in MVO is posed as a multilabeling problem that iteratively proposes, assigns, and merges motion labels over a k-nearest-neighbors graph of tracked feature points. The set of labels therefore adapts online, without prior knowledge of how many objects are present or what they are, which suits highly dynamic environments.
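
As a rough illustration of the label-assignment step (not MVO's actual energy formulation, which uses graph-cut style multilabel optimization), the following sketch assigns each tracked point the motion label with the lowest residual while encouraging agreement with its k nearest neighbours. The residual matrix, the smoothness weight, and the greedy update scheme are all assumptions made for illustration.

```python
# Illustrative label assignment over a k-NN graph (assumed, simplified scheme).
import numpy as np
from sklearn.neighbors import kneighbors_graph

def assign_labels(points, residuals, k=5, smoothness=1.0, iters=10):
    """points: (N,3) tracked points; residuals: (N,L) cost of explaining point i
    with motion label l. Returns an (N,) vector of label indices."""
    n, n_labels = residuals.shape
    knn = kneighbors_graph(points, k, mode="connectivity").tolil()
    labels = residuals.argmin(axis=1)            # data-term-only initial guess
    for _ in range(iters):                       # greedy, ICM-style refinement
        for i in range(n):
            neighbours = knn.rows[i]
            costs = residuals[i].copy()
            for l in range(n_labels):            # penalize disagreeing neighbours
                costs[l] += smoothness * sum(labels[j] != l for j in neighbours)
            labels[i] = costs.argmin()
    return labels
```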

Evaluation and Results

The paper evaluates MVO on the Oxford Multimotion Dataset (OMD) and the KITTI Vision Benchmark Suite, quantifying accuracy with metrics such as global odometric error and relative RMS error. On sequences with significant occlusions, MVO achieves estimation accuracy comparable to similar approaches and continues to track objects through the occlusions, a capability that traditional appearance-based approaches do not readily provide.
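
For concreteness, a relative RMS translational error can be computed roughly as below from aligned sequences of estimated and ground-truth poses; the exact error definitions, alignment, and frame conventions used in the paper may differ.

```python
# Sketch of a relative RMS translational error between two pose sequences
# (assumed form; inputs are aligned lists of 4x4 SE(3) matrices).
import numpy as np

def relative_rms_translation_error(est_poses, gt_poses):
    """RMS of the translational part of the per-frame relative-pose error."""
    errs = []
    for k in range(1, len(est_poses)):
        rel_est = np.linalg.inv(est_poses[k - 1]) @ est_poses[k]
        rel_gt = np.linalg.inv(gt_poses[k - 1]) @ gt_poses[k]
        delta = np.linalg.inv(rel_gt) @ rel_est    # residual relative transform
        errs.append(np.linalg.norm(delta[:3, 3]))  # its translational magnitude
    return float(np.sqrt(np.mean(np.square(errs))))
```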

On the OMD, the results show that MVO accurately segments and estimates the motions of swinging blocks, even in scenes with complex dynamics. The KITTI experiments further demonstrate MVO's applicability to real-world autonomous driving, where it consistently estimates both the egomotion and the trajectories of third-party vehicles in dynamic environments.

Discussion and Implications

The paper's findings suggest that MVO is a robust tool for navigating and understanding highly dynamic environments without the need for appearance-based preprocessing. The ability to accurately estimate object motions in various settings extends the potential applications of MVO to fields such as autonomous driving, robotics, and any domain requiring precise motion tracking in cluttered, dynamic scenes.

Future research could focus on making MVO real-time by parallelizing the batch estimation and on integrating additional sensors to enhance robustness. Incorporating targeted appearance-based information could also improve the segmentation and tracking of independently moving objects whose motions are temporarily indistinguishable from one another.

In conclusion, MVO represents a significant advancement in addressing the MEP, providing a framework for accurate multimotion estimation that prioritizes motion over appearance. Its successful deployment in dynamic and occlusion-heavy environments sets the stage for broader applications in complex, real-world scenarios.
