Multimotion Visual Odometry (MVO) (2110.15169v4)
Abstract: Visual motion estimation is a well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation in highly dynamic environments. These environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Estimating third-party motions simultaneously with the sensor egomotion is difficult because an object's observed motion consists of both its true motion and the sensor motion. Most previous works in multimotion estimation simplify this problem by relying on appearance-based object detection or application-specific motion constraints. These approaches are effective in specific applications and environments but do not generalize well to the full multimotion estimation problem (MEP). This paper presents Multimotion Visual Odometry (MVO), a multimotion estimation pipeline that estimates the full SE(3) trajectory of every motion in the scene, including the sensor egomotion, without relying on appearance-based information. MVO extends the traditional visual odometry (VO) pipeline with multimotion segmentation and tracking techniques. It uses physically founded motion priors to extrapolate motions through temporary occlusions and identify the reappearance of motions through motion closure. Evaluations on real-world data from the Oxford Multimotion Dataset (OMD) and the KITTI Vision Benchmark Suite demonstrate that MVO achieves good estimation accuracy compared to similar approaches and is applicable to a variety of multimotion estimation challenges.
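The abstract's key idea of extrapolating a motion through temporary occlusion with a physically founded motion prior can be illustrated with a minimal sketch. This is not the paper's actual estimator (MVO uses continuous-time priors within a full batch pipeline); it is the simplest discrete analogue, a constant-velocity prior on SE(3), where the last observed frame-to-frame transform is assumed to persist and is integrated forward via the matrix exponential. The function names and the 4x4-matrix pose representation are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm, logm


def constant_velocity_extrapolate(T_prev, T_curr, n_steps):
    """Predict a pose n_steps frames past T_curr under a constant-velocity prior.

    Illustrative sketch (not the paper's estimator): the one-frame relative
    transform Delta = T_curr @ inv(T_prev) is assumed to stay constant while
    the object is occluded. Its SE(3) logarithm is a 4x4 twist matrix (a
    scaled body velocity), which is integrated forward with the exponential.
    """
    delta = T_curr @ np.linalg.inv(T_prev)   # last observed one-frame motion
    xi = np.real(logm(delta))                # se(3) twist; real part guards tiny
                                             # imaginary logm round-off
    return expm(n_steps * xi) @ T_curr       # predicted pose after n_steps frames
```

For a body moving along a fixed screw (constant twist `xi0`, so the true poses are `expm(k * xi0)`), the prediction reproduces the ground-truth future pose exactly; for real motions it only approximates them, which is why MVO pairs such priors with motion closure to re-identify objects when they reappear.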