
Multimotion Visual Odometry (MVO) (2110.15169v4)

Published 28 Oct 2021 in cs.RO

Abstract: Visual motion estimation is a well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation in highly dynamic environments. These environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Estimating third-party motions simultaneously with the sensor egomotion is difficult because an object's observed motion consists of both its true motion and the sensor motion. Most previous works in multimotion estimation simplify this problem by relying on appearance-based object detection or application-specific motion constraints. These approaches are effective in specific applications and environments but do not generalize well to the full multimotion estimation problem (MEP). This paper presents Multimotion Visual Odometry (MVO), a multimotion estimation pipeline that estimates the full SE(3) trajectory of every motion in the scene, including the sensor egomotion, without relying on appearance-based information. MVO extends the traditional visual odometry (VO) pipeline with multimotion segmentation and tracking techniques. It uses physically founded motion priors to extrapolate motions through temporary occlusions and identify the reappearance of motions through motion closure. Evaluations on real-world data from the Oxford Multimotion Dataset (OMD) and the KITTI Vision Benchmark Suite demonstrate that MVO achieves good estimation accuracy compared to similar approaches and is applicable to a variety of multimotion estimation challenges.


Summary

  • The paper introduces the MVO pipeline that extends traditional visual odometry to estimate SE(3) trajectories of multiple independently moving objects.
  • It combines multimotion segmentation with physically grounded motion priors, such as white-noise-on-acceleration (WNOA) and white-noise-on-jerk (WNOJ) models, to track motions through temporary occlusions.
  • Evaluations on the Oxford Multimotion Dataset and the KITTI benchmark demonstrate MVO's robustness and highlight its potential in autonomous driving and robotics.

Multimotion Visual Odometry (MVO) in Highly Dynamic Environments

The paper presents a comprehensive approach to the multimotion estimation problem (MEP) by introducing the Multimotion Visual Odometry (MVO) pipeline. Unlike traditional visual odometry (VO), which estimates only the sensor egomotion and assumes a largely static environment, MVO extends the VO pipeline to estimate the full SE(3) trajectory of every independent motion in the scene, including the egomotion.

Technical Overview

MVO integrates multimotion segmentation and tracking techniques into the visual odometry pipeline, estimating multiple motions simultaneously. Rather than relying on appearance-based object detection, it segments and estimates the scene directly by motion. Physically founded motion priors, such as the WNOA and WNOJ models, allow MVO to extrapolate object trajectories through temporary occlusions and to recognize motions when they reappear, a process the authors call motion closure.
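
To make the role of these priors concrete, the sketch below shows one way a constant-velocity (WNOA-style) prior could extrapolate an occluded object's SE(3) pose by holding the last estimated body-frame velocity fixed and integrating it forward. The helper names, the simple per-frame integration, and the example velocities are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed, not MVO's code): extrapolate an occluded object's
# SE(3) pose under a constant-velocity (WNOA-style) motion prior.
import numpy as np
from scipy.linalg import expm

def se3_hat(xi):
    """Map a 6-vector twist [v, w] to its 4x4 se(3) matrix."""
    v, w = xi[:3], xi[3:]
    W = np.array([[0, -w[2], w[1]],
                  [w[2], 0, -w[0]],
                  [-w[1], w[0], 0]])
    X = np.zeros((4, 4))
    X[:3, :3] = W
    X[:3, 3] = v
    return X

def extrapolate_pose(T_k, varpi, dt, steps):
    """Propagate pose T_k forward with a constant body-frame velocity varpi."""
    T = T_k.copy()
    for _ in range(steps):
        T = T @ expm(se3_hat(varpi) * dt)   # right-multiply the body-frame twist
    return T

# Example: an object last seen at the identity pose, moving 1 m/s forward while
# yawing at 0.1 rad/s, extrapolated for 10 frames at 30 Hz (values are made up).
T_last = np.eye(4)
varpi = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.1])
T_pred = extrapolate_pose(T_last, varpi, dt=1.0 / 30.0, steps=10)
```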

Segmentation in MVO is posed as a multilabeling problem that iteratively proposes, assigns, and merges motion labels over a k-nearest-neighbors graph of tracked feature points. The set of labels therefore adapts online, without prior knowledge of how many objects are present or what they are, which suits highly dynamic environments.
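
As a rough illustration of the label-assignment step (not MVO's actual energy formulation, which uses graph-cut style multilabel optimization), the following sketch assigns each tracked point the motion label with the lowest residual while encouraging agreement with its k nearest neighbours. The residual matrix, the smoothness weight, and the greedy update scheme are all assumptions made for illustration.

```python
# Illustrative label assignment over a k-NN graph (assumed, simplified scheme).
import numpy as np
from sklearn.neighbors import kneighbors_graph

def assign_labels(points, residuals, k=5, smoothness=1.0, iters=10):
    """points: (N,3) tracked points; residuals: (N,L) cost of explaining point i
    with motion label l. Returns an (N,) vector of label indices."""
    n, n_labels = residuals.shape
    knn = kneighbors_graph(points, k, mode="connectivity").tolil()
    labels = residuals.argmin(axis=1)            # data-term-only initial guess
    for _ in range(iters):                       # greedy, ICM-style refinement
        for i in range(n):
            neighbours = knn.rows[i]
            costs = residuals[i].copy()
            for l in range(n_labels):            # penalize disagreeing neighbours
                costs[l] += smoothness * sum(labels[j] != l for j in neighbours)
            labels[i] = costs.argmin()
    return labels
```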

Evaluation and Results

The paper evaluates MVO on the Oxford Multimotion Dataset (OMD) and the KITTI Vision Benchmark Suite, quantifying accuracy with metrics such as global odometric error and relative RMS error. On sequences with significant occlusions, MVO achieves estimation accuracy comparable to similar approaches and continues to track objects through the occlusions, a capability that traditional appearance-based approaches do not readily provide.
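
For concreteness, a relative RMS translational error can be computed roughly as below from aligned sequences of estimated and ground-truth poses; the exact error definitions, alignment, and frame conventions used in the paper may differ.

```python
# Sketch of a relative RMS translational error between two pose sequences
# (assumed form; inputs are aligned lists of 4x4 SE(3) matrices).
import numpy as np

def relative_rms_translation_error(est_poses, gt_poses):
    """RMS of the translational part of the per-frame relative-pose error."""
    errs = []
    for k in range(1, len(est_poses)):
        rel_est = np.linalg.inv(est_poses[k - 1]) @ est_poses[k]
        rel_gt = np.linalg.inv(gt_poses[k - 1]) @ gt_poses[k]
        delta = np.linalg.inv(rel_gt) @ rel_est    # residual relative transform
        errs.append(np.linalg.norm(delta[:3, 3]))  # its translational magnitude
    return float(np.sqrt(np.mean(np.square(errs))))
```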

On the OMD, the results show that MVO accurately segments and estimates the motions of swinging blocks, even in scenes with complex dynamics. The KITTI experiments further demonstrate MVO's applicability to real-world autonomous driving, where it consistently estimates both the egomotion and the trajectories of third-party vehicles in dynamic environments.

Discussion and Implications

The paper's findings suggest that MVO is a robust tool for navigating and understanding highly dynamic environments without the need for appearance-based preprocessing. The ability to accurately estimate object motions in various settings extends the potential applications of MVO to fields such as autonomous driving, robotics, and any domain requiring precise motion tracking in cluttered, dynamic scenes.

Future research could focus on making MVO real-time by parallelizing the batch estimation and on integrating additional sensors to enhance robustness. Incorporating targeted appearance-based information could also improve the segmentation and tracking of independently moving objects whose motions are temporarily indistinguishable from one another.

In conclusion, MVO represents a significant advancement in addressing the MEP, providing a framework for accurate multimotion estimation that prioritizes motion over appearance. Its successful deployment in dynamic and occlusion-heavy environments sets the stage for broader applications in complex, real-world scenarios.
