- The paper introduces PoseRBPF, which decouples 6D pose tracking into separate translation and rotation estimation for improved computational efficiency.
- It employs a factorized posterior with an auto-encoder codebook to rapidly evaluate likelihoods and robustly handle occlusions and object symmetries.
- Experiments on the YCB Video and T-LESS benchmarks show that PoseRBPF outperforms methods such as PoseCNN and DenseFusion while running in real time.
Overview of PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking
The paper presents PoseRBPF, an approach that applies a Rao-Blackwellized Particle Filter (RBPF) to 6D object pose tracking. The method estimates both the 3D translation and 3D rotation of objects from video sequences, and the work shows that decoupling the pose into translation and rotation components yields both computational efficiency and improved tracking accuracy.
The key to the approach is the factorized posterior of the Rao-Blackwellized Particle Filter: 3D translation hypotheses are sampled by particles, while each particle maintains a full discrete distribution over 3D rotations conditioned on its translation. This lets the filter represent uncertainty and multimodal distributions, such as those arising from object symmetries, where single-estimate pose estimation techniques struggle. By finely discretizing the rotation space and using a learned auto-encoder to build an embedding codebook, PoseRBPF tracks poses robustly under occlusions and viewpoint changes.
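To make the factorization concrete, the minimal Python sketch below shows one way to represent a Rao-Blackwellized particle: a sampled translation paired with an explicit discrete distribution over a rotation grid. The class name, tensor layout, and grid size are illustrative, not the authors' implementation; the 5-degree discretization (about 192k rotation bins) follows the setting described in the paper.

```python
import torch

# Illustrative sketch (not the authors' code): each Rao-Blackwellized particle
# carries a sampled 3D translation plus a full discrete distribution over a
# pre-discretized rotation grid (assumed 5-degree bins over azimuth,
# elevation, and in-plane rotation).
N_ROTATIONS = 72 * 37 * 72  # ~192k bins; adjust to your own discretization

class Particle:
    def __init__(self, translation):
        # 3D translation hypothesis sampled and propagated by the particle filter
        self.translation = torch.as_tensor(translation, dtype=torch.float32)
        # conditional rotation distribution p(R | T, observations), kept analytically
        self.rotation_dist = torch.full((N_ROTATIONS,), 1.0 / N_ROTATIONS)
        # importance weight, proportional to the marginal observation likelihood
        self.weight = 1.0
```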
Technical Summary
- Auto-Encoder Framework: An auto-encoder is trained to map real-world object views to their clean synthetic counterparts, and its encoder compresses each view into a compact feature embedding. A codebook of these embeddings, one per discretized rotation, captures the object's appearance across viewpoints and supports rapid likelihood evaluation across particles (a construction sketch follows this list).
- Factorized Approach: The 6D pose is decoupled into 3D translation and 3D rotation; translation is tracked by particle sampling, while each particle maintains a discrete rotation distribution updated from codebook-matching likelihoods.
- Efficient Likelihood Computation: Distributions over orientation are updated by comparing the encoded observation against embeddings for all discretized rotations in parallel on the GPU, keeping the filter fast enough for real-time operation at frame rates suitable for practical robot applications (see the update sketch after this list).
- Handling Symmetries and Occlusions: PoseRBPF does not require symmetry axes to be specified beforehand, so it can track full rotation distributions even for symmetric objects, addressing a major limitation of conventional pose estimation approaches.
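As a rough illustration of the codebook step referenced above, the sketch below renders a synthetic view for every discretized rotation, encodes it with the trained encoder, and stacks the L2-normalized embeddings into a matrix. The `renderer` and `encoder` interfaces are assumptions for this example; the paper builds its codebook offline from synthetic renderings in a similar spirit.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def build_codebook(encoder, renderer, rotations, device="cuda"):
    """Offline codebook construction (illustrative sketch, assumed interfaces).

    For each rotation in the discretized grid, render a synthetic view of the
    object at a canonical translation, encode it, and keep the normalized
    embedding. The result is an (N_rotations x D) matrix used for fast
    similarity lookups at tracking time.
    """
    codes = []
    for R in rotations:                              # e.g. ~192k discretized rotations
        img = renderer.render(R)                     # C x H x W synthetic view
        z = encoder(img.unsqueeze(0).to(device))     # 1 x D embedding
        codes.append(F.normalize(z, dim=1))
    return torch.cat(codes, dim=0)                   # N_rotations x D codebook
```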
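The per-particle measurement update can then be sketched as a single matrix product against that codebook: the image crop implied by the particle's translation hypothesis is encoded once, cosine similarities to all rotations are computed in parallel on the GPU, and the resulting pseudo-likelihood updates both the rotation distribution and the particle weight. The `crop_roi` helper and the softmax-with-temperature likelihood are assumptions for this example, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_particle(encoder, codebook, image, particle, camera, tau=0.1,
                    device="cuda"):
    """One measurement update for a single particle (illustrative sketch)."""
    # Crop the region of interest implied by the translation hypothesis
    # (hypothetical helper; depends on the camera model and object size).
    roi = crop_roi(image, particle.translation, camera)
    z = F.normalize(encoder(roi.unsqueeze(0).to(device)), dim=1)   # 1 x D
    sims = (codebook @ z.t()).squeeze(1)            # cosine similarity to every rotation
    likelihood = torch.softmax(sims / tau, dim=0)   # pseudo-likelihood over rotations
    # Rao-Blackwellized update: fold the likelihood into the conditional
    # rotation posterior and use its normalizer as the particle weight.
    posterior = particle.rotation_dist.to(device) * likelihood
    particle.weight = posterior.sum().item()
    particle.rotation_dist = (posterior / posterior.sum()).cpu()
    return particle
```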
Performance Evaluation
The effectiveness of PoseRBPF is validated on the YCB Video and T-LESS datasets, where it achieves state-of-the-art accuracy and outperforms existing methods such as PoseCNN and DenseFusion. Notably, it handles the challenges posed by symmetric and texture-poor objects as well as occlusions across a range of scenes.
Results improve further with RGB-D input, underscoring the benefit of depth information for translation estimation. PoseRBPF copes with the complexities of dynamic real-world scenes, supporting its versatility and robustness.
Implications and Future Directions
PoseRBPF contributes to robotic perception and autonomous systems by enabling reliable object pose tracking in cluttered or partially observed scenes, with direct applications in robotic manipulation, augmented reality, and autonomous navigation.
Future directions could include tighter integration with SLAM systems, scaling to larger sets of objects, and refining the auto-encoder architecture for more compact and discriminative embeddings. Given the growing importance of sensor fusion in robotics, incorporating additional sensory data (e.g., tactile) may also extend this work, and recent advances in reinforcement learning could improve robustness and adaptability in dynamic or unseen environments.