SilhoNet: An RGB Method for 6D Object Pose Estimation (1809.06893v4)

Published 18 Sep 2018 in cs.CV and cs.RO

Abstract: Autonomous robot manipulation involves estimating the translation and orientation of the object to be manipulated as a 6-degree-of-freedom (6D) pose. Methods using RGB-D data have shown great success in solving this problem. However, there are situations where cost constraints or the working environment may limit the use of RGB-D sensors. When limited to monocular camera data only, the problem of object pose estimation is very challenging. In this work, we introduce a novel method called SilhoNet that predicts 6D object pose from monocular images. We use a Convolutional Neural Network (CNN) pipeline that takes in Region of Interest (ROI) proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector. The 3D orientation is then regressed from the predicted silhouettes. We show that our method achieves better overall performance on the YCB-Video dataset than two state-of-the-art networks for 6D pose estimation from monocular image input.
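
To make the notion of a 6D pose concrete, the following minimal sketch shows how a 3D orientation and a 3D translation vector together map a point from the object frame into the camera frame. It illustrates the pose representation only and is not the SilhoNet network. The quaternion parameterization, the helper names `quat_to_matrix` and `apply_pose`, and all numeric values are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def quat_to_matrix(w, x, y, z):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The quaternion is normalized first, so any non-zero input is accepted.
    """
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def apply_pose(rotation, translation, point):
    """Map a point from the object frame to the camera frame: R * p + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


# Hypothetical pose: a 90-degree rotation about the camera z-axis,
# with the object 0.8 m in front of the camera and 0.1 m to the side.
half_angle = math.radians(90) / 2
R = quat_to_matrix(math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
t = [0.1, 0.0, 0.8]

# A model point on the object's x-axis, expressed in the object frame.
p_cam = apply_pose(R, t, [1.0, 0.0, 0.0])
print([round(v, 6) for v in p_cam])
```

The three rotational degrees of freedom in `R` plus the three translational ones in `t` make up the six degrees of freedom that a pose estimator has to recover.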

Authors

Gideon Billings
Matthew Johnson-Roberson
