EfficientPose: An efficient, accurate and scalable end-to-end 6D multi object pose estimation approach (2011.04307v2)

Published 9 Nov 2020 in cs.CV

Abstract: In this paper we introduce EfficientPose, a new approach for 6D object pose estimation. Our method is highly accurate, efficient and scalable over a wide range of computational resources. Moreover, it can detect the 2D bounding box of multiple objects and instances as well as estimate their full 6D poses in a single shot. This eliminates the significant increase in runtime when dealing with multiple objects other approaches suffer from. These approaches aim to first detect 2D targets, e.g. keypoints, and solve a Perspective-n-Point problem for their 6D pose for each object afterwards. We also propose a novel augmentation method for direct 6D pose estimation approaches to improve performance and generalization, called 6D augmentation. Our approach achieves a new state-of-the-art accuracy of 97.35% in terms of the ADD(-S) metric on the widely-used 6D pose estimation benchmark dataset Linemod using RGB input, while still running end-to-end at over 27 FPS. Through the inherent handling of multiple objects and instances and the fused single shot 2D object detection as well as 6D pose estimation, our approach runs even with multiple objects (eight) end-to-end at over 26 FPS, making it highly attractive to many real world scenarios. Code will be made publicly available at https://github.com/ybkscht/EfficientPose.
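The abstract reports accuracy under the ADD(-S) metric without defining it. As commonly used on Linemod, ADD is the mean distance between the model points transformed by the ground-truth pose and by the predicted pose, while ADD-S uses the distance to the closest point and is applied to symmetric objects. The sketch below illustrates that definition. The function names are hypothetical, and the 10% of object diameter threshold is the usual convention, assumed here rather than taken from this page.

```python
import numpy as np


def transform(points, R, t):
    """Apply a rigid transform (R, t) to an (N, 3) array of model points."""
    return points @ R.T + t


def add_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    diff = transform(points, R_gt, t_gt) - transform(points, R_pred, t_pred)
    return float(np.linalg.norm(diff, axis=1).mean())


def add_s_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its closest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    # Brute-force (N, N) pairwise distances; fine for a small illustrative point set.
    pairwise = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return float(pairwise.min(axis=1).mean())


def pose_is_correct(points, diameter, gt_pose, pred_pose, symmetric=False, threshold=0.1):
    """Assumed convention: a pose counts as correct if the error is below threshold * diameter."""
    error_fn = add_s_error if symmetric else add_error
    return bool(error_fn(points, *gt_pose, *pred_pose) < threshold * diameter)
```

A dataset-level ADD(-S) score is then the fraction of test poses for which `pose_is_correct` returns `True`, with `symmetric=True` for the symmetric objects only.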

Authors (2)

Yannick Bukschat
Marcus Vetter

Citations (97)

Summary

Overview of EfficientPose: An End-to-End 6D Multi-Object Pose Estimation Approach

The paper presents EfficientPose, a method for efficient and accurate 6D object pose estimation. Building on the EfficientDet architecture for 2D object detection, EfficientPose estimates both the 2D bounding boxes and the full 6D poses of multiple objects and instances in a single shot. This addresses a limitation of conventional two-stage methods, which first detect 2D targets such as keypoints and then solve a Perspective-n-Point (PnP) problem separately for each object, so their runtime grows with the number of objects.

Key Contributions

Single-Shot Multi-Object 6D Pose Estimation: EfficientPose adds two subnetworks to the EfficientDet framework, one predicting translation and one predicting rotation, so that detection and pose estimation for all objects happen in the same forward pass. As a result, runtime barely increases as the number of objects grows: the paper reports over 27 FPS end-to-end for a single object and over 26 FPS with eight objects.
Scalability: The method retains EfficientDet's scalability, in which a single hyperparameter ($\phi$) scales the model to the available computational resources. EfficientPose can therefore be tuned for different trade-offs between speed and accuracy.
6D Augmentation Technique: The paper introduces a 6D data augmentation strategy for direct pose estimation approaches. It rotates and scales training images in image space while keeping the 6D pose annotations consistent with the augmented images, which improves performance and generalization. The augmentation closes the performance gap between direct 6D pose estimation and 2D+PnP methods and is particularly beneficial for small datasets such as Linemod.
High Accuracy with State-of-the-Art Results: On the Linemod dataset, EfficientPose achieves a state-of-the-art accuracy of 97.35% under the ADD(-S) metric using RGB input only. It matches or exceeds existing 2D+PnP-based methods while still running end-to-end at over 27 FPS.
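
To make the 6D augmentation idea concrete, the sketch below shows one way to keep a pose annotation consistent with an image that is rotated and scaled about the principal point. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera with equal focal lengths and no distortion, measures the angle in pixel coordinates (u to the right, v downward), and uses hypothetical function names. The rotation is exact under these assumptions; the scaling is approximated by moving the object along the optical axis, which is exact only for the object center.

```python
import numpy as np


def augment_pose(R, t, angle_deg, scale):
    """Pose matching an image rotated by angle_deg and scaled by `scale`
    about the principal point (assumes fx == fy and no lens distortion)."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    # In-plane image rotation corresponds to a rotation about the camera z-axis.
    delta_R = np.array([[c, -s, 0.0],
                        [s, c, 0.0],
                        [0.0, 0.0, 1.0]])
    R_aug = delta_R @ R
    t_aug = delta_R @ t
    # Approximate image scaling by changing the object's distance to the camera.
    t_aug[2] /= scale
    return R_aug, t_aug


def project(K, R, t, points):
    """Project (N, 3) model points to pixel coordinates with intrinsics K."""
    cam = points @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]


def transform_pixels(uv, K, angle_deg, scale):
    """Apply the same rotation and scaling to pixel coordinates about the principal point."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot2d = np.array([[c, -s],
                      [s, c]])
    center = K[:2, 2]
    return ((uv - center) @ rot2d.T) * scale + center
```

Under these assumptions, projecting the object center with the pose from `augment_pose` gives the same pixel as applying `transform_pixels` to its original projection. Points away from the center agree exactly only when `scale` is 1, because scaling an image does not change the perspective the way moving the object does.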

Implications and Future Directions

EfficientPose advances object pose estimation by combining efficient 2D object detection with full 6D pose estimation in one network, without the post-processing steps that other methods rely on, such as PnP and RANSAC. Its runtime efficiency and its ability to handle multiple objects make it well suited to practical applications in robotics, augmented reality, and autonomous driving, where real-time performance and scalability are crucial.

Scalable and efficient pose estimation models like EfficientPose could make 6D pose estimation easier to deploy in real-world systems. Future research could explore integrating such methods into edge computing environments, where computational resources are limited. Extending the EfficientPose framework to incorporate depth data or other sensor modalities could further improve its precision and its applicability in complex environments.

EfficientPose's architecture points toward efficient, scalable computer vision frameworks that unify several sub-tasks, here 2D detection and 6D pose estimation, in a single model.

GitHub

GitHub - ybkscht/EfficientPose (276 stars)