DynOPETs: A Comprehensive Dataset for Dynamic Object Pose Estimation and Tracking
DynOPETs introduces a versatile benchmark specifically designed for dynamic object pose estimation and tracking in scenarios with moving cameras. Such settings are common and challenging, especially in robotics and augmented reality (AR), yet existing pose estimation datasets largely assume static environments. The paper identifies this gap, and the introduction of DynOPETs together with its data acquisition and annotation pipeline is a significant step toward closing it.
The central contribution of DynOPETs lies in its dataset composition: RGB-D video sequences, CAD models, and simultaneous pose annotations for both dynamic objects and moving camera trajectories. Particularly notable is the efficient annotation method, which refines pseudo-labels with pose graph optimization to improve the precision of the object pose data. The approach fuses absolute pose estimation, refined by a global EKF smoother, with a relative pose estimator based on point tracking, yielding accurate, marker-free dynamic object pose labels while minimizing manual annotation effort.
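To make the fusion step concrete, below is a minimal sketch of the underlying idea, not the authors' implementation: per-frame absolute pose estimates act as unary factors and frame-to-frame relative estimates from point tracking act as binary factors in a least-squares pose graph. For brevity the toy operates on 3-DoF translations only and uses illustrative noise levels and weights; the paper's pipeline works on full 6-DoF poses.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x, abs_meas, rel_meas, w_abs=1.0, w_rel=5.0):
    """Stack unary (absolute) and binary (relative) residuals."""
    t = x.reshape(-1, 3)                                    # per-frame translations
    r_abs = w_abs * (t - abs_meas).ravel()                  # stay near absolute estimates
    r_rel = w_rel * ((t[1:] - t[:-1]) - rel_meas).ravel()   # match tracked inter-frame motion
    return np.concatenate([r_abs, r_rel])

# Synthetic trajectory: jittery per-frame absolute estimates, cleaner relative steps.
rng = np.random.default_rng(0)
gt = np.cumsum(np.full((50, 3), 0.02), axis=0)              # smooth ground-truth motion
abs_meas = gt + rng.normal(0.0, 0.05, gt.shape)
rel_meas = np.diff(gt, axis=0) + rng.normal(0.0, 0.005, (49, 3))

sol = least_squares(residuals, abs_meas.ravel(), args=(abs_meas, rel_meas))
refined = sol.x.reshape(-1, 3)
print("absolute-only RMSE:", np.sqrt(np.mean((abs_meas - gt) ** 2)))
print("refined RMSE:      ", np.sqrt(np.mean((refined - gt) ** 2)))
```

Because the point-tracking-derived relative measurements are far less noisy than the per-frame absolute estimates, the optimized trajectory exhibits much lower error; in the full pipeline the same structure is solved over SE(3), with rotations handled on the manifold rather than as plain vectors.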
Several key aspects underpin the contributions of DynOPETs:
- Efficient Annotation Pipeline: The pipeline combines absolute pose estimation from CAD models with relative pose estimation via point tracking, then refines the result through pose graph optimization (sketched above). This marker-free approach labels dynamic objects without exhaustive manual intervention, improving annotation efficiency in complex moving-camera scenarios.
- Dataset Richness: The dataset comprises RGB-D sequences of 175 distinct objects captured while both the objects and the camera move continuously, which substantially raises the difficulty of the pose estimation task. Diverse object categories, detailed CAD models, and synchronized 6-DoF pose annotations allow the dataset to support a wide range of object pose estimation tasks (a hypothetical per-frame record is sketched after this list).
- Comprehensive Benchmarks: The paper systematically evaluates state-of-the-art methods on the dataset, employing metrics such as 3D IoU for category-level object pose estimation and Average Recall for unseen-object pose estimators (see the IoU sketch after this list). The benchmark results underscore DynOPETs' value for stress-testing pose estimation models in dynamically complex scenarios.
- Implications for Future Research: DynOPETs facilitates the exploration of advanced pose estimation techniques in real-world dynamic settings, with practical implications for embodied intelligence systems, robotic manipulation, and human-object interaction in AR environments. The dataset encourages further development of algorithms that cope with motion during visual data capture.
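As a reading aid for the dataset description above, here is a hypothetical per-frame record for a DynOPETs-style sequence. The field names and layout are assumptions for illustration, not the dataset's actual schema; the helper method shows why synchronized camera poses matter when both the object and the camera move.

```python
# Hypothetical per-frame record; field names are illustrative assumptions,
# not the dataset's actual schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class AnnotatedFrame:
    rgb: np.ndarray          # (H, W, 3) uint8 color image
    depth: np.ndarray        # (H, W) depth in meters
    K: np.ndarray            # (3, 3) camera intrinsics
    T_world_cam: np.ndarray  # (4, 4) camera pose (camera-to-world)
    T_world_obj: np.ndarray  # (4, 4) 6-DoF object pose (object-to-world)
    cad_model_path: str      # CAD mesh of the tracked object

    def object_pose_in_camera(self) -> np.ndarray:
        # Since the object and the camera both move, the object pose must be
        # re-expressed in the camera frame at every time step:
        # T_cam_obj = T_world_cam^{-1} @ T_world_obj
        return np.linalg.inv(self.T_world_cam) @ self.T_world_obj
```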
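And a back-of-the-envelope look at the 3D IoU metric named in the benchmark item: the sketch below computes IoU for two axis-aligned boxes. Category-level benchmarks typically evaluate oriented boxes derived from the predicted and ground-truth poses, so this is only the axis-aligned special case.

```python
import numpy as np

def iou_3d(min_a, max_a, min_b, max_b):
    """IoU of two axis-aligned 3D boxes given by their min/max corners."""
    inter = np.clip(np.minimum(max_a, max_b) - np.maximum(min_a, min_b), 0, None)
    inter_vol = inter.prod()
    vol_a = (max_a - min_a).prod()
    vol_b = (max_b - min_b).prod()
    return inter_vol / (vol_a + vol_b - inter_vol)

# A prediction typically counts as correct when its IoU exceeds a
# threshold such as 0.25 or 0.50, and accuracy is the fraction passing.
a_min, a_max = np.zeros(3), np.ones(3)
b_min, b_max = np.full(3, 0.2), np.full(3, 1.2)
print(iou_3d(a_min, a_max, b_min, b_max))  # ~0.34
```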
The introduction of DynOPETs is timely and relevant: it pushes pose estimation methodology toward dynamic environments where conventional datasets fall short. The availability of such a diverse and meticulously annotated dataset should drive the development of more robust AI systems that perceive and interact with changing environments effectively. Moreover, extending the annotations to include human hand poses alongside object tracking could further bolster AR/MR interfaces, an exciting trajectory for future research in these fields.