IndustryShapes: An RGB-D Benchmark dataset for 6D object pose estimation of industrial assembly components and tools

Published 5 Feb 2026 in cs.CV and cs.RO | (2602.05555v1)

Abstract: We introduce IndustryShapes, a new RGB-D benchmark dataset of industrial tools and components, designed for both instance-level and novel object 6D pose estimation approaches. The dataset provides a realistic and application-relevant testbed for benchmarking these methods in the context of industrial robotics, bridging the gap between lab-based research and deployment in real-world manufacturing scenarios. Unlike many previous datasets that focus on household or consumer products, use synthetic, clean tabletop scenes, or contain objects captured solely in controlled lab environments, IndustryShapes introduces five new object types with challenging properties, also captured in realistic industrial assembly settings. The dataset has diverse complexity, from simple to more challenging scenes, with single and multiple objects, including scenes with multiple instances of the same object, and it is organized in two parts: the classic set and the extended set. The classic set includes a total of 4.6k images and 6k annotated poses. The extended set introduces additional data modalities to support the evaluation of model-free and sequence-based approaches. To the best of our knowledge, IndustryShapes is the first dataset to offer RGB-D static onboarding sequences. We further evaluate the dataset on a representative set of state-of-the-art methods for instance-based and novel object 6D pose estimation, as well as object detection and segmentation, showing that there is room for improvement in this domain. The dataset page can be found at https://pose-lab.github.io/IndustryShapes.


Authors (5)

Summary

The paper presents a novel benchmark dataset that combines real and synthetic industrial images to tackle challenges in 6D object pose estimation.
It rigorously evaluates both instance-based and model-free approaches using metrics such as ADD, VSD, MSSD, and MSPD within realistic industrial setups.
The dual-mode dataset bridges controlled lab imagery and real-world environments, paving the way for advancements in industrial automation and robust pose estimation.

IndustryShapes: An RGB-D Benchmark Dataset for 6D Object Pose Estimation

Overview of IndustryShapes

The paper "IndustryShapes: An RGB-D Benchmark dataset for 6D object pose estimation of industrial assembly components and tools" (2602.05555) presents a new RGB-D benchmark dataset for 6D object pose estimation in industrial environments. It is motivated by a scarcity of data that bridges the gap between controlled laboratory research and real-world manufacturing scenarios. Prior datasets in this domain have largely been limited to household objects or basic industrial components and lack the complexity and variability of actual industrial settings. IndustryShapes introduces five new objects with complex geometries and challenging reflective surfaces, offering a testbed for both instance-level and novel object pose estimation methods.

Figure 1: Pose distribution per object. Visualization of the overall spherical viewpoint coverage of the complete IndustryShapes dataset in Mollweide projection, indicating the density and pose variation.

Dataset Features and Structure

IndustryShapes is structured into two main components: the classic set and the extended set. The classic set combines 4,623 real images with synthetic data and provides roughly 6,000 annotated poses. The images are captured both under laboratory conditions and in realistic industrial environments, which makes it possible to evaluate how well a model generalizes from controlled conditions to less structured real-world settings. The extended set adds further data modalities that support model-free and sequence-based approaches, with over 10,000 annotations and RGB-D static onboarding sequences. According to the authors, IndustryShapes is the first dataset to offer such onboarding sequences, which enable model-free evaluation that does not rely solely on 3D CAD models of the objects.

Figure 2: Distribution of object-to-camera distances for annotated poses, grouped by object (1 to 5 from left to right). Top row: annotated poses in the training (blue) and test (magenta) data of the classic set. Bottom row: annotated poses of the classic set (orange) and the extended set (yellow).
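The viewpoint and distance statistics summarized in the two figure captions can be derived from the annotated poses alone. The following is a minimal sketch, not the authors' code: it assumes each pose is given as a rotation matrix `R` and translation `t` that map object coordinates to camera coordinates, and the function name `camera_viewpoint` is hypothetical. It uses only the Python standard library.

```python
import math


def camera_viewpoint(R, t):
    """Return (longitude, latitude, distance) of the camera as seen from the object.

    Assumption: the pose maps object to camera coordinates, x_cam = R x_obj + t,
    so the camera centre expressed in the object frame is c = -R^T t.
    Angles are in radians; longitude is undefined when the camera lies
    exactly on the object's z-axis.
    """
    # c_i = -sum_j R[j][i] * t[j]  (i.e. -R^T t)
    c = [-sum(R[j][i] * t[j] for j in range(3)) for i in range(3)]
    dist = math.sqrt(sum(v * v for v in c))
    if dist == 0:
        raise ValueError("camera centre coincides with the object origin")
    lon = math.atan2(c[1], c[0])   # azimuth around the object's z-axis
    lat = math.asin(c[2] / dist)   # elevation above the object's xy-plane
    return lon, lat, dist


# Example: identity rotation, object origin one unit along the camera's -x axis.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
lon, lat, dist = camera_viewpoint(identity, (-1, 0, 0))
```

The resulting longitude and latitude pairs are what a spherical coverage plot would bin or scatter, and `dist` is the object-to-camera distance; because a rotation preserves length, it equals the norm of `t`.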

Compared with existing datasets such as T-LESS and ITODD, IndustryShapes emphasizes unstructured industrial realism and challenging object properties rather than sheer object quantity. Its assembly scenes replicate real robotic workstation setups, which makes the benchmark more demanding and requires methods to cope with the scene complexity typical of industrial workflows.

Benchmarking and Evaluation

The authors benchmark state-of-the-art methods on IndustryShapes: EPOS, DOPE, and ZebraPose as instance-based methods, and FoundPose and FoundationPose as model-based and model-free novel object methods, respectively. The evaluation follows the BOP challenge protocol, using the pose error metrics VSD, MSSD, and MSPD together with the widely used ADD.
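To make two of these metrics concrete, here is a minimal, dependency-free sketch of ADD and MSSD as they are commonly defined. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function names are hypothetical, poses are assumed to map model coordinates to camera coordinates, and the symmetry list must be supplied by the caller and include the identity.

```python
import math


def transform(R, t, pts):
    """Apply the rigid transform x -> R x + t to each 3D point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in pts
    ]


def add_error(R_est, t_est, R_gt, t_gt, pts):
    """ADD: mean distance between corresponding model points under both poses."""
    est = transform(R_est, t_est, pts)
    gt = transform(R_gt, t_gt, pts)
    return sum(math.dist(a, b) for a, b in zip(est, gt)) / len(pts)


def mssd_error(R_est, t_est, R_gt, t_gt, pts, symmetries):
    """MSSD: maximum point distance, minimized over the object's symmetries.

    `symmetries` is a list of (R_s, t_s) transforms that map the model onto
    itself (assumed to include the identity).
    """
    est = transform(R_est, t_est, pts)
    best = math.inf
    for R_s, t_s in symmetries:
        gt = transform(R_gt, t_gt, transform(R_s, t_s, pts))
        best = min(best, max(math.dist(a, b) for a, b in zip(est, gt)))
    return best


# Toy example: a model that is symmetric under a 180-degree turn about z.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
RZ180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
ZERO = (0, 0, 0)
model = [(1, 0, 0), (-1, 0, 0), (0, 0, 1)]

# The estimate is rotated by 180 degrees relative to the ground truth.
add = add_error(RZ180, ZERO, I3, ZERO, model)
mssd = mssd_error(RZ180, ZERO, I3, ZERO, model, [(I3, ZERO), (RZ180, ZERO)])
```

In this toy case ADD penalizes the flipped estimate because it compares fixed point correspondences, while MSSD is zero because the flip is one of the listed symmetries. This is the kind of difference that matters for symmetric industrial parts.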

Instance-level methods performed unevenly across the dataset: EPOS and ZebraPose achieved competitive results, while DOPE trailed, which the summary attributes to its reliance on image-based keypoint extraction without CAD model support. FoundPose and FoundationPose performed comparably to the instance-based approaches without requiring retraining on the target dataset, illustrating the progress of novel object pose estimation methods.

For detection and segmentation, CNOS and SAM-6D were evaluated and showed moderate performance on both the classic and extended sets. SAM-6D was stronger in segmentation accuracy, which is attributed to its foundation on SAM: it produced more precise masks than CNOS and supported robust localization even in cluttered industrial scenes.

Implications and Future Directions

IndustryShapes lays a foundation for advancing 6D object pose estimation in industrial applications by addressing limitations of prior work, which focused primarily on controlled environments and familiar objects. It could encourage the deployment of 6D pose estimation models in dynamic manufacturing settings and the development of models that robustly handle symmetries, reflective surfaces, and the real-world clutter typical of industrial scenarios.

While the dataset is a significant step, future efforts could expand its diversity, particularly with scenarios involving multiple interacting robots or more complex robotic manipulations. Addressing current limitations in scene representation could further improve model training and bring real-time, robust pose estimation for industrial automation closer.

Conclusion

IndustryShapes contributes a valuable resource to 6D object pose estimation. By combining realistic industrial challenges with features such as RGB-D static onboarding sequences, it supports methodological advances in both research and practical applications. The benchmark results provide a snapshot of current method capabilities and indicate where further improvement is needed.
