IMD Benchmark for Industrial Robotics

Updated 16 September 2025

IMD Benchmark is a comprehensive evaluation framework designed for 6-DoF pose estimation and video segmentation in industrial robotics using metallic, textureless objects.
It utilizes 45 industrial components with precise CAD models and varied capture setups from an Intel RealSense sensor and ABB robotic arm to simulate real-world conditions.
Benchmark results reveal that state-of-the-art algorithms struggle with specular reflections, unreliable depth maps, and occlusion, highlighting the need for more robust methods.

The IMD benchmark, built on the Industrial Metallic Dataset (IMD), provides a comprehensive evaluation framework for 6-DoF (degree-of-freedom) pose estimation and video segmentation in industrial robotics. It differs from prior benchmarks by focusing on the metallic, textureless, highly reflective objects typical of industrial environments, and in doing so exposes the limitations of existing pose estimation and segmentation algorithms when they are applied outside traditional household-object scenarios.

1. Dataset Composition

IMD comprises 45 true-to-scale metallic industrial components augmented with precise CAD models. Object selection spans realistic industrial parts, facilitating robust annotation for segmentation and pose estimation tasks. The acquisition protocol reflects diverse, real-world scenarios, including:

Scene types cover single-object sequences, arrangements grouped by shape, random object clusters (~5 objects per group), and "cluttered" setups (all objects together, analogous to conveyor lines).
Sequences were captured with an Intel RealSense D405 RGB-D sensor mounted on an ABB GoFa CRB 15000 industrial robotic arm, which follows square and inclined circular paths to produce both top-down and 45-degree angled camera views.
In total, 110 video sequences and 256 object sequences were annotated, all collected under natural indoor daylight to preserve specular effects and authentic lighting artefacts.

The inclusion of high-reflectance, textureless objects challenges conventional feature-based object recognition, making IMD fundamentally more difficult than widely used datasets built around household or everyday scenes (e.g., YCB, DAVIS).

2. Supported Benchmark Tasks

IMD supports three primary evaluation tasks:

a. Video Object Segmentation

Models receive a ground-truth mask for the initial frame and must propagate the segmentation across the rest of the video. Performance is measured by Intersection over Union (IoU): $\text{IoU} = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}$, where $B_p$ is the predicted mask and $B_{gt}$ is the ground-truth mask.
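The IoU definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the benchmark's own evaluation code: the function name `mask_iou` is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not one stated in the source.

```python
import numpy as np


def mask_iou(pred, gt):
    """IoU between two binary masks of identical shape (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("predicted and ground-truth masks must share a shape")

    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection / union)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [1, 0]]
    # One shared pixel out of three covered pixels.
    print(mask_iou(predicted, ground_truth))
```

Casting both inputs to boolean arrays lets the same function accept 0/1 integer masks, boolean masks, or nested lists.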

b. 6D Pose Tracking

BundleTrack and BundleSDF are used to compute the object's canonical pose frame by frame:

Given the camera pose $T_{wi}^c = \begin{bmatrix} R_{wc_i} & t_{wc_i} \\ 0 & 1 \end{bmatrix}$ and the fixed object pose $T_w^o = \begin{bmatrix} R_{wo} & t_{wo} \\ 0 & 1 \end{bmatrix}$, the object pose relative to the camera is $T_c^o = (T_{wi}^c)^{-1} T_w^o$. Evaluation uses translation error (Euclidean centroid distance) and rotation error (matrix angular deviation, in degrees).
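The pose composition and the two error measures above can be sketched with NumPy as follows. All function names are hypothetical. Two assumptions are made that the source does not spell out: the object frame's origin is taken to be the object centroid, so the centroid distance reduces to the distance between translation vectors, and the rotation error is taken to be the geodesic angle of the relative rotation, recovered from its trace.

```python
import numpy as np


def make_transform(R, t):
    """Build a 4x4 homogeneous transform [R t; 0 1]."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


def invert_transform(T):
    """Invert a rigid transform using R^T and -R^T t."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def object_in_camera(T_wc, T_wo):
    """T_c^o = (T_w^c)^{-1} T_w^o for one frame."""
    return invert_transform(T_wc) @ T_wo


def translation_error(T_est, T_gt):
    """Euclidean distance between translations (same unit as the inputs)."""
    return float(np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3]))


def rotation_error_deg(T_est, T_gt):
    """Angle of the relative rotation R_est^T R_gt, in degrees (assumed metric)."""
    R_delta = T_est[:3, :3].T @ T_gt[:3, :3]
    # Clip guards against values slightly outside [-1, 1] from round-off.
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


if __name__ == "__main__":
    T_wc = make_transform(np.eye(3), [0.0, 0.0, 1.0])  # camera 1 unit along z
    T_wo = make_transform(np.eye(3), [0.0, 0.0, 0.0])  # object at world origin
    print(object_in_camera(T_wc, T_wo))
```

If translations are supplied in millimetres, `translation_error` returns millimetres, matching the units used in the results reported later.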

c. One-shot 6D Pose Estimation

Pose must be inferred from a single frame (potentially supplemented by a CAD model), simulating scenarios where only a one-time observation is possible. Models are initialized using the first 50% of frames before being tested on the remaining single frames. This protocol focuses on robustness without temporal information.
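The split described above can be sketched as a simple partition of a sequence's frames. The function name `split_one_shot` and the `init_fraction` parameter are hypothetical, and rounding the split point down for odd-length sequences is an assumption; the source only states that the first 50% of frames are used for initialization.

```python
def split_one_shot(frames, init_fraction=0.5):
    """Split a frame list into an initialization part and single test frames.

    The split index is rounded down, which is an assumed convention.
    """
    if not 0.0 < init_fraction < 1.0:
        raise ValueError("init_fraction must lie strictly between 0 and 1")
    n_init = int(len(frames) * init_fraction)
    init_frames = frames[:n_init]
    # Each remaining frame is evaluated on its own, without temporal context.
    test_frames = frames[n_init:]
    return init_frames, test_frames


if __name__ == "__main__":
    init_frames, test_frames = split_one_shot(list(range(10)))
    print(len(init_frames), len(test_frames))
```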

3. Model Evaluation and Comparative Results

State-of-the-art segmentation models (XMem, SAM2) and pose tracking models (BundleTrack, BundleSDF) are profiled:

Segmentation: On IMD, SAM2 reaches a mean IoU of 0.770, outperforming XMem (0.746), and shows superior recall at an IoU threshold of 0.5. However, SAM2 incurs higher memory usage and inference latency, a key constraint in industrial deployment.
6D Pose Tracking: On YCB-Video, BundleTrack achieves translation and rotation errors of 2.26 mm and 4.48°, significantly outperforming BundleSDF (5.64 mm and 8.09°). On IMD, tracking errors increase sharply: BundleTrack shows 32.23 mm translation error and 49.17° rotation error on the challenging 45-degree views, reflecting the adverse impact of reflectance, lack of texture, and viewpoint changes.
One-shot Pose Estimation: BundleSDF is more robust than BundleTrack in this setting, achieving 6.80 mm translation error and 17.61° rotation error on YCB-Video; BundleTrack frequently fails without temporal context, defaulting to fixed poses.
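The two segmentation figures quoted above, mean IoU and recall at an IoU threshold of 0.5, can be aggregated from per-frame IoU values as sketched below. The function name `summarize_ious` is hypothetical, and recall is assumed here to mean the fraction of frames whose IoU meets or exceeds the threshold; the source does not define it explicitly.

```python
import numpy as np


def summarize_ious(ious, threshold=0.5):
    """Return mean IoU and recall (assumed: share of frames with IoU >= threshold)."""
    ious = np.asarray(ious, dtype=float)
    if ious.size == 0:
        raise ValueError("at least one IoU value is required")
    return {
        "mean_iou": float(ious.mean()),
        "recall": float((ious >= threshold).mean()),
    }


if __name__ == "__main__":
    # Illustrative per-frame values only, not benchmark results.
    print(summarize_ious([0.9, 0.4, 0.6, 0.5]))
```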

These results highlight that existing state-of-the-art models, generally developed and evaluated on non-industrial objects, struggle to generalize to true industrial settings.

4. Computational and Algorithmic Challenges

The IMD benchmark exposes several fundamental issues:

Highly reflective, textureless surfaces introduce pronounced difficulties for RGB-D sensors, degrading depth fidelity and undermining feature extraction essential for pose estimation.
Industrial arrangements promote occlusion, specular highlights, and ambiguous silhouettes, exacerbating segmentation errors and inducing pose drift.
Algorithms trained on household objects cannot reliably transfer learned representations to industrial artifacts, evidenced by marked performance drops in cross-benchmark tests.

A plausible implication is that robust pose estimation in industrial scenarios requires algorithms resilient to unreliable depth maps, specular artefacts, and minimal texture content.

5. Future Research Directions

IMD catalyzes further investigation into:

Development of depth-insensitive feature extraction pipelines and novel matching algorithms robust to specular reflection.
Enhanced pose tracking algorithms capable of maintaining accuracy under severe viewpoint variation and frequent occlusion.
Advancement of one-shot 6D pose estimation methodologies, vital for industrial deployments where repeated observations may not be feasible.
Integration of the IMD dataset as a reference baseline for benchmarking algorithms in tasks such as bin picking, assembly, and autonomous inspection, directly serving domains where robust robotic perception of metallic components is paramount.

The dataset not only quantifies algorithmic limitations in industrial robotics but sets a new standard for the generalizability requirements of segmentation and pose estimation models demanded by modern manufacturing environments. The benchmarking tasks and results provided by IMD thus anchor future work in algorithm design, evaluation, and deployment for industrial robotics applications.
