MR6D Dataset: Industrial 6D Pose Estimation

Updated 6 September 2025
  • MR6D dataset is a benchmark for instance-level 6D pose estimation in industrial mobile robotics, addressing long-range perception, diverse camera angles, and severe occlusions.
  • It features 92 real-world scenes with 16 unique industrial objects, high-fidelity 3D mesh models, and precise calibration for both static and dynamic conditions.
  • MR6D advances research by exposing challenges in segmentation and pose estimation, highlighting issues like motion blur, occlusion errors, and depth sparsity in mobile robot settings.

MR6D is a benchmark dataset expressly constructed for instance-level 6D pose estimation in industrial mobile robotics settings. It addresses the marked limitations of previous datasets, which predominantly focus on household objects suited for robotic manipulator arms, thereby omitting the operational realities of mobile robots—specifically, the need for long-range perception, interaction with large, standardized industrial objects, and management of diverse camera viewpoints and severe occlusions. MR6D comprises 92 real-world scenes featuring 16 unique industrial objects under both static and dynamic conditions, and provides high-fidelity 3D mesh models alongside precise calibration data. Its structure, challenges, and evaluation methodologies set a new standard for the pose estimation community, fostering advancements that meet the demands of practical mobile robotic deployments (Gouda et al., 19 Aug 2025).

1. Dataset Design, Object Types, and Scene Subsets

MR6D is curated for industrial use cases requiring mobile robots to interact with objects not amenable to traditional manipulator grasping. The object set includes Euro pallets (manually modeled), multiple variants of Euro-standard KLT bins, an Amazon Basics suitcase, and a selection of IKEA storage items. All but the Euro pallet are reconstructed with BundleSDF utilizing high-precision Zivid 2 depth data.

Scenes are partitioned into four evaluation subsets:

  • Validation Static: controlled research hall with VICON motion capture; static scenes with precise pose ground truth.
  • Dynamic Test: human-robot interaction with camera and objects in motion; motion blur and dynamic occlusion.
  • O³dyn Test: low camera mounts (e.g., on an AGV) with indoor/outdoor lighting; sparse depth and variable illumination.
  • MR Test: simulated mobile robot trajectories; varied viewpoints and approach interactions.

Objects are selected to reflect tasks for which mobile robots are the only platform capable of manipulation, thus requiring tailored perception and scene understanding approaches.

2. Industrial Mobile Robotics: Specific Challenges Addressed

MR6D is engineered to encapsulate the following core difficulties in mobile robotic pose estimation:

  • Long-Range Perception: Objects are perceived from greater distances than in manipulator scenarios, requiring models to maintain accuracy despite reduced spatial resolution and increased object scale.
  • Diverse Camera Perspectives: Mobile robots encounter non-standard viewpoints—such as low-angle approaches—altering object visibility compared to traditional, downward-looking arm scenarios.
  • Occlusion and Self-Occlusion: Frequent stacking of large containers and similarity in object appearance result in ambiguous boundaries, complicating both 2D segmentation and 6D pose inference.
  • Dynamic Interactions: Human-in-the-loop scenarios and robot/object motion introduce non-negligible challenges, including sporadic loss of depth data and motion blur.

This careful curation mirrors authentic industrial operational contexts.

3. Annotation Protocols and Calibration

Annotation leverages both automation and manual refinement to achieve high fidelity:

  • Multi-Modal Tracking: VICON motion capture is utilized for camera and object pose tracking. An eye-in-hand calibration aligns the VICON-tracked camera frame (cam_MoCap) with the optical frame (cam_optical), expressed as $T_{\text{cam\_optical}} = T(\text{cam\_MoCap})$. In dynamic scenes, an additional calibration aligns the VICON object frame (obj_MoCap) with the object geometry.
  • Static Scenes: Initial pose estimates are generated via marker-based fusion and subsequently refined using the BOP annotation tool.
  • Dynamic Scenes: Fully automated pose estimation is used, with systematic re-placement of tracking markers to mitigate annotation bias.
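The frame alignment described above amounts to composing rigid-body transforms. The sketch below illustrates the idea with homogeneous 4×4 matrices; the fixed offset between cam_MoCap and cam_optical is a placeholder value, and all function names are illustrative assumptions, not the authors' calibration code.

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical eye-in-hand result: fixed offset between the VICON-tracked
# camera body frame (cam_MoCap) and the camera optical frame (cam_optical).
# The numbers here are placeholders, not calibration values from the dataset.
T_moCap_to_optical = make_T(np.eye(3), np.array([0.02, 0.0, 0.05]))

def camera_optical_pose(T_world_camMoCap):
    """World pose of the optical frame, given the VICON pose of cam_MoCap."""
    return T_world_camMoCap @ T_moCap_to_optical

def object_in_optical_frame(T_world_obj, T_world_camMoCap):
    """Express an object's world pose in the camera optical frame."""
    T_world_optical = camera_optical_pose(T_world_camMoCap)
    return np.linalg.inv(T_world_optical) @ T_world_obj
```

In practice the same composition is applied per frame in dynamic scenes, with the obj_MoCap-to-geometry alignment handled by an analogous fixed transform on the object side.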

3D mesh models are included for every object, serving both pose annotation and potential synthetic data generation.

4. Initial Performance Benchmarks and Failure Modes

Two principal pipelines were evaluated on MR6D:

  • Pipeline 1: GT-Masks + FoundationPose. Using accurate ground-truth 2D segmentation masks with FoundationPose yields an overall average recall (AR) of 0.3462 across test subsets, establishing the upper-bound performance for modern 6D pose estimation models in mobile contexts.
  • Pipeline 2: CTL-Based Segmentation + FoundationPose. Employing Centroid Triplet Loss (CTL) to generate segmentation masks for unseen objects reduces AR to 0.1841, revealing that segmentation fidelity is a critical bottleneck.

Qualitative analysis underscores recurrent errors such as:

  • Confusion between occluded and adjacent objects
  • Orientation misassignments in stacks or among visually similar instances
  • Dominant visible surfaces driving erroneous pose predictions in scenes with protruding shapes

This suggests significant room for improvement in segmentation and joint pose estimation architectures.
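For intuition about the AR numbers reported above, the sketch below computes an ADD-style pose error and a BOP-style average recall over a sweep of thresholds relative to the object diameter. This is a simplified illustration under stated assumptions, not the exact BOP evaluation protocol used for MR6D.

```python
import numpy as np

def add_error(R_est, t_est, R_gt, t_gt, model_pts):
    """ADD metric: mean distance between model points transformed by the
    estimated pose versus the ground-truth pose."""
    est = model_pts @ R_est.T + t_est
    gt = model_pts @ R_gt.T + t_gt
    return float(np.mean(np.linalg.norm(est - gt, axis=1)))

def average_recall(errors, diameters, thresholds=np.arange(0.05, 0.51, 0.05)):
    """Average recall over thresholds: for each threshold (a fraction of the
    object diameter), count the fraction of instances whose error is below it,
    then average across thresholds. Simplified relative to the full BOP AR."""
    errors = np.asarray(errors, dtype=float)
    diameters = np.asarray(diameters, dtype=float)
    recalls = [(errors < th * diameters).mean() for th in thresholds]
    return float(np.mean(recalls))
```

A perfect pose gives zero ADD error; an instance whose error always exceeds half the object diameter contributes zero recall at every threshold, which is how severe segmentation failures drag the aggregate AR down.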

5. Methodological Implications and Metrics

MR6D highlights the inadequacy of traditional pose estimation benchmarks when applied to mobile robot scenarios. The findings motivate several research directions:

  • Segmentation Refinement: Because segmentation is a key failure point, there is impetus to move from over-segmentation techniques toward entity-level segmentation that preserves object integrity.
  • Metrics Revision: Distance-weighted pose error metrics are proposed, wherein pose errors for near objects are penalized more than distant ones—addressing unique visibility and accuracy requirements in mobile robot applications.
  • Modality Augmentation: Future efforts could incorporate background removal or monocular depth estimation to mitigate poor lighting and sparse depth effects, especially prevalent in low-angle or outdoor imaging scenarios.
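The distance-weighted metric idea above can be sketched as a weighted mean of per-instance pose errors, with weights that decay with object distance so that errors on nearby objects (the ones a mobile robot must actually approach and manipulate) dominate the score. The inverse-distance weighting and all names here are illustrative assumptions; the source does not specify a concrete formula.

```python
import numpy as np

def distance_weighted_error(pose_errors, object_distances, scale=1.0):
    """Weighted mean of pose errors, penalizing near objects more heavily.
    `scale` is the distance (in the same units as `object_distances`) at
    which an instance receives weight 1. The 1/d weighting is an assumed
    example of a distance-weighted metric, not the paper's definition."""
    e = np.asarray(pose_errors, dtype=float)
    d = np.asarray(object_distances, dtype=float)
    w = scale / np.maximum(d, 1e-6)  # inverse-distance weights
    return float(np.sum(w * e) / np.sum(w))
```

Under this weighting, a large error on a nearby object raises the score more than the same error on a distant one, matching the stated goal of stricter accuracy requirements at interaction range.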

A plausible implication is that segmentation and pose estimation pipelines for mobile robots will necessitate fundamentally new architectures and training regimes optimized for scale, distance, and context diversity.

6. Benchmarking and Impact on Robustness Evaluation

MR6D provides a rigorous framework for benchmarking instance-level 6D pose estimation targeting predefined, industrially relevant object sets within challenging settings. Its design enables robust generalization testing for methods that claim to work on unseen objects and prevents overfitting to idiosyncratic household environments.

By simulating and capturing authentic mobile robot operational constraints, MR6D promotes realistic algorithm assessment and development, setting the groundwork for the next generation of mobile robot vision systems.

7. Prospects for Dataset Expansion and Research Advancement

MR6D addresses a previously unmet need for mobile robot-centric pose estimation data. Extensions may include the addition of further object types, diversity in lighting and environment, and support for multi-modal fusion. The dataset’s current structure already encourages research into segmentation robustness, metric innovation, and dynamic scene handling.

This suggests a path forward where improved annotation strategies, advanced segmentation architectures, and tailored pose inference pipelines converge to enhance mobile robot autonomy in complex industrial domains. MR6D is expected to remain a foundational benchmark for such research initiatives.
