
MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection (2505.11282v2)

Published 16 May 2025 in cs.CV

Abstract: Mobile robots are reaching unprecedented speeds, with platforms like the Unitree B2 and Fraunhofer O3dyn achieving maximum speeds between 5 and 10 m/s. However, effectively utilizing such speeds remains a challenge due to the limitations of RGB cameras, which suffer from motion blur and fail to provide real-time responsiveness. Event cameras, with their asynchronous operation and low-latency sensing, offer a promising alternative for high-speed robotic perception. In this work, we introduce MTevent, a dataset designed for 6D pose estimation and moving object detection in highly dynamic environments with large detection distances. Our setup consists of a stereo event camera and an RGB camera, capturing 75 scenes, each 16 seconds long on average, featuring 16 unique objects under challenging conditions such as extreme viewing angles, varying lighting, and occlusions. MTevent is the first dataset to combine high-speed motion, long-range perception, and real-world object interactions, making it a valuable resource for advancing event-based vision in robotics. To establish a baseline, we evaluate 6D pose estimation using NVIDIA's FoundationPose on RGB images, achieving an Average Recall of 0.22 with ground-truth masks, highlighting the limitations of RGB-based approaches in such dynamic settings. With MTevent, we provide a novel resource to improve perception models and foster further research in high-speed robotic vision. The dataset is available for download at https://huggingface.co/datasets/anas-gouda/MTevent

Summary

  • The paper introduces MTevent, a novel dataset combining stereo-event and RGB data, designed for high-speed robotic perception tasks like 6D pose estimation and moving object detection.
  • MTevent features 75 scenes with 16 objects under challenging conditions, addressing limitations of existing datasets by focusing on real-world, high-speed, and long-range interactions.
  • Evaluation highlights the challenges of RGB-based pose estimation in dynamic scenes, validating the need for event cameras and positioning MTevent as a key resource for advancing event-based perception models.

MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection

The paper introduces MTevent, a dataset curated specifically for high-speed robotic perception tasks, using event cameras for 6D pose estimation and moving object detection in dynamic environments. The dataset is motivated by the increasing speed of mobile robots such as the Unitree B2 and Fraunhofer O3dyn, which can reach up to 10 m/s. Traditional RGB cameras, while extensively used in robotic systems, suffer from motion blur and inadequate responsiveness at such speeds. Event cameras offer an alternative: their asynchronous operation and low-latency sensing provide robustness against motion blur and enable real-time perception.
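To make the sensing model concrete, the sketch below illustrates (with toy data and hypothetical function names, not the MTevent loading format) how an event camera's asynchronous stream of (timestamp, x, y, polarity) tuples can be accumulated over a short time window into a frame-like representation:

```python
import numpy as np

def accumulate_events(events, height, width, t_start, t_end):
    """Sum event polarities (+1 for ON, -1 for OFF) per pixel within [t_start, t_end).

    Hypothetical illustration: event cameras emit sparse asynchronous events
    instead of fixed-rate frames, so any short window can be rendered on demand,
    which is what makes them robust to motion blur at high speeds.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for t, x, y, p in events:
        if t_start <= t < t_end:
            frame[y, x] += 1 if p else -1
    return frame

# Toy event stream: timestamps in microseconds, polarity 1 = ON, 0 = OFF
events = [
    (100, 2, 3, 1),   # ON event at pixel (x=2, y=3)
    (150, 2, 3, 1),   # second ON event at the same pixel
    (200, 5, 1, 0),   # OFF event at pixel (x=5, y=1)
    (900, 0, 0, 1),   # falls outside the window below, ignored
]
frame = accumulate_events(events, height=4, width=8, t_start=0, t_end=500)
print(frame[3, 2], frame[1, 5])  # → 2 -1
```

Real event-camera pipelines use many such representations (voxel grids, time surfaces); per-pixel polarity sums are just the simplest.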

Dataset Characteristics

MTevent is distinct in its combination of stereo-event and RGB camera setups, capturing 75 scenes with an average duration of 16 seconds each, and involving 16 unique objects under challenging conditions—such as extreme viewing angles, varied lighting, and occlusions. This dataset addresses the limitations of existing event camera datasets that typically focus on indoor environments or smaller household items, and paves the way for a more comprehensive understanding of event-based vision in robotics.

Evaluation and Results

The paper evaluates the dataset on 6D pose estimation using NVIDIA's FoundationPose model on RGB images, reporting an Average Recall of 0.22 even with ground-truth masks provided. This low recall underscores the difficulty of fast-moving scenes and the inherent weaknesses of RGB-based approaches in such contexts. The result motivates the use of event cameras for perception in scenarios requiring rapid and precise object interaction.
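The paper's exact evaluation protocol is not reproduced here; as a rough illustration, a BOP-style Average Recall can be sketched as the mean recall over a range of pose-error thresholds (the function name, thresholds, and data below are hypothetical):

```python
import numpy as np

def average_recall(pose_errors, thresholds):
    """Simplified Average Recall (AR) sketch for 6D pose estimation.

    pose_errors: per-object pose error (e.g. in metres); NaN marks a missed
    detection. A prediction counts as correct at a threshold t if its error is
    below t; AR averages the resulting recall over all thresholds. (BOP-style
    benchmarks additionally average over several error functions.)
    """
    errors = np.asarray(pose_errors, dtype=float)
    recalls = [np.mean(errors < t) for t in thresholds]  # NaN < t is False
    return float(np.mean(recalls))

# Toy example: 4 objects, pose errors in metres (NaN = no prediction)
errors = [0.01, 0.04, 0.20, float("nan")]
thresholds = [0.05, 0.10, 0.15, 0.20, 0.25]
print(round(average_recall(errors, thresholds), 3))  # → 0.55
```

An AR of 0.22, as reported for FoundationPose on MTevent's RGB stream, means fewer than a quarter of object poses were recovered within the error thresholds on average.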

Implications and Future Directions

MTevent represents a pivotal resource for advancing event-based perception models, emphasizing tasks such as pose estimation, motion segmentation, 3D bounding box detection, optical flow estimation, and object tracking. By presenting real-world object interactions with a focus on high-speed motion and long-range perception, the dataset is poised to impact not just robotics but also other domains reliant on dynamic scene understanding.

Looking forward, the paper suggests further research into models that exploit event data more effectively for object tracking and dynamic scene analysis. Event-camera-based 6D pose estimation in particular remains an underexplored but promising frontier, with potential applications extending beyond mobile robotics to autonomous systems in other domains.

The dataset's availability on Hugging Face gives researchers broad access for refining perception models or building novel applications on event-based data, fostering collaborative and accelerated progress in high-speed robotic vision.