MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty (2401.12761v4)

Published 23 Jan 2024 in cs.CV

Abstract: Achieving level-5 driving automation in autonomous vehicles necessitates a robust semantic visual perception system capable of parsing data from different sensors across diverse conditions. However, existing semantic perception datasets often lack important non-camera modalities typically used in autonomous vehicles, or they do not exploit such modalities to aid and improve semantic annotations in challenging conditions. To address this, we introduce MUSES, the MUlti-SEnsor Semantic perception dataset for driving in adverse conditions under increased uncertainty. MUSES includes synchronized multimodal recordings with 2D panoptic annotations for 2500 images captured under diverse weather and illumination. The dataset integrates a frame camera, a lidar, a radar, an event camera, and an IMU/GNSS sensor. Our new two-stage panoptic annotation protocol captures both class-level and instance-level uncertainty in the ground truth and enables the novel task of uncertainty-aware panoptic segmentation we introduce, along with standard semantic and panoptic segmentation. MUSES proves both effective for training and challenging for evaluating models under diverse visual conditions, and it opens new avenues for research in multimodal and uncertainty-aware dense semantic perception. Our dataset and benchmark are publicly available at https://muses.vision.ee.ethz.ch.

Introduction

The recently introduced MUSES dataset (Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty) represents a significant advancement in the domain of autonomous vehicle perception, tailored specifically to address the challenges associated with driving in adverse conditions.

Dataset Overview

MUSES addresses a critical gap in existing driving datasets, which either lack important non-camera modalities or do not fully leverage multimodal data to improve semantic annotations in diverse and challenging visual scenarios. MUSES stands out by providing synchronized multimodal recordings with high-quality 2D panoptic annotations for 2,500 images captured under a variety of weather and lighting conditions. With data from a frame camera, a lidar, a radar, and an event camera, accompanied by an IMU/GNSS sensor, the dataset is a rich resource for developing and evaluating semantic perception systems. Each modality plays a distinct role and offers unique advantages, such as the robustness of lidar to ambient illumination or the low-latency, high-dynamic-range sensing of event cameras.
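To make the multimodal structure concrete, the sketch below shows one way to pair the synchronized modalities for a single frame. The directory layout, file formats, and array shapes are illustrative assumptions and do not reflect the official MUSES development kit.

```python
from dataclasses import dataclass
from pathlib import Path

import numpy as np
from PIL import Image


@dataclass
class MultimodalSample:
    """One synchronized recording: camera frame, lidar/radar returns,
    event-camera stream, and the 2D panoptic annotation."""
    image: np.ndarray     # H x W x 3 frame-camera image
    lidar: np.ndarray     # N x 4 point cloud (x, y, z, intensity) -- assumed layout
    radar: np.ndarray     # M x K radar returns -- assumed layout
    events: np.ndarray    # E x 4 event stream (x, y, t, polarity) -- assumed layout
    panoptic: np.ndarray  # H x W panoptic label map


def load_sample(root: Path, frame_id: str) -> MultimodalSample:
    """Load one synchronized sample from a hypothetical on-disk layout."""
    return MultimodalSample(
        image=np.asarray(Image.open(root / "frame_camera" / f"{frame_id}.png")),
        lidar=np.load(root / "lidar" / f"{frame_id}.npy"),
        radar=np.load(root / "radar" / f"{frame_id}.npy"),
        events=np.load(root / "event_camera" / f"{frame_id}.npy"),
        panoptic=np.asarray(Image.open(root / "panoptic" / f"{frame_id}.png")),
    )
```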

Methodology and Benchmarking

The dataset is built with a two-stage annotation protocol, producing a rich set of ground-truth annotations that capture both class-level and instance-level uncertainty, features that arise from challenging conditions and sensor limitations. Particularly noteworthy is MUSES's novel task of uncertainty-aware panoptic segmentation, which evaluates models not only with standard panoptic segmentation metrics but also on how well they handle uncertainty in their prediction confidence, as quantified by the average uncertainty-aware panoptic quality (AUPQ) metric.
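As a rough illustration of this kind of evaluation, the sketch below averages a standard panoptic-quality-style score over a sweep of prediction-confidence thresholds. It is only a stand-in for the idea of confidence-sensitive scoring; the exact AUPQ definition used by the MUSES benchmark may differ, and the `evaluate_at` callable, threshold grid, and matching convention are assumptions.

```python
import numpy as np


def panoptic_quality(matched_ious, num_fp, num_fn):
    """Standard panoptic quality: mean IoU over matched (true-positive)
    segments, penalized by unmatched predictions (FP) and unmatched
    ground-truth segments (FN)."""
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(matched_ious) / denom if denom > 0 else 0.0


def averaged_confidence_swept_pq(evaluate_at, thresholds=np.linspace(0.0, 1.0, 11)):
    """Illustrative stand-in for an AUPQ-style score (not the official
    MUSES definition): `evaluate_at(t)` is a user-supplied callable that
    returns (matched_ious, num_fp, num_fn) after discarding predictions
    with confidence below t; the PQ-style score is then averaged over
    the threshold sweep."""
    scores = [panoptic_quality(*evaluate_at(t)) for t in thresholds]
    return float(np.mean(scores))
```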

The analysis highlights the intricacies of annotating scenes in adverse visual conditions, demonstrating improved label coverage and pronounced benefits from multimodal data, especially when the camera signal degrades or fails. This underscores the pivotal contribution of complementary sensory inputs to the semantic understanding of a driving scene.

Implications for Future Research

MUSES is not only a tool for developing and assessing contemporary models but also serves as a launchpad for exploring advanced methodologies in multimodal and uncertainty-aware dense semantic perception. Leveraging the dataset's unique properties, researchers can explore sensor fusion techniques and build models that are resilient to drastic environmental changes, including those induced by adverse weather or poor lighting.
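As a starting point for such fusion experiments, the minimal sketch below concatenates camera features with features from a lidar depth map projected into the image plane and decodes them into per-pixel class logits. The two-modality setup, channel sizes, and the name `LateFusionSegHead` are illustrative choices, not a method proposed in the paper.

```python
import torch
import torch.nn as nn


class LateFusionSegHead(nn.Module):
    """Minimal feature-level fusion sketch: per-modality encoders produce
    feature maps of equal spatial size, which are concatenated and decoded
    into per-pixel class logits."""

    def __init__(self, num_classes: int, cam_channels: int = 3, lidar_channels: int = 1):
        super().__init__()
        self.cam_enc = nn.Sequential(nn.Conv2d(cam_channels, 32, 3, padding=1), nn.ReLU())
        self.lidar_enc = nn.Sequential(nn.Conv2d(lidar_channels, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, camera: torch.Tensor, lidar_depth: torch.Tensor) -> torch.Tensor:
        # camera: B x 3 x H x W image; lidar_depth: B x 1 x H x W projected range map
        fused = torch.cat([self.cam_enc(camera), self.lidar_enc(lidar_depth)], dim=1)
        return self.head(fused)  # B x num_classes x H x W logits


# Usage sketch with random tensors standing in for a real batch.
model = LateFusionSegHead(num_classes=19)
logits = model(torch.rand(2, 3, 128, 256), torch.rand(2, 1, 128, 256))
```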

Moreover, MUSES provides a challenging and necessary proving ground for evaluating the robustness and generalization capabilities of intelligent driving systems. Initial experiments demonstrate the strong potential of models trained on MUSES to generalize across different domains, which is a testament to the dataset's comprehensive nature and quality.

In conclusion, MUSES substantially enriches the research landscape for autonomous driving perception, offering extensive multimodal sensory data and finely detailed annotations that capture the uncertainty inherent in real-world driving. It equips researchers with the means to build robust, adaptive perception systems capable of handling that unpredictability.

Authors (7)
  1. Tim Brödermann (4 papers)
  2. David Bruggemann (10 papers)
  3. Christos Sakaridis (46 papers)
  4. Kevin Ta (5 papers)
  5. Odysseas Liagouris (1 paper)
  6. Jason Corkill (1 paper)
  7. Luc Van Gool (570 papers)
Citations (3)