MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System (2506.15402v1)

Published 18 Jun 2025 in cs.RO, cs.AI, and cs.CV

Abstract: Object-level SLAM offers structured and semantically meaningful environment representations, making it more interpretable and suitable for high-level robotic tasks. However, most existing approaches rely on RGB-D sensors or monocular views, which suffer from narrow fields of view, occlusion sensitivity, and limited depth perception, especially in large-scale or outdoor environments. These limitations often restrict the system to observing only partial views of objects from limited perspectives, leading to inaccurate object modeling and unreliable data association. In this work, we propose MCOO-SLAM, a novel Multi-Camera Omnidirectional Object SLAM system that fully leverages surround-view camera configurations to achieve robust, consistent, and semantically enriched mapping in complex outdoor scenarios. Our approach integrates point features and object-level landmarks enhanced with open-vocabulary semantics. A semantic-geometric-temporal fusion strategy is introduced for robust object association across multiple views, leading to improved consistency and accurate object modeling, and an omnidirectional loop closure module is designed to enable viewpoint-invariant place recognition using scene-level descriptors. Furthermore, the constructed map is abstracted into a hierarchical 3D scene graph to support downstream reasoning tasks. Extensive experiments in real-world environments demonstrate that MCOO-SLAM achieves accurate localization and scalable object-level mapping with improved robustness to occlusion, pose variation, and environmental complexity.

Summary

  • The paper introduces the MCOO-SLAM framework, integrating multi-camera systems with semantic knowledge for robust object-level SLAM in complex outdoor environments.
  • Key contributions include a multi-camera object data association method, an omnidirectional loop closure using semantic descriptors, and an integrated architecture for hierarchical 3D scene graphs.
  • Extensive experiments show MCOO-SLAM achieves accurate localization and scalable object mapping, improving robustness against occlusions and pose variations, with implications for autonomous systems.

MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System

The paper "MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System" introduces a novel approach to object-level Simultaneous Localization and Mapping (SLAM) that leverages multiple cameras to achieve robust and semantically enriched mapping in complex outdoor environments. The authors address limitations of existing SLAM techniques, which often rely on monocular or RGB-D sensors and therefore suffer from narrow fields of view and susceptibility to occlusions. The MCOO-SLAM framework integrates semantic knowledge with geometric data and employs a multi-level fusion strategy to improve object association across multiple views, promising enhanced robustness and accuracy.
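The paper does not ship reference code, but the shape of a semantic-geometric-temporal fusion score can be sketched. The snippet below is a minimal illustration under assumed definitions: the similarity functions, length/time scales, weights, and the `ObjectObs` fields are all placeholders chosen here, not the paper's formulation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectObs:
    embedding: np.ndarray   # open-vocabulary semantic embedding
    centroid: np.ndarray    # estimated 3D centroid in the world frame
    timestamp: float        # observation time in seconds

def association_score(obs, landmark, w_sem=0.4, w_geo=0.4, w_temp=0.2):
    """Fuse semantic, geometric, and temporal cues into one score (illustrative weights)."""
    # Semantic cue: cosine similarity of open-vocabulary embeddings.
    sem = float(np.dot(obs.embedding, landmark.embedding) /
                (np.linalg.norm(obs.embedding) * np.linalg.norm(landmark.embedding)))
    # Geometric cue: proximity of 3D centroids (1 m length scale, assumed).
    geo = float(np.exp(-np.linalg.norm(obs.centroid - landmark.centroid)))
    # Temporal cue: favor landmarks that were re-observed recently (10 s scale, assumed).
    temp = float(np.exp(-abs(obs.timestamp - landmark.timestamp) / 10.0))
    return w_sem * sem + w_geo * geo + w_temp * temp

def associate(obs, landmarks, threshold=0.5):
    """Match an observation to its best landmark, or return None to spawn a new one."""
    if not landmarks:
        return None
    scores = [association_score(obs, lm) for lm in landmarks]
    best = int(np.argmax(scores))
    return landmarks[best] if scores[best] > threshold else None
```

The point of fusing the three cues is that each one fails alone: semantics confuse identical nearby objects, geometry drifts with pose error, and temporal recency alone cannot disambiguate revisits.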

The paper highlights several key contributions. First, the authors propose a multi-camera object-level data association method that uses semantic, geometric, and temporal information to maintain consistency and accuracy across varying viewpoints and over time. Second, they introduce an omnidirectional loop closure module that uses global scene descriptors enriched with open-vocabulary semantic information, making place recognition robust to significant pose variations. Lastly, the paper details an integrated system architecture that builds a hierarchical 3D scene graph, supporting downstream tasks such as querying and reasoning.
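To make the scene-graph contribution concrete, here is a rough sketch of a hierarchical map abstraction supporting label queries. The node types, fields, and two-level scene-region-object hierarchy are assumptions for illustration; the paper's actual graph schema may differ.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    label: str        # open-vocabulary label, e.g. "bench"
    centroid: tuple   # (x, y, z) in the world frame

@dataclass
class RegionNode:
    name: str                                     # e.g. "courtyard"
    objects: list = field(default_factory=list)   # ObjectNode children

@dataclass
class SceneGraph:
    regions: list = field(default_factory=list)   # RegionNode children

    def query(self, label):
        """Return all objects whose label matches a query string."""
        return [obj for region in self.regions
                for obj in region.objects if obj.label == label]

# Usage: build a tiny two-level graph and query it.
graph = SceneGraph(regions=[
    RegionNode(name="courtyard", objects=[
        ObjectNode(label="bench", centroid=(1.0, 2.0, 0.0)),
        ObjectNode(label="tree", centroid=(4.0, 1.0, 0.0)),
    ]),
])
print(graph.query("bench"))
```

The appeal of such a hierarchy is that downstream reasoning can operate over a handful of symbolic nodes rather than millions of map points.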

The authors perform extensive experiments in real-world settings to validate the effectiveness of MCOO-SLAM. The results demonstrate the system's capability to achieve accurate localization and scalable object-level mapping, showcasing improvements in robustness against occlusions, pose variations, and environmental complexities.

On the theoretical side, the research integrates advances in open-vocabulary semantic understanding and non-linear camera models, enhancing robots' ability to interact with diverse environments. By marrying multi-camera systems with object-level SLAM, the authors expand the potential for high-level reasoning in outdoor robotics applications.
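For readers unfamiliar with open-vocabulary labeling, the core mechanism is typically nearest-neighbor matching between an image-region embedding and text embeddings of arbitrary category prompts. The sketch below assumes a CLIP-style shared embedding space with toy vectors; the specific vision-language model used by MCOO-SLAM is not stated in this summary.

```python
import numpy as np

def best_label(region_embedding, label_embeddings):
    """Pick the open-vocabulary label whose text embedding has the highest
    cosine similarity with the detected region's image embedding."""
    region = region_embedding / np.linalg.norm(region_embedding)
    best, best_sim = None, -1.0
    for label, emb in label_embeddings.items():
        sim = float(np.dot(region, emb / np.linalg.norm(emb)))
        if sim > best_sim:
            best, best_sim = label, sim
    return best, best_sim

# Usage with random toy vectors; in practice both sides would come from a
# pretrained vision-language encoder (e.g., a CLIP-like model).
rng = np.random.default_rng(0)
labels = {name: rng.normal(size=512) for name in ["car", "bench", "tree"]}
print(best_label(rng.normal(size=512), labels))
```

Because the label set is just a dictionary of text prompts, new categories can be added at query time without retraining, which is what makes the representation "open-vocabulary".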

The implications of this work are substantial. Practically, it facilitates more robust navigation and mapping capabilities for autonomous systems deployed in dynamic and large-scale outdoor environments. Theoretically, it opens pathways for research into more comprehensive multi-sensor SLAM methodologies that incorporate rich semantic context.

Looking towards future developments, the integration of similar systems with AI-driven reasoning and decision-making frameworks could significantly advance robotic autonomy. Enhancements in spatial awareness and semantic perception could prompt broader adoption of SLAM systems in fields such as autonomous driving, agricultural robotics, and urban planning.

Overall, MCOO-SLAM illustrates a significant advancement in SLAM technology, pushing the envelope on what is achievable through the incorporation of multi-camera systems and semantic understanding.