
3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans (2002.06289v2)

Published 15 Feb 2020 in cs.RO, cs.AI, and cs.CV

Abstract: We present a unified representation for actionable spatial perception: 3D Dynamic Scene Graphs. Scene graphs are directed graphs where nodes represent entities in the scene (e.g. objects, walls, rooms), and edges represent relations (e.g. inclusion, adjacency) among nodes. Dynamic scene graphs (DSGs) extend this notion to represent dynamic scenes with moving agents (e.g. humans, robots), and to include actionable information that supports planning and decision-making (e.g. spatio-temporal relations, topology at different levels of abstraction). Our second contribution is to provide the first fully automatic Spatial PerceptIon eNgine(SPIN) to build a DSG from visual-inertial data. We integrate state-of-the-art techniques for object and human detection and pose estimation, and we describe how to robustly infer object, robot, and human nodes in crowded scenes. To the best of our knowledge, this is the first paper that reconciles visual-inertial SLAM and dense human mesh tracking. Moreover, we provide algorithms to obtain hierarchical representations of indoor environments (e.g. places, structures, rooms) and their relations. Our third contribution is to demonstrate the proposed spatial perception engine in a photo-realistic Unity-based simulator, where we assess its robustness and expressiveness. Finally, we discuss the implications of our proposal on modern robotics applications. 3D Dynamic Scene Graphs can have a profound impact on planning and decision-making, human-robot interaction, long-term autonomy, and scene prediction. A video abstract is available at https://youtu.be/SWbofjhyPzI

Authors (5)
  1. Antoni Rosinol (10 papers)
  2. Arjun Gupta (24 papers)
  3. Marcus Abate (7 papers)
  4. Jingnan Shi (15 papers)
  5. Luca Carlone (109 papers)
Citations (162)

Summary

Overview of 3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans

The paper introduces "3D Dynamic Scene Graphs (DSGs)," a unified representation for actionable spatial perception, addressing a pressing need in the field of robotics for a framework that bridges low-level obstacle avoidance and motion planning with high-level task planning in dynamic environments. DSGs enrich traditional spatial representations by offering a layered and hierarchical structure that captures spatio-temporal semantics with enhanced detail and abstraction.

Objectives and Contributions

This paper makes three main contributions:

  1. DSG Framework: It proposes DSGs, which extend scene graph concepts to encapsulate higher-level spatial constructs at various degrees of abstraction, such as objects, rooms, agents, and their relationships over time. This multi-layered structure forms a metric-semantic map critical for nuanced task planning and decision-making.
  2. Spatial Perception eNgine (SPIN): The authors present SPIN, the first fully automatic engine capable of building DSGs autonomously from raw visual-inertial data. Unlike existing frameworks, SPIN does not rely on pre-annotated meshes or manual segmentation, leveraging state-of-the-art techniques for object and human detection and pose estimation.
  3. Integration and Application: A novel integration of SLAM and dense human mesh tracking within robotic frameworks is showcased through experiments in a photo-realistic Unity-based simulator. The robust reconstruction and dynamic entity tracking capabilities of the proposed methodology are highlighted.

Layered Hierarchical Representation

The DSG encapsulates the environment into five distinct layers ranging from dense 3D meshes to high-level scene abstractions:

  • Layer 1: Metric-semantic mesh, depicting 3D points, their normals, color, and semantic labels.
  • Layer 2: Objects and agents, representing both static and dynamic entities, modeled with attributes like 3D pose and semantic classification.
  • Layer 3: Places and structures, focusing on navigable spaces and their connectivity.
  • Layer 4: Rooms, corridors, and halls, with edges capturing adjacency between rooms and inclusion of the places they contain.
  • Layer 5: Building, unifying the spatial semantics into a coherent representation for a single edifice.
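The layered structure above can be sketched as a directed graph whose nodes carry a layer index and free-form attributes, with edges expressing relations both within a layer (e.g. adjacency) and across layers (e.g. inclusion). The class and field names below are illustrative assumptions, not the paper's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A DSG node: an entity at some abstraction layer, with free-form
    attributes (e.g. pose, semantic label, bounding box)."""
    node_id: str
    layer: int            # 1=mesh, 2=objects/agents, 3=places, 4=rooms, 5=building
    attributes: dict = field(default_factory=dict)

@dataclass
class Edge:
    """A directed relation, within a layer (e.g. place traversability)
    or across layers (e.g. a room includes a place)."""
    source: str
    target: str
    relation: str         # e.g. "adjacent", "includes", "traversable"

class DynamicSceneGraph:
    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source: str, target: str, relation: str) -> None:
        assert source in self.nodes and target in self.nodes
        self.edges.append(Edge(source, target, relation))

    def layer(self, k: int) -> list[Node]:
        """Return all nodes at abstraction level k."""
        return [n for n in self.nodes.values() if n.layer == k]

# Example: a building containing one room, which includes a place near an object.
dsg = DynamicSceneGraph()
dsg.add_node(Node("building_0", layer=5))
dsg.add_node(Node("room_0", layer=4, attributes={"label": "kitchen"}))
dsg.add_node(Node("place_0", layer=3))
dsg.add_node(Node("object_0", layer=2, attributes={"class": "chair"}))
dsg.add_edge("building_0", "room_0", "includes")
dsg.add_edge("room_0", "place_0", "includes")
dsg.add_edge("place_0", "object_0", "near")
```

Querying by layer is what makes the representation actionable: a task planner can reason over the handful of room nodes while a motion planner consults the dense lower layers.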

Experimental Results

The paper emphasizes the robustness of the DSG framework in densely populated simulation environments. Notably, datasets from the uHumans collection illustrate the system’s performance in accurately tracking multiple dynamic elements, with improved visual-inertial odometry (VIO) results and masking techniques that handle dynamic occlusions. For instance, errors in the VIO trajectories shrank significantly when IMU-aware feature tracking, 2-point RANSAC, and masking of dynamic elements were employed, preserving trajectory precision even in crowded scenes.
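The dynamic-masking idea can be illustrated with a minimal sketch: tracked image features falling inside detected human bounding boxes are discarded before the remaining (presumed static) features are fed to RANSAC-based egomotion estimation. The function name and array conventions here are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def mask_dynamic_features(features: np.ndarray, human_boxes: np.ndarray) -> np.ndarray:
    """Drop tracked 2D features that fall inside any detected human bounding
    box, so moving people do not corrupt static-scene egomotion estimation.

    features:    (N, 2) array of pixel coordinates (x, y).
    human_boxes: (M, 4) array of boxes (x_min, y_min, x_max, y_max).
    Returns the subset of features lying outside every box.
    """
    if len(human_boxes) == 0:
        return features
    x, y = features[:, 0:1], features[:, 1:2]                       # (N, 1) each
    inside = ((x >= human_boxes[:, 0]) & (x <= human_boxes[:, 2]) &
              (y >= human_boxes[:, 1]) & (y <= human_boxes[:, 3]))  # (N, M) via broadcasting
    return features[~inside.any(axis=1)]

# A feature on a detected person is rejected; a background feature survives.
feats = np.array([[10.0, 10.0], [50.0, 50.0]])
boxes = np.array([[40.0, 40.0, 60.0, 60.0]])
static_feats = mask_dynamic_features(feats, boxes)
```

In practice the paper reports using dense human detections rather than plain boxes, but the principle is the same: only features assumed static should constrain the robot's trajectory estimate.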

Implications and Future Directions

The practical implications of DSGs are profound, potentially influencing:

  • Robotic Navigation and Interaction: The DSG structure provides robust frameworks for navigating complex and dynamic environments, facilitating better interaction paradigms with human agents by detailing occupancy and motion possibilities within a space.
  • Task Planning and Execution: By exposing higher-level abstractions, DSGs let robots ground task specifications (e.g. "go to the kitchen") directly in graph nodes and plan over compact symbolic representations rather than dense geometry, driving advancements in autonomous decision-making.
  • Long-term Deployment: DSGs offer strategic storage solutions through hierarchical abstraction, enabling selective data retention and promoting memory efficiency over extended operational periods.
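As a toy illustration of such selective retention, one hypothetical policy drops dense mesh nodes (Layer 1) once they age past a time horizon while keeping all higher-level abstractions indefinitely; the function and field names are invented for this sketch:

```python
def prune_for_long_term_storage(nodes: list[dict], horizon: float, now: float) -> list[dict]:
    """Hypothetical retention policy: discard dense mesh nodes (layer 1)
    older than `horizon` seconds; keep every higher-level abstraction.
    Each node is a dict with 'layer' and 'timestamp' keys."""
    return [n for n in nodes
            if n["layer"] > 1 or now - n["timestamp"] <= horizon]

# Old mesh data is dropped; the room node and recent mesh survive.
nodes = [
    {"layer": 1, "timestamp": 0.0},    # stale mesh chunk
    {"layer": 4, "timestamp": 0.0},    # room abstraction, kept forever
    {"layer": 1, "timestamp": 95.0},   # recent mesh chunk
]
retained = prune_for_long_term_storage(nodes, horizon=10.0, now=100.0)
```

The asymmetry is the point: a few bytes of room-level abstraction can stand in for megabytes of mesh once the fine geometry is no longer needed.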

Moving forward, expanding DSG functionality to integrate additional sensory input types and extending node and attribute types for various environments could enhance AI applications in robotics. Further, SPIN systems might evolve to process dynamic, multi-agent datasets, capitalizing on distributed robotic systems for more expansive deployments. The convergence of DSGs with advanced machine learning paradigms could also innovate predictive modeling in environment dynamics, fostering more versatile AI-driven robotic systems.
