3D Dynamic Scene Graphs
- 3D Dynamic Scene Graphs are layered, graph-based representations integrating geometric, semantic, topological, and temporal data to model both static and dynamic entities.
- They leverage real-time updates and multi-modal sensor fusion to robustly support SLAM, localization, and dynamic scene understanding.
- Applications include long-term spatial memory, hierarchical planning, and multi-agent coordination through constraint-based optimization and dynamic object tracking.
A 3D Dynamic Scene Graph (DSG) is a layered, attributed graph-based representation that jointly encodes the geometric, semantic, topological, and temporal structure of a physical environment, including both static and dynamic entities such as movable objects, humans, and robots. DSGs extend classical scene graphs by modeling dynamic elements as first-class citizens and accommodating real-time updates reflecting environmental changes, agent activities, and semantic relations. This unifying representation supports robust SLAM, long-term spatial memory, prediction, planning, and interaction in environments characterized by spatial and temporal variability.
1. Structural Foundations and Formal Definitions
A 3D DSG is formally a directed, layered graph G = (V, E), where the nodes V represent spatial entities at multiple abstraction levels (e.g., points, objects, agents, places, rooms, buildings), and the edges E encode spatial, semantic, topological, and spatio-temporal relations. Layering endows the DSG with hierarchical semantics; typical layers include:
- Layer 1: Metric/geometric primitives (mesh vertices, point clouds)
- Layer 2: Objects and dynamic entities (agents/humans/robots)
- Layer 3: Places and structures (navigational points, walls, floors)
- Layer 4: Rooms or spatial enclosures
- Layer 5: Buildings or global context
- Additional temporal/dynamic layers (for temporal flow, occupancy histograms, or object tracks) (Catalano et al., 10 Dec 2025, Rosinol et al., 2021, Rosinol et al., 2020, Gorlo et al., 1 May 2024)
Each node is associated with attributes including a 3D position, a semantic class label, and (optionally) traversability or temporal state. Edges, annotated by relation type, represent adjacency, containment, support, motion, or hierarchical relations and may capture odometric factors, entity-keyframe links, or spatio-temporal transitions (Giberna et al., 3 Mar 2025, Gorlo et al., 1 May 2024, Rosinol et al., 2020).
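The layered, attributed structure described above can be sketched as a small graph container. All class, field, and node names below are illustrative assumptions for exposition, not taken from any cited implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A DSG node: one entity at some abstraction layer."""
    node_id: str
    layer: int                   # 1=geometry, 2=objects/agents, ... 5=building
    position: tuple              # (x, y, z) in the world frame
    semantic_class: str          # e.g. "chair", "human", "room"
    attributes: dict = field(default_factory=dict)  # traversability, temporal state, ...

@dataclass
class Edge:
    """A typed, directed edge encoding a spatial/semantic/temporal relation."""
    source: str
    target: str
    relation: str                # "contains", "adjacent", "supports", ...

class DynamicSceneGraph:
    def __init__(self):
        self.nodes = {}          # node_id -> Node
        self.edges = []          # list of Edge

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, source, target, relation):
        self.edges.append(Edge(source, target, relation))

    def layer(self, k):
        """All nodes at abstraction layer k."""
        return [n for n in self.nodes.values() if n.layer == k]

# Build a tiny two-layer graph: a room containing a chair.
dsg = DynamicSceneGraph()
dsg.add_node(Node("room_0", layer=4, position=(0, 0, 0), semantic_class="room"))
dsg.add_node(Node("chair_1", layer=2, position=(1.2, 0.5, 0), semantic_class="chair"))
dsg.add_edge("room_0", "chair_1", "contains")
```

Real systems attach far richer attributes (meshes, covariances, timestamps), but the essential shape is the same: typed nodes partitioned into layers, connected by typed edges.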
2. Dynamic Modeling: Nodes, Constraints, and Temporal Integration
A critical advancement in DSGs is explicit modeling of dynamic entities as graph nodes. This includes:
- Agent and Dynamic Object Nodes: Each detected entity (human, robot, movable furniture) is represented by a sequence of time-indexed pose nodes, supporting multi-view and temporal data association.
- Constraint Factors:
- Entity–Keyframe Constraints bind dynamic observations to robot keyframes, coupling robot and object/agent localization.
- Intra-Entity Constraints link consecutive observations of the same entity via semantic-class motion priors, differentiating static objects (stationary unless moved) from active agents (always mobile).
- Entity–Floor Constraints ensure vertical consistency relative to building structure (Giberna et al., 3 Mar 2025).
- Temporal/Flow Layers: Some DSG frameworks augment the spatial graph with per-node temporal statistics, e.g., flow histograms or activity frequency models (as in Aion, employing Fourier-based "Frequency Map Enhancement" for periodicity) (Catalano et al., 10 Dec 2025).
- Spatio-Temporal Edges: Direct temporal links encode dynamic trajectories, interaction sequences, or event histories (e.g., agent pose-tracks).
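The flow/frequency idea behind the temporal layer can be illustrated with a Frequency-Map-Enhancement-style periodic model: occupancy over time is summarized by a mean plus a few dominant Fourier components. The single-harmonic form, parameter names, and numbers below are an illustrative simplification, not the cited method's exact formulation.

```python
import math

def periodic_occupancy(t, mean, components):
    """Predicted occupancy probability at time t (seconds) from a mean
    plus dominant periodic components (amplitude, period_s, phase)."""
    p = mean
    for amplitude, period_s, phase in components:
        p += amplitude * math.cos(2 * math.pi * t / period_s + phase)
    return min(1.0, max(0.0, p))  # clamp to a valid probability

# A corridor busy on a daily cycle, peaking at t = 0: amplitude 0.4, 24 h period.
daily = [(0.4, 24 * 3600, 0.0)]
print(periodic_occupancy(0, 0.5, daily))          # peak of the cycle (~0.9)
print(periodic_occupancy(12 * 3600, 0.5, daily))  # half a period later (~0.1)
```

Storing a handful of (amplitude, period, phase) triples per node is what makes such temporal statistics cheap enough to attach across an entire scene graph.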
This explicit integration of temporal factors and motion models enables robust modeling and inference under dynamics, supporting multi-entity, multi-time reasoning (Rosinol et al., 2021, Giberna et al., 3 Mar 2025, Catalano et al., 10 Dec 2025).
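One way to picture the constraint factors is as residual functions over time-indexed pose nodes. The sketch below uses 1D/2D translations, simple subtraction residuals, and hand-picked weights; all of these are illustrative assumptions standing in for the full SE(3) factors used in practice.

```python
def intra_entity_residual(pose_t0, pose_t1, semantic_class):
    """Residual linking consecutive observations of one entity.
    Static classes get a strong zero-motion prior; agents a weak one."""
    weight = 10.0 if semantic_class in ("table", "shelf") else 0.1  # illustrative weights
    return [weight * (b - a) for a, b in zip(pose_t0, pose_t1)]

def entity_keyframe_residual(robot_pose, entity_pose, measured_offset):
    """Residual binding an entity observation to the robot keyframe that
    observed it (poses reduced to (x, y) translations here)."""
    predicted = [e - r for r, e in zip(robot_pose, entity_pose)]
    return [p - m for p, m in zip(predicted, measured_offset)]

# A table that "moved" 1 m yields a heavily penalized residual ...
print(intra_entity_residual((0.0, 0.0), (1.0, 0.0), "table"))  # [10.0, 0.0]
# ... while the same displacement by a human is barely penalized.
print(intra_entity_residual((0.0, 0.0), (1.0, 0.0), "human"))  # [0.1, 0.0]
```

The point of the class-dependent weight is exactly the semantic-class motion prior above: the optimizer prefers to explain a table's apparent motion as robot drift, but accepts a human's motion at face value.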
3. Acquisition, Construction, and Update Algorithms
DSGs are constructed and maintained using multi-modal, often real-time pipelines:
Acquisition & Construction:
- SLAM Backend: Real-time visual-inertial (or LiDAR/RGB-D) SLAM reconstructs the environment; pose estimates anchor keyframes and agent/object tracks (Rosinol et al., 2021, Giberna et al., 3 Mar 2025).
- Semantic Segmentation: Vision-language models (VLMs) and semantic segmentation networks provide hierarchical object, agent, and scene labels (Ge et al., 21 Feb 2025, Yan et al., 15 Oct 2024, Olivastri et al., 5 Nov 2024).
- Hierarchical Parsing: Geometric clustering, distance transforms, and panoptic labeling infer places, rooms, and building hierarchies (Rosinol et al., 2021, Rosinol et al., 2020, Wang et al., 17 Dec 2025).
- Dynamic Object Detection: Fiducial markers (e.g., AprilTags), motion trackers, or open-vocabulary detectors identify and disambiguate dynamic objects and agents over time (Giberna et al., 3 Mar 2025, Ge et al., 21 Feb 2025, Yan et al., 15 Oct 2024).
Update Mechanisms:
- Multi-Modal Change Detection: Integration of vision-based perception, robot action logs, human textual input, and temporal priors yields a unified stream of “change descriptors,” each specifying node/edge additions, removals, or modifications (Olivastri et al., 5 Nov 2024).
- Local/Partial Graph Editing: Rather than global scene reconstruction, localized subgraph updates add/remove/adjust only affected nodes and their incident edges for computational efficiency (Yan et al., 15 Oct 2024).
- Event-Driven Simulation: Discrete-event simulation (as in FOGMACHINE) models object spawn/deletion, agent movement, and interaction under partial observability, supporting belief estimation and uncertainty propagation in multi-agent settings (Ohnemus et al., 10 Oct 2025).
- Joint Optimization: Nonlinear least-squares solvers jointly optimize all state variables (robot, object, agent, and environmental structure) under the imposed constraints (Giberna et al., 3 Mar 2025, Rosinol et al., 2021).
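The "change descriptor" update path (detect a change, then edit only the affected subgraph) can be sketched as follows. The descriptor schema, operation names, and graph encoding are illustrative assumptions, not any cited system's actual format.

```python
def apply_change(graph, change):
    """Apply one change descriptor to the scene graph in place.
    graph: {"nodes": {id: attrs}, "edges": set of (src, dst, relation)}."""
    op = change["op"]
    if op == "add_node":
        graph["nodes"][change["id"]] = change["attrs"]
    elif op == "remove_node":
        graph["nodes"].pop(change["id"], None)
        # Drop only incident edges; the rest of the graph is untouched.
        graph["edges"] = {e for e in graph["edges"] if change["id"] not in e[:2]}
    elif op == "modify_node":
        graph["nodes"][change["id"]].update(change["attrs"])

g = {"nodes": {"cup_3": {"room": "kitchen"}},
     "edges": {("kitchen", "cup_3", "contains")}}
# A change derived from vision, an action log, or a human utterance:
# the cup was taken away.
apply_change(g, {"op": "remove_node", "id": "cup_3"})
```

Because each descriptor touches only one node and its incident edges, updates stay cheap regardless of total scene size, which is what makes the localized-editing strategy attractive.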
Efficient, real-time update is feasible; for instance, optimization cycles in constraint-based DSG SLAM have been demonstrated at ~81 ms per cycle (Giberna et al., 3 Mar 2025), and multimodal updates can be integrated with 10 Hz planning loops (Olivastri et al., 5 Nov 2024).
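The joint-optimization step can be made concrete with a toy 1D problem: two robot poses and one object position, tied together by a prior, an odometry factor, and two observation factors. The residual values and the gradient-descent solver are illustrative stand-ins for the Gauss-Newton/Levenberg-Marquardt machinery real systems use.

```python
def residuals(x0, x1, o):
    """Factors of a toy 1D DSG SLAM problem: a prior on the first
    robot pose, one odometry factor, and two robot-to-object observations."""
    return [
        x0 - 0.0,          # prior: the trajectory starts at the origin
        (x1 - x0) - 1.0,   # odometry: robot moved 1.0 m
        (o - x0) - 2.0,    # object seen 2.0 m ahead from pose 0
        (o - x1) - 1.05,   # object seen 1.05 m ahead from pose 1
    ]

def solve(iters=5000, step=0.05, eps=1e-6):
    """Minimize the summed squared residuals by numeric gradient descent."""
    x0 = x1 = o = 0.0
    for _ in range(iters):
        base = sum(r * r for r in residuals(x0, x1, o))
        grads = []
        for i in range(3):
            v = [x0, x1, o]
            v[i] += eps
            grads.append((sum(r * r for r in residuals(*v)) - base) / eps)
        x0, x1, o = x0 - step * grads[0], x1 - step * grads[1], o - step * grads[2]
    return x0, x1, o

x0, x1, o = solve()
# The two slightly inconsistent observations are fused: robot and object
# states are corrected jointly, as in constraint-based DSG SLAM.
```

Solving the same system analytically gives x0 = 0, x1 ≈ 0.983, o ≈ 2.017: the conflict between the odometry and the two object sightings is spread across all state variables rather than blamed on any single one.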
4. Hierarchy, Semantics, and Physical Interaction
The hierarchical organization of DSGs enables representation and reasoning at multiple spatial and semantic levels:
- Layered Containment: Nodes are related via containment edges (e.g., object ‘in’ room, room ‘in’ floor, floor ‘in’ building) and attached hierarchically for scalable querying and navigation (Rosinol et al., 2021, Rosinol et al., 2020, Wang et al., 17 Dec 2025).
- Semantic Labeling: Nodes inherit semantic class labels from detectors and vision-language models, enabling open-vocabulary object referencing and function-aware planning (Ge et al., 21 Feb 2025, Yan et al., 15 Oct 2024).
- Dynamic Traversability: Traversability modeling distinguishes static obstacles from movable ones, promoting certain object nodes (“operable obstacles”) to enable interaction-aware planning. Dynamic navigational edges are created to reflect possible pathways opened by moving objects (Wang et al., 17 Dec 2025).
- Functional and Affordance Reasoning: The inclusion of physical and semantic attributes supports affordance-based planning; for example, HERO combines efficiency-driven and semantic filtering to only promote truly movable obstacles (Wang et al., 17 Dec 2025).
- Topological and Geometric Abstraction: DSGs encode adjacency, support, and spatial relations (e.g., “on-top-of,” “inside,” “adjacentTo”) for rich relational inference (Rosinol et al., 2020, Ge et al., 21 Feb 2025, Yan et al., 15 Oct 2024).
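Layered containment makes queries like "which room is this object in?" a short upward traversal of the hierarchy. A minimal sketch over a child-to-parent map (all identifiers are illustrative):

```python
# Containment as child -> parent links across DSG layers.
parent = {
    "mug_7": "kitchen",       # object -> room
    "kitchen": "floor_1",     # room -> floor
    "floor_1": "building_A",  # floor -> building
}

LAYER = {"mug_7": "object", "kitchen": "room",
         "floor_1": "floor", "building_A": "building"}

def enclosing(node, layer):
    """Walk containment edges upward until a node of the given layer."""
    while node is not None and LAYER.get(node) != layer:
        node = parent.get(node)
    return node

print(enclosing("mug_7", "room"))      # kitchen
print(enclosing("mug_7", "building"))  # building_A
```

The traversal cost is bounded by the number of layers, not the number of nodes, which is one reason hierarchical containment scales well for querying.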
5. Downstream Applications and Quantitative Evaluation
DSGs underpin a wide range of robotics and embodied AI tasks:
- SLAM and Localization: Joint optimization over static and dynamic nodes achieves substantial reductions in pose-estimation error compared to static-world approaches (e.g., a 27.57% reduction in ATE relative to S-Graphs+ (Giberna et al., 3 Mar 2025)).
- Planning and Navigation: DSGs enable hierarchical semantic path planning, multi-resolution A* over building/room/place graphs, and dynamic traversability-aware planning. HERO’s dynamic navigational graph yields a 35.1% reduction in path length and a 79.4% increase in success rate versus static baselines for navigation among movable obstacles (Wang et al., 17 Dec 2025).
- Long-Term Prediction and Memory: DSGs facilitate long-term agent-trajectory prediction by providing rich environmental context to LLMs, supporting probabilistic rollout of human-object interactions and continuous-time Markov chain-based filtering (Gorlo et al., 1 May 2024). DSGs also support real-time update pipelines and memory management in shared environments (Olivastri et al., 5 Nov 2024).
- Open-Vocabulary and Semantic Interaction: Scene graphs constructed with VLMs and maintained via local subgraph updates support robust language-guided manipulation and dynamic object/entity retrieval in evolving environments, with task success and scene-graph accuracy notably surpassing static baselines (Yan et al., 15 Oct 2024, Ge et al., 21 Feb 2025).
- Multi-Agent Coordination and Simulation: DSGs integrated with event-driven simulation (e.g., FOGMACHINE) support uncertainty propagation, decentralized belief updates, and communication in multi-agent scenarios (Ohnemus et al., 10 Oct 2025).
Performance metrics are diverse: ATE for SLAM, SCDA for change detection, NLL/BoN ADE for prediction tasks, Recall@1 for object retrieval, and task-level metrics such as path length, navigation error, success rate, and computational latency (Giberna et al., 3 Mar 2025, Yan et al., 15 Oct 2024, Ge et al., 21 Feb 2025, Gorlo et al., 1 May 2024, Wang et al., 17 Dec 2025).
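Hierarchical planning over a DSG typically proceeds coarse-to-fine: route over the room layer first, then refine each leg over the place/voxel layers. The sketch below shows only the coarse stage, with breadth-first search standing in for A* and an invented room adjacency map.

```python
from collections import deque

# Room-level adjacency (coarse layer); identifiers are illustrative.
ROOMS = {"hall": ["kitchen", "office"], "kitchen": ["hall"], "office": ["hall"]}

def bfs(graph, start, goal):
    """Shortest hop path via breadth-first search."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Coarse-to-fine: plan over rooms first; a real system would then refine
# each room-to-room leg at the place layer with metric A*.
print(bfs(ROOMS, "office", "kitchen"))  # ['office', 'hall', 'kitchen']
```

Restricting the fine-grained search to the corridor of rooms found at the coarse level is what makes multi-resolution planning over building-scale graphs tractable.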
6. Open Challenges, Limitations, and Research Directions
DSGs present several ongoing challenges:
- Robust Dynamic Entity Detection: Many approaches still rely on fiducial markers or closed-world object detectors, limiting generality. Marker-free, purely vision-based detection with robust data association remains an open research area (Giberna et al., 3 Mar 2025, Olivastri et al., 5 Nov 2024).
- Uncertainty Representation: Probabilistic DSGs, modeling uncertainty in object identity, pose, existence, and movability, are necessary for robust planning under partial information and sensor noise (Ohnemus et al., 10 Oct 2025, Wang et al., 17 Dec 2025).
- Dynamic Physical Interaction: Extension to richer affordance models and physical manipulation (beyond binary movability) is required for complex multi-step tasks (Wang et al., 17 Dec 2025).
- Temporal Consistency and Memory: Maintaining globally and temporally consistent DSGs under frequent environment change, especially in multi-agent settings, is nontrivial; scalable, real-time update strategies continue to evolve (Olivastri et al., 5 Nov 2024, Ohnemus et al., 10 Oct 2025).
- Integration with LLMs and Open-Vocabulary Semantics: Ongoing work explores fusing DSGs with large language models and vision-language models for natural language interaction, task grounding, and open-world understanding (Yan et al., 15 Oct 2024, Ge et al., 21 Feb 2025, Gorlo et al., 1 May 2024).
- Evaluative Benchmarks: Standardized, large-scale benchmarks and metrics for DSG maintenance, update accuracy, and task-level performance are identified as a research need (Olivastri et al., 5 Nov 2024, Gorlo et al., 1 May 2024).
A plausible implication is that DSG-based representations will serve as the foundational world models for future autonomous robots and embodied agents, enabling robust operation, reasoning, and collaboration in complex, dynamic, and semantically rich real-world environments.
References
- (Giberna et al., 3 Mar 2025) Constraint-Based Modeling of Dynamic Entities in 3D Scene Graphs for Robust SLAM
- (Catalano et al., 10 Dec 2025) Aion: Towards Hierarchical 4D Scene Graphs with Temporal Flow Dynamics
- (Wang et al., 17 Dec 2025) HERO: Hierarchical Traversable 3D Scene Graphs for Embodied Navigation Among Movable Obstacles
- (Ge et al., 21 Feb 2025) DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation
- (Olivastri et al., 5 Nov 2024) Multi-Modal 3D Scene Graph Updater for Shared and Dynamic Environments
- (Rosinol et al., 2020) 3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans
- (Yan et al., 15 Oct 2024) Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation
- (Rosinol et al., 2021) Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
- (Gorlo et al., 1 May 2024) Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs
- (Ohnemus et al., 10 Oct 2025) FOGMACHINE -- Leveraging Discrete-Event Simulation and Scene Graphs for Modeling Hierarchical, Interconnected Environments under Partial Observations from Mobile Agents