S-Graphs: Semantic LiDAR SLAM Framework
- S-Graphs is a hierarchical semantic-relational graph framework that integrates pose, plane, room, and floor layers for efficient and interpretable LiDAR SLAM.
- It incrementally builds maps by combining keyframe insertion, RANSAC-based plane extraction, and Scan Context descriptors to robustly detect loop closures and semantic associations.
- The framework enhances computational efficiency through hierarchical compression and decentralized collaborative mapping, achieving up to a 40% reduction in optimization time.
S-Graphs (Semantic LiDAR SLAM) are a hierarchical, semantic-relational graph framework designed for real-time Simultaneous Localization and Mapping (SLAM) using 3D LiDAR data. They combine a classical pose graph with a layered scene graph that captures the geometric, semantic, and topological structure of the environment. S-Graphs support robust loop closure, data association, and collaborative mapping, particularly in multi-robot or large-scale scenarios, while minimizing computation and communication overhead through semantic abstraction and graph compression (Fernandez-Cortizas et al., 2023).
1. Hierarchical Structure and Formal Definition
An S-Graph is defined as a directed, layered graph coupling a pose graph of robot keyframes with a semantic scene graph (Fernandez-Cortizas et al., 2023, Bavle et al., 2022, Fernandez-Cortizas et al., 2024, Bavle et al., 25 Feb 2025). The canonical four-layer structure is:
- Keyframe Layer: Each node captures a robot pose at a keyframe. Odometry edges link consecutive keyframes with SE(3) measurements.
- Wall/Plane Layer: Nodes represent infinite planes (walls), parameterized by a unit normal vector n and a signed distance d, extracted from LiDAR via RANSAC-based fitting. Edges encode geometric constraints between robot poses and planes.
- Room Layer: Nodes encode semantic rooms (two- or four-wall), identified by grouping planes based on geometric relations. Plane–room edges formalize room boundaries.
- Floor Layer: Nodes represent floor entities, with rooms grouped and linked to the same floor node by inclusion relations.
Edges can be categorized as odometry edges (pose–pose), geometric measurement edges (pose–plane, plane–room, room–floor), and semantic association edges (entities representing the same object or structure) (Fernandez-Cortizas et al., 2023).
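The four-layer structure and its typed edges can be sketched as a small graph data structure. The class and field names below are purely illustrative, not the framework's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    layer: str      # "keyframe" | "plane" | "room" | "floor"
    node_id: int
    params: tuple   # SE(3) pose, plane (n, d), room center, floor index, ...

@dataclass
class Edge:
    kind: str       # "odometry" | "pose-plane" | "plane-room" | "room-floor" | "association"
    src: int
    dst: int
    measurement: tuple = ()

@dataclass
class SGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def connect(self, kind: str, src: int, dst: int, measurement=()) -> None:
        assert src in self.nodes and dst in self.nodes
        self.edges.append(Edge(kind, src, dst, measurement))

    def layer(self, name: str):
        return [n for n in self.nodes.values() if n.layer == name]

# Toy graph: two keyframes observing one wall, inside one room on one floor.
g = SGraph()
g.add_node(Node("keyframe", 0, (0.0, 0.0, 0.0)))
g.add_node(Node("keyframe", 1, (1.0, 0.0, 0.0)))
g.add_node(Node("plane", 10, (0.0, 1.0, 0.0, -2.0)))  # normal (0,1,0), distance -2
g.add_node(Node("room", 20, (0.5, 0.0)))
g.add_node(Node("floor", 30, (0,)))
g.connect("odometry", 0, 1, (1.0, 0.0, 0.0))
g.connect("pose-plane", 0, 10)
g.connect("pose-plane", 1, 10)
g.connect("plane-room", 10, 20)
g.connect("room-floor", 20, 30)
print(len(g.layer("keyframe")), len(g.edges))  # 2 5
```

Querying by layer (`g.layer("room")`) is what makes the hierarchical operations below (room-local marginalization, floor-level gating) cheap to express.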
2. Graph Construction and Data Association
The standard S-Graph construction proceeds incrementally as follows (Fernandez-Cortizas et al., 2023, Bavle et al., 2022, Wang et al., 14 Mar 2025):
- Keyframe Insertion: A new robot pose keyframe is triggered by a fixed interval or travel/motion threshold, and odometry edges link consecutive poses.
- Plane Extraction: Planar surfaces are extracted from LiDAR point clouds at each keyframe using RANSAC, creating plane nodes with associated parameters. Pose–plane edges enforce spatial constraints.
- Room Segmentation: Detected planes are grouped into rooms if geometric relations (e.g., orthogonality, opposing normals, width thresholds) are satisfied. Room nodes are created and connected to constituent planes.
- Floor Segmentation: Rooms are aggregated into floor entities based on height and spatial proximity.
- Descriptor Computation: Each room is associated with a Scan Context (SC) descriptor, computed over the room-centric point cloud, enabling robust and compact semantic matching.
- Optimization: The entire graph—poses, planes, rooms, floors—is jointly optimized using nonlinear least squares. All measurements and semantic factors are integrated in the objective.
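The plane-extraction and room-grouping steps above can be sketched as follows. `ransac_plane` and `opposing_wall_width` are illustrative helpers under simplified assumptions (single dominant plane, opposing-normal width check only), not the framework's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_plane(points, iters=300, tol=0.05):
    """Fit a plane n·p + d = 0 (unit normal n) to an (N, 3) cloud via RANSAC."""
    best_n, best_d, best_in = None, None, np.zeros(len(points), bool)
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ p0
        inliers = np.abs(points @ n + d) < tol
        if inliers.sum() > best_in.sum():
            best_n, best_d, best_in = n, d, inliers
    return best_n, best_d, best_in

def opposing_wall_width(n1, d1, n2, d2, dot_tol=0.05):
    """Room-style pairing test: if two planes face each other (n1·n2 ≈ −1),
    return the width between them, else None."""
    if n1 @ n2 > -1 + dot_tol:
        return None
    return abs(d1 + d2)

# Toy wall: points near y = 2 with mild noise, plus random outliers.
wall = np.column_stack([rng.uniform(0, 5, 500),
                        2.0 + rng.normal(0, 0.01, 500),
                        rng.uniform(0, 3, 500)])
outliers = rng.uniform(0, 5, (50, 3))
n, d, inliers = ransac_plane(np.vstack([wall, outliers]))

# Two facing walls at y = 2 and y = -3 are 5 m apart.
print(opposing_wall_width(np.array([0., 1, 0]), -2.0,
                          np.array([0., -1, 0]), -3.0))  # 5.0
```

A full room check would additionally test orthogonality between the two wall pairs and the width thresholds mentioned above.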
During collaborative SLAM, robots exchange minimal semantic summaries (e.g., room centers, SC descriptors, plane parameters) and perform peer-to-peer alignment and data association based on these signatures. Alignment uses descriptor matching and geometric verification (e.g., through VGICP). Remote entities are transformed into the local frame and integrated as graph nodes and edges (Fernandez-Cortizas et al., 2023, Fernandez-Cortizas et al., 2024).
3. Optimization and Hierarchical Compression
The S-Graph nonlinear optimization objective unifies odometry, geometric, and semantic factors (Fernandez-Cortizas et al., 2024, Bavle et al., 2022, Bavle et al., 2023). Formally, with keyframe poses x, plane parameters π, room nodes ρ, and floor nodes f, the joint estimate minimizes a sum of squared Mahalanobis-weighted residuals over all edge types:

min over {x, π, ρ, f} of  Σᵢ ‖h_odom(xᵢ, xᵢ₊₁) ⊖ zᵢ‖²_Λ + Σᵢⱼ ‖h_plane(xᵢ, πⱼ) − zᵢⱼ‖²_Λ + Σₖ ‖h_room(π, ρₖ)‖²_Λ + Σₗ ‖h_floor(ρ, fₗ)‖²_Λ

where each h(·) is the measurement model of the corresponding edge type and Λ its information matrix.
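In the linear case this joint objective collapses to an ordinary least-squares problem. A minimal 1D sketch (illustrative setup, not the actual S-Graphs solver): keyframe positions x0..x3 along a corridor and one wall position w are estimated jointly from odometry measurements zᵢ = xᵢ₊₁ − xᵢ and pose-plane measurements mᵢ = w − xᵢ:

```python
import numpy as np

# Noiseless toy data consistent with poses 0, 1, 2, 3 and a wall at 5.
z = np.array([1.0, 1.0, 1.0])        # odometry: consecutive pose deltas
m = np.array([5.0, 4.0, 3.0, 2.0])   # distance from each pose to the wall

# State ordering: [x0, x1, x2, x3, w]
A, b = [], []
A.append([1, 0, 0, 0, 0]); b.append(0.0)   # prior factor anchoring x0 = 0
for i in range(3):                          # odometry factors: x_{i+1} - x_i = z_i
    row = [0] * 5
    row[i], row[i + 1] = -1, 1
    A.append(row); b.append(z[i])
for i in range(4):                          # pose-plane factors: w - x_i = m_i
    row = [0] * 5
    row[i], row[4] = -1, 1
    A.append(row); b.append(m[i])

est, *_ = np.linalg.lstsq(np.array(A, float), np.array(b), rcond=None)
print(np.allclose(est, [0, 1, 2, 3, 5]))  # True
```

With nonlinear measurement models (SE(3) poses, plane normals), the same structure is solved iteratively by Gauss-Newton or Levenberg-Marquardt rather than in a single linear solve.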
Hierarchical compression leverages scene structure for computational and memory efficiency (Bavle et al., 2023, Bavle et al., 25 Feb 2025):
- Room-Local Marginalization: Within each room, only the first keyframe pose is retained, and redundant keyframes (which observe the same walls) are marginalized via the Schur complement. Information is preserved by introducing virtual "fill-in" edges between surviving poses.
- Windowed Optimization: Sliding-window techniques update only the most recent subgraph of keyframes, reducing per-solve complexity.
- Floor-Global and Room-Local Optimization: Floor-level global optimization is triggered at loop closures and constrains only nodes and edges within the current floor, using semantic floor tags to restrict search and avoid inter-floor aliasing; room-level optimization is similarly localized.
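Room-local marginalization can be sketched in information form. Given H Δ = b over kept variables k and marginalized keyframes m, the Schur complement gives the reduced system H' = H_kk − H_km H_mm⁻¹ H_mk, b' = b_k − H_km H_mm⁻¹ b_m; the dense fill-in in H' plays the role of the virtual edges between surviving poses. `marginalize` is an illustrative helper:

```python
import numpy as np

def marginalize(H, b, keep, marg):
    """Schur-complement marginalization of an information-form system H Δ = b."""
    Hkk = H[np.ix_(keep, keep)]
    Hkm = H[np.ix_(keep, marg)]
    Hmm_inv = np.linalg.inv(H[np.ix_(marg, marg)])
    H_red = Hkk - Hkm @ Hmm_inv @ Hkm.T
    b_red = b[keep] - Hkm @ Hmm_inv @ b[marg]
    return H_red, b_red

# Toy chain of 4 scalar poses; marginalize the middle poses 1 and 2.
H = np.array([[ 2., -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  2.]])
b = np.array([1., 0, 0, 1])
H_red, b_red = marginalize(H, b, keep=[0, 3], marg=[1, 2])

# The reduced solution equals the kept block of the full solution.
full = np.linalg.solve(H, b)
red = np.linalg.solve(H_red, b_red)
print(np.allclose(red, full[[0, 3]]))  # True
```

The exactness shown here is the point of the technique: marginalization discards variables, not information about the surviving ones.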
Empirically, this yields up to a 40% reduction in computation time (from 129 ms to 60 ms per optimization on real data), with negligible accuracy loss (Absolute Trajectory Error differences within the ~2 cm sensor noise) (Bavle et al., 2023, Bavle et al., 25 Feb 2025).
4. Semantic Loop Closure and Relational Descriptors
S-Graphs exploit semantic association for robust loop closure (Fernandez-Cortizas et al., 2023, Millan-Romera et al., 2023, Fernandez-Cortizas et al., 2024):
- Semantic Association Edges: S-Graphs create edges between entities hypothesized to represent the same physical object (room–room, plane–plane, etc.) based on descriptor matching.
- Room-Based Descriptors: The Scan Context descriptor for each room encodes the spatial distribution of points in a rotation-invariant manner. Loop closure detection proceeds by exchanging and matching these descriptors across agents.
- Graph Neural Networks (GNNs): Advanced extensions leverage GNNs to learn higher-level semantic-relational concepts (e.g., "same-room," "same-wall") directly from the graph topology, improving semantic entity discovery, association, and SLAM accuracy (Millan-Romera et al., 2023).
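A much-simplified Scan Context-style room matcher can illustrate the descriptor mechanics (this is an illustrative reduction, not the original implementation): points are binned into a polar ring × sector grid of maximum heights, and rotation invariance comes from taking the minimum distance over circular shifts of the sector axis:

```python
import numpy as np

def scan_context(points, rings=8, sectors=24, max_r=10.0):
    """points: (N, 3) in the room frame; returns a (rings, sectors) descriptor
    holding the max point height per polar bin."""
    r = np.hypot(points[:, 0], points[:, 1])
    theta = np.arctan2(points[:, 1], points[:, 0]) % (2 * np.pi)
    ri = np.minimum((r / max_r * rings).astype(int), rings - 1)
    si = np.minimum((theta / (2 * np.pi) * sectors).astype(int), sectors - 1)
    desc = np.zeros((rings, sectors))
    np.maximum.at(desc, (ri, si), points[:, 2])
    return desc

def sc_distance(d1, d2):
    """Rotation-invariant distance: best match over all sector shifts."""
    return min(np.linalg.norm(d1 - np.roll(d2, s, axis=1))
               for s in range(d2.shape[1]))

rng = np.random.default_rng(1)
pts = np.column_stack([rng.uniform(-8, 8, (1000, 2)), rng.uniform(0, 3, 1000)])
ang = np.pi / 3   # revisit the same room at a 60-degree different heading
R = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
rot = np.column_stack([pts[:, :2] @ R.T, pts[:, 2]])

same = sc_distance(scan_context(pts), scan_context(rot))
other = sc_distance(scan_context(pts),
                    scan_context(np.column_stack([rng.uniform(-8, 8, (1000, 2)),
                                                  rng.uniform(0, 3, 1000)])))
print(same < other)  # True
```

Room-centric computation is what makes this cheap to exchange between agents: only the small (rings × sectors) grid travels over the network, not the point cloud.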
Reported results show that Multi S-Graphs achieve lower bandwidth usage (kilobytes per room vs. megabytes for traditional CSLAM), reliable loop closure (no false positives in challenging corridor scenarios), substantial mapping time reduction (203 s to 123 s in real multi-robot experiments), and globally consistent maps with few-centimeter errors (Fernandez-Cortizas et al., 2023, Fernandez-Cortizas et al., 2024).
5. Distributed and Collaborative SLAM
S-Graphs are particularly well-suited for collaborative and distributed SLAM (Fernandez-Cortizas et al., 2024, Fernandez-Cortizas et al., 2023):
- Information Exchange: Robots share only distilled semantic summaries, hierarchical descriptors, and necessary map elements between peers.
- Consensus Map Generation: Peer-to-peer brokers merge local S-Graphs into a unified, collaboratively optimized semantic map. Inter-robot associations are created only after robust semantic matching and geometric registration.
- Bandwidth Minimization: Experiments demonstrate that Multi S-Graphs require orders-of-magnitude less bandwidth for map sharing compared to raw point cloud or low-level feature exchange schemes (94–98% reduction vs. decentralized CSLAM, 81–84% vs. centralized) (Fernandez-Cortizas et al., 2024).
- Decentralization: No central server is required; local storage, broadcast, and optimization suffices, with global map consistency achieved through repeated semantic-data reconciliation.
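A back-of-the-envelope sketch of why sharing distilled summaries is cheap: a room summary (center, wall plane parameters, one descriptor grid) is a few kilobytes, while the corresponding raw point cloud is megabytes. The message layout and sizes below are illustrative, not Multi S-Graphs' actual wire format:

```python
import pickle
import numpy as np

# Hypothetical room summary message vs. the raw cloud it abstracts.
room_summary = {
    "room_center": np.zeros(3),
    "wall_planes": np.zeros((4, 4)),      # (n, d) per wall
    "sc_descriptor": np.zeros((20, 60)),  # ring x sector grid
}
raw_cloud = np.zeros((500_000, 3), dtype=np.float32)  # one room's LiDAR points

summary_bytes = len(pickle.dumps(room_summary))
cloud_bytes = len(pickle.dumps(raw_cloud))
print(summary_bytes < cloud_bytes / 100)  # True: KB-scale vs MB-scale
```

The orders-of-magnitude gap in this toy comparison mirrors the 81–98% bandwidth reductions reported above.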
6. Experimental Validation and Performance
Experiments across simulated and real-world datasets (e.g., Boston Dynamics Spot robots, construction site floors) validate the advantages of S-Graphs (Fernandez-Cortizas et al., 2023, Fernandez-Cortizas et al., 2024, Bavle et al., 25 Feb 2025):
| Metric | S-Graphs+ (single) | Multi S-Graphs (collab.) |
|---|---|---|
| Mapping time [s] | 203 | 123 |
| Area mapping reduction [%] | - | 18 |
| Overlap-matching time [s] | - | 22 |
| Mean Absolute Trajectory Error (ATE) [cm] | ~4–20 (varies) | ~2–3 (multi-robot) |
| Data exchanged (2 robots) [MB] | n/a | 0.26–3.5 |
| Baseline CSLAM data [MB] | (LAMP 2.0: >1.9) | (Swarm-SLAM: >4.3) |
By integrating high-level semantic constraints, S-Graphs enhance loop closure precision beyond scan-context–only or low-level outlier-rejection methods, and avoid catastrophic divergence due to strong topological semantic structure (Fernandez-Cortizas et al., 2023, Fernandez-Cortizas et al., 2024). Decentralized variants efficiently address classic CSLAM challenges such as kidnapped initialization and dynamic environment changes.
7. Extensions and Future Research
Recent research on S-Graphs continues along several axes (Millan-Romera et al., 2023, Bavle et al., 2023, Bavle et al., 25 Feb 2025):
- Learning-Based Semantic Inference: GNNs for relational concept discovery provide enhanced scene expressivity and improve accuracy (−6.8% ATE, +1.8% map matching accuracy) with faster semantic cue extraction (Millan-Romera et al., 2023).
- Hierarchy-Aware Optimization: Marginalization and hierarchy-based graph contraction sustain real-time performance at scale, enabling high-rate updates in large environments (10–20 Hz) (Bavle et al., 2023).
- Generalization and Robustness: Ongoing work aims to extend S-Graph ontologies (e.g., adding corridors, doorways, stairways), improve robustness in dynamic scenes, and develop end-to-end differentiable inference for world-graph construction (Millan-Romera et al., 2023, Bavle et al., 25 Feb 2025).
- Collaborative Multi-Floor SLAM: Floor-level gating and hierarchical optimization prevent erroneous inter-floor associations and support robust SLAM in complex structures (Bavle et al., 25 Feb 2025).
S-Graphs have become a cornerstone of semantic LiDAR SLAM, enabling robust, interpretable, and resource-efficient mapping and localization in both single and multi-robot deployments (Fernandez-Cortizas et al., 2023, Fernandez-Cortizas et al., 2024, Millan-Romera et al., 2023).