
Semantic OctoMap: 3D Probabilistic Mapping

Updated 15 May 2026
  • Semantic OctoMap is a 3D mapping structure that extends traditional OctoMap by integrating per-voxel occupancy and semantic class probabilities.
  • The method employs Bayesian fusion, log-odds updates, and Gaussian Process inference to combine multi-modal sensor data.
  • It supports active exploration and SLAM by optimizing memory use, accelerating ray traversal, and enhancing semantic scene understanding in real time.

A Semantic OctoMap is a 3D probabilistic mapping data structure that extends the classical OctoMap representation to encode and update not only voxel (volumetric cell) occupancy but also per-voxel semantic class probabilities. Built on a sparse octree architecture, Semantic OctoMaps enable the fusion of multi-modal perception (e.g., RGB-D segmentation, LiDAR) with real-time mapping, facilitating information-driven exploration, semantic scene understanding, and efficient memory utilization. This class of mapping systems is central to contemporary robotics, UAV autonomy, and semantic SLAM research, supporting both Bayesian and kernel-based statistical fusion mechanisms at scale (Canh et al., 2024, Asgharivaskasi et al., 2021, Jadidi et al., 2017).

1. Semantic OctoMap Data Structure and Probabilistic Model

A Semantic OctoMap is fundamentally an adaptive octree, where each leaf voxel maintains:

  • An occupancy log-odds value (scalar) encoding $P(\text{occupied}\mid Z_{1:t})$.
  • A categorical probability vector or log-odds vector for $C$ semantic classes (e.g., wall, chair, free, unknown).

Bayesian Fusion Representation

For voxel $i$ at time $t$, the class probabilities $P_t(i,c)$ form a vector over the classes $c = 1, \dots, C$:

$$\mathbf{P}_t(i) = [P_t(i,1), P_t(i,2), \dots, P_t(i,C)]^T,\quad \sum_{c=1}^C P_t(i,c) = 1$$

Fusion of new observations occurs via Bayesian multiplicative updates:

$$\tilde{P}_t(i,c) = P_{t-1}(i,c)\cdot s_t(u,c)^\alpha\,,\quad P_t(i,c) = \frac{\tilde{P}_t(i,c)}{\sum_{c'}\tilde{P}_t(i,c')}$$

where $s_t(u,c)$ is the semantic softmax output for pixel $u$ projected to voxel $i$, and $\alpha$ is a fusion inertia parameter (Canh et al., 2024).
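The multiplicative update above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Canh et al.: the function name `fuse_semantics`, the default `alpha=0.5`, and the `eps` floor that keeps a zero softmax score from permanently eliminating a class are all assumptions made for the example.

```python
def fuse_semantics(prior, softmax, alpha=0.5, eps=1e-12):
    """Return P_t(i, .) given P_{t-1}(i, .) and the softmax scores s_t(u, .)."""
    if len(prior) != len(softmax):
        raise ValueError("prior and softmax must cover the same classes")
    # Unnormalized product: P~_t(i,c) = P_{t-1}(i,c) * s_t(u,c)^alpha
    unnorm = [p * max(s, eps) ** alpha for p, s in zip(prior, softmax)]
    # Normalize so that the class probabilities sum to one again
    total = sum(unnorm)
    return [v / total for v in unnorm]


# Hypothetical 4-class voxel with a uniform prior and one confident observation
prior = [0.25, 0.25, 0.25, 0.25]
observation = [0.7, 0.1, 0.1, 0.1]
posterior = fuse_semantics(prior, observation, alpha=0.5)
```

With `alpha=1.0` the update is a plain Bayesian product; with `alpha=0.0` the observation is ignored and the prior is returned unchanged, which is why smaller values make the map slower to change its mind.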

Log-Odds Multiclass Representation

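For multi-class mapping, each voxel stores a vector of log-odds taken relative to a reference class $c_0$:

$$h_t(i,c) = \log\frac{P_t(i,c)}{P_t(i,c_0)}$$

and class probabilities are recovered via softmax:

$$P_t(i,c) = \frac{\exp\big(h_t(i,c)\big)}{\sum_{c'=1}^{C}\exp\big(h_t(i,c')\big)}$$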

Efficient log-odds additive updates (with inverse measurement models) are used for each incoming ray measurement (Asgharivaskasi et al., 2021).
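As a rough sketch of why additive log-odds updates are convenient, the following Python snippet stores a voxel's semantics as log-odds relative to a reference class (index 0 here, an assumption for the example) and recovers probabilities with a softmax. The inverse measurement model of Asgharivaskasi et al. is not reproduced; the `delta` vector below is a hypothetical per-class log-likelihood ratio.

```python
import math


def to_log_odds(probs, ref=0):
    """Log-odds of each class relative to the reference class `ref`."""
    return [math.log(p / probs[ref]) for p in probs]


def softmax(h):
    """Recover class probabilities from a log-odds vector."""
    m = max(h)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


def additive_update(h, delta):
    """Fuse one measurement by adding its log-odds contribution."""
    return [a + b for a, b in zip(h, delta)]


prior = [0.5, 0.3, 0.2]
likelihood = [0.2, 0.2, 0.6]       # hypothetical measurement model output
h = to_log_odds(prior)
delta = to_log_odds(likelihood)    # log-likelihood ratio vs. the reference class
posterior = softmax(additive_update(h, delta))
```

Adding `delta` in log-odds space gives the same posterior as multiplying `prior` by `likelihood` and renormalizing, but each update is a single vector addition.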

GP-Based Semantic Mapping

Alternatively, per-voxel semantics can be inferred by querying a set of trained Gaussian Process (GP) binary classifiers; for each voxel center $x_i$ and class $c$:

$$P(i,c) = p(y_c = +1 \mid x_i, \mathcal{D})$$

where $\mathcal{D}$ denotes the GP training data and $p(y_c = +1 \mid x_i, \mathcal{D})$ is computed via the GP’s Laplace-approximated posterior and the probit likelihood (Jadidi et al., 2017).
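For a probit likelihood and a Gaussian approximation of the latent posterior (which is what a Laplace approximation provides), the predictive probability has the standard closed form $\Phi\big(\mu / \sqrt{1 + \sigma^2}\big)$, where $\mu$ and $\sigma^2$ are the latent mean and variance at the query point. The sketch below assumes those two numbers have already been produced by a trained GP for each class. The function names are hypothetical, and the final renormalization across one-vs-rest classifiers is an assumption of this example rather than something stated in the source.

```python
import math


def probit_predictive(mean, var):
    """P(y = +1) = Phi(mean / sqrt(1 + var)) for a Gaussian latent posterior."""
    z = mean / math.sqrt(1.0 + var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def class_probabilities(latents):
    """Turn per-class (mean, variance) pairs into one normalized vector.

    Normalizing the binary outputs so they sum to one is an assumption here.
    """
    scores = [probit_predictive(m, v) for m, v in latents]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical latent statistics at one voxel center for three classes
voxel_probs = class_probabilities([(1.5, 0.2), (-0.5, 0.4), (-1.0, 1.0)])
```

A larger latent variance pulls the prediction toward 0.5, so voxels far from any training data end up with less confident labels.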

2. Map Update Mechanisms and Fusion Algorithms

Occupancy Updates

Semantic OctoMaps inherit from OctoMap the log-odds update rule for voxel occupancy:

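$$L_t(i) = L_{t-1}(i) + \ell(z_t)$$

where $\ell(z_t)$ is set as $l_{\text{occ}}$ for hits and $l_{\text{free}}$ for traversed (free) voxels. Probability recovery is via:

$$P(\text{occupied}\mid Z_{1:t}) = \frac{1}{1 + \exp\big(-L_t(i)\big)}$$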

(Canh et al., 2024).
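A minimal Python sketch of the occupancy update for a single voxel follows. The increment and clamping constants are illustrative values chosen for the example, not parameters reported in the cited work, and the function names are hypothetical.

```python
import math

# Assumed constants for illustration only
L_OCC, L_FREE = 0.85, -0.4   # log-odds increments for a hit and a traversed voxel
L_MIN, L_MAX = -2.0, 3.5     # clamping bounds on the accumulated log-odds


def update_occupancy(log_odds, hit):
    """Add the hit/free increment to a voxel's log-odds and clamp the result."""
    delta = L_OCC if hit else L_FREE
    return min(L_MAX, max(L_MIN, log_odds + delta))


def probability(log_odds):
    """Recover P(occupied) from the log-odds value."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# A voxel starting at P = 0.5 that is hit three times, then traversed once
value = 0.0
for hit in (True, True, True, False):
    value = update_occupancy(value, hit)
p_occupied = probability(value)
```

Clamping bounds the number of contradicting measurements needed to flip a voxel's state, so the map can still react after a long run of consistent observations.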

Semantic Bayesian Fusion

For each keyframe or sensor update:

  • Project segmented pixels into 3D points, determine endpoint voxels.
  • Update each endpoint voxel’s class distribution using the Bayesian product with fusion inertia $\alpha$ (Canh et al., 2024).

Multiclass Bayesian Mapping

For multi-class sensors (range-category):

  • Use an inverse observation model to compute a $C$-vector update for each traversed voxel along a ray.
  • Apply the additive log-odds update and perform octree pruning when eight children share an identical probability vector (Asgharivaskasi et al., 2021).
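The pruning step can be illustrated with a small recursive routine. This is a sketch under simplifying assumptions: every inner node has exactly eight children, every leaf carries a probability vector, and vectors are compared with a small tolerance `tol` rather than for exact equality. The `Node` class and function names are hypothetical.

```python
class Node:
    """Octree node: a leaf stores a class probability vector, an inner node 8 children."""

    def __init__(self, probs=None, children=None):
        self.probs = probs
        self.children = children

    def is_leaf(self):
        return self.children is None


def same_vector(a, b, tol):
    """True if both vectors exist, have equal length, and agree within tol."""
    if a is None or b is None or len(a) != len(b):
        return False
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def prune(node, tol=1e-9):
    """Collapse subtrees bottom-up; return True if `node` is a leaf afterwards."""
    if node.is_leaf():
        return True
    # Build a list (not a generator) so every child subtree is visited
    children_are_leaves = [prune(child, tol) for child in node.children]
    if not all(children_are_leaves):
        return False
    first = node.children[0].probs
    if all(same_vector(child.probs, first, tol) for child in node.children):
        # Eight identical children: replace them with a single coarser leaf
        node.probs, node.children = list(first), None
        return True
    return False
```

Because the recursion works bottom-up, a large homogeneous region collapses level by level into a single coarse leaf.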

GP Map Inference

For GP-based methods:

  • Collect labeled 3D observations as GP training data.
  • After training, conduct batched or incremental inference at every leaf node center to assign/update semantic class probability vectors (Jadidi et al., 2017).

3. Data Structures, Memory, and Computational Complexity

Semantic OctoMaps utilize pointer-based sparse octree data structures:

  • Each node represents an axis-aligned cube, with eight children recursively subdividing space.
  • Only observed regions are instantiated, optimizing for surface area rather than total volume.

Operation | Complexity | Reference
Insert/update (one point) | cc4 | (Canh et al., 2024)
Ray traversal | cc5 per update | (Canh et al., 2024)
Semantic GP inference | cc6 per batch | (Jadidi et al., 2017)
Mapping update (multi-K) | cc7 | (Asgharivaskasi et al., 2021)
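To make the pointer-based sparse octree concrete, here is a minimal Python sketch in which child nodes are allocated only when a point falls inside them. The class names, the `node_count` bookkeeping, and the fixed `max_depth` are assumptions for illustration and do not mirror any particular library.

```python
class OctreeNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = [None] * 8   # child pointers, allocated lazily
        self.value = None            # payload, e.g. log-odds or class vector


class SparseOctree:
    def __init__(self, center, half_size, max_depth):
        self.root = OctreeNode()
        self.center = center
        self.half_size = half_size
        self.max_depth = max_depth
        self.node_count = 1

    def insert(self, point, value):
        """Descend one level per iteration, creating nodes only along the path."""
        node = self.root
        cx, cy, cz = self.center
        half = self.half_size
        for _ in range(self.max_depth):
            # One bit per axis selects which of the eight children holds the point
            bx, by, bz = int(point[0] >= cx), int(point[1] >= cy), int(point[2] >= cz)
            idx = bx | (by << 1) | (bz << 2)
            half /= 2.0
            cx += half if bx else -half
            cy += half if by else -half
            cz += half if bz else -half
            if node.children[idx] is None:
                node.children[idx] = OctreeNode()
                self.node_count += 1
            node = node.children[idx]
        node.value = value
        return node


# A 16 m cube with depth 4 gives 1 m leaves; only touched branches are allocated
tree = SparseOctree(center=(0.0, 0.0, 0.0), half_size=8.0, max_depth=4)
tree.insert((0.1, 0.1, 0.1), "chair")
tree.insert((0.2, 0.2, 0.2), "chair")   # same leaf, no new nodes
```

In this sketch an insertion visits one node per tree level, and memory grows with the number of distinct leaves that have been observed rather than with the volume of the bounding cube.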

Memory usage for semantic mapping (10–15 MB per 10×10×3 m at 5 cm resolution; cc8 voxels ≈ 12 MB) is significantly lower than raw point cloud storage, with <20 MB sufficient for real-time UAV mapping at cc980% mean IU accuracy (Canh et al., 2024).

4. Integration with Perception and SLAM Systems

A Semantic OctoMap operates in concert with:

  • A SLAM backend providing accurate 6-DoF global pose for each frame (e.g., ORB-SLAM3).
  • A semantic segmentation frontend (e.g., PSPNet) outputting per-pixel softmax class distributions.

At each keyframe:

  1. RGB-D frames are processed to extract ORB features and estimate pose ii0.
  2. PSPNet infers a per-pixel class probability map; semantic runs ii1 ms/frame on TensorRT GPU (Canh et al., 2024).
  3. Depth pixels are back-projected using camera intrinsics and global pose; points and their semantic vectors are fused into the octomap along the corresponding rays.
  4. Conflicting semantic and occupancy evidence is reconciled through inertia (via $\alpha$) and probability normalization.
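Step 3 of the list above can be sketched with a pinhole camera model. The intrinsics (`fx`, `fy`, `cx`, `cy`), the pose given as a 3×3 rotation plus a translation, and the `voxel_key` helper are assumptions for this example; a real pipeline would take them from the camera calibration and the SLAM backend.

```python
import math


def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth into the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_world(point_cam, rotation, translation):
    """Apply the camera-to-world pose: p_world = R * p_cam + t."""
    return tuple(
        sum(rotation[r][k] * point_cam[k] for k in range(3)) + translation[r]
        for r in range(3)
    )


def voxel_key(point, resolution):
    """Integer grid index of the voxel containing `point`."""
    return tuple(math.floor(c / resolution) for c in point)


# Hypothetical intrinsics and pose
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_cam = back_project(u=420, v=240, depth=2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_world = to_world(p_cam, identity, (1.02, 0.0, 0.01))
key = voxel_key(p_world, resolution=0.05)
```

The voxel found this way is the ray endpoint that receives the occupancy hit and the pixel's semantic vector, while the voxels between the camera origin and the endpoint receive free-space updates.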

5. Information-Theoretic Semantic Exploration

Semantic OctoMaps directly support planning and active exploration by maximizing expected semantic information gain.

Shannon Semantic Mutual Information (SSMI)

For a trajectory and a set of simulated future rays:

ii3

SSMI can be efficiently computed using semantic run-length encoding (SRLE) for ray-octree intersections:

  • Compresses sequences of homogeneous voxels into ii4 segments.
  • Enables ii5 time per ray rather than ii6, crucial for scaling to large environments (Asgharivaskasi et al., 2021).
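The compression idea behind run-length encoding along a ray can be shown with a simplified example. This sketch encodes only a sequence of per-voxel labels; the SRLE of Asgharivaskasi et al. operates on the octree's semantic content, so treat the function and the sample ray as illustrative assumptions.

```python
from itertools import groupby


def run_length_encode(labels):
    """Collapse consecutive identical labels into (label, run_length) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]


# Hypothetical voxel labels met by one ray, ordered from the sensor outward
ray = ["free"] * 5 + ["unknown"] * 2 + ["wall"]
segments = run_length_encode(ray)
# segments == [("free", 5), ("unknown", 2), ("wall", 1)]
```

Any per-ray quantity that is constant within a homogeneous run can then be evaluated once per segment instead of once per voxel.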

The planning loop involves:

  • Extracting frontiers (boundaries between known and unknown).
  • Simulating rays along prospective paths, scoring each by SSMI per travel cost.
  • Executing the maximal information gain trajectory, then replanning. Empirical results demonstrate 30–50% lower travel per entropy reduction versus semantic-agnostic or frontier methods while running onboard at ii7–ii8 Hz (Asgharivaskasi et al., 2021).
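The selection step of this loop reduces to ranking candidates by information per unit travel cost. The sketch below assumes the information gain of each candidate has already been computed by an SSMI evaluation, which is not implemented here; the entropy helper only illustrates how per-voxel uncertainty is measured, and the candidate tuples are invented for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one voxel's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def best_trajectory(candidates):
    """Pick the candidate with the highest information gain per travel cost.

    Each candidate is a (name, info_gain, travel_cost) tuple with travel_cost > 0.
    """
    return max(candidates, key=lambda c: c[1] / c[2])


# Hypothetical candidates: (name, expected information gain, path length in meters)
candidates = [
    ("frontier_a", 12.0, 8.0),
    ("frontier_b", 9.0, 3.0),
    ("frontier_c", 20.0, 25.0),
]
choice = best_trajectory(candidates)
```

Dividing by travel cost favors nearby informative regions over distant ones with a larger raw gain, which matches the "SSMI per travel cost" scoring described above.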

6. Practical Implementation and Performance Considerations

System implementations partition hardware resources:

  • GPU: Batched semantic segmentation.
  • CPU: SLAM, octomap fusion, ray traversal and Bayesian updates.

In “S3M,” the Jetson Xavier AGX achieves:

  • 10–15 MB map size for a 10×10×3 m volume (5 cm voxels).
  • 10 Hz mapping and semantic updates (processing ii9 voxels in 2 min flight).
  • Absolute trajectory error improvements over classic SLAM (reduction to tt0–tt1 m ATE); semantic accuracy 82% mean IU (Canh et al., 2024).

Trade-offs include:

  • Voxel size tt2: smaller yields finer detail but cubic memory cost.
  • tt3: overly bold values destabilize mapping.
  • Semantic inertia $\alpha$: balances adaptation with robustness to segmentation noise.
  • Clamping log-odds avoids runaway certainty in unstable or ambiguous regions.
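The cubic cost of a smaller voxel size is easy to check with a worked count. The snippet below computes how many cells a dense grid would need for the 10×10×3 m volume mentioned earlier. It uses integer millimeters to avoid floating-point rounding, and it gives an upper bound only, since a sparse octree stores just the observed cells.

```python
def dense_voxel_count(dims_mm, voxel_mm):
    """Number of cells in a dense grid covering a box, using ceiling division."""
    count = 1
    for d in dims_mm:
        count *= -(-d // voxel_mm)   # ceil(d / voxel_mm) with integers
    return count


volume_mm = (10_000, 10_000, 3_000)            # 10 x 10 x 3 m
at_5cm = dense_voxel_count(volume_mm, 50)      # 200 * 200 * 60   = 2,400,000
at_2_5cm = dense_voxel_count(volume_mm, 25)    # 400 * 400 * 120  = 19,200,000
```

Halving the voxel edge multiplies the dense cell count by eight, which is why resolution is usually the first parameter to tune on memory-constrained platforms.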

7. Comparative Approaches and Research Directions

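Several approaches for Semantic OctoMap construction have been demonstrated: Bayesian multiplicative fusion of segmentation outputs (Canh et al., 2024), additive multiclass log-odds mapping (Asgharivaskasi et al., 2021), and GP-based semantic inference (Jadidi et al., 2017).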

Semantic OctoMap representations are being integrated into active exploration, lifelong mapping, and high-level reasoning tasks, with growing attention to compression, uncertainty quantification, and efficient incremental learning as important future directions. Their role is increasingly central in bridging the gap between geometric SLAM and high-level scene understanding.
