XGrid-Mapping: Hybrid Neural LiDAR Mapping

Updated 27 December 2025

XGrid-Mapping is a hybrid grid framework that fuses explicit occupancy-based sparse grids with implicit neural feature grids for high-fidelity LiDAR mapping.
It utilizes submap partitioning, narrow-band ray traversal, and a distillation-based overlap alignment to achieve real-time scene updates and cross-submap consistency.
Dynamic removal of transient objects and efficient memory management enhance mapping robustness and scalability in complex, incremental environments.

XGrid-Mapping is a hybrid grid framework designed for large-scale, efficient, and robust incremental neural LiDAR mapping. By integrating explicit occupancy-based sparse grids with implicit neural feature grids, XGrid-Mapping achieves real-time scene updating and high-fidelity reconstruction, overcoming the efficiency bottlenecks and structural discontinuities of prior approaches that employ either representation alone. The system employs submap partitioning, overlap alignment via feature distillation, and dynamic removal of transient objects, combining geometric and learned methodologies for spatial mapping (Song et al., 24 Dec 2025).

1. Hybrid Grid Representation

XGrid-Mapping models each spatial submap Ω with two joint representations: (a) an explicit sparse grid Ω^s, and (b) a dense implicit grid Ω^d. The explicit sparse grid is instantiated via the fVDB data structure, which hierarchically hashes and stores only observed voxels. Each voxel c encodes truncated signed distance fields (TSDF) or occupancy values T(c). This structure allows rapid exclusion of free-space during ray traversal and provides geometric priors for surface localization.
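The role of the explicit grid can be illustrated with a minimal sketch. It uses a plain Python dictionary keyed by integer voxel coordinates as a stand-in for fVDB; the class `SparseVoxelGrid`, its methods, and the fixed-step ray march are hypothetical simplifications and not the fVDB API, which relies on hierarchical VDB trees on the GPU.

```python
import numpy as np

class SparseVoxelGrid:
    """Toy sparse grid: only observed voxels are stored (stand-in for fVDB)."""

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        self.cells = {}  # (i, j, k) -> number of points observed in the voxel

    def key(self, p):
        idx = np.floor(np.asarray(p, dtype=float) / self.voxel_size)
        return tuple(int(v) for v in idx)

    def insert(self, points):
        for p in points:
            k = self.key(p)
            self.cells[k] = self.cells.get(k, 0) + 1

    def first_hit(self, o, u, t_max, step=None):
        """March along o + t*u and return the first t inside an observed voxel."""
        step = step or 0.5 * self.voxel_size
        for t in np.arange(0.0, t_max, step):
            if self.key(o + t * u) in self.cells:
                return float(t)
        return None  # the ray only crossed unobserved (free) space

grid = SparseVoxelGrid(voxel_size=0.5)
grid.insert(np.array([[5.1, 0.1, 0.1]]))
origin = np.array([0.1, 0.1, 0.1])
t_hit = grid.first_hit(origin, np.array([1.0, 0.0, 0.0]), t_max=10.0)
t_miss = grid.first_hit(origin, np.array([0.0, 1.0, 0.0]), t_max=10.0)
```

Rays that never enter an observed voxel return `None`, so no samples are spent on free space, while the hit distance gives a coarse surface location around which finer samples can be placed.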

The implicit grid Ω^d leverages multiresolution hash encoding (in the style of Tiny-CUDA-NN), associating each grid vertex v_j with a learnable feature vector h(v_j) ∈ ℝ^F. For a query point p, the features at the eight vertices of the enclosing voxel are interpolated and fed into a shared multi-layer perceptron F_θ to yield a continuous signed distance $\hat{s}(p)$:

$\hat s(p) = F_θ(\text{Interp}(\{h(v_j)\}_{j=1}^8)).$
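The query above can be sketched at a single resolution level in NumPy. The sketch assumes that Interp is trilinear interpolation, which is the usual choice for hash-grid encodings but is not stated explicitly here; the feature width, the layer sizes, and the random weights are illustrative placeholders rather than values from the paper.

```python
import numpy as np

# Corner offsets of a unit voxel, ordered with x varying fastest.
CORNERS = np.array([(i, j, k) for k in (0, 1) for j in (0, 1) for i in (0, 1)], dtype=float)

def trilinear_weights(p, voxel_min, voxel_size):
    """Interpolation weights of the 8 corners for a point p inside the voxel."""
    t = (np.asarray(p, dtype=float) - voxel_min) / voxel_size  # local coords in [0, 1]^3
    per_axis = np.where(CORNERS == 1.0, t, 1.0 - t)            # shape (8, 3)
    return per_axis.prod(axis=1)

def mlp(x, params):
    """Tiny two-layer perceptron standing in for F_theta."""
    W1, b1, W2, b2 = params
    return float((np.maximum(x @ W1 + b1, 0.0) @ W2 + b2)[0])

rng = np.random.default_rng(0)
F = 4                                            # feature width (illustrative)
h = rng.normal(size=(8, F))                      # features h(v_j) at the 8 vertices
params = (rng.normal(size=(F, 16)), np.zeros(16), rng.normal(size=(16, 1)), np.zeros(1))

voxel_min, voxel_size = np.array([1.0, 2.0, 3.0]), 0.5
p = voxel_min + voxel_size * np.array([0.2, 0.5, 0.9])
w = trilinear_weights(p, voxel_min, voxel_size)
s_hat = mlp(w @ h, params)                       # predicted signed distance at p
```

In a multiresolution encoding, the interpolated features of all levels would be concatenated before entering the perceptron; the single level here keeps the example short.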

During both training and inference, LiDAR rays $r_i(t) = o + t u_i$ are first traversed within Ω^s to restrict samples to a narrow band around surfaces. Sample points are generated as $p_s = o + (d_i + \delta) u_i$, $\delta \sim \mathcal U(-T_r, T_r)$, where $d_i$ is the surface depth along ray $i$ and $T_r$ is the half-width of the sampling band. The ground-truth SDF $s(p_s)$ and network output $\hat{s}(p_s)$ are transformed into occupancy probabilities and optimized via a hybrid loss:

$\mathcal L = \lambda_{\mathrm{bce}} \mathcal L_{\mathrm{bce}} + \lambda_{\mathrm{eik}} \mathcal L_{\mathrm{eik}},$

where $\mathcal L_{\mathrm{bce}}$ is a per-point occupancy cross entropy, and $\mathcal L_{\mathrm{eik}}$ enforces Eikonal constraints on the SDF gradient.
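A minimal NumPy sketch of the sampling and loss follows. Several details are assumptions rather than statements from the source: the ground-truth SDF is approximated by the signed offset along the ray ($s = -\delta$, positive in front of the surface), the SDF-to-occupancy map is a logistic sigmoid with scale `sigma`, the Eikonal term is evaluated with central finite differences, and an analytic plane SDF stands in for the trained network. The loss weights are placeholders.

```python
import numpy as np

def sample_narrow_band(o, u, d, T_r, n, rng):
    """Draw n samples p_s = o + (d + delta) * u with delta ~ U(-T_r, T_r)."""
    delta = rng.uniform(-T_r, T_r, size=n)
    return o + (d + delta)[:, None] * u, delta

def occupancy(s, sigma):
    """Assumed SDF-to-occupancy map: negative SDF (behind the surface) -> occupied."""
    return 1.0 / (1.0 + np.exp(s / sigma))

def bce_loss(s_pred, s_true, sigma):
    p = np.clip(occupancy(s_pred, sigma), 1e-7, 1.0 - 1e-7)
    q = occupancy(s_true, sigma)
    return float(-np.mean(q * np.log(p) + (1.0 - q) * np.log(1.0 - p)))

def eikonal_loss(f, pts, eps=1e-3):
    """Penalize deviation of the SDF gradient norm from 1 (finite differences)."""
    grads = np.stack(
        [(f(pts + eps * e) - f(pts - eps * e)) / (2.0 * eps) for e in np.eye(3)], axis=1
    )
    return float(np.mean((np.linalg.norm(grads, axis=1) - 1.0) ** 2))

rng = np.random.default_rng(0)
o, u, d, T_r = np.zeros(3), np.array([1.0, 0.0, 0.0]), 5.0, 0.3
pts, delta = sample_narrow_band(o, u, d, T_r, 64, rng)
s_true = -delta                                   # projective SDF along the ray (assumption)

def plane_sdf(p):
    """Stand-in for the network: exact SDF of a wall at x = 5."""
    return 5.0 - p[:, 0]

lam_bce, lam_eik = 1.0, 0.1                       # placeholder weights
total = lam_bce * bce_loss(plane_sdf(pts), s_true, 0.1) + lam_eik * eikonal_loss(plane_sdf, pts)
```

Because the stand-in is an exact SDF, its gradient norm is 1 everywhere and the Eikonal term vanishes; a network that drifts from a valid distance field would be penalized by that term even where the occupancy term is satisfied.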

2. Submap-Based Incremental Pipeline

The spatial domain is partitioned into fixed-size, axis-aligned submaps $\{\Omega_0, \Omega_1, \ldots\}$, each maintaining its own explicit and implicit grids. Only one submap is active at any time; new scans are incorporated as long as the entry rate $r$ (the fraction of scan points that fall within the current submap) stays above a threshold $r_{\mathrm{min}}$. When $r < r_{\mathrm{min}}$, the current submap is closed: the next submap records its center aligned to grid coordinates, initializes fresh Ω^s and Ω^d, and processing proceeds.

The scan-processing pipeline involves:

Transformation and dynamic removal of moving objects,
Updating Ω^s with static points,
Hybrid grid training on sampled surface points (using narrow-band guidance from Ω^s),
Overlap alignment loss on newly created submaps,
Key-scan management and mesh extraction for finalized submaps.

Memory usage is controlled by the sparse allocation of Ω^s and by releasing data structures once a submap has been meshed, which bounds peak overhead by the number of live submaps (typically 1–2).

3. Distillation-Based Overlap Alignment

Submap transitions introduce discontinuities in the implicit grid due to random initialization of feature vectors $h^{(t+1)}$ in Ω^d for new submaps. To mitigate this, XGrid-Mapping implements a distillation-based feature alignment. The overlap set $\mathcal O$ of voxels with intersecting coverage in consecutive submaps is computed:

$\mathcal O = \{\,c \mid c \in \Omega^s_t \wedge c \in \Omega^s_{t+1}\,\}.$

For each overlap voxel, the feature vectors at all eight vertices are retrieved from both submaps, and an $\ell_1$ distillation loss is applied:

$\mathcal L_{\mathrm{align}} = \lambda_{\mathrm{align}} \sum_{c \in \mathcal O} \sum_{j=1}^8 \lVert h_j^{(t)} - h_j^{(t+1)} \rVert_1.$

This loss is backpropagated through the shared MLP and hash tables, typically over additional iterations that focus on the overlapping region, which improves cross-submap consistency.
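The overlap set and the alignment loss can be sketched in NumPy. The sketch stores an (8, F) feature array per voxel, which is a simplification: in a hashed grid, vertices are shared between neighboring voxels and features live in hash tables. Treating the previous submap as a frozen teacher and updating only the new submap with a subgradient step is also an assumption suggested by the term "distillation", not a detail given in the source.

```python
import numpy as np

def overlap(voxels_t, voxels_t1):
    """Voxels present in the sparse grids of both consecutive submaps."""
    return sorted(set(voxels_t) & set(voxels_t1))

def align_loss(feat_t, feat_t1, O, lam):
    """lam * sum over overlap voxels and their 8 vertices of the L1 feature gap."""
    return lam * sum(float(np.abs(feat_t[c] - feat_t1[c]).sum()) for c in O)

F, lam, lr = 2, 0.5, 0.1                          # illustrative values
feat_t = {(0, 0, 0): np.zeros((8, F)), (1, 0, 0): np.zeros((8, F))}   # previous submap
feat_t1 = {(1, 0, 0): np.ones((8, F)), (2, 0, 0): np.ones((8, F))}    # new submap

O = overlap(feat_t, feat_t1)
loss_before = align_loss(feat_t, feat_t1, O, lam)

# One subgradient step on the new submap only; the old submap is kept fixed.
for c in O:
    feat_t1[c] -= lr * lam * np.sign(feat_t1[c] - feat_t[c])
loss_after = align_loss(feat_t, feat_t1, O, lam)
```

Only the shared voxel contributes to the loss, and the step pulls the new features toward the old ones, which is the behavior that removes seams at submap boundaries.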

4. Dynamic Removal Module

To ensure mapping integrity, XGrid-Mapping applies dynamic object removal prior to the grid update. Adopting the FreeDOM scheme of region growing with raycasting, each LiDAR point is examined in the context of its voxel neighbors; static compatibility is determined by consistency with observed structure. Points failing this criterion are labeled dynamic and excluded. The filtered set $S_{\mathrm{static}}$ is then used to update Ω^s. There is no fixed pruning threshold; rather, voxels lacking support from recent static measurements are left inactive, and entire submaps are released after meshing, avoiding global garbage collection.

This module ensures that the mapping maintains only persistent, static environmental features, enhancing robustness to transient and non-stationary elements.

5. Experimental Evaluation

XGrid-Mapping demonstrates measurable improvements over existing mapping frameworks on standard datasets. Table 1 (below) summarizes per-frame mapping speed across Maicity and KITTI sequences, establishing real-time capability with stable runtimes.

Method	MAI00	MAI01	KT03	KT07
NeRF-LOAM	4.41	1.26	4.52	6.72
Submap only	0.99	0.65	1.03	1.01
fVDB only	0.49	0.31	0.55	0.51
XGrid-Mapping	0.31	0.28	0.29	0.26
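The relative cost of each method can be read off Table 1 with a few lines of arithmetic. The sketch assumes the entries are per-frame times, so that lower is better; the text does not state the unit explicitly.

```python
# Values copied from Table 1; interpreted as per-frame time (assumption).
table1 = {
    "NeRF-LOAM":     {"MAI00": 4.41, "MAI01": 1.26, "KT03": 4.52, "KT07": 6.72},
    "Submap only":   {"MAI00": 0.99, "MAI01": 0.65, "KT03": 1.03, "KT07": 1.01},
    "fVDB only":     {"MAI00": 0.49, "MAI01": 0.31, "KT03": 0.55, "KT07": 0.51},
    "XGrid-Mapping": {"MAI00": 0.31, "MAI01": 0.28, "KT03": 0.29, "KT07": 0.26},
}

ref = table1["XGrid-Mapping"]
ratios = {
    method: {seq: row[seq] / ref[seq] for seq in row}
    for method, row in table1.items()
}

for method, row in ratios.items():
    print(method, {seq: round(v, 1) for seq, v in row.items()})
```

Under that reading, the ratio against NeRF-LOAM ranges from 4.5 on MAI01 to roughly 26 on KT07, and each ablated variant (submap only, fVDB only) is slower than the full system on every sequence.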

Reconstruction metrics in Table 2 confirm enhanced accuracy, completeness, and Chamfer-L1 performance relative to state-of-the-art baselines. Ablation studies (Tables 3 and 4) further isolate the contributions of overlap alignment, key-scan replay, and dynamic removal, each improving mapping consistency and quantitative fidelity.

Method	Acc.↓	Comp.↓	C-L1↓	F-score↑
VDBFusion	3.44	3.65	3.55	93.3
SHINE-Mapping	4.62	2.22	3.42	92.3
NeRF-LOAM	3.60	2.07	2.84	94.2
PIN-SLAM	5.60	2.70	4.15	87.6
XGrid-Mapping	3.38	2.01	2.69	95.3
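The three distance columns of Table 2 appear to be related in the usual way, with Chamfer-L1 equal to the mean of accuracy and completeness. The short check below tests that reading against the reported numbers; the 0.01 tolerance allows for rounding of the published values.

```python
# (Acc., Comp., C-L1) copied from Table 2.
table2 = {
    "VDBFusion":     (3.44, 3.65, 3.55),
    "SHINE-Mapping": (4.62, 2.22, 3.42),
    "NeRF-LOAM":     (3.60, 2.07, 2.84),
    "PIN-SLAM":      (5.60, 2.70, 4.15),
    "XGrid-Mapping": (3.38, 2.01, 2.69),
}

# Gap between the reported C-L1 and the mean of Acc. and Comp.
gaps = {m: abs((acc + comp) / 2.0 - cl1) for m, (acc, comp, cl1) in table2.items()}
consistent = all(g <= 0.01 for g in gaps.values())

best_cl1 = min(table2, key=lambda m: table2[m][2])
```

Every row agrees with the mean to within rounding, so improvements in C-L1 can be traced directly to the accuracy and completeness columns.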

A plausible implication is that fusing explicit surface priors with dense neural representations is what enables both speed and accuracy. The modularity of submap management and memory release further supports scalability in large-scale, real-world deployment scenarios.

6. Context and Significance

The XGrid-Mapping framework marks a transition in neural LiDAR mapping towards hybridization: explicit grid structures enable fast geometric reasoning, while implicit neural methods afford expressive modeling of nuanced spatial detail. Submap-based organization, overlap distillation, and dynamic removal collectively address challenges of incremental large-scale mapping, including submap discontinuity, catastrophic forgetting, and dynamic scene adaptation. The architecture exhibits practical advantages in runtime performance and mapping quality, suggesting utility for advanced navigation and real-time environmental understanding in autonomous systems.

Future research directions include further optimizing the feature hash functions, generalizing the overlap alignment scheme beyond $\ell_1$ penalties, and adapting the framework for higher-dimensional or multi-modal sensor fusion tasks (Song et al., 24 Dec 2025).

References (1)

XGrid-Mapping: Explicit Implicit Hybrid Grid Submaps for Efficient Incremental Neural LiDAR Mapping (2025)
