Hierarchical Voxel Hashing (hVox)
- Hierarchical Voxel Hashing (hVox) is a sparse multi-resolution voxel structure that uses hash-based indexing to efficiently represent 3D data.
- It combines hierarchical grids with hashed keys (Morton codes in LiDAR-inertial odometry, perfect spatial hashing in 3D CNNs) to give constant-time access to non-empty voxels.
- hVox supports real-time updates and memory that scales with occupancy; reported results show higher throughput than tree-based and flat-hash maps in odometry, and a smaller memory footprint than octree-based networks in 3D CNNs.
Hierarchical Voxel Hashing (hVox) is a sparse data structure for efficient multi-resolution spatial representation of 3D data. It gives memory- and computation-efficient access to geometric, topological, and learned feature information at several grid resolutions, using hash-based indexing for non-empty voxels. Prominent recent uses include LiDAR-inertial odometry (LIO) with fast correspondence and surface normal queries (Choi et al., 3 Dec 2025), as well as 3D convolutional neural networks (CNNs) for shape analysis and part segmentation (Shao et al., 2018). The core distinguishing features of hVox are its hierarchy of voxel resolutions, compact hashed storage, and mechanisms for O(1) access to geometric or feature aggregates.
1. Hierarchical Multi-Resolution Voxel Structure
hVox comprises a multi-level collection of nested uniform grids, with each level storing only occupied voxels via hash tables. In the context of LIO, a two-level hierarchy is prominent:
- Level-0 (L0): Finest voxels with a small edge length (e.g., 0.5 m), each cell storing the running centroid of all points inside it.
- Level-1 (L1): Coarser voxels with a larger edge length, each covering a block of L0 cells. L1 cells aggregate the already-computed L0 centroids into a planar surfel, including mean position, covariance, and normal.
For 3D CNNs, the hierarchy typically spans more levels: each level is a voxel grid at its own resolution, with the finest grids capturing high spatial detail and the coarser ones providing multiscale aggregation. Only the non-empty subset is stored at each level, via hashing.
This hierarchy facilitates spatial decoupling: information such as aggregated geometry (e.g., centroids, surfels) can be efficiently computed at each scale, supporting both fine localization and robust higher-order statistics. In LIO, this enables robust normal estimation by ensuring that sufficient centroids are always available for each surfel (Choi et al., 3 Dec 2025).
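The nesting of the two levels can be illustrated with a minimal sketch. It assumes that each L1 cell covers an integer number of L0 cells per axis; the constant `BLOCK` and both function names are hypothetical and not taken from the cited work.

```python
import math

L0_SIZE = 0.5  # finest edge length in metres (the example value from the text)
BLOCK = 2      # hypothetical: number of L0 cells per L1 cell along each axis


def l0_index(point, size=L0_SIZE):
    """Quantize a 3D point to its integer L0 voxel index."""
    return tuple(math.floor(c / size) for c in point)


def l1_parent(idx0, block=BLOCK):
    """Map an L0 index to the L1 cell containing it.

    Floor division keeps the mapping consistent for negative indices.
    """
    return tuple(i // block for i in idx0)


p = (1.3, -0.2, 0.7)
i0 = l0_index(p)    # (2, -1, 1)
i1 = l1_parent(i0)  # (1, -1, 0)
```

Because the parent index is a pure function of the child index, no explicit parent pointers need to be stored in either hash table.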
2. Spatial Hashing and Morton (Z-Order) Encoding
Spatial indexing in hVox is accomplished using hash functions that compactly pack the subset of non-empty voxels into arrays. For real-time geometric queries, cache-friendly key generation is crucial. One approach is Morton (Z-order) coding:
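Given integer 3D voxel coordinates $(x, y, z)$, the Morton code is computed by bit-interleaving, so that bit $i$ of each coordinate lands in bits $3i$, $3i+1$, and $3i+2$ of the key:

$$m(x, y, z) = \sum_{i=0}^{b-1} \left( x_i\, 2^{3i} + y_i\, 2^{3i+1} + z_i\, 2^{3i+2} \right)$$

Here $x_i$ denotes the $i$-th bit of $x$, and the axis order within each triple is a convention. With $b$ bits per axis, this supports coverage of $2^{3b}$ voxels; for example, $b = 21$ fits in a 64-bit key. The sequential nature of Z-order curves ensures spatially adjacent queries are likely to access neighboring memory locations, providing high cache locality for streaming data (e.g., LiDAR scans) (Choi et al., 3 Dec 2025).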
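Bit-interleaving is easiest to see in code. The sketch below is a plain-Python illustration, not the cited implementation: the default of 21 bits per axis, the x-lowest bit order, and the `voxel_key` bias for negative indices are all assumptions made for the example.

```python
def morton3d(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of x, y, z (x takes the lowest bit)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def voxel_key(ix: int, iy: int, iz: int, bits: int = 21) -> int:
    """Hypothetical helper: bias signed indices into the unsigned range first."""
    bias = 1 << (bits - 1)
    return morton3d(ix + bias, iy + bias, iz + bias, bits)


# Unit steps along each axis touch neighbouring low-order bits of the key.
keys = [morton3d(1, 0, 0), morton3d(0, 1, 0), morton3d(0, 0, 1)]  # [1, 2, 4]
```

Production code usually replaces the loop with magic-number bit spreading or a hardware bit-deposit instruction; the loop form is kept here because it mirrors the definition directly.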
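In 3D CNN pipelines, perfect (nearly minimal) spatial hashing is used instead of Z-order keys. The hash is a two-level function of the voxel position $p$:

$$h(p) = \left( h_0(p) + \Phi[h_1(p)] \right) \bmod \bar{m}$$

where $h_0(p) = p \bmod \bar{m}$ is the local grid hash into a hash table of side $\bar{m}$, $h_1(p) = p \bmod \bar{r}$ is the offset-table hash, and $\Phi$ is an offset table of side $\bar{r}$, producing nearly-minimal perfect hashes for all occupied voxels (Shao et al., 2018). A position tag array stores the original coordinates for each slot, so false positives are eliminated by explicit coordinate matching.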
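A lookup through such a two-level hash, including the position-tag check, might look like the sketch below. It assumes the common perfect-spatial-hashing form in which the slot is `(p mod m + offsets[p mod r]) mod m` per axis; the array layout, the `EMPTY` sentinel, and the function names are illustrative assumptions, and the toy tables are built by hand rather than by a real construction pass.

```python
import numpy as np

EMPTY = -1  # assumed sentinel for unused hash slots


def psh_slot(p, offsets, m):
    """Two-level hash: (p mod m + offsets[p mod r]) mod m, applied per axis."""
    p = np.asarray(p)
    r = offsets.shape[0]
    phi = offsets[tuple(p % r)]
    return tuple((p % m + phi) % m)


def lookup(p, hash_table, offsets, tags):
    """Return the data index stored for voxel p, or None if p is absent."""
    slot = psh_slot(p, offsets, hash_table.shape[0])
    idx = hash_table[slot]
    if idx == EMPTY or not np.array_equal(tags[slot], p):
        return None  # empty slot, or a different voxel owns this slot
    return int(idx)


# Hand-built toy tables: m = 2, r = 1 (a single zero offset), two voxels.
m = 2
offsets = np.zeros((1, 1, 1, 3), dtype=np.int64)
hash_table = np.full((m, m, m), EMPTY, dtype=np.int64)
tags = np.zeros((m, m, m, 3), dtype=np.int64)
for data_idx, voxel in enumerate([(0, 0, 0), (1, 0, 0)]):
    slot = psh_slot(voxel, offsets, m)
    hash_table[slot] = data_idx
    tags[slot] = voxel

hit = lookup((1, 0, 0), hash_table, offsets, tags)   # stored voxel
miss = lookup((2, 0, 0), hash_table, offsets, tags)  # collides with (0, 0, 0)
```

The second query lands in an occupied slot, and only the tag comparison reveals that the slot belongs to a different voxel.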
3. Data Structures and Update Mechanisms
Pre-computed Surfel Aggregation for Geometry
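In the LIO context, each L0 cell is updated in constant time by folding a new point $p$ into its running centroid $c$, where $N$ is the number of points already in the cell:

$$c \leftarrow c + \frac{p - c}{N + 1}, \qquad N \leftarrow N + 1$$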
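The constant-time centroid update can be sketched as an incremental mean. The class and attribute names below are hypothetical; only the running-average arithmetic is standard.

```python
class CentroidCell:
    """Hypothetical L0 cell that keeps a running centroid and a point count."""

    def __init__(self):
        self.count = 0
        self.centroid = [0.0, 0.0, 0.0]

    def add(self, point):
        """Fold one point into the centroid without storing the point."""
        self.count += 1
        self.centroid = [
            c + (p - c) / self.count for c, p in zip(self.centroid, point)
        ]


cell = CentroidCell()
cell.add((0.0, 0.0, 0.0))
cell.add((2.0, 4.0, 6.0))
# cell.centroid is now [1.0, 2.0, 3.0]
```

Each insertion costs a fixed number of floating-point operations regardless of how many points the cell has absorbed.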
L1 surfels aggregate the centroids of all occupied L0 children:
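- Mean centroid: $\bar{c} = \frac{1}{K} \sum_{k=1}^{K} c_k$, taken over the $K$ occupied L0 children with centroids $c_k$
- Covariance: $\Sigma = \frac{1}{K} \sum_{k=1}^{K} (c_k - \bar{c})(c_k - \bar{c})^\top$
- Normal: $\mathbf{n}$, the eigenvector corresponding to the smallest eigenvalue of $\Sigma$ (from PCA)
- Planarity score: a scalar measure of how well the child centroids fit a plane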
L1 updates are "lazy": on modification of any child, a parent is marked dirty and recomputed once per map update, substantially amortizing update cost (Choi et al., 3 Dec 2025).
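The aggregation and the lazy refresh can be sketched together. This is an illustration under assumptions: the covariance is normalized by the number of children, a minimum of three children is required for a fit, and the class layout is hypothetical. The raw eigenvalues are returned instead of a planarity score, since the score's exact formula is not reproduced here.

```python
import numpy as np


def fit_surfel(centroids):
    """PCA plane fit: returns mean, covariance, normal, ascending eigenvalues."""
    pts = np.asarray(centroids, dtype=float)
    mean = pts.mean(axis=0)
    diff = pts - mean
    cov = diff.T @ diff / len(pts)
    evals, evecs = np.linalg.eigh(cov)  # eigh sorts eigenvalues ascending
    return mean, cov, evecs[:, 0], evals


class L1Cell:
    """Hypothetical L1 cell that defers the PCA until refresh() is called."""

    def __init__(self):
        self.children = {}  # L0 index -> current L0 centroid
        self.dirty = False
        self.surfel = None

    def set_child(self, idx0, centroid):
        self.children[idx0] = centroid
        self.dirty = True  # mark only; no eigen-decomposition here

    def refresh(self, min_children=3):
        """Recompute at most once per map update, however many children moved."""
        if not self.dirty:
            return
        if len(self.children) >= min_children:
            self.surfel = fit_surfel(list(self.children.values()))
        else:
            self.surfel = None
        self.dirty = False


cell = L1Cell()
for i, c in enumerate([(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]):
    cell.set_child((i, 0, 0), c)
cell.refresh()  # one PCA for four child updates
```

For the four coplanar centroids above, the fitted normal is parallel to the z-axis and the smallest eigenvalue is zero up to rounding.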
Hierarchical Hashing for Sparse CNNs
Each level of the hVox hierarchy uses paired arrays:
- Data array: feature vectors of the non-empty voxels
- Hash table: indices into the data array, with a reserved value marking empty slots
- Offset table: hash offsets
- Position tag: the original voxel position for each slot (Shao et al., 2018)
A minimal perfect hash is constructed by iteratively assigning each non-empty voxel to a slot with the above hash, and resolving collisions by offset-table reassignments. False positives are rejected by coordinate tag checks.
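One simple way to realize this construction is a greedy search over offsets, sketched below. It is not the cited implementation: it assumes the slot formula `(p mod m + offset[p mod r]) mod m` per axis, uses dictionaries in place of dense arrays, and tries offsets exhaustively, which is only practical for small tables. The names `build_psh` and `psh_lookup` are hypothetical.

```python
import itertools
from collections import defaultdict


def build_psh(voxels, m, r):
    """Greedy perfect-hash construction.

    Returns (slots, offsets): `slots` maps hash slot -> owning voxel (the
    position tag), `offsets` maps offset-table cell -> chosen 3D offset.
    """
    buckets = defaultdict(list)
    for p in voxels:
        buckets[tuple(c % r for c in p)].append(p)

    slots, offsets = {}, {}
    # Place the most crowded offset cells first; they are the hardest to fit.
    for cell, group in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
        for off in itertools.product(range(m), repeat=3):
            cand = [tuple((c % m + o) % m for c, o in zip(p, off)) for p in group]
            if len(set(cand)) == len(cand) and not any(s in slots for s in cand):
                offsets[cell] = off
                slots.update(zip(cand, group))
                break
        else:
            raise ValueError("no collision-free offset; enlarge m or r")
    return slots, offsets


def psh_lookup(p, slots, offsets, m, r):
    """True only if p itself is stored; the tag comparison rejects impostors."""
    off = offsets.get(tuple(c % r for c in p))
    if off is None:
        return False
    slot = tuple((c % m + o) % m for c, o in zip(p, off))
    return slots.get(slot) == p


voxels = [(0, 0, 0), (4, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)]
slots, offsets = build_psh(voxels, m=3, r=4)
```

The query `(12, 0, 0)` hashes to the same slot as `(0, 0, 0)` in this example, so it is the tag comparison in `psh_lookup` that reports it as absent.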
4. Query and Convolution Operations
O(1) Correspondence and Geometric Queries
To retrieve a surfel for a query point:
- Quantize the point to its L1 voxel index (the floor of each coordinate divided by the L1 edge length)
- Compute its Morton code
- Hash-lookup the pre-computed surfel in O(1) time
If the surfel exists and its planarity passes the acceptance threshold, the surfel is returned; otherwise, the cell is treated as empty. This removes the need for neighbor enumeration or runtime surface fitting in LIO, enabling constant-time data association and point-to-plane residual evaluation (Choi et al., 3 Dec 2025).
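The three query steps and the resulting point-to-plane residual can be sketched as follows. A Python `dict` stands in for the hash table, and the surfel field names, the index bias, and the `min_planarity` threshold are assumptions made for illustration rather than details of the cited system.

```python
import math

BITS = 21
BIAS = 1 << (BITS - 1)  # assumed bias so negative indices map to valid keys


def morton3d(x, y, z, bits=BITS):
    """Interleave the low `bits` bits of three non-negative integers."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def l1_key(idx):
    return morton3d(idx[0] + BIAS, idx[1] + BIAS, idx[2] + BIAS)


def query_surfel(point, surfel_map, l1_size, min_planarity):
    """Quantize, encode, look up once; reject missing or non-planar cells."""
    idx = tuple(math.floor(c / l1_size) for c in point)
    surfel = surfel_map.get(l1_key(idx))
    if surfel is None or surfel["planarity"] < min_planarity:
        return None
    return surfel


def point_to_plane(point, surfel):
    """Signed distance from the point to the surfel plane."""
    return sum(
        n * (p - c) for n, p, c in zip(surfel["normal"], point, surfel["mean"])
    )


surfel_map = {
    l1_key((0, 0, 0)): {
        "mean": (0.5, 0.5, 0.0),
        "normal": (0.0, 0.0, 1.0),
        "planarity": 0.9,
    }
}
match = query_surfel((0.2, 0.3, 0.1), surfel_map, l1_size=1.0, min_planarity=0.5)
residual = point_to_plane((0.2, 0.3, 0.1), match)
```

No neighbouring cells are visited and no plane is fitted at query time; the only per-point work is one quantization, one key computation, and one table probe.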
Sparse Hash-based 3D Convolutions
For 3D CNNs, hVox enables sparse convolutions via two GPU kernels:
- hash2col: Gathers the spatial neighbors covered by the kernel for each non-empty output voxel into a column-major matrix suitable for batched matrix multiplication (GEMM).
- col2hash: Scatters gradients from the convolution output back into the hashed data array on the backward pass.
This enables convolution, normalization, activation, and pooling directly on only the active voxels, leading to high memory efficiency and computational parallelism. Per-layer cost therefore scales with the number of active voxels in the batch rather than with the dense grid volume (Shao et al., 2018).
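The gather-then-GEMM pattern can be shown on the CPU with NumPy. This is only a sketch of the idea: a Python `dict` replaces the hashed lookup, rows rather than columns hold each voxel's neighborhood, the output is defined on the same active voxels as the input, and the function signatures are hypothetical. The backward scatter (col2hash) is omitted.

```python
import itertools
import numpy as np


def hash2col(coords, feats, k=3):
    """Gather the k^3 neighbor features of every active voxel into one row.

    coords: (N, 3) integer voxel positions; feats: (N, C) features.
    Neighbors that are not active contribute zeros.
    """
    index = {tuple(c): i for i, c in enumerate(coords.tolist())}
    half = k // 2
    shifts = list(itertools.product(range(-half, half + 1), repeat=3))
    n, c = feats.shape
    cols = np.zeros((n, len(shifts) * c), dtype=feats.dtype)
    for i, p in enumerate(coords.tolist()):
        for j, s in enumerate(shifts):
            q = index.get((p[0] + s[0], p[1] + s[1], p[2] + s[2]))
            if q is not None:
                cols[i, j * c:(j + 1) * c] = feats[q]
    return cols


def sparse_conv(coords, feats, weight, k=3):
    """weight: (k^3 * C_in, C_out). One dense GEMM over the gathered rows."""
    return hash2col(coords, feats, k) @ weight


coords = np.array([[0, 0, 0], [1, 0, 0]])
feats = np.array([[1.0], [2.0]])
box = np.ones((27, 1))  # all-ones 3x3x3 kernel sums the active neighbors
out = sparse_conv(coords, feats, box)
```

With the all-ones kernel, each of the two adjacent voxels sums its own feature and its neighbor's, so both outputs equal 3.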
5. Memory, Computational Complexity, and Performance
Theoretical Complexity
| Operation | hVox (Surfel-LIO) | Flat voxel hash (iVox) | ikd-Tree (Fast-LIO2) |
|---|---|---|---|
| Voxel insertion / centroid update | $O(1)$ | $O(1)$ | $O(\log n)$ (rebalance) |
| Surfel update | $O(1)$ amortized | – | – |
| Query (per point, mapping/correspondence) | $O(1)$ | neighbor-voxel lookups + runtime plane fit | $O(\log n)$ search + runtime plane fit |
Memory grows with the number of occupied voxels per level. In geometric applications, each L0 cell stores one centroid and each L1 cell stores one surfel; total storage is thus at most linear in point count and comparable to flat hashing. Unlike kd-trees, no rebalancing is required (Choi et al., 3 Dec 2025). For 3D CNNs, hVox memory is governed by the non-empty voxels and the hash tables at a given grid resolution, which compares favorably with typical octree-based grids and with dense grids, whose cost grows cubically with resolution (Shao et al., 2018).
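A small synthetic experiment shows why storing only occupied voxels pays off for surface data. The sphere, the sample count, and the resolutions below are arbitrary choices for illustration and are unrelated to the benchmarks in the cited papers.

```python
import numpy as np


def occupied_voxels(points, n):
    """Count distinct voxels hit by points in [0, 1)^3 at resolution n^3."""
    idx = np.clip(np.floor(points * n).astype(np.int64), 0, n - 1)
    return len(np.unique(idx, axis=0))


# Synthetic surface: random samples on a sphere inside the unit cube.
rng = np.random.default_rng(0)
d = rng.normal(size=(200_000, 3))
sphere = 0.5 + 0.45 * d / np.linalg.norm(d, axis=1, keepdims=True)

for n in (16, 32, 64):
    occ = occupied_voxels(sphere, n)
    # Occupied count versus dense cell count, and the occupied fraction.
    print(n, occ, n ** 3, occ / n ** 3)
```

The occupied fraction shrinks as the resolution rises, because a surface fills roughly a two-dimensional sheet of cells while the dense grid grows with the cube of the resolution.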
Empirical Performance
LiDAR-Inertial Odometry:
- On M3DGR (Livox AVIA, Mid-360), Surfel-LIO (hVox) achieves 531/690 FPS vs. 125/282 FPS (Fast-LIO2) and 184/353 FPS (Faster-LIO) with comparable or better state estimation accuracy (APE ≈ 0.36 m).
- Per-point nearest-neighbor and plane costs: 0.05 μs/pt and 0.01 μs/pt (vs. 1–3 μs/pt and 0.2–0.6 μs/pt for baselines) (Choi et al., 3 Dec 2025).
3D CNN Applications:
- At the lower reported resolution: OCNN uses ≈6080 MB, hVox ≈2187 MB (roughly 1/3 of the memory footprint).
- At the higher reported resolution: OCNN runs out of memory (OOM), hVox uses ≈4510 MB.
- Forward-backward per-iteration time is reduced by ≈10% over OCNN, with matching or slightly better classification/mAP/IoU:
  - ModelNet40: hVox up to 90.2% accuracy, OCNN up to 90.2%, FullVox OOM.
  - ShapeNet55 mAP: hVox up to 0.878, OCNN up to 0.875.
  - ShapeNet part segmentation: hVox outperforms or matches OCNN across tested categories, especially at high resolution (Shao et al., 2018).
6. Best Practices and Integration Guidelines
- For geometry/LIO: precompute surfel statistics and aggressively use lazy update mechanisms to amortize expensive aggregation.
- For 3D CNN pipelines:
  - Precompute all per-model, per-level hash tables and offset arrays, typically offline (CPU or GPU).
  - During inference/training, concatenate per-model hashes to form a batch "super-hash" for efficient batched convolution.
  - At each level, apply convolution → BN → ReLU → pooling, with the kernel size and channel count set per level; the levels typically span coarse through fine resolutions.
  - Use hash2col/col2hash kernels for neighborhood matrix assembly and error backpropagation.
  - Validate hash table lookups using position tags to eliminate false matches.
  - To avoid OOM at ultra-high resolutions, prefer hVox over dense grids and standard octree-based methods (Shao et al., 2018).
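The batching step in the guidelines above can be sketched as a concatenation with index shifting. The layout is an assumption for illustration: each model contributes a data array and a hash table of indices into it, with `EMPTY` marking unused slots. The cited work may organize its batch differently.

```python
import numpy as np

EMPTY = -1  # assumed sentinel for unused hash slots


def concat_batch(models):
    """Merge per-model (data, hash_table) pairs into one batch.

    Hash entries are shifted so they still point at the right rows of the
    concatenated data array; a per-row model id records each row's origin.
    """
    data_parts, hash_parts, model_ids = [], [], []
    base = 0
    for b, (data, table) in enumerate(models):
        flat = table.ravel()
        hash_parts.append(np.where(flat == EMPTY, EMPTY, flat + base))
        data_parts.append(data)
        model_ids.append(np.full(len(data), b))
        base += len(data)
    return (
        np.concatenate(data_parts),
        np.concatenate(hash_parts),
        np.concatenate(model_ids),
    )


model_a = (np.zeros((2, 4)), np.array([0, EMPTY, 1]))
model_b = (np.ones((3, 4)), np.array([2, 0, EMPTY, 1]))
data, table, ids = concat_batch([model_a, model_b])
```

After merging, one kernel launch can process every active voxel in the batch, and the model id array is available wherever results have to be split per shape again.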
A plausible implication is that hVox’s architectural principles (hierarchical, hashed, occupancy-driven storage) are adaptable to other spatially sparse 3D learning and mapping pipelines that require efficient, high-resolution access to geometric or semantic aggregates.
7. Comparative Summary and Application Scope
Hierarchical Voxel Hashing unifies the strengths of sparse occupancy representation, adaptive spatial aggregation, and high-throughput data access. In geometric SLAM and odometry, it supports O(1) correspondence and model update, achieving state-of-the-art throughput without loss of estimation accuracy (Choi et al., 3 Dec 2025). In spatial deep learning, it provides an efficient substrate for convolution, batch normalization, and pooling layers in 3D CNNs, with empirical and theoretical memory and runtime advantages over dense and octree-based alternatives at high resolution (Shao et al., 2018).
Its use cases include LiDAR point cloud registration, real-time state estimation, 3D shape classification/retrieval, and part segmentation. For the latter two, hVox pipelines are reported to need roughly 1/3 of the memory and to run ≈10% faster than comparable octree-based networks, while achieving equal or better task metrics at high resolution on 8 GB GPUs. This suggests broad suitability for 3D vision, robotics, and shape analysis workloads where memory footprint and high-resolution access are primary constraints.