
Voxelized LiDAR Data

Updated 19 November 2025
  • Voxelization of LiDAR data is the process of discretizing unstructured point clouds into regular, grid-based voxels for efficient computation.
  • This representation underpins key applications including 3D object detection, robust tracking, and probabilistic mapping in robotics and autonomous driving.
  • Advanced voxelization methodologies and sparse convolutional operations enable scalable multi-scale feature aggregation and precise spatial analysis.

Voxelized LiDAR data refers to both the process and the resulting representation of transforming continuous, irregular, and typically sparse LiDAR point clouds into structured, discrete volumetric units called voxels. This grid-based representation underpins a vast array of modern algorithms for perception, mapping, tracking, and compression in robotics, autonomous driving, and remote sensing. Voxelization systematically organizes unordered point sets, enabling efficient parallel processing, neighborhood definition, feature aggregation, and integration into learning-based and probabilistic frameworks.

1. Fundamentals of Voxelization in LiDAR Processing

LiDAR sensors yield unstructured 3D point clouds, where each point encodes a spatial location and, optionally, attributes such as intensity. Voxelization discretizes this space into a regular lattice, with each cell (voxel) characterized by fixed spatial (and occasionally temporal) resolution:

  • In standard Cartesian voxelization, points (x_i, y_i, z_i) are mapped to integer grid indices (i, j, k) = (\lfloor (x_i - x_{\min})/\Delta_x \rfloor, \lfloor (y_i - y_{\min})/\Delta_y \rfloor, \lfloor (z_i - z_{\min})/\Delta_z \rfloor) covering a scene range [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}] \times [z_{\min}, z_{\max}] (Xu et al., 2020, Murhij et al., 2023).
  • Sparse data structures (e.g., hash tables, coordinate lists) track only non-empty voxels to preserve computational and memory efficiency (Mersch et al., 2022, Murhij et al., 2023).
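Taken together, the index mapping and the sparse hash structure above can be sketched in a few lines (a minimal illustration rather than any particular paper's implementation; the per-axis cell sizes and scene origin are assumed inputs):

```python
import math

def voxelize(points, origin, voxel_size):
    """Map points to integer voxel indices and keep only non-empty cells
    in a hash map (dict), as a minimal sketch of sparse Cartesian
    voxelization. `points` is a list of (x, y, z) tuples, `origin` is
    (x_min, y_min, z_min), `voxel_size` is (dx, dy, dz)."""
    grid = {}  # (i, j, k) -> list of points falling in that voxel
    for p in points:
        key = tuple(math.floor((p[d] - origin[d]) / voxel_size[d]) for d in range(3))
        grid.setdefault(key, []).append(p)
    return grid

pts = [(0.12, 0.03, 0.9), (0.14, 0.06, 0.95), (1.3, 0.0, 0.1)]
grid = voxelize(pts, origin=(0.0, 0.0, 0.0), voxel_size=(0.2, 0.2, 0.2))
print(sorted(grid))  # -> [(0, 0, 4), (6, 0, 0)]
```

Only the occupied cells are materialized, so memory scales with the number of non-empty voxels rather than with the full scene volume.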

Variants such as cylindrical and 4D voxelization further adapt the lattice to LiDAR spatial statistics and temporal evolution, e.g., (r, \theta, h) bins for compression (Sridhara et al., 2021) or adding a time axis for spatio-temporal segmentation (Mersch et al., 2022).

The voxel grid serves as the substrate for both conventional algorithms (surface reconstruction, occupancy mapping, GICP registration) and deep learning models (sparse convolutions, feature pyramids, detection heads), offering a flexible unifying abstraction.

2. Voxelization Methodologies and Representational Choices

2.1 Cartesian and Cylindrical Voxelization

Classical voxelization utilizes a regular x–y–z grid and fixed cell size. However, the non-uniform spatial sampling inherent to rotating LiDARs (circular in x–y, denser close to the sensor, sparser at distance) renders uniform partitioning sub-optimal for certain tasks. The cylindrical voxelization paradigm remaps each point to (r, \theta, h), with voxel cell indices (\lfloor r/\Delta r\rfloor, \lfloor \theta/\Delta \theta\rfloor, \lfloor h/\Delta h\rfloor), thereby capturing spatial density variation and matching scan geometry. This approach leads to more balanced, storage-efficient octrees and reduced bitrate for both geometry and attributes, achieving 35–45% reduction in geometric encoding cost and up to 10% better attribute coding (Sridhara et al., 2021).
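A minimal sketch of the cylindrical remapping (the bin sizes here are illustrative assumptions, not the values used by Sridhara et al.):

```python
import math

def cylindrical_voxel_index(x, y, z, dr, dtheta, dh):
    """Map a Cartesian point to cylindrical voxel indices (r, theta, h).
    theta is folded into [0, 2*pi) so the angular index is non-negative."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2.0 * math.pi)
    return (math.floor(r / dr), math.floor(theta / dtheta), math.floor(z / dh))

# A point 10 m ahead and 10 m left of the sensor, 1 m up,
# with assumed bin sizes dr = 0.5 m, dtheta = 0.1 rad, dh = 0.25 m:
idx = cylindrical_voxel_index(10.0, 10.0, 1.0, dr=0.5, dtheta=0.1, dh=0.25)
print(idx)  # -> (28, 7, 4)
```

Because the radial bins subtend a larger physical footprint at long range, cell occupancy stays more uniform than on a Cartesian grid of comparable size.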

2.2 Dynamic, Sparse, and Reconfigurable Voxelization

  • Dynamic Voxelization encodes only occupied voxels and bypasses costly fixed-size padding and point truncation (Murhij et al., 2023). Each raw point is assigned a voxel, and scatter/gather primitives group and encode features efficiently, enabling O(N) time complexity.
  • Sparse Convolutional Lattices limit computation to occupied voxels, with neighborhood aggregation performed through hash-based coordinate look-ups (Mersch et al., 2022, Xu et al., 2020).
  • Adaptive/Reconfigurable Voxels adjust neighborhood definitions via random walk–based processes to counter the sparsity and variable density of LiDAR point clouds, increasing detection robustness for small or distant objects (Wang et al., 2020).
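The dynamic-voxelization idea, in which every point keeps its voxel assignment and no per-voxel capacity is imposed, can be sketched as a single O(N) pass (a simplified illustration; production pipelines run the scatter on GPU):

```python
def dynamic_voxelize(points, voxel_size):
    """Minimal sketch of dynamic voxelization: every point keeps its voxel
    assignment, with no fixed per-voxel capacity, padding, or truncation.
    Returns (coords, point_to_voxel), both built in a single O(N) pass."""
    coords = []            # unique occupied voxel coordinates, first-seen order
    index_of = {}          # hash map: voxel coordinate -> row in `coords`
    point_to_voxel = []    # scatter indices: point i -> row in `coords`
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        if key not in index_of:
            index_of[key] = len(coords)
            coords.append(key)
        point_to_voxel.append(index_of[key])
    return coords, point_to_voxel

pts = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.05), (1.0, 0.0, 0.0)]
coords, p2v = dynamic_voxelize(pts, voxel_size=0.5)
print(coords)  # -> [(0, 0, 0), (2, 0, 0)]
print(p2v)     # -> [0, 0, 1]
```

The scatter indices let later stages gather or pool per-voxel features without ever materializing a fixed-size point buffer per cell.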

3. Voxel Feature Encoding and Multi-Scale Feature Aggregation

Voxel-based pipelines typically summarize the points in each voxel with mean or max-pooling operations over geometric (coordinates, height, distance to ground), intensity, or normal features (Xu et al., 2020, Hu et al., 2022, Wang et al., 2019), sometimes passing them through point-wise MLPs before aggregation (Zhong et al., 2021, Song et al., 5 Jan 2024). This produces a per-voxel feature vector, often with dimensions ranging from 4 (xyz+intensity) to several hundred (after CNN encoding).
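A minimal sketch of such per-voxel encoding, mean-pooling (x, y, z, intensity) over the points gathered into each voxel (the 4-channel layout is the simple case mentioned above; the scatter indices are assumed to come from a prior voxelization step):

```python
def encode_voxel_features(points, point_to_voxel, n_voxels):
    """Sketch of per-voxel feature encoding: mean-pool (x, y, z, intensity)
    over the points scattered into each voxel. `point_to_voxel` maps each
    point to its voxel row; every voxel is assumed non-empty."""
    sums = [[0.0, 0.0, 0.0, 0.0] for _ in range(n_voxels)]
    counts = [0] * n_voxels
    for feat, v in zip(points, point_to_voxel):
        for d in range(4):
            sums[v][d] += feat[d]
        counts[v] += 1
    return [[s / counts[v] for s in sums[v]] for v in range(n_voxels)]

# Two points land in voxel 0, one in voxel 1; last channel is intensity.
pts = [(0.0, 0.0, 0.0, 0.25), (1.0, 1.0, 0.0, 0.75), (5.0, 5.0, 1.0, 0.9)]
feats = encode_voxel_features(pts, [0, 0, 1], n_voxels=2)
print(feats)  # -> [[0.5, 0.5, 0.0, 0.5], [5.0, 5.0, 1.0, 0.9]]
```

Swapping the running sum for a running maximum yields max-pooling; inserting a point-wise MLP before the pooling recovers the learned variants cited above.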

Multi-scale voxelization (e.g., Voxel-FPN) establishes feature pyramids by aggregating features across increasing strides or voxel sizes, combining coarse semantic cues and fine spatial localization for 3D object detection (Wang et al., 2019). Pillar–voxel hybrid frameworks further combine 3D spatial and BEV features, exploiting bidirectional fusion for improved vertical context and computational efficiency (Huang et al., 2023).
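One level of such a pyramid can be sketched by coarsening voxel coordinates by a stride and max-pooling child features (a simplified illustration of the aggregation idea, not Voxel-FPN's exact architecture):

```python
def downsample_voxels(voxel_feats, stride=2):
    """Sketch of one level of a multi-scale voxel pyramid: coarsen the grid
    by `stride` and max-pool the features of child voxels that merge into
    the same parent. `voxel_feats` maps (i, j, k) -> feature list."""
    coarse = {}
    for (i, j, k), f in voxel_feats.items():
        key = (i // stride, j // stride, k // stride)
        if key in coarse:
            coarse[key] = [max(a, b) for a, b in zip(coarse[key], f)]
        else:
            coarse[key] = list(f)
    return coarse

fine = {(0, 0, 0): [1.0, 0.0], (1, 1, 0): [0.0, 2.0], (4, 0, 0): [3.0, 3.0]}
print(downsample_voxels(fine))  # -> {(0, 0, 0): [1.0, 2.0], (2, 0, 0): [3.0, 3.0]}
```

Repeating this per level and fusing the resulting feature maps top-down gives the coarse-semantic / fine-localization combination described above.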

4. Sparse and High-Dimensional Convolutional Operations

Sparse convolutional neural networks (SparseConvNet, Minkowski Engine) enable feature extraction from voxelized LiDAR by restricting kernel applications to non-empty voxels, maintaining computational tractability despite the high dimensionality of the grid (Mersch et al., 2022, Xu et al., 2020, Kreutz et al., 21 Oct 2024).

  • 4D Sparse Convolutions extend this principle to spatio-temporal (space + time) grids, capturing motion cues for moving object segmentation (Mersch et al., 2022).
  • Submanifold Sparse Convolutions preserve the location of active voxels, avoiding the "dilation" problem and ensuring spatial detail retention (Xu et al., 2020, Kreutz et al., 21 Oct 2024).
  • Hierarchical fusion mechanisms (e.g., dual-stream or cross-iterative fusion in VoxelTrack) leverage voxel features at multiple resolutions/stages, boosting temporal and spatial modeling fidelity (Lu et al., 5 Aug 2024).
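The submanifold restriction, computing outputs only at already-active sites so the active set never dilates, can be sketched with a coordinate hash map (a scalar-feature toy, not the Minkowski Engine or SparseConvNet API):

```python
def submanifold_conv(feats, weights):
    """Sketch of a submanifold sparse convolution on a voxel hash map:
    outputs are computed only at voxels that are already active, and each
    gathers neighbors through coordinate look-ups, so the set of active
    sites never grows. `feats` maps (i, j, k) -> scalar feature;
    `weights` maps a kernel offset (di, dj, dk) -> weight."""
    out = {}
    for (i, j, k) in feats:                      # active sites only
        acc = 0.0
        for (di, dj, dk), w in weights.items():  # kernel offsets
            nb = feats.get((i + di, j + dj, k + dk))
            if nb is not None:                   # skip empty neighbors
                acc += w * nb
        out[(i, j, k)] = acc
    return out

feats = {(0, 0, 0): 1.0, (1, 0, 0): 2.0}
weights = {(0, 0, 0): 1.0, (1, 0, 0): 0.5, (-1, 0, 0): 0.5}
print(submanifold_conv(feats, weights))  # -> {(0, 0, 0): 2.0, (1, 0, 0): 2.5}
```

A regular (non-submanifold) sparse convolution would instead also emit outputs at empty neighbors of active voxels, which is exactly the dilation the submanifold variant avoids.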

5. Probabilistic and Semantic Voxel Maps

Voxelized representation is foundational for probabilistic occupancy grids, semantic mapping, and dense 3D modeling:

  • Bayesian Occupancy/Label Maps: Voxelized octrees (e.g., OctoMap) maintain log-odds occupancy and semantic class probabilities per cell, recursively updated via Bayesian filtering as new labeled LiDAR scans arrive (Berrio et al., 2020).
  • Probabilistic Voxel Mapping in SLAM: Compact, point-free voxel representations store only sufficient statistics (centroid, scatter matrices, coplanarity features) to incrementally estimate local plane parameters and their uncertainty (covariance) (Yang et al., 3 Jun 2024, Yuan et al., 2023). These can be merged (via locality-sensitive hashing or union-find) to denoise and compress coplanar surfaces efficiently.
  • Surface Reconstruction: Volumetric TSDF representations use per-voxel signed distance and confidence weights, with adaptive Gaussian kernels for neighborhood estimation, yielding accurate isosurfaces via Marching Cubes (Roldão et al., 2019).

Semantic labeling can be handled with per-voxel class distributions (Bayesian update), hybrid point-voxel implicit function decoders (Zhong et al., 2021), or fusion with camera information via voxel-level self-attention across pixel and patch projections (Song et al., 5 Jan 2024).
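The Bayesian log-odds update behind such occupancy maps can be sketched as follows (an OctoMap-style toy; the clamping bounds and inverse sensor model probability are illustrative assumptions):

```python
import math

def logodds(p):
    """Convert a probability to its log-odds."""
    return math.log(p / (1.0 - p))

def update_voxel(grid, key, p_meas, l_min=-2.0, l_max=3.5):
    """Bayesian log-odds occupancy update: each measurement adds its
    log-odds to the voxel's stored value, clamped to [l_min, l_max].
    `p_meas` is the inverse sensor model probability for this observation."""
    l = grid.get(key, 0.0) + logodds(p_meas)
    grid[key] = max(l_min, min(l_max, l))

def occupancy(grid, key):
    """Recover the occupancy probability from the stored log-odds."""
    l = grid.get(key, 0.0)
    return 1.0 / (1.0 + math.exp(-l))

grid = {}
for _ in range(3):  # three consecutive hits on the same voxel
    update_voxel(grid, (4, 2, 0), p_meas=0.7)
print(round(occupancy(grid, (4, 2, 0)), 3))  # -> 0.927
```

Storing log-odds makes the per-scan update a clamped addition, and the same scheme extends to per-voxel semantic class distributions by keeping one accumulator per class.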

6. Applications: Detection, Tracking, Segmentation, and Mapping

Voxelized LiDAR data is the cornerstone for a broad range of downstream applications:

  • 3D Object Detection: Voxel grids enable efficient use of sparse 3D CNN backbones, sophisticated fusion modules, multi-scale and hybrid representations, supporting state-of-the-art detection accuracy and real-time performance (Murhij et al., 2023, Huang et al., 2023, Wang et al., 2019).
  • Tracking: Voxel-based SOT (single-object tracking) frameworks (VoxelTrack) process both spatial structure and cross-temporal context, outperforming point-based methods especially in density-varying and spatially-complex scenes (Lu et al., 5 Aug 2024).
  • Segmentation and Panoptic Tasks: Voxelized representations facilitate semantic, instance, and panoptic segmentation; hybrid explicit–implicit networks maintain high accuracy at substantially lower runtime (Zhong et al., 2021).
  • Mapping and SLAM: Probabilistic and mergeable voxel maps provide robust foundations for odometry, loop closure, and globally consistent mapping in large-scale and dynamic environments, with tight LiDAR–IMU coupling and GPU-accelerated pipelines (Koide et al., 2022, Yang et al., 3 Jun 2024, Yuan et al., 2023).

7. Limitations and Trade-offs

Key design choices impact memory usage, runtime, and task performance:

  • Voxel Resolution: Higher resolution increases spatial granularity but at cubic cost to memory and compute. Practical values lie in the 0.05–0.2 m range, with vertical resolution often chosen coarser than horizontal to balance LiDAR scan characteristics and object scale (Xu et al., 2020, Zhong et al., 2021).
  • Feature Aggregation: Mean pooling efficiently summarizes voxels with many points, but can smooth out small obstacles or outliers. Max-pooling and point-wise MLPs can partially alleviate these effects (Zhong et al., 2021, Huang et al., 2023).
  • Sparse Representation Overhead: Maintaining sparse indices, hash-tables, and kernel maps incurs overhead, but is strongly outweighed by the memory and compute savings, especially for large outdoor scenes (Mersch et al., 2022, Murhij et al., 2023).
  • Adaptation to Density Variability: Doppler LiDAR, occlusions, and point sparsity at distance necessitate advanced density compensation techniques (density-aware RoI pooling, point density encoding, or cylindrical partitioning) for robust model performance (Hu et al., 2022, Sridhara et al., 2021).
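The cubic memory cost of dense grids can be made concrete with a back-of-the-envelope calculation (the scene extent and 4 bytes per voxel are illustrative assumptions):

```python
def dense_voxels(extent, res):
    """Number of cells in a dense voxel grid covering `extent` (meters per
    axis) at isotropic resolution `res` (meters). Rounding absorbs
    floating-point division error at exact multiples."""
    nx, ny, nz = (round(e / res) for e in extent)
    return nx * ny * nz

# Assumed 100 m x 100 m x 10 m outdoor scene, one 32-bit feature per voxel:
extent = (100.0, 100.0, 10.0)
for res in (0.2, 0.1, 0.05):
    n = dense_voxels(extent, res)
    print(f"{res:4} m -> {n:>12,} voxels, {n * 4 / 1e9:5.2f} GB dense")
```

Halving the resolution multiplies the dense cell count by eight, which is precisely why the sparse structures of Section 2 are preferred for large outdoor scenes where well under 1% of cells are occupied.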

References

  • (Mersch et al., 2022) "Receding Moving Object Segmentation in 3D LiDAR Data Using Sparse 4D Convolutions"
  • (Sridhara et al., 2021) "Cylindrical coordinates for LiDAR point cloud compression"
  • (Xu et al., 2020) "SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks"
  • (Murhij et al., 2023) "Rethinking Voxelization and Classification for 3D Object Detection"
  • (Wang et al., 2020) "Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds"
  • (Hu et al., 2022) "Point Density-Aware Voxels for LiDAR 3D Object Detection"
  • (Roldão et al., 2019) "3D Surface Reconstruction from Voxel-based Lidar Data"
  • (Yang et al., 3 Jun 2024) "C^3P-VoxelMap: Compact, Cumulative and Coalescible Probabilistic Voxel Mapping"
  • (Yuan et al., 2023) "VoxelMap++: Mergeable Voxel Mapping Method for Online LiDAR(-inertial) Odometry"
  • (Lu et al., 5 Aug 2024) "VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking"
  • (Zhong et al., 2021) "VIN: Voxel-based Implicit Network for Joint 3D Object Detection and Segmentation for Lidars"
  • (Wang et al., 2019) "Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds"
  • (Berrio et al., 2020) "Camera-Lidar Integration: Probabilistic sensor fusion for semantic mapping"
  • (Fadadu et al., 2020) "Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving"
  • (Kreutz et al., 21 Oct 2024) "LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training"
  • (Koide et al., 2022) "Globally Consistent and Tightly Coupled 3D LiDAR Inertial Mapping"
  • (Huang et al., 2023) "Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection"
  • (Song et al., 5 Jan 2024) "VoxelNextFusion: A Simple, Unified and Effective Voxel Fusion Framework for Multi-Modal 3D Object Detection"