Plenoxels: Voxelized Radiance Fields
- Plenoxels are voxel-based radiance field representations that discretize the plenoptic function with spherical harmonics, ensuring efficient view synthesis.
- The method replaces continuous MLP encoding with explicit grid storage and trilinear interpolation, achieving up to 100× faster optimization while maintaining photorealistic quality.
- Extensions include HDR synthesis, real-time compression, and 6-DoF monocular pose estimation, demonstrating versatility across computer vision and graphics applications.
Plenoxels are a family of volumetric radiance field representations that parameterize the plenoptic function as a sparse or dense 3D grid with spherical harmonics, enabling high-fidelity view synthesis and efficient optimization without reliance on neural networks. This approach replaces the continuous MLP encoding of classic Neural Radiance Fields (NeRF) with explicit voxel-based storage and trilinear interpolation, which keeps the representation fully differentiable and makes gradient-based optimization fast while retaining photorealistic rendering quality. Plenoxels have been extended to specialized settings, including high-dynamic-range synthesis (HDR-Plenoxels), compression for real-time rendering (neural codebook augmentation), and 6-DoF monocular pose estimation pipelines, which together demonstrate the versatility and computational efficiency of this voxelized radiance field paradigm.
1. Fundamental Representation and Rendering Formalism
Plenoxels discretize the scene domain into regular 3D grids of voxels. At each grid vertex, two key quantities are stored: a scalar volumetric density (opacity) and a view-dependent radiance vector consisting of spherical harmonic (SH) coefficients. These SH coefficients efficiently encode outgoing radiance as a function of direction, capturing complex BRDFs and view-dependent lighting effects.
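To make the role of the SH coefficients concrete, the following is a minimal sketch of how a view-dependent color could be evaluated from degree-0 and degree-1 real spherical harmonics. The function names (`sh_basis_deg1`, `sh_to_rgb`), the restriction to degree 1, the sign convention of the basis, and the final clamp to [0, 1] are all illustrative assumptions rather than details taken from the cited works.

```python
import math

# Real spherical-harmonic normalization constants for degrees 0 and 1.
SH_C0 = 0.5 * math.sqrt(1.0 / math.pi)        # about 0.2821
SH_C1 = math.sqrt(3.0 / (4.0 * math.pi))      # about 0.4886


def sh_basis_deg1(direction):
    """Return the 4 real SH basis values (degrees 0 and 1) for a direction."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    # Assumption: basis ordered (Y00, Y1-1, Y10, Y11) without sign flips;
    # implementations differ in sign convention.
    return [SH_C0, SH_C1 * y, SH_C1 * z, SH_C1 * x]


def sh_to_rgb(sh_coeffs, direction):
    """Evaluate RGB for a viewing direction.

    sh_coeffs holds three lists (R, G, B) of 4 coefficients each.
    The clamp to [0, 1] is an illustrative choice, not taken from the source.
    """
    basis = sh_basis_deg1(direction)
    rgb = []
    for channel in sh_coeffs:
        value = sum(k * b for k, b in zip(channel, basis))
        rgb.append(min(1.0, max(0.0, value)))
    return rgb
```

With only the degree-0 coefficient set, the color is the same from every direction; the degree-1 terms add a smooth variation that depends on where the camera is looking from.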
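Rendering is achieved via classical volume rendering integrated along camera rays. For a ray $\mathbf{r}(t) = \mathbf{o} + t\,\mathbf{d}$ sampled at $N$ points with spacing $\delta_i$, the expected color is computed as:
$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - \exp(-\sigma_i \delta_i)\right) \mathbf{c}_i,$$
with accumulated transmittance
$$T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right).$$
Here, the density $\sigma_i$ and color $\mathbf{c}_i$ are interpolated from the eight cell corners surrounding each sample point. This discrete formulation is fully differentiable, enabling end-to-end optimization of the voxel grid parameters via standard gradient-based methods (Kolios et al., 2024).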
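The two steps described in this section, interpolating values from the eight surrounding corners and compositing samples along a ray, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the grid is a nested list indexed as `grid[i][j][k]`, coordinates are given in voxel units inside the grid, and the helper names `trilinear` and `composite` are hypothetical.

```python
import math


def trilinear(grid, x, y, z):
    """Trilinearly interpolate a scalar grid[i][j][k] at continuous (x, y, z).

    Assumes voxel-unit coordinates inside a grid with at least 2 vertices
    along each axis.
    """
    i0, j0, k0 = int(math.floor(x)), int(math.floor(y)), int(math.floor(z))
    # Clamp so the upper corner stays in bounds on the far faces.
    i0 = min(i0, len(grid) - 2)
    j0 = min(j0, len(grid[0]) - 2)
    k0 = min(k0, len(grid[0][0]) - 2)
    fx, fy, fz = x - i0, y - j0, z - k0
    value = 0.0
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                value += wx * wy * wz * grid[i0 + di][j0 + dj][k0 + dk]
    return value


def composite(sigmas, colors, deltas):
    """Composite per-sample densities and RGB colors along one ray.

    Returns the accumulated color and the transmittance left after the
    last sample.
    """
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        transmittance *= 1.0 - alpha             # equals exp(-sigma * delta)
    return rgb, transmittance
```

Updating the transmittance multiplicatively inside the loop is equivalent to exponentiating the running sum of density times spacing, and it avoids recomputing that sum for every sample.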
2. Optimization and Computational Efficiency
Plenoxels achieve significant computational efficiency by replacing costly neural architectures with explicit grid storage. All parameters (densities and SH coefficients) are directly optimized using gradient descent on a photometric reconstruction loss:
$$\mathcal{L}_{\text{recon}} = \frac{1}{|\mathcal{R}|} \sum_{\mathbf{r} \in \mathcal{R}} \left\| C(\mathbf{r}) - \hat{C}(\mathbf{r}) \right\|_2^2,$$
where $C(\mathbf{r})$ is the ground-truth color for ray $\mathbf{r}$, $\hat{C}(\mathbf{r})$ is the rendered color, and $\mathcal{R}$ is the batch of sampled rays. Trilinear interpolation maps continuous scene locations to the surrounding grid vertices, and regularization techniques, such as total variation (TV) loss on grid values, are employed to enforce spatial smoothness and encourage sparsity.
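The objective described above can be sketched as a mean squared photometric error plus a total variation term on the grid. This is an illustrative sketch only: the forward-difference form of the TV term, the weight `tv_weight`, and the function names are assumptions and not values reported in the cited works.

```python
def mse_loss(rendered, target):
    """Mean over rays of the squared RGB error between rendered and true colors."""
    total = 0.0
    for pred, truth in zip(rendered, target):
        total += sum((p - t) ** 2 for p, t in zip(pred, truth))
    return total / len(rendered)


def tv_loss(grid):
    """Total variation of a scalar grid[i][j][k] using forward differences.

    Differences that would reach outside the grid are treated as zero.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    total = 0.0
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                v = grid[i][j][k]
                dx = grid[i + 1][j][k] - v if i + 1 < nx else 0.0
                dy = grid[i][j + 1][k] - v if j + 1 < ny else 0.0
                dz = grid[i][j][k + 1] - v if k + 1 < nz else 0.0
                total += (dx * dx + dy * dy + dz * dz) ** 0.5
    return total / (nx * ny * nz)


def total_loss(rendered, target, density_grid, tv_weight=1e-3):
    """Photometric loss plus weighted TV; tv_weight is a hypothetical value."""
    return mse_loss(rendered, target) + tv_weight * tv_loss(density_grid)
```

A constant grid has zero TV, so the regularizer only penalizes spatial variation and leaves uniform regions untouched.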
Empirical results show Plenoxels can be optimized two orders of magnitude faster than NeRF with comparable visual quality, substantially accelerating both training and inference (Kolios et al., 2024).
3. Extensions: Compression, HDR, and Modular Enhancements
3.1 Compact Real-Time Radiance Fields
Explicit grid representations, while computationally efficient, incur high storage costs. Non-uniform compression strategies target this overhead by pruning and quantizing grid cells according to local energy (gradient magnitude). A per-cell compression factor, computed from the cell's local gradient energy, modulates the quantization: high-frequency cells are preserved at high bit-depth, while low-frequency cells are heavily quantized or pruned. Further, a compact neural codebook augments the grid to capture residual high-frequency details through soft codebook assignment and fast optimization. Substantial storage reductions over baseline Plenoxels have been reported, together with real-time rendering performance (e.g., 200 FPS on an RTX 2080 Ti) and improved fidelity (Li et al., 2023).
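The idea of spending more bits on high-energy cells and pruning low-energy ones can be illustrated with a toy rule. Everything specific here is hypothetical: the thresholds `prune_below` and `detail_above`, the choice of 8 versus 4 bits, and the function names are placeholders for illustration and do not reproduce the compression factor used by Li et al. (2023).

```python
def quantize(value, bits, lo=0.0, hi=1.0):
    """Uniformly quantize a value in [lo, hi] to 2**bits levels."""
    levels = (1 << bits) - 1
    clamped = min(hi, max(lo, value))
    step = round((clamped - lo) / (hi - lo) * levels)
    return lo + step * (hi - lo) / levels


def compress_cell(value, energy, prune_below=0.01, detail_above=0.5):
    """Hypothetical energy-driven rule for one grid cell.

    Cells with very low gradient energy are pruned (None), cells with
    high energy keep 8 bits, and the rest are stored with 4 bits.
    """
    if energy < prune_below:
        return None
    bits = 8 if energy >= detail_above else 4
    return quantize(value, bits)
```

The quantization error of a kept cell is bounded by half a quantization step, so the 8-bit branch preserves detail far more tightly than the 4-bit branch.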
3.2 High Dynamic Range Plenoxels
HDR-Plenoxels extend the voxel grid to represent HDR radiance fields from multi-view LDR images captured under diverse camera settings. A tone-mapping module is learned jointly with the grid and mirrors the in-camera ISP pipeline: it explicitly models per-image white-balance and exposure scaling, and represents the camera response function (CRF) as a piecewise-linear mapping. Loss functions regularize both the grid (TV, sparsity) and the smoothness of the CRF. HDR-Plenoxels can reconstruct high-fidelity, relightable HDR radiance fields while robustly handling exposure and color variations. Compared to NeRF-A, HDR-Plenoxels achieve equivalent PSNR (27–31 dB synthetic, 28–33 dB real) in under a tenth of the training time (∼30 min vs. ∼6.5 h on an RTX 3090) (Jun-Seong et al., 2022).
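The tone-mapping stage described here (per-channel scaling followed by a piecewise-linear response curve) can be sketched in a few lines. This is a simplified illustration, not the HDR-Plenoxels implementation: the evenly spaced control points, the clamp to [0, 1] before the curve, and the names `piecewise_linear` and `tone_map` are assumptions.

```python
def piecewise_linear(x, knots):
    """Evaluate a curve given by its values at evenly spaced points on [0, 1]."""
    x = min(1.0, max(0.0, x))                 # assumed saturation outside [0, 1]
    pos = x * (len(knots) - 1)
    idx = min(int(pos), len(knots) - 2)       # index of the segment's left knot
    frac = pos - idx
    return (1.0 - frac) * knots[idx] + frac * knots[idx + 1]


def tone_map(hdr_rgb, white_balance, crf_knots):
    """Map HDR radiance to LDR: per-channel scaling, then a per-channel CRF.

    white_balance holds one scale per channel (exposure can be folded in);
    crf_knots holds one list of control-point values per channel.
    """
    return [piecewise_linear(c * w, knots)
            for c, w, knots in zip(hdr_rgb, white_balance, crf_knots)]
```

In a joint optimization, the scales and control-point values would be learnable per image alongside the grid, with a smoothness penalty on neighboring control points.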
4. 6-DoF Monocular Pose Estimation via Plenoxels
The rapid differentiable rendering of Plenoxels enables their use in image-based 6-DoF pose estimation. DPPE (Dense Pose in a Plenoxels Environment) leverages analysis-by-synthesis: given a pre-trained Plenoxels scene and an observed image, the algorithm recovers the camera pose by minimizing the photometric error between rendered and observed images.
Gradients with respect to the pose parameters are estimated using a central difference scheme:
$$\frac{\partial \mathcal{L}}{\partial \xi_k} \approx \frac{\mathcal{L}(\xi + \epsilon\, e_k) - \mathcal{L}(\xi - \epsilon\, e_k)}{2\epsilon},$$
where $\xi$ is the vector of pose parameters, $e_k$ is the $k$-th unit vector, and $\epsilon$ is a small step size. This enables efficient stochastic gradient descent on the SE(3) Lie algebra. Ablation studies show that, using only 1% of the image rays (∼6,400 pixels), DPPE attains low angular errors in ∼10 s per run, and that accuracy improves with grid resolution up to a point, with diminishing returns beyond it (Kolios et al., 2024).
| Grid Resolution | Fraction with RE < threshold | Avg. RE | Runtime (s) |
|---|---|---|---|
| 4 | 0.31 | 7.98 | 33.1 |
| 5 | 0.56 | 5.06 | 47.9 |
| 6 | 0.91 | 2.82 | 73.1 |
| 7 | 0.81 | 2.85 | 77.8 |
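The central-difference pose update described in this section can be sketched as follows. This is a minimal sketch under simplifying assumptions: the pose is treated as a plain 6-vector, `loss_fn` stands in for rendering a subset of rays and comparing them with the observed image, and the step size, learning rate, and iteration count are hypothetical. A full pipeline would map the 6-vector to a camera pose through the SE(3) exponential map and resample rays at each step.

```python
def central_difference_grad(loss_fn, params, eps=1e-3):
    """Estimate the gradient of loss_fn at params with central differences."""
    grad = []
    for k in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[k] += eps
        minus[k] -= eps
        grad.append((loss_fn(plus) - loss_fn(minus)) / (2.0 * eps))
    return grad


def refine_pose(loss_fn, init_params, lr=0.1, steps=200, eps=1e-3):
    """Plain gradient descent on the pose vector using estimated gradients."""
    params = list(init_params)
    for _ in range(steps):
        grad = central_difference_grad(loss_fn, params, eps)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def toy_loss(params, target=(0.1, -0.2, 0.3, 0.5, -0.5, 1.0)):
    """Toy stand-in for the photometric error: squared distance to a known pose."""
    return sum((p - t) ** 2 for p, t in zip(params, target))


estimate = refine_pose(toy_loss, [0.0] * 6)
```

Each gradient estimate costs two loss evaluations per pose parameter, which is why fast rendering and aggressive ray subsampling matter for this style of optimization.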
5. Empirical Results and Practical Considerations
Plenoxels and their variants have demonstrated strong performance across synthetic and real-world multi-view datasets. In view synthesis, compact grid+codebook methods match or slightly exceed baseline PSNR/SSIM with a dramatic reduction in memory requirements and increased rendering speed (Li et al., 2023). HDR-Plenoxels enable robust reconstruction under varying exposure and white balance (PSNR of up to 31 dB on synthetic and 33 dB on real scenes) while offering full control over radiometric appearance through the learned ISP model (Jun-Seong et al., 2022). For pose estimation, DPPE achieves accurate recovery with a minimal computational budget, exploiting the differentiability and speed advantages of the Plenoxels backbone (Kolios et al., 2024).
Ablations highlight that aggressive pixel subsampling incurs minimal performance loss, and that grid resolution should be tailored to the complexity of the scene, with intermediate resolutions often balancing accuracy and efficiency.
6. Significance, Limitations, and Directions
Plenoxels recast volumetric radiance field reconstruction as an explicit, mesh-free optimization problem, removing the need for costly neural function approximators. This leads to significantly faster optimization and simpler pipelines, and it facilitates real-time extensions, compression, and new downstream tasks such as monocular pose estimation. Nonetheless, high grid resolutions entail substantial storage cost, motivating research into compression, hybrid neural-voxel approaches, and adaptive spatial data structures. Further, explicit handling of scene dynamics, reflectance, and non-Lambertian effects in non-static or visually complex scenes remains an active area for methodological development.
Plenoxels, through their grid-based differentiable parameterization and modular extensibility, have established a foundation for practical, efficient, and versatile radiance field models in computer vision, graphics, and robotics (Kolios et al., 2024; Li et al., 2023; Jun-Seong et al., 2022).