
GPU Volume Rendering with Hierarchical Compression Using VDB (2504.04564v2)

Published 6 Apr 2025 in cs.GR and cs.DC

Abstract: We propose a compression-based approach to GPU rendering of large volumetric data using OpenVDB and NanoVDB. We use OpenVDB to create a lossy, fixed-rate compressed representation of the volume on the host, and use NanoVDB to perform fast, low-overhead, and on-the-fly decompression during rendering. We show that this approach is fast, works well even in a (incoherent) Monte Carlo path tracing context, can significantly reduce the memory requirements of volume rendering, and can be used as an almost drop-in replacement into existing 3D texture-based renderers.

Summary

  • The paper proposes a method for GPU volume rendering of large, sparse datasets using OpenVDB for offline fixed-rate compression and NanoVDB for efficient, on-the-fly sampling.
  • The core technique involves a fixed-rate, lossy compression algorithm that sorts volume bricks by similarity to a background value and activates a percentage of them based on a quality parameter.
  • Evaluation shows the VDB method provides significant memory savings, enables rendering large datasets on a single GPU, and achieves performance competitive with dense textures, comparing favorably to ZFP for random access workloads like path tracing.

This paper proposes a method for rendering large volumetric datasets on GPUs by using a fixed-rate, lossy compression scheme based on the VDB (Volume Data Base) hierarchical data structure. The core idea is to leverage OpenVDB for offline compression on the host (CPU) and NanoVDB for efficient, on-the-fly decompression and sampling during rendering on the GPU. This approach aims to reduce the significant memory footprint often required by large volumes, which typically exceeds GPU memory capacity, while maintaining interactive rendering performance, even for complex techniques like Monte Carlo path tracing.

The motivation stems from the limitations of existing methods. Using dense 3D textures is simple but memory-intensive. Block-based compression schemes like ZFP require careful cache management and complex control flow (e.g., ray wavefronts, sorting) to amortize decompression costs, which is particularly challenging for incoherent memory access patterns common in path tracing. The proposed VDB-based method offers random access sampling similar to dense textures, allowing each GPU thread to decompress data independently without requiring complex caching or synchronization.

Compression Algorithm

The compression algorithm takes a dense volume and a user-defined quality parameter (0 to 1) as input and outputs a sparse OpenVDB representation.

  1. Background Value Identification: A histogram of the input volume's values is computed. The most frequent value is assumed to be the background value (B).
  2. Bricking: The volume is conceptually divided into bricks, sized to match the leaf node dimensions of the target VDB structure (e.g., 32×32×32 voxels for the standard {3,4,5} VDB layout).
  3. Brick Analysis: For each brick, the minimum and maximum voxel values (the value range [lo, hi]) are computed from the original volume data.
  4. Brick Sorting: Bricks are sorted based on their similarity to the background value B. The paper proposes and evaluates three similarity metrics based on the brick's value range [lo, hi]:
    • f1 = min(|lo − B|, |hi − B|) (closest point-in-range)
    • f2 = max(|lo − B|, |hi − B|) (farthest point-in-range)
    • f3 = |(lo + hi)/2 − B| (median point-in-range)
    The evaluation suggests f2 and f3 generally yield better quality. Bricks are sorted in ascending order of the metric, i.e., bricks most similar to the background come first.
  5. Voxel Activation: The algorithm determines the number of bricks to fully activate based on the quality parameter (e.g., quality = 0.8 means activating 80% of the bricks). It iterates through the sorted list, starting from the brick least similar to the background, and activates all voxels within each selected brick in the OpenVDB structure by setting their original values. This continues until the target number of bricks has been processed; voxels in bricks that were not selected remain inactive (implicitly background). A sketch of this selection logic follows the list.
  6. Finalization: The resulting OpenVDB tree is pruned for memory optimization.
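
The sketch below illustrates the sort-then-activate structure described above. It is not the authors' implementation: the brick edge length, the choice of metric f2, the voxel ordering of the dense input, and the specific OpenVDB calls (FloatGrid::create, ValueAccessor::setValue, tools::prune) are assumptions based on this summary, and OpenVDB header locations can vary by version.

```cpp
// Hypothetical sketch of the fixed-rate brick selection (not the paper's code).
// Assumes a dense float volume stored x-fastest and a reasonably recent OpenVDB.
#include <openvdb/openvdb.h>
#include <openvdb/tools/Prune.h>
#include <algorithm>
#include <cmath>
#include <vector>

struct Brick { openvdb::Coord origin; float lo, hi; };

openvdb::FloatGrid::Ptr compressFixedRate(const float* dense, int nx, int ny, int nz,
                                           float background,  // histogram mode (step 1)
                                           float quality,     // in [0, 1]
                                           int dim = 32)      // assumed brick edge length
{
    openvdb::initialize(); // registers grid types (needed for later I/O)
    auto valueAt = [&](int x, int y, int z) {
        return dense[((size_t)z * ny + y) * nx + x];
    };

    // Steps 2-3: brick the volume and record each brick's value range [lo, hi].
    std::vector<Brick> bricks;
    for (int bz = 0; bz < nz; bz += dim)
        for (int by = 0; by < ny; by += dim)
            for (int bx = 0; bx < nx; bx += dim) {
                Brick b{openvdb::Coord(bx, by, bz), 1e30f, -1e30f};
                for (int z = bz; z < std::min(bz + dim, nz); ++z)
                    for (int y = by; y < std::min(by + dim, ny); ++y)
                        for (int x = bx; x < std::min(bx + dim, nx); ++x) {
                            const float v = valueAt(x, y, z);
                            b.lo = std::min(b.lo, v);
                            b.hi = std::max(b.hi, v);
                        }
                bricks.push_back(b);
            }

    // Step 4: sort by similarity to the background (metric f2 used here);
    // bricks most similar to the background come first.
    auto f2 = [&](const Brick& b) {
        return std::max(std::fabs(b.lo - background), std::fabs(b.hi - background));
    };
    std::sort(bricks.begin(), bricks.end(),
              [&](const Brick& a, const Brick& b) { return f2(a) < f2(b); });

    // Step 5: activate the `quality` fraction of bricks least similar to the
    // background by writing their original voxel values into a sparse grid.
    openvdb::FloatGrid::Ptr grid = openvdb::FloatGrid::create(background);
    auto acc = grid->getAccessor();
    const size_t keep = (size_t)std::llround(quality * (double)bricks.size());
    for (size_t i = bricks.size() - keep; i < bricks.size(); ++i) {
        const openvdb::Coord& o = bricks[i].origin;
        for (int z = o.z(); z < std::min(o.z() + dim, nz); ++z)
            for (int y = o.y(); y < std::min(o.y() + dim, ny); ++y)
                for (int x = o.x(); x < std::min(o.x() + dim, nx); ++x)
                    acc.setValue(openvdb::Coord(x, y, z), valueAt(x, y, z));
    }

    // Step 6: prune uniform/background nodes to minimize memory.
    openvdb::tools::prune(grid->tree());
    return grid;
}
```

The point of the sketch is only the overall shape of the algorithm (histogram-derived background, per-brick ranges, metric-based sort, fractional activation, prune); the paper's actual parameters and data layout may differ.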

GPU Rendering and Decompression

  1. Conversion to NanoVDB: The compressed OpenVDB structure, residing in CPU memory, is converted into a NanoVDB representation. NanoVDB creates a linear, pointer-free layout of the VDB tree optimized for GPU access. This linear block can be efficiently copied to GPU memory.
  2. On-the-Fly Decompression: During rendering, GPU threads sample the volume using NanoVDB's provided accessor and sampler functions. These functions handle the tree traversal and voxel value retrieval directly on the GPU. Supported sampling includes nearest-neighbor and trilinear interpolation (0th and 1st order). If a sample location falls into an inactive region of the sparse VDB, the background value is returned. This decompression happens implicitly during sampling within each thread; a minimal conversion-and-sampling sketch follows below.
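
As a rough illustration of this path, the sketch below converts an OpenVDB grid to NanoVDB, uploads the linear buffer to the GPU, and samples it trilinearly inside a CUDA kernel. It is not the paper's renderer, and the specific utilities shown (openToNanoVDB, GridHandle::deviceUpload, createSampler, worldToIndexF) and their header paths are taken from recent NanoVDB releases and may differ across versions.

```cpp
// Hypothetical NanoVDB round trip (assumed API; header paths vary by NanoVDB version).
#include <openvdb/openvdb.h>
#include <nanovdb/util/OpenToNanoVDB.h>     // openToNanoVDB()
#include <nanovdb/util/CudaDeviceBuffer.h>  // GPU-capable buffer for GridHandle
#include <nanovdb/util/SampleFromVoxels.h>  // createSampler<Order>()

// Kernel: each thread samples one world-space position with trilinear (order-1)
// interpolation; inactive (background) regions simply return the background value.
__global__ void sampleKernel(const nanovdb::FloatGrid* grid,
                             const nanovdb::Vec3f* positions,
                             float* out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    auto acc = grid->getAccessor();                 // per-thread read accessor
    auto sampler = nanovdb::createSampler<1>(acc);  // 1 = trilinear
    const nanovdb::Vec3f ijk = grid->worldToIndexF(positions[i]);
    out[i] = sampler(ijk);                          // on-the-fly "decompression"
}

void uploadAndSample(const openvdb::FloatGrid& cpuGrid,
                     const nanovdb::Vec3f* dPositions, float* dOut, int n)
{
    // Host: linearize the OpenVDB tree into a pointer-free NanoVDB buffer ...
    auto handle = nanovdb::openToNanoVDB<nanovdb::CudaDeviceBuffer>(cpuGrid);
    // ... and copy that single block to the GPU.
    handle.deviceUpload();
    const nanovdb::FloatGrid* dGrid = handle.deviceGrid<float>();
    sampleKernel<<<(n + 255) / 256, 256>>>(dGrid, dPositions, dOut, n);
}
```

Because each thread owns its accessor and sampler, no inter-thread caching or synchronization is needed, which is what makes the approach workable for incoherent path-tracing access patterns.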

Implementation and Integration

The authors integrated their method into the Barney ray tracer, which uses the ANARI API for scientific visualization.

  • The compression and conversion process is hidden behind the ANARI API. Users specify the volume as a standard structured-regular grid but add parameters to indicate VDB compression and the desired rate.
  • Inside Barney (a Monte Carlo volume path tracer using Woodcock tracking and multi-GPU ray queue cycling), the standard 3D texture sampling calls are replaced with calls to the NanoVDB sampling functions when VDB compression is enabled; a sketch of what this swap looks like inside a Woodcock-tracking loop is given after this list.
  • Traversal acceleration still uses a uniform majorant grid (even for VDB data), both to facilitate direct comparison with dense-texture rendering and because hierarchical traversal was found to offer diminishing returns.
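
To make the "near drop-in replacement" concrete, here is a schematic Woodcock (delta) tracking step in which the only VDB-specific line is the density lookup. This is illustrative only, not Barney's code: the sampler type, the RNG, and the single `majorant` value (which in the actual renderer comes from the uniform majorant grid mentioned above) are placeholders.

```cpp
// Schematic Woodcock/delta tracking with a NanoVDB density lookup (illustrative only).
// `rng()` is assumed to return uniform floats in [0, 1); `majorant` bounds the
// extinction along the segment [0, tMax).
template<typename SamplerT, typename RngT>
__device__ float deltaTrack(const nanovdb::FloatGrid* grid, SamplerT& sampler,
                            nanovdb::Vec3f org, nanovdb::Vec3f dir,
                            float tMax, float majorant, RngT& rng)
{
    float t = 0.f;
    while (true) {
        t -= logf(1.f - rng()) / majorant;   // tentative free-flight distance
        if (t >= tMax) return -1.f;          // no interaction in this segment
        const nanovdb::Vec3f p = org + dir * t;
        // The only change vs. a dense-texture renderer: sample via NanoVDB, not tex3D().
        const float density = sampler(grid->worldToIndexF(p));
        if (rng() * majorant < density)      // real (non-null) collision?
            return t;
    }
}
```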

Evaluation

The evaluation compared the VDB compression method against rendering with uncompressed dense 3D textures and the quality against ZFP compression.

  • Compression Rate: The fixed-rate algorithm effectively controls the output size, though the actual size depends on data sparsity. Even for dense noise data, the NanoVDB overhead compared to the raw data size was found to be modest (~10% for large datasets). Many sparse datasets could be compressed significantly (e.g., 1:4 or more) losslessly or with minimal loss (high PSNR).
  • Rendering Performance: Rendering with NanoVDB achieved frame rates competitive with hardware-accelerated 3D texture sampling using CUDA textures. For very large datasets requiring multiple GPUs when stored uncompressed (e.g., Galaxy dataset needing 3 GPUs), the compressed VDB version fit on a single GPU and rendered significantly faster due to avoiding multi-GPU communication overhead.
  • Quality vs. ZFP: Quality metrics (MSE and PSNR; their standard definitions are recalled after this list) showed that VDB compression performs well, especially at lower compression rates, and compares favorably to ZFP for the tested sparse datasets, although ZFP operates fundamentally differently (block-based transform coding). Visual comparisons showed VDB tends to discard entire low-contrast blocks (leading to missing features at high compression), while ZFP introduces block-based artifacts more uniformly.
  • Back-and-Forth Conversion: Converting an original VDB dataset (WDAS cloud) to dense and back to the proposed compressed VDB showed that achieving lossless reconstruction required slightly larger sizes than the original curated VDB, but compressing to match the original VDB size incurred very small errors.
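
For reference when reading these numbers, the usual definitions are given below; which peak value the paper uses (e.g., the data range versus the maximum value) is not stated in this summary, so v_peak is left generic.

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\bigl(v_i - \hat v_i\bigr)^2,
\qquad
\mathrm{PSNR} = 10\,\log_{10}\!\frac{v_{\mathrm{peak}}^{2}}{\mathrm{MSE}}\ \text{dB},
```

where v_i are the original voxel values, \hat v_i the values reconstructed from the compressed representation, and v_peak the chosen peak signal value.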

Conclusion

The paper concludes that using OpenVDB for fixed-rate compression and NanoVDB for GPU sampling is a viable approach for rendering large, sparse volumetric data in scientific visualization. It offers significant memory reduction compared to dense textures, provides competitive rendering performance, integrates well into existing renderers as a near drop-in replacement for texture sampling, and performs favorably against other compression methods like ZFP for sparse data, particularly within a path tracing context requiring random access. The main trade-off is potential block artifacts or missing features at high compression rates if the transfer function highlights regions deemed "background" during compression.
