
NeuralPVS: Fast Visibility Estimation

Updated 1 October 2025
  • NeuralPVS is a neural-network-based method that reformulates PVS estimation using 3D convolutional networks to deliver real-time performance.
  • It employs sparse 3D CNNs with volume-preserving interleaving to reduce memory usage while maintaining spatial precision for accurate visibility prediction.
  • The approach integrates a compound loss function combining weighted Dice loss and repulsive visibility loss, achieving under 1% omission in visible geometry.

NeuralPVS is a learned, neural-network-based method for fast, real-time estimation of the potentially visible set (PVS) from a specified region or viewpoint within large-scale computer graphics scenes (Wang et al., 29 Sep 2025). This task, historically solved by analytic and geometric precomputation, is recast as a structured prediction problem in which 3D convolutional networks operating on compressed voxel grids output per-region visibility. NeuralPVS replaces the geometric precomputation stage of the traditional hardware and software pipeline with an end-to-end neural estimator, yielding marked improvements in both computational efficiency and predictive accuracy. The method attains processing rates of approximately 100 Hz with less than 1% omission of visible geometry, outperforming established state-of-the-art techniques in both static and dynamic environments.

1. Principles of Potentially Visible Set Computation

Visibility computation in graphics consists of identifying the set of scene elements that may be visible from some region or viewpoint (the "from-region" problem). Classical solutions include analytic methods such as visibility sampling, ray shooting, and geometric precomputation (e.g., portal culling, PVS tables), which are either scene-size dependent or limited to static environments. The neural approach reframes PVS determination as a mapping from spatially discretized geometry grids (voxelized representations aligned to the view frustum, hence "froxel grids") to a binary occupancy grid indicating visibility. This grid, $\mathbf{G}$, encodes the scene as $G(\mathbf{x}) \in \{0,1\}$ for each froxel at position $\mathbf{x}$, forming the basis for subsequent deep inference.
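
As an illustration of this representation, the following is a minimal sketch (hypothetical helper names, not from the paper) of how surface samples might be discretized into a binary occupancy grid; for simplicity it uses an axis-aligned box rather than a true view-frustum-aligned froxel layout:

```python
import numpy as np

def froxelize(points, grid_min, grid_max, resolution=(128, 128, 128)):
    """Discretize surface sample positions into a binary occupancy grid G,
    with G[x] = 1 for every cell that contains geometry.

    points            : (P, 3) array of surface sample positions
    grid_min, grid_max: bounds of the voxelized volume (axis-aligned here)
    resolution        : number of cells along each axis
    """
    res = np.asarray(resolution)
    lo = np.asarray(grid_min, dtype=float)
    hi = np.asarray(grid_max, dtype=float)

    # Map each point to integer cell coordinates and clamp to the grid.
    idx = ((points - lo) / (hi - lo) * res).astype(int)
    idx = np.clip(idx, 0, res - 1)

    grid = np.zeros(resolution, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1  # G(x) in {0, 1}
    return grid
```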

2. Sparse 3D CNNs and Volume-Preserving Interleaving

NeuralPVS employs an OA-CNN-based sparse convolutional architecture, tailored for 3D input domains with substantial empty space. Instead of dense convolution over the entire grid, sparse convolution restricts operations to the subset of froxels containing geometry, resulting in significant computational savings. Prior to CNN processing, a "volume-preserving interleaving" procedure is applied: for a grid of size $N_x \times N_y \times N_z$, the grid is partitioned into non-overlapping blocks of size $d^3$, and the occupancy values within each block are stacked to form high-dimensional vectors. This reduces the effective input resolution from $N$ to $N/d^3$, mitigating memory and bandwidth requirements while maintaining the relative spatial encoding necessary for accurate visibility prediction. At inference, a de-interleaving operation restores the grid for downstream rendering.
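
A minimal NumPy sketch of this interleaving step, interpreted as a 3D space-to-depth reshape, is shown below; the function names and exact tensor layout are assumptions rather than the paper's implementation:

```python
import numpy as np

def interleave(grid, d):
    """Partition a (Nx, Ny, Nz) occupancy grid into non-overlapping d^3 blocks
    and stack each block's values into a feature vector, yielding a
    (Nx/d, Ny/d, Nz/d, d^3) tensor (a 3D space-to-depth transform)."""
    nx, ny, nz = grid.shape
    assert nx % d == 0 and ny % d == 0 and nz % d == 0
    g = grid.reshape(nx // d, d, ny // d, d, nz // d, d)
    g = g.transpose(0, 2, 4, 1, 3, 5)          # block indices first, offsets last
    return g.reshape(nx // d, ny // d, nz // d, d ** 3)

def deinterleave(blocks, d):
    """Inverse of interleave(): restore the full-resolution grid for rendering."""
    bx, by, bz, _ = blocks.shape
    g = blocks.reshape(bx, by, bz, d, d, d)
    g = g.transpose(0, 3, 1, 4, 2, 5)          # undo the transpose above
    return g.reshape(bx * d, by * d, bz * d)
```

With $d = 4$, for example, a $128^3$ occupancy grid becomes a $32^3$ grid of cells, each carrying a 64-dimensional feature vector, so no occupancy information is discarded by the compression.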

3. Loss Function: Weighted Dice and Repulsive Visibility Loss

The estimation of PVS suffers from severe class imbalance: typically, only a small fraction (0.5% to 10%) of all froxels are potentially visible. The paper introduces a compound loss function combining weighted Dice loss and a novel "repulsive visibility loss" (RVL). The Dice loss,

$$L_\text{dice}(V, \hat{V}) = 1 - \frac{2\,TP}{2\,TP + \alpha\,FP + (1 - \alpha)\,FN},$$

weights false negatives heavily. The RVL combines an attraction term and a repulsion term:

$$L_\text{attr} = 1 - \frac{FN}{GTP}, \qquad L_\text{rep} = \frac{FP}{GTP},$$

where $GTP$ is the ground-truth count of visible froxels, penalizing both missed visible regions and excess false positives. The overall loss is

$$L = \lambda\, L_\text{dice} + (1 - \lambda)\, L_\text{RVL}$$

with $\lambda = 0.99$. This formulation effectively regulates the prediction, yielding correct visible-set coverage without significant overestimation.
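
A hedged PyTorch sketch of this compound loss, using soft counts of TP/FP/FN so that the loss stays differentiable, is given below; the soft-count formulation, the default value of $\alpha$, and the summation of the attraction and repulsion terms into $L_\text{RVL}$ are assumptions, while the individual terms follow the formulas quoted above:

```python
import torch

def neural_pvs_loss(pred, target, alpha=0.5, lam=0.99, eps=1e-6):
    """Compound loss L = lam * L_dice + (1 - lam) * L_RVL.

    pred   : predicted visibility probabilities in [0, 1]
    target : ground-truth binary visibility, same shape as pred
    alpha  : Dice trade-off between false positives and false negatives
    """
    # Soft confusion counts keep the loss differentiable.
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    gtp = target.sum()  # ground-truth count of visible froxels (GTP)

    # Weighted Dice loss.
    l_dice = 1 - (2 * tp) / (2 * tp + alpha * fp + (1 - alpha) * fn + eps)

    # Repulsive visibility loss: attraction and repulsion terms as quoted
    # above, summed here (the combination rule is an assumption).
    l_attr = 1 - fn / (gtp + eps)
    l_rep = fp / (gtp + eps)
    l_rvl = l_attr + l_rep

    return lam * l_dice + (1 - lam) * l_rvl
```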

4. Quantitative Performance and Robustness

Benchmarking demonstrates real-time inference speeds (≈100 Hz, 10 ms/frame) independent of scene complexity, with geometric omission errors below 1%. Compared to the "Trim Regions" method (a geometric baseline), NeuralPVS achieves reductions in false-negative rates of up to 83.8% and typically superior or comparable false-positive rates. Perceptual rendering quality, measured via SSIM, remains nearly perfect (≈0.999), and the remaining pixel errors do not introduce visually impactful artifacts. The interleaving scheme allows for substantial reductions in memory footprint (optimal at interleaving factor $d = 16$), and the network generalizes to scenes not seen during training without loss of accuracy.

5. Architectural Pipeline and Integration

The system pipeline is as follows:

| Stage           | Input                             | Output                             |
|-----------------|-----------------------------------|------------------------------------|
| Froxelization   | Raw scene geometry                | Occupancy grid $\mathbf{G}$        |
| Interleaving    | $\mathbf{G}$                      | Compressed grid $g_d(\mathbf{G})$  |
| Sparse CNN      | $g_d(\mathbf{G})$                 | Visibility scores (per block)      |
| De-interleaving | CNN scores                        | Full-resolution PVS map            |
| Thresholding    | PVS map                           | Binary visibility assignment       |

The publication's system diagram depicts this pipeline, alongside renderings that illustrate the input grid, the CNN output, and the resulting culled image.
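
A sketch of the full inference pass, chaining the stages in the table above and reusing the illustrative helpers from the earlier sketches, is shown below; `sparse_cnn` is a placeholder for the trained OA-CNN-based model, and the default resolution and shape handling are simplifying assumptions:

```python
import numpy as np

def estimate_pvs(points, grid_min, grid_max, sparse_cnn, d=16,
                 resolution=(256, 256, 256), threshold=0.5):
    """End-to-end inference sketch following the pipeline table above."""
    # 1. Froxelization: raw scene geometry -> binary occupancy grid G.
    grid = froxelize(points, grid_min, grid_max, resolution)

    # 2. Volume-preserving interleaving: G -> compressed grid g_d(G).
    blocks = interleave(grid.astype(np.float32), d)

    # 3. Sparse CNN: per-block visibility scores (placeholder for the
    #    trained model; assumed to preserve the block layout).
    scores = sparse_cnn(blocks)

    # 4. De-interleaving: restore a full-resolution visibility score map.
    pvs_scores = deinterleave(scores, d)

    # 5. Thresholding: binary potentially-visible-set assignment per froxel.
    return pvs_scores > threshold
```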

6. Applications and Dynamic Extensions

NeuralPVS enables several advanced applications:

  • Real-time scene rendering: Rapid PVS computation facilitates real-time culling for shadow mapping, global illumination, and light-field rendering.
  • Streaming and prefetching: Decouples computational cost from scene scale; only visible geometry is streamed or prefetched (see the sketch after this list).
  • Dynamic scenes: The approach extends to temporally bounded volumes (TBVs) for moving objects, although the bulk of reported results focus on static or slowly changing environments.
  • Future research: The architecture supports further integration of neural components for radiance transfer and data-driven rendering optimization.
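
As an illustration of the culling and streaming use cases above, the following sketch filters objects against a binary PVS grid; the object-to-froxel bookkeeping is an assumption for illustration and not part of the paper's pipeline:

```python
def cull_with_pvs(objects, pvs, object_froxels):
    """Keep only objects whose froxels intersect the potentially visible set.

    objects        : list of object identifiers
    pvs            : binary (Nx, Ny, Nz) array, e.g. from estimate_pvs()
    object_froxels : dict mapping object id -> list of (x, y, z) froxel indices
    """
    visible = []
    for obj in objects:
        if any(pvs[x, y, z] for (x, y, z) in object_froxels[obj]):
            visible.append(obj)  # draw / stream / prefetch this object
    return visible
```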

7. Limitations and Future Prospects

NeuralPVS demonstrates robustness across a range of tested environments, but its performance in highly dynamic scenes depends on the frequency and completeness of TBV updates and synthetic training scenarios. Potential future refinements include optimizing for GPU-specific architectures, integrating more advanced sparsity mechanisms, and applying the method to related visibility tasks (e.g., environment-aware shadow computation).

Conclusion

NeuralPVS establishes a new paradigm for visibility estimation in computer graphics, leveraging sparse CNNs with volume-preserving input compression and a tailored loss function. Its real-time performance, low error rates, and generalization capacity mark a significant advance over classical methods, supporting efficient rendering, streaming, and interactive scene management in large and dynamic environments (Wang et al., 29 Sep 2025).
