Neural Cone Radiosity

Updated 10 September 2025
  • Neural cone radiosity is a rendering method that uses cone-based spatial aggregation to capture high-frequency, view-dependent radiance on glossy materials.
  • It leverages reflectance-aware cone footprints and a dual-branch network, combining diffuse and glossy predictions for efficient, photorealistic output.
  • Empirical results show reduced errors in specular highlights and caustics with interactive performance compared to conventional neural radiosity techniques.

Neural cone radiosity is a rendering methodology that combines the principles of neural scene representations with cone-based spatial encoding to model high-frequency, strongly view-dependent outgoing radiance distributions, particularly for glossy materials. Unlike traditional neural radiosity approaches, which rely primarily on pointwise positional encoding and struggle to capture radiance distributions with sharp directional lobes, neural cone radiosity uses reflectance-aware cone footprints matched to the material's BRDF lobe, embedding view-dependent reflectance characteristics directly into the feature encoding. This allows more faithful reconstruction of glossy reflections, high-frequency highlights, and caustic effects, while maintaining computational efficiency and supporting interactive applications.

1. Foundational Principles of Neural Cone Radiosity

Neural cone radiosity extends the classical radiosity and neural radiosity frameworks by encoding the finite spatial and angular extent of surface reflection. Traditional neural radiosity, as described in (Hadadan et al., 2021), represents the outgoing radiance $L_\theta(x, \omega_o)$ at surface position $x$ and direction $\omega_o$ via an MLP, typically parameterized with positional features and local material descriptors. The network is optimized over the residual of the rendering equation, decoupling global illumination from image synthesis:

$$r_\theta(x, \omega_o) = L_\theta(x, \omega_o) - L_e(x, \omega_o) - \int_{H^2} f_\mathrm{BRDF}(x, \omega_i, \omega_o)\, L_\theta\big(x'(x, \omega_i), -\omega_i\big)\, |\mathbf{n} \cdot \omega_i|\, d\omega_i$$
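To make this training target concrete, the following is a minimal single-point sketch of a Monte Carlo estimate of the residual. All callables (`L_theta`, `L_e`, `f_brdf`, `trace`, `sampler`) are hypothetical stand-ins for the network, the scene's emission, the BRDF, the ray tracer, and an importance sampler, not the paper's API:

```python
import torch

def radiosity_residual(L_theta, L_e, f_brdf, trace, sampler,
                       x, omega_o, n_samples=16):
    # Sample n_samples directions omega_i over the hemisphere H^2; the
    # (hypothetical) sampler also returns the pdf and |n . omega_i| terms.
    wi, pdf, cos_i = sampler(x, omega_o, n_samples)
    x_next = trace(x, wi)                        # next hit point x'(x, omega_i)
    L_in = L_theta(x_next, -wi)                  # the network supplies incoming radiance
    integrand = f_brdf(x, wi, omega_o) * L_in * cos_i
    scatter = (integrand / pdf.clamp_min(1e-8)).mean(dim=0)  # MC integral estimate
    # Residual of the rendering equation: r = L - L_e - scattering integral
    return L_theta(x, omega_o) - L_e(x, omega_o) - scatter
```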

Neural cone radiosity departs from pointwise encoding by introducing cone-based spatial aggregation. Instead of sampling radiance only at the central intersection of a ray and a surface, it aggregates features over the finite projected footprint of a reflection cone determined by the glossy BSDF lobe. The cone angle $\theta_\mathcal{C}$ is coupled to the surface roughness $\rho$ so that the cone covers a fixed energy fraction $\tau$ of the normal distribution function (NDF):

$$\int_{0}^{2\pi} \int_{0}^{\theta_\mathcal{C}} D(\theta, \rho)\, d\theta\, d\phi = \tau \int_{0}^{2\pi} \int_{0}^{\pi/2} D(\theta, \rho)\, d\theta\, d\phi$$

This explicit coupling encodes the spatial support of the reflectance lobe directly on the surface.
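As an illustration, the cone angle can be solved numerically from this condition. The sketch below assumes an isotropic GGX NDF with the common $\alpha = \rho^2$ remapping (both assumptions, not necessarily the paper's choices) and integrates $D(\theta, \rho)$ exactly as written above, with the $\phi$ integral canceling:

```python
import numpy as np

def ggx_ndf(theta, rho):
    # Isotropic GGX microfacet NDF as a stand-in for D(theta, rho);
    # the exact NDF and roughness remapping are assumptions here.
    a2 = max(rho, 1e-4) ** 4                    # alpha = rho^2 convention (assumed)
    c = np.cos(theta)
    return a2 / (np.pi * (c * c * (a2 - 1.0) + 1.0) ** 2)

def cone_angle(rho, tau=0.95, n=4096):
    # Smallest theta_C whose cumulative D-integral reaches fraction tau of
    # the full hemisphere integral, per the condition above.
    theta = np.linspace(0.0, np.pi / 2.0, n)
    density = ggx_ndf(theta, rho)               # integrand D(theta, rho), as written
    cdf = np.cumsum(density)
    cdf /= cdf[-1]                              # normalize by the hemisphere total
    return theta[np.searchsorted(cdf, tau)]
```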

2. Reflectance-Aware Ray Cone Encoding and Hash Grid Aggregation

The technical innovation centers on a reflectance-aware ray cone encoding that spatially aggregates neural features over the cone's surface footprint. Instead of querying feature grids at a single intersection or along a path, neural cone radiosity (NCR) computes an aggregated feature vector $v_\mathrm{glo}(x, r_\mathcal{C})$ from a pre-filtered multi-resolution hash grid, where $r_\mathcal{C}$ denotes the cone footprint:

$$v_\mathrm{glo}(x, r_\mathcal{C}) = \frac{s\, r_\mathcal{C} - l_{i+1}}{l_i - l_{i+1}}\, v_i(x) + \frac{l_i - s\, r_\mathcal{C}}{l_i - l_{i+1}}\, v_{i+1}(x)$$

Here, $v_i(x)$ and $v_{i+1}(x)$ are hash grid feature vectors at grid levels with cell sizes $l_i$ and $l_{i+1}$, and $s$ is the sample density linking spatial scale to grid resolution. To account for surface irregularity, the cone footprint is further decomposed by stochastically sampling $T$ rays through the BSDF lobe and grouping them via K-Means clustering; queries are performed at cluster representatives $x'_k$, and the glossy radiance is obtained by weighted summation:

$$L_\mathrm{glo}(x, \omega_o) = \sum_{k} \frac{|T_k|}{T}\, L_r(x'_k, -\omega_r, r_{\mathcal{C},k})$$

where $|T_k|$ is the number of rays in cluster $k$ and $L_r$ is the radiance prediction network evaluated at each cluster representative.
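A schematic sketch of both steps follows: level interpolation over the pre-filtered grid, then cluster-weighted summation. The hash grid interface, `sample_bsdf`, `trace`, and the network `L_r` are hypothetical placeholders, and scikit-learn's KMeans stands in for whatever clustering the method actually uses:

```python
import numpy as np
from sklearn.cluster import KMeans

def cone_feature(levels, x, r_cone, s):
    # Blend the two pre-filtered hash-grid levels whose cell sizes bracket
    # the cone footprint s * r_C (interpolation weights from the text).
    # `levels` is a hypothetical list of (cell_size, feature_fn) pairs,
    # ordered coarse to fine (decreasing cell size).
    f = s * r_cone
    for (l_i, v_i), (l_j, v_j) in zip(levels, levels[1:]):
        if l_i >= f >= l_j:
            w = (f - l_j) / (l_i - l_j)          # weight on the coarser level
            return w * v_i(x) + (1.0 - w) * v_j(x)
    # Footprint outside the pyramid: clamp to the coarsest/finest level.
    return levels[0][1](x) if f > levels[0][0] else levels[-1][1](x)

def glossy_radiance(x, omega_o, sample_bsdf, trace, L_r, T=32, K=4):
    # L_glo(x, w_o): sample T reflected rays through the BSDF lobe, cluster
    # the hit points with K-Means, query the radiance network once per
    # cluster representative, and weight by cluster population |T_k| / T.
    dirs = sample_bsdf(x, omega_o, T)            # (T, 3) reflected directions
    hits, footprints = trace(x, dirs)            # x'_k candidates and r_{C,k}
    labels = KMeans(n_clusters=K, n_init=4).fit(hits).labels_
    total = np.zeros(3)
    for k in range(K):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue
        rep = members[0]                         # one representative per cluster
        total += (members.size / T) * L_r(hits[rep], -dirs[rep], footprints[rep])
    return total
```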

3. Dual-Branch Network Architecture and Modulation

Neural cone radiosity adopts a dual-branch architecture:

  • The diffuse branch employs conventional multi-resolution hash grid encoding and MLP prediction for high-roughness or diffuse materials, capturing low-frequency radiance distributions.
  • The glossy branch utilizes the cone encoding described above to model high-frequency, view-dependent effects on glossy surfaces.

A lightweight modulation network blends the two radiance predictions according to roughness and reflectance parameters, ensuring smooth transitions across varying material types. At inference, the radiance estimate is:

$$L(x, \omega_o) = \alpha_\mathrm{mod}(x)\, L_\mathrm{glo}(x, \omega_o) + \big(1 - \alpha_\mathrm{mod}(x)\big)\, L_\mathrm{diff}(x, \omega_o)$$

where $\alpha_\mathrm{mod}(x)$ controls the branch mixing.
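A minimal PyTorch sketch of this architecture follows; the layer widths, the material descriptor, and the use of a sigmoid to keep $\alpha_\mathrm{mod}$ in $[0, 1]$ are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DualBranchRadiance(nn.Module):
    """Schematic dual-branch head: a diffuse branch over standard hash-grid
    features, a glossy branch over cone-encoded features, and a lightweight
    modulation net predicting the blend weight alpha_mod from material
    parameters. All layer sizes are illustrative, not the paper's."""

    def __init__(self, feat_dim=32, mat_dim=4):
        super().__init__()
        self.diffuse = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 3))
        self.glossy = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                    nn.Linear(64, 3))
        # Sigmoid keeps alpha_mod in [0, 1] (an assumption about the design).
        self.modulation = nn.Sequential(nn.Linear(mat_dim, 16), nn.ReLU(),
                                        nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, v_diff, v_glo, material):
        alpha = self.modulation(material)        # alpha_mod(x)
        return alpha * self.glossy(v_glo) + (1.0 - alpha) * self.diffuse(v_diff)
```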

4. Training and Optimization Formulation

The network is optimized by minimizing a relative MSE loss over the rendering equation’s residuals:

$$\mathcal{L}_\mathrm{rMSE} = \frac{1}{N} \sum_j \left\| \frac{r_{\Theta}(x_j, \omega_{o,j})}{\mathrm{sg}\big(m_\Theta(x_j, \omega_{o,j})\big) + \epsilon} \right\|^2$$

Here, $m_\Theta$ denotes the mean of the left- and right-hand sides of the rendering equation, and $\mathrm{sg}(\cdot)$ is the stop-gradient operator, which keeps the normalization term out of backpropagation and thereby stabilizes training. The residual $r_\Theta(x, \omega_o)$ measures how well the network predictions satisfy the physical light transport equation. Monte Carlo sampling is used during training, and grid sampling over cone footprints improves the efficiency of radiance prediction in glossy regions.
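In an autodiff framework, $\mathrm{sg}(\cdot)$ maps naturally onto tensor detachment. A minimal sketch, assuming `lhs` and `rhs` hold the two sides of the rendering equation evaluated per sample and the epsilon value is illustrative:

```python
import torch

def relative_mse(lhs, rhs, eps=1e-2):
    # lhs: network prediction L_Theta; rhs: Monte Carlo estimate of emission
    # plus the scattering integral. Their difference is the residual r_Theta,
    # their mean is m_Theta, and detach() plays the role of sg(.), so the
    # normalizer receives no gradient.
    residual = lhs - rhs
    m = 0.5 * (lhs + rhs)
    return ((residual / (m.detach() + eps)) ** 2).mean()
```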

5. Experimental Results and Comparative Analysis

Experiments demonstrate that NCR achieves higher accuracy and more photorealistic renderings of glossy scenes than vanilla neural radiosity (NR) and conventional Monte Carlo approaches. Key empirical findings include:

  • Substantial reduction in Mean Absolute Percentage Error (MAPE) in glossy areas (the metric is sketched after this list).
  • Real-time frame rates with interactive, scene-specific training, at a storage overhead of approximately 104 MB per scene versus 44 MB per scene for NR.
  • Visual fidelity in reproducing sharp highlights, caustics, and high-gloss surfaces (bathroom, Cornell box, kitchen scenes) that are typically blurred in pointwise NR.
  • Compared to traditional path tracing with Monte Carlo denoising (e.g., Intel Open Image Denoise, OIDN), NCR produces comparable or better quality without flickering or excessive noise, at lower computational cost.
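For reference, MAPE over radiance images is typically computed per pixel and averaged; a minimal sketch, with the epsilon guard as an assumption:

```python
import numpy as np

def mape(pred, ref, eps=1e-3):
    # Mean Absolute Percentage Error over a radiance image; eps guards
    # against near-black reference pixels (the exact guard is an assumption).
    return float(np.mean(np.abs(pred - ref) / (np.abs(ref) + eps)))
```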

A plausible implication is that the cone-based aggregation relieves the main radiance network from having to fit challenging, high-frequency details, which reduces noise and maintains network compactness.

6. Limitations and Future Research

While NCR offers clear advantages in simulating glossy global illumination, some limitations remain. The approach currently requires per-scene training and increased storage for cone feature grids. Challenging light transport effects, such as refraction through transparent materials and multi-bounce caustics, are captured less accurately. Possible future research directions include extending NCR to generalizable neural rendering (so a trained model transfers across scenes), improving the handling of dynamic scenes and animated objects, and incorporating advanced BSDF models for complex reflective and transmissive phenomena.

7. Connections to Broader Neural Radiosity Paradigms

Neural cone radiosity is part of a continuum of neural rendering methods that seek to encode global illumination via learning-based function approximators. Beyond pointwise radiosity (Hadadan et al., 2021) and differentiable extensions (Hadadan et al., 2022, Hadadan et al., 2023), other frameworks have advanced scene-adaptive encoding, volumetric radiance grids (Condor et al., 2022), and cone aggregation for antialiasing and scale-robust novel-view synthesis (Huang et al., 2023). Cone-based encoding—informed by BSDF energy distribution and robust spatial aggregation—addresses critical limitations of earlier approaches in resolving high-frequency, view-dependent light transport, making NCR an efficient, physically informed choice for real-time photorealistic rendering of glossy surfaces.


Neural cone radiosity provides a principled, computationally efficient framework for capturing high-frequency view-dependent global illumination by embedding reflectance-aware cone footprints into feature encodings, representing a significant methodological advance for interactive and accurate rendering of glossy materials (Ren et al., 9 Sep 2025).