Multi-Space Neural Radiance Field (MS-NeRF)

Updated 26 May 2026

MS-NeRF is a neural view synthesis framework that decomposes a scene into multiple consistent radiance subspaces, each modeling distinct light paths.
It uses trainable gating functions to blend outputs from independent feature fields, yielding sharper results with fewer artifacts.
Adaptive and hybrid architectures further optimize computation by distributing ray sampling and balancing low- and high-frequency information.

A Multi-Space Neural Radiance Field (MS-NeRF) is a general framework in neural view synthesis whereby the standard single radiance field is replaced by multiple structured subspace representations. Each subspace, realized as an independent feature field, models complementary aspects of the scene—such as distinct paths for direct, reflected, or refracted light—and is recombined by trainable mixing functions to synthesize the final pixel-wise radiance. This architecture allows MS-NeRF models to overcome limitations of single-space NeRFs in dealing with reflection, refraction, scene scale, and view sparsity, all while maintaining compatibility with popular neural field backbones and incurring negligible increases in parameter count and computational overhead. Notable instantiations include MS-NeRF for mirror and glass scenes, adaptive MS-NeRF for parallel rendering acceleration, and multi-space grid–MLP hybrids designed to address sparse input or frequency decomposition (Yin et al., 2023, Wang et al., 2023, Kim et al., 2024).

1. Motivation and Problem Setting

Classic NeRF and its variants rely on the assumption of multi-view consistency: the premise that a physical 3D point projects to a fixed color in all images observing it. This assumption breaks down in the presence of reflective or refractive objects—mirror-like boundaries generate disjoint sets of rays with mutually inconsistent color assignments from different views. As a result, standard NeRFs are compelled to blur or average conflicting samples, leading to artifacts, ghosting, and loss of sharp reflected/refracted content. MS-NeRF is motivated by the observation that if a scene is decomposed into several internally consistent radiance “subspaces,” each traversed by rays in parallel, then each can accurately represent physically plausible light transport routes (direct, specular, refracted) and their respective appearances (Yin et al., 2023).

This multi-space paradigm generalizes to challenges beyond reflective phenomena. Adaptive MS-NeRFs segment large, heterogeneous scenes into spatial subspaces mapped to small, efficient neural fields, facilitating massive parallelism and adaptive representation according to local scene complexity (Wang et al., 2023). Furthermore, multi-space hybrid models assign low-frequency and high-frequency content to distinct coordinate MLP and multi-plane grid spaces, respectively, mitigating overfitting and inefficiency in sparse or dynamic-view training (Kim et al., 2024).

2. Model Architectures and Multi-Space Decomposition

Formally, an MS-NeRF replaces the canonical NeRF’s single radiance field with a set of $K$ parallel feature fields (subspaces). Each has its own density ($\sigma^k$) and appearance-feature ($f^k$) predictors, and a trainable “gating” or “mixing” module determines how their outputs are recombined. Three instantiations illustrate the idea:

MS-NeRF (mirror/glass scenes): The underlying NeRF (MLP or grid, e.g., TensoRF) is modified to produce $K$ tuples $(\sigma^{k}(x), f^{k}(x,d))$, one per subspace, sharing all earlier layers and splitting only at the last (head) layer. For a given camera ray $r(t)=o+td$ and sample locations $\{x_i\}$, each subspace performs independent volumetric integration:

$T_i^k = \exp\left(-\sum_{j=1}^{i-1} \sigma_j^k (t_j-t_{j-1})\right), \quad \hat{F}^k(r) = \sum_{i=1}^N T^k_i (1 - e^{-\sigma^k_i (t_i-t_{i-1})})\, f^k_i$

Small MLP decoders then map $\hat{F}^k(r) \rightarrow C^k(r)\in\mathbb{R}^3$ (color), while gating weights $w^k(r)$ yield softmax-mixing coefficients $\alpha^k(r) = \exp(w^k(r)) / \sum_{j=1}^{K} \exp(w^j(r))$. The final color is

$\hat{C}(r) = \sum_{k=1}^{K} \alpha^k(r)\, C^k(r)$

with the network trained end-to-end by MSE loss per pixel.

Adaptive Multi-NeRF: The 3D scene is recursively partitioned via a density-guided KD-tree, and each resulting region is assigned a compact NeRF MLP of identical structure. Space is subdivided as long as a small local MLP cannot yet approximate the global “Mega-NeRF” within a PSNR threshold. Each camera ray is then allocated to a sequential list of intersected subspaces, and batch processing across subspaces achieves per-MLP workload balance and high parallel occupancy (Wang et al., 2023).
Multi-Space Grid–MLP Hybrid: The radiance field is split into a coordinate-based MLP for low-frequency content (global shape) and a tensorial multi-plane interpolation grid for high-frequency detail (edges, textures). At each sample, multi-plane features are extracted via bilinear interpolation, concatenated with coordinates, then fused by a residual MLP that preserves both spaces’ contributions. A progressive, channel-wise gating schedule further separates the learning of coarse and fine detail (Kim et al., 2024).
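The feature lookup described for the grid–MLP hybrid can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-plane layout (`xy`, `xz`, `yz`), the function names, and the nested-list storage are assumptions made for the example, and the residual MLP that fuses the result is omitted.

```python
def bilinear(plane, u, v):
    """Bilinearly sample plane[row][col] (a list of channels) at (u, v) in [0, 1]^2.

    Assumes the plane has at least 2 rows and 2 columns.
    """
    rows, cols = len(plane), len(plane[0])
    # Map normalized coordinates to continuous grid indices, clamping to the plane.
    x = min(max(u, 0.0), 1.0) * (cols - 1)
    y = min(max(v, 0.0), 1.0) * (rows - 1)
    x0 = min(int(x), cols - 2)
    y0 = min(int(y), rows - 2)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = (1 - fx) * plane[y0][x0][c] + fx * plane[y0][x0 + 1][c]
        bottom = (1 - fx) * plane[y0 + 1][x0][c] + fx * plane[y0 + 1][x0 + 1][c]
        out.append((1 - fy) * top + fy * bottom)
    return out


def hybrid_input(point, planes):
    """Concatenate xyz with features sampled from three axis-aligned planes.

    `planes` is a dict with hypothetical keys "xy", "xz", "yz".
    """
    x, y, z = point
    feats = (
        bilinear(planes["xy"], x, y)
        + bilinear(planes["xz"], x, z)
        + bilinear(planes["yz"], y, z)
    )
    return [x, y, z] + feats
```

The returned vector is what a fusing network would consume: the raw coordinates carry the low-frequency signal, while the interpolated plane features carry the high-frequency detail.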

3. Rendering, Inference, and Computational Characteristics

Rendering in MS-NeRF models involves simultaneous evaluation of all $K$ subspace heads per sample along each ray, followed by neural gating and weighted blending. In MS-NeRF for scene decomposition:

Each sample point along a ray requires evaluating $K$ sets of outputs $(\sigma^k, f^k)$. However, because the subspaces share all layers except the final head, the parameter count grows only marginally (about +0.5% for MS-Mip-NeRF 360 in the comparison in Section 5).
Volume rendering is performed over each subspace's outputs, and subspace color contributions are aggregated via softmax-mixed gating.
Sampling strategies and backbone mechanisms (e.g., hierarchical sampling in Mip-NeRF 360, uniform in classic NeRF) are retained (Yin et al., 2023).
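The per-ray computation above can be sketched in plain Python. This is a minimal sketch under stated assumptions: `decode_color` and `gate_logit` are hypothetical stand-ins for the small decoder and gating MLPs, and `t_near` is taken as the left edge of the first sample interval. The integration follows the transmittance-weighted sum given in Section 2.

```python
import math


def render_subspace_feature(sigmas, feats, ts, t_near):
    """Integrate one subspace along a ray.

    Computes sum_i T_i * (1 - exp(-sigma_i * delta_i)) * f_i with
    delta_i = t_i - t_{i-1} and T_i = exp(-sum_{j<i} sigma_j * delta_j).
    """
    out = [0.0] * len(feats[0])
    optical_depth = 0.0  # running sum of sigma_j * delta_j over earlier samples
    t_prev = t_near
    for sigma, f, t in zip(sigmas, feats, ts):
        delta = t - t_prev
        weight = math.exp(-optical_depth) * (1.0 - math.exp(-sigma * delta))
        for c in range(len(out)):
            out[c] += weight * f[c]
        optical_depth += sigma * delta
        t_prev = t
    return out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_subspaces(features, decode_color, gate_logit):
    """Decode each subspace feature to RGB and a gate logit, then softmax-blend."""
    colors = [decode_color(F) for F in features]
    alphas = softmax([gate_logit(F) for F in features])
    pixel = [sum(a * c[ch] for a, c in zip(alphas, colors)) for ch in range(3)]
    return pixel, alphas
```

For a ray with $K$ subspaces, one would call `render_subspace_feature` once per subspace and pass the resulting list to `blend_subspaces`. Because the softmax weights sum to one, the pixel color is a convex combination of the per-subspace colors.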

For adaptive MS-NeRF:

Each ray traverses multiple localized neural fields, with ray–AABB intersection intervals batch-processed per subspace. Computational load is distributed uniformly due to density-aware subdivision, maximizing GPU occupancy and reducing kernel overhead.
The adaptive sampling policy further reduces memory and computation by assigning per-ray sample counts according to normalized subspace lengths, yielding a roughly 30–40% reduction in total samples (Wang et al., 2023).
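The two steps described for adaptive MS-NeRF, intersecting a ray with each subspace's bounding box and splitting the sample budget by interval length, can be illustrated with a short sketch. The slab test is a standard ray–AABB technique used here as an assumption about how the intervals might be computed. The function names and the rounding rule for leftover samples are likewise illustrative and not taken from the paper.

```python
def ray_aabb_interval(origin, direction, box_min, box_max):
    """Slab test: return (t_enter, t_exit) of a ray within an AABB, or None on a miss."""
    t_enter, t_exit = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab; it must already lie between the planes.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
    return (t_enter, t_exit) if t_enter < t_exit else None


def allocate_samples(intervals, budget):
    """Split a per-ray sample budget across subspaces in proportion to interval length."""
    lengths = [t1 - t0 for t0, t1 in intervals]
    total = sum(lengths)
    if total <= 0.0:
        return [0 for _ in intervals]
    counts = [int(budget * length / total) for length in lengths]
    # Flooring can leave a few samples unassigned; give them to the longest intervals.
    leftover = max(budget - sum(counts), 0)
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

Under this scheme, a ray that only grazes a subspace spends few samples there, which is one way to obtain a reduction in total samples of the kind reported above.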

Hybrid grid–MLP MS-NeRFs maintain standard rendering pipelines but involve additional feature interpolation overhead and small MLP evaluation, leading to somewhat slower end-to-end performance compared to pure grid-based methods (Kim et al., 2024).

4. Training Methodology and Benchmarking Datasets

MS-NeRF-style models are trained with procedures tailored to their architectures:

Standard MS-NeRF (reflection/refraction): Training is supervised by a pixel-wise photometric loss (MSE) with no auxiliary regularizers. Datasets include both synthetic scenes (Blender-rendered, 25–33 scenes, each containing from one to tens of mirrors or glass objects, with diverse camera paths) and real captured scenes (with mirrors, glass, and heterogeneous lighting, and 62–118 360° views per scene) (Yin et al., 2023).
Adaptive MS-NeRF: Subspace MLPs are first distilled from a global Mega-NeRF trained over the whole scene, matching outputs on sampled points and directions via squared error loss. Subdivision halts when a local MLP matches its allocated subspace to within a PSNR threshold. Optional fine-tuning on photometric loss may follow. Batching is implicit via KD-tree ray traversal and parallel sample collection (Wang et al., 2023).
Hybrid multi-space models: The coordinate MLP and multi-plane grid are jointly trained with composite losses consisting of photometric MSE, Laplacian smoothing on plane grids, and sparsity regularization. A progressive gating schedule incrementally introduces grid features to prevent early overfitting to high-frequency noise in sparse-view scenarios (Kim et al., 2024).
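The stopping rule used during adaptive MS-NeRF distillation can be sketched as a PSNR check on matched outputs. The formula $\mathrm{PSNR} = 10 \log_{10}(\text{peak}^2 / \text{MSE})$ is standard. Applying it directly to the sampled student and teacher outputs, the function names, and any particular threshold value are assumptions made for this illustration.

```python
import math


def psnr(mse, peak=1.0):
    """PSNR in dB for a mean squared error, with values scaled to [0, peak]."""
    if mse == 0.0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / mse)


def distillation_mse(student_outputs, teacher_outputs):
    """Mean squared error between local-MLP and global-NeRF outputs on matched samples."""
    total, n = 0.0, 0
    for s, t in zip(student_outputs, teacher_outputs):
        for a, b in zip(s, t):
            total += (a - b) ** 2
            n += 1
    return total / n


def should_subdivide(student_outputs, teacher_outputs, psnr_threshold_db):
    """Keep splitting a region while its local MLP falls short of the PSNR threshold."""
    return psnr(distillation_mse(student_outputs, teacher_outputs)) < psnr_threshold_db
```

A KD-tree builder would call `should_subdivide` after fitting each local MLP and recurse only where it returns `True`, so simple regions stay coarse while complex ones receive more subspaces.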

5. Quantitative Performance and Qualitative Advantages

MS-NeRF models consistently outperform their single-space counterparts in the domains they target:

Setting	Baseline PSNR	MS-NeRF Variant PSNR	PSNR Gain	Parameter Overhead
Mip-NeRF 360, mirror/glass (synthetic)	31.58 dB	35.04 dB (MS-Mip-NeRF 360)	+3.46 dB	+0.5%
NeRF, mirror/glass (synthetic)	30.82 dB	32.77 dB (MS-NeRF_B)	+1.95 dB	—
Ref-NeRF (glossy)	32.37 dB	33.90 dB (MS-Mip-NeRF_M)	+1.53 dB	≈0%
RFFR dataset, two-mirror scenes	35.26 dB (NeRFReN)	35.93 dB (MS-NeRF_T)	+0.67 dB	—
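To make the dB figures in the table easier to interpret, the sketch below recomputes each gain and converts it into an approximate error ratio. It assumes that both PSNR values derive from the MSE in the same way, which holds only approximately when PSNR is averaged over many images. The row labels and helper name are illustrative.

```python
# (setting, baseline PSNR in dB, MS-NeRF variant PSNR in dB), copied from the table.
ROWS = [
    ("Mip-NeRF 360, mirror/glass (synthetic)", 31.58, 35.04),
    ("NeRF, mirror/glass (synthetic)", 30.82, 32.77),
    ("Ref-NeRF (glossy)", 32.37, 33.90),
    ("RFFR dataset, two-mirror scenes", 35.26, 35.93),
]


def mse_ratio(gain_db):
    """Ratio of variant MSE to baseline MSE implied by a PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)


gains = [round(variant - baseline, 2) for _, baseline, variant in ROWS]
ratios = [mse_ratio(g) for g in gains]

for (name, _, _), gain, ratio in zip(ROWS, gains, ratios):
    print(f"{name}: +{gain:.2f} dB, MSE ratio {ratio:.2f}")
```

Under that assumption, the +3.46 dB gain in the first row corresponds to a mean squared error of roughly 45% of the baseline's.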

Qualitatively, MS-NeRF produces sharp, undistorted virtual images and faithfully recovers recursive reflections in unbounded or mirror-walled rooms, whereas baseline NeRFs produce blurred or incorrect content (Yin et al., 2023). Adaptive MS-NeRF achieves a ~30–40% reduction in ray samples and increases GPU utilization to ~85%, with only marginal (<0.5 dB) PSNR differences compared to single monolithic or uniform-grid multi-MLP accelerators (Wang et al., 2023). Grid–MLP hybrids attain robust low-frequency shape recovery and sharper details than purely explicit grid or pure MLP models, with parameter counts up to 50% smaller for comparable quality in sparse-view or dynamic-geometry settings (Kim et al., 2024).

6. Ablation Studies and Design Analysis

Several architectural choices in MS-NeRF have been systematically analyzed:

Gating and Feature Fields: Naïve averaging of multiple (σ,c) heads without learnable feature fields or gating (“MS-NeRF_Avg”) yields only +0.63 dB PSNR over baseline and produces over-smoothed colors, confirming the importance of feature decoding and gated mixing.
Subspace Count $K$: Experiments show that PSNR rises quickly with $K$ up to approximately 6 (for two-mirror scenes) and then plateaus, indicating diminishing returns from higher subspace counts even though mirror recursion is theoretically infinite. A small $K$ (up to about 8) suffices in practice.
Feature Dimensionality: A modest feature dimension per subspace suffices for robust gating; a larger one marginally stabilizes results, with negligible computational penalty.
Progressive Feature Fusion: In grid–MLP models, full residual fusion of coordinate and plane features (in the first two MLP blocks) is critical; alternate skip-connection schemes dramatically degrade PSNR (full residual: 24.74 dB; all-skip: 18.77 dB; no skip: 19.23 dB). Progressive channel gating ensures disentanglement and reduces overfitting with few views (Kim et al., 2024).

7. Limitations and Prospects for Future Development

MS-NeRF limitations stem from subspace granularity and architectural rigidity:

Extremely fine-grained effects (e.g., microfacet caustics) or scenes with deep recursive interactions may expose small ghosting or blending artifacts due to the fixed-$K$ subspace design. Adaptive $K$ allocation, dynamic scene support, and geometry-aware priors (e.g., surface normals or explicit ray–specular heuristics) are potential avenues for improvement (Yin et al., 2023).
Grid–MLP MS-NeRFs incur higher end-to-end runtime due to MLP overhead and conventional autodiff interpolation, and require careful hyperparameter tuning (e.g., Laplacian losses, gating schedules).
Adaptive MS-NeRF’s performance depends on the quality of guidance density estimation and KD-tree partitioning; very unstructured scenes may still yield suboptimal load balance across subspaces (Wang et al., 2023).

Suggested extensions include dynamically allocating subspaces per ray, integration with deformable geometry, on-demand sparse interpolation for feature grids, and extending to additional spectral decompositions (e.g., spherical harmonics) or temporal hierarchies for complex video input (Yin et al., 2023, Kim et al., 2024).

In summary, Multi-Space NeRFs generalize classical radiance field models with parallel, learnable subspace architectures, substantially improving the fidelity and efficiency of neural rendering in the presence of complex scene phenomena, while remaining compatible with existing NeRF frameworks and incurring minimal additional training or inference cost (Yin et al., 2023, Wang et al., 2023, Kim et al., 2024).