MatSpray: 3D Relightable Asset Fusion
- MatSpray is a framework that converts casual multi-view photographic data into relightable 3D scenes by fusing 2D diffusion-based PBR predictions with 3D Gaussian splatting reconstruction.
- It employs both Gaussian ray-tracing projection and image-based material optimization to transfer spatially varying material properties onto a consistent 3D asset.
- Empirical results show consistent gains in PSNR, SSIM, and LPIPS over prior baselines, enabling rapid, high-fidelity asset creation for gaming, film, and AR applications.
MatSpray is a framework for end-to-end conversion of casual multi-view photographic data into fully relightable 3D scene representations with accurate spatially varying material (SVBRDF) parameters. It fuses single-view, diffusion-based world knowledge about Physically Based Rendering (PBR) properties with an efficient 3D Gaussian splatting geometry, addressing the longstanding challenge of lifting high-fidelity 2D material predictions onto consistent, editable 3D assets. The design enables rapid and robust asset creation for content production in gaming and film, achieving improved relightability and perceived realism relative to prior neural and inverse-rendering pipelines (Langsteiner et al., 20 Dec 2025).
1. Background and Motivation
In 3D content creation, recovering both geometry and spatially varying material properties from unconstrained photographic data is crucial for physically plausible relighting. Traditional neural rendering pipelines frequently conflate lighting and reflectance—baking illumination into surface textures and thus failing under novel lighting conditions. Classical inverse-rendering approaches typically enforce strict capture constraints to resolve reflectance and illumination. In contrast, 2D diffusion models trained on image or video datasets effectively predict PBR properties such as base color (albedo), roughness, and metallic maps, but exhibit inconsistencies across views and lack direct geometric correlation.
MatSpray is motivated by the need to bridge these gaps. By combining the strong 2D priors of diffusion-based predictors with a 3D Gaussian splatting backbone, MatSpray produces relightable assets that are both visually accurate and efficient to compute. Key contributions include world material fusion, a novel Neural Merger, and a substantial computational speedup over prior 3D inverse-rendering frameworks.
2. Geometry and Material Acquisition Pipeline
2.1 3D Gaussian Splatting Reconstruction
MatSpray reconstructs 3D scene geometry from multi-view input using a radiance-based Gaussian splatting engine (R3DGS). The scene is represented as a set of 3D Gaussians, each defined by a spatial mean and covariance, an opacity parameter (interpreted as density), and a view-dependent radiance. Rendering is performed by alpha compositing splats front-to-back along camera rays. For subsequent material transfer, the pipeline adopts the 3D Gaussian ray-tracing formulation of Moenne-Loccoz et al., enabling principled association of material properties with translucent, non-volumetric Gaussian primitives.
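As a rough, self-contained sketch (not the authors' kernel), front-to-back compositing of the depth-sorted splats intersected by a single ray can be written as follows; `alphas` and `colors` are assumed to be the per-splat opacities and view-dependent radiances already evaluated for that ray:

```python
import torch

def composite_front_to_back(alphas: torch.Tensor, colors: torch.Tensor) -> torch.Tensor:
    """Accumulate radiance along one ray, front to back.

    alphas: (N,) opacities of the N depth-sorted splats hit by the ray.
    colors: (N, 3) view-dependent radiance of each splat.
    """
    transmittance = 1.0                 # fraction of light not yet absorbed
    out = torch.zeros(3)
    for alpha, color in zip(alphas, colors):
        out = out + transmittance * alpha * color
        transmittance = transmittance * (1.0 - alpha)
        if transmittance < 1e-4:        # early termination, common in splatting renderers
            break
    return out
```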
2.2 Diffusion-Based Per-View PBR Prediction
A pretrained 2D diffusion material estimator, such as DiffusionRenderer, is applied separately to each input image to generate per-view PBR maps: base color, roughness, and metallicity. The diffusion model is not trained within MatSpray; rather, it is leveraged as a black-box predictor trained with the standard denoising objective

$$\mathcal{L}_{\text{diff}} = \mathbb{E}_{\mathbf{x}_{0}, \boldsymbol{\epsilon}, t}\!\left[\left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\mathbf{x}_{t}, t, \mathbf{c}) \right\|_{2}^{2}\right],$$

conditioned on the input photograph $\mathbf{c}$, producing high-fidelity, but view-dependent, SVBRDF maps that often contain baked illumination.
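As a hedged illustration of this stage, the estimator can be treated as a frozen function applied view by view; `MaterialDiffusionModel.predict` and the returned dictionary keys are hypothetical stand-ins, not the DiffusionRenderer API:

```python
from pathlib import Path
import imageio.v3 as iio

def predict_per_view_pbr(model, image_dir: str) -> dict:
    """Run a frozen 2D material estimator on every input view independently."""
    maps = {}
    for path in sorted(Path(image_dir).glob("*.png")):
        image = iio.imread(path)                 # H x W x 3 input photograph
        # `predict` is a hypothetical method returning per-pixel PBR maps,
        # e.g. {"base_color": ..., "roughness": ..., "metallic": ...}.
        maps[path.stem] = model.predict(image)
    return maps
```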
3. 2D-to-3D Material Fusion
MatSpray introduces two complementary approaches to lift 2D per-view PBR maps onto the 3D Gaussian scene:
3.1 Gaussian Ray-Tracing Projection
For each camera view and per-pixel ray $\mathbf{r}(t) = \mathbf{o} + t\,\mathbf{d}$, a Gaussian's point of maximum response is analytically computed as

$$t^{\ast} = \frac{(\boldsymbol{\mu} - \mathbf{o})^{\top} \Sigma^{-1} \mathbf{d}}{\mathbf{d}^{\top} \Sigma^{-1} \mathbf{d}},$$

with corresponding opacity

$$\alpha = \sigma \exp\!\left(-\tfrac{1}{2}\,\bigl(\mathbf{r}(t^{\ast}) - \boldsymbol{\mu}\bigr)^{\top} \Sigma^{-1} \bigl(\mathbf{r}(t^{\ast}) - \boldsymbol{\mu}\bigr)\right),$$

where $\boldsymbol{\mu}$ and $\Sigma$ are the Gaussian's mean and covariance and $\sigma$ is its opacity (fall-off) parameter. Whenever a pixel ray "hits" a Gaussian, the material value at that pixel is collected; for each Gaussian and each view, the per-Gaussian material estimate is set to the median value across the Gaussian's ray footprint.
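A minimal sketch of this projection step, assuming dense per-Gaussian inverse covariances and single (un-batched) rays; a production kernel would operate on culled, batched ray-Gaussian pairs:

```python
import torch

def max_response_and_opacity(o, d, mu, sigma_inv, opacity):
    """Analytic peak response of one 3D Gaussian along a ray r(t) = o + t*d.

    o, d      : (3,) ray origin and direction.
    mu        : (3,) Gaussian mean.
    sigma_inv : (3, 3) inverse covariance of the Gaussian.
    opacity   : scalar opacity (fall-off) parameter of the Gaussian.
    Returns (t_star, alpha): ray parameter of the peak and its opacity.
    """
    t_star = ((mu - o) @ sigma_inv @ d) / (d @ sigma_inv @ d)
    diff = (o + t_star * d) - mu
    alpha = opacity * torch.exp(-0.5 * diff @ sigma_inv @ diff)
    return t_star, alpha

def median_material(hit_values: torch.Tensor) -> torch.Tensor:
    """Median of the material values of all pixels whose rays hit one Gaussian.

    hit_values: (K, C) material values collected over the Gaussian's ray footprint.
    """
    return hit_values.median(dim=0).values       # robust to outlier pixels
```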
3.2 Image-Based Material Optimization
Alternatively, each Gaussian's material channels (e.g., base color) are parameterized via a small MLP or per-Gaussian learnable variables. These are optimized by rendering 3D PBR maps from the current estimate and minimizing a direct image loss against the diffusion-predicted 2D maps,

$$\mathcal{L}_{\text{img}} = \sum_{v} \left\| \hat{M}_{v} - M_{v} \right\|_{1},$$

where $\hat{M}_{v}$ denotes the material maps rendered from the Gaussians for view $v$ and $M_{v}$ the corresponding diffusion predictions.
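A minimal sketch of this variant, assuming a differentiable rasterizer `render_material(gaussians, values, view)` exists (a hypothetical name standing in for the splatting renderer) and that materials are stored as per-Gaussian learnable tensors:

```python
import torch

def optimize_materials(gaussians, views, diffusion_maps, channels=3, steps=1000, lr=1e-2):
    """Fit per-Gaussian material values to the diffusion-predicted 2D maps."""
    params = torch.zeros(gaussians.num_points, channels, requires_grad=True)  # assumed attribute
    opt = torch.optim.Adam([params], lr=lr)
    for step in range(steps):
        view = views[step % len(views)]
        rendered = render_material(gaussians, params.sigmoid(), view)  # H x W x C material map
        target = diffusion_maps[view.name]      # hypothetical per-view lookup key
        loss = (rendered - target).abs().mean()  # L1 against the 2D prior
        opt.zero_grad()
        loss.backward()
        opt.step()
    return params.sigmoid().detach()            # material values constrained to [0, 1]
```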
Both strategies produce extensive per-view, per-Gaussian material estimates, which are typically inconsistent due to inherent limitations of single-view estimation and view-dependent illumination artifacts.
4. Neural Merger: Multi-View Material Reconciliation
To resolve view inconsistencies and suppress baked-in lighting, MatSpray employs a lightweight, softmax-based MLP ("Neural Merger") for each material channel. For each Gaussian, the MLP takes as input a positional encoding of its 3D location together with the stack of its per-view material estimates $\{m_{v}\}_{v=1}^{V}$, and outputs unnormalized view-wise weights $s_{v}$. After softmax normalization across views,

$$w_{v} = \frac{\exp(s_{v})}{\sum_{v'=1}^{V} \exp(s_{v'})},$$

the final, single material value for each Gaussian is a convex combination of the view-specific predictions:

$$m = \sum_{v=1}^{V} w_{v}\, m_{v}.$$
Operating strictly within the simplex of the input predictions, the Merger guarantees that the fused SVBRDF cannot "hallucinate" new values or absorb illumination into the output, directly constraining it to the materials observed in the individual views. Empirical ablation demonstrates that omitting the softmax permits the MLP to encode view-specific lighting, resulting in baked shadows and degraded relightability.
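A compact sketch of such a merger for a single material channel, with illustrative layer sizes (not the paper's):

```python
import torch
import torch.nn as nn

class NeuralMerger(nn.Module):
    """Softmax-weighted view merger for one material channel (illustrative sketch)."""

    def __init__(self, num_views: int, pe_dim: int, channels: int, hidden: int = 64):
        super().__init__()
        in_dim = pe_dim + num_views * channels
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_views),        # one unnormalized score per view
        )

    def forward(self, pos_enc: torch.Tensor, per_view: torch.Tensor) -> torch.Tensor:
        # pos_enc:  (G, pe_dim)  positional encoding of Gaussian centers
        # per_view: (G, V, C)    per-view material estimates for each Gaussian
        scores = self.mlp(torch.cat([pos_enc, per_view.flatten(1)], dim=-1))
        weights = scores.softmax(dim=-1)          # convex weights over the V views
        # The weighted sum stays inside the simplex of the input predictions,
        # so the merger cannot produce values outside what some view observed.
        return torch.einsum("gv,gvc->gc", weights, per_view)
```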
5. Joint Training and Rendering Losses
Material maps produced by the Neural Merger are rasterized to per-view images for joint supervision. Training alternates between:
- Minimizing the image-based SVBRDF loss $\mathcal{L}_{\text{img}}$ to anchor the fused materials to the 2D priors, and
- Minimizing a PBR-rendering loss to match the final rendered appearance to the ground-truth images,

$$\mathcal{L}_{\text{render}} = (1 - \lambda)\,\mathcal{L}_{1} + \lambda\,\mathcal{L}_{\text{SSIM}},$$

where $\lambda$ balances the L1 fidelity and structural similarity (SSIM) terms.
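A minimal sketch of this blended rendering loss, using $1 - \text{SSIM}$ as the structural term and assuming the third-party `pytorch_msssim` package; the weight `lam` is illustrative, not the paper's value:

```python
import torch
from pytorch_msssim import ssim   # third-party SSIM implementation (assumed dependency)

def render_loss(pred: torch.Tensor, target: torch.Tensor, lam: float = 0.2) -> torch.Tensor:
    """(1 - lam) * L1 + lam * (1 - SSIM) between rendered and ground-truth images.

    pred, target: (1, 3, H, W) tensors with values in [0, 1].
    """
    l1 = (pred - target).abs().mean()
    l_ssim = 1.0 - ssim(pred, target, data_range=1.0)
    return (1.0 - lam) * l1 + lam * l_ssim
```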
During this process, both the Neural Merger and an HDR environment map are refined. This joint refinement progressively sharpens geometry-material-light interactions and improves the disentanglement of reflectance from scene illumination.
6. Performance Metrics and Qualitative Results
MatSpray's quantitative benchmarks on the Navi synthetic dataset (17 objects, with 100 training and 200 test views each) and on real-world captures show consistent improvements over both an R3DGS baseline extended to support metallic maps and the IRGS baseline in established relighting and material metrics: PSNR, SSIM, and LPIPS.
| Task | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|
| Relighting (Ours) | 27.28 | 0.897 | 0.080 |
| Relighting (R3DGS) | 25.48 | 0.875 | 0.094 |
| Relighting (IRGS) | 24.41 | 0.850 | 0.166 |
| BaseColor (Ours) | 21.34 | 0.873 | 0.125 |
| BaseColor (R3DGS) | 18.36 | 0.832 | 0.158 |
| BaseColor (IRGS) | 19.20 | 0.750 | 0.139 |
| Roughness (Ours) | 15.33 | 0.820 | 0.181 |
| Roughness (R3DGS) | 14.47 | 0.763 | 0.216 |
| Roughness (IRGS) | 16.18 | 0.744 | 0.192 |
| Metallic (Ours) | ∞*/27.20 | 0.893 | 0.106 |
| Metallic (R3DGS) | 10.07 | 0.693 | 0.261 |
(*Non-metallic objects yield infinite PSNR.)
Qualitative inspection reveals base-color maps with minimal retained shadows or highlights, material maps exhibiting accurate spatial patterns, and relit renderings that faithfully reproduce specular reflections.
Ablations show that both the Neural Merger and joint 3D-image supervision are critical: direct averaging of per-view estimates, or dropping 3D consistency, degrades the metrics and visibly bakes lighting into the materials. For example:
- Averaging per-view: PSNR = 25.56, SSIM = 0.866, LPIPS = 0.122
- 2D image loss only: PSNR = 24.81, SSIM = 0.889, LPIPS = 0.0792
- Full MatSpray: PSNR = 29.16, SSIM = 0.9105, LPIPS = 0.0626
7. Computational Efficiency, Flexibility, and Outlook
By reusing off-the-shelf 2D diffusion models and the efficient R3DGS splatting kernel, MatSpray reconstructs relightable 3D materials approximately 3.5× faster than prior Gaussian-based inverse renderers such as IRGS. The "world material fusion" plug-in design accommodates arbitrary diffusion-based SVBRDF methods without retraining the underlying predictor.
The combination of neural multi-view fusion, projection-based transfer, and geometry-aware supervision establishes MatSpray as a flexible and accurate tool for 3D relightable asset creation from unstructured photo collections. This suggests broad applicability in film, visual effects, gaming, and augmented reality pipelines, particularly where rapid, high-fidelity capture-to-asset workflows are desired and precise relighting is required (Langsteiner et al., 20 Dec 2025).