- The paper introduces Neural Texture Splatting, integrating local RGBA texture fields with a global neural field to enhance the expressiveness of 3D Gaussian Splatting.
- It demonstrates improved view synthesis, geometry modeling, and dynamic reconstruction with state-of-the-art metrics, including up to a +3 dB PSNR gain on dynamic scenes.
- It mitigates overfitting and redundancy through a shared global neural field, using Canonical Polyadic decomposition and L1 regularization to compress local textures efficiently while preserving global consistency.
Neural Texture Splatting: Enhancing 3D Gaussian Splatting for Robust View Synthesis, Geometry, and Dynamic Reconstruction
Introduction and Motivation
3D Gaussian Splatting (3DGS) has become a dominant paradigm for explicit point-based radiance field rendering, delivering high-quality, real-time visualization and enabling extensions to surface modeling, sparse-input reconstruction, and 4D dynamics. However, the per-primitive expressiveness of vanilla 3DGS is limited: each splat is a smooth Gaussian kernel carrying only a single (view-dependent) color. Augmenting local capacity, most notably through per-splat textures, has recently improved scene fidelity for novel-view synthesis, but it introduces significant challenges: the textures are view- and time-independent, redundant across primitives, and prone to overfitting, especially in dense training regimes and at reduced primitive counts.
This work introduces Neural Texture Splatting (NTS), which unifies and substantially extends textured 3DGS methods by equipping each splat with a local RGBA texture field whose structure is regularized by a global neural field. The architecture enables compact, shared modeling of per-splat appearance and geometric variation, conditions explicitly on view direction and time, and generalizes well under both sparse and dense input conditions.
Methodology
NTS associates each 3D Gaussian primitive with a local RGBA tri-plane texture field that captures high-frequency local appearance and geometry. The rendering pipeline first computes the ray-primitive intersection (using techniques such as those in [Yu2024GOF]), then queries the corresponding local texture field to modulate the splat's color and opacity. The rendering integral is extended to include these texture contributions, improving local expressiveness.
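The following is a minimal sketch of the local texture lookup, assuming a PyTorch setting; the function name `query_local_triplane`, the tensor shapes, the normalization of intersection points to the splat's local frame, and the sum-reduction over planes are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def query_local_triplane(planes: torch.Tensor, p_local: torch.Tensor) -> torch.Tensor:
    """Query a per-splat RGBA tri-plane texture at local intersection points.

    planes:  (3, C, R, R)  -- feature planes for the XY, XZ, YZ projections
    p_local: (N, 3)        -- intersection points in the splat's local frame,
                              assumed normalized to [-1, 1]
    returns: (N, C)        -- per-point features (e.g. C=4 for RGBA)
    """
    # Project each 3D point onto the three axis-aligned planes.
    coords = torch.stack([p_local[:, [0, 1]],   # XY
                          p_local[:, [0, 2]],   # XZ
                          p_local[:, [1, 2]]])  # YZ  -> (3, N, 2)
    # grid_sample expects (B, C, H, W) input and (B, H_out, W_out, 2) grids.
    grid = coords.unsqueeze(2)                                # (3, N, 1, 2)
    feats = F.grid_sample(planes, grid, align_corners=True)   # (3, C, N, 1)
    # Sum the three plane contributions, the standard tri-plane reduction.
    return feats.squeeze(-1).sum(dim=0).transpose(0, 1)       # (N, C)
```

The returned RGBA values would then modulate the splat's base color and opacity inside the compositing integral, as in the equation below.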
Figure 1: Overview of Neural Texture Splatting, showing integration of local textured tri-planes with global neural tri-plane encoding for each splat and the resultant compositional rendering pipeline.
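Concretely, the extended rendering integral can be sketched as the standard 3DGS front-to-back blend with texture-modulated terms. The multiplicative form of the modulation is an illustrative assumption; the paper specifies only that the local textures modulate color and opacity.

```latex
% Standard 3DGS front-to-back compositing with texture-modulated terms.
% The multiplicative modulation below is an illustrative assumption.
C(\mathbf{r}) = \sum_{i} \hat{c}_i\, \hat{\alpha}_i \prod_{j<i} \left(1 - \hat{\alpha}_j\right),
\qquad
\hat{c}_i = c_i \odot f_i^{\mathrm{rgb}}(\mathbf{x}_i),
\qquad
\hat{\alpha}_i = \alpha_i \cdot f_i^{\alpha}(\mathbf{x}_i)
```

Here $\mathbf{x}_i$ is the ray-primitive intersection point and $f_i$ is the splat's local RGBA texture field.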
Critically, a global neural field mitigates the overfitting and spatial-inconsistency risks inherent to per-primitive textured splatting. The field is a hybrid of global tri-plane features and a shallow neural decoder: each splat queries it at its center position, with additional conditioning on view direction and timestep, and receives the parameters of its local RGBA tri-plane texture. For efficiency, plane decoding uses a Canonical Polyadic (CP) decomposition, and regularizers including L1 sparsity are applied; see the sketch below.
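As a hedged illustration of how a shared decoder can emit CP-factorized local planes, consider the sketch below. The class name `GlobalTextureField`, layer sizes, rank, conditioning embeddings, and factor layout are assumptions for exposition, not the paper's verbatim architecture.

```python
import torch
import torch.nn as nn

class GlobalTextureField(nn.Module):
    """Sketch: a shared global field that emits per-splat local tri-plane
    textures as rank-r CP factors. Sizes and conditioning are illustrative."""

    def __init__(self, feat_dim=32, rank=4, res=8, channels=4, t_dim=8):
        super().__init__()
        self.rank, self.res, self.channels = rank, res, channels
        # Shallow decoder: global feature + view dir + time -> CP factors.
        # Per plane, rank * (channels + res + res) numbers; three planes total.
        out_dim = 3 * rank * (channels + 2 * res)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3 + t_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, g_feat, view_dir, t_emb):
        """g_feat: (N, feat_dim) global tri-plane features at splat centers;
        view_dir: (N, 3); t_emb: (N, t_dim). Returns planes (N, 3, C, R, R)."""
        N, (r, R, C) = g_feat.shape[0], (self.rank, self.res, self.channels)
        f = self.mlp(torch.cat([g_feat, view_dir, t_emb], dim=-1))
        u, v, w = f.split([3 * r * C, 3 * r * R, 3 * r * R], dim=-1)
        u = u.view(N, 3, r, C, 1, 1)   # channel factors
        v = v.view(N, 3, r, 1, R, 1)   # row factors
        w = w.view(N, 3, r, 1, 1, R)   # column factors
        # Rank-r CP reconstruction: sum of outer products per plane.
        return (u * v * w).sum(dim=2)  # (N, 3, C, R, R)
```

An L1 penalty on the predicted factors (e.g., `f.abs().mean()`) is one plausible placement for the sparsity regularization described above.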
Empirical Results
Sparse-View and Dynamic Reconstruction
NTS is evaluated against state-of-the-art methods on both static and dynamic sparse-view benchmarks (Blender, Owlii), outperforming the previous best (SplatFields) in PSNR, SSIM, and LPIPS. Quantitative gains of up to roughly 3 dB PSNR on dynamic 4-view Owlii scenes highlight the role of neural conditioning in capturing time-dependent appearance, which traditional per-primitive texture models cannot achieve.
Figure 2: Qualitative comparison on MipNeRF360; the method preserves high-frequency, view-dependent effects and fine structures missed by baselines.
Extensive ablations demonstrate NTS's superiority over naive textured splatting and alternative neural decoders (e.g., direct RGB or spherical harmonics prediction), corroborating the architectural choices. These gains persist across input densities and scene types.
Dense-View Synthesis and Surface Reconstruction
NTS is integrated with GOF and 3DGS-MCMC for dense-view scene synthesis, yielding measurable improvements in PSNR (up to +0.65 dB over GOF on Blender), SSIM, and mesh quality. Surface extraction on DTU achieves higher accuracy and lower Chamfer distance than the original GOF, confirming that NTS's high-frequency modeling does not compromise geometric regularity.
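For reference, the symmetric Chamfer distance between a reconstructed point set $P$ and ground truth $G$ is commonly computed as below; DTU's official protocol additionally applies distance thresholds, so this is the generic form rather than the benchmark's exact evaluation script.

```latex
\mathrm{CD}(P, G) = \frac{1}{|P|} \sum_{p \in P} \min_{g \in G} \lVert p - g \rVert_2
                  + \frac{1}{|G|} \sum_{g \in G} \min_{p \in P} \lVert g - p \rVert_2
```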
Figure 3: Improved surface reconstruction on DTU: NTS-driven models generate smoother, more complete meshes across scenes.
Computational and Storage Analysis
NTS maintains a reasonable model and storage footprint by using the global neural prior to compress local textures. Training and rendering incur additional computational cost, primarily from network evaluation and ray-Gaussian intersection tests, but this is offset by substantial gains in reconstruction quality. Compared with other textured-splatting approaches, NTS achieves higher PSNR at lower storage cost.
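A back-of-the-envelope comparison illustrates why CP factorization keeps per-splat storage small; the resolution, channel count, and rank below are illustrative assumptions, not the paper's configuration.

```python
# Per-splat storage: dense local tri-plane vs. rank-r CP factorization.
# R, C, r are illustrative assumptions, not the paper's configuration.
R, C, r = 8, 4, 4                # plane resolution, RGBA channels, CP rank

dense_params = 3 * C * R * R     # three full C x R x R feature planes
cp_params = 3 * r * (C + 2 * R)  # three sets of rank-r CP factors

print(dense_params, cp_params)   # 768 vs. 240 -> ~3.2x fewer parameters
```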
Implications and Future Directions
From a practical perspective, NTS offers a plug-and-play upgrade for existing 3DGS-based pipelines across view synthesis, geometry modeling, and dynamic reconstruction, especially where input sparsity and view/temporal coherence are critical. Its neural texture encoding enables expressive, consistent modeling of view- and time-dependent phenomena, reducing reliance on dense sampling and suppressing artifacts such as "floaters".
On the theoretical front, NTS establishes a tractable route for reconciling local expressiveness with global consistency in point-based graphical models. It suggests that shared global neural fields can regularize local appearance priors sufficiently to generalize across diverse tasks and input regimes.
Moving forward, research may focus on further optimizing computational bottlenecks via more advanced ray-Gaussian intersection algorithms, lightweight neural architectures, or hierarchical encoding schemes. Extension to large-scale outdoor scenes may necessitate augmenting or redesigning the global encoding to capture more complex, longer-range spatial regularities.
Conclusion
Neural Texture Splatting represents a significant advance in explicit radiance field modeling, addressing key limitations of prior splatting frameworks: constrained primitive-level expressiveness, overfitting, and the lack of spatiotemporal modeling. Through local RGBA tri-plane textures conditioned by a global neural field, NTS consistently achieves state-of-the-art results in sparse- and dense-view synthesis, geometry, and 4D reconstruction. Its modular, efficient design is likely to inspire further work on expressive, generalizable 3D and 4D scene representations.