Neural Texture Splatting Advances
- Neural Texture Splatting is a method that augments Gaussian splatting with learned, spatially varying neural textures to capture high-frequency details.
- It employs either explicit per-splat texture atlases or global neural fields with MLP decoders, improving render quality and enabling view- and time-dependent effects in static and dynamic scenes.
- Empirical benchmarks show improvements in PSNR, SSIM, and geometry reconstruction, balancing high-fidelity detail with efficient GPU-based real-time performance.
Neural Texture Splatting (NTS) is a family of methods augmenting Gaussian-based scene representations—particularly 2D and 3D Gaussian Splatting (2DGS/3DGS)—with learned, spatially varying neural texture fields. The core objective is to marry the real-time rendering performance of Gaussian splatting with the high expressive power of per-primitive, high-frequency texture mapping, enabling accurate novel view synthesis, geometry reconstruction, and dynamic 3D/4D scene modeling—especially in scenarios with reflectance, fine appearance details, or diverse input sparsity conditions (Younes et al., 16 Jun 2025, Wang et al., 24 Nov 2025). NTS approaches fall into two main categories: models employing explicit per-splat texture atlases (e.g., TextureSplat), and those using global neural fields (e.g., neural tri-planes) with local neural decoding. Both approaches aim to address fundamental expressivity and efficiency bottlenecks in traditional Gaussian splatting techniques.
1. Foundations: From Gaussian Splatting to NTS
Original Gaussian Splatting (GS), whether 2DGS or 3DGS, represents a scene using thousands of oriented, spatially distributed Gaussian primitives (“splats”), each parameterized by position, scale, orientation, opacity, and a set of appearance descriptors such as spherical harmonic coefficients for color. This compact representation supports real-time rendering via volumetric compositing but is fundamentally limited in its local expressiveness: each primitive encodes only a single, aggregated appearance vector (Younes et al., 16 Jun 2025, Wang et al., 24 Nov 2025, Wu et al., 20 May 2024). This results in blurred surface detail, loss of high-frequency texture, smoothed-out specular highlights, and geometrically inaccurate normal fields—especially pronounced in highly reflective or complex dynamic scenes.
NTS methods were developed to overcome these limitations. They facilitate either explicit or neural parameterization of local appearance (albedo, roughness, normals, metallicity)—permitting sharp specular highlights, detailed view/time-dependent effects, and signal disentanglement across primitives—while retaining the core advantage of fast, parallelizable GPU implementation.
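The volumetric compositing that underlies this real-time rendering can be sketched as a front-to-back accumulation along each camera ray. The following is a minimal NumPy illustration of the standard splatting blend; the function and variable names are illustrative, not from the cited papers:

```python
import numpy as np

def composite_splats(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted splat contributions
    along one camera ray (the core blend in Gaussian splatting rendering).

    colors: (N, 3) per-splat RGB values, sorted near to far
    alphas: (N,)   per-splat opacities after the Gaussian falloff
    """
    out = np.zeros(3)
    transmittance = 1.0            # fraction of light still unoccluded
    for c, a in zip(colors, alphas):
        out += transmittance * a * c
        transmittance *= (1.0 - a)
    return out

# An opaque front splat fully hides the splat behind it:
front_red = composite_splats(np.array([[1., 0., 0.], [0., 1., 0.]]),
                             np.array([1.0, 1.0]))   # -> [1, 0, 0]
```

Because each splat contributes only one aggregated color to this blend, any sub-splat appearance variation is lost, which is exactly the bottleneck NTS targets.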
2. Explicit Per-Splat Texture Mapping: TextureSplat
TextureSplat (Younes et al., 16 Jun 2025) exemplifies explicit per-primitive texture mapping for Gaussian splatting. Here, each Gaussian splat is augmented with its own fixed-resolution planar texture atlas storing spatially varying channels: albedo $a(u,v)$, roughness $r(u,v)$, metallic $m(u,v)$, and tangent-space normal $n(u,v)$. A primitive $i$ is parameterized by:
- $\Sigma_i$: in-plane covariance
- $\mu_i$: centroid position
- $R_i$: orientation
- $\alpha_i$: opacity
- $T_i(u,v)$: per-primitive texture field
A primitive’s attributes at local texture coordinates $(u,v)$ are retrieved by bilinear filtering of its atlas, $\text{attr}_i(u,v) = \operatorname{bilerp}\big(T_i, (u,v)\big)$.
During rasterization, the radiance field is computed via a split-sum physically-based rendering (PBR) approach using deferred shading. Texture atlases for all primitives are packed for efficient hardware filtering, enabling end-to-end differentiable optimization. This structure decouples geometry and appearance, allowing high-frequency variation without a proliferation of splats, and achieves memory efficiency with only a small fixed-resolution atlas per splat (Younes et al., 16 Jun 2025).
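The per-splat bilinear atlas lookup can be sketched as follows, assuming normalized $(u,v)$ coordinates in $[0,1]^2$ and an `(H, W, C)` atlas of appearance channels; the helper name and memory layout are illustrative:

```python
import numpy as np

def bilerp_texture(atlas, u, v):
    """Bilinearly sample a per-splat texture atlas at (u, v) in [0, 1]^2.

    atlas: (H, W, C) spatially varying channels, e.g. albedo, roughness,
           metallic, and tangent-space normal components.
    """
    H, W, _ = atlas.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * atlas[y0, x0] + fx * atlas[y0, x1]
    bot = (1 - fx) * atlas[y1, x0] + fx * atlas[y1, x1]
    return (1 - fy) * top + fy * bot
```

In practice this filtering is performed by the GPU's texture hardware; the differentiable pipeline backpropagates through the same interpolation weights.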
Key performance outcomes (Shiny Blender, Glossy Synthetic, Ref-Real datasets):
- PSNR improvement of ~2 dB
- Substantial reduction in tangent-space normal mean angular error (MAE)
- Rendering at approximately $0.95\times$ the baseline’s speed due to efficient GPU implementation
This explicit mapping achieves real-time, high-quality novel-view synthesis in scenes featuring complex reflectance and small-scale geometric variation.
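As an illustration of the split-sum deferred shading step mentioned above, the sketch below shades one G-buffer sample. Here `prefiltered_env` and `brdf_lut` stand in for the pre-convolved environment map and the pre-integrated BRDF table, and the diffuse irradiance factor is omitted for brevity; all names are assumptions, not the paper's API:

```python
import numpy as np

def split_sum_shade(albedo, roughness, metallic, n_dot_v,
                    prefiltered_env, brdf_lut):
    """Split-sum shading of one deferred G-buffer sample.

    prefiltered_env: callable(roughness) -> RGB from the pre-convolved
                     environment map, looked up along the reflection dir
    brdf_lut:        callable(n_dot_v, roughness) -> (scale, bias) from
                     the pre-integrated BRDF table
    """
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic      # base reflectance
    scale, bias = brdf_lut(n_dot_v, roughness)
    specular = prefiltered_env(roughness) * (f0 * scale + bias)
    diffuse = albedo * (1.0 - metallic) / np.pi           # irradiance omitted
    return diffuse + specular
```

The split-sum factorization is what makes per-texel PBR cheap enough for real-time use: both environment lookups are precomputed, so shading reduces to two table fetches per sample.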
3. Neural Field-Driven Texture Splatting
Neural Texture Splatting in the neural field paradigm (Wang et al., 24 Nov 2025) generalizes and compresses the per-primitive texture concept. Rather than learning explicit textures for each splat, a shared global neural field (hybrid tri-plane plus MLP decoder) predicts local appearance and geometry for every primitive, supporting strong global information exchange and reduced model size.
Formally, each primitive $i$ is a 3D Gaussian parameterized by $\{\mu_i, \Sigma_i, \alpha_i\}$, with covariance factored as $\Sigma_i = R_i S_i S_i^\top R_i^\top$. Three global tri-plane feature tensors are sampled at $\mu_i$, concatenated, and fed, together with the primitive center, view direction, and timestamp (for dynamic scenes), to a compact MLP decoder. This MLP predicts compact factors whose outer products define local RGBA texture grids for each splat.
Appearance at a 3D location $x$ intersecting splat $i$ is evaluated via:
- Local coordinate transformation: $x' = R_i^\top (x - \mu_i)$, projected into the splat’s local texture frame
- Bilinear sampling of the predicted RGBA texture grid on each local plane at $x'$
- Averaging the per-plane outputs and compositing the result into the rendering equation
Because the decoding step can be conditioned on view direction and time, NTS naturally synthesizes view- and time-dependent effects such as specularities or nonrigid motion. The global field ensures multi-splat consistency, parameter regularity, and efficient resource utilization.
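The tri-plane sampling and outer-product texture decoding can be sketched as below. A single random linear layer stands in for the MLP decoder, and nearest-neighbour plane sampling replaces bilinear interpolation for brevity; resolutions, shapes, and names are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def triplane_features(planes, p):
    """Sample the three axis-aligned feature planes at point p in [0,1]^3
    (nearest-neighbour here for brevity) and concatenate the features."""
    (xy, xz, yz), (x, y, z) = planes, p
    R = xy.shape[0]
    ix, iy, iz = (int(c * (R - 1)) for c in (x, y, z))
    return np.concatenate([xy[ix, iy], xz[ix, iz], yz[iy, iz]])

def decode_rgba_texture(feat, w, res=4):
    """Toy stand-in for the MLP decoder: predict two per-axis factor
    matrices whose per-channel outer product forms an RGBA texture grid."""
    h = np.tanh(w @ feat)                         # (2 * res * 4,)
    fu = h[:res * 4].reshape(res, 4)              # factors along u
    fv = h[res * 4:].reshape(res, 4)              # factors along v
    return np.einsum('uc,vc->uvc', fu, fv)        # (res, res, 4) RGBA grid

R, C = 8, 4                                       # plane resolution, channels
planes = [rng.normal(size=(R, R, C)) for _ in range(3)]
w = rng.normal(size=(2 * 4 * 4, 3 * C))           # single linear "decoder"
tex = decode_rgba_texture(triplane_features(planes, (0.2, 0.5, 0.9)), w)
# tex is the (4, 4, 4) local RGBA texture grid for this splat
```

The outer-product factorization is the key compression step: each splat's texture is described by two small factor matrices rather than a full grid, and all splats share the tri-plane features and decoder weights.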
Reported quantitative improvements over baselines include:
- Static scenes: PSNR improvement of 0.5–1 dB, SSIM gain of 0.0066 on Blender (4–10 views)
- Dynamic scenes: PSNR gain of 1.38 dB, SSIM increase of 0.0077 on Owlii
- Surface reconstruction: Chamfer distance reduced from 0.74 to 0.67
- Parameter footprint: 15 MB (NTS) vs 50 MB (explicit) (Wang et al., 24 Nov 2025).
A notable trade-off is the additional memory and compute overhead at training and inference time due to MLP decoding and tri-plane evaluation, with rendering speed decreasing from 430 FPS (vanilla) to 150 FPS (SplatFields+NTS).
4. Related Architectures and Applications
Extending the splatting paradigm, NTS principles appear in several related domains:
- Gaussian Head & Shoulders (Wu et al., 20 May 2024): Combines 3D Gaussian splatting (for heads) with high-res neural textures (for upper body garments). Here, a neural warping MLP aligns a 2D neural texture with 3D anchor Gaussians for robust, sharp detail, fast reenactment, and high frame rates (130 FPS, no-MLP test variant).
- UV Volumes (Chen et al., 2022): Introduces Neural Texture Stacks (NTS, Editor’s term) for free-view human performance rendering. High-frequency appearance is encoded in 2D per-part neural stacks, retrieved via UV coordinates inferred by a 3D “UV volume.” This division enables photorealistic, editable rendering at real-time rates, outperforming fully volumetric baselines in both speed and quality.
Both approaches demonstrate the effectiveness of neural texture decoupling for appearance-rich, editable, and animatable scenes.
5. Optimization and Training Strategies
NTS optimization entails joint end-to-end learning of all primitive parameters and texture fields (explicit or neural):
- TextureSplat (Younes et al., 16 Jun 2025): Optimizes primitive geometry, G-buffer attributes, per-splat textures, per-splat spherical harmonics for indirect color, and environment maps. The loss function combines $L_1$, D-SSIM, normal-consistency, and edge-aware smoothness terms. Training proceeds in two stages: an initialization stage with fixed attributes, followed by learnable textures with frozen geometry for stability. Texture atlases are packed via grid charts and indirection buffers to allow efficient gradient routing.
- Neural NTS (Wang et al., 24 Nov 2025): Incorporates backbone auxiliary losses, photometric reconstruction (MSE), D-SSIM, tri-plane sparsity (an $\ell_1$ penalty on texture plane entries), Chamfer distance for geometry, and optional temporal consistency for dynamics.
These pipelines are built for differentiable rendering and leverage GPU acceleration (hardware bilinear interpolation, deferred shading, multi-target G-buffers).
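A schematic combination of the photometric and geometric loss terms listed above; the weights and the D-SSIM input are illustrative placeholders, and the edge-aware smoothness term is omitted:

```python
import numpy as np

def total_loss(pred, gt, d_ssim, normals, normals_ref,
               lam_ssim=0.2, lam_normal=0.05):
    """Weighted sum of photometric L1, D-SSIM, and normal-consistency
    terms; lam_* weights are illustrative, not values from the papers.

    pred, gt:             (H, W, 3) rendered and reference images
    d_ssim:               precomputed D-SSIM scalar for this image pair
    normals, normals_ref: (N, 3) unit rendered / reference normals
    """
    l1 = np.abs(pred - gt).mean()
    normal_term = (1.0 - (normals * normals_ref).sum(-1)).mean()  # 1 - cos
    return (1 - lam_ssim) * l1 + lam_ssim * d_ssim + lam_normal * normal_term
```

A perfect render with matching normals drives every term, and hence the total, to zero; in training, all terms backpropagate jointly through the differentiable rasterizer into textures and primitive parameters.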
6. Limitations and Research Directions
Despite substantial gains in expressiveness and efficiency, several challenges remain:
- Efficiency: Neural field NTS increases memory (roughly 1 GB additional VRAM during training) and computation (about $2\times$ longer training), and reduces real-time frame rates compared with parameter-only splatting (Wang et al., 24 Nov 2025).
- Boundary generalization: For large unbounded scenes, global neural fields (e.g., tri-planes) may have insufficient spatial support, dampening improvements on benchmarks like MipNeRF360.
- Overfitting: Explicit per-splat textures may overfit to dense novel view settings and create local seams; neural sharing mitigates but does not eliminate this.
- Future approaches: Hardware-accelerated ray–Gaussian intersections, more efficient MLPs (hash-grid, SIRENs), hierarchical neural field parameterization, and improved temporal priors for dynamic scenes are promising directions (Wang et al., 24 Nov 2025).
These aspects define the ongoing landscape for practical, scalable NTS deployment.
7. Summary of Distinctive Features and Empirical Benchmarks
NTS’s core advantages over classical splatting and NeRF-style methods are:
| Method | High-Frequency Detail | Temporal/View Conditioning | Model Size | Real-Time Perf. | Cross-Primitive Consistency |
|---|---|---|---|---|---|
| Vanilla 2D/3DGS | No | Spherical harmonics only | Small | Yes | N/A |
| Explicit TextureSplat | Yes | No | Large | Yes | No |
| Neural NTS | Yes | Yes | Modest | Yes (slower) | Yes |
On established benchmarks, NTS methods achieve state-of-the-art photo-realism, sharper specular reflections, reduced normal error, and increased temporal coherence. Examples include gains in PSNR ($0.5$–$2$ dB), SSIM ($0.003$–$0.007$), and LPIPS reductions over competitive baselines (Younes et al., 16 Jun 2025, Wang et al., 24 Nov 2025, Chen et al., 2022). NTS enables high-fidelity, animatable avatars, robust scene completion from sparse or dense captures, and generalizes to reflective, dynamic, or editable 3D content while maintaining practical rendering rates.