
Feed-Forward Gaussian Splats

Updated 31 May 2026
  • Feed-forward Gaussian splats are a real-time, optimization-free method that regresses 3D Gaussian primitives—encoding geometry and view-dependent appearance—from sparse input images.
  • They leverage modern encoder-decoder and transformer architectures to predict parameters like depth, covariance, and spherical harmonic color coefficients in a single forward pass.
  • This approach enhances scene fidelity, scalability, and compression, enabling efficient high-resolution rendering and robust generalization across diverse imaging domains.

Feed-forward Gaussian splats are a class of real-time, optimization-free scene representations in which the parameters of 3D Gaussian primitives, which encode geometry and appearance, are regressed in a single forward pass from sparse or unconstrained input images. This paradigm replaces iterative per-scene fitting or test-time optimization with large-scale, end-to-end learned networks that generalize to new scenes and, in many cases, to novel camera and illumination domains. Feed-forward Gaussian splatting has improved rapidly in scene fidelity, reconstruction efficiency, and robustness, and now underpins a broad array of research in learning-based 3D scene representation, novel view synthesis, and semantic lifting.

1. Mathematical Model and Rendering of 3D Gaussian Splats

A 3D Gaussian splat is parameterized by its mean $\mu \in \mathbb{R}^3$, covariance matrix $\Sigma \in \mathbb{R}^{3\times3}$, opacity $\alpha$, and view-dependent color coefficients $c$ (frequently spherical harmonics), and defines a volumetric density field:

$$g(\mathbf{x}) = \alpha \, \exp\left(-\frac{1}{2}(\mathbf{x} - \mu)^\top \Sigma^{-1} (\mathbf{x} - \mu)\right)$$

Rendering a set of $N$ such splats, $\mathcal{G} = \{g_i\}_{i=1}^N$, involves projecting each Gaussian into the image plane (via a pinhole or perspective camera model, including Jacobian-corrected projection of $\Sigma$), evaluating its 2D footprint at each pixel, and compositing the depth-sorted splats by alpha blending:

$$C(\mathbf{x}) = \sum_{i=1}^N T_i(\mathbf{x}) \, \alpha_i \, G^{2D}_i(\mathbf{x}) \, c_i, \quad \text{with} \quad T_i(\mathbf{x}) = \prod_{j<i} \left(1 - \alpha_j G^{2D}_j(\mathbf{x})\right)$$

where $G^{2D}_i(\mathbf{x})$ is the projected 2D Gaussian, $T_i(\mathbf{x})$ is the transmittance accumulated in front of splat $i$, and $c_i$ is evaluated in the appropriate spherical harmonic basis for the current viewing direction (Jiang et al., 29 May 2025, Suh et al., 31 Mar 2026, Fujimura et al., 23 Apr 2026).
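
The compositing equation above can be sketched for a single pixel as follows. This is a minimal pure-Python illustration, assuming the Gaussians have already been projected to 2D and their colors $c_i$ evaluated for the current view; `Splat2D`, `footprint`, and `composite` are hypothetical names, and the early-termination threshold is an assumed detail of the kind used by tile-based rasterizers, not something specified by the cited papers.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Splat2D:
    """A Gaussian already projected to the image plane (hypothetical container)."""
    mean: tuple[float, float]          # projected 2D mean (pixels)
    cov: tuple[float, float, float]    # (a, b, c) of the 2x2 covariance [[a, b], [b, c]]
    opacity: float                     # alpha_i in [0, 1]
    color: tuple[float, float, float]  # c_i, already evaluated for this view
    depth: float                       # camera-space depth, used for sorting


def footprint(s: Splat2D, px: float, py: float) -> float:
    """Evaluate the unnormalized 2D Gaussian G_i^{2D} at pixel (px, py)."""
    a, b, c = s.cov
    det = a * c - b * b
    dx, dy = px - s.mean[0], py - s.mean[1]
    # Quadratic form d^T Sigma^{-1} d via the closed-form 2x2 inverse.
    q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q)


def composite(splats, px, py, t_min=1e-4):
    """Depth-sorted, front-to-back alpha blending for one pixel."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: product of (1 - alpha_j G_j) over closer splats
    for s in sorted(splats, key=lambda sp: sp.depth):
        w = s.opacity * footprint(s, px, py)
        for k in range(3):
            color[k] += transmittance * w * s.color[k]
        transmittance *= 1.0 - w
        if transmittance < t_min:  # stop once the pixel is effectively opaque
            break
    return tuple(color), transmittance
```

Sorting front to back lets the loop stop early once little light can pass, which is why practical rasterizers prefer this order over back-to-front blending.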

2. Feed-Forward Network Architectures

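Feed-forward splatting frameworks eliminate test-time optimization by training networks, typically encoder-decoder or transformer architectures, to directly regress all necessary splat parameters (e.g., depth, covariance, opacity, and spherical harmonic color coefficients) from input images in a single forward pass.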
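
As an illustration of what such a regression head outputs, the sketch below decodes one pixel's raw predictions into a mean, covariance, and opacity. It assumes a pixel-aligned design, a pinhole camera at the identity pose, and the common 3DGS parameterization (log-scales, a quaternion rotation, a sigmoid opacity); the dictionary layout and function names are hypothetical and are not taken from any of the cited systems. Color coefficients are omitted for brevity.

```python
import math


def quat_to_rotmat(q):
    """Quaternion (w, x, y, z), normalized here, -> 3x3 rotation matrix."""
    n = math.sqrt(sum(v * v for v in q))
    w, x, y, z = (v / n for v in q)  # raw network output need not be unit length
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def decode_pixel(raw, u, v, fx, fy, cx, cy):
    """Map raw per-pixel head outputs to splat parameters (hypothetical layout).

    raw = {"depth": float, "log_scale": [3], "quat": [4], "opacity_logit": float}
    """
    d = raw["depth"]
    # Unproject pixel (u, v) along its camera ray to obtain the mean mu.
    mean = [(u - cx) / fx * d, (v - cy) / fy * d, d]
    # Sigma = R S S^T R^T with S = diag(exp(log_scale)) stays symmetric PSD.
    R = quat_to_rotmat(raw["quat"])
    s2 = [math.exp(2.0 * ls) for ls in raw["log_scale"]]  # squared scales
    cov = [[sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
    opacity = 1.0 / (1.0 + math.exp(-raw["opacity_logit"]))  # sigmoid -> (0, 1)
    return mean, cov, opacity
```

Predicting log-scales and an unconstrained quaternion, then building $\Sigma$ from them, guarantees a valid covariance regardless of what the network emits.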

Voxelization or spatial fusion is used to reduce redundancy and memory use in dense per-pixel or multi-view settings, where overlapping views otherwise produce many near-duplicate Gaussians (Jiang et al., 29 May 2025).
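
The fusion rule is not detailed here, so the following is only an illustrative sketch of voxel-based merging, assuming a uniform grid and opacity-weighted averaging of means and colors; `voxel_merge` and its merging rule are hypothetical choices, not the method of Jiang et al. Covariance merging is omitted.

```python
import math
from collections import defaultdict


def voxel_merge(means, opacities, colors, voxel_size):
    """Fuse Gaussians whose means fall into the same voxel (hypothetical scheme).

    Within each voxel, means and colors are opacity-weighted averages; the fused
    opacity is 1 - prod(1 - alpha), the combined coverage of the members.
    """
    buckets = defaultdict(list)
    for i, m in enumerate(means):
        key = tuple(math.floor(x / voxel_size) for x in m)  # integer voxel index
        buckets[key].append(i)

    fused = []
    for idx in buckets.values():
        alphas = [opacities[i] for i in idx]
        alpha = 1.0 - math.prod(1.0 - a for a in alphas)
        wsum = sum(alphas)
        # Fall back to a plain average if every member is fully transparent.
        w = alphas if wsum > 0 else [1.0] * len(idx)
        wsum = wsum if wsum > 0 else float(len(idx))
        mean = [sum(wi * means[i][k] for wi, i in zip(w, idx)) / wsum for k in range(3)]
        color = [sum(wi * colors[i][k] for wi, i in zip(w, idx)) / wsum for k in range(3)]
        fused.append((mean, alpha, color))
    return fused
```

The voxel size trades memory against detail: larger voxels merge more aggressively but blur fine structure.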

3. Transfer, Generalization, and Domain Adaptation

State-of-the-art feed-forward splatting architectures target domain generalization and robustness:

  • Pose-free and unconstrained input: Recent systems infer 3D structure and splats from unposed, uncalibrated images, extending to domains such as internet photo collections, driving datasets, or human-centric multi-view data (Tian et al., 19 Dec 2025, Tian et al., 2024, Fujimura et al., 23 Apr 2026).
  • Appearance control: Embedding-based appearance heads (e.g., a per-image global appearance token) allow explicit modulation for lighting transfer, cross-scene relighting, and interpolation in the appearance embedding space (Fujimura et al., 23 Apr 2026).
  • Sparse-view and wide-baseline robustness: Multi-stage networks may combine feed-forward splats with diffusion or refinement modules to address incomplete texture/detail or geometric inconsistencies under sparse and wide-baseline input (e.g., ProSplat’s one-step diffusion with epipolar attention) (Lu et al., 9 Jun 2025).
  • Feed-forward language grounding: Some architectures integrate CLIP-based semantic alignment or language tokens into the pipeline, producing language-embedded splats for semantic segmentation or open-vocabulary queries (Tian et al., 19 Dec 2025).
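
To make interpolation in an appearance embedding space concrete, the sketch below blends two per-image embeddings and feeds the result through an affine color head. The FiLM-style modulation, the weight matrices, and all names are hypothetical assumptions for illustration; the cited work's appearance head may be structured quite differently.

```python
def lerp(e0, e1, t):
    """Linear interpolation between two appearance embeddings."""
    return [(1.0 - t) * a + t * b for a, b in zip(e0, e1)]


def modulate(base_rgb, emb, W_scale, W_shift):
    """Hypothetical affine (FiLM-style) head: rgb' = rgb * (1 + W_s e) + W_b e."""
    out = []
    for k, c in enumerate(base_rgb):
        scale = 1.0 + sum(w * e for w, e in zip(W_scale[k], emb))
        shift = sum(w * e for w, e in zip(W_shift[k], emb))
        out.append(min(1.0, max(0.0, c * scale + shift)))  # clamp to [0, 1]
    return out
```

Sweeping `t` from 0 to 1 moves a splat's color smoothly from the first image's appearance to the second's, while geometry stays fixed because only the color path depends on the embedding.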

4. Advances in Scalability, Anti-Aliasing, and Resolution

Classic pixel-aligned architectures suffered from quadratic scaling of primitive count with image resolution, because they predict at least one Gaussian per pixel of every input view. Recent advances include:

  • Decoupling geometry and appearance: LGTM-style frameworks predict a compact grid of Gaussians and attach learnable per-splat textures, supporting 4K rendering with orders of magnitude fewer primitives (Lao et al., 26 Mar 2026). Complexity is now controlled by primitive count (not image size), with per-splat textures handling high-frequency detail.
  • Anti-aliasing and cross-resolution consistency: AA-Splat introduces per-Gaussian 3D band-limiting (BLPF) and opacity balancing (OB), using Nyquist frequency bounds from all context views to band-limit splats. This suppresses aliasing, preserves sharpness across up- and downsampling, and yields large PSNR gains over DepthSplat on out-of-distribution datasets (Suh et al., 31 Mar 2026).
  • Opacity normalization at variable input counts: Normalization strategies (e.g., RoSplat) maintain consistent pixel brightness and coverage regardless of the number of input views, eliminating over-brightness and hole artifacts in multi-view or high-resolution settings (Nguyen et al., 13 May 2026).
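
The band-limiting idea can be illustrated with the 3D smoothing filter popularized by Mip-Splatting, swapped in here as a stand-in because AA-Splat's exact BLPF and OB formulations are not given in this summary. The sketch assumes a Gaussian's sampling rate in a view is focal length (pixels) divided by depth, takes the maximum over context views as the Nyquist-limited rate $\nu$, widens $\Sigma$ by an isotropic variance $s/\nu^2$, and rescales opacity by the square root of the determinant ratio so the integrated density is unchanged; `band_limit`, `det3`, and the constant `s = 0.2` are assumptions.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def band_limit(cov, opacity, depths, focals, s=0.2):
    """Mip-Splatting-style 3D low-pass filter for one Gaussian (not AA-Splat's exact method).

    depths/focals: camera-space depth of the Gaussian and focal length (pixels)
    in each context view where it is visible; s is an assumed filter-size constant.
    """
    nu = max(f / d for f, d in zip(focals, depths))  # highest sampling rate
    var = s / (nu * nu)  # isotropic variance of the low-pass kernel
    filtered = [[cov[i][j] + (var if i == j else 0.0) for j in range(3)]
                for i in range(3)]
    # Scale opacity so the Gaussian's integrated density is preserved,
    # as convolving with a normalized low-pass kernel would do.
    balanced = opacity * math.sqrt(det3(cov) / det3(filtered))
    return filtered, balanced
```

A Gaussian that is already large relative to the finest sampling footprint is barely changed, while one much smaller than a pixel's footprint is widened and dimmed, which is what suppresses high-frequency aliasing when rendering at a different resolution.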

5. Compression and Compact Representation

The high memory and bandwidth cost of 3DGS representations prompted the development of entropy and transform-based codecs tailored for feed-forward pipelines:

  • CodecSplat: Compresses the intermediate 2D Gaussian-generation feature maps (not the final splats) using a learned hierarchical VAE with a context model. This operates at KiB-level rates per scene, one order of magnitude better than baseline splat compressors (Yu et al., 25 May 2026).
  • Long-context modeling (LocoMoco): Morton serialization and attention-based entropy coding allow thousands of Gaussians to be compressed compactly in a single pass, with robust rate–distortion tradeoffs and efficient per-scene inference (Liu et al., 30 Nov 2025).
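
Morton (Z-order) serialization, named in the LocoMoco description, can be sketched as follows: Gaussian means are quantized to a 10-bit grid per axis and their bits interleaved, so sorting by the resulting code places spatially nearby splats next to each other in the 1D sequence an entropy model consumes. The bit-spreading routine is the standard magic-number construction; the bounding-box quantization, the 10-bit resolution, and the function names are assumptions for illustration, not LocoMoco's implementation.

```python
def part1by2(n: int) -> int:
    """Spread the low 10 bits of n so that bit k moves to bit 3k."""
    n &= 0x000003FF
    n = (n ^ (n << 16)) & 0xFF0000FF
    n = (n ^ (n << 8)) & 0x0300F00F
    n = (n ^ (n << 4)) & 0x030C30C3
    n = (n ^ (n << 2)) & 0x09249249
    return n


def morton3(x: int, y: int, z: int) -> int:
    """Interleave three 10-bit integers into a 30-bit Z-order (Morton) code."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)


def morton_order(means):
    """Return Gaussian indices sorted along the Z-order curve of their means."""
    lo = [min(m[k] for m in means) for k in range(3)]
    hi = [max(m[k] for m in means) for k in range(3)]
    levels = 1023  # 10-bit quantization per axis (assumed resolution)

    def code(m):
        q = []
        for k in range(3):
            extent = (hi[k] - lo[k]) or 1.0  # avoid division by zero on flat axes
            q.append(int((m[k] - lo[k]) / extent * levels))
        return morton3(*q)

    return sorted(range(len(means)), key=lambda i: code(means[i]))
```

Because nearby codes mostly correspond to nearby cells, an attention-based entropy model reading this sequence sees spatially correlated neighbors within its context window.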

6. Extensions: Semantics, Multi-modality, Style, and Robustness

7. Benchmarks, Performance, and Quantitative Results

Feed-forward splatting models now match, and often exceed, the geometry and appearance quality of per-scene optimized, data-driven pipelines on standard novel view synthesis (NVS) metrics (PSNR, SSIM, LPIPS), while offering orders-of-magnitude faster inference and broader generalization. Representative reported comparisons include:

Method | Reported results | Comparison and setting
WildSplatter | PSNR gain (dB) and one further metric | Over best pose-free baseline, 2-4 view NVS (Fujimura et al., 23 Apr 2026)
ProSplat | PSNR (dB), SSIM, and LPIPS improvements | Over DepthSplat, sparse wide-baseline input (Lu et al., 9 Jun 2025)
AA-Splat | PSNR gain (dB) | Over DepthSplat, anti-aliased, multi-resolution (Suh et al., 31 Mar 2026)
CodecSplat | PSNR (dB) across a range of KiB/scene rates | KB-level compression (Yu et al., 25 May 2026)
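
Since the comparisons above are reported mostly as PSNR differences, a short implementation of the metric may help interpret them. This is the textbook definition over a flat list of pixel values normalized to [0, 1] (an assumption; benchmarks may use 8-bit ranges or per-image averaging), and `psnr` is a hypothetical helper name.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gain of $\Delta$ dB corresponds to reducing MSE by a factor of $10^{\Delta/10}$; for example, a 3 dB gain roughly halves the MSE.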

Feed-forward models now render high-fidelity 3DGS scenes at real-time rates on fast parallel hardware. Memory, time, and quality trade-offs can be tuned through compactification, sparsification, or decoupled (texture-based) designs.


Feed-forward Gaussian splatting thus defines a comprehensive framework for efficient, real-time, and extensible 3D scene representation, spanning geometry, appearance, semantics, and compression, and is foundational for rapid progress in learned scene reconstruction, novel view synthesis, semantic lifting, and cross-modal representation (Jiang et al., 29 May 2025, Fujimura et al., 23 Apr 2026, Suh et al., 31 Mar 2026, Tian et al., 19 Dec 2025, Yu et al., 25 May 2026, Lao et al., 26 Mar 2026, Lu et al., 9 Jun 2025, Nguyen et al., 13 May 2026, Turkulainen et al., 19 May 2026).
