
Image-Based Gaussian Splatting

Updated 20 November 2025
  • Image-Based Gaussian Splatting is a scene representation paradigm using anisotropic Gaussian primitives to model radiance, structure, and appearance from multiview data.
  • It employs techniques such as front-to-back alpha compositing, ray tracing, and differentiable rendering to achieve photorealistic view synthesis and efficient image compression.
  • Applications include 3D scene reconstruction, synthetic data generation, and adaptive real-time streaming, with ongoing research focused on improving edge fidelity and adaptivity.

Image-Based Gaussian Splatting (IBGS) is an explicit scene and image representation paradigm leveraging collections of anisotropic Gaussian primitives (“splats”) to approximate radiance, structure, and appearance across a variety of computer vision and graphics tasks. Its defining characteristic is the use of Gaussians whose parameters—center location, covariance, color, and (optionally) higher-level attributes—are derived directly from or dynamically aligned with multiview image data. IBGS versions span high-speed 2D image representation and compression, advanced 3D scene reconstruction, photorealistic view synthesis, and adaptive real-time streaming, with applications across rendering, novel-view generation, inpainting, synthetic data production, and tensor-based image recovery.

1. Foundations and Mathematical Formulation

In IBGS, an image or 3D scene is represented as a sum or composition of $N$ Gaussian kernels. The prototypical primitive is defined by:

  • Center: $\boldsymbol{\mu}_i \in \mathbb{R}^3$ for 3D, or $\boldsymbol{\mu}_i \in \mathbb{R}^2$ for 2D
  • Covariance: $\Sigma_i \in \mathbb{R}^{d\times d}$ ($d = 2$ or $3$), often factored as $\Sigma_i = R_i S_i S_i^T R_i^T$
  • Color/intensity: $c_i$ (RGB or SH coefficients)
  • Opacity: $o_i$
  • (Optionally) spherical harmonics for view-dependent color

For the 3D case, under camera projection, each Gaussian projects to the image plane with an associated 2D mean and covariance:

$$\mu_i^{2D} = P\,W\,\mu_i, \qquad \Sigma_i^{2D} = J\,W\,\Sigma_i\,W^T J^T$$

where $W$ is the world-to-camera transform, $P$ the camera projection, and $J$ the projection Jacobian.
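
The following NumPy sketch illustrates this projection step under a pinhole camera model. The function names, the 4x4 world-to-camera layout, and the intrinsics convention are illustrative assumptions for exposition, not taken from any of the cited implementations.

```python
import numpy as np

def covariance_from_rotation_scale(R, s):
    """Build Sigma = R S S^T R^T from a 3x3 rotation matrix R and scale vector s."""
    S = np.diag(s)
    return R @ S @ S.T @ R.T

def project_gaussian(mu_world, Sigma_world, W, K):
    """Project a 3D Gaussian (mu, Sigma) to its 2D image-plane mean and covariance.

    W : 4x4 world-to-camera transform
    K : 3x3 pinhole intrinsics (fx, fy on the diagonal; cx, cy in the last column)
    """
    # Center in camera coordinates.
    mu_cam = (W @ np.append(mu_world, 1.0))[:3]
    x, y, z = mu_cam

    # Perspective projection of the mean: mu_2D = P W mu.
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    mu_2d = np.array([fx * x / z + cx, fy * y / z + cy])

    # Jacobian J of the perspective projection, evaluated at mu_cam.
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])

    # Sigma_2D = J W Sigma W^T J^T, with W restricted to its rotation block.
    W_rot = W[:3, :3]
    Sigma_2d = J @ W_rot @ Sigma_world @ W_rot.T @ J.T
    return mu_2d, Sigma_2d
```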

The standard splatting rendering equation, under depth-sorted (front-to-back) compositing, assigns each Gaussian a weight at pixel $p$:

$$\beta_i(p) = o_i \exp\!\left[-\frac{1}{2}\,(p - \mu_i^{2D})^T \left(\Sigma_i^{2D}\right)^{-1} (p - \mu_i^{2D})\right]$$

with pixel color given by

$$C(p) = \sum_{i=1}^{N} c_i\,\beta_i(p) \prod_{j<i}\left(1 - \beta_j(p)\right)$$

View-dependent effects are modeled by letting $c_i$ be a function of the camera direction, either via spherical harmonics or neural networks (Nguyen et al., 18 Nov 2025).
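
As a concrete illustration, the sketch below evaluates the compositing rule $C(p)$ at a single pixel from a set of depth-sorted, already-projected Gaussians. The dictionary-based data layout and the early-termination threshold are illustrative choices rather than details of the cited systems; production rasterizers perform the same accumulation tile-by-tile on the GPU.

```python
import numpy as np

def composite_pixel(p, gaussians):
    """Front-to-back alpha compositing of depth-sorted 2D Gaussians at pixel p.

    `gaussians` holds dicts with keys mu2d, Sigma2d, color, opacity, depth
    (an illustrative container, not any particular implementation's layout).
    """
    color = np.zeros(3)
    transmittance = 1.0  # running product of (1 - beta_j) over splats blended so far
    for g in sorted(gaussians, key=lambda g: g["depth"]):  # near to far
        d = np.asarray(p, dtype=float) - g["mu2d"]
        # beta_i(p) = o_i * exp(-0.5 * d^T (Sigma_i^2D)^{-1} d)
        beta = g["opacity"] * np.exp(-0.5 * d @ np.linalg.inv(g["Sigma2d"]) @ d)
        color += transmittance * beta * np.asarray(g["color"], dtype=float)
        transmittance *= 1.0 - beta
        if transmittance < 1e-4:  # stop once the pixel is effectively opaque
            break
    return color
```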

For pure 2D images, order-invariant summation is used (Zhang et al., 13 Mar 2024):

$$C_i = \sum_{n=1}^{N} c'_n \exp\!\left(-\sigma_n^{(i)}\right)$$

where $c'_n = o_n c_n$ absorbs opacity, and $\sigma_n^{(i)} = \frac{1}{2}(x_i - \mu_n)^T \Sigma_n^{-1} (x_i - \mu_n)$.
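
A minimal NumPy sketch of this order-invariant accumulation is shown below. The dense per-Gaussian loop is written for clarity and is far from the optimized CUDA kernels of the cited work; argument names and shapes are assumptions for exposition.

```python
import numpy as np

def render_image_2d(H, W, mus, inv_covs, weighted_colors):
    """Order-invariant 2D splatting: C_i = sum_n c'_n exp(-sigma_n^(i)).

    mus             : (N, 2) Gaussian centers in pixel coordinates
    inv_covs        : (N, 2, 2) inverse covariances Sigma_n^{-1}
    weighted_colors : (N, 3) colors with opacity absorbed, c'_n = o_n * c_n
    """
    ys, xs = np.mgrid[0:H, 0:W]
    pixels = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)  # (H*W, 2)

    image = np.zeros((H * W, 3))
    for mu, inv_cov, c in zip(mus, inv_covs, weighted_colors):
        d = pixels - mu                                       # (H*W, 2)
        sigma = 0.5 * np.einsum("pi,ij,pj->p", d, inv_cov, d)
        image += np.exp(-sigma)[:, None] * c                  # order-free accumulation
    return image.reshape(H, W, 3)
```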

2. Scene Construction, Parameter Fitting, and Data Alignment

Scene construction in IBGS leverages either direct optimization on target images or alignment of Gaussian parameters to source image data.

  • Multi-view 3D Reconstruction: Initial Gaussian positions are obtained via sparse structure-from-motion (SfM) outputs (e.g. COLMAP), and their parameters (positions, covariances, color, opacity) are refined via photometric losses against target images (Zeng et al., 20 Jul 2024, Vanherle et al., 11 Apr 2025).
  • Extraction and Manipulation: For complex scenes, individual objects and backgrounds are modeled as separate Gaussian clouds, and geometric transformations (rigid, affine) allow dynamic instantiation and composition (Zeng et al., 20 Jul 2024).
  • Residual Alignment: To overcome the limited expressivity of standard SH-parameterized color, IBGS models the final pixel color as the sum of the base color and a learned residual function of neighboring images, extracted via feature warping and neural decoders (Nguyen et al., 18 Nov 2025). This leverages high-frequency information from training images for view-dependent and fine texture effects.

In 2D image representation (e.g., GaussianImage), parameter fitting is performed by minimizing the mean-squared reconstruction error over all pixels, with differentiable backpropagation yielding fast convergence, compact representations, and strong rate-distortion trade-offs (Zhang et al., 13 Mar 2024, Zeng et al., 30 Jun 2025).
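
The following PyTorch sketch shows such a per-image fitting loop for the order-invariant 2D case, minimizing mean-squared error by backpropagation. The Gaussian count, the parameterization via per-axis log-scales plus a rotation angle, and the optimizer settings are illustrative assumptions rather than the cited methods' configurations, and the dense rendering keeps it practical only for small images.

```python
import torch

def fit_gaussians_to_image(target, n_gaussians=1000, steps=2000, lr=1e-2):
    """Fit an order-invariant 2D Gaussian mixture to `target` (H, W, 3) by MSE."""
    H, W, _ = target.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pixels = torch.stack([xs, ys], dim=-1).reshape(-1, 2).float()         # (H*W, 2)

    # Learnable parameters: centers, per-axis log-scales, rotation angle, color.
    mu = torch.rand(n_gaussians, 2) * torch.tensor([float(W), float(H)])
    log_s = torch.full((n_gaussians, 2), 1.0)
    theta = torch.zeros(n_gaussians)
    color = torch.rand(n_gaussians, 3)
    params = [p.requires_grad_() for p in (mu, log_s, theta, color)]
    opt = torch.optim.Adam(params, lr=lr)

    target_flat = target.reshape(-1, 3)
    for _ in range(steps):
        # Sigma^{-1} = R diag(exp(-2 log_s)) R^T, from Sigma = R S S^T R^T.
        cos_t, sin_t = torch.cos(theta), torch.sin(theta)
        R = torch.stack([torch.stack([cos_t, -sin_t], -1),
                         torch.stack([sin_t,  cos_t], -1)], -2)           # (N, 2, 2)
        inv_cov = R @ torch.diag_embed(torch.exp(-2 * log_s)) @ R.transpose(-1, -2)

        d = pixels[None, :, :] - mu[:, None, :]                           # (N, H*W, 2)
        sigma = 0.5 * torch.einsum("npi,nij,npj->np", d, inv_cov, d)
        pred = (torch.exp(-sigma)[..., None] * color[:, None, :]).sum(0)  # (H*W, 3)

        loss = ((pred - target_flat) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu, log_s, theta, color
```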

3. Rendering Algorithms and Differentiable Compositing

IBGS leverages high-performance rasterization and differentiable compositing:

  • Front-to-back alpha compositing: Depth-ordering ensures physically plausible blending, critical for occlusion and transparency (Zeng et al., 20 Jul 2024, Nguyen et al., 18 Nov 2025).
  • Order-invariant splatting in 2D: For single-image scenarios, compositing reduces to a parallel summation, yielding extreme inference speeds (up to 2000 FPS on commodity GPUs) (Zhang et al., 13 Mar 2024).
  • Ray tracing (RaySplats): Extends rendering to full ray-tracing by computing closed-form ray–ellipsoid intersections, supporting physical light transport, shadows, and seamless mesh-Gaussian mixing (Byrski et al., 31 Jan 2025).
  • Discontinuity-aware splatting: DisC-GS introduces Bézier-curve boundary masking to sharply constrain Gaussian support in the image plane, effectively restoring edge fidelity at silhouettes and discontinuities; a specialized gradient approximation restores differentiability for end-to-end training (Qu et al., 24 May 2024).
  • Patch-wise rasterization: For high-resolution 2D tasks (image inpainting), spatial decomposition and patch-local splatting enable efficient, scalable deployment while conserving GPU memory (Li et al., 2 Sep 2025); a toy sketch of this tiling idea follows this list.
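
As referenced in the last item above, a toy sketch of patch-wise splatting follows: the image is tiled, and each tile accumulates only the Gaussians whose approximate footprint overlaps it, bounding peak memory. The tile size and the footprint cutoff (in standard deviations) are illustrative assumptions, not parameters of the cited method.

```python
import numpy as np

def splat_by_patches(H, W, mus, inv_covs, weighted_colors, patch=64, radius=3.0):
    """Patch-wise order-invariant splatting: each patch visits only the Gaussians
    whose approximate footprint overlaps it, bounding peak memory use."""
    image = np.zeros((H, W, 3))
    # Approximate each Gaussian's footprint by `radius` standard deviations.
    covs = np.linalg.inv(inv_covs)                                    # (N, 2, 2)
    extents = radius * np.sqrt(np.linalg.eigvalsh(covs).max(axis=-1))

    for y0 in range(0, H, patch):
        for x0 in range(0, W, patch):
            y1, x1 = min(y0 + patch, H), min(x0 + patch, W)
            # Cull Gaussians whose footprint does not touch this patch.
            keep = ((mus[:, 0] + extents >= x0) & (mus[:, 0] - extents < x1) &
                    (mus[:, 1] + extents >= y0) & (mus[:, 1] - extents < y1))
            if not keep.any():
                continue
            ys, xs = np.mgrid[y0:y1, x0:x1]
            pix = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)
            for mu, ic, c in zip(mus[keep], inv_covs[keep], weighted_colors[keep]):
                d = pix - mu
                w = np.exp(-0.5 * np.einsum("pi,ij,pj->p", d, ic, d))
                image[y0:y1, x0:x1] += (w[:, None] * c).reshape(y1 - y0, x1 - x0, 3)
    return image
```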

4. Applications: Synthetic Data, View Synthesis, and Compression

IBGS demonstrates versatility across applications:

  • Synthetic data generation: Pipelines such as "Cut-and-Splat" automatically reconstruct object Gaussians from video, composite them onto random backgrounds with realistic placement, and generate large, fully annotated datasets for detection and segmentation with high mAP (e.g., 83 mAP on in-domain backgrounds), outperforming cut-and-paste and diffusion inpainting approaches (Vanherle et al., 11 Apr 2025).
  • Novel-view and view-consistent synthesis: IBGS provides photorealistic, multi-view-consistent images for novel view synthesis tasks, robustly capturing fine textures, specularities, and geometry (Nguyen et al., 18 Nov 2025, Zeng et al., 20 Jul 2024).
  • Image compression and ultra-fast decoding: 2D versions such as GaussianImage achieve state-of-the-art PSNR/MS-SSIM for a fixed number of parameters, with fast encoding/decoding (render speeds 1500–2000 FPS), low memory use, and practical vector-quantization codecs (Zhang et al., 13 Mar 2024, Zeng et al., 30 Jun 2025).
  • Multi-dimensional tensor and spectral recovery: GSLR leverages tailored 2D (for spatial) and 1D (for spectral) Gaussian splatting to build low-rank, locally adaptive representations for multi-dimensional image recovery, significantly outperforming SVD and DFT/DCT-based methods in PSNR/SSIM, especially for local high-frequency content (Zeng et al., 18 Nov 2025).

5. Advances: Efficiency, Streaming, and Adaptivity

  • Streaming and online reconstruction: LongSplat introduces a streaming update mechanism wherein new frame data is incrementally converted to Gaussians, and redundant historical splats are compressed via a Gaussian-Image Representation (GIR) and binary mask supervision. This reduces Gaussian count by up to 44%, preserves real-time reconstruction rates, and maintains quality as input sequences grow (Huang et al., 22 Jul 2025).
  • Generalizable and adaptive representations: Instant GaussianImage circumvents the slow per-image optimization of prior 2D-GS models by leveraging a neural network to produce a coarse Gaussian layout, followed by brief fine-tuning, achieving competitive results in under 2 seconds per image (Zeng et al., 30 Jun 2025).
  • Low-light enhancement and scene recovery: LLGS employs a decomposable "M-Color" representation to separate intrinsic color from illumination, supporting robust unsupervised enhancement and reconstruction in extreme dark environments, outperforming both single-view enhancement and NeRF-based low-light methods in quality and speed (Wang et al., 24 Mar 2025).

6. Performance, Benchmarks, and Limitations

IBGS consistently matches or exceeds competing methods across domains, achieving high PSNR and SSIM and low perceptual error (LPIPS). For example:

  Dataset         Method   PSNR    SSIM    LPIPS
  Mip-NeRF360     IBGS     28.33   0.837   0.186
  T&T             IBGS     24.84   0.869   0.148
  Deep Blending   IBGS     30.12   0.912   0.237
                  3DGS     27.69   0.825   0.203

IBGS storage requirements are lower than per-Gaussian texture or global atlas approaches (e.g., ~40–70% memory reduction) while improving quality (Nguyen et al., 18 Nov 2025).

Documented limitations include the following:

  • In sparse-view or highly cluttered scenarios, residual estimation can become noisy due to insufficient neighboring support (Nguyen et al., 18 Nov 2025).
  • Adaptive curve or boundary models may incur memory/computational cost for segmentation and edge refinement in extremely complex scenes (Qu et al., 24 May 2024).
  • RaySplats achieves physically plausible light transport at the cost of slower inference relative to tile-based rasterization (Byrski et al., 31 Jan 2025).

7. Frontiers and Future Directions

Ongoing and proposed research directions in IBGS include:

  • Adaptive per-splat curve allocation and hierarchical masking for improved edge modeling (Qu et al., 24 May 2024).
  • Dynamic time-aware or multi-modal residual inference for animation and video (Nguyen et al., 18 Nov 2025).
  • End-to-end joint optimization of Gaussians and neural feature decoders to further close the realism gap between explicit splatting and neural implicit representations.
  • Extensions to non-rigid, highly dynamic, and transparent scenes (Vanherle et al., 11 Apr 2025).
  • Integrating event-based sensor data for high-temporal-resolution radiance recovery (Matta et al., 26 Dec 2024).
  • Streaming and online lifetime learning with real-time redundancy management (Huang et al., 22 Jul 2025).

Image-Based Gaussian Splatting thus offers a unified, tractable representation paradigm combining explicit geometry, photometric fidelity, efficient computation, and flexibility for graphics and vision tasks. Performance gains in view synthesis, synthetic data generation, compression, and online processing continue to advance the methodological state of the art, with broad applicability across computer vision, medical imaging, robotics, and computational photography.
