Image-Based Gaussian Splatting
- Image-Based Gaussian Splatting is a technique that represents scenes as superpositions of parameterized Gaussian primitives optimized directly against image observations.
- It employs a differentiable, GPU-accelerated rasterization pipeline to achieve real-time 3D rendering, novel view synthesis, and efficient image compression.
- The method balances high spatial detail and computational efficiency through adaptive Gaussian control, regularization strategies, and integration with neural fields.
Image-Based Gaussian Splatting (IBGS) is a class of techniques for representing images or scenes as superpositions of parameterized Gaussian primitives whose attributes are optimized directly against image-based observations. The approach unifies explicit, point-based rendering and photometric supervision within a differentiable, high-performance rasterization pipeline. IBGS encompasses both 2D and 3D formulations, but recent advances primarily focus on 3D Gaussian Splatting with supervision from multi-view images, directly fitting scene parameters to optimize photometric and geometric consistency. The methodology achieves high fidelity and real-time performance for applications such as novel view synthesis, compact image compression, and surface reconstruction in the presence of complex appearance phenomena.
1. Foundational Representation and Mathematical Formulation
IBGS methods represent a scene (or image) as a collection of anisotropic Gaussian primitives, each described by a set of parameters: mean position $\boldsymbol{\mu}$, positive definite covariance $\Sigma$, opacity $\alpha$, and additional attributes for appearance (e.g., spherical harmonic coefficients for view-dependent color) (Bao et al., 2024). The core formulations are as follows:
3D Gaussian Density:
$$G(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$
with $\Sigma = R S S^{\top} R^{\top}$, where $S$ is a diagonal scale matrix and $R$ is a rotation matrix (often quaternion-parametrized).
Appearance Model:
View-dependent color is typically modeled with spherical harmonics: $c(\mathbf{d}) = \sum_{\ell=0}^{L}\sum_{m=-\ell}^{\ell} c_{\ell m}\, Y_{\ell m}(\mathbf{d})$, with basis functions $Y_{\ell m}$ and learned coefficients $c_{\ell m}$.
Projection/Splatting:
3D Gaussians are rendered to 2D images using approximate elliptical projections: $\boldsymbol{\mu}' = \pi(\boldsymbol{\mu})$ and $\Sigma' = J W \Sigma W^{\top} J^{\top}$, where $\boldsymbol{\mu}'$ and $\Sigma'$ are the mean and covariance of the projected ellipse, $W$ is the viewing transformation, and $J$ is the Jacobian of the local affine approximation to the projective transform.
Alpha Blending and Differentiable Compositing:
Ordering primitives by depth, accumulated transmittance and alpha blending yield the pixel color: $C = \sum_{i=1}^{N} T_i\, \alpha_i\, c_i$ with $T_i = \prod_{j=1}^{i-1}(1-\alpha_j)$, where $\alpha_i$ combines each Gaussian's opacity with its projected 2D falloff at the pixel.
Supervision:
A photometric loss compares the synthesized color $\hat{C}$ to ground-truth images, commonly $\mathcal{L} = (1-\lambda)\,\mathcal{L}_{1}(\hat{C}, C_{\mathrm{gt}}) + \lambda\,\mathcal{L}_{\text{D-SSIM}}(\hat{C}, C_{\mathrm{gt}})$. Additional losses may be imposed for depth, normals, and regularization (Bao et al., 2024, Nguyen et al., 18 Nov 2025).
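The density and compositing equations above can be sketched numerically. The following is a minimal NumPy illustration of the per-pixel math (a toy loop, not the tiled GPU kernel used in practice); the function names are illustrative, not from any IBGS codebase:

```python
import numpy as np

def gaussian_falloff(x, mu, cov):
    """Unnormalized Gaussian density exp(-0.5 (x-mu)^T cov^{-1} (x-mu))."""
    d = np.asarray(x, float) - np.asarray(mu, float)
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

def composite_pixel(colors, alphas):
    """Front-to-back compositing of depth-sorted contributions:
    C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)."""
    C, T = np.zeros(3), 1.0  # color accumulator and transmittance
    for c, a in zip(colors, alphas):
        C += T * a * np.asarray(c, float)
        T *= 1.0 - a
        if T < 1e-4:  # early exit once the pixel is saturated
            break
    return C
```

A fully opaque front Gaussian hides everything behind it, while a half-transparent one lets the next primitive contribute with weight $T = 0.5$, matching the transmittance recursion.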
2. Computational Pipeline and Rendering
IBGS supports efficient real-time rendering via GPU-accelerated tile-based rasterization. The workflow for a typical frame entails:
- Tile-based Rasterization:
Divide the image into tiles; for each Gaussian, determine the tiles its projected ellipse intersects and emit one key-value pair per tile, keyed by (tile index, depth), carrying the Gaussian's parameters.
- Sorting and Blending:
Globally sort per-tile Gaussians in depth order. Each pixel thread blends its set using the alpha compositing equations above (Bao et al., 2024).
- View Synthesis:
Optimized Gaussians can be projected and rendered from arbitrary, unseen viewpoints at real-time rates (≥ 30 fps at 1080p) (Bao et al., 2024).
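The tile-binning step of this workflow can be sketched as follows: bound each projected Gaussian by a conservative $k\sigma$ axis-aligned box and enumerate the overlapped tiles. This is a hypothetical simplification (real pipelines bound the ellipse more tightly and run on the GPU); all names and default values here are illustrative:

```python
import numpy as np

def tiles_for_gaussian(mu2d, cov2d, tile=16, k=3.0, img_w=1920, img_h=1080):
    """List the tiles overlapped by the k-sigma axis-aligned bounding box
    of a projected 2D Gaussian (conservative tile-binning sketch)."""
    mu2d = np.asarray(mu2d, float)
    r = k * np.sqrt(np.diag(np.asarray(cov2d, float)))  # per-axis k-sigma extent
    x0, y0 = np.maximum(mu2d - r, 0.0)                  # clamp to the image
    x1 = min(mu2d[0] + r[0], img_w - 1)
    y1 = min(mu2d[1] + r[1], img_h - 1)
    tx0, ty0 = int(x0 // tile), int(y0 // tile)
    tx1, ty1 = int(x1 // tile), int(y1 // tile)
    return [(tx, ty) for ty in range(ty0, ty1 + 1)
                     for tx in range(tx0, tx1 + 1)]
```

Each returned (tile index, Gaussian) pair would then be augmented with depth and globally sorted, as described above.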
In 2D settings, rasterization is performed by order-invariant summation (no sorting), e.g. $I(\mathbf{p}) = \sum_{i} c_i\, G_i(\mathbf{p})$, where each color $c_i$ may absorb the Gaussian's opacity (Zhang et al., 2024, Li et al., 22 Dec 2025).
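This order-invariant 2D summation is simple enough to write out directly. The following dense NumPy sketch (illustrative, unvectorized over Gaussians, and not any published codebase's renderer) sums weighted Gaussian footprints over the full image:

```python
import numpy as np

def render_2d_gaussians(H, W, means, covs, colors, weights):
    """Order-invariant 2D splatting: I(p) = sum_i w_i * c_i * G_i(p)."""
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([xs, ys], axis=-1).astype(float)   # (H, W, 2) pixel coords
    img = np.zeros((H, W, 3))
    for mu, cov, c, w in zip(means, covs, colors, weights):
        d = pts - np.asarray(mu, float)
        inv = np.linalg.inv(np.asarray(cov, float))
        mahal = np.einsum('...i,ij,...j->...', d, inv, d)  # Mahalanobis distance
        img += w * np.exp(-0.5 * mahal)[..., None] * np.asarray(c, float)
    return img
```

Because the contributions are added, the result is independent of the order in which Gaussians are processed, which is what removes the depth-sorting stage of the 3D pipeline.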
Quantization-aware codecs for IBGS further enable image compression, employing attribute-wise learned scalar quantization and entropy coding for compact storage, while maintaining real-time (1000+ fps) decoding (Li et al., 22 Dec 2025, Zhang et al., 2024).
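Attribute-wise scalar quantization of this kind reduces each Gaussian attribute tensor to low-bit integers plus a per-attribute scale and offset. A minimal min-max sketch (the cited codecs learn their quantization parameters and add entropy coding; this illustrative version does neither):

```python
import numpy as np

def quantize_attribute(x, n_bits=8):
    """Min-max scalar quantization of one attribute tensor to n_bits levels."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint16)
    return q, lo, scale  # integers plus the (offset, step) needed to decode

def dequantize_attribute(q, lo, scale):
    """Inverse map; the reconstruction error is bounded by scale / 2."""
    return lo + q.astype(float) * scale
```

Storing `q` with an entropy coder, keyed per attribute (means, scales, colors, opacities), is what yields the compact bitstreams and fast decode rates cited above.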
3. Key Technical Modules and Algorithmic Enhancements
Recent research systematically categorizes the technical modules essential to IBGS (Bao et al., 2024):
- Initialization:
Seeding from SfM/MVS point clouds or by randomization; triplane-based priors for generalization.
- Attribute Expansion:
Integration of higher-degree view-dependent attributes (e.g., expanded spherical harmonics), semantic, depth, and normal fields.
- Rasterization Techniques:
EWA-based splatting is standard; tangent-plane rasterization (GS++), hardware-accelerated custom kernels, and memory layout optimizations improve throughput.
- Regularization Strategies:
3D geometric losses (depth/normal supervision), 2D photometric losses, and physics priors for dynamic sequences.
- Adaptive Gaussian Control:
Dynamically splitting underfit Gaussians in high-gradient regions and pruning low-opacity, redundant ones to maximize efficiency and detail (Li et al., 22 Dec 2025).
- Post-processing:
Mesh extraction via Poisson or tetrahedral methods; anti-aliasing filters for scale adaptivity.
- Integration and Guidance:
Hybridization with NeRF fields, SDF anchors, and priors from monocular depth or pretrained diffusion models to improve scene fidelity and generalization.
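The adaptive Gaussian control module listed above alternates pruning with gradient-driven densification. The sketch below is a hypothetical simplification of a 3DGS-style step (thresholds, the split factor 1.6, and all function names are illustrative, not taken from any cited implementation):

```python
import numpy as np

def adaptive_control(means, scales, opacities, grads,
                     grad_thresh=2e-4, opacity_thresh=0.005, scale_split=0.05):
    """One density-control step: prune near-transparent Gaussians, then
    clone small high-gradient Gaussians and split large ones."""
    keep = opacities > opacity_thresh                 # prune transparent primitives
    means, scales, opacities, grads = (a[keep].copy()
                                       for a in (means, scales, opacities, grads))
    new_m, new_s, new_o = [means], [scales], [opacities]
    for i in np.where(grads > grad_thresh)[0]:        # underfit, high-gradient region
        if scales[i].max() > scale_split:             # split: two smaller copies
            offset = np.random.normal(scale=scales[i])
            new_m.append((means[i] + offset)[None])
            new_s.append((scales[i] / 1.6)[None])
            scales[i] /= 1.6
        else:                                         # clone in place
            new_m.append(means[i][None])
            new_s.append(scales[i][None])
        new_o.append(opacities[i][None])
    return (np.concatenate(new_m), np.concatenate(new_s), np.concatenate(new_o))
```

Run periodically during optimization, this keeps the primitive budget concentrated where the photometric error (proxied by positional gradients) is largest.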
Table: Key IBGS Technical Modules ((Bao et al., 2024), summarized)
| Module | Function | Example Techniques |
|---|---|---|
| Initialization | Gaussian seeding, priors | SfM, MVS, triplane fields |
| Attribute Expansion | Enhanced color/geometry fields | SHs, normals, semantics |
| Rasterization | Efficient convolution, projection | EWA splatting, tiling |
| Regularization | Geometric/photometric constraints | Depth/normal losses |
| Adaptive Control | Density splitting, pruning | Error-based densification |
| Post-processing | Surface/mesh extraction, anti-aliasing | Poisson, GOF → tet meshes |
| Integration/Priors | Joint with NeRF, SDF, diffusion priors | GPS-Gaussian |
4. Applications and Domain-Specific Extensions
IBGS has been adapted for a variety of tasks and modalities:
- 3D Scene Reconstruction and View Synthesis:
Fitting 3D Gaussians to multi-view images for high-fidelity, real-time novel view rendering or geometry extraction (Nguyen et al., 18 Nov 2025).
- Image Compression and Representation:
2D Gaussian Splatting supports compact, continuous representations for fast, high-PSNR image compression (e.g., GaussianImage++ achieves PSNR of 35.41 dB with 0.08M params and 2216 fps decode) (Li et al., 22 Dec 2025).
- Image Restoration and Inpainting:
IBGS architectures enhanced with semantic alignment (e.g., DINO features) enable contextually consistent inpainting with differentiable patch-wise rasterization (Li et al., 2 Sep 2025).
- Contour-Preserving Representations:
Contour-aware IBGS with segmentation priors maintains sharp edges under compression by constraining Gaussians to image or semantic regions, yielding 0.5–2.5 dB PSNR improvements at region boundaries (Takabe et al., 29 Dec 2025).
- Robust 3D Recovery under Adverse Conditions:
Specialized IBGS frameworks compensate for illumination inconsistency (Wang et al., 16 Mar 2025) or utilize event streams with blurry images for trajectory and radiance field recovery (Matta et al., 2024).
- Single-Image 3D Generation and Diffusion Integration:
Geometric distillation via Gaussian Splatting decoders ensures multi-view consistency and high-quality 3D recovery from 2D diffusion outputs (Tao et al., 8 Mar 2025).
5. Challenges, Limitations, and Performance Benchmarks
IBGS research highlights several intrinsic challenges (Bao et al., 2024, Nguyen et al., 18 Nov 2025):
- Accuracy-Speed Tradeoff:
Smaller Gaussians yield higher spatial detail but erode real-time performance, while larger primitives accelerate rendering but introduce blurring. Multi-scale or adaptive scale control strategies mitigate these effects.
- Memory and Storage Efficiency:
Millions of Gaussians may consume extensive memory; vector quantization, pruning, and lightweight encodings are employed to control footprint (Li et al., 22 Dec 2025, Zhang et al., 2024).
- View Consistency and Geometric Robustness:
Ill-posed settings (few images, unconstrained capture) can result in floating artifacts or collapsed structures. Remedies include depth/normal supervision, diffusion guidance, and region-aware rasterization (Takabe et al., 29 Dec 2025, Tao et al., 8 Mar 2025).
- Physical/Lighting Realism:
Standard IBGS lacks true ray tracing, restricting photorealistic effects. RaySplats extends IBGS to ray tracing, enabling shadows and reflections via ellipsoid-ray intersection and per-ray compositing (Byrski et al., 31 Jan 2025).
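The ellipsoid-ray intersection underlying such ray-traced extensions reduces to a sphere intersection after whitening by the Gaussian's covariance. A minimal sketch in the spirit of that test (the helper name and the $k\sigma$ bound are assumptions for illustration, not RaySplats' actual interface):

```python
import numpy as np

def ray_ellipsoid_hit(origin, direction, mu, cov, k=3.0):
    """Nearest non-negative ray parameter t hitting the k-sigma ellipsoid
    {x : (x-mu)^T cov^{-1} (x-mu) <= k^2}, or None on a miss."""
    L = np.linalg.cholesky(np.linalg.inv(np.asarray(cov, float)))
    o = L.T @ (np.asarray(origin, float) - np.asarray(mu, float))  # whiten
    d = L.T @ np.asarray(direction, float)
    # In whitened coordinates the ellipsoid is a sphere of radius k:
    # solve |o + t d|^2 = k^2 for the smaller root t >= 0.
    a, b, c = d @ d, 2.0 * (o @ d), o @ o - k * k
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    t = (-b - np.sqrt(disc)) / (2.0 * a)
    return t if t >= 0 else None
```

Compositing the Gaussians along each ray in hit order then replaces rasterized tile sorting, which is what enables secondary effects such as shadows and reflections.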
- Dynamic and Large-Scale Scenes:
Extending IBGS to 4D (dynamic) or massive environments requires temporal attributes, federated training, and block-wise scene management.
Table: Representative Quantitative Performance ((Nguyen et al., 18 Nov 2025), mean across datasets)
| Method | PSNR↑ | SSIM↑ | LPIPS↓ | Gaussians | Mem (MB) |
|---|---|---|---|---|---|
| 3DGS | 27.69 | 0.825 | 0.203 | 3.2 M | 764 |
| TexturedGS | 27.35 | 0.827 | 0.186 | — | — |
| IBGS | 28.33 | 0.837 | 0.186 | 1.59 M | 291 |
IBGS achieves higher fidelity and compactness compared to baseline 3DGS and texture-augmented variants.
6. Contemporary Research Trends and Opportunities
Current directions in IBGS research include (Bao et al., 2024, Nguyen et al., 18 Nov 2025):
- Generalizable and Feed-Forward IBGS:
Eliminating per-scene optimization via networks that predict Gaussian fields from raw images (PixelSplat, GPS-Gaussian, Instant-GI (Zeng et al., 30 Jun 2025)).
- Physically Based Rendering and Editing:
Integration with BRDF parameters, environment mapping, and ray tracing for material-aware, relightable representations (Byrski et al., 31 Jan 2025).
- Large-Scale and Federated Modeling:
VastGaussian, Fed3DGS tackle block-based and distributed optimization for real-world-scale scenes.
- Unconstrained “In-the-Wild” IBGS:
Learning to condition Gaussians on appearance priors for robust modeling from heterogeneous, uncalibrated photo collections (SWAG, “Gaussian in the Wild”).
- Hybridization and Modality Fusion:
IBGS pipelines fusing depth, normal, or learned features from diffusion models, offering geometric convergence and semantic alignment in data-scarce or ill-posed settings.
A plausible implication is that as high-throughput differentiable rasterization and attribute expansion mature, IBGS will increasingly subsume 3D generative modeling, graphics, and photometric optimization tasks spanning vision and rendering disciplines. Continued architectural consolidation and hardware co-design are expected to drive further efficiency and generalization.
7. Comparisons to Related Methodologies
IBGS diverges from traditional mesh or voxel-based graphics via its continuous, analytic primitive-based representation and inherently differentiable pipeline. Compared to implicit neural fields (e.g., NeRF), it offers orders-of-magnitude faster fitting and rendering, direct editability, and compatibility with standard rasterization hardware (Bao et al., 2024, Zhang et al., 2024, Li et al., 22 Dec 2025). Unlike approaches that augment Gaussian splatting with global or per-Gaussian texture maps, IBGS residual-based methods leverage input images directly for view-dependent effects, optimizing both quality and memory scaling (Nguyen et al., 18 Nov 2025).
In summary, Image-Based Gaussian Splatting defines a rigorous, extensible mathematical and computational framework for high-performance, image-supervised scene and image modeling, with expanding applications across inverse graphics, computational photography, and 3D vision research.