GSplat: Differentiable 3D Gaussian Splatting

Updated 9 December 2025

GSplat is an open-source, modular library that enables fully differentiable 3D Gaussian splatting for high-fidelity scene reconstruction and neural rendering.
It employs a robust pipeline integrating projection, tile-based depth sorting, and alpha blending to provide analytic gradients and end-to-end learning.
The library features an extensible Python/PyTorch frontend paired with optimized CUDA backends, supporting integration in both academic research and commercial applications.

GSplat is an open-source, modular library for efficient, fully differentiable Gaussian splatting in 3D scene representation and rendering. Released under the Apache License 2.0 and compatible with both Linux and Windows, GSplat features a PyTorch-friendly Python interface with highly optimized CUDA backends. It is designed for high-fidelity 3D reconstruction, real-time neural rendering, and inverse-graphics pipelines, enabling end-to-end training of scene geometry, appearance, and camera pose. Its modular architecture lets it be built into research workflows and products alike, in both academic and commercial settings. It has become a reference implementation for recent innovations in differentiable scene representation and real-time rendering (Ye et al., 2023, Ye et al., 10 Sep 2024, Nath et al., 2 Dec 2025).

1. Mathematical Foundations and Splatting Algorithm

GSplat models a scene as a cloud of oriented anisotropic 3D Gaussian primitives. Each primitive is parameterized by a mean position $\mu\in\mathbb{R}^3$, a covariance matrix $\Sigma\in\mathbb{R}^{3\times 3}$ (encoded as a scale $s\in\mathbb{R}^3$ and a rotation quaternion $q\in\mathbb{R}^4$), a color $c\in\mathbb{R}^3$ (optionally view-dependent through spherical-harmonic coefficients), and an opacity $o\in[0,1]$ (commonly stored as an unconstrained logit and mapped through a sigmoid). The differentiable rasterization pipeline comprises:

Projection: Each Gaussian is projected from world to camera/image space given camera extrinsics $T_{cw} \in SE(3)$ and projection matrix $P$. The 3D mean projects to the 2D mean $\mu^\prime$. The 2D covariance $\Sigma^\prime$ comes from a local affine approximation that uses the Jacobian $J$ of the pinhole projection, evaluated at the camera-space mean:

$\Sigma^\prime = J R_{cw} \Sigma (R_{cw})^\top J^\top$

where $R_{cw}$ is the rotation part of $T_{cw}$, and $\Sigma = R(q)\operatorname{diag}(s)^2R(q)^\top$. At each pixel location $x_i$, the effective opacity (alpha) of Gaussian $n$ is:

$\alpha_n = o_n \exp\left(-\frac{1}{2}(x_i - \mu_n^\prime)^\top (\Sigma_n^\prime)^{-1} (x_i - \mu_n^\prime)\right)$

Binning and Depth Sorting: Gaussians are assigned to all $16\times 16$ pixel tiles overlapped by their $3\sigma$ projected ellipse. Within tiles, they are sorted front-to-back by mean depth.
Compositing: For pixel $i$, the $N_i$ depth-sorted Gaussians are composited front to back:

$C_i = \sum_{n=1}^{N_i} c_n\,\alpha_n T_n, \quad T_n = \prod_{m < n}(1-\alpha_m)$

This is the front-to-back form of the standard over operator for alpha blending. Because $C_i$ is a smooth function of every $c_n$ and $\alpha_n$, its partial derivatives can be written in closed form.
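The forward pass for a single pixel can be sketched in NumPy. The sketch builds $\Sigma$ from $(s, q)$, projects it with the pinhole Jacobian, evaluates $\alpha_n$, and composites the result. It also checks that front-to-back compositing agrees with applying the over operator back to front. This is an illustrative reimplementation, not GSplat's kernels. The following are assumptions of the sketch: the (w, x, y, z) quaternion order, the intrinsics $(f_x, f_y, c_x, c_y)$, the optional early-exit threshold `t_min`, and all function names.

```python
import numpy as np

def quat_to_rotmat(q):
    """R(q) for a quaternion stored as (w, x, y, z) (ordering assumed)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance_3d(s, q):
    """Sigma = R(q) diag(s)^2 R(q)^T."""
    R = quat_to_rotmat(q)
    return R @ np.diag(np.asarray(s, dtype=float) ** 2) @ R.T

def project(mu, Sigma, R_cw, t_cw, fx, fy, cx, cy):
    """2D mean and Sigma' = J R_cw Sigma R_cw^T J^T for a pinhole camera."""
    x, y, z = R_cw @ mu + t_cw                       # mean in camera space
    mu2d = np.array([fx * x / z + cx, fy * y / z + cy])
    J = np.array([[fx / z, 0.0, -fx * x / z**2],     # d(u, v) / d(x, y, z)
                  [0.0, fy / z, -fy * y / z**2]])
    return mu2d, J @ R_cw @ Sigma @ R_cw.T @ J.T

def alpha_at(pixel, mu2d, Sigma2d, o):
    """alpha = o * exp(-0.5 d^T Sigma'^{-1} d) with d = pixel - mu2d."""
    d = np.asarray(pixel, dtype=float) - mu2d
    return o * np.exp(-0.5 * d @ np.linalg.solve(Sigma2d, d))

def composite_front_to_back(colors, alphas, t_min=0.0):
    """C = sum_n c_n alpha_n T_n, T_n = prod_{m<n} (1 - alpha_m); inputs near to far.
    Returns the color and the transmittance left after the last blended Gaussian."""
    C, T = np.zeros(colors.shape[1]), 1.0
    for c, a in zip(colors, alphas):
        C += c * a * T
        T *= 1.0 - a
        if T < t_min:            # optional early exit once the pixel is nearly opaque
            break
    return C, T

def composite_back_to_front(colors, alphas):
    """The same color via repeated 'over', from the farthest Gaussian forward."""
    C = np.zeros(colors.shape[1])
    for c, a in zip(colors[::-1], alphas[::-1]):
        C = a * c + (1.0 - a) * C
    return C

# One isotropic Gaussian 2 units in front of an identity camera (f = 500 px).
Sigma = covariance_3d([0.1, 0.1, 0.1], [1.0, 0.0, 0.0, 0.0])
mu2d, Sigma2d = project(np.array([0.0, 0.0, 2.0]), Sigma, np.eye(3), np.zeros(3),
                        500.0, 500.0, 128.0, 128.0)
```

In the example, $s = 0.1$ at depth 2 with $f = 500$ gives a projected standard deviation of $500 \cdot 0.1 / 2 = 25$ pixels. `Sigma2d` is therefore $625 I$, and a pixel 25 px from the center receives $\alpha = o\,e^{-1/2}$.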

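Backward gradients are obtained analytically for all Gaussian parameters by differentiating the compositing and projection steps. Gradients flow through $\alpha_n$, $\mu_n^\prime$, and $\Sigma_n^\prime$ back to $\mu$, $s$, $q$, $c$, $o$, and the camera pose. Binning and depth sorting are discrete operations and are not differentiated; the sort order is treated as a constant within each iteration. This enables full end-to-end learning in neural rendering scenarios (Ye et al., 2023, Ye et al., 10 Sep 2024, Nath et al., 2 Dec 2025).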
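For the compositing step, hold the sort order fixed and use $\partial T_n/\partial\alpha_k = -T_n/(1-\alpha_k)$ for $n > k$. This gives

$\frac{\partial C_i}{\partial \alpha_k} = c_k T_k - \frac{1}{1-\alpha_k}\sum_{n>k} c_n \alpha_n T_n, \qquad \frac{\partial C_i}{\partial c_k} = \alpha_k T_k.$

The NumPy sketch below evaluates both derivatives in one back-to-front pass that accumulates the "color behind" sum, then checks them against central finite differences. It illustrates the math only; it is not GSplat's backward kernel, and the function names are hypothetical. The $1/(1-\alpha_k)$ factor is one reason rasterizers typically clamp $\alpha_n$ below 1.

```python
import numpy as np

def composite(colors, alphas):
    """Forward pass: C = sum_n c_n alpha_n T_n (inputs sorted front to back)."""
    T = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])   # T_n
    return (colors * (alphas * T)[:, None]).sum(axis=0)

def composite_grads(colors, alphas):
    """dC/dalpha_k (shape [N, 3]) and dC/dc_k (shape [N]) in one back-to-front pass."""
    T = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    d_alpha = np.zeros_like(colors)
    behind = np.zeros(colors.shape[1])        # running sum_{n>k} c_n alpha_n T_n
    for k in range(len(alphas) - 1, -1, -1):
        d_alpha[k] = colors[k] * T[k] - behind / (1.0 - alphas[k])
        behind += colors[k] * alphas[k] * T[k]
    d_color = alphas * T                      # identical for every color channel
    return d_alpha, d_color
```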

2. Architecture and API Structure

GSplat is structured in two principal layers:

Python/PyTorch Frontend: Exposes differentiable rasterization (implemented as PyTorch autograd functions), camera models, and densification strategies that operate on plain tensors of Gaussian parameters. This lets it integrate directly with standard torch.nn modules and optimizers in training loops.
CUDA Kernel Backend: Implements batched Gaussian projection, tile-based depth sorting, and per-pixel splatting as custom fused CUDA kernels (wrapped by PyBind11), emphasizing memory efficiency and massive parallelism.
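The frontend/backend split follows the standard `torch.autograd.Function` pattern. `forward` computes outputs and saves what the backward pass needs, and `backward` returns hand-derived gradients; in a library like GSplat, both would dispatch to fused CUDA kernels. The sketch below uses plain PyTorch ops for the per-pixel opacity $\alpha = o\exp(-\tfrac12 d^\top Q d)$, with $d = x - \mu^\prime$ and $Q = (\Sigma^\prime)^{-1}$ (often called the conic). The class name and the choice to take $Q$ directly as input are assumptions. `torch.autograd.gradcheck` verifies the analytic backward.

```python
import torch

class GaussianAlpha(torch.autograd.Function):
    """alpha_p = o * exp(-0.5 d_p^T Q d_p), d_p = x_p - mu2d, for P pixels.
    Both passes use plain torch here; a real backend would launch CUDA kernels."""

    @staticmethod
    def forward(ctx, pixels, mu2d, conic, opacity):
        d = pixels - mu2d                                          # [P, 2]
        g = torch.exp(-0.5 * torch.einsum("pi,ij,pj->p", d, conic, d))
        ctx.save_for_backward(d, conic, opacity, g)
        return opacity * g                                         # [P]

    @staticmethod
    def backward(ctx, grad_out):
        d, conic, opacity, g = ctx.saved_tensors
        w = grad_out * opacity * g                                 # dL/dalpha * alpha
        sym = 0.5 * (conic + conic.T)
        grad_mu2d = (w[:, None] * (d @ sym)).sum(dim=0)            # dalpha/dmu = alpha * sym d
        grad_conic = -0.5 * torch.einsum("p,pi,pj->ij", w, d, d)   # dalpha/dQ = -alpha d d^T / 2
        grad_opacity = (grad_out * g).sum()                        # dalpha/do = g
        return None, grad_mu2d, grad_conic, grad_opacity          # pixel coords get no grad
```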

Key modules and classes include:

Module/Class	Functionality	Reference
gsplat.rasterization	Core autograd function for differentiable rasterization	(Ye et al., 10 Sep 2024)
gsplat.strategy	Densification strategies: DefaultStrategy (ADC, with optional AbsGrad) and MCMCStrategy	(Ye et al., 10 Sep 2024)
gsplat.pose	Differentiable pose refinement utilities	(Ye et al., 10 Sep 2024)
gsplat.depth	Depth rendering “mode” for accumulated/expected depth maps	(Ye et al., 10 Sep 2024)
GaussianCollection	Batch structure for $(\mu,s,q,c,o)$ arrays	(Ye et al., 2023)
Rasterizer	High-level interface: tile size config, forward/backward pass	(Ye et al., 2023)

These layers are exposed such that researchers can implement and customize splatting pipelines, extend CUDA components, or interleave GSplat with arbitrary PyTorch architectures (Ye et al., 2023, Ye et al., 10 Sep 2024).

3. Optimization Techniques and Densification Strategies

GSplat incorporates several optimization strategies to accelerate convergence, reduce peak memory, and maintain representation quality:

ADC (Adaptive Density Control): Accumulates the norm of each Gaussian's view-space (2D) positional gradient over training views. When the average exceeds a threshold, small Gaussians are cloned and large ones are split. Low-opacity elements ($o < 0.005$) are pruned periodically.
AbsGrad: Sums the absolute values of the per-pixel components of the view-space positional gradient. Contributions pointing in opposite directions therefore cannot cancel, which makes densification decisions more robust.
MCMC Strategy: Treats the Gaussians as samples of a Markov chain Monte Carlo process updated with Stochastic Gradient Langevin Dynamics. Noise is injected into Gaussian positions. Instead of heuristic splitting, low-opacity Gaussians are relocated and new ones are added only up to a fixed cap on the total count, realizing up to $4\times$ memory reduction compared to naive splitting (Ye et al., 10 Sep 2024).
Anti-Aliasing/Mip-Splatting: An optional screen-space low-pass filter widens each projected 2D covariance and rescales its opacity to compensate. This reduces aliasing when rendering at resolutions other than the training resolution (Ye et al., 10 Sep 2024).
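The ADC split/clone/prune decision, and the way AbsGrad prevents gradient cancellation, can be sketched in a few lines of NumPy. The thresholds are assumptions here, as are the function names: `grow_grad2d=2e-4`, `grow_scale3d=0.01` times the scene scale, and `prune_opa=0.005` are the commonly used 3DGS defaults. The split and clone operations themselves (sampling new means, shrinking scales) are omitted.

```python
import numpy as np

def view_grad_norm(pixel_grads, absgrad=False):
    """Norm of one Gaussian's 2D positional gradient for one view.
    pixel_grads: [P, 2] per-pixel contributions. With absgrad, components are
    summed in absolute value, so opposing pixel gradients cannot cancel."""
    g = np.abs(pixel_grads).sum(axis=0) if absgrad else pixel_grads.sum(axis=0)
    return np.linalg.norm(g)

def adc_masks(grad_accum, view_count, scales, opacities, scene_scale,
              grow_grad2d=2e-4, grow_scale3d=0.01, prune_opa=0.005):
    """Boolean masks (clone, split, prune) for one refinement step.
    grad_accum: summed per-view gradient norms; view_count: views each Gaussian was seen in."""
    avg_grad = grad_accum / np.maximum(view_count, 1)
    high_grad = avg_grad > grow_grad2d
    small = scales.max(axis=1) <= grow_scale3d * scene_scale
    clone = high_grad & small        # under-reconstruction: duplicate small Gaussians
    split = high_grad & ~small       # over-reconstruction: split large Gaussians
    prune = opacities < prune_opa    # nearly transparent Gaussians are removed
    return clone, split, prune
```

Two pixels pulling a Gaussian in opposite directions give a plain gradient norm of zero, so ADC would never densify it. The AbsGrad variant still registers the combined magnitude.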

Extension points in the API permit insertion of custom CUDA routines. Because rendered outputs are ordinary PyTorch tensors, they can be fed to any PyTorch loss function or module.

4. Performance Benchmarks and Engineering

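GSplat implements several engineering optimizations. These include tile-level CUDA kernel fusion, streaming tile-wise rendering to minimize memory, and an efficient radix sort over packed (tile, depth) keys. The sort is followed by a prefix-sum pass that gives each tile its contiguous range of Gaussians.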
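The sort-then-scan structure can be sketched on the CPU in four steps:

1. Each Gaussian's $3\sigma$ radius is taken from the largest eigenvalue of its projected covariance $\Sigma^\prime$.
2. Every overlapped $16\times16$ tile gets one entry, keyed by a 64-bit integer: the tile index in the high 32 bits and the float32 depth bits in the low 32. This is the layout used by the original 3DGS rasterizer.
3. One sort orders all entries by tile, then by depth.
4. A per-tile count plus a prefix sum yields each tile's `[start, end)` range.

GSplat's exact key encoding and offset computation may differ, and the function names are hypothetical.

```python
import numpy as np

def bin_and_sort(means2d, covs2d, depths, width, height, tile=16):
    """(tile_ids, gaussian_ids) for every tile/Gaussian overlap, sorted by tile
    and, within a tile, front to back by depth (depths must be > 0)."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    keys, gauss_ids = [], []
    for g, (m, cov, z) in enumerate(zip(means2d, covs2d, depths)):
        radius = 3.0 * np.sqrt(np.linalg.eigvalsh(cov).max())   # 3-sigma extent
        x0 = max(int(np.floor((m[0] - radius) / tile)), 0)
        x1 = min(int(np.floor((m[0] + radius) / tile)), tiles_x - 1)
        y0 = max(int(np.floor((m[1] - radius) / tile)), 0)
        y1 = min(int(np.floor((m[1] + radius) / tile)), tiles_y - 1)
        # Positive float32 depths order the same way as their raw uint32 bit patterns.
        depth_bits = int(np.array([z], dtype=np.float32).view(np.uint32)[0])
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                keys.append(((ty * tiles_x + tx) << 32) | depth_bits)
                gauss_ids.append(g)
    order = np.argsort(np.array(keys, dtype=np.uint64), kind="stable")
    return [keys[i] >> 32 for i in order], [gauss_ids[i] for i in order]

def tile_ranges(sorted_tile_ids, num_tiles):
    """[start, end) offsets into the sorted entry list for every tile."""
    counts = np.bincount(np.asarray(sorted_tile_ids, dtype=np.int64), minlength=num_tiles)
    ends = np.cumsum(counts)          # inclusive prefix sum
    starts = ends - counts            # exclusive prefix sum
    return np.stack([starts, ends], axis=1)
```

A tile's thread block can then walk entries `start` to `end` in order. That order is front to back, because depth occupies the low bits of the key.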

Metric	3DGS (orig)	GSplat (ADC/MCMC)	Reference
Training Time (30k iters)	26.19 min	19.39 min	(Ye et al., 10 Sep 2024)
Peak GPU Mem (30k iters)	9.0 GB	5.6 GB (ADC), 1.98 GB (MCMC)	(Ye et al., 10 Sep 2024)
Novel-view PSNR/SSIM/LPIPS	baseline	matched or exceeded the baseline	(Ye et al., 10 Sep 2024)
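For reference, the ratios implied by the reported figures, relative to the original 3DGS implementation, are:

```latex
\begin{aligned}
\text{training time:}\quad & 26.19 / 19.39 \approx 1.35\times \text{ faster } (\approx 26\%\ \text{less time}) \\
\text{peak memory (ADC):}\quad & 9.0 / 5.6 \approx 1.61\times \text{ lower} \\
\text{peak memory (MCMC):}\quad & 9.0 / 1.98 \approx 4.55\times \text{ lower}
\end{aligned}
```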

On an NVIDIA A100, 1 million Gaussians can be rendered into $640\times480$ images at approximately 30 FPS (Ye et al., 2023). In real-time medical and surgical deployments such as G-SHARP, GSplat exceeds 60 FPS at $640\times512$ resolution while maintaining a PSNR of around 38 dB (Nath et al., 2 Dec 2025).

5. Licensing, Distribution, and Commercial Use

GSplat is distributed under the permissive Apache License 2.0. Key terms:

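Royalty-free, worldwide patent license from each contributor covering its contributions. The license terminates for anyone who brings patent litigation over the work.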
Mandatory retention of the license and any NOTICE file in derivative works.
No copyleft or viral provisions (can be used in proprietary products).
Commercial deployment is permitted, including in closed-source contexts (e.g., medical devices, commercial AR suites). The license grants no trademark rights and is provided without warranty.
Source distribution available via both PyPI (pip install gsplat) and GitHub (Ye et al., 2023, Ye et al., 10 Sep 2024).

In prominent downstream projects such as G-SHARP, all core CUDA, Python, and auxiliary modules are likewise licensed under Apache-2.0; this includes auxiliary components such as the HexPlane+MLP deformable reconstruction. These projects have no non-commercial or derivative-only dependencies (Nath et al., 2 Dec 2025).

6. Community Ecosystem and Research Extensions

GSplat serves as a reference implementation in several open-source and research projects:

GauStudio (Ye et al. 2024): Modular editing/feature-field distillation, building directly atop GSplat’s API.
Mip-Splatting (Yu et al. 2024): Alias-free rendering; its filtering approach underlies GSplat's antialiased rasterization mode.
EfficientGS (Liu et al. 2024): Large-scale, multi-GPU scene scaling via point-based culling.
G-SHARP (Nath et al., 2 Dec 2025): Commercial, real-time surgical scene modeling leveraging GSplat as rasterization backbone; achieves state-of-the-art speed and accuracy for intra-operative tissue reconstruction with production-ready Holoscan SDK integration.

The project is maintained on GitHub, with standard open-source contribution guidelines, continuous integration, and an active issue/PR workflow (Ye et al., 10 Sep 2024). API extensibility enables adoption for emerging research directions, including custom densification, hybrid representations, and integration with neural field models.

7. Usage Example and API Overview

A typical workflow constructs the Gaussian parameter tensors and invokes the rasterization core inside a training or inference loop:

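```python
# Assumes the gsplat 1.x API (gsplat.rasterization and gsplat.strategy).
import torch
import torch.nn.functional as F
from gsplat import rasterization
from gsplat.strategy import MCMCStrategy

device = "cuda"
N, H, W = 100_000, 240, 240

# Learnable Gaussian parameters. Scales and opacities are kept in log/logit
# space, the convention the densification strategies assume when editing them.
params = torch.nn.ParameterDict({
    "means": torch.nn.Parameter(torch.randn(N, 3, device=device) * 0.5
                                + torch.tensor([0.0, 0.0, 3.0], device=device)),
    "quats": torch.nn.Parameter(F.normalize(torch.randn(N, 4, device=device), dim=-1)),
    "scales": torch.nn.Parameter(torch.full((N, 3), 0.01, device=device).log()),
    "opacities": torch.nn.Parameter(torch.logit(torch.full((N,), 0.1, device=device))),
    "colors": torch.nn.Parameter(torch.rand(N, 3, device=device)),
})
# The strategy API expects one optimizer per parameter name.
means_lr = 1.6e-4
optimizers = {k: torch.optim.Adam([v], lr=means_lr if k == "means" else 1e-3)
              for k, v in params.items()}

# One camera: identity world-to-camera pose and pinhole intrinsics.
viewmats = torch.eye(4, device=device)[None]                       # [1, 4, 4]
Ks = torch.tensor([[500.0, 0.0, W / 2], [0.0, 500.0, H / 2], [0.0, 0.0, 1.0]],
                  device=device)[None]                             # [1, 3, 3]
target = torch.rand(1, H, W, 3, device=device)  # placeholder for real training images

strategy = MCMCStrategy()
strategy.check_sanity(params, optimizers)
state = strategy.initialize_state()

for step in range(10_000):
    rgb, alpha, info = rasterization(
        params["means"], params["quats"], params["scales"].exp(),
        params["opacities"].sigmoid(), params["colors"], viewmats, Ks,
        width=W, height=H, rasterize_mode="antialiased")  # rgb: [1, H, W, 3]
    strategy.step_pre_backward(params, optimizers, state, step, info)
    loss = F.mse_loss(rgb, target)
    loss.backward()
    for opt in optimizers.values():
        opt.step()
        opt.zero_grad(set_to_none=True)
    # MCMC: relocates low-opacity Gaussians, adds new ones up to a cap, injects noise.
    strategy.step_post_backward(params, optimizers, state, step, info, lr=means_lr)
```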

The public API enables forward and backward passes for rasterization, integration with arbitrary PyTorch models, direct extensibility via custom kernels/losses, and a range of research-driven configuration options (e.g., antialiasing, densification strategy, depth-only rendering) (Ye et al., 10 Sep 2024).