WebGPU Gaussian Splatting

Updated 12 June 2026
  • These papers introduce fully GPU-resident neural rendering pipelines that deliver interactive scene synthesis by harnessing the WebGPU API.
  • They unify neural inference and high-parallelism GPU operations through optimized methods such as radix sort and bit-packed memory management.
  • The approaches achieve up to a 135× speedup over legacy WebGL techniques while significantly reducing memory usage in real-time 3D rendering.

WebGPU-powered Gaussian Splatting refers to a class of fully GPU-resident rendering architectures for 3D Gaussian Splatting (3DGS) that leverage the WebGPU API to deliver interactive neural scene synthesis—typically in a browser—with high throughput, cross-device determinism, and the ability to efficiently handle dynamic, generative, and avatar-driven content. These systems unify neural inference (often via ONNX Runtime in WebAssembly) and high-parallelism rendering in the browser, enabling end-to-end neural graphics pipelines with real-time frame rates and significantly reduced deployment friction compared to legacy WebGL-based approaches (Gong et al., 9 Dec 2025, Han et al., 3 Feb 2026).

1. Mathematical Foundations of Gaussian Splatting

Gaussian Splatting represents a scene volumetrically as a collection of 3D Gaussians. Each Gaussian is parameterized by its center $X \in \mathbb{R}^3$, covariance $\Sigma \in \mathbb{R}^{3\times3}$, peak opacity $\sigma$, and color coefficients $k_i$ (typically learned spherical harmonic (SH) coefficients). The continuous (unnormalized) density is

$$G(\mathbf{x}; X, \Sigma) = e^{-\frac{1}{2} (\mathbf{x}-X)^{T} \Sigma^{-1} (\mathbf{x}-X)}.$$

A 3D Gaussian projects to a 2D Gaussian in screen space under the camera model. The per-pixel alpha contribution is

$$\alpha = \sigma \cdot \exp\left(-\frac{1}{2}\, \mathbf{x}'^{T} \Sigma'^{-1} \mathbf{x}'\right),$$

where $\mathbf{x}'$ is the 2D offset of the pixel from the splat's projected center and $\Sigma'$ is the projected 2D covariance.
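The per-pixel alpha formula can be made concrete with a short TypeScript sketch that evaluates $\alpha$ for a single pixel. It assumes the projected covariance $\Sigma'$ is stored as the three unique entries of a symmetric 2×2 matrix. The names `Cov2D` and `splatAlpha` are illustrative and do not come from either paper; a real renderer would perform this computation in a WGSL shader.

```typescript
// Symmetric 2×2 projected covariance Σ' = [[a, b], [b, c]] (illustrative layout).
interface Cov2D {
  a: number;
  b: number;
  c: number;
}

// α = σ · exp(-½ · x'ᵀ Σ'⁻¹ x') for a pixel at offset x' = (dx, dy) from the splat center.
function splatAlpha(sigma: number, dx: number, dy: number, cov: Cov2D): number {
  const det = cov.a * cov.c - cov.b * cov.b;
  if (det <= 0) return 0; // degenerate (non-positive-definite) ellipse contributes nothing
  // Inverse of [[a, b], [b, c]] is (1 / det) · [[c, -b], [-b, a]].
  const invA = cov.c / det;
  const invB = -cov.b / det;
  const invC = cov.a / det;
  // Quadratic form x'ᵀ Σ'⁻¹ x' (squared Mahalanobis distance).
  const q = invA * dx * dx + 2 * invB * dx * dy + invC * dy * dy;
  return sigma * Math.exp(-0.5 * q);
}

// One unit of Mahalanobis distance from the center of an isotropic splat: 0.8 · e^(-1/2).
const example = splatAlpha(0.8, 1, 0, { a: 1, b: 0, c: 1 });
console.log(example.toFixed(3)); // "0.485"
```

At the splat center the quadratic form is zero, so $\alpha$ equals the peak opacity $\sigma$. Alpha then falls off along the ellipse defined by $\Sigma'$.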

Color per splat is view-dependent and is typically evaluated from the SH coefficients as $c_i = \mathrm{SH}(k_i, d)$, where $d$ is the viewing direction. Final compositing follows sorted alpha blending, with splats indexed in depth order from nearest ($i = 1$) to farthest:

$$C = \sum_{i=1}^{N} c_i\,\alpha_i \prod_{j=1}^{i-1} (1-\alpha_j).$$

This formulation underpins both Visionary (Gong et al., 9 Dec 2025) and WebSplatter (Han et al., 3 Feb 2026).
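The compositing sum can be evaluated incrementally by carrying the transmittance $\prod_{j<i}(1-\alpha_j)$ along the sorted list. The sketch below assumes the fragments for a pixel are already sorted nearest first. It adds an early-termination cutoff (`minTransmittance`, a hypothetical parameter), a common rasterizer optimization that slightly approximates the exact sum. For comparison, it also includes the back-to-front "over" form that fixed-function blending uses. All names are illustrative.

```typescript
type RGB = [number, number, number];

interface Fragment {
  color: RGB; // c_i, already evaluated from the SH coefficients for this view
  alpha: number; // α_i at this pixel
}

// Front-to-back evaluation of C = Σ_i c_i α_i Π_{j<i} (1 - α_j).
// `fragments` must be sorted nearest first.
function compositeFrontToBack(fragments: Fragment[], minTransmittance = 1e-4): RGB {
  const out: RGB = [0, 0, 0];
  let transmittance = 1; // running product Π_{j<i} (1 - α_j)
  for (const f of fragments) {
    const w = f.alpha * transmittance;
    for (let k = 0; k < 3; k++) out[k] += w * f.color[k];
    transmittance *= 1 - f.alpha;
    // Optional early exit: the remaining splats can barely change the pixel.
    if (transmittance < minTransmittance) break;
  }
  return out;
}

// Back-to-front "over" blending: C ← c_i α_i + (1 - α_i) C, farthest splat first.
function compositeBackToFront(fragmentsFarFirst: Fragment[]): RGB {
  const out: RGB = [0, 0, 0];
  for (const f of fragmentsFarFirst) {
    for (let k = 0; k < 3; k++) out[k] = f.color[k] * f.alpha + (1 - f.alpha) * out[k];
  }
  return out;
}
```

Both orders yield the same color (up to the early-exit approximation). This is why a renderer that sorts back to front and relies on hardware blending still computes the formula above.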

2. End-to-End Pipeline Architecture

WebGPU-powered 3DGS platforms unify all major pipeline stages—asset loading, neural inference, depth sorting, geometry culling, rasterization, and postprocessing—entirely in the browser with explicit GPU-side memory management:

  1. Asset loading: Supports mesh assets (e.g., glTF, PLY) and ONNX "Gaussian Generator" models for variants such as classic 3DGS, MLP-based 3DGS, 4DGS, neural avatars, or style enhancement networks (Gong et al., 9 Dec 2025).
  2. Per-frame neural pre-decoding: Executes ONNX models in WebGPU (via ONNX Runtime WASM provider), computing per-frame Gaussian attributes given camera pose, frame index, and control signals (Gong et al., 9 Dec 2025).
  3. GPU-side preprocessing: Compute shaders project, cull (via AABB and opacity thresholds), and pre-transform Gaussians into screen-space ellipses and packed depth keys (Han et al., 3 Feb 2026).
  4. Fully GPU-resident sorting: Per-frame back-to-front depth sorting via radix sort; WebSplatter's variant relies on deterministic hierarchical scans and eliminates the need for global atomics (Han et al., 3 Feb 2026).
  5. Rasterization and Compositing: Instanced splat rasterization with alpha compositing in correct depth order, optionally preceded by a mesh depth pre-pass (Gong et al., 9 Dec 2025, Han et al., 3 Feb 2026).
  6. Post-processing (optional): Feedforward ONNX models for denoising or style transfer can be invoked and applied in a fully GPU-side manner (Gong et al., 9 Dec 2025).
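Step 3 produces packed depth keys. One common way to make floating-point depths sortable by an integer radix sort is to remap the IEEE 754 bits so that unsigned integer order matches float order. The CPU sketch below does this and then inverts each key, so an ascending sort yields back-to-front order. The encoding and the function names are assumptions for illustration; the exact key layout used by Visionary and WebSplatter may differ.

```typescript
// Scratch views for reinterpreting a float32 as its raw IEEE 754 bits.
const floatView = new Float32Array(1);
const bitsView = new Uint32Array(floatView.buffer);

// Map a float32 to a uint32 whose unsigned order matches the float order.
function floatToSortableKey(x: number): number {
  floatView[0] = x;
  const bits = bitsView[0];
  // Negative floats: flip every bit. Non-negative floats: set the sign bit.
  return (bits & 0x80000000 ? ~bits : bits | 0x80000000) >>> 0;
}

// Build (key, index) pairs so that an ascending sort by key gives back-to-front order.
function buildBackToFrontKeys(viewDepths: Float32Array): { keys: Uint32Array; indices: Uint32Array } {
  const keys = new Uint32Array(viewDepths.length);
  const indices = new Uint32Array(viewDepths.length);
  for (let i = 0; i < viewDepths.length; i++) {
    // Invert the key so larger (farther) depths sort first.
    keys[i] = ~floatToSortableKey(viewDepths[i]) >>> 0;
    indices[i] = i;
  }
  return { keys, indices };
}

const { keys, indices } = buildBackToFrontKeys(new Float32Array([1.0, 5.0, 3.0]));
const order = Array.from(indices).sort((a, b) => keys[a] - keys[b]);
console.log(order); // [1, 2, 0]: farthest splat first
```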

3. WebGPU Renderer Design, Sorting, and Memory

Primitive Sorting

Sorting millions of splats by view depth every frame is critical for correct compositing and numerical stability. Visionary implements radix sort in a single WebGPU compute pass, achieving $O(N)$ work; the reported timing is … ms for …M splats on an RTX 4090 (Gong et al., 9 Dec 2025).
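GPU radix sorts parallelize a structure that can be written as a sequential least-significant-digit (LSD) radix sort over 32-bit keys with an index payload. Each of four 8-bit passes consists of a digit histogram, an exclusive prefix sum, and a stable scatter. This is a CPU sketch of the general technique, not Visionary's shader. A GPU version splits the histogram and scatter across workgroups, and the digit width and pass organization are implementation choices.

```typescript
// LSD radix sort of (key, value) pairs by 32-bit unsigned key:
// four 8-bit passes, each = histogram, exclusive prefix sum, stable scatter.
// Sorts `keys` and `values` in place (via ping-pong buffers).
function radixSortPairs(keys: Uint32Array, values: Uint32Array): void {
  const n = keys.length;
  let srcK: Uint32Array = keys;
  let srcV: Uint32Array = values;
  let dstK: Uint32Array = new Uint32Array(n);
  let dstV: Uint32Array = new Uint32Array(n);
  const offsets = new Uint32Array(256);

  for (let shift = 0; shift < 32; shift += 8) {
    // 1. Histogram of the current 8-bit digit.
    offsets.fill(0);
    for (let i = 0; i < n; i++) offsets[(srcK[i] >>> shift) & 0xff]++;
    // 2. Exclusive prefix sum: counts become each digit's first output slot.
    let running = 0;
    for (let d = 0; d < 256; d++) {
      const count = offsets[d];
      offsets[d] = running;
      running += count;
    }
    // 3. Stable scatter keeps the order produced by lower digits.
    for (let i = 0; i < n; i++) {
      const slot = offsets[(srcK[i] >>> shift) & 0xff]++;
      dstK[slot] = srcK[i];
      dstV[slot] = srcV[i];
    }
    // Ping-pong: this pass's output is the next pass's input.
    [srcK, dstK] = [dstK, srcK];
    [srcV, dstV] = [dstV, srcV];
  }
  // After an even number of passes (4), the sorted data is back in `keys`/`values`.
}

const sortKeys = new Uint32Array([5, 1, 0xffffffff, 3, 1]);
const sortValues = new Uint32Array([0, 1, 2, 3, 4]);
radixSortPairs(sortKeys, sortValues);
console.log(Array.from(sortKeys), Array.from(sortValues)); // [1, 1, 3, 5, 4294967295] [1, 4, 3, 0, 2]
```

The stable scatter is what makes LSD order correct. Ties on the current digit keep the order established by the lower digits, so the two keys equal to 1 keep their original payload order (1, then 4).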

WebSplatter introduces a wait-free hierarchical radix sort adapted to the constraints of WebGPU (no global atomics, no fixed scheduling order). The algorithm decomposes sorting into:

  • Local histogram computation per workgroup (shared memory).
  • Hierarchical prefix scan ("HierBlelloch") for cross-workgroup offsets, implemented as a sequence of dispatches with only intra-workgroup barriers.
  • Global scatter, writing sorted keys/indices without spin-waiting.

This approach achieves $O(N)$ work per pass (4 passes), is deterministic and deadlock-free, and runs on a wide range of hardware (Han et al., 3 Feb 2026).
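The cross-workgroup offset step can be modeled as a hierarchical exclusive scan that runs as a sequence of dispatches:

- Each workgroup scans its own block with the work-efficient Blelloch up-sweep/down-sweep.
- A separate dispatch scans the block totals.
- A final dispatch adds each block's offset to its elements.

Every stage depends only on the previous dispatch's results, so no workgroup spins waiting on another. The sketch below simulates this on the CPU. `WORKGROUP_SIZE = 8` is a deliberately tiny, hypothetical value, and WebSplatter's actual "HierBlelloch" kernels may be organized differently.

```typescript
// Hypothetical, deliberately tiny workgroup size (must be a power of two).
const WORKGROUP_SIZE = 8;

// Work-efficient (Blelloch) exclusive scan of one block, in place; returns the block total.
// On a GPU, each tree level runs in parallel inside one workgroup, separated by barriers.
function blellochBlockScan(block: Uint32Array): number {
  const n = block.length; // assumed to be a power of two
  for (let d = 1; d < n; d *= 2) {
    // Up-sweep: build partial sums in a binary tree.
    for (let i = 2 * d - 1; i < n; i += 2 * d) block[i] += block[i - d];
  }
  const total = block[n - 1];
  block[n - 1] = 0;
  for (let d = n / 2; d >= 1; d /= 2) {
    // Down-sweep: push prefixes back down the tree.
    for (let i = 2 * d - 1; i < n; i += 2 * d) {
      const left = block[i - d];
      block[i - d] = block[i];
      block[i] += left;
    }
  }
  return total;
}

// Exclusive scan of any length as a sequence of dispatches; no workgroup waits on another.
function hierarchicalScan(input: Uint32Array): Uint32Array {
  const numBlocks = Math.max(1, Math.ceil(input.length / WORKGROUP_SIZE));
  const data = new Uint32Array(numBlocks * WORKGROUP_SIZE); // zero-padded copy
  data.set(input);
  // Dispatch 1: every workgroup scans its own block and reports its total.
  const blockTotals = new Uint32Array(numBlocks);
  for (let b = 0; b < numBlocks; b++) {
    blockTotals[b] = blellochBlockScan(data.subarray(b * WORKGROUP_SIZE, (b + 1) * WORKGROUP_SIZE));
  }
  if (numBlocks > 1) {
    // Dispatch 2 (recursive): scan the block totals to get each block's global offset.
    const blockOffsets = hierarchicalScan(blockTotals);
    // Dispatch 3: add the block offset to every element of the block.
    for (let i = 0; i < data.length; i++) data[i] += blockOffsets[Math.floor(i / WORKGROUP_SIZE)];
  }
  return data.slice(0, input.length);
}

console.log(Array.from(hierarchicalScan(new Uint32Array([3, 1, 7, 0, 4, 1, 6, 3, 2]))));
// [0, 3, 4, 11, 11, 15, 16, 22, 25]
```

In a GPU radix sort, the input to such a scan is typically the per-workgroup digit histograms laid out digit-major. The scanned result then gives each workgroup its scatter offset for every digit.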

Memory Management

  • Gaussian parameters are bit-packed (e.g., FP16 values stored in pairs) to halve bandwidth, reducing per-Gaussian memory from roughly 32 B to 16 B (Gong et al., 9 Dec 2025).
  • Single monolithic buffers reduce allocation overhead and CPU-GPU synchronization.
  • Visionary uses atomic counters and indirect dispatch buffers to dynamically size output arrays per frame (Gong et al., 9 Dec 2025).
  • WebSplatter reports a GPU memory footprint of … GB (RTX 3070, "garden" scene, …M splats), a reduction of … to … relative to prior viewers (Han et al., 3 Feb 2026).
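WGSL exposes FP16 packing directly through the `pack2x16float` builtin. It converts two `f32` values to binary16 and stores the first in the low 16 bits of a `u32`. A CPU-side mirror can be handy when preparing buffers in TypeScript before upload. The sketch below implements the conversion by hand with round-to-nearest-even. Which Gaussian attributes are packed this way, and in what order, is left open; the helper names are illustrative.

```typescript
// Scratch views for reading the raw bits of a float32.
const f32Scratch = new Float32Array(1);
const u32Scratch = new Uint32Array(f32Scratch.buffer);

// Convert a number to IEEE 754 binary16 bits, rounding to nearest even.
function toHalfBits(x: number): number {
  f32Scratch[0] = x; // first rounds to float32, as a WGSL f32 input would be
  const bits = u32Scratch[0];
  const sign = (bits >>> 16) & 0x8000;
  const exp32 = (bits >>> 23) & 0xff;
  const mant32 = bits & 0x7fffff;

  if (exp32 === 0xff) return sign | 0x7c00 | (mant32 !== 0 ? 0x200 : 0); // Inf / NaN
  const exp16 = exp32 - 127 + 15; // re-bias the exponent
  if (exp16 >= 0x1f) return sign | 0x7c00; // too large: ±Inf

  if (exp16 <= 0) {
    // Result is subnormal (or zero) in binary16.
    if (exp16 < -10) return sign; // below half the smallest subnormal: ±0
    const full = mant32 | 0x800000; // restore the implicit leading 1
    const shift = 14 - exp16;
    let m = full >>> shift;
    const rem = full & ((1 << shift) - 1);
    const halfway = 1 << (shift - 1);
    if (rem > halfway || (rem === halfway && (m & 1) !== 0)) m++;
    return sign | m;
  }

  // Normal case: keep the top 10 mantissa bits and round on the 13 dropped bits.
  let h = sign | (exp16 << 10) | (mant32 >>> 13);
  const dropped = mant32 & 0x1fff;
  if (dropped > 0x1000 || (dropped === 0x1000 && (h & 1) !== 0)) h++; // a carry may correctly reach Inf
  return h;
}

// CPU mirror of WGSL pack2x16float: `a` goes to bits 0..15, `b` to bits 16..31.
function pack2x16float(a: number, b: number): number {
  return (toHalfBits(a) | (toHalfBits(b) << 16)) >>> 0;
}

// Decode binary16 bits back to a number (useful for checking round trips).
function fromHalfBits(h: number): number {
  const sign = h & 0x8000 ? -1 : 1;
  const exp = (h >>> 10) & 0x1f;
  const mant = h & 0x3ff;
  if (exp === 0) return sign * mant * Math.pow(2, -24);
  if (exp === 0x1f) return mant !== 0 ? NaN : sign * Infinity;
  return sign * (1 + mant / 1024) * Math.pow(2, exp - 15);
}

const word = pack2x16float(1.0, -2.0);
console.log(word.toString(16)); // "c0003c00"
```

The WGSL builtin leaves results for values outside the finite f16 range unspecified. This mirror saturates them to infinity instead, so callers should clamp inputs if they need to match GPU behavior exactly.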

4. Geometry Culling and Rasterization Optimizations

Both frameworks implement geometry-level pruning to minimize overdraw and improve memory efficiency:

  • Screen-space AABB culling: Each Gaussian's screen-projected ellipse is rapidly bounded and checked for viewport intersection.
  • Opacity-based culling: Splats whose opacity falls below a threshold $\tau$ are excluded. Quad size is adjusted dynamically by solving $\sigma \exp\left(-\frac{1}{2} r^2\right) = \tau$ for the Mahalanobis radius $r$, so only fragments where $\alpha \ge \tau$ are rasterized (Han et al., 3 Feb 2026).
  • Disabling tight quad bounds incurs a significant performance penalty (e.g., … render time on a MacBook Air M4) (Han et al., 3 Feb 2026).

The net effect is a substantial reduction in fragment shading work, peak GPU memory, and overdraw.
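Opacity-aware quad sizing follows directly from the alpha formula. For an opacity threshold $\tau$, solving $\sigma \exp(-\tfrac{1}{2} r^2) = \tau$ gives $r = \sqrt{2 \ln(\sigma/\tau)}$. The ellipse $\mathbf{x}'^{T}\Sigma'^{-1}\mathbf{x}' \le r^2$ then has axis-aligned half-extents $r\sqrt{\Sigma'_{xx}}$ and $r\sqrt{\Sigma'_{yy}}$. The sketch below combines this with viewport AABB culling. The function name and the axis-aligned box are illustrative assumptions; WebSplatter's exact quad construction may differ, for example by orienting the quad along the ellipse axes.

```typescript
// Symmetric 2×2 projected covariance Σ' = [[a, b], [b, c]] in pixels² (illustrative layout).
interface Cov2D {
  a: number;
  b: number;
  c: number;
}

interface ScreenAABB {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

// Screen-space box covering every pixel where α = σ·exp(-½·x'ᵀΣ'⁻¹x') ≥ tau,
// or null if the splat is culled by opacity or lies entirely outside the viewport.
function opacityAwareBounds(
  cx: number,
  cy: number,
  sigma: number,
  cov: Cov2D,
  tau: number,
  width: number,
  height: number,
): ScreenAABB | null {
  // Opacity culling: if even the center is at or below tau, nothing is visible.
  if (sigma <= tau) return null;
  // Solve σ·exp(-r²/2) = tau for the Mahalanobis radius r.
  const r = Math.sqrt(2 * Math.log(sigma / tau));
  // Tight axis-aligned half-extents of the ellipse x'ᵀΣ'⁻¹x' ≤ r².
  const hx = r * Math.sqrt(cov.a);
  const hy = r * Math.sqrt(cov.c);
  const box = { minX: cx - hx, minY: cy - hy, maxX: cx + hx, maxY: cy + hy };
  // Viewport AABB test against [0, width] × [0, height].
  if (box.maxX < 0 || box.maxY < 0 || box.minX > width || box.minY > height) return null;
  return box;
}

// Example with tau = 1/255 (a cutoff often used in 3DGS rasterizers; only an example here).
const box = opacityAwareBounds(640, 360, 0.9, { a: 16, b: 4, c: 9 }, 1 / 255, 1280, 720);
console.log(box); // roughly ±13.2 px horizontally and ±9.9 px vertically around (640, 360)
```

A fixed-size quad would be sized for the worst case. Scaling $r$ by the splat's own opacity means faint splats get much smaller quads, which is where the savings in fragment work come from.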

5. Neural Inference Integration and Dynamic Content

WebGPU-powered platforms support dynamic or generative Gaussian content via ONNX Runtime integration:

  • Gaussian Generator Contract: A fixed ONNX I/O schema takes camera/projection matrices, a frame index, and control signals as inputs. It produces positions, covariances, colors, and opacities as outputs, plus metadata such as the SH degree. This enables stateless, per-frame neural generation and updating of Gaussians (Gong et al., 9 Dec 2025).
  • Lifecycle: Models are exported (e.g., PyTorch to ONNX, possibly with chunking of large ops), loaded and warm-started in the browser, then invoked per frame to generate splat parameters for rendering (Gong et al., 9 Dec 2025).
  • Extensibility: Plug-and-play support for different 3DGS variants (classic, MLP-based, 4DGS, neural avatars, single-shot pixel splat networks). Optional post-processing (diffusion, style transfer) via additional ONNX passes allows direct composable neural graphics (Gong et al., 9 Dec 2025).
  • APIs: TypeScript interfaces (e.g., three.js plugin) permit streamlined integration for web applications; sample code provided for end-to-end inference and rendering (Gong et al., 9 Dec 2025).
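The Gaussian Generator contract can be expressed as TypeScript types plus a shape check run before GPU upload. Every name, field layout, and shape below is a hypothetical assumption for illustration, not Visionary's published schema. This includes the column-major 4×4 matrices, the six unique covariance entries per Gaussian, and the $(d+1)^2$ SH coefficients per color channel. The ONNX Runtime call that would actually produce the tensors is omitted and replaced by a toy generator.

```typescript
// Hypothetical per-frame "Gaussian Generator" contract. All names, layouts, and
// shapes are illustrative assumptions, not Visionary's published schema.
interface GeneratorInputs {
  view: Float32Array; // 4×4 view matrix, column-major (16 floats)
  projection: Float32Array; // 4×4 projection matrix (16 floats)
  frameIndex: number; // frame index for dynamic / 4D content
  control: Float32Array; // model-specific control signals (e.g., avatar pose)
}

interface GeneratorOutputs {
  count: number; // N, the number of Gaussians produced this frame
  positions: Float32Array; // N × 3
  covariances: Float32Array; // N × 6 (assumed order xx, xy, xz, yy, yz, zz)
  colors: Float32Array; // N × 3 × (shDegree + 1)² SH coefficients
  opacities: Float32Array; // N
}

interface GeneratorMetadata {
  shDegree: number;
}

// Stateless: each call depends only on its inputs, so frames are generated independently.
type GaussianGenerator = (inputs: GeneratorInputs) => Promise<GeneratorOutputs>;

type ArrayField = Exclude<keyof GeneratorOutputs, "count">;

// Check tensor sizes against the contract before uploading them to GPU buffers.
function validateOutputs(out: GeneratorOutputs, meta: GeneratorMetadata): string[] {
  const n = out.count;
  const shCoeffs = (meta.shDegree + 1) ** 2;
  const expected: [ArrayField, number][] = [
    ["positions", n * 3],
    ["covariances", n * 6],
    ["colors", n * 3 * shCoeffs],
    ["opacities", n],
  ];
  const errors: string[] = [];
  for (const [field, length] of expected) {
    const actual = out[field].length;
    if (actual !== length) errors.push(`${field}: expected ${length} floats, got ${actual}`);
  }
  return errors;
}

// Toy stand-in for an ONNX-backed generator: one Gaussian orbiting the origin.
const toyGenerator: GaussianGenerator = async ({ frameIndex }) => {
  const t = frameIndex / 60;
  return {
    count: 1,
    positions: new Float32Array([Math.cos(t), 0, Math.sin(t)]),
    covariances: new Float32Array([0.01, 0, 0, 0.01, 0, 0.01]),
    colors: new Float32Array([1.0, 0.5, 0.2]), // SH degree 0: one coefficient per channel
    opacities: new Float32Array([0.9]),
  };
};

toyGenerator({
  view: new Float32Array(16),
  projection: new Float32Array(16),
  frameIndex: 30,
  control: new Float32Array(0),
}).then((out) => console.log(validateOutputs(out, { shDegree: 0 }))); // []
```

Keeping the generator stateless means the renderer can call it once per frame with nothing but the current camera and controls. This is the property the contract relies on for plug-and-play model swapping.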

6. Performance Evaluation and Robustness

Empirical Benchmarks

Benchmarking on commodity GPUs and a variety of browsers yields the following highlights:

| #Gaussians | SparkJS sort (ms) | SparkJS total (ms) | Visionary sort (ms) | Visionary total (ms) |
|---|---|---|---|---|
| 6.062M | 172.87 | 176.90 | 0.58 | 2.09 |
| 3.031M | 143.50 | 145.75 | 0.32 | 1.09 |
| 0.758M | 33.31 | 33.82 | 0.20 | 0.40 |

Visionary achieves up to a 135× speedup over SparkJS (WebGL) with comparable visual fidelity (PSNR … vs. … on Mip-NeRF 360) (Gong et al., 9 Dec 2025). WebSplatter achieves a … to … speedup over state-of-the-art WebGPU viewers across diverse hardware, with consistent memory savings (Han et al., 3 Feb 2026).

| Device | WebSplatter (ms) | Best prior (ms) |
|---|---|---|
| RTX 3070 (Chrome) | 9.50 | 14.4 |
| MacBook Air M4 | 68.6 | 78.5 |
| MacBook Pro M1 | 112.0 | 225.2 |
| Intel NUC iGPU | 151.2 | 341.7 |
| Redmi K70 Pro | 33.6 | 39.5 |
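Speedup here means the ratio of the best prior viewer's time to WebSplatter's time on the same device. The short sketch below recomputes it from the rows of the device table. It uses only those five rows, so it says nothing about configurations outside the table.

```typescript
// Times in ms from the device table: [device, WebSplatter, best prior viewer].
const frameTimes: [string, number, number][] = [
  ["RTX 3070 (Chrome)", 9.5, 14.4],
  ["MacBook Air M4", 68.6, 78.5],
  ["MacBook Pro M1", 112.0, 225.2],
  ["Intel NUC iGPU", 151.2, 341.7],
  ["Redmi K70 Pro", 33.6, 39.5],
];

// Speedup = best prior time / WebSplatter time (> 1 means WebSplatter is faster).
const speedups = frameTimes.map(([device, webSplatter, prior]) => ({
  device,
  speedup: prior / webSplatter,
}));

for (const { device, speedup } of speedups) {
  console.log(`${device}: ${speedup.toFixed(2)}x`);
}
// RTX 3070 (Chrome): 1.52x, MacBook Air M4: 1.14x, MacBook Pro M1: 2.01x,
// Intel NUC iGPU: 2.26x, Redmi K70 Pro: 1.18x
```

For these rows the ratio ranges from about 1.14× (MacBook Air M4) to about 2.26× (Intel NUC iGPU).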

Sorting and Robustness

  • Per-frame global sort eliminates artifacts and ensures correct alpha compositing under rapid camera motion, outperforming "lazy sorting" and local partitioning approaches in legacy frameworks (Gong et al., 9 Dec 2025).
  • Systems sustain interactive performance for up to approximately …M splats; devices with less than … GB of VRAM may be constrained (Han et al., 3 Feb 2026).
  • ONNX inference latency is typically …–… ms for standard models (Scaffold-GS, avatars, 4DGS) at multi-million-element scale (Gong et al., 9 Dec 2025).

7. Applications, Extensibility, and Deployment

WebGPU-powered Gaussian Splatting platforms prove extensible for a variety of upstream and downstream tasks:

  • 3DGS family variants: Native support for classic 3DGS, MLP-based methods, dynamic 4DGS, neural avatars (e.g., GauHuman, R³-Avatar), and efficient single-shot models (pixelSplat, MVSplat).
  • Generative post-processing: Integration of diffusion-based denoisers and style transformers for neural scene editing (Gong et al., 9 Dec 2025).
  • Web-native deployment: "Click-to-run" execution in browsers with static HTML+JS+ONNX hosting—no native dependencies; integration in frameworks such as three.js via concise TypeScript APIs (Gong et al., 9 Dec 2025).
  • Future directions: Out-of-core streaming for larger scenes, mesh/splat hybridization for extremely large worlds, and memory reduction via quantization/compression (Han et al., 3 Feb 2026).

A unified inference and rasterization pipeline in the browser significantly lowers barriers for reproduction, comparison, and deployment of neural graphics research and applications across reconstructive and generative paradigms.


Key References:

  • "Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform" (Gong et al., 9 Dec 2025)
  • "WebSplatter: Enabling Cross-Device Efficient Gaussian Splatting in Web Browsers via WebGPU" (Han et al., 3 Feb 2026)
