WebGPU Gaussian Splatting

Updated 12 June 2026

The paper introduces fully GPU-resident neural rendering pipelines that deliver interactive scene synthesis by harnessing the WebGPU API.
It unifies neural inference and high-parallelism GPU operations through optimized methods like radix sort and bit-packed memory management.
The approach achieves up to 135× speedup over legacy WebGL techniques while significantly reducing memory usage in real-time 3D rendering.

WebGPU-powered Gaussian Splatting refers to a class of fully GPU-resident rendering architectures for 3D Gaussian Splatting (3DGS) that use the WebGPU API to deliver interactive neural scene synthesis, typically in a browser. These architectures target high throughput, cross-device determinism, and efficient handling of dynamic, generative, and avatar-driven content. They unify neural inference (often via ONNX Runtime in WebAssembly) and highly parallel rendering in the browser, enabling end-to-end neural graphics pipelines with real-time frame rates and significantly less deployment friction than legacy WebGL-based approaches (Gong et al., 9 Dec 2025, Han et al., 3 Feb 2026).

1. Mathematical Foundations of Gaussian Splatting

Gaussian Splatting represents a scene volumetrically as a collection of 3D Gaussians, each parameterized by its center $X \in \mathbb{R}^3$, covariance $\Sigma \in \mathbb{R}^{3\times3}$, peak opacity $\sigma$, and color coefficients (often learned spherical-harmonic (SH) coefficients $k_i$). The continuous (unnormalized) density is

$G(\mathbf{x}; X, \Sigma) = e^{-\frac{1}{2} (\mathbf{x}-X)^T \Sigma^{-1} (\mathbf{x}-X)}.$

A 3D Gaussian projects to a 2D Gaussian in screen space under the camera model. The per-pixel alpha contribution is

$\alpha = \sigma \cdot \exp\left(-\frac{1}{2} \mathbf{x}^\prime{}^T \Sigma^{\prime-1} \mathbf{x}^\prime\right),$

where $\mathbf{x}^\prime$ is the 2D offset and $\Sigma^\prime$ is the projected 2D covariance.
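As a minimal sketch of the alpha formula above, the function below evaluates $\alpha$ for one pixel offset against one projected splat. The function name and the tuple-based 2x2 covariance layout are illustrative assumptions, not taken from either cited system.

```python
import math


def splat_alpha(offset, cov2d, sigma):
    """alpha = sigma * exp(-0.5 * x'^T Sigma'^{-1} x') for a 2x2 covariance Sigma'."""
    dx, dy = offset
    (a, b), (c, d) = cov2d
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("projected covariance must be positive definite")
    # Explicit inverse of the 2x2 covariance.
    inv = ((d / det, -b / det), (-c / det, a / det))
    # Quadratic form x'^T Sigma'^{-1} x' (squared Mahalanobis distance).
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return sigma * math.exp(-0.5 * q)
```

At zero offset the result equals the peak opacity $\sigma$, and it decays with the Mahalanobis distance from the splat center.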

Color per splat is view-dependent, typically evaluated via SH as $c_i = \mathrm{SH}(k_i, d)$, where $d$ is the viewing direction. Final compositing follows sorted alpha blending, with splats indexed from nearest to farthest:

$C = \sum_{i=1}^N c_i\,\alpha_i \prod_{j=1}^{i-1} (1-\alpha_j).$

This formulation underpins both Visionary (Gong et al., 9 Dec 2025) and WebSplatter (Han et al., 3 Feb 2026).
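The compositing sum can be sketched in a few lines. The first function below follows the formula term by term (nearest splat first, with a running transmittance); the second blends the same splats farthest-first with the standard "over" operator, which is the form a back-to-front rasterizer would use. Both function names are illustrative, and the input is assumed to be a list of (RGB color, alpha) pairs already sorted nearest-first.

```python
def composite_front_to_back(splats):
    """Evaluate C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)."""
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # running product of (1 - alpha_j) over splats in front
    for color, alpha in splats:
        weight = alpha * transmittance
        for ch in range(3):
            out[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha
    return tuple(out)


def composite_back_to_front(splats):
    """Same nearest-first input, blended farthest-first with the 'over' operator."""
    out = [0.0, 0.0, 0.0]
    for color, alpha in reversed(splats):
        for ch in range(3):
            out[ch] = alpha * color[ch] + (1.0 - alpha) * out[ch]
    return tuple(out)
```

The two orders give the same pixel color, which is why a correct depth order matters more than the direction in which it is traversed.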

2. End-to-End Pipeline Architecture

WebGPU-powered 3DGS platforms unify all major pipeline stages—asset loading, neural inference, depth sorting, geometry culling, rasterization, and postprocessing—entirely in the browser with explicit GPU-side memory management:

Asset loading: Supports mesh assets (e.g., glTF, PLY) and ONNX "Gaussian Generator" models for variants such as classic 3DGS, MLP-based 3DGS, 4DGS, neural avatars, or style enhancement networks (Gong et al., 9 Dec 2025).
Per-frame neural pre-decoding: Executes ONNX models in WebGPU (via ONNX Runtime WASM provider), computing per-frame Gaussian attributes given camera pose, frame index, and control signals (Gong et al., 9 Dec 2025).
GPU-side preprocessing: Compute shaders project, cull (via AABB and opacity thresholds), and pre-transform Gaussians into screen-space ellipses and packed depth keys (Han et al., 3 Feb 2026).
Fully GPU-resident sorting: Per-frame, back-to-front sorting via radix sort or deterministic hierarchical scans (eliminating the need for global atomics, as in WebSplatter) (Han et al., 3 Feb 2026).
Rasterization and Compositing: Instanced splat rasterization with alpha compositing in correct depth order, optionally preceded by a mesh depth pre-pass (Gong et al., 9 Dec 2025, Han et al., 3 Feb 2026).
Post-processing (optional): Feedforward ONNX models for denoising or style transfer can be invoked and applied in a fully GPU-side manner (Gong et al., 9 Dec 2025).
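The GPU-side preprocessing stage listed above (project, cull, emit a depth key) can be illustrated on the CPU. This is a sketch under stated assumptions rather than either system's shader code: a pinhole camera with a single focal length, a covariance already rotated into camera space, the usual local affine (Jacobian) approximation of the projection, a 3-standard-deviation bounding radius, and a default opacity threshold of 1/255. All names and defaults are hypothetical.

```python
import math
import struct


def depth_key(z):
    """Reinterpret a non-negative float32 depth as a u32; ordering is preserved."""
    return struct.unpack("<I", struct.pack("<f", z))[0]


def project_gaussian(center_cam, cov_cam, focal, width, height, opacity,
                     min_opacity=1.0 / 255.0):
    """Return (px, py, cov2d, key), or None if the Gaussian is culled.

    center_cam: (x, y, z) in camera space with +z pointing into the scene.
    cov_cam:    3x3 covariance in camera space (nested sequences).
    """
    x, y, z = center_cam
    if z <= 0.0 or opacity < min_opacity:
        return None  # behind the camera, or too transparent to contribute

    # Pinhole projection of the center to pixel coordinates.
    px = focal * x / z + 0.5 * width
    py = focal * y / z + 0.5 * height

    # Jacobian of the projection at the center.
    J = ((focal / z, 0.0, -focal * x / (z * z)),
         (0.0, focal / z, -focal * y / (z * z)))
    # cov2d = J * cov_cam * J^T
    JS = [[sum(J[r][k] * cov_cam[k][c] for k in range(3)) for c in range(3)]
          for r in range(2)]
    cov2d = [[sum(JS[r][k] * J[c][k] for k in range(3)) for c in range(2)]
             for r in range(2)]

    # Conservative radius: 3 standard deviations along the major axis.
    a, b, d = cov2d[0][0], cov2d[0][1], cov2d[1][1]
    mid = 0.5 * (a + d)
    lam_max = mid + math.sqrt(max(mid * mid - (a * d - b * b), 0.0))
    radius = 3.0 * math.sqrt(lam_max)

    # Screen-space AABB test against the viewport.
    if px + radius < 0 or px - radius > width or py + radius < 0 or py - radius > height:
        return None
    return px, py, cov2d, depth_key(z)
```

Because the bit pattern of a non-negative IEEE 754 float is monotonic in its value, the integer key can be fed directly to an integer radix sort; a back-to-front order then corresponds to descending keys.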

3. WebGPU Renderer Design, Sorting, and Memory

Primitive Sorting

Sorting millions of splats every frame by view depth is critical for correct compositing and numerical stability. Visionary implements radix sort in a single WebGPU compute pass, achieving $O(n)$ work; empirical timing is […] ms for […]M splats on an RTX 4090 (Gong et al., 9 Dec 2025).

WebSplatter introduces a wait-free hierarchical radix sort adapted to the constraints of WebGPU, which offers no forward-progress guarantee between workgroups (so spin-waiting on global atomics is unsafe) and no fixed scheduling order. The algorithm decomposes sorting into:

Local histogram computation per workgroup (shared memory).
Hierarchical prefix scan ("HierBlelloch") for cross-workgroup offsets, implemented as a sequence of dispatches with only intra-workgroup barriers.
Global scatter, writing sorted keys/indices without spin-waiting.

This approach achieves $O(n)$ work per pass (4 passes), is deterministic and deadlock-free, and supports a wide range of hardware (Han et al., 3 Feb 2026).
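The three-phase structure (local histograms, cross-workgroup offsets, scatter) can be simulated sequentially to show why no workgroup ever needs to wait on another. The sketch below is not WebSplatter's implementation: it assumes 32-bit keys split into four 8-bit digits, treats fixed-size slices of the input as "workgroups", and substitutes a plain serial exclusive scan for the hierarchical multi-dispatch scan. All names are illustrative.

```python
def radix_sort_pass(keys, values, shift, block_size=256, radix_bits=8):
    """One stable counting pass on the digit at bit offset `shift`."""
    radix = 1 << radix_bits
    mask = radix - 1
    blocks = [range(s, min(s + block_size, len(keys)))
              for s in range(0, len(keys), block_size)]

    # Phase 1: each block builds its own histogram; no cross-block traffic.
    hists = [[0] * radix for _ in blocks]
    for b, idxs in enumerate(blocks):
        for i in idxs:
            hists[b][(keys[i] >> shift) & mask] += 1

    # Phase 2: exclusive scan in digit-major, block-minor order assigns every
    # (digit, block) pair its first output slot. (Serial stand-in for the
    # hierarchical scan.)
    offsets = [[0] * radix for _ in blocks]
    running = 0
    for digit in range(radix):
        for b in range(len(blocks)):
            offsets[b][digit] = running
            running += hists[b][digit]

    # Phase 3: each block scatters into its reserved slots. Slots are disjoint,
    # so no block waits on another, and in-order writes keep the pass stable.
    out_keys = [0] * len(keys)
    out_vals = [0] * len(keys)
    for b, idxs in enumerate(blocks):
        cursor = offsets[b][:]
        for i in idxs:
            digit = (keys[i] >> shift) & mask
            out_keys[cursor[digit]] = keys[i]
            out_vals[cursor[digit]] = values[i]
            cursor[digit] += 1
    return out_keys, out_vals


def radix_sort_u32(keys, block_size=256):
    """Sort u32 keys ascending; returns (sorted_keys, original_indices)."""
    keys = list(keys)
    values = list(range(len(keys)))  # payload: original splat indices
    for shift in (0, 8, 16, 24):     # four 8-bit passes cover a 32-bit key
        keys, values = radix_sort_pass(keys, values, shift, block_size)
    return keys, values
```

Each phase maps to a separate dispatch on a GPU, so the only synchronization required is the implicit ordering between dispatches plus barriers inside a workgroup.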

Memory Management

Gaussian parameters are bit-packed (e.g., FP16 values stored in pairs) to halve bandwidth and reduce per-Gaussian memory from about 32 B to about 16 B (Gong et al., 9 Dec 2025).
Single monolithic buffers reduce allocation overhead and CPU-GPU synchronization.
Visionary uses atomic counters and indirect dispatch buffers to dynamically size output arrays per frame (Gong et al., 9 Dec 2025).
WebSplatter reports a GPU memory footprint of […] GB (RTX 3070, "garden" scene, […]M splats), a reduction of […] to […] relative to prior viewers (Han et al., 3 Feb 2026).
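Packing two half-precision values into one 32-bit word is the basic move behind this kind of bit-packing. The sketch below uses Python's `struct` module (format code `e` is IEEE 754 binary16) and places the first value in the low 16 bits, which matches the convention of WGSL's `pack2x16float` builtin. The helper names are illustrative, and the actual attribute layout used by either system is not specified here.

```python
import struct


def pack_half2(a, b):
    """Round a and b to FP16 and pack them into one u32 (a in the low 16 bits)."""
    lo, = struct.unpack("<H", struct.pack("<e", a))
    hi, = struct.unpack("<H", struct.pack("<e", b))
    return lo | (hi << 16)


def unpack_half2(word):
    """Inverse of pack_half2: recover the two FP16 values as Python floats."""
    lo, = struct.unpack("<e", struct.pack("<H", word & 0xFFFF))
    hi, = struct.unpack("<e", struct.pack("<H", (word >> 16) & 0xFFFF))
    return lo, hi


def packed_size_bytes(num_values):
    """Bytes needed to store num_values FP16 values as packed u32 words."""
    return 4 * ((num_values + 1) // 2)
```

As a purely arithmetic illustration, eight FP32 attributes occupy 32 bytes, while the same eight values stored as four packed words occupy 16. The cost is precision: FP16 keeps about three significant decimal digits, and `struct.pack` raises `OverflowError` for magnitudes beyond the FP16 range.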

4. Geometry Culling and Rasterization Optimizations

Both frameworks implement geometry-level pruning to minimize overdraw and improve memory efficiency:

Screen-space AABB culling: Each Gaussian's screen-projected ellipse is rapidly bounded and checked for viewport intersection.
Opacity-based culling: Splats whose peak opacity falls below a threshold $\alpha_{\min}$ are excluded. Quad sizing is dynamically adjusted by solving $\sigma \exp\left(-\frac{1}{2} r^2\right) = \alpha_{\min}$ for the extent $r$, where $r^2 = \mathbf{x}^\prime{}^T \Sigma^{\prime-1} \mathbf{x}^\prime$, so only fragments where $\alpha \ge \alpha_{\min}$ are rasterized (Han et al., 3 Feb 2026).
Disabling tight quad bounds causes a significant performance penalty (e.g., […] render time on MacBook Air M4) (Han et al., 3 Feb 2026).

The net effect is substantial reduction in fragment shading work, peak GPU memory, and overdraw.
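The opacity-aware quad bound has a closed form if the per-fragment alpha follows the Gaussian falloff from Section 1, $\alpha = \sigma \exp(-r^2/2)$ with $r$ the Mahalanobis distance from the splat center. Setting $\alpha$ equal to a cutoff and solving gives $r = \sqrt{2 \ln(\sigma / \alpha_{\min})}$. The sketch below assumes a cutoff of 1/255 (one 8-bit quantization step); that value and the function name are assumptions, not figures from the cited papers.

```python
import math


def quad_extent(sigma, alpha_min=1.0 / 255.0):
    """Mahalanobis radius at which sigma * exp(-r^2 / 2) drops to alpha_min.

    Returns None when the splat never reaches alpha_min and can be culled.
    """
    if sigma < alpha_min:
        return None
    return math.sqrt(2.0 * math.log(sigma / alpha_min))
```

Faint splats get smaller quads than a fixed multiple of the standard deviation would give them, which is where the saving in fragment work comes from.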

5. Neural Inference Integration and Dynamic Content

WebGPU-powered platforms support dynamic or generative Gaussian content via ONNX Runtime integration:

Gaussian Generator Contract: Fixed ONNX I/O schema with camera/projection matrices, frame index, control signals (inputs); positions $X$, covariances $\Sigma$, colors, and opacities (outputs); plus metadata such as SH degree. This enables per-frame, stateless neural generation and updating of Gaussians (Gong et al., 9 Dec 2025).
Lifecycle: Models are exported (e.g., PyTorch to ONNX, possibly with chunking of large ops), loaded and warm-started in the browser, then invoked per frame to generate splat parameters for rendering (Gong et al., 9 Dec 2025).
Extensibility: Plug-and-play support for different 3DGS variants (classic, MLP-based, 4DGS, neural avatars, single-shot pixel splat networks). Optional post-processing (diffusion, style transfer) via additional ONNX passes allows direct composable neural graphics (Gong et al., 9 Dec 2025).
APIs: TypeScript interfaces (e.g., three.js plugin) permit streamlined integration for web applications; sample code provided for end-to-end inference and rendering (Gong et al., 9 Dec 2025).
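The shape of such a contract can be sketched with plain data classes. Everything below is hypothetical: the type and field names, the six-value upper-triangle covariance layout, and the toy generator are illustrative stand-ins, not the schema defined by Visionary. The one fixed fact used is that an SH expansion of degree $\ell$ has $(\ell+1)^2$ coefficients per color channel.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class GeneratorInputs:
    view_matrix: Sequence[float]        # 16 floats (4x4)
    projection_matrix: Sequence[float]  # 16 floats (4x4)
    frame_index: int
    controls: Sequence[float] = ()      # optional control signals


@dataclass(frozen=True)
class GaussianFrame:
    positions: Sequence[Sequence[float]]    # N x 3
    covariances: Sequence[Sequence[float]]  # N x 6 (assumed upper-triangle layout)
    colors: Sequence[Sequence[float]]       # N x 3*(sh_degree+1)^2 coefficients
    opacities: Sequence[float]              # N
    sh_degree: int = 0

    def validate(self) -> int:
        """Check that per-Gaussian arrays agree; return the Gaussian count."""
        n = len(self.positions)
        coeffs = 3 * (self.sh_degree + 1) ** 2
        if not (len(self.covariances) == len(self.colors) == len(self.opacities) == n):
            raise ValueError("per-Gaussian arrays differ in length")
        if any(len(p) != 3 for p in self.positions):
            raise ValueError("positions must be N x 3")
        if any(len(c) != 6 for c in self.covariances):
            raise ValueError("covariances must be N x 6")
        if any(len(c) != coeffs for c in self.colors):
            raise ValueError("colors must hold 3*(sh_degree+1)^2 coefficients")
        return n


class GaussianGenerator(Protocol):
    """Stateless contract: the output depends only on the inputs."""
    def __call__(self, inputs: GeneratorInputs) -> GaussianFrame: ...


def drifting_blob(inputs: GeneratorInputs) -> GaussianFrame:
    """Toy generator: one Gaussian whose x position follows the frame index."""
    return GaussianFrame(
        positions=[(0.1 * inputs.frame_index, 0.0, 2.0)],
        covariances=[(0.01, 0.0, 0.0, 0.01, 0.0, 0.01)],
        colors=[(1.0, 0.5, 0.25)],
        opacities=[0.9],
        sh_degree=0,
    )
```

Keeping the generator a pure function of its inputs is what lets a renderer swap model variants without changing the sorting or rasterization stages downstream.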

6. Performance Evaluation and Robustness

Empirical Benchmarks

Benchmarking on commodity GPUs and a variety of browsers yields the following highlights:

#Gaussians	SparkJS sort (ms)	SparkJS total (ms)	Visionary sort (ms)	Visionary total (ms)
6.062M	172.87	176.90	0.58	2.09
3.031M	143.50	145.75	0.32	1.09
0.758M	33.31	33.82	0.20	0.40

Visionary achieves up to $135\times$ speedup over SparkJS (WebGL) with comparable visual fidelity (PSNR […] vs. […] for MipNeRF360) (Gong et al., 9 Dec 2025). WebSplatter achieves […] to […] speedup over state-of-the-art WebGPU viewers across diverse hardware, with consistent memory savings (Han et al., 3 Feb 2026).

Device	WebSplatter (ms)	Best prior (ms)
RTX 3070 (Chrome)	9.50	14.4
MacBook Air M4	68.6	78.5
MacBook Pro M1	112.0	225.2
Intel NUC iGPU	151.2	341.7
Redmi K70 Pro	33.6	39.5
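The speedup ratios implied by the two tables can be recomputed directly from the tabulated times. The short script below does only that arithmetic; the dictionary and variable names are illustrative. With these rounded entries the largest Visionary total-time ratio comes out near $134\times$ (at 3.031M Gaussians), and the WebSplatter ratios on the listed devices span roughly $1.1\times$ to $2.3\times$.

```python
# Gaussian count in millions -> (SparkJS total ms, Visionary total ms)
visionary_rows = {
    6.062: (176.90, 2.09),
    3.031: (145.75, 1.09),
    0.758: (33.82, 0.40),
}

# Device -> (WebSplatter ms, best prior ms)
websplatter_rows = {
    "RTX 3070 (Chrome)": (9.50, 14.4),
    "MacBook Air M4": (68.6, 78.5),
    "MacBook Pro M1": (112.0, 225.2),
    "Intel NUC iGPU": (151.2, 341.7),
    "Redmi K70 Pro": (33.6, 39.5),
}

# Speedup = baseline time / new time, so values above 1 mean faster.
visionary_speedup = {n: spark / vis for n, (spark, vis) in visionary_rows.items()}
websplatter_speedup = {dev: prior / ours for dev, (ours, prior) in websplatter_rows.items()}

for n, s in sorted(visionary_speedup.items()):
    print(f"{n:.3f}M Gaussians: {s:.1f}x")
for dev, s in websplatter_speedup.items():
    print(f"{dev}: {s:.2f}x")
```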

Sorting and Robustness

Per-frame global sort eliminates artifacts and ensures correct alpha compositing under rapid camera motion, outperforming "lazy sorting" and local partitioning approaches in legacy frameworks (Gong et al., 9 Dec 2025).
Systems sustain interactive performance for up to approximately […]M splats; devices with less than […] GB VRAM may present constraints (Han et al., 3 Feb 2026).
ONNX inference latency is typically […] to […] ms for standard models (Scaffold-GS, avatars, 4DGS) at multi-million element scale (Gong et al., 9 Dec 2025).

7. Applications, Extensibility, and Deployment

WebGPU-powered Gaussian Splatting platforms prove extensible for a variety of upstream and downstream tasks:

3DGS family variants: Native support for classic 3DGS, MLP-based methods, dynamic 4DGS, neural avatars (e.g., GauHuman, R³-Avatar), and efficient single-shot models (PixelSplat, MVSplat).
Generative post-processing: Integration of diffusion-based denoisers and style transformers for neural scene editing (Gong et al., 9 Dec 2025).
Web-native deployment: "Click-to-run" execution in browsers with static HTML+JS+ONNX hosting—no native dependencies; integration in frameworks such as three.js via concise TypeScript APIs (Gong et al., 9 Dec 2025).
Future directions: Out-of-core streaming for larger scenes, mesh/splat hybridization for extremely large worlds, and memory reduction via quantization/compression (Han et al., 3 Feb 2026).

A unified inference and rasterization pipeline in the browser significantly lowers barriers for reproduction, comparison, and deployment of neural graphics research and applications across reconstructive and generative paradigms.

Key References:

"Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform" (Gong et al., 9 Dec 2025)
"WebSplatter: Enabling Cross-Device Efficient Gaussian Splatting in Web Browsers via WebGPU" (Han et al., 3 Feb 2026)
