ONNX-based Gaussian Generator
- ONNX-based Gaussian Generator is a standardized module that bridges neural 3D Gaussian Splatting models with WebGPU rendering pipelines via a strict I/O contract.
- It employs a per-frame, asynchronous TypeScript API and onnxruntime-web to efficiently convert ONNX model outputs into GPU-optimized tensors for real-time compositing.
- This design decouples generator logic from rendering, enabling advanced neural processing with up to 135× speed improvements while maintaining high visual quality.
An ONNX-based Gaussian Generator is a standardized software module that binds ONNX-exported 3D Gaussian Splatting (3DGS) models to high-performance real-time rendering pipelines, specifically for deployment in WebGPU-enabled environments. This approach, exemplified by the Visionary platform, formalizes a "contract" enforcing consistent I/O conventions between neural 3DGS generators—ranging from classic to MLP-based, 4D-extended, and avatar-conditioned networks—and a browser-native renderer. It enables per-frame, plug-and-play neural processing, facilitating both reconstructive and generative rendering paradigms and allowing advanced feedforward post-processing directly on the client side (Gong et al., 9 Dec 2025).
1. Standardized Gaussian Generator Contract
The core feature of the ONNX-based Gaussian Generator is a strict contract-defined interface that decouples generator implementation from rendering. The interface mandates:
- Per-frame input tensors encoding camera parameters (such as a 4×4 view-projection matrix), frame index, and optional control signals.
- Output tensors containing packed per-Gaussian attributes: positions ($\mu_i \in \mathbb{R}^3$), anisotropic covariances (the upper-triangular part of $\Sigma_i \in \mathbb{R}^{3\times 3}$), colors ($c_i$), and opacities ($\alpha_i \in [0,1]$).
This contract is concretely instantiated through a concise TypeScript API using onnxruntime-web. The generator is initialized from an ONNX model URL and exposes an asynchronous .run() method, returning the requisite outputs as GPU-uploadable tensors. These contract specifications generalize across task variants (static 3DGS, dynamic 4DGS, avatars, etc.), enabling seamless switching or chaining of ONNX generators in a single WebGPU-powered rendering pipeline (Gong et al., 9 Dec 2025).
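As a minimal illustration of this contract, the generator interface can be typed as below. The interface and field names here are hypothetical, not the exact Visionary API; a real implementation would wrap an onnxruntime-web `InferenceSession` created with the WebGPU execution provider rather than the mock shown.

```typescript
// Hypothetical sketch of the generator contract; names are illustrative,
// not the exact Visionary API.
interface FrameInputs {
  viewProj: Float32Array;   // flattened 4x4 view-projection matrix
  frameIndex: number;
  controls?: Float32Array;  // optional conditioning signals (e.g. avatar pose)
}

interface GaussianOutputs {
  positions: Float32Array;   // N x 3
  covariances: Float32Array; // N x 6 (upper-triangular of each 3x3 covariance)
  colors: Float32Array;      // N x 3
  opacities: Float32Array;   // N x 1
  count: number;
}

interface GaussianGenerator {
  run(inputs: FrameInputs): Promise<GaussianOutputs>;
}

// A trivial mock conforming to the contract, standing in for an ONNX session
// (e.g. ort.InferenceSession.create(url, { executionProviders: ["webgpu"] })).
class MockGenerator implements GaussianGenerator {
  constructor(private n: number) {}
  async run(_inputs: FrameInputs): Promise<GaussianOutputs> {
    return {
      positions: new Float32Array(this.n * 3),
      covariances: new Float32Array(this.n * 6),
      colors: new Float32Array(this.n * 3),
      opacities: new Float32Array(this.n),
      count: this.n,
    };
  }
}
```

Per frame, the renderer awaits `.run()` and uploads the returned arrays to GPU buffers; swapping generators (static 3DGS, 4DGS, avatar) then only changes the model behind the interface, not the render loop.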
2. Mathematical Formalism of 3D Gaussian Splatting
The mathematical basis follows the canonical 3DGS definition. Each scene is represented by a set of Gaussian primitives:

$$G_i(x) = \exp\!\left(-\tfrac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)\right)$$

Spatial scale and orientation are encoded via a diagonal scaling matrix $S_i$ and a rotation $R_i$ derived from a unit quaternion $q_i$:

$$\Sigma_i = R_i S_i S_i^\top R_i^\top$$

At render time:
- Gaussian centers are projected as $\mu_i' = W\mu_i$ using the camera (view) matrix $W$. The projected covariance becomes $\Sigma_i' = J W \Sigma_i W^\top J^\top$, where $J$ is the Jacobian of the projective transform.
- Each pixel $p$ receives a splat contribution $\alpha_i\, G_i'(p)$, where $G_i'$ is the projected 2D Gaussian with covariance $\Sigma_i'$.
- Per-pixel color is composited in strict back-to-front order, equivalently written over depth-sorted splats as:

$$C(p) = \sum_{i=1}^{N} c_i\, \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$$
This pipeline supports both isotropic and anisotropic splats as required by the generator and variant type (Gong et al., 9 Dec 2025).
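The projected-covariance step above can be sketched numerically. This is a plain TypeScript sketch for intuition, not the shader implementation; it takes a 2×3 Jacobian row-block $J$ and a 3×3 view rotation $W$, matrix shapes being the only assumption.

```typescript
type Mat = number[][];

// Generic dense matrix product: (rows of a) x (cols of b).
function matMul(a: Mat, b: Mat): Mat {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0))
  );
}

function transpose(m: Mat): Mat {
  return m[0].map((_, j) => m.map(row => row[j]));
}

// Sigma' = J W Sigma W^T J^T : project a 3x3 world-space covariance
// to the 2x2 screen-space covariance used for the splat footprint.
function projectCovariance(sigma: Mat, J: Mat, W: Mat): Mat {
  const T = matMul(J, W);                        // 2x3
  return matMul(matMul(T, sigma), transpose(T)); // 2x2
}
```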
3. Per-Frame ONNX to WebGPU Execution Pipeline
The per-frame data flow is orchestrated entirely in-browser, without CPU-side post-processing or server dependencies:
- Generator Stage: The Gaussian generator (ONNX) is executed on WebGPU via onnxruntime-web. Models may implement classic anchor-based, MLP-based, 4DGS deformation, or avatar LBS kinematics, returning standard outputs.
- Pre-packing & Upload: Outputs are cast to FP16 and densely packed (two values per u32 word) for efficient GPU buffer upload.
- WebGPU Preprocessing: A compute shader transforms and culls Gaussians, computes 2D projected covariances, and materializes per-Gaussian screen-aligned quads and depth keys.
- GPU Radix Sort: A parallel radix sort is performed entirely on the GPU (O(n) in the number of splats), yielding a globally ordered index list for back-to-front blending.
- Instanced Splat Rasterization: All sorted splats are rendered in a single pass, with optional mesh occlusion via a prior depth-prepass.
The data never leaves the GPU throughout these stages, ensuring minimal bandwidth, maximal memory locality, and frame-to-frame consistency (Gong et al., 9 Dec 2025).
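The pre-packing step (cast to FP16, two values per u32 word) can be sketched in plain TypeScript. This is an illustrative round-toward-zero conversion, not Visionary's actual packing code; the low-half-first word layout is an assumption.

```typescript
// Scratch buffers for reading a float32 bit pattern.
const f32buf = new Float32Array(1);
const u32view = new Uint32Array(f32buf.buffer);

// Convert a float32 value to its IEEE 754 half-precision bit pattern.
// Truncating (round-toward-zero) conversion, for illustration.
function f32ToF16Bits(value: number): number {
  f32buf[0] = value;
  const x = u32view[0];
  const sign = (x >>> 16) & 0x8000;
  const exp = (x >>> 23) & 0xff;
  let mant = x & 0x7fffff;
  if (exp === 0xff) return sign | 0x7c00 | (mant ? 0x200 : 0); // Inf / NaN
  const e = exp - 127 + 15;                // re-bias exponent
  if (e >= 0x1f) return sign | 0x7c00;     // overflow -> Inf
  if (e <= 0) {
    if (e < -10) return sign;              // underflow -> signed zero
    mant |= 0x800000;                      // restore implicit leading bit
    return sign | (mant >>> (14 - e));     // half-precision subnormal
  }
  return sign | (e << 10) | (mant >>> 13);
}

// Pack pairs of FP16 values into u32 words (low half first), matching a
// two-halves-per-word GPU buffer layout.
function packF16Pairs(values: Float32Array): Uint32Array {
  const out = new Uint32Array(Math.ceil(values.length / 2));
  for (let i = 0; i < values.length; i += 2) {
    const lo = f32ToF16Bits(values[i]);
    const hi = i + 1 < values.length ? f32ToF16Bits(values[i + 1]) : 0;
    out[i >> 1] = (lo | (hi << 16)) >>> 0;
  }
  return out;
}
```

On the GPU side, a WGSL shader can recover each pair with `unpack2x16float`, so no per-element conversion is needed in the shader's hot path.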
4. GPU-Based Primitive Sorting and Compositing
Back-to-front compositing is essential for correct alpha blending in 3DGS. The pipeline:
- Writes N-length buffers of depths and indices in the compute stage.
- Applies a fast WebGPU radix sort (McIlroy et al. 1993) over depths.
- Binds the sorted index buffer directly to the rasterizer, enabling O(1) access in the vertex shader and guaranteeing global compositing order.
This approach eliminates the need for CPU-bound sorts (as in SparkJS) or approximate local sorts (SuperSplat). Empirical results report 100×–150× lower frame times for sorting and end-to-end rendering in large-scale scenes with millions of Gaussians, with no local sorting artifacts or lag under rapid viewpoint changes (Gong et al., 9 Dec 2025).
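For intuition, the GPU radix sort can be mirrored on the CPU. The sketch below performs the same stable 8-bit-digit passes over u32 depth keys that the WebGPU version runs in parallel; it is illustrative, not Visionary's shader code.

```typescript
// Stable LSD radix sort over u32 depth keys, returning a permutation of
// indices. Four 8-bit passes; each pass is a counting sort, which is the
// histogram + prefix-sum + scatter structure a GPU implementation
// parallelizes across workgroups.
function radixSortIndices(keys: Uint32Array): Uint32Array {
  const n = keys.length;
  let src = Uint32Array.from({ length: n }, (_, i) => i);
  let dst = new Uint32Array(n);
  for (let shift = 0; shift < 32; shift += 8) {
    const offsets = new Uint32Array(257);
    for (let i = 0; i < n; i++) {
      offsets[(((keys[src[i]] >>> shift) & 0xff) + 1)]++;   // histogram
    }
    for (let d = 0; d < 256; d++) offsets[d + 1] += offsets[d]; // prefix sum
    for (let i = 0; i < n; i++) {
      const digit = (keys[src[i]] >>> shift) & 0xff;
      dst[offsets[digit]++] = src[i];                       // stable scatter
    }
    [src, dst] = [dst, src];
  }
  return src; // ascending by key; traverse in reverse for back-to-front
}
```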
Benchmark Comparison Table
| Gaussians | SparkJS (WebGL + CPU sort) | Visionary (WebGPU all-in-shader) |
|---|---|---|
| 6.06 M | 176.9 ms | 2.09 ms |
| 3.03 M | 145.8 ms | 1.09 ms |
| 1.52 M | 46.3 ms | 0.60 ms |
| 0.76 M | 33.8 ms | 0.40 ms |
Visionary achieves up to approximately 135× end-to-end speedup, with output quality (PSNR/SSIM/LPIPS) comparable to or exceeding SparkJS (Gong et al., 9 Dec 2025).
5. Extensibility: Custom ONNX Generators and Post-Processors
The contract enables fully client-side, browser-executed generative and enhancement networks, including style transfer, diffusion denoising, and neural avatars. The required ONNX I/O must conform to the contract: custom networks (e.g., style nets) are trained in PyTorch and exported to ONNX with precise input/output naming and axes, then loaded and invoked per frame via the TypeScript API. Further, optional post-processing modules can act on rendered outputs (e.g., color images) and primitive buffers simultaneously. This allows complex generative pipelines (such as real-time style transfer or appearance editing) to be executed entirely in-browser without Python backends or server compute (Gong et al., 9 Dec 2025).
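Per-frame chaining of a generator with an optional post-processor can be sketched as below. The stage functions stand in for onnxruntime-web `session.run()` calls; all names and the tensor-in/tensor-out shape are illustrative assumptions, not the exact Visionary API.

```typescript
// Hypothetical per-frame chain: generator -> rasterizer -> post-processor.
// Each stage is an async tensor-in/tensor-out function, the same shape an
// onnxruntime-web session.run() invocation presents to the caller.
type Stage = (input: Float32Array) => Promise<Float32Array>;

async function renderFrame(
  generate: Stage,     // ONNX Gaussian generator
  rasterize: Stage,    // WebGPU splatting pass (stubbed here)
  postProcess?: Stage  // optional ONNX enhancement net (e.g. style transfer)
): Promise<Float32Array> {
  const gaussians = await generate(new Float32Array(16)); // camera inputs
  const image = await rasterize(gaussians);
  return postProcess ? postProcess(image) : image;
}
```

Because every stage shares the same calling convention, a stylization or denoising network slots into the chain without touching generator or rasterizer code.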
6. Integration, Applications, and Significance
By decoupling neural generator logic (authored in PyTorch or similar) from the rendering backend, ONNX-based Gaussian Generators make it practical to experiment with and deploy new 3DGS-family algorithms, including reconstructive, generative, and dynamic content paradigms. Visionary supplies a concise TypeScript API for per-frame generator invocation and primitive upload, optimized GPU memory packing, and unified render loop management for traditional mesh and 3DGS-based scenes. This structure significantly lowers the technical barrier for reproduction, extension, and comparison of state-of-the-art neural rendering approaches, supporting real-time operation and extensibility in the browser environment (Gong et al., 9 Dec 2025).
The zero-install pipeline—ONNX generator to WebGPU rasterizer—addresses the main historical bottlenecks: fragmented pipelines, CPU-GPU dataflow inefficiencies, and the lack of generative capability and throughput in interactive viewers. Its architecture enables both academic benchmarking and deployment in production systems, serving as a unified substrate for world model rendering and generative visual effects.