GaussianCube: Efficient 3D Radiance Model
- GaussianCube is a 3D radiance representation that uses a structured voxel grid with fixed Gaussian components for accurate object modeling.
- It employs a densification-constrained Gaussian fitting algorithm and optimal transport-based voxelization to ensure parameter efficiency and precise reconstruction.
- The regular grid structure seamlessly integrates with 3D U-Net diffusion models, achieving state-of-the-art performance in generative tasks.
GaussianCube is a fully explicit, spatially structured 3D radiance representation designed to facilitate high-fidelity and parameter-efficient 3D generative modeling. It merges the real-time rendering and reconstruction accuracy of 3D Gaussian Splatting with a regular voxel-grid format, enabling seamless integration with standard 3D U-Net diffusion models and substantially reducing the parameter requirements characteristic of previous explicit and implicit radiance proxies (Zhang et al., 2024).
1. Formal Structure and Representation
GaussianCube represents a single 3D object by a fixed set of $N_{\max}$ Gaussians, each parameterized by position $x_i \in \mathbb{R}^3$, color $c_i \in \mathbb{R}^3$, opacity $\alpha_i \in \mathbb{R}$, scale $s_i \in \mathbb{R}^3$, and rotation quaternion $q_i \in \mathbb{R}^4$, collectively forming a feature vector $\theta_i \in \mathbb{R}^{14}$ for each Gaussian. In typical experiments, $N_{\max} = 32^3 = 32{,}768$, arranged into a $32 \times 32 \times 32$ voxel grid, such that each voxel contains a single Gaussian's features.
In contrast to hybrid NeRF proxies that use a shared implicit decoder and unstructured arrangements, GaussianCube is fully explicit: each object is represented directly, without a decoder bottleneck. This structure is conducive to efficient convolutional neural network operations and guarantees a constant number of parameters per object, a prerequisite for scalable generative modeling.
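As a concrete illustration of this layout, the following sketch allocates a fixed-size feature grid of the kind described above. The helper name and exact channel ordering are our own; only the grid size (32³) and the 14-channel split follow the representation described in this section.

```python
import numpy as np

# Illustrative sketch of the GaussianCube layout (helper name is ours):
# a 32x32x32 voxel grid where each voxel stores one Gaussian's 14-dim
# feature vector (position 3 + color 3 + opacity 1 + scale 3 + rotation
# quaternion 4).
GRID = 32                      # voxels per axis, so N_max = 32**3 Gaussians
CHANNELS = 14                  # features stored per Gaussian

def empty_gaussian_cube() -> np.ndarray:
    """Return an all-zero cube; zero opacity marks an inactive Gaussian."""
    return np.zeros((CHANNELS, GRID, GRID, GRID), dtype=np.float32)

cube = empty_gaussian_cube()
n_params = cube.size
print(n_params)  # 458752, i.e. ~0.46M parameters per object
```

Because every object occupies the same fixed-shape array, a batch of objects stacks into a regular 5D tensor, which is exactly what a 3D convolutional network expects.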
2. Densification-Constrained Gaussian Fitting
The representation is constructed via a densification-constrained Gaussian fitting algorithm. Traditional Gaussian Splatting alternates between densification (splitting or cloning Gaussians) and pruning, yielding a number of components that varies from scene to scene and is often excessive. GaussianCube instead restricts the number of active Gaussians to exactly $N_{\max}$, which is crucial for subsequent grid voxelization and generative modeling.
During fitting, at each iteration the set of $N_c$ candidates for densification is compared to the available capacity $N_{\max} - N_t$ (where $N_t$ is the current number of Gaussians). If $N_c \le N_{\max} - N_t$, all candidates are densified; otherwise, only the top $N_{\max} - N_t$ candidates, ranked by view-space positional gradient magnitude, are selected. Splitting and cloning are interleaved but capped, and after convergence the Gaussian count is pruned to at most $N_{\max}$. Any deficit is padded with Gaussians of zero opacity ($\alpha = 0$) to ensure a fixed-size set.
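The capacity-capped selection step can be sketched as follows. This is a minimal illustration, not the paper's implementation: `grads` stands in for per-Gaussian view-space positional gradient magnitudes, and `threshold` for the usual densification trigger.

```python
import numpy as np

# Hedged sketch of capacity-capped densification: never let the number of
# Gaussians exceed n_max, preferring candidates with larger view-space
# positional gradients when capacity is scarce.
def select_densification_candidates(
    grads: np.ndarray, n_current: int, n_max: int, threshold: float
) -> np.ndarray:
    """Return indices of Gaussians allowed to densify this iteration."""
    candidates = np.flatnonzero(grads > threshold)
    capacity = n_max - n_current
    if len(candidates) <= capacity:
        return candidates                    # room for all candidates
    # Otherwise keep only the top-`capacity` candidates by gradient.
    order = np.argsort(grads[candidates])[::-1]
    return candidates[order[:capacity]]

grads = np.array([0.9, 0.1, 0.7, 0.8, 0.05])
picked = select_densification_candidates(grads, n_current=3, n_max=5, threshold=0.5)
print(sorted(picked.tolist()))  # [0, 3] -- only 2 slots free, highest grads win
```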
This iterative process can be viewed as an approximate solution to a regularized density-matching problem:

$$
\min_{\{\theta_i\}} \;\; \sum_{k} D\!\left(\rho(p_k; \{\theta_i\}),\, \rho^{\ast}(p_k)\right) \;+\; \lambda \sum_{i=1}^{N_{\max}} R(s_i)
$$

subject to $|\{i : \alpha_i > 0\}| \le N_{\max}$, $\alpha_i \in [0, 1]$, and $s_i > 0$, where the $p_k$ are sample points, $D$ measures density mismatch (e.g., squared error between predicted and target opacity), and $R$ regularizes Gaussian shape.
3. Optimal Transport-Based Voxelization
After fitting Gaussians $\{\theta_i\}_{i=1}^{N_{\max}}$ with positions $x_i$, a bijective optimal transport mapping assigns them to the pre-defined voxel grid center positions $\{y_j\}_{j=1}^{N_{\max}}$. A cost matrix is constructed as

$$
C_{ij} = \lVert x_i - y_j \rVert_2^2,
$$

and a linear assignment (discrete optimal transport) problem is solved:

$$
\pi^{\ast} = \arg\min_{\pi \in S_{N_{\max}}} \sum_{i=1}^{N_{\max}} C_{i,\pi(i)}.
$$
The Jonker–Volgenant algorithm solves this assignment in cubic time (in practice accelerated by spatial block partitioning). Each Gaussian is assigned to a unique voxel, and for compactness the stored features use the positional offset $x_i - y_{\pi^{\ast}(i)}$ rather than the absolute position. The resulting structured array is the GaussianCube.
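The assignment step above can be reproduced on a toy grid with `scipy.optimize.linear_sum_assignment`, which implements a modified Jonker–Volgenant solver. The grid size and random stand-in positions below are ours, purely for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Toy voxelization on a 2x2x2 grid (8 Gaussians -> 8 voxels).
rng = np.random.default_rng(0)
grid = 2
# Voxel-center positions y_j on the unit cube, shape (8, 3).
axes = (np.arange(grid) + 0.5) / grid
y = np.stack(np.meshgrid(axes, axes, axes, indexing="ij"), -1).reshape(-1, 3)
# Fitted Gaussian positions x_i (random stand-ins here), shape (8, 3).
x = rng.random((grid**3, 3))

cost = cdist(x, y, "sqeuclidean")         # C_ij = ||x_i - y_j||^2
rows, cols = linear_sum_assignment(cost)  # optimal bijection pi*
offsets = x[rows] - y[cols]               # stored instead of absolute x_i

print(sorted(set(cols.tolist())) == list(range(grid**3)))  # True: bijection
```

Storing offsets rather than absolute positions keeps feature magnitudes small and roughly zero-centered, which is convenient for the diffusion model trained on the grid.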
4. Diffusion Modeling in Structured Grid Space
The regularity of the GaussianCube grid permits direct use of standard 3D U-Net architectures for denoising diffusion probabilistic modeling. The forward process adds Gaussian noise to the grid representation $x_0$ at each step $t$:

$$
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),
$$

with a cosine schedule for $\bar{\alpha}_t$. The 3D U-Net $\hat{x}_\theta$ predicts the clean $x_0$ from the noisy $x_t$, using an L2 loss:

$$
\mathcal{L}_{\text{diffusion}} = \mathbb{E}_{t,\, x_0,\, \epsilon} \left\lVert \hat{x}_\theta(x_t, t) - x_0 \right\rVert^2,
$$
optionally conditioning on a class label $c$ via adaptive GroupNorm.
An additional image-level reconstruction loss, combining pixelwise and VGG-feature perceptual terms, is imposed:

$$
\mathcal{L}_{\text{image}} = \lVert \hat{I} - I \rVert^2 + \sum_{l} \left\lVert \phi_l(\hat{I}) - \phi_l(I) \right\rVert^2,
$$

where $\hat{I}$ is the rendered output, $I$ is the ground truth, and $\phi_l$ denotes the VGG feature map at layer $l$. The total objective is

$$
\mathcal{L} = \mathcal{L}_{\text{diffusion}} + \lambda\, \mathcal{L}_{\text{image}},
$$
with weighting coefficient $\lambda$. The grid structure obviates the need for custom architectures, allowing all 2D U-Net modules to be replaced by their 3D analogues.
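The forward noising step can be sketched numerically. The cosine schedule below follows the common Nichol–Dhariwal form for $\bar{\alpha}_t$; the helper names, grid shape, and offset parameter $s$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of the forward diffusion step on a GaussianCube-style grid,
# using a cosine schedule for alpha_bar (Nichol & Dhariwal style).
def alpha_bar(t: float, s: float = 0.008) -> float:
    """Fraction of signal retained at normalized time t in [0, 1]."""
    f = lambda u: np.cos((u + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0.0)

def q_sample(x0: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
    """Draw x_t ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)."""
    ab = alpha_bar(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

x0 = np.zeros((14, 4, 4, 4))            # tiny stand-in feature grid
rng = np.random.default_rng(0)
x_mid = q_sample(x0, t=0.5, rng=rng)
print(alpha_bar(0.0), round(alpha_bar(1.0), 6))  # 1.0 0.0
```

At $t = 0$ the grid is untouched; by $t = 1$ it is essentially pure noise, which is the regime the 3D U-Net learns to invert.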
5. Comparative Parameter Efficiency and Fidelity
GaussianCube achieves high-fidelity reconstruction with one to two orders of magnitude fewer parameters than prior explicit or hybrid radiance proxies. The following table summarizes results on ShapeNet Car:
| Method | PSNR | LPIPS | SSIM | Relative speed | Params (M) |
|---|---|---|---|---|---|
| Instant-NGP | 33.98 | 0.0386 | 0.9809 | 1.00 | 12.3 |
| Gaussian Splatting | 35.32 | 0.0303 | 0.9874 | 2.60 | 1.84 |
| Voxel (shared decoder) | 25.80 | 0.1407 | 0.9111 | 1.73 | 0.47 |
| Triplane (shared) | 31.39 | 0.0759 | 0.9635 | 1.05 | 6.3 |
| GaussianCube | 34.94 | 0.0347 | 0.9863 | 3.33 | 0.46 |
GaussianCube matches or surpasses methods such as Instant-NGP and Triplane on PSNR, LPIPS, and SSIM while using dramatically fewer parameters: roughly 27× fewer than Instant-NGP and roughly 14× fewer than the shared-decoder Triplane.
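The 0.46M figure in the table is consistent with the representation's structure. A quick arithmetic check, under our reading that the cube is a 32³ grid with 14 features per Gaussian:

```python
# Parameter count implied by the representation: a 32^3 grid with 14
# features per Gaussian (position 3, color 3, opacity 1, scale 3,
# rotation quaternion 4). Channel split is our reading of Section 1.
n_gaussians = 32 ** 3
features_per_gaussian = 3 + 3 + 1 + 3 + 4
total = n_gaussians * features_per_gaussian
print(total, round(total / 1e6, 2))  # 458752 0.46 -- matches the table's 0.46M
```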
6. Empirical Performance on Generative Tasks
In unconditional and class-conditioned 3D object generation (ShapeNet Car/Chair, OmniObject3D), GaussianCube exhibits state-of-the-art quantitative and qualitative results, measured by FID-50K and KID-50K scores:
| Task | Metric | GaussianCube | GET3D | EG3D | DiffTF |
|---|---|---|---|---|---|
| ShapeNet Car | FID-50K | 13.01 | 17.15 | 30.48 | 51.88 |
| ShapeNet Chair | FID-50K | 15.99 | 19.24 | 27.98 | 47.08 |
| OmniObject3D (class-cond.) | FID-50K | 11.62 | – | – | 46.06 |
| OmniObject3D (class-cond.) | KID-50K (‰) | 2.78 | – | – | 22.86 |
Qualitatively, GaussianCube is reported to yield objects with complex geometry and sharp textures, while GAN and Triplane-diffusion competitors show blur or failure to capture fine details (Zhang et al., 2024). The method’s fully explicit structure and regular grid make it directly extensible to further applications such as digital avatar creation and text-to-3D synthesis, where similar parameter and fidelity advantages are anticipated.
7. Integration and Broader Applicability
GaussianCube’s explicitness, parameter efficiency, and regularized voxel-grid formulation enable direct use with off-the-shelf generative backbones, without custom architectural changes. This positions it as a versatile foundation for future 3D generative modeling tasks, particularly those requiring high-quality synthesis with compact and structured radiance proxies (Zhang et al., 2024). A plausible implication is that, as diffusion-based and text-guided synthesis scale in complexity, GaussianCube’s approach may yield persistent benefits in quality and efficiency.