GaussianCube: Efficient 3D Radiance Model

Updated 27 January 2026
  • GaussianCube is a 3D radiance representation that arranges a fixed number of Gaussians in a structured voxel grid for accurate object modeling.
  • It employs a densification-constrained Gaussian fitting algorithm and optimal transport-based voxelization to ensure parameter efficiency and precise reconstruction.
  • The regular grid structure seamlessly integrates with 3D U-Net diffusion models, achieving state-of-the-art performance in generative tasks.

GaussianCube is a fully explicit, spatially structured 3D radiance representation designed to facilitate high-fidelity and parameter-efficient 3D generative modeling. It merges the real-time rendering and reconstruction accuracy of 3D Gaussian Splatting with a regular voxel-grid format, enabling seamless integration with standard 3D U-Net diffusion models and substantially reducing the parameter requirements characteristic of previous explicit and implicit radiance proxies (Zhang et al., 2024).

1. Formal Structure and Representation

GaussianCube represents a single 3D object by a fixed set of $N_{max}$ Gaussians,

$$g_i(x) = \exp\left(-\frac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1}(x - \mu_i)\right),$$

each parameterized by position $\mu_i \in \mathbb{R}^3$, color $c_i \in \mathbb{R}^3$, opacity $\alpha_i$, scale $s_i \in \mathbb{R}^3$, and rotation $q_i \in \mathbb{R}^4$, collectively forming a feature vector $\theta_i = \{\mu_i, s_i, q_i, \alpha_i, c_i\} \in \mathbb{R}^C$ for each Gaussian. In typical experiments, $N_{max} = 32{,}768$, arranged into an $N_v \times N_v \times N_v$ voxel grid with $N_v = \sqrt[3]{N_{max}} = 32$, such that each voxel contains a single Gaussian's features.
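The size of the representation follows directly from the per-Gaussian features listed above. The sketch below is a minimal illustration, assuming a scalar opacity and plain RGB color (so that $C = 14$) and a row-major voxel layout; the names `FEATURE_DIMS` and `voxel_index` are hypothetical and not taken from the paper.

```python
# Per-Gaussian feature sizes as listed in the text
# (scalar opacity and RGB color are assumptions).
FEATURE_DIMS = {"position": 3, "scale": 3, "rotation": 4, "opacity": 1, "color": 3}

N_V = 32                         # voxels per grid axis
N_MAX = N_V ** 3                 # one Gaussian per voxel
C = sum(FEATURE_DIMS.values())   # channels stored in each voxel


def voxel_index(flat_index, n_v=N_V):
    """Map a flat Gaussian index to (i, j, k) grid coordinates.

    Row-major ordering is an assumed layout, used only for illustration.
    """
    i, rem = divmod(flat_index, n_v * n_v)
    j, k = divmod(rem, n_v)
    return i, j, k


total_params = N_MAX * C
print(N_MAX, C, total_params)    # 32768 14 458752
print(voxel_index(N_MAX - 1))    # (31, 31, 31)
```

Under these assumptions one object occupies 458,752 values, or about 0.46M, regardless of its geometric complexity.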

Unlike hybrid NeRF proxies that use a shared implicit decoder and unstructured arrangements, GaussianCube is fully explicit: each object is represented directly, without a decoder bottleneck. The grid structure suits efficient convolutional network operations and guarantees a constant number of parameters per object, a prerequisite for scalable generative modeling.

2. Densification-Constrained Gaussian Fitting

The representation is constructed via a densification-constrained Gaussian fitting algorithm. Traditional Gaussian Splatting alternates between densification (splitting or cloning Gaussians) and pruning, resulting in variable and often excessive numbers of components per scene. GaussianCube restricts the number of Gaussians to exactly $N_{max}$, which is crucial for subsequent grid voxelization and generative tasks.

During fitting, at each iteration, the number of candidates $N_d$ for densification is compared to the available capacity $N_{max} - N_c$ (where $N_c$ is the current number of Gaussians). If $N_d \le N_{max} - N_c$, all candidates are densified; otherwise, only the top $N_{max} - N_c$ are selected by view-space positional gradient. Splitting and cloning are interleaved but capped, and after convergence the Gaussian count is pruned to at most $N_{max}$. Any deficit is padded with zero-opacity Gaussians ($\alpha = 0$) to ensure a fixed-size grid.
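The capacity check described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: it assumes each densified candidate adds exactly one Gaussian to the set, and the function names and the list-based inputs are hypothetical.

```python
def select_for_densification(gradients, n_current, n_max):
    """Return indices of candidates to densify without exceeding n_max.

    gradients: view-space positional gradient magnitude per candidate
               (hypothetical input, one float per candidate).
    Assumes every densified candidate adds exactly one Gaussian.
    """
    capacity = max(n_max - n_current, 0)
    # Rank candidates from largest to smallest gradient.
    order = sorted(range(len(gradients)), key=lambda i: gradients[i], reverse=True)
    if len(order) <= capacity:
        return sorted(order)              # enough room: densify all candidates
    return sorted(order[:capacity])       # otherwise keep only the top ones


def pad_to_budget(opacities, n_max):
    """Pad a list of opacities with zeros so its length is exactly n_max."""
    return opacities + [0.0] * (n_max - len(opacities))


grads = [0.9, 0.1, 0.5, 0.7]
print(select_for_densification(grads, n_current=6, n_max=8))  # [0, 3]
print(select_for_densification(grads, n_current=2, n_max=8))  # [0, 1, 2, 3]
print(pad_to_budget([0.8, 0.3], n_max=4))                     # [0.8, 0.3, 0.0, 0.0]
```

With six Gaussians already placed and a budget of eight, only the two candidates with the largest gradients are densified; the padding step then fills any unused slots with zero-opacity entries.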

This iterative process can be viewed as an approximate solution to a regularized density-matching problem: over the Gaussian parameters $\{\theta_i\}$, minimize a data term that measures density mismatch at a set of sample points (e.g., squared error between predicted and target opacity) plus a term that regularizes Gaussian shape, subject to constraints on the Gaussian set, among them the fixed budget of $N_{max}$ components.

3. Optimal Transport-Based Voxelization

After fitting, the $N_{max}$ Gaussians $\{g_i\}_{i=1}^{N_{max}}$ are assigned to the pre-defined $N_v \times N_v \times N_v$ voxel grid positions $\{x_j\}_{j=1}^{N_{max}}$ by a bijective optimal transport mapping. A cost matrix is constructed as

$$D_{ij} = \|\mu_i - x_j\|_2^2$$

and a linear assignment (discrete optimal transport) problem is solved:

$$\min_{T} \sum_{i=1}^{N_{max}} \sum_{j=1}^{N_{max}} T_{ij} D_{ij} \quad \text{s.t.} \quad \sum_{j} T_{ij} = 1, \quad \sum_{i} T_{ij} = 1, \quad T_{ij} \in \{0, 1\}.$$

The Jonker–Volgenant algorithm solves the assignment in cubic time; in practice it is approximated by spatial block partitioning. Each Gaussian is assigned to a unique voxel, and for compactness the stored position features are the offsets $\mu_i - x_{j(i)}$ from the assigned voxel center $x_{j(i)}$ rather than absolute positions. The resulting structured array $y \in \mathbb{R}^{N_v \times N_v \times N_v \times C}$ is the GaussianCube.
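The assignment step can be illustrated on a toy $2 \times 2 \times 2$ grid. The sketch below is not the Jonker–Volgenant algorithm: it finds the exact one-to-one assignment by exhaustive search over permutations, which is only feasible for a handful of points. The grid extent $[-1, 1]^3$, the squared-distance cost, and all function names are assumptions made for the example.

```python
from itertools import permutations, product


def squared_distance(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def voxel_centers(n_v):
    """Centers of an n_v^3 grid spanning [-1, 1]^3 (the extent is an assumption)."""
    step = 2.0 / n_v
    axis = [-1.0 + step * (k + 0.5) for k in range(n_v)]
    return list(product(axis, repeat=3))


def assign_bruteforce(positions, centers):
    """Exact bijective assignment minimizing total squared distance.

    Exhaustive search over all permutations; usable only for tiny inputs.
    Returns a list where entry i is the voxel index assigned to Gaussian i.
    """
    n = len(positions)
    cost = [[squared_distance(p, c) for c in centers] for p in positions]
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)


centers = voxel_centers(2)  # 8 voxel centers
# Toy Gaussian positions: each center slightly perturbed, listed in reverse order.
positions = [(c[0] + 0.1, c[1] - 0.05, c[2]) for c in reversed(centers)]

assignment = assign_bruteforce(positions, centers)
# Stored position feature: offset from the assigned voxel center.
offsets = [tuple(p - q for p, q in zip(positions[i], centers[j]))
           for i, j in enumerate(assignment)]
print(assignment)  # [7, 6, 5, 4, 3, 2, 1, 0]
```

Every offset in this toy case is approximately $(0.1, -0.05, 0)$, so the stored values stay small and centered even though the absolute positions differ. At realistic sizes a dedicated solver, for example SciPy's `linear_sum_assignment`, would replace the exhaustive search.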

4. Diffusion Modeling in Structured Grid Space

The regularity of the GaussianCube grid permits direct use of standard 3D U-Net architectures for denoising diffusion probabilistic modeling. The forward process adds Gaussian noise to the grid representation $y_0$ at each step $t$:

$$y_t = \alpha_t y_0 + \sigma_t \epsilon, \quad \epsilon \sim \mathcal{N}(0, I),$$

with a cosine schedule for the noise coefficient $\alpha_t$ (distinct from the opacity $\alpha_i$). The 3D U-Net $\hat{y}_\theta$ predicts the clean $y_0$ from the noisy $y_t$, using an L2 loss:

$$\mathcal{L}_{simple} = \mathbb{E}_{t, y_0, \epsilon}\left[\left\|\hat{y}_\theta(\alpha_t y_0 + \sigma_t \epsilon, t, c_{cls}) - y_0\right\|_2^2\right],$$

optionally conditioning on class $c_{cls}$ via adaptive GroupNorm.
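The forward noising step and the clean-sample L2 objective can be sketched without any deep learning framework. The example below is a minimal illustration on a flat list of numbers standing in for a flattened grid. The specific schedule (cosine for the signal coefficient, sine for the noise coefficient, with continuous time in $[0, 1]$) is an assumed variance-preserving form, not necessarily the one used in the paper, and all names are hypothetical.

```python
import math
import random


def cosine_schedule(t):
    """Return (alpha_t, sigma_t) for t in [0, 1].

    A simple variance-preserving cosine form (an assumption):
    alpha_t^2 + sigma_t^2 = 1 for every t.
    """
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def forward_noise(y0, t, rng):
    """Sample y_t = alpha_t * y0 + sigma_t * eps with eps ~ N(0, I)."""
    alpha_t, sigma_t = cosine_schedule(t)
    eps = [rng.gauss(0.0, 1.0) for _ in y0]
    return [alpha_t * y + sigma_t * e for y, e in zip(y0, eps)]


def l2_loss(pred, target):
    """Mean squared error between a predicted clean grid and the true y0."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


rng = random.Random(0)
y0 = [0.5, -0.2, 0.1, 0.9]            # stand-in for a flattened GaussianCube
y_half = forward_noise(y0, 0.5, rng)  # partially noised sample
# A real model would predict y0 from y_half; here the noisy input itself
# serves as a trivial "prediction" to show how the loss is evaluated.
print(l2_loss(y_half, y0))
```

At $t = 0$ the sample is returned unchanged and at $t = 1$ it is pure noise, which is the behavior the denoiser is trained to invert.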

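An additional image-level reconstruction loss, combining a pixelwise term with a feature term computed on VGG feature maps, is imposed:

$$\mathcal{L}_{image} = \mathbb{E}\left[\left\|I_{pred} - I_{gt}\right\|\right] + \mathbb{E}\left[\sum_{l} \left\|\Psi^{l}(I_{pred}) - \Psi^{l}(I_{gt})\right\|\right],$$

where $I_{pred}$ is the rendered output, $I_{gt}$ is the ground truth, and $l$ indexes the VGG layers $\Psi^{l}$. The total objective is

$$\mathcal{L} = \mathcal{L}_{simple} + \lambda \mathcal{L}_{image},$$

where $\lambda$ weights the image-level term. The grid structure obviates the need for custom architectures, allowing replacement of all 2D U-Net modules with their 3D analogues.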

5. Comparative Parameter Efficiency and Fidelity

GaussianCube achieves high-fidelity reconstruction with one to two orders of magnitude fewer parameters than prior explicit or hybrid radiance proxies. The following table summarizes results on ShapeNet Car:

Method                     PSNR    LPIPS    SSIM     Speed (×)   Params (M)
Instant-NGP                33.98   0.0386   0.9809   1.00        12.3
Gaussian Splatting         35.32   0.0303   0.9874   2.60        1.84
Voxel (shared decoder)     25.80   0.1407   0.9111   1.73        0.47
Triplane (shared decoder)  31.39   0.0759   0.9635   1.05        6.3
GaussianCube               34.94   0.0347   0.9863   3.33        0.46

GaussianCube matches or surpasses Instant-NGP and Triplane on PSNR, LPIPS, and SSIM, and trails unconstrained Gaussian Splatting only slightly, while using far fewer parameters: roughly $27\times$ fewer than Instant-NGP (12.3M vs. 0.46M) and roughly $14\times$ fewer than the shared-decoder Triplane (6.3M vs. 0.46M).
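The parameter ratios can be checked directly from the table. The short sketch below only re-derives them from the reported parameter counts; the dictionary and function names are illustrative.

```python
# Parameter counts in millions, copied from the ShapeNet Car table.
PARAMS_M = {
    "Instant-NGP": 12.3,
    "Gaussian Splatting": 1.84,
    "Voxel (shared decoder)": 0.47,
    "Triplane (shared decoder)": 6.3,
    "GaussianCube": 0.46,
}


def reduction_factor(method, reference="GaussianCube"):
    """How many times more parameters `method` uses than `reference`."""
    return PARAMS_M[method] / PARAMS_M[reference]


for name in PARAMS_M:
    print(f"{name}: {reduction_factor(name):.1f}x")
# Instant-NGP: 26.7x
# Gaussian Splatting: 4.0x
# Voxel (shared decoder): 1.0x
# Triplane (shared decoder): 13.7x
# GaussianCube: 1.0x
```

Only the shared-decoder voxel baseline has a comparable parameter count, and its reconstruction metrics in the table are markedly worse.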

6. Empirical Performance on Generative Tasks

In unconditional and class-conditioned 3D object generation (ShapeNet Car/Chair, OmniObject3D), GaussianCube exhibits state-of-the-art quantitative and qualitative results, measured by FID-50K and KID-50K scores:

Task                        Metric    GaussianCube   Baselines
ShapeNet Car                FID-50K   13.01          GET3D 17.15, EG3D 30.48, DiffTF 51.88
ShapeNet Chair              FID-50K   15.99          GET3D 19.24, EG3D 27.98, DiffTF 47.08
OmniObject3D, class-cond.   FID-50K   11.62          DiffTF 46.06
OmniObject3D, class-cond.   KID-50K   2.78‰          DiffTF 22.86‰

Qualitatively, GaussianCube is reported to yield objects with complex geometry and sharp textures, while GAN and Triplane-diffusion competitors show blur or failure to capture fine details (Zhang et al., 2024). The method’s fully explicit structure and regular grid make it directly extensible to further applications such as digital avatar creation and text-to-3D synthesis, where similar parameter and fidelity advantages are anticipated.

7. Integration and Broader Applicability

GaussianCube’s explicitness, parameter efficiency, and regularized voxel-grid formulation enable direct use with off-the-shelf generative backbones, without custom architectural changes. This positions it as a versatile foundation for future 3D generative modeling tasks, particularly those requiring high-quality synthesis with compact and structured radiance proxies (Zhang et al., 2024). A plausible implication is that, as diffusion-based and text-guided synthesis scale in complexity, GaussianCube’s approach may yield persistent benefits in quality and efficiency.
