
GaussianCube: Efficient 3D Radiance Model

Updated 27 January 2026
  • GaussianCube is a 3D radiance representation that uses a structured voxel grid with fixed Gaussian components for accurate object modeling.
  • It employs a densification-constrained Gaussian fitting algorithm and optimal transport-based voxelization to ensure parameter efficiency and precise reconstruction.
  • The regular grid structure seamlessly integrates with 3D U-Net diffusion models, achieving state-of-the-art performance in generative tasks.

GaussianCube is a fully explicit, spatially structured 3D radiance representation designed to facilitate high-fidelity and parameter-efficient 3D generative modeling. It merges the real-time rendering and reconstruction accuracy of 3D Gaussian Splatting with a regular voxel-grid format, enabling seamless integration with standard 3D U-Net diffusion models and substantially reducing the parameter requirements characteristic of previous explicit and implicit radiance proxies (Zhang et al., 2024).

1. Formal Structure and Representation

GaussianCube represents a single 3D object by a fixed set of $N_{max}$ Gaussians,

$$g_i(x) = \exp\left(-\frac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1}(x - \mu_i)\right),$$

each parameterized by position $\mu_i \in \mathbb{R}^3$, color $c_i \in \mathbb{R}^3$, opacity $\alpha_i$, scale $s_i \in \mathbb{R}^3$, and rotation $q_i \in \mathbb{R}^4$, collectively forming a feature vector $\theta_i = \{\mu_i, s_i, q_i, \alpha_i, c_i\} \in \mathbb{R}^C$ for each Gaussian. In typical experiments, $N_{max} = 32{,}768$, arranged into an $N_v \times N_v \times N_v$ voxel grid with $N_v = 32$, such that each voxel contains a single Gaussian's features.

In contrast to hybrid NeRF proxies, which use a shared implicit decoder and unstructured arrangements, GaussianCube is fully explicit: each object is directly represented without a decoder bottleneck. This structure is amenable to efficient convolutional neural network operations and guarantees a constant number of parameters per scene, a prerequisite for scalable generative modeling.
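Concretely, the representation is a dense array of fixed shape. The channel layout below is an illustrative assumption consistent with the parameterization above, not the paper's exact storage order:

```python
import numpy as np

# Hypothetical per-Gaussian channel layout:
# position offset (3) + scale (3) + rotation quaternion (4) + opacity (1) + color (3)
C = 3 + 3 + 4 + 1 + 3          # 14 channels per Gaussian
N_v = 32                        # voxels per axis
N_max = N_v ** 3                # 32,768 Gaussians, one per voxel

# The GaussianCube is a plain dense 4D array: no decoder, constant size per object.
Y = np.zeros((N_v, N_v, N_v, C), dtype=np.float32)
print(Y.shape)                  # (32, 32, 32, 14)
```

Because every object maps to the same array shape, batches of objects can be stacked and fed to a convolutional network directly.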

2. Densification-Constrained Gaussian Fitting

The representation is constructed via a densification-constrained Gaussian fitting algorithm. Traditional Gaussian Splatting alternates between densification (splitting or cloning Gaussians) and pruning, resulting in variable and often excessive numbers of components (typically exceeding $10^5$ per scene). GaussianCube restricts the number of active Gaussians to exactly $N_{max}$, which is crucial for subsequent grid voxelization and generative tasks.

During fitting, at each iteration, the set of $N_d$ candidates for densification is compared to the available capacity $N_{max} - N_c$ (where $N_c$ is the current number of Gaussians). If $N_d \leq N_{max} - N_c$, all candidates are densified; otherwise, only the top $N_{max} - N_c$ are selected by view-space positional gradient. Splitting and cloning are interleaved but capped, and after convergence the Gaussian count is pruned to $\leq N_{max}$. Any deficit is padded with zero-opacity Gaussians ($\alpha = 0$) to ensure a fixed-size grid.
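The capping rule can be sketched as follows; `cap_densification`, the gradient values, and the counts are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

def cap_densification(grad_norms, n_current, n_max):
    """Select which candidate Gaussians may densify (split/clone) this iteration.

    If the candidates fit within the remaining budget, take all of them;
    otherwise keep only those with the largest view-space positional gradients.
    """
    capacity = n_max - n_current
    candidates = np.arange(len(grad_norms))
    if len(candidates) <= capacity:
        return candidates
    # Top-(N_max - N_c) candidates, ranked by gradient magnitude.
    order = np.argsort(grad_norms)[::-1]
    return np.sort(order[:capacity])

# Example: 5 candidates, but room for only 3 more Gaussians.
grads = np.array([0.1, 0.9, 0.3, 0.7, 0.05])
print(cap_densification(grads, n_current=32765, n_max=32768))  # [1 2 3]
```

In the full algorithm this selection runs inside the usual Gaussian Splatting optimization loop, followed by final pruning and zero-opacity padding to exactly $N_{max}$.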

This iterative process can be viewed as an approximate solution to a regularized density-matching problem:

$$\min_{\mu_i, \Sigma_i, w_i} \sum_{p=1}^{P} \sum_{i=1}^{N_{max}} w_i \, D(F(\mu_i, \Sigma_i), V_p) + \lambda \mathcal{R}(\Sigma_i)$$

subject to $\sum_i w_i = 1$, $w_i \geq 0$, and $|\{i : w_i > 0\}| \leq N_{max}$, where $V_p$ are sample points, $D(\cdot, \cdot)$ measures density mismatch (e.g., squared error between predicted and target opacity), and $\mathcal{R}(\Sigma_i)$ regularizes Gaussian shape.

3. Optimal Transport-Based Voxelization

After fitting $N_{max}$ Gaussians $\{\mu_i\}$, a bijective optimal transport mapping assigns them to the pre-defined $N_v^3$ voxel grid positions $\{x_j\}$. A cost matrix is constructed as

$$D_{ij} = \| \mu_i - x_j \|^2,$$

and a linear assignment (discrete optimal transport) problem is solved:

$$\min_{T \in \{0,1\}^{N_{max} \times N_{max}}} \sum_{i,j} T_{ij} D_{ij} \quad \text{s.t.} \quad \sum_j T_{ij} = 1 \ \forall i, \quad \sum_i T_{ij} = 1 \ \forall j.$$

The Jonker–Volgenant algorithm solves this assignment in cubic time (accelerated in practice by spatial block partitioning). Each Gaussian is assigned to a unique voxel, and for compactness the stored features contain the offset $\mu_i - x_j$ of the Gaussian $i$ assigned to voxel $j$, rather than its absolute position. The resulting structured array $Y \in \mathbb{R}^{N_v \times N_v \times N_v \times C}$ is the GaussianCube.
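A toy sketch of this assignment step on a hypothetical $2^3$ grid, using SciPy's linear-sum-assignment solver (a modified Jonker–Volgenant implementation); the random perturbation standing in for fitted Gaussian means is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
N_v = 2
grid = np.stack(np.meshgrid(*[np.arange(N_v)] * 3, indexing="ij"), axis=-1)
x = grid.reshape(-1, 3).astype(float)          # voxel positions x_j
mu = x + rng.normal(scale=0.1, size=x.shape)   # stand-in for fitted Gaussian means

# Cost matrix D_ij = ||mu_i - x_j||^2, then solve the linear assignment problem.
D = ((mu[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
row, col = linear_sum_assignment(D)

# Store offsets mu_i - x_j for the Gaussian assigned to each voxel.
offsets = mu[row] - x[col]
assert len(set(col)) == len(col)               # bijective: one Gaussian per voxel
```

At full scale ($N_{max} = 32{,}768$) the same problem is solved per block after spatial partitioning, since the dense $N_{max} \times N_{max}$ cost matrix would otherwise be prohibitive.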

4. Diffusion Modeling in Structured Grid Space

The regularity of the GaussianCube grid permits direct use of standard 3D U-Net architectures for denoising diffusion probabilistic modeling. The forward process adds Gaussian noise to the grid representation $Y$ at each step $t = 1, \ldots, T$:

$$Y_t = \alpha_t Y_0 + \sigma_t \epsilon, \quad \epsilon \sim \mathcal{N}(0, I),$$

with a cosine schedule for $(\alpha_t, \sigma_t)$. The 3D U-Net predicts the clean $Y_0$ from the noisy $Y_t$ using an $L_2$ loss:

$$\mathcal{L}_{simple} = \mathbb{E}_{t, Y_0, \epsilon}\left[\|\hat{Y}_\theta(Y_t, t, c_{cls}) - Y_0\|^2\right],$$

optionally conditioning on a class label $c_{cls}$ via adaptive GroupNorm.
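A minimal sketch of the forward noising step and the $Y_0$-prediction loss, assuming a Nichol–Dhariwal-style cosine schedule (the paper does not specify its exact schedule parameters) and a placeholder in place of the U-Net prediction:

```python
import numpy as np

def cosine_alpha_sigma(t, T, s=0.008):
    """One common cosine schedule with alpha_t^2 + sigma_t^2 = 1 (an assumption here)."""
    def f(u):
        return np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    alpha = np.sqrt(f(t) / f(0))
    sigma = np.sqrt(1.0 - alpha ** 2)
    return alpha, sigma

rng = np.random.default_rng(0)
Y0 = rng.normal(size=(32, 32, 32, 14)).astype(np.float32)  # one GaussianCube sample

t, T = 500, 1000
alpha_t, sigma_t = cosine_alpha_sigma(t, T)
eps = rng.normal(size=Y0.shape).astype(np.float32)
Yt = alpha_t * Y0 + sigma_t * eps                          # forward noising

# x0-prediction objective: the 3D U-Net output (placeholder here) is trained toward Y0.
Y_hat = Yt
loss = float(np.mean((Y_hat - Y0) ** 2))
```

In training, `Y_hat` would be the 3D U-Net output given `Yt`, `t`, and the optional class embedding; everything else is the standard DDPM recipe applied to a grid instead of an image.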

An additional image-level reconstruction loss, combining a pixelwise $L_2$ term with perceptual losses over VGG feature maps, is imposed:

$$\mathcal{L}_{image} = \mathbb{E}\left[\|I_{pred} - I_{gt}\|^2 + \sum_l \|\Psi^l(I_{pred}) - \Psi^l(I_{gt})\|^2\right],$$

where $I_{pred}$ is the rendered output, $I_{gt}$ is the ground truth, and $\Psi^l$ denotes the $l$-th VGG feature layer. The total objective is

$$\mathcal{L} = \mathcal{L}_{simple} + \lambda \mathcal{L}_{image},$$

with $\lambda = 10$. The grid structure obviates the need for custom architectures, allowing replacement of all 2D U-Net modules with their 3D analogues.

5. Comparative Parameter Efficiency and Fidelity

GaussianCube achieves high-fidelity reconstruction with one to two orders of magnitude fewer parameters than prior explicit or hybrid radiance proxies. The following table summarizes results on ShapeNet Car:

| Method | PSNR | LPIPS | SSIM | Speed (×) | Params (M) |
| --- | --- | --- | --- | --- | --- |
| Instant-NGP | 33.98 | 0.0386 | 0.9809 | 1.00 | 12.3 |
| Gaussian Splatting | 35.32 | 0.0303 | 0.9874 | 2.60 | 1.84 |
| Voxel (shared decoder) | 25.80 | 0.1407 | 0.9111 | 1.73 | 0.47 |
| Triplane (shared decoder) | 31.39 | 0.0759 | 0.9635 | 1.05 | 6.3 |
| GaussianCube | 34.94 | 0.0347 | 0.9863 | 3.33 | 0.46 |

GaussianCube matches or surpasses methods such as Instant-NGP and Triplane on PSNR, LPIPS, and SSIM while using dramatically fewer parameters: a factor of $27\times$ fewer than Instant-NGP and $14\times$ fewer than Triplane (both with shared decoders).
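The 0.46M parameter count in the table follows directly from the grid dimensions, assuming the 14-channel per-Gaussian feature layout described in Section 1:

```python
# Parameter count of a GaussianCube: one C-channel feature vector per voxel.
N_v, C = 32, 14                 # grid resolution; channel count assumed from Section 1
params = N_v ** 3 * C
print(params)                   # 458752
print(round(params / 1e6, 2))   # 0.46, matching the table's Params (M) column
```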

6. Empirical Performance on Generative Tasks

In unconditional and class-conditioned 3D object generation (ShapeNet Car/Chair, OmniObject3D), GaussianCube exhibits state-of-the-art quantitative and qualitative results, measured by FID-50K and KID-50K scores:

| Task | Metric | GaussianCube | Baselines |
| --- | --- | --- | --- |
| ShapeNet Car | FID-50K | 13.01 | GET3D 17.15, EG3D 30.48, DiffTF 51.88 |
| ShapeNet Chair | FID-50K | 15.99 | GET3D 19.24, EG3D 27.98, DiffTF 47.08 |
| OmniObject3D (class-cond.) | FID-50K | 11.62 | DiffTF 46.06 |
| OmniObject3D (class-cond.) | KID-50K | 2.78‰ | DiffTF 22.86‰ |

Qualitatively, GaussianCube is reported to yield objects with complex geometry and sharp textures, while GAN and Triplane-diffusion competitors show blur or failure to capture fine details (Zhang et al., 2024). The method’s fully explicit structure and regular grid make it directly extensible to further applications such as digital avatar creation and text-to-3D synthesis, where similar parameter and fidelity advantages are anticipated.

7. Integration and Broader Applicability

GaussianCube’s explicitness, parameter efficiency, and regularized voxel-grid formulation enable direct use with off-the-shelf generative backbones, without custom architectural changes. This positions it as a versatile foundation for future 3D generative modeling tasks, particularly those requiring high-quality synthesis with compact and structured radiance proxies (Zhang et al., 2024). A plausible implication is that, as diffusion-based and text-guided synthesis scale in complexity, GaussianCube’s approach may yield persistent benefits in quality and efficiency.
