
GaussianCube: Efficient 3D Radiance Model

Updated 27 January 2026
  • GaussianCube is a 3D radiance representation that uses a structured voxel grid with fixed Gaussian components for accurate object modeling.
  • It employs a densification-constrained Gaussian fitting algorithm and optimal transport-based voxelization to ensure parameter efficiency and precise reconstruction.
  • The regular grid structure seamlessly integrates with 3D U-Net diffusion models, achieving state-of-the-art performance in generative tasks.

GaussianCube is a fully explicit, spatially structured 3D radiance representation designed to facilitate high-fidelity and parameter-efficient 3D generative modeling. It merges the real-time rendering and reconstruction accuracy of 3D Gaussian Splatting with a regular voxel-grid format, enabling seamless integration with standard 3D U-Net diffusion models and substantially reducing the parameter requirements characteristic of previous explicit and implicit radiance proxies (Zhang et al., 2024).

1. Formal Structure and Representation

GaussianCube represents a single 3D object by a fixed set of $N_{max}$ Gaussians,

$$g_i(x) = \exp\left(-\frac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1}(x - \mu_i)\right),$$

each parameterized by position $\mu_i \in \mathbb{R}^3$, color $c_i \in \mathbb{R}^3$, opacity $\alpha_i$, scale $s_i \in \mathbb{R}^3$, and rotation $q_i \in \mathbb{R}^4$, collectively forming a feature vector $\theta_i = \{\mu_i, s_i, q_i, \alpha_i, c_i\} \in \mathbb{R}^C$ for each Gaussian. In typical experiments, $N_{max} = 32{,}768$, arranged into an $N_v \times N_v \times N_v$ voxel grid with $N_v = 32$, such that each voxel contains a single Gaussian's features.

In contrast to hybrid NeRF proxies, which use a shared implicit decoder and unstructured arrangements, GaussianCube is fully explicit: each object is directly represented without a decoder bottleneck. This structure is amenable to efficient convolutional neural network operations and guarantees a constant number of parameters per scene, a prerequisite for scalable generative modeling.
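Concretely, the representation is a dense array of fixed shape. The channel layout below is an illustrative assumption consistent with the parameterization above, not the paper's exact storage order:

```python
import numpy as np

# Hypothetical per-Gaussian channel layout:
# position offset (3) + scale (3) + rotation quaternion (4) + opacity (1) + color (3)
C = 3 + 3 + 4 + 1 + 3          # 14 channels per Gaussian
N_v = 32                        # voxels per axis
N_max = N_v ** 3                # 32,768 Gaussians, one per voxel

# The GaussianCube is a plain dense 4D array: no decoder, constant size per object.
Y = np.zeros((N_v, N_v, N_v, C), dtype=np.float32)
print(Y.shape)                  # (32, 32, 32, 14)
```

Because every object maps to the same array shape, batches of objects can be stacked and fed to a convolutional network directly.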

2. Densification-Constrained Gaussian Fitting

The representation is constructed via a densification-constrained Gaussian fitting algorithm. Traditional Gaussian Splatting alternates between densification (splitting or cloning Gaussians) and pruning, resulting in variable and often excessive numbers of components (typically exceeding $10^5$ per scene). GaussianCube restricts the number of active Gaussians to exactly $N_{max}$, which is crucial for subsequent grid voxelization and generative tasks.

During fitting, at each iteration, the set of $N_d$ candidates for densification is compared to the available capacity $N_{max} - N_c$ (where $N_c$ is the current number of Gaussians). If $N_d \leq N_{max} - N_c$, all candidates are densified; otherwise, only the top $N_{max} - N_c$ are selected by view-space positional gradient. Splitting and cloning are interleaved but capped, and after convergence the Gaussian count is pruned to $\leq N_{max}$. Any deficit is padded with zero-opacity Gaussians ($\alpha = 0$) to ensure a fixed-size grid.
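The capping rule can be sketched as follows; `cap_densification`, the gradient values, and the counts are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

def cap_densification(grad_norms, n_current, n_max):
    """Select which candidate Gaussians may densify (split/clone) this iteration.

    If the candidates fit within the remaining budget, take all of them;
    otherwise keep only those with the largest view-space positional gradients.
    """
    capacity = n_max - n_current
    candidates = np.arange(len(grad_norms))
    if len(candidates) <= capacity:
        return candidates
    # Top-(N_max - N_c) candidates, ranked by gradient magnitude.
    order = np.argsort(grad_norms)[::-1]
    return np.sort(order[:capacity])

# Example: 5 candidates, but room for only 3 more Gaussians.
grads = np.array([0.1, 0.9, 0.3, 0.7, 0.05])
print(cap_densification(grads, n_current=32765, n_max=32768))  # [1 2 3]
```

In the full algorithm this selection runs inside the usual Gaussian Splatting optimization loop, followed by final pruning and zero-opacity padding to exactly $N_{max}$.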

This iterative process can be viewed as an approximate solution to a regularized density-matching problem:

$$\min_{\mu_i, \Sigma_i, w_i} \sum_{p=1}^{P} \sum_{i=1}^{N_{max}} w_i \, D(F(\mu_i, \Sigma_i), V_p) + \lambda \mathcal{R}(\Sigma_i)$$

subject to $\sum_i w_i = 1$, $w_i \geq 0$, and $|\{i : w_i > 0\}| \leq N_{max}$, where $V_p$ are sample points, $D(\cdot, \cdot)$ measures density mismatch (e.g., squared error between predicted and target opacity), and $\mathcal{R}(\Sigma_i)$ regularizes Gaussian shape.

3. Optimal Transport-Based Voxelization

After fitting $N_{max}$ Gaussians $\{\mu_i\}$, a bijective optimal transport mapping assigns them to the pre-defined $N_v^3$ voxel grid positions $\{x_j\}$. A cost matrix is constructed as

$$D_{ij} = \| \mu_i - x_j \|^2,$$

and a linear assignment (discrete optimal transport) problem is solved:

$$\min_{T \in \{0,1\}^{N_{max} \times N_{max}}} \sum_{i,j} T_{ij} D_{ij} \quad \text{s.t.} \quad \sum_j T_{ij} = 1 \ \forall i, \quad \sum_i T_{ij} = 1 \ \forall j.$$

The Jonker–Volgenant algorithm solves this assignment in cubic time (accelerated in practice by spatial block partitioning). Each Gaussian is assigned to a unique voxel, and for compactness the stored features contain the offset $\mu_i - x_j$ of the Gaussian $i$ assigned to voxel $j$, rather than its absolute position. The resulting structured array $Y \in \mathbb{R}^{N_v \times N_v \times N_v \times C}$ is the GaussianCube.
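A toy sketch of this assignment step on a hypothetical $2^3$ grid, using SciPy's linear-sum-assignment solver (a modified Jonker–Volgenant implementation); the random perturbation standing in for fitted Gaussian means is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
N_v = 2
grid = np.stack(np.meshgrid(*[np.arange(N_v)] * 3, indexing="ij"), axis=-1)
x = grid.reshape(-1, 3).astype(float)          # voxel positions x_j
mu = x + rng.normal(scale=0.1, size=x.shape)   # stand-in for fitted Gaussian means

# Cost matrix D_ij = ||mu_i - x_j||^2, then solve the linear assignment problem.
D = ((mu[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
row, col = linear_sum_assignment(D)

# Store offsets mu_i - x_j for the Gaussian assigned to each voxel.
offsets = mu[row] - x[col]
assert len(set(col)) == len(col)               # bijective: one Gaussian per voxel
```

At full scale ($N_{max} = 32{,}768$) the same problem is solved per block after spatial partitioning, since the dense $N_{max} \times N_{max}$ cost matrix would otherwise be prohibitive.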

4. Diffusion Modeling in Structured Grid Space

The regularity of the GaussianCube grid permits direct use of standard 3D U-Net architectures for denoising diffusion probabilistic modeling. The forward process adds Gaussian noise to the grid representation $Y$ at each step $t = 1, \ldots, T$:

$$Y_t = \alpha_t Y_0 + \sigma_t \epsilon, \quad \epsilon \sim \mathcal{N}(0, I),$$

with a cosine schedule for $(\alpha_t, \sigma_t)$. The 3D U-Net predicts the clean $Y_0$ from the noisy $Y_t$ using an $L_2$ loss:

$$\mathcal{L}_{simple} = \mathbb{E}_{t, Y_0, \epsilon}\left[\|\hat{Y}_\theta(Y_t, t, c_{cls}) - Y_0\|^2\right],$$

optionally conditioning on a class label $c_{cls}$ via adaptive GroupNorm.
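A minimal sketch of the forward noising step and the $Y_0$-prediction loss, assuming a Nichol–Dhariwal-style cosine schedule (the paper does not specify its exact schedule parameters) and a placeholder in place of the U-Net prediction:

```python
import numpy as np

def cosine_alpha_sigma(t, T, s=0.008):
    """One common cosine schedule with alpha_t^2 + sigma_t^2 = 1 (an assumption here)."""
    def f(u):
        return np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    alpha = np.sqrt(f(t) / f(0))
    sigma = np.sqrt(1.0 - alpha ** 2)
    return alpha, sigma

rng = np.random.default_rng(0)
Y0 = rng.normal(size=(32, 32, 32, 14)).astype(np.float32)  # one GaussianCube sample

t, T = 500, 1000
alpha_t, sigma_t = cosine_alpha_sigma(t, T)
eps = rng.normal(size=Y0.shape).astype(np.float32)
Yt = alpha_t * Y0 + sigma_t * eps                          # forward noising

# x0-prediction objective: the 3D U-Net output (placeholder here) is trained toward Y0.
Y_hat = Yt
loss = float(np.mean((Y_hat - Y0) ** 2))
```

In training, `Y_hat` would be the 3D U-Net output given `Yt`, `t`, and the optional class embedding; everything else is the standard DDPM recipe applied to a grid instead of an image.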

An additional image-level reconstruction loss, combining a pixelwise $L_2$ term with perceptual losses over VGG feature maps, is imposed:

$$\mathcal{L}_{image} = \mathbb{E}\left[\|I_{pred} - I_{gt}\|^2 + \sum_l \|\Psi^l(I_{pred}) - \Psi^l(I_{gt})\|^2\right],$$

where $I_{pred}$ is the rendered output, $I_{gt}$ is the ground truth, and $\Psi^l$ denotes the $l$-th VGG feature layer. The total objective is

$$\mathcal{L} = \mathcal{L}_{simple} + \lambda \mathcal{L}_{image},$$

with $\lambda = 10$. The grid structure obviates the need for custom architectures, allowing replacement of all 2D U-Net modules with their 3D analogues.

5. Comparative Parameter Efficiency and Fidelity

GaussianCube achieves high-fidelity reconstruction with one to two orders of magnitude fewer parameters than prior explicit or hybrid radiance proxies. The following table summarizes results on ShapeNet Car:

| Method | PSNR | LPIPS | SSIM | Speed (×) | Params (M) |
| --- | --- | --- | --- | --- | --- |
| Instant-NGP | 33.98 | 0.0386 | 0.9809 | 1.00 | 12.3 |
| Gaussian Splatting | 35.32 | 0.0303 | 0.9874 | 2.60 | 1.84 |
| Voxel (shared decoder) | 25.80 | 0.1407 | 0.9111 | 1.73 | 0.47 |
| Triplane (shared decoder) | 31.39 | 0.0759 | 0.9635 | 1.05 | 6.3 |
| GaussianCube | 34.94 | 0.0347 | 0.9863 | 3.33 | 0.46 |

GaussianCube matches or surpasses methods such as Instant-NGP and Triplane on PSNR, LPIPS, and SSIM while using dramatically fewer parameters: a factor of $27\times$ fewer than Instant-NGP and $14\times$ fewer than Triplane (both with shared decoders).
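The 0.46M parameter count in the table follows directly from the grid dimensions, assuming the 14-channel per-Gaussian feature layout described in Section 1:

```python
# Parameter count of a GaussianCube: one C-channel feature vector per voxel.
N_v, C = 32, 14                 # grid resolution; channel count assumed from Section 1
params = N_v ** 3 * C
print(params)                   # 458752
print(round(params / 1e6, 2))   # 0.46, matching the table's Params (M) column
```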

6. Empirical Performance on Generative Tasks

In unconditional and class-conditioned 3D object generation (ShapeNet Car/Chair, OmniObject3D), GaussianCube exhibits state-of-the-art quantitative and qualitative results, measured by FID-50K and KID-50K scores:

| Task | Metric | GaussianCube | Baselines |
| --- | --- | --- | --- |
| ShapeNet Car | FID-50K | 13.01 | GET3D 17.15, EG3D 30.48, DiffTF 51.88 |
| ShapeNet Chair | FID-50K | 15.99 | GET3D 19.24, EG3D 27.98, DiffTF 47.08 |
| OmniObject3D (class-cond.) | FID-50K | 11.62 | DiffTF 46.06 |
| OmniObject3D (class-cond.) | KID-50K | 2.78‰ | DiffTF 22.86‰ |

Qualitatively, GaussianCube is reported to yield objects with complex geometry and sharp textures, while GAN and Triplane-diffusion competitors show blur or failure to capture fine details (Zhang et al., 2024). The method’s fully explicit structure and regular grid make it directly extensible to further applications such as digital avatar creation and text-to-3D synthesis, where similar parameter and fidelity advantages are anticipated.

7. Integration and Broader Applicability

GaussianCube’s explicitness, parameter efficiency, and regularized voxel-grid formulation enable direct use with off-the-shelf generative backbones, without custom architectural changes. This positions it as a versatile foundation for future 3D generative modeling tasks, particularly those requiring high-quality synthesis with compact and structured radiance proxies (Zhang et al., 2024). A plausible implication is that, as diffusion-based and text-guided synthesis scale in complexity, GaussianCube’s approach may yield persistent benefits in quality and efficiency.
