
Distilled-3DGS: Efficient 3D Gaussian Splatting

Updated 24 August 2025
  • The paper introduces Distilled-3DGS, a multi-teacher knowledge distillation framework for 3D Gaussian Splatting that reduces the number of Gaussians by up to 90% without compromising photorealistic rendering quality.
  • It leverages a teacher ensemble—comprising vanilla, noise-augmented, and dropout-regularized 3DGS models—to generate robust pseudo-labels through combined photometric and structural loss functions.
  • Experimental results on benchmarks like Mip-NeRF360 demonstrate enhanced geometric fidelity and a significant reduction in memory and storage demands while maintaining or improving PSNR.

Distilled-3DGS is a knowledge distillation framework for 3D Gaussian Splatting (3DGS) aimed at constructing compact, high-fidelity explicit 3D scene representations with reduced memory and storage requirements while maintaining state-of-the-art rendering quality (Xiang et al., 19 Aug 2025). By transferring geometric and radiometric knowledge from ensembles of heavy teacher 3DGS models—including noise-augmented and dropout-regularized variants—a substantially lighter student model is produced. This strategy enables a drastic reduction—up to nearly 90%—in the number of Gaussians required for photorealistic novel view synthesis, facilitating more efficient deployment without significant loss of quality.

1. Background: 3D Gaussian Splatting and Its Limitations

3DGS represents a 3D scene as a set of anisotropic Gaussian primitives, each defined by a center $\mu_i$, covariance $\Sigma_i$, spherical harmonics coefficients $f_i$ for appearance, and an opacity $o_i$. Rendering is achieved through differentiable projection and alpha-blending, producing highly detailed images in real time. However, high-fidelity 3DGS models demand several million Gaussians to capture geometric and photometric details, resulting in considerable memory and storage costs. The explicit point-based nature makes conventional distillation techniques—such as those in neural implicit representations—non-trivial to adapt due to the lack of a shared volumetric or grid structure.
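The alpha-blending step can be illustrated with a minimal NumPy sketch for a single pixel; the function name and array shapes here are illustrative, not from the paper:

```python
import numpy as np

def composite_pixel(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted Gaussians for one pixel.

    colors: (N, 3) array of per-Gaussian RGB contributions (from the SH coefficients).
    alphas: (N,) array of per-Gaussian opacities after 2D projection.
    """
    transmittance = 1.0
    pixel = np.zeros(3)
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c   # weight by accumulated transmittance
        transmittance *= (1.0 - a)       # attenuate for Gaussians behind this one
    return pixel
```

The accumulated transmittance term is what makes nearer, more opaque Gaussians dominate the final color.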

2. Knowledge Distillation Framework: Teacher Ensemble and Student

The Distilled-3DGS framework introduces a three-teacher ensemble to guide the learning of a lightweight student:

  • Vanilla 3DGS ("G_std"): Optimized with standard photometric (L1 and D-SSIM) losses; serves as the baseline high-capacity model.
  • Noise-Augmented 3DGS ("G_perb"): Gaussian parameters (positions, rotations, scales, opacities) are perturbed stochastically during training (rotations, for instance, in a continuous 6D representation, $R_p^t = f^{-1}(f(R_p^t) + \delta_r)$), resulting in more robust and spatially stable reconstructions.
  • Dropout-Regularized 3DGS ("G_drop"): Each Gaussian is dropped out with a gradually increasing probability as training progresses ($r_t = r_\mathrm{init} \cdot (t - t_0)/(t_1 - t_0)$), encouraging the ensemble to redundantly encode scene content and resist overfitting to specific primitives.
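The two teacher-regularization schedules above can be sketched as follows; function names and the clamping behavior are illustrative assumptions, not details from the paper:

```python
import numpy as np

def dropout_rate(t, t0, t1, r_init):
    """Linearly ramp the per-Gaussian dropout probability,
    r_t = r_init * (t - t0) / (t1 - t0), clamped to [0, r_init]."""
    if t <= t0:
        return 0.0
    return min(r_init, r_init * (t - t0) / (t1 - t0))

def perturb_positions(positions, sigma, rng):
    """Add zero-mean Gaussian noise to Gaussian centers, as in the
    noise-augmented teacher; rotations would be perturbed analogously
    in their continuous 6D representation."""
    return positions + rng.normal(0.0, sigma, size=positions.shape)
```

Ramping the dropout rate rather than fixing it lets the teacher first converge on scene structure before redundancy is enforced.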

For each training sample, all teachers independently synthesize a rendered image. Their outputs ($I_\mathrm{std}$, $I_\mathrm{perb}$, $I_\mathrm{drop}$) are averaged to form a pseudo-label image ($I_\mathrm{tea}$) that integrates diverse and robust supervision signals. The student, parameterized by a drastically reduced set of Gaussians, is optimized to minimize both the standard photometric loss with respect to ground truth and a distillation loss with respect to $I_\mathrm{tea}$.
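A minimal sketch of the pseudo-label aggregation and the distillation objective, using a plain L1 photometric term (the paper combines L1 with D-SSIM, and the weight value here is illustrative):

```python
import numpy as np

def pseudo_label(i_std, i_perb, i_drop):
    """Average the three teacher renders into the pseudo-label image I_tea."""
    return (i_std + i_perb + i_drop) / 3.0

def l1_loss(a, b):
    """Mean absolute photometric error (L1 component of L_color)."""
    return np.abs(a - b).mean()

def kd_loss(i_stu, i_gt, i_tea, lam_kd=0.5):
    """Photometric term to ground truth plus a weighted term to the
    teacher pseudo-label; lam_kd is an assumed, not published, value."""
    return l1_loss(i_stu, i_gt) + lam_kd * l1_loss(i_stu, i_tea)
```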

| Component | Function | Purpose in Distillation |
|---|---|---|
| Vanilla 3DGS ($G_\mathrm{std}$) | Standard photometric training | Baseline scene quality |
| Noise-Augmented ($G_\mathrm{perb}$) | Parameter perturbation during training | Robustness / local diversity |
| Dropout-Regularized ($G_\mathrm{drop}$) | Stochastic Gaussian pruning during training | Redundancy / reduced overfitting |
| Pseudo-label aggregation | Averaging outputs from all teachers | High-quality soft guidance |

3. Structural Similarity Loss for Geometric Consistency

To ensure spatial structure transfer beyond mere appearance, the framework introduces a voxelized histogram representation for geometric similarity. The 3D point clouds produced by the teacher and student are binned into a regular voxel grid (e.g., 128³), forming normalized per-voxel count distributions $h_\mathrm{tea}$ and $h_\mathrm{stu}$, respectively. The cosine similarity between these high-dimensional histogram features is then computed:

$$\mathcal{L}_\mathrm{hist} = 1 - \frac{h_\mathrm{tea} \cdot h_\mathrm{stu}}{\| h_\mathrm{tea} \|_2 \, \| h_\mathrm{stu} \|_2}$$

This term robustly aligns the spatial distributions, promoting both global and local geometric agreement regardless of the absolute number of Gaussians in each model. The histogram-matching strategy avoids sensitivity to sampling density and local point cloud artifacts, facilitating transfer from over-parameterized teachers to sparse students.
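The voxelized histogram loss can be sketched directly from its definition; the grid bounds, the L1 normalization of counts, and the small stabilizing epsilon are implementation assumptions:

```python
import numpy as np

def voxel_histogram(points, bounds_min, bounds_max, resolution=128):
    """Bin an (N, 3) point cloud of Gaussian centers into a regular voxel
    grid and return the flattened, normalized per-voxel count distribution."""
    norm = (points - bounds_min) / (bounds_max - bounds_min)
    idx = np.clip((norm * resolution).astype(int), 0, resolution - 1)
    flat = np.ravel_multi_index(idx.T, (resolution,) * 3)
    hist = np.bincount(flat, minlength=resolution ** 3).astype(float)
    return hist / max(hist.sum(), 1.0)

def hist_loss(h_tea, h_stu, eps=1e-12):
    """L_hist = 1 - cosine similarity of the two voxel histograms."""
    return 1.0 - h_tea @ h_stu / (np.linalg.norm(h_tea) * np.linalg.norm(h_stu) + eps)
```

Because each histogram is normalized, the loss depends only on where mass is placed, not on how many Gaussians each model uses—exactly the property that lets a sparse student match a dense teacher.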

4. Optimization Objective and Technical Details

The student model loss is a weighted combination of photometric and D-SSIM error to both ground truth and the teacher-ensemble output, together with the geometric histogram similarity term:

$$\mathcal{L}_\mathrm{total} = \mathcal{L}_\mathrm{kd} + \mathcal{L}_\mathrm{hist}$$

where

$$\mathcal{L}_\mathrm{kd} = \mathcal{L}_\mathrm{color}(I_\mathrm{stu}, I_\mathrm{gt}) + \lambda_\mathrm{kd}\,\mathcal{L}_\mathrm{color}(I_\mathrm{stu}, I_\mathrm{tea})$$

The 3D Gaussian rendering and projection formulas are retained from prior 3DGS work (e.g., projection via $\Sigma'_i = J W \Sigma_i W^T J^T$; alpha-blended compositing per pixel). Student optimization balances fitting to dataset images and matching the more informative, aggregated output of the teacher ensemble while regularizing geometric configuration.
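The retained EWA-splatting covariance projection is a direct matrix product; a minimal sketch, assuming $W$ is the 3×3 view rotation and $J$ the 2×3 Jacobian of the perspective mapping:

```python
import numpy as np

def project_covariance(cov3d, W, J):
    """Project a 3D Gaussian covariance into screen space:
    Sigma' = J W Sigma W^T J^T (standard 3DGS / EWA splatting)."""
    return J @ W @ cov3d @ W.T @ J.T
```

For an axis-aligned camera looking down the z-axis, this simply drops the depth dimension of the covariance, which matches the intuition that splatting flattens each Gaussian onto the image plane.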

5. Evaluation, Efficiency, and Storage Gains

Extensive experiments on the Mip-NeRF 360, Tanks & Temples, and Deep Blending benchmarks show that Distilled-3DGS achieves up to 0.55 dB higher PSNR than the 3DGS baseline while using roughly 87–90% fewer Gaussians. Qualitative assessments reveal high geometric and color fidelity on complex scenes even at a fraction of the original memory and storage footprint.

Comparisons to dense methods (Mip-NeRF360, etc.) indicate that similar levels of perceptual quality and structural detail are obtainable while achieving significant compression. Approaches that focus exclusively on efficiency tend to discard geometric detail; Distilled-3DGS mitigates this through dual loss supervision and ensemble teacher diversity.

6. Discussion and Future Directions

Distilled-3DGS is the first framework to apply knowledge distillation specifically to unstructured, point-based explicit 3D Gaussian splats, integrating a multi-teacher setup with a robust spatial histogram constraint. The approach is efficient and directly applicable to real-time novel view synthesis in resource-constrained settings. Open questions remain around reducing the training and GPU cost of the multi-teacher distillation phase; future work could explore end-to-end knowledge transfer strategies and more sophisticated pruning or adaptive methods for balancing quality against storage.

A plausible implication is that such distillation pipelines could serve as building blocks for neural scene compression, dynamic or semantic-aware 3DGS, and for deployment in settings where both hardware resources and bandwidth are critical constraints. Extensions to integrate task-specific supervision or semantic guidance may further broaden the practical scope.

| Aspect | Teacher Model: 3DGS | Student Model: Distilled-3DGS |
|---|---|---|
| # Gaussians | $\sim$ millions | $\ll$ millions ($\sim 10\times$ fewer) |
| Rendering quality | State of the art | Comparable or better |
| Training cost | High | Reduced (after distillation) |

In summary, Distilled-3DGS provides an effective paradigm for compressing explicit 3D scene representations by distilling knowledge from diverse, high-capacity teacher models. This results in storage-efficient, high-fidelity student models suitable for deployment across a range of novel view synthesis applications (Xiang et al., 19 Aug 2025).
