Distilled-3DGS: Efficient 3D Gaussian Splatting
- The paper introduces Distilled-3DGS, a multi-teacher knowledge distillation framework for 3D Gaussian Splatting that reduces the number of Gaussians by up to 90% without compromising photorealistic rendering quality.
- It leverages a teacher ensemble, comprising vanilla, noise-augmented, and dropout-regularized 3DGS models, to generate robust pseudo-labels, supervising the student through combined photometric and structural loss functions.
- Experimental results on benchmarks like Mip-NeRF360 demonstrate enhanced geometric fidelity and a significant reduction in memory and storage demands while maintaining or improving PSNR.
Distilled-3DGS is a knowledge distillation framework for 3D Gaussian Splatting (3DGS) aimed at constructing compact, high-fidelity explicit 3D scene representations with reduced memory and storage requirements while maintaining state-of-the-art rendering quality (Xiang et al., 19 Aug 2025). By transferring geometric and radiometric knowledge from ensembles of heavy teacher 3DGS models—including noise-augmented and dropout-regularized variants—a substantially lighter student model is produced. This strategy enables a drastic reduction—up to nearly 90%—in the number of Gaussians required for photorealistic novel view synthesis, facilitating more efficient deployment without significant loss of quality.
1. Background: 3D Gaussian Splatting and Its Limitations
3DGS represents a 3D scene as a set of anisotropic Gaussian primitives, each defined by a center $\mu \in \mathbb{R}^3$, a covariance $\Sigma$, spherical harmonics coefficients for view-dependent appearance, and an opacity $\alpha$. Rendering is achieved through differentiable projection and alpha-blending, producing highly detailed images in real time. However, high-fidelity 3DGS models demand several million Gaussians to capture geometric and photometric detail, resulting in considerable memory and storage costs. The explicit point-based nature makes conventional distillation techniques, such as those developed for neural implicit representations, non-trivial to adapt, because teacher and student share no common volumetric or grid structure.
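For concreteness, the following minimal sketch shows the per-pixel compositing rule used by standard 3DGS rasterizers (this is an illustration of the general technique, not code from the paper; `composite_pixel` and its arguments are illustrative names):

```python
import torch

def composite_pixel(colors: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """Front-to-back alpha blending of depth-sorted Gaussian contributions
    at a single pixel.

    colors: (N, 3) per-Gaussian RGB after SH evaluation, sorted near-to-far.
    alphas: (N,) effective opacity of each Gaussian at this pixel, i.e. the
            learned opacity modulated by the projected 2D Gaussian falloff.
    """
    pixel = torch.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed
    for c, a in zip(colors, alphas):
        pixel = pixel + transmittance * a * c
        transmittance = transmittance * (1.0 - a.item())
        if transmittance < 1e-4:  # early termination, as in standard 3DGS
            break
    return pixel

# Example: three overlapping Gaussians covering one pixel.
rgb = composite_pixel(
    torch.tensor([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]),
    torch.tensor([0.6, 0.5, 0.9]),
)
```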
2. Knowledge Distillation Framework: Teacher Ensemble and Student
The Distilled-3DGS framework introduces a three-teacher ensemble to guide the learning of a lightweight student:
- Vanilla 3DGS ($G_{\text{std}}$): Optimized with standard photometric losses ($\mathcal{L}_1$ and D-SSIM); serves as the baseline high-capacity model.
- Noise-Augmented 3DGS ($G_{\text{perb}}$): Gaussian parameters (positions, rotations, scales, opacities) are perturbed stochastically during training, with rotations handled in a continuous 6D representation $r \in \mathbb{R}^6$, resulting in more robust and spatially stable reconstructions.
- Dropout-Regularized 3DGS ($G_{\text{drop}}$): Each Gaussian is dropped with a probability $p(t)$ that increases gradually as training progresses, encouraging the model to encode scene content redundantly and to resist overfitting to specific primitives. A sketch of both regularization schemes follows this list.
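A minimal sketch of the two regularized teachers' training-time stochasticity, assuming additive Gaussian noise on parameters and a linearly ramped drop probability (the exact noise scales and schedule are not reproduced from the paper; all names and defaults are illustrative):

```python
import torch

def perturb_params(positions, rot6d, scales, opacities, sigma=0.01):
    """Noise-augmented teacher (G_perb): jitter Gaussian parameters each
    training iteration. sigma is an assumed noise scale, not the paper's
    exact value; rotations are perturbed in the continuous 6D space."""
    positions = positions + sigma * torch.randn_like(positions)
    rot6d     = rot6d     + sigma * torch.randn_like(rot6d)
    scales    = scales    + sigma * torch.randn_like(scales)
    opacities = (opacities + sigma * torch.randn_like(opacities)).clamp(0, 1)
    return positions, rot6d, scales, opacities

def dropout_mask(num_gaussians, step, max_steps, p_max=0.1):
    """Dropout-regularized teacher (G_drop): drop each Gaussian with a
    probability that ramps up over training. The linear schedule and p_max
    are assumptions for illustration."""
    p = p_max * step / max_steps
    return torch.rand(num_gaussians) >= p  # True = keep this Gaussian
```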
For each training view, all three teachers independently synthesize a rendered image. Their outputs $\hat{I}_{\text{std}}$, $\hat{I}_{\text{perb}}$, $\hat{I}_{\text{drop}}$ are averaged to form a pseudo-label image $\bar{I} = \tfrac{1}{3}(\hat{I}_{\text{std}} + \hat{I}_{\text{perb}} + \hat{I}_{\text{drop}})$ that integrates diverse and robust supervision signals. The student, parameterized by a drastically reduced set of Gaussians, is optimized to minimize both the standard photometric loss with respect to the ground truth and a distillation loss with respect to $\bar{I}$ (a sketch follows the table below).
| Component | Function | Purpose in Distillation |
|---|---|---|
| Vanilla 3DGS ($G_{\text{std}}$) | Standard photometric training | Baseline scene quality |
| Noise-Augmented ($G_{\text{perb}}$) | Parameter perturbation during training | Robustness / local diversity |
| Dropout-Regularized ($G_{\text{drop}}$) | Stochastic Gaussian pruning during training | Redundancy / reduced overfitting |
| Pseudo-label aggregation | Averaging outputs from all teachers | High-quality soft guidance |
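As referenced above, pseudo-label construction can be sketched as follows; `render_fn` is a placeholder for the differentiable 3DGS rasterizer, and the teachers are treated as frozen:

```python
import torch

@torch.no_grad()  # teachers are frozen; no gradients flow through labels
def make_pseudo_labels(views, teachers, render_fn):
    """Average the renders of G_std, G_perb, and G_drop for each training
    view to obtain the ensemble pseudo-label images.

    views:     iterable of camera views.
    teachers:  the three trained teacher models.
    render_fn: stand-in for the 3DGS rasterizer; returns a (3, H, W) image.
    """
    labels = []
    for view in views:
        renders = [render_fn(t, view) for t in teachers]
        labels.append(torch.stack(renders).mean(dim=0))  # ensemble average
    return labels
```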
3. Structural Similarity Loss for Geometric Consistency
To ensure spatial structure transfer beyond mere appearance, the framework introduces a voxelized histogram representation for geometric similarity. The 3D point clouds produced by the teacher and student are binned into a regular voxel grid (e.g., $128^3$), forming normalized per-voxel count distributions $h^T$ and $h^S$, respectively. The cosine similarity between these high-dimensional histogram features is then computed:

$$\mathrm{sim}(h^T, h^S) = \frac{h^T \cdot h^S}{\lVert h^T \rVert \, \lVert h^S \rVert},$$

and maximized during student training (equivalently, the term $\mathcal{L}_{\text{struct}} = 1 - \mathrm{sim}(h^T, h^S)$ is minimized).
This term robustly aligns the spatial distributions, promoting both global and local geometric agreement regardless of the absolute number of Gaussians in each model. The histogram-matching strategy avoids sensitivity to sampling density and local point cloud artifacts, facilitating transfer from over-parameterized teachers to sparse students.
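A minimal sketch of the histogram construction and similarity, assuming known scene bounds (the grid resolution comes from the text; the bounding and normalization details are assumptions):

```python
import torch

def voxel_histogram(points, bounds_min, bounds_max, grid=128):
    """Bin an (N, 3) point cloud into a normalized (grid**3,) histogram."""
    norm = (points - bounds_min) / (bounds_max - bounds_min)  # -> [0, 1]
    idx = (norm.clamp(0, 1 - 1e-6) * grid).long()             # voxel indices
    flat = idx[:, 0] * grid * grid + idx[:, 1] * grid + idx[:, 2]
    hist = torch.bincount(flat, minlength=grid ** 3).float()
    return hist / hist.sum()  # per-voxel count distribution

def structural_loss(teacher_pts, student_pts, bounds_min, bounds_max):
    """1 - cosine similarity of the teacher/student histogram features."""
    h_t = voxel_histogram(teacher_pts, bounds_min, bounds_max)
    h_s = voxel_histogram(student_pts, bounds_min, bounds_max)
    cos = torch.dot(h_t, h_s) / (h_t.norm() * h_s.norm() + 1e-8)
    return 1.0 - cos
```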
4. Optimization Objective and Technical Details
The student model loss is a weighted combination of photometric ($\mathcal{L}_1$) and D-SSIM error with respect to both the ground truth and the teacher-ensemble output, together with the geometric histogram similarity term:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{photo}}(I^S, I^{\text{gt}}) + \lambda_{\text{kd}}\, \mathcal{L}_{\text{photo}}(I^S, \bar{I}) + \lambda_{\text{struct}}\, \mathcal{L}_{\text{struct}},$$

where $\mathcal{L}_{\text{photo}} = (1 - \lambda)\, \mathcal{L}_1 + \lambda\, \mathcal{L}_{\text{D-SSIM}}$, $I^S$ is the student render, $\bar{I}$ the pseudo-label, and $\lambda$, $\lambda_{\text{kd}}$, $\lambda_{\text{struct}}$ are scalar weights.

The 3D Gaussian rendering and projection formulas are retained from prior 3DGS work: covariances are projected to screen space via $\Sigma' = J W \Sigma W^\top J^\top$, with $W$ the viewing transformation and $J$ the Jacobian of the projective mapping, and pixels are composited by front-to-back alpha blending, $C = \sum_i c_i \alpha_i \prod_{j<i} (1 - \alpha_j)$. Student optimization thus balances fitting the dataset images, matching the more informative aggregated output of the teacher ensemble, and regularizing the geometric configuration.
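Combining the terms above, the student objective can be sketched as follows (all $\lambda$ weights are illustrative placeholders, not the paper's values; the D-SSIM term is stubbed out, and `struct_term` would come from a histogram loss like the one sketched in Section 3):

```python
import torch.nn.functional as F

def photometric(pred, target, lam=0.2):
    """Standard 3DGS photometric mix of L1 and D-SSIM. The D-SSIM term
    (1 - SSIM) is omitted here to keep the sketch short."""
    return (1 - lam) * F.l1_loss(pred, target)  # + lam * d_ssim(pred, target)

def student_objective(student_img, gt_img, pseudo_label, struct_term,
                      lam_kd=0.5, lam_struct=0.1):
    """Total student loss: fit the ground-truth image, match the teacher
    ensemble's pseudo-label, and align the voxel histograms. The weights
    lam_kd and lam_struct are illustrative assumptions."""
    return (photometric(student_img, gt_img)
            + lam_kd * photometric(student_img, pseudo_label)
            + lam_struct * struct_term)
```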
5. Evaluation, Efficiency, and Storage Gains
Extensive experiments on benchmarks such as Mip-NeRF 360, Tanks & Temples, and Deep Blending demonstrate that Distilled-3DGS achieves up to 0.55 dB higher PSNR than the baseline 3DGS while using roughly 87–90% fewer Gaussians. Qualitative assessments reveal high geometric and color fidelity on complex scenes even with a fraction of the original memory and storage footprint.
Comparisons to dense, high-quality methods such as Mip-NeRF 360 indicate that similar levels of perceptual quality and structural detail are attainable at a substantially compressed representation. Approaches that focus exclusively on efficiency tend to discard geometric detail; Distilled-3DGS mitigates this through its dual supervision (photometric distillation plus the structural histogram loss) and the diversity of its teacher ensemble.
6. Discussion and Future Directions
Distilled-3DGS is the first framework to apply knowledge distillation specifically to unstructured, point-based explicit 3D Gaussian splats, integrating a multi-teacher setup with robust spatial histogram constraints. The approach is efficient and directly applicable in resource-constrained settings for real-time novel view synthesis. Open questions remain around reducing the training time and GPU cost of the multi-teacher distillation phase; future work may explore end-to-end knowledge transfer strategies and more sophisticated or adaptive pruning methods for balancing quality against storage.
A plausible implication is that such distillation pipelines could serve as building blocks for neural scene compression, dynamic or semantic-aware 3DGS, and for deployment in settings where both hardware resources and bandwidth are critical constraints. Extensions to integrate task-specific supervision or semantic guidance may further broaden the practical scope.
| Aspect | Teacher Model: 3DGS | Student Model: Distilled-3DGS |
|---|---|---|
| # Gaussians | Several million | ∼10× fewer (up to ∼90% reduction) |
| Rendering quality | State of the art | Comparable or better |
| Training cost | High | Reduced (once teachers are trained) |
In summary, Distilled-3DGS provides an effective paradigm for compressing explicit 3D scene representations by distilling knowledge from diverse, high-capacity teacher models. This results in storage-efficient, high-fidelity student models suitable for deployment across a range of novel view synthesis applications (Xiang et al., 19 Aug 2025).