Smol-GS: Compact 3D Gaussian Splatting
- Smol-GS is a compact 3D scene encoding technique that uses Gaussian splats with a lossless occupancy-octree for spatial compression and learned quantized features for appearance.
- It achieves up to 155× compression over traditional 3DGS methods while maintaining competitive rendering quality, with real-time speeds of 200–400 fps.
- The method decouples spatial and feature data to support downstream tasks such as semantic labeling, robotic navigation, and efficient scene editing.
Smol-GS is a method for highly compact 3D scene encoding based on the 3D Gaussian Splatting (3DGS) paradigm, achieving state-of-the-art compression ratios while retaining visual fidelity and enabling downstream machine-learning and robotic applications. It combines a lossless spatial hierarchy for Gaussian coordinates with learned, quantized abstract per-splat attributes, providing a memory-efficient and semantically enhanced representation suitable for demanding real-time and mobile scenarios (Wang et al., 30 Nov 2025).
1. Motivation and Problem Setting
3D Gaussian Splatting models a scene as a collection of Gaussian “splats” in $\mathbb{R}^3$, each with associated geometric and appearance parameters. Typical high-quality reconstructions require millions of splats, yielding model sizes of hundreds of megabytes to gigabytes. This precludes efficient streaming, mobile inference, or storage-constrained deployment. Prior approaches that focus solely on attribute quantization or anchor-offset schemes either fail to deliver sufficient storage savings or introduce spatial redundancies. Smol-GS addresses these shortcomings by:
- Explicitly compressing splat coordinates using a recursive occupancy-octree
- Abstracting appearance/material cues per-splat and entropy-coding them
- Decoupling spatial and feature compression to support editing, sparse access, and downstream analysis

The design specifically targets practical applications such as robotics, web-based visualization, and downstream scene understanding, where model size and semantic manipulability are both critical (Wang et al., 30 Nov 2025).
2. Mathematical Foundations
Each 3D splat $i$ is parameterized by:
- Mean $\mu_i \in \mathbb{R}^3$
- Covariance $\Sigma_i$
- Opacity $\alpha_i$
- Learned abstract feature vector $\mathbf{f}_i$ (practically )

The density formulation is:

$$G_i(x) = \exp\left(-\tfrac{1}{2}\,(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)\right)$$

Projected 2D Gaussians along camera rays are composited via ordered $\alpha$-blending for view synthesis. Rendering reduces to evaluating the composited sum along each ray, where the features $\mathbf{f}_i$ are decoded by compact multi-layer perceptrons (MLPs) into color $c_i$, rotation $q_i$, scale $s_i$, and opacity $\alpha_i$ (Wang et al., 30 Nov 2025).
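A minimal sketch of the two operations named above, density evaluation and ordered alpha-blending, may help make them concrete. It assumes the standard 3DGS formulation (an unnormalized Gaussian and front-to-back compositing); the function names `gaussian_density` and `composite` are illustrative and not taken from the Smol-GS reference, and the MLP decoding step is omitted.

```python
import math

def gaussian_density(x, mean, inv_cov):
    """Unnormalized 3D Gaussian exp(-0.5 * d^T Sigma^{-1} d), with d = x - mean."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    # Quadratic form d^T * inv_cov * d for a 3x3 inverse covariance matrix.
    q = sum(d[r] * inv_cov[r][c] * d[c] for r in range(3) for c in range(3))
    return math.exp(-0.5 * q)

def composite(samples):
    """Front-to-back alpha blending of (alpha, rgb) pairs sorted near to far."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    for alpha, rgb in samples:
        weight = transmittance * alpha
        color = [c + weight * v for c, v in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color, transmittance

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
peak = gaussian_density((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), identity)
pixel, remaining = composite([(0.5, (1.0, 0.0, 0.0)), (0.5, (0.0, 1.0, 0.0))])
```

In this toy call the nearer red splat contributes with weight 0.5 and the green splat behind it with weight 0.25, which is why the ordering of splats along the ray matters.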
3. Representation Architecture
3.1 Occupancy-Octree Coding for Coordinates
The spatial support consists of a recursively subdivided axis-aligned bounding box (AABB), representing each split as an 8-bit occupancy byte per internal node. Only nonempty octants are recursively subdivided, and leaf nodes correspond to individual splat locations. Storing the sequence of occupancy bytes via entropy coding (e.g., Huffman) achieves coordinate compression:
- For $N$ splats and $M$ internal nodes in an octree of depth $D$, the raw occupancy stream costs $8M$ bits before entropy coding
- Empirical coordinate bits per splat: bytes (MIP-NeRF 360)

This lossless structure ensures spatial queries and manipulation remain feasible and efficient.
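The occupancy-byte scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the reference implementation: coordinates are assumed to be already snapped to an integer grid of side `2**depth`, nodes are emitted breadth-first, the names `encode_octree` and `decode_octree` are hypothetical, and the final entropy-coding stage (e.g., Huffman) is left out.

```python
def encode_octree(points, depth):
    """Breadth-first occupancy bytes for integer points in [0, 2**depth)^3."""
    occupancy = []
    level = [sorted(set(points))]  # start with one node: the root
    for d in range(depth):
        shift = depth - 1 - d  # coordinate bit that selects the octant here
        next_level = []
        for node_points in level:
            children = [[] for _ in range(8)]
            for (x, y, z) in node_points:
                octant = ((((x >> shift) & 1) << 2)
                          | (((y >> shift) & 1) << 1)
                          | ((z >> shift) & 1))
                children[octant].append((x, y, z))
            byte = 0
            for octant, child in enumerate(children):
                if child:  # only nonempty octants are subdivided further
                    byte |= 1 << octant
                    next_level.append(child)
            occupancy.append(byte)
        level = next_level
    return bytes(occupancy)

def decode_octree(occupancy, depth):
    """Recover the occupied leaf cells from the occupancy byte stream."""
    cells = [(0, 0, 0)]
    pos = 0
    for _ in range(depth):
        next_cells = []
        for (x, y, z) in cells:
            byte = occupancy[pos]
            pos += 1
            for octant in range(8):
                if byte & (1 << octant):
                    next_cells.append(((x << 1) | (octant >> 2),
                                       (y << 1) | ((octant >> 1) & 1),
                                       (z << 1) | (octant & 1)))
        cells = next_cells
    return cells

points = [(0, 0, 0), (7, 7, 7), (3, 4, 5), (3, 4, 4)]
stream = encode_octree(points, depth=3)
restored = decode_octree(stream, depth=3)
```

Each byte in `stream` corresponds to one internal node, so its length equals the internal-node count, and nearby points such as `(3, 4, 5)` and `(3, 4, 4)` share all of their ancestors' bytes.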
3.2 Quantization and Arithmetic-Coding of Feature Vectors
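Each splat’s feature vector $\mathbf{f}_i$ (encoding color, opacity, geometry, etc.) is quantized via learned step sizes $\Delta_i$ predicted from the hashed spatial index of the splat position $\mu_i$. Quantized binning takes the usual round-to-step form

$$\hat{\mathbf{f}}_i = \Delta_i \cdot \mathrm{round}\!\left(\mathbf{f}_i / \Delta_i\right)$$

Probability distributions for arithmetic coding are based on predicted Gaussians with mean $m_i$ and standard deviation $\sigma_i$, integrated over each quantization bin:

$$p(\hat{f}_i) = \int_{\hat{f}_i - \Delta_i/2}^{\hat{f}_i + \Delta_i/2} \mathcal{N}\!\left(x;\, m_i, \sigma_i^2\right)\,dx$$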
Only the quantized features and compact MLP weights are stored, yielding bytes per splat (Wang et al., 30 Nov 2025).
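To illustrate how step-size quantization and a Gaussian prior translate into storage cost, the sketch below assumes the common round-to-step quantizer and a per-element Gaussian whose bin mass sets the ideal arithmetic-coding cost of $-\log_2 p$ bits. The step sizes, means, and standard deviations are passed in directly here; in the method they are predicted by a network, which is not modeled. All function names are hypothetical.

```python
import math

def quantize(value, step):
    """Snap a scalar feature to the nearest multiple of its step size."""
    return step * round(value / step)

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def bin_probability(q, step, mean, std):
    """Gaussian probability mass of the bin [q - step/2, q + step/2]."""
    upper = normal_cdf(q + 0.5 * step, mean, std)
    lower = normal_cdf(q - 0.5 * step, mean, std)
    return upper - lower

def estimated_bits(features, steps, means, stds, eps=1e-12):
    """Ideal coding cost: sum of -log2(p) over the quantized features."""
    total = 0.0
    for f, s, m, sd in zip(features, steps, means, stds):
        p = bin_probability(quantize(f, s), s, m, sd)
        total += -math.log2(max(p, eps))  # eps guards against log(0)
    return total

wide = estimated_bits([0.0], [1.0], [0.0], [1.0])
narrow = estimated_bits([0.0], [1.0], [0.0], [0.5])
```

A sharper prediction (smaller standard deviation around the true value) concentrates more mass in the chosen bin, so `narrow` costs fewer bits than `wide`. This is the quantity that a negative log-likelihood penalty drives down during training.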
3.3 Overall Memory Model
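For $N$ splats, total model size (excluding MLP weights) is

$$S = N \left(b_{\text{coord}} + b_{\text{feat}}\right)$$

where $b_{\text{coord}}$ and $b_{\text{feat}}$ denote the average coordinate and feature bytes per splat. Empirically, this comes to 4.75 MB for the small configuration on standard real-world scenes (MIP-NeRF 360; see the table in Section 5).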
4. Training, Compaction, and Encoding Strategy
The Smol-GS pipeline consists of the following algorithmic stages (35k iterations total):
- Warm-Up (0–0.5k): Initialize splats from SfM point clouds
- Densification (0.5–15k): Adaptive splitting/pruning based on to match scene detail
- Compaction (15–20k): Prune excess splats via opacity penalty
- Feature Compression (20–30k): Activate quantization and NLL penalty for ,
- Coordinate Compression (30–35k): Fix splits, encode octree
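The stage boundaries listed above can be written as a simple lookup table. The sketch below assumes half-open iteration ranges (each stage starts at its lower bound and ends just before the next one begins), which the source does not state explicitly; `STAGES` and `stage_at` are hypothetical names.

```python
# (start, end, name) with half-open ranges [start, end), in iterations.
STAGES = [
    (0, 500, "warm-up"),
    (500, 15_000, "densification"),
    (15_000, 20_000, "compaction"),
    (20_000, 30_000, "feature compression"),
    (30_000, 35_000, "coordinate compression"),
]

def stage_at(iteration):
    """Return the name of the training stage active at a given iteration."""
    for start, end, name in STAGES:
        if start <= iteration < end:
            return name
    raise ValueError(f"iteration {iteration} is outside the 35k schedule")

total_iterations = sum(end - start for start, end, _ in STAGES)
```

A training loop could call `stage_at(step)` once per iteration to decide which loss terms and which densification or pruning logic to enable.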
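The global loss combines photometric and SSIM loss, opacity sparsity, and negative log-likelihoods of feature quantization, which can be written schematically as a weighted sum:

$$\mathcal{L} = \mathcal{L}_{\text{photo}} + \lambda_{\text{SSIM}}\,\mathcal{L}_{\text{SSIM}} + \lambda_{\text{op}}\,\mathcal{L}_{\text{opacity}} + \lambda_{\text{NLL}}\,\mathcal{L}_{\text{NLL}}$$

Pseudocode for the key algorithms (building the occupancy-octree and encoding features via arithmetic coding) is included in the reference [(Wang et al., 30 Nov 2025), Sec. 4.3].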
5. Benchmarking, Comparison, and Quantitative Results
Smol-GS is benchmarked on MIP-NeRF 360, Tanks & Temples, and Deep Blending. The following table summarizes performance for MIP-NeRF 360:
| Method | PSNR↑ | SSIM↑ | LPIPS↓ | Size (MB) | Compression Ratio |
|---|---|---|---|---|---|
| 3DGS-30K | 27.21 | 0.815 | 0.214 | 734.0 | 1× |
| HAC++ | 27.60 | 0.803 | 0.253 | 8.74 | 84× |
| Smol-GS (small) | 27.29 | 0.798 | 0.260 | 4.75 | 155× |
Compression ratio is defined as $\text{Size}_{\text{3DGS-30K}} / \text{Size}_{\text{method}}$. Smol-GS achieves up to 155× compression over vanilla 3DGS-30K at comparable PSNR, with somewhat lower SSIM and higher LPIPS. Other metrics:
- Training time: 32 min/scene (NVIDIA H200)
- Encoding: 1–4 s/scene
- Real-time rendering: 200–400 fps [(Wang et al., 30 Nov 2025), Table 1; Sec. 5.4]
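The compression ratios in the table follow directly from the reported sizes. The short check below uses only the sizes listed in the table and takes the ratio as baseline size divided by method size; the helper name `compression_ratio` is illustrative.

```python
def compression_ratio(baseline_mb, method_mb):
    """Size of the uncompressed baseline divided by the compressed size."""
    return baseline_mb / method_mb

# Model sizes in MB as reported for MIP-NeRF 360.
sizes_mb = {"3DGS-30K": 734.0, "HAC++": 8.74, "Smol-GS (small)": 4.75}

ratios = {
    name: round(compression_ratio(sizes_mb["3DGS-30K"], size))
    for name, size in sizes_mb.items()
}
```

Rounded to the nearest integer, the ratios come out to 1, 84, and 155, matching the last column of the table.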
6. Visual and Semantic Analysis; Downstream Applications
Figures 2 and 8 of (Wang et al., 30 Nov 2025) show that Smol-GS faithfully reconstructs sharp edges, specular reflections, and transparencies at drastic (order-of-magnitude) reductions in model size. In challenging regions (e.g., stainless-steel and glass surfaces), learned per-splat features offer better expressivity than standard spherical harmonics at a lower representation cost.
The discrete occupancy-octree forms an explicit spatial data structure enabling occupancy queries necessary for navigation and collision avoidance. Because attributes are decoupled and accessible, Smol-GS supports splat-wise semantic labeling, scene graph reasoning, and potentially forms a basis for SLAM, planning, and 3D scene understanding pipelines. This suggests utility not only as a rendering primitive but as a unified geometric/semantic abstraction layer for embodied or interactive AI.
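As an illustration of the kind of occupancy query mentioned above, the sketch below stores occupied cells per octree level and tests points by descending from the root. It is not part of the Smol-GS reference: the set-based storage, the sampling-based segment test, and the names `build_occupancy`, `is_occupied`, and `segment_is_free` are all assumptions, and coordinates are assumed to lie on an integer grid of side `2**depth`.

```python
def build_occupancy(points, depth):
    """Occupied cell keys (level, x, y, z) for every octree level 0..depth."""
    occupied = set()
    for (x, y, z) in points:
        for level in range(depth + 1):
            shift = depth - level
            occupied.add((level, x >> shift, y >> shift, z >> shift))
    return occupied

def is_occupied(occupied, point, depth):
    """Descend from the root and stop at the first empty ancestor cell."""
    x, y, z = point
    for level in range(depth + 1):
        shift = depth - level
        if (level, x >> shift, y >> shift, z >> shift) not in occupied:
            return False
    return True

def segment_is_free(occupied, start, end, depth, samples=64):
    """Coarse collision test: sample the segment and query each voxel."""
    for i in range(samples + 1):
        t = i / samples
        p = tuple(int(round(a + t * (b - a))) for a, b in zip(start, end))
        if is_occupied(occupied, p, depth):
            return False
    return True

occupied = build_occupancy([(1, 1, 1), (6, 6, 6)], depth=3)
diagonal_free = segment_is_free(occupied, (0, 0, 0), (7, 7, 7), depth=3)
edge_free = segment_is_free(occupied, (0, 7, 0), (7, 7, 0), depth=3)
```

The early exit in `is_occupied` is what makes the hierarchy useful: a query in empty space is rejected near the root without visiting deeper levels. The sampled segment test is only approximate and could miss thin obstacles if `samples` is too small for the path length.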
7. Comparative Perspective and Research Context
Smol-GS is distinct from prior methods such as LocoGS, Mini-Splatting, OMG, Scaffold-GS, and HAC++ in several ways:
- OMG (Lee et al., 21 Mar 2025) and its variants focus on attribute-level quantization, neural field compression, and importance-guided pruning—reducing, but not eliminating, coordinate redundancy or anchor-offset overhead.
- HAC++ and Scaffold-GS reduce local redundancy but avoid compressing coordinates because of fidelity concerns.
- Smol-GS consolidates the spatial hierarchy using a lossless occupancy-octree and performs per-splat, spatially conditioned feature quantization, achieving higher compression and enabling explicit geometric/semantic manipulations.

A plausible implication is that occupancy-octree coordinate compression and learned semantic features facilitate hybrid use cases spanning rendering and scene understanding without bespoke retraining or expansion of storage footprint.
References:
- "Smol-GS: Compact Representations for Abstract 3D Gaussian Splatting" (Wang et al., 30 Nov 2025)
- "Optimized Minimal 3D Gaussian Splatting" (Lee et al., 21 Mar 2025)