
Smol-GS: Compact 3D Gaussian Splatting

Updated 7 December 2025
  • Smol-GS is a compact 3D scene encoding technique that uses Gaussian splats with a lossless occupancy-octree for spatial compression and learned quantized features for appearance.
  • It achieves up to 155× compression over traditional 3DGS methods while maintaining competitive rendering quality, with real-time speeds of 200–400 fps.
  • The method decouples spatial and feature data to support downstream tasks such as semantic labeling, robotic navigation, and efficient scene editing.

Smol-GS is a method for highly compact 3D scene encoding based on the 3D Gaussian Splatting (3DGS) paradigm, achieving state-of-the-art compression ratios while retaining visual fidelity and enabling downstream machine-learning and robotic applications. It combines a lossless spatial hierarchy for Gaussian coordinates with learned, quantized abstract per-splat attributes, providing a memory-efficient and semantically enhanced representation suitable for demanding real-time and mobile scenarios (Wang et al., 30 Nov 2025).

1. Motivation and Problem Setting

3D Gaussian Splatting models a scene as a collection of Gaussian “splats” in $\mathbb{R}^3$, each with associated geometric and appearance parameters. Typical high-quality reconstructions require millions of splats, yielding model sizes of hundreds of megabytes to gigabytes. This precludes efficient streaming, mobile inference, or storage-constrained deployment. Prior approaches that focus solely on attribute quantization or anchor-offset schemes either fail to deliver sufficient storage savings or introduce spatial redundancies. Smol-GS responds to these deficits by:

  • Explicitly compressing splat coordinates using a recursive occupancy-octree
  • Abstracting appearance/material cues per-splat and entropy-coding them
  • Decoupling spatial and feature compression to support editing, sparse access, and downstream analysis

The design specifically targets practical applications such as robotics, web-based visualization, and downstream scene understanding, where model size and semantic manipulability are both critical (Wang et al., 30 Nov 2025).

2. Mathematical Foundations

Each 3D splat $G_i$ is parameterized by:

  • Mean $\boldsymbol{\mu}_i \in \mathbb{R}^3$
  • Covariance $\Sigma_i \in \mathbb{R}^{3\times 3}$
  • Opacity $o_i \in [0,1]$
  • Learned abstract feature vector $f_i \in \mathbb{R}^{n_f}$ (practically $n_f = 8$)

The density formulation is

$$G_i(\mathbf{x}) = \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^\top \Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right)$$

Projected 2D Gaussians along camera rays are composited via ordered $\alpha$-blending for view synthesis. Rendering reduces to evaluating the composited sum along each ray, where features $f_i$ are decoded by compact multi-layer perceptrons (MLPs) into color $c_i$, rotation $r_i$, scale $s_i$, and opacity $o_i$ (Wang et al., 30 Nov 2025).
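As a minimal sketch of these two operations, the snippet below evaluates the unnormalized Gaussian density and performs ordered front-to-back alpha-blending along a ray. It assumes NumPy; function names and the sample parameters are illustrative, not from the paper:

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Unnormalized 3D Gaussian density G_i(x) from the formula above."""
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.inv(sigma) @ d)

def composite(alphas, colors):
    """Ordered front-to-back alpha-blending of projected splats on one ray."""
    c, transmittance = np.zeros(3), 1.0
    for a, col in zip(alphas, colors):
        c += transmittance * a * col      # accumulate weighted color
        transmittance *= (1.0 - a)        # attenuate remaining light
    return c

# Toy check: an isotropic splat has density 1 at its mean.
mu = np.zeros(3)
sigma = 0.1 * np.eye(3)
print(gaussian_density(mu, mu, sigma))  # → 1.0
```

In the full method, the color and opacity fed into the blend come from decoding each splat's feature vector $f_i$ with the compact MLPs.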

3. Representation Architecture

3.1 Occupancy-Octree Coding for Coordinates

The spatial support is a recursively subdivided axis-aligned bounding box (AABB), where each internal node stores an 8-bit occupancy byte marking which of its eight octants are nonempty. Only nonempty octants are recursively subdivided, and leaf nodes correspond to individual splat locations. Storing the sequence of occupancy bytes via entropy coding (e.g., Huffman) achieves coordinate compression:

  • For $N$ splats and $N_{\rm int}$ internal nodes, total bits $\approx 8N_{\rm int} \ll 3RN$ for depth $R$
  • Empirical coordinate cost per splat: $b_{\rm coord} \approx 1.4$ bytes (MIP-NeRF 360)

This lossless structure ensures spatial queries and manipulation remain feasible and efficient.
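The recursive construction can be sketched as follows. This is a simplified NumPy version, with illustrative names; the real encoder would additionally entropy-code the resulting byte stream:

```python
import numpy as np

def build_octree(points, lo, hi, depth, max_depth, out_bytes):
    """Emit one occupancy byte per internal node, depth-first.

    Bit k of a node's byte is set iff octant k contains at least one point.
    Only nonempty octants recurse; cells at max_depth act as leaves.
    """
    if depth == max_depth or len(points) == 0:
        return
    mid = (lo + hi) / 2.0
    # Octant index: 3 bits from per-axis comparison against the midpoint.
    octant = ((points[:, 0] >= mid[0]).astype(int)
              | ((points[:, 1] >= mid[1]).astype(int) << 1)
              | ((points[:, 2] >= mid[2]).astype(int) << 2))
    byte = 0
    for k in range(8):
        if np.any(octant == k):
            byte |= 1 << k
    out_bytes.append(byte)
    for k in range(8):
        mask = octant == k
        if not mask.any():
            continue
        bits = [k & 1, k & 2, k & 4]          # which half-space per axis
        child_lo = np.where(bits, mid, lo)
        child_hi = np.where(bits, hi, mid)
        build_octree(points[mask], child_lo, child_hi,
                     depth + 1, max_depth, out_bytes)

pts = np.random.default_rng(0).random((1000, 3))
stream = []
build_octree(pts, np.zeros(3), np.ones(3), 0, 6, stream)
# `stream` is the occupancy-byte sequence that would then be Huffman-coded.
```

Decoding walks the same byte stream in the same depth-first order, so coordinates are recovered losslessly up to the leaf-cell resolution.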

3.2 Quantization and Arithmetic-Coding of Feature Vectors

Each splat’s feature vector $f_i$ (encoding color, opacity, geometry, etc.) is quantized via learned step sizes predicted from the hashed spatial index of $\boldsymbol{\mu}_i$:

$$(\mu_i^f,\, E_i^f,\, A_i) = \mathrm{MLP}_h\left(\mathrm{Hash}(\boldsymbol{\mu}_i)\right)$$

Quantized binning is

$$f_{i,q} = A_i \circ \mathrm{round}(f_i / A_i)$$

Probability distributions for arithmetic coding are based on predicted Gaussians:

$$p(f_{i,q}) \propto \mathcal{N}\left(f_{i,q};\, \mu_i^f,\, \mathrm{diag}(E_i^f)\right)$$

Only the quantized features and compact MLP weights are stored, yielding $b_{\rm feat} \approx 3.2$ bytes per splat (Wang et al., 30 Nov 2025).
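The quantization step and the resulting coding cost can be sketched as below. The per-dimension bin mass is the Gaussian CDF difference over the bin, and the arithmetic-coding cost is approximately its negative log2; all numeric parameters are assumed values for illustration, not the learned ones:

```python
import math
import numpy as np

def quantize(f, A):
    """Quantize features with per-splat learned step sizes A_i."""
    return A * np.round(f / A)

def bits_estimate(f_q, mu_f, sigma_f, A):
    """Approximate arithmetic-coding cost: -log2 of the Gaussian bin mass
    over [f_q - A/2, f_q + A/2], summed over feature dimensions."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    total = 0.0
    for q, m, s, a in zip(f_q, mu_f, sigma_f, A):
        p = cdf((q - m + a / 2) / s) - cdf((q - m - a / 2) / s)
        total += -math.log2(max(p, 1e-12))  # guard against zero-mass bins
    return total

# Hypothetical 8-dim feature with assumed step sizes and Gaussian params.
f = np.array([0.31, -0.12, 0.05, 0.9, -0.4, 0.0, 0.22, -0.7])
A = np.full(8, 0.1)        # predicted quantization steps (assumed)
mu_f = np.zeros(8)         # predicted bin-distribution means (assumed)
sigma_f = np.full(8, 0.5)  # predicted std-devs (assumed)
f_q = quantize(f, A)
print(f"{bits_estimate(f_q, mu_f, sigma_f, A):.1f} bits for this splat")
```

Because both the steps $A_i$ and the coding distribution come from the same hash-conditioned MLP, no per-splat side information beyond the quantized symbols needs to be stored.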

3.3 Overall Memory Model

For $N$ splats, total model size (excluding MLP weights) is

$$M = N\,(b_{\rm coord} + b_{\rm feat})\ \mathrm{bytes}$$

which, with the empirical per-splat costs above, yields $M \approx 4.75$ MB for standard real-world scenes (MIP-NeRF 360).
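A back-of-the-envelope check of this memory model, using the empirical per-splat costs quoted above and an assumed splat count of one million (the count is a round-number assumption, not a figure from the paper):

```python
# Memory model sanity check (MLP weights excluded).
b_coord = 1.4          # bytes/splat, octree-coded coordinates
b_feat = 3.2           # bytes/splat, quantized + entropy-coded features
n_splats = 1_000_000   # assumed splat count for a MIP-NeRF-360-scale scene

size_mb = n_splats * (b_coord + b_feat) / 1e6
print(f"{size_mb:.2f} MB")  # → 4.60 MB, in line with the reported ~4.75 MB
```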

4. Training, Compaction, and Encoding Strategy

The Smol-GS pipeline consists of the following algorithmic stages (35k iterations total):

  1. Warm-Up (0–0.5k): Initialize splats from SfM point clouds
  2. Densification (0.5–15k): Adaptive splitting/pruning based on $\|\nabla_{x_i}\mathcal{L}_1\|$ to match scene detail
  3. Compaction (15–20k): Prune excess splats via opacity penalty $\lambda_o$
  4. Feature Compression (20–30k): Activate quantization and NLL penalty $\lambda_q$ for $f_i$, $s_i$
  5. Coordinate Compression (30–35k): Fix splat positions, encode octree

The global loss combines photometric $\ell_1$ and SSIM loss, opacity sparsity, and negative log-likelihoods of feature quantization:

$$\mathcal{L} = (1-\alpha_s)\,\mathcal{L}_1 + \alpha_s\,\mathcal{L}_{\rm SSIM} + \lambda_o \sum_i o_i + \lambda_q\, \frac{1}{N} \sum_{i=1}^N \left[\mathrm{NLL}(f_{i,q}) + \mathrm{NLL}(s_{i,q})\right]$$

Pseudocode for the key algorithms—building the occupancy-octree and encoding features via arithmetic coding—is explicitly included in the reference [(Wang et al., 30 Nov 2025), Sec. 4.3].
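The combination of terms in the global loss can be sketched directly from the formula. The weights below are illustrative placeholders, not the paper's values:

```python
import numpy as np

def smol_gs_loss(l1, ssim_loss, opacities, nll_f, nll_s,
                 alpha_s=0.2, lam_o=1e-3, lam_q=1e-3):
    """Global objective: photometric blend + opacity sparsity + rate terms.

    alpha_s, lam_o, lam_q are assumed example weights.
    """
    photometric = (1 - alpha_s) * l1 + alpha_s * ssim_loss
    sparsity = lam_o * np.sum(opacities)                        # lam_o * sum_i o_i
    rate = lam_q * np.mean(np.asarray(nll_f) + np.asarray(nll_s))  # mean NLLs
    return photometric + sparsity + rate

# Dummy per-splat terms for two splats.
loss = smol_gs_loss(l1=1.0, ssim_loss=0.5,
                    opacities=np.array([0.9, 0.1]),
                    nll_f=[3.0, 2.0], nll_s=[1.0, 1.0])
print(loss)
```

During the schedule above, the rate terms are only active from the feature-compression stage onward.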

5. Benchmarking, Comparison, and Quantitative Results

Smol-GS is benchmarked on MIP-NeRF 360, Tanks & Temples, and Deep Blending. The following table summarizes performance for MIP-NeRF 360:

| Method          | PSNR↑ | SSIM↑ | LPIPS↓ | Size (MB) | Compression Ratio |
|-----------------|-------|-------|--------|-----------|-------------------|
| 3DGS-30K        | 27.21 | 0.815 | 0.214  | 734.0     | —                 |
| HAC++           | 27.60 | 0.803 | 0.253  | 8.74      | 84×               |
| Smol-GS (small) | 27.29 | 0.798 | 0.260  | 4.75      | 155×              |

Compression ratio is defined as $S_{\rm orig} / S_{\rm compr}$. Smol-GS achieves up to 155× compression over vanilla 3DGS-30K at matched rendering quality. Other metrics:

  • Training time: ≈32 min/scene (NVIDIA H200)
  • Encoding: 1–4 s/scene
  • Real-time rendering: 200–400 fps [(Wang et al., 30 Nov 2025), Table 1; Sec. 5.4]
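The ratios in the table follow directly from the reported sizes:

```python
# Compression ratio S_orig / S_compr, using the table's sizes in MB.
size_3dgs = 734.0  # 3DGS-30K baseline
for name, size in [("HAC++", 8.74), ("Smol-GS (small)", 4.75)]:
    print(f"{name}: {size_3dgs / size:.0f}x")
# → HAC++: 84x
# → Smol-GS (small): 155x
```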

6. Visual and Semantic Analysis; Downstream Applications

Figures 2 and 8 of (Wang et al., 30 Nov 2025) show that Smol-GS faithfully reconstructs sharp edges, specular reflections, and transparencies at drastic (order-of-magnitude) reductions in model size. In challenging regions (e.g., stainless-steel and glass surfaces), learned per-splat features offer better expressivity than standard spherical harmonics at a lower representation cost.

The discrete occupancy-octree forms an explicit spatial data structure enabling occupancy queries necessary for navigation and collision avoidance. Because attributes are decoupled and accessible, Smol-GS supports splat-wise semantic labeling, scene graph reasoning, and potentially forms a basis for SLAM, planning, and 3D scene understanding pipelines. This suggests utility not only as a rendering primitive but as a unified geometric/semantic abstraction layer for embodied or interactive AI.

7. Comparative Perspective and Research Context

Smol-GS is distinct from prior methods such as LocoGS, Mini-Splatting, OMG, Scaffold-GS, and HAC++ in several ways:

  • OMG (Lee et al., 21 Mar 2025) and its variants focus on attribute-level quantization, neural field compression, and importance-guided pruning—reducing, but not eliminating, coordinate redundancy or anchor-offset overhead.
  • HAC++ and Scaffold-GS reduce local redundancy but avoid aggressive coordinate compression due to fidelity concerns.
  • Smol-GS consolidates the spatial hierarchy using a lossless occupancy-octree and performs per-splat, spatially conditioned feature quantization, achieving higher compression and enabling explicit geometric/semantic manipulations.

A plausible implication is that occupancy-octree coordinate compression and learned semantic features facilitate hybrid use cases spanning rendering and scene understanding without bespoke retraining or expansion of the storage footprint.
