Anchor Context in 3D Gaussian Splatting
- Anchor Context is a hierarchical model that captures spatial dependencies among voxel-level anchors for efficient 3D Gaussian Splatting compression.
- It employs autoregressive coding with multi-level decomposition and hyperpriors to conditionally predict anchor attributes, significantly reducing data size.
- Empirical results demonstrate that combining context modeling and hyperprior integration yields high coding gains while preserving rendering quality.
An anchor context model addresses the challenge of efficiently compressing and reconstructing 3D Gaussian Splatting (3DGS) representations for novel view synthesis by applying structured multi-level context modeling at the anchor (voxel) level. Unlike prior methods that compress anchors (groups of Gaussians) independently, it exploits the hierarchical spatial dependencies among anchors to achieve substantially higher coding efficiency. Storage requirements drop by more than 100× compared to vanilla 3DGS and by 15× compared to specialized methods like Scaffold-GS, while rendering quality is preserved or improved (Wang et al., 2024).
1. Anchor Partitioning and Multi-Level Decomposition
3DGS represents a scene as a large set of 3D Gaussians; Scaffold-GS groups spatially proximal Gaussians into "anchors", each characterized by a position $\mathbf{x}$, feature vector $\mathbf{f}$, scale $\mathbf{l}$, and offsets $\mathbf{o}$. ContextGS recursively partitions the set of all anchors $\mathcal{V}$ into $K$ non-overlapping levels:
$$\mathcal{V} = \mathcal{V}^{K-1} \cup \mathcal{V}^{K-2} \cup \dots \cup \mathcal{V}^{0}, \qquad \mathcal{V}^{j} \cap \mathcal{V}^{k} = \emptyset \ \text{ for } j \neq k,$$
from the coarsest ($K-1$) to the finest ($0$). This partition employs a bottom-up voxelization procedure, where each level comprises unique anchors residing in voxels of increasing size, constructed via hierarchical, data-driven downsampling to ensure uniform coverage. Each anchor appears at exactly one level, and parent-child relationships between anchors across levels are maintained for context propagation.
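The level assignment and parent links described above can be illustrated with a small sketch. It is not the procedure from the paper: the function names, the `growth` factor between voxel sizes, the "first anchor in a voxel becomes its representative" rule, and the choice to assign from the coarsest level downward are all assumptions made for illustration. It does respect the two stated properties: every anchor lands on exactly one level, and every non-coarsest anchor gets a unique parent in the next coarser grid.

```python
from math import floor

def voxel_key(position, size):
    """Integer voxel index of a 3D position on a grid with the given cell size."""
    return tuple(floor(c / size) for c in position)

def partition_anchors(positions, base_voxel, num_levels, growth=2):
    """Assign each anchor to exactly one level and record its parent.

    Hypothetical helper: level k uses voxels of size base_voxel * growth**k,
    so level num_levels - 1 is the coarsest and level 0 the finest.
    """
    level_of, parent_of = {}, {}
    coarser_reps, coarser_size = {}, None
    for k in range(num_levels - 1, -1, -1):
        size = base_voxel * growth ** k
        reps = {}
        # Anchors already placed on coarser levels keep representing their voxel.
        for i in level_of:
            reps.setdefault(voxel_key(positions[i], size), i)
        for i, p in enumerate(positions):
            if i in level_of:
                continue
            key = voxel_key(p, size)
            if k > 0 and key in reps:
                continue  # voxel already represented; defer to a finer level
            reps.setdefault(key, i)
            level_of[i] = k
            # Parent: representative of the enclosing voxel one level coarser.
            parent_of[i] = (None if coarser_size is None
                            else coarser_reps[voxel_key(p, coarser_size)])
        coarser_reps, coarser_size = reps, size
    return level_of, parent_of

toy_positions = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (1.1, 0.1, 0.1), (3.5, 0.2, 0.2)]
toy_levels, toy_parents = partition_anchors(toy_positions, base_voxel=1.0, num_levels=2)
print(toy_levels, toy_parents)
```

In this toy run, anchors 0 and 3 become the coarse level, and anchors 1 and 2 fall to the finest level with anchor 0 as their parent, since all three share one coarse voxel.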
2. Autoregressive Anchor-Level Context Modeling
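Compression proceeds by entropy coding anchor attributes in a strictly coarse-to-fine order. For every anchor $i$ at level $k$:
- The anchor's quantized primary feature $\hat{f}_i$ is modeled as a conditional (discretized) Gaussian mixture, applied element-wise:
$$p(\hat{f}_i \mid \mathbf{c}_i) = \sum_{m} \pi_{i,m} \int_{\hat{f}_i - \Delta/2}^{\hat{f}_i + \Delta/2} \mathcal{N}\!\left(f;\ \mu_{i,m},\ \sigma_{i,m}^{2}\right) df,$$
where $\Delta$ is the quantization step and the parameters $(\pi_{i,m}, \mu_{i,m}, \sigma_{i,m})$ are outputs of an MLP $\mathrm{MLP}_c$ applied to the context vector $\mathbf{c}_i$.
- For every level below the coarsest, the context $\mathbf{c}_i$ includes the decoded feature and scale of the unique parent anchor $\mathrm{pa}(i)$ at level $k+1$ and the anchor's position:
$$\mathbf{c}_i = \left[\hat{\mathbf{f}}_{\mathrm{pa}(i)},\ \hat{\mathbf{l}}_{\mathrm{pa}(i)},\ \mathbf{x}_i\right].$$
For anchors with no parent (the coarsest level), the parent terms are absent from $\mathbf{c}_i$.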
The anchor-level autoregressive model thus efficiently captures spatial dependencies and conditional distributions as each anchor is coded given its parent and position.
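To make the conditional probability model concrete, the sketch below evaluates the probability mass that a discretized Gaussian mixture assigns to one quantized value, and the corresponding ideal code length in bits. The function names, the scalar (per-element) treatment, and the unit quantization step are assumptions for illustration; in the actual method the mixture parameters would come from the context MLP rather than being passed in by hand.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def discretized_mixture_prob(value, weights, mus, sigmas, step=1.0):
    """Probability mass of the bin [value - step/2, value + step/2] under a Gaussian mixture."""
    prob = 0.0
    for w, mu, sigma in zip(weights, mus, sigmas):
        upper = normal_cdf(value + step / 2.0, mu, sigma)
        lower = normal_cdf(value - step / 2.0, mu, sigma)
        prob += w * (upper - lower)
    return prob

def code_length_bits(prob, eps=1e-12):
    """Ideal entropy-coding cost of a symbol with the given probability."""
    return -math.log2(max(prob, eps))

# A well-predicted value (mean close to the symbol) is much cheaper to code.
good = code_length_bits(discretized_mixture_prob(5, [1.0], [5.0], [1.0]))
poor = code_length_bits(discretized_mixture_prob(5, [1.0], [0.0], [1.0]))
print(f"bits with accurate context: {good:.2f}, with uninformative context: {poor:.2f}")
```

The gap between the two code lengths is the mechanism behind the coding gain: the better the parent-derived context predicts the child's feature, the more probability mass falls on the true bin and the fewer bits are spent.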
3. Hyperprior Integration and Coding
To further enhance coding performance, ContextGS introduces a per-anchor hyperprior:
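- Each anchor possesses a compact latent vector $\mathbf{z}_i$ of much lower dimension than the anchor feature.
- Quantized hyperprior states $\hat{\mathbf{z}}_i$ are entropy-coded using a fully factorized nonparametric density:
$$p(\hat{\mathbf{z}}_i) = \prod_{j} \left( p_{z_{i,j} \mid \psi_j} * \mathcal{U}\!\left(-\tfrac{1}{2}, \tfrac{1}{2}\right) \right)\!(\hat{z}_{i,j}),$$
where $\psi_j$ denotes the learned parameters of the density for channel $j$.
- The context for encoding all anchor attributes is now the concatenation $[\mathbf{c}_i, \hat{\mathbf{z}}_i]$ of the parent-and-position context $\mathbf{c}_i$ with the hyperprior for anchor $i$ at level $k$. This allows the network to model complex anchor-specific variations, which is especially crucial for anchors at the coarsest level, since they have no parent context.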
4. Compression Workflow
The encoding and decoding process traverses levels in coarse-to-fine order:
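Encoding: for each level from coarsest to finest, and for each anchor in that level, the quantized hyperprior is coded under the factorized density; the context is assembled from the decoded parent attributes (when a parent exists), the anchor position, and the hyperprior; the MLP predicts the distribution parameters; and the quantized anchor attributes are entropy-coded under that distribution.
Decoding mirrors this operation, using decoded parent anchor features at each step.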
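The coarse-to-fine traversal can be sketched as a bit-estimation loop. This is a simplified illustration under several assumptions: features are scalars, a single discretized Gaussian replaces the mixture, the hyperprior is omitted, and `toy_predictor` is a hypothetical stand-in for the learned context MLP (it simply uses the decoded parent feature as the predicted mean). No real entropy coder is invoked; the loop only accumulates ideal code lengths.

```python
import math

def gaussian_bits(q, mu, sigma, step=1.0):
    """Ideal code length (bits) of quantized value q under a discretized Gaussian."""
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    prob = cdf(q + step / 2.0) - cdf(q - step / 2.0)
    return -math.log2(max(prob, 1e-12))

def toy_predictor(context):
    """Hypothetical stand-in for the context MLP: returns (mean, std)."""
    if context["parent_feature"] is None:
        return 0.0, 4.0                      # no parent: broad, uninformative prior
    return context["parent_feature"], 1.0    # parent available: sharp prediction

def independent_predictor(context):
    """Baseline that ignores the context, as in independent anchor coding."""
    return 0.0, 4.0

def estimate_bits(anchors, num_levels, predictor=toy_predictor):
    """Traverse levels coarse-to-fine and sum the estimated coding cost."""
    decoded = {}   # anchor id -> decoded (quantized) feature, as seen by the decoder
    total = 0.0
    for k in range(num_levels - 1, -1, -1):
        for a in anchors:
            if a["level"] != k:
                continue
            context = {"parent_feature": decoded.get(a["parent"]),
                       "position": a["position"]}
            mu, sigma = predictor(context)
            q = round(a["feature"])
            total += gaussian_bits(q, mu, sigma)
            decoded[a["id"]] = q   # children condition on this decoded value
    return total

toy_anchors = [
    {"id": 0, "level": 1, "parent": None, "position": (0, 0, 0), "feature": 5.2},
    {"id": 1, "level": 0, "parent": 0, "position": (1, 0, 0), "feature": 4.9},
    {"id": 2, "level": 0, "parent": 0, "position": (0, 1, 0), "feature": 6.1},
]
print(estimate_bits(toy_anchors, 2), estimate_bits(toy_anchors, 2, independent_predictor))
```

Because each child is coded only after its parent has been decoded, a decoder running the same loop has access to exactly the same context, which is what makes the scheme invertible.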
5. Empirical Compression and Fidelity Performance
ContextGS, leveraging the anchor-level context model and hyperprior, achieves dramatic compression with minimal quality loss. This is exemplified in the following summary (low-rate regime):
| Dataset | Vanilla 3DGS | Scaffold-GS | ContextGS |
|---|---|---|---|
| Mip-NeRF360 | 744.7 MB | 253.9 MB | 12.68 MB |
| Tanks&Temples | 431.0 MB | 86.5 MB | 7.05 MB |
| DeepBlending | 663.9 MB | 66.0 MB | 3.45 MB |
| BungeeNeRF | 1616 MB | 183.0 MB | 14.00 MB |
Average savings are more than 100× versus vanilla 3DGS and roughly 15× over Scaffold-GS. Quantitatively, ContextGS matches or slightly exceeds Scaffold-GS in PSNR and SSIM, for example PSNR 27.62 dB (vs. 27.50 dB for Scaffold-GS) and SSIM 0.808 (vs. 0.806) on Mip-NeRF360.
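The headline ratios can be checked directly against the sizes in the table above. The short script below computes per-dataset and mean compression ratios; the variable and function names are illustrative, and the mean is taken as a simple average of per-dataset ratios, which is an assumption about how the averages were formed.

```python
# Sizes in MB, copied from the table: (vanilla 3DGS, Scaffold-GS, ContextGS).
sizes_mb = {
    "Mip-NeRF360": (744.7, 253.9, 12.68),
    "Tanks&Temples": (431.0, 86.5, 7.05),
    "DeepBlending": (663.9, 66.0, 3.45),
    "BungeeNeRF": (1616.0, 183.0, 14.00),
}

def compression_ratios(table):
    """Return per-dataset (ratio vs. 3DGS, ratio vs. Scaffold-GS)."""
    return {name: (gs / ctx, scaffold / ctx) for name, (gs, scaffold, ctx) in table.items()}

ratios = compression_ratios(sizes_mb)
mean_vs_3dgs = sum(r[0] for r in ratios.values()) / len(ratios)
mean_vs_scaffold = sum(r[1] for r in ratios.values()) / len(ratios)

for name, (vs_3dgs, vs_scaffold) in ratios.items():
    print(f"{name:14s} {vs_3dgs:6.1f}x vs 3DGS, {vs_scaffold:5.1f}x vs Scaffold-GS")
print(f"mean: {mean_vs_3dgs:.1f}x vs 3DGS, {mean_vs_scaffold:.1f}x vs Scaffold-GS")
```

The per-dataset ratios vary considerably (from roughly 59× on Mip-NeRF360 to roughly 192× on DeepBlending versus vanilla 3DGS), so the averages of about 107× and 16× summarize a wide spread rather than a uniform gain.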
6. Ablation Analyses
Component ablation confirms that the anchor-level context model (CM) and the hyperprior (HP) contribute complementary gains. Relative to the variant without either component (18.67 MB), enabling HP alone or CM alone reduces size by roughly 17% and 19%, respectively, and enabling both reduces it by about 25% (14.00 MB). Reusing coarser-level anchors for finer levels offers a further 16% coding gain. Varying the level-size ratio from 0.1 to 0.5 leaves fidelity stable, supporting the default setting as an effective choice.
| Method | Size (MB) | PSNR | SSIM | LPIPS |
|---|---|---|---|---|
| Scaffold-GS | 183.0 | 26.62 | 0.865 | 0.241 |
| w/o HP, w/o CM | 18.67 | 26.93 | 0.867 | 0.222 |
| +CM only | 15.03 | 26.91 | 0.866 | 0.223 |
| +HP only | 15.41 | 26.92 | 0.867 | 0.221 |
| HP+CM (full) | 14.00 | 26.90 | 0.866 | 0.222 |
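The relative gains of each component follow from the sizes in the ablation table. The snippet below computes the size reduction of each variant against the row without hyperprior and context model; the dictionary keys and function name are illustrative only.

```python
# Sizes in MB from the ablation table.
ablation_mb = {
    "w/o HP, w/o CM": 18.67,
    "+CM only": 15.03,
    "+HP only": 15.41,
    "HP+CM (full)": 14.00,
}

def reduction_percent(baseline_mb, size_mb):
    """Size reduction relative to the baseline, in percent."""
    return 100.0 * (1.0 - size_mb / baseline_mb)

baseline = ablation_mb["w/o HP, w/o CM"]
reductions = {name: reduction_percent(baseline, size)
              for name, size in ablation_mb.items() if name != "w/o HP, w/o CM"}

for name, pct in reductions.items():
    print(f"{name:14s} {pct:5.1f}% smaller than the baseline")
```

The combined reduction (about 25%) is smaller than the sum of the two individual reductions (about 19.5% and 17.5%), which suggests the context model and the hyperprior capture partly overlapping redundancy while still being complementary.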
7. Technical Implications and Extensions
ContextGS's anchor context framework demonstrates that hierarchical, structured conditional entropy coding at the anchor level can substantially improve the trade-off between storage size and rendering quality in 3DGS representations. The component design is compatible with orthogonal improvements such as learned anchor features, alternative partitionings, or more expressive hyperprior networks. The autoregressive structure matches the statistics of 3D scenes, and careful context model design realizes consistent coding gains over independent anchor compression.
The approach is data- and model-agnostic, requiring no assumptions about Gaussian attribute distributions beyond continuity. Based on the reported results, anchor-level autoregressive context models with hyperpriors are state-of-the-art for compressed high-fidelity 3D Gaussian scene representations (Wang et al., 2024).