Anchor-Octree LoD 3DGS for Efficient Rendering

Updated 27 January 2026

The paper introduces a novel hybrid architecture using anchors and sparse octrees to dynamically control level-of-detail in 3D Gaussian Splatting.
The methodology leverages per-anchor feature codes, MLP-decoded Gaussian parameters, and adaptive refinement strategies for efficient scene encoding.
Empirical results demonstrate state-of-the-art PSNR, reduced storage, and faster rendering, highlighting scalability across a range of applications.

Anchor- and Octree-based LoD-structured 3DGS (3D Gaussian Splatting) refers to a class of scene representations and algorithms in which 3D Gaussians are spatially organized using a hierarchical structure based on sparse octrees and anchor points, enabling adaptive, memory-efficient, and real-time rendering with precise level-of-detail (LoD) control. In these frameworks, an explicit or implicit octree partitions the scene volume into voxels, each associated with a set of "anchors" that parameterize local clusters of Gaussians at varying spatial resolutions. This architecture underpins recent advances in novel view synthesis, dense mapping, streaming video reconstruction, and 3DGS compression, providing both scalability and dynamic adaptation to scene geometry and view constraints.

1. Principles of Anchor and Octree Structuring

Anchor- and octree-based LoD-structured 3DGS decomposes the scene into multi-resolution cells via an octree whose depth and occupancy are driven by scene geometry or data density. Anchors are positioned at the centers of occupied or geometrically salient voxels at different octree levels, each encoding spatial, appearance, and feature information at a given granularity. This approach implicitly defines a hierarchy of LoD, with coarse anchors providing large-scale coverage and fine anchors reserved for regions exhibiting high geometric or photometric complexity. Gaussian primitives are generated from anchor parameters, leveraging per-anchor feature codes and learnable offsets to capture intra-voxel detail.

Typical formulations (Lin et al., 19 Aug 2025, Wang et al., 2024, Liu et al., 26 Jan 2026, Ren et al., 2024) initialize octree construction as follows:

Define the base voxel size $\epsilon_0$ and maximum octree depth $L_\mathrm{max}$, with per-level cell size $\epsilon_\ell = \epsilon_0/2^\ell$.
For each voxel $v$ at level $\ell$, compute the point density $\rho_v = |P_v|/\epsilon_\ell^3$, where $P_v$ is the set of input points falling inside $v$. Adaptive splitting (for $\rho_v > \tau_\mathrm{split}$) and pruning (for $\rho_v < \tau_\mathrm{prune}$) regulate refinement.
Each anchor $v$ stores a spatial center $p_v \in \mathbb{R}^3$, level $\ell_v$, scale $s_v$, feature embedding $f_v$, and a set of learnable offsets, often denoted $\{o_v^i\}$.
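The quantities in the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction used by any of the cited systems: the function names, the use of a flat dictionary keyed by integer voxel index, and the three-way `split`/`prune`/`keep` decision are all hypothetical choices made for clarity.

```python
from collections import defaultdict


def level_cell_size(eps0: float, level: int) -> float:
    """Per-level cell size: eps_l = eps0 / 2**l."""
    return eps0 / (2 ** level)


def voxel_densities(points, eps0: float, level: int) -> dict:
    """Bucket 3D points into voxels at `level`; return {voxel_index: density}.

    Density follows rho_v = |P_v| / eps_l**3 from the text.
    """
    eps = level_cell_size(eps0, level)
    counts = defaultdict(int)
    for x, y, z in points:
        key = (int(x // eps), int(y // eps), int(z // eps))
        counts[key] += 1
    volume = eps ** 3
    return {key: n / volume for key, n in counts.items()}


def refinement_action(density: float, tau_split: float, tau_prune: float) -> str:
    """Hypothetical decision rule: split dense voxels, prune sparse ones."""
    if density > tau_split:
        return "split"
    if density < tau_prune:
        return "prune"
    return "keep"
```

Only occupied voxels ever appear in the dictionary, which is the property that keeps the octree sparse: empty space costs nothing to store.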

This strategy yields a sparse octree in which memory and computation grow sublinearly with scene size due to efficient encoding of empty or homogeneous regions and localized refinement where necessary.

2. Gaussian Parametrization and Anchor-based Generation

For each anchor, the framework generates a packet of neural Gaussians. Means are computed as $\mu_{v,i} = p_v + s_v \odot o_v^i$, and covariances are produced by an MLP decoder, e.g., $\Sigma_{v,i} = R_{v,i} S_{v,i}^2 R_{v,i}^\top$, where $R_{v,i}$ is a rotation matrix and $S_{v,i}$ is a diagonal scale matrix. Appearance properties (opacity $\alpha_{v,i}$, color $c_{v,i}$, SH coefficients, etc.) are decoded from anchor features, view direction, and camera–anchor distance through compact neural networks (Wang et al., 2024, Liu et al., 26 Jan 2026, Ren et al., 2024).
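To make the two formulas concrete, the following sketch evaluates $\mu_{v,i} = p_v + s_v \odot o_v^i$ and $\Sigma_{v,i} = R_{v,i} S_{v,i}^2 R_{v,i}^\top$ in plain Python. It assumes the rotation is given as a unit quaternion `(w, x, y, z)` and the scale as three positive numbers; in the cited methods these would come from an MLP decoder, which is deliberately left out here. All function names are hypothetical.

```python
import math


def gaussian_mean(p, s, o):
    """mu = p + s (elementwise*) o for one offset of one anchor."""
    return tuple(pi + si * oi for pi, si, oi in zip(p, s, o))


def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalise defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(q, scales):
    """Sigma = R S^2 R^T with S = diag(scales)."""
    R = quat_to_rotation(q)
    return [
        [sum(R[i][k] * scales[k] ** 2 * R[j][k] for k in range(3))
         for j in range(3)]
        for i in range(3)
    ]
```

Building the covariance from a rotation and a positive scale guarantees that it is symmetric and positive semi-definite, which is why this factorization is preferred over predicting the six covariance entries directly.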

Attributes are optimized by backpropagation through photometric, depth, and structural losses. Gradient statistics, either pooled per anchor or tracked per Gaussian, drive anchor growth, splitting, and pruning in an online or batch regime.

This anchor-based parametrization enables highly compressed representations: anchors act as scaffolds supporting detailed, anisotropic Gaussian clusters where needed, while sparse regions retain only coarse anchors, minimizing storage without manual LoD tuning.

3. Level-of-Detail Hierarchies and Adaptive Selection

Octree-anchored multi-resolution structuring enables dynamic and view-dependent LoD scheduling. The system computes, for each anchor, the appropriate LoD level as a function of its distance to the camera or its projected screen-space size. For example, an anchor's “preferred” level can be calculated as

$L^* = \log_2 \left( \frac{d_\mathrm{max}}{d} \right)$

where $d$ is the camera–anchor distance and $d_\mathrm{max}$ is the scene’s far bound (Ren et al., 2024, Liu et al., 26 Jan 2026). Halving $d$ therefore raises $L^*$ by one level. Anchors with $\ell \leq \hat{L}$, where $\hat{L}$ is the nearest integer to $L^*$, are queried. LoD culling is achieved either by thresholding projected size or through explicit per-anchor scheduling based on view geometry and gradient signals.
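The selection rule can be sketched as follows. The rounding to the nearest integer and the test $\ell \leq \hat{L}$ come from the text; clamping the result to the valid level range `[0, num_levels - 1]` is an added assumption (without it, very near or very far cameras would request levels that do not exist). Function names are hypothetical.

```python
import math


def preferred_level(d: float, d_max: float) -> float:
    """L* = log2(d_max / d): closer cameras ask for finer levels."""
    return math.log2(d_max / d)


def query_level(d: float, d_max: float, num_levels: int) -> int:
    """Round L* to the nearest integer, then clamp (clamping is an assumption)."""
    l_hat = int(math.floor(preferred_level(d, d_max) + 0.5))
    return max(0, min(num_levels - 1, l_hat))


def active_anchor_ids(anchor_levels, l_hat: int):
    """Indices of anchors whose level satisfies l <= L_hat."""
    return [i for i, level in enumerate(anchor_levels) if level <= l_hat]
```

For example, with `d_max = 100` and five levels, a camera at distance 25 gives $L^* = 2$, so anchors at levels 0, 1, and 2 are queried while levels 3 and 4 are skipped.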

Smooth LoD blending (e.g., opacity interpolation between adjacent LoD levels) prevents visual popping. View- and loss-driven growing/pruning strategies selectively refine under-optimized regions and remove redundant anchors (Liu et al., 26 Jan 2026, Wang et al., 2024, Ren et al., 2024). In streaming and online mapping contexts, LoD scheduling and anchor updates are intertwined with dynamic scene changes via GMM partitioning and progressive keyframe integration (Liu et al., 26 Jan 2026, Wang et al., 2024).
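One simple way to realize opacity interpolation between adjacent levels is sketched below. It is an illustrative scheme rather than the exact blending rule of any cited method: levels at or below $\lfloor L^* \rfloor$ are fully visible, the next finer level is faded in by the fractional part of $L^*$, and all finer levels are hidden. The function name and the clamping of $L^*$ are assumptions.

```python
import math


def blend_weights(l_star: float, num_levels: int) -> dict:
    """Per-level opacity multipliers for a continuous preferred level L*."""
    l_star = max(0.0, min(float(num_levels - 1), l_star))
    base = int(math.floor(l_star))
    frac = l_star - base
    weights = {}
    for level in range(num_levels):
        if level <= base:
            weights[level] = 1.0   # fully visible
        elif level == base + 1:
            weights[level] = frac  # fading in as the camera approaches
        else:
            weights[level] = 0.0   # too fine for this distance
    return weights
```

Each Gaussian's decoded opacity would be multiplied by the weight of its anchor's level, so a level appears gradually instead of switching on at a hard distance threshold.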

4. Methods for Compression, Streaming, and Online Mapping

The anchor–octree LoD structure facilitates advanced compression and real-time updates. In HGSC (Huang et al., 2024), after importance-based pruning, the 3D positions are encoded in a depth-12 octree using context-adaptive arithmetic coding. Within spatial KD-tree blocks, anchor selection is performed by Farthest Point Sampling, and attributes of non-anchor Gaussians are predicted via $k$-NN regression from anchors and lower LoD reconstructions, with quantized residuals compressed by LZ77.
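The anchor selection and prediction steps can be illustrated with a small, unoptimized sketch. Farthest Point Sampling is implemented in its standard greedy form; the $k$-NN predictor uses an unweighted mean of a single scalar attribute, which is a simplifying assumption and not necessarily the regression used in HGSC. Names and signatures are hypothetical.

```python
import math


def farthest_point_sampling(points, k: int, start: int = 0):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set."""
    selected = [start]
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


def knn_predict(query, anchor_points, anchor_attrs, k: int) -> float:
    """Predict an attribute as the mean over the k nearest anchors (assumption)."""
    order = sorted(range(len(anchor_points)),
                   key=lambda i: math.dist(query, anchor_points[i]))[:k]
    return sum(anchor_attrs[i] for i in order) / len(order)


def quantized_residual(actual: float, predicted: float, step: float) -> int:
    """Integer residual that an entropy coder would compress."""
    return round((actual - predicted) / step)
```

The better the prediction, the closer the residuals cluster around zero, and the more effectively a general-purpose coder such as LZ77 can compress them.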

Streaming systems such as StreamLoD-GS (Liu et al., 26 Jan 2026) employ quantized residual refinement: only per-frame residuals are transmitted, and only for dynamic anchors identified by GMM analysis of anchor gradient magnitudes, which yields up to 80% reduction in per-frame bandwidth over naïve transmission. Online mapping pipelines like OG-Mapping (Wang et al., 2024) employ anchor-based progressive refinement in conjunction with a dynamic keyframe window to maintain compactness and avert catastrophic forgetting in continual operation.
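The bandwidth saving of residual refinement comes from sending small integers for a subset of anchors instead of full attributes for all of them. The sketch below shows that idea for one scalar attribute per anchor. It assumes the set of dynamic anchors is already known (the GMM-based partitioning is not reproduced here), and the function names and uniform quantization step are hypothetical.

```python
def encode_frame(prev, curr, dynamic_ids, step: float) -> dict:
    """Quantize per-anchor residuals, but only for anchors flagged as dynamic."""
    return {i: round((curr[i] - prev[i]) / step) for i in dynamic_ids}


def decode_frame(prev, payload: dict, step: float) -> list:
    """Rebuild the current frame; static anchors are copied from `prev`."""
    out = list(prev)
    for i, q in payload.items():
        out[i] = prev[i] + q * step
    return out
```

With uniform quantization, the reconstruction error of each transmitted value is bounded by half the step size, so `step` trades bitrate against fidelity.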

5. Rendering, Optimization, and Implementation

Rendering proceeds by projecting Gaussian primitives from all anchors included by current LoD constraints into screen space, sorting by depth, and compositing with alpha blending. Per-anchor MLPs are efficiently batched for fast GPU execution. Training incorporates photometric, depth, and structural similarity losses, with additional regularizers for sparsity and LoD smoothness (Lin et al., 19 Aug 2025, Wang et al., 2024, Ren et al., 2024).
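The sort-and-composite step follows the standard front-to-back alpha blending used in Gaussian splatting, $C = \sum_i c_i \alpha_i \prod_{j<i} (1 - \alpha_j)$. The sketch below applies it to a single pixel, assuming each contributing Gaussian has already been projected and reduced to a `(depth, rgb, alpha)` triple; that input layout and the function name are illustrative assumptions.

```python
def composite(samples):
    """Front-to-back alpha compositing of (depth, rgb, alpha) samples."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for _, rgb, alpha in sorted(samples, key=lambda s: s[0]):
        weight = transmittance * alpha
        color = [acc + weight * ch for acc, ch in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color, transmittance
```

Because transmittance only decreases, a renderer can stop early once it falls below a small threshold, which is one reason culling anchors by LoD before this stage reduces per-pixel work.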

Implementation strategies across frameworks are unified in their use of:

GPU-resident struct arrays and level-indexed anchor tables for rapid LoD selection.
Auxiliary hash maps or Morton code-based indexing for fast octree traversal.
On-demand growing and pruning policies executed infrequently for computational efficiency.
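Morton (Z-order) indexing, mentioned in the list above, interleaves the bits of the integer voxel coordinates so that octree parent and child relations become bit shifts. The following sketch shows the standard encoding; the 10-bit default and the function names are arbitrary illustrative choices, and a real system would typically pair `(level, code)` as the hash-map key.

```python
def morton_encode(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode(code: int, bits: int = 10):
    """Inverse of morton_encode."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


def parent_code(code: int) -> int:
    """Code of the parent voxel one level coarser (drops one bit per axis)."""
    return code >> 3
```

Sorting anchors by Morton code also places spatially nearby voxels close together in memory, which tends to help GPU cache behavior during traversal.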

Reported gains include storage reductions of 5–20× and rendering speeds of 500–700 FPS or more at high image resolutions (Lin et al., 19 Aug 2025, Huang et al., 2024, Liu et al., 26 Jan 2026).

6. Empirical Performance and Application Contexts

Anchor- and octree-based LoD-structured 3DGS achieves high-fidelity, scalable scene representations across a range of tasks:

Method	PSNR (dB) ↑	Storage (MB) ↓	Render (FPS) ↑
LongSplat	27.88	101	281.7
OG-Mapping	38.6	30	500–800
HGSC (small scene)	41.31	15.81	n/a
StreamLoD-GS	27.84	0.19	312–608
Octree-GS	—	~7	30–200+

Data sourced from (Lin et al., 19 Aug 2025, Wang et al., 2024, Huang et al., 2024, Liu et al., 26 Jan 2026, Ren et al., 2024).

In city-scale or long-video scenarios (e.g., LongSplat, Octree-GS), the octree anchor structure provides memory and throughput gains of multiple orders of magnitude, enabling state-of-the-art fidelity with real-time rendering and minimal memory footprints (Lin et al., 19 Aug 2025, Ren et al., 2024). In streaming free-viewpoint video (FVV) with StreamLoD-GS, LoD anchor-octree (LoD-AO) structuring, GMM-based motion partitioning, and quantized residual refinement achieve sub-MB per-frame storage and strong PSNR/SSIM under extremely sparse view regimes, with ablations indicating PSNR drops of more than 2 dB without LoD structuring (Liu et al., 26 Jan 2026). In compression (HGSC), LoD-staged anchor referencing delivers a 4.5× size reduction and orders-of-magnitude faster decompression relative to blockwise compression baselines (Huang et al., 2024).

7. Context, Limitations, and Evolution

The anchor- and octree-based LoD-structured paradigm now underpins modern scalable 3DGS pipelines for scene reconstruction, streaming, and mapping. Key advances include efficient multi-resolution anchor scaffolds, dynamic LoD-driven refinement, and seamless integration with neural decoders and structured attribute coders. Limitations persist related to octree construction under very high noise, merging conflicts during rapid scene changes, and the trade-off between very aggressive pruning (storage) and preservation of fine detail (visual fidelity). Empirical analyses confirm that LoD-aware anchor selection and progressive update policies constitute critical factors in reconciling fidelity, speed, and compactness (Lin et al., 19 Aug 2025, Wang et al., 2024, Huang et al., 2024, Liu et al., 26 Jan 2026, Ren et al., 2024).

A plausible implication is that LoD-anchored Gaussian splatting, due to architectural flexibility and inherent responsiveness to data density and view constraints, will remain central for memory- and compute-efficient 3D scene representations across offline and online, static and dynamic applications.