ARBB: Anchor Relay-based Bidirectional Blending
- ARBB is a mechanism designed for temporally coherent, memory-efficient long-range 4D motion modeling using canonical anchor sets.
- It employs bidirectional deformation fields and hexplane neural grids to predict per-anchor displacements and enable smooth temporal interpolation.
- ARBB integrates adaptive blending with learnable opacity controls and hierarchical densification to mitigate temporal discontinuities and reduce memory usage.
Anchor Relay-based Bidirectional Blending (ARBB) is a mechanism designed for temporally coherent, memory-efficient long-range 4D motion modeling in Gaussian-based representations of dynamic scenes. Integrated in the MoRel framework, ARBB enables high-fidelity, flicker-free rendering by operating on canonical anchor sets constructed at key-frame time indices, modeling bidirectional inter-frame deformations, and blending anchor contributions at inference with learnable per-anchor opacity controls. The method specifically addresses memory scalability, temporal continuity, and occlusion handling in the context of 4D Gaussian Splatting (4DGS), which extends the real-time rendering capabilities of 3DGS to dynamic scene sequences (Kwak et al., 10 Dec 2025).
1. Construction of Locally Canonical Anchor Spaces at Key-frame Indices
The temporal axis is partitioned using a fixed Group-of-Pictures (GOP) interval. For a video with N frames and chosen spacing GOP, key frames are allocated at indices t_k = k·GOP for k = 0, 1, …, ⌈N/GOP⌉ − 1. At each key-frame index t_k, a locally canonical anchor space A_k is instantiated:
- Each anchor a in A_k possesses the tuple of learnable attributes (x, f, s, {O_j}):
- x: anchor center.
- f: feature vector encoding local appearance/geometry.
- s: diagonal scale for Gaussian extent.
- {O_j}: offsets for the neural Gaussians spawned per anchor.
Neural-Gaussian attributes (opacity, color, covariance) are decoded from the anchor feature via small MLPs of the form F(f, δ, d), where δ and d encode camera-anchor geometry (viewing distance and direction). Initialization of A_k is performed by copying from a globally trained anchor set, applying one round of Feature-variance-guided Hierarchical Densification (FHD), and subsequently fine-tuning on the frames in the window around t_k.
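As a minimal sketch of the construction above (all names are illustrative, not from the paper), key-frame allocation and per-anchor attributes might be organized as follows:

```python
# Illustrative sketch of ARBB key-frame allocation and anchor attributes.
# `make_keyframe_indices` and `Anchor` are hypothetical names, not the paper's API.
from dataclasses import dataclass, field
from typing import List

def make_keyframe_indices(num_frames: int, gop: int) -> List[int]:
    """Allocate key-frame indices t_k = k * gop over the sequence."""
    return list(range(0, num_frames, gop))

@dataclass
class Anchor:
    """Learnable per-anchor attributes (x, f, s, {O_j})."""
    x: tuple                                      # anchor center in R^3
    f: list                                       # feature vector (appearance/geometry)
    s: tuple = (1.0, 1.0, 1.0)                    # diagonal scale for Gaussian extent
    offsets: list = field(default_factory=list)   # one offset per neural Gaussian

keys = make_keyframe_indices(num_frames=100, gop=30)
# keys == [0, 30, 60, 90]
```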
2. Modeling Bidirectional Deformation Fields
For every key-frame anchor space A_k, a parameterized hexplane deformation field D_k is learned. This field maps an anchor center x and a normalized time offset τ to a displacement Δx = D_k(x, τ):
- During training, each D_k is responsible for a Bidirectional Deformation Window W_k spanning the neighboring key frames.
- At a query time t, the nearest keys are t_k and t_{k+1}, with t_k ≤ t < t_{k+1}; two sets of deformed anchors are produced:
- Backward-warped from A_{k+1}: x + D_{k+1}(x, τ_b), with τ_b = (t − t_{k+1})/GOP.
- Forward-warped from A_k: x + D_k(x, τ_f), with τ_f = (t − t_k)/GOP.
The hexplane approach enables per-anchor continuous motion over time, relying on a low-parameter field for efficient representation and prediction.
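The key-frame lookup and normalized-offset computation described above can be sketched as follows; the clamping behavior and the [0, 1] / [−1, 0] offset convention are assumptions for illustration:

```python
def nearest_keys(t: float, gop: int, num_keys: int):
    """Return the surrounding key indices (k, k+1) and normalized offsets.

    tau_f is the forward offset from key k; tau_b the backward offset from
    key k+1, both normalized by the GOP spacing (illustrative convention).
    """
    k = min(int(t // gop), num_keys - 2)   # clamp so k+1 remains a valid key
    t_k, t_k1 = k * gop, (k + 1) * gop
    tau_f = (t - t_k) / gop                # in [0, 1]
    tau_b = (t - t_k1) / gop               # in [-1, 0]
    return k, k + 1, tau_f, tau_b

k, k1, tf, tb = nearest_keys(t=45.0, gop=30, num_keys=4)
# (k, k1, tf, tb) == (1, 2, 0.5, -0.5)
```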
3. Adaptive Bidirectional Blending with Learnable Opacity
Blending the contributions of deformed anchors from both key-frame directions is achieved through learnable per-anchor opacity modulations. For an anchor a, two directions are considered: Fw (forward, from A_k) and Bw (backward, from A_{k+1}). Each direction has associated learnable blending parameters:
- Temporal offset μ and decay speed β.
- The per-anchor, per-direction blending weight w is a function of the normalized offset τ that decays with temporal distance from μ at rate β (e.g., of the form w = exp(−β·|τ − μ|)).
- The Gaussian opacities are modulated as α̃ = w · α.
The final rendered set at time t is the union of both forward and backward deformed Gaussians, each weighted according to the learned blending schedule. Standard volume rendering via Gaussian splatting then produces the output color. This mechanism is central to ARBB's ability to mitigate temporal discontinuities and flickering artifacts.
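A hedged sketch of the blending step, assuming an exponential-decay weight (the paper's exact functional form may differ):

```python
import math

def blend_weight(tau: float, mu: float, beta: float) -> float:
    """Illustrative per-anchor, per-direction blending weight.

    Decays with temporal distance from the learned offset mu at learned
    speed beta; assumed exponential form, not confirmed by the source.
    """
    return math.exp(-beta * abs(tau - mu))

def modulated_opacity(alpha: float, tau: float, mu: float, beta: float) -> float:
    """Scale a Gaussian's base opacity alpha by the blending weight."""
    return blend_weight(tau, mu, beta) * alpha

# A forward-warped anchor right at its key frame keeps full base opacity...
near = modulated_opacity(alpha=0.9, tau=0.0, mu=0.0, beta=4.0)
# ...and fades as the query time approaches the next key frame.
far = modulated_opacity(alpha=0.9, tau=1.0, mu=0.0, beta=4.0)
```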
4. Algorithmic Workflow
ARBB's data flow during inference and training is summarized by the following procedural steps, directly derived from (Kwak et al., 10 Dec 2025):
- For a query frame t, obtain the surrounding key indices t_k ≤ t < t_{k+1}.
- Load A_k, A_{k+1}, D_k, D_{k+1}.
- Compute τ_f = (t − t_k)/GOP and τ_b = (t − t_{k+1})/GOP.
- For each anchor in A_k:
- Apply D_k to obtain the displaced center x + D_k(x, τ_f) and blending weight w_Fw.
- Generate deformed Gaussians at time t with opacity scaled by w_Fw.
- For each anchor in A_{k+1}:
- Apply D_{k+1} to obtain x + D_{k+1}(x, τ_b) and weight w_Bw.
- Generate corresponding Gaussians.
- Render all Gaussians (from both sets) by standard depth-sorted splatting.
This two-sided blending maintains temporal smoothness while bounding memory usage; at most two anchor sets and two deformation fields are loaded concurrently.
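Putting the steps together, one ARBB inference step might look like the following sketch. Here `deform()` is a trivial placeholder standing in for the hexplane deformation field, the exponential weight is an assumed form, and the returned list would be handed to the depth-sorted splatting renderer:

```python
import math

def deform(anchor_x, tau):
    """Placeholder deformation field: identity displacement for illustration."""
    return [c + 0.0 * tau for c in anchor_x]

def arbb_frame(anchors_k, anchors_k1, tau_f, tau_b, beta=4.0):
    """Union of forward-warped (from key k) and backward-warped (from key k+1)
    Gaussians, each with opacity scaled by an assumed exponential blend weight."""
    gaussians = []
    for x, alpha in anchors_k:                  # forward direction (Fw)
        w = math.exp(-beta * abs(tau_f))
        gaussians.append((deform(x, tau_f), w * alpha))
    for x, alpha in anchors_k1:                 # backward direction (Bw)
        w = math.exp(-beta * abs(tau_b))
        gaussians.append((deform(x, tau_b), w * alpha))
    return gaussians   # pass to the depth-sorted splatting renderer

gs = arbb_frame([([0.0, 0.0, 0.0], 0.8)], [([1.0, 0.0, 0.0], 0.8)],
                tau_f=0.25, tau_b=-0.75)
```

Note that only the two anchor sets adjacent to t are touched, which is exactly what bounds memory to two anchor sets and two deformation fields at a time.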
5. Implementation and Memory Efficiency Considerations
- Deformation Field Architecture (D_k): Each D_k consists of a hexplane neural grid with six axis-pair planes (xy, xz, yz, xt, yt, zt), each stored at a fixed resolution. At inference, an anchor's spatial coordinate x and temporal scalar τ are used to bilinearly sample features from the six planes, which are concatenated and fed to a two-layer MLP predicting the displacement Δx.
- Losses: The model is trained with a photometric reconstruction loss between rendered and ground-truth frames (e.g., an L1/D-SSIM combination as in standard 3DGS training). An optional deformation smoothness regularizer promotes temporal coherence.
- Chunkwise Computation: Only the key-frame anchor (KfA) sets A_k, A_{k+1} and deformation fields D_k, D_{k+1} relevant to the current temporal chunk are loaded in memory, guaranteeing scalability for long or high-resolution sequences.
- Isolation During Training: Each deformation field D_k is independently trained on its own bidirectional window W_k, so gradients do not leak across windows.
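The hexplane sampling path can be illustrated with a small NumPy sketch. The grid resolution, channel count, and random plane contents are placeholders, and the downstream two-layer MLP is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
R, C = 16, 8                       # illustrative per-plane resolution and channels
PLANES = ["xy", "xz", "yz", "xt", "yt", "zt"]
grids = {p: rng.standard_normal((R, R, C)) for p in PLANES}

def bilinear(grid, u, v):
    """Bilinearly interpolate an (R, R, C) grid at normalized coords u, v in [0, 1]."""
    gu, gv = u * (R - 1), v * (R - 1)
    i0, j0 = int(gu), int(gv)
    i1, j1 = min(i0 + 1, R - 1), min(j0 + 1, R - 1)
    du, dv = gu - i0, gv - j0
    return ((1 - du) * (1 - dv) * grid[i0, j0] + du * (1 - dv) * grid[i1, j0]
            + (1 - du) * dv * grid[i0, j1] + du * dv * grid[i1, j1])

def hexplane_features(x, y, z, t):
    """Sample all six axis-pair planes and concatenate the features.

    The concatenated vector would be fed to a small MLP predicting the
    per-anchor displacement (MLP not shown here).
    """
    coords = {"x": x, "y": y, "z": z, "t": t}
    feats = [bilinear(grids[p], coords[p[0]], coords[p[1]]) for p in PLANES]
    return np.concatenate(feats)

f = hexplane_features(0.2, 0.5, 0.7, 0.3)
# f.shape == (48,)  -- six planes, C channels each
```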
6. Integration with Feature-variance-guided Hierarchical Densification
Before ARBB and per-window deformation (PWD) processing, Feature-variance-guided Hierarchical Densification (FHD) is applied. FHD assigns each global anchor a level in {0, 1, 2} based on feature-variance quantiles, which dictates the anchor-densification schedule as follows:
- Level-0 (low-frequency anchors): densified earlier in the pipeline.
- Level-2 (high-frequency/detail anchors): introduced at later stages.
Thus, the KfA sets handed to ARBB are adaptively denser in high-detail regions and sparser in homogeneous areas. The blending weights accordingly gain relevance only where necessary, preventing excessive anchor proliferation or underrepresentation.
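A quantile-based level assignment of this kind can be sketched as follows; the tercile cut points are illustrative, not taken from the paper:

```python
import numpy as np

def fhd_levels(feature_variance, quantiles=(1 / 3, 2 / 3)):
    """Bucket anchors into levels 0..2 by feature-variance quantiles.

    Level 0 (lowest variance, low-frequency regions) is densified earliest
    in the schedule; level 2 (highest variance, fine detail) is introduced
    last. The tercile cut points here are an illustrative choice.
    """
    var = np.asarray(feature_variance, dtype=float)
    cuts = np.quantile(var, quantiles)
    return np.digitize(var, cuts)   # 0, 1, or 2 per anchor

levels = fhd_levels([0.01, 0.02, 0.5, 0.6, 5.0, 7.0])
# levels == [0, 0, 1, 1, 2, 2]
```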
7. Significance and Applicability
ARBB, as instantiated in MoRel, directly addresses the convergence and scalability bottlenecks of prior 4DGS-based methods—specifically, memory explosion, temporal flickering, and occlusion inconsistency in the modeling of long-range dynamic videos. The modular relay-based bidirectional anchor handling, learnable blending, and hierarchical densification collectively enable temporally coherent, flicker-free reconstructions at bounded memory cost, with demonstrated efficacy on long-range, high-motion datasets such as SelfCap (Kwak et al., 10 Dec 2025). These attributes position ARBB as a mechanism of high practical relevance for real-time, high-fidelity 4D dynamic scene rendering in both academic research and potential deployment scenarios.