
ARBB: Anchor Relay-based Bidirectional Blending

Updated 11 December 2025
  • ARBB is a mechanism designed for temporally coherent, memory-efficient long-range 4D motion modeling using canonical anchor sets.
  • It employs bidirectional deformation fields and hexplane neural grids to predict per-anchor displacements and enable smooth temporal interpolation.
  • ARBB integrates adaptive blending with learnable opacity controls and hierarchical densification to mitigate temporal discontinuities and reduce memory usage.

Anchor Relay-based Bidirectional Blending (ARBB) is a mechanism designed for temporally coherent, memory-efficient long-range 4D motion modeling in Gaussian-based representations of dynamic scenes. Integrated in the MoRel framework, ARBB enables high-fidelity, flicker-free rendering by operating on canonical anchor sets constructed at key-frame time indices, modeling bidirectional inter-frame deformations, and blending anchor contributions at inference with learnable per-anchor opacity controls. The method specifically addresses memory scalability, temporal continuity, and occlusion handling in the context of 4D Gaussian Splatting (4DGS), which extends the real-time rendering capabilities of 3DGS to dynamic scene sequences (Kwak et al., 10 Dec 2025).

1. Construction of Locally Canonical Anchor Spaces at Key-frame Indices

The temporal axis is partitioned using a fixed Group-of-Pictures (GOP) interval. For a video with $T$ frames and chosen spacing $\mathrm{GOP}$, key frames are allocated at indices $t_n = n \cdot \mathrm{GOP}$ for $n = 0, \ldots, \lceil T/\mathrm{GOP} \rceil - 1$. At each key-frame index $t_n$, a locally canonical anchor space $A_n^\mathrm{Key}$ is instantiated:

  • Each anchor $a_{nk}$ in $A_n^\mathrm{Key}$ possesses the tuple of learnable attributes $\theta_{nk} = (p_{nk}, \hat{f}_{nk}, \ell_{nk}, O_{nk})$:
    • $p_{nk} \in \mathbb{R}^3$: anchor center.
    • $\hat{f}_{nk}$: feature encoding local appearance/geometry.
    • $\ell_{nk} \in \mathbb{R}^3$: diagonal scale for Gaussian extent.
    • $O_{nk} \in \mathbb{R}^{I \times 3}$: offsets for $I$ Gaussians per anchor.

Neural-Gaussian attributes are decoded via $F_\mathrm{attr}(\hat{f}_{nk}, \delta_{nk}, d_{nk})$, where $\delta_{nk}$ and $d_{nk}$ encode camera-anchor geometry. $A_n^\mathrm{Key}$ is initialized by copying from a globally trained anchor set $A^\mathrm{Global}$, applying one round of Feature-variance-guided Hierarchical Densification (FHD), and fine-tuning on frames in $[t_n - \epsilon, t_n + \epsilon]$.
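The key-frame allocation described above can be sketched in a few lines; the function name and signature are illustrative, not part of MoRel's API:

```python
import math

# Illustrative sketch of key-frame allocation at a fixed GOP interval;
# the function name and signature are assumptions, not MoRel's API.
def keyframe_indices(num_frames: int, gop: int) -> list[int]:
    """Key-frame indices t_n = n * GOP for n = 0, ..., ceil(T/GOP) - 1."""
    return [n * gop for n in range(math.ceil(num_frames / gop))]

print(keyframe_indices(100, 30))  # [0, 30, 60, 90]
```

The ceiling ensures the trailing partial group (frames 90-99 here) still receives a key frame.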

2. Modeling Bidirectional Deformation Fields

For every key-frame anchor space $A_n^\mathrm{Key}$, a parameterized hexplane deformation field $D_n$ is learned. This field maps each anchor $a_{nk}$ and a normalized time offset $\tau_n = (t - t_n)/\mathrm{GOP} \in [-1, 1]$ to a displacement $\Delta x$:

  • During training, each $A_n^\mathrm{Key}$ is responsible for a Bidirectional Deformation Window $\mathrm{BDW}_n = [t_n - \mathrm{GOP},\ t_n + \mathrm{GOP}]$.
  • At a query time $t$, the nearest keys are $t_{n^-} = t_n$ and $t_{n^+} = t_{n+1}$; two sets of deformed anchors are produced:
    • Backward-warped from $A_{n+1}^\mathrm{Key}$: $a_{n^+,k}(t) = p_{n+1,k} + D_{n+1}(p_{n+1,k}, \tau_{n+1})$, with $\tau_{n+1} = (t - t_{n+1})/\mathrm{GOP}$.
    • Forward-warped from $A_n^\mathrm{Key}$: $a_{n^-,k}(t) = p_{nk} + D_n(p_{nk}, \tau_n)$.

The hexplane approach enables per-anchor continuous motion over time, relying on a low-parameter field for efficient representation and prediction.
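As a hedged illustration of how such a low-parameter field can be queried, the NumPy sketch below bilinearly samples six axis-aligned feature planes at an anchor's spatial coordinate and time, then maps the concatenated features through a tiny MLP. The resolution, feature width, and random weights are assumptions for this example, not the paper's architecture:

```python
import numpy as np

# Toy hexplane deformation query (an assumption-laden sketch, not MoRel's
# implementation): sample six axis-pair planes bilinearly at (x, y, z, tau),
# concatenate with tau, and predict a displacement with a two-layer MLP.
R, F = 16, 4  # plane resolution and per-plane feature width (assumed)

rng = np.random.default_rng(0)
planes = {pair: rng.normal(size=(R, R, F)) * 0.01
          for pair in ["xy", "xz", "yz", "xt", "yt", "zt"]}
W1 = rng.normal(size=(6 * F + 1, 32)) * 0.1  # MLP layer 1
W2 = rng.normal(size=(32, 3)) * 0.1          # MLP layer 2 -> displacement

def bilinear(plane, u, v):
    """Sample an (R, R, F) feature grid at continuous coords u, v in [0, 1]."""
    gu, gv = u * (R - 1), v * (R - 1)
    i0, j0 = int(gu), int(gv)
    i1, j1 = min(i0 + 1, R - 1), min(j0 + 1, R - 1)
    du, dv = gu - i0, gv - j0
    return ((1 - du) * (1 - dv) * plane[i0, j0] + du * (1 - dv) * plane[i1, j0]
            + (1 - du) * dv * plane[i0, j1] + du * dv * plane[i1, j1])

def deform(p, tau):
    """Predict displacement for anchor position p in [0,1]^3, tau in [-1,1]."""
    x, y, z = p
    t = (tau + 1) / 2  # map tau onto [0, 1] grid coordinates
    coords = {"xy": (x, y), "xz": (x, z), "yz": (y, z),
              "xt": (x, t), "yt": (y, t), "zt": (z, t)}
    feat = np.concatenate([bilinear(planes[k], u, v)
                           for k, (u, v) in coords.items()] + [np.array([tau])])
    return np.maximum(feat @ W1, 0.0) @ W2  # ReLU MLP -> (3,) displacement

dx = deform(np.array([0.3, 0.5, 0.7]), tau=0.25)
print(dx.shape)  # (3,)
```

Because the planes factorize the 4D volume into six 2D grids, the parameter count grows as $O(R^2)$ rather than $O(R^4)$, which is the efficiency argument behind the hexplane choice.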

3. Adaptive Bidirectional Blending with Learnable Opacity

Blending the contributions of deformed anchors from both key-frame directions is achieved through learnable per-anchor opacity modulations. For anchor $k$, two directions are considered: Fw (forward, from $t_n$) and Bw (backward, from $t_{n+1}$). Each has associated blending parameters:

  • Temporal offset $o_{n,k}^{\mathrm{dir}}$ and decay speed $d_{n,k}^{\mathrm{dir}}$.
  • The per-anchor, per-direction blending weight is given by:

$$w_{n,k}^{\mathrm{dir}}(\tau) = \exp\big[-\lambda_\mathrm{decay} \cdot d_{n,k}^{\mathrm{dir}} \cdot |\tau - o_{n,k}^{\mathrm{dir}}|\big]$$

  • The Gaussian opacities are modulated as $\alpha_{n,k,i}^{\mathrm{dir}}(t) := \alpha_{n,k,i} \cdot w_{n,k}^{\mathrm{dir}}(\tau)$.

The final rendered set at time $t$ is the union of forward- and backward-deformed Gaussians, each weighted according to the learned blending schedule. Standard volume rendering via Gaussian splatting then produces the output color $C(t)$. This mechanism is central to ARBB's ability to mitigate temporal discontinuities and flickering artifacts.
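The blending schedule can be illustrated directly; `lam_decay` and the offset/decay values below are made-up numbers for this sketch, not learned parameters from the paper:

```python
import numpy as np

# Sketch of the per-anchor, per-direction blending weight
# w = exp(-lambda_decay * d * |tau - o|); all parameter values here are
# illustrative assumptions, not trained quantities.
def blend_weight(tau, offset, decay, lam_decay=1.0):
    """Exponential decay of a direction's contribution away from its offset."""
    return np.exp(-lam_decay * decay * np.abs(tau - offset))

tau_fw = 0.4                 # forward time offset (t - t_n)/GOP
tau_bw = tau_fw - 1.0        # backward time offset (t - t_{n+1})/GOP
w_fw = blend_weight(tau_fw, offset=0.0, decay=2.0)
w_bw = blend_weight(tau_bw, offset=0.0, decay=2.0)
alpha_fw = 0.9 * w_fw        # opacity modulation alpha * w
print(round(float(w_fw), 4), round(float(w_bw), 4))  # 0.4493 0.3012
```

Near $t_n$ the forward weight dominates; near $t_{n+1}$ the backward weight takes over, which is what produces the smooth hand-off between anchor sets.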

4. Algorithmic Workflow

ARBB's data flow during inference and training is summarized by the following procedural steps, directly derived from (Kwak et al., 10 Dec 2025):

  1. For a query frame $t$, obtain $n = \lfloor t/\mathrm{GOP} \rfloor$.
  2. Load $A_n^\mathrm{Key}$, $D_n$, $A_{n+1}^\mathrm{Key}$, $D_{n+1}$.
  3. Compute $\tau_n = (t - t_n)/\mathrm{GOP}$ and $\tau_{n+1} = (t - t_{n+1})/\mathrm{GOP}$.
  4. For each anchor $k$ in $A_n^\mathrm{Key}$:
    • Apply $D_n$ to obtain $\Delta x^-$ and blending weight $w^-$.
    • Generate deformed Gaussians at $p_{nk} + \Delta x^-$ with scaled opacity.
  5. For each anchor $k$ in $A_{n+1}^\mathrm{Key}$:
    • Apply $D_{n+1}$ to obtain $\Delta x^+$ and $w^+$.
    • Generate corresponding Gaussians.
  6. Render all Gaussians (from both sets) by standard depth-sorted splatting.

This two-sided blending maintains temporal smoothness while bounding memory usage; at most two anchor sets and two deformation fields are loaded concurrently.
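The six steps above can be condensed into a control-flow sketch; the anchor records, loader callbacks, and `splat` renderer are stand-ins for MoRel's actual data structures, kept abstract so the relay logic stays visible:

```python
import math

# Hedged sketch of the ARBB inference loop; anchors are plain dicts and
# the loaders/renderer are caller-supplied stand-ins, not MoRel's API.
def render_frame(t, gop, load_anchors, load_deform, splat):
    """Blend forward- and backward-deformed Gaussians for query frame t."""
    n = t // gop                             # step 1: key-frame index
    tau_fw = (t - n * gop) / gop             # step 3: tau_n
    tau_bw = (t - (n + 1) * gop) / gop       #         tau_{n+1}
    gaussians = []
    for key, tau in ((n, tau_fw), (n + 1, tau_bw)):   # steps 2, 4, 5
        anchors, deform = load_anchors(key), load_deform(key)
        for a in anchors:
            dx = deform(a["p"], tau)                            # displacement
            w = math.exp(-a["decay"] * abs(tau - a["offset"]))  # blend weight
            gaussians.append({
                "p": [pi + di for pi, di in zip(a["p"], dx)],
                "alpha": a["alpha"] * w,
            })
    return splat(gaussians)                  # step 6: depth-sorted splatting
```

Only the two anchor sets indexed `n` and `n + 1` are ever touched per frame, which is the source of the bounded-memory claim.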

5. Implementation and Memory Efficiency Considerations

  • Deformation Field Architecture ($D_n$): Each $D_n$ consists of a hexplane neural grid of six axis-pair planes, each stored at resolution $R \times R$. At inference, an anchor's spatial coordinate and temporal scalar $\tau$ are used to bilinearly sample features from the six planes, which are concatenated with $\tau$ and fed to a two-layer MLP predicting displacement $\Delta x \in \mathbb{R}^3$.
  • Losses: The model is trained with a photometric reconstruction loss:

$$\mathcal{L}_\mathrm{photo} = \sum_\mathrm{pixels} \| C_\mathrm{render}(t) - C_\mathrm{gt}(t) \|_1$$

An optional deformation smoothness regularizer $\mathcal{L}_\mathrm{smooth} = \sum \| \partial D_n/\partial\tau \|^2$ promotes temporal coherence.

  • Chunkwise Computation: Only the key-frame anchor sets and deformation fields $D_n$/$D_{n+1}$ relevant to the current temporal chunk are loaded in memory, guaranteeing scalability for long or high-resolution sequences.
  • Isolation During Training: Each deformation field $D_n$ is independently trained on its own bidirectional window to preclude backward contamination across windows.
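The two objectives above can be sketched under simple assumptions: an L1 photometric term summed over pixels, and a finite-difference stand-in for the smoothness regularizer (the paper's exact regularizer form is not specified beyond the derivative norm):

```python
import numpy as np

# Sketch of the training objectives: L1 photometric loss over pixels, and
# a central-difference approximation of || dD_n/dtau ||^2. All tensors are
# toy data; the finite-difference form is an assumption of this sketch.
def photometric_loss(c_render, c_gt):
    """Sum of per-pixel L1 color differences."""
    return float(np.abs(c_render - c_gt).sum())

def smoothness_loss(deform, p, tau, eps=1e-3):
    """Central-difference approximation of || dD/dtau ||^2 at (p, tau)."""
    d = (deform(p, tau + eps) - deform(p, tau - eps)) / (2 * eps)
    return float((d ** 2).sum())

c_render = np.zeros((4, 4, 3))
c_gt = np.full((4, 4, 3), 0.5)
print(photometric_loss(c_render, c_gt))  # 24.0
```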

6. Integration with Feature-variance-guided Hierarchical Densification

Before ARBB and per-window deformation (PWD) processing, Feature-variance-guided Hierarchical Densification (FHD) is applied. FHD assigns each global anchor a level $L \in \{0, 1, 2\}$ based on feature-variance quantiles. This dictates anchor-densification schedules as follows:

  • Level-0 (low-frequency anchors): densified earlier in the pipeline.
  • Level-2 (high-frequency/detail anchors): introduced at later stages.

Thus, the key-frame anchor sets handed to ARBB are adaptively denser in high-detail regions and sparser in homogeneous areas. The blending weights $w_{n,k}^{\mathrm{dir}}$ accordingly gain relevance only where necessary, preventing excessive anchor proliferation or underrepresentation.
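A hedged illustration of quantile-based level assignment; splitting at terciles is an assumption of this sketch, not a detail confirmed by the paper:

```python
import numpy as np

# Illustrative FHD-style level assignment from feature-variance quantiles;
# the tercile split points are an assumption of this sketch.
def fhd_levels(feature_variance):
    """Assign each anchor a level in {0, 1, 2} by feature-variance terciles."""
    q1, q2 = np.quantile(feature_variance, [1 / 3, 2 / 3])
    return np.where(feature_variance <= q1, 0,
                    np.where(feature_variance <= q2, 1, 2))

var = np.array([0.01, 0.5, 0.02, 0.9, 0.3, 0.05])
print(fhd_levels(var).tolist())  # [0, 2, 0, 2, 1, 1]
```

Low-variance anchors (level 0) would then be densified early, and high-variance detail anchors (level 2) introduced later, matching the schedule above.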

7. Significance and Applicability

ARBB, as instantiated in MoRel, directly addresses the convergence and scalability bottlenecks of prior 4DGS-based methods—specifically, memory explosion, temporal flickering, and occlusion inconsistency in the modeling of long-range dynamic videos. The modular relay-based bidirectional anchor handling, learnable blending, and hierarchical densification collectively enable temporally coherent, flicker-free reconstructions at bounded memory cost, with demonstrated efficacy on long-range, high-motion datasets such as $\mathrm{SelfCap}_\mathrm{LR}$ (Kwak et al., 10 Dec 2025). These attributes position ARBB as a mechanism of high practical relevance for real-time, high-fidelity 4D dynamic scene rendering in both academic research and potential deployment scenarios.
