MoRel: Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification (2512.09270v1)

Published 10 Dec 2025 in cs.CV

Abstract: Recent advances in 4D Gaussian Splatting (4DGS) have extended the high-speed rendering capability of 3D Gaussian Splatting (3DGS) into the temporal domain, enabling real-time rendering of dynamic scenes. However, one of the major remaining challenges lies in modeling long-range motion-contained dynamic videos, where a naive extension of existing methods leads to severe memory explosion, temporal flickering, and failure to handle appearing or disappearing occlusions over time. To address these challenges, we propose a novel 4DGS framework characterized by an Anchor Relay-based Bidirectional Blending (ARBB) mechanism, named MoRel, which enables temporally consistent and memory-efficient modeling of long-range dynamic scenes. Our method progressively constructs locally canonical anchor spaces at key-frame time index and models inter-frame deformations at the anchor level, enhancing temporal coherence. By learning bidirectional deformations between KfA and adaptively blending them through learnable opacity control, our approach mitigates temporal discontinuities and flickering artifacts. We further introduce a Feature-variance-guided Hierarchical Densification (FHD) scheme that effectively densifies KfA's while keeping rendering quality, based on an assigned level of feature-variance. To effectively evaluate our model's capability to handle real-world long-range 4D motion, we newly compose long-range 4D motion-contained dataset, called SelfCap$_{\text{LR}}$. It has larger average dynamic motion magnitude, captured at spatially wider spaces, compared to previous dynamic video datasets. Overall, our MoRel achieves temporally coherent and flicker-free long-range 4D reconstruction while maintaining bounded memory usage, demonstrating both scalability and efficiency in dynamic Gaussian-based representations.

Summary

The paper introduces an innovative ARBB framework that achieves flicker-free long-range 4D motion modeling through bidirectional blending and on-demand key-frame anchoring.
The paper employs Feature-variance-guided Hierarchical Densification to adaptively refine anchor points based on feature complexity, optimizing both reconstruction detail and memory usage.
The paper validates MoRel on high-motion dynamic scenes, demonstrating superior reconstruction fidelity, reduced temporal inconsistencies, and enhanced scalability compared to prior methods.

MoRel: Anchor Relay-based Bidirectional Blending and Hierarchical Densification for Scalable Flicker-Free Long-Range 4D Gaussian Splatting

Problem Statement and Motivation

Modeling dynamic scenes for novel view synthesis requires handling both high visual fidelity and temporal consistency, especially over extended video sequences. While 3D Gaussian Splatting (3DGS) has enabled efficient real-time photorealistic rendering in static scenes, extending these methods to the temporal domain presents new challenges: naive all-at-once 4DGS approaches incur exponential memory usage and fail to handle occlusions and disocclusions robustly, while chunk-based methods optimize smaller temporal windows at the cost of temporal discontinuities and boundary artifacts. Prior methods employing sliding windows or hierarchical temporal structures have struggled to generalize, especially under practical constraints such as random frame access and system memory budgets (Figure 1).

Figure 1: Conceptual comparison of existing 4DGS methods, highlighting the fundamental trade-offs and system challenges in modeling long-range 4D motion.

MoRel Architectural Innovations

To address the limitations of prior methods, MoRel introduces an Anchor Relay-based Bidirectional Blending (ARBB) mechanism. The ARBB framework leverages periodically inserted Key-frame Anchors (KfAs) to define local canonical spaces, each optimized to represent temporally localized scene features. Bidirectional deformation fields between adjacent KfAs dynamically model complex motions, with a learnable temporal blending (opacity control) that ensures smooth transitions and effective handling of occlusion/disocclusion events. MoRel’s on-demand KfA loading maintains scalable memory usage and enables temporal random access. The four-stage training pipeline—global canonical anchor initialization, KfA optimization, progressive windowed bidirectional deformation (PWD), and intermediate frame blending (IFB)—systematically refines the representation without compromising temporal coherence (Figure 2, Figure 3).

Figure 2: MoRel framework overview—a multistage pipeline adapting ARBB and hierarchical densification for bounded memory and consistent long-range motion modeling.

Figure 3: Comparison of training strategies, illustrating the superior bounded memory and coherence of MoRel bidirectional blending relative to all-at-once and chunk-wise alternatives.

Feature-Variance-Guided Hierarchical Densification

MoRel further introduces Feature-variance-guided Hierarchical Densification (FHD), a novel anchor-point refinement strategy. By analyzing feature variance from global anchor training, anchor-points are assigned hierarchical levels that reflect local frequency complexity. Early training iterations prioritize stabilization of low-frequency regions, while later densification adaptively grows anchor-points in high-frequency regions using gradient-based statistics modulated by their level assignment. This frequency-aware protocol suppresses excessive anchor growth, preserves spatial detail, and optimizes memory allocation (Figure 4, Figure 5, Figure 6).

Figure 4: Feature-variance-guided Hierarchical Densification protocol—variance-based leveling and targeted anchor-point growth enable efficient detail refinement.

Figure 5: FHD visualization and frequency analysis, showing progressive high-frequency component emergence at higher anchor-point levels.

Figure 6: Hierarchical level-wise renderings demonstrate efficient static content reconstruction and detail recovery under adaptive densification.

Preventing Inter-Chunk Interference and Temporal Flicker

MoRel’s progressive windowed bidirectional deformation (PWD) and intermediate frame blending (IFB) stages mitigate backward contamination prevalent in naive chunk-wise training. PWD ensures independent temporal optimization windows for each anchor, whereas IFB fuses overlapping anchor regions with learnable temporal offset and decay parameters, eliminating ghosting and appearance shifts (Figure 7, Figure 8).

Figure 7: Visual example of backward contamination and object degradation in naive chunk-wise bidirectional deformation.

Figure 8: Temporal profile analysis—bidirectional deformation eliminates chunk boundaries and ensures flicker-free temporal continuity.

Experimental Validation and Numerical Performance

Extensive experiments on the newly proposed SelfCap $_{\text{LR}}$ dataset, containing high-motion and wide-spatial-range scenes over thousands of frames, demonstrate that MoRel achieves superior reconstruction fidelity (PSNR, SSIM, LPIPS), lowest temporal consistency error (tOF), and efficient memory utilization versus SOTA all-at-once and chunk-based competitors. Notably, MoRel maintains sharp reconstructions and prevents boundary flickering even under challenging dynamic content (Figure 9). Ablation analyses confirm the individual contributions of ARBB, PWD, IFB, and FHD to the overall system capacity and stability.

Figure 9: Qualitative comparison on SelfCap $_{\text{LR}}$ , highlighting sharper, temporally consistent modeling by MoRel in high-magnitude motion.

Practical and Theoretical Impact

MoRel’s architecture is directly applicable to memory-constrained long-range dynamic scene modeling, such as view synthesis in streaming or interactive environments. The ARBB design generalizes across varied motion patterns and spatial scales, and the FHD protocol offers a template for frequency-aware anchor refinement adaptable to large-scale and multi-modal input. The ability to support random-access rendering with bounded memory overhead marks a critical step toward real-time deployment in immersive applications.

On a theoretical level, MoRel advances learnable bidirectional blending in sparse anchor-based representations, connecting frequency analysis with anchor-point management. This opens new research avenues in multi-scale, spatio-temporal modeling, and dynamic resource allocation for neural graphics primitives.

Future Directions

While MoRel addresses many computational bottlenecks, further research is warranted on spatially expansive and rapidly varying scenes, where global anchors may lose representational coherence. Integration of spatial grid-aware scheduling and refined frequency estimation, leveraging recent advances in scalable neural view synthesis and blockwise scene partitioning, could enhance compositionality and further reduce boundary inconsistencies.

Conclusion

MoRel establishes a highly scalable and temporally stable paradigm for long-range 4D Gaussian Splatting via its ARBB mechanism and hierarchical densification. The approach simultaneously achieves flicker-free reconstructions, efficient memory usage, and competitive fidelity in challenging dynamic scenes, reflecting strong applicability to both academic study and real-world system design (2512.09270).