High-fidelity, long-range temporally consistent video frame interpolation

Develop video frame interpolation algorithms that generate high-fidelity intermediate frames between given start and end frames while maintaining long-range temporal consistency, respecting the boundary conditions imposed by those frames, and aligning with complex motion descriptions over extended sequences.

Background

Video frame interpolation seeks to synthesize intermediate frames between specified start and end images. While diffusion-based models have improved visual fidelity, they often operate unidirectionally and lack mechanisms for temporal self-consistency, leading to drift and boundary misalignment in long sequences. The paper frames the combined requirements of high fidelity, long-range temporal coherence, boundary adherence, and semantic alignment with complex motion descriptions as an outstanding challenge.

Addressing this challenge is central to applications such as slow-motion generation, frame-rate conversion, and text-guided motion synthesis. The authors propose a bidirectional, cycle-consistent training framework as a step toward resolving this open issue, but explicitly acknowledge that the broader problem remains unsolved in general.
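To make the idea of bidirectional cycle consistency concrete, the sketch below shows one plausible way to score agreement between a forward (start-to-end) and a backward (end-to-start) interpolation pass, plus a boundary-adherence penalty. This is a minimal illustration, not the paper's actual training objective: the function names, the plain L2 penalties, and the frame-array layout are all assumptions introduced here.

```python
import numpy as np

def cycle_consistency_loss(forward_frames, backward_frames):
    """Hypothetical cycle-consistency score: L2 gap between the
    forward-time predictions and the time-reversed backward predictions.

    Both inputs are arrays of shape (T, H, W, C). backward_frames are
    assumed to be generated from end -> start, so they are reversed
    along the time axis before comparison.
    """
    backward_aligned = backward_frames[::-1]  # align backward pass with forward time
    return float(np.mean((forward_frames - backward_aligned) ** 2))

def boundary_loss(pred_start, pred_end, start_frame, end_frame):
    """Hypothetical boundary-adherence penalty: the predicted first and
    last frames should match the given start and end frames exactly.
    """
    return float(np.mean((pred_start - start_frame) ** 2)
                 + np.mean((pred_end - end_frame) ** 2))

# Toy usage on random "frames": if the backward pass exactly mirrors the
# forward pass, the cycle-consistency term vanishes.
rng = np.random.default_rng(0)
forward = rng.random((8, 16, 16, 3))
backward = forward[::-1].copy()
print(cycle_consistency_loss(forward, backward))
print(boundary_loss(forward[0], forward[-1], forward[0], forward[-1]))
```

In a real training loop these terms would be computed on model outputs and combined with a reconstruction or diffusion loss; the sketch only fixes the shape of the consistency constraints described above.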

References

Despite significant advances in generative modeling, producing high-fidelity interpolations that maintain long-range temporal consistency, respect boundary conditions, and align with complex motion descriptions remains an open challenge.

Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation (2604.01700 - Liu et al., 2 Apr 2026), Section 1, Introduction