Achieving long-term consistent video generation
Determine effective methods for achieving long-term temporally consistent video generation in text-to-video systems, particularly those built on Diffusion Transformers (DiTs), so that generated videos remain coherent over extended durations.
References
Despite these rapid advancements in DiTs, it remains technically unclear how to achieve long-term consistent video generation.
— CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
(2408.06072 - Yang et al., 12 Aug 2024) in Section 1 (Introduction)