Synthesizing sustained long-duration video footage
Develop video-generation methods—particularly diffusion-based models—that can synthesize sustained footage spanning minutes or longer, overcoming the current limitation of generating only short clips of approximately 2–10 seconds.
References
Synthesizing sustained footage (e.g., over minutes) still remains an open research question.
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation (2502.12632, Yu et al., 18 Feb 2025), in Abstract, page 1