RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks (2401.06035v2)

Published 11 Jan 2024 in cs.CV and cs.LG

Abstract: We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies, with attention to computational and dataset efficiency. To capture long spatio-temporal dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks developed for three-dimensional object representation and employs a single latent code to model an entire video clip. Individual video frames are then synthesized from an intermediate tri-plane representation, which itself is derived from the primary latent code. This novel strategy more than halves the computational complexity measured in FLOPs compared to the most efficient state-of-the-art methods. Consequently, our approach facilitates the efficient and temporally coherent generation of videos. Moreover, our joint frame modeling approach, in contrast to autoregressive methods, mitigates the generation of visual artifacts. We further enhance the model's capabilities by integrating an optical flow-based module within our Generative Adversarial Network (GAN) based generator architecture, thereby compensating for the constraints imposed by a smaller generator size. As a result, our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps. The efficacy and versatility of our approach are empirically validated through qualitative and quantitative assessments across three different datasets comprising both synthetic and real video clips. We will make our training and inference code public.
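
As a rough illustration of the pipeline the abstract describes (a single latent code expanded into an explicit tri-plane, from which per-frame features are sampled and decoded implicitly into pixels), the sketch below shows a minimal tri-plane video generator. It is not the paper's released code: the plane layout (xy, xt, yt), feature dimensions, and module names are assumptions, and the optical-flow module and adversarial training are omitted.

```python
# Minimal sketch (not the authors' implementation): one latent code is expanded
# into three explicit feature planes that factorize space-time as (x,y), (x,t),
# (y,t); per-pixel features are then decoded implicitly to RGB. Plane layout,
# sizes, and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriPlaneVideoGenerator(nn.Module):
    def __init__(self, z_dim=512, feat_dim=16, plane_res=32):
        super().__init__()
        self.feat_dim, self.plane_res = feat_dim, plane_res
        # One latent code -> three explicit 2D feature planes.
        self.to_planes = nn.Linear(z_dim, 3 * feat_dim * plane_res * plane_res)
        # Small implicit decoder: aggregated plane features -> RGB.
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, z, num_frames=16, frame_res=64):
        B = z.shape[0]
        planes = self.to_planes(z).view(B, 3, self.feat_dim, self.plane_res, self.plane_res)
        p_xy, p_xt, p_yt = planes.unbind(dim=1)

        # Normalized (x, y, t) coordinates for every pixel of every frame.
        t = torch.linspace(-1.0, 1.0, num_frames, device=z.device)
        y = torch.linspace(-1.0, 1.0, frame_res, device=z.device)
        x = torch.linspace(-1.0, 1.0, frame_res, device=z.device)
        tt, yy, xx = torch.meshgrid(t, y, x, indexing="ij")  # each (T, H, W)

        def sample(plane, u, v):
            # Bilinear lookup of plane features at coordinates (u, v) in [-1, 1].
            grid = torch.stack([u, v], dim=-1).reshape(1, -1, 1, 2).expand(B, -1, -1, -1)
            return F.grid_sample(plane, grid, align_corners=True).squeeze(-1)  # (B, C, T*H*W)

        # Hybrid explicit-implicit: sum the three plane lookups, decode per pixel.
        feats = sample(p_xy, xx, yy) + sample(p_xt, xx, tt) + sample(p_yt, yy, tt)
        rgb = self.decoder(feats.permute(0, 2, 1))  # (B, T*H*W, 3)
        return rgb.view(B, num_frames, frame_res, frame_res, 3)

# Usage: one latent code yields an entire, jointly modeled clip.
# clip = TriPlaneVideoGenerator()(torch.randn(2, 512), num_frames=8, frame_res=32)
```

In this reading, most of the clip-level cost goes into producing three 2D planes once, with only a lightweight decoder evaluated per pixel and frame, which is consistent with the FLOP savings the abstract reports over per-frame synthesis.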
