
Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation (2307.00574v5)

Published 2 Jul 2023 in cs.CV

Abstract: We introduce a method to generate temporally coherent human animation from a single image, a video, or random noise. This problem has previously been formulated as auto-regressive generation, i.e., regressing on past frames to decode future frames. However, such unidirectional generation is highly prone to motion drift over time, producing unrealistic human animation with significant artifacts such as appearance distortion. We claim that bidirectional temporal modeling enforces temporal coherence on a generative network by largely suppressing the motion ambiguity of human appearance. To support this claim, we design a novel human animation framework based on a denoising diffusion model: a neural network learns to generate the image of a person by denoising temporal Gaussian noise whose intermediate results are cross-conditioned bidirectionally between consecutive frames. In experiments, our method outperforms existing unidirectional approaches while achieving realistic temporal coherence.
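The core idea of the abstract, denoising all frames jointly while cross-conditioning each frame's intermediate result on both its temporal neighbors, can be illustrated with a toy sketch. This is not the authors' model: `denoise_step` below is a hypothetical stand-in for the learned denoising network, and the blending weights and schedule are illustrative assumptions. The sketch only shows the bidirectional conditioning pattern and its smoothing effect on frame-to-frame drift.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, cond, t):
    # Hypothetical denoiser: pulls each noisy frame toward its
    # bidirectional neighbor context. In the actual method this would
    # be a learned diffusion network, not a fixed linear blend.
    return x + 0.5 * (cond - x) * (1.0 - t)

def bidirectional_denoise(frames, num_steps=10):
    # Denoise the whole sequence jointly: at every step, each frame is
    # cross-conditioned on the *intermediate* results of both its
    # previous and next neighbors (clamped at the sequence boundaries).
    x = frames.copy()
    for s in range(num_steps):
        t = 1.0 - s / num_steps          # noise level running from 1 to 0
        prev_ctx = np.roll(x, 1, axis=0)
        next_ctx = np.roll(x, -1, axis=0)
        prev_ctx[0] = x[0]               # first frame has no predecessor
        next_ctx[-1] = x[-1]             # last frame has no successor
        cond = 0.5 * (prev_ctx + next_ctx)  # bidirectional context
        x = denoise_step(x, cond, t)
    return x

# Start from pure Gaussian noise: a short "video" of 5 frames, 4 values each.
noisy = rng.standard_normal((5, 4))
out = bidirectional_denoise(noisy)

# Bidirectional conditioning suppresses frame-to-frame drift: adjacent
# frames end up closer together than they started.
drift_before = np.abs(np.diff(noisy, axis=0)).mean()
drift_after = np.abs(np.diff(out, axis=0)).mean()
print(drift_after < drift_before)  # → True
```

The contrast with unidirectional generation is that here no frame is finalized before its successors exist; every frame sees context from both directions at every denoising step, which is what the paper argues suppresses motion ambiguity.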
