AAMDM: Accelerated Auto-regressive Motion Diffusion Model (2401.06146v1)

Published 2 Dec 2023 in cs.CV and cs.GR

Abstract: Interactive motion synthesis is essential in creating immersive experiences in entertainment applications, such as video games and virtual reality. However, generating animations that are both high-quality and contextually responsive remains a challenge. Traditional techniques in the game industry can produce high-fidelity animations but suffer from high computational costs and poor scalability. Trained neural network models alleviate the memory and speed issues, yet fall short on generating diverse motions. Diffusion models offer diverse motion synthesis with low memory usage, but require expensive reverse diffusion processes. This paper introduces the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a novel motion synthesis framework designed to achieve quality, diversity, and efficiency all together. AAMDM integrates Denoising Diffusion GANs as a fast Generation Module, and an Auto-regressive Diffusion Model as a Polishing Module. Furthermore, AAMDM operates in a lower-dimensional embedded space rather than the full-dimensional pose space, which reduces the training complexity as well as further improves the performance. We show that AAMDM outperforms existing methods in motion quality, diversity, and runtime efficiency, through comprehensive quantitative analyses and visual comparisons. We also demonstrate the effectiveness of each algorithmic component through ablation studies.


Summary

  • The paper proposes a dual-module design that pairs a Denoising Diffusion GAN for fast draft generation with an auto-regressive diffusion model that refines the output.
  • The framework operates in a lower-dimensional embedded space, reducing training complexity and computational cost.
  • Benchmarks show AAMDM synthesizes motion up to 40 times faster than comparable methods while maintaining high quality and diversity.

Accelerated Auto-regressive Motion Diffusion Model (AAMDM): An Analytical Overview

The paper presents the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a framework that addresses critical challenges in interactive motion synthesis for real-time applications such as video games and virtual reality. Traditional motion-synthesis methods can produce high-quality output but suffer from heavy computation and limited scalability. Neural network approaches improve memory use and runtime but often lack the diversity needed for realistic animation. Diffusion models offer that diversity, yet their computationally intensive reverse diffusion processes make them impractical for time-sensitive applications. AAMDM addresses all three concerns by combining Denoising Diffusion GANs for rapid initial motion generation with an Auto-regressive Diffusion Model for quality enhancement.

Core Contributions and Methodology

AAMDM's primary contribution is the combination of two synergistic modules: a Generation Module that uses Denoising Diffusion GANs to produce swift initial motion drafts, and a Polishing Module that employs an Auto-regressive Diffusion Model to refine them. The model operates in a lower-dimensional embedded space, which significantly reduces training complexity and improves performance.

  1. Embedded Space Transition: AAMDM synthesizes motion in a lower-dimensional embedded space rather than the full-dimensional pose space, learning the embedding with an autoencoder; this reduces the effective dimensionality of the data and simplifies the learning task (see the autoencoder sketch after this list).
  2. Diffusion and GAN Integration: The Generation Module uses Denoising Diffusion GANs, which enable efficient initial motion generation by modeling the reverse diffusion process with a multimodal distribution rather than the conventional Gaussian (contrasted formally below). This preserves the diversity benefits of diffusion-driven synthesis while avoiding its traditionally high sampling cost.
  3. Polishing Process for Quality Assurance: After the initial draft, a concise two-step polishing process raises the fidelity of the generated motion, meeting the need for rapid yet precise synthesis in interactive applications (see the synthesis-loop sketch below).
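
To make the embedded-space idea concrete, here is a minimal sketch (not the authors' code) of a pose autoencoder that maps full-dimensional poses into a compact latent space; the layer sizes and dimensions are illustrative assumptions, and PyTorch is used purely for convenience.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseAutoencoder(nn.Module):
    """Maps a full-dimensional pose vector to a compact embedding and back.

    All dimensions are illustrative; the paper's actual pose and embedding
    sizes may differ.
    """

    def __init__(self, pose_dim: int = 300, embed_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ELU(),
            nn.Linear(256, embed_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ELU(),
            nn.Linear(256, pose_dim),
        )

    def forward(self, pose):
        z = self.encoder(pose)   # compact embedding the diffusion model operates on
        recon = self.decoder(z)  # reconstruction used for the training loss
        return z, recon

# Training minimizes reconstruction error so diffusion can run entirely in z-space:
model = PoseAutoencoder()
pose = torch.randn(8, 300)       # batch of dummy pose vectors
z, recon = model(pose)
loss = F.mse_loss(recon, pose)
```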
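The distinction behind item 2, taken from the Denoising Diffusion GAN formulation that AAMDM builds on, is the shape of the learned reverse kernel: a standard DDPM assumes each denoising step is Gaussian, whereas a DDGAN pushes a latent variable through a generator, yielding an implicitly multimodal step that stays accurate even with very few, large denoising jumps:

```latex
\text{DDPM:}\quad  p_\theta(x_{t-1}\mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \sigma_t^2 I\big)
\text{DDGAN:}\quad p_\theta(x_{t-1}\mid x_t) = \int q\big(x_{t-1}\mid x_t,\, x_0 = G_\theta(x_t, z, t)\big)\, p(z)\, dz
```

Because the generator G can place probability mass anywhere, a handful of steps suffices where a Gaussian-step model would need many.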
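Putting the modules together, a schematic of one synthesis step might look as follows; `generation_module` and `polishing_module` are placeholder callables standing in for the trained networks, and the step counts are assumptions chosen to mirror the few-large-jumps-then-short-polish structure described above.

```python
import torch

@torch.no_grad()
def synthesize_frame(z_prev, generation_module, polishing_module,
                     gen_steps=2, polish_steps=2, embed_dim=64):
    """One auto-regressive step: draft the next embedded frame, then refine it.

    `generation_module` plays the role of a Denoising Diffusion GAN generator
    taking large denoising jumps; `polishing_module` is a conventional denoiser
    run for a couple of small steps. Both are placeholders, not the paper's API.
    """
    z = torch.randn(z_prev.shape[0], embed_dim)   # start the draft from pure noise
    for t in reversed(range(gen_steps)):          # few large GAN-denoising jumps
        z = generation_module(z, z_prev, t)
    for t in reversed(range(polish_steps)):       # short diffusion polish
        z = polishing_module(z, z_prev, t)
    return z                                      # embedding of the next frame

# Dummy stand-ins make the sketch runnable; real modules would be trained networks.
dummy = lambda z, z_prev, t: 0.5 * z + 0.5 * z_prev
z_next = synthesize_frame(torch.zeros(1, 64), dummy, dummy)
```

The decoded pose for each frame would then come from the autoencoder's decoder, and `z_next` would feed back in as `z_prev` for the following frame.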

Quantitative and Comparative Analysis

AAMDM was benchmarked against several established methods, including Learned Motion Matching (LMM), Motion VAE (MVAE), and variants of the Auto-regressive Motion Diffusion Model (AMDM). Across the key metrics of diversity, Fréchet Inception Distance (FID), and runtime efficiency in frames per second (FPS), AAMDM demonstrated superior performance. Notably, it matches the motion quality of models like AMDM while using far fewer diffusion steps, yielding roughly 40-times-faster generation.
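
For reference, FID in such evaluations is the Fréchet distance between Gaussians fit to feature statistics of generated and ground-truth motion; a standard, paper-independent implementation is:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fit to two feature sets (rows = samples)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):  # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2.0 * covmean))

# Example with random features; in practice the features come from a motion encoder.
rng = np.random.default_rng(0)
print(frechet_distance(rng.normal(size=(200, 16)), rng.normal(size=(200, 16))))
```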

The use of diffusion GANs lets AAMDM balance quality and speed effectively, a trade-off that has long challenged generative models for character animation; baseline methods tend to sacrifice one for the other. Experimental results indicate that AAMDM can generate extended high-quality motion sequences at interactive rates, which is pivotal for real-time applications.

Implications and Future Directions

The framework outlined in this paper has broad implications for the field of motion synthesis in computer graphics and AI-driven animation. By enhancing efficiency and maintaining high output diversity and quality, AAMDM represents a significant advancement over both traditional and contemporary methods. It holds potential not only for real-time gaming applications but also for any domain where rapid and responsive motion synthesis is required.

Future research may explore integrating reinforcement learning to further optimize the diffusion process, incorporating richer latent-structure models, such as structured matrix-Fisher distributions, to improve generation quality, and developing learning-based feedback mechanisms to strengthen controllability and practical usability.

In conclusion, the Accelerated Auto-regressive Motion Diffusion Model stands as a robust framework that effectively tackles the dual challenges of quality and efficiency in interactive motion synthesis, offering a powerful tool for real-time animation applications.
