AAMDM: Accelerated Auto-regressive Motion Diffusion Model (2401.06146v1)
Abstract: Interactive motion synthesis is essential for creating immersive experiences in entertainment applications such as video games and virtual reality. However, generating animations that are both high-quality and contextually responsive remains a challenge. Traditional techniques in the game industry can produce high-fidelity animations but suffer from high computational costs and poor scalability. Trained neural network models alleviate the memory and speed issues, yet fall short in generating diverse motions. Diffusion models offer diverse motion synthesis with low memory usage, but require expensive reverse diffusion processes. This paper introduces the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a novel motion synthesis framework designed to achieve quality, diversity, and efficiency together. AAMDM integrates Denoising Diffusion GANs as a fast Generation Module and an Auto-regressive Diffusion Model as a Polishing Module. Furthermore, AAMDM operates in a lower-dimensional embedded space rather than the full-dimensional pose space, which reduces training complexity and further improves performance. Through comprehensive quantitative analyses and visual comparisons, we show that AAMDM outperforms existing methods in motion quality, diversity, and runtime efficiency. We also demonstrate the effectiveness of each algorithmic component through ablation studies.
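As a rough illustration of the control flow the abstract describes, the sketch below generates each frame in a small latent space with a few coarse steps (standing in for the Generation Module), refines it with a few small steps (standing in for the Polishing Module), and then decodes it to a pose. It is not the paper's implementation. The function names, dimensions, step counts, and the toy update rules used in place of trained networks are all hypothetical placeholders chosen only to make the loop runnable.

```python
import math
import random

LATENT_DIM = 8   # assumed size of the embedded space (hypothetical)
POSE_DIM = 32    # assumed size of the full pose vector (hypothetical)

rng = random.Random(0)

# Fixed random weights: a toy stand-in for a learned latent-to-pose decoder.
W_DEC = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(POSE_DIM)]


def generation_step(z, z_prev):
    """Placeholder coarse denoiser: one large step conditioned on the previous frame."""
    return [math.tanh(0.5 * zi + zp) for zi, zp in zip(z, z_prev)]


def polishing_step(z, z_prev):
    """Placeholder refiner: one small corrective step toward the previous frame."""
    return [zi + 0.1 * (zp - zi) for zi, zp in zip(z, z_prev)]


def synthesize_frame(z_prev, gen_steps=2, polish_steps=2):
    """Start from Gaussian noise, take a few coarse steps, then a few refining steps."""
    z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    for _ in range(gen_steps):
        z = generation_step(z, z_prev)
    for _ in range(polish_steps):
        z = polishing_step(z, z_prev)
    return z


def decode(z):
    """Map a latent vector to the (assumed) full pose space with a linear map."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W_DEC]


def rollout(num_frames):
    """Auto-regressive rollout: each frame is conditioned on the previous latent."""
    z = [0.0] * LATENT_DIM
    poses = []
    for _ in range(num_frames):
        z = synthesize_frame(z)
        poses.append(decode(z))
    return poses
```

Calling `rollout(5)` returns five pose vectors of length `POSE_DIM`. In a real system the two placeholder step functions would be trained networks, and the step counts would be tuned to trade quality against runtime.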