
Dance with You: The Diversity Controllable Dancer Generation via Diffusion Models (2308.13551v3)

Published 23 Aug 2023 in cs.HC, cs.AI, and cs.GR

Abstract: Recently, digital humans for interpersonal interaction in virtual environments have gained significant attention. In this paper, we introduce a novel multi-dancer synthesis task called partner dancer generation, which involves synthesizing virtual human dancers capable of performing dance with users. The task aims to control the pose diversity between the lead dancer and the partner dancer. The core of this task is to ensure the controllable diversity of the generated partner dancer while maintaining temporal coordination with the lead dancer. This scenario differs from earlier research on generating dance motions driven by music, as our emphasis is on automatically designing partner dancer postures according to a pre-defined diversity, the pose of the lead dancer, and the accompanying music. To achieve this objective, we propose a three-stage framework called Dance-with-You (DanY). Initially, we employ a 3D Pose Collection stage to collect a wide range of basic dance poses as references for motion generation. Then, we introduce a hyper-parameter that coordinates the similarity between dancers by masking poses, preventing the generation of sequences that are over-diverse or over-consistent. To avoid rigid movements, we design a Dance Pre-generated stage that pre-generates these masked poses instead of filling them with zeros. After that, a Dance Motion Transfer stage is adopted with leader sequences and music, in which a multi-conditional sampling formula is rewritten to transfer the pre-generated poses into a sequence with a partner style. In practice, to address the lack of multi-person datasets, we introduce AIST-M, a new, publicly available dataset for partner dancer generation. Comprehensive evaluations on our AIST-M dataset demonstrate that the proposed DanY can synthesize satisfactory partner dancer results with controllable diversity.
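The abstract's central mechanism is a diversity hyper-parameter that masks a fraction of partner-dancer frames relative to the lead sequence; the masked frames are later filled by the diffusion model rather than with zeros. The paper's exact masking rule is not given here, so the sketch below is a hypothetical illustration of the idea under simple assumptions: a Bernoulli per-frame mask whose rate equals the diversity parameter (0 copies the lead exactly, 1 leaves every frame free for generation). The function name, pose shape, and zero placeholder are all illustrative, not the authors' implementation.

```python
import numpy as np

def mask_partner_poses(lead_seq, diversity, seed=None):
    """Hypothetical sketch of diversity-controlled pose masking.

    lead_seq:  array of shape (T, J, 3) -- T frames of J 3D joints
               from the lead dancer (shape is an assumption).
    diversity: float in [0, 1]; fraction of frames masked out and
               left for the diffusion model to (re)generate.
    Returns the partially masked sequence and the boolean frame mask.
    """
    rng = np.random.default_rng(seed)
    T = lead_seq.shape[0]
    # Frames drawn for regeneration: higher diversity -> more masking.
    mask = rng.random(T) < diversity
    partner_init = lead_seq.copy()
    # Placeholder zeros; DanY instead pre-generates these frames
    # in its Dance Pre-generated stage to avoid rigid motion.
    partner_init[mask] = 0.0
    return partner_init, mask
```

At `diversity = 0` the partner is initialized as an exact copy of the lead (maximally consistent); at `diversity = 1` every frame is masked and regenerated (maximally diverse), with intermediate values trading off between the two, which is the controllability the task description asks for.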

