POPDG: Popular 3D Dance Generation with PopDanceSet (2405.03178v1)
Abstract: Generating dances that are both lifelike and well aligned with music remains a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. It surpasses the AIST++ dataset in music genre diversity and in the intricacy and depth of dance movements. Moreover, the proposed POPDG model, built within the iDDPM framework, enhances dance diversity and, through the Space Augmentation Algorithm, strengthens the spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves state-of-the-art (SOTA) results on two datasets. The paper also expands on current evaluation metrics. The dataset and code are available at https://github.com/Luke-Luo1/POPDG.