DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance (2403.13667v1)
Abstract: Choreographers determine what a dance looks like, while camera operators determine how it is finally presented. Recently, various methods and datasets have demonstrated the feasibility of dance synthesis. However, camera movement synthesis conditioned on music and dance remains an unsolved, challenging problem due to the scarcity of paired data. We therefore present DCM, a new multi-modal 3D dataset that, for the first time, combines camera movement with dance motion and music audio. The dataset comprises 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community, covering 4 music genres. With this dataset, we find that dance camera movement is multifaceted and human-centric, with multiple influencing factors, making dance camera synthesis a more challenging task than camera or dance synthesis alone. To overcome these difficulties, we propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy. For evaluation, we devise new metrics measuring camera movement quality, diversity, and dancer fidelity. Using these metrics, we conduct extensive experiments on the DCM dataset, providing both quantitative and qualitative evidence of the effectiveness of DanceCamera3D. Code and video demos are available at https://github.com/Carmenw1203/DanceCamera3D-Official.
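The abstract mentions a condition separation strategy for the diffusion model but does not spell it out. One common way to realize separated conditions in diffusion sampling is per-condition classifier-free guidance, where the music and dance-pose conditions are dropped independently during training so each can receive its own guidance weight at sampling time. The sketch below illustrates that idea only; the function name, model interface, and guidance weights are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def separated_cfg_sample(model, x_t, t, music, pose, w_music=2.0, w_pose=2.0):
    """Hypothetical sketch of classifier-free guidance with separated
    conditions. `model(x_t, t, music, pose)` predicts the denoised
    camera motion; passing None for a condition stands in for its
    learned "null" embedding."""
    eps_uncond = model(x_t, t, None, None)   # both conditions dropped
    eps_music = model(x_t, t, music, None)   # music condition only
    eps_pose = model(x_t, t, None, pose)     # dance-pose condition only
    # Combine: unconditional prediction plus an independently weighted
    # shift toward each single-condition prediction.
    return (eps_uncond
            + w_music * (eps_music - eps_uncond)
            + w_pose * (eps_pose - eps_uncond))
```

Separating the conditions this way lets the sampler trade off music-following against dancer-following by tuning the two weights independently, which a single joint guidance weight cannot do.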